Causal Inference for Binary Outcomes

Applied Health Econometrics Symposium
Leeds October 2013
Frank Windmeijer
University of Bristol
Outline:
1. Instrumental variables estimators for binary outcomes
• Structural economic/econometric models
• Binary outcomes
• Identification assumptions
• Structural Mean Models
• Risk difference, Risk ratio, Odds ratio
• Mendelian randomisation
• GMM estimation
Related papers:
Clarke P.S. and F. Windmeijer, Instrumental Variable Estimators for
Binary Outcomes, JASA 2012, 1638-1652.
Clarke P.S. and F. Windmeijer, Identification of Causal Effects on
Binary Outcomes Using Structural Mean Models, Biostatistics 2010.
Clarke P.S., Palmer T. and F. Windmeijer, Estimating Structural Mean
Models with Multiple Instrumental Variables using the Generalised
Method of Moments, Cemmap/CMPO Working paper, 2012.
Von Hinke Kessler Scholder, S., Davey Smith, G., Lawlor, D.A.,
Propper, C. and F. Windmeijer, Genetic Markers as Instrumental
Variables, CMPO working paper.
Structural model and potential outcomes
Consider the simple linear model for a continuous outcome $Y$:
$$Y = \beta_0 + X\beta_1 + U$$
If this is a structural model then $\beta_1$ is the ceteris paribus effect of a
change in $X$, holding $U$ constant.
The potential outcome $Y(x)$ is the outcome that would be obtained if
exposure $X$ is set equal to $x$.
The average causal effect is then defined as
$$\mathrm{ACE}(x, x') = E[Y(x') - Y(x)]$$
For the simple structural linear model,
$$Y(x) = \beta_0 + \beta_1 x + U$$
and hence
$$Y(x') - Y(x) = (x' - x)\beta_1 = \mathrm{ACE}(x, x').$$
In the linear model an endogeneity problem occurs if $U$ is not
conditionally mean independent of observed $X$, $E[U \mid X] \neq 0$, and OLS
is inconsistent for $\beta_1$.
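A small simulation can make the endogeneity problem concrete. The sketch below assumes a hypothetical data-generating process (a binary instrument, normal errors, and the parameter values shown, none of which come from the slides) in which U is correlated with X. It compares the OLS slope with the simple IV ratio cov(Z, Y) / cov(Z, X).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
beta0, beta1 = 1.0, 0.5            # assumed structural parameters

z = rng.integers(0, 2, size=n)     # binary instrument, independent of (U, V)
v = rng.normal(size=n)
u = 0.8 * v + rng.normal(size=n)   # U is correlated with V, so E[U | X] != 0
x = 1.0 * z + v                    # exposure depends on Z and V
y = beta0 + beta1 * x + u          # structural outcome equation

# OLS slope: cov(X, Y) / var(X)
ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
# IV slope: cov(Z, Y) / cov(Z, X)
iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

print(f"true beta1 = {beta1}, OLS = {ols:.3f}, IV = {iv:.3f}")
```

Under this design the OLS slope settles well above the true value, while the IV ratio stays close to it.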
A classic example in microeconometrics is the returns to education.
A simple Mincer wage equation is given by
$$\ln w_i = \beta_0 + \beta_1\,\mathrm{educ}_i + \beta_2\,\mathrm{exper}_i + \beta_3\,\mathrm{exper}_i^2 + u_i$$
and endogeneity arises because there are unobserved individual
characteristics in $u_i$ that affect both $\ln w_i$ and the choice of the number
of years of education.
An instrumental variable is then a variable that, in this particular
example, determines the level of education, but is independent of the
unobservables $u$ in the $\ln w$ equation.
For example, Angrist and Krueger (1991) use quarter of birth as an
instrument for education, in a “natural experiment” type setting.
[Figure: diagram with QoB as an instrument for educ]
We are interested in the case where the outcome $Y$, treatment $X$ and
instrument $Z$ are all binary $\{0, 1\}$ variables.
Example: the effect of being overweight on hypertension, using a genetic
marker as instrument.
We adopt a potential outcomes framework, and the system is triangular
such that potential outcomes are denoted $X(z)$ and $Y(z, x)$.
Causal effects are:
$\mathrm{ACE} = \mathrm{ATE} = E[Y(1) - Y(0)]$
$\mathrm{CRR} = E[Y(1)] / E[Y(0)]$, causal risk ratio
$\mathrm{COR} = \dfrac{E[Y(1)] / E[1 - Y(1)]}{E[Y(0)] / E[1 - Y(0)]}$, causal odds ratio
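The three causal contrasts are simple functions of the two potential-outcome means. A minimal sketch, where the function name and the input values are hypothetical and chosen only for illustration:

```python
def causal_contrasts(p1, p0):
    """Return (ACE, CRR, COR) from p1 = E[Y(1)] and p0 = E[Y(0)]."""
    ace = p1 - p0                               # risk difference
    crr = p1 / p0                               # causal risk ratio
    cor = (p1 / (1 - p1)) / (p0 / (1 - p0))     # causal odds ratio
    return ace, crr, cor

# Hypothetical potential-outcome means, not taken from the slides.
ace, crr, cor = causal_contrasts(p1=0.4, p0=0.25)
print(f"ACE = {ace:.3f}, CRR = {crr:.3f}, COR = {cor:.3f}")
```

For a binary outcome, $E[1 - Y(x)] = 1 - E[Y(x)]$, which is why the odds ratio only needs the two means.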
The IV core conditions can be stated as:
1. Independence of the potential outcomes and IV: $Y(z, x) \perp\!\!\!\perp Z$
2. Exclusion restriction: $Y(z, x) = Y(x)$
3. There is an association between exposure and IV: $X \not\perp\!\!\!\perp Z$
Under these core conditions, it is not possible to point-identify a causal
effect without any further assumptions on the generating process, which
has the general form
$$X = f_X(Z, V)$$
$$Y(x) = f_Y(x, U)$$
$$(U, V) \sim F_{UV}$$
Manski (1990) provides “worst case” bounds; Balke and Pearl (1997)
provide sharp bounds for causal effects such as the ACE.
Chesher (2010) also provides sharp bounds for causal effects, but with
respect to different structural models that constrain the family of
generating processes. We conjecture therefore that Chesher bounds are
at least as narrow as those of Balke and Pearl.
At the other extreme, we can point identify all causal effects by
specifying a fully parametric structural model:
$$Y(x) = I(\beta_0 + x\beta_1 + U > 0)$$
$$X(Z) = I(\gamma_0 + Z\gamma_1 + V > 0)$$
Specifying the joint distribution of $(U, V)$ as bivariate normal results in
the bivariate probit model, with the ML estimator consistent for the
parameters, provided the model is correctly specified.
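To see how the parametric model pins down the causal effects: if the outcome equation is an indicator of a linear index plus a standard normal error $U$, then $E[Y(x)] = \Phi(\beta_0 + x\beta_1)$. The sketch below assumes that form and uses hypothetical parameter values; the helper names are illustrative only.

```python
from math import erf, sqrt

def norm_cdf(t):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

def probit_effects(beta0, beta1):
    """ACE, CRR and COR implied by a probit outcome equation (assumed form)."""
    p0 = norm_cdf(beta0)            # E[Y(0)]
    p1 = norm_cdf(beta0 + beta1)    # E[Y(1)]
    ace = p1 - p0
    crr = p1 / p0
    cor = (p1 / (1 - p1)) / (p0 / (1 - p0))
    return ace, crr, cor

# Hypothetical outcome-equation parameters, not estimates from the slides.
ace, crr, cor = probit_effects(beta0=-0.5, beta1=0.4)
print(f"ACE = {ace:.4f}, CRR = {crr:.4f}, COR = {cor:.4f}")
```

In practice the parameters would come from the bivariate probit ML fit; the mapping from parameters to effects is the part shown here.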
Structural Mean Models (SMMs)
Robins (1989, 1994) introduced the class of semiparametric structural
mean models. See also Vansteelandt and Goetghebeur (2003).
The additive structural mean model is specified as
$$E[Y \mid X, Z] - E[Y(0) \mid X, Z] = (\psi_0 + \psi_1 Z) X$$
This is a saturated model. From the core conditions, $Y(z, x) = Y(x) \perp\!\!\!\perp Z$
(conditional mean independence (CMI), or randomisation assumption):
$$E[Y(0) \mid Z = 1] = E[Y(0) \mid Z = 0] = E[Y(0)]$$
and hence
$$E[Y - (\psi_0 + \psi_1) X \mid Z = 1] = E[Y - \psi_0 X \mid Z = 0]$$
We cannot identify 2 parameters from 1 moment condition. The assumption
made in SMMs is the “No Effect Modification” assumption
(NEM). This assumes that the treatment effect is not modified by the
value of the instrument $Z$, i.e. $\psi_1 = 0$, or
$$E[Y \mid X, Z] - E[Y(0) \mid X, Z] = \psi_0 X$$
$$\psi_0 = E[Y(1) - Y(0) \mid X = 1], \text{ the ATT}$$
and $\psi_0$ is identified from the moment condition
$$E[Y - \psi_0 X \mid Z = 1] = E[Y - \psi_0 X \mid Z = 0]$$
or, equivalently, from
$$E[Y - X\psi_0 - \alpha_0 \mid Z = 1] = 0$$
$$E[Y - X\psi_0 - \alpha_0] = 0$$
where $\alpha_0 = E[Y(0)]$. Hence the SMM estimator for $\psi_0$ is the same as
the standard linear IV estimator, in this case the Wald estimator, with
estimand
$$\psi_0 = \frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}{E[X \mid Z = 1] - E[X \mid Z = 0]}$$
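The step from the moment condition to the Wald estimand is a one-line rearrangement. Writing the SMM parameter as $\psi_0$ (the symbol is a notational assumption here) and using linearity of the conditional expectation:

```latex
E[Y \mid Z=1] - \psi_0 E[X \mid Z=1] = E[Y \mid Z=0] - \psi_0 E[X \mid Z=0]
\;\Longrightarrow\;
\psi_0 \,\bigl\{ E[X \mid Z=1] - E[X \mid Z=0] \bigr\} = E[Y \mid Z=1] - E[Y \mid Z=0]
```

Dividing through requires $E[X \mid Z=1] \neq E[X \mid Z=0]$, which is the association between exposure and IV in core condition 3.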
Note, however, that a linear structural model
$$Y = \alpha_0 + \psi_0 X + U$$
does not make sense here, as $U$ is either $1 - \alpha_0 - X\psi_0$ or $-\alpha_0 - X\psi_0$
and hence not an antecedent of $X$.
With multivalued $Z$, the moments are, under NEM,
$$E[Y - X\psi_0 - \alpha_0 \mid Z = j] = 0$$
and the causal parameters can be estimated efficiently by GMM.
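With one indicator per value of Z as instruments and a constant plus X as regressors, one-step GMM for these linear moments reduces to 2SLS. The sketch below assumes that setup; the function name, the simulated design and all parameter values are hypothetical. The simulation is built so that NEM holds exactly (the conditional risk difference is constant).

```python
import numpy as np

def additive_smm_gmm(y, x, z):
    """One-step GMM (2SLS) for E[Y - X*psi - alpha | Z = j] = 0, all j."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    z = np.asarray(z)
    levels = np.unique(z)
    S = (z[:, None] == levels[None, :]).astype(float)  # one dummy per value of Z
    R = np.column_stack([np.ones_like(x), x])          # regressors (1, X)
    # Project the regressors on the instrument dummies, then solve
    # the 2SLS normal equations (R_hat' R) theta = R_hat' y.
    R_hat = S @ np.linalg.solve(S.T @ S, S.T @ R)
    alpha, psi = np.linalg.solve(R_hat.T @ R, R_hat.T @ y)
    return alpha, psi

# Hypothetical simulation with a three-valued instrument.
rng = np.random.default_rng(1)
n = 200_000
z = rng.integers(0, 3, size=n)
v = rng.normal(size=n)                       # unobserved confounder
x = (0.7 * z - 0.7 + v > 0).astype(float)    # selection depends on Z and V
p0 = 0.2 + 0.2 * (v > 0)                     # P(Y(0) = 1 | V)
psi_true = 0.15                              # constant risk difference (NEM)
y0 = rng.random(n) < p0
y1 = rng.random(n) < p0 + psi_true
y = np.where(x == 1, y1, y0).astype(float)

alpha_hat, psi_hat = additive_smm_gmm(y, x, z)
print(f"alpha = {alpha_hat:.3f}, psi = {psi_hat:.3f}")
```

In this design $E[Y(0)] = 0.3$, so the estimates should land near 0.3 and 0.15. An efficient two-step version would replace the weight matrix with an estimate of the inverse moment covariance.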
Multiplicative SMM
The multiplicative SMM is
$$\frac{E[Y \mid X, Z]}{E[Y(0) \mid X, Z]} = \exp\{(\psi_0 + \psi_1 Z) X\}.$$
Under NEM, $\psi_1 = 0$, and from the CMI assumption it follows that:
$$E[Y \exp(-X\psi_0) - \alpha_0 \mid Z = 1] = 0$$
$$E[Y \exp(-X\psi_0) - \alpha_0] = 0$$
where $\alpha_0 = E[Y(0)]$, and
$$\exp(\psi_0) = \frac{E[Y(1) \mid X = 1]}{E[Y(0) \mid X = 1]}$$
is the causal risk ratio among the treated.
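For binary X and binary Z the multiplicative moment condition has a closed-form solution. The derivation below is a sketch that writes the SMM parameter as $\psi_0$ (a notational assumption). It uses $Y \exp(-X\psi_0) = Y(1 - X) + YX e^{-\psi_0}$ for binary $X$ and equates the conditional means at $Z = 1$ and $Z = 0$:

```latex
E[Y(1-X) \mid Z=1] + e^{-\psi_0} E[YX \mid Z=1]
  = E[Y(1-X) \mid Z=0] + e^{-\psi_0} E[YX \mid Z=0]
\;\Longrightarrow\;
e^{\psi_0} = \frac{E[YX \mid Z=1] - E[YX \mid Z=0]}
                  {E[Y(1-X) \mid Z=0] - E[Y(1-X) \mid Z=1]}
```

This is the multiplicative counterpart of the Wald ratio for the additive model.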
Note that moment conditions of the form
$$E[Y \exp(-X\psi_0) - \alpha_0 \mid Z = j] = 0$$
are equivalent to the Mullahy multiplicative moments for count data
$$E\left[\frac{Y - \exp(\alpha_0^* + X\psi_0)}{\exp(\alpha_0^* + X\psi_0)} \,\middle|\, Z = j\right] = 0,$$
where $\alpha_0^* = \ln E[Y(0)]$.
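The equivalence can be checked in one line. Writing $\alpha_0 = E[Y(0)]$ and $\alpha_0^* = \ln E[Y(0)]$ (symbols assumed here for the intercept parameters), the Mullahy residual is the SMM residual scaled by the constant $1/\alpha_0$:

```latex
\frac{Y - \exp(\alpha_0^* + X\psi_0)}{\exp(\alpha_0^* + X\psi_0)}
  = Y \exp(-X\psi_0)\, e^{-\alpha_0^*} - 1
  = \frac{Y \exp(-X\psi_0) - \alpha_0}{\alpha_0},
\qquad \alpha_0 = e^{\alpha_0^*}.
```

Since $\alpha_0 > 0$ is a constant, one conditional mean is zero exactly when the other is.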
A further generalisation is a logistic SMM, under NEM:
$$\operatorname{logit}(E[Y \mid X, Z]) - \operatorname{logit}(E[Y(0) \mid X, Z]) = \psi_0 X$$
where $\operatorname{logit}(p) = \ln(p / (1 - p))$, and $\exp(\psi_0)$ is the causal odds ratio for
the treated:
$$\exp(\psi_0) = \frac{E[Y(1) \mid X = 1] / (1 - E[Y(1) \mid X = 1])}{E[Y(0) \mid X = 1] / (1 - E[Y(0) \mid X = 1])}.$$
The causal parameters of the SMMs can be estimated using the
GMM commands available in Stata or R, and programmes are given in
Clarke, Palmer and Windmeijer (2012).
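For readers without Stata or R, the multiplicative SMM moments can also be solved with a few lines of NumPy. This is a sketch only, not the programmes from the cited paper. It assumes indicator dummies for Z as instruments, the usual one-step weight matrix $(S'S/n)^{-1}$, and Gauss-Newton iterations; all function names are hypothetical. The demonstration uses the cell counts from the trial table in Application 1.

```python
import numpy as np

def multiplicative_smm_gmm(y, x, z, tol=1e-10, max_iter=100):
    """One-step GMM for E[Y*exp(-X*psi) - alpha | Z = j] = 0, all j."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    z = np.asarray(z)
    n = y.size
    S = (z[:, None] == np.unique(z)[None, :]).astype(float)  # Z dummies
    W = np.linalg.inv(S.T @ S / n)             # one-step weight matrix
    alpha, psi = y.mean(), 0.0                 # starting values
    for _ in range(max_iter):
        e = y * np.exp(-x * psi)
        g = S.T @ (e - alpha) / n              # sample moments
        D = np.column_stack([-np.ones(n), -x * e])
        G = S.T @ D / n                        # Jacobian of the moments
        step = np.linalg.solve(G.T @ W @ G, G.T @ W @ g)
        alpha, psi = alpha - step[0], psi - step[1]
        if np.max(np.abs(step)) < tol:
            break
    return alpha, psi

def expand(counts):
    """Turn {(z, x, y): count} into individual-level arrays (y, x, z)."""
    rows = [(z, x, y) for (z, x, y), c in counts.items() for _ in range(c)]
    z, x, y = np.array(rows).T
    return y, x, z

# Cell counts (z, x, y) from the Application 1 trial table.
counts = {(0, 0, 1): 33, (0, 0, 0): 99, (1, 0, 1): 9, (1, 0, 0): 20,
          (1, 1, 1): 40, (1, 1, 0): 65}
y, x, z = expand(counts)
alpha_hat, psi_hat = multiplicative_smm_gmm(y, x, z)
print(f"alpha = {alpha_hat:.3f}, exp(psi) = {np.exp(psi_hat):.3f}")
```

With a binary instrument the model is just-identified, so the weight matrix does not affect the solution. The printed risk ratio can be compared with the Mult. SMM entry in the Application 1 results.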
Local Treatment Effects
For the binary outcome, treatment and instrument case considered
before, NEM does not hold if, for example, the generating process is a
bivariate probit.
If the NEM assumption does not hold, we can point identify (weighted)
local causal effects; see e.g. Imbens and Angrist (1994) and Frangakis
and Rubin (2002).
The core conditions for local estimation can be written as:
1. Independence of all potential outcomes and IV: $\{X(z), Y(z, x)\} \perp\!\!\!\perp Z$.
2. Exclusion restriction: $Y(z, x) = Y(x)$
3. Causal effect of IV on exposure: $E[X(z)]$ is a nontrivial function
of $z$.
Then, if the selection model is monotonic, such that
$$X(z) \ge X(z') \text{ if } z \ge z'$$
or vice versa, then the additive SMM identifies the LATE
$$E[Y(1) - Y(0) \mid X(1) > X(0)]$$
and the multiplicative SMM identifies the LRR
$$\frac{E[Y(1) \mid X(1) > X(0)]}{E[Y(0) \mid X(1) > X(0)]}$$
With a multivalued instrument, the SMMs identify a weighted LATE
(Angrist and Imbens) and a weighted LRR.
For example, let the values for $Z$, $\{0, 1, 2, \ldots, K\}$, be ordered such that
$E[YX \mid Z = k] > E[YX \mid Z = k - 1]$, then for the one-step GMM
estimator:
$$e^{\psi_z} = \sum_{k=1}^{K} \tau_k e^{\psi_{k,k-1}}$$
where $e^{\psi_{k,k-1}}$ is the LRR for the subgroup with values for $Z = k$ and
$Z = k - 1$.
As an example, consider an instrument that takes the values
$Z \in \{0, 1, 2, 3\}$, with $Y$ and $X$ generated from a bivariate normal
distribution as
$$X = 1\{c_0 + c_1 Z_1 + c_2 Z_2 + c_3 Z_3 + V > 0\}$$
$$Y = 1\{b_0 + b_1 X + U > 0\}$$
$$\begin{pmatrix} U \\ V \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right)$$
where $Z_j = 1\{Z = j\}$.
The parameters are such that
$$\mathrm{LRR}_{1,0} = e^{\psi_{1,0}} = \frac{E[Y(1) \mid X(1) > X(0)]}{E[Y(0) \mid X(1) > X(0)]} = 1.1585$$
$$\mathrm{LRR}_{2,1} = e^{\psi_{2,1}} = \frac{E[Y(1) \mid X(2) > X(1)]}{E[Y(0) \mid X(2) > X(1)]} = 1.3227$$
$$\mathrm{LRR}_{3,2} = e^{\psi_{3,2}} = \frac{E[Y(1) \mid X(3) > X(2)]}{E[Y(0) \mid X(3) > X(2)]} = 1.5303$$
And the population values of the $\tau_k$ are given by
$$\tau_1 = 0.3725; \quad \tau_2 = 0.3991; \quad \tau_3 = 0.2285$$
The one-step GMM estimator will thus be an estimate of the weighted
average $\tau_1 \mathrm{LRR}_{1,0} + \tau_2 \mathrm{LRR}_{2,1} + \tau_3 \mathrm{LRR}_{3,2} = 1.3090$.
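The weighted average can be checked directly from the reported population values. A minimal sketch; the variable names are illustrative:

```python
# Local risk ratios LRR_{1,0}, LRR_{2,1}, LRR_{3,2} and their weights,
# copied from the population values stated above.
lrr = [1.1585, 1.3227, 1.5303]
weights = [0.3725, 0.3991, 0.2285]

weighted_lrr = sum(w * r for w, r in zip(weights, lrr))
print(f"sum of weights = {sum(weights):.4f}")
print(f"weighted LRR   = {weighted_lrr:.4f}")
```

The result agrees with the stated 1.3090 up to rounding of the four-decimal inputs.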
The table presents some estimation results confirming this:

          $e^{\psi_{1,0}}$   $e^{\psi_{2,1}}$   $e^{\psi_{3,2}}$   $e^{\psi}$   $\tau_1$   $\tau_2$   $\tau_3$
Mean      1.164              1.330              1.542              1.311        0.373      0.399      0.228
St Dev    0.094              0.121              0.160              0.038        0.027      0.032      0.022

10,000 MC replications. Sample size 40,000.
Further, using the two-step GMM results, Hansen’s J-test rejects the
null 47% of the time at the 5% level, therefore clearly having power to
reject this violation of the NEM assumption.
Application 1
Ten Have, Joffe and Cary (2003). Randomized placebo-controlled trial
involving 266 African-American adults aged between 40 and 70 who
had high cholesterol and/or hypertension. The treatment $X$ is an
intervention in which patients were supplied with an audio tape containing advice
about good dietary behaviour for lowering cholesterol (with noncompliance).
The instrument $Z$ is randomisation; the outcome $Y$ is a binary indicator for
lower cholesterol.
Randomisation Z    Selection X       Outcome Y          Outcome Y          Total
                                     Positive (1)       Negative (0)
Usual Care (0)     Usual Care (0)    33                 99                 132
Usual Care (0)     Tape (1)          0                  0                  0
Tape (1)           Usual Care (0)    9                  20                 29
Tape (1)           Tape (1)          40                 65                 105
Total                                82                 184                266
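The intention-to-treat contrast, the first-stage difference and their ratio (the Wald estimand) follow directly from the cell counts. A minimal sketch; the helper name `mean_given_z` is illustrative only:

```python
# Cell counts: (z, x) -> (number with Y = 1, number with Y = 0).
counts = {(0, 0): (33, 99), (0, 1): (0, 0), (1, 0): (9, 20), (1, 1): (40, 65)}

def mean_given_z(z, what):
    """Sample mean of Y (what='y') or of X (what='x') in arm Z = z."""
    n_arm = sum(sum(counts[(z, x)]) for x in (0, 1))
    if what == "y":
        total = sum(counts[(z, x)][0] for x in (0, 1))  # number with Y = 1
    else:
        total = sum(counts[(z, 1)])                     # number with X = 1
    return total / n_arm

itt = mean_given_z(1, "y") - mean_given_z(0, "y")
first_stage = mean_given_z(1, "x") - mean_given_z(0, "x")
wald = itt / first_stage

print(f"ITT = {itt:.3f}, first stage = {first_stage:.3f}, Wald = {wald:.3f}")
```

Nobody randomised to usual care received the tape, so noncompliance is one-sided and the first stage equals the share of the tape arm that took the treatment.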
Estimator                      Target parameter                          Estimate

First-Stage Model              $E[X \mid Z=1] - E[X \mid Z=0]$           0.784 (0.713, 0.851)
Intention to treat             $E[Y \mid Z=1] - E[Y \mid Z=0]$           0.116 (0.007, 0.226)

Ignoring Selection
  Linear/Logistic/Probit       ATE                                       0.120 (0.003, 0.234)
                               CRR                                       1.460 (1.010, 2.113)
                               COR                                       1.744 (1.012, 2.976)

Bounds
  Balke-Pearl                  ATE                                       0.049-0.265
                               CRR                                       1.194-2.060
                               COR                                       1.277-3.185
  Chesher                      ATE                                       0.116-0.265
                               CRR                                       1.463-2.060
                               COR                                       1.729-3.185

Fully parametric
  IV probit                    ATE                                       0.151 (0.009, 0.296)
                               CRR                                       1.603 (1.027, 2.540)
                               COR                                       2.007 (1.038, 3.973)

Semiparametric
  2SLS                         LATE/ATT                                  0.148 (0.006, 0.285)
  Mult. SMM                    LRR/CRRT                                  1.633 (1.025, 3.122)
  Logistic SMM                 CORT                                      2.022 (1.028, 3.862)
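As a rough check on the bounds rows, the simple intersection bounds on $E[Y(x)]$ can be computed from the trial counts. This sketch is not an implementation of the Balke-Pearl or Chesher procedures. It only assumes $E[Y(x) \mid Z] = E[Y(x)]$ and that the unobserved potential outcome lies in [0, 1]; the function names are hypothetical.

```python
# Cell counts: (z, x) -> (number with Y = 1, number with Y = 0).
counts = {(0, 0): (33, 99), (0, 1): (0, 0), (1, 0): (9, 20), (1, 1): (40, 65)}

def bounds_mean_potential_outcome(x):
    """Bounds on E[Y(x)], intersecting the per-arm bounds over Z."""
    lows, highs = [], []
    for z in (0, 1):
        n_z = sum(sum(counts[(z, k)]) for k in (0, 1))
        p_y1_and_x = counts[(z, x)][0] / n_z        # P(Y=1, X=x | Z=z)
        p_other_x = sum(counts[(z, 1 - x)]) / n_z   # P(X != x | Z=z)
        lows.append(p_y1_and_x)
        highs.append(p_y1_and_x + p_other_x)
    return max(lows), min(highs)

def odds(p):
    return p / (1 - p)

lo1, hi1 = bounds_mean_potential_outcome(1)
lo0, hi0 = bounds_mean_potential_outcome(0)

ate_bounds = (lo1 - hi0, hi1 - lo0)
crr_bounds = (lo1 / hi0, hi1 / lo0)
cor_bounds = (odds(lo1) / odds(hi0), odds(hi1) / odds(lo0))

for name, (lo, hi) in [("ATE", ate_bounds), ("CRR", crr_bounds), ("COR", cor_bounds)]:
    print(f"{name}: {lo:.3f} to {hi:.3f}")
```

Because nobody in the usual-care arm took the tape, $E[Y(0)]$ is pinned down at 0.25 and only $E[Y(1)]$ is interval-valued. For these counts the resulting intervals coincide with the Balke-Pearl row; that need not hold for other data.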
Application 2
We apply the SMM estimation procedures described above to estimate
the causal effect of adiposity on hypertension as in Timpson et al.
(2010), using genetic markers as instruments for adiposity. The data are
from the Copenhagen General Population Study and the full details of
the variable definitions and selection criteria are described in Timpson
et al. (2010).
The outcome variable is whether an individual has hypertension,
defined as a systolic blood pressure of > 140 mmHg, diastolic blood
pressure of > 90 mmHg, or the taking of antihypertensive drugs. The
intermediate adiposity phenotype is being overweight, defined as having
a BMI > 25.
We use genetic markers as instruments for being overweight.
Davey Smith, G. et al. BMJ 2005;330:1076-1079
Suitable and robust genetic instrument
We use two SNPs that have been consistently shown to relate to weight
• Frayling et al. (2007) use 38,759 individuals aged 7-80 from 13
different cohorts of European ancestry.
They find a positive association between FTO and all measures of
weight:
• for individuals in all cohorts
• in all countries
• of all ages and
• of both sexes, with no difference between males and females
No association with birth weight or height
Each copy of the FTO risk allele increases weight by 0.8-1 kg
• Similar, though slightly smaller associations are found for MC4R
using 77,228 adults and 5,988 children (Loos et al., 2008)
Behaviours affected by genotype
E.g. if a mother carries the “fat” alleles, this may have affected her
behaviour.
Mechanisms
It is often unknown how the genes affect the phenotype. Recent
studies show the effect of FTO on appetite and hence diet.
Assortative Mating
Hardy-Weinberg Equilibrium
Test for patterns in observable characteristics between hetero- and
homozygotes
Population Stratification
E.g. ethnicity.
Linkage Disequilibrium
Some variants of different genes are co-inherited
The degree of linkage is a function of the distance between the loci
Pleiotropy
Single genetic marker has multiple phenotypic effects
Gene environment interaction, epigenetics
DAGs for Linkage Disequilibrium
[Figure: two DAGs, labelled “OK” and “Not OK”, each with nodes Z: G1, G2, A, Y and u]
Estimation Results, effect of being overweight on hypertension
SMM              Estimator   Estimate            J-test p
Linear           OLS         0.2009 (0.0039)
                 2SLS        0.2091 (0.0819)
                 GMM2        0.2094 (0.0814)     0.2965
Multiplicative   Gamma       0.2974 (0.0063)
                 GMM1        0.3090 (0.1192)
                 GMM2        0.3104 (0.1192)     0.3071
Logistic         Logit       0.9487 (0.0189)
                 GMM1        1.0409 (0.4220)
                 GMM2        1.0528 (0.4217)     0.2924
Sample size 55,523
We can use the same GMM format to estimate the logistic SMM with a
continuous exposure X .
With a continuous exposure, parametric assumptions have to be made in
order to identify causal parameters. Following Vansteelandt and
Goetghebeur (2003) and Vansteelandt et al. (2010), we impose that the
exposure effect is linear in the exposure on the odds ratio scale and
independent of the instrumental variable:
$$\frac{\operatorname{odds}(Y = 1 \mid X, Z)}{\operatorname{odds}(Y(0) = 1 \mid X, Z)} = \exp(\psi_0 X)$$
Exposure      Estimate            J-test p
BMI           0.1122 (0.0384)     0.4714
ln BMI        0.3035 (0.1069)     0.4828
ln RELBMI     0.2879 (0.1016)     0.5004
Application 3
Child weight and academic performance:
• Causal effect from weight to educational outcomes, e.g.
  • Overweight children experience higher absenteeism in school
  • Overweight children are more likely to have sleep problems
  • Overweight children may be treated differently by peers and teachers
• Reverse causation, e.g.
  • Poor school outcomes may cause obesity
• Association driven by other unobserved factors that affect both
  weight and academic outcomes, e.g.
  • Time discount rates
Data
Avon Longitudinal Study of Parents and Children (ALSPAC)
Mothers with expected delivery date between 01/04/91 and 31/12/92
Approx. 12,000 pregnancies; genotypes for 7,700 children
Detailed information from variety of sources
In-depth interviews, questionnaires, medical & school records,
etc.
As not all children attended special clinics where measurements
were taken, final sample sizes drop to 3,500
Outcome: nationally set KS3 exam (age 14, standardised)
Child body size:
Direct measure of child fat mass (DXA scan, age 11,
standardised)
Contextual variables:
Mother’s pre-pregnancy BMI
Birth weight, breastfed, age (in months), non-white, hh
composition
Family income (sq), social class, employment status, mother’s
and grandparents’ education, lone parenthood, local area
deprivation (IMD)
Maternal health and behaviour:
Smoking/drinking during pregnancy, mother’s age at birth
Mother’s locus-of-control, EPDS, CCEI
Parental investment in child: teaching scores, activity scores
Estimation results:
OLS:
                                 (1)          (2)          (3)
                                 KS3          KS3          KS3
DXA, age 11                      -0.099***    -0.052***    -0.040***
                                 (0.015)      (0.014)      (0.014)
R-squared                        0.01         0.26         0.30
Number of observations           3729         3729         3729
Contextual variables                          Yes          Yes
Mother’s health and behaviour                              Yes
First Stage:
                                 (1)          (2)          (3)
                                 DXA          DXA          DXA
FTO                              1.49***      1.40***      1.43***
                                 (0.24)       (0.23)       (0.23)
MC4R                             0.768**      0.890**      0.898**
                                 (0.28)       (0.27)       (0.27)
IV strength, F-statistic         22.8         22.7         23.5
Contextual variables                          Yes          Yes
Mother’s health and behaviour                              Yes
IV:
                                 (1)          (2)          (3)          (4)
                                 OLS, KS3     IV, KS3      IV, KS3      IV, KS3
DXA, age 11                      -0.040***    0.137        0.098        0.115
                                 (0.014)      (0.143)      (0.132)      (0.131)
Contextual variables             Yes          No           Yes          Yes
Mother’s health and behaviour    Yes          No           No           Yes
OLS shows that heavier children perform worse in school tests
compared to their leaner counterparts
Using the genetic markers, there is no evidence of fat mass affecting
outcomes
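The width of the IV interval shows why the data say little either way. The sketch below assumes that the bracketed values in the tables are standard errors and uses a normal approximation with critical value 1.96. Both are assumptions; the slides do not state them.

```python
# Point estimates and (assumed) standard errors for the DXA coefficient,
# taken from the OLS column and the IV column with full controls.
estimates = {"OLS": (-0.040, 0.014), "IV": (0.115, 0.131)}

intervals = {}
for name, (b, se) in estimates.items():
    intervals[name] = (b - 1.96 * se, b + 1.96 * se)
    lo, hi = intervals[name]
    print(f"{name}: {b:+.3f}, approx. 95% interval ({lo:+.3f}, {hi:+.3f})")

# The IV interval is 0.131 / 0.014, roughly nine times, wider than the OLS one.
width_ratio = estimates["IV"][1] / estimates["OLS"][1]
print(f"ratio of standard errors = {width_ratio:.1f}")
```

Under these assumptions the IV interval covers both zero and the OLS point estimate, so the IV results neither confirm nor rule out the OLS association.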
Mendelian randomization:
Strength of instruments
Even with valid instruments, need to recognise limitations
• Power
• Large standard errors
• We need larger sample sizes, more variants, or markers with
larger effects