17.802 – Recitation 1 - Potential Outcomes Model

17.802 – Recitation 1
Potential Outcomes Model
Yiqing Xu
MIT
Feb 7, 2014
1
Potential Outcomes Framework
2
Holland (1986)
3
Problem Set 1
Notations
Potential outcomes: {Y0i , Y1i }
Realization: Yi
Another way to write it: Yi = Di · Y1i + (1 − Di ) · Y0i
Y1i if Di = 1
Meaning Yi =
Y0i if Di = 0
Causal effect: αi = Y1i − Y0i
We only observe Y0i or Y1i
Yiqing Xu (MIT)
Rec1
Feb 7, 2014
1 / 14
Stable Unit Treatment Value Assumption (SUTVA)
The impact of treating a unit does NOT depend on how the
treatments assigned
SUTVA assumes:
No interference: the treatment status of one unit does not affect the
potential outcomes of another
No variation in treatment: the treatments for all units are comparable
Potential outcomes are thought to be fixed for each individual
Violations:
Vaccination (interference)
Fertilizer A and B on crop yield, but each fertilizer has a lot of versions
(variation in treatment)
Hawthorne effect: people change their behavior if they know they are
being observed (variation in treatment)
Yiqing Xu (MIT)
Rec1
Feb 7, 2014
2 / 14
The case of interference
Potential outcomes
Y1i (D) =
Y1i (1, 1)
Y1i (1, 0)
Y0i (D) =
Y0i (0, 1)
Y0i (0, 0)
Causal effect(s)

Y1i (1, 1) − Y0i (0, 0)




Y1i (1, 1) − Y0i (0, 1)



Y1i (1, 0) − Y0i (0, 0)
αi (D) =
Y1i (1, 0) − Y0i (0, 1)





Y (1, 1) − Y1i (1, 0)

 1i
Y0i (0, 1) − Y0i (0, 0)
Yiqing Xu (MIT)
Rec1
Feb 7, 2014
3 / 14
ATE and ATT
Average treatment effect
αATE = IE[Y1i − Y0i ]
Average treatment effect on the treated
αATT = IE[Y1i − Y0i |Di = 1]
Average treatment effect on the controls
αATC = IE[Y1i − Y0i |Di = 0]
Yiqing Xu (MIT)
Rec1
Feb 7, 2014
4 / 14
Selection Bias
Math
IE[Y |D = 1] − IE[Yi |Di = 0] = IE[Y1i |Di = 1] − IE[Y0i |Di = 0]
= IE[Y1i − Y0i |Di = 1] + {IE[Yi0 |Di = 1] − IE[Y0i |Di = 0]}
|
{z
} |
{z
}
ATT
BIAS
Example: the effect of MIT education
Yiqing Xu (MIT)
Rec1
Feb 7, 2014
5 / 14
1
Potential Outcomes Framework
2
Holland (1986)
3
Problem Set 1
Causality in “More Easy Physical Sciences”
“When signal is not faster than light (assumption), a Lorentz
transformation of reference frame such that event A (“cause”)
and B (“effect”) is simultaneous is not possible.”
Causality is possible when the speed of light is constant
Yiqing Xu (MIT)
Rec1
Feb 7, 2014
6 / 14
Rubin’s Model
Causality is about time, but Rubin points out:
The effect of a cause is always relative to another cause
{Y0i , Y1i }
The treatment causes the effect {Y1i − Y0i }
Fundamental problem of causal inference:
i.e., cannot observe both Y0i and Y1i
Two solutions: scientific and statistical
A scientific solution exploits homogeneity or invariance assumptions
e.g., a rock is a rock
A statistical solution makes use of the population
e.g., T = E (Y1 ) − E (Y0 )
Yiqing Xu (MIT)
Rec1
Feb 7, 2014
7 / 14
Special Cases
Temporal stability: sequential exposure of i to control and treatment
Unit homogeneity: perfect lab experiment, Y0i = Y0j , Y1i = Y1j
Independence
Constant effect
Yiqing Xu (MIT)
Rec1
Feb 7, 2014
8 / 14
What Can be a Cause?
(A) She did well on the exam because she is a woman.
(B) She did well on the exam because she studied for it.
(C) She did well on the exam because she was coached by her teacher.
(C): Had she not been coached by her teacher? Check
(A): Had she not been a woman?
Kempthorne (1978): “It is epistemological nonsense to talk about one trait of an
individual causing or determining another trait of the individual.”
(B): Was she studying for it voluntary?
The general problem is in deciding when it is an attribute of units and
when it is a cause that can act on units
Yiqing Xu (MIT)
Rec1
Feb 7, 2014
9 / 14
Messages in Holland (1986)
Study the effect of a cause rather than cause of a given effect
Effects of causes are relative to other causes
Attributes cannot be causes
Rubin disagrees: attributes can be causes under Fisher’s Null Hypothesis
(then SUTVA is automatically satisfied)
Yiqing Xu (MIT)
Rec1
Feb 7, 2014
10 / 14
Comments and Responses
Rubin’s comment (Important)
Always think about SUTVA carefully
No causation without manipulation
Cox’s comment
The distinction between quasitreatment and intrinsic variables is a
matter of viewpoint
Different “layers” of interpretation, e.g., turning a light switch
Glymour’s comment
The counterfactual needs to be well-defined
Some of the restrictions are unwarranted
Granger’s comment
Experimental framework may not be relevant for testing temporal
causality
Yiqing Xu (MIT)
Rec1
Feb 7, 2014
11 / 14
1
Potential Outcomes Framework
2
Holland (1986)
3
Problem Set 1
Question 1c
A collection of fixed numbers {y1 , y2 , · · · , yN } of unknown distribution
Each has an indicator {z1 , z2 , · · · , zN }, zi ∈ {0, 1}
P
We are interested in µy = N1 N
i=1 yi , which is a also number
But we can only observe {z1 y1 , z2 y2 , · · · , zN yN }
What is our best shot?
What about ỹi =
PN
zi yi
Pi=1
?
N
i=1 zi
2
var (ỹ ) = (1 −
Yiqing Xu (MIT)
n σy
N) n
Rec1
Feb 7, 2014
12 / 14
0.0
0.2
0.4
var.y
0.6
0.8
1.0
var (ỹ )
0
10
20
30
40
50
n
Yiqing Xu (MIT)
Rec1
Feb 7, 2014
13 / 14
Question 1c
The variance of ỹ is:
var (ỹ ) = var
N
1X
n
i=1
!
zi yi


N
N X
X
X
1
= 2
yi2 var (zi ) +
yi yj cov (zi , zj )
n
i=1
i=1 j6=i
Show that:
−n(N − n)
and
N 2 (N − 1)


N
N
1 n
n X 2
1 XX
var (ỹ ) = 2 (1 − )
yi −
yi yj 
n N
N
N −1
cov (zi , zj ) =
i=1
Yiqing Xu (MIT)
Rec1
i=1 j6=i
Feb 7, 2014
14 / 14