WHAT'S NEW IN
CAUSAL INFERENCE:
From Propensity Scores And
Mediation To External Validity
And Selection Bias
Judea Pearl
UCLA
(www.cs.ucla.edu/~judea/)
1
OUTLINE
1. Unified conceptualization of counterfactuals,
structural-equations, and graphs
2. Propensity scores demystified
3. Direct and indirect effects (Mediation)
4. External validity mathematized
2
TRADITIONAL STATISTICAL
INFERENCE PARADIGM
Data → Joint Distribution P → Q(P) (Aspects of P)
Inference
e.g.,
Infer whether customers who bought product A
would also buy product B.
Q = P(B | A)
3
THE STRUCTURAL MODEL
PARADIGM
Data-Generating Model M → Joint Distribution → Data
Inference: Q(M) (Aspects of M)
M – Invariant strategy (mechanism, recipe, law,
protocol) by which Nature assigns values to variables
in the analysis.
• “Think Nature, not experiment!”
4
FAMILIAR CAUSAL MODEL
ORACLE FOR MANIPULATION
[Figure: a circuit diagram with INPUT and OUTPUT, over variables X, Y, Z]
5
STRUCTURAL
CAUSAL MODELS
Definition: A structural causal model is a 4-tuple
⟨V, U, F, P(u)⟩, where
• V = {V1,...,Vn} are endogenous variables
• U = {U1,...,Um} are background variables
• F = {f1,..., fn} are functions determining V,
  vi = fi(v, u);  e.g., y = βx + u_Y
• P(u) is a distribution over U
P(u) and F induce a distribution P(v) over observable
variables
6
CAUSAL MODELS AND COUNTERFACTUALS
Definition:
The sentence: “Y would be y (in situation u), had X been x,”
denoted Yx(u) = y, means:
The solution for Y in a mutilated model Mx, (i.e., the equations for X
replaced by X = x) with input U=u, is equal to y.
The Fundamental Equation of Counterfactuals:
Y_x(u) = Y_{M_x}(u)
7
READING COUNTERFACTUALS
FROM SEM
[Figure: linear SEM with X → Z (β = 0.5), X → Y (α = 0.7), Z → Y (γ = 0.4), and disturbances ε1, ε2, ε3]
X (Treatment):   x = ε1
Z (Study Time):  z = βx + ε2
Y (Score):       y = αx + γz + ε3
Data shows: α = 0.7, β = 0.5, γ = 0.4
A student named Joe, measured X = 0.5, Z = 1, Y = 1.5
Q1: What would Joe’s score be had he doubled his study time?
8
READING COUNTERFACTUALS
[Figure: Joe’s disturbances, abduced from his data: ε1 = 0.5, ε2 = 0.75, ε3 = 0.75.
Observed world: Z = 1.0, Y = 1.5. Hypothetical world (study time doubled): Z = 2.0, Y = 1.9.]
Q1: What would Joe’s score be had he doubled his study time?
Answer: Joe’s score would be 1.9
Or, in counterfactual notation:
Y_{Z=2}(u) = 1.9
9
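A minimal Python sketch (my illustration, not from the slides) of the three-step counterfactual computation implicit here: abduction (recover Joe's disturbances from his data), action (mutilate the model by fixing Z), and prediction (solve for Y). I take Joe's observed score as Y = 1.5, the value consistent with the figure's disturbances and the answer Y_{Z=2} = 1.9; the function names are mine.

```python
# Fitted SEM from the slide: x = e1, z = 0.5*x + e2, y = 0.7*x + 0.4*z + e3.

def abduct(x, z, y, beta=0.5, alpha=0.7, gamma=0.4):
    """Step 1 (abduction): recover background variables U = (e1, e2, e3)."""
    e1 = x
    e2 = z - beta * x
    e3 = y - alpha * x - gamma * z
    return e1, e2, e3

def predict_y_given_do_z(z_new, u, alpha=0.7, gamma=0.4):
    """Steps 2-3 (action, prediction): solve the mutilated model M_{Z=z_new}
    for Y, with U held fixed at Joe's values."""
    e1, e2, e3 = u
    x = e1                               # X's own equation is untouched
    return alpha * x + gamma * z_new + e3

u = abduct(x=0.5, z=1.0, y=1.5)          # Joe: X=0.5, Z=1, Y=1.5
print(predict_y_given_do_z(2.0, u))      # Y_{Z=2}(u) ≈ 1.9
```

The same three steps work for any structural model; only the equation solving changes.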
READING COUNTERFACTUALS
[Figure: Joe’s disturbances ε1 = 0.5, ε2 = 0.75, ε3 = 0.75]
Q2: What would Joe’s score be, had the treatment been 0 and
had he studied at whatever level he would have studied had
the treatment been 1?
[Figure: With X = 1, study time would be Z_{x=1} = 0.5·1 + 0.75 = 1.25;
holding Z at 1.25 and setting X = 0 gives Y = 0.4·1.25 + 0.75 = 1.25]
Answer: Y_{X=0, Z_{X=1}}(u) = 1.25
10
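A numeric sketch (mine, not from the slides) of the nested counterfactual in Q2, using Joe's background values as read off the figure (ε2 = 0.75, ε3 = 0.75) and the fitted model z = 0.5x + ε2, y = 0.7x + 0.4z + ε3:

```python
beta, alpha, gamma = 0.5, 0.7, 0.4
eps2, eps3 = 0.75, 0.75                    # Joe's abduced disturbances

z_x1 = beta * 1.0 + eps2                   # Z had the treatment been 1
y = alpha * 0.0 + gamma * z_x1 + eps3      # Y with X held at 0, Z held at z_x1
print(z_x1, y)                             # -> 1.25 1.25
```

The nesting shows up in the code as sequencing: first solve the model under X = 1 for Z, then feed that value into the model under X = 0.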
POTENTIAL AND OBSERVED OUTCOMES
PREDICTED BY A STRUCTURAL MODEL
11
CAUSAL MODELS AND COUNTERFACTUALS
Definition:
The sentence: “Y would be y (in situation u), had X been x,”
denoted Yx(u) = y, means:
The solution for Y in a mutilated model Mx, (i.e., the equations for X
replaced by X = x) with input U=u, is equal to y.
• Joint probabilities of counterfactuals:
  P(Y_x = y, Z_w = z) = Σ_{u: Y_x(u)=y, Z_w(u)=z} P(u)
In particular:
  P(y | do(x)) = P(Y_x = y) = Σ_{u: Y_x(u)=y} P(u)
  P(Y_{x'} = y' | x, y) = Σ_{u: Y_{x'}(u)=y'} P(u | x, y)
12
THE FIVE NECESSARY STEPS
OF CAUSAL ANALYSIS
Define:
Express the target quantity Q as a function
Q(M) that can be computed from any model M.
e.g., ATE = E(Y | do(x1)) − E(Y | do(x0))
e.g., ETT = E(Y_{x1} − Y_{x0} | X = x')
Assume:
Formulate causal assumptions A using some formal
language.
Identify:
Determine if Q is identifiable given A.
Estimate:
Estimate Q if it is identifiable; approximate it,
if it is not.
Test:
Test the testable implications of A (if any).
13
THE FIVE NECESSARY STEPS
OF CAUSAL ANALYSIS
Define:
Assume:
Express the target quantity Q as a function
Q(M) that can be computed from any model M.
e.g., DE = E(Y_{x1, Z_{x0}} − Y_{x0}),  IE = E(Y_{x0, Z_{x1}} − Y_{x0})
e.g., PC = P(Y_{x0} = y0 | x1, y1)
0 1
0
Formulate causal assumptions A using some formal
language.
Identify:
Determine if Q is identifiable given A.
Estimate:
Estimate Q if it is identifiable; approximate it,
if it is not.
Test:
Test the testable implications of A (if any).
14
THE LOGIC OF CAUSAL ANALYSIS
CAUSAL
MODEL
(MA)
A - CAUSAL
ASSUMPTIONS
A* - Logical
implications of A
Causal inference
Q Queries of
interest
Q(P) - Identified
estimands
T(MA) - Testable
implications
Statistical inference
Data (D)
Q̂ - Estimates of Q(P)
q(D, A) - Provisional claims
Model testing
g(T) - Goodness of fit
15
IDENTIFICATION IN SCM
Find the effect of X on Y, P(y|do(x)), given the
causal assumptions shown in G, where Z1,..., Zk
are auxiliary variables.
[Figure: graph G over auxiliary variables Z1,...,Z6 surrounding X and Y]
Can P(y|do(x)) be estimated if only a subset, Z,
can be measured?
16
ELIMINATING CONFOUNDING BIAS
THE BACK-DOOR CRITERION
P(y | do(x)) is estimable if there is a set Z of
variables such that Z d-separates X from Y in Gx.
[Figure: graph G and the subgraph Gx (G with the arrows emanating from X removed), each over Z1,...,Z6, X, Y]
Moreover,
P(y | do(x)) = Σ_z P(y | x, z) P(z) = Σ_z P(x, y, z) / P(x | z)
• (“adjusting” for Z)  Ignorability
17
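A self-contained numeric check (illustrative parameters, mine, not from the slides) that the two forms of the adjustment formula agree with each other and differ from naive conditioning:

```python
# Tiny binary model with one back-door path X <- Z -> Y (and X -> Y).
pz = {0: 0.4, 1: 0.6}                                          # P(z)
px1_z = {0: 0.2, 1: 0.8}                                       # P(X=1 | z)
py1_xz = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9}  # P(Y=1 | x, z)

def p_x_given_z(x, z):
    return px1_z[z] if x == 1 else 1 - px1_z[z]

def joint(x, y, z):                                   # observational P(x, y, z)
    py1 = py1_xz[(x, z)]
    return pz[z] * p_x_given_z(x, z) * (py1 if y == 1 else 1 - py1)

# The two forms of the adjustment formula from the slide:
adj1 = sum(py1_xz[(1, z)] * pz[z] for z in pz)               # sum_z P(y|x,z) P(z)
adj2 = sum(joint(1, 1, z) / p_x_given_z(1, z) for z in pz)   # sum_z P(x,y,z)/P(x|z)

# Naive conditioning, biased by the open back-door path:
px1 = sum(joint(1, y, z) for y in (0, 1) for z in pz)        # P(X=1)
naive = sum(joint(1, 1, z) for z in pz) / px1                # P(Y=1 | X=1)

print(adj1, adj2, naive)   # the two adjusted forms agree; naive is larger
```

Since the intervention leaves P(z) intact, adj1 is exactly P(Y=1 | do(X=1)) in this model; the gap to `naive` is the confounding bias.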
EFFECT OF WARM-UP ON INJURY
(After Shrier & Platt, 2008)
[Figure: causal diagram relating Warm-up Exercises (X) to Injury (Y) through several covariates; annotations (“Watch out!”, “No, no!”) warn against adjusting for certain nodes, and a front-door structure is marked]
18
PROPENSITY SCORE ESTIMATOR
(Rosenbaum & Rubin, 1983)
[Figure: graph with covariates Z1,...,Z6 surrounding X and Y; L summarizes {Z1, Z2, Z3, Z4, Z5}]
P(y | do(x)) = ?
Can L replace {Z1, Z2, Z3, Z4, Z5}?
L(z1, z2, z3, z4, z5) = P(X = 1 | z1, z2, z3, z4, z5)
Theorem:
Σ_z P(y | z, x) P(z) = Σ_l P(y | L = l, x) P(L = l)
Adjustment for L replaces Adjustment for Z
19
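An exact check (illustrative numbers, mine) of the theorem on a three-valued covariate whose propensity score collapses two z-values into one stratum:

```python
from collections import defaultdict

pz = {0: 0.2, 1: 0.3, 2: 0.5}                      # P(z)
L  = {0: 0.3, 1: 0.3, 2: 0.7}                      # L(z) = P(X=1 | z); z=0,1 share a score
py = {(1, 0): 0.1, (1, 1): 0.4, (1, 2): 0.8}       # P(Y=1 | X=x, z), here for x = 1

def adjust_z(x=1):                                 # sum_z P(y | z, x) P(z)
    return sum(py[(x, z)] * pz[z] for z in pz)

def adjust_l(x=1):                                 # sum_l P(y | L=l, x) P(L=l)
    strata = defaultdict(list)
    for z, l in L.items():
        strata[l].append(z)
    total = 0.0
    for zs in strata.values():
        pl = sum(pz[z] for z in zs)                # P(L = l)
        # P(x | z) is constant within a stratum, so it cancels in P(y | l, x):
        py_l = sum(py[(x, z)] * pz[z] for z in zs) / pl
        total += py_l * pl
    return total

print(adjust_z(), adjust_l())   # equal, as the theorem promises
```

The cancellation in the comment is the balancing property of the propensity score: within an L-stratum, X is independent of Z.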
WHAT PROPENSITY SCORE (PS)
PRACTITIONERS NEED TO KNOW
L(z) = P(X = 1 | Z = z)
Σ_z P(y | z, x) P(z) = Σ_l P(y | l, x) P(l)
1. The asymptotic bias of PS is EQUAL to that of ordinary
adjustment (for the same Z).
2. Including an additional covariate in the analysis CAN SPOIL
the bias-reduction potential of others.
[Figure: four example graphs in which adding Z helps or spoils the adjustment]
3. In particular, instrumental variables tend to amplify bias.
4. Choosing a sufficient set for PS requires knowledge of the
model.
20
SURPRISING RESULT:
Instrumental variables are Bias-Amplifiers in linear
models (Bhattacharya & Vogt 2007; Wooldridge 2009)
[Figure: Z → X (c3), U → X (c1), U → Y (c2), X → Y (c0); U unobserved]
“Naive” bias:
B0 = (∂/∂x) E(Y | x) − (∂/∂x) E(Y | do(x)) = c0 + c1c2 − c0 = c1c2
Adjusted bias:
Bz = (∂/∂x) E(Y | x, z) − (∂/∂x) E(Y | do(x)) = c0 + c1c2/(1 − c3²) − c0 = c1c2/(1 − c3²)
21
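A simulation sketch (mine, under an assumed standardized linear-Gaussian model, not from the slides) reproducing both bias formulas; adjustment for Z is implemented by residualizing X and Y on Z (Frisch-Waugh):

```python
import random
random.seed(0)

# Assumed model: X = c3*Z + c1*U + e_x, Y = c0*X + c2*U, with Z, U ~ N(0,1)
# independent and Var(X) standardized to 1.
c0, c1, c2, c3 = 1.0, 0.5, 0.8, 0.7
sd_ex = (1 - c3**2 - c1**2) ** 0.5
n = 200_000
Z = [random.gauss(0, 1) for _ in range(n)]
U = [random.gauss(0, 1) for _ in range(n)]
X = [c3*z + c1*u + random.gauss(0, sd_ex) for z, u in zip(Z, U)]
Y = [c0*x + c2*u for x, u in zip(X, U)]

def slope(a, b):                             # OLS slope of b on a
    ma, mb = sum(a)/n, sum(b)/n
    cov = sum((ai - ma)*(bi - mb) for ai, bi in zip(a, b))
    return cov / sum((ai - ma)**2 for ai in a)

def resid(a, b):                             # b residualized on a
    s, ma, mb = slope(a, b), sum(a)/n, sum(b)/n
    return [bi - mb - s*(ai - ma) for ai, bi in zip(a, b)]

naive_bias = slope(X, Y) - c0                        # ~ c1*c2 = 0.4
adj_bias = slope(resid(Z, X), resid(Z, Y)) - c0      # ~ c1*c2/(1-c3^2) ~ 0.784
print(naive_bias, adj_bias)
```

With c3 = 0.7, adjusting for the instrument roughly doubles the bias, matching Bz/B0 = 1/(1 − c3²) ≈ 1.96.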
INTUITION:
When Z is allowed to vary, it absorbs (or explains)
some of the variation in X.
[Figure: Z free to vary]
When Z is fixed, the burden falls on U alone, and is
transmitted to Y (resulting in a higher bias).
[Figure: Z held fixed]
22
WHAT’S BETWEEN AN INSTRUMENT AND
A CONFOUNDER?
Should we adjust for Z?
[Figure: U (unobserved) → X (c1) and → Y (c2); Z → X (c3) and Z → Y (c4); X → Y (c0); T1, T2 mark Z’s two roles]
ANSWER:
Yes, if c4/c3 > c1c2 / (1 − c3²)
No, otherwise
CONCLUSION:
Adjusting for a parent of Y is safer
than a parent of X
23
WHICH SET TO ADJUST FOR
Should we adjust for {T}, {Z}, or {T, Z}?
[Figure: Z affecting X, T affecting Y, and X → Y]
Answer 1: (From bias-amplification considerations)
{T} is better than {T, Z} which is the same as {Z}
Answer 2: (From variance considerations)
{T} is better than {T, Z} which is better than {Z}
24
CONCLUSIONS
• The prevailing practice of adjusting for all
covariates, especially those that are good
predictors of X (the “treatment
assignment,” Rubin, 2009) is totally
misguided.
• The “outcome mechanism” is as
important, and much safer, from both bias
and variance viewpoints
• As X-rays are to the surgeon, graphs are
for causation
25
REGRESSION VS. STRUCTURAL EQUATIONS
(THE CONFUSION OF THE CENTURY)
Regression (claimless, nonfalsifiable):
Y = ax + ε_Y
Structural (empirical, falsifiable):
Y = bx + u_Y
Claim (regardless of distributions):
E(Y | do(x)) = E(Y | do(x), do(z)) = bx
The mothers of all questions:
Q. When would b equal a?
A. When all back-door paths are blocked, (u_Y ⫫ X)
Q. When is b estimable by regression methods?
A. Graphical criteria available
26
TWO PARADIGMS FOR
CAUSAL INFERENCE
Observed: P(X, Y, Z,...)
Conclusions needed: P(Yx=y), P(Xy=x | Z=z)...
How do we connect observables, X,Y,Z,…
to counterfactuals Yx, Xz, Zy,… ?
N-R model:
Counterfactuals are primitives, new variables
Super-distribution: P*(X, Y, ..., Y_x, X_z, ...)
X, Y, Z constrain Y_x, Z_y, ...
Structural model:
Counterfactuals are derived quantities
Subscripts modify the model and distribution:
P(Y_x = y) = P_{M_x}(Y = y)
27
“SUPER” DISTRIBUTION
IN N-R MODEL
X  Y  Z  Y_{x=0}  Y_{x=1}  X_{z=0}  X_{z=1}  X_{y=0}  U
0  0  0     0        1        0        0        0     u1
1  1  1     0        1        0        0        1     u2
0  0  0     1        0        0        1        1     u3  ← inconsistency
1  0  0     1        0        0        1        0     u4

Consistency: x = 0 ⇒ Y_{x=0} = Y;  Y = xY_1 + (1 − x)Y_0
(row u3 violates this: X = 0 but Y_{x=0} ≠ Y)
Defines:
P*(X, Y, Z, ..., Y_x, Z_y, ..., Y_{xz}, Z_{xy}, ...)
e.g., P*(Y_x = y | Z, X_z),  Y_x ⫫ X | Z_y
28
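A small sketch (mine) that mechanically checks the consistency rule on the table's rows as I read them off the extracted slide; the row flagged inconsistent is the one where X = 0 but Y_{x=0} differs from Y:

```python
# Consistency: Y = x*Y1 + (1-x)*Y0, i.e., the observed Y must match the
# potential outcome under the treatment actually received.
rows = {            # u: (X, Y, Y_{x=0}, Y_{x=1})
    'u1': (0, 0, 0, 1),
    'u2': (1, 1, 0, 1),
    'u3': (0, 0, 1, 0),   # flagged as inconsistent on the slide
    'u4': (1, 0, 1, 0),
}

def consistent(x, y, y0, y1):
    return y == x * y1 + (1 - x) * y0

for u, r in rows.items():
    print(u, consistent(*r))   # u3 -> False, all others True
```

In the N-R paradigm this constraint must be imposed on the super-distribution by hand; in SCM it holds automatically, which is the point of the next slide.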
ARE THE TWO
PARADIGMS EQUIVALENT?
• Yes (Galles and Pearl, 1998; Halpern 1998)
• In the N-R paradigm, Yx is defined by
consistency:
Y = xY_1 + (1 − x)Y_0
• In SCM, consistency is a theorem.
• Moreover, a theorem in one approach is a
theorem in the other.
• Difference: Clarity of assumptions and their
implications
29
AXIOMS OF STRUCTURAL
COUNTERFACTUALS
Yx(u)=y: Y would be y, had X been x (in state U = u)
(Galles, Pearl, Halpern, 1998):
1. Definiteness
   ∃ x ∈ X s.t. X_y(u) = x
2. Uniqueness
   (X_y(u) = x) & (X_y(u) = x') ⇒ x = x'
3. Effectiveness
   X_{xw}(u) = x
4. Composition (generalized consistency)
   X_w(u) = x ⇒ Y_{wx}(u) = Y_w(u)
5. Reversibility
   (Y_{xw}(u) = y) & (W_{xy}(u) = w) ⇒ Y_x(u) = y
30
FORMULATING ASSUMPTIONS
THREE LANGUAGES
1. English: Smoking (X), Cancer (Y), Tar (Z), Genotypes (U)
[Figure: X → Z → Y, with latent U affecting X and Y]
2. Counterfactuals:
   Z_x(u) = Z_{yx}(u)
   X_y(u) = X_{zy}(u) = X_z(u) = X(u)
   Y_z(u) = Y_{zx}(u)
   Z_x ⫫ {Y_z, X}
3. Structural:
   x = f1(u, ε1)
   z = f2(x, ε2)
   y = f3(z, u, ε3)
31
COMPARISON BETWEEN THE
N-R AND SCM LANGUAGES
1. Expressing scientific knowledge
2. Recognizing the testable implications of one's
assumptions
3. Locating instrumental variables in a system of
equations
4. Deciding if two models are equivalent or nested
5. Deciding if two counterfactuals are independent
given another
6. Algebraic derivations of identifiable estimands
32
GRAPHICAL – COUNTERFACTUALS
SYMBIOSIS
Every causal graph expresses counterfactual
assumptions, e.g., X → Y → Z:
1. Missing arrows (e.g., no arrow Z → Y):  Y_{x,z}(u) = Y_x(u)
2. Missing arcs (no hidden common cause of Y and Z):  Y_x ⫫ Z_y
These assumptions are consistent, and readable from the graph.
• Express assumptions in graphs
• Derive estimands by graphical or algebraic
methods
33
EFFECT DECOMPOSITION
(direct vs. indirect effects)
1. Why decompose effects?
2. What is the definition of direct and indirect effects?
3. What are the policy implications of direct and indirect
effects?
4. When can direct and indirect effects be estimated
consistently from experimental and nonexperimental
data?
34
WHY DECOMPOSE EFFECTS?
1. To understand how Nature works
2. To comply with legal requirements
3. To predict the effects of new types of interventions:
signal routing, rather than variable fixing
35
LEGAL IMPLICATIONS
OF DIRECT EFFECT
Can data prove an employer guilty of hiring discrimination?
(Gender) X → Z (Qualifications) → Y (Hiring)
What is the direct effect of X on Y?
CDE = E(Y | do(x1), do(z)) − E(Y | do(x0), do(z))
(averaged over z)
Adjust for Z? No! No!
36
FISHER’S GRAVE MISTAKE
(after Rubin, 2005)
What is the direct effect of treatment on yield?
(Soil treatment) X → Z (Plant density) → Y (Yield); a latent factor affects both Z and Y
Compare treated and untreated lots of the same density, Z = z:
E(Y | do(x1), do(z)) − E(Y | do(x0), do(z))
No! No! Proposed solution (?): “Principal strata”
37
NATURAL INTERPRETATION OF
AVERAGE DIRECT EFFECTS
Robins and Greenland (1992) – “Pure”
[Figure: X → Z → Y and X → Y, with z = f(x, u), y = g(x, z, u)]
Natural Direct Effect of X on Y, DE(x0, x1; Y):
The expected change in Y, when we change X from x0 to x1 and,
for each u, we keep Z constant at whatever value it attained
before the change:
DE(x0, x1; Y) = E[Y_{x1 Z_{x0}} − Y_{x0}]
In linear models, the Natural Direct Effect equals the direct path
coefficient times (x1 − x0).
38
DEFINITION AND IDENTIFICATION OF
NESTED COUNTERFACTUALS
Consider the quantity Q = E_u[Y_{x Z_{x*}(u)}(u)]
Given M and P(u), Q is well defined:
Given u, Z_{x*}(u) is the solution for Z in M_{x*}; call it z.
Y_{x Z_{x*}(u)}(u) is then the solution for Y in M_{xz}.
Can Q be estimated from experimental / nonexperimental data?
Experimental: requires a nest-free expression.
Nonexperimental: requires a subscript-free expression.
39
DEFINITION OF
INDIRECT EFFECTS
[Figure: X → Z → Y and X → Y, with z = f(x, u), y = g(x, z, u)]
No Controlled Indirect Effect.
Indirect Effect of X on Y, IE(x0, x1; Y):
The expected change in Y when we keep X constant, say at x0,
and let Z change to whatever value it would have attained had
X changed to x1:
IE(x0, x1; Y) = E[Y_{x0 Z_{x1}} − Y_{x0}]
In linear models, IE = TE - DE
40
POLICY IMPLICATIONS
OF INDIRECT EFFECTS
What is the indirect effect of X on Y?
The effect of Gender on Hiring if sex discrimination
is eliminated.
GENDER X
IGNORE
Z QUALIFICATION
f
Y HIRING
Deactivating a link – a new type of intervention
41
MEDIATION FORMULAS
1. The natural direct and indirect effects are identifiable in
Markovian models (no confounding),
2. And are given by:
DE = Σ_z [E(Y | do(x1, z)) − E(Y | do(x0, z))] P(z | do(x0))
IE = Σ_z E(Y | do(x0, z)) [P(z | do(x1)) − P(z | do(x0))]
TE = DE − IE(rev)
3. Applicable to linear and non-linear models, continuous
and discrete variables, regardless of distributional form.
42
WHY TE ≠ DE + IE
[Figure: X → Z (m1), Z → Y (m2), X → Y (β), with interaction g·xz]
z = m1·x + ε1
y = β·x + m2·z + g·x·z + ε2
In linear systems:
TE = β + m1·m2,  DE = β,  IE = m1·m2
TE = DE + IE
Linear + interaction:
TE = β + m1·(m2 + g),  DE = β,  IE = m1·m2
TE − DE − IE = m1·g,  so TE ≠ DE + IE
43
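A numeric sketch (illustrative coefficients, mine) of the slide's point: once an interaction term is present, the gap TE − DE − IE equals m1·g, so the effects no longer add up.

```python
# Mediation model with interaction: z = m1*x, y = b*x + m2*z + g*x*z
# (disturbances set to 0 for clarity; coefficients are made up).
m1, m2, b, g = 2.0, 3.0, 1.0, 0.5

def z(x):     return m1 * x
def y(x, z_): return b * x + m2 * z_ + g * x * z_

x0, x1 = 0.0, 1.0
TE = y(x1, z(x1)) - y(x0, z(x0))   # total effect
DE = y(x1, z(x0)) - y(x0, z(x0))   # natural direct effect: Z held at z(x0)
IE = y(x0, z(x1)) - y(x0, z(x0))   # natural indirect effect: X held at x0
print(TE, DE, IE, TE - DE - IE)    # -> 8.0 1.0 6.0 1.0  (gap = m1*g)
```

Setting g = 0 makes the gap vanish, recovering the purely linear case TE = DE + IE.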
MEDIATION FORMULAS
IN UNCONFOUNDED MODELS
[Figure: X → Z → Y and X → Y, no confounders]
DE = Σ_z [E(Y | x1, z) − E(Y | x0, z)] P(z | x0)
IE = Σ_z E(Y | x0, z) [P(z | x1) − P(z | x0)]
TE = E(Y | x1) − E(Y | x0)
IE / TE = fraction of responses explained by mediation
(TE − DE) / TE = fraction of responses owed to mediation
44
WHY TE − DE ≠ IE
[Figure: X → Z (m1), Z → Y (m2), and X → Y]
TE = DE − IE(rev)
In linear systems: IE(rev) = −IE, so TE = DE + IE and IE = TE − DE.
In general:
IE = effect sustained by mediation alone (disabling the direct path)
is NOT equal to
TE − DE = effect prevented by disabling mediation
45
MEDIATION FORMULA
FOR BINARY VARIABLES
[Figure: X → Z → Y and X → Y]

Cell counts n1,...,n8:

X  Z  Y   count
0  0  0    n1
0  0  1    n2
0  1  0    n3
0  1  1    n4
1  0  0    n5
1  0  1    n6
1  1  0    n7
1  1  1    n8

E(Y | x, z) = g_xz:
g00 = n2 / (n1 + n2),  g01 = n4 / (n3 + n4)
g10 = n6 / (n5 + n6),  g11 = n8 / (n7 + n8)
E(Z | x) = h_x:
h0 = (n3 + n4) / (n1 + n2 + n3 + n4)
h1 = (n7 + n8) / (n5 + n6 + n7 + n8)

DE = (g10 − g00)(1 − h0) + (g11 − g01) h0
IE = (h1 − h0)(g01 − g00)
46
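A direct evaluation (mine, on made-up cell counts) of the binary Mediation Formula above:

```python
# Hypothetical cell counts, keyed by (x, z, y).
n = {(0, 0, 0): 20, (0, 0, 1): 10,
     (0, 1, 0): 5,  (0, 1, 1): 15,
     (1, 0, 0): 10, (1, 0, 1): 10,
     (1, 1, 0): 4,  (1, 1, 1): 36}

def g(x, z):   # E(Y | x, z) = g_xz
    return n[(x, z, 1)] / (n[(x, z, 0)] + n[(x, z, 1)])

def h(x):      # E(Z | x) = h_x
    tot = sum(n[(x, z, y)] for z in (0, 1) for y in (0, 1))
    return sum(n[(x, 1, y)] for y in (0, 1)) / tot

DE = (g(1, 0) - g(0, 0)) * (1 - h(0)) + (g(1, 1) - g(0, 1)) * h(0)
IE = (h(1) - h(0)) * (g(0, 1) - g(0, 0))
print(DE, IE)   # for these counts, DE = 0.16 and IE = 1/9
```

Everything needed is observational: conditional means of Y and Z, no counterfactual data.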
RAMIFICATION OF THE
MEDIATION FORMULA
• DE should be averaged over mediator levels,
IE should NOT be averaged over exposure levels.
• TE-DE need not equal IE
TE-DE = proportion for whom mediation is necessary
IE = proportion for whom mediation is sufficient
• TE−DE informs interventions on indirect pathways;
IE informs interventions on direct pathways.
47
TRANSPORTABILITY -- WHEN CAN
WE EXTRAPOLATE EXPERIMENTAL FINDINGS TO
DIFFERENT POPULATIONS?
Z = age
[Figure: X → Y with Z → Y, in both populations]
Experimental study in LA. Measured: P(y | do(x), z), P(z)
Observational study in NYC. Measured: P*(x, y, z)
Problem: We find P(z) ≠ P*(z)
(LA population is younger)
What can we say about P*(y | do(x))?
Intuition:
P*(y | do(x)) = Σ_z P(y | do(x), z) P*(z)
48
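A one-line re-calibration sketch (mine, with made-up numbers) of the intuition above: age-specific effects from the LA experiment, re-weighted by NYC's age distribution:

```python
p_y_do_x_z = {'young': 0.8, 'old': 0.5}   # P(y | do(x), z), measured in LA
p_star_z   = {'young': 0.3, 'old': 0.7}   # P*(z): NYC is older than LA

# Transport: sum_z P(y | do(x), z) P*(z)
p_star_y_do_x = sum(p_y_do_x_z[z] * p_star_z[z] for z in p_star_z)
print(p_star_y_do_x)   # 0.8*0.3 + 0.5*0.7 ≈ 0.59
```

The next slides show this re-weighting is licensed only under some causal stories about Z, which is what the transportability definition formalizes.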
TRANSPORT FORMULAS DEPEND
ON THE STORY
[Figure: three models (a), (b), (c) placing Z in different causal roles relative to X and Y]
a) Z represents age:
   P*(y | do(x)) = Σ_z P(y | do(x), z) P*(z)
b) Z represents language skill:
   P*(y | do(x)) = ???
c) Z represents a bio-marker:
   P*(y | do(x)) = ???
49
TRANSPORTABILITY
(Pearl and Bareinboim, 2010)
Definition 1 (Transportability)
Given two populations, denoted Π and Π*,
characterized by models M = ⟨F, V, U⟩ and
M* = ⟨F, V, U + S⟩, respectively, a causal relation
R is said to be transportable from Π to Π* if
1. R(Π) is estimable from the set I of
interventional studies on Π, and
2. R(Π*) is identified from I, P*, G, and G + S.
S = external factors responsible for M ≠ M*
50
TRANSPORT FORMULAS DEPEND
ON THE STORY
[Figure: the three models (a), (b), (c), each augmented with a selection node S pointing at Z]
a) Z represents age:
   P*(y | do(x)) = Σ_z P(y | do(x), z) P*(z)
b) Z represents language skill:
   P*(y | do(x)) = P(y | do(x))
c) Z represents a bio-marker:
   P*(y | do(x)) = Σ_z P(y | do(x), z) P*(z | x)
51
WHICH MODEL LICENSES THE TRANSPORT
OF THE CAUSAL EFFECT
[Figure: six graphs (a)–(f) over X, Y, Z, W, each with a selection node S in a different position]
52
DETERMINE IF THE CAUSAL EFFECT
IS TRANSPORTABLE
[Figure: graph over S, U, V, T, W, X, Y, Z with two selection nodes]
What measurements need to
be taken in the study and
in the target population?
The transport formula:
P*(y | do(x)) = Σ_z P(y | do(x), z) Σ_w P*(z | w) Σ_t P(w | do(x), t) P*(t)
53
SURROGATE ENDPOINTS –
A CAUSAL PERSPECTIVE
The problem:
Infer effects of (randomized) treatment (X) on outcome (Y) from
measurements taken on a surrogate variable (Z), which is more
readily measurable, and is sufficiently well correlated with the
first (Ellenberg and Hamilton 1989).
Prentice 1989: "strong correlation is not sufficient," there should
be "no pathways that bypass the surrogate" (2005).
1989-2011 - Everyone agrees that correlation is not
sufficient, and no one explains why.
54
WHY STRONG CORRELATION IS NOT
SUFFICIENT FOR SURROGACY
Joffe and Green (2009):
“A surrogate outcome is an outcome for which knowing the effect
of treatment on the surrogate allows prediction of the effect of
treatment on the more clinically relevant outcome.”
Two effects = Two experiments conducted under two
different conditions.
Surrogacy = “Strong correlation,”
+ robustness to the new conditions.
New condition = Interventions to change the surrogate Z.
55
WHO IS A GOOD SURROGATE?
[Figure: six graphs (a)–(f) placing the selection node S in different positions relative to X, Z, Y (and W, U in (f)); only some qualify Z as a surrogate]
56
SURROGACY:
CORRELATIONS AND ROBUSTNESS
Definition (Pearl and Bareinboim, 2011):
A variable Z is said to be a surrogate endpoint relative the
effect of X on Y if and only if:
1. P(y|do(x), z) is highly sensitive to Z in the experimental study,
and
2. P(y|do(x), z, s) = P(y|do(x), z, s′), where S is a selection
variable added to G and directed towards Z.
In words, the causal effect of X on Y can be reliably predicted
from measurements of Z, regardless of the mechanism
responsible for variations in Z.
57
SURROGACY:
A GRAPHICAL CRITERION
Z and X d-separate S from Y.
[Figure: the same six graphs (a)–(f); Z qualifies as a surrogate exactly when Z and X d-separate S from Y]
58
THE ORIGIN OF
SELECTION BIAS:
[Figure: X → Y (c0) with disturbances U_X, U_Y (paths β1, β2) and a selection node S]
• No selection bias
• Selection bias activated by a virtual collider
• Selection bias activated by both a virtual collider and a real collider
59
CONTROLLING SELECTION BIAS BY
ADJUSTMENT
[Figure: X → Y with confounders U1, U2, disturbance U_Y, and selection nodes S1, S2, S3]
Can be eliminated by randomization or adjustment
60
CONTROLLING SELECTION BIAS BY
ADJUSTMENT
[Figure: X → Y with confounders U1, U2, disturbance U_Y, and selection nodes S1, S2, S3]
Cannot be eliminated by randomization, requires
adjustment for U2
61
CONTROLLING SELECTION BIAS BY
ADJUSTMENT
[Figure: X → Y with confounders U1, U2, disturbance U_Y, and selection nodes S1, S2, S3]
Cannot be eliminated by randomization or adjustment
62
CONTROLLING SELECTION BIAS BY
ADJUSTMENT
[Figure: X → Y (c0) with disturbances U_X, U_Y (paths β1, β2) and a selection node S]
Cannot be eliminated by adjustment or by randomization
63
CONTROLLING BY ADJUSTMENT
MAY REQUIRE EXTERNAL
INFORMATION
[Figure: X → Y with confounders U1, U2, disturbance U_Y, and selection nodes S1, S2, S3]
Adjustment for U2 gives:
P(y | do(x)) = Σ_{u2} P(y | x, S2 = 1, u2) P(u2)
If all we have is P(u2 | S2 = 1), not P(u2),
then only the U2-specific effect is recoverable
64
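A minimal sketch (mine, with made-up numbers) of the adjustment formula above; note that P(u2) must come from external information, since the selected sample only identifies P(u2 | S2 = 1):

```python
p_u2 = {0: 0.5, 1: 0.5}          # P(u2): external, population-level information
p_y_x_s_u2 = {0: 0.3, 1: 0.7}    # P(y | x, S2=1, u2): estimable from selected data

# P(y | do(x)) = sum_{u2} P(y | x, S2=1, u2) P(u2)
p_y_do_x = sum(p_y_x_s_u2[u] * p_u2[u] for u in p_u2)
print(p_y_do_x)   # ≈ 0.5
```

Swapping in P(u2 | S2 = 1) for P(u2) would recover only the U2-specific effect, as the slide notes.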
WHEN WOULD THE ODDS
RATIO BE RECOVERABLE?
[Figure: three models (a), (b), (c) with a selection node S attached at different points]
OR(X, Y) = [P(y1 | x1) / P(y0 | x1)] / [P(y1 | x0) / P(y0 | x0)] = OR(Y, X)
(a) OR is recoverable, despite the virtual collider at Y,
whenever (X ⫫ S | Y, Z)_G or (Y ⫫ S | X, Z)_G, giving:
OR(X, Y | S = 1) = OR(X, Y)
(Cornfield 1951; Whittemore 1978; Geng 1992)
(b) the Z-specific OR is recoverable, but is meaningless:
OR(X, Y | Z) = OR(X, Y | Z, S = 1)
(c) the C-specific OR is recoverable, which is meaningful:
OR(X, Y | C) = OR(X, Y | W, C, S = 1)
65
GRAPHICAL CONDITION FOR
ODDS-RATIO RECOVERABILITY
Theorem (Bareinboim and Pearl, 2011)
Let graph G contain the arrow X Y and a selection
node S. A necessary condition for G to permit the
G-recoverability of OR(Y,X | C) for some set C of
pre-treatment covariates is that every ancestor Ai
of S that is also a descendant of X have a separating
set Ti that either d-separates Ai from X given Y, or
d-separates Ai from Y given X.
Moreover,
OR(Y, X | C) = OR(Y, X | C, T, S = 1)
for every C s.t. Y ⫫ T′ | C, X,
where T′ = Nd(X) ∩ T and T = ∪_i T_i
66
EXAMPLES OF ODDS
RATIO RECOVERABILITY
(a) Recoverable    (b) Non-recoverable
[Figure: two graphs over X, Y, W1, W2, W3, W4, and S; in (a) the separator {W1, W2, W4} satisfies the condition, in (b) no qualifying separator exists]
OR(Y, X) = OR(Y, X | W1, W2, W4, S = 1)
67
CONCLUSIONS
I TOLD YOU CAUSALITY IS SIMPLE
• Formal basis for causal and counterfactual inference
(complete)
• Unification of the graphical, potential-outcome and
structural equation approaches
• Friendly and formal solutions to
century-old problems and confusions.
• No other method can do better (theorem)
72
Thank you for agreeing
with everything I said.
73