Probability of causation: Bounds and identification for partial and complete mediation analysis
Rossella Murtas¹, Alexander Philip Dawid², Monica Musio³
¹ PhD student at the University of Cagliari, Italy
² Leverhulme Emeritus Fellow, University of Cambridge, UK
³ Professor in Statistics, University of Cagliari, Italy
E-mail for correspondence: [email protected]
Abstract: An individual has been subjected to some exposure and has developed
some outcome. Using data on similar individuals, we wish to evaluate, for this
case, the probability that the outcome was in fact caused by the exposure. Even
with the best possible experimental data on exposure and outcome, we typically
cannot identify this “probability of causation” exactly, but we can provide
information in the form of bounds for it. Here, using the potential outcome framework,
we propose new bounds for the case that a third variable mediates partially or
completely the effect of the exposure on the outcome.
Keywords: Probability Of Causation; Mediation Analysis; Potential Outcomes.
1 Introduction
Causal questions fall into two main classes: questions about the causes of
observed effects, and questions about the effects of applied causes. Consider
the following example: an individual, called Ann, might be subjected to some
exposure $X$ and might develop some outcome $Y$. We denote by $X_A \in \{0, 1\}$
the value of Ann’s exposure (coded as 1 if she takes the drug) and by
$Y_A \in \{0, 1\}$ the value of Ann’s outcome (coded as 1 if she dies). Questions
on the effects of causes, named “EoC”, are widely studied in the literature, for
example through randomized clinical trials. In the EoC framework we would ask:
“What would happen to Ann if she were (were not) to take the drug?”. On the
other hand, questions on the causes of observed effects, named “CoE”, are common
in a court of law, when we want to assess legal responsibility. For example,
suppose that Ann has developed the outcome after being exposed. A typical
question will be: “Knowing that Ann did take the drug and passed away, how likely
is it that she would not have died if she had not taken the drug?”. In this paper
we discuss causality from the CoE perspective, invoking the potential outcome
framework. The definition of CoE causal effects invokes the Probability of
Causation (Pearl, 1999; Dawid, 2011)

\[ PC_A = P_A(Y_A(0) = 0 \mid X_A = 1, Y_A(1) = 1), \]

where $P_A$ denotes the probability distribution over attributes of Ann and $Y(x)$
is the hypothetical value of $Y$ that would arise if $X$ were set to $x$. Note that
this expression involves the bivariate distribution of the pair
$\mathbf{Y} = (Y(0), Y(1))$ of potential outcomes. Whenever the probability of
causation exceeds 50%, a civil court regards this as preponderance of evidence,
because causation is “more probable than not”.

This paper was published as a part of the proceedings of the 31st International
Workshop on Statistical Modelling, INSA Rennes, 4–8 July 2016. The copyright
remains with the author(s). Permission to reproduce or extract any parts of this
abstract should be requested from the author(s).
2 Starting Point: Simple Analysis
In this section we discuss the simple situation in which we have information from
a hypothetical randomized experimental study (such that
$X_i \perp\!\!\!\perp \mathbf{Y}_i$ for a subject $i$ in the experimental
population) that tested the same drug taken by Ann, with
$P_1 = P(Y = 1 \mid X \leftarrow 1) = 0.30$ and
$P_0 = P(Y = 1 \mid X \leftarrow 0) = 0.12$.
This information alone is not sufficient to infer causality in Ann’s case. We need
to further assume that the fact of Ann’s exposure, $X_A$, is independent of her
potential responses $\mathbf{Y}_A$, that is $X_A \perp\!\!\!\perp \mathbf{Y}_A$,
and that Ann is exchangeable with the individuals in the experiment. Under this
independence and exchangeability, $PC_A$ reduces to
$PC_A = P(Y(0) = 0 \mid Y(1) = 1)$. However, we can never observe the joint event
$(Y(0) = 0, Y(1) = 1)$, since at least one of the two events must be counterfactual.
But even without making any assumptions about the dependence between $Y(0)$ and
$Y(1)$, we can derive the following inequalities (Dawid et al., 2015):
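\[ 1 - \frac{1}{RR} \le PC_A \le \frac{P(Y = 0 \mid X \leftarrow 0)}{P(Y = 1 \mid X \leftarrow 1)} \tag{1} \]

where $RR = P(Y = 1 \mid X \leftarrow 1)/P(Y = 1 \mid X \leftarrow 0)$ is the
experimental risk ratio between exposed and unexposed. In the experimental
population, the exposed are 2.5 times as likely to die as the unexposed
($RR = 30/12 = 2.5$), so the lower bound is $1 - 1/2.5 = 0.60$, while the ratio on
the right-hand side, $0.88/0.30$, exceeds 1 and is therefore uninformative. We thus
have enough confidence to infer causality in Ann’s case, given that
$0.60 \le PC_A \le 1$.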
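As a numerical check of Eq. (1), the following Python sketch evaluates both bounds for the trial figures quoted above. The function name `simple_pc_bounds` is an illustrative choice, not taken from the paper, and clipping the bounds to [0, 1] is an added assumption that only reflects the fact that PC_A is a probability.

```python
def simple_pc_bounds(p1: float, p0: float) -> tuple[float, float]:
    """Bounds on PC_A from Eq. (1).

    p1 = P(Y = 1 | X <- 1) and p0 = P(Y = 1 | X <- 0) are the experimental
    risks. Clipping to [0, 1] is an added assumption (PC_A is a probability).
    """
    if not (0.0 < p1 <= 1.0 and 0.0 < p0 <= 1.0):
        raise ValueError("p1 and p0 must lie in (0, 1]")
    rr = p1 / p0                        # experimental risk ratio RR
    lower = max(0.0, 1.0 - 1.0 / rr)    # 1 - 1/RR
    upper = min(1.0, (1.0 - p0) / p1)   # P(Y=0 | X<-0) / P(Y=1 | X<-1)
    return lower, upper


# Risks quoted in the text: P1 = 0.30, P0 = 0.12.
lower, upper = simple_pc_bounds(p1=0.30, p0=0.12)
print(f"{lower:.2f} <= PC_A <= {upper:.2f}")
```

With the risks from the text, the printed interval agrees with the one stated above, 0.60 ≤ PC_A ≤ 1.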
3 Bounds in Mediation Analysis
In this Section we present a novel analysis to bound the Probability of
Causation for a case where a third variable, M , is involved in the causal
pathway between the exposure X and the outcome Y and plays the role
of mediator. We shall be interested in the case that M is observed in the
experimental data but is not observed for Ann, and see how this additional
experimental evidence can be used to refine the bounds on PCA .
First we consider the case of complete mediation (Dawid et al., 2016). Using
counterfactual notation, we denote by $M(x)$ the potential value of $M$
for $X \leftarrow x$, and by $Y^*(m)$ the potential value of $Y$ for
$M \leftarrow m$. Then $Y(x) := Y^*\{M(x)\}$. Assuming no confounding for the
exposure-mediator and mediator-outcome relationships, the causal pathway will be
blocked after adjustment for $M$ (Markov property $Y \perp\!\!\!\perp X \mid M$).
The assumed mutual independence implies the following upper bound for the
probability of causation in the case of complete mediation:
$PC_A \le \mathrm{Num}/P(Y = 1 \mid X \leftarrow 1)$,
while the lower bound remains unchanged from that of the simple analysis
of $X$ on $Y$ in Eq. (1). For the upper bound’s numerator, Num, one has to
consider various scenarios, according to the ordering of the estimable
marginal probabilities, as listed in Table 1.
TABLE 1. Upper Bound’s Numerator for $PC_A$ in Complete Mediation Analysis

            a ≤ b                      a > b
c ≤ d       a·c + (1 − d)(1 − b)       b·c + (1 − d)(1 − a)
c > d       a·d + (1 − c)(1 − b)       b·d + (1 − a)(1 − c)
In Table 1, $a = P(M(0) = 0)$, $b = P(M(1) = 1)$, $c = P(Y^*(0) = 0)$ and
$d = P(Y^*(1) = 1)$. Given the no-confounding assumptions, these are all
estimable probabilities.
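The case analysis of Table 1 can be sketched in Python as follows. The function names and the numerical values of a, b, c and d are hypothetical, chosen only for illustration. The expression used for the denominator P(Y = 1 | X ← 1) is an assumption derived from Y(x) = Y*{M(x)} together with the independence of M(1) and Y*; in practice it can be estimated directly from the experimental data.

```python
def complete_mediation_numerator(a: float, b: float, c: float, d: float) -> float:
    """Num from Table 1, with a = P(M(0)=0), b = P(M(1)=1),
    c = P(Y*(0)=0) and d = P(Y*(1)=1)."""
    if a <= b and c <= d:
        return a * c + (1 - d) * (1 - b)
    if a <= b and c > d:
        return a * d + (1 - c) * (1 - b)
    if a > b and c <= d:
        return b * c + (1 - d) * (1 - a)
    return b * d + (1 - a) * (1 - c)


def complete_mediation_upper_bound(a: float, b: float, c: float, d: float) -> float:
    """Upper bound Num / P(Y = 1 | X <- 1), clipped at 1."""
    # Assumed decomposition under complete mediation and independence:
    # P(Y = 1 | X <- 1) = P(M(1)=1) P(Y*(1)=1) + P(M(1)=0) P(Y*(0)=1).
    p_y1 = b * d + (1 - b) * (1 - c)
    if p_y1 <= 0:
        raise ValueError("P(Y = 1 | X <- 1) must be positive")
    return min(1.0, complete_mediation_numerator(a, b, c, d) / p_y1)


# Hypothetical marginal probabilities (not from the paper).
a, b, c, d = 0.9, 0.6, 0.95, 0.45
print("Num         =", round(complete_mediation_numerator(a, b, c, d), 4))
print("upper bound =", round(complete_mediation_upper_bound(a, b, c, d), 4))
```

The four branches mirror the four cells of Table 1 one for one, which keeps the code easy to check against the table.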
For the case of partial mediation, we introduce $Y^*(x, m)$, the potential
value of the outcome after setting both exposure and mediator, so that now
$Y(x) = Y^*(x, M(x))$. Let us consider the following assumptions, named (A):
$Y^*(x, m) \perp\!\!\!\perp (M(0), M(1)) \mid X$;
$Y^*(x, m) \perp\!\!\!\perp X$, that is no $X$-$Y$ confounding; and
$M(x) \perp\!\!\!\perp X$, that is no $X$-$M$ confounding. Note that the assumption
$Y^*(x, m) \perp\!\!\!\perp (M(0), M(1)) \mid X$ implies both
$Y^*(x, m) \perp\!\!\!\perp M(0) \mid X$ and $Y^*(x, m) \perp\!\!\!\perp M(1) \mid X$,
that is no $M$-$Y$ confounding. If Ann is exchangeable with the individuals in the
experiment, then

\[ PC_A = \frac{P(Y(0) = 0, Y(1) = 1 \mid X = 1)}{P(Y(1) = 1 \mid X = 1)}. \]
The numerator involves a bivariate distribution of counterfactual outcomes.
Using assumptions (A) and the inequality $P(A \cap B) \le \min\{P(A), P(B)\}$,
we can obtain an upper bound for $PC_A$ considering these 64 combinations:

\begin{align}
P(Y(0) = 0, Y(1) = 1 \mid X = 1) \le{}
  & \min\{P(Y^*(0, 0) = 0), P(Y^*(1, 0) = 1)\} \cdot \min\{P(M(0) = 0), P(M(1) = 0)\} \tag{2} \\
  & + \min\{P(Y^*(0, 0) = 0), P(Y^*(1, 1) = 1)\} \cdot \min\{P(M(0) = 0), P(M(1) = 1)\} \tag{3} \\
  & + \min\{P(Y^*(0, 1) = 0), P(Y^*(1, 0) = 1)\} \cdot \min\{P(M(0) = 1), P(M(1) = 0)\} \tag{4} \\
  & + \min\{P(Y^*(0, 1) = 0), P(Y^*(1, 1) = 1)\} \cdot \min\{P(M(0) = 1), P(M(1) = 1)\} \tag{5}
\end{align}
It can be proved that the lower bound does not change. Assumptions (A)
are sufficient to estimate the lower and the upper bounds from the data.
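The sum of terms (2) to (5) is mechanical to evaluate once the marginal probabilities are available. The Python sketch below is one way to do it. The function names, the dictionary-based parameterisation and the numerical values are all hypothetical. The denominator is computed as the sum over m of P(Y*(1, m) = 1) P(M(1) = m), which is an assumption following from (A).

```python
from itertools import product


def partial_mediation_numerator(p_y0, p_y1, p_m0, p_m1):
    """Sum of terms (2)-(5): an upper bound on P(Y(0)=0, Y(1)=1 | X=1).

    p_y0[m] = P(Y*(0, m) = 0) and p_y1[m] = P(Y*(1, m) = 1) for m in {0, 1};
    p_m0 = P(M(0) = 1) and p_m1 = P(M(1) = 1).
    """
    m0 = {0: 1.0 - p_m0, 1: p_m0}   # distribution of M(0)
    m1 = {0: 1.0 - p_m1, 1: p_m1}   # distribution of M(1)
    # (i, j) = (0,0), (0,1), (1,0), (1,1) give terms (2), (3), (4), (5).
    return sum(
        min(p_y0[i], p_y1[j]) * min(m0[i], m1[j])
        for i, j in product((0, 1), repeat=2)
    )


def partial_mediation_upper_bound(p_y0, p_y1, p_m0, p_m1):
    """Upper bound on PC_A under partial mediation, clipped at 1."""
    # Assumed under (A): P(Y(1)=1 | X=1) = sum_m P(Y*(1,m)=1) P(M(1)=m).
    denom = p_y1[0] * (1.0 - p_m1) + p_y1[1] * p_m1
    if denom <= 0:
        raise ValueError("P(Y(1) = 1 | X = 1) must be positive")
    return min(1.0, partial_mediation_numerator(p_y0, p_y1, p_m0, p_m1) / denom)


# Hypothetical inputs (not from the paper).
p_y0 = {0: 0.9, 1: 0.1}   # P(Y*(0, m) = 0)
p_y1 = {0: 0.5, 1: 0.5}   # P(Y*(1, m) = 1)
bound = partial_mediation_upper_bound(p_y0, p_y1, p_m0=0.5, p_m1=0.1)
print("upper bound on PC_A:", round(bound, 4))
```

Looping over the pairs (i, j) rather than writing out four products keeps each term visibly tied to one value of (M(0), M(1)).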
4 Comparisons and conclusions
The numerator of the upper bound of $PC_A$ in the simple analysis framework
(1), which ignores the mediator, may be written as

\[
\min\{P(Y^*(0, 0) = 0)P(M(0) = 0) + P(Y^*(0, 1) = 0)P(M(0) = 1),\;
      P(Y^*(1, 0) = 1)P(M(1) = 0) + P(Y^*(1, 1) = 1)P(M(1) = 1)\}
= \min\{\alpha + \beta, \gamma + \delta\}, \tag{6}
\]

where $\alpha$, $\beta$, $\gamma$ and $\delta$ denote the four products on the
left-hand side, in order. We can see that both (2) and (3) are smaller than or
equal to $\alpha$, while both (4) and (5) are smaller than or equal to $\beta$.
Thus, the upper bound not accounting for the mediator could be larger or smaller
than that obtained considering the partial mediation mechanism. On the other hand,
it can be proved that the bounds found for the case of complete mediation are never
larger than for the simple analysis of $X$ on $Y$.
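Both comparisons can be illustrated numerically. The Python sketch below evaluates the simple numerator of Eq. (6) and the partial mediation sum (2) to (5) on two hypothetical parameter sets, one in each direction. It then runs a coarse grid check of the complete mediation claim, which is a sanity check and not a proof. All names and values are illustrative assumptions. The simple numerator under complete mediation is taken to be min{ac + (1 − a)(1 − d), bd + (1 − b)(1 − c)}, which follows from Y(x) = Y*{M(x)} and the assumed independence.

```python
from itertools import product


def simple_numerator(p_y0, p_y1, p_m0, p_m1):
    """Eq. (6): min{alpha + beta, gamma + delta}."""
    alpha = p_y0[0] * (1.0 - p_m0)   # P(Y*(0,0)=0) P(M(0)=0)
    beta = p_y0[1] * p_m0            # P(Y*(0,1)=0) P(M(0)=1)
    gamma = p_y1[0] * (1.0 - p_m1)   # P(Y*(1,0)=1) P(M(1)=0)
    delta = p_y1[1] * p_m1           # P(Y*(1,1)=1) P(M(1)=1)
    return min(alpha + beta, gamma + delta)


def partial_numerator(p_y0, p_y1, p_m0, p_m1):
    """Sum of terms (2)-(5)."""
    m0 = {0: 1.0 - p_m0, 1: p_m0}
    m1 = {0: 1.0 - p_m1, 1: p_m1}
    return sum(min(p_y0[i], p_y1[j]) * min(m0[i], m1[j])
               for i, j in product((0, 1), repeat=2))


def complete_numerator(a, b, c, d):
    """Compact form of Table 1 (equal to the table cell in every case)."""
    return min(a, b) * min(c, d) + (1 - max(a, b)) * (1 - max(c, d))


# Hypothetical inputs: p_y0[m] = P(Y*(0,m)=0), p_y1[m] = P(Y*(1,m)=1),
# p_m0 = P(M(0)=1), p_m1 = P(M(1)=1).
case_larger = dict(p_y0={0: 0.9, 1: 0.8}, p_y1={0: 0.2, 1: 0.5}, p_m0=0.3, p_m1=0.7)
case_smaller = dict(p_y0={0: 0.9, 1: 0.1}, p_y1={0: 0.5, 1: 0.5}, p_m0=0.5, p_m1=0.1)
for name, case in (("larger", case_larger), ("smaller", case_smaller)):
    print(name, round(partial_numerator(**case), 4), round(simple_numerator(**case), 4))

# Grid check: complete mediation numerator vs. simple numerator.
grid = [k / 10 for k in range(11)]
never_larger = all(
    complete_numerator(a, b, c, d)
    <= min(a * c + (1 - a) * (1 - d), b * d + (1 - b) * (1 - c)) + 1e-12
    for a, b, c, d in product(grid, repeat=4)
)
print("complete mediation never larger on grid:", never_larger)
```

In the first case the partial mediation sum exceeds the simple numerator, and in the second it falls below it, matching the statement that the comparison can go either way.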
In conclusion, the important implications of PCA in real cases encourage
the researcher to focus on studying methods capable of producing more
precise bounds. Here we have proposed a novel analysis to bound the PCA
when a mediator lies on a pathway between exposure and outcome.
References
Pearl, J. (1999). Probabilities of causation: three counterfactual interpretations and their identification. Synthese, 121 (1-2), 93–149.
Dawid, A. P. (2011). The role of scientific and statistical evidence in assessing causality. In Perspectives on Causation, Oxford: Hart Publishing, 133–147.
Dawid, A. P., Murtas, R. and Musio, M. (2016). Bounding the Probability of Causation in Mediation Analysis. In Selected Papers of the 47th Scientific Meeting of the Italian Statistical Society, Springer, in press. Editors: T. Di Battista, E. Moreno, W. Racugno. arXiv preprint arXiv:1411.2636.
Dawid, A. P., Musio, M. and Fienberg, S. (2015). From statistical evidence to evidence of causality. Bayesian Analysis, Advance Publication, 26 August 2015. DOI:10.1214/15-BA968