A Causal Modelling Approach
to Spatial and Temporal
Confounding in Environmental
Impact Studies
Dr Warren Paul
La Trobe University, Australia
July 2009
Introduction
• Environmental impact studies are observational studies,
and causal inference is problematic owing to spatial and
temporal confounding.
• The generally accepted (but less than ideal) solution to
spatial & temporal confounding is to use a Before-After
Control-Impact design.
• Causal modelling is a relatively new graphical and
mathematical framework for addressing confounding in
observational studies. It is to observational studies
what replication and randomisation are to
experiments.
• What can causal modelling do for environmental impact
studies?
Causal modelling in environmental impact studies
Page 2
Today’s Presentation
• Brief introduction to causal modelling.
• A causal modelling approach to spatial
and temporal confounding in
environmental impact studies.
• Where does this leave BACI designs?
• Using the causal diagram to guide the
data analysis: The 1976 Amoco Cadiz oil
spill.
• Concluding remarks & some references.
Causal Modelling
• Causal modelling combines graph theory
with statistics for reliable causal inference from
observational studies.
• Started with Sewall Wright’s Method of Path
Coefficients in the 1920s.
• Progressed to Structural Equation Modelling
(Haavelmo, 1943; Simon, 1953; Goldberger,
1972).
• Recent advances (last 3 decades) due to
– Judea Pearl and collaborators at UCLA
– Peter Spirtes, Clark Glymour, and Richard Scheines
at CMU.
Building a Causal Model: A Simple
Example of Fertiliser and Crop Yield
1. Start by drawing the basic causal diagram
(also called a Directed Acyclic Graph).
– Nodes represent variables and the arrows represent
the direction of causal effects.
[Diagram: X, Fertiliser (low, high) → Y, Crop yield (low, high)]
Model Building cont’d
2. Add confounding variables to the diagram, and use
dashed lines to mark those that are latent (unobserved).
[Diagram: W, Soil type (clay, sand), drawn with dashed lines as a latent
variable, with arrows W → X and W → Y; X, Fertiliser (low, high) → Y,
Crop yield (low, high)]
Model Building cont’d
3. Apply the d-separation criterion to determine
whether the causal effect of interest is
identifiable in the presence of latent variables.
– If there is a chain A→B→C or a fork A←B→C then
A and C are d-separated (conditionally
independent) given B. That is, the path is blocked
by conditioning on B.
– If there is an inverted fork (collider) A→B←C then
the path between A and C is blocked as long as B is
not conditioned on. Conditioning on B (or on a
descendant of B) opens the path, so A and C are
d-connected given B.
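The three rules can be checked mechanically. The following is a minimal sketch, not part of the original slides: the function name d_separated and the edge-list representation are assumptions made for illustration. It uses the standard moralised ancestral graph test, which is equivalent to the path-by-path rules.

```python
def d_separated(edges, a, c, given):
    """Return True if a and c are d-separated by `given` in the DAG `edges`.

    `edges` is a list of (parent, child) pairs. Hypothetical helper written
    for illustration; it is not taken from any library.
    """
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
        parents.setdefault(u, set())

    # 1. Keep only a, c, the conditioning set, and all of their ancestors.
    keep, stack = set(), [a, c, *given]
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents.get(node, set()))

    # 2. Moralise: join each child to its parents, and join co-parents.
    nbrs = {node: set() for node in keep}
    for child in keep:
        ps = list(parents.get(child, set()))
        for p in ps:
            nbrs[p].add(child)
            nbrs[child].add(p)
        for i, p in enumerate(ps):
            for q in ps[i + 1:]:
                nbrs[p].add(q)
                nbrs[q].add(p)

    # 3. Delete the conditioning nodes and test whether c is reachable from a.
    blocked, seen, stack = set(given), set(), [a]
    while stack:
        node = stack.pop()
        if node in seen or node in blocked:
            continue
        seen.add(node)
        stack.extend(nbrs[node])
    return c not in seen


chain = [("A", "B"), ("B", "C")]      # A -> B -> C
fork = [("B", "A"), ("B", "C")]       # A <- B -> C
collider = [("A", "B"), ("C", "B")]   # A -> B <- C

for name, graph in [("chain", chain), ("fork", fork), ("collider", collider)]:
    print(name,
          "| unconditional:", d_separated(graph, "A", "C", []),
          "| given B:", d_separated(graph, "A", "C", ["B"]))
```

For the chain and the fork the function reports separation only when B is in the conditioning set; for the collider it reports the opposite.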
Model Building cont’d
4. If the causal effect is nonidentifiable then
apply the front-door and/or back-door
criterion to search a priori for a set of
covariates that will give a consistent
estimate of the causal effect of interest.
Model Building cont’d
The back-door criterion (Pearl, 2000):
One solution is to observe and condition on soil type.
Conditioning on soil type blocks the back-door path
X←W→Y, thus isolating the causal effect of interest.
[Diagram: W, Soil type (clay, sand), with arrows W → X and W → Y;
X, Fertiliser (low, high) → Y, Crop yield (low, high)]
Model Building cont’d
The front-door criterion:
1) If soil type remains latent then the other possible
solution is to condition on the mediating variable(s) if
they are known.
2) This results in a two-stage adjustment:
First find the effect of X on Z. This is straightforward because
the back-door path from X to Z through W is blocked by the
collider Y.
Then find the effect of Z on Y by conditioning on X to block
the back-door path between Z and Y.
[Diagram: W, Soil type (clay, sand), latent, with arrows W → X and W → Y;
X, Fertiliser (low, high) → Z, Soil nitrate (low, high) → Y,
Crop yield (low, high)]
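The two-stage adjustment can be verified exactly on a small discrete model. This is a sketch under stated assumptions: all probabilities below are hypothetical numbers chosen for illustration (1 = high or clay, 0 = low or sand), and the function names are not from the slides. The front-door estimate uses only the observed variables X, Z and Y, and is compared with the answer computed from the full structural model, which uses the latent W.

```python
from itertools import product

# Hypothetical probabilities, for illustration only.
P_W1 = 0.4                                   # P(W=1): soil type, treated as latent
P_X1_GIVEN_W = {0: 0.3, 1: 0.8}              # P(X=1 | W): fertiliser
P_Z1_GIVEN_X = {0: 0.2, 1: 0.9}              # P(Z=1 | X): soil nitrate
P_Y1_GIVEN_ZW = {(0, 0): 0.2, (0, 1): 0.5,   # P(Y=1 | Z, W): crop yield
                 (1, 0): 0.4, (1, 1): 0.7}


def bern(p1, value):
    """Probability of `value` for a binary variable with P(1) = p1."""
    return p1 if value == 1 else 1.0 - p1


# Observational joint over the observed variables (X, Z, Y); W is summed out.
obs = {}
for w, x, z, y in product((0, 1), repeat=4):
    p = (bern(P_W1, w) * bern(P_X1_GIVEN_W[w], x)
         * bern(P_Z1_GIVEN_X[x], z) * bern(P_Y1_GIVEN_ZW[(z, w)], y))
    obs[(x, z, y)] = obs.get((x, z, y), 0.0) + p


def p_obs(x=None, z=None, y=None):
    """Marginal probability of the fixed values under the observational joint."""
    return sum(p for (xi, zi, yi), p in obs.items()
               if (x is None or xi == x)
               and (z is None or zi == z)
               and (y is None or yi == y))


def front_door(x):
    """P(Y=1 | do(X=x)) computed from observed X, Z and Y only."""
    total = 0.0
    for z in (0, 1):
        # Stage 1: effect of X on Z (no adjustment needed).
        p_z_given_x = p_obs(x=x, z=z) / p_obs(x=x)
        # Stage 2: effect of Z on Y, adjusting for X.
        p_y_do_z = sum(p_obs(x=x2, z=z, y=1) / p_obs(x=x2, z=z) * p_obs(x=x2)
                       for x2 in (0, 1))
        total += p_z_given_x * p_y_do_z
    return total


def truth(x):
    """P(Y=1 | do(X=x)) from the structural model, using the latent W."""
    return sum(bern(P_Z1_GIVEN_X[x], z) * bern(P_W1, w) * P_Y1_GIVEN_ZW[(z, w)]
               for z in (0, 1) for w in (0, 1))


effect_front_door = front_door(1) - front_door(0)
effect_truth = truth(1) - truth(0)
print(round(effect_front_door, 6), round(effect_truth, 6))
```

The two printed values agree, which illustrates that the effect is recoverable without ever observing soil type, provided the diagram is correct.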
Model Building cont’d
5. Translate the graphical model into a statistical
model by applying to the joint distribution over
all variables in the causal diagram
– the Markov condition, a graph-theoretic condition
which states that a variable is independent of its
predecessors given its parents;
– the do operator (Pearl, 2000), which expresses
mathematically the asymmetry of causal effects;
– and the laws of probability.
Model Building cont’d
• For the causal diagram with W → X, W → Y and X → Y
– The causal effect of fertiliser on yield is
$p(y \mid do(x)) = \sum_{w} p(y \mid w, x)\, p(w)$
– Or alternatively
$y = \beta_0 + \beta_1 I(x = \text{high}) + \beta_2 I(w = \text{clay}) + \varepsilon$
where $\beta_1 = \frac{\partial}{\partial x} E(y \mid do(x))$
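A small exact calculation shows what the adjustment formula does. This is a sketch, not the author's analysis: the probabilities are hypothetical numbers chosen for illustration, and the function names are assumptions. It compares the naive contrast P(Y=high | X=high) − P(Y=high | X=low) with the back-door adjusted contrast, both computed from the same observational joint distribution.

```python
# Hypothetical probabilities, for illustration only.
P_W = {"clay": 0.4, "sand": 0.6}                    # P(W)
P_X_HIGH_GIVEN_W = {"clay": 0.8, "sand": 0.3}       # P(X=high | W)
P_Y_HIGH_GIVEN_XW = {                               # P(Y=high | X, W)
    ("low", "clay"): 0.5, ("high", "clay"): 0.7,
    ("low", "sand"): 0.2, ("high", "sand"): 0.4,
}
LEVELS = ("low", "high")


def joint(w, x, y):
    """Observational joint P(W=w, X=x, Y=y) implied by the diagram."""
    px = P_X_HIGH_GIVEN_W[w] if x == "high" else 1 - P_X_HIGH_GIVEN_W[w]
    py_high = P_Y_HIGH_GIVEN_XW[(x, w)]
    py = py_high if y == "high" else 1 - py_high
    return P_W[w] * px * py


def naive(x):
    """P(Y=high | X=x): the unadjusted conditional probability."""
    num = sum(joint(w, x, "high") for w in P_W)
    den = sum(joint(w, x, y) for w in P_W for y in LEVELS)
    return num / den


def back_door(x):
    """sum_w P(Y=high | w, x) P(w), with every term taken from the joint."""
    total = 0.0
    for w in P_W:
        p_w = sum(joint(w, xi, y) for xi in LEVELS for y in LEVELS)
        p_wx = sum(joint(w, x, y) for y in LEVELS)
        total += joint(w, x, "high") / p_wx * p_w
    return total


naive_effect = naive("high") - naive("low")
adjusted_effect = back_door("high") - back_door("low")
print(round(naive_effect, 3), round(adjusted_effect, 3))
```

With these numbers the within-stratum effect of fertiliser is 0.2 in both soil types, and the adjusted contrast recovers it, while the naive contrast is inflated because clay soils receive more fertiliser and also yield more.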
Model Building cont’d
6. Collect the data and test the model.
– The testable part of the model is the set of
conditional independence relationships that
are encoded in the causal diagram and the
statistical model. These can be read directly
off the causal diagram using Pearl’s d-separation rules.
Model Building cont’d
• If all of the variables in this example were
observed - fertiliser, soil type, soil nitrate, and
crop yield – there would be just one conditional
independency to test.
– If our model were correct then crop yield should be
independent of fertiliser given soil type and soil
nitrate.
– In graph notation this is written $(Y \perp\!\!\!\perp X \mid Z, W)_G$,
which says that “Y is d-separated from X given Z and
W in causal graph G.”
– This part of the model (or the equivalence class of
models) is testable.
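One informal way to look at this implication in data is to stratify on soil type and soil nitrate and compare yield between fertiliser levels within each stratum. The sketch below simulates data from a hypothetical version of the model (all probabilities are assumptions for illustration, with 1 = high or clay) and reports the largest within-stratum gap. In practice a formal conditional independence test would replace this eyeball check.

```python
import random

random.seed(1)

# Hypothetical P(Y=1 | Z, W), for illustration only.
P_Y1_GIVEN_ZW = {(0, 0): 0.2, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.7}


def draw(p):
    """Draw a binary value that equals 1 with probability p."""
    return 1 if random.random() < p else 0


# counts[(w, z, x)] = [number of units, number of units with Y = 1]
counts = {}
for _ in range(200_000):
    w = draw(0.4)                      # soil type
    x = draw(0.8 if w else 0.3)        # fertiliser depends on soil type
    z = draw(0.9 if x else 0.2)        # soil nitrate depends on fertiliser
    y = draw(P_Y1_GIVEN_ZW[(z, w)])    # yield depends on nitrate and soil only
    cell = counts.setdefault((w, z, x), [0, 0])
    cell[0] += 1
    cell[1] += y

# Within each (W, Z) stratum, compare P(Y=1) between X=1 and X=0.
gaps = {}
for w in (0, 1):
    for z in (0, 1):
        rate = {x: counts[(w, z, x)][1] / counts[(w, z, x)][0] for x in (0, 1)}
        gaps[(w, z)] = rate[1] - rate[0]

max_gap = max(abs(g) for g in gaps.values())
for stratum, gap in sorted(gaps.items()):
    print("W=%d, Z=%d: difference in P(Y=1) between X=1 and X=0 = %+.3f"
          % (stratum[0], stratum[1], gap))
```

Because the simulated yield depends on fertiliser only through soil nitrate, the within-stratum gaps are small and reflect sampling noise only. Gaps that are large relative to sampling error would count as evidence against the diagram.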


Spatial Confounding in
Environmental Impact Studies
[Figure: stream schematic showing the control site, the treatment plant,
the impact site, and the sampling units]
By being explicit about the nature of the confounding it becomes clear
that spatial confounding can be controlled by simply adjusting for
spatial location (i.e. distance along the stream from an arbitrary point).
The arc in red is the source of confounding in Control-Impact studies.
[Diagram: Control-Impact (CI) design. X, Effluent (control, impact);
Y, Benthic macroinvertebrates. The remaining nodes are labelled Z1 to Z6
in the figure: water and sediment quality, flow velocity, spatial
location, distance to other sources of nutrients/toxicants,
nutrient/toxicant inputs from other sources, and habitat type
(pool, riffle)]
A graphical depiction of spatial
confounding in Control-Impact studies
Assuming no impact, effluent and water quality will be marginally
dependent but conditionally independent given spatial location.
[Figure: water quality (in concentration units) plotted against spatial
location z3 (in distance units), with the control site, the effluent and
the impact site marked]
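This point can be illustrated with a small simulation. It is a sketch under strong assumptions that are not in the original slides: water quality follows a linear gradient along the stream, the effluent enters at distance 5, and the effluent has no effect at all. The helper functions solve and ols are written here for illustration only.

```python
import random

random.seed(0)


def solve(a, b):
    """Solve the linear system a * coef = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(columns, y):
    """Least-squares coefficients of y on the given columns (normal equations)."""
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(p * v for p, v in zip(ci, y)) for ci in columns]
    return solve(xtx, xty)


n = 400
distance = [random.uniform(0, 10) for _ in range(n)]       # spatial location
effluent = [1.0 if d > 5 else 0.0 for d in distance]       # impact site beyond 5
# Water quality follows the spatial gradient only: the effluent has NO effect.
quality = [2.0 + 1.0 * d + random.gauss(0, 0.5) for d in distance]
ones = [1.0] * n

naive = ols([ones, effluent], quality)[1]               # control vs impact means
adjusted = ols([ones, effluent, distance], quality)[1]  # adjusted for location

print("naive effluent coefficient:    %.2f" % naive)
print("adjusted effluent coefficient: %.2f" % adjusted)
```

The naive contrast picks up the spatial gradient and suggests a large effect, while the coefficient adjusted for distance is close to zero. The adjustment relies on the functional form for spatial location being adequate, which is an assumption to check in any real application.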
Temporal Confounding in
Environmental Impact Studies
[Diagram: Before-After (BA) design. X, Effluent (before, after);
Y, Benthic macroinvertebrates. The remaining nodes are labelled Z1 to Z6
in the figure: water and sediment quality, flow velocity, time,
temperature, rain, and habitat type (pool, riffle)]
Where Does this Leave BACI
Designs?
• It has been noted by others (see Smith et al. 1993 and
Stewart-Oaten et al. 2001) that the assumptions
underpinning BACI designs may not always be
reasonable and in some modifications (i.e., the Beyond
BACI design) the assumptions are invalid.
• Furthermore, from a causal modelling perspective there
is no apparent advantage in combining the Before-After
(BA) and Control-Impact (CI) designs in a BACI design.
• These results suggest that controlling for temporal
location in a Before-After (BA) design, or for spatial
location in a Control-Impact (CI) design, is all that is needed.
A Before-After Example: The 1976
Amoco Cadiz Oil Spill
Source: http://www.black-tides.com/uk/tools/amoco-cadiz-biggest-oil-spill.pdf
A possible causal diagram for the
Amoco Cadiz example
[Diagram: nodes Time, Temperature, Oil spill (before, after),
Oil concentration, and Species composition. Path 1, the “temperature
component”, runs through Temperature. Path 2, the “oil component”,
runs through Oil concentration.]
Amoco Cadiz cont’d
[Figure: panels for Path 1 (the temperature component) and Path 2
(the oil component), with axes Time, Temperature, Oil, Ord axis 1 and
Ord axis 2; points numbered 1 to 15]
Amoco Cadiz cont’d
• The unconstrained ordination
(PCO) plot seems to indicate a
jump between the 5th and 6th
times, which is when the spill
occurred.
• A seasonal pattern is also
evident at the ends of the time
series, suggesting that the
pattern was disrupted by the
spill but then began to recover.
• The first two axes explain 63%
of the total variation.
[Figure: PCO ordination for Morlaix (Amoco Cadiz spill). Transform:
square root. Resemblance: S17 Bray Curtis similarity. PCO1 (38.9% of
total variation) against PCO2 (23.8% of total variation). Points are
labelled 1 to 21 according to the time order of the samples, with the
spill marked.]
Amoco Cadiz cont’d
• The first two principal coordinates (PCO)
axes plotted against time.
[Figure: two panels, PCO1 against Time and PCO2 against Time]
Amoco Cadiz cont’d
• These preliminary analyses suggest that the
multivariate regression model underlying the
distance-based redundancy analysis should
include two components:
1. An oil component – modelled as a function of the
binary spill variable and a quadratic function of time,
with interaction between the polynomial terms and
the binary spill variable to reflect the change in the
pattern following the spill.
2. A temperature component – modelled as a periodic
function of time (i.e., a sum of sine and cosine terms
with a seasonal period of 4 quarters).
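The predictor matrix implied by these two components might be built as follows. This is a sketch, not the author's code: the column names are hypothetical, the 21 sampling times are an assumption read from the point labels on the ordination plot, time is left uncentred, and a single sine and cosine pair is used for the seasonal term.

```python
import math

N_TIMES = 21         # assumption: number of sampling times on the plots
LAST_PRE_SPILL = 5   # the spill falls between the 5th and 6th sampling times
PERIOD = 4           # seasonal period of 4 quarters

COLUMNS = ["intercept", "spill", "t", "t2", "spill_t", "spill_t2", "sin", "cos"]


def design_row(t):
    """Predictor values for sampling time t (1-based)."""
    spill = 1.0 if t > LAST_PRE_SPILL else 0.0
    angle = 2 * math.pi * t / PERIOD
    return [
        1.0,                # intercept
        spill,              # oil component: binary spill variable
        float(t),           # oil component: linear time
        float(t * t),       # oil component: quadratic time
        spill * t,          # interaction: spill x time
        spill * t * t,      # interaction: spill x time squared
        math.sin(angle),    # temperature component: seasonal sine
        math.cos(angle),    # temperature component: seasonal cosine
    ]


design = [design_row(t) for t in range(1, N_TIMES + 1)]

print(COLUMNS)
for row in design[3:7]:     # sampling times 4 to 7, spanning the spill
    print(["%.2f" % value for value in row])
```

The interaction columns are zero before the spill, so the quadratic trend is free to change shape afterwards, which is the change in pattern described in item 1. A matrix of this form would then be passed as the predictor set to whatever software carries out the distance-based redundancy analysis.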
Amoco Cadiz cont’d
• The fitted model explains
78% of the total
variation.
• The first two dbRDA
axes contribute 61%
to the total variation
explained.
• The oil component
accounts for 90% of
the fitted model’s
variation.
[Figure: overlay of predicted values (solid line) and observed values
(numbered points), plotted on PCO1 against PCO2]
Concluding Remarks
• Causal modelling suggests that temporal and
spatial confounding in environmental impact
studies can be dealt with by adjusting directly for
temporal or spatial location in Before-After or
Control-Impact studies.
• There appears to be no advantage in combining
these study designs in a BACI design.
• Data analysis is guided by the causal diagram,
and analyses can be undertaken with available
statistical software.
Acknowledgements
• Thank you to
– Susan Lawler and Peter Pridmore for helping
me to clarify certain concepts.
– Bob Clarke for reviewing the dbRDA
analyses.
• I am grateful to The Ian Potter
Foundation and La Trobe University for
providing financial assistance to attend
this conference.
Some References
• Pearl, J. 1995. Causal diagrams for empirical
research. Biometrika 82:669-710.
• Pearl, J. 2000. Causality: Models, Reasoning,
and Inference. Cambridge University Press, New
York.
• Spirtes, P., C. Glymour, and R. Scheines. 2000.
Causation, Prediction, and Search. 2nd edition.
MIT Press, Cambridge, Massachusetts.
Thank You