Causation does not imply correlation: robust violations of the
faithfulness axiom
Richard Kennaway
School of Computing Sciences, University of East Anglia
31 Mar 2011
Abstract
Several methods of inferring causal information from correlational data assume that causation
implies correlation: that whenever there is a causal connection between two variables, their
correlation must be non-zero. More precisely, it is claimed that a zero correlation in the
presence of causal influences can only arise by the unlikely chance (a chance with probability
zero) of multiple causal connections between the two variables exactly cancelling out. This is the
faithfulness axiom.
We exhibit two counterexamples to this axiom: classes of systems in which faithfulness is
robustly violated. These systems exhibit correlations indistinguishable from zero between variables that are strongly causally connected, and very high correlations between variables that have
no direct causal connection, only a connection via causal links between uncorrelated variables.
Furthermore, these systems are not of any artificially contrived sort. On the contrary, the equations defining them and real physical systems exemplifying them are commonplace. The first
example is that of a bounded differentiable variable and its first derivative, or a discrete time
series and its first difference. The second example is control systems. In control systems there is
a systematic tendency for low or zero correlations between variables that are physically directly
connected, together with very high correlations between variables whose only causal connections
are indirect, proceeding via those low-correlation links. That this is even possible may sound
paradoxical, but it is inherent in the way that these systems operate, and readily demonstrated
by mathematical analysis, numerical simulation, and physical measurement.
All of these counterexamples violate one or more of the preconditions required for various
published methods of causal inference to be applied. There is thus no contradiction of those
results, but a proof of a limitation of their scope.
1 Introduction
[Text in square brackets is notes to be expanded.]
[Intro paragraph referencing [Pea00, Pea09, SGS01], leading into the following definitions of the
basic concepts.]
Given a directed acyclic graph G on a set of variables V , a joint probability distribution P
over these variables satisfies the Markov condition if P factorises as the product of the conditional
distributions P (Vi |pred(Vi )), where pred(Vi ) is the set of immediate predecessors of Vi in G. This
amounts to the assumption that all of the other, unknown influences on each Vi are independent of
each other; otherwise put, it is the assumption that G contains all the variables responsible for all
of the causal connections that exist among the variables.
The faithfulness assumption is that no conditional correlation among the variables is zero unless
it is necessarily so given the Markov property. For example, if G is a graph of just two nodes x
and y with an arrow from x to y, then every probability distribution over x and y has the Markov
property, but only those yielding a non-zero correlation between x and y are faithful. It is not
obvious in general which of the many conditional correlations for a given graph G must be zero, but
a syntactically checkable condition was given by [Pea98]. [I.e. d-separation.]
The idea behind faithfulness is that if there are multiple causal paths from x to y, then while it
is possible that the causal effects might happen to exactly cancel out, leaving no correlation between
x and y, this is very unlikely to happen. Technically, if the distributions P are drawn from some
reasonable measure space of possible distributions, then the subset of non-faithful distributions has
measure zero.
Attacks on the faithfulness axiom have been based on the argument that very low correlations
may not be experimentally distinguishable from zero, and therefore that one may conclude from a
set of data that no causal connection can exist when there is one [ZS08]. But, it can be countered,
this merely reflects the inadequate statistical power of one's data set, and the proper response is
to collect more data rather than to question the axiom.
Here we exhibit a large class of robust counterexamples to faithfulness: systems which contain
zero correlations that do not become nonzero by any small variation of their parameters, yet are not
implied by the Markov property. Some of these systems even exhibit large correlations (absolute
value above 0.95) between variables that have no direct causal connection, but only connections
along a series of links, each of which is associated with near-zero correlation. These systems are
neither exotic, nor are they artificially contrived for the sole purpose of being counterexamples.
On the contrary, systems of these forms are ubiquitous in both living organisms and man-made
machines.
It follows that for these systems, no method of causal analysis based only on nonexperimental
data and the Markov and faithfulness axioms can apply. Interventional experiments are capable
of obtaining information about the true causal relationships, but for some of these systems it is
paradoxically the lack of correlation between an experimentally imposed value for x and the resulting
value of y that will suggest the presence of a causal connection between them.
2 Zero correlation between a variable and its derivative
In this section we prove that the correlations between certain variables are zero, for both discrete
and continuous time, and for finite and infinite ranges.
Theorem 1 Let xi , i : 0 . . . n be a sequence of real numbers such that x0 = xn . Let yi = xi − xi−1
and zi = (xi−1 + xi )/2 for i : 1 . . . n. If neither of the sequences yi and zi is constant, then the
correlation between them is zero.
Proof. Without loss of generality we can take the mean of the sequence zi to be zero, by subtracting
a suitable constant from every xi . The sum of all the yi is xn − x0 = 0, so the mean of the yi is zero.
Both sequences have positive variance, since they are not constant. The correlation is therefore zero
if and only if Σi yi zi = 0. But yi zi = (xi² − xi−1²)/2, therefore Σi yi zi = (xn² − x0²)/2 = 0.
Example 1 Monthly average bank balance and the monthly change in bank balance have zero correlation over any period in which the balance at the end is the same as at the beginning.
We note that anyone who believes that payments into and out of a bank account have no causal
effect on the bank balance is unlikely to handle their money very well.
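As a quick numerical check of Theorem 1, the following MATLAB fragment (a sketch; the random-walk construction and its length are arbitrary illustrative choices, not taken from the paper) builds a sequence with equal endpoints and verifies that the sample correlation between its first differences and its midpoint averages is zero to within rounding error.

    % Numerical check of Theorem 1 (sketch): for any sequence x with equal first
    % and last elements, the first differences y and the midpoint averages z are
    % uncorrelated.  The random walk below is an arbitrary illustrative choice.
    n = 1000;
    x = cumsum(randn(1, n));
    x(end) = x(1);                    % enforce x0 = xn, as the theorem requires
    y = diff(x);                      % y_i = x_i - x_{i-1}
    z = (x(1:end-1) + x(2:end)) / 2;  % z_i = (x_{i-1} + x_i)/2
    c = corrcoef(y, z);
    disp(c(1, 2))                     % zero up to floating-point rounding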
Theorem 2 Let xi, i : −∞ . . . ∞ be a non-constant infinite sequence of real numbers whose average
is well-defined and whose absolute values are bounded by a constant M. Let yi = xi − xi−1 and
zi = (xi−1 + xi)/2 for all i. If the correlation between the sequences yi and zi is defined, it is zero.
Proof.
\[
\bar{y} \;=\; \lim_{a\to-\infty,\,b\to\infty} \frac{1}{b-a}\sum_{i=a}^{b} y_i
\;=\; \lim_{a\to-\infty,\,b\to\infty} \frac{1}{b-a}\,(x_b - x_{a-1})
\;=\; 0
\]
since xb − xa−1 is bounded while b − a is unbounded. By subtracting a suitable offset from zi we
can arrange that z̄ = 0. The correlation between zi and yi is then
\[
c_{z,y} \;=\; \lim_{a\to-\infty,\,b\to\infty}
\frac{\frac{1}{b-a}\sum_{i=a}^{b} z_i y_i}
{\sqrt{\left(\frac{1}{b-a}\sum_{i=a}^{b} z_i^2\right)\left(\frac{1}{b-a}\sum_{i=a}^{b} y_i^2\right)}}
\;=\; \lim_{a\to-\infty,\,b\to\infty}
\frac{(x_b^2 - x_{a-1}^2)/2}
{\sqrt{\left(\sum_{i=a}^{b} z_i^2\right)\left(\sum_{i=a}^{b} y_i^2\right)}}
\]
Since this limit is assumed to exist, to prove that it is zero it is sufficient to construct some particular
sequence of values of a and b tending to ±∞, along which the limit is zero.
Since the mean of xi is zero, either xi tends to zero as i → ±∞, or there is an ε > 0 such that there
are arbitrarily large i for which xi > ε and arbitrarily large i for which xi < −ε. In the first case
the numerator of cz,y tends to zero while the denominator is increasing, and therefore the limit is
zero. In the second case, the numerator is bounded while the denominator is unbounded, and the
limit is again zero.
The theorem also holds for semi-infinite sequences.
Example 2 In the long term, if one is getting neither richer nor poorer, monthly average bank
balance and the monthly change in bank balance have zero correlation.
Theorem 3 Let x be a differentiable real function, defined in the interval [a, b], such that x(a) =
x(b). If x is not constant then the correlation of x and ẋ over [a, b] is defined and equal to zero.
Proof. Write x̄ for the mean of x over [a, b]. By replacing x by x − x̄ we may assume without loss
of generality that x̄ is zero. The mean of ẋ must exist and equal zero, since
\[
\frac{1}{b-a}\int_a^b \dot{x}\,dt \;=\; \frac{x(b)-x(a)}{b-a} \;=\; 0
\]
The correlation between x and ẋ over [a, b] is defined by:
\[
c_{x,\dot{x}} \;=\;
\frac{\frac{1}{b-a}\int_a^b x\,\dot{x}\,dt}
{\sqrt{\left(\frac{1}{b-a}\int_a^b x^2\,dt\right)\left(\frac{1}{b-a}\int_a^b \dot{x}^2\,dt\right)}}
\;=\;
\frac{\bigl(x(b)^2 - x(a)^2\bigr)/2}
{\sqrt{\left(\int_a^b x^2\,dt\right)\left(\int_a^b \dot{x}^2\,dt\right)}}
\]
The numerator is zero and the denominator is positive (since neither x nor ẋ is identically zero).
Therefore cx,ẋ = 0.
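A quick numerical check of Theorem 3 in MATLAB (a sketch; the particular function below is an arbitrary illustrative choice, not taken from the paper):

    % Numerical check of Theorem 3 (sketch).  Any differentiable x with
    % x(a) = x(b) will do.  This x happens to have zero mean over [0, 2*pi],
    % so the formula in the proof applies directly.
    t    = linspace(0, 2*pi, 200001);
    x    = sin(t) + 0.5 * sin(3*t + 1);    % x(0) = x(2*pi)
    xdot = cos(t) + 1.5 * cos(3*t + 1);    % its derivative
    num  = trapz(t, x .* xdot);            % equals (x(b)^2 - x(a)^2)/2 = 0
    den  = sqrt(trapz(t, x.^2) * trapz(t, xdot.^2));
    disp(num / den)                        % zero up to quadrature error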
Theorem 4 Let x be a bounded, differentiable, real function whose mean exists. If the correlation
cx,ẋ between x and ẋ is defined, then it is zero.
Proof. As before, we can take x̄ to be zero and prove that the mean of ẋ is also zero. The correlation
between x and ẋ is then defined by the limit:
\[
c_{x,\dot{x}} \;=\; \lim_{a\to-\infty,\,b\to\infty}
\frac{\frac{1}{b-a}\int_a^b x\,\dot{x}\,dt}
{\sqrt{\left(\frac{1}{b-a}\int_a^b x^2\,dt\right)\left(\frac{1}{b-a}\int_a^b \dot{x}^2\,dt\right)}}
\;=\; \lim_{a\to-\infty,\,b\to\infty}
\frac{\bigl(x(b)^2 - x(a)^2\bigr)/2}
{\sqrt{\left(\int_a^b x^2\,dt\right)\left(\int_a^b \dot{x}^2\,dt\right)}}
\]
Since this limit is assumed to exist, to prove that it is zero it is sufficient to construct some particular
sequence of values of a and b tending to ±∞, along which the limit is zero.
Either x(b) tends to zero as b → ∞, or (since x̄ = 0 and x is continuous) there are arbitrarily
large values of b for which x(b) = 0. In either case, for any ε > 0 there exist arbitrarily large
values of b such that |x(b)| < ε. Similarly, there exist arbitrarily large negative values a such that
|x(a)| < ε. For such a and b, the numerator of the last expression for cx,ẋ is less than ε². However,
the denominator is positive and non-decreasing as a → −∞ and b → ∞. The denominator is
therefore bounded below for all large enough a and b by some positive value δ.
If we take a sequence εn tending to zero, and for each n take values an and bn as described
above, and such that an → −∞ and bn → ∞, then along this route to the limit, the corresponding
approximant to the correlation is less than εn²/δ. This sequence tends to zero, therefore the
correlation is zero.
Note that the boundedness of x is essential. If we take x = e^t, then ẋ = x and the correlation
is 1 over any time interval. The requirement that x̄ exist is merely a technicality that rules out
certain pathological cases such as x = sin(log(1 + |t|)), which are unlikely to arise in any practical
application. Even when x̄ does not exist (or is nonzero), the formula that we have given for cx,ẋ is
still defined and the proof that it is zero is still valid.
We now consider a physical system containing two variables, both bounded, one being the
derivative of the other. The current I through a capacitor is proportional to the rate of change
of voltage V across it: I = C dV /dt, C being its constant capacitance. If V is the output of a
laboratory power supply, its magnitude continuously variable by turning a dial, then whatever the
word “causation” means, it would be perverse to say that the voltage across the capacitor does not
cause the current through it. Within the limits of what the power supply can generate and the
capacitor can withstand, I can be caused to follow any smooth trajectory by suitably and smoothly
varying V . The voltage is bounded and differentiable, so by theorem 3, on any finite interval in
which the final voltage is the same as the initial, cV,I is zero. By theorem 4 the same is true in the
limit of infinite time.
This is not a merely fortuitous cancelling out of multiple causal connections. There is a single
causal connection, the physical mechanism of a capacitor. No random noise need be assumed.
The mechanism deterministically relates the current and the voltage. Despite this strong physical
connection, the correlation between the variables is zero.
Some laboratory power supplies can be set to generate a constant current instead of a constant
voltage. When a constant current is applied to a capacitor, the mathematical relation between
voltage and current is the same as before, but the causal connection is reversed: the current causes
the voltage. Within the limits of the apparatus, any smooth trajectory of voltage can be produced
by suitably varying the current.
It can be argued that the reason for this paradoxical absence of correlation is that the product-moment correlation is too insensitive a tool to detect the causal connection. For example, if the
voltage is drawn from a signal generator set to produce a sine wave, a plot of voltage against current
will trace a circle or an axis-aligned ellipse. One can immediately see from such a plot that there is a
tight connection between the variables, but one which is invisible to the product-moment correlation.
However, let us suppose that V is not generated by any periodic source, but varies randomly and
smoothly, with a waveform such as that of Figure 1(a). This waveform has been designed to have
an autocorrelation time of 1 unit: any two voltage samples separated by at least 1 unit of time have
zero correlation. (It is the convolution of white noise with an infinitely differentiable function which
is zero outside a unit interval.) Choosing the capacitance C, which is merely a scaling factor, such
that V and I have the same standard deviation, the resulting current is shown in Figure 1(b). A
plot of voltage against current is shown in Figure 1(c). One can clearly see trajectories, but it is not
immediately obvious from the plot that there is a simple relation between voltage and current. If
we then sample the system with a time interval longer than the autocorrelation time of the voltage
source, then the result is the scatterplot of Figure 1(d). The points are connected in sequence, but
each step is a random jump whose destination is not correlated with its source.

Figure 1: Voltage and current related by I = dV/dt. (a) Voltage vs. time. (b) Current vs. time. (c)
Voltage vs. current. (d) Voltage vs. current, sampled. (e) Voltage vs. current, sampled for a longer
time.

Over a longer time, this sampling produces the scatterplot of Figure 1(e). All mutual information between V and I has
now been lost: all of the variables Vi and Ii are close to being independently identically distributed.
Knowing the exact values of all but one of these variables gives an amount of information about
the remaining one that tends to zero as the sampling time step increases. All measures of their
mutual connection tend to zero: Granger causality, Shannon mutual information, and (assuming
V is generated by an information-theoretically random source) any Kolmogorov-based definition of
mutual information. The only way to discover the relationship between V and I is to measure them
on timescales short enough to reveal the local trajectories instead of the Gaussian cloud.
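To make the example concrete, here is a MATLAB sketch of this setup (my own reconstruction, not the author's simulation code; the time step, record length, and smoothing kernel are arbitrary choices). It generates a bandwidth-limited random voltage with an autocorrelation time of one unit, forms the current as its scaled derivative, and prints their correlation, which comes out close to zero despite the direct, deterministic physical connection.

    % Sketch of the capacitor example.  V is white noise convolved with a smooth
    % bump supported on one time unit, so V has an autocorrelation time of about
    % 1 unit; I is proportional to dV/dt and is rescaled to the same spread as V.
    dt = 0.001;                            % sampling step (arbitrary choice)
    T  = 1000;                             % record length in time units (arbitrary choice)
    s  = dt:dt:1-dt;
    bump = exp(-1 ./ (s .* (1 - s)));      % infinitely differentiable, zero outside (0, 1)
    V  = conv(randn(1, round(T/dt)), bump, 'valid');
    I  = diff(V) / dt;                     % I = C*dV/dt, up to the scale factor C
    I  = I * (std(V) / std(I));            % choose C so that V and I have equal spread
    c  = corrcoef(V(1:end-1), I);
    disp(c(1, 2))                          % close to zero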
3 Control systems
A control system, most generally described, is any device which is able to maintain some measurable
property of its environment at or close to some set value, regardless of other influences on that
variable that would otherwise tend to change its value. That is a little too general: a nail may serve
very well to prevent something from moving, despite the forces applied to it, but we do not consider
it to be a control system. Control systems, more usefully demarcated, draw on some energy source
to actively maintain the controlled variable at its reference value. Simple examples are a room
thermostat that turns heating or cooling mechanisms up and down to maintain the interior at a
constant temperature despite variations in external weather, a cruise control maintaining a car at a
constant speed despite winds and gradients, or a person maintaining their balance while standing.
Figure 2: Block diagram of a feedback control system. The controller is above the shaded line; its
environment (the plant that it controls) is below the line.

Figure 3: Block diagram of a simple feedback control system.

The general form of a feedback controller is shown in Figure 2. The variables have the following
meanings:

P: The controller's perceptual input. This is a property of the environment, which it is the
controller's purpose to hold equal to the reference signal.
R: The reference signal. This is the value that the control system tries to keep P equal to. It is
shown as a part of the controller. In an industrial setting it might be a dial set by an operator,
or it could be the output of another control system. In a living organism, R will be somewhere
inside the organism and may be difficult to discover.
O: The output signal of the controller. This is some function F of the perception and the reference
(and possibly their derivatives and integrals). This is the action the control system takes to
maintain P equal to R.
D: The disturbance: all of the influences on P besides O. P is some function G of the output and
the disturbance (and possibly their derivatives and integrals).
Figure 3 illustrates a specific simple control system acting within a simple environment, defined
by the following equations.
Ȯ = k(R − P)     (1)
P = O + D     (2)
Equation 1 describes an integrating controller, i.e. one whose output signal O is proportional to the
integral of the error signal R − P . Equation 2 describes the environment of the controller, which
determines the effect that its output action and the disturbing variable D have upon the controlled
variable P . In this case O and D add together to produce P . Figure 4 illustrates the response
to step and random changes in the reference and disturbance.

Figure 4: Responses of the controller. (a) Step change in D, R = 0. (b) Step change in R, D = 0.
(c) R = 0, randomly varying D. (d) R and D both randomly varying.

The random changes are smoothly
varying with an autocorrelation time of 0.1. The gain k is 100. Observe that when R and D are
constant, P converges to R and O to R − D. The settling time for step changes in R or D is of the
order of 1/k = 0.01.
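As a concrete illustration, here is a MATLAB sketch of this simulation (my own reconstruction from equations (1) and (2), not the author's code; the Euler step size and run length are arbitrary choices). The printed correlation matrix should show the same qualitative pattern as the table below.

    % Sketch of the integrating controller of equations (1)-(2), integrated by
    % the Euler method.  R = 0, gain k = 100, and D is smooth bandwidth-limited
    % noise with autocorrelation time 0.1, as in the text.
    dt  = 0.001; k = 100; T = 100; tau = 0.1;   % T = 1000 autocorrelation times of D
    u   = (dt:dt:tau-dt) / tau;
    bump = exp(-1 ./ (u .* (1 - u)));           % smooth kernel, zero outside (0, tau)
    D   = conv(randn(1, round(T/dt)), bump, 'valid');
    D   = D / std(D);
    N   = numel(D);
    R   = 0;
    O   = zeros(1, N);  P = zeros(1, N);
    for n = 1:N-1
        P(n)   = O(n) + D(n);                   % environment:  P = O + D
        O(n+1) = O(n) + dt * k * (R - P(n));    % controller:   dO/dt = k (R - P)
    end
    P(N) = O(N) + D(N);
    c = corrcoef([O', P', D']);
    disp(c)                                     % expect O-P ~ 0, O-D ~ -1, P-D weak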
The physical connections between O, R, P , and D are as shown in Figure 3: O is caused by R and
P , P is caused by O and D, and there are no other causal connections. We now demonstrate that
the correlations between the variables of this system bear no resemblance to the causal connections.
For a smoothly randomly varying disturbance D, which varies on a timescale much longer than 1/k,
and a constant reference R = 0, we find the following table of correlations in a simulation run of
around 1000 times the autocorrelation time of D:
         P         D
O      0.0012   −0.999
P                0.1129
The numbers vary between runs only in the third decimal place.¹

¹It is difficult to give significance levels for these correlations, since the random waveforms that we generate for
D (and in some other simulations, also for R) are deliberately designed to be heavily bandwidth-limited. They have
essentially no energy at wavelengths below about 10/k = 0.1 seconds. Successive samples therefore cannot be regarded
as independent samples from a random distribution, and formulas for the standard error of the correlation do not
apply. One would have to calculate a p-value based on an effective sample size considerably smaller, we conjecture by a
factor proportional to k. Empirically, one can repeat the simulation many times and observe the resulting distribution
of correlations.

For this system, correlations are very high (close to ±1) exactly where direct causal connection
is absent, and close to zero where direct causal connection is present. There is a causal connection
between D and O, but it proceeds only via P, with which neither variable is correlated.
If we measure the correlation between O + D and P , then it will of course be identically equal to
1, and we might consider this correlation to be important. However, in practice, while the variables
P , R, and O can be accurately measured, D in general cannot: it represents all the other influences
on P of any sort whatever. (Note that the control system itself — equation 1 — does not use the
value of D at all. It controls P without knowing any of the influences on P .) To model our partial
ignorance concerning D, we shall split it into D0 , the disturbances that can be practically measured,
and D1 , the remainder. Let us assume that D0 and D1 are independently randomly distributed,
and that the variance of D0 + D1 is ten times that of D1 . That is, 90% of the variation of the
disturbance is accounted for by the observable disturbances. The correlations that result are now
these in a typical simulation run:
            P       O + D0      D0        D1
O         0.002     0.302     −0.947    −0.304
P                   0.135      0.043     0.009
O + D0                         0.020    −0.990
D0                                      −0.014
We can summarize the general pattern thus:
            P       O + D0       D0        D1
O          ∼0       weak        ∼ −1      −weak
P                   very weak   ∼0        ∼0
O + D0                          ∼0        ∼ −1
D0                                        ∼0
Note that D0 and D1 were generated as independent random variables, and therefore the magnitude of their correlation is due entirely to the finiteness of the sample size, which for this run was
10⁶. Any correlation of a similar magnitude in this simulation must be considered indistinguishable
from zero.
When D1 is zero, the system is identical to the earlier one, for which O + D0 = P . But when the
additional disturbance D1 is added, accounting for only one tenth the variation of D, the correlation
between O + D0 and P sinks to a low level. The reason is that when a controller is functioning
well, as this one is, the variations in P are merely noise. For the above run, the standard deviations
were σ(O) = 0.999, σ(D0 ) = 0.953, σ(D1 ) = 0.318, and σ(P ) = 0.046. The amplitude of P is only
1/20 that of D0 and 1/7 that of D1 . So although the unmeasurable O + D0 + D1 is equal to P ,
the measurable O + D0 correlates only weakly with P , and the better the controller controls, the
smaller the correlation.
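The following sketch (again my own reconstruction, with arbitrary numerical choices) repeats the simulation with the disturbance split into independent components D0 and D1 in the 9:1 variance ratio described above, and prints the correlation between O + D0 and P, which comes out weak even though O + D0 + D1 is identically equal to P.

    % Sketch of the split-disturbance run: D = D0 + D1, with D0 and D1
    % independent bandwidth-limited signals and var(D0) = 9*var(D1), so that
    % D0 accounts for 90% of the variance of the total disturbance.
    dt  = 0.001; k = 100; T = 100; tau = 0.1;
    u   = (dt:dt:tau-dt) / tau;
    bump = exp(-1 ./ (u .* (1 - u)));            % smooth kernel, zero outside (0, tau)
    mk  = @() conv(randn(1, round(T/dt)), bump, 'valid');
    D0  = mk(); D0 = 3 * D0 / std(D0);           % standard deviation 3
    D1  = mk(); D1 = D1 / std(D1);               % standard deviation 1
    N   = numel(D0);
    O   = zeros(1, N);  P = zeros(1, N);
    for n = 1:N-1
        P(n)   = O(n) + D0(n) + D1(n);           % environment:  P = O + D0 + D1
        O(n+1) = O(n) + dt * k * (0 - P(n));     % controller with R = 0
    end
    P(N) = O(N) + D0(N) + D1(N);
    c = corrcoef((O + D0)', P');
    disp(c(1, 2))                                % weak, although O + D0 + D1 = P exactly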
[Analyse some other simple control systems with different control laws to demonstrate that the
phenomenon is not peculiar to this one. Demonstrate that although the causal connections are the
same for all of these systems, almost all possible patterns of correlation among the variables can be
obtained. For example, if we simply move the integrator in Figure 3 into the environment, making
the controller’s output signal the same as the rate of change of the previous controller’s O, we get
these correlations: Ȯ with P : −1; Ȯ with D: ∼ 0; P with D: ∼ 0. P is affected by both O and D
but correlates with only one of them.]
In general, if the experimenter does not know that a control system is present, and can observe
only O, P , D0 , and the physical effect of O and D0 on P , then the fact that O correlates well, and
negatively, with D0 , while P remains constant can be taken as evidence of the presence of a control
system.
What the controller is doing is holding P equal to R. For constant R, variations in P measure the
imperfection of the controller’s performance. In the case of an artificial control system, of a known
design and made for a known purpose, this may be a useful thing to measure. However, when
studying a biological or social system which might contain control systems, correlations between
input and output — in other terminology, stimulus and response — must fail to yield any knowledge
about how it works.
These apparently paradoxical relationships between causation and correlation can in fact be used
to discover the presence of control systems. Suppose that D0 is an observable and experimentally
manipulable disturbance which one expects to affect a variable P (because one can see a physical
mechanism whereby it should do so), but when D0 is experimentally varied, it is found that P
remains constant, or varies far less than expected. This should immediately suggest the presence
of a control system which is maintaining P at a reference level. (This is called the Test for the
Controlled Variable [Pow74, Pow98].) Something else must be happening to counteract the effect
of D0 , and if one finds such a variable O that varies in such a way as to do this, then one may
hypothesise that O is the output variable of the control system. Further investigation would be
required to discover the whole control system: the actual variable being perceived (which might not
be what the experimenter is measuring as P , but something closely connected with it), the reference
R (which is internal to the control system), the actual output (which might not be exactly what the
experimenter is observing as O), and the mechanism that produces that output from the perception
and the reference.
4 Causal discovery methods applied to these examples
Both the system {V, I} and all control systems are outside the scope of the causal analyses of
both [Pea00] and [SGS01]. Those analyses do not consider time dependency. In addition, control
systems inherently include cyclic dependencies: the output affects the perception and the perception
systems inherently include cyclic dependencies: the output affects the perception and the perception
affects the output. Control systems therefore fall outside the scope of any method of causal analysis
that excludes cycles, and both of these works restrict attention to directed acyclic causal graphs.
[LSRH08] considers dynamical systems sampled at intervals, and allows for cyclic dependencies, but
they impose a condition that in any equation giving xn+1 as a weighted sum of the variables at time
n, the coefficient of xn in that sum must be less than 1. This rules out any relation of the form
y = dx/dt, which in discrete time is approximated by the difference equation xn+1 = xn + yn δt.
In addition, [LSRH08] recommends sampling such systems on timescales longer than any transient
effects. As can be seen from Figure 1(c,d,e), the organised trajectories visible when the system
is sampled on a short time scale vanish at longer sampling intervals: only the transient effects
reveal anything about the relation between the variables. This recommendation thus rules out any
possibility of discerning causal influences from nonexperimental data in systems such as we have
discussed here.
5 Importance of these counterexamples
Any system in which some variables are integrals or derivatives of other variables may exhibit patterns of correlation independent of the actual causal influences. Therefore no method of discovering
causes from nonexperimental correlations can apply to any such system. Even experimental data
may fail to yield such information if the appropriate experiments are not done, or are wrongly interpreted in the light of assumptions such as faithfulness which may not apply. When control systems
are present, there is a systematic tendency to violate faithfulness, and produce low correlations
where there are direct causal effects, and high correlations between some variables that are only
indirectly causally connected.
[The following papers are not yet cited in the text: [VDD08, Das05, IOS+08, DEMS08, Spi95,
Ric96, RS96]. All of them are about causal analysis of time series, dynamical systems, or
cyclic causal graphs. There are some hints of the obstacles to causal analysis that we describe here,
but they are concerned with excluding the obstacles rather than studying them.]
References
[Das05] Denver Dash. Restructuring dynamic causal systems in equilibrium. In Robert G. Cowell and Zoubin Ghahramani, editors, Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics (AIStats 2005), pages 81–88. Society for Artificial Intelligence and Statistics, 2005.

[DEMS08] David Duvenaud, Daniel Eaton, Kevin Murphy, and Mark Schmidt. Causal learning without DAGs. In NIPS 2008 Workshop on Causality, 2008.

[IOS+08] Sleiman Itani, Mesrob Ohannesian, Karen Sachs, Garry P. Nolan, and Munther A. Dahleh. Structure learning in causal cyclic networks. In NIPS 2008 Workshop on Causality, 2008.

[LSRH08] Gustavo Lacerda, Peter Spirtes, Joseph Ramsey, and Patrik O. Hoyer. Discovering cyclic causal models by independent components analysis. In David A. McAllester and Petri Myllymäki, editors, Proc. 24th Conference on Uncertainty in Artificial Intelligence, pages 366–374. AUAI Press, 2008.

[Pea98] Judea Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, 1998.

[Pea00] Judea Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2000.

[Pea09] Judea Pearl. Causal inference in statistics: An overview. Statistics Surveys, 3:96–146, 2009.

[Pow74] William T. Powers. Behavior: The Control of Perception. Aldine, 1974.

[Pow98] William T. Powers. Making Sense of Behavior: The Meaning of Control. Benchmark, 1998.

[Ric96] Thomas Richardson. A discovery algorithm for directed cyclic graphs. In Proc. 12th Conference on Uncertainty in Artificial Intelligence, pages 454–461, 1996.

[RS96] Thomas Richardson and Peter Spirtes. Automated discovery of linear feedback models. Technical Report CMU-75-Phil, CMU, 1996.

[SGS01] Peter Spirtes, Clark Glymour, and Richard Scheines. Causation, Prediction, and Search. MIT Press, 2001.

[Spi95] Peter Spirtes. Directed cyclic graphical representations of feedback models. In Proc. 11th Conference on Uncertainty in Artificial Intelligence, pages 491–498, 1995.

[VDD08] M. Voortman, D. Dash, and M. J. Druzdzel. Learning causal models that make correct manipulation predictions with time series data. In NIPS 2008 Workshop on Causality, 2008.

[ZS08] J. Zhang and P. Spirtes. Detection of unfaithfulness and robust causal inference. Minds and Machines, 18(2):239–271, 2008.
A MATLAB code
We used Matlab to perform all the simulations and plot the graphs.
[Include the source code or give a pointer to a web site.]
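As a sketch of the kind of code involved (not the author's original), the following function generates the bandwidth-limited random signal described in Section 2: white noise convolved with an infinitely differentiable kernel that vanishes outside one autocorrelation time. The kernel and the unit-variance normalisation are illustrative choices.

    function x = smoothnoise(n, tau, dt)
    % SMOOTHNOISE  Bandwidth-limited random signal: white noise convolved with
    % an infinitely differentiable bump that vanishes outside one autocorrelation
    % time tau, so samples separated by more than tau are uncorrelated.
    % n: number of samples, tau: autocorrelation time, dt: sample interval.
    % A sketch of the construction described in Section 2, not the author's code.
    u = (dt:dt:tau-dt) / tau;
    bump = exp(-1 ./ (u .* (1 - u)));   % smooth kernel, zero outside (0, tau)
    x = conv(randn(1, n + numel(bump) - 1), bump, 'valid');
    x = x / std(x);                     % scale to unit standard deviation
    end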