Combining Experiments to Discover Linear Cyclic Models
with Latent Variables
Frederick Eberhardt
Department of Philosophy
Washington University in St Louis
Patrik O. Hoyer
CSAIL, MIT
& HIIT, Univ. of Helsinki
Abstract
We present an algorithm to infer causal relations between a set of measured variables
on the basis of experiments on these variables. The algorithm assumes that the causal
relations are linear, but is otherwise completely general: It provides consistent estimates when the true causal structure contains feedback loops and latent variables,
while the experiments can involve surgical or
‘soft’ interventions on one or multiple variables at a time. The algorithm is ‘online’
in the sense that it combines the results from
any set of available experiments, can incorporate background knowledge and resolves conflicts that arise from combining results from
different experiments. In addition we provide
a necessary and sufficient condition that (i)
determines when the algorithm can uniquely
return the true graph, and (ii) can be used
to select the next best experiment until this
condition is satisfied. We demonstrate the
method by applying it to simulated data and
the flow cytometry data of Sachs et al (2005).
1
INTRODUCTION
Causal knowledge is key to supporting inferences
about the effects of interventions on a system, hence
much of applied science is devoted to identifying
causal relationships between various measured quantities. For this purpose the randomized controlled experiment is generally the tool of choice whenever such
experiments are feasible.
Appearing in Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS)
2010, Chia Laguna Resort, Sardinia, Italy. Volume 9 of
JMLR: W&CP 9. Copyright 2010 by the authors.
Richard Scheines
Department of Philosophy
Carnegie Mellon University
While the standard theory of experimental design provides procedures and inference techniques for a given
set of potential causes (treatment variables) and effects
(outcomes), it does not provide guidance on how to determine the full set of causal relationships among the
measured variables (i.e. the causal ‘interaction graph’).
Randomized experiments directly provide the effects of
the intervened variables in the experiment, but breaking these effects down into direct and indirect effects
(with respect to the measured variables), or combining them to determine the total effect, is typically
not straightforward. This problem arises in particular when correlations in the data may be partly attributable to unknown confounding variables, or when
the system of interest exhibits feedback phenomena,
as for example is common in bioinformatics and economics. Feedback phenomena in particular invalidate
standard approaches based on directed acyclic graphs.
Furthermore, experiments can involve different types
of manipulations (e.g. randomizing a single vs. multiple variables simultaneously per experiment), and
sometimes it is only possible to perform ‘soft’ interventions, in which the variables influenced by the experimenter still (partly) depend on their natural causes.
Because of difficulties such as these, there is a definite
need for algorithms and procedures that (i) combine
the (possibly conflicting) results from different types
of experiments, (ii) incorporate background knowledge
where available, (iii) provide guidance on the selection
of experiments, and (iv) given a set of stated assumptions, are able to identify consistently and efficiently
the causal structure from experimental data, or do as
well as possible (with regard to some measure) in cases
of underdetermination or conflicting evidence.
As a simple and concrete example, consider the two
alternative graphs of Figure 1 as explanations of the
causal relationships between the variables x1 , x2 , and
x3 . If in three separate experiments we randomize each
of the variables one at a time while measuring the two
others, we could deduce from the resulting dependencies that there exist directed paths from x1 to x2 , from
Discovery of Linear Cyclic Models with Latent Variables
x1
x1
x2
x2
l1
l1
x3
l2
x3
l2
Figure 1: Two distinct causal graphs which cannot
be distinguished based on (marginal and conditional)
independencies alone, when using only a combination
of observational data and experiments randomizing at
most one variable per experiment.
x1 to x3 , and from x2 to x3 . However, based on dependencies and independencies alone it is impossible
to determine whether there is a direct effect x1 → x3
in addition to the indirect effect through x2 , because
the latents l1 and l2 preclude us from using statistical
conditioning to identify this feature. Note that we can
identify all edges among the observed variables if we
can in an experiment simultaneously randomize both
x1 and x2 while measuring x3 . This holds generally:
By simultaneously randomizing all variables except a
target variable, it is possible to determine all direct
causal effects with respect to the randomized variables.
By performing such a ‘leave-one-out’ experiment for
each variable, and combining the results, the complete
graph can be determined. Unfortunately, it is seldom
feasible or economical to perform such experiments.
There already exists a substantial body of work in
this area. For instance, Cooper & Yoo (1999) use a
Bayesian score to combine experimental and observational data in the acyclic no-hidden-variable case.
Tong & Koller (2001), Murphy (2001), Eberhardt
(2008) and He & Geng (2008) provide strategies for
this setting to optimally select the experiments to learn
as much as possible about the underlying structure.
Furthermore, Eaton & Murphy (2007) describe how to
handle ‘uncertain’ interventions where the targets of a
given intervention is unknown, while Nyberg & Korb
(2006) analyze the implications of soft interventions.
All of the above methods assume the restricted case
of acyclic structures and no confounding latent variables. While Richardson (1996) discusses learning
cyclic models based on observational data alone, there
is little work that formally integrates experimental results into the search procedure. One exception to this
is recent work by Schmidt and Murphy (2009) who
adapt a formulation for undirected graphs to model
cyclic directed graphs with no latents. In contrast, the
standard literature on experimental design permits latent variables but only considers a very restricted set
of possible causal structures.
In this paper we provide a procedure for discovery
of linear causal models that uses experimental data
and is completely general with regard to structural assumptions. Given a set of datasets over the variables
of interest where in each dataset a different combination of variables has been subject to intervention,
our procedure identifies the linear constraints on the
causal effects among the variables. For an underdetermined system it returns a model that provides a minimal representation of the measured constraints. From
here there are two ways to proceed: Either additional
routines can be used to characterize the underdetermination, or, alternatively, a condition we provide identifies the next experiment that would add the most
additional required constraints. Once the system is
sufficiently constrained (available background knowledge can be used if available), the procedure identifies the direct causal effects relative to the measured
variables, and additionally infers the presence and locations of confounding latent variables. If the system
is overconstrained it returns the model with the minimum sum of squared errors to the constraints. The
technique works for both cyclic and acyclic systems,
and easily incorporates soft interventions.
2
MODEL
The data generating model can be represented as a
set of structural equations that assign to each of the
observed variables a linear combination of the other
variables and an additive ‘disturbance’ term. Grouping the observed variables xi , i = 1, . . . , n into the
vector x, the corresponding disturbances ei into the
vector e, and the linear coefficients bji representing
the direct effects xi → xj into the matrix B, we have:
x := Bx + e.
(1)
Here, latent variables are represented as non-zero
off-diagonal entries in the covariance matrix Σe =
E{eeT } of the disturbances. Without loss of generality, all the xi and the ei have zero mean.
A surgical (edge-breaking, i.e. fully randomizing) intervention on a variable xi is represented by a manipulated model where the row in B corresponding
to xi has been set to zero, indicating that xi is no
longer influenced by any of the other observed variables. Similarly, we set the corresponding disturbance
ei to zero, and instead assign an independent, unit
variance experimental variable ci to xi . If several of
the observed variables are surgically intervened on simultaneously (in the same experiment), we perform
these steps simultaneously for each of the intervened
variables. Thus from the original model (B, Σe ) we
obtain a manipulated model (B∗ , Σ∗e ) for which
x := B∗ x + c + e∗ ,
Eberhardt, Hoyer & Scheines
where the intervention vector c is zero except for rows
corresponding to variables that were subject to an
intervention, while the manipulated disturbances e∗
equal e except that elements for intervened variables
are set to zero. This follows the standard view that
interventions break all incoming arrows to any intervened node in a directed graph representing the direct
effects between the observed variables (Pearl 2000).
In some situations an intervention that breaks the influence of the set of causes on a given variable may not
be possible, while a ‘soft’ intervention, in which the intervention influences the variable but does not fully determine it, can be performed. In our framework such
an intervention on a variable xi does not affect the
corresponding row of B, nor the corresponding disturbance ei , but it does add an independent experimental
variable ci . Assuming that ci is measured, its correlation with x can be used to obtain an estimate of its
causal effect, as in the standard instrumental variable
setting.1 In what follows, however, all interventions
are surgical unless otherwise explicitly stated.
We assume that our data sets are generated from a sequence of experiments (Ek ) k=1, ... ,m where each experiment Ek = (Jk , Uk ) consists of a set Jk of one or more
variables that are subject to an intervention (simultaneously and independently) and a set Uk denoting the
other variables, which are passively observed.
In each experiment Ek = (Jk , Uk ) our measurements
consist of the experimental effects of each xj ∈ Jk
on each xu ∈ Uk , denoted by t(xj xu ||Jk ), which is
equal to the covariance of xj and xu in this experimental setting (since xj has unit variance).The total effect
t(xj xu ) of xj on xu is standardly defined as the experimental effect in a (hypothetical if not actual) experiment in which only xj is subject to an intervention,
i.e. we have t(xj xu ) ≡ t(xj xu ||{xj }). In experiments in which interventions are performed on more
than one variable (i.e. Jk is larger than the set {xj }),
the experimental effect may differ from the total effect,
so t(xj xu ) may in general not equal t(xj xu ||Jk )
when Jk 6= {xj }.
Additionally, if available, a set of passive observational
data (for which Jk = ∅) can be used to estimate the
(passive observational) covariance matrix of the observed variables: Σx = E{xxT }.
In the special case when the variables can be ordered in such a way that the matrix B is strictly
lower-triangular (corresponding to a directed acyclic
graph) the interpretation of the model is straightfor1
We here assume that the parametrization (B, Σe ) of
the original model does not change as a result of the soft
intervention. One could imagine other soft interventions
where this is not the case.
ward. This case is known as a ‘recursive SEM’ in the
literature and corresponds to a causal Bayes network
with linear relationships over continuous variables.
The interpretation of models which contain directed
cycles (i.e. are ‘non-recursive’) requires more care.
Such cases arise from feedback phenomena and represent systems that equilibriate.2 To ensure that an
equilibrium exists the absolute values of the eigenvalues of B must all be less than 1. This condition must
also be satisfied for the manipulated matrix B∗ in all
possible experiments (both hypothetical and actual).3
While self-loops ([B]ii 6= 0 for any i) are permitted in
the true model, the estimated model will not represent them explicitly; instead the effect of the self-loop
will be included in the incoming edges to the variable in question.4 The only underdetermination is the
path and the speed to convergence to the equilibrium,
not the equilibrium point itself. Since all of our measurments are presumed to come from the equilibrium,
this underdetermination is not only natural, but inevitable. Hence, we assume in the following that B
has been standardized to have a diagonal of zeros.
Finally, we note that our representation of latent variables is only implicit (in correlations among the components of e) rather than explicit. Thus, while the
true model may contain some latent variables that simultaneously confound more than two variables, these
will be rendered as a set of non-zero entries in the estimated covariance matrix Σe , connecting any pair of
variables that are confounded by the hidden variables.
3
PROCEDURE
The experiments only directly supply measures of the
experimental effects, but our main interest lies in the
direct effects represented by the matrix B in the model
of Section 2. Our procedure for deriving the direct effects essentially consists of two steps: The experimental effects are combined to infer total effects, and the
total effects are subsequently used to derive the direct
effects. We first describe the latter step.
2
Our procedure can handle both stochastic and deterministic equilibria for cyclic models, but the details are
beyond the scope of this paper.
3
In order to avoid special cases, in which identifiability
depends on the experiments performed and the particular
parameterization, we use this slightly stronger assumption
than is needed, including all possible experiments.
4
If variable xi has an incoming edge with edge-coefficient a and a self-loop with edge-coefficient b in the true
model B, then the estimated model will not return the self
loop but instead return the edge coefficient on the incoming
edge as a/(1 − b).
Discovery of Linear Cyclic Models with Latent Variables
3.1
TOTAL TO DIRECT EFFECTS
An experiment Ek consisting of a single surgical intervention (i.e. Jk = {xj }) directly supplies the total
effect of that variable (xj ) on all the other measured
variables. (Note that the presence of latent variables
does not affect the total effect of a variable that is
subject to a surgical intervention.5 ) If we have such
an experiment for each measured variable, then all total effects can be estimated and represented by a total
effects matrix T, where [T]ji = t(xi xj ) and we define t(xi xi ) = 1 for all i.
By solving equation (1) for x as a function of e, we
obtain x = Ae with A = (I − B)−1 , where I refers to
the n × n identity matrix.6 It turns out that T = AD
where D is a diagonal matrix that rescales A to have
a diagonal of ones. Thus, given an estimated total
effects matrix T we can infer the direct effects as
B = I − A−1 = I − DT−1 .
Because B is standardized to have a diagonal of zeros,
it follows that D rescales T−1 to have a diagonal of
ones. Furthermore, given A and the covariance matrix
Σe over the disturbances, Σx is determined by
Σx = E{xxT } = E{AeeT AT } = AΣe AT .
Given passive observational data from which we can
estimate Σx , we can now infer the sought covariance as
Σe = A−1 Σx A−T to determine the latent variables.
3.2
EXPERIMENTAL TO TOTAL
EFFECTS
Experiments intervening on more than one variable
enable search strategies that are more efficient (in the
number of experiments and in total sample size) than
single intervention experiments (see Section 5, Fig. 2).
In such multi-intervention experiments the experimental effects do not necessarily correspond to the total effects: Any path of the form xj → . . . → xi → . . . → xu
is broken if xi is surgically manipulated. Nevertheless,
the experimental effects provide partial information on
the total effects in the form of linear constraints: In an
experiment Ek = (Jk , Uk ) with xj ∈ Jk and xu ∈ Uk
the total effect [T]uj = t(xj xu ) from xj to xu can
be represented as
X
[T]uj = t(xj xu ) =
t(xj xi )t(xi xu ||Jk ) (2)
xi ∈ Jk
5
In the case of soft interventions techniques used for
instrumental variables can be applied with the same result.
6
Recall that for stability of the linear system we have
to require that the absolute values of all eigenvalues of B
are smaller than 1. Thus for each λ ∈ eigenvalues(B) we
have 1 − λ ∈ eigenvalues(I − B), so the stability condition
ensures that no eigenvalue of I − B is zero and the matrix
is always invertible.
where we define t(xj xj ) = 1. That is, the total effect of xj on xu can be redescribed as a decomposition
of the total effects of xj on each variable xi in the intervention set and the experimental effects of the xi
on the passively observed variable xu . In the experiment only the experimental effects t(xi xu ||Jk ) can
be measured, but given the intervention set Jk we can
specify for each experiment qk = |Jk |×|Uk | linear constraints of the form of equation (2) on the total effects.
Since there are n2 −n total effects for n measured variables, several experiments will be required to obtain a
sufficient number of constraints.
P
We can now combine the set of q = i qi linear constraints (from all available experiments) into
Ht = h
(3)
where the (q × (n2 − n))-matrix H and the vector h
of q scalars contain the measured experimental effects,
and t is the (n2 − n)-vector of desired total effects that
can be re-arranged into the matrix T.
In principle, any number of constraints can be represented in this equation: constraints that are linearly
dependent (by coincidence or because two sets of measurements, possibly from different experiments, constrain the same total effects and agree), constraints
that conflict (because two sets of measurements constrain the same total effects, but do not agree), or
added constraints that represent background knowledge (e.g. xj is not an ancestor of xi : t(xj xi ) = 0,
or more generally that the total effect of xj on xi is
equal to some scalar c: t(xj xi ) = c).
In most cases H is not square nor full-rank, and hence
equation (3) is not straightforwardly solvable for t.
When there are no conflicting constraints and the rank
of H is less than n2 − n then the system is underdetermined, so there are several directed graphs that satisfy
the experimental constraints. If on the other hand the
rank of H is n2 − n then there is either a unique causal
structure that satisfies all the constraints, or if there
are conflicting constraints, there is no such model.
In principle, there are several ways to proceed. One
could select non-conflicting constraints until full rank
is achieved and proceed to the unique solution ignoring
the other constraints, thereby avoiding (denying) conflict and remaining agnostic in the underdetermined
case. We proceed differently: We use the MoorePenrose pseudoinverse H† to invert H using whatever
constraints are available. In the underdetermined case
the pseudoinverse returns the solution that minimizes
the l2 -norm of the total effects; in the square invertible
case we obtain the unique solution, and in the overdetermined case we obtain the solution that minimizes
the sum of the squared errors of the constraints. We
Eberhardt, Hoyer & Scheines
Program 1 Algorithm pseudocode. A full implementation of this procedure is available at:
http://www.cs.helsinki.fi/u/phoyer/code/LLC/
Experiments, LLC: (linear, latents, cyclic)
ordered pair of variables (xi , xj ) ∈ V×V with i 6= j we
say that the pair condition is satisfied for this variable
pair w.r.t. the experiments whenever there is a k such
that xi ∈ Jk and xj ∈ Uk for some Ek = (Jk , Uk ).
Given m datasets Dk from a sequence of experiments
(Ek ) k=1, ... ,m and, if available, also a dataset Do of passive observational data over a set of measured variables:
Theorem 1 Given the experimental effects from the
experiments (Ek )k=1,...,m over the variables in V, all
total effects t(xi xj ) are identified if and only if the
pair condition is satisfied for all ordered pairs of variables w.r.t. these experiments.7
For each experiment Ek = (Jk , Uk ) with dataset Dk
For each ordered pair of variables (xj , xu ) with xj ∈ Jk
and xu ∈ Uk
Determine from
P Dk the constraint
t(xj xu ) = xi ∈Jk t(xj xi )t(xi xu ||Jk ).
Re-order the constraint and concatenate it to the
matrix equation Ht = h.
Check H and h and report rank and the existence of
conflicts, if any, to determine whether the system is
underconstrained, conflicted or exactly determined.
Compute t = H† h, and determine the total effects matrix T by reordering t into matrix form (note that the
diagonal of T consists of 1’s.).
Determine the direct effects matrix B = I − DT−1 ,
where I is the n × n identity matrix.
If passive observational dataset Do is available then
Estimate the passive observational covariance matrix
Σx = E{xxT } from dataset Do .
Infer the error covariance matrix Σe = A−1 Σx A−T .
Return the estimated B (and Σe if available).
are thus able to provide an ‘online’ algorithm (Program 1), which can integrate background knowledge
and, given the constraints, does as well as possible
with regard to a well-defined measure.
4
IDENTIFIABILITY
Ideally, the constraints obtained from the experiments
would determine the total effects matrix T uniquely.
However, this will almost never be the case, since most
combinations of experiments provide either too few or
too many constraints, and in the latter case conflicts
will be common. Hence the crucial question for the scientist is how the identifiability of the causal structure
is related to the experiments. Given a set of measured
variables V, the following condition enables exactly
this connection between identifiability and experimental set-up.
Definition 1 Given experiments (Ek )k=1,...,m and an
Sketch of proof of necessity: Suppose the pair condition is not satisfied for an ordered pair of variables
(xi , xj ). Then for any experiment Ek = (Jk , Uk ), either (i) xi , xj ∈ Jk or (ii) xi , xj ∈ Uk or (iii) xi ∈ Uk
and xj ∈ Jk . Consider t(xi xj ). Experiments of type
(iii) place no constraint on t(xi xj ). Experiments of
type (i) provide for each w ∈ Uk a constraint of the
form t(xi w) = t(xi xj )t(xj w||Jk ) + const, but
since t(xi w) is unknown, these constraints are insufficient to determine t(xi xj ). The only relevant
constraints from experiments of type (ii) marginalize
over xi making it impossible to separate the effect of
some w ∈ Jk via xi from the more direct effects:
t(w
xj ||Jk )
= t(w
+t(w
xi ||Jk )t(xi
xj ||Jk ∪ {xi })
xj ||Jk ∪ {xi })
Since there will be a constraint of this type for each w,
t(xi xj ||Jk ∪ {xi }) is undetermined. But it is needed
to determine the total effect of xi on xj in terms of
what is known, since:
t(xi
xj )
= t(xi xj ||Jk ∪ {xi })
X
+
t(xi xk )t(xk
xj ||Jk ∪ {xi })
xk ∈Jk
Hence the total effect t(xi xj ) is underdetermined, so
the total effects matrix T is not fully determined. Sketch of proof of sufficiency: Since the pair condition
is satisfied for each ordered pair of variables (xi , xj ),
we can select for each total effect t(xi xj ) one such
constraint for our matrix equation Ht = h.
Let H∗ be the matrix of n2 − n constraints on the total effects obtained when for each variable there is an
experiment intervening on all but that variable. Each
constraint will have at most n − 1 non-zero entries
in H∗ . These entries will correspond to direct effects
of an intervened variable on the target variable. Appropriately re-ordering the rows, H∗ can be organized
into a block-diagonal matrix, where each block on the
diagonal corresponds to a submatrix of I − B. One
such block is shown in Table 1. Given the stability
7
Full proofs supplied upon request. Due to space constraints we here give a proof sketch only.
Discovery of Linear Cyclic Models with Latent Variables
Table 1: One of the blocks in H∗ . The b(xi → xj ) are
entries of the direct effects matrix B.
J
V \ {x2 }
V \ {x3 }
V \ {x4 }
t(x1
x2 )
1
−b(x2 → x3 )
−b(x2 → x4 )
t(x1 x3 )
−b(x3 → x2 )
1
−b(x3 → x4 )
t(x1 x4 )
−b(x4 → x2 )
−b(x4 → x3 )
1
assumption on B (see Section 2), any such block, and
consequently H∗ , is full rank. In cases where the pair
condition is satisfied, but the experiments did not intervene on all but one variable, at least one of the
off-diagonal elements in such a block of H must be
zero. Leaving a variable out of the intervention set
corresponds in the measurement of the constraint to
a marginalization over that variable. It can be shown
that marginalization correponds to elementary row operations on the corresponding block of H∗ . Further,
since satisfaction of the pair condition ensures that
each row of H consists of a constraint corresponding
to a different marginalization, it can be shown that
any such H can be obtained by row operations from
the H∗ matrix. Since H∗ is full rank, and since row
operations preserve rank, any such H is full rank, and
hence invertible. For an underdetermined system, the theorem implies
that there are ordered pairs of variables that do not
satisfy the pair condition. One way to proceed is
to select the next intervention set to maximally reduce the number of such pairs.
brute force,
Pk Using
this would require checking i=1 ni possible experiments for n variables and maximum intervention set
size k ≤ n/2. However, faster heuristics are available
or can be adapted (Eberhardt 2008). If feasible, an
experiment that intervenes on n/2 variables, none of
which have yet been subject to intervention, supplies
the maximum number of additional pair constraints.
Selecting the next best experiment is obviously greedy,
and if this procedure is used to select a whole sequence
of experiments starting with no prior knowledge, it will
select a sequence of 2 log2 (n) experiments intervening
on a different set of n/2 variables in each experiment.
This sequence of experiments is sufficient for identifiability, but is known to be suboptimal for most n.
Instead of attempts at further resolution one might be
interested in a characterization of the underdetermination. Given that the system is linear, the solution
space for the total effects is easily specified, so further numerical or analytical routines can be used to
specify the implications of the underdetermination for
the direct effects matrix B. Although analytical procedures could be devised, a naive numerical heuristic
would sample total effects from the solution space, and
invert T repeatedly to explore the variation in B.
5
SIMULATIONS
Several factors will affect the accuracy of our estimation procedure. The number of samples used in each
experimental condition will naturally be an important
consideration. Another is the number of constraints
obtained from the set of experiments, which depends
not only on the number of experiments but also on how
many variables are intervened per experiment. Finally,
the relative efficiency of soft vs. surgical interventions
is unclear. It is thus not obvious what the best experimental strategy is in any given practical situation.
To investigate the issue empirically, we generated 100
random cyclic models over 8 variables, each with 4 latents and possible self-loops, and sampled experimental data from these models for a variety of strategies.
First, we considered surgical interventions only. We
compared the strategy of intervening on each variable
separately (8 experiments to satisfy the pair condition)
to intervening on pairs of variables (also requiring 8 experiments) to intervening on four variables at a time
(requiring only 2 log2 8 = 6 experiments), with a variety of sample sizes. Figure 2 (top row) shows the
accuracy measured as the mean correlation between
true and estimated values across the 100 graphs as a
function of the total number of samples (divided evenly
over the number of experiments) used for each strategy. The main result is that experiments involving interventions on a number of variables are more effective
than experiments of single interventions, reflecting the
larger number of constraints obtained per experiment.
Next, we investigated the efficiency of soft interventions, performing identical simulations to those given
above but replacing surgical interventions with soft interventions (bottom row in Figure 2). Here, we additionally test the strategy of performing a single experiment intervening on all variables simultaneously.
(Surgically intervening on all variables simultaneously
does not provide any information, but softly intervening does.) Results are comparable to the surgical interventions case, except that softly intervening on all
variables is a superior strategy in terms of the accuracy
as a function of the total number of samples used.
6
FLOW CYTOMETRY DATA
To demonstrate how our algorithm might be used in
practice we show its application to the flow cytometry data of Sachs et al. (2005). This data consists of
single cell recordings of the abundance of 11 different
phosphoproteins and phospholipids under a number of
experimental conditions in human T-cells. In each condition, measurements were obtained from some 700–
900 single cells. These data have been previously an-
Manuscript
under
review
by
2010
Manuscript
under
review
by AISTATS
AISTATS
2010
Manuscript
under
review
by AISTATS
2010
Eberhardt, Hoyer & Scheines
Manuscript under review by AISTATS 2010
(a) Raf:
(b) Akt:
50
100 200
500
2000
5000
20000
5000
20000
n * nsamplesvec
500
2000
50
100 200
50
100 200
0.0
100 200
500
2000
5000
20000
5000
20000
n * nsamplesvec
500
2000
1.0
1.0
0.8
0.8
0.6
0.6
0.6
0.4
0.4
errCevec1
errCevec1
errCevec1
0.0
0.2
0.0
0.2
0.4
0.2
0.8
0.6
0.6
0.4
100 200
500
2000
5000
20000
n * nsamplesvec
50
100 200
500
2000
n * nsamplesvec
50
50
100 200
100 200
0.0
errBvec1 errBvec1
0.4
0.2
0.2
0.0
50
0.0
accuracy
0.8
1.0
0.8
1.0
1.0
n * nsamplesvec
5000
20000
500
500
2000
2000
5000
5000
20000
20000
n * nsamplesvec
n * nsamplesvec
50
100 200
500
2000
n * nsamplesvec
5000
20000
4
4
4
6
8
3
0
2
4
6
8
xlim
0
0
1
0
xlim
0
2
2
2
1 0
1
ylim
2 1
2
1
ylim
ylim
3 2
3
ylim
4 3
4
3
ylim
ylim
ylim ylim
2 12
1 01
0 0
0.2
0.0
0.2
0.0
0
n * nsamplesvec
soft interventions
3 2 3
4 3 4
1.0
0.8
0.8
0.6
0.6
0.4
errCevec1 errCevec1
0.4
0.2
0.8
0.6
0.6
0.4
errBvec1 errBvec1
0.4
0.2
accuracy
0
0
0 2
2
4
2 4
6
xlim
50
0.0
surgical interventions
1.0
0.8
4
4
1.0
1.0
(a) Raf:
(b)
Akt:
(a)
Raf:
(b) Akt:
tal data
fromfrom
models for a variety
strategies.
tal data
these
of strategies.
(a) Raf
(b) Akt
(direct
effects) models for a variety
(latents)
Σe of
Bthese
(a) Raf:
(b) Akt:
First,
we considered
surgical
interventions
only.only.
We We background
First,
we considered
surgical
interventions
1.0
1.0
compared
the the
strategy
of intervening
on each
variable
compared
strategy
of intervening
on each
variable inh. Akt
0.8
0.8
separately
(requiring
8 experiments
to satisfy
the the
pairpair inh. PKC
separately
(requiring
8 experiments
to satisfy
0.6
0.6
condition)
to intervening
on pairs
of variables
(also(also inh. PIP2
condition)
to intervening
on
pairs
of variables
0.4
0.4
a minimum
of 8ofexperiments)
to intervening
on four
a minimum
8 experiments)
to intervening
on four inh. Mek
0
2
4
6
8
0
2
4
6
8
variables
time
(requiring
only
2 log22log
8=
6
expervariables
a time
(requiring
8
=
6
exper0.2 at aat
0.2 only
2
log( abundance )
log( abundance )
iments),
a variety
of sample
sizes.
In Figure
2 2 Figure 3: Histograms
iments),
with
a variety
of sample
sizes.
In Figure
0.0 with
0.0
of Raf and Akt in different ex50 200
2000
20000accuracy
200
2000 of
20000
(top(top
row)row)
we
show
the the
accuracy
as50 aas
function
the
we show
a function
of the perimental
conditions.
Top
histograms
Raf
Figure
3:
Histograms
ofofof
Raf
and
Akt
in in
different
ex- exFigure
Histograms
Raf
and
Akt
indifferent
different
ex- exFigure
3: Histograms
ofrow:
Raf
and
Akt
inof different
3:3: Histograms
Raf
and
Akt
totaltotal
number
of samples
usedused
for
each
strategy.
TheTheFigure
number
of samples
for each
strategy.
and
Akt
in
the
background
condition
(α-CD3/28).
1.0
1.0
perimental
conditions.
Top
row:
histograms
of
Raf
perimental
conditions.
TopTop
row:row:
histograms
ofRaf
Raf
perimental
conditions.
histograms
of Raf
perimental
conditions.
Top
row:
histograms
of
main
result
is that
experiments
involving
interventions
rows, histograms
of Raf
and Akt
main
result
is that
experiments
involving
interventions Second-to-bottom
and and
Akt
in
condition
(α-CD3/28).
0.8
0.8
and
AktAkt
in the
the
background
condition
(α-CD3/28).
in background
the
background
condition
(α-CD3/28).
and
Akt
in
the
background
condition
(α-CD3/28).
the four stimulusrows,
conditions
whereofselectively
Akt,
on aonnumber
of variables
are are
more
effective
thanthan
ex- ex- in
a number
of variables
more
effective
Second-to-bottom
histograms
Raf
andRaf
Aktand
Second-to-bottom
rows,
histograms
ofRaf
Raf
and
AktAkt
Second-to-bottom
rows,
histograms
of
0.6
0.6
Second-to-bottom
rows,
histograms
ofmanipulated.
and
Akt
PIP2,
and
Mek
(respectively)
were
periments
of single
interventions,
reflecting
the the
larger
periments
of single
interventions,
reflecting
larger PKC,
in the four stimulus conditions where selectively Akt,
in
the
four
stimulus
conditions
where
selectively
Akt,
in
the
four
stimulus
conditions
where
selectively
Akt,
in
the
four
stimulus
conditions
where
selectively
Akt,
0.4
0.4
[Will
numbers and were
labelsmanipulated.
big enough.
8 exp, 1 interv / exp
PKC, manually
PIP2, andmake
Mek (respectively)
number
of constraints
obtained
per per
experiment.
number
of constraints
obtained
experiment.
8 exp, 2 interv / exp
PKC,
PIP2,
and
Mek
(respectively)
were
manipulated.
PKC,
PIP2,
and
Mek
(respectively)
were
manipulated.
PKC,
PIP2,
and
Mek
(respectively)
were
manipulated.
-Patrik]
0.2
0.2
[Will manually make numbers and labels big enough.
6 exp, 4 interv / exp
Next,
we
investigated
the the
efficiency
of soft
Next,
we investigated
efficiency
1of
exp,soft
8intervenintervinterven/ exp
-Patrik]
0.0
0.0
inhibitors may just inactivate a given molecule rather
tions
in this
setting.
We
performed
identical
simutions
in50 this
We
performed
identical
200 setting.
2000
20000
50 200
2000
20000simutotalthose
number
of given
samples
totalinstead
numberof
of samples
its abundance
(Sachs
et al. 2005).
Thus,
lations
to those
given
above,
but but
instead
surgical
lations
to
above,
of surgicalthan
that decrease
many of the
known causal
relationships
between
quantities
of
the
intervened-upon
molecules
interventions
we performed
softsoft
interventions
(bottom
interventions
we performed
interventions
(bottommeasured
the measured
signaling
can be found
from
that
many of the
knownmolecules
causal relationships
between
2: 2).
Experiments
oncase,
simulated
data8 with
ob-theare
α-CD3/28
stimulation.
The
four
bottom
rowsrows
show
α-CD3/28
stimulation.
four
bottom
show
not
indicators
ofThe
how
active
they
are,
Figure
2:
Results
onthis
simulated
with
observed
row
inFigure
Figure
In
case,
wedata
additionally
test8test
the
row
in Figure
2).
In
this
we additionally
this
datareliable
using
standard
Bayesian
network
learning
the
measured
signaling
molecules
can
be
found
from
served
variables.
graph
shows
the
accuracy
Figure
2:of Experiments
on
simulated
data
withwhere
8(corob- weand
corresponding
histograms
in
four
separate
experiments
corresponding
histograms
in
four
separate
experiments
so
we
must
impute
a
‘low’
value
for
the
inhibvariables.
graphEach
shows
accuracy
(correlation
strategy
ofEach
performing
a single
experiment
where
we
methods.
Eaton
&
Murphy
(2007)
provide
evidence
strategy
performing
a the
single
experiment
this data using standard Bayesian network learning
relation
between estimates
and
true the
values) as a funcserved
variables.
Each
graph
shows
(corwhere,
in addition
to
α-CD3/28,
were
selecwhere,
in the
addition
to
α-CD3/28,
inhibitors
were
variables.
Second,
note
that inhibitors
there
areevidence
strong
ef-selecthat
many
of
‘specific
perturbations’
may
in fact
between
estimates
and
true
values)
as aaccuracy
function
of(Sur-ited
softly
intervene
on all
variables
simultaneously.
(Sursoftly
intervene
on
all
variables
simultaneously.
methods.
Eaton
&
Murphy
(2007)
provide
tion
of
the
total
number
of
samples
used,
error
bars
relation
between
estimates
and simultaneously
true
values)
a does
funcnot
have
been
localized
toPKC,
single
molecules.
A in
recent
tively
applied
tothat
Akt,
and
Mek
(top(top
to to
in
the
data
bePIP2,
captured
using
acyclic
tively
applied
tocannot
Akt,
PKC,
PIP2,
and
Mek
the gically
total
number
of
samples
used,
with
error as
bars
in-doesfects
gically
intervening
on
all
variables
intervening
on
all
variables
simultaneously
that
many
of
the
‘specific
perturbations’
may
fact
are
The
top plots
show accuracy
of thebars
estionindicated.
of the
total
number
of samples
used,
error
analysis
provided
by Schmidt
&
Murphy
(2009)
was
models
with
targeted
interventions.
For
example,
the
bottom
respectively).
In
panel
(a)
we
see
that
the
inbottom
respectively).
In
panel
(a)
we
see
that
the
indicated.
The
top
plots
show
accuracy
of
the
estimated
not
have
been
localized
to
single
molecules.
A
recent
not
provide
any
information,
but
softly
intervening
not
provide
any
information,
but
softly
intervening
timated
B and The
Σe for
interventions,
while
are indicated.
topsurgical
plots show
accuracy of
the the
es- strong
the
firsteffect
to
apply
ato
technique
for
learning
cyclic
modof
Mek
on
Raf
as
seen
in
panel
(a)
goes
hibitors
applied
PKC
and
Mek
have
a
large
effect
analysis
provided
by
Schmidt
&
Murphy
(2009)
was
B and
Σ
for
surgical
interventions,
while
the
bottom
hibitors
applied
to
PKC
and
Mek
have
a
large
effect
does.)
Results
are
comparable
to
the
surgical
intervene
does.)
Results
are
comparable
to
the
surgical
intervenbottom
gives
results for soft
intertimated row
B and
Σe corresponding
for surgical interventions,
while
the
els.
the Raf,
firstRaf,
to
apply
a technique
learning
the
acyclic
model
given
Sachs
et
al.mod(2005).
on
while
the
effect
(if for
any)
of inhibition
of Akt
rowtions
gives
corresponding
results
forintervening
soft 1interventions.
on
while
the
effect
(ifby
any)
ofcyclic
inhibition
of Akt
tions
case,case,
except
that
softly
intervening
on soft
all
except
that
softly
on variall vari-against
ventions.
Solid
blue:
8 experiments,
variable
interbottom
row
gives
corresponding
results
for
interels. give
Although
these
effects
may
be
accounted
forEllis
using
and
PIP2
is
minimal.
In
(b)
we
see,
meanwhile,
that
and
PIP2
is
minimal.
In
(b)
we
see,
meanwhile,
that
To
the
reader
a
sense
of
the
data
we
follow
ables
is
a
superior
strategy
in
terms
of
the
accuracy
as
vened
per
experiment.
Dotted
green:
8
experiments,
ables
is
a
superior
strategy
in
terms
of
the
accuracy
as
ventions. Solid blue: 8 experiments, 1 variable interthe
framework
of
‘uncertain
interventions’
(Eaton
and
&
Wong
(2008)
and
plot
in
Figure
3
histograms
of
two
all
four
experimental
conditions
affect
Akt,
though
the
all
four
experimental
conditions
affect
Akt,
though
the
variables
intervened
per
experiment.
Dashed
red:
aalyzed
function
of
the
total
number
of
samples
used.
a 2vened
function
of
the
total
number
of
samples
used.
To give the reader a sense of the data we follow Ellis
perseveral
experiment.
Dotted green:
8 experiments,
using
techniques.
Sachs et
al. (2005)
out
of
the
eleven
measured
variables,
in
five
different
Murphy
2007),
our
approach
is
to
account
for
them
us6
experiments,
4
variables
intervened
per
experiment.
effect
is
again
most
pronounced
with
the
inhibitor
on
effect
is
again
most
pronounced
with
the
inhibitor
on
&
Wong
(2008)
and
plot
in
Figure
3
histograms
of
two
variables
per experiment.
Dashed
red:
show2that
manyintervened
known causal
relationships
between
experimental
conditions.
Panel
(a)
shows
thethat
response
Dash-dotted
black:
1
experiment,
all
variables
intering
cyclic
models.
Finally,
we
emphasize
some
of
PKC.
PKC.
out
of
the
eleven
measured
variables,
in
five
different
6
experiments,
4
variables
intervened
per
experiment.
5.2
FLOW
CYTOMETRY
FLOW
CYTOMETRY
the 5.2
measured
signaling
moleculesDATA
canDATA
be found using
of
Raf
while (b)
displays
the
activity
of Akt.
The top
vened
with soft
interventions.
[I will
make the
experimental
settingsPanel
with
‘specific’
perturbations
experimental
conditions.
(a)
shows
the
response
Dash-dotted
black:
1 experiment,
all manually
variables
interstandard
Bayesian
network
learning
methods
on plots.
this
A
number
of
issues
are
worth
pointing
out.
First,
notenote
A
number
of
issues
are
worth
pointing
out.
First,
rows
show
the
responses
to
the
‘background
condition’
font
sizes
bigger
etc
once
we
are
ok
with
the
dataset
were
actually
targeted
interofthe
Raforiginal
while (b)
displays
thenot
activity
of Akt.
The
top
with soft
[I will
manually
make
To
demonstrate
howinterventions.
our(2007)
algorithm
might
be used
in inin
Tovened
demonstrate
how
our
algorithm
might
bethat
used
data.
Eaton
&
Murphy
provide
evidence
that
the
Akt
inhibitor
has
a
minimal
(if
any)
effect
that
the
Akt
inhibitor
has
a
minimal
(if
any)
(representing
the
passive
observational
data
case)
of
Consider
taking
out
Σ
plots
to
save
space?
-Patrik]
e
rows show
the responses
to the ‘background
condition’
because
they changed
the background
condi-effect
fontwe
sizes
bigger
etc application
once wetoare
with
thecytometry
plots. ventions
practice
its application
the
flow
cytometry
practice
we‘specific
show
its
took
the
flow
many
of theshow
perturbations’
may
in
fact
not
α-CD3/28
stimulation.
The
four
bottom
rows
show
on
the
measured
quantity
of
Akt.
As
pointed
out out
on
the
measured
quantity
of
Akt.
As
pointed
(representing
the
passive
observational
data
case)
Consider
taking
out(2005).
Σe This
plotsThis
to save
space? of
-Patrik]
In particular, all excitatory perturbations of
were
data
of
Sachs
et
al.
data
consists
data
of Sachs
et(2005).
al.
data
of singletions.
have
been
localized
to single
targets.
A consists
recent single
analcorresponding
histograms
in
four
separate
experiments
by
Sachs
et
al.
(2005),
this
is
because
the
inhibitors
by
Sachs
et
al.
(2005),
this
is
because
the
inhibitors
α-CD3/28
stimulation.
The
four
bottom
rows
show
without the α-CD3/28 background stimucell
recordings
of Schmidt
the
abundance
of 11
different
recordings
of the
abundance
of(2009)
11 different
phos-performed
where,
in just
addition
to α-CD3/28,
inhibitors
were
selecysiscell
provided
by
&variables
Murphy
wasphosthe
gically
intervening
on all
simultaneously
does
may
just
inactivate
a given
molecule
rather
thanthan
de- demay
inactivate
a four
given
molecule
rather
corresponding
histograms
in
separate
experiments
lus,
and
hence
any differences
to
the passive
observaphoproteins
and
phospholipids
under
a
number
of
exphoproteins
and
phospholipids
under
a
number
of
extively
applied
to
Akt,
PKC,
PIP2,
and
Mek
(top
to of of
first to
apply
a
technique
for
learning
cyclic
models.
not
provide
any
information,
but
softly
intervening
where,
in
addition
to
α-CD3/28,
inhibitors
were
seleccrease
its
abundance.
Thus,
measured
quantities
gically intervening on all variables simultaneously does tional
crease
its
abundance.
Thus,
measured
quantities
case
observed in
these (a)
conditions
may
due
perimental
conditions
in human
T-cells.
In each
condiperimental
conditions
in human
T-cells.
In
each
condi- tively
bottom
respectively).
panel
weand
seeMek
that
thebeindoes.)
Results
areinformation,
comparable
tobut
the
surgical
intervenapplied
to Akt, In
PKC,
PIP2,
(top
to indicanot the
provide
any
softly
intervening
the
intervened-upon
molecules
are
not
reliable
indicathe
intervened-upon
molecules
are
not
reliable
To tion,
give
reader
a
sense
of
the
data
we
follow
Ellis
to
either
the
excitatory
stimulus
or
the
change
in
the
tion,
measurements
were
obtained
from
some
700–900
measurements
were
obtained
some
hibitors respectively).
applied to PKC
and Mek
have
a largethe
effect
tions
case,
except
that
softly
intervening
onintervenall 700–900
varibottom
Inthey
panel
(a) and
we
see
in-impute
does.)
Results
are comparable
to
the from
surgical
tors
of how
active
they
are,
and
so
we
must
impute
a a
tors
of how
active
are,
sothat
wewemust
&
Wong
(2008)
and
plot
in
Figure
3
histograms
of
two
background
condition.
For
this
reason
exclude
on
Raf,
while
the
effect
(if
any)
of
inhibition
of
Akt
single
cells.
These
datastrategy
havesoftly
been
previously
analyzed
ables
is
a superior
in terms
of the accuracy
as
single
cells.
These
data
have
been
previously
hibitors
applied
to
PKC
and
Mek
have
a large effect
tions
case,
except
that
intervening
on
allanalyzed
vari‘low’
value
into
the
inhibited
variables.
‘low’
value
into
the
inhibited
variables.
outusing
of
the
measured
variables,
in
five
different
from
analysis.
and
PIP2
isour
minimal.
In (if
(b)any)
we see,
meanwhile,
a
of
number
of
samples
used.
using
afunction
number
ofthe
techniques.
Sachs
etof
al.
(2005)
show
aiseleven
number
oftotal
techniques.
Sachs
et
al.
(2005)
showthem
on Raf,
while
the
effect
of inhibition
of that
Akt
ables
a superior
strategy
in
terms
the
accuracy
as
conditions.
Panel
(a)
shows
the
response
of
Raf
while
all
four
experimental
conditions
affect
Akt,
though
thethe
Second,
note
that
there
arewe
strong
effects
in the
datadata
thatthat
many
of the
known
causal
relationships
between
and Second,
PIP2
is of
minimal.
Inthere
(b)
see,
meanwhile,
that
note
are
strong
effects
in
many
of the
the
known
causal
betweenSince
a function
total
number
of relationships
samples used.
each
thethat
four
remaining
inhibitory
experi(b)
displays
the
activity
of
Akt.
The
top
rows
show
the
effect
is
again
most
pronounced
with
the
inhibitor
onwith
all
four
experimental
conditions
affect
Akt,
though
the
thatthat
cannot
be captured
using
acyclic
models
with
tar-tarthe the
measured
signaling
molecules
can
be found
fromfromments
cannot
be captured
using
acyclic
signaling
molecules
can
be found
only
targeted
a single
molecule,
the models
experimen5.2measured
FLOW
CYTOMETRY
DATA
PKC.
responses
to the
‘background
condition’
of α-CD3/28
effect
isinterventions.
again
most
pronounced
with to
thethe
inhibitor
on effect
geted
For
example,
the
strong
effect
of of
this
data
using
standard
Bayesian
network
learning
geted
interventions.
For
example,
the
strong
this
data
using
standard
Bayesian
network
learning
tal
effects
here
directly
correspond
total
effects
5.2 FLOW
CYTOMETRY
DATA
stimulation.
The
four
rows
show
correspondPKC.
To demonstrate
how
our algorithm
might
be
used
in t(x
A number
of
issues
are
worth
pointing
out.
First,
note
Mek
on
Raf
as
seen
in
panel
(a)
goes
against
the
acyclic
methods.
Eaton
&
Murphy
(2007)
provide
evidence
Mek
on
Raf
as
seen
in
panel
(a)
goes
against
the
acyclic
methods.
Eaton
&bottom
Murphy
(2007)
provide
evidence
x
).
Our
measure
of
total
effects
essentially
dei
j
ing that
histograms
in
four
separate
experiments
where,
in fact model
practice
show
its
application
to the
flow
cytometry
that
thegiven
Akt
inhibitor
has
apointing
minimal
(if
any)
effect
by
Sachs
et the
al.
(2005).
that
many
ofwe
the
‘specific
may
in
fact
model
byareSachs
et
al. (2005).
many
of
the
‘specific
perturbations’
may
in
To demonstrate
how
ourperturbations’
algorithm
might
be
used
in termines
A number
ofgiven
issues
worth
out.
First,
note in
the
deviation
of
means
of
the
variables
addition
to
α-CD3/28,
inhibitors
were
selectively
apdata
of
Sachs
et
al.
(2005).
This
data
consists
of
single
on
the
measured
quantity
of
Akt.
As
pointed
out
not not
have
beenwe
localized
to single
molecules.
A recent
have
been
localized
to single
molecules.
A recentthe
practice
show
its application
to the
flow cytometry
thatexperimental
the Akt inhibitor
has a minimal
(if passive
any) effect
conditional
from
the
obserFinally,
we
emphasize
that
some
ofthe
the
experimenFinally,
we
emphasize
that
some
of
the
experimenplied
to
Akt,
PKC,
PIP2,
and
Mek
(top
to
bottom
cell
recordings
of
the
abundance
of
11
different
phosby
Sachs
et
al.
(2005),
this
is
because
inhibitors
dataprovided
of Sachs
etbyal.Schmidt
(2005).
This
data
consists
of single
on the measured
of Akt.
As standard
pointed out
analysis
& Murphy
(2009)
waswasvational
analysis
provided
by
Schmidt
& Murphy
(2009)
condition,quantity
normalized
by the
devital
settings
with
‘specific’
perturbations
in
the
orig-origtal
settings
with
‘specific’
perturbations
in
respectively).
Inand
panel
(a)
we seeunder
that11athe
inhibitors
phoproteins
phospholipids
number
of
exmay
just inactivate
a given
rather
than de-the
cellfirst
recordings
abundance
different
phosby Sachs
et al.
(2005),
this molecule
is because
theFor
inhibitors
the the
first
to
apply
a oftechnique
for learning
cyclic
modto apply
athetechnique
foroflearning
cyclic
mod-ation
in the
passive
observational
case.
instance,
inal
dataset
were
not
actually
targeted
interventions
inal
dataset
were
not
actually
targeted
interventions
perimental
human
T-cells.
In each
crease
its inactivate
abundance.
Thus,molecule
measured
quantities
of
applied
to PKCconditions
andphospholipids
Mekinhave
a under
large
on condiRaf,
phoproteins
and
aeffect
number
of
ex- the
maytotal
just
a given
rather
than deels. els.
effects of PKC,
PIP2,are
and
Mek
(respectively)
obtained
from
some
700–900
the
intervened-upon
molecules
not
reliable
indicabecause
they
changed
themeasured
background
conditions.
because
they
changed
the
background
conditions.
whiletion,
the measurements
effect conditions
(if any) were
ofininhibition
of Akt
and
PIP2
perimental
human
T-cells.
In
each
condi- onto
crease
its
abundance.
Thus,
quantities
Akt
were
−0.967
± 0.021
, −0.153
0.031,ofaand
cells.
These
data
have
been
previously
tors
of
how
active
are,
andare
so
we
must±impute
allthey
excitatory
perturbations
werewere
per-perTo
give
the
reader
awe
sense
of
the
data
we
follow
EllisEllis In
is minimal.
In reader
(b)
see
that
all
four
experimental
In
particular,
all
excitatory
perturbations
Tosingle
give
the
awere
sense
of
the
data
we analyzed
follow
tion,
measurements
obtained
from
some
700–900
the particular,
intervened-upon
molecules
not
reliable
indica−0.256
± 0.048;
the
effects variables.
are negative because an
using
a
number
of
techniques.
Sachs
et
al.
(2005)
show
‘low’
value
into
the
inhibited
without
the
α-CD3/28
stimulus,
&
Wong
(2008)
and
plot
in have
Figure
3 histograms
ofmost
two
conditions
affect
Akt,
though
effect
is again
cells.
These
data
been
previously
analyzed
torsformed
of how
active
theythe
are,α-CD3/28
and so background
we background
must impute
astimulus,
without
&single
Wong
(2008)
and
plot
inthe
Figure
3 histograms
of two formed
inhibition of the cause increased the amounts of the
number
of measured
techniques.
Sachs
show
‘low’and
value
into
the
inhibited
pronounced
with
the
inhibitor
on
PKC.et
and
hence
any any
differences
tovariables.
the
passive
observational
out
ofusing
the
eleven
measured
variables,
in al.
five
different
hence
differences
to the
passive
observational
out
of athe
eleven
variables,
in(2005)
five
different
effect molecule. (Estimates of errors were obtained
observed
in these
conditions
maymay
be due
to eiexperimental
conditions.
Panel
(a) shows
the the
response
observed
in these
conditions
be due
to eiexperimental
conditions.
Panel
(a) shows
response casecase
A number of issues are worth pointing out. First, note
using a bootstrap approach.) These numbers correthe the
excitatory
stimulus
or the
change
in the
backof Raf
while
(b) (b)
displays
the the
activity
of Akt.
TheThe
top top therther
excitatory
stimulus
or the
change
in the
backof Raf
while
displays
activity
of Akt.
that the Akt inhibitor has a minimal (if any) effect
spond to the difference between the top row and the
ground
condition.
For For
thisthis
reason
we exclude
them
rowsrows
show
the the
responses
to the
‘background
condition’
ground
condition.
reason
we
exclude
them
show
responses
to
the
‘background
condition’
on the measured quantity of Akt. This is because the
bottom three rows (respectively) of Figure 3b.
our our
analysis.
(representing
the the
passive
observational
datadata
case)
of of fromfrom
analysis.
(representing
passive
observational
case)
xlim
4 6
xlim
8
6 8
8
0
0
2
0 2
2 4
4
xlim
6
xlim
4 6
8
xlim
6 8
8
Discovery of Linear Cyclic Models with Latent Variables
Table 2: Estimated direct effects from the Sachs et
al. data, as estimated from (a) the experiments with
a background condition of α-CD3/28; and (b) with a
background condition of α-CD3/28 + ICAM-2. The
tables are read from column to row so, for instance, the
direct effect of PKC on PIP2 is −0.90 and −0.61 in the
two settings. Errors given by a bootstrap analysis were
small compared to the size of the coefficients.
(a)
(b)
Akt
PKC PIP2 Mek
Akt
-0.91 -0.22 0.22
PKC -0.15
-0.09 0.48
PIP2 -0.43 -0.90
0.40
Mek -0.27 -1.69 -0.22
Akt
Akt
PKC
PIP2
Mek
PKC PIP2
Mek
-1.34 -0.24 -0.26
0.13
-0.10 -0.08
0.09 -0.61
-0.01
0.10 -0.93 -0.11
Given the total effects among the four intervened-upon
molecules, we derived the corresponding direct effects
among them. Note that these direct effects are direct
only relative to these four molecules; the effects are
presumably mediated by the other (both measured and
unmeasured) molecules in the signaling system. The
result is shown in Table 2a. The most pronounced result is that PKC has strong negative connections to
the other three variables. Similarly, PIP2 and Akt
have weak negative effects, and Mek weak positive direct effects on the other variables.
While it is obvious that linearity is a very strong (and
indeed quite unreasonable) assumption for this data,
we nevertheless hope that we can use it as a first and
very rough approximation. Fortunately, the dataset
contains some additional data that can be used to at
least partly corroborate the results from our analysis:
In addition to the α-CD3/28 background condition,
there are similar experimental data in which a reagent
called ICAM-2 was added to the background, both
with or without the inhibitory targeted interventions.
Using this additional data, we were able to confirm the
strong inhibitory direct effect (with respect to these
four variables) of PKC on the other three molecules,
as shown in Table 2b. Similarly, PIP2 still has a weak
negative effect on the other variables. On the other
hand, in this additional data the effect of Akt and of
Mek on the other variables is weak and of opposite
sign to the results in (a), indicating that these connections should be treated as unreliable at best. Thus, in
this respect, the model fits the ‘ground truth’ graph
of Sachs et al. (2005) relatively well, as there were no
directed paths from these variables to the other of the
four variables in their graph.
7
CONCLUSIONS
We have described (and provided a software package
for) a general algorithm for discovery of linear causal
models using experimental data. The algorithm can
handle cyclic structures and discover the presence and
location of latent variables. It is ‘online’ in that it
works with any set of experimental data, and indicates whether the system is underdetermined, overdetermined or exactly determined. The method of integrating data from different experiments resolves conflicts in a way that is optimal with regard to a specified
measure. In an underdetermined system the algorithm
provides a minimally satisficing result, and we have
suggested how to proceed by selecting more experiments, or by characterizing the underdetermination.
It should be noted that in addition to the generality of
structures we consider, we do not assume faithfulness.
This results in the (admittedly demanding) identifiability condition we have shown to be necessary and
sufficient. Similarly, if constraints other than those on
the experimental effects are considered, model search
can be made more efficient.
The current version of the algorithm relies on the assumption of linearity. While no corresponding solution
can be supplied for general discrete networks, we are
hopeful that a similar procedure for particular types of
discrete parameterizations (e.g. noisy-or) is possible.
References
Cooper, G. and C. Yoo (1999). Causal discovery from a
mixture of experimental and observational data. In
UAI ’99, pp. 116–125.
Eaton, D. and K. Murphy (2007). Exact Bayesian structure learning from uncertain interventions. In AISTATS ’07.
Eberhardt, F. (2008). Almost optimal intervention sets
for causal discovery. In UAI ’08.
Ellis, B. and W. Wong (2008). Learning causal Bayesian
network structures from experimental data. J. Amer.
Stat. Assoc. 103, 778–789.
He, Y. and Z. Geng (2008). Active learning of causal
networks with intervention experiments and optimal
designs. J. Mach. Learn. Res. 9, 2523–2547.
Murphy, K. P. (2001). Active learning of causal Bayes
net structure. Technical report, U.C. Berkeley.
Nyberg, E. and K. Korb (2006). Informative interventions. In Causality and Probability in the Sciences.
College Publications, London.
Pearl, J. (2000). Causality. Oxford University Press.
Richardson, T. (1996). Feedback Models: Interpretation
and Discovery. Ph. D. thesis, Carnegie Mellon.
Sachs, K., O. Perez, D. Pe’er, D. Lauffenburger, and
G. Nolan (2005). Causal protein-signaling networks
derived from multiparameter single-cell data. Science 308 (5721), 523–529.
Schmidt, M. and K. Murphy (2009). Modeling discrete
interventional data using directed cyclic graphical
models. In UAI ’09.
Tong, S. and D. Koller (2001). Active learning for structure in Bayesian networks. In UAI ’01, pp. 863–869.
© Copyright 2026 Paperzz