Causality

Causal Graphical Models
David Sobel
Guest Lecture
CLPS1291: Computational Toolbox for
the mind, brain and behavior
What’s a Causal Graphical Model?
• Representation of a joint probability
distribution
• What’s a joint probability distribution?
– Events occur in the world in certain combinations.
– Each combination has a probability (which reflects
the causal structure among the events)
Example
• Suppose I toss a die (X) and a coin (Y)
• p(X = 1 and Y = “heads”)?
• P(x,y)
p(x,y)          1    2    3    4    5    6    Row Totals
Heads           a    b    c    d    e    f    α
Tails           g    h    i    j    k    l    β
Column Totals   γ    δ    ε    ζ    θ    ψ    ω
Example
• Suppose I toss a die (X) and a coin (Y)
• p(X = 1 and Y = “heads”)?
• P(x,y)
p(x,y)          1     2    3    4    5    6    Row Totals
Heads           1/12  b    c    d    e    f    α
Tails           g     h    i    j    k    l    β
Column Totals   γ     δ    ε    ζ    θ    ψ    ω
Example
• Suppose I toss a die (X) and a coin (Y)
• p(X = 1 and Y = “heads”)?
• P(x,y)
p(x,y)          1     2     3     4     5     6     Row Totals
Heads           1/12  1/12  1/12  1/12  1/12  1/12  1/2
Tails           1/12  1/12  1/12  1/12  1/12  1/12  1/2
Column Totals   1/6   1/6   1/6   1/6   1/6   1/6   1
• When events are independent, p(x,y) = p(x)p(y)
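As a quick illustration (a minimal Python sketch, not part of the original slides), we can build this table and check the independence claim:

    from fractions import Fraction

    # Joint distribution for a fair die (X) and a fair coin (Y):
    # all 12 outcome pairs are equally likely.
    joint = {(x, y): Fraction(1, 12)
             for x in range(1, 7)
             for y in ("heads", "tails")}

    # Marginals = the row and column totals of the table above.
    p_x = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in range(1, 7)}
    p_y = {y: sum(p for (_, yi), p in joint.items() if yi == y) for y in ("heads", "tails")}

    print(joint[(1, "heads")])   # 1/12
    # Independence: every cell equals the product of its marginals.
    assert all(joint[(x, y)] == p_x[x] * p_y[y] for (x, y) in joint)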
When dependent?
• p(x1,...,xn) = ∏j p(xj | x1,…, xj-1) (the chain rule; it holds for any ordering of the variables)
• Thus, p(x,y) = p(x|y)p(y) or p(y|x)p(x)
• Aside: From this we can get Bayes Rule:
– p(x,y) = p(y|x)p(x)
– p(x|y)p(y)= p(y|x)p(x)
– p(x|y) = p(y|x)p(x)/p(y)
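A small numeric check of these identities (a Python sketch with made-up illustrative numbers, not from the lecture):

    from fractions import Fraction as F

    # An arbitrary joint over two binary variables (illustrative values summing to 1).
    joint = {(0, 0): F(1, 8), (0, 1): F(3, 8), (1, 0): F(2, 8), (1, 1): F(2, 8)}
    p_x = {x: joint[(x, 0)] + joint[(x, 1)] for x in (0, 1)}
    p_y = {y: joint[(0, y)] + joint[(1, y)] for y in (0, 1)}

    for (x, y) in joint:
        p_x_given_y = joint[(x, y)] / p_y[y]   # conditional taken straight from the joint
        p_y_given_x = joint[(x, y)] / p_x[x]
        # Bayes' rule: p(x|y) = p(y|x) p(x) / p(y)
        assert p_x_given_y == p_y_given_x * p_x[x] / p_y[y]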
Representing p(x1,...,xn) = ∏j p(xj | x1,…, xj-1)
• Could represent this with a table like this:
p(x,y)          1     2     3     4     5     6     Row Totals
Heads           1/12  1/12  1/12  1/12  1/12  1/12  1/2
Tails           1/12  1/12  1/12  1/12  1/12  1/12  1/2
Column Totals   1/6   1/6   1/6   1/6   1/6   1/6   1
• But that’s pretty inefficient (note: 12 entries just for 2 variables). In general, the full table needs O(2^n) entries for n binary variables.
A better way (Pearl, 1985, 1988)
• Given a probability distribution over n variables V, if we observe that Xj is sensitive not to all of the other variables but only to a subset paj
– i.e., p(xj | x1,…, xj-1) = p(xj | paj)
• Then the variables in paj become the parents of Xj in a graph
– Allows us to build a DAG (aka, a graphical model)
– Many fewer numbers to specify: roughly O(n) when each variable has only a few parents (see the sketch below)
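To see the savings concretely, here is a rough parameter count (a sketch assuming binary variables and a simple chain X1 → X2 → … → Xn; the numbers are mine, not from the lecture):

    def full_joint_params(n):
        # Full joint table over n binary variables: 2^n entries, 2^n - 1 free parameters.
        return 2 ** n - 1

    def chain_params(n):
        # Chain X1 -> X2 -> ... -> Xn: p(x1) takes 1 number,
        # each p(xj | xj-1) takes 2 (one per value of its parent).
        return 1 + 2 * (n - 1)

    for n in (2, 5, 10, 20):
        print(n, full_joint_params(n), chain_params(n))
    # n = 20: 1,048,575 numbers for the full table vs. 39 for the chain.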
What makes it a Causal Graphical
Model?
• Three Assumptions
1) Links between nodes specify some kind of causal
mechanism
2) What we see is what there is (Faithfulness)
3) Links specify conditional independence relations
among events (Markov Assumption)
Raining Yesterday → Raining Today → Raining Tomorrow
• Raining yesterday and tomorrow are independent, given you know today’s weather
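A minimal sketch of this example (Python; the rain probabilities are made up purely for illustration):

    from fractions import Fraction as F

    p_yest  = {True: F(3, 10), False: F(7, 10)}                  # p(yesterday)
    p_today = {True: {True: F(6, 10), False: F(4, 10)},          # p(today | yesterday)
               False: {True: F(2, 10), False: F(8, 10)}}
    p_tom   = {True: {True: F(6, 10), False: F(4, 10)},          # p(tomorrow | today)
               False: {True: F(2, 10), False: F(8, 10)}}

    # Joint implied by the graph Yesterday -> Today -> Tomorrow.
    joint = {(y, t, m): p_yest[y] * p_today[y][t] * p_tom[t][m]
             for y in (True, False) for t in (True, False) for m in (True, False)}

    def p_rain_tomorrow(today, yesterday=None):
        keep = {k: v for k, v in joint.items()
                if k[1] == today and (yesterday is None or k[0] == yesterday)}
        return sum(v for k, v in keep.items() if k[2]) / sum(keep.values())

    # Markov assumption: once today's weather is known, yesterday's adds nothing.
    assert p_rain_tomorrow(True, True) == p_rain_tomorrow(True, False) == p_rain_tomorrow(True)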
Links with human reasoning
• Do we reason according to the Markov Assumption?
• Answer: Depends on who you talk to
– In children: Yes (e.g., Buchanan & Sobel, 2010, 2011; Gopnik, Sobel, Schulz, Glymour, 2001; Sobel,
Tenenbaum, & Gopnik, 2004; Sobel & Kirkham, 2006; Sobel & Sommerville, 2009, 2010)
– In adults:
• Some say no (e.g., Rehder & Burnett, 2005; Rehder, 2014; Park & Sloman, 2015; Walsh & Sloman,
2008)
• Some say yes (e.g., Buchanan, Tenenbaum & Sobel, 2010; Griffiths, Sobel, Tenenbaum, & Gopnik,
2011)
• There’s a fun aside here, but that’s for another day
(because I want to give an example)
The Blicket Detector
• Need a method for
asking about
children’s causal
reasoning without
bringing in much a priori
knowledge
– The Blicket Detector
• Certain objects placed
on the detector make
it “activate”
– Presents a novel, nonobvious causal
property of objects
Gopnik et al. (2001)
One-Cause Condition
Object A activates the
detector by itself
Object B does not
activate the detector
by itself
Both objects activate
the detector
(Demonstrated twice)
Children are asked
which one is the
blicket
Two-Cause Condition
Object A activates the
detector by itself
(Demonstrated three
times)
Object B does not
activate the detector
by itself
(Demonstrated once)
Object B activates the
detector by itself
(Demonstrated twice)
Children are asked
which one is the
blicket
Results
[Two bar charts]
• Gopnik et al. (2001) Results: % of 3- and 4-year-olds who say “yes” to each question, for Object A vs. Object B
• Sobel and Kirkham (2006) Results: % of 18- and 24-month-olds’ choices (Object A, Object B, or other response) in response to “Make it Go!”
Reasoning about Ambiguity
(Sobel et al., 2004)
Preschoolers respond differently
[Bar chart: % of children choosing Object A vs. Object B, in the Inference and Backwards Blocking conditions]
Implications
• Not associative reasoning
• Priors?
– New experiment with 4-year-olds: prior to backwards blocking, show children that 2/12 or 10/12 identical objects activate the machine. Then give the backwards blocking trials
• A is always a blicket
• When blickets are rare: B is not a blicket (~17%)
• When blickets are common: B is a blicket (~88%)
Modeling this
(Griffiths, Sobel, Tenenbaum, & Gopnik, 2011)
• Three steps
– 1) Create a hypothesis space of DAGs
– 2) Assign priors to hypotheses
– 3) Reason via Bayes’ rule
• Step 1
– Assume there are objects in the world called blickets and objects in
the world called blicket detectors, such that blickets activate blicket
detectors and non-blickets do not activate blicket detectors.
– The question of whether an object is a blicket can be formulated as a
question of whether a causal relation exists between placing that
object on the detector and the detector activating (represented by an
Object → Detector link in the DAG).
– For Backwards Blocking, there are two objects (A and B) and one
detector (E)
Step 1 continued
• Infinite hypothesis space, constrained by three assumptions
– Spatial Independence
– Temporal Priority
– Activation Law
• Activation law: The blicket detector activates if and only if one or
more blickets are placed on top of it.
– i.e., p(e | x, X → E in graph) = 1
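A minimal sketch of the activation law as a likelihood (Python; the graph encoding is my own choice for illustration, not from the paper):

    # Each candidate graph = the set of objects with a causal link to the detector E.
    graphs = [frozenset(), frozenset("A"), frozenset("B"), frozenset("AB")]

    def p_detector_activates(graph, objects_on_detector):
        # Activation law: E activates iff at least one blicket is on it.
        return 1.0 if any(obj in graph for obj in objects_on_detector) else 0.0

    for g in graphs:
        print(sorted(g), [p_detector_activates(g, placed)
                          for placed in ("AB", "A", "B", "")])
    # The printed rows reproduce the likelihood table shown in Step 3 below.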
Do children reason according to
activation law?
• Yes
• If they don’t, they don’t look like kids in Sobel
et al. (2004) (Sobel & Munro, 2009)
Step 2: Assign Priors
• Assume an object is a blicket with probability r
– Graph 0 (no links): (1-r)^2
– Graph 1 (A → E only): r(1-r)
– Graph 2 (B → E only): r(1-r)
– Graph 3 (A → E and B → E): r^2
• You can do the math yourself to show these sum
to 1.
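The same priors in code (a sketch; the value of r is an arbitrary example):

    r = 1 / 6   # example: blickets are rare

    priors = {
        "graph 0 (no links)":       (1 - r) ** 2,
        "graph 1 (A -> E only)":    r * (1 - r),
        "graph 2 (B -> E only)":    r * (1 - r),
        "graph 3 (A -> E, B -> E)": r ** 2,
    }
    assert abs(sum(priors.values()) - 1) < 1e-12   # they sum to 1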
Step 3: Reason
• Simple application of Bayes’ Rule
• We know the priors: p(h)
• The activation law gives p(d|h) for each graph
Graph   E+|A+B+   E+|A+B-   E+|A-B+   E+|A-B-
0       0         0         0         0
1       1         1         0         0
2       1         0         1         0
3       1         1         1         0
Turn the Bayesian Crank
• Recall the data: A and B together activate the machine, then A alone activates the machine
Graph   Prior Probability   After AB Event   After A Event
0       (1-r)^2             0                0
1       r(1-r)              (1-r)/(2-r)      (1-r)
2       r(1-r)              (1-r)/(2-r)      0
3       r^2                 r/(2-r)          r
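Putting the three steps together, here is a sketch of the whole computation in Python (the event encoding and the example value of r are my own choices for illustration):

    r = 1 / 6   # prior probability that any one object is a blicket (example value)

    # Hypotheses: which objects have a causal link to the detector E.
    graphs = [frozenset(), frozenset("A"), frozenset("B"), frozenset("AB")]
    prior = {g: (r if "A" in g else 1 - r) * (r if "B" in g else 1 - r) for g in graphs}

    def likelihood(graph, placed, activated):
        # Activation law: the detector fires iff at least one blicket is on it.
        fires = any(obj in graph for obj in placed)
        return 1.0 if fires == activated else 0.0

    def update(belief, placed, activated):
        post = {g: belief[g] * likelihood(g, placed, activated) for g in belief}
        z = sum(post.values())
        return {g: p / z for g, p in post.items()}

    belief = update(prior, "AB", True)    # A and B together activate the detector
    belief = update(belief, "A", True)    # A alone activates the detector

    # An object is a blicket with the total probability of all graphs containing its link.
    p_A = sum(p for g, p in belief.items() if "A" in g)   # -> 1
    p_B = sum(p for g, p in belief.items() if "B" in g)   # -> r
    print(p_A, p_B)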
Result
Graph   Prior Probability   After AB Event   After A Event
0       (1-r)^2             0                0
1       r(1-r)              (1-r)/(2-r)      (1-r)
2       r(1-r)              (1-r)/(2-r)      0
3       r^2                 r/(2-r)          r
• The probability that an object is a blicket is the sum of the probabilities of all graphs in which the causal relation exists
Object   Prior Probability   After AB Event   After A Event
A        r                   1/(2-r)          1
B        r                   1/(2-r)          r
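• As a concrete walk-through (my arithmetic, taking r = 1/6 as in the “rare” condition): p(A is a blicket) goes from 1/6 to 1/(2 - 1/6) = 6/11 ≈ .55 after the AB event, and to 1 after the A event; p(B is a blicket) goes 1/6 → 6/11 ≈ .55 → 1/6 ≈ .17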
Back to Data
• Kid data:
– Prior is 1/6, Estimates
of A = 1, B = .17
– Prior is 5/6, Estimates
of A = 1, B = .88
• Adult data:
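• For comparison, reading the model’s predictions off the object table above (my arithmetic): with r = 1/6, A = 1 and B = 1/6 ≈ .17; with r = 5/6, A = 1 and B = 5/6 ≈ .83, in line with the children’s estimates above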
Summary
• Causal Graphical Models
– Representation of Joint Probability Distribution
– Makes Assumptions about Causal Structure
– Children and adults follow assumptions*
• Can be used for Bayesian Inference to make
inferences about ambiguity
• Homework: Objects are rare (2/12), shown AB+,
AC+