Causal Graphical Models
David Sobel
Guest Lecture, CLPS1291: Computational Toolbox for the Mind, Brain, and Behavior

What's a Causal Graphical Model?
• A representation of a joint probability distribution
• What's a joint probability distribution?
  – Events occur in the world in certain combinations.
  – Each combination has a probability (which reflects the causal structure among the events).

Example
• Suppose I toss a die (X) and a coin (Y).
• What is p(X = 1 and Y = "heads")?
• Write out the joint distribution p(x, y):

  p(x, y)         1     2     3     4     5     6    Row totals
  Heads           a     b     c     d     e     f    α
  Tails           g     h     i     j     k     l    β
  Column totals   γ     δ     ε     ζ     θ     ψ    ω

• For a fair die and a fair coin, every cell is 1/12:

  p(x, y)         1     2     3     4     5     6    Row totals
  Heads          1/12  1/12  1/12  1/12  1/12  1/12  1/2
  Tails          1/12  1/12  1/12  1/12  1/12  1/12  1/2
  Column totals  1/6   1/6   1/6   1/6   1/6   1/6   1

• When events are independent, p(x, y) = p(x)p(y).

When dependent?
• Chain rule: p(x1, ..., xn) = Πj p(xj | x1, ..., xj−1), for any ordering of the variables
• Thus p(x, y) = p(x | y)p(y) or p(y | x)p(x).
• Aside: from this we can derive Bayes' rule:
  – p(x, y) = p(y | x)p(x)
  – p(x | y)p(y) = p(y | x)p(x)
  – p(x | y) = p(y | x)p(x) / p(y)

Representing p(x1, ..., xn) = Πj p(xj | x1, ..., xj−1)
• We could represent this with a full table, like the one above.
• But that's pretty inefficient (note: 12 entries, just for 2 variables). In general, a full table over n binary variables needs O(2^n) entries.

A better way (Pearl, 1985, 1988)
• Given a probability distribution over n variables, suppose we observe that each Xj is sensitive not to all the other variables, but only to a subset paj:
  – i.e., p(xj | x1, ..., xj−1) = p(xj | paj)
• The variables in paj then become the parents of Xj in a graph.
  – This lets us build a directed acyclic graph, a DAG (aka a graphical model).
  – Many fewer parameters: roughly O(n) when each variable has only a few parents.

What makes it a Causal Graphical Model?
• Three assumptions:
  1) Links between nodes specify some kind of causal mechanism.
  2) What we see is what there is (Faithfulness).
  3) Links specify conditional independence relations among events (the Markov Assumption).

  Raining Yesterday → Raining Today → Raining Tomorrow

• Rain yesterday and rain tomorrow are independent, given that you know today's weather.

Links with human reasoning
• Do we reason according to the Markov Assumption?
• Answer: it depends on who you talk to.
  – In children: yes (e.g., Buchanan & Sobel, 2010, 2011; Gopnik, Sobel, Schulz, & Glymour, 2001; Sobel, Tenenbaum, & Gopnik, 2004; Sobel & Kirkham, 2006; Sobel & Sommerville, 2009, 2010)
  – In adults:
    • Some say no (e.g., Rehder & Burnett, 2005; Rehder, 2014; Park & Sloman, 2015; Walsh & Sloman, 2008)
    • Some say yes (e.g., Buchanan, Tenenbaum, & Sobel, 2010; Griffiths, Sobel, Tenenbaum, & Gopnik, 2011)
• There's a fun aside here, but that's for another day (because I want to give an example).

The Blicket Detector
• We need a method for asking about children's causal reasoning without bringing in much a priori knowledge: the blicket detector.
• Certain objects placed on the detector make it "activate."
  – This presents a novel, nonobvious causal property of objects.

Gopnik et al. (2001)
• One-Cause Condition:
  – Object A activates the detector by itself.
  – Object B does not activate the detector by itself.
  – Both objects together activate the detector (demonstrated twice).
  – Children are asked which one is the blicket.
• Two-Cause Condition:
  – Object A activates the detector by itself (demonstrated three times).
  – Object B does not activate the detector by itself (demonstrated once).
  – Object B activates the detector by itself (demonstrated twice).
  – Children are asked which one is the blicket.

Results
• Gopnik et al. (2001): % of children who say yes to each question.
• Sobel and Kirkham (2006): choice in response to "Make it Go!"
with Toddlers
[Figure: bar charts of the % of children choosing Object A, Object B, or another response: 3- and 4-year-olds (Gopnik et al., 2001) and 18- and 24-month-olds (Sobel & Kirkham, 2006).]

Reasoning about Ambiguity (Sobel et al., 2004)
• Preschoolers respond differently across conditions.
[Figure: bar chart of the % choosing Object A vs. Object B in the Inference and Backwards Blocking conditions.]

Implications
• This is not associative reasoning.
• Priors?
  – New experiment with 4-year-olds: prior to backwards blocking, show children that 2/12 or 10/12 identical objects activate the machine. Then give backwards blocking.
    • A is always a blicket.
    • When blickets are rare: B is not a blicket (~17%).
    • When blickets are common: B is a blicket (~88%).

Modeling this (Griffiths, Sobel, Tenenbaum, & Gopnik, 2011)
• Three steps:
  – 1) Create a hypothesis space of DAGs.
  – 2) Assign priors to the hypotheses.
  – 3) Reason via Bayes' rule.

Step 1
• Assume there are objects in the world called blickets and objects in the world called blicket detectors, such that blickets activate blicket detectors and non-blickets do not.
• The question of whether an object is a blicket can be formulated as the question of whether a causal relation exists between placing that object on the detector and the detector activating (represented by an Object → Detector link in the DAG).
• For backwards blocking, there are two objects (A and B) and one detector (E).

Step 1 continued
• The hypothesis space is infinite, but it is constrained by three assumptions:
  – Spatial independence
  – Temporal priority
  – The activation law
• Activation law: the blicket detector activates if and only if one or more blickets are placed on top of it.
  – p(e | X, X → E is in the graph) = 1

Do children reason according to the activation law?
• Yes.
• If they don't, they don't look like the kids in Sobel et al. (2004) (Sobel & Munro, 2009).

Step 2: Assign Priors
• Assume each object is a blicket with probability r, independently. The four candidate graphs then have priors:

  Graph 0 (no links):         (1 − r)^2
  Graph 1 (A → E only):       r(1 − r)
  Graph 2 (B → E only):       r(1 − r)
  Graph 3 (A → E and B → E):  r^2

• You can do the math yourself to show these sum to 1.
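Steps 1 and 2 can be sketched in a few lines of code. This is a minimal illustration, not the original model code: each hypothesis graph is encoded simply as the set of objects with a link into the detector E, and r = 1/6 is the illustrative "blickets are rare" prior.

```python
from fractions import Fraction

# Hypothesis space for two objects: graphs 0-3 are {}, {A}, {B}, {A, B},
# where membership means "this object has a link into the detector E".
objects = ("A", "B")
graphs = [frozenset(), frozenset("A"), frozenset("B"), frozenset("AB")]

r = Fraction(1, 6)  # prior probability that any one object is a blicket

def prior(graph):
    # p(graph) = r^(number of blickets) * (1 - r)^(number of non-blickets)
    return r**len(graph) * (1 - r)**(len(objects) - len(graph))

def p_activates(graph, on_detector):
    # Activation law: E activates iff at least one blicket is on it.
    return 1 if any(obj in graph for obj in on_detector) else 0

for g in graphs:
    print(sorted(g), prior(g))

print(sum(prior(g) for g in graphs))  # the four priors sum to 1
```

The `Fraction` arithmetic keeps the priors exact, so the "sum to 1" check from the slide holds with no rounding error.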
Step 3: Reason
• A simple application of Bayes' rule.
• We know the priors: p(h).
• The activation law gives p(d | h) for each graph:

  Graph  E+|A+B+  E+|A+B−  E+|A−B+  E+|A−B−
    0       0        0        0        0
    1       1        1        0        0
    2       1        0        1        0
    3       1        1        1        0

Turn the Bayesian Crank
• Recall the data: A and B together activate the machine, then A alone activates the machine.

  Graph   Prior      After AB event    After A event
    0     (1 − r)^2        0                 0
    1     r(1 − r)   (1 − r)/(2 − r)       1 − r
    2     r(1 − r)   (1 − r)/(2 − r)         0
    3     r^2           r/(2 − r)            r

Result
• The probability that an object is a blicket is the sum of the probabilities of all graphs in which the causal relation exists:

  Object  Prior  After AB event  After A event
    A       r      1/(2 − r)          1
    B       r      1/(2 − r)          r

Back to Data
• Kid data:
  – Prior is 1/6: estimates of A = 1, B = .17
  – Prior is 5/6: estimates of A = 1, B = .88
• Adult data:

Summary
• Causal graphical models:
  – Represent a joint probability distribution
  – Make assumptions about causal structure
  – Children and adults follow those assumptions*
• Can be used for Bayesian inference to make inferences about ambiguity.
• Homework: objects are rare (2/12); shown AB+, AC+
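The full three-step computation can be turned into a short script. This is a sketch under the slides' assumptions (deterministic activation law, independent priors with parameter r), not the original model code; it reproduces the backwards-blocking tables above.

```python
from fractions import Fraction
from itertools import combinations

def blicket_posteriors(r, events, objects=("A", "B")):
    # Step 1: hypothesis space -- every subset of objects may link into E.
    graphs = [frozenset(c) for k in range(len(objects) + 1)
              for c in combinations(objects, k)]
    # Step 2: independent priors -- r per blicket, (1 - r) per non-blicket.
    belief = {g: r**len(g) * (1 - r)**(len(objects) - len(g)) for g in graphs}
    # Step 3: Bayes' rule, one event at a time.
    for on_detector, activated in events:
        for g in graphs:
            # Activation law: E activates iff a blicket is on the detector.
            predicted = 1 if any(obj in g for obj in on_detector) else 0
            if predicted != activated:
                belief[g] = Fraction(0)  # graph is inconsistent with the data
        total = sum(belief.values())
        belief = {g: p / total for g, p in belief.items()}  # normalize
    # p(object is a blicket) = sum over graphs containing its link to E.
    return {obj: sum(p for g, p in belief.items() if obj in g)
            for obj in objects}

# Backwards blocking: AB together activates E, then A alone activates E.
rare = blicket_posteriors(Fraction(1, 6), [(("A", "B"), 1), (("A",), 1)])
print(rare)  # A -> 1, B -> 1/6 (about .17, as in the "blickets are rare" data)
```

Because the object set and event sequence are parameters, the homework case (three objects, AB+ followed by AC+, with r = 1/6) can be run through the same function unchanged.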