Bayesian models of human inference
Josh Tenenbaum, MIT

The Bayesian revolution in AI
• Principled and effective solutions for inductive inference from ambiguous data:
– Vision
– Robotics
– Machine learning
– Expert systems / reasoning
– Natural language processing
• Standard view in AI: no necessary connection to how the human brain solves these problems.
– Heuristics & Biases program in the background (“We know people aren’t Bayesian, but…”).

Bayesian models of cognition
Visual perception [Weiss, Simoncelli, Adelson, Richards, Freeman, Feldman, Kersten, Knill, Maloney, Olshausen, Jacobs, Pouget, …]
Language acquisition and processing [Brent, de Marcken, Niyogi, Klein, Manning, Jurafsky, Keller, Levy, Hale, Johnson, Griffiths, Perfors, Tenenbaum, …]
Motor learning and motor control [Ghahramani, Jordan, Wolpert, Kording, Kawato, Doya, Todorov, Shadmehr, …]
Associative learning [Dayan, Daw, Kakade, Courville, Touretzky, Kruschke, …]
Memory [Anderson, Schooler, Shiffrin, Steyvers, Griffiths, McClelland, …]
Attention [Mozer, Huber, Torralba, Oliva, Geisler, Yu, Itti, Baldi, …]
Categorization and concept learning [Anderson, Nosofsky, Rehder, Navarro, Griffiths, Feldman, Tenenbaum, Rosseel, Goodman, Kemp, Mansinghka, …]
Reasoning [Chater, Oaksford, Sloman, McKenzie, Heit, Tenenbaum, Kemp, …]
Causal inference [Waldmann, Sloman, Steyvers, Griffiths, Tenenbaum, Yuille, …]
Decision making and theory of mind [Lee, Stankiewicz, Rao, Baker, Goodman, Tenenbaum, …]

How to meet up with mainstream JDM research (i.e., heuristics & biases)?
1. How to reconcile the apparently contradictory messages of H&B and Bayesian models? Are people Bayesian or aren’t they? When are they, when aren’t they, and why?
2. How to integrate the H&B and Bayesian research approaches?

When are people Bayesian, and why?
• Low-level hypothesis (Shiffrin, Maloney, etc.)
– People are Bayesian in low-level input or output processes that have a long evolutionary history shared with other species, e.g. vision, motor control, memory retrieval.

When are people Bayesian, and why?
• Low-level hypothesis (Shiffrin, Maloney, etc.)
• Information format hypothesis (Gigerenzer)
– Higher-level cognition can be Bayesian when information is presented in formats that we have evolved to process, and that support simple heuristic algorithms; e.g., base-rate neglect disappears with “natural frequencies” (a worked sketch follows below).
[Figure: the same diagnosis problem presented with explicit probabilities vs. natural frequencies.]

When are people Bayesian, and why?
• Low-level hypothesis (Shiffrin, Maloney, etc.)
• Information format hypothesis (Gigerenzer)
• Core capacities hypothesis
– Bayes can illuminate distinctively human cognitive capacities for inductive inference – learning words and concepts, projecting properties of objects, causal inference, or action understanding: problems we solve effortlessly, unconsciously, and successfully in natural contexts, which a five-year-old solves better than any animal or computer.
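To make the information-format point above concrete, here is a minimal sketch of a diagnosis-style problem worked both ways. The numbers (1% base rate, 80% hit rate, 9.6% false-positive rate) are standard textbook values rather than figures from these slides; the point is only that the natural-frequency version is the same Bayesian computation re-expressed as counts.

```python
# Explicit-probability format: prior, hit rate, false-positive rate.
p_disease = 0.01
p_pos_given_disease = 0.80
p_pos_given_healthy = 0.096

p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)
posterior = p_pos_given_disease * p_disease / p_pos
print(f"P(disease | positive) = {posterior:.3f}")   # ~0.078, not the ~0.8 that base-rate neglect suggests

# Natural-frequency format: the same computation as counts over an imagined
# reference class of 1000 people.
n = 1000
sick = round(n * p_disease)                              # 10 people have the disease
sick_pos = round(sick * p_pos_given_disease)             # 8 of them test positive
healthy_pos = round((n - sick) * p_pos_given_healthy)    # ~95 false positives
print(f"{sick_pos} / {sick_pos + healthy_pos} = {sick_pos / (sick_pos + healthy_pos):.3f}")
```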
When are people Bayesian, and why?
• Low-level hypothesis (Shiffrin, Maloney, etc.)
• Information format hypothesis (Gigerenzer)
• Core capacities hypothesis

Causal induction (Sobel, Griffiths, Tenenbaum, & Gopnik)
[Figure: Procedure used in Sobel et al. (2002), Experiment 2.]
– One-Cause condition: objects A and B together activate the detector; object A does not activate the detector by itself. Children are asked whether each object is a blicket, and then asked to make the machine go.
– Backward Blocking condition: objects A and B together activate the detector; object A activates the detector by itself. Children are asked whether each object is a blicket, and then asked to make the machine go.

When are people Bayesian, and why?
• Low-level hypothesis (Shiffrin, Maloney, etc.)
• Information format hypothesis (Gigerenzer)
• Core capacities hypothesis

Word learning (Tenenbaum & Xu)
[Figure: hypothesis space and data for word learning.]
(A toy Bayesian word-learning sketch follows at the end of this section.)

When are people Bayesian, and why?
• Low-level hypothesis (Shiffrin, Maloney, etc.)
• Information format hypothesis (Gigerenzer)
• Core capacities hypothesis
– Bayes can illuminate distinctively human cognitive capacities for inductive inference – learning words and concepts, projecting properties of objects, causal inference, or action understanding: problems we solve effortlessly, unconsciously, and successfully in natural contexts, which a five-year-old solves better than any animal or computer.
– The mind is not good at explicit Bayesian reasoning about verbally or symbolically presented statistics, unless core capacities can be engaged.

When are people Bayesian, and why?
• Low-level hypothesis (Shiffrin, Maloney, etc.)
• Information format hypothesis (Gigerenzer)
• Core competence hypothesis (Krynski & Tenenbaum)
[Figure: responses to a statistical version vs. a causal version of the diagnosis problem, coded as correct vs. base-rate neglect.]
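Returning to the word-learning slide above (Tenenbaum & Xu): a standard way to formalize it is the “size principle”, under which examples are assumed to be sampled from the word’s true extension, so each example multiplies the likelihood by 1/|hypothesis|. The sketch below uses an invented three-hypothesis space with made-up extension sizes and priors; it illustrates the principle rather than reproducing the published model.

```python
# Toy Bayesian word learning with the size principle. Hypothesis names,
# extension sizes, and priors are invented for illustration.
hypotheses = {
    "dalmatians": {"size": 10,   "prior": 0.3},
    "dogs":       {"size": 100,  "prior": 0.4},
    "animals":    {"size": 1000, "prior": 0.3},
}

def posterior(n_examples):
    """Posterior over hypotheses after n labeled examples, all of which happen
    to be Dalmatians (so every hypothesis above is consistent with the data)."""
    unnorm = {name: h["prior"] * (1.0 / h["size"]) ** n_examples
              for name, h in hypotheses.items()}
    z = sum(unnorm.values())
    return {name: w / z for name, w in unnorm.items()}

for n in (1, 3):
    print(n, posterior(n))
# One Dalmatian example leaves "dogs" and "animals" with some probability;
# after three Dalmatian examples the size principle concentrates the posterior
# on "dalmatians", since three draws landing in the small set by chance would
# be a suspicious coincidence under the broader hypotheses.
```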
How to meet up with mainstream JDM research (i.e., heuristics & biases)?
1. How to reconcile the apparently contradictory messages of H&B and Bayesian models? Are people Bayesian or aren’t they? When are they, when aren’t they, and why?
2. How to integrate the H&B and Bayesian research approaches?

Reverse engineering
• The goal is to reverse-engineer human inference.
– A computational understanding of how the mind works and why it works the way it does.
• Even for core inferential capacities, we are likely to observe behavior that deviates from any ideal Bayesian analysis.
• These deviations are likely to be informative about how the mind works.

Analogy to visual illusions
[Figures: visual illusions from Adelson and Shepard.]
• Illusions highlight the problems the visual system is designed to solve: inferring world structure from images, not judging properties of the images themselves.
• They reveal the visual system’s implicit assumptions about the physical world and the processes of image formation that are needed to solve these problems.

How do we interpret deviations from a Bayesian analysis?
• H&B: People aren’t Bayesian, but use some other means of inference.
– Base-rate neglect: representativeness heuristic
– Recency bias: availability heuristic
– Order-of-evidence effects: anchoring and adjustment
– …
• Not so compelling as reverse engineering.
– What engineer would want to design a system based on “representativeness”, without knowing how it is computed, why it is computed that way, what problem it attempts to solve, when it works, or how its accuracy and efficiency compare to some ideal computation or other heuristics?

How do we interpret deviations from a Bayesian analysis?
Multiple levels of analysis (Marr):
• Computational theory
– What is the goal of the computation – the outputs and available inputs? What is the logic by which the inference can be performed? What constraints (prior knowledge) do people assume to make the solution well-posed?
• Representation and algorithm
– How is the information represented? How is the computation carried out algorithmically, approximating the ideal computational theory with realistic time and space resources?
• Hardware implementation

How do we interpret deviations from a Bayesian analysis?
Multiple levels of analysis (Marr):
• Computational theory (← Bayes)
• Representation and algorithm
• Hardware implementation

Different philosophies
• H&B
– One canonical Bayesian analysis of any given task, and we know what it is.
– The ideal Bayesian solution can be computed.
– The question “Are people Bayesian?” is empirically meaningful on any given task.
• Bayes + Marr
– Many possible Bayesian analyses of any given task, and we need to discover which best characterize cognition.
– The ideal Bayesian solution can only be approximately computed.
– The question “Are people Bayesian?” is not an empirical one, at least not for an individual task. Bayes is a framework-level assumption, like distributed representations in connectionism or condition-action rules in ACT-R.

How do we interpret deviations from a Bayesian analysis?
Multiple levels of analysis (Marr): computational theory; representation and algorithm; hardware implementation.

The centrality of causal inference
• In visual perception:
– Judge P(scene | image features) rather than P(image features | scene) or P(image features | other image features).
• Coin flipping: Which sequence is more likely to come from flipping a fair coin, HHTHT or HHHHH?
• Coincidences: How likely is it that 2 people in a random party of 25 have the same birthday? 3 in a party of 10?
(Griffiths & Tenenbaum)

Judgments of randomness: P(data | random) / P(data | regular), not the posterior P(random | data).
Judgments of coincidence: P(data | regular) / P(data | random), not the posterior P(regular | data).
Rational measure of evidential support: P(data | h1) / P(data | h0).
(Griffiths & Tenenbaum)
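As a toy illustration of the “rational measure of evidential support” just above, the sketch below scores the two coin sequences from the causal-inference slide against a fair-coin hypothesis and a crude stand-in “regular” hypothesis, namely a coin biased toward heads. The biased coin and its 0.9 parameter are assumptions made for illustration; Griffiths and Tenenbaum’s model of randomness judgments uses a richer space of regular generating processes.

```python
import math

def p_sequence(seq, p_heads):
    """Probability of a sequence of H/T flips given a Bernoulli coin."""
    p = 1.0
    for flip in seq:
        p *= p_heads if flip == "H" else 1 - p_heads
    return p

for seq in ("HHTHT", "HHHHH"):
    p_random = p_sequence(seq, 0.5)     # fair coin
    p_regular = p_sequence(seq, 0.9)    # assumed "regular" process: heads-biased coin
    support_for_regular = math.log(p_regular / p_random)
    print(seq, round(support_for_regular, 2))
# HHTHT -> -1.46, HHHHH -> 2.94 (natural log): both sequences have probability
# (1/2)^5 under the fair coin, but only HHHHH provides strong evidence for a
# regular generating process, matching the intuition that it looks non-random.
```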
How do we interpret deviations from a Bayesian analysis?
Multiple levels of analysis (Marr): computational theory; representation and algorithm; hardware implementation.

Assuming the world is simple
• In visual perception:
– “Slow and smooth” prior on visual motion.
• Causal induction (blicket detector, Sobel et al.):
– P(blicket) = 1/6, “activation law”
– P(A is a blicket | data) = 1, P(B is a blicket | data) ≈ 1/6
– Backward Blocking condition: P(A is a blicket | data) ≈ 3/4, P(B is a blicket | data) ≈ 1/4
[Figure: Procedure used in Sobel et al. (2002), Experiment 2: the One-Cause and Backward Blocking trials with the blicket detector.]
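A minimal enumeration sketch of the blicket computation above, assuming the P(blicket) = 1/6 prior and a deterministic reading of the activation law (the detector fires iff at least one blicket is on it); both modeling choices are assumptions made for illustration. Under them, the backward-blocking evidence pattern pins A at 1 and returns B to its 1/6 prior, while the one-cause pattern described earlier pins A at 0 and B at 1; the ≈3/4 and ≈1/4 values on the slide are not reproduced by this simple deterministic variant.

```python
from itertools import product

PRIOR = 1.0 / 6.0
OBJECTS = ("A", "B")

def likelihood(blickets, trials):
    """P(trials | blickets), where blickets is the set of objects that are blickets
    and each trial is (objects placed on the detector, did it activate)."""
    p = 1.0
    for objects_on_detector, activated in trials:
        predicted = any(o in blickets for o in objects_on_detector)
        p *= 1.0 if predicted == activated else 0.0
    return p

def posterior_blicket(trials):
    """P(each object is a blicket | trials), by enumerating all 4 hypotheses."""
    weights = {}
    for flags in product([False, True], repeat=len(OBJECTS)):
        h = frozenset(o for o, is_b in zip(OBJECTS, flags) if is_b)
        prior = 1.0
        for is_b in flags:
            prior *= PRIOR if is_b else 1.0 - PRIOR
        weights[h] = prior * likelihood(h, trials)
    z = sum(weights.values())
    return {o: sum(w for h, w in weights.items() if o in h) / z for o in OBJECTS}

# One-cause pattern: A and B together activate the detector; A alone does not.
print(posterior_blicket([(("A", "B"), True), (("A",), False)]))  # {'A': 0.0, 'B': 1.0}
# Backward-blocking pattern: A and B together activate; A alone also activates.
print(posterior_blicket([(("A", "B"), True), (("A",), True)]))   # {'A': 1.0, 'B': 0.1666...}
```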
Recognizing the world is complex
• In visual perception:
– Need uncertainty about the coherence ratio and velocity of coherent motion (Lu & Yuille).
• Property induction:
– Properties should be distributed stochastically over a tree structure, not just focused on single branches.
– Example argument: Gorillas have T9 cells. Seals have T9 cells. Horses have T9 cells.
– Bayes with a single-branch prior: r = 0.50; Bayes with a “mutation” prior: r = 0.92 (Kemp & Tenenbaum).
– Example properties: “has T9 hormones”, “can bite through wire”, “is found near Minneapolis”, “carry E. Spirus bacteria” (Kemp & Tenenbaum).
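A toy Monte Carlo sketch of the idea that properties are distributed stochastically over a tree. The tree, branch lengths, switching rate, and base rate below are all invented for illustration; this is not Kemp and Tenenbaum’s model or taxonomy, only a demonstration that a stochastic process on a tree yields graded conditional probabilities for property-induction arguments like the one above.

```python
import math
import random

# child -> (parent, branch length); parents are listed before their children.
TREE = {
    "primates": ("root", 1.0),
    "gorilla":  ("primates", 1.0),
    "chimp":    ("primates", 1.0),
    "hoofed":   ("root", 1.0),
    "horse":    ("hoofed", 1.0),
    "cow":      ("hoofed", 1.0),
    "seal":     ("root", 2.0),
}
LEAVES = ["gorilla", "chimp", "horse", "cow", "seal"]
RATE = 0.3     # assumed switching rate per unit branch length
P_ROOT = 0.2   # assumed probability that the property holds at the root

def sample_property():
    """Sample which leaves have the property: the trait can switch on or off
    along each branch with a probability that grows with branch length."""
    state = {"root": random.random() < P_ROOT}
    for node, (parent, length) in TREE.items():
        p_switch = 0.5 * (1.0 - math.exp(-2.0 * RATE * length))
        state[node] = (not state[parent]) if random.random() < p_switch else state[parent]
    return {leaf: state[leaf] for leaf in LEAVES}

def p_conclusion_given_premises(conclusion, premises, n_samples=100_000):
    """Estimate P(conclusion has the property | premise categories have it)
    by rejection sampling."""
    hits = total = 0
    for _ in range(n_samples):
        s = sample_property()
        if all(s[p] for p in premises):
            total += 1
            hits += s[conclusion]
    return hits / total

# "Gorillas have T9 cells. Seals have T9 cells. Therefore horses have T9 cells."
# Distant premise categories make the property likely at deep nodes of the tree,
# which in turn raises its probability at the other leaves.
print(p_conclusion_given_premises("horse", ["gorilla", "seal"]))
```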
How do we interpret deviations from a Bayesian analysis?
Multiple levels of analysis (Marr): computational theory; representation and algorithm; hardware implementation.

Sampling-based approximate inference
• In visual perception:
– Temporal dynamics of bi-stability due to a fast sampling-based approximation of a bimodal posterior (Schrater & Sundareswara).
• Order effects in category learning:
– Particle filter (sequential Monte Carlo), an online approximate inference algorithm assuming stationarity.
• Probability matching in classification decisions:
– Sampling-based approximations with guarantees of near-optimal generalization performance. (Griffiths et al., Goodman et al.)

Conclusions
• “Are people Bayesian?”, “When are they Bayesian?”
– Maybe not the most interesting questions in the long run….
• What is the best way to reverse-engineer cognition at multiple levels of analysis? Assuming core inductive capacities are approximately Bayesian at the computational-theory level offers several benefits:
– Explanatory power: why does cognition work?
– Fewer degrees of freedom in modeling
– A bridge to state-of-the-art AI and machine learning
– Tools to study the big questions: What are the goals of cognition? What does the mind know about the world? How is that knowledge represented? What are the processing mechanisms, and why do they work as they do?

Coincidences (Griffiths & Tenenbaum, in press)
• The birthday problem
– How many people do you need in the room before the probability exceeds 50% that two of them have the same birthday? 23.
• The bombing of London

How much of a coincidence?
Bayesian coincidence factor: log [ P(d | latent) / P(d | random) ]
– Chance vs. a latent common cause C.
[Figure: observed birthdays marked along the calendar, clustered near August, compared under the chance and latent-common-cause hypotheses.]
Alternative hypotheses: proximity in date, matching days of the month, matching month, ….

How much of a coincidence?
Bayesian coincidence factor: log [ P(d | latent) / P(d | random) ]
– Chance: uniform.
– Latent common cause: uniform + regularity.
[Figure: the same birthday data under the two generative models.]
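A quick check of the birthday-problem figure quoted above, assuming 365 equally likely birthdays and ignoring leap years: the code finds the smallest group size at which the probability of at least one shared birthday exceeds 50%.

```python
def p_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday."""
    p_all_distinct = 1.0
    for k in range(n):
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct

n = 1
while p_shared_birthday(n) <= 0.5:
    n += 1
print(n, round(p_shared_birthday(n), 3))   # 23 0.507
```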