Applications of Quantum Probability Theory to Dynamic Decision Making (FA9550-12-1-0397)
PI: Jerome R. Busemeyer (Indiana University)
Co-PI: S. N. Balakrishnan (Missouri Univ of Science and Tech)
Co-PI: Joyce Wang (Ohio State University)
AFOSR Program Review: Mathematical and Computational Cognition Program; Computational and Machine Intelligence Program; Robust Decision Making in Human-System Interface Program (Jan 28 – Feb 1, 2013, Washington, DC)

Quantum Dynamic Decision (Busemeyer)
Research Objectives:
• Develop quantum learning models for dynamic decisions
• Develop quantum random walk sequential sampling models
• Experimentally test the models against human behavior
Technical Approach:
• Mathematical theory construction
• Mathematical analysis and computer-simulation testing of algorithm performance
• Human experimentation to empirically test the theory
DoD Benefits:
• Develop fast yet robust dynamic decision learning algorithms for mixed human and autonomous systems
• Identify limits and constraints on performance for complex dynamic decision tasks
Budget ($k), total (direct + indirect): YR 1: 197; YR 2: 203; YR 3: 208; YR 4: –
Project Start Date: July 19, 2012
Project End Date: July 1, 2015

List of Project Goals
The broad, long-term goal is to provide a new foundation for constructing probabilistic-dynamic systems from principles based on quantum probability theory.
1. Develop a quantum random walk sequential sampling decision model.
2. Develop a quantum reinforcement learning model for learning a sequence of actions in Markov decision problem environments.
3. Experimentally test whether the quantum models provide a better account of actual human learning and performance than traditional models.

Progress Towards Goals (or New Goals)
During the six months we have been working, we
1. Developed an initial quantum random walk model of choice and decision time.
2. Developed a new quantum reinforcement learning algorithm and demonstrated, through extensive simulation, improved performance compared to traditional models.
3. Developed a new experimental laboratory dynamic decision task called the "goal seeking" or "predator–prey" task.
4. Collected new learning and performance data from an initial sample of participants on the new task.

Why Quantum?
• Principle of Superposition – Decisions involve indefinite states that capture the intuitive feelings of conflict, ambiguity, or uncertainty.
• Principle of Complementarity – Judgments are constructed from the interaction of the prior indefinite state and the question being asked.
• Principle of Entanglement – Captures the non-compositional nature of cognition.

What is Quantum Probability Theory?
Classical: Each unique outcome is a member of a set of points called the sample space.
Quantum: Each unique outcome is an orthonormal basis vector spanning a vector space.
Classical: Each event is a subset of the sample space.
Quantum: Each event is a subspace of the vector space.
Classical: The state is a probability function, p, defined on subsets of the sample space.
Quantum: The state is a unit-length vector, S; the probability of an event is the squared length of the projection of S onto the event's subspace.
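The contrast above can be made concrete with a short numerical sketch. This is a minimal illustration, not part of the original slides: the two outcomes and the 0.7/0.3 values are made-up numbers chosen only to show that, for a single question, the squared projection of a unit state vector reproduces the classical probability.

```python
import numpy as np

# Classical: an event is a subset of the sample space; the state is a
# probability function defined on those subsets.
p = {"good guy": 0.7, "bad guy": 0.3}
prob_good_classical = p["good guy"]

# Quantum: each outcome is an orthonormal basis vector, the state S is a
# unit-length vector, and the probability of an event is the squared length
# of the projection of S onto the event's subspace.
S = np.array([np.sqrt(0.7), np.sqrt(0.3)])       # unit-length state vector
P_good = np.outer([1, 0], [1, 0])                # projector onto the 'good guy' axis
prob_good_quantum = np.linalg.norm(P_good @ S) ** 2

print(prob_good_classical, prob_good_quantum)    # both 0.7 for a single question
```

The two frameworks agree for a single question; they diverge when a second, incompatible question is asked, which is the source of the interference effects shown in the slides that follow.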
Classical:
• p(A) := probability of event A
• If A ∩ B = ∅, then p(A ∪ B) = p(A) + p(B)
• p(Ā) = 1 − p(A)
• p(B | C) = p(B ∩ C) / p(C)
Quantum:
• p(A) = ‖P_A S‖²
• If P_A P_B = 0, then p(A ∨ B) = p(A) + p(B)
• p(Ā) = 1 − p(A)
• p(B | C) = ‖P_B P_C S‖² / ‖P_C S‖²

Markov Dynamics:
• p_i := probability of state i; p = [p_i] is an N × 1 vector with Σ_i p_i = 1
• T = [T_ij] is an N × N matrix; T_ij := probability of transiting from state j to state i; Σ_i T_ij = 1 (stochastic)
• p(t) = T(t) · p(0), with (d/dt) T(t) = K · T(t)
Schrödinger Dynamics:
• s_i := amplitude of state i; S = [s_i] is an N × 1 vector with Σ_i |s_i|² = 1
• U = [U_ij] is an N × N matrix; U_ij := amplitude of transiting from state j to state i; U′U = I (unitary)
• S(t) = U(t) · S(0), with (d/dt) U(t) = −i · H · U(t)

What is the Empirical Evidence?
Participants were shown pictures of faces and asked to
1. Categorize the face as a 'Good guy' or a 'Bad guy';
2. Decide whether to act Friendly or Aggressive.
[Figure: trial timeline showing the Categorization, Decision, and Feedback on C and D stages, with intervals of 1000–2000 ms, 1000 ms, and 10 s.]
Two Conditions:
• C-then-D: Categorize the face first and afterwards take an action.
• D-only: Make an action decision without reporting a categorization.

Results reveal interference
• D-only condition: Pr[A | S] = .69
• C-then-D condition: Pr[G | S] = .17; Pr[A | G] = .42; Pr[B | S] = .83; Pr[A | B] = .63
• Law of total probability: Pr[A | S] = (.17)(.42) + (.83)(.63) = .59
• Something is wrong!

Replications (across four independent studies)
(G, B = probability of categorizing the face as Good or Bad; A|G, A|B = probability of attacking given the category; TP = total-probability prediction; D = attack probability in the D-only condition; T, p = test of TP − D)
Face   N     G       A|G     B       A|B     TP      D       TP−D     T        p
Bad    395   0.2303  0.3832  0.7697  0.6055  0.5592  0.6048  −0.046   −4.68    0.000
Good   403   0.7803  0.3556  0.2197  0.5258  0.3866  0.3912  −0.005   −0.545   0.2930

Extensions (participants informed of the category instead of choosing it)
Face   N     G       A|G     B       A|B     TP      D       TP−D     T        p
Bad    246   0.4105  0.3918  0.5895  0.7080  0.5789  0.6231  −0.0442  −4.0119  0.000
Good   242   0.6048  0.2646  0.3952  0.5941  0.3957  0.3603  0.0354

What other evidence? Prisoner Dilemma Game
Payoff matrix (O = opponent's payoff, P = player's payoff):
            Opponent D      Opponent C
Player D    O: 10, P: 10    O: 5, P: 25
Player C    O: 25, P: 5     O: 20, P: 20
Three conditions:
• Known Defect: the player is told the opponent will defect.
• Known Coop: the player is told the opponent will cooperate.
• Unknown: the player is told nothing about the opponent.

Five Experiments (probability that the player defects)
Study                        Known Defect   Known Coop   Unknown
Shafir (1992)                0.97           0.84         0.63
Croson (1999)                0.67           0.32         0.30
Li & Taplin (2002)           0.82           0.77         0.72
Hristova & Grinberg (2008)   0.97           0.93         0.88
Matthew (2006)               0.91           0.84         0.66
Avg                          0.87           0.72         0.65
Violates the law of total probability.

More Evidence? Two-Stage Gambling Game
• A person faces a gamble: win $200 or lose $100.
• The game is played twice, and the first play is obligatory.
• The player decides whether or not to play the second game.
Three conditions:
• The outcome of the first play was a known win.
• The outcome of the first play was a known loss.
• The outcome of the first play was unknown.

Three Experiments (probability of playing the second gamble)
Study                      Known Win   Known Loss   Unknown
Tversky & Shafir (1992)    0.69        0.58         0.37
Kuhberger et al. (2001)    0.72        0.47         0.48
Lambdin & Burdsal (2007)   0.63        0.45         0.41
Avg                        0.72        0.52         0.42
Violates the law of total probability.
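The violation can be read directly off the averages above: under the law of total probability, the unknown-condition rate must be a mixture of the two known-condition rates and therefore must lie between them. A minimal check, using only the average values reported in the two tables above:

```python
# Average response rates from the tables above: probability of defecting
# (Prisoner Dilemma) or of playing the second gamble (two-stage gambling).
experiments = {
    "Prisoner Dilemma":   {"known_a": 0.87, "known_b": 0.72, "unknown": 0.65},
    "Two-stage gambling": {"known_a": 0.72, "known_b": 0.52, "unknown": 0.42},
}

for name, r in experiments.items():
    lo, hi = sorted((r["known_a"], r["known_b"]))
    # Classically, Pr[act | unknown] = q * Pr[act | known_a] + (1 - q) * Pr[act | known_b]
    # for some belief q in [0, 1], so it can never fall outside [lo, hi].
    status = "consistent" if lo <= r["unknown"] <= hi else "violation"
    print(f"{name}: unknown = {r['unknown']:.2f}, "
          f"classical bound = [{lo:.2f}, {hi:.2f}] -> {status}")
```

In both cases the unknown-condition rate falls below the lower bound, so no classical mixture of beliefs about the hidden outcome can reproduce the data.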
Common Quantum Explanation
p(A | S) = ‖P_A S‖² = ‖P_A · I · S‖²
         = ‖P_A (P_G + P_B) S‖²
         = ‖P_A P_G S + P_A P_B S‖²
         = ‖P_A P_G S‖² + ‖P_A P_B S‖² + Int
Int = ⟨S | P_G P_A P_B | S⟩ + ⟨S | P_B P_A P_G | S⟩
The first two terms reproduce the law of total probability; the interference term Int can be negative, which accounts for the drop observed in the unknown and D-only conditions.

Quantitative Model Comparison
• Two-stage gambling task previously described (Barkan & Busemeyer, 2003)
• 100 participants, each played 17 gambles
  – Plan choices made with the first stage unknown
  – Final choices made with the first stage known
  – Two replications; 66 choice pairs per person
• Compared two models
  – Quantum model (four parameters)
  – Reference point model (prospect theory, four parameters)

Bayes Factor Distribution
[Figure: prior distributions for the Utility, Loss aversion, Memory, GammaQ, and Prospect Threshold parameters, and a histogram (Frequency vs. logBF) of Bayes factors pQ/pR across participants.]

Quantum Random Walk
• Sequential sampling decision models are used broadly across cognitive science to model all three primary cognitive measures: choice, confidence, and decision time.
• Most of these models are built on classical Markov probabilistic-dynamic principles (classic random walks and diffusion models).
• So far, all of our work has been limited to choice probability alone; the quantum models need to be extended to account for decision time.

Distribution of Decision Times for Markov vs Quantum Random Walks
[Figure: predicted decision-time distributions (probability vs. time). Markov: Prob+ = .95, MT = 65. Quantum: Prob+ = .95, MT = 51.]

Dynamic Decision Making
Goal Seeking Task: the goal may move its initial position across episodes and may move randomly within an episode; the goal seeker knows the goal's velocity vector. [Figure: grid showing Start and Goal positions.]

Traditional Reinforcement Learning
e_i := state, a_k := action
Q-learning:
Q(e_i, a_k, t+1) = Q(e_i, a_k, t) + Δ_t
Δ_t = η · [ r(t) + γ · max_l Q(e_j, a_l, t) − Q(e_i, a_k, t) ]
Softmax choice rule:
Pr[a_k] = e^(λ·Q_k) / Σ_l e^(λ·Q_l)

Quantum Reinforcement Learning
• Same Q-learning model
• Change the softmax choice rule to a quantum amplitude amplification algorithm (Hoyer, Physical Review, 2000)
• Two versions of the quantum choice rule
  – Dong et al.'s original "L" version (IEEE, 2008)
  – New "Phi" version

Amplitude Amplification
S = [s_l], s_l := amplitude for action l, ‖S‖ = 1
Initially s_l = 1/√N, where N = number of actions
A_k′ = [0 … 1 … 0], the unit vector selecting action k, the action to be amplified
V₁ = I − (1 − e^(iφ₁)) (A_k · A_k′)
V₂ = (1 − e^(iφ₂)) (S_I · S_I′) − I
S_t = (V₂ V₁)^L S_I
Pr[a_k] = |s_k|²

Two Versions
"L" version: k = best future action; L = int(c · Δ_t); φ₁ and φ₂ are fixed.
"Phi" version: L = 1; φ₁ = c₁ · normalized(Δ_t); φ₂ = c₂ · φ₁; (c₁, c₂) depend on N.
(A code sketch of these choice rules appears after the experiment slides below.)

20 × 20 grid, 5 actions, goal fixed within an episode
[Figure: performance of the Softmax rule, the "L" version, and the Phi version.]

Training, 10 × 10 grid, goal moving within an episode
[Figure: number of steps to reach the goal across episodes (×10⁴); panels "Predator 1 success, QRL (Phi version)" and "Predator 2 success, softmax".]

Test Stage: two predators (softmax vs. Quantum-Phi) and one moving goal

Human Goal Seeking Experiments
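The sketch below illustrates the two choice rules from the reinforcement-learning slides above: the softmax rule over Q-values and a "Phi"-style amplitude-amplification rule built from the V₁ and V₂ operators defined on the Amplitude Amplification slide. The single-state setting, the constants c1 = π and c2 = 0.5, and the way the TD error is normalized are illustrative assumptions, not values taken from the project.

```python
import numpy as np

def softmax_choice(q_values, lam=2.0):
    """Traditional rule: Pr[a_k] proportional to exp(lambda * Q_k)."""
    w = np.exp(lam * (q_values - q_values.max()))    # shift for numerical stability
    return w / w.sum()

def phi_choice(q_values, delta, c1=np.pi, c2=0.5, L=1):
    """'Phi'-style amplitude amplification sketch: amplify the currently best
    action, with phases tied to the normalized TD error delta.
    (c1, c2 and the normalization are illustrative assumptions.)"""
    n = len(q_values)
    S_I = np.ones(n, dtype=complex) / np.sqrt(n)     # uniform initial amplitudes
    k = int(np.argmax(q_values))                     # action to be amplified
    A_k = np.zeros(n)
    A_k[k] = 1.0
    phi1 = c1 * delta
    phi2 = c2 * phi1
    V1 = np.eye(n) - (1 - np.exp(1j * phi1)) * np.outer(A_k, A_k)
    V2 = (1 - np.exp(1j * phi2)) * np.outer(S_I, S_I.conj()) - np.eye(n)
    S_t = np.linalg.matrix_power(V2 @ V1, L) @ S_I   # S_t = (V2 V1)^L S_I
    return np.abs(S_t) ** 2                          # Pr[a_k] = |s_k|^2

# One illustrative Q-learning step in a single state with 5 actions.
Q = np.zeros(5)
eta, gamma, reward, chosen = 0.5, 0.9, 1.0, 2
delta = eta * (reward + gamma * Q.max() - Q[chosen]) # temporal-difference update
Q[chosen] += delta

print("softmax:", np.round(softmax_choice(Q), 3))
print("phi    :", np.round(phi_choice(Q, delta=min(delta, 1.0)), 3))
```

In the "L" version, by contrast, the phases are held fixed and the number of Grover-style iterations grows with the TD error, L = int(c · Δ_t).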
List of Publications Attributed to the Grant
• Wang, Z., Busemeyer, J. R., Atmanspacher, H., & Pothos, E. M. (in press). The potential to use quantum theory to build models of cognition. Topics in Cognitive Science.
• Wang, Z., & Busemeyer, J. R. (in press). A quantum question order model supported by empirical tests of an a priori and precise prediction. Topics in Cognitive Science.
• Busemeyer, J. R., Wang, J., & Pothos, E. M. (in press). Quantum models of cognition and decision. In Busemeyer, J. R., Townsend, J. T., Wang, J., & Eidels, A. (Eds.), Oxford Handbook of Computational and Mathematical Psychology.
• Fakhari, P., Rajagopal, K., Balakrishnan, S. N., & Busemeyer, J. R. (under review). Quantum inspired reinforcement learning in a changing environment. Special Issue on Engineering of the Mind, Cognitive Science and Robotics.