Applications of quantum probability theory to dynamic decision making

(FA9550-12-1-0397)
PI: Jerome R. Busemeyer (Indiana University)
Co-PI: S.N. Balakrishnan (Missouri Univ of Science and Tech)
Co-PI: Joyce Wang (Ohio State University)
AFOSR Program Review:
Mathematical and Computational Cognition Program
Computational and Machine Intelligence Program
Robust Decision Making in Human-System Interface Program
(Jan 28 – Feb 1, 2013, Washington, DC)
Quantum Dynamic Decision (Busemeyer)
Research Objectives:
•  Develop quantum learning models for dynamic decisions
•  Develop quantum random walk sequential sampling models
•  Experimentally test models against human behavior

Technical Approach:
•  Mathematical theory construction
•  Mathematical analysis and computer simulation algorithm performance testing
•  Human experimentation to empirically test theory

DoD Benefits:
•  Develop fast yet robust dynamic decision learning algorithms for mixed human and autonomous systems
•  Identify limits and constraints on performance for complex dynamic decision tasks

Budget ($k): Total (direct+indirect)
YR 1: 197; YR 2: 203; YR 3: 208; YR 4:

Project Start Date: July 19, 2012
Project End Date: July 1, 2015
List of Project Goals
The broad, long-term goal is to provide a new foundation for constructing probabilistic-dynamic systems from principles based on quantum probability theory.
1. Develop a quantum random walk sequential sampling decision model.
2. Develop a quantum reinforcement learning model for learning a sequence of actions in Markov decision problem environments.
3. Experimentally test whether the quantum models provide a better account of actual human learning and performance than traditional models.
Busemeyer 3
Progress Towards Goals (or New Goals)
During the first six months of the project, we have:
1.  Developed an initial quantum random walk model of choice and decision time.
2.  Developed a new quantum reinforcement learning algorithm and demonstrated, through extensive simulation, improved performance compared to traditional models.
3.  Developed a new experimental laboratory dynamic decision task, called the "goal seeking" or "predator-prey" task.
4.  Collected new learning and performance data from an initial sample of participants on the new task.
Why Quantum?
•  Principle of Superposition
–  Decisions involve indefinite states that capture the intuitive
feelings of conflict, ambiguity, or uncertainty
•  Principle of Complementarity
–  Judgments are constructed from the interaction of the prior
indefinite state and the question being asked
•  Principle of Entanglement
–  Captures the non-compositional nature of cognition
What is Quantum Probability Theory?

Classical:
•  Each unique outcome is a member of a set of points called the sample space.
•  Each event is a subset of the sample space.
•  The state is a probability function p defined on subsets of the sample space.

Quantum:
•  Each unique outcome is an orthonormal basis vector spanning a vector space.
•  Each event is a subspace of the vector space.
•  The state is a unit-length vector S; the probability of an event is the squared length of the projection of S onto the event's subspace.
Classical:
p(A) := probability of event A
If A ∩ B = ∅, then p(A ∪ B) = p(A) + p(B)
p(Ā) = 1 − p(A)
p(B | C) = p(B ∩ C) / p(C)

Quantum:
p(A) = ‖P_A S‖²
If P_A P_B = 0, then p(A ∨ B) = p(A) + p(B)
p(Ā) = 1 − p(A)
p(B | C) = ‖P_B P_C S‖² / ‖P_C S‖²
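As a numeric sketch of the quantum rules, here is a minimal example using projector matrices. The 3-dimensional state and events are hypothetical, chosen only for illustration; they are not from the slides:

```python
import numpy as np

# Hypothetical 3-D example: events are projectors, the state is a unit vector.
S = np.array([1.0, 1.0, 1.0]) / np.sqrt(3)   # unit-length state vector
P_C = np.diag([1.0, 1.0, 0.0])               # event C spans basis vectors 1-2
P_B = np.diag([1.0, 0.0, 0.0])               # event B spans basis vector 1

p_C = np.linalg.norm(P_C @ S) ** 2           # p(C) = ||P_C S||^2
# Quantum conditional probability: p(B | C) = ||P_B P_C S||^2 / ||P_C S||^2
p_B_given_C = np.linalg.norm(P_B @ P_C @ S) ** 2 / p_C
```

Because these two projectors commute, the answers (p(C) = 2/3, p(B | C) = 1/2) agree with the classical rules; differences from classical probability arise only for non-commuting projectors.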
Markov Dynamics:
p_i := probability of state i
p = [p_i], an N × 1 vector, with Σ_i p_i = 1
T = [T_ij], an N × N matrix; T_ij := probability of transit from j to i
Σ_i T_ij = 1 (stochastic)
p(t) = T(t) · p(0)
(d/dt) T(t) = K · T(t)

Schrödinger Dynamics:
s_i := amplitude of state i
S = [s_i], an N × 1 vector, with Σ_i |s_i|² = 1
U = [U_ij], an N × N matrix; U_ij := amplitude of transit from j to i
U′U = I (unitary)
S(t) = U(t) · S(0)
(d/dt) U(t) = −i · H · U(t)
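Both dynamics can be simulated directly; a minimal sketch with a toy 2-state system (the K and H matrices are illustrative, not fitted values), using an eigendecomposition for the matrix exponential:

```python
import numpy as np

def expm(M):
    """Matrix exponential via eigendecomposition (assumes M diagonalizable)."""
    w, V = np.linalg.eig(M)
    return V @ np.diag(np.exp(w)) @ np.linalg.inv(V)

# Markov: columns of the intensity matrix K sum to 0, so T(t) is stochastic.
K = np.array([[-1.0, 0.5],
              [1.0, -0.5]])
p0 = np.array([1.0, 0.0])
p = expm(K * 2.0).real @ p0               # p(t) = T(t) . p(0)

# Schroedinger: H is Hermitian, so U(t) = expm(-i H t) is unitary.
H = np.array([[1.0, 0.5],
              [0.5, -1.0]])
S0 = np.array([1.0, 0.0], dtype=complex)
S = expm(-1j * H * 2.0) @ S0              # S(t) = U(t) . S(0)
probs = np.abs(S) ** 2                    # probabilities = squared amplitudes
```

The Markov state remains a probability distribution (non-negative entries summing to 1), while the quantum state keeps unit squared length, so its squared amplitudes also sum to 1.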
What is the Empirical Evidence?
Participants were shown pictures of faces and asked to:
1. Categorize the face as a 'Good guy' or a 'Bad guy';
2. Decide whether to act Friendly or Aggressive.
[Figure: trial sequence showing the categorization screen, the decision screen, and feedback on C and D; displayed durations include 1000–2000 ms, 1000 ms, and 10 s.]
Two Conditions
C-then-D: Categorize the face first and afterwards take an action
D-only: Make an action decision without reporting categorization
Results reveal interference
•  The D-only condition: Pr[A | S] = .69
•  The C-then-D condition:
   Pr[G | S] = .17; Pr[A | G] = .42
   Pr[B | S] = .83; Pr[A | B] = .63
•  Law of total probability:
   Pr[A | S] = (.17)(.42) + (.83)(.63) = .59
•  The observed .69 is well above the predicted .59: something is wrong!
Replications (across four independent studies)

Face    N     G       A|G     B       A|B     TP      D       TP-D    T       p
Bad     395   0.2303  0.3832  0.7697  0.6055  0.5592  0.6048  -0.046  -4.68   0.000
Good    403   0.7803  0.3556  0.2197  0.5258  0.3866  0.3912  -0.005  -0.545  0.2930

Extensions (participants informed of the category instead of choosing it)

Face    N     G       A|G     B       A|B     TP      D       TP-D     T        p
Bad     246   0.4105  0.3918  0.5895  0.7080  0.5789  0.6231  -0.0442  -4.0119  0.000
Good    242   0.6048  0.2646  0.3952  0.5941  0.3957  0.3603  0.0354

(TP = total-probability prediction; D = observed decision probability; T, p = test of the TP-D difference.)
What other evidence?
Prisoner Dilemma Game (payoffs: O = opponent, P = player)

              Opponent D      Opponent C
Player D      O: 10, P: 10    O: 5,  P: 25
Player C      O: 25, P: 5     O: 20, P: 20

Three conditions:
Known Coop:   the player is told the opponent will cooperate
Known Defect: the player is told the opponent will defect
Unknown:      the player is told nothing about the opponent
Five Experiments

Study                        Known Defect   Known Coop   Unknown
Shafir (1992)                0.97           0.84         0.63
Croson (1999)                0.67           0.32         0.30
Li & Taplin (2002)           0.82           0.77         0.72
Hristova & Grinberg (2008)   0.97           0.93         0.88
Matthew (2006)               0.91           0.84         0.66
Avg                          0.87           0.72         0.65
Violates Law of Total Probability
More Evidence?
Two Stage Gambling Game
•  Person is faced with a gamble: win $200 or lose $100.
•  The game is played twice, and the first play is obligatory
•  Player decides whether or not to play second game
Three conditions
Outcome of first play was a known win
Outcome of first play was a known loss
Outcome of first play was unknown
Three Experiments

Study                      Known Win   Known Loss   Unknown
Tversky & Shafir (1992)    0.69        0.58         0.37
Kühberger et al. (2001)    0.72        0.47         0.48
Lambdin & Burdsal (2007)   0.63        0.45         0.41
Avg                        0.72        0.52         0.42
Violates Law of Total Probability
Common Quantum Explanation
p(A | S) = ‖P_A S‖² = ‖P_A · I · S‖²
         = ‖P_A (P_G + P_B) S‖²
         = ‖P_A P_G S + P_A P_B S‖²
         = ‖P_A P_G S‖² + ‖P_A P_B S‖² + Int

Int = S′ P_G P_A P_B S + S′ P_B P_A P_G S

where P_B = I − P_G projects onto the complementary category.
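A minimal numeric sketch of this decomposition. The 2-D state vector and projectors below are hypothetical illustrations, not the fitted model from the experiments:

```python
import numpy as np

S = np.array([0.6, 0.8])                       # unit state vector
P_G = np.array([[1.0, 0.0],
                [0.0, 0.0]])                   # projector for category G
P_B = np.eye(2) - P_G                          # complementary category B
a = np.array([1.0, 1.0]) / np.sqrt(2)
P_A = np.outer(a, a)                           # projector for action A

direct = np.linalg.norm(P_A @ S) ** 2                     # p(A | S), D-only path
through = (np.linalg.norm(P_A @ P_G @ S) ** 2
           + np.linalg.norm(P_A @ P_B @ S) ** 2)          # total-probability sum
# Interference cross term: S' P_G P_A P_B S + S' P_B P_A P_G S
interference = S @ P_G @ P_A @ P_B @ S + S @ P_B @ P_A @ P_G @ S
```

Here the direct probability is 0.98 while the total-probability sum through the two category paths is only 0.50; the interference term supplies the difference, mirroring the pattern in the data.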
Quantitative Model Comparison
•  Two stage gambling task previously described
–  (Barkan & Busemeyer, 2003)
•  100 participants, each played 17 gambles
–  Plan choices first stage unknown
–  Final choices first stage known
–  Two replications
–  66 choice pairs per person
•  Compared two models
–  Quantum model (four parameters)
–  Reference point model (prospect theory, four parameters)
Bayes Factor Distribution
[Figure: histograms of the prior distributions for the utility, loss-aversion, memory, prospect-threshold, and GammaQ parameters, together with the frequency distribution of log Bayes factors, log BF = log(pQ/pR), across participants.]
Quantum Random Walk
•  Sequential sampling decision models are broadly used across
fields in cognitive science to model all three of the primary
cognitive measures
–  choice, confidence, and decision time.
•  Most of these models are built upon classic Markov
probabilistic-dynamic principles
–  Classic random walks and diffusion models
•  So far, all our work has been limited to choice probability
alone, and the quantum models need to be extended to account
for decision time.
Distribution of Decision Times for
Markov vs Quantum Random Walks
Markov: Prob + = .95, MT = 65
Quantum: Prob + = .95, MT = 51
[Figure: probability distributions of decision times (0–300 time steps) for the two models.]
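The Markov side of such a comparison can be sketched with a simple first-passage simulation; the drift, bound, and trial count below are illustrative parameters, not those behind the figure:

```python
import numpy as np

def markov_walk(p_up=0.6, bound=10, n_trials=2000, seed=0):
    """Simulate a classic random walk until it hits +bound or -bound,
    recording the choice (which bound was hit) and decision time (steps)."""
    rng = np.random.default_rng(seed)
    choices, times = [], []
    for _ in range(n_trials):
        x, t = 0, 0
        while -bound < x < bound:
            x += 1 if rng.random() < p_up else -1
            t += 1
        choices.append(x >= bound)    # True if the upper bound was reached
        times.append(t)
    return np.mean(choices), np.mean(times)

prob_plus, mean_time = markov_walk()
```

With upward drift (p_up = 0.6), the walk hits the upper bound on most trials, yielding a high choice probability together with a full distribution of decision times, which is what the quantum model must also reproduce.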
Dynamic Decision Making
Goal Seeking Task: the goal seeker begins at a start position; the goal may move across episodes, and may move randomly within an episode. The goal seeker knows the velocity vector. [Figure: grid with the Start and Goal positions marked.]
Traditional Reinforcement Learning

e_i := state, a_k := action

Q-learning:
Q(e_i, a_k, t+1) = Q(e_i, a_k, t) + Δ_t
Δ_t = η · [ r(t) + γ · max_l Q(e_j, a_l, t) − Q(e_i, a_k, t) ]

Softmax choice rule:
Pr[a_k] = e^(λ·Q_k) / Σ_l e^(λ·Q_l)
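A runnable sketch of this baseline, using a hypothetical 1-D grid world with a reward at one end; the η, γ, and λ values are illustrative, not the fitted ones:

```python
import numpy as np

def softmax(q, lam=2.0):
    """Softmax choice rule: Pr[a_k] proportional to exp(lam * Q_k)."""
    z = np.exp(lam * (q - q.max()))      # subtract max for numeric stability
    return z / z.sum()

n_states, n_actions = 5, 2               # states 0..4; actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
eta, gamma = 0.5, 0.9                    # learning rate, discount factor
rng = np.random.default_rng(1)

for _ in range(500):                     # episodes; reward sits at state 4
    s = 0
    while s != 4:
        a = rng.choice(n_actions, p=softmax(Q[s]))
        s_next = min(s + 1, 4) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == 4 else 0.0
        # Q-learning update: Q += eta * (r + gamma * max_l Q(s', l) - Q(s, a))
        Q[s, a] += eta * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
```

After training, the Q-value for moving right from the state next to the goal approaches the immediate reward of 1, and the softmax rule then selects that action with high probability.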
Quantum Reinforcement Learning
•  Same Q-learning model
•  Change softmax choice rule to a quantum
amplitude amplification algorithm (Hoyer,
Physical Review, 2000)
•  Two versions of the quantum choice rule
–  Dong et al.’s original “L” version (IEEE, 2008)
–  New “Phi” version
Amplitude Amplification
S = [s_l], s_l := amplitude for action l, ‖S‖² = 1
Initially s_l = 1/√N, where N = number of actions
A_k′ = [0 .. 1 .. 0], the row vector selecting action k to be amplified
V_1 = I − (1 − e^(iφ_1)) (A_k · A_k′)
V_2 = (1 − e^(iφ_2)) (S_I · S_I′) − I
S_t = (V_2 V_1)^L S_I
Pr[a_k] = |s_k|²
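A minimal sketch of these amplification steps; the parameter values are hypothetical, and with φ_1 = φ_2 = π the iteration reduces to a standard Grover step:

```python
import numpy as np

def amplify(n_actions=5, k=2, phi1=np.pi, phi2=np.pi, L=1):
    """Amplify the amplitude of action k via L Grover-style iterations."""
    S_I = np.ones(n_actions, dtype=complex) / np.sqrt(n_actions)  # uniform start
    A_k = np.zeros(n_actions, dtype=complex)
    A_k[k] = 1.0
    # V1 marks the target action; V2 reflects about the initial state.
    V1 = np.eye(n_actions) - (1 - np.exp(1j * phi1)) * np.outer(A_k, A_k.conj())
    V2 = (1 - np.exp(1j * phi2)) * np.outer(S_I, S_I.conj()) - np.eye(n_actions)
    S_t = np.linalg.matrix_power(V2 @ V1, L) @ S_I
    return np.abs(S_t) ** 2              # Pr[a_k] = |s_k|^2

probs = amplify()
```

Starting from the uniform 1/N distribution over 5 actions, a single iteration lifts the probability of the selected action from 0.2 to about 0.97, while the distribution stays normalized because the operators are unitary.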
Two Versions
"L" version:
  k = best future action
  L = int(c · Δ_t)
  φ_1, φ_2 fixed

"Phi" version:
  L = 1
  φ_1 = c_1 · normalized(Δ_t)
  φ_2 = c_2 · φ_1
  (c_1, c_2) depend on N
20×20 grid, 5 actions, goal fixed within episode
[Figure: learning-curve panels for the Soft Max, "L" version, and Phi version choice rules.]
Training, 10×10 grid, goal moving within episode
[Figure: number of steps per episode over 20,000 training episodes, comparing Predator 1 (success, QRL Phi version) with Predator 2 (success, Soft Max).]
Test Stage, Two predators (softmax vs.
Quantum-Phi) and one moving goal
Human Goal Seeking Experiments
List of Publications Attributed to the Grant

•  Wang, Z., Busemeyer, J. R., Atmanspacher, H., & Pothos, E. M. (in press). The potential to use quantum theory to build models of cognition. Topics in Cognitive Science.
•  Wang, Z., & Busemeyer, J. R. (in press). A quantum question order model supported by empirical tests of an a priori and precise prediction. Topics in Cognitive Science.
•  Busemeyer, J. R., Wang, J., & Pothos, E. M. (in press). Quantum models of cognition and decision. In Busemeyer, J. R., Townsend, J. T., Wang, J., & Eidels, A. (Eds.), Oxford Handbook of Computational and Mathematical Psychology.
•  Fakhari, P., Rajagopal, K., Balakrishnan, S. N., & Busemeyer, J. R. (under review). Quantum inspired reinforcement learning in a changing environment. Special Issue on Engineering of the Mind, Cognitive Science and Robotics.