Overhead slides for accompanying talk.

Active Bayesian Associative
Learning
John K. Kruschke
Indiana University
April, 2008
J. K. Kruschke: Active Learning
Background:
• A typical associative learning task, and a phenomenon from associative learning: “Backward Blocking”.
• A traditional (non-Bayesian) model: the Rescorla-Wagner model.
Typical Learning Task
Learn which foods produce or prevent nausea:
?
Backward Blocking
Phase  Frequency  Cue 1  Cue 2  Outcome
I      10         1      1      1
II     10         1      0      1
Note: Cell with a 1 indicates presence of cue or outcome. Cell with a 0 indicates absence of cue or outcome.
Please predict nausea or no nausea:
?
Actual result:
Interim Test (no corrective feedback):
Please predict nausea or no nausea:
Usual response is middling prediction of nausea.
?
Please predict nausea or no nausea:
?
Actual result:
Final Test (no corrective feedback):
Please predict nausea or no nausea:
Usual response is reduced prediction of nausea.
This result (with its training structure) is called
“backward blocking”.
?
Active Learning after Backward Blocking
Phase  Frequency  Cue 1  Cue 2  Outcome
I      10         1      1      1
II     10         1      0      1
After being trained on backward blocking, your goal is to better
determine which cues produce or prevent nausea.
Which probe below do you prefer to test?
Associative Models:
Outcome t: modeled as a combination of weighted cues.
Associative weights: w1, w2
Cues: c1, c2
A classic non-Bayesian approach:
The Rescorla-Wagner model.
$a = \sum_i w_i c_i = \mathbf{w}^T \mathbf{c}$
(Diagram: cues c1, c2 with weights w1, w2.)
Learning
in the Rescorla-Wagner model.
$a = \sum_i w_i c_i = \mathbf{w}^T \mathbf{c}$
$\Delta \mathbf{w} = \lambda \, (t - \mathbf{w}^T \mathbf{c}) \, \mathbf{c}$
Weight Space:
Learning is change from one position to another.
(Figure: grid of candidate weight combinations, with w1 on the horizontal axis and w2 on the vertical axis, each running from -1.5 to +1.5 in steps of 0.5.)
Backward Blocking
(Animation frames: the Rescorla-Wagner model trained on Phase I of the backward blocking design shown earlier; figures not reproduced.)
End of Phase I.
(Animation frames: the Rescorla-Wagner model trained on Phase II; figures not reproduced.)
The Rescorla-Wagner model does not show backward blocking.
It also has no way to predict active learning preferences.
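The absence of backward blocking in the Rescorla-Wagner model can be checked with a short simulation. The sketch below is illustrative only: the learning rate of 0.1, the zero starting weights, and the function name `rescorla_wagner` are assumptions, not values from the talk. It applies the update rule to the two training phases and compares the weight on Cue 2 at the end of each phase.

```python
import numpy as np

def rescorla_wagner(trials, lr=0.1, n_cues=2):
    """Apply the Rescorla-Wagner update over a list of (cues, outcome) trials.

    lr is an assumed learning rate; weights start at zero (also an assumption).
    Returns the weight vector before training and after every trial.
    """
    w = np.zeros(n_cues)
    history = [w.copy()]
    for c, t in trials:
        c = np.asarray(c, dtype=float)
        error = t - w @ c           # prediction error: t - w^T c
        w = w + lr * error * c      # absent cues (c_i = 0) are not changed
        history.append(w.copy())
    return np.array(history)

phase1 = [((1, 1), 1.0)] * 10   # Phase I: Cue 1 + Cue 2 -> outcome
phase2 = [((1, 0), 1.0)] * 10   # Phase II: Cue 1 alone -> outcome
hist = rescorla_wagner(phase1 + phase2)

w2_after_phase1 = hist[10, 1]
w2_after_phase2 = hist[20, 1]
print(w2_after_phase1, w2_after_phase2)   # both about 0.45: no backward blocking
```

Because Cue 2 is absent throughout Phase II, its weight is multiplied by zero in every update and stays where Phase I left it, while the weight on Cue 1 keeps growing.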
Design of the (rest of the) Talk:
2 x 2 x 3 factorial
Bayesian models of associative learning:
• Kalman filter.
• Noisy-logic gate.
Goals for active learning:
• Minimize expected uncertainty.
• Maximize expected maximum believability.
Training structures:
• Backward blocking.
• Blocking and reduced overshadowing.
• Ambiguous cue.
The Bayesian Ontology
Multiple hypotheses entertained simultaneously. Not just
a single state as in traditional models.
Graded degree of belief in each hypothesis.
Hypotheses specify the probability of each possible
stimulus/outcome value. Not deterministic as in
traditional models.
Learning is shifting of beliefs from less likely to more
likely hypotheses. Bayes’ rule specifies how.
Bayesian Learning In the Mind
Start with:
$p(t \mid \mathbf{c}, \mathbf{w})$: A model that specifies the probability of outcome value t, given cues c and hypothetical weights w.
$p(\mathbf{w})$: Prior believabilities of every possible hypothetical weight combination w.
Observe:
Cues c.
Actual outcome t.
Learn:
Update beliefs by Bayes’ rule.
$$p(\mathbf{w} \mid t, \mathbf{c}) = \frac{p(t \mid \mathbf{c}, \mathbf{w}) \, p(\mathbf{w})}{\int d\mathbf{w} \; p(t \mid \mathbf{c}, \mathbf{w}) \, p(\mathbf{w})}$$
Everyday Bayesian reasoning:
Logic of exoneration.
Holmesian deduction.
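A small numerical sketch may make the two everyday patterns concrete. It applies Bayes' rule to a discrete set of four hypotheses, so the integral in the denominator becomes a sum. The prior and likelihood values are made up for illustration, and `bayes_update` is a hypothetical helper name.

```python
import numpy as np

def bayes_update(prior, likelihood):
    """Posterior over discrete hypotheses: prior times likelihood, renormalized."""
    unnorm = np.asarray(prior, dtype=float) * np.asarray(likelihood, dtype=float)
    return unnorm / unnorm.sum()

prior = np.array([0.25, 0.25, 0.25, 0.25])   # four candidate hypotheses (illustrative)

# Holmesian deduction: the data are impossible under hypotheses 0-2,
# so all belief moves to the one hypothesis that remains.
holmes = bayes_update(prior, [0.0, 0.0, 0.0, 0.1])

# Exoneration: the data strongly favor hypothesis 0, so belief in the
# others drops even though nothing directly contradicts them.
exonerate = bayes_update(prior, [0.9, 0.1, 0.1, 0.1])

print(holmes)      # [0. 0. 0. 1.]
print(exonerate)   # 0.75 for hypothesis 0, about 0.083 for each of the others
```

Both patterns come from the normalization: belief gained by one hypothesis must be lost by the others, and belief removed from some hypotheses must be redistributed to the rest.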
Kalman Filter
$p(t \mid \mathbf{c}, \mathbf{w}) \sim N(t \mid \mathbf{w}^T \mathbf{c}, \, v)$
$p(\mathbf{w}) \sim N(\mathbf{w} \mid \boldsymbol{\mu}, \, \mathbf{C})$
(Diagram: cues c1, c2 with weights w1, w2.)
Kalman Filter Updating:
Step 1. Linear Dynamics
$p(\mathbf{w}) \sim N(\mathbf{w} \mid \boldsymbol{\mu}, \, \mathbf{C})$
$\boldsymbol{\mu}^* = \mathbf{D} \boldsymbol{\mu}$
$\mathbf{C}^* = \mathbf{D} \mathbf{C} \mathbf{D}^T + \mathbf{U}$
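Kalman Filter Updating:
Step 2. Bayesian Learning
$p(\mathbf{w}) \sim N(\mathbf{w} \mid \boldsymbol{\mu}^*, \, \mathbf{C}^*)$
$\boldsymbol{\mu}' = \boldsymbol{\mu}^* + (v + \mathbf{c}^T \mathbf{C}^* \mathbf{c})^{-1} \, (t - \boldsymbol{\mu}^{*T} \mathbf{c}) \, \mathbf{C}^* \mathbf{c}$
$\mathbf{C}' = \mathbf{C}^* - (v + \mathbf{c}^T \mathbf{C}^* \mathbf{c})^{-1} \, \mathbf{C}^* \mathbf{c} \, \mathbf{c}^T \mathbf{C}^*$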
Backward Blocking
(Animation frames: the Kalman filter trained on the backward blocking design shown earlier; figures not reproduced.)
Kalman filter does show backward blocking.
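The two-step Kalman filter update can be sketched in a few lines of NumPy and run on the backward blocking design. Several choices here are assumptions made for illustration and are not taken from the talk: the prior is N(0, I), the outcome noise is v = 1, the dynamics matrix D is the identity, and the process noise U is zero. The function name `kalman_update` is also hypothetical.

```python
import numpy as np

def kalman_update(mu, C, c, t, v=1.0, D=None, U=None):
    """One trial of the two-step update: linear dynamics, then Bayesian learning."""
    n = len(mu)
    D = np.eye(n) if D is None else D          # assumed: weights do not drift
    U = np.zeros((n, n)) if U is None else U   # assumed: no added process noise
    c = np.asarray(c, dtype=float)
    # Step 1: linear dynamics
    mu_star = D @ mu
    C_star = D @ C @ D.T + U
    # Step 2: Bayesian learning
    scale = 1.0 / (v + c @ C_star @ c)
    mu_new = mu_star + scale * (t - mu_star @ c) * (C_star @ c)
    C_new = C_star - scale * np.outer(C_star @ c, c @ C_star)
    return mu_new, C_new

mu, C = np.zeros(2), np.eye(2)       # assumed prior: N(0, I)
for _ in range(10):                  # Phase I: Cue 1 + Cue 2 -> outcome
    mu, C = kalman_update(mu, C, (1, 1), 1.0)
mu_phase1 = mu.copy()
for _ in range(10):                  # Phase II: Cue 1 alone -> outcome
    mu, C = kalman_update(mu, C, (1, 0), 1.0)
mu_phase2 = mu.copy()

print(mu_phase1)   # about [0.48, 0.48]
print(mu_phase2)   # about [0.92, 0.08]
```

Under these assumptions the mean weight on Cue 2 falls from about 0.48 to about 0.08 during Phase II, even though Cue 2 never appears in that phase. The drop comes from the negative covariance between the two weights that Phase I builds up.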
Noisy-logic gate
Outcome is present/absent.
A positive weight is the probability that the outcome occurs if that cue alone is present.
The magnitude of a negative weight is the probability that the outcome does not occur, when otherwise it would have, if that cue alone is present.
(Diagram: cues c1, c2 with weights w1, w2.)
Noisy-logic gate
$$p(t = 1 \mid \mathbf{c}, \mathbf{w}) = \Big[ 1 - \prod_{i \text{ s.t. } w_i > 0} (1 - w_i)^{c_i} \Big] \times \prod_{j \text{ s.t. } w_j < 0} (1 + w_j)^{c_j}$$
First factor: noisy OR. Second factor: noisy NOT AND.
$p(\mathbf{w}) \sim \ldots$
(Diagram: cues c1, c2 with weights w1, w2.)
Backward Blocking
(Animation frames: the noisy-logic gate trained on the backward blocking design shown earlier; figures not reproduced.)
Noisy logic gate does show backward blocking.
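A grid approximation gives a rough way to see this behavior. The sketch below is not the talk's implementation: the prior is elided on the slide, so a uniform prior over a 9-point grid from -1 to 1 for each weight is assumed here, and the helper names `p_outcome`, `learn`, and `predict` are hypothetical. The likelihood follows the noisy-logic gate formula, with (1 + w_j) used for negative weights.

```python
import numpy as np
from itertools import product

def p_outcome(c, w):
    """Noisy-logic gate: probability that the outcome is present given cues c."""
    c = np.asarray(c, dtype=float)
    w = np.asarray(w, dtype=float)
    gen = w > 0     # generative cues enter the noisy OR
    prev = w < 0    # preventive cues scale the result down
    noisy_or = 1.0 - np.prod((1.0 - w[gen]) ** c[gen])
    preventers = np.prod((1.0 + w[prev]) ** c[prev])
    return noisy_or * preventers

grid = np.linspace(-1.0, 1.0, 9)                      # assumed weight grid
hyps = np.array(list(product(grid, repeat=2)))        # all (w1, w2) combinations
belief = np.full(len(hyps), 1.0 / len(hyps))          # assumed uniform prior

def learn(belief, c, t):
    """Bayes' rule over the grid for one trial with cues c and outcome t."""
    p1 = np.array([p_outcome(c, w) for w in hyps])
    like = p1 if t == 1 else 1.0 - p1
    post = belief * like
    return post / post.sum()

def predict(belief, c):
    """Predicted probability of the outcome, averaged over current beliefs."""
    return float(sum(b * p_outcome(c, w) for b, w in zip(belief, hyps)))

for _ in range(10):                                   # Phase I
    belief = learn(belief, (1, 1), 1)
cue2_after_phase1 = predict(belief, (0, 1))
for _ in range(10):                                   # Phase II
    belief = learn(belief, (1, 0), 1)
cue2_after_phase2 = predict(belief, (0, 1))

print(cue2_after_phase1, cue2_after_phase2)
```

With this grid and prior, the predicted probability of nausea for Cue 2 alone is lower after Phase II than after Phase I, which is the backward blocking pattern. The exact values depend on the assumed grid and prior.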
Active Learning after Backward Blocking
Phase  Frequency  Cue 1  Cue 2  Outcome
I      10         1      1      1
II     10         1      0      1
After being trained on backward blocking, your goal is to better
determine which cues produce or prevent nausea.
Which probe below do you prefer to test?
Uncertainty of a Belief Distribution
(a.k.a. entropy)
Spread out: high uncertainty.
Piled up: low uncertainty.
A candidate goal for active learning:
Belief distribution with low uncertainty.
$U_w = -\sum_w p(w) \log p(w)$
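As a quick illustration of the entropy formula, the sketch below compares a spread-out and a piled-up belief distribution over eight hypothetical weight values. The two distributions are made up for illustration, the natural logarithm is assumed, and terms with zero probability are treated as contributing zero.

```python
import numpy as np

def uncertainty(p):
    """Entropy -sum p log p of a discrete belief distribution (natural log)."""
    p = np.asarray(p, dtype=float)
    nz = p > 0                      # 0 * log 0 is taken to be 0
    return float(-np.sum(p[nz] * np.log(p[nz])))

spread_out = np.full(8, 1 / 8)               # belief spread evenly
piled_up = np.array([0.93] + [0.01] * 7)     # belief piled on one value

print(uncertainty(spread_out))   # log(8), about 2.08
print(uncertainty(piled_up))     # about 0.39
```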
Current uncertainty is computed from the current belief distribution.
For a candidate probe, we compute the expected uncertainty:
$$EU_w(\mathbf{c}) = \int da \; p(a \mid \mathbf{c}) \, U_{w \mid a, \mathbf{c}}$$
where $p(a \mid \mathbf{c}) = \int d\mathbf{w} \; p(a \mid \mathbf{c}, \mathbf{w}) \, p(\mathbf{w})$
Probe: 1 0
  outcome = 1: p = .96
  outcome = 0: p = .04
  Expected uncertainty = .96*0.566 + .04*.624
Probe: 0 1
  outcome = 1: p = .40
  outcome = 0: p = .60
  Expected uncertainty = .40*0.559 + .60*.532
Active Learning
after Backward Blocking
Both Kalman filter and noisy-logic gate match human intuition
when minimizing Expected Uncertainty.
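The expected-uncertainty comparison for the two probes can be worked through directly from the numbers shown on the preceding slide. The sketch below only carries out that arithmetic; the dictionary layout and variable names are illustrative.

```python
# Each probe maps to (probability of outcome, uncertainty after that outcome),
# using the values shown on the slide.
probes = {
    "1 0": [(0.96, 0.566), (0.04, 0.624)],
    "0 1": [(0.40, 0.559), (0.60, 0.532)],
}

# Expected uncertainty: outcome probabilities times resulting uncertainties.
expected = {name: sum(p * u for p, u in outcomes)
            for name, outcomes in probes.items()}

best = min(expected, key=expected.get)
print(expected)   # about 0.568 for probe "1 0" and 0.543 for probe "0 1"
print(best)       # "0 1"
```

With these numbers, testing Cue 2 alone (probe 0 1) yields the lower expected uncertainty.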
Maximum Probability
of a Belief Distribution
Small max p.
Larger max p.
A candidate goal for active learning:
Belief distribution with high max p.
Max P and Minimal Entropy do not
always correspond:
Lower uncertainty.
Higher Max p.
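A pair of made-up discrete distributions shows how the two measures can disagree. In this sketch (illustrative values only, natural logarithm assumed), distribution `a` has the lower uncertainty while distribution `b` has the higher maximum probability.

```python
import numpy as np

def uncertainty(p):
    """Entropy -sum p log p, with zero-probability terms contributing zero."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log(p[nz])))

a = np.array([0.5, 0.5, 0.0, 0.0, 0.0])   # two equally believable values
b = np.array([0.6, 0.1, 0.1, 0.1, 0.1])   # one favorite, the rest spread thin

print(uncertainty(a), a.max())   # about 0.69, max p = 0.5
print(uncertainty(b), b.max())   # about 1.23, max p = 0.6
```

A learner minimizing uncertainty would prefer to end up at `a`, while a learner maximizing max p would prefer `b`, so the two goals can rank the same outcomes differently.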
For a candidate probe, we compute the expected max p:
Probe: 1 0
  outcome = 1: p = .96
  outcome = 0: p = .04
  Expected max p = .96*53 + .04*27.2
Probe: 0 1
  outcome = 1: p = .40
  outcome = 0: p = .60
  Expected max p = .40*49.7 + .60*84.9
Active Learning
after Backward Blocking
Both Kalman filter and noisy-logic gate match human intuition
when maximizing Expected Max P.
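Carrying out the arithmetic for the expected max p values shown on the preceding slide gives the comparison between the two probes. The notation $E[\max p]$ is shorthand introduced here for the expected max p.

```latex
\begin{align*}
E[\max p]\,(\text{probe } 1\;0) &= .96 \times 53 + .04 \times 27.2 = 50.88 + 1.088 = 51.968 \\
E[\max p]\,(\text{probe } 0\;1) &= .40 \times 49.7 + .60 \times 84.9 = 19.88 + 50.94 = 70.82
\end{align*}
```

With these numbers, testing Cue 2 alone (probe 0 1) yields the higher expected max p, the same probe favored by the expected-uncertainty goal.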
Blocking and Reduced Overshadowing
Phase  Freq.  Cue 1  Cue 2  Cue 3  Cue 4  Outc.
I      6      1      0      0      0      1
              0      0      1      0      0
II     6      1      1      0      0      1
              0      0      1      1      1
Active Learning after Blocking and Reduced Overshadowing
Phase  Freq.  Cue 1  Cue 2  Cue 3  Cue 4  Outc.
I      6      1      0      0      0      1
              0      0      1      0      0
II     6      1      1      0      0      1
              0      0      1      1      1
After being trained, your goal is to better determine which cues
produce or prevent nausea. Which probe below do you prefer to test?
Kalman filter predictions for Expected Uncertainty…
(Animation frames not reproduced.)
Kalman filter predictions: No preference!
Same for Max P.
Noisy-logic gate predictions for Expected Uncertainty…
(Animation frames not reproduced.)
Noisy logic gate predictions: Match human preference!
Same for Max P.
Ambiguous Cue
Remember: Foods can be anti-emetic, i.e., prevent nausea.
Freq.  Cue 1  Cue 2  Cue 3  Outcome
10     1      1      0      1
10     0      1      1      0
After being trained, your goal is to better determine which cues
produce or prevent nausea. Which probe below do you prefer to test?
Kalman filter predictions for Expected Uncertainty…
(Animation frames not reproduced.)
Cue 2 is least preferred!
Noisy logic gate predictions for Expected Uncertainty…
(Animation frames not reproduced.)
Cue 2 is most preferred!
                               Kalman filter              Noisy logic gate
Minimize Expected Uncertainty  Tied preference for 1 & 3  Preference for 2
Maximize Expected Max P        Tied preference for 1 & 3  Tied preference for 1 & 2
Summary of Active Learning Predictions and Match to Human Preference
(Table: columns are the Bayesian models of (passive) associative learning, Kalman filter and Noisy logic gate; rows are the goals for active learning, Minimize Expected Uncertainty and Maximize Expected Max P. Each cell lists (Backward) Blocking, Blocking & Reduced Overshadowing, and Ambiguous Cue, each with a match mark; the marks did not survive extraction.)
The Future of (Associative) Learning Theory
Bayesian model of (passive) associative learning, by goal for active learning:
Goal for active learning       Kalman filter            Noisy logic gate         New Models …
Minimize Expected Uncertainty  New Training Structures  New Training Structures  New Training Structures
Maximize Expected Max P        New Training Structures  New Training Structures  New Training Structures
New Goals …                    New Training Structures  New Training Structures  New Training Structures
New Models …
• Rich hierarchical models.
• Other levels of analysis: neuron, functional module within individual, corporation.
New Goals …
• Different information objectives.
• Incorporate utilities of expected outcomes.
• Incorporate costs of different probes.
• Consider multiple steps ahead.