
Mice take calculated risks
Aaron Kheifets¹ and C. R. Gallistel¹
Department of Psychology, Rutgers University, Piscataway, NJ 08854

Contributed by C. R. Gallistel, April 18, 2012 (sent for review November 25, 2011)
Animals successfully navigate the world despite having only
incomplete information about behaviorally important contingencies. It is an open question to what degree this behavior is driven
by estimates of stochastic parameters (brain-constructed models
of the experienced world) and to what degree it is directed by
reinforcement-driven processes that optimize behavior in the limit
without estimating stochastic parameters (model-free adaptation
processes, such as associative learning). We find that mice adjust
their behavior in response to a change in probability more quickly
and abruptly than can be explained by differential reinforcement.
Our results imply that mice represent probabilities and perform
calculations over them to optimize their behavior, even when the
optimization produces negligible material gain.
reinforcement learning | decision under uncertainty | model-based control | probability estimation | timing

A central question in cognitive neuroscience is the extent to which computed representations of the experienced environment direct behavior. Many contemporary models of adjustment to changing risks assume computationally simpler event-by-event value-updating processes (1–3). The advantage of these
models is that a computationally simple process does in time
adjust behavior more or less optimally to changes in, for example, the probability of reinforcement. The advantage of computationally more complex model-based decision making is the
rapidity and accuracy with which behavior can adjust to changing
probabilities (4, 5). Recent studies with humans (6) suggest that
the rapidity of adjustment to changes in the payoff matrix in a
choice task requires a representation of risk factors (probabilities
and variabilities). We report an analogous result with mice when
they adjust to a shift in relative risk. In our experiment, a change
in the relative frequencies of short and long trials changes the
optimal temporal decision criterion in an interval-timing task.
We find that mice make the behavioral adjustment to the change
in relative risk about as soon and abruptly as is in principle
possible. Our results imply that neural mechanisms for estimating probabilities and calculating relative risk are phylogenetically
ancient. This finding means that they may be investigated by
genetic and other invasive procedures in genetically manipulated
mice and other laboratory animals.
The hopper-switch task (7) has been used to show approximately optimal risk adjustment in mice (8). The task requires the
mouse to decide on each trial whether to leave a short-latency
feeding location (hopper) to go to a long-latency location (hopper)
based on its estimate of how much time has elapsed since the trial
began (Fig. 1). If the computer has chosen a short trial, the mouse
gets a pellet in the short-latency hopper in response to the first
poke there at or after 3 s. Failure to obtain a pellet there after 3 s have elapsed implies that the computer has chosen a long-latency trial. On a long-latency trial, a pellet is delivered into the
other hopper in response to the first poke there at or after 9 s. The
computer chooses short or long trials at random, with the probability of choosing a short trial denoted pS. At the beginning of
a trial, the mouse does not know which type of trial it will be.

Fig. 1. The experimental environment: a Med Associates mouse test chamber (Plexiglas and aluminum) joined by a connecting tube to a nest tub, with a trial-initiation hopper, short- and long-latency feeding hoppers, and pellet dispensers (scale: 10 cm). In the switch task, a trial proceeds as follows: 1: Light in the trial-initiation hopper signals that the mouse may initiate a trial. 2: The mouse approaches and pokes into the trial-initiation hopper, extinguishing the light there and turning on the lights in the two feeding hoppers (trial onset). 3: The mouse goes to the short-latency hopper and pokes into it. 4: If, after 3 s have elapsed since trial onset, poking in the short-latency hopper does not deliver a pellet, the mouse switches to the long-latency hopper, where it gets a pellet in response to the first poke at or after 9 s since trial onset. Lights in both feeding hoppers extinguish either at pellet delivery or when an erroneously timed poke occurs. Short trials last about 3 s and long trials about 9 s, whether reinforced or not: if the mouse is poking in the short hopper at the end of a 3-s trial, it gets a pellet and the trial ends; if it is poking in the 9-s hopper, it does not get a pellet and the trial ends at 3 s. Similarly, long trials end at 9 s: if the mouse is poking in the 9-s hopper, it gets a pellet; if in the 3-s hopper, it does not. A switch latency is the latency of the last poke in the short hopper before the mouse switches to the long hopper. Only the switch latencies from long trials are analyzed.
The rational strategy given these contingencies is to poke in
the short hopper at the start of each trial, then switch to the long
hopper if and when 3 s or more elapse without a pellet being
delivered in response to a poke into the short hopper. The mice
learn to do this within a few daily sessions. When the mice
have learned the action, they get a pellet on almost every trial.
However, the mice sometimes switch too soon (after fewer than
3 s) or too late (after more than 9 s), and they sometimes go
directly to the long hopper. Whether these errors cost them
a pellet or not depends on which kind of trial the computer has
chosen. Thus, the risk of one kind of error or the other depends
on pS, the probability that the current trial is a short one, and on
the subject’s timing variability, how precisely it can judge when
3 s have elapsed.
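To make these contingencies concrete, here is a minimal Python sketch of a single trial under the rule just described (the function name and the illustrative switch latency are ours, not part of the protocol):

```python
import random

def run_trial(p_short, switch_latency):
    """Simulate one trial of the hopper-switch task as described above.

    The computer chooses a short trial with probability p_short. The mouse
    pokes the short hopper until switch_latency seconds have elapsed and
    then moves to the long hopper. It earns a pellet on a short trial if it
    is still at the short hopper at 3 s, and on a long trial if it has
    reached the long hopper by 9 s (travel time is ignored).
    """
    if random.random() < p_short:          # short trial
        return switch_latency > 3.0
    else:                                  # long trial
        return switch_latency < 9.0

# Any target latency between 3 and 9 s is reinforced on every trial,
# which is why well-trained mice get a pellet on almost every trial.
wins = sum(run_trial(0.7, 5.0) for _ in range(1000))
print(wins / 1000)  # 1.0
```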
When pS changes, altering the relative risks from the two kinds
of error, mice shift the distribution of their switch latencies by an
approximately optimal amount (8) (Fig. 2). The question we pose
is, what kind of process underlies this behavioral shift in response
to a change in the relative frequencies of the short and long trials?
On the reinforcement-learning hypothesis, the shift in the
distribution of switch latencies is driven by a change in the relative
frequency with which different switch latencies are reinforced.
When the short-latency trials are frequent, the probability of a
long-latency switch being reinforced increases and the probability
of a short-latency switch decreases, and vice versa when long-latency trials are frequent. With this hypothesis, a slow trial-and-error hill-climbing process leads in time to a new stable distribution that maximizes the reinforcement probability. In this reinforcement-learning hypothesis, the mouse makes no estimate of
the relative frequency of the short and long trials, nor of its own
timing variability, and it does not calculate an optimal target. The
mouse simply discovers by trial and error the target switch latency
that maximizes the rate of reinforcement.
On what we call the model-based hypothesis, the mouse brain
estimates pS, it estimates its timing variability (σ) or its uncertainty about how much time has elapsed, and it calculates
from its estimates of these stochastic parameters the optimal
target switch time (Fig. S1). With this hypothesis, there will be
a step-change in the distribution of switch latencies as soon as
the mouse detects the change in the relative frequency of the
short and long trials, estimates the new relative frequency, and
calculates the new optimum. In this step hypothesis, a step change in the distribution of switch latencies can occur before
the change in the relative frequency of the trial types has produced any change in the rate of reinforcement. In the reinforcement-learning hypothesis, this cannot happen because the
change in behavior is driven by a change in the relative rates at
which short and long switches are reinforced.
Results
As may be seen from the raw data on switch latencies plotted
in Fig. 2, when we changed the relative frequency of the short
and long trials, the mice adjusted the distribution of their switch
latencies so quickly and so abruptly that the adjustments appear step-like.
Fig. 2. Scatter plot of switch latencies on long trials (small black circles), one panel per subject (S1–S5), plotted against trial number. Red dots at the top and bottom of each plot mark long and short trials, respectively. Vertical lines mark session boundaries. Thin blue horizontal lines are the medians of the distributions. When the relative density of red dots at top and bottom changes (that is, when the relative frequency of long and short trials changes), the distribution of black circles shifts away from the increased red density (increased risk). Large red ovals indicate unilateral feeder malfunctions.
Regardless of the relative frequency of the short and long trials, the distribution of switch latencies was well described by an ExpGauss mixture distribution. One component of the mixture
was a short-latency exponential, the expectation of which did not
vary with the relative frequency of short and long trials. The
switches contributing to this component of the mixture appeared
to be impulsive, not timed. The other component was a Gaussian, the mean of which varied with the relative frequency of the
short and long trials. The ExpGauss mixture distribution has four
parameters: the expectation of the exponential (denoted e), the
mean and SD of the Gaussian (μ and σ), and the proportion
parameter (p), which gives the mixing proportion of the two
components. We take the mean of the Gaussian component to
be an estimate of the target switch time on those trials when the
mouse timed its switch.
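For concreteness, the mixture density can be written out as below. This is a sketch: the text does not say which component the proportion parameter p weights, so we assume it weights the exponential component.

```python
from scipy.stats import expon, norm

def expgauss_pdf(t, e, mu, sigma, p):
    """Density of the ExpGauss mixture at switch latency t.

    e     -- expectation of the exponential (impulsive, untimed switches)
    mu    -- mean of the Gaussian (timed switches)
    sigma -- SD of the Gaussian
    p     -- mixing proportion, assumed here to weight the exponential
    """
    return p * expon.pdf(t, scale=e) + (1 - p) * norm.pdf(t, loc=mu, scale=sigma)
```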
We modeled the trial-by-trial trajectory of the behavioral shifts
from one ExpGauss distribution to another as a piece-wise linear
change in θ, the parameter vector for the ExpGauss distribution
(Fig. 3A). Our transition function has two parameters: the number of trials before the start (S) of the behavioral shift and the number
of trials for it to go to completion once started (D, for duration):
F(T) = min(max((T − S)/D, 0), 1),   [1]
where F(T) is the fraction of the transition completed as of trial
T (Fig. 3A). We then computed the marginal-likelihood functions for S and D to find their plausible values.
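In code, Eq. 1 and the parameter trajectory it implies are simply the following (a sketch; variable names are ours):

```python
import numpy as np

def F(T, S, D):
    """Eq. 1: fraction of the transition completed as of trial T."""
    return np.minimum(np.maximum((T - S) / D, 0.0), 1.0)

def theta_at(T, theta_before, theta_after, S, D):
    """Piecewise-linear trajectory of the ExpGauss parameter vector
    theta = (e, mu, sigma, p) between its pre- and post-change values."""
    f = F(T, S, D)
    return (1 - f) * np.asarray(theta_before) + f * np.asarray(theta_after)
```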
Fig. 3B plots the cumulative distribution of the expectations of
the marginal-likelihood functions for S, the start parameter.
These expectations are estimates of how many trials it takes
a mouse to detect a substantial change in pS and begin to adjust
its switch latencies. The median estimate for the start of a transition is 10 trials.
The median expectation of the marginal-likelihood function
for D, the duration of a transition, was 19 trials, with a range
from 3 to 105. However, in all 14 transitions, the likelihood at D = 1 was higher than the mean likelihood over the range 1 ≤ D ≤ 100. Thus, in every case, the transition data favored the step hypothesis (D = 1) over the alternative hypothesis that the transition lasted somewhere between 1 and 100 trials (see Fig. S2 for the marginal-likelihood functions). The cumulative distribution of the Bayes factors (the odds in favor of D = 1) is in Fig. 3C. The combined odds in favor of a step transition (the product of the Bayes factors) are enormous.

Fig. 3. (A) A transition function, with parameters S (start) and D (duration), describes the trial-by-trial change in the parameter vector, θ = ⟨e, μ, σ, p⟩, of the ExpGauss mixture. Trial 0 is the last trial before the change in pS. (B) Cumulative distribution of S estimates (start trial, log scale). (C) Cumulative distribution of Bayes factors favoring D = 1 (a step switch) vs. 1 ≤ D ≤ 100 (a more gradual switch), log scale.

Our key findings are thus: (i) It takes mice about 10 trials to detect a substantial change in the relative frequency of the short and long
trials and to compute an estimate of the new pS, based on the data since the estimated change; (ii) the change in target switch latency, and hence the distribution of switch latencies, is accomplished
in a single trial. The step change in the distribution implies that the
prechange estimate of pS is replaced by a postchange estimate,
based on postchange data that do not overlap the prechange data.
To constrain the space of possible calculations that could
underlie this behavior, we considered two basic kinds of algorithms: reinforcement-driven algorithms that optimize behavior
without modeling the experienced world, and model-driven
algorithms that estimate stochastic parameters of the experienced
world (probabilities and variabilities) and calculate an optimal
change in decision criterion (target switch time) given those
estimates. Reinforcement-learning algorithms are often used in
machine learning when computing a model of the environment in
which the machine is to act is thought to be too complex. The
assumption that reinforcement shapes behavior without the
brain’s constructing a representation of the experienced world has
a long history in psychology and behavioral and cognitive neuroscience (9–13). In reinforcement learning, behavioral change is
driven by changes in the rate of reinforcement that a pattern of
behavior (“strategy”) produces. In the present case, this translates
to changes in the average number of pellets obtained per trial for
a given target switch latency. The target switch latency is the
strategy. Varying this strategy will vary the rate of reinforcement.
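The authors go on to argue against this account, but to fix ideas, here is a minimal sketch of the kind of model-free hill climbing the hypothesis envisions; the step size, sampling window, and update rule are our illustrative choices:

```python
import random

def reinforced(p_short, latency):
    # Contingencies from the text: a pellet is missed only by switching
    # before 3 s on a short trial or after 9 s on a long trial.
    return (latency > 3.0) if random.random() < p_short else (latency < 9.0)

def hill_climb(p_short, target=6.0, step=0.5, window=200, n_epochs=50):
    """Adjust the target switch latency using only obtained reinforcement:
    compare the reinforcement rate just above and just below the current
    target and move toward whichever does better. When the gain hill is
    shallow, the difference signal is tiny and convergence is slow."""
    for _ in range(n_epochs):
        up = sum(reinforced(p_short, target + step) for _ in range(window))
        down = sum(reinforced(p_short, target - step) for _ in range(window))
        if up != down:
            target += step if up > down else -step
    return target
```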
In line with the findings of ref. 8, we found that subjects missed
very few reinforcements. Each panel in Fig. 4 plots two functions
in the plane of the parameters of the Gaussian distribution (μ
and σ). The gain contours (red curves) delimit the expected
percentage of reinforced trials for combinations of values for
these parameters. When the μ and σ estimates for the Gaussian
component of a subject’s switch-latency distribution are below
the 99% gain contour, the subject gets pellets on more than 99%
of the trials. When above a given contour, the subject gets pellets
on a smaller percentage of the trials.
The joint confidence limits around our estimates for the values
of the parameters of the Gaussian component of their switch-latency distribution are the small concentric circles in Fig. 4. We
assume that subjects cannot control their timing variability; that
is, the location of the concentric circles along the σ axis. Subjects
determine the location along the μ axis when they calculate or
otherwise arrive at a target switch latency. As may be seen in Fig.
4, the observed locations are generally close to the mode of the
99% gain contour (close to the optimal location), even though
substantial deviations from that mode would have little impact
on gain (rate of reinforcement). The gain contours shift when pS
changes. The optimal target latency depends on pS and on
a subject’s timing variability. Sensitivity to one’s own variability
has also been demonstrated in humans (4, 14, 15).

Fig. 4. Parameters of the Gaussian switch-latency distributions (black concentric circles) in relation to gain contours (red upside-down-U curves); the panels show Subjects 2 and 5 at pS = 0.7 and pS = 0.1, with 90%, 95%, and 99% contours. A gain contour traces the upper limit on the combinations of variability (σ) and mean (μ) consistent with an expected percentage of reinforced trials. The small concentric circles are the 50% and 95% confidence limits on our estimates of the mean and SD of the Gaussian component of the subject's switch-latency distribution. The circles consistently fall near the mode of the 99% gain contour. Note the change in gain contours with changes in pS. Note also the shallowness of the gain hill: the shallower the gain hill, the more slowly a hill-climbing algorithm can find its summit (that is, converge on the optimal strategy). Note finally that the confidence limits under different pS conditions do not overlap. Thus, the changes in pS produce statistically significant changes in the mean switch latency, as is also evident in the raw data (Fig. 2).
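The geometry of these contours is easy to reproduce. A minimal sketch, assuming gain is carried entirely by the Gaussian component and that pellets are missed only by switching before 3 s on a short trial or after 9 s on a long trial:

```python
import numpy as np
from scipy.stats import norm

def gain(mu, sigma, p_short):
    """Expected fraction of reinforced trials for Gaussian switch
    latencies N(mu, sigma), ignoring the exponential component."""
    miss_short = norm.cdf(3.0, loc=mu, scale=sigma)        # switched too soon
    miss_long = 1.0 - norm.cdf(9.0, loc=mu, scale=sigma)   # switched too late
    return p_short * (1.0 - miss_short) + (1.0 - p_short) * (1.0 - miss_long)

# The contours of Fig. 4 are level sets of this function on a (mu, sigma)
# grid, e.g., gain = 0.90, 0.95, 0.99, traceable with matplotlib's contour().
mu, sigma = np.meshgrid(np.linspace(3, 9, 200), np.linspace(0.3, 3.5, 200))
G = gain(mu, sigma, p_short=0.7)
```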
We examined whether the rate of reinforcement could drive
these behavioral shifts by calculating the number of pellets missed
before the change in behavior went to completion. In estimating
when the change went to completion, we used the expected duration of a transition, rather than the maximally likely duration
which, as already noted, was always 1. Over 30% of the behavioral
shifts went to completion before a single pellet was missed (because of switching too soon or too late). Moreover, the pattern
and number of reinforcements missed after the behavioral shift
was indistinguishable from the pattern and number missed over
the same span of trials before the shift. Thus, our data rule out
reinforcement-learning as an explanation for the changes in the
distribution of switch latencies. The change in the distribution of
switch latencies must be driven by a change in the estimate of pS,
based on the subject’s experience over several trials.
Finally, we asked whether the adjustments in the estimates of
pS were a consequence of trial-by-trial probability tracking
(leaving later after experiencing one or two short trials and
sooner after experiencing one or two long trials). Given the
abruptness of the changes, the resulting estimates of pS would
have to be highly sensitive to the most recent two or three trials.
However, the distributions of switch latencies conditioned on the types of the n most recent trials (for n = 1, 3, and 5) did not
differ. That is, the distribution of switch latencies following three
short trials did not differ from the distribution following three
long trials. This finding implies that the abrupt behavioral shifts
resulted from a change-detecting mechanism that parsed the
experienced sequence into trials before a change and trials after
a change, with the new estimate of pS based only on the relatively
few trials between the estimated change point and the trial on
which the change was perceived. In arguing for a model-based
adjustment algorithm, we mean only that the algorithm operates
on estimates of probabilities and change points. Whether it is
necessary to assume that brains—even human brains—make such
estimates has long been a question of moment in psychology,
cognitive science, and neuroscience.
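The text does not specify the change-detecting computation. As one concrete possibility consistent with the description, here is a minimal maximum-likelihood change-point sketch for a Bernoulli trial sequence; the function and scoring rule are ours, not the authors':

```python
import numpy as np

def detect_change(trials):
    """trials: 1 = short trial, 0 = long trial, in order of occurrence.

    For each candidate change point c, score the maximized Bernoulli
    log-likelihood of one rate before c and another after c; return the
    best c and the estimate of p_S based only on the post-change trials.
    """
    t = np.asarray(trials, dtype=float)
    n = len(t)

    def loglik(x):
        k, m = x.sum(), len(x)
        p = k / m
        if p in (0.0, 1.0):
            return 0.0  # degenerate Bernoulli: likelihood 1
        return k * np.log(p) + (m - k) * np.log(1 - p)

    best_c = max(range(1, n), key=lambda c: loglik(t[:c]) + loglik(t[c:]))
    return best_c, t[best_c:].mean()
```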
In a subsequent experiment with six new mice, we made the
change approximately 40 trials into a session rather than at the
beginning of a new session. We observed similarly abrupt
changes. The marginal-likelihood functions from this experiment
are in Fig. S3.
We conclude that mice can accurately represent sequential
Bernoulli probability (pS), quickly perceive a substantial change
in this probability, locate the point in the past at which the
change occurred, estimate the new probability on the basis of the
postchange trials, represent the uncertainty in their endogenous
estimate of time elapsed in a trial, and correctly compute a new
approximately optimal target switch latency.
This simple paradigm may be used with genetically manipulated mice, opening up the possibility of using genetic and molecular biological methods to discover the neurobiological
mechanisms that make these evolutionarily ancient, life-sustaining computations possible.
Materials and Methods
Protocol. Subjects lived 24/7 in the environment shown in Fig. 1, obtaining all
their food by performing the tasks. For a complete description of this behavior-testing environment and the associated software, see ref. 16. The five
mice, of the C57BL/6J strain obtained from The Jackson Laboratory, were ∼100 d old at the start of the experiment. In session 1 (1 d), the task was
a concurrent VI matching protocol. In session 2 (1 d) it was a two-latency,
two-hopper, autoshaped hopper-entry protocol: steps 1 and 2 were as in Fig.
1, except that at step 2 (trial onset), the light came on in one or the other of
the two feeding hoppers and a pellet was delivered at the short or long
latency, regardless of the mouse’s behavior. This process taught the mice the
feeding latency associated with each hopper. In the switch task, the interval
from the end of a trial to the next illumination of the trial-initiation hopper
was exponentially distributed with an expectation of 60 s. The pellets were
20-mg Purina grain-based pellets from WF Fisher and Son, Inc. The sole condition on
pellet release was that the first poke after the computer-chosen feeding
latency be in the correct hopper (the short hopper on short trials and the
long hopper on long trials). Any behavior pattern that met this criterion was
reinforced (for example, going directly to the long hopper on a long trial or
switching from the short hopper to the long and then back to the short
within 3 s on a short trial). This protocol was approved by the Institutional
Animal Care and Use Committee of Rutgers University.
Transition Analysis. We assume that all four parameters follow the same two-parameter linear transition trajectory (Fig. 3A). Thus, our statistical model for
a transition has 10 parameters: 4 for the distribution before the transition
(θb), 4 for after the transition (θa), and the transition parameters, S and D (Fig. 3A). We used Metropolis–Hastings Markov chain Monte Carlo methods
to sample from the posterior distribution to estimate values for these 10
parameters. The estimates for θb and θa were essentially the same as the
maximum-likelihood estimates. To check the validity of the marginal-likelihood functions for S and D obtained from Markov chain Monte Carlo
sampling, we computed the latter numerically after fixing θb and θa at their
maximum-likelihood values.
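As a sketch of how such a sampler can be put together (our illustrative choices of flat priors, proposal scale, and chain length, not the authors' settings):

```python
import numpy as np
from scipy.stats import expon, norm

def expgauss_pdf(t, e, mu, sigma, p):
    # ExpGauss mixture density; as before, p is assumed to weight the
    # exponential (impulsive) component.
    return p * expon.pdf(t, scale=e) + (1 - p) * norm.pdf(t, loc=mu, scale=sigma)

def loglik(params, latencies):
    """Log-likelihood of the 10-parameter model: theta_b (4 values),
    theta_a (4 values), and S and D, with theta interpolated linearly
    between trials S and S + D (Eq. 1)."""
    theta_b, theta_a = np.asarray(params[:4]), np.asarray(params[4:8])
    S, D = params[8], params[9]
    f = np.clip((np.arange(len(latencies)) - S) / D, 0.0, 1.0)
    ll = 0.0
    for f_t, lat in zip(f, latencies):
        e, mu, sigma, p = (1 - f_t) * theta_b + f_t * theta_a
        ll += np.log(expgauss_pdf(lat, e, mu, sigma, p))
    return ll

def in_support(x):
    # e and sigma must be positive, p must lie in [0, 1], and D >= 1.
    e_b, _, s_b, p_b, e_a, _, s_a, p_a, _, D = x
    return (min(e_b, e_a, s_b, s_a) > 0 and 0 <= p_b <= 1
            and 0 <= p_a <= 1 and D >= 1)

def metropolis(latencies, start, n_samples=20000, scale=0.05, seed=0):
    """Random-walk Metropolis-Hastings over the 10 parameters, flat priors."""
    rng = np.random.default_rng(seed)
    current = np.asarray(start, dtype=float)
    ll = loglik(current, latencies)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.normal(0.0, scale, size=10)
        if in_support(proposal):
            ll_new = loglik(proposal, latencies)
            if np.log(rng.uniform()) < ll_new - ll:  # accept/reject
                current, ll = proposal, ll_new
        samples.append(current.copy())
    return np.array(samples)
```

With flat priors, a histogram of the sampled D column is proportional to the marginal-likelihood function for D, and the Bayes factor in favor of D = 1 can be computed as the marginal likelihood at D = 1 relative to its mean over 1 ≤ D ≤ 100.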
ACKNOWLEDGMENTS. We thank Joseph Negron for help in running the experiment and Jacob Feldman for valuable comments on an early draft. This research was funded by Grant R01MH077027 from the National Institute of Mental Health, entitled "Cognitive Phenotyping in the Mouse."

Author contributions: A.K. and C.R.G. designed research, performed research, analyzed data, and wrote the paper.

The authors declare no conflict of interest.

¹To whom correspondence may be addressed. E-mail: [email protected] or [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1205131109/-/DCSupplemental.
1. Corrado GS, Sugrue LP, Seung HS, Newsome WT (2005) Linear-nonlinear-Poisson models of primate choice dynamics. J Exp Anal Behav 84:581–617.
2. Dayan P, Daw ND (2008) Decision theory, reinforcement learning, and the brain. Cogn Affect Behav Neurosci 8:429–453.
3. Schultz W (2006) Behavioral theories and the neurophysiology of reward. Annu Rev Psychol 57:87–115.
4. Trommershäuser J, Maloney LT, Landy MS (2008) Decision making, movement planning and statistical decision theory. Trends Cogn Sci 12:291–297.
5. Daw ND, Niv Y, Dayan P (2005) Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat Neurosci 8:1704–1711.
6. Stüttgen MC, Yildiz A, Güntürkün O (2011) Adaptive criterion setting in perceptual decision making. J Exp Anal Behav 96:155–176.
7. Fetterman JG, Killeen PR (1995) Categorical scaling of time: Implications for clock-counter models. J Exp Psychol Anim Behav Process 21:43–63.
8. Balci F, Freestone D, Gallistel CR (2009) Risk assessment in man and mouse. Proc Natl Acad Sci USA 106:2459–2463.
9. Hull CL (1943) Principles of Behavior (Appleton-Century-Crofts, New York).
10. Skinner BF (1938) The Behavior of Organisms (Appleton-Century-Crofts, New York).
11. Skinner BF (1981) Selection by consequences. Science 213:501–504.
12. Sutton RS, Barto AG (1998) Reinforcement Learning: An Introduction (MIT Press, Cambridge, MA).
13. Rangel A, Camerer C, Montague PR (2008) A framework for studying the neurobiology of value-based decision making. Nat Rev Neurosci 9:545–556.
14. Trommershäuser J, Maloney LT, Landy MS (2003) Statistical decision theory and the selection of rapid, goal-directed movements. J Opt Soc Am A Opt Image Sci Vis 20:1419–1433.
15. Behrens TE, Woolrich MW, Walton ME, Rushworth MF (2007) Learning the value of information in an uncertain world. Nat Neurosci 10:1214–1221.
16. Gallistel CR, et al. (2010) Screening for learning and memory mutations: A new approach. Acta Psychologica Sinica 42(1):138–158.
Supporting Information
Fig. S1. Illustration of the calculation of the optimal target switch latency. The black function is the Gaussian probability density describing the distribution of timed switch latencies; the red function is the cumulative distribution function (x axis: elapsed time in trial, with the short and long feeding latencies at 3 and 9 s). Red arrows indicate that the areas under the tails of the density function may be read off the cumulative probability function. The expected percentage of reinforced trials (gain) is optimized when the area under the left tail of the probability density function times the probability of a short trial, pS, equals the area under the right tail times the probability of a long trial, which is 1 − pS. The area under the left tail is Φ(3; μ, σ) and the area under the right tail is 1 − Φ(9; μ, σ), where Φ is the Gaussian cumulative distribution function:

pS Φ(3; μ, σ) = (1 − pS)[1 − Φ(9; μ, σ)], whence (1 − pS)/pS = Φ(3; μ, σ)/[1 − Φ(9; μ, σ)];   [S1]

that is, the ratio of the tail areas must be the inverse of the ratio of the probabilities. The log of the right-hand side of Eq. S1 is a linear function of μ, with slope determined by σ. The inverse of this function gives the optimal μ as a function of logit(pS).
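A minimal numerical sketch of this calculation, solving Eq. S1 for μ by root finding (function names are ours):

```python
from scipy.optimize import brentq
from scipy.stats import norm

def optimal_mu(p_short, sigma):
    """Solve Eq. S1: pS * Phi(3; mu, sigma) = (1 - pS) * (1 - Phi(9; mu, sigma))."""
    def balance(mu):
        left_tail = norm.cdf(3.0, loc=mu, scale=sigma)
        right_tail = 1.0 - norm.cdf(9.0, loc=mu, scale=sigma)
        return p_short * left_tail - (1.0 - p_short) * right_tail
    # balance is positive at mu = 3 and negative at mu = 9, so a root exists.
    return brentq(balance, 3.0, 9.0)

# When short trials are frequent, the optimal target moves later, away
# from the 3-s deadline; when they are rare, it moves earlier.
print(optimal_mu(0.9, 1.0), optimal_mu(0.1, 1.0))
```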
Fig. S2. Marginal likelihood of D, the duration parameter of the piece-wise linear transition function (one panel per transition, Subjects 1–5; x axis: duration of transition in trials; y axis: marginal likelihood). The marginal likelihood is the likelihood after the other nine parameters (S, e₁, e₂, μ₁, μ₂, σ₁, σ₂, p₁, p₂) of the 10-parameter model have been integrated out. In all but three cases, the mode is at D = 1. In all such cases, the hypothesis that D = 1 is more likely than any hypothesis of the form 1 ≤ D ≤ N, where N is an integer greater than 1 and the prior probability of D is uniform over the integers on that interval. Even when the mode is not at 1, the marginal likelihood at D = 1 is so close to the maximum likelihood that it is more likely than the just-specified alternative hypothesis for reasonable values of N.
Fig. S3. Marginal-likelihood functions for D from a subsequent experiment with six new mice, in which the changes in the probability of a short trial occurred
part way into a session, rather than at the start of a new session. The estimated latencies for the starts of the transitions were similar to those shown and, for all
but 2 of 23 within-session transitions, the Bayes factor favored the step hypothesis.