On the generic nonconvergence of Bayesian actions and beliefs

BEBR
FACULTY WORKING PAPER NO. 89-1528
On the Generic Nonconvergence of Bayesian Actions and Beliefs
Mark Feldman
College of Commerce and Business Administration
Bureau of Economic and Business Research
University of Illinois Urbana-Champaign
BEBR
FACULTY WORKING PAPER NO. 89-1528
College of Commerce and Business Administration
University of Illinois at Urbana-Champaign
January 1989
On the Generic Nonconvergence of Bayesian Actions and Beliefs
Mark Feldman, Assistant Professor
Department of Economics
An earlier version of this paper was presented at a conference on
"Endogenous Learning in Economic Environments," Cornell University, June
1988.
I am grateful to Roger Koenker, Andrew McLennan and the
participants of the conference for helpful discussions and comments.
ABSTRACT
Many authors have recently modeled learning by Bayesian decision-makers and, by appeal to the Martingale Convergence Theorem, proven that posterior beliefs converge a.s. from the point of view of the decision-maker. This paper emphasizes that this is not equivalent to a.s. convergence from the point of view of an observer who knows the true parameter value. For a variant of the two-armed bandit problem it is demonstrated that for a residual family of parameter values and priors, with objective probability one, the posterior beliefs of the Bayesian decision-maker never converge.
JEL Classification Numbers: 022, 026, 211.
I. INTRODUCTION
Recently, there has been a revival of interest in studying the asymptotic dynamics of Bayesian learning and control in economic environments. One set of authors, including Easley and Kiefer [9, 10], Kiefer and Nyarko [16], McLennan [18], and Nyarko [19], have analyzed single agent decision problems for which there is a tradeoff between current period reward and the value of the information generated by the current period action. In another strand of the literature, Blume and Easley [5], Bray and Kreps [6], and Feldman [11, 12] attempt to characterize the tail of the sequence of beliefs and outcomes for economies with many passively learning agents. Specifically, these latter papers focus on whether Bayesian learning by agents with a correct specification of the underlying structure but uncertainty regarding the parameter values is a sufficient condition to assure convergence to a stationary rational expectations equilibrium.
These articles, as well as important earlier contributions of Cyert and DeGroot [6], Rothschild [23], and Townsend [26], have the following common framework. From the vantage point of the economic actors, the set of possible complete descriptions of the relevant time-invariant economic data can be represented as a separable metric space $\Theta$ with Borel $\sigma$-field $\mathcal{B}$. The actors in the model, uncertain as to the "true" $\theta^\circ \in \Theta$, have prior beliefs $\mu(0)$ on $(\Theta, \mathcal{B})$.¹ A result common to this literature is a "theorem" that the sequence $\{\mu(t)\}$ of posterior beliefs almost surely converges weakly to $\mu(\infty)$, the posterior belief conditioned on the limit sub-$\sigma$-field.
In all of the more recently authored papers the Martingale Convergence Theorem is invoked to establish a.s. convergence of the sequence of Bayesian beliefs, where the a.s. is with respect to the prior probability measures of the Bayesian agents. The essential point of this paper is that this is not necessarily equivalent to stating that a passive observer who is informed of the "true" parameter $\theta^\circ$ would attach probability one to the event that the sequence of beliefs $\{\mu(t)\}$ of the Bayesian agents would converge to a limit belief $\mu(\infty)$.
To elaborate on this distinction, suppose that in period $t = 0, 1, \dots$, agents observe outcomes in a separable metric space $Y$. Given the behavioral rules of the agents, each parameter value $\theta \in \Theta$ induces a probability measure $Q_\theta$ on the product space $Y^\infty$. The prior $\mu(0)$ induces a measure $P_{\mu(0)}$ on $Y^\infty$ defined by
$$P_{\mu(0)}(A) = \int_\Theta Q_\theta(A)\, \mu(0)(d\theta).$$
The application of the Martingale Convergence Theorem yields a.s. convergence of $\{\mu(t)\}$, where the a.s. is with respect to the probability measure $P_{\mu(0)}$. But this statement does not imply that for any particular $\theta$, $\mu(t) \Rightarrow \mu(\infty)$ with $Q_\theta$ probability one, even if $\theta$ is in the support of $\mu(0)$.
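A minimal numerical sketch of the two measures may help. It assumes, purely for illustration and not from the paper, i.i.d. binary outcomes $Y = \{0, 1\}$, a parameter $\theta$ equal to the probability of outcome 1, and a prior $\mu(0)$ supported on three parameter values. It evaluates $P_{\mu(0)}$ and $Q_\theta$ on a cylinder event $A$ (a fixed initial string of outcomes): $P_{\mu(0)}$ is the $\mu(0)$-weighted average of the $Q_\theta$, which is why a property holding $P_{\mu(0)}$ a.s. need only hold $Q_\theta$ a.s. for $\mu(0)$-almost every $\theta$. The names `THETAS`, `PRIOR`, `q_theta`, and `p_mu` are hypothetical.

```python
from itertools import product

# Illustrative assumptions: outcomes in {0, 1}; theta = probability of outcome 1.
THETAS = [0.2, 0.5, 0.8]   # hypothetical finite support of the prior mu(0)
PRIOR = [0.25, 0.5, 0.25]  # hypothetical prior weights mu(0)({theta})


def q_theta(theta, seq):
    """Q_theta of the cylinder {first len(seq) outcomes equal seq}, i.i.d. draws."""
    prob = 1.0
    for y in seq:
        prob *= theta if y == 1 else 1.0 - theta
    return prob


def p_mu(prior, thetas, seq):
    """P_mu(0)(A) = sum over theta of Q_theta(A) * mu(0)({theta}), finite prior."""
    return sum(w * q_theta(th, seq) for w, th in zip(prior, thetas))


if __name__ == "__main__":
    seq = (1, 1, 0, 1)
    print("P_mu(0)(A)          =", p_mu(PRIOR, THETAS, seq))
    print("Q_theta(A), th=0.8  =", q_theta(0.8, seq))
    # Sanity check: the cylinders of length 4 partition Y^infinity.
    print(sum(p_mu(PRIOR, THETAS, s) for s in product((0, 1), repeat=4)))
```

The subjective probability of the same event generally differs from its objective probability under any single $\theta$; the mixture is what the Martingale Convergence Theorem controls.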
One might hope to establish a result that for a "large" class of priors, posteriors converge for a "large" class of parameter values. When $\Theta$ can be embedded in finite-dimensional Euclidean space, one has recourse to Lebesgue measure as a natural notion of size. Then if $m$ is Lebesgue measure (restricted to $\Theta$) and $m \ll \mu(0)$, the set of exceptional $\theta$ such that $Q_\theta(\{\mu(t) \Rightarrow \mu(\infty)\}) < 1$ has $\mu(0)$ measure zero, and hence $m$ measure zero. But in many naturally occurring settings $\Theta$ is not finite dimensional, and since there is no infinite dimensional analogue of Lebesgue measure, a measure-theoretic criterion is unavailable. In lieu of a reference measure to evaluate size, the customary procedure is to resort to the topological notion of category. Residual subsets are deemed to be large, and subsets of first category (which are complements of residual subsets) are regarded as small.
¹Some authors allow for heterogeneous beliefs across agents.
The formal analysis in this paper focuses on the two-armed bandit problem with a countable number of potential outcomes associated with each arm. Building upon results of Freedman [13], we prove that there is a residual set of pairs $(\theta, \mu)$ of parameter values and prior beliefs such that with $Q_\theta$ probability one posterior beliefs never converge. Furthermore, there exist reward structures with one arm more favorable than the other for an informed decision-maker; yet for a residual set of pairs $(\theta, \mu)$ both arms will be played infinitely often with $Q_\theta$ probability one.
II. NOTATION AND MATHEMATICAL PRELIMINARIES
II.1. Notational Conventions
The set of real numbers is denoted by $\mathbb{R}$. The set of natural numbers is denoted by $\mathbb{N} = \{1, 2, \dots\}$, and the set of non-negative integers by $\mathbb{N}_0 = \{0, 1, 2, \dots\}$. A generic element of $\mathbb{N}$ is denoted by either $n$ or $i$. If $Z$ is an arbitrary set and $A \subset Z$, the indicator function for the set $A$ is denoted by $I_A$. If $W$ is a topological space, then $C(W)$ is the Banach space of bounded, continuous real-valued functions defined on $W$ with the sup norm. $\mathcal{P}(W)$ is the space of probability measures on $W$ endowed with the topology of weak convergence, and $\mathcal{P}(\mathcal{P}(W))$ is denoted by $\mathcal{P}^2(W)$. Unless otherwise specified, all $\sigma$-fields are the Borel $\sigma$-field. If $(W, \mathcal{W})$ is a measurable space and $w \in W$, the Dirac measure $\delta_w \in \mathcal{P}(W)$ is defined by $\delta_w(A) = 1$ if $w \in A$ and $\delta_w(A) = 0$ if $w \in A^c$.
II.2. Category Theory
I shall briefly review the concepts relating to Baire category. Standard references include Kelley [17, pp. 200-203], Oxtoby [20] and Royden [24, Section 7.8]. Let $W$ be a metric space. A set $E \subset W$ is nowhere dense if its closure $\bar{E}$ has empty interior. A set $E$ is of first category, or meager, if it is the union of a countable collection of nowhere dense sets. If a set is not of first category then it is of second category. The complement of a set of first category is a residual set. According to the theorem of Baire [24, Theorem 7.27], if $W$ is a complete metric space then the intersection of a countable family of open dense subsets of $W$ is itself a dense subset of $W$.
III. THE MODEL
III.1. Preliminary Description
There are a countable number of time periods, with periods indexed by $t = 0, 1, 2, \dots$. At the beginning of period $t$ the Bayesian decision-maker selects an action (or bandit-arm) $x(t) \in X = \{x_1, x_2\}$. After choosing action $x_k$, an outcome $y(t) \in Y_k = \{y_{k1}, y_{k2}, \dots\}$ is observed, and a period reward $R_k(y(t))$ is received. The range of $R_k$ is bounded, with $c_k = \inf\{R_k(y_{ki}) : i \in \mathbb{N}\}$. The action $x(t)$ and outcome $y(t)$ are the realizations of the random variables $X(t)$ and $Y(t)$.
The probability law specifying the distribution of $Y(t)$ conditional upon $X(t) = x_k$ is $\theta_k \in \mathcal{P}(Y_k)$, where $Y_k$ is endowed with the discrete topology. The objective probabilities $\theta_1$ and $\theta_2$ are respectively elements of the parameter subspaces $\Theta_1 = \mathcal{P}(Y_1)$ and $\Theta_2 = \mathcal{P}(Y_2)$, and the vector $\theta^\circ = (\theta_1, \theta_2)$ is an element of the parameter space $\Theta = \Theta_1 \times \Theta_2$. $\theta^\circ$ is initially unknown to the decision-maker, and so, viewed from her perspective, is the realization of a random element with distribution $\mu(0) \in \mathcal{P}(\Theta)$, a prior belief. (When it is clear from the context we sometimes write $\mu$ instead of $\mu(0)$.) The objective of the decision-maker is to maximize, with discount factor $\alpha \in (0, 1)$, the expectation of the present value of the reward stream. Denoting for notational convenience the indicator function $I_{x_k} : X \to \{0, 1\}$ by $I_k$, for a given sequence $\{x(t)\}$ of actions and a given sequence $\{y(t)\}$ of outcomes, the total reward to the decision-maker is
$$\sum_{t=0}^{\infty} \alpha^t \sum_{k=1}^{2} I_k(x(t))\, R_k(y(t)).$$
So the optimization problem is to choose a policy $\{X(t)\}$ (a precise definition of a policy is provided in Section III.3) that maximizes
$$E\left[\sum_{t=0}^{\infty} \alpha^t \left\{\sum_{k=1}^{2} I_k(X(t))\, R_k(Y(t))\right\}\right],$$
where the expectation is taken with respect to the probability measure induced by her prior belief $\mu(0)$.
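The total reward can be evaluated directly for finite sequences of actions and outcomes. The hypothetical helper `discounted_reward` below truncates the infinite sum at the length of the supplied sequences; it assumes the arms are labelled 1 and 2 and that each $R_k$ is supplied as a Python function. Because exactly one indicator $I_k(x(t))$ equals one in each period, only the reward of the arm actually played enters.

```python
from typing import Any, Callable, Dict, Sequence


def discounted_reward(
    actions: Sequence[int],
    outcomes: Sequence[Any],
    rewards: Dict[int, Callable[[Any], float]],
    alpha: float,
) -> float:
    """Finite truncation of sum_t alpha^t sum_k I_k(x(t)) R_k(y(t)).

    actions  -- arm labels k in {1, 2}, one per period (x(t) = x_k)
    outcomes -- observed outcomes y(t), one per period
    rewards  -- maps arm k to its reward function R_k on Y_k
    alpha    -- discount factor, required to lie in (0, 1)
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    if len(actions) != len(outcomes):
        raise ValueError("need exactly one outcome per action")
    total = 0.0
    for t, (k, y) in enumerate(zip(actions, outcomes)):
        # I_k(x(t)) = 1 only for the arm played, so only R_k(y(t)) contributes.
        total += alpha ** t * rewards[k](y)
    return total


# Example (illustrative rewards): arm 1 pays its outcome, arm 2 pays a constant 0.5.
example = discounted_reward([1, 2, 1], [1, "b", 0], {1: float, 2: lambda y: 0.5}, 0.9)
```

Here the three periods contribute $1$, $0.9 \times 0.5$, and $0.81 \times 0$.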
The policy or "rule" that the decision-maker abides by in choosing the sequence $\{x(t)\}$ depends upon $\mu(0)$. While $\alpha$ remains fixed throughout the paper, our goal is to study the sensitivity of the tail behavior to variation in the prior beliefs. To emphasize the dependence of actions upon the prior belief $\mu(0)$, we will often write $X(t, \mu(0))$ or $X(t, \mu)$ and $Y(t, \mu(0))$ or $Y(t, \mu)$ for, respectively, $X(t)$ and $Y(t)$.
III.2. Specification of the Probability Spaces
Endowed with the Prohorov metric, $\Theta_1$ and $\Theta_2$ (and hence $\Theta = \Theta_1 \times \Theta_2$) are complete and separable metric spaces; their Borel $\sigma$-fields are denoted respectively by $\mathcal{B}_1$, $\mathcal{B}_2$, and $\mathcal{B}$. To facilitate the analysis we elect, without loss of generality, to work in representation space (for further details see [8, Section 1.6]). Accordingly, we define the measurable spaces $(\Omega_1, \mathcal{F}_1)$, $(\Omega_2, \mathcal{F}_2)$ and $(\Omega, \mathcal{F})$ by $(\Omega_k, \mathcal{F}_k) = (Y_k^\infty, \mathcal{Y}_k^\infty)$ and $(\Omega, \mathcal{F}) = (\Omega_1 \times \Omega_2, \mathcal{F}_1 \times \mathcal{F}_2)$. Endowing $Y_k^\infty$ with the product topology, the product $\sigma$-field $\mathcal{Y}_k^\infty$ is the Borel $\sigma$-field. A generic element of $\Omega_k$ is denoted as $\omega_k = (\omega_{k1}, \omega_{k2}, \dots)$. The interpretation is that: (i) $\omega_{kn}$ is the element $y_{ki} \in Y_k$ observed on the $n$'th occasion that action $x_k$ is chosen, and (ii) conditional upon parameter $\theta_k$, the outcomes from repeated play of arm $k$ are i.i.d. with distribution $\theta_k$.² Associated with each $\theta = (\theta_1, \theta_2) \in \Theta$ is an "objective" product probability measure $Q_\theta = Q_{\theta_1} \times Q_{\theta_2}$ on $(\Omega, \mathcal{F})$ defined by $Q_{\theta_k} = \theta_k \times \theta_k \times \cdots$. The set $\{(\Omega, \mathcal{F}, Q_\theta) : \theta \in \Theta\}$ is the family of objective probability spaces.
To avoid extraneous complications we make a restrictive independence assumption that the prior belief $\mu \in \mathcal{P}(\Theta)$ can be decomposed as the product probability $\mu = \mu_1 \times \mu_2$ of the marginal distributions.³ The induced prior probability $P_\mu$ on $(\Omega \times \Theta, \mathcal{F} \times \mathcal{B})$ is defined for measurable rectangles $A \times B$, $A \in \mathcal{F}$ and $B \in \mathcal{B}$, by $P_\mu(A \times B) = \int_B Q_\theta(A)\, \mu(d\theta)$.⁴ By the Product Measure Theorem [1, Theorem 2.6.2] there is a unique extension of $P_\mu$ to $(\Omega \times \Theta, \mathcal{F} \times \mathcal{B})$. The family of subjective probability spaces is $\{(\Omega \times \Theta, \mathcal{F} \times \mathcal{B}, P_\mu) : \mu \in \mathcal{P}(\Theta)\}$. Slightly abusing notation, the marginal distribution of $P_\mu$ on $(\Omega, \mathcal{F})$ is also denoted by $P_\mu$. So for $A \in \mathcal{F}$, $P_\mu(A) = P_\mu(A \times \Theta)$.
III.3. Histories and Policies
A precise statement of the optimization problem is provided in this section. We start by specifying what constitutes an admissible plan. Included in the definition of an admissible plan are the count functions $C_k(t) : \Omega \to \mathbb{N}_0$, $k = 1, 2$. The realization $C_k(t)(\omega)$ represents the number of times arm $k$ has been chosen through time $t$. An admissible plan is defined as sequences of random variables $\{X(t)\}$, $\{Y(t)\}$, $\{C_1(t)\}$, and $\{C_2(t)\}$, and a sequence of sub-$\sigma$-fields $\{\mathcal{H}_t\}$ such that:
i) $\mathcal{H}_0 = \{\emptyset, \Omega\}$,
ii) $X(0)$ is $\mathcal{H}_0$ measurable,
iii) $C_k(0) = I_k(X(0))$, and if $X(0)(\omega) = x_k$ then $Y(0)(\omega) = \omega_{k1}$,
iv) for $t \geq 1$, $\mathcal{H}_t = \mathcal{H}_{t-1} \vee \sigma(X(t-1), Y(t-1))$,
v) $X(t)$ is $\mathcal{H}_t$ measurable,
vi) $C_k(t) = C_k(t-1) + I_k(X(t))$, and
vii) if $X(t)(\omega) = x_k$ then $Y(t)(\omega) = \omega_{k C_k(t)(\omega)}$.
A policy is a sequence $\{X(t)\}$ for which there exists a (unique) corresponding admissible plan $[\{X(t)\}, \{Y(t)\}, \{C_1(t)\}, \{C_2(t)\}, \{\mathcal{H}_t\}]$. An optimal policy for the prior $\mu$ maximizes
$$\int_\Omega \left\{\sum_{t=0}^{\infty} \alpha^t \sum_{k=1}^{2} I_k(X(t)(\omega))\, R_k(Y(t)(\omega))\right\} P_\mu(d\omega)$$
over the set of policies.
²Technically, (ii) is a statement about the projections $\{\mathrm{Proj}(k, n)\}$, defined by $\mathrm{Proj}(k, n)(\omega) = \omega_{kn}$, which are independent.
³Note the contrast with the assumptions of Rothschild [23]. Rothschild allows for the arms to be dependent but requires the outcome space to be a two point set.
⁴For the integral to be well-defined it is necessary that the map $\theta \to Q_\theta(A)$ be measurable. This is a consequence of [21, Lemma 6.1].
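The representation-space construction can be mimicked in a few lines: all outcomes of arm $k$ are drawn in advance as the sequence $\omega_k$, and the outcome in period $t$ is read off as the $C_k(t)$'th coordinate, as in condition vii). The sketch assumes a finite horizon, finite outcome sets $Y_k = \{0, 1, \dots\}$, and a policy supplied as a function of the past history (standing in for $\mathcal{H}_t$-measurability); `draw_omega` and `run_plan` are hypothetical names.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

History = Tuple[Tuple[int, int], ...]  # ((x(0), y(0)), ..., (x(t-1), y(t-1)))


def draw_omega(thetas: Dict[int, Sequence[float]], length: int,
               seed: int = 0) -> Dict[int, List[int]]:
    """Draw omega_k = (omega_k1, omega_k2, ...) i.i.d. from theta_k on {0, ..., len-1}."""
    rng = random.Random(seed)
    return {k: rng.choices(range(len(p)), weights=p, k=length) for k, p in thetas.items()}


def run_plan(policy: Callable[[History], int], omega: Dict[int, List[int]], horizon: int):
    """Return the history and the final counts (C_1, C_2) of the admissible plan."""
    counts = {1: 0, 2: 0}
    history: List[Tuple[int, int]] = []
    for _ in range(horizon):
        k = policy(tuple(history))      # X(t) uses only past actions and outcomes
        counts[k] += 1                  # C_k(t) = C_k(t-1) + I_k(X(t))
        y = omega[k][counts[k] - 1]     # Y(t) = omega_{k, C_k(t)}
        history.append((k, y))
    return history, counts


def alternate(history: History) -> int:
    """Illustrative policy: play arm 1 in even periods and arm 2 in odd periods."""
    return 1 if len(history) % 2 == 0 else 2


hist, counts = run_plan(alternate, {1: [10, 11, 12], 2: [20, 21, 22]}, horizon=4)
```

Drawing $\omega$ before any action is taken makes clear that a policy only changes *which* coordinates of $\omega$ are revealed, not their joint distribution under $Q_\theta$.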
III.4. Posterior Beliefs and The Optimal Bayesian Policy
When possible, beliefs are updated according to Bayes rule. When an event of prior probability zero occurs, the updating is arbitrary. Since none of the results of this paper depend upon the choice of version of conditional probability, without loss of generality we shall assume that the decision-maker ignores any outcome to which she assigned a prior probability of zero. More precisely, for $k = 1, 2$ the posterior map $\Gamma_k : \mathcal{P}(\Theta_k) \times Y_k \to \mathcal{P}(\Theta_k)$ is defined for all $A \in \mathcal{B}_k$ by:
$$\Gamma_k(\mu_k, y_{ki})(A) = \frac{\int_A \theta_k(y_{ki})\, \mu_k(d\theta_k)}{\int_{\Theta_k} \theta_k(y_{ki})\, \mu_k(d\theta_k)} \quad \text{whenever } \int_{\Theta_k} \theta_k(y_{ki})\, \mu_k(d\theta_k) > 0,$$
$$\Gamma_k(\mu_k, y_{ki})(A) = \mu_k(A) \quad \text{whenever } \int_{\Theta_k} \theta_k(y_{ki})\, \mu_k(d\theta_k) = 0.$$
For prior $\mu = \mu_1 \times \mu_2$, the sequences $\{\nu_1(n, \mu_1)\}_{n=0}^{\infty}$ and $\{\nu_2(n, \mu_2)\}_{n=0}^{\infty}$ of updating functions $\nu_k(n, \mu_k) : \Omega \to \mathcal{P}(\Theta_k)$ are recursively defined by:
$$\nu_k(0, \mu_k)(\omega) = \mu_k, \qquad \nu_k(n, \mu_k)(\omega) = \Gamma_k(\nu_k(n-1, \mu_k)(\omega), \omega_{kn}) \quad \text{for } n \geq 1.$$
The interpretation is that $\nu_k(n, \mu_k)(\omega)$ is the posterior measure on $(\Theta_k, \mathcal{B}_k)$ after arm $k$ has been chosen $n$ times with prior measure $\mu_k$. Associated with a policy $\{X(t)\}$ and the prior belief $\mu = \mu_1 \times \mu_2$ is a sequence of posterior maps $\{\mu(t)\}$, $\mu(t) : \Omega \to \mathcal{P}(\Theta)$, defined by $\mu(t)(\omega) = \mu_1(t)(\omega) \times \mu_2(t)(\omega)$, where $\mu_k(t)(\omega) = \nu_k(C_k(t)(\omega), \mu_k)(\omega)$.
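For a prior with finite support, the posterior map $\Gamma_k$ and the updating functions $\nu_k(n, \mu_k)$ reduce to a few lines. This is a minimal sketch under that assumption (the paper allows arbitrary priors on $\mathcal{P}(Y_k)$). Each parameter value $\theta_k$ is represented as a dictionary of outcome probabilities, and the hypothetical functions `gamma` and `nu` follow the convention in the text that an outcome of zero predictive probability leaves beliefs unchanged.

```python
from typing import Any, Dict, List, Sequence

Theta = Dict[Any, float]  # theta_k represented as a map y -> theta_k(y)


def gamma(mu: Sequence[float], thetas: Sequence[Theta], y: Any) -> List[float]:
    """Posterior map Gamma_k(mu_k, y) for a prior with weights mu on thetas."""
    # Denominator: integral of theta_k(y) against mu_k, the predictive probability of y.
    lam = sum(w * th.get(y, 0.0) for w, th in zip(mu, thetas))
    if lam == 0.0:
        return list(mu)  # zero-probability outcome: beliefs are left unchanged
    return [w * th.get(y, 0.0) / lam for w, th in zip(mu, thetas)]


def nu(n: int, mu: Sequence[float], thetas: Sequence[Theta],
       omega_k: Sequence[Any]) -> List[float]:
    """nu_k(n, mu_k)(omega): posterior after the first n plays of arm k."""
    post = list(mu)                              # nu_k(0, mu_k) = mu_k
    for j in range(n):
        post = gamma(post, thetas, omega_k[j])   # uses coordinate omega_{k, j+1}
    return post


# Illustrative parameters: a fair coin versus a coin that always shows 0.
THETAS = [{0: 0.5, 1: 0.5}, {0: 1.0}]
print(nu(1, [0.5, 0.5], THETAS, [0]))
print(nu(2, [0.5, 0.5], THETAS, [0, 1]))  # observing 1 rules out the second parameter
```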
A policy $\{X(t)\}$ is stationary if there exists a measurable function $\pi : \mathcal{P}(\Theta) \to X$ such that $X(t)(\omega) = \pi(\mu(t)(\omega))$ for all $\omega \in \Omega$. Identifying a function $\pi : \mathcal{P}(\Theta) \to X$ with the stationary policy it induces, no confusion should result from referring to $\pi$ as a stationary policy.
Invoking now well-known results (for details see [9], [15] or [21]), the Bayesian optimal control problem can be reformulated as a dynamic programming problem with state space $\mathcal{P}(\Theta)$ and value function $V : \mathcal{P}(\Theta) \to \mathbb{R}$. Furthermore, there exists an optimal stationary policy [5, Theorem 7(b)]. Drawing on results of Gittins and Jones [14], with refinements by Berry and Fristedt [2], Ross [22] and Whittle [27], there is an optimal policy with a sharp characterization. Gittins and Jones proved the existence of functions $M_1 : \mathcal{P}(\Theta_1) \to \mathbb{R}$ and $M_2 : \mathcal{P}(\Theta_2) \to \mathbb{R}$ with the property that it is optimal to choose arm 1 in period $t$ iff $M_1(\mu_1(t)) \geq M_2(\mu_2(t))$. The function $M_k$ is often referred to as the Gittins index for arm $k$. To define $M_1$, consider a one-armed bandit problem where in the initial stage the decision-maker has the option of playing arm 1 or stopping and collecting a terminal reward of $m$. In subsequent stages, if arm 1 has been played in all previous stages, then the options remain selecting arm 1 or stopping and receiving a final payment $m$. The value of this game when beliefs are $\mu_1 \in \mathcal{P}(\Theta_1)$ is denoted as $V_1(\mu_1, m)$. The Gittins index is defined by $M_1(\mu_1) = \inf\{m : V_1(\mu_1, m) = m\}$. The functions $V_2$ and $M_2$ are defined analogously.
Throughout the remainder of the paper we assume that the Bayesian controller follows an optimal stationary policy $\pi^* : \mathcal{P}(\Theta) \to X$ defined by: $\pi^*(\mu_1 \times \mu_2) = x_1$ if $M_1(\mu_1) \geq M_2(\mu_2)$, and $\pi^*(\mu_1 \times \mu_2) = x_2$ if $M_2(\mu_2) > M_1(\mu_1)$. For a more detailed exposition the reader is advised to consult Ross [22], Whittle [27], and especially the monograph of Berry and Fristedt [2].
IV. GENERIC LIMIT THEOREMS
IV.1. Continuity Results
Preparatory to proving the genericity theorems, some preliminary technical results are developed in this subsection. The principal result is the establishment of the continuity of the functions $M_1$ and $M_2$. The first step is to develop characterizations of $V_1$ and $V_2$ by making use of the fact that the value functions are solutions to the optimality equation.
The expected one-period reward for choosing action $x_k$ when beliefs are $\mu_k$ is denoted by the function $U_k : \mathcal{P}(\Theta_k) \to \mathbb{R}$ defined by
$$U_k(\mu_k) = \int_{\Theta_k} \left[\sum_{i=1}^{\infty} R_k(y_{ki})\, \theta_k(y_{ki})\right] \mu_k(d\theta_k).$$
Informally denoting expectation taken with respect to current beliefs $\mu_k$ by $E_{\mu_k}$, and denoting future beliefs with a prime symbol, the optimality equation can be written as:
$$V_k(\mu_k, m) = \max\left\{m,\; U_k(\mu_k) + \alpha E_{\mu_k}\left[V_k(\mu_k', m)\right]\right\}.$$
To make the above precise, it is necessary to provide formal definitions of $E_{\mu_k}$ and of the mapping from current beliefs to probability distributions over future beliefs. So for $k = 1, 2$, define $\Psi_k : \mathcal{P}(\Theta_k) \to \mathcal{P}^2(\Theta_k)$ by
$$\Psi_k(\mu_k)(A) = \int_{\Theta_k} \left[\sum_{i=1}^{\infty} I_A(\Gamma_k(\mu_k, y_{ki}))\, \theta_k(y_{ki})\right] \mu_k(d\theta_k).$$
Conditional upon period $t$ beliefs $\mu_k(t)$ and arm $k$ being selected, $\Psi_k(\mu_k(t))(A)$ can be interpreted as the probability, from the point of view of the decision-maker, that $\mu_k(t+1) \in A$. It will also be useful to have an expression for the conditional probability of realization $y_{ki}$ given that arm $k$ is chosen with beliefs $\mu_k$. So we define the functions $\lambda_k : Y_k \times \mathcal{P}(\Theta_k) \to \mathbb{R}$, $k = 1, 2$, by
$$\lambda_k(y_{ki}, \mu_k) = \int_{\Theta_k} \theta_k(y_{ki})\, \mu_k(d\theta_k).$$
We can now prove the following results.
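The Gittins index $M_k(\mu_k) = \inf\{m : V_k(\mu_k, m) = m\}$ can be approximated numerically by calibrating the terminal reward $m$. The sketch below is illustrative and rests on assumptions that are not in the paper: the prior has finite support, $Y_k$ is finite, and the infinite-horizon value $V_k$ is replaced by the value $V^h_k$ of a game that forces retirement after $h$ plays, computed from the optimality equation by backward recursion with $V^0_k(\mu_k, m) = m$. Since $V^h_k(\mu_k, m) - m$ is non-increasing in $m$, the index can be located by bisection. The function name `gittins_index` is hypothetical, and the `reward` dictionary is assumed to list every outcome with positive probability.

```python
from typing import Any, Dict, Sequence, Tuple

Theta = Dict[Any, float]  # theta_k represented as a map y -> theta_k(y)


def gittins_index(prior: Sequence[float], thetas: Sequence[Theta],
                  reward: Dict[Any, float], alpha: float,
                  horizon: int = 10, tol: float = 1e-9) -> float:
    """Approximate M_k(mu_k) by bisection on the terminal reward m."""
    outcomes = list(reward)

    def update(mu: Tuple[float, ...], y: Any) -> Tuple[Tuple[float, ...], float]:
        # Returns Gamma_k(mu, y) together with lambda_k(y, mu).
        lam = sum(w * th.get(y, 0.0) for w, th in zip(mu, thetas))
        if lam == 0.0:
            return mu, 0.0
        return tuple(w * th.get(y, 0.0) / lam for w, th in zip(mu, thetas)), lam

    def value(mu: Tuple[float, ...], m: float, h: int) -> float:
        # V^h(mu, m) = max{m, U(mu) + alpha * E_mu[V^{h-1}(mu', m)]}, V^0(mu, m) = m.
        if h == 0:
            return m
        cont = 0.0
        for y in outcomes:
            post, lam = update(mu, y)
            if lam > 0.0:
                cont += lam * (reward[y] + alpha * value(post, m, h - 1))
        return max(m, cont)

    # The index lies between the worst and the best constant reward streams.
    lo = min(reward.values()) / (1.0 - alpha)
    hi = max(reward.values()) / (1.0 - alpha)
    start = tuple(prior)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if value(start, mid, horizon) > mid:
            lo = mid  # continuing is strictly better, so the index exceeds mid
        else:
            hi = mid  # retiring is optimal at mid, so the index is at most mid
    return 0.5 * (lo + hi)
```

For a point-mass belief the recursion returns $r/(1-\alpha)$, where $r$ is the expected one-period reward. In the illustrative two-point prior used in the check below (one parameter always paying 1, the other always paying 0, $\alpha = 0.5$), the index is about $1.333$, above the myopic value $U_k(\mu_k)/(1-\alpha) = 1$, which reflects the value of the information generated by playing the arm.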
LEMMA 4.1. The maps $\mu_k \to \Gamma_k(\mu_k, y_{ki})$, $k = 1, 2$, are continuous for all $i \in \mathbb{N}$.
Proof. The proof, which follows from routine arguments, is for completeness included in the Appendix.
PROPOSITION 4.2. $\Psi_1$ and $\Psi_2$ are continuous.
Proof. The proof is in the Appendix.
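For a finite-support prior and finitely many outcomes, $\Psi_k(\mu_k)$ is a discrete distribution over posteriors: outcome $y_{ki}$ occurs with predictive probability $\lambda_k(y_{ki}, \mu_k)$ and moves beliefs to $\Gamma_k(\mu_k, y_{ki})$. The hypothetical function `psi` below builds this distribution under those assumptions. As a check, `mean_posterior` confirms that the $\Psi_k(\mu_k)$-average of the next-period posteriors returns the current beliefs, the martingale property of beliefs behind the appeal to the Martingale Convergence Theorem discussed in the Introduction.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence, Tuple

Theta = Dict[Any, float]  # theta_k represented as a map y -> theta_k(y)


def lam(y: Any, mu: Sequence[float], thetas: Sequence[Theta]) -> float:
    """lambda_k(y, mu_k): predictive probability of outcome y under beliefs mu_k."""
    return sum(w * th.get(y, 0.0) for w, th in zip(mu, thetas))


def psi(mu: Sequence[float], thetas: Sequence[Theta]) -> Dict[Tuple[float, ...], float]:
    """Psi_k(mu_k) as a map {next-period posterior: probability}."""
    outcomes = {y for th in thetas for y in th}
    law: Dict[Tuple[float, ...], float] = defaultdict(float)
    for y in outcomes:
        p = lam(y, mu, thetas)
        if p == 0.0:
            continue  # outcomes with zero predictive probability carry no mass
        post = tuple(w * th.get(y, 0.0) / p for w, th in zip(mu, thetas))
        law[post] += p  # different outcomes may lead to the same posterior
    return dict(law)


def mean_posterior(law: Dict[Tuple[float, ...], float]) -> List[float]:
    """Average of the next-period posteriors under Psi_k(mu_k)."""
    size = len(next(iter(law)))
    return [sum(p * post[j] for post, p in law.items()) for j in range(size)]


THETAS = [{0: 0.5, 1: 0.5}, {0: 1.0}]  # illustrative parameter values
law = psi([0.5, 0.5], THETAS)
```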
The next step is to establish the continuity of $V_1$ and $V_2$. The boundedness of the functions $R_1$ and $R_2$ is necessary for this result; Berry and Fristedt [2, p. 42] provide a counterexample to continuity when the reward function is unbounded.
LEMMA 4.3. $V_1$ and $V_2$ are continuous functions.
Proof. It suffices to prove that $V_1$ is continuous. Let $K \subset \mathbb{R}$ be compact and define $T_K : C(\mathcal{P}(\Theta_1) \times K) \to C(\mathcal{P}(\Theta_1) \times K)$ by
$$T_K f(\mu_1, m) = \max\left\{m,\; U_1(\mu_1) + \alpha \int f(\mu_1', m)\, \Psi_1(\mu_1)(d\mu_1')\right\}.$$
Note that $U_1$ is continuous and bounded since $R_1$ is bounded, and since $\Psi_1$ is continuous (Proposition 4.2), $T_K$ maps $C(\mathcal{P}(\Theta_1) \times K)$ into itself. Invoking standard arguments, $T_K$ is a contraction mapping whose fixed point, the solution to the optimality equation, is the restriction of $V_1 : \mathcal{P}(\Theta_1) \times \mathbb{R} \to \mathbb{R}$ to $\mathcal{P}(\Theta_1) \times K$. So $V_1$ is continuous on $\mathcal{P}(\Theta_1) \times K$ for every compact $K$, and by extension $V_1$ is continuous.
The expected loss from stopping when arm $k$ is the only arm is given by the function $L_k : \mathcal{P}(\Theta_k) \times \mathbb{R} \to \mathbb{R}$ defined by $L_k(\mu_k, m) = V_k(\mu_k, m) - m$. The following Lemma collects some well-known results.
LEMMA 4.4. For $k = 1, 2$: (i) the function $V_k(\mu_k, \cdot)$ is increasing and convex, and (ii) the function $L_k(\mu_k, \cdot)$ is decreasing.
Proof. (i) See [20, Lemma VII.3.1]. (ii) Replacing sums with integrals, the proof provided in [20, Lemma VII.2.1] remains valid.
Now consider the game for which the decision-maker must initially play arm $k$, and thereafter chooses between playing arm $k$ and permanently retiring with some terminal payoff $m$. The value of pursuing the optimal strategy for this game is given by the function $W_k : \mathcal{P}(\Theta_k) \times \mathbb{R} \to \mathbb{R}$ defined by
$$W_k(\mu_k, m) = U_k(\mu_k) + \alpha \int V_k(\mu_k', m)\, \Psi_k(\mu_k)(d\mu_k').$$
With the following Lemma we prove that the set of terminal rewards $m$ for which the decision-maker is initially indifferent between continuing to play arm $k$ and receiving payoff $m$, that is, for which $W_k(\mu_k, m) = m$, is a singleton. Thus $M_k(\mu_k)$ is the unique terminal payoff for which the controller is indifferent between continuing play (and possibly stopping at some future date, with reward $m$ received at that future date) and stopping immediately.
LEMMA 4.5. (i) $W_1$ and $W_2$ are continuous. (ii) If $M_k(\mu_k) < m$, then $W_k(\mu_k, m) < m$.
Proof. (i) The proof follows from Lemma 4.3, Proposition 4.2, and Theorem 5.5 of [3].
(ii) Suppose $M_k(\mu_k) = \tilde{m} < m$. Since $V_k(\mu_k, \cdot)$ is continuous, the set $\{m' : V_k(\mu_k, m') = m'\}$ is closed, so $V_k(\mu_k, \tilde{m}) = \tilde{m}$. For $m' < \tilde{m}$, the definition of $M_k$ gives $V_k(\mu_k, m') > m'$, and hence $V_k(\mu_k, m') = W_k(\mu_k, m')$; letting $m' \uparrow \tilde{m}$ and using the continuity of $V_k$ and $W_k$, we obtain $W_k(\mu_k, \tilde{m}) = V_k(\mu_k, \tilde{m}) = \tilde{m}$. According to Lemma 4.4(ii), for all $\mu_k' \in \mathcal{P}(\Theta_k)$, $V_k(\mu_k', m) - V_k(\mu_k', \tilde{m}) \leq m - \tilde{m}$. Finally, note that
$$\begin{aligned}
W_k(\mu_k, m) &= U_k(\mu_k) + \alpha \int V_k(\mu_k', m)\, \Psi_k(\mu_k)(d\mu_k') \\
&= W_k(\mu_k, \tilde{m}) + \alpha\left[\int V_k(\mu_k', m)\, \Psi_k(\mu_k)(d\mu_k') - \int V_k(\mu_k', \tilde{m})\, \Psi_k(\mu_k)(d\mu_k')\right] \\
&\leq \tilde{m} + \alpha(m - \tilde{m}) \\
&< m.
\end{aligned}$$
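A worked special case shows that the bound $W_k(\mu_k, m) \leq \tilde{m} + \alpha(m - \tilde{m})$ used above is tight. Suppose, for illustration, that $\mu_k$ is a point mass on a single $\theta_k$, so that beliefs never change and every play of arm $k$ has expected reward $r = \sum_i R_k(y_{ki})\,\theta_k(y_{ki})$. Continuing forever is then worth $r/(1-\alpha)$, and the optimality equation gives:

```latex
\begin{aligned}
V_k(\mu_k, m) &= \max\{m,\; r/(1-\alpha)\}, \qquad \tilde{m} := M_k(\mu_k) = r/(1-\alpha),\\
W_k(\mu_k, m) &= r + \alpha V_k(\mu_k, m) = r + \alpha m \qquad (m \geq \tilde{m}),\\
              &= (1-\alpha)\tilde{m} + \alpha m = \tilde{m} + \alpha(m - \tilde{m}) < m \qquad (m > \tilde{m}).
\end{aligned}
```

In particular, for a belief that a single outcome $y_{1i}$ of arm 1 always occurs, the index is $R_1(y_{1i})/(1-\alpha)$.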
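PROPOSITION 4.6. $M_1$ and $M_2$ are continuous.
Proof. The proof of continuity of $M_1$ and $M_2$ is completed by verifying that these functions are both upper semicontinuous (u.s.c.) and lower semicontinuous (l.s.c.). Upper semicontinuity is confirmed by demonstrating that the set $\{\mu_k \in \mathcal{P}(\Theta_k) : M_k(\mu_k) < c\}$ is open for all $c \in \mathbb{R}$. Suppose that $M_k(\mu_k) < c$, and choose $c'$ with $M_k(\mu_k) < c' < c$. By Lemma 4.5, $W_k(\mu_k, c') < c'$, and since $W_k$ is continuous there exists an open neighborhood $J_k$ of $\mu_k$ such that $W_k(\mu_k', c') < c'$ for all $\mu_k' \in J_k$. But this implies that $V_k(\mu_k', c') = c'$, so by the definition of $M_k$, $M_k(\mu_k') \leq c' < c$ for all $\mu_k' \in J_k$.
Lower semicontinuity is established by verifying that the set $\{\mu_k \in \mathcal{P}(\Theta_k) : M_k(\mu_k) > c\}$ is open for all $c \in \mathbb{R}$. Suppose that $M_k(\mu_k) > c$. Then $V_k(\mu_k, c) > c$, that is, $L_k(\mu_k, c) > 0$. From the continuity of $L_k$ (Lemma 4.3), there exists an open neighborhood $N_k$ of $\mu_k$ such that $L_k(\mu_k', c) > 0$ for all $\mu_k' \in N_k$. Therefore, for all $\mu_k' \in N_k$ and all $m \leq c$, $L_k(\mu_k', m) \geq L_k(\mu_k', c) > 0$ by Lemma 4.4(ii), so $V_k(\mu_k', m) > m$; since the infimum defining $M_k(\mu_k')$ is attained, $M_k(\mu_k') > c$.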
IV.2. Generic Outcomes when $c_1 \neq c_2$
In this subsection we analyze the limit outcomes in the case when $c_1 \neq c_2$. (Recall that $c_k = \inf\{R_k(y_{ki}) : i \in \mathbb{N}\}$.) Without loss of generality we assume that $c_1 < c_2$, and we select $i^* \in \mathbb{N}$ such that $R_1(y_{1i^*}) < c_2$. To understand the next result, note that if the decision-maker attaches sufficiently high probability to $y_{1i^*}$ occurring if arm 1 is selected, arm 2 will be selected regardless of $\mu_2(t)$. Furthermore, the Proposition of Freedman [13] stated below implies that for a residual set of $(\theta_1, \mu_1(0))$ pairs, the probability of playing arm 1 infinitely often with $\int_{\Theta_1} \theta_1(y_{1i^*})\, \mu_1(t)(d\theta_1)$ bounded away from one is zero. So for a residual set $\Sigma_1 \subset \Theta_1 \times \mathcal{P}(\Theta_1)$, if arm 1 is played sufficiently often the decision-maker will eventually ($Q_\theta$ a.s.) be sufficiently pessimistic regarding the realization $\theta_1$ that arm 1 will never be tested again.
PROPOSITION 4.7 [Freedman]. For $k = 1, 2$, the set $\mathcal{L}_k \subset \Theta_k \times \mathcal{P}(\Theta_k)$ of pairs $(\theta_k, \mu_k(0))$ such that
$$\limsup_{n \to \infty} \nu_k(n, \mu_k(0))(\omega)(U) = 1 \quad Q_{\theta_k} \text{ a.s.}$$
simultaneously for all nonempty open subsets $U \subset \Theta_k$, is residual.
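PROPOSITION 4.8. Suppose $c_1 < c_2$. Then there exists a residual subset $\Sigma_1 \subset \Theta_1 \times \mathcal{P}(\Theta_1)$ such that for all $(\theta_1, \mu_1(0)) \in \Sigma_1$ and $(\theta_2, \mu_2(0)) \in \Theta_2 \times \mathcal{P}(\Theta_2)$, $X(t, \mu_1(0) \times \mu_2(0))(\omega) = x_1$ only finitely often, $Q_\theta$ a.s.
Proof. For $i \in \mathbb{N}$, define $\delta^1_i \in \mathcal{P}(\Theta_1)$ by $\delta^1_i(B) = 1$ if $\delta_{y_{1i}} \in B$, else $\delta^1_i(B) = 0$. ($\mu_1 = \delta^1_i$ corresponds to the belief that outcome $y_{1i}$ occurs with probability one whenever action $x_1$ is chosen.) Choose $i^* \in \mathbb{N}$ s.t. $R_1(y_{1i^*}) < c_2$. Under the belief $\delta^1_{i^*}$, arm 1 pays $R_1(y_{1i^*})$ in every period and beliefs never change, so $M_1(\delta^1_{i^*}) = R_1(y_{1i^*})/(1-\alpha) < c_2/(1-\alpha)$; on the other hand, since every reward from arm 2 is at least $c_2$, $M_2(\mu_2) \geq c_2/(1-\alpha)$ for all $\mu_2 \in \mathcal{P}(\Theta_2)$. Recalling that $M_1$ is continuous (Proposition 4.6), there exists a neighborhood $U$ of $\delta^1_{i^*}$ s.t. $M_1(\mu_1) < c_2/(1-\alpha)$ for all $\mu_1 \in U$. Note that if $\mu_1(t)(\omega) \in U$, then $X(t')(\omega) = x_2$ for all $t' > t$. For $\mu_1 \in \mathcal{P}(\Theta_1)$, define $D_1(\mu_1) = \{\omega : \nu_1(n, \mu_1)(\omega) \notin U,\ n = 1, 2, \dots\}$, the set on which $\nu_1(n, \mu_1)(\omega)$ is never in $U$. Define $\Sigma_1 = \{(\theta_1, \mu_1(0)) : Q_\theta(D_1(\mu_1(0))) = 0\}$; since $D_1(\mu_1(0))$ depends on $\omega$ only through $\omega_1$, $Q_\theta(D_1(\mu_1(0)))$ depends on $\theta$ only through $\theta_1$. By Proposition 4.7, $\Sigma_1$ is residual: for $(\theta_1, \mu_1(0)) \in \mathcal{L}_1$, taking a sufficiently small open ball around the point $\delta_{y_{1i^*}} \in \Theta_1$, its posterior probability is $Q_\theta$ a.s. arbitrarily close to one infinitely often, which forces $\nu_1(n, \mu_1(0))(\omega)$ into $U$, so $\mathcal{L}_1 \subset \Sigma_1$. For $(\theta_1, \mu_1(0)) \in \Sigma_1$, define $n^*(\omega) = \inf\{n : \nu_1(n, \mu_1(0))(\omega) \in U\}$. $n^*$ is $Q_\theta$ a.s. finite, and so, $Q_\theta$ a.s., for all $t$ the count function satisfies $C_1(t)(\omega) \leq n^*(\omega) < \infty$.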
IV.3. Generic Outcomes when $c_1 = c_2$
To understand the next result, suppose that $c_1 = c_2$ and $c_2/(1-\alpha) < M_1(\mu_1(0)) < M_2(\mu_2(0))$. Then generically, $Q_\theta$ a.s., there will exist some time $t$, say $\tau_1$, such that $M_2(\mu_2(\tau_1)) < M_1(\mu_1(0))$. Then, reversing the roles of the arms, eventually at some time $\tau_2 > \tau_1$, $M_1(\mu_1(\tau_2)) < M_2(\mu_2(\tau_1))$, etc. And so each arm will be chosen infinitely often. Observe that there exist consistent estimators of $(\theta_1, \theta_2)$, and so from the point of view of a classical statistician there will eventually be overwhelming evidence as to which arm offers superior prospects.
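PROPOSITION 4.9. Suppose that $c_1 = c_2$ and that the range of $R_k$, $k = 1, 2$, is not a singleton. Then there is a residual subset $\mathfrak{R}^* \subset \Theta_1 \times \Theta_2 \times \mathcal{P}(\Theta_1) \times \mathcal{P}(\Theta_2)$ such that for $(\theta_1, \theta_2, \mu_1, \mu_2) \in \mathfrak{R}^*$, with $Q_\theta$ probability one, $X(t, \mu(0)) = x_1$ infinitely often and $X(t, \mu(0)) = x_2$ infinitely often.
Proof. For $\mu \in \mathcal{P}(\Theta)$ define the set $E_1(\mu) = \{\omega : X(t, \mu(0))(\omega) = x_2 \text{ finitely often}\}$, and define $E_2(\mu)$ analogously. We now prove that there exists a residual subset $\mathfrak{R}^*_1$ such that for $(\theta_1, \theta_2, \mu_1, \mu_2) \in \mathfrak{R}^*_1$, $Q_\theta(E_1(\mu)) = 0$. By an identical argument there exists a residual subset $\mathfrak{R}^*_2$ such that for $(\theta_1, \theta_2, \mu_1, \mu_2) \in \mathfrak{R}^*_2$, $Q_\theta(E_2(\mu)) = 0$. Since the countable intersection of residual sets is residual, $\mathfrak{R}^*$ defined by $\mathfrak{R}^* = \mathfrak{R}^*_1 \cap \mathfrak{R}^*_2$ is residual. So the strategy is to first verify the existence of such a set $\mathfrak{R}^*_1$.
For $k = 1, 2$, define $\Theta_k^0 = \{\theta_k \in \Theta_k : \theta_k(y_{ki}) > 0\ \forall i \in \mathbb{N}\}$; $\Theta_k^0$ is residual [13, Remark 1]. The sets $A_k \subset \mathcal{P}(\Theta_k)$ defined by $A_k = \{\mu_k \in \mathcal{P}(\Theta_k) : \mu_k(\Theta_k^0) > 0\}$ are also residual [13, Remark 2], and so $A = A_1 \times A_2$ is residual [20, Theorem 15.3].
For $n \in \mathbb{N}$, there exists $i(n)$ such that $R_1(y_{1i(n)}) < c_1 + n^{-1}$, and hence $M_1(\delta^1_{i(n)}) < (c_1 + n^{-1})/(1-\alpha)$. By the continuity of $M_1$, there exists a neighborhood $N_n$ of $\delta^1_{i(n)}$ such that $M_1(\mu_1) < (c_1 + n^{-1})/(1-\alpha)$ for all $\mu_1 \in N_n$; let $\mathcal{N} = \{N_n : n \in \mathbb{N}\}$ be a family of such neighborhoods. If $\mu_2(0) \in A_2$ then, since $c_1 = c_2$ and the range of $R_2$ is not a singleton, $M_2(\nu_2(n', \mu_2(0))(\omega)) > c_2/(1-\alpha)$, so for all $n' \in \mathbb{N}$ there exists $n \in \mathbb{N}$ s.t. $\sup_{\mu_1 \in N_n} M_1(\mu_1) < M_2(\nu_2(n', \mu_2(0))(\omega))$.
Define
$$H(\mu_1) = \{\omega \in \Omega : \exists\, n(\omega) \text{ such that for all } n > n(\omega),\ \nu_1(j, \mu_1)(\omega) \in N_n \text{ for only finitely many } j\}$$
and observe that: (i) for $\mu = (\mu_1, \mu_2) \in A$, $E_1(\mu) \subset H(\mu_1)$, and (ii) $Q_\theta(H(\mu_1))$ does not depend upon $\theta_2$. Property (i) is a direct consequence of the previous paragraph: if arm 2 is played only finitely often, the arm 2 posterior is eventually frozen at some $\nu_2(n', \mu_2(0))(\omega)$, and a visit of the arm 1 posterior to $N_n$ for $n$ large enough would lead to arm 2 being selected again. Define
$$\mathfrak{R}^*_1 = \{(\mu, \theta) : \mu \in A,\ Q_\theta(H(\mu_1)) = 0\}.$$
The proof is completed by confirming that $\mathfrak{R}^*_1$ is residual. Define $K_1 = \{(\mu_1, \theta_1) \in \mathcal{P}(\Theta_1) \times \Theta_1 : Q_\theta(H(\mu_1)) = 0\}$. According to Proposition 4.7, $K_1$ is residual. Again invoking [20, Theorem 15.3], $K_1 \cap (A_1 \times \Theta_1)$ is residual. And by application of the category analog of Fubini's theorem [20, Theorem 15.3], $\{(\mu_1, \mu_2, \theta_1, \theta_2) : (\mu_1, \theta_1) \in K_1 \cap (A_1 \times \Theta_1),\ \mu_2 \in A_2\}$ is residual. By definition, this set is $\mathfrak{R}^*_1$, and hence $\mathfrak{R}^*_1$ is residual.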
V. CONCLUDING REMARKS
It is natural to inquire as to the generality of Propositions 4.7 and 4.8. Is there some peculiarity associated with having a countable outcome space that underlies these nonintuitive propositions? In cases when the parameter space is a smooth, finite dimensional set (as in [10], [16], and [19]), one could hope to modify the theorems of Schwartz [24] and prove a.s. convergence of beliefs for all $\theta \in \Theta$. (This work, though, remains to be undertaken.) But the results of Freedman [13] and of this paper can be extended to other infinite dimensional settings. For instance, suppose for $k = 1, 2$ the outcome space is $Y_k = [0, 1]$. Then if $\Theta_k$ is the set of all density functions with the $L_1[0, 1]$ topology, with minimal modification all proofs remain valid. Furthermore, I conjecture that the restriction that the decision-maker choose from a finite set of actions is inessential. So while a complete answer is not yet available, it appears that the only crucial assumption is that the parameter space is infinite dimensional.
To assess the significance of the results contained in this paper with regard to economic modelling with Bayesian learning requires addressing the issue of the "size" of the residual set $\mathfrak{R}$. The customary practice is to equate residual with "large" (even though there are residual sets in $\mathbb{R}^n$ of Lebesgue measure zero). If this equivalence is accepted and prior beliefs are arbitrarily chosen, then one would be led to conclude, as did Freedman [13], that "for essentially any pair of Bayesians, each thinks the other is crazy." An obvious objection to this stark prediction is that there is little basis for automatically classifying residual sets as "large". On the other hand, there are no grounds for asserting that the "bad" pairs of parameters and priors are in any sense small. A more cautious interpretation, relying only on the density of $\mathfrak{R}$ and of the Dirichlet priors, would stress the sensitivity of asymptotic beliefs to the prior. In particular, it is illegitimate to restrict attention to a computationally convenient class of priors (even when dense) and presume that any resulting limit theorems extend beyond the initial family of distributions.
A second point of contention with the remark of Freedman is that it may be appropriate to impose a priori restrictions on the family of admissible prior beliefs. Freedman himself now appears to support this stance: Diaconis and Freedman [7] advocate that Bayesian statisticians adopt what they label the "what if" method. This consists of "... after specifying a prior distribution generate imaginary data sequences, compute the posterior, and consider whether the posterior would be an adequate representation of the updated prior." Adapting this recommendation to the context of economic modelling, it would be natural to require that agents have prior beliefs with full support and that the sequence of agent posterior beliefs converges almost surely with respect to the measure $Q_\theta$ for all $\theta \in \Theta$. While the author is sympathetic to imposing such restrictions, these auxiliary conditions are ad hoc; the axioms of Bayesian decision theory provide no foundation for such constraints on prior beliefs. Furthermore, while the Dirichlet priors constitute an example of such priors within the specific framework of this paper, the existence of such priors for more general learning models (such as in [8]) has not yet been verified. Further research on this issue is needed.
Appendix
Proof of Lemma 4.1. It suffices to prove that $\Gamma_1(\cdot, y_{1i})$ is continuous. To simplify the notation for this proof, representative elements of $Y_1$, $\Theta_1$, and $\mathcal{P}(\Theta_1)$ are denoted respectively as $y_1$, $\theta$, and $\mu$. Let $F \subset \Theta_1$ be a closed set and suppose $\mu_n \Rightarrow \mu \in \mathcal{P}(\Theta_1)$. Applying a standard characterization of weak convergence [3, Theorem 2.1], it suffices to show that $\limsup_n \Gamma_1(\mu_n, y_1)(F) \leq \Gamma_1(\mu, y_1)(F)$, or equivalently that the map $\mu \to \Gamma_1(\mu, y_1)(F)$ is upper semicontinuous (u.s.c.). Recall that
$$\Gamma_1(\mu, y_1)(F) = \frac{\int_F \theta(y_1)\, \mu(d\theta)}{\int_{\Theta_1} \theta(y_1)\, \mu(d\theta)}.$$
And note that since the map $\theta \to \theta(y_1)$ is bounded and continuous, the map $\mu \to \int_{\Theta_1} \theta(y_1)\, \mu(d\theta)$ is continuous. So the proof will be completed by confirming that the map $\mu \to \int_F \theta(y_1)\, \mu(d\theta)$ is u.s.c. The map $\theta \to I_F(\theta) \cdot \theta(y_1)$ is u.s.c., and by [3, Exercise 7, p. 17] the map $\mu \to \int_{\Theta_1} I_F(\theta) \cdot \theta(y_1)\, \mu(d\theta)$ is u.s.c. But $\int_F \theta(y_1)\, \mu(d\theta) = \int_{\Theta_1} I_F(\theta) \cdot \theta(y_1)\, \mu(d\theta)$, and so the map $\mu \to \int_F \theta(y_1)\, \mu(d\theta)$ is u.s.c.
Proof of Proposition 4.2. It suffices to prove that $\Psi_1$ is continuous. As in the proof of Lemma 4.1, for notational simplicity representative elements of $Y_1$, $\Theta_1$, and $\mathcal{P}(\Theta_1)$ are denoted respectively as $y_i$, $\theta$, and $\mu$.
Let
P(&\) -* R be a bounded continuous function and
f:
continuity of 4*1,
we must
verify that
*¥
/f(j_i')
(ji
n
{\i
}
)(dpO -»
a sequence with
Jf(ji')
P(S)
|i
n =>
p..
To
establish
v
Fi(iI)(dM.')> or equivalently that
P(0)
oo
oo
f[^f(TiOi, yi)-e(yi))] Hn (d0) -> J[]^f(riOi. yO-eCyj))]
e
n
ji(d9).
e i=l
i=l
oo
Define the functions
g:
0i
->
R and
gn
:
©i
R
->
by g(0) = £f(T(Ji
yOHCyi) and
g n (0) =
i=l
oo
Xf(r(nn
,
yO)-Q(yi)- Since the
maps
y,
-> f(r(ji
y,))
and
y,
->
fCr^^
y,)) are
bounded and continuous, g
i=l
and g n are continuous. Furthermore, as
is
now
demonstrated, g n (0) -* g(0),
ji a.s..
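Define $\lambda_\mu \in P(Y_1)$ by
$$\lambda_\mu(A) = \sum_{i=1}^{\infty} 1_A(y_i) \int_{\Theta_1} \theta(y_i)\,\mu(d\theta) \quad\text{for } A \subset Y_1.$$
The support of $\mu$ is contained in $\Pi \subset \Theta_1$, defined by $\Pi = \{\theta \in \Theta_1 : \theta \ll \lambda_\mu\}$; in particular $\mu(\Pi) = 1$. For $\theta \in \Pi$, $\theta(y_i) > 0$ implies $\lambda_\mu(\{y_i\}) > 0$, and hence, by Lemma 4.1, that $\Gamma_1(\cdot, y_i)$ is continuous at $\mu$. Denoting the sup norm of $f$ by $\|f\|_\infty$, we have $|f(\Gamma_1(\mu', y_i))| \le \|f\|_\infty$ for every $\mu'$, and $\int_{Y_1} \|f\|_\infty\,\theta(dy_1) < \infty$. So by a standard result on the preservation of continuity under integration [4, Theorem 16.8], for $\theta \in \Pi$ the map
$$\mu' \mapsto \sum_{i=1}^{\infty} f(\Gamma_1(\mu', y_i))\cdot\theta(y_i)$$
is continuous at $\mu$, and so $g_n(\theta) \to g(\theta)$.

Finally, $\int_{\Theta_1} g_n(\theta)\,\mu_n(d\theta) \to \int_{\Theta_1} g(\theta)\,\mu(d\theta)$ by [3, Theorem 5.5], or equivalently
$$\int_{\Theta_1} \Big[\sum_{i=1}^{\infty} f(\Gamma_1(\mu_n, y_i))\cdot\theta(y_i)\Big]\,\mu_n(d\theta) \to \int_{\Theta_1} \Big[\sum_{i=1}^{\infty} f(\Gamma_1(\mu, y_i))\cdot\theta(y_i)\Big]\,\mu(d\theta).$$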
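When $Y_1$ is finite and priors have finitely many atoms, the quantities in this proof can be computed directly. The Python sketch below works in a hypothetical two-outcome setting; that setting and names such as `psi_integral` and `theta_integral` are illustrative assumptions, and the proof itself allows a countable $Y_1$. Consistent with the equivalence used in the proof, the sketch evaluates $\int f\,d\Psi_1(\mu)$ by giving the posterior $\Gamma_1(\mu, y)$ weight $\lambda_\mu(\{y\})$. It then checks three things: this agrees with the proof's integral over $\theta$, it changes continuously along a sequence $\mu_n \Rightarrow \mu$, and posterior means average back to the prior mean.

```python
from typing import Callable, List, Tuple

# Hypothetical setting: finite outcome set {0, 1}; theta in [0, 1] is the
# probability of outcome 1; priors are finitely supported (theta, weight) lists.
Measure = List[Tuple[float, float]]
OUTCOMES = (0, 1)


def likelihood(theta: float, y: int) -> float:
    """theta(y): probability of outcome y under parameter theta."""
    return theta if y == 1 else 1.0 - theta


def predictive(prior: Measure, y: int) -> float:
    """lambda_mu({y}): the integral of theta(y) with respect to mu."""
    return sum(w * likelihood(t, y) for t, w in prior)


def posterior(prior: Measure, y: int) -> Measure:
    """Gamma_1(mu, y); only called when predictive(prior, y) > 0."""
    denom = predictive(prior, y)
    return [(t, w * likelihood(t, y) / denom) for t, w in prior]


def mean(measure: Measure) -> float:
    return sum(w * t for t, w in measure)


def psi_integral(prior: Measure, f: Callable[[Measure], float]) -> float:
    """Integral of f against Psi_1(mu): Gamma_1(mu, y) gets weight lambda_mu({y})."""
    return sum(predictive(prior, y) * f(posterior(prior, y))
               for y in OUTCOMES if predictive(prior, y) > 0.0)


def theta_integral(prior: Measure, f: Callable[[Measure], float]) -> float:
    """The proof's form: integral over theta of sum_y f(Gamma_1(mu, y)) theta(y).

    Outcomes with zero predictive probability are skipped; for them theta(y) = 0
    at every atom of positive weight, so they contribute nothing.
    """
    live = [y for y in OUTCOMES if predictive(prior, y) > 0.0]
    return sum(w * sum(f(posterior(prior, y)) * likelihood(t, y) for y in live)
               for t, w in prior)


def f(nu: Measure) -> float:
    """A bounded continuous test function on beliefs: the squared mean."""
    return mean(nu) ** 2


mu: Measure = [(0.2, 0.5), (0.7, 0.5)]


def mu_n(n: int) -> Measure:
    """Move one atom by 1/n, so mu_n converges weakly to mu."""
    return [(0.2 + 1.0 / n, 0.5), (0.7, 0.5)]


print(psi_integral(mu, f), theta_integral(mu, f))              # equal up to rounding
print(abs(psi_integral(mu_n(10**6), f) - psi_integral(mu, f)))  # small
print(psi_integral(mu, mean), mean(mu))                        # both about 0.45
```

The last line reflects the martingale property of posterior means, which underlies the a.s. convergence results that this paper contrasts with convergence from the point of view of an observer who knows the true parameter.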
References

1. Ash, R. (1972), Real Analysis and Probability, New York, Academic Press.
2. Berry, D. A., and B. Fristedt (1985), Bandit Problems: Sequential Allocation of Experiments, London, Chapman and Hall.
3. Billingsley, P. (1968), Convergence of Probability Measures, New York, Wiley.
4. Billingsley, P. (1979), Probability and Measure, New York, Wiley.
5. Blackwell, D. (1965), "Discounted Dynamic Programming," Annals of Mathematical Statistics 36, 226-235.
6. Blume, L. and D. Easley (1984), "Rational Expectations Equilibrium: An Alternative Approach," Journal of Economic Theory 34, 116-129.
5. Bray, M. and D. Kreps (1987), "Rational Learning and Rational Expectations," in Arrow and the Ascent of Modern Economic Theory, edited by George Feiwel, New York, New York University Press.
6. Cyert, R. and M. DeGroot (1974), "Rational Expectations and Bayesian Analysis," Journal of Political Economy 82, 521-536.
7. Diaconis, P. and D. Freedman (1986), "On the Consistency of Bayes Estimates," Special Invited Paper, Annals of Statistics 14, 1-26.
8. Doob, J. (1953), Stochastic Processes, New York, Wiley.
9. Easley, D. and N. Kiefer (1988), "Controlling a Stochastic Process with Unknown Parameters," Econometrica 56, 1045-1064.
10. Easley, D. and N. Kiefer (1988), "Optimal Learning with Endogenous Data," forthcoming, International Economic Review.
11. Feldman, M. (1987a), "An Example of Convergence to Rational Expectations with Heterogeneous Beliefs," International Economic Review 28, 635-650.
12. Feldman, M. (1987b), "Bayesian Learning and Convergence to Rational Expectations," Journal of Mathematical Economics 16, 297-313.
13. Freedman, D. (1965), "On the Asymptotic Properties of Bayes Estimates in the Discrete Case II," Annals of Mathematical Statistics 36, 454-456.
14. Gittins, J. and D. Jones (1974), "A Dynamic Allocation Index for the Sequential Design of Experiments," in Progress in Statistics, edited by J. Gani, K. Sarkadi, and I. Vincze, 241-266, Amsterdam, North-Holland.
15. Hinderer, K. (1970), Foundations of Nonstationary Dynamic Programming with Discrete Time Parameter, New York, Springer-Verlag.
16. Kiefer, N. and Y. Nyarko (1988), "Control of a Linear Regression Process with Unknown Parameters," in Dynamic Econometric Modeling, edited by W. Barnett, E. Berndt, and H. White, New York, Cambridge University Press.
17. Kelley, J. (1955), General Topology, Princeton, D. Van Nostrand.
18. McLennan, A. (1987), "Incomplete Learning in a Repeated Statistical Decision Problem," working paper, University of Minnesota.
19. Nyarko, Y. (1988), "On the Convergence of Bayesian Posterior Processes in Linear Economic Models," working paper, Brown University.
20. Oxtoby, J. (1980), Measure and Category: A Survey of the Analogies between Topological and Measure Spaces, second edition, New York, Springer-Verlag.
21. Rieder, U. (1975), "Bayesian Dynamic Programming," Advances in Applied Probability 7, 330-348.
22. Ross, S. (1983), Introduction to Stochastic Dynamic Programming, New York, Academic Press.
23. Rothschild, M. (1974), "A Two-Armed Bandit Theory of Market Pricing," Journal of Economic Theory 9, 185-202.
24. Royden, H. L. (1988), Real Analysis, third edition, New York, Macmillan.
25. Schwartz, L. (1965), "On Bayes Procedures," Z. Wahr. 4, 10-26.
26. Townsend, R. (1978), "Market Anticipations, Rational Expectations and Bayesian Analysis," International Economic Review 19, 481-494.
27. Whittle, P. (1982), Optimization Over Time: Dynamic Programming and Stochastic Control, Vol. I, New York, John Wiley.