On parametric models for invariant probability measures

Quaderni di Statistica
Vol. 2, 2000
On parametric models
for invariant probability measures
Patrizia Berti
Dipartimento di Matematica Pura ed Applicata "G. Vitali", Università di Modena
E-mail: [email protected]
Sandra Fortini
Istituto di Metodi Quantitativi, Università "L. Bocconi"
E-mail: [email protected]
Lucia Ladelli
Dipartimento di Matematica, Politecnico di Milano
E-mail: [email protected]
Eugenio Regazzini
Dipartimento di Matematica "F. Casorati", Università di Pavia
E-mail: [email protected]
Pietro Rigo
Dipartimento di Statistica "G. Parenti", Università di Firenze
E-mail: [email protected]
~
Summary: Let ~
be a sequence of random variables with values in
a Polish space . If ~ is exchangeable (stationary), then ~ is i.i.d. (ergodic)
conditionally on some random probability measure ~ on
(~ on
). In
the exchangeable case, two necessary and sufficient conditions are given for ~ to be
~
~
i.i.d. conditionally on ~ , where is a given transformation. Essentially the same
~
conditions apply when is stationary, apart from "i.i.d." is replaced by "ergodic"
and ~ by ~. The basic difference between the exchangeable and the stationary case
lies in the empirical measure to be used. Up to a proper choice of the latter, the two
conditions work in a large class of invariant distributions for ~. Finally, the set of
ergodic probability measures is shown to be a Borel set, and almost sure weak
convergence of a certain sequence of empirical measures is proved in the stationary
case.
Keywords: Empirical distribution, Exchangeability, Invariant probability measure,
Predictive distribution, Predictive sufficient statistic, Stationarity.
39
1. Introduction
~
Let ~
be a sequence of random variables with
values in a Polish space . Suppose ~ is exchangeable or, equivalently,
~ is independent and identically distributed (i.i.d.) conditionally on
some random probability measure ~ on the Borel -field on ,
.
In Theorem (7.2) of Fortini, Ladelli and Regazzini (2000) referred to
as FLR in the sequel conditions are given for ~ to be i.i.d.
conditionally on ~ ~ , where ~ is a given transformation. In fact, the
hypotheses of Theorem (7.2) concern sequences of predictive sufficient
statistics, and are conceived to have some statistical content within a
Bayesian predictive framework.
In so far as the sole representation part is concerned i.e., finding
conditions under which ~ is i.i.d. given ~ ~ , where ~ is an assigned
transformation the hypotheses of Theorem (7.2) are sufficient but
not necessary.
The aim of this paper is to take up again the representation part to
provide necessary and sufficient conditions. From a statistical point of
view, such conditions are useful for investigating existence of
underlying parametric models for the distribution of ~.
In particular, two conditions are given. The first one (that will be
denoted as condition (b)) is of the abstract type. Nevertheless, by using
it, various other sufficient conditions are easily obtained, included the
one proposed in FLR. Moreover, it allows a shorter and more direct
proof of Theorem (7.2). The second condition (condition (c)) applies
to the special case where ~ is continuous (an assumption made also in
Theorem (7.2)). In this case, however, it has some statistical meaning.
The problem studied in this paper is different from the one tacled in
Olshen (1974). In the latter, among other things, an exchangeable
sequence ~ is shown to be i.i.d. conditionally on some real random
variable
. Such result, clearly, is based on the Borel isomorphism
theorem. In our case, instead, we investigate whether ~ is i.i.d.
conditionally on ~ ~ , where ~ is a given transformation (possibly,
suggested by a sufficient statistic) and ~ is the almost sure weak limit
of the empirical measures.
!
"
#
#
(
)
$
%
*
*
,
-
.
/
+
,
0
,
-
.
/
,
1
0
1
,
1
2
3
4
5
6
7
4
6
40
&
%
'
A further and minor goal of this paper is to start, in a particular
case, an analysis of Bayesian nonparametric problems for non
exchangeable data. Suppose ~ is stationary or, equivalently, ~ is
ergodic conditionally on some random probability measure ~ on
. Then, conditions (b) and (c) are still working, apart from
"i.i.d." is replaced by "ergodic" and ~ by ~. In this framework,
incidentally, the set of ergodic probability measures is shown to be a
Borel set, and almost sure weak convergence of a certain sequence of
empirical measures is proved.
The basic difference between the exchangeable and the stationary
case lies in the empirical measure to be used. Up to a proper choice of
the latter, conditions (b) and (c) work for a large class of invariant
distributions for ~.
The paper is organized as follows. Section 2 is devoted to
preliminaries and notation, Section 3 includes the statement of
Theorem (7.2), while Sections 4 and 5 contain the main results, in the
exchangeable and stationary case, respectively.
8
8
9
:
;
<
=
>
?
@
A
2. Preliminaries and notation
The basic notation is that of FLR, with some slight adaptions in
view of the last Section 5.
Let
be any set. Then,
is the space of all sequences
of elements of , and ~ is the -th coordinate map
on
, that is,
B
D
E
F
D
G
B
D
G
L
H
H
H
I
C
J
N
K
M
O
P
Q
N
~
P
R
N
S
T
for all
N
O
O
N
T
R
N
U
N
P
L
Z
U
V
V
V
S
Q
X
and
W
M
P
Y
W
.
[
P
When is a topological space,
denotes the Borel -field on ,
and
is equipped with the product topology. By a Polish space, it is
meant a topologically complete separable space. If is Polish, then
is Polish, too, and
.
Given a -field on , let
be the set of probability
measures (p.m.'s) on , and let
be the -field on
R
P
S
Q
P
P
Q
Z
P
Q
R
\
Z
S
^
P
T
Q
R
S
_
`
_
a
]
b
c
]
d
d
f
41
c
g
e
h
i
j
k
l
m
n
k
o
p
q
l
r
s
p
l
t
generated by the sets
, for
and
. Any measurable function
, where
is any measurable space, is called a random p.m.. Thus, a
function on
is a random p.m. whenever
is a p.m. on
for
, and
is -measurable for
.
We will consider a sequence of observable random elements taking
values in a Polish space . For our purposes, it is convenient to work
in the coordinate space
and to identify the sequence of
~
~
~
observables with
. Moreover, we let
and
, i.e.,
and
denote the
sets of p.m.'s on
and on
, respectively. Both
and
are equipped with the topology of weak convergence of p.m.'s. Since
is Polish,
and
are Polish, too, and
and
coincide with the -fields
and
defined earlier.
For every
, the empirical measure
associated to
~
~ is defined as
u
v
†
‚
w
‡
x
y
z
{
|
}
~

w
ƒ
z
„
}
€
w

z
…
}
ˆ
ƒ
„
†
‡
‰
Œ
ˆ
Š
Š
†
‹
Œ
Œ
Ž
ˆ
‰


Œ

‹
‘
’
‘
“

’
“
Ž
”
”
•
—
”
–
–
‘
‘
˜
’
™
š
˜
›
’
˜
›
ž
¡
¤
¡
œ
œ
œ

Ÿ
¤
£
š
š
¢


š
ž
š
¢


Ÿ
ž
¤
Ÿ
¤
£
š
¢

š
¢

ž
¢
¤
¤
š
ž
Ÿ

š

Ÿ
ž
¤
Ÿ
¤
£
¥
¦
š
š
¢


¦
š
š
¢


ª
§
¨
©
«
¬
­
®
¯
¯
¯
®
­
°
±
«
¹
º
»
²
¼
~
³
´
µ
´
¶
·
¸
½
¾
¿
where
is the unit mass at . Clearly,
is a p.m. on
for
, and
is Borel measurable for
, i.e.,
is a random p.m. from
into
.
A p.m.
on
is exchangeable if it is invariant under all
finite permutations of
, that is, if
for all functions
of the form
Â
Ã
Ä
À
Å
Æ
Ä
Ç
Æ
Á
Í
Ë
Å
È
Ì
Ç
Å
É
Ã
Ä
Å
Æ
Ä
Ê
Æ
Ê
Í
Ì
È
Ä
Ç
Æ
Í
Ë
Ë
Ð
Ð
Ã
Ä
Ñ
Õ
Ò
Ó
Ö
Ç
Î
Ä
Ç
Æ
Æ
Ä
Ï
Î
Ä
Ï
Æ
Æ
Ô
×
Ø
Ù
Ø
Ú
Û
Ü
Ý
Þ
Þ
â
ß
â
à
á
à
Þ
ã
ä
å
æ
æ
æ
å
ä
å
é
ä
ê
å
ê
ë
æ
æ
æ
ç
è
é
ã
ä
å
æ
æ
æ
å
ä
å
ì
ú
ï
ø
ð
ÿ
ý
ú
ú
ÿ
ÿ
ñ
that ~
õ
ñ
~ ~
ø
ñ
ó
ý
ñ
õ
õ
ú
ó
ô
ô
ô
õ
æ
æ
æ
ç
é
ù
ó
ø
û
ý
ò
ô
ô
ô
ó
þ
ø
ú
ë
and some permutation
of
. As
denotes the p.m. on
given by
. When is exchangeable, we will also say
is exchangeable under
ñ
ø
û
å
ê
î
÷
for some
usual,
ä
ì
í
ô
42
ò
ñ
õ
ü
õ
ñ
ö
ó
ô
ô
ô
ó
ï
õ
Given
, let
denote the corresponding product p.m., i.e.,
is the p.m. on
which makes the coordinate random
variables ~ ~
i.i.d. according to
We are now in a position to state de Finetti's representation
theorem for (infinite) exchangeable sequences.
Theorem 1 (de Finetti's representation theorem)
Let be a p.m. on
where is Polish. Then, the following
statements are equivalent:
(i)
is exchangeable;
(ii) There is a unique p.m.
"
on
such that
#
#
for all
$
!
;
%
(iii) There is a random p.m. ~ (i.e., a Borel function ~
such that,
for each
,~
is a version of ~
&
,
-
.
5
/
0
6
1
.
6
,
0
2
2
.
Moreover, when
is exchangeable, one has
all
and
7
-
.
5
:
.
7
3
0
-
8
'
+
(
)
,
4
2
.
9
)
~.
1
~
1
*
0
-
7
0
for
0
;
<
~
=
>
where "
#
&
?
-a.s.,
@
" stands for weak convergence of p.m.'s.
=
We close this section with the notion of statistic. By definition, a
statistic is a measurable function of the data. In our case, the data are
~
the -tuple ~
for some
Under the exchangeability
assumption, the order in which observations appear is not relevant, and
~
~ can be summarized by the empirical measure .
Let
A
B
C
D
E
E
E
D
C
F
A
E
G
B
>
C
D
E
E
E
D
C
F
<
G
>
H
>
I
J
K
L
M
N
O
M
U
P
Q
R
V
43
S
P
T
W
be the union of the ranges of all the random p.m.'s , and let
be a
Borel subset of
such that
. Further, let
be a Polish
space and ~
a Borel function. Then, for each , ~
is
~
~
Borel measurable on
and is a summary of the data
.
Accordingly, in Sections 3 and 4 a statistic is meant as the restriction
to , ~ , of any Borel function ~
, where
and
.
In last Section 5, a slightly different (but conceptually equivalent)
notion of statistic is used.
X
]
]
^
Y
Z
[
\
_
`
`
a
b
c
d
e
f
c
g
j
k
b
h
m
i
l
n
o
p
q
r
r
r
q
p
s
j
l
u
t
t
w
c
n
c
e
f
g
f
v
f
s
j
j
k
b
j
k
t
f
x
j
k
3. Statement of Theorem (7.2) of FLR
y
w
n
m
Let
be an exchangeable p.m. on
and let
~
~
be the sequence of coordinate random variables on
. According to Theorem 1, conditionally on some random p.m. ~,
~ is i.i.d. according to ~ . Fix a Borel function ~
, where is
Polish,
and
, and suppose that
where is as in Theorem 1.
Under these conditions one question is whether, conditionally on
~ ~ , ~ is again i.i.d. according to ~ . Answering this question is useful,
for instance, for investigating existence of an underlying parametric
model for the distribution of ~. Suppose in fact that, conditionally on
~ ~ , ~ is i.i.d. according to ~ . Then, ~ and ~ ~ play essentially the
~
~ ~~
same role, and a random parameter can be defined as
. In
other terms, the "original random parameter" ~, whose existence is
granted by de Finetti's theorem, can be reduced through ~. As far as ~
is concerned, in what follows it should be viewed as any given
transformation. However, in a particular statistical problem, ~ is
typically suggested by some sufficient statistic.
In any case, a positive answer to the earlier question occurs in case
o
s
n
d
p
z
p
e
v
s
l
{
m
|
o
|
c
p
e
f
g
j
w
f
v
n
k
b
b
t
f
s
f
~
n
x
f
s
z
}
j
j
k
j
k
j
k
~
n
|
|
c
s
p
p
n
|
|
|
n
c
|
c
s
p
s


‚
€
ƒ
„

ƒ
‚
‚
‚
(a) ~
and ~ ~
for some function
ƒ
…
†
‚
ƒ
„
‰
Š
~,
„

ˆ
ƒ
€
‹
Œ

Ž

‡

”
-a.s.,
such that
44
‘
•
Œ
’
“
‘
–—
Ž
’
™
š
›
œ
š
˜
›
œ
˜
where
is the completion of
with respect to the distribution
~
~
of
.
Indeed, under (a), ~ is i.i.d. (according to ~ ) conditionally on ~ ~ .
In particular, one has
š
ž
œ

ž
š
ž
œ

Ÿ
š
¡
œ
¢
¦
£
š
œ
§
š
¡
œ
¨
™
š
¤
œ
¡
¥
š
ª
§
œ
˜

for all

©
~
where is the distribution of ~ and is the completion of .
Theorem (7.2) of FLR, quoted in Section 1, just gives conditions
for (a). To state it, various definitions are needed.
First, let
be any p.m. on
. For each
,
~
~
~ is called the predictive distribution (at
time ). A statistic ~ is predictive sufficient if, for each
and
, one has
¨
š
ž
œ
¨
™
¨

©
©
©
š
š
®
¯
°
±
²
³
ª
§
œ
«
¥
˜
´
µ
¶
®
·
¸
¸
¸
·
®
¬
¹
­
º
»
²
¶
¼
½
²
´
¿
¾
´
Á
Â
Ã
³
À
Ä
Â
Å
®
Æ
~
¯
°
±
²
³
´
~
¶
À
Ç
È
¹
É
Â
Å
®
Æ
¾
º
¼
~
¯
°
±
²
³
´
¶
À
º
®
~
·
¸
¸
¸
·
®
»
~
¹
Å
¼
a.s..
So, the informal idea is that ~
is predictive sufficient provided
predictive distributions depend on data only through it. We send back
to FLR for more on predictive sufficiency and for some references
therein.
Next, let
be a countable -class such that
and
for all
, where
is the marginal of on the
first coordinate.
Further, for each , fix a function
on
such that:
is a p.m. on
for
,
is
-measurable for
,
~
~
¶
½
¾
Ê
Ñ
Ë
Í
Ó
Ô
Õ
Ö
×
Ø
Î
Ò
Ì
Ï
Ê
Î
Ù
Ö
Ú
Ü
Ó
Ñ
Ï
Ð
Î
Ò
Ï
Ó
Û
Û
Ý
Þ
ß
à
á
Þ
â
ã
ä
å
ß
æ
á
ã
á
ã
ç
è
é
å
â
Þ
ã
ä
å
ç
ê
ß
æ
ð
ñ
é
è
ë
å
â
ò
ã
õ
å
ë
ò
ö
÷
ø
ù
ò
ú
ö
û
÷
ú
ü
ý
ó
ê
â
ã
ô
ä
å
ò
ö
÷
þ
æ
ì
ê
ÿ
ë
è
í
î
ê
ï
å
for all
and
, where
denotes the distribution
~
of
.
Clearly, such
exists since is Polish.
Finally, consider the following condition; cf. condition (7.1) of
FLR.
þ
45
(*) There is a set
~
Each
For each
$
%
&
(
)
*
'
/
satisfying:
;
is a convergent sequence;
,
and
, there is
+
%
&
/
,
!
(
"
)
*
$
'
*
whenever
1
#
such that
-
.
for all sufficiently large
/
/
&
0
2
/
and
3
&
'
(
,
*
-
.
4
5
for all sufficiently large ,
6
6
8
7
9
being a metric for .
5
We are now able to state Theorem (7.2).
Theorem (7.2) of FLR
Let
be Polish spaces,
an exchangeable p.m. on
,
~
~
~
and
a Borel function. If
is predictive sufficient, is
continuous on
,
, and condition (*) is satisfied, then
condition (a) holds.
:
;
9
<
=
>
:
?
@
9
G
A
B
C
H
D
A
>
?
I
E
F
A
J
K
G
H
G
C
H
C
Finally we note that, if ~ is predictive sufficient, ~ is continuous
on
and
, then condition (*) is necessary in order that
(a) holds for a continuous ; cf. Theorem (7.4) of FLR.
A
>
?
I
E
F
A
J
K
G
H
G
C
H
C
L
4. The exchangeable case
<
=
>
:
?
In this section, is an exchangeable p.m. on
are as in Theorem 1. For definiteness, setting
N
O
P
Q
R
S
T
\
U
V
]
Q
W
converges weakly as
it is supposed that
46
X
@
Y
and ~ and
Z
M
[
,
K
~
^
`
a
b
weak lim
e
for all
f
_
`
a
b
f
a
c
d
and ~
for all
, where is any fixed element of
~
Given any Borel function
, with
, let us consider the condition:
l
g
h
m
n
g
h
i
j
g
o
.
k
o
p
z
y
and
y
q
r
s
t
u
s
v
p
w
s
x
p
p
y
s
{
|
p
(b) ~ is injective on
z
q
, for some
}
such that
y
}
v
w
s
x
p
.
~
}
w
x

€
Before any comments on (b), we prove that it amounts to (a).
Theorem 2
Let
be Polish spaces,
an exchangeable p.m. on
,
and ~
a Borel function. Then, conditions (a) and (b) are
equivalent. Moreover, under (a) or (b), the function involved in (a)
can be taken Borel measurable.

‚
z

„
u
ƒ
w
x
y
q
r
s
t
u
p
…
Proof
(a)
that ~
z

(b). Under (a), there is a set
,
, such
~
~
~
and
for all
. Then,
~
whenever
and
, i.e., ~
has ~
outer measure . Further,
is an analytic set. Hence, there is
~
with
and
. Since ~ is injective on
, condition (b) holds.
~
(b)
(a). Since ~ is injective on , for each
there is
precisely one
such that ~
. Fix
, and define
~
~
for
, and
for
. Then,
~
for all
. Further,
is Borel measurable.
In fact, ~
, due to ~ is Borel and injective on , and for the
same reason one has
„
†
Œ

‡
v
w
x
ƒ
w
x

€
y
q
‡
ˆ
‹
‡
w
‰
x
v
s
…
Š
Š
ˆ
w
‰
x

ˆ
w
‰
x
‰
v
p
Ž

Œ

‹
‘

Œ
’
“
‹
”

“
‹
”

z
~
~
•


‹
–
‘
“

“
™
–
‹
”
‹
”


‹
—
š

Ž

˜
–
•
‘
–
—
•
›
˜
˜

˜
‹

–
–
œ
“

˜
‹
“

Ž
˜
–
Ÿ
£
¤
¥
¦
§
¨

¥
©
¥
¤
®
¦
£
¤
¥
¦
§
¨
¬
£
¤
¥
¤
¨
¦
¦
¡
¢

¥
ž
©
ª
«
¥
¤
®
¦
­
§
¨
¨
©
®
£
¯
ª
°
±
²
³
¥
¼
®
¦
©
¤
~
½
´
for all
¤
¶
µ
¶
Â
·
Å
¸
µ
¹
µ
Æ
¿
ª
·
·
½
¦
À
¥
Á
¹
Â
~
¹
®
¾
µ
¿
·
Ã
º
.
47
Â
¶
Ä
À
~
¹
µ
¿
¸
¶
·
Â
Å
µ
»
·
~
Finally, let
, one has ~
for a Borel function .
Ç
Ç
Ò
É
È
Ó
É
Ë
Ë
Ï
Í
Ì
Ì
Ò
Ê
Ë
Ç
Ì
. Then,
~ ~
and
Ð
É
Ó
Ô
Õ
Ð
Ö
Ø
Ë
È
Ì
Õ
Ò
È
Ñ
Ë
~
É
Í
Ì
Ë
Ì
È
Ò
×
Î
and, for all
. Hence, (a) holds
Ù
×
A first remark on Theorem 2, even if not essential, is that the
function in condition (a) can always be taken Borel measurable.
Apart from this fact, two other points need to be discussed. One is the
possible meaning of (b) and the other is that, taking (b) as a starting
point, various other sufficient conditions for (a) can be obtained.
Ú
4.1. Meaning of condition (b)
From a statistical point of view, to get condition (a) (which is our
goal, as explained in Section 3), it would be desirable to ask conditions
only on the way ~ summarizes data. More precisely, it would be
desirable to ask conditions on ~ , in particular on its connections with
~
predictive distributions, but not on the behaviour of on
.
Strictly speaking, this is not possible in general. Suppose in fact
that
, choose any Borel function on , and take
to be any Borel set with
. Then, apart from trivial cases,
admits two Borel extensions to
, say ~ and ~ , such that (b) holds
~
~
for and fails for . In view of Theorem 2, condition (a) holds for ~
~
but fails for ~ , even if ~
.
In Theorem (7.2) of FLR, for instance, continuity of ~ is asked on
all
. However, all other conditions of Theorem (7.2) concern ~
only, and also the continuity assumption looks admissible. Indeed,
when ~ is continuous on and admits a continuous extension to
(e.g., when ~ is uniformly continuous on ), it is natural to take ~ on
as such continuous extension.
Generally, condition (b) deals with the behaviour of ~ on
,
and in this sense it does not have an intuitive statistical content.
Nevertheless, (b) is also necessary, and thus one can think in term of it
without any real loss of generality. In addition, (b) makes clear which
properties are requested to ~ in order that ~ can be reduced through ~:
Û
Û
Ü
Ý
Û
Ý
à
Þ
ä
å
æ
ç
è
ß
å
â
á
é
ê
å
ã
ë
ä
é
æ
ç
ì
í
â
ã
ë
ì
é
î
î
ë
ë
î
ï
ì
î
ë
î
ï
î
ë
î
ð
å
ç
ç
î
ð
å
ã
ï
ë
ï
î
é
î
ë
ð
å
ì
î
å
é
ë
î
å
î
é
ë
ì
î
é
ñ
ë
î
ò
48
å
ì
î
ì
over a set
of -probability , ~ must be able to distinguish between
two different weak limits of empirical measures.
ó
ô
õ
ö
4.2. Other sufficient conditions for (a)
By using Theorem 2, sufficient conditions for (a) can be obtained.
Moreover, it is possible to give a shorter (and more direct) proof of
Theorem (7.2). We begin with the latter point.
Proof of Theorem (7.2) of FLR
Since
is countable and
with
and ~
~
~,
. Moreover,
condition (*).
Let
÷
ø
ù
ú
û
ü
ý
þ
~
û
÷
~
ÿ
for all
for all
-a.s., where the
, there is
and
are as in
~ .
Since ~
is an analytic set and
, there is
with
~
and
. Let
. By definition of ,
~
continuity of ~ and condition (*), it follows that ~
whenever
~
~
~
~
~
and
. Hence, is injective on , and since
, condition (b) holds. By Theorem 2, this
concludes the proof.
!
"
#
!
$
%
&
)
-
,
.
/
&
4
(
1
*
2
'
+
5
0
'
(
)
*
+
&
0
'
(
,
*
(
)
*
&
+
'
(
,
*
1
6
(
*
+
3
2
7
Let us turn now to some other sufficient conditions for (a). If
assumptions on the behaviour of ~ on
are allowed, then, by
Theorem 2, a plenty of conditions are available. Because of the
remarks in Subsection 4.1, however, we focus on conditions
concerning ~ only, apart from continuity of ~ which is asked on all
. Then, one possible candidate is:
4
5
6
8
9
&
:
9
&
4
5
&
6
;
(c) There is a set
>
.
;
?
(
<
*
,
=
(
*
49
+
3
-
such that
@
A
B
C
D
E
F
K
M
N
S
~
G
Q
H
I
J
G
R
K
L
~
M
H
I
J
G
Q
N
L
L
O
P
Q
T
whenever
and
@
A
B
A
U
V
G
J
G
K
L
M
J
G
N
L
L
O
P
W
Q
X
Q
Q
Y
where
Z
is a metric for
\
and
[
is Prohorov metric on
W
]
Z
We recall that Prohorov metric
`
a
b
c
d
e
f
g
h
i
j
k
l
a
`
on
W
m
d
is given by
]
n
c
`
m
d
o
m
^
t
u
where
v
w
x
y
^
z
s
w
{
t
|
}
~
p

q
ƒ
„
r

being a metric for
€
.
Theorem 3
Let
be Polish spaces,
an exchangeable p.m. on
~
and
a Borel function. If ~ is continuous on
, then conditions (a), (b) and (c) are equivalent.
‚
d
_
y
,
€
`
for all
s
W
…
†
‡
‚
ˆ
,
and
‰
„

Š

Š
‹
Œ

Œ
Ž
‡
ˆ
‘
Ž
’


Œ
Ž
Proof
Let ~ be continuous on
with
. By Theorem 2, it is
enough to prove that (b) and (c) are equivalent.
(c)
(b). Let be defined as at the beginning of Section 4. Since
~
is analytic and has -outer measure , there is
~
with
and
. Then, (b) holds with
. To see this, the only non trivial fact is injectivity of ~ on
. Fix
such that
, and take
with
~
for
. Then, lim
,
~
due to
and
, and thus continuity of and condition
‡
ˆ

‘
’

Š
Œ

Œ
Ž
“
‡
–
Ž
”
—
ˆ
’
˜
™
†
‡
ˆ
”
•

Œ
Ž
˜
š
‡
–
—
ˆ
‡
˜
ˆ
‘
’
”
•
›
‘
˜

—

Š
Œ
Ž
œ

ž

Ÿ
¥
¨
©
¨
ª
«
œ
¦

¡
¥
¬
´

­
©
®
¯
¦
Ÿ
°
ª
£
¤
ª
§
±
´
¢
«
¬
µ
¯
±
ª
¶
«
¬
µ
¬
©
ª
¨
¯
·
¨
¬
¶
¸
·
¸
µ
«
¯
«
¹
¶
º
¨
»
·
¨
¶
¼
·
(c) yield
½
ª
¿
~
¼
ª
¨
¬
¶
¯
~
¼
½
ª
¨
¬
·
¬
©
lim
ª
¿
µ
50
~
¼
¾
±
ª
µ
«
¬
¶
¯
~
¼
¾
±
ª
µ
«
¬
·
¬
²
³
.
²
³
(b)
(c). Define
such that
and, since
À
Á
Í
Î
~
Å
Ï
Ç
Ð
È
Ì
Ï
Ð
Í
~
Ç
Å
Ú
Á
Â
~
Ã
Ñ
Ò
Ä
Ó
Ò
Å
Ô
Ê
Ë
Õ
Ç
Ç
Ì
, note that
È
Ö
Ç
È
Ú
Í
Ù
Ï
Ç
Ð
~
È
Å
Í
Î
Ç
È
Î
È
, and fix
. Then, ~
Æ
Ç
×
Ø
È
Â
É
Å
Ï
Ù
Á
Ç
È
Í
Ù
Î
,
Ã
Î
Ö
Ç
È
lim
È
Â
Ï
Ç
Ö
Ç
È
Ú
Ö
Í
Ç
È
Î
È
×
.
Ø
Ï
Ù
Ù
Ù
Thus, continuity and injectivity of ~ on
yield
Ì
Û
~
Ý
lim
Ç
Ö
Û
Ç
~
È
Þ
Î
~ ~
Ý
Ö
Û
Í
Ç
È
È
Â
Ç
Å
Þ
Û
Ï
Ù
Ù
Ù
Ü
Ç
~ ~
È
Å
Þ
Û
Í
Î
Ç
È
È
×
.
Ø
Þ
Ï
ß
Ü
By Theorem 3, in the relevant case where ~ is continuous on
and
, (c) is equivalent to (a). Moreover, in line with the
remarks in Subsection 4.1, condition (c) deals with ~ only. Hence,
Theorem 3 is an improvement of Theorem (7.2).
Next, given a function
with
and
metric spaces
(with distances and ), let us call "uniformly injective" in case: For
each
there is
such that
whenever
~
and
. If
is uniformly injective on , then
condition (c) trivially holds. Thus, by Theorem 3, it is enough for
condition (a) that ~ is uniformly injective on and continuous on
,
with
.
We close this section by noting that condition (c) becomes more
meaningful if
is translated into a sort of distance
between
and
. To this end, it is convenient to
replace Prohorov distance with some other equivalent metric. Let be
the set of real valued functions on such that, for all
,
Û
à
Ë
Ç
È
Â
É
à
â
Ë
á
Û
ã
è
Õ
å
æ
ç
ä
è
æ
æ
æ
è
Ý
Ý
Õ
õ
ë
ì
ë
é
ð
ò
ó
ì
í
î
ï
î
ð
ñ
ò
ï
î
ó
ñ
ñ
ô
ö
ê
÷
í
î
ð
ò
ó
ñ
ê
ô
ø
ù
ú
ø
ù
û
ü
î
û
ñ
ü
ÿ
ý
þ
î
î
ñ
ò
î
ñ
ñ
î
ò
ò
ñ
î
ò
ò
ñ
.
The so called bounded Lipschitz metric on
#
$
!
"
%
&
$
51
'
is defined as
'
,
ý
á
)
*
+
and it can be shown to satisfy
Corollary 4.3, p. 33). Thus, by using
can be equivalently written as:
,
-
,
(see, e.g., Huber, 1981,
instead of , condition (c)
.
(
/
0
1
8
2
3
9
4
(c) There is a set
:
;
<
=
>
3
(
5
6
7
D
0
such that
~
1
?
@
-
7
/
+
,
G
C
1
(
*
-
A
B
1
H
C
3
~
7
@
A
B
1
G
3
3
E
F
/
whenever
and
G
G
Q
Q
+
P
P
S
:
D
G
;
<
;
I
J
=
>
?
S
K
G
N
1
J
O
P
C
3
L
1
J
R
P
G
D
3
E
K
F
M
R
G
Q
Q
5. The stationary case
This section includes versions of Theorems 2 and 3 and a
convergence result for the case where is stationary. Some remarks
on Bayesian nonparametric inference for stationary data, or more
generally for data with invariant distribution, are also given. We begin
with a result asserting that, under general conditions, every invariant
p.m. is a unique integral mixture of extreme points.
Given a measurable space
and a class
of measurable
functions
, let be the set of those p.m.'s on
which are -invariant, i.e.,
for all
. Further, let
be the set of extreme points of , and let
be equipped with the
trace -field
. According to Section 2,
is generated by the maps
, from
into the
reals. We also recall that a p.m.
on
is perfect if, for each measurable function
, there is
such that
and
. Next Theorem 4, due to Maitra (1977),
unifies results of Bogoliouboff, de Finetti, Farrel, Kryloff and
Varadarayan.
4
1
W
X
Y
^
Z
_
[
\
Y
^
Z
a
_
7
T
[
b
3
U
V
c
`
b
]
d
e
e
`
j
a
g
g
h
i
h
i
`
o
p
n
m
_
n
o
g
h
i
`
o
p
m
_
o
l
n
q
r
q
s
s
o
b
t
f
_
g
h
i
_
|
}
~
x

€
~
}

{
u

x
‚
v
y
w
ƒ
52
`
_
n
e
{
i
`
n
g
h
k
`
n
f
_
f
z
y
o
Theorem 4 (Maitra)
Suppose is countable,
is countably generated and includes
the singletons, and every p.m. on is perfect. Then, for each
there is a unique p.m.
on
such that
for all
.
„
…
…
‡
†
‰
•
–
—
˜
™
Š
Ž
Ž
…

Š
Ž
’
“
Š
‘
Ž
Š
œ
”
“

Ž
ž
‘
Ÿ
œ

›
¡
œ
ž
ž
¢
In our case,
where
is Polish, so that
meets the conditions of Theorem 4 and the -field
reduces to
. If
is the class of all finite
permutations of
, then is the set of exchangeable p.m.'s and
the set of product p.m.'s, i.e.,
. Hence, the
equivalence between (i) and (ii) in Theorem 1 follows from Theorem 4,
after noting that and are connected by the relation
,
where
for
. In other terms, at least formally, de
Finetti's theorem can be embedded into a more general result on
invariant p.m.'s.
Since de Finetti's theorem is fundamental in Bayesian
nonparametric inference for exchangeable data, one could hope that,
taking Theorem 4 as a starting point, a relevant part of the usual theory
can be extended to the invariant case. In principle, this is possibly true.
However, moving from the exchangeable to the invariant case, the
problem becomes technically much more intricated. So, developing a
nonparametric theory for invariant data, analogous to the usual one for
exchangeable data, seems to be very hard. In particular, it looks hard
to get usable statistical procedures. On the other hand, it would be
interesting to investigate which part, if any, of the usual Bayesian
nonparametric theory can be extended to invariant data.
In the sequel, as a significant example, we discuss the stationary
case.
Let
be the shift
transformation:
. A p.m.
on
is stationary if
~ ~
and ~ ~
have the same distribution under , or
equivalently if
. Clearly,
is stationary in case is
exchangeable, but the converse is not true. If
is stationary and
for each Borel set with
, then is said to
be ergodic. When is stationary or ergodic, we will also say that ~ is
£
œ

£
ž
œ
›
¡
ž
¨


›
ˆ
‘
Œ
†
š
‡
Š
‹
ˆ
¤
¥
¦
¤
¥
¦
§
«
œ
ž
œ
›
©
¤
¥
¦
¢
ž
¨
ª
¨
£
¤
¨
¥
¦
Ÿ
¬
­
®
­
¯
°
±
¨
£
²
²
š
³
š
³
Ÿ
´
µ
¶
´
œ
­
ž
Ÿ
­
­
¯
°
£
²
®
¸
£
£
·
œ
¥

¥

¹
¹
¹
ž
Ÿ
œ
¥

¥

¹
¹
¹
ž
º
œ
ž
¿
·
»
¼
¼
À
½
¾
À
Â
Ã
Â
Ã
»
Ä
Ä
Ä
Å
Á
Â
Ã
¼
Â
Ã
¼
Ä
Ä
Ä
Å
½
»
Á
Æ
Á
Ç
È
Á
É
Á
»
Á
À
Ê
Ê
Ê
Æ
È
Ê
Á
É
Å
Ë
Ì
Í
Ã
Î
Ï
Á
Â
53
Ð
Ñ
stationary or ergodic under . Let be the set of ergodic p.m.'s and
the set of all p.m.'s on
; cf. Section 2. By relying on an
argument of Maitra (1977), we now prove that is a Borel subset of
.
Ò
Ø
Ó
Ô
Õ
Ö
×
Ñ
Ò
Ö
Lemma 5
If is a separable metric space, then
Ô
Proof
Let:
Ú
Ñ
Ù
Ø
Ó
Ò
Õ
.
Ö
Û
of
stationary
p.m.'s,
all
,
and
for all
By Lemma 4
of Maitra (1977), the -field is sufficient (in the classical sense) for
. Since
is countably generated, sufficiency of
implies
existence of a sufficient and countably generated -field
such that
; cf. Burkholder (1961, Theorem 1). Fix
countable fields
and
such that
and
, and define
à
Ü
Ý
Þ
,
ß
the
set
for
á
î
ã
ï
ä
å
æ
ç
è
é
ê
ë
ç
å
é
ð
ã
ì
ô
õ
æ
ö
÷
ø
ù
ú
û
÷
í
ò
ó
â
ë
ñ
õ
÷
ü
õ
ù
ù
ó
ý
û
ö
þ
ÿ
ï
ò
ð
ï
÷
ø
ù
ï
ï
ï
ï
and
Since
for all
,
for all
!
"
.
"
and
are countable,
is Borel, and since
, one has
. Let
. Since is degenerate
on the -class
, then
is also degenerate on
.
Since
, it follows that
. Hence,
, while it is clear that
. To sum up,
.
$
#
&
&
'
'
%
$
&
&
'
'
(
$
)
(
(
(
*
+
Setting
, Theorem 4 applies to stationary p.m.'s, and the
set of extreme points of stationary p.m.'s coincides with . Hence,
each stationary admits the representation
for
some unique p.m. on
. Furthermore, just as in the exchangeable
case,
is the probability distribution of ~, for some Borel function
~
~
~ for all
such that ~
is a version of
.
"
(
/
,
0
-
,
0
(
0
1
1
1
1
2
(
54
3
4
.
-
To realize the program sketched above, i.e., to develop a Bayesian
nonparametric theory for stationary data, one has to assess priors on
. Precisely, one should "propose" some reasonable class of priors
on
, and calculate the corresponding posterior and predictive
distributions. Such priors should have large support, so as to obtain a
real nonparametric theory. Further, they should cover a broad range of
potential beliefs, and the posterior and predictive distributions should
be not too difficult to evaluate. Clearly, it is not easy to put together all
these requisites.
As a preliminary step, we investigate, for stationary data, the same
problem of Section 4, i.e., existence of underlying parametric models.
A different kind of empirical measure is to be used. Given
, define:
5
6
7
:
;
8
5
@
9
<
=
>
?
~
A
C
G
H
~ , ,~
B
B
B
;
for
C
D
D
N
R
I
D
E
F
@
,
J
P
U
V
W
K
:
O
P
for
~
G
N
Q
X
N
R
P
S
T
L
.
M
Q
g
For fixed , and
,
is a p.m. on
. To obtain a
p.m. on
, we fix any
and we refer to
instead
of
. (For each p.m.
on
,
denotes the p.m. on
~
under which ~ has distribution , ~
has
~
~
~
distribution , and
is independent of
). Clearly, this
is only a rough device to transform
into a p.m. on
, and
will not play any essential role. Next, call
the integer part of
and define
a
Y
Z
[
\
b
]
^
_
e
f
[
`
_
c
d
]
`
b
g
a
_
]
`
j
\
i
^
h
j
k
e
l
x
m
n
o
s
t
q
x
y
r
z
f
b
p
u
v
v
w
s
~
…

y
{
|
€

‚
{
|
ƒ

‚
}
}
}
~
„
v
…
y
{
€
|

‚
ƒ
{
|

‚
}
}
}
~
„
w
†
x

y
z
~

w
‡
ˆ
‰
Š
‹
Œ

Ž


Ž
•
–
—
Ž


‘
’
“
”
.
˜
Thus, each
is a Borel function, and it is a summary of
~
~
the data
, , . When
is stationary, we will use the
as
empirical measures. One reason is the following.
š

›
™
œ
œ

Ž
ž
Ÿ
Ÿ
Ÿ


55
Theorem 6
If is a Polish space and
¡
¢
£
¤
¡
¥
a stationary p.m. on
§
~
¨
©
ª
, then
¦
¢
-a.s.
¨
where "
" stands for weak convergence of p.m.'s.
Proof
Conditionally on ~, ~ is ergodic with distribution ~. Hence, it is
enough to show that, if
is ergodic then
, -a.s.. Suppose
that is ergodic, and fix
. Let
be the canonical projection of
onto
. Since the sequence of
-valued random variables
~
~ , ,~
is ergodic under , one has
-a.s. as
. Further, for all
and
, a direct calculation shows that
©
«
©
¢
§
¨
¢
¢
ª
¢
¬
°
¡
±
­
®
¯
°
²
°
¡
±
²
¡
¦
¤
³
­
µ
®
¥
·
¤
¤
«
´
´
´
«
±
°
¥
­
¶
µ
ª
ª
®
¥
¢
¶
ª
²
§
°
¨
¢
¸
°
¯
¢
­
£
¤
±
¡
²
¥
° º
¶
ª
¬
¶
»
¼
½
¹
¾
À
Á
¿
²
§
¸
°
¯
¤
°
¥
·
§
°
¤
°
¡
¥
º
·
§
°
±
°
°
¤
¥
.
º
Ã
½
½
ª
ª
Â
½
¹
ª
¹
º
Ã
²
§
Thus,
the proof.
¸
²
¯
¨
°
Ã
¢
¸
¯
¢
º
°
,
º
ª
-a.s. as
¶
»
, and this concludes
¼
Ä
§
Å
Since
(and not ) is now used as empirical measure, the notion
of statistic is to be slightly modified, too. Let
ª
ª
Æ
Ç
È
É
Ê
Ë
Ì
Í
Ë
Î
Ï
Ó
Ð
Ñ
Î
Ò
Ô
Õ
É
Î
Ù
Ê
Ì
be the union of the ranges of all the , and let
be such
that
. In what follows, in line with the notion adopted so far, a
statistic is meant as the restriction to , ~ , of any Borel function
~
.
At this stage, the argument proceeds essentially as in Sections 3
and 4. The first part of Section 3 remains unaffected, apart from
"exchangeable" is to be replaced by "stationary", "i.i.d." by "ergodic",
by , and ~ by ~. In particular, given a stationary
and a Borel
×
×
Ó
Ø
Ö
Æ
Ö
×
Ø
Ö
Ú
Ü
Æ
Û
Û
Æ
Í
×
Ø
ß
Ö
à
Ý
Þ
á
â
ã
56
~
function
(a) simply becomes
ð
, where
ï
ä
å
ê
ë
é
î
(a ) ~
õ
and
ö
ò
ñ
û
ö
ó
ý
þ
ÿ
û
ü
~~
æ
ô
ô
ò
ø
ø
ç
å
è
î
î
~,
÷
ó
and
ï
å
ù
ò
, condition
ï
å
ì
í
î
-a.s., for some Borel function
ú
.
A slight difference between (a ) and (a), suggested by Theorem 2, is
that is now asked to be a Borel function, and not merely a
measurable function. In any case, if (a ) holds then, conditionally on
~ ~ , ~ is ergodic with distribution ~. So, under (a ), the "original
random parameter" ~ can be reduced through ~, i.e., the random
~ ~~
parameter can be taken to be
. Next, conditions (b) and (c)
turn into:
ó
ô
þ
ø
÷
ô
ò
ø
ò
÷
ò
÷
ù
ô
ò
ø
(b ) ~ is injective on
÷
, for some
ñ
õ
,
õ
ô
ø
ñ
ø
ü
ô
ø
ù
;
(c ) There is a set
ô
û
such that
ö
~
÷
ô
ú
ô
ø
ù
~
÷
ô
ø
such that
ô
ø
ø
whenever
and
õ
ô
ô
ø
ô
ø
ø
;
where is now Prohorov metric on
As in Section 4, condition
(c ) is perhaps more expressive if
is replaced by some other
equivalent metric, like the bounded Lipschitz metric on
.
Finally, the arguments for proving Theorems 2 and 3 do not
depend on exchangeability of , and can be repeated for a stationary
. In fact, up to a proper choice of the empirical measure (and thus up
to Theorem 6), the results in this section hold for any -invariant p.m.
on
, with countable. We state the stationary versions of
Theorems 2 and 3 jointly, and we omit proofs.
ö
û
ö
û
ú
ú
ú
ô
ø
57
Theorem 7
Let
~
&
!
"
#
$
be Polish spaces,
a stationary p.m. on
, and
a Borel function. Then, (a ) is equivalent to (b ).
~
Moreover, if is continuous on
and
, then conditions
(a ), (b ) and (c ) are equivalent.
'
(
%
)
*
,
,
+
#
$
-
.
&
(
/
*
,
,
+
(
*
+
,
Acknowledgements: Research partially supported by: M.U.R.S.T. 40% "Processi
Stocastici, Calcolo Stocastico ed Applicazioni".
References
Burkholder D.L. (1961) Sufficiency in the undominated case,
Annals of Mathematical Statistics, 32, 1191-1200.
Fortini S., Ladelli L., Regazzini E. (2000) Exchangeability,
predictive distributions
and parametric models, Sankhya, 62, Series A, 86-109.
Huber P.J. (1981) Robust statistics, J.Wiley, New York.
Maitra A. (1977) Integral representations of invariant measures,
Transactions of the American Mathematical Society, 229, 209-225.
Olshen R. (1974) A note on exchangeable sequences, Zeitschrift
fur Wahrscheinlichkeitstheorie und Verwandte Gebiete, 28, 317-321.
58