
EXCHANGEABLE LOWER PREVISIONS
GERT DE COOMAN, ERIK QUAEGHEBEUR, AND ENRIQUE MIRANDA
ABSTRACT. We extend de Finetti’s (1937) notion of exchangeability to finite and countable
sequences of variables, when a subject’s beliefs about them are modelled using coherent
lower previsions rather than (linear) previsions. We derive representation theorems in
both the finite and the countable case, in terms of sampling without and with replacement,
respectively.
1. INTRODUCTION
This paper deals with belief models for both finite and countable sequences of exchangeable random variables taking a finite number of values. When such sequences of random
variables are assumed to be exchangeable, this more or less means that the specific order in
which they are observed is deemed irrelevant.
The first detailed study of exchangeability was made by de Finetti (1937) (with the
terminology of ‘equivalent’ events). He proved the now famous Representation Theorem,
which is often interpreted as stating that a sequence of random variables is exchangeable if
it is conditionally independent and identically distributed (IID). Other important work on
exchangeability was done by, amongst many others, Hewitt and Savage (1955), Heath and
Sudderth (1976), Diaconis and Freedman (1980) and, in the context of the behavioural theory
of imprecise probabilities that we are going to consider here, by Walley (1991). We refer to
Kallenberg (2002, 2005) for modern, measure-theoretic discussions of exchangeability.
One of the reasons why exchangeability is deemed important, especially by Bayesians,
is that, by virtue of de Finetti’s Representation Theorem, an exchangeable model can be
seen as a convex mixture of multinomial models. This has given some ground (de Finetti,
1937, 1974-5; Dawid, 1985) to the claim that aleatory probabilities and IID processes can
be eliminated from statistics, and that we can restrict ourselves to exchangeable sequences
instead (see Walley (1991, Section 9.5.6) for a critical discussion of this claim).
De Finetti presented his study of exchangeability in terms of the behavioural notion of
previsions, or fair prices. The central assumption underlying his approach is that a subject
should be able to specify a fair price P( f ) for any risky transaction (which we will call a
gamble) f (de Finetti, 1974-5, Chapter 3). This may not always be realistic, and for this
reason, it has been suggested that we should explicitly allow for a subject’s indecision, by
distinguishing between his lower prevision $\underline{P}(f)$, which is the supremum price for which
he is willing to buy the gamble f, and his upper prevision $\overline{P}(f)$, which is the infimum
price for which he is willing to sell f. For any real number r strictly between $\underline{P}(f)$ and
$\overline{P}(f)$, the subject is then not specifying a choice between selling or buying the gamble f
for r. Such lower and upper previsions are also subject to certain rationality or coherence
criteria, in very much the same way as (precise) previsions are on de Finetti’s account.
Key words and phrases. Exchangeability, lower prevision, Representation Theorem, Bernstein polynomials, convergence in distribution, sampling without replacement, multinomial sampling, imprecise probability,
coherence.
The resulting theory of coherent lower previsions, brilliantly defended by Walley (1991),
generalises de Finetti’s behavioural treatment of subjective, epistemic probability, and is
briefly overviewed in Section 2.
Also in this theory, it is interesting to consider the consequences of a subject’s
exchangeability assessment, i.e., that the order in which we consider a number of random
variables is of no consequence. This is our motivation for studying exchangeable lower
previsions in this paper. An assessment of exchangeability will have a clear impact on the
structure of so-called exchangeable coherent lower previsions. We will show they can be
written as a combination of (i) a coherent (linear) prevision expressing that permutations
of realisations of such sequences are considered equally likely, and (ii) a coherent lower
prevision for the ‘frequency’ of occurrence of the different values the random variables can
take. Of course, this is the essence of representation in de Finetti’s sense: we generalise his
results to coherent lower previsions.
Before we go on, we want to draw attention to a number of distinctive features of our
approach. First of all, the usual proofs of the Representation Theorem, such as the ones
given by de Finetti (1937), Heath and Sudderth (1976), or Kallenberg (2005), do not lend
themselves very easily to a generalisation in terms of coherent lower previsions. In principle
it would be possible, at least in some cases, to start with the versions already known for
(precise) previsions, and to derive their counterparts for lower previsions using so-called
lower envelope theorems (see Section 2 for more details). This is the method that Walley
(1991, Sections 9.5.3 and 9.5.4) suggests. But we have decided to follow a different route:
we derive our results directly for lower previsions, using an approach based on Bernstein
polynomials, and we obtain the ones for previsions as special cases. We believe this method
to be more elegant and self-contained, and it certainly has the additional benefit of drawing
attention to what we feel is the essence of de Finetti’s Representation Theorem: specifying a
coherent belief model for a countable exchangeable sequence is tantamount to specifying a
coherent (lower) prevision on the linear space of polynomials on some simplex, and nothing
more.
Secondly, we will focus on, and use the language of, (lower and upper) previsions
for gambles, rather than (lower and upper) probabilities for events: as we shall see, in
the behavioural theory of imprecise probabilities, the language of gambles is much more
expressive than that of events, and we need its full expressive power to derive our results.
The plan of the paper is as follows. In Section 2, we introduce a number of results from
the theory of coherent lower previsions necessary to understand the rest of the paper. In
Section 3, we define exchangeability for finite sequences of random variables, and establish
a representation of coherent exchangeable lower previsions in terms of sampling without
replacement. In Section 4, we extend the notion of exchangeability to countable sequences
of random variables, and in Section 5 we generalise de Finetti’s Representation Theorem (in
terms of multinomial sampling) to exchangeable coherent lower previsions. In an appendix,
we have gathered a few useful results about Bernstein polynomials.
2. LOWER PREVISIONS, RANDOM VARIABLES AND THEIR DISTRIBUTIONS
In this section, we want to provide a brief summary of ideas and results from the theory
of coherent lower previsions (Walley, 1991).
2.1. Epistemic uncertainty models. Consider a random variable X that may assume
values x in some non-empty set X . By ‘random’, we mean that a subject is uncertain about
the actual value of the variable X, i.e., does not know what this actual value is.
Our subject may entertain certain beliefs about the value of X. We are going to try and
model his beliefs mathematically using the concept of a gamble on X , which is a bounded
map f from X to the set R of real numbers. Let us denote by L (X ) the set of all gambles
on X .
De Finetti (1974-5) has proposed to model a subject’s beliefs by eliciting his fair price,
or prevision, P( f ) for certain gambles f . This P( f ) can be defined as the unique real
number p such that the subject is willing to buy the gamble f for all prices s (i.e., accept
the gamble f − s) and sell f for all prices t (i.e., accept the gamble t − f ) for all s < p < t.
The problem with this approach is that it presupposes that there is such a real number, or, in
other words, that the subject, whatever his beliefs about X are, is willing, for (almost) every
real r, to make a choice between buying f for the price r, or selling it for that price.
2.2. Coherent lower previsions and natural extension. A way to address this problem
is to consider a model that allows our subject to be undecided for some prices r. This is
done in Walley’s (1991) theory of lower and upper previsions. The lower prevision of the
gamble f, $\underline{P}(f)$, is our subject’s supremum acceptable buying price for f; similarly, our
subject’s upper prevision, $\overline{P}(f)$, is his infimum acceptable selling price for f. Hence, he is
willing to buy the gamble f for all prices $s < \underline{P}(f)$ and sell f for all prices $t > \overline{P}(f)$, but he
may be undecided for prices $\underline{P}(f) \le p \le \overline{P}(f)$.
Since buying the gamble f for a price s is the same as selling the gamble −f for the
price −s, the lower and upper previsions are conjugate functions: $\overline{P}(f) = -\underline{P}(-f)$ for any
gamble f . This allows us to concentrate on one of them, since we can immediately derive
results for the other. In this paper, we focus mainly on lower previsions.
For lower previsions, the most important rationality criterion is that of coherence. If a
lower prevision P is defined on a linear space of gambles K , then it turns out to be coherent
if and only if it satisfies the following conditions: for any gambles f and g in K and any
non-negative real number λ , it should hold that:
(P1) $\underline{P}(f) \ge \inf f$ [accepting sure gains];
(P2) $\underline{P}(\lambda f) = \lambda \underline{P}(f)$ [non-negative homogeneity];
(P3) $\underline{P}(f + g) \ge \underline{P}(f) + \underline{P}(g)$ [super-additivity].
The following special properties hold for a coherent lower prevision whenever the gambles
involved belong to its domain:
(i) $\underline{P}$ is monotone: if f ≤ g, then $\underline{P}(f) \le \underline{P}(g)$.
(ii) $\inf f \le \underline{P}(f) \le \overline{P}(f) \le \sup f$.
Moreover, coherent lower and upper previsions are continuous with respect to uniform
convergence of gambles.
2.3. Linear previsions. If the lower prevision $\underline{P}(f)$ and the upper prevision $\overline{P}(f)$ for a
gamble f happen to coincide, then the value $P(f) = \underline{P}(f) = \overline{P}(f)$ is called the subject’s
(precise) prevision for f . Previsions are fair prices in de Finetti’s (1974-5) sense. We shall
call them precise probability models, and lower previsions will be called imprecise.
A prevision on the set L (X ) of all gambles is linear if and only if it is a positive
( f ≥ 0 ⇒ P( f ) ≥ 0) and normed (P(1) = 1) real linear functional. A prevision on a general
domain is linear if and only if it can be extended to a linear prevision on all gambles. We
shall denote by P(X ) the set of all linear previsions on L (X ).
There is an interesting link between precise and imprecise probability models, expressed
through the so-called lower envelope theorem: A lower prevision P on some domain K
is coherent if and only if it is the lower envelope of some set of linear previsions, and in
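particular of the convex set $\mathcal{M}(\underline{P})$ of all linear previsions that dominate it: for all f in $\mathcal{K}$,
$\underline{P}(f) = \inf\{P(f) : P \in \mathcal{M}(\underline{P})\}$, where $\mathcal{M}(\underline{P}) := \{P \in \mathbb{P}(\mathcal{X}) : (\forall f \in \mathcal{K})(P(f) \ge \underline{P}(f))\}$.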
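To make the lower envelope theorem concrete, the following Python sketch computes lower and upper previsions as envelopes over a small finite set of probability mass functions. The three-element space, the mass functions and the gambles are hypothetical choices made purely for illustration; they are not taken from the theory above.

```python
# Hypothetical possibility space and a small set of probability mass
# functions (each one defines a linear prevision, i.e. an expectation).
X = ["a", "b", "c"]
credal_set = [
    {"a": 0.5, "b": 0.3, "c": 0.2},
    {"a": 0.2, "b": 0.5, "c": 0.3},
    {"a": 0.3, "b": 0.2, "c": 0.5},
]

def linear_prevision(pmf, f):
    """Expectation of the gamble f (a dict x -> payoff) under pmf."""
    return sum(pmf[x] * f[x] for x in X)

def lower_prevision(f):
    """Lower envelope: smallest expectation over the set of pmfs."""
    return min(linear_prevision(p, f) for p in credal_set)

def upper_prevision(f):
    """Conjugate upper prevision, computed as -lower(-f)."""
    return -lower_prevision({x: -f[x] for x in X})

# Two arbitrary gambles and their sum, used to illustrate (P1) and (P3).
f = {"a": 1.0, "b": 0.0, "c": -1.0}
g = {"a": 0.0, "b": 2.0, "c": 1.0}
f_plus_g = {x: f[x] + g[x] for x in X}

print(lower_prevision(f), upper_prevision(f))
print(lower_prevision(f_plus_g) >= lower_prevision(f) + lower_prevision(g))
```

With these numbers the lower and upper previsions of f are approximately −0.2 and 0.3, so any price strictly between them leaves the subject undecided, and the super-additivity check in the last line holds.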
2.4. The distribution of a random variable. We call a subject’s coherent lower prevision
P on L (X ), modelling his beliefs about the value that a random variable X assumes in the
set X , his distribution for that random variable.
Now consider another set Y , and a map ϕ from X to Y , then we can consider Y := ϕ(X)
as a random variable assuming values in Y . With a gamble h on Y , there corresponds
a gamble h ◦ ϕ on X , whose lower prevision is P(h ◦ ϕ). This leads us to define the
distribution of Y = ϕ(X) as the induced coherent lower prevision Q on L (Y ), defined by
$$\underline{Q}(h) := \underline{P}(h \circ \varphi), \quad h \in \mathcal{L}(\mathcal{Y}).$$
This notion generalises that of an induced probability measure.
Finally, consider a sequence of random variables Xn , all taking values in some metric
space S. Denote by C (S) the set of all continuous gambles on S. For each random
variable Xn , we have a distribution in the form of a coherent lower prevision PXn on L (S).
Then we say that the random variables converge in distribution if for all h ∈ C (S), the
sequence of real numbers PXn (h) converges to some real number, which we denote by P(h).
The limit lower prevision P on C (S) that we can define in this way, is coherent, because a
point-wise limit of coherent lower previsions always is.
3. EXCHANGEABLE RANDOM VARIABLES
We are now ready to recall Walley’s (1991, Section 9.5) notion of exchangeability in
the context of the theory of coherent lower previsions. We shall see that it generalises de
Finetti’s definition for linear previsions (de Finetti, 1937, 1974-5).
3.1. Definition and basic properties. Consider N ≥ 1 random variables X1 , . . . , XN taking
values in a non-empty and finite set X . A subject’s beliefs about the values that these random variables X = (X1 , . . . , XN ) assume jointly in X N are given by their (joint) distribution,
which is a coherent lower prevision PN defined on the set L (X N ).
Let us denote by PN the set of all permutations of {1, . . . , N}. With any such permutation π we can associate, by the procedure of lifting, a permutation of X N , also denoted
by π, that maps any x = (x1 , . . . , xN ) in X N to πx := (xπ(1) , . . . , xπ(N) ). Similarly, with any
gamble f on X N , we can consider the permuted gamble π f := f ◦ π.
A subject judges the random variables X1 , . . . , XN to be exchangeable when he is disposed
to exchange any gamble f for the permuted gamble π f , meaning that PN (π f − f ) ≥ 0, for
any permutation π. Taking into account the properties of coherence, this means that
PN (π f − f ) = PN ( f − π f ) = 0
for all gambles f on X N and all permutations π in PN . In this case, we also call the
joint coherent lower prevision PN exchangeable. A subject will make an assumption of
exchangeability when there is evidence that the processes generating the values of the
random variables are (physically) similar (Walley, 1991, Section 9.5.2), and consequently
the order in which the variables are observed is not important.
When $\underline{P}^N$ is in particular a linear prevision $P^N$, exchangeability is equivalent to having
$P^N(\pi f) = P^N(f)$ for all gambles f and all permutations π. The following proposition,
mentioned by Walley (1991, Section 9.5), and whose proof is immediate and therefore
omitted, establishes an even stronger link between Walley’s and de Finetti’s notions of
exchangeability.
Proposition 1. A coherent lower prevision $\underline{P}^N$ is exchangeable if and only if all the linear
previsions $P^N$ in $\mathcal{M}(\underline{P}^N)$ are exchangeable.
Clearly, if X1 , . . . , XN are exchangeable, then any permutation Xπ(1) , . . . , Xπ(N) is
exchangeable as well, and has the same distribution PN . Moreover, any selection of
1 ≤ n ≤ N random variables from amongst the X1 , . . . , XN are exchangeable too, and their
distribution is given by $\underline{P}^n$, which is the $\mathcal{X}^n$-marginal of $\underline{P}^N$, given by $\underline{P}^n(f) := \underline{P}^N(\tilde{f})$ for
all gambles f on $\mathcal{X}^n$, where the gamble $\tilde{f}$ on $\mathcal{X}^N$ is the cylindrical extension of f to $\mathcal{X}^N$,
given by $\tilde{f}(z_1, \dots, z_N) := f(z_1, \dots, z_n)$ for all $(z_1, \dots, z_N)$ in $\mathcal{X}^N$.
3.2. Count vectors. Interestingly, exchangeable coherent lower previsions have a very
simple representation, in terms of sampling without replacement. To see how this comes
about, consider any x ∈ X N . Then the so-called (permutation) invariant atom
[x] := {πx : π ∈ PN }
is the smallest non-empty subset of X N that contains x and that is invariant under all
permutations π in PN . We shall denote the set of permutation invariant atoms of X N
by A N . It constitutes a partition of the set X N . We can characterise these invariant atoms
using the counting maps TxN : X N → N0 defined for all x in X in such a way that
TxN (z) = TxN (z1 , . . . , zN ) := |{k ∈ {1, . . . , N} : zk = x}|
is the number of components of the N-tuple z that assume the value x. Here |A| denotes the
number of elements in a finite set A, and N0 is the set of all non-negative integers (including
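zero). We shall denote by $T^N$ the vector-valued map from $\mathcal{X}^N$ to $\mathbb{N}_0^{\mathcal{X}}$ whose component
maps are the $T_x^N$, $x \in \mathcal{X}$. $T^N$ actually assumes values in the set of count vectors
$$\mathcal{N}^N := \Big\{ m \in \mathbb{N}_0^{\mathcal{X}} : \sum_{x \in \mathcal{X}} m_x = N \Big\}.$$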
The counting map TN can be interpreted as a bijection (one-to-one and onto) between the set
of invariant atoms A N and the set of count vectors N N , and we can identify any invariant
atom [z] by the count vector m = TN (z) of any (and therefore all) of its elements. We
therefore also denote this atom by [m]; and clearly y ∈ [m] if and only if TN (y) = m. The
number of elements ν(m) in any invariant atom [m] is given by
$$\nu(m) := \binom{N}{m} = \frac{N!}{\prod_{x \in \mathcal{X}} m_x!}.$$
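The correspondence between invariant atoms and count vectors can be checked by brute force on a small case. The sketch below enumerates all sequences, groups them by count vector, and compares the size of each group with the multinomial coefficient ν(m). The three-element value set and the length N = 4 are hypothetical choices for illustration.

```python
from collections import Counter
from itertools import product
from math import factorial, prod

X = ("a", "b", "c")   # hypothetical finite set of values
N = 4                 # hypothetical sequence length

def count_vector(z):
    """T^N(z): the counts of each value of X in z, in the order of X."""
    c = Counter(z)
    return tuple(c[x] for x in X)

def nu(m):
    """Size of the invariant atom [m]: N! / prod_x m_x!."""
    return factorial(sum(m)) // prod(factorial(k) for k in m)

# Group every sequence in X^N by its count vector; each group is one atom.
atoms = {}
for z in product(X, repeat=N):
    atoms.setdefault(count_vector(z), []).append(z)

for m, zs in sorted(atoms.items()):
    print(m, len(zs), nu(m))
```

In this example there are 15 atoms, each of size ν(m), and together they partition the 3^4 = 81 sequences.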
If the joint random variable X = (X1 , . . . , XN ) assumes the value z in X N , then the
corresponding count vector assumes the value TN (z) in N N . This means that we can see
TN (X) = TN (X1 , . . . , XN ) as a random variable in N N . If the available information about
the values that X assumes in X N is given by the coherent exchangeable lower prevision PN
– the distribution of X –, then the corresponding uncertainty model for the values that TN (X)
assumes in N N is given by the coherent induced lower prevision QN on L (N N ) – the
distribution of TN (X) –, given by
$$\underline{Q}^N(h) := \underline{P}^N(h \circ T^N) = \underline{P}^N\Big(\sum_{m \in \mathcal{N}^N} h(m) I_{[m]}\Big) \tag{1}$$
for all gambles h on $\mathcal{N}^N$. We now prove a theorem that shows that, conversely, any
exchangeable coherent lower prevision $\underline{P}^N$ is in fact completely determined by the corresponding distribution $\underline{Q}^N$ of the count vectors, also called its count distribution.
Consider an urn with N balls of different types, where the different types are characterised
by the elements x of the set X . Suppose the composition of the urn is given by the count
vector m ∈ N N , meaning that mx balls are of type x, for x ∈ X . We are now going to
successively select (in a random way) N balls from the urn, without replacing them. It
follows that for any gamble f on $\mathcal{X}^N$, its (precise) prevision (or expectation) is given by
$$\mathrm{MuHy}^N(f|m) := \frac{1}{\nu(m)} \sum_{z \in [m]} f(z).$$
The linear prevision MuHyN (·|m) is the one associated with a multiple hyper-geometric
distribution (Johnson et al., 1997, Chapter 39), whence the notation. For any permutation π
of {1, . . . , N},
$$\mathrm{MuHy}^N(\pi f|m) = \frac{1}{\nu(m)} \sum_{z \in [m]} f(\pi z) = \frac{1}{\nu(m)} \sum_{\pi^{-1} z \in [m]} f(z) = \mathrm{MuHy}^N(f|m),$$
since π −1 z ∈ [m] if and only if z ∈ [m]. This means that the linear prevision MuHyN (·|m) is
exchangeable. The following theorem establishes an even stronger result. It is an immediate
consequence of a much more general representation result (De Cooman and Miranda, 2007,
Theorem 30).
Theorem 2 (Representation theorem for finite sequences of exchangeable variables). Let
N ≥ 1. A coherent lower prevision PN on L (X N ) is exchangeable if and only if there is
some coherent lower prevision Q on L (N N ) such that PN ( f ) = Q(MuHyN ( f |·)) for all
gambles f on X N . If a coherent lower prevision PN on L (X N ) is exchangeable, then the
corresponding Q is given by Equation (1).
This theorem implies that any collection of N exchangeable random variables in X can
be seen as the result of N random draws without replacement from an urn with N balls
whose types are characterised by the elements x of X , whose composition m is unknown,
but for which the available information about the composition is modelled by a coherent
lower prevision on L (N N ).1
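The urn interpretation of Theorem 2 can be sketched numerically. In the Python example below, a lower prevision on count vectors is built as the lower envelope of two hypothetical mass functions on urn compositions, and the joint lower prevision is then obtained by composing it with MuHy^N(f|·). The binary value set, N = 3, the two mass functions and the gamble f are all assumptions made for illustration.

```python
from collections import Counter
from itertools import permutations, product

X = (0, 1)   # hypothetical binary value set
N = 3        # hypothetical number of variables

def count_vector(z):
    c = Counter(z)
    return tuple(c[x] for x in X)

sequences = list(product(X, repeat=N))
count_vectors = sorted({count_vector(z) for z in sequences})

def muhy(f, m):
    """MuHy^N(f|m): average of f over the invariant atom [m]."""
    atom = [z for z in sequences if count_vector(z) == m]
    return sum(f(z) for z in atom) / len(atom)

# Hypothetical beliefs about the urn composition: two mass functions on
# the four count vectors, whose lower envelope plays the role of Q.
Q_extremes = [
    dict(zip(count_vectors, (0.1, 0.2, 0.3, 0.4))),
    dict(zip(count_vectors, (0.4, 0.3, 0.2, 0.1))),
]

def Q_lower(h):
    return min(sum(q[m] * h(m) for m in count_vectors) for q in Q_extremes)

def P_lower(f):
    """Joint lower prevision built as in Theorem 2: Q(MuHy^N(f|.))."""
    return Q_lower(lambda m: muhy(f, m))

def permuted(f, pi):
    """The permuted gamble pi f = f o pi."""
    return lambda z: f(tuple(z[i] for i in pi))

f = lambda z: z[0] - 2.0 * z[1] * z[2]   # an arbitrary gamble on X^N

# Exchangeability: the lower prevision of pi f - f vanishes for every pi.
for pi in permutations(range(N)):
    print(pi, P_lower(lambda z, pi=pi: permuted(f, pi)(z) - f(z)))
```

Every printed value is zero, which is the exchangeability condition for the constructed joint model; applying P_lower to a gamble of the form h ∘ T^N returns Q_lower(h), in line with Equation (1).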
That exchangeable linear previsions can be interpreted in terms of sampling without replacement from an urn with unknown composition, is of course well-known, and essentially
goes back to de Finetti’s work on exchangeability; see de Finetti (1937), and Cifarelli and
Regazzini (1996). Heath and Sudderth (1976) give a simple proof for variables that may
assume two values. But we believe our proof for the much more general representation
result (De Cooman and Miranda, 2007, Theorem 30) to be conceptually even simpler than
Heath and Sudderth’s proof.
4. EXCHANGEABLE SEQUENCES
4.1. Definitions. Consider a countable sequence X1 , . . . , Xn , . . . of random variables taking
values in the same non-empty set X . This sequence is called exchangeable if any finite
collection of random variables taken from this sequence is exchangeable.
We can also consider the exchangeable sequence as a single random variable X assuming
values in the set X N , where N is the set of the natural numbers (positive integers, without
zero). Its possible values x are sequences x1 , . . . , xn , . . . of elements of X , or in other
words, maps from N to X . We can model the available information about the value that X
assumes in X N by a coherent lower prevision PN on L (X N ), called the distribution of
the exchangeable random sequence X.
1Walley (1991, Chapter 9) also mentions this result for exchangeable coherent lower previsions.
The random sequence X, or its distribution PN , is clearly exchangeable if and only if
all its X n -marginals Pn are exchangeable for n ≥ 1. These marginals Pn on L (X n ) are
defined as follows: for any gamble f on $\mathcal{X}^n$, $\underline{P}^n(f) := \underline{P}^{\mathbb{N}}(\tilde{f})$, where $\tilde{f}$ is the cylindrical
extension of f to $\mathcal{X}^{\mathbb{N}}$, defined by $\tilde{f}(x) := f(x_1, \dots, x_n)$ for all $x = (x_1, \dots, x_n, x_{n+1}, \dots)$
in X N . In addition, the family of exchangeable coherent lower previsions Pn , n ≥ 1,
satisfies the following ‘time consistency’ requirement:
$$\underline{P}^n(f) = \underline{P}^{n+k}(\tilde{f}), \tag{2}$$
for all n ≥ 1, k ≥ 0, and all gambles f on $\mathcal{X}^n$, where now $\tilde{f}$ denotes the cylindrical extension
of f to $\mathcal{X}^{n+k}$: $\underline{P}^n$ should be the $\mathcal{X}^n$-marginal of any $\underline{P}^{n+k}$.
It follows at once that any finite collection of n ≥ 1 random variables taken from such an
exchangeable sequence has the same distribution as the first n variables $X_1, \dots, X_n$, which
is the exchangeable coherent lower prevision $\underline{P}^n$ on $\mathcal{L}(\mathcal{X}^n)$.
Conversely, suppose we have a collection of exchangeable coherent lower previsions $\underline{P}^n$
on $\mathcal{L}(\mathcal{X}^n)$, n ≥ 1, that satisfy the time consistency requirement (2). Then any coherent
lower prevision $\underline{P}^{\mathbb{N}}$ on $\mathcal{L}(\mathcal{X}^{\mathbb{N}})$ that has $\mathcal{X}^n$-marginals $\underline{P}^n$ is exchangeable. The smallest,
or most conservative, such (exchangeable) coherent lower prevision is given by
$$\underline{E}^{\mathbb{N}}(f) := \sup_{n \in \mathbb{N}} \underline{P}^n(\underline{\mathrm{proj}}_n(f)) = \lim_{n \to \infty} \underline{P}^n(\underline{\mathrm{proj}}_n(f)),$$
where f is any gamble on $\mathcal{X}^{\mathbb{N}}$, and its lower projection $\underline{\mathrm{proj}}_n(f)$ on $\mathcal{X}^n$ is the gamble
on $\mathcal{X}^n$ that is defined by $\underline{\mathrm{proj}}_n(f)(x) := \inf_{z \in \mathcal{X}^{\mathbb{N}} : z_k = x_k, k = 1, \dots, n} f(z)$ for all $x \in \mathcal{X}^n$. See De
Cooman and Miranda (2008, Section 5) for more details.
4.2. Time consistency of the count distributions. It is of crucial interest for what follows
to find out what are the consequences of the time consistency requirement (2) on the
marginals Pn for the corresponding family Qn , n ≥ 1, of distributions of the count vectors
Tn (X1 , . . . , Xn ). Consider therefore n ≥ 1, k ≥ 0 and any gamble h on N n . Let f := h ◦ Tn ,
then
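$$\underline{Q}^n(h) = \underline{P}^n(f) = \underline{P}^{n+k}(\tilde{f}) = \underline{Q}^{n+k}(\mathrm{MuHy}^{n+k}(\tilde{f}|\cdot)),$$
where the first equality follows from Equation (1), the second from Equation (2), and the
last from Theorem 2. Now for any $m'$ in $\mathcal{N}^{n+k}$, and any $z' = (z, y)$ in $\mathcal{X}^{n+k} = \mathcal{X}^n \times \mathcal{X}^k$
we have that $T^{n+k}(z') = T^n(z) + T^k(y)$ and therefore
$$\mathrm{MuHy}^{n+k}(\tilde{f}|m') = \sum_{m \in \mathcal{N}^n} \frac{\nu(m' - m)\nu(m)}{\nu(m')} h(m),$$
taking into account that $\mathrm{MuHy}^n(f|m) = h(m)$, and $\nu(m' - m)$ is zero unless $m \le m'$. So
we see that time consistency is equivalent to
$$\underline{Q}^n(h) = \underline{Q}^{n+k}\Big(\sum_{m \in \mathcal{N}^n} \frac{\nu(\cdot - m)\nu(m)}{\nu(\cdot)} h(m)\Big) \tag{3}$$
for all n ≥ 1, k ≥ 0 and $h \in \mathcal{L}(\mathcal{N}^n)$.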
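The weights ν(m' − m)ν(m)/ν(m') in Equation (3) form a multiple hyper-geometric mass function in m, so in the precise (linear) case the requirement amounts to marginalising a mass function on count vectors. The Python sketch below illustrates this special case only: it starts from a hypothetical mass function on count vectors for 5 draws over three categories, and checks that going from 5 draws to 2 directly agrees with going through 3 draws first. The names `marginalise` and `kernel` are illustrative and do not appear in the paper.

```python
from itertools import product
from math import factorial, prod

K = 3  # hypothetical number of categories |X|

def count_vectors(n):
    """All count vectors of total n over K categories."""
    return [m for m in product(range(n + 1), repeat=K) if sum(m) == n]

def nu(m):
    """Multinomial coefficient; zero when some component is negative."""
    if min(m) < 0:
        return 0
    return factorial(sum(m)) // prod(factorial(k) for k in m)

def kernel(m, m_big):
    """nu(m' - m) nu(m) / nu(m'): chance of counts m among the first
    sum(m) draws without replacement from an urn with composition m'."""
    diff = tuple(b - a for a, b in zip(m, m_big))
    return nu(diff) * nu(m) / nu(m_big)

def marginalise(q_big, n):
    """Mass function on count vectors of total n induced via Equation (3)."""
    return {m: sum(p * kernel(m, mb) for mb, p in q_big.items())
            for m in count_vectors(n)}

# Hypothetical (unnormalised, then normalised) weights for 5 draws.
weights = {m: 1 + m[0] for m in count_vectors(5)}
total = sum(weights.values())
q5 = {m: w / total for m, w in weights.items()}

q3 = marginalise(q5, 3)
q2_direct = marginalise(q5, 2)
q2_via_q3 = marginalise(q3, 2)
print(max(abs(q2_direct[m] - q2_via_q3[m]) for m in q2_direct))
```

The printed discrepancy is zero up to floating-point rounding, which is what time consistency requires of the family obtained this way.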
5. A REPRESENTATION THEOREM FOR EXCHANGEABLE SEQUENCES
De Finetti (1937, 1974-5) has proven a representation result for exchangeable sequences
with linear previsions that generalises Theorem 2, and where multinomial distributions
take over the rôle that the multiple hyper-geometric ones play for finite collections of
exchangeable variables. One simple and intuitive way (see also de Finetti, 1974-5, p. 218)
to understand why the representation result can be thus extended from finite collections to
countable sequences, is based on the fact that the multinomial distribution can be seen as a
limit of multiple hyper-geometric ones (Johnson et al., 1997, Chapter 39). This is also the
central idea behind Heath and Sudderth’s (1976) simple proof of this representation result
in the case of variables that may only assume two possible values.
However, there is another, arguably even simpler, approach to proving the same results,
which we present here. It also works for exchangeability in the context of coherent lower
previsions. And as we shall have occasion to explain further on, it has the additional
advantage of clearly indicating what the ‘representation’ is, and where it is uniquely defined.
5.1. Multinomial processes are exchangeable. Consider a sequence of IID random variables Y1 , . . . , Yn , . . . with common probability mass function θ : the probability that Yn = x
is θx for x ∈ X . Observe that θ is an element of the X -simplex
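$$\Sigma = \Big\{ \theta \in \mathbb{R}^{\mathcal{X}} : (\forall x \in \mathcal{X})(\theta_x \ge 0) \text{ and } \sum_{x \in \mathcal{X}} \theta_x = 1 \Big\}.$$
Then for any n ≥ 1 and any z in $\mathcal{X}^n$ the probability that $(Y_1, \dots, Y_n)$ is equal to z is given by
$\prod_{x \in \mathcal{X}} \theta_x^{T_x(z)}$, which yields the multinomial mass function (Johnson et al., 1997, Chapter 35).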
As a result, we have for any gamble f on X n that its corresponding (multinomial) prevision
(expectation) is given by
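$$\mathrm{Mn}^n(f|\theta) = \mathrm{CoMn}^n(\mathrm{MuHy}^n(f|\cdot)|\theta), \tag{4}$$
where we defined the (count multinomial) linear prevision $\mathrm{CoMn}^n(\cdot|\theta)$ on $\mathcal{L}(\mathcal{N}^n)$ by
$$\mathrm{CoMn}^n(g|\theta) = \sum_{m \in \mathcal{N}^n} g(m)\,\nu(m) \prod_{x \in \mathcal{X}} \theta_x^{m_x}, \tag{5}$$
where g is any gamble on $\mathcal{N}^n$. The corresponding probability mass for any count vector m,
$$\mathrm{CoMn}^n(\{m\}|\theta) = \nu(m) \prod_{x \in \mathcal{X}} \theta_x^{m_x} =: B_m(\theta), \tag{6}$$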
is the probability of observing some value z for (Y1 , . . . ,Yn ) whose count vector is m. The
polynomial function Bm on the X -simplex is called a (multivariate) Bernstein (basis)
polynomial. The set {Bm : m ∈ N n } of all Bernstein (basis) polynomials of fixed degree n
forms a basis for the linear space of all (multivariate) polynomials on Σ whose degree
is at most n; hence their name. If we have a polynomial p of degree m, this means
that for any n ≥ m, p has a unique (Bernstein) decomposition $b_p^n \in \mathcal{L}(\mathcal{N}^n)$ such that
$p = \sum_{m \in \mathcal{N}^n} b_p^n(m) B_m$. If we combine this with Equations (5) and (6), we find that $b_p^n$ is the
unique gamble on $\mathcal{N}^n$ such that $\mathrm{CoMn}^n(b_p^n|\cdot) = p$.
We deduce from Equation (4) and Theorem 2 that the linear prevision $\mathrm{Mn}^n(\cdot|\theta)$ on
$\mathcal{L}(\mathcal{X}^n)$ is exchangeable, and that $\mathrm{CoMn}^n(\cdot|\theta)$ is the corresponding distribution for the
corresponding count vectors $T^n(Y_1, \dots, Y_n)$. Therefore the sequence of IID random variables
Y1 , . . . , Yn , . . . is exchangeable.
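Equation (4) and the role of the Bernstein basis polynomials as count probabilities can be verified numerically on a small case. The Python sketch below computes both sides of Equation (4) for one gamble; the three-element value set, the point θ of the simplex, the length n = 3 and the gamble f are hypothetical choices for illustration.

```python
from collections import Counter
from itertools import product
from math import factorial, prod

X = ("a", "b", "c")                      # hypothetical value set
theta = {"a": 0.2, "b": 0.5, "c": 0.3}   # hypothetical point of the simplex
n = 3

def count_vector(z):
    c = Counter(z)
    return tuple(c[x] for x in X)

def nu(m):
    return factorial(sum(m)) // prod(factorial(k) for k in m)

def bernstein(m, theta):
    """B_m(theta) = nu(m) * prod_x theta_x^{m_x}, as in Equation (6)."""
    return nu(m) * prod(theta[x] ** k for x, k in zip(X, m))

sequences = list(product(X, repeat=n))
count_vectors = sorted(set(map(count_vector, sequences)))

def mn(f):
    """Mn^n(f|theta): expectation of f under IID sampling from theta."""
    return sum(f(z) * prod(theta[x] for x in z) for z in sequences)

def muhy(f, m):
    """MuHy^n(f|m): average of f over the invariant atom [m]."""
    atom = [z for z in sequences if count_vector(z) == m]
    return sum(f(z) for z in atom) / len(atom)

def comn(g):
    """CoMn^n(g|theta), as in Equation (5)."""
    return sum(g(m) * bernstein(m, theta) for m in count_vectors)

f = lambda z: 1.0 if z[0] == "a" and z[-1] != "a" else -0.5
lhs = mn(f)                          # left-hand side of Equation (4)
rhs = comn(lambda m: muhy(f, m))     # right-hand side of Equation (4)
print(lhs, rhs, comn(lambda m: 1.0))
```

The two sides agree up to rounding, and the Bernstein basis polynomials of degree n sum to one at θ, as they must for a probability mass function on count vectors.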
5.2. A representation theorem. Consider the following linear subspace of L (Σ):
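$$\mathcal{V}(\Sigma) := \{\mathrm{CoMn}^n(g|\cdot) : n \ge 1, g \in \mathcal{L}(\mathcal{N}^n)\} = \{\mathrm{Mn}^n(f|\cdot) : n \ge 1, f \in \mathcal{L}(\mathcal{X}^n)\},$$
each of whose elements is a polynomial function on the $\mathcal{X}$-simplex:
$$\mathrm{CoMn}^n(g|\theta) = \sum_{m \in \mathcal{N}^n} g(m)\,\nu(m) \prod_{x \in \mathcal{X}} \theta_x^{m_x} = \sum_{m \in \mathcal{N}^n} g(m) B_m(\theta),$$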
and is actually a linear combination of Bernstein basis polynomials Bm with coefficients
g(m). So V (Σ) is the linear space spanned by all Bernstein basis polynomials, and is
therefore the set of all polynomials on the X -simplex Σ.
Now if R is any coherent lower prevision on L (Σ), then it is easy to see that the family
of coherent lower previsions Pn , n ≥ 1, defined by
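$$\underline{P}^n(f) = \underline{R}(\mathrm{Mn}^n(f|\cdot)), \quad f \in \mathcal{L}(\mathcal{X}^n), \tag{7}$$
is still exchangeable and time consistent, and the corresponding count distributions are
$$\underline{Q}^n(g) = \underline{R}(\mathrm{CoMn}^n(g|\cdot)), \quad g \in \mathcal{L}(\mathcal{N}^n). \tag{8}$$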
Here, we are going to show that a converse result also holds: for any time consistent family
of exchangeable coherent lower previsions Pn , n ≥ 1, there is a coherent lower prevision R
on V (Σ) such that Equation (7), or its reformulation for counts (8), holds. We call such an
R a representation, or representing coherent lower prevision, for the family Pn . Of course,
any representing R, if it exists, is uniquely determined on V (Σ).
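The easy direction, that a coherent lower prevision on the simplex generates an exchangeable and time consistent family through Equation (7), can be illustrated with a simple choice of representation. In the Python sketch below, the representing lower prevision is assumed to be the lower envelope of point evaluations at three hypothetical points θ of the binary simplex; this is only one convenient example of a coherent lower prevision, not the general case.

```python
from itertools import permutations, product
from math import prod

X = (0, 1)   # hypothetical binary value set
# Hypothetical points of the simplex; R is taken to be the lower envelope
# of the corresponding point evaluations (an assumption of this sketch).
thetas = [{0: 0.3, 1: 0.7}, {0: 0.5, 1: 0.5}, {0: 0.8, 1: 0.2}]

def mn(f, n, theta):
    """Mn^n(f|theta): expectation of f under IID sampling from theta."""
    return sum(f(z) * prod(theta[x] for x in z) for z in product(X, repeat=n))

def R_lower(p):
    """Lower envelope of the values of p at the chosen points."""
    return min(p(theta) for theta in thetas)

def P_lower(f, n):
    """Equation (7): P^n(f) = R(Mn^n(f|.))."""
    return R_lower(lambda theta: mn(f, n, theta))

f = lambda z: z[0] * (1 - z[1]) + 0.5 * z[1]   # a gamble on X^2
f_ext = lambda z: f(z[:2])                     # cylindrical extension to X^4

# Time consistency, Equation (2): both values should coincide.
print(P_lower(f, 2), P_lower(f_ext, 4))

# Exchangeability on X^2: the lower prevision of pi f - f is zero.
for pi in permutations(range(2)):
    print(pi, P_lower(lambda z, pi=pi: f(tuple(z[i] for i in pi)) - f(z), 2))
```

Both checks succeed up to floating-point rounding, because each multinomial prevision is itself exchangeable and time consistent and taking a minimum over them preserves these properties.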
So consider a family of coherent lower previsions Qn on L (N n ), n ≥ 1, that are
time consistent. It suffices to find an R such that (8) holds, because the corresponding
exchangeable lower previsions Pn on L (X n ) are then uniquely determined by Theorem 2,
and automatically satisfy the condition (7). Our proposal is to define the functional R on the
set V (Σ) as follows: consider any element p of V (Σ). Then, by definition, there is some
n ≥ 1 and a corresponding unique $b_p^n \in \mathcal{L}(\mathcal{N}^n)$ such that $p = \mathrm{CoMn}^n(b_p^n|\cdot)$. We then let
$\underline{R}(p) := \underline{Q}^n(b_p^n)$.
The first thing to check is whether this definition is consistent:
Lemma 3. Let p be a polynomial of degree m, and $n_1, n_2 \ge m$. Then $\underline{Q}^{n_1}(b_p^{n_1}) = \underline{Q}^{n_2}(b_p^{n_2})$.
Proof. We may assume without loss of generality that $n_2 \ge n_1$. The Bernstein decompositions $b_p^{n_1}$ and $b_p^{n_2}$ are then related by Zhou’s formula [see Equation (10) in the Appendix]:
$$b_p^{n_2}(m_2) = \sum_{m_1 \in \mathcal{N}^{n_1}} \frac{\nu(m_2 - m_1)\nu(m_1)}{\nu(m_2)}\, b_p^{n_1}(m_1), \quad m_2 \in \mathcal{N}^{n_2}.$$
Consequently, by the time consistency requirement (3), $\underline{Q}^{n_2}(b_p^{n_2}) = \underline{Q}^{n_1}(b_p^{n_1})$.
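Zhou’s formula is a degree-elevation rule: it rewrites the same polynomial in a Bernstein basis of higher degree. The Python sketch below applies the formula to a hypothetical coefficient gamble of degree 2 over three categories and checks that the polynomial it represents is unchanged at a sample point of the simplex. The function names and the chosen coefficients are illustrative assumptions.

```python
from itertools import product
from math import factorial, prod

K = 3  # hypothetical number of categories |X|

def count_vectors(n):
    return [m for m in product(range(n + 1), repeat=K) if sum(m) == n]

def nu(m):
    """Multinomial coefficient; zero when some component is negative."""
    if min(m) < 0:
        return 0
    return factorial(sum(m)) // prod(factorial(k) for k in m)

def bernstein(m, theta):
    return nu(m) * prod(t ** k for t, k in zip(theta, m))

def evaluate(b, theta):
    """Value at theta of the polynomial with Bernstein coefficients b."""
    return sum(c * bernstein(m, theta) for m, c in b.items())

def elevate(b1, n2):
    """Zhou's formula: coefficients of the same polynomial in degree n2."""
    return {
        m2: sum(nu(tuple(j - i for i, j in zip(m1, m2))) * nu(m1) * c
                for m1, c in b1.items()) / nu(m2)
        for m2 in count_vectors(n2)
    }

# Hypothetical Bernstein coefficients of degree 2.
b2 = {m: float(m[0] * m[1] - m[2]) for m in count_vectors(2)}
b5 = elevate(b2, 5)

theta = (0.2, 0.5, 0.3)
print(evaluate(b2, theta), evaluate(b5, theta))
```

The two printed values agree up to rounding, and elevating in two steps (2 to 3, then 3 to 5) gives the same coefficients as elevating directly.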
Lemma 4. R is a coherent lower prevision on the linear space V (Σ).
Proof. We show that R satisfies the necessary and sufficient conditions (P1)–(P3) for
coherence of a lower prevision on a linear space.
We first prove that (P1) is satisfied. Consider any $p \in \mathcal{V}(\Sigma)$. Let m be the degree of p. We
must show that $\underline{R}(p) \ge \min p$. We find that $\underline{R}(p) = \underline{Q}^n(b_p^n) \ge \min b_p^n$ for all n ≥ m, because
of the coherence of $\underline{Q}^n$. But Equation (11) in the Appendix tells us that $\min b_p^n \uparrow \min p$,
whence indeed $\underline{R}(p) \ge \min p$.
Next, consider any p in $\mathcal{V}(\Sigma)$ and any real λ ≥ 0. Consider any n that is not smaller
than the degree of p. Since obviously $b_{\lambda p}^n = \lambda b_p^n$, we get
$$\underline{R}(\lambda p) = \underline{Q}^n(b_{\lambda p}^n) = \underline{Q}^n(\lambda b_p^n) = \lambda \underline{Q}^n(b_p^n) = \lambda \underline{R}(p),$$
where the third equality follows from the coherence [non-negative homogeneity] of the
count lower prevision $\underline{Q}^n$. This tells us that $\underline{R}$ satisfies (P2).
Finally, consider p and q in $\mathcal{V}(\Sigma)$, and any n that is not smaller than the maximum of
the degrees of p and q. Since obviously $b_{p+q}^n = b_p^n + b_q^n$, we get
$$\underline{R}(p + q) = \underline{Q}^n(b_{p+q}^n) = \underline{Q}^n(b_p^n + b_q^n) \ge \underline{Q}^n(b_p^n) + \underline{Q}^n(b_q^n) = \underline{R}(p) + \underline{R}(q),$$
where the inequality follows from the super-additivity of $\underline{Q}^n$. This tells us that $\underline{R}$ also
satisfies (P3), and as a consequence it is coherent.
We can summarise the argument above as follows.
Theorem 5 (Representation theorem for exchangeable sequences). Given a time consistent
family of exchangeable coherent lower previsions Pn on L (X n ), n ≥ 1, there is a unique
coherent lower prevision R on the linear space V (Σ) of all polynomial gambles on the
X -simplex, such that for all n ≥ 1, all f ∈ L (X n ) and all g ∈ L (N n ):
$$\underline{P}^n(f) = \underline{R}(\mathrm{Mn}^n(f|\cdot)) \quad \text{and} \quad \underline{Q}^n(g) = \underline{R}(\mathrm{CoMn}^n(g|\cdot)). \tag{9}$$
Hence, the belief model governing any countable exchangeable sequence in X can be
completely characterised by a coherent lower prevision on the linear space of polynomial
gambles on Σ.
In the particular case where we have a time consistent family of exchangeable linear
previsions Pn on L (X n ), n ≥ 1, R will be a linear prevision R on the linear space V (Σ) of
all polynomial gambles on the X -simplex. As such, it will be characterised by its values
R(Bm ) on the Bernstein basis polynomials Bm , m ∈ N n , n ≥ 1, or on any other basis of
V (Σ).
It is a consequence of coherence that R is also uniquely determined on the set C (Σ)
of all continuous gambles on the X -simplex Σ: by the Stone-Weierstraß theorem, any
such gamble is the uniform limit of some sequence of polynomial gambles, and coherence
implies that the lower prevision of a uniform limit is the limit of the lower previsions.
This unicity result cannot be extended to more general (discontinuous) types of gambles:
the coherent lower prevision R is not uniquely determined on the set of all gambles L (Σ)
on the simplex, and there may be different coherent lower previsions R1 and R2 on L (Σ)
satisfying Equation (9). But any such lower previsions will agree on the class V (Σ) of
polynomial gambles, which is the class of gambles we need in order to characterise the
exchangeable sequence.
We now investigate the meaning of the representing lower prevision R a bit further.
Consider the sequence of so-called frequency random variables Fn := Tn (X1 , . . . , Xn )/n
corresponding to an exchangeable sequence of random variables X1 , . . . , Xn , . . . , and
assuming values in the X -simplex Σ. The distribution PFn of Fn is given by
$$P_{F_n}(h) := Q_n\Big(h \circ \frac{1}{n}\Big) = R\Big(\mathrm{CoMn}^n\Big(h \circ \frac{1}{n}\Big|\cdot\Big)\Big), \quad h \in \mathcal{L}(\Sigma),$$
because we know that Qn is the distribution of Tn (X1 , . . . , Xn ), and also taking into account
Theorem 5 for the last equality. Now,
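$$\mathrm{CoMn}^n\Big(h \circ \frac{1}{n}\Big|\boldsymbol{\theta}\Big) = \sum_{m \in \mathcal{N}^n} h\Big(\frac{m}{n}\Big) B_m(\boldsymbol{\theta})$$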
is the Bernstein approximant or approximating Bernstein polynomial of degree n for the
gamble h, and it is a known result (see Feller (1971, Section VII.2) or Heitzinger et al.
(2003, Section 2)) that the sequence of approximating Bernstein polynomials $\mathrm{CoMn}^n(h \circ \frac{1}{n}|\cdot)$
converges uniformly to h for n → ∞ if h is continuous. So, because R is defined uniquely,
and is uniformly continuous, on the set C (Σ), we find the following result:
Theorem 6. For all continuous gambles h on Σ, we have that
$$\lim_{n \to \infty} P_{F_n}(h) = R(h),$$
or, in other words, the sequence of distributions PFn converges point-wise to R on C (Σ),
and in this specific sense, the sample frequencies Fn converge in distribution.
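The convergence stated in Theorem 6 can be observed numerically. The sketch below is only an illustration under stated assumptions: R is again a hypothetical two-point mixture on the simplex for a three-element X, and h(θ) = |θ1 − 1/2| is a continuous, non-polynomial gamble chosen for the example. The names `approximant`, `P_F`, `MIXTURE` and `gaps` are ours. The code evaluates PFn(h) = R(CoMn^n(h ∘ 1/n|·)) through the Bernstein approximant of h and compares it with R(h).

```python
from itertools import product
from math import factorial, prod


def count_vectors(n, k):
    """All count vectors m in N^n: k non-negative integers summing to n."""
    return [m for m in product(range(n + 1), repeat=k) if sum(m) == n]


def bernstein(m, theta):
    """B_m(theta) = nu(m) * prod_x theta_x^m_x, nu(m) the multinomial coefficient."""
    nu = factorial(sum(m)) // prod(factorial(mx) for mx in m)
    return nu * prod(t ** mx for t, mx in zip(theta, m))


def h(theta):
    """ASSUMPTION: an example continuous gamble on the simplex."""
    return abs(theta[0] - 0.5)


# ASSUMPTION: hypothetical representing linear prevision R, a two-point mixture.
MIXTURE = [(0.5, (0.5, 0.2, 0.3)), (0.5, (0.1, 0.6, 0.3))]


def R(p):
    """Expectation of the gamble p under the assumed mixture."""
    return sum(w * p(theta) for w, theta in MIXTURE)


def approximant(h, n, theta):
    """Bernstein approximant CoMn^n(h o 1/n | theta) = sum_m h(m/n) B_m(theta)."""
    k = len(theta)
    return sum(
        h(tuple(mx / n for mx in m)) * bernstein(m, theta)
        for m in count_vectors(n, k)
    )


def P_F(h, n):
    """Distribution of the frequency variable F_n applied to h."""
    return R(lambda theta: approximant(h, n, theta))


# Distance between P_{F_n}(h) and R(h) for a few sample sizes.
gaps = {n: abs(P_F(h, n) - R(h)) for n in (3, 10, 40)}
```

For this choice of h and R, the entries of `gaps` decrease as n grows, in line with the uniform convergence of the Bernstein approximants on C (Σ).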
6. CONCLUSIONS
We have shown that the notion of exchangeability has a natural place in the theory of
coherent lower previsions. Indeed, with our distinctive approach using Bernstein polynomials,
and gambles rather than events, it seems fairly natural and easy to derive representation
theorems directly for coherent lower previsions, and to derive the corresponding results for
precise probabilities (linear previsions) as special cases.
Interesting results can also be obtained in a context of predictive inference, where a coherent
exchangeable lower prevision for n + k variables is updated with the information that the
first n variables have been observed to assume certain values. For a fairly detailed discussion
of these issues, we refer to De Cooman and Miranda (2007, Section 9.3).
ACKNOWLEDGEMENTS
We acknowledge financial support by research grant G.0139.01 of the Flemish Fund for
Scientific Research (FWO), and by project TIN2008-06796-C04-01. Erik Quaeghebeur’s
research was financed by a Ph.D. grant of the Institute for the Promotion of Innovation
through Science and Technology in Flanders (IWT Vlaanderen).
We would like to thank Jürgen Garloff for very helpful comments and pointers to the
literature about multivariate Bernstein polynomials.
APPENDIX A. MULTIVARIATE BERNSTEIN POLYNOMIALS
With any n ≥ 0 and m ∈ N n there corresponds a Bernstein (basis) polynomial of degree n
on Σ, given by $B_m(\boldsymbol{\theta}) = \nu(m) \prod_{x \in \mathcal{X}} \theta_x^{m_x}$, $\boldsymbol{\theta} \in \Sigma$. These polynomials have a number of very
interesting properties (see for instance Prautzsch et al., 2002, Chapters 10 and 11):
B1. They are non-negative, and strictly positive in the interior of Σ.
B2. The set {Bm : m ∈ N n } of all Bernstein polynomials of fixed degree n forms a basis
for the linear space of all polynomials whose degree is at most n.
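Hence, for any polynomial $p$ of degree $m$ and any $n \geq m$, there is a unique gamble $b^n_p$ on $\mathcal{N}^n$ such that
$$p = \sum_{m \in \mathcal{N}^n} b^n_p(m) B_m = \mathrm{CoMn}^n(b^n_p|\cdot).$$
This tells us that each $p(\boldsymbol{\theta})$ is a convex combination of the Bernstein coefficients $b^n_p(m)$,
$m \in \mathcal{N}^n$, whence $\min b^n_p \leq \min p \leq p(\boldsymbol{\theta}) \leq \max p \leq \max b^n_p$. It also follows that for all
$k \geq 0$ and all $\boldsymbol{\mu}$ in $\mathcal{N}^{n+k}$,
$$b^{n+k}_p(\boldsymbol{\mu}) = \sum_{m \in \mathcal{N}^n} \frac{\nu(m)\,\nu(\boldsymbol{\mu} - m)}{\nu(\boldsymbol{\mu})}\, b^n_p(m). \tag{10}$$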
This is Zhou’s formula (see Prautzsch et al., 2002, Section 11.9). Moreover, since for any
polynomial $p$ on $\Sigma$ of degree $m$ the $b^n_p$ converge uniformly to $p$ as $n \to \infty$ (see for instance
Trump and Prautzsch (1996)), it follows that
$$\lim_{\substack{n \to \infty \\ n \geq m}} [\min b^n_p, \max b^n_p] = [\min p, \max p] = p(\Sigma). \tag{11}$$
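As a small numerical illustration of Zhou's formula and of the shrinking coefficient range, the Python sketch below raises the degree of a polynomial given by its Bernstein coefficients. The function names `evaluate` and `elevate`, and the example polynomial p(θ) = (θ1 − θ2)² on a binary X, are our own assumptions for the purpose of the example. Terms for which µ − m has a negative component are skipped, which we take to be the intended reading of the sum.

```python
from itertools import product
from math import factorial, prod


def nu(m):
    """Multinomial coefficient nu(m) = n! / prod_x m_x!, with n = sum(m)."""
    return factorial(sum(m)) // prod(factorial(mx) for mx in m)


def count_vectors(n, k):
    """All count vectors in N^n: k non-negative integers summing to n."""
    return [m for m in product(range(n + 1), repeat=k) if sum(m) == n]


def evaluate(b, theta):
    """p(theta) = sum_m b(m) B_m(theta) for Bernstein coefficients b."""
    return sum(
        c * nu(m) * prod(t ** mx for t, mx in zip(theta, m))
        for m, c in b.items()
    )


def elevate(b, k):
    """Degree-n coefficients b -> degree-(n+k) coefficients via Zhou's formula."""
    first = next(iter(b))
    n, dim = sum(first), len(first)
    out = {}
    for mu in count_vectors(n + k, dim):
        total = 0.0
        for m, c in b.items():
            diff = tuple(a - i for a, i in zip(mu, m))
            if min(diff) >= 0:  # skip terms where mu - m is not a count vector
                total += nu(m) * nu(diff) / nu(mu) * c
        out[mu] = total
    return out


# ASSUMPTION: example polynomial p(theta) = (theta_1 - theta_2)^2 on a binary X,
# written through its degree-2 Bernstein coefficients; its range p(Sigma) is [0, 1].
b2 = {(2, 0): 1.0, (1, 1): -1.0, (0, 2): 1.0}
b3 = elevate(b2, 1)    # degree 3
b20 = elevate(b2, 18)  # degree 20
```

In this example the coefficient range is [−1, 1] at degree 2, [−1/3, 1] at degree 3 and [−1/19, 1] at degree 20, while `evaluate` returns the same value of p for all three coefficient tables. This is consistent with the inequalities min b ≤ min p ≤ max p ≤ max b and with the limit (11).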
REFERENCES
D. M. Cifarelli and E. Regazzini. De Finetti’s contributions to probability and statistics.
Statistical Science, 11:253–282, 1996.
A. P. Dawid. Probability, symmetry, and frequency. British Journal for the Philosophy of
Science, 36(2):107–128, 1985.
G. de Cooman and E. Miranda. Weak and strong laws of large numbers for coherent lower
previsions. Journal of Statistical Planning and Inference, 138(8):2409–2432, 2008.
G. de Cooman and E. Miranda. Symmetry of models versus models of symmetry. In W. L.
Harper and G. R. Wheeler, editors, Probability and Inference: Essays in Honor of Henry
E. Kyburg, Jr., pages 67–149. King’s College Publications, 2007.
B. de Finetti. La prévision: ses lois logiques, ses sources subjectives. Annales de l’Institut
Henri Poincaré, 7:1–68, 1937.
B. de Finetti. Theory of Probability: A Critical Introductory Treatment. John Wiley & Sons,
Chichester, 1974-1975. Two volumes.
P. Diaconis and D. Freedman. Finite exchangeable sequences. The Annals of Probability, 8:
745–764, 1980.
W. Feller. An Introduction to Probability Theory and Its Applications, volume II. John
Wiley and Sons, New York, 1971.
D. C. Heath and W. D. Sudderth. De Finetti’s theorem on exchangeable variables. The
American Statistician, 30:188–189, 1976.
C. Heitzinger, A. Hössinger, and S. Selberherr. On Smoothing Three-Dimensional Monte
Carlo Ion Implantation Simulation Results. IEEE Transactions on Computer-Aided
Design of integrated circuits and systems, 22(7):879–883, 2003.
E. Hewitt and L. J. Savage. Symmetric measures on Cartesian products. Transactions of the
American Mathematical Society, 80:470–501, 1955.
N. L. Johnson, S. Kotz, and N. Balakrishnan. Discrete Multivariate Distributions. Wiley
Series in Probability and Statistics. John Wiley and Sons, New York, 1997.
O. Kallenberg. Foundations of Modern Probability. Springer-Verlag, New York, second
edition, 2002.
O. Kallenberg. Probabilistic Symmetries and Invariance Principles. Springer, New York,
2005.
H. Prautzsch, W. Boehm, and M. Paluszny. Bézier and B-Spline Techniques. Springer,
Berlin, 2002.
W. Trump and H. Prautzsch. Arbitrary degree elevation of Bézier representations. Computer
Aided Geometric Design, 13:387–398, 1996.
P. Walley. Statistical Reasoning with Imprecise Probabilities. Chapman and Hall, London,
1991.
S. L. Zabell. Predicting the unpredictable. Synthese, 90:205–232, 1992.
Ghent University, SYSTeMS Research Group, Technologiepark–Zwijnaarde 914, 9052
Zwijnaarde, Belgium
E-mail address: [email protected], [email protected]
University of Oviedo, Dept. of Statistics and Operations Research. C-Calvo Sotelo, s/n,
33007, Oviedo, Spain
E-mail address: [email protected]