Ergodic Theorems for Lower Probabilities

Ergodic Theorems for Lower Probabilities
S.Cerreia–Vioglio, F. Maccheroni, and M. Marinacci
Abstract. We establish an Ergodic Theorem for lower probabilities, a generalization of standard probabilities widely used in applications. As a byproduct, we provide a version for lower probabilities of the Strong Law of
Large Numbers.
1. Introduction
The purpose of this paper is to state and prove an Ergodic Theorem for lower
probabilities: a class of monotone set functions that are not necessarily additive
and are widely used in applications where standard additive probabilities turn out
to be inadequate (for applications in Economics see Marinacci and Montrucchio
[16], for applications in Statistics see Walley [19]).
We consider a measurable space ( ; F), endowed with an FnF-measurable
transformation : ! , and a (continuous) lower probability : F ! [0; 1]. We
study three di¤erent notions of invariance for lower probabilities (De…nitions 1-3).1
They are equivalent in the additive case, and so are genuine generalizations to the
nonadditive setting of the usual concept of invariance.
The most natural de…nition of invariance for a lower probability (De…nition
1) requires that
1
(A) =
(A)
8A 2 F:
It is the weakest form of invariance for the nonadditive case. Nevertheless, it is
still possible to derive a version of the Ergodic Theorem (Theorem 2). In other
words, if is an invariant lower probability, then for each real valued, bounded,
and measurable function f : ! R the limit
n
lim
n
1X
f
n
k 1
(!)
k=1
exists on a set that has measure 1 with respect to . If, in addition, is ergodic,
we are able to provide bounds for such limit in terms of lower and upper Choquet
integrals.
Corresponding Author: Fabio Maccheroni <[email protected]>, U. Bocconi,
via Sarfatti 25, 20136, Milano, ITALY. The authors gratefully acknowledge the …nancial support
of MIUR (PRIN grant 20103S5RN3_005).
1 In the arXiv working paper version (arXiv:1508.01329), using the techniques of [4] and [5],
we study a fourth notion of invariance arising naturally from the literature on Robust Statistics.
1
2
S.CERREIA–VIOGLIO, F. M ACCHERONI, AND M . M ARINACCI
Under the stronger notions of invariance (De…nitions 2 and 3), the previous
result can be strengthened in several ways. First, we develop a nonadditive version
of Kingman’s super-subadditive ergodic theorem (Theorem 3). Second, when ( ; F)
is a standard measurable space we can better characterize the limit of time averages
(Corollary 2).
As an application of our main result, we establish a nonadditive version of the
Strong Law of Large Numbers (Theorem 4) for stationary and ergodic processes.
2. Mathematical Preliminaries
2.1. Set functions. Consider a measurable space (S; ), where S is a nonempty set and is a -algebra of subsets of S. Subsets of S are understood to be
in even when not stated explicitly. A set function : ! [0; 1] is
(i) a capacity if (;) = 0, (S) = 1, and (A)
(B) for all A and B such
that A B;
(ii) convex if (A [ B) + (A \ B)
(A) + (B) for all A and B;
(iii) additive if (A [ B) = (A) + (B) for all disjoint A and B;
(iv) continuous if limn!1 (An ) = (A) whenever either An # A or An " A;
(v) continuous at S if limn!1 (An ) = (S) whenever An " S;
(vi) a probability if it is an additive capacity;
(vii) a probability measure if it is a probability which is continuous at S.
We denote by (S; ) the set of all probabilities on and by
(S; ) the set
of all probability measures on . We endow both sets with the relative topology
induced by the weak* topology.2 A set function : ! [0; 1] is
(viii) a lower probability (measure) if there exists a compact set M
such that
(A) = min P (A)
8A 2 :
(S; )
P 2M
Given a capacity , its conjugate
! [0; 1] is given by
(Ac )
(A) = 1
It is immediate to verify that if
:
is a lower probability, then
(A) = max P (A)
P 2M
The core of a capacity
8A 2 :
8A 2 :
is the weak* compact set de…ned by
core ( ) = fP 2
(S; ) : P
g;
that is, the core is the collection of all probabilities that setwise dominate . A
capacity : ! [0; 1] is
(ix) exact if core ( ) 6= ; and
(A) = minP 2core( ) P (A) for each A.
2 Recall that a net fP g
2I converges to P , in the weak* topology, if and only if P (A) !
P (A) for all A 2 . The weak* topology is thus the restriction to
(S; ) of the topology
(ba (S; ) ; B (S; )) where B (S; ) is the space of all real valued, bounded, and -measurable
functions on S and ba (S; ) is the set of all bounded and …nitely additive set functions on . In
the case of S being a Polish space and
the Borel -algebra, the above topology should not be
confused with the topology generated by real valued, bounded, and continuous functions on S.
ERGODIC THEOREM S FOR LOW ER PROBABILITIES
3
If is a convex capacity continuous at S, then is exact and ; 6= core ( )
(S; ) (see [6, Lemma 2 and Theorem 1], [17, Theorem 3.2], and [16, Theorem
4.2 and Theorem 4.7]). In particular, is a lower probability where M = core ( ).
Conversely, if is a lower probability, then is exact, continuous at S, and M
core ( )
(S; ). Nevertheless, being exact does not automatically imply being
convex. An exact capacity continuous at S is continuous. Finally, we say that a
statement about a random element holds
a:s: if and only if there exists an event
A such that (A) = 1 and the statement holds for all s 2 A.
2.2. Integrals. We denote by B (S; ) the set of all bounded and -measurable
functions from S to R. A capacity induces a functional on B (S; ) via the Choquet integral, de…ned for all f 2 B (S; ) by:
Z
Z 1
Z 0
fd =
(fs 2 S : f (s) tg) dt +
[ (fs 2 S : f (s) tg)
(S)] dt
S
1
0
where the right hand side integrals are (improper) Riemann integrals. If is additive, then the Choquet integral
reduces
to the standard additive integral. It is also
R
R
routine to check that
f
d
=
f
d for all f 2 B (S; ). It is well known
S
S
(see [6, Lemma 2], [18, Proposition 3], and [16, Theorem 4.7]) that if is a convex
capacity, then
Z
Z
Z
Z
f d = min
f dP and
f d = max
f dP
8f 2 B (S; ) :
S
P 2core( )
S
P 2core( )
S
S
In the rest of the paper, we consider two measurable spaces (S; ). The …rst one is
( ; F) which we interpret as the space where ultimately uncertainty lives. As for the
second one, given a real valued and F-measurable stochastic process ffn gn2N on ,
we consider the space RN ; (C) , which we interpret as the space of observations
endowed with the -algebra generated by the algebra of cylinders C.
3. Ergodic Theorems
3.1. Invariant Capacities. In this section, we consider a measurable space
( ; F). We also consider a transformation : ! which is F=F-measurable.
Recall that a probability measure P is ( -)invariant if and only if
(3.1)
P (A) = P
1
(A)
8A 2 F:
We denote by I the set of all probability measures that satisfy (3.1) and by G the set
of all invariant events of F, that is, A 2 G if and only if A 2 F and 1 (A) = A. An
invariant probability measure P is said to be ergodic if and only if P (G) = f0; 1g.
Similarly, we say that a capacity is ergodic if and only if (G) = f0; 1g. We
denote by S (I) the subset of I such that
S (I) = fP 2 I : P (G) = f0; 1gg :
If ( ; F) is a standard measurable space, then it can be checked that S (I) is the
set of strong extreme points of I (see Dynkin [11]). Finally, following Dunford and
Schwartz [10, pp. 723-724] (see also Dowker [8]), we say that a probability measure
P is potentially ( -)invariant if and only if there exists a probability measure P^ 2 I
such that
P (E) = P^ (E)
8E 2 G:
4
S.CERREIA–VIOGLIO, F. M ACCHERONI, AND M . M ARINACCI
We denote the set of potentially invariant probability measures by PI. Next, we
propose three notions of ( -)invariance for a capacity.
Definition 1. A capacity
is invariant if and only if for each A 2 F
1
(A) =
Definition 2. A capacity
An
M
1
1
(A) =
(A) :
is strongly invariant if and only if for each A 2 F
1
(A) nA and
Definition 3. A lower probability
I.
(A) nA =
An
1
(A) :
is functionally invariant if and only if
In what follows, we clarify the connection between these notions of invariance.
Theorem 1. Let ( ; F) be a measurable space and a lower probability. Then,
(i) =) (ii) =) (iii) where
(i) is strongly invariant;
(ii) is functionally invariant and core ( ) I;
(iii) core ( ) I.
Moreover, if is convex and continuous at , then statements (i), (ii), and (iii)
are equivalent.
As a corollary, we obtain that the three de…nitions coincide with the usual
de…nition of invariance when is a probability measure. Finally, if is a functionally
invariant lower probability, then it is immediate to see that is invariant.3 In the
next section, we show that, if is an invariant lower probability, then its core must
be contained in PI.
3.2. Ergodic Theorem. Given the notions of invariance previously discussed,
we could then ask ourselves if suitable ergodic theorems can be developed for nonadditive probabilities. Given Subsection 3.1, an immediate dichotomy presents. In
fact, the notion of invariance of De…nition 1 stands separate from, and it is actually
weaker than, the other notions of strong and functional invariance, even in the convex case. Theorem 2 only assumes the weak form of invariance of De…nition 1. On
the other hand, Corollary 2 assumes strong invariance. Strong invariance, paired
with the convexity of and ( ; F) being standard, allows us to provide a sharper
version of Theorem 2.
Theorem 2. Let ( ; F) be a measurable space and a lower probability. If
is invariant, then for each f 2 B ( ; F) there exists f ? 2 B ( ; G) such that
n
lim
n
1X
f
n
k 1
(!) = f ? (!)
a:s:
k=1
Moreover, if is ergodic, then
(
Z
!2 :
f ?d
n
1X
lim
f
n n
k=1
k 1
(!)
Z
?
f d
)!
3 Since
is a functionally invariant lower probability, we have that M
1 (A) =
1 (A) for all A 2 F .
minP 2M P (A) = minP 2M P
= 1:
I and
(A) =
ERGODIC THEOREM S FOR LOW ER PROBABILITIES
5
As a corollary, we are able to show a necessary property that core ( ) of an
invariant lower probability must satisfy (cf. Theorem 1). Clearly, it is not a
characterization since it is well known that there are probability measures that are
potentially invariant, but not invariant.
Corollary 1. If a lower probability
is invariant, then core ( )
PI.
As a second corollary, we discuss the ergodic theorem for convex and strongly
invariant capacities. Compared to Theorem 2, the following corollary assumes
convex and a stronger form of invariance that, in turn, yield a limit function f ?
which has more properties. These properties naturally generalize the ones found in
the Individual Ergodic Theorem of Birkho¤. In this case, convergence of empirical
averages is a simple consequence of Birkho¤’s theorem applied to each probability
in core ( ). Nevertheless, the relation between f and f ? in terms of Choquet expectations is not immediate at …rst sight. A similar comment applies to Theorem
3.
Corollary 2. Let ( ; F) be a standard measurable space and a convex capacity continuous at . If is strongly invariant, then for each f 2 B ( ; F) there
exists f ? 2 B ( ; G) such that
n
1X
f k 1 (!) = f ? (!)
a:s:
(3.2)
lim
n n
k=1
Moreover,
(1) For each P 2 I, f ? is a version of the conditional expectation of f given
G.
R ?
R
(2)
f d =
fd .
(3) If is ergodic, then
(
)!
Z
Z
n
1X
k 1
!2 :
fd
lim
f
(!)
fd
= 1:
n n
k=1
3.3. Subadditive Ergodic Theorem. Next we turn to a Subadditive Ergodic Theorem for lower probabilities.
Definition 4. A sequence fSn gn2N of F-measurable random variables is superadditive (resp., subadditive) if and only if
Sn+k
n
Sn + Sk
(resp.,
)
8n; k 2 N:
The sequence fSn gn2N is additive if and only if it is superadditive and subadditive.
Consider an F-measurable function f :
(3.3)
Sn =
n
X
f
! R. If we de…ne fSn gn2N by
k 1
k=1
8n 2 N;
then we have that fSn gn2N is an additive sequence. The opposite is also true, that
is, if fSn gn2N is additive, then it takes the form (3.3) for some F-measurable real
valued function f . On the other hand, if we take fSn gn2N as in (3.3) and we consider
fjSn jgn2N we obtain a genuine subadditive sequence. Note that if f 2 B ( ; F),
then we also have that there exists 2 R such that
(3.4)
Similarly, we have that
n
n
Sn (!)
jSn j
n
8! 2 :
n for all n 2 N.
6
S.CERREIA–VIOGLIO, F. M ACCHERONI, AND M . M ARINACCI
Theorem 3. Let ( ; F) be a standard measurable space and a lower probability. If fSn gn2N is either a superadditive or a subadditive sequence that satis…es
(3.4) and if is functionally invariant, then there exists f ? 2 B ( ; G) such that
lim
n
Sn
= f?
n
a:s:
Moreover,
(1) If
R
is convex and strongly invariant and fSn gn2N superadditive, then
R Sn
f ? d = supn2N
n d .
R ?
(2) If is convex and strongly invariant and fSn gn2N subadditive, then
f d =
R Sn
inf n
n d .
(3) If is ergodic and fSn gn2N is either subadditive or superadditive, then
Z
Z
Sn (!)
f ?d
!2 :
= 1:
f ?d
lim
n
n
4. Strong Law of Large Numbers
As an application of Theorem 2, we provide a nonadditive version of the Strong
Law of Large Numbers. Before doing so, we need to introduce some notation
and terminology. Consider a sequence of real valued, bounded, and measurable
random variables f = ffn gn2N
B ( ; F). We denote by T the tail -algebra
\
(fk ; fk+1 ; :::).
k2N
Definition 5. Given a capacity , we say that f = ffn gn2N is stationary if
and only if for each n 2 N, for each k 2 N0 , and for each Borel subset B of Rk+1
(4.1)
(f! 2 : (fn (!) ; :::; fn+k (!)) 2 Bg) = (f! 2 : (fn+1 (!) ; :::; fn+k+1 (!)) 2 Bg) :
This notion generalizes the usual notion of stationary stochastic process by
allowing for the nonadditivity of the underlying probability measure. Recall that
RN ; (C) denotes the space of sequences endowed with the -algebra generated
by the algebra of cylinders. We denote a generic element of RN by x. We also
consider the shift transformation : RN ! RN de…ned by
(x) = (x2 ; x3 ; x4 ; ::::::)
8x 2 RN :
The sequence ffn gn2N induces a natural (measurable) map between ( ; F) and
RN ; (C) , de…ned by
! 7! f (!) = (f1 (!) ; :::; fk (!) ; :::)
De…ne
f
:
(C) ! [0; 1] by
f
(C) =
f
1
(C)
8C 2
8! 2 :
(C) :
Definition 6. Given a capacity , we say that f = ffn gn2N is ergodic if and
only if f is ergodic with respect to the shift transformation.
Lemma 1. If is a convex capacity continuous at and f is stationary, then
is a convex capacity continuous at RN which is shift invariant. Moreover, f is
ergodic if (T ) = f0; 1g.
f
ERGODIC THEOREM S FOR LOW ER PROBABILITIES
7
This observation is a …rst step to deduce the Strong Law of Large Numbers as a
corollary of Theorem 2 applied to f . In a nutshell, the assumption of stationarity
yields that the limit
n
1X
lim
fk
n n
k=1
exists -a.s. In order to obtain also a characterization of the limit in terms of the
(Choquet) expected value, we further need f to be ergodic.
Theorem 4. Let be a convex capacity continuous at . If f = ffn gn2N is
stationary and ergodic, then
)!
(
Z
Z
n
1X
fk (!)
f1 d
= 1:
!2 :
f1 d
lim
n n
k=1
We close by observing that there are few but important di¤erences with the nonadditive Strong Law of Large Numbers of Marinacci [15] and Maccheroni and Marinacci [14]. In terms of hypotheses, we weaken the assumption of total monotonicity
of to convexity, while we replace the i.i.d hypothesis of [15] with stationarity and
ergodicity. Finally, compared to the main result of [14], we need to assume the
continuity of . In turn, we obtain that empirical averages exist -a.s., a property
that was not present in previous works. The bounds for these empirical averages
are in terms of the lower and the upper Choquet integrals of the random variable
f1 , as in [15] and [14].
Appendix A. Dynkin Spaces and Nonadditive Probabilities
Consider a standard measurable space ( ; F) and a transformation : !
which is FnF measurable. Recall that we denote by I the set of all invariant
probability measures. If I is a nonempty set, then the triple ( ; F; I) forms a
Dynkin space.
Definition 7 (Dynkin, 1978). Let P be a nonempty subset of
( ; F) where
( ; F) is a separable measurable space. The triple ( ; F; P) is a Dynkin space if
and only if there exist a sub- -algebra G F, a set W 2 F, and a function
p: F
(A; !)
!
7!
[0; 1]
p (A; !)
such that:
(a) for each P 2 P and A 2 F, p (A; ) :
! [0; 1] is a version of the
conditional probability of A given G;
(b) for each ! 2 , p ( ; !) : F ! [0; 1] is a probability measure;
(c) P (W ) = 1 for all P 2 P and p ( ; !) 2 P for all ! 2 W .
It is not hard to check that, given f 2 B ( ; F), the function f^ :
de…ned by
Z
(A.1)
f^ (!) =
f dp ( ; !)
8! 2 ;
! R,
is a version of the conditional expected value of f given G for all P 2 P, in particular,
f^ 2 B ( ; G) (see also [5, Remark 13]). When ( ; F) is a standard measurable space,
8
S.CERREIA–VIOGLIO, F. M ACCHERONI, AND M . M ARINACCI
if ( ; F; P) = ( ; F; I), then G is the set of invariant events. In particular, we can
consider W = (see Gray [12, Theorem 8.3]). The next lemma is ancillary.4
Lemma 2. Let ( ; F) be a measurable space and G a sub- -algebra of F. If
is a lower probability such that (G) = f0; 1g and g 2 B ( ; G), then
Z
Z
!2 :
gd
g (!)
gd
= 1:
Appendix B. Proofs
Proof of Theorem 1. Recall that if
(B.1)
P
is a lower probability, we have that
8P 2 core ( )
( ; F) :
(i) implies (ii). Pick A 2 F. Since is strongly invariant and
, we have
1
1
1
(A) nA .
(A) nA
An 1 (A) =
(A) nA = An 1 (A)
1
1
1
1
It follows that An
(A) = An
(A) =
(A) nA =
(A) nA =
1
k. By (B.1), we can conclude that P An 1 (A) = k = P
(A) nA for all
1
P 2 core ( ). This implies that P (A) = P An
(A) + P A \ 1 (A) =
1
1
P
(A) nA + P A \ 1 (A) = P
(A) for all P 2 core ( ).
(ii) implies (iii). It is trivial.
Assume that is further convex and continuous at . Then, it is a lower
probability. We are left to show:
(iii) implies (i). Since is convex and core ( ) I, it follows that
Z
c
1
An 1 (A) +
A[
(A)
=
1 + 1A 1 1 (A) d
Z
=
min
1 + 1A 1 1 (A) dP = 1:
P 2core( )
Thus, we have that
An
1
(A) = 1
A[
1
(A)
c
1
=1
(A) nA
c
=
1
(A) nA :
1
An analogous argument yields that
(A) nA =
An 1 (A) , proving the
statement.
Before proving Theorem 2, we provide an ancillary key result.
Theorem 5. Let ( ; F) be a measurable space, a lower probability, and assume that the family I of invariant probability measures is not empty. The following
statements are equivalent:
(i) There exists P 2 I such that for each E 2 F
P (E) = 1 =) lim
k
k
(E) = 1;
(ii) There exists P 2 I such that for each E 2 G
P (E) = 1 =)
(iii) For each E 2 G
P (E) = 1
(E) = 1;
8P 2 I =)
(E) = 1;
4 The proof is standard and available in the arXiv working paper version of this work
(arXiv:1508.01329).
ERGODIC THEOREM S FOR LOW ER PROBABILITIES
9
(iv) For each f 2 B ( ; F) there exists f ? 2 B ( ; G) such that
n
1X
lim
f k 1 (!) = f ? (!)
a:s:;
n n
k=1
(v) core( )
PI.
k
Proof. (i) implies (ii). If E 2 G, then
(E) = E for all k 2 N, yielding the
statement.
(ii) implies (iii). It is trivial.
(iii) implies (iv). Consider f 2 B ( ; F). De…ne f ? : ! R by
n
1X
f k 1 (!)
f ? (!) = lim sup
8! 2 :
n
n
k=1
De…ne f? : ! R by considering the lim inf. Since f 2 B ( ; F), it can be shown
that f ? ; f? 2 B ( ; G). Consider the event
)
(
n
1X
k 1
f
(!) exists = f! 2 : f ? (!) = f? (!)g
E = ! 2 : lim
n n
k=1
)
(
n
1X
k 1
?
f
(!) = f? (!) :
= ! 2 : f (!) = lim
n n
k=1
By Birkho¤’s Ergodic Theorem (see [2, Theorem 24.1]), we have that P (E) = 1
for all P 2 I. By assumption, this yields that (E) = 1. Since f was chosen to be
generic, the statement follows.
(iv) implies (v). Recall that for each P 2 core ( ), P (A)
(A) for all A 2 F.
By assumption, we can conclude that for each P 2 core ( ), for each f 2 B ( ; F)
there exists f ? 2 B ( ; G) such that
n
1X
lim
f k 1 (!) = f ? (!)
P a:s:
n n
k=1
By [13, p. 964] (see also [10, Exercises 31 and 32, pp. 723–724]), it follows that
P 2 PI.
(v) implies (i). Since is a lower probability, it is continuous at and exact.
By [16, Theorem 4.2], it follows that there exists a measure P 2 core ( ) such that
for each A 2 F, for each " > 0, there exists > 0 such that
(B.2)
P (A) <
=) Q (A) < "
8Q 2 core ( ) :
It is immediate to show that P is such that for each A 2 F
(B.3)
P (A) = 0 =) Q (A) = 0
8Q 2 core ( ) :
Since P 2 core ( ) PI, we have that there exists P 2 I such that P (E) = P (E)
for all E 2 G. Consider E 2 F. Assume that P (E) = 1. It follows that P (E c ) = 0.
k
At the same time, de…ne Fn = [1
(E c ). Note that Fn # F 2 G. Since
k=n
P1
k
P 2 I, it follows that P (F ) = limn P (Fn )
P (F1 )
(E c ) =
k=1 P
0. It follows that P (F ) = 0, that is, P (F ) = 0. By (B.3), we have that
Q (F ) = 0 for all Q 2 core ( ), that is, (F ) = 0. Since is a lower probability, satis…es the Fatou’s property, that is, 0 lim supk (Ak )
(lim supk Ak )
k
for each sequence fAk gk2N
F. This implies that 0
lim inf k
(E c )
10
S.CERREIA–VIOGLIO, F. M ACCHERONI, AND M . M ARINACCI
k
lim supk
limk
k
(E c )
lim supk k (E c ) = (F ) = 0. We can conclude that
k
(E) = limk 1
(E c ) = 1, proving the statement.
The proof of Theorem 2 uses some of the techniques common in Ergodic Theory
(see, e.g., [7, Theorem 7]). Also, note that, given a capacity , we have that
core ( ) = fP 2
( ; F) :
P
g = fP 2
( ; F) :
Pg:
Proof of Theorem 2. We …rst prove that, given the assumptions, ; 6=core( )
PI. In particular, this shows that I =
6 ;.
Claim: Let be a lower probability. If
particular, I =
6 ;.
is invariant, then core ( )
PI. In
Proof of the Claim. Since is invariant, is invariant. Since is a lower
probability, is continuous at and, in particular, ; =
6 core ( )
( ; F). Fix
a Banach-Mazur limit (see [1, pag. 550]) : l1 ! R, that is, a functional from l1
to R such that:
(1)
(2)
(3)
(4)
is linear;
is positive;
(x1 ; x2 ; :::) = (x2 ; x3 :::) for all x 2 l1 ;
(x1 ; x2 ; :::) = limn xn for all x 2 c.
Observe that (A)
P (A)
(A) for all P 2 core ( ) and all A 2 F. Fix
P 2 core ( ), de…ne Pn : F ! [0; 1] by
Pn (A) =
n 1
1X
P
n
k
k=0
(A)
8A 2 F:
k
k
Note that P
(A)
(A) = (A) for all A 2 F and for all k 2 N0 .
Since core ( ) is convex, this implies that fPn gn2N
core ( ). For each A 2 F,
de…ne xA = (P1 (A) ; P2 (A) ; P3 (A) ; :::). Note that 0
xA
1N , thus, xA 2
l1 for all A 2 F. De…ne P^ : F ! [0; 1] by P^ (A) = (xA ) for all A 2 F.
Since
is positive, note that P^ is a well de…ned positive set function. Next,
consider A; B 2 F such that A \ B = ;. Since fPn gn2N
( ; F), it follows
that Pn (A [ B) = Pn (A) + Pn (B) for all n 2 N. Since is linear, this implies that
P^ (A [ B) = (xA[B ) = (xA + xB ) = (xA ) + (xB ) = P^ (A) + P^ (B), proving
k
that P^ is additive. Next, consider A 2 G. Since
(A) = A for all k 2 N0 , it
follows that Pn (A) = P (A) for all n 2 N. Since maps convergent sequences
into their limit, we have that P^ (A) = (xA ) = P (A). In particular, this implies
that P^ ( ) = 1 and P^ (;) = 0. Up to now, we have proved that P^ 2 ( ; F) and
P^ (A) = P (A) for all A 2 G. Since fPn gn2N core ( ), we have that xA
(A) 1N .
^
Since is linear and positive, it follows that P (A) = (xA )
( (A) 1N ) = (A)
for all A 2 F, that is, P^ 2 core ( ). Since core ( )
( ; F), we can conclude
that P^ 2
( ; F). We next show that P^ is invariant. Note that for each A 2 F
and for each n 2 N
Pn
1
(A) =
n 1
1X
P
n
n
k 1
k=0
n+1
=
Pn+1 (A)
n
(A) =
1 X
n+1
P
n
n+1
k=0
1
P (A) :
n
k
(A)
1
P (A)
n
ERGODIC THEOREM S FOR LOW ER PROBABILITIES
11
De…ne y = (P2 (A) ; P3 (A) ; :::). De…ne z = x 1 (A) y 2 l1 . Note that jzn j =
1
2
1
Pn
(A)
Pn+1 (A)
P (A)j
n jPn+1 (A)
n for all n 2 N. It follows that limn zn = 0. Since
satis…es properties 3, 1, and 4, we have that
1
^
^
x 1 (A)
(xA ) =
x 1 (A)
(y) = j (z)j =
P
(A)
P (A) =
^
0, proving that P is invariant. Given the previous part of the proof, P^ 2 I and
P 2 PI. Since P was arbitrarily chosen in core ( ), it follows that I 6= ; and
core ( ) PI.
By the previous claim and Theorem 5, the main statement follows.
Finally, assume that is further ergodic. By Lemma 2 and since f ? 2 B ( ; G)
and is an ergodic lower probability, it follows that
Z
Z
?
?
!2 :
f d
f (!)
f ?d
= 1:
Since
(
!2
: f ? (!) = limn
ability, this implies that
(
Z
!2 :
f ?d
1
n
n
X
f
k 1
(!)
k=1
)!
n
1X
lim
f
n n
k=1
k 1
(!)
= 1 and
Z
?
f d
is a lower prob)!
= 1;
proving the statement.
Proof of Corollary 1. It is the proof of the claim contained in the proof of
Theorem 2.
We next proceed by proving Theorem 3 and obtaining Corollary 2 as a corollary
of this former result. It is also possible to provide a proof of Corollary 2 as a
consequence of Theorem 2.
Lemma 3. Let fSn gn2N be a superadditive (resp., subadditive) sequence that
satis…es (3.4) and M a compact subset
If fan gn2N
R
R of invariant probability measures.
in R is de…ned by an = minP 2M Sn dP (resp., an = maxP 2M Sn dP ) for all
n 2 N, then fan gn2N is subadditive, that is, an+k an + ak for all n; k 2 N.
Proof. Since fSn gn2N satis…es (3.4), fSn gn2N
B ( ; F). We just prove the
statement for the superadditive case, being the subadditive one similarly proven.
If fSn gn2N is superadditive and M is a compactRsubset of invariant probability
R
measures, then we haveR that an+k = minRP 2M Sn+k dP
minP 2M
Sn +
R
n
n
Sk
dP
min
S
dP
+
min
S
dP
=
min
S
dP
+
P
2M
n
P
2M
k
P
2M
n
R
minP 2M Sk dP = an ak for all n; k 2 N, proving the statement.
Proof of Theorem 3. Since is a functionally invariant lower probability, we have
that M I. De…ne ffn gn2N
B ( ; F) by fn = Sn =n for all n 2 N. It follows
that f^n 2 B ( ; G) for all n 2 N. Since fSn gn2N satis…es (3.4), it follows that there
exists 2 R such that
fn ; f^n
for all n 2 N. De…ne f ? 2 B ( ; G) by f ? =
?
^
^
supn2N fn (resp., f = inf n2N fn ). By Kingman’s Subadditive Ergodic Theorem
(see Dudley [9, Theorem 10.7.1]
n and [12, Theorem 8.4]) and
o since W = , we
Sn (!)
?
?
^
! 2 : limn n = f (!)
= 1 for all P 2 M.
have that f = limn fn and P
n
o
Sn (!)
Since is a lower probability, it follows that
! 2 : limn n = f ? (!)
= 1,
proving the main part of the statement.
12
S.CERREIA–VIOGLIO, F. M ACCHERONI, AND M . M ARINACCI
1. If
(B.4)
is convex and strongly invariant, then we have that core ( )
Z
Z
f d = min
f dP
8f 2 B ( ; F) :
I and
P 2core( )
R
Consider the sequence fan gn2N de…ned by an =
Sn d for all n 2 N. By (B.4)
and Lemma 3, we have that fan gn2N is subadditive. It follows that (see [12, Lemma
8.3]) limn ann = inf n2N ann , that is,
(B.5)
lim
n
n o
Recall that f^n
n2N
an
an
= sup
:
n
n2N n
is uniformly bounded. By Cerreia-Vioglio, Maccheroni, Mari-
nacci, and Montrucchio [3, Theorem 22], (B.5), and the main part of the statement
and since core ( ) I, we have that
Z
Z
Z
Z
f^n dP
f^n d = lim
min
f ?d =
lim f^n d = lim
n
n
n
P 2core( )
R
Z
Z
Sn d
= lim
min
fn dP = lim
fn d = lim
n
n
n
n
P 2core( )
R
Z
Sn d
an
an
= lim
= sup
= sup
= sup
fn d ;
n
n
n
n
n2N n
n2N
proving point 1.
2. If
(B.6)
is convex and strongly invariant, then we have that core ( )
Z
Z
f d = max
f dP
8f 2 B ( ; F) :
I and
P 2core( )
R
Consider the sequence fan gn2N de…ned by an =
Sn d . By (B.6) and Lemma 3,
we have that fan gn2N is subadditive. It follows that (see [12, Lemma 8.3])
(B.7)
lim
n
n o
Recall that f^n
n2N
an
an
= inf
:
n n
n
is uniformly bounded. By [3, Theorem 22], (B.7), and the
main part of the statement and since core ( ) I, we have that
Z
Z
Z
Z
f ?d =
lim f^n d = lim
f^n d = lim
max
f^n dP
n
n
n
P 2core( )
R
Z
Z
Sn d
= lim
max
fn dP = lim
fn d = lim
n
n
n
n
P 2core( )
R
Z
Sn d
an
an
fn d ;
= lim
= inf
= inf
= inf
n n
n n
n
n2N
n
proving point 2.
3. By Lemma 2 and since is ergodic, it follows that
Z
Z
?
?
!2 :
f d
f (!)
f ?d
= 1:
ERGODIC THEOREM S FOR LOW ER PROBABILITIES
13
n
By the initial part of the proof, we have that
! 2 : f ? (!) = limn
1. Since is a lower probability, this implies that
Z
Z
Sn (!)
?
!2 :
f d
lim
f ?d
= 1;
n
n
Sn (!)
n
o
=
proving the statement.
Proof of Corollary 2. Pick f 2 B ( ; F). It is immediate to see that fSn gn2N ,
Pn
k 1
de…ned by Sn =
for all n 2 N, is an additive sequence which
k=1 f
satis…es (3.4). Since is convex, continuous at , and strongly invariant, it is
a functionally invariant lower probability. De…ne ffn gn2N by fn = Sn =n for all
n 2 N. Note that f^n = f^ for all n 2 N. By the proof of Theorem 3, we have that
limn Snn = limn f^n = f^,
a:s:, proving the main statement and point 1 where
f ? = f^.
R 2. Since is convexR and strongly invariant, then we have that core ( ) I and
f d = minP 2core( ) f dP . By point 1 and since core ( )
I, we have that
R
R
R
R
f d = minP 2core( ) f dP = minP 2core( ) f^dP =
f^d , proving point 2.
R
R
R
R
Note also that
f d = maxP 2core( ) f dP = maxP 2core( ) f^dP =
f^d .
3. By point 3 of Theorem 3 and the proof of point 2, the statement follows.
Proof of Lemma 1. Consider a convex capacity and a process f . It is immediate to see that f is a convex capacity. Next, consider fCn gn2N
(C) such
that Cn " RN . It follows that the sequence fAn gn2N , de…ned by An = f 1 (Cn )
for all n 2 N, is such that An " . Since is continuous at , we have that
limn f (Cn ) = limn f 1 (Cn ) = limn (An ) = 1, proving that f is continuous
at RN . Next, consider C 2 C. Then, there exist k 2 N and E 2 B Rk such that
C = x 2 RN : (x1 ; :::; xk ) 2 E . Note that 1 (C) = fx 2 RN : (x1 ; x2 ; :::; xk+1 ) 2
R Eg. Since f is stationary, it follows that
f
(C) =
=
=
=
f
1
(C) =
(f! 2
(f! 2
f
1
(f! 2
: (f1 (!) ; :::; fk (!)) 2 Eg)
: (f2 (!) ; :::; fk+1 (!)) 2 Eg)
: (f1 (!) ; f2 (!) ; :::; fk+1 (!)) 2 R
1
(C)
=
f
1
Eg)
(C) :
Since C 2 C was arbitrarily chosen, it follows that C
fC 2 (C) : f (C) =
1
(C) g
(C). Since f is convex and continuous at RN , we have that fC 2
f
1
(C) : f (C) = f
(C) g is a monotone class. By the Monotone Class Theorem
1
(see [2, Theorem 3.4]), it follows that (C) = C 2 (C) : f (C) = f
(C) ,
1
\
1
that is, f is shift invariant. De…ne H =
Ck+1
\ (C).5 Note that f 1 (H)
k=1
T . Thus, f (H) = f0; 1g if (T ) = f0; 1g. Let G be the -algebra of shift invariant
events. It is well known that G H. In light of these observations, it is immediate
to see that if (T ) = f0; 1g, then f (G) = f0; 1g, that is, f is ergodic.
5C1
k+1
is the class of cylinders such that
n
C = x 2 RN : (x1 ; :::; xk ; xk+1 ; :::; xk0 ) 2 Rk
where k0 > k and E 2 B(Rk
0
k ).
E
o
14
S.CERREIA–VIOGLIO, F. M ACCHERONI, AND M . M ARINACCI
Proof of Theorem 4. By induction and since f is stationary, it follows that for
each k 2 N and for each Borel subset B of R
(B.8)
(f! 2
: f1 (!) 2 Bg) =
(f! 2
: fk (!) 2 Bg) :
By (B.8), this implies that for each k 2 N and for each Borel subset B of R
x 2 RN : x k 2 B
f
=
(f! 2
: fk (!) 2 Bg) =
(f! 2
: f1 (!) 2 Bg) :
In particular, since ffn gn2N
B ( ; F), it follows that there exists m 2 R such
that m1
f1 m1 . If we replace B with [ m; m], then we can conclude that
(B.9)
x 2 RN : xk 2 [ m; m] = (f! 2 : f1 (!) 2 [ m; m]g) = 1 8k 2 N:
f
: RN ! R by
De…ne
x1
0
(x) =
if x1 2 [ m; m]
otherwise
8x 2 RN :
It is immediate to see that 2 B RN ; (C) . Note also that
(B.10)
)
(
n
n
1
1
\
\
1X
1X
k 1
N
N
(x) =
xk :
x2R :
x 2 R : xk 2 [ m; m]
n
n
n=1
k=1
k=1
k=1
By (B.9) and (B.10) and since f is a convex capacity which is further continuous
at RN , it follows that
(
)!
1
n
n
\
1X
1X
N
k 1
(B.11)
x2R :
(x) =
xk
= 1:
f
n
n
n=1
k=1
k=1
By Theorem 2 and since f is shift invariant and ergodic, we have that there exists
?
2 B RN ; G such that
(B.12)(
)!
Z
Z
n
1X
N
?
k 1
?
?
x2R :
d f lim
(x) =
(x)
d f
= 1:
f
n n
RN
RN
k=1
By (B.11) and (B.12) and since
(
Z
N
?
(B.13)
x
2
R
:
d
f
is convex, we can conclude that
)!
Z
n
1X
?
?
xk =
(x)
d f
= 1:
lim
f
n n
RN
RN
k=1
Pn
Pn
k 1
k 1
Let E = x 2 RN : limn n1 k=1
(x) = ? (x) and n = n1 k=1
for all n 2 N. By (B.12), we have that P (E) = 1 for all P 2 core ( f ). By
construction, f1E n gn2N
B RN ; (C) is a uniformly bounded sequence which
converges pointwise to 1E ? . By [3, Theorem 22] and since f is convex and
P (E) = 1 for all P 2 core ( f ), this implies that
(B.14)
Z
Z
Z
Z
Z
?
d
f
=
1E
RN
?
d
f
=
RN
Next, since
Z
RN
f
lim 1E
RN
n
nd f
= lim
n
1E
RN
nd f
= lim
n
nd f :
RN
is convex and shift invariant, note that for each n 2 N
Z
Z
n
n Z
1X
1X
k 1
k 1
d f
d f =
d f:
nd f =
n
RN n
RN
RN
f
k=1
k=1
ERGODIC THEOREM S FOR LOW ER PROBABILITIES
15
R
R
By
itR follows that RN ? d f R RN d f . R A similar argument
yields
R (B.14),
R
R that
?
d
d
.
Finally,
since
d
=
f
d
and
d
=
f1 d ,
f
f
f
1
f
RN
RN
RN
RN
by (B.13), we can conclude that
(
)!
Z
Z
n
1X
N
1= f
x2R :
d f lim
xk
d f
n n
RN
RN
k=1
)!
(
Z
Z
n
1X
fk (!)
f1 d
;
=
!2 :
f1 d
lim
n n
k=1
proving the statement.
References
[1] C. D. Aliprantis and K. Border, In…nite Dimensional Analysis, 3rd ed., Springer, New York,
2006.
[2] P. Billingsley, Probability and Measure, 3rd ed., John Wiley & Sons, New York, 1995.
[3] S. Cerreia-Vioglio, F. Maccheroni, M. Marinacci, and L. Montrucchio, Signed Integral Representations of Comonotonic Additive Functionals, Journal of Mathematical Analysis and
Applications, 385, 895-912, 2012.
[4] S. Cerreia-Vioglio, F. Maccheroni, M. Marinacci, and L. Montrucchio, Choquet Integration on
Riesz Spaces and Dual Comonotonicity, Transactions of the American Mathematical Society,
forthcoming.
[5] S. Cerreia-Vioglio, F. Maccheroni, M. Marinacci, and L. Montrucchio, Ambiguity and Robust
Statistics, Journal of Economic Theory, 974-1049, 2013.
[6] F. Delbaen, Convex Games and Extreme Points, Journal of Mathematical Analysis and
Applications, 45, 210-233, 1974.
[7] Y. N. Dowker, Invariant Measure and the Ergodic Theorems, Duke Mathematical Journal,
4, 1051-1061, 1947.
[8] Y. N. Dowker, Finite and -Finite Invariant Measures, Annals of Mathematics, 4, 595-608,
1951.
[9] R. M. Dudley, Real Analysis and Probability, 2nd ed., Cambridge University Press, Cambridge, 2002.
[10] N. Dunford and J. T. Schwartz, Linear Operators; Part I: General Theory, Wiley, New York,
1958.
[11] E. B. Dynkin, Su¢ cient Statistics and Extreme Points, The Annals of Probability, 6, 705-730,
1978.
[12] R. M. Gray, Probability, Random Processes, and Ergodic Properties, 2nd ed., Springer, New
York, 2009.
[13] R. M. Gray and J. C. Kie¤er, Asymptotically Mean Stationary Measures, The Annals of
Probability, 8, 962-973, 1980.
[14] F. Maccheroni and M. Marinacci, A Strong Law of Large Numbers for Capacities, The Annals
of Probability, 33, 1171-1178, 2005.
[15] M. Marinacci, Limit Laws for Non-additive Probabilities and Their Frequentist Interpretation,
Journal of Economic Theory, 84, 145-195, 1999.
[16] M. Marinacci and L. Montrucchio, Introduction to the Mathematics of Ambiguity, in Uncertainty in Economic Theory, Routledge, New York, 2004.
[17] D. Schmeidler, Cores of Exact Games, I, Journal of Mathematical Analysis and Applications,
40, 214-225, 1972.
[18] D. Schmeidler, Integral Representation without Additivity, Proceedings of the American
Mathematical Society, 97, 255-261, 1986.
[19] P. Walley, Statistical Reasoning with Imprecise Probabilities, Chapman and Hall, London,
1991.
Università Bocconi