TRANSACTIONS OF THE
AMERICAN MATHEMATICAL SOCIETY
Volume 359, Number 3, March 2007, Pages 1081–1097
S 0002-9947(06)04322-4
Article electronically published on October 17, 2006
INVARIANCE PRINCIPLES FOR ITERATED MAPS
THAT CONTRACT ON AVERAGE
C. P. WALKDEN
Abstract. We consider iterated function schemes that contract on average.
Using a transfer operator approach, we prove a version of the almost sure
invariance principle. This allows the system to be modelled by a Brownian
motion, up to some error term. It follows that many classical statistical properties hold for such systems, such as the weak invariance principle and the law
of the iterated logarithm.
1. Introduction and statement of results
1.1. Background. The study of the limiting behaviour of the sum of a sequence
of observations or random variables is a key problem in dynamical systems and
probability theory. For example, the ergodic theorem (alternatively, the strong law
of large numbers) describes the average behaviour of such sums, and the central
limit theorem describes the deviations of these sums from the average. One can
then look for extensions of these results.
One such extension is the celebrated almost sure invariance principle [S1, S2]. This says the following: let $X_j$ be a sequence of i.i.d. random variables with finite $(2+\delta)$-moments and let $S_n = X_0 + \cdots + X_{n-1}$. Then there exists a Brownian motion $B$ and a probability space $\Omega$ on which $B$ and $S_n$ can be redefined such that $S_n = B(n) + o(n^{1/2})$. This immediately allows one to deduce many classical statistical properties (such as the strong law of large numbers and various refinements of the central limit theorem) for $S_n$, given that they are known to hold for $B$. A general method for proving the almost sure invariance principle for sums of weakly dependent random variables can be found in [PS]; here the error term is of the form $O(n^{1/2-\delta})$ for some $\delta > 0$.
In the context of a dynamical system $T : X \to X$, it is natural to consider sums of the form $S_n = \sum_{j=0}^{n-1} f \circ T^j$. The almost sure invariance principle in the case when $T$ is a uniformly hyperbolic diffeomorphism or flow is now well understood [DK], again with an error term of the form $O(n^{1/2-\delta})$ for some $\delta > 0$.
More recently, the almost sure invariance principle has been re-examined for
a variety of partially hyperbolic maps [FMT, MT] and non-uniformly hyperbolic
maps with indifferent fixed points [PoS] using the spectral properties of a transfer
operator. The existence of strong spectral properties (in particular a spectral gap)
allows one to deduce an improved error term of the form $O(n^{1/4+\varepsilon})$ for any $\varepsilon > 0$.
Received by the editors March 19, 2003 and, in revised form, November 25, 2004.
2000 Mathematics Subject Classification. Primary 60F17; Secondary 37H99, 37A50.
©2006 American Mathematical Society
Reverts to public domain 28 years from publication
License or copyright restrictions may apply to redistribution; see http://www.ams.org/journal-terms-of-use
The purpose of this note is to prove an almost sure invariance principle with an error term of the form $O(n^{1/4+\varepsilon})$ for any $\varepsilon > 0$ for a sequence of sums of observations arising from an iterated function scheme where the maps satisfy a notion of "contraction on average", a form of non-uniform hyperbolicity.
1.2. Statement of results. Let (X, d) be a metric space. Consider a family of
Lipschitz maps Tj : X → X, j = 1, . . . , M . We are interested in studying the
statistical properties of the iterated function scheme formed by applying the maps
Tj chosen at random according to a Markov transition probability.
Let $p_j : X \to [0,1]$, $j = 1, \ldots, M$, be continuous maps such that $\sum_{j=1}^{M} p_j(x) = 1$ for all $x \in X$. Define a Markov transition probability by
$$p(x, A) = \sum_{j=1}^{M} p_j(x)\, \chi_A(T_j x)$$
for each Borel subset $A \subset X$. (Here $\chi_A$ denotes the characteristic function of $A$.)
We say that the system contracts on average if there exists $r \in (0,1)$ such that for all $x_1, x_2 \in X$ we have
$$\sum_{j=1}^{M} d(T_j x_1, T_j x_2)\, p_j(x_1) \le r\, d(x_1, x_2). \tag{1}$$
Examples of such systems include certain affine systems $T_j : \mathbb{R}^d \to \mathbb{R}^d : x \mapsto A_j x + b_j$, where $A_j \in \mathrm{GL}(d, \mathbb{R})$ and $b_j \in \mathbb{R}^d$; see §2.4. Other examples and applications are discussed in [DF].
We also assume that the $p_j$ are strictly positive, continuous and satisfy a Dini condition (cf. [E1]).
With these assumptions, it is known [BDEG] that there exists a unique attractive stationary Borel probability measure $\nu$ on $X$, i.e. for all Borel sets $A$,
$$\int p(x, A)\, d\nu(x) = \nu(A). \tag{2}$$
Let $\Sigma = \{ i = (i_0, i_1, \ldots) \mid 1 \le i_j \le M \}$ denote the one-sided full shift on $M$ symbols. For each $x \in X$ we define a probability measure $\mu_x$ on $\Sigma$ by defining $\mu_x$ on cylinder sets by
$$\mu_x[i_0, i_1, \ldots, i_{n-1}] = p_{i_0}(x)\, p_{i_1}(T_{i_0} x) \cdots p_{i_{n-1}}(T_{i_{n-2}} \cdots T_{i_0} x).$$
For each $x \in X$ and $i \in \Sigma$ we define
$$Z_k(x, i) = T_{i_{k-1}} \cdots T_{i_0}(x).$$
Then $Z_k(x, i)$ is an $X$-valued Markov chain with respect to $\mu_x$, with initial state $x$ and transition probability $p$.
We can relate $\mu_x$ with $\nu$ as follows. Define $\pi_x(i) = \lim_{k\to\infty} T_{i_0} T_{i_1} \cdots T_{i_k}(x)$ for $\mu_x$-a.e. $i \in \Sigma$. Then for all $x \in X$ we have $(\pi_x)_* \mu_x = \nu$. See, for example, [E2].
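As a concrete illustration of the chain $Z_k(x,i)$ and of condition (1), the following sketch simulates a hypothetical one-dimensional example (this specific system is an assumption of ours, not taken from the paper): $T_1(x) = x/2$, $T_2(x) = (x+1)/2$, each chosen with probability $1/2$, for which (1) holds with $r = 1/2$.

```python
import random

# Hypothetical example (not from the paper): two affine maps on R,
#   T_1(x) = x/2,  T_2(x) = (x + 1)/2,  p_1 = p_2 = 1/2.
# Here sum_j d(T_j x1, T_j x2) p_j(x1) = |x1 - x2|/2, so (1) holds with r = 1/2.
T = [lambda x: x / 2, lambda x: (x + 1) / 2]

def Z_path(x, i):
    """Z_k(x, i) = T_{i_{k-1}} ... T_{i_0}(x) along a fixed symbol sequence i."""
    path = [x]
    for j in i:
        x = T[j](x)
        path.append(x)
    return path

rng = random.Random(0)
i = [rng.randrange(2) for _ in range(40)]   # one random i in Sigma
# Orbits of two different initial states driven by the same symbols coalesce
# at rate r^n, which is how contraction on average shows up in simulation:
gap = abs(Z_path(0.0, i)[-1] - Z_path(1.0, i)[-1])
print(gap)  # (1/2)**40, roughly 9.1e-13
```

In this toy example every map is a strict contraction, so the coalescence is deterministic; §2.3 below gives an example where only the average contracts.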
Let $f : X \to \mathbb{R}$ be a bounded and continuous function on $X$. We are interested in the distribution of the sequence of observations
$$f^n(x, i) = \sum_{k=0}^{n-1} f Z_k(x, i).$$
It is known [E1] that $f^n$ satisfies a pointwise ergodic theorem (or strong law of large numbers): for all $x \in X$ and $\mu_x$-a.e. $i \in \Sigma$,
$$\frac{1}{n} f^n(x, i) \to \nu(f). \tag{3}$$
A central limit theorem is also known [Pe]. Let $f : X \to \mathbb{R}$ be a bounded Lipschitz function and fix $x \in X$. Then
$$\frac{1}{\sqrt{n}} f^n(x, \cdot) \to_d N(\nu(f), \sigma^2(f)), \tag{4}$$
provided that the variance $\sigma^2(f) \ne 0$. Here $N(\nu(f), \sigma^2(f))$ denotes the normal distribution with mean $\nu(f)$ and variance $\sigma^2(f)$, and $\to_d$ denotes convergence in distribution. The variance is given by
$$\sigma^2(f) = \lim_{n\to\infty} \frac{1}{n} \int \big( f^n(x, i) - n\nu(f) \big)^2 \, d\mu_x(i).$$
This quantity is independent of the choice of origin $x$. Error terms and estimates on the rate of convergence in (4) (Berry-Esseen bounds) can be found in [Po].
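To make (3) and (4) concrete, here is a hypothetical Monte Carlo sketch (the example and all numerical values are our assumptions, not the paper's): the binary IFS $T_1(x) = x/2$, $T_2(x) = (x+1)/2$ with $p_1 = p_2 = 1/2$ has Lebesgue measure on $[0,1]$ as its stationary measure $\nu$. For $f(x) = x - 1/2$ one checks $Pf = f/2$ for the transfer operator of §4.2, so $\nu(f) = 0$, and assuming the usual Green-Kubo series for the variance, $\sigma^2(f) = \nu(f^2) + 2\sum_{k\ge 1}\nu(f\,P^k f) = 1/12 + 2/12 = 1/4$.

```python
import random

# Hypothetical binary IFS: T_1(x) = x/2, T_2(x) = (x+1)/2, p_1 = p_2 = 1/2.
# Stationary measure nu = Lebesgue on [0,1]; f(x) = x - 1/2 has nu(f) = 0,
# and the Green-Kubo sum gives sigma^2(f) = 1/4, i.e. sigma(f) = 1/2.
T = [lambda x: x / 2, lambda x: (x + 1) / 2]
f = lambda x: x - 0.5

def birkhoff_sum(x, n, rng):
    """f^n(x, i) = sum_{k=0}^{n-1} f(Z_k(x, i)) along one random symbol path i."""
    s = 0.0
    for _ in range(n):
        s += f(x)
        x = T[rng.randrange(2)](x)
    return s

rng = random.Random(1)
n, trials = 500, 2000
samples = [birkhoff_sum(0.3, n, rng) / n ** 0.5 for _ in range(trials)]

mean = sum(samples) / trials
std = (sum((s - mean) ** 2 for s in samples) / trials) ** 0.5
print(mean, std)  # mean near 0, std near sigma(f) = 0.5, consistent with (4)
```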
The purpose of this note is to prove more refined statistical properties of f n . In
particular, we deduce the following results. (In the statements below, C([0, 1], R)
denotes the space of continuous functions on [0, 1], and D([0, 1], R) denotes the
space of right-continuous functions on [0, 1] that have left limits.)
Proposition 1.1 (Weak invariance principle). Let $f : X \to \mathbb{R}$ be a bounded Lipschitz function. Suppose that $\nu(f) = 0$ and $\sigma^2(f) > 0$. Fix $x \in X$. For each $n > 0$ define the random function $\zeta_{n,x} : \Sigma \to D([0,1], \mathbb{R})$ by
$$\zeta_{n,x}(i)(t) = \frac{f^{[nt]}(x, i)}{\sigma(f)\sqrt{n}}.$$
Then for every $x \in X$ and $\mu_x$-a.e. $i \in \Sigma$, the sequence of measures $(\zeta_{n,x})_* \mu_x$ converges weakly as $n \to \infty$ to the standard Wiener measure on $C([0,1], \mathbb{R})$.
The weak invariance principle is also known as the functional central limit theorem. It immediately implies the central limit theorem (set t = 1).
Proposition 1.2 (Functional law of the iterated logarithm). Let $f : X \to \mathbb{R}$ be a bounded Lipschitz function. Suppose that $\nu(f) = 0$ and $\sigma^2(f) > 0$. Fix $x \in X$. For each $n > 0$ define the random function $\xi_{n,x} : \Sigma \to D([0,1], \mathbb{R})$ by
$$\xi_{n,x}(i)(t) = \frac{1}{\sigma(f)\sqrt{2n \log\log n}}\, f^{[nt]}(x, i).$$
Then for each $x \in X$, $\{\xi_{n,x} \mid n \in \mathbb{N}\}$ has uniformly compact closure in $D([0,1], \mathbb{R})$. Moreover, the closure is equal to the set
$$\Big\{ \xi : [0,1] \to \mathbb{R} \;\Big|\; \xi \text{ is absolutely continuous},\ \xi(0) = 0,\ \int_0^1 \dot{\xi}(s)^2\, ds \le 1 \Big\}.$$
The FLIL immediately implies the law of the iterated logarithm (set t = 1).
Corollary 1.3 (Law of the iterated logarithm). Let $f : X \to \mathbb{R}$ be a bounded Lipschitz function. Suppose that $\nu(f) = 0$ and $\sigma^2(f) > 0$. Then for all $x \in X$ and $\mu_x$-a.e. $i \in \Sigma$,
$$\limsup_{n\to\infty} \frac{f^n(x, i)}{\sigma(f)\sqrt{2n \log\log n}} = 1.$$
The above results are well-known corollaries of an almost sure invariance principle (Theorem 1.4 below). Other standard corollaries of the almost sure invariance
principle are discussed in more detail in [HH].
Theorem 1.4 (Almost sure invariance principle). Suppose that $f : X \to \mathbb{R}$ is a bounded Lipschitz function and is such that $\nu(f) = 0$ and $\sigma^2 = \sigma^2(f) > 0$. Fix $x \in X$. Then there exist a probability space $(\Omega, \mathcal{F}, P)$ and a one-dimensional Brownian motion $W : \Omega \to C(\mathbb{R}^+, \mathbb{R})$ such that the random variable $\omega \mapsto W(\omega)(t)$ has variance $\sigma^2 t$, and sequences of random variables $\phi_{n,x} : \Sigma \to \mathbb{R}$, $\psi_{n,x} : \Omega \to \mathbb{R}$ with the following properties:
(i) for all $\varepsilon > 0$, we have $f^n(x, \cdot) = \phi_{n,x}(\cdot) + O(n^{1/4+\varepsilon})$ $\mu_x$-a.e.;
(ii) the sequences $\{\phi_{n,x}(\cdot)\}_n$ and $\{\psi_{n,x}(\cdot)\}_n$ are equal in distribution;
(iii) for all $\varepsilon > 0$, we have
$$\psi_{n,x}(\cdot) = W(\cdot)(n) + O(n^{1/4+\varepsilon}) \quad P\text{-a.e.} \tag{5}$$
Remark. Theorem 1.4 holds for a more general class of observables $f$. See §4.
This note is organised as follows. In §2 we remark on some generalisations and
examples of the set-up described above. The Skorokhod Embedding Theorem, a key
tool in our analysis, is stated in §3. In §4 we define a transfer operator acting on a
suitable family of function spaces and state its spectral properties. The sequence of
observations f n is reduced to a martingale in §5. Following some moment estimates
in §6, we prove Theorem 1.4(i), (ii). The error term in Theorem 1.4(iii) is estimated
in §7.
2. Examples, remarks and generalisations
2.1. Generalisation to countably many transformations. In §1 we stated our
results for iterated function schemes consisting of finitely many transformations
Tj . Our results continue to hold in the more general case of countably many
transformations provided that one makes the following additional assumptions:
(i)
$$\sup_{x,y,z \in X,\, y \ne z} \sum_{j=0}^{\infty} \frac{d(T_j y, T_j z)}{d(y, z)}\, p_j(x) < \infty;$$
(ii) for some (hence any) choice of $x_0$,
$$\sup_{x,y \in X} \sum_{j=0}^{\infty} \frac{d(T_j y, x_0)}{1 + d(y, x_0)}\, p_j(x) < \infty;$$
(iii) for some (hence any) choice of $x_0$,
$$\sup_{x \in X} \sum_{j=0}^{\infty} \frac{d(T_j x, x_0)}{1 + d(x, x_0)} \sup_{y,z \in X,\, y \ne z} \frac{|p_j(y) - p_j(z)|}{d(y, z)} < \infty.$$
The above conditions are required to ensure that a suitably defined transfer operator (see §4.2) has the appropriate spectral properties [Pe]. Note that the above
conditions are trivially satisfied in the case of finitely many Tj s.
2.2. Generalisation to non-negative probabilities. In §1 we assumed that the
probabilities pj are strictly positive. In addition to the generalisations given in §2.1,
we can weaken this to $p_j(x) \ge 0$, provided that we assume the following: for all $x, y \in X$ there exist $i, j \in \Sigma$ such that
$$\lim_{n\to\infty} d(Z_n(x, i), Z_n(y, j))\,\big( 1 + d(Z_n(y, j), x_0) \big) = 0$$
with $p_{i_n}(Z_n(x, i)) > 0$ and $p_{j_n}(Z_n(y, j)) > 0$ for all $n \ge 1$. Again, these conditions
are required to ensure that a certain transfer operator has the appropriate spectral
properties [Pe].
2.3. Recoding. It may happen that a given system fails to satisfy (1) but does satisfy this condition after it has been recoded. For example, take $X = \mathbb{R}^2$ and $T_1(x, y) = (5x/4, y/4)$ and $T_2(x, y) = (x/4, 5y/4)$, both chosen with probability $1/2$. This system does not satisfy (1). However, the recoded system $T_{i,j} = T_i \circ T_j$, $i, j \in \{1, 2\}$, each chosen with probability $1/4$, does satisfy (1). Hence Theorem 1.4 holds for the recoded system. It is then simple to see that the conclusions of Theorem 1.4 hold for the original system. Note that neither $T_1$ nor $T_2$ is contractive.
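Since the probabilities in this example are constant, condition (1) reduces (as in §2.4 below) to $\sum_j \|A_j\|\, p_j < 1$ for the linear parts $A_j$, and the comparison can be checked directly. A quick sketch of that arithmetic:

```python
# The maps in the recoding example are diagonal, so the operator 2-norm of
# each linear part is just the largest |diagonal entry|.  With constant
# probabilities, (1) reduces to sum_j ||A_j|| p_j < 1 (see section 2.4).
A = [(5 / 4, 1 / 4), (1 / 4, 5 / 4)]               # linear parts of T_1, T_2

norm = lambda a: max(abs(a[0]), abs(a[1]))         # 2-norm of a diagonal matrix
compose = lambda a, b: (a[0] * b[0], a[1] * b[1])  # diagonal matrix product

# Original system: T_1, T_2 with probability 1/2 each.
orig = sum(0.5 * norm(a) for a in A)
# Recoded system: the four compositions T_i o T_j with probability 1/4 each.
recoded = sum(0.25 * norm(compose(a, b)) for a in A for b in A)

print(orig, recoded)  # 1.25 and 0.9375: only the recoded system satisfies (1)
```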
2.4. Example: Affine maps. Specific examples of systems satisfying our hypotheses can be given by using affine maps; see [Pe, DF]. Let $A_j \in \mathrm{GL}(d, \mathbb{R})$ and $b_j \in \mathbb{R}^d$. Define
$$T_j : \mathbb{R}^d \to \mathbb{R}^d : x \mapsto A_j x + b_j.$$
In this setting, the contraction on average condition (1) becomes
$$\sup_{x \in \mathbb{R}^d} \sum_j \|A_j\|\, p_j(x) < 1,$$
where $\|A\| = \sup_{x \ne 0} \|Ax\| / \|x\|$ denotes the matrix norm of $A$.
3. The Skorokhod embedding theorem
Let (Ω, F, P ) be a probability space.
Definition. A stochastic process W : Ω → C(R+ , R) is called a Brownian motion
(with mean zero and variance σ 2 > 0) if:
(i) for P -a.e. ω ∈ Ω, we have W (ω)(0) = 0;
(ii) there exists σ 2 > 0 such that for each t > 0, the random variable
ω → W (ω)(t) : Ω → R
is normally distributed with mean zero and variance σ 2 t;
(iii) for all t0 < t1 < · · · < tn the random variables
ω → W (ω)(ti ) − W (ω)(ti−1 )
are independent.
Suppose that Fn is an increasing sequence of sub-σ-algebras of F.
Definition. A sequence of random variables $S_n$ is called a martingale if, for each $n$, $S_n$ is $\mathcal{F}_n$-measurable and $E(S_n \mid \mathcal{F}_{n-1}) = S_{n-1}$ a.s. We call $X_n = S_n - S_{n-1}$ a martingale difference. Setting $X_0 = S_0$, we write $S_n = \sum_{j=0}^{n} X_j$.
The following standard result [HH, p. 269] allows us to model a martingale by a
Brownian motion.
License or copyright restrictions may apply to redistribution; see http://www.ams.org/journal-terms-of-use
1086
C. P. WALKDEN
Proposition 3.1 (Skorokhod Embedding Theorem). Let $\{\phi_n = \sum_{j=0}^{n-1} X_j, \mathcal{B}_n\}$ be a square integrable, zero mean martingale on a probability space $(X, \mathcal{B}, \mu)$. Then there exist a probability space $(\Omega, \mathcal{F}, P)$, a standard Brownian motion $W : \Omega \to C(\mathbb{R}^+, \mathbb{R})$ and a sequence of non-negative random variables $\tau_0, \tau_1, \ldots$ with the following properties: let $T_n = \sum_{j=0}^{n-1} \tau_j$, $\psi_n = W(T_n) : \Omega \to \mathbb{R}$, and denote by $\mathcal{G}_n$ the $\sigma$-algebra generated by $\psi_0, \ldots, \psi_{n-1}$ and $W(t)$, $0 \le t \le T_n$. Then:
(i) the sequences $\{\phi_n\}_n$ and $\{\psi_n\}_n$ are equal in distribution;
(ii) $T_n$ is $\mathcal{G}_n$-measurable;
(iii) $E(\tau_n \mid \mathcal{G}_{n-1}) = E((\psi_n - \psi_{n-1})^2 \mid \mathcal{G}_{n-1})$ a.s.;
(iv) for each $r > 1$ there exists a constant $C_r < \infty$ such that $E(\tau_n^r \mid \mathcal{G}_{n-1}) \le C_r\, E((\psi_n - \psi_{n-1})^{2r} \mid \mathcal{G}_{n-1})$ a.s.
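The following toy sketch (an illustration under our own assumptions, not the construction used in the proof) embeds a single $\pm 1$ coin flip into a Brownian motion: stopping $W$ at the first time $\tau$ that $|W| = 1$ gives $W(\tau)$ distributed as the coin flip, with $E(\tau) = E(W(\tau)^2) = 1$, in the spirit of Proposition 3.1(iii). The Brownian path is approximated by a Gaussian random walk with a small time step, so the simulated stopping times carry a slight discretization bias.

```python
import random

# Toy Skorokhod embedding of a Rademacher variable X (+1 or -1 with prob 1/2):
# tau = first time |W| reaches 1.  Then W(tau) ~ X and E(tau) = E(X^2) = 1.
def embed_once(rng, dt=1e-3):
    w, t = 0.0, 0.0
    while abs(w) < 1.0:
        w += rng.gauss(0.0, dt ** 0.5)  # Brownian increment over time dt
        t += dt
    return w, t

rng = random.Random(2)
draws = [embed_once(rng) for _ in range(1000)]
mean_tau = sum(t for _, t in draws) / len(draws)
frac_plus = sum(1 for w, _ in draws if w > 0) / len(draws)
print(mean_tau, frac_plus)  # E(tau) close to 1; W(tau) = +/-1 with prob ~1/2
```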
4. Function spaces and transfer operators
4.1. Function spaces. We will be interested in the following family of function spaces. Fix a choice of origin $x_0 \in X$ (the exact choice of $x_0$ is immaterial). Let $\alpha, \beta \in (0, 1)$ and define $a(t) = 1 + t^\alpha$, $b(t) = t^\beta$. Define
$$C_{a,b}(X, \mathbb{R}) = \{ f \in C(X, \mathbb{R}) \mid \|f\|_{a,b} < \infty \},$$
where $\|f\|_{a,b} = |f|_a + |f|_{a,b}$ and
$$|f|_a = \sup_{x \in X} \frac{|f(x)|}{a(d(x, x_0))}, \qquad |f|_{a,b} = \sup_{x,y \in X,\, x \ne y} \frac{|f(x) - f(y)|}{a(d(x, x_0))\, b(d(x, y))}.$$
The following is easily proved from the definitions.
Lemma 4.1. Let $f_j \in C_{a,b}(X, \mathbb{R})$ for $j = 1, 2, \ldots, \ell$. Then $\prod_{j=1}^{\ell} f_j \in C_{a^\ell, b+b^\ell}(X, \mathbb{R})$. Moreover,
$$\|f_1 f_2\|_{a^2, b+b^2} \le \|f_1\|_{a,b}\, \|f_2\|_{a,b}. \tag{6}$$
We shall use the following simple observation in what follows.
Lemma 4.2. Let a and b be as above. Let f : X → R be a bounded Lipschitz
function. Then f ∈ Ca,b (X, R).
Proof. Clearly $|f|_a = \sup_x |f(x)| / a(d(x, x_0)) \le |f|_\infty$, where $|f|_\infty = \sup_{x \in X} |f(x)|$. Similarly,
$$|f|_{a,b} \le \sup_{x,y \in X,\, d(x,y) \le 1} \frac{|f(x) - f(y)|}{a(d(x, x_0))\, b(d(x, y))} + \sup_{x,y \in X,\, d(x,y) \ge 1} \frac{|f(x) - f(y)|}{a(d(x, x_0))\, b(d(x, y))} \le \sup_{x,y \in X,\, d(x,y) \le 1} \frac{|f|_{\mathrm{Lip}}\, d(x, y)}{a(d(x, x_0))\, b(d(x, y))} + \sup_{x,y \in X,\, d(x,y) \ge 1} \frac{|f(x)| + |f(y)|}{a(d(x, x_0))\, b(d(x, y))} \le |f|_{\mathrm{Lip}} + 2|f|_\infty < \infty. \qquad \square$$
4.2. Transfer operators. Define the operator $P : C(X, \mathbb{R}) \to C(X, \mathbb{R})$ by
$$P f(x) = \sum_{j=1}^{M} p_j(x)\, f(T_j x).$$
It follows that
$$P^n f(x) = \sum_{i_0, \ldots, i_{n-1}} p_{i_0}(x)\, p_{i_1}(T_{i_0} x) \cdots p_{i_{n-1}}(T_{i_{n-2}} \cdots T_{i_0} x)\, f(T_{i_{n-1}} \cdots T_{i_0} x) = \int_\Sigma f(Z_n(x, i))\, d\mu_x(i). \tag{7}$$
More generally, let $\mathcal{F}_k$ denote the sub-$\sigma$-algebra of $\Sigma$ generated by the cylinders of length $k$. Then
$$P f(Z_{n-1}(x, \cdot)) = E_x\big( f(Z_n(x, \cdot)) \mid \mathcal{F}_{n-1} \big), \tag{8}$$
where $E_x$ denotes the conditional expectation on $L^1(\Sigma, \mu_x)$. As $x$ is fixed, we will often write $f Z_k$ for $f(Z_k(x, \cdot))$.
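For finitely many maps, the sum in (7) can be evaluated directly by recursing over all words of length $n$. A hypothetical sketch (the example system is an assumption of ours, not the paper's): for the binary IFS $T_1(x) = x/2$, $T_2(x) = (x+1)/2$ with $p_1 = p_2 = 1/2$ and $f(x) = x - 1/2$, one has $Pf = f/2$ exactly, so $P^n f(x) = (x - 1/2)/2^n$ decays geometrically to $\nu(f) = 0$, a toy instance of the spectral-gap picture stated next.

```python
# Direct evaluation of P^n f(x) = sum over words (i_0,...,i_{n-1}) of
# p_{i_0}(x) ... p_{i_{n-1}}(...) f(T_{i_{n-1}} ... T_{i_0} x), as in (7),
# for the hypothetical binary IFS with constant probabilities 1/2.
T = [lambda x: x / 2, lambda x: (x + 1) / 2]
f = lambda x: x - 0.5

def P_pow(g, x, n):
    """Evaluate P^n g(x) exactly by recursion over all 2^n symbol words."""
    if n == 0:
        return g(x)
    return 0.5 * sum(P_pow(g, Tj(x), n - 1) for Tj in T)

vals = [P_pow(f, 0.3, n) for n in range(12)]
print(vals[0], vals[11])  # -0.2, then halving each step: vals[11] = -0.2/2**11
```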
The following result, ensuring that on suitable function spaces P has a spectral
gap, will be a key tool in our analysis.
Proposition 4.3 ([Pe]). Let $a(t) = 1 + t^\alpha$, $b(t) = t^\beta$, where $\alpha, \beta \in (0, 1/2)$ and $\beta < \alpha$. Then:
(i) the operator $P$ maps $C_{a,b}(X, \mathbb{R}) \to C_{a,b}(X, \mathbb{R})$;
(ii) the operator $P$, when restricted to $C_{a,b}(X, \mathbb{R})$, has $1$ as a simple maximal eigenvalue with an associated eigenprojection $\nu$;
(iii) the remainder of the spectrum lies inside a disc of radius $\rho = \rho(a, b) \le r^\alpha < 1$ (where $r$ is as in (1)).
Remarks. 1. Thus we can write $P = \nu + Q$, where $Q : C_{a,b}(X, \mathbb{R}) \to C_{a,b}(X, \mathbb{R})$ has spectral radius at most $\rho$. Moreover, we have that $\nu Q = Q\nu = 0$, so that $P^n = \nu + Q^n$.
2. The eigenprojection ν corresponds to the stationary probability measure
defined in (2).
Suppose that we are in the simpler case of a place-independent probability, that
is, pi (x) = pi is independent of x for each i. Then we can instead work on the
simpler function space
$$C^\alpha(X, \mathbb{R}) = \{ f : X \to \mathbb{R} \mid \|f\|_\alpha = |f| + |f|_{(\alpha)} < \infty \},$$
where
$$|f| = \sup_{x \in X} \frac{|f(x)|}{1 + d(x, x_0)}, \qquad |f|_{(\alpha)} = \sup_{x,y \in X,\, x \ne y} \frac{|f(x) - f(y)|}{d(x, y)^\alpha}.$$
This space is invariant under the transfer operator P . It is shown in [Pe] that if
α is sufficiently small, then P is quasi-compact and the conclusions of Proposition
4.3 hold.
5. Reduction to a martingale
Let $f : X \to \mathbb{R}$ be a bounded Lipschitz function. We assume that $\nu(f) = 0$. In [Pe] it is shown that $a, b : \mathbb{R}^+ \to \mathbb{R}^+$ can be chosen as in §4.1 in such a way that $P$ acts on the different spaces $C_{a,b}(X, \mathbb{R})$, $C_{a^2, b+b^2}(X, \mathbb{R})$ and $C_{a^4, b+b^4}(X, \mathbb{R})$ and has a spectral gap.
Let
$$w = \sum_{k=0}^{\infty} P^k f = \sum_{k=0}^{\infty} Q^k f \in C_{a,b}(X, \mathbb{R}).$$
Then
$$w - Pw = \sum_{k=0}^{\infty} P^k f - \sum_{k=1}^{\infty} P^k f = f.$$
Recalling that $f^n(x, i) = \sum_{k=0}^{n-1} f Z_k(x, i)$, we see that
$$f^n(x, i) = \sum_{k=0}^{n-1} \big( w Z_k(x, i) - P w Z_k(x, i) \big) = \sum_{k=1}^{n-1} \big( w Z_k(x, i) - P w Z_{k-1}(x, i) \big) + w(x) - P w Z_{n-1}(x, i) = \sum_{k=1}^{n-1} u_k(x, i) + w(x) - P w Z_{n-1}(x, i),$$
where we have set $u_k(x, i) = w Z_k(x, i) - P w Z_{k-1}(x, i)$. Let
$$\phi_{n,x}(i) = \sum_{k=1}^{n-1} u_k(x, i)$$
so that
$$f^n(x, i) = \phi_{n,x}(i) + w(x) - P w Z_{n-1}(x, i).$$
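The identity $w - Pw = f$ can be sanity-checked numerically in a toy system (a hypothetical sketch; the binary IFS and all numerical choices below are our assumptions, not the paper's). Truncating $w = \sum_{k\ge 0} P^k f$ at $N$ terms leaves a residual $(w_N - P w_N) - f = -P^{N+1} f$, which is geometrically small when $P$ has a spectral gap and $\nu(f) = 0$.

```python
# Toy binary IFS (hypothetical example): T_1(x) = x/2, T_2(x) = (x+1)/2,
# p_1 = p_2 = 1/2, f(x) = x - 1/2 (so nu(f) = 0 and here P^k f = f / 2^k).
T = [lambda x: x / 2, lambda x: (x + 1) / 2]
f = lambda x: x - 0.5

def P_pow(g, x, n):
    """Evaluate P^n g(x) exactly by recursion over all 2^n symbol words."""
    if n == 0:
        return g(x)
    return 0.5 * sum(P_pow(g, Tj(x), n - 1) for Tj in T)

N = 15
w = lambda x: sum(P_pow(f, x, k) for k in range(N + 1))   # truncated series
Pw = lambda x: 0.5 * sum(w(Tj(x)) for Tj in T)

x0 = 0.3
residual = (w(x0) - Pw(x0)) - f(x0)
print(residual)  # equals -P^{N+1} f(x0) = -(x0 - 1/2)/2**16, about 3.05e-06
```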
Proposition 5.1. Let $\varepsilon > 0$. Then for each $x \in X$,
$$f^n(x, i) = \phi_{n,x}(i) + O(n^{1/4+\varepsilon})$$
for $\mu_x$-a.e. $i \in \Sigma$.
Proof. For each $n > 0$ let
$$\Sigma_{n,x} = \{ i \in \Sigma \mid |P w Z_{n-1}(x, i)| > n^{1/4+\varepsilon} \}.$$
To prove the proposition it is sufficient to prove that $\mu_x$-a.e. $i \in \Sigma$ is in at most finitely many $\Sigma_{n,x}$. By the Borel-Cantelli lemma it is sufficient to prove that $\sum_n \mu_x(\Sigma_{n,x}) < \infty$. As $P w \in C_{a,b}(X, \mathbb{R})$ we have that $(P w)^4 \in C_{a^4, b+b^4}(X, \mathbb{R})$,
a space on which $P$ acts and has a spectral gap. Now
$$\mu_x(\Sigma_{n,x}) = \int_{\Sigma_{n,x}} 1\, d\mu_x \le \int_{\Sigma_{n,x}} \Big( \frac{P w Z_{n-1}}{n^{1/4+\varepsilon}} \Big)^4 d\mu_x \le \frac{1}{n^{1+4\varepsilon}} \int_{\Sigma} (P w)^4 Z_{n-1}\, d\mu_x = \frac{1}{n^{1+4\varepsilon}}\, P^{n-1}\big( (P w)^4 \big)(x) = \frac{1}{n^{1+4\varepsilon}} \big( \nu((P w)^4) + Q^{n-1}((P w)^4) \big)(x) \le \mathrm{Const}\, \frac{1}{n^{1+4\varepsilon}},$$
where the constant is independent of $n$. As this series is summable, the result follows. $\square$
The following is the key observation in our analysis.
Proposition 5.2. The sequence $\phi_{n,x}(\cdot)$ is a square-integrable, zero-mean martingale on $\Sigma$ with respect to the filtration $\{\mathcal{F}_n\}$, where $\mathcal{F}_n$ denotes the $\sigma$-algebra of cylinders of length $n$.
Proof. To see that $\phi_{n,x}$ is a martingale, simply observe that
$$u_k(x, i) = w Z_k(x, i) - P w Z_{k-1}(x, i) = w Z_k(x, i) - E_x(w Z_k \mid \mathcal{F}_{k-1})(x, i) \tag{9}$$
by (8). Hence
$$E_x(u_k(x, i) \mid \mathcal{F}_{k-1}) = E_x\big( w Z_k - E_x(w Z_k \mid \mathcal{F}_{k-1}) \mid \mathcal{F}_{k-1} \big) = E_x(w Z_k \mid \mathcal{F}_{k-1}) - E_x(w Z_k \mid \mathcal{F}_{k-1}) = 0,$$
so that $u_k$ is a martingale difference and $\phi_{n,x}$ is a martingale. Moreover, integrating (9) with respect to $\mu_x$ shows that $\int u_k(x, i)\, d\mu_x(i) = 0$. Hence $\phi_{n,x}$ has zero mean. That $\phi_{n,x}$ is square integrable follows from Proposition 6.2 below. $\square$

6. Moment estimates
Here we collect several estimates that will be useful below. We begin with the following observation.
Lemma 6.1. Let $f, g \in L^1(X, \nu)$. Then
$$\int f Z_{n+m}(x, i)\, g Z_n(x, i)\, d\mu_x(i) = \big( P^n\big( (P^m f) \cdot g \big) \big)(x).$$
Proof. Simply observe that
$$\int f Z_{n+m}(x, i)\, g Z_n(x, i)\, d\mu_x(i) = \sum_{i_0, \ldots, i_{n-1}} p_{i_0}(x) \cdots p_{i_{n-1}}(Z_{n-1}(x, i)) \Big( \sum_{i_n, \ldots, i_{n+m-1}} p_{i_n}(Z_n(x, i)) \cdots p_{i_{n+m-1}}(Z_{n+m-1}(x, i))\, f(Z_{n+m}(x, i)) \Big)\, g(Z_n(x, i)) = \sum_{i_0, \ldots, i_{n-1}} p_{i_0}(x) \cdots p_{i_{n-1}}(Z_{n-1}(x, i))\, (P^m f) Z_n(x, i)\, g Z_n(x, i) = \big( P^n\big( (P^m f)\, g \big) \big)(x). \qquad \square$$
We need the following estimate.
Proposition 6.2. The variances $\sigma^2(\phi_{n,x})$ of $\phi_{n,x}$ with respect to the measure $\mu_x$ satisfy
$$\sigma^2(\phi_{n,x}) = n \sigma^2(f) + O(1).$$
Proof. Observe that $w^2, (Pw)^2 \in C_{a^2, b+b^2}(X, \mathbb{R})$, a space on which $P$ acts and has a spectral gap. Recall that $\phi_{n,x}$ has zero mean. Also recall that the $u_k$ are orthogonal as they are martingale differences. Hence
$$\sigma^2(\phi_{n,x}) = \sum_{k=1}^{n} \int u_k^2(x, i)\, d\mu_x(i).$$
Now
$$\int u_k^2(x, i)\, d\mu_x(i) = \int \big( w^2 Z_k(x, i) - 2 w Z_k(x, i)\, P w Z_{k-1}(x, i) + (P w)^2 Z_{k-1}(x, i) \big)\, d\mu_x(i).$$
By (7) and Lemma 6.1, we have that
$$\int u_k^2(x, i)\, d\mu_x(i) = P^k(w^2)(x) - P^{k-1}\big( (P w)^2 \big)(x) \tag{10}$$
so that
$$\sigma^2(\phi_{n,x}) = \sum_{k=1}^{n} \big( P^k(w^2) - P^{k-1}((P w)^2) \big)(x) = n\big( \nu(w^2) - \nu((P w)^2) \big) + \sum_{k=1}^{n} Q^k(w^2)(x) - Q^{k-1}\big( (P w)^2 \big)(x).$$
Proposition 4.3 implies
$$\Big| \sum_{k=1}^{n} Q^k(w^2)(x) - Q^{k-1}\big( (P w)^2 \big)(x) \Big| \le \mathrm{Const} \sum_{k=1}^{\infty} \rho^k < \infty.$$
A calculation, using the fact that $w = \sum_{k=0}^{\infty} P^k f$, shows that $\nu(w^2) - \nu((P w)^2) = \sigma^2(f)$. $\square$
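The calculation alluded to in the last line can be sketched as follows; this is our reconstruction using the identity $Pw = w - f$ (which follows from $w - Pw = f$), not the paper's own display:
$$\nu(w^2) - \nu((Pw)^2) = \nu(w^2) - \nu\big( (w - f)^2 \big) = 2\nu(f w) - \nu(f^2) = \nu(f^2) + 2\sum_{k=1}^{\infty} \nu(f \cdot P^k f),$$
since $\nu(f w) = \sum_{k \ge 0} \nu(f \cdot P^k f)$ by the definition of $w$; the right-hand side is the standard Green-Kubo series for $\sigma^2(f)$ when $\nu(f) = 0$.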
Remark. One can easily show that the degenerate case $\sigma^2(f) = 0$ holds if and only if there exists $u : X \to \mathbb{R}$ such that, for each $j$, $f T_j = u T_j - u$, $\nu$-a.e. [Pe].
Remark. The above calculation immediately implies that φn,x (·) ∈ L2 (Σ, µx ). By
Proposition 3.1, we can find an abstract probability space (Ω, F, P ), a Brownian
motion W : Ω → C(R+ , R) and a sequence of stopping times Tn such that if we define ψn,x = W (Tn ), then the sequences {φn,x } and {ψn,x } are equal in distribution.
This immediately implies part (ii) of Theorem 1.4.
We will also need the following fourth moment estimate. Here and throughout,
uk denotes uk (x, ·).
Lemma 6.3. There exists $K > 0$ such that $E_x(u_k^4) \le K < \infty$ for all $k \ge 1$.
Proof. Expanding $u_k^4 = (w Z_k - P w Z_{k-1})^4$ binomially, observe that
$$E_x(u_k^4) = E_x\big( w^4 Z_k - 4 w^3 Z_k\, (P w) Z_{k-1} + 6 w^2 Z_k\, (P w)^2 Z_{k-1} - 4 w Z_k\, (P w)^3 Z_{k-1} + (P w)^4 Z_{k-1} \big) = P^{k-1}\big( P(w^4) - 4 P(w^3)\, P w + 6 P(w^2)\, (P w)^2 - 3 (P w)^4 \big) \le \nu\big( P(w^4) - 4 P(w^3)\, P w + 6 P(w^2)\, (P w)^2 - 3 (P w)^4 \big) + \mathrm{Const}\, \rho^k,$$
and the right-hand side is bounded uniformly in $k$. $\square$
7. Estimation of the error term
To estimate the error term and to prove part (iii) of Theorem 1.4, we follow an approach outlined in [PS]. Let $\sigma^2 = \sigma^2(f) = \nu(w^2) - \nu((P w)^2)$. Let $W$ be a standard Brownian motion satisfying the conclusions of Proposition 3.1. We estimate
$$T_n - \sigma^2 n = \sum_{k=0}^{n-1} \big( \tau_k - E(\tau_k \mid \mathcal{G}_{k-1}) \big) + \sum_{k=0}^{n-1} \big( E(\tau_k \mid \mathcal{G}_{k-1}) - E((\psi_k - \psi_{k-1})^2 \mid \mathcal{G}_{k-1}) \big) + \sum_{k=0}^{n-1} \big( E((\psi_k - \psi_{k-1})^2 \mid \mathcal{G}_{k-1}) - (\psi_k - \psi_{k-1})^2 \big) + \sum_{k=0}^{n-1} \big( (\psi_k - \psi_{k-1})^2 - E_x(u_k^2) \big) + \Big( \sum_{k=0}^{n-1} E_x(u_k^2) - \sigma^2 n \Big) = (\mathrm{I}) + (\mathrm{II}) + (\mathrm{III}) + (\mathrm{IV}) + (\mathrm{V}).$$
By Proposition 3.1(iii), we have that (II) = 0. By Proposition 6.2, (V) = O(1).
We will then need the following lemma, which is a special case of a version of
the Kronecker lemma for martingales; see [F, p. 243].
Lemma 7.1. Let $S_n = X_0 + \cdots + X_n$ be a zero mean martingale. Suppose that for some $K$, we have that $E(X_n^2) \le K < \infty$ for all $n \ge 0$. Then for any $\varepsilon > 0$ we have that
$$\frac{S_n}{n^{1/2+\varepsilon}} \to 0 \quad \text{a.s.}$$
We claim that both (I) and (III) are $O(n^{1/2+\varepsilon})$ a.s.
Lemma 7.2. Let $x \in X$ and $\varepsilon > 0$. Then
$$(\mathrm{I}) = \sum_{k=0}^{n-1} \tau_k - E(\tau_k \mid \mathcal{G}_{k-1}) = O(n^{1/2+\varepsilon}) \quad \text{a.s.}$$
Proof. Let $X_k = \tau_k - E(\tau_k \mid \mathcal{G}_{k-1})$. Proposition 3.1(ii) implies that $\tau_k - E(\tau_k \mid \mathcal{G}_{k-1})$ is $\mathcal{G}_k$-measurable and that
$$E\big( \tau_k - E(\tau_k \mid \mathcal{G}_{k-1}) \mid \mathcal{G}_{k-1} \big) = E(\tau_k \mid \mathcal{G}_{k-1}) - E(\tau_k \mid \mathcal{G}_{k-1}) = 0.$$
Hence the summand in (I) is a martingale difference.
Note that
$$E\big( (\tau_k - E(\tau_k \mid \mathcal{G}_{k-1}))^2 \big) \le E(\tau_k^2) = E\big( E(\tau_k^2 \mid \mathcal{G}_{k-1}) \big) \le C_2\, E\big( E((\psi_k - \psi_{k-1})^4 \mid \mathcal{G}_{k-1}) \big) = C_2\, E\big( (\psi_k - \psi_{k-1})^4 \big) = C_2\, E_x(u_k^4) \le K < \infty$$
(where we have used Proposition 3.1(i), (iv) and Lemma 6.3) for some constant $K$ independent of $k$. The result now follows from Lemma 7.1. $\square$
Lemma 7.3. Let $x \in X$ and let $\varepsilon > 0$. Then
$$(\mathrm{III}) = \sum_{k=0}^{n-1} E((\psi_k - \psi_{k-1})^2 \mid \mathcal{G}_{k-1}) - (\psi_k - \psi_{k-1})^2 = O(n^{1/2+\varepsilon}) \quad \text{a.s.}$$
Proof. Let $X_k = E((\psi_k - \psi_{k-1})^2 \mid \mathcal{G}_{k-1}) - (\psi_k - \psi_{k-1})^2$. The definition of $\mathcal{G}_k$ in Proposition 3.1 implies that $E((\psi_k - \psi_{k-1})^2 \mid \mathcal{G}_{k-1}) - (\psi_k - \psi_{k-1})^2$ is $\mathcal{G}_k$-measurable. Moreover,
$$E\big( (\psi_k - \psi_{k-1})^2 - E((\psi_k - \psi_{k-1})^2 \mid \mathcal{G}_{k-1}) \mid \mathcal{G}_{k-1} \big) = 0.$$
Hence the summand in (III) is a martingale difference.
Note that
$$E\Big( \big( E((\psi_k - \psi_{k-1})^2 \mid \mathcal{G}_{k-1}) - (\psi_k - \psi_{k-1})^2 \big)^2 \Big) \le E\big( (\psi_k - \psi_{k-1})^4 \big) = E_x(u_k^4) \le K$$
for some constant $K$ independent of $k$. By Lemma 7.1, $(\mathrm{III}) = O(n^{1/2+\varepsilon})$ a.s. $\square$
Finally, it remains to consider (IV).
Lemma 7.4. We have that
$$(\mathrm{IV}) = \sum_{k=0}^{n-1} \big( (\psi_k - \psi_{k-1})^2 - E_x(u_k^2) \big) = O(n^{1/2+\varepsilon}) \quad \mu_x\text{-a.e.}$$
The summand in (IV) is not in general a martingale difference, and the analysis
above does not immediately generalise. We prove Lemma 7.4 in §8 below.
Combining the above we see that
$$T_n - \sigma^2 n = O(n^{1/2+\varepsilon})$$
$\mu_x$-a.e. To prove part (iii) of Theorem 1.4 it suffices to note that for any $\varepsilon > 0$,
$$W(T_n) = W\big( \sigma^2 n + O(n^{1/2+\varepsilon'}) \big) = W(\sigma^2 n) + O(n^{1/4+\varepsilon}) \quad \text{a.s.}$$
for some $0 < \varepsilon' < \varepsilon$.
8. Proof of Lemma 7.4
In order to prove Lemma 7.4 we need the following special case of the Gaál-Koksma inequality.
Lemma 8.1 ([PS, Appendix I]). Let $X_k$ be zero-mean random variables such that $E(X_k^2) \le K < \infty$, where $K$ is independent of $k$. Suppose in addition that for each $m, n \in \mathbb{N}$,
$$E\Big( \Big( \sum_{k=m}^{m+n} X_k \Big)^2 \Big) = O(n),$$
where the implied constant is independent of $m$. Then for any $\varepsilon > 0$, we have that
$$\sum_{k=0}^{n-1} X_k = O(n^{1/2+\varepsilon}) \quad \text{a.s.}$$
We apply Lemma 8.1 to (IV) with $X_k = (\psi_k - \psi_{k-1})^2 - E_x(u_k^2)$. By Lemma 6.3 and Proposition 3.1(i) we see that
$$E(X_k^2) = E_x\big( (u_k^2 - E_x(u_k^2))^2 \big) = E_x(u_k^4) - E_x(u_k^2)^2 \le K < \infty. \tag{11}$$
Now
$$E\Big( \Big( \sum_{k=m}^{m+n} X_k \Big)^2 \Big) = \sum_{k=m}^{m+n} E(X_k^2) + 2 \sum_{d=1}^{n} \sum_{k=m}^{m+n-d} E(X_{k+d} X_k). \tag{12}$$
The first term is clearly O(n) (independently of m) by (11).
We claim that
$$|E(X_{k+d} X_k)| \le \mathrm{Const}\, \rho^d, \tag{13}$$
where the constant is independent of $m, n, d, k$. To see that (13) is sufficient to conclude that (12) is $O(n)$, simply note that
$$\sum_{d=1}^{n} \sum_{k=m}^{m+n-d} \rho^d = \sum_{d=1}^{n} (n - d)\rho^d \le \sum_{d=1}^{n} n \rho^d \le \frac{n}{1 - \rho} = O(n).$$
It remains to prove (13). Using Proposition 3.1(i) we have
$$E(X_{k+d} X_k) = E_x(u_{k+d}^2 u_k^2) - E_x(u_{k+d}^2)\, E_x(u_k^2).$$
As $u_k^2 = w^2 Z_k - 2 w Z_k\, P w Z_{k-1} + (P w)^2 Z_{k-1}$ we can write the above as a linear combination of the following nine terms:
$$E_x(w^2 Z_{k+d}\, w^2 Z_k) - E_x(w^2 Z_{k+d})\, E_x(w^2 Z_k), \tag{14}$$
$$E_x(w^2 Z_{k+d}\, w Z_k\, P w Z_{k-1}) - E_x(w^2 Z_{k+d})\, E_x(w Z_k\, P w Z_{k-1}), \tag{15}$$
$$E_x(w^2 Z_{k+d}\, (P w)^2 Z_{k-1}) - E_x(w^2 Z_{k+d})\, E_x((P w)^2 Z_{k-1}), \tag{16}$$
$$E_x(w Z_{k+d}\, P w Z_{k+d-1}\, w^2 Z_k) - E_x(w Z_{k+d}\, P w Z_{k+d-1})\, E_x(w^2 Z_k), \tag{17}$$
$$E_x(w Z_{k+d}\, P w Z_{k+d-1}\, w Z_k\, P w Z_{k-1}) - E_x(w Z_{k+d}\, P w Z_{k+d-1})\, E_x(w Z_k\, P w Z_{k-1}), \tag{18}$$
$$E_x(w Z_{k+d}\, P w Z_{k+d-1}\, (P w)^2 Z_{k-1}) - E_x(w Z_{k+d}\, P w Z_{k+d-1})\, E_x((P w)^2 Z_{k-1}), \tag{19}$$
$$E_x((P w)^2 Z_{k+d-1}\, w^2 Z_k) - E_x((P w)^2 Z_{k+d-1})\, E_x(w^2 Z_k), \tag{20}$$
$$E_x((P w)^2 Z_{k+d-1}\, w Z_k\, P w Z_{k-1}) - E_x((P w)^2 Z_{k+d-1})\, E_x(w Z_k\, P w Z_{k-1}), \tag{21}$$
$$E_x((P w)^2 Z_{k+d-1}\, (P w)^2 Z_{k-1}) - E_x((P w)^2 Z_{k+d-1})\, E_x((P w)^2 Z_{k-1}). \tag{22}$$
The following estimates will prove useful.
Lemma 8.2. Let $a, b$ be as in Proposition 4.3. For $g, h \in C_{a,b}(X, \mathbb{R})$ and $p > 0$ we have
(i) for each $x \in X$, $|Q^p g(x)| \le \mathrm{Const}\, \rho^p$;
(ii) $|\nu(Q^p g \cdot h)| \le \mathrm{Const}\, \rho^p$.
Proof. Statement (i) follows immediately from the definition of $C_{a,b}(X, \mathbb{R})$ and Proposition 4.3.
To prove (ii) note that
$$|\nu(Q^p g \cdot h)| \le \int |Q^p g(x)|\, |h(x)|\, d\nu(x) \le \int \|Q^p g\|_{a,b}\, a(d(x, x_0))\, |h|_a\, a(d(x, x_0))\, d\nu(x) \le \mathrm{Const}\, \rho^p \int a(d(x, x_0))^2\, d\nu(x).$$
As $a(t) = 1 + t^\alpha$ and $\alpha < 1/2$, we have $a(t)^2 = O(1 + t)$. Hence it is sufficient to prove that $\int d(x, x_0)\, d\nu(x) < \infty$. In the case of finitely many maps $T_j$ this is immediate from [BDEG, Theorem 2.4]. For the case of infinitely many maps, we argue as follows. From [Pe, Lemma 2.3], we have that
$$\sup_{x \in X,\, n \ge 0} E_x\big( d(Z_n(x, i), x_0) \big) = M < \infty.$$
For each $m \ge 0$, define $d_m(x) = \min\{ d(x, x_0), m \}$. This is a bounded Lipschitz function, hence $P^n d_m(x) \to \int d_m\, d\nu$. As for each $n$,
$$P^n d_m(x) = E_x\big( d_m(Z_n(x, i)) \big) \le E_x\big( d(Z_n(x, i), x_0) \big) \le M < \infty,$$
we have $\int d_m\, d\nu \le M < \infty$ and the result follows by the Monotone Convergence Theorem. $\square$
To estimate (14), (16), (20) and (22) we use the following calculation. Recall that $x \in X$ is fixed. For brevity, we shall write $P g$ for $P g(x)$, etc., in what follows.
Lemma 8.3. Let $g, h \in C_{a^2, b+b^2}(X, \mathbb{R})$. Then
$$|E_x(g Z_{p+q}\, h Z_p) - E_x(g Z_{p+q})\, E_x(h Z_p)| \le \mathrm{Const}\, \rho^q,$$
where the constant is independent of $p, q$.
Proof. Observe that
$$|E_x(g Z_{p+q}\, h Z_p) - E_x(g Z_{p+q})\, E_x(h Z_p)| = |P^p(h\, P^q g) - P^{p+q} g \cdot P^p h| = |P^p(\nu(g) h + Q^q g \cdot h) - (\nu(g) + Q^{p+q} g)(\nu(h) + Q^p h)| = |\nu(Q^q g \cdot h) + Q^p(Q^q g \cdot h) - \nu(h)\, Q^{p+q} g - Q^{p+q} g \cdot Q^p h| \le \mathrm{Const}\, \rho^q. \qquad \square$$
To estimate (17) and (19) we use:
Lemma 8.4. Let $g, h \in C_{a,b}(X, \mathbb{R})$. Then
$$|E_x(g Z_{p+q}\, P g Z_{p+q-1}\, h Z_p) - E_x(g Z_{p+q}\, P g Z_{p+q-1})\, E_x(h Z_p)| \le \mathrm{Const}\, \rho^q,$$
where the constant is independent of $p, q$.
Proof. Observe that, using a simple extension of Lemma 6.1,
$$E_x(g Z_{p+q}\, P g Z_{p+q-1}\, h Z_p) = P^p\big( (P^{q-1}((P g)^2))\, h \big) = P^p\big( \nu((P g)^2) h + Q^{q-1}((P g)^2)\, h \big) = \nu((P g)^2)\nu(h) + \nu\big( Q^{q-1}((P g)^2)\, h \big) + \nu((P g)^2)\, Q^p h + Q^p\big( Q^{q-1}((P g)^2)\, h \big)$$
and
$$E_x(g Z_{p+q}\, P g Z_{p+q-1})\, E_x(h Z_p) = P^{p+q-1}\big( (P g)^2 \big)\, P^p h = \big( \nu((P g)^2) + Q^{p+q-1}((P g)^2) \big)\big( \nu(h) + Q^p h \big) = \nu((P g)^2)\nu(h) + \nu((P g)^2)\, Q^p h + \nu(h)\, Q^{p+q-1}((P g)^2) + Q^{p+q-1}((P g)^2)\, Q^p h.$$
Noting that the constant terms and the terms only involving $Q^p$ cancel and applying Lemma 8.2, we see that the difference of the above terms is at most $\mathrm{Const}\, \rho^q$. $\square$
To estimate (15) and (21) we use:
Lemma 8.5. Let $g, h \in C_{a,b}(X, \mathbb{R})$ and suppose that $\nu(h) = 0$. Then
$$|E_x(g Z_{p+q}\, h Z_p\, P h Z_{p-1}) - E_x(g Z_{p+q})\, E_x(h Z_p\, P h Z_{p-1})| \le \mathrm{Const}\, \rho^q,$$
where the constant is independent of $p, q$.
Proof. First note that $P h = Q h$. It is easy to see that
$$E_x(g Z_{p+q}\, h Z_p\, P h Z_{p-1}) = P^{p-1}\big( \big( P((P^q g) h) \big)\, P h \big) = \nu(g)\nu((Q h)^2) + \nu\big( Q(Q^q g \cdot h)\, Q h \big) + \nu(Q^q g \cdot h)\, Q^p h + \nu(g)\, Q^{p-1}((Q h)^2) + Q^{p-1}\big( Q(Q^q g \cdot h)\, Q h \big)$$
and
$$E_x(g Z_{p+q})\, E_x(h Z_p\, P h Z_{p-1}) = \nu(g)\nu((Q h)^2) + \nu(g)\, Q^{p-1}((Q h)^2) + \nu((Q h)^2)\, Q^{p+q} g + Q^{p+q} g \cdot Q^{p-1}((Q h)^2).$$
The constant terms and the terms involving only $Q^{p-1}$ cancel, and Lemma 8.2 (with an obvious modification to control the term $\nu(Q(Q^q g \cdot h)\, Q h)$) implies the result. $\square$
Finally, we estimate (18). Recall that $\nu(w) = 0$, so that $P w = Q w$. First note that
$$E_x(w Z_{k+d}\, P w Z_{k+d-1}\, w Z_k\, P w Z_{k-1}) = P^{k-1}\big( \big( P((P^{d-1}((P w)^2))\, w) \big)\, P w \big) = \nu((Q w)^2)\,\nu((Q w)^2) + \nu\big( Q(Q^{d-1}((Q w)^2)\, w)\, Q w \big) + \nu\big( Q^{d-1}((Q w)^2)\, w \big)\, Q^k w + \nu((Q w)^2)\, Q^{k-1}((Q w)^2) + Q^{k-1}\big( Q(Q^{d-1}((Q w)^2)\, w)\, Q w \big)$$
(where we have used the facts that $\nu(w) = 0$ and $\nu Q = 0$) and that
$$E_x(w Z_{k+d}\, P w Z_{k+d-1})\, E_x(w Z_k\, P w Z_{k-1}) = \big( P^{k+d-1}((Q w)^2) \big)\big( P^{k-1}((Q w)^2) \big) = \big( \nu((Q w)^2) + Q^{k+d-1}((Q w)^2) \big)\big( \nu((Q w)^2) + Q^{k-1}((Q w)^2) \big).$$
Multiplying this out, we see that the constant terms and the terms involving only $Q^{k-1}$ again cancel. Applying Lemma 8.2 (and an obvious modification to control the term $\nu(Q(Q^{d-1}((Q w)^2)\, w)\, Q w)$) implies that (18) is bounded above by $\mathrm{Const}\, \rho^d$.
Combining the above and Lemmas 8.3, 8.4, 8.5 we see that
$$|E(X_{k+d} X_k)| = |E_x(u_{k+d}^2 u_k^2) - E_x(u_{k+d}^2)\, E_x(u_k^2)| \le \mathrm{Const}\, \rho^d,$$
where the constant is independent of $k, d$. By the discussion above, this implies that $(\mathrm{IV}) = O(n^{1/2+\varepsilon})$ for any $\varepsilon > 0$. This concludes the proof of Theorem 1.4. $\square$
References
[BDEG] M.F. Barnsley, S.G. Demko, J.H. Elton and J.S. Geronimo, Invariant measures for Markov processes arising from iterated function systems with place-dependent probabilities, Ann. Inst. H. Poincaré Probab. Statist., 24 (1988), 367–394. MR0971099 (89k:60088)
[DK] M. Denker and W. Philipp, Approximation by Brownian motion for Gibbs measures and flows under a function, Ergod. Th. & Dynam. Sys., 4 (1984), 541–552. MR0779712 (86g:28025)
[DF] P. Diaconis and D. Freedman, Iterated random functions, SIAM Review, 41 (1999), 45–76. MR1669737 (2000c:60102)
[E1] J.H. Elton, An ergodic theorem for iterated maps, Ergod. Th. & Dynam. Sys., 7 (1987), 481–488. MR0922361 (89b:60156)
[E2] J.H. Elton, A multiplicative ergodic theorem for Lipschitz maps, Stoch. Proc. Appl., 34 (1990), 39–47. MR1039561 (91a:47010)
[F] W. Feller, An Introduction to Probability Theory and its Applications, vol. II, Wiley, New York, 1966. MR0210154 (35:1048)
[FMT] M.J. Field, I. Melbourne and A. Török, Decay of correlations, central limit theorems and approximation by Brownian motion for compact Lie group extensions, Ergod. Th. & Dynam. Sys., 23 (2003), 87–110. MR1971198 (2004m:37046)
[HH] P. Hall and C. Heyde, Martingale Limit Theory and its Applications, Academic Press, New York, 1980. MR0624435 (83a:60001)
[MT] I. Melbourne and A. Török, Central limit theorems and invariance principles for time-one maps of hyperbolic flows, Commun. Math. Phys., 229 (2002), 57–71. MR1917674 (2003k:37012)
[Pe] M. Peigné, Iterated function systems and spectral decomposition of the associated Markov operator, Publ. Inst. Rech. Math. Rennes 1993, Univ. Rennes I, Rennes, 1993. MR1347702
[PS] W. Philipp and W. Stout, Almost sure invariance principles for partial sums of weakly dependent random variables, Mem. Amer. Math. Soc., 161 (1975), Amer. Math. Soc., Providence, Rhode Island. MR0433597 (55:6570)
[Po] M. Pollicott, Contraction in mean and transfer operators, Dynam. & Stab. Syst., 16 (2001), 97–106. MR1835908 (2002c:37080)
[PoS] M. Pollicott and R. Sharp, Invariance principles for interval maps with an indifferent fixed point, Commun. Math. Phys., 229 (2002), 337–346. MR1923178 (2003k:37016)
[S1] V. Strassen, An invariance principle for the law of the iterated logarithm, Z. Wahr., 3 (1964), 211–226. MR0175194 (30:5379)
[S2] V. Strassen, Almost-sure behaviour of sums of independent random variables and martingales, Proc. Fifth Berkeley Symp. Math. Stat. Prob., 2 (1965), 314–343.
School of Mathematics, University of Manchester, Oxford Road, Manchester M13
9PL, United Kingdom
E-mail address: [email protected]