Almost Sure Central Limit Theory
Fredrik Jonsson
U.U.D.M. Project Report 2007:9
Degree project in mathematical statistics, 20 credits
Supervisor and examiner: Allan Gut
March 2007
Department of Mathematics
Uppsala University
Abstract
The Almost sure central limit theorem states in its simplest form that a
sequence of independent, identically distributed random variables {X_k}_{k≥1},
with moments EX_1 = 0 and EX_1^2 = 1, obeys
\[
\lim_{n\to\infty} \frac{1}{\log n} \sum_{k=1}^{n} \frac{1}{k}\, I\Bigl\{ \frac{S_k}{\sqrt{k}} \le x \Bigr\} = \Phi(x) \quad \text{a.s.},
\]
for each value x. Here I{·} denotes the indicator function of events, Φ the
distribution function of the standard normal distribution and S_n the n:th
partial sum of the above mentioned sequence of random variables.
The purpose of this thesis is to present and summarize various kinds of
generalizations of this result which may be found in the research literature.
Acknowledgement
I would like to thank Professor Allan Gut for introducing me to the subject,
for careful readings of my drafts and for interesting conversations.
Contents

1 Introduction
  1.1 Notation

2 Preliminaries
  2.1 Probability measures and weak convergence
  2.2 Central limit theory
  2.3 Summation methods: linear transformations

3 Almost Sure Converging Means of Random Variables
  3.1 Bounds for variances of weighted partial sums
  3.2 Bounds for covariances among individual variables
  3.3 Refinements with respect to weight sequences

4 Almost Sure Central Limit Theory
  4.1 Independent random variables
  4.2 Weakly dependent random variables
  4.3 Subsequences
  4.4 An almost sure version of Donsker's theorem

5 Generalizations and Related Results
  5.1 A universal result and some consequences
  5.2 Return times
  5.3 A local limit theorem
  5.4 Generalized moments in the almost sure central limit theorem
1 Introduction
The Almost sure central limit theorem states in its simplest form that a
sequence of independent, identically distributed random variables {X_k}_{k≥1},
with moments EX_1 = 0 and EX_1^2 = 1, obeys
\[
\lim_{n\to\infty} \frac{1}{\log n} \sum_{k=1}^{n} \frac{1}{k}\, I\Bigl\{ \frac{S_k}{\sqrt{k}} \le x \Bigr\} = \Phi(x) \quad \text{a.s.},
\tag{1.1}
\]
for each value x. Here I{·} denotes the indicator function of events, Φ the
distribution function of the standard normal distribution and S_n the n:th
partial sum of the sequence of random variables {X_k}_{k≥1}. The notation
"a.s." abbreviates "almost surely", that is, with probability one. The first
version of (1.1) was proved in the late 1980s, but a preliminary result was
considered in the 1930s by Paul Lévy. It was at this early stage shown
(consult [13] for an elementary proof in the case of the simple, symmetric
random walk) that the random quantity
\[
\frac{1}{n} \sum_{k=1}^{n} I\{ S_k \le 0 \}
\tag{1.2}
\]
does not cease to vary randomly as n tends to infinity. On the contrary, the
distributions of these random variables converge to the Arc sine distribution.
The quantity in (1.2) can be interpreted as the amount of time the random
walk {S_n} has spent below zero up to time n. In the result (1.1), except
for replacing 0 by the more general x, there are weights {1/k}_{k≥1} and a
different normalization, corresponding to the fact that \(\sum_{1\le k\le n} 1/k \sim \log n\)
as n → ∞. In this way the randomness vanishes asymptotically, but on the
other hand, the random walk occupancy time interpretation seems to be
lost.
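Before proceeding, a quick numerical illustration of (1.1) may be helpful. The following Python sketch (our own, not part of the thesis text; the sample size and seed are arbitrary choices) simulates a single path of partial sums and computes the log-averaged indicators for a few values of x; the output should lie close to Φ(x).

```python
import numpy as np
from math import erf, log, sqrt

def phi(x):
    # Standard normal distribution function.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

rng = np.random.default_rng(1)
n = 100_000

# One path of partial sums S_k of i.i.d. steps with EX = 0, EX^2 = 1.
S = np.cumsum(rng.standard_normal(n))
k = np.arange(1, n + 1)

for x in (-1.0, 0.0, 1.0):
    # (1/log n) * sum_{k<=n} (1/k) I{S_k/sqrt(k) <= x}, as in (1.1).
    avg = np.sum((S / np.sqrt(k) <= x) / k) / log(n)
    print(f"x = {x:+.1f}: log-average = {avg:.3f}, Phi(x) = {phi(x):.3f}")
```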
There are many other kinds of sequences of random variables {X_k}_{k≥1}
than the one mentioned at the beginning, for which
\[
\frac{S_n}{\sqrt{n}} \xrightarrow{d} N, \quad \text{as } n \to \infty.
\]
For an even larger class of interesting sequences {X_k}_{k≥1} one has
\[
\frac{S_n - b_n}{a_n} \xrightarrow{d} G, \quad \text{as } n \to \infty,
\tag{1.3}
\]
for some sequences {a_n}_{n≥1} and {b_n}_{n≥1} and a distribution G (different
from the point mass at zero). One may call these results Central limit theorems.
The purpose of this thesis is to present and summarize known generalizations of (1.1) for some of the more well-known examples satisfying (1.3),
especially those where the X_k-variables are independent. This is the content
of Sections 4.1 and 4.2. Improvements, or other ways of generalizing (1.1),
are presented in Sections 4.4 and 5.4. The remaining parts of Chapter 5 and
Section 4.3 present other results which are interesting in this context. Some
useful background material is presented in Chapter 2 while Chapter 3 gives
relevant results to be used in Chapters 4 and 5.
The results of Chapter 3 are to a large extent based on arguments in [4]
which we separate and put in a more general form. However, we make a more
elementary connection to the theory of summation methods. No reference
to the general theory of Riesz typical means, as can be found in [4], is made
here. A slight generalization of the results has also been obtained under the
influence of [11].
A different way of arriving at (1.1), and related results which will be
considered here, is based on Characteristic functions and may be found in
[20].
It is not the aim of this thesis to give a complete overview of results
inspired by, or connected to (1.1). We refer to the survey paper [3] for further
results. We rather hope to introduce the subject and perhaps to contribute
to unification and to fill in some gaps where research articles
most of the time leave out the details. Examples of the latter kind are
Theorems 2.4 and 5.19. Some extensions of previously published results
may also be new.
1.1 Notation
We follow Landau's "small o", "big O" notation for real-valued functions
and sequences. That is: f = o(g) means f(x)/g(x) → 0 as x → ∞, and
f = O(g) means f(x)/g(x) remains bounded as x → ∞. We presume the
reader's familiarity with such statements as: for all ε > 0, log x = o(x^ε).
By f ∼ g we mean "asymptotically equal", that is f(x)/g(x) → 1 as
x → ∞. An example of a true statement of this kind is log(1 + 1/x) ∼ 1/x.
We follow the tradition of denoting iterated logarithms by log_k, k ≥ 1.
That is, log_1(x) := log x, and recursively log_{k+1}(x) := log log_k(x).
We also presume the reader's familiarity with the (hopefully universal)
fundamental concepts of probability theory. We refer e.g. to the first
chapters of [17]. As for notation, we reserve N(µ, σ²) to denote the normal
distribution with expectation µ and variance σ². We also denote the standard
normal distribution, N(0, 1), by N, its distribution function by Φ and its
density function by φ. As is common, we abbreviate "independent,
identically distributed" by i.i.d.
In some places we refer to facts and concepts from the theory of integration.
In measure spaces we denote the indicator function (defined on the same
space, assuming values 0 and 1) of a subset A by I{A}.
2 Preliminaries

2.1 Probability measures and weak convergence
This section concerns probability measures on some space S, equipped with
a metric ρ(·, ·) and the usual σ-field 𝒮 generated by the open balls in S.
Given such a measure P on (S, 𝒮), a set being P-continuous means that its
boundary has P-measure 0. Given a P-integrable function f on S to R,
we also denote ∫ f dP by P f.
To such a setting we may extend the familiar notion (in S = R) of
Convergence in distribution to what is called Weak convergence of probability
measures. It concerns a collection ({P_n}_{n≥1}, P) of probability measures, is
denoted P_n ⇒ P, and is defined by
\[
\lim_{n\to\infty} P_n f = P f, \quad \text{for all bounded and continuous } f.
\]
The following theorem, usually called "the Portmanteau theorem", gives
five equivalent conditions.
Theorem 2.1. Let {P_n}_{n≥1} and P be probability measures on (S, 𝒮). These
five conditions are equivalent:
(i) P_n ⇒ P;
(ii) lim_{n→∞} P_n f = P f, for all bounded, uniformly continuous f;
(iii) lim sup_n P_n F ≤ P F, for all closed sets F;
(iv) lim inf_n P_n G ≥ P G, for all open sets G;
(v) P_n A → P A, for all P-continuity sets A.
Proof. [6, Theorem 2.1, page 16].
In the case of a Separable metric space, i.e. one with a countable dense
subset, condition (ii) may, for sufficiency, be weakened as follows:
Proposition 2.2. Let {P_n}_{n≥1}, P and (S, 𝒮) be as before. Assume S is separable. Then there exists a sequence {f_m}_{m≥1} of bounded, Lipschitz-continuous
functions, f_m : S → R, such that
\[
\lim_{n\to\infty} P_n f_m = P f_m, \quad \text{for all } m \ge 1,
\]
implies P_n ⇒ P.
Proof. Let {x_k}_{k≥1} denote a dense sequence in S. There are countably
many balls {B(x_m, q) : m ∈ N, q ∈ Q}, which we denote {A_k}_{k≥1}. The sets
{A_k} generate the open sets in S in the sense that every open set A may be
written
\[
A = \bigcup_{k \in \mathcal{A}} A_k,
\tag{2.1}
\]
for some 𝒜 ⊆ N. Indeed, if x ∈ A there exists an ε so that B(x, ε) ⊆ A and
an x_k so that x_k ∈ B(x, ε/2). By the triangle inequality B(x_k, ε/2) ⊆ A, so
that 𝒜 := {k ∈ N : A_k ⊆ A} will do.
To verify condition (iv) of Theorem 2.1 it will be enough to consider
only finite unions of sets A_k, since assuming so and writing A = \bigcup_{j\in\mathbb{N}} A_{k_j}
by (2.1) implies that
\[
\liminf_{n} P_n A
= \liminf_{n} \lim_{m\to\infty} P_n \bigcup_{j=1}^{m} A_{k_j}
\ge \lim_{m\to\infty} \liminf_{n} P_n \bigcup_{j=1}^{m} A_{k_j}
\ge \lim_{m\to\infty} P \bigcup_{j=1}^{m} A_{k_j} = P A.
\tag{2.2}
\]
The first inequality (changing the order of limits) is valid since
\[
\liminf_{n} P_n \bigcup_{j=1}^{m} A_{k_j} \quad \text{is non-decreasing in } m,
\]
so that for any ε > 0 and some M(ε):
\[
\lim_{m\to\infty} \liminf_{n} P_n \bigcup_{j=1}^{m} A_{k_j}
\le \liminf_{n} P_n \bigcup_{j=1}^{M(\varepsilon)} A_{k_j} + \varepsilon.
\]
The first term on the right is majorized by
\[
\liminf_{n} \lim_{m\to\infty} P_n \bigcup_{j=1}^{m} A_{k_j},
\]
since
\[
P_n \bigcup_{j=1}^{m} A_{k_j} \quad \text{is non-decreasing in } m, \text{ for all } n.
\]
This proves (2.2).
The collection of finite unions of balls A_k is also countable. Indeed,
the collection of n-ary unions is of no larger cardinality than the n-fold
cartesian product, which is countable, and a countable union of countable
sets is countable. It therefore remains to show that for any fixed, finite union
of sets A_k there exists a sequence {f_m} of bounded Lipschitz functions such
that P_n f_m → P f_m for all m ∈ N implies lim inf_n P_n A ≥ P A. This last
condition is equivalent to lim sup_n P_n F ≤ P F, for F = A^c, where F is a
closed set. Define
\[
\rho(x, F) := \inf_{z \in F} \rho(x, z),
\qquad
f_m(x) := \bigl( 1 - \rho(x, F)\, m \bigr)^{+},
\qquad
F_m := \Bigl\{ x \in S : \rho(x, F) < \frac{1}{m} \Bigr\}.
\]
Then F = \bigcap_{m\in\mathbb{N}} F_m, since F closed implies that
\[
x \notin F \;\Rightarrow\; \rho(x, F) > 0 \;\Rightarrow\; \exists m : \rho(x, F) \ge \frac{1}{m} \;\Rightarrow\; x \notin F_m \;\Rightarrow\; x \notin \bigcap_{m\in\mathbb{N}} F_m.
\]
Moreover, F_{m+1} ⊆ F_m implies that
\[
P F = \lim_{M\to\infty} P \bigcap_{m=1}^{M} F_m = \lim_{M\to\infty} P F_M,
\tag{2.3}
\]
by fundamental properties of measures. Finally, it follows that for any m
\[
I\{F\} \le f_m \le I\{F_m\}.
\tag{2.4}
\]
Indeed, x ∈ F implies that f_m(x) = 1 and (2.4) holds with equalities. Taking
x ∈ F_m \ F implies that 0 < f_m(x) = 1 − ρ(x, F)m ≤ 1, so that (2.4) holds
with 0 and 1 on the boundaries. Taking x ∉ F_m finally implies that f_m(x) = 0
and (2.4) once again holds with equalities.
Now, assuming P_n f_m → P f_m for all m ∈ N, we get by (2.4) that
\[
\limsup_{n} P_n F \le \limsup_{n} P_n f_m = P f_m \le P F_m.
\]
The statement lim sup_n P_n F ≤ P F follows by (2.3). It only remains to show
that f_m is Lipschitz, that is
\[
|f_m(x) - f_m(y)| \le N \rho(x, y),
\tag{2.5}
\]
for some constant N and all x and y, since boundedness is obvious. In
fact (2.5) holds with N = m. This follows from Lemma 2.3 by, as for
(2.4), going through the different cases where x and y belong to F and F_m
respectively.
Lemma 2.3. Let A be any subset of S and define a positive function on S
by ρ(x, A) := inf_{z∈A} ρ(x, z). Then ρ(·, A) is Lipschitz-1, i.e.
\[
|\rho(x, A) - \rho(y, A)| \le \rho(x, y).
\]
Proof. Assume w.l.o.g. that ρ(x, A) ≥ ρ(y, A). For ε > 0 take z ∈ A so that
ρ(y, z) − ρ(y, A) ≤ ε. Then
\[
|\rho(x, A) - \rho(y, A)| = \rho(x, A) - \rho(y, A) \le \rho(x, z) - \rho(y, A)
\le \rho(x, y) + \rho(y, z) - \rho(y, A)
\le \rho(x, y) + \varepsilon.
\]
The proof is complete since ε was arbitrary.
From Theorem 2.1 and Proposition 2.2 we now deduce a result to be
used in Chapters 4 and 5.
Theorem 2.4. Let {d_k}_{k≥1} be a sequence of positive real numbers and set
D_n := \sum_{1\le k\le n} d_k for n ≥ 1. Let further {X_k}_{k≥1} be a sequence of random
elements in a separable metric space S, defined on a probability space
(Ω, 𝓕, P). Let further G be a probability measure on S and, in case S = R,
let C_G ⊆ R denote its set of continuity points. Finally, for x ∈ S, let δ(x)
denote the Dirac point measure at x.
The following two conditions are equivalent:
(i) \( \frac{1}{D_n}\sum_{k=1}^{n} d_k\, \delta(X_k) \Rightarrow G \), almost surely;
(ii) \( \frac{1}{D_n}\sum_{k=1}^{n} d_k\, f(X_k) \xrightarrow{a.s.} \int f\,dG \), for all bounded Lipschitz functions f.
In the case S = R the following is a third equivalent condition:
(iii) \( \frac{1}{D_n}\sum_{k=1}^{n} d_k\, I\{X_k \le x\} \xrightarrow{a.s.} G(x) \), for all x ∈ C_G.
Proof. Define for all n and k:
\[
F_n(\omega) := \frac{1}{D_n} \sum_{k=1}^{n} d_k\, \delta\bigl(X_k(\omega)\bigr),
\qquad
G_k(\omega) := \delta\bigl(X_k(\omega)\bigr).
\]
Since d_k ≥ 0 and D_n = \sum_{k \le n} d_k, this defines, for fixed ω ∈ Ω, probability
measures on S. For the equivalence of (i) and (iii) when S = R we merely
note that F_n(ω) has the distribution function
\[
F_n(\omega; x) := \frac{1}{D_n} \sum_{k=1}^{n} d_k\, I\{X_k(\omega) \le x\}.
\]
Conditions (i) and (iii) are therefore equivalent (cf. [6, Chapter 1]).
In general, condition (i) may now be stated as
\[
F_n(\omega) \Rightarrow G, \quad \text{for all } \omega \notin N,
\tag{2.6}
\]
for some P-null set N ∈ 𝓕. Theorem 2.1 gives the equivalence of (2.6) and
the statement
\[
\int_S f\, dF_n \longrightarrow \int_S f\, dG,
\]
for all bounded, uniformly continuous f and all ω ∉ N. But
\[
\int f\, dF_n = \frac{1}{D_n} \sum_{k=1}^{n} d_k \int f\, dG_k = \frac{1}{D_n} \sum_{k=1}^{n} d_k\, f\bigl(X_k(\omega)\bigr).
\]
Since f Lipschitz implies f uniformly continuous, condition (i) implies
condition (ii), with the same null set N for all f.
On the other hand, by Proposition 2.2, statement (2.6) is also equivalent
to:
\[
\int f_m\, dF_n \longrightarrow \int f_m\, dG, \quad \text{for all } m \text{ and all } \omega \notin N,
\]
where {f_m} is a certain sequence of bounded, Lipschitz-continuous functions.
Once again:
\[
\int f_m\, dF_n = \frac{1}{D_n} \sum_{k=1}^{n} d_k \int f_m\, dG_k = \frac{1}{D_n} \sum_{k=1}^{n} d_k\, f_m\bigl(X_k(\omega)\bigr).
\]
It therefore remains to show that condition (ii) implies:
\[
\frac{1}{D_n} \sum_{k=1}^{n} d_k\, f_m\bigl(X_k(\omega)\bigr) \longrightarrow \int f_m\, dG,
\tag{2.7}
\]
for some P-null set N, all m and all ω ∉ N. Condition (ii) gives null
sets N_m for each f_m. Taking N := \bigcup_m N_m gives another null set, since
P(N) ≤ \sum_m P(N_m) = 0. Finally, for m fixed we have:
\[
\omega \notin N \;\Rightarrow\; \omega \notin N_m \;\Rightarrow\; (2.7).
\]
Remark 2.5. Theorem 2.4 will in later chapters be applied in cases where
S = R and S = C[0, 1], the set of continuous real-valued functions on
[0, 1], equipped with the metric of uniform convergence, and moreover for
S = D[0, 1], the set of functions f : [0, 1] → R which at each point are
right-continuous and have a left-hand limit, equipped with either of the
metrics d and d° defined in [6, Chapter 3]. Another common candidate,
which will not be considered, is S = R^d. Billingsley [6] proves that all
these spaces are separable.
2.2 Central limit theory
We begin by stating three versions of the Central limit theorem: first the
classical formulation, then the Lindeberg-Lévy-Feller version, and finally an
extension of the first which does not merely concern random variables with
finite variance and the normal limit.
Theorem 2.6. Let {X_k}_{k≥1} be a sequence of i.i.d. random variables with
finite expectation µ and positive, finite variance σ², and set S_n = \sum_{k=1}^{n} X_k.
Then
\[
\frac{S_n - n\mu}{\sigma\sqrt{n}} \xrightarrow{d} N \quad \text{as } n \to \infty.
\]
Proof. Confer for example [17, Theorem 1.1, Chapter 3, page 330].
Theorem 2.7. Let {X_k}_{k≥1} be a sequence of independent random variables
with finite expectations µ_k and positive, finite variances σ_k², and set
S_n = \sum_{k=1}^{n} X_k and s_n² = \sum_{k=1}^{n} σ_k². To avoid trivialities, assume
that s_1² > 0. Among the three conditions below, (ii) is equivalent to the
conjunction of (i) and (iii).
\[
\text{(i)}\quad \max_{1\le k\le n} \frac{\sigma_k^2}{s_n^2} \to 0 \quad \text{as } n \to \infty;
\]
\[
\text{(ii)}\quad \frac{1}{s_n^2} \sum_{k=1}^{n} E|X_k - \mu_k|^2\, I\bigl\{ |X_k - \mu_k| > \varepsilon s_n \bigr\} \to 0 \quad \text{as } n \to \infty, \text{ for every } \varepsilon > 0;
\]
\[
\text{(iii)}\quad \frac{1}{s_n} \sum_{k=1}^{n} (X_k - \mu_k) \xrightarrow{d} N \quad \text{as } n \to \infty.
\]
Proof. [17, Theorem 2.1, Chapter 3, page 331].
Before the next result we need some new notions. A probability distribution F on R belongs to the Domain of attraction of a non-degenerate
distribution G whenever a suitably centered and normalized sequence of partial sums of i.i.d. F-distributed random variables converges in distribution to
G. It can be shown that G is unique, up to centering and normalization, relative to F, and that only Stable distributions may occur as G-distributions
(cf. [17, pages 428-431]).
We only mention here that stable distributions may be characterized by
an order parameter α ∈ (0, 2], a skewness parameter β ∈ [−1, 1], and
finally by centering and normalization. Their characteristic functions admit
closed-form expressions, cf. [17, page 427]. They possess moments of order r,
r ∈ (0, α), except when α = 2, which gives the normal distribution (no
extra skewness parameter in this case), which has moments of all orders.
The two most well-known members of the family are the symmetric Cauchy
distribution (α = 1) and the standard normal distribution (α = 2).
A positive, Lebesgue-measurable function L, defined on [a, ∞) for some
a > 0, is said to be Slowly varying at infinity, L ∈ SV, whenever
\[
\frac{L(tx)}{L(t)} \to 1 \quad \text{as } t \to \infty, \text{ for all } x > 0.
\]
Examples are positive functions with a finite positive limit at infinity, and L = log⁺.
Theorem 2.8. A random variable X with distribution function F belongs
to the domain of attraction of a stable distribution of order α if and only if
there exists L ∈ SV such that
\[
E X^2 I\{|X| \le x\} \sim x^{2-\alpha} L(x) \quad \text{as } x \to \infty,
\tag{2.8}
\]
and, moreover, for α ∈ (0, 2), there exists some p ∈ [0, 1] such that
\[
\frac{P(X > x)}{P(|X| > x)} \to p
\quad \text{and} \quad
\frac{P(X < -x)}{P(|X| > x)} \to 1 - p
\quad \text{as } x \to \infty.
\tag{2.9}
\]
Proof. Confer [17, Theorem 3.2, Chapter 9, page 432].
Remark 2.9. It is possible to replace (2.8) by a condition involving P(|X| >
x) instead of EX² I{|X| ≤ x}, cf. [17, page 432].
The centering sequence may be ignored when α < 1 and taken to be
{nEX} when α > 1. This is possible since X possesses moments of the same
order as the stable distribution to which convergence occurs. An explicit
expression for the centering sequence may also be given for the case α = 1.
The normalization sequence will typically be of the form n^{1/α} L(n), for
some L ∈ SV, and may be taken to be increasing; cf. [17] or [14].
Theorem 2.14 below is due to de Acosta and Giné [10]. We here restrict to
the case of real random variables and give the original proof in a somewhat
more detailed version. We first state some facts related to symmetric random
variables and symmetrization that will be needed, and one lemma concerning
slowly varying functions. The inequalities in Proposition 2.13 are the
so-called Lévy inequalities. For a random variable X we define the
distribution of the symmetrized variable by X^s \overset{d}{=} X − X', where X' and X
are i.i.d.
Lemma 2.10. Assume that a_n = n^{1/α} L(n) with L ∈ SV, α > 0. Then there
exist constants C and N and a sequence {τ_n}, with τ_n → 0 as n → ∞, such
that for n > N and all m
\[
\frac{a_{mn}}{a_n} \le C\, m^{1/\alpha + \tau_n}.
\]
Proof. Set p = 1/α. By assumption
\[
\frac{a_{2n}}{a_n} \to 2^p \quad \text{as } n \to \infty.
\]
Choose N so that one may define a non-increasing sequence {τ_n}, with
τ_N = 1 and τ_n → 0 as n → ∞, so that
\[
\frac{a_{2n}}{a_n} \le 2^{p + \tau_n}, \quad \text{when } n > N.
\]
It now follows for n > N that
\[
\frac{a_{2^k n}}{a_n} = \prod_{j=1}^{k} \frac{a_{2^j n}}{a_{2^{j-1} n}}
\le \bigl( 2^{p+\tau_n} \bigr)^k = \bigl( 2^k \bigr)^{p+\tau_n}.
\]
It remains to show that, for some constant C, all k, 2^{k-1} ≤ m ≤ 2^k and
n > N,
\[
\frac{a_{mn}}{a_{2^{k-1} n}} \le C,
\tag{2.10}
\]
since then
\[
\frac{a_{mn}}{a_n} = \frac{a_{mn}}{a_{2^{k-1} n}} \cdot \frac{a_{2^{k-1} n}}{a_n}
\le C \bigl( 2^{k-1} \bigr)^{p+\tau_n} \le C\, m^{p+\tau_n}.
\]
When a_n is (ultimately) increasing it follows that
\[
\frac{a_{mn}}{a_{2^{k-1} n}} \le \frac{a_{2^k n}}{a_{2^{k-1} n}} \le 2^{p+\tau_n} \le 2^{p+1}.
\]
But (2.10) also holds in general by the uniform convergence theorem
[7, Theorem 1.5.2, page 22], which gives
\[
\frac{a_{mn}}{a_{2^{k-1} n}}
\le \Bigl| \frac{a_{mn}}{a_{2^{k-1} n}} - \Bigl( \frac{m}{2^{k-1}} \Bigr)^p \Bigr|
+ \Bigl( \frac{m}{2^{k-1}} \Bigr)^p
\le 1 + 2^p,
\]
for n > N_1, say. One may then choose N_0 = N ∨ N_1 instead of N.
Remark 2.11. This lemma could also easily be deduced from Karamata's
representation theorem for slowly varying functions (which may be found in
[7, page 12]).
Proposition 2.12. Let X be a random variable and let med(X) denote the
median of X. Then for any r > 0,
\[
\frac{1}{2}\, E|X - \mathrm{med}(X)|^r \le E|X^s|^r.
\]
Proof. [17, Proposition 6.4, Chapter 3, page 135].
Proposition 2.13. Let {X_k}_{k≥1} be a sequence of independent, symmetric
random variables with partial sums S_n, n ≥ 1. Then
\[
P\bigl( \max_{1\le k\le n} S_k > x \bigr) \le 2 P(S_n > x),
\qquad
P\bigl( \max_{1\le k\le n} |S_k| > x \bigr) \le 2 P(|S_n| > x).
\]
Proof. [17, Theorem 7.1, Chapter 3, page 139].
Theorem 2.14. Assume that a distribution function F belongs to the domain
of attraction of a stable distribution G of index α. Take {X_k}_{k≥1} i.i.d.
and F-distributed, set S_n = \sum_{k=1}^{n} X_k and assume that
\[
\frac{S_n - b_n}{a_n} \xrightarrow{d} G, \quad \text{as } n \to \infty,
\tag{2.11}
\]
for some positive sequence {a_n} and real sequence {b_n}. Then
\[
\sup_n E \Bigl| \frac{S_n - b_n}{a_n} \Bigr|^\beta < \infty, \quad \text{for all } \beta \in (0, \alpha).
\tag{2.12}
\]
Remark 2.15. Theorem 2.14 implies that moments of order strictly less
than α converge to moments of corresponding order in the limit relation.
To prove this one needs to verify Uniform integrability for the sequences
\[
\Bigl| \frac{S_n - b_n}{a_n} \Bigr|^\beta, \quad \beta \in (0, \alpha).
\]
Uniform boundedness for all such β suffices by [17, Theorem 4.2, Chapter
5, page 215].
Proof of Theorem 2.14. It suffices to prove the result for symmetric random
variables X_k. Indeed, for the general case of (2.11) it follows (by subtracting
independent, convergent random variables) that
\[
\frac{S_n^s}{a_n} \xrightarrow{d} G^s, \quad \text{as } n \to \infty.
\]
Assuming (2.12) for this sequence we may then use Proposition 2.12 to
conclude that
\[
E \Bigl| \frac{S_n - b_n}{a_n} - \mathrm{med}\Bigl( \frac{S_n - b_n}{a_n} \Bigr) \Bigr|^\beta
\le 2\, E \Bigl| \frac{S_n^s}{a_n} \Bigr|^\beta.
\]
It then remains to prove that the sequence {med((S_n − b_n)/a_n)} is bounded,
but this follows from assumption (2.11).
For symmetric random variables no centering constants b_n are necessary.
We now proceed to prove the theorem for this situation. For ε ∈ (0, 1/2),
choose d so that for all n
\[
P\Bigl( \Bigl| \frac{S_n}{a_n} \Bigr| > d \Bigr) < \varepsilon.
\]
This is possible since convergence in distribution implies stochastic
boundedness. It now follows by the second inequality in Proposition 2.13 that
\[
P\bigl( \max_{1\le k\le m} |S_{nk} - S_{n(k-1)}| / a_{mn} > d \bigr)
\le 2 P\bigl( |S_{mn}| / a_{mn} > d \bigr) \le 2\varepsilon.
\]
Now since
\[
|S_{nk} - S_{n(k-1)}| \overset{d}{=} |S_n|,
\]
independently over 1 ≤ k ≤ m, and since for any i.i.d. random variables
Y_k and Y,
\[
P\bigl( \max_{1\le k\le m} |Y_k| > d \bigr) = 1 - P\bigl( \max_{k\le m} |Y_k| \le d \bigr) = 1 - P(|Y| \le d)^m,
\]
\[
P(|Y| > d) = 1 - \Bigl( 1 - P\bigl( \max_{1\le k\le m} |Y_k| > d \bigr) \Bigr)^{1/m},
\]
it follows that
\[
P\bigl( |S_n| / a_{mn} > d \bigr) \le 1 - (1 - 2\varepsilon)^{1/m}.
\tag{2.13}
\]
One verifies that for 0 < a < 1, some constant C = C(a) and all m,
\[
1 - a^{1/m} \le C/m.
\tag{2.14}
\]
Indeed,
\[
1 - a^x \le Cx \iff a^x \ge 1 - Cx \iff x \log a \ge \log(1 - Cx),
\]
but
\[
\log(1 - Cx) \sim -Cx \quad \text{as } x \to 0,
\]
and
\[
x \log(1/a) \le Cx
\]
obviously holds for C ≥ log(1/a). Equation (2.13) therefore turns into
\[
P\bigl( |S_n| / a_{mn} > d \bigr) \le C_0 / m, \quad \text{for some constant } C_0.
\tag{2.15}
\]
It now follows by Lemma 2.10 that
\[
m\, P\bigl( |S_n| / a_n > C d\, m^{1/\alpha + \tau_n} \bigr)
\le m\, P\bigl( |S_n| / a_{mn} > d \bigr) \le C_0.
\tag{2.16}
\]
From this we are now in a position to prove:
\[
P\Bigl( \frac{|S_n|}{a_n} > t \Bigr)\, t^{\alpha - \delta} \le C,
\quad \text{for any } \delta \text{ with } 0 < 2\delta < \alpha, \text{ for } n > N(\delta) \text{ and all } t.
\tag{2.17}
\]
Indeed, choose N(δ) so that for n > N(δ)
\[
-\frac{1}{\alpha} < \tau_n < \frac{\delta}{\alpha} \cdot \frac{1}{\alpha - \delta}.
\tag{2.18}
\]
Put then for simplicity Cd = M in (2.16) and choose m so that
\[
M m^{1/\alpha + \tau_n} < t \le M (m+1)^{1/\alpha + \tau_n}.
\tag{2.19}
\]
From these assumptions we may now deduce
\[
P\Bigl( \frac{|S_n|}{a_n} > t \Bigr) \le P\Bigl( \frac{|S_n|}{a_n} > C d\, m^{1/\alpha + \tau_n} \Bigr),
\tag{2.20}
\]
and
\[
t^{\alpha - \delta} \le M^{\alpha - \delta} (m+1)^{(1/\alpha + \tau_n)(\alpha - \delta)}
\le C\, m^{1 + \tau_n(\alpha - \delta) - \frac{\delta}{\alpha}} \le C m.
\tag{2.21}
\]
Statement (2.17) now follows from (2.16), (2.20) and (2.21).
With this established we can now conclude, for any such δ and n > N(δ),
that
\[
E \Bigl| \frac{S_n}{a_n} \Bigr|^{\alpha - 2\delta}
= \int_0^\infty (\alpha - 2\delta)\, t^{\alpha - 2\delta - 1}\, P\bigl( |S_n| / a_n > t \bigr)\, dt
\le (\alpha - 2\delta) \Bigl( \int_0^1 t^{\alpha - 2\delta - 1}\, dt + C \int_1^\infty t^{-(1+\delta)}\, dt \Bigr) < \infty,
\tag{2.22}
\]
where P(|S_n|/a_n > t) ≤ 1 was used for t ≤ 1 and (2.17) for t > 1. For δ
fixed and the finitely many n ≤ N(δ) = N we may simply use that
E|X|^{α−2δ} < ∞ and Minkowski's inequality to conclude that
\[
E|S_n|^{\alpha - 2\delta}
\le \Bigl( n\, \bigl( E|X|^{\alpha - 2\delta} \bigr)^{\frac{1}{\alpha - 2\delta}} \Bigr)^{\alpha - 2\delta}
\le N^{\alpha - 2\delta}\, E|X|^{\alpha - 2\delta}.
\tag{2.23}
\]
Uniform boundedness follows from (2.22) and (2.23).
We now turn to extensions of the first three limit theorems of this section
to random functions, or elements, in the spaces C[0, 1] and D[0, 1] mentioned
in Section 2.1. The limiting distributions considered are Wiener measure, i.e.
the probability measure of the stochastic process Brownian motion on the
unit interval (cf. [6, Section 8]), and more general Lévy stable processes (cf.
[32, page 113]). In particular, Theorems 2.16 and 2.17 are known as versions
of Donsker's theorem.
In this context we shall also mention the Arc sine distribution of Lévy and
its connection to the amount of time certain random walks asymptotically
spend on the positive or negative axis.
Theorem 2.16. Let W denote Wiener measure on C[0, 1]. If {ξ_k}_{k≥1} is a
sequence of i.i.d. random variables with mean 0 and variance σ², and if X^n
is the continuous random function on [0, 1] defined by
\[
X_t^n(\omega) = \frac{1}{\sigma\sqrt{n}}\, S_{[nt]}(\omega)
+ \bigl( nt - [nt] \bigr)\, \frac{1}{\sigma\sqrt{n}}\, \xi_{[nt]+1}(\omega),
\]
then X^n ⇒ W as n → ∞.
Proof. [6, Theorem 8.2].
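To make the interpolation in Theorem 2.16 concrete, here is a short Python sketch (our illustration; grid and sample size are arbitrary choices) that evaluates the random function X^n on [0, 1]. Across independent realizations, X^n_1 is approximately N(0, 1)-distributed, as Donsker's theorem predicts.

```python
import numpy as np

def donsker_path(xi, t, sigma=1.0):
    # X^n_t = S_[nt]/(sigma sqrt(n)) + (nt - [nt]) * xi_{[nt]+1}/(sigma sqrt(n)).
    n = len(xi)
    S = np.concatenate(([0.0], np.cumsum(xi)))   # S_0, ..., S_n
    nt = np.floor(n * t).astype(int)
    frac = n * t - nt
    xi_next = np.append(xi, 0.0)[nt]             # xi_{[nt]+1}; the pad handles t = 1
    return (S[nt] + frac * xi_next) / (sigma * np.sqrt(n))

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 501)
x = donsker_path(rng.standard_normal(10_000), t)
print(x[-1])   # X^n_1 = S_n / sqrt(n): approximately N(0,1) over realizations
```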
Theorem 2.17. Let W denote Wiener measure on D[0, 1]. Suppose that
{ξ_k}_{k≥1} is an independent sequence of random variables with means 0 and
variances σ_k² satisfying the conditions of Theorem 2.7. Let s_n² and S_n be
the partial sums of variances and of the random variables respectively. Let
further X^n be the random function on [0, 1] defined by
\[
X_t^n(\omega) = \frac{S_k(\omega)}{s_n},
\quad \text{for } t \in \Bigl[ \frac{s_k^2}{s_n^2}, \frac{s_{k+1}^2}{s_n^2} \Bigr),
\quad k = 1, \ldots, n.
\]
Then X^n ⇒ W as n → ∞.
Proof. [6, Theorem 14.1 and its extension].
Theorem 2.18. Let F, G, {X_k}_{k≥1}, {S_n}, {a_n} and {b_n} be as in Theorem
2.14. Assume moreover (w.l.o.g.) that G has normalization parameter 1
and centering parameter 0. Let further X^n be the random function on [0, 1]
defined by
\[
X_t^n(\omega) = \frac{S_k(\omega) - b_k}{a_n},
\quad \text{for } t \in \Bigl[ \frac{k}{n}, \frac{k+1}{n} \Bigr),
\quad k = 1, \ldots, n.
\]
Then X^n ⇒ X as n → ∞, where X ∈ D[0, 1] is the α-stable Lévy motion
(or Lévy process) whose one-dimensional marginal distribution at t = 1 is
G.
Proof. [30, Proposition 3.4, page 81]. For more information about α-stable
Lévy motions confer also [32, page 113].
Remark 2.19. There is no difficulty in using the same kind of linearly
interpolated random variables as in Theorem 2.16 also in Theorem 2.17, to
remain in the context of C[0, 1]. This is on the other hand not possible for
Theorem 2.18, since it is not possible to define X as an element of C[0, 1]
when α < 2 (confer [32, Exercises 9.5-9.6, Chapter 9]).
For simplicity we state the following theorem in the context of D[0, 1].
The function h which occurs could be defined on the entire function space;
it then measures the amount of the unit interval where a given function takes
positive values.
Theorem 2.20. Let {X^n}_{n≥1} be random elements in D[0, 1] defined as in
Theorem 2.17. Define random variables by:
\[
h(X^n) = \frac{1}{s_n^2} \sum_{k=1}^{n} \sigma_k^2\, I\{ S_k > 0 \}.
\]
Then
\[
h(X^n) \xrightarrow{d} A, \quad \text{as } n \to \infty,
\]
where A denotes the arc sine law, that is, the distribution concentrated on
[0, 1] satisfying:
\[
P(A \le t) = \frac{2}{\pi} \arcsin \sqrt{t}, \quad 0 < t < 1.
\]
Proof. [6, Section 8] and [6, Section 14].
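In the i.i.d. case σ_k² ≡ 1, h(X^n) reduces to the fraction of times k ≤ n with S_k > 0, and Theorem 2.20 is easy to check by simulation. A minimal sketch (ours; all parameters arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 10_000, 2_000

# Fraction of time k <= n with S_k > 0, for `reps` independent +-1 walks.
steps = rng.choice([-1.0, 1.0], size=(reps, n))
occupancy = (np.cumsum(steps, axis=1) > 0).mean(axis=1)

# Compare with P(A <= t) = (2/pi) * arcsin(sqrt(t)) at one point t.
t = 0.2
print((occupancy <= t).mean(), 2.0 / np.pi * np.arcsin(np.sqrt(t)))
```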
We finally state a local limit theorem related to the results of this section.
A more general version than the one given below concerns i.i.d. random
variables with an arbitrary lattice distribution and finite variance; confer
[17, Theorem 7.6, Chapter 7, page 365] or [15, §49].
Theorem 2.21. Suppose that {X_k}_{k≥1} is a sequence of i.i.d. random
variables assuming only integer values. Assume that every integer a is a
possible value of S_n := \sum_{1\le k\le n} X_k for all sufficiently large n. Assume
moreover that EX_1 = 0 and that Var X_1 = σ² < ∞. Then, uniformly over
all integers a,
\[
\sqrt{n}\, P(S_n = a) \to \frac{1}{\sigma\sqrt{2\pi}} \quad \text{as } n \to \infty.
\]
Proof. [15, §49].
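Theorem 2.21 can be verified numerically by computing the law of S_n exactly through repeated convolution. The sketch below (our illustration, with an arbitrarily chosen aperiodic step distribution on {−1, 0, 1}) prints √n P(S_n = 0) next to 1/(σ√(2π)).

```python
import numpy as np

# Aperiodic integer step distribution on {-1, 0, 1}: mean 0, variance 2/3.
p = np.array([1.0, 1.0, 1.0]) / 3.0
sigma = np.sqrt(2.0 / 3.0)

n = 200
dist = p.copy()
for _ in range(n - 1):
    dist = np.convolve(dist, p)      # exact law of S_{k+1} from that of S_k

# dist[j] = P(S_n = j - n), so the point a = 0 sits at index n.
print(np.sqrt(n) * dist[n], 1.0 / (sigma * np.sqrt(2.0 * np.pi)))
```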
2.3 Summation methods: linear transformations
The concept of a Summation method originates from questions concerning
how to assign limits to divergent sums. A classical treatise on the subject is
[18]. We will here rather be interested in assigning limits to divergent
sequences, but the problems are more or less equivalent. Typically the
divergence is due to oscillation in some form. We shall only consider a
certain type of such methods, namely certain kinds of linear transformations
of sequences (as elements in R^∞). We associate a summation method T to
a double sequence of real numbers, {c_{m,n}}_{m,n≥1}, and say that a sequence
{s_n}_{n≥1} is summable to s (T) whenever
\[
t_m = \sum_{n=1}^{\infty} c_{m,n}\, s_n \to s, \quad \text{as } m \to \infty.
\]
One important concept is that of Regularity. A method T is said to be
regular whenever ordinary convergence of a sequence, s_n → s, implies that
s_n is summable to s by T. The following theorem is associated with Toeplitz
and was originally proved at the beginning of the 20th century.
Theorem 2.22. In order that T be regular, it is necessary and sufficient
that
(i) γ_m = \sum_{n\ge1} |c_{m,n}| < H, for some constant H independent of m;
(ii) c_{m,n} → 0, for each n, as m → ∞;
(iii) c_m = \sum_{n\ge1} c_{m,n} → 1, as m → ∞.
Proof. [18, Theorem 2, Chapter 3, page 43] or [24, Theorem 4.10-1, page
270]. The former reference proves necessity by counterexamples, while the
latter uses results from functional analysis.
We shall mostly be interested in an application where {p_n}_{n≥0} is a
non-negative sequence with p_0 > 0. Set P_m = \sum_{n=0}^{m} p_n, assume that
P_m → ∞ as m → ∞, and set
\[
c_{m,n} = p_n / P_m \text{ when } n \le m,
\qquad
c_{m,n} = 0 \text{ when } n > m.
\]
The method associated to {c_{m,n}} will be denoted by (N̄, p_n). One example
is p_n = 1 (so that P_m = m + 1), which gives Cesàro summation, i.e.
arithmetic means. The p_n-sequence will be called Weights and gives rise to
weighted means t_m.
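As a small illustration of the (N̄, p_n) methods (ours, not from [18]), the weighted means t_m take one line of code; below, the divergent sequence s_n = (−1)^n is summed to 0 both by arithmetic means (p_n = 1) and by the logarithmic weights p_n = 1/(n + 1) that reappear later in the thesis.

```python
import numpy as np

def weighted_means(s, p):
    # t_m = (1/P_m) * sum_{n<=m} p_n s_n, the (N-bar, p_n) transform of s.
    return np.cumsum(p * s) / np.cumsum(p)

n = np.arange(100_000)
s = (-1.0) ** n                                  # divergent oscillating sequence

print(weighted_means(s, np.ones_like(s))[-1])    # arithmetic means: near 0
print(weighted_means(s, 1.0 / (n + 1))[-1])      # log-type weights: near 0
```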
Theorem 2.23. The method (N̄, p_n) is regular.
Proof. This could be established in an elementary way without any reference
to Theorem 2.22, but the conditions (i)-(iii) are in any case easily verified:
γ_m = c_m = 1, verifying (i) and (iii), and for (ii), c_{m,n} = p_n/P_m → 0 as
m → ∞.
In fact, one may go a bit further and prove that methods of the type
(N̄, p_n) are Totally regular. This means that, in addition, s_n → ∞ implies
that s_n is summable to ∞ (T). Confer [18, Theorem 10, Chapter 3, page 53].
Much investigation has been made of how different summation methods
relate to each other. The following theorem gives such results for the
methods just defined.
Theorem 2.24. If p_n > 0, q_n > 0, \sum p_n = ∞, \sum q_n = ∞, and either
(a) q_{n+1}/q_n ≤ p_{n+1}/p_n; or
(b) p_{n+1}/p_n ≤ q_{n+1}/q_n and P_n/p_n ≤ H Q_n/q_n, for some constant H,
then s_n → s (N̄, p_n) implies s_n → s (N̄, q_n).
Proof. [18, Theorem 14, Chapter 3, page 58].
The relation of Strength and the concept of Equivalence, applied to pairs
of methods, refer to situations as in the conclusion of the theorem, that is,
to relations between the two sets of summable sequences.
A special case, which we shall encounter later, is where two methods are
defined by bounded weight sequences p_n and q_n related so that p_n ∼ q_n
as n → ∞. Do they necessarily give rise to equivalent summation methods?
This question is answered in the following proposition. The proof of the
second part is inspired by the proof of Theorem 2.24 found in [18].
Theorem 2.25. Assume that two methods (N̄, p_n) and (N̄, q_n) with bounded
weights p_n and q_n are related so that p_n ∼ q_n as n → ∞. Then
(i) s_n → s (N̄, p_n) ⟺ s_n → s (N̄, q_n), for bounded sequences s_n;
(ii) there exist sequences p_n and q_n giving rise to non-equivalent methods.
Proof. It follows that P_n ∼ Q_n. Indeed, for ε > 0 small, take N such that
for k ≥ N,
\[
q_k \le (1 + \varepsilon)\, p_k.
\]
Then for m > N:
\[
\frac{Q_m}{P_m}
\le \frac{\sum_{k=1}^{N} q_k}{\sum_{k=1}^{m} p_k}
+ \frac{(1+\varepsilon) \sum_{k=N+1}^{m} p_k}{\sum_{k=1}^{m} p_k}.
\]
The first term tends to zero as m → ∞, while the second is bounded by
1 + ε. Reversing the roles of p and q, and replacing ε by −ε, yields the result.
For (i), assume {s_n} bounded and that s_n → s (N̄, p_n). We must now
prove s_n → s (N̄, q_n), which (by P_n ∼ Q_n) is equivalent to showing that
\[
\frac{1}{P_n} \sum_{k \le n} q_k s_k \to s, \quad \text{as } n \to \infty.
\tag{2.24}
\]
We then decompose
\[
\frac{1}{P_n} \sum_{k \le n} q_k s_k
= \frac{1}{P_n} \sum_{k \le n} p_k s_k + \frac{1}{P_n} \sum_{k \le n} (q_k - p_k) s_k,
\]
and it remains to prove that the second term on the right tends to zero as n
tends to infinity. For ε > 0 small, take N large such that for k ≥ N,
\[
|q_k - p_k| \le \varepsilon\, p_k.
\]
We use the boundedness of s_n to conclude that
\[
\Bigl| \frac{1}{P_n} \sum_{k \le n} (q_k - p_k) s_k \Bigr|
\le C\, \frac{1}{P_n} \sum_{k \le n} |q_k - p_k|
\le C\, \frac{1}{P_n} \Bigl( \sum_{k \le N} |q_k - p_k| + \sum_{k=N+1}^{n} \varepsilon\, p_k \Bigr)
\le C\, \frac{1}{P_n} \sum_{k \le N} |q_k - p_k| + C\varepsilon.
\]
The conclusion follows since P_n → ∞ and since ε was arbitrary.
For (ii), take p_k = 1 and q_k = 1 + \frac{(-1)^k}{\log k} for k ≥ 2 and q_0 = q_1 = 1.
These indeed satisfy the conditions. Let, for any sequence {s_k}_{k≥1},
\[
t_m = \frac{1}{P_m} \sum_{k \le m} p_k s_k,
\qquad
u_m = \frac{1}{Q_m} \sum_{k \le m} q_k s_k.
\]
It follows that
\[
u_m = \frac{1}{Q_m} \Bigl( \frac{q_0}{p_0} P_0 t_0 + \frac{q_1}{p_1} (P_1 t_1 - P_0 t_0) + \cdots + \frac{q_m}{p_m} (P_m t_m - P_{m-1} t_{m-1}) \Bigr)
= \sum_{n=0}^{m} c_{m,n}\, t_n,
\]
where
\[
c_{m,n} =
\begin{cases}
\Bigl( \dfrac{q_n}{p_n} - \dfrac{q_{n+1}}{p_{n+1}} \Bigr) \dfrac{P_n}{Q_m}, & n < m, \\[1ex]
\dfrac{q_m}{p_m} \cdot \dfrac{P_m}{Q_m}, & n = m, \\[1ex]
0, & n > m.
\end{cases}
\]
We may now come to the desired conclusion by verifying that some of the
conditions (i)-(iii) in Theorem 2.22 are violated for this double sequence
{c_{m,n}}. In fact, for n < m,
\[
|c_{m,n}| = \Bigl( \frac{1}{\log n} + \frac{1}{\log(n+1)} \Bigr) \frac{P_n}{Q_m},
\]
so that
\[
\sum_{n<m} |c_{m,n}|
> \frac{1}{Q_m} \sum_{n<m} n \Bigl( \frac{1}{\log n} + \frac{1}{\log(n+1)} \Bigr)
\sim \frac{1}{m} \sum_{n<m} n \Bigl( \frac{1}{\log n} + \frac{1}{\log(n+1)} \Bigr)
> \frac{1}{m} \sum_{n<m} \frac{n}{\log n}.
\]
The latter expression tends to infinity with m, since n/log n tends to infinity
with n. This contradicts condition (i) of Theorem 2.22.
To make this argument rigorous we must show that there is no essential
restriction on which convergent sequences {t_n} may occur after
transformation by the method (N̄, p_n). But in fact both {p_n} and {q_n}
are non-vanishing, so the corresponding linear transformations are easily
seen to be bijective on R^∞.
A general phenomenon of summation methods mentioned in [18] is that
there is often a two-fold limitation on the type of sequences to which they
apply: they neither accept sequences behaving too roughly, nor such where
the divergence is too slow. We mention here one result giving a criterion of
the first kind.
Proposition 2.26. If p_n > 0 and s_n → s (N̄, p_n), then
\[
s_n - s = o(P_n / p_n).
\]
Proof. The proof is found in [18, Theorem 13, Chapter 3, page 57] and may
be given in one line:
\[
p_n s_n = P_n t_n - P_{n-1} t_{n-1} = s (P_n - P_{n-1}) + o(P_n) = s\, p_n + o(P_n).
\]
It is a characteristic feature that the strength of methods (N̄, p_n) is
negatively related to the speed with which P_m tends to infinity. This is
visible in Theorem 2.24 and Proposition 2.26. The following proposition
further illustrates it; it gives a necessary upper bound on this speed for the
method to sum anything beyond convergent sequences.
Proposition 2.27. If P_{n+1}/P_n ≥ 1 + δ > 1 for some δ, then \sum a_n cannot
be summable (N̄, p_n) unless it is convergent.
Proof. [18, Theorem 15, Chapter 3, page 59].
3 Almost Sure Converging Means of Random Variables
The main result of this chapter is Theorem 3.1. The situation resembles
the Law of large numbers for random variables of finite variance, although
no assumptions on independence or orthogonality are made here; instead we
assume boundedness. Moreover, we no longer restrict to arithmetic means,
but allow rather arbitrary weight sequences.
3.1 Bounds for variances of weighted partial sums
The proof of the following theorem is fairly simple in principle and consists
of two parts. The procedure is known as the "method of subsequences" or
the "gap method". It should however be noted that the second part of
such proofs is usually the hardest one, demanding some analysis and the use
of "maximal inequalities". This will not enter here, although such reasoning
would be possible also in this more general situation. For further discussion
confer Remark 3.4. The arguments in the proof are to a large extent implicit
in [4]; confer also [11] for a similar result.
A general result in the case of arithmetic means, and with no assumption
on boundedness, may be found in [19].
Theorem 3.1. Let {ξ_k}_{k≥1} be a sequence of random variables, uniformly
bounded below and with finite variances, and let {d_k}_{k≥1} be a sequence of
positive numbers. Set, for n ≥ 1, D_n := \sum_{k=1}^{n} d_k and T_n := \frac{1}{D_n} \sum_{k=1}^{n} d_k \xi_k.
Assume that
\[
D_n \to \infty \quad \text{and} \quad D_{n+1}/D_n \to 1,
\tag{3.1}
\]
as n → ∞. If, for some constant C > 0 and all n,
\[
E T_n^2 \le C (\log D_n)^{-1} (\log_2 D_n)^{-2},
\tag{3.2}
\]
then
\[
T_n \xrightarrow{a.s.} 0 \quad \text{as } n \to \infty.
\tag{3.3}
\]
For the proof we shall need the following lemma.
Lemma 3.2. Let {d_k}_{k≥1} and {D_n}_{n≥1} be as in Theorem 3.1. Then for
each a > 1 there exists a subsequence {n_k} such that
\[
D_{n_k} \sim a^k, \quad \text{as } k \to \infty.
\tag{3.4}
\]
Proof. Choose N large so that n > N implies that D_{n+1}/D_n < a. For
k ≤ N take n_k = k. For k > N define
\[
n_k = \inf\{ n : D_n \ge a^k \}.
\]
Now n_{k+1} > n_k, since D_{n_k} ≥ a^k and D_{n_k - 1} < a^k imply that
\[
D_{n_k} = \frac{D_{n_k}}{D_{n_k - 1}}\, D_{n_k - 1} < a \cdot a^k = a^{k+1}.
\]
Moreover, it also follows for k > N that
\[
1 \le \frac{D_{n_k}}{a^k} < \frac{D_{n_k}}{D_{n_k - 1}}.
\]
The desired conclusion follows since D_n / D_{n-1} → 1.
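The subsequence of Lemma 3.2 is easy to compute numerically. The sketch below (ours; the choices d_k = 1/k and a = 1.5 are arbitrary) shows how sparse {n_k} becomes when D_n grows slowly, which is also the difficulty discussed in Remark 3.4 below.

```python
import numpy as np

# Weights d_k = 1/k, so D_n ~ log n (slowly increasing).
D = np.cumsum(1.0 / np.arange(1, 10**6 + 1))

a = 1.5
# n_k = inf{ n : D_n >= a^k }; searchsorted gives the first such (0-based) index.
n_k = [int(np.searchsorted(D, a**k)) + 1 for k in range(1, 7)]
print(n_k)   # roughly exp(a^k): the subsequence is extremely sparse
```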
Proof of Theorem 3.1. Let a > 1 and apply Lemma 3.2 with this a. Condition
(3.2) for this subsequence then becomes
\[
E T_{n_k}^2 \le C k^{-1} (\log k)^{-2}, \quad \text{for some constant } C.
\tag{3.5}
\]
This follows since (3.4) implies that D_{n_k}/a^k remains bounded. We hence
have
\[
E \sum_{k=1}^{\infty} T_{n_k}^2
= \sum_{k=1}^{\infty} E T_{n_k}^2
\le C \sum_{k=1}^{\infty} \frac{1}{k (\log k)^2} < \infty,
\]
since the summands are positive random variables. A fortiori
\(\sum_{k=1}^{\infty} T_{n_k}^2 < \infty\) almost surely, which in turn implies that
T_{n_k}^2 \xrightarrow{a.s.} 0, which finally implies that
\[
T_{n_k} \xrightarrow{a.s.} 0, \quad \text{as } k \to \infty.
\tag{3.6}
\]
Consider now an arbitrary n and assume that n_k < n ≤ n_{k+1}. From (3.4)
it follows that
\[
D_{n_{k+1}} / D_{n_k} \to a.
\tag{3.7}
\]
By assumption there exists an M such that −M < ξ_k for all k. Define, for
n ≥ 1,
\[
T_n' = \frac{1}{D_n} \sum_{k=1}^{n} d_k (\xi_k + M) = T_n + M.
\]
Then by (3.6)
\[
T_{n_k}' \xrightarrow{a.s.} M, \quad \text{as } k \to \infty.
\tag{3.8}
\]
Moreover, by positivity,
\[
D_{n_k} T_{n_k}' \le D_n T_n' \le D_{n_{k+1}} T_{n_{k+1}}',
\]
which gives that
\[
\frac{D_{n_k}}{D_{n_{k+1}}}\, T_{n_k}' \le T_n' \le \frac{D_{n_{k+1}}}{D_{n_k}}\, T_{n_{k+1}}'.
\]
From (3.7) and (3.8) it then follows, by letting a → 1, that T_n' \xrightarrow{a.s.} M,
so that T_n \xrightarrow{a.s.} 0 as n → ∞, as desired.
Remark 3.3. The random variables ξ_k will usually have zero expectation
in applications of the theorem, so that E T_n² is the variance of T_n. If
(3.2) holds for random variables not obeying this assumption, it also holds
for the sequence of centered random variables. Indeed, the variance of a
random variable X satisfies
\[
\operatorname{Var} X = \inf_{a \in \mathbb{R}} E[X - a]^2.
\]
The two conclusions of the theorem would then imply that the sequence of
means of the original random variables is summable to zero by this particular
summation method.
Remark 3.4. One could try to weaken (3.2) even further by choosing a
subsequence {D_{n_k}} not obeying "D_{n_{k+1}}/D_{n_k} bounded". More classically
one could then try to replace the second part of the proof by proving
appropriate bounds for
\[
M_k := \sup_{n_k \le m < n_{k+1}} |S_m - S_{n_k}|,
\quad \text{where} \quad
S_m := \sum_{k=1}^{m} d_k \xi_k.
\]
Although possible in principle, there seems to be a difficulty in treating the
differences n_{k+1} − n_k. These may be very large, especially when D_n tends
slowly to infinity, making {n_k}_{k≥1} very sparse. This seems to make the
available "maximal inequalities" insufficient for improvements on the results.
Remark 3.5. In order to relax Almost sure convergence to Convergence in
probability in the conclusion of the theorem, one may also weaken assumption
(3.2). The following assumption would, for example, be sufficient:
\[
E T_n^2 \le C (\log D_n)^{-\gamma}, \quad \text{for some } \gamma > 0 \text{ and constant } C.
\tag{3.9}
\]
By the standard Chebyshev inequality it suffices that the right hand side of
(3.2) tends to zero as n tends to infinity.
3.2 Bounds for covariances among individual variables
We shall now consider a way to obtain both weight sequences and estimates
as in Theorem 3.1 by means of covariance estimates among individual random
variables.
Condition (3.10) below restricts the scope of weight sequences {d_k}
considered, in comparison with Theorem 3.1. In the preceding theorem it
was assumed that d_k = o(D_k), while condition (3.10) prescribes a minimum
rate for this convergence.
Theorem 3.6. Let {ξ_k}_{k≥1} be as in Theorem 3.1; let {c_k}_{k≥1} be a
nondecreasing positive sequence with c_1 = 1 and c_k → ∞. Define
\[
d_k = \log(c_{k+1}/c_k),
\qquad
D_n := \sum_{1 \le k \le n} d_k = \log(c_{n+1}),
\]
and assume that for some constant C and all k ≥ 1
\[
d_k \le C D_k (\log D_k)^{-1} (\log_2 D_k)^{-2}.
\tag{3.10}
\]
Assume further that, for some constant C̃ and for 1 ≤ k < l with c_l/c_k > e^e,
\[
|E(\xi_k \xi_l)| \le \tilde{C} \bigl( \log_2(c_l/c_k) \bigr)^{-1} \bigl( \log_3(c_l/c_k) \bigr)^{-2}.
\tag{3.11}
\]
For other indices 1 ≤ k < l assume that
\[
|E(\xi_k \xi_l)| \le C.
\tag{3.12}
\]
Then condition (3.2) is satisfied with D_n and T_n as in Theorem 3.1, and
the conclusion of Theorem 3.1 follows.
Proof. Since for any numbers a_k
\[
\Bigl( \sum_{k=1}^{n} a_k \Bigr)^2 = \sum_{k=1}^{n} a_k^2 + 2 \sum_{1 \le k < l \le n} a_k a_l,
\]
it follows that
\[
\Bigl( \sum_{k=1}^{n} a_k \Bigr)^2 \le 2 \sum_{1 \le k \le l \le n} a_k a_l,
\tag{3.13}
\]
\[
\Bigl( \sum_{k=1}^{n} a_k \Bigr)^2 \ge \sum_{1 \le k \le l \le n} a_k a_l
\quad \text{(for nonnegative } a_k\text{)}.
\tag{3.14}
\]
If we apply (3.13) and the triangle inequality with a_k = d_k ξ_k we arrive at
\[
E \Bigl( \sum_{k=1}^{n} d_k \xi_k \Bigr)^2 \le 2 \sum_{1 \le k \le l \le n} d_k d_l\, |E(\xi_k \xi_l)|.
\tag{3.15}
\]
What we need to show is hence that
\[
\sum_{1 \le k \le l \le n} d_k d_l\, |E(\xi_k \xi_l)| \le C (\log D_n)^{-1} (\log_2 D_n)^{-2} D_n^2,
\tag{3.16}
\]
for some constant C. For those indices in the left-hand sum where
c_l/c_k ≥ exp(D_n^{1/2}), equation (3.11) gives:
\[
|E(\xi_k \xi_l)| \le C^* (\log D_n)^{-1} (\log_2 D_n)^{-2}.
\]
The sum over such indices is hence majorized by
\[
C^* (\log D_n)^{-1} (\log_2 D_n)^{-2} \sum_{1 \le k \le l \le n} d_k d_l
\le C^* (\log D_n)^{-1} (\log_2 D_n)^{-2} D_n^2.
\]
The last inequality is an application of (3.14).
For the remaining indices k, l, where l lies in
\[
A_k := \bigl\{ l : k \le l \le n \text{ and } c_l / c_k < \exp(D_n^{1/2}) \bigr\},
\]
we can apply (3.12), which by (3.11) generalizes to all indices l and k. We
note that the fact that {c_k} is increasing implies that, for every k considered,
A_k = {l : k ≤ l ≤ n_k} for some n_k ≤ n. We thereby get that
\[
\sum_{l \in A_k} d_l = \log(c_{n_k + 1} / c_k),
\]
and by (3.10) and the definition of A_k:
\[
\log(c_{n_k+1}/c_k) = \log(c_{n_k+1}/c_{n_k}) + \log(c_{n_k}/c_k)
\le C D_n (\log D_n)^{-1} (\log_2 D_n)^{-2} + D_n^{1/2}
\le C' D_n (\log D_n)^{-1} (\log_2 D_n)^{-2}.
\tag{3.17}
\]
Putting things together, the left-hand side of (3.16), restricted to indices
where c_l/c_k < exp(D_n^{1/2}), is majorized by
\[
C \sum_{k=1}^{n} d_k \sum_{l \in A_k} d_l \le C C' D_n^2 (\log D_n)^{-1} (\log_2 D_n)^{-2}.
\]
Inequality (3.16) is therefore satisfied for some constant C. The conclusion
of Theorem 3.1 now follows, since the sequence {d_k} fulfills the required
conditions by the assumptions on {c_k}.
Remark 3.7. A stronger assumption than (3.10) is that ck+1 /ck = O(1).
We now derive two consequences of Theorem 3.6.
Theorem 3.8. If (3.11) and (3.12) are fulfilled for c_k = k^α and some α > 0,
then, equivalently to (3.3),
\[
\frac{1}{\log n} \sum_{k=1}^{n} \frac{\xi_k}{k} \xrightarrow{a.s.} 0.
\]
Proof. We have
\[
d_k = \alpha \log\Bigl( 1 + \frac{1}{k} \Bigr) \sim \frac{\alpha}{k}, \quad \text{as } k \to \infty.
\tag{3.18}
\]
This implies that D_n ∼ α \sum_{k=1}^{n} \frac{1}{k} ∼ α \log n as n → ∞. We may
obviously disregard the occurrence of α (set α = 1). By Theorem 2.24 we may
prove equivalence between the methods (N̄, d_k) and (N̄, 1/k), as defined in
Section 2.3, by showing either (i) or (ii):
\[
\text{(i)}\quad (k+1)\, d_{k+1} \le k\, d_k;
\tag{3.19}
\]
\[
\text{(ii)}\quad (k+1)\, d_{k+1} \ge k\, d_k.
\tag{3.20}
\]
In fact (3.20) holds, since the function g,
\[
g(k) := k \log(1 + 1/k) = \log\Bigl( \Bigl( \frac{k+1}{k} \Bigr)^{k} \Bigr),
\]
is increasing. Indeed, one easily verifies that ((k+1)/k)^k = (1 + 1/k)^k
increases as k increases (to e), and the logarithm is an increasing function.
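Since the monotonicity of g is the crux of the argument, a quick numerical check (ours) may reassure the reader that g is indeed increasing:

```python
import numpy as np

k = np.arange(1.0, 1e6)
g = k * np.log1p(1.0 / k)             # g(k) = k log(1 + 1/k) = log((1 + 1/k)^k)
print(bool(np.all(np.diff(g) > 0)))   # True: g increases (to 1, since (1+1/k)^k -> e)
```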
Theorem 3.9. If (3.11) and (3.12) are fulfilled for c_k = (log k)^α and some
α > 0, then, equivalently to (3.3),
\[
\frac{1}{\log_2 n} \sum_{k=1}^{n} \frac{\xi_k}{k \log k} \xrightarrow{a.s.} 0.
\]
Proof. Here
\[
d_k = \alpha \log\Bigl( 1 + \frac{\log(k+1) - \log k}{\log k} \Bigr)
\sim \alpha\, \frac{\log(k+1) - \log k}{\log k}
\sim \frac{\alpha}{k \log k}, \quad \text{as } k \to \infty.
\]
This implies that D_n ∼ α \sum_{k=1}^{n} \frac{1}{k \log k} ∼ α \log_2 n. As in
Theorem 3.8 we need to show either (i) or (ii):
\[
\text{(i)}\quad d_{k+1}\, (k+1) \log(k+1) \le d_k\, k \log k;
\tag{3.21}
\]
\[
\text{(ii)}\quad d_{k+1}\, (k+1) \log(k+1) \ge d_k\, k \log k.
\tag{3.22}
\]
Define
\[
f(x) = x \log x\, \log\Bigl( \frac{\log(x+1)}{\log x} \Bigr),
\qquad
g(x) = x \log(1 + 1/x).
\]
We need to show that f is either decreasing or increasing; in the proof of
Theorem 3.8 we showed that g is increasing. Now
\[
f(x) = \frac{x \log x}{y}\, y \log(1 + 1/y) = \frac{x \log x}{y}\, g(y),
\quad \text{with} \quad
y := \frac{\log x}{\log(1 + 1/x)},
\]
and
\[
\frac{x \log x}{y} = x \log(1 + 1/x) = g(x).
\]
Since y is increasing in x, f is the product of two positive increasing functions,
and hence increasing. In conclusion, (3.22) holds.
3.3 Refinements with respect to weight sequences
Condition (3.23) below is stronger than (3.11) in Theorem 3.6 with c_k = k.
That is, Theorem 3.8 with log-summation is valid in this case. The next
theorem shows that with this condition one may improve on the result by
choosing other sequences {D_n}_{n≥1} tending faster to infinity. Confer
Proposition 3.13 for results concerning in what sense one may call this an
improvement.
Theorem 3.10. Let {ξ_k}_{k≥1} be as in Theorem 3.1. Assume further that,
for some positive constants C, γ and all 1 ≤ k ≤ l,
\[
|E(\xi_k \xi_l)| \le C (l/k)^{-\gamma}.
\tag{3.23}
\]
Then condition (3.2) is satisfied for any α ∈ [0, 1/2) and condition (3.9) in
Remark 3.5 is satisfied for all α ∈ [0, 1), with
\[
d_k := \log\Bigl( 1 + \frac{1}{k} \Bigr) \exp\bigl( (\log k)^\alpha \bigr),
\]
and D_n and T_n defined from these as before. It follows that
\[
T_n \xrightarrow{a.s.} 0, \text{ as } n \to \infty, \text{ for all } \alpha \in [0, \tfrac12),
\qquad
T_n \xrightarrow{p} 0, \text{ as } n \to \infty, \text{ for all } \alpha \in [0, 1).
\]
First we need a lemma concerning these weight sequences.
Lemma 3.11. For α ∈ [0, 1] and D_n as above, there exists some constant
C > 0 such that
\[
D_n \sim C (\log n)^{1-\alpha} \exp\bigl( (\log n)^\alpha \bigr).
\]
Proof. The boundary cases are well known, with C = 1. Assume that
α ∈ (0, 1) and define
\[
F(x) = (\log x)^{1-\alpha} \exp\bigl( (\log x)^\alpha \bigr),
\qquad
f(x) = \frac{1}{x} \exp\bigl( (\log x)^\alpha \bigr).
\]
Then d_k ∼ f(k). Together with D_n → ∞ it follows, just as in the beginning
of the proof of Theorem 2.25, that
\[
D_n = \sum_{k=1}^{n} d_k \sim \sum_{k=1}^{n} f(k).
\tag{3.24}
\]
On the other hand,
\[
F'(x) = \exp\bigl( (\log x)^\alpha \bigr) \Bigl( \frac{\alpha}{x} + \frac{1-\alpha}{x} (\log x)^{-\alpha} \Bigr)
= \alpha f(x) + g(x),
\]
with
\[
g(x) := \exp\bigl( (\log x)^\alpha \bigr)\, \frac{1-\alpha}{x} (\log x)^{-\alpha}.
\]
Clearly g = o(f). Together with f, g ≥ 0 this implies that
\[
F(n) = F(n) - F(1) = \int_1^n F'(x)\, dx \sim \int_1^n \alpha f(x)\, dx.
\tag{3.25}
\]
To prove this, take, for ε > 0 small, N such that
\[
x \ge N \Rightarrow g(x) \le \varepsilon\, \alpha f(x).
\]
Then for y > N:
\[
\frac{\int_1^y g(x)\, dx}{\int_1^y (\alpha f(x) + g(x))\, dx}
\le \frac{\int_1^N g(x)\, dx}{\int_1^y (\alpha f(x) + g(x))\, dx}
+ \frac{\varepsilon \int_N^y \alpha f(x)\, dx}{\int_1^y (\alpha f(x) + g(x))\, dx}.
\]
The first term tends to zero since F(y) → ∞ as y → ∞; the second is
bounded by ε. This proves (3.25).
It now remains to prove that
\[
\sum_{k=1}^{n} f(k) \sim \int_1^n f(x)\, dx.
\tag{3.26}
\]
For x such that log x ≥ 1 it follows that
\[
f'(x) = \frac{1}{x^2} \exp\bigl( (\log x)^\alpha \bigr) \bigl( \alpha (\log x)^{\alpha-1} - 1 \bigr)
\le \frac{\alpha - 1}{x^2} \exp\bigl( (\log x)^\alpha \bigr) < 0.
\]
The function f is hence decreasing for x ≥ 3, say. Hence
\[
\int_3^{n+1} f(x)\, dx \le \sum_{k=3}^{n} f(k) \le f(3) + \int_3^{n} f(x)\, dx.
\]
Since F(n) → ∞ implies \(\int_1^n f(x)\, dx \to \infty\) by (3.25), it only remains
to show that
\[
\frac{f(n+1)}{\sum_{k=1}^{n} f(k)} \to 0.
\]
But this is obvious since 0 ≤ f(k) ≤ 1. Our proof is complete, and C = 1/α,
except for α = 0 where C = 1.
Proof of Theorem 3.10. The proof is similar to the proof of Theorem 3.6.
Condition (3.1) is satisfied since {d_k}_{k≥1} is bounded; indeed,
\[
\log\Bigl( 1 + \frac{1}{k} \Bigr) \exp\bigl( (\log k)^\alpha \bigr)
\le \log\Bigl( 1 + \frac{1}{k} \Bigr) \exp(\log k)
= k \log\Bigl( 1 + \frac{1}{k} \Bigr) \sim 1.
\]
As before we need to show that
\[
\sum_{1 \le k \le l \le n} d_k d_l\, |E(\xi_k \xi_l)| \le C (\log D_n)^{-1} (\log_2 D_n)^{-2} D_n^2,
\tag{3.27}
\]
for some constant C. Now consider indices l, k satisfying
\[
l/k \ge (\log D_n)^{2/\gamma}
\]
and their complement separately. For the first set, condition (3.23) transforms
into
\[
|E(\xi_k \xi_l)| \le C (\log D_n)^{-2},
\]
so that
\[
C (\log D_n)^{-2} \sum_{1 \le k \le l \le n} d_k d_l \le C (\log D_n)^{-2} D_n^2.
\]
The last inequality follows from (3.14).
For the complement we proceed as in Theorem 3.6. Define
\[
A_k := \bigl\{ l : k \le l \le n \text{ and } l/k < (\log D_n)^{2/\gamma} \bigr\}
= \{ l : k \le l \le n_k \}.
\]
Inequality (3.17) is replaced by
\[
\log\bigl( (n_k + 1)/k \bigr) \le \log 2 + \frac{2}{\gamma} \log_2 D_n \le C \log_2 D_n.
\tag{3.28}
\]
We shall also use the fact that, for α > 0,
\[
\exp\bigl( (\log n)^\alpha \bigr) \sim C^* D_n (\log D_n)^{-(1-\alpha)/\alpha},
\tag{3.29}
\]
for some constant C^*. This comes from Lemma 3.11; indeed,
\[
\log D_n \sim (\log n)^\alpha \;\Rightarrow\; (\log D_n)^{(1-\alpha)/\alpha} \sim (\log n)^{1-\alpha}.
\]
We proceed:
\[
\sum_{k=1}^{n} d_k \sum_{l \in A_k} d_l
= \sum_{k=1}^{n} d_k \sum_{l \in A_k} \log\Bigl( 1 + \frac{1}{l} \Bigr) \exp\bigl( (\log l)^\alpha \bigr)
\le \exp\bigl( (\log n)^\alpha \bigr) \sum_{k=1}^{n} d_k \sum_{l \in A_k} \log\Bigl( 1 + \frac{1}{l} \Bigr),
\]
which by (3.28) is majorized by
\[
C \exp\bigl( (\log n)^\alpha \bigr) \log_2(D_n) \sum_{k=1}^{n} d_k
= C \exp\bigl( (\log n)^\alpha \bigr) D_n \log_2(D_n)
\le C C^* D_n^2 \log_2(D_n) (\log D_n)^{-(1-\alpha)/\alpha}
\le C'' D_n^2 (\log D_n)^{-\frac{1-\alpha}{\alpha} + \varepsilon}.
\]
The last inequality holds for all ε > 0. Now α < 1/2 gives \(\frac{1-\alpha}{\alpha} = 1 + \delta\)
for some δ > 0, while α < 1 only implies that \(\frac{1-\alpha}{\alpha} > 0\). This
completes the proof.
Remark 3.12. It seems to be an open question whether 1/2 ≤ α < 1
permits any Almost sure conclusions. Negative results in the case α = 1
follow from Theorem 2.20.
Berkes claims that Theorem 3.10 holds for arbitrary sequences c_k
(with properties as in Theorem 3.6) with weights
\[
d_k := \log(c_{k+1}/c_k) \exp\bigl( (\log c_k)^\alpha \bigr).
\]
To verify this one needs to prove a generalization of Lemma 3.11 and to
show that d_k satisfies (3.1). The rest of the proof would be similar.
The family of summation methods defined in Theorem 3.10 ranges between
log-summation (α = 0) and Cesàro summation (α = 1). The strength of
these methods decreases as α increases, as we shall see in the following
proposition. In fact, in the formulation of Theorem 2.24 there is no claim
of any strict difference in strength between methods. But this seems possible
to prove here, and in other situations of the same type, by continuing the
proof in [18], which is based on Theorem 2.22. Confer the end of the proof
of the next proposition.
Proposition 3.13. Let p_k(α) = \log(1 + 1/k) \exp\bigl( (\log k)^\alpha \bigr). Then
for 0 ≤ α < α' ≤ 1 and all sequences {s_n}_{n≥1}, s_n → s (N̄, p_k(α')) implies
that s_n → s (N̄, p_k(α)).
Proof. Set p_k = p_k(α'), q_k = p_k(α) and P_n = \sum_{k \le n} p_k,
Q_n = \sum_{k \le n} q_k. By Lemma 3.11,
\[
P_n \sim \frac{1}{\alpha'} (\log n)^{1-\alpha'} \exp\bigl( (\log n)^{\alpha'} \bigr),
\qquad
Q_n \sim \frac{1}{\alpha} (\log n)^{1-\alpha} \exp\bigl( (\log n)^{\alpha} \bigr).
\]
We investigate condition (a) of Theorem 2.24, namely q_{n+1}/q_n ≤ p_{n+1}/p_n:
\[
q_{n+1}/q_n \le p_{n+1}/p_n
\iff
\exp\bigl( (\log(n+1))^{\alpha} - (\log n)^{\alpha} \bigr)
\le \exp\bigl( (\log(n+1))^{\alpha'} - (\log n)^{\alpha'} \bigr)
\iff
(\log(n+1))^{\alpha} - (\log n)^{\alpha}
\le (\log(n+1))^{\alpha'} - (\log n)^{\alpha'}.
\]
The last inequality follows from the mean value theorem and the fact that
the derivative of f(x) = x^α increases pointwise, for x ≥ 1, as α increases.
Condition (b) of Theorem 2.24, with p and q in reversed order, does on the
other hand not hold. In fact, for some constant C,
\[
\frac{p_n Q_n}{q_n P_n} \sim C \frac{(\log n)^{1-\alpha}}{(\log n)^{1-\alpha'}} = C (\log n)^{\alpha' - \alpha},
\]
which tends to infinity as n tends to infinity. This indicates that there is no
equivalence between the methods belonging to different values of α.
4 Almost Sure Central Limit Theory
The results of the previous chapter will now be used to derive Almost sure
central limit theorems for sums of random variables. We use the notation
S_n = \sum_{k=1}^{n} X_k and often presume a Central limit theorem of the form
\[
\frac{S_n - b_n}{a_n} \xrightarrow{d} G,
\tag{4.1}
\]
for some sequences {a_n}_{n≥1} and {b_n}_{n≥1} and a non-trivial distribution G.
So far there are no dependency assumptions whatsoever concerning the
sequence {X_k}_{k≥1}.
The following result is based on Theorem 3.6 and will be an important
tool in subsequent sections.
Theorem 4.1. Assume that for some distribution G, random variables
{X_k}_{k≥1} and real sequences {a_n}_{n≥1} and {b_n}_{n≥1},
\[
\frac{S_n - b_n}{a_n} \xrightarrow{d} G, \quad \text{as } n \to \infty,
\tag{4.2}
\]
with a_n > 0. Let f be a bounded Lipschitz function on R and define:
\[
\xi_k = f\Bigl( \frac{S_k - b_k}{a_k} \Bigr) - E f\Bigl( \frac{S_k - b_k}{a_k} \Bigr).
\]
Assume moreover that, for some constant M (possibly depending on f) and
all k and l obeying 1 ≤ k ≤ l,
\[
|E(\xi_k \xi_l)| \le M \bigl( \log_2(c_l/c_k) \bigr)^{-1} \bigl( \log_3(c_l/c_k) \bigr)^{-2},
\tag{4.3}
\]
or, stronger,
\[
|E(\xi_k \xi_l)| \le M\, \frac{c_k}{c_l},
\tag{4.4}
\]
with {c_k}_{k≥1} positive and nondecreasing to infinity such that condition
(3.10) is fulfilled.
Then, for d_k := \log(c_{k+1}/c_k), D_n := \sum_{k=1}^{n} d_k,
\[
\frac{1}{D_n} \sum_{k=1}^{n} d_k\, I\Bigl\{ \frac{S_k - b_k}{a_k} \le x \Bigr\}
\xrightarrow{a.s.} G(x), \quad \text{for all } x \in C_G.
\tag{4.5}
\]
Proof. Put
\[
Y_k = \frac{S_k - b_k}{a_k}.
\]
Theorem 2.4 shows that (4.5) is equivalent to:
\[
\frac{1}{D_n} \sum_{k=1}^{n} d_k\, f(Y_k) \xrightarrow{a.s.} \int f\, dG,
\]
for all bounded Lipschitz functions f. It follows by (4.2) and Theorem 2.1
that
\[
E f(Y_k) \to \int f\, dG \quad \text{as } k \to \infty,
\]
for such functions f. Since the summation method defined by {d_k}_{k≥1} is
regular, it follows that
\[
\frac{1}{D_n} \sum_{k=1}^{n} d_k\, E f(Y_k) \to \int f\, dG.
\]
It therefore remains to prove that
\[
\frac{1}{D_n} \sum_{k=1}^{n} d_k \bigl( f(Y_k) - E f(Y_k) \bigr) \xrightarrow{a.s.} 0,
\]
for all bounded Lipschitz functions f. To this end our assumptions make
sure that Theorem 3.6 applies.
Remark 4.2. Berkes [4, page 23] states that there is an example (due to
Lifschitz), in the case c_l = l, of a sequence of independent random variables
{X_k}_{k≥1} with mean 0 and finite variances that obeys (4.2) with b_n = 0 and
a_n = √n, and satisfies
\[
|E(\xi_k \xi_l)| \le M \bigl( \log_2(l/k) \bigr)^{-1},
\]
as in (4.3) but without the second factor, while neither (4.3) itself nor the
conclusion (4.5) holds.
4.1 Independent random variables
We shall now recall the results of the beginning of Section 2.2 to see when,
and for which weight sequences {d_k}_{k≥1}, Theorem 4.1 may be applied.
Condition (4.4) is crucial and will be investigated in the following proposition.
Proposition 4.3. Let S_n denote the n:th partial sum of a sequence of
independent random variables {X_k}_{k≥1}. Let {a_n}_{n≥1} and {b_n}_{n≥1} be
sequences of real numbers. Assume that {a_n}_{n≥1} is positive. Let f be a
bounded Lipschitz function on R and define:
\[
\xi_k = f\Bigl( \frac{S_k - b_k}{a_k} \Bigr) - E f\Bigl( \frac{S_k - b_k}{a_k} \Bigr).
\tag{4.6}
\]
Assume further that
\[
E \Bigl| \frac{S_n - b_n}{a_n} \Bigr|^p \le C
\tag{4.7}
\]
for some constant C and some p < 1. Then, for some constant M and all
k and l obeying 1 ≤ k ≤ l, it follows that
\[
|E(\xi_k \xi_l)| \le M \Bigl( \frac{a_k}{a_l} \Bigr)^p.
\tag{4.8}
\]
Proof. Let K denote a Lipschitz constant, as well as an upper bound, for f.
Define further:
\[
f_k = f\Bigl( \frac{S_k - b_k}{a_k} \Bigr), \quad \text{for all } k;
\qquad
f_{k,l} = f\Bigl( \frac{S_l - S_k - (b_l - b_k)}{a_l} \Bigr), \quad \text{for all } 1 \le k \le l.
\]
By the independence assumptions, f_{k,l} is independent of f_k for all
1 ≤ k ≤ l. It follows that
\[
|E(\xi_k \xi_l)| = |\mathrm{Cov}(f_k, f_l)| = |\mathrm{Cov}(f_k, f_l - f_{k,l})|
\le C'\, E|f_l - f_{k,l}|.
\tag{4.9}
\]
The inequality holds for some C' depending on K, since f is bounded, and
thereby also f_k, f_{k,l} and their moments. From the assumptions on f and
the triangle inequality it follows that
\[
|f_l - f_{k,l}| \le K \Bigl| \frac{S_k - b_k}{a_l} \Bigr| \wedge 2K.
\]
Now proceeding from (4.9) gives
\[
E|f_l - f_{k,l}|
\le E\Bigl( K \Bigl| \frac{S_k - b_k}{a_l} \Bigr| \wedge 2K \Bigr)
= 2K\, E\Bigl( \frac{1}{2} \Bigl| \frac{S_k - b_k}{a_l} \Bigr| \wedge 1 \Bigr).
\tag{4.10}
\]
Since the quantity inside the last bracket is at most 1, it increases when
raised to the power p < 1, defined at the beginning. We finish by continuing
from (4.10), using hypothesis (4.7):
\[
E\Bigl( \frac{1}{2} \Bigl| \frac{S_k - b_k}{a_l} \Bigr| \wedge 1 \Bigr)
\le E\Bigl( \Bigl( \frac{1}{2} \Bigl| \frac{S_k - b_k}{a_l} \Bigr| \Bigr)^p \wedge 1 \Bigr)
\le \Bigl( \frac{1}{2} \Bigr)^p \Bigl( \frac{a_k}{a_l} \Bigr)^p
E\Bigl| \frac{S_k - b_k}{a_k} \Bigr|^p
\le C^* \Bigl( \frac{a_k}{a_l} \Bigr)^p,
\]
where C^* = C (1/2)^p.
The i.i.d. case of Part (b) of the following theorem is the original Almost
sure (or everywhere) central limit theorem, due to Brosamler [8], Schatte [33]
and Lacey and Philipp [25]. The general case of (b) seems to be due to
Atlagh [1]. Part (a) may not have been given in this general form before. An
early source for Part (c) is Peligrad and Révész [27].
Theorem 4.4. Let {X_k}_{k≥1} be a sequence of independent random variables.
Then:
(a) When {X_k}_{k≥1} have finite expectations {µ_k}_{k≥1} and finite variances
{σ_k²}_{k≥1} such that, for s_n² = \sum_{k=1}^{n} σ_k², some constant C and all
k ≥ 1,
\[
\text{(a1)}\quad \log\Bigl( 1 + \frac{\sigma_k^2}{s_{k-1}^2} \Bigr)
\le C (\log s_k)(\log_2 s_k)^{-1} (\log_3 s_k)^{-2};
\]
\[
\text{(a2)}\quad \frac{1}{s_n} \sum_{k=1}^{n} (X_k - \mu_k) \xrightarrow{d} N,
\]
as n → ∞, then for all real x
\[
\frac{1}{\log s_n^2} \sum_{k=1}^{n} \log\bigl( s_{k+1}^2 / s_k^2 \bigr)\,
I\Bigl\{ \frac{S_k - \sum_{1\le i\le k} \mu_i}{s_k} \le x \Bigr\}
\xrightarrow{a.s.} \Phi(x).
\tag{4.11}
\]
(b) When {X_k}_{k≥1} have finite expectations {µ_k}_{k≥1} and finite variances
{σ_k²}_{k≥1} such that, for s_n² = \sum_{k=1}^{n} σ_k²,
\[
\max_{1 \le k \le n} \frac{\sigma_k^2}{s_n^2} \to 0
\quad \text{and} \quad
\frac{1}{s_n} \sum_{k=1}^{n} (X_k - \mu_k) \xrightarrow{d} N,
\]
as n → ∞ (cf. Theorem 2.7), then for all real x
\[
\frac{1}{\log s_n^2} \sum_{k=1}^{n} \frac{\sigma_{k+1}^2}{s_k^2}\,
I\Bigl\{ \frac{S_k - \sum_{1\le i\le k} \mu_i}{s_k} \le x \Bigr\}
\xrightarrow{a.s.} \Phi(x).
\tag{4.12}
\]
(c) When {X_k}_{k≥1} are identically distributed and in the domain of
attraction of a stable distribution G of index α, 0 < α ≤ 2, so that
\[
\frac{S_n - b_n}{a_n} \xrightarrow{d} G, \quad \text{as } n \to \infty,
\]
for some positive sequence {a_n} and real sequence {b_n} (cf. Theorem 2.8),
then for all real x
\[
\frac{1}{\log n} \sum_{k=1}^{n} \frac{1}{k}\,
I\Bigl\{ \frac{S_k - b_k}{a_k} \le x \Bigr\}
\xrightarrow{a.s.} G(x).
\tag{4.13}
\]
(d) The sequence {p_k}_{k≥1}, p_k := \log(s_{k+1}^2/s_k^2), may in (4.11) be
replaced by any other sequence {q_k}_{k≥1} such that q_k ∼ p_k as k → ∞. The
corresponding conclusion also holds with respect to (4.12), that is, with
{q_k}_{k≥1} such that q_k ∼ σ_{k+1}^2/s_k^2, and with respect to (4.13), that is,
with {q_k}_{k≥1} such that q_k ∼ 1/k.
Proof. To begin with, note that Theorem 2.25 (i) applies, so that conclusion
(d) holds if (a), (b) and (c) are proved with weight sequences {p_k}_{k≥1},
{σ_{k+1}²/s_k²}_{k≥1} and {1/k}_{k≥1} respectively, or any other ∼-equivalent
sequences.
For part (a), condition (4.7) in Proposition 4.3 is satisfied even for p = 2,
which suggests that we choose c_k = s_k² in (4.4) of Theorem 4.1, yielding
weights d_k = \log(s_{k+1}²/s_k²). Condition (a1) corresponds to condition
(3.10) in Theorem 3.6 concerning the behavior of the weight sequence.
Conclusion (a) thereby follows from Theorem 4.1.
For part (b), which is a subcase of part (a), note that
\[
\log \frac{s_{k+1}^2}{s_k^2} = \log\Bigl( 1 + \frac{\sigma_{k+1}^2}{s_k^2} \Bigr)
\sim \frac{\sigma_{k+1}^2}{s_k^2}.
\tag{4.14}
\]
Indeed,
\[
\frac{\sigma_{k+1}^2}{s_k^2} \to 0, \quad \text{as } k \to \infty.
\tag{4.15}
\]
Arguing by contradiction, the statement σ_{k+1}² > ε s_k² implies that
\[
\frac{\sigma_{k+1}^2}{s_{k+1}^2} > \varepsilon\, \frac{s_k^2}{s_{k+1}^2}
= \varepsilon \Bigl( 1 - \frac{\sigma_{k+1}^2}{s_{k+1}^2} \Bigr).
\]
We hence get
\[
\frac{\sigma_{k+1}^2}{s_{k+1}^2} > \frac{\varepsilon}{1 + \varepsilon} = \delta.
\tag{4.16}
\]
From (4.16) we may contradict condition (ii) of Theorem 2.7 if (4.15) does
not hold. This proves (4.14).
Furthermore,
\[
\sum_{k=1}^{n} \frac{\sigma_{k+1}^2}{s_k^2}
\sim \sum_{k=1}^{n} \log\bigl( s_{k+1}^2 / s_k^2 \bigr)
= \log\bigl( s_{n+1}^2 / s_1^2 \bigr) \sim \log s_n^2.
\tag{4.17}
\]
Indeed, s_{n+1}² ∼ s_n², which combined with s_n² → ∞ gives
log(s_{n+1}²) ∼ log(s_n²); the first asymptotic equivalence is a consequence of
(4.14). Conclusion (b) is thereby proved by the results (a) and (d).
For part (c), condition (4.7) in Proposition 4.3 is satisified for p < α,
according to Theorem 2.14. We hence get, for 1 ≤ k ≤ l,
l 1 L(l)
al
α
=
,
ak
k
L(k)
(4.18)
with L slowly varying at infinity. We would now like to show that condition
(4.4) with this sequence {ak }k≥1 is both stronger than the same condition with {ak }k≥1 replaced by some of {ck }k≥1 sequences of Theorem 3.8
and weaker than the same condition with {ak }k≥1 replaced by some other
{ck }k≥1 sequences of Theorem 3.8. (Confer statements (4.19) and (4.20)
below.) This would give (4.13) as a natural conclusion in view of the same
theorem.
In fact, Karamata’s representation theorem of slowly varying functions
(see e.g. [7, page 12]), tells us that
Z x
L(x) = c(x)exp
(u)du/u ,
0
with c(x) → c ∈ (0, ∞) and (x) → 0 as x → ∞. We may therefore conclude
that
M
L(l)
l
≤A
,
(4.19)
L(k)
k
for some constants M and A and all 1 ≤ k ≤ l. Indeed, |(x)| ≤ M gives
Z
l
(u)du/u ≤ M log (l/k),
k
34
and c(x) bounded below and above gives
cl
≤ A.
ck
This proves (4.19).
On the other hand, introducing
\[
f(n) = n^{1/(2\alpha)}L(n),
\]
it follows that
\[
\frac{a_l}{a_k} = \Big(\frac{l}{k}\Big)^{1/(2\alpha)}\frac{f(l)}{f(k)},
\]
and it remains to prove that
\[
\frac{f(l)}{f(k)} \ge B, \tag{4.20}
\]
for some positive constant $B$ and all $l,k$ such that $l>k$. Now by [7, Theorem 1.5.3, page 23] it follows that
\[
\underline{f}(x) = \inf\{f(t): t\ge x\} \sim f(x), \quad\text{as } x\to\infty,
\]
so that for some constant $K$ and all $l\ge k>K$,
\[
\frac{f(l)}{f(k)} \ge \frac{\underline{f}(k)}{f(k)} \ge 1/2.
\]
It also follows [7, Proposition 1.5.1, page 22] that
\[
f(x)\to\infty \quad\text{as } x\to\infty,
\]
so that for some constant $N$, all $l>N$ and all $k\le K$,
\[
f(l) > \max_{k\le K} f(k) \implies \frac{f(l)}{f(k)} \ge 1.
\]
We may finally define
\[
B = \min_{k,l\le K\vee N}\big\{f(l)/f(k)\big\} \wedge 1/2,
\]
since we now take the minimum over a finite set.
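The theorem lends itself to a quick numerical illustration. The following minimal sketch (Python with NumPy and SciPy; the function name and all parameter choices are ours and purely illustrative, not taken from any cited source) log-averages the indicators of (1.1), the special case of (4.13) with standard normal summands, $a_k=\sqrt k$ and $b_k=0$, along one simulated path. Since the effective sample size of a log-average is only of order $\log n$, the agreement with $\Phi(x)$ is rough even for long paths.

import numpy as np
from scipy.stats import norm

def log_average_asclt(x, n=10**6, rng=None):
    """One-path evaluation of (1/log n) sum_{k<=n} (1/k) I{S_k/sqrt(k) <= x}
    for i.i.d. standard normal increments."""
    rng = np.random.default_rng() if rng is None else rng
    s = np.cumsum(rng.standard_normal(n))    # partial sums S_1, ..., S_n
    k = np.arange(1, n + 1)
    indicators = s / np.sqrt(k) <= x         # I{S_k/sqrt(k) <= x}
    return np.sum(indicators / k) / np.log(n)

if __name__ == "__main__":
    for x in (-1.0, 0.0, 1.0):
        print(f"x={x:+.1f}: {log_average_asclt(x):.3f} vs Phi(x)={norm.cdf(x):.3f}")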
One could investigate further the range of summation methods $(\bar N, p_n)$, as defined in Section 2.3, which permit conclusions of the form (4.11), (4.12) and (4.13) under the conditions (a), (b) and (c) of Theorem 4.4. Of special interest is how weak a summation method still permits the conclusion, with log- and Cesàro summation as reference points.
We shall here move in this direction for the case (c) by making use of Theorem 3.10.
Theorem 4.5. Let $\{X_k\}_{k\ge1}$ be a sequence of i.i.d. random variables in the domain of attraction of a stable distribution $G$ of index $\alpha$, $0<\alpha\le2$, so that
\[
\frac{S_n-b_n}{a_n}\xrightarrow{d} G, \quad\text{as } n\to\infty,
\]
for some positive sequence $\{a_n\}_{n\ge1}$ and real sequence $\{b_n\}_{n\ge1}$. Let, for $\gamma\in[0,1]$,
\[
d_k^{(\gamma)} := \frac1k\exp\big((\log k)^{\gamma}\big),
\qquad
D_n^{(\gamma)} := \sum_{1\le k\le n} d_k^{(\gamma)}.
\]
Then, for any $\gamma\in[0,1/2)$ and all real $x$,
\[
\lim_{n\to\infty}\frac{1}{D_n^{(\gamma)}}\sum_{k=1}^n d_k^{(\gamma)}\, I\Big\{\frac{S_k-b_k}{a_k}\le x\Big\} \overset{a.s.}{=} G(x). \tag{4.21}
\]
For any $\gamma\in[0,1)$ and all real $x$,
\[
\frac{1}{D_n^{(\gamma)}}\sum_{k=1}^n d_k^{(\gamma)}\, I\Big\{\frac{S_k-b_k}{a_k}\le x\Big\} \xrightarrow{p} G(x). \tag{4.22}
\]
Proof. From (4.18) and (4.19) above and Proposition 4.3 we see that condition (4.4) in Theorem 4.1 is fulfilled with $c_k = k$. Theorem 3.10 thereby applies to the type of $\{\xi_k\}$-sequences defined in Theorem 4.1. We have here replaced each weight sequence $\{d_k\}$ by an asymptotically equivalent one, but this does not disturb the result, according to Theorem 4.4 (d). Conclusions (4.21) and (4.22) now follow in the same manner as in Theorem 4.1.
The following theorem gives an example of the situation in (b) of Theorem 4.4. It is related to the theory of Records and Extremes. It follows from
(iii) below that usual log-summation is not enough in this case.
Theorem 4.6. Let $\{X_k\}$ be a sequence of i.i.d. continuous random variables and define
\[
I_k = I\{X_k > X_l,\ \text{all } l<k\},
\qquad
\mu(n) = \sum_{k=1}^n I_k.
\]
Then, as $n\to\infty$,

(i) $\dfrac{\mu(n)-\log n}{\sqrt{\log n}} \xrightarrow{d} N$;

(ii) $\dfrac{1}{\log_2 n}\displaystyle\sum_{k=1}^n \frac{1}{k\log k}\, I\Big\{\frac{\mu(k)-\log k}{\sqrt{\log k}}\le x\Big\} \xrightarrow{a.s.} \Phi(x)$, for all $x$;

(iii) $\dfrac{1}{\log n}\displaystyle\sum_{k=1}^n \frac1k\, I\{\mu(k)-\log k>0\} \xrightarrow{d} A$, where $A$ denotes the arc sine distribution.
Proof. It can be shown that the $I_k$ are independent Bernoulli$(1/k)$ random variables [17, page 93], so that $E\mu(n)\sim\log n$ and $\mathrm{Var}\,\mu(n)\sim\log n$. Moreover, the conditions of Theorem 2.7 may be verified, yielding (i), cf. [17, page 351].
Statement (ii) follows from Theorem 4.4 (b) and (d) since
\[
\frac{\mathrm{Var}\,I_{k+1}}{\mathrm{Var}\,\mu(k)} = \frac{\frac1k(1-\frac1k)}{\mathrm{Var}\,\mu(k)} \sim \frac{1}{k\,\mathrm{Var}\,\mu(k)} \sim \frac{1}{k\log k}.
\]
Finally, from Theorem 2.20 it follows that
\[
\frac{1}{\log n}\sum_{k=1}^n \frac1k\Big(1-\frac1k\Big) I\{\mu(k)-\log k>0\} \xrightarrow{d} A.
\]
This is equivalent to (iii) since
\[
\frac{1}{\log n}\sum_{k=1}^n \frac{1}{k^2}\, I\{\mu(k)-\log k>0\} \le \frac{1}{\log n}\sum_{k=1}^n \frac{1}{k^2},
\]
which tends to zero as $n\to\infty$, and by the general fact (Cramér's theorem) for sequences of random variables: $X_n\xrightarrow{d}X$ and $Y_n\xrightarrow{p}0$ imply $X_n+Y_n\xrightarrow{d}X$.
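Since the proof reduces everything to independent Bernoulli$(1/k)$ indicators, part (ii) is easy to probe numerically without generating the $X_k$ themselves. The sketch below (Python with NumPy and SciPy; names and parameters are our own choices) simulates the record indicators directly; note that the normalization $\log_2 n = \log\log n$ grows so slowly that the agreement with $\Phi(x)$ is crude even for very large $n$.

import numpy as np
from scipy.stats import norm

def record_asclt(x, n=10**6, rng=None):
    """One-path check of Theorem 4.6 (ii): the 1/(k log k)-weighted,
    log log n normalized average of the centred record counts. Uses the
    fact from the proof that the record indicators I_k are independent
    Bernoulli(1/k) variables, so no sorting is needed."""
    rng = np.random.default_rng() if rng is None else rng
    k = np.arange(1, n + 1)
    I = rng.random(n) < 1.0 / k            # independent Bernoulli(1/k)
    mu = np.cumsum(I)                      # mu(k) = number of records up to k
    k, mu = k[1:], mu[1:]                  # start at k = 2 to avoid log 1 = 0
    indic = (mu - np.log(k)) / np.sqrt(np.log(k)) <= x
    return np.sum(indic / (k * np.log(k))) / np.log(np.log(n))

if __name__ == "__main__":
    for x in (-1.0, 0.0, 1.0):
        print(f"x={x:+.1f}: {record_asclt(x):.3f} vs Phi(x)={norm.cdf(x):.3f}")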
4.2 Weakly dependent random variables

By relaxing the assumption of independence we encounter different characterizations and definitions of Weakly dependent sequences, which may behave asymptotically like independent sequences and may allow central limit theorems of the form (4.1). Important classes are those of Strongly (or $\alpha$-) mixing, and $\rho$-mixing sequences. Let $\{X_n\}_{n\ge1}$ be a sequence of random variables and denote the $\sigma$-field generated by the random variables $\{X_j : m\le j\le n\}$ by $\mathcal{F}_m^n$. We then take as defining properties, respectively:

(i) $\alpha(n) := \sup_{k\ge1}\big\{|P(A\cap B)-P(A)P(B)| : A\in\mathcal{F}_1^k,\ B\in\mathcal{F}_{k+n}^{\infty}\big\}\to 0$;

(ii) $\rho(n) := \sup_{k\ge1}\Big\{\dfrac{\mathrm{Cov}(X,Y)}{(EX^2)^{1/2}(EY^2)^{1/2}} : X\in L_2(\mathcal{F}_1^k),\ Y\in L_2(\mathcal{F}_{k+n}^{\infty})\Big\}\to 0$,

as $n\to\infty$.
Another class is that of Associated sequences. The defining property for a sequence $\{X_n\}_{n\ge1}$ is that for any $n\ge1$ and any coordinatewise increasing functions $f,g:\mathbb{R}^n\to\mathbb{R}$ we have

(iii) $\mathrm{Cov}\big(f(X_1,\dots,X_n),\ g(X_1,\dots,X_n)\big)\ge 0$,

whenever the left hand side is well-defined.
We shall state results from [28] concerning these classes with respect to conclusions of the form (c) of Theorem 4.4. Proofs will be indicated through results of previous chapters, in close connection to those in [28]. For an overview of some further results, consult [3, Chapter 6].
To begin with we state a result from [25] which is not related to assumptions (i)-(iii) but which, on the other hand, depends on a different convergence result than the usual central limit theorem. It applies to broad classes of sequences of random variables (mixing, martingale differences, lacunary trigonometric, etc.), cf. [25].
Theorem 4.7. Let $\{X_n\}_{n\ge1}$ be a sequence of real random variables whose partial sums $S_n$ permit the approximation
\[
S_n - \sum_{k\le n} Y_k = o(\sqrt n), \quad a.s.\ \text{as } n\to\infty, \tag{4.23}
\]
by a sequence of i.i.d. standard normal random variables $\{Y_n\}_{n\ge1}$. Then
\[
\frac{1}{\log n}\sum_{k=1}^n \frac1k\, I\Big\{\frac{S_k}{\sqrt k}\le x\Big\} \xrightarrow{a.s.} \Phi(x), \quad\text{for all } x.
\]
Proof. In view of Theorem 2.4 we need to show that
\[
\frac{1}{\log n}\sum_{k=1}^n \frac1k f\big(S_k/\sqrt k\big) \xrightarrow{a.s.} \int_{\mathbb R} f\,dN, \tag{4.24}
\]
for any bounded Lipschitz function $f$. When $X_k = Y_k$, this is nothing but Theorem 4.4 (b). For the general case, put $\tilde S_n = \sum_{k\le n} Y_k$. Then, as $k\to\infty$,
\[
\Big|f\Big(\frac{S_k}{\sqrt k}\Big) - f\Big(\frac{\tilde S_k}{\sqrt k}\Big)\Big| \le C\Big|\frac{S_k}{\sqrt k}-\frac{\tilde S_k}{\sqrt k}\Big| = \frac{C}{\sqrt k}\big|S_k-\tilde S_k\big| = o(1), \tag{4.25}
\]
by (4.23) and the Lipschitz property of $f$. Since
\[
\frac{1}{\log n}\sum_{k=1}^n \frac1k f\big(\tilde S_k/\sqrt k\big) \xrightarrow{a.s.} \int_{\mathbb R} f\,dN, \tag{4.26}
\]
and since $(\bar N, 1/n)$ is regular (cf. Theorem 2.23), statement (4.24) now follows from (4.25) and (4.26).
Theorem 4.8. Let $\{X_n\}_{n\ge1}$ be a stationary strong mixing sequence satisfying $EX_1=0$, $EX_1^2<\infty$, $\sigma_n^2 = ES_n^2\to\infty$ and $\alpha(n) = O(\log^{-\gamma} n)$ for some $\gamma>0$. Assume that
\[
S_n/\sigma_n \xrightarrow{d} N. \tag{4.27}
\]
Then
\[
\frac{1}{\log n}\sum_{k=1}^n \frac1k\, I\Big\{\frac{S_k}{\sigma_k}\le x\Big\} \xrightarrow{a.s.} \Phi(x), \quad\text{for all } x. \tag{4.28}
\]
Theorem 4.9. Let $\{X_n\}_{n\ge1}$ be a stationary associated sequence with $EX_1 = 0$ and $\sum_{k=1}^{\infty} EX_1X_k < \infty$. Let $\sigma_n^2 = ES_n^2$; then conclusions (4.27) and (4.28) hold.
Proof sketches of Theorems 4.8 and 4.9. By Theorem 2.4 we need to show that
\[
\frac{1}{\log n}\sum_{k=1}^n \frac1k f\big(S_k/\sigma_k\big) \xrightarrow{a.s.} \int_{\mathbb R} f\,dN, \tag{4.29}
\]
for any bounded Lipschitz function $f$. By the same arguments as in Theorem 4.1 and by using assumption (4.27) (which also holds under the assumptions of Theorem 4.9, cf. [28]) we reduce (4.29) to
\[
\frac{1}{\log n}\sum_{k=1}^n \frac1k\Big(f\big(S_k/\sigma_k\big) - Ef\big(S_k/\sigma_k\big)\Big) \xrightarrow{a.s.} 0.
\]
Using Theorem 3.1 we would be done by establishing
\[
\mathrm{Var}\Big(\frac{1}{\log n}\sum_{k=1}^n \frac1k f\big(S_k/\sigma_k\big)\Big) = O\big((\log_2 n)^{-1}(\log_3 n)^{-2}\big). \tag{4.30}
\]
Peligrad and Shao [28] establish the stronger statement that
\[
\mathrm{Var}\Big(\frac{1}{\log n}\sum_{k=1}^n \frac1k f\big(S_k/\sigma_k\big)\Big) = O\big((\log n)^{-\varepsilon}\big),
\]
for some $\varepsilon>0$, using the assumptions of Theorems 4.8 and 4.9.
From central limit theorems for mixing sequences one may now deduce
the following two theorems as corollaries of Theorem 4.8 (cf. [28] and further
references therein).
Theorem 4.10. Let $\{X_n\}_{n\ge1}$ be a stationary strong mixing sequence with $EX_1=0$. Assume that
\[
E|X_1|^{2+\delta} < \infty, \quad\text{for some } \delta>0,
\]
and
\[
\sum_{n=1}^{\infty} \alpha^{\delta/(2+\delta)}(n) < \infty. \tag{4.31}
\]
Then $\sigma^2 = EX_1^2 + 2\sum_{k=2}^{\infty} EX_1X_k < \infty$. If in addition $\sigma^2>0$, then (4.28) is true.

Theorem 4.11. Let $\{X_n\}_{n\ge1}$ be a stationary $\rho$-mixing sequence with $EX_1=0$, $EX_1^2<\infty$. Assume that $\sigma_n^2\to\infty$ and
\[
\sum_{n=1}^{\infty} \rho(2^n) < \infty. \tag{4.32}
\]
Then (4.28) holds.
Remark 4.12. Condition (4.31) combined with the condition σ 2 > 0 is
essentially sharp in the context of α-mixing sequences with respect to the
central limit theorem (4.27). That is, counterexamples exist where (4.31)
is slightly violated and where (4.27) does not hold. The same is true of
condition (4.32) in the context of ρ-mixing sequences. Confer Theorems 1.7
and 2.3 with subsequent comments in the survey article [26].
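Before returning to independent variables, a numerical illustration of (4.28) may be helpful. The sketch below (Python with NumPy and SciPy; the AR(1) example, the function name and the parameters are our own choices, not taken from [28]) uses a stationary Gaussian AR(1) sequence, a standard example of a sequence satisfying mixing conditions of the above type, and standardizes with the asymptotic form $\sigma_k \sim \sqrt k/(1-\phi)$.

import numpy as np
from scipy.stats import norm

def ar1_asclt(x, phi=0.5, n=10**6, rng=None):
    """One-path check of (4.28) for a stationary Gaussian AR(1) sequence
    X_t = phi*X_{t-1} + eps_t, using sigma_k ~ sqrt(k)/(1-phi)."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal(n)
    X = np.empty(n)
    X[0] = eps[0] / np.sqrt(1 - phi**2)   # start in the stationary law
    for t in range(1, n):                 # AR(1) recursion
        X[t] = phi * X[t - 1] + eps[t]
    S = np.cumsum(X)
    k = np.arange(1, n + 1)
    sigma_k = np.sqrt(k) / (1 - phi)      # asymptotic standardization
    return np.sum((S / sigma_k <= x) / k) / np.log(n)

if __name__ == "__main__":
    for x in (-1.0, 0.0, 1.0):
        print(f"x={x:+.1f}: {ar1_asclt(x):.3f} vs Phi(x)={norm.cdf(x):.3f}")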
4.3 Subsequences
We now return to independent random variables to prove a result which
in fact is a consequence of Theorem 4.4 (a). The idea is to consider subsequences of partial sums in order to prove almost sure results with the weaker,
but perhaps more convenient, arithmetic (or Cesàro) summation. We confine ourselves to the situation of i.i.d. random variables with finite variances.
One may proceed in the same manner to derive similar results for all types
of sequences of independent random variables considered in Theorem 4.4.
The following theorem was introduced and proved by Schatte [33] under
the extra condition E|X1 |3 < ∞. Atlagh and Weber [2] proved it in the
form in which it is given below.
Theorem 4.13. Let $\{X_n\}_{n\ge1}$ be a sequence of i.i.d. random variables with $E(X_1)=0$ and $E(X_1^2)=1$. Set $S_n = \sum_{k=1}^n X_k$. Then
\[
\frac1n\sum_{k=1}^n I\Big\{\frac{S_{2^k}}{\sqrt{2^k}}\le x\Big\} \xrightarrow{a.s.} \Phi(x), \quad\text{for all } x. \tag{4.33}
\]
Proof. Put, for $k\ge2$, $Y_k := S_{2^k} - S_{2^{k-1}}$, $Y_1 := S_2$, and put, for $k\ge1$, $b_k := \sqrt{2^k}$. Then $\{Y_k\}_{k\ge1}$ is a sequence of independent random variables, $S_{2^n} = \sum_{i=1}^n Y_i =: \tilde S_n$ and
\[
\frac{\tilde S_n}{b_n} \xrightarrow{d} N, \quad\text{as } n\to\infty.
\]
From Theorem 4.4 (a) applied to the random variables $Y_k$ it follows that
\[
\frac{1}{D_n}\sum_{k=1}^n d_k\, I\Big\{\frac{\tilde S_k}{b_k}\le x\Big\} \xrightarrow{a.s.} \Phi(x), \quad\text{for all } x, \tag{4.34}
\]
with $d_k := \log(b_{k+1}/b_k) = \frac12\log 2$ and $D_n := \sum_{k=1}^n d_k = \frac n2\log 2$. Since $\{d_k\}$ is bounded, condition (a1) of Theorem 4.4 is fulfilled. Statement (4.34) is equivalent to (4.33).
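The dyadic thinning is equally easy to try out. The following sketch (Python with NumPy and SciPy; names and parameters are ours) forms the plain Cesàro average of (4.33) along one path; note that $n$ here counts dyadic points, so already $n = 22$ requires $2^{22}$ increments.

import numpy as np
from scipy.stats import norm

def dyadic_cesaro_asclt(x, n=22, rng=None):
    """One-path Cesaro average (1/n) sum_{k<=n} I{S_{2^k}/sqrt(2^k) <= x}
    for i.i.d. standard normal increments, as in (4.33)."""
    rng = np.random.default_rng() if rng is None else rng
    s = np.cumsum(rng.standard_normal(2**n))
    k = np.arange(1, n + 1)
    s_dyadic = s[2**k - 1]                 # S_{2^k} (arrays are 0-based)
    return np.mean(s_dyadic / np.sqrt(2.0**k) <= x)

if __name__ == "__main__":
    for x in (-1.0, 0.0, 1.0):
        print(f"x={x:+.1f}: {dyadic_cesaro_asclt(x):.3f} vs Phi(x)={norm.cdf(x):.3f}")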
4.4 An almost sure version of Donsker's theorem
Just as Theorems 2.16 and 2.17 provide extensions of classical central limit
theorems to Functional central limit theorems, we shall now consider extensions of the almost sure results Theorem 4.4 (a) and (b).
Lemma 4.14. Let $\{\xi_k\}_{k\ge1}$ be a sequence of independent square integrable random variables and let $\mu_k = E\xi_k$ and $s_n^2 = \sum_{1\le k\le n}\mathrm{Var}\,\xi_k$. Then, for some constant $C$,
\[
E\max_{1\le k\le n}\Big|S_k - \sum_{i=1}^k \mu_i\Big| \le Cs_n. \tag{4.35}
\]
Proof. By the Kolmogorov inequality [17, Chapter 3, Theorem 1.6] it follows that, for $x>0$,
\[
P\Big(\max_{1\le k\le n}\Big|S_k - \sum_{i=1}^k \mu_i\Big| > x\cdot s_n\Big) \le x^{-2}. \tag{4.36}
\]
The right hand side of (4.36) is integrable at infinity, which proves (4.35).
Lemma 4.15. Let $\{\xi_k\}_{k\ge1}$ be a sequence of i.i.d. $F$-distributed random variables, with $F$ belonging to the domain of attraction of a stable distribution $G$ of index $\alpha$, for some $0<\alpha\le2$. Assume moreover that
\[
\frac{S_n-b_n}{a_n} \xrightarrow{d} G, \tag{4.37}
\]
for some sequences $\{a_n\}_{n\ge1}$ and $\{b_n\}_{n\ge1}$. Then, for all $\beta<\alpha$ and some constant $C$,
\[
E\max_{1\le k\le n}\frac{|S_k-b_k|^{\beta}}{a_n^{\beta}} \le C. \tag{4.38}
\]
Proof. We first assume that $F$ and $G$ are symmetric (which implies that $\{b_n\}_{n\ge1}$ may be left out). By Proposition 2.13 (the Lévy inequalities) it follows that
\[
P\Big(\max_{1\le k\le n}\frac{|S_k|}{a_n} > x\Big) \le 2P\Big(\frac{|S_n|}{a_n} > x\Big). \tag{4.39}
\]
The integral of the right hand side of (4.39) against $x^{\beta-1}$ is finite and bounded uniformly in $n$ by Theorem 2.14, so that the same is true of the left hand side, proving (4.38) in the symmetric case.
For the non-symmetric case we apply a strong symmetrization inequality [17, Proposition 6.3, Chapter 3]:
\[
P\Big(\max_{1\le k\le n}\Big|\frac{S_k-b_k}{a_n} - \mathrm{med}\Big(\frac{S_k-b_k}{a_n}\Big)\Big| \ge x\Big) \le 2P\Big(\max_{1\le k\le n}\frac{|S_k^s|}{a_n} \ge x\Big).
\]
Since
\[
\Big|\mathrm{med}\Big(\frac{S_k-b_k}{a_n}\Big)\Big| \le C\Big|\mathrm{med}\Big(\frac{S_k-b_k}{a_k}\Big)\Big| \le C2^{1/\beta}\Big(E\Big|\frac{S_k-b_k}{a_k}\Big|^{\beta}\Big)^{1/\beta} \le C',
\]
for all $\beta<\alpha$ and some constants, by [17, Proposition 6.1, Chapter 3] and Theorem 2.14, it follows by the triangle inequality that, for some positive constant $a$,
\[
P\Big(\max_{1\le k\le n}\Big|\frac{S_k-b_k}{a_n}\Big| \ge x\Big) \le 2P\Big(\max_{1\le k\le n}\frac{|S_k^s|}{a_n} \ge x-a\Big). \tag{4.40}
\]
The first part of the proof now gives that also the integral of the left hand side of (4.40) against $x^{\beta-1}$ is finite and bounded uniformly in $n$, proving (4.38) in the non-symmetric case.
We can now state and prove our result. The proof given here is an
adaptation of the one given in [25] for the i.i.d. case.
Theorem 4.16. Let $W$ denote Wiener measure on $C[0,1]$. Let $\{\xi_k\}_{k\ge1}$ be a sequence of independent random variables satisfying the conditions of Theorem 2.16. Let $\{s_n^2\}_{n\ge1}$ and $\{S_n\}_{n\ge1}$ be the partial sums of variances and variables respectively, and define for $n\ge1$
\[
X_t^n(\omega) = \frac{S_k(\omega)}{s_n}, \quad\text{for } t = \frac{s_k^2}{s_n^2},\ k=1,\dots,n,
\]
linearly interpolated for $t\in[0,1]$ in between. Then, as $n\to\infty$,
\[
\frac{1}{\log s_n^2}\sum_{k=1}^n \frac{\sigma_{k+1}^2}{s_k^2}\,\delta_{X^k} \Rightarrow W, \quad\text{almost surely.}
\]
Proof. We proceed in analogy with the proofs of Theorems 4.1 and 4.4. Let $d_k = \sigma_{k+1}^2/s_k^2$ and $D_n = \log(s_n^2)$. By Theorem 2.4 we need to show that
\[
\frac{1}{D_n}\sum_{k=1}^n d_k f\big(X^k\big) \xrightarrow{a.s.} \int f\,dW,
\]
for all bounded Lipschitz functions $f$ defined on $C[0,1]$. Theorem 2.17 gives (through Remark 2.19 and Theorem 2.1) that
\[
Ef\big(X^k\big) \to \int f\,dW,
\]
so that it remains to show that
\[
\frac{1}{D_n}\sum_{k=1}^n d_k\Big(f\big(X^k\big) - Ef\big(X^k\big)\Big) \xrightarrow{a.s.} 0,
\]
for all bounded Lipschitz functions $f$. Let
\[
\zeta_k = f\big(X^k\big) - Ef\big(X^k\big).
\]
Just as in Theorem 4.4 it suffices to prove that, for some constant $M$ and all $k$ and $l$ obeying $1\le k\le l$,
\[
\big|E(\zeta_k\zeta_l)\big| \le M\,\frac{s_k}{s_l}. \tag{4.41}
\]
Define for $0\le t\le1$ the random element
\[
r_{k,l}(t) =
\begin{cases}
0, & 0\le t\le s_k^2/s_l^2,\\
X_t^l - S_k/s_l, & s_k^2/s_l^2 \le t\le 1,
\end{cases}
\]
and $f_{k,l} := f(r_{k,l})$. Then $r_{k,l}(t)$ depends only on $\xi_{k+1},\dots,\xi_l$, is independent of $X^k$, and moreover
\[
X_t^l - r_{k,l}(t) =
\begin{cases}
X_t^l, & 0\le t\le s_k^2/s_l^2,\\
S_k/s_l, & s_k^2/s_l^2 \le t\le 1.
\end{cases}
\]
Therefore, with $f_k := f(X^k)$ and $f_l := f(X^l)$, and by the regularity of $f$, we have that
\[
|E(\zeta_k\zeta_l)| = |\mathrm{Cov}(f_k,f_l)| = |\mathrm{Cov}(f_k,f_l-f_{k,l})| \le CE|f_l-f_{k,l}|
\le C'E\|X^l-r_{k,l}\|_{\infty} = C'E\max_{i\le k}\frac{|S_i|}{s_l} = \frac{C'}{s_l}E\max_{i\le k}|S_i|.
\]
To arrive at (4.41) we finally apply the inequality of the conclusion of Lemma 4.14.
Remark 4.17. Theorem 4.7 could easily be extended to the form of Theorem 4.16, using the conclusion of Theorem 4.16. Consult [25] for further details.
The following theorem, as well as its proof, is modeled on the preceding
one. The result may however be new. Note that we no longer consider
convergence of measures on C[0, 1] but on D[0, 1] (cf. Remark 2.19).
Theorem 4.18. Let $F$, $G$, $\{\xi_k\}_{k\ge1}$, $\{S_n\}_{n\ge1}$, $\{a_n\}_{n\ge1}$, $\{b_n\}_{n\ge1}$ and $X\in D[0,1]$ be as in Lemma 4.15. Let further $X^n$ be the random function on $[0,1]$ defined by
\[
X_t^n(\omega) = \frac{S_k(\omega)-b_k}{a_n}, \quad\text{for } t\in\Big[\frac kn,\frac{k+1}n\Big),\ k=1,\dots,n.
\]
Then, as $n\to\infty$,
\[
\frac{1}{\log n}\sum_{k=1}^n \frac1k\,\delta_{X^k} \Rightarrow X, \quad\text{almost surely.}
\]
Proof. Let $d_k = 1/k$ and $D_n = \log n$. By Theorem 2.4 we need to show that
\[
\frac{1}{D_n}\sum_{k=1}^n d_k f\big(X^k\big) \xrightarrow{a.s.} \int f\,dX,
\]
for all bounded Lipschitz functions $f$ defined on $D[0,1]$. Theorem 2.18 gives that
\[
Ef\big(X^k\big) \to \int f\,dX,
\]
so that it remains to show that
\[
\frac{1}{D_n}\sum_{k=1}^n d_k\Big(f\big(X^k\big) - Ef\big(X^k\big)\Big) \xrightarrow{a.s.} 0.
\]
Let
\[
\zeta_k = f\big(X^k\big) - Ef\big(X^k\big).
\]
Just as in Theorem 4.4 (b) it suffices to prove that, for some constants $M$ and $p$ and all $k$ and $l$ obeying $1\le k\le l$,
\[
\big|E(\zeta_k\zeta_l)\big| \le M\Big(\frac{a_k}{a_l}\Big)^p.
\]
Define for $0\le t\le1$ the random element
\[
r_{k,l}(t) =
\begin{cases}
0, & 0\le t\le k/l,\\[4pt]
X_t^l - \dfrac{S_k-b_k}{a_l}, & k/l\le t\le 1,
\end{cases}
\]
and $f_{k,l} := f(r_{k,l})$. Then $r_{k,l}(t)$ depends only on $\xi_{k+1},\dots,\xi_l$, is independent of $X^k$, and moreover
\[
X_t^l - r_{k,l}(t) =
\begin{cases}
X_t^l, & 0\le t\le k/l,\\[4pt]
\dfrac{S_k-b_k}{a_l}, & k/l\le t\le 1.
\end{cases}
\]
Now let $d(\cdot,\cdot)$ be the metric of $D[0,1]$ and note that
\[
d(x,y) \le \|x-y\|_{\infty},
\]
for any two elements $x,y$ of $D[0,1]$. Let further $K$ be a Lipschitz constant as well as an upper bound for $f$, and let $p<\alpha\wedge1$ be any number. With $f_k := f(X^k)$ and $f_l := f(X^l)$, we therefore have that
\[
|E(\zeta_k\zeta_l)| = |\mathrm{Cov}(f_k,f_l)| = |\mathrm{Cov}(f_k,f_l-f_{k,l})| \le CE|f_l-f_{k,l}|
\le CE\big[Kd(X^l,r_{k,l})\wedge 2K\big] \le CE\big[K\|X^l-r_{k,l}\|_{\infty}\wedge 2K\big]
\]
\[
= C'E\Big[\max_{i\le k}\frac{|S_i-b_i|}{2a_l}\wedge 1\Big] \le C'E\Big[\max_{i\le k}\frac{|S_i-b_i|}{2a_l}\wedge 1\Big]^p
\le C'E\Big[\max_{i\le k}\frac{|S_i-b_i|^p}{(2a_l)^p}\Big] = C'\Big(\frac{a_k}{2a_l}\Big)^p E\Big[\max_{i\le k}\frac{|S_i-b_i|^p}{a_k^p}\Big] \le C''\Big(\frac{a_k}{a_l}\Big)^p,
\]
by Lemma 4.15.
5 Generalizations and Related Results

5.1 A universal result and some consequences
Here we collect some results which no longer merely concern central limit theory. Theorem 5.1 is due to Berkes [4, Theorem 2] and can be proved in a similar fashion to how we arrived at Theorem 4.4. Its main condition is reminiscent of the proof of Proposition 4.3. In fact, Theorem 4.4 may be deduced from Theorem 5.1 with, for example,
\[
f_l(x_1,\dots,x_l) = \Big(\sum_{i=1}^l x_i - b_l\Big)\Big/a_l,
\qquad
f_{k,l}(x_{k+1},\dots,x_l) = \Big(\sum_{i=k+1}^l x_i - (b_l-b_k)\Big)\Big/a_l,
\]
when proving statement (b). (Cf. [4, Theorem A].)
When applying Theorem 5.1 one could typically start from the result
\[
f_k(X_1,\dots,X_k) \xrightarrow{d} G, \quad\text{as } k\to\infty.
\]
The main condition is then a measure of how little the $k$ first independent random variables influence the variables $\{f_l(X_1,\dots,X_l)\}_{l>k}$ asymptotically as $l\to\infty$. A limit relation which no longer holds when a few variables are changed is typically what Theorem 5.1 does not cover. Confer [4, pages 3-5] for further discussion.
Theorem 5.1. Let $\{X_k\}_{k\ge1}$ be independent random variables, $f_k:\mathbb{R}^k\to\mathbb{R}$, $k=1,2,\dots$, measurable functions, and assume that for each $1\le k<l$ there exists a measurable function $f_{k,l}:\mathbb{R}^{l-k}\to\mathbb{R}$ such that
\[
E\Big(\big|f_l(X_1,\dots,X_l) - f_{k,l}(X_{k+1},\dots,X_l)\big|\wedge 1\Big) \le C\big(\log^+\log^+(c_l/c_k)\big)^{-(1+\varepsilon)},
\]
for some constants $C>0$, $\varepsilon>0$ and a positive, non-decreasing sequence $\{c_n\}_{n\ge1}$ satisfying $c_n\to\infty$, $c_{n+1}/c_n = O(1)$ as $n\to\infty$. Put
\[
d_k = \log(c_{k+1}/c_k), \qquad D_n = \sum_{1\le k\le n} d_k.
\]
Then, for any distribution function $G$, the relations
\[
\lim_{n\to\infty}\frac{1}{D_n}\sum_{k=1}^n d_k\, I\{f_k(X_1,\dots,X_k)\le x\} \overset{a.s.}{=} G(x), \quad\text{for any } x\in C_G,
\]
and
\[
\lim_{n\to\infty}\frac{1}{D_n}\sum_{k=1}^n d_k\, P\{f_k(X_1,\dots,X_k)\le x\} = G(x), \quad\text{for any } x\in C_G,
\]
are equivalent. The result remains valid if we replace the weight sequence $\{d_k\}_{k\ge1}$ by any sequence $\{d_k^*\}_{k\ge1}$ such that $0\le d_k^*\le d_k$ and $\sum d_k^* = \infty$.

Proof. [4, Theorem 2].
Berkes deduces Theorems 5.2 and 5.3 below from Theorem 5.1. Theorem
5.2 was originally proved independently in [12] and [9].
Theorem 5.2. Let $\{X_k\}_{k\ge1}$ be i.i.d. random variables such that, setting $M_k = \max_{1\le i\le k} X_i$, we have
\[
a_k(M_k - b_k) \xrightarrow{d} G,
\]
for some numerical sequences $\{a_n\}_{n\ge1}$ and $\{b_n\}_{n\ge1}$ and a distribution function $G$. Then
\[
\lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^n \frac1k\, I\{a_k(M_k-b_k)\le x\} \overset{a.s.}{=} G(x), \quad\text{for any } x\in C_G.
\]
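For a concrete instance, i.i.d. Exp(1) variables satisfy the hypothesis with $a_k = 1$, $b_k = \log k$ and $G(x) = \exp(-e^{-x})$, the Gumbel distribution. The minimal sketch below (Python with NumPy; the function name and parameters are ours and purely illustrative) log-averages the maxima indicators along one sample path.

import numpy as np

def max_asclt(x, n=10**6, rng=None):
    """One-path check of Theorem 5.2 for Exp(1) variables, where
    M_k - log k converges in distribution to the Gumbel law
    G(x) = exp(-exp(-x)), i.e. a_k = 1 and b_k = log k."""
    rng = np.random.default_rng() if rng is None else rng
    M = np.maximum.accumulate(rng.exponential(size=n))   # running maxima M_k
    k = np.arange(1, n + 1)
    indic = M - np.log(k) <= x
    return np.sum(indic / k) / np.log(n)

if __name__ == "__main__":
    for x in (-1.0, 0.0, 1.0, 2.0):
        print(f"x={x:+.1f}: {max_asclt(x):.3f} vs G(x)={np.exp(-np.exp(-x)):.3f}")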
Theorem 5.3. Let $\{X_k\}_{k\ge1}$ be i.i.d. random variables with continuous distribution function $F$, let $F_n$ be the empirical distribution function defined by
\[
F_n(x) = \frac1n\sum_{1\le k\le n} I\{X_k\le x\},
\]
and let
\[
D_n = \sup_x\big|F_n(x) - F(x)\big|
\]
be the Kolmogorov statistic. Then
\[
\lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^n \frac1k\, I\big\{\sqrt k\,D_k\le x\big\} \overset{a.s.}{=} \sum_{j=-\infty}^{\infty}(-1)^j e^{-2j^2x^2}, \quad\text{for any } x.
\]
Theorems 5.4 and 5.5 are more recent (2002 and 2006, respectively). The latter is the "almost sure version" of the former.

Theorem 5.4. Let $\{X_k\}_{k\ge1}$ be a sequence of positive i.i.d. square integrable random variables with $E(X_1)=\mu$, $\mathrm{Var}(X_1)=\sigma^2>0$ and coefficient of variation $\gamma = \sigma/\mu$. Let $S_k = \sum_{i=1}^k X_i$, $k\ge1$. Then
\[
Y_n := \Big(\frac{\prod_{k=1}^n S_k}{n!\,\mu^n}\Big)^{1/(\gamma\sqrt n)} \xrightarrow{d} e^{\sqrt2 N}, \quad\text{as } n\to\infty.
\]
Proof. Cf. [29].
Theorem 5.5. Let $\{X_k\}_{k\ge1}$, $\{Y_k\}_{k\ge1}$, $\mu$, $\sigma$ and $\gamma$ be as in Theorem 5.4. Then, for any real $x$,
\[
\lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^n \frac1k\, I\{Y_k\le x\} \overset{a.s.}{=} F(x), \tag{5.1}
\]
where $F$ is the distribution function of the random variable $e^{\sqrt2 N}$.

Proof. Cf. [16].
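For i.i.d. Exp(1) variables we have $\mu=\sigma=\gamma=1$, and (5.1) can be checked along one path. The sketch below (Python with NumPy and SciPy; names and parameters are our own choices) works on the logarithmic scale, $\log Y_k = (\gamma\sqrt k)^{-1}\sum_{i\le k}\log(S_i/(i\mu))$, to avoid overflow in the product of partial sums, and uses that $F(x) = \Phi(\log x/\sqrt2)$ for $x>0$.

import numpy as np
from scipy.stats import norm

def product_asclt(x, mu=1.0, n=10**5, rng=None):
    """One-path check of (5.1) for Exp(1) variables (mu = gamma = 1):
    log-average of I{Y_k <= x}, with Y_k from Theorem 5.4 computed on
    the log scale."""
    rng = np.random.default_rng() if rng is None else rng
    S = np.cumsum(rng.exponential(scale=mu, size=n))
    k = np.arange(1, n + 1)
    logY = np.cumsum(np.log(S / (k * mu))) / np.sqrt(k)   # log Y_k, gamma = 1
    indic = logY <= np.log(x)                             # Y_k <= x, x > 0
    return np.sum(indic / k) / np.log(n)

if __name__ == "__main__":
    for x in (0.5, 1.0, 2.0):
        print(f"x={x}: {product_asclt(x):.3f} vs F(x)={norm.cdf(np.log(x)/np.sqrt(2)):.3f}")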
5.2 Return times
We now use results from the previous section to derive some results concerning Return times, with respect to the origin, of the simple, symmetric
random walk. We first consider two dimensions and then one dimension.
The analysis is inspired by [4, pages 32-33] from which Theorem 5.9 is taken.
Theorem 5.13, which we deduce here, could be viewed as the corresponding
result for one dimension.
Let in both cases 0 = τ0 < τ1 < . . . denote the successive times of return
to the origin and Xn = τn − τn−1 , n ≥ 1, the excursion times. Naturally,
{Xn }n≥1 forms a sequence of i.i.d. random variables. (Cf. Propositions 5.6
and 5.10 below for the question of well-definedness.)
The following proposition dates back to 1950 and Dvoretzky and Erdős.
Proposition 5.6. Let $\tau_1$ be the time of the first return to the origin in the two-dimensional setting. Then
\[
P(\tau_1 > t) \sim \frac{\pi}{\log t}, \quad\text{as } t\to\infty.
\]
Proof. [31, Lemma 19.1, page 197].
Proposition 5.7. Let $M_k = \max_{1\le i\le k} X_i$, where $\{X_i\}_{i\ge1}$ denotes the sequence of excursion times in the two-dimensional setting. Then
\[
\text{(i)}\ \frac1k\log M_k \xrightarrow{d} H;
\qquad
\text{(ii)}\ \frac1k\log\tau_k \xrightarrow{d} H,
\]
where the distribution function of $H$ is defined by
\[
H(x) =
\begin{cases}
e^{-\pi/x}, & \text{if } x>0,\\
0, & \text{if } x\le0.
\end{cases}
\]
Proof. Since
\[
M_k \le \tau_k \le kM_k,
\]
it follows that
\[
0 \le \log\tau_k - \log M_k \le \log k,
\]
so that
\[
\frac1k\big(\log\tau_k - \log M_k\big) \to 0, \quad\text{as } k\to\infty.
\]
Statements (i) and (ii) are therefore equivalent.
To prove (i) we note that the random variables $\{\log M_k/k\}$ are positive, since $X_1\ge2$. For $x>0$ it now follows by independence and Proposition 5.6 that
\[
P\Big(\frac{\log M_k}{k}\le x\Big) = P\big(M_k\le e^{kx}\big) = P\big(X_1\le e^{kx}\big)^k \sim \Big(1-\frac{\pi}{kx}\Big)^k \to e^{-\pi/x},
\]
as $k\to\infty$.
Remark 5.8. The distribution $H$ defined in Proposition 5.7 belongs to the family of extremal distributions; it is in fact of Fréchet type. The same holds for the distribution $G$ defined in Proposition 5.11.
Theorem 5.9. Consider the two-dimensional setting. Then
\[
\text{(i)}\ \lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^n \frac1k\, I\Big\{\frac1k\log\tau_k\le x\Big\} \overset{a.s.}{=} H(x), \quad\text{for any } x;
\]
\[
\text{(ii)}\ \lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^n \frac1k\, I\Big\{\frac1k\log M_k\le x\Big\} \overset{a.s.}{=} H(x), \quad\text{for any } x,
\]
where $H$ and $\{M_k\}_{k\ge1}$ are defined in Proposition 5.7.

Proof. The result follows from Theorem 5.1, cf. [4, Theorem H, page 33].
And now to corresponding results for the one-dimensional simple, symmetric random walk.
Proposition 5.10. Let $\tau_1$ be the time of the first return to the origin in the one-dimensional setting. Then
\[
P(\tau_1 > t) \sim \Big(\frac2\pi\Big)^{1/2} t^{-1/2}, \quad\text{as } t\to\infty.
\]
Proof. By combinatorial arguments (cf. e.g. [31, page 94]) we get that
\[
P\big(X_1 > 2n\big) = 2^{-2n}\binom{2n}{n},
\qquad
P\big(X_1 > 2n+1\big) = P\big(X_1 > 2n\big).
\]
The result follows by Stirling approximation of the factorials.
Proposition 5.11. Let $M_k = \max_{1\le i\le k} X_i$, where $\{X_i\}_{i\ge1}$ denotes the sequence of excursion times in the one-dimensional setting. Then
\[
\frac{M_k}{k^2} \xrightarrow{d} G,
\]
where the distribution function of $G$ is defined by
\[
G(x) =
\begin{cases}
\exp\Big(-\big(\frac{2}{\pi x}\big)^{1/2}\Big), & \text{if } x>0,\\
0, & \text{if } x\le0.
\end{cases}
\]
Proof. Positivity of the random variables $M_k/k^2$ is obvious. For $x>0$ it now follows by independence and Proposition 5.10 that
\[
P\Big(\frac{M_k}{k^2}\le x\Big) = P\big(M_k\le k^2x\big) = P\big(X_1\le k^2x\big)^k \sim \Big(1-\frac1k\Big(\frac{2}{\pi x}\Big)^{1/2}\Big)^k \to \exp\Big(-\Big(\frac{2}{\pi x}\Big)^{1/2}\Big),
\]
as $k\to\infty$.
Proposition 5.12. Consider the one-dimensional setting. Then
\[
\frac{\tau_k}{k^2} \xrightarrow{d} F, \tag{5.2}
\]
where $F$ is the so-called Lévy distribution, with distribution function defined by
\[
F(x) =
\begin{cases}
\dfrac{1}{\sqrt{2\pi}}\displaystyle\int_0^x t^{-3/2}e^{-\frac{1}{2t}}\,dt, & \text{if } x>0,\\
0, & \text{if } x\le0.
\end{cases}
\]
Proof. Since $\{\tau_k\}$ is a partial sum sequence of positive, i.i.d. $\tau_1$-distributed variables, (5.2) follows from Proposition 5.10 and Theorem 2.8 with preceding remarks, and by noting that the above mentioned Lévy distribution has skewness parameter $\beta=1$ and index $\alpha=1/2$, in the conventional terminology concerning stable distributions.
A proof based on exact calculations of probabilities and Stirling approximation is also possible; confer [31, Theorem 9.11, page 99].
Theorem 5.13. Consider the one-dimensional setting. Then
\[
\text{(i)}\ \lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^n \frac1k\, I\Big\{\frac{\tau_k}{k^2}<x\Big\} \overset{a.s.}{=} F(x), \quad\text{for any } x;
\]
\[
\text{(ii)}\ \lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^n \frac1k\, I\Big\{\frac{M_k}{k^2}<x\Big\} \overset{a.s.}{=} G(x), \quad\text{for any } x,
\]
where $F$ is defined in Proposition 5.12 and $G$ and $\{M_k\}_{k\ge1}$ in Proposition 5.11.

Proof. Statements (i) and (ii) are consequences of Theorems 4.4 (c) and 5.2, respectively.
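Statement (i) is easy to check by simulation. The sketch below (Python with NumPy and SciPy; names and parameters are ours) extracts the zeros $\tau_k$ of one long simple symmetric random walk and log-averages the indicators, using that the Lévy distribution function of Proposition 5.12 has the closed form $F(x) = 2(1-\Phi(x^{-1/2}))$ for $x>0$.

import numpy as np
from scipy.stats import norm

def levy_cdf(x):
    """Closed form of the Levy distribution function of Proposition 5.12:
    F(x) = 2(1 - Phi(x^{-1/2})) for x > 0."""
    return 2.0 * (1.0 - norm.cdf(x ** -0.5))

def return_time_asclt(x, n_steps=10**7, rng=None):
    """One-path check of Theorem 5.13 (i): log-average of I{tau_k/k^2 < x}
    over the successive zeros tau_k of a simple symmetric random walk."""
    rng = np.random.default_rng() if rng is None else rng
    s = np.cumsum(rng.choice((-1, 1), size=n_steps))
    tau = np.flatnonzero(s == 0) + 1.0     # zeros of the walk: tau_1, tau_2, ...
    k = np.arange(1, len(tau) + 1)
    indic = tau / k**2 < x
    return np.sum(indic / k) / np.log(len(tau))

if __name__ == "__main__":
    for x in (0.5, 1.0, 2.0):
        print(f"x={x}: {return_time_asclt(x):.3f} vs F(x)={levy_cdf(x):.3f}")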
5.3 A local limit theorem

All results of this section are stated in Chapter 8 of the review paper [3], together with some further results and references.
The following theorem was published in 1951. It may be interesting to
note that the line of reasoning in the published proof is closely related to
the arguments in Chapter 3.
Theorem 5.14. Let $\{X_k\}_{k\ge1}$ be a sequence of i.i.d. integer-valued random variables with $E(X_1)=0$ and let $S_n = \sum_{k=1}^n X_k$ for $n\ge1$. Assume that every integer $a$ is a possible value of $S_n$ for all sufficiently large $n$. Finally, set
\[
M_n := \sum_{k=1}^n P(S_k = a).
\]
Then
\[
\lim_{n\to\infty}\frac{1}{\log M_n}\sum_{k=1}^n \frac{I\{S_k=a\}}{M_k} \overset{a.s.}{=} 1.
\]
Proof. [22, Theorem 6].
The next result, which we may call the Almost sure local central limit
theorem, follows from the preceding theorem and Theorem 2.21, as is noted
in the proof below. We here also provide a proof independent of Theorem
5.14, based on the results of Chapter 3 and Theorem 2.21.
Theorem 5.15. Let $\{X_k\}_{k\ge1}$ and $\{S_n\}_{n\ge1}$ be as in Theorem 5.14. Assume moreover that $E(X_1^2) = \sigma^2 < \infty$. Then
\[
\lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^n \frac{I\{S_k=a\}}{k^{1/2}} \overset{a.s.}{=} \frac1\sigma\Big(\frac{1}{2\pi}\Big)^{1/2}. \tag{5.3}
\]
Proof. From Theorem 2.21 it follows that, for all integers $b$,
\[
P(S_n = b) \sim \frac{1}{\sigma\sqrt{2\pi n}}, \tag{5.4}
\]
and by (what we now find to be!) standard techniques we deduce that
\[
M_k \sim \sigma^{-1}(2k/\pi)^{1/2}
\quad\text{and}\quad
\log M_k \sim (\log k)/2,
\]
with $M_k$ defined as in Theorem 5.14. The desired conclusion follows from Theorem 2.25 (i) and Theorem 5.14.
For the alternative proof, define for $k\ge1$
\[
C = \frac1\sigma\Big(\frac{1}{2\pi}\Big)^{1/2};
\quad I_k = I\{S_k = a\};
\quad P_k = P(S_k = a);
\quad \xi_k = \sqrt k\, I_k - C.
\]
The desired conclusion (5.3) is equivalent to
\[
\lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^n \frac{\xi_k}{k} \overset{a.s.}{=} 0. \tag{5.5}
\]
The sequence $\{\xi_k\}_{k\ge1}$ is uniformly bounded below and each variable is square-integrable, so that the conditions of Theorem 3.1 are fulfilled. Through Theorem 3.8, statement (5.5) follows if we show, e.g., that
\[
E(\xi_k\xi_l) \le M\Big(\frac kl\Big)^{1/2}, \tag{5.6}
\]
for some constant $M$ and all $1\le k\le l$. Now
\[
E(\xi_k\xi_l) = \sqrt k\sqrt l\, P\big(S_k=a,\, S_l=a\big) + C^2 - C\sqrt l\, P_l - C\sqrt k\, P_k
= \sqrt k\sqrt l\, P_k\, P\big(S_{l-k}=0\big) + C^2 - C\sqrt l\, P_l - C\sqrt k\, P_k.
\]
Applying (5.4) with $b=a$ and $b=0$ we get that
\[
E(\xi_k\xi_l) \le DC^2\Big(\sqrt k\sqrt l\,\frac{1}{\sqrt k\sqrt{l-k}} + 1 - 1 - 1\Big) = C'\Big(\frac{\sqrt l}{\sqrt{l-k}} - 1\Big),
\]
for some constant $D$ and $C' = DC^2$. Finally
\[
\frac{\sqrt l}{\sqrt{l-k}} = \Big(1+\frac{k}{l-k}\Big)^{1/2} \le 1+\frac{\sqrt k}{\sqrt{l-k}} \le 1+\sqrt2\,\frac{\sqrt k}{\sqrt l},
\]
so that
\[
E(\xi_k\xi_l) \le \sqrt2\, C'\Big(\frac kl\Big)^{1/2},
\]
proving (5.6) as desired.
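The local result can also be observed numerically, provided one uses an aperiodic walk (the simple random walk visits $0$ only at even times; cf. Theorem 5.16 below). The following sketch (Python with NumPy; names and parameters are our own choices) takes steps uniform on $\{-1,0,1\}$, so that $\sigma^2 = 2/3$ and $a = 0$.

import numpy as np

def local_asclt(n=10**7, rng=None):
    """One-path check of (5.3) for the aperiodic walk with steps uniform
    on {-1, 0, 1} (sigma^2 = 2/3, a = 0): the left hand side of (5.3)
    should approach (1/sigma)(1/(2 pi))^{1/2}, roughly 0.489."""
    rng = np.random.default_rng() if rng is None else rng
    s = np.cumsum(rng.integers(-1, 2, size=n))   # steps uniform on {-1, 0, 1}
    k = np.arange(1, n + 1)
    lhs = np.sum((s == 0) / np.sqrt(k)) / np.log(n)
    target = 1.0 / (np.sqrt(2.0 / 3.0) * np.sqrt(2.0 * np.pi))
    return lhs, target

if __name__ == "__main__":
    print("%.3f vs %.3f" % local_asclt())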
We now investigate the consequences for the simple, symmetric random
walk in one dimension. As noted in the proof below, there is a connection
to Theorem 5.13 (i) of the preceding section.
Theorem 5.16. Let $\{\xi_k\}_{k\ge1}$ be a sequence of i.i.d. random variables with $P(\xi_1=1)=P(\xi_1=-1)=1/2$ and let $S_n = \sum_{1\le k\le n}\xi_k$. Let $0=\tau_0<\tau_1<\dots$ denote the successive subscripts where $S_k=0$. Then
\[
\text{(i)}\ \lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^n \Big(\frac{1}{\tau_k}\Big)^{1/2} \overset{a.s.}{=} \Big(\frac2\pi\Big)^{1/2};
\]
\[
\text{(ii)}\ \lim_{n\to\infty}\frac{1}{\log\tau_n}\sum_{k=1}^n \Big(\frac{1}{\tau_k}\Big)^{1/2} \overset{a.s.}{=} \Big(\frac{1}{2\pi}\Big)^{1/2};
\]
\[
\text{(iii)}\ \lim_{n\to\infty}\frac{\log\tau_n}{\log n} \overset{a.s.}{=} 2.
\]
Proof. It suffices to prove two of the three statements above. Since (iii) follows from Theorem 11.6 in [31], it suffices to prove (ii).
Statement (ii) now follows from Theorem 5.15, since
\[
\frac{1}{\log\tau_n}\sum_{k=1}^n \Big(\frac{1}{\tau_k}\Big)^{1/2} = \frac{1}{\log\tau_n}\sum_{k=1}^{\tau_n}\frac{I\{S_k=0\}}{k^{1/2}},
\]
and since $E\xi_1^2 = 1$.
Remark 5.17. One may note the connection between (i) and Theorem 5.13. Namely, by defining
\[
g(x) = x^{-1/2}, \quad\text{for } x>0,
\]
we see that statement (i) is equivalent to
\[
\frac{1}{\log n}\sum_{k=1}^n \frac1k\, g\Big(\frac{\tau_k}{k^2}\Big) \overset{a.s.}{=} \int_0^{\infty} f(x)g(x)\,dx = \Big(\frac2\pi\Big)^{1/2},
\]
where $f(x) = \frac{1}{\sqrt{2\pi}}x^{-3/2}e^{-1/(2x)}$, $x>0$, is the density function of the Lévy distribution. (The value of the integral follows from the substitution $u=1/x$, which reduces it to $\frac{1}{\sqrt{2\pi}}\int_0^{\infty}e^{-u/2}\,du = (2/\pi)^{1/2}$.)
To deduce (i) immediately from Theorem 5.13 there is however a slight difficulty to overcome, namely that $g$ is not bounded.
5.4 Generalized moments in the almost sure central limit theorem

Returning to the beginning and our fundamental result (1.1), we saw (by Theorem 2.4), and also made use of, the fact that it can equivalently be stated as
\[
\lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^n \frac1k f\Big(\frac{S_k}{\sqrt k}\Big) \overset{a.s.}{=} \int f(x)\phi(x)\,dx, \tag{5.7}
\]
for any bounded, Lipschitz-continuous function $f$. The question now arises: can the result (5.7) be extended to a larger class of functions $f$? Do we need to assume more than the well-definedness of $\int f(x)\phi(x)\,dx$?
The answers are in both cases affirmative. Conditions concerning continuity as well as asymptotic behavior need to be imposed. Several papers have dealt with the problem in this form (cf. [3, Chapter 3]). Theorem 5.18 below, due to Ibragimov and Lifshits, seems to be the best possible at the moment. For a counterexample where condition (5.8) is not fulfilled, we refer to [21].
Theorem 5.18. Let $\{X_k\}_{k\ge1}$ be a sequence of i.i.d. random variables with $EX_1=0$ and $EX_1^2=1$. Further, let $A$ and $H_0>0$ be constants and assume that $f:[A,\infty)\to\mathbb{R}_+$ is a nondecreasing function such that $f(x)\exp\{-H_0x^2\}$ is nonincreasing and $\int_A^{\infty} f(x)\phi(x)\,dx < \infty$. Then for every continuous function $h$ which satisfies
\[
|h(x)| \le f(|x|), \quad |x|\ge A, \tag{5.8}
\]
we have
\[
\lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^n \frac1k h\Big(\frac{S_k}{\sqrt k}\Big) \overset{a.s.}{=} \int h(x)\phi(x)\,dx.
\]
Proof. [21].
Several authors have noted and used the connection between results of the form (5.7) and the Birkhoff ergodic theorem applied to the Ornstein-Uhlenbeck process. Corollary 5.20 is indeed a "continuous version" of (5.7), with partial sums replaced by Brownian motion and sums by integrals. In this case, then, no extra conditions on $f$ are necessary. Theorem 5.19 below concerns the more general setting which corresponds to Theorem 4.16.
Theorem 5.19. Let $B(t)$, $t\ge0$, be a standard Brownian motion defined on a probability space $(\Omega,\mathcal{P},P)$. Set, for each $t>0$ and each $s\in[0,1]$,
\[
B_t(s) = t^{-1/2}B(st).
\]
Let further $W$ denote Wiener measure on $C[0,1]$. Then, for all $f\in L^1(W)$,
\[
\frac{1}{\log T}\int_1^T \frac1u f(B_u)\,du \xrightarrow{a.s.} \int f\,dW, \quad\text{as } T\to\infty.
\]
Proof. By changing variables, $u = e^t$, the conclusion is equivalent to
\[
\frac1T\int_0^T f(X_t)\,dt \xrightarrow{a.s.} \int f\,dW, \quad\text{as } T\to\infty, \tag{5.9}
\]
where we define
\[
X_t(s) := B_{e^t}(s) = e^{-t/2}B(se^t), \quad\text{for } s\in[0,1] \text{ and } t\ge0. \tag{5.10}
\]
We shall deduce (5.9) from Birkhoff's ergodic theorem. We transfer the problem (cf. Billingsley [5, page 19]) and define the product space $\tilde\Omega = C[0,1]^{\mathbb{R}_{\ge0}}$ and the mapping
\[
\psi:\Omega\to\tilde\Omega, \qquad \psi(\omega) = X_{\cdot}(\cdot,\omega).
\]
We then equip $\tilde\Omega$ with the $\sigma$-algebra and measure induced by $\psi$, hence defining a probability space $(\tilde\Omega,\tilde{\mathcal{P}},\tilde P)$. If we can show that $\tilde P$ is an invariant and ergodic measure with respect to the usual one-step shift operator $\tau_1$ defined on $\tilde\Omega$, then an application of Birkhoff's ergodic theorem would give that
\[
\frac1n\sum_{k=0}^{n-1} g(\tau_k\tilde\omega) \xrightarrow{a.s.} \int g\,d\tilde P \quad\text{as } n\to\infty, \tag{5.11}
\]
for all $g\in L^1(\tilde\Omega)$. But as Krengel states in [23, page 10], the continuous time version
\[
\frac1T\int_0^T g(\tau_t\tilde\omega)\,dt \xrightarrow{a.s.} \int g\,d\tilde P \quad\text{as } T\to\infty, \tag{5.12}
\]
is a consequence of the usual theorem if we (with analogous definitions) assume that $\tilde P$ is invariant and ergodic with respect to the semiflow $\{\tau_t, t\ge0\}$ of $t$-step shifts. We now let $T$ be the projection on the first coordinate of $\tilde\Omega$,
\[
T: C[0,1]^{\mathbb{R}_{\ge0}} \to C[0,1], \qquad T(\tilde\omega) = \tilde\omega_0,
\]
and for $f\in L^1(W)$ we define $g = f\circ T$. Now $g\in L^1(\tilde\Omega,\tilde P)$, since, for each $t\ge0$, $X_t(\cdot)$ is a Brownian motion on $[0,1]$, and it also follows that
\[
f\circ X_t = f\circ T\circ\tau_t\circ\psi = g\circ\tau_t\circ\psi.
\]
By definition of $\psi$, and since $\int g\,d\tilde P = \int f\,dW$, (5.9) therefore follows from (5.12).
It remains to show invariance and ergodicity of the measure $\tilde P$ with respect to shift operators. Invariance (or stationarity) here means
\[
\tilde P(\tau_u^{-1}\tilde B) = \tilde P(\tilde B), \quad\text{for all } u. \tag{5.13}
\]
Replacing ergodicity by the stronger mixing property (cf. [5, page 12]) means showing
\[
\tilde P(\tilde A\cap\tau_u^{-1}\tilde B) \to \tilde P(\tilde A)\tilde P(\tilde B) \quad\text{as } u\to\infty. \tag{5.14}
\]
Here $\tilde A$ and $\tilde B$ denote arbitrary sets in $\tilde\Omega$. Following Billingsley [6, page 194] we regard our original Brownian motion $B(t)$ as a random element in $D[0,\infty)$. The $\sigma$-algebra of this space is generated by sets of the form
\[
\{x\in D[0,\infty): x(t)\in I\},
\]
for a fixed interval $I$ and a fixed value of $t\in[0,\infty)$ (cf. [6, Theorem 12.5, page 134]). Now $\tilde P$ is likewise induced by the mapping
\[
\varphi: D[0,\infty)\to\tilde\Omega, \qquad \varphi(x)_t(s) = e^{-t/2}x(se^t), \quad\text{for } t\ge0 \text{ and } s\in[0,1],
\]
with $D[0,\infty)$ equipped with Brownian motion measure, which we denote by $P_W$. This means that both (5.13) and (5.14) only need to be considered for a restricted class of sets, namely sets $\tilde A$ and $\tilde B$ of the following type:
\[
\varphi^{-1}(\tilde A) = A, \qquad A = \{x\in D[0,\infty): x(t_0)\in I\},
\]
\[
\varphi^{-1}(\tilde B) = B, \qquad B = \{x\in D[0,\infty): x(t_1)\in J\}.
\]
If we as usual distinguish elements in $\tilde\Omega$ through coordinates $t\ge0$ and $s\in[0,1]$, then $\tilde B$ is characterized by
\[
\tilde\omega_t(s)\in e^{-t/2}J, \quad\text{for all } t\ge0 \text{ and } s\in[0,1] \text{ such that } s = e^{-t}t_1.
\]
It follows that $\tau_u^{-1}\tilde B$ is characterized by
\[
\tilde\omega_{t-u}(s)\in e^{-t/2}J, \quad\text{for all } t\ge u \text{ and } s\in[0,1] \text{ such that } s = e^{-t}t_1.
\]
Therefore, denoting $\varphi^{-1}(\tau_u^{-1}\tilde B) =: B^u$, we finally get
\[
B^u = \{x\in D[0,\infty): e^{u/2}x(t_1e^{-u})\in J\}.
\]
We may then prove (5.13) since
\[
\tilde P(\tau_u^{-1}\tilde B) = P_W(B^u) = P_W\big(e^{u/2}B(t_1e^{-u})\in J\big) = P_W\big(B(t_1)\in J\big) = P_W(B) = \tilde P(\tilde B).
\]
To prove (5.14) we note that
\[
\tilde P\big(\tilde A\cap\tau_u^{-1}\tilde B\big) = P_W\big(\varphi^{-1}(\tilde A)\cap\varphi^{-1}(\tau_u^{-1}\tilde B)\big) = P_W\big(A\cap B^u\big) = P_W\big(\{B(t_0)\in I\}\cap\{e^{u/2}B(t_1e^{-u})\in J\}\big). \tag{5.15}
\]
We therefore need to show that (5.15) converges to
\[
P_W\big(B(t_0)\in I\big)\,P_W\big(B(t_1)\in J\big), \tag{5.16}
\]
as $u\to\infty$. Let $X := B(t_0)$, $Y := B(t_1)$ and $Y_u := e^{u/2}B(t_1e^{-u})$. Now
\[
\mathrm{Cov}(X,Y_u) = e^{u/2}\,\mathrm{Cov}\big(B(t_0),B(t_1e^{-u})\big) = e^{u/2}t_1e^{-u} = e^{-u/2}t_1,
\]
for $u$ sufficiently large, and hence tends to $0$ as $u\to\infty$. Moreover,
\[
Y_u \overset{d}{=} Y \sim N(0,t_1), \quad\text{for all } u.
\]
By the definition of bivariate normal distributions it now follows that the density of $(X,Y_u)$ converges to the density of $(X,Y)$, proving convergence of (5.15) to (5.16) for arbitrary intervals $I$ and $J$.
Corollary 5.20. Let $B(t)$, $t\ge0$, be a standard Brownian motion. Then, for all $f\in L^1(N)$,
\[
\frac{1}{\log T}\int_1^T \frac1u f\Big(\frac{B(u)}{u^{1/2}}\Big)\,du \xrightarrow{a.s.} \int f(x)\phi(x)\,dx, \quad\text{as } T\to\infty.
\]
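To close, the corollary can be checked with a crude Euler discretization of one Brownian path. The sketch below (Python with NumPy; the function name, the grid parameters and the test function are our own choices) approximates the left hand side for $f(x) = I\{x\le0\}$, for which the limit is $\Phi(0) = 1/2$.

import numpy as np

def brownian_log_average(f, T=10**5, dt=0.1, rng=None):
    """Euler-grid approximation of (1/log T) int_1^T f(B(u)/sqrt(u)) du/u
    along one simulated Brownian path."""
    rng = np.random.default_rng() if rng is None else rng
    n = int(T / dt)
    u = dt * np.arange(1, n + 1)
    B = np.cumsum(rng.standard_normal(n)) * np.sqrt(dt)
    mask = u >= 1.0                        # the integral runs over [1, T]
    vals = f(B[mask] / np.sqrt(u[mask]))
    return np.sum(vals * dt / u[mask]) / np.log(T)

if __name__ == "__main__":
    # f(x) = I{x <= 0}: the limit in Corollary 5.20 is Phi(0) = 1/2.
    print(brownian_log_average(lambda z: (z <= 0).astype(float)))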
References
[1] M. Atlagh. Théorème central limite presque sûr et loi du logarithme
itéré pour des sommes de variables aléatoires indépendantes. C. R.
Acad. Sci. Paris Sér. I, 316:929–933, 1993.
[2] M. Atlagh and M. Weber. Un théorème central limite presque sûr relatif à des sous-suites. C. R. Acad. Sci. Paris Sér. I, 315:203–206, 1992.
[3] I. Berkes. Results and problems related to the pointwise central limit
theorem. In B. Szyszkowicz, editor, Asymptotic results in Probability
and Statistics, pages 59–96. Elsevier, Amsterdam, 1998.
[4] I. Berkes and E. Csáki. A universal result in almost sure central limit theory. Stochastic Process. Appl., 94(1):105–134, 2001.
[5] P. Billingsley. Ergodic Theory and Information. Tracts on Prob. and
Statistics. John Wiley and Sons, 1965.
[6] P. Billingsley. Convergence of Probability Measures. John Wiley and
Sons, second edition, 1999.
[7] N.H. Bingham, C.M. Goldie, and J.L. Teugels. Regular variation. Cambridge University Press, 1987.
[8] G. Brosamler. An almost everywhere central limit theorem. Math.
Proc. Cambridge Phil. Soc., 104:561–574, 1988.
[9] S. Cheng, L. Peng, and Y. Qi. Almost sure convergence in extreme
value theory. Math. Nachr., 190:43–50, 1998.
[10] A. de Acosta and E. Giné. Convergence of moments and related functionals in the general central limit theorem in Banach spaces. Z. Wahrsch. verw. Gebiete, 48:213–231, 1979.
[11] N. Etemadi. Stability of sums of weighted nonnegative random variables. J. Multivariate Anal., 13:361–365, 1983.
[12] I. Fahrner and U. Stadtmüller. On almost sure max-limit theorems. Statist. and Probab. Lett., 37:229–236, 1998.
[13] W. Feller. An Introduction to Probability Theory and Its Applications,
Vol 1. John Wiley and Sons, second edition, 1968.
[14] W. Feller. An Introduction to Probability Theory and Its Applications,
Vol 2. John Wiley and Sons, second edition, 1971.
[15] B.V. Gnedenko and A.N. Kolmogorov. Limit Distributions for Sums of
Independent Random Variables. Addison-Wesley, revised edition, 1968.
[16] K. Gonchigdanzan and G.A. Rempala. A note on the almost sure limit
theorem for the product of partial sums. Appl. Math. Lett., 19:191–196,
2006.
[17] A. Gut. Probability: A Graduate Course. Springer, 2005.
[18] G.H. Hardy. Divergent series. Oxford University Press, 1949.
[19] T-C. Hu, A. Rosalsky, and A.I. Volodin. On the golden ratio, strong
law, and first passage problem. Mathematical Scientist, pages 1–10,
2005.
[20] I.A. Ibragimov and M.A. Lifshits. On almost sure limit theorems. Theory Probab. Appl., 44(2):254–272, 1998.
[21] I.A. Ibragimov and M.A. Lifshits. On the convergence of generalized
moments in almost sure central limit theorem. Statist. and Probab.
Lett., 40:343–351, 1998.
[22] K.L. Chung and P. Erdős. Probability limit theorems assuming only the first moment I. Memoirs of the AMS, 6, 1951.
[23] U. Krengel. Ergodic Theorems. Walter de Gruyter, 1985.
[24] E. Kreyszig. Introductory Functional Analysis with Applications. John
Wiley and Sons, 1978.
[25] M. Lacey and W. Philipp. A note on the almost everywhere central
limit theorem. Statist. and Probab. Lett., 9:201–205, 1990.
[26] M. Peligrad. Recent advances in the central limit theorem and its weak
invariance principle for mixing sequences of random variables (a survey). In E. Eberlein and M.S. Taqqu, editors, Dependence in Probability
and Statistics. A Survey of Recent Results., pages 193–225. Birkhäuser,
1985.
[27] M. Peligrad and P. Révész. On the almost sure central limit theorem.
In A. Bellow and R. Jones, editors, Almost everywhere convergence II,
pages 209–225. Academic Press, New York, 1991.
[28] M. Peligrad and Q. Shao. A note on the almost sure central limit
theorem for weakly dependent random variables. Statist. and Probab.
Lett., 22(2):131–136, 1995.
[29] G. Rempala and J. Wesolowski. Asymptotics for products of sums and
u-statistics. Electron. Comm. Probab., 7:47–54, 2002.
[30] S. I. Resnick. Point processes, regular variation and weak convergence.
Adv. in Appl. Probab., 18:66–138, 1986.
[31] P. Révész. Random walk in random and non-random environments.
World Scientific Publishing Co., 1990.
[32] G. Samorodnitsky and M. S. Taqqu. Stable Non-Gaussian Random
Processes. Chapman and Hall, New York, 1994.
[33] P. Schatte. On strong versions of the central limit theorem. Math.
Nachr., 137:249–256, 1988.