KMT Theorem for the Simple Random Walk

Faculty of Sciences
Department of Mathematics

KMT Theorem for the Simple Random Walk

Thesis submitted with a view to obtaining the degree of Master in Mathematics

Lies Leemans

Supervisor: Prof. Dr. U. Einmahl

Academic year 2014-2015
Abstract
The aim of this thesis is to give a proof of the KMT theorem for the simple random
walk. This theorem provides a coupling of a simple random walk $(S_k)_{k\ge 0}$ with a
standard Brownian motion $(B_t)_{t\ge 0}$. More specifically, the result implies that, under
certain assumptions, the optimal growth rate of
$$\max_{1\le k\le n}|S_k - B_k|$$
is equal to $O(\log n)$. This result is also valid for some other processes apart from
the simple random walk. However, the proof of the general KMT theorem is quite
technical. Therefore, we present a new technique, introduced by Chatterjee, to prove
the theorem for the simple random walk. We use the concept of Stein coefficients.
Apart from the proof, this thesis also contains an application of the KMT theorem.
Using well-known results for the Brownian motion, the coupling can be used to obtain
similar results for other processes such as the simple random walk.
Samenvatting
The aim of this thesis is to give a proof of the KMT theorem for the simple random
walk. This theorem couples a simple random walk $(S_k)_{k\ge 0}$ to a standard Brownian
motion $(B_t)_{t\ge 0}$. The result implies that, under certain assumptions, the optimal order of
$$\max_{1\le k\le n}|S_k - B_k|$$
is equal to $O(\log n)$. Besides the simple random walk, this result also holds for certain
other processes. The proof of the general KMT theorem is, however, rather technical.
We therefore present a new technique, introduced by Chatterjee, to prove the theorem
for the simple random walk, making use of Stein coefficients.
Apart from the proof, this thesis also contains an application of the KMT theorem.
Using known results for Brownian motion, the coupling allows us to obtain similar
results for certain other processes, such as the simple random walk.
Dankwoord
"Education is not the learning of facts, but the training of the mind to think."
This statement by Albert Einstein perfectly describes the mathematics programme. When
I started this degree five years ago, I did not really know what to expect. The only
certainty I clung to was my passion for the subject. Bit by bit you discover the world
of mathematics, but you soon realise that this world is far too large to explore completely.
I see this as an advantage, however: it means that we can always keep learning and keep
marvelling at the beauty of mathematics.
Besides a great deal of theoretical baggage, the five-year programme at the VUB has
above all taught me a particular way of reasoning. For this I would like to thank all the
professors whose courses I was able to take. All that knowledge and technique ultimately
led to this work. In this respect I would especially like to thank Professor Einmahl.
Despite his very busy schedule, he supported me with advice and assistance. Even at the
most impossible moments, he was always ready to help me with my countless questions.
Professor Einmahl, I look up to the dedication with which you take your work to heart
and support the department.
Finally, there are two people without whom none of this would have been possible:
my parents. For five years they supported me wholeheartedly throughout this demanding
but rewarding programme. Mum and Dad, thank you!
Contents
Introduction

1 Stein coefficients
  1.1 Strong coupling based on Stein coefficients
  1.2 Examples of Stein coefficients

2 A normal approximation theorem
  2.1 Verification of the normal approximation theorem
  2.2 Conditional version of the normal approximation theorem

3 The KMT theorem for the SRW
  3.1 The induction step
  3.2 Completing the proof of the main theorem

4 An application of the KMT theorem
  4.1 Theorem of Erdős-Rényi
  4.2 Limit theorem concerning moving averages

A Some elaborations
  A.1 Basic results related to probability theory
  A.2 Topological results required for Theorem 1.1.1
    A.2.1 $(V, \mathcal{T})$ is a locally convex topological vector space
    A.2.2 The vague topology
  A.3 Some straightforward calculations

List of symbols

Bibliography

Index
Introduction
Let $\varepsilon_1, \varepsilon_2, \ldots$ be a sequence of i.i.d. random variables with $E[\varepsilon_1] = 0$ and $E[\varepsilon_1^2] = 1$.
Let $S_0 = 0$ and $S_k = \sum_{i=1}^{k}\varepsilon_i$, for each $k \ge 1$. Assume our aim is to construct a
version of $(S_k)_{k\ge 0}$ and a standard Brownian motion $(B_t)_{t\ge 0}$ on the same probability
space, such that the growth rate of
$$\max_{1\le k\le n}|S_k - B_k|$$
is minimal. This is what is called coupling a random walk with a Brownian motion.
We usually refer to this as an 'embedding problem'.
Many mathematicians have studied the embedding problem. The first
research was done by Strassen [24]. He introduced an almost sure invariance principle.
More specifically, his purpose was to find an optimal increasing function $g$ such that
$$\frac{|S_n - B_n|}{g(n)} \to 0 \quad \text{a.s.},$$
as $n \to \infty$. In 1964 he proved that
$$\frac{|S_n - B_n|}{\sqrt{n\log\log n}} \to 0 \quad \text{a.s.},$$
as $n \to \infty$. Later on it turned out that no better convergence rate is possible in
general for random variables with finite second moments. Assuming higher moments,
better convergence rates can be achieved. In 1965 Strassen [25] showed that it is
possible to have
$$\max_{1\le k\le n}|S_k - B_k| = O\big(n^{1/4}(\log n)^{1/2}(\log\log n)^{1/4}\big) \quad \text{a.s.}$$
under the assumption that $E[\varepsilon_1^4] < \infty$. To prove these results, Strassen used the
Skorokhod embedding. Kiefer [18] then showed that no better approximation can be
achieved if one uses the Skorokhod embedding. So new construction methods had
to be developed. Eventually it was established by Komlós et al. [19], [20] that it is
possible to have
$$\max_{1\le k\le n}|S_k - B_k| = O(\log n)$$
under the assumption that $\varepsilon_1$ has a finite moment generating function in a neighbourhood of zero. Besides, they also proved that, under this assumption, $O(\log n)$
is the optimal growth rate. They obtained this result from the following coupling
inequality.
Theorem 1 (Komlós-Major-Tusnády). Let $\varepsilon_1, \varepsilon_2, \ldots$ be a sequence of i.i.d. random
variables with $E[\varepsilon_1] = 0$, $E[\varepsilon_1^2] = 1$ and $E[\exp(\theta|\varepsilon_1|)] < \infty$ for some $\theta > 0$. Let
$S_0 = 0$ and let $S_k = \sum_{i=1}^{k}\varepsilon_i$, for each $k \ge 1$. Then for any $n$, it is possible to
construct a version of $(S_k)_{0\le k\le n}$ and a standard Brownian motion $(B_t)_{0\le t\le n}$ on the
same probability space such that for all $x \ge 0$:
$$P\Big(\max_{k\le n}|S_k - B_k| \ge C\log n + x\Big) \le Ke^{-\lambda x},$$
where $C$, $K$ and $\lambda$ do not depend on $n$.
This result of Komlós et al. was very surprising, since it was already known that
$o(\log n)$ could not be achieved unless the variables $\varepsilon_i$ are standard normal. Indeed, if
$\varepsilon_1, \varepsilon_2, \ldots$ are i.i.d. random variables with mean zero and variance one, and if $(B_t)_{0\le t<\infty}$
is a Brownian motion such that
$$S_n - B_n = o(\log n) \quad \text{a.s.},$$
then the random variables $\varepsilon_i$ have a standard normal distribution. This result is
shown in Section 4.1. For a more detailed account of the history of embeddings, we
refer to the book of Csörgő and Révész [9].
Unfortunately, the proof of the KMT theorem is technically very difficult and it
is hard to generalize. Recently, Chatterjee [6] came up with a new proof of the KMT
theorem for the simple random walk (SRW). This proof will be the subject of this
work.
Theorem 2. Let $\varepsilon_1, \varepsilon_2, \ldots$ be a sequence of i.i.d. symmetric $\pm 1$-valued random
variables. Let $S_0 = 0$ and let $S_k = \sum_{i=1}^{k}\varepsilon_i$, for each $k \ge 1$. It is possible to construct a
version of the sequence $(S_k)_{k\ge 0}$ and a standard Brownian motion $(B_t)_{t\ge 0}$ on the same
probability space such that for all $n$ and all $x \ge 0$:
$$P\Big(\max_{k\le n}|S_k - B_k| \ge C\log n + x\Big) \le Ke^{-\lambda x},$$
where $C$, $K$ and $\lambda$ do not depend on $n$.
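To get a feel for what this bound asserts, the short Python sketch below (our own illustration, not the KMT construction) couples the $\pm 1$ steps to Gaussian increments in the naive way $\varepsilon_i = \mathrm{sign}(Z_i)$ and records $\max_{k\le n}|S_k - B_k|$; under this crude coupling the gap grows on the order of $\sqrt{n}$, whereas the theorem guarantees that an optimal coupling keeps it of order $\log n$. The function name and the use of NumPy are our own choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def naive_coupling_max_gap(n):
    """Couple a simple random walk to a discrete-time Brownian motion by the
    naive rule eps_i = sign(Z_i) and return max_{k<=n} |S_k - B_k|.
    This is NOT the KMT coupling; it only illustrates how much room the
    O(log n) bound of the theorem leaves for improvement."""
    Z = rng.standard_normal(n)           # Brownian increments B_k - B_{k-1}
    eps = np.where(Z >= 0, 1.0, -1.0)    # symmetric +-1 steps driven by the same randomness
    S = np.cumsum(eps)                   # simple random walk S_k
    B = np.cumsum(Z)                     # Brownian motion sampled at k = 1, ..., n
    return np.max(np.abs(S - B))

for n in (10**3, 10**4, 10**5, 10**6):
    print(n, naive_coupling_max_gap(n), np.log(n))
```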
The importance of the KMT theorem for the SRW lies in the fact that the SRW
is used in many domains of mathematics and science. For a specific example we refer
the interested reader to [11]. This work of Dembo et al. concerns the cover time for
the SRW on the discrete two-dimensional torus $\mathbb{Z}_n^2 = \mathbb{Z}^2/n\mathbb{Z}^2$. Results about this
cover time are obtained by first showing a result for the cover time of Brownian
motion on the two-dimensional torus $\mathbb{T}^2$ and then applying the KMT theorem.
The method we will use in order to prove this theorem is different from the one
used to prove the general KMT theorem. The arguments and proofs in this master thesis
are based on the paper [6] of Chatterjee. That paper provides an outline of the proof;
our aim will be to check all the details and to rigorously work out the whole proof.
We start with a method that yields a coupling of an arbitrary random variable W
with a normally distributed random variable Z. The coupling will be such that the
tails of W − Z are exponentially decaying. This is what is called a strong coupling.
This coupling method will be based on the concept of Stein coefficients. The goal of
Chapter 1 is to give a proof of this ’strong coupling’ theorem.
Recall the notation of our main theorem, Theorem 2. Once we have established
the 'strong coupling' theorem, we can make use of Stein coefficients in order to couple
$S_n$ with a random variable $Z_n \sim N(0, n)$. More specifically, we will construct a version
of $S_n$ and $Z_n$ on the same probability space such that for all $n$:
$$E\exp(\theta_0|S_n - Z_n|) \le \kappa,$$
where $\theta_0 > 0$ and $\kappa$ are universal constants. Chapter 2 will be dedicated to this normal
approximation theorem. Apart from this theorem, Chapter 2 also provides another
result. Basically this result is a conditional version of the normal approximation
theorem. We fix the value of Sn , for n ≥ 3, and produce a coupling of Sk , for
n/3 ≤ k ≤ 2n/3, with a certain Gaussian random variable. This result will turn out
to be useful in Chapter 3, where we will complete the proof of Theorem 2.
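Note that an exponential-moment bound of this form immediately yields exponential tail bounds for the coupling error, simply by Markov's inequality; this elementary observation, spelled out below for convenience, is how such couplings are typically used:
$$P(|S_n - Z_n| \ge t) = P\big(e^{\theta_0|S_n - Z_n|} \ge e^{\theta_0 t}\big) \le e^{-\theta_0 t}\,E\,e^{\theta_0|S_n - Z_n|} \le \kappa\,e^{-\theta_0 t}, \qquad t \ge 0.$$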
In Chapter 3 we will first give an induction argument in order to obtain a 'finite $n$
version' of Theorem 2. This means we will produce a coupling of $(S_k)_{k\le n}$ with random
variables $(Z_k)_{k\le n}$ having a joint Gaussian distribution, mean zero and $\mathrm{Cov}(Z_i, Z_j) =
i \wedge j$. Lemma 3.2.1 is dedicated to this result, which already closely resembles Theorem
2. However, to finish the proof of the KMT theorem for the SRW, we need one
coupling for the whole sequence. Theorem 3.2.2 provides the completing arguments.
Whereas the first three chapters are dedicated to the proof of the KMT theorem
for the SRW, Chapter 4 provides an application of the general KMT theorem. The
results of the latter chapter are based on the book [9]. We will consider a sequence of
i.i.d. random variables with a finite moment generating function. The goal of Chapter
4 is to give some results concerning the size of the increments of the partial sums. In
particular, we want to point out how we can use the KMT theorem in order to obtain
such results. Besides, as mentioned earlier, Chapter 4 also contains a proof of why
$o(\log n)$ cannot be achieved unless the random variables $\varepsilon_i$ are standard normal.
Chapter 1
Stein coefficients
In this chapter we will focus on a so-called ’strong coupling’ theorem. This theorem
produces a coupling of an arbitrary random variable W with a Gaussian random variable Z. One of the assumptions in the theorem is the existence of a Stein coefficient
for W .
Definition 1.0.1. Let $W$ and $T$ be random variables, defined on the same probability
space, such that whenever $\psi$ is a Lipschitz function and $\psi'$ is the a.e. derivative of $\psi$,
we have:
$$E[W\psi(W)] = E[\psi'(W)T].$$
Then we call $T$ a Stein coefficient for $W$.
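As a first illustration of the definition (a standard special case, recorded here for convenience): if $W \sim N(0, \sigma^2)$, then the constant random variable $T = \sigma^2$ is a Stein coefficient for $W$. Indeed, writing $\varphi_\sigma$ for the $N(0, \sigma^2)$ density, so that $\varphi_\sigma'(w) = -(w/\sigma^2)\varphi_\sigma(w)$, integration by parts gives for any Lipschitz $\psi$:
$$E[W\psi(W)] = \int_{-\infty}^{\infty} w\,\psi(w)\varphi_\sigma(w)\,dw = \sigma^2\int_{-\infty}^{\infty}\psi'(w)\varphi_\sigma(w)\,dw = \sigma^2E[\psi'(W)] = E[\psi'(W)T],$$
the boundary terms vanishing because $\psi$ has at most linear growth while $\varphi_\sigma$ decays rapidly; compare the characterization via Proposition A.1.3 mentioned below.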
Now we can give a formulation of the strong coupling theorem.
Theorem 1.0.2. Suppose $W$ is a random variable with $E[W] = 0$ and $E[W^2] < \infty$.
Let $T$ be a Stein coefficient for $W$ and assume $|T| \le K$ a.e., for some constant $K$.
Then, for any $\sigma^2 > 0$, we can construct $Z \sim N(0, \sigma^2)$ on the same probability space
such that for any $\theta \in \mathbb{R}$:
$$E\exp(\theta|W - Z|) \le 2E\exp\left(\frac{2\theta^2(T - \sigma^2)^2}{\sigma^2}\right).$$
The idea behind this theorem is that if $T \approx \sigma^2$ with high probability, then $W \approx Z$
where $Z \sim N(0, \sigma^2)$. This intuition can be motivated by the fact that a random
variable $Z$ follows the $N(0, \sigma^2)$ distribution if and only if $E[Z\psi(Z)] = \sigma^2E[\psi'(Z)]$ for
all continuously differentiable functions $\psi$ for which $E|\psi'(Z)| < \infty$. Proposition A.1.3
in the appendix is dedicated to this result.
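This characterization is easy to check numerically; the following minimal Python sketch (our own illustration, with the arbitrary test function $\psi(z) = \sin z$) estimates both sides of $E[Z\psi(Z)] = \sigma^2E[\psi'(Z)]$ by Monte Carlo for $Z \sim N(0, \sigma^2)$.

```python
import numpy as np

# Monte Carlo check of E[Z psi(Z)] = sigma^2 E[psi'(Z)] for Z ~ N(0, sigma^2),
# with the test function psi(z) = sin(z), so psi'(z) = cos(z).
rng = np.random.default_rng(1)
sigma2 = 4.0
Z = rng.normal(0.0, np.sqrt(sigma2), size=10**6)

lhs = np.mean(Z * np.sin(Z))        # estimate of E[Z psi(Z)]
rhs = sigma2 * np.mean(np.cos(Z))   # estimate of sigma^2 E[psi'(Z)]
print(lhs, rhs)                     # the two estimates agree up to Monte Carlo error
```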
We will start with a proof of the strong coupling theorem. In the second section
we will give a few examples, in order to gain some insight into the concept of Stein
coefficients.
1.1 Strong coupling based on Stein coefficients
In order to give a proof of Theorem 1.0.2, we will need several auxiliary results. We
start with the theorem below, which will turn out to be very useful in proving the
other results in this section. Recall that $\nabla f$ and $\mathrm{Hess}\,f$ denote the gradient and
Hessian of a function $f \in C^2(\mathbb{R}^n)$, and $\mathrm{Tr}$ stands for the trace of a matrix. Moreover,
we will use the notation $\|\cdot\|$ for both the Euclidean norm on $\mathbb{R}^n$ and the matrix norm
induced by the Euclidean norm, i.e. for $B \in \mathbb{R}^{n\times n}$,
$$\|B\| := \sup_{x\in\mathbb{R}^n\setminus\{0\}}\frac{\|Bx\|}{\|x\|}.$$
The context will make clear which of the two norms is meant.
Theorem 1.1.1. Let $n$ be a strictly positive integer and suppose $A$ is a continuous
map from $\mathbb{R}^n$ into the set of $n \times n$ symmetric positive semidefinite matrices. Suppose
there exists a constant $b \ge 0$ such that for all $x \in \mathbb{R}^n$:
$$\|A(x)\| \le b.$$
Then there exists a probability measure $\mu$ on $(\mathbb{R}^n, \mathcal{R}^n)$ such that for any random vector
$X$ with distribution $\mu$, we have that:
$$E\exp\langle\theta, X\rangle \le \exp(b\|\theta\|^2) \qquad (1.1)$$
for all $\theta \in \mathbb{R}^n$, and:
$$E\langle X, \nabla f(X)\rangle = E\,\mathrm{Tr}(A(X)\,\mathrm{Hess}f(X)) \qquad (1.2)$$
for all $f \in C^2(\mathbb{R}^n)$ such that $E|f(X)|^2$, $E\|\nabla f(X)\|^2$, and $E|\mathrm{Tr}(A(X)\,\mathrm{Hess}f(X))|$ are
finite.
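For orientation, consider the constant case $A(x) \equiv \sigma^2I_n$ (a special case added here only as an illustration). Then $\mu = N(0, \sigma^2I_n)$ satisfies both conclusions with $b = \sigma^2$: property (1.1) holds because
$$E\exp\langle\theta, X\rangle = \exp\big(\tfrac{1}{2}\sigma^2\|\theta\|^2\big) \le \exp(\sigma^2\|\theta\|^2),$$
and property (1.2) reduces to the Gaussian integration-by-parts identity $E\langle X, \nabla f(X)\rangle = \sigma^2E[\Delta f(X)]$ for sufficiently integrable $f$.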
Before giving a proof of this theorem, we introduce some notation. Let $V$ denote
the space of all finite signed measures on $(\mathbb{R}^n, \mathcal{R}^n)$. Moreover, let $\mathcal{T}$ denote the
topology generated by the separating family of semi-norms $|\mu|_f := |\int f\,d\mu|$, where $f$
ranges over all continuous functions with compact support. Furthermore, let $\tilde{V}$ be
the subspace of $V$ consisting of all probability measures on $(\mathbb{R}^n, \mathcal{R}^n)$, equipped with
the trace topology $\mathcal{T}_{\tilde{V}}$. For more details on the properties of the spaces $(V, \mathcal{T})$ and
$(\tilde{V}, \mathcal{T}_{\tilde{V}})$ we refer to Appendix A.2. For the purposes of the proof of Theorem 1.1.1,
we only need the following proposition.
Proposition 1.1.2. The following properties hold:
(i) $(V, \mathcal{T})$ is a locally convex topological vector space.
(ii) $(\tilde{V}, \mathcal{T}_{\tilde{V}})$ is a metric space, and a sequence $(\mu_n)_n$ in $(\tilde{V}, \mathcal{T}_{\tilde{V}})$ is convergent if and
only if it is weakly convergent.
We will now proceed with the proof of Theorem 1.1.1.
Proof. Let $K$ denote the set of all probability measures $\mu$ on $(\mathbb{R}^n, \mathcal{R}^n)$ satisfying
$$\int x\,\mu(dx) = 0 \quad \text{and} \quad \int \exp\langle\theta, x\rangle\,\mu(dx) \le \exp(b\|\theta\|^2) \text{ for all } \theta \in \mathbb{R}^n.$$
Note that the first assumption is a shorthand notation for:
$$\int \mathrm{pr}_i(x)\,\mu(dx) = 0, \quad \text{for all } 1 \le i \le n,$$
where $\mathrm{pr}_i$ denotes the $i$-th projection.
(STEP 1) It is our aim to show that $K$ is a nonempty, compact and convex subset
of the space $V$.
(1.a) We will start with showing that $K$ is nonempty. Let $Z$ be an $n$-dimensional
standard normal random vector, $Z = (Z_1, \ldots, Z_n)^T$ with $Z_1, \ldots, Z_n$ independent
standard normal random variables. Set $Y = \sqrt{2b}\,Z$ and let $\nu$ be the distribution of
$Y$. We will show that $\nu \in K$. Obviously $\nu$ is a probability measure on $(\mathbb{R}^n, \mathcal{R}^n)$ such
that:
$$\int x\,\nu(dx) = \int x\,P_Y(dx) = \int Y(\omega)\,P(d\omega) = E[Y] = \sqrt{2b}\,E[Z] = 0,$$
where we have used the formula of the image measure. Using the independence of
the random variables $Z_i$, we have for all $\theta \in \mathbb{R}^n$:
$$\int \exp\langle\theta, x\rangle\,\nu(dx) = E\exp\langle\theta, \sqrt{2b}\,Z\rangle = E\left[\prod_{i=1}^n \exp(\sqrt{2b}\,\theta_iZ_i)\right] = \prod_{i=1}^n E\exp(\sqrt{2b}\,\theta_iZ_i).$$
Using the well-known expression for the mgf of a standard Gaussian random variable,
we obtain:
$$\int \exp\langle\theta, x\rangle\,\nu(dx) = \prod_{i=1}^n \exp\left(\frac{2b\theta_i^2}{2}\right) = \exp\left(b\sum_{i=1}^n\theta_i^2\right) = \exp(b\|\theta\|^2).$$
Thus, we can conclude that ν ∈ K, and K is nonempty.
(1.b) Next, we will prove that $K$ is a compact subset of $V$. Since $K \subset \tilde{V}$, it
is obviously sufficient to prove that $K$ is compact in $(\tilde{V}, \mathcal{T}_{\tilde{V}})$. Using that $(\tilde{V}, \mathcal{T}_{\tilde{V}})$
is a metric space, this is equivalent to showing that $K$ is closed and sequentially
compact. We will start with proving that $K$ is sequentially compact. Let $(\mu_m)_{m\ge 1}$
be a sequence in $K$. Prohorov's Theorem A.1.19 states that $(\mu_m)_{m\ge 1}$ is sequentially
compact if and only if this sequence is tight. Furthermore, Theorem A.1.20 gives a
sufficient condition for tightness. Let $X_m : \Omega \to \mathbb{R}^n$ be random vectors with $\mu_m =
P_{X_m}$. Then it suffices to prove that there exists an $\alpha > 0$ such that $\sup_{m\ge 1}E[\|X_m\|^\alpha] <
\infty$. We can take $\alpha = 2$. Indeed, for all $m \ge 1$ we have that $E[\|X_m\|^2] \le 4n\exp b$;
a more general case of this inequality will be shown later in this proof (see step (4.b)).
Hence,
$$\sup_{m\ge 1}E[\|X_m\|^2] \le 4n\exp(b) < \infty.$$
Thus, we have proved that $(\mu_m)_{m\ge 1}$ is sequentially compact. As a consequence, $K$
is sequentially compact too. Now, we will prove that $K$ is closed. Let $(\mu_m)_{m\ge 1}$ be a
sequence in $K$ and assume $\mu_m \xrightarrow{w} \mu$. It suffices to prove that $\mu \in K$. Take random
vectors $X_m \sim \mu_m$ and $X \sim \mu$. Clearly $X_m \xrightarrow{d} X$, as $m \to \infty$. Let $\theta \in \mathbb{R}^n$ be arbitrary,
then Proposition A.1.24 implies that
$$E\exp\langle\theta, X\rangle \le \liminf_{m\to\infty}E\exp\langle\theta, X_m\rangle.$$
Since $P_{X_m} = \mu_m \in K$, we have that $E\exp\langle\theta, X_m\rangle \le \exp(b\|\theta\|^2)$ for all $m \ge 1$.
Therefore,
$$\int \exp\langle\theta, x\rangle\,\mu(dx) = E\exp\langle\theta, X\rangle \le \exp(b\|\theta\|^2).$$
On the other hand, we know that $X_m \xrightarrow{d} X$ and that projections are continuous.
Writing $X_{m,j}$ for the components of $X_m$ and $X_j$ for the components of $X$, the continuous mapping theorem implies that $X_{m,j} \xrightarrow{d} X_j$ as $m \to \infty$, for all $1 \le j \le n$.
Our aim is to prove that $E[X] = 0$. Since we know that $E[X_{m,j}] = 0$ for all $m$ and
$j$, it suffices to prove that $E[X_{m,j}] \to E[X_j]$ for all $1 \le j \le n$. Using Proposition
A.1.23, we see that it is sufficient to find a $\delta > 0$ such that $\sup_{m\ge 1}E[|X_{m,j}|^{1+\delta}] < \infty$.
Obviously $E[|X_{m,j}|^2] \le E[\|X_m\|^2]$. Thus, putting $\delta = 1$, we can conclude by the former
reasoning that:
$$\sup_{m\ge 1}E[|X_{m,j}|^{1+\delta}] \le 4n\exp b < \infty.$$
Thus, we have shown that $\mu \in K$. Therefore, $K$ is a compact set.
(1.c) Finally, we will show that $K$ is convex. Let $\mu_1, \mu_2 \in K$ and $\alpha \in [0, 1]$.
Using Proposition A.3.2 (i), we see that $\alpha\mu_1 + (1-\alpha)\mu_2$ is a measure on $(\mathbb{R}^n, \mathcal{R}^n)$.
Obviously, $\alpha\mu_1 + (1-\alpha)\mu_2$ is a probability measure, since $\alpha \in [0, 1]$ and since $\mu_1$ and
$\mu_2$ are probability measures themselves. Now, using Proposition A.3.2 (ii) and the
fact that $\mu_1, \mu_2 \in K$, we have:
$$\int x\,(\alpha\mu_1 + (1-\alpha)\mu_2)(dx) = \alpha\underbrace{\int x\,\mu_1(dx)}_{=0} + (1-\alpha)\underbrace{\int x\,\mu_2(dx)}_{=0} = 0.$$
Moreover, for any $\theta \in \mathbb{R}^n$:
$$\begin{aligned}
\int \exp\langle\theta, x\rangle\,(\alpha\mu_1 + (1-\alpha)\mu_2)(dx)
&= \alpha\int \exp\langle\theta, x\rangle\,\mu_1(dx) + (1-\alpha)\int \exp\langle\theta, x\rangle\,\mu_2(dx)\\
&\le \alpha\exp(b\|\theta\|^2) + (1-\alpha)\exp(b\|\theta\|^2)\\
&= \exp(b\|\theta\|^2).
\end{aligned}$$
Therefore, we have shown that $\alpha\mu_1 + (1-\alpha)\mu_2 \in K$. Thus, $K$ is convex.
(STEP 2) Fix $\varepsilon \in (0, 1)$ and define a map $T_\varepsilon : K \to V$ as follows. Given $\mu \in K$, let
$X$ be a random vector with distribution $\mu$ and let $Z$ be a random vector, independent
of $X$ and following the standard normal law on $\mathbb{R}^n$. Furthermore, let $T_\varepsilon\mu$ be the
distribution of the random vector
$$(1 - \varepsilon)X + \sqrt{2\varepsilon A(X)}\,Z,$$
where $\sqrt{A(X)}$ denotes the symmetric, positive semidefinite matrix satisfying
$$\sqrt{A(X)}\sqrt{A(X)} = A(X).$$
Since $T_\varepsilon\mu$ is a probability measure, it is obvious that $T_\varepsilon\mu \in V$.
(2.a) We will now show that $T_\varepsilon\mu \in K$. Let $\theta \in \mathbb{R}^n$, then the definition of $T_\varepsilon\mu$
clearly implies that:
$$\int_{\mathbb{R}^n}\exp\langle\theta, x\rangle\,(T_\varepsilon\mu)(dx) = E\exp\Big\langle\theta, (1-\varepsilon)X + \sqrt{2\varepsilon A(X)}\,Z\Big\rangle.$$
Since the integrand is positive and since $X$ and $Z$ are independent, we can use Corollary A.1.4, and hence:
$$\begin{aligned}
\int_{\mathbb{R}^n}\exp\langle\theta, x\rangle\,(T_\varepsilon\mu)(dx)
&= \int_{\mathbb{R}^n}E\big[\exp\big(\langle\theta, (1-\varepsilon)x\rangle + \langle\theta, \sqrt{2\varepsilon A(x)}\,Z\rangle\big)\big]\,P_X(dx)\\
&= \int_{\mathbb{R}^n}\exp\langle\theta, (1-\varepsilon)x\rangle\,E\big[\exp\langle\theta, \sqrt{2\varepsilon A(x)}\,Z\rangle\big]\,P_X(dx).
\end{aligned}$$
Next, we will compute $E\exp\langle\theta, \sqrt{2\varepsilon A(x)}\,Z\rangle$ for $x \in \mathbb{R}^n$ fixed. To ease the notation
we will write $B = \sqrt{A(x)}$. Then, obviously:
$$E\exp\Big\langle\theta, \sqrt{2\varepsilon A(x)}\,Z\Big\rangle = E\exp\left(\sum_{i=1}^n\theta_i\sqrt{2\varepsilon}\sum_{j=1}^nB_{ij}Z_j\right) = E\left[\prod_{j=1}^n\exp\left(\sqrt{2\varepsilon}\sum_{i=1}^n\theta_iB_{ij}Z_j\right)\right].$$
By definition of $Z$, we know that the random variables $Z_j$ are independent. Moreover,
since $\sqrt{2\varepsilon}\sum_{i=1}^n\theta_iB_{ij}$ is a constant, we can use the well-known expression for the
moment generating function of a standard normal random variable, and we get:
$$\begin{aligned}
E\exp\Big\langle\theta, \sqrt{2\varepsilon A(x)}\,Z\Big\rangle
&= \prod_{j=1}^nE\exp\left(\sqrt{2\varepsilon}\sum_{i=1}^n\theta_iB_{ij}Z_j\right)\\
&= \prod_{j=1}^n\exp\left(\varepsilon\Big(\sum_{i=1}^n\theta_iB_{ij}\Big)^2\right)\\
&= \exp\left(\varepsilon\sum_{j=1}^n\Big(\sum_{i=1}^n\theta_iB_{ij}\Big)\Big(\sum_{k=1}^n\theta_kB_{kj}\Big)\right)\\
&= \exp\left(\varepsilon\sum_{i=1}^n\sum_{k=1}^n\theta_i\theta_k\sum_{j=1}^nB_{ij}B_{kj}\right).
\end{aligned}$$
Using that the square root $B$ of the matrix $A(x)$ is a symmetric matrix, it is clear
that $A(x)_{ik} = (BB^T)_{ik} = \sum_{j=1}^nB_{ij}B_{kj}$. Thus, we have shown that:
$$E\exp\Big\langle\theta, \sqrt{2\varepsilon A(x)}\,Z\Big\rangle = \exp\left(\varepsilon\sum_{i=1}^n\sum_{k=1}^n\theta_iA(x)_{ik}\theta_k\right) = \exp\big(\varepsilon\langle\theta, A(x)\theta\rangle\big).$$
Therefore, we can conclude that:
$$\int_{\mathbb{R}^n}\exp\langle\theta, x\rangle\,(T_\varepsilon\mu)(dx) = \int_{\mathbb{R}^n}\exp\langle\theta, (1-\varepsilon)x\rangle\exp\big(\varepsilon\langle\theta, A(x)\theta\rangle\big)\,P_X(dx) = E\exp\big(\langle\theta, (1-\varepsilon)X\rangle + \varepsilon\langle\theta, A(X)\theta\rangle\big).$$
We will prove that this last expression can be bounded by
$$\exp\big(\varepsilon b\|\theta\|^2\big)\,E\exp\langle\theta, (1-\varepsilon)X\rangle.$$
Obviously, it suffices to show that $\langle\theta, A(X(\omega))\theta\rangle \le b\|\theta\|^2$ for all $\omega \in \Omega$. By definition
of the induced matrix norm it is clear that $\|Bx\| \le \|B\|\|x\|$, for any real $n \times n$ matrix
$B$ and any $x \in \mathbb{R}^n$. Using this together with the Cauchy-Schwarz inequality and the
boundedness of $A$, we obtain:
$$\langle\theta, A(X(\omega))\theta\rangle \le \|\theta\|\|A(X(\omega))\theta\| \le \|\theta\|^2\|A(X(\omega))\| \le b\|\theta\|^2.$$
Recall that $P_X = \mu \in K$, so that by definition of $K$:
$$E\exp\langle\theta, (1-\varepsilon)X\rangle = E\exp\langle(1-\varepsilon)\theta, X\rangle = \int_{\mathbb{R}^n}\exp\langle(1-\varepsilon)\theta, x\rangle\,\mu(dx) \le \exp\big(b(1-\varepsilon)^2\|\theta\|^2\big).$$
Thus, we have shown that:
$$\int_{\mathbb{R}^n}\exp\langle\theta, x\rangle\,(T_\varepsilon\mu)(dx) \le \exp\big(\varepsilon b\|\theta\|^2\big)\exp\big(b(1-\varepsilon)^2\|\theta\|^2\big) = \exp\big(b(1-\varepsilon+\varepsilon^2)\|\theta\|^2\big) \le \exp(b\|\theta\|^2),$$
where the last inequality follows from the fact that $1 - \varepsilon + \varepsilon^2 \le 1$, since $\varepsilon \in (0, 1)$. If
we can show that $\int x\,(T_\varepsilon\mu)(dx) = 0$, then we can conclude that $T_\varepsilon : K \to K$. Since
$P_X = \mu \in K$, it is obvious that $E[X] = 0$. Hence,
$$\int_{\mathbb{R}^n}x\,(T_\varepsilon\mu)(dx) = E\big[(1-\varepsilon)X + \sqrt{2\varepsilon A(X)}\,Z\big] = \sqrt{2\varepsilon}\,E\big[\sqrt{A(X)}\,Z\big].$$
Since the random vectors $X$ and $Z$ are independent, we can use Corollary A.1.4 in
order to conclude that:
$$E\big[\sqrt{A(X)}\,Z\big] = \int_{\mathbb{R}^n}E\big[\sqrt{A(x)}\,Z\big]\,P_X(dx) = \int_{\mathbb{R}^n}\sqrt{A(x)}\,\underbrace{E[Z]}_{=0}\,P_X(dx) = 0.$$
This means we have proved that $T_\varepsilon$ maps $K$ into $K$.
(2.b) We will now show that $T_\varepsilon$ is continuous under the topology $\mathcal{T}$. Assume
$\mu_n \xrightarrow{w} \mu$, as $n \to \infty$. It suffices to prove that $T_\varepsilon\mu_n \xrightarrow{w} T_\varepsilon\mu$. We take random vectors
$X \sim \mu$ and $X_n \sim \mu_n$ for $n \ge 1$. Moreover, we define a map $G_\varepsilon$ as follows:
$$G_\varepsilon : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n : (x, z) \mapsto (1-\varepsilon)x + \sqrt{2\varepsilon A(x)}\,z.$$
Clearly, the function $G_\varepsilon$ is continuous. Indeed, this follows from the continuity of $A$
together with the continuity of the transformation $B \mapsto \sqrt{B}$. For a proof of this last
claim, we refer to Eq. (X.2) on p. 290 of [3]. Using properties of the product measure
and the image measure, it is easy to see that:
$$T_\varepsilon\mu = (P_X \otimes P_Z)_{G_\varepsilon} \quad \text{and} \quad T_\varepsilon\mu_n = (P_{X_n} \otimes P_Z)_{G_\varepsilon}.$$
We know that $P_{X_n} \xrightarrow{w} P_X$. Using characteristic functions, it follows that:
$$P_{X_n} \otimes P_Z \xrightarrow{w} P_X \otimes P_Z.$$
Since $G_\varepsilon$ is continuous, the continuous mapping theorem implies that:
$$(P_{X_n} \otimes P_Z)_{G_\varepsilon} \xrightarrow{w} (P_X \otimes P_Z)_{G_\varepsilon}.$$
Thus we have shown that $T_\varepsilon\mu_n \xrightarrow{w} T_\varepsilon\mu$ and we can conclude that $T_\varepsilon : K \to K$ is a
continuous mapping.
(STEP 3) The Schauder-Tychonoff fixed point theorem A.2.1 for locally convex
topological vector spaces implies that $T_\varepsilon$ must have a fixed point in $K$. For each
$\varepsilon \in (0, 1)$, let $\mu_\varepsilon$ be a fixed point of $T_\varepsilon$, and let $X_\varepsilon$ denote a random vector with
distribution $\mu_\varepsilon$.
(STEP 4) Take an arbitrary $f \in C^2(\mathbb{R}^n)$ with $\nabla f$ and $\mathrm{Hess}f$ bounded and uniformly continuous. Fix $\varepsilon \in (0, 1)$ and define $Y_\varepsilon := -\varepsilon X_\varepsilon + \sqrt{2\varepsilon A(X_\varepsilon)}\,Z$. By definition
of $T_\varepsilon$, we have that:
$$P_{X_\varepsilon} = \mu_\varepsilon = T_\varepsilon(\mu_\varepsilon) = P_{(1-\varepsilon)X_\varepsilon + \sqrt{2\varepsilon A(X_\varepsilon)}Z} = P_{X_\varepsilon + Y_\varepsilon}.$$
This clearly implies that $E[f(X_\varepsilon)] = E[f(X_\varepsilon + Y_\varepsilon)]$, or equivalently that:
$$E[f(X_\varepsilon + Y_\varepsilon) - f(X_\varepsilon)] = 0. \qquad (1.3)$$
Let
$$R_\varepsilon = f(X_\varepsilon + Y_\varepsilon) - f(X_\varepsilon) - \langle Y_\varepsilon, \nabla f(X_\varepsilon)\rangle - \frac{1}{2}\langle Y_\varepsilon, \mathrm{Hess}f(X_\varepsilon)Y_\varepsilon\rangle.$$
(4.a) First, note that:
$$E\langle Y_\varepsilon, \nabla f(X_\varepsilon)\rangle = -\varepsilon\,E\langle X_\varepsilon, \nabla f(X_\varepsilon)\rangle. \qquad (1.4)$$
This can be seen as follows. Simply using the definition of $Y_\varepsilon$ it is clear that:
$$E\langle Y_\varepsilon, \nabla f(X_\varepsilon)\rangle = -\varepsilon\,E\langle X_\varepsilon, \nabla f(X_\varepsilon)\rangle + E\big\langle\sqrt{2\varepsilon A(X_\varepsilon)}\,Z, \nabla f(X_\varepsilon)\big\rangle.$$
Thus it suffices to prove that $E\langle\sqrt{2\varepsilon A(X_\varepsilon)}\,Z, \nabla f(X_\varepsilon)\rangle = 0$. Since $Z$ and $X_\varepsilon$ are
independent and $E[Z] = 0$, one easily sees that $E\langle Z, g(X_\varepsilon)\rangle = 0$ for any Borel-measurable function $g : \mathbb{R}^n \to \mathbb{R}^n$. Consequently,
$$E\big\langle\sqrt{2\varepsilon A(X_\varepsilon)}\,Z, \nabla f(X_\varepsilon)\big\rangle = E\big\langle Z, \sqrt{2\varepsilon A(X_\varepsilon)}\,\nabla f(X_\varepsilon)\big\rangle = 0.$$
This finishes the argument, and (1.4) has been proved.
(4.b) Secondly, we will show that all moments of $\|X_\varepsilon\|$ are bounded by constants
that do not depend on $\varepsilon$. For $1 \le j \le n$, let $e_j := (0, \ldots, 0, 1, 0, \ldots, 0)^T \in \mathbb{R}^n$, where
the $j$-th component of $e_j$ is equal to 1 and all other components are zero. Since
$P_{X_\varepsilon} \in K$, the definition of $K$ clearly implies that for all $1 \le j \le n$:
$$\begin{aligned}
E[e^{|X_{\varepsilon,j}|}] &= E[e^{X_{\varepsilon,j}}I_{\{X_{\varepsilon,j}\ge 0\}}] + E[e^{-X_{\varepsilon,j}}I_{\{X_{\varepsilon,j}<0\}}]\\
&\le E[e^{X_{\varepsilon,j}}] + E[e^{-X_{\varepsilon,j}}]\\
&= E[\exp\langle e_j, X_\varepsilon\rangle] + E[\exp\langle -e_j, X_\varepsilon\rangle]\\
&\le \exp b + \exp b = 2\exp b.
\end{aligned}$$
From the Taylor expansion of the exponential function we can deduce that $\frac{|X_{\varepsilon,j}|^l}{l!} \le
\exp|X_{\varepsilon,j}|$, for all $l \ge 0$. Therefore, we have shown that $E[|X_{\varepsilon,j}|^l] \le l!\,2\exp b$. Let
$\|Y\|_k := E[|Y|^k]^{1/k}$ for an arbitrary random variable $Y$. It is well known that $\|\cdot\|_k$ is
a semi-norm. In particular, using the inequality of Minkowski, we have for all $k \ge 1$
that:
$$\begin{aligned}
E\|X_\varepsilon\|^{2k} = \big\|\|X_\varepsilon\|^2\big\|_k^k = \big\|X_{\varepsilon,1}^2 + \cdots + X_{\varepsilon,n}^2\big\|_k^k
&\le \big(\|X_{\varepsilon,1}^2\|_k + \cdots + \|X_{\varepsilon,n}^2\|_k\big)^k\\
&= \big(E[X_{\varepsilon,1}^{2k}]^{1/k} + \cdots + E[X_{\varepsilon,n}^{2k}]^{1/k}\big)^k\\
&\le \big(n\,((2k)!\,2\exp b)^{1/k}\big)^k\\
&= n^k(2k)!\,2\exp b.
\end{aligned}$$
Since this holds for all $k \ge 1$, we have shown that all even moments of $\|X_\varepsilon\|$ are
bounded by constants that do not depend on $\varepsilon$. The same holds for the odd moments,
since $\|Y\|_p \le \|Y\|_q$ if $p \le q$.
(4.c) Next, we will rewrite the expression $E\langle Y_\varepsilon, \mathrm{Hess}f(X_\varepsilon)Y_\varepsilon\rangle$. More specifically,
we will show that:
$$E\langle Y_\varepsilon, \mathrm{Hess}f(X_\varepsilon)Y_\varepsilon\rangle = 2\varepsilon\,E\,\mathrm{Tr}(A(X_\varepsilon)\mathrm{Hess}f(X_\varepsilon)) + O(\varepsilon^2). \qquad (1.5)$$
Simply using the definition of $Y_\varepsilon$, one sees that:
$$\begin{aligned}
E\langle Y_\varepsilon, \mathrm{Hess}f(X_\varepsilon)Y_\varepsilon\rangle
&= E\big\langle -\varepsilon X_\varepsilon + \sqrt{2\varepsilon A(X_\varepsilon)}\,Z,\ \mathrm{Hess}f(X_\varepsilon)\big(-\varepsilon X_\varepsilon + \sqrt{2\varepsilon A(X_\varepsilon)}\,Z\big)\big\rangle\\
&= \varepsilon^2E\langle X_\varepsilon, \mathrm{Hess}f(X_\varepsilon)X_\varepsilon\rangle - \sqrt{8}\,\varepsilon^{3/2}E\big\langle\sqrt{A(X_\varepsilon)}\,Z, \mathrm{Hess}f(X_\varepsilon)X_\varepsilon\big\rangle\\
&\quad + 2\varepsilon E\big\langle\sqrt{A(X_\varepsilon)}\,Z, \mathrm{Hess}f(X_\varepsilon)\sqrt{A(X_\varepsilon)}\,Z\big\rangle.
\end{aligned}$$
We will work out these three terms separately. Since $\mathrm{Hess}f$ is bounded, there exists a
constant $C$ such that $\|\mathrm{Hess}f(x)\| \le C$ for all $x \in \mathbb{R}^n$. Recall that $\|Bx\| \le \|B\|\|x\|$
for any real $n \times n$ matrix $B$ and any vector $x \in \mathbb{R}^n$. Thereby, using the inequality of
Cauchy-Schwarz, the first term can be bounded as follows:
$$\varepsilon^2E\langle X_\varepsilon, \mathrm{Hess}f(X_\varepsilon)X_\varepsilon\rangle \le \varepsilon^2E\big[\|X_\varepsilon\|\|\mathrm{Hess}f(X_\varepsilon)X_\varepsilon\|\big] \le \varepsilon^2E\big[\|X_\varepsilon\|^2\|\mathrm{Hess}f(X_\varepsilon)\|\big] \le C\varepsilon^2E\big[\|X_\varepsilon\|^2\big] = O(\varepsilon^2).$$
For the second term, we use the independence of $X_\varepsilon$ and $Z$, and we get:
$$E\big\langle\sqrt{A(X_\varepsilon)}\,Z, \mathrm{Hess}f(X_\varepsilon)X_\varepsilon\big\rangle = E\big\langle Z, \sqrt{A(X_\varepsilon)}\,\mathrm{Hess}f(X_\varepsilon)X_\varepsilon\big\rangle = 0.$$
In order to prove (1.5), it thus remains to show that:
$$2\varepsilon E\Big\langle\sqrt{A(X_\varepsilon)}\,Z, \mathrm{Hess}f(X_\varepsilon)\sqrt{A(X_\varepsilon)}\,Z\Big\rangle = 2\varepsilon\,E\,\mathrm{Tr}(A(X_\varepsilon)\mathrm{Hess}f(X_\varepsilon)).$$
Using again Corollary A.1.4, we have that:
$$E\Big\langle\sqrt{A(X_\varepsilon)}\,Z, \mathrm{Hess}f(X_\varepsilon)\sqrt{A(X_\varepsilon)}\,Z\Big\rangle = \int_{\mathbb{R}^n}E\Big\langle\sqrt{A(x)}\,Z, \mathrm{Hess}f(x)\sqrt{A(x)}\,Z\Big\rangle\,\mu_\varepsilon(dx).$$
The integrand on the right-hand side can be rewritten as:
$$E\Big\langle\sqrt{A(x)}\,Z, \mathrm{Hess}f(x)\sqrt{A(x)}\,Z\Big\rangle = E\left[\sum_{i=1}^n\big(\sqrt{A(x)}\,Z\big)_i\big(\mathrm{Hess}f(x)\sqrt{A(x)}\,Z\big)_i\right] = \sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^n\sqrt{A(x)}_{ij}\big(\mathrm{Hess}f(x)\sqrt{A(x)}\big)_{ik}E[Z_jZ_k].$$
Using independence, it is obvious that $E[Z_jZ_k] = E[Z_j]E[Z_k] = 0$ if $j \ne k$. On the
other hand $E[Z_j^2] = 1$. Let $\delta_{jk}$ denote the Kronecker delta, then $E[Z_jZ_k] = \delta_{jk}$. Thus,
using that both of the matrices $\sqrt{A(x)}$ and $\mathrm{Hess}f(x)$ are symmetric, we have
that:
$$\begin{aligned}
E\Big\langle\sqrt{A(x)}\,Z, \mathrm{Hess}f(x)\sqrt{A(x)}\,Z\Big\rangle
&= \sum_{i=1}^n\sum_{j=1}^n\sqrt{A(x)}_{ij}\big(\mathrm{Hess}f(x)\sqrt{A(x)}\big)_{ij}\\
&= \sum_{i=1}^n\sum_{j=1}^n\sqrt{A(x)}_{ij}\big(\sqrt{A(x)}^T\mathrm{Hess}f(x)^T\big)_{ji}\\
&= \sum_{i=1}^n\big(\sqrt{A(x)}\sqrt{A(x)}\,\mathrm{Hess}f(x)\big)_{ii}\\
&= \mathrm{Tr}(A(x)\mathrm{Hess}f(x)).
\end{aligned}$$
Therefore, we can conclude that:
$$E\Big\langle\sqrt{A(X_\varepsilon)}\,Z, \mathrm{Hess}f(X_\varepsilon)\sqrt{A(X_\varepsilon)}\,Z\Big\rangle = \int_{\mathbb{R}^n}\mathrm{Tr}(A(x)\mathrm{Hess}f(x))\,\mu_\varepsilon(dx) = E\,\mathrm{Tr}(A(X_\varepsilon)\mathrm{Hess}f(X_\varepsilon)),$$
which finishes the proof of (1.5).
(4.d) We will now prove that $|R_\varepsilon| \le \|Y_\varepsilon\|^2\delta(\|Y_\varepsilon\|)$, where $\delta : [0, \infty) \to [0, \infty)$ is a
bounded function satisfying $\lim_{t\to 0}\delta(t) = 0$. Using the Taylor expansion for a function
of $n$ variables, we obtain:
$$f(X_\varepsilon + Y_\varepsilon) = f(X_\varepsilon) + \langle Y_\varepsilon, \nabla f(X_\varepsilon)\rangle + \frac{1}{2}\left(Y_{\varepsilon,1}\frac{\partial}{\partial x_1} + \cdots + Y_{\varepsilon,n}\frac{\partial}{\partial x_n}\right)^{(2)}f(X_\varepsilon + \theta Y_\varepsilon),$$
for a certain map $\theta : \Omega \to (0, 1)$. The square in the last term of this formula is a
shorthand notation and is defined as follows:
$$\left(Y_{\varepsilon,1}\frac{\partial}{\partial x_1} + \cdots + Y_{\varepsilon,n}\frac{\partial}{\partial x_n}\right)^{(2)}f(X_\varepsilon + \theta Y_\varepsilon) = \sum_{i=1}^n\sum_{j=1}^nY_{\varepsilon,i}Y_{\varepsilon,j}\frac{\partial^2f}{\partial x_i\partial x_j}(X_\varepsilon + \theta Y_\varepsilon) = \langle Y_\varepsilon, \mathrm{Hess}f(X_\varepsilon + \theta Y_\varepsilon)Y_\varepsilon\rangle.$$
Thus, for a specific map $\theta : \Omega \to (0, 1)$, it holds that:
$$f(X_\varepsilon + Y_\varepsilon) = f(X_\varepsilon) + \langle Y_\varepsilon, \nabla f(X_\varepsilon)\rangle + \frac{1}{2}\langle Y_\varepsilon, \mathrm{Hess}f(X_\varepsilon + \theta Y_\varepsilon)Y_\varepsilon\rangle.$$
Simply substituting this into the definition of $R_\varepsilon$ yields:
$$R_\varepsilon = \frac{1}{2}\langle Y_\varepsilon, (\mathrm{Hess}f(X_\varepsilon + \theta Y_\varepsilon) - \mathrm{Hess}f(X_\varepsilon))Y_\varepsilon\rangle.$$
Again, using the Cauchy-Schwarz inequality and the definition of the induced matrix
norm, we get:
$$|R_\varepsilon| = \frac{1}{2}\big|\langle Y_\varepsilon, (\mathrm{Hess}f(X_\varepsilon + \theta Y_\varepsilon) - \mathrm{Hess}f(X_\varepsilon))Y_\varepsilon\rangle\big| \le \frac{1}{2}\|Y_\varepsilon\|^2\|\mathrm{Hess}f(X_\varepsilon + \theta Y_\varepsilon) - \mathrm{Hess}f(X_\varepsilon)\|.$$
We will show how we can use the boundedness and uniform continuity of $\mathrm{Hess}f$ to
construct the $\delta$ mentioned earlier. First of all, we use the uniform continuity of $\mathrm{Hess}f$.
Let $\eta > 0$, then there exists a $\xi > 0$ such that:
$$\forall x, y \in \mathbb{R}^n : \|y\| < \xi \Rightarrow \frac{1}{2}\|\mathrm{Hess}f(x + y) - \mathrm{Hess}f(x)\| < \eta.$$
For simplicity we will write $B = \mathrm{Hess}f$. The map $\delta$ can be constructed as follows.
First let $\eta = 1$:
$$\exists\xi_1 > 0, \forall x, y \in \mathbb{R}^n : \|y\| < \xi_1 \Rightarrow \frac{1}{2}\|B(x + y) - B(x)\| < 1.$$
We define $\delta(t) := C$, for $t \ge \xi_1$. (Recall that $C$ was chosen such that $\|B(z)\| \le C$ for
all $z \in \mathbb{R}^n$.) Secondly, let $\eta = 1/2$, then:
$$\exists\xi_2 \in (0, \xi_1), \forall x, y \in \mathbb{R}^n : \|y\| < \xi_2 \Rightarrow \frac{1}{2}\|B(x + y) - B(x)\| < \frac{1}{2}.$$
We define $\delta(t) := 1$, for $\xi_2 \le t < \xi_1$. Next, let $\eta = 1/4$:
$$\exists\xi_3 \in (0, \xi_2), \forall x, y \in \mathbb{R}^n : \|y\| < \xi_3 \Rightarrow \frac{1}{2}\|B(x + y) - B(x)\| < \frac{1}{4}.$$
We define $\delta(t) := 1/2$, for $\xi_3 \le t < \xi_2$. It is easily seen that continuing this reasoning
leads to a bounded map $\delta : [0, \infty) \to [0, \infty)$ such that $\lim_{t\to 0}\delta(t) = 0$ and $|R_\varepsilon| \le
\|Y_\varepsilon\|^2\delta(\|Y_\varepsilon\|)$.
(4.e) Next, we will show that $\delta^2(\|Y_\varepsilon\|) \xrightarrow{P} 0$ as $\varepsilon \to 0$. To that end we will first
prove that $E[\varepsilon^{-2}\|Y_\varepsilon\|^4]$ can be bounded by a constant that does not depend on $\varepsilon$.
Using the simple inequality $(a + b)^4 \le 8(a^4 + b^4)$, we obtain:
$$\|Y_\varepsilon\|^4 = \big\|-\varepsilon X_\varepsilon + \sqrt{2\varepsilon A(X_\varepsilon)}\,Z\big\|^4 \le \big(\|\varepsilon X_\varepsilon\| + \|\sqrt{2\varepsilon A(X_\varepsilon)}\,Z\|\big)^4 \le 8\varepsilon^4\|X_\varepsilon\|^4 + 32\varepsilon^2\big\|\sqrt{A(X_\varepsilon)}\,Z\big\|^4.$$
Since $\varepsilon \in (0, 1)$, it is obvious that:
$$\varepsilon^{-2}\|Y_\varepsilon\|^4 \le 8\varepsilon^2\|X_\varepsilon\|^4 + 32\big\|\sqrt{A(X_\varepsilon)}\,Z\big\|^4 \le 8\|X_\varepsilon\|^4 + 32\big\|\sqrt{A(X_\varepsilon)}\,Z\big\|^4.$$
Thus, it suffices to prove that $E\|X_\varepsilon\|^4$ and $E\|\sqrt{A(X_\varepsilon)}\,Z\|^4$ can be bounded by constants that do not depend on $\varepsilon$. For $E\|X_\varepsilon\|^4$ we have already proved this before.
On the other hand, using Corollary A.1.4, a straightforward calculation shows that
$E\|\sqrt{A(X_\varepsilon)}\,Z\|^4 \le 3b^2n^{10}$, which is a constant that does not depend on $\varepsilon$. Therefore, we have shown that $E[\varepsilon^{-2}\|Y_\varepsilon\|^4]$ can be bounded by a constant $L$ independent
of $\varepsilon$. Subsequently, it can be easily seen that $\|Y_\varepsilon\| \xrightarrow{P} 0$ as $\varepsilon \to 0$. Indeed, since
$E[\|Y_\varepsilon\|^4] \le L\varepsilon^2$, it is clear that $\|Y_\varepsilon\| \to 0$ in $L^4$ as $\varepsilon \to 0$. Using Proposition A.1.22,
we can conclude that $\|Y_\varepsilon\| \xrightarrow{P} 0$ as $\varepsilon \to 0$. Now, let $\{\varepsilon_n \mid n \ge 1\}$ be a sequence in
$(0, 1)$ such that $\varepsilon_n \to 0$ as $n \to \infty$. Since $\|Y_\varepsilon\| \xrightarrow{P} 0$, Proposition A.1.21 implies that
there exists a subsequence $\{\varepsilon_{n_k} \mid k \ge 1\}$ such that $\|Y_{\varepsilon_{n_k}}\| \to 0$ almost surely. By
construction of $\delta$ it is clear that $\delta^2(\|Y_{\varepsilon_{n_k}}\|) \to 0$ almost surely. Again using the characterization A.1.21 of convergence in probability, we can conclude that $\delta^2(\|Y_\varepsilon\|) \xrightarrow{P} 0$
as $\varepsilon \to 0$.
(4.f) Combining some of the above-mentioned properties, we will now show that:
$$\lim_{\varepsilon\to 0}\varepsilon^{-1}E|R_\varepsilon| = 0. \qquad (1.6)$$
Using Hölder's inequality A.1.6 with $p = q = 2$, we see that:
$$E[\varepsilon^{-1}|R_\varepsilon|] \le E[\varepsilon^{-1}\|Y_\varepsilon\|^2\delta(\|Y_\varepsilon\|)] \le E[\varepsilon^{-2}\|Y_\varepsilon\|^4]^{1/2}E[\delta^2(\|Y_\varepsilon\|)]^{1/2}.$$
Recall that $E[\varepsilon^{-2}\|Y_\varepsilon\|^4]$ is bounded by a constant not depending on $\varepsilon$. Thus, in order
to prove (1.6) it suffices to prove that $E[\delta^2(\|Y_\varepsilon\|)] \to 0$ as $\varepsilon \to 0$. Since $\delta$ is a
bounded function, the family $\{\delta^2(\|Y_\varepsilon\|) \mid \varepsilon \in (0, 1)\}$ is uniformly integrable. Using
this, together with the fact that $\delta^2(\|Y_\varepsilon\|) \xrightarrow{P} 0$ and the characterization in A.1.22, we
obtain that $\delta^2(\|Y_\varepsilon\|) \to 0$ in $L^1$, or equivalently $E[\delta^2(\|Y_\varepsilon\|)] \to 0$. This concludes
the proof of (1.6).
(4.g) Recall that $K$ is a compact set. Since $\{\mu_\varepsilon\}_{0<\varepsilon<1}$ is a collection in $K$, this
collection has a cluster point $\mu \in K$. This means that there exists a sequence $(\varepsilon_j)_j$
in $(0, 1)$ such that $\varepsilon_j \to 0$ and $\mu_{\varepsilon_j} \xrightarrow{w} \mu$, as $j \to \infty$. Let $X$ be an arbitrary random
vector with law $\mu$, then $X_{\varepsilon_j} \xrightarrow{d} X$. Note that for any $\varepsilon \in (0, 1)$, we have by definition
of $R_\varepsilon$ and by equations (1.4) and (1.5) that:
$$\varepsilon^{-1}E[R_\varepsilon] = \varepsilon^{-1}E[f(X_\varepsilon + Y_\varepsilon) - f(X_\varepsilon)] + E\langle X_\varepsilon, \nabla f(X_\varepsilon)\rangle - \frac{1}{2}\cdot 2E\,\mathrm{Tr}(A(X_\varepsilon)\mathrm{Hess}f(X_\varepsilon)) + O(\varepsilon).$$
Using equation (1.3), this implies that:
$$\varepsilon^{-1}E[R_\varepsilon] = E\langle X_\varepsilon, \nabla f(X_\varepsilon)\rangle - E\,\mathrm{Tr}(A(X_\varepsilon)\mathrm{Hess}f(X_\varepsilon)) + O(\varepsilon).$$
Using the triangle inequality and equation (1.6), it is clear that:
$$\begin{aligned}
\big|E\langle X_\varepsilon, \nabla f(X_\varepsilon)\rangle - E\,\mathrm{Tr}(A(X_\varepsilon)\mathrm{Hess}f(X_\varepsilon))\big|
&\le \big|E\langle X_\varepsilon, \nabla f(X_\varepsilon)\rangle - E\,\mathrm{Tr}(A(X_\varepsilon)\mathrm{Hess}f(X_\varepsilon)) + O(\varepsilon)\big| + O(\varepsilon)\\
&= \varepsilon^{-1}|E[R_\varepsilon]| + O(\varepsilon)\\
&\le \varepsilon^{-1}E|R_\varepsilon| + O(\varepsilon) \to 0.
\end{aligned}$$
Let $a_j := |E\langle X, \nabla f(X)\rangle - E\langle X_{\varepsilon_j}, \nabla f(X_{\varepsilon_j})\rangle|$ and $b_j := |E\,\mathrm{Tr}(A(X_{\varepsilon_j})\mathrm{Hess}f(X_{\varepsilon_j})) -
E\,\mathrm{Tr}(A(X)\mathrm{Hess}f(X))|$, then by the triangle inequality:
$$\big|E\langle X, \nabla f(X)\rangle - E\,\mathrm{Tr}(A(X)\mathrm{Hess}f(X))\big| \le a_j + \big|E\langle X_{\varepsilon_j}, \nabla f(X_{\varepsilon_j})\rangle - E\,\mathrm{Tr}(A(X_{\varepsilon_j})\mathrm{Hess}f(X_{\varepsilon_j}))\big| + b_j.$$
The left-hand side of this inequality does not depend on $j$. Thus, in order to prove
that this left-hand side is zero, it suffices to prove that the right-hand side converges
to zero as $j \to \infty$. We have already shown that the middle term converges to zero, thus
it is sufficient to show that $a_j \to 0$ and $b_j \to 0$. Since $\nabla f$ is uniformly continuous,
it is obvious that the map $x \mapsto \langle x, \nabla f(x)\rangle$ is continuous. Using the continuous
mapping theorem and the fact that $X_{\varepsilon_j} \xrightarrow{d} X$, we can conclude that $\langle X_{\varepsilon_j}, \nabla f(X_{\varepsilon_j})\rangle \xrightarrow{d}
\langle X, \nabla f(X)\rangle$. In order to prove that $a_j \to 0$, it suffices to prove that:
$$\sup_{j\ge 1}E\big[|\langle X_{\varepsilon_j}, \nabla f(X_{\varepsilon_j})\rangle|^{1+\alpha}\big] < \infty,$$
for some $\alpha > 0$. This is because of Proposition A.1.23. Setting $\alpha = 1$, the Cauchy-Schwarz inequality and the inequality of Hölder A.1.6 imply that:
$$E\big[|\langle X_{\varepsilon_j}, \nabla f(X_{\varepsilon_j})\rangle|^{1+\alpha}\big] = E\big[|\langle X_{\varepsilon_j}, \nabla f(X_{\varepsilon_j})\rangle|^2\big] \le E\big[\|X_{\varepsilon_j}\|^2\|\nabla f(X_{\varepsilon_j})\|^2\big] \le E\big[\|X_{\varepsilon_j}\|^4\big]^{1/2}E\big[\|\nabla f(X_{\varepsilon_j})\|^4\big]^{1/2}.$$
Since $\nabla f$ is bounded and since the moments of $\|X_\varepsilon\|$ are bounded by constants not
depending on $\varepsilon$, there exists a constant $\tilde{C} > 0$ such that for all $j \ge 1$:
$$E\big[|\langle X_{\varepsilon_j}, \nabla f(X_{\varepsilon_j})\rangle|^{1+\alpha}\big] \le \tilde{C}.$$
This finishes the argument for $a_j \to 0$. To see that $b_j \to 0$, we proceed as follows.
First note that for any $n \times n$ matrices $B$ and $D$, we have:
$$|\mathrm{Tr}(B) - \mathrm{Tr}(D)| = |\mathrm{Tr}(B - D)| \le \sum_{i=1}^n|b_{ii} - d_{ii}| \le n\|B - D\|,$$
where we used Proposition A.3.3 for the last inequality. This shows that $\mathrm{Tr}$ is a
(uniformly) continuous function. Moreover, $A$ and $\mathrm{Hess}f$ are also continuous and
$X_{\varepsilon_j} \xrightarrow{d} X$. Therefore, we can use the continuous mapping theorem, and:
$$\mathrm{Tr}(A(X_{\varepsilon_j})\mathrm{Hess}f(X_{\varepsilon_j})) \xrightarrow{d} \mathrm{Tr}(A(X)\mathrm{Hess}f(X)).$$
Thus, again by Proposition A.1.23, it suffices to prove that:
$$\sup_{j\ge 1}E\big[|\mathrm{Tr}(A(X_{\varepsilon_j})\mathrm{Hess}f(X_{\varepsilon_j}))|^2\big] < \infty.$$
Note that for any two $n \times n$ matrices $B$ and $D$, we have that $\|BD\| \le \|B\|\|D\|$.
Using again Proposition A.3.3, it is clear that:
$$|\mathrm{Tr}(A(X_{\varepsilon_j})\mathrm{Hess}f(X_{\varepsilon_j}))| \le n\|A(X_{\varepsilon_j})\mathrm{Hess}f(X_{\varepsilon_j})\| \le n\|A(X_{\varepsilon_j})\|\|\mathrm{Hess}f(X_{\varepsilon_j})\| \le nbC.$$
This implies that $\sup_{j\ge 1}E\big[|\mathrm{Tr}(A(X_{\varepsilon_j})\mathrm{Hess}f(X_{\varepsilon_j}))|^2\big] \le (nbC)^2$. Thus, we have
proved that $a_j \to 0$ and $b_j \to 0$. As mentioned before, this implies that:
$$E\langle X, \nabla f(X)\rangle = E\,\mathrm{Tr}(A(X)\mathrm{Hess}f(X)).$$
This completes the proof for $f \in C^2(\mathbb{R}^n)$ with $\nabla f$ and $\mathrm{Hess}f$ bounded and uniformly
continuous.
(STEP 5) Finally, take an arbitrary $f \in C^2(\mathbb{R}^n)$. Let $g : \mathbb{R}^n \to [0, 1]$ be a $C^\infty$
function such that $g(x) = 1$ if $\|x\| \le 1$ and $g(x) = 0$ if $\|x\| \ge 2$. For each $a > 1$, let
$f_a(x) := f(x)g(a^{-1}x)$. The first and second order partial derivatives of $f_a$ are given
by the expressions:
$$\frac{\partial f_a}{\partial x_i}(x) = \frac{\partial f}{\partial x_i}(x)\,g(a^{-1}x) + a^{-1}f(x)\frac{\partial g}{\partial x_i}(a^{-1}x),$$
$$\begin{aligned}
\frac{\partial^2f_a}{\partial x_i\partial x_j}(x) &= \frac{\partial^2f}{\partial x_i\partial x_j}(x)\,g(a^{-1}x) + a^{-1}\frac{\partial f}{\partial x_i}(x)\frac{\partial g}{\partial x_j}(a^{-1}x)\\
&\quad + a^{-1}\frac{\partial f}{\partial x_j}(x)\frac{\partial g}{\partial x_i}(a^{-1}x) + a^{-2}f(x)\frac{\partial^2g}{\partial x_i\partial x_j}(a^{-1}x).
\end{aligned}$$
Since $g \in C^\infty$ and $f \in C^2$, these expressions clearly imply that $f_a \in C^2$. From now
on, we will use the following notation:
$$B(0, r) := \{x \in \mathbb{R}^n \mid \|x\| \le r\}.$$
(5.a) We will show that $\nabla f_a$ and $\mathrm{Hess}f_a$ are bounded and uniformly continuous.
We will first consider the boundedness of $\nabla f_a$. We have just shown that $\frac{\partial f_a}{\partial x_i}$ is
continuous. Since $B(0, 2a)$ is compact, $\frac{\partial f_a}{\partial x_i}$ is bounded on this closed ball. By the
choice of $g$ and by the expression for $\frac{\partial f_a}{\partial x_i}$, it is clear that $\frac{\partial f_a}{\partial x_i}(x) = 0$ if $x \notin
B(0, 2a)$. Thus $\frac{\partial f_a}{\partial x_i}$ is bounded on $\mathbb{R}^n$. Since this holds for all $1 \le i \le n$ and since
$$\nabla f_a(x) = \left(\frac{\partial f_a}{\partial x_1}(x), \ldots, \frac{\partial f_a}{\partial x_n}(x)\right),$$
it is easy to see that $\nabla f_a$ is bounded. The boundedness of $\mathrm{Hess}f_a$ can be shown in a
similar way. Secondly, we will prove that $\nabla f_a$ is uniformly continuous. A well-known
result from analysis states that for a function $h : Y \to Z$, where $Y$ is a compact
metric space and $Z$ is just a metric space, it holds that $h$ is continuous if and only if
$h$ is uniformly continuous. Hence, since $\frac{\partial f_a}{\partial x_i}$ is continuous, it is uniformly continuous
on $Y := B(0, 2a)$. Let $\epsilon > 0$, then for all $1 \le i \le n$:
$$\exists\delta_i > 0, \forall x, y \in Y : \|x - y\| < \delta_i \Rightarrow \left|\frac{\partial f_a}{\partial x_i}(x) - \frac{\partial f_a}{\partial x_i}(y)\right| < \frac{\epsilon}{\sqrt{n}}.$$
Setting $\delta := \min_{1\le i\le n}\delta_i$, we have for any $x, y \in Y$ with $\|x - y\| < \delta$ that:
$$\|\nabla f_a(x) - \nabla f_a(y)\| = \sqrt{\sum_{i=1}^n\left(\frac{\partial f_a}{\partial x_i}(x) - \frac{\partial f_a}{\partial x_i}(y)\right)^2} < \sqrt{\frac{\epsilon^2}{n}\,n} = \epsilon.$$
When $x, y \notin Y$, it is obvious that $\|\nabla f_a(x) - \nabla f_a(y)\| = 0$. To finish the argument,
let $x \in Y$ and $y \notin Y$. It is always possible to find a vector $z \in \mathbb{R}^n$ such that
$\|z\| = 2a$ and $\|x - z\| \le \|x - y\|$. Assume $\|x - y\| < \delta$, then $\|x - z\| < \delta$. Since
$\nabla f_a(y) = 0 = \nabla f_a(z)$, it follows that:
$$\|\nabla f_a(x) - \nabla f_a(y)\| = \|\nabla f_a(x) - \nabla f_a(z)\| < \epsilon,$$
where the last inequality follows from the fact that $x, z \in Y$. In an analogous way it
can be shown that $\mathrm{Hess}f_a$ is uniformly continuous. Since $\nabla f_a$ and $\mathrm{Hess}f_a$ are bounded
and uniformly continuous, we know from (STEP 4) that:
$$E\langle X, \nabla f_a(X)\rangle = E\,\mathrm{Tr}(A(X)\mathrm{Hess}f_a(X)).$$
(5.b) Next, we will prove that the partial derivatives of $f_a$ converge pointwise to
those of $f$ as $a \to \infty$. Let $x$ be fixed. Since $g$ and $\frac{\partial g}{\partial x_i}$ are continuous, the expression
for $\frac{\partial f_a}{\partial x_i}(x)$ mentioned earlier clearly implies that:
$$\frac{\partial f_a}{\partial x_i}(x) \to \frac{\partial f}{\partial x_i}(x)\,g(0) = \frac{\partial f}{\partial x_i}(x),$$
as $a \to \infty$. Using the continuity of $g$, $\frac{\partial g}{\partial x_i}$, $\frac{\partial g}{\partial x_j}$ and $\frac{\partial^2g}{\partial x_i\partial x_j}$, the expression for $\frac{\partial^2f_a}{\partial x_i\partial x_j}$
yields that:
$$\frac{\partial^2f_a}{\partial x_i\partial x_j}(x) \to \frac{\partial^2f}{\partial x_i\partial x_j}(x)\,g(0) = \frac{\partial^2f}{\partial x_i\partial x_j}(x),$$
as $a \to \infty$.
(5.c) Finally, we will prove that:
$$E\langle X, \nabla f_a(X)\rangle \to E\langle X, \nabla f(X)\rangle \quad \text{and} \quad E\,\mathrm{Tr}(A(X)\mathrm{Hess}f_a(X)) \to E\,\mathrm{Tr}(A(X)\mathrm{Hess}f(X)),$$
as $a \to \infty$. This would finish the proof, since we have already established that
$E\langle X, \nabla f_a(X)\rangle = E\,\mathrm{Tr}(A(X)\mathrm{Hess}f_a(X))$ for all $a > 1$. We first note that by assumption the expectations $E|f(X)|^2$, $E\|\nabla f(X)\|^2$ and $E|\mathrm{Tr}(A(X)\mathrm{Hess}f(X))|$ are finite. Besides, since $\mu \in K$, all moments of $\|X\|$ are bounded. Thus, in particular
$E\|X\|^2 < \infty$. Since the first order partial derivatives of $f_a$ converge pointwise to those
of $f$, it is clear that for all $\omega \in \Omega$:
$$\sum_{i=1}^nX_i(\omega)\frac{\partial f_a}{\partial x_i}(X(\omega)) \to \sum_{i=1}^nX_i(\omega)\frac{\partial f}{\partial x_i}(X(\omega)),$$
as $a \to \infty$. This implies that $\langle X, \nabla f_a(X)\rangle \to \langle X, \nabla f(X)\rangle$ a.s., as $a \to \infty$. (Note
that this convergence is even pointwise.) If we can use the dominated convergence
theorem, we immediately obtain that $E\langle X, \nabla f_a(X)\rangle \to E\langle X, \nabla f(X)\rangle$, as desired.
Thus, it remains to show that the dominated convergence theorem can be applied.
In other words, we have to prove that there exists an integrable random variable $Y$
such that $|\langle X, \nabla f_a(X)\rangle| \le Y$, for all $a > 1$. Since $g : \mathbb{R}^n \to [0, 1]$ and $a > 1$, we have
for all $x \in \mathbb{R}^n$:
$$\begin{aligned}
|\langle x, \nabla f_a(x)\rangle| = \left|\sum_{i=1}^nx_i\frac{\partial f_a}{\partial x_i}(x)\right|
&= \left|\sum_{i=1}^nx_i\frac{\partial f}{\partial x_i}(x)g(a^{-1}x) + \sum_{i=1}^nx_ia^{-1}f(x)\frac{\partial g}{\partial x_i}(a^{-1}x)\right|\\
&\le \sum_{i=1}^n|x_i|\left|\frac{\partial f}{\partial x_i}(x)\right| + \sum_{i=1}^n|x_i||f(x)|\left\|\frac{\partial g}{\partial x_i}\right\|_\infty.
\end{aligned}$$
Note that $\left\|\frac{\partial g}{\partial x_i}\right\|_\infty < \infty$, for any $1 \le i \le n$. Namely, since $g \in C^\infty$, $\frac{\partial g}{\partial x_i}$ is continuous.
Thus, $\frac{\partial g}{\partial x_i}$ is bounded on $B(0, 2)$ and zero outside. This means that $\frac{\partial g}{\partial x_i}$ is bounded
on $\mathbb{R}^n$. Define:
$$Y := \sum_{i=1}^n|X_i|\left|\frac{\partial f}{\partial x_i}(X)\right| + \sum_{i=1}^n|X_i||f(X)|\left\|\frac{\partial g}{\partial x_i}\right\|_\infty,$$
then obviously $|\langle X, \nabla f_a(X)\rangle| \le Y$ for all $a > 1$. It remains to prove that $Y$ is an
integrable random variable. Using Hölder's inequality A.1.6 with $p = q = 2$, we see
that:
$$E|Y| \le nE[\|X\|\|\nabla f(X)\|] + E[\|X\||f(X)|]\sum_{i=1}^n\left\|\frac{\partial g}{\partial x_i}\right\|_\infty \le nE[\|X\|^2]^{1/2}E[\|\nabla f(X)\|^2]^{1/2} + E[\|X\|^2]^{1/2}E[|f(X)|^2]^{1/2}\sum_{i=1}^n\left\|\frac{\partial g}{\partial x_i}\right\|_\infty < \infty.$$
The only thing left to prove is that $E\,\mathrm{Tr}(A(X)\mathrm{Hess}f_a(X)) \to E\,\mathrm{Tr}(A(X)\mathrm{Hess}f(X))$,
as $a \to \infty$. An easy calculation shows that for all $\omega \in \Omega$:
$$\mathrm{Tr}(A(X(\omega))\mathrm{Hess}f_a(X(\omega))) = \sum_{i=1}^n\sum_{j=1}^nA(X(\omega))_{ij}\frac{\partial^2f_a}{\partial x_i\partial x_j}(X(\omega)).$$
We have already established that the second order partial derivatives of $f_a$ converge
pointwise to those of $f$. Thus, letting $a \to \infty$, we have for all $\omega \in \Omega$:
$$\mathrm{Tr}(A(X(\omega))\mathrm{Hess}f_a(X(\omega))) \to \sum_{i=1}^n\sum_{j=1}^nA(X(\omega))_{ij}\frac{\partial^2f}{\partial x_i\partial x_j}(X(\omega)) = \mathrm{Tr}(A(X(\omega))\mathrm{Hess}f(X(\omega))).$$
Hence, $\mathrm{Tr}(A(X)\mathrm{Hess}f_a(X)) \to \mathrm{Tr}(A(X)\mathrm{Hess}f(X))$ a.s., as $a \to \infty$. Obviously, if
the conditions of the dominated convergence theorem are met, this would finish the
argument. Thus, it remains to prove that there exists an integrable random variable
$\tilde{Y}$ such that $|\mathrm{Tr}(A(X)\mathrm{Hess}f_a(X))| \le \tilde{Y}$ for all $a > 1$. By Proposition A.3.3, we have
for all $x \in \mathbb{R}^n$ that $|A(x)_{ij}| \le \|A(x)\| \le b$. Using this and the fact that $a > 1$, a
straightforward calculation shows that $|\mathrm{Tr}(A(X)\mathrm{Hess}f_a(X))| \le \tilde{Y}$, where
$$\tilde{Y} := |\mathrm{Tr}(A(X)\mathrm{Hess}f(X))| + 2b\sum_{i=1}^n\sum_{j=1}^n\left|\frac{\partial f}{\partial x_i}(X)\right|\left\|\frac{\partial g}{\partial x_j}\right\|_\infty + b\sum_{i=1}^n\sum_{j=1}^n|f(X)|\left\|\frac{\partial^2g}{\partial x_i\partial x_j}\right\|_\infty.$$
Analogously as before, $\left\|\frac{\partial g}{\partial x_j}\right\|_\infty$ and $\left\|\frac{\partial^2g}{\partial x_i\partial x_j}\right\|_\infty$ are finite. Thus, it is easy to see that:
$$E|\tilde{Y}| \le E|\mathrm{Tr}(A(X)\mathrm{Hess}f(X))| + 2bnE[\|\nabla f(X)\|^2]^{1/2}\sum_{j=1}^n\left\|\frac{\partial g}{\partial x_j}\right\|_\infty + bE[|f(X)|^2]^{1/2}\sum_{i=1}^n\sum_{j=1}^n\left\|\frac{\partial^2g}{\partial x_i\partial x_j}\right\|_\infty < \infty.$$
This means that $\tilde{Y}$ is integrable, which finishes the proof.
Lemma 1.1.3. Let $A$ and $X$ be as in Theorem 1.1.1. Take any $1 \le i < j \le n$. Let
$$v_{ij}(x) := a_{ii}(x) + a_{jj}(x) - 2a_{ij}(x),$$
where $a_{ij}(x)$ denotes the $(i,j)$-th element of the matrix $A(x)$. Then for all $\theta \in \mathbb{R}$:
$$E\exp(\theta|X_i - X_j|) \le 2E\exp(2\theta^2v_{ij}(X)).$$
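For orientation (a direct substitution into the lemma, added here as an illustration): if $A$ is constant, say $A(x) \equiv \sigma^2I_n$, then $v_{ij}(x) = \sigma^2 + \sigma^2 - 0 = 2\sigma^2$ for all $x$, and the lemma yields the sub-Gaussian-type bound
$$E\exp(\theta|X_i - X_j|) \le 2\exp(4\theta^2\sigma^2) \quad \text{for all } \theta \in \mathbb{R}.$$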
Proof. Take an arbitrary integer $k \ge 1$ and define $f : \mathbb{R}^n \to \mathbb{R}$ as:
$$f(x) := (x_i - x_j)^{2k}, \quad \text{for } x \in \mathbb{R}^n.$$
Then for any $l \notin \{i, j\}$, we have that $\frac{\partial f}{\partial x_l}(x) = 0$. Furthermore, we have that $\frac{\partial f}{\partial x_i}(x) =
2k(x_i - x_j)^{2k-1}$ and $\frac{\partial f}{\partial x_j}(x) = -2k(x_i - x_j)^{2k-1}$. Combining these results we see that:
$$\langle x, \nabla f(x)\rangle = 2kx_i(x_i - x_j)^{2k-1} - 2kx_j(x_i - x_j)^{2k-1} = 2k(x_i - x_j)^{2k}.$$
The second order partial derivatives are given by:
$$\frac{\partial^2f}{\partial x_i\partial x_j}(x) = \frac{\partial^2f}{\partial x_j\partial x_i}(x) = -2k(2k-1)(x_i - x_j)^{2k-2},$$
$$\frac{\partial^2f}{\partial x_i^2}(x) = \frac{\partial^2f}{\partial x_j^2}(x) = 2k(2k-1)(x_i - x_j)^{2k-2}.$$
All other second order partial derivatives are zero and thus:
$$\begin{aligned}
\mathrm{Tr}(A(x)\mathrm{Hess}f(x)) &= \sum_{l=1}^n(A(x)\mathrm{Hess}f(x))_{ll} = \sum_{l=1}^n\sum_{m=1}^na_{lm}(x)\frac{\partial^2f}{\partial x_l\partial x_m}(x)\\
&= a_{ii}(x)\frac{\partial^2f}{\partial x_i^2}(x) + 2a_{ij}(x)\frac{\partial^2f}{\partial x_i\partial x_j}(x) + a_{jj}(x)\frac{\partial^2f}{\partial x_j^2}(x)\\
&= 2k(2k-1)(x_i - x_j)^{2k-2}(a_{ii}(x) - 2a_{ij}(x) + a_{jj}(x))\\
&= 2k(2k-1)(x_i - x_j)^{2k-2}v_{ij}(x),
\end{aligned}$$
where we have used that the matrix $A(x)$ is symmetric. Take $y \in \mathbb{R}^n$ such that
$y_m = \delta_{im} - \delta_{jm}$. Then, by positive semidefiniteness of $A(x)$ we get:
$$v_{ij}(x) = a_{ii}(x) - 2a_{ij}(x) + a_{jj}(x) = y^TA(x)y \ge 0.$$
Hence, $v_{ij}(x)$ is nonnegative for all $x \in \mathbb{R}^n$. We now use Hölder's inequality A.1.6
with $p = \frac{k}{k-1}$ and $q = k$, and we get:
$$E|\mathrm{Tr}(A(X)\mathrm{Hess}f(X))| = 2k(2k-1)E[|X_i - X_j|^{2k-2}|v_{ij}(X)|] \le 2k(2k-1)E[|X_i - X_j|^{2k}]^{\frac{k-1}{k}}E[v_{ij}(X)^k]^{\frac{1}{k}}.$$
We want to apply (1.2) to $f$. So we first have to verify whether $f$ meets the assumptions of Theorem 1.1.1. Clearly $f \in C^2(\mathbb{R}^n)$. In order to prove that $E|f(X)|^2 < \infty$,
we first notice that by Taylor expansion $\frac{|x|^{4k}}{(4k)!} \le e^{|x|}$ for all $x \in \mathbb{R}$. Hence, we obtain,
with $y$ defined as above:
$$\begin{aligned}
E|f(X)|^2 = E|X_i - X_j|^{4k} &\le (4k)!\,E[e^{|X_i - X_j|}]\\
&\le (4k)!\big(E[e^{X_i - X_j}] + E[e^{X_j - X_i}]\big)\\
&= (4k)!\big(E[\exp\langle y, X\rangle] + E[\exp\langle -y, X\rangle]\big)\\
&\le (4k)!\big(\exp(b\|y\|^2) + \exp(b\|-y\|^2)\big)\\
&= 2(4k)!\exp(2b) < \infty,
\end{aligned}$$
where we have used (1.1). By an analogous reasoning and some easy calculations, we
also have that:
$$E\|\nabla f(X)\|^2 = 8k^2E(X_i - X_j)^{4k-2} < \infty,$$
and
$$E|\mathrm{Tr}(A(X)\mathrm{Hess}f(X))| \le 8bk(2k-1)E(X_i - X_j)^{2k-2} < \infty,$$
where we have used that $|v_{ij}(x)| \le 4b$ by Proposition A.3.3. We showed that all
assumptions on $f$ are satisfied. Hence, we can use (1.2), which gives us that:
$$2kE(X_i - X_j)^{2k} = E\langle X, \nabla f(X)\rangle = E\,\mathrm{Tr}(A(X)\mathrm{Hess}f(X)) \le E|\mathrm{Tr}(A(X)\mathrm{Hess}f(X))| \le 2k(2k-1)E[(X_i - X_j)^{2k}]^{\frac{k-1}{k}}E[v_{ij}(X)^k]^{\frac{1}{k}}.$$
From the above inequality, we can conclude that:
$$E(X_i - X_j)^{2k} \le (2k-1)^kE[v_{ij}(X)^k].$$
Using that $\cosh(x) = \sum_{k=0}^\infty\frac{x^{2k}}{(2k)!}$, we obtain that:
$$E\exp(\theta|X_i - X_j|) \le E\exp(\theta(X_i - X_j)) + E\exp(-\theta(X_i - X_j)) = 2E[\cosh(\theta(X_i - X_j))] = 2E\left[\sum_{k=0}^\infty\frac{\theta^{2k}(X_i - X_j)^{2k}}{(2k)!}\right].$$
Because all the terms of the sum are nonnegative, we can use monotone convergence,
and the last expression can be rewritten as:
$$2\sum_{k=0}^\infty\frac{\theta^{2k}E(X_i - X_j)^{2k}}{(2k)!} \le 2 + 2\sum_{k=1}^\infty\frac{(2k-1)^k\theta^{2k}E[v_{ij}(X)^k]}{(2k)!} \le 2 + 2\sum_{k=1}^\infty\frac{2^k\theta^{2k}E[v_{ij}(X)^k]}{k!} = 2E\left[\sum_{k=0}^\infty\frac{2^k\theta^{2k}v_{ij}(X)^k}{k!}\right] = 2E\exp(2\theta^2v_{ij}(X)).$$
In the third step we used again monotone convergence, and for the second inequality
we used that $\frac{(2k-1)^k}{(2k)!} \le \frac{2^k}{k!}$, which can be seen as follows:
$$\frac{(2k-1)^k}{(2k)!} \le \frac{2^k}{k!} \;\Leftrightarrow\; \left(\frac{2k-1}{2}\right)^k \le \frac{(2k)!}{k!} \;\Leftrightarrow\; (k - 1/2)^k \le \underbrace{(2k)(2k-1)\cdots(k+1)}_{k\ \text{factors}}.$$
This last inequality is trivial, since $k - 1/2$ is smaller than each of the factors on the
right-hand side. This finishes the proof.
Definition 1.1.4. A function $\psi : \mathbb{R} \to \mathbb{R}$ is called absolutely continuous if:
$$\forall\epsilon > 0, \exists\delta > 0 : \sum_{i=1}^k(b_i - a_i) < \delta \Rightarrow \sum_{i=1}^k|\psi(b_i) - \psi(a_i)| < \epsilon,$$
for every finite collection of disjoint intervals $([a_i, b_i])_{1\le i\le k}$ in $\mathbb{R}$.
Remark 1.1.5. It is easily seen that every Lipschitz function is absolutely continuous. Besides, it is a well-known fact that absolutely continuous functions are almost
everywhere differentiable. See [17] for more details on this topic.
The subsequent lemma shows that for a specific type of random variables, one
can always find a Stein coefficient. This Stein coefficient T is of a particular form.
Conversely, if that particular T is a Stein coefficient for a random variable Y , the
density of Y is determined. This last fact is only true under certain assumptions.
Lemma 1.1.6. Assume $\rho$ is a continuous probability density function on $\mathbb{R}$ with
support $I \subseteq \mathbb{R}$, where $I$ is a (bounded or unbounded) interval. Suppose $\int_{-\infty}^{+\infty}y\rho(y)\,dy =
0$. Let
$$h(x) := \begin{cases} \dfrac{\int_x^\infty y\rho(y)\,dy}{\rho(x)} & \text{if } x \in I,\\[1ex] 0 & \text{if } x \notin I.\end{cases}$$
Let $X$ be a random variable with density $\rho$ and $E[X^2] < \infty$.
(i) We have that
$$E[X\psi(X)] = E[h(X)\psi'(X)] \qquad (1.7)$$
for every Lipschitz function $\psi$ such that both sides of (1.7) are well defined and
$E|h(X)\psi(X)| < \infty$.
(ii) If $h_1$ is another function that satisfies (1.7) for all Lipschitz functions $\psi$, then
$h_1 = h$ $P_X$-a.s. on the support of $\rho$.
(iii) If $Y$ is a random variable such that (1.7) holds with $Y$ instead of $X$, for
all absolutely continuous functions $\psi$ with continuous derivative $\psi'$, such that
$|\psi(x)|$, $|x\psi(x)|$ and $|h(x)\psi'(x)|$ are uniformly bounded, then $Y$ has density $\rho$.
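As a concrete illustration of the definition of $h$ (a worked example added for convenience), take the standard Laplace density $\rho(y) = \frac{1}{2}e^{-|y|}$, which is continuous on its support $I = \mathbb{R}$ and has mean zero. For $x \ge 0$,
$$\int_x^\infty y\,\tfrac{1}{2}e^{-y}\,dy = \tfrac{1}{2}(1 + x)e^{-x}, \qquad \text{so} \qquad h(x) = \frac{\tfrac{1}{2}(1 + x)e^{-x}}{\tfrac{1}{2}e^{-x}} = 1 + x,$$
and by symmetry the same computation for $x < 0$ gives $h(x) = 1 - x$. Hence $h(x) = 1 + |x|$, so by part (i) the random variable $T = 1 + |X|$ is a Stein coefficient for a standard Laplace random variable $X$.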
Proof. (i) Let $u(x) = h(x)\rho(x)$, for all $x \in \mathbb{R}$. First note that $u(x) = \int_x^\infty y\rho(y)\,dy$.
This is also true for $x \notin I$, since $I$ is an interval and $\int_{-\infty}^{+\infty}y\rho(y)\,dy = 0$. Moreover, we
have that:
$$u(x) = \int_x^\infty y\rho(y)\,dy = -\int_{-\infty}^xy\rho(y)\,dy, \qquad (1.8)$$
where we have used that $\int_{-\infty}^{+\infty}y\rho(y)\,dy = 0$. Using equation (1.8), it is easy to verify
that $u$ is continuous and that $\lim_{x\to-\infty}u(x) = \lim_{x\to\infty}u(x) = 0$. Equation (1.8) also
implies that $u$ is strictly positive on the support of $\rho$. To see this, let $x \in I$ and
assume $x \ge 0$. Since $\rho$ is continuous and $\rho(x) > 0$, there exists an $\epsilon > 0$ such that
$[x, x + \epsilon[\ \subseteq I$. Hence we can conclude that:
$$u(x) = \int_x^\infty y\rho(y)\,dy \ge \int_x^{x+\epsilon}\underbrace{y}_{>0}\underbrace{\rho(y)}_{>0}\,dy > 0.$$
When $x \le 0$ it follows from an analogous reasoning that $u(x) > 0$. Hence we can
conclude that $u$ is strictly positive on $I$. From Proposition A.3.1 in the appendix we
know that $E[h(X)] = E[X^2] < \infty$. When $\psi$ is a bounded Lipschitz function, using
integration by parts leads to:
$$\begin{aligned}
E[X\psi(X)] = \int_{-\infty}^\infty x\psi(x)\rho(x)\,dx &= \int_{-\infty}^\infty\psi(x)\,d\left(\int_{-\infty}^xy\rho(y)\,dy\right) = \int_{-\infty}^\infty\psi(x)\,d(-u(x))\\
&= \big[-\psi(x)u(x)\big]_{-\infty}^\infty + \int_{-\infty}^\infty u(x)\psi'(x)\,dx = \int_{-\infty}^\infty h(x)\psi'(x)\rho(x)\,dx = E[h(X)\psi'(X)].
\end{aligned}$$
Notice that we have used the boundedness of $\psi$ and the fact that $u(x) \to 0$ as
$x \to \pm\infty$. So, we have shown that (1.7) is true for any bounded Lipschitz function
$\psi$.
Now let $\psi$ be an arbitrary Lipschitz function and $g : \mathbb{R} \to [0, 1]$ a $C^\infty$ map such
that:
$$g(x) := \begin{cases} 1 & \text{if } x \in [-1, 1],\\ 0 & \text{if } x \notin [-2, 2].\end{cases}$$
For each $a > 1$, let $\psi_a(x) := \psi(x)g(a^{-1}x)$. The derivative of $\psi_a$ is given by:
$$\psi_a'(x) = \psi'(x)g(a^{-1}x) + a^{-1}\psi(x)g'(a^{-1}x).$$
It is easily seen that $\psi_a$ is continuous on the compact interval $[-2a, 2a]$ and zero
outside. Hence $\psi_a$ is bounded. Moreover, $\psi_a$ is a Lipschitz function. Indeed, since $\psi$
is a Lipschitz function, we know there exists a constant $L_1 > 0$ such that:
$$\forall x, y \in \mathbb{R} : |\psi(x) - \psi(y)| \le L_1|x - y|.$$
Since $\psi_a(t) = 0$ for $t \notin [-2a, 2a]$, it suffices to prove there exists a constant $L_a > 0$
such that:
$$\forall x, y \in [-2a, 2a] : |\psi_a(x) - \psi_a(y)| \le L_a|x - y|.$$
Let $x, y \in [-2a, 2a]$, then it is clear that:
$$|\psi_a(x) - \psi_a(y)| \le |\psi(x)g(a^{-1}x) - \psi(x)g(a^{-1}y)| + |\psi(x)g(a^{-1}y) - \psi(y)g(a^{-1}y)| = |\psi(x)||g(a^{-1}x) - g(a^{-1}y)| + g(a^{-1}y)|\psi(x) - \psi(y)|.$$
By construction of $g$, we know $g$ is a Lipschitz function. Assume $g$ has Lipschitz
constant $L_2 > 0$. Besides, $\psi$ is continuous. Therefore $\psi$ is bounded on the compact
interval $[-2a, 2a]$. More specifically, there exists a constant $K_a > 0$ such that $|\psi(t)| \le
K_a$ for all $t \in [-2a, 2a]$. Using these arguments together with the fact that $g$ maps $\mathbb{R}$
into $[0, 1]$, we obtain for all $a > 1$ that:
$$|\psi_a(x) - \psi_a(y)| \le |\psi(x)|L_2|a^{-1}x - a^{-1}y| + L_1|x - y| \le K_aL_2|x - y| + L_1|x - y| = L_a|x - y|,$$
with $L_a := K_aL_2 + L_1$. Consequently, $\psi_a$ is a bounded Lipschitz function, which
implies that for all $a > 1$:
$$E[X\psi_a(X)] = E[h(X)\psi_a'(X)].$$
By continuity of $g$ we have for all $x \in \mathbb{R}$ that $\psi_a(x) = \psi(x)g(a^{-1}x) \to \psi(x)g(0) = \psi(x)$
as $a \to \infty$. Hence, $\psi_a$ converges pointwise to $\psi$ as $a \to \infty$. On the other hand, by
continuity of $g$ and $g'$ we have for all $x \in \mathbb{R}$ that $\psi_a'(x) \to \psi'(x)g(0) + 0\cdot\psi(x)g'(0) =
\psi'(x)$ as $a \to \infty$. Thus, we have shown pointwise convergence of $\psi_a'$ to $\psi'$ as $a \to \infty$.
Note that:
$$|x\psi_a(x)| = |x\psi(x)||g(a^{-1}x)| \le |x\psi(x)|,$$
since $g : \mathbb{R} \to [0, 1]$. Moreover, we have that:
$$|h(x)\psi_a'(x)| \le |h(x)\psi'(x)g(a^{-1}x)| + |h(x)a^{-1}\psi(x)g'(a^{-1}x)| \le |h(x)\psi'(x)| + \|g'\|_\infty|h(x)\psi(x)|,$$
where $\|g'\|_\infty := \sup_{x\in\mathbb{R}}|g'(x)|$ is finite, since $g'$ is continuous on the compact interval
$[-2, 2]$ and zero outside. Since pointwise convergence implies almost sure convergence,
we have the following:
$$X\psi_a(X) \to X\psi(X) \text{ a.s.} \quad \text{and} \quad |X\psi_a(X)| \le |X\psi(X)| \in L^1.$$
Applying the dominated convergence theorem yields that $E[X\psi_a(X)] \to E[X\psi(X)]$
as $a \to \infty$. On the other hand, we have:
$$h(X)\psi_a'(X) \to h(X)\psi'(X) \text{ a.s.}, \quad |h(X)\psi_a'(X)| \le |h(X)\psi'(X)| + \|g'\|_\infty|h(X)\psi(X)| \in L^1.$$
Again by the dominated convergence theorem we can conclude that $E[h(X)\psi_a'(X)] \to
E[h(X)\psi'(X)]$ as $a \to \infty$. Combining the last two results together with the equality
$E[X\psi_a(X)] = E[h(X)\psi_a'(X)]$, we can conclude that $E[X\psi(X)] = E[h(X)\psi'(X)]$, and
the first part of the lemma is proved.
(ii) For the second part, let $h_1$ be another function such that
$$E[X\psi(X)] = E[h_1(X)\psi'(X)]$$
for all Lipschitz functions $\psi$. Let $\psi$ be a function such that $\psi'(x) = \mathrm{sign}(h_1(x) - h(x))$,
where
$$\mathrm{sign}(y) := \begin{cases} 1 & \text{if } y > 0,\\ -1 & \text{if } y < 0,\\ 0 & \text{if } y = 0.\end{cases}$$
Since $\psi$ is a continuous function with a bounded derivative, the Mean Value Theorem
implies that $\psi$ is a Lipschitz function. Then by assumption and by the first part of
the proof, we have:
$$E[h_1(X)\psi'(X)] = E[X\psi(X)] = E[h(X)\psi'(X)].$$
By construction of $\psi$ we can conclude that:
$$0 = E[(h_1(X) - h(X))\psi'(X)] = E|h_1(X) - h(X)|.$$
Using Proposition A.1.5, this shows that $h_1(X) = h(X)$ almost surely. When $X$ is a
random variable on a probability space $(\Omega, \mathcal{F}, P)$, a straightforward reasoning gives:
$$0 \le P_X(\{x \in I \mid h_1(x) \ne h(x)\}) = P(X^{-1}(\{x \in I \mid h_1(x) \ne h(x)\})) \le P(\{\omega \in \Omega \mid h_1(X(\omega)) \ne h(X(\omega))\}) = 0.$$
Thus, we have shown that $P_X(\{x \in I \mid h_1(x) \ne h(x)\}) = 0$, which means that
$h_1 = h$ $P_X$-a.s. on $I$.
(iii) For the last part of the lemma, let $X$ be a random variable with density $\rho$.
Take $v : \mathbb{R} \to \mathbb{R}$ an arbitrary bounded continuous function and let $m = E[v(X)]$. For
$x \notin I$, we simply define $\psi(x) = 0$, and for all $x \in I$:
$$\psi(x) := \frac{1}{u(x)}\int_{-\infty}^x\rho(y)(v(y) - m)\,dy = -\frac{1}{u(x)}\int_x^\infty\rho(y)(v(y) - m)\,dy.$$
This last equality is easily seen, since by definition of $m$:
$$\int_{-\infty}^\infty\rho(y)(v(y) - m)\,dy = E[v(X) - m] = E[v(X)] - m = 0.$$
In the beginning of this proof, we have already shown that $u$ is strictly positive on $I$,
whence $\psi$ is well-defined. A little reflection reveals that $\psi$ is an absolutely continuous
function. We will now prove that $|x\psi(x)|$ is uniformly bounded. Let $x \in I$ and $x \ge 0$,
then
$$|x\psi(x)| = \left|\frac{x}{u(x)}\int_x^\infty\rho(y)(v(y) - m)\,dy\right| \le \frac{1}{u(x)}\int_x^\infty|x\rho(y)||v(y) - m|\,dy \le \frac{2\|v\|_\infty}{u(x)}\int_x^\infty x\rho(y)\,dy \le \frac{2\|v\|_\infty}{u(x)}\int_x^\infty y\rho(y)\,dy = 2\|v\|_\infty,$$
where we have used that $m = E[v(X)] \le \|v\|_\infty$. Let $x \in I$ and $x < 0$, then
$$|x\psi(x)| = \left|\frac{x}{u(x)}\int_{-\infty}^x\rho(y)(v(y) - m)\,dy\right| \le \frac{-2\|v\|_\infty}{u(x)}\int_{-\infty}^xx\rho(y)\,dy \le \frac{-2\|v\|_\infty}{u(x)}\int_{-\infty}^xy\rho(y)\,dy = 2\|v\|_\infty.$$
This shows that $|x\psi(x)|$ is uniformly bounded. Using (1.8), we see that $u'(x) =
-x\rho(x)$. Thus, by definition of $\psi$ it follows that:
$$\begin{aligned}
\psi'(x) &= \frac{1}{u(x)}\rho(x)(v(x) - m) - \frac{1}{(u(x))^2}u'(x)\int_{-\infty}^x\rho(y)(v(y) - m)\,dy\\
&= \frac{1}{u(x)}\rho(x)(v(x) - m) + \frac{x\rho(x)}{(u(x))^2}\int_{-\infty}^x\rho(y)(v(y) - m)\,dy\\
&= \frac{1}{u(x)}\rho(x)(v(x) - m) + \frac{x\rho(x)}{u(x)}\psi(x)\\
&= \frac{1}{h(x)}(v(x) - m) + \frac{x}{h(x)}\psi(x).
\end{aligned}$$
Combining this with the fact that $|x\psi(x)| \le 2\|v\|_\infty$, we have that:
$$|h(x)\psi'(x)| \le |v(x) - m| + |x\psi(x)| \le 4\|v\|_\infty.$$
Thus, $|h(x)\psi'(x)|$ is uniformly bounded. Finally, for all $x \in \mathbb{R}$, we have that $|\psi(x)| \le
\sup_{|t|\le 1}|\psi(t)| + |x\psi(x)|$. This follows since if $|x| \le 1$, then $|\psi(x)| \le \sup_{|t|\le 1}|\psi(t)|$ and
if $|x| > 1$, then $|\psi(x)| \le |x\psi(x)|$. Since $\psi$ is continuous, there exists a constant $K$
such that $\sup_{|t|\le 1}|\psi(t)| \le K$. Hence, for all $x \in \mathbb{R}$, it holds that $|\psi(x)| \le K + 2\|v\|_\infty$,
and consequently $|\psi(x)|$ is uniformly bounded. By the expression for $\psi'(x)$ calculated
above, it is also clear that $\psi'$ is continuous.
Now, let $Y$ be a random variable such that $E[Y\tilde{\psi}(Y)] = E[h(Y)\tilde{\psi}'(Y)]$ for every
absolutely continuous function $\tilde{\psi}$ with continuous derivative $\tilde{\psi}'$, such that $|\tilde{\psi}(x)|$,
$|x\tilde{\psi}(x)|$ and $|h(x)\tilde{\psi}'(x)|$ are uniformly bounded; then $E[h(Y)\psi'(Y) - Y\psi(Y)] = 0$. A
previous calculation showed that $h(x)\psi'(x) = v(x) - m + x\psi(x)$ for all $x$, or thus that
$h(Y)\psi'(Y) - Y\psi(Y) = v(Y) - m$. Combining these results, we can conclude that:
$$E[v(Y)] - E[v(X)] = E[v(Y) - m] = E[h(Y)\psi'(Y) - Y\psi(Y)] = 0.$$
Thus $E[v(Y)] = E[v(X)]$ for every bounded continuous function $v$. By Lemma 9.3.2
in [12], we can conclude that $P_X = P_Y$. Thus $X \stackrel{d}{=} Y$, which completes the proof.
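The identity in part (i) is also easy to check by simulation; the following minimal Python sketch (our own illustration, using the Laplace example worked out after the statement of Lemma 1.1.6 and the arbitrary test function $\psi(x) = \sin x$) compares Monte Carlo estimates of both sides of (1.7).

```python
import numpy as np

# Monte Carlo check of E[X psi(X)] = E[h(X) psi'(X)] for X standard Laplace,
# where h(x) = 1 + |x| (see the worked example after Lemma 1.1.6) and
# psi(x) = sin(x), so psi'(x) = cos(x).
rng = np.random.default_rng(2)
X = rng.laplace(0.0, 1.0, size=10**6)

lhs = np.mean(X * np.sin(X))                  # estimate of E[X psi(X)]
rhs = np.mean((1.0 + np.abs(X)) * np.cos(X))  # estimate of E[h(X) psi'(X)]
print(lhs, rhs)                               # both estimates are close to 0.5
```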
Before giving a proof of Theorem 1.0.2, we recall some notation. Let $(\Omega, \mathcal{F}, P)$ be
a probability space, $X : \Omega \to \mathbb{R}$ a random variable and $Y : \Omega \to \mathbb{R}^d$ a random vector.
Note that $E[X\|Y]$ is a $\sigma\{Y\}$-measurable random variable, such that $E[X\|Y] = g \circ Y$
for a suitable Borel-measurable function $g : \mathbb{R}^d \to \mathbb{R}$. This map is unique $P_Y$-a.s. and
we define:
$$E[X\|Y = y] := g(y), \quad y \in \mathbb{R}^d.$$
For more details, we refer to Appendix A.1.11.
We will use the lemmas of this section in order to prove Theorem 1.0.2. For the sake
of completeness we restate this strong coupling theorem.

Theorem 1.1.7. Suppose $W$ is a random variable with $E[W] = 0$ and $E[W^2] < \infty$.
Let $T$ be a Stein coefficient for $W$ and assume $|T| \le K$ a.e., for some constant $K$.
Then, for any $\sigma^2 > 0$, we can construct $Z \sim N(0, \sigma^2)$ on the same probability space
such that for any $\theta \in \mathbb{R}$:
$$E\exp(\theta|W - Z|) \le 2E\exp\left(\frac{2\theta^2(T - \sigma^2)^2}{\sigma^2}\right).$$
Proof. (STEP 1) We first assume $W$ has a probability density function $\rho$ with respect
to the Lebesgue measure, which is strictly positive and continuous on $\mathbb{R}$. Define
$$h(x) := \frac{\int_x^\infty y\rho(y)\,dy}{\rho(x)}, \quad \text{for all } x \in \mathbb{R}.$$
By assumption, we know that $E[W] = 0$ and $E[W^2] < \infty$. Thus, we can apply
Lemma 1.1.6. Let $h_1(w) := E[T\|W = w]$, for $w \in \mathbb{R}$. If we can show that
$E[W\psi(W)] = E[h_1(W)\psi'(W)]$ for all Lipschitz functions $\psi$, then part (ii) of Lemma
1.1.6 implies that $h(w) = E[T\|W = w]$ $P_W$-almost surely. By Remark A.1.11 we know
that $h_1(W) = E[T\|W]$, hence it suffices to show that $E[W\psi(W)] = E[E[T\|W]\psi'(W)]$.
Using properties (i) and (ii) of Proposition A.1.12, we have indeed that:
$$E[E[T\|W]\psi'(W)] = E[E[T\psi'(W)\|W]] = E[T\psi'(W)] = E[W\psi(W)],$$
where the last equality follows from the given assumption that $T$ is a Stein coefficient
for $W$. Hence, we have proved that:
$$h(w) = E[T\|W = w] \quad \text{a.s.}$$
Analogously as in the proof of Lemma 1.1.6, one can use equation (1.8) to see
that $h$ is nonnegative. Hence, it is possible to define a map $A$ from $\mathbb{R}^2$ into the set of
$2 \times 2$ symmetric positive semidefinite matrices, by setting:
$$A(x_1, x_2) := \begin{pmatrix} h(x_1) & \sigma\sqrt{h(x_1)} \\ \sigma\sqrt{h(x_1)} & \sigma^2 \end{pmatrix}.$$
Notice that $A(x_1, x_2)$ does not depend on $x_2$ at all. Obviously $A(x_1, x_2)$ is symmetric.
In order to show that this matrix is positive semidefinite, take an arbitrary $y =
(y_1, y_2)^T \in \mathbb{R}^2$. We want to show that $y^TA(x_1, x_2)y \ge 0$. This is easily seen from the
fact that:
$$y^TA(x_1, x_2)y = h(x_1)y_1^2 + 2\sigma\sqrt{h(x_1)}\,y_1y_2 + \sigma^2y_2^2 = \big(\sqrt{h(x_1)}\,y_1 + \sigma y_2\big)^2 \ge 0.$$
With the notation of the proof of Lemma 1.1.6, we have that $u(x)/\rho(x) = h(x)$,
where $u$ was a continuous function. Since $\rho$ is assumed to be continuous and strictly
positive on $\mathbb{R}$, we can conclude that $h$ is a continuous function too. Therefore, $A$ is a
matrix-valued mapping with four continuous components, which implies that $A$ itself is
continuous.
By assumption there exists a constant $K$ such that $|T| \le K$ almost surely. Hence:
$$|h_1 \circ W| = |E[T\|W]| \le E[|T|\,\|W] \le K \quad \text{a.s.},$$
where we have used Jensen's inequality A.1.13 for conditional expectations. Since $\mathbb{R}$
is the support of $\rho$, we have that $W(\Omega) = \mathbb{R}$, and therefore $|h_1(x)| \le K$ for almost
all $x \in \mathbb{R}$. Hence:
$$|h(w)| = |E[T\|W = w]| = |h_1(w)| \le K \quad \text{a.s.}$$
A straightforward computation shows that $\|A\|$ is also bounded by a constant $b$,
whence all assumptions of Theorem 1.1.1 are satisfied. Then let $X = (X_1, X_2)$
be a random vector satisfying (1.1) and (1.2) of Theorem 1.1.1. Let $\psi : \mathbb{R} \to \mathbb{R}$
be an arbitrary absolutely continuous function with continuous derivative $\psi'$, such
that $|\psi(x)|$, $|x\psi(x)|$ and $|h(x)\psi'(x)|$ are uniformly bounded by a constant $C$. Let
$\Psi$ denote an antiderivative of $\psi$, i.e. a function such that $\Psi' = \psi$. Since $\psi$ is
absolutely continuous, it is continuous and hence such a function $\Psi$ indeed exists.
Since an antiderivative is only determined up to a constant, we can assume that
$\Psi(0) = 0$. We define $f : \mathbb{R}^2 \to \mathbb{R} : (x_1, x_2) \mapsto \Psi(x_1)$. Since $\psi'$ is continuous, we
have that $f \in C^2(\mathbb{R}^2)$. If we can show that $E|f(X_1, X_2)|^2$, $E\|\nabla f(X_1, X_2)\|^2$ and
$E|\mathrm{Tr}(A(X_1, X_2)\mathrm{Hess}f(X_1, X_2))|$ are finite, then we can apply (1.2) with this $f$.
(i) Using the Mean Value Theorem, there exists a $u \in [0, x_1]$ such that:
$$f(x_1, x_2) = \Psi(x_1) = \underbrace{\Psi(0)}_{=0} + \Psi'(u)(x_1 - 0) = \psi(u)x_1.$$
Since $|\psi(x)|$ is uniformly bounded by $C$, we have that:
$$|f(x_1, x_2)| = |\psi(u)||x_1| \le C|x_1|.$$
Thus, we can conclude that $E|f(X_1, X_2)|^2 \le C^2E[X_1^2]$. To see that this is finite,
we use (1.1) with $\theta = (t, 0)$ and $t \in \mathbb{R}$. This yields that $E[e^{tX_1}] \le \exp(bt^2)$, and
consequently $X_1$ has a finite moment generating function and all its moments
are finite.
CHAPTER 1. STEIN COEFFICIENTS
29
(ii) Obviously ∇f (x1 , x2 ) = (ψ(x1 ), 0), and thus k∇f (x1 , x2 )k = |ψ(x1 )| ≤ C.
Therefore Ek∇f (X1 , X2 )k2 ≤ C 2 < ∞.
(iii) It is easy to see that:
p
0
h(x1 )
ψ (x1 ) 0
h(x
)
σ
1
p
A(x1 , x2 )Hessf (x1 , x2 ) =
0
0
σ h(x1 )
σ2
0
h(x1 )ψ (x1 )
0
p
,
=
0
σ h(x1 )ψ (x1 ) 0
and therefore |Tr(A(x1 , x2 )Hessf (x1 , x2 ))| = |h(x1 )ψ 0 (x1 )| ≤ C. Eventually,
this implies that E|Tr(A(X1 , X2 )Hessf (X1 , X2 ))| ≤ C < ∞.
We have shown that equation (1.2) can be applied, using the function f defined
above. Therefore, we have that:
E hX, ∇f (X)i = ETr(A(X)Hessf (X)).
Using the calculations in part (ii), the left-hand side of this equation coincides with
E[X1 ψ(X1 )]. The calculations in part (iii) show that the right-hand side of the equation can be rewritten as E[h(X1 )ψ 0 (X1 )]. Consequently, E[X1 ψ(X1 )] = E[h(X1 )ψ 0 (X1 )]
for all absolutely continuous functions ψ with continuous derivative ψ 0 , such that
|ψ(x)|, |xψ(x)| and |h(x)ψ 0 (x)| are uniformly bounded. By Lemma 1.1.6 (iii) we can
conclude that X1 has the same distribution as W .
Now we want to show that X2 ∼ N (0, σ 2 ). Define h̃ as the function considered
in Lemma 1.1.6, with ρ̃ the density function of a N (0, σ 2 )-distribution. Simply using
the definition of h̃, it is easily seen that:
0
Z ∞
Z ∞
2
2
x2
x2
− y2
− y2
2
2
2
ye 2σ dy = −σ e 2σ
e 2σ
dy = σ 2 .
h̃(x) = e 2σ
x
x
Let ψ be an arbitrary absolutely continuous function with continuous derivative ψ 0 ,
such that |ψ(x)|, |xψ(x)| and σ 2 |ψ 0 (x)| are uniformly bounded by a constant C. If
we can show that:
E[X2 ψ(X2 )] = σ 2 E[ψ 0 (X2 )],
then Lemma 1.1.6 (iii) implies that X2 ∼ N (0, σ 2 ). In order to prove the above
equality, let Ψ be an antiderivative of ψ with Ψ(0) = 0 and let f (x1 , x2 ) = Ψ(x2 ).
Since ψ 0 is continuous, it is clear that f ∈ C 2 (R2 ).
(i) By the Mean Value Theorem there exists a u ∈ [0, x2 ] such that:
f (x1 , x2 ) = Ψ(x2 ) = Ψ(0) + Ψ0 (u)(x2 − 0) = ψ(u)x2 .
Hence E|f (X1 , X2 )|2 ≤ C 2 E[X22 ]. Using equation (1.1) with θ = (0, t) and
t ∈ R, we obtain that E[etX2 ] ≤ exp(bt2 ) < ∞. So the moment generating
function of X2 is finite, which shows that all moments of X2 are finite, and we
get that E|f (X1 , X2 )|2 < ∞.
CHAPTER 1. STEIN COEFFICIENTS
30
(ii) Clearly ∇f (x1 , x2 ) = (0, ψ(x2 )), which implies that Ek∇f (X1 , X2 )k2 = E[ψ(X2 )2 ] ≤
C 2 < ∞.
(iii) A simple calculation shows that:
p
0 σ h(x1 )ψ 0 (x2 )
,
A(x1 , x2 )Hessf (x1 , x2 ) =
0
σ 2 ψ 0 (x2 )
whence E|Tr(A(X1 , X2 )Hessf (X1 , X2 ))| = σ 2 E|ψ 0 (X2 )| ≤ C < ∞.
We have shown that E|f (X)|2 , Ek∇f (X)k2 and E|Tr(A(X)Hessf (X))| are finite.
Consequently, we can apply (1.2) for this f , and we get:
E hX, ∇f (X)i = ETr(A(X)Hessf (X)).
Combining this with the calculations in part (ii) and (iii) above, we can conclude that
E[X2 ψ(X2 )] = σ 2 E[ψ 0 (X2 )]. As we mentioned earlier, this implies that X2 ∼ N (0, σ 2 ).
The assumptions of Lemma 1.1.3 are the same as in Theorem 1.1.1. Since we
already verified these assumptions, we can apply Lemma 1.1.3 to X and A with i = 1
an j = 2. Therefore we have for all θ ∈ R that:
E exp(θ|X1 − X2 |) ≤ 2E exp(2θ2 v12 (X)).
First note that:
p
2
p
h(x1 ) − σ .
v12 (x1 , x2 ) = h(x1 ) + σ − 2σ h(x1 ) =
2
This can be rewritten as:
p
2 p
2
p
2
h(x1 ) − σ
h(x1 ) + σ
(h(x1 ) − σ 2 )2
(h(x1 ) − σ 2 )2
h(x1 ) − σ =
= p
.
p
2
2 ≤
σ2
h(x1 ) + σ
h(x1 ) + σ
This implies that for all θ ∈ R the following holds:
2 2
2 (h(X1 ) − σ )
E exp(θ|X1 − X2 |) ≤ 2E exp 2θ
.
σ2
d
Since X1 = W , we see that h(X1 ) has the same distribution as h(W ) = E[T kW ].
Hence, the above inequality can be rewritten as:
2 2
2 (E[T kW ] − σ )
E exp(θ|X1 − X2 |) ≤ 2E exp 2θ
σ2
 "

√
#2
2 2θ(T − σ ) 
= 2E exp E
W
σ
2 2
2 (T − σ )
W
≤ 2E E exp 2θ
2
σ
(T − σ 2 )2
= 2E exp 2θ2
,
σ2
CHAPTER 1. STEIN COEFFICIENTS
31
where we used Proposition A.1.12 (ii) in the last equation. For the second inequality
we used the convexity of the map x 7→ exp(x2 ) and Jensen’s inequality A.1.13 for
conditional expectations. This finishes the proof for the case where W has a probability density ρ w.r.t. the Lebesgue measure which is strictly positive and continuous
on R.
(STEP 2) Now we will prove the general case. Let Y be a standard normal
random variable, independent of W and T and defined on the same probability space
(Ω, F, P). Let W := W + Y , for all > 0. We write ν for the distribution of W
and we want to find an expression for the probability density function of W . By
independence of W and Y and by some basic knowledge about the convolution of
distributions, we have for all B ∈ R:
Z ∞
PW (B) = (PW ∗ PY )(B) =
PY (B − y)PW (dy)
−∞
Z
∞
x2
Z
=
−∞
Z
B−y
∞
Z
=
−∞
B
Z Z
∞
=
B
e− 22
√
dxPW (dy)
2π
(x−y)2
22
e−
√
2π
(x−y)2
22
e−
√
−∞
2π
dxPW (dy)
PW (dy)dx,
where we used Fubini’s Theorem in the last step. This was allowed, since the integrand
was positive. By the above calculations, we can conclude that the probability density
function of W is given by:
Z
∞
ρ (x) =
−∞
(x−y)2
e− 22
√
dν(y)
2π
We want to show that ρ is strictly positive and continuous on R. By the first part
of this proof, we then
can apply
the theorem on W . First of all, it is clear that
2
(x−W
)
1
ρ (x) = √2π
E exp − 22
≥ 0. Assume ρ (x) = 0, then we can apply Proposition
A.1.5,
since 2the
exponential function is (strictly) positive. This would mean that
(x−W )
exp − 22
= 0 a.s., which is a contradiction. Therefore ρ is strictly positive on
R. In order to prove the continuity of ρ , assume xn → x as n → ∞. We
will show2that
(ω))
1
ρ (xn ) → ρ (x) as n → ∞. For each ω, the function x 7→ √2π exp − (x−W
is
22
(ω))2
(ω))2
1
1
exp − (xn −W
exp − (x−W
continuous and thus √2π
→ √2π
. This implies
22
22
that:
1
1
(xn − W )2
(x − W )2
√
exp −
→√
exp −
a.s.
22
22
2π
2π
√1
(xn −W )2 1
Note that we even have pointwise convergence. Since 2π exp − 22
,
≤ √2π
CHAPTER 1. STEIN COEFFICIENTS
32
we can use the Bounded Convergence Theorem in order to conclude that:
(xn − W )2
1
(x − W )2
1
E exp −
→√
E exp −
= ρ (x).
ρ (xn ) = √
22
22
2π
2π
This finishes the argument of ρ being continuous on R. Thus, by (STEP 1) we are
allowed to apply the theorem on W . So we need a Stein coefficient for W which is
almost surely bounded by a constant. Let ψ be a Lipschitz function, then by definition
of W one gets:
E[W ψ(W )] = E[W ψ(W + Y )] + E[Y ψ(W + Y )].
We will work out the two expectations separately. Let ψt (x) := ψ(x + t). Using the
independence of Y and W , Corollary A.1.4 implies that:
Z ∞
Z ∞
E[W ψ(W + y)]PY (dy) =
E[W ψy (W )]PY (dy).
E[W ψ(W + Y )] =
−∞
−∞
Using that T is a Stein coefficient for W , the above expression can be rewritten as:
Z ∞
Z ∞
0
E[T ψy (W )]PY (dy) =
E[T ψ 0 (W + y)]PY (dy) = E[T ψ 0 (W + Y )].
−∞
−∞
where in the last step we used Corollary A.1.4 and the independence of Y and (W, T ).
For the second expectation, we use again Corollary A.1.4, which gives us that:
Z ∞
Z ∞
E[Y ψ(W + Y )] =
E[Y ψ(x + Y )]PW (dx) =
E[Y ψx (Y )]PW (dx).
−∞
−∞
Since Y ∼ N (0, 2 ), we can use Proposition A.1.3, and the former expression can be
rewritten as:
Z ∞
Z ∞
2
0
2
E[ψx (Y )]PW (dx) = E[ψ 0 (x + Y )]PW (dx) = 2 E[ψ 0 (W + Y )].
−∞
−∞
Note that in the last step, we used again Corollary A.1.4. Putting all these calculations together, we get that:
E[W ψ(W )] = E[W ψ(W + Y )] + E[Y ψ(W + Y )]
= E[T ψ 0 (W + Y )] + 2 E[ψ 0 (W + Y )]
= E[(T + 2 )ψ 0 (W )].
This means T + 2 is a Stein coefficient for W . Moreover, |T + 2 | is almost surely
bounded by the constant K + 2 . Besides, E[W ] = 0 by assumption. Therefore,
E[W ] = E[W ] + E[Y ] = 0. Finally, by independence of Y and W , it is clear that:
E[W2 ] = E[(W + Y )2 ] = E[W 2 ] + 2E[W ]E[Y ] + 2 E[Y 2 ] = E[W 2 ] + 2 .
CHAPTER 1. STEIN COEFFICIENTS
33
Since we assumed that E[W 2 ] is finite, we also have that E[W2 ] < ∞. Thus, by
the first part of the proof, we can construct a version of W and a random variable
Z ∼ N (0, σ 2 + 2 ) on the same probability space such that for all θ ∈ R:
2
2
2θ (T − σ 2 )2
2θ (T + 2 − (σ 2 + 2 ))2
= 2E exp
.
E exp(θ|W − Z |) ≤ 2E exp
σ 2 + 2
σ 2 + 2
Let µ be the distribution of the random vector (W , Z ). We want to show that
the family (µ )0<<1 is sequentially compact. By Prohorov’s Theorem A.1.19 this is
equivalent with showing that this family is tight. By Theorem A.1.20 it is enough to
show that there exists an α > 0 such that sup0<<1 E[k(W , Z )kα ] < ∞. Let α = 2,
then E[k(W , Z )k2 ] = E[W2 ] + E[Z2 ] for all 0 < < 1. Recall that E[W2 ] = E[W 2 ] +
2 . Since E[Z2 ] = σ 2 + 2 , we can conclude that E[k(W , Z )k2 ] = E[W 2 ] + 22 + σ 2 .
By assumption E[W 2 ] < ∞, which implies that:
sup E[k(W , Z )k2 ] = E[W 2 ] + 2 + σ 2 < ∞.
0<<1
Consequently, the family (µ )0<<1 is sequentially compact. Thus, there exist a probability measure µ0 on (R2 , R2 ) and a sequence (n )n in (0, 1) with n & 0, such that
w
µn → µ0 . Take a 2-dimensional random vector (W0 , Z0 ) with distribution µ0 . Then:
d
(Wn , Zn ) → (W0 , Z0 ).
Therefore, using Proposition A.1.24, we have for all θ ∈ R that:
E exp(θ|W0 − Z0 |) ≤ lim inf E exp(θ|Wn − Zn |)
n→∞
2
2θ (T − σ 2 )2
≤ 2 lim inf E exp
n→∞
σ 2 + 2n
2
2θ (T − σ 2 )2
= 2E exp
.
σ2
Note that in the last step, we used the monotone convergence theorem. There is
d
only one thing left to prove, namely that W0 = W and Z0 ∼ N (0, σ 2 ). We know that
w
w
µn → µ0 or equivalently that P(Wn ,Zn ) → P(W0 ,Z0 ) . Since projections are continuous,
the continuous mapping theorem implies that:
w
PWn → PW0
w
and PZn → PZ0 .
(1.9)
On the other hand, since pointwise convergence implies convergence in distribution,
d
d
we see that Wn = W + n Y → W . Our last aim is to prove that Zn → Z ∼
N (0, σ 2 ). Using Scheffé’s Lemma, it suffices to show pointwise convergence of the
density functions. (Note that we even obtain strong convergence in this case.) This
is trivial, since n → 0 implies that for every t ∈ R:
1
1 t2
1
t2
fZn (t) = p
exp − 2
→√
exp − 2 = fZ (t).
2 σ + 2n
2σ
2πσ
2π(σ 2 + 2n )
w
w
Thus, we have proved that PWn → PW and PZn → PZ . Combining this with
(1.9) and the fact that limits under weak convergence are unique, we can conclude
d
d
that W0 = W and Z0 = Z ∼ N (0, σ 2 ). This finishes the proof.
CHAPTER 1. STEIN COEFFICIENTS
1.2
34
Examples of Stein coefficients
In this section we will give a few examples of Stein coefficients. These examples
have the purpose to gain some insight into the concept of Stein coefficients, but are
irrelevant for the rest of the paper.
Example 1
P
Let 1 , 2 , ..., n be i.i.d. symmetric ±1-valued random variables. Let Sn := ni=1 i
2
. We claim
and let Y ∼ U [−1, 1]. Let Wn := Sn + Y and let Tn := n − Sn Y + 1−Y
2
that Tn is a Stein coefficient for Wn . For a verification of this statement, we refer to
the proof of Theorem 2.1.1 in Chapter 2.
Example 2
Let X be a random variable with E[X] = 0 and E[X 2 ] < ∞. Assume X has a density
function ρ with support I ⊆ R, where I is a bounded or unbounded interval. Let
 R∞
 x yρ(y)dy
if x ∈ I
(1.10)
h(x) :=
ρ(x)

0
if x 6∈ I.
Under certain conditions it can be shown that E[Xψ(X)] = E[ψ 0 (X)h(X)] for every
Lipschitz function ψ. Hence h(X) is a Stein coefficient for X. A verification of this
statement is given in the proof of Lemma 1.1.6.
Example 3
Let X be a random variable and assume we have the same conditions
Pn as in the
1
√
previous example. Let X1 , ..., Xn be i.i.d. copies of X, let W = n i=1 Xi and let
ψ
Pnbe a Lipschitz function. Using Corollary A.1.4 and the independence of Xi and
j=1,j6=i Xj , we get for each 1 ≤ i ≤ n:
"
!# Z
n
h
i
∞
1 X
E[Xi ψ(W )] = E Xi ψ √
Xj
=
E Xi ψ( √1n (Xi + x)) PPnj=1,j6=i Xj (dx).
n j=1
−∞
For x ∈ R fixed, let ψ x (y) := ψ( √1n (y + x)). Example 2 implies that:
h
i
0
1
√
E Xi ψ( n (Xi + x)) = E[Xi ψ x (Xi )] = E[h(Xi )ψ x (Xi )] = √1n E[h(Xi )ψ 0 ( √1n (Xi +x))].
Combining these two results, we get for each 1 ≤ i ≤ n that:
Z ∞
1
E[Xi ψ(W )] = √n
E[h(Xi )ψ 0 ( √1n (Xi + x))]PPnj=1,j6=i Xj (dx)
−∞
"
!!#
n
X
1
1
= √ E h(Xi )ψ 0 √
Xi +
Xj
n
n
j=1,j6=i
=
√1 E[h(Xi )ψ 0 (W )].
n
CHAPTER 1. STEIN COEFFICIENTS
35
P
Notice that for the second equality we used again that Xi and nj=1,j6=i Xj are independent. Finally, we conclude that for any Lipschitz function ψ:
"
#
n
n
X
1
1 X
E[Xi ψ(W )] = E ψ 0 (W )
h(Xi ) ,
E[W ψ(W )] = √
n i=1
n i=1
which means that
1
n
Pn
i=1
h(Xi ) is a Stein coefficient for W .
Example 4
In this example we will weaken the i.i.d. assumption. More precisely, we will show
how Theorem 1.0.2 can be used to produce strong couplings for sums of dependent
random variables.
Theorem 1.2.1. Assume X1 , ..., Xn , Xn+1 are i.i.d. random variables with E[X1 ] =
0, E[X12 ] = 1 and density function ρ. Suppose there exist constants x1 < x2 and
0 < a ≤ b such that:
a ≤ ρ(x) ≤ b
if x ∈ [x1 , x2 ]
ρ(x) = 0
if x 6∈ [x1 , x2 ].
Pn
Let Sn :=
i=1 Xi Xi+1 . Then it is possible to construct a version of Sn and a
Gaussian random variable Zn ∼ N (0, n) on the same probability space such that for
all x ≥ 0:
P{|Sn − Zn | ≥ x} ≤ 4e−C(ρ)x ,
where C(ρ) is a positive constant depending only on the density ρ (and not on n).
For the proof of this theorem we will need a result which is called the AzumaHoeffding inequality for sums of bounded martingale differences.
Definition 1.2.2. Let (Ω, F, P) be a probability space and (Fn )n=1,2,... a non-decreasing
family of sub-σ-fields of F. Let (Yn )n=1,2,... be a sequence of Fn -measurable random
variables. We call this a sequence of bounded martingale differences if the following
two conditions are satisfied:
1. ∀n ≥ 1, ∃Kn : |Yn | ≤ Kn a.s.
2. ∀n ≥ 1 : E[Yn kFn−1 ] = 0.
Lemma 1.2.3. (Azuma-Hoeffding inequality) Suppose (Yi )i=1,2,... is a sequence of
bounded martingale differences with |Yi | ≤ 1 a.s. for all i ≥ 1. Then for all t ∈ R
and for all n ≥ 1:
!
!
n
n
X
X
2
E exp t
bnk Yk ≤ exp t2
b2nk ,
k=1
k=1
where the bnk , k = 1, ..., n; n = 1, 2, ... are arbitrary real numbers.
Proof. The proof of this lemma can be found in [1].
CHAPTER 1. STEIN COEFFICIENTS
36
Now we will proceed with a proof of Theorem 1.2.1.
Proof. Let X0 ≡ 0 and let h be defined as in (1.10). Then for any Lipschitz function
ψ the following holds:
E[Sn ψ(Sn )] =
n
X
E[Xi Xi+1 ψ(Sn )] =
i=1
n
X
E[h(Xi )Xi+1 (Xi−1 + Xi+1 )ψ 0 (Sn )]. (1.11)
i=1
The second equality can be obtained by using Corollary A.1.4 and the independence
of Xi and Y := (X1 , ..., Xi−1 , Xi+1 , ..., Xn+1 ), so that for all 2 ≤ i ≤ n:
Z
E[Xi Xi+1 ψ(Sn )] =
E[Xi ψ x1 ,..,xˆi ,..,xn+1 (Xi )]PY (d(x1 , .., x̂i , .., xn+1 )),
Rn
P
Pn
i−2
where ψ x1 ,..,xˆi ,..,xn+1 (z) := xi+1 ψ
x
x
+
x
z
+
zx
+
x
x
for
j
j+1
i−1
i+1
j
j+1
j=1
j=i+1
z ∈ R. We see by Example 2 and by the above expression that:
Z
0
E[h(Xi )ψ x1 ,..,xˆi ,..,xn+1 (Xi )]PY (d(x1 , .., x̂i , .., xn+1 ))
E[Xi Xi+1 ψ(Sn )] =
Rn
= E[h(Xi )Xi+1 (Xi−1 + Xi+1 )ψ 0 (Sn )],
where in the last equality we made again use of Corollary A.1.4 and the independence
of Xi and Y . For i = 1 this equality can be proved, by using the exact same reasoning.
This finishes the argument, and we have proved (1.11).
Pn Hence, if we let Di :=
h(Xi )Xi+1 (Xi−1 +Xi+1 ), then we have shown that Tn := i=1 Di is a Stein coefficient
for Sn . Using that Xi−1 is σ{X1 , . . . , Xi−1 }-measurable and using the independence of
X1 , ..., Xi−1 , h(Xi ) and Xi+1 , we obtain by Proposition A.1.12 that for any 1 ≤ i ≤ n:
E[h(Xi )Xi+1 Xi−1 kX1 , . . . , Xi−1 ] = Xi−1 E[h(Xi )Xi+1 kX1 , . . . , Xi−1 ]
= Xi−1 E[h(Xi )Xi+1 ]
= Xi−1 E[h(Xi )]E[Xi+1 ] = 0.
Again using independence, this implies that for any 1 ≤ i ≤ n:
2
E[Di − 1kX1 , . . . , Xi−1 ] = E[h(Xi )Xi+1 Xi−1 kX1 , . . . , Xi−1 ] + E[h(Xi )Xi+1
]−1
2
]−1
= E[h(Xi )]E[Xi+1
2
2
= E[Xi ]E[Xi+1 ] − 1 = 1 − 1 = 0.
Here we have used that E[h(Xi )] = E[Xi2 ], which follows from Proposition A.3.1 in
the appendix. Let Dj ≡ 0 for j ≥ n + 1, let Fi := σ{X1 , ..., Xi } for 1 ≤ i ≤ n + 1, and
let Fi := Fn+1 for i ≥ n + 2. We want to show that (Di − 1)i=1,2,... is a sequence of
bounded martingale differences. The second condition of Definition 1.2.2 has already
been proved. For the first condition it suffices to show that |Di | is almost surely
bounded by a constant. Moreover, we will show that this constant only depends on
ρ. Since Xi has density ρ, we clearly have that Xi ∈ [x1 , x2 ] a.s., which implies that
CHAPTER 1. STEIN COEFFICIENTS
37
|Xi | ≤ |x1 |∨|x2 | almost surely. Secondly, by definition of h, we have for all x ∈ [x1 , x2 ]
that:
R x2
R∞
Z
|y|ρ(y)dy
|y|ρ(y)dy
b x2
b
x
x
=
≤
|y|dy ≤ cx1 ,x2 ,
|h(x)| ≤
ρ(x)
ρ(x)
a x1
a
where cx1 ,x2 is a constant depending only on x1 and x2 . Thus, by definition of Di we
have:
b
|Di | ≤ |h(Xi )||Xi+1 |(|Xi−1 | + |Xi+1 |) ≤ cx1 ,x2 2(|x1 | ∨ |x2 |)2 a.s.,
a
which is a constant depending on ρ only. Hence, there exists a constant K(ρ) > 0,
such that |Di − 1| ≤ K(ρ) almost surely, whence (Di − 1)i=1,2,... is a sequence of
i −1
i −1
bounded martingale differences. Since | DK(ρ)
| ≤ 1 a.s. and since ( DK(ρ)
)i=1,2,... still
is a sequence of bounded martingale differences, we can apply Lemma 1.2.3 to this
sequence. Let bnk = 1 for all 1 ≤ k ≤ n, then we obtain for every t ∈ R:
!
!
2 n
n
2 X
X
tn
D
−
1
t
k
2
t
1 = exp
.
≤ exp
E exp( K(ρ) (Tn − n)) = E exp t
K(ρ)
2 k=1
2
k=1
Let α ∈ R be arbitrary and let t = αK(ρ), then we obtain:
2
α K(ρ)2 n
E exp(α(Tn − n)) ≤ exp
= exp(C1 (ρ)α2 n),
2
2
is a constant depending only on ρ. Thus if Z is a standard
where C1 (ρ) = K(ρ)
2
Gaussian random variable, independent of all other random variables, then for any
α ∈ R we get:
Z +∞
√
√
E[exp(αt(Tn − n)/ n)]PZ (dt)
E exp(αZ(Tn − n)/ n) =
−∞
Z +∞
exp(C1 (ρ)α2 t2 )PZ (dt)
≤
−∞
= E exp(C1 (ρ)α2 Z 2 ).
In the first step we used Corollary A.1.4 and the fact that Tn and Z are independent. Since Z is a standard Gaussian random variable, we know that Z 2 follows a
gamma(1/2, 2) distribution. The moment generating function of Z 2 is thus given by:
2
E[etZ ] = √
1
,
1 − 2t
1
t< .
2
The expression for the mgf also follows from Proposition A.3.7 in the appendix. If
we choose C2 (ρ) = α > 0 small enough, such that C1 (ρ)α2 ≤ 3/8, one gets:
√
1
1
E exp(C2 (ρ)Z(Tn − n)/ n) ≤ p
≤q
1 − 2C1 (ρ)α2
1−
= 2.
2.3
8
CHAPTER 1. STEIN COEFFICIENTS
38
On the other hand, using Corollary A.1.4, we obtain:
Z +∞
√
√
E[exp(C2 (ρ)Z(t − n)/ n)]PTn (dt)
E exp(C2 (ρ)Z(Tn − n)/ n) =
−∞
Z +∞
exp(C2 (ρ)2 (t − n)2 /2n)PTn (dt)
=
−∞
= E exp(C2 (ρ)2 (Tn − n)2 /2n),
where we used the well-known expression for the moment generating function of
a standard Gaussian random variable. We have already shown that Tn is a Stein
coefficient for Sn . The other assumptions of Theorem 1.0.2 are easy to verify, and
hence this theorem can be applied. Therefore, there exist a version of Sn and a
random variable Zn ∼ N (0, n) on the same probability space, such that for all θ ∈ R:
2
2θ (Tn − n)2
E exp(θ|Sn − Zn |) ≤ 2E exp
.
n
C2 (ρ)
C2 (ρ)
Let θ = 2 , then we obtain that E exp
|Sn − Zn | ≤ 4. Using Markov’s
2
inequality, we get that for any x ≥ 0:
C2 (ρ)
C2 (ρ)
P{|Sn − Zn | ≥ x} ≤ P exp
|Sn − Zn | ≥ exp
x
2
2
C2 (ρ)
C2 (ρ)
|Sn − Zn | e− 2 x
≤ E exp
2
≤ 4e−
which completes the proof.
C2 (ρ)
x
2
,
Chapter 2
A normal approximation theorem
In this chapter we will prove two theorems. The first theorem is a normal approximation theorem. More specifically, this theorem produces a coupling of Sn with a
random variable Zn ∼ N (0, n). The second theorem can be seen as a conditional
version of the first theorem.
2.1
Verification of the normal approximation theorem
The goal of this section is to give a proof of the normal approximation theorem stated
below.
Theorem 2.1.1. There exist universal constants κ and θ0 > 0 such that the following
is true. Let n be a positive integer
P and let 1 , 2 , . . . , n be i.i.d. symmetric ±1-valued
random variables. Let Sn := ni=1 i . It is possible to construct a version of Sn and
Zn ∼ N (0, n) on the same probability space such that
E exp(θ0 |Sn − Zn |) ≤ κ.
Once we have established this result, we can use Markov’s inequality in order to
conclude that for all x ≥ 0 it holds that:
P{|Sn − Zn | ≥ x} ≤ P{exp(θ0 |Sn − Zn |) ≥ eθ0 x } ≤ κe−θ0 x .
Actually Theorem 2.1.1 is not a new result. A more general version of this theorem
can be shown, using the classical techniques. More specifically, when 1 , 2 , . . . , n are
independent mean zero random variables, having a finite moment generating function
in a neighbourhood of zero, this result is also valid. For more details we refer to
Sakhanenko [23]. We will only give a proof of the specific case where 1 , 2 , . . . , n are
i.i.d. symmetric ±1-valued random variables. Therefore we will need the following
lemma.
39
CHAPTER 2. A NORMAL APPROXIMATION THEOREM
40
Lemma 2.1.2. Assume X and Y are independent random variables, where X is a
symmetric ±1-valued random variable and Y follows the uniform distribution on the
interval [−1, 1]. Then we have for any Lipschitz function ψ : R → R that:
E[Xψ(X + Y )] = E[(1 − XY )ψ 0 (X + Y )],
and
1
E[Y ψ(X + Y )] = E[(1 − Y 2 )ψ 0 (X + Y )].
2
Proof. For the first equation we start with using the independence of X and Y , so
that by Corollary A.1.4 the following holds:
Z ∞
0
E[(1 − XY )ψ (X + Y )] =
E[(1 − Xy)ψ 0 (X + y)]PY (dy)
−∞
Z
1 1
=
E[(1 − Xy)ψ 0 (X + y)]dy.
2 −1
Note that in the last step we have used that the Lebesgue density function of Y is
given by:
1/2
if y ∈ [−1, 1]
y 7→
0
otherwise.
Since X is a symmetric ±1-valued random variable, we have for any y ∈ R that
E[(1 − Xy)ψ 0 (X + y)] = 21 (1 − y)ψ 0 (1 + y) + 12 (1 + y)ψ 0 (−1 + y). Combining this with
the former calculation, one gets:
Z
Z
1 1
1 1
0
0
(1 − y)ψ (1 + y)dy +
(1 + y)ψ 0 (−1 + y)dy.
E[(1 − XY )ψ (X + Y )] =
4 −1
4 −1
We will work out these two terms separately. Using integration by parts, it is easy to
see that:
Z 1
Z 1
1
0
ψ(1 + y)dy
(1 − y)ψ (1 + y)dy = [(1 − y)ψ(1 + y)]−1 +
−1
−1
Z 1
= −2ψ(0) +
ψ(1 + y)dy.
−1
Again using integration by parts, we see that the second term reduces to:
Z 1
Z 1
1
0
(1 + y)ψ (−1 + y)dy = [(1 + y)ψ(−1 + y)]−1 −
ψ(−1 + y)dy
−1
−1
Z 1
= 2ψ(0) −
ψ(−1 + y)dy.
−1
Putting these results together, we get that:
Z
Z
1 1
1 1
0
E[(1 − XY )ψ (X + Y )] =
ψ(1 + y)dy −
ψ(−1 + y)dy.
4 −1
4 −1
CHAPTER 2. A NORMAL APPROXIMATION THEOREM
41
On the other hand, using Corollary A.1.4, it is clear that:
Z
1 1
E[Xψ(X + y)]dy
E[Xψ(X + Y )] =
2 −1
Z 1 1 1
1
=
ψ(1 + y) − ψ(−1 + y) dy
2 −1 2
2
Z 1
Z
1 1
1
ψ(1 + y)dy −
ψ(−1 + y)dy,
=
4 −1
4 −1
and the first equality of the lemma has been proved. For the second equation, we
start again by using Corollary A.1.4 and we obtain:
Z
1 1
E[Y ψ(X + Y )] =
E[yψ(X + y)]dy.
2 −1
By definition of X, we have for all y ∈ R that E[yψ(X+y)] = 21 yψ(1+y)+ 21 yψ(−1+y).
Hence, we can conclude that:
Z
Z
1 1
1 1
yψ(1 + y)dy +
yψ(−1 + y)dy.
E[Y ψ(X + Y )] =
4 −1
4 −1
On the other hand, Corrolary A.1.4 implies that:
Z
1 1
1
2
0
E[(1 − Y )ψ (X + Y )] =
E[(1 − y 2 )ψ 0 (X + y)]dy
2
4 −1
Z
Z
1 1
1 1
2
0
(1 − y )ψ (1 + y)dy +
(1 − y 2 )ψ 0 (−1 + y)dy,
=
8 −1
8 −1
where we used for the last equality that X is a symmetric ±1-valued random variable.
Using integration by parts, we see the following holds for every x ∈ R:
Z
Z
1
1 1
1
1 1
2
0
2
(1 − y )ψ (x + y)dy =
ψ(x + y)(−2y)dy
(1 − y )ψ(x + y) −1 −
2 −1
2
2 −1
Z 1
yψ(x + y)dy.
=
−1
The above equality specifically holds for x = 1 and x = −1. Therefore, we can
conclude that:
Z
Z
1
1 1
1 1
2
0
E[(1 − Y )ψ (X + Y )] =
yψ(1 + y)dy +
yψ(−1 + y)dy
2
4 −1
4 −1
= E[Y ψ(X + Y )].
This finishes the proof of the lemma.
We will now give a proof of Theorem 2.1.1.
CHAPTER 2. A NORMAL APPROXIMATION THEOREM
42
Proof. For simplicity, let us write S instead of Sn . Let Y be a random variable,
independent of 1 , . . . , n and uniformly distributed on the interval [−1, 1]. To ease
the notation, the conditional expectation given 1 , . . . , n−1 will be denoted by E−
and we will write:
n−1
X
−
S =
i , X = n .
i=1
Let ψ be an arbitrary Lipschitz function. We will show that:
E− [Xψ(S − + X + Y )] = E− [(1 − XY )ψ 0 (S + Y )].
(2.1)
Let δ = (δ1 , . . . , δn−1 ) ∈ {−1, 1}n−1 and define:
"
!
#
n−1
X
g(δ) := E Xψ
i + X + Y 1 = δ1 , . . . , n−1 = δn−1 .
i=1
P
Let ψδ (x) := ψ( n−1
i=1 δi + x), for x ∈ R. Since (X, Y ) and (1 , . . . , n−1 ) are independent, we can use Theorem A.1.14, and we get:
"
!#
n−1
X
g(δ) = E Xψ
δi + X + Y
i=1
= E[Xψδ (X + Y )]
= E[(1 − XY )ψδ0 (X + Y )]
"
!#
n−1
X
= E (1 − XY )ψ 0
δi + X + Y
i=1
"
= E (1 − XY )ψ 0
n−1
X
i=1
!
#
i + X + Y 1 = δ1 , . . . , n−1 = δn−1 ,
where in the third step, we have used Lemma 2.1.2. By Remark A.1.11, this implies
that:
"
!
#
n−1
X
E− [Xψ(S − + X + Y )] = E Xψ
i + X + Y 1 , . . . , n−1
i=1
= g(1 , . . . , n−1 )
"
= E (1 − XY )ψ 0
n−1
X
i=1
−
!
#
i + X + Y 1 , . . . , n−1
0
= E [(1 − XY )ψ (S + Y )],
and (2.1) has been proved. Using that X = n and taking the expectation on both
sides of equation (2.1), we get by Proposition A.1.12 (ii) that:
E[n ψ(S + Y )] = E[(1 − n Y )ψ 0 (S + Y )].
CHAPTER 2. A NORMAL APPROXIMATION THEOREM
43
d
Since (i , S, Y ) = (n , S, Y ) for any 1 ≤ i < n, we obtain that:
E[i ψ(S + Y )] = E[(1 − i Y )ψ 0 (S + Y )],
for all 1 ≤ i ≤ n. Taking the sum, we get that:
E[Sψ(S + Y )] = E[(n − SY )ψ 0 (S + Y )].
(2.2)
Next, let g̃(δ) := E[Y ψ(S − +X +Y )k1 = δ1 , . . . , n−1 = δn−1 ], then again by Theorem
A.1.14 we get that:
"
!#
n−1
X
g̃(δ) = E Y ψ
δi + X + Y
= E[Y ψδ (X + Y )].
i=1
Using the second part of Lemma 2.1.2, we get that:
1
E[(1 − Y 2 )ψδ0 (X + Y )]
2 "
!#
n−1
X
1
=
E (1 − Y 2 )ψ 0
δi + X + Y
2
i=1
E[Y ψδ (X + Y )] =
=
1
E[(1 − Y 2 )ψ 0 (S + Y )k1 = δ1 , . . . , n−1 = δn−1 ].
2
Note that in the last step we used again Theorem A.1.14. Combining the above with
Remark A.1.11, we obtain that:
1
E− [Y ψ(S + Y )] = g̃(1 , . . . , n−1 ) = E− [(1 − Y 2 )ψ 0 (S + Y )].
2
Taking expectations of both sides yields that E[Y ψ(S + Y )] = 21 E[(1 − Y 2 )ψ 0 (S + Y )].
Combining this with equation (2.2), we get that:
1−Y2
0
E[(S + Y )ψ(S + Y )] = E n − SY +
ψ (S + Y ) .
2
2
Thus, putting S̃ = S +Y and T = n−SY + 1−Y
, we have that T is a Stein coefficient
2
2
for S̃. Let σ = n, then using the trivial inequality (a + b)2 ≤ 2a2 + 2b2 , we obtain:
2 2
(T − σ )
=
σ2
−SY +
n
1−Y 2
2
2
2S 2 Y 2 + 2
≤
n
Pn
1−Y 2
2
2
2S 2 + 1/2
≤
.
n
By definition of S̃, we have that E[S̃] = i=1 E[i ] + E[Y ] = 0. Furthermore, by
independence of 1 , . . . , n and Y we obtain that:
!
Z
n
n
X
X
1 1 2
2
2
2
E[S̃ ] = Var
i + Y =
E[i ] + E[Y ] = n +
y dy = n + 1/3 < ∞.
2
−1
i=1
i=1
CHAPTER 2. A NORMAL APPROXIMATION THEOREM
44
Moreover, by definition of T it is clear that:
|T | ≤ n + |S||Y | +
1−Y2
≤ n + n.1 + 1/2 = 2n + 1/2.
2
Hence, |T | is almost surely bounded by a constant, and all conditions of Theorem
1.0.2 are satisfied. Therefore we can construct a version of S̃ and a random variable
Z ∼ N (0, σ 2 ) on the same probability space such that for all θ ∈ R it holds that:
2
2θ (T − σ 2 )2
.
E exp(θ|S̃ − Z|) ≤ 2E exp
σ2
Assuming that the underlying p-space is rich enough, there exist independent random
d P
variables S and Y such that S̃ = S + Y , S = ni=1 i and Y ∼ U [−1, 1]. For more
details we refer to Theorem A.1.8 in the appendix. Clearly |S − S̃| ≤ 1. Let θ ∈ R
be arbitrary, then it is clear that θ|S − Z| ≤ |θ||S − S̃| + |θ||S̃ − Z|. Consequently,
we get:
E exp(θ|S − Z|) ≤ E[exp(|θ||S − S̃|) exp(|θ||S̃ − Z|)]
≤ exp |θ| E exp(|θ||S̃ − Z|)
2
2θ (T − σ 2 )2
≤ 2 exp |θ| E exp
.
σ2
Using the bound on (T − σ 2 )2 /σ 2 obtained above, we have:
2
2S + 1/2
2
E exp(θ|S − Z|) ≤ 2 exp |θ| E exp 2θ
n
2
= 2 exp(|θ| + θ /n)E exp 4θ2 S 2 /n .
2 2
Let V be
√ a standard
√ normal random variable independent of S, then E exp(4θ S /n) =
E exp( 8θV S/ n). To prove this equality, first note that by independence of V and
S we can use Corollary A.1.4, which implies that:
Z
√
√
√
√
E exp( 8θV S/ n) =
E[exp( 8θV s/ n)]PS (ds).
R
Using the well-known
√ formula
√ for the mgf2of2 a standard normal random variable, we
obtain that E exp 8θV s/ n = exp (4θ s /n) for any s ∈ R. Therefore:
2 2
2 2
Z
√
√
4θ s
4θ S
E exp( 8θV S/ n) =
exp
PS (ds) = E exp
.
n
n
R
√
√
√
√
n
We will
that
E
exp(
8θV
S/
n)
=
E[cosh
(
8θV
/
n)]. Let g(z) :=
√ now show
√
E[exp( 8θV S/ n)kV = z]. Then, using the independence of V and S, Theorem
CHAPTER 2. A NORMAL APPROXIMATION THEOREM
45
√
√
A.1.14 implies that g(z) = E exp( 8θzS/ n). Using that 1 , . . . , n are i.i.d. symmetric ±1-valued random variables, we obtain:
#
" n
n
Y
Y
√
√
√
√
E exp( 8θzi / n)
g(z) = E
exp( 8θzi / n) =
i=1
i=1
√
√
= E[exp( 8θz1 / n)]n
n
√
√
√
√
1
1
=
exp( 8θz/ n) + exp(− 8θz/ n)
2
2
√
√
= coshn ( 8θz/ n).
Hence, by Remark A.1.11, we can conclude that:
√
√
√
√
E[exp( 8θV S/ n)kV ] = g(V ) = coshn ( 8θV / n).
If we take the expectation
can conclude by Proposition A.1.12
√ we √
√
√on both sides,n then
(
8θV
/
n)]. Thus, we have shown that
(ii) that E exp( 8θV S/ n)√ = E[cosh
√
n
2 2
E exp(4θ S /n) = E[cosh ( 8θV / n)]. Using the power series of the hyperbolic
P∞ x2k
P∞ x2k
2
cosine, we have the simple inequality cosh x =
k=0 (2k)! ≤
k=0 k! = exp(x ).
Consequently, if 16θ2 < 1, we have that:
1
E exp(4θ2 S 2 /n) ≤ E[expn (8θ2 V 2 /n)] = E exp(8θ2 V 2 ) = √
,
1 − 16θ2
(2.3)
where the last equality can be seen as follows. Since V is a standard Gaussian random
variable, we know that V 2 ∼ gamma(1/2, 2). Thus the moment generating function
of V 2 is given by:
1
1
if t < .
R(t) = √
2
1 − 2t
This expression for R(t) also follows from Proposition A.3.7 in the appendix. Hence,
1
2
E exp(8θ2 V 2 ) = R(8θ2 ) = √1−16θ
< 1. Choosing θ0 such that 0 < θ0 < 1/4,
2 if 16θ
we can conclude that:
E exp(θ0 |S − Z|) ≤ 2 exp(|θ0 | + θ02 /n)E exp(4θ02 S 2 /n)
1
≤ 2 exp(|θ0 | + θ02 /n) p
1 − 16θ02
1
=: κ.
≤ 2 exp(|θ0 | + θ02 ) p
1 − 16θ02
This finishes the proof.
2.2
Conditional version of the normal approximation theorem
The purpose of this section is to prove the theorem below. It can be considered as a
conditional version of Theorem 2.1.1.
CHAPTER 2. A NORMAL APPROXIMATION THEOREM
46
Theorem 2.2.1. Let 1 , 2 , . . . , n be n arbitrary elements of {−1, 1}. Let
P π be a
uniform random permutation of {1, 2, . . . , n}. For each 1 ≤ k ≤ n, let Sk = kl=1 π(l) ,
and let
kSn
.
Wk = Sk −
n
There exist universal constants M > 1, c > 1 and θ0 > 0 satisfying the following.
Take any n ≥ 3, any possible value of Sn , and any n/3 ≤ k ≤ 2n/3. It is possible
to construct a version of Wk and a Gaussian random variable Zk with mean 0 and
variance k(n − k)/n on the same probability space such that for any θ ≤ θ0 :
cθ2 Sn2
.
E exp(θ|Wk − Zk |) ≤ M exp 1 +
n
Note that Sn is not random. Since 1 , 2 , . . . , n are fixed, Sn is simply an element
of {−n, . . . , −1, 0, 1, . . . , n}. Theorem 2.2.1 can be seen as a conditional version of
Theorem 2.1.1, since we assume the value of Sn to be known. Thus, actually, we are
conditioning on Sn .
In order to give a proof of Theorem 2.2.1, we need some auxiliary results. We will
start with the following lemma.
Lemma 2.2.2. Let us continue with the notation of Theorem 2.2.1. Then for any
θ ∈ R and any 1 ≤ k ≤ n, we have:
√
E exp(θWk / k) ≤ exp θ2 .
Remark 2.2.3. Note that the bound does not depend on the value of Sn . This will
be a crucial point in the proof of the next lemma and in the induction step in Section
3.1.
Proof. Since Wn = 0, the lemma is obviously
true for k = n. Fix
√ k such that
√
1 ≤ k < n, and let m(θ) := E exp(θWk / k). Then m(θ) = R(θ/ k), where R
denotes the moment generating function of Wk . By definition of Wk , it is clear that:
k
k
|Sn | ≤ k + n = 2k < ∞.
n
n
Thus Wk is a bounded random variable, which implies that R is differentiable and
√
√
1
1
m0 (θ) = √ R0 (θ/ k) = √ E[Wk exp(θWk / k)].
k
k
|Wk | ≤ |Sk | +
For a detailed proof of this result, we refer to Proposition A.3.4 in the appendix. Note
that:
P
P
n
k
(n − k) ki=1 π(i) − k nj=k+1 π(j)
1X X
(π(i) − π(j) ) =
n i=1 j=k+1
n
P
P
(n − k) ki=1 π(i) − k Sn − ki=1 π(i)
=
n
Pk
n i=1 π(i) − kSn
kSn
=
= Sk −
= Wk .
n
n
CHAPTER 2. A NORMAL APPROXIMATION THEOREM
47
Consequently,
k
n
√
1 X X
m (θ) = √
E[(π(i) − π(j) ) exp(θWk / k)].
n k i=1 j=k+1
0
(2.4)
Now fix i and j, such that i ≤ k < j, and let τij denote the transposition of i and j. Let
π 0 = π ◦τij , then it is obvious that π 0 (i) = π(j) and π 0 (j) = π(i). Moreover, π 0 is again
uniformly distributed on the set of all permutations of {1, . . . , n}. This can be seen
as follows. The random permutation π : Ω → {p | p is a permutation of {1, . . . , n}}
is uniformly distributed. Let p be an arbitrary permutation of {1, . . . , n}, then P{π =
p} = 1/n!. Thus:
P{π 0 = p} = P{ω ∈ Ω | π(ω) ◦ τij = p}
= P{ω ∈ Ω | π(ω) = p ◦ τij }
= P{π = p ◦ τij } = 1/n!.
Thus, we have shown that π 0 is uniformly distributed too. Let
Wk0 =
k
X
π0 (l) −
l=1
kSn
.
n
Since π and π 0 have the same distribution, and since π 0 (i) = π(j) and π 0 (j) = π(i),
we have that:
√
√
E[(π(i) − π(j) ) exp(θWk / k)] = E[(π0 (i) − π0 (j) ) exp(θWk0 / k)]
√
= E[(π(j) − π(i) ) exp(θWk0 / k)].
Therefore, it is clear that:
√
2E[(π(i) − π(j) ) exp(θWk / k)]
√
√
= E[(π(i) − π(j) ) exp(θWk / k)] − E[(π(i) − π(j) ) exp(θWk0 / k)],
or equivalently:
√
√
√
1
E[(π(i) − π(j) ) exp(θWk / k)] = E[(π(i) − π(j) )(exp(θWk / k) − exp(θWk0 / k))].
2
By straightforward reasoning, one can show that:
1
|ex − ey | ≤ |x − y|(ex + ey ).
2
For a verification of this inequality, we refer to Proposition A.3.6 in the appendix.
Furthermore, it is also clear that:
Wk −
Wk0
=
k
X
m=1
π(m) −
k
X
l=1
π0 (l) =
k
X
m=1
π(m) −
k
X
l=1,l6=i
π(l) − π(j) = π(i) − π(j) .
CHAPTER 2. A NORMAL APPROXIMATION THEOREM
48
Combining all these results, we can conclude that:
√
|E[(π(i) − π(j) ) exp(θWk / k)]|
√ i
√
1 h
≤ E |π(i) − π(j) | exp(θWk / k) − exp(θWk0 / k)
2 √
√ θWk θWk0 1
0
≤ E |π(i) − π(j) | √ − √ exp(θWk / k) + exp(θWk / k)
4
k
k
h
√
√ i
|θ|
2
0
exp(θWk / k) + exp(θWk / k)
= √ E π(i) − π(j)
4 k
√
√
√
|θ|
2|θ|
2|θ|
≤ √ E[exp(θWk / k) + exp(θWk0 / k)] = √ E exp(θWk / k) = √ m(θ).
k
k
k
Note that for the last inequality we used that |π(i) −π(j) | ≤ 2, since all l are elements
of {−1, 1}. If we use the obtained bound together with equation (2.4), it follows that:
k
n
2|θ| X X
|m (θ)| ≤
m(θ) ≤ 2|θ|m(θ).
nk i=1 j=k+1
0
We will use this inequality to complete the proof. First of all, we note that m(θ) >
0. This can be easily seen as follows. Since the
√ exponential function is (strictly)
positive, it is obvious that m(θ) = E exp(θWk / √k) ≥ 0. Assume m(θ) = 0, then by
Proposition A.1.5 it would follow that exp(θWk / k) = 0 a.s., which is a contradiction.
Hence, m(θ) > 0. We will now consider two cases.
Case 1: θ ≥ 0. Clearly, for all u ≥ 0, it holds that:
m0 (u) ≤ |m0 (u)| ≤ 2|u|m(u) = 2u m(u),
or equivalently
m0 (u)
m(u)
≤ 2u. Since θ ≥ 0, we have that:
Z θ 0
Z θ
m (u)
du ≤
2u du = θ2 .
m(u)
0
0
Since m(0) = 1, the left-hand side of this inequality can be rewritten as:
Z θ
(log m(u))0 du = log m(θ) − log m(0) = log m(θ).
0
Thus, we have shown that log m(θ) ≤ θ2 . Applying the exponential function on
both sides, gives that m(θ) ≤ exp (θ2 ).
Case 2: θ < 0. Obviously, for all u ≤ 0, we have that:
−m0 (u) ≤ |m0 (u)| ≤ 2|u|m(u) = −2u m(u),
or equivalently
m0 (u)
m(u)
≥ 2u. Since θ < 0, it follows that:
Z 0 0
Z 0
m (u)
− log m(θ) =
du ≥
2u du = −θ2 .
θ m(u)
θ
Thus, log m(θ) ≤ θ2 , and consequently m(θ) ≤ exp (θ2 ).
CHAPTER 2. A NORMAL APPROXIMATION THEOREM
49
This finishes the argument.
Lemma 2.2.4. Let all notation be as in the statement of Theorem 2.2.1. There exists
a universal constant α0 > 0 such that for all n, all possible values of Sn , all k ≤ 2n/3,
and all 0 < α ≤ α0 , we have:
3αSn2
2
E exp(αSk /k) ≤ exp 1 +
.
4n
Proof. Let Z be a standard normal random variable, independent of all other random
variables. Then by Corollary A.1.4, we have:
"
! Z
!#
r
r
2α
2α
E exp
E exp
ZSk =
Zs
PSk (ds).
k
k
R
Using q
the well-known
formula for the mgf of a standard normal rv, we obtain that
2α
E exp
Zs = exp(αs2 /k) for any s ∈ R. Therefore, we can conclude that:
k
r
E exp
2α
ZSk
k
!
Z
=
exp(αs2 /k)PSk (ds) = E exp(αSk2 /k).
R
By definition of Wk , we have proved that:
!
!
r
r
r
2α
2α
2α
kS
n
E exp(αSk2 /k) = E exp
ZSk = E exp
ZWk +
Z .
k
k
k n
p
Now let g(y) := E[exp( 2α/kZWk )kZ = y], for y ∈ R. Using Theorem A.1.14 and
Lemma 2.2.2, it follows that:
p
g(y) = E exp( 2α/kyWk ) ≤ exp(2αy 2 ).
Thus, by Remark A.1.11 we can conclude that:
p
E[exp( 2α/kZWk )kZ] = g(Z) ≤ exp(2αZ 2 ).
Combining this result with part (i) and (ii) of Proposition A.1.12, we obtain:
!
r
r
2α
2α kSn
E exp(αSk2 /k) = E exp
ZWk +
Z
k
k n
" "
!
! ##
r
r
2α
2α kSn
= E E exp
ZWk exp
Z Z
k
k n
! #
!#
" "
r
r
2α
2α kSn
ZWk Z
= E E exp
Z exp
k
k n
"
!#
r
2α
kS
n
Z
.
≤ E exp(2αZ 2 ) exp
k n
CHAPTER 2. A NORMAL APPROXIMATION THEOREM
50
Since Sn is a constant, the last expression is just the expectation of a function of
a standard normal random variable. Using Proposition A.3.7, we obtain that for
0 < α < 1/4:
αkSn2
1
2
exp
.
E exp(αSk /k) ≤ √
n2 (1 − 4α)
1 − 4α
Let k ≤ 2n/3, then obviously:
E exp(αSk2 /k)
1
exp
≤√
1 − 4α
2αSn2
3n(1 − 4α)
.
Let α0 := 1/36, then we have for every 0 < α ≤ α0 that:
1
2αSn2
2
E exp αSk /k ≤ √
exp
3n(1 − 4α0 )
1 − 4α0
r
9
3αSn2
=
exp
8
4n
3αSn2
≤ exp 1 +
,
4n
which completes the proof.
Using the above lemmas, we will now prove Theorem 2.2.1.
Proof. For simplicity, we shall write W for Wk and S for Sn . Sk will be written as
usual.
Let Y be a random variable which is uniformly distributed on the interval [−1, 1]
and independent of π. Fix i and j such that 1 ≤ i ≤ k and k < j ≤ n. The
conditional expectation given π(l), l 6= i, j will be denoted by E− . Furthermore, let:
S− =
X
l6=i,j
π(l)
and W − =
X
l≤k,l6=i
π(l) −
kS
.
n
Let ψ be an arbitrary Lipschitz function and let δ̂ = (δ1 , . . . , δˆi , . . . , δˆj , . . . , δn ) ∈ Rn−2
with δl , l 6= i, j different elements of {1, . . . , n}. Let:
g(δ̂) := E[(π(i) − π(j) )ψ(W + Y )kπ(l) = δl , l 6= i, j].
P
We will consider two cases. We will first assume that l6=i,j δl 6= S. It is obvious that
underPthis condition
P π(i) = π(j)
P. Otherwise π(i) + π(j) = 0, which would imply that
n
S = l=1 π(l) = l6=i,j π(l) = l6=i,j δl , since we have the information that π(l) = δl
for l 6= i, j. Thus, in this case, we can conclude by Theorem A.1.14 that:
g(δ̂) = E[(π(i) − π(j) )ψ(W + Y )kπ(l) = δl , l 6= i, j] = 0.
P
Secondly, we will assume that l6=i,j δl = S. Then there are only two possibilities.
The first one is that π(i) = 1 and π(j) = −1. The second possibility is that π(i) = −1
CHAPTER 2. A NORMAL APPROXIMATION THEOREM
51
and π(j) = 1. Each of these two possibilities occurs with probability one half, since
we work conditionally the information that π(l) = δl , l 6= i, j and since π is uniformly
distributed. Thus the conditional
distribution of (π(i) − π(j) )/2 is a symmetric ±1P
distribution. Moreover, if l6=i,j δl = S, it is easily seen that:
X :=
π(i) − π(j)
= π(i)
2
and
W = W − + X.
Thus, the random variable X conditionally that π(l) = δl , l 6= i, j with
has a symmetric ±1-distribution. It is now clear that:
P
l6=i,j δl
= S,
g(δ̂) = E[(π(i) − π(j) )ψ(W + Y )kπ(l) = δl , l 6= i, j]
= 2E[Xψ(W − + X + Y )kπ(l) = δl , l 6= i, j]
!#
"
X
kS
,
+ X̄ + Y
= 2E X̄ψ
δl −
n
l≤k,l6=i
where X̄ is a rv independent of Y with a symmetric
±1-distribution. Note that we
P
used Theorem A.1.14 in the last step. Since l≤k,l6=i δl − kS/n is a constant, we can
P
define ψδ̂ (x) := ψ
l≤k,l6=i δl − kS/n + x . Thus, using Lemma 2.1.2, we obtain
that:
g(δ̂) =
=
=
=
2E[X̄ψδ̂ (X̄ + Y )]
2E[(1 − X̄Y )ψδ̂0 (X̄ + Y )]
2E[(1 − XY )ψ 0 (W − + X + Y )kπ(l) = δl , l 6= i, j]
E[(2 − (π(i) − π(j) )Y )ψ 0 (W + Y )kπ(l) = δl , l 6= i, j],
where in the third step we used again Theorem A.1.14. Let:
aij := 1 − π(i) π(j) − (π(i) − π(j) )Y.
Then obviously:
2 − (π(i) − π(j) )Y
if π(i) =
6 π(j)
0
if π(i) = π(j) .
P
Thus, irrespectively of whether l6=i,j δl = S or not, we have:
aij =
g(δ̂) = E[aij ψ 0 (W + Y )kπ(l) = δl , l 6= i, j].
Hence, by Remark A.1.11 we can conclude that:
ˆ ..., π(j),
ˆ ..., π(n))
E− [(π(i) − π(j) )ψ(W + Y )] = g(π(1), ..., π(i),
= E− [aij ψ 0 (W + Y )].
CHAPTER 2. A NORMAL APPROXIMATION THEOREM
52
By Proposition A.1.12 (ii), taking the expectation on both sides leads to:
E[(π(i) − π(j) )ψ(W + Y )] = E[aij ψ 0 (W + Y )].
In the proof of Lemma 2.2.2 we have already shown that:
k
n
1X X
W =
(π(i) − π(j) ).
n i=1 j=k+1
This implies that:
k
n
1X X
E[(π(i) − π(j) )ψ(W + Y )]
E[W ψ(W + Y )] =
n i=1 j=k+1
k
n
1X X
E[aij ψ 0 (W + Y )]
=
n i=1 j=k+1
!
#
"
k
n
1X X
= E
aij ψ 0 (W + Y ) .
n i=1 j=k+1
We want to prove that E[Y ψ(W + Y )] = 21 E[(1 − Y 2 )ψ 0 (W + Y )]. In order to do so,
we define:
h(δ̂) := E[Y ψ(W + Y )kπ(l) = δl , l 6= i, j].
P
We will consider three different cases. First of all, we will assume that l6=i,j δl = S.
In this case, we can define X and X̄ as above. Then, by analogous arguments as
before, we can conclude that:
h(δ̂) = E[Y ψ(W − + X + Y )kπ(l) = δl , l 6= i, j]
!#
"
X
kS
+ X̄ + Y
= E Yψ
δl −
n
l≤k,l6=i
= E[Y ψδ̂ (X̄ + Y )].
Using the second equality of Lemma 2.1.2, we obtain that:
1
E[(1 − Y 2 )ψδ̂0 (X̄ + Y )]
2
1
=
E[(1 − Y 2 )ψ 0 (W + Y )kπ(l) = δl , l 6= i, j].
2
P
Secondly, we will consider the
case
where
l6=i,j δl = S − 2. This clearly implies that
P
π(i) = π(j) = 1. Let c := l≤k,l6=i δl + 1 − kS/n, then using integration by parts
h(δ̂) =
CHAPTER 2. A NORMAL APPROXIMATION THEOREM
53
gives us that:
1
1
E[(1 − Y 2 )ψ 0 (W + Y )kπ(l) = δl , l 6= i, j] =
E[(1 − Y 2 )ψ 0 (c + Y )]
2
2Z
1 1
=
(1 − y 2 )ψ 0 (c + y)dy
4 −1
Z
1 1
= −
ψ(c + y)(−2y)dy
4 −1
= E[Y ψ(c + Y )]
= E[Y ψ(W + Y )kπ(l) = δl , l 6= i, j].
P
Finally, we assume that l6=i,j δl = S + 2. This implies that π(i) = π(j) = −1.
Similarly, as in the previous case, it can be shown that:
1
h(δ̂) = E[(1 − Y 2 )ψ 0 (W + Y )kπ(l) = δl , l 6= i, j].
2
P
This means that this equality holds, irrespectively of the value of l6=i,j δl . Thus, by
Remark A.1.11, we see that:
ˆ . . . , π(j),
ˆ . . . , π(n))
E− [Y ψ(W + Y )] = h(π(1), . . . , π(i),
1 −
=
E [(1 − Y 2 )ψ 0 (W + Y )].
2
Taking the expectation on both sides, Proposition A.1.12 yields that:
1
E[Y ψ(W + Y )] = E[(1 − Y 2 )ψ 0 (W + Y )].
2
Put W̃ = W + Y and
k
n
1−Y2
1X X
aij +
.
T =
n i=1 j=k+1
2
Then, we have proved that:
E[W̃ ψ(W̃ )] = E[W ψ(W + Y )] + E[Y ψ(W + Y )]
"
!
#
k
n
1X X
1
aij ψ 0 (W + Y ) + E[(1 − Y 2 )ψ 0 (W + Y )]
= E
n i=1 j=k+1
2
= E[T ψ 0 (W̃ )].
Thus, T is a Stein coefficient for W̃ . Now, simply using the definition of aij , an easy
verification shows that:
P
P
k
n
k
n
i=1 π(i)
j=k+1 π(j)
1X X
k(n − k)
aij =
−
− W Y.
n i=1 j=k+1
n
n
CHAPTER 2. A NORMAL APPROXIMATION THEOREM
54
Let σ 2 = k(n − k)/n. Since n/3 ≤ k ≤ 2n/3, we have that:
kS
≤ |Sk | + k |S| ≤ |Sk | + 2 |S|.
|W | = Sk −
n n
3
By straightforward reasoning, we obtain:
P
 Pk
2
n
2
2 2
π(i)
π(j)
i=1
j=k+1
Y − 1
(T − σ )
n

+ WY +
=
2
σ
k(n − k)
n
2
P
 Pk
2
n
2
π(i)
π(j)
i=1
j=k+1
n
|Y − 1| 

≤
+ |W ||Y | +
k(n − k)
n
2
2
n
|Sk |n
1
≤
+ |W |1 +
k(n − k)
n
2
2
n
1
=
|Sk | + |W | +
k(n − k)
2
2
Since u 7→ u2 is a convex function, we have for all x, y, z ∈ R that x+y+z
≤
3
1
2
2
2
2
2
2
2
(x + y + z ), or equivalently that (x + y + z) ≤ 3(x + y + z ). Using this
3
inequality together with the bound on |W | obtained above, we get:
2
n
n
1
2
2
(|Sk | + |W | + 1/2) ≤
2|Sk | + |S| +
k(n − k)
k(n − k)
3
2
4
1
3n
4Sk2 + S 2 +
≤
k(n − k)
9
4
2
12n
1
S
2
=
Sk +
+
k(n − k)
9
16
2
2
Sk n
S
1
n2
n
= 12
+
+
k n − k 9n k(n − k) 16k (n − k)
Since it is assumed that n/3 ≤ k ≤ 2n/3, the following inequalities also hold:
n/3 ≤ n − k ≤ 2n/3,
n/k ≤ 3 and n/(n − k) ≤ 3.
Putting C := 36, we can conclude that:
2
n
3Sk S 2
3
2
(|Sk | + |W | + 1/2) ≤ 12
+
+
k(n − k)
k
n
16k
2
2
Sk S
≤ C
+
+1 .
k
n
Next, we will verify the assumptions of Theorem 1.0.2. Since π is uniformly distributed, we have for all 1 ≤ l ≤ n that:
E[π(l) ] = 1 P{π(l) = 1} + · · · + n P{π(l) = n} = 1
1
1
S
+ · · · + n = .
n
n
n
CHAPTER 2. A NORMAL APPROXIMATION THEOREM
55
Since Y is uniformly distributed on the interval [−1, 1], it holds that E[Y ] = 0, which
implies that:
E[W̃ ] = E[W ] + E[Y ] =
k
X
l=1
E[π(l) ] −
kS kS
kS
=
−
= 0.
n
n
n
Since Y and W are independent and since kS/n is a constant, we have that:
E[W̃ 2 ] = Var(W̃ ) = Var (Sk − kS/n) + Var(Y )
= Var(Sk ) + E[Y 2 ]
≤ E[Sk2 ] + 1/3 ≤ k 2 + 1/3 < ∞.
Finally, it is easy to see that:
|aij | = |1 − π(i) π(j) − (π(i) − π(j) )Y | ≤ 1 + 1 + 2|Y | ≤ 4.
Hence,
k
n
1X X
1−Y2
k(n − k)
1
|T | ≤
|aij | +
≤
4 + < ∞.
n i=1 j=k+1
2
n
2
We have shown that all conditions of Theorem 1.0.2 are met. Thus we can construct
a version of W̃ and a random variable Z ∼ N (0, σ 2 ) on the same probability space,
such that for all θ ∈ R:
2 2
2 (T − σ )
E exp(θ|W̃ − Z|) ≤ 2E exp 2θ
.
σ2
On that same p-space, we can construct a random variable W having the same distribution as Sk − kSn /n, and a random variable Y ∼ U [−1, 1] independent of W , such
that W̃ = W + Y . Indeed, assuming that the underlying p-space is rich enough, this
is a simple application of Theorem A.1.8. Then, it clearly holds that |W − W̃ | ≤ 1.
Combining these results, we obtain that:
E exp(θ|W − Z|) ≤ E exp(|θ| + |θ||W̃ − Z|)
2 2
2 (T − σ )
.
≤ exp(|θ|)2E exp 2θ
σ2
Using the bound on (T − σ 2 )2 /σ 2 obtained above, and putting c̃ = 2C, we see that:
2
Sk S 2
2
E exp(θ|W − Z|) ≤ 2 exp (|θ|) E exp 2θ C
+
+1
k
n
2
2
2S
2
2 Sk
= 2 exp |θ| + c̃θ
+ c̃θ E exp c̃θ
n
k
By Lemma 2.2.4 there exists a universal constant α0 > 0, such that for 0 < c̃θ2 ≤ α0 :
3c̃θ2 S 2
2 2
E exp(c̃θ Sk /k) ≤ exp 1 +
.
4n
CHAPTER 2. A NORMAL APPROXIMATION THEOREM
Thus when 0 < θ ≤
p
56
α0 /c̃ =: θ0 , it holds that:
3 c̃θ2 S 2
c̃θ2 S 2
2
+ c̃θ + 1 +
E exp(θ|W − Z|) ≤ 2 exp |θ| +
n
4 n
2
7 θ S2
2
= 2 exp |θ| + c̃θ exp 1 + c̃
4 n
2 2
θ S
≤ M exp 1 + c
,
n
p
where M := 2 exp
α0 /c̃ + α0 and c := 7c̃/4 are universal constants. This finishes
the proof.
Chapter 3
The KMT theorem for the SRW
The goal of this chapter is to give a proof of Theorem 2 from the introduction. We
will proceed as follows. In Section 3.1 we will carry out an induction step. In Section
3.2 we will then use this result in order to give a proof of the KMT theorem for the
simple random walk.
3.1
The induction step
The purpose of this section is to prove the theorem below, which provides a coupling
of a pinned random walk with a Brownian bridge. We will make use of Theorem 2.2.1
and an induction argument.
Theorem 3.1.1. Let 1 , 2 , . . . , n be n arbitrary elements of {−1, 1}. Let
Pk π be a
uniform random permutation of {1, 2, . . . , n}. For each 1 ≤ k ≤ n, let Sk = l=1 π(l) ,
and let
kSn
.
Wk = Sk −
n
There exist strictly positive constants C, K and λ0 such that the following is true.
For all n ≥ 2, and any possible value of Sn , it is possible to construct a version of
W0 , W1 , . . . , Wn and random variables Z0 , Z1 , . . . , Zn with a joint Gaussian distribution, mean zero and
(i ∧ j)(n − (i ∨ j))
Cov(Zi , Zj ) =
n
on the same probability space such that for any 0 < λ < λ0 ,
Kλ2 Sn2
.
E exp(λ max |Wi − Zi |) ≤ exp C log n +
i≤n
n
Remark 3.1.2. Note that the theorem above states that we can make such couplings
for all possible values of Sn . Since this concept turns up several times in the proof
below, we give an example of those ’possible values’. Take n = 4, then the possible
values for S4 are:
{−4, −2, 0, 2, 4}.
57
CHAPTER 3. THE KMT THEOREM FOR THE SRW
58
Notation 3.1.3. Before giving a proof of Theorem 3.1.1, we first introduce some
notation concerning several measures. Let µ denote the counting measure on Z and
λ the Lebesgue measure on R. Then we define:
Qn,1
Qn,2
Qn
Rn
:=
:=
:=
:=
µn+1
µ ⊗ λn−1 ⊗ µ
Qn,1 ⊗ Qn,2
(µ ⊗ λ) ⊗ Qk ⊗ Qn−k .
We can prove Theorem 3.1.1 as follows.
Proof. Recall the universal constants α0 from Lemma 2.2.4 and M, c and θ0 from
Theorem 2.2.1. Let:
r
α0
θ0
1 + log(2M 1/2 )
∧ , and C ≥
.
(3.1)
K = 8c, λ0 ≤
16c 2
log(3/2)
We claim that these constants are sufficient for carrying out the induction step. We
will prove this claim by induction on n. For all n and each possible value a of Sn , let
fan (s) denote the discrete probability density function of the sequence (S0 , S1 , . . . , Sn ).
Using the same notation as in 3.1.3, we see that fan (s) is actually a Qn,1 -density
function. Since the random permutation π is uniformly distributed, it is easy to see
that fan (s) is just the uniform distribution over Ana , where:
Ana := s ∈ Zn+1 | s0 = 0, sn = a, and |si − si−1 | = 1, ∀i .
Hence, for any s ∈ Ana , we have:
fan (s) =
1
.
|Ana |
Note that, using some combinatorics, it is easy to verify that
|Ana |
=
a+n . Next, let
n
2
β n (z) denote the Qn,2 -density function of a Gaussian random vector (Z0 , Z1 , . . . , Zn )
with mean zero and covariance:
Cov(Zi , Zj ) =
(i ∧ j)(n − (i ∨ j))
.
n
We want to prove that for each n ≥ 2, and every possible value a of Sn , we can
construct a joint Qn -density ρna (s, z) on Zn+1 × Rn+1 such that:
Z
Z
n
n
ρa (s, z)Qn,2 (dz) = fa (s),
ρna (s, z)Qn,1 (ds) = β n (z),
(3.2)
and for each 0 < λ < λ0 :
Z
2 2
n
ia
Kλ
a
exp λ max si − − zi ρa (s, z) Qn (ds, dz) ≤ exp C log n +
.
i≤n
n
n
CHAPTER 3. THE KMT THEOREM FOR THE SRW
59
If we can construct such densities, one can take a random vector (S, Z) following the
, we have constructed a version of W0 , . . . , Wn
density ρna . Then, setting Wi = Si − ia
n
and random variables Z0 , . . . , Zn satisfying all assumptions mentioned in Theorem
3.1.1. Hence, this would complete the proof.
For the induction base, we refer to the end of the proof. Suppose ρka can be
constructed for k = 2, . . . , n − 1, for all possible values of a in each case. We will now
show how ρna can be constructed when a is an allowed value for Sn .
First, fix a possible value a of Sn and an index k such that n/3 ≤ k ≤ 2n/3 (for
definiteness, take k = [n/2]). Given Sn = a, let gan,k denote the µ-density function
of Sk . We will use a simple counting argument, in order to determine gan,k (s). If we
know that Sn = a, then the number of possible s ∈ Zn+1 values for (S0 , S1 , . . . , Sn )
for which Sk = sk = s is given by:
|{s ∈ Zn+1 | s0 = 0, sk = s, sn = a, |si − si−1 | = 1, ∀i}|.
This expression can be rewritten as:
|{s ∈ Zk+1 | s0 = 0,sk = s, |si − si−1 | = 1, ∀i}×
{r ∈ Zn−k+1 | r0 = s, rn−k = a, |ri − ri−1 | = 1, ∀i}|.
Obviously, this is equal to the product of
|{s ∈ Zk+1 | s0 = 0, sk = s, |si − si−1 | = 1, ∀i}| = |Aks |
and
n−k
|{r ∈ Zn−k+1 | r0 = 0, rn−k = a − s, |ri − ri−1 | = 1, ∀i}| = |Aa−s
|.
Thus, given Sn = a, the number of possible s ∈ Zn+1 values for (S0 , S1 , . . . , Sn ) for
n,k
which Sk = sk = s is given by |Aks ||An−k
a−s |. To obtain ga (s) this expression still has
to be divided by the total number of all possibilities:
| s ∈ Zn+1 | s0 = 0, sn = a, and |si − si−1 | = 1, ∀i | = |Ana |.
Hence, we can conclude that for all allowed values s of Sk , we have:
gan,k (s) =
n−k
|Aks ||Aa−s
|
.
n
|Aa |
(3.3)
Notice that this is the density function
of a hypergeometric distribution. This can
n
be easily seen, using that |Ana | = a+n which we already mentioned before. Next,
2
let hn,k (z) denote the Lebesgue density function of the 1-dimensional Gaussian distribution with mean zero and variance k(n−k)
. By Theorem 2.2.1, Lemma A.1.15 and
n
Lemma A.1.16, there exists a joint (µ ⊗ λ)-density function ψan,k (s, z) on Z × R such
that:
Z
Z
n,k
n,k
ψan,k (s, z) µ(ds) = hn,k (z),
(3.4)
ψa (s, z) dz = ga (s),
CHAPTER 3. THE KMT THEOREM FOR THE SRW
and for all θ ≤ θ0 :
Z
cθ2 a2
ka
n,k
− z ψa (s, z) (µ ⊗ λ)(ds, dz) ≤ M exp 1 +
.
exp θ s −
n
n
60
(3.5)
Indeed, by Theorem 2.2.1 we can construct a version of Wk and a random variable
) on the same probability space, such that for all θ ≤ θ0 :
Zk ∼ N (0, k(n−k)
n
cθ2 Sn2
.
E exp(θ|Wk − Zk |) ≤ M exp 1 +
n
Note that in the statement of Theorem 2.2.1 it is assumed that n ≥ 3. In this
Theorem we assume that n ≥ 2, but since we are already in the induction step, the
assumptions of Theorem 2.2.1 are satisfied. Furthermore, let Sk = Wk + ka
and let
n
ψan,k be the joint density function of (Sk , Zk ). This (µ ⊗ λ)-density function exists
because of Lemma A.1.16. Then, obviously ψan,k meets all the assumptions mentioned
above. Next, define a function γan : Z × R × Zk+1 × Rk+1 × Zn−k+1 × Rn−k+1 → R as
follows:
n−k
γan (s, z, s, z, s’, z’) := ψan,k (s, z)ρks (s, z)ρa−s
(s’, z’).
(3.6)
Note that ρks and ρn−k
a−s exist by the induction hypothesis. Actually, they only exist
for allowed values s and a − s for Sk and Sn−k respectively. However, if s is not
a possible value for Sk , then ψan,k (s, z) = 0. If a − s is not an allowed value for
Sn−k , then either a is not allowed for Sn or s is not allowed for Sk . In both of these
cases ψan,k (s, z) = 0. Taking this into account, it is easy to verify that γan is an Rn probability density function. Indeed, integrating the expression in (3.6) over s’, z’,
gives us ψan,k (s, z)ρks (s, z). Then, by integrating over s, z, we obtain ψan,k (s, z). And
finally by integrating over s, z, we see that:
Z
γan (s, z, s, z, s’, z’) Rn (ds, dz, ds, dz, ds’, dz’) = 1.
Let (S, Z, S, Z, S’, Z’) be a random vector on (Ω, F, P) following the density γan .
In fact, this means the following. First, we generate (S, Z) from the joint density
ψan,k . Then, given S = s and Z = z, we are independently generating the pairs (S, Z)
n−k
and (S’, Z’) from the joint densities ρks and ρa−s
respectively.
n+1
Now define two random vectors Y ∈ R
and U ∈ Zn+1 as follows. For i ≤ k,
let:
i
Yi = Zi + Z,
k
and for i ≥ k, let:
n−i
0
Yi = Zi−k
+
Z.
n−k
Note that these definitions coincide at i = k, since Zk = Z00 = 0. To see that Zk = 0,
one can simply integrate the density function γan over s’, z’ first, then over s and finally
over s, z. In this way, Lemma A.1.15 implies that the joint density function of Z is
given by β k . By definition of β k , we obtain that Zk has a normal distribution with
CHAPTER 3. THE KMT THEOREM FOR THE SRW
61
mean zero and variance k(k−k)
= 0. Hence E[Zk2 ] = 0, and thus Zk = 0 almost surely.
k
Besides, we can determine the probability density function of Z00 by first integrating
γan over s, z, then over s’ and finally over s, z. We obtain that β n−k is the joint density
function of Z’. Hence, Z00 follows a normal distribution with mean zero and variance
0(n−k−0)
= 0. Thus E[Z002 ] = 0, which implies that Z00 = 0 almost surely. Therefore,
n−k
we have shown the two definitions of Yi match at i = k. Next, define
Ui = Si ,
for i ≤ k and
0
,
Ui = S + Si−k
for i ≥ k. Also here, the definitions coincide at i = k, since Sk = S and S00 = 0. This
is already intuitively clear since Lemma A.1.15 implies that (S, Z) and (S’, Z’) have
conditional densities ρks and ρn−k
a−s , given S = s and Z = z. We can also give a formal
explanation. Lemma A.1.15 states that, in order to determine the density function
of Sk , we simply integrate γan over s, z, s0 , . . . , sk−1 , z, s’z’, which gives:
Z
Z
n,k
k
k
ga (s)
fs (s) µ (ds0 , . . . , dsk−1 ) µ(ds).
Zk
Z
By definition of fsk , the expression between the parentheses is the density function of
the constant random variable Sk = s evaluated at sk . Thus the density function of
Sk is given by:
Z
gan,k (s)I{s} (sk ) µ(ds) = gan,k (sk ),
Z
d
which is the density function of S. Hence, Sk = S. An analogous reasoning can be
made in order to see that S00 = 0 almost surely. We claim that the joint density of
(U, Y) is a valid candidate for ρna . We will show this in three different steps.
1. Marginal distribution of U. By Lemma A.1.15, we obtain the density function of (S, S, S') as follows:

∫ γ_a^n(s, z, s, z, s', z') (λ ⊗ Q_{k,2} ⊗ Q_{n−k,2})(dz, dz, dz')
  = ( ∫ ψ_a^{n,k}(s, z) dz ) ( ∫ ρ_s^k(s, z) Q_{k,2}(dz) ) ( ∫ ρ_{a−s}^{n−k}(s', z') Q_{n−k,2}(dz') )
  = g_a^{n,k}(s) f_s^k(s) f_{a−s}^{n−k}(s'),

where we have used the definition of γ_a^n and equations (3.2) and (3.4). This means that the distribution of the triplet (S, S, S') can be described in the following way. First generate S from the density g_a^{n,k}; this means S has the distribution of S_k given S_n = a. Next, independently generate S and S' from the conditional densities f_s^k and f_{a−s}^{n−k} respectively. Using the expression for g_a^{n,k}(s) obtained in (3.3) and the calculation above, we see that the joint density of (S, S, S') is given by:

g_a^{n,k}(s) f_s^k(s) f_{a−s}^{n−k}(s') = (|A_s^k| |A_{a−s}^{n−k}| / |A_a^n|) · (1/|A_s^k|) · (1/|A_{a−s}^{n−k}|) = 1/|A_a^n|.

By definition of U it is easy to verify that there is a one-to-one correspondence between (S, S, S') and U. This implies that every possible value of U occurs with probability 1/|A_a^n|. If we can show that U can take all values in A_a^n, then we know U has density f_a^n. Assume the value of S (= S_k) is known and given by b. Then S = (S_0, . . . , S_k) ∼ f_b^k and S' = (S'_0, . . . , S'_{n−k}) ∼ f_{a−b}^{n−k}. Thus (U_0, . . . , U_k) = (S_0, . . . , S_k) can take all values in:

A_b^k = {s ∈ Z^{k+1} | s_0 = 0, s_k = b, |s_i − s_{i−1}| = 1, ∀i},

and (U_k, . . . , U_n) = (b, b + S'_1, . . . , b + S'_{n−k}) can take all values in:

{(s_k, . . . , s_n) ∈ Z^{n−k+1} | s_k = b, s_n = a, |s_i − s_{i−1}| = 1, ∀i}.

This means U can take all values in:

{s ∈ Z^{n+1} | s_0 = 0, s_k = b, s_n = a, |s_i − s_{i−1}| = 1, ∀i}.

If we let b vary over all possible values of S = S_k, we can conclude that U can take all values in A_a^n. This finishes the argument.
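The concatenation argument above can be checked by brute force on a small example. The following sketch (not part of the thesis; the parameters are arbitrary choices for illustration) enumerates all bridge paths in A_a^n, builds U by first drawing S = S_k given S_n = a and then gluing two independent uniformly distributed bridges, and verifies that every path in A_a^n receives probability 1/|A_a^n|.

```python
# A brute-force check of Step 1 on a toy example (not part of the thesis):
# drawing S = S_k given S_n = a and gluing two independent uniform bridges
# reproduces the uniform distribution on A_a^n.
from itertools import product
from fractions import Fraction

n, k, a = 6, 3, 2                      # toy parameters, k <= 2n/3 and a allowed for S_n

def bridges(length, end):
    """All +/-1 paths (s_0=0, ..., s_length) with s_length = end, i.e. A_end^length."""
    paths = []
    for eps in product([-1, 1], repeat=length):
        s = [0]
        for e in eps:
            s.append(s[-1] + e)
        if s[-1] == end:
            paths.append(tuple(s))
    return paths

A_na = bridges(n, a)
law = {}
for b in {p[k] for p in A_na}:                     # possible values of S_k given S_n = a
    left, right = bridges(k, b), bridges(n - k, a - b)
    p_b = Fraction(len(left) * len(right), len(A_na))          # g_a^{n,k}(b)
    for L in left:
        for R in right:
            u = L + tuple(b + r for r in R[1:])                # U_i = S + S'_{i-k}, i >= k
            law[u] = law.get(u, Fraction(0)) + p_b / (len(left) * len(right))

assert set(law) == set(A_na)
assert all(p == Fraction(1, len(A_na)) for p in law.values())
print("uniform on", len(A_na), "bridge paths: OK")
```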
2. Marginal distribution of Y. First, we claim that Z, Z and Z' are independent, with densities h^{n,k}, β^k and β^{n−k} respectively. What we need to show is that the joint density of (Z, Z, Z') is equal to the product of h^{n,k}, β^k and β^{n−k}. Using Lemma A.1.15, this joint density function is obtained by integrating γ_a^n over s, s and s', which yields:

∫ γ_a^n(s, z, s, z, s', z') (μ ⊗ μ^{k+1} ⊗ μ^{n−k+1})(ds, ds, ds')
  = ∫ ψ_a^{n,k}(s, z) ( ∫ ρ_s^k(s, z) μ^{k+1}(ds) ) ( ∫ ρ_{a−s}^{n−k}(s', z') μ^{n−k+1}(ds') ) μ(ds)
  = ∫ ψ_a^{n,k}(s, z) β^k(z) β^{n−k}(z') μ(ds)
  = ( ∫ ψ_a^{n,k}(s, z) μ(ds) ) β^k(z) β^{n−k}(z')
  = h^{n,k}(z) β^k(z) β^{n−k}(z').

Since h^{n,k}, β^k and β^{n−k} are densities of (multidimensional) Gaussian distributions with mean zero, and since Z, Z and Z' are independent, it follows that (Z, Z, Z') has a multidimensional Gaussian distribution with mean zero. It is easily seen that Y is a linear transformation of (Z, Z, Z'), thus Y is also a Gaussian random vector with mean zero. The only thing left to prove is that Cov(Y_i, Y_j) = (i ∧ j)(n − (i ∨ j))/n. We distinguish the following three cases: i ≤ j ≤ k, k ≤ i ≤ j and i ≤ k ≤ j. It then suffices to show that Cov(Y_i, Y_j) = i(n − j)/n in each case.

(a) i ≤ j ≤ k. Recall that Z has density h^{n,k}, which implies that Z has variance k(n − k)/n. Moreover, since Z has density β^k, we know that Cov(Z_i, Z_j) = i(k − j)/k. Using this, together with the independence of Z and Z, we obtain:

Cov(Y_i, Y_j) = Cov(Z_i + (i/k)Z, Z_j + (j/k)Z)
  = Cov(Z_i, Z_j) + (j/k) Cov(Z_i, Z) + (i/k) Cov(Z, Z_j) + (ij/k^2) Cov(Z, Z)
  = i(k − j)/k + (ij/k^2) · k(n − k)/n
  = i(n − j)/n.

(b) k ≤ i ≤ j. As in part (a) we still have Var(Z) = k(n − k)/n. Since Z' has density β^{n−k}, we know that Cov(Z'_{i−k}, Z'_{j−k}) = (i − k)(n − k − (j − k))/(n − k). Using this, together with the independence of Z and Z', we have that:

Cov(Y_i, Y_j) = Cov(Z'_{i−k} + ((n − i)/(n − k))Z, Z'_{j−k} + ((n − j)/(n − k))Z)
  = Cov(Z'_{i−k}, Z'_{j−k}) + ((n − i)(n − j)/(n − k)^2) Cov(Z, Z)
  = (i − k)(n − j)/(n − k) + ((n − i)(n − j)/(n − k)^2) · k(n − k)/n
  = i(n − j)/n.

(c) i ≤ k ≤ j. Using the independence of Z, Z and Z', we see that:

Cov(Y_i, Y_j) = Cov(Z_i + (i/k)Z, Z'_{j−k} + ((n − j)/(n − k))Z)
  = (i/k) · ((n − j)/(n − k)) Cov(Z, Z)
  = (i/k) · ((n − j)/(n − k)) · k(n − k)/n
  = i(n − j)/n.

This finishes the argument, and we get that Y ∼ β^n.
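The covariance computation in Step 2 can also be checked numerically. The sketch below (illustration only; n, k and the sample size are arbitrary choices) simulates the construction Y_i = Z_i + (i/k)Z and Y_i = Z'_{i−k} + ((n − i)/(n − k))Z with independent Gaussian bridges and an independent centred normal Z with variance k(n − k)/n, and compares the empirical covariance of Y with (i ∧ j)(n − (i ∨ j))/n.

```python
# A Monte Carlo check of Step 2 (illustration only; n, k and the sample size are
# arbitrary): Y_i = Z_i + (i/k)Z for i <= k and Y_i = Z'_{i-k} + ((n-i)/(n-k))Z
# for i >= k, built from independent bridges and Z ~ N(0, k(n-k)/n), should have
# covariance (i ^ j)(n - (i v j))/n.
import numpy as np

rng = np.random.default_rng(0)
n, k, reps = 12, 8, 200_000

def discrete_bridge(m, size):
    """Rows (B_0, ..., B_m) with Cov(B_i, B_j) = i(m - j)/m for i <= j."""
    w = np.hstack([np.zeros((size, 1)), np.cumsum(rng.standard_normal((size, m)), axis=1)])
    return w - np.outer(w[:, -1], np.arange(m + 1) / m)

Zl = discrete_bridge(k, reps)                                   # plays the role of Z  ~ beta^k
Zr = discrete_bridge(n - k, reps)                               # plays the role of Z' ~ beta^{n-k}
Zm = rng.standard_normal(reps) * np.sqrt(k * (n - k) / n)       # plays the role of Z  ~ h^{n,k}

Y = np.empty((reps, n + 1))
Y[:, :k + 1] = Zl + np.outer(Zm, np.arange(k + 1) / k)
Y[:, k:] = Zr + np.outer(Zm, (n - np.arange(k, n + 1)) / (n - k))

emp = np.cov(Y, rowvar=False)
i, j = np.meshgrid(np.arange(n + 1), np.arange(n + 1), indexing="ij")
theory = np.minimum(i, j) * (n - np.maximum(i, j)) / n
print("max |empirical - theoretical covariance|:", np.abs(emp - theory).max())
```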
3. The exponential bound. For 0 ≤ i ≤ n, let:

W_i = U_i − ia/n.

We have to show that for all 0 < λ < λ_0:

E exp(λ max_{i≤n} |W_i − Y_i|) ≤ exp(C log n + Kλ^2 a^2/n),

where C, K and λ_0 are as specified in the beginning of the proof. Define:

T_L := max_{i≤k} |S_i − (i/k)S − Z_i|,   T_R := max_{k≤i≤n} |S'_{i−k} − ((i − k)/(n − k))(a − S) − Z'_{i−k}|,

and

T := |S − ka/n − Z|.

We claim that:

max_{i≤n} |W_i − Y_i| ≤ max{T_L, T_R} + T.    (3.7)

To prove this claim, we distinguish two cases. First assume i ≤ k. Then, simply using the definitions of W_i, Y_i and U_i, we see that:

|W_i − Y_i| = |U_i − ia/n − Z_i − (i/k)Z|
  = |S_i − (i/k)S − Z_i + (i/k)S − ia/n − (i/k)Z|
  ≤ |S_i − (i/k)S − Z_i| + |(i/k)S − ia/n − (i/k)Z|
  ≤ T_L + (i/k)|S − ka/n − Z|
  ≤ max{T_L, T_R} + T,
where in the last step we used that i/k ≤ 1. Now assume i ≥ k. Simply using the definitions and some straightforward calculations, we get that:

|W_i − Y_i| = |S + S'_{i−k} − ia/n − Z'_{i−k} − ((n − i)/(n − k))Z|
  ≤ |S'_{i−k} − ((i − k)/(n − k))(a − S) − Z'_{i−k}| + |S + ((i − k)/(n − k))(a − S) − ia/n − ((n − i)/(n − k))Z|
  = |S'_{i−k} − ((i − k)/(n − k))(a − S) − Z'_{i−k}| + ((n − i)/(n − k))|S − ka/n − Z|
  ≤ T_R + ((n − i)/(n − k)) T
  ≤ max{T_L, T_R} + T,

where in the last inequality we used that (n − i)/(n − k) ≤ 1, since we assumed that i ≥ k. This concludes the proof of (3.7). Fix 0 < λ < λ_0. Note that for any x, y ∈ R the following inequality holds:

exp(x ∨ y) ≤ exp x + exp y.    (3.8)

Using this crude bound and the fact that λ > 0, we have:

exp(λ max{T_L, T_R} + λT) = exp(λT_L ∨ λT_R) exp(λT)
  ≤ (exp(λT_L) + exp(λT_R)) exp(λT)
  = exp(λT_L + λT) + exp(λT_R + λT).

If we combine this with (3.7), we obtain:

exp(λ max_{i≤n} |W_i − Y_i|) ≤ exp(λT_L + λT) + exp(λT_R + λT).    (3.9)
By Lemma A.1.15 and by definition of γ_a^n, it is clear that the conditional density of (S, Z) given (S, Z) = (s, z) is simply ρ_s^k. By Remark A.1.11 we know that E[exp(λT_L) ‖ S, Z] = g(S, Z), where:

g(s, z) = E[exp(λT_L) ‖ S = s, Z = z]
  = ∫ exp(λ max_{i≤k} |s_i − (i/k)s − z_i|) ρ_s^k(s, z) Q_k(ds, dz)
  ≤ exp(C log k + Kλ^2 s^2/k) =: g̃(s).

Note that for the inequality, we used the induction hypothesis. As a consequence, we have:

E[exp(λT_L) ‖ S, Z] ≤ g̃(S) = exp(C log k + Kλ^2 S^2/k).

By definition of T, it is clear that exp(λT) is σ{S, Z}-measurable. Using properties (i) and (ii) of Proposition A.1.12, we obtain that:

E exp(λT_L + λT) = E[exp(λT_L) exp(λT)]
  = E[ E[exp(λT_L) exp(λT) ‖ S, Z] ]
  = E[ exp(λT) E[exp(λT_L) ‖ S, Z] ].

Therefore, Hölder's inequality A.1.6 with p = q = 2 implies that:

E exp(λT_L + λT) ≤ (E[ E[exp(λT_L) ‖ S, Z]^2 ])^{1/2} (E[exp(λT)^2])^{1/2}
  ≤ (E[ exp(2C log k) exp(2Kλ^2 S^2/k) ])^{1/2} (E exp(2λT))^{1/2}
  = exp(C log k) (E exp(2Kλ^2 S^2/k))^{1/2} (E exp(2λT))^{1/2}.

We wish to apply Lemma 2.2.4 to bound the first term between the parentheses. Since S has density g_a^{n,k}, we know that S plays the role of S_k in Lemma 2.2.4. Note that k ≤ 2n/3 by assumption. We want to apply the lemma with α = 2Kλ^2, so we still have to verify that 2Kλ^2 ≤ α_0. By the choice of K and λ_0 in (3.1), we have:

2Kλ^2 ≤ 16cλ_0^2 ≤ 16c · α_0/(16c) = α_0.

Thus, Lemma 2.2.4 can be applied, and we find that:

E exp(2Kλ^2 S^2/k) ≤ exp(1 + 3Kλ^2 a^2/(2n)).

Besides, we have by (3.1) that 2λ ≤ 2λ_0 ≤ θ_0.
Thus, by using inequality (3.5) with θ = 2λ, we obtain:

E exp(2λT) = E exp(2λ|S − ka/n − Z|)
  = ∫ exp(2λ|s − ka/n − z|) ψ_a^{n,k}(s, z) (μ ⊗ λ)(ds, dz)
  ≤ M exp(1 + 4cλ^2 a^2/n),

where we have used that ψ_a^{n,k} is the joint density of (S, Z). Combining the last three steps, we have:

E exp(λT_L + λT) ≤ exp(C log k) (exp(1 + 3Kλ^2 a^2/(2n)))^{1/2} (M exp(1 + 4cλ^2 a^2/n))^{1/2}
  = M^{1/2} exp(C log k + 1 + (3K + 8c)λ^2 a^2/(4n)).

By definition of K in (3.1), we see that 3K + 8c = 4K. Since k ≤ 2n/3 we also have that:

log k = log(n · k/n) = log n − log(n/k) ≤ log n − log(3/2).

Thus,

E exp(λT_L + λT) ≤ M^{1/2} exp(C log n − C log(3/2) + 1 + Kλ^2 a^2/n).
By symmetry, we get the exact same bound on E exp(λT_R + λT). When we combine this with the inequality in (3.9), we obtain:

E exp(λ max_{i≤n} |W_i − Y_i|) ≤ 2M^{1/2} exp(C log n − C log(3/2) + 1 + Kλ^2 a^2/n).

Finally, by the choice of C in (3.1), we have that:

−C log(3/2) + 1 + log(2M^{1/2}) ≤ 0.

Therefore, we can conclude that:

E exp(λ max_{i≤n} |W_i − Y_i|) ≤ exp(C log n − C log(3/2) + 1 + log(2M^{1/2}) + Kλ^2 a^2/n)
  ≤ exp(C log n + Kλ^2 a^2/n).

This completes the induction step. To complete the argument, we still have to prove the induction base. Obviously it is sufficient to show that if we just choose C large enough and λ_0 small enough, the result is true for n = 2. On a suitable probability space it is possible to construct a Gaussian random vector (Z_0, Z_1, Z_2) with mean zero and:

Cov(Z_i, Z_j) = (i ∧ j)(2 − (i ∨ j))/2.

On that same probability space, we construct a version of (W_0, W_1, W_2) and we determine:

E exp(λ_0 max_{i≤2} |W_i − Z_i|).

It suffices to prove that this expression is finite for a certain λ_0. Indeed, in that case it is possible to choose C large enough, such that for all λ < λ_0 and all K ≥ 0:

E exp(λ max_{i≤2} |W_i − Z_i|) ≤ E exp(λ_0 max_{i≤2} |W_i − Z_i|)
  ≤ exp(C log 2)
  ≤ exp(C log 2 + Kλ^2 S_2^2/2).
We will show that E exp(λ_0 max_{i≤2} |W_i − Z_i|) is finite for any λ_0 > 0. Using the fact that W_0 = Z_0 = W_2 = Z_2 = 0, we see that:

E exp(λ_0 max_{i≤2} |W_i − Z_i|) = E exp(λ_0 |W_1 − Z_1|) ≤ E exp(λ_0 |W_1| + λ_0 |Z_1|).

Note that |W_1| = |S_1 − (1/2)S_2| ≤ 1 + 2/2 = 2. Therefore, we can conclude that:

E exp(λ_0 max_{i≤2} |W_i − Z_i|) ≤ E exp(2λ_0 + λ_0 |Z_1|)
  = exp(2λ_0) E exp(λ_0 |Z_1|)
  ≤ 2 exp(2λ_0 + λ_0^2/2) < ∞,

since Z_1 has a centered normal distribution. Consequently, we can find a suitable C, which finishes the proof.
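As a small illustration of the induction base (not from the thesis), one can estimate the left-hand side for n = 2 by Monte Carlo under an arbitrary coupling, here simply an independent one, and compare it with the crude bound 2 exp(2λ_0 + λ_0^2/2) derived above; the value of λ_0 below is an arbitrary choice.

```python
# A Monte Carlo illustration of the induction base n = 2 (not from the thesis):
# for an arbitrary coupling (here an independent one) the quantity
# E exp(lam0 * max_{i<=2} |W_i - Z_i|) = E exp(lam0 * |W_1 - Z_1|) stays below
# the crude bound 2 * exp(2*lam0 + lam0**2 / 2) used above.
import numpy as np

rng = np.random.default_rng(1)
lam0, reps = 0.7, 1_000_000

eps = rng.choice([-1, 1], size=(reps, 2))
W1 = eps[:, 0] - 0.5 * (eps[:, 0] + eps[:, 1])        # W_1 = S_1 - (1/2) S_2, so |W_1| <= 2
Z1 = rng.standard_normal(reps) * np.sqrt(0.5)         # Var(Z_1) = (1 ^ 1)(2 - 1)/2 = 1/2
estimate = np.exp(lam0 * np.abs(W1 - Z1)).mean()
bound = 2 * np.exp(2 * lam0 + lam0 ** 2 / 2)
print(f"estimate {estimate:.3f} <= bound {bound:.3f}: {estimate <= bound}")
```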
3.2 Completing the proof of the main theorem

In this section, we will use the foregoing results to complete the proof of Theorem 2 from the introduction. The next lemma uses Theorem 3.1.1 and Theorem 2.1.1 to give a 'finite n' version of Theorem 2.
Lemma 3.2.1. There exist universal constants B > 1 and λ > 0 such that the following is true. Let n be a strictly positive integer and let ε_1, ε_2, . . . , ε_n be i.i.d. symmetric ±1-valued random variables. Let S_k = Σ_{i=1}^k ε_i, for k = 0, 1, . . . , n. It is possible to construct a version of the sequence (S_k)_{k≤n} and random variables (Z_k)_{k≤n} with a joint Gaussian distribution, mean zero and Cov(Z_i, Z_j) = i ∧ j on the same probability space such that E exp(λ|S_n − Z_n|) ≤ B and

E exp(λ max_{k≤n} |S_k − Z_k|) ≤ B exp(B log n).
Proof. Recall the universal constants θ_0 and κ from Theorem 2.1.1 and C, K and λ_0 from Theorem 3.1.1. Choose λ > 0 sufficiently small such that:

λ < (θ_0 ∧ λ_0)/2   and   8Kλ^2 < 1.

Let the probability density functions f_a^n, ρ_a^n and β^n be as in the proof of Theorem 3.1.1. Recall the notation introduced in 3.1.3. Let g^n denote the μ-density of S_n and let h^n denote the Lebesgue density of Z_n. By Theorem 2.1.1, Lemma A.1.15 and Lemma A.1.16, there exists a joint (μ ⊗ λ)-density function ψ^n on Z × R such that:

∫ ψ^n(s, z) dz = g^n(s),   ∫ ψ^n(s, z) μ(ds) = h^n(z),

and

∫ exp(θ_0 |s − z|) ψ^n(s, z) (μ ⊗ λ)(ds, dz) ≤ κ.

By the choice of λ, this last inequality clearly implies that:

∫ exp(2λ|s − z|) ψ^n(s, z) (μ ⊗ λ)(ds, dz) ≤ κ.    (3.10)
Now let γ^n : Z × R × Z^{n+1} × R^{n+1} → R be defined as follows:

γ^n(s, z, s, z) := ψ^n(s, z) ρ_s^n(s, z).

To see that this is a ((μ ⊗ λ) ⊗ Q_n)-probability density function, we can integrate γ^n over s, z, s, z, which gives:

∫ γ^n(s, z, s, z) ((μ ⊗ λ) ⊗ Q_n)(ds, dz, ds, dz)
  = ∫ ψ^n(s, z) ( ∫ ρ_s^n(s, z) Q_n(ds, dz) ) (μ ⊗ λ)(ds, dz)
  = ∫ ψ^n(s, z) (μ ⊗ λ)(ds, dz) = 1.

Let (S, Z, S, Z) be a random vector following the density γ^n. By Lemma A.1.15, the joint density of (Z, Z) can be found by integrating γ^n over s and s:

∫ γ^n(s, z, s, z) (μ ⊗ μ^{n+1})(ds, ds)
  = ∫ ψ^n(s, z) ( ∫ ρ_s^n(s, z) μ^{n+1}(ds) ) μ(ds)
  = ∫ ψ^n(s, z) β^n(z) μ(ds)
  = ( ∫ ψ^n(s, z) μ(ds) ) β^n(z)
  = h^n(z) β^n(z).
Thus, we have proved that Z and Z are independent with density functions h^n and β^n respectively. Define a random vector Y = (Y_0, . . . , Y_n) as:

Y_i = Z_i + (i/n) Z.

Since Z and Z are independent and have (multidimensional) normal distributions, we know that (Z, Z) is multidimensional normally distributed too. Note that Y is a linear transformation of (Z, Z): in matrix form, Y = A (Z, Z_0, Z_1, . . . , Z_n)^T, where the i-th row of A (i = 0, . . . , n) has the entry i/n in the column corresponding to Z and a 1 in the column corresponding to Z_i, all other entries being zero.

Hence, Y is a Gaussian random vector. Moreover, since (Z, Z) is a mean zero random vector, the same is true for Y. Now we will determine Cov(Y_i, Y_j). Since Z follows the density β^n, we know that Cov(Z_i, Z_j) = (i ∧ j)(n − (i ∨ j))/n. Besides, we know that h^n is the density function of Z, and thus Z ∼ N(0, n). Combining these two facts with the independence of Z and Z yields that for i ≤ j:

Cov(Y_i, Y_j) = Cov(Z_i + (i/n)Z, Z_j + (j/n)Z)
  = Cov(Z_i, Z_j) + (ij/n^2) Cov(Z, Z)
  = i(n − j)/n + (ij/n^2) · n = i.

Clearly, if j ≤ i, we obtain that Cov(Y_i, Y_j) = j. Hence, we can conclude that Cov(Y_i, Y_j) = i ∧ j.
Next, integrating out z and z, we obtain that the joint (μ ⊗ μ^{n+1})-density of (S, S) is given by:

g^n(s) f_s^n(s).

This means that the conditional density of S given S = s is f_s^n. Note that we have used Lemma A.1.15 here. It should now be clear that the distribution of S is the same as that of a simple random walk up to time n. For completeness, we give an argument for this. By the above, given S = s, we know that S_n = s. An example of a realization of (S_0, . . . , S_n) is the following:

(0, 1, 2, 1, 2, 3, 4, 3, . . . , s + 1, s).

In general, a realization of (S_0, . . . , S_n) has the form

(0, ε_{π(1)}, ε_{π(1)} + ε_{π(2)}, . . . , s),

where the ε_i ∈ {−1, +1} are numbers. Note that the number of ones in {ε_1, . . . , ε_n} is completely determined by s (and n). By letting π vary over different permutations, we see that (S_0, . . . , S_n) can be any random walk path up to time n that ends up in s. If we then let s vary over all possible values for S_n, we see that S has the distribution of a simple random walk up to time n.
We will show that the pair (S, Y) satisfies the conditions of the theorem. First, let W_i = S_i − (i/n)S. Then, for any i ≤ n:

|S_i − Y_i| = |S_i − Z_i − (i/n)Z| = |W_i + (i/n)S − Z_i − (i/n)Z| ≤ |W_i − Z_i| + (i/n)|S − Z|.

Using this bound together with the fact that λ > 0 and parts (i) and (ii) of Proposition A.1.12, we see that:

E exp(λ max_{i≤n} |S_i − Y_i|) ≤ E exp(λ max_{i≤n} (|W_i − Z_i| + (i/n)|S − Z|))
  ≤ E[ exp(λ max_{i≤n} |W_i − Z_i|) exp(λ|S − Z|) ]
  = E[ E[ exp(λ max_{i≤n} |W_i − Z_i|) exp(λ|S − Z|) ‖ S, Z ] ]
  = E[ exp(λ|S − Z|) E[ exp(λ max_{i≤n} |W_i − Z_i|) ‖ S, Z ] ].

Using Hölder's inequality A.1.6, we obtain the following bound:

E exp(λ max_{i≤n} |S_i − Y_i|) ≤ (E[ E[exp(λ max_{i≤n} |W_i − Z_i|) ‖ S, Z]^2 ])^{1/2} (E[exp(λ|S − Z|)^2])^{1/2}.

We now bound the two terms between the parentheses. First, note that by Lemma A.1.15, the conditional distribution of (S, Z) given (S, Z) = (s, z) is simply ρ_s^n. Combining this with Remark A.1.11, we know that E[exp(λ max_{i≤n} |W_i − Z_i|) ‖ S, Z] = g(S, Z), where:

g(s, z) = E[exp(λ max_{i≤n} |W_i − Z_i|) ‖ S = s, Z = z]
  = ∫ exp(λ max_{i≤n} |s_i − (i/n)s − z_i|) ρ_s^n(s, z) Q_n(ds, dz)
  ≤ exp(C log n + Kλ^2 s^2/n) =: g̃(s).

Note that the inequality follows from the construction of ρ_s^n and from the fact that 0 < λ < λ_0. We can now conclude that:

E[exp(λ max_{i≤n} |W_i − Z_i|) ‖ S, Z] ≤ g̃(S) = exp(C log n + Kλ^2 S^2/n).
Besides, we can use (3.10) to obtain a bound on the second term between the parentheses. Indeed, by Lemma A.1.15, ψ^n is the joint density function of (S, Z) and we see that:

E[exp(λ|S − Z|)^2] = E exp(2λ|S − Z|) = ∫ exp(2λ|s − z|) ψ^n(s, z) (μ ⊗ λ)(ds, dz) ≤ κ.

Using the two bounds, we obtain that:

E exp(λ max_{i≤n} |S_i − Y_i|) ≤ (κ E exp(2C log n + 2Kλ^2 S^2/n))^{1/2}
  = exp(C log n) (κ E exp(2Kλ^2 S^2/n))^{1/2}.

Since S has density function g^n, we know that S plays the same role as the S in inequality (2.3). We want to use this inequality with 4θ^2 = 2Kλ^2, so we need to prove that 8Kλ^2 < 1. By the choice of λ in the beginning of this proof, this is obviously true, and inequality (2.3) can be applied. We get:

E exp(2Kλ^2 S^2/n) ≤ 1/√(1 − 8Kλ^2).

This implies that:

E exp(λ max_{i≤n} |S_i − Y_i|) ≤ exp(C log n) (κ/√(1 − 8Kλ^2))^{1/2}.

Now choose B large enough such that:

B ≥ max{ C, (κ/√(1 − 8Kλ^2))^{1/2}, κ }.

With this choice of B, it is clear that:

E exp(λ max_{i≤n} |S_i − Y_i|) ≤ B exp(B log n),
which concludes the proof of the second inequality of the theorem. Thus, up to this point, we have constructed a random vector S with the same distribution as that of a simple random walk up to time n and a Gaussian random vector Y with mean zero and Cov(Y_i, Y_j) = i ∧ j, satisfying the inequality above. The only thing left to prove is the bound E exp(λ|S_n − Y_n|) ≤ B. First, we claim that Y_n = Z. By definition of Y_n, it suffices to prove that Z_n = 0. Since the joint density of Z is given by β^n, it follows that Z_n is a Gaussian random variable with mean zero and Var(Z_n) = 0. Hence E[Z_n^2] = 0, which implies that Z_n = 0 almost surely. Consequently Y_n = Z, and it remains to prove that E exp(λ|S_n − Z|) ≤ B. By construction of (S, Z, S, Z) we know that S_n = S and that (S, Z) has joint density ψ^n. By the choice of B and by inequality (3.10) we can conclude that:

E exp(λ|S_n − Z|) ≤ E exp(2λ|S − Z|) = ∫ exp(2λ|s − z|) ψ^n(s, z) (μ ⊗ λ)(ds, dz) ≤ κ ≤ B.

This finishes the proof.
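The step from a bridge to a Brownian motion used above (Y_i = Z_i + (i/n)Z with Z independent of the bridge Z) is easy to check numerically. The following sketch (illustration only; n and the sample size are arbitrary choices) verifies that the resulting vector has covariance i ∧ j.

```python
# A Monte Carlo illustration (not part of the proof): adding the independent
# endpoint Z ~ N(0, n) to a discrete bridge Z ~ beta^n via Y_i = Z_i + (i/n)Z
# yields Cov(Y_i, Y_j) = i ^ j, i.e. a Brownian motion on the integer grid.
import numpy as np

rng = np.random.default_rng(2)
n, reps = 10, 200_000

w = np.hstack([np.zeros((reps, 1)), np.cumsum(rng.standard_normal((reps, n)), axis=1)])
bridge = w - np.outer(w[:, -1], np.arange(n + 1) / n)       # plays the role of Z ~ beta^n
endpoint = rng.standard_normal(reps) * np.sqrt(n)           # plays the role of Z ~ N(0, n)
Y = bridge + np.outer(endpoint, np.arange(n + 1) / n)

emp = np.cov(Y, rowvar=False)
i, j = np.meshgrid(np.arange(n + 1), np.arange(n + 1), indexing="ij")
print("max deviation from i ^ j:", np.abs(emp - np.minimum(i, j)).max())
```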
We will now give a proof of the main theorem.
Theorem 3.2.2. Let ε_1, ε_2, . . . be a sequence of i.i.d. symmetric ±1-valued random variables. Let S_k = Σ_{i=1}^k ε_i, for each k ≥ 0. It is possible to construct a version of the sequence (S_k)_{k≥0} and a standard Brownian motion (B_t)_{t≥0} on the same probability space such that for all n and all x ≥ 0:

P( max_{k≤n} |S_k − B_k| ≥ C log n + x ) ≤ K e^{−λx},

where C, K and λ do not depend on n.
Proof. Let m_0 = 0. For r = 1, 2, . . . let m_r = 2^{2^r}, and n_r = m_r − m_{r−1}. For each r let (S_k^{(r)}, Z_k^{(r)})_{0≤k≤n_r} be a random vector satisfying the conclusions of Lemma 3.2.1. We can assume these random vectors are independent. Inductively define an infinite sequence (S_k, Z_k)_{k≥0} in the following way. Let S_k = S_k^{(1)} and Z_k = Z_k^{(1)} for k ≤ m_1. Assume (S_k, Z_k)_{k≤m_{r−1}} has been defined; then define (S_k, Z_k)_{m_{r−1}<k≤m_r} as:

S_k := S_{k−m_{r−1}}^{(r)} + S_{m_{r−1}}   and   Z_k := Z_{k−m_{r−1}}^{(r)} + Z_{m_{r−1}}.
We will prove that (S_k)_{k≥0} is a simple random walk and (Z_k)_{k≥0} is a Brownian motion. Recall that we assumed the random vectors (S_k^{(r)})_{0≤k≤n_r} are independent for different r. Thus, by construction of the sequence (S_k)_{k≥0}, it is clear that there exists a sequence (ε_i)_{i≥1} of i.i.d. symmetric ±1-valued random variables such that S_k = Σ_{i=1}^k ε_i for each k. Hence, (S_k)_{k≥0} is a simple random walk. To prove that (Z_k)_{k≥0} is a Brownian motion, first note that Z_0 = Z_0^{(1)} = 0 by construction. It is a well-known fact that it is now sufficient to prove that (Z_k)_{k≥0} is a centered Gaussian process (a stochastic process (Y_i)_{i∈I} is called a Gaussian process if for any k ≥ 1 and each choice i_1, i_2, . . . , i_k ∈ I, the random vector (Y_{i_1}, Y_{i_2}, . . . , Y_{i_k}) has a k-dimensional normal distribution) with Cov(Z_i, Z_j) = i ∧ j, for all i and j. Since (Z_k^{(r)})_{0≤k≤n_r} has a multidimensional normal distribution for each r ≥ 1, and since these random vectors are independent for distinct r, we have that

Z_0^{(1)}, . . . , Z_{n_1}^{(1)}, Z_0^{(2)}, . . . , Z_{n_2}^{(2)}, . . .

is a Gaussian process. Now, let k ≥ 1 and i_1, . . . , i_k ≥ 0. By construction of (Z_k)_{k≥0}, we see that (Z_{i_1}, . . . , Z_{i_k}) is a linear transformation of a finite subvector of the Gaussian process above. Hence, (Z_{i_1}, . . . , Z_{i_k}) has a k-dimensional normal distribution, which shows that (Z_k)_{k≥0} is a Gaussian process. Besides, since all random variables Z_k^{(r)} have mean zero, the same holds for the random variables Z_k. Therefore, (Z_k)_{k≥0} is a centered process. Finally, we will prove that Cov(Z_i, Z_j) = i ∧ j, for all i, j ≥ 0. Since Z_0 = 0, this statement is obviously true when i = 0 or j = 0. Thus we can assume that i, j > 0. We will first show that Var(Z_{m_r}) = m_r, for each r ≥ 0, using induction on r. By construction of the random variables Z_k^{(r)} in Lemma 3.2.1, it is clear that Var(Z_0) = 0 and Var(Z_{m_1}) = Var(Z_{m_1}^{(1)}) = m_1. Now assume Var(Z_{m_{r−1}}) = m_{r−1}. Recall that Var(Z_{n_r}^{(r)}) = n_r by construction. Using independence and the induction hypothesis, we obtain that:

Var(Z_{m_r}) = Var(Z_{m_{r−1}} + Z_{n_r}^{(r)}) = Var(Z_{m_{r−1}}) + Var(Z_{n_r}^{(r)}) = m_{r−1} + n_r = m_r.

Let:

i = m_{r−1} + l,   with 0 < l ≤ n_r,

and

j = m_{r'−1} + l',   with 0 < l' ≤ n_{r'}.
Without loss of generality we can assume that i ≤ j. This implies we can distinguish
two cases.
(i) r < r0 . Using independence, we obtain:
(r0 )
(r)
Cov(Zi , Zj ) = Cov(Zmr−1 + Zl , Zmr0 −1 + Zl0 )
(r)
= Cov(Zmr−1 + Zl , Zmr )
(r)
= Cov(Zmr−1 + Zl , Zmr−1 + Zn(r)
)
r
(r)
= Var(Zmr−1 ) + Cov(Zl , Zn(r)
)
r
= mr−1 + (l ∧ nr ) = mr−1 + l = i.
(r)
(ii) r = r0 and l ≤ l0 . Again using independence and the construction of (Zk )0≤k≤nr ,
we obtain:
(r)
(r)
Cov(Zi , Zj ) = Cov(Zmr−1 + Zl , Zmr−1 + Zl0 )
(r)
(r)
= Cov(Zmr−1 , Zmr−1 ) + Cov(Zl , Zl0 )
= Var(Zmr−1 ) + (l ∧ l0 ) = mr−1 + l = i.
This completes the argument that (Z_k)_{k≥0} is a Brownian motion. Recall the constants B and λ from Lemma 3.2.1. First, note that for each r, we have:

Z_{m_r} = Z_{m_{r−1}} + Z_{n_r}^{(r)}
  = Z_{m_{r−2}} + Z_{n_{r−1}}^{(r−1)} + Z_{n_r}^{(r)}
  = . . .
  = Z_{m_1} + Z_{n_2}^{(2)} + · · · + Z_{n_{r−1}}^{(r−1)} + Z_{n_r}^{(r)}
  = Z_{n_1}^{(1)} + Z_{n_2}^{(2)} + · · · + Z_{n_{r−1}}^{(r−1)} + Z_{n_r}^{(r)}
  = Σ_{l=1}^r Z_{n_l}^{(l)}.

Analogously, we see that S_{m_r} = Σ_{l=1}^r S_{n_l}^{(l)}. Using this, together with the independence of the random vectors (S_k^{(r)}, Z_k^{(r)})_{0≤k≤n_r} and the fact that λ is positive, we obtain that:

E exp(λ|S_{m_r} − Z_{m_r}|) ≤ E exp(λ Σ_{l=1}^r |S_{n_l}^{(l)} − Z_{n_l}^{(l)}|)
  = E[ Π_{l=1}^r exp(λ|S_{n_l}^{(l)} − Z_{n_l}^{(l)}|) ]
  = Π_{l=1}^r E exp(λ|S_{n_l}^{(l)} − Z_{n_l}^{(l)}|) ≤ B^r.    (3.11)
Note that the last inequality simply follows from Lemma 3.2.1. Now, let:

c = 1 / (1 − exp(−(1/2)B log 4)/B).

We will show by induction that the following holds for each r ≥ 1:

E exp(λ max_{k≤m_r} |S_k − Z_k|) ≤ c B^r exp(B log m_r).    (3.12)

By the facts that B > 1 and c > 1 and by Lemma 3.2.1, this inequality holds for r = 1. Indeed, we have:

E exp(λ max_{k≤m_1} |S_k − Z_k|) = E exp(λ max_{k≤n_1} |S_k^{(1)} − Z_k^{(1)}|)
  ≤ B exp(B log n_1)
  = B exp(B log m_1)
  ≤ c B exp(B log m_1).
Now suppose inequality (3.12) holds for r − 1. By (3.8) and the fact that λ is positive, we have:

E exp(λ max_{k≤m_r} |S_k − Z_k|) = E exp( (λ max_{m_{r−1}<k≤m_r} |S_k − Z_k|) ∨ (λ max_{k≤m_{r−1}} |S_k − Z_k|) )
  ≤ E exp(λ max_{m_{r−1}<k≤m_r} |S_k − Z_k|) + E exp(λ max_{k≤m_{r−1}} |S_k − Z_k|).

Let us consider the first term. For each k with m_{r−1} < k ≤ m_r, we have:

|S_k − Z_k| = |S_{k−m_{r−1}}^{(r)} + S_{m_{r−1}} − Z_{k−m_{r−1}}^{(r)} − Z_{m_{r−1}}|
  ≤ |S_{k−m_{r−1}}^{(r)} − Z_{k−m_{r−1}}^{(r)}| + |S_{m_{r−1}} − Z_{m_{r−1}}|
  ≤ max_{1≤j≤n_r} |S_j^{(r)} − Z_j^{(r)}| + |S_{m_{r−1}} − Z_{m_{r−1}}|.

Hence,

max_{m_{r−1}<k≤m_r} |S_k − Z_k| ≤ max_{1≤j≤n_r} |S_j^{(r)} − Z_j^{(r)}| + |S_{m_{r−1}} − Z_{m_{r−1}}|.

Recall that S_{m_{r−1}} = Σ_{l=1}^{r−1} S_{n_l}^{(l)} and Z_{m_{r−1}} = Σ_{l=1}^{r−1} Z_{n_l}^{(l)}. Therefore, by using independence, we obtain that:

E exp(λ max_{m_{r−1}<k≤m_r} |S_k − Z_k|)
  ≤ E[ exp(λ max_{1≤j≤n_r} |S_j^{(r)} − Z_j^{(r)}|) exp(λ|S_{m_{r−1}} − Z_{m_{r−1}}|) ]
  = E exp(λ max_{1≤j≤n_r} |S_j^{(r)} − Z_j^{(r)}|) · E exp(λ|S_{m_{r−1}} − Z_{m_{r−1}}|)
  ≤ B exp(B log n_r) B^{r−1},

where the last inequality follows from Lemma 3.2.1 and (3.11). Since n_r ≤ m_r, we have shown that:

E exp(λ max_{m_{r−1}<k≤m_r} |S_k − Z_k|) ≤ B^r exp(B log m_r).
Let us now consider the second term. Recalling the definition of m_r, we see that:

m_{r−1}^2 = (2^{2^{r−1}})^2 = 2^{2^{r−1} · 2} = 2^{2^r} = m_r.

Using this together with the induction hypothesis yields:

E exp(λ max_{k≤m_{r−1}} |S_k − Z_k|) ≤ c B^{r−1} exp(B log m_{r−1})
  = c B^{r−1} exp(B log(m_{r−1}^2)/2)
  = c B^{r−1} exp(B log m_r / 2).

Combining the bounds on the two terms, we get:

E exp(λ max_{k≤m_r} |S_k − Z_k|) ≤ B^r exp(B log m_r) + c B^{r−1} exp(B log m_r / 2)
  = B^r exp(B log m_r) (1 + (c/B) exp(−B log m_r / 2)).

Thus, to complete the induction step, it suffices to show that:

1 + (c/B) exp(−B log m_r / 2) ≤ c.

First, note that the definition of c implies that:

exp(−(1/2)B log 4) = B − B/c.

Moreover, we have that m_r = 2^{2^r} ≥ 2^2 = 4, since r ≥ 1. Therefore,

1 + (c/B) exp(−B log m_r / 2) ≤ 1 + (c/B) exp(−B log 4 / 2) = 1 + (c/B)(B − B/c) = c.
This completes the induction step. So we have shown (3.12). Let c̃ = 1/log 2. Clearly we have for all r ≥ 1:

log m_r = log 2^{2^r} = 2^r log 2.

Therefore, we obtain the following bound for any r ≥ 1:

r ≤ 2^r = (1/log 2) log m_r = c̃ log m_r.

Since B > 1, this implies that:

c B^r ≤ c B^{c̃ log m_r} = c exp(log B^{c̃ log m_r}) = c exp(c̃ log m_r log B).

By using (3.12), we obtain:

E exp(λ max_{k≤m_r} |S_k − Z_k|) ≤ c B^r exp(B log m_r)
  ≤ c exp(c̃ log m_r log B + B log m_r)
  = c exp((c̃ log B + B) log m_r).

Let K := max{c, c̃ log B + B}; then we obtain for all r ≥ 1 that:

E exp(λ max_{k≤m_r} |S_k − Z_k|) ≤ K exp(K log m_r).

Let us now prove such an inequality for arbitrary n instead of m_r. Take any n ≥ 2. Let r be such that m_{r−1} ≤ n ≤ m_r. If r ≥ 2, then m_r = m_{r−1}^2 ≤ n^2. Besides, if r = 1, then m_r = m_1 = 4 ≤ n^2. Thus,

E exp(λ max_{k≤n} |S_k − Z_k|) ≤ E exp(λ max_{k≤m_r} |S_k − Z_k|)
  ≤ K exp(K log m_r)
  ≤ K exp(K log n^2)
  = K exp(2K log n).
Let C := 2K/λ. Using Markov's inequality, we obtain for any x ≥ 0:

P( max_{k≤n} |S_k − Z_k| ≥ C log n + x ) ≤ E exp(λ max_{k≤n} |S_k − Z_k|) exp(−λC log n − λx)
  ≤ K exp(2K log n) exp(−λC log n − λx)
  = K e^{−λx}.
This finishes the proof.
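The dyadic gluing used in this proof can be illustrated with a few lines of code. The sketch below is not the construction of the theorem: it uses a deliberately naive stand-in for the block coupling of Lemma 3.2.1 (the simple random walk and the Gaussian walk on each block are generated independently) and only demonstrates the bookkeeping m_r = 2^{2^r}, n_r = m_r − m_{r−1} and the recursion S_k = S^{(r)}_{k−m_{r−1}} + S_{m_{r−1}} (and likewise for Z).

```python
# A sketch of the dyadic gluing only.  The block coupler below is a naive
# placeholder that generates the SRW and the Gaussian walk *independently*;
# it is NOT the coupling of Lemma 3.2.1.  The point is the bookkeeping
# m_r = 2^(2^r), n_r = m_r - m_{r-1} and the recursion
# S_k = S^(r)_{k - m_{r-1}} + S_{m_{r-1}} (and likewise for Z).
import numpy as np

rng = np.random.default_rng(3)

def block_coupling(n_r):
    """Placeholder block 'coupling': an SRW and an independent Gaussian walk."""
    S = np.concatenate([[0.0], np.cumsum(rng.choice([-1, 1], size=n_r))])
    Z = np.concatenate([[0.0], np.cumsum(rng.standard_normal(n_r))])
    return S, Z

def glue(R):
    """Assemble (S_k, Z_k), k = 0..m_R, from independent block couplings."""
    m = [0] + [2 ** (2 ** r) for r in range(1, R + 1)]
    S, Z = np.zeros(1), np.zeros(1)
    for r in range(1, R + 1):
        Sb, Zb = block_coupling(m[r] - m[r - 1])
        S = np.concatenate([S, S[-1] + Sb[1:]])
        Z = np.concatenate([Z, Z[-1] + Zb[1:]])
    return S, Z

S, Z = glue(4)                          # k = 0 .. m_4 = 65536
print(len(S) - 1, "steps, max |S_k - Z_k| =", np.abs(S - Z).max())
```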
Remark 3.2.3. Note that in the preceding proof we only constructed random variables Z_k for k = 0, 1, . . ., so we only obtain a discrete-time process. By redefining the random variables Z_k on a rich enough probability space, we obtain a Brownian motion (B_t)_{t≥0}.
We finish this chapter with a new result. This theorem was obtained by Chatterjee as a by-product of his departure from the classical proof of the KMT theorem. The result is not needed in the rest of this thesis. It provides a coupling of a SRW with a standard Brownian bridge.
Theorem 3.2.4. There exist positive universal constants C, K and λ_0 such that the following is true. Take any integer n ≥ 2 and suppose ε_1, . . . , ε_n are exchangeable symmetric ±1-valued random variables. For k = 0, 1, . . . , n, let S_k = Σ_{i=1}^k ε_i and let W_k = S_k − (k/n)S_n. It is possible to construct a version of W_0, . . . , W_n and a standard Brownian bridge (B_t°)_{0≤t≤1} on the same probability space such that for any 0 < λ < λ_0:

E exp(λ max_{k≤n} |W_k − √n B°_{k/n}|) ≤ exp(C log n) E exp(Kλ^2 S_n^2/n).
Proof. Note that the exchangeability implies that (ε_1, . . . , ε_n) has the same distribution as (ε_{π(1)}, . . . , ε_{π(n)}) if π is a random permutation of {1, . . . , n} independent of ε_1, . . . , ε_n. Consequently, Theorem 3.1.1 can be applied. Let all notation be as in the statement of Theorem 3.1.1. Define B°_{k/n} := Z_k/√n; then Theorem 3.1.1 implies the following. It is possible to construct a version of W_0, . . . , W_n and random variables (B°_{k/n})_{0≤k≤n} with a joint Gaussian distribution, mean zero and:

Cov(B°_{i/n}, B°_{j/n}) = (1/n) Cov(Z_i, Z_j)
  = (i ∧ j)(n − (i ∨ j))/n^2
  = (i/n ∧ j/n)(1 − (i/n ∨ j/n)).

Moreover, there exist universal constants C, K and λ_0 such that for any 0 < λ < λ_0:

E exp(λ max_{k≤n} |W_k − √n B°_{k/n}|) = E exp(λ max_{k≤n} |W_k − Z_k|)
  ≤ exp(C log n + Kλ^2 S_n^2/n).

Note that in Theorem 3.1.1 the value of S_n was assumed to be known, so S_n was not random there. In the context of Theorem 3.2.4, however, S_n takes on random (though 'possible') values. So, when we define W_k as in Theorem 3.2.4, we actually obtain:

E[exp(λ max_{k≤n} |W_k − √n B°_{k/n}|) ‖ S_n = s] ≤ exp(C log n + Kλ^2 s^2/n),

where s is a possible value for S_n. Hence, using Remark A.1.11, we see that

E[exp(λ max_{k≤n} |W_k − √n B°_{k/n}|) ‖ S_n] ≤ exp(C log n + Kλ^2 S_n^2/n).

Taking expectations on both sides indeed yields that

E exp(λ max_{k≤n} |W_k − √n B°_{k/n}|) ≤ E exp(C log n + Kλ^2 S_n^2/n) = exp(C log n) E exp(Kλ^2 S_n^2/n).

Redefining the random variables on a rich enough probability space, we obtain a version of W_0, . . . , W_n and a Brownian bridge (B_t°)_{0≤t≤1} such that all conditions of the theorem are fulfilled.
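Why the Brownian bridge is the right Gaussian partner for W_k = S_k − (k/n)S_n can be seen from an exact covariance identity: for i.i.d. signs, Cov(W_i, W_j) = (i ∧ j)(n − (i ∨ j))/n = n Cov(B°_{i/n}, B°_{j/n}). The short sketch below (an illustration, not part of the proof; n is an arbitrary choice) verifies this identity numerically.

```python
# An exact covariance check (illustration only): with i.i.d. signs,
# Cov(W_i, W_j) = (i ^ j)(n - (i v j))/n = n * Cov(B_{i/n}, B_{j/n}) for a
# standard Brownian bridge B, which is why the bridge is the natural partner of W.
import numpy as np

n = 9
k = np.arange(n + 1)
# W = M @ eps with M[k, i] = 1{i <= k} - k/n, so Cov(W) = M M^T since Cov(eps) = I.
M = (np.arange(1, n + 1)[None, :] <= k[:, None]).astype(float) - np.outer(k, np.ones(n)) / n
cov_W = M @ M.T
i, j = np.meshgrid(k, k, indexing="ij")
bridge_cov = np.minimum(i, j) * (n - np.maximum(i, j)) / n
print("exact match:", np.allclose(cov_W, bridge_cov))
```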
Chapter 4
An application of the KMT theorem
Since we have completed the proof of the KMT theorem for the simple random walk, our aim will now be to apply the general KMT theorem. We will consider a sequence of i.i.d. random variables with a finite moment generating function and focus on the size of the increments of the partial sums. In the first section we show that O(log n) is a borderline rate: it cannot be improved to o(log n) unless the summands are standard normal. In the second section we state a limit theorem concerning moving averages of a Brownian motion, and we then generalize this result to partial sums using the KMT theorem. This chapter is based on the book [9].
4.1 Theorem of Erdős-Rényi
In this section we will prove the Theorem of Erdős-Rényi. The origin of this theorem
lies in a problem posed by T. Varga, a secondary school teacher. He did the following
experiment. He divided his class into two groups. Each student of the first group
received a coin. They were asked to toss the coin two hundred times and to write
down the corresponding head and tail sequence. The students of the second group
did not receive any coin, and they were asked to arbitrarily write down a head and
tail sequence of length two hundred. Once all the sheets of paper were collected, T.
Varga asked if it was possible to subdivide them back into their original groups.
The answer is given by the Theorem of Erdős-Rényi. From this theorem, it follows that a randomly produced head and tail sequence of length two hundred is very likely to contain head-runs of length seven. On the other hand, he observed that children writing down an imaginary sequence are afraid of putting down runs longer than four. Hence, in order to find the sheets of the first group, he simply selected the ones containing runs longer than five.
Theorem 4.1.1. (Erdős-Rényi) Let ε_1, ε_2, . . . be a sequence of i.i.d. symmetric ±1-valued random variables. Put S_0 = 0 and S_n = Σ_{i=1}^n ε_i. Then for any c > 0 it holds that

max_{0≤k≤n−⌊c log_2 n⌋} (S_{k+⌊c log_2 n⌋} − S_k)/⌊c log_2 n⌋ → β(c) a.s.,

where β(c) = 1 for 0 < c ≤ 1, and, if c > 1, then β(c) is the unique solution of the equation

1/c = 1 − h((1 + β)/2),    (4.1)

with h(x) = −x log_2 x − (1 − x) log_2(1 − x), 0 < x < 1. The function β(·) is strictly decreasing and continuous for c > 1 with lim_{c↘1} β(c) = 1 and lim_{c→∞} β(c) = 0.
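As a quick empirical illustration of Varga's observation and of the case β(c) = 1 (this simulation is not part of the thesis), the snippet below generates sequences of two hundred fair coin tosses and records the longest head-run, which is typically around seven.

```python
# Two thousand simulated sheets of 200 fair coin tosses each (illustration
# only): the longest head-run is typically around seven, while hand-written
# 'random' sequences rarely contain runs of more than four or five.
import numpy as np

rng = np.random.default_rng(4)

def longest_head_run(tosses):
    best = cur = 0
    for t in tosses:
        cur = cur + 1 if t == 1 else 0
        best = max(best, cur)
    return best

runs = [longest_head_run(rng.integers(0, 2, size=200)) for _ in range(2000)]
print("average longest head-run:", np.mean(runs))
print("fraction with a run of length >= 6:", np.mean([r >= 6 for r in runs]))
```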
The following result is a generalization of Theorem 4.1.1.
Theorem 4.1.2. (Erdős-Rényi) Let ε_1, ε_2, . . . be a sequence of i.i.d. random variables with E[ε_1] = 0. Assume there exists a neighbourhood I of t = 0 such that the moment generating function R(t) = E[e^{tε_1}] is finite for all t ∈ I. Let

ρ(x) = inf_{t∈I} e^{−tx} R(t),

the so-called Chernoff function of ε_1. Then for any c > 0 we have

max_{0≤k≤n−⌊c log n⌋} (S_{k+⌊c log n⌋} − S_k)/⌊c log n⌋ → α(c) a.s.,

where

α(c) = sup{ x : ρ(x) ≥ e^{−1/c} }.
Remark 4.1.3. Since ρ(0) = 1 and ρ(x) is a strictly decreasing function in the
interval where ρ(x) > 0, α(c) is well defined for every c > 0, and it is a continuous
decreasing function with limc→∞ α(c) = 0.
We will show how Theorem 4.1.1 can be seen as a special case of Theorem 4.1.2. Since ε_1 is a symmetric ±1-valued random variable, the moment generating function is given by R(t) = (1/2)(e^t + e^{−t}). As a result, the Chernoff function ρ(x) is given by

ρ(x) = (1 + x)^{−(1+x)/2} (1 − x)^{−(1−x)/2}   if 0 ≤ x < 1,
ρ(x) = 1/2   if x = 1,
ρ(x) = 0   if x > 1.

This can be seen as follows. Since

ρ(x) = inf_{t∈R} e^{−tx} R(t) = (1/2) inf_{t∈R} (e^{t(1−x)} + e^{−t(1+x)}),

it is clear that ρ(x) ≥ 0. First, assume that x > 1. Then 1 − x < 0 and 1 + x > 2 > 0, and therefore t(1 − x) → −∞ and −t(1 + x) → −∞, as t → ∞. Hence, the expression for ρ(x) above implies that ρ(x) = 0 if x > 1. Furthermore, when x = 1, we have that ρ(x) = (1/2) inf_{t∈R} (1 + e^{−2t}) = 1/2. Finally, consider the case where 0 ≤ x < 1. Put f(t) := e^{t(1−x)} + e^{−t(1+x)}; then ρ(x) = (1/2) inf_{t∈R} f(t). We will try to find t* such that inf_{t∈R} f(t) = f(t*). Therefore, we compute the derivative

f'(t) = (1 − x)e^{t(1−x)} − (1 + x)e^{−t(1+x)}.
It is clear that

f'(t) = 0 ⇔ (1 − x)e^{t(1−x)} = (1 + x)e^{−t(1+x)} ⇔ e^{2t} = (1 + x)/(1 − x).

From this reasoning, it follows that t* = (1/2) log((1 + x)/(1 − x)). Therefore,

inf_{t∈R} f(t) = f(t*) = exp((1/2)(1 − x) log((1 + x)/(1 − x))) + exp(−(1/2)(1 + x) log((1 + x)/(1 − x)))
  = ((1 + x)/(1 − x))^{(1−x)/2} + ((1 + x)/(1 − x))^{−(1+x)/2}
  = (1 + x)^{−(1+x)/2} (1 − x)^{−(1−x)/2} [(1 + x) + (1 − x)]
  = 2(1 + x)^{−(1+x)/2} (1 − x)^{−(1−x)/2}.
Hence, ρ(x) = f(t*)/2 = (1 + x)^{−(1+x)/2} (1 − x)^{−(1−x)/2}. This completes the proof of the general expression for the Chernoff function ρ(x). From this expression for ρ(x), we will now derive Theorem 4.1.1. Using Theorem 4.1.2, we obtain that

max_{0≤k≤n−⌊c log_2 n⌋} (S_{k+⌊c log_2 n⌋} − S_k)/⌊c log_2 n⌋ = max_{0≤k≤n−⌊(c/log 2) log n⌋} (S_{k+⌊(c/log 2) log n⌋} − S_k)/⌊(c/log 2) log n⌋ → α(c/log 2) a.s.

It remains to prove that α(c/log 2) = β(c) as defined in Theorem 4.1.1. To find the expression for α(c/log 2), first note that e^{−1/(c/log 2)} = 2^{−1/c}. We consider two cases. When 0 < c ≤ 1, then e^{−1/(c/log 2)} ≤ 1/2. Taking into account the expression for ρ(x) derived above, we obtain that α(c/log 2) = sup([0, 1]) = 1. On the other hand, when c > 1, we have

A := {x : ρ(x) ≥ e^{−1/(c/log 2)}} ⊂ {x : ρ(x) > 1/2} = [0, 1).
Thus, for x ∈ A, it holds that

ρ(x) ≥ e^{−1/(c/log 2)} ⇔ (1 + x)^{−(1+x)/2} (1 − x)^{−(1−x)/2} ≥ 2^{−1/c}
  ⇔ −((1 + x)/2) log_2(1 + x) − ((1 − x)/2) log_2(1 − x) ≥ −1/c
  ⇔ ((1 + x)/2) log_2(1 + x) + ((1 − x)/2) log_2(1 − x) ≤ 1/c.

We now define β as the solution of the equation

((1 + β)/2) log_2(1 + β) + ((1 − β)/2) log_2(1 − β) = 1/c.

Since the left-hand side is continuous and strictly increasing in β on [0, 1), with value 0 at β = 0 and limit 1 as β ↗ 1, this solution exists and is unique for c > 1; moreover A = [0, β], so that α(c/log 2) = β.
To complete the proof of Theorem 4.1.1, it clearly suffices to prove that

1/c = 1 − h((1 + β)/2),

with h(x) = −x log_2 x − (1 − x) log_2(1 − x). By the definition of β above, it is clear that

1/c = 1 − (1 − ((1 + β)/2) log_2(1 + β) − ((1 − β)/2) log_2(1 − β)).

Hence, it suffices to prove that

h((1 + β)/2) = 1 − ((1 + β)/2) log_2(1 + β) − ((1 − β)/2) log_2(1 − β).

Indeed,

h((1 + β)/2) = −((1 + β)/2) log_2((1 + β)/2) − (1 − (1 + β)/2) log_2(1 − (1 + β)/2)
  = −((1 + β)/2)(log_2(1 + β) − 1) − ((1 − β)/2)(log_2(1 − β) − 1)
  = (1 + β)/2 + (1 − β)/2 − ((1 + β)/2) log_2(1 + β) − ((1 − β)/2) log_2(1 − β)
  = 1 − ((1 + β)/2) log_2(1 + β) − ((1 − β)/2) log_2(1 − β).

Thus, we have completed the proof of Theorem 4.1.1 by making use of Theorem 4.1.2.
Remark 4.1.4. As an illustration of Theorem 4.1.2, we consider the case where ε_1 ∼ N(0, 1). It is well known that R(t) = e^{t^2/2}. Therefore, the Chernoff function is given by

ρ(x) = inf_{t∈R} e^{−tx} e^{t^2/2} = e^{−x^2/2} inf_{t∈R} e^{(1/2)(t−x)^2} = e^{−x^2/2}.

From this expression for the Chernoff function, we can easily compute α(c). Indeed, for c > 0, we have

α(c) = sup{ x : e^{−x^2/2} ≥ e^{−1/c} } = sup{ x : x^2/2 ≤ 1/c } = √(2/c).

Thus, considering a standard normal random variable ε_1, Theorem 4.1.2 implies that

max_{0≤k≤n−⌊c log n⌋} (S_{k+⌊c log n⌋} − S_k)/⌊c log n⌋ → √(2/c) a.s.
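Both Chernoff functions above, and the corresponding α(c), can be recovered numerically from the defining infimum. The sketch below (illustration only; the grids are arbitrary choices) minimises e^{−tx}R(t) over a grid of t-values, compares the result with the closed-form expressions, and then reads off α(c) ≈ √(2/c) for the Gaussian case.

```python
# Numerical recovery of the Chernoff functions (illustration only; grids are
# arbitrary): minimise e^{-tx} R(t) over a grid of t and compare with the
# closed forms above, then read off alpha(c) for the N(0,1) case.
import numpy as np

t = np.linspace(-20, 20, 40_001)

def chernoff(R, x):
    return np.min(np.exp(-t * x) * R(t))

R_coin = lambda u: 0.5 * (np.exp(u) + np.exp(-u))
R_gauss = lambda u: np.exp(u ** 2 / 2)

x = 0.4
print(chernoff(R_coin, x), (1 + x) ** (-(1 + x) / 2) * (1 - x) ** (-(1 - x) / 2))
print(chernoff(R_gauss, x), np.exp(-x ** 2 / 2))

c = 3.0                                   # alpha(c) = sup{x : rho(x) >= e^{-1/c}}
xs = np.linspace(0, 3, 3_001)
rho = np.array([chernoff(R_gauss, xi) for xi in xs])
print(xs[rho >= np.exp(-1 / c)].max(), np.sqrt(2 / c))     # approximately sqrt(2/c)
```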
In order to give a proof of Theorem 4.1.2, we first need another result.

Theorem 4.1.5. (Chernoff) Under the assumptions and notation of Theorem 4.1.2 we have

P{S_n ≥ nx} ≤ ρ^n(x),

and

(P{S_n ≥ nx})^{1/n} → ρ(x), as n → ∞,

for any x > 0.

Proof. For a proof of Chernoff's Theorem, we refer to the book of Durrett [15]. A combination of Theorem 2.6.3 and Theorem 2.6.5 yields the desired result.
We will now prove Theorem 4.1.2.

Proof. The proof consists of two steps. We will show that

lim sup_{n→∞} max_{0≤k≤n−l_n} (S_{k+l_n} − S_k)/l_n ≤ α a.s.,    (4.2)

and

lim inf_{n→∞} max_{0≤k≤n−l_n} (S_{k+l_n} − S_k)/l_n ≥ α a.s.,    (4.3)

where l_n = ⌊c log n⌋ and α = α(c). If we are able to prove these two assertions, this would obviously finish the proof. For the first step we define

n_j = inf{n ≥ 1 : l_n ≥ j}, j ≥ 1.

Clearly, for a suitable j_0 ≥ 1 and all j ≥ j_0, we have l_n = j whenever n_j ≤ n < n_{j+1}, and also n_j > j. Notice that we then get the following inequality for n_j ≤ n < n_{j+1}:

max_{0≤k≤n−l_n} (S_{k+l_n} − S_k)/l_n ≤ max_{0≤k≤n_{j+1}−j} (S_{k+j} − S_k)/j.

Defining for ε > 0 and j ≥ j_0,

A_j = A_j(c, ε) = { max_{0≤k≤n_{j+1}−j} (S_{k+j} − S_k)/j ≥ α + ε },

it is obviously enough to prove that

P(lim sup_{j≥j_0} A_j) = 0, for every ε > 0.    (4.4)
To that end we first note that

P(A_j) ≤ Σ_{k=0}^{n_{j+1}−j} P{ (S_{k+j} − S_k)/j ≥ α + ε } = (n_{j+1} − j + 1) P{ S_j/j ≥ α + ε } ≤ n_{j+1} P{S_j ≥ j(α + ε)}.

Using the inequality of Theorem 4.1.5, we obtain that P(A_j) ≤ n_{j+1} ρ^j(α + ε). By definition of α it follows that ρ(α + ε) < exp(−1/c). Hence, there exists a δ > 0 such that ρ(α + ε) ≤ exp(−(1 + δ)/c). Moreover, since c log n_{j+1} < j + 2, we also have n_{j+1} < exp((j + 2)/c). Therefore,

P(A_j) ≤ n_{j+1} exp(−j(1 + δ)/c) ≤ exp(2/c) exp(−jδ/c),

and we see that

Σ_{j=j_0}^∞ P(A_j) < ∞.

Using the Lemma of Borel-Cantelli we obtain (4.4), which in turn shows that (4.2) holds.
We now proceed with the proof of inequality (4.3). Therefore, let ε > 0, and put

B_n = B_n(c, ε) = { max_{0≤k≤n−l_n} (S_{k+l_n} − S_k)/l_n < α − ε }.

Since ρ(x) is a strictly decreasing function (in the interval where ρ(x) > 0), we have by definition of α = α(c) that ρ(α − ε) > e^{−1/c}. Hence, there exists a δ = δ(c, ε) ∈ (0, 1) such that ρ(α − ε) − δ ≥ exp(−(1 − δ)/c). By definition of B_n, we have

P(B_n) = P( ∩_{k=0}^{n−l_n} { (S_{k+l_n} − S_k)/l_n < α − ε } ) ≤ P( ∩_{d=1}^{⌊n/l_n⌋} { (S_{d l_n} − S_{(d−1) l_n})/l_n < α − ε } ).

Since the random variables S_{d l_n} − S_{(d−1) l_n}, for 1 ≤ d ≤ ⌊n/l_n⌋, are independent and identically distributed, we obtain that

P(B_n) ≤ Π_{d=1}^{⌊n/l_n⌋} P{ (S_{d l_n} − S_{(d−1) l_n})/l_n < α − ε }
  = ( P{ S_{l_n}/l_n < α − ε } )^{⌊n/l_n⌋}
  = ( 1 − P{ S_{l_n}/l_n ≥ α − ε } )^{⌊n/l_n⌋}.

By Chernoff's Theorem 4.1.5, we know that P{S_n/n ≥ x}^{1/n} ≥ ρ(x) − δ, for n large enough and x > 0. From this we can conclude that

P{ S_{l_n}/l_n ≥ α − ε } ≥ (ρ(α − ε) − δ)^{l_n},

for n large enough. If we combine the last two observations, we obtain that

P(B_n) ≤ ( 1 − (ρ(α − ε) − δ)^{l_n} )^{⌊n/l_n⌋} ≤ ( 1 − exp(−(1 − δ) l_n/c) )^{⌊n/l_n⌋},

for n large enough. Note that for the second inequality, we simply used the definition of δ. We will show that the term on the right-hand side is of order O(exp(−n^δ/(c log n))).
It is straightforward that

( 1 − exp(−(1 − δ) l_n/c) )^{⌊n/l_n⌋} = ( 1 − exp(−((1 − δ)/c) ⌊c log n⌋) )^{⌊n/⌊c log n⌋⌋}
  ≤ ( 1 − exp(−(1 − δ) log n) )^{⌊n/⌊c log n⌋⌋}
  = ( 1 − n^{−(1−δ)} )^{⌊n/⌊c log n⌋⌋}
  = exp( ⌊n/⌊c log n⌋⌋ log(1 − n^δ/n) ).

Using a Taylor expansion, we know that for 0 ≤ x < 1 it holds that

log(1 − x) = −x − x^2/2 − x^3/3 − · · · ≤ −x.
Using this inequality with x = n^{δ−1}, and noting that ⌊n/⌊c log n⌋⌋ ≥ n/(c log n) − 1, we obtain that

( 1 − exp(−(1 − δ) l_n/c) )^{⌊n/l_n⌋} ≤ exp( −(n^δ/n) ⌊n/⌊c log n⌋⌋ )
  ≤ exp( −(n^δ/n)(n/(c log n) − 1) )
  = exp( −n^δ/(c log n) + n^{δ−1} ).

Since δ < 1, we can conclude that

P(B_n) ≤ ( 1 − exp(−(1 − δ) l_n/c) )^{⌊n/l_n⌋} ≤ e · exp(−n^δ/(c log n)) = O( exp(−n^δ/(c log n)) ),

if we take n large enough. Therefore Σ_{n=1}^∞ P(B_n) < ∞. Using the Borel-Cantelli Lemma, we can conclude that P(lim inf_n B_n^c) = 1. This is equivalent to saying that

max_{0≤k≤n−l_n} (S_{k+l_n} − S_k)/l_n ≥ α − ε eventually with probability one.

This clearly implies that

lim inf_{n→∞} max_{0≤k≤n−l_n} (S_{k+l_n} − S_k)/l_n ≥ α − ε a.s.

Since this holds for all ε > 0, we have shown that

lim inf_{n→∞} max_{0≤k≤n−l_n} (S_{k+l_n} − S_k)/l_n ≥ α a.s.,

which concludes the proof of (4.3).
Note that α(c) in Theorem 4.1.2 is uniquely determined by the moment generating function R(t) of ε_1. The converse of this statement also holds.
Theorem 4.1.6. (Erdős-Rényi) Using the same assumptions and notation as in Theorem 4.1.2, the distribution function of ε_1 is uniquely determined by the function α(c).
Proof. By definition of α(c), it is clear that α(c) uniquely determines the Chernoff function ρ(x). Besides, the moment generating function R(t) is uniquely determined by the Chernoff function ρ(x). This can be seen as follows. Let L(t) := log R(t), t ∈ I. Then:

ρ(x) = exp( inf_t {L(t) − tx} ) =: exp(−λ(x)).

We have L'(t) = E[ε_1 exp(tε_1)]/R(t) and L''(t) = E[ε_1^2 exp(tε_1)]/R(t) − L'(t)^2, t ∈ I. Employing the Cauchy-Schwarz inequality, we find that L''(t) ≥ 0, and also, if ε_1 is non-degenerate, that L''(t) > 0, t ∈ I. For more details we refer to Proposition A.3.5. Thus we see that L'(t) is a strictly increasing (and continuous) function on I.

Set t_0 := sup I > 0 and A_0 := lim_{t→t_0} L'(t) > 0. Then L' : [0, t_0[ → [0, A_0[ is one-to-one with a positive derivative. This implies that for any x ∈ [0, A_0[, we can find a unique t = t(x) such that L'(t(x)) = x. Furthermore, this function is differentiable and we have t'(x) = 1/L''(t(x)), x ∈ ]0, A_0[. Also note that t(0) = 0, as E[ε_1] = 0.

Returning to the Chernoff function ρ(x) = exp(−λ(x)), it follows that

λ(x) = x t(x) − L(t(x)), x ∈ [0, A_0[.

Calculating the derivative of the last expression, we see that λ'(x) = t(x), x ∈ [0, A_0[. Consequently, t(x) is determined by λ, which in turn is determined by ρ. Recalling that t(x), 0 ≤ x < A_0, is the inverse function of L'(t), 0 ≤ t < t_0, and that L(0) = 0, we see after an integration that R(t) = exp(L(t)), 0 ≤ t < t_0, is determined by ρ.

Finally, it is a well-known fact that the moment generating function R(t) on any interval ]0, δ[, where δ > 0, uniquely determines the characteristic function of ε_1 and consequently its distribution function F.
For more information on the Theorem of Erdős-Rényi we refer to the article of
Deheuvels et al. [10].
To end this section, we will give an important corollary of the previous results.
As we already mentioned in the introduction, the theorem of Komlós et al. is optimal
in the sense that o(log n) cannot be achieved. Indeed, the following corollary states
that o(log n) convergence implies that the i.i.d. random variables under consideration
must have a standard normal distribution.
Corollary 4.1.7. Let {X_n : n ≥ 1} and {Y_n : n ≥ 1} be two sequences of i.i.d. random variables.

(a) If we have with probability one,

Σ_{j=1}^n (X_j − Y_j) = O(log n) as n → ∞,

and Y_1 ∼ N(0, 1), it follows that X_1 has a finite moment generating function in a neighbourhood of zero.

(b) If we have with probability one,

Σ_{j=1}^n (X_j − Y_j) = o(log n) as n → ∞,

and Y_1 ∼ N(0, 1), it follows that X_1 ∼ N(0, 1).
Proof. (a) Using the triangle inequality we see that

|X_n − Y_n| ≤ |Σ_{j=1}^n (X_j − Y_j)| + |Σ_{j=1}^{n−1} (X_j − Y_j)|.

Therefore, using the assumption in (a), we can conclude that

lim sup_{n→∞} |X_n − Y_n|/log n < ∞ a.s.

Next note that for any ε > 0,

Σ_{n=1}^∞ P{|Y_n| ≥ ε log n} ≤ Σ_{n=1}^∞ 2 exp(−ε^2 (log n)^2/2) < ∞.

In view of the Borel-Cantelli Lemma this means that |Y_n|/log n → 0 a.s. It follows that lim sup_{n→∞} |X_n|/log n < ∞ a.s. This in turn implies via Kolmogorov's 0-1 law that there exists a constant C ≥ 0 such that w.p. 1, lim sup_{n→∞} |X_n|/log n = C. Using once more the Borel-Cantelli Lemma, we see that for any D > C

Σ_{n=1}^∞ P{|X_n| > D log n} < ∞.

The random variables {X_n} have identical distributions, so we can rewrite this as

Σ_{n=1}^∞ P{exp(|X_1|/D) > n} < ∞,

which is equivalent to E[exp(|X_1|/D)] < ∞.
(b) From part (a) we know that X_1 has a finite mgf in a neighbourhood of zero, so that the theorem of Erdős-Rényi 4.1.2 can be applied. Setting S_n = Σ_{j=1}^n X_j and T_n = Σ_{j=1}^n Y_j, we have for any c > 0 that, w.p. 1,

max_{0≤k≤n−⌊c log n⌋} (S_{k+⌊c log n⌋} − S_k)/⌊c log n⌋ → α(c) as n → ∞,

where the function α is defined in Theorem 4.1.2. Likewise, we have, w.p. 1,

max_{0≤k≤n−⌊c log n⌋} (T_{k+⌊c log n⌋} − T_k)/⌊c log n⌋ → √(2/c) as n → ∞,

where we also used Remark 4.1.4. Using the trivial inequality

| max_{1≤i≤m} a_i − max_{1≤i≤m} b_i | ≤ max_{1≤i≤m} |a_i − b_i|,

with a_1, . . . , a_m, b_1, . . . , b_m ∈ R, we get that

| max_{0≤k≤n−⌊c log n⌋} (S_{k+⌊c log n⌋} − S_k)/⌊c log n⌋ − max_{0≤k≤n−⌊c log n⌋} (T_{k+⌊c log n⌋} − T_k)/⌊c log n⌋ | ≤ (2/⌊c log n⌋) max_{0≤k≤n} |S_k − T_k|,

where the right-hand side converges to zero by assumption. It follows that α(c) = √(2/c), c > 0. Using Theorem 4.1.6 and Remark 4.1.4, we obtain that X_1 ∼ N(0, 1).
4.2 Limit theorem concerning moving averages

The aim of this section is to show an application of the KMT theorem. We start by giving a limit theorem concerning moving averages of a standard Brownian motion. This is a theorem of Csörgő and Révész [8]; a significant part of it was actually shown by Lai [21]. Using the KMT theorem, one can show that this result remains valid for partial sums of i.i.d. random variables satisfying certain assumptions.
Theorem 4.2.1. (Csörgő-Révész, Lai) Let a_T, T ≥ 0, be a monotonically non-decreasing function of T, satisfying the following two conditions:

(i) 0 < a_T ≤ T,

(ii) T/a_T is monotonically non-decreasing.

Then

lim sup_{T→∞} |W(T + a_T) − W(T)|/β_T = 1 a.s.,

where {W(T) : T ≥ 0} is a standard Brownian motion and

β_T = ( 2a_T (log(T/a_T) + log log T) )^{1/2}.

Proof. For a proof of this theorem we refer to [8] and [21].
Since we have now established this limit theorem concerning moving averages
for the Brownian motion, we will continue with an extension of this result for more
general processes. In order to obtain this extension, we will need the general KMT
theorem 1, formulated in the introduction.
Theorem 4.2.2. Let ε_1, ε_2, . . . be a sequence of i.i.d. random variables with E[ε_1] = 0 and E[ε_1^2] = 1. Assume there exists a t_0 > 0 such that the moment generating function R(t) = E[e^{tε_1}] is finite if |t| < t_0. Let {a_n} be a monotonically non-decreasing sequence of integers, satisfying the following two conditions:

(i) 0 < a_n ≤ n,

(ii) n/a_n is monotonically non-decreasing.

Furthermore, assume that

a_n/log n → ∞, as n → ∞.

Putting S_n = Σ_{i=1}^n ε_i, we have

lim sup_{n→∞} |S_{n+a_n} − S_n|/β_n = 1 a.s.,

where

β_n = ( 2a_n (log(n/a_n) + log log n) )^{1/2}.
Proof. First, we will show that β_n/log n → ∞, as n → ∞. Note that

β_n^2/(log n)^2 = 2 (a_n/log n) · (log(n/a_n) + log log n)/log n.

We distinguish two cases. If a_n ≤ √n, then log(n/a_n) ≥ (1/2) log n, so

β_n^2/(log n)^2 ≥ a_n/log n → ∞,

by assumption. If a_n > √n, then β_n^2 ≥ 2a_n log log n > 2√n log log n, and again β_n^2/(log n)^2 → ∞. In both cases we conclude that β_n/log n → ∞.

Using the KMT theorem 1, we know there exists a Brownian motion {W(t) : 0 ≤ t < ∞} such that

lim sup_{n→∞} |S_n − W(n)|/log n < ∞ a.s.

Combining this with the fact that (log n)/β_n → 0, we obtain that

lim sup_{n→∞} |S_n − W(n)|/β_n = lim sup_{n→∞} (|S_n − W(n)|/log n) · (log n/β_n) = 0 a.s.

Moreover, Theorem 4.2.1 implies that

lim sup_{n→∞} |W(n + a_n) − W(n)|/β_n = 1 a.s.

When we combine the last two observations, we obtain the desired result.
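A rough simulation can make Theorem 4.2.2 concrete, although it can only be suggestive, since the theorem is an almost sure lim sup statement. In the sketch below (the choice a_n = ⌊n^{2/3}⌋, the horizon and the grid of n-values are arbitrary) we follow one long ±1 random walk and record |S_{n+a_n} − S_n|/β_n; the observed ratios are of order one.

```python
# A rough, purely suggestive check of Theorem 4.2.2 (the statement is an a.s.
# limsup): along one long +-1 walk, with a_n = floor(n^(2/3)), the ratios
# |S_{n+a_n} - S_n| / beta_n stay of order one.
import numpy as np

rng = np.random.default_rng(5)
N = 2_000_000
S = np.concatenate([[0], np.cumsum(rng.choice([-1, 1], size=N))])

ratios = []
for n in range(10_000, N - int(N ** (2 / 3)), 10_000):
    a_n = int(n ** (2 / 3))
    beta_n = np.sqrt(2 * a_n * (np.log(n / a_n) + np.log(np.log(n))))
    ratios.append(abs(S[n + a_n] - S[n]) / beta_n)
print("largest observed ratio:", max(ratios))
```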
The article of Csörgő and Révész [8] also contains some further results for the Brownian motion, similar to Theorem 4.2.1. In the same way as above, we can use the KMT theorem to obtain versions of these theorems for sums of random variables.
Appendix A
Some elaborations

A.1 Basic results related to probability theory

We begin with the definition and notation of one of the fundamental concepts of probability theory: characteristic functions.
Definition A.1.1. Let X : Ω → R^n be a random vector defined on a probability space (Ω, F, P). The characteristic function φ_X of X is defined as follows:

φ_X(t) := E exp(i⟨t, X⟩), for t ∈ R^n.

Theorem A.1.2. Let X : Ω → R be a random variable with characteristic function φ_X. If E[|X|^m] < ∞ for some m ∈ {0, 1, 2, . . .}, then all derivatives φ_X^{(k)} of order 0 ≤ k ≤ m exist and are continuous. Moreover, we have:

φ_X^{(k)}(t) = i^k E[X^k e^{itX}], for 1 ≤ k ≤ m.

Proof. For a proof of this theorem, we refer to p. 344 of [4].
Proposition A.1.3. Let Z be a random variable. Then Z ∼ N(0, σ^2) if and only if E[Zψ(Z)] = σ^2 E[ψ'(Z)] for all continuously differentiable functions ψ such that E|ψ'(Z)| < ∞.

Proof. Let Z be a Gaussian random variable with mean zero and variance σ^2. Using integration by parts, it is easily seen that:

σ^2 E[ψ'(Z)] = σ^2 ∫_{−∞}^{+∞} ψ'(x) e^{−x^2/(2σ^2)}/(√(2π)σ) dx
  = [ (σ/√(2π)) e^{−x^2/(2σ^2)} ψ(x) ]_{−∞}^{+∞} + ∫_{−∞}^{+∞} x ψ(x) e^{−x^2/(2σ^2)}/(√(2π)σ) dx = E[Zψ(Z)].

To see the last equality, we write f_Z for the Lebesgue density function of Z; then it suffices to show that

lim_{t→±∞} f_Z(t) ψ(t) = 0.    (A.1)

We will only show that f_Z(t)ψ(t) → 0 as t → ∞, since the limit for t → −∞ is then equal to zero as well, by symmetry. Let ε > 0 be arbitrary. For any t ∈ R and any x* < t we have:

|f_Z(t)ψ(t)| = | ∫_{x*}^t ψ'(s) f_Z(t) ds + ψ(x*) f_Z(t) | ≤ ∫_{x*}^t |ψ'(s)| f_Z(t) ds + |ψ(x*)| f_Z(t).

Since f_Z is a decreasing function, we obtain that

|f_Z(t)ψ(t)| ≤ ∫_{x*}^t |ψ'(s)| f_Z(s) ds + |ψ(x*)| f_Z(t)
  ≤ ∫_{x*}^∞ |ψ'(s)| f_Z(s) ds + |ψ(x*)| f_Z(t) =: T_1 + |ψ(x*)| f_Z(t).

Since ∫_{−∞}^∞ |ψ'(s)| f_Z(s) ds = E|ψ'(Z)| < ∞, we know that ∫_{x*}^∞ |ψ'(s)| f_Z(s) ds → 0, as x* → ∞. Therefore, we can choose x* large enough such that T_1 < ε/2. Furthermore, we can choose t > x* large enough such that |ψ(x*)| f_Z(t) < ε/2. Hence, we obtain that |f_Z(t)ψ(t)| < ε for t large enough. This concludes the proof of Equation (A.1), and one implication has been shown.

For the other implication, take t fixed and let ψ_1(x) = cos(tx) and ψ_2(x) = sin(tx). Then obviously ψ_j ∈ C^1(R) and E|ψ_j'(Z)| ≤ |t| < ∞, for j = 1, 2. Hence, by assumption, we have:

E[Z e^{itZ}] = E[Z cos(tZ)] + i E[Z sin(tZ)]
  = σ^2 E[−t sin(tZ)] + i σ^2 E[t cos(tZ)]
  = itσ^2 (E cos(tZ) + i E sin(tZ))
  = itσ^2 E[e^{itZ}] = itσ^2 φ_Z(t).

If we take ψ the identity function, we get from the assumptions that E[Z^2] = σ^2 < ∞. Hence, Theorem A.1.2 can be used and we get that φ_Z'(t) = i E[Z e^{itZ}]. Thus, by the above calculation, itσ^2 φ_Z(t) = −i φ_Z'(t), which yields the following differential equation:

0 = φ_Z'(t) + tσ^2 φ_Z(t).

Multiplying with the factor e^{t^2 σ^2/2}, this implies:

0 = e^{t^2 σ^2/2} φ_Z'(t) + tσ^2 e^{t^2 σ^2/2} φ_Z(t) = ( φ_Z(t) e^{t^2 σ^2/2} )'.

Integrating both sides leads to the following:

0 = ∫_0^t ( φ_Z(x) e^{x^2 σ^2/2} )' dx = φ_Z(t) e^{t^2 σ^2/2} − 1.

Now we can conclude that φ_Z(t) = e^{−t^2 σ^2/2}, which is the characteristic function of a Gaussian random variable with mean zero and variance σ^2. This concludes the proof.
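The 'only if' direction of Proposition A.1.3 (Stein's identity) is easy to check by simulation. The snippet below (illustration only; the test function and the parameters are arbitrary choices) compares E[Zψ(Z)] with σ^2 E[ψ'(Z)] for a Gaussian Z; the two estimates agree up to Monte Carlo error.

```python
# A Monte Carlo check of the 'only if' direction of Proposition A.1.3 (Stein's
# identity); the test function psi and the parameters are arbitrary choices.
import numpy as np

rng = np.random.default_rng(6)
sigma, reps = 1.5, 2_000_000
Z = sigma * rng.standard_normal(reps)

psi = lambda z: np.sin(z) + z ** 2            # continuously differentiable, E|psi'(Z)| finite
dpsi = lambda z: np.cos(z) + 2 * z
print(np.mean(Z * psi(Z)), sigma ** 2 * np.mean(dpsi(Z)))   # agree up to MC error
```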
The following result is a well-known corollary of Fubini's Theorem. This result is used frequently in this thesis.

Corollary A.1.4. Let X : Ω → R^{d_1} and Y : Ω → R^{d_2} be independent random vectors. Let f : R^{d_1} × R^{d_2} → R be a Borel-measurable function such that f ≥ 0 or E|f(X, Y)| < ∞. Then the following holds:

E[f(X, Y)] = ∫_{R^{d_2}} E[f(X, y)] P_Y(dy) = ∫_{R^{d_1}} E[f(x, Y)] P_X(dx).

Proof. This follows directly from Fubini's Theorem.
Proposition A.1.5. Let X : Ω → R be a non-negative random variable with E[X] = 0. Then X = 0 a.s.

Proof. Since X ≥ n^{−1} I_{{X≥1/n}}, we have that 0 = E[X] ≥ n^{−1} P{X ≥ 1/n}. Thus P{X ≥ 1/n} = 0, for all n ≥ 1. Since {X ≥ 1/n} ↗ {X > 0} and probability measures are continuous from below, we can conclude that P{X > 0} = 0. Hence X = 0 with probability one.
Theorem A.1.6. (Hölder's inequality) Let (Ω, F, P) be a probability space and let 1 < p < ∞. If X and Y are random variables and 1/p + 1/q = 1, then we have that:

E|XY| ≤ E[|X|^p]^{1/p} E[|Y|^q]^{1/q}.

Here we set a · ∞ = ∞ for a > 0, and 0 · ∞ = 0.

Proof. For a proof of Hölder's inequality we refer to pp. 80-81 of [4].
The inequality of Cauchy-Schwarz is a special case of Hölder's inequality. Recall that two random variables Y and Z are called linearly dependent if Z = 0 a.s. or Y = dZ a.s. for some d ∈ R.

Theorem A.1.7. (Cauchy-Schwarz) Let (Ω, F, P) be a probability space and let Y, Z be random variables. Then

E[YZ] ≤ E[Y^2]^{1/2} E[Z^2]^{1/2}.

Moreover, this inequality is strict whenever Y and Z are linearly independent.

Proof. The Cauchy-Schwarz inequality is a simple application of Hölder's inequality with p = q = 2. For the second part of the theorem, we proceed as follows. Assume that P{Z = 0} < 1 and E[YZ] = E[Y^2]^{1/2} E[Z^2]^{1/2}. Then it suffices to show that Y = dZ a.s. for some d ∈ R. To ease the notation, we write ‖·‖ for the L^2-norm (i.e. ‖X‖ = E[X^2]^{1/2} for a random variable X) and ⟨Y, Z⟩ := E[YZ]. Let c ∈ R; then

‖Y + cZ‖^2 = E[(Y + cZ)^2] = E[Y^2 + 2cYZ + c^2 Z^2]
  = ‖Y‖^2 + 2c⟨Y, Z⟩ + c^2 ‖Z‖^2
  = ‖Y‖^2 + 2c‖Y‖‖Z‖ + c^2 ‖Z‖^2
  = (‖Y‖ + c‖Z‖)^2.

Since P{Z = 0} < 1, we know by Proposition A.1.5 that ‖Z‖ > 0, so we can choose c := −‖Y‖/‖Z‖. Then ‖Y + cZ‖ = 0 and Proposition A.1.5 implies that Y + cZ = 0 a.s., or equivalently

Y = −cZ a.s.,

which completes the proof.
The theorem below, about the convolution of p-measures, is commonly used in probability theory. Also in this thesis, the result turns out to be very useful.

Theorem A.1.8. (Deconvolution Lemma) Let X : Ω → R^n be a random vector such that P_X = Q_1 * Q_2, where Q_1 and Q_2 are p-measures on (R^n, R^n). Assuming that there exists a uniform(0, 1) random variable U on (Ω, F, P) which is independent of X, we can define two independent random vectors Z_1 and Z_2 such that P_{Z_i} = Q_i, i = 1, 2, and X = Z_1 + Z_2.

Proof. For a proof of this result we refer to Lemma 3.35 on p. 161 of Dudley [13].
In this thesis, we will use conditional expectations several times. We start with the definition.
Definition A.1.9. Let (Ω, F, P) be a p-space and let X be an integrable random
variable, i.e. X ∈ L1 (Ω, F, P) (or X ≥ 0). Let G be a sub-σ-algebra of F. Then we
call Z ∈ L1 (Ω, F, P) (or Z ≥ 0) a version of the conditional expectation of X, given
G if
(i) Z is G-measurable, and
(ii) E[XIG ] = E[ZIG ], ∀G ∈ G.
If Y : Ω → Rd is a random vector, we write σ{Y } for the smallest sub-σ-algebra
of F for which Y is measurable. We can now state the following lemma.
Lemma A.1.10. Let Z : Ω → R be a random variable. Then the following are
equivalent,
(a) Z is σ{Y }-measurable.
(b) There exists a Borel measurable map g : Rd → R such that Z = g ◦ Y .
Proof. A proof of this lemma can be found in [4], p. 255.
Remark A.1.11. If Y : Ω → R^d is a random vector, we write E[X ‖ Y] := E[X ‖ σ{Y}]. Thus, by the above lemma there exists a suitable Borel-measurable map g : R^d → R such that E[X ‖ Y] = g ◦ Y. This map is unique P_Y-a.s. and we define

E[X ‖ Y = y] := g(y), y ∈ R^d.
The following properties of conditional expectations are useful in the calculations.

Proposition A.1.12. Let (Ω, F, P) be a probability space. Let X, Y : Ω → R be integrable random variables and let G be a sub-σ-algebra of F. Then we have:

(i) If XY is also integrable and X is G-measurable, then E[XY ‖ G] = X E[Y ‖ G] a.s.

(ii) E[E[X ‖ G]] = E[X].

(iii) If G and σ{X} are independent, then E[X ‖ G] = E[X].

Proof. For a proof of these properties, we refer to [4], pp. 447-448.
Another well-known theorem about conditional expectations is Jensen's inequality. It is used several times in this thesis.

Theorem A.1.13. (Jensen's inequality) Let ψ : R → R be a convex mapping and let X : Ω → R be a random variable such that X and ψ(X) are integrable. Then we have:

ψ(E[X ‖ G]) ≤ E[ψ(X) ‖ G] a.s.

Proof. For a proof of Jensen's inequality, we refer to [4], p. 449.
For the notation used in the following theorem we refer to Remark A.1.11.
Theorem A.1.14. Let X : Ω → R^{d_1} and Y : Ω → R^{d_2} be independent random vectors and let h : R^{d_1+d_2} → R be a Borel-measurable map. If E|h(X, Y)| < ∞, we have:

(i) E[h(X, Y) ‖ Y = y] = E[h(X, y)] with P_Y-probability one,

(ii) E[h(X, Y) ‖ X = x] = E[h(x, Y)] with P_X-probability one.

Proof. This follows directly from Fubini's Theorem.
The following two lemmas are used in the proofs of Chapter 3. These are two results about measures on product spaces and the corresponding density functions. We consider two arbitrary measurable spaces (X_i, A_i), i = 1, 2, with product space (X_1 × X_2, A_1 ⊗ A_2). Further, let π_i : X_1 × X_2 → X_i, i = 1, 2, be the canonical projections.

Lemma A.1.15. Let μ_i : A_i → [0, ∞] be σ-finite measures. Consider a p-measure Q on A_1 ⊗ A_2 which is dominated by μ_1 ⊗ μ_2 with density (x_1, x_2) ↦ f(x_1, x_2). Then the marginal distributions Q_i := Q^{π_i} are dominated by μ_i, i = 1, 2, with respective densities x_1 ↦ f_{X_1}(x_1) = ∫_{X_2} f(x_1, x_2) μ_2(dx_2) and x_2 ↦ f_{X_2}(x_2) = ∫_{X_1} f(x_1, x_2) μ_1(dx_1). Moreover, the conditional Q-distribution of π_1 given π_2 = x_2 is determined by the conditional μ_1-density x_1 ↦ f(x_1, x_2)/f_{X_2}(x_2), provided that f_{X_2}(x_2) > 0.

Proof. This follows directly from Fubini's theorem.
Lemma A.1.16. Let μ_i : A_i → [0, ∞] be σ-finite measures. Consider a p-measure Q on A_1 ⊗ A_2. Assume that the marginal distributions Q_i := Q^{π_i} are dominated by μ_i, i = 1, 2. If μ_1 is a counting measure (with at most countable support S_1), then Q is dominated by μ_1 ⊗ μ_2. (Consequently, there exists a density function.)

Proof. Let A ∈ A_1 ⊗ A_2 be such that (μ_1 ⊗ μ_2)(A) = 0. Further, we define

A_{x_1} := {x_2 ∈ X_2 : (x_1, x_2) ∈ A} ∈ A_2,

for any x_1 ∈ S_1. Since

(μ_1 ⊗ μ_2)(A) = ∫_{X_1} μ_2(A_{x_1}) μ_1(dx_1) = Σ_{x_1∈S_1} μ_2(A_{x_1}),

and (μ_1 ⊗ μ_2)(A) = 0, we can conclude that μ_2(A_{x_1}) = 0, for all x_1 ∈ S_1. Using that Q_2 ≪ μ_2, this implies that Q_2(A_{x_1}) = 0, for all x_1 ∈ S_1. Moreover, we have

Q(A) = Q(A ∩ {π_1 ∈ S_1}) = Q( ⊎_{x_1∈S_1} {x_1} × A_{x_1} )
  = Σ_{x_1∈S_1} Q({x_1} × A_{x_1})
  ≤ Σ_{x_1∈S_1} Q_1({x_1})^{1/2} Q_2(A_{x_1})^{1/2} = 0,

where the inequality can be easily verified using Hölder's inequality A.1.6. Thus, we have shown that Q(A) = 0, and we get that Q ≪ μ_1 ⊗ μ_2.
Now we will give some definitions and theorems on tight families of probability
measures. Let (S, d) be a metric space and let B denote the Borel σ-algebra of (S, d).
Definition A.1.17. A family (µi )i∈I of probability measures on (S, B) is called tight
if for every ε > 0 there exists a compact subset K ⊆ S such that µi (S \ K) < ε for
every i ∈ I.
Definition A.1.18. A family A = (µi )i∈I of probability measures on (S, B) is called
sequentially compact if every sequence in A has a weakly convergent subsequence.
The concepts of the two previous definitions are related. This is stated in the
following theorem, which is better known as Prohorov’s Theorem.
Theorem A.1.19. (Prohorov) Assume (S, d) is a Polish space and let (µi )i∈I be a
family of probability measures on (S, B). The family (µi )i∈I is tight if and only if it
is sequentially compact.
Proof. For a proof of Prohorov’s Theorem, we refer to [5], pp. 60-63.
Note that Rn equipped with the Euclidean metric is a Polish space. Hence, the
equivalence between tightness and sequential compactness is valid in particular for
probability measures on (Rn , Rn ). The following theorem gives a sufficient condition
for a family of probability measures on (Rn , Rn ) to be tight.
Theorem A.1.20. Let Xi : Ω → Rn be random vectors with distribution µi = PXi , i ∈ I.
If there exists an α > 0 such that sup_{i∈I} E[‖Xi ‖^α ] < ∞, then (µi )i∈I is a tight
family.
Proof. Assume there exists an α > 0 such that
c := sup_{i∈I} E[‖Xi ‖^α ] < ∞.
Let ε > 0 and choose M > (c/ε)^{1/α} . Using Markov’s inequality, we have for all i ∈ I
that
P{‖Xi ‖ > M } ≤ E[‖Xi ‖^α ]/M^α ≤ c/M^α < ε.
Define K := B(0, M ), then K is compact and
µi (Rn \ K) = P{‖Xi ‖ > M } < ε,
for every i ∈ I. This completes the proof.
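The only probabilistic ingredient of this proof is the Markov bound P{‖Xi ‖ > M } ≤
E[‖Xi ‖^α ]/M^α . As an illustration (our own sketch, for a standard normal family in R^2 and
α = 2, all parameters chosen arbitrarily), this bound can be checked numerically:

```python
import numpy as np

rng = np.random.default_rng(2)
n, alpha = 200_000, 2.0
X = rng.normal(size=(n, 2))              # i.i.d. copies of a standard normal vector in R^2
norms = np.linalg.norm(X, axis=1)

c = np.mean(norms ** alpha)              # Monte Carlo estimate of E[||X||^alpha] (= 2 here)
for M in (2.0, 3.0, 5.0):
    print(M, np.mean(norms > M), c / M ** alpha)   # empirical tail vs. Markov bound
```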
The following results give necessary and sufficient conditions for different types of
convergence. We will start by giving a characterization of convergence in probability.
Proposition A.1.21. Let (Ω, F, P) be a probability space. Assume Xn : Ω → R,
n ≥ 1 and X : Ω → R are random variables, then the following are equivalent:
(i) Xn →ᴾ X
(ii) For every subsequence {nk | k ≥ 1} there is a further subsequence {nkr | r ≥ 1}
such that Xnkr → X a.s.
Proof. For a proof of this proposition we refer to [12], Theorem 9.2.1.
The following result gives a characterization of Lp -convergence.
Proposition A.1.22. Let X, Xn , n ≥ 1 be random variables in Lp (Ω, F, P). Then
the following are equivalent:
(i) Xn → X in Lp
(ii) Xn →ᴾ X and {|Xn |p | n ≥ 1} is uniformly integrable.
(iii) Xn →ᴾ X and E|Xn |p → E|X|p .
Proof. For a proof of this characterization we refer to Theorem 5.10 in [16].
Another convergence result is the following.
Proposition A.1.23. Let Zn , n ≥ 1 be a sequence of random variables. Then the
following two properties are true.
(i) If supn≥1 E[|Zn |1+α ] < ∞ for some α > 0, then {Zn | n ≥ 1} is uniformly
integrable.
(ii) If Zn →ᵈ Z and {Zn | n ≥ 1} is uniformly integrable, then E[Zn ] → E[Z].
Proof. The first part is a standard exercise in probability theory (a possible argument is
sketched below). For the second part we refer to p. 338 of [4].
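Since the first part is left as an exercise, one possible argument is the following short LaTeX
sketch (our addition): Markov's inequality turns the moment bound into a bound that is
uniform in n.

```latex
% Sketch of Proposition A.1.23(i). Assume c := \sup_{n \ge 1} E[|Z_n|^{1+\alpha}] < \infty.
% On the event \{|Z_n| > M\} we have |Z_n|^{\alpha}/M^{\alpha} \ge 1, hence
\[
  E\bigl[|Z_n| \mathbf{1}_{\{|Z_n| > M\}}\bigr]
  \;\le\; E\Bigl[|Z_n| \,\frac{|Z_n|^{\alpha}}{M^{\alpha}}\,\mathbf{1}_{\{|Z_n| > M\}}\Bigr]
  \;\le\; \frac{E\bigl[|Z_n|^{1+\alpha}\bigr]}{M^{\alpha}}
  \;\le\; \frac{c}{M^{\alpha}} \longrightarrow 0 \quad (M \to \infty),
\]
% and the bound does not depend on n, which is exactly uniform integrability.
```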
The following proposition is used several times in this paper. It is a result on
weak convergence based on a monotonicity argument.
Proposition A.1.24. Let (S, d) be a separable metric space. Let X : Ω → S and
Xn : Ω → S, n ≥ 1 be random elements in S, and let g : S → [0, ∞[ be continuous. If
Xn →ᵈ X, then
Eg(X) ≤ lim inf_{n→∞} Eg(Xn ).
Proof. Fix M > 0 and let gM := g ∧ M . Since Xn →ᵈ X and gM is bounded and
continuous, the definition of convergence in distribution implies that
EgM (X) = lim_{n→∞} EgM (Xn ) ≤ lim inf_{n→∞} Eg(Xn ),
where the inequality follows from the fact that gM ≤ g. If we let M ↑ ∞, we can
use monotone convergence and we get
Eg(X) = lim_{M→∞} EgM (X) ≤ lim inf_{n→∞} Eg(Xn ).
A.2 Topological results required for Theorem 1.1.1
We start with the formulation of the so-called ’Schauder-Tychonoff’ theorem.
Theorem A.2.1. (Schauder-Tychonoff ) Every non-empty, compact and convex subset K of a locally convex topological vector space V has the fixed point property. This
means that any continuous function f : K → K has a fixed point.
Proof. For a proof of this theorem we refer to Chapter V, 10.5 of [14].
Using the same notation as in Theorem 1.1.1, let
V = {ϕ : Rn → R | ϕ is σ-additive and ϕ(∅) = 0} ,
the space of all finite signed measures on (Rn , Rn ). In the proof of Theorem 1.1.1
we want to apply the Schauder-Tychonoff theorem for this particular choice of V .
Thus, we will start by defining a topology T on V such that (V, T ) is a locally convex
topological vector space.
A.2.1 (V, T ) is a locally convex topological vector space
We start with the definition of a separating family of semi-norms.
Definition A.2.2. Let X be a vector space, then p : X → R is called a semi-norm
on X if
1. p(x + y) ≤ p(x) + p(y), ∀x, y ∈ X.
2. p(αx) = |α| p(x), ∀x ∈ X, ∀α ∈ R.
Definition A.2.3. A family {pi : i ∈ I} of semi-norms on a vector space X is said
to satisfy the axiom of separation if
∀x 6= 0, ∃i ∈ I : pi (x) 6= 0.
Let Cc (Rn ) denote the set of all continuous functions f : Rn → R with compact
support. Moreover, for any f ∈ Cc (Rn ) and each µ ∈ V we define
|µ|f := |∫ f dµ| .
It can be easily shown that V is a vector space and that
{| · |f : f ∈ Cc (Rn )}
is a separating family of semi-norms on V . We will now construct a topology T on
V . For any f ∈ Cc (Rn ) and each ε > 0, we define
V (f, ε) := {µ ∈ V : |∫ f dµ| < ε} .
Taking finite intersections of such sets, we obtain the class B,
B := { ∩_{j=1}^{k} V (fj , εj ) : k ≥ 1, fj ∈ Cc (Rn ), εj > 0, ∀j } .
The set of all neighbourhoods in V is defined as
C := {µ0 + U : µ0 ∈ V, U ∈ B} .
Finally, C is a base for T . That is, T consists of all arbitrary unions of sets in C.
By definition, this topology T is the topology generated by the separating family of
semi-norms
{| · |f : f ∈ Cc (Rn )} .
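That each |·|f is indeed a semi-norm in the sense of Definition A.2.2 follows from the linearity
of the integral (compare Proposition A.3.2(ii), extended to signed measures via the Jordan
decomposition); the following short LaTeX sketch is our own verification and is not part of
the cited constructions.

```latex
% For f in C_c(R^n), signed measures mu, nu in V and alpha in R:
\[
  |\mu + \nu|_f = \Bigl|\int f \, d(\mu + \nu)\Bigr|
                = \Bigl|\int f \, d\mu + \int f \, d\nu\Bigr|
                \le |\mu|_f + |\nu|_f,
  \qquad
  |\alpha\mu|_f = \Bigl|\int f \, d(\alpha\mu)\Bigr| = |\alpha|\,|\mu|_f.
\]
% Separation: if mu is not the zero measure, one can find (by regularity of finite
% signed Borel measures on R^n) some f in C_c(R^n) with \int f \, d\mu \neq 0,
% so that |mu|_f \neq 0.
```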
Theorem A.2.4. (V, T ) is a locally convex topological vector space.
Proof. A proof can be found in Chapter I.1 of [26] or in [22].
A.2.2 The vague topology
For the details and the proofs of this subsection, we refer to Chapter IV of [2]. In
general, let E be a locally compact topological space. (For the purposes of Theorem
1.1.1, we are only interested in the case where E = Rn ). Furthermore, let M+ (E)
denote the set of all Radon measures on the Borel σ-algebra B(E) of E.
Definition A.2.5. Let µ be a measure on the Borel sets B(E) of E. We call µ a
(i) ’Borel measure’ if µ(K) < ∞ for every compact K ⊂ E;
(ii) ’Radon measure’ if
(a) µ is ’locally finite’. That is, every point of E has an open neighbourhood
of finite µ-measure.
(b) µ is ’inner regular’. That is, for every B ∈ B(E):
µ(B) = sup {µ(K) : K ⊂ B, K compact} .
An easy verification shows that (a) implies (i), so that every Radon measure
on B(E) is a Borel measure. The converse is not true in general. However, we do have
the following theorem.
Theorem A.2.6. If the locally compact space E has a countable base for its topology,
then every Borel measure on B(E) is a Radon measure.
We now introduce the following notation
M1+ (E) := {µ ∈ M+ (E) : µ(E) = 1}
for the ’Radon probability measures’. In view of the observations above, it is clear
that M1+ (Rn ) is the set of all probability measures on (Rn , Rn ). Using the same
notation as in Theorem 1.1.1, we see that M1+ (Rn ) = Ṽ .
The goal of the remainder of this subsection is as follows. We will provide M+ (E)
with a topology T ′. The convergence of sequences in M1+ (E) will turn out to be the
weak convergence of p-measures. Moreover, if we take E = Rn , we will obtain that
the trace topologies coincide,
TṼ = T′Ṽ
(i.e. {G ∩ Ṽ : G ∈ T } = {G′ ∩ Ṽ : G′ ∈ T ′}), and that T ′ is metrizable. This also
implies that (Ṽ , TṼ ) is a metrizable space.
We start with the definition of ’vague convergence’. Analogously as before we let
Cc (E) denote the set of all continuous functions f : E → R with compact support.
Definition A.2.7. A sequence (µn )n of Radon measures on B(E) is said to be ’vaguely
convergent’ to a Radon measure µ if
lim_{n→∞} ∫ f dµn = ∫ f dµ, ∀f ∈ Cc (E).
Vague convergence of sequences in M+ (E) is convergence in a certain topology
on M+ (E), called ’the vague topology’. It is defined as the coarsest topology on
M+ (E) with respect to which all mappings
µ ↦ ∫ f dµ (f ∈ Cc (E))
are continuous. A fundamental system of neighbourhoods of a typical µ0 ∈ M+ (E)
consists of all sets of the form
V_{f1 ,...,fn ;ε} (µ0 ) = {µ ∈ M+ (E) : |∫ fj dµ − ∫ fj dµ0 | < ε, j = 1, . . . , n} .
We denote the vague topology on M+ (E) by T ′. The following theorem reveals there
is a connection between vague convergence and weak convergence.
Theorem A.2.8. A sequence (µn )n in M1+ (E) is vaguely convergent to µ ∈ M1+ (E)
if and only if it is weakly convergent to µ.
For a variety of applications it is important to know when, in terms of E, the
vague topology T ′ of M+ (E) is metrizable. One reason is that sequences suffice for
dealing with metric topologies, but generally not for non-metric ones. An answer to
this metrizability question is given in the next theorem.
Theorem A.2.9. The following assertions about a locally compact space E are equivalent:
(a) M+ (E) is a Polish space in its vague topology.
(b) The vague topology T ′ of M+ (E) is metrizable and has a countable base.
(c) The topology of E has a countable base.
(d) E is a Polish space.
Since property (c) is certainly true if we take E = Rn , we know that (M+ (Rn ), T ′)
is metrizable. Hence, also (Ṽ , T′Ṽ ) is metrizable. Finally, using the construction of
the topologies T and T ′, a straightforward calculation shows that TṼ = T′Ṽ . Hence,
(Ṽ , TṼ ) is a metric space. Moreover, when we apply Theorem A.2.8 with E = Rn ,
and we use that TṼ = T′Ṽ , we see that convergence of sequences in (Ṽ , TṼ ) is simply
weak convergence.
A.3 Some straightforward calculations
In this section we work out some calculations that are useful in the text.
Proposition A.3.1. Let X be a random variable with E[X] = 0 and E[X 2 ] < ∞.
Suppose X has a density function ρ with support I ⊆ R, where I is a bounded or
unbounded interval. Let
h(x) := ( ∫_x^∞ y ρ(y) dy ) / ρ(x)   if x ∈ I,        h(x) := 0   if x ∉ I.
Then E[h(X)] = E[X^2].
Proof. Notice that for every x ∈ I, we have by definition of h that h(x)ρ(x) =
∫_x^∞ y ρ(y) dy. This is also true for x ∉ I, because I is an interval and E[X] = 0.
Furthermore, we also notice that
∫_x^∞ y ρ(y) dy = − ∫_{−∞}^x y ρ(y) dy,
since E[X] = 0. We now obtain the following:
E[h(X)] = ∫_{−∞}^{+∞} h(x)ρ(x) dx
        = ∫_{−∞}^{+∞} ∫_x^∞ y ρ(y) dy dx
        = ∫_{−∞}^0 ∫_x^∞ y ρ(y) dy dx + ∫_0^∞ ∫_x^∞ y ρ(y) dy dx
        = ∫_{−∞}^0 ∫_{−∞}^x (−y)ρ(y) dy dx + ∫_0^∞ ∫_x^∞ y ρ(y) dy dx
        = ∫_{−∞}^0 (−y)ρ(y) ( ∫_y^0 dx ) dy + ∫_0^∞ y ρ(y) ( ∫_0^y dx ) dy
        = ∫_{−∞}^0 y^2 ρ(y) dy + ∫_0^∞ y^2 ρ(y) dy = ∫_{−∞}^{+∞} y^2 ρ(y) dy = E[X^2],
where we have used Fubini’s Theorem in the fifth equality; this is allowed since both
integrands in the fourth line are nonnegative.
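As a concrete (illustrative) example, take X uniformly distributed on [−1, 1]; then ρ ≡ 1/2 on
I = [−1, 1], so h(x) = (1 − x^2)/2 and E[h(X)] = 1/3 = E[X^2]. The following Python sketch,
added here only as a numerical sanity check, confirms this.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.uniform(-1.0, 1.0, size=500_000)

# For X ~ U[-1, 1]: rho = 1/2 on [-1, 1], hence h(x) = (1 - x^2)/2 and E[X^2] = 1/3.
h = (1.0 - X ** 2) / 2.0
print(h.mean(), (X ** 2).mean())         # both values are close to 1/3
```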
Proposition A.3.2. Let µ1 , µ2 be measures on (Rn , Rn ) and let α, β ∈ R+ . Then
the following statements are true:
(i) αµ1 + βµ2 is a measure on (Rn , Rn ).
(ii) For any Borel-measurable function f : Rn → R (nonnegative, or integrable with
respect to both µ1 and µ2 ), we have:
∫ f d(αµ1 + βµ2 ) = α ∫ f dµ1 + β ∫ f dµ2 .
Proof. The proof of this proposition is straightforward.
Proposition A.3.3. Let ‖ · ‖ denote the matrix norm induced by the Euclidean norm
on Rn . Let A = (aij ) be a real n × n-matrix, then for all 1 ≤ i, j ≤ n:
|aij | ≤ ‖A‖.
Proof. Let 1 ≤ i, j ≤ n and let A·,j denote the j-th column of A. Then, obviously:
|aij | ≤ √( Σ_{l=1}^n a_{lj}^2 ) = ‖A·,j ‖ = ‖Aej ‖ = ‖Aej ‖ / ‖ej ‖ ≤ ‖A‖,
where ej = (0, . . . , 0, 1, 0, . . . , 0)T .
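A quick numerical sanity check of this bound (our own sketch; NumPy computes the induced
matrix norm as the largest singular value):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(5, 5))

op_norm = np.linalg.norm(A, 2)                    # matrix norm induced by the Euclidean norm
print(np.abs(A).max(), op_norm)                   # the largest |a_ij| never exceeds ||A||
print(bool(np.abs(A).max() <= op_norm + 1e-12))   # True
```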
Proposition A.3.4. Let X be a bounded random variable. Then X has a finite
moment generating function R. Moreover, R is differentiable and:
R′(t) = E[X e^{tX}].
Proof. By assumption, there exists a constant C such that |X| ≤ C. This implies
that R(t) = E[e^{tX}] ≤ E[e^{|t||X|}] ≤ e^{|t|C} < ∞. Since the mgf of X is finite, it is a
well-known fact that R(t) can be written as follows:
R(t) = Σ_{j=0}^∞ E[X^j] t^j / j! .
The proof of this result is straightforward and can be found in [4], p. 146. Since
we now have a representation of R as a power series, it follows from analysis that
all derivatives R^{(k)} of R exist. Moreover, it holds that
R^{(k)}(t) = Σ_{j=k}^∞ E[X^j] t^{j−k} / (j − k)! , for t ∈ R and k = 1, 2, . . . .
Thus, in particular we have that:
R′(t) = Σ_{j=1}^∞ E[X^j] t^{j−1} / (j − 1)! , t ∈ R.
Let Yn := Σ_{j=1}^n X^j t^{j−1} / (j − 1)! . Obviously, we have that:
Yn = X Σ_{j=1}^n X^{j−1} t^{j−1} / (j − 1)! = X Σ_{j=0}^{n−1} X^j t^j / j! → X e^{tX} a.s., as n → ∞.
Since |Yn| ≤ |X| e^{|t||X|} ≤ C e^{|t|C} for all n, we can use the Bounded Convergence
Theorem, which implies that:
Σ_{j=1}^n E[X^j] t^{j−1} / (j − 1)! = E[Yn] → E[X e^{tX}].
Consequently, E[X e^{tX}] = Σ_{j=1}^∞ E[X^j] t^{j−1} / (j − 1)! = R′(t), which concludes the proof.
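As a small worked example (our addition), consider the symmetric ±1-distribution, i.e. the
step distribution of the simple random walk. Here the proposition can be verified by a direct
computation:

```latex
% X with P(X = 1) = P(X = -1) = 1/2 is bounded (C = 1), and
\[
  R(t) = E\bigl[e^{tX}\bigr] = \tfrac12\bigl(e^{t} + e^{-t}\bigr) = \cosh t,
  \qquad
  R'(t) = \sinh t = \tfrac12\bigl(e^{t} - e^{-t}\bigr) = E\bigl[X e^{tX}\bigr].
\]
```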
Proposition A.3.5. Let X be a random variable with E[X] = 0 and assume there
exists a neighbourhood I of t = 0 such that the mgf R(t) = E[e^{tX}] is finite for all
t ∈ I. Let L(t) := log R(t), t ∈ I. Then
(i) L′′(t) ≥ 0, for all t ∈ I,
(ii) L′′(t) > 0, for all t ∈ I, if X is non-degenerate.
Proof. Clearly L′(t) = R′(t)/R(t) and therefore
L′′(t) = ( R′′(t)R(t) − (R′(t))^2 ) / (R(t))^2 .
From this expression for L′′(t), we see that L′′(t) ≥ 0 if and only if R′′(t)R(t) −
(R′(t))^2 ≥ 0. Using the well-known expressions for R′(t) and R′′(t), we see that this
is equivalent with saying that
E[X^2 exp(tX)] E[exp(tX)] ≥ E[X exp(tX)]^2 .
This inequality is indeed valid, since if we put Y := X exp(tX/2) and Z := exp(tX/2),
then the inequality of Cauchy-Schwarz A.1.7 implies that
E[X exp(tX)]^2 = E[X exp(tX/2) exp(tX/2)]^2 = E[Y Z]^2 ≤ E[Y^2] E[Z^2] = E[X^2 exp(tX)] E[exp(tX)].
Thus we have shown that L′′(t) ≥ 0. Besides, we know that the inequality of Cauchy-
Schwarz is strict whenever Y and Z are linearly independent. This is obviously the
case when X is non-degenerate. Indeed, if Y and Z were linearly dependent, then
Z = 0 a.s. or Y = dZ a.s. for some d ∈ R. Clearly Z ≠ 0 almost surely. And if
Y = dZ a.s. for some d ∈ R, then X = d a.s., which is a contradiction, since X is
non-degenerate. Hence L′′(t) > 0 when we assume that X is non-degenerate.
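Continuing the ±1 example given after Proposition A.3.4 (again our own illustration): this X
is centred and non-degenerate, and the strict convexity of L is explicit.

```latex
\[
  L(t) = \log \cosh t, \qquad L'(t) = \tanh t, \qquad
  L''(t) = 1 - \tanh^{2} t = \frac{1}{\cosh^{2} t} > 0
  \quad \text{for all } t \in \mathbb{R}.
\]
```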
Proposition A.3.6. For all real numbers x and y, the following is true:
|e^x − e^y| ≤ ½ |x − y| (e^x + e^y).
Proof. The inequality we have to show can be rewritten as:
e^x |1 − e^{y−x}| ≤ ½ |x − y| e^x (1 + e^{y−x}),
which is equivalent with |1 − e^{y−x}| ≤ ½ |y − x| (1 + e^{y−x}). Thus it suffices to prove
that for all t ∈ R it holds that:
|1 − e^t| / (1 + e^t) ≤ ½ |t|.
Let g(t) := |1 − e^t| / (1 + e^t) and h(t) := ½ |t|, then obviously g(0) = 0 = h(0). Thus,
it suffices to show that g′(t) ≤ h′(t) for all t > 0 and g′(t) ≥ h′(t) for all t < 0. First
let t > 0. Then g(t) = (e^t − 1)/(1 + e^t) = 1 − 2/(1 + e^t) and h(t) = t/2. Thus, we
want to prove that 2e^t/(1 + e^t)^2 ≤ 1/2. We will rewrite this inequality until we get
an equivalent expression from which we know it holds. Clearly,
2e^t/(1 + e^t)^2 ≤ 1/2 ⇔ 4e^t ≤ 1 + 2e^t + e^{2t} ⇔ 0 ≤ 1 − 2e^t + e^{2t} ⇔ 0 ≤ (1 − e^t)^2,
where the last inequality obviously holds. Now let t < 0, then g(t) = (1 − e^t)/(1 + e^t) =
−1 + 2/(1 + e^t) and h(t) = −t/2. So, we want to prove that −2e^t/(1 + e^t)^2 ≥ −1/2.
This is equivalent with showing that 2e^t/(1 + e^t)^2 ≤ 1/2, which we already proved.
This concludes the proof of this proposition.
Proposition A.3.7. Let Z be a standard normal rv and let α, β ∈ R with β < 1/2.
Then, we have
E[exp(αZ + βZ^2)] = (1/√(1 − 2β)) exp( α^2 / (2(1 − 2β)) ).
Proof. Using the density function of Z, it is easy to see that:
E[exp(αZ + βZ^2)] = (1/√(2π)) ∫_{−∞}^{∞} exp(αx + βx^2) exp(−x^2/2) dx
= (1/√(2π)) exp( α^2 / (2(1 − 2β)) ) ∫_{−∞}^{∞} exp( −½ ( √(1 − 2β) x − α/√(1 − 2β) )^2 ) dx.
Put y = √(1 − 2β) x − α/√(1 − 2β) and perform a substitution. Then we obtain that:
E[exp(αZ + βZ^2)] = (1/√(2π)) (1/√(1 − 2β)) exp( α^2 / (2(1 − 2β)) ) ∫_{−∞}^{∞} exp(−y^2/2) dy
= (1/√(1 − 2β)) exp( α^2 / (2(1 − 2β)) ),
where we used that the density function of a standard normal rv integrates to 1.
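A Monte Carlo sanity check of this identity (our own sketch; the parameters are arbitrary,
with β < 1/4 so that the estimator also has finite variance):

```python
import numpy as np

rng = np.random.default_rng(5)
Z = rng.normal(size=1_000_000)

alpha, beta = 0.5, 0.2
lhs = np.mean(np.exp(alpha * Z + beta * Z ** 2))
rhs = np.exp(alpha ** 2 / (2 * (1 - 2 * beta))) / np.sqrt(1 - 2 * beta)
print(lhs, rhs)                          # the two values should be close
```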
List of symbols
Symbol                       Description
mgf                          moment generating function
p-space                      probability space
p-measure                    probability measure
rv                           random variable
U [a, b]                     uniform distribution on [a, b]
N (µ, σ^2)                   normal distribution with expectation µ and variance σ^2
Φ                            standard normal distribution function
B(0, r)                      closed ball in Rn with center 0 and radius r
(S, d)                       metric space
B                            Borel subsets of a certain metric space (S, d)
Rd                           Borel σ-field in Rd
Ck                           space of functions which are k times continuously differentiable
C∞                           space of functions with derivatives up to any order
Lp                           space of random variables X with E[|X|p ] < ∞
a.s.                         almost surely
→ᴾ                           convergence in probability
→ᵈ                           convergence in distribution
→ʷ                           weak convergence
σ{Y }                        smallest sub-σ-field of F making Y measurable
R                            moment generating function
φX                           characteristic function of a random vector X
IA                           indicator function
Ac                           complement of A
SRW                          simple random walk
i.i.d.                       independent and identically distributed
SLLN                         strong law of large numbers
KMT                          Komlós-Major-Tusnády
B = (Bt )t≥0                 Brownian Motion
W = {W (t) : t ≥ 0}          Brownian Motion
B ◦ = (Bt◦ )t≥0              Brownian Bridge
⊎i∈I Ai                      disjoint union of the sets Ai , i ∈ I
a ∨ b                        maximum of a and b
a ∧ b                        minimum of a and b
⌊a⌋                          floor function, i.e. the largest integer not greater than a ∈ R
‖ · ‖                        Euclidean norm or induced matrix norm
δij                          the Kronecker delta
µ1 ⊗ µ2                      product measure of µ1 and µ2
µ1 ≪ µ2                      the measure µ1 is dominated by the measure µ2
PX                           distribution of the random vector X
Bibliography
[1] K. Azuma, Weighted sums of certain dependent random variables, Tôhoku Math.
J. (2) 19 (1967), 357–367.
[2] H. Bauer, Measure and integration theory, de Gruyter Studies in Mathematics,
vol. 26, Walter de Gruyter & Co., Berlin, 2001. Translated from the German by
Robert B. Burckel.
[3] R. Bhatia, Matrix analysis, Graduate Texts in Mathematics, vol. 169, Springer-Verlag, New York, 1997.
[4] P. Billingsley, Probability and measure, 3rd ed., Wiley Series in Probability and
Mathematical Statistics, John Wiley & Sons, Inc., New York, 1995. A Wiley-Interscience Publication.
[5] P. Billingsley, Convergence of probability measures, 2nd ed., Wiley Series in Probability
and Statistics: Probability and Statistics, John Wiley & Sons, Inc., New York,
1999. A Wiley-Interscience Publication.
[6] S. Chatterjee, A new approach to strong embeddings, Probab. Theory Related
Fields 152 (2012), no. 1-2, 231–264.
[7] H. Chernoff, A measure of asymptotic efficiency for tests of a hypothesis based
on the sum of observations, Ann. Math. Statistics 23 (1952), 493–507.
[8] M. Csörgő and P. Révész, How big are the increments of a Wiener process?, Ann.
Probab. 7 (1979), no. 4, 731–737.
[9] M. Csörgő and P. Révész, Strong approximations in probability and statistics, Probability and
Mathematical Statistics, Academic Press, Inc. [Harcourt Brace Jovanovich, Publishers], New York-London, 1981.
[10] P. Deheuvels, L. Devroye, and J. Lynch, Exact convergence rate in the limit
theorems of Erdős-Rényi and Shepp, Ann. Probab. 14 (1986), no. 1, 209–223.
[11] A. Dembo, Y. Peres, J. Rosen, and O. Zeitouni, Cover times for Brownian motion
and random walks in two dimensions, Ann. of Math. (2) 160 (2004), no. 2, 433–
464.
[12] R. M. Dudley, Real analysis and probability, Cambridge Studies in Advanced
Mathematics, vol. 74, Cambridge University Press, Cambridge, 2002. Revised
reprint of the 1989 original.
[13] R. M. Dudley, Uniform central limit theorems, 2nd ed., Cambridge Studies in Advanced
Mathematics, Cambridge University Press, Cambridge, 2014.
[14] N. Dunford and J. T. Schwartz, Linear Operators. I. General Theory, With the
assistance of W. G. Bade and R. G. Bartle. Pure and Applied Mathematics, Vol.
7, Interscience Publishers, Inc., New York; Interscience Publishers, Ltd., London,
1958.
[15] R. Durrett, Probability: theory and examples, 4th ed., Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press, Cambridge,
2010.
[16] U. Einmahl, Maattheorie, Vrije Universiteit Brussel, 2012-2013
(http://homepages.vub.ac.be/ ueinmahl/).
[17] E. Hewitt and K. Stromberg, Real and abstract analysis, Springer-Verlag, New
York-Heidelberg, 1975. A modern treatment of the theory of functions of a real
variable; Third printing; Graduate Texts in Mathematics, No. 25.
[18] J. Kiefer, On the deviations in the Skorokhod-Strassen approximation scheme, Z.
Wahrscheinlichkeitstheorie und Verw. Gebiete 13 (1969), 321–332.
[19] J. Komlós, P. Major, and G. Tusnády, An approximation of partial sums of
independent RV’s and the sample DF. I, Z. Wahrscheinlichkeitstheorie und Verw.
Gebiete 32 (1975), 111–131.
[20] J. Komlós, P. Major, and G. Tusnády, An approximation of partial sums of independent RV’s,
and the sample DF. II, Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 34 (1976), no. 1, 33–58.
[21] T. L. Lai, Limit theorems for delayed sums, Ann. Probab. 2 (1974), no. 3, 432–
440.
[22] W. Rudin, Functional analysis, 2nd ed., International Series in Pure and Applied
Mathematics, McGraw-Hill, Inc., New York, 1991.
[23] A. I. Sakhanenko, Rate of convergence in the invariance principle for variables
with exponential moments that are not identically distributed, Limit theorems
for sums of random variables, Trudy Inst. Mat., vol. 3, “Nauka” Sibirsk. Otdel.,
Novosibirsk, 1984, pp. 4–49 (Russian).
[24] V. Strassen, An invariance principle for the law of the iterated logarithm, Z.
Wahrscheinlichkeitstheorie und Verw. Gebiete 3 (1964), 211–226.
[25] V. Strassen, Almost sure behavior of sums of independent random variables and martingales,
Proc. Fifth Berkeley Sympos. Math. Statist. and Probability (Berkeley, Calif., 1965/66),
Vol. II: Contributions to Probability Theory, Part 1, Univ. California Press, Berkeley, Calif., 1967, pp. 315–343.
[26] K. Yosida, Functional analysis, 6th ed., Grundlehren der Mathematischen
Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 123,
Springer-Verlag, Berlin-New York, 1980.
Index
Lp -convergence, 97
U [−1, 1], 34, 40, 50
σ-finite, 95
absolutely continuous, 22
Azuma-Hoeffding inequality, 35
Borel measure, 100
Borel-Cantelli, 84, 85
Bounded Convergence Theorem, 32, 103
Brownian bridge, 77
Brownian motion, 72
Cauchy-Schwarz, 9, 93
characteristic function, 91
Chernoff, 82
compact, 6, 98
conditional expectation, 94
continuous mapping theorem, 15, 33
convergence in distribution, 98
convex, 6, 98
convolution, 31
dominated convergence theorem, 18, 24
embedding, 1
Erdős-Rényi, 80
fixed point, 10, 98
Fubini, 93, 95
gamma(1/2, 2), 45
Gaussian process, 72
gradient, 5
Hessian, 5
Hölder’s inequality, 93
inner regular, 100
Jensen’s inequality, 95
Kolmogorov’s 0-1 law, 87
Lipschitz function, 4
locally convex topological vector space, 5, 98
locally finite, 100
Mean Value Theorem, 25
Minkowski, 11
moment generating function, 103
monotone convergence theorem, 22, 33
moving averages, 88
Polish space, 96
positive semidefinite, 5
Prohorov, 96
Radon measure, 100
Schauder-Tychonoff, 98
semi-norm, 5, 99
separation, 99
sequence of bounded martingale differences, 35
sequentially compact, 96
simple random walk, 2, 57, 72
Stein coefficient, 4
strong coupling, 3
symmetric ±1-distribution, 2
tight, 96
uniform random permutation, 46
uniformly integrable, 97
vague topology, 100
weak convergence, 101