Faculty of Science, Department of Mathematics

KMT Theorem for the Simple Random Walk

Thesis submitted in view of obtaining the degree of Master in Mathematics

Lies Leemans
Promotor: Prof. Dr. U. Einmahl
Academic year 2014-2015

Abstract

The aim of this thesis is to give a proof of the KMT theorem for the simple random walk. This theorem provides a coupling of a simple random walk $(S_k)_{k \ge 0}$ with a standard Brownian motion $(B_t)_{t \ge 0}$. More specifically, the result implies that, under certain assumptions, the optimal growth rate of
\[ \max_{1 \le k \le n} |S_k - B_k| \]
is equal to $O(\log n)$. This result is also valid for some other processes apart from the simple random walk. However, the proof of the general KMT theorem is quite technical. Therefore, we present a new technique, introduced by Chatterjee, to prove the theorem for the simple random walk. We use the concept of Stein coefficients. Apart from the proof, this thesis also contains an application of the KMT theorem. Using well-known results for the Brownian motion, the coupling can be used to obtain similar results for other processes such as the simple random walk.

Summary

The aim of this thesis is to give a proof of the KMT theorem for the simple random walk. This theorem couples a simple random walk $(S_k)_{k \ge 0}$ to a standard Brownian motion $(B_t)_{t \ge 0}$. The result implies that, under certain assumptions, the optimal order of
\[ \max_{1 \le k \le n} |S_k - B_k| \]
is equal to $O(\log n)$. Besides the simple random walk, this result also holds for certain other processes. The proof of the general KMT theorem is, however, rather technical. We therefore present a new technique, devised by Chatterjee, to prove the theorem for the simple random walk. We make use of Stein coefficients. Besides the proof, this thesis also gives an application of the KMT theorem.
Using well-known results for the Brownian motion, the coupling allows us to establish similar results for certain other processes, such as the simple random walk.

Acknowledgements

"Education is not the learning of facts, but the training of the mind to think." This statement by Albert Einstein perfectly describes the study of mathematics. When I started this programme five years ago, I did not really know what to expect. The only certainty I held on to at the time was my passion for the subject. Little by little you discover the world of mathematics, but you soon realise that this world is far too large to explore completely. I experience this as an advantage, however: it means that we can always keep learning and keep marvelling at the beauty of mathematics.

Besides a good deal of theoretical background, the five-year programme at the VUB has above all given me a particular way of reasoning. For this I want to thank all the professors who taught me. All that knowledge and technique eventually led to this work. For this I would especially like to thank Professor Einmahl. Despite his very busy schedule, he supported me in word and deed. Even at the most impossible moments, he was always ready to help me with my countless questions. Professor Einmahl, I admire the dedication with which you take your work to heart and support the research group.

Finally, there are two more people without whom none of this would have been possible: my parents. For five years they supported me with heart and soul during this demanding but rewarding study. Mum and Dad, thank you!

Contents

Introduction
1 Stein coefficients
  1.1 Strong coupling based on Stein coefficients
  1.2 Examples of Stein coefficients
2 A normal approximation theorem
  2.1 Verification of the normal approximation theorem
  2.2 Conditional version of the normal approximation theorem
3 The KMT theorem for the SRW
  3.1 The induction step
  3.2 Completing the proof of the main theorem
4 An application of the KMT theorem
  4.1 Theorem of Erdős-Rényi
  4.2 Limit theorem concerning moving averages
A Some elaborations
  A.1 Basic results related to probability theory
  A.2 Topological results required for Theorem 1.1.1
    A.2.1 (V, T) is a locally convex topological vector space
    A.2.2 The vague topology
  A.3 Some straightforward calculations
List of symbols
Bibliography
Index

Introduction

Let $\varepsilon_1, \varepsilon_2, \ldots$ be a sequence of i.i.d. random variables with $E[\varepsilon_1] = 0$ and $E[\varepsilon_1^2] = 1$. Let $S_0 = 0$ and $S_k = \sum_{i=1}^{k} \varepsilon_i$ for each $k \ge 1$. Assume our aim is to construct a version of $(S_k)_{k \ge 0}$ and a standard Brownian motion $(B_t)_{t \ge 0}$ on the same probability space, such that the growth rate of
\[ \max_{1 \le k \le n} |S_k - B_k| \]
is minimal. This is what is called coupling a random walk with a Brownian motion. We usually refer to this as an 'embedding problem'.

Many mathematicians have already studied the embedding problem. The first research was done by Strassen [24], who introduced an almost sure invariance principle. More specifically, his purpose was to find an optimal increasing function $g$ such that
\[ \frac{|S_n - B_n|}{g(n)} \to 0 \quad \text{a.s., as } n \to \infty. \]
In 1964 he proved that
\[ \frac{|S_n - B_n|}{\sqrt{n \log\log n}} \to 0 \quad \text{a.s., as } n \to \infty. \]
Later on it turned out that no better convergence rate is possible in general for random variables with finite second moments.
Assuming higher moments, better convergence rates can be achieved. In 1965 Strassen [25] showed that it is possible to have
\[ \max_{1 \le k \le n} |S_k - B_k| = O\big(n^{1/4} (\log n)^{1/2} (\log\log n)^{1/4}\big) \quad \text{a.s.} \]
under the assumption that $E[\varepsilon_1^4] < \infty$. To prove these results, Strassen used the Skorokhod embedding. Kiefer [18] then showed that no better approximation can be achieved if one uses the Skorokhod embedding, so new construction methods had to be developed. Eventually it was established by Komlós et al. [19], [20] that it is possible to have
\[ \max_{1 \le k \le n} |S_k - B_k| = O(\log n) \]
under the assumption that $\varepsilon_1$ has a finite moment generating function in a neighbourhood of zero. They also proved that, under this assumption, $O(\log n)$ is the optimal growth rate. They obtained this result from the following coupling inequality.

Theorem 1. (Komlós-Major-Tusnády) Let $\varepsilon_1, \varepsilon_2, \ldots$ be a sequence of i.i.d. random variables with $E[\varepsilon_1] = 0$, $E[\varepsilon_1^2] = 1$ and $E[\exp(\theta|\varepsilon_1|)] < \infty$ for some $\theta > 0$. Let $S_0 = 0$ and let $S_k = \sum_{i=1}^{k} \varepsilon_i$ for each $k \ge 1$. Then for any $n$, it is possible to construct a version of $(S_k)_{0 \le k \le n}$ and a standard Brownian motion $(B_t)_{0 \le t \le n}$ on the same probability space such that for all $x \ge 0$:
\[ P\Big( \max_{k \le n} |S_k - B_k| \ge C \log n + x \Big) \le K e^{-\lambda x}, \]
where $C$, $K$ and $\lambda$ do not depend on $n$.

This result of Komlós et al. was very surprising, since it was already known that $o(\log n)$ could not be achieved unless the variables $\varepsilon_i$ are standard normal. Indeed, if $\varepsilon_1, \varepsilon_2, \ldots$ are i.i.d. random variables with mean zero and variance one, and if $(B_t)_{0 \le t < \infty}$ is a Brownian motion such that $S_n - B_n = o(\log n)$ a.s., then the random variables $\varepsilon_i$ have a standard normal distribution. This result is shown in Section 4.1. For a more detailed account of the history of embeddings, we refer to the book of Csörgő and Révész [9].

Unfortunately, the proof of the KMT theorem is technically very difficult and it is hard to generalize.
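To get a feeling for how far apart the three growth rates quoted above are, the following minimal sketch evaluates them for a few values of n. It is purely an arithmetic illustration: the function names are hypothetical, all implicit constants in the O-terms are set to 1 (an assumption), and nothing is simulated.

```python
import math


def rate_second_moment(n):
    # sqrt(n log log n): normalisation in Strassen's 1964 result
    return math.sqrt(n * math.log(math.log(n)))


def rate_fourth_moment(n):
    # n^(1/4) (log n)^(1/2) (log log n)^(1/4): Strassen's 1965 rate
    log_n = math.log(n)
    return n ** 0.25 * math.sqrt(log_n) * math.log(log_n) ** 0.25


def rate_kmt(n):
    # log n: the KMT rate under a finite moment generating function
    return math.log(n)


if __name__ == "__main__":
    # Constants are ignored, so only the orders of magnitude are meaningful.
    for n in (10**3, 10**6, 10**9):
        print(n, rate_second_moment(n), rate_fourth_moment(n), rate_kmt(n))
```

For every n in the loop the three values are strictly decreasing from left to right, which matches the ordering of the results in the text.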
Recently, Chatterjee [6] came up with a new proof of the KMT theorem for the simple random walk (SRW). This proof will be the subject of this work.

Theorem 2. Let $\varepsilon_1, \varepsilon_2, \ldots$ be a sequence of i.i.d. symmetric $\pm 1$-valued random variables. Let $S_0 = 0$ and let $S_k = \sum_{i=1}^{k} \varepsilon_i$ for each $k \ge 1$. It is possible to construct a version of the sequence $(S_k)_{k \ge 0}$ and a standard Brownian motion $(B_t)_{t \ge 0}$ on the same probability space such that for all $n$ and all $x \ge 0$:
\[ P\Big( \max_{k \le n} |S_k - B_k| \ge C \log n + x \Big) \le K e^{-\lambda x}, \]
where $C$, $K$ and $\lambda$ do not depend on $n$.

The importance of the KMT theorem for the SRW lies in the fact that the SRW is used in many domains of mathematics and science. For a specific example we refer the interested reader to [11]. This work of Dembo et al. concerns the cover time for the SRW on the discrete two-dimensional torus $\mathbb{Z}_n^2 = \mathbb{Z}^2 / n\mathbb{Z}^2$. Results about this cover time are obtained by first showing a result for the cover time for the Brownian motion on the two-dimensional torus $\mathbb{T}^2$ and then applying the KMT theorem.

The method we will use to prove this theorem is different from the one used to prove the general KMT theorem. The arguments and proofs in this master thesis are based on the paper [6] of Chatterjee, which provides an overview of the proof. Our aim will be to check all the details and to rigorously work out the whole proof.

We start with a method that yields a coupling of an arbitrary random variable $W$ with a normally distributed random variable $Z$. The coupling will be such that the tails of $W - Z$ are exponentially decaying. This is what is called a strong coupling. This coupling method will be based on the concept of Stein coefficients. The goal of Chapter 1 is to give a proof of this 'strong coupling' theorem.

Recall the notation of our main theorem, Theorem 2. Once we have established the 'strong coupling' theorem, we can make use of Stein coefficients in order to couple $S_n$ with a random variable $Z_n \sim N(0, n)$.
More specifically, we will construct a version of $S_n$ and $Z_n$ on the same probability space such that for all $n$:
\[ E \exp(\theta_0 |S_n - Z_n|) \le \kappa, \]
where $\theta_0 > 0$ and $\kappa$ are universal constants. Chapter 2 will be dedicated to this normal approximation theorem. Apart from this theorem, Chapter 2 also provides another result, which is essentially a conditional version of the normal approximation theorem. We fix the value of $S_n$, for $n \ge 3$, and produce a coupling of $S_k$, for $n/3 \le k \le 2n/3$, with a certain Gaussian random variable. This result will turn out to be useful in Chapter 3, where we will complete the proof of Theorem 2.

In Chapter 3 we will first give an induction argument in order to obtain a 'finite $n$ version' of Theorem 2. This means we will produce a coupling of $(S_k)_{k \le n}$ with random variables $(Z_k)_{k \le n}$ having a joint Gaussian distribution, mean zero and $\mathrm{Cov}(Z_i, Z_j) = i \wedge j$. Lemma 3.2.1 is dedicated to this result, which already closely resembles Theorem 2. However, to finish the proof of the KMT theorem for the SRW, we need a single coupling for the whole sequence. Theorem 3.2.2 provides the completing arguments.

Whereas the first three chapters are dedicated to the proof of the KMT theorem for the SRW, Chapter 4 provides an application of the general KMT theorem. The results of that chapter are based on the book [9]. We will consider a sequence of i.i.d. random variables with a finite moment generating function. The goal of Chapter 4 is to give some results concerning the size of the increments of the partial sums. In particular, we want to point out how we can use the KMT theorem in order to obtain such results. Besides, as mentioned earlier, Chapter 4 also contains a proof of why $o(\log n)$ cannot be achieved unless the random variables $\varepsilon_i$ are standard normal.

Chapter 1
Stein coefficients

In this chapter we will focus on a so-called 'strong coupling' theorem. This theorem produces a coupling of an arbitrary random variable $W$ with a Gaussian random variable $Z$.
One of the assumptions in the theorem is the existence of a Stein coefficient for $W$.

Definition 1.0.1. Let $W$ and $T$ be random variables, defined on the same probability space, such that whenever $\psi$ is a Lipschitz function and $\psi'$ is the derivative of $\psi$ a.e., we have:
\[ E[W \psi(W)] = E[\psi'(W) T]. \]
Then we call $T$ a Stein coefficient for $W$.

Now we can give a formulation of the strong coupling theorem.

Theorem 1.0.2. Suppose $W$ is a random variable with $E[W] = 0$ and $E[W^2] < \infty$. Let $T$ be a Stein coefficient for $W$ and assume $|T| \le K$ a.e., for some constant $K$. Then, for any $\sigma^2 > 0$, we can construct $Z \sim N(0, \sigma^2)$ on the same probability space such that for any $\theta \in \mathbb{R}$:
\[ E \exp(\theta |W - Z|) \le 2\, E \exp\!\left( \frac{2\theta^2 (T - \sigma^2)^2}{\sigma^2} \right). \]

The idea behind this theorem is that if $T \approx \sigma^2$ with high probability, then $W \approx Z$ where $Z \sim N(0, \sigma^2)$. This intuition can be motivated by the fact that a random variable $Z$ follows the $N(0, \sigma^2)$ distribution if and only if $E[Z \psi(Z)] = \sigma^2 E[\psi'(Z)]$ for all continuously differentiable functions $\psi$ for which $E|\psi'(Z)| < \infty$. Proposition A.1.3 in the appendix is dedicated to this result. We will start with a proof of the strong coupling theorem. In the second section we will give a few examples, in order to gain some insight into the concept of Stein coefficients.

1.1 Strong coupling based on Stein coefficients

In order to give a proof of Theorem 1.0.2, we will need several auxiliary results. We start with the theorem below, which will turn out to be very useful in proving the other results in this section. Recall that $\nabla f$ and $\mathrm{Hess} f$ denote the gradient and Hessian of a function $f \in C^2(\mathbb{R}^n)$, and $\mathrm{Tr}$ stands for the trace of a matrix. Moreover we will use the notation $\|\cdot\|$ for both the Euclidean norm on $\mathbb{R}^n$ and the matrix norm induced by the Euclidean norm, i.e. for $B \in \mathbb{R}^{n \times n}$:
\[ \|B\| := \sup_{x \in \mathbb{R}^n \setminus \{0\}} \frac{\|Bx\|}{\|x\|}. \]
The context will make clear which of the two norms is meant.

Theorem 1.1.1.
Let $n$ be a strictly positive integer and suppose $A$ is a continuous map from $\mathbb{R}^n$ into the set of $n \times n$ symmetric positive semidefinite matrices. Suppose there exists a constant $b \ge 0$ such that $\|A(x)\| \le b$ for all $x \in \mathbb{R}^n$. Then there exists a probability measure $\mu$ on $(\mathbb{R}^n, \mathcal{R}^n)$ such that for any random vector $X$ with distribution $\mu$, we have that
\[ E \exp\langle \theta, X \rangle \le \exp(b\|\theta\|^2) \tag{1.1} \]
for all $\theta \in \mathbb{R}^n$, and
\[ E\langle X, \nabla f(X) \rangle = E\,\mathrm{Tr}(A(X)\,\mathrm{Hess} f(X)) \tag{1.2} \]
for all $f \in C^2(\mathbb{R}^n)$ such that $E|f(X)|^2$, $E\|\nabla f(X)\|^2$, and $E|\mathrm{Tr}(A(X)\,\mathrm{Hess} f(X))|$ are finite.

Before giving a proof of this theorem, we introduce some notation. Let $V$ denote the space of all finite signed measures on $(\mathbb{R}^n, \mathcal{R}^n)$. Moreover, let $\mathcal{T}$ denote the topology generated by the separating family of semi-norms $|\mu|_f := |\int f \, d\mu|$, where $f$ ranges over all continuous functions with compact support. Furthermore, let $\tilde{V}$ be the subspace of $V$ consisting of all probability measures on $(\mathbb{R}^n, \mathcal{R}^n)$, equipped with the trace topology $\mathcal{T}_{\tilde{V}}$. For more details on the properties of the spaces $(V, \mathcal{T})$ and $(\tilde{V}, \mathcal{T}_{\tilde{V}})$ we refer to Appendix A.2. For the purposes of the proof of Theorem 1.1.1, we only need the following proposition.

Proposition 1.1.2. The following properties hold:
(i) $(V, \mathcal{T})$ is a locally convex topological vector space.
(ii) $(\tilde{V}, \mathcal{T}_{\tilde{V}})$ is a metric space and a sequence $(\mu_n)_n$ in $(\tilde{V}, \mathcal{T}_{\tilde{V}})$ is convergent if and only if it is weakly convergent.

We will now proceed with the proof of Theorem 1.1.1.

Proof. Let $K$ denote the set of all probability measures $\mu$ on $(\mathbb{R}^n, \mathcal{R}^n)$ satisfying
\[ \int x \, \mu(dx) = 0 \quad \text{and} \quad \int \exp\langle \theta, x \rangle \, \mu(dx) \le \exp(b\|\theta\|^2) \ \text{ for all } \theta \in \mathbb{R}^n. \]
Note that the first condition is shorthand for
\[ \int \mathrm{pr}_i(x) \, \mu(dx) = 0 \quad \text{for all } 1 \le i \le n, \]
where $\mathrm{pr}_i$ denotes the $i$-th projection.

(STEP 1) It is our aim to show that $K$ is a nonempty, compact and convex subset of the space $V$.

(1.a) We will start with showing that $K$ is nonempty. Let $Z$ be an $n$-dimensional standard normal random vector with Z = (Z_1, ...
, Z_n)^T and $Z_1, \ldots, Z_n$ independent standard normal random variables. Set $Y = \sqrt{2b}\, Z$ and let $\nu$ be the distribution of $Y$. We will show that $\nu \in K$. Obviously $\nu$ is a probability measure on $(\mathbb{R}^n, \mathcal{R}^n)$ such that
\[ \int x \, \nu(dx) = \int x \, P_Y(dx) = \int Y(\omega) \, P(d\omega) = E[Y] = \sqrt{2b}\, E[Z] = 0, \]
where we have used the formula of the image measure. Using the independence of the random variables $Z_i$, we have for all $\theta \in \mathbb{R}^n$:
\[ \int \exp\langle \theta, x \rangle \, \nu(dx) = E \exp\langle \theta, \sqrt{2b}\, Z \rangle = E\left[ \prod_{i=1}^{n} \exp(\sqrt{2b}\, \theta_i Z_i) \right] = \prod_{i=1}^{n} E \exp(\sqrt{2b}\, \theta_i Z_i). \]
Using the well-known expression for the mgf of a standard Gaussian random variable, we obtain:
\[ \int \exp\langle \theta, x \rangle \, \nu(dx) = \prod_{i=1}^{n} \exp\left( \frac{2b\theta_i^2}{2} \right) = \exp\left( b \sum_{i=1}^{n} \theta_i^2 \right) = \exp(b\|\theta\|^2). \]
Thus, we can conclude that $\nu \in K$, and $K$ is nonempty.

(1.b) Next, we will prove that $K$ is a compact subset of $V$. Since $K \subset \tilde{V}$, it is sufficient to prove that $K$ is compact in $(\tilde{V}, \mathcal{T}_{\tilde{V}})$. Since $(\tilde{V}, \mathcal{T}_{\tilde{V}})$ is a metric space, this is equivalent to showing that $K$ is closed and sequentially compact. We will start with proving that $K$ is sequentially compact. Let $(\mu_m)_{m \ge 1}$ be a sequence in $K$. Prohorov's Theorem A.1.19 states that $(\mu_m)_{m \ge 1}$ is sequentially compact if and only if this sequence is tight. Furthermore, Theorem A.1.20 gives a sufficient condition for tightness. Let $X_m : \Omega \to \mathbb{R}^n$ be random vectors with $\mu_m = P_{X_m}$. Then it suffices to prove that there exists an $\alpha > 0$ such that $\sup_{m \ge 1} E[\|X_m\|^\alpha] < \infty$. We can take $\alpha = 2$. Indeed, for all $m \ge 1$ we have that $E[\|X_m\|^2] \le 4n \exp b$. Later on in this proof, in (4.b), a more general case of this inequality will be shown. Hence,
\[ \sup_{m \ge 1} E[\|X_m\|^2] \le 4n \exp(b) < \infty. \]
Thus, we have proved that $(\mu_m)_{m \ge 1}$ is sequentially compact. As a consequence, $K$ is sequentially compact too.

Now, we will prove that $K$ is closed. Let $(\mu_m)_{m \ge 1}$ be a sequence in $K$ and assume $\mu_m \xrightarrow{w} \mu$. It suffices to prove that $\mu \in K$. Take random vectors $X_m \sim \mu_m$ and $X \sim \mu$. Clearly $X_m \xrightarrow{d} X$, as $m \to \infty$.
Let $\theta \in \mathbb{R}^n$ be arbitrary, then Proposition A.1.24 implies that
\[ E \exp\langle \theta, X \rangle \le \liminf_{m \to \infty} E \exp\langle \theta, X_m \rangle. \]
Since $P_{X_m} = \mu_m \in K$, we have that $E \exp\langle \theta, X_m \rangle \le \exp(b\|\theta\|^2)$ for all $m \ge 1$. Therefore,
\[ \int \exp\langle \theta, x \rangle \, \mu(dx) = E \exp\langle \theta, X \rangle \le \exp(b\|\theta\|^2). \]
On the other hand, we know that $X_m \xrightarrow{d} X$ and that projections are continuous. Writing $X_{m,j}$ for the components of $X_m$ and $X_j$ for the components of $X$, the continuous mapping theorem implies that $X_{m,j} \xrightarrow{d} X_j$ as $m \to \infty$, for all $1 \le j \le n$. Our aim is to prove that $E[X] = 0$. Since we know that $E[X_{m,j}] = 0$ for all $m$ and $j$, it suffices to prove that $E[X_{m,j}] \to E[X_j]$ for all $1 \le j \le n$. Using Proposition A.1.23, we see that it is sufficient to find a $\delta > 0$ such that $\sup_{m \ge 1} E[|X_{m,j}|^{1+\delta}] < \infty$. Obviously $E[|X_{m,j}|^2] \le E[\|X_m\|^2]$. Thus, putting $\delta = 1$, we can conclude by the earlier reasoning that
\[ \sup_{m \ge 1} E[|X_{m,j}|^{1+\delta}] \le 4n \exp b < \infty. \]
Thus, we have shown that $\mu \in K$. Therefore, $K$ is a compact set.

(1.c) Finally, we will show that $K$ is convex. Let $\mu_1, \mu_2 \in K$ and $\alpha \in [0, 1]$. Using Proposition A.3.2 (i), we see that $\alpha\mu_1 + (1-\alpha)\mu_2$ is a measure on $(\mathbb{R}^n, \mathcal{R}^n)$. Obviously, $\alpha\mu_1 + (1-\alpha)\mu_2$ is a probability measure, since $\alpha \in [0, 1]$ and since $\mu_1$ and $\mu_2$ are probability measures themselves. Now, using Proposition A.3.2 (ii) and the fact that $\mu_1, \mu_2 \in K$, we have:
\[ \int x \, (\alpha\mu_1 + (1-\alpha)\mu_2)(dx) = \alpha \underbrace{\int x \, \mu_1(dx)}_{=0} + (1-\alpha) \underbrace{\int x \, \mu_2(dx)}_{=0} = 0. \]
Moreover, for any $\theta \in \mathbb{R}^n$:
\[ \begin{aligned} \int \exp\langle \theta, x \rangle \, (\alpha\mu_1 + (1-\alpha)\mu_2)(dx) &= \alpha \int \exp\langle \theta, x \rangle \, \mu_1(dx) + (1-\alpha) \int \exp\langle \theta, x \rangle \, \mu_2(dx) \\ &\le \alpha \exp(b\|\theta\|^2) + (1-\alpha) \exp(b\|\theta\|^2) = \exp(b\|\theta\|^2). \end{aligned} \]
Therefore, we have shown that $\alpha\mu_1 + (1-\alpha)\mu_2 \in K$. Thus, $K$ is convex.

(STEP 2) Fix $\varepsilon \in (0, 1)$ and define a map $T_\varepsilon : K \to V$ as follows. Given $\mu \in K$, let $X$ be a random vector with distribution $\mu$ and let $Z$ be a random vector, independent of $X$ and following the standard normal law on $\mathbb{R}^n$.
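Furthermore, let $T_\varepsilon \mu$ be the distribution of the random vector
\[ (1-\varepsilon)X + \sqrt{2\varepsilon A(X)}\, Z, \]
where $\sqrt{A(X)}$ denotes the symmetric, positive semidefinite matrix satisfying $\sqrt{A(X)}\sqrt{A(X)} = A(X)$. Since $T_\varepsilon \mu$ is a probability measure, it is obvious that $T_\varepsilon \mu \in V$.

(2.a) We will now show that $T_\varepsilon \mu \in K$. Let $\theta \in \mathbb{R}^n$, then the definition of $T_\varepsilon \mu$ clearly implies that
\[ \int_{\mathbb{R}^n} \exp\langle \theta, x \rangle \, (T_\varepsilon \mu)(dx) = E \exp\Big\langle \theta, (1-\varepsilon)X + \sqrt{2\varepsilon A(X)}\, Z \Big\rangle. \]
Since the integrand is positive and since $X$ and $Z$ are independent, we can use Corollary A.1.4, and hence:
\[ \begin{aligned} \int_{\mathbb{R}^n} \exp\langle \theta, x \rangle \, (T_\varepsilon \mu)(dx) &= \int_{\mathbb{R}^n} E\Big[ \exp\big( \langle \theta, (1-\varepsilon)x \rangle + \langle \theta, \sqrt{2\varepsilon A(x)}\, Z \rangle \big) \Big] P_X(dx) \\ &= \int_{\mathbb{R}^n} \exp\langle \theta, (1-\varepsilon)x \rangle \, E\big[ \exp\langle \theta, \sqrt{2\varepsilon A(x)}\, Z \rangle \big] P_X(dx). \end{aligned} \]
Next, we will compute $E \exp\langle \theta, \sqrt{2\varepsilon A(x)}\, Z \rangle$ for $x \in \mathbb{R}^n$ fixed. To ease the notation we will write $B = \sqrt{A(x)}$. Then, obviously:
\[ E \exp\Big\langle \theta, \sqrt{2\varepsilon A(x)}\, Z \Big\rangle = E \exp\left( \sum_{i=1}^{n} \sum_{j=1}^{n} \sqrt{2\varepsilon}\, B_{ij} Z_j \theta_i \right) = E\left[ \prod_{j=1}^{n} \exp\left( \sqrt{2\varepsilon} \sum_{i=1}^{n} \theta_i B_{ij} Z_j \right) \right]. \]
By definition of $Z$, we know that the random variables $Z_j$ are independent. Moreover, since $\sqrt{2\varepsilon} \sum_{i=1}^{n} \theta_i B_{ij}$ is a constant, we can use the well-known expression for the moment generating function of a standard normal random variable, and we get:
\[ \begin{aligned} E \exp\Big\langle \theta, \sqrt{2\varepsilon A(x)}\, Z \Big\rangle &= \prod_{j=1}^{n} E \exp\left( \sqrt{2\varepsilon} \sum_{i=1}^{n} \theta_i B_{ij} Z_j \right) \\ &= \prod_{j=1}^{n} \exp\left( \varepsilon \Big( \sum_{i=1}^{n} \theta_i B_{ij} \Big)^2 \right) \\ &= \exp\left( \varepsilon \sum_{j=1}^{n} \Big( \sum_{i=1}^{n} \theta_i B_{ij} \Big) \Big( \sum_{k=1}^{n} \theta_k B_{kj} \Big) \right) \\ &= \exp\left( \varepsilon \sum_{i=1}^{n} \sum_{k=1}^{n} \theta_i \theta_k \sum_{j=1}^{n} B_{ij} B_{kj} \right). \end{aligned} \]
Using that the square root $B$ of the matrix $A(x)$ is a symmetric matrix, it is clear that $A(x)_{ik} = (BB^T)_{ik} = \sum_{j=1}^{n} B_{ij} B_{kj}$. Thus, we have shown that
\[ E \exp\Big\langle \theta, \sqrt{2\varepsilon A(x)}\, Z \Big\rangle = \exp\left( \varepsilon \sum_{i=1}^{n} \sum_{k=1}^{n} \theta_i A(x)_{ik} \theta_k \right) = \exp\big( \varepsilon \langle \theta, A(x)\theta \rangle \big). \]
Therefore, we can conclude that
\[ \begin{aligned} \int_{\mathbb{R}^n} \exp\langle \theta, x \rangle \, (T_\varepsilon \mu)(dx) &= \int_{\mathbb{R}^n} \exp\langle \theta, (1-\varepsilon)x \rangle \exp\big( \varepsilon \langle \theta, A(x)\theta \rangle \big) P_X(dx) \\ &= E \exp\big( \langle \theta, (1-\varepsilon)X \rangle + \varepsilon \langle \theta, A(X)\theta \rangle \big). \end{aligned} \]
We will prove that this last expression can be bounded by $\exp(\varepsilon b\|\theta\|^2)\, E \exp\langle \theta, (1-\varepsilon)X \rangle$. Obviously, it suffices to show that $\langle \theta, A(X(\omega))\theta \rangle \le b\|\theta\|^2$ for all $\omega \in \Omega$. By definition of the induced matrix norm it is clear that $\|Bx\| \le \|B\|\|x\|$, for any real $n \times n$-matrix $B$ and any $x \in \mathbb{R}^n$.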
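Using this together with the Cauchy-Schwarz inequality and the boundedness of $A$, we obtain:
\[ \langle \theta, A(X(\omega))\theta \rangle \le \|\theta\| \|A(X(\omega))\theta\| \le \|\theta\|^2 \|A(X(\omega))\| \le b\|\theta\|^2. \]
Recall that $P_X = \mu \in K$, so that by definition of $K$:
\[ E \exp\langle \theta, (1-\varepsilon)X \rangle = E \exp\langle (1-\varepsilon)\theta, X \rangle = \int_{\mathbb{R}^n} \exp\langle (1-\varepsilon)\theta, x \rangle \, \mu(dx) \le \exp\big( b(1-\varepsilon)^2 \|\theta\|^2 \big). \]
Thus, we have shown that
\[ \int_{\mathbb{R}^n} \exp\langle \theta, x \rangle \, (T_\varepsilon \mu)(dx) \le \exp\big( \varepsilon b\|\theta\|^2 \big) \exp\big( b(1-\varepsilon)^2 \|\theta\|^2 \big) = \exp\big( b(1 - \varepsilon + \varepsilon^2)\|\theta\|^2 \big) \le \exp(b\|\theta\|^2), \]
where this last inequality follows from the fact that $1 - \varepsilon + \varepsilon^2 \le 1$, since $\varepsilon \in (0, 1)$. If we can show that $\int x \, (T_\varepsilon \mu)(dx) = 0$, then we can conclude that $T_\varepsilon : K \to K$. Since $P_X = \mu \in K$, it is obvious that $E[X] = 0$. Hence,
\[ \int_{\mathbb{R}^n} x \, (T_\varepsilon \mu)(dx) = E\Big[ (1-\varepsilon)X + \sqrt{2\varepsilon A(X)}\, Z \Big] = \sqrt{2\varepsilon}\, E\Big[ \sqrt{A(X)}\, Z \Big]. \]
Since the random vectors $X$ and $Z$ are independent, we can use Corollary A.1.4 in order to conclude that
\[ E\big[ \sqrt{A(X)}\, Z \big] = \int_{\mathbb{R}^n} E\big[ \sqrt{A(x)}\, Z \big] P_X(dx) = \int_{\mathbb{R}^n} \sqrt{A(x)}\, \underbrace{E[Z]}_{=0} \, P_X(dx) = 0. \]
This means we have proved that $T_\varepsilon$ maps $K$ into $K$.

(2.b) We will now show that $T_\varepsilon$ is continuous under the topology $\mathcal{T}$. Assume $\mu_n \xrightarrow{w} \mu$ as $n \to \infty$. It suffices to prove that $T_\varepsilon \mu_n \xrightarrow{w} T_\varepsilon \mu$. We take random vectors $X \sim \mu$ and $X_n \sim \mu_n$ for $n \ge 1$. Moreover, we define a map $G$ as follows:
\[ G : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n : (x, z) \mapsto (1-\varepsilon)x + \sqrt{2\varepsilon A(x)}\, z. \]
Clearly, the function $G$ is continuous. Indeed, this follows from the continuity of $A$ together with the continuity of the transformation $B \mapsto \sqrt{B}$. For a proof of this last claim, we refer to Eq. (X.2) on p. 290 of [3]. Using properties of the product measure and the image measure, it is easy to see that
\[ T_\varepsilon \mu = (P_X \otimes P_Z)_G \quad \text{and} \quad T_\varepsilon \mu_n = (P_{X_n} \otimes P_Z)_G. \]
We know that $P_{X_n} \xrightarrow{w} P_X$. Using characteristic functions, it follows that
\[ P_{X_n} \otimes P_Z \xrightarrow{w} P_X \otimes P_Z. \]
Since $G$ is continuous, the continuous mapping theorem implies that
\[ (P_{X_n} \otimes P_Z)_G \xrightarrow{w} (P_X \otimes P_Z)_G. \]
Thus we have shown that $T_\varepsilon \mu_n \xrightarrow{w} T_\varepsilon \mu$ and we can conclude that $T_\varepsilon : K \to K$ is a continuous mapping.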
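The two computations behind step (2.a) can be sanity-checked numerically. The sketch below is only an illustration under stated assumptions: it fixes n = 2, one constant matrix A and one vector θ (all values hypothetical), and it uses the closed-form square root of a 2 × 2 positive semidefinite matrix, which is a shortcut and not the argument of the text. It relies on the standard Gaussian fact that log E exp⟨u, Z⟩ = ‖u‖²/2 for a standard normal vector Z. All function names are made up for this example.

```python
import math


def sqrt_psd_2x2(a11, a12, a22):
    """Symmetric PSD square root of [[a11, a12], [a12, a22]].

    Uses the 2x2 closed form sqrt(A) = (A + s*I) / t with
    s = sqrt(det A) and t = sqrt(tr A + 2*s); requires A != 0.
    """
    s = math.sqrt(max(a11 * a22 - a12 * a12, 0.0))
    t = math.sqrt(a11 + a22 + 2.0 * s)
    return (a11 + s) / t, a12 / t, (a22 + s) / t


def quad_form(theta, A):
    """<theta, A theta> for symmetric A given as (a11, a12, a22)."""
    a11, a12, a22 = A
    return a11 * theta[0] ** 2 + 2.0 * a12 * theta[0] * theta[1] + a22 * theta[1] ** 2


def log_mgf_noise(theta, A, eps):
    """log E exp<theta, sqrt(2*eps*A) Z> for standard normal Z in R^2.

    With B = sqrt(A) symmetric, <theta, sqrt(2 eps) B Z> = <sqrt(2 eps) B theta, Z>,
    so the log-mgf equals (2*eps) * |B theta|^2 / 2 = eps * |B theta|^2.
    """
    b11, b12, b22 = sqrt_psd_2x2(*A)
    u1 = b11 * theta[0] + b12 * theta[1]
    u2 = b12 * theta[0] + b22 * theta[1]
    return eps * (u1 * u1 + u2 * u2)


def exponent_factor(eps):
    """Coefficient of b*|theta|^2 after combining both bounds: eps + (1 - eps)^2."""
    return eps + (1.0 - eps) ** 2


# Hypothetical test data: A has eigenvalues 1 and 3, so b = 3 bounds its norm.
A = (2.0, 1.0, 2.0)
B_BOUND = 3.0
THETA = (0.7, -1.3)

if __name__ == "__main__":
    for eps in (0.1, 0.5, 0.9):
        print(eps, log_mgf_noise(THETA, A, eps), eps * quad_form(THETA, A), exponent_factor(eps))
```

In this instance the log-mgf of the Gaussian perturbation agrees with ε⟨θ, Aθ⟩, the quadratic form stays below b‖θ‖², and ε + (1 − ε)² = 1 − ε + ε² never exceeds 1 on (0, 1), which is exactly what keeps T_ε µ inside K.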
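(STEP 3) The Schauder-Tychonoff fixed point theorem A.2.1 for locally convex topological vector spaces implies that $T_\varepsilon$ must have a fixed point in $K$. For each $\varepsilon \in (0, 1)$, let $\mu_\varepsilon$ be a fixed point of $T_\varepsilon$, and let $X_\varepsilon$ denote a random vector with distribution $\mu_\varepsilon$.

(STEP 4) Take an arbitrary $f \in C^2(\mathbb{R}^n)$ with $\nabla f$ and $\mathrm{Hess} f$ bounded and uniformly continuous. Fix $\varepsilon \in (0, 1)$ and define $Y_\varepsilon := -\varepsilon X_\varepsilon + \sqrt{2\varepsilon A(X_\varepsilon)}\, Z$. By definition of $T_\varepsilon$, we have that
\[ P_{X_\varepsilon} = \mu_\varepsilon = T_\varepsilon(\mu_\varepsilon) = P_{(1-\varepsilon)X_\varepsilon + \sqrt{2\varepsilon A(X_\varepsilon)} Z} = P_{X_\varepsilon + Y_\varepsilon}. \]
This clearly implies that $E[f(X_\varepsilon)] = E[f(X_\varepsilon + Y_\varepsilon)]$, or equivalently that
\[ E[f(X_\varepsilon + Y_\varepsilon) - f(X_\varepsilon)] = 0. \tag{1.3} \]
Let
\[ R_\varepsilon = f(X_\varepsilon + Y_\varepsilon) - f(X_\varepsilon) - \langle Y_\varepsilon, \nabla f(X_\varepsilon) \rangle - \frac{1}{2} \langle Y_\varepsilon, \mathrm{Hess} f(X_\varepsilon) Y_\varepsilon \rangle. \]

(4.a) First, note that
\[ E\langle Y_\varepsilon, \nabla f(X_\varepsilon) \rangle = -\varepsilon E\langle X_\varepsilon, \nabla f(X_\varepsilon) \rangle. \tag{1.4} \]
This can be seen as follows. Simply using the definition of $Y_\varepsilon$ it is clear that
\[ E\langle Y_\varepsilon, \nabla f(X_\varepsilon) \rangle = -\varepsilon E\langle X_\varepsilon, \nabla f(X_\varepsilon) \rangle + E\Big\langle \sqrt{2\varepsilon A(X_\varepsilon)}\, Z, \nabla f(X_\varepsilon) \Big\rangle. \]
Thus it suffices to prove that $E\langle \sqrt{2\varepsilon A(X_\varepsilon)}\, Z, \nabla f(X_\varepsilon) \rangle = 0$. Since $Z$ and $X_\varepsilon$ are independent and $E[Z] = 0$, one easily sees that $E\langle Z, g(X_\varepsilon) \rangle = 0$ for any Borel-measurable function $g : \mathbb{R}^n \to \mathbb{R}^n$. Consequently,
\[ E\Big\langle \sqrt{2\varepsilon A(X_\varepsilon)}\, Z, \nabla f(X_\varepsilon) \Big\rangle = E\Big\langle Z, \sqrt{2\varepsilon A(X_\varepsilon)}\, \nabla f(X_\varepsilon) \Big\rangle = 0. \]
This finishes the argument, and (1.4) has been proved.

(4.b) Secondly, we will show that all moments of $\|X_\varepsilon\|$ are bounded by constants that do not depend on $\varepsilon$. For $1 \le j \le n$, let $e_j := (0, \ldots, 0, 1, 0, \ldots, 0)^T \in \mathbb{R}^n$, where the $j$-th component of $e_j$ is equal to 1 and all other components are zero. Since $P_{X_\varepsilon} \in K$, the definition of $K$ clearly implies that for all $1 \le j \le n$:
\[ \begin{aligned} E[e^{|X_{\varepsilon,j}|}] &= E[e^{X_{\varepsilon,j}} I_{\{X_{\varepsilon,j} \ge 0\}}] + E[e^{-X_{\varepsilon,j}} I_{\{X_{\varepsilon,j} < 0\}}] \\ &\le E[e^{X_{\varepsilon,j}}] + E[e^{-X_{\varepsilon,j}}] \\ &= E[\exp\langle e_j, X_\varepsilon \rangle] + E[\exp\langle -e_j, X_\varepsilon \rangle] \\ &\le \exp b + \exp b = 2 \exp b. \end{aligned} \]
From the Taylor expansion of the exponential function we can deduce that $\frac{|X_{\varepsilon,j}|^l}{l!} \le \exp|X_{\varepsilon,j}|$ for all $l \ge 0$. Therefore, we have shown that $E[|X_{\varepsilon,j}|^l] \le l! \, 2 \exp b$. Let $\|Y\|_k := E[|Y|^k]^{1/k}$ for an arbitrary random variable $Y$. It is well known that $\|\cdot\|_k$ is a semi-norm.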
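In particular, using the inequality of Minkowski, we have for all $k \ge 1$ that
\[ \begin{aligned} E\|X_\varepsilon\|^{2k} = \big\| \|X_\varepsilon\|^2 \big\|_k^k &= \big\| X_{\varepsilon,1}^2 + \cdots + X_{\varepsilon,n}^2 \big\|_k^k \\ &\le \big( \|X_{\varepsilon,1}^2\|_k + \cdots + \|X_{\varepsilon,n}^2\|_k \big)^k \\ &= \big( E[X_{\varepsilon,1}^{2k}]^{1/k} + \cdots + E[X_{\varepsilon,n}^{2k}]^{1/k} \big)^k \\ &\le \big( n ((2k)! \, 2 \exp b)^{1/k} \big)^k = n^k (2k)! \, 2 \exp b. \end{aligned} \]
Since this holds for all $k \ge 1$, we have shown that all even moments of $\|X_\varepsilon\|$ are bounded by constants that do not depend on $\varepsilon$. The same holds for the odd moments, since $\|Y\|_p \le \|Y\|_q$ if $p \le q$.

(4.c) Next, we will rewrite the expression $E\langle Y_\varepsilon, \mathrm{Hess} f(X_\varepsilon) Y_\varepsilon \rangle$. More specifically, we will show that
\[ E\langle Y_\varepsilon, \mathrm{Hess} f(X_\varepsilon) Y_\varepsilon \rangle = 2\varepsilon E\,\mathrm{Tr}(A(X_\varepsilon)\,\mathrm{Hess} f(X_\varepsilon)) + O(\varepsilon^2). \tag{1.5} \]
Simply using the definition of $Y_\varepsilon$, one sees that
\[ \begin{aligned} E\langle Y_\varepsilon, \mathrm{Hess} f(X_\varepsilon) Y_\varepsilon \rangle &= E\Big\langle -\varepsilon X_\varepsilon + \sqrt{2\varepsilon A(X_\varepsilon)}\, Z, \ \mathrm{Hess} f(X_\varepsilon) \big( -\varepsilon X_\varepsilon + \sqrt{2\varepsilon A(X_\varepsilon)}\, Z \big) \Big\rangle \\ &= \varepsilon^2 E\langle X_\varepsilon, \mathrm{Hess} f(X_\varepsilon) X_\varepsilon \rangle - \sqrt{8}\, \varepsilon^{3/2} E\Big\langle \sqrt{A(X_\varepsilon)}\, Z, \mathrm{Hess} f(X_\varepsilon) X_\varepsilon \Big\rangle \\ &\quad + 2\varepsilon E\Big\langle \sqrt{A(X_\varepsilon)}\, Z, \mathrm{Hess} f(X_\varepsilon) \sqrt{A(X_\varepsilon)}\, Z \Big\rangle. \end{aligned} \]
We will work out these three terms separately. Since $\mathrm{Hess} f$ is bounded, there exists a constant $C$ such that $\|\mathrm{Hess} f(x)\| \le C$ for all $x \in \mathbb{R}^n$. Recall that $\|Bx\| \le \|B\|\|x\|$ for any real $n \times n$-matrix $B$ and any vector $x \in \mathbb{R}^n$. Thereby, using the inequality of Cauchy-Schwarz, the first term can be bounded as follows:
\[ \varepsilon^2 E\langle X_\varepsilon, \mathrm{Hess} f(X_\varepsilon) X_\varepsilon \rangle \le \varepsilon^2 E\big[ \|X_\varepsilon\| \|\mathrm{Hess} f(X_\varepsilon) X_\varepsilon\| \big] \le \varepsilon^2 E\big[ \|X_\varepsilon\|^2 \|\mathrm{Hess} f(X_\varepsilon)\| \big] \le C\varepsilon^2 E\|X_\varepsilon\|^2 = O(\varepsilon^2). \]
For the second term, we use the independence of $X_\varepsilon$ and $Z$, and we get:
\[ E\Big\langle \sqrt{A(X_\varepsilon)}\, Z, \mathrm{Hess} f(X_\varepsilon) X_\varepsilon \Big\rangle = E\Big\langle Z, \sqrt{A(X_\varepsilon)}\, \mathrm{Hess} f(X_\varepsilon) X_\varepsilon \Big\rangle = 0. \]
In order to prove (1.5), it thus remains to show that
\[ 2\varepsilon E\Big\langle \sqrt{A(X_\varepsilon)}\, Z, \mathrm{Hess} f(X_\varepsilon) \sqrt{A(X_\varepsilon)}\, Z \Big\rangle = 2\varepsilon E\,\mathrm{Tr}(A(X_\varepsilon)\,\mathrm{Hess} f(X_\varepsilon)). \]
Using again Corollary A.1.4, we have that
\[ E\Big\langle \sqrt{A(X_\varepsilon)}\, Z, \mathrm{Hess} f(X_\varepsilon) \sqrt{A(X_\varepsilon)}\, Z \Big\rangle = \int_{\mathbb{R}^n} E\Big\langle \sqrt{A(x)}\, Z, \mathrm{Hess} f(x) \sqrt{A(x)}\, Z \Big\rangle \, \mu_\varepsilon(dx). \]
The integrand on the right-hand side can be rewritten as
\[ \begin{aligned} E\Big\langle \sqrt{A(x)}\, Z, \mathrm{Hess} f(x) \sqrt{A(x)}\, Z \Big\rangle &= E\left[ \sum_{i=1}^{n} \big( \sqrt{A(x)}\, Z \big)_i \big( \mathrm{Hess} f(x) \sqrt{A(x)}\, Z \big)_i \right] \\ &= \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{n} \sqrt{A(x)}_{ij} \big( \mathrm{Hess} f(x) \sqrt{A(x)} \big)_{ik} E[Z_j Z_k]. \end{aligned} \]
Using independence, it is obvious that $E[Z_j Z_k] = E[Z_j] E[Z_k] = 0$ if $j \ne k$. On the other hand $E[Z_j^2] = 1$. Let $\delta_{jk}$ denote the Kronecker delta, then $E[Z_j Z_k] = \delta_{jk}$.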
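Thus, using that both of the matrices $\sqrt{A(x)}$ and $\mathrm{Hess} f(x)$ are symmetric, we have that
\[ \begin{aligned} E\Big\langle \sqrt{A(x)}\, Z, \mathrm{Hess} f(x) \sqrt{A(x)}\, Z \Big\rangle &= \sum_{i=1}^{n} \sum_{j=1}^{n} \sqrt{A(x)}_{ij} \big( \mathrm{Hess} f(x) \sqrt{A(x)} \big)_{ij} \\ &= \sum_{i=1}^{n} \sum_{j=1}^{n} \sqrt{A(x)}_{ij} \big( \sqrt{A(x)}^T \mathrm{Hess} f(x)^T \big)_{ji} \\ &= \sum_{i=1}^{n} \big( \sqrt{A(x)} \sqrt{A(x)}\, \mathrm{Hess} f(x) \big)_{ii} \\ &= \mathrm{Tr}(A(x)\,\mathrm{Hess} f(x)). \end{aligned} \]
Therefore, we can conclude that
\[ \begin{aligned} E\Big\langle \sqrt{A(X_\varepsilon)}\, Z, \mathrm{Hess} f(X_\varepsilon) \sqrt{A(X_\varepsilon)}\, Z \Big\rangle &= \int_{\mathbb{R}^n} E\Big\langle \sqrt{A(x)}\, Z, \mathrm{Hess} f(x) \sqrt{A(x)}\, Z \Big\rangle \, \mu_\varepsilon(dx) \\ &= \int_{\mathbb{R}^n} \mathrm{Tr}(A(x)\,\mathrm{Hess} f(x)) \, \mu_\varepsilon(dx) \\ &= E\,\mathrm{Tr}(A(X_\varepsilon)\,\mathrm{Hess} f(X_\varepsilon)), \end{aligned} \]
which finishes the proof of (1.5).

(4.d) We will now prove that $|R_\varepsilon| \le \|Y_\varepsilon\|^2 \delta(\|Y_\varepsilon\|)$, where $\delta : [0, \infty) \to [0, \infty)$ is a bounded function satisfying $\lim_{t \to 0} \delta(t) = 0$. Using Taylor expansion for a function of $n$ variables, we obtain:
\[ f(X_\varepsilon + Y_\varepsilon) = f(X_\varepsilon) + \langle Y_\varepsilon, \nabla f(X_\varepsilon) \rangle + \frac{1}{2} \left( Y_{\varepsilon,1} \frac{\partial}{\partial x_1} + \cdots + Y_{\varepsilon,n} \frac{\partial}{\partial x_n} \right)^{(2)} f(X_\varepsilon + \theta Y_\varepsilon), \]
for a certain map $\theta : \Omega \to (0, 1)$. The square in the last term of this formula is a shorthand notation and is defined as follows:
\[ \left( Y_{\varepsilon,1} \frac{\partial}{\partial x_1} + \cdots + Y_{\varepsilon,n} \frac{\partial}{\partial x_n} \right)^{(2)} f(X_\varepsilon + \theta Y_\varepsilon) = \sum_{i=1}^{n} \sum_{j=1}^{n} Y_{\varepsilon,i} Y_{\varepsilon,j} \frac{\partial^2 f}{\partial x_i \partial x_j}(X_\varepsilon + \theta Y_\varepsilon) = \langle Y_\varepsilon, \mathrm{Hess} f(X_\varepsilon + \theta Y_\varepsilon) Y_\varepsilon \rangle. \]
Thus, for a specific map $\theta : \Omega \to (0, 1)$, it holds that
\[ f(X_\varepsilon + Y_\varepsilon) = f(X_\varepsilon) + \langle Y_\varepsilon, \nabla f(X_\varepsilon) \rangle + \frac{1}{2} \langle Y_\varepsilon, \mathrm{Hess} f(X_\varepsilon + \theta Y_\varepsilon) Y_\varepsilon \rangle. \]
Simply substituting this into the definition of $R_\varepsilon$ yields
\[ R_\varepsilon = \frac{1}{2} \big\langle Y_\varepsilon, \big( \mathrm{Hess} f(X_\varepsilon + \theta Y_\varepsilon) - \mathrm{Hess} f(X_\varepsilon) \big) Y_\varepsilon \big\rangle. \]
Again, using the Cauchy-Schwarz inequality and the definition of the induced matrix norm, we get:
\[ |R_\varepsilon| = \frac{1}{2} \big| \big\langle Y_\varepsilon, \big( \mathrm{Hess} f(X_\varepsilon + \theta Y_\varepsilon) - \mathrm{Hess} f(X_\varepsilon) \big) Y_\varepsilon \big\rangle \big| \le \frac{1}{2} \|Y_\varepsilon\|^2 \, \|\mathrm{Hess} f(X_\varepsilon + \theta Y_\varepsilon) - \mathrm{Hess} f(X_\varepsilon)\|. \]
We will show how we can use the boundedness and uniform continuity of $\mathrm{Hess} f$ to construct the $\delta$ mentioned earlier. First of all, we use the uniform continuity of $\mathrm{Hess} f$. Let $\eta > 0$, then there exists a $\xi > 0$ such that
\[ \forall x, y \in \mathbb{R}^n : \ \|y\| < \xi \ \Rightarrow \ \frac{1}{2} \|\mathrm{Hess} f(x + y) - \mathrm{Hess} f(x)\| < \eta. \]
For simplicity we will write $B = \mathrm{Hess} f$. The map $\delta$ can be constructed as follows. First let $\eta = 1$:
\[ \exists \xi_1 > 0, \ \forall x, y \in \mathbb{R}^n : \ \|y\| < \xi_1 \ \Rightarrow \ \frac{1}{2} \|B(x + y) - B(x)\| < 1. \]
We define $\delta(t) := C$, for $t \ge \xi_1$.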
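(Recall that $C$ was chosen such that $\|B(z)\| \le C$ for all $z \in \mathbb{R}^n$.) Secondly, let $\eta = 1/2$, then:
\[ \exists \xi_2 \in (0, \xi_1), \ \forall x, y \in \mathbb{R}^n : \ \|y\| < \xi_2 \ \Rightarrow \ \frac{1}{2} \|B(x + y) - B(x)\| < \frac{1}{2}. \]
We define $\delta(t) := 1$, for $\xi_2 \le t < \xi_1$. Next, let $\eta = 1/4$:
\[ \exists \xi_3 \in (0, \xi_2), \ \forall x, y \in \mathbb{R}^n : \ \|y\| < \xi_3 \ \Rightarrow \ \frac{1}{2} \|B(x + y) - B(x)\| < \frac{1}{4}. \]
We define $\delta(t) := 1/2$, for $\xi_3 \le t < \xi_2$. It is easily seen that continuing this reasoning leads to a bounded map $\delta : [0, \infty) \to [0, \infty)$ such that $\lim_{t \to 0} \delta(t) = 0$ and $|R_\varepsilon| \le \|Y_\varepsilon\|^2 \delta(\|Y_\varepsilon\|)$.

(4.e) Next, we will show that $\delta^2(\|Y_\varepsilon\|) \xrightarrow{P} 0$ as $\varepsilon \to 0$. To that end we will first prove that $E[\varepsilon^{-2} \|Y_\varepsilon\|^4]$ can be bounded by a constant that does not depend on $\varepsilon$. Using the simple inequality $(a + b)^4 \le 8(a^4 + b^4)$, we obtain:
\[ \|Y_\varepsilon\|^4 = \big\| -\varepsilon X_\varepsilon + \sqrt{2\varepsilon A(X_\varepsilon)}\, Z \big\|^4 \le \Big( \|\varepsilon X_\varepsilon\| + \big\| \sqrt{2\varepsilon A(X_\varepsilon)}\, Z \big\| \Big)^4 \le 8\varepsilon^4 \|X_\varepsilon\|^4 + 32\varepsilon^2 \big\| \sqrt{A(X_\varepsilon)}\, Z \big\|^4. \]
Since $\varepsilon \in (0, 1)$, it is obvious that
\[ \varepsilon^{-2} \|Y_\varepsilon\|^4 \le 8\varepsilon^2 \|X_\varepsilon\|^4 + 32 \big\| \sqrt{A(X_\varepsilon)}\, Z \big\|^4 \le 8 \|X_\varepsilon\|^4 + 32 \big\| \sqrt{A(X_\varepsilon)}\, Z \big\|^4. \]
Thus, it suffices to prove that $E\|X_\varepsilon\|^4$ and $E\|\sqrt{A(X_\varepsilon)}\, Z\|^4$ can be bounded by constants that do not depend on $\varepsilon$. For $E\|X_\varepsilon\|^4$ we have already proved this in (4.b). On the other hand, using Corollary A.1.4, a straightforward calculation shows that $E\|\sqrt{A(X_\varepsilon)}\, Z\|^4 \le 3b^2 n^{10}$, which is a constant that does not depend on $\varepsilon$. Therefore, we have shown that $E[\varepsilon^{-2} \|Y_\varepsilon\|^4]$ can be bounded by a constant $L$ independent of $\varepsilon$.

Subsequently, it can be easily seen that $\|Y_\varepsilon\| \xrightarrow{P} 0$ as $\varepsilon \to 0$. Indeed, since $E[\|Y_\varepsilon\|^4] \le L\varepsilon^2$, it is clear that $\|Y_\varepsilon\| \to 0$ in $L^4$ as $\varepsilon \to 0$. Using Proposition A.1.22, we can conclude that $\|Y_\varepsilon\| \xrightarrow{P} 0$ as $\varepsilon \to 0$. Now, let $\{\varepsilon_n \mid n \ge 1\}$ be a sequence in $(0, 1)$ such that $\varepsilon_n \to 0$ as $n \to \infty$. Since $\|Y_{\varepsilon_n}\| \xrightarrow{P} 0$, Proposition A.1.21 implies that there exists a subsequence $\{\varepsilon_{n_k} \mid k \ge 1\}$ such that $\|Y_{\varepsilon_{n_k}}\| \to 0$ almost surely. By construction of $\delta$ it is clear that $\delta^2(\|Y_{\varepsilon_{n_k}}\|) \to 0$ almost surely. Again using the characterization A.1.21 of convergence in probability, we can conclude that $\delta^2(\|Y_\varepsilon\|) \xrightarrow{P} 0$ as $\varepsilon \to 0$.

(4.f) Combining some of the above-mentioned properties, we will now show that
\[ \lim_{\varepsilon \to 0} \varepsilon^{-1} E|R_\varepsilon| = 0. \tag{1.6} \]
Using Hölder's inequality A.1.6 with $p = q = 2$, we see that
\[ E[\varepsilon^{-1} |R_\varepsilon|] \le E[\varepsilon^{-1} \|Y_\varepsilon\|^2 \delta(\|Y_\varepsilon\|)] \le E[\varepsilon^{-2} \|Y_\varepsilon\|^4]^{1/2} \, E[\delta^2(\|Y_\varepsilon\|)]^{1/2}. \]
Recall that $E[\varepsilon^{-2} \|Y_\varepsilon\|^4]$ is bounded by a constant not depending on $\varepsilon$. Thus, in order to prove (1.6) it suffices to prove that $E[\delta^2(\|Y_\varepsilon\|)] \to 0$ as $\varepsilon \to 0$. Since $\delta$ is a bounded function, the family $\{\delta^2(\|Y_\varepsilon\|) \mid \varepsilon \in (0, 1)\}$ is uniformly integrable. Using this, together with the fact that $\delta^2(\|Y_\varepsilon\|) \xrightarrow{P} 0$ and the characterization in A.1.22, we obtain that $\delta^2(\|Y_\varepsilon\|) \to 0$ in $L^1$ or, equivalently, $E[\delta^2(\|Y_\varepsilon\|)] \to 0$. This concludes the proof of (1.6).

(4.g) Recall that $K$ is a compact set. Since $\{\mu_\varepsilon\}_{0 < \varepsilon < 1}$ is a collection in $K$, this collection has a cluster point $\mu \in K$. This means that there exists a sequence $(\varepsilon_j)_j$ in $(0, 1)$ such that $\varepsilon_j \to 0$ and $\mu_{\varepsilon_j} \xrightarrow{w} \mu$, as $j \to \infty$. Let $X$ be an arbitrary random vector with law $\mu$, then $X_{\varepsilon_j} \xrightarrow{d} X$. Note that for any $\varepsilon \in (0, 1)$, we have by definition of $R_\varepsilon$ and by equations (1.4) and (1.5) that
\[ \varepsilon^{-1} E[R_\varepsilon] = \varepsilon^{-1} E[f(X_\varepsilon + Y_\varepsilon) - f(X_\varepsilon)] + E\langle X_\varepsilon, \nabla f(X_\varepsilon) \rangle - \frac{1}{2} \cdot 2 E\,\mathrm{Tr}(A(X_\varepsilon)\,\mathrm{Hess} f(X_\varepsilon)) + O(\varepsilon). \]
Using equation (1.3), this implies that
\[ \varepsilon^{-1} E[R_\varepsilon] = E\langle X_\varepsilon, \nabla f(X_\varepsilon) \rangle - E\,\mathrm{Tr}(A(X_\varepsilon)\,\mathrm{Hess} f(X_\varepsilon)) + O(\varepsilon). \]
Using the triangle inequality and equation (1.6), it is clear that
\[ \begin{aligned} \big| E\langle X_\varepsilon, \nabla f(X_\varepsilon) \rangle - E\,\mathrm{Tr}(A(X_\varepsilon)\,\mathrm{Hess} f(X_\varepsilon)) \big| &\le \big| E\langle X_\varepsilon, \nabla f(X_\varepsilon) \rangle - E\,\mathrm{Tr}(A(X_\varepsilon)\,\mathrm{Hess} f(X_\varepsilon)) + O(\varepsilon) \big| + O(\varepsilon) \\ &= \varepsilon^{-1} |E[R_\varepsilon]| + O(\varepsilon) \\ &\le \varepsilon^{-1} E|R_\varepsilon| + O(\varepsilon) \to 0. \end{aligned} \]
Let
\[ a_j := \big| E\langle X, \nabla f(X) \rangle - E\langle X_{\varepsilon_j}, \nabla f(X_{\varepsilon_j}) \rangle \big| \quad \text{and} \quad b_j := \big| E\,\mathrm{Tr}(A(X_{\varepsilon_j})\,\mathrm{Hess} f(X_{\varepsilon_j})) - E\,\mathrm{Tr}(A(X)\,\mathrm{Hess} f(X)) \big|, \]
then by the triangle inequality:
\[ \big| E\langle X, \nabla f(X) \rangle - E\,\mathrm{Tr}(A(X)\,\mathrm{Hess} f(X)) \big| \le a_j + \big| E\langle X_{\varepsilon_j}, \nabla f(X_{\varepsilon_j}) \rangle - E\,\mathrm{Tr}(A(X_{\varepsilon_j})\,\mathrm{Hess} f(X_{\varepsilon_j})) \big| + b_j. \]
The left-hand side of this inequality does not depend on $j$. Thus, in order to prove that this left-hand side is zero, it suffices to prove that the right-hand side converges to zero as $j \to \infty$. We have already shown that the middle term converges to zero, thus it is sufficient to show that $a_j \to 0$ and $b_j \to 0$. Since $\nabla f$ is uniformly continuous, it is obvious that the map $x \mapsto \langle x, \nabla f(x) \rangle$ is continuous.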
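Using the continuous mapping theorem and the fact that $X_{\varepsilon_j} \xrightarrow{d} X$, we can conclude that $\langle X_{\varepsilon_j}, \nabla f(X_{\varepsilon_j})\rangle \xrightarrow{d} \langle X, \nabla f(X)\rangle$. In order to prove that $a_j \to 0$, it suffices to prove that:

$$\sup_{j \ge 1} E|\langle X_{\varepsilon_j}, \nabla f(X_{\varepsilon_j})\rangle|^{1+\alpha} < \infty,$$

for some $\alpha > 0$. This is because of Proposition A.1.23. Setting $\alpha = 1$, the Cauchy-Schwarz inequality and the inequality of Hölder A.1.6 imply that:

$$E|\langle X_{\varepsilon_j}, \nabla f(X_{\varepsilon_j})\rangle|^{1+\alpha} = E|\langle X_{\varepsilon_j}, \nabla f(X_{\varepsilon_j})\rangle|^2 \le E\big[\|X_{\varepsilon_j}\|^2 \|\nabla f(X_{\varepsilon_j})\|^2\big] \le E\big[\|X_{\varepsilon_j}\|^4\big]^{1/2} E\big[\|\nabla f(X_{\varepsilon_j})\|^4\big]^{1/2}.$$

Since $\nabla f$ is bounded and since the moments of $\|X_\varepsilon\|$ are bounded by constants not depending on $\varepsilon$, there exists a constant $\tilde{C} > 0$ such that for all $j \ge 1$:

$$E|\langle X_{\varepsilon_j}, \nabla f(X_{\varepsilon_j})\rangle|^{1+\alpha} \le \tilde{C}.$$

This finishes the argument for $a_j \to 0$. To see that $b_j \to 0$, we proceed as follows. First note that for any $n \times n$-matrices $B$ and $D$, we have:

$$|\mathrm{Tr}(B) - \mathrm{Tr}(D)| = |\mathrm{Tr}(B - D)| \le \sum_{i=1}^n |b_{ii} - d_{ii}| \le n\|B - D\|,$$

where we used Proposition A.3.3 for the last inequality. This shows that $\mathrm{Tr}$ is a (uniformly) continuous function. Moreover, $A$ and $\mathrm{Hess} f$ are also continuous and $X_{\varepsilon_j} \xrightarrow{d} X$. Therefore, we can use the continuous mapping theorem, and:

$$\mathrm{Tr}(A(X_{\varepsilon_j})\mathrm{Hess} f(X_{\varepsilon_j})) \xrightarrow{d} \mathrm{Tr}(A(X)\mathrm{Hess} f(X)).$$

Thus, again by Proposition A.1.23, it suffices to prove that:

$$\sup_{j \ge 1} E|\mathrm{Tr}(A(X_{\varepsilon_j})\mathrm{Hess} f(X_{\varepsilon_j}))|^2 < \infty.$$

Note that for any two $n \times n$-matrices $B$ and $D$, we have that $\|BD\| \le \|B\|\|D\|$. Using again Proposition A.3.3, it is clear that:

$$|\mathrm{Tr}(A(X_{\varepsilon_j})\mathrm{Hess} f(X_{\varepsilon_j}))| \le n\|A(X_{\varepsilon_j})\mathrm{Hess} f(X_{\varepsilon_j})\| \le n\|A(X_{\varepsilon_j})\|\|\mathrm{Hess} f(X_{\varepsilon_j})\| \le nbC.$$

This implies that $\sup_{j \ge 1} E|\mathrm{Tr}(A(X_{\varepsilon_j})\mathrm{Hess} f(X_{\varepsilon_j}))|^2 \le (nbC)^2$. Thus, we have proved that $a_j \to 0$ and $b_j \to 0$. As mentioned before, this implies that:

$$E\langle X, \nabla f(X)\rangle = E\,\mathrm{Tr}(A(X)\mathrm{Hess} f(X)).$$

This completes the proof for $f \in C^2(\mathbb{R}^n)$ with $\nabla f$ and $\mathrm{Hess} f$ bounded and uniformly continuous.

(STEP 5) Finally, take an arbitrary $f \in C^2(\mathbb{R}^n)$. Let $g : \mathbb{R}^n \to [0, 1]$ be a $C^\infty$ function such that $g(x) = 1$ if $\|x\| \le 1$ and $g(x) = 0$ if $\|x\| \ge 2$. For each $a > 1$, let $f_a(x) := f(x)g(a^{-1}x)$.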
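The argument only needs the existence of such a cutoff function. As an illustration, the following minimal sketch builds one standard choice from the smooth function $t \mapsto e^{-1/t}$ and forms the truncation $f \cdot g(a^{-1}\,\cdot)$. The names `phi`, `g` and `truncate`, as well as this particular construction, are assumptions made for the example and are not taken from the thesis.

```python
import math


def phi(t):
    """exp(-1/t) for t > 0 and 0 otherwise; this function is smooth on R."""
    return math.exp(-1.0 / t) if t > 0 else 0.0


def g(x):
    """Illustrative smooth cutoff on R^n: 1 if ||x|| <= 1, 0 if ||x|| >= 2."""
    r = math.sqrt(sum(c * c for c in x))
    top = phi(2.0 - r)
    # The denominator is never zero: r < 2 gives top > 0, r >= 2 gives phi(r - 1) > 0.
    return top / (top + phi(r - 1.0))


def truncate(f, a):
    """Return the truncated function x -> f(x) * g(x / a)."""
    return lambda x: f(x) * g([c / a for c in x])


if __name__ == "__main__":
    f = lambda x: x[0] ** 2 + x[1]   # an arbitrary smooth test function
    x = [3.0, -1.0]                  # ||x|| is about 3.16
    for a in (1.5, 2.0, 4.0):
        print(a, truncate(f, a)(x))
```

For a fixed point the truncated value vanishes while $\|x\| \ge 2a$ and equals $f(x)$ as soon as $a \ge \|x\|$, which is the pointwise convergence used later in this step.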
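The first and second order partial derivatives of $f_a$ are given by the expressions:

$$\frac{\partial f_a}{\partial x_i}(x) = \frac{\partial f}{\partial x_i}(x)\,g(a^{-1}x) + a^{-1} f(x)\,\frac{\partial g}{\partial x_i}(a^{-1}x),$$

$$\frac{\partial^2 f_a}{\partial x_i \partial x_j}(x) = \frac{\partial^2 f}{\partial x_i \partial x_j}(x)\,g(a^{-1}x) + a^{-1}\frac{\partial f}{\partial x_i}(x)\frac{\partial g}{\partial x_j}(a^{-1}x) + a^{-1}\frac{\partial f}{\partial x_j}(x)\frac{\partial g}{\partial x_i}(a^{-1}x) + a^{-2} f(x)\frac{\partial^2 g}{\partial x_i \partial x_j}(a^{-1}x).$$

Since $g \in C^\infty$ and $f \in C^2$, these expressions clearly imply that $f_a \in C^2$. From now on, we will use the following notation: $B(0, \varepsilon) := \{x \in \mathbb{R}^n \mid \|x\| \le \varepsilon\}$.

(5.a) We will show that $\nabla f_a$ and $\mathrm{Hess} f_a$ are bounded and uniformly continuous. We will first consider the boundedness of $\nabla f_a$. We have just shown that $\frac{\partial f_a}{\partial x_i}$ is continuous. Since $B(0, 2a)$ is compact, $\frac{\partial f_a}{\partial x_i}$ is bounded on this closed ball. By the choice of $g$ and by the expression for $\frac{\partial f_a}{\partial x_i}$ above, it is clear that $\frac{\partial f_a}{\partial x_i}(x) = 0$ if $x \notin B(0, 2a)$. Thus $\frac{\partial f_a}{\partial x_i}$ is bounded on $\mathbb{R}^n$. Since this holds for all $1 \le i \le n$ and since

$$\nabla f_a(x) = \left(\frac{\partial f_a}{\partial x_1}(x), \ldots, \frac{\partial f_a}{\partial x_n}(x)\right),$$

it is easy to see that $\nabla f_a$ is bounded. The boundedness of $\mathrm{Hess} f_a$ can be shown in a similar way.

Secondly, we will prove that $\nabla f_a$ is uniformly continuous. A well-known result from analysis states that for a function $h : Y \to Z$, where $Y$ is a compact metric space and $Z$ is just a metric space, it holds that $h$ is continuous if and only if $h$ is uniformly continuous. Hence, since $\frac{\partial f_a}{\partial x_i}$ is continuous, it is uniformly continuous on $Y := B(0, 2a)$. Let $\varepsilon > 0$, then for all $1 \le i \le n$:

$$\exists \delta_i > 0,\ \forall x, y \in Y : \|x - y\| < \delta_i \Rightarrow \left|\frac{\partial f_a}{\partial x_i}(x) - \frac{\partial f_a}{\partial x_i}(y)\right| < \frac{\varepsilon}{\sqrt{n}}.$$

Setting $\delta := \min_{1 \le i \le n} \delta_i$, we have for any $x, y \in Y$ with $\|x - y\| < \delta$ that:

$$\|\nabla f_a(x) - \nabla f_a(y)\| = \sqrt{\sum_{i=1}^n \left(\frac{\partial f_a}{\partial x_i}(x) - \frac{\partial f_a}{\partial x_i}(y)\right)^2} < \sqrt{n\,\frac{\varepsilon^2}{n}} = \varepsilon.$$

When $x, y \notin Y$, it is obvious that $\|\nabla f_a(x) - \nabla f_a(y)\| = 0$. To finish the argument, let $x \in Y$ and $y \notin Y$. It is always possible to find a vector $z \in \mathbb{R}^n$ such that $\|z\| = 2a$ and $\|x - z\| \le \|x - y\|$. Assume $\|x - y\| < \delta$, then $\|x - z\| < \delta$.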
Since $\nabla f_a(y) = 0 = \nabla f_a(z)$, it follows that:

$$\|\nabla f_a(x) - \nabla f_a(y)\| = \|\nabla f_a(x) - \nabla f_a(z)\| < \varepsilon,$$

where the last inequality follows from the fact that $x, z \in Y$. In an analogous way it can be shown that $\mathrm{Hess} f_a$ is uniformly continuous. Since $\nabla f_a$ and $\mathrm{Hess} f_a$ are bounded and uniformly continuous, we know from (STEP 4) that:

$$E\langle X, \nabla f_a(X)\rangle = E\,\mathrm{Tr}(A(X)\mathrm{Hess} f_a(X)).$$

(5.b) Next, we will prove that the partial derivatives of $f_a$ converge pointwise to those of $f$ as $a \to \infty$. Let $x$ be fixed. Since $g$ and $\frac{\partial g}{\partial x_i}$ are continuous, the expression for $\frac{\partial f_a}{\partial x_i}(x)$ mentioned earlier clearly implies that:

$$\frac{\partial f_a}{\partial x_i}(x) \to \frac{\partial f}{\partial x_i}(x)\,g(0) = \frac{\partial f}{\partial x_i}(x),$$

as $a \to \infty$. Using the continuity of $g$, $\frac{\partial g}{\partial x_i}$, $\frac{\partial g}{\partial x_j}$ and $\frac{\partial^2 g}{\partial x_i \partial x_j}$, the expression for $\frac{\partial^2 f_a}{\partial x_i \partial x_j}$ yields that:

$$\frac{\partial^2 f_a}{\partial x_i \partial x_j}(x) \to \frac{\partial^2 f}{\partial x_i \partial x_j}(x)\,g(0) = \frac{\partial^2 f}{\partial x_i \partial x_j}(x),$$

as $a \to \infty$.

(5.c) Finally, we will prove that:

$$E\langle X, \nabla f_a(X)\rangle \to E\langle X, \nabla f(X)\rangle \quad\text{and}\quad E\,\mathrm{Tr}(A(X)\mathrm{Hess} f_a(X)) \to E\,\mathrm{Tr}(A(X)\mathrm{Hess} f(X)),$$

as $a \to \infty$. This would finish the proof, since we have already established that $E\langle X, \nabla f_a(X)\rangle = E\,\mathrm{Tr}(A(X)\mathrm{Hess} f_a(X))$ for all $a > 1$. We first note that by assumption the expectations $E|f(X)|^2$, $E\|\nabla f(X)\|^2$ and $E|\mathrm{Tr}(A(X)\mathrm{Hess} f(X))|$ are finite. Besides, since $\mu \in K$, all moments of $\|X\|$ are finite. Thus, in particular $E\|X\|^2 < \infty$. Since the first order partial derivatives of $f_a$ converge pointwise to those of $f$, it is clear that for all $\omega \in \Omega$:

$$\sum_{i=1}^n X_i(\omega)\frac{\partial f_a}{\partial x_i}(X(\omega)) \to \sum_{i=1}^n X_i(\omega)\frac{\partial f}{\partial x_i}(X(\omega)),$$

as $a \to \infty$. This implies that $\langle X, \nabla f_a(X)\rangle \to \langle X, \nabla f(X)\rangle$ a.s., as $a \to \infty$. (Note that this convergence is even pointwise.) If we can use the dominated convergence theorem, we immediately obtain that $E\langle X, \nabla f_a(X)\rangle \to E\langle X, \nabla f(X)\rangle$, as desired. Thus, it remains to show that the dominated convergence theorem can be applied. In other words, we have to prove that there exists an integrable random variable $Y$ such that $|\langle X, \nabla f_a(X)\rangle| \le Y$, for all $a > 1$.
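Since $g : \mathbb{R}^n \to [0, 1]$ and $a > 1$, we have for all $x \in \mathbb{R}^n$:

$$\begin{aligned}
|\langle x, \nabla f_a(x)\rangle| &= \left|\sum_{i=1}^n x_i \frac{\partial f_a}{\partial x_i}(x)\right| \\
&= \left|\sum_{i=1}^n x_i \frac{\partial f}{\partial x_i}(x)\,g(a^{-1}x) + \sum_{i=1}^n x_i\,a^{-1} f(x)\frac{\partial g}{\partial x_i}(a^{-1}x)\right| \\
&\le \sum_{i=1}^n |x_i|\left|\frac{\partial f}{\partial x_i}(x)\right| + \sum_{i=1}^n |x_i||f(x)|\left\|\frac{\partial g}{\partial x_i}\right\|_\infty.
\end{aligned}$$

Note that $\left\|\frac{\partial g}{\partial x_i}\right\|_\infty < \infty$, for any $1 \le i \le n$. Namely, since $g \in C^\infty$, $\frac{\partial g}{\partial x_i}$ is continuous. Thus, $\frac{\partial g}{\partial x_i}$ is bounded on $B(0, 2)$ and zero outside. This means that $\frac{\partial g}{\partial x_i}$ is bounded on $\mathbb{R}^n$. Define:

$$Y := \sum_{i=1}^n |X_i|\left|\frac{\partial f}{\partial x_i}(X)\right| + \sum_{i=1}^n |X_i||f(X)|\left\|\frac{\partial g}{\partial x_i}\right\|_\infty,$$

then obviously $|\langle X, \nabla f_a(X)\rangle| \le Y$ for all $a > 1$. It remains to prove that $Y$ is an integrable random variable. Using Hölder's inequality A.1.6 with $p = q = 2$, we see that:

$$\begin{aligned}
E|Y| &\le nE[\|X\|\|\nabla f(X)\|] + E[\|X\||f(X)|]\sum_{i=1}^n \left\|\frac{\partial g}{\partial x_i}\right\|_\infty \\
&\le nE[\|X\|^2]^{1/2} E[\|\nabla f(X)\|^2]^{1/2} + E[\|X\|^2]^{1/2} E[|f(X)|^2]^{1/2}\sum_{i=1}^n \left\|\frac{\partial g}{\partial x_i}\right\|_\infty < \infty.
\end{aligned}$$

The only thing left to prove is that $E\,\mathrm{Tr}(A(X)\mathrm{Hess} f_a(X)) \to E\,\mathrm{Tr}(A(X)\mathrm{Hess} f(X))$, as $a \to \infty$. An easy calculation shows that for all $\omega \in \Omega$:

$$\mathrm{Tr}(A(X(\omega))\mathrm{Hess} f_a(X(\omega))) = \sum_{i=1}^n \sum_{j=1}^n A(X(\omega))_{ij}\frac{\partial^2 f_a}{\partial x_i \partial x_j}(X(\omega)).$$

We have already established that the second order partial derivatives of $f_a$ converge pointwise to those of $f$. Thus, letting $a \to \infty$, we have for all $\omega \in \Omega$:

$$\mathrm{Tr}(A(X(\omega))\mathrm{Hess} f_a(X(\omega))) \to \sum_{i=1}^n \sum_{j=1}^n A(X(\omega))_{ij}\frac{\partial^2 f}{\partial x_i \partial x_j}(X(\omega)) = \mathrm{Tr}(A(X(\omega))\mathrm{Hess} f(X(\omega))).$$

Hence, $\mathrm{Tr}(A(X)\mathrm{Hess} f_a(X)) \to \mathrm{Tr}(A(X)\mathrm{Hess} f(X))$ a.s., as $a \to \infty$. Obviously, if the conditions of the dominated convergence theorem are met, this would finish the argument. Thus, it remains to prove that there exists an integrable random variable $\tilde{Y}$ such that $|\mathrm{Tr}(A(X)\mathrm{Hess} f_a(X))| \le \tilde{Y}$ for all $a > 1$. By Proposition A.3.3, we have for all $x \in \mathbb{R}^n$ that $|A(x)_{ij}| \le \|A(x)\| \le b$. Using this and the fact that $a > 1$, a straightforward calculation shows that $|\mathrm{Tr}(A(X)\mathrm{Hess} f_a(X))| \le \tilde{Y}$, where

$$\tilde{Y} := |\mathrm{Tr}(A(X)\mathrm{Hess} f(X))| + 2b\sum_{i=1}^n \sum_{j=1}^n \left|\frac{\partial f}{\partial x_i}(X)\right|\left\|\frac{\partial g}{\partial x_j}\right\|_\infty + b\sum_{i=1}^n \sum_{j=1}^n |f(X)|\left\|\frac{\partial^2 g}{\partial x_i \partial x_j}\right\|_\infty.$$

As before, $\left\|\frac{\partial g}{\partial x_j}\right\|_\infty$ and $\left\|\frac{\partial^2 g}{\partial x_i \partial x_j}\right\|_\infty$ are finite.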
Thus, it is easy to see that:

$$E|\tilde{Y}| \le E|\mathrm{Tr}(A(X)\mathrm{Hess} f(X))| + 2bn\,E[\|\nabla f(X)\|^2]^{1/2}\sum_{j=1}^n \left\|\frac{\partial g}{\partial x_j}\right\|_\infty + b\,E[|f(X)|^2]^{1/2}\sum_{i=1}^n \sum_{j=1}^n \left\|\frac{\partial^2 g}{\partial x_i \partial x_j}\right\|_\infty < \infty.$$

This means that $\tilde{Y}$ is integrable, which finishes the proof.

Lemma 1.1.3. Let $A$ and $X$ be as in Theorem 1.1.1. Take any $1 \le i < j \le n$. Let

$$v_{ij}(x) := a_{ii}(x) + a_{jj}(x) - 2a_{ij}(x),$$

where $a_{ij}(x)$ denotes the $(i, j)$th element of the matrix $A(x)$. Then for all $\theta \in \mathbb{R}$:

$$E\exp(\theta|X_i - X_j|) \le 2E\exp(2\theta^2 v_{ij}(X)).$$

Proof. Take an arbitrary integer $k \ge 1$ and define $f : \mathbb{R}^n \to \mathbb{R}$ as:

$$f(x) := (x_i - x_j)^{2k}, \quad\text{for } x \in \mathbb{R}^n.$$

Then for any $l \notin \{i, j\}$, we have that $\frac{\partial f}{\partial x_l}(x) = 0$. Furthermore, we have that $\frac{\partial f}{\partial x_i}(x) = 2k(x_i - x_j)^{2k-1}$ and $\frac{\partial f}{\partial x_j}(x) = -2k(x_i - x_j)^{2k-1}$. Combining these results we see that:

$$\langle x, \nabla f(x)\rangle = 2kx_i(x_i - x_j)^{2k-1} - 2kx_j(x_i - x_j)^{2k-1} = 2k(x_i - x_j)^{2k}.$$

The second order partial derivatives are given by:

$$\frac{\partial^2 f}{\partial x_i \partial x_j}(x) = \frac{\partial^2 f}{\partial x_j \partial x_i}(x) = -2k(2k - 1)(x_i - x_j)^{2k-2},$$

$$\frac{\partial^2 f}{\partial x_i^2}(x) = \frac{\partial^2 f}{\partial x_j^2}(x) = 2k(2k - 1)(x_i - x_j)^{2k-2}.$$

All other second order partial derivatives are zero and thus:

$$\begin{aligned}
\mathrm{Tr}(A(x)\mathrm{Hess} f(x)) &= \sum_{l=1}^n (A(x)\mathrm{Hess} f(x))_{ll} = \sum_{l=1}^n \sum_{m=1}^n a_{lm}(x)\frac{\partial^2 f}{\partial x_l \partial x_m}(x) \\
&= a_{ii}(x)\frac{\partial^2 f}{\partial x_i^2}(x) + 2a_{ij}(x)\frac{\partial^2 f}{\partial x_i \partial x_j}(x) + a_{jj}(x)\frac{\partial^2 f}{\partial x_j^2}(x) \\
&= 2k(2k - 1)(x_i - x_j)^{2k-2}(a_{ii}(x) - 2a_{ij}(x) + a_{jj}(x)) \\
&= 2k(2k - 1)(x_i - x_j)^{2k-2}\,v_{ij}(x),
\end{aligned}$$

where we have used that the matrix $A(x)$ is symmetric. Take $y \in \mathbb{R}^n$ such that $y_m = \delta_{im} - \delta_{jm}$. Then, by positive semidefiniteness of $A(x)$ we get:

$$v_{ij}(x) = a_{ii}(x) - 2a_{ij}(x) + a_{jj}(x) = y^T A(x)y \ge 0.$$

Hence, $v_{ij}(x)$ is nonnegative for all $x \in \mathbb{R}^n$. We now use Hölder's inequality A.1.6 with $p = \frac{k}{k-1}$ and $q = k$, and we get:

$$E|\mathrm{Tr}(A(X)\mathrm{Hess} f(X))| = 2k(2k - 1)E[|X_i - X_j|^{2k-2}|v_{ij}(X)|] \le 2k(2k - 1)E[|X_i - X_j|^{2k}]^{\frac{k-1}{k}}\,E[v_{ij}(X)^k]^{\frac{1}{k}}.$$

We want to apply (1.2) to $f$. So we first have to verify whether $f$ meets the assumptions of Theorem 1.1.1. Clearly $f \in C^2(\mathbb{R}^n)$.
In order to prove that $E|f(X)|^2 < \infty$, we first notice that by Taylor expansion $\frac{|x|^{4k}}{(4k)!} \le e^{|x|}$ for all $x \in \mathbb{R}$. Hence, we obtain, with $y$ defined as above:

$$\begin{aligned}
E|f(X)|^2 = E|X_i - X_j|^{4k} &\le (4k)!\,E[e^{|X_i - X_j|}] \\
&\le (4k)!\,(E[e^{X_i - X_j}] + E[e^{X_j - X_i}]) \\
&= (4k)!\,(E[\exp\langle y, X\rangle] + E[\exp\langle -y, X\rangle]) \\
&\le (4k)!\,(\exp(b\|y\|^2) + \exp(b\|-y\|^2)) \\
&= 2(4k)!\exp(2b) < \infty,
\end{aligned}$$

where we have used (1.1). By an analogous reasoning and some easy calculations, we also have that:

$$E\|\nabla f(X)\|^2 = 8k^2 E(X_i - X_j)^{4k-2} < \infty,$$

and

$$E|\mathrm{Tr}(A(X)\mathrm{Hess} f(X))| \le 8bk(2k - 1)E(X_i - X_j)^{2k-2} < \infty,$$

where we have used that $|v_{ij}(x)| \le 4b$ by Proposition A.3.3. We showed that all assumptions on $f$ are satisfied. Hence, we can use (1.2), which gives us that:

$$\begin{aligned}
2kE(X_i - X_j)^{2k} = E\langle X, \nabla f(X)\rangle &= E\,\mathrm{Tr}(A(X)\mathrm{Hess} f(X)) \\
&\le E|\mathrm{Tr}(A(X)\mathrm{Hess} f(X))| \\
&\le 2k(2k - 1)E[(X_i - X_j)^{2k}]^{\frac{k-1}{k}}\,E[v_{ij}(X)^k]^{\frac{1}{k}}.
\end{aligned}$$

From the above inequality, we can conclude that:

$$E(X_i - X_j)^{2k} \le (2k - 1)^k E[v_{ij}(X)^k].$$

Using that $\cosh(x) = \sum_{k=0}^\infty \frac{x^{2k}}{(2k)!}$, we obtain that:

$$E\exp(\theta|X_i - X_j|) \le E\exp(\theta(X_i - X_j)) + E\exp(-\theta(X_i - X_j)) = 2E[\cosh(\theta(X_i - X_j))] = 2E\left[\sum_{k=0}^\infty \underbrace{\frac{\theta^{2k}(X_i - X_j)^{2k}}{(2k)!}}_{\ge 0}\right].$$

Because all the terms of the sum are nonnegative, we can use monotone convergence, and the last expression can be rewritten as:

$$\begin{aligned}
2\sum_{k=0}^\infty \frac{\theta^{2k}E(X_i - X_j)^{2k}}{(2k)!} &\le 2 + 2\sum_{k=1}^\infty \frac{(2k - 1)^k\theta^{2k}E[v_{ij}(X)^k]}{(2k)!} \\
&\le 2 + 2\sum_{k=1}^\infty \frac{2^k\theta^{2k}E[v_{ij}(X)^k]}{k!} \\
&= 2E\left[\sum_{k=0}^\infty \frac{2^k\theta^{2k}v_{ij}(X)^k}{k!}\right] = 2E\exp(2\theta^2 v_{ij}(X)).
\end{aligned}$$

In the third step we used again monotone convergence and for the second inequality we used that $\frac{(2k-1)^k}{(2k)!} \le \frac{2^k}{k!}$, which can be seen as follows:

$$\frac{(2k - 1)^k}{(2k)!} \le \frac{2^k}{k!} \iff \left(\frac{2k - 1}{2}\right)^k \le \frac{(2k)!}{k!} \iff (k - 1/2)^k \le \underbrace{(2k)(2k - 1)\cdots(k + 1)}_{k \text{ factors}}.$$

This last inequality is trivial, since $k - 1/2$ is smaller than each of the factors on the right-hand side. This finishes the proof.

Definition 1.1.4.
A function $\psi : \mathbb{R} \to \mathbb{R}$ is called absolutely continuous if:

$$\forall \varepsilon > 0,\ \exists \delta > 0 : \sum_{i=1}^k (b_i - a_i) < \delta \Rightarrow \sum_{i=1}^k |\psi(b_i) - \psi(a_i)| < \varepsilon,$$

for every finite collection of disjoint intervals $([a_i, b_i])_{1 \le i \le k}$ in $\mathbb{R}$.

Remark 1.1.5. It is easily seen that every Lipschitz function is absolutely continuous. Besides, it is a well-known fact that absolutely continuous functions are almost everywhere differentiable. See [17] for more details on this topic.

The subsequent lemma shows that for a specific type of random variables, one can always find a Stein coefficient. This Stein coefficient $T$ is of a particular form. Conversely, if that particular $T$ is a Stein coefficient for a random variable $Y$, the density of $Y$ is determined. This last fact is only true under certain assumptions.

Lemma 1.1.6. Assume $\rho$ is a continuous probability density function on $\mathbb{R}$ with support $I \subseteq \mathbb{R}$, where $I$ is a (bounded or unbounded) interval. Suppose $\int_{-\infty}^{+\infty} y\rho(y)\,dy = 0$. Let

$$h(x) := \begin{cases} \dfrac{\int_x^\infty y\rho(y)\,dy}{\rho(x)} & \text{if } x \in I, \\[2mm] 0 & \text{if } x \notin I. \end{cases}$$

Let $X$ be a random variable with density $\rho$ and $E[X^2] < \infty$.

(i) We have that

$$E[X\psi(X)] = E[h(X)\psi'(X)] \tag{1.7}$$

for every Lipschitz function $\psi$ such that both sides of (1.7) are well defined and $E|h(X)\psi(X)| < \infty$.

(ii) If $h_1$ is another function that satisfies (1.7) for all Lipschitz functions $\psi$, then $h_1 = h$ $P_X$-a.s. on the support of $\rho$.

(iii) If $Y$ is a random variable such that (1.7) is true with $Y$ instead of $X$, for all absolutely continuous functions $\psi$ with continuous derivative $\psi'$, such that $|\psi(x)|$, $|x\psi(x)|$ and $|h(x)\psi'(x)|$ are uniformly bounded, then $Y$ has density $\rho$.

Proof. (i) Let $u(x) = h(x)\rho(x)$, for all $x \in \mathbb{R}$. First note that $u(x) = \int_x^\infty y\rho(y)\,dy$. This is also true for $x \notin I$, since $I$ is an interval and $\int_{-\infty}^{+\infty} y\rho(y)\,dy = 0$. Moreover, we have that:

$$u(x) = \int_x^\infty y\rho(y)\,dy = -\int_{-\infty}^x y\rho(y)\,dy, \tag{1.8}$$

where we have used that $\int_{-\infty}^{+\infty} y\rho(y)\,dy = 0$. Using equation (1.8), it is easy to verify that $u$ is continuous and that $\lim_{x \to -\infty} u(x) = \lim_{x \to \infty} u(x) = 0$.
Equation (1.8) also implies that $u$ is strictly positive on the support of $\rho$. To see this, let $x \in I$ and assume $x \ge 0$. Since $\rho$ is continuous and $\rho(x) > 0$, there exists an $\varepsilon > 0$ such that $[x, x + \varepsilon) \subseteq I$. Hence we can conclude that:

$$u(x) = \int_x^\infty y\rho(y)\,dy \ge \int_x^{x+\varepsilon} \underbrace{y}_{>0}\,\underbrace{\rho(y)}_{>0}\,dy > 0.$$

When $x \le 0$ it follows from an analogous reasoning that $u(x) > 0$. Hence we can conclude that $u$ is strictly positive on $I$. From Proposition A.3.1 in the appendix we know that $E[h(X)] = E[X^2] < \infty$. When $\psi$ is a bounded Lipschitz function, using integration by parts leads to:

$$\begin{aligned}
E[X\psi(X)] = \int_{-\infty}^\infty x\psi(x)\rho(x)\,dx &= \int_{-\infty}^\infty \psi(x)\,d\!\left(\int_{-\infty}^x y\rho(y)\,dy\right) \\
&= \int_{-\infty}^\infty \psi(x)\,d(-u(x)) \\
&= [-\psi(x)u(x)]_{-\infty}^\infty + \int_{-\infty}^\infty u(x)\psi'(x)\,dx \\
&= \int_{-\infty}^\infty h(x)\psi'(x)\rho(x)\,dx = E[h(X)\psi'(X)].
\end{aligned}$$

Notice that we have used the boundedness of $\psi$ and the fact that $u(x) \to 0$ as $x \to \pm\infty$. So, we have shown that (1.7) is true for any bounded Lipschitz function $\psi$.

Now let $\psi$ be an arbitrary Lipschitz function and $g : \mathbb{R} \to [0, 1]$ a $C^\infty$ map such that:

$$g(x) := \begin{cases} 1 & \text{if } x \in [-1, 1], \\ 0 & \text{if } x \notin [-2, 2]. \end{cases}$$

For each $a > 1$, let $\psi_a(x) := \psi(x)g(a^{-1}x)$. The derivative of $\psi_a$ is given by:

$$\psi_a'(x) = \psi'(x)g(a^{-1}x) + a^{-1}\psi(x)g'(a^{-1}x).$$

It is easily seen that $\psi_a$ is continuous on the compact interval $[-2a, 2a]$ and zero outside. Hence $\psi_a$ is bounded. Moreover, $\psi_a$ is a Lipschitz function. Indeed, since $\psi$ is a Lipschitz function, we know there exists a constant $L_1 > 0$ such that:

$$\forall x, y \in \mathbb{R} : |\psi(x) - \psi(y)| \le L_1|x - y|.$$

Since $\psi_a(t) = 0$ for $t \notin [-2a, 2a]$, it suffices to prove there exists a constant $L_a > 0$ such that:

$$\forall x, y \in [-2a, 2a] : |\psi_a(x) - \psi_a(y)| \le L_a|x - y|.$$

Let $x, y \in [-2a, 2a]$, then it is clear that:

$$\begin{aligned}
|\psi_a(x) - \psi_a(y)| &\le |\psi(x)g(a^{-1}x) - \psi(x)g(a^{-1}y)| + |\psi(x)g(a^{-1}y) - \psi(y)g(a^{-1}y)| \\
&= |\psi(x)||g(a^{-1}x) - g(a^{-1}y)| + g(a^{-1}y)|\psi(x) - \psi(y)|.
\end{aligned}$$

By construction of $g$, we know $g$ is a Lipschitz function. Assume $g$ has Lipschitz constant $L_2 > 0$. Besides, $\psi$ is continuous. Therefore $\psi$ is bounded on the compact interval $[-2a, 2a]$.
More specifically, there exists a constant $K_a > 0$ such that $|\psi(t)| \le K_a$ for all $t \in [-2a, 2a]$. Using these arguments together with the fact that $g$ maps $\mathbb{R}$ into $[0, 1]$, we obtain for all $a > 1$ that:

$$|\psi_a(x) - \psi_a(y)| \le |\psi(x)|L_2|a^{-1}x - a^{-1}y| + L_1|x - y| \le K_aL_2|x - y| + L_1|x - y| = L_a|x - y|,$$

with $L_a := K_aL_2 + L_1$. Consequently, $\psi_a$ is a bounded Lipschitz function, which implies that for all $a > 1$:

$$E[X\psi_a(X)] = E[h(X)\psi_a'(X)].$$

By continuity of $g$ we have for all $x \in \mathbb{R}$ that $\psi_a(x) = \psi(x)g(a^{-1}x) \to \psi(x)g(0) = \psi(x)$ as $a \to \infty$. Hence, $\psi_a$ converges pointwise to $\psi$ as $a \to \infty$. On the other hand, by continuity of $g$ and $g'$ we have for all $x \in \mathbb{R}$ that $\psi_a'(x) \to \psi'(x)g(0) + 0 \cdot \psi(x)g'(0) = \psi'(x)$ as $a \to \infty$. Thus, we have shown pointwise convergence of $\psi_a'$ to $\psi'$ as $a \to \infty$. Note that:

$$|x\psi_a(x)| = |x\psi(x)||g(a^{-1}x)| \le |x\psi(x)|,$$

since $g : \mathbb{R} \to [0, 1]$. Moreover, we have that:

$$|h(x)\psi_a'(x)| \le |h(x)\psi'(x)g(a^{-1}x)| + |h(x)a^{-1}\psi(x)g'(a^{-1}x)| \le |h(x)\psi'(x)| + \|g'\|_\infty|h(x)\psi(x)|,$$

where $\|g'\|_\infty := \sup_{x \in \mathbb{R}} |g'(x)|$ is finite, since $g'$ is continuous on the compact interval $[-2, 2]$ and zero outside. Since pointwise convergence implies almost sure convergence, we have the following:

$$X\psi_a(X) \to X\psi(X) \text{ a.s.} \quad\text{and}\quad |X\psi_a(X)| \le |X\psi(X)| \in L^1.$$

Applying the dominated convergence theorem yields that $E[X\psi_a(X)] \to E[X\psi(X)]$ as $a \to \infty$. On the other hand, we have:

$$h(X)\psi_a'(X) \to h(X)\psi'(X) \text{ a.s.}, \qquad |h(X)\psi_a'(X)| \le |h(X)\psi'(X)| + \|g'\|_\infty|h(X)\psi(X)| \in L^1.$$

Again by the dominated convergence theorem we can conclude that $E[h(X)\psi_a'(X)] \to E[h(X)\psi'(X)]$ as $a \to \infty$. Combining the last two results together with the equality $E[X\psi_a(X)] = E[h(X)\psi_a'(X)]$, we can conclude that $E[X\psi(X)] = E[h(X)\psi'(X)]$, and the first part of the Lemma is proved.

(ii) For the second part, let $h_1$ be another function such that $E[X\psi(X)] = E[h_1(X)\psi'(X)]$ for all Lipschitz functions $\psi$. Let $\psi$ be a function such that $\psi'(x) = \mathrm{sign}(h_1(x) - h(x))$, where

$$\mathrm{sign}(y) := \begin{cases} 1 & \text{if } y > 0, \\ -1 & \text{if } y < 0, \\ 0 & \text{if } y = 0. \end{cases}$$
Since $\psi$ is a continuous function with a bounded derivative, the Mean Value Theorem implies that $\psi$ is a Lipschitz function. Then by assumption and by the first part of the proof, we have:

$$E[h_1(X)\psi'(X)] = E[X\psi(X)] = E[h(X)\psi'(X)].$$

By construction of $\psi$ we can conclude that:

$$0 = E[(h_1(X) - h(X))\psi'(X)] = E|h_1(X) - h(X)|.$$

Using Proposition A.1.5, this shows that $h_1(X) = h(X)$ almost surely. When $X$ is a random variable on a probability space $(\Omega, \mathcal{F}, P)$, a straightforward reasoning gives:

$$0 \le P_X(\{x \in I \mid h_1(x) \ne h(x)\}) = P(X^{-1}(\{x \in I \mid h_1(x) \ne h(x)\})) \le P(\{\omega \in \Omega \mid h_1(X(\omega)) \ne h(X(\omega))\}) = 0.$$

Thus, we have shown that $P_X(\{x \in I \mid h_1(x) \ne h(x)\}) = 0$, which means that $h_1 = h$ $P_X$-a.s. on $I$.

(iii) For the last part of the lemma, let $X$ be a random variable with density $\rho$. Take $v : \mathbb{R} \to \mathbb{R}$ an arbitrary bounded continuous function and let $m = E[v(X)]$. For $x \notin I$, we simply define $\psi(x) = 0$, and for all $x \in I$:

$$\psi(x) := \frac{1}{u(x)}\int_{-\infty}^x \rho(y)(v(y) - m)\,dy = -\frac{1}{u(x)}\int_x^\infty \rho(y)(v(y) - m)\,dy.$$

This last equality is easily seen, since by definition of $m$:

$$\int_{-\infty}^\infty \rho(y)(v(y) - m)\,dy = E[v(X) - m] = E[v(X)] - m = 0.$$

In the beginning of this proof, we have already shown that $u$ is strictly positive on $I$, whence $\psi$ is well-defined. A little reflection reveals that $\psi$ is an absolutely continuous function. We will now prove that $|x\psi(x)|$ is uniformly bounded. Let $x \in I$ and $x \ge 0$, then

$$\begin{aligned}
|x\psi(x)| &= \left|\frac{x}{u(x)}\int_x^\infty \rho(y)(v(y) - m)\,dy\right| \\
&\le \frac{1}{u(x)}\int_x^\infty |x\rho(y)||v(y) - m|\,dy \\
&\le \frac{2\|v\|_\infty}{u(x)}\int_x^\infty x\rho(y)\,dy \\
&\le \frac{2\|v\|_\infty}{u(x)}\int_x^\infty y\rho(y)\,dy = 2\|v\|_\infty,
\end{aligned}$$

where we have used that $|m| = |E[v(X)]| \le \|v\|_\infty$. Let $x \in I$ and $x < 0$, then

$$\begin{aligned}
|x\psi(x)| &= \left|\frac{x}{u(x)}\int_{-\infty}^x \rho(y)(v(y) - m)\,dy\right| \\
&\le \frac{-2\|v\|_\infty}{u(x)}\int_{-\infty}^x x\rho(y)\,dy \\
&\le \frac{-2\|v\|_\infty}{u(x)}\int_{-\infty}^x y\rho(y)\,dy = 2\|v\|_\infty.
\end{aligned}$$

This shows that $|x\psi(x)|$ is uniformly bounded. Using (1.8), we see that $u'(x) = -x\rho(x)$.
Thus, by definition of $\psi$ it follows that:

$$\begin{aligned}
\psi'(x) &= \frac{1}{u(x)}\rho(x)(v(x) - m) - \frac{u'(x)}{(u(x))^2}\int_{-\infty}^x \rho(y)(v(y) - m)\,dy \\
&= \frac{1}{u(x)}\rho(x)(v(x) - m) + \frac{x\rho(x)}{(u(x))^2}\int_{-\infty}^x \rho(y)(v(y) - m)\,dy \\
&= \frac{1}{u(x)}\rho(x)(v(x) - m) + \frac{x\rho(x)}{u(x)}\psi(x) \\
&= \frac{1}{h(x)}(v(x) - m) + \frac{x}{h(x)}\psi(x).
\end{aligned}$$

Combining this with the fact that $|x\psi(x)| \le 2\|v\|_\infty$, we have that:

$$|h(x)\psi'(x)| \le |v(x) - m| + |x\psi(x)| \le 4\|v\|_\infty.$$

Thus, $|h(x)\psi'(x)|$ is uniformly bounded. Finally, for all $x \in \mathbb{R}$, we have that $|\psi(x)| \le \sup_{|t| \le 1}|\psi(t)| + |x\psi(x)|$. This follows since if $|x| \le 1$, then $|\psi(x)| \le \sup_{|t| \le 1}|\psi(t)|$ and if $|x| > 1$, then $|\psi(x)| \le |x\psi(x)|$. Since $\psi$ is continuous, there exists a constant $K$ such that $\sup_{|t| \le 1}|\psi(t)| \le K$. Hence, for all $x \in \mathbb{R}$, it holds that $|\psi(x)| \le K + 2\|v\|_\infty$, and consequently $|\psi(x)|$ is uniformly bounded. By the expression for $\psi'(x)$ calculated above, it is also clear that $\psi'$ is continuous.

Now, let $Y$ be a random variable such that $E[Y\tilde{\psi}(Y)] = E[h(Y)\tilde{\psi}'(Y)]$ for every absolutely continuous function $\tilde{\psi}$ with continuous derivative $\tilde{\psi}'$, such that $|\tilde{\psi}(x)|$, $|x\tilde{\psi}(x)|$ and $|h(x)\tilde{\psi}'(x)|$ are uniformly bounded, then $E[h(Y)\psi'(Y) - Y\psi(Y)] = 0$. A previous calculation showed that $h(x)\psi'(x) = v(x) - m + x\psi(x)$ for all $x$, or thus that $h(Y)\psi'(Y) - Y\psi(Y) = v(Y) - m$. Combining these results, we can conclude that:

$$E[v(Y)] - E[v(X)] = E[v(Y) - m] = E[h(Y)\psi'(Y) - Y\psi(Y)] = 0.$$

Thus $E[v(Y)] = E[v(X)]$ for every bounded continuous function $v$. By Lemma 9.3.2 in [12], we can conclude that $P_X = P_Y$. Thus $X \overset{d}{=} Y$, which completes the proof.

Before giving a proof of Theorem 1.0.2, we recall some notation. Let $(\Omega, \mathcal{F}, P)$ be a probability space, $X : \Omega \to \mathbb{R}$ a random variable and $Y : \Omega \to \mathbb{R}^d$ a random vector. Note that $E[X \mid Y]$ is a $\sigma\{Y\}$-measurable random variable, such that $E[X \mid Y] = g \circ Y$ for a suitable Borel-measurable function $g : \mathbb{R}^d \to \mathbb{R}$. This map is unique $P_Y$-a.s. and we define:

$$E[X \mid Y = y] := g(y), \quad y \in \mathbb{R}^d.$$

For more details, we refer to Appendix A.1.11.
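As a sanity check of identity (1.7) in Lemma 1.1.6 (i), one can take a concrete density and compare both sides numerically. The sketch below is an illustration only and is not part of the thesis: it assumes the Laplace density $\rho(x) = \frac{1}{2}e^{-|x|}$, for which a direct computation gives $h(x) = 1 + |x|$, and the test function $\psi = \sin$. The helper `expect` and the truncation of the integral to $[-40, 40]$ are likewise choices made for this example.

```python
import math


def rho(x):
    """Laplace density: continuous, mean zero, support equal to R."""
    return 0.5 * math.exp(-abs(x))


def h(x):
    """Closed form of (int_x^inf y rho(y) dy) / rho(x) for the Laplace density."""
    return 1.0 + abs(x)


def expect(func, lo=-40.0, hi=40.0, n=100_000):
    """Midpoint-rule approximation of E[func(X)] for X with density rho."""
    step = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * step
        total += func(x) * rho(x)
    return total * step


lhs = expect(lambda x: x * math.sin(x))      # E[X psi(X)] with psi = sin
rhs = expect(lambda x: h(x) * math.cos(x))   # E[h(X) psi'(X)] with psi' = cos
second_moment = expect(lambda x: x * x)      # E[X^2]
mean_h = expect(h)                           # E[h(X)]

if __name__ == "__main__":
    print(lhs, rhs, second_moment, mean_h)
```

Evaluating the integrals by hand shows that both sides of (1.7) equal $1/2$ in this case, and that $E[h(X)] = E[X^2] = 2$, in line with the identity $E[h(X)] = E[X^2]$ used in the proof.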
We will use the lemmas of this section in order to prove Theorem 1.0.2. For the sake of completeness we restate this strong coupling theorem.

Theorem 1.1.7. Suppose $W$ is a random variable with $E[W] = 0$ and $E[W^2] < \infty$. Let $T$ be a Stein coefficient for $W$ and assume $|T| \le K$ a.e., for some constant $K$. Then, for any $\sigma^2 > 0$, we can construct $Z \sim N(0, \sigma^2)$ on the same probability space such that for any $\theta \in \mathbb{R}$:

$$E\exp(\theta|W - Z|) \le 2E\exp\left(\frac{2\theta^2(T - \sigma^2)^2}{\sigma^2}\right).$$

Proof. (STEP 1) We first assume $W$ has a probability density function $\rho$ with respect to the Lebesgue measure, which is strictly positive and continuous on $\mathbb{R}$. Define

$$h(x) := \frac{\int_x^\infty y\rho(y)\,dy}{\rho(x)}, \quad\text{for all } x \in \mathbb{R}.$$

By assumption, we know that $E[W] = 0$ and $E[W^2] < \infty$. Thus, we can apply Lemma 1.1.6. Let $h_1(w) := E[T \mid W = w]$, for $w \in \mathbb{R}$. If we can show that $E[W\psi(W)] = E[h_1(W)\psi'(W)]$ for all Lipschitz functions $\psi$, then part (ii) of Lemma 1.1.6 implies that $h(w) = E[T \mid W = w]$ $P_W$-almost surely. By Remark A.1.11 we know that $h_1(W) = E[T \mid W]$, hence it suffices to show that $E[W\psi(W)] = E[E[T \mid W]\psi'(W)]$. Using properties (i) and (ii) of Proposition A.1.12, we have indeed that:

$$E[E[T \mid W]\psi'(W)] = E[E[T\psi'(W) \mid W]] = E[T\psi'(W)] = E[W\psi(W)],$$

where the last equality follows from the given assumption that $T$ is a Stein coefficient for $W$. Hence, we have proved that:

$$h(w) = E[T \mid W = w] \text{ a.s.}$$

Analogously as in the proof of Lemma 1.1.6, one can use equation (1.8) to see that $h$ is nonnegative. Hence, it is possible to define a map $A$ from $\mathbb{R}^2$ into the set of $2 \times 2$ symmetric positive semidefinite matrices, by setting:

$$A(x_1, x_2) := \begin{pmatrix} h(x_1) & \sigma\sqrt{h(x_1)} \\ \sigma\sqrt{h(x_1)} & \sigma^2 \end{pmatrix}.$$

Notice that $A(x_1, x_2)$ does not depend on $x_2$ at all. Obviously $A(x_1, x_2)$ is symmetric. In order to show that this matrix is positive semidefinite, take an arbitrary $y = (y_1, y_2)^T \in \mathbb{R}^2$. We want to show that $y^T A(x_1, x_2)y \ge 0$.
This is easily seen from the fact that:

$$y^T A(x_1, x_2)y = h(x_1)y_1^2 + 2\sigma\sqrt{h(x_1)}\,y_1y_2 + \sigma^2y_2^2 = \big(\sqrt{h(x_1)}\,y_1 + \sigma y_2\big)^2 \ge 0.$$

With the notation of the proof of Lemma 1.1.6, we have that $u(x)/\rho(x) = h(x)$, where $u$ was a continuous function. Since $\rho$ is assumed to be continuous and strictly positive on $\mathbb{R}$, we can conclude that $h$ is a continuous function too. Therefore, $A$ is a 4-dimensional mapping having continuous components, which implies that $A$ itself is continuous. By assumption there exists a constant $K$ such that $|T| \le K$ almost surely. Hence:

$$|h_1 \circ W| = |E[T \mid W]| \le E[|T| \mid W] \le K \text{ a.s.},$$

where we have used Jensen's inequality A.1.13 for conditional expectations. Since $\mathbb{R}$ is the support of $\rho$, we have that $W(\Omega) = \mathbb{R}$, and therefore $|h_1(x)| \le K$ for almost all $x \in \mathbb{R}$. Hence:

$$|h(w)| = |E[T \mid W = w]| = |h_1(w)| \le K \text{ a.s.}$$

A straightforward computation shows that $\|A\|$ is also bounded by a constant $b$, whence all assumptions of Theorem 1.1.1 are satisfied. Then let $X = (X_1, X_2)$ be a random vector satisfying (1.1) and (1.2) of Theorem 1.1.1. Let $\psi : \mathbb{R} \to \mathbb{R}$ be an arbitrary absolutely continuous function with continuous derivative $\psi'$, such that $|\psi(x)|$, $|x\psi(x)|$ and $|h(x)\psi'(x)|$ are uniformly bounded by a constant $C$. Let $\Psi$ denote an antiderivative of $\psi$, that is, a function such that $\Psi' = \psi$. Since $\psi$ is absolutely continuous, it is continuous and hence such a function $\Psi$ indeed exists. Since an antiderivative is only determined up to a constant, we can assume that $\Psi(0) = 0$. We define $f : \mathbb{R}^2 \to \mathbb{R} : (x_1, x_2) \mapsto \Psi(x_1)$. Since $\psi'$ is continuous, we have that $f \in C^2(\mathbb{R}^2)$. If we can show that $E|f(X_1, X_2)|^2$, $E\|\nabla f(X_1, X_2)\|^2$ and $E|\mathrm{Tr}(A(X_1, X_2)\mathrm{Hess} f(X_1, X_2))|$ are finite, then we can apply (1.2) with this $f$.

(i) Using the Mean Value Theorem, there exists a $u$ between $0$ and $x_1$ such that:

$$f(x_1, x_2) = \Psi(x_1) = \underbrace{\Psi(0)}_{=0} + \Psi'(u)(x_1 - 0) = \psi(u)x_1.$$

Since $|\psi(x)|$ is uniformly bounded by $C$, we have that:

$$|f(x_1, x_2)| = |\psi(u)||x_1| \le C|x_1|.$$
Thus, we can conclude that $E|f(X_1, X_2)|^2 \le C^2E[X_1^2]$. To see that this is finite, we use (1.1) with $\theta = (t, 0)$ and $t \in \mathbb{R}$. This yields that $E[e^{tX_1}] \le \exp(bt^2)$, and consequently $X_1$ has a finite moment generating function and all its moments are finite.

(ii) Obviously $\nabla f(x_1, x_2) = (\psi(x_1), 0)$, and thus $\|\nabla f(x_1, x_2)\| = |\psi(x_1)| \le C$. Therefore $E\|\nabla f(X_1, X_2)\|^2 \le C^2 < \infty$.

(iii) It is easy to see that:

$$A(x_1, x_2)\mathrm{Hess} f(x_1, x_2) = \begin{pmatrix} h(x_1) & \sigma\sqrt{h(x_1)} \\ \sigma\sqrt{h(x_1)} & \sigma^2 \end{pmatrix}\begin{pmatrix} \psi'(x_1) & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} h(x_1)\psi'(x_1) & 0 \\ \sigma\sqrt{h(x_1)}\,\psi'(x_1) & 0 \end{pmatrix},$$

and therefore $|\mathrm{Tr}(A(x_1, x_2)\mathrm{Hess} f(x_1, x_2))| = |h(x_1)\psi'(x_1)| \le C$. This implies that $E|\mathrm{Tr}(A(X_1, X_2)\mathrm{Hess} f(X_1, X_2))| \le C < \infty$.

We have shown that equation (1.2) can be applied, using the function $f$ defined above. Therefore, we have that:

$$E\langle X, \nabla f(X)\rangle = E\,\mathrm{Tr}(A(X)\mathrm{Hess} f(X)).$$

Using the calculations in part (ii), the left-hand side of this equation coincides with $E[X_1\psi(X_1)]$. The calculations in part (iii) show that the right-hand side of the equation can be rewritten as $E[h(X_1)\psi'(X_1)]$. Consequently, $E[X_1\psi(X_1)] = E[h(X_1)\psi'(X_1)]$ for all absolutely continuous functions $\psi$ with continuous derivative $\psi'$, such that $|\psi(x)|$, $|x\psi(x)|$ and $|h(x)\psi'(x)|$ are uniformly bounded. By Lemma 1.1.6 (iii) we can conclude that $X_1$ has the same distribution as $W$.

Now we want to show that $X_2 \sim N(0, \sigma^2)$. Define $\tilde{h}$ as the function considered in Lemma 1.1.6, with $\tilde{\rho}$ the density function of a $N(0, \sigma^2)$-distribution. Simply using the definition of $\tilde{h}$, it is easily seen that:

$$\tilde{h}(x) = e^{\frac{x^2}{2\sigma^2}}\int_x^\infty ye^{-\frac{y^2}{2\sigma^2}}\,dy = -\sigma^2e^{\frac{x^2}{2\sigma^2}}\int_x^\infty \Big(e^{-\frac{y^2}{2\sigma^2}}\Big)'\,dy = \sigma^2.$$

Let $\psi$ be an arbitrary absolutely continuous function with continuous derivative $\psi'$, such that $|\psi(x)|$, $|x\psi(x)|$ and $\sigma^2|\psi'(x)|$ are uniformly bounded by a constant $C$. If we can show that:

$$E[X_2\psi(X_2)] = \sigma^2E[\psi'(X_2)],$$

then Lemma 1.1.6 (iii) implies that $X_2 \sim N(0, \sigma^2)$.
In order to prove the above equality, let $\Psi$ be an antiderivative of $\psi$ with $\Psi(0) = 0$ and let $f(x_1, x_2) = \Psi(x_2)$. Since $\psi'$ is continuous, it is clear that $f \in C^2(\mathbb{R}^2)$.

(i) By the Mean Value Theorem there exists a $u$ between $0$ and $x_2$ such that:

$$f(x_1, x_2) = \Psi(x_2) = \Psi(0) + \Psi'(u)(x_2 - 0) = \psi(u)x_2.$$

Hence $E|f(X_1, X_2)|^2 \le C^2E[X_2^2]$. Using equation (1.1) with $\theta = (0, t)$ and $t \in \mathbb{R}$, we obtain that $E[e^{tX_2}] \le \exp(bt^2) < \infty$. So the moment generating function of $X_2$ is finite, which shows that all moments of $X_2$ are finite, and we get that $E|f(X_1, X_2)|^2 < \infty$.

(ii) Clearly $\nabla f(x_1, x_2) = (0, \psi(x_2))$, which implies that $E\|\nabla f(X_1, X_2)\|^2 = E[\psi(X_2)^2] \le C^2 < \infty$.

(iii) A simple calculation shows that:

$$A(x_1, x_2)\mathrm{Hess} f(x_1, x_2) = \begin{pmatrix} 0 & \sigma\sqrt{h(x_1)}\,\psi'(x_2) \\ 0 & \sigma^2\psi'(x_2) \end{pmatrix},$$

whence $E|\mathrm{Tr}(A(X_1, X_2)\mathrm{Hess} f(X_1, X_2))| = \sigma^2E|\psi'(X_2)| \le C < \infty$.

We have shown that $E|f(X)|^2$, $E\|\nabla f(X)\|^2$ and $E|\mathrm{Tr}(A(X)\mathrm{Hess} f(X))|$ are finite. Consequently, we can apply (1.2) for this $f$, and we get:

$$E\langle X, \nabla f(X)\rangle = E\,\mathrm{Tr}(A(X)\mathrm{Hess} f(X)).$$

Combining this with the calculations in part (ii) and (iii) above, we can conclude that $E[X_2\psi(X_2)] = \sigma^2E[\psi'(X_2)]$. As we mentioned earlier, this implies that $X_2 \sim N(0, \sigma^2)$.

The assumptions of Lemma 1.1.3 are the same as in Theorem 1.1.1. Since we already verified these assumptions, we can apply Lemma 1.1.3 to $X$ and $A$ with $i = 1$ and $j = 2$. Therefore we have for all $\theta \in \mathbb{R}$ that:

$$E\exp(\theta|X_1 - X_2|) \le 2E\exp(2\theta^2v_{12}(X)).$$

First note that:

$$v_{12}(x_1, x_2) = h(x_1) + \sigma^2 - 2\sigma\sqrt{h(x_1)} = \big(\sqrt{h(x_1)} - \sigma\big)^2.$$

This can be rewritten as:

$$\big(\sqrt{h(x_1)} - \sigma\big)^2 = \frac{\big(\sqrt{h(x_1)} - \sigma\big)^2\big(\sqrt{h(x_1)} + \sigma\big)^2}{\big(\sqrt{h(x_1)} + \sigma\big)^2} = \frac{(h(x_1) - \sigma^2)^2}{\big(\sqrt{h(x_1)} + \sigma\big)^2} \le \frac{(h(x_1) - \sigma^2)^2}{\sigma^2}.$$

This implies that for all $\theta \in \mathbb{R}$ the following holds:

$$E\exp(\theta|X_1 - X_2|) \le 2E\exp\left(2\theta^2\frac{(h(X_1) - \sigma^2)^2}{\sigma^2}\right).$$

Since $X_1 \overset{d}{=} W$, we see that $h(X_1)$ has the same distribution as $h(W) = E[T \mid W]$.
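Hence, the above inequality can be rewritten as:

$$\begin{aligned}
E\exp(\theta|X_1 - X_2|) &\le 2E\exp\left(2\theta^2\frac{(E[T \mid W] - \sigma^2)^2}{\sigma^2}\right) \\
&= 2E\exp\left(\left(E\left[\frac{\sqrt{2}\,\theta(T - \sigma^2)}{\sigma}\,\Big|\, W\right]\right)^2\right) \\
&\le 2E\left[E\left[\exp\left(2\theta^2\frac{(T - \sigma^2)^2}{\sigma^2}\right)\Big|\, W\right]\right] \\
&= 2E\exp\left(2\theta^2\frac{(T - \sigma^2)^2}{\sigma^2}\right),
\end{aligned}$$

where we used Proposition A.1.12 (ii) in the last equation. For the second inequality we used the convexity of the map $x \mapsto \exp(x^2)$ and Jensen's inequality A.1.13 for conditional expectations. This finishes the proof for the case where $W$ has a probability density $\rho$ w.r.t. the Lebesgue measure which is strictly positive and continuous on $\mathbb{R}$.

(STEP 2) Now we will prove the general case. Let $Y$ be a standard normal random variable, independent of $W$ and $T$ and defined on the same probability space $(\Omega, \mathcal{F}, P)$. Let $W_\varepsilon := W + \varepsilon Y$, for all $\varepsilon > 0$. We write $\nu$ for the distribution of $W$ and we want to find an expression for the probability density function of $W_\varepsilon$. By independence of $W$ and $\varepsilon Y$ and by some basic knowledge about the convolution of distributions, we have for all Borel sets $B \subseteq \mathbb{R}$:

$$\begin{aligned}
P_{W_\varepsilon}(B) = (P_W * P_{\varepsilon Y})(B) &= \int_{-\infty}^\infty P_{\varepsilon Y}(B - y)\,P_W(dy) \\
&= \int_{-\infty}^\infty \int_{B - y} \frac{e^{-\frac{x^2}{2\varepsilon^2}}}{\sqrt{2\pi}\,\varepsilon}\,dx\,P_W(dy) \\
&= \int_{-\infty}^\infty \int_B \frac{e^{-\frac{(x - y)^2}{2\varepsilon^2}}}{\sqrt{2\pi}\,\varepsilon}\,dx\,P_W(dy) \\
&= \int_B \int_{-\infty}^\infty \frac{e^{-\frac{(x - y)^2}{2\varepsilon^2}}}{\sqrt{2\pi}\,\varepsilon}\,P_W(dy)\,dx,
\end{aligned}$$

where we used Fubini's Theorem in the last step. This was allowed, since the integrand was positive. By the above calculations, we can conclude that the probability density function of $W_\varepsilon$ is given by:

$$\rho_\varepsilon(x) = \int_{-\infty}^\infty \frac{e^{-\frac{(x - y)^2}{2\varepsilon^2}}}{\sqrt{2\pi}\,\varepsilon}\,d\nu(y).$$

We want to show that $\rho_\varepsilon$ is strictly positive and continuous on $\mathbb{R}$. By the first part of this proof, we then can apply the theorem on $W_\varepsilon$. First of all, it is clear that $\rho_\varepsilon(x) = \frac{1}{\sqrt{2\pi}\,\varepsilon}E\exp\left(-\frac{(x - W)^2}{2\varepsilon^2}\right) \ge 0$. Assume $\rho_\varepsilon(x) = 0$, then we can apply Proposition A.1.5, since the exponential function is (strictly) positive. This would mean that $\exp\left(-\frac{(x - W)^2}{2\varepsilon^2}\right) = 0$ a.s., which is a contradiction. Therefore $\rho_\varepsilon$ is strictly positive on $\mathbb{R}$. In order to prove the continuity of $\rho_\varepsilon$, assume $x_n \to x$ as $n \to \infty$. We will show that $\rho_\varepsilon(x_n) \to \rho_\varepsilon(x)$ as $n \to \infty$.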
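For each $\omega$, the function $x \mapsto \frac{1}{\sqrt{2\pi}\,\varepsilon}\exp\left(-\frac{(x - W(\omega))^2}{2\varepsilon^2}\right)$ is continuous and thus $\frac{1}{\sqrt{2\pi}\,\varepsilon}\exp\left(-\frac{(x_n - W(\omega))^2}{2\varepsilon^2}\right) \to \frac{1}{\sqrt{2\pi}\,\varepsilon}\exp\left(-\frac{(x - W(\omega))^2}{2\varepsilon^2}\right)$. This implies that:

$$\frac{1}{\sqrt{2\pi}\,\varepsilon}\exp\left(-\frac{(x_n - W)^2}{2\varepsilon^2}\right) \to \frac{1}{\sqrt{2\pi}\,\varepsilon}\exp\left(-\frac{(x - W)^2}{2\varepsilon^2}\right) \text{ a.s.}$$

Note that we even have pointwise convergence. Since $\frac{1}{\sqrt{2\pi}\,\varepsilon}\exp\left(-\frac{(x_n - W)^2}{2\varepsilon^2}\right) \le \frac{1}{\sqrt{2\pi}\,\varepsilon}$, we can use the Bounded Convergence Theorem in order to conclude that:

$$\rho_\varepsilon(x_n) = \frac{1}{\sqrt{2\pi}\,\varepsilon}E\exp\left(-\frac{(x_n - W)^2}{2\varepsilon^2}\right) \to \frac{1}{\sqrt{2\pi}\,\varepsilon}E\exp\left(-\frac{(x - W)^2}{2\varepsilon^2}\right) = \rho_\varepsilon(x).$$

This finishes the argument of $\rho_\varepsilon$ being continuous on $\mathbb{R}$. Thus, by (STEP 1) we are allowed to apply the theorem on $W_\varepsilon$. So we need a Stein coefficient for $W_\varepsilon$ which is almost surely bounded by a constant. Let $\psi$ be a Lipschitz function, then by definition of $W_\varepsilon$ one gets:

$$E[W_\varepsilon\psi(W_\varepsilon)] = E[W\psi(W + \varepsilon Y)] + E[\varepsilon Y\psi(W + \varepsilon Y)].$$

We will work out the two expectations separately. Let $\psi_t(x) := \psi(x + t)$. Using the independence of $Y$ and $W$, Corollary A.1.4 implies that:

$$E[W\psi(W + \varepsilon Y)] = \int_{-\infty}^\infty E[W\psi(W + y)]\,P_{\varepsilon Y}(dy) = \int_{-\infty}^\infty E[W\psi_y(W)]\,P_{\varepsilon Y}(dy).$$

Using that $T$ is a Stein coefficient for $W$, the above expression can be rewritten as:

$$\int_{-\infty}^\infty E[T\psi_y'(W)]\,P_{\varepsilon Y}(dy) = \int_{-\infty}^\infty E[T\psi'(W + y)]\,P_{\varepsilon Y}(dy) = E[T\psi'(W + \varepsilon Y)],$$

where in the last step we used Corollary A.1.4 and the independence of $Y$ and $(W, T)$. For the second expectation, we use again Corollary A.1.4, which gives us that:

$$E[\varepsilon Y\psi(W + \varepsilon Y)] = \int_{-\infty}^\infty E[\varepsilon Y\psi(x + \varepsilon Y)]\,P_W(dx) = \int_{-\infty}^\infty E[\varepsilon Y\psi_x(\varepsilon Y)]\,P_W(dx).$$

Since $\varepsilon Y \sim N(0, \varepsilon^2)$, we can use Proposition A.1.3, and the former expression can be rewritten as:

$$\int_{-\infty}^\infty \varepsilon^2E[\psi_x'(\varepsilon Y)]\,P_W(dx) = \varepsilon^2\int_{-\infty}^\infty E[\psi'(x + \varepsilon Y)]\,P_W(dx) = \varepsilon^2E[\psi'(W + \varepsilon Y)].$$

Note that in the last step, we used again Corollary A.1.4. Putting all these calculations together, we get that:

$$\begin{aligned}
E[W_\varepsilon\psi(W_\varepsilon)] &= E[W\psi(W + \varepsilon Y)] + E[\varepsilon Y\psi(W + \varepsilon Y)] \\
&= E[T\psi'(W + \varepsilon Y)] + \varepsilon^2E[\psi'(W + \varepsilon Y)] \\
&= E[(T + \varepsilon^2)\psi'(W_\varepsilon)].
\end{aligned}$$

This means $T + \varepsilon^2$ is a Stein coefficient for $W_\varepsilon$. Moreover, $|T + \varepsilon^2|$ is almost surely bounded by the constant $K + \varepsilon^2$. Besides, $E[W] = 0$ by assumption. Therefore, $E[W_\varepsilon] = E[W] + \varepsilon E[Y] = 0$.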
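Finally, by independence of $Y$ and $W$, it is clear that:
\[ \mathbb{E}[W_\varepsilon^2] = \mathbb{E}[(W + \varepsilon Y)^2] = \mathbb{E}[W^2] + 2\varepsilon\, \mathbb{E}[W]\, \mathbb{E}[Y] + \varepsilon^2\, \mathbb{E}[Y^2] = \mathbb{E}[W^2] + \varepsilon^2. \]
Since we assumed that $\mathbb{E}[W^2]$ is finite, we also have that $\mathbb{E}[W_\varepsilon^2] < \infty$. Thus, by the first part of the proof, we can construct a version of $W_\varepsilon$ and a random variable $Z_\varepsilon \sim N(0, \sigma^2 + \varepsilon^2)$ on the same probability space such that for all $\theta \in \mathbb{R}$:
\[ \mathbb{E} \exp(\theta |W_\varepsilon - Z_\varepsilon|) \le 2\, \mathbb{E} \exp\left( \frac{2\theta^2 (T + \varepsilon^2 - (\sigma^2 + \varepsilon^2))^2}{\sigma^2 + \varepsilon^2} \right) = 2\, \mathbb{E} \exp\left( \frac{2\theta^2 (T - \sigma^2)^2}{\sigma^2 + \varepsilon^2} \right). \]
Let $\mu_\varepsilon$ be the distribution of the random vector $(W_\varepsilon, Z_\varepsilon)$. We want to show that the family $(\mu_\varepsilon)_{0<\varepsilon<1}$ is sequentially compact. By Prohorov's Theorem A.1.19 this is equivalent to showing that this family is tight. By Theorem A.1.20 it is enough to show that there exists an $\alpha > 0$ such that $\sup_{0<\varepsilon<1} \mathbb{E}[\|(W_\varepsilon, Z_\varepsilon)\|^\alpha] < \infty$. Let $\alpha = 2$; then $\mathbb{E}[\|(W_\varepsilon, Z_\varepsilon)\|^2] = \mathbb{E}[W_\varepsilon^2] + \mathbb{E}[Z_\varepsilon^2]$ for all $0 < \varepsilon < 1$. Recall that $\mathbb{E}[W_\varepsilon^2] = \mathbb{E}[W^2] + \varepsilon^2$. Since $\mathbb{E}[Z_\varepsilon^2] = \sigma^2 + \varepsilon^2$, we can conclude that $\mathbb{E}[\|(W_\varepsilon, Z_\varepsilon)\|^2] = \mathbb{E}[W^2] + 2\varepsilon^2 + \sigma^2$. By assumption $\mathbb{E}[W^2] < \infty$, which implies that:
\[ \sup_{0<\varepsilon<1} \mathbb{E}[\|(W_\varepsilon, Z_\varepsilon)\|^2] = \mathbb{E}[W^2] + 2 + \sigma^2 < \infty. \]
Consequently, the family $(\mu_\varepsilon)_{0<\varepsilon<1}$ is sequentially compact. Thus, there exist a probability measure $\mu_0$ on $(\mathbb{R}^2, \mathcal{R}^2)$ and a sequence $(\varepsilon_n)_n$ in $(0,1)$ with $\varepsilon_n \searrow 0$, such that $\mu_{\varepsilon_n} \xrightarrow{w} \mu_0$. Take a 2-dimensional random vector $(W_0, Z_0)$ with distribution $\mu_0$. Then:
\[ (W_{\varepsilon_n}, Z_{\varepsilon_n}) \xrightarrow{d} (W_0, Z_0). \]
Therefore, using Proposition A.1.24, we have for all $\theta \in \mathbb{R}$ that:
\[ \mathbb{E} \exp(\theta |W_0 - Z_0|) \le \liminf_{n\to\infty} \mathbb{E} \exp(\theta |W_{\varepsilon_n} - Z_{\varepsilon_n}|) \le 2 \liminf_{n\to\infty} \mathbb{E} \exp\left( \frac{2\theta^2 (T - \sigma^2)^2}{\sigma^2 + \varepsilon_n^2} \right) = 2\, \mathbb{E} \exp\left( \frac{2\theta^2 (T - \sigma^2)^2}{\sigma^2} \right). \]
Note that in the last step we used the monotone convergence theorem. There is only one thing left to prove, namely that $W_0 \overset{d}{=} W$ and $Z_0 \sim N(0, \sigma^2)$. We know that $\mu_{\varepsilon_n} \xrightarrow{w} \mu_0$, or equivalently that $P_{(W_{\varepsilon_n}, Z_{\varepsilon_n})} \xrightarrow{w} P_{(W_0, Z_0)}$. Since projections are continuous, the continuous mapping theorem implies that:
\[ P_{W_{\varepsilon_n}} \xrightarrow{w} P_{W_0} \quad \text{and} \quad P_{Z_{\varepsilon_n}} \xrightarrow{w} P_{Z_0}. \tag{1.9} \]
On the other hand, since pointwise convergence implies convergence in distribution, we see that $W_{\varepsilon_n} = W + \varepsilon_n Y \xrightarrow{d} W$. Our last aim is to prove that $Z_{\varepsilon_n} \xrightarrow{d} Z \sim N(0, \sigma^2)$. Using Scheffé's Lemma, it suffices to show pointwise convergence of the density functions. (Note that we even obtain strong convergence in this case.) This is trivial, since $\varepsilon_n \to 0$ implies that for every $t \in \mathbb{R}$:
\[ f_{Z_{\varepsilon_n}}(t) = \frac{1}{\sqrt{2\pi(\sigma^2 + \varepsilon_n^2)}} \exp\left( -\frac{1}{2} \frac{t^2}{\sigma^2 + \varepsilon_n^2} \right) \to \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{1}{2} \frac{t^2}{\sigma^2} \right) = f_Z(t). \]
Thus, we have proved that $P_{W_{\varepsilon_n}} \xrightarrow{w} P_W$ and $P_{Z_{\varepsilon_n}} \xrightarrow{w} P_Z$. Combining this with (1.9) and the fact that limits under weak convergence are unique, we can conclude that $W_0 \overset{d}{=} W$ and $Z_0 \overset{d}{=} Z \sim N(0, \sigma^2)$. This finishes the proof.

1.2 Examples of Stein coefficients

In this section we will give a few examples of Stein coefficients. These examples are meant to give some insight into the concept of Stein coefficients, but they are not needed in the rest of the thesis.

Example 1
Let $\varepsilon_1, \varepsilon_2, \dots, \varepsilon_n$ be i.i.d. symmetric $\pm 1$-valued random variables. Let $S_n := \sum_{i=1}^n \varepsilon_i$ and let $Y \sim U[-1,1]$ be independent of the $\varepsilon_i$. Let $W_n := S_n + Y$ and let $T_n := n - S_n Y + \frac{1 - Y^2}{2}$. We claim that $T_n$ is a Stein coefficient for $W_n$. For a verification of this statement, we refer to the proof of Theorem 2.1.1 in Chapter 2.

Example 2
Let $X$ be a random variable with $\mathbb{E}[X] = 0$ and $\mathbb{E}[X^2] < \infty$. Assume $X$ has a density function $\rho$ with support $I \subseteq \mathbb{R}$, where $I$ is a bounded or unbounded interval. Let
\[ h(x) := \begin{cases} \dfrac{\int_x^\infty y \rho(y)\, dy}{\rho(x)} & \text{if } x \in I, \\ 0 & \text{if } x \notin I. \end{cases} \tag{1.10} \]
Under certain conditions it can be shown that $\mathbb{E}[X \psi(X)] = \mathbb{E}[\psi'(X) h(X)]$ for every Lipschitz function $\psi$. Hence $h(X)$ is a Stein coefficient for $X$. A verification of this statement is given in the proof of Lemma 1.1.6.

Example 3
Let $X$ be a random variable and assume we have the same conditions as in the previous example. Let $X_1, \dots, X_n$ be i.i.d. copies of $X$, let $W = \frac{1}{\sqrt{n}} \sum_{i=1}^n X_i$ and let $\psi$ be a Lipschitz function. Using Corollary A.1.4 and the independence of $X_i$ and $\sum_{j=1, j\ne i}^n X_j$, we get for each $1 \le i \le n$:
\[ \mathbb{E}[X_i \psi(W)] = \mathbb{E}\left[ X_i\, \psi\left( \frac{1}{\sqrt{n}} \sum_{j=1}^n X_j \right) \right] = \int_{-\infty}^{\infty} \mathbb{E}\left[ X_i\, \psi\left( \tfrac{1}{\sqrt{n}} (X_i + x) \right) \right] P_{\sum_{j=1, j\ne i}^n X_j}(dx). \]
For $x \in \mathbb{R}$ fixed, let $\psi^x(y) := \psi\left( \tfrac{1}{\sqrt{n}} (y + x) \right)$. Example 2 implies that:
\[ \mathbb{E}\left[ X_i\, \psi\left( \tfrac{1}{\sqrt{n}} (X_i + x) \right) \right] = \mathbb{E}[X_i \psi^x(X_i)] = \mathbb{E}[h(X_i) (\psi^x)'(X_i)] = \tfrac{1}{\sqrt{n}}\, \mathbb{E}\left[ h(X_i)\, \psi'\left( \tfrac{1}{\sqrt{n}} (X_i + x) \right) \right]. \]
Combining these two results, we get for each $1 \le i \le n$ that:
\begin{align*}
\mathbb{E}[X_i \psi(W)] &= \frac{1}{\sqrt{n}} \int_{-\infty}^{\infty} \mathbb{E}\left[ h(X_i)\, \psi'\left( \tfrac{1}{\sqrt{n}} (X_i + x) \right) \right] P_{\sum_{j=1, j\ne i}^n X_j}(dx) \\
&= \frac{1}{\sqrt{n}}\, \mathbb{E}\left[ h(X_i)\, \psi'\left( \frac{1}{\sqrt{n}} \left( X_i + \sum_{j=1, j\ne i}^n X_j \right) \right) \right] \\
&= \frac{1}{\sqrt{n}}\, \mathbb{E}[h(X_i) \psi'(W)].
\end{align*}
Notice that for the second equality we used again that $X_i$ and $\sum_{j=1, j\ne i}^n X_j$ are independent. Finally, we conclude that for any Lipschitz function $\psi$:
\[ \mathbb{E}[W \psi(W)] = \frac{1}{\sqrt{n}} \sum_{i=1}^n \mathbb{E}[X_i \psi(W)] = \mathbb{E}\left[ \psi'(W)\, \frac{1}{n} \sum_{i=1}^n h(X_i) \right], \]
which means that $\frac{1}{n} \sum_{i=1}^n h(X_i)$ is a Stein coefficient for $W$.

Example 4
In this example we will weaken the i.i.d. assumption. More precisely, we will show how Theorem 1.0.2 can be used to produce strong couplings for sums of dependent random variables.

Theorem 1.2.1. Assume $X_1, \dots, X_n, X_{n+1}$ are i.i.d. random variables with $\mathbb{E}[X_1] = 0$, $\mathbb{E}[X_1^2] = 1$ and density function $\rho$. Suppose there exist constants $x_1 < x_2$ and $0 < a \le b$ such that:
\[ a \le \rho(x) \le b \ \text{ if } x \in [x_1, x_2], \qquad \rho(x) = 0 \ \text{ if } x \notin [x_1, x_2]. \]
Let $S_n := \sum_{i=1}^n X_i X_{i+1}$. Then it is possible to construct a version of $S_n$ and a Gaussian random variable $Z_n \sim N(0, n)$ on the same probability space such that for all $x \ge 0$:
\[ \mathbb{P}\{|S_n - Z_n| \ge x\} \le 4 e^{-C(\rho) x}, \]
where $C(\rho)$ is a positive constant depending only on the density $\rho$ (and not on $n$).

For the proof of this theorem we will need a result which is called the Azuma-Hoeffding inequality for sums of bounded martingale differences.

Definition 1.2.2. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and $(\mathcal{F}_n)_{n=1,2,\dots}$ a non-decreasing family of sub-$\sigma$-fields of $\mathcal{F}$. Let $(Y_n)_{n=1,2,\dots}$ be a sequence of random variables such that each $Y_n$ is $\mathcal{F}_n$-measurable.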
We call this a sequence of bounded martingale differences if the following two conditions are satisfied:
1. $\forall n \ge 1$, $\exists K_n$: $|Y_n| \le K_n$ a.s.
2. $\forall n \ge 1$: $\mathbb{E}[Y_n \mid \mathcal{F}_{n-1}] = 0$.

Lemma 1.2.3. (Azuma-Hoeffding inequality) Suppose $(Y_i)_{i=1,2,\dots}$ is a sequence of bounded martingale differences with $|Y_i| \le 1$ a.s. for all $i \ge 1$. Then for all $t \in \mathbb{R}$ and for all $n \ge 1$:
\[ \mathbb{E} \exp\left( t \sum_{k=1}^n b_{nk} Y_k \right) \le \exp\left( \frac{t^2}{2} \sum_{k=1}^n b_{nk}^2 \right), \]
where the $b_{nk}$, $k = 1, \dots, n$; $n = 1, 2, \dots$ are arbitrary real numbers.

Proof. The proof of this lemma can be found in [1].

Now we will proceed with a proof of Theorem 1.2.1.

Proof. Let $X_0 \equiv 0$ and let $h$ be defined as in (1.10). Then for any Lipschitz function $\psi$ the following holds:
\[ \mathbb{E}[S_n \psi(S_n)] = \sum_{i=1}^n \mathbb{E}[X_i X_{i+1} \psi(S_n)] = \sum_{i=1}^n \mathbb{E}[h(X_i) X_{i+1} (X_{i-1} + X_{i+1}) \psi'(S_n)]. \tag{1.11} \]
The second equality can be obtained by using Corollary A.1.4 and the independence of $X_i$ and $Y := (X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_{n+1})$, so that for all $2 \le i \le n$:
\[ \mathbb{E}[X_i X_{i+1} \psi(S_n)] = \int_{\mathbb{R}^n} \mathbb{E}\left[ X_i\, \psi^{x_1, \dots, \hat{x}_i, \dots, x_{n+1}}(X_i) \right] P_Y(d(x_1, \dots, \hat{x}_i, \dots, x_{n+1})), \]
where
\[ \psi^{x_1, \dots, \hat{x}_i, \dots, x_{n+1}}(z) := x_{i+1}\, \psi\left( \sum_{j=1}^{i-2} x_j x_{j+1} + x_{i-1} z + z x_{i+1} + \sum_{j=i+1}^{n} x_j x_{j+1} \right) \quad \text{for } z \in \mathbb{R}. \]
We see by Example 2 and by the above expression that:
\begin{align*}
\mathbb{E}[X_i X_{i+1} \psi(S_n)] &= \int_{\mathbb{R}^n} \mathbb{E}\left[ h(X_i)\, \left(\psi^{x_1, \dots, \hat{x}_i, \dots, x_{n+1}}\right)'(X_i) \right] P_Y(d(x_1, \dots, \hat{x}_i, \dots, x_{n+1})) \\
&= \mathbb{E}[h(X_i) X_{i+1} (X_{i-1} + X_{i+1}) \psi'(S_n)],
\end{align*}
where in the last equality we again made use of Corollary A.1.4 and the independence of $X_i$ and $Y$. For $i = 1$ this equality can be proved by the exact same reasoning. This finishes the argument, and we have proved (1.11).

Hence, if we let $D_i := h(X_i) X_{i+1} (X_{i-1} + X_{i+1})$, then we have shown that $T_n := \sum_{i=1}^n D_i$ is a Stein coefficient for $S_n$. Using that $X_{i-1}$ is $\sigma\{X_1, \dots, X_{i-1}\}$-measurable and using the independence of $X_1, \dots, X_{i-1}$, $h(X_i)$ and $X_{i+1}$, we obtain by Proposition A.1.12 that for any $1 \le i \le n$:
\begin{align*}
\mathbb{E}[h(X_i) X_{i+1} X_{i-1} \mid X_1, \dots, X_{i-1}] &= X_{i-1}\, \mathbb{E}[h(X_i) X_{i+1} \mid X_1, \dots, X_{i-1}] \\
&= X_{i-1}\, \mathbb{E}[h(X_i) X_{i+1}] = X_{i-1}\, \mathbb{E}[h(X_i)]\, \mathbb{E}[X_{i+1}] = 0.
\end{align*}
Again using independence, this implies that for any $1 \le i \le n$:
\begin{align*}
\mathbb{E}[D_i - 1 \mid X_1, \dots, X_{i-1}] &= \mathbb{E}[h(X_i) X_{i+1} X_{i-1} \mid X_1, \dots, X_{i-1}] + \mathbb{E}[h(X_i) X_{i+1}^2] - 1 \\
&= \mathbb{E}[h(X_i)]\, \mathbb{E}[X_{i+1}^2] - 1 \\
&= \mathbb{E}[X_i^2]\, \mathbb{E}[X_{i+1}^2] - 1 = 1 - 1 = 0.
\end{align*}
Here we have used that $\mathbb{E}[h(X_i)] = \mathbb{E}[X_i^2]$, which follows from Proposition A.3.1 in the appendix. Let $D_j \equiv 0$ for $j \ge n + 1$, let $\mathcal{F}_i := \sigma\{X_1, \dots, X_i\}$ for $1 \le i \le n + 1$, and let $\mathcal{F}_i := \mathcal{F}_{n+1}$ for $i \ge n + 2$. We want to show that $(D_i - 1)_{i=1,2,\dots}$ is a sequence of bounded martingale differences. The second condition of Definition 1.2.2 has already been proved. For the first condition it suffices to show that $|D_i|$ is almost surely bounded by a constant. Moreover, we will show that this constant only depends on $\rho$. Since $X_i$ has density $\rho$, we clearly have that $X_i \in [x_1, x_2]$ a.s., which implies that $|X_i| \le |x_1| \vee |x_2|$ almost surely. Secondly, by definition of $h$, we have for all $x \in [x_1, x_2]$ that:
\[ |h(x)| \le \frac{\int_x^\infty |y| \rho(y)\, dy}{\rho(x)} = \frac{\int_x^{x_2} |y| \rho(y)\, dy}{\rho(x)} \le \frac{b}{a} \int_{x_1}^{x_2} |y|\, dy \le \frac{b}{a}\, c_{x_1, x_2}, \]
where $c_{x_1, x_2}$ is a constant depending only on $x_1$ and $x_2$. Thus, by definition of $D_i$ we have:
\[ |D_i| \le |h(X_i)|\, |X_{i+1}|\, (|X_{i-1}| + |X_{i+1}|) \le \frac{b}{a}\, c_{x_1, x_2}\, 2 (|x_1| \vee |x_2|)^2 \quad \text{a.s.}, \]
which is a constant depending on $\rho$ only. Hence, there exists a constant $K(\rho) > 0$ such that $|D_i - 1| \le K(\rho)$ almost surely, whence $(D_i - 1)_{i=1,2,\dots}$ is a sequence of bounded martingale differences. Since $\left| \frac{D_i - 1}{K(\rho)} \right| \le 1$ a.s. and since $\left( \frac{D_i - 1}{K(\rho)} \right)_{i=1,2,\dots}$ still is a sequence of bounded martingale differences, we can apply Lemma 1.2.3 to this sequence. Let $b_{nk} = 1$ for all $1 \le k \le n$; then we obtain for every $t \in \mathbb{R}$:
\[ \mathbb{E} \exp\left( \frac{t}{K(\rho)} (T_n - n) \right) = \mathbb{E} \exp\left( t \sum_{k=1}^n \frac{D_k - 1}{K(\rho)} \right) \le \exp\left( \frac{t^2}{2} \sum_{k=1}^n 1 \right) = \exp\left( \frac{t^2 n}{2} \right). \]
Let $\alpha \in \mathbb{R}$ be arbitrary and let $t = \alpha K(\rho)$; then we obtain:
\[ \mathbb{E} \exp(\alpha (T_n - n)) \le \exp\left( \frac{\alpha^2 K(\rho)^2 n}{2} \right) = \exp(C_1(\rho)\, \alpha^2 n), \]
where $C_1(\rho) = \frac{K(\rho)^2}{2}$ is a constant depending only on $\rho$. Thus if $Z$ is a standard Gaussian random variable, independent of all other random variables, then for any $\alpha \in \mathbb{R}$ we get:
\begin{align*}
\mathbb{E} \exp\left( \alpha Z (T_n - n)/\sqrt{n} \right) &= \int_{-\infty}^{+\infty} \mathbb{E}\left[ \exp\left( \alpha t (T_n - n)/\sqrt{n} \right) \right] P_Z(dt) \\
&\le \int_{-\infty}^{+\infty} \exp(C_1(\rho)\, \alpha^2 t^2)\, P_Z(dt) \\
&= \mathbb{E} \exp(C_1(\rho)\, \alpha^2 Z^2).
\end{align*}
In the first step we used Corollary A.1.4 and the fact that $T_n$ and $Z$ are independent. Since $Z$ is a standard Gaussian random variable, we know that $Z^2$ follows a gamma$(1/2, 2)$ distribution. The moment generating function of $Z^2$ is thus given by:
\[ \mathbb{E}[e^{t Z^2}] = \frac{1}{\sqrt{1 - 2t}}, \qquad t < \frac{1}{2}. \]
The expression for the mgf also follows from Proposition A.3.7 in the appendix. If we choose $C_2(\rho) = \alpha > 0$ small enough, such that $C_1(\rho)\, \alpha^2 \le 3/8$, one gets:
\[ \mathbb{E} \exp\left( C_2(\rho) Z (T_n - n)/\sqrt{n} \right) \le \frac{1}{\sqrt{1 - 2 C_1(\rho)\, \alpha^2}} \le \frac{1}{\sqrt{1 - 2 \cdot \frac{3}{8}}} = 2. \]
On the other hand, using Corollary A.1.4, we obtain:
\begin{align*}
\mathbb{E} \exp\left( C_2(\rho) Z (T_n - n)/\sqrt{n} \right) &= \int_{-\infty}^{+\infty} \mathbb{E}\left[ \exp\left( C_2(\rho) Z (t - n)/\sqrt{n} \right) \right] P_{T_n}(dt) \\
&= \int_{-\infty}^{+\infty} \exp\left( C_2(\rho)^2 (t - n)^2 / 2n \right) P_{T_n}(dt) \\
&= \mathbb{E} \exp\left( C_2(\rho)^2 (T_n - n)^2 / 2n \right),
\end{align*}
where we used the well-known expression for the moment generating function of a standard Gaussian random variable. We have already shown that $T_n$ is a Stein coefficient for $S_n$. The other assumptions of Theorem 1.0.2 are easy to verify, and hence this theorem can be applied. Therefore, there exist a version of $S_n$ and a random variable $Z_n \sim N(0, n)$ on the same probability space, such that for all $\theta \in \mathbb{R}$:
\[ \mathbb{E} \exp(\theta |S_n - Z_n|) \le 2\, \mathbb{E} \exp\left( \frac{2\theta^2 (T_n - n)^2}{n} \right). \]
Let $\theta = \frac{C_2(\rho)}{2}$; then we obtain that $\mathbb{E} \exp\left( \frac{C_2(\rho)}{2} |S_n - Z_n| \right) \le 4$. Using Markov's inequality, we get that for any $x \ge 0$:
\begin{align*}
\mathbb{P}\{|S_n - Z_n| \ge x\} &\le \mathbb{P}\left\{ \exp\left( \frac{C_2(\rho)}{2} |S_n - Z_n| \right) \ge \exp\left( \frac{C_2(\rho)}{2} x \right) \right\} \\
&\le \mathbb{E} \exp\left( \frac{C_2(\rho)}{2} |S_n - Z_n| \right) e^{-\frac{C_2(\rho)}{2} x} \\
&\le 4 e^{-\frac{C_2(\rho)}{2} x},
\end{align*}
which completes the proof.

Chapter 2
A normal approximation theorem

In this chapter we will prove two theorems. The first theorem is a normal approximation theorem. More specifically, this theorem produces a coupling of $S_n$ with a random variable $Z_n \sim N(0, n)$.
The second theorem can be seen as a conditional version of the first theorem.

2.1 Verification of the normal approximation theorem

The goal of this section is to give a proof of the normal approximation theorem stated below.

Theorem 2.1.1. There exist universal constants $\kappa$ and $\theta_0 > 0$ such that the following is true. Let $n$ be a positive integer and let $\varepsilon_1, \varepsilon_2, \dots, \varepsilon_n$ be i.i.d. symmetric $\pm 1$-valued random variables. Let $S_n := \sum_{i=1}^n \varepsilon_i$. It is possible to construct a version of $S_n$ and $Z_n \sim N(0, n)$ on the same probability space such that
\[ \mathbb{E} \exp(\theta_0 |S_n - Z_n|) \le \kappa. \]

Once we have established this result, we can use Markov's inequality in order to conclude that for all $x \ge 0$ it holds that:
\[ \mathbb{P}\{|S_n - Z_n| \ge x\} \le \mathbb{P}\{\exp(\theta_0 |S_n - Z_n|) \ge e^{\theta_0 x}\} \le \kappa e^{-\theta_0 x}. \]
Actually Theorem 2.1.1 is not a new result. A more general version of this theorem can be shown using the classical techniques. More specifically, when $\varepsilon_1, \varepsilon_2, \dots, \varepsilon_n$ are independent mean zero random variables having a finite moment generating function in a neighbourhood of zero, this result is also valid. For more details we refer to Sakhanenko [23]. We will only give a proof of the specific case where $\varepsilon_1, \varepsilon_2, \dots, \varepsilon_n$ are i.i.d. symmetric $\pm 1$-valued random variables. For this we will need the following lemma.

Lemma 2.1.2. Assume $X$ and $Y$ are independent random variables, where $X$ is a symmetric $\pm 1$-valued random variable and $Y$ follows the uniform distribution on the interval $[-1, 1]$. Then we have for any Lipschitz function $\psi : \mathbb{R} \to \mathbb{R}$ that:
\[ \mathbb{E}[X \psi(X + Y)] = \mathbb{E}[(1 - XY) \psi'(X + Y)], \]
and
\[ \mathbb{E}[Y \psi(X + Y)] = \frac{1}{2}\, \mathbb{E}[(1 - Y^2) \psi'(X + Y)]. \]

Proof. For the first equation we start by using the independence of $X$ and $Y$, so that by Corollary A.1.4 the following holds:
\[ \mathbb{E}[(1 - XY) \psi'(X + Y)] = \int_{-\infty}^{\infty} \mathbb{E}[(1 - Xy) \psi'(X + y)]\, P_Y(dy) = \frac{1}{2} \int_{-1}^{1} \mathbb{E}[(1 - Xy) \psi'(X + y)]\, dy. \]
Note that in the last step we have used that the Lebesgue density function of $Y$ is given by:
\[ y \mapsto \begin{cases} 1/2 & \text{if } y \in [-1, 1], \\ 0 & \text{otherwise.} \end{cases} \]
Since $X$ is a symmetric $\pm 1$-valued random variable, we have for any $y \in \mathbb{R}$ that $\mathbb{E}[(1 - Xy) \psi'(X + y)] = \frac{1}{2} (1 - y) \psi'(1 + y) + \frac{1}{2} (1 + y) \psi'(-1 + y)$. Combining this with the former calculation, one gets:
\[ \mathbb{E}[(1 - XY) \psi'(X + Y)] = \frac{1}{4} \int_{-1}^{1} (1 - y) \psi'(1 + y)\, dy + \frac{1}{4} \int_{-1}^{1} (1 + y) \psi'(-1 + y)\, dy. \]
We will work out these two terms separately. Using integration by parts, it is easy to see that:
\[ \int_{-1}^{1} (1 - y) \psi'(1 + y)\, dy = \big[ (1 - y) \psi(1 + y) \big]_{-1}^{1} + \int_{-1}^{1} \psi(1 + y)\, dy = -2\psi(0) + \int_{-1}^{1} \psi(1 + y)\, dy. \]
Again using integration by parts, we see that the second term reduces to:
\[ \int_{-1}^{1} (1 + y) \psi'(-1 + y)\, dy = \big[ (1 + y) \psi(-1 + y) \big]_{-1}^{1} - \int_{-1}^{1} \psi(-1 + y)\, dy = 2\psi(0) - \int_{-1}^{1} \psi(-1 + y)\, dy. \]
Putting these results together, we get that:
\[ \mathbb{E}[(1 - XY) \psi'(X + Y)] = \frac{1}{4} \int_{-1}^{1} \psi(1 + y)\, dy - \frac{1}{4} \int_{-1}^{1} \psi(-1 + y)\, dy. \]
On the other hand, using Corollary A.1.4, it is clear that:
\begin{align*}
\mathbb{E}[X \psi(X + Y)] &= \frac{1}{2} \int_{-1}^{1} \mathbb{E}[X \psi(X + y)]\, dy \\
&= \frac{1}{2} \int_{-1}^{1} \left( \frac{1}{2} \psi(1 + y) - \frac{1}{2} \psi(-1 + y) \right) dy \\
&= \frac{1}{4} \int_{-1}^{1} \psi(1 + y)\, dy - \frac{1}{4} \int_{-1}^{1} \psi(-1 + y)\, dy,
\end{align*}
and the first equality of the lemma has been proved. For the second equation, we start again by using Corollary A.1.4 and we obtain:
\[ \mathbb{E}[Y \psi(X + Y)] = \frac{1}{2} \int_{-1}^{1} \mathbb{E}[y \psi(X + y)]\, dy. \]
By definition of $X$, we have for all $y \in \mathbb{R}$ that $\mathbb{E}[y \psi(X + y)] = \frac{1}{2} y \psi(1 + y) + \frac{1}{2} y \psi(-1 + y)$. Hence, we can conclude that:
\[ \mathbb{E}[Y \psi(X + Y)] = \frac{1}{4} \int_{-1}^{1} y \psi(1 + y)\, dy + \frac{1}{4} \int_{-1}^{1} y \psi(-1 + y)\, dy. \]
On the other hand, Corollary A.1.4 implies that:
\begin{align*}
\frac{1}{2}\, \mathbb{E}[(1 - Y^2) \psi'(X + Y)] &= \frac{1}{4} \int_{-1}^{1} \mathbb{E}[(1 - y^2) \psi'(X + y)]\, dy \\
&= \frac{1}{8} \int_{-1}^{1} (1 - y^2) \psi'(1 + y)\, dy + \frac{1}{8} \int_{-1}^{1} (1 - y^2) \psi'(-1 + y)\, dy,
\end{align*}
where we used for the last equality that $X$ is a symmetric $\pm 1$-valued random variable. Using integration by parts, we see that the following holds for every $x \in \mathbb{R}$:
\[ \frac{1}{2} \int_{-1}^{1} (1 - y^2) \psi'(x + y)\, dy = \frac{1}{2} \big[ (1 - y^2) \psi(x + y) \big]_{-1}^{1} - \frac{1}{2} \int_{-1}^{1} \psi(x + y) (-2y)\, dy = \int_{-1}^{1} y \psi(x + y)\, dy. \]
The above equality holds in particular for $x = 1$ and $x = -1$. Therefore, we can conclude that:
\[ \frac{1}{2}\, \mathbb{E}[(1 - Y^2) \psi'(X + Y)] = \frac{1}{4} \int_{-1}^{1} y \psi(1 + y)\, dy + \frac{1}{4} \int_{-1}^{1} y \psi(-1 + y)\, dy = \mathbb{E}[Y \psi(X + Y)]. \]
This finishes the proof of the lemma.

We will now give a proof of Theorem 2.1.1.

Proof. For simplicity, let us write $S$ instead of $S_n$. Let $Y$ be a random variable, independent of $\varepsilon_1, \dots, \varepsilon_n$ and uniformly distributed on the interval $[-1, 1]$. To ease the notation, the conditional expectation given $\varepsilon_1, \dots, \varepsilon_{n-1}$ will be denoted by $\mathbb{E}^-$ and we will write:
\[ S^- = \sum_{i=1}^{n-1} \varepsilon_i, \qquad X = \varepsilon_n. \]
Let $\psi$ be an arbitrary Lipschitz function. We will show that:
\[ \mathbb{E}^-[X \psi(S^- + X + Y)] = \mathbb{E}^-[(1 - XY) \psi'(S + Y)]. \tag{2.1} \]
Let $\delta = (\delta_1, \dots, \delta_{n-1}) \in \{-1, 1\}^{n-1}$ and define:
\[ g(\delta) := \mathbb{E}\left[ X \psi\left( \sum_{i=1}^{n-1} \varepsilon_i + X + Y \right) \,\middle|\, \varepsilon_1 = \delta_1, \dots, \varepsilon_{n-1} = \delta_{n-1} \right]. \]
Let $\psi_\delta(x) := \psi\left( \sum_{i=1}^{n-1} \delta_i + x \right)$ for $x \in \mathbb{R}$. Since $(X, Y)$ and $(\varepsilon_1, \dots, \varepsilon_{n-1})$ are independent, we can use Theorem A.1.14, and we get:
\begin{align*}
g(\delta) &= \mathbb{E}\left[ X \psi\left( \sum_{i=1}^{n-1} \delta_i + X + Y \right) \right] \\
&= \mathbb{E}[X \psi_\delta(X + Y)] \\
&= \mathbb{E}[(1 - XY) \psi_\delta'(X + Y)] \\
&= \mathbb{E}\left[ (1 - XY) \psi'\left( \sum_{i=1}^{n-1} \delta_i + X + Y \right) \right] \\
&= \mathbb{E}\left[ (1 - XY) \psi'\left( \sum_{i=1}^{n-1} \varepsilon_i + X + Y \right) \,\middle|\, \varepsilon_1 = \delta_1, \dots, \varepsilon_{n-1} = \delta_{n-1} \right],
\end{align*}
where in the third step we have used Lemma 2.1.2. By Remark A.1.11, this implies that:
\begin{align*}
\mathbb{E}^-[X \psi(S^- + X + Y)] &= \mathbb{E}\left[ X \psi\left( \sum_{i=1}^{n-1} \varepsilon_i + X + Y \right) \,\middle|\, \varepsilon_1, \dots, \varepsilon_{n-1} \right] \\
&= g(\varepsilon_1, \dots, \varepsilon_{n-1}) \\
&= \mathbb{E}\left[ (1 - XY) \psi'\left( \sum_{i=1}^{n-1} \varepsilon_i + X + Y \right) \,\middle|\, \varepsilon_1, \dots, \varepsilon_{n-1} \right] \\
&= \mathbb{E}^-[(1 - XY) \psi'(S + Y)],
\end{align*}
and (2.1) has been proved. Using that $X = \varepsilon_n$ and taking the expectation on both sides of equation (2.1), we get by Proposition A.1.12 (ii) that:
\[ \mathbb{E}[\varepsilon_n \psi(S + Y)] = \mathbb{E}[(1 - \varepsilon_n Y) \psi'(S + Y)]. \]
Since $(\varepsilon_i, S, Y) \overset{d}{=} (\varepsilon_n, S, Y)$ for any $1 \le i < n$, we obtain that:
\[ \mathbb{E}[\varepsilon_i \psi(S + Y)] = \mathbb{E}[(1 - \varepsilon_i Y) \psi'(S + Y)] \quad \text{for all } 1 \le i \le n. \]
Taking the sum, we get that:
\[ \mathbb{E}[S \psi(S + Y)] = \mathbb{E}[(n - SY) \psi'(S + Y)]. \tag{2.2} \]
Next, let $\tilde{g}(\delta) := \mathbb{E}[Y \psi(S^- + X + Y) \mid \varepsilon_1 = \delta_1, \dots, \varepsilon_{n-1} = \delta_{n-1}]$; then again by Theorem A.1.14 we get that:
\[ \tilde{g}(\delta) = \mathbb{E}\left[ Y \psi\left( \sum_{i=1}^{n-1} \delta_i + X + Y \right) \right] = \mathbb{E}[Y \psi_\delta(X + Y)]. \]
Using the second part of Lemma 2.1.2, we get that:
\begin{align*}
\mathbb{E}[Y \psi_\delta(X + Y)] &= \frac{1}{2}\, \mathbb{E}[(1 - Y^2) \psi_\delta'(X + Y)] \\
&= \frac{1}{2}\, \mathbb{E}\left[ (1 - Y^2) \psi'\left( \sum_{i=1}^{n-1} \delta_i + X + Y \right) \right] \\
&= \frac{1}{2}\, \mathbb{E}[(1 - Y^2) \psi'(S + Y) \mid \varepsilon_1 = \delta_1, \dots, \varepsilon_{n-1} = \delta_{n-1}].
\end{align*}
Note that in the last step we again used Theorem A.1.14. Combining the above with Remark A.1.11, we obtain that:
\[ \mathbb{E}^-[Y \psi(S + Y)] = \tilde{g}(\varepsilon_1, \dots, \varepsilon_{n-1}) = \frac{1}{2}\, \mathbb{E}^-[(1 - Y^2) \psi'(S + Y)]. \]
Taking expectations of both sides yields that $\mathbb{E}[Y \psi(S + Y)] = \frac{1}{2}\, \mathbb{E}[(1 - Y^2) \psi'(S + Y)]$. Combining this with equation (2.2), we get that:
\[ \mathbb{E}[(S + Y) \psi(S + Y)] = \mathbb{E}\left[ \left( n - SY + \frac{1 - Y^2}{2} \right) \psi'(S + Y) \right]. \]
Thus, putting $\tilde{S} = S + Y$ and $T = n - SY + \frac{1 - Y^2}{2}$, we have that $T$ is a Stein coefficient for $\tilde{S}$. Let $\sigma^2 = n$; then using the trivial inequality $(a + b)^2 \le 2a^2 + 2b^2$, we obtain:
\[ \frac{(T - \sigma^2)^2}{\sigma^2} = \frac{\left( -SY + \frac{1 - Y^2}{2} \right)^2}{n} \le \frac{2 S^2 Y^2 + 2 \left( \frac{1 - Y^2}{2} \right)^2}{n} \le \frac{2 S^2 + 1/2}{n}. \]
By definition of $\tilde{S}$, we have that $\mathbb{E}[\tilde{S}] = \sum_{i=1}^n \mathbb{E}[\varepsilon_i] + \mathbb{E}[Y] = 0$. Furthermore, by independence of $\varepsilon_1, \dots, \varepsilon_n$ and $Y$ we obtain that:
\[ \mathbb{E}[\tilde{S}^2] = \mathrm{Var}\left( \sum_{i=1}^n \varepsilon_i + Y \right) = \sum_{i=1}^n \mathbb{E}[\varepsilon_i^2] + \mathbb{E}[Y^2] = n + \frac{1}{2} \int_{-1}^{1} y^2\, dy = n + 1/3 < \infty. \]
Moreover, by definition of $T$ it is clear that:
\[ |T| \le n + |S|\,|Y| + \left| \frac{1 - Y^2}{2} \right| \le n + n \cdot 1 + 1/2 = 2n + 1/2. \]
Hence, $|T|$ is almost surely bounded by a constant, and all conditions of Theorem 1.0.2 are satisfied. Therefore we can construct a version of $\tilde{S}$ and a random variable $Z \sim N(0, \sigma^2)$ on the same probability space such that for all $\theta \in \mathbb{R}$ it holds that:
\[ \mathbb{E} \exp(\theta |\tilde{S} - Z|) \le 2\, \mathbb{E} \exp\left( \frac{2\theta^2 (T - \sigma^2)^2}{\sigma^2} \right). \]
Assuming that the underlying probability space is rich enough, there exist independent random variables $S$ and $Y$ such that $\tilde{S} = S + Y$, $S \overset{d}{=} \sum_{i=1}^n \varepsilon_i$ and $Y \sim U[-1, 1]$. For more details we refer to Theorem A.1.8 in the appendix. Clearly $|S - \tilde{S}| \le 1$. Let $\theta \in \mathbb{R}$ be arbitrary; then it is clear that $\theta |S - Z| \le |\theta|\,|S - \tilde{S}| + |\theta|\,|\tilde{S} - Z|$.
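The Stein identity $\mathbb{E}[\tilde{S}\psi(\tilde{S})] = \mathbb{E}[T\psi'(\tilde{S})]$ established earlier in this proof, with $\tilde{S} = S + Y$ and $T = n - SY + \frac{1 - Y^2}{2}$, can be checked numerically for small $n$. The following sketch is only an illustration and plays no role in the argument. It assumes the smooth test function $\psi = \sin$, sums over the exact binomial law of $S$, and integrates over $Y$ with a composite Simpson rule; the function name `stein_gap` and its parameters are hypothetical.

```python
import math


def stein_gap(n, psi=math.sin, dpsi=math.cos, m=1000):
    """Return E[(S+Y) psi(S+Y)] - E[T psi'(S+Y)], T = n - S*Y + (1 - Y^2)/2.

    S is a sum of n fair +-1 signs (exact binomial weights); Y ~ U[-1, 1]
    is handled by a composite Simpson rule with m (even) subintervals.
    """
    h = 2.0 / m
    gap = 0.0
    for j in range(n + 1):                 # j = number of +1 signs
        s = 2 * j - n
        p = math.comb(n, j) / 2 ** n       # P(S = s)
        for i in range(m + 1):
            y = -1.0 + i * h
            simpson_w = 1 if i in (0, m) else (4 if i % 2 else 2)
            weight = p * 0.5 * simpson_w * h / 3   # density of Y is 1/2
            t = n - s * y + (1 - y * y) / 2
            gap += weight * ((s + y) * psi(s + y) - t * dpsi(s + y))
    return gap


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        print(n, stein_gap(n))
```

The printed gaps should be zero up to quadrature and rounding error. Replacing $T$ by the constant $n$ in the code gives a visibly nonzero gap, which illustrates why the correction terms $-SY + \frac{1 - Y^2}{2}$ are needed.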
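Consequently, we get:
\begin{align*}
\mathbb{E} \exp(\theta |S - Z|) &\le \mathbb{E}\left[ \exp(|\theta|\,|S - \tilde{S}|) \exp(|\theta|\,|\tilde{S} - Z|) \right] \\
&\le \exp|\theta|\; \mathbb{E} \exp(|\theta|\,|\tilde{S} - Z|) \\
&\le 2 \exp|\theta|\; \mathbb{E} \exp\left( \frac{2\theta^2 (T - \sigma^2)^2}{\sigma^2} \right).
\end{align*}
Using the bound on $(T - \sigma^2)^2/\sigma^2$ obtained above, we have:
\[ \mathbb{E} \exp(\theta |S - Z|) \le 2 \exp|\theta|\; \mathbb{E} \exp\left( 2\theta^2\, \frac{2 S^2 + 1/2}{n} \right) = 2 \exp(|\theta| + \theta^2/n)\; \mathbb{E} \exp\left( 4\theta^2 S^2/n \right). \]
Let $V$ be a standard normal random variable independent of $S$; then $\mathbb{E} \exp(4\theta^2 S^2/n) = \mathbb{E} \exp(\sqrt{8}\,\theta V S/\sqrt{n})$. To prove this equality, first note that by independence of $V$ and $S$ we can use Corollary A.1.4, which implies that:
\[ \mathbb{E} \exp(\sqrt{8}\,\theta V S/\sqrt{n}) = \int_{\mathbb{R}} \mathbb{E}\left[ \exp(\sqrt{8}\,\theta V s/\sqrt{n}) \right] P_S(ds). \]
Using the well-known formula for the mgf of a standard normal random variable, we obtain that $\mathbb{E} \exp(\sqrt{8}\,\theta V s/\sqrt{n}) = \exp(4\theta^2 s^2/n)$ for any $s \in \mathbb{R}$. Therefore:
\[ \mathbb{E} \exp(\sqrt{8}\,\theta V S/\sqrt{n}) = \int_{\mathbb{R}} \exp\left( \frac{4\theta^2 s^2}{n} \right) P_S(ds) = \mathbb{E} \exp\left( \frac{4\theta^2 S^2}{n} \right). \]
We will now show that $\mathbb{E} \exp(\sqrt{8}\,\theta V S/\sqrt{n}) = \mathbb{E}[\cosh^n(\sqrt{8}\,\theta V/\sqrt{n})]$. Let $g(z) := \mathbb{E}[\exp(\sqrt{8}\,\theta V S/\sqrt{n}) \mid V = z]$. Then, using the independence of $V$ and $S$, Theorem A.1.14 implies that $g(z) = \mathbb{E} \exp(\sqrt{8}\,\theta z S/\sqrt{n})$. Using that $\varepsilon_1, \dots, \varepsilon_n$ are i.i.d. symmetric $\pm 1$-valued random variables, we obtain:
\begin{align*}
g(z) &= \mathbb{E}\left[ \prod_{i=1}^n \exp(\sqrt{8}\,\theta z \varepsilon_i/\sqrt{n}) \right] = \prod_{i=1}^n \mathbb{E} \exp(\sqrt{8}\,\theta z \varepsilon_i/\sqrt{n}) \\
&= \left( \mathbb{E}\left[ \exp(\sqrt{8}\,\theta z \varepsilon_1/\sqrt{n}) \right] \right)^n \\
&= \left( \frac{1}{2} \exp(\sqrt{8}\,\theta z/\sqrt{n}) + \frac{1}{2} \exp(-\sqrt{8}\,\theta z/\sqrt{n}) \right)^n \\
&= \cosh^n(\sqrt{8}\,\theta z/\sqrt{n}).
\end{align*}
Hence, by Remark A.1.11, we can conclude that:
\[ \mathbb{E}[\exp(\sqrt{8}\,\theta V S/\sqrt{n}) \mid V] = g(V) = \cosh^n(\sqrt{8}\,\theta V/\sqrt{n}). \]
If we take the expectation on both sides, then we can conclude by Proposition A.1.12 (ii) that $\mathbb{E} \exp(\sqrt{8}\,\theta V S/\sqrt{n}) = \mathbb{E}[\cosh^n(\sqrt{8}\,\theta V/\sqrt{n})]$. Thus, we have shown that $\mathbb{E} \exp(4\theta^2 S^2/n) = \mathbb{E}[\cosh^n(\sqrt{8}\,\theta V/\sqrt{n})]$. Using the power series of the hyperbolic cosine, we have the simple inequality $\cosh x = \sum_{k=0}^\infty \frac{x^{2k}}{(2k)!} \le \sum_{k=0}^\infty \frac{x^{2k}}{k!} = \exp(x^2)$. Consequently, if $16\theta^2 < 1$, we have that:
\[ \mathbb{E} \exp(4\theta^2 S^2/n) \le \mathbb{E}\left[ \exp^n(8\theta^2 V^2/n) \right] = \mathbb{E} \exp(8\theta^2 V^2) = \frac{1}{\sqrt{1 - 16\theta^2}}, \tag{2.3} \]
where the last equality can be seen as follows.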
Since $V$ is a standard Gaussian random variable, we know that $V^2 \sim$ gamma$(1/2, 2)$. Thus the moment generating function of $V^2$ is given by:
\[ R(t) = \frac{1}{\sqrt{1 - 2t}} \quad \text{if } t < \frac{1}{2}. \]
This expression for $R(t)$ also follows from Proposition A.3.7 in the appendix. Hence, $\mathbb{E} \exp(8\theta^2 V^2) = R(8\theta^2) = \frac{1}{\sqrt{1 - 16\theta^2}}$ if $16\theta^2 < 1$. Choosing $\theta_0$ such that $0 < \theta_0 < 1/4$, we can conclude that:
\begin{align*}
\mathbb{E} \exp(\theta_0 |S - Z|) &\le 2 \exp(|\theta_0| + \theta_0^2/n)\; \mathbb{E} \exp(4\theta_0^2 S^2/n) \\
&\le 2 \exp(|\theta_0| + \theta_0^2/n)\, \frac{1}{\sqrt{1 - 16\theta_0^2}} \\
&\le 2 \exp(|\theta_0| + \theta_0^2)\, \frac{1}{\sqrt{1 - 16\theta_0^2}} =: \kappa.
\end{align*}
This finishes the proof.

2.2 Conditional version of the normal approximation theorem

The purpose of this section is to prove the theorem below. It can be considered as a conditional version of Theorem 2.1.1.

Theorem 2.2.1. Let $\varepsilon_1, \varepsilon_2, \dots, \varepsilon_n$ be $n$ arbitrary elements of $\{-1, 1\}$. Let $\pi$ be a uniform random permutation of $\{1, 2, \dots, n\}$. For each $1 \le k \le n$, let $S_k = \sum_{l=1}^k \varepsilon_{\pi(l)}$, and let
\[ W_k = S_k - \frac{k S_n}{n}. \]
There exist universal constants $M > 1$, $c > 1$ and $\theta_0 > 0$ satisfying the following. Take any $n \ge 3$, any possible value of $S_n$, and any $n/3 \le k \le 2n/3$. It is possible to construct a version of $W_k$ and a Gaussian random variable $Z_k$ with mean 0 and variance $k(n - k)/n$ on the same probability space such that for any $\theta \le \theta_0$:
\[ \mathbb{E} \exp(\theta |W_k - Z_k|) \le M \exp\left( 1 + \frac{c\,\theta^2 S_n^2}{n} \right). \]

Note that $S_n$ is not random. Since $\varepsilon_1, \varepsilon_2, \dots, \varepsilon_n$ are fixed, $S_n$ is simply an element of $\{-n, \dots, -1, 0, 1, \dots, n\}$. Theorem 2.2.1 can be seen as a conditional version of Theorem 2.1.1, since we assume the value of $S_n$ to be known. Thus, actually, we are conditioning on $S_n$. In order to give a proof of Theorem 2.2.1, we need some auxiliary results. We will start with the following lemma.

Lemma 2.2.2. Let us continue with the notation of Theorem 2.2.1. Then for any $\theta \in \mathbb{R}$ and any $1 \le k \le n$, we have:
\[ \mathbb{E} \exp(\theta W_k/\sqrt{k}) \le \exp \theta^2. \]

Remark 2.2.3. Note that the bound does not depend on the value of $S_n$.
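The uniformity in $S_n$ noted in the remark can be illustrated by exact enumeration for small $n$. The sketch below is an illustration under stated assumptions, not part of the thesis's argument: it uses the fact that the number of $+1$ signs among the first $k$ positions of a uniform random permutation is hypergeometric, and the names `mgf_wk` and `worst_ratio` are hypothetical helpers.

```python
import math


def mgf_wk(theta, n, plus, k):
    """Exact E exp(theta * W_k / sqrt(k)) when `plus` of the n signs equal +1.

    With J = number of +1 signs among the first k permuted positions
    (hypergeometric), S_k = 2J - k and W_k = S_k - k * S_n / n.
    """
    s_n = 2 * plus - n
    total = 0.0
    for j in range(max(0, k - (n - plus)), min(k, plus) + 1):
        prob = math.comb(plus, j) * math.comb(n - plus, k - j) / math.comb(n, k)
        w = (2 * j - k) - k * s_n / n
        total += prob * math.exp(theta * w / math.sqrt(k))
    return total


def worst_ratio(n, thetas):
    """Largest value of m(theta) / exp(theta^2) over all sign patterns and all k."""
    return max(
        mgf_wk(theta, n, plus, k) / math.exp(theta ** 2)
        for theta in thetas
        for plus in range(n + 1)     # every possible value of S_n = 2*plus - n
        for k in range(1, n + 1)
    )


if __name__ == "__main__":
    grid = [x / 4 for x in range(-12, 13)]
    print(worst_ratio(12, grid))
```

For every sign pattern the ratio stays at or below 1, in line with Lemma 2.2.2; the maximum over the grid is attained at $\theta = 0$, where both sides equal 1.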
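This will be a crucial point in the proof of the next lemma and in the induction step in Section 3.1.

Proof. Since $W_n = 0$, the lemma is obviously true for $k = n$. Fix $k$ such that $1 \le k < n$, and let $m(\theta) := \mathbb{E} \exp(\theta W_k/\sqrt{k})$. Then $m(\theta) = R(\theta/\sqrt{k})$, where $R$ denotes the moment generating function of $W_k$. By definition of $W_k$, it is clear that:
\[ |W_k| \le |S_k| + \frac{k}{n} |S_n| \le k + \frac{k}{n}\, n = 2k < \infty. \]
Thus $W_k$ is a bounded random variable, which implies that $R$ is differentiable and
\[ m'(\theta) = \frac{1}{\sqrt{k}}\, R'(\theta/\sqrt{k}) = \frac{1}{\sqrt{k}}\, \mathbb{E}[W_k \exp(\theta W_k/\sqrt{k})]. \]
For a detailed proof of this result, we refer to Proposition A.3.4 in the appendix. Note that:
\begin{align*}
\frac{1}{n} \sum_{i=1}^k \sum_{j=k+1}^n (\varepsilon_{\pi(i)} - \varepsilon_{\pi(j)}) &= \frac{(n - k) \sum_{i=1}^k \varepsilon_{\pi(i)} - k \sum_{j=k+1}^n \varepsilon_{\pi(j)}}{n} \\
&= \frac{(n - k) \sum_{i=1}^k \varepsilon_{\pi(i)} - k \left( S_n - \sum_{i=1}^k \varepsilon_{\pi(i)} \right)}{n} \\
&= \frac{n \sum_{i=1}^k \varepsilon_{\pi(i)} - k S_n}{n} = S_k - \frac{k S_n}{n} = W_k.
\end{align*}
Consequently,
\[ m'(\theta) = \frac{1}{n\sqrt{k}} \sum_{i=1}^k \sum_{j=k+1}^n \mathbb{E}\left[ (\varepsilon_{\pi(i)} - \varepsilon_{\pi(j)}) \exp(\theta W_k/\sqrt{k}) \right]. \tag{2.4} \]
Now fix $i$ and $j$ such that $i \le k < j$, and let $\tau_{ij}$ denote the transposition of $i$ and $j$. Let $\pi' = \pi \circ \tau_{ij}$; then it is obvious that $\pi'(i) = \pi(j)$ and $\pi'(j) = \pi(i)$. Moreover, $\pi'$ is again uniformly distributed on the set of all permutations of $\{1, \dots, n\}$. This can be seen as follows. The random permutation $\pi : \Omega \to \{p \mid p \text{ is a permutation of } \{1, \dots, n\}\}$ is uniformly distributed. Let $p$ be an arbitrary permutation of $\{1, \dots, n\}$; then $\mathbb{P}\{\pi = p\} = 1/n!$. Thus:
\[ \mathbb{P}\{\pi' = p\} = \mathbb{P}\{\omega \in \Omega \mid \pi(\omega) \circ \tau_{ij} = p\} = \mathbb{P}\{\omega \in \Omega \mid \pi(\omega) = p \circ \tau_{ij}\} = \mathbb{P}\{\pi = p \circ \tau_{ij}\} = 1/n!. \]
Thus, we have shown that $\pi'$ is uniformly distributed too. Let
\[ W_k' = \sum_{l=1}^k \varepsilon_{\pi'(l)} - \frac{k S_n}{n}. \]
Since $\pi$ and $\pi'$ have the same distribution, and since $\pi'(i) = \pi(j)$ and $\pi'(j) = \pi(i)$, we have that:
\begin{align*}
\mathbb{E}\left[ (\varepsilon_{\pi(i)} - \varepsilon_{\pi(j)}) \exp(\theta W_k/\sqrt{k}) \right] &= \mathbb{E}\left[ (\varepsilon_{\pi'(i)} - \varepsilon_{\pi'(j)}) \exp(\theta W_k'/\sqrt{k}) \right] \\
&= \mathbb{E}\left[ (\varepsilon_{\pi(j)} - \varepsilon_{\pi(i)}) \exp(\theta W_k'/\sqrt{k}) \right].
\end{align*}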
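Therefore, it is clear that:
\[ 2\, \mathbb{E}\left[ (\varepsilon_{\pi(i)} - \varepsilon_{\pi(j)}) \exp(\theta W_k/\sqrt{k}) \right] = \mathbb{E}\left[ (\varepsilon_{\pi(i)} - \varepsilon_{\pi(j)}) \exp(\theta W_k/\sqrt{k}) \right] - \mathbb{E}\left[ (\varepsilon_{\pi(i)} - \varepsilon_{\pi(j)}) \exp(\theta W_k'/\sqrt{k}) \right], \]
or equivalently:
\[ \mathbb{E}\left[ (\varepsilon_{\pi(i)} - \varepsilon_{\pi(j)}) \exp(\theta W_k/\sqrt{k}) \right] = \frac{1}{2}\, \mathbb{E}\left[ (\varepsilon_{\pi(i)} - \varepsilon_{\pi(j)}) \left( \exp(\theta W_k/\sqrt{k}) - \exp(\theta W_k'/\sqrt{k}) \right) \right]. \]
By straightforward reasoning, one can show that:
\[ |e^x - e^y| \le \frac{1}{2} |x - y| (e^x + e^y). \]
For a verification of this inequality, we refer to Proposition A.3.6 in the appendix. Furthermore, it is also clear that:
\[ W_k - W_k' = \sum_{m=1}^k \varepsilon_{\pi(m)} - \sum_{l=1}^k \varepsilon_{\pi'(l)} = \sum_{m=1}^k \varepsilon_{\pi(m)} - \sum_{l=1, l \ne i}^k \varepsilon_{\pi(l)} - \varepsilon_{\pi(j)} = \varepsilon_{\pi(i)} - \varepsilon_{\pi(j)}. \]
Combining all these results, we can conclude that:
\begin{align*}
\left| \mathbb{E}\left[ (\varepsilon_{\pi(i)} - \varepsilon_{\pi(j)}) \exp(\theta W_k/\sqrt{k}) \right] \right|
&\le \frac{1}{2}\, \mathbb{E}\left[ |\varepsilon_{\pi(i)} - \varepsilon_{\pi(j)}| \left| \exp(\theta W_k/\sqrt{k}) - \exp(\theta W_k'/\sqrt{k}) \right| \right] \\
&\le \frac{1}{4}\, \mathbb{E}\left[ |\varepsilon_{\pi(i)} - \varepsilon_{\pi(j)}| \left| \frac{\theta W_k}{\sqrt{k}} - \frac{\theta W_k'}{\sqrt{k}} \right| \left( \exp(\theta W_k/\sqrt{k}) + \exp(\theta W_k'/\sqrt{k}) \right) \right] \\
&= \frac{|\theta|}{4\sqrt{k}}\, \mathbb{E}\left[ (\varepsilon_{\pi(i)} - \varepsilon_{\pi(j)})^2 \left( \exp(\theta W_k/\sqrt{k}) + \exp(\theta W_k'/\sqrt{k}) \right) \right] \\
&\le \frac{|\theta|}{\sqrt{k}}\, \mathbb{E}\left[ \exp(\theta W_k/\sqrt{k}) + \exp(\theta W_k'/\sqrt{k}) \right] = \frac{2|\theta|}{\sqrt{k}}\, \mathbb{E} \exp(\theta W_k/\sqrt{k}) = \frac{2|\theta|}{\sqrt{k}}\, m(\theta).
\end{align*}
Note that for the last inequality we used that $|\varepsilon_{\pi(i)} - \varepsilon_{\pi(j)}| \le 2$, since all $\varepsilon_l$ are elements of $\{-1, 1\}$. If we use the obtained bound together with equation (2.4), it follows that:
\[ |m'(\theta)| \le \frac{2|\theta|}{nk} \sum_{i=1}^k \sum_{j=k+1}^n m(\theta) \le 2|\theta|\, m(\theta). \]
We will use this inequality to complete the proof. First of all, we note that $m(\theta) > 0$. This can easily be seen as follows. Since the exponential function is (strictly) positive, it is obvious that $m(\theta) = \mathbb{E} \exp(\theta W_k/\sqrt{k}) \ge 0$. Assume $m(\theta) = 0$; then by Proposition A.1.5 it would follow that $\exp(\theta W_k/\sqrt{k}) = 0$ a.s., which is a contradiction. Hence, $m(\theta) > 0$. We will now consider two cases.

Case 1: $\theta \ge 0$. Clearly, for all $u \ge 0$, it holds that:
\[ m'(u) \le |m'(u)| \le 2|u|\, m(u) = 2u\, m(u), \]
or equivalently $\frac{m'(u)}{m(u)} \le 2u$. Since $\theta \ge 0$, we have that:
\[ \int_0^\theta \frac{m'(u)}{m(u)}\, du \le \int_0^\theta 2u\, du = \theta^2. \]
Since $m(0) = 1$, the left-hand side of this inequality can be rewritten as:
\[ \int_0^\theta (\log m(u))'\, du = \log m(\theta) - \log m(0) = \log m(\theta). \]
Thus, we have shown that $\log m(\theta) \le \theta^2$.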
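Applying the exponential function to both sides gives $m(\theta) \le \exp(\theta^2)$.

Case 2: $\theta < 0$. Obviously, for all $u \le 0$, we have that:
\[ -m'(u) \le |m'(u)| \le 2|u|\, m(u) = -2u\, m(u), \]
or equivalently $\frac{m'(u)}{m(u)} \ge 2u$. Since $\theta < 0$, it follows that:
\[ -\log m(\theta) = \int_\theta^0 \frac{m'(u)}{m(u)}\, du \ge \int_\theta^0 2u\, du = -\theta^2. \]
Thus, $\log m(\theta) \le \theta^2$, and consequently $m(\theta) \le \exp(\theta^2)$.

This finishes the argument.

Lemma 2.2.4. Let all notation be as in the statement of Theorem 2.2.1. There exists a universal constant $\alpha_0 > 0$ such that for all $n$, all possible values of $S_n$, all $k \le 2n/3$, and all $0 < \alpha \le \alpha_0$, we have:
\[ \mathbb{E} \exp(\alpha S_k^2/k) \le \exp\left( 1 + \frac{3\alpha S_n^2}{4n} \right). \]

Proof. Let $Z$ be a standard normal random variable, independent of all other random variables. Then by Corollary A.1.4, we have:
\[ \mathbb{E} \exp\left( \sqrt{\frac{2\alpha}{k}}\, Z S_k \right) = \int_{\mathbb{R}} \mathbb{E} \exp\left( \sqrt{\frac{2\alpha}{k}}\, Z s \right) P_{S_k}(ds). \]
Using the well-known formula for the mgf of a standard normal random variable, we obtain that $\mathbb{E} \exp\left( \sqrt{\frac{2\alpha}{k}}\, Z s \right) = \exp(\alpha s^2/k)$ for any $s \in \mathbb{R}$. Therefore, we can conclude that:
\[ \mathbb{E} \exp\left( \sqrt{\frac{2\alpha}{k}}\, Z S_k \right) = \int_{\mathbb{R}} \exp(\alpha s^2/k)\, P_{S_k}(ds) = \mathbb{E} \exp(\alpha S_k^2/k). \]
By definition of $W_k$, we have proved that:
\[ \mathbb{E} \exp(\alpha S_k^2/k) = \mathbb{E} \exp\left( \sqrt{\frac{2\alpha}{k}}\, Z S_k \right) = \mathbb{E} \exp\left( \sqrt{\frac{2\alpha}{k}}\, Z W_k + \sqrt{\frac{2\alpha}{k}}\, Z\, \frac{k S_n}{n} \right). \]
Now let $g(y) := \mathbb{E}\left[ \exp\left( \sqrt{2\alpha/k}\, Z W_k \right) \mid Z = y \right]$ for $y \in \mathbb{R}$. Using Theorem A.1.14 and Lemma 2.2.2, it follows that:
\[ g(y) = \mathbb{E} \exp\left( \sqrt{2\alpha/k}\; y W_k \right) \le \exp(2\alpha y^2). \]
Thus, by Remark A.1.11 we can conclude that:
\[ \mathbb{E}\left[ \exp\left( \sqrt{2\alpha/k}\, Z W_k \right) \mid Z \right] = g(Z) \le \exp(2\alpha Z^2). \]
Combining this result with parts (i) and (ii) of Proposition A.1.12, we obtain:
\begin{align*}
\mathbb{E} \exp(\alpha S_k^2/k) &= \mathbb{E} \exp\left( \sqrt{\frac{2\alpha}{k}}\, Z W_k + \sqrt{\frac{2\alpha}{k}}\, Z\, \frac{k S_n}{n} \right) \\
&= \mathbb{E}\left[ \mathbb{E}\left[ \exp\left( \sqrt{\frac{2\alpha}{k}}\, Z W_k \right) \exp\left( \sqrt{\frac{2\alpha}{k}}\, Z\, \frac{k S_n}{n} \right) \,\middle|\, Z \right] \right] \\
&= \mathbb{E}\left[ \mathbb{E}\left[ \exp\left( \sqrt{\frac{2\alpha}{k}}\, Z W_k \right) \,\middle|\, Z \right] \exp\left( \sqrt{\frac{2\alpha}{k}}\, Z\, \frac{k S_n}{n} \right) \right] \\
&\le \mathbb{E}\left[ \exp(2\alpha Z^2) \exp\left( \sqrt{\frac{2\alpha}{k}}\, \frac{k S_n}{n}\, Z \right) \right].
\end{align*}
Since $S_n$ is a constant, the last expression is just the expectation of a function of a standard normal random variable. Using Proposition A.3.7, we obtain that for $0 < \alpha < 1/4$:
\[ \mathbb{E} \exp(\alpha S_k^2/k) \le \frac{1}{\sqrt{1 - 4\alpha}} \exp\left( \frac{\alpha k S_n^2}{n^2 (1 - 4\alpha)} \right). \]
Let $k \le 2n/3$; then obviously:
\[ \mathbb{E} \exp(\alpha S_k^2/k) \le \frac{1}{\sqrt{1 - 4\alpha}} \exp\left( \frac{2\alpha S_n^2}{3n (1 - 4\alpha)} \right). \]
Let $\alpha_0 := 1/36$; then we have for every $0 < \alpha \le \alpha_0$ that:
\begin{align*}
\mathbb{E} \exp(\alpha S_k^2/k) &\le \frac{1}{\sqrt{1 - 4\alpha_0}} \exp\left( \frac{2\alpha S_n^2}{3n (1 - 4\alpha_0)} \right) \\
&= \sqrt{\frac{9}{8}}\, \exp\left( \frac{3\alpha S_n^2}{4n} \right) \\
&\le \exp\left( 1 + \frac{3\alpha S_n^2}{4n} \right),
\end{align*}
which completes the proof.

Using the above lemmas, we will now prove Theorem 2.2.1.

Proof. For simplicity, we shall write $W$ for $W_k$ and $S$ for $S_n$. $S_k$ will be written as usual. Let $Y$ be a random variable which is uniformly distributed on the interval $[-1, 1]$ and independent of $\pi$. Fix $i$ and $j$ such that $1 \le i \le k$ and $k < j \le n$. The conditional expectation given $\pi(l)$, $l \ne i, j$, will be denoted by $\mathbb{E}^-$. Furthermore, let:
\[ S^- = \sum_{l \ne i, j} \varepsilon_{\pi(l)} \quad \text{and} \quad W^- = \sum_{l \le k,\, l \ne i} \varepsilon_{\pi(l)} - \frac{kS}{n}. \]
Let $\psi$ be an arbitrary Lipschitz function and let $\hat{\delta} = (\delta_1, \dots, \hat{\delta}_i, \dots, \hat{\delta}_j, \dots, \delta_n) \in \mathbb{R}^{n-2}$, where the $\delta_l$, $l \ne i, j$, are distinct elements of $\{1, \dots, n\}$. Let:
\[ g(\hat{\delta}) := \mathbb{E}\left[ (\varepsilon_{\pi(i)} - \varepsilon_{\pi(j)}) \psi(W + Y) \mid \pi(l) = \delta_l,\ l \ne i, j \right]. \]
We will consider two cases. We will first assume that $\sum_{l \ne i, j} \varepsilon_{\delta_l} \ne S$. It is obvious that under this condition $\varepsilon_{\pi(i)} = \varepsilon_{\pi(j)}$. Otherwise $\varepsilon_{\pi(i)} + \varepsilon_{\pi(j)} = 0$, which would imply that $S = \sum_{l=1}^n \varepsilon_{\pi(l)} = \sum_{l \ne i, j} \varepsilon_{\pi(l)} = \sum_{l \ne i, j} \varepsilon_{\delta_l}$, since we have the information that $\pi(l) = \delta_l$ for $l \ne i, j$. Thus, in this case, we can conclude by Theorem A.1.14 that:
\[ g(\hat{\delta}) = \mathbb{E}\left[ (\varepsilon_{\pi(i)} - \varepsilon_{\pi(j)}) \psi(W + Y) \mid \pi(l) = \delta_l,\ l \ne i, j \right] = 0. \]
Secondly, we will assume that $\sum_{l \ne i, j} \varepsilon_{\delta_l} = S$. Then there are only two possibilities. The first one is that $\varepsilon_{\pi(i)} = 1$ and $\varepsilon_{\pi(j)} = -1$. The second possibility is that $\varepsilon_{\pi(i)} = -1$ and $\varepsilon_{\pi(j)} = 1$. Each of these two possibilities occurs with probability one half, since we condition on the information that $\pi(l) = \delta_l$, $l \ne i, j$, and since $\pi$ is uniformly distributed. Thus the conditional distribution of $(\varepsilon_{\pi(i)} - \varepsilon_{\pi(j)})/2$ is a symmetric $\pm 1$-distribution. Moreover, if $\sum_{l \ne i, j} \varepsilon_{\delta_l} = S$, it is easily seen that:
\[ X := \frac{\varepsilon_{\pi(i)} - \varepsilon_{\pi(j)}}{2} = \varepsilon_{\pi(i)} \quad \text{and} \quad W = W^- + X. \]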
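Thus, conditionally on $\pi(l) = \delta_l$, $l \ne i, j$, with $\sum_{l \ne i, j} \varepsilon_{\delta_l} = S$, the random variable $X$ has a symmetric $\pm 1$-distribution. It is now clear that:
\begin{align*}
g(\hat{\delta}) &= \mathbb{E}\left[ (\varepsilon_{\pi(i)} - \varepsilon_{\pi(j)}) \psi(W + Y) \mid \pi(l) = \delta_l,\ l \ne i, j \right] \\
&= 2\, \mathbb{E}\left[ X \psi(W^- + X + Y) \mid \pi(l) = \delta_l,\ l \ne i, j \right] \\
&= 2\, \mathbb{E}\left[ \bar{X}\, \psi\left( \sum_{l \le k,\, l \ne i} \varepsilon_{\delta_l} - \frac{kS}{n} + \bar{X} + Y \right) \right],
\end{align*}
where $\bar{X}$ is a random variable independent of $Y$ with a symmetric $\pm 1$-distribution. Note that we used Theorem A.1.14 in the last step. Since $\sum_{l \le k,\, l \ne i} \varepsilon_{\delta_l} - kS/n$ is a constant, we can define $\psi_{\hat{\delta}}(x) := \psi\left( \sum_{l \le k,\, l \ne i} \varepsilon_{\delta_l} - kS/n + x \right)$. Thus, using Lemma 2.1.2, we obtain that:
\begin{align*}
g(\hat{\delta}) &= 2\, \mathbb{E}[\bar{X} \psi_{\hat{\delta}}(\bar{X} + Y)] \\
&= 2\, \mathbb{E}[(1 - \bar{X} Y) \psi_{\hat{\delta}}'(\bar{X} + Y)] \\
&= 2\, \mathbb{E}\left[ (1 - XY) \psi'(W^- + X + Y) \mid \pi(l) = \delta_l,\ l \ne i, j \right] \\
&= \mathbb{E}\left[ (2 - (\varepsilon_{\pi(i)} - \varepsilon_{\pi(j)}) Y) \psi'(W + Y) \mid \pi(l) = \delta_l,\ l \ne i, j \right],
\end{align*}
where in the third step we again used Theorem A.1.14. Let:
\[ a_{ij} := 1 - \varepsilon_{\pi(i)} \varepsilon_{\pi(j)} - (\varepsilon_{\pi(i)} - \varepsilon_{\pi(j)}) Y. \]
Then obviously:
\[ a_{ij} = \begin{cases} 2 - (\varepsilon_{\pi(i)} - \varepsilon_{\pi(j)}) Y & \text{if } \varepsilon_{\pi(i)} \ne \varepsilon_{\pi(j)}, \\ 0 & \text{if } \varepsilon_{\pi(i)} = \varepsilon_{\pi(j)}. \end{cases} \]
Thus, irrespective of whether $\sum_{l \ne i, j} \varepsilon_{\delta_l} = S$ or not, we have:
\[ g(\hat{\delta}) = \mathbb{E}\left[ a_{ij} \psi'(W + Y) \mid \pi(l) = \delta_l,\ l \ne i, j \right]. \]
Hence, by Remark A.1.11 we can conclude that:
\[ \mathbb{E}^-\left[ (\varepsilon_{\pi(i)} - \varepsilon_{\pi(j)}) \psi(W + Y) \right] = g(\pi(1), \dots, \widehat{\pi(i)}, \dots, \widehat{\pi(j)}, \dots, \pi(n)) = \mathbb{E}^-[a_{ij} \psi'(W + Y)]. \]
By Proposition A.1.12 (ii), taking the expectation on both sides leads to:
\[ \mathbb{E}\left[ (\varepsilon_{\pi(i)} - \varepsilon_{\pi(j)}) \psi(W + Y) \right] = \mathbb{E}[a_{ij} \psi'(W + Y)]. \]
In the proof of Lemma 2.2.2 we have already shown that:
\[ W = \frac{1}{n} \sum_{i=1}^k \sum_{j=k+1}^n (\varepsilon_{\pi(i)} - \varepsilon_{\pi(j)}). \]
This implies that:
\begin{align*}
\mathbb{E}[W \psi(W + Y)] &= \frac{1}{n} \sum_{i=1}^k \sum_{j=k+1}^n \mathbb{E}\left[ (\varepsilon_{\pi(i)} - \varepsilon_{\pi(j)}) \psi(W + Y) \right] \\
&= \frac{1}{n} \sum_{i=1}^k \sum_{j=k+1}^n \mathbb{E}[a_{ij} \psi'(W + Y)] \\
&= \mathbb{E}\left[ \left( \frac{1}{n} \sum_{i=1}^k \sum_{j=k+1}^n a_{ij} \right) \psi'(W + Y) \right].
\end{align*}
We want to prove that $\mathbb{E}[Y \psi(W + Y)] = \frac{1}{2}\, \mathbb{E}[(1 - Y^2) \psi'(W + Y)]$. In order to do so, we define:
\[ h(\hat{\delta}) := \mathbb{E}[Y \psi(W + Y) \mid \pi(l) = \delta_l,\ l \ne i, j]. \]
We will consider three different cases. First of all, we will assume that $\sum_{l \ne i, j} \varepsilon_{\delta_l} = S$. In this case, we can define $X$ and $\bar{X}$ as above.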
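Then, by arguments analogous to those above, we can conclude that:
\begin{align*}
h(\hat\delta) &= \mathbb{E}[Y\psi(W^-+X+Y) \mid \pi(l)=\delta_l,\ l\ne i,j] \\
&= \mathbb{E}\left[Y\psi\left(\sum_{l\le k,\,l\ne i}\varepsilon_{\delta_l} - \frac{kS}{n} + \bar X + Y\right)\right] \\
&= \mathbb{E}[Y\psi_{\hat\delta}(\bar X+Y)].
\end{align*}
Using the second equality of Lemma 2.1.2, we obtain that:
\[
h(\hat\delta) = \frac{1}{2}\mathbb{E}[(1-Y^2)\psi'_{\hat\delta}(\bar X+Y)] = \frac{1}{2}\mathbb{E}[(1-Y^2)\psi'(W+Y) \mid \pi(l)=\delta_l,\ l\ne i,j].
\]
Secondly, we will consider the case where $\sum_{l\ne i,j}\varepsilon_{\delta_l} = S-2$. This clearly implies that $\varepsilon_{\pi(i)} = \varepsilon_{\pi(j)} = 1$. Let $c := \sum_{l\le k,\,l\ne i}\varepsilon_{\delta_l} + 1 - kS/n$. Then integration by parts gives us that:
\begin{align*}
\frac{1}{2}\mathbb{E}[(1-Y^2)\psi'(W+Y) \mid \pi(l)=\delta_l,\ l\ne i,j] &= \frac{1}{2}\mathbb{E}[(1-Y^2)\psi'(c+Y)] \\
&= \frac{1}{4}\int_{-1}^{1}(1-y^2)\psi'(c+y)\,dy \\
&= -\frac{1}{4}\int_{-1}^{1}\psi(c+y)(-2y)\,dy \\
&= \mathbb{E}[Y\psi(c+Y)] \\
&= \mathbb{E}[Y\psi(W+Y) \mid \pi(l)=\delta_l,\ l\ne i,j].
\end{align*}
Finally, we assume that $\sum_{l\ne i,j}\varepsilon_{\delta_l} = S+2$. This implies that $\varepsilon_{\pi(i)} = \varepsilon_{\pi(j)} = -1$. As in the previous case, it can be shown that:
\[
h(\hat\delta) = \frac{1}{2}\mathbb{E}[(1-Y^2)\psi'(W+Y) \mid \pi(l)=\delta_l,\ l\ne i,j].
\]
This means that this equality holds irrespective of the value of $\sum_{l\ne i,j}\varepsilon_{\delta_l}$. Thus, by Remark A.1.11, we see that:
\[
\mathbb{E}^-[Y\psi(W+Y)] = h(\pi(1),\dots,\widehat{\pi(i)},\dots,\widehat{\pi(j)},\dots,\pi(n)) = \frac{1}{2}\mathbb{E}^-[(1-Y^2)\psi'(W+Y)].
\]
Taking the expectation on both sides, Proposition A.1.12 yields that:
\[
\mathbb{E}[Y\psi(W+Y)] = \frac{1}{2}\mathbb{E}[(1-Y^2)\psi'(W+Y)].
\]
Put $\tilde W = W+Y$ and
\[
T = \frac{1}{n}\sum_{i=1}^{k}\sum_{j=k+1}^{n}a_{ij} + \frac{1-Y^2}{2}.
\]
Then, we have proved that:
\begin{align*}
\mathbb{E}[\tilde W\psi(\tilde W)] &= \mathbb{E}[W\psi(W+Y)] + \mathbb{E}[Y\psi(W+Y)] \\
&= \mathbb{E}\left[\left(\frac{1}{n}\sum_{i=1}^{k}\sum_{j=k+1}^{n}a_{ij}\right)\psi'(W+Y)\right] + \frac{1}{2}\mathbb{E}[(1-Y^2)\psi'(W+Y)] \\
&= \mathbb{E}[T\psi'(\tilde W)].
\end{align*}
Thus, $T$ is a Stein coefficient for $\tilde W$. Now, simply using the definition of $a_{ij}$, an easy verification shows that:
\[
\frac{1}{n}\sum_{i=1}^{k}\sum_{j=k+1}^{n}a_{ij} = \frac{k(n-k)}{n} - \frac{\left(\sum_{i=1}^{k}\varepsilon_{\pi(i)}\right)\left(\sum_{j=k+1}^{n}\varepsilon_{\pi(j)}\right)}{n} - WY.
\]
Let $\sigma^2 = k(n-k)/n$. Since $n/3 \le k \le 2n/3$, we have that:
\[
|W| = \left|S_k - \frac{kS}{n}\right| \le |S_k| + \frac{k}{n}|S| \le |S_k| + \frac{2}{3}|S|.
\]
By straightforward reasoning, we obtain:
\begin{align*}
\frac{(T-\sigma^2)^2}{\sigma^2} &= \frac{n}{k(n-k)}\left(\frac{\left(\sum_{i=1}^{k}\varepsilon_{\pi(i)}\right)\left(\sum_{j=k+1}^{n}\varepsilon_{\pi(j)}\right)}{n} + WY + \frac{Y^2-1}{2}\right)^2 \\
&\le \frac{n}{k(n-k)}\left(\frac{\left|\sum_{i=1}^{k}\varepsilon_{\pi(i)}\right|\left|\sum_{j=k+1}^{n}\varepsilon_{\pi(j)}\right|}{n} + |W||Y| + \frac{|Y^2-1|}{2}\right)^2 \\
&\le \frac{n}{k(n-k)}\left(\frac{|S_k|\,n}{n} + |W|\cdot 1 + \frac{1}{2}\right)^2 \\
&= \frac{n}{k(n-k)}\left(|S_k| + |W| + \frac{1}{2}\right)^2.
\end{align*}
Since $u \mapsto u^2$ is a convex function, we have for all $x,y,z \in \mathbb{R}$ that $\left(\frac{x+y+z}{3}\right)^2 \le \frac{1}{3}(x^2+y^2+z^2)$, or equivalently that $(x+y+z)^2 \le 3(x^2+y^2+z^2)$. Using this inequality together with the bound on $|W|$ obtained above, we get:
\begin{align*}
\frac{n}{k(n-k)}\left(|S_k| + |W| + \frac{1}{2}\right)^2 &\le \frac{n}{k(n-k)}\left(2|S_k| + \frac{2}{3}|S| + \frac{1}{2}\right)^2 \\
&\le \frac{3n}{k(n-k)}\left(4S_k^2 + \frac{4}{9}S^2 + \frac{1}{4}\right) \\
&= \frac{12n}{k(n-k)}\left(S_k^2 + \frac{S^2}{9} + \frac{1}{16}\right) \\
&= 12\left(\frac{S_k^2}{k}\,\frac{n}{n-k} + \frac{S^2}{9n}\,\frac{n^2}{k(n-k)} + \frac{1}{16k}\,\frac{n}{n-k}\right).
\end{align*}
Since it is assumed that $n/3 \le k \le 2n/3$, the following inequalities also hold: $n/3 \le n-k \le 2n/3$, $n/k \le 3$ and $n/(n-k) \le 3$. Putting $C := 36$, we can conclude that:
\[
\frac{n}{k(n-k)}\left(|S_k| + |W| + \frac{1}{2}\right)^2 \le 12\left(\frac{3S_k^2}{k} + \frac{S^2}{n} + \frac{3}{16k}\right) \le C\left(\frac{S_k^2}{k} + \frac{S^2}{n} + 1\right).
\]
Next, we will verify the assumptions of Theorem 1.0.2. Since $\pi$ is uniformly distributed, we have for all $1 \le l \le n$ that:
\[
\mathbb{E}[\varepsilon_{\pi(l)}] = \varepsilon_1\,\mathbb{P}\{\pi(l)=1\} + \dots + \varepsilon_n\,\mathbb{P}\{\pi(l)=n\} = \varepsilon_1\frac{1}{n} + \dots + \varepsilon_n\frac{1}{n} = \frac{S}{n}.
\]
Since $Y$ is uniformly distributed on the interval $[-1,1]$, it holds that $\mathbb{E}[Y] = 0$, which implies that:
\[
\mathbb{E}[\tilde W] = \mathbb{E}[W] + \mathbb{E}[Y] = \sum_{l=1}^{k}\mathbb{E}[\varepsilon_{\pi(l)}] - \frac{kS}{n} = \frac{kS}{n} - \frac{kS}{n} = 0.
\]
Since $Y$ and $W$ are independent and since $kS/n$ is a constant, we have that:
\[
\mathbb{E}[\tilde W^2] = \operatorname{Var}(\tilde W) = \operatorname{Var}(S_k - kS/n) + \operatorname{Var}(Y) = \operatorname{Var}(S_k) + \mathbb{E}[Y^2] \le \mathbb{E}[S_k^2] + 1/3 \le k^2 + 1/3 < \infty.
\]
Finally, it is easy to see that:
\[
|a_{ij}| = |1 - \varepsilon_{\pi(i)}\varepsilon_{\pi(j)} - (\varepsilon_{\pi(i)}-\varepsilon_{\pi(j)})Y| \le 1 + 1 + 2|Y| \le 4.
\]
Hence,
\[
|T| \le \frac{1}{n}\sum_{i=1}^{k}\sum_{j=k+1}^{n}|a_{ij}| + \left|\frac{1-Y^2}{2}\right| \le \frac{4k(n-k)}{n} + \frac{1}{2} < \infty.
\]
We have shown that all conditions of Theorem 1.0.2 are met. Thus we can construct a version of $\tilde W$ and a random variable $Z \sim N(0,\sigma^2)$ on the same probability space, such that for all $\theta \in \mathbb{R}$:
\[
\mathbb{E}\exp(\theta|\tilde W - Z|) \le 2\,\mathbb{E}\exp\left(2\theta^2\frac{(T-\sigma^2)^2}{\sigma^2}\right).
\]
On that same probability space, we can construct a random variable $W$ having the same distribution as $S_k - kS_n/n$, and a random variable $Y \sim U[-1,1]$ independent of $W$, such that $\tilde W = W + Y$. Indeed, assuming that the underlying probability space is rich enough, this is a simple application of Theorem A.1.8. Then, it clearly holds that $|W - \tilde W| \le 1$. Combining these results, we obtain that:
\[
\mathbb{E}\exp(\theta|W-Z|) \le \mathbb{E}\exp(|\theta| + |\theta||\tilde W - Z|) \le \exp(|\theta|)\,2\,\mathbb{E}\exp\left(2\theta^2\frac{(T-\sigma^2)^2}{\sigma^2}\right).
\]
Using the bound on $(T-\sigma^2)^2/\sigma^2$ obtained above, and putting $\tilde c = 2C$, we see that:
\begin{align*}
\mathbb{E}\exp(\theta|W-Z|) &\le 2\exp(|\theta|)\,\mathbb{E}\exp\left(2\theta^2 C\left(\frac{S_k^2}{k} + \frac{S^2}{n} + 1\right)\right) \\
&= 2\exp\left(|\theta| + \tilde c\theta^2 + \tilde c\theta^2\frac{S^2}{n}\right)\mathbb{E}\exp\left(\tilde c\theta^2\frac{S_k^2}{k}\right).
\end{align*}
By Lemma 2.2.4 there exists a universal constant $\alpha_0 > 0$, such that for $0 < \tilde c\theta^2 \le \alpha_0$:
\[
\mathbb{E}\exp(\tilde c\theta^2 S_k^2/k) \le \exp\left(1 + \frac{3\tilde c\theta^2 S^2}{4n}\right).
\]
Thus when $0 < \theta \le \sqrt{\alpha_0/\tilde c} =: \theta_0$, it holds that:
\begin{align*}
\mathbb{E}\exp(\theta|W-Z|) &\le 2\exp\left(|\theta| + \frac{\tilde c\theta^2 S^2}{n} + \tilde c\theta^2 + 1 + \frac{3}{4}\frac{\tilde c\theta^2 S^2}{n}\right) \\
&= 2\exp\left(|\theta| + \tilde c\theta^2\right)\exp\left(1 + \frac{7}{4}\tilde c\,\frac{\theta^2 S^2}{n}\right) \\
&\le M\exp\left(1 + c\,\frac{\theta^2 S^2}{n}\right),
\end{align*}
where $M := 2\exp\left(\sqrt{\alpha_0/\tilde c} + \alpha_0\right)$ and $c := 7\tilde c/4$ are universal constants. This finishes the proof.

Chapter 3 The KMT theorem for the SRW

The goal of this chapter is to give a proof of Theorem 2 from the introduction. We will proceed as follows. In Section 3.1 we will carry out an induction step. In Section 3.2 we will then use this result in order to give a proof of the KMT theorem for the simple random walk.

3.1 The induction step

The purpose of this section is to prove the theorem below, which provides a coupling of a pinned random walk with a Brownian bridge. We will make use of Theorem 2.2.1 and an induction argument.

Theorem 3.1.1. Let $\varepsilon_1, \varepsilon_2, \dots, \varepsilon_n$ be $n$ arbitrary elements of $\{-1,1\}$. Let $\pi$ be a uniform random permutation of $\{1,2,\dots,n\}$. For each $1 \le k \le n$, let $S_k = \sum_{l=1}^{k}\varepsilon_{\pi(l)}$, and let
\[
W_k = S_k - \frac{kS_n}{n}.
\]
There exist strictly positive constants $C$, $K$ and $\lambda_0$ such that the following is true.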
For all $n \ge 2$, and any possible value of $S_n$, it is possible to construct a version of $W_0, W_1, \dots, W_n$ and random variables $Z_0, Z_1, \dots, Z_n$ with a joint Gaussian distribution, mean zero and
\[
\operatorname{Cov}(Z_i, Z_j) = \frac{(i\wedge j)(n-(i\vee j))}{n}
\]
on the same probability space such that for any $0 < \lambda < \lambda_0$,
\[
\mathbb{E}\exp\left(\lambda\max_{i\le n}|W_i - Z_i|\right) \le \exp\left(C\log n + \frac{K\lambda^2 S_n^2}{n}\right).
\]

Remark 3.1.2. Note that the theorem above states that we can make such couplings for all possible values of $S_n$. Since this concept turns up several times in the proof below, we give an example of those 'possible values'. Take $n = 4$, then the possible values for $S_4$ are $\{-4,-2,0,2,4\}$.

Notation 3.1.3. Before giving a proof of Theorem 3.1.1, we first introduce some notation concerning several measures. Let $\mu$ denote the counting measure on $\mathbb{Z}$ and $\lambda$ the Lebesgue measure on $\mathbb{R}$. Then we define:
\begin{align*}
Q_{n,1} &:= \mu^{n+1}, \\
Q_{n,2} &:= \mu\otimes\lambda^{n-1}\otimes\mu, \\
Q_n &:= Q_{n,1}\otimes Q_{n,2}, \\
R_n &:= (\mu\otimes\lambda)\otimes Q_k\otimes Q_{n-k}.
\end{align*}
We can prove Theorem 3.1.1 as follows.

Proof. Recall the universal constants $\alpha_0$ from Lemma 2.2.4 and $M$, $c$ and $\theta_0$ from Theorem 2.2.1. Let:
\[
K = 8c, \qquad \lambda_0 \le \sqrt{\frac{\alpha_0}{16c}}\wedge\frac{\theta_0}{2}, \qquad\text{and}\qquad C \ge \frac{1+\log(2M^{1/2})}{\log(3/2)}. \tag{3.1}
\]
We claim that these constants are sufficient for carrying out the induction step. We will prove this claim by induction on $n$. For all $n$ and each possible value $a$ of $S_n$, let $f_a^n(\mathbf{s})$ denote the discrete probability density function of the sequence $(S_0, S_1, \dots, S_n)$. Using the same notation as in 3.1.3, we see that $f_a^n(\mathbf{s})$ is actually a $Q_{n,1}$-density function. Since the random permutation $\pi$ is uniformly distributed, it is easy to see that $f_a^n$ is just the density of the uniform distribution over $A_a^n$, where:
\[
A_a^n := \left\{\mathbf{s}\in\mathbb{Z}^{n+1} \mid s_0 = 0,\ s_n = a,\ \text{and } |s_i - s_{i-1}| = 1,\ \forall i\right\}.
\]
Hence, for any $\mathbf{s}\in A_a^n$, we have:
\[
f_a^n(\mathbf{s}) = \frac{1}{|A_a^n|}.
\]
Note that, using some combinatorics, it is easy to verify that $|A_a^n| = \binom{n}{(a+n)/2}$. Next, let $\beta^n(\mathbf{z})$ denote the $Q_{n,2}$-density function of a Gaussian random vector $(Z_0, Z_1, \dots, Z_n)$ with mean zero and covariance:
\[
\operatorname{Cov}(Z_i, Z_j) = \frac{(i\wedge j)(n-(i\vee j))}{n}.
\]
We want to prove that for each $n \ge 2$, and every possible value $a$ of $S_n$, we can construct a joint $Q_n$-density $\rho_a^n(\mathbf{s},\mathbf{z})$ on $\mathbb{Z}^{n+1}\times\mathbb{R}^{n+1}$ such that:
\[
\int\rho_a^n(\mathbf{s},\mathbf{z})\,Q_{n,2}(d\mathbf{z}) = f_a^n(\mathbf{s}), \qquad \int\rho_a^n(\mathbf{s},\mathbf{z})\,Q_{n,1}(d\mathbf{s}) = \beta^n(\mathbf{z}), \tag{3.2}
\]
and for each $0 < \lambda < \lambda_0$:
\[
\int\exp\left(\lambda\max_{i\le n}\left|s_i - \frac{ia}{n} - z_i\right|\right)\rho_a^n(\mathbf{s},\mathbf{z})\,Q_n(d\mathbf{s},d\mathbf{z}) \le \exp\left(C\log n + \frac{K\lambda^2 a^2}{n}\right).
\]
If we can construct such densities, one can take a random vector $(\mathbf{S},\mathbf{Z})$ following the density $\rho_a^n$. Then, setting $W_i = S_i - \frac{ia}{n}$, we have constructed a version of $W_0,\dots,W_n$ and random variables $Z_0,\dots,Z_n$ satisfying all assumptions mentioned in Theorem 3.1.1. Hence, this would complete the proof. For the induction base, we refer to the end of the proof.

Suppose $\rho_a^k$ can be constructed for $k = 2,\dots,n-1$, for all possible values of $a$ in each case. We will now show how $\rho_a^n$ can be constructed when $a$ is an allowed value for $S_n$. First, fix a possible value $a$ of $S_n$ and an index $k$ such that $n/3 \le k \le 2n/3$ (for definiteness, take $k = [n/2]$). Given $S_n = a$, let $g_a^{n,k}$ denote the $\mu$-density function of $S_k$. We will use a simple counting argument in order to determine $g_a^{n,k}(s)$. If we know that $S_n = a$, then the number of possible values $\mathbf{s}\in\mathbb{Z}^{n+1}$ for $(S_0,S_1,\dots,S_n)$ for which $S_k = s_k = s$ is given by:
\[
\left|\left\{\mathbf{s}\in\mathbb{Z}^{n+1} \mid s_0 = 0,\ s_k = s,\ s_n = a,\ |s_i - s_{i-1}| = 1,\ \forall i\right\}\right|.
\]
This expression can be rewritten as:
\[
\left|\left\{\mathbf{s}\in\mathbb{Z}^{k+1} \mid s_0 = 0,\ s_k = s,\ |s_i - s_{i-1}| = 1,\ \forall i\right\}\times\left\{\mathbf{r}\in\mathbb{Z}^{n-k+1} \mid r_0 = s,\ r_{n-k} = a,\ |r_i - r_{i-1}| = 1,\ \forall i\right\}\right|.
\]
Obviously, this is equal to the product of
\[
\left|\left\{\mathbf{s}\in\mathbb{Z}^{k+1} \mid s_0 = 0,\ s_k = s,\ |s_i - s_{i-1}| = 1,\ \forall i\right\}\right| = |A_s^k|
\]
and
\[
\left|\left\{\mathbf{r}\in\mathbb{Z}^{n-k+1} \mid r_0 = 0,\ r_{n-k} = a-s,\ |r_i - r_{i-1}| = 1,\ \forall i\right\}\right| = |A_{a-s}^{n-k}|.
\]
Thus, given $S_n = a$, the number of possible values $\mathbf{s}\in\mathbb{Z}^{n+1}$ for $(S_0,S_1,\dots,S_n)$ for which $S_k = s_k = s$ is given by $|A_s^k|\,|A_{a-s}^{n-k}|$.
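The counting argument above can be sanity-checked by brute force for small $n$. The following is a minimal sketch, not part of the original proof: the helper names `bridge_paths` and `count_formula` are hypothetical, and the enumeration is exponential in $n$, so it is only meant for tiny examples. It checks the closed form $|A_a^n| = \binom{n}{(a+n)/2}$ and the product rule for paths passing through $s$ at time $k$.

```python
from itertools import product
from math import comb


def bridge_paths(n, a):
    """Enumerate A_a^n: all paths (s_0, ..., s_n) with s_0 = 0, s_n = a, unit steps."""
    paths = []
    for steps in product((-1, 1), repeat=n):
        if sum(steps) == a:
            path = [0]
            for e in steps:
                path.append(path[-1] + e)
            paths.append(tuple(path))
    return paths


def count_formula(n, a):
    """Closed form |A_a^n| = C(n, (n + a) / 2); zero if a is not a possible value of S_n."""
    if abs(a) > n or (n + a) % 2 != 0:
        return 0
    return comb(n, (n + a) // 2)


if __name__ == "__main__":
    n, a, k = 8, 2, 4
    paths = bridge_paths(n, a)
    print("brute force:", len(paths), "formula:", count_formula(n, a))
    for s in range(-k, k + 1):
        # Paths in A_a^n that pass through s at time k, versus |A_s^k| * |A_{a-s}^{n-k}|.
        through = sum(1 for p in paths if p[k] == s)
        print(s, through, count_formula(k, s) * count_formula(n - k, a - s))
```

Dividing the product count by $|A_a^n|$ then gives the conditional law of $S_k$ given $S_n = a$.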
To obtain $g_a^{n,k}(s)$ this expression still has to be divided by the total number of all possibilities:
\[
\left|\left\{\mathbf{s}\in\mathbb{Z}^{n+1} \mid s_0 = 0,\ s_n = a,\ \text{and } |s_i - s_{i-1}| = 1,\ \forall i\right\}\right| = |A_a^n|.
\]
Hence, we can conclude that for all allowed values $s$ of $S_k$, we have:
\[
g_a^{n,k}(s) = \frac{|A_s^k|\,|A_{a-s}^{n-k}|}{|A_a^n|}. \tag{3.3}
\]
Notice that this is the density function of a hypergeometric distribution. This can be easily seen, using that $|A_a^n| = \binom{n}{(a+n)/2}$, which we already mentioned before. Next, let $h^{n,k}(z)$ denote the Lebesgue density function of the 1-dimensional Gaussian distribution with mean zero and variance $\frac{k(n-k)}{n}$. By Theorem 2.2.1, Lemma A.1.15 and Lemma A.1.16, there exists a joint $(\mu\otimes\lambda)$-density function $\psi_a^{n,k}(s,z)$ on $\mathbb{Z}\times\mathbb{R}$ such that:
\[
\int\psi_a^{n,k}(s,z)\,dz = g_a^{n,k}(s), \qquad \int\psi_a^{n,k}(s,z)\,\mu(ds) = h^{n,k}(z), \tag{3.4}
\]
and for all $\theta \le \theta_0$:
\[
\int\exp\left(\theta\left|s - \frac{ka}{n} - z\right|\right)\psi_a^{n,k}(s,z)\,(\mu\otimes\lambda)(ds,dz) \le M\exp\left(1 + \frac{c\theta^2 a^2}{n}\right). \tag{3.5}
\]
Indeed, by Theorem 2.2.1 we can construct a version of $W_k$ and a random variable $Z_k \sim N\left(0,\frac{k(n-k)}{n}\right)$ on the same probability space, such that for all $\theta \le \theta_0$:
\[
\mathbb{E}\exp(\theta|W_k - Z_k|) \le M\exp\left(1 + \frac{c\theta^2 S_n^2}{n}\right).
\]
Note that in the statement of Theorem 2.2.1 it is assumed that $n \ge 3$. In this theorem we assume that $n \ge 2$, but since we are already in the induction step, the assumptions of Theorem 2.2.1 are satisfied. Furthermore, let $S_k = W_k + \frac{ka}{n}$ and let $\psi_a^{n,k}$ be the joint density function of $(S_k, Z_k)$. This $(\mu\otimes\lambda)$-density function exists because of Lemma A.1.16. Then, obviously $\psi_a^{n,k}$ meets all the assumptions mentioned above.

Next, define a function $\gamma_a^n : \mathbb{Z}\times\mathbb{R}\times\mathbb{Z}^{k+1}\times\mathbb{R}^{k+1}\times\mathbb{Z}^{n-k+1}\times\mathbb{R}^{n-k+1}\to\mathbb{R}$ as follows:
\[
\gamma_a^n(s,z,\mathbf{s},\mathbf{z},\mathbf{s}',\mathbf{z}') := \psi_a^{n,k}(s,z)\,\rho_s^k(\mathbf{s},\mathbf{z})\,\rho_{a-s}^{n-k}(\mathbf{s}',\mathbf{z}'). \tag{3.6}
\]
Note that $\rho_s^k$ and $\rho_{a-s}^{n-k}$ exist by the induction hypothesis. Actually, they only exist for allowed values $s$ and $a-s$ for $S_k$ and $S_{n-k}$ respectively. However, if $s$ is not a possible value for $S_k$, then $\psi_a^{n,k}(s,z) = 0$.
If $a-s$ is not an allowed value for $S_{n-k}$, then either $a$ is not allowed for $S_n$ or $s$ is not allowed for $S_k$. In both of these cases $\psi_a^{n,k}(s,z) = 0$. Taking this into account, it is easy to verify that $\gamma_a^n$ is an $R_n$-probability density function. Indeed, integrating the expression in (3.6) over $\mathbf{s}',\mathbf{z}'$ gives us $\psi_a^{n,k}(s,z)\,\rho_s^k(\mathbf{s},\mathbf{z})$. Then, by integrating over $\mathbf{s},\mathbf{z}$, we obtain $\psi_a^{n,k}(s,z)$. And finally by integrating over $s,z$, we see that:
\[
\int\gamma_a^n(s,z,\mathbf{s},\mathbf{z},\mathbf{s}',\mathbf{z}')\,R_n(ds,dz,d\mathbf{s},d\mathbf{z},d\mathbf{s}',d\mathbf{z}') = 1.
\]
Let $(S,Z,\mathbf{S},\mathbf{Z},\mathbf{S}',\mathbf{Z}')$ be a random vector on $(\Omega,\mathcal{F},\mathbb{P})$ following the density $\gamma_a^n$. In fact, this means the following. First, we generate $(S,Z)$ from the joint density $\psi_a^{n,k}$. Then, given $S = s$ and $Z = z$, we independently generate the pairs $(\mathbf{S},\mathbf{Z})$ and $(\mathbf{S}',\mathbf{Z}')$ from the joint densities $\rho_s^k$ and $\rho_{a-s}^{n-k}$ respectively.

Now define two random vectors $\mathbf{Y}\in\mathbb{R}^{n+1}$ and $\mathbf{U}\in\mathbb{Z}^{n+1}$ as follows. For $i \le k$, let:
\[
Y_i = Z_i + \frac{i}{k}Z,
\]
and for $i \ge k$, let:
\[
Y_i = Z'_{i-k} + \frac{n-i}{n-k}Z.
\]
Note that these definitions coincide at $i = k$, since $Z_k = Z'_0 = 0$. To see that $Z_k = 0$, one can simply integrate the density function $\gamma_a^n$ over $\mathbf{s}',\mathbf{z}'$ first, then over $\mathbf{s}$ and finally over $s,z$. In this way, Lemma A.1.15 implies that the joint density function of $\mathbf{Z}$ is given by $\beta^k$. By definition of $\beta^k$, we obtain that $Z_k$ has a normal distribution with mean zero and variance $\frac{k(k-k)}{k} = 0$. Hence $\mathbb{E}[Z_k^2] = 0$, and thus $Z_k = 0$ almost surely. Besides, we can determine the probability density function of $\mathbf{Z}'$ by first integrating $\gamma_a^n$ over $\mathbf{s},\mathbf{z}$, then over $\mathbf{s}'$ and finally over $s,z$. We obtain that $\beta^{n-k}$ is the joint density function of $\mathbf{Z}'$. Hence, $Z'_0$ follows a normal distribution with mean zero and variance $\frac{0(n-k-0)}{n-k} = 0$. Thus $\mathbb{E}[Z_0'^2] = 0$, which implies that $Z'_0 = 0$ almost surely. Therefore, we have shown that the two definitions of $Y_i$ match at $i = k$. Next, define
\[
U_i = S_i \quad\text{for } i \le k,
\]
and
\[
U_i = S + S'_{i-k} \quad\text{for } i \ge k.
\]
Here too, the definitions coincide at $i = k$, since $S_k = S$ and $S'_0 = 0$. This is already intuitively clear since Lemma A.1.15 implies that $(\mathbf{S},\mathbf{Z})$ and $(\mathbf{S}',\mathbf{Z}')$ have conditional densities $\rho_s^k$ and $\rho_{a-s}^{n-k}$, given $S = s$ and $Z = z$. We can also give a formal explanation. Lemma A.1.15 states that, in order to determine the density function of $S_k$, we simply integrate $\gamma_a^n$ over $s, z, s_0,\dots,s_{k-1},\mathbf{z},\mathbf{s}',\mathbf{z}'$, which gives:
\[
\int_{\mathbb{Z}} g_a^{n,k}(s)\left(\int_{\mathbb{Z}^k} f_s^k(\mathbf{s})\,\mu^k(ds_0,\dots,ds_{k-1})\right)\mu(ds).
\]
By definition of $f_s^k$, the expression between the parentheses is the density function of the constant random variable $S_k = s$ evaluated at $s_k$. Thus the density function of $S_k$ is given by:
\[
\int_{\mathbb{Z}} g_a^{n,k}(s)\,I_{\{s\}}(s_k)\,\mu(ds) = g_a^{n,k}(s_k),
\]
which is the density function of $S$. Hence, $S_k \overset{d}{=} S$. An analogous reasoning can be made in order to see that $S'_0 = 0$ almost surely.

We claim that the joint density of $(\mathbf{U},\mathbf{Y})$ is a valid candidate for $\rho_a^n$. We will show this in three different steps.

1. Marginal distribution of $\mathbf{U}$. By Lemma A.1.15, we obtain the density function of $(S,\mathbf{S},\mathbf{S}')$ as follows:
\begin{align*}
&\int\gamma_a^n(s,z,\mathbf{s},\mathbf{z},\mathbf{s}',\mathbf{z}')\,(\lambda\otimes Q_{k,2}\otimes Q_{n-k,2})(dz,d\mathbf{z},d\mathbf{z}') \\
&\qquad= \left(\int\psi_a^{n,k}(s,z)\,dz\right)\left(\int\rho_s^k(\mathbf{s},\mathbf{z})\,Q_{k,2}(d\mathbf{z})\right)\left(\int\rho_{a-s}^{n-k}(\mathbf{s}',\mathbf{z}')\,Q_{n-k,2}(d\mathbf{z}')\right) \\
&\qquad= g_a^{n,k}(s)\,f_s^k(\mathbf{s})\,f_{a-s}^{n-k}(\mathbf{s}'),
\end{align*}
where we have used the definition of $\gamma_a^n$ and equations (3.2) and (3.4). This means that the distribution of the triplet $(S,\mathbf{S},\mathbf{S}')$ can be described in the following way. First generate $S$ from the density $g_a^{n,k}$. This means $S$ has the distribution of $S_k$ given $S_n = a$. Next, independently generate $\mathbf{S}$ and $\mathbf{S}'$ from the conditional densities $f_s^k$ and $f_{a-s}^{n-k}$ respectively. Using the expression for $g_a^{n,k}(s)$ obtained in (3.3) and the calculation above, we see that the joint density of $(S,\mathbf{S},\mathbf{S}')$ is given by:
\[
g_a^{n,k}(s)\,f_s^k(\mathbf{s})\,f_{a-s}^{n-k}(\mathbf{s}') = \frac{|A_s^k|\,|A_{a-s}^{n-k}|}{|A_a^n|}\cdot\frac{1}{|A_s^k|}\cdot\frac{1}{|A_{a-s}^{n-k}|} = \frac{1}{|A_a^n|}.
\]
By definition of $\mathbf{U}$ it is easy to verify that there is a one-to-one correspondence between $(S,\mathbf{S},\mathbf{S}')$ and $\mathbf{U}$.
This implies that every possible value of $\mathbf{U}$ occurs with probability $\frac{1}{|A_a^n|}$. If we can show that $\mathbf{U}$ can take all values in $A_a^n$, then we know $\mathbf{U}$ has density $f_a^n$. Assume the value of $S$ $(= S_k)$ is known and given by $b$. Then $\mathbf{S} = (S_0,\dots,S_k) \sim f_b^k$ and $\mathbf{S}' = (S'_0,\dots,S'_{n-k}) \sim f_{a-b}^{n-k}$. Thus $(U_0,\dots,U_k) = (S_0,\dots,S_k)$ can take all values in:
\[
A_b^k = \{\mathbf{s}\in\mathbb{Z}^{k+1} \mid s_0 = 0,\ s_k = b,\ |s_i - s_{i-1}| = 1,\ \forall i\},
\]
and $(U_k,\dots,U_n) = (b, b+S'_1,\dots,b+S'_{n-k})$ can take all values in:
\[
\{(s_k,\dots,s_n)\in\mathbb{Z}^{n-k+1} \mid s_k = b,\ s_n = a,\ |s_i - s_{i-1}| = 1,\ \forall i\}.
\]
This means $\mathbf{U}$ can take all values in:
\[
\{\mathbf{s}\in\mathbb{Z}^{n+1} \mid s_0 = 0,\ s_k = b,\ s_n = a,\ |s_i - s_{i-1}| = 1,\ \forall i\}.
\]
If we let $b$ vary over all possible values of $S = S_k$, we can conclude that $\mathbf{U}$ can take all values in $A_a^n$. This finishes the argument.

2. Marginal distribution of $\mathbf{Y}$. First, we claim that $Z$, $\mathbf{Z}$ and $\mathbf{Z}'$ are independent, with densities $h^{n,k}$, $\beta^k$ and $\beta^{n-k}$ respectively. What we need to show is that the joint density of $(Z,\mathbf{Z},\mathbf{Z}')$ is equal to the product of $h^{n,k}$, $\beta^k$ and $\beta^{n-k}$. Using Lemma A.1.15, this joint density function is obtained by integrating $\gamma_a^n$ over $s$, $\mathbf{s}$ and $\mathbf{s}'$, which yields:
\begin{align*}
&\int\gamma_a^n(s,z,\mathbf{s},\mathbf{z},\mathbf{s}',\mathbf{z}')\,(\mu\otimes\mu^{k+1}\otimes\mu^{n-k+1})(ds,d\mathbf{s},d\mathbf{s}') \\
&\qquad= \int\psi_a^{n,k}(s,z)\left(\int\rho_s^k(\mathbf{s},\mathbf{z})\,\mu^{k+1}(d\mathbf{s})\right)\left(\int\rho_{a-s}^{n-k}(\mathbf{s}',\mathbf{z}')\,\mu^{n-k+1}(d\mathbf{s}')\right)\mu(ds) \\
&\qquad= \int\psi_a^{n,k}(s,z)\,\beta^k(\mathbf{z})\,\beta^{n-k}(\mathbf{z}')\,\mu(ds) \\
&\qquad= \left(\int\psi_a^{n,k}(s,z)\,\mu(ds)\right)\beta^k(\mathbf{z})\,\beta^{n-k}(\mathbf{z}') \\
&\qquad= h^{n,k}(z)\,\beta^k(\mathbf{z})\,\beta^{n-k}(\mathbf{z}').
\end{align*}
Since $h^{n,k}$, $\beta^k$ and $\beta^{n-k}$ are densities of (multidimensional) Gaussian distributions with mean zero, and since $Z$, $\mathbf{Z}$ and $\mathbf{Z}'$ are independent, it follows that $(Z,\mathbf{Z},\mathbf{Z}')$ has a multidimensional Gaussian distribution with mean zero. It is easily seen that $\mathbf{Y}$ is a linear transformation of $(Z,\mathbf{Z},\mathbf{Z}')$, thus $\mathbf{Y}$ is also a Gaussian random vector with mean zero. The only thing left to prove is that $\operatorname{Cov}(Y_i,Y_j) = (i\wedge j)(n-(i\vee j))/n$. We will distinguish the following three cases: $i \le j \le k$, $k \le i \le j$ and $i \le k \le j$. Then it suffices to show that $\operatorname{Cov}(Y_i,Y_j) = i(n-j)/n$ in each case.
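Before going through the three cases, the target identity can be checked numerically in exact rational arithmetic. The sketch below is only an illustration under the stated independence of $Z$, $\mathbf{Z}$ and $\mathbf{Z}'$; the function names `bridge_cov` and `glued_cov` are hypothetical. It builds the covariance of $\mathbf{Y}$ directly from the definitions of $Y_i$ and compares it with the covariance belonging to $\beta^n$.

```python
from fractions import Fraction


def bridge_cov(m):
    """Covariance matrix (i ∧ j)(m - (i ∨ j)) / m of a vector with density beta^m."""
    return [[Fraction(min(i, j) * (m - max(i, j)), m) for j in range(m + 1)]
            for i in range(m + 1)]


def glued_cov(n, k):
    """Covariance of Y, where Y_i = Z_i + (i/k) Z for i <= k and
    Y_i = Z'_{i-k} + ((n-i)/(n-k)) Z for i >= k, assuming Z, (Z_i), (Z'_i)
    are independent with Var(Z) = k(n-k)/n."""
    var_z = Fraction(k * (n - k), n)
    left, right = bridge_cov(k), bridge_cov(n - k)

    def coeff(i):
        # Coefficient of the midpoint variable Z in Y_i.
        return Fraction(i, k) if i <= k else Fraction(n - i, n - k)

    def piece_cov(i, j):
        # Covariance between the bridge parts of Y_i and Y_j.
        if i <= k and j <= k:
            return left[i][j]
        if i >= k and j >= k:
            return right[i - k][j - k]
        return Fraction(0)  # the two bridges are independent

    return [[piece_cov(i, j) + coeff(i) * coeff(j) * var_z for j in range(n + 1)]
            for i in range(n + 1)]


if __name__ == "__main__":
    n, k = 9, 4
    print(glued_cov(n, k) == bridge_cov(n))
```

The check does not rely on $n/3 \le k \le 2n/3$; that restriction is only needed for the exponential bound in step 3.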
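(a) $i \le j \le k$. Recall that $Z$ has density $h^{n,k}$, which implies that $Z$ has variance $k(n-k)/n$. Moreover, since $\mathbf{Z}$ has density $\beta^k$, we know that $\operatorname{Cov}(Z_i,Z_j) = i(k-j)/k$. Using this, together with the independence of $Z$ and $\mathbf{Z}$, we obtain:
\begin{align*}
\operatorname{Cov}(Y_i,Y_j) &= \operatorname{Cov}\left(Z_i + \frac{i}{k}Z,\ Z_j + \frac{j}{k}Z\right) \\
&= \operatorname{Cov}(Z_i,Z_j) + \frac{j}{k}\operatorname{Cov}(Z_i,Z) + \frac{i}{k}\operatorname{Cov}(Z,Z_j) + \frac{ij}{k^2}\operatorname{Cov}(Z,Z) \\
&= \frac{i(k-j)}{k} + \frac{ij}{k^2}\,\frac{k(n-k)}{n} \\
&= \frac{i(n-j)}{n}.
\end{align*}

(b) $k \le i \le j$. As in part (a) we still have that $\operatorname{Var}(Z) = k(n-k)/n$. Since $\mathbf{Z}'$ has density $\beta^{n-k}$, we know that $\operatorname{Cov}(Z'_{i-k},Z'_{j-k}) = (i-k)(n-k-(j-k))/(n-k)$. Using this, together with the independence of $Z$ and $\mathbf{Z}'$, we have that:
\begin{align*}
\operatorname{Cov}(Y_i,Y_j) &= \operatorname{Cov}\left(Z'_{i-k} + \frac{n-i}{n-k}Z,\ Z'_{j-k} + \frac{n-j}{n-k}Z\right) \\
&= \operatorname{Cov}(Z'_{i-k},Z'_{j-k}) + \frac{(n-i)(n-j)}{(n-k)^2}\operatorname{Cov}(Z,Z) \\
&= \frac{(i-k)(n-j)}{n-k} + \frac{(n-i)(n-j)}{(n-k)^2}\,\frac{k(n-k)}{n} \\
&= \frac{i(n-j)}{n}.
\end{align*}

(c) $i \le k \le j$. Using the independence of $Z$, $\mathbf{Z}$ and $\mathbf{Z}'$, we see that:
\begin{align*}
\operatorname{Cov}(Y_i,Y_j) &= \operatorname{Cov}\left(Z_i + \frac{i}{k}Z,\ Z'_{j-k} + \frac{n-j}{n-k}Z\right) \\
&= \frac{i}{k}\,\frac{n-j}{n-k}\operatorname{Cov}(Z,Z) \\
&= \frac{i}{k}\,\frac{n-j}{n-k}\,\frac{k(n-k)}{n} = \frac{i(n-j)}{n}.
\end{align*}
This finishes the argument, and we get that $\mathbf{Y} \sim \beta^n$.

3. The exponential bound. For $0 \le i \le n$, let:
\[
W_i = U_i - \frac{ia}{n}.
\]
We have to show that for all $0 < \lambda < \lambda_0$:
\[
\mathbb{E}\exp\left(\lambda\max_{i\le n}|W_i - Y_i|\right) \le \exp\left(C\log n + \frac{K\lambda^2 a^2}{n}\right),
\]
where $C$, $K$ and $\lambda_0$ are as specified in the beginning of the proof. Define:
\[
T_L := \max_{i\le k}\left|S_i - \frac{iS}{k} - Z_i\right|, \qquad T_R := \max_{i\ge k}\left|S'_{i-k} - \frac{i-k}{n-k}(a-S) - Z'_{i-k}\right|,
\]
and
\[
T := \left|S - \frac{ka}{n} - Z\right|.
\]
We claim that:
\[
\max_{i\le n}|W_i - Y_i| \le \max\{T_L,T_R\} + T. \tag{3.7}
\]
To prove this claim, we distinguish two cases. First assume $i \le k$. Then, simply using the definitions of $W_i$, $Y_i$ and $U_i$, we see that:
\begin{align*}
|W_i - Y_i| &= \left|U_i - \frac{ia}{n} - \left(Z_i + \frac{iZ}{k}\right)\right| \\
&= \left|S_i - \frac{iS}{k} - Z_i + \frac{iS}{k} - \frac{ia}{n} - \frac{iZ}{k}\right| \\
&\le \left|S_i - \frac{iS}{k} - Z_i\right| + \left|\frac{iS}{k} - \frac{ia}{n} - \frac{iZ}{k}\right| \\
&\le T_L + \frac{i}{k}\left|S - \frac{ka}{n} - Z\right| \\
&\le \max\{T_L,T_R\} + T,
\end{align*}
where in the last step, we used that $i/k \le 1$. Now assume $i \ge k$.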
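Simply using the definitions and some straightforward calculations, we get that:
\begin{align*}
|W_i - Y_i| &= \left|S + S'_{i-k} - \frac{ia}{n} - \left(Z'_{i-k} + \frac{n-i}{n-k}Z\right)\right| \\
&\le \left|S'_{i-k} - \frac{i-k}{n-k}(a-S) - Z'_{i-k}\right| + \left|S + \frac{i-k}{n-k}(a-S) - \frac{ia}{n} - \frac{n-i}{n-k}Z\right| \\
&= \left|S'_{i-k} - \frac{i-k}{n-k}(a-S) - Z'_{i-k}\right| + \frac{n-i}{n-k}\left|S - \frac{ka}{n} - Z\right| \\
&\le T_R + \frac{n-i}{n-k}T \\
&\le \max\{T_L,T_R\} + T,
\end{align*}
where in the last inequality we used that $\frac{n-i}{n-k} \le 1$, since we assumed that $i \ge k$. This concludes the proof of (3.7).

Fix $0 < \lambda < \lambda_0$. Note that for any $x,y\in\mathbb{R}$ the following inequality holds:
\[
\exp(x\vee y) \le \exp x + \exp y. \tag{3.8}
\]
Using this crude bound and the fact that $\lambda > 0$, we have:
\begin{align*}
\exp(\lambda\max\{T_L,T_R\} + \lambda T) &= \exp(\lambda T_L\vee\lambda T_R)\exp(\lambda T) \\
&\le (\exp(\lambda T_L) + \exp(\lambda T_R))\exp(\lambda T) \\
&= \exp(\lambda T_L + \lambda T) + \exp(\lambda T_R + \lambda T).
\end{align*}
If we combine this with (3.7), we obtain:
\[
\exp\left(\lambda\max_{i\le n}|W_i - Y_i|\right) \le \exp(\lambda T_L + \lambda T) + \exp(\lambda T_R + \lambda T). \tag{3.9}
\]
By Lemma A.1.15 and by definition of $\gamma_a^n$ it is clear that the conditional density of $(\mathbf{S},\mathbf{Z})$ given $(S,Z) = (s,z)$ is simply $\rho_s^k$. By Remark A.1.11 we know that $\mathbb{E}[\exp(\lambda T_L) \mid S,Z] = g(S,Z)$, where:
\begin{align*}
g(s,z) &= \mathbb{E}[\exp(\lambda T_L) \mid S = s, Z = z] \\
&= \int\exp\left(\lambda\max_{i\le k}\left|s_i - \frac{is}{k} - z_i\right|\right)\rho_s^k(\mathbf{s},\mathbf{z})\,Q_k(d\mathbf{s},d\mathbf{z}) \\
&\le \exp\left(C\log k + \frac{K\lambda^2 s^2}{k}\right) =: \tilde g(s).
\end{align*}
Note that for the inequality, we used the induction hypothesis. As a consequence, we have:
\[
\mathbb{E}[\exp(\lambda T_L) \mid S,Z] \le \tilde g(S) = \exp\left(C\log k + \frac{K\lambda^2 S^2}{k}\right).
\]
By definition of $T$, it is clear that $\exp(\lambda T)$ is $\sigma\{S,Z\}$-measurable. Using properties (i) and (ii) of Proposition A.1.12, we obtain that:
\begin{align*}
\mathbb{E}\exp(\lambda T_L + \lambda T) &= \mathbb{E}[\exp(\lambda T_L)\exp(\lambda T)] \\
&= \mathbb{E}\left[\mathbb{E}[\exp(\lambda T_L)\exp(\lambda T) \mid S,Z]\right] \\
&= \mathbb{E}\left[\exp(\lambda T)\,\mathbb{E}[\exp(\lambda T_L) \mid S,Z]\right].
\end{align*}
Therefore, Hölder's inequality A.1.6 with $p = q = 2$ implies that:
\begin{align*}
\mathbb{E}\exp(\lambda T_L + \lambda T) &\le \left(\mathbb{E}\left[\mathbb{E}[\exp(\lambda T_L) \mid S,Z]^2\right]\mathbb{E}[\exp(\lambda T)^2]\right)^{1/2} \\
&\le \left(\mathbb{E}\left[\exp(2C\log k)\exp\left(\frac{2K\lambda^2 S^2}{k}\right)\right]\mathbb{E}\exp(2\lambda T)\right)^{1/2} \\
&= \exp(C\log k)\left(\mathbb{E}\exp\left(\frac{2K\lambda^2 S^2}{k}\right)\mathbb{E}\exp(2\lambda T)\right)^{1/2}.
\end{align*}
We wish to apply Lemma 2.2.4 to bound the first term between the parentheses. Since $S$ has density $g_a^{n,k}$, we know that $S$ plays the role of $S_k$ in Lemma 2.2.4.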
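Since $S$ follows the explicit law $g_a^{n,k}$ from (3.3), the bound of Lemma 2.2.4 in the form it is used here, $\mathbb{E}\exp(\alpha S_k^2/k) \le \exp(1 + 3\alpha a^2/(4n))$, can be evaluated exactly for small $n$. The sketch below is only a numerical sanity check, not a proof; the function names are hypothetical, and the value $\alpha_0 = 1/36$ is taken from the proof of Lemma 2.2.4.

```python
from math import comb, exp


def pinned_law(n, k, a):
    """Law g_a^{n,k} of S_k given S_n = a, from (3.3), as a dict {s: probability}."""
    def paths(m, b):
        # |A_b^m| = C(m, (m + b) / 2), or 0 if b is not a possible value of S_m.
        return comb(m, (m + b) // 2) if abs(b) <= m and (m + b) % 2 == 0 else 0

    total = paths(n, a)
    law = {}
    for s in range(-k, k + 1):
        count = paths(k, s) * paths(n - k, a - s)
        if count > 0:
            law[s] = count / total
    return law


def lemma_sides(n, k, a, alpha):
    """Return (E exp(alpha S_k^2 / k), exp(1 + 3 alpha a^2 / (4n))) for the pinned walk."""
    law = pinned_law(n, k, a)
    lhs = sum(p * exp(alpha * s * s / k) for s, p in law.items())
    rhs = exp(1 + 3 * alpha * a * a / (4 * n))
    return lhs, rhs


if __name__ == "__main__":
    alpha0 = 1 / 36
    for a in range(-12, 13, 2):
        lhs, rhs = lemma_sides(12, 6, a, alpha0)
        print(a, lhs, rhs, lhs <= rhs)
```

The argument `a` must be a possible value of $S_n$ (same parity as $n$ and $|a| \le n$); otherwise the conditional law is undefined.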
Note that $k \le 2n/3$ by assumption. We want to apply the lemma with $\alpha = 2K\lambda^2$, so we still have to verify that $2K\lambda^2 \le \alpha_0$. By the choice of $K$ and $\lambda_0$ in (3.1), we have:
\[
2K\lambda^2 \le 16c\lambda_0^2 \le 16c\,\frac{\alpha_0}{16c} = \alpha_0.
\]
Thus, Lemma 2.2.4 can be applied, and we find that:
\[
\mathbb{E}\exp\left(\frac{2K\lambda^2 S^2}{k}\right) \le \exp\left(1 + \frac{3K\lambda^2 a^2}{2n}\right).
\]
Besides, we have by (3.1) that $2\lambda \le 2\lambda_0 \le \theta_0$. Thus, by using inequality (3.5) with $\theta = 2\lambda$, we obtain:
\begin{align*}
\mathbb{E}\exp(2\lambda T) &= \mathbb{E}\exp\left(2\lambda\left|S - \frac{ka}{n} - Z\right|\right) \\
&= \int\exp\left(2\lambda\left|s - \frac{ka}{n} - z\right|\right)\psi_a^{n,k}(s,z)\,(\mu\otimes\lambda)(ds,dz) \\
&\le M\exp\left(1 + \frac{4c\lambda^2 a^2}{n}\right),
\end{align*}
where we have used that $\psi_a^{n,k}$ is the joint density of $(S,Z)$. Combining the last three steps, we have:
\begin{align*}
\mathbb{E}\exp(\lambda T_L + \lambda T) &\le \exp(C\log k)\left(\exp\left(1 + \frac{3K\lambda^2 a^2}{2n}\right)M\exp\left(1 + \frac{4c\lambda^2 a^2}{n}\right)\right)^{1/2} \\
&= M^{1/2}\exp\left(C\log k + 1 + \frac{(3K+8c)\lambda^2 a^2}{4n}\right).
\end{align*}
By definition of $K$ in (3.1), we see that $3K + 8c = 4K$. Since $k \le 2n/3$ we also have that:
\[
\log k = \log\left(\frac{k}{n}\,n\right) = \log n - \log(n/k) \le \log n - \log(3/2).
\]
Thus,
\[
\mathbb{E}\exp(\lambda T_L + \lambda T) \le M^{1/2}\exp\left(C\log n - C\log(3/2) + 1 + \frac{K\lambda^2 a^2}{n}\right).
\]
By symmetry, we can get the exact same bound on $\mathbb{E}\exp(\lambda T_R + \lambda T)$. When we combine this with the inequality in (3.9), we obtain:
\[
\mathbb{E}\exp\left(\lambda\max_{i\le n}|W_i - Y_i|\right) \le 2M^{1/2}\exp\left(C\log n - C\log(3/2) + 1 + \frac{K\lambda^2 a^2}{n}\right).
\]
Finally, by the choice of $C$ in (3.1), we have that:
\[
-C\log(3/2) + 1 + \log(2M^{1/2}) \le 0.
\]
Therefore, we can conclude that:
\begin{align*}
\mathbb{E}\exp\left(\lambda\max_{i\le n}|W_i - Y_i|\right) &\le \exp\left(C\log n - C\log(3/2) + 1 + \log(2M^{1/2}) + \frac{K\lambda^2 a^2}{n}\right) \\
&\le \exp\left(C\log n + \frac{K\lambda^2 a^2}{n}\right).
\end{align*}
This completes the induction step.

To complete the argument, we still have to prove the induction base. It is sufficient to show that, if we choose $C$ large enough and $\lambda_0$ small enough, the result is true for $n = 2$. On a suitable probability space it is possible to construct a Gaussian random vector $(Z_0,Z_1,Z_2)$ with mean zero and:
\[
\operatorname{Cov}(Z_i,Z_j) = \frac{(i\wedge j)(2-(i\vee j))}{2}.
\]
On that same probability space, we construct a version of $(W_0,W_1,W_2)$ and we determine:
\[
\mathbb{E}\exp\left(\lambda_0\max_{i\le 2}|W_i - Z_i|\right).
\]
It suffices to prove that this expression is finite for a certain $\lambda_0$. Indeed, in that case it is possible to choose $C$ large enough, such that for all $\lambda < \lambda_0$, and all $K \ge 0$:
\begin{align*}
\mathbb{E}\exp\left(\lambda\max_{i\le 2}|W_i - Z_i|\right) &\le \mathbb{E}\exp\left(\lambda_0\max_{i\le 2}|W_i - Z_i|\right) \\
&\le \exp(C\log 2) \\
&\le \exp\left(C\log 2 + \frac{K\lambda^2 S_2^2}{2}\right).
\end{align*}
We will show that $\mathbb{E}\exp(\lambda_0\max_{i\le 2}|W_i - Z_i|)$ is finite for any $\lambda_0 > 0$. Using the fact that $W_0 = Z_0 = W_2 = Z_2 = 0$, we see that:
\[
\mathbb{E}\exp\left(\lambda_0\max_{i\le 2}|W_i - Z_i|\right) = \mathbb{E}\exp(\lambda_0|W_1 - Z_1|) \le \mathbb{E}\exp(\lambda_0|W_1| + \lambda_0|Z_1|).
\]
Note that $|W_1| = |S_1 - \frac{1}{2}S_2| \le 1 + 2/2 = 2$. Therefore, we can conclude that:
\begin{align*}
\mathbb{E}\exp\left(\lambda_0\max_{i\le 2}|W_i - Z_i|\right) &\le \mathbb{E}\exp(2\lambda_0 + \lambda_0|Z_1|) \\
&= \exp(2\lambda_0)\,\mathbb{E}\exp(\lambda_0|Z_1|) \\
&\le 2\exp(2\lambda_0 + \lambda_0^2/2) < \infty,
\end{align*}
since $Z_1$ has a centered normal distribution. Consequently, we can find a suitable $C$, which finishes the proof.

3.2 Completing the proof of the main theorem

In this section, we will use former results to complete the proof of Theorem 2 from the introduction. The next lemma uses Theorem 3.1.1 and Theorem 2.1.1 to give a 'finite $n$ version' of Theorem 2.

Lemma 3.2.1. There exist universal constants $B > 1$ and $\lambda > 0$ such that the following is true. Let $n$ be a strictly positive integer and let $\varepsilon_1,\varepsilon_2,\dots,\varepsilon_n$ be i.i.d. symmetric $\pm 1$-valued random variables. Let $S_k = \sum_{i=1}^{k}\varepsilon_i$, for $k = 0,1,\dots,n$. It is possible to construct a version of the sequence $(S_k)_{k\le n}$ and random variables $(Z_k)_{k\le n}$ with a joint Gaussian distribution, mean zero and $\operatorname{Cov}(Z_i,Z_j) = i\wedge j$ on the same probability space such that $\mathbb{E}\exp(\lambda|S_n - Z_n|) \le B$ and
\[
\mathbb{E}\exp\left(\lambda\max_{k\le n}|S_k - Z_k|\right) \le B\exp(B\log n).
\]

Proof. Recall the universal constants $\theta_0$ and $\kappa$ from Theorem 2.1.1 and $C$, $K$ and $\lambda_0$ from Theorem 3.1.1. Choose $\lambda > 0$ sufficiently small such that:
\[
\lambda < \frac{\theta_0\wedge\lambda_0}{2} \qquad\text{and}\qquad 8K\lambda^2 < 1.
\]
Let the probability density functions $f_a^n$, $\rho_a^n$ and $\beta^n$ be as in the proof of Theorem 3.1.1. Recall the notation introduced in 3.1.3. Let $g^n$ denote the $\mu$-density of $S_n$ and let $h^n$ denote the Lebesgue density of $Z_n$.
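By Theorem 2.1.1, Lemma A.1.15 and Lemma A.1.16, there exists a joint $(\mu\otimes\lambda)$-density function $\psi^n$ on $\mathbb{Z}\times\mathbb{R}$ such that:
\[
\int\psi^n(s,z)\,dz = g^n(s), \qquad \int\psi^n(s,z)\,\mu(ds) = h^n(z),
\]
and
\[
\int\exp(\theta_0|s-z|)\,\psi^n(s,z)\,(\mu\otimes\lambda)(ds,dz) \le \kappa.
\]
By the choice of $\lambda$, this last inequality clearly implies that:
\[
\int\exp(2\lambda|s-z|)\,\psi^n(s,z)\,(\mu\otimes\lambda)(ds,dz) \le \kappa. \tag{3.10}
\]
Now let $\gamma^n : \mathbb{Z}\times\mathbb{R}\times\mathbb{Z}^{n+1}\times\mathbb{R}^{n+1}\to\mathbb{R}$ be defined as follows:
\[
\gamma^n(s,z,\mathbf{s},\mathbf{z}) := \psi^n(s,z)\,\rho_s^n(\mathbf{s},\mathbf{z}).
\]
To see that this is a $((\mu\otimes\lambda)\otimes Q_n)$-probability density function, we can integrate $\gamma^n$ over $s,z,\mathbf{s},\mathbf{z}$, which gives:
\begin{align*}
\int\gamma^n(s,z,\mathbf{s},\mathbf{z})\,((\mu\otimes\lambda)\otimes Q_n)(ds,dz,d\mathbf{s},d\mathbf{z}) &= \int\psi^n(s,z)\left(\int\rho_s^n(\mathbf{s},\mathbf{z})\,Q_n(d\mathbf{s},d\mathbf{z})\right)(\mu\otimes\lambda)(ds,dz) \\
&= \int\psi^n(s,z)\,(\mu\otimes\lambda)(ds,dz) = 1.
\end{align*}
Let $(S,Z,\mathbf{S},\mathbf{Z})$ be a random vector following the density $\gamma^n$. By Lemma A.1.15, the joint density of $(Z,\mathbf{Z})$ can be found by integrating $\gamma^n$ over $s$ and $\mathbf{s}$:
\begin{align*}
\int\gamma^n(s,z,\mathbf{s},\mathbf{z})\,(\mu\otimes\mu^{n+1})(ds,d\mathbf{s}) &= \int\psi^n(s,z)\left(\int\rho_s^n(\mathbf{s},\mathbf{z})\,\mu^{n+1}(d\mathbf{s})\right)\mu(ds) \\
&= \int\psi^n(s,z)\,\beta^n(\mathbf{z})\,\mu(ds) \\
&= \left(\int\psi^n(s,z)\,\mu(ds)\right)\beta^n(\mathbf{z}) \\
&= h^n(z)\,\beta^n(\mathbf{z}).
\end{align*}
Thus, we have proved that $Z$ and $\mathbf{Z}$ are independent with density functions $h^n$ and $\beta^n$ respectively. Define a random vector $\mathbf{Y} = (Y_0,\dots,Y_n)$ as:
\[
Y_i = Z_i + \frac{i}{n}Z.
\]
Since $Z$ and $\mathbf{Z}$ are independent and have (multidimensional) normal distributions, we know that $(Z,\mathbf{Z})$ is multidimensional normally distributed too. Note that $\mathbf{Y}$ is a linear transformation of $(Z,\mathbf{Z})$, which can be seen as follows:
\[
\mathbf{Y} = \begin{pmatrix} Y_0 \\ Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}
= \begin{pmatrix}
0 & 1 & 0 & 0 & \cdots & 0 \\
1/n & 0 & 1 & 0 & \cdots & 0 \\
2/n & 0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
1 & 0 & 0 & 0 & \cdots & 1
\end{pmatrix}
\begin{pmatrix} Z \\ Z_0 \\ Z_1 \\ \vdots \\ Z_n \end{pmatrix}.
\]
Hence, $\mathbf{Y}$ is a Gaussian random vector. Moreover, since $(Z,\mathbf{Z})$ is a mean zero random vector, the same is true for $\mathbf{Y}$. Now we will determine $\operatorname{Cov}(Y_i,Y_j)$. Since $\mathbf{Z}$ follows the density $\beta^n$, we know that $\operatorname{Cov}(Z_i,Z_j) = (i\wedge j)(n-(i\vee j))/n$. Besides we know that $h^n$ is the density function of $Z$, and thus $Z \sim N(0,n)$.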
Combining these two facts with the independence of $Z$ and $\mathbf{Z}$ yields that for $i \le j$:
\begin{align*}
\operatorname{Cov}(Y_i,Y_j) &= \operatorname{Cov}\left(Z_i + \frac{i}{n}Z,\ Z_j + \frac{j}{n}Z\right) \\
&= \operatorname{Cov}(Z_i,Z_j) + \frac{ij}{n^2}\operatorname{Cov}(Z,Z) \\
&= \frac{i(n-j)}{n} + \frac{ij}{n^2}\,n = i.
\end{align*}
Clearly, if $j \le i$, we obtain that $\operatorname{Cov}(Y_i,Y_j) = j$. Hence, we can conclude that $\operatorname{Cov}(Y_i,Y_j) = i\wedge j$.

Next, integrating out $z$ and $\mathbf{z}$ we obtain that the joint $(\mu\otimes\mu^{n+1})$-density of $(S,\mathbf{S})$ is given by:
\[
g^n(s)\,f_s^n(\mathbf{s}).
\]
This means that the conditional density of $\mathbf{S}$ given $S = s$ is $f_s^n$. Note that we have used Lemma A.1.15 here. It should now be clear that the distribution of $\mathbf{S}$ is the same as that of a simple random walk up to time $n$. For completeness, we will give an argument for this. By the above we see that given $S = s$, we know that $S_n = s$. An example of a realization of $(S_0,\dots,S_n)$ is the following:
\[
(0, 1, 2, 1, 2, 3, 4, 3, \dots, s+1, s).
\]
In general, a realization of $(S_0,\dots,S_n)$ has the following form:
\[
\left(0,\ \varepsilon_{\pi(1)},\ \varepsilon_{\pi(1)} + \varepsilon_{\pi(2)},\ \dots,\ s\right),
\]
where the $\varepsilon_i\in\{-1,+1\}$ are numbers. Note that the number of ones in $\{\varepsilon_1,\dots,\varepsilon_n\}$ is completely determined by $s$ (and $n$). By letting $\pi$ vary over different permutations, we see that $(S_0,\dots,S_n)$ can be any random walk up to time $n$ that ends up in $s$. If we then let $s$ vary over all possible values for $S_n$, we see that $\mathbf{S}$ has the distribution of a simple random walk up to time $n$.

We will show that the pair $(\mathbf{S},\mathbf{Y})$ satisfies the conditions of the lemma. First, let $W_i = S_i - \frac{i}{n}S$. Then, for any $i \le n$:
\[
|S_i - Y_i| = \left|S_i - Z_i - \frac{i}{n}Z\right| = \left|W_i + \frac{i}{n}S - Z_i - \frac{i}{n}Z\right| \le |W_i - Z_i| + \frac{i}{n}|S - Z|.
\]
Using this bound together with the fact that $\lambda > 0$ and parts (i) and (ii) of Proposition A.1.12, we see that:
\begin{align*}
\mathbb{E}\exp\left(\lambda\max_{i\le n}|S_i - Y_i|\right) &\le \mathbb{E}\exp\left(\lambda\max_{i\le n}\left(|W_i - Z_i| + \frac{i}{n}|S - Z|\right)\right) \\
&\le \mathbb{E}\left[\exp\left(\lambda\max_{i\le n}|W_i - Z_i|\right)\exp(\lambda|S - Z|)\right] \\
&= \mathbb{E}\left[\mathbb{E}\left[\exp\left(\lambda\max_{i\le n}|W_i - Z_i|\right)\exp(\lambda|S - Z|) \,\middle|\, S,Z\right]\right] \\
&= \mathbb{E}\left[\exp(\lambda|S - Z|)\,\mathbb{E}\left[\exp\left(\lambda\max_{i\le n}|W_i - Z_i|\right) \,\middle|\, S,Z\right]\right].
\end{align*}
Using Hölder's inequality A.1.6, we obtain the following bound:
\[
\mathbb{E}\exp\left(\lambda\max_{i\le n}|S_i - Y_i|\right) \le \left(\mathbb{E}\left[\mathbb{E}\left[\exp\left(\lambda\max_{i\le n}|W_i - Z_i|\right) \,\middle|\, S,Z\right]^2\right]\mathbb{E}\left[\exp(\lambda|S - Z|)^2\right]\right)^{1/2}.
\]
We will bound the two terms between the parentheses. First, note that by Lemma A.1.15, the conditional distribution of $(\mathbf{S},\mathbf{Z})$ given $(S,Z) = (s,z)$ is simply $\rho_s^n$. Combining this with Remark A.1.11, we know that $\mathbb{E}[\exp(\lambda\max_{i\le n}|W_i - Z_i|) \mid S,Z] = g(S,Z)$, where:
\begin{align*}
g(s,z) &= \mathbb{E}\left[\exp\left(\lambda\max_{i\le n}|W_i - Z_i|\right) \,\middle|\, S = s, Z = z\right] \\
&= \int\exp\left(\lambda\max_{i\le n}\left|s_i - \frac{is}{n} - z_i\right|\right)\rho_s^n(\mathbf{s},\mathbf{z})\,Q_n(d\mathbf{s},d\mathbf{z}) \\
&\le \exp\left(C\log n + \frac{K\lambda^2 s^2}{n}\right) =: \tilde g(s).
\end{align*}
Note that the inequality follows from the construction of $\rho_s^n$ and from the fact that $0 < \lambda < \lambda_0$. We can now conclude that:
\[
\mathbb{E}\left[\exp\left(\lambda\max_{i\le n}|W_i - Z_i|\right) \,\middle|\, S,Z\right] \le \tilde g(S) = \exp\left(C\log n + \frac{K\lambda^2 S^2}{n}\right).
\]
Besides, we can use (3.10) to obtain a bound on the second term between the parentheses. Indeed, by Lemma A.1.15, $\psi^n$ is the joint density function of $(S,Z)$ and we see that:
\[
\mathbb{E}\left[\exp(\lambda|S - Z|)^2\right] = \mathbb{E}\exp(2\lambda|S - Z|) = \int\exp(2\lambda|s-z|)\,\psi^n(s,z)\,(\mu\otimes\lambda)(ds,dz) \le \kappa.
\]
Using the two bounds, we obtain that:
\begin{align*}
\mathbb{E}\exp\left(\lambda\max_{i\le n}|S_i - Y_i|\right) &\le \left(\mathbb{E}\exp\left(2C\log n + \frac{2K\lambda^2 S^2}{n}\right)\kappa\right)^{1/2} \\
&= \exp(C\log n)\left(\kappa\,\mathbb{E}\exp\left(2K\lambda^2 S^2/n\right)\right)^{1/2}.
\end{align*}
Since $S$ has density function $g^n$, we know that $S$ plays the same role as the $S$ in inequality (2.3). We want to use this inequality with $4\theta^2 = 2K\lambda^2$, so we need to verify that $8K\lambda^2 < 1$. By the choice of $\lambda$ in the beginning of this proof, this is true, and inequality (2.3) can be applied. We get:
\[
\mathbb{E}\exp\left(2K\lambda^2 S^2/n\right) \le \frac{1}{\sqrt{1-8K\lambda^2}}.
\]
This implies that:
\[
\mathbb{E}\exp\left(\lambda\max_{i\le n}|S_i - Y_i|\right) \le \exp(C\log n)\left(\frac{\kappa}{\sqrt{1-8K\lambda^2}}\right)^{1/2}.
\]
Now choose $B$ large enough such that:
\[
B \ge \max\left\{C,\ \left(\frac{\kappa}{\sqrt{1-8K\lambda^2}}\right)^{1/2},\ \kappa\right\}.
\]
With this choice of $B$, it is clear that:
\[
\mathbb{E}\exp\left(\lambda\max_{i\le n}|S_i - Y_i|\right) \le \exp(B\log n)\,B,
\]
which concludes the proof of the second inequality of the lemma.
Thus, up to this point, we have constructed a random vector $\mathbf{S}$ with the same distribution as that of a simple random walk up to time $n$ and a Gaussian random vector $\mathbf{Y}$ with mean zero and $\operatorname{Cov}(Y_i,Y_j) = i\wedge j$, satisfying the inequality above. The only thing left to prove is the bound $\mathbb{E}\exp(\lambda|S_n - Y_n|) \le B$. First, we show that $Y_n = Z$. By definition of $Y_n$, it suffices to prove that $Z_n = 0$. Since the joint density of $\mathbf{Z}$ is given by $\beta^n$, it follows that $Z_n$ is a Gaussian random variable with mean zero and $\operatorname{Var}(Z_n) = 0$. Hence $\mathbb{E}[Z_n^2] = 0$, which implies that $Z_n = 0$ almost surely. Consequently $Y_n = Z$, thus it remains to prove that $\mathbb{E}\exp(\lambda|S_n - Z|) \le B$. By construction of $(S,Z,\mathbf{S},\mathbf{Z})$ we know that $S_n = S$ and that $(S,Z)$ has joint density $\psi^n$. By the choice of $B$ and by inequality (3.10) we can conclude that:
\[
\mathbb{E}\exp(\lambda|S_n - Z|) \le \mathbb{E}\exp(2\lambda|S - Z|) = \int\exp(2\lambda|s-z|)\,\psi^n(s,z)\,(\mu\otimes\lambda)(ds,dz) \le \kappa \le B.
\]
This finishes the proof.

We will now give a proof of the main theorem.

Theorem 3.2.2. Let $\varepsilon_1,\varepsilon_2,\dots$ be a sequence of i.i.d. symmetric $\pm 1$-valued random variables. Let $S_k = \sum_{i=1}^{k}\varepsilon_i$, for each $k \ge 0$. It is possible to construct a version of the sequence $(S_k)_{k\ge 0}$ and a standard Brownian motion $(B_t)_{t\ge 0}$ on the same probability space such that for all $n$ and all $x \ge 0$:
\[
\mathbb{P}\left(\max_{k\le n}|S_k - B_k| \ge C\log n + x\right) \le Ke^{-\lambda x},
\]
where $C$, $K$ and $\lambda$ do not depend on $n$.

Proof. Let $m_0 = 0$. For $r = 1,2,\dots$ let $m_r = 2^{2^r}$, and $n_r = m_r - m_{r-1}$. For each $r$ let $(S_k^{(r)}, Z_k^{(r)})_{0\le k\le n_r}$ be a random vector satisfying the conclusions of Lemma 3.2.1. We can assume these random vectors are independent. Inductively define an infinite sequence $(S_k,Z_k)_{k\ge 0}$ in the following way. Let $S_k = S_k^{(1)}$ and $Z_k = Z_k^{(1)}$ for $k \le m_1$. Assume $(S_k,Z_k)_{k\le m_{r-1}}$ has been defined, then define $(S_k,Z_k)_{m_{r-1}<k\le m_r}$ as:
\[
S_k := S^{(r)}_{k-m_{r-1}} + S_{m_{r-1}} \qquad\text{and}\qquad Z_k := Z^{(r)}_{k-m_{r-1}} + Z_{m_{r-1}}.
\]
We will prove that $(S_k)_{k\ge 0}$ is a simple random walk and $(Z_k)_{k\ge 0}$ is a Brownian motion.
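The block structure of this construction can be sketched in a few lines. This is an illustration only: the blocks below are plain independent simple random walks used as a stand-in, not the coupled pairs $(S_k^{(r)}, Z_k^{(r)})$ provided by Lemma 3.2.1, and all function names are hypothetical. The sketch shows the block endpoints $m_r = 2^{2^r}$ and the gluing rule $x_k = x^{(r)}_{k-m_{r-1}} + x_{m_{r-1}}$.

```python
import random


def block_endpoints(r_max):
    """Return [m_0, m_1, ..., m_{r_max}] with m_0 = 0 and m_r = 2^(2^r)."""
    return [0] + [2 ** (2 ** r) for r in range(1, r_max + 1)]


def concatenate_blocks(blocks):
    """Glue blocks (x_0^{(r)}, ..., x_{n_r}^{(r)}), each starting at 0, into one
    sequence via x_k = x^{(r)}_{k - m_{r-1}} + x_{m_{r-1}}."""
    seq = [0]
    for block in blocks:
        offset = seq[-1]  # value x_{m_{r-1}} at the end of the previous block
        seq.extend(offset + x for x in block[1:])
    return seq


def random_walk_block(length, rng):
    """Stand-in block: a simple random walk (0, S_1, ..., S_length)."""
    walk = [0]
    for _ in range(length):
        walk.append(walk[-1] + rng.choice((-1, 1)))
    return walk


if __name__ == "__main__":
    rng = random.Random(2015)
    m = block_endpoints(3)
    sizes = [m[r] - m[r - 1] for r in range(1, len(m))]  # the block lengths n_r
    blocks = [random_walk_block(size, rng) for size in sizes]
    walk = concatenate_blocks(blocks)
    print("endpoints:", m, "block sizes:", sizes, "total length:", len(walk))
```

Because $m_r$ grows doubly exponentially, $\log m_r = 2^r\log 2$, so the number of blocks needed to reach time $n$ is of order $\log\log n$.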
Recall that we assumed the random vectors $(S_k^{(r)})_{0\le k\le n_r}$ are independent for different $r$. Thus by construction of the sequence $(S_k)_{k\ge0}$, it is clear that there exists a sequence $(\epsilon_i)_{i\ge1}$ of i.i.d. symmetric $\pm1$-valued random variables, such that $S_k=\sum_{i=1}^k\epsilon_i$ for each $k$. Hence, $(S_k)_{k\ge0}$ is a simple random walk process.

To prove that $(Z_k)_{k\ge0}$ is a Brownian motion, first note that $Z_0=Z_0^{(1)}=0$ by construction. It is a well-known fact that it is now sufficient to prove that $(Z_k)_{k\ge0}$ is a centered Gaussian process¹ with $\mathrm{Cov}(Z_i,Z_j)=i\wedge j$, for all $i$ and $j$. Since $(Z_k^{(r)})_{0\le k\le n_r}$ has a multidimensional normal distribution for each $r\ge1$, and since these random vectors are independent for distinct $r$, we have that:
\[
Z_0^{(1)},\dots,Z_{n_1}^{(1)},\ Z_0^{(2)},\dots,Z_{n_2}^{(2)},\ \dots
\]
is a Gaussian process. Now, let $k\ge1$ and $i_1,\dots,i_k\ge0$. By construction of $(Z_k)_{k\ge0}$, we see that $(Z_{i_1},\dots,Z_{i_k})$ is a linear transformation of a finite subvector of the Gaussian process above. Hence, $(Z_{i_1},\dots,Z_{i_k})$ has a $k$-dimensional normal distribution, which shows that $(Z_k)_{k\ge0}$ is a Gaussian process. Besides, since all random variables $Z_k^{(r)}$ have mean zero, the same holds for the random variables $Z_k$. Therefore, $(Z_k)_{k\ge0}$ is a centered process.

¹ A stochastic process $(Y_i)_{i\in I}$ is called a Gaussian process if for any $k\ge1$ and for each choice $i_1,i_2,\dots,i_k\in I$, the random vector $(Y_{i_1},Y_{i_2},\dots,Y_{i_k})$ has a $k$-dimensional normal distribution.

Finally we will prove that $\mathrm{Cov}(Z_i,Z_j)=i\wedge j$, for all $i,j\ge0$. Since $Z_0=0$, this statement is obviously true when $i=0$ or $j=0$. Thus we can assume that $i,j>0$. We will first show that $\mathrm{Var}(Z_{m_r})=m_r$, for each $r\ge0$. We will use induction on $r$. By construction of the random variables $Z_k^{(r)}$ in Lemma 3.2.1, it is clear that $\mathrm{Var}(Z_0)=0$ and $\mathrm{Var}(Z_{m_1})=\mathrm{Var}(Z^{(1)}_{m_1})=m_1$. Now assume $\mathrm{Var}(Z_{m_{r-1}})=m_{r-1}$. Recall that $\mathrm{Var}(Z^{(r)}_{n_r})=n_r$ by construction.
Using independence and the induction hypothesis, we obtain that:
\[
\mathrm{Var}(Z_{m_r}) = \mathrm{Var}\bigl(Z_{m_{r-1}}+Z^{(r)}_{n_r}\bigr) = \mathrm{Var}(Z_{m_{r-1}})+\mathrm{Var}\bigl(Z^{(r)}_{n_r}\bigr) = m_{r-1}+n_r = m_r.
\]
Let:
\[
i=m_{r-1}+l,\ \text{with } 0<l\le n_r, \quad\text{and}\quad j=m_{r'-1}+l',\ \text{with } 0<l'\le n_{r'}.
\]
Without loss of generality we can assume that $i\le j$. This implies we can distinguish two cases.

(i) $r<r'$. Using independence, we obtain:
\begin{align*}
\mathrm{Cov}(Z_i,Z_j) &= \mathrm{Cov}\bigl(Z_{m_{r-1}}+Z_l^{(r)},\ Z_{m_{r'-1}}+Z_{l'}^{(r')}\bigr) \\
&= \mathrm{Cov}\bigl(Z_{m_{r-1}}+Z_l^{(r)},\ Z_{m_r}\bigr) \\
&= \mathrm{Cov}\bigl(Z_{m_{r-1}}+Z_l^{(r)},\ Z_{m_{r-1}}+Z_{n_r}^{(r)}\bigr) \\
&= \mathrm{Var}(Z_{m_{r-1}})+\mathrm{Cov}\bigl(Z_l^{(r)},Z_{n_r}^{(r)}\bigr) \\
&= m_{r-1}+(l\wedge n_r) = m_{r-1}+l = i.
\end{align*}

(ii) $r=r'$ and $l\le l'$. Again using independence and the construction of $(Z_k^{(r)})_{0\le k\le n_r}$, we obtain:
\begin{align*}
\mathrm{Cov}(Z_i,Z_j) &= \mathrm{Cov}\bigl(Z_{m_{r-1}}+Z_l^{(r)},\ Z_{m_{r-1}}+Z_{l'}^{(r)}\bigr) \\
&= \mathrm{Cov}(Z_{m_{r-1}},Z_{m_{r-1}})+\mathrm{Cov}\bigl(Z_l^{(r)},Z_{l'}^{(r)}\bigr) \\
&= \mathrm{Var}(Z_{m_{r-1}})+(l\wedge l') = m_{r-1}+l = i.
\end{align*}
This completes the argument that $(Z_k)_{k\ge0}$ is a Brownian motion.

Recall the constants $B$ and $\lambda$ from Lemma 3.2.1. First, note that for each $r$, we have:
\begin{align*}
Z_{m_r} &= Z_{m_{r-1}}+Z^{(r)}_{n_r} \\
&= Z_{m_{r-2}}+Z^{(r-1)}_{n_{r-1}}+Z^{(r)}_{n_r} \\
&= \dots \\
&= Z_{m_1}+Z^{(2)}_{n_2}+\dots+Z^{(r-1)}_{n_{r-1}}+Z^{(r)}_{n_r} \\
&= Z^{(1)}_{n_1}+Z^{(2)}_{n_2}+\dots+Z^{(r-1)}_{n_{r-1}}+Z^{(r)}_{n_r} = \sum_{l=1}^r Z^{(l)}_{n_l}.
\end{align*}
Analogously, we see that $S_{m_r}=\sum_{l=1}^r S^{(l)}_{n_l}$. Using this together with the independence of the random vectors $(S_k^{(r)},Z_k^{(r)})_{0\le k\le n_r}$ and the fact that $\lambda$ is positive, we obtain that:
\begin{align*}
E\exp(\lambda|S_{m_r}-Z_{m_r}|) &\le E\exp\Bigl(\lambda\sum_{l=1}^r\bigl|S^{(l)}_{n_l}-Z^{(l)}_{n_l}\bigr|\Bigr) \\
&= E\Bigl[\prod_{l=1}^r\exp\bigl(\lambda\bigl|S^{(l)}_{n_l}-Z^{(l)}_{n_l}\bigr|\bigr)\Bigr] \\
&= \prod_{l=1}^r E\exp\bigl(\lambda\bigl|S^{(l)}_{n_l}-Z^{(l)}_{n_l}\bigr|\bigr) \le B^r. \tag{3.11}
\end{align*}
Note that the last inequality simply follows from Lemma 3.2.1. Now, let:
\[
c = \frac{1}{1-\frac{1}{B}\exp\bigl(-\frac12B\log4\bigr)}.
\]
We will show by induction that the following holds for each $r\ge1$:
\[
E\exp\Bigl(\lambda\max_{k\le m_r}|S_k-Z_k|\Bigr) \le cB^r\exp(B\log m_r). \tag{3.12}
\]
By the facts that $B>1$ and $c>1$ and by Lemma 3.2.1, this inequality holds for $r=1$. Indeed, we have:
\begin{align*}
E\exp\Bigl(\lambda\max_{k\le m_1}|S_k-Z_k|\Bigr) &= E\exp\Bigl(\lambda\max_{k\le n_1}\bigl|S_k^{(1)}-Z_k^{(1)}\bigr|\Bigr) \\
&\le B\exp(B\log n_1) = B\exp(B\log m_1) \le cB\exp(B\log m_1).
\end{align*}
Now suppose inequality (3.12) holds for $r-1$. By (3.8) and the fact that $\lambda$ is positive, we have:
\begin{align*}
E\exp\Bigl(\lambda\max_{k\le m_r}|S_k-Z_k|\Bigr) &= E\exp\Bigl(\bigl(\lambda\max_{m_{r-1}<k\le m_r}|S_k-Z_k|\bigr)\vee\bigl(\lambda\max_{k\le m_{r-1}}|S_k-Z_k|\bigr)\Bigr) \\
&\le E\exp\Bigl(\lambda\max_{m_{r-1}<k\le m_r}|S_k-Z_k|\Bigr)+E\exp\Bigl(\lambda\max_{k\le m_{r-1}}|S_k-Z_k|\Bigr).
\end{align*}
Let us consider the first term. For each $k$ with $m_{r-1}<k\le m_r$, we have:
\begin{align*}
|S_k-Z_k| &= \bigl|S^{(r)}_{k-m_{r-1}}+S_{m_{r-1}}-Z^{(r)}_{k-m_{r-1}}-Z_{m_{r-1}}\bigr| \\
&\le \bigl|S^{(r)}_{k-m_{r-1}}-Z^{(r)}_{k-m_{r-1}}\bigr|+|S_{m_{r-1}}-Z_{m_{r-1}}| \\
&\le \max_{1\le j\le n_r}\bigl|S_j^{(r)}-Z_j^{(r)}\bigr|+|S_{m_{r-1}}-Z_{m_{r-1}}|.
\end{align*}
Hence,
\[
\max_{m_{r-1}<k\le m_r}|S_k-Z_k| \le \max_{1\le j\le n_r}\bigl|S_j^{(r)}-Z_j^{(r)}\bigr|+|S_{m_{r-1}}-Z_{m_{r-1}}|.
\]
Recall that $S_{m_{r-1}}=\sum_{l=1}^{r-1}S^{(l)}_{n_l}$ and $Z_{m_{r-1}}=\sum_{l=1}^{r-1}Z^{(l)}_{n_l}$. Therefore, by using independence, we obtain that:
\begin{align*}
E\exp\Bigl(\lambda\max_{m_{r-1}<k\le m_r}|S_k-Z_k|\Bigr) &\le E\Bigl[\exp\Bigl(\lambda\max_{1\le j\le n_r}\bigl|S_j^{(r)}-Z_j^{(r)}\bigr|\Bigr)\exp\bigl(\lambda|S_{m_{r-1}}-Z_{m_{r-1}}|\bigr)\Bigr] \\
&= E\exp\Bigl(\lambda\max_{1\le j\le n_r}\bigl|S_j^{(r)}-Z_j^{(r)}\bigr|\Bigr)\,E\exp\bigl(\lambda|S_{m_{r-1}}-Z_{m_{r-1}}|\bigr) \\
&\le B\exp(B\log n_r)\,B^{r-1},
\end{align*}
where the last inequality follows from Lemma 3.2.1 and (3.11). Since $n_r\le m_r$, we have shown that:
\[
E\exp\Bigl(\lambda\max_{m_{r-1}<k\le m_r}|S_k-Z_k|\Bigr) \le B^r\exp(B\log m_r).
\]
Let us now consider the second term. Recalling the definition of $m_r$, we see that:
\[
m_{r-1}^2 = \bigl(2^{2^{r-1}}\bigr)^2 = 2^{2^{r-1}\cdot2} = 2^{2^r} = m_r.
\]
Using this together with the induction hypothesis yields:
\begin{align*}
E\exp\Bigl(\lambda\max_{k\le m_{r-1}}|S_k-Z_k|\Bigr) &\le cB^{r-1}\exp(B\log m_{r-1}) \\
&= cB^{r-1}\exp\Bigl(\frac{B\log(m_{r-1}^2)}{2}\Bigr) = cB^{r-1}\exp\Bigl(\frac{B\log m_r}{2}\Bigr).
\end{align*}
Combining the bounds on the two terms, we get:
\begin{align*}
E\exp\Bigl(\lambda\max_{k\le m_r}|S_k-Z_k|\Bigr) &\le B^r\exp(B\log m_r)+cB^{r-1}\exp\Bigl(\frac{B\log m_r}{2}\Bigr) \\
&= B^r\exp(B\log m_r)\Bigl(1+\frac{c}{B}\exp\Bigl(-\frac{B\log m_r}{2}\Bigr)\Bigr).
\end{align*}
Thus, to complete the induction step, it suffices to show that:
\[
1+\frac{c}{B}\exp\Bigl(-\frac{B\log m_r}{2}\Bigr) \le c.
\]
First, note that the definition of $c$ implies that:
\[
\exp\Bigl(-\frac12B\log4\Bigr) = B-\frac{B}{c}.
\]
Moreover, we have that $m_r=2^{2^r}\ge2^2=4$, since $r\ge1$. Therefore,
\[
1+\frac{c}{B}\exp\Bigl(-\frac{B\log m_r}{2}\Bigr) \le 1+\frac{c}{B}\exp\Bigl(-\frac{B\log4}{2}\Bigr) = 1+\frac{c}{B}\Bigl(B-\frac{B}{c}\Bigr) = c.
\]
This completes the induction step. So we have shown (3.12). Let $\tilde c=1/\log2$.
Clearly we have for all $r\ge1$:
\[
\log m_r = \log2^{2^r} = 2^r\log2.
\]
Therefore, we obtain the following bound for any $r\ge1$:
\[
r \le 2^r = \frac{1}{\log2}\log m_r = \tilde c\log m_r.
\]
Since $B>1$, this implies that:
\[
cB^r \le cB^{\tilde c\log m_r} = c\exp\bigl(\log B^{\tilde c\log m_r}\bigr) = c\exp(\tilde c\log m_r\log B).
\]
By using (3.12), we obtain:
\begin{align*}
E\exp\Bigl(\lambda\max_{k\le m_r}|S_k-Z_k|\Bigr) &\le cB^r\exp(B\log m_r) \\
&\le c\exp(\tilde c\log m_r\log B+B\log m_r) = c\exp\bigl((\tilde c\log B+B)\log m_r\bigr).
\end{align*}
Let $K:=\max\{c,\ \tilde c\log B+B\}$, then we obtain for all $r\ge1$ that:
\[
E\exp\Bigl(\lambda\max_{k\le m_r}|S_k-Z_k|\Bigr) \le K\exp(K\log m_r).
\]
Let us now prove such an inequality for arbitrary $n$ instead of $m_r$. Take any $n\ge2$. Let $r$ be such that $m_{r-1}\le n\le m_r$. If $r\ge2$, then $m_r=m_{r-1}^2\le n^2$. Besides, if $r=1$, then $m_r=m_1=4\le n^2$. Thus,
\begin{align*}
E\exp\Bigl(\lambda\max_{k\le n}|S_k-Z_k|\Bigr) &\le E\exp\Bigl(\lambda\max_{k\le m_r}|S_k-Z_k|\Bigr) \\
&\le K\exp(K\log m_r) \le K\exp(K\log n^2) = K\exp(2K\log n).
\end{align*}
Let $C:=2K/\lambda$. Using Markov's inequality, we obtain for any $x\ge0$:
\begin{align*}
P\Bigl(\max_{k\le n}|S_k-Z_k| \ge C\log n+x\Bigr) &\le E\exp\Bigl(\lambda\max_{k\le n}|S_k-Z_k|\Bigr)\exp(-\lambda C\log n-\lambda x) \\
&\le K\exp(2K\log n)\exp(-\lambda C\log n-\lambda x) = Ke^{-\lambda x}.
\end{align*}
This finishes the proof.

Remark 3.2.3. Note that in the preceding proof we only constructed random variables $Z_k$ for $k=0,1,\dots$. In this way, we obtain a discrete process. By redefining the random variables $Z_k$ on a rich enough probability space, we obtain a Brownian motion $(B_t)_{t\ge0}$.

We finish this chapter with a new result. This theorem was found by Chatterjee, as a consequence of departing from the classical proof of the KMT theorem. It is not used in the rest of this thesis. It produces a coupling of a SRW with a standard Brownian bridge.

Theorem 3.2.4. There exist positive universal constants $C$, $K$ and $\lambda_0$ such that the following is true. Take any integer $n\ge2$ and suppose $\epsilon_1,\dots,\epsilon_n$ are exchangeable symmetric $\pm1$-valued random variables. For $k=0,1,\dots,n$, let $S_k=\sum_{i=1}^k\epsilon_i$ and let $W_k=S_k-\frac{k}{n}S_n$.
It is possible to construct a version of $W_0,\dots,W_n$ and a standard Brownian bridge $(B_t^\circ)_{0\le t\le1}$ on the same probability space such that for any $0<\lambda<\lambda_0$:
\[
E\exp\Bigl(\lambda\max_{k\le n}\bigl|W_k-\sqrt n\,B^\circ_{k/n}\bigr|\Bigr) \le \exp(C\log n)\,E\exp\Bigl(\frac{K\lambda^2S_n^2}{n}\Bigr).
\]

Proof. Note that the exchangeability implies that $(\epsilon_1,\dots,\epsilon_n)\overset{d}{=}(\epsilon_{\pi(1)},\dots,\epsilon_{\pi(n)})$ if $\pi$ is a random permutation of $\{1,\dots,n\}$ independent of $\epsilon_1,\dots,\epsilon_n$. Consequently, Theorem 3.1.1 can be applied. Let all notation be as in the statement of Theorem 3.1.1. Define $B^\circ_{k/n}:=Z_k/\sqrt n$, then Theorem 3.1.1 implies the following. It is possible to construct a version of $W_0,\dots,W_n$ and random variables $(B^\circ_{k/n})_{0\le k\le n}$ with a joint Gaussian distribution, mean zero and:
\[
\mathrm{Cov}\bigl(B^\circ_{i/n},B^\circ_{j/n}\bigr) = \frac1n\mathrm{Cov}(Z_i,Z_j) = \frac{(i\wedge j)(n-(i\vee j))}{n^2} = \Bigl(\frac in\wedge\frac jn\Bigr)\Bigl(1-\Bigl(\frac in\vee\frac jn\Bigr)\Bigr).
\]
Moreover, there exist universal constants $C$, $K$ and $\lambda_0$ such that for any $0<\lambda<\lambda_0$:
\[
E\exp\Bigl(\lambda\max_{k\le n}\bigl|W_k-\sqrt n\,B^\circ_{k/n}\bigr|\Bigr) = E\exp\Bigl(\lambda\max_{k\le n}|W_k-Z_k|\Bigr) \le \exp\Bigl(C\log n+\frac{K\lambda^2S_n^2}{n}\Bigr).
\]
Note that in Theorem 3.1.1 we assumed that the value of $S_n$ was known, so that $S_n$ was not random. In the context of Theorem 3.2.4, however, $S_n$ is random and takes on its possible values. So, when we define $W_k$ as in Theorem 3.2.4, we actually obtain:
\[
E\Bigl[\exp\Bigl(\lambda\max_{k\le n}\bigl|W_k-\sqrt n\,B^\circ_{k/n}\bigr|\Bigr)\,\Big\|\,S_n=s\Bigr] \le \exp\Bigl(C\log n+\frac{K\lambda^2s^2}{n}\Bigr),
\]
where $s$ is a possible value for $S_n$. Hence, using Remark A.1.11, we see that
\[
E\Bigl[\exp\Bigl(\lambda\max_{k\le n}\bigl|W_k-\sqrt n\,B^\circ_{k/n}\bigr|\Bigr)\,\Big\|\,S_n\Bigr] \le \exp\Bigl(C\log n+\frac{K\lambda^2S_n^2}{n}\Bigr).
\]
Taking expectations on both sides indeed yields that
\[
E\exp\Bigl(\lambda\max_{k\le n}\bigl|W_k-\sqrt n\,B^\circ_{k/n}\bigr|\Bigr) \le E\exp\Bigl(C\log n+\frac{K\lambda^2S_n^2}{n}\Bigr).
\]
Redefining the random variables on a rich enough probability space, we obtain a version of $W_0,\dots,W_n$ and a Brownian bridge $(B_t^\circ)_{0\le t\le1}$ such that all conditions of the theorem are fulfilled.
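The bookkeeping behind the proof of Theorem 3.2.2 (the block endpoints $m_r$, the block lengths $n_r$, and the constants $c$ and $\tilde c$) can be checked numerically. The following is a minimal sketch, not part of the original argument. The value `B = 2.0` is a hypothetical choice made only for illustration (the proof only needs $B > 1$), and the function names are assumptions of this sketch.

```python
import math

def block_endpoints(r_max):
    """Return [m_0, ..., m_{r_max}] with m_0 = 0 and m_r = 2**(2**r) for r >= 1."""
    return [0] + [2 ** (2 ** r) for r in range(1, r_max + 1)]

def induction_constant(B):
    """c = 1 / (1 - exp(-(B/2) * log 4) / B), the constant defined before (3.12)."""
    return 1.0 / (1.0 - math.exp(-0.5 * B * math.log(4.0)) / B)

B = 2.0  # hypothetical value, for illustration only (the proof needs B > 1)
c = induction_constant(B)
c_tilde = 1.0 / math.log(2.0)
m = block_endpoints(5)
n_block = [m[r] - m[r - 1] for r in range(1, len(m))]  # n_block[r-1] = n_r

# m_{r-1}^2 = m_r for r >= 2 (used for the second term of the induction step)
squares_ok = all(m[r - 1] ** 2 == m[r] for r in range(2, len(m)))

# the blocks tile {1, ..., m_r}: n_1 + ... + n_r = m_r
tiling_ok = all(sum(n_block[:r]) == m[r] for r in range(1, len(m)))

# induction step: 1 + (c/B) * exp(-(B/2) * log m_r) <= c for every r >= 1
step_ok = all(
    1.0 + (c / B) * math.exp(-0.5 * B * math.log(m[r])) <= c + 1e-12
    for r in range(1, len(m))
)

# r <= 2**r = c_tilde * log m_r (used to absorb B**r into the exponent)
rank_ok = all(r <= c_tilde * math.log(m[r]) + 1e-9 for r in range(1, len(m)))

print(squares_ok, tiling_ok, step_ok, rank_ok)
```

For $r = 1$ the induction-step inequality holds with equality, which is exactly how $c$ was chosen.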
Chapter 4 An application of the KMT theorem

Having completed the proof of the KMT theorem for the simple random walk, we now turn to an application of the general KMT theorem. We will consider a sequence of i.i.d. random variables with a finite moment generating function. More specifically, we will focus on the size of the increments of the partial sums. In the first section we will show that $O(\log n)$ is a borderline. In the second section we will state a limit theorem concerning moving averages for Gaussian random variables. Finally, we will generalize this result using the KMT theorem. This chapter is based on the book [9].

4.1 Theorem of Erdős-Rényi

In this section we will prove the Theorem of Erdős-Rényi. The origin of this theorem lies in a problem posed by T. Varga, a secondary school teacher. He did the following experiment. He divided his class into two groups. Each student of the first group received a coin. They were asked to toss the coin two hundred times and to write down the corresponding head and tail sequence. The students of the second group did not receive any coin, and they were asked to arbitrarily write down a head and tail sequence of length two hundred. Once all the sheets of paper were collected, T. Varga asked if it was possible to subdivide them back into their original groups. The answer is given by the Theorem of Erdős-Rényi. From this theorem, it follows that a randomly produced head and tail sequence of length two hundred is very likely to contain head-runs of length seven. On the other hand, he observed that children writing down an imaginary sequence are afraid of putting down runs longer than four. Hence, in order to find the sheets of the first group, he simply selected the ones containing runs longer than five.

Theorem 4.1.1. (Erdős-Rényi) Let $\epsilon_1,\epsilon_2,\dots$ be a sequence of i.i.d. symmetric $\pm1$-valued random variables. Put $S_0=0$ and $S_n=\sum_{i=1}^n\epsilon_i$. Then for any $c>0$ it holds
that
\[
\max_{0\le k\le n-\lfloor c\log_2n\rfloor}\frac{S_{k+\lfloor c\log_2n\rfloor}-S_k}{\lfloor c\log_2n\rfloor} \to \beta(c) \quad\text{a.s.},
\]
where $\beta(c)=1$ for $0<c\le1$, and, if $c>1$, then $\beta(c)$ is the unique solution of the equation
\[
\frac1c = 1-h\Bigl(\frac{1+\beta}{2}\Bigr), \tag{4.1}
\]
with $h(x)=-x\log_2x-(1-x)\log_2(1-x)$, $0<x<1$. The function $\beta(\cdot)$ is strictly decreasing and continuous for $c>1$ with $\lim_{c\searrow1}\beta(c)=1$ and $\lim_{c\to\infty}\beta(c)=0$.

The following result is a generalization of Theorem 4.1.1.

Theorem 4.1.2. (Erdős-Rényi) Let $\epsilon_1,\epsilon_2,\dots$ be a sequence of i.i.d. random variables with $E[\epsilon_1]=0$. Assume there exists a neighbourhood $I$ of $t=0$ such that the moment generating function $R(t)=E[e^{t\epsilon_1}]$ is finite for all $t\in I$. Let
\[
\rho(x) = \inf_{t\in I}e^{-tx}R(t),
\]
the so-called Chernoff function of $\epsilon_1$. Then for any $c>0$ we have
\[
\max_{0\le k\le n-\lfloor c\log n\rfloor}\frac{S_{k+\lfloor c\log n\rfloor}-S_k}{\lfloor c\log n\rfloor} \to \alpha(c) \quad\text{a.s.},
\]
where $\alpha(c)=\sup\{x:\rho(x)\ge e^{-1/c}\}$.

Remark 4.1.3. Since $\rho(0)=1$ and $\rho(x)$ is a strictly decreasing function in the interval where $\rho(x)>0$, $\alpha(c)$ is well defined for every $c>0$, and it is a continuous decreasing function with $\lim_{c\to\infty}\alpha(c)=0$.

We will show how Theorem 4.1.1 can be seen as a special case of Theorem 4.1.2. Since $\epsilon_1$ is a symmetric $\pm1$-valued random variable, the moment generating function is given by $R(t)=\frac12(e^t+e^{-t})$. As a result, the Chernoff function $\rho(x)$ is given by
\[
\rho(x) := \begin{cases}(1+x)^{-(1+x)/2}(1-x)^{-(1-x)/2} & \text{if } 0\le x<1,\\ 1/2 & \text{if } x=1,\\ 0 & \text{if } x>1.\end{cases}
\]
This can be seen as follows. Since
\[
\rho(x) = \inf_{t\in\mathbb R}e^{-tx}R(t) = \frac12\inf_{t\in\mathbb R}\bigl(e^{t(1-x)}+e^{-t(1+x)}\bigr),
\]
it is clear that $\rho(x)\ge0$. First, assume that $x>1$. Then $1-x<0$ and $1+x>2>0$, and therefore $t(1-x)\to-\infty$ and $-t(1+x)\to-\infty$, as $t\to\infty$. Hence, the expression for $\rho(x)$ above implies that $\rho(x)=0$ if $x>1$. Furthermore, when $x=1$, we have that $\rho(x)=\frac12\inf_{t\in\mathbb R}(1+e^{-2t})=1/2$. Finally, consider the case where $0\le x<1$. Put $f(t):=e^{t(1-x)}+e^{-t(1+x)}$, then $\rho(x)=\frac12\inf_{t\in\mathbb R}f(t)$. We will try to find $t^*$ such that $\inf_{t\in\mathbb R}f(t)=f(t^*)$.
Therefore, we compute the derivative $f'(t)=(1-x)e^{t(1-x)}-(1+x)e^{-t(1+x)}$. It is clear that
\[
f'(t)=0 \iff (1-x)e^{t(1-x)}=(1+x)e^{-t(1+x)} \iff e^{2t}=\frac{1+x}{1-x}.
\]
From this reasoning, it follows that $t^*=\frac12\log\frac{1+x}{1-x}$. Therefore,
\begin{align*}
\inf_{t\in\mathbb R}f(t) = f(t^*) &= \exp\Bigl(\frac12\log\Bigl(\frac{1+x}{1-x}\Bigr)(1-x)\Bigr)+\exp\Bigl(-\frac12\log\Bigl(\frac{1+x}{1-x}\Bigr)(1+x)\Bigr) \\
&= \Bigl(\frac{1+x}{1-x}\Bigr)^{\frac{1-x}{2}}+\Bigl(\frac{1+x}{1-x}\Bigr)^{-\frac{1+x}{2}} \\
&= (1+x)^{\frac{1-x}{2}}(1-x)^{-\frac{1-x}{2}}+(1+x)^{-\frac{1+x}{2}}(1-x)^{\frac{1+x}{2}} \\
&= (1+x)^{-\frac{1+x}{2}}\Bigl[(1+x)(1-x)^{-\frac{1-x}{2}}+(1-x)^{\frac{1+x}{2}}\Bigr] \\
&= (1+x)^{-\frac{1+x}{2}}(1-x)^{-\frac{1-x}{2}}\bigl[(1+x)+(1-x)\bigr] \\
&= 2(1+x)^{-\frac{1+x}{2}}(1-x)^{-\frac{1-x}{2}}.
\end{align*}
Hence, $\rho(x)=f(t^*)/2=(1+x)^{-\frac{1+x}{2}}(1-x)^{-\frac{1-x}{2}}$. This completes the proof of the general expression for the Chernoff function $\rho(x)$.

From this expression for $\rho(x)$, we will now derive Theorem 4.1.1. Using Theorem 4.1.2, we obtain that
\[
\max_{0\le k\le n-\lfloor c\log_2n\rfloor}\frac{S_{k+\lfloor c\log_2n\rfloor}-S_k}{\lfloor c\log_2n\rfloor} = \max_{0\le k\le n-\lfloor\frac{c}{\log2}\log n\rfloor}\frac{S_{k+\lfloor\frac{c}{\log2}\log n\rfloor}-S_k}{\lfloor\frac{c}{\log2}\log n\rfloor} \to \alpha\Bigl(\frac{c}{\log2}\Bigr) \quad\text{a.s.}
\]
It remains to prove that $\alpha(c/\log2)=\beta(c)$ as defined in Theorem 4.1.1. To find the expression for $\alpha(c/\log2)$, first note that $e^{-1/(c/\log2)}=2^{-1/c}$. We consider two cases. When $0<c\le1$, then $e^{-1/(c/\log2)}\le1/2$. Taking into account the expression for $\rho(x)$ derived above, we obtain that $\alpha(c/\log2)=\sup([0,1])=1$. On the other hand, when $c>1$, we have
\[
A := \bigl\{x:\rho(x)\ge e^{-1/(c/\log2)}\bigr\} \subset \{x:\rho(x)>1/2\} = [0,1).
\]
Thus, for $x\in A$, it holds that
\begin{align*}
\rho(x)\ge e^{-1/(c/\log2)} &\iff (1+x)^{-(1+x)/2}(1-x)^{-(1-x)/2}\ge2^{-1/c} \\
&\iff \log_2(1+x)^{-(1+x)/2}-\log_2(1-x)^{(1-x)/2}\ge-1/c \\
&\iff \frac{1+x}{2}\log_2(1+x)+\frac{1-x}{2}\log_2(1-x)\le1/c.
\end{align*}
We now define $\beta$ as the solution of the equation
\[
\frac{1+\beta}{2}\log_2(1+\beta)+\frac{1-\beta}{2}\log_2(1-\beta) = 1/c.
\]
To complete the proof of Theorem 4.1.1, it clearly suffices to prove that
\[
\frac1c = 1-h\Bigl(\frac{1+\beta}{2}\Bigr),
\]
with $h(x)=-x\log_2x-(1-x)\log_2(1-x)$. By the definition of $\beta$ above, it is clear that
\[
\frac1c = 1-\Bigl[1-\frac{1+\beta}{2}\log_2(1+\beta)-\frac{1-\beta}{2}\log_2(1-\beta)\Bigr].
\]
Hence, it suffices to prove that
\[
h\Bigl(\frac{1+\beta}{2}\Bigr) = 1-\frac{1+\beta}{2}\log_2(1+\beta)-\frac{1-\beta}{2}\log_2(1-\beta).
\]
Indeed,
\begin{align*}
h\Bigl(\frac{1+\beta}{2}\Bigr) &= -\frac{1+\beta}{2}\log_2\Bigl(\frac{1+\beta}{2}\Bigr)-\Bigl(1-\frac{1+\beta}{2}\Bigr)\log_2\Bigl(1-\frac{1+\beta}{2}\Bigr) \\
&= -\frac{1+\beta}{2}\bigl(\log_2(1+\beta)-1\bigr)-\frac{1-\beta}{2}\bigl(\log_2(1-\beta)-1\bigr) \\
&= \frac{1+\beta}{2}+\frac{1-\beta}{2}-\frac{1+\beta}{2}\log_2(1+\beta)-\frac{1-\beta}{2}\log_2(1-\beta) \\
&= 1-\frac{1+\beta}{2}\log_2(1+\beta)-\frac{1-\beta}{2}\log_2(1-\beta).
\end{align*}
Thus, we have completed the proof of Theorem 4.1.1, by making use of Theorem 4.1.2.

Remark 4.1.4. As an illustration of Theorem 4.1.2, we consider the case where $\epsilon_1\sim N(0,1)$. It is well known that $R(t)=e^{t^2/2}$. Therefore, the Chernoff function is given by
\[
\rho(x) = \inf_{t\in\mathbb R}e^{-tx}e^{t^2/2} = e^{-x^2/2}\inf_{t\in\mathbb R}e^{\frac12(t-x)^2} = e^{-x^2/2}.
\]
From this expression for the Chernoff function, we can easily compute $\alpha(c)$. Indeed, for $c>0$, we have
\[
\alpha(c) = \sup\bigl\{x:e^{-x^2/2}\ge e^{-1/c}\bigr\} = \sup\{x:x^2/2\le1/c\} = \sqrt{2/c}.
\]
Thus, considering a standard normal random variable $\epsilon_1$, Theorem 4.1.2 implies that
\[
\max_{0\le k\le n-\lfloor c\log n\rfloor}\frac{S_{k+\lfloor c\log n\rfloor}-S_k}{\lfloor c\log n\rfloor} \to \sqrt{2/c} \quad\text{a.s.}
\]

In order to give a proof of Theorem 4.1.2, we first need another result.

Theorem 4.1.5. (Chernoff) Under the assumptions and notation of Theorem 4.1.2 we have
\[
P\{S_n\ge nx\} \le \rho^n(x), \quad\text{and}\quad \bigl(P\{S_n\ge nx\}\bigr)^{1/n} \to \rho(x), \ \text{as } n\to\infty,
\]
for any $x>0$.

Proof. For a proof of Chernoff's Theorem, we refer to the book of Durrett [15]. A combination of Theorem 2.6.3 and Theorem 2.6.5 yields the desired result.

We will now prove Theorem 4.1.2.

Proof. The proof consists of two steps. We will show that
\[
\limsup_{n\to\infty}\max_{0\le k\le n-l_n}\frac{S_{k+l_n}-S_k}{l_n} \le \alpha \quad\text{a.s.}, \tag{4.2}
\]
and
\[
\liminf_{n\to\infty}\max_{0\le k\le n-l_n}\frac{S_{k+l_n}-S_k}{l_n} \ge \alpha \quad\text{a.s.}, \tag{4.3}
\]
where $l_n=\lfloor c\log n\rfloor$ and $\alpha=\alpha(c)$. Proving these two assertions finishes the proof. For the first step we define $n_j=\inf\{n\ge1:l_n\ge j\}$, $j\ge1$. Clearly, we have for a suitable $j_0\ge1$ and all $j\ge j_0$ that $l_n=j$ for $n_j\le n<n_{j+1}$, and also $n_j>j$. Notice that we then get the following inequality for $n_j\le n<n_{j+1}$:
\[
\max_{0\le k\le n-l_n}\frac{S_{k+l_n}-S_k}{l_n} \le \max_{0\le k\le n_{j+1}-j}\frac{S_{k+j}-S_k}{j}.
\]
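Defining for $\varepsilon>0$ and $j\ge j_0$,
\[
A_j = A_j(c,\varepsilon) = \Bigl\{\max_{0\le k\le n_{j+1}-j}\frac{S_{k+j}-S_k}{j}\ge\alpha+\varepsilon\Bigr\},
\]
it is obviously enough to prove that
\[
P\Bigl(\limsup_{j\ge j_0}A_j\Bigr) = 0, \quad \varepsilon>0. \tag{4.4}
\]
To that end we first note that
\[
P(A_j) \le \sum_{k=0}^{n_{j+1}-j}P\Bigl\{\frac{S_{k+j}-S_k}{j}\ge\alpha+\varepsilon\Bigr\} = (n_{j+1}-j+1)\,P\Bigl\{\frac{S_j}{j}\ge\alpha+\varepsilon\Bigr\} \le n_{j+1}\,P\{S_j\ge j(\alpha+\varepsilon)\}.
\]
Using the inequality of Theorem 4.1.5, we obtain that $P(A_j)\le n_{j+1}\rho^j(\alpha+\varepsilon)$. By definition of $\alpha$ it follows that $\rho(\alpha+\varepsilon)<\exp(-1/c)$. Hence, there exists a $\delta>0$, such that $\rho(\alpha+\varepsilon)\le\exp\bigl(-\frac{1+\delta}{c}\bigr)$. Moreover, since $c\log n_{j+1}<j+2$, we also have $n_{j+1}<\exp((j+2)/c)$. Therefore,
\[
P(A_j) \le n_{j+1}\exp\Bigl(-j\,\frac{1+\delta}{c}\Bigr) \le \exp(2/c)\exp\Bigl(-\frac{j\delta}{c}\Bigr),
\]
and we see that
\[
\sum_{j=j_0}^\infty P(A_j) < \infty.
\]
Using the Lemma of Borel-Cantelli we obtain (4.4), which in turn shows that (4.2) holds.

We now proceed with the proof of inequality (4.3). To this end, let $\varepsilon>0$, and put
\[
B_n = B_n(c,\varepsilon) = \Bigl\{\max_{0\le k\le n-l_n}\frac{S_{k+l_n}-S_k}{l_n}<\alpha-\varepsilon\Bigr\}.
\]
Since $\rho(x)$ is a strictly decreasing function (in the interval where $\rho(x)>0$), we have by definition of $\alpha=\alpha(c)$ that $\rho(\alpha-\varepsilon)>e^{-1/c}$. Hence, there exists a $\delta=\delta(c,\varepsilon)\in(0,1)$, such that $\rho(\alpha-\varepsilon)-\delta\ge\exp\bigl(-\frac{1-\delta}{c}\bigr)$. By definition of $B_n$, we have
\[
P(B_n) = P\Bigl(\bigcap_{k=0}^{n-l_n}\Bigl\{\frac{S_{k+l_n}-S_k}{l_n}<\alpha-\varepsilon\Bigr\}\Bigr) \le P\Bigl(\bigcap_{d=1}^{\lfloor n/l_n\rfloor}\Bigl\{\frac{S_{dl_n}-S_{(d-1)l_n}}{l_n}<\alpha-\varepsilon\Bigr\}\Bigr).
\]
Since the random variables $S_{dl_n}-S_{(d-1)l_n}$, for $1\le d\le\lfloor n/l_n\rfloor$, are independent and identically distributed, we obtain that
\begin{align*}
P(B_n) &\le \prod_{d=1}^{\lfloor n/l_n\rfloor}P\Bigl\{\frac{S_{dl_n}-S_{(d-1)l_n}}{l_n}<\alpha-\varepsilon\Bigr\} \\
&= P\Bigl\{\frac{S_{l_n}}{l_n}<\alpha-\varepsilon\Bigr\}^{\lfloor n/l_n\rfloor} = \Bigl(1-P\Bigl\{\frac{S_{l_n}}{l_n}\ge\alpha-\varepsilon\Bigr\}\Bigr)^{\lfloor n/l_n\rfloor}.
\end{align*}
By Chernoff's Theorem 4.1.5, we know that $P\{S_n/n\ge x\}^{1/n}\ge\rho(x)-\delta$, for $n$ large enough and $x>0$. From this we can conclude that
\[
P\Bigl\{\frac{S_{l_n}}{l_n}\ge\alpha-\varepsilon\Bigr\} \ge \bigl(\rho(\alpha-\varepsilon)-\delta\bigr)^{l_n},
\]
for $n$ large enough. If we combine the last two observations, we obtain that
\[
P(B_n) \le \Bigl(1-\bigl(\rho(\alpha-\varepsilon)-\delta\bigr)^{l_n}\Bigr)^{\lfloor n/l_n\rfloor} \le \Bigl(1-\exp\Bigl(-\frac{1-\delta}{c}\,l_n\Bigr)\Bigr)^{\lfloor n/l_n\rfloor},
\]
for $n$ large enough. Note that for the second inequality, we simply used the definition of $\delta$. We will show that the term on the right-hand side is of order $O\bigl(\exp\bigl(-\frac{n^\delta}{c\log n}\bigr)\bigr)$.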
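It is straightforward that
\begin{align*}
\Bigl(1-\exp\Bigl(-\frac{1-\delta}{c}\,l_n\Bigr)\Bigr)^{\lfloor n/l_n\rfloor} &= \Bigl(1-\exp\Bigl(-\frac{1-\delta}{c}\lfloor c\log n\rfloor\Bigr)\Bigr)^{\lfloor n/\lfloor c\log n\rfloor\rfloor} \\
&\le \bigl(1-\exp(-(1-\delta)\log n)\bigr)^{\lfloor n/\lfloor c\log n\rfloor\rfloor} \\
&= \bigl(1-n^{-(1-\delta)}\bigr)^{\lfloor n/\lfloor c\log n\rfloor\rfloor} \\
&= \exp\Bigl(\Bigl\lfloor\frac{n}{\lfloor c\log n\rfloor}\Bigr\rfloor\log\Bigl(1-\frac{n^\delta}{n}\Bigr)\Bigr).
\end{align*}
Using Taylor expansion, we know that for $0\le x<1$, it holds that
\[
\log(1-x) = -x-\frac{x^2}{2}-\frac{x^3}{3}-\dots \le -x.
\]
Using this inequality with $x=n^{\delta-1}$ we obtain that
\begin{align*}
\Bigl(1-\exp\Bigl(-\frac{1-\delta}{c}\,l_n\Bigr)\Bigr)^{\lfloor n/l_n\rfloor} &\le \exp\Bigl(-\frac{n^\delta}{n}\Bigl\lfloor\frac{n}{\lfloor c\log n\rfloor}\Bigr\rfloor\Bigr) \\
&\le \exp\Bigl(-\frac{n^\delta}{n}\Bigl\lfloor\frac{n}{c\log n}\Bigr\rfloor\Bigr) \\
&\le \exp\Bigl(-\frac{n^\delta}{n}\Bigl(\frac{n}{c\log n}-1\Bigr)\Bigr).
\end{align*}
Since $\delta<1$, we can conclude that
\[
P(B_n) \le \Bigl(1-\exp\Bigl(-\frac{1-\delta}{c}\,l_n\Bigr)\Bigr)^{\lfloor n/l_n\rfloor} \le e\cdot\exp\Bigl(-\frac{n^\delta}{c\log n}\Bigr) = O\Bigl(\exp\Bigl(-\frac{n^\delta}{c\log n}\Bigr)\Bigr),
\]
if we take $n$ large enough. Therefore $\sum_{n=1}^\infty P(B_n)<\infty$. Using the Borel-Cantelli Lemma, we can conclude that $P(\liminf_nB_n^c)=1$. This is equivalent to saying that
\[
\max_{0\le k\le n-l_n}\frac{S_{k+l_n}-S_k}{l_n} \ge \alpha-\varepsilon \quad\text{eventually w.p.\,1}.
\]
This clearly implies that
\[
\liminf_{n\to\infty}\max_{0\le k\le n-l_n}\frac{S_{k+l_n}-S_k}{l_n} \ge \alpha-\varepsilon \quad\text{a.s.}
\]
Since this holds for all $\varepsilon>0$, we have shown that
\[
\liminf_{n\to\infty}\max_{0\le k\le n-l_n}\frac{S_{k+l_n}-S_k}{l_n} \ge \alpha \quad\text{a.s.},
\]
which concludes the proof of inequality (4.3).

Note that $\alpha(c)$ of Theorem 4.1.2 is uniquely determined by the moment generating function $R(t)$ of $\epsilon_1$. The converse of this statement also holds.

Theorem 4.1.6. (Erdős-Rényi) Using the same assumptions and notation as in Theorem 4.1.2, we have that the distribution function of $\epsilon_1$ is uniquely determined by the function $\alpha(c)$.

Proof. By definition of $\alpha(c)$, it is clear that $\alpha(c)$ uniquely determines the Chernoff function $\rho(x)$. Besides, the moment generating function $R(t)$ is uniquely determined by the Chernoff function $\rho(x)$. This can be seen as follows. Let $L(t):=\log R(t)$, $t\in I$. Then:
\[
\rho(x) = \exp\Bigl(\inf_t\{L(t)-tx\}\Bigr) =: \exp(-\lambda(x)).
\]
We have $L'(t)=E[\epsilon_1\exp(t\epsilon_1)]/R(t)$ and $L''(t)=E[\epsilon_1^2\exp(t\epsilon_1)]/R(t)-L'(t)^2$, $t\in I$. Employing the Cauchy-Schwarz inequality, we find that $L''(t)\ge0$, and also, if $\epsilon_1$ is non-degenerate, that $L''(t)>0$, $t\in I$. For more details we refer to Proposition A.3.5.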
Thus we see that $L'(t)$ is a strictly increasing (and continuous) function on $I$. Set $t_0:=\sup I>0$ and $A_0:=\lim_{t\to t_0}L'(t)>0$. Then $L':[0,t_0[\,\to[0,A_0[$ is 1-1 with a positive derivative. This implies that for any $x\in[0,A_0[$, we can find a unique $t=t(x)$ such that $L'(t(x))=x$. Furthermore this function is differentiable and we have $t'(x)=1/L''(t(x))$, $x\in\,]0,A_0[$. Also note that $t(0)=0$ as $E[\epsilon_1]=0$. Returning to the Chernoff function $\rho(x)=\exp(-\lambda(x))$ it follows that
\[
\lambda(x) = x\,t(x)-L(t(x)), \quad x\in[0,A_0[.
\]
Calculating the derivative of the last expression, we see that $\lambda'(x)=t(x)$, $x\in[0,A_0[$. Consequently, $t(x)$ is determined by $\lambda$ which in turn is determined by $\rho$. Recalling that $\{t(x):0\le x<A_0\}$ is the inverse function of $\{L'(t):0\le t<t_0\}$ and $L(0)=0$, we see after an integration that $R(t)=\exp(L(t))$, $0\le t<t_0$, is determined by $\rho$. Finally, it is a well-known fact that the moment generating function $R(t)$ on any interval $]0,\delta[$, where $\delta>0$, uniquely determines the characteristic function of $\epsilon_1$ and consequently its distribution function $F$.

For more information on the Theorem of Erdős-Rényi we refer to the article of Deheuvels et al. [10]. To end this section, we will give an important corollary of the previous results. As we already mentioned in the introduction, the theorem of Komlós et al. is optimal in the sense that $o(\log n)$ cannot be achieved. Indeed, the following corollary states that $o(\log n)$ convergence implies that the i.i.d. random variables under consideration must have a standard normal distribution.

Corollary 4.1.7. Let $\{X_n:n\ge1\}$ and $\{Y_n:n\ge1\}$ be two sequences of i.i.d. random variables.

(a) If we have with probability one,
\[
\sum_{j=1}^n(X_j-Y_j) = O(\log n) \quad\text{as } n\to\infty
\]
and $Y_1\sim N(0,1)$, it follows that $X_1$ has a finite moment generating function in a neighbourhood of zero.
(b) If we have with probability one,
\[
\sum_{j=1}^n(X_j-Y_j) = o(\log n) \quad\text{as } n\to\infty
\]
and $Y_1\sim N(0,1)$, it follows that $X_1\sim N(0,1)$.

Proof. (a) Using the triangle inequality we see that
\[
|X_n-Y_n| \le \Bigl|\sum_{j=1}^{n-1}(X_j-Y_j)\Bigr|+\Bigl|\sum_{j=1}^n(X_j-Y_j)\Bigr|.
\]
Therefore, using the assumption in (a), we can conclude that
\[
\limsup_{n\to\infty}|X_n-Y_n|/\log n < \infty \quad\text{a.s.}
\]
Next note that for any $\varepsilon>0$,
\[
\sum_{n=1}^\infty P\{|Y_n|\ge\varepsilon\log n\} \le \sum_{n=1}^\infty2\exp\bigl(-\varepsilon^2(\log n)^2/2\bigr) < \infty.
\]
In view of the Borel-Cantelli Lemma this means that $|Y_n|/\log n\to0$ a.s. It follows that $\limsup_{n\to\infty}|X_n|/\log n<\infty$ a.s. This in turn implies via Kolmogorov's 0-1 law that there exists a constant $C\ge0$ such that w.p. 1, $\limsup_{n\to\infty}|X_n|/\log n=C$. Using once more the Borel-Cantelli Lemma we see that for any $D>C$
\[
\sum_{n=1}^\infty P\{|X_n|>D\log n\} < \infty.
\]
The random variables $\{X_n\}$ have identical distributions so that we can rewrite this as follows,
\[
\sum_{n=1}^\infty P\{\exp(|X_1|/D)>n\} < \infty,
\]
which is equivalent to $E[\exp(|X_1|/D)]<\infty$.

(b) From part (a) we know that $X_1$ has a finite mgf in a neighbourhood of zero so that the theorem of Erdős-Rényi 4.1.2 can be applied. Setting $S_n=\sum_{j=1}^nX_j$ and $T_n=\sum_{j=1}^nY_j$, we have for any $c>0$, that w.p. 1,
\[
\max_{0\le k\le n-\lfloor c\log n\rfloor}\frac{S_{k+\lfloor c\log n\rfloor}-S_k}{\lfloor c\log n\rfloor} \to \alpha(c) \quad\text{as } n\to\infty,
\]
where the function $\alpha$ is defined in Theorem 4.1.2. Likewise, we have w.p. 1,
\[
\max_{0\le k\le n-\lfloor c\log n\rfloor}\frac{T_{k+\lfloor c\log n\rfloor}-T_k}{\lfloor c\log n\rfloor} \to \sqrt{2/c} \quad\text{as } n\to\infty,
\]
where we also used Remark 4.1.4. Using the trivial inequality
\[
\Bigl|\max_{1\le i\le m}a_i-\max_{1\le i\le m}b_i\Bigr| \le \max_{1\le i\le m}|a_i-b_i|,
\]
with $a_1,\dots,a_m,b_1,\dots,b_m\in\mathbb R$, we get that
\[
\Bigl|\max_{0\le k\le n-\lfloor c\log n\rfloor}\frac{S_{k+\lfloor c\log n\rfloor}-S_k}{\lfloor c\log n\rfloor}-\max_{0\le k\le n-\lfloor c\log n\rfloor}\frac{T_{k+\lfloor c\log n\rfloor}-T_k}{\lfloor c\log n\rfloor}\Bigr| \le \frac{2\max_{0\le k\le n}|S_k-T_k|}{\lfloor c\log n\rfloor},
\]
where the right-hand side converges to zero by assumption. It follows that $\alpha(c)=\sqrt{2/c}$, $c>0$. Using Theorem 4.1.6 and Remark 4.1.4, we obtain that $X_1\sim N(0,1)$.
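The limits $\alpha(c)$ and $\beta(c)$ from this section are easy to evaluate numerically, which gives a sanity check on the identities $\alpha(c/\log 2)=\beta(c)$ for the coin and $\alpha(c)=\sqrt{2/c}$ for the Gaussian case. The following is a minimal sketch under stated assumptions: the function names, the bisection search and the grid used to approximate the infimum are choices made for this illustration and do not appear in the source.

```python
import math

def rho_coin(x):
    """Closed-form Chernoff function of a symmetric +-1 variable, for x >= 0."""
    if x > 1.0:
        return 0.0
    if x == 1.0:
        return 0.5
    return (1.0 + x) ** (-(1.0 + x) / 2.0) * (1.0 - x) ** (-(1.0 - x) / 2.0)

def rho_coin_grid(x, t_max=10.0, steps=10000):
    """Approximate inf_t exp(-t*x) * cosh(t) by a minimum over a grid in [0, t_max]."""
    return min(
        math.exp(-t * x) * math.cosh(t)
        for t in (t_max * k / steps for k in range(steps + 1))
    )

def rho_gauss(x):
    """Chernoff function of a standard normal variable (Remark 4.1.4)."""
    return math.exp(-0.5 * x * x)

def alpha(c, rho, upper):
    """sup{x in [0, upper] : rho(x) >= exp(-1/c)}, by bisection.

    Assumes rho is non-increasing on [0, upper] and rho(upper) < exp(-1/c).
    """
    threshold = math.exp(-1.0 / c)
    lo, hi = 0.0, upper
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if rho(mid) >= threshold:
            lo = mid
        else:
            hi = mid
    return lo

def entropy2(x):
    """h(x) = -x log2 x - (1 - x) log2 (1 - x), extended by 0 outside (0, 1)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def beta(c):
    """beta(c) of Theorem 4.1.1: 1 for c <= 1, else the root of 1/c = 1 - h((1+b)/2)."""
    if c <= 1.0:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 1.0 - entropy2(0.5 * (1.0 + mid)) < 1.0 / c:
            lo = mid
        else:
            hi = mid
    return lo

for c in (0.5, 2.0, 5.0):
    print(c, alpha(c / math.log(2.0), rho_coin, 2.0), beta(c),
          alpha(c, rho_gauss, 10.0), math.sqrt(2.0 / c))
```

In each printed row the second and third entries should agree, as should the fourth and fifth, up to floating-point error.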
4.2 Limit theorem concerning moving averages

The aim of this section is to show an application of the KMT theorem. We will start by giving a limit theorem concerning moving averages for a standard Brownian motion. This is a theorem of Csörgő and Révész [8]. A significant part of this theorem was actually shown by Lai [21]. Using the KMT theorem, one can show that this result remains valid for sums of random variables (which satisfy certain assumptions).

Theorem 4.2.1. (Csörgő-Révész, Lai) Let $a_T$, $T\ge0$, be a monotonically non-decreasing function of $T$, satisfying the following two conditions

(i) $0<a_T\le T$,

(ii) $T/a_T$ is monotonically non-decreasing.

Then
\[
\limsup_{T\to\infty}\frac{|W(T+a_T)-W(T)|}{\beta_T} = 1 \quad\text{a.s.},
\]
where $\{W(T):T\ge0\}$ is a standard Brownian motion and
\[
\beta_T = \Bigl(2a_T\Bigl[\log\frac{T}{a_T}+\log\log T\Bigr]\Bigr)^{1/2}.
\]

Proof. For a proof of this theorem we refer to [8] and [21].

Since we have now established this limit theorem concerning moving averages for the Brownian motion, we will continue with an extension of this result for more general processes. In order to obtain this extension, we will need the general KMT theorem 1, formulated in the introduction.

Theorem 4.2.2. Let $\epsilon_1,\epsilon_2,\dots$ be a sequence of i.i.d. random variables with $E[\epsilon_1]=0$ and $E[\epsilon_1^2]=1$. Assume there exists a $t_0>0$ such that the moment generating function $R(t)=E[e^{t\epsilon_1}]$ is finite if $|t|<t_0$. Let $\{a_n\}$ be a monotonically non-decreasing sequence of integers, satisfying the following two conditions

(i) $0<a_n\le n$,

(ii) $n/a_n$ is monotonically non-decreasing.

Furthermore, assume that
\[
a_n/\log n \to \infty, \quad\text{as } n\to\infty.
\]
Putting $S_n=\sum_{i=1}^n\epsilon_i$, we have
\[
\limsup_{n\to\infty}\frac{|S_{n+a_n}-S_n|}{\beta_n} = 1 \quad\text{a.s.},
\]
where
\[
\beta_n = \Bigl(2a_n\Bigl[\log\frac{n}{a_n}+\log\log n\Bigr]\Bigr)^{1/2}.
\]

Proof. First, we will show that $\beta_n/\log n\to\infty$, as $n\to\infty$. To this end, note that
\[
\frac{\beta_n}{\log n} = \frac{\sqrt{2a_n\bigl(\log\frac{n}{a_n}+\log\log n\bigr)}}{\log n} = \sqrt2\,\frac{a_n}{\log n}\sqrt{\frac{\log\frac{n}{a_n}+\log\log n}{a_n}}.
\]
Putting $\delta_n:=\sqrt{\bigl(\log\frac{n}{a_n}+\log\log n\bigr)/a_n}$, we will show that $\delta_n\to0$.
We already know that $a_n/\log n\to\infty$, as $n\to\infty$. First, note that $a_n>0$ is an integer, whence $n\ge n/a_n$. This implies that
\[
0 \le \frac{\log\frac{n}{a_n}}{a_n} \le \frac{\log n}{a_n} \to 0.
\]
Secondly, since $\log n/a_n\to0$, it is clear that $\log\log n/a_n\to0$. Hence $\delta_n\to0$. Since $\delta_n\to0$ while $a_n/\log n\to\infty$, the product of the two has to be estimated directly, and we distinguish two cases. If $a_n\le\sqrt n$, then $\log\frac{n}{a_n}\ge\frac12\log n$, so that $\beta_n^2\ge a_n\log n$ and $\beta_n/\log n\ge\sqrt{a_n/\log n}$. If $a_n>\sqrt n$, then $\beta_n^2\ge2\sqrt n\,\log\log n$ for $n\ge3$, so that $\beta_n/\log n\ge(2\sqrt n\,\log\log n)^{1/2}/\log n$. Both lower bounds tend to infinity, which finishes the argument that $\beta_n/\log n\to\infty$.

Using the KMT theorem 1, we know there exists a Brownian motion $\{W(t):0\le t<\infty\}$, such that
\[
\limsup_{n\to\infty}\frac{|S_n-W(n)|}{\log n} < \infty \quad\text{a.s.}
\]
Combining this with the fact that $\log n/\beta_n\to0$, we obtain that
\[
\limsup_{n\to\infty}\frac{|S_n-W(n)|}{\beta_n} = \limsup_{n\to\infty}\frac{|S_n-W(n)|}{\log n}\cdot\frac{\log n}{\beta_n} = 0 \quad\text{a.s.}
\]
Since $n+a_n\le2n$, the same reasoning gives $|S_{n+a_n}-W(n+a_n)|/\beta_n\to0$ a.s. Moreover, Theorem 4.2.1 implies that
\[
\limsup_{n\to\infty}\frac{|W(n+a_n)-W(n)|}{\beta_n} = 1 \quad\text{a.s.}
\]
When we combine these observations, we obtain the desired result.

The article of Csörgő and Révész [8] also contains some results for Brownian motions, which are similar to Theorem 4.2.1. In an analogous way as above, we can use the KMT theorem in order to obtain versions of these theorems for sums of random variables.

Appendix A Some elaborations

A.1 Basic results related to probability theory

We begin with the definition and notation of one of the fundamental concepts of probability theory: characteristic functions.

Definition A.1.1. Let $X:\Omega\to\mathbb R^n$ be a random vector defined on a probability space $(\Omega,\mathcal F,P)$. The characteristic function $\varphi_X$ of $X$ is defined as follows:
\[
\varphi_X(t) := E\exp(i\langle t,X\rangle), \quad\text{for } t\in\mathbb R^n.
\]

Theorem A.1.2. Let $X:\Omega\to\mathbb R$ be a random variable with characteristic function $\varphi_X$. If $E[|X|^m]<\infty$ for some $m\in\{0,1,2,\dots\}$, then all derivatives $\varphi_X^{(k)}$ of order $0\le k\le m$ exist and are continuous. Moreover, we have:
\[
\varphi_X^{(k)}(t) = i^kE[X^ke^{itX}], \quad\text{for } 1\le k\le m.
\]

Proof. For a proof of this theorem, we refer to p. 344 of [4].
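As a quick check of the formula in Theorem A.1.2, consider a symmetric $\pm1$-valued random variable $X$. This example is chosen here for illustration and is not taken from the source. All moments of $X$ are finite, and the first two derivatives can be computed by hand on both sides:

```latex
\begin{align*}
\varphi_X(t) &= E\,e^{itX} = \tfrac12 e^{it} + \tfrac12 e^{-it} = \cos t, \\
\varphi_X'(t) &= -\sin t, \\
i\,E[X e^{itX}] &= i\left(\tfrac12 e^{it} - \tfrac12 e^{-it}\right) = i \cdot i \sin t = -\sin t, \\
\varphi_X''(t) &= -\cos t, \\
i^2\,E[X^2 e^{itX}] &= -E[e^{itX}] = -\cos t \qquad (\text{since } X^2 = 1).
\end{align*}
```

Both derivatives agree with $i^kE[X^ke^{itX}]$ for $k=1,2$, as the theorem predicts.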
Proposition A.1.3. Let $Z$ be a random variable. Then $Z\sim N(0,\sigma^2)$ if and only if $E[Z\psi(Z)]=\sigma^2E[\psi'(Z)]$ for all continuously differentiable functions $\psi$ such that $E|\psi'(Z)|<\infty$.

Proof. Let $Z$ be a Gaussian random variable with mean zero and variance $\sigma^2$. Using integration by parts, it is easily seen that:
\begin{align*}
\sigma^2E[\psi'(Z)] &= \sigma^2\int_{-\infty}^{+\infty}\psi'(x)\,\frac{e^{-\frac{x^2}{2\sigma^2}}}{\sqrt{2\pi}\sigma}\,dx \\
&= \Bigl[\frac{\sigma}{\sqrt{2\pi}}e^{-\frac{x^2}{2\sigma^2}}\psi(x)\Bigr]_{-\infty}^{+\infty}+\int_{-\infty}^{+\infty}x\psi(x)\,\frac{e^{-\frac{x^2}{2\sigma^2}}}{\sqrt{2\pi}\sigma}\,dx = E[Z\psi(Z)].
\end{align*}
To see the last equality, we write $f_Z$ for the Lebesgue density function of $Z$ and then it suffices to show that
\[
\lim_{t\to\pm\infty}f_Z(t)\psi(t) = 0. \tag{A.1}
\]
We will only show that $f_Z(t)\psi(t)\to0$ as $t\to\infty$; the limit for $t\to-\infty$ is then equal to zero as well, by symmetry. Let $\varepsilon>0$ be arbitrary. For any $x^*\ge0$ and any $t>x^*$ we have:
\[
|f_Z(t)\psi(t)| = \Bigl|\int_{x^*}^t\psi'(s)f_Z(t)\,ds+\psi(x^*)f_Z(t)\Bigr| \le \int_{x^*}^t|\psi'(s)|f_Z(t)\,ds+|\psi(x^*)|f_Z(t).
\]
Since $f_Z$ is a decreasing function on $[0,\infty)$, we obtain that
\begin{align*}
|f_Z(t)\psi(t)| &\le \int_{x^*}^t|\psi'(s)|f_Z(s)\,ds+|\psi(x^*)|f_Z(t) \\
&\le \int_{x^*}^\infty|\psi'(s)|f_Z(s)\,ds+|\psi(x^*)|f_Z(t) =: T_1+|\psi(x^*)|f_Z(t).
\end{align*}
Since $\int_{-\infty}^\infty|\psi'(s)|f_Z(s)\,ds=E|\psi'(Z)|<\infty$, we know that $\int_{x^*}^\infty|\psi'(s)|f_Z(s)\,ds\to0$, as $x^*\to\infty$. Therefore, we can choose $x^*$ large enough such that $T_1<\varepsilon/2$. Furthermore we can choose $t>x^*$ large enough such that $|\psi(x^*)|f_Z(t)<\varepsilon/2$. Hence, we obtain that $|f_Z(t)\psi(t)|<\varepsilon$ for $t$ large enough. This concludes the proof of Equation (A.1) and one implication has been shown.

For the other implication, take $t$ fixed and let $\psi_1(x)=\cos(tx)$ and $\psi_2(x)=\sin(tx)$. Then obviously $\psi_j\in C^1(\mathbb R)$ and $E|\psi_j'(Z)|\le|t|<\infty$, for $j=1,2$. Hence, by assumption, we have:
\begin{align*}
E[Ze^{itZ}] &= E[Z\cos(tZ)]+iE[Z\sin(tZ)] \\
&= \sigma^2E[-t\sin(tZ)]+i\sigma^2E[t\cos(tZ)] \\
&= it\sigma^2\bigl(E\cos(tZ)+iE\sin(tZ)\bigr) \\
&= it\sigma^2E[e^{itZ}] = it\sigma^2\varphi_Z(t).
\end{align*}
If we take $\psi$ the identity function, we get from the assumptions that $E[Z^2]=\sigma^2<\infty$. Hence, Theorem A.1.2 can be used and we get that $\varphi_Z'(t)=iE[Ze^{itZ}]$.
Thus, by the above calculation $it\sigma^2\varphi_Z(t)=-i\varphi_Z'(t)$, which yields the following differential equation:
\[
0 = \varphi_Z'(t)+t\sigma^2\varphi_Z(t).
\]
Multiplying by the factor $e^{\frac{t^2\sigma^2}{2}}$, this implies:
\[
0 = e^{\frac{t^2\sigma^2}{2}}\varphi_Z'(t)+t\sigma^2e^{\frac{t^2\sigma^2}{2}}\varphi_Z(t) = \Bigl(\varphi_Z(t)e^{\frac{t^2\sigma^2}{2}}\Bigr)'.
\]
Integrating both sides leads to the following:
\[
0 = \int_0^t\Bigl(\varphi_Z(x)e^{\frac{x^2\sigma^2}{2}}\Bigr)'dx = \varphi_Z(t)e^{\frac{t^2\sigma^2}{2}}-1.
\]
Now we can conclude that $\varphi_Z(t)=e^{-\frac{t^2\sigma^2}{2}}$, which is the characteristic function of a Gaussian random variable with mean zero and variance $\sigma^2$. This concludes the proof.

The following result is a well-known corollary of Fubini's Theorem. This result is used frequently in this thesis.

Corollary A.1.4. Let $X:\Omega\to\mathbb R^{d_1}$ and $Y:\Omega\to\mathbb R^{d_2}$ be independent random vectors. Let $f:\mathbb R^{d_1}\times\mathbb R^{d_2}\to\mathbb R$ be a Borel-measurable function such that $f\ge0$ or $E|f(X,Y)|<\infty$. Then the following holds:
\[
E[f(X,Y)] = \int_{\mathbb R^{d_2}}E[f(X,y)]\,P_Y(dy) = \int_{\mathbb R^{d_1}}E[f(x,Y)]\,P_X(dx).
\]

Proof. This follows directly from Fubini's Theorem.

Proposition A.1.5. Let $X:\Omega\to\mathbb R$ be a non-negative random variable, with $E[X]=0$. Then $X=0$ a.s.

Proof. Since $X\ge n^{-1}I_{\{X\ge1/n\}}$, we have that $0=E[X]\ge n^{-1}P\{X\ge1/n\}$. Thus $P\{X\ge1/n\}=0$, for all $n\ge1$. Since $\{X\ge1/n\}\nearrow\{X>0\}$ and probability measures are continuous from below, we can conclude that $P\{X>0\}=0$. Hence $X=0$ with probability one.

Theorem A.1.6. (Hölder's inequality) Let $(\Omega,\mathcal F,P)$ be a probability space and let $1<p<\infty$. If $X$ and $Y$ are random variables and $1/p+1/q=1$, then we have that:
\[
E|XY| \le E[|X|^p]^{1/p}E[|Y|^q]^{1/q}.
\]
Here we set $a\cdot\infty=\infty$ for $a>0$, and $0\cdot\infty=0$.

Proof. For a proof of Hölder's inequality we refer to pp. 80-81 of [4].

The inequality of Cauchy-Schwarz is a special case of Hölder's inequality. Recall that two random variables $Y$ and $Z$ are called linearly dependent if $Z=0$ a.s. or $Y=dZ$ a.s. for some $d\in\mathbb R$.

Theorem A.1.7. (Cauchy-Schwarz) Let $(\Omega,\mathcal F,P)$ be a probability space and let $Y,Z$ be random variables. Then $E[YZ]\le E[Y^2]^{1/2}E[Z^2]^{1/2}$.
Moreover, this inequality is strict whenever $Y$ and $Z$ are linearly independent.

Proof. The Cauchy-Schwarz inequality is a simple application of Hölder's inequality with $p = q = 2$. For the second part of the theorem, we proceed as follows. Assume that $P\{Z = 0\} < 1$ and $E[YZ] = E[Y^2]^{1/2} E[Z^2]^{1/2}$. Then it suffices to show that $Y = dZ$ a.s. for some $d \in \mathbb{R}$. To ease the notation, we write $\|\cdot\|$ for the $L^2$-norm (i.e. $\|X\| = E[X^2]^{1/2}$ for a rv $X$) and $\langle Y, Z \rangle := E[YZ]$. Let $c \in \mathbb{R}$; then
\[
\begin{aligned}
\|Y + cZ\|^2 = E[(Y + cZ)^2] &= E[Y^2 + 2cYZ + c^2 Z^2] \\
&= \|Y\|^2 + 2c\langle Y, Z \rangle + c^2 \|Z\|^2 \\
&= \|Y\|^2 + 2c\|Y\|\|Z\| + c^2 \|Z\|^2 \\
&= (\|Y\| + c\|Z\|)^2.
\end{aligned}
\]
Since $P\{Z = 0\} < 1$, we know by Proposition A.1.5 that $\|Z\| > 0$, so we can choose $c := -\|Y\|/\|Z\|$. Then $\|Y + cZ\| = 0$ and Proposition A.1.5 implies that $Y + cZ = 0$ a.s., or equivalently $Y = -cZ$ a.s., which completes the proof.

The theorem below, about the convolution of probability measures, is commonly used in probability theory. It also turns out to be very useful in this thesis.

Theorem A.1.8. (Deconvolution Lemma) Let $X : \Omega \to \mathbb{R}^n$ be a random vector such that $P_X = Q_1 * Q_2$, where $Q_1$ and $Q_2$ are p-measures on $(\mathbb{R}^n, \mathcal{R}^n)$. Assuming that there exists a uniform(0, 1)-r.v. $U$ on $(\Omega, \mathcal{F}, P)$ which is independent of $X$, we can define two independent random vectors $Z_1$ and $Z_2$ such that $P_{Z_i} = Q_i$, $i = 1, 2$, and $X = Z_1 + Z_2$.

Proof. For a proof of this result we refer to Lemma 3.35 on p. 161 of Dudley [13].

Conditional expectations are used several times in this thesis. We start with the definition.

Definition A.1.9. Let $(\Omega, \mathcal{F}, P)$ be a p-space and let $X$ be an integrable random variable, i.e. $X \in L^1(\Omega, \mathcal{F}, P)$ (or $X \ge 0$). Let $\mathcal{G}$ be a sub-σ-algebra of $\mathcal{F}$. Then we call $Z \in L^1(\Omega, \mathcal{F}, P)$ (or $Z \ge 0$) a version of the conditional expectation of $X$ given $\mathcal{G}$ if
(i) $Z$ is $\mathcal{G}$-measurable, and
(ii) $E[X I_G] = E[Z I_G]$, $\forall G \in \mathcal{G}$.

If $Y : \Omega \to \mathbb{R}^d$ is a random vector, we write $\sigma\{Y\}$ for the smallest sub-σ-algebra of $\mathcal{F}$ for which $Y$ is measurable.
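As an illustration of Definition A.1.9 (not part of the original text), the following minimal Python sketch checks properties (i) and (ii) on a small finite probability space. The six-point sample space, the values of `X` and `Y`, and all helper names are hypothetical choices made only for this example. On a finite space, $\sigma\{Y\}$ consists of the unions of the level sets $\{Y = y\}$, so a candidate version $Z$ is obtained by averaging $X$ over each level set.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite probability space: six equally likely outcomes.
omega = range(6)
prob = {w: Fraction(1, 6) for w in omega}

X = {0: 1, 1: 4, 2: -2, 3: 3, 4: 0, 5: 6}  # an arbitrary integrable X
Y = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 2}   # Y generates G = sigma{Y}


def expectation(f, event):
    """Return E[f * I_event] for an event given as a set of outcomes."""
    return sum(f[w] * prob[w] for w in event)


# Atoms of sigma{Y}: the level sets {Y = y}.
atoms = {}
for w in omega:
    atoms.setdefault(Y[w], set()).add(w)

# On each atom, Z equals the P-weighted average of X over that atom.
# By construction Z is a function of Y, hence sigma{Y}-measurable: property (i).
g = {y: expectation(X, atom) / sum(prob[w] for w in atom)
     for y, atom in atoms.items()}
Z = {w: g[Y[w]] for w in omega}

# Every G in sigma{Y} is a union of atoms; check property (ii) on all of them.
levels = list(atoms)
events = [set().union(*(atoms[y] for y in chosen))
          for r in range(len(levels) + 1)
          for chosen in combinations(levels, r)]
assert all(expectation(X, G) == expectation(Z, G) for G in events)
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the equality in (ii) can be tested without floating-point tolerance. The sketch only covers the finite case; it does not replace the general measure-theoretic definition.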
We can now state the following lemma.

Lemma A.1.10. Let $Z : \Omega \to \mathbb{R}$ be a random variable. Then the following are equivalent:
(a) $Z$ is $\sigma\{Y\}$-measurable.
(b) There exists a Borel-measurable map $g : \mathbb{R}^d \to \mathbb{R}$ such that $Z = g \circ Y$.

Proof. A proof of this lemma can be found in [4], p. 255.

Remark A.1.11. If $Y : \Omega \to \mathbb{R}^d$ is a random vector, we write $E[X \mid Y] := E[X \mid \sigma\{Y\}]$. Thus, by the above lemma there exists a suitable Borel-measurable map $g : \mathbb{R}^d \to \mathbb{R}$ such that $E[X \mid Y] = g \circ Y$. This map is unique $P_Y$-a.s. and we define $E[X \mid Y = y] := g(y)$, $y \in \mathbb{R}^d$.

The following properties of conditional expectations are useful in the calculations.

Proposition A.1.12. Let $(\Omega, \mathcal{F}, P)$ be a probability space. Let $X, Y : \Omega \to \mathbb{R}$ be integrable random variables and let $\mathcal{G}$ be a sub-σ-algebra of $\mathcal{F}$. Then we have:
(i) If $XY$ is also integrable and $X$ is $\mathcal{G}$-measurable, then $E[XY \mid \mathcal{G}] = X E[Y \mid \mathcal{G}]$ a.s.
(ii) $E[E[X \mid \mathcal{G}]] = E[X]$.
(iii) If $\mathcal{G}$ and $\sigma\{X\}$ are independent, then $E[X \mid \mathcal{G}] = E[X]$ a.s.

Proof. For a proof of these properties, we refer to [4], pp. 447-448.

Another well-known theorem about conditional expectations is Jensen's inequality. It is used several times in this thesis.

Theorem A.1.13. (Jensen's inequality) Let $\psi : \mathbb{R} \to \mathbb{R}$ be a convex mapping, let $\mathcal{G}$ be a sub-σ-algebra of $\mathcal{F}$, and let $X : \Omega \to \mathbb{R}$ be a random variable such that $X$ and $\psi(X)$ are integrable. Then we have:
\[
\psi(E[X \mid \mathcal{G}]) \le E[\psi(X) \mid \mathcal{G}] \quad \text{a.s.}
\]

Proof. For a proof of Jensen's inequality, we refer to [4], p. 449.

For the notation used in the following theorem we refer to Remark A.1.11.

Theorem A.1.14. Let $X : \Omega \to \mathbb{R}^{d_1}$ and $Y : \Omega \to \mathbb{R}^{d_2}$ be independent random vectors and let $h : \mathbb{R}^{d_1 + d_2} \to \mathbb{R}$ be a Borel-measurable map. If $E|h(X, Y)| < \infty$, we have:
(i) $E[h(X, Y) \mid Y = y] = E[h(X, y)]$ with $P_Y$-probability one,
(ii) $E[h(X, Y) \mid X = x] = E[h(x, Y)]$ with $P_X$-probability one.

Proof. This follows directly from Fubini's Theorem.

The following two lemmas are used in the proofs of Chapter 3. These are two results about measures on product spaces and the corresponding density functions.
We consider two arbitrary measurable spaces $(\mathcal{X}_i, \mathcal{A}_i)$, $i = 1, 2$, with product space $(\mathcal{X}_1 \times \mathcal{X}_2, \mathcal{A}_1 \otimes \mathcal{A}_2)$. Further, let $\pi_i : \mathcal{X}_1 \times \mathcal{X}_2 \to \mathcal{X}_i$, $i = 1, 2$, be the canonical projections.

Lemma A.1.15. Let $\mu_i : \mathcal{A}_i \to [0, \infty]$ be σ-finite measures. Consider a p-measure $Q$ on $\mathcal{A}_1 \otimes \mathcal{A}_2$ which is dominated by $\mu_1 \otimes \mu_2$ with density $(x_1, x_2) \mapsto f(x_1, x_2)$. Then the marginal distributions $Q_i := Q^{\pi_i}$ are dominated by $\mu_i$, $i = 1, 2$, with respective densities
\[
x_1 \mapsto f_{X_1}(x_1) = \int_{\mathcal{X}_2} f(x_1, x_2)\, \mu_2(dx_2) \quad \text{and} \quad x_2 \mapsto f_{X_2}(x_2) = \int_{\mathcal{X}_1} f(x_1, x_2)\, \mu_1(dx_1).
\]
Moreover, the conditional $Q$-distribution of $\pi_1$ given $\pi_2 = x_2$ is determined by the conditional $\mu_1$-density $x_1 \mapsto f(x_1, x_2)/f_{X_2}(x_2)$, provided that $f_{X_2}(x_2) > 0$.

Proof. This follows directly from Fubini's theorem.

Lemma A.1.16. Let $\mu_i : \mathcal{A}_i \to [0, \infty]$ be σ-finite measures. Consider a p-measure $Q$ on $\mathcal{A}_1 \otimes \mathcal{A}_2$. Assume that the marginal distributions $Q_i := Q^{\pi_i}$ are dominated by $\mu_i$, $i = 1, 2$. If $\mu_1$ is a counting measure (with at most countable support $S_1$), then $Q$ is dominated by $\mu_1 \otimes \mu_2$. (Consequently, there exists a density function.)

Proof. Let $A \in \mathcal{A}_1 \otimes \mathcal{A}_2$ be such that $(\mu_1 \otimes \mu_2)(A) = 0$. Further, we define $A_{x_1} := \{x_2 \in \mathcal{X}_2 : (x_1, x_2) \in A\} \in \mathcal{A}_2$ for any $x_1 \in S_1$. Since
\[
(\mu_1 \otimes \mu_2)(A) = \int_{\mathcal{X}_1} \mu_2(A_{x_1})\, \mu_1(dx_1) = \sum_{x_1 \in S_1} \mu_2(A_{x_1})
\]
and $(\mu_1 \otimes \mu_2)(A) = 0$, we can conclude that $\mu_2(A_{x_1}) = 0$ for all $x_1 \in S_1$. Using that $Q_2 \ll \mu_2$, this implies that $Q_2(A_{x_1}) = 0$ for all $x_1 \in S_1$. Moreover, since $Q_1 \ll \mu_1$ and $\mu_1$ is supported on $S_1$, we have
\[
\begin{aligned}
Q(A) = Q(A \cap \{\pi_1 \in S_1\}) &= Q\Big( \biguplus_{x_1 \in S_1} \{x_1\} \times A_{x_1} \Big) \\
&= \sum_{x_1 \in S_1} Q(\{x_1\} \times A_{x_1}) \\
&\le \sum_{x_1 \in S_1} Q_1(\{x_1\})^{1/2}\, Q_2(A_{x_1})^{1/2} = 0,
\end{aligned}
\]
where the inequality can be easily verified using Hölder's inequality (Theorem A.1.6). Thus, we have shown that $Q(A) = 0$, and we get that $Q \ll \mu_1 \otimes \mu_2$.

Now we will give some definitions and theorems on tight families of probability measures. Let $(S, d)$ be a metric space and let $\mathcal{B}$ denote the Borel σ-algebra of $(S, d)$.

Definition A.1.17.
A family $(\mu_i)_{i \in I}$ of probability measures on $(S, \mathcal{B})$ is called tight if for every $\varepsilon > 0$ there exists a compact subset $K \subseteq S$ such that $\mu_i(S \setminus K) < \varepsilon$ for every $i \in I$.

Definition A.1.18. A family $A = (\mu_i)_{i \in I}$ of probability measures on $(S, \mathcal{B})$ is called sequentially compact if every sequence in $A$ has a weakly convergent subsequence.

The concepts of the two previous definitions are related. This is stated in the following theorem, which is better known as Prohorov's Theorem.

Theorem A.1.19. (Prohorov) Assume $(S, d)$ is a Polish space and let $(\mu_i)_{i \in I}$ be a family of probability measures on $(S, \mathcal{B})$. The family $(\mu_i)_{i \in I}$ is tight if and only if it is sequentially compact.

Proof. For a proof of Prohorov's Theorem, we refer to [5], pp. 60-63.

Note that $\mathbb{R}^n$ equipped with the Euclidean metric is a Polish space. Hence, the equivalence between tightness and sequential compactness is valid in particular for probability measures on $(\mathbb{R}^n, \mathcal{R}^n)$. The following theorem gives a sufficient condition for a family of probability measures on $(\mathbb{R}^n, \mathcal{R}^n)$ to be tight.

Theorem A.1.20. Let $X_i : \Omega \to \mathbb{R}^n$ be random vectors with distribution $\mu_i = P_{X_i}$, $i \in I$. If there exists an $\alpha > 0$ such that $\sup_{i \in I} E[\|X_i\|^\alpha] < \infty$, then $(\mu_i)_{i \in I}$ is a tight family.

Proof. Assume there exists an $\alpha > 0$ such that
\[
c := \sup_{i \in I} E[\|X_i\|^\alpha] < \infty.
\]
Let $\varepsilon > 0$ and choose $M > (c/\varepsilon)^{1/\alpha}$. Using Markov's inequality, we have for all $i \in I$ that
\[
P\{\|X_i\| > M\} \le \frac{E[\|X_i\|^\alpha]}{M^\alpha} \le \frac{c}{M^\alpha} < \varepsilon.
\]
Define $K := B(0, M)$; then $K$ is compact and $\mu_i(\mathbb{R}^n \setminus K) = P\{\|X_i\| > M\} < \varepsilon$ for every $i \in I$. This completes the proof.

The following results give necessary and sufficient conditions for different types of convergence. We will start by giving a characterization of convergence in probability.

Proposition A.1.21. Let $(\Omega, \mathcal{F}, P)$ be a probability space.
Assume $X_n : \Omega \to \mathbb{R}$, $n \ge 1$, and $X : \Omega \to \mathbb{R}$ are random variables. Then the following are equivalent:
(i) $X_n \xrightarrow{P} X$.
(ii) For every subsequence $\{n_k \mid k \ge 1\}$ there is a further subsequence $\{n_{k_r} \mid r \ge 1\}$ such that $X_{n_{k_r}} \to X$ a.s.

Proof. For a proof of this proposition we refer to [12], Theorem 9.2.1.

The following result gives a characterization of $L^p$-convergence.

Proposition A.1.22. Let $X, X_n$, $n \ge 1$, be random variables in $L^p(\Omega, \mathcal{F}, P)$. Then the following are equivalent:
(i) $X_n \to X$ in $L^p$.
(ii) $X_n \xrightarrow{P} X$ and $\{|X_n|^p \mid n \ge 1\}$ is uniformly integrable.
(iii) $X_n \xrightarrow{P} X$ and $E|X_n|^p \to E|X|^p$.

Proof. For a proof of this characterization we refer to Theorem 5.10 in [16].

Another convergence result is the following.

Proposition A.1.23. Let $Z_n$, $n \ge 1$, be a sequence of random variables. Then the following two properties are true.
(i) If $\sup_{n \ge 1} E[|Z_n|^{1+\alpha}] < \infty$ for some $\alpha > 0$, then $\{Z_n \mid n \ge 1\}$ is uniformly integrable.
(ii) If $Z_n \xrightarrow{d} Z$ and $\{Z_n \mid n \ge 1\}$ is uniformly integrable, then $E[Z_n] \to E[Z]$.

Proof. The first part is a standard exercise in probability theory. For the second part we refer to p. 338 of [4].

The following proposition is used several times in this thesis. It is a result on weak convergence based on a monotonicity argument.

Proposition A.1.24. Let $(S, d)$ be a separable metric space. Let $X : \Omega \to S$ and $X_n : \Omega \to S$, $n \ge 1$, be random elements in $S$, and let $g : S \to [0, \infty[$ be continuous. If $X_n \xrightarrow{d} X$, then
\[
E g(X) \le \liminf_{n \to \infty} E g(X_n).
\]

Proof. Fix $M > 0$ and let $g_M := g \wedge M$. Since $X_n \xrightarrow{d} X$ and $g_M$ is bounded and continuous, the definition of convergence in distribution implies that
\[
E g_M(X) = \lim_{n \to \infty} E g_M(X_n) \le \liminf_{n \to \infty} E g(X_n),
\]
where the inequality follows from the fact that $g_M \le g$. If we let $M \nearrow \infty$, we can use monotone convergence and we get
\[
E g(X) = \lim_{M \to \infty} E g_M(X) \le \liminf_{n \to \infty} E g(X_n).
\]

A.2 Topological results required for Theorem 1.1.1

We start with the formulation of the so-called Schauder-Tychonoff theorem.

Theorem A.2.1.
(Schauder-Tychonoff) Every non-empty, compact and convex subset $K$ of a locally convex topological vector space $V$ has the fixed point property. This means that any continuous function $f : K \to K$ has a fixed point.

Proof. For a proof of this theorem we refer to Chapter V, 10.5 of [14].

Using the same notation as in Theorem 1.1.1, let
\[
V = \{\varphi : \mathcal{R}^n \to \mathbb{R} \mid \varphi \text{ is σ-additive and } \varphi(\emptyset) = 0\},
\]
the space of all finite signed measures on $(\mathbb{R}^n, \mathcal{R}^n)$. In the proof of Theorem 1.1.1 we want to apply the Schauder-Tychonoff theorem for this particular choice of $V$. Thus, we will start by defining a topology $\mathcal{T}$ on $V$ such that $(V, \mathcal{T})$ is a locally convex topological vector space.

A.2.1 $(V, \mathcal{T})$ is a locally convex topological vector space

We start with the definition of a separating family of semi-norms.

Definition A.2.2. Let $X$ be a vector space. Then $p : X \to \mathbb{R}$ is called a semi-norm on $X$ if
1. $p(x + y) \le p(x) + p(y)$, $\forall x, y \in X$.
2. $p(\alpha x) = |\alpha| p(x)$, $\forall x \in X$, $\forall \alpha \in \mathbb{R}$.

Definition A.2.3. A family $\{p_i : i \in I\}$ of semi-norms on a vector space $X$ is said to satisfy the axiom of separation if
\[
\forall x \ne 0,\ \exists i \in I : p_i(x) \ne 0.
\]

Let $C_c(\mathbb{R}^n)$ denote the set of all continuous functions $f : \mathbb{R}^n \to \mathbb{R}$ with compact support. Moreover, for any $f \in C_c(\mathbb{R}^n)$ and each $\mu \in V$ we define
\[
|\mu|_f := \left| \int f\, d\mu \right|.
\]
It can be easily shown that $V$ is a vector space and that $\{|\cdot|_f : f \in C_c(\mathbb{R}^n)\}$ is a separating family of semi-norms on $V$. We will now construct a topology $\mathcal{T}$ on $V$. For any $f \in C_c(\mathbb{R}^n)$ and each $\varepsilon > 0$, we define
\[
V(f, \varepsilon) := \left\{ \mu \in V : \left| \int f\, d\mu \right| < \varepsilon \right\}.
\]
Taking finite intersections of such sets, we obtain the class
\[
\mathcal{B} := \left\{ \bigcap_{j=1}^{k} V(f_j, \varepsilon_j) : k \ge 1,\ f_j \in C_c(\mathbb{R}^n),\ \varepsilon_j > 0,\ \forall j \right\}.
\]
The set of all neighbourhoods in $V$ is defined as
\[
\mathcal{C} := \{\mu_0 + U : \mu_0 \in V,\ U \in \mathcal{B}\}.
\]
Finally, $\mathcal{C}$ is a base for $\mathcal{T}$. That is, $\mathcal{T}$ consists of all arbitrary unions of sets in $\mathcal{C}$. By definition, this topology $\mathcal{T}$ is the topology generated by the separating family of semi-norms $\{|\cdot|_f : f \in C_c(\mathbb{R}^n)\}$.

Theorem A.2.4.
$(V, \mathcal{T})$ is a locally convex topological vector space.

Proof. A proof can be found in Chapter I.1 of [26] or in [22].

A.2.2 The vague topology

For the details and the proofs of this subsection, we refer to Chapter IV of [2]. In general, let $E$ be a locally compact topological space. (For the purposes of Theorem 1.1.1, we are only interested in the case where $E = \mathbb{R}^n$.) Furthermore, let $\mathcal{M}_+(E)$ denote the set of all Radon measures on the Borel σ-algebra $\mathcal{B}(E)$ of $E$.

Definition A.2.5. Let $\mu$ be a measure on the Borel sets $\mathcal{B}(E)$ of $E$. We call $\mu$ a
(i) 'Borel measure' if $\mu(K) < \infty$ for every compact $K \subset E$;
(ii) 'Radon measure' if
(a) $\mu$ is 'locally finite', that is, every point of $E$ has an open neighbourhood of finite $\mu$-measure, and
(b) $\mu$ is 'inner regular', that is, for every $B \in \mathcal{B}(E)$: $\mu(B) = \sup\{\mu(K) : K \subset B,\ K \text{ compact}\}$.

An easy verification shows that (a) implies (i), so that every Radon measure on $\mathcal{B}(E)$ is a Borel measure. The converse is not true in general. However, we do have the following theorem.

Theorem A.2.6. If the locally compact space $E$ has a countable base for its topology, then every Borel measure on $\mathcal{B}(E)$ is a Radon measure.

We now introduce the notation
\[
\mathcal{M}_+^1(E) := \{\mu \in \mathcal{M}_+(E) : \mu(E) = 1\}
\]
for the 'Radon probability measures'. In view of the observations above, it is clear that $\mathcal{M}_+^1(\mathbb{R}^n)$ is the set of all probability measures on $(\mathbb{R}^n, \mathcal{R}^n)$. Using the same notation as in Theorem 1.1.1, we see that $\mathcal{M}_+^1(\mathbb{R}^n) = \widetilde{V}$.

The goal of the remainder of this subsection is as follows. We will provide $\mathcal{M}_+(E)$ with a topology $\mathcal{T}'$. The convergence of sequences in $\mathcal{M}_+^1(E)$ will turn out to be the weak convergence of p-measures. Moreover, if we take $E = \mathbb{R}^n$, we will obtain that the trace topologies satisfy $\mathcal{T}_{\widetilde{V}} = \mathcal{T}'_{\widetilde{V}}$ (i.e. $\{G \cap \widetilde{V} : G \in \mathcal{T}\} = \{G' \cap \widetilde{V} : G' \in \mathcal{T}'\}$), and that $\mathcal{T}'$ is metrizable. This also implies that $(\widetilde{V}, \mathcal{T}_{\widetilde{V}})$ is a metrizable space. We start with the definition of 'vague convergence'.
As before, we let $C_c(E)$ denote the set of all continuous functions $f : E \to \mathbb{R}$ with compact support.

Definition A.2.7. A sequence $(\mu_n)_n$ of Radon measures on $\mathcal{B}(E)$ is said to be 'vaguely convergent' to a Radon measure $\mu$ if
\[
\lim_{n \to \infty} \int f\, d\mu_n = \int f\, d\mu, \quad \forall f \in C_c(E).
\]

Vague convergence of sequences in $\mathcal{M}_+(E)$ is convergence in a certain topology on $\mathcal{M}_+(E)$, called 'the vague topology'. It is defined as the coarsest topology on $\mathcal{M}_+(E)$ with respect to which all mappings
\[
\mu \mapsto \int f\, d\mu \quad (f \in C_c(E))
\]
are continuous. A fundamental system of neighbourhoods of a typical $\mu_0 \in \mathcal{M}_+(E)$ consists of all sets of the form
\[
V_{f_1, \ldots, f_n; \varepsilon}(\mu_0) = \left\{ \mu \in \mathcal{M}_+(E) : \left| \int f_j\, d\mu - \int f_j\, d\mu_0 \right| < \varepsilon,\ j = 1, \ldots, n \right\}.
\]
We denote the vague topology on $\mathcal{M}_+(E)$ by $\mathcal{T}'$. The following theorem reveals that there is a connection between vague convergence and weak convergence.

Theorem A.2.8. A sequence $(\mu_n)_n$ in $\mathcal{M}_+^1(E)$ is vaguely convergent to $\mu \in \mathcal{M}_+^1(E)$ if and only if it is weakly convergent to $\mu$.

For a variety of applications it is important to know when, in terms of $E$, the vague topology $\mathcal{T}'$ of $\mathcal{M}_+(E)$ is metrizable. One reason is that sequences suffice for dealing with metric topologies, but generally not for non-metric ones. An answer to this metrizability question is given in the next theorem.

Theorem A.2.9. The following assertions about a locally compact space $E$ are equivalent:
(a) $\mathcal{M}_+(E)$ is a Polish space in its vague topology.
(b) The vague topology $\mathcal{T}'$ of $\mathcal{M}_+(E)$ is metrizable and has a countable base.
(c) The topology of $E$ has a countable base.
(d) $E$ is a Polish space.

Since property (c) is certainly true if we take $E = \mathbb{R}^n$, we know that $(\mathcal{M}_+(\mathbb{R}^n), \mathcal{T}')$ is metrizable. Hence, $(\widetilde{V}, \mathcal{T}'_{\widetilde{V}})$ is metrizable as well. Finally, using the construction of the topologies $\mathcal{T}$ and $\mathcal{T}'$, a straightforward calculation shows that $\mathcal{T}_{\widetilde{V}} = \mathcal{T}'_{\widetilde{V}}$. Hence, $(\widetilde{V}, \mathcal{T}_{\widetilde{V}})$ is a metrizable space.
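Moreover, when we apply Theorem A.2.8 with $E = \mathbb{R}^n$, and we use that $\mathcal{T}_{\widetilde{V}} = \mathcal{T}'_{\widetilde{V}}$, we see that convergence of sequences in $(\widetilde{V}, \mathcal{T}_{\widetilde{V}})$ is simply weak convergence.

A.3 Some straightforward calculations

In this section we work out some calculations that are used in the text.

Proposition A.3.1. Let $X$ be a random variable with $E[X] = 0$ and $E[X^2] < \infty$. Suppose $X$ has a density function $\rho$ with support $I \subseteq \mathbb{R}$, where $I$ is a bounded or unbounded interval. Let
\[
h(x) := \begin{cases} \dfrac{\int_x^\infty y\rho(y)\, dy}{\rho(x)} & \text{if } x \in I, \\[2ex] 0 & \text{if } x \notin I. \end{cases}
\]
Then $E[h(X)] = E[X^2]$.

Proof. Notice that for every $x \in I$, we have by definition of $h$ that $h(x)\rho(x) = \int_x^\infty y\rho(y)\, dy$. This is also true for $x \notin I$, because $I$ is an interval and $E[X] = 0$: if $x$ lies to the left of $I$, the integral equals $E[X] = 0$, and if $x$ lies to the right of $I$, the integrand vanishes on $[x, \infty)$. Furthermore, we also notice that
\[
\int_x^\infty y\rho(y)\, dy = -\int_{-\infty}^x y\rho(y)\, dy,
\]
since $E[X] = 0$. We now obtain the following:
\[
\begin{aligned}
E[h(X)] &= \int_{-\infty}^{+\infty} h(x)\rho(x)\, dx = \int_{-\infty}^{+\infty} \int_x^\infty y\rho(y)\, dy\, dx \\
&= \int_{-\infty}^{0} \int_x^\infty y\rho(y)\, dy\, dx + \int_0^\infty \int_x^\infty y\rho(y)\, dy\, dx \\
&= \int_{-\infty}^{0} \int_{-\infty}^{x} \underbrace{(-y)\rho(y)}_{\ge 0}\, dy\, dx + \int_0^\infty \int_x^\infty \underbrace{y\rho(y)}_{\ge 0}\, dy\, dx \\
&= \int_{-\infty}^{0} (-y)\rho(y) \int_y^0 dx\, dy + \int_0^\infty y\rho(y) \int_0^y dx\, dy \\
&= \int_{-\infty}^{0} y^2\rho(y)\, dy + \int_0^{\infty} y^2\rho(y)\, dy = \int_{-\infty}^{+\infty} y^2\rho(y)\, dy = E[X^2],
\end{aligned}
\]
where we have used Fubini's Theorem, which is applicable since the two integrands are non-negative.

Proposition A.3.2. Let $\mu_1, \mu_2$ be measures on $(\mathbb{R}^n, \mathcal{R}^n)$ and let $\alpha, \beta \in \mathbb{R}_+$. Then the following statements are true:
(i) $\alpha\mu_1 + \beta\mu_2$ is a measure on $(\mathbb{R}^n, \mathcal{R}^n)$.
(ii) For any Borel-measurable function $f : \mathbb{R}^n \to \mathbb{R}$ for which the integrals below exist, we have:
\[
\int f\, d(\alpha\mu_1 + \beta\mu_2) = \alpha \int f\, d\mu_1 + \beta \int f\, d\mu_2.
\]

Proof. The proof of this proposition is straightforward.

Proposition A.3.3. Let $\|\cdot\|$ denote the matrix norm induced by the Euclidean norm on $\mathbb{R}^n$. Let $A = (a_{ij})$ be a real $n \times n$-matrix. Then for all $1 \le i, j \le n$: $|a_{ij}| \le \|A\|$.

Proof. Let $1 \le i, j \le n$ and let $A_{\cdot,j}$ denote the $j$-th column of $A$. Then
\[
|a_{ij}| \le \sqrt{\sum_{l=1}^{n} a_{lj}^2} = \|A_{\cdot,j}\| = \|Ae_j\| = \frac{\|Ae_j\|}{\|e_j\|} \le \|A\|,
\]
where $e_j = (0, \ldots, 0, 1, 0, \ldots, 0)^T$ is the $j$-th standard basis vector.

Proposition A.3.4. Let $X$ be a bounded random variable.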
Then $X$ has a finite moment generating function $R$. Moreover, $R$ is differentiable and $R'(t) = E[Xe^{tX}]$.

Proof. By assumption, there exists a constant $C$ such that $|X| \le C$. This implies that
\[
R(t) = E[e^{tX}] \le E[e^{|t||X|}] \le e^{|t|C} < \infty.
\]
Since the mgf of $X$ is finite, it is a well-known fact that $R(t)$ can be written as follows:
\[
R(t) = \sum_{j=0}^{\infty} \frac{E[X^j]\, t^j}{j!}.
\]
The proof of this result is straightforward and can be found in [4], p. 146. Since we now have a representation of $R$ as a power series, it follows from analysis that all derivatives $R^{(k)}$ of $R$ exist. Moreover, it holds that $R^{(k)}(t) = \sum_{j=k}^{\infty} \frac{E[X^j]\, t^{j-k}}{(j-k)!}$ for $t \in \mathbb{R}$ and $k = 1, 2, \ldots$. In particular, we have that:
\[
R'(t) = \sum_{j=1}^{\infty} \frac{E[X^j]\, t^{j-1}}{(j-1)!}, \quad t \in \mathbb{R}.
\]
Let $Y_n = \sum_{j=1}^{n} \frac{X^j t^{j-1}}{(j-1)!}$. Then we have that:
\[
Y_n = X \sum_{j=1}^{n} \frac{X^{j-1} t^{j-1}}{(j-1)!} = X \sum_{j=0}^{n-1} \frac{X^j t^j}{j!} \to Xe^{tX} \quad \text{a.s., as } n \to \infty.
\]
Since $|Y_n| \le |X| e^{|t||X|} \le Ce^{|t|C}$ for all $n$, we can use the Bounded Convergence Theorem, which implies that:
\[
\sum_{j=1}^{n} \frac{E[X^j]\, t^{j-1}}{(j-1)!} = E[Y_n] \to E[Xe^{tX}].
\]
Consequently, $E[Xe^{tX}] = \sum_{j=1}^{\infty} \frac{E[X^j]\, t^{j-1}}{(j-1)!} = R'(t)$, which concludes the proof.

Proposition A.3.5. Let $X$ be a random variable with $E[X] = 0$ and assume there exists a neighbourhood $I$ of $t = 0$ such that the mgf $R(t) = E[e^{tX}]$ is finite for all $t \in I$. Let $L(t) := \log R(t)$, $t \in I$. Then
(i) $L''(t) \ge 0$ for all $t \in I$,
(ii) $L''(t) > 0$ for all $t \in I$, if $X$ is non-degenerate.

Proof. Clearly $L'(t) = R'(t)/R(t)$ and therefore
\[
L''(t) = \frac{R''(t)R(t) - (R'(t))^2}{(R(t))^2}.
\]
From this expression for $L''(t)$, we see that $L''(t) \ge 0$ if and only if $R''(t)R(t) - (R'(t))^2 \ge 0$. Using the well-known expressions for $R'(t)$ and $R''(t)$, we see that this is equivalent to saying that $E[X^2\exp(tX)]\, E[\exp(tX)] \ge E[X\exp(tX)]^2$.
This inequality is indeed valid, since if we put $Y := X\exp(tX/2)$ and $Z := \exp(tX/2)$, then the Cauchy-Schwarz inequality (Theorem A.1.7) implies that
\[
\begin{aligned}
E[X\exp(tX)]^2 &= E[X\exp(tX/2)\exp(tX/2)]^2 \\
&= E[YZ]^2 \\
&\le E[Y^2]\, E[Z^2] \\
&= E[X^2\exp(tX)]\, E[\exp(tX)].
\end{aligned}
\]
Thus we have shown that $L''(t) \ge 0$. Besides, we know that the Cauchy-Schwarz inequality is strict whenever $Y$ and $Z$ are linearly independent. This is the case when $X$ is non-degenerate. Indeed, if $Y$ and $Z$ were linearly dependent, then $Z = 0$ a.s. or $Y = dZ$ a.s. for some $d \in \mathbb{R}$. Clearly $Z \ne 0$ almost surely. And if $Y = dZ$ a.s. for some $d \in \mathbb{R}$, then $X = d$ a.s., which is a contradiction, since $X$ is non-degenerate. Hence $L''(t) > 0$ when we assume that $X$ is non-degenerate.

Proposition A.3.6. For all real numbers $x$ and $y$, the following is true:
\[
|e^x - e^y| \le \frac{1}{2}|x - y|(e^x + e^y).
\]

Proof. The inequality we have to show can be rewritten as
\[
e^x|1 - e^{y-x}| \le \frac{1}{2}|x - y|\, e^x(1 + e^{y-x}),
\]
which is equivalent to $|1 - e^{y-x}| \le \frac{1}{2}|y - x|(1 + e^{y-x})$. Thus it suffices to prove that for all $t \in \mathbb{R}$ it holds that
\[
\frac{|1 - e^t|}{1 + e^t} \le \frac{1}{2}|t|.
\]
Let $g(t) := \frac{|1 - e^t|}{1 + e^t}$ and $h(t) := \frac{1}{2}|t|$; then $g(0) = 0 = h(0)$. Thus, it suffices to show that $g'(t) \le h'(t)$ for all $t > 0$ and $g'(t) \ge h'(t)$ for all $t < 0$. First let $t > 0$. Then $g(t) = \frac{e^t - 1}{1 + e^t} = 1 - \frac{2}{1 + e^t}$ and $h(t) = t/2$. Thus, we want to prove that $\frac{2e^t}{(1 + e^t)^2} \le 1/2$. We rewrite this inequality until we get an equivalent expression which we know to hold. Clearly,
\[
\frac{2e^t}{(1 + e^t)^2} \le \frac{1}{2} \iff 4e^t \le 1 + 2e^t + e^{2t} \iff 0 \le 1 - 2e^t + e^{2t} \iff 0 \le (1 - e^t)^2,
\]
where the last inequality obviously holds. Now let $t < 0$; then $g(t) = \frac{1 - e^t}{1 + e^t} = \frac{2}{1 + e^t} - 1$ and $h(t) = -t/2$. So, we want to prove that $-\frac{2e^t}{(1 + e^t)^2} \ge -1/2$. This is equivalent to showing that $\frac{2e^t}{(1 + e^t)^2} \le 1/2$, which we already proved. This concludes the proof of this proposition.

Proposition A.3.7. Let $Z$ be a standard normal rv and let $\alpha, \beta \in \mathbb{R}$ with $\beta < 1/2$.
Then, we have
\[
E\exp(\alpha Z + \beta Z^2) = \frac{1}{\sqrt{1 - 2\beta}} \exp\left( \frac{\alpha^2}{2(1 - 2\beta)} \right).
\]

Proof. Using the density function of $Z$ and completing the square in the exponent, it is easy to see that:
\[
\begin{aligned}
E\exp(\alpha Z + \beta Z^2) &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp(\alpha x + \beta x^2) \exp(-x^2/2)\, dx \\
&= \frac{1}{\sqrt{2\pi}} \exp\left( \frac{\alpha^2}{2(1 - 2\beta)} \right) \int_{-\infty}^{\infty} \exp\left( -\frac{1}{2} \left( \sqrt{1 - 2\beta}\, x - \frac{\alpha}{\sqrt{1 - 2\beta}} \right)^2 \right) dx.
\end{aligned}
\]
Put $y = \sqrt{1 - 2\beta}\, x - \frac{\alpha}{\sqrt{1 - 2\beta}}$ and perform a substitution. Then we obtain that:
\[
\begin{aligned}
E\exp(\alpha Z + \beta Z^2) &= \frac{1}{\sqrt{1 - 2\beta}} \cdot \frac{1}{\sqrt{2\pi}} \exp\left( \frac{\alpha^2}{2(1 - 2\beta)} \right) \int_{-\infty}^{\infty} \exp(-y^2/2)\, dy \\
&= \frac{1}{\sqrt{1 - 2\beta}} \exp\left( \frac{\alpha^2}{2(1 - 2\beta)} \right),
\end{aligned}
\]
where we used that the density function of a standard normal rv integrates to 1.

List of symbols

Symbol                              Description
mgf                                 moment generating function
p-space                             probability space
p-measure                           probability measure
rv                                  random variable
$U[a, b]$                           uniform distribution on $[a, b]$
$N(\mu, \sigma^2)$                  normal distribution with expectation $\mu$ and variance $\sigma^2$
$\Phi$                              standard normal distribution function
$B(0, r)$                           closed ball in $\mathbb{R}^n$ with center 0 and radius $r$
$(S, d)$                            metric space
$\mathcal{B}$                       Borel subsets of a certain metric space $(S, d)$
$\mathcal{R}^d$                     Borel σ-field in $\mathbb{R}^d$
$C^k$                               space of functions which are $k$ times continuously differentiable
$C^\infty$                          space of functions with derivatives up to any order
$L^p$                               space of random variables $X$ with $E[|X|^p] < \infty$
a.s.                                almost surely
$\xrightarrow{P}$                   convergence in probability
$\xrightarrow{d}$                   convergence in distribution
$\xrightarrow{w}$                   weak convergence
$\sigma\{Y\}$                       smallest sub-σ-field of $\mathcal{F}$ making $Y$ measurable
$R$                                 moment generating function
$\varphi_X$                         characteristic function of a random vector $X$
$I_A$                               indicator function
$A^c$                               complement of $A$
SRW                                 simple random walk
i.i.d.                              independent and identically distributed
SLLN                                strong law of large numbers
KMT                                 Komlós-Major-Tusnády
$B = (B_t)_{t \ge 0}$               Brownian Motion
$W = \{W(t) : t \ge 0\}$            Brownian Motion
$B^\circ = (B_t^\circ)_{t \ge 0}$   Brownian Bridge
$\biguplus_{i \in I} A_i$           disjoint union of the sets $A_i$, $i \in I$
$a \vee b$                          maximum of $a$ and $b$
$a \wedge b$                        minimum of $a$ and $b$
$\lfloor a \rfloor$                 floor function, i.e. the largest integer not greater than $a \in \mathbb{R}$
$\|\cdot\|$                         Euclidean norm or induced matrix norm
$\delta_{ij}$                       the Kronecker delta
$\mu_1 \otimes \mu_2$               product measure of $\mu_1$ and $\mu_2$
$\mu_1 \ll \mu_2$                   the measure $\mu_1$ is dominated by the measure $\mu_2$
$P_X$                               distribution of the random vector $X$

Bibliography

[1] K. Azuma, Weighted sums of certain dependent random variables, Tôhoku Math. J. (2) 19 (1967), 357–367.
[2] H. Bauer, Measure and integration theory, de Gruyter Studies in Mathematics, vol. 26, Walter de Gruyter & Co., Berlin, 2001. Translated from the German by Robert B. Burckel.
[3] R. Bhatia, Matrix analysis, Graduate Texts in Mathematics, vol. 169, Springer-Verlag, New York, 1997.
[4] P. Billingsley, Probability and measure, 3rd ed., Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons, Inc., New York, 1995. A Wiley-Interscience Publication.
[5] P. Billingsley, Convergence of probability measures, 2nd ed., Wiley Series in Probability and Statistics: Probability and Statistics, John Wiley & Sons, Inc., New York, 1999. A Wiley-Interscience Publication.
[6] S. Chatterjee, A new approach to strong embeddings, Probab. Theory Related Fields 152 (2012), no. 1-2, 231–264.
[7] H. Chernoff, A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations, Ann. Math. Statistics 23 (1952), 493–507.
[8] M. Csörgő and P. Révész, How big are the increments of a Wiener process?, Ann. Probab. 7 (1979), no. 4, 731–737.
[9] M. Csörgő and P. Révész, Strong approximations in probability and statistics, Probability and Mathematical Statistics, Academic Press, Inc. [Harcourt Brace Jovanovich, Publishers], New York-London, 1981.
[10] P. Deheuvels, L. Devroye, and J. Lynch, Exact convergence rate in the limit theorems of Erdős-Rényi and Shepp, Ann. Probab. 14 (1986), no. 1, 209–223.
[11] A. Dembo, Y. Peres, J. Rosen, and O. Zeitouni, Cover times for Brownian motion and random walks in two dimensions, Ann. of Math. (2) 160 (2004), no. 2, 433–464.
[12] R. M.
Dudley, Real analysis and probability, Cambridge Studies in Advanced Mathematics, vol. 74, Cambridge University Press, Cambridge, 2002. Revised reprint of the 1989 original.
[13] R. M. Dudley, Uniform central limit theorems, 2nd ed., Cambridge Studies in Advanced Mathematics, Cambridge University Press, Cambridge, 2014.
[14] N. Dunford and J. T. Schwartz, Linear Operators. I. General Theory, With the assistance of W. G. Bade and R. G. Bartle. Pure and Applied Mathematics, Vol. 7, Interscience Publishers, Inc., New York; Interscience Publishers, Ltd., London, 1958.
[15] R. Durrett, Probability: theory and examples, 4th ed., Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press, Cambridge, 2010.
[16] U. Einmahl, Maattheorie, Vrije Universiteit Brussel, 2012-2013 (http://homepages.vub.ac.be/ ueinmahl/).
[17] E. Hewitt and K. Stromberg, Real and abstract analysis, Springer-Verlag, New York-Heidelberg, 1975. A modern treatment of the theory of functions of a real variable; Third printing; Graduate Texts in Mathematics, No. 25.
[18] J. Kiefer, On the deviations in the Skorokhod-Strassen approximation scheme, Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 13 (1969), 321–332.
[19] J. Komlós, P. Major, and G. Tusnády, An approximation of partial sums of independent RV's and the sample DF. I, Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 32 (1975), 111–131.
[20] J. Komlós, P. Major, and G. Tusnády, An approximation of partial sums of independent RV's, and the sample DF. II, Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 34 (1976), no. 1, 33–58.
[21] T. L. Lai, Limit theorems for delayed sums, Ann. Probab. 2 (1974), no. 3, 432–440.
[22] W. Rudin, Functional analysis, 2nd ed., International Series in Pure and Applied Mathematics, McGraw-Hill, Inc., New York, 1991.
[23] A. I. Sakhanenko, Rate of convergence in the invariance principle for variables with exponential moments that are not identically distributed, Limit theorems for sums of random variables, Trudy Inst. Mat., vol.
3, “Nauka” Sibirsk. Otdel., Novosibirsk, 1984, pp. 4–49 (Russian).
[24] V. Strassen, An invariance principle for the law of the iterated logarithm, Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 3 (1964), 211–226.
[25] V. Strassen, Almost sure behavior of sums of independent random variables and martingales, Proc. Fifth Berkeley Sympos. Math. Statist. and Probability (Berkeley, Calif., 1965/66), Univ. California Press, Berkeley, Calif., 1967, Vol. II: Contributions to Probability Theory, Part 1, pp. 315–343.
[26] K. Yosida, Functional analysis, 6th ed., Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 123, Springer-Verlag, Berlin-New York, 1980.

Index

$L^p$-convergence, 97
$U[-1, 1]$, 34, 40, 50
σ-finite, 95
absolutely continuous, 22
Azuma-Hoeffding inequality, 35
Borel measure, 100
Borel-Cantelli, 84, 85
Bounded Convergence Theorem, 32, 103
Brownian bridge, 77
Brownian motion, 72
Cauchy-Schwarz, 9, 93
characteristic function, 91
Chernoff, 82
compact, 6, 98
conditional expectation, 94
continuous mapping theorem, 15, 33
convergence in distribution, 98
convex, 6, 98
convolution, 31
dominated convergence theorem, 18, 24
embedding, 1
Erdős-Rényi, 80
fixed point, 10, 98
Fubini, 93, 95
gamma(1/2, 2), 45
Gaussian process, 72
gradient, 5
Hölder's inequality, 93
Hessian, 5
inner regular, 100
Jensen's inequality, 95
Kolmogorov's 0-1 law, 87
Lipschitz function, 4
locally convex topological vector space, 5, 98
locally finite, 100
Mean Value Theorem, 25
Minkowski, 11
moment generating function, 103
monotone convergence theorem, 22, 33
moving averages, 88
Polish space, 96
positive semidefinite, 5
Prohorov, 96
Radon measure, 100
Schauder-Tychonoff, 98
semi-norm, 5, 99
separation, 99
sequence of bounded martingale differences, 35
sequentially compact, 96
simple random walk, 2, 57, 72
Stein coefficient, 4
strong coupling, 3
symmetric ±1-distribution, 2
tight, 96
uniform random permutation, 46
uniformly integrable, 97
vague topology, 100
weak convergence, 101