Notes on "Convergence of Probability Measures" by Billingsley

1 Weak Convergence in Metric Spaces

1.1 Measures on Metric Spaces

Definition 1. Our general framework is a metric space S equipped with a distance ρ, which defines the usual topology of open and closed sets. Along with this we get 𝒮, the Borel σ-algebra of subsets of S. We are interested in studying probability measures here, that is, measures P on 𝒮 which are non-negative, countably additive set functions satisfying P(S) = 1. We will also use the shorthand that for functions f : S → R, Pf = ∫ f dP = E(f) ∈ R.

Definition 2. We say that a sequence of probability measures Pn converges weakly to P, written Pn ⇒ P, if Pn f → Pf for every bounded, continuous real function f on S.

Theorem 3. Every probability measure P on S is regular, meaning that for every 𝒮-set A and every ε > 0 there are a closed set F and an open set G which sandwich A, F ⊂ A ⊂ G, so that P(G − F) < ε.

Proof. This is a Borel σ-algebra proof. One first verifies that the collection of sets A with the above property forms a σ-algebra (easy), and then checks that the closed sets have this property too. Indeed, for A closed, take F = A, and to get G consider the open sets A^δ := {x : ρ(x, A) < δ} (recall that the distance from a point to a set is taken with an inf). Since A is closed, A^{1/n} ↓ A, hence lim_{n→∞} P(A^{1/n}) = PA, so that P(A^δ − A) < ε for δ = 1/n sufficiently small.

Theorem 4. If P and Q are probability measures on S so that PF = QF for every closed set F, then PA = QA for every A ∈ 𝒮.

Proof. The collection of sets where they agree is a λ-system; since the closed sets form a π-system generating 𝒮, agreeing on all the closed sets is enough (by the π–λ theorem) to get equality everywhere.

Remark 5. This theorem shows us that a probability measure is determined entirely by its action on open and closed sets (we will see many more results of this flavor). The next result tells us that knowing the action on continuous f is also sufficient.
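Definition 2 can be illustrated numerically with a classical example that is not in the notes: the binomial measures Bin(n, λ/n) converge weakly to the Poisson(λ) measure, so Pn f → Pf for bounded continuous f. A minimal sketch (the test function cos and the truncation point are arbitrary choices):

```python
import math

def binom_pmf(n, p, k):
    # P{Bin(n, p) = k}
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def Pn_f(n, lam, f):
    # P_n f for P_n = Bin(n, lam/n): a finite sum over the support
    p = lam / n
    return sum(binom_pmf(n, p, k) * f(k) for k in range(n + 1))

def P_f(lam, f, kmax=60):
    # Pf for P = Poisson(lam), truncated at kmax (tail is negligible)
    total, term = 0.0, math.exp(-lam)   # term = P{Poisson(lam) = k}
    for k in range(kmax + 1):
        total += term * f(k)
        term *= lam / (k + 1)
    return total

f, lam = math.cos, 2.0   # cos is bounded and continuous
for n in (10, 100, 1000):
    print(n, Pn_f(n, lam, f))   # approaches P_f(lam, f) as n grows
print("limit", P_f(lam, f))
```

The total-variation bound dTV(Bin(n, λ/n), Poisson(λ)) ≤ λ²/n makes the rate of convergence of Pn f visible in the printed values.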
Theorem 6. Suppose P, Q are probability measures on S with Pf = Qf for every bounded continuous f. Then P = Q.

Proof. We will only need functions of the form f(x) = (1 − ρ(x, F)/ε)^+, where F is a closed set, ε > 0, and we use the shorthand z^+ = max(z, 0). One easily verifies the inequality

    1_F ≤ f = (1 − ρ(x, F)/ε)^+ ≤ 1_{F^ε}

(1_F is the indicator function of the set F). Here F^ε = {x : ρ(x, F) < ε} is the enlargement of F by a bit. Integrating the above inequality gives PF ≤ Pf. By hypothesis Pf = Qf, and integrating the inequality again gives Qf ≤ Q(F^ε). Stringing these together gives PF ≤ Q(F^ε). Taking ε to zero gives PF ≤ QF (as in the last theorem, Q(F^ε) → QF since F is closed). Of course, the argument works equally well to show QF ≤ PF, so P and Q agree on closed sets, and Theorem 4 gives the desired result.

Definition 7. We say that a probability measure P on S is tight if for every ε > 0 there exists a compact set K so that PK > 1 − ε. This notion of tightness is a bridge between compactness and the probability measure on the space.

Theorem 8. P is tight if and only if PA = sup_{K⊂A} PK for every A ∈ 𝒮 (the K's are compact here).

Proof. Suppose P is tight. For any ε > 0, find K so that PK > 1 − ε/2, and, by regularity, F closed and G open so that F ⊂ A ⊂ G and P(G − F) < ε/2. Then P(A − F) < ε/2 too. Now the set K ∩ F is compact (K compact and F closed), lies in A, and satisfies A − K ∩ F = (A ∩ K^c) ∪ (A − F), so P(A − K ∩ F) ≤ P(K^c) + P(A − F) < ε/2 + ε/2 = ε. Hence PA ≤ sup_{K⊂A} PK + ε. Since this holds for any ε, we get PA ≤ sup_{K⊂A} PK. The reverse inequality is trivial from K ⊂ A, so we have our result. Conversely, if PA = sup_{K⊂A} PK holds, feeding in A = S gives tightness.

Theorem 9. If S is separable and complete, then each probability measure on S is tight.

Proof. Fix any ε > 0. Since S is separable, for each k ∈ N, by taking open 1/k-balls centered at the points of a countable dense subset, we get a sequence of open sets A_{k,1}, A_{k,2}, … that cover S.
Now, since ∪_i A_{k,i} ↑ S, we can find for each k an n_k so that P(∪_{i≤n_k} A_{k,i}) > 1 − ε/2^k. Consider the set ∩_{k∈N} ∪_{i≤n_k} A_{k,i}. Since P((∪_{i≤n_k} A_{k,i})^c) < ε/2^k, we get P(∪_k (∪_{i≤n_k} A_{k,i})^c) < Σ_k ε/2^k = ε, and so P(∩_{k∈N} ∪_{i≤n_k} A_{k,i}) > 1 − ε. Finally, we remark that the closure of ∩_{k∈N} ∪_{i≤n_k} A_{k,i} is compact: it is closed (it's a closure of a set in a complete metric space) and totally bounded, since for each k₀ the set lies in ∪_{i≤n_{k₀}} A_{k₀,i}, which is covered by a finite number (n_{k₀} of them) of 1/k₀-balls. Hence it is compact, and this set gives us tightness.

Definition 10. A collection of sets 𝒜 ⊂ 𝒮 is called a separating class if two probability measures which agree on 𝒜 must agree on all of 𝒮. It is called a separating class since the values of P on 𝒜 are enough to separate P from any other probability measure. For example, as we have seen (Theorem 4), the closed sets are a separating class, and by complementation so are the open sets.

Definition 11. A π-system is a collection of sets which is closed under finite intersections; e.g. the half-open intervals (−∞, a] are a π-system on R.

Remark 12. If 𝒜 is a π-system which generates 𝒮 (the Borel σ-algebra), then 𝒜 is a separating class (by the π–λ theorem, as in Theorem 4).

Example 13. On R^k, the half-open rectangles (−∞, a₁] × (−∞, a₂] × … × (−∞, a_k] are a π-system that generates the Borel σ-algebra, and so a separating class. This means the cumulative distribution functions F(a₁, a₂, …, a_k) = P((−∞, a₁] × (−∞, a₂] × … × (−∞, a_k]) completely determine P. Since R^k is separable and complete, one can see that any probability measure on it is tight by the previous theorem. Another way to see tightness is to note that R^k = ∪_i B̄(0, i) is σ-compact, so that P(B(0, i)) ↑ 1. This argument is saying that probability measures on σ-compact spaces are tight.

Example 14. On the space of real-valued sequences R^∞ = {(x₁, x₂, …)}, take the metric

    ρ(x, y) = Σ_i b(x_i, y_i)/2^i,  where b(x_i, y_i) = 1 ∧ |x_i − y_i| = min(1, |x_i − y_i|).

This metrizes pointwise (coordinatewise) convergence.
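Before verifying that claim, here is a small numerical sketch of the metric itself (truncating the series at finitely many terms, an arbitrary choice; the neglected tail is at most 2^(−terms)):

```python
def rho(x, y, terms=60):
    # truncation of rho(x, y) = sum_{i>=1} min(1, |x_i - y_i|) / 2^i,
    # where x, y map the coordinate index i = 1, 2, ... to a real number
    return sum(min(1.0, abs(x(i) - y(i))) / 2.0**i for i in range(1, terms + 1))

zero = lambda i: 0.0
# x^n = (1/n, 1/n, ...) converges to zero coordinatewise, and indeed
# rho(x^n, zero) = (1/n) * sum_i 2^{-i} = 1/n -> 0
for n in (1, 10, 100):
    print(n, rho(lambda i: 1.0 / n, zero))
```

Note also that ρ ≤ 1 no matter how far apart the coordinates are, because of the 1 ∧ |x_i − y_i| clamp.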
If a sequence x^n ∈ R^∞ has ρ(x^n, x) → 0, then notice that b(x^n_i, x_i) ≤ 2^i ρ(x^n, x) → 0, so x^n_i → x_i for each i. Conversely, if x^n_i → x_i for each i, given any ε > 0, take n₀ so large that Σ_{i>n₀} 1/2^i < ε/2, and then take n₁ so large that |x^n_i − x_i| < ε/2 for every 1 ≤ i ≤ n₀ and n > n₁ (since there are only finitely many i here, 1 ≤ i ≤ n₀, this is OK!). But then, for n > n₁ we have:

    ρ(x^n, x) = Σ_i b(x^n_i, x_i)/2^i ≤ Σ_{i≤n₀} b(x^n_i, x_i)/2^i + Σ_{i>n₀} 1/2^i ≤ Σ_{i≤n₀} (ε/2)/2^i + ε/2 < ε.

So ρ(x^n, x) → 0 if and only if x^n_i → x_i for each i. One can then see that the natural projections onto the first k coordinates, π_k : R^∞ → R^k, are continuous, since convergence on R^k is also characterized by coordinatewise convergence. Hence the sets

    N_{k,ε}(x) = {y : |y_i − x_i| < ε, 1 ≤ i ≤ k}

are open, being preimages under π_k of open boxes in R^k. Moreover, y ∈ N_{k,ε}(x) implies ρ(y, x) ≤ ε + 1/2^k, so by choosing k large enough and ε small enough we can, for any r, arrange N_{k,ε}(x) ⊂ B(x, r); the sets N_{k,ε} therefore form a basis for the topology. By taking centers x with rational coordinates, only finitely many of them nonzero, we can then see that the space is separable. Completeness of the space follows since any Cauchy sequence is coordinatewise Cauchy; since R is complete, we then have coordinatewise convergence, which we've already shown is equivalent to convergence in our metric. Notice that R^∞ is not σ-compact (this can be proved with the Baire category theorem), so the fact that probability measures on it are tight is not obvious here; it does hold, since R^∞ is separable and complete.

Consider the finite-dimensional sets, those of the form π_k^{-1} H for H a Borel set of R^k. These form a π-system, since the intersection of two such sets can be written π_{k₁}^{-1} H₁ ∩ π_{k₂}^{-1} H₂ = π_k^{-1} (H₁′ ∩ H₂′) (some manipulation, adding extra coordinates to make k₁ and k₂ compatible, may be necessary). Moreover, the fact that the sets N_{k,ε}, which are exactly such finite-dimensional sets, form a basis for the space (they can be found within every open ball) shows that the finite-dimensional sets generate the whole σ-algebra.
This says precisely that the finite-dimensional sets are a separating class.

Example 15. The space C[0,1] of continuous functions on [0,1], with the sup-norm distance, can also be shown to be separable and complete. Separability can be seen by considering piecewise linear functions with rational values on finer and finer grids (the fact that these are dense uses uniform continuity). Completeness is a standard exercise: every Cauchy sequence is Cauchy at each coordinate, and the convergence is uniform. So again, we have tightness.

Example 16. (Again on C[0,1] with the sup-norm distance.) We will show that the finite-dimensional sets form a separating class. Let π_{t₁,…,t_k} : C[0,1] → R^k be given by π_{t₁,…,t_k}(x) = (x(t₁), x(t₂), …, x(t_k)). These are continuous, so the sets π_{t₁,…,t_k}^{-1} H, for H a Borel set of R^k, are Borel sets. As in the R^∞ example, consider the finite-dimensional sets of the form π_{t₁,…,t_k}^{-1} H. As before, by refining the indices t a little bit, we see that these sets form a π-system. By continuity of the functions in C[0,1], we can write the closed ball B̄(x, ε) = ∩_r {y : |x(r) − y(r)| ≤ ε}, where r ranges over the rationals Q ∩ [0,1]. Since the rationals are countable, these balls are in the σ-algebra generated by the finite-dimensional sets! That is to say, the finite-dimensional sets generate the Borel σ-algebra, and so are again a separating class.

1.2 Properties of Weak Convergence

Example 17. Write δ_x for the probability measure on S that has unit mass at x ∈ S; that is, δ_x(A) = 1_A(x) and δ_x(f) = ∫ f dδ_x = f(x). If we have xn → x₀ in S, then for continuous f we have δ_{xn} f = f(xn) → f(x₀) = δ_{x₀} f, and therefore, since this holds for all bounded continuous f, δ_{xn} ⇒ δ_{x₀} by definition of ⇒.
Conversely, if xn ↛ x₀, then there exists ε > 0 so that ρ(xn, x₀) > ε infinitely often, and then for the function f(x) = (1 − ρ(x, x₀)/ε)^+ we will have δ_{xn} f = f(xn) = 0 infinitely often while δ_{x₀} f = f(x₀) = 1, so of course δ_{xn} f ↛ δ_{x₀} f. Hence δ_{xn} ⇒ δ_{x₀} if and only if xn → x₀.

Example 18. Take S = [0,1] and P = L, the usual Lebesgue measure on [0,1]. Suppose that we have a sequence of probability measures constructed as sums of many point masses, Pn = (1/m_n) Σ_{1≤k≤m_n} δ_{x_{n,k}}, and that the points are asymptotically evenly distributed over [0,1] in the sense that for any interval J ⊂ [0,1],

    Pn J = #{k : x_{n,k} ∈ J}/m_n → L(J) = PJ.

This condition is enough to see that Pn ⇒ P. One neat way to see this more rigorously is to use the theory of Riemann integrals, for any continuous function f on [0,1] is Riemann integrable. Take a partition J₁, J₂, …, J_r of [0,1] fine enough that the upper (sups!) and lower (infs!) Riemann sums disagree by at most ε, and we will have:

    Pn f = (1/m_n) Σ_k f(x_{n,k}) ≤ Σ_i sup{f(x) : x ∈ J_i} · #{k : x_{n,k} ∈ J_i}/m_n → Σ_i sup{f(x) : x ∈ J_i} L(J_i) ≤ Pf + ε.

The analogous inequality holds for the lower Riemann sums, so we have a sandwich from which we conclude that Pn f → Pf. Since this holds for every continuous f, we conclude that Pn ⇒ P.

Definition 19. For a probability measure P on S, a set A ∈ 𝒮 whose boundary ∂A satisfies P(∂A) = 0 is called a P-continuity set.

Theorem 20. (Portmanteau Theorem) The following are equivalent ways to say Pn ⇒ P:
(i)/(ii) Pn ⇒ P, that is, Pn f → Pf for every bounded continuous f.
(iii)/(iv) lim sup Pn F ≤ PF for every closed set F / lim inf Pn G ≥ PG for every open set G.
(v) Pn A → PA for all P-continuity sets A.

Proof. (i)/(ii) ⇒ (iii)/(iv): For F closed, let f = (1 − ρ(·, F)/ε)^+, so that 1_F ≤ f ≤ 1_{F^ε} as in a previous argument. Since f is bounded and continuous, we have by (ii) that lim Pn f = Pf. But then lim sup Pn F ≤ lim sup Pn f = Pf ≤ P(F^ε). Since F is closed, we know ∩_ε F^ε = F, so taking ε → 0 in this inequality gives the desired result (iii). Taking complements shows (iii) and (iv) are really the same thing.
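The sandwich 1_F ≤ f ≤ 1_{F^ε}, used both here and in Theorem 6, can be checked pointwise. A toy sketch on S = R with the hypothetical choices F = [0, 1] and ε = 1/4:

```python
def dist_to_F(x, a=0.0, b=1.0):
    # rho(x, F) for the closed set F = [a, b] in S = R
    if x < a:
        return a - x
    if x > b:
        return x - b
    return 0.0

def f_eps(x, eps):
    # f(x) = (1 - rho(x, F)/eps)^+ : identically 1 on F, 0 off F^eps
    return max(1.0 - dist_to_F(x) / eps, 0.0)

eps = 0.25
for x in (-1.0, -0.1, 0.0, 0.5, 1.0, 1.1, 2.0):
    ind_F = 1.0 if 0.0 <= x <= 1.0 else 0.0          # 1_F(x)
    ind_F_eps = 1.0 if dist_to_F(x) < eps else 0.0   # 1_{F^eps}(x)
    assert ind_F <= f_eps(x, eps) <= ind_F_eps       # the sandwich
    print(x, f_eps(x, eps))
```

The function interpolates linearly from 1 on F down to 0 at distance ε, which is exactly what makes it bounded and continuous while squeezing between the two indicators.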
(iii)/(iv) ⇒ (v): Given a set A, recall that the boundary can be written ∂A = Ā − A°, so P(∂A) = 0 implies that P(Ā) = P(A°) = P(A). Since Ā is closed and A° is open, we have by (iii) and (iv) that:

    P(Ā) ≥ lim sup Pn Ā ≥ lim sup Pn A ≥ lim inf Pn A ≥ lim inf Pn A° ≥ P(A°).

Since P(Ā) = P(A°) = P(A), these are all equalities, and moreover lim Pn A = PA, which is (v).

(v) ⇒ (i)/(ii): Since f is bounded, by linearity we may assume that 0 ≤ f ≤ 1. Now Pf = ∫ f dP = ∫₀¹ P{f > t} dt (this is a Fubini–Tonelli type statement), and the same equality holds with Pn. For continuous f, ∂{f > t} ⊂ {f = t} (find sequences in {f > t} and {f ≤ t} converging to any point in ∂{f > t}; by continuity such a point has both f ≥ t and f ≤ t), so {f > t} is a P-continuity set whenever P{f = t} = 0. Of course, P{f = t} ≠ 0 for at most countably many t, say at t₁, t₂, …, and for every other t the set {f > t} is a P-continuity set. By condition (v), everywhere except for t = t₁, t₂, … we will have Pn{f > t} → P{f > t}. That is to say, Pn{f > t} → P{f > t} for L-almost every t in [0,1]. By the bounded convergence theorem then:

    Pn f = ∫₀¹ Pn{f > t} dt → ∫₀¹ P{f > t} dt = Pf.

Definition 21. A collection of Borel sets 𝒜 is called a convergence-determining class if Pn A → PA for all P-continuity sets A ∈ 𝒜 implies Pn ⇒ P. One can prove some wacky theorems that tell you that certain classes are convergence determining; see pages 17–19 of the book.

Example 22. The finite-dimensional sets of R^∞ are convergence determining. (Details omitted.)

Example 23. The finite-dimensional sets of C[0,1] are NOT convergence determining. To see this, take a sequence f_n of functions which goes to zero pointwise, with f_n(t) = 0 for all n large enough depending on t, but for which f_n ↛ 0 uniformly (e.g. f_n has a spike of height 1 at 1/n and is zero outside (0, 2/n)). Let Pn = δ_{f_n}; since f_n ↛ 0 in this metric space, δ_{f_n} ⇏ δ_0. However, for every finite-dimensional set π_{t₁,…,t_k}^{-1} H we have, for n large,
· 1Hn (f (tn )) = 1 2 t1 ,t2 ,... H −1 1H1 (0) · . . . · 1Hn (0) = 1π−1 (0) = P πt1 ,t2 ,... H since fn gets to zero at the t1 ,t2 ,... H points t1 , t2 , . . . for large enough n by the pointwise stipulation we made. So Pn A → PA for every nite dimensional A and yet Pn ; P. This means these are NOT a convergence determening class in C[0, 1]; this highlites a fundemental ∞ dierence between R and C[0, 1] The nite dimensional sets of mening. To see this, take a sequence Theorem 24. call it Pnij , j ⇒ P, Proof. (By contrapositive). If so that Pn , call itPni ,has Pn ⇒ P. If every subsequence of so thatPni |Pn f − Pf | > then Pn ; P a further subsquence, then there is a function intently often. f and subsequence, we see that it's impossible to have a sub-sub-sequence with P, as the function f provides an obstruction. 6 > 0 Taking this innetly often as our Pnij ; 1.2.1 The Mapping Theorem Theorem 25. Suppose h : S → S 0 is a continous function between two metric spaces. For P a probability measure on S , we get an induced probability measure 0 −1 −1 on S , namely Ph by Ph (A0 ) = P h−1 (A0 ) . If Pn ⇒ P then Pn h−1 ⇒ Ph−1 . f , the function f ◦ h is again bounded and Pn (f ◦ h) → P(f ◦ h). By R a change of variables for probability spaces however we see that, P (f ◦ h) = f ◦h(x)dP(dx) = R R f (h(x))dP(dx) = f (y)dP(h−1 (dy)) = Ph−1 (f ). So indeed Pn (f ◦ h) → P(f ◦ h) is the same as Pn h−1 (f ) → Ph−1 (f ). Since this holds for every −1 bounded continous f , we see Pn h ⇒ Ph−1 . Proof. For any bounded continuous conditunous. Hence Example 26. Pn ⇒ P gives : R∞ → Rk are continuous, so if Pn ⇒ P on Pπk−1 for every k . The converse is also true. πk−1 H = πk−1 (∂H). (one direction is trivial, ∞ the other not so hard using sequences), so that the P-continuity sets of Rf −1 k (nite dimensional sets) are precisly those which are Pπk −continuity sets in R −1 −1 for every k . Hence Pn πk ⇒ Pπk for every k means that Pn A → PA whenever A is a nite dimensional P−continuity set. 
Since the finite-dimensional sets are a convergence-determining class (Example 22), we conclude Pn ⇒ P.

Example 27. The projections π_{t₁,…,t_k} : C[0,1] → R^k are continuous, so if Pn ⇒ P on C[0,1], then Pn π_{t₁,…,t_k}^{-1} ⇒ P π_{t₁,…,t_k}^{-1} for every choice of t₁, …, t_k. The converse is NOT true, though: the same example with δ_{f_n} ⇏ δ_0 and pointwise convergence works in the same way.

Theorem 28. Let h : S → S′ be any measurable function, and let D_h ∈ 𝒮 be the set of discontinuities of h. If Pn ⇒ P and P(D_h) = 0, then Pn h^{-1} ⇒ Ph^{-1}.

Proof. We use the closed-set characterization from the Portmanteau theorem: it suffices to show lim sup Pn h^{-1} F ≤ P h^{-1} F for every closed F. To start, we first remark that D_h^c ∩ cl(h^{-1}F) ⊂ h^{-1}(F̄) for any F: for x ∈ D_h^c ∩ cl(h^{-1}F) there is a sequence x_n → x with h(x_n) ∈ F, and since x is a continuity point of h, h(x_n) → h(x) means that h(x) ∈ F̄. For any closed set F we then have (overbar/cl denote closure):

    lim sup Pn h^{-1}(F) ≤ lim sup Pn(cl(h^{-1}F))
                         ≤ P(cl(h^{-1}F))                (Portmanteau, (iii))
                         = P(cl(h^{-1}F) ∩ D_h^c)        (since D_h is a null event)
                         ≤ P(h^{-1} F̄)                  (above remark)
                         = P h^{-1} F                    (since F is closed).

1.3 Convergence in Distribution

So far we have been talking about probability measures on metric spaces. A different way to think about the same thing is to consider metric-space-valued random variables X : (Ω, ℱ, P) → (S, 𝒮), which come from an arbitrary probability space. Of course, such a random variable induces a measure on (S, 𝒮) in the usual way:

    PA = P(X^{-1}A) = P{X ∈ A}.

This is also called the law of X and is sometimes denoted P = L(X). This captures all the information we would want about X, so when we think of random variables we can think of them in two ways: either as a measurable function on a probability space, or as a measure on the metric space S in which X takes values. In this language:

    E[f(X)] = ∫_Ω f(X(ω)) P(dω) = ∫_S f(x) P(dx) = Pf.

Definition 29. We say that a sequence of random variables Xn converges in distribution to X if L(Xn) ⇒ L(X). For convenience we write Xn ⇒ X.
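In this language, Theorem 28 says that h(Xn) ⇒ h(X) whenever Xn ⇒ X and P{X ∈ D_h} = 0. The hypothesis on D_h matters even for deterministic variables; a toy sketch with the hypothetical choices h = 1_{(0,∞)} and Xn = 1/n:

```python
def h(x):
    # h = indicator of (0, infinity); its only discontinuity is at 0,
    # so D_h = {0}
    return 1.0 if x > 0 else 0.0

# X_n = 1/n is deterministic, with X_n => X = 0.  But L(X) = delta_0
# puts full mass on D_h, so Theorem 28 does not apply -- and indeed
# h(X_n) fails to converge in distribution to h(X):
for n in (1, 10, 100, 1000):
    print(n, h(1.0 / n), h(0.0))   # h(X_n) = 1 for every n, h(X) = 0
```

Here L(h(Xn)) = δ₁ for every n while L(h(X)) = δ₀, so the conclusion of the mapping theorem genuinely fails without the D_h condition.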
S L (Xn ) ⇒ L (X). Xn converges Xn ⇒ X . to For convenience we write (Portmenteau Theorem) In this setting the dierent equivalent ways to think about weak convergence look like: (i),(ii) Xn ⇒ X , that is E (f (Xn )) → E (f (X)) for all bounded continuous f. (iii),(iv) lim sup P {Xn ∈ F } ≤ P {X ∈ F } for every closed set F P {X ∈ G} for every open set G (v) P {Xn ∈ A} → P {X ∈ A} for all X−continuity sets A. Denition 31. / lim infP {Xn ∈ G} ≥ Sometimes we will conate our two notations, so we will write things like: Where X can be read as Pn = Xn ⇒ X Xn ⇒ P Pn ⇒ X L (X) P if you ever get confused. 1.3.1 Convergence of Probability Denition 32. Say a ∈ S . We say that Xn every >0 converges to a in probability if for we have: P {ρ (Xn , a) < } → 1 Proposition 33. Xn converges to a in probability if and only if 8 Xn ⇒ a Xn ⇒ a, we use G be any open set. If a ∈ G, then nd > 0 so that B(a, ) ⊂ G. For this we have that P {Xn ∈ B(a, )} = P {ρ(Xn , a) < } → 1, but B(a, ) ⊂ G , so P {Xn ∈ G} → 1 too. Hence lim inf P {Xn ∈ G} = 1 = P {a ∈ G}. If a ∈ / G, then we have the trivial inequality lim inf P {Xn ∈ G} ≥ 0 = P {a ∈ G}. Conversly, suppose Xn ⇒ a. For every > 0, B(a, ) is an open set so choosing G = B(a, ), we have by the portmenteau theorem that 1 ≥ lim inf P {Xn ∈ B(a, )} ≥ 1 = P {a ∈ B(a, )}, hence lim P {Xn ∈ B(a, )} = 1 Xn Proof. Suppose converges to a in probability. To see that the open-set-lim-inf criteria of the Portmenteau theorem. Let so we have convergence in probability. Theorem 34. and (Xn , Yn ) are random elements of S ×S . Yn ⇒ X Suppose that ρ (Xn , Yn ) ⇒ 0 then Proof. We use the closed-set-lim-sup criteria. Let F = {x : ρ (x, F ) ≤ }. 
Proof. We use the closed-set lim sup criterion. Let F be any closed set, and let F^ε = {x : ρ(x, F) ≤ ε}. Then:

    P{Yn ∈ F} ≤ P{ρ(Xn, Yn) ≥ ε} + P{Xn ∈ F^ε}.

Since F^ε is closed, we take lim sups:

    lim sup P{Yn ∈ F} ≤ lim sup P{ρ(Xn, Yn) ≥ ε} + lim sup P{Xn ∈ F^ε}
                      ≤ 1 − lim inf P{ρ(Xn, Yn) < ε} + P{X ∈ F^ε}
                      = 1 − 1 + P{X ∈ F^ε} = P{X ∈ F^ε}.

Since F is closed, taking ε → 0 gives P{X ∈ F^ε} ↓ P{X ∈ F}, which gives the result.

1.3.2 Local vs Integral Laws

Proposition 35. Suppose Pn and P are absolutely continuous with respect to some other measure µ, with densities f_n and f respectively. If f_n(x) → f(x) for µ-almost every x, then Pn ⇒ P. The converse statement is not true.

Proof. (Sketch; this is related to the total variation stuff we looked at in Markov mixing.) By Scheffé's theorem, one has

    sup_{A∈𝒮} |Pn A − PA| ≤ ∫_S |f(x) − f_n(x)| µ(dx) → 0.

So of course Pn A → PA for every P-continuity set, and Pn ⇒ P.

1.3.3 Integration to the Limit

If Xn ⇒ X, when does E(Xn) → E(X)?

Theorem 36. If Xn ⇒ X, then E|X| ≤ lim inf E|Xn|.

Proof. Since |·| is a continuous function, by the mapping theorem |Xn| ⇒ |X|. By the same type of argument as in the proof of (v) ⇒ (i) in the Portmanteau theorem, P{|Xn| > t} → P{|X| > t} for all but countably many t. The result now follows by Fatou's lemma:

    E|X| = ∫₀^∞ P{|X| > t} dt ≤ lim inf ∫₀^∞ P{|Xn| > t} dt = lim inf E|Xn|.

Definition 37. We say that the sequence of random variables Xn is uniformly integrable if:

    lim_{α→∞} sup_n ∫_{|Xn|>α} |Xn| dP = 0.

This holds if the Xn are uniformly bounded, since for α larger than the bound all of these integrals are 0.

Proposition 38. If the Xn are uniformly integrable, then sup_n E|Xn| < ∞.

Proof. Take α₀ so large that sup_n ∫_{|Xn|>α₀} |Xn| dP ≤ 1. Now:

    sup_n E|Xn| ≤ sup_n ∫_{|Xn|>α₀} |Xn| dP + sup_n ∫_{|Xn|≤α₀} |Xn| dP ≤ 1 + α₀ < ∞.

Theorem 39. If the Xn are uniformly integrable and Xn ⇒ X, then X is integrable and E(Xn) → E(X).

Proof. Since the E|Xn| are bounded (Proposition 38), we know by our Fatou-type lemma that E|X| ≤ lim inf E|Xn| < ∞, so X is integrable. By the mapping theorem with the continuous maps (·)^+ and (·)^−, we have Xn^+ ⇒ X^+ and Xn^− ⇒ X^−.
Now write:

    E(Xn^+) = ∫₀^α P{t < Xn^+ < α} dt + ∫_{Xn^+ ≥ α} Xn^+ dP,
    E(X^+)  = ∫₀^α P{t < X^+ < α} dt  + ∫_{X^+ ≥ α} X^+ dP.

The last terms in these equations tend to zero as α → ∞, uniformly in n, by the uniform integrability condition. Hence, to see E(Xn^+) → E(X^+), it suffices to check that for large α, ∫₀^α P{t < Xn^+ < α} dt → ∫₀^α P{t < X^+ < α} dt. By choosing an α with P{X^+ = α} = 0, this follows by the bounded convergence theorem (as it did in the Portmanteau theorem). The same can be said of X^−, so we get E(Xn) → E(X) as desired.

Proposition 40. If there exists ε > 0 so that sup_n E|Xn|^{1+ε} < ∞, then the Xn are uniformly integrable.

Proof. We have:

    ∫_{|Xn|≥α} |Xn| dP ≤ ∫_{|Xn|≥α} |Xn|^{1+ε}/α^ε dP ≤ E|Xn|^{1+ε}/α^ε,

and the right-hand side tends to 0 uniformly in n as α → ∞.

1.4 Skipped Section on Permutations

1.5 Prohorov's Theorem

Definition 41. Let Π be a family of probability measures on S. We call Π relatively compact if every sequence of elements of Π contains a weakly convergent subsequence; i.e. for every {Pn} ⊂ Π there are a subsequence P_{n_i} and a probability measure P such that P_{n_i} ⇒ P. We will be mostly concerned with the case where Π is itself a sequence; in this setting Π is relatively compact if every subsequence has a further subsequence which weakly converges to something. Recall the following theorem (Theorem 24):

Theorem 42. If Pn is relatively compact, and the limiting probability measure is the same P for every convergent subsequence, then Pn ⇒ P. In other words, if every subsequence of Pn, call it P_{n_i}, has a further subsequence, call it P_{n_{i_j}}, with P_{n_{i_j}} ⇒ P, then Pn ⇒ P.

Proof. (By contrapositive, as before.) If Pn ⇏ P, then there is a function f and an ε > 0 so that |Pn f − Pf| > ε infinitely often. Taking this infinitely-often set as our subsequence, we see that it's impossible for it to have a sub-subsequence converging weakly to P, as the function f provides an obstruction.

Why is relative compactness useful? Here are some examples.

Example 43. Suppose we are on C[0,1] and we know that, for some measure P, Pn π_{t₁,…,t_k}^{-1} ⇒ P π_{t₁,…,t_k}^{-1} for every t₁, …, t_k (this is the statement that the finite-dimensional distributions converge). We have already seen
that this does not necessarily mean that Pn ⇒ P, i.e. that the finite-dimensional sets are not convergence determining (for example, if we take the pointwise-convergent sequence of continuous spikes going to zero, the point masses at these functions do not weakly converge, as the functions do not converge uniformly). However, if we know in addition that the family Pn is relatively compact, then we have a candidate Q: every subsequence P_{n_i} has a further subsequence P_{n_{i_j}} ⇒ Q. Now, by the mapping theorem, it must be that P_{n_{i_j}} π_{t₁,…,t_k}^{-1} ⇒ Q π_{t₁,…,t_k}^{-1}. By uniqueness of weak limits, we then have P π_{t₁,…,t_k}^{-1} = Q π_{t₁,…,t_k}^{-1}. But since the finite-dimensional sets are a separating class, P = Q. Finally, by the last theorem, since every subsequence P_{n_i} has a further subsequence converging to P, we conclude that Pn ⇒ P. In other words: finite-dimensional convergence + relative compactness gives weak convergence.

Example 44. Similar to the above: if Pn is relatively compact and we know that Pn π_{t₁,…,t_k}^{-1} ⇒ µ_{t₁,…,t_k} for some family of measures µ, then there is a P with P π_{t₁,…,t_k}^{-1} = µ_{t₁,…,t_k}, and Pn ⇒ P.

1.5.1 Tightness

How do we prove relative compactness? On the real line, let F_n be the distribution functions for Pn. By the Helly selection theorem, every subsequence F_{n_i} has a further subsequence F_{n_{i(m)}} and a nondecreasing, right-continuous F so that F_{n_{i(m)}} → F pointwise at all continuity points of F. However, F might fail to be a proper distribution function, for the reason that it doesn't have total mass 1, i.e. lim_{x→∞} F(x) ≠ 1 or lim_{x→−∞} F(x) ≠ 0. E.g. δ_n has F_n(x) = Heaviside(x − n) → 0 as n → ∞. Another example is the uniform distribution on [−n, n]. A condition that prevents this escape of mass is uniform tightness:
Definition 45. A family Π is tight (or uniformly tight) if for every ε > 0 there exists a compact K such that PK > 1 − ε for every P ∈ Π.

Theorem 46. (Prohorov's Theorem) If Π is tight, then it is relatively compact.

Corollary 47. If {Pn} is tight, and if each convergent subsequence of Pn converges to P, then Pn ⇒ P.

Proof. By Prohorov's theorem, {Pn} is relatively compact, hence every subsequence has a sub-subsequence which converges. By hypothesis, it must converge to P. By the earlier theorem with the proof by contrapositive (Theorem 42), Pn ⇒ P.

1.5.2 The Proof of Prohorov's Theorem

It's pretty technical, so I will skip it for now.
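The two escaping-mass examples of Section 1.5.1 can be sketched numerically: at any fixed x, the distribution functions of δ_n tend to 0 and those of the uniform distribution on [−n, n] tend to 1/2, so neither Helly limit is a proper distribution function. (The evaluation point x = 3 is an arbitrary choice.)

```python
def F_delta(x, n):
    # distribution function of the point mass delta_n: a unit step at n
    return 1.0 if x >= n else 0.0

def F_unif(x, n):
    # distribution function of the uniform distribution on [-n, n]
    if x <= -n:
        return 0.0
    if x >= n:
        return 1.0
    return (x + n) / (2.0 * n)

x = 3.0
for n in (1, 10, 100, 1000):
    print(n, F_delta(x, n), F_unif(x, n))
# F_delta(x, n) -> 0 and F_unif(x, n) -> 1/2 pointwise as n -> infinity:
# in both families, probability mass escapes to infinity, which is
# exactly what tightness rules out.
```

Neither family is tight: no compact [−M, M] can carry mass 1 − ε uniformly over n.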