Notes on Convergence of Probability Measures
by Patrick Billingsley
1 Weak Convergence in Metric Spaces

1.1 Measures on Metric Spaces
Definition 1. Our general framework here will be a metric space S equipped with a distance ρ, which defines the usual topology of open and closed sets. Along with this we will get 𝒮, the Borel sigma algebra of subsets. We are interested in studying probability measures here, that is, measures P on 𝒮 which are non-negative, countably additive set functions satisfying P(S) = 1. We will also use the shorthand that for functions f : S → R, we define Pf = ∫ f dP = E(f) ∈ R.
Definition 2. We write that a sequence of probability measures P_n converges weakly to P, and we say P_n ⇒ P, if P_n f → Pf for every bounded, continuous real function f on S.

Theorem 3. Every probability measure P on S is what is called regular, meaning that for every 𝒮-set A and every ε > 0, there are a closed set F and an open set G that sandwich A, so that F ⊂ A ⊂ G and P(G − F) < ε.
Proof. This is a Borel sigma algebra proof. One first verifies that the collection of sets A with the above property forms a sigma algebra (easy), and then checks that the closed sets have this property too. Indeed, for A closed, take F = A, and then to get G consider the collection of open sets A^δ := {x : ρ(x, A) < δ} (recall that the distance from a point to a set is taken with an inf). Now, since ∩_n A^{1/n} = A (as A is closed) and the sets A^{1/n} are decreasing, we have lim_{n→∞} P(A^{1/n}) = PA, so that P(A^δ − A) < ε for δ sufficiently small.
Theorem 4. If P and Q are probability measures on S so that PF = QF for every closed set F, then PA = QA for every A ∈ 𝒮.

Proof. The collection of sets where they agree is seen to be a sigma algebra, so containing all the closed sets is enough to get equality everywhere.
Remark 5. This theorem shows us that a probability measure is determined entirely by its action on open and closed sets (we will see a lot more results of this flavor). The next result tells us that knowing the action Pf for bounded continuous f is also sufficient.
Theorem 6. Suppose P, Q are probability measures on S with Pf = Qf for every bounded continuous f. Then P = Q.
Proof. We will only need to consider functions of the form f(x) = (1 − ρ(x, F)/ε)^+, where F is a closed set, ε > 0, and we use the shorthand z^+ = max(z, 0) ≥ 0. One easily verifies the inequality 1_F ≤ f = (1 − ρ(x, F)/ε)^+ ≤ 1_{F^ε}. (Here 1_F is the indicator function of the set F, and F^ε = {x : ρ(x, F) < ε} is the enlargement of F by a bit.) Now, integrating the left inequality gives PF ≤ Pf. By hypothesis, Pf = Qf, and integrating the right inequality gives Qf ≤ Q(F^ε). Stringing these together gives PF ≤ Q(F^ε). Taking ε to zero gives PF ≤ QF (like in the last theorem, Q(F^ε) → QF as ε → 0 since F is closed). Of course, the argument works equally well to show QF ≤ PF, so PF = QF for every closed F, and the previous theorem gives P = Q.
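The sandwich inequality 1_F ≤ (1 − ρ(x, F)/ε)^+ ≤ 1_{F^ε} can be checked numerically; here is a minimal sketch in Python, taking S = R and F = [0, 1] (an illustrative choice, not from the notes):

```python
# A minimal numerical sketch (illustrative choice of F, not from the notes):
# check the sandwich 1_F <= f <= 1_{F^eps} on the real line with F = [0, 1].
def dist_to_F(x):
    # rho(x, F) for F = [0, 1] on the real line
    return max(0.0, -x, x - 1.0)

def f(x, eps):
    # f(x) = (1 - rho(x, F)/eps)^+
    return max(0.0, 1.0 - dist_to_F(x) / eps)

eps = 0.25
for x in [k / 100.0 for k in range(-200, 300)]:
    in_F = 1.0 if 0.0 <= x <= 1.0 else 0.0          # 1_F(x)
    in_F_eps = 1.0 if dist_to_F(x) < eps else 0.0   # 1_{F^eps}(x)
    assert in_F <= f(x, eps) <= in_F_eps
```

The check runs over a grid of points on [−2, 3]; integrating this sandwich against P and Q is exactly the step used in the proof.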
Definition 7. We say that a probability measure P on S is tight if for every ε > 0, there exists a compact set K so that PK > 1 − ε. This notion of tightness is a bridge between the idea of compactness and the probability measure on the space.
Theorem 8. P is tight if and only if PA = sup_{K⊂A} PK for every A ∈ 𝒮 (the K's are compact here).

Proof. Suppose P is tight. For any ε > 0, find K so that PK > 1 − ε/2, and find F closed, G open, so that F ⊂ A ⊂ G and P(G − F) < ε/2. Then P(A − F) < ε/2 too. Now the set K ∩ F is compact (as K is compact and F is closed) and A − (K ∩ F) ⊂ (A ∩ K^c) ∪ (A − F), so P(A − (K ∩ F)) ≤ P(K^c) + P(A − F) < ε/2 + ε/2 = ε. Hence PA ≤ sup_{K⊂A} PK + ε. Since this holds for any ε, we get PA ≤ sup_{K⊂A} PK. The reverse inequality is trivial from K ⊂ A, so we have our result.

Conversely, if PA = sup_{K⊂A} PK holds, feeding in A = S gives tightness.

Theorem 9. If S is separable and complete, then each probability measure on S is tight.
Proof. Fix any ε > 0. Since S is separable, for each k ∈ N, by taking open 1/k-balls centered at the countable dense subset, we get a sequence of open sets A_{k,1}, A_{k,2}, ... that cover S. Now, since ∪_i A_{k,i} ↑ S, we can find for each k an n_k so that P(∪_{i≤n_k} A_{k,i}) > 1 − ε/2^k. Now, consider the set ∩_{k∈N} ∪_{i≤n_k} A_{k,i}. Since P((∪_{i≤n_k} A_{k,i})^c) < ε/2^k, we have P(∪_k (∪_{i≤n_k} A_{k,i})^c) < Σ_k ε/2^k = ε, and so P(∩_{k∈N} ∪_{i≤n_k} A_{k,i}) > 1 − ε. Finally, we remark that the closure of ∩_{k∈N} ∪_{i≤n_k} A_{k,i} is compact, as it is closed (it's a closure of a set in a complete metric space) and totally bounded, since ∩_{k∈N} ∪_{i≤n_k} A_{k,i} ⊂ ∪_{i≤n_{k_0}} A_{k_0,i}, which can be covered by a finite number (n_{k_0} of them) of 1/k_0-balls. Hence it is compact, so this set gives us tightness.
Definition 10. A collection of sets 𝒜 ⊂ 𝒮 is called a separating class if two probability measures which agree on 𝒜 must agree on all of 𝒮. This is called a separating class since the values of P on 𝒜 are enough to separate P from any other probability measure. For example, as we have seen, the closed sets are a separating class (and hence, by taking complements, so are the open sets).
Definition 11. A π-system is a collection of sets which is closed under finite intersections, e.g. the half open intervals (−∞, a] on R.

Remark 12. If 𝒜 is a π-system which generates 𝒮 (the Borel sigma algebra), then 𝒜 is a separating class. E.g. the half open intervals (−∞, a] are a π-system on R that generates the Borel sigma algebra, and so are a separating class.

Example 13. On R^k, the half open intervals are the sets (−∞, a_1] × (−∞, a_2] × ... × (−∞, a_k]. These are a π-system that generates the Borel sigma algebra, and so are a separating class. This means the cumulative distribution functions F(a_1, a_2, ..., a_k) = P((−∞, a_1] × (−∞, a_2] × ... × (−∞, a_k]) completely determine P. Since R^k is separable and complete, one can see any probability measure is tight by the previous theorem. Another way to see tightness is to note that R^k = ∪_i B̄(0, i) is σ-compact, so that P(B̄(0, i)) ↑ 1. This argument is saying that probability measures on σ-compact spaces are tight.
Example 14. On the space of real valued sequences, R^∞ = {(x_1, x_2, ...)}, take the metric:

ρ(x, y) = Σ_i b(x_i, y_i)/2^i

where b(x_i, y_i) = 1 ∧ |x_i − y_i| = min(1, |x_i − y_i|). This metrizes pointwise convergence. If a sequence x^n ∈ R^∞ has ρ(x^n, x) → 0, then notice that b(x^n_i, x_i) ≤ 2^i ρ(x^n, x) → 0, so x^n_i → x_i. Conversely, suppose x^n_i → x_i for each i. Given any ε > 0, take n_0 so large that Σ_{i>n_0} 1/2^i < ε/2, and then take n_1 so large that |x^n_i − x_i| < ε/2 for every 1 ≤ i ≤ n_0 and n > n_1 (since there are only finitely many i here, 1 ≤ i ≤ n_0, this is ok!). But then, for n > n_1 we have:

ρ(x^n, x) = Σ_i b(x^n_i, x_i)/2^i ≤ Σ_{i≤n_0} b(x^n_i, x_i)/2^i + Σ_{i>n_0} 1/2^i ≤ Σ_{i≤n_0} (ε/2)/2^i + (ε/2) < ε

So ρ(x^n, x) → 0 if and only if x^n_i → x_i for each i. One can then see that the natural projections onto the first k coordinates, π_k : R^∞ → R^k, are continuous, since convergence on R^k is also characterized by coordinate-wise convergence. Hence the sets:

N_{k,ε}(x) = {y : |y_i − x_i| < ε, 1 ≤ i ≤ k} = π_k^{-1}(B(π_k x, ε))

are open (here B(π_k x, ε) is the open sup-norm ball in R^k). Moreover, y ∈ N_{k,ε}(x) implies ρ(y, x) ≤ ε + 1/2^k, so by choosing k large enough and ε small enough, we can, for any r, arrange N_{k,ε}(x) ⊂ B(x, r).
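Here is a small numerical sketch of this metric (the truncation at N coordinates is my assumption for computability; the neglected tail contributes at most 2^(−N)):

```python
# A numerical sketch (truncation at N coordinates is an assumption, not from
# the notes): rho(x, y) = sum_i min(1, |x_i - y_i|)/2^i, first N terms only.
def rho(x, y, N=60):
    return sum(min(1.0, abs(xi - yi)) / 2.0 ** (i + 1)
               for i, (xi, yi) in enumerate(zip(x[:N], y[:N])))

# Coordinatewise convergence x^n -> 0 forces rho(x^n, 0) -> 0:
zero = [0.0] * 60
dists = [rho([1.0 / n] * 60, zero) for n in range(1, 200)]
assert all(d2 <= d1 for d1, d2 in zip(dists, dists[1:]))  # monotone decreasing
assert dists[-1] < 0.01
```

The sequence x^n = (1/n, 1/n, ...) converges to 0 in every coordinate, and the computed distances shrink accordingly, as the equivalence above predicts.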
By choosing only rational x with finitely many nonzero coordinates, we can then see that the space is separable. Completeness of the space follows since any Cauchy sequence will be coordinate-wise Cauchy; since R is complete, we then have coordinatewise convergence, which we've already shown is equivalent to convergence in our metric. Hence every probability measure on R^∞ is tight! Notice that R^∞ is not σ-compact (this can be proved by the Baire category theorem), so the fact that probability measures are tight is not obvious here.
Consider the finite dimensional sets, those of the form π_k^{-1}(H) for H Borel in R^k. Notice that these form a π-system, since the intersection of two such sets can be written as π_{k_1}^{-1}(H_1) ∩ π_{k_2}^{-1}(H_2) = π_k^{-1}(H_1' ∩ H_2') (some manipulation with adding extra coordinates to make k_1 and k_2 compatible may be necessary). Moreover, the fact that the sets N_{k,ε}, which are exactly such finite dimensional sets, form a basis for the space (they can be found within every open ball) shows that the finite dimensional sets generate the whole sigma algebra. This says precisely that the finite dimensional sets are a separating class.
Example 15. The space of continuous functions on [0, 1], with the sup-norm distance, can also be shown to be separable and complete. Separability can be seen by considering piecewise linear functions with rational values that live on finer and finer grids (the fact that these are dense uses uniform continuity). Completeness is a standard exercise: one sees that every Cauchy sequence must be Cauchy at each coordinate, and that the convergence is uniform. So again, we have tightness.
Example 16. (Again on C[0, 1] with the sup-norm distance.) We will show that the finite dimensional sets form a separating class. Let π_{t_1,t_2,...,t_k} : C[0, 1] → R^k be defined by π_{t_1,...,t_k}(x) = (x(t_1), x(t_2), ..., x(t_k)). These are continuous, so the sets π_{t_1,...,t_k}^{-1}(H), for H Borel in R^k, are Borel sets. As in the R^∞ example, consider the set of finite dimensional sets of the form π_{t_1,...,t_k}^{-1}(H). As before, by refining the indices t a little bit, we see that these sets form a π-system. By continuity of the functions, we can write in C[0, 1] that the closed ball B̄(x, ε) = ∩_r {y : |x(r) − y(r)| ≤ ε}, where r ranges over the rationals in Q ∩ [0, 1]. Since the rationals are countable, these balls are in the sigma algebra generated by the finite dimensional sets (and hence so are the open balls, as countable unions B(x, ε) = ∪_n B̄(x, ε − 1/n)). That is to say, the finite dimensional sets are again a separating class.
1.2 Properties of Weak Convergence
Example 17. Write δ_x for the probability measure on S that has unit mass at x ∈ S. That is, δ_x(A) = 1_A(x) and δ_x(f) = ∫ f dδ_x = f(x). If we have x_n → x_0 in S, then for continuous f we have that δ_{x_n} f = f(x_n) → f(x_0) = δ_{x_0} f, and therefore, since this holds for all continuous f, δ_{x_n} ⇒ δ_{x_0} by definition of ⇒. Conversely, if x_n ↛ x_0, then there exists ε > 0 so that ρ(x_n, x_0) > ε infinitely often, and then for the function f(x) = (1 − ρ(x, x_0)/ε)^+ we will have that δ_{x_n} f = f(x_n) = 0 infinitely often, while δ_{x_0} f = f(x_0) = 1, so of course δ_{x_n} f ↛ δ_{x_0} f. Hence δ_{x_n} ⇒ δ_{x_0} if and only if x_n → x_0.
Example 18. Take S = [0, 1] and P = L[0, 1] to be the usual Lebesgue measure. Suppose that we have a sequence of probability measures constructed as a sum of many point masses, that is, each P_n = (1/m_n) Σ_{1≤k≤m_n} δ_{x_{n,k}}, and that the points are asymptotically evenly distributed over [0, 1] in the sense that for any interval J ⊂ [0, 1]:

P_n J = #{k : x_{n,k} ∈ J}/m_n → L(J) = PJ

This condition is enough to see that P_n ⇒ P. One neat way to see this rigorously is to use the theory of Riemann integrals, for if f is any continuous function on [0, 1], then it is Riemann integrable. Take a partition J_1, J_2, ..., J_r of [0, 1] fine enough so that the upper (sups!) and lower (infs!) Riemann sums disagree by at most ε, and we will have that:

P_n f = (1/m_n) Σ_k f(x_{n,k}) ≤ (1/m_n) Σ_i sup{f(x) : x ∈ J_i} · #{k : x_{n,k} ∈ J_i} → Σ_i sup{f(x) : x ∈ J_i} L(J_i) ≤ Pf + ε

The other inequality holds too, for the lower Riemann sums, and so we have a sandwich from which we conclude that P_n f → Pf. Since this holds for every continuous f, we conclude that P_n ⇒ P.

Definition 19. For a probability measure P on S, a set A ∈ 𝒮 whose boundary ∂A satisfies P(∂A) = 0 is called a P-continuity set.
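The empirical-measure convergence of Example 18 can be sketched numerically; the evenly spaced points x_{n,k} = k/n are my illustrative choice of an asymptotically evenly distributed family:

```python
import math

# A numerical sketch of Example 18 (assumption: x_{n,k} = k/n, so P_n is the
# empirical measure on {1/n, 2/n, ..., 1}).
def Pn_f(f, n):
    # P_n f = (1/n) * sum_k f(k/n)
    return sum(f(k / n) for k in range(1, n + 1)) / n

# P_n f should approach Pf = \int_0^1 f(x) dx; for f = cos, Pf = sin(1).
errs = [abs(Pn_f(math.cos, n) - math.sin(1.0)) for n in (10, 100, 1000)]
assert errs[0] > errs[1] > errs[2]   # errors shrink as n grows
assert errs[2] < 1e-3
```

This is exactly the Riemann-sum sandwich from the example: P_n f is a Riemann sum for ∫_0^1 f, so it converges to Pf for every continuous f.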
Theorem 20. (Portmanteau Theorem) The following are equivalent ways to say P_n ⇒ P:

(i)/(ii) P_n ⇒ P, that is, P_n f → Pf for every bounded continuous f.
(iii)/(iv) lim sup P_n F ≤ PF for every closed set F / lim inf P_n G ≥ PG for every open set G.
(v) P_n A → PA for all P-continuity sets A.

Proof. (i)/(ii) ⇒ (iii)/(iv): For F closed, let f = (1 − ρ(·, F)/ε)^+, so that 1_F ≤ f ≤ 1_{F^ε} as in a previous argument. Since f is bounded and continuous, we have by (ii) that lim P_n f = Pf. But then lim sup P_n F ≤ lim sup P_n f = Pf ≤ P(F^ε). Since F is closed, we know ∩_ε F^ε = F, so taking ε → 0 in this inequality gives the desired result in (iii). Taking complements shows that (iii) and (iv) are really the same thing.
(iii)/(iv) ⇒ (v): Given a set A, recall that the boundary of A can be written as ∂A = Ā − A° (Ā the closure, A° the interior), so P(∂A) = 0 implies that P(Ā) = P(A) = P(A°). Since Ā is closed and A° is open, we have by (iii) and (iv) that:

P(Ā) ≥ lim sup P_n(Ā) ≥ lim sup P_n(A) ≥ lim inf P_n(A) ≥ lim inf P_n(A°) ≥ P(A°)

Since P(Ā) = P(A) = P(A°), these are all equalities, and moreover lim P_n(A) = P(A), which is (v).
(v) ⇒ (i)/(ii): Since f is bounded, by linearity we may assume that 0 ≤ f ≤ 1. Now, Pf = ∫ f dP = ∫_0^1 P{f > t} dt (this is a Fubini-Tonelli type statement), and the same equality holds with P_n. For continuous f, ∂{f > t} ⊂ {f = t} (find sequences in {f > t} and {f ≤ t} converging to any point of ∂{f > t}; by continuity such a point has both f ≥ t and f ≤ t), so {f > t} is a P-continuity set whenever P{f = t} = 0. Of course, P{f = t} ≠ 0 for at most countably many t, say at t_1, t_2, ..., and for every other t the set {f > t} is a P-continuity set. By condition (v), then, P_n{f > t} → P{f > t} for all t except t = t_1, t_2, ..., that is, for L-almost every t in [0, 1]. By the bounded convergence theorem:

P_n f = ∫_0^1 P_n{f > t} dt → ∫_0^1 P{f > t} dt = Pf
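The layer-cake identity Pf = ∫_0^1 P{f > t} dt used above can be checked numerically; here P is taken to be the empirical measure on a few sample values (an illustrative choice, not from the notes):

```python
# A numerical sketch of the layer-cake identity Pf = \int_0^1 P{f > t} dt
# for 0 <= f <= 1, with P the empirical measure on finitely many points
# (an illustrative choice, not from the notes).
def layer_cake(fvals, num_t=10000):
    n = len(fvals)
    total = 0.0
    for j in range(num_t):
        t = (j + 0.5) / num_t               # midpoint rule on [0, 1]
        total += sum(1 for v in fvals if v > t) / n   # P{f > t}
    return total / num_t

fvals = [0.1, 0.5, 0.5, 0.9]
mean = sum(fvals) / len(fvals)              # Pf for the empirical measure
assert abs(layer_cake(fvals) - mean) < 1e-3
```

For the empirical measure, Pf is just the average of the values, and the integral of the survival function P{f > t} recovers it, up to the quadrature error of the midpoint rule.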
Definition 21. A collection of Borel sets 𝒜 is called a convergence determining class if P_n A → PA for all P-continuity sets A ∈ 𝒜 implies P_n ⇒ P. One can prove some wacky theorems that tell you that certain classes are convergence determining; see pages 17-19 of the book.
Example 22. The finite dimensional sets of R^∞ are convergence determining. (Details omitted.)
Example 23. The finite dimensional sets of C[0, 1] are not convergence determining. To see this, take a sequence f_n of functions which gets to zero pointwise, with each point reaching zero after finitely many n, but for which f_n ↛ 0 uniformly (e.g. f_n has a spike at 1/n and is zero everywhere after 2/n). Let P_n = δ_{f_n}; since f_n ↛ 0 in this metric space, δ_{f_n} ⇏ δ_0. However, for the finite dimensional sets π_{t_1,...,t_k}^{-1}(H) with H = H_1 × ... × H_k a product set, we have (for n large) that P_n π_{t_1,...,t_k}^{-1}(H) = 1_{π^{-1}(H)}(f_n) = 1_{H_1}(f_n(t_1)) · ... · 1_{H_k}(f_n(t_k)) = 1_{H_1}(0) · ... · 1_{H_k}(0) = 1_{π^{-1}(H)}(0) = P π_{t_1,...,t_k}^{-1}(H), since f_n is zero at the points t_1, ..., t_k for large enough n by the pointwise stipulation we made. So P_n A → PA for every finite dimensional A, and yet P_n ⇏ P. This means these are NOT a convergence determining class in C[0, 1]; this highlights a fundamental difference between R^∞ and C[0, 1].
Theorem 24. If every subsequence of P_n, call it P_{n_i}, has a further subsequence, call it P_{n_{i_j}}, so that P_{n_{i_j}} ⇒ P, then P_n ⇒ P.

Proof. (By contrapositive.) If P_n ⇏ P, then there is a function f and an ε > 0 so that |P_n f − Pf| > ε infinitely often. Taking these infinitely many n as our subsequence, we see that it is impossible to have a sub-sub-sequence with P_{n_{i_j}} ⇒ P, as the function f provides an obstruction.
1.2.1 The Mapping Theorem
Theorem 25. Suppose h : S → S' is a continuous function between two metric spaces. For P a probability measure on S, we get an induced probability measure on S', namely Ph^{-1}, defined by Ph^{-1}(A') = P(h^{-1}(A')). If P_n ⇒ P then P_n h^{-1} ⇒ Ph^{-1}.

Proof. For any bounded continuous f, the function f ∘ h is again bounded and continuous. Hence P_n(f ∘ h) → P(f ∘ h). By a change of variables for probability spaces, however, we see that P(f ∘ h) = ∫ f(h(x)) P(dx) = ∫ f(y) Ph^{-1}(dy) = Ph^{-1}(f). So indeed P_n(f ∘ h) → P(f ∘ h) is the same as P_n h^{-1}(f) → Ph^{-1}(f). Since this holds for every bounded continuous f, we see P_n h^{-1} ⇒ Ph^{-1}.
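A quick numerical sketch of the mapping theorem: with P_n the empirical measure on {k/n} (which converges weakly to Uniform[0, 1]) and the continuous map h(x) = x², both illustrative choices not from the notes, the pushforwards P_n h^{-1} should integrate the identity to ∫_0^1 x² dx = 1/3:

```python
# A numerical sketch of the mapping theorem (h(x) = x**2 and the empirical
# measures on {k/n} are illustrative choices, not from the notes).
def pushforward_mean(n):
    h = lambda x: x * x
    # integral of the identity against P_n h^{-1} = P_n(h), a Riemann sum
    return sum(h(k / n) for k in range(1, n + 1)) / n

# Should approach \int_0^1 x^2 dx = 1/3 as n grows:
assert abs(pushforward_mean(2000) - 1.0 / 3.0) < 1e-3
```

This is just the change of variables from the proof in numerical form: integrating f(y) = y against P_n h^{-1} is the same as integrating f ∘ h against P_n.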
Example 26. The projections π_k : R^∞ → R^k are continuous, so if P_n ⇒ P on R^∞, then the measures P_n π_k^{-1} ⇒ P π_k^{-1} for every k. The converse is also true. First, one can show directly that ∂(π_k^{-1}(H)) = π_k^{-1}(∂H) (one direction is trivial, the other not so hard using sequences), so that the finite dimensional P-continuity sets of R^∞ are precisely the sets π_k^{-1}(H) for which H is a P π_k^{-1}-continuity set in R^k. Hence P_n π_k^{-1} ⇒ P π_k^{-1} for every k means that P_n A → PA whenever A is a finite dimensional P-continuity set. Since the finite dimensional sets are a convergence determining class, we have P_n ⇒ P.
Example 27. The projections π_{t_1,t_2,...,t_k} : C[0, 1] → R^k are continuous, so if P_n ⇒ P on C[0, 1], then the measures P_n π_{t_1,...,t_k}^{-1} ⇒ P π_{t_1,...,t_k}^{-1} for every choice of t_1, ..., t_k. The converse is NOT true though: the same example with δ_{f_n} ⇏ δ_0, where f_n → 0 pointwise but not uniformly, works in the same way.
Theorem 28. Let h be any measurable function h : S → S', and let D_h ∈ 𝒮 be the set of discontinuities of h. If P_n ⇒ P and P(D_h) = 0, then P_n h^{-1} ⇒ Ph^{-1}.

Proof. We use the characterization from the Portmanteau theorem: it suffices to show lim sup P_n h^{-1}(F) ≤ Ph^{-1}(F) for every closed F. To start, we first remark that D_h^c ∩ cl(h^{-1}(F)) ⊂ h^{-1}(F) for any closed F (cl denotes closure): for x ∈ D_h^c ∩ cl(h^{-1}(F)) there is a sequence x_n → x with h(x_n) ∈ F, and since x is a continuity point of h, h(x_n) → h(x), so h(x) ∈ F because F is closed. For any closed set F we then have:

lim sup P_n h^{-1}(F) ≤ lim sup P_n(cl(h^{-1}(F)))
≤ P(cl(h^{-1}(F)))   (since cl(h^{-1}(F)) is closed)
= P(cl(h^{-1}(F)) ∩ D_h^c)   (since D_h is a null event)
≤ P(h^{-1}(F))   (by the above remark)
= Ph^{-1}(F)
1.3 Convergence in Distribution
So far we have been talking about probability measures on metric spaces. A different way to think about the same thing is to consider metric space valued random variables X : (Ω, F, P) → (S, 𝒮) which come from an arbitrary probability space. Of course, such a random variable induces a measure on (S, 𝒮) in the usual way:

PA = P(X^{-1}(A)) = P{X ∈ A}

This is also called the law of X and is sometimes denoted P = L(X). It captures all the information we would want about X, so when we think of random variables we can think of them in two ways: either as a measurable function on a probability space, or as a measure on the metric space S on which X takes values. In this language:

E[f(X)] = ∫_Ω f(X(ω)) P(dω) = ∫_S f(x) P(dx) = Pf

Definition 29. We say that a sequence of random variables X_n converges in distribution to X if L(X_n) ⇒ L(X). For convenience we write X_n ⇒ X.
Theorem 30. (Portmanteau Theorem) In this setting the different equivalent ways to think about weak convergence look like:

(i),(ii) X_n ⇒ X, that is, E(f(X_n)) → E(f(X)) for all bounded continuous f.
(iii),(iv) lim sup P{X_n ∈ F} ≤ P{X ∈ F} for every closed set F / lim inf P{X_n ∈ G} ≥ P{X ∈ G} for every open set G.
(v) P{X_n ∈ A} → P{X ∈ A} for all X-continuity sets A.

Definition 31. Sometimes we will conflate our two notations, so we will write things like P_n ⇒ X or X_n ⇒ P, where X can be read as L(X) if you ever get confused.
1.3.1 Convergence in Probability

Definition 32. Say a ∈ S. We say that X_n converges to a in probability if for every ε > 0 we have:

P{ρ(X_n, a) < ε} → 1
Proposition 33. X_n converges to a in probability if and only if X_n ⇒ a (that is, X_n ⇒ δ_a).

Proof. Suppose X_n converges to a in probability. To see that X_n ⇒ a, we use the open-set-lim-inf criterion of the Portmanteau theorem. Let G be any open set. If a ∈ G, then find ε > 0 so that B(a, ε) ⊂ G. For this ε we have that P{X_n ∈ B(a, ε)} = P{ρ(X_n, a) < ε} → 1, but B(a, ε) ⊂ G, so P{X_n ∈ G} → 1 too. Hence lim inf P{X_n ∈ G} = 1 = δ_a(G). If a ∉ G, then we have the trivial inequality lim inf P{X_n ∈ G} ≥ 0 = δ_a(G).

Conversely, suppose X_n ⇒ a. For every ε > 0, B(a, ε) is an open set, so choosing G = B(a, ε), we have by the Portmanteau theorem that 1 ≥ lim inf P{X_n ∈ B(a, ε)} ≥ δ_a(B(a, ε)) = 1, hence lim P{X_n ∈ B(a, ε)} = 1, so we have convergence in probability.
Theorem 34. Suppose that (X_n, Y_n) are random elements of S × S. If X_n ⇒ X and ρ(X_n, Y_n) ⇒ 0, then Y_n ⇒ X.

Proof. We use the closed-set-lim-sup criterion. Let F be any closed set, and F^ε = {x : ρ(x, F) ≤ ε}. Then:

P{Y_n ∈ F} ≤ P{ρ(X_n, Y_n) ≥ ε} + P{X_n ∈ F^ε}

Since F^ε is closed, we take lim sup to get:

lim sup P{Y_n ∈ F} ≤ lim sup P{ρ(X_n, Y_n) ≥ ε} + lim sup P{X_n ∈ F^ε}
≤ 1 − lim inf P{ρ(X_n, Y_n) < ε} + lim sup P{X_n ∈ F^ε}
≤ 1 − 1 + P{X ∈ F^ε} = P{X ∈ F^ε}

Since F is closed, taking ε → 0 gives P(F^ε) ↓ P(F), which gives the result.

1.3.2 Local vs Integral Laws

Proposition 35. Suppose P_n and P are absolutely continuous with respect to some other measure µ, and have densities f_n and f respectively. If f_n(x) → f(x) for µ-almost every x, then P_n ⇒ P. The converse statement is not true.
Proof. (Sketch.) We have (this is related to the total variation stuff we looked at in Markov Mixing):

sup_{A∈𝒮} |PA − P_n A| ≤ ∫_S |f(x) − f_n(x)| µ(dx) → 0

where the integral tends to zero by Scheffé's lemma. So of course P_n A → PA for every P-continuity set.
1.3.3 Integration to the Limit

If X_n ⇒ X, when does E(X_n) → E(X)?

Theorem 36. If X_n ⇒ X, then E|X| ≤ lim inf E|X_n|.

Proof. Since |·| is a continuous function, by the mapping theorem |X_n| ⇒ |X|. By the same type of argument as in the proof of (v) ⇒ (i) in the Portmanteau theorem, P{|X_n| > t} → P{|X| > t} for all but countably many t. The result now follows by Fatou's lemma:

E|X| = ∫_0^∞ P{|X| > t} dt ≤ lim inf ∫_0^∞ P{|X_n| > t} dt = lim inf E|X_n|
Definition 37. We say that the sequence of random variables X_n is uniformly integrable if:

lim_{α→∞} sup_n ∫_{|X_n|>α} |X_n| dP = 0

This holds if the X_n are uniformly bounded, as for α larger than the bound all of these integrals are 0.
Proposition 38. If the X_n are uniformly integrable, then sup_n E|X_n| < ∞.

Proof. Take α_0 so large that sup_n ∫_{|X_n|>α_0} |X_n| dP ≤ 1. Now:

sup_n E|X_n| ≤ sup_n ∫_{|X_n|>α_0} |X_n| dP + sup_n ∫_{|X_n|≤α_0} |X_n| dP ≤ 1 + ∫ α_0 dP = 1 + α_0 < ∞
Theorem 39. If the X_n are uniformly integrable, and X_n ⇒ X, then X is integrable and E(X_n) → E(X).

Proof. Since the E|X_n| are bounded, we know by our Fatou-type lemma that E|X| ≤ lim inf E|X_n| < ∞, so X is integrable. By the mapping theorem with the continuous maps (·)^+ and (·)^−, we have that X_n^+ ⇒ X^+ and X_n^− ⇒ X^−. Now write:

E(X_n^+) = ∫_0^α P{t < X_n^+ < α} dt + ∫_{X_n^+ ≥ α} X_n^+ dP

E(X^+) = ∫_0^α P{t < X^+ < α} dt + ∫_{X^+ ≥ α} X^+ dP

The last terms in these equations tend to zero as α → ∞, by the uniform integrability condition (and the integrability of X). Hence, to see E(X_n^+) → E(X^+), it suffices to check that for large α, ∫_0^α P{t < X_n^+ < α} dt → ∫_0^α P{t < X^+ < α} dt. By choosing an α with P{X^+ = α} = 0, this follows by the bounded convergence theorem (as it did in the Portmanteau theorem). The same can be said of X^−, so we get E(X_n) → E(X) as desired.
Proposition 40. If there exists ε > 0 so that sup_n E|X_n|^{1+ε} < ∞, then the X_n are uniformly integrable.

Proof. Have:

∫_{|X_n|≥α} |X_n| dP ≤ ∫_{|X_n|≥α} (|X_n|^{1+ε}/α^ε) dP ≤ (1/α^ε) E|X_n|^{1+ε} → 0

uniformly in n as α → ∞.
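The bound in Proposition 40 can be checked numerically; here a finite list of values stands in for the distribution of X_n (an arbitrary illustrative choice, not from the notes):

```python
# A numerical sketch of the bound in Proposition 40: for a finite sample
# (standing in for X_n; the values are an arbitrary illustrative choice),
# the tail part E[|X|; |X| > a] is dominated by E[|X|^(1+eps)] / a^eps.
def tail_part(vals, a):
    return sum(abs(v) for v in vals if abs(v) > a) / len(vals)

def moment_bound(vals, a, eps):
    return sum(abs(v) ** (1 + eps) for v in vals) / len(vals) / a ** eps

vals = [(-1) ** k * k / 7.0 for k in range(1, 50)]
for a in (1.0, 3.0, 5.0):
    assert tail_part(vals, a) <= moment_bound(vals, a, eps=1.0)
```

The inequality holds termwise: whenever |v| > a we have |v| ≤ |v|^(1+ε)/a^ε, which is exactly the step in the proof.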
1.4 Skipped Section on Permutations

1.5 Prohorov's Theorem
Definition 41. Let Π be a family of probability measures on S. We call Π relatively compact if every sequence of elements of Π contains a weakly convergent subsequence, i.e. for every {P_n} ⊂ Π there exist P_{n_i} and a probability measure P such that P_{n_i} ⇒ P. We will be mostly concerned with the case where Π is itself a sequence; in this setting, relative compactness means every subsequence has a further subsequence which weakly converges to something.

Recall the following theorem:
Theorem 42. If P_n is relatively compact, and if the limiting probability measure P is the same for every convergent subsequence, then P_n ⇒ P. In other words, if every subsequence of P_n, call it P_{n_i}, has a further subsequence, call it P_{n_{i_j}}, so that P_{n_{i_j}} ⇒ P, then P_n ⇒ P.

Proof. (By contrapositive.) If P_n ⇏ P, then there is a function f and an ε > 0 so that |P_n f − Pf| > ε infinitely often. Taking these infinitely many n as our subsequence, we see that it is impossible to have a sub-sub-sequence with P_{n_{i_j}} ⇒ P, as the function f provides an obstruction.
Why is relative compactness useful? Here are some examples.
Example 43. Suppose we are on C[0, 1] and we know that, for some sequence P_n, the finite dimensional distributions converge to those of some other distribution P, i.e. we have that P_n π_{t_1,...,t_k}^{-1} ⇒ P π_{t_1,...,t_k}^{-1} for every t_1, ..., t_k. We have seen already that this does not necessarily mean that P_n ⇒ P: this is the statement that the finite dimensional sets are not convergence determining (for example, if we take the pointwise convergent sequence of continuous spikes going to zero, the point masses at these functions do not weakly converge, as the functions do not converge uniformly). However, if we know in addition that the family P_n is relatively compact, then every subsequence P_{n_i} has a further subsequence P_{n_{i_j}} ⇒ Q for some candidate limit Q. Now, by the mapping theorem, it must be that P_{n_{i_j}} π_{t_1,...,t_k}^{-1} ⇒ Q π_{t_1,...,t_k}^{-1}. By uniqueness of weak limits, we then have P π_{t_1,...,t_k}^{-1} = Q π_{t_1,...,t_k}^{-1}. But since the finite dimensional sets are a separating class, P = Q. Finally, by the last theorem, since every subsequence P_{n_i} has a further subsequence P_{n_{i_j}} converging to P, we conclude that P_n ⇒ P. In other words, finite dimensional convergence + relative compactness gives weak convergence.
Example 44. Similar to the above: if P_n is relatively compact, and we know that P_n π_{t_1,...,t_k}^{-1} ⇒ µ_{t_1,...,t_k} for a family of measures µ_{t_1,...,t_k}, then there is a P so that P π_{t_1,...,t_k}^{-1} = µ_{t_1,...,t_k} for every t_1, ..., t_k.
1.5.1 Tightness

How do we prove relative compactness? On the real line, let F_n be the distribution functions for P_n. By the Helly selection theorem, every subsequence F_{n_i} has a further subsequence F_{n_{i(m)}} for which there is a nondecreasing, right continuous F so that F_{n_{i(m)}} → F pointwise at all continuity points of F. However, F might fail to be a proper distribution function for the reason that it doesn't have total mass 1, i.e. lim_{x→∞} F(x) ≠ 1 or lim_{x→−∞} F(x) ≠ 0. E.g. δ_n has F(x) = Heaviside(x − n) → 0 as n → ∞. Another example is the uniform distribution on [−n, n]. A condition that prevents this escape of mass is uniform tightness:
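The escape of mass in the uniform-on-[−n, n] example can be seen numerically; the clamped formula for the CDF below is the standard one for this distribution:

```python
# A numerical sketch of mass escaping to infinity: F(n, .) is the CDF of
# Uniform[-n, n]. For every fixed x, F(n, x) -> 1/2 as n -> infinity, and
# the pointwise limit (constant 1/2) is not a proper distribution function:
# it never reaches 0 on the left or 1 on the right.
def F(n, x):
    return min(1.0, max(0.0, (x + n) / (2.0 * n)))

for x in (-10.0, 0.0, 10.0):
    assert abs(F(10**6, x) - 0.5) < 1e-4
```

No tight family can behave this way: tightness keeps a fraction 1 − ε of the mass inside one fixed compact set for all n at once.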
Definition 45. A family Π is tight (or uniformly tight) if for every ε > 0, there exists a compact K such that PK > 1 − ε for every P ∈ Π.

Theorem 46. (Prohorov's Theorem) If Π is tight, then it is relatively compact.

Corollary 47. If {P_n} is tight, and if each convergent subsequence of P_n converges to P, then P_n ⇒ P.

Proof. By Prohorov's theorem, {P_n} is relatively compact. Hence every subsequence has a sub-subsequence which converges. By hypothesis, it must converge to P. By the earlier theorem with the proof by contrapositive, P_n ⇒ P.

1.5.2 The proof of Prohorov's Theorem

It's pretty technical, so I will skip it for now.