2WS30 - A Note on the Proof of a (Weak) Glivenko-Cantelli Lemma
Rui Castro
February 5, 2015
In this note we state and prove a weaker version of the Glivenko-Cantelli Lemma, one that guarantees uniform convergence in probability. This is done to avoid the measure-theoretic notion of almost-sure convergence.
Theorem 1 (A Weaker Glivenko-Cantelli Lemma) Let $X_1, \ldots, X_n$ be i.i.d. random variables with marginal distribution function $F : \mathbb{R} \to [0, 1]$. Define the empirical cumulative distribution function $\hat{F}_n : \mathbb{R} \to [0, 1]$ as
\[
\hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{X_i \le x\} .
\]
The empirical distribution converges uniformly to $F(x)$, namely
\[
\sup_{x \in \mathbb{R}} \left| \hat{F}_n(x) - F(x) \right| \xrightarrow{P} 0 ,
\]
as $n \to \infty$.
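To make the statement concrete, the following is a minimal sketch that computes $\hat{F}_n$ and the uniform distance $\sup_x |\hat{F}_n(x) - F(x)|$ for simulated data. The Uniform(0,1) distribution, the seed, the sample sizes, and the helper names `ecdf` and `sup_distance` are all assumptions made for this illustration. The supremum is computed exactly by using the fact that $\hat{F}_n$ is a step function, which is valid when $F$ is continuous.

```python
import bisect
import random

def ecdf(sample):
    """Return the empirical CDF x -> (1/n) * #{i : X_i <= x} of the sample."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: bisect.bisect_right(xs, x) / n

def sup_distance(sample, cdf):
    """Compute sup_x |F_n(x) - F(x)| for a continuous cdf F.

    F_n jumps from i/n to (i+1)/n at the (i+1)-th order statistic, so the
    supremum is approached at, or just before, one of the sample points.
    """
    xs = sorted(sample)
    n = len(xs)
    return max(
        max((i + 1) / n - cdf(x), cdf(x) - i / n)
        for i, x in enumerate(xs)
    )

def uniform_cdf(x):
    """CDF of the Uniform(0,1) distribution (assumed here for illustration)."""
    return min(max(x, 0.0), 1.0)

rng = random.Random(0)  # fixed seed, chosen arbitrarily
for n in (10, 100, 1000, 10000):
    sample = [rng.random() for _ in range(n)]
    print(n, round(sup_distance(sample, uniform_cdf), 4))
```

The printed distances should tend to shrink as $n$ grows, which is what the theorem asserts in probability.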
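Proof: Convergence in probability means that for any $\varepsilon > 0$
\[
P\left( \sup_{x \in \mathbb{R}} \left| \hat{F}_n(x) - F(x) \right| > \varepsilon \right) \to 0 ,
\]
as $n \to \infty$. Recall that the weak law of large numbers tells us that, for an arbitrary (but fixed) $x \in \mathbb{R}$,
\[
P\left( |\hat{F}_n(x) - F(x)| \ge \varepsilon \right) \to 0 ,
\]
as $n \to \infty$.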
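The pointwise statement can be checked by simulation. The sketch below estimates $P(|\hat{F}_n(x) - F(x)| \ge \varepsilon)$ by Monte Carlo at a single fixed $x$. The Uniform(0,1) data (so that $F(x) = x$), the values of $x$, $\varepsilon$, the number of trials, and the function name `pointwise_exceedance` are assumptions chosen only for illustration.

```python
import random

def pointwise_exceedance(n, x, eps, trials, rng):
    """Monte Carlo estimate of P(|F_n(x) - F(x)| >= eps) for Uniform(0,1) data."""
    hits = 0
    for _ in range(trials):
        # F_n(x) is the fraction of the n observations that are <= x.
        f_n = sum(rng.random() <= x for _ in range(n)) / n
        if abs(f_n - x) >= eps:  # F(x) = x for x in [0, 1]
            hits += 1
    return hits / trials

rng = random.Random(0)  # fixed seed, chosen arbitrarily
for n in (10, 100, 1000):
    print(n, pointwise_exceedance(n, x=0.3, eps=0.05, trials=2000, rng=rng))
```

The estimated probabilities should decrease towards zero as $n$ increases. Note that this only concerns one fixed $x$; the theorem requires control over all $x$ simultaneously.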
The proof proceeds by reducing the supremum in the statement of the theorem to a maximum over a finite set. This is a very general technique for proving uniform convergence, and a generalization of this approach is known as chaining in the theory of empirical processes.
We will prove the theorem for the case when $F$ is a continuous function. The proof can be easily extended to general distribution functions. Let $\varepsilon > 0$ be fixed. Since $F$ is continuous we can find finitely many points $-\infty = x_0 < x_1 < \cdots < x_m = \infty$ such that $F(x_j) - F(x_{j-1}) \le \varepsilon/2$ for $j \in \{1, \ldots, m\}$, with the conventions $F(-\infty) = \hat{F}_n(-\infty) = 0$ and $F(\infty) = \hat{F}_n(\infty) = 1$. Now take any point $x \in \mathbb{R}$. There is a $j \in \{1, \ldots, m\}$ such that $x_{j-1} \le x \le x_j$. As distribution functions are non-decreasing we conclude that
\begin{align*}
\hat{F}_n(x) - F(x) &\le \hat{F}_n(x_j) - F(x_{j-1}) \\
&= \hat{F}_n(x_j) - F(x_j) + \left( F(x_j) - F(x_{j-1}) \right) \\
&\le \hat{F}_n(x_j) - F(x_j) + \varepsilon/2 \\
&\le \max_{j \in \{1,\ldots,m\}} \left| \hat{F}_n(x_j) - F(x_j) \right| + \varepsilon/2 .
\end{align*}
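In the same fashion
\begin{align*}
\hat{F}_n(x) - F(x) &\ge \hat{F}_n(x_{j-1}) - F(x_j) \\
&= \hat{F}_n(x_{j-1}) - F(x_{j-1}) + \left( F(x_{j-1}) - F(x_j) \right) \\
&\ge \hat{F}_n(x_{j-1}) - F(x_{j-1}) - \varepsilon/2 \\
&\ge - \max_{j \in \{1,\ldots,m\}} \left| \hat{F}_n(x_{j-1}) - F(x_{j-1}) \right| - \varepsilon/2 .
\end{align*}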
Therefore
\[
\sup_{x \in \mathbb{R}} \left| \hat{F}_n(x) - F(x) \right| \le \varepsilon/2 + \max_{j \in \{0,\ldots,m\}} \left| \hat{F}_n(x_j) - F(x_j) \right| . \tag{1}
\]
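Inequality (1) holds for every realization of the data, so it can be verified numerically. The sketch below builds one possible grid and compares both sides of (1) on a simulated sample. The Exponential(1) distribution, the quantile grid $x_j = F^{-1}(j/m)$ with $m = \lceil 2/\varepsilon \rceil$, and all function names are assumptions made for this illustration; the proof only requires that some finite grid with increments at most $\varepsilon/2$ exists.

```python
import bisect
import math
import random

def exp_cdf(x):
    """CDF of the Exponential(1) distribution; also valid at +/- infinity."""
    return 0.0 if x <= 0 else 1.0 - math.exp(-x)

def make_grid(eps):
    """Grid -inf = x_0 < ... < x_m = inf with F(x_j) - F(x_{j-1}) = 1/m <= eps/2."""
    m = math.ceil(2 / eps)
    inner = [-math.log(1 - j / m) for j in range(1, m)]  # quantiles F^{-1}(j/m)
    return [-math.inf] + inner + [math.inf]

def ecdf_at(sorted_sample, x):
    """Value of the empirical CDF at x, given the sorted sample."""
    return bisect.bisect_right(sorted_sample, x) / len(sorted_sample)

def sup_distance(sorted_sample, cdf):
    """Exact sup_x |F_n(x) - F(x)| for a continuous cdf."""
    n = len(sorted_sample)
    return max(
        max((i + 1) / n - cdf(x), cdf(x) - i / n)
        for i, x in enumerate(sorted_sample)
    )

eps = 0.1
grid = make_grid(eps)
rng = random.Random(0)  # fixed seed, chosen arbitrarily
xs = sorted(rng.expovariate(1.0) for _ in range(500))

lhs = sup_distance(xs, exp_cdf)  # left-hand side of (1)
rhs = eps / 2 + max(abs(ecdf_at(xs, x) - exp_cdf(x)) for x in grid)
print("m =", len(grid) - 1, "lhs =", round(lhs, 4), "rhs =", round(rhs, 4))
```

The point of the reduction is visible here: the right-hand side involves only the $m + 1$ grid points, regardless of the sample size.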
Now let's make use of the law of large numbers. Let $j \in \{0, \ldots, m\}$ be fixed and note that for any $\delta > 0$ there is a value $n_j \in \mathbb{N}$ such that
\[
P\left( |\hat{F}_n(x_j) - F(x_j)| \ge \varepsilon/2 \right) \le \delta ,
\]
for all $n \ge n_j$.
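It is important to note that $m$ depends solely on $\varepsilon$ (and on $F$), not on $n$. To emphasize this use the notation $m \equiv m_\varepsilon$. By the union of events bound this means that
\[
P\left( \max_{j \in \{0,\ldots,m_\varepsilon\}} |\hat{F}_n(x_j) - F(x_j)| \ge \varepsilon/2 \right) \le (m_\varepsilon + 1)\delta ,
\]
for all $n \ge \max_{j \in \{0,\ldots,m_\varepsilon\}} n_j$. Choosing $\delta = \delta'/(m_\varepsilon + 1)$, we conclude that, for any $\delta' > 0$ there is $n_0 \in \mathbb{N}$ so that
\[
P\left( \max_{j \in \{0,\ldots,m_\varepsilon\}} |\hat{F}_n(x_j) - F(x_j)| \ge \varepsilon/2 \right) \le \delta' ,
\]
for all $n \ge n_0$.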
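The proof only uses the existence of $n_0$. To get a feel for the sizes involved, here is a rough sketch that replaces the law of large numbers by Chebyshev's inequality (a substitution made for this illustration, not part of the argument above). Since $n\hat{F}_n(x_j)$ is Binomial with variance at most $n/4$, Chebyshev's inequality gives $P(|\hat{F}_n(x_j) - F(x_j)| \ge \varepsilon/2) \le 1/(n\varepsilon^2)$, so $n_j = \lceil 1/(\delta\varepsilon^2) \rceil$ works for every $j$. The function name `sufficient_n` and the choice $m = \lceil 2/\varepsilon \rceil$ (a grid with equal increments of $F$) are assumptions.

```python
import math

def sufficient_n(eps, delta_prime):
    """Return (m, n0) such that the union bound is at most delta_prime for n >= n0.

    Assumes a grid with m = ceil(2 / eps) intervals and uses Chebyshev's
    inequality, P(|F_n(x_j) - F(x_j)| >= eps/2) <= 1 / (n * eps**2).
    """
    m = math.ceil(2 / eps)
    delta = delta_prime / (m + 1)          # per-point budget in the union bound
    n0 = math.ceil(1 / (delta * eps ** 2)) # Chebyshev: 1/(n eps^2) <= delta
    return m, n0

for eps in (0.2, 0.1, 0.05):
    print(eps, sufficient_n(eps, delta_prime=0.05))
```

The resulting $n_0$ is far from sharp, but it shows how the requirement grows as $\varepsilon$ shrinks: both the number of grid points and the per-point accuracy demand increase.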
In other words,
\[
\lim_{n \to \infty} P\left( \max_{j \in \{0,\ldots,m_\varepsilon\}} |\hat{F}_n(x_j) - F(x_j)| \ge \varepsilon/2 \right) = 0 .
\]
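We are almost done, as this together with (1) tells us that
\begin{align*}
P\left( \sup_{x \in \mathbb{R}} \left| \hat{F}_n(x) - F(x) \right| \ge \varepsilon \right) &\le P\left( \varepsilon/2 + \max_{j \in \{0,\ldots,m\}} |\hat{F}_n(x_j) - F(x_j)| \ge \varepsilon \right) \\
&= P\left( \max_{j \in \{0,\ldots,m\}} |\hat{F}_n(x_j) - F(x_j)| \ge \varepsilon/2 \right) \\
&\to 0 ,
\end{align*}
as $n \to \infty$, concluding the proof.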
Exercise: Extend the proof to general distribution functions.