TOTAL VARIATION
Abstract. A very brief introduction to total variation.
1. Introduction
Let (X_n) be a sequence of random variables with cdfs (F_n). Let X be a random variable with
cdf F. Recall that we say that X_n converges to X in distribution if F_n(x) → F(x) at every
point x ∈ R that is a point of continuity for F. In the case that the X_n and X are
discrete random variables taking values in Z, convergence in distribution is equivalent to the
condition that P(X_n = k) → P(X = k) for all k ∈ Z.
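For a concrete illustration (a minimal Python sketch; the choice X_n ∼ Bin(n, λ/n), X ∼ Poi(λ) with λ = 2 is ours and anticipates Section 3), the pointwise convergence of the probability mass functions can be checked numerically:

import math

def binom_pmf(k, n, p):
    # P(Bin(n, p) = k)
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    # P(Poi(lam) = k)
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 2.0
for n in (10, 100, 1000, 10000):
    # largest pointwise gap between the two pmfs over k = 0, ..., 10
    gap = max(abs(binom_pmf(k, n, lam / n) - poisson_pmf(k, lam)) for k in range(11))
    print(n, gap)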
2. Total variation
Let X and Y be random variables (taking values in R). Set
d_TV(X, Y) = 2 sup_{A ∈ B} |P(X ∈ A) − P(Y ∈ A)|,
where B denotes the Borel σ-algebra on R.
Remark 1. Note that the definition of d_TV(X, Y) depends only on the laws of X and
Y, individually; in particular, if (X′, Y′) is a coupling of X and Y, then
d_TV(X′, Y′) = d_TV(X, Y).
Lemma 1. Let X and Y be integer-valued random variables. We have that
d_TV(X, Y) = Σ_{z∈Z} |P(X = z) − P(Y = z)|.
Proof. The proof follows from the following simple facts. If D = {z ∈ Z : P(X = z) ≥ P(Y = z)},
then
Σ_{z∈Z} |P(X = z) − P(Y = z)| = Σ_{z∈D} (P(X = z) − P(Y = z)) + Σ_{z∈D^c} (P(Y = z) − P(X = z)).
Observe that for any A ⊂ Z, we have
|P(X ∈ A) − P(Y ∈ A)| = |P(X ∈ A^c) − P(Y ∈ A^c)|.
Thus, taking A = D and noting that each of the two sums above equals |P(X ∈ D) − P(Y ∈ D)|, we see that
d_TV(X, Y) ≥ 2|P(X ∈ D) − P(Y ∈ D)| = Σ_{z∈Z} |P(X = z) − P(Y = z)|.
For the other direction, note that
|P(X ∈ A) − P(Y ∈ A)| = |Σ_{z∈A} P(X = z) − Σ_{z∈A} P(Y = z)| ≤ Σ_{z∈A} |P(X = z) − P(Y = z)|.
Thus it follows that
2|P(X ∈ A) − P(Y ∈ A)| ≤ |P(X ∈ A) − P(Y ∈ A)| + |P(X ∈ A^c) − P(Y ∈ A^c)| ≤ Σ_{z∈Z} |P(X = z) − P(Y = z)|.
Taking the supremum over A gives the reverse inequality, which completes the proof.
Exercise 2.1. Let X ∼ Bern(p), Y ∼ Bern(q), and W ∼ Poi(p). Compute d_TV(X, Y) and
d_TV(X, W).
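The following Python sketch is only a rough numerical check for this exercise (the values p = 0.3, q = 0.5 and the truncation of the Poisson support at k = 50 are arbitrary choices); it evaluates the sum from Lemma 1 directly:

import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

def bern_pmf(k, p):
    return p if k == 1 else (1 - p) if k == 0 else 0.0

def dtv(pmf_x, pmf_y, support):
    # Lemma 1: d_TV(X, Y) = Σ_{z} |P(X = z) − P(Y = z)|
    return sum(abs(pmf_x(z) - pmf_y(z)) for z in support)

p, q = 0.3, 0.5
print(dtv(lambda k: bern_pmf(k, p), lambda k: bern_pmf(k, q), range(2)))      # d_TV(X, Y), expect 2|p − q|
print(dtv(lambda k: bern_pmf(k, p), lambda k: poisson_pmf(k, p), range(51)))  # d_TV(X, W)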
Lemma 2 (Coupling inequality). If X and Y are random variables, then d_TV(X, Y) ≤
2P(X ≠ Y).
Proof. Let A ∈ B. Note that
|P(X ∈ A) − P(Y ∈ A)| = |P(X ∈ A, X = Y) + P(X ∈ A, X ≠ Y) − P(Y ∈ A, X ≠ Y) − P(Y ∈ A, X = Y)|
= |P(X ∈ A, X ≠ Y) − P(Y ∈ A, X ≠ Y)|
≤ P(X ≠ Y).
Multiplying by 2 and taking the supremum over A ∈ B gives the result.
Exercise 2.2 (Maximal coupling). Let X and Y be integer-valued random variables. Show that
there exists a coupling (X′, Y′) such that d_TV(X, Y) = d_TV(X′, Y′) = 2P(X′ ≠ Y′). Sometimes
such couplings are called maximal couplings or optimal couplings.
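For illustration only (not a solution to the exercise), the standard construction can be sketched in Python; the pmfs are passed as dictionaries, and the helper names are ours. With probability α = Σ_z min(P(X = z), P(Y = z)) the two coordinates are drawn equal from the normalized overlap; otherwise they are drawn independently from the normalized excess masses.

import random

def sample_from(weights):
    # Draw a value from a dictionary {value: probability}
    u, acc = random.random(), 0.0
    for z, w in weights.items():
        acc += w
        if u <= acc:
            return z
    return z  # guard against floating-point round-off

def maximal_coupling_sample(pmf_x, pmf_y):
    support = set(pmf_x) | set(pmf_y)
    overlap = {z: min(pmf_x.get(z, 0.0), pmf_y.get(z, 0.0)) for z in support}
    alpha = sum(overlap.values())
    if random.random() < alpha:
        z = sample_from({z: m / alpha for z, m in overlap.items()})
        return z, z
    x = sample_from({z: (pmf_x.get(z, 0.0) - overlap[z]) / (1 - alpha) for z in support})
    y = sample_from({z: (pmf_y.get(z, 0.0) - overlap[z]) / (1 - alpha) for z in support})
    return x, y

Under this sampler P(X′ ≠ Y′) = 1 − α, and 1 − α = (1/2) Σ_z |P(X = z) − P(Y = z)|, so by Lemma 1 we have 2P(X′ ≠ Y′) = d_TV(X, Y).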
3. Basic Poisson approximation
Theorem 3 (Le Cam). Let (X_i)_{i=1}^{n} be independent Bernoulli random variables with parameters (p_i)_{i=1}^{n}. Let S_n = X_1 + · · · + X_n and λ = p_1 + · · · + p_n. If W ∼ Poi(λ), then
d_TV(S_n, W) ≤ 2 Σ_{i=1}^{n} p_i^2.
Corollary 4. In the case that p_i = λ/n, we have that S_n ∼ Bin(n, λ/n), and d_TV(S_n, W) ≤
2λ^2/n.
The proof will make use of Lemma 2, a simple coupling of Bernoulli and Poisson random
variables, and the following elementary inequality.
Exercise 3.1. For all x ∈ [0, 1], we have 0 ≤ 1 − x ≤ e^{−x}.
Lemma 5. Let W_1, . . . , W_n, Z_1, . . . , Z_n be independent random variables, where W_i ∼ Poi(p_i)
and Z_i ∼ Bern(1 − (1 − p_i)/e^{−p_i}), where p_i ∈ (0, 1). If
X_i = 1[W_i ≥ 1] + 1[W_i = 0]1[Z_i = 1],
then (X_i)_{i=1}^{n} are independent Bernoulli random variables with parameters (p_i)_{i=1}^{n}. Furthermore, P(X_i = W_i) = p_i e^{−p_i} + (1 − p_i).
Proof. Notice that by Exercise 3.1, (1 − p_i)/e^{−p_i} ∈ (0, 1), so that the Z_i are well defined. Clearly,
the X_i are independent and Bernoulli. We have that
P(X_i = 1) = P(W_i ≥ 1) + P(W_i = 0)P(Z_i = 1)
= 1 − e^{−p_i} + e^{−p_i}(1 − (1 − p_i)/e^{−p_i})
= p_i.
The latter claim follows from the fact that the event {X_i = W_i} is given by the disjoint union
{W_i = 1} ∪ {X_i = 0} = {W_i = 1} ∪ {W_i = 0, Z_i = 0},
which has probability p_i e^{−p_i} + e^{−p_i} · (1 − p_i)/e^{−p_i} = p_i e^{−p_i} + (1 − p_i).
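A minimal simulation sketch of this coupling in Python (the value p = 0.2 and the sample size are arbitrary; the Poisson sampler uses Knuth's multiplication method, which is adequate for small parameters) checks empirically that X_i ∼ Bern(p_i) and that P(X_i = W_i) = p_i e^{−p_i} + (1 − p_i):

import math, random

def poisson_sample(lam):
    # Knuth's multiplication method for Poi(lam)
    L, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= random.random()
        if prod <= L:
            return k
        k += 1

def coupled_pair(p):
    # The coupling of Lemma 5: W ∼ Poi(p), Z ∼ Bern(1 − (1 − p)/e^{−p}),
    # and X = 1 if W ≥ 1, X = Z if W = 0.
    w = poisson_sample(p)
    z = 1 if random.random() < 1 - (1 - p) * math.exp(p) else 0
    x = 1 if w >= 1 else z
    return x, w

p, m = 0.2, 200000
pairs = [coupled_pair(p) for _ in range(m)]
print(sum(x for x, _ in pairs) / m)        # should be close to p
print(sum(x == w for x, w in pairs) / m)   # should be close to p*e^{-p} + (1 - p)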
Proof of Theorem 3. By Remark 1, we may assume that the X_i and W_i are defined as in Lemma 5, so that S_n = X_1 + · · · + X_n.
Since the sum of independent Poisson random variables is again a Poisson random variable,
we may also assume that W = W_1 + · · · + W_n. Note that if X_i = W_i for all 1 ≤ i ≤ n, then
certainly, we have S_n = W. Thus by Lemma 5, Exercise 3.1, and a union bound, we have
P(S_n ≠ W) ≤ Σ_{i=1}^{n} P(X_i ≠ W_i) = Σ_{i=1}^{n} p_i(1 − e^{−p_i}) ≤ Σ_{i=1}^{n} p_i^2.
Hence the result follows from Lemma 2.
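As a rough numerical check of Theorem 3 in the setting of Corollary 4 (the value λ = 2 and the truncation of the sum at k = 60 are arbitrary choices), the following Python sketch compares the exact total variation distance from Lemma 1 with the bound 2λ^2/n:

import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 2.0
for n in (20, 200, 2000):
    # Lemma 1 formula for d_TV(S_n, W), truncated at k = 60
    dtv = sum(abs(binom_pmf(k, n, lam / n) - poisson_pmf(k, lam)) for k in range(61))
    print(n, dtv, 2 * lam**2 / n)   # d_TV(S_n, W) versus the Le Cam bound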
Exercise 3.2. In the proof of Theorem 3, instead of using Lemma 5, use Exercise 2.2. Is the
coupling given by Lemma 5 a maximal coupling?
We record some basic facts needed for the next exercise.
Theorem 6 (Skorokhod representation, Chapter 7.2). Let X_n be real-valued random variables that converge in distribution to a random variable X. It can be shown that there exists
a coupling (X′_n, X′) such that X′_n =^d X_n and X′ =^d X, and such that X′_n converges almost surely
to X′ (just use the usual inverse distribution function construction).
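A minimal Python sketch of the inverse distribution function construction (the choice X_n ∼ Exp(1 + 1/n) and X ∼ Exp(1) is ours, made because the quantile functions are explicit): a single uniform variable drives every coupled copy, and the coupled sequence converges pointwise because the quantile functions do.

import math, random

def exp_quantile(u, rate):
    # Inverse cdf of an Exponential(rate) random variable
    return -math.log(1.0 - u) / rate

u = random.random()             # one uniform sample drives the whole coupling
x_prime = exp_quantile(u, 1.0)  # X′
for n in (1, 10, 100, 1000):
    xn_prime = exp_quantile(u, 1.0 + 1.0 / n)   # X′_n
    print(n, xn_prime, x_prime)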
Theorem 7 (A special case of the dominated convergence theorem). Let f_n : Z → R be a
sequence of functions such that f_n → f pointwise. Suppose that there exists g : Z → [0, ∞)
such that
Σ_{z∈Z} g(z) < ∞ and |f_n| ≤ g.
Then
Σ_{z∈Z} |f(z)| < ∞
and
lim_{n→∞} Σ_{z∈Z} f_n(z) = Σ_{z∈Z} f(z).
Exercise 3.3. Let X_n, X be integer-valued random variables.
(a) Show (using the dominated convergence theorem) that if X_n converges in distribution to
X, then d_TV(X_n, X) → 0 as n → ∞. Thus, obtaining convergence in total variation is
no better than regular convergence in distribution (for discrete random variables) if we
do not obtain a rate.
(b) Show that if X_n converges to X almost surely, then there exists a finite random variable
N such that X_n = X for all n ≥ N, almost surely.
(c) Show that if X_n converges in distribution to X, then there exist X′_n =^d X_n, X′ =^d X, and
a finite random variable N such that X′_n = X′ for all n ≥ N.
(d) Show using the previous parts (without the dominated convergence theorem) that if X_n
converges to X in distribution, then d_TV(X_n, X) → 0 as n → ∞.