Chapter 1
1. Probability, Indicator function and expectation.
The present chapter contains a brief review of some basic notions in probability
theory and statistics. The reader is advised to skim according to taste and background.
Basic Set Theory.
Let Ω be an abstract set representing the sample space of a random experiment. The
power set of Ω, denoted P(Ω), is defined to be the set of all subsets of Ω. Elements of Ω are
outcomes and subsets of Ω are events. Therefore
P(Ω) = {A : A ⊂ Ω}.
For A, B ∈ P(Ω) define
A ∪ B = {x : x ∈ A or x ∈ B},  A ∩ B = {x : x ∈ A and x ∈ B},  Ā = Ac = {x : x ∉ A}.
Also define
A∆B = (A ∪ B) − (A ∩ B).
In terms of events, A ∪ B occurs if and only if at least one of the two events A and B
occurs, while A ∩ B occurs if and only if both A and B occur. The empty set is denoted by Φ.
A summary of basic properties.
(1) A ⊂ A, Φ ⊂ A
(2) A ⊂ B and B ⊂ A implies A = B.
(3) A ⊂ C and B ⊂ C implies A ∪ B ⊂ C and A ∩ B ⊂ C.
(4) A ⊂ B if and only if B c ⊂ Ac .
(5) (Ac )c = A, Φc = Ω, Ωc = Φ.
(6) A ∪ B = B ∪ A, A ∩ B = B ∩ A, A ∪ A = A, A ∩ Ω = A, A ∪ Ac = Ω, A ∩ Ac = Φ.
(7) A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C), A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
(8) (A ∪ B)c = Ac ∩ B c , (A ∩ B)c = Ac ∪ B c
Example. We have
∪_{n=1}^∞ [0, n/(n+1)] = [0, 1),  ∩_{n=1}^∞ (0, 1/n) = Φ.
Example. Prove A∆B = Ac ∆B c .
Proof. Note that
Ac ∆B c = (Ac ∪ B c ) − (Ac ∩ B c ) = (A ∩ B)c ∩ ((A ∪ B)c )c = (A ∩ B)c ∩ (A ∪ B)
= (A ∪ B) − (A ∩ B) = A∆B
Definition 1.1. The indicator function of a set A is defined by
I(x ∈ A) = IA (x) = 1 if x ∈ A, and 0 if x ∈ Ac .
Properties.
(1) IA∪B = max(IA , IB ), IA∩B = IA · IB , IA∆B = IA + IB (mod 2).
(2) A ⊂ B if and only if IA ≤ IB .
(3) I∪Ai ≤ Σ_i IAi .
Example. Show that
(1) I∪_{i=1}^∞ Ai = 1 − Π_{i=1}^∞ (1 − IAi )
and
(2) IA∆B = (IA − IB )².
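Since these identities are pointwise statements about 0/1-valued functions, they can be checked mechanically. A small Python sketch (the sample space and the random events are arbitrary illustrative choices):

```python
# Numerical check of the indicator identities above on random events.
import random

random.seed(0)
omega = range(10)
A = [set(random.sample(range(10), 4)) for _ in range(3)]  # three random events

def ind(S, x):          # indicator I_S(x)
    return 1 if x in S else 0

union = set().union(*A)
sym = A[0] ^ A[1]       # symmetric difference A[0] delta A[1]
for x in omega:
    # I_{union A_i} = 1 - prod_i (1 - I_{A_i})
    prod = 1
    for S in A:
        prod *= 1 - ind(S, x)
    assert ind(union, x) == 1 - prod
    # I_{A delta B} = (I_A - I_B)^2
    assert ind(sym, x) == (ind(A[0], x) - ind(A[1], x)) ** 2
```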
Definition 1.2. Let {An } be a sequence of events. Then
lim inf An = ∪_{n=1}^∞ ∩_{m=n}^∞ Am
and
lim sup An = ∩_{n=1}^∞ ∪_{m=n}^∞ Am .
Lemma 1.1. We have
lim sup An = {ω : Σ_{i=1}^∞ IAi (ω) = ∞}
and
lim inf An = {ω : Σ_{i=1}^∞ IAci (ω) < ∞}.
Note. For this reason we sometimes write lim sup An = {An i.o.} (infinitely often). If
lim inf An = lim sup An then we define
lim An = lim inf An = lim sup An .
Proof. (1) If ω ∈ lim sup An then ω ∈ ∪_{m=n}^∞ Am for all integers n. Therefore for any
integer n there exists an integer kn ≥ n such that ω ∈ Akn . Hence
Σ_{i=1}^∞ IAi (ω) ≥ Σ_{i=1}^∞ IAki (ω) = ∞.
Conversely, if Σ_{i=1}^∞ IAi (ω) = ∞ then for any integer n,
Σ_{i=n}^∞ IAi (ω) = ∞.
This implies that ω ∈ ∪_{j=n}^∞ Aj for any integer n. Therefore
ω ∈ ∩_{n=1}^∞ ∪_{m=n}^∞ Am = lim sup An .
For (2), notice that
ω ∈ lim inf An = ∪_{n=1}^∞ ∩_{m=n}^∞ Am
implies that there exists an integer n0 such that ω ∈ ∩_{k=n0}^∞ Ak . Therefore
Σ_{n=1}^∞ IAcn (ω) = Σ_{n=1}^{n0−1} IAcn (ω) ≤ n0 < ∞.
Remark. The proof above can be simplified by noticing that
(lim sup An )c = ∪_{n=1}^∞ ∩_{m=n}^∞ Acm = lim inf Acn
and
(lim inf An )c = ∩_{n=1}^∞ ∪_{m=n}^∞ Acm = lim sup Acn .
Lemma 1.2. (1) If An ⊂ An+1 for every integer n, then
lim An = ∪_{n=1}^∞ An .
(2) If An+1 ⊂ An for every integer n, then
lim An = ∩_{n=1}^∞ An .
Proof. We prove (1); (2) follows similarly. Note that in this case,
∪_{m=n}^∞ Am = ∪_{m=1}^∞ Am
for all integers n. Therefore
∩_{n=1}^∞ ∪_{m=n}^∞ Am = ∪_{n=1}^∞ An .
On the other hand
∩_{m=n}^∞ Am = An ,
which implies
∪_{n=1}^∞ ∩_{m=n}^∞ Am = ∪_{n=1}^∞ An .
Therefore
lim sup An = lim inf An = ∪_{n=1}^∞ An = lim An .
Also notice that
lim_{n→∞} ∪_{m=n}^∞ Am = ∩_{n=1}^∞ ∪_{m=n}^∞ Am ,  lim_{n→∞} ∩_{m=n}^∞ Am = ∪_{n=1}^∞ ∩_{m=n}^∞ Am ,
since ∪_{m=n}^∞ Am is decreasing in n and ∩_{m=n}^∞ Am is increasing in n.
Example. We have
lim[0, 1 − 1/n] = lim[0, 1 − 1/n) = [0, 1)
and
lim[0, 1 + 1/n] = lim[0, 1 + 1/n) = [0, 1]
Note. Let B, C ⊂ Ω and define the sequence
An = B if n is odd, C if n is even.
We have
lim sup An = ∩_{n=1}^∞ ∪_{m=n}^∞ Am = B ∪ C
and
lim inf An = ∪_{n=1}^∞ ∩_{m=n}^∞ Am = B ∩ C.
If B ∩ C ≠ B ∪ C then B ∩ C = lim inf An ≠ lim sup An = B ∪ C.
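The alternating example can be checked numerically by truncating the tails: "in infinitely many An" becomes "in every tail window", and "in all but finitely many An" becomes "in some full tail window". A Python sketch (the sets B, C and the truncation windows are arbitrary choices):

```python
# lim sup / lim inf of the alternating sequence A_n = B (n odd), C (n even).
B, C = {1, 2, 3}, {3, 4}

def A(n):
    return B if n % 2 == 1 else C

points = B | C
# Truncated versions of the set-limit definitions; each window of 10
# consecutive indices contains both parities, which is all that matters here.
limsup = {x for x in points
          if all(any(x in A(m) for m in range(n, n + 10)) for n in range(1, 10))}
liminf = {x for x in points
          if any(all(x in A(m) for m in range(n, n + 10)) for n in range(1, 10))}
assert limsup == B | C
assert liminf == B & C
```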
Definition 1.3. A field (algebra) is a class of subsets of Ω (called events) that
contains Ω and is closed under finite unions, finite intersections and complements.
In other words a family of subsets of Ω (say A) is a field if
(1) Ω ∈ A
(2) If A ∈ A then Ac ∈ A
(3) If A, B ∈ A then A ∪ B ∈ A.
Remarks. If A, B ∈ A then A ∩ B ∈ A. This is true because
(Ac ∪ B c )c = A ∩ B.
Definition (σ-field). A σ-field (σ-algebra) is a field that is closed under countable union.
(Observe that this implies that a σ-field is also closed under countable intersection).
Example. Here are four σ-fields.
(1) The power set P(Ω).
(2) The trivial σ-field F = {Ω, Φ} (check this).
(3) The family of subsets of ℝ which are either countable or have countable complements.
(4) The smallest σ-field B containing all open sets. This σ-field is called the Borel
σ-field.
Definition 1.4 (Probability measure). Let Ω be a sample space and F a σ-field on
Ω. A probability measure P is a function from F to [0, 1] such that
(i) P (Ω) = 1,
(ii) if A1 , A2 , . . . are in F and they are disjoint then
P (∪_{i=1}^∞ Ai ) = Σ_{i=1}^∞ P (Ai ).
Basic properties. We have
(i) Since P (Ω) = 1 = P (Ω ∪ Φ) = P (Ω) + P (Φ), we have P (Φ) = 0.
(ii) Since (A − B) ∪ (A ∩ B) = A and (A − B) ∩ (A ∩ B) = Φ, we have
P (A − B) = P (A) − P (A ∩ B).
(iii) Similarly note that (A − B) ∪ B = A ∪ B and (A − B) ∩ B = Φ, which implies
P (A ∪ B) = P (A) + P (B) − P (A ∩ B).
(iv) If A ⊂ B then A ∪ (B − A) = B. Therefore P (A) + P (B − A) = P (B), and obviously
P (A) ≤ P (B).
Approach based on expectation.
Definition 1.5. Let X : Ω −→ ℝ and let E be an operator with the following
properties:
(1) If X ≥ 0 then E(X) ≥ 0
(2) If c ∈ < is a constant then E(cX) = cE(X)
(3) E(X1 + X2 ) = E(X1 ) + E(X2 )
(4) E(1) = 1
(5) If Xn (ω) is monotonically increasing and Xn (ω) ↑ X(ω) then
lim_{n→∞} E(Xn ) = E(X).
Example. Let
E(X) = lim_{D→∞} (1/(2D)) ∫_{−D}^{D} X(ω)dω.
Check whether E satisfies the definition of expectation.
Solution. It is easy to check (1)-(4). Axiom (5) fails. For example, take Ω = ℝ and
Xn (ω) = I[−n,n] (ω). We have
lim_{D→∞} (1/(2D)) ∫_{−D}^{D} Xn (ω)dω = 0
while Xn (ω) ↑ 1 and E(1) = 1. So the operator in this example is not a proper form of expectation.
Definition 1.6. For any event A define
P (A) = E(IA (ω)).
For simplicity we sometimes drop ω and write P (A) = E(IA ). It is easy to verify the
axioms of a probability measure given in Definition 1.4.
Properties. It is easy to conclude that
(1) E(Σ_{i=1}^n ci Xi ) = Σ_{i=1}^n ci E(Xi )
and
(2) if X ≤ Y ≤ Z then E(X) ≤ E(Y ) ≤ E(Z).
(3) If {Ai } is a sequence of events then
P (∪_{i=1}^∞ Ai ) ≤ Σ_{i=1}^∞ P (Ai ).
To see this notice that
I∪_{i=1}^∞ Ai ≤ Σ_{i=1}^∞ IAi
and take expectations.
(4) (Inclusion-exclusion)
P (∪_{i=1}^n Ai ) = Σ_{i=1}^n P (Ai ) − Σ_{1≤i<j≤n} P (Ai ∩ Aj ) + Σ_{1≤i<j<k≤n} P (Ai ∩ Aj ∩ Ak )
− · · · + (−1)^{n+1} P (A1 ∩ A2 ∩ · · · ∩ An ).
For the proof use the fact that
I∪_{i=1}^n Ai = 1 − Π_{i=1}^n (1 − IAi )
and take expectations.
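Inclusion-exclusion can be verified exactly on a finite uniform sample space, where P(E) = |E|/|Ω|. A Python sketch (the events are arbitrary illustrative choices):

```python
# Exact check of inclusion-exclusion with rational arithmetic.
from itertools import combinations
from fractions import Fraction

omega = set(range(12))
events = [{0, 1, 2, 3}, {2, 3, 4, 5}, {5, 6, 7}, {0, 7, 8}]

def prob(E):
    return Fraction(len(E), len(omega))

# Right-hand side: alternating sum over all nonempty subcollections.
rhs = Fraction(0)
for k in range(1, len(events) + 1):
    for combo in combinations(events, k):
        inter = set.intersection(*combo)
        rhs += (-1) ** (k + 1) * prob(inter)

lhs = prob(set().union(*events))
assert lhs == rhs
```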
Theorem 1.1. (Fatou’s Lemma). If {An } is a family of events, then
(1)
P (lim inf An ) ≤ lim inf P (An ) ≤ lim sup P (An ) ≤ P (lim sup An ).
(2) If lim An = A then lim P (An ) = P (A).
We first prove the following lemma.
Lemma 1.3. If An ⊂ An+1 for any positive integer n then lim P (An ) = P (A). Similarly,
if An+1 ⊂ An for any positive integer n then lim P (An ) = P (A) where in both cases
A = lim An .
Proof. Define
B1 = A1 , Bi = Ai − Ai−1 , i = 2, 3, . . . .
Since A1 ⊂ A2 ⊂ A3 ⊂ · · · , the Bi are disjoint and
∪_{i=1}^n Bi = ∪_{i=1}^n Ai = An ,  ∪_{i=1}^∞ Bi = ∪_{i=1}^∞ Ai = A.
Therefore
P (A) = P (∪_{i=1}^∞ Bi ) = Σ_{i=1}^∞ P (Bi ) = lim_{n→∞} Σ_{i=1}^n P (Bi ) = lim_{n→∞} P (∪_{i=1}^n Bi ) = lim_{n→∞} P (An ).
For the second case notice that for any positive integer n, Acn ⊂ Acn+1 . Therefore from
the first part of this lemma we have
lim_{n→∞} P (Acn ) = lim_{n→∞} (1 − P (An )) = P (Ac ) = 1 − P (A).
Therefore
lim_{n→∞} P (An ) = P (A).
Now we are ready to prove Theorem 1.1. From the first part of Lemma 1.3 write
P (lim inf An ) = P (∪_{n=1}^∞ ∩_{i=n}^∞ Ai ) = lim_{n→∞} P (∩_{i=n}^∞ Ai ) ≤ lim inf P (An ),
since ∩_{i=n}^∞ Ai ⊂ An . Likewise, by the second part of Lemma 1.3,
P (lim sup An ) = P (∩_{n=1}^∞ ∪_{i=n}^∞ Ai ) = lim_{n→∞} P (∪_{i=n}^∞ Ai ) ≥ lim sup P (An ),
since An ⊂ ∪_{i=n}^∞ Ai .
(2) If A = lim sup An = lim inf An we have
P (A) = P (lim inf An ) ≤ lim inf P (An ) ≤ lim sup P (An ) ≤ P (lim sup An ) = P (A).
Conclude lim_{n→∞} P (An ) = P (A).
Definition 1.6. (Independence) Events A and B are independent if
P (A ∩ B) = P (A)P (B).
Properties.
(1) If A and B are independent then Ac and B are independent. To see this notice
that
P (Ac ∩ B) = P (B − A) = P (B) − P (A ∩ B) = P (B) − P (A)P (B)
= P (B)(1 − P (A)) = P (B)P (Ac ).
(2) If A, B, C are independent then A and B ∪ C are independent. Similarly A and
B ∩ C are independent.
(3) An event A is independent of itself if and only if P (A) = 0 or P (A) = 1.
(4) Any event A is independent from Ω.
Theorem 1.2 (Borel-Cantelli). Let (Ω, F, P ) be a probability space and let {Ei } be a
sequence of events.
(1) If Σ_{i=1}^∞ P (Ei ) < ∞ then P (lim sup En ) = 0.
(2) If {Ei } is a sequence of independent events then P (lim sup En ) = 0 or 1 according
as the series Σ_{i=1}^∞ P (Ei ) converges or diverges.
Proof. Set E = lim sup En . We have E = ∩_{n=1}^∞ Fn where Fn = ∪_{m=n}^∞ Em . For every
positive integer n,
0 ≤ P (Fn ) ≤ Σ_{m=n}^∞ P (Em ).
Since Σ_{i=1}^∞ P (Ei ) < ∞, the tail sum Σ_{m=n}^∞ P (Em ) → 0 as n → ∞. Since Fn ↓ E, from Lemma 1.3 we can
write
0 = lim_{n→∞} P (Fn ) = P (lim_{n→∞} Fn ) = P (lim sup En ),
thus (1) follows. For (2), suppose E1 , E2 , . . . are independent. From part (1) we know if
Σ_{m=1}^∞ P (Em ) < ∞ then P (lim sup En ) = 0. It remains to show that if Σ_{m=1}^∞ P (Em ) = ∞
then P (lim sup En ) = 1. As in part (1) let E = lim sup En . We get
E c = lim inf Enc .
Since the events {Enc } are also independent, for any N > n we have, using 1 − x ≤ exp(−x),
P (∩_{m=n}^∞ Emc ) ≤ P (∩_{m=n}^N Emc ) = Π_{m=n}^N (1 − P (Em )) ≤ exp(−Σ_{m=n}^N P (Em )).
As N → ∞, Σ_{m=n}^N P (Em ) → ∞, so the right side tends to 0 and P (∩_{m=n}^∞ Emc ) = 0
for every n. Therefore
P (E c ) = P (∪_{n=1}^∞ ∩_{m=n}^∞ Emc ) ≤ Σ_{n=1}^∞ P (∩_{m=n}^∞ Emc ) = 0,
hence P (E c ) = 0 and P (E) = 1.
Corollary 1.1. If {Ei } is a sequence of independent events then
P (lim sup En ) = 0 if and only if Σ_{i=1}^∞ P (Ei ) < ∞.
Proof. If Σ_{i=1}^∞ P (Ei ) < ∞ then from part (1) of Theorem 1.2 we have P (lim sup En ) = 0.
Conversely, let P (lim sup En ) = 0. If Σ_{i=1}^∞ P (Ei ) = ∞ then P (lim sup En ) = 1 by part (2)
of Theorem 1.2, a contradiction.
Remark. Notice that independence is required in Corollary 1.1 and cannot be dropped.
To see this let (Ω = [0, 1], B, P ) be a probability space with
P (A) = ∫_A dx
for a Borel set A. It is easy to show that P is a probability measure on [0, 1]. Now define
En = (0, 1/n) and notice that En ↓ Φ. Therefore
P (lim sup En ) = P (lim En ) = 0.
Since P (En ) = 1/n we have Σ_{n=1}^∞ 1/n = ∞. This does not violate Corollary 1.1 as {Ei } are
not independent. For example, take E2 = (0, 1/2) and E3 = (0, 1/3). We have
P (E2 ) = 1/2, P (E3 ) = 1/3, P (E2 ∩ E3 ) = P (E3 ) = 1/3 ≠ P (E2 )P (E3 ) = 1/6.
Example. In repeated tossing of a fair coin, let En denote the event that a head turns up on both
the nth and (n + 1)st tosses. Then lim sup En is the event that in repeated
tossing of a fair coin two successive heads appear infinitely often. Since {E2n } is an
independent sequence of events and
Σ_{n=1}^∞ P (E2n ) = Σ_{n=1}^∞ (1/4) = ∞,
we have P (lim sup E2n ) = 1 by Theorem 1.2. This implies that P (lim sup En ) = 1, since
lim sup E2n ⊂ lim sup En .
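A simulation cannot exhibit "infinitely often", but it can show the count of double-head positions growing at the rate the argument predicts, about N/4 in N tosses. A hedged Python sketch (seed and run length are arbitrary choices):

```python
# Count positions n where heads occur at both toss n and toss n+1.
import random

random.seed(1)
N = 100_000
tosses = [random.randint(0, 1) for _ in range(N)]  # 1 = head
double_heads = sum(1 for i in range(N - 1)
                   if tosses[i] == 1 and tosses[i + 1] == 1)
# The expected fraction of such positions is 1/4; allow Monte Carlo slack.
assert abs(double_heads / (N - 1) - 0.25) < 0.02
```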
Chapter 2, Some Inequalities in Probability
In this chapter we review some important inequalities. These inequalities will be used
later in different sections of this note.
Definition 1.7. Suppose E(X) = µ and σ² = V (X) = E((X − µ)²). We say X = µ in
mean square (written X =m.s. µ) if and only if E((X − µ)²) = 0.
With some knowledge of real analysis, equality in mean square is equivalent to equality
with probability 1 (almost surely).
Theorem 2.1 (Markov inequality). Let X be a nonnegative random variable and a a
positive constant. Then
P (X ≥ a) ≤ E(X)/a.
Proof. Note that
I(X ≥ a) ≤ X/a
(see Figure 1). Now take expected values of both sides.
Remark. Similarly we can write
I(|X − b| ≥ a) ≤ (X − b)²/a²
for all a > 0 and b ∈ ℝ (see Figure 2). Therefore
P (|X − b| ≥ a) ≤ E((X − b)²)/a².
Take b = µ and a = kσ, where µ and σ² are the mean and variance of X, to get Chebyshev's
inequality
P (|X − µ| ≥ kσ) ≤ 1/k².
Example. This example shows that Chebyshev's inequality cannot be improved. Let
[Figure 1: Graph of I(x > a) and x/a when a = 3.]
[Figure 2: Graph of I(|x − b| > a) and (x − b)²/a², when a = 3 and b = 2.]
P (X = −1) = P (X = 1) = 1/8, P (X = 0) = 6/8.
It is easy to see that E(X) = 0 and σ² = 1/4. Let k = 2 and calculate
P (|X − µ| ≥ kσ) = P (|X| ≥ 1) = 1/4 = 1/k².
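The sharpness claim can be verified with exact rational arithmetic:

```python
# The three-point distribution above attains Chebyshev's bound at k = 2.
from fractions import Fraction

pmf = {-1: Fraction(1, 8), 0: Fraction(6, 8), 1: Fraction(1, 8)}
mu = sum(x * p for x, p in pmf.items())
var = sum((x - mu) ** 2 * p for x, p in pmf.items())
assert mu == 0 and var == Fraction(1, 4)

k = 2
# P(|X - mu| >= k sigma) computed via squared deviations to stay rational.
tail = sum(p for x, p in pmf.items() if (x - mu) ** 2 >= k * k * var)
assert tail == Fraction(1, 4) == Fraction(1, k * k)
```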
Cauchy-Schwarz inequality. Let X = (X1 , . . . , Xp )′ be a random vector. Define the
matrix U = E(XX′ ). Notice that for any p × 1 vector c we have
c′ Uc = E((c′ X)²) ≥ 0.
Therefore the matrix U is nonnegative definite. Take p = 2 to see that
0 ≤ det(U) = E(X1²)E(X2²) − (E(X1 X2 ))².
Equality holds if c′ X =m.s. 0 for some c ≠ 0. Some generalizations are given next.
Example. If P (X ≥ 0) = 1 and E(X) = µ > 0 then
P (X ≥ 2µ) ≤ 0.5.
Proof. Use Markov's inequality:
P (X ≥ 2µ) ≤ E(X)/(2µ) = 0.5.
Example. Let X be a random variable whose moment generating function M (t) = E(exp(tX))
is finite for t ∈ (−h, h), h > 0. It is easy to show that
P (X ≥ a) ≤ exp(−at)M (t), 0 < t < h,
and
P (X ≤ a) ≤ exp(−at)M (t), −h < t < 0.
Holder, Lyapunov and Minkowski inequalities.
Lemma 2.1. If α ≥ 0, β ≥ 0, and
1/p + 1/q = 1, p > 1, q > 1,
then
0 ≤ αβ ≤ α^p /p + β^q /q.
Proof. If αβ = 0 then the inequality holds trivially. Therefore, let α > 0, β > 0. For t > 0,
define
φ(t) = t^p /p + t^{−q} /q
and differentiate to get
φ′ (t) = t^{p−1} − t^{−q−1} .
We have φ′ (1) = 0, φ′ (t) < 0 when t ∈ (0, 1) and φ′ (t) > 0 when t > 1. Therefore
t = 1 minimizes φ(t) in (0, ∞), i.e.
φ(t) = t^p /p + t^{−q} /q ≥ φ(1) = 1/p + 1/q = 1, t ∈ (0, ∞).
Set t = α^{1/q} /β^{1/p} to get
α^{p/q} /(pβ) + β^{q/p} /(qα) ≥ 1.
Multiply both sides by αβ and use the fact that
p/q + 1 = p and q/p + 1 = q
to get
αβ ≤ α^{p/q+1} /p + β^{q/p+1} /q = α^p /p + β^q /q.
Theorem 2.2 (Holder's inequality). Let X and Y be two random variables and
1/p + 1/q = 1, p > 1, q > 1.
We have
E(|XY |) ≤ (E(|X|^p ))^{1/p} (E(|Y |^q ))^{1/q} .
Proof. In the case that E(|X|^p )E(|Y |^q ) = 0, the result follows easily (then XY =a.s. 0). Otherwise
in Lemma 2.1 take
α = |X|/(E(|X|^p ))^{1/p} , β = |Y |/(E(|Y |^q ))^{1/q}
to get
|XY | / [(E(|X|^p ))^{1/p} (E(|Y |^q ))^{1/q} ] ≤ |X|^p /(p E(|X|^p )) + |Y |^q /(q E(|Y |^q )).
Take expected values: the right side becomes 1/p + 1/q = 1, which gives the result.
Theorem 2.3 (Minkowski's inequality). For p ≥ 1, we have
(E(|X + Y |^p ))^{1/p} ≤ (E(|X|^p ))^{1/p} + (E(|Y |^p ))^{1/p} .
Proof. Since |X + Y | ≤ |X| + |Y | the case p = 1 is obvious. So let p > 1. Choose q > 1
such that
1/p + 1/q = 1.
Use Holder's inequality to write
E(|X + Y |^p ) = E(|X + Y ||X + Y |^{p−1} ) ≤ E(|X||X + Y |^{p−1} ) + E(|Y ||X + Y |^{p−1} )
≤ (E(|X|^p ))^{1/p} (E(|X + Y |^{(p−1)q} ))^{1/q} + (E(|Y |^p ))^{1/p} (E(|X + Y |^{(p−1)q} ))^{1/q} .
Since (p − 1)q = p, dividing both sides by (E(|X + Y |^p ))^{1/q} ≠ 0 gives the result (the case
(E(|X + Y |^p ))^{1/q} = 0 is trivial).
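Both Holder's and Minkowski's inequalities hold for any empirical (sample) distribution, so they can be sanity-checked numerically; a Python sketch (the samples and the conjugate exponents p = 3, q = 3/2 are arbitrary choices):

```python
# Numeric sanity check of Holder and Minkowski under an empirical measure.
import random

random.seed(2)
xs = [random.gauss(0, 1) for _ in range(1000)]
ys = [random.gauss(1, 2) for _ in range(1000)]

def E(vals):                      # expectation under the empirical measure
    return sum(vals) / len(vals)

p, q = 3.0, 1.5                   # conjugate exponents: 1/p + 1/q = 1
holder_lhs = E([abs(x * y) for x, y in zip(xs, ys)])
holder_rhs = (E([abs(x) ** p for x in xs]) ** (1 / p)
              * E([abs(y) ** q for y in ys]) ** (1 / q))
assert holder_lhs <= holder_rhs

mink_lhs = E([abs(x + y) ** p for x, y in zip(xs, ys)]) ** (1 / p)
mink_rhs = (E([abs(x) ** p for x in xs]) ** (1 / p)
            + E([abs(y) ** p for y in ys]) ** (1 / p))
assert mink_lhs <= mink_rhs
```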
Remarks.
1. The Cauchy-Schwarz inequality is the special case of Holder's inequality with p = q = 2.
2. On the space of random variables defined on (Ω, F, P ) the distance function
d(X, Y ) = (E(|X − Y |^p ))^{1/p}
is a metric. The space of random variables on Ω equipped with the distance d is called
the Lp space.
Convexity and Jensen's inequality. Convexity is an important topic in mathematics.
For simplicity we only consider smooth convex functions. In most of our cases, for
convexity we only need to check that g′′ (x) ≥ 0. Sometimes, to check the convexity of a
function g, it suffices to show that the tangent line of g at the point µ lies
below the curve g(x). This means that
g(x) ≥ g(µ) + g′ (µ)(x − µ).
If X is a random variable with µ = E(X) and g is a convex function, taking expectations of
both sides gives Jensen's inequality
E(g(X)) ≥ g(E(X)).
Theorem 2.4 (Lyapunov inequality). If r > s > 0 then
(E(|X|^r ))^{1/r} ≥ (E(|X|^s ))^{1/s} .
Proof. The function g(x) = |x|^u , u > 1, is convex on ℝ. Apply E(g(X)) ≥ g(E(X)) to the
random variable |X|^s with u = r/s > 1 to write
E(|X|^r ) = E((|X|^s )^{r/s} ) ≥ (E(|X|^s ))^{r/s} .
The result follows by raising both sides to the power 1/r.
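Lyapunov's inequality says that the norm (E|X|^r)^{1/r} is nondecreasing in r, which holds for any empirical distribution as well; a sketch (sample size and exponents are arbitrary choices):

```python
# Check monotonicity of r -> (E|X|^r)^(1/r) on an empirical distribution.
import random

random.seed(3)
xs = [random.expovariate(1.0) for _ in range(2000)]

def norm(r):
    return (sum(abs(x) ** r for x in xs) / len(xs)) ** (1 / r)

rs = [0.5, 1, 2, 3, 4]
norms = [norm(r) for r in rs]
assert all(a <= b + 1e-12 for a, b in zip(norms, norms[1:]))
```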
The law of least squares. Let X and Y = (Y1 , . . . , Ym )′ be a random variable and
a random vector respectively. To predict X from a linear function a′ Y of Y, where
a (m × 1) ∈ ℝ^m , we need to minimize
D(a) = E(X − a′ Y)² = E(X²) − 2a′ E(XY) + a′ E(YY′ )a.
[Figure 3: The convex function f (x) = x + exp(x).]
Define
UY X = E(XY) = (E(XY1 ), . . . , E(XYm ))′ , UY Y = E(YY′ ).
Notice that the matrix UY Y is nonnegative definite. Differentiate D(a) and solve for a to find
the optimum value of a. We get
∂D(a)/∂a = −2E(XY) + 2E(YY′ )a = 0.
This gives a = UY Y^{−1} UY X (if UY Y is nonsingular). If we use the predictor X̂ = a0 + a′ Y,
we impose E(X) = a0 + a′ E(Y) to get
X̂ − E(X) = a′ (Y − E(Y))
and a = VY Y^{−1} VY X , where VY Y and VY X are the centered versions of the previous formulas.
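The formula a = VYY⁻¹ VYX can be illustrated on simulated data; in the two-predictor case the 2 × 2 system can be solved by hand. A Python sketch (the true coefficients, noise level and sample size are illustrative assumptions):

```python
# Least-squares prediction with two predictors, solved via centered moments.
import random

random.seed(4)
n = 20000
a_true = (1.0, -2.0)
Y1 = [random.gauss(0, 1) for _ in range(n)]
Y2 = [random.gauss(0, 1) for _ in range(n)]
X = [a_true[0] * y1 + a_true[1] * y2 + random.gauss(0, 0.1)
     for y1, y2 in zip(Y1, Y2)]

def cov(u, v):
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / n

# Centered moment matrix V_YY and vector V_YX
v11, v12, v22 = cov(Y1, Y1), cov(Y1, Y2), cov(Y2, Y2)
c1, c2 = cov(Y1, X), cov(Y2, X)
det = v11 * v22 - v12 * v12
a1 = (v22 * c1 - v12 * c2) / det    # components of V_YY^{-1} V_YX
a2 = (-v12 * c1 + v11 * c2) / det
assert abs(a1 - a_true[0]) < 0.05 and abs(a2 - a_true[1]) < 0.05
```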
Chapter 3
Induced probability measure, Conditional probability, probability density
functions and cumulative distribution functions
Induced probability measure. Let (Ω, F, P ) be a probability space, X a random
variable, and E a Borel set. Define PX (E) = P (X(ω) ∈ E) = P (X −1 (E)). We can check that
(i) PX (ℝ) = 1,
(ii) PX (∪_{i=1}^∞ Ei ) = Σ_{i=1}^∞ PX (Ei ) for disjoint Borel sets Ei .
Note that X −1 (∪_{i=1}^∞ Ei ) = ∪_{i=1}^∞ X −1 (Ei ).
Definition 3.1. If X is a random variable the cumulative distribution function (c.d.f.)
for the random variable X is defined by
FX (x) = P (X(ω) ≤ x) = P (X ≤ x).
Theorem 3.1. The distribution function F of a random variable X is a nondecreasing,
right continuous function on ℝ such that
F (−∞) = 0, F (+∞) = 1.
Proof. Let h > 0 and x ∈ ℝ. We have
F (x + h) − F (x) = P (x < X ≤ x + h) ≥ 0.
Therefore F is nondecreasing. For right continuity, take
En = (−∞, x + hn ], hn ↓ 0.
We have
En ↓ ∩_{n=1}^∞ En = (−∞, x], so P (En ) = F (x + hn ) ↓ P ((−∞, x]) = F (x).
Finally we have
F (N ) − F (−N ) = P (−N < X ≤ N )
and
EN = (−N, N ] ↑ E = ℝ.
Therefore
F (N ) − F (−N ) ↑ 1,
which proves F (+∞) = 1 and F (−∞) = 0.
Remark. A distribution function F is continuous at x ∈ ℝ if and only if P (X = x) = 0.
To see this notice that
P (X = x) = F (x) − F (x−), the jump of F at x.
F is continuous at x if and only if F (x−) = lim_{t→x−} F (t) = F (x), so the result follows immediately.
Definition 3.2. A random variable X is of continuous type if F (x) = P (X ≤ x) is a
continuous function.
Example. Define
F (x) = (1/2) I[0,1) (x) + (2x/3) I[1,3/2] (x) + I(3/2,∞) (x).
It is obvious that F is nondecreasing, F (+∞) = 1, F (−∞) = 0 and F is right continuous.
Therefore F is a distribution function. The c.d.f. F is not of continuous type. We have
P (X = 0) = 0.5, P (X = 3/2) = 0, and P (0 ≤ X ≤ 1) = F (1) − F (0−) = 2/3.
Lemma 3.1. The set of discontinuity points of a c.d.f. is at most countable.
Proof. Define p(x) = F (x) − F (x−) > 0 (the size of the jump at x). Let D be the set of all
discontinuity points of F . For a positive integer n let
Dn = {x : 1/(n + 1) < p(x) ≤ 1/n}.
We have D = ∪_{n=1}^∞ Dn . Since the total jump mass cannot exceed 1 and each point of Dn
contributes more than 1/(n + 1), the number of elements of Dn cannot exceed n. Hence Dn is
finite, and this proves that D is at most countable.
Lemma 3.2. Let X be a random variable with c.d.f. F and p(x) = F (x) − F (x−).
Let D = {x1 , x2 , . . .} be the set of discontinuities of F . Define the step function
G(x) = Σ_{n=1}^∞ p(xn )I(xn ≤ x).
Then H(x) = F (x) − G(x) is nondecreasing and continuous on ℝ.
Proof. Clearly H is right continuous. We next show that H is also left continuous. Note
that if x′ < x,
H(x) − H(x′ ) = F (x) − F (x′ ) − [G(x) − G(x′ )].
As x′ ↑ x, F (x) − F (x′ ) converges to the size of the jump of F at x, and G(x) − G(x′ )
converges to the size of the jump of G at x. The size of the jump in both cases is p(x), which
shows that H(x) − H(x′ ) → 0 as x′ ↑ x. Next we show that H is nondecreasing. We first
prove that
Σ_{x′ <xn ≤x} p(xn ) ≤ F (x) − F (x′ ).
Suppose first that F has only a finite number of discontinuities (say N ) in the interval (x′ , x].
We may assume that
x′ < x1 < x2 < · · · < xN ≤ x.
By the monotone property of F it follows that
F (x′ ) ≤ F (x1 −) < F (x1 ) ≤ F (x2 −) < F (x2 ) ≤ · · · ≤ F (xN −) < F (xN ) ≤ F (x).
Since p(xn ) = F (xn ) − F (xn −) we have
Σ_{n=1}^N p(xn ) = F (xN ) − F (x1 −) + Σ_{j=1}^{N−1} [F (xj ) − F (xj+1 −)] ≤ F (xN ) − F (x1 −) ≤ F (x) − F (x′ ).
24
Next let N → ∞ (countably infinite set of discontinuities) to get
H(x) − H(x′ ) = F (x) − F (x′ ) − Σ_{n=1}^∞ p(xn )I(x′ < xn ≤ x) ≥ 0.
Note. H(−∞) = 0 and H(+∞) = 1 − α, where α = G(+∞) = Σ_n p(xn ).
Theorem 3.2. Let F be a c.d.f.. Then F admits the decomposition
F (x) = αFd (x) + (1 − α)Fc (x)
such that both Fd and Fc are cumulative distribution functions and where Fd is a step
function and Fc is continuous.
Proof. With the same notation used in Lemma 3.2, let α = G(+∞). If α ∈ (0, 1), set
Fd (x) = G(x)/α and Fc (x) = H(x)/(1 − α). If α = 0 then F (x) = H(x) is continuous, and if
α = 1 then F (x) = G(x) is a step function.
Example. Let
F (x) = (1/2) I[0,1) (x) + (2x/3) I[1,3/2) (x) + I[3/2,∞) (x).
Find Fc and Fd .
Solution. The jump points are J = {0, 1} with p(0) = 1/2 and p(1) = 2/3 − 1/2 = 1/6, so
G(x) = Σ_{xn ≤x} p(xn ) = (1/2)I(x ≥ 0) + (1/6)I(x ≥ 1)
and
H(x) = F (x) − G(x) = (2/3)(x − 1)I[1,3/2) (x) + (1/3)I[3/2,∞) (x).
We also get 1 − α = H(+∞) = 1/3, α = 2/3. Therefore Fd (x) = 3G(x)/2 and Fc (x) = 3H(x).
Note.
(1) A random variable is called of discrete type if
P (X ≤ x) = F (x) = Fd (x), i.e. the continuous part plays no role.
(2) For a discrete random variable X,
P (X = x) = f (x) = F (x) − F (x−) ≥ 0
is called the p.d.f. (p.m.f.) of X.
(3) For a discrete random variable X we have Σ_x f (x) = 1.
Example. Let X be a random variable with continuous, strictly increasing c.d.f. F .
(i) Find the distribution of U = F (X).
(ii) Show that if U ∼ U [0, 1] then F −1 (U ) =d X.
(iii) Use (i) and (ii) to show that X = − ln U has an exponential distribution.
Solution. (i) We have
G(u) = P (F (X) ≤ u) = P (X ≤ F −1 (u)) = F (F −1 (u)) = u, u ∈ (0, 1).
Therefore g(u) = G′ (u) = 1, u ∈ (0, 1), so U ∼ U [0, 1].
For (ii),
P (F −1 (U ) ≤ x) = P (U ≤ F (x)) = ∫_0^{F (x)} du = F (x).
For (iii) note that
F (x) = P (− ln U ≤ x) = P (U ≥ exp(−x)) = 1 − exp(−x)
and f (x) = F ′ (x) = exp(−x), x > 0.
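Part (iii) is the basis of inverse-transform sampling. A quick empirical check in Python (sample size and test points are arbitrary choices):

```python
# If U ~ U[0,1] then X = -ln U is exponential(1); compare the empirical cdf
# with F(x) = 1 - exp(-x) at a few points.
import math
import random

random.seed(5)
n = 50_000
samples = [-math.log(random.random()) for _ in range(n)]

for x in (0.5, 1.0, 2.0):
    ecdf = sum(1 for s in samples if s <= x) / n
    assert abs(ecdf - (1 - math.exp(-x))) < 0.01
```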
Definition 3.3. Let P (X ≤ x) = F (x) be continuous. If there exists a nonnegative
function f such that
P (X ≤ x) = ∫_{−∞}^x f (t)dt,
then f is called the probability density function (p.d.f.) of the random variable X.
Remark. Notice that if Definition 3.3 holds then
∫_{−∞}^∞ f (t)dt = 1,
and if f is continuous at x then F ′ (x) = f (x).
2. Conditional probability measure. Let (Ω, F, P ) be a probability space and B ∈ F
such that P (B) > 0. Define the conditional probability measure given B as
QB (A) = P (A|B) = P (A ∩ B)/P (B).
Theorem 3.2. On (Ω, F, P ), the set function QB (A) is a probability measure.
Proof. If B ∈ F and {Ai } is a sequence of disjoint events then
(1) QB (Φ) = P (B ∩ Φ)/P (B) = 0,
(2) QB (∪_{i=1}^∞ Ai ) = P (B ∩ (A1 ∪ A2 ∪ · · · ))/P (B) = Σ_{i=1}^∞ QB (Ai ),
(3) QB (Ω) = P (Ω ∩ B)/P (B) = P (B)/P (B) = 1,
(4) since P (A ∩ B) ≤ P (B) we have QB (A) ∈ [0, 1] for any A ∈ F.
Remarks.
(1) P (A|B c ) ≠ 1 − P (A|B) in general, but P (Ac |B) = 1 − P (A|B).
(2) P (A|Ω) = P (A).
(3) If A and B are independent and P (A)P (B) > 0 then
P (A|B) = P (A) and P (B|A) = P (B).
Theorem 2.1. If {Ai } is a sequence of disjoint events such that
∪_{i=1}^∞ Ai = Ω
and P (B) > 0, then
P (B) = Σ_{i=1}^∞ P (B|Ai )P (Ai )
and
P (Ai |B) = P (B|Ai )P (Ai ) / Σ_{i=1}^∞ P (B|Ai )P (Ai ).
Proof. We have
P (B) = P (B ∩ Ω) = P (B ∩ (A1 ∪ A2 ∪ · · · )) = Σ_{i=1}^∞ P (B ∩ Ai ) = Σ_{i=1}^∞ P (B|Ai )P (Ai ).
Similarly
P (Ai |B) = P (Ai ∩ B)/P (B) = P (B|Ai )P (Ai )/P (B) = P (B|Ai )P (Ai ) / Σ_{i=1}^∞ P (B|Ai )P (Ai ).
Example 1. Roll a die, then flip a fair coin as many times as the number on the die. Let X
be the number of heads. (1) Find the distribution of the random variable X. (2) If we
observe 3 heads, what is the probability that the die showed 4?
Solution. (1) Let Y be the number on the die. Then
P (X = k) = Σ_{i=1}^6 P (X = k|Y = i)P (Y = i) = (1/6) Σ_{i=1}^6 P (X = k|Y = i) = (1/6) Σ_{i=k}^6 C(i, k)(1/2)^i ,
where C(i, k) is the binomial coefficient.
For (2) use
P (Y = 4|X = 3) = P (X = 3|Y = 4)P (Y = 4)/P (X = 3).
Since P (Y = 4) = 1/6, P (X = 3|Y = 4) = C(4, 3)(1/2)^4 = 1/4 and
Σ_{i=3}^6 C(i, 3)(1/2)^i = 1,
so that P (X = 3) = 1/6, we get P (Y = 4|X = 3) = 0.25.
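The two computations in Example 1 can be reproduced with exact rational arithmetic:

```python
# Law of total probability and Bayes' rule for the die-and-coin example.
from fractions import Fraction
from math import comb

def p_x(k):  # P(X = k) = (1/6) * sum_i C(i,k) (1/2)^i
    return Fraction(1, 6) * sum(Fraction(comb(i, k), 2**i) for i in range(1, 7))

assert sum(p_x(k) for k in range(0, 7)) == 1
assert p_x(3) == Fraction(1, 6)

# P(Y = 4 | X = 3) = P(X = 3 | Y = 4) P(Y = 4) / P(X = 3)
posterior = (Fraction(comb(4, 3), 2**4) * Fraction(1, 6)) / p_x(3)
assert posterior == Fraction(1, 4)
```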
Example. An urn contains m + n chips of which m are white and the rest are black. A
chip is drawn at random and, without observing its color, another chip is drawn at random.
What is the probability that the first chip is white? What is the probability that the
second chip is white?
Solution. Let E1 be the event that the first chip is white and E2 be the event that the
first chip is black. Also let A be the event that the second chip is white. We have
P (E1 ) = m/(m + n), P (A|E1 ) = (m − 1)/(m + n − 1), P (A|E2 ) = m/(m + n − 1).
Therefore the probability that the first chip is white is P (E1 ) = m/(m + n), and
P (A) = P (A|E1 )P (E1 ) + P (A|E2 )P (E2 )
= [(m − 1)/(m + n − 1)] · [m/(m + n)] + [m/(m + n − 1)] · [n/(m + n)] = m/(m + n).
Chapter 4
Expectation, moments, moment generating function, characteristic function,
functions of a random variable.
Let X be a random variable with c.d.f. F such that
∫_{−∞}^∞ |U (x)|dF (x) < ∞.
Define
E(U (X)) = ∫_{−∞}^∞ U (x)dF (x).
We denote the mean by E(X), the kth moment by µk = E(X^k ), the variance by
σ² = E((X − E(X))²) = E(X²) − (E(X))², the moment generating function by M (t) = E(exp(tX)),
and the characteristic function by Φ(t) = E(exp(itX)), where i = √−1. If X is of discrete type then
g(s) = E(s^X ) = Σ_k s^k P (X = k) is called the generating function.
Example 1. Let X be a random variable with
P (A) = ∫_A dx/(π(1 + x²)).
Show that
(i) f (x) = 1/(π(1 + x²)), x ∈ ℝ, is a p.d.f.;
(ii) E(X^k ) does not exist if k ≥ 1.
Proof. Since f (x) ≥ 0 and
∫_{−∞}^∞ dx/(π(1 + x²)) = 1,
(i) follows immediately. For (ii), using 1 + x² ≤ 2x² for x ≥ 1,
E(|X|^k ) = 2 ∫_0^∞ x^k dx/(π(1 + x²)) ≥ (2/π) ∫_1^∞ x^k dx/(1 + x²) ≥ (1/π) ∫_1^∞ x^{k−2} dx = ∞
for k ≥ 1. In particular E(|X|) = ∞, so no moment of order k ≥ 1 exists.
Example 2. Let Z be a random variable with p.d.f. (check that f is a p.d.f.)
f (z) = (1/√(2π)) exp(−z²/2).
Find the c.d.f. of X = σZ + µ, for σ > 0 and µ ∈ ℝ.
Solution. Note that f (z) ≥ 0. Let
I = ∫_{−∞}^∞ exp(−x²/2)dx
and notice that
I² = ∫_{−∞}^∞ ∫_{−∞}^∞ exp(−(x² + y²)/2)dxdy.
Now take x = r cos θ and y = r sin θ (polar coordinates) to get
I² = ∫_0^∞ ∫_0^{2π} exp(−r²/2) r dθ dr = 2π.
Therefore I = √(2π) and f is a p.d.f. We can calculate
F (x) = P (σZ + µ ≤ x) = P (Z ≤ (x − µ)/σ) = ∫_{−∞}^{(x−µ)/σ} f (z)dz.
We get
dF (x)/dx = (1/(σ√(2π))) exp(−(x − µ)²/(2σ²))
for x ∈ ℝ. (This distribution is denoted by N (µ, σ²).)
Example 3. Let X be a random variable with Poisson distribution with mean λ. Find
the generating function.
Solution. We have
g(s) = E(s^X ) = Σ_{k=0}^∞ s^k exp(−λ)λ^k /k! = exp(λ(s − 1)).
Multivariate normal distribution. Let Z1 , . . . , Zp be an i.i.d. sequence of observations
from a standard normal distribution. We have
f (z1 , . . . , zp ) = (2π)^{−p/2} exp(−z′ z/2),
where z = (z1 , . . . , zp )′. Let Σ (p × p) be a positive definite matrix and x = Σ^{1/2} z + m. We
have dx = |Σ^{1/2} |dz. Therefore the p.d.f. of the random vector X is
g(x) = |2πΣ|^{−1/2} exp[−(x − m)′ Σ^{−1} (x − m)/2].
A random vector with the p.d.f. above is called p-variate normal with mean
m and variance-covariance matrix Σ, denoted by Np (m, Σ).
Notice that the result above implies that if x ∼ Np (m, Σ) then
Σ^{−1/2} (x − m) ∼ Np (0, I).
Elementary properties of characteristic functions. The moment generating function of a random variable may not exist. For example, the moment
generating function of the Cauchy distribution does not exist, since for θ ≠ 0
∫_{−∞}^∞ exp(θx) dx/(π(1 + x²)) = ∞.
Therefore we define the characteristic function, which always exists (Theorem 4.1).
Theorem 4.1. If X is a random variable then
(i) Φ(θ) = E(exp(iθX)) always exists, Φ(0) = 1 and |Φ(θ)| ≤ 1.
(ii) Φ̄(θ) = Φ(−θ).
(iii) If X is symmetric then Φ(θ) ∈ ℝ.
(iv) ΦaX+b (θ) = E(exp(iθ(aX + b))) = exp(iθb)ΦX (aθ).
(v) The characteristic function of any random variable is uniformly continuous.
Proof. Write exp(iθX) = cos θX + i sin θX = U + iV . Since
|E(U + iV )|² = (E(U ))² + (E(V ))² ≤ E(U ²) + E(V ²) = E(|U + iV |²),
we have
|E(exp(iθX))| ≤ (E(| exp(iθX)|²))^{1/2} = 1.
For (ii), notice that
Φ̄(θ) = E(cos θX) − iE(sin θX) = E(exp(−iθX)) = Φ(−θ).
For (iii) notice that X =d −X, so Φ(θ) = Φ(−θ) = Φ̄(θ), which forces Φ(θ) to be real. For (iv),
E(exp(iθ(aX + b))) = exp(iθb)E(exp(i(aθ)X)) = exp(iθb)ΦX (aθ).
To prove uniform continuity notice that
|Φ(t + h) − Φ(t)| = |E(exp(itX)(exp(ihX) − 1))| ≤ E(| exp(ihX) − 1|),
a bound that does not depend on t. Since | exp(ihX) − 1| ≤ 2, dominated convergence gives
lim_{h→0} E(| exp(ihX) − 1|) = 0.
Theorem 4.2. Let X be a random variable with c.d.f. F such that E(X) exists. Then
E(X) = ∫_0^∞ (1 − F (x))dx − ∫_0^∞ F (−x)dx.
Proof. First assume P (X ≥ 0) = 1. We have
E(X) = ∫_0^∞ x dF (x) = ∫_0^∞ ∫_0^x dy dF (x) = ∫_0^∞ ∫_y^∞ dF (x) dy = ∫_0^∞ (1 − F (y))dy.
In general X = X⁺ − X⁻ where
X⁺ = max(0, X) = (X + |X|)/2, X⁻ = max(0, −X) = (−X + |X|)/2.
33
Therefore E(X) = E(X⁺) − E(X⁻). Since X⁺, X⁻ ≥ 0 we have
E(X⁺) = ∫_0^∞ P (X⁺ > x)dx, E(X⁻) = ∫_0^∞ P (X⁻ > x)dx.
For x ≥ 0,
{X⁺ > x} = {X > x} and {X⁻ > x} = {X < −x},
so E(X⁺) = ∫_0^∞ (1 − F (x))dx and E(X⁻) = ∫_0^∞ P (X < −x)dx = ∫_0^∞ F (−x)dx
(the two integrands can differ only on the at most countable set of jump points of F , which
does not change the integrals).
Remark 1. From Theorem 4.2 we can conclude that if X ≥ 0 is an integer-valued
random variable then
E(X) = Σ_{n=0}^∞ P (X > n).
An easy and direct derivation is to take expectations of both sides of
X = Σ_{k=0}^∞ I(X > k).
Remark 2. For X ≥ 0 we can also derive the following formula:
E(X^r ) = ∫_0^∞ r x^{r−1} P (X > x)dx, r > 0,
by using the fact that
E(X^r ) = ∫_0^∞ P (X^r > x)dx = ∫_0^∞ P (X > x^{1/r} )dx.
Use the change of variable x = u^r to get the result.
Distribution for functions of a random variable. Let X be a random variable with
c.d.f. F . Our goal is to find c.d.f. (or p.d.f.) for Y = U (X).
Example 4. Let X ∼ U [0, 1]. Find the p.d.f. for
(i) Y = a + (b − a)X where a < b.
(ii) W = tan (π(2X − 1)/2) .
Solution. For t ∈ [a, b],
F (t) = P (a + (b − a)X ≤ t) = P (X ≤ (t − a)/(b − a)) = (t − a)/(b − a).
This gives F ′ (t) = f (t) = 1/(b − a), t ∈ (a, b). This means that Y ∼ U (a, b).
For (ii), let w ∈ ℝ and notice that
G(w) = P (tan(π(2X − 1)/2) ≤ w) = P (X ≤ (1/π) tan⁻¹ w + 1/2) = (1/π) tan⁻¹ w + 1/2.
Therefore
G′ (w) = g(w) = 1/(π(1 + w²)), w ∈ ℝ,
which shows W has a Cauchy distribution. In the following examples the formula
Γ(α)/β^α = ∫_0^∞ x^{α−1} exp(−βx)dx
is useful.
Example 5 (Beta distribution). Let Xi , i = 1, 2, be two independent random variables
from the gamma distributions
Xi ∼ f (x) = exp(−x)x^{αi−1} /Γ(αi ), x > 0, i = 1, 2.
Find the p.d.f. of the random variable
U = X1 /(X1 + X2 ).
Solution. We have
G(u) = P (X1 /(X1 + X2 ) ≤ u) = P (X1 ≤ uX2 /(1 − u))
= (1/(Γ(α1 )Γ(α2 ))) ∫_0^∞ ∫_0^{yu/(1−u)} exp(−(x + y))x^{α1−1} y^{α2−1} dxdy.
Differentiating with respect to u gives
g(u) = G′ (u) = (u^{α1−1} (1 − u)^{−(α1+1)} /(Γ(α1 )Γ(α2 ))) ∫_0^∞ exp(−y/(1 − u))y^{α1+α2−1} dy
= (Γ(α1 + α2 )/(Γ(α1 )Γ(α2 ))) u^{α1−1} (1 − u)^{α2−1} , 0 < u < 1.
The resulting distribution is called the beta(α1 , α2 ) distribution. It is useful to notice that
Γ(α1 )Γ(α2 )/Γ(α1 + α2 ) = ∫_0^1 u^{α1−1} (1 − u)^{α2−1} du.
Example 6 (t distribution). Let W ∼ N (0, 1) and V ∼ χ²(r) with W and V
independent. Find the p.d.f. of
T = W/√(V /r) ∼ t(r).
Solution. We have
G(t) = P (T ≤ t) = P (W ≤ t√(V /r))
= ∫_0^∞ ∫_{−∞}^{t√(v/r)} (1/√(2π)) exp(−w²/2) dw · (1/(2^{r/2} Γ(r/2))) exp(−v/2)v^{(r/2)−1} dv.
Therefore
g(t) = G′ (t) = ∫_0^∞ √(v/r) (1/√(2π)) exp(−t²v/(2r)) (1/(2^{r/2} Γ(r/2))) exp(−v/2)v^{(r/2)−1} dv
= Γ((r + 1)/2) / [Γ(r/2)√(πr) (1 + t²/r)^{(r+1)/2} ], t ∈ ℝ,
which is the density of the t(r) distribution. Notice that as r → ∞,
g(t) → (1/√(2π)) exp(−t²/2), t ∈ ℝ,
which is the standard normal p.d.f.
[Figure 4: t distribution with 7 degrees of freedom.]
Example 7 (F distribution). Let Ui ∼ χ²(ri ), i = 1, 2, be independent. Find the p.d.f. of
F = (U1 /r1 )/(U2 /r2 ) ∼ F (r1 , r2 ).
Solution. We have
G(w) = P (F ≤ w) = P (U1 ≤ r1 U2 w/r2 )
= ∫_0^∞ ∫_0^{vwr1/r2} [1/(2^{(r1+r2)/2} Γ(r1 /2)Γ(r2 /2))] exp(−(u + v)/2)u^{r1/2−1} v^{r2/2−1} dudv.
Therefore
g(w) = G′ (w) = ∫_0^∞ [1/(2^{(r1+r2)/2} Γ(r1 /2)Γ(r2 /2))] exp[−(v/2)(1 + wr1 /r2 )] v^{(r1+r2)/2−1} (r1 /r2 )^{r1/2} w^{(r1/2)−1} dv
= [Γ((r1 + r2 )/2)(r1 /r2 )^{r1/2} /(Γ(r1 /2)Γ(r2 /2))] · w^{(r1/2)−1} /(1 + wr1 /r2 )^{(r1+r2)/2} , w > 0.
Example 8. Let X and Y be two independent random variables from U [0, 1].
(i) Find the joint p.d.f. of U = X + Y and V = X − Y .
(ii) Find the marginal p.d.f.'s of U and V .
Solution. We have
P ((U, V ) ∈ A × B) = ∫_A ∫_B g(u, v)dudv,
where g(u, v) is the p.d.f. of (U, V ). On the other hand, writing T for the transformation
(x, y) → (u, v),
P ((U, V ) ∈ A × B) = P ((X, Y ) ∈ T −1 (A × B)) = ∫∫_{T −1 (A×B)} f (x, y)dxdy.
Therefore the problem turns into a change of variables in a multiple integral:
g(u, v) = f (x(u, v), y(u, v))| det(J)|,
where J = ∂(x, y)/∂(u, v) is the Jacobian of the transformation. Notice that each point (X, Y )
corresponds to one and only one point (U, V ) (a one-to-one transformation). From the definition of
0.8
0.6
y
0.4
0.2
0.0
0
2
4
6
8
x
Figure 5: Graph of f distribution with 16 and 10 degrees offreedom
39
10
the r.v.’s U and V conclude that X = (U + V )/2, Y = (U − V )/2 and | det(J)| = 1/2.
Therefore
g(u, v) = f (u, v)|det(J)| = 1/2I(|v| < u, |v| < 2 − u, 0 < u < 2, −1 < v < 1).
Therefore
u
Z
du/2 = v, 0 < u < 1
g1 (u) =
−u
and
Z
2−u
du/2 = 2 − u, 1 < u < 2.
g1 (u) =
u−2
Similarly when |v| < 1,
Z
2−|v|
du/2 = 1 − |v|.
g2 (v) =
|v|
Example 9 (sample generation from normal distribution). Let U and U2 be two independent random variable from U [0, 1]. Define
X1 = cos(2πU1 )
p
p
−2 log U2 , X2 = sin(2πU1 ) −2 log U2 .
Find the joint p.d.f. and marginals for X1 and X2 .
Solution. Solve for U1 and U2 to get
U1 =
1
tan−1 (x2 /x1 ), U2 = exp(−(x21 + x22 )/2)
2π
and
|det(J)| =
1
exp(−(x21 + x22 )/2).
2π
Therefore
f (x1 , x2 ) =
1
exp(−(x21 + x22 ))/2, (x1 , x2 ) ∈ <2 .
2π
which is the p.d.f. for two independent standard normal distribution.
Example 10. Let X and Y be two independent random variables such that X ∼ exp(λ)
and Y ∼ exp(µ). Find the distribution for U = X + Y .
40
Solution. Define V = X and U = X + Y to get X = V and Y = U − V . We can
easily check that
dxdy = |J|du dv = du dv, |J| = 1.
fince for x > 0, y > 0, the joint p.d.f. for (X, Y ) is
f (x, y) = λµ exp(−(λx + µy)).
Therefore the joint p.d.f. for (U, V ) is
g(u, v) = f (u − v, v) = λµ exp(−(λv + µ(u − v))), 0 < v < u < ∞.
Therefore if λ 6= µ then
Z
u
Z
λµ exp(−(λv + µ(u − v)))dv = λµ exp(−µu)
g1 (u) =
0
u
exp(−(λ − µ)v)dv
0
=
λµ
exp(−µu) [1 − exp(−(λ − µ)u)] , u > 0.
λ−µ
If λ = µ then
g1 (u) = λ2 u exp(−λu), u > 0.
Example 11. Let X ∼ P oisson(λ) and Y ∼ P oisson(µ) be two independent random
variables. Find the joint distribution and marginals for
U = X + Y, V = X.
Solution. We have
f (x, y) = exp(−(λ + µ))
λx µy
, x, y = 0, 1, 2, . . . .
x!y!
We get
X = V, Y = U − V
41
and
λv µu−v
, u = v, v + 1, v + 2, . . . , v = 0, 1, 2, . . .
g(u, v) = g(v, u − v) = exp(−(λ + µ))
v!(u − v)!
Therefore
u exp(−(λ + µ)) X u v u−v
λv µu−v
=
λ µ
g1 (u) =
exp(−(λ + µ))
v!(u − v)!
u!
v
v=0
v=0
u
X
=
exp(−(λ + µ))
(λ + µ)u , u = 0, 1, 2, . . . .
u!
42
Chapter 5
Convergence of random variables.
P
Let {Xn } be a sequence of random variables. We say (i) Xn → X if
lim P (|Xn − X| > ) = 0.
n→∞
a.s.
(ii) Xn → X if
P (|Xn − X| > i.o.) = 0
Lp
(iii) Xn → X
lim E(|Xn − X|p ) = 0.
n→∞
Another convergence which is very useful in statistics is convergence in distribution. We
d
say Xn → X if
lim P (Xn ≤ x) = F (x) = P (X ≤ x)
n→∞
for all x in CF , the set of continuity points of F .
Example 1. Let {Xi } be a sequence of random variables such that E(X1 ) = µ and
V ar(X1 ) = σ 2 < ∞. Let
n
1X
Xi
X̄ =
n i=1
be the sample mean. Prove X̄ converges to µ in probability and in LP for p ∈ [1, 2].
Proof. We have E(X̄) = µ and
E(X̄ − µ)2 = V ar(X̄) = σ 2 /n → 0.
L2
Therefore X̄ → µ. From Lyapunov’s inequality E(X̄ − µ)p → 0 for p ∈ [1, 2]. Also
lim P (|X̄ − µ| > ) ≤
n→∞
E(X̄ − µ)2
σ2
=
→ 0.
2
n2
43
Theorem 5.1. Let {Xn } be a sequence of random variables such that E(Xn ) = µ and
V ar(Xn ) → 0. Then
P
Xn → µ.
Proof. Use Markov inequality similar to example 1 to prove this theorem.
Example 2. Let Xn be a sequence of random variables with c.d.f. Fn (x) = I(x ≥
2 + 1/n). We have limn→∞ Fn (x) = I(x > 2). The limit is not a c.d.f. but we can say
limn→∞ Fn (x) = F (x) = I(x ≥ 2) for x ∈ CF (2 is excluded).
Theorem 5.2. Let {Xi } be a sequence of random variables and c ∈ < be a constant. We
P
d
have Xn → c if and only if Xn → c.
P
Proof. Let Xn → c. We have P (|Xn − c| ≥ ) = Fn ((c + )−) − Fn (c − ) → 1 for all
> 0. Therefore
lim Fn ((c + )−) = 1, lim Fn ((c − ) = 0.
n→∞
n→∞
Define F (x) = I(x ≥ c) and P (Xn ≤ x) = Fn (x). Notice that limn→∞ Fn (x) = F (x) if
d
d
x ∈ CF (i.e. x 6= c). Therefore Xn → c. Now let Xn → c. Therefore
lim Fn (x) = I(x ≥ c), x 6= c.
n→∞
For all > 0,
lim P (|Xn − c| < ) = Fn ((c + )−) − Fn (c − ) → 1 − 0 = 1.
n→∞
P
Therefore Xn → c.
Example 3. Let {Xn } be a sequence of i.i.d. random variables from U [0, θ]. Defined
Yn = M ax(X1 , . . . , Xn ).
(i) Find c.d.f. and p.d.f. for Yn .
P
(ii) Show Yn → θ
44
(iii) Find the limiting distribution for n(θ − Yn ).
Solution. For y ∈ (0, θ),
Fn (y) = P (M ax(X1 , . . . , Xn ) ≤ y) = P (X1 ≤ y, . . . , Xn ≤ y) = (P (X1 ≤ y))n = (y/θ)n .
This gives
0
fn (y) = Fn (y) = ny n−1 /θn I(y ∈ (0, θ)).
(ii) Now calculate
θ
Z
nθn+1
→θ
ny /θ dy =
(n + 1)θn
n
E(Yn ) =
0
n
and since
E(Yn2 )
Z
=
θ
ny n+1 /θn dy =
0
nθn+2
→ θ2 .
n
(n + 2)θ
P
Therefore V ar(Yn ) → 0. Combine these to get Yn → θ from Theorem 5.1.
(iii) We have
Gn (w) = P (n(θ − Yn ) ≤ w) = 1 − P (Yn < θ − w/n) = 1 −
θ − y/n
θ
n
y n
= 1− 1−
nθ
→ G(w) = 1 − exp(−y/θ).
as n → ∞. We have
0
G (θ) =
1
exp(−y/θ), y ≥ o
θ
which is an exponential distribution.
Example 4. Let {Xn } be a sequence of i.i.d. random variables from a continuous c.d.f.
F . Defined Yn = M ax(X1 , . . . , Xn ). Find the limiting distribution for Zn = n(1 − F (Yn )).
Solution. Since F (Xi ) ∼ U [0, 1] for i = 1, 2, . . . , n result follows from the previous
example when θ = 1, i.e.
d
n(1 − F (Yn )) → V
45
where V ∼ Exp(1).
Example 5. Let {Xn } be a sequence of i.i.d. random variables with p.d.f.
f (x) = exp(−(x − θ))I(x ≥ θ).
Find the limiting distribution for Yn = n(M in(X1 , . . . , Xn ) − θ).
Solution. We have
P (Yn ≤ y) = P (n(M in(X1 , . . . , Xn ) − θ) ≤ x) = 1 − P (n(M in(X1 , . . . , Xn ) − θ) > x)
n
Z ∞
exp(−(x − θ))dx = 1 − exp(−y), y ≥ 0.
=1−
θ+y/n
which is free from n and is c.d.f. of an exp(1) distribution.
Example 6. Let {Xn } be a sequence of i.i.d. random variables with Bernoulli distribution
P (X1 = x) = px (1 − p)1−x , x = 0, 1
and P (X = x) = 0, x ∈
/ {0, 1} where p ∈ [0, 1]. Prove
Pn
Xi P
p̂ = i=1
→ p.
n
Proof. We have E(X̄) = p and V ar(X̄) = V ar(X1 )/n = p(1 − p)/n → 0. Now result
follows from Theorem 5.1.
Example 7. Let {Xn } be a sequence of i.i.d. random variables with d.d.f. F . Show that
Fn (x) =
1
P
I(Xi ≤ x) → F (x).
n
Since E(Fn (x)) = F (x) and V ar(Fn )(x) = F (x)(1−F (x))/n → 0 as n → ∞ use Theorem
5.1 to conclude the result. More can be said about the empirical process which will be
discussed later.
46
a.s.
P
Theorem 5.3. If Xn → X then Xn → X.
a.s.
Proof. Since Xn → X then for all > 0,
lim P (∪∞
m=n |Xm − X| ≥ ) = 0
n→∞
On the other hand
P (∪∞
m=n |Xm − X| ≥ ) ≥ P (|Xm − X| ≥ ).
Therefore
0 ≤ lim sup P (|Xn − X| ≥ ) ≤ lim P (∪∞
m=n |Xm − X| ≥ ) = 0
n→∞
which shows that
lim P (|Xn − X| ≥ ) = 0.
n→∞
c
Remark. (complete convergence). We say Xn → X if
P∞
n=1
P (|Xn − X| ≥ ) < ∞. If
a.s.
c
Xn → X then Xn → X. This is true since
P (∪∞
m=n |Xm
− X| ≥ ) ≤
∞
X
P (|Xm − X| ≥ )
m=n
and
∞
X
P (|Xm − X| ≥ ) → 0
m=n
as
P∞
n=1
a.s.
P (|Xn − X| ≥ ) < ∞. Another important fact is that if Xn → X and g is a
a.s.
bounded continuous function on < then g(Xn ) → g(X).
P
d
Theorem 5.4. If Xn → X then Xn → X.
0
Proof. Let X ∼ F and x ∈ CF and x be two real numbers. We have
0
0
0
0
P (X ≤ x ) = P (Xn ≤ x, X > x )+P (X ≤ x , Xn > x) ≤ P (Xn ≤ x)+P (X ≤ x , Xn > x).
(5.1)
47
0
If x < x then
0
0
P (X ≤ x ) ≤ P (Xn ≤ x) + P (|Xn − X| > x − x ).
P
Since Xn → X take limits as n → ∞ to get
0
F (x ) ≤ lim inf Fn (x).
n→∞
00
Now consider x > x. Similarly prove
00
00
Fn (x) ≤ F (x ) + P (|Xn − X| ≥ x − x).
00
00
Now converge n∞ to get lim sup Fn (x) ≤ F (x ). Now let x ↓ x to get lim sup Fn (x) ≤
F (x). We have
F (x−) = F (x) ≤ lim inf Fn (x) ≤ lim sup Fn (x) ≤ F (x).
Since x ∈ CF ,
lim Fn (x) = F (x).
P
P
Theorem 5.5.. Let Xn → X and Yn → Y . Then
P
(i) Xn + Yn → X + Y ,
P
(ii) Xn Y → XY ,
P
(iii) Xn Yn → XY
Proof. (i) for any > 0, we have
P (|Xn + Yn − X − Y | ≥ ) ≤ P (|Xn − X| ≥ /2) + P (|Yn − Y | ≥ /2)
and result follows as n → ∞.
For (ii), let k > 0 be a constant such that P (|Y | > K) < δ for δ > 0. Therefore
P (|Xn Y − XY | > ) = P (|Xn − X||Y | > , |Y | > k) + P (|Xn − X||Y | > , |Y | ≤ k)
≤ δ + P (|Xn − X| > /k).
48
P
now converge n → ∞ to get the result. Now since Yn − Y → 0.
For (iii), it is enough to show the result holds when X = 0 and Y = 0 for constants
P
P
a and b as Un → U is equivalent to Un − U → 0 and result proved part (ii) holds. For
large enough n and δ > 0 there exists a positive constant k such that P (|Xn | ≥ k) < δ.
Therefore
P (|Xn Yn | > ) = P (|Xn ||Yn | > , |Xn | ≥ k) + P (|Xn ||Yn | > , |Xn | < k)
≤ P (|Xn | ≥ k) + P (|Yn | > /k).
This gives the desired result as n → ∞.
Remarks.
P
P
(1) If Xn → X and Xn → Y then P (X = Y ) = 1. This is true because
P (|X − Y | > ) ≤ P (|Xn − X| > /2) + P (|Xn − Y | > /2)
and rsult follows as n → ∞ and ↓ 0.
P
P
P
(2) If Xn → X then aXn → aX and if a 6= 0 then Xn /a → X/a.
We use extensively the continuity theorem which is presented here without proof.
Theorem 5.6 (Continuity Theorem). Let {Xn } be a sequence of random variables
d
with Xn having distribution Fn (·) and characteristic function Φn (·). Then Xn → X if and
only if φn (t) → φ(t) for all t ∈ < where φ(t) is continuous in 0.
Lemma 5.1. Let a, b are two constants and let φ(n) → 0. Then (1 + a/n + φ(n)/n)bn →
eab .
Proof. We have
bn ln(1 + a/n + φ(n)/n) ∼ bn(a/n + φ(n)/n) + o(n) → ab.
Now result follows easily.
49
Example 8. Let Xn ∼ χ2 (n). Prove
Xn − n d
√
→ Z ∼ N (0, 1).
2n
Proof. We have
E(exp(itXn )) = (1 − 2it)−n/2 .
Therefore
Mn (t) = E
t(Xn − n)
√
2n
√
√
= exp(−nt/ 2n)(1 − 2t/ 2n)−n/2 .
Use the fact that
ln(1 − x) = x + x2 /2 + x3 /3 + . . . , −1 ≤ x < 1
to show that
−tn
nt
4t2 n
√
√
+ o(n) → t2 /2.
ln(Mn (t)) =
−
+
8n
2n
2n
Therefore Mn (t) → exp(t2 /2) which is the m.g.f. of Z ∼ N (0, 1).
The above example is special case of central limit theorem which we intend to prove in
this section
Theorem 5.7. If E(|X|m ) < ∞ for a given integer m then
E(exp(iθX)) = Φ(θ) =
m
X
(iθ)j
j=0
j!
E(X j ) + o(θm ).
Without Proof (see page 127 of the textbook).
Theorem 5.8. Let {Xi } be a sequence of i.i.d. random variables such that E(X1 ) = µ.
We have
P
X̄ → µ.
Proof. We have
E(exp(iθX̄) = (φ(θ/n))n = (1 + iµθ/n + o(θ/n))n → exp(θµ).
50
d
P
Therefore X̄ → µ which is equivalent to X̄ → µ (Theorem 5.2).
Theorem 5.9. Let {Xi } be a sequence of i.i.d. random variables such that E(X1 ) = 0
(without loss of generality) and σ 2 = V ar(X1 ) < ∞. We have
X̄
d
√ → N (0, 1)
σ/ n
Proof. We have
E(exp(iθX)) = Φ(θ) = 1 −
σ 2 θ2
+ o(θ2 ).
2
Therefore
X̄
σ 2 θ2
√
E exp iθ
+ o((θ/n)2 ))n → exp(−θ2 σ 2 /2).
= (1 −
2n
σ/ n
Remark. The above result can be generalized to multivariate easily by proving the result
for all the linear combinations (real valued) entries of the mean vector.
P
Theorem 5.10. Let {Xn , Yn } be two sequences of random variables such that |Xn −Yn | →
d
d
0 and Yn → Y . Then Xn → Y.
Proof. Let x be a continuity point of F (y) = P (Y ≤ y) and > 0. Then
P (Xn ≤ x) = P (Yn ≤ x + Yn − Xn ) = P (Yn ≤ x + Yn − Xn , Yn − Xn ≤ )
+P (Yn ≤ x + Yn − Xn , Yn − Xn > ) ≤ P (Yn ≤ x + ) + P (|Yn − Xn | ≥ ).
Therefore
lim sup P (Xn ≤ x) ≤ lim inf P (Yn ≤ x + ).
n→∞
n→∞
Similarly
P (Yn ≤ x − ) = P (Xn ≤ x + Xn − Yn − )
= P (Xn ≤ x + Xn − Yn − , Xn − Yn ≤ ) + P (Xn ≤ x + Xn − Yn , Xn − Yn > )
51
≤ P (Xn ≤ x + Xn − Yn − , Xn − Yn ≤ ) + P (|Xn − Yn | > )
≤ P (Xn ≤ x) + P (|Xn − Yn | > ).
Let n → ∞ tp get
lim inf P (Xn ≤ x) ≥ lim sup P (Yn ≤ x − ).
n→∞
n→∞
Since > 0 is arbitrary and x ∈ CF result follows by letting n → ∞.
d
P
d
Remark. If Xn → X and Yn → c Xn + Yn → X + c. To see why notice that
d
Xn + c → X + c and
d
(Yn + Xn ) − (Xn + c) = Yn − c → 0.
d
This implies the result (use Theorem 5.10). Also we have Xn Yn → cX. To see this first
let c = 0. In this case
P (|Xn Yn | > ) = P (|Xn Yn | > , |Yn | ≤ /k) + P (|Xn Yn | > , |Yn | > /k)
≤ P (|Xn | > k) + P (|Yn | > /k).
As n → ∞ we have
lim sup P (|Xn Yn | > ) ≤ P (|X| > k)
n→∞
and choosing k large implies the result. If c 6= 0 then
P
Xn Yn − cXn = Xn (Yn − c) → 0.
Use theorem 5.10 to get the required result.
Example 9. Let {Xi } be a sequence of i.i.d. random variables such that E(X12 ) < ∞.
We have
√
n(X̄ − µ) d
→ N (0, 1)
S
where S 2 is the sample variance.
52
Proof. We have
n
1 X
P
S =
(Xi − X̄)2 → σ 2
n − 1 i=1
2
and
√
n(X̄ − µ) d
→ N (0, 1).
σ
Theorem 5.10 implies the result.
Theorem 5.11 (Skorohod Representation). Let Xn and X are defined on the probability
d
space (Ω, F, P ). Also let Xn → X. On ([0, 1], B([0, 1]), P ∗ ) (P ∗ is uniform probability
measure on [0, 1]) and random variables Xn∗ and X ∗ defined on this new probability space
d
a.s.
d
such that Xn∗ = Xn for any fixed integer n and X ∗ = X and Xn∗ → X ∗ . (Note: The
random variables Xn∗ matches the distribution of Xn but ignores the dependence structure
of Xn .)
Without proof.
Some applications.
d
Theorem 5.12. (Continuous mapping theorem). Let Xn → X and g(·) be a real valued
function which is continuous. We have
d
g(Xn ) → g(X).
d
Proof. Since Xn → X then there exists a sequence of random variables Xn∗ and a random
d
d
variable X ∗ defined on another probability space such that Xn∗ = Xn and X ∗ = X such
a.s.
a.s.
d
that Xn∗ → X ∗ . Therefore g(Xn∗ ) → g(X ∗ ) which implies g(Xn∗ ) → g(X ∗ ). Combining
continuity Theorem and uniqueness Theorem shows that
E(exp(iθg(Xn∗ )) = E(exp(iθg(Xn )) → E(exp(iθg(X ∗ )) = E(exp(iθg(X)).
d
From continuity Theorem, g(Xn ) → g(X).
53
Example 10. Let {Xn } be a sequence of random variables such that E(X1 ) = µ and
V ar(X1 ) = σ 2 . We have
√
n(X̄ − µ) d
→Z
σ
where Z ∼ N (0, 1). We have
n(X̄ − µ)2 d 2
→ χ (1).
σ2
d
(Z 2 = χ2 (1)).
Theorem 5.13. (Delta method). Let {Xn } be a sequence of random variables such that
d
d
0
an (Xn − θ) → X. If g is a differentiable function then an (g(Xn ) − g(θ)) → g (θ)X.
Proof. There exists a sequence of random variables Xn∗ and a random variable X ∗ defined
d
d
a.s.
on another probability space such that Xn∗ = Xn and X ∗ = X such that an (Xn∗ −θ) → X ∗ .
Since
an (g(Xn∗ ) − g(θ)) = an (Xn∗ − θ)
g(Xn∗ ) − g(θ)
Xn∗ − θ
and
g(Xn∗ ) − g(θ) a.s. 0
→ g (θ)
Xn∗ − θ
we have
a.s.
0
an (g(Xn∗ ) − g(θ)) → g (θ)X ∗ .
Therefore
d
0
an (g(Xn ) − g(θ)) → g (θ)X.
Example 11. Let {Xn } be a sequence of random variables such that E(X1 ) = µ and
V ar(X1 ) = σ 2 . We have
√
n(X̄ − µ) d
→Z
σ
We can use the delta method to get
√
d
n(X̄ 2 − µ2 ) → 2µσZ.
54
Note that 2µσZ ∼ N (0, 4µ2 σ 2 ).
Example 12. Let {Xn } be a sequence of i.i.d. random variables such that E(X1 ) = 0
0
and V ar(X1 ) = σ 2 . Let g(x) = cos x and note that g (0) = 0. Therefore in the proof of
delta method we need to expand beyond the mean value theorem. We have
√
d
nX̄ → Z ∼ N (0, 1).
Now use Skorohod representation theorem and the fact that cos x−1 ≈ −x2 /2 to conclude
that
d
2n cos(1 − X̄) → σ 2 χ2 (1).
Theorem 5.12. (Multivariate delta method). Let {Un } be a sequence of random vectors
d
such that an (Un − m) → Np (0, Σ). If f : <p → < and the first and secon derivatives of
f exists in a neighborhood of θ then
√
d
0
n(f (Un ) − f (m)) → Np (0, a (m)Σa(m)
where
∂f (m)
∂f (m)
a (m) =
...
.
∂x1
∂xp
0
Proof. A similar argument to the proof of Theorem 5.12 can be given here which is
omitted here.
Example 13. Let {Xn } be an i.i.d. sequence of random variables with E(X1 ) =
µ, E(Xi2 ) = µ2 + σ 2 , E(X13 ) = µ3 and E(X14 ) = µ4 < ∞. Find sequences of constants an
and bn such that an (Sn2 − bn ) converges in distribution to a nontrivial random variable
P
where Sn2 = n1 ni=1 (Xi − X̄)2 .
0
0
Solution. Let Yi = (Xi , Xi2 ), i = 1, 2, . . . , n. We have m = E(Yi ) = (µ, µ2 + σ 2 ). The
variance and covariance matrix to get
σ2
µ3 − µ(µ2 + σ 2 )
Σ = µ3 − µ(µ2 + σ 2 ) µ4 − (µ2 + σ 2 )2
55
We have
√
d
n(Ȳ − m) → N2 (0, Σ).
Define g(x, y) = y − x2 and notice that
√
d
n(g(X̄, X̄ 2 ) − g(µ, µ2 + σ 2 )) → N (0, θ2 ).
0
To find θ2 , calculate θ2 = a (m)Σa(m) where
∂g(m) ∂g(m)
0
a (m) =
,
.
∂x
∂y
i
h
0
∂g ∂g
, ∂y = [−2x, 1] which shows that a(m) = [−2µ, 1]. For the case that µ = 0
We have ∂x
0
we get θ2 = a (m)Σa(m) = µ4 − σ 4 .
56
Chapter 6
Martingales.
Definition 6.1. A process {Xn : n = 0, 1, . . .} is a martingale if for n = 0, 1, 2, . . .
(i) E(|Xn |) < ∞
(ii) E(Xn+1 |X0 , . . . , Xn ) = Xn .
If Xn be the player’s fortune at stage n of a game the the martingale property says
that a game being fair.
Definition 6.2. Let {Xn : n = 0, 1, . . .} and {Yn : n = 0, 1, . . .} be two stochastic
processes. Then Xn is said to be a martingale with respect to Yn if
(i) E(|Xn |) < ∞
(ii) E(Xn+1 |Y0 , . . . , Yn ) = Xn .
Theorem 6.1. Let X and Y be two random variables. We have
E(E(Y |X)) = E(Y )
and
V ar(Y ) = E(V ar(Y |X)) + V ar(E(Y |X)).
Proof. We have
Z
∞
E(Y |X = x) =
ydF (y|x)
−∞
and
Z
∞
Z
∞
E(E(Y |X)) =
Z
∞
Z
∞
ydF (y|x)dF (x) =
−∞
−∞
ydF (x, y) = E(Y ).
−∞
−∞
From
E(V ar(Y |X)) = E(E(Y 2 |X) − (E(Y |X))2 ) = E(Y 2 ) − E(E(Y |X))2 )
57
and
V ar(E(Y |X)) = E((E(Y |X))2 ) − (E(E(Y |X))2 = E((E(Y |X))2 ) − (E(Y ))2
result follows easily.
Theorem 6.2. If Xn is a martingale with respect to Yn then E(Xn ) = E(Yn ) for all
n = 0, 1, 2, . . . .
Proof. Since E(Xn+1 |Y0 , . . . , Yn ) = Xn we get
E(E(Xn+1 |Y0 , . . . , Yn )) = E(Xn ).
From Theorem 6.1
E(Xn ) = E(E(Xn+1 |Y0 , . . . , Yn )) = E(Xn+1 )
result follows easily.
Example 1. Let Y0 = 0 and {Yi } be a sequence of i.i.d. random variables such that
E(Yn ) = 0. Define
Xn = Y1 + · · · + Yn
is a martingale with respect to Yn . To see this notice that
E(Xn+1 |Y0 , . . . , Yn ) = E(Xn + Yn+1 |Y0 , . . . , Yn ) = Xn + E(Yn+1 |Y0 , . . . , Yn ) = Xn .
Example 2. Let {Yi } be a sequence of i.i.d. random variables with E(Y1 ) = 0 and
V ar(Y1 ) = σ 2 . Let X0 = 0 and
Xn = (Y1 + · · · + Yn )2 − nσ 2 .
Then {Xn } is a martingale with respect to {Yn }.
58
Proof. Obviously E(|Xn |) < ∞. Moreover
E(Xn+1 |Y0 , . . . , Yn ) = E[(Yn+1 +
n
X
Yk )2 − (n + 1)σ 2 |Y0 , . . . , Yn )]
k=1
2
+(
= E[Yn+1
n
X
Yk )2 + 2Yn+1
k=1
n
X
Yk − (n + 1)σ 2 |Y0 , . . . , Yn )] = σ 2 + Xn − σ 2 = Xn .
k=1
Example 3. A ball is drawn at rndom from the urn of balls with a combination of one
red and one green color. The ball and one more of the same color are then returned to
the urn. Repeat this experiment indefinitely. Let
Xn =
Number of red balls
Number of balls
and
Yn = Number of red balls = (n + 2)Xn .
Given Yn = k, we have
P (Yn+1 = k + 1|Yn = k) =
k
k
, P (Yn+1 = k|Yn = k) = 1 −
.
n+2
n+2
We have
E(Yn+1 |Yn = k) =
(k + 1)k + k(n + 2 − k)
k(n + 3)
=
.
n+2
n+2
Therefore
E(Yn+1 |Yn ) = bn Yn , bn =
k(n + 3)
.
n+2
Therefore
E(Yn+1 /(n + 3)|Yn ) = Yn /(n + 2).
Therefore Xn is a martingale.
Example 4. (Doob’s martingale).Let Y0 = 0 and {Yi } be a sequence of i.i.d. random
variables and X be a random variable satisfying E(|X|) < ∞. Then
Xn = E(X|Y0 , . . . , Yn )
59
is a martingale with respect to {Yi }.
Proof. We have
E(|Xn |) = E [|E(X|Y0 , . . . , Yn )] ≤ E [E(|X||Y0 , . . . , Yn )] = E(|X|) < ∞.
Also
E(Xn+1 |Y0 , . . . , Yn ) = E(E(X|Y0 , . . . , Yn+1 )|Y0 , . . . , Yn ) = E(X|Y0 , . . . , Yn ) = Xn .
Note: E[E(X|Y, Z)|Y ] = E(X|Y ).
Example 5. (Likelihood ratio). Let Y0 = 0 and {Yi } be a sequence of i.i.d. random
variables and f0 and f1 be two p.d.f.. Define
Xn =
f1 (Y0 ) · · · f1 (Yn )
, n = 0, 1, 2, . . . .
f0 (Y0 ) · · · f0 (Yn )
i.i.d.
If Yi f
0
then we have
Z ∞
f1 (Yn+1 )
f1 (y)
E(Xn+1 |Y0 , . . . , Yn ) = E Xn
|Y0 , . . . , Yn = Xn
f0 (y)dy = Xn .
f0 (Yn+1 )
−∞ f0 (Y )
Therefore Xn is a martingale with respect to {Yi }.
Example 6. (Wald martingale). Let Xi be a sequence of i.i.d. random variables with
M (t) = E(exp(tX)) < ∞. Define:
Yn =
exp(λ(X1 + · · · + Xn ))
.
(M (λ))n
We have
E(Yn+1 |X1 , . . . , Xn ) = E
exp(λ(X1 + · · · + Xn+1 ))
|X1 , . . . , Xn
(M (λ))n+1
60
= Xn
© Copyright 2026 Paperzz