CHAPTER 1
Probability Theory
1.1. Probability space
In this section we introduce the triple (Ω, F, P), called a probability space. We begin by fixing some notation.
Definition 1.1.
(1) Possible outcomes ωα, α ∈ A, are called sample points.¹
(2) The set Ω = {ωα : α ∈ A} = the collection of all possible outcomes, i.e., the set of all sample points, is called a sample space.
As the following examples show, the sample space Ω can be an arbitrary set; its elements need not be numbers.
Example 1.2.
(1) Ω = N = the set of all natural numbers = {ωn : ωn =
n for all n ∈ N}.
(2) Ω = R = the set of all real numbers.
(3) Ω = the collection of all odd positive numbers.
(4) Ω = the collection of all fruits, e.g., apple ∈ Ω, pineapple ∈ Ω.
(5) Ω = the collection of all colors.
Definition 1.3. A system F of subsets of Ω is called a σ-algebra if
(i) Ω ∈ F;
¹Here A is an index set. If A = N = {1, 2, 3, ...}, we write ωα as ωn. If A = R, the possible outcomes form an uncountable set.
(ii) Ac ∈ F whenever A ∈ F ;
(iii) An ∈ F for all n = 1, 2, 3, ... implies that
⋃_{n=1}^{∞} An ∈ F.
Why do we work with a σ-algebra instead of simply taking all subsets of Ω? The reason will become clear later; for now, let us look at some examples.
Example 1.4.
(1) Let Ω = {1, 2, 3}. Then
(i) F1 = {∅, {1}, {2, 3}, Ω} is a σ-algebra.
(ii) F2 = {∅, {1}, {2}, {3}, Ω} is not a σ-algebra, since {1} ∈ F2 but {1}c = {2, 3} ∉ F2.
(2) Let Ω = R and F = the collection of all subsets of R, then F is a σ-algebra.
(3) Let Ω = N. Then
(i) F1 = {∅, {1, 3, 5, 7, ...}, {2, 4, 6, 8, ...}, N} is a σ-algebra.
(ii) F2 = {∅, {3, 6, 9, ...}, {1, 4, 7, ...}, {2, 5, 8, ...}, {1, 3, 4, 6, 7, 9, ...}, {1, 2, 4, 5, 7, 8, ...},
{2, 3, 5, 6, 8, 9, ...}, N} is a σ-algebra.
(iii) F3 = {∅, {1, 2}, {3, 4}, {5, 6}, ..., {1, 2, 3, 4}, {1, 2, 5, 6}, ..., {1, 2, 3, 4, 5, 6}, ..., Ω}
is a σ-algebra.
(4) Let Ω = N. Then
(i) F1 = {A ⊆ N : A is finite or Ac is finite} is not a σ-algebra. For example, the sets An = {2n} ∈ F1 for all n, but the set
⋃_{n=1}^{∞} An = {2, 4, 6, ...} ∉ F1,
since neither the set of even numbers nor its complement is finite.
(ii) F2 = {A ⊆ N : A is countable or Ac is countable } is a σ-algebra.
In general, a sample space carries many different σ-algebras. Different σ-algebras can encode different amounts of information about the outcomes, as F1, F2 and F3 in Example 1.4 (3) illustrate.
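For a finite sample space, the three conditions of Definition 1.3 can be checked mechanically. Below is a small sketch (the helper name `is_sigma_algebra` is ours, not from the text) that tests F1 and F2 of Example 1.4 (1):

```python
from itertools import combinations

def is_sigma_algebra(omega, family):
    """Check Definition 1.3 for a family of subsets of a finite omega."""
    fam = {frozenset(s) for s in family}
    if frozenset(omega) not in fam:                         # (i): Omega in F
        return False
    if any(frozenset(omega) - a not in fam for a in fam):   # (ii): complements
        return False
    for r in range(2, len(fam) + 1):                        # (iii): unions
        for subsets in combinations(fam, r):
            if frozenset().union(*subsets) not in fam:
                return False
    return True

omega = {1, 2, 3}
F1 = [set(), {1}, {2, 3}, {1, 2, 3}]
F2 = [set(), {1}, {2}, {3}, {1, 2, 3}]
print(is_sigma_algebra(omega, F1))   # True
print(is_sigma_algebra(omega, F2))   # False: {1}^c = {2, 3} is missing
```

On a finite Ω it suffices to check finite unions, since every countable union of sets from a finite family equals a finite one.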
Definition 1.5. Let Ω be a non-empty set and let F be a σ-algebra on Ω. Then (Ω, F) is called a measurable space.
Definition 1.6. Let (Ω, F) be a measurable space. A probability measure is a real-valued function P : F −→ R satisfying
(i) P(E) ≥ 0 for all E ∈ F;
(ii) (Countable additivity) Let (En) be a countable collection of disjoint sets in F. Then
P(⋃_{n=1}^{∞} En) = ∑_{n=1}^{∞} P(En).   (1.1)
(iii) P(Ω) = 1.²
Note that countable additivity relies on the defining property of a σ-algebra that An ∈ F for all n = 1, 2, 3, ... implies ⋃_{n=1}^{∞} An ∈ F; without it, the left-hand side of (1.1) need not be defined.
Proposition 1.7.
(1) P(E) ≤ 1 for all E ∈ F.
(2) P(∅) = 0.
(3) P(E c ) = 1 − P(E).
(4) P(E ∪ F ) = P(E) + P(F ) − P(E ∩ F ).
(5) If E ⊆ F , then P(E) ≤ P(F ).
(6) If (En) is a sequence of sets in F, then
P(⋃_{n=1}^{∞} En) ≤ ∑_{n=1}^{∞} P(En).
(7) (i) If (En) satisfies
E1 ⊆ E2 ⊆ · · · ⊆ En ⊆ · · · ,
then P(En) converges to P(⋃_{n=1}^{∞} En), i.e.,
lim_{n→∞} P(En) = P(⋃_{n=1}^{∞} En).
(ii) If (En) satisfies
E1 ⊇ E2 ⊇ · · · ⊇ En ⊇ · · · ,
then P(En) converges to P(⋂_{n=1}^{∞} En), i.e.,
lim_{n→∞} P(En) = P(⋂_{n=1}^{∞} En).

²Without condition (iii), P is simply called a measure.
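Several of these identities can be spot-checked numerically on a small discrete space; the sketch below (the weights are an arbitrary choice for illustration) verifies (3), (4) and (6):

```python
# Spot-check Proposition 1.7 (3), (4), (6) on a small finite probability space
# (the weights are an arbitrary choice for illustration).
weights = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}   # point masses, summing to 1

def P(event):
    return sum(weights[w] for w in event)

omega = set(weights)
E, F = {1, 2}, {2, 3}

assert abs(P(omega - E) - (1 - P(E))) < 1e-12            # (3): P(E^c) = 1 - P(E)
assert abs(P(E | F) - (P(E) + P(F) - P(E & F))) < 1e-12  # (4): inclusion-exclusion
assert P(E | F) <= P(E) + P(F) + 1e-12                   # (6): subadditivity
print("Proposition 1.7 spot-checks passed")
```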
Definition 1.8. The triple (Ω, F, P) is called a probability space.
Example 1.9.
(1) Let
Ω = {H, T}   (the two outcomes of a coin toss),
F = {∅, {H}, {T}, {H, T}},
and let P be given by
P({H}) = P({T}) = 1/2,   P(∅) = 0,   P({H, T}) = 1.
Then (Ω, F, P) is a probability space.
(2) Let
Ω = {⚀, ⚁, ⚂, ⚃, ⚄, ⚅}   (the six faces of a die),
F = the collection of all subsets of Ω,
and let P satisfy P({⚀}) = P({⚁}) = · · · = P({⚅}) = 1/6, extended so that P is a probability measure. Then (Ω, F, P) is a probability space.
(3) Let Ω = {ω1, ω2, ..., ωn, ...} be a countable set and let F be the collection of all subsets of Ω. Let (pn) be a sequence of real numbers with
pn ≥ 0 for all n   and   ∑_{n=1}^{∞} pn = 1.
Define a set function P : F −→ R by P({ωn}) = pn and
P(E) = ∑_{ωn ∈ E} P({ωn}) = ∑_{ωn ∈ E} pn.
Then P defines a probability measure.
We call (Ω, F, P) a discrete probability space and Ω a discrete sample space.
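A concrete instance of this construction is easy to compute with. The sketch below uses the assumed choice pn = 2^{−n} (so that ∑ pn = 1), truncated at a level where the tail is negligible:

```python
# A concrete discrete probability space as in Example 1.9 (3),
# with the assumed choice p_n = 2^{-n} (so that sum p_n = 1).
N = 60                                   # truncation level; the tail is ~2^{-60}
p = {n: 2.0 ** -n for n in range(1, N + 1)}

def P(event):
    """P(E) = sum of p_n over the sample points omega_n in E."""
    return sum(p[n] for n in event if n in p)

total = P(range(1, N + 1))               # P(Omega), up to truncation
evens = P(range(2, N + 1, 2))            # P(even outcomes) = sum 4^{-k} = 1/3
print(round(total, 12), round(evens, 12))
```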
We have now seen what a probability space is. Before constructing more examples, we introduce one more piece of notation.
Question. Given a sample space Ω and a collection C of subsets of Ω, does there exist a collection G of subsets of Ω such that
(i) C ⊆ G;
(ii) G is a σ-algebra?
Answer. Yes. We may take G to be the collection of all subsets of Ω.
A more refined question is whether there exists a smallest σ-algebra containing C. The answer is again yes: take
⋂ {H : C ⊆ H and H is a σ-algebra}.
This works because an arbitrary intersection of σ-algebras is again a σ-algebra.
Notation 1.10. If G is the smallest σ-algebra containing C, then we say that G is
generated by C and denote it by G = σ(C).
Example 1.11. Let
Ω = {1, 2, 3, 4}
and
C = {{1, 2}, {4}}.
Then
σ(C) = {∅, {1, 2}, {3}, {4}, {1, 2, 3}, {1, 2, 4}, {3, 4}, Ω}.
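For a finite Ω, σ(C) can be computed by closing C under complements and unions until nothing new appears (intersections then come for free by De Morgan). This sketch, with a helper name of our own choosing, reproduces Example 1.11:

```python
def sigma(omega, C):
    """Smallest sigma-algebra on a finite omega containing C (Notation 1.10)."""
    omega = frozenset(omega)
    fam = {frozenset(), omega} | {frozenset(s) for s in C}
    while True:
        # close under complements and pairwise unions until stable
        new = {omega - a for a in fam} | {a | b for a in fam for b in fam}
        if new <= fam:
            return fam
        fam |= new

G = sigma({1, 2, 3, 4}, [{1, 2}, {4}])
print(len(G))                  # 8, matching Example 1.11
print(frozenset({3}) in G)     # True: {3} is the complement of {1,2} ∪ {4}
```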
Example 1.12. Let Ω = R and let C be the collection of all open intervals (a, b) in R. Then the sets in B = σ(C) are called Borel sets. This σ-algebra is fundamental: we will use it in the next section to define random variables. For example, R, Q, (a, b), [a, b), (a, b], [a, b] are all in σ(C). In fact, virtually every subset of R that one meets in practice is a Borel set; constructing a set that is not Borel is delicate and is usually done in a course on real analysis.
Remark 1.13. Let Ω = [0, 1] and let B1 be the collection of all Borel sets in [0, 1], i.e.,
B1 = B ∩ [0, 1] := {A ∩ [0, 1] : A ∈ B}.
For an interval (a, b) ⊆ [0, 1], define
m((a, b)) = b − a.
This extends to a probability measure m : B1 −→ R, called the Lebesgue measure (intuitively, m(A) measures the total length of A).
The triple ([0, 1], B1, m) is one of the most important probability spaces in this course.
Exercise
(1) Find the σ-algebra generated by the given collection of sets C.
(a) Ω = {1, 2, 3, 4}, C = {{1, 2, 3}, {4}};
(b) Ω = {1, 2, 3, 4}, C = {{2, 3, 4}, {3, 4}};
(c) Ω = {1, 2, 3, 4, 5}, C = {{1, 2, 4}, {1, 4, 5}};
(d) Ω = R, C = {[−1, 0), (1, 2)}
(2) Let Ω = {1, 2, 3, 4, 5, 6} and let F = σ ({{1, 2, 3, 4}, {3, 4, 5}}). Find a probability
measure defined on (Ω, F).
(3) Consider a probability space (Ω, F, P), where Ω = {1, 2, 3, 4, 5, 6}, F is the collection of all subsets of Ω, and
P({1}) = P({2}) = P({5}) = 1/4,   P({3}) = P({4}) = P({6}) = 1/12.
(a) Let
X = 2I{1} + 3I{2,3} − 3I{4,5} + I{6} .
Find E[X] and E[X 2 ].
(b) Let
Y = I{1,2} + 3I{2,4,5} − 2I{4,5,6} .
Find E[Y ] and E[Y 3 ].
(4) Let Ω = R, let F be the collection of all subsets A ⊆ R such that A or Ac is countable, and let P(A) = 0 in the first case and P(A) = 1 in the second. Show that (Ω, F, P) is a probability space, i.e., show that F is a σ-algebra and P is a probability measure.
1.2. Random variables
Let (Ω, F, P) be a probability space.
Definition 1.14. A function X : Ω −→ R is called a random variable (r.v.) if for every B ∈ B,
{ω : X(ω) ∈ B} ∈ F,
i.e., X is measurable with respect to F.
Notation 1.15. For a random variable X and B ∈ B,
{X ∈ B} := {ω ∈ Ω : X(ω) ∈ B}.
Example 1.16. Suppose that Ω = [0, 1] and F = B1 .
(1) X1 (ω) = ω. For B ∈ B,
{X1 ∈ B} = {ω ∈ [0, 1] : X1 (ω) ∈ B}
= {ω ∈ [0, 1] : ω ∈ B} = B ∩ [0, 1] ∈ B1 .
Thus, X1 is a random variable.
(2) X2 (ω) = ω 2 . For B ∈ B,
{X2 ∈ B} = {ω ∈ Ω : X2 (ω) ∈ B} = {ω ∈ Ω : ω 2 ∈ B}.
It is not obvious whether this set lies in B1 for every B ∈ B, so we cannot conclude directly from the definition. We return to this function in Example 1.18 (1).
Theorem 1.17. The following statements are equivalent.
(1) X is a random variable on (Ω, F).
(2) {X ≤ r} ∈ F for all r ∈ R.
(3) {X < r} ∈ F for all r ∈ R.
(4) {X ≥ r} ∈ F for all r ∈ R.
(5) {X > r} ∈ F for all r ∈ R.
This theorem is convenient: to verify that X is a random variable, it suffices to check just one of these conditions. For X2 in Example 1.16, checking the definition directly would require examining every Borel set B, which is impractical; Theorem 1.17 reduces the work to sets of the form {X2 ≤ r}.
Example 1.18.
(1) Consider Ω = [0, 1], F = B1 and X(ω) = ω 2 .
(i) If r < 0, {X ≤ r} = {ω ∈ [0, 1] : ω 2 ≤ r} = ∅ ∈ F.
(ii) If 0 ≤ r ≤ 1, {X ≤ r} = {ω ∈ [0, 1] : ω² ≤ r} = [0, √r] ∈ F.
(iii) If r > 1, {X ≤ r} = [0, 1] ∈ F.
Thus, X is a random variable.
(2) The next example shows that not every function is a random variable.
Let Ω = {1, 2, 3, 4} and F = σ({{1, 2}, {3}, {4}}).
(a) X1(1) = 2, X1(2) = 3, X1(3) = 4, X1(4) = 5. Since
{X1 ≤ 2} = {1} ∉ F,
X1 is not a random variable.
(b) X2 (1) = X2 (2) = 2, X2 (3) = 10, X2 (4) = −500.
(i) If r < −500, {X2 ≤ r} = ∅ ∈ F.
(ii) If −500 ≤ r < 2, {X2 ≤ r} = {4} ∈ F.
(iii) If 2 ≤ r < 10, {X2 ≤ r} = {1, 2, 4} ∈ F.
(iv) If r ≥ 10, {X2 ≤ r} = Ω ∈ F.
Thus, X2 is a random variable.
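On a finite sample space, the criterion of Theorem 1.17 (2) involves only finitely many thresholds, so Example 1.18 (2) can be verified by a short computation. The helper names below are ours; the σ-algebra is generated by brute-force closure:

```python
def sigma(omega, C):
    """Brute-force sigma(C) on a finite omega (as in Notation 1.10)."""
    omega = frozenset(omega)
    fam = {frozenset(), omega} | {frozenset(s) for s in C}
    while True:
        new = {omega - a for a in fam} | {a | b for a in fam for b in fam}
        if new <= fam:
            return fam
        fam |= new

def is_random_variable(X, fam):
    """Theorem 1.17 (2) on a finite space: {X <= r} must lie in fam
    for each value r taken by X (other thresholds give the same sets)."""
    for r in set(X.values()):
        event = frozenset(w for w, x in X.items() if x <= r)
        if event not in fam:
            return False
    return True

F = sigma({1, 2, 3, 4}, [{1, 2}, {3}, {4}])
X1 = {1: 2, 2: 3, 3: 4, 4: 5}
X2 = {1: 2, 2: 2, 3: 10, 4: -500}
print(is_random_variable(X1, F))   # False: {X1 <= 2} = {1} is not in F
print(is_random_variable(X2, F))   # True
```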
Theorem 1.19.
(1) If X is a random variable, f is a Borel measurable function
on (R, B), then f (X) is a random variable.
(2) If X and Y are random variables, f is a Borel measurable function of two variables, then f (X, Y ) is a random variable.
(3) If (Xn)n≥1 is a sequence of random variables, then
inf_n Xn,   sup_n Xn,   lim inf_{n→∞} Xn,   lim sup_{n→∞} Xn,   lim_{n→∞} Xn
are random variables.
We omit the proofs. The definitions of lim sup and lim inf are reviewed in Appendix A.
Example 1.20.
(1) Let (Ω, F, P) be a discrete probability space. Then every
real-valued function on Ω is a random variable.
(2) Let (Ω, F, P) = ([0, 1], B1 , m). Then the random variables are exactly the Borel
measurable functions defined on ([0, 1], B1 ).
Exercise
(1) Let Ω = {1, 2, 3, 4, 5, 6}, and let
X1(ω) = 1 for ω = 1, 3, 4,   and   X1(ω) = 2 for ω = 2, 5, 6;
X2(ω) = 3 for ω = 1, 3, 4,   and   X2(ω) = 2 for ω = 2, 5, 6;
X3(1) = 3,  X3(2) = 2,  X3(3) = 1,  X3(4) = 5,  X3(5) = 4,  X3(6) = 4.
(a) Let F = σ ({{1}, {2}, {3}, {4}, {5}, {6}}), which of X1 , X2 , X1 +X2 , X1 +X3
and X3 are random variables on (Ω, F)?
(b) Let F = σ ({{1, 2, 3}, {4, 5}}), which of X1 , X2 , X1 + X2 , X1 + X3 and X3
are random variables on (Ω, F)?
(c) Let F = σ ({{1, 4}, {2, 5}, {3}}), which of X1 , X2 , X1 + X2 , X1 + X3 and
X3 are random variables on (Ω, F)?
(2) Suppose X and Y are random variables on (Ω, F, P) and let A ∈ F . Show that
if we let
Z(ω) = X(ω) if ω ∈ A,   and   Z(ω) = Y(ω) if ω ∈ Ac,
then Z is a random variable.
(3) Let P be the Lebesgue measure on Ω = [0, 1]. Define
Z(ω) = 0 if 0 ≤ ω < 1/2,   and   Z(ω) = 2 if 1/2 ≤ ω ≤ 1.
For A ∈ B1, define
Q(A) = ∫_A Z(ω) dP(ω).
(a) Show that Q is a probability measure.
(b) Show that if P(A) = 0, then Q(A) = 0. (We say that Q is absolutely
continuous with respect to P.)
(c) Show that there is a set A for which Q(A) = 0 but P(A) > 0.
1.3. Expectation
Definition 1.21. The function
IA(ω) = 1 if ω ∈ A,   and   IA(ω) = 0 if ω ∉ A,
is called the indicator function of A.
Remark 1.22. The indicator function IA is a random variable if and only if A ∈ F.
Definition 1.23.
(1) Let Ai ∈ F for all i and let a random variable X be of the form
X = ∑_{i=1}^{∞} bi I_{Ai}.   (1.2)
Then X is called a simple random variable.
(2) For X of the form (1.2), we define the expectation of X to be
E[X] = ∑_{i=1}^{∞} bi P(Ai).
In (1.2) it is often convenient, though not necessary, to take the sets (Ai) to be disjoint.
Example 1.24. Let (Ω, F, P) = ([0, 1], B1, m) and consider
X = ∑_{i=1}^{∞} (1/2^i) I_{[0, 2^{−i})}.
Then the expectation of X is given by
E[X] = ∑_{i=1}^{∞} (1/2^i) P([0, 2^{−i})) = ∑_{i=1}^{∞} (1/4^i) = 1/3.
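The value 1/3 is easy to confirm numerically: under the Lebesgue measure P([0, 2^{−i})) = 2^{−i}, so the partial sums of 4^{−i} converge to 1/3. A quick sketch:

```python
# Numerical check of Example 1.24:
# E[X] = sum_i (1/2^i) * m([0, 2^{-i})) = sum_i 4^{-i} = 1/3.
expectation = sum((1 / 2 ** i) * (1 / 2 ** i) for i in range(1, 40))
print(round(expectation, 12))   # 0.333333333333
```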
Remark 1.25. Consider the generalization of the expectation.³ Let X be a positive random variable. Define
Λmn = {ω : n/2^m ≤ X(ω) < (n + 1)/2^m} ∈ F,   for all m ∈ N and n = 0, 1, 2, ....
Let
Xm = ∑_{n=0}^{∞} (n/2^m) I_{Λmn}.
(Xm is illustrated in Figure 1.3.) Due to the construction of Xm, we see that for all ω ∈ Ω, Xm(ω) ↑ and
lim_{m→∞} Xm(ω) = X(ω).
(i) If E[Xm ] = +∞ for some m, we define E[X] = +∞.
³This construction is that of the Lebesgue integral, which differs from the Riemann integral: the Riemann integral partitions the domain of f, while the Lebesgue integral partitions its range and approximates f from below by step functions / simple functions. Compare Figure 1.1 and Figure 1.2.
Figure 1.1. The partition scheme of the Riemann integral.
Figure 1.2. The partition scheme of the Lebesgue integral.
(ii) If E[Xm] < ∞ for all m, define
E[X] = lim_{m→∞} E[Xm] = lim_{m→∞} ∑_{n=0}^{∞} (n/2^m) P(n/2^m ≤ X < (n + 1)/2^m).
This defines the expectation of a positive random variable. We now treat a general random variable.
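The dyadic scheme of Remark 1.25 can be simulated. For the assumed example X(ω) = ω on ([0, 1], B1, m), each level set Λmn has Lebesgue measure 2^{−m}, and E[Xm] should increase to E[X] = 1/2 as m grows:

```python
# Dyadic approximation of Remark 1.25 for the assumed example X(omega) = omega
# on ([0,1], B1, m): Lambda_{mn} has Lebesgue measure 2^{-m} for n < 2^m.
def E_Xm(m):
    return sum((n / 2 ** m) * (2.0 ** -m) for n in range(2 ** m))

for m in (1, 2, 4, 8, 16):
    print(m, E_Xm(m))   # increases monotonically toward E[X] = 1/2
```

In closed form E[Xm] = (1 − 2^{−m})/2, which makes the monotone convergence visible.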
Definition 1.26. Consider a general random variable X. Then we can write X as
X = X + − X −,
where X + = X ∨ 0, X − = (−X) ∨ 0.
Figure 1.3. X and its dyadic approximation Xm (levels 2^{−m}, 2 · 2^{−m}, 3 · 2^{−m}, ...).
(1) Unless both E[X+] and E[X−] are +∞, we define
E[X] = E[X+] − E[X−].
(2) If E|X| = E[X+] + E[X−] < ∞, X has a finite expectation. We denote it by
E[X] = ∫_Ω X dP = ∫_Ω X(ω) P(dω).
(3) For A ∈ F, define
∫_A X dP = E[X IA],   (1.3)
which is called the integral of X with respect to P over A.
(4) X is integrable with respect to P over A if the integral (1.3) exists and is finite.
Remark 1.27.
(1) If X has a cumulative distribution function (c.d.f.) F with respect to P, then
E[X] = ∫_{−∞}^{∞} x dF(x).
Moreover, if g is a Borel measurable function on R,
E[g(X)] = ∫_{−∞}^{∞} g(x) dF(x).
(2) If X has a probability density function (p.d.f.) f with respect to P, then
E[X] = ∫_{−∞}^{∞} x f(x) dx
and
E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx.
(3) If X has a probability mass function p with respect to P, then
E[X] = ∑_{n=1}^{∞} xn p(xn),   E[g(X)] = ∑_{n=1}^{∞} g(xn) p(xn).
Example 1.28.
(1) Let Ω = {1, 2, 3, 4}, F = σ({{1}, {2}, {3}, {4}}) and
P({1}) = 1/2,   P({2}) = 1/4,   P({3}) = 1/6,   P({4}) = 1/12.
Let
X = 5I{1} + 2I{2} − 4I{3,4}.
Then
E[X] = 5 · (1/2) + 2 · (1/4) − 4 · (1/6 + 1/12) = 2,
E[X²] = 25 · (1/2) + 4 · (1/4) + 16 · (1/6 + 1/12) = 35/2.
(2) Suppose X is normally distributed on (Ω, F, P) with mean 0 and variance 1; then X has probability density function
(1/√(2π)) exp(−x²/2).
Thus,
E[X] = ∫_{−∞}^{∞} x · (1/√(2π)) exp(−x²/2) dx = 0,
E[X³] = ∫_{−∞}^{∞} x³ · (1/√(2π)) exp(−x²/2) dx = 0,   (odd function)
E[e^X] = ∫_{−∞}^{∞} e^x · (1/√(2π)) exp(−x²/2) dx = (1/√(2π)) ∫_{−∞}^{∞} exp(−x²/2 + x) dx
       = (1/√(2π)) ∫_{−∞}^{∞} exp(−(x − 1)²/2 + 1/2) dx = e^{1/2}.
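Both parts of Example 1.28 can be checked by computation: part (1) with exact fractions directly from Definition 1.23, and part (2) by a plain midpoint Riemann sum for E[e^X], truncating the tails where the integrand is negligible:

```python
import math
from fractions import Fraction as Fr

# Part (1): exact arithmetic for X = 5*I{1} + 2*I{2} - 4*I{3,4}.
P = {1: Fr(1, 2), 2: Fr(1, 4), 3: Fr(1, 6), 4: Fr(1, 12)}
X = {1: 5, 2: 2, 3: -4, 4: -4}
EX = sum(X[w] * P[w] for w in P)
EX2 = sum(X[w] ** 2 * P[w] for w in P)
print(EX, EX2)   # 2 35/2

# Part (2): E[e^X] for X ~ N(0,1), via a midpoint Riemann sum with the
# tails truncated where the integrand exp(x - x^2/2) is negligible.
def integrand(x):
    return math.exp(x) * math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

h, lo, hi = 1e-3, -10.0, 12.0
n = int(round((hi - lo) / h))
approx = h * sum(integrand(lo + (k + 0.5) * h) for k in range(n))
print(round(approx, 6), round(math.exp(0.5), 6))   # both ~1.648721
```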
Proposition 1.29.
(1) (Absolute Integrability) ∫_A X dP < ∞ ⇐⇒ ∫_A |X| dP < ∞.⁴
(2) (Linearity) ∫_A (aX + bY) dP = a ∫_A X dP + b ∫_A Y dP.
(3) (Additivity over sets) If (An) is disjoint, then
∫_{∪n An} X dP = ∑_n ∫_{An} X dP.
(4) (Positivity) If X ≥ 0 P-a.e.⁵ on A, then ∫_A X dP ≥ 0.
(5) (Monotonicity) If X1 ≤ X ≤ X2 P-a.e. on A, then
∫_A X1 dP ≤ ∫_A X dP ≤ ∫_A X2 dP.

⁴This equivalence fails for the improper Riemann integral.
⁵We say a property holds P-a.e. (almost everywhere) or P-a.s. (almost surely) if the probability that the property holds is equal to 1, i.e., the property is true except on a set with probability 0.
(6) (Modulus Inequality)
|∫_A X dP| ≤ ∫_A |X| dP.
Theorem 1.30.
(1) (Dominated Convergence Theorem) If lim_{n→∞} Xn = X P-a.e. on A and |Xn| ≤ Y P-a.e. on A for all n with ∫_A Y dP < ∞, then
lim_{n→∞} ∫_A Xn dP = ∫_A lim_{n→∞} Xn dP = ∫_A X dP.
(2) (Monotone Convergence Theorem) If Xn ≥ 0 and Xn ↑ X P-a.e. on A, then
lim_{n→∞} ∫_A Xn dP = ∫_A lim_{n→∞} Xn dP = ∫_A X dP.
(3) (Fatou’s Lemma) If Xn ≥ 0 P-a.e. on A, then
∫_A lim inf_{n→∞} Xn dP ≤ lim inf_{n→∞} ∫_A Xn dP.
(4) (Jensen’s Inequality) If ϕ is a convex function, X and ϕ(X) are integrable, then
ϕ(E[X]) ≤ E[ϕ(X)].
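Jensen's inequality is easy to see numerically. Taking the convex function ϕ(x) = x² and, for concreteness, the distribution borrowed from Example 1.28 (1):

```python
# Jensen's inequality phi(E[X]) <= E[phi(X)] for the convex phi(x) = x^2,
# using the distribution of Example 1.28 (1) as a concrete test case.
from fractions import Fraction as Fr

P = {1: Fr(1, 2), 2: Fr(1, 4), 3: Fr(1, 6), 4: Fr(1, 12)}
X = {1: 5, 2: 2, 3: -4, 4: -4}

EX = sum(X[w] * P[w] for w in P)            # E[X] = 2
E_phiX = sum(X[w] ** 2 * P[w] for w in P)   # E[X^2] = 35/2
print(EX ** 2, "<=", E_phiX)                # 4 <= 35/2
```

The gap E[X²] − (E[X])² is exactly the variance of X, which is one way to remember why this instance of Jensen holds.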
Exercise
(1) Let λ be a fixed number in R, and define the convex function ϕ(x) = eλx for
all x ∈ R. Let X be a normally distributed random variable with mean μ and
variance σ 2 , i.e., the probability density function of X is given by
f(x) = (1/(√(2π) σ)) exp(−(x − μ)²/(2σ²)).
(a) Find E[eλX ].
(b) Verify that Jensen’s inequality holds (as it must):
Eϕ(X) ≥ ϕ(E[X]).
(2) For each positive integer n, define fn to be the normal density with mean zero
and variance n, i.e.,
fn(x) = (1/√(2nπ)) exp(−x²/(2n)).
(a) What is the function f(x) = lim_{n→∞} fn(x)?
(b) What is lim_{n→∞} ∫_{−∞}^{∞} fn(x) dx?
(c) Note that
lim_{n→∞} ∫_{−∞}^{∞} fn(x) dx ≠ ∫_{−∞}^{∞} f(x) dx.
Explain why this does not violate the "Monotone Convergence Theorem".
(3) Let P be the Lebesgue measure on Ω = [0, 1]. Define
Z(ω) = 0 if 0 ≤ ω < 1/2,   and   Z(ω) = 2 if 1/2 ≤ ω ≤ 1.
For A ∈ B1, define
Q(A) = ∫_A Z(ω) dP(ω).
(a) Show that Q is a probability measure.
(b) Show that if P(A) = 0, then Q(A) = 0. (We say that Q is absolutely
continuous with respect to P.)
(c) Show that there is a set A for which Q(A) = 0 but P(A) > 0.