Contents

1  Set Theory and Probability Measures
   1.1  Classes of sets
   1.2  Probability Measures

2  Distribution Functions and Probability Measures
   2.1  Monotone functions and their properties
   2.2  Distribution Functions and Their Decompositions
   2.3  Distribution functions and probability measures

3  Random Variables and Expectation
   3.1  Random variables and properties
   3.2  Expectation and properties

1  Set Theory and Probability Measures

1.1  Classes of sets
Let Ω be an “abstract space", namely a nonempty set of elements to be called “points"
and denoted generically by ω. Some of the usual operations and relations between
sets, together with the usual notation, are given below.
• Union: E ∪ F , ∪n En
• Intersection: E ∩ F , ∩n En
• Complement: Eᶜ = Ē = Ω \ E
• Difference: E \ F = E ∩ Fᶜ
• Symmetric difference: E △ F = (E \ F) ∪ (F \ E)
• Singleton: {ω}
• Subset: E ⊂ F , F ⊃ E, A ⊂ B
• Empty set: ∅
• Belonging to: ω ∈ E, E ∈ A
Definition 1.1. A nonempty collection A of subsets of Ω is called a field iff (i) and (ii)
hold:

(i) E ∈ A implies Eᶜ ∈ A.

(ii) E1 ∈ A, E2 ∈ A imply E1 ∪ E2 ∈ A.
Clearly, If A is a field, then ∅ ∈ A and Ω ∈ A. A field is sometimes called an algebra.
It is called a monotone class (M.C.) iff (vi) and (vii) hold:

(vi) Ej ∈ A, Ej ⊂ Ej+1, j = 1, 2, ... imply ∪_{j=1}^∞ Ej ∈ A.

(vii) Ej ∈ A, Ej ⊃ Ej+1, j = 1, 2, ... imply ∩_{j=1}^∞ Ej ∈ A.

It is called a Borel field (B.F.) iff (i) and (viii) hold:

(viii) Ej ∈ A, j = 1, 2, ... imply ∪_{j=1}^∞ Ej ∈ A.
Clearly a B.F. is a field, so it contains ∅ and Ω. A Borel field is sometimes called a
sigma-field or sigma-algebra.
Theorem 1.2. A field is a B.F. if and only if it is also an M.C.
Proof. The “only if" part is trivial.
To prove the “if” part we show that (iv) and (vi) imply (viii), where (iv) is the
finite-union property of a field:

(iv) Ej ∈ A, j = 1, 2, ..., n imply ∪_{j=1}^n Ej ∈ A.

Let Ej ∈ A for j = 1, 2, .... Then by (iv),

    Fn = ∪_{j=1}^n Ej ∈ A.

Clearly Fn ⊂ Fn+1 and

    ∪_{j=1}^∞ Ej = ∪_{j=1}^∞ Fj;

hence ∪_{j=1}^∞ Ej ∈ A by (vi).

Example 1.3.
• The collection S of all subsets of Ω is a B.F. called the total B.F.;
the collection of the two sets {∅, Ω} is a B.F. called the trivial B.F.
• If A is any index set and if for every α ∈ A, Fα is a B.F. (or M.C.), then the
intersection ∩_{α∈A} Fα of these is a B.F. (or M.C.).
• Given any nonempty collection C of sets, there is a minimal B.F. (or field, or
M.C.) containing it; this is just the intersection of all B.F.’s (or fields, or M.C.’s)
containing C, of which there is at least one, namely the S mentioned above. This
minimal B.F. (or field, or M.C.) is also said to be generated by C. In particular
if F0 is a field there is a minimal B.F. (or M.C.) containing F0 .
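On a finite Ω a field is automatically a B.F., so the minimal field generated by a collection C can be found by brute force. The sketch below is our own toy illustration (the helper `generated_field` and the choice of Ω are assumptions, not from the text): it closes C under (i) and (ii) until nothing new appears.

```python
from itertools import combinations

# Brute-force the minimal field containing a collection C of subsets of a
# finite omega (sets encoded as frozensets).  On a finite space a field is
# automatically a Borel field, so this is also the generated B.F.
def generated_field(omega, C):
    A = {frozenset(), frozenset(omega)} | {frozenset(s) for s in C}
    while True:
        new = set(A)
        new |= {frozenset(omega) - E for E in A}            # close under (i)
        new |= {E1 | E2 for E1, E2 in combinations(A, 2)}   # close under (ii)
        if new == A:
            return A
        A = new

omega = {1, 2, 3, 4}
F = generated_field(omega, [{1}, {2}])
print(len(F))   # the atoms are {1}, {2}, {3,4}, giving 2^3 = 8 sets
```

The loop terminates because everything stays inside the (finite) power set of Ω; the result consists of all unions of the atoms {1}, {2}, {3,4}.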
Theorem 1.4. Let F0 be a field, G the minimal M.C. containing F0 , F the minimal
B.F. containing F0 , then F = G.
Proof. Since a B.F. is an M.C., we have F ⊃ G. To prove F ⊂ G it is sufficient
to show that G is a B.F. Hence by Theorem 1.2 it is sufficient to show that G is a field.
We shall show that it is closed under intersection and complementation. Toward this,
define two classes of subsets of G as follows:

    C1 = {E ∈ G : E ∩ F ∈ G, ∀F ∈ F0},
    C2 = {E ∈ G : E ∩ F ∈ G, ∀F ∈ G}.
Since

    F ∩ ∪_{j=1}^∞ Ej = ∪_{j=1}^∞ (F ∩ Ej),    F ∩ ∩_{j=1}^∞ Ej = ∩_{j=1}^∞ (F ∩ Ej),
it follows that both C1 and C2 are M.C.’s. Since F0 is closed under intersection and
contained in G, it is clear that F0 ⊂ C1 . Hence G ⊂ C1 by the minimality of G and so
G = C1 . This means for any F ∈ F0 and E ∈ G we have F ∩ E ∈ G, which in turn
means F0 ⊂ C2 . Hence G = C2 and this means G is closed under intersection.
Next, define another class of subsets of G as follows:
C3 = {E ∈ G : E c ∈ G} .
The (De Morgan) identities

    (∪_{j=1}^∞ Ej)ᶜ = ∩_{j=1}^∞ Ejᶜ,    (∩_{j=1}^∞ Ej)ᶜ = ∪_{j=1}^∞ Ejᶜ
show that C3 is an M.C. Since F0 ⊂ C3, it follows as before that G = C3, which means
G is closed under complementation. The proof is complete.

Remark 1. The theorem above is one of a type called monotone class theorems. They
are among the most useful tools of measure theory, and serve to extend certain relations
which are easily verified for a special class of sets or functions to a larger class.
1. (HMK) Show

    A ∪ B = B ∪ A,    A ∩ B = B ∩ A,
    A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C),    A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C),
    A ∪ A = A,
    (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ,    (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ.
2. (HMK) Let Ω be a sample space and D = {D1, D2, ...} form a countable decomposition of Ω into nonempty sets: Ω = D1 ∪ D2 ∪ ..., Di ∩ Dj = ∅ for i ≠ j.
The collection A, formed by the sets that are unions or complements of unions of
finitely many elements of the decomposition, is a field.
3. (HMK) Let A be the collection of finite unions of disjoint intervals:

    A ∈ A if A = ∪_{i=1}^n Ii for some n,

where the Ii are sets of the forms (−∞, a], (a, b], (b, ∞) with −∞ < a < b < ∞.
Show that A is a field, but it is not a B.F.
4. (HMK) Problems 1, 2 (page 12). Problem 1 (page 24).
1.2  Probability Measures
Let Ω be a space, F a B.F. of subsets of Ω. A probability measure P on F is a
numerically valued set function with domain F, satisfying the following axioms:
(i) ∀E ∈ F, P (E) ≥ 0.
(ii) (Countable additivity) If {Ej} is a countable collection of (pairwise) disjoint
sets in F, then

    P(∪_j Ej) = Σ_j P(Ej).
(iii) P (Ω) = 1.
These axioms imply the following properties.
(iv) P (E) ≤ 1.
(v) P (∅) = 0.
(vi) P (E c ) = 1 − P (E).
(vii) P (E ∪ F ) = P (E) + P (F ) − P (E ∩ F ).
(viii) E ⊂ F implies P (E) = P (F ) − P (F \ E) ≤ P (F ).
(ix) Monotone property: En ↑ E or En ↓ E implies P (En ) → P (E).
(x) Boole's inequality: P(∪_j Ej) ≤ Σ_j P(Ej).

(xi) (Continuity) En ↓ ∅ implies P(En) → 0.
The triple (Ω, F, P ) is called a probability space (triple); Ω alone is called the
sample space, and ω is then a sample point.
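The derived properties can be sanity-checked on a toy discrete space. The example below is an assumption of ours (six equally likely points, not from the text); exact rational arithmetic confirms (vi), (vii) and Boole's inequality (x).

```python
from fractions import Fraction

# Six equally likely sample points; exact rational arithmetic.
omega = frozenset(range(6))

def P(E):
    return Fraction(len(E & omega), len(omega))

E1, E2 = frozenset({0, 1, 2}), frozenset({2, 3})
print(P(omega - E1) == 1 - P(E1))                 # (vi): prints True
print(P(E1 | E2) == P(E1) + P(E2) - P(E1 & E2))   # (vii): prints True
print(P(E1 | E2) <= P(E1) + P(E2))                # (x) Boole: prints True
```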
Theorem 1.5. The axioms of finite additivity and of continuity together are equivalent
to the axiom of countable additivity.
Proof. Let En ↓. Then

    En = ∪_{j=n}^∞ (Ej \ Ej+1) ∪ ∩_{j=1}^∞ Ej.

If En ↓ ∅, then the last term is the empty set. Hence if (ii) (countable additivity) is
assumed, then

    ∀n ≥ 1 :  P(En) = Σ_{j=n}^∞ P(Ej \ Ej+1).

Since the series Σ_{j=1}^∞ P(Ej \ Ej+1) = P(E1) is convergent, its tails tend to zero,
so P(En) → 0. Hence (xi) (continuity) holds.

Conversely, let {Ej : j = 1, 2, ...} be pairwise disjoint; then ∪_{j=n+1}^∞ Ej ↓ ∅.
Hence if (xi) (continuity) holds, then P(∪_{k=n+1}^∞ Ek) → 0. If further finite additivity is assumed, then

    P(∪_{k=1}^∞ Ek) = P(∪_{k=1}^n Ek) + P(∪_{k=n+1}^∞ Ek) = Σ_{k=1}^n P(Ek) + P(∪_{k=n+1}^∞ Ek).

This shows that the infinite series Σ_{k=1}^∞ P(Ek) converges. Letting n → ∞, we get

    P(∪_{k=1}^∞ Ek) = lim_{n→∞} Σ_{k=1}^n P(Ek) + lim_{n→∞} P(∪_{k=n+1}^∞ Ek) = Σ_{k=1}^∞ P(Ek).

This is the desired result.

Example 1.6. Let Ω be a countable set: Ω = {ωj : j ∈ J}, where J is a countable index
set, and let F be the total B.F. of Ω. Choose any sequence of numbers {pj , j ∈ J}
satisfying

    ∀j ∈ J : pj > 0;    Σ_{j∈J} pj = 1,

and define a set function P on F as follows:

    P(E) = Σ_{ωj∈E} pj,    E ∈ F.
Clearly axioms (i), (ii), and (iii) are satisfied. Hence P is a probability measure and
(Ω, F, P) is called a discrete probability space.
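A concrete instance of such a discrete space can be sketched as follows. The geometric weights pj = 2^{−j} and the truncation at N are our own assumptions, chosen only so that the weights sum to one exactly.

```python
from fractions import Fraction

# Ω = {1, ..., N} with geometric weights p_j = 2^{-j}; the leftover tail
# mass 2^{-N} is added to the last point so that the weights sum to one.
N = 10
p = {j: Fraction(1, 2**j) for j in range(1, N + 1)}
p[N] += Fraction(1, 2**N)
assert sum(p.values()) == 1   # axiom (iii)

def P(E):
    # P(E) = sum of p_j over the points of E; axioms (i)-(iii) hold
    # by construction
    return sum(p[j] for j in E if j in p)

evens = {j for j in range(2, N + 1, 2)}
print(P(evens))   # total mass of the even points, an exact fraction
```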
Example 1.7. Let U = (0, 1], C = {(a, b] : 0 < a < b ≤ 1}, and B = σ(C). Then
(U, B, m) is a probability space, where m denotes Lebesgue measure.
Let B0 be the collection of subsets of U each of which is the union of a finite
number of members of C. Thus a typical set B in B0 is of the form

    B = ∪_{j=1}^n (aj, bj],    a1 < b1 < a2 < b2 < ... < an < bn.
Then B0 is the field generated by C and generates B = σ(B0). If we take U = [0, 1]
instead, then B0 is no longer a field since U ∉ B0, but B and m may be defined as
before. The new B is generated by the old B and the singleton {0}.
Example 1.8. (Borel Sets) Let R = (−∞, +∞) and C the collection of intervals of the form
(a, b], −∞ < a < b < +∞. The field B0 generated by C consists of finite unions of
disjoint sets of the form (a, b], (−∞, a] or (b, ∞). The Borel field B on R is the B.F.
generated by C or B0. Clearly, the Borel–Lebesgue measure m on R is not a p.m., since
m(R) = ∞. Hence m is not a finite measure, but it is σ-finite on R: there
exists a sequence of sets En ∈ B0 with En ↑ R and m(En) < ∞ for each n.
2  Distribution Functions and Probability Measures

2.1  Monotone functions and their properties
Let f be an increasing function defined on the real line (−∞, +∞). We have the following properties.

(i) The left and right limits always exist (why?):

    f(x−) = lim_{t↑x} f(t),    f(x+) = lim_{t↓x} f(t).
In particular f (−∞) = limx→−∞ f (x) (could be −∞) and f (∞) = limx→∞ f (x)
exist (could be +∞).
(ii) f is continuous at x iff
f (x−) = f (x) = f (x+).
(iii) The only possible discontinuity of an increasing function is a jump, and f(x+) −
f(x−) is called the size of the jump at x.
(iv) The set of discontinuities of f is countable. (What kinds of discontinuities are
there if a function is not increasing?)
Proof. For each point x of jump, let Ix = (f(x−), f(x+)). If x′ is another point
of jump, then Ix and Ix′ are disjoint (though it is possible that f(x+) = f(x′−)),
since f is monotone. Each such interval Ix contains a rational number, hence the
collection of such intervals is at most as numerous as the rational numbers.
Since the rational numbers are countable, the desired result follows.

(v) If f1, f2 are two increasing functions that coincide on a dense subset D of R, then f1, f2
have the same points of jump with the same jump sizes, and they coincide on R
except possibly at some of those jump points.

Proof. Let x ∈ R, and take xn ↑ x and x′n ↓ x with xn, x′n ∈ D, which is
possible since D is dense. Then
    f1(x−) = lim_{n→∞} f1(xn) = lim_{n→∞} f2(xn) = f2(x−),
    f1(x+) = lim_{n→∞} f1(x′n) = lim_{n→∞} f2(x′n) = f2(x+).

It is now straightforward to verify the results.

2.2  Distribution Functions and Their Decompositions
Definition 2.1. A real-valued function F with domain (−∞, ∞) that is increasing and
right continuous with F (−∞) = 0 and F (∞) = 1 is called a distribution function
(df).
A distribution function has only countably many jumps, say at a1, a2, ... with
jump sizes b1, b2, ... respectively, i.e.,

    bj = F(aj) − F(aj−).

Let Fd(x) = Σ_j bj δaj(x), where δt(x) = 0 or 1 according as x < t or x ≥ t.
Hence Fd(x) is the sum of all the jumps of F in the half-line (−∞, x]. Clearly,

    Fd(−∞) = 0,    Fd(∞) = Σ_j bj ≤ 1.
Hence Fd is bounded and increasing and constitutes the “jumping part" of F . Intuitively, if it is subtracted away from F , the remainder should be positive, contain no
more jumps, and so be continuous.
Theorem 2.2. Let

    Fc(x) = F(x) − Fd(x),    x ∈ R = (−∞, ∞).
Then Fc is positive, increasing, and continuous.
Proof. Clearly Fd(x) ≤ F(x), hence Fc(x) is positive. Let x < x′. Then

    Fd(x′) − Fd(x) = Σ_{x<aj≤x′} bj = Σ_{x<aj≤x′} (F(aj) − F(aj−)) ≤ F(x′) − F(x).

It follows that Fc is increasing (Fd is clearly increasing). Since F and the δaj are
right continuous, so is Fc. It suffices to show that Fc is left continuous. Clearly,

    Fd(x) − Fd(x−) = bj if x = aj, and 0 otherwise.

The same holds for F. Hence,

    Fc(x) − Fc(x−) = F(x) − F(x−) − [Fd(x) − Fd(x−)] = 0.

This shows Fc is left continuous and finishes the proof.
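The decomposition in Theorem 2.2 can be illustrated numerically. In the sketch below the d.f. F is our own assumed example (a uniform continuous part plus a single jump of size 1/2 at x = 1/2); we split it into Fd and Fc and check that the increment of Fc across the jump point vanishes.

```python
# F = continuous part (half the uniform-[0,1] d.f.) plus a jump of
# size 1/2 at x = 1/2; jumps are listed as (location, size) pairs.
JUMPS = ((0.5, 0.5),)

def F(x):
    return 0.5 * min(max(x, 0.0), 1.0) + sum(b for a, b in JUMPS if x >= a)

def Fd(x):
    # Fd(x) = sum of all jumps of F in the half-line up to x
    return sum(b for a, b in JUMPS if x >= a)

def Fc(x):
    return F(x) - Fd(x)

eps = 1e-9
print(F(0.5) - F(0.5 - eps))    # about 0.5: the jump of F
print(Fc(0.5) - Fc(0.5 - eps))  # about 0.0: Fc has no jump left
```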
Uniqueness

Theorem 2.3. Let F be a d.f. Suppose there exist a continuous function Gc and a
function Gd of the form

    Gd(x) = Σ_j b′j δa′j(x),

where {a′j} is a countable set of reals and Σ_j |b′j| < ∞, such that

    F = Gc + Gd.

Then

    Gc = Fc,    Gd = Fd.
Proof. If Fd ≠ Gd, then either {aj} and {a′j} are not identical, or the a′j can be relabeled
so that a′j = aj for all j but b′j ≠ bj for some j. In either case there exists at least one
point a (equal to some aj or a′j) such that

    Fd(a) − Fd(a−) − [Gd(a) − Gd(a−)] ≠ 0.

Since Fc − Gc = Gd − Fd, it follows that

    Fc(a) − Gc(a) − [Fc(a−) − Gc(a−)] ≠ 0.

This contradicts the continuity of Fc − Gc, showing Fd = Gd and hence Fc = Gc.

Definition 2.4. A d.f. F that can be represented in the form
    F = Σ_j bj δaj,

where {aj} is a countable set of real numbers, bj > 0 for every j and Σ_j bj = 1, is
called a discrete d.f. A d.f. that is continuous everywhere is called a continuous d.f.
From Theorem 2.2, F = Fc + Fd. But Fc, Fd are not d.f.'s unless Fc ≡ 0 or
Fd ≡ 0. Now suppose Fc ≢ 0 and Fd ≢ 0, so that α = Fd(∞) ∈ (0, 1). Let F1 = Fd/α
and F2 = Fc/(1 − α). Then

    F = αF1 + (1 − α)F2,

where F1 and F2 are discrete and continuous d.f.'s, respectively.
Summarizing the above discussion, we have
Theorem 2.5. Every d.f. can be written as the convex combination of a discrete and a
continuous d.f. Such a decomposition is unique.
An event A holds “almost everywhere (a.e.)” if Aᶜ has (Lebesgue) measure zero.
Definition 2.6. A real-valued function F defined on a finite closed interval [a, b] is
called absolutely continuous (a.c.) on [a, b] if for every ε > 0 there exists a δ > 0
such that

    Σ_{i=1}^m |F(bi) − F(ai)| < ε

for every finite collection {(ai, bi) : i = 1, ..., m} of non-overlapping sub-intervals of
[a, b] with

    Σ_{i=1}^m |bi − ai| < δ.
Properties of A.C. Functions
1. If f is a.c. on [a, b], then f is uniformly continuous (u.c.) on [a, b], i.e., for every
ε > 0 there exists δ > 0, depending on ε, such that

    |f(x1) − f(x2)| < ε

for all x1, x2 ∈ [a, b] with |x1 − x2| < δ.

HWK. Show f(x) = sin(x) is u.c. on [a, b]. Is it u.c. on R? HWK. Give an
example which is continuous but not u.c. and justify your conclusion.
2. If f is a.c. on [a, b], then f is of bounded variation (b.v.) on [a, b], i.e.,

    sup { Σ_{i=1}^m |f(ai) − f(ai−1)| : a = a0 ≤ a1 ≤ ... ≤ am = b } < ∞.

HWK: Write a one-page summary about the properties of functions of b.v.

(3) If f, g are a.c. on [a, b], then f ± g and f g are a.c. on [a, b]. (HMK)

(4) If f is a.c. on [c, d], and g is a.c. and monotone on [a, b] with range in [c, d], then f ◦ g is a.c.
on [a, b].
Remark 2. If f has a continuous derivative, then f is a.c. An a.c. function need not be
differentiable everywhere, but it has a derivative a.e., and f′ may be taken to be this
a.e. derivative. For example, the function f(x) = exp(−|x|), x ∈ R, is a.c. with
f′(x) = −sign(x) exp(−|x|) a.e., but it is not differentiable at 0.
Theorem 2.7. If a function F is absolutely continuous on [a, b], then F has a derivative f
almost everywhere and

    F(x) = F(a) + ∫_a^x f(t) dt,    x ∈ [a, b].    (2.1)

Conversely, if F satisfies (2.1) for some integrable f, then F is absolutely continuous on [a, b]
and f is an a.e. derivative of F. Further, if F is a d.f., then

    f ≥ 0 a.e.  and  ∫_a^b f(x) dx = 1.
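Relation (2.1) can be checked numerically for a familiar absolutely continuous d.f. The example is our assumption: F(x) = 1 − e^{−x} with density f(t) = e^{−t}, and a plain trapezoidal rule stands in for the integral.

```python
import math

# F(x) = 1 - exp(-x), with a.e. derivative f(t) = exp(-t).
def f(t):
    return math.exp(-t)

def F(x):
    return 1.0 - math.exp(-x)

def trapezoid(g, a, b, n=100_000):
    # plain trapezoidal rule for the integral of g over [a, b]
    h = (b - a) / n
    return h * (0.5 * g(a) + sum(g(a + i * h) for i in range(1, n)) + 0.5 * g(b))

x = 2.0
print(F(x))
print(F(0.0) + trapezoid(f, 0.0, x))   # matches F(x) closely, as in (2.1)
```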
Remark 3. For a = −∞, b = ∞, F is called absolutely continuous if for x > x0,

    F(x) = F(x0) + ∫_{x0}^x f(t) dt.    (2.2)

Then the converse part holds for a = −∞, b = ∞; see page 11, KLC.
Definition 2.8. A function F is called singular iff it is not identically zero and the
derivative F′ exists and equals zero a.e.
A discrete distribution function F is singular (HMK, Problem 6, page 13, KLC).
Theorem 2.9. Suppose F is an increasing function on R with F(−∞) = 0 and
F(∞) < ∞. Then the following hold.

1. F has a derivative F′ a.e. (w.r.t. the Lebesgue measure µ), i.e., there exists a
measurable set D on which F′(x) exists with 0 ≤ F′(x) < ∞ for every x ∈ D,
and µ(Dᶜ) = 0.

2. F′ is L1 integrable (i.e., ∫ |F′(x)| dx < ∞) and satisfies

    ∫_x^y F′(t) dt ≤ F(y) − F(x),    x < y.

3. Let

    Fac(x) = ∫_{−∞}^x F′(t) dt,    Fs(x) = F(x) − Fac(x),    x ∈ R.

Then F′ac = F′ a.e., so that F′s = 0 a.e. and Fs is singular provided it is not
identically zero.
• Let F be a d.f. and f = F′ be its a.e. derivative. Set

    Fac(x) = ∫_{−∞}^x f(t) dt,    Fs(x) = F(x) − Fac(x).

Then F′ac = F′ a.e. and F′s = F′ − F′ac = 0 a.e., so Fac is absolutely continuous
and Fs is singular if it is not identically zero.
• We have now obtained another decomposition

    F = Fac + Fs,

where clearly Fac is a.c., Fs is singular, and both are increasing with Fac ≤ F,
Fs ≤ F.
• As before, Fac and Fs are not distribution functions, since Fac(∞) ≠ 1 and
Fs(∞) ≠ 1 unless Fs or Fac, respectively, is identically zero.
• Let β1 = Fac(∞). Then 0 ≤ β1 ≤ 1. Now let F̃ac = Fac/β1 if β1 > 0 and
F̃ac = 0 if β1 = 0, and let F̃s = Fs/(1 − β1) if β1 < 1 and F̃s = 0 if β1 = 1.
Then both F̃ac and F̃s are d.f.'s provided Fac ≢ 0 and Fs ≢ 0, and

    F = β1 F̃ac + β2 F̃s,    (2.3)

where β2 = 1 − β1.
Theorem 2.10. Every d.f. can be written as the convex combination of a discrete, a
singular continuous, and an absolutely continuous d.f. Such a decomposition is unique.
Proof. Let F be a d.f. By Theorem 2.5, there exist α1, α2 ∈ [0, 1] with α1 + α2 = 1
such that

    F = α1 Fc + α2 Fd.

Applying (2.3) with F = Fc, we obtain Fc = β1 Fac + β2 Fsc, where Fac is a.c. and
Fsc is singular continuous. Substituting this decomposition into the above equality, we get

    F = γ1 Fac + γ2 Fsc + γ3 Fd,    (2.4)

where γ1 = α1 β1, γ2 = α1 β2 and γ3 = α2. This proves the decomposition. The
uniqueness can be proved similarly to Theorem 2.5.

Definition 2.11. Any positive f with f = F′ a.e. is called a density of F. Fac is called the
absolutely continuous part and Fs the singular part of F.
2.3  Distribution functions and probability measures
Lemma 2.12. Each p.m. µ on B determines a d.f. F through the correspondence

    ∀x ∈ R :  µ((−∞, x]) = F(x).    (2.5)

Consequently, we have for −∞ < a ≤ b < ∞:

    µ((a, b]) = F(b) − F(a),      (2.6)
    µ((a, b)) = F(b−) − F(a),     (2.7)
    µ([a, b)) = F(b−) − F(a−),    (2.8)
    µ([a, b]) = F(b) − F(a−).     (2.9)
Theorem 2.13. Each d.f. F determines a p.m. µ on B through any one of the relations
given in Lemma 2.12.
As previously discussed, given a d.f. F we may define a set function µ for intervals
of the form (a, b] by means of (2.6). Such a function is seen to be countably additive
on its domain of definition.
We can extend its domain of definition, while preserving this additivity, to the following sets.
Remark 4. In the above setting we have defined the set function µ based on a d.f. F.
One can instead define a set function µ based on the length of the interval (a, b], i.e.,

    µ((a, b]) = b − a,

and proceed as with a d.f. F. This gives the Lebesgue measure on R. The consequence
is that µ is no longer finite.
• Countable unions of such disjoint intervals: if S = ∪_i (ai, bi], then

    µ(S) = Σ_i µ((ai, bi]) = Σ_i (F(bi) − F(ai)).

The extension is well defined in the sense that it does not depend on the representation of S.

• If O is open in R, then it is well known that O = ∪_i (ci, di) for some disjoint
(ci, di)'s, so µ(O) = Σ_i (F(di−) − F(ci)) is well defined.

• For a singleton {a},

    µ({a}) = F(a) − F(a−).

• But even with the class of open and closed sets we are still far from the B.F. B.
The next step of the extension is to consider Gδ and Fσ sets. HWK. Find the definitions
of Gδ and Fσ sets and their properties.

• There is an alternative way to carry out the extension.
• There is an alternative way for the extension.
Outer and Inner Measures

From the above process, µ is a well-defined set function on “simple sets”. (Think
about it this way: you can calculate the area of a rectangle or a polygon, and now
you want to calculate the area of a circle. The rectangles and polygons are the “simple
sets”.) For any subset S of R, define

    µ*(S) = inf_{O open, O⊃S} µ(O),    µ∗(S) = sup_{C closed, C⊂S} µ(C).

µ* is called the outer measure and µ∗ the inner measure w.r.t. the d.f. F. Clearly

    µ*(S) ≥ µ∗(S).

Equality does not hold in general.
Definition 2.14. If equality holds in the above, then S is called “measurable”
(w.r.t. the d.f. F).

If S is measurable, the common value is denoted by µ(S) (an abuse of notation!).
Now we need to check that the new measure agrees with the old one. The next tasks
are to prove that:

• The class of all measurable sets forms a σ-algebra, say L;
• On this L, the function µ is a p.m;
• To finish: since L is a B.F., and it contains all intervals of the form (a, b], it
contains the minimal B.F. B with this property;
• It may be larger than B, indeed it is (see below), but this causes no harm, for the
restriction of L to B is a p.m. whose existence is asserted in Theorem 2.13.
There is one more question: besides the µ discussed above is there any other p.m.
ν that corresponds to the given F in the same way?
Theorem 2.15. (Uniqueness) Let µ and ν be two σ-finite measures defined on the same
σ-algebra F, which is generated by the field F0 . If either µ or ν is σ-finite on F0 , and
µ(E) = ν(E) for every E ∈ F0 , then the same is true for every E ∈ F and thus
µ = ν.
Proof. Suppose first that µ, ν are finite. Let

    C = {E ∈ F : µ(E) = ν(E)}.

Then C ⊃ F0 by assumption. For monotone En ∈ C with limit E, it follows from the
monotone property of measures that

    µ(E) = lim_n µ(En) = lim_n ν(En) = ν(E).

This shows that C is an M.C., and thus it follows from Theorem 1.4 that C ⊃ F, i.e., C = F.
Proving the result in the σ-finite case is left as homework (HMK). This completes
the proof.
the proof. Corollary 2.16. Let µ and ν be sigma-finite measures on F that agree on all intervals
of one of the eight kinds: (a, b], (a, b), [a, b), [a, b], (−∞, b], (−∞, b), [a, ∞), (a, ∞)
or merely on those with the endpoints in a given dense set D, then they agree on B.
Theorem 2.17. Given a p.m. µ, there is a unique d.f. F satisfying (2.5). Conversely,
given a d.f. F , there is a unique p.m. satisfying (2.5) or any of the relations (2.6)-(2.9).
µ is called the p.m. of F and F is the d.f. of µ.
• Instead of (R, B) we may consider its restriction to a fixed interval [a, b]. Without
loss of generality we may suppose this to be U = [0, 1], so that we are in the
situation of Example 1.7.

• We can either proceed analogously or reduce it to the case just discussed, as
follows. Let F be a d.f. such that F(x) = 0 for x < 0 and F(x) = 1 for
x > 1. The probability measure µ of F will then have support in [0, 1], since
µ((−∞, 0)) = 0 = µ((1, ∞)). Denote it by (U, B[0, 1], µ), where B[0, 1] is the
restriction of B to U. Conversely, any p.m. on B[0, 1] may be regarded as such
a restriction.
• The most interesting case is when F is the “uniform distribution" on U: F(x) =
0, x, 1 according as x < 0, 0 ≤ x ≤ 1, x > 1. The resulting measure m is called
the Borel measure on [0, 1], while its extension to L according to Theorem 2.13
is the usual Lebesgue measure.
• It is well known that L is larger than B[0, 1].
Definition 2.18. The probability space (Ω, F, P) is said to be complete iff any subset
of a set N ∈ F with P(N) = 0 also belongs to F.
A set N in F is called a null set if P(N) = 0.
Theorem 2.19. Given a probability space (Ω, F, P ), there exists a complete space
(Ω, F̄, P̄ ) such that F ⊂ F̄ and P = P̄ on F.
Proof. See page 31, KLC.

Remark 5. The completion of the Borel measure space (B, m) is the Lebesgue space (L, m).

HMK
1. For any countably infinite set Ω, the collection of its finite subsets and their
complements forms an algebra F. If P(E) on F is defined as 0 or 1 according
as E is finite or not, then P is finitely additive but not countably so.

2. The σ-algebra B1 on R1 is generated by the class of all open intervals, or all
closed intervals, or all half-lines of the form (−∞, a] or (a, ∞), or these intervals
with rational endpoints. But it is not generated by all the singletons of R1, nor by
any finite collection of subsets of R1.

3. Show B1 contains every singleton, countable set, open set, Gδ and Fσ set.

4. An atom of a measure µ on B1 is a singleton {x} such that µ({x}) > 0. The
number of atoms of any σ-finite measure is countable. For a p.m. µ with d.f. F,
for each x we have µ({x}) = F(x) − F(x−).
HMK

(5) Let F̃ have all the defining properties of a d.f. except that it is not assumed to be right
continuous. Show that Theorem 2.13 and Lemma 2.12 remain valid with F replaced by
F̃, provided that F(x), F(b), F(a) are replaced with F̃(x+), F̃(b+), F̃(a+), respectively.
How should Theorem 2.17 be modified?
(6) Show

    µ((a, b]) = F(b) − F(a),    µ([a, b]) = F(b) − F(a−),
    µ((a, b)) = F(b−) − F(a),   µ([a, b)) = F(b−) − F(a−),
    µ({x}) = F(x) − F(x−).
3  Random Variables and Expectation

3.1  Random variables and properties
• (Ω, F, P ) denote a probability space,
• R = (−∞, ∞) the real line,
• R̄ = [−∞, ∞] the extended real line,
• B = the Borel field on R (recall B is the σ-algebra generated by sets of the form
(a, b] with −∞ < a < b < ∞), and
• B ∗ = the extended Borel field: a set in B ∗ is just a set in B possibly enlarged by
one or both points ±∞.
Definition 3.1. A real, extended-valued random variable is a function X whose
domain is a set A in F and whose range is contained in R̄, such that for each B in B∗
we have

    {ω : X(ω) ∈ B} ∈ F.

A complex-valued random variable is a function on a set A in F to the complex plane
whose real and imaginary parts are both real, finite-valued random variables.
For A ⊂ R, define the “inverse mapping” of X from R to Ω:
X −1 (A) = {ω : X(ω) ∈ A} .
Theorem 3.2. For any function X from Ω to R (or R̄), the following hold.
• X⁻¹(Aᶜ) = (X⁻¹(A))ᶜ.

• X⁻¹(∪_α Aα) = ∪_α X⁻¹(Aα).

• X⁻¹(∩_α Aα) = ∩_α X⁻¹(Aα), where α runs over an arbitrary index set (not
necessarily countable).
Using the inverse mapping, X is a r.v. iff X⁻¹ carries members of B to members of F:

    ∀B ∈ B :  X⁻¹(B) ∈ F.
Theorem 3.3. X is a r.v. if and only if for each real number x, we have
{ω : X(ω) ≤ x} ∈ F.
Proof. The “if” part is trivial. To show the “only if” part, let A be the collection of
all subsets A of R for which X⁻¹(A) ∈ F. Then it is easy to verify that A is closed
under complementation and countable union, hence A is a B.F. Since A contains all
the intervals of the form (−∞, x] for x ∈ R, it follows that A ⊃ B1, implying
X⁻¹(B) ∈ F for every B ∈ B1. This shows that X is a r.v. by definition.
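On a finite Ω the criterion of Theorem 3.3 takes a concrete form: if F is the σ-algebra generated by a finite partition, then X is a r.v. exactly when X is constant on each block. The partition and the functions X1, X2 below are our own illustrative assumptions.

```python
# F generated by a finite partition of Ω = {0,...,5}: X is measurable
# w.r.t. it iff X is constant on every block of the partition.
partition = [frozenset({0, 1}), frozenset({2}), frozenset({3, 4, 5})]

def is_measurable(X):
    return all(len({X(w) for w in block}) == 1 for block in partition)

X1 = lambda w: 1.0 if w >= 3 else 0.0   # constant on each block
X2 = lambda w: float(w)                 # separates points inside a block
print(is_measurable(X1), is_measurable(X2))   # True False
```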
Theorem 3.4. Each r.v. on the probability space (Ω, F, P) induces a probability space
(R, B, µ) via the correspondence

    ∀B ∈ B :  µ(B) = P(X⁻¹(B)) = P(X ∈ B).    (3.1)

Proof. One verifies the nonnegativity, countable additivity and unity of the set function, and hence proves the theorem by definition.

Remark 6. The collection of sets X⁻¹(S), S ⊂ R, is a σ-algebra for any function
X. If X is a r.v., then the collection X⁻¹(B), B ∈ B, is called the σ-algebra generated by X.

The d.f. of X is defined as

    F(x) = µ((−∞, x]) = P(X ≤ x).
Example 3.5. Every function defined on a discrete measurable space (Ω, S ) is a r.v.
Example 3.6. A function f defined on the space (U , B, m) is a r.v. if and only if it is
Borel measurable, i.e., f −1 (B) ∈ B for every B ∈ B 1 .
Theorem 3.7. If X is an r.v. on (Ω, F), f is a Borel measurable function on (R, B 1 ),
then f (X) is an r.v.
Proof. The proof follows from
(f ◦ X)−1 (B 1 ) = X −1 (f −1 (B 1 )) ⊂ X −1 (B 1 ) ⊂ F.
If X, Y are r.v.'s on (Ω, F, P), then (X, Y) is called a random vector, which induces a probability distribution ν:

    ∀A ∈ B2 :  ν(A) = P((X, Y) ∈ A).

Analogously, one can define (X, Y)⁻¹, etc.

Theorem 3.8. If X and Y are r.v.'s and f is a Borel measurable function of two variables, then f(X, Y) is an r.v.

Proof. Analogous to the univariate case.

Corollary 3.9. If X is an r.v. and f is a continuous function on R, then f(X) is an
r.v.; in particular X^r for positive integer r, |X|^r for positive real r, e^{−λX}, e^{itX} for real
λ and t, are all r.v.'s (the last being complex-valued). If X and Y are r.v.'s, then
max(X, Y ),
min(X, Y ),
X ± Y,
XY,
X/Y
are r.v.’s, the last provided Y does not vanish.
Theorem 3.10. If {Xj, j ≥ 1} is a sequence of r.v.'s, then

    inf_j Xj,    sup_j Xj,    lim inf_j Xj,    lim sup_j Xj

are r.v.'s, not necessarily finite-valued with probability one though everywhere defined,
and

    lim_{j→∞} Xj

is a r.v. on the set on which there is either convergence or divergence to ∞.
Proof. The proof for each case is similar. For example, sup_j Xj is a r.v. since

    {sup_j Xj ≤ x} = ∩_j {Xj ≤ x},

whereas lim sup_j Xj is a r.v. since

    lim sup_j Xj = inf_n sup_{j≥n} Xj.
Definition 3.11. A r.v. X is called discrete iff there is a countable set B ⊂ R such that
P(X ∈ B) = 1.

For A ⊂ Ω, the indicator function of A is 1[A](ω) = 1 or 0 according as ω ∈ A
or ω ∉ A. (HMK) 1[A] is a r.v. iff A ∈ F. A countable partition of Ω is a countable
collection of disjoint sets {Λj} with Λj ∈ F such that Ω = ∪_j Λj. Then clearly

    Σ_j 1[Λj] = 1.

More generally, for a sequence {bj} of reals,

    ϕ(ω) = Σ_j bj 1[Λj](ω),    ω ∈ Ω,

defines a discrete r.v. Clearly every discrete r.v. corresponds to some such partition.
Homework

1. For any function X from Ω to R1 (not necessarily a r.v.), the inverse mapping
X⁻¹ has the following properties:

    X⁻¹(Aᶜ) = (X⁻¹(A))ᶜ,    X⁻¹(∪_α Aα) = ∪_α X⁻¹(Aα),    X⁻¹(∩_α Aα) = ∩_α X⁻¹(Aα),

where α runs over an arbitrary index set.

2. Let U be uniformly distributed on [0, 1]. For a d.f. F, define G(y) = sup{x : F(x) ≤ y},
y ∈ [0, 1]. Show G(U) has the d.f. F.

3. Suppose X has a continuous d.f. F. Then F(X) is uniformly distributed on
[0, 1]. What if F is not continuous?

4. The sum, difference, product, and quotient (denominator nonvanishing) of two discrete r.v.'s are all discrete.

5. If Ω is discrete (countable), then every r.v. is discrete.
Homework cont’d
(6) Express the indicators of E1 ∪ E2, E1 ∩ E2, E1 \ E2, E1 △ E2, lim inf En,
lim sup En in terms of those of E1, E2 or En.
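Problem 2 above (the inverse transform) is easy to simulate. In this sketch we assume F(x) = 1 − e^{−x}, for which G(y) = sup{x : F(x) ≤ y} reduces to −log(1 − y), and compare the empirical d.f. of G(U) with F; the sample size and seed are arbitrary choices of ours.

```python
import math
import random

# F(x) = 1 - exp(-x), so G(y) = sup{x : F(x) <= y} = -log(1 - y).
def G(y):
    return -math.log(1.0 - y)

def F(x):
    return 1.0 - math.exp(-x)

random.seed(0)
sample = [G(random.random()) for _ in range(100_000)]

for x in (0.5, 1.0, 2.0):
    emp = sum(s <= x for s in sample) / len(sample)
    print(x, round(emp, 3), round(F(x), 3))   # empirical d.f. vs F
```

The empirical d.f. of the simulated sample tracks F to within sampling error, as Problem 2 predicts.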
3.2  Expectation and properties
• For positive X = Σ_j bj 1[Λj], the expected value (expectation) of X is defined as

    E(X) = Σ_j bj P(Λj).

(Clearly the sum is either finite or infinite.)
• Let X be positive. For each positive integer m and each integer n ≥ 0, let

    Λmn = {ω : n/2^m ≤ X(ω) < (n + 1)/2^m}.

Then Λmn ∈ F. For each m, let

    Xm = Σ_{n=0}^∞ (n/2^m) 1[Λmn].

Clearly,

    Xm(ω) ≤ Xm+1(ω),    0 ≤ X(ω) − Xm(ω) < 1/2^m.
• Therefore,

    ∀ω ∈ Ω :  lim_{m→∞} Xm(ω) = X(ω).

We have

    E(Xm) = Σ_{n=0}^∞ (n/2^m) P(n/2^m ≤ X < (n + 1)/2^m).

If one of the E(Xm) is infinite, we define E(X) = ∞. Otherwise we define

    E(X) = lim_{m→∞} E(Xm).
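The dyadic construction above can be carried out directly on a finite uniform space. The space and the function X below are our assumed example (Ω = {0, ..., 999} with equally likely points, X(ω) = √(ω/1000)); E(Xm) is then increasing and within 2^{−m} of E(X).

```python
import math

# Ω = {0,...,999}, equally likely; X(ω) = sqrt(ω/1000) is positive.
omega = range(1000)
w_prob = 1.0 / len(omega)

def X(w):
    return math.sqrt(w / 1000.0)

def E_m(m):
    # E(X_m), where X_m = floor(2^m X) / 2^m is the m-th dyadic approximant
    return sum(math.floor(2**m * X(w)) / 2**m * w_prob for w in omega)

EX = sum(X(w) * w_prob for w in omega)
for m in (1, 2, 4, 8, 12):
    print(m, E_m(m))   # increases towards E(X), staying within 1/2^m of it
```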
• For arbitrary X, we have X = X⁺ − X⁻. We define

    E(X) = E(X⁺) − E(X⁻)

unless E(X⁺) = E(X⁻) = ∞.
• X has a finite or infinite expectation (or expected value) according as E(X)
is finite or ±∞. We use the notation

    E(X) = ∫_Ω X(ω) P(dω) = ∫ X dP.

Clearly, ∫_Λ X dP = E(X 1[Λ]), which is called the integral of X w.r.t. P over
Λ. We say that X is integrable w.r.t. P over Λ if the integral exists and is finite.
• For (R, B, µ), we write X = f, ω = x, and

    ∫_Λ X(ω) P(dω) = ∫_Λ f(x) µ(dx),

which is the Lebesgue–Stieltjes integral of f w.r.t. µ.

• If F is the d.f. of µ and Λ = (a, b], then we also write

    ∫_{(a,b]} f(x) dF(x).
1. Absolute integrability. X is integrable iff

    ∫ |X| dP < ∞.
2. Linearity.

    ∫ (aX + bY) dP = a ∫ X dP + b ∫ Y dP,

provided the right side is well defined.
3. Additivity. If the Λn are disjoint, then

    ∫_{∪_n Λn} X dP = Σ_n ∫_{Λn} X dP.
4. Positivity. If X ≥ 0, then

    ∫ X dP ≥ 0.

5. Monotonicity. If X1 ≤ X ≤ X2 a.e., then

    ∫ X1 dP ≤ ∫ X dP ≤ ∫ X2 dP.
6. Mean value theorem. If a ≤ X ≤ b a.e. on Λ, then

    a P(Λ) ≤ ∫_Λ X dP ≤ b P(Λ).
7. Modulus inequality.

    |∫ X dP| ≤ ∫ |X| dP.
8. Dominated Convergence Theorem (DCT). If Xn → X a.e. or in probability, and
|Xn| ≤ Y for all n with ∫ Y dP < ∞, then

    lim_{n→∞} ∫ Xn dP = ∫ X dP.    (3.2)

9. Bounded Convergence Theorem (BCT). If Xn → X a.e. or in probability, and
|Xn| ≤ M for all n and some constant M, then (3.2) holds.
10. Monotone convergence theorem (MCT). If Xn ≥ 0 and Xn ↑ X a.e. then (3.2)
holds.
11. Integration term by term. If Σ_n ∫ |Xn| dP < ∞, then

    ∫ Σ_n Xn dP = Σ_n ∫ Xn dP.
12. Fatou's lemma. If Xn ≥ 0 a.e., then

    ∫ lim inf_n Xn dP ≤ lim inf_n ∫ Xn dP.

Proof. Verify by definition.

A useful theorem.
Theorem 3.12.

    Σ_{n=1}^∞ P(|X| ≥ n) ≤ E(|X|) ≤ 1 + Σ_{n=1}^∞ P(|X| ≥ n).    (3.3)

Proof. By the change of order of summation and integration.
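Theorem 3.12 can be checked numerically for a discrete |X|. The distribution below is our assumption (P(|X| = k) = 2^{−k} for k = 1, ..., 20, with the negligible leftover tail mass ignored); for a nonnegative integer-valued |X| the lower bound in (3.3) is in fact an equality.

```python
# P(|X| = k) = 2^{-k} for k = 1,...,K; the leftover tail mass 2^{-K}
# is ignored, so the check is exact only up to that truncation.
K = 20
p = [2.0 ** -k for k in range(1, K + 1)]

E_absX = sum(k * pk for k, pk in zip(range(1, K + 1), p))
tail_sum = sum(sum(p[n - 1:]) for n in range(1, K + 1))  # Σ_n P(|X| >= n)

print(tail_sum, E_absX, 1.0 + tail_sum)   # lower bound, E|X|, upper bound
```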