Lecture Notes
on
Advanced Probability and Stochastic Processes

ESTHER FROSTIG
Department of Statistics
The University of Haifa
Haifa, Israel

Draft, Do Not Circulate
© Esther Frostig, 2016

Lecture notes, initial draft
Last modified March 20, 2016
Contents

1 Probability: basic definition                 1
  1.1 Basic definitions                         1
      1.1.1 Sample space                        1
      1.1.2 Set operations                      2
      1.1.3 Working with sets                   2
      1.1.4 σ-field                             3
      1.1.5 Probability                         4
Lecture 1

Probability: basic definition

1.1 Basic definitions

1.1.1 Sample space

In everyday statements, the word "probability" or "chance" usually denotes a number between
0 and 1 that quantifies our belief in the occurrence of an "uncertain" or "random" event. For
example, the probability that the value of a certain share will increase is 0.6, or the probability
that a die will fall on an even number is 0.5. Our aim is to formalize probability theory and to
build a rigorous mathematical foundation for it, in order to apply it to more sophisticated
scenarios in actuarial science or finance.
Definition 1.1. Random experiment: An experiment with uncertain outcomes.
Examples
1. Throw a die.
2. Toss a coin twice.
3. Toss a coin infinitely many times.
Definition 1.2. Sample space: Ω, the set of all possible outcomes of a random experiment.
1. Throw a die: Ω = {1, 2, 3, 4, 5, 6}.
2. Toss a coin twice: Ω = {HH, HT, TH, TT}.
3. Toss a coin infinitely many times: Ω = {all infinite sequences of H and T}.
Definition 1.3. An event is a subset of Ω (a more precise definition will come later).
Examples In 1, let A be the event that the result is even: A = {2, 4, 6}.
In 2, let B be the event of H in the first toss: B = {HH, HT}.
In 3, let C be the event that the first toss is H: C = {all sequences beginning with H}.
1.1.2 Set operations

(This was not given in class, just a review.)
Throughout this section all sets are subsets of some sample space Ω. Notation:
1. w ∈ A if the point (or result) w is in A. In example 1, let A = {2, 4, 6}; then 2 ∈ A. If the
die is tossed and "2" appears, we say that A occurred.
2. Let A and B be events (in Ω). A ⊂ B (B includes A) if w ∈ A implies that w ∈ B (if
A occurred, so did B).
3. A ⊂ B and B ⊂ A implies that A = B.
4. Ac or Ā denotes the complement: Ac = {w : w ∉ A}.
5. Φ = Ωc, the empty set.
Definition 1.4. Let A and B be subsets of Ω. The union of A and B, denoted by A ∪ B, is
the set of elements that belong to A or B (or both).
Definition 1.5. Let A1, A2, · · · , An be sets in Ω. Then ∪_{i=1}^n Ai is the set of outcomes that
are in at least one of the sets A1, A2, · · · , An. Similarly define ∪_{i=1}^∞ Ai.
Ex. Toss a coin twice: Ω = {HH, HT, TH, TT}. Let Ai be the event of H in the i-th toss.
Then A1 ∪ A2 = {HH, HT, TH} and A1 ∩ A2 = {HH}.
Definition 1.6. Let A and B be subsets of Ω. The intersection of A and B, denoted by A ∩ B,
is the set of elements that belong to both A and B.
Definition 1.7. Let A1, A2, · · · , An be sets in Ω. Then ∩_{i=1}^n Ai is the set of outcomes that
are in all the sets A1, A2, · · · , An. Similarly define ∩_{i=1}^∞ Ai.
Definition 1.8. A and B are disjoint if A ∩ B = Φ.
1.1.3 Working with sets
Let Ai, B be sets in Ω. Then:
B ∩ (∪i Ai) = ∪i (B ∩ Ai)
B ∪ (∩i Ai) = ∩i (B ∪ Ai)
De Morgan rules:
(∪i Ai)c = ∩i Aci
(∩i Ai)c = ∪i Aci
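The distributive laws and De Morgan rules above can be checked mechanically on a small finite sample space. A minimal sketch; the particular sets chosen are my own illustrative examples, not from the notes:

```python
# Numerical check of the distributive and De Morgan rules on a finite
# sample space; complements are taken inside omega.
omega = set(range(1, 11))            # sample space {1, ..., 10}
A = [{1, 2, 3}, {3, 4, 5}, {5, 6}]   # a finite family A_1, A_2, A_3
B = {2, 3, 5, 7}

union_A = set().union(*A)
inter_A = omega.intersection(*A)

# Distributive laws
assert B & union_A == set().union(*(B & Ai for Ai in A))
assert B | inter_A == omega.intersection(*(B | Ai for Ai in A))

# De Morgan rules
assert omega - union_A == omega.intersection(*(omega - Ai for Ai in A))
assert omega - inter_A == set().union(*(omega - Ai for Ai in A))
print("all identities hold")
```

Any choice of finite sets passes these assertions, since the identities are set-theoretic.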
1.1.4 σ-field
Not all subsets of Ω will be considered events. Some structure is imposed on the collection
of subsets that are "events". The collection (or family) of such events is called a σ-algebra.
Definition 1.9. A collection F of subsets of a non-empty set Ω is called a σ-field if it satisfies
the following conditions:
1. Φ ∈ F.
2. If A ∈ F, so is Ac.
3. Whenever a sequence of sets A1, A2, · · · ∈ F, then ∪_{i=1}^∞ Ai ∈ F.
Examples
1. {Φ, Ω}, called the trivial σ-field.
2. For A ⊂ Ω, F = {Φ, Ω, A, Ac}, the σ-field generated by A.
3. When
Ω = {HH, HT, TH, TT},                                              (1.1)
F = {Φ, Ω, {HH}, {HT}, {TH}, {TT},
{HH, HT}, {HH, TH}, {HH, TT}, {TH, HT}, {TT, HT}, {TT, TH},
{HH, HT, TH}, {HH, HT, TT}, {HH, TH, TT}, {TH, HT, TT}}            (1.2)
The last σ-field has 2^4 = 16 elements. Why?
σ-field generated by a collection of sets: Let A = {Ai, i = 1, · · · , n} be a
collection of subsets of Ω. The σ-field generated by A, denoted σ(A), is the smallest σ-field
containing the sets in A.
Consider the family of all σ-fields {Fj, j ∈ J} containing A; then σ(A) = ∩_{j∈J} Fj.
Practically, in order to construct σ(A) we just add subsets as needed.
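For a finite Ω the "add subsets as needed" procedure can be sketched as a closure algorithm: repeatedly add complements and unions until nothing new appears. This is my own illustrative code, not part of the notes:

```python
from itertools import combinations

def generate_sigma_field(omega, collection):
    """Smallest sigma-field on a finite omega containing the given sets,
    built by closing under complement and (finite) union."""
    omega = frozenset(omega)
    F = {frozenset(), omega} | {frozenset(s) for s in collection}
    changed = True
    while changed:
        changed = False
        for s in list(F):                 # close under complement
            c = omega - s
            if c not in F:
                F.add(c)
                changed = True
        for s, t in combinations(list(F), 2):   # close under union
            u = s | t
            if u not in F:
                F.add(u)
                changed = True
    return F

omega = {"HH", "HT", "TH", "TT"}
F = generate_sigma_field(omega, [{"HH", "HT", "TH"}])
print(len(F))   # 4, as in Example 1.1: {Φ, Ω, {HH,HT,TH}, {TT}}
```

On a finite space closure under complements and finite unions already gives a σ-field, since countable unions of finitely many distinct sets reduce to finite unions.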
Borel σ-field. Assume that Ω = [0, 1]. How can one define a σ-field? A very common
choice is the following: consider the σ-algebra containing all closed intervals (i.e. {[c, d] :
0 ≤ c < d ≤ 1}), then add what is necessary to form a σ-field. Clearly, this σ-field contains
also all the open intervals, since
∪_{n=1}^∞ [a + 1/n, b − 1/n] = (a, b)
Definition 1.10. Let F1 and F2 be two σ-fields defined on the same sample space. Assume
that F1 ⊂ F2, meaning that any event in F1 is also in F2; then we say that F1 is smaller
than F2.
Definition 1.11. Let A be a collection of subsets of Ω. The σ-field generated by A is the
smallest σ-field containing all the sets in A.
Example 1.1. In (1.1), let A = {{HH, HT, TH}}. Then the smallest σ-field containing it
is {Φ, Ω, {HH, HT, TH}, {TT}}.
Definition 1.12. Let Ω = [a, b]. The Borel σ-field over [a, b] is the smallest σ-field containing
all the closed intervals in [a, b].
1.1.5 Probability
Definition 1.13. Let Ω be a nonempty set, and let F be a σ-field of subsets of Ω. A
probability measure P on (Ω, F) is a function P : F → [0, 1] satisfying:
1. P(Ω) = 1.
2. Whenever A1, A2, · · · is a sequence of disjoint sets in F (i.e. Ai ∩ Aj = Φ, i ≠ j),
then
P(∪_{i=1}^∞ Ai) = Σ_{i=1}^∞ P(Ai)
Some remarks
1. Clearly property 2 of the definition holds for a finite union of disjoint sets: let
A1, · · · , An be disjoint sets and define Aj = Φ for j = n + 1, n + 2, · · · . Then
P(∪_{i=1}^n Ai) = P(∪_{i=1}^∞ Ai) = Σ_{i=1}^∞ P(Ai) = Σ_{i=1}^n P(Ai)
2. P(Ac ) = 1 − P(A).
3. Let B \ A = B ∩ Ac. If A ⊂ B then P(B) = P(A) + P(B \ A) ≥ P(A).
4. P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
5. Generalization of 4 (inclusion-exclusion): let A1, · · · , An be events. Then
P(∪_{i=1}^n Ai) = Σ_{i=1}^n P(Ai) − Σ_{i1<i2} P(Ai1 ∩ Ai2) + Σ_{i1<i2<i3} P(Ai1 ∩ Ai2 ∩ Ai3) + · · · + (−1)^{n+1} P(A1 ∩ · · · ∩ An)
Proofs
Proof of 4: A ∪ B = (A ∩ Bc) ∪ (Ac ∩ B) ∪ (A ∩ B). The three events in the union are
disjoint. Therefore:
P(A ∪ B) = P(A ∩ Bc) + P(Ac ∩ B) + P(A ∩ B)                        (1.3)
P(A) = P(A ∩ Ω) = P(A ∩ (B ∪ Bc)) = P((A ∩ B) ∪ (A ∩ Bc))
     = P(A ∩ B) + P(A ∩ Bc)
Thus:
P(A ∩ Bc) = P(A) − P(A ∩ B)                                        (1.4)
Similarly, P(B ∩ Ac) = P(B) − P(A ∩ B). Substitute the last identities in (1.3). Property 5 is
proved by induction.
Examples:
When Ω is finite, e.g. tossing a coin twice with Ω = {HH, HT, TH, TT}, it is natural
to take the σ-field to be the collection of all subsets of Ω, as in (1.2). In this case it is enough
to assign a probability to each outcome in Ω. Assume that we assign probabilities
P({HH}) = P({HT}) = P({TH}) = P({TT}) = 1/4;
then P({the first toss is head}) = P({HH, HT}) = 1/2, etc.
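The finite-sample-space recipe just described translates directly into code: assign a weight to each outcome, then sum weights over an event. A minimal sketch (the event names are my own):

```python
from fractions import Fraction

# Probability on a finite sample space: assign a weight to each outcome;
# the probability of any event is the sum over its outcomes.
omega = ["HH", "HT", "TH", "TT"]
p = {w: Fraction(1, 4) for w in omega}   # fair coin, as in the text

def prob(event):
    return sum(p[w] for w in event)

first_toss_head = {w for w in omega if w[0] == "H"}
print(prob(first_toss_head))   # 1/2
```

Using `Fraction` keeps the arithmetic exact, which matches the hand computations in the notes.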
Increasing and decreasing sets.
Definition 1.14. A sequence of sets A1, A2, · · · is increasing if A1 ⊆ A2 ⊆ · · · . For an
increasing sequence define
A = ∪_{i=1}^∞ Ai = lim_{i→∞} Ai
A sequence of sets B1, B2, · · · is decreasing if B1 ⊇ B2 ⊇ · · · . For a decreasing sequence
define
B = ∩_{i=1}^∞ Bi = lim_{i→∞} Bi
Ex. for an increasing sequence: Ai = [1/i, 1 − 1/i], i = 2, 3, · · · . Then
A = ∪_{i=2}^∞ Ai = lim_{i→∞} Ai = (0, 1)
Ex. for a decreasing sequence: Bi = [−1/i, 1 + 1/i], i = 1, 2, 3, · · · . Then
B = ∩_{i=1}^∞ Bi = lim_{i→∞} Bi = [0, 1]
Lemma 1.1. Let A1 ⊆ A2 ⊆ · · · be an increasing sequence. Then
P(∪_{i=1}^∞ Ai) = P(lim Ai) = lim P(Ai)
Proof. Define D1 = A1 and Di = Ai ∩ Ac_{i−1} for i ≥ 2. Then P(Di) = P(Ai ∩ Ac_{i−1}) =
P(Ai) − P(Ai ∩ A_{i−1}) = P(Ai) − P(A_{i−1}). Moreover: a. the Dj are disjoint;
b. An = ∪_{i=1}^n Di; c. A = lim An = ∪_{i=1}^∞ Di;
d. P(A) = Σ_{i=1}^∞ P(Di) = lim Σ_{i=1}^n P(Di) = lim P(An).
Similarly,
Lemma 1.2. Let B1 ⊇ B2 ⊇ · · · be a decreasing sequence. Then
P(∩_{i=1}^∞ Bi) = P(lim Bi) = lim P(Bi)
Proof. Bci is increasing. P(B) = P(∩_{i=1}^∞ Bi) = 1 − P((∩_{i=1}^∞ Bi)c) = 1 − P(∪_{i=1}^∞ Bci)
= 1 − lim P(Bci) = lim(1 − P(Bci)) = lim P(Bi).
Definition 1.15. Let B(R) be the σ-algebra of Borel subsets of R. The Lebesgue measure
on R, denoted by L, assigns to each B ∈ B(R) a number in [0, ∞) or the value ∞ so
that
1. L[a, b] = b − a;
2. if B1, B2, · · · is a sequence of disjoint sets in B(R), then
L(∪_{i=1}^∞ Bi) = Σ_{i=1}^∞ L(Bi)
Theorem 1.3. Let Ω = [0, 1] and consider the σ-field generated by all the open subsets of
[0, 1]. For 0 ≤ a < b ≤ 1 define
µ(a, b) = b − a.                                                   (1.5)
Then there is a unique probability measure µ on the Borel sets in [0, 1] such that (1.5)
holds for all 0 ≤ a < b ≤ 1.
Remark 1.1. Let (Ω, F, P) be a probability space and let A ∈ F. We say that A occurs
almost surely with respect to P if P(A) = 1.
Similarly, when considering (R, B) we say that a property holds almost everywhere if the
Lebesgue measure of the set where it does not hold is 0.
Lecture 2

Random variables

Definition 2.1. Let (Ω, F, P) be a probability space. A random variable X is a real valued
function (i.e. a function from Ω to (−∞, ∞)) such that for every Borel set B of R, {w :
X(w) ∈ B} ∈ F.
For B ⊂ R define X^{−1}(B) = {w : X(w) ∈ B}. Then:
Theorem 2.1.
1. X^{−1}(Ac) = (X^{−1}(A))c.
2. X^{−1}(∪_{i=1}^∞ Ai) = ∪_{i=1}^∞ X^{−1}(Ai).
Remark 2.1. Clearly, (−∞, x] is in B, since (−∞, x] = ∪_{i=1}^∞ [−i, x]. Similarly it can be
shown that all closed intervals are in the smallest σ-algebra generated by intervals of
the form (−∞, x]. Thus B is generated by intervals of the form (−∞, x] (why?).
Thus:
Theorem 2.2. Let (Ω, F, P) be a probability space. A random variable X is a real valued
function (i.e. a function from Ω to (−∞, ∞)) such that X^{−1}((−∞, x]) ∈ F for every x.
Definition 2.2. Let X be a real valued function from Ω to a subset of R. Define
the σ-algebra generated by X as the smallest σ-algebra containing the sets X^{−1}(B), where
B ∈ B. In fact the collection {X^{−1}(B) : B ∈ B} is itself a σ-algebra, so it equals σ(X).
Notation: σ(X) is the σ-algebra generated by X.
Remark 2.2. Let X, Y be defined on (Ω, F, P). Then Y is σ(X)-measurable if and only if
there is a Borel function g : R → R such that Y = g(X).
2.0.1 Distribution function

Let X be a r.v. The distribution function of X, FX, is defined as follows: FX(x) = P({w :
X(w) ≤ x}).
Properties of the distribution function:
1. If x < y then F(x) ≤ F(y).
2. lim_{x→−∞} F(x) = 0; lim_{x→∞} F(x) = 1.
3. F is right continuous: F(x + h) → F(x) as h ↓ 0.
Consider the set of real numbers R with the Borel σ-field. Define the following measure on the
Borel sets in R: µ(a, b] = F(b) − F(a), µ[a, b] = F(b) − F(a−). Thus we defined a probability
measure on the intervals. This can be extended to a measure on all Borel sets. (The
definition of a measure on the σ-field is the same as the definition of probability,
without the restriction that the measure of the whole sample space is 1.)
Distribution function of two random variables: On R^2 consider the smallest σ-algebra
generated by the sets B1 × B2, where the Bi are Borel sets.
Let X, Y be random variables on a probability space (Ω, F, P). Define
FXY(x, y) = P(X ≤ x, Y ≤ y)
In particular, when there is a density,
FXY(x, y) = P(X ≤ x, Y ≤ y) = ∫_{−∞}^x ∫_{−∞}^y fXY(s, t) dt ds
We say that fXY is the density of (X, Y) (or that FXY has a density).
2.0.2 Integrals and expectations

Riemann integrals: Let f be a real valued function from R to R. Let a < b, let a =
x0 < x1 < x2 < · · · < xn = b, and set δn = max(x_{i+1} − xi, i = 0, · · · , n − 1). Define
Mi = max(f(x) : xi ≤ x < x_{i+1}) and mi = min(f(x) : xi ≤ x < x_{i+1}). Then define
S_{h,n} = Σ_{i=0}^{n−1} Mi(x_{i+1} − xi) and S_{l,n} = Σ_{i=0}^{n−1} mi(x_{i+1} − xi).
If S_{h,n} → Ih and S_{l,n} → Il as δn → 0, and the limits are equal, we say that the Riemann
integral exists and equals ∫_a^b f(x)dx. It exists if f is continuous almost everywhere (with
respect to the Lebesgue measure).
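The upper and lower sums S_{h,n} and S_{l,n} can be sketched numerically. This is my own illustration, with the cell max/min approximated by endpoint values (exact when f is monotone on each cell):

```python
# Upper and lower Riemann sums for f on [a, b]; for a continuous f both
# converge to the integral as the mesh shrinks.
def riemann_sums(f, a, b, n):
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    # cell max/min approximated by the endpoint values (exact when f is
    # monotone on each cell, as for f(x) = x^2 on [0, 1] below)
    hi = sum(max(f(xs[i]), f(xs[i + 1])) * (xs[i + 1] - xs[i]) for i in range(n))
    lo = sum(min(f(xs[i]), f(xs[i + 1])) * (xs[i + 1] - xs[i]) for i in range(n))
    return lo, hi

lo, hi = riemann_sums(lambda x: x * x, 0.0, 1.0, 1000)
print(lo, hi)   # both close to 1/3
```

Here hi − lo = (f(1) − f(0))/n, so the gap shrinks like 1/n as the partition refines.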
Defining expectation:
Let (Ω, F, P) be a probability space. Assume that Ω is finite, i.e. Ω = {w1, · · · , wN},
and let X be a random variable on Ω. Then
E[X] = Σ_{i=1}^N X(wi)P(wi)
Consider the following indicator or "simple" random variable:
IA(w) = 1 if w ∈ A, and 0 otherwise.
Then E[IA] = P(A).
Let Aj ∈ F, j = 1, 2, · · · be a partition of Ω, i.e. disjoint sets whose union is Ω. Let
aj, j = 1, 2, · · · be a sequence of non-negative numbers. Define
X(w) = Σ_j aj IAj(w);
this sum is finite or infinite but well defined.
In order to define expectation, let us first define the Lebesgue integral. Consider the
interval [0, b], a Borel measurable non-negative function g on it, and the Lebesgue measure
L defined on the Borel σ-field on [0, b]. Let A be a Borel subset of [0, b] and g(x) = IA(x);
then the Lebesgue integral is defined as follows:
∫_0^b IA(x) dL(x) = L(A)
For a simple function g(x) = Σ_j cj IAj(x),
∫_0^b g(x) dL(x) = Σ_j cj L(Aj)
For a general Borel function, approximate g by step functions. Let
A_{m,n} = {x : m/2^n ≤ g(x) < (m + 1)/2^n}, m = 0, 1, 2, · · · , 2^{2n} − 1
gn(x) = Σ_{m=0}^{2^{2n}−1} (m/2^n) IA_{m,n}(x)
In = ∫_0^b gn(x) dL(x) = Σ_{m=0}^{2^{2n}−1} (m/2^n) L(A_{m,n})
In is increasing in n. Since g is non-negative the limit exists (it might be infinite) and is
denoted by ∫_0^b g(x) dL(x).
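The dyadic simple-function construction above can be sketched numerically. In this sketch (my own, not from the notes) the Lebesgue measure of each A_{m,n} is approximated on a fine grid of [0, b]:

```python
# Dyadic simple-function approximation g_n of a non-negative g on [0, b];
# L(A_{m,n}) is estimated by counting grid points, so the sum below is
# exactly sum over m of (m/2^n) * (approximate measure of A_{m,n}).
def lebesgue_integral(g, b, n, grid=100_000):
    dx = b / grid
    total = 0.0
    for i in range(grid):
        x = i * dx
        m = int(g(x) * 2 ** n)           # m with m/2^n <= g(x) < (m+1)/2^n
        m = min(m, 2 ** (2 * n) - 1)     # truncate at the top level, as in the text
        total += (m / 2 ** n) * dx       # contribution (m/2^n) * L(A_{m,n})
    return total

# For g(x) = x on [0, 1] the integral is 1/2; the I_n increase towards it.
for n in (2, 4, 8):
    print(n, lebesgue_integral(lambda x: x, 1.0, n))
```

The printed values increase with n and stay below 1/2, illustrating that gn ≤ g and In ↑ ∫ g dL.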
Now return to the definition of expectation of a general random variable.
First consider a non-negative r.v. X. For fixed n let
A_{m,n} = {w : m/2^n ≤ X(w) < (m + 1)/2^n}, m = 0, · · · , 2^{2n} − 1
Define Xn(w) = Σ_{m=0}^{2^{2n}−1} (m/2^n) IA_{m,n}(w). Then
E[Xn] = Σ_{m=0}^{2^{2n}−1} (m/2^n) P(A_{m,n})
This is an increasing sequence and thus has a proper limit (which can be ∞). Define E[X] =
lim_{n→∞} E[Xn] and denote it by
E[X] = ∫_Ω X(w) dP(w)
When X assumes both negative and positive values, define X+ = max(X, 0) and X− =
max(−X, 0) = − min(X, 0). Then:
1. If E[X+] < ∞ and E[X−] < ∞, then E[X] = E[X+] − E[X−].
2. If E[X+] = ∞ and E[X−] < ∞, then E[X] = ∞.
3. If E[X+] < ∞ and E[X−] = ∞, then E[X] = −∞.
4. If E[X+] = ∞ and E[X−] = ∞, then E[X] is undefined.
Note that P(m/2^n ≤ X(w) < (m + 1)/2^n) = µ[m/2^n, (m + 1)/2^n), where µ is the
measure (or probability measure) on R induced by defining µ(a, b] = F(b) − F(a).
Thus we can write E[X] = ∫_0^∞ x dµ(x). Usually it is written as E[X] = ∫_0^∞ x dF(x),
where F is the distribution function of X. If F has a density this can be written as
E[X] = ∫_0^∞ x f(x) dx. If g is a function from R to R and X is a random variable, then
so is g(X), and
E[g(X)] = ∫ g(x) dµ(x)
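As a sanity check of E[g(X)] = ∫ g(x) dµ(x), here is my own sketch computing E[X^2] for a standard normal X two ways: via the density integral and by Monte Carlo (the true value is Var(X) = 1):

```python
import math
import random

# E[g(X)] for X ~ N(0,1) and g(x) = x^2, computed two ways.
def density(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# midpoint rule for the density integral int g(x) f(x) dx on [-8, 8]
n, a, b = 20_000, -8.0, 8.0
h = (b - a) / n
integral = sum(((a + (i + 0.5) * h) ** 2) * density(a + (i + 0.5) * h) * h
               for i in range(n))

# Monte Carlo average of g(X) over simulated draws
random.seed(0)
mc = sum(random.gauss(0, 1) ** 2 for _ in range(100_000)) / 100_000
print(round(integral, 4), round(mc, 2))   # both close to 1
```

The agreement of the two numbers illustrates that the abstract integral ∫ g dµ and the simulation average estimate the same quantity.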
2.1 Convergence of integrals
Definition 2.3. Let (Ω, F, P) be a probability space and X1, X2, · · · a sequence of
random variables. We say that X1, X2, · · · converges to X almost surely if P(Xn → X) = 1.
Definition 2.4. Let {fj, j ≥ 1} be a sequence of Borel measurable functions. We say
that fj converges almost everywhere to f if lim_{j→∞} fj(x) = f(x) except for a set with
Lebesgue measure 0.
Question: Assume that Xn → X with probability 1. Is it always true that E[Xn] →
E[X]?
A similar question: Assume that fn → f almost everywhere. Is it always true that
∫ fn(x)dx → ∫ f(x)dx? The answer is: not necessarily.
Example:
fn(x) = (n/√(2π)) e^{−n^2 x^2/2}
For x ≠ 0, fn(x) → 0 as n → ∞; when x = 0 the limit is ∞. Hence fn converges
almost everywhere to 0, but
lim ∫_{−∞}^∞ fn(x)dx = 1 ≠ ∫_{−∞}^∞ lim fn(x)dx = 0
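The counterexample can be confirmed numerically: fn is the N(0, 1/n^2) density, so each fn integrates to 1 while fn(x) collapses to 0 at any fixed x ≠ 0. A small sketch of my own:

```python
import math

# f_n(x) = n/sqrt(2 pi) exp(-n^2 x^2 / 2): integral 1 for every n, but
# pointwise limit 0 for every x != 0.
def f(n, x):
    return n / math.sqrt(2 * math.pi) * math.exp(-(n * x) ** 2 / 2)

def integral(n, grid=100_000):          # midpoint rule on [-10, 10]
    h = 20 / grid
    return sum(f(n, -10 + (i + 0.5) * h) * h for i in range(grid))

for n in (1, 10, 100):
    print(n, round(integral(n), 3), f(n, 0.5))
```

The integral column stays at 1 while the pointwise value at x = 0.5 vanishes, exactly the failure of interchanging limit and integral.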
Convergence theorems
Theorem 2.3. Let X1, X2, · · · be a sequence of random variables converging almost surely
to a random variable X. Then:
1. Monotone convergence theorem: If X1 ≤ X2 ≤ X3 ≤ · · · almost surely, then
lim_{n→∞} E[Xn] = E[X].
2. Dominated convergence theorem: If |Xn| ≤ Y with E[Y] < ∞, then
lim_{n→∞} E[Xn] = E[X].
Similarly with Borel functions: let f1, f2, · · · be a sequence of functions converging
almost everywhere to a function f. Then:
1. Monotone convergence theorem: If f1 ≤ f2 ≤ f3 ≤ · · · almost everywhere, then
lim_{n→∞} ∫ fn(x)dx = ∫ f(x)dx.
2. Dominated convergence theorem: If |fn| ≤ g with ∫ g(x)dx < ∞, then
lim_{n→∞} ∫ fn(x)dx = ∫ f(x)dx.
2.2 Independence

Let (Ω, F, P) be a probability space. Two events A and B are independent if
P(A ∩ B) = P(A)P(B)
Let F1 and F2 be two σ-algebras. F1 and F2 are independent if every A ∈ F1 and
B ∈ F2 are independent.
Let X and Y be random variables defined on (Ω, F, P). X and Y are independent if
σ(X) and σ(Y) are independent.
Example
Toss a coin twice and let Si be the number of heads in the first i tosses. Then
σ(S1) = {{HH, HT}, {TH, TT}, {HH, HT, TH, TT}, Φ}
σ(S2) = {{HH}, {TT}, {TH, HT}, {HH, TT}, {HH, TH, HT}, {TT, TH, HT},
{HH, HT, TH, TT}, Φ}
Clearly σ(S1) and σ(S2) are dependent.
σ(S2 − S1) = {{TT, HT}, {HH, TH}, {HH, TT, TH, HT}, Φ}
σ(S1) and σ(S2 − S1) are independent.
If X and Y are independent and C and D are any Borel sets, then
P((X ∈ C) ∩ (Y ∈ D)) = P(X ∈ C)P(Y ∈ D).
This implies that
FXY(x, y) = P(X ≤ x, Y ≤ y) = FX(x)FY(y)
and, if there is a joint density,
fXY(x, y) = fX(x)fY(y).
If g, h are any Borel functions, then g(X), h(Y) are also independent.
2.3 Change of measure

On the same sample space one can define several probability measures. In mathematical
finance there is the actual probability measure and the so-called risk-neutral probability
measure. Here we study how to derive one probability measure from another.
Example: Consider the binomial model with three tosses; assume H has probability 1/3 and
T probability 2/3 under the probability measure P. Consider another probability measure P̃
with P̃(H) = P̃(T) = 1/2. Define a random variable Z as follows: Z(w) = P̃(w)/P(w).

w      P(w)           P̃(w)     Z(w)
HHH    (1/3)^3        (1/2)^3   (3/2)^3
HHT    (1/3)^2(2/3)   (1/2)^3   (3/2)^3(1/2)
HTH    (1/3)^2(2/3)   (1/2)^3   (3/2)^3(1/2)
THH    (1/3)^2(2/3)   (1/2)^3   (3/2)^3(1/2)
HTT    (1/3)(2/3)^2   (1/2)^3   (3/2)^3(1/4)
THT    (1/3)(2/3)^2   (1/2)^3   (3/2)^3(1/4)
TTH    (1/3)(2/3)^2   (1/2)^3   (3/2)^3(1/4)
TTT    (2/3)^3        (1/2)^3   (3/2)^3(1/8)

Clearly E[Z] = 1 (under the measure P).
Thus P̃(w) = Z(w)P(w).
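The table can be reproduced exactly with rational arithmetic; a short sketch of my own:

```python
from fractions import Fraction
from itertools import product

# Three-toss example: under P a head has probability 1/3, under the new
# measure P~ both faces have probability 1/2, and Z(w) = P~(w)/P(w).
pH, pT = Fraction(1, 3), Fraction(2, 3)

def P(w):
    r = Fraction(1)
    for c in w:
        r *= pH if c == "H" else pT
    return r

def P_tilde(w):
    return Fraction(1, 2) ** len(w)

omega = ["".join(t) for t in product("HT", repeat=3)]
Z = {w: P_tilde(w) / P(w) for w in omega}

EZ = sum(Z[w] * P(w) for w in omega)   # expectation of Z under P
print(EZ)          # 1
print(Z["HHH"])    # 27/8, i.e. (3/2)^3
```

Since Z(w)P(w) = P̃(w) by construction, E[Z] = Σ_w P̃(w) = 1, which the code confirms exactly.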
Theorem 2.4. Let (Ω, F, P) be a probability space and Z an almost surely nonnegative
random variable with E[Z] = 1. For A ∈ F define:
P̃(A) = E[Z 1A] = ∫_A Z(w) dP(w)
Then P̃ is a probability measure.
Proof
1. P̃(A) ≥ 0.
2. P̃(Ω) = ∫_Ω Z(w) dP(w) = E[Z] = 1.
3. Let A1, A2, · · · be disjoint sets. Let Bn = ∪_{i=1}^n Ai and B∞ = ∪_{i=1}^∞ Ai. Then
P̃(Bn) = ∫_{Bn} Z(w) dP(w) = ∫_Ω 1Bn(w)Z(w) dP(w) = Σ_{i=1}^n ∫_Ω 1Ai(w)Z(w) dP(w)
Since 1B1 ≤ 1B2 ≤ · · · , by monotone convergence
P̃(B∞) = lim_{n→∞} ∫_Ω 1Bn(w)Z(w) dP(w) = ∫_Ω 1B∞(w)Z(w) dP(w)
       = Σ_{i=1}^∞ ∫_Ω 1Ai(w)Z(w) dP(w) = Σ_{i=1}^∞ P̃(Ai)
Definition 2.5. Two probability measures on a nonempty set Ω with σ-algebra F are
equivalent if they agree on which sets in F have probability 0:
P(A) = 0 ⇐⇒ P̃(A) = 0
Another way to write 2.4 is : Let X a random variable and Z as defined in theorem
2.4. Then under P̃
Z
X(w)Z(w)dP(w)
Ẽ[X] =
Ω
Example 2.1. Let X be a normally distributed random variable with mean 0 and variance
1, and let Y = X + µ. Let Z = exp(−µX − µ^2/2). Then:
1. Z > 0.
2.
E[Z] = (1/√(2π)) ∫_{−∞}^∞ exp(−µx − µ^2/2) exp(−x^2/2) dx
     = (1/√(2π)) ∫_{−∞}^∞ exp(−(x^2 + 2µx + µ^2)/2) dx
     = (1/√(2π)) ∫_{−∞}^∞ exp(−(x + µ)^2/2) dx = 1
Thus Z satisfies the requirements of Theorem 2.4.
Let us calculate the distribution of Y under P̃:
P̃(Y ≤ y) = E[Z 1_{Y≤y}] = E[Z 1_{X≤y−µ}]
= (1/√(2π)) ∫_{−∞}^{y−µ} exp(−µx − µ^2/2) exp(−x^2/2) dx
= (1/√(2π)) ∫_{−∞}^{y−µ} exp(−(x + µ)^2/2) dx
= (1/√(2π)) ∫_{−∞}^y exp(−s^2/2) ds
(substituting s = x + µ). Thus under P̃, Y is normal with mean 0.
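Example 2.1 can be checked by simulation: the quantity E[Z 1_{Y ≤ y}] estimated under P should match the standard normal distribution function at y. A Monte Carlo sketch of my own, with an arbitrary choice of µ and y:

```python
import math
import random

# Monte Carlo check: X ~ N(0,1), Y = X + mu, Z = exp(-mu X - mu^2/2);
# then P~(Y <= y) = E[Z 1_{Y <= y}] should equal Phi(y).
mu, y = 1.5, 0.3
random.seed(1)
N = 400_000
acc = 0.0
for _ in range(N):
    x = random.gauss(0, 1)
    if x + mu <= y:                          # the event {Y <= y} = {X <= y - mu}
        acc += math.exp(-mu * x - mu * mu / 2)
tilde_prob = acc / N

phi_y = 0.5 * (1 + math.erf(y / math.sqrt(2)))   # standard normal CDF at y
print(round(tilde_prob, 2), round(phi_y, 2))     # close to each other
```

This is exactly a change-of-measure (importance-weighting) computation: samples drawn under P are reweighted by Z to produce P̃-probabilities.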
The proof of the following is beyond the scope of this course:
Theorem 2.5. Let P and P̃ be equivalent probability measures on (Ω, F). Then there exists
a positive random variable Z with E[Z] = 1 such that
P̃(A) = ∫_A Z(w) dP(w)
for every A in F.
Z is the Radon-Nikodym derivative of P̃ with respect to P.
2.4 Conditional expectation

Let (X, Y) be random variables defined on the same probability space, with joint distribution
FXY(x, y) = P(X ≤ x, Y ≤ y). If Y is discrete,
P(X ≤ x|Y = y) = P(X ≤ x, Y = y) / P(Y = y),
and when X and Y have a joint density, define
fX|Y(x|y) = fXY(x, y) / fY(y)
E[X|Y = y] = ∫_{−∞}^∞ x fX|Y(x|y) dx
We now define conditional expectation in a more general setting.
Let (Ω, F, P) be a probability space. Let G be a sub-σ-field of F, i.e. G ⊂ F, and X an
F-measurable random variable. Then E[X|G] is a G-measurable random variable satisfying
E[E[X|G] 1A] = E[X 1A]
for all A ∈ G, or equivalently
∫_A E[X|G](w) dP(w) = ∫_A X(w) dP(w)
Example:
Consider 3 tosses of a coin:
Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
Define
F0 = {Ω, Φ}
F1 = {Ω, Φ, {HHH, HHT, HTH, HTT}, {THH, THT, TTH, TTT}}
F2 = {Ω, Φ, {HHH, HHT}, {HTH, HTT}, {THH, THT}, {TTH, TTT},
{HHH, HHT, HTH, HTT}, {HHH, HHT, THH, THT}, {HHH, HHT, TTH, TTT},
{HTH, HTT, THH, THT}, {HTH, HTT, TTH, TTT}, {THH, THT, TTH, TTT},
{HHH, HHT, HTH, HTT, THH, THT}, {HHH, HHT, HTH, HTT, TTH, TTT},
{HHH, HHT, THH, THT, TTH, TTT}, {HTH, HTT, THH, THT, TTH, TTT}}
F3 contains all the possible subsets of Ω.
Let X be the number of H in 3 tosses, and suppose P(H) = p.
Consider E[X|F2]. Let A = {HTH, HTT, THH, THT, TTH, TTT}.

w     P(w)          E[X|F2](w)   E[X|F2]1A   P(w)E[X|F2]1A
HHH   p^3           2 + p        0           0
HHT   p^2(1 − p)    2 + p        0           0
HTH   p^2(1 − p)    1 + p        1 + p       p^2(1 − p)(1 + p)
HTT   p(1 − p)^2    1 + p        1 + p       p(1 − p)^2(1 + p)
THH   p^2(1 − p)    1 + p        1 + p       p^2(1 − p)(1 + p)
THT   p(1 − p)^2    1 + p        1 + p       p(1 − p)^2(1 + p)
TTH   p(1 − p)^2    p            p           p^2(1 − p)^2
TTT   (1 − p)^3     p            p           p(1 − p)^3

The last column sums to 3p − 2p^2 − p^3, which agrees with
Σ_{w∈A} X(w)P(w) = 4p^2(1 − p) + 3p(1 − p)^2 = 3p − 2p^2 − p^3
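The bookkeeping in the table can be verified exactly in a few lines. A sketch of my own with a concrete value of p (any p in (0, 1) works):

```python
from fractions import Fraction
from itertools import product

# Verify E[ E[X|F2] 1_A ] = E[ X 1_A ] for the three-toss example.
p = Fraction(1, 3)

def P(w):
    r = Fraction(1)
    for c in w:
        r *= p if c == "H" else 1 - p
    return r

omega = ["".join(t) for t in product("HT", repeat=3)]
X = {w: w.count("H") for w in omega}

# E[X|F2] is constant on each F2-atom {u + "H", u + "T"}: it equals the
# number of heads in the first two tosses plus p for the third toss.
condX = {w: w[:2].count("H") + p for w in omega}

A = [w for w in omega if not w.startswith("HH")]   # the event A from the table
lhs = sum(condX[w] * P(w) for w in A)
rhs = sum(X[w] * P(w) for w in A)
print(lhs == rhs, lhs)
```

Both sides come out equal (to 3p − 2p^2 − p^3, i.e. 20/27 at p = 1/3), which is precisely the defining property of E[X|F2] applied to the F2-measurable event A.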
Properties of conditional expectation:
(1) Linearity: If X and Y are integrable and c1, c2 are constants, then
E[c1X + c2Y |G] = c1E[X|G] + c2E[Y |G]
(2) Taking out what is known: If Z is G-measurable, X is integrable, and XZ is integrable,
then
E[XZ|G] = ZE[X|G];
in particular
E[Z|G] = Z
(3) Iterated conditioning (tower property): If G, H are two σ-fields with H ⊂ G, then for an
integrable r.v. X
E[E[X|G]|H] = E[X|H]
(4) Independence: If X is independent of G, then
E[X|G] = E[X]
(5) Jensen's inequality: for a convex function g,
E[g(X)] ≥ g(E[X])
Proofs:
(2) Taking out what is known: Let Z be a G-measurable random variable. Assume first that
Z = 1B for B ∈ G. Then for A ∈ G
∫_A 1B(w)E[X|G](w) dP(w) = ∫_{A∩B} E[X|G](w) dP(w) = ∫_{A∩B} X(w) dP(w) = ∫_A 1B(w)X(w) dP(w)
To prove the property for a general random variable Z, approximate it by step functions and
take the limit.
(4) Let X = 1B be independent of G. Then for A ∈ G
∫_A 1B(w) dP(w) = P(A ∩ B) = P(A)P(B) = ∫_A E[1B] dP(w)
To prove the property for a general random variable, approximate it by step functions and
take the limit.
(3) Let A ∈ H (thus A is also in G). Then
∫_A E[E[X|G]|H](w) dP(w) = ∫_A E[X|G](w) dP(w) = ∫_A X(w) dP(w) = ∫_A E[X|H](w) dP(w)
2.5 Filtration
Definition 2.6. Let Ω be a non-empty set. Let T ≥ 0 and assume that for each
t ∈ [0, T] there is a σ-algebra F(t), such that F(s) ⊆ F(t) for s ≤ t. Then the collection
F(t) is a filtration.
Knowing the definitions of filtration and conditional expectation, we can define the
following processes:
Definition 2.7. Let (Ω, F, P) be a probability space, and let F(t), 0 ≤ t ≤ T, be a filtration
of sub-σ-algebras of F. Consider an adapted process M(t):
(i) If E[M(t)|F(s)] = M(s) for s < t, then M is a martingale.
(ii) If E[M(t)|F(s)] ≥ M(s) for s < t, then M is a sub-martingale.
(iii) If E[M(t)|F(s)] ≤ M(s) for s < t, then M is a super-martingale.
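A minimal simulation sketch (my own, not from the notes) illustrating the martingale property for the symmetric random walk M(t), the sum of independent ±1 steps: fixing one history up to time s and averaging M(t) over many continuations should recover M(s).

```python
import random

# E[M(t) | F(s)] = M(s) for the symmetric random walk: fix a history up
# to time s, then average M(t) over independent continuations.
random.seed(2)

def increments(n):
    return [random.choice([-1, 1]) for _ in range(n)]

s, t, trials = 5, 20, 100_000
history = increments(s)
Ms = sum(history)                       # M(s), fixed by the history

avg = sum(Ms + sum(increments(t - s)) for _ in range(trials)) / trials
print(Ms, round(avg, 2))                # the average is close to M(s)
```

The future increments have mean 0 and are independent of F(s), so the conditional expectation of M(t) given the history is M(s); the simulation estimate deviates only by Monte Carlo noise.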
Now we can also define the Markov property:
Definition 2.8. Let (Ω, F, P) be a probability space, and let F(t), 0 ≤ t ≤ T, be a filtration
of sub-σ-algebras of F. An adapted process X(t) is a Markov process if for every Borel
measurable function h there is a Borel measurable function g (depending on h, s and t)
such that
E[h(X(t))|F(s)] = g(X(s))