
Lecture 2: Random Variables
Esther Frostig
The University of Haifa
Definition
Let (Ω, F, P) be a probability space. A random variable X is a
real-valued function (i.e., a function from Ω to (−∞, ∞)) such
that for every Borel subset B of R, {w : X(w) ∈ B} ∈ F.
For B ⊂ R define X⁻¹(B) = {w : X(w) ∈ B}. Then:
Theorem
1. X⁻¹(Aᶜ) = (X⁻¹(A))ᶜ.
2. X⁻¹(∪_{i=1}^∞ A_i) = ∪_{i=1}^∞ X⁻¹(A_i).
Theorem
Let (Ω, F, P) be a probability space. A real-valued function X on Ω
is a random variable if and only if X⁻¹((−∞, x]) ∈ F for every x ∈ R.
Definition
Let X be a r.v. Define σ(X), the σ-algebra generated by X, as the
smallest σ-algebra containing the sets X⁻¹(B), B ∈ B.
Remark: Let X, Y be random variables on (Ω, F, P). Then Y is σ(X)-measurable
if and only if there is a Borel function g : R → R such that Y = g(X).
- F_X, the distribution function of the r.v. X:
  F_X(x) = P({w : X(w) ≤ x}).
- Properties:
  1. If x < y then F(x) ≤ F(y).
  2. lim_{x→−∞} F(x) = 0; lim_{x→∞} F(x) = 1.
  3. F is right continuous: F(x + h) → F(x) as h ↓ 0.
- Define a probability measure µ on (R, B) by
  µ(a, b] = F(b) − F(a),  µ[a, b] = F(b) − F(a−).
  This extends to a probability measure on all Borel sets.
- The Borel σ-field on R² is the smallest σ-algebra containing the
  sets B_1 × B_2, B_i ∈ B.
- Let X, Y be random variables on a probability space (Ω, F, P).
  For C a Borel set in R², define the measure µ on (R², B(R²)) by
  µ(C) = P((X, Y) ∈ C).
- If
  F_{XY}(x, y) = P(X ≤ x, Y ≤ y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f_{XY}(s, t) dt ds,
  we say that f_{XY} is the density of (X, Y).
Integrals
- Riemann integrals:
  - f : R → R.
  - a = x_0 < x_1 < x_2 < · · · < x_n = b.
  - δ_n = max(x_{i+1} − x_i : i = 0, · · · , n − 1).
  - M_i = sup(f(x) : x_i ≤ x < x_{i+1}), m_i = inf(f(x) : x_i ≤ x < x_{i+1}).
  - S_{h,n} = Σ_{i=0}^{n−1} M_i (x_{i+1} − x_i) and S_{l,n} = Σ_{i=0}^{n−1} m_i (x_{i+1} − x_i).
- If S_{h,n} → I_h and S_{l,n} → I_l as δ_n → 0 and the two limits are equal,
  we say that the Riemann integral exists and equals ∫_a^b f(x) dx.
- The Riemann integral of a bounded f exists if f is continuous almost
  everywhere (with respect to Lebesgue measure).
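As an illustrative sketch (not part of the lecture), the upper and lower Riemann sums above can be computed numerically; the function f(x) = x², the interval [0, 1], and the grid sizes below are arbitrary choices for the demo.

```python
import numpy as np

def riemann_sums(f, a, b, n):
    """Upper and lower Riemann sums of f on [a, b] over a uniform partition."""
    x = np.linspace(a, b, n + 1)              # partition a = x_0 < ... < x_n = b
    # approximate sup/inf of f on each subinterval by sampling it densely
    samples = np.linspace(0, 1, 50)
    vals = np.array([f(x[i] + samples * (x[i + 1] - x[i])) for i in range(n)])
    M = vals.max(axis=1)                      # M_i ~ sup of f on [x_i, x_{i+1})
    m = vals.min(axis=1)                      # m_i ~ inf of f on [x_i, x_{i+1})
    dx = np.diff(x)
    return np.sum(M * dx), np.sum(m * dx)     # S_{h,n}, S_{l,n}

# Example: f(x) = x^2 on [0, 1]; both sums approach 1/3 as n grows.
for n in (10, 100, 1000):
    print(n, riemann_sums(lambda t: t ** 2, 0.0, 1.0, n))
```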
Expectation-1
- Ω = {w_1, · · · , w_N} with F containing all subsets of Ω, and X a
  random variable on Ω. Then
  E[X] = Σ_{i=1}^{N} X(w_i) P(w_i).
- Let A_i be a sequence of disjoint measurable sets with ∪_{i=1}^∞ A_i = Ω,
  and a_i ≥ 0. If X(w) = Σ_{i=1}^∞ a_i I_{A_i}(w), then
  E[X] = Σ_{i=1}^∞ a_i P(A_i);
  this sum may be finite or infinite, but it is well defined.
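A minimal sketch of the finite-Ω formula; the outcomes, probabilities, and values below are made up for the example.

```python
# E[X] = sum_i X(w_i) P(w_i) on a finite sample space.
omega = ["w1", "w2", "w3"]             # hypothetical outcomes
P = {"w1": 0.5, "w2": 0.3, "w3": 0.2}  # probabilities summing to 1
X = {"w1": 1.0, "w2": 4.0, "w3": 9.0}  # a random variable on omega

EX = sum(X[w] * P[w] for w in omega)
print(EX)  # 0.5*1 + 0.3*4 + 0.2*9 = 3.5
```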
Expectation
- X a nonnegative random variable.
- Define
  X_n(w) = Σ_{m=0}^{2^{2n}−1} (m/2^n) I_{B_{m,n}}(w),
  where B_{m,n} = {w : m/2^n ≤ X(w) < (m+1)/2^n}.
- Then
  E[X_n] = Σ_{m=1}^{2^{2n}−1} (m/2^n) P(B_{m,n}).
- Since X_n ↑ X, the monotone convergence theorem gives
  E[X] = lim_{n→∞} E[X_n] = ∫_Ω X(w) dP(w)
  (the limit may be finite or infinite).
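A sketch of the dyadic construction, assuming (for concreteness) that X is exponential with mean 1 and is simulated by Monte Carlo; E[X_n] should increase toward E[X] = 1 as n grows.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.exponential(1.0, size=200_000)   # samples of a nonnegative r.v. (E[X] = 1)

def E_Xn(X, n):
    """Expectation of the dyadic simple approximation X_n of X."""
    # X_n = floor(2^n X)/2^n on {X < 2^n}, 0 otherwise
    Xn = np.where(X < 2.0 ** n, np.floor(X * 2.0 ** n) / 2.0 ** n, 0.0)
    return Xn.mean()

for n in (1, 2, 4, 8):
    print(n, E_Xn(X, n))   # increases toward E[X] = 1
```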
Lebesgue integral
- g : R+ → R+ Borel.
- µ a measure on R+.
- For A ∈ B:
  ∫_0^∞ I_A(x) dµ(x) = µ(A).
- For a simple function g(x) = Σ_j c_j I_{A_j}(x):
  ∫_0^∞ g(x) dµ(x) = Σ_j c_j µ(A_j).
- For general nonnegative g, approximate it by simple functions:
  g_n(x) = Σ_{m=1}^{2^{2n}} ((m−1)/2^n) I_{B_{m,n}}(x),
  B_{m,n} = {x : (m−1)/2^n ≤ g(x) < m/2^n}.
- Then
  ∫ g_n(x) dµ(x) → ∫ g(x) dµ(x)
  (monotone convergence theorem); the limit is the Lebesgue integral of g.
Expectation for a general r.v.
- X⁺ = max(X, 0); X⁻ = max(−X, 0) = −min(X, 0).
  1. If E[X⁺] < ∞ and E[X⁻] < ∞ then E[X] = E[X⁺] − E[X⁻].
  2. If E[X⁺] = ∞ and E[X⁻] < ∞ then E[X] = ∞.
  3. If E[X⁺] < ∞ and E[X⁻] = ∞ then E[X] = −∞.
  4. If E[X⁺] = ∞ and E[X⁻] = ∞ then E[X] is undefined.
- X is integrable if
  E[|X|] = ∫_Ω |X(w)| dP(w) < ∞.
  In this case
  E[X] = ∫_Ω X(w) dP(w).
Convergence Theorem
- Let X_1, X_2, · · · be a sequence of random variables on (Ω, F, P).
  X_1, X_2, · · · converges to X almost surely if P(X_n → X) = 1.
- Let {f_j, j ≥ 1} be a sequence of Borel measurable functions. We say
  that f_j converges almost everywhere to f if lim_{j→∞} f_j(x) = f(x)
  except on a set of Lebesgue measure 0.
- Assume that f_n → f almost everywhere. Is it always true that
  ∫ f_n(x) dx → ∫ f(x) dx? Not necessarily!
- Example:
  f_n(x) = (n/√(2π)) exp(−n²x²/2).
  For x ≠ 0, f_n(x) → 0 as n → ∞; at x = 0 the limit is ∞. Hence f_n
  converges almost everywhere to 0, but
  lim_{n→∞} ∫_{−∞}^{∞} f_n(x) dx = 1 ≠ ∫_{−∞}^{∞} lim_{n→∞} f_n(x) dx = 0.
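A quick numerical check of the counterexample, as a sketch (the integration grid is an arbitrary choice): each f_n integrates to about 1, while f_n(x) → 0 for every fixed x ≠ 0.

```python
import numpy as np

def f(n, x):
    return n / np.sqrt(2 * np.pi) * np.exp(-(n * x) ** 2 / 2)

x = np.linspace(-10, 10, 200_001)            # fine grid for the integral
dx = x[1] - x[0]
for n in (1, 10, 100):
    integral = np.sum(f(n, x)) * dx          # stays ~1 for every n
    print(n, round(integral, 4), f(n, 0.5))  # pointwise value at x=0.5 -> 0
```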
Theorem
Let X_1, X_2, · · · be a sequence of random variables converging
almost surely to a random variable X. Then
1. Monotone convergence theorem: If 0 ≤ X_1 ≤ X_2 ≤ X_3 ≤ · · ·
   almost surely then
   lim_{n→∞} E[X_n] = E[X].
2. Dominated convergence theorem: If |X_n| ≤ Y with E[Y] < ∞ then
   lim_{n→∞} E[X_n] = E[X].
Theorem
Let f_1, f_2, · · · be a sequence of functions converging almost
everywhere to a function f. Then
1. Monotone convergence theorem: If 0 ≤ f_1 ≤ f_2 ≤ f_3 ≤ · · ·
   almost everywhere then
   lim_{n→∞} ∫ f_n(x) dx = ∫ f(x) dx.
2. Dominated convergence theorem: If |f_n| ≤ g with ∫ g(x) dx < ∞ then
   lim_{n→∞} ∫ f_n(x) dx = ∫ f(x) dx.
Connection to known formulas
- X a r.v. with distribution F.
- µ the measure on R with µ(a, b] = F(b) − F(a); then µ(B) = P(X ∈ B)
  for every Borel set B.
- g : R → R Borel.
- If ∫_{−∞}^{∞} |g(x)| dµ(x) < ∞ then
  E[g(X)] = ∫_{−∞}^{∞} g(x) dµ(x).
- Proof:
  - For a nonnegative simple function g = Σ_{i=1}^{n} c_i I_{B_i}:
    E[g(X)] = Σ_{i=1}^{n} c_i P(X ∈ B_i) = Σ_{i=1}^{n} c_i µ(B_i) = ∫ g(x) dµ(x).
  - For general nonnegative g, approximate:
    g_n(x) = Σ_{m=0}^{2^{2n}−1} (m/2^n) I_{B_{m,n}}(x),
    B_{m,n} = {x : m/2^n ≤ g(x) < (m+1)/2^n},
    E[g_n(X)] = Σ_{m=0}^{2^{2n}−1} (m/2^n) P(X ∈ B_{m,n}) = ∫ g_n(x) dµ(x).
  - The MCT proves the result; for general g apply it to g⁺ and g⁻.
- Remark: If F has a density f, i.e. P(X ∈ B) = ∫_B f(x) dx, then
  E[X] = ∫_{−∞}^{∞} x f(x) dx.
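A Monte Carlo sanity check of E[g(X)] = ∫ g dµ, as a sketch: X is taken to be exponential with rate 1 (so µ has density e^(−x) on R+) and g(x) = x², both arbitrary choices for the demo; the exact value is E[X²] = 2.

```python
import numpy as np

rng = np.random.default_rng(1)
g = lambda x: x ** 2

# Left side: E[g(X)] by simulation of X ~ Exp(1).
X = rng.exponential(1.0, size=500_000)
lhs = g(X).mean()

# Right side: integral of g against the density of mu, f(x) = exp(-x) on [0, inf).
x = np.linspace(0.0, 50.0, 500_001)
rhs = np.sum(g(x) * np.exp(-x)) * (x[1] - x[0])

print(lhs, rhs)   # both close to 2
```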
Properties of expectation
1. X ≤ Y ⇒ E[X] ≤ E[Y].
2. Linearity:
   E[αX + βY] = αE[X] + βE[Y].
3. Jensen's inequality: for ϕ convex, i.e. ϕ(αx + (1 − α)y) ≤ αϕ(x) + (1 − α)ϕ(y)
   for all α ∈ [0, 1],
   E[ϕ(X)] ≥ ϕ(E[X]).
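A tiny numerical illustration of Jensen's inequality, as a sketch; the convex function ϕ(x) = x² and the lognormal X are arbitrary choices. E[ϕ(X)] should exceed ϕ(E[X]).

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.lognormal(mean=0.0, sigma=1.0, size=400_000)
phi = lambda x: x ** 2                 # a convex function

print(phi(X).mean(), phi(X.mean()))    # E[phi(X)] >= phi(E[X])
```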
Independence
- A and B are independent if P(A ∩ B) = P(A)P(B).
- Let F_1 and F_2 be two σ-algebras on Ω. F_1 and F_2 are independent if
  A and B are independent for every A ∈ F_1 and B ∈ F_2.
- Let X and Y be random variables defined on (Ω, F, P). X and Y are
  independent if σ(X) and σ(Y) are independent.
- Throw a coin twice and let S_n be the number of heads in the first n tosses:
  σ(S_1) = {{HH, HT}, {TH, TT}, {HH, HT, TH, TT}, ∅}
  σ(S_2) = {{HH}, {TT}, {TH, HT}, {HH, TT}, {HH, TH, HT},
  {TT, TH, HT}, {HH, HT, TH, TT}, ∅}
  Clearly σ(S_1) and σ(S_2) are not independent.
- σ(S_2 − S_1) = {{TT, HT}, {HH, TH}, {HH, TT, TH, HT}, ∅}
  σ(S_1) and σ(S_2 − S_1) are independent.
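As a sketch, a simulation of the two-toss example (the bias p = 0.3 is an arbitrary choice; the tosses are assumed independent): the joint probability of an event from σ(S_1) and an event from σ(S_2 − S_1) factorizes.

```python
import numpy as np

rng = np.random.default_rng(3)
p = 0.3                                       # P(H) for each independent toss
t1 = rng.random(1_000_000) < p                # first toss is H
t2 = rng.random(1_000_000) < p                # second toss is H

A = t1                                        # event {HH, HT} in sigma(S_1)
B = t2                                        # event {HH, TH} in sigma(S_2 - S_1)

print((A & B).mean(), A.mean() * B.mean())    # P(A ∩ B) ~= P(A)P(B)
```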
If X and Y are independent:
- For any Borel sets C and D,
  P({X ∈ C} ∩ {Y ∈ D}) = P(X ∈ C) P(Y ∈ D).
- F_{XY}(x, y) = P(X ≤ x, Y ≤ y) = F_X(x) F_Y(y).
- If there is a joint density,
  f_{XY}(x, y) = f_X(x) f_Y(y).
- If g, h are Borel functions then g(X), h(Y) are also independent.
Change of measure
- On the same probability space one can define several probability measures.
- Example (binomial model, three tosses): under P, H has probability 1/3
  and T has probability 2/3; P̃ is the probability measure with
  P̃(H) = P̃(T) = 1/2. Define the random variable Z(w) = P̃(w)/P(w):

  w     P(w)            P̃(w)      Z(w)
  HHH   (1/3)^3         (1/2)^3   (3/2)^3
  HHT   (1/3)^2 (2/3)   (1/2)^3   (3/2)^3 (1/2)
  HTH   (1/3)^2 (2/3)   (1/2)^3   (3/2)^3 (1/2)
  THH   (1/3)^2 (2/3)   (1/2)^3   (3/2)^3 (1/2)
  HTT   (1/3) (2/3)^2   (1/2)^3   (3/2)^3 (1/2)^2
  THT   (1/3) (2/3)^2   (1/2)^3   (3/2)^3 (1/2)^2
  TTH   (1/3) (2/3)^2   (1/2)^3   (3/2)^3 (1/2)^2
  TTT   (2/3)^3         (1/2)^3   (3/2)^3 (1/2)^3

- Clearly E[Z] = 1 (under the measure P), and
  P̃(w) = Z(w) P(w).
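A sketch that reproduces the table and checks E[Z] = 1 and P̃(A) = E[Z 1_A] for one event A (here A = {first toss is H}, a made-up choice).

```python
from itertools import product

p, q = 1/3, 1/2                      # P(H) under P and under P-tilde
outcomes = ["".join(w) for w in product("HT", repeat=3)]

P  = {w: p ** w.count("H") * (1 - p) ** w.count("T") for w in outcomes}
Pt = {w: q ** 3 for w in outcomes}
Z  = {w: Pt[w] / P[w] for w in outcomes}

print(sum(Z[w] * P[w] for w in outcomes))          # E[Z] = 1

A = [w for w in outcomes if w[0] == "H"]           # an event in F
lhs = sum(Pt[w] for w in A)                        # P-tilde(A)
rhs = sum(Z[w] * P[w] for w in A)                  # E[Z 1_A]
print(lhs, rhs)                                    # both equal 1/2
```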
Theorem
Let (Ω, F, P) be a probability space and Z an almost surely
nonnegative random variable with E[Z] = 1. For A ∈ F define
  P̃(A) = E[Z 1_A] = ∫_A Z(w) dP(w).
Then P̃ is a probability measure.
Proof
1. P̃(A) ≥ 0.
2. P̃(Ω) = ∫_Ω Z(w) dP(w) = E[Z] = 1.
3. Let A_1, A_2, · · · be disjoint sets, B_n = ∪_{i=1}^n A_i and
   B_∞ = ∪_{i=1}^∞ A_i. Then
   P̃(B_n) = ∫_{B_n} Z(w) dP(w) = ∫_Ω 1_{B_n}(w) Z(w) dP(w)
          = Σ_{i=1}^n ∫_Ω 1_{A_i}(w) Z(w) dP(w).
   Since 1_{B_1} ≤ 1_{B_2} ≤ · · ·, the monotone convergence theorem gives
   P̃(B_∞) = lim_{n→∞} ∫_Ω 1_{B_n}(w) Z(w) dP(w) = ∫_Ω 1_{B_∞}(w) Z(w) dP(w)
          = Σ_{i=1}^∞ ∫_Ω 1_{A_i}(w) Z(w) dP(w) = Σ_{i=1}^∞ P̃(A_i).
- Definition
  Two probability measures P and P̃ on a nonempty set Ω with σ-algebra F
  are equivalent if they agree on which sets in F have probability 0:
  P(A) = 0 ⟺ P̃(A) = 0.
- Remark: If in the theorem above Z > 0 almost surely, then P and P̃ are
  equivalent.
- Let X be a random variable and Z as in the theorem above. Then the
  expectation of X under P̃ can be computed under P:
  Ẽ[X] = ∫_Ω X(w) Z(w) dP(w).
Example
- Let X be a normally distributed random variable with mean 0 and
  variance 1 (under P).
- Y = X + µ.
- Z = exp(−µX − µ²/2).
1. Z > 0.
2. E[Z] = (1/√(2π)) ∫_{−∞}^{∞} exp(−µx − µ²/2) exp(−x²/2) dx
        = (1/√(2π)) ∫_{−∞}^{∞} exp(−(x² + 2µx + µ²)/2) dx
        = (1/√(2π)) ∫_{−∞}^{∞} exp(−(x + µ)²/2) dx = 1.
The distribution of Y under P̃:
  P̃(Y ≤ y) = E[Z 1_{Y ≤ y}] = E[Z 1_{X ≤ y−µ}]
            = (1/√(2π)) ∫_{−∞}^{y−µ} exp(−µx − µ²/2) exp(−x²/2) dx
            = (1/√(2π)) ∫_{−∞}^{y−µ} exp(−(x + µ)²/2) dx
            = (1/√(2π)) ∫_{−∞}^{y} exp(−s²/2) ds   (substituting s = x + µ).
Thus under P̃, Y is normal with mean 0 and variance 1.
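A Monte Carlo sketch of this change of measure (µ = 0.7 is an arbitrary choice): sampling X ~ N(0,1) under P and weighting by Z = exp(−µX − µ²/2), the weighted moments of Y = X + µ match a standard normal.

```python
import numpy as np

rng = np.random.default_rng(4)
mu = 0.7
X = rng.standard_normal(1_000_000)         # X ~ N(0,1) under P
Y = X + mu
Z = np.exp(-mu * X - mu ** 2 / 2)          # change-of-measure weights, E[Z] = 1

print(Z.mean())                            # ~1
print(np.mean(Z * Y), np.mean(Z * Y ** 2)) # ~0 and ~1: Y is N(0,1) under P-tilde
```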
Radon–Nikodym
Theorem
Let P and P̃ be equivalent probability measures on (Ω, F). Then there
exists a positive random variable Z with E[Z] = 1 such that
  P̃(A) = ∫_A Z(w) dP(w)
for every A ∈ F.
Z is the Radon–Nikodym derivative of P̃ with respect to P.
Conditional expectation
(X, Y) random variables on (Ω, F, P).
- The joint distribution is F_{XY}(x, y) = P(X ≤ x, Y ≤ y).
- If Y is discrete,
  P(X ≤ x | Y = y) = P(X ≤ x, Y = y) / P(Y = y).
  If X is also discrete,
  E[X | Y = y] = Σ_x x P(X = x | Y = y).
- If X and Y have a joint density, define
  f_{X|Y}(x | y) = f_{XY}(x, y) / f_Y(y),
  E[X | Y = y] = ∫_{−∞}^{∞} x f_{X|Y}(x | y) dx.
To define conditional expectation more generally:
Definition
Let (Ω, F, P) be a probability space, G ⊂ F a σ-algebra, and X an
integrable random variable. Then E[X | G] is a G-measurable random
variable satisfying
  E[ E[X | G] 1_A ] = E[ X 1_A ]
for all A ∈ G.
Consider 3 tosses of a coin.
Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
F_0 = {Ω, ∅}
F_1 = {Ω, ∅, {HHH, HHT, HTH, HTT}, {THH, THT, TTH, TTT}}
F_2 = {Ω, ∅, {HHH, HHT}, {HTH, HTT}, {THH, THT}, {TTH, TTT},
  {HHH, HHT, HTH, HTT}, {HHH, HHT, THH, THT}, {HHH, HHT, TTH, TTT},
  {HTH, HTT, THH, THT}, {HTH, HTT, TTH, TTT}, {THH, THT, TTH, TTT},
  {HHH, HHT, HTH, HTT, THH, THT}, {HHH, HHT, HTH, HTT, TTH, TTT},
  {HHH, HHT, THH, THT, TTH, TTT}, {HTH, HTT, THH, THT, TTH, TTT}}
F_3 contains all the subsets of Ω.
- Let X be the number of H in the 3 tosses (each toss gives H with probability p).
- We compute E[X | F_2]. Let A = {HTH, HTT, THH, THT, TTH, TTT} ∈ F_2.

  w     P(w)          E[X|F_2](w)   E[X|F_2] 1_A   P(w) E[X|F_2] 1_A
  HHH   p^3           2 + p         0              0
  HHT   p^2 (1−p)     2 + p         0              0
  HTH   p^2 (1−p)     1 + p         1 + p          p^2 (1−p)(1+p)
  HTT   p (1−p)^2     1 + p         1 + p          p (1−p)^2 (1+p)
  THH   p^2 (1−p)     1 + p         1 + p          p^2 (1−p)(1+p)
  THT   p (1−p)^2     1 + p         1 + p          p (1−p)^2 (1+p)
  TTH   p (1−p)^2     p             p              p^2 (1−p)^2
  TTT   (1−p)^3       p             p              p (1−p)^3

- Summing the last column gives E[E[X|F_2] 1_A] = 3p − 2p² − p³, which is
  exactly what we get from
  E[X 1_A] = Σ_{w∈A} X(w) P(w) = 4p²(1−p) + 3p(1−p)² = 3p − 2p² − p³.
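A sketch that computes E[X | F_2] by averaging over the atoms of F_2 and verifies the partial-averaging identity for the event A above (p = 0.4 is an arbitrary choice).

```python
from itertools import product

p = 0.4
outcomes = ["".join(w) for w in product("HT", repeat=3)]
P = {w: p ** w.count("H") * (1 - p) ** w.count("T") for w in outcomes}
X = {w: w.count("H") for w in outcomes}

# E[X | F_2](w) depends only on the first two tosses (the atoms of F_2).
def cond_exp_F2(w):
    atom = [v for v in outcomes if v[:2] == w[:2]]
    return sum(X[v] * P[v] for v in atom) / sum(P[v] for v in atom)

A = [w for w in outcomes if w[:2] != "HH"]          # the event A from the table
lhs = sum(cond_exp_F2(w) * P[w] for w in A)         # E[ E[X|F_2] 1_A ]
rhs = sum(X[w] * P[w] for w in A)                   # E[ X 1_A ]
print(lhs, rhs, 3*p - 2*p**2 - p**3)                # all equal
```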
Properties of conditional expectation:
(1) Linearity: If X and Y are integrable and c_1, c_2 are constants then
    E[c_1 X + c_2 Y | G] = c_1 E[X | G] + c_2 E[Y | G].
(2) If Z is G-measurable and X and XZ are integrable then
    E[XZ | G] = Z E[X | G];
    in particular E[Z | G] = Z.
(3) Iterated conditioning (tower property): If G, H are two σ-fields with
    H ⊂ G then for an integrable r.v. X
    E[ E[X | G] | H ] = E[X | H].
(4) Independence: If X is independent of G then
    E[X | G] = E[X].
(5) Jensen's inequality: for a convex function g,
    E[g(X) | G] ≥ g(E[X | G]).
Proof
(4) Let X = 1_B be independent of G. For A ∈ G,
    ∫_A 1_B(w) dP(w) = P(A ∩ B) = P(A)P(B) = ∫_A E[1_B] dP(w),
    so E[1_B | G] = E[1_B]. For a general random variable, approximate it
    by simple functions and take the limit.
(3) Let A ∈ H (thus also A ∈ G). Then
    ∫_A E[ E[X|G] | H ](w) dP(w) = ∫_A E[X|G](w) dP(w) = ∫_A X(w) dP(w)
    = ∫_A E[X|H](w) dP(w),
    where the first equality uses A ∈ H, the second uses A ∈ G, and the
    last is the defining property of E[X|H]. Since this holds for all
    A ∈ H, E[E[X|G]|H] = E[X|H].
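A final sketch checking the tower property on the three-toss example above (p = 0.4 again an arbitrary choice), with G = F_2 and H = F_1: conditioning on the first two tosses and then on the first toss gives the same result as conditioning on the first toss directly.

```python
from itertools import product

p = 0.4
outcomes = ["".join(w) for w in product("HT", repeat=3)]
P = {w: p ** w.count("H") * (1 - p) ** w.count("T") for w in outcomes}
X = {w: w.count("H") for w in outcomes}

def cond_exp(f, k):
    """E[f | F_k](w): average f over the atom of F_k containing w (first k tosses fixed)."""
    def h(w):
        atom = [v for v in outcomes if v[:k] == w[:k]]
        return sum(f(v) * P[v] for v in atom) / sum(P[v] for v in atom)
    return h

inner = cond_exp(lambda w: X[w], 2)        # E[X | F_2]
lhs = cond_exp(inner, 1)                   # E[ E[X|F_2] | F_1 ]
rhs = cond_exp(lambda w: X[w], 1)          # E[X | F_1]

for w in outcomes:
    print(w, round(lhs(w), 6), round(rhs(w), 6))   # the two columns agree
```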