Conditional Expectation
Definition 1 Let ξ be a random variable and let F ⊆ A be a sub σ-algebra. The conditional expectation E (ξ | F) is an F-measurable function such that

∫_F ξ dP = ∫_F E (ξ | F) dP, ∀F ∈ F.    (1)
If F ≜ σ (η_α)_{α∈A}, then E (ξ | F) is denoted by E (ξ | η_α, α ∈ A). If ξ = χ_A, then E (ξ | F) is denoted by P (A | F) and called the conditional probability. In this case (1) reads

P (A ∩ F) = ∫_F P (A | F) dP, ∀F ∈ F.
Proposition 2 (Existence of the conditional expectation) Let ξ be a random variable and F ⊆ A a sub σ-algebra. If ξ has an expected value, finite or infinite, then E (ξ | F) exists and is uniquely defined up to a set of measure zero.
Example 3 If F ≜ σ ((A_k)), that is, F is generated by the partition (A_k), then

E (ξ | F) (ω) = (1 / P (A_k)) ∫_{A_k} ξ dP, ω ∈ A_k.

E (ξ | F) is F-measurable, so it is constant on the elements of the partition (A_k), that is, constant on every set A_k. By the definition of the conditional expectation

∫_{A_k} E (ξ | F) (ω) dP (ω) = E (ξ | F) (ω) · P (A_k) = ∫_{A_k} ξ dP.

Hence if P (A_k) > 0, then

E (ξ | F) (ω) = (1 / P (A_k)) ∫_{A_k} ξ dP, ω ∈ A_k.
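As a numerical illustration of this formula (a minimal sketch of our own; the fair-die setup and all variable names are ours, not from the text), the following Python snippet computes E (ξ | F) for the partition of a die's outcomes into odd and even values and checks the defining equation (1) on the generating sets:

```python
import numpy as np

# Fair six-sided die: Omega = {1,...,6} with the uniform measure, xi(w) = w.
omega = np.arange(1, 7)
prob = np.full(6, 1 / 6)
xi = omega.astype(float)

# Partition A_1 = {odd outcomes}, A_2 = {even outcomes}; F = sigma(A_1, A_2).
partition = [omega % 2 == 1, omega % 2 == 0]

# E(xi | F)(w) = (1 / P(A_k)) * integral_{A_k} xi dP for w in A_k.
cond_exp = np.empty(6)
for mask in partition:
    cond_exp[mask] = (prob[mask] * xi[mask]).sum() / prob[mask].sum()

print(cond_exp)  # [3. 4. 3. 4. 3. 4.] -- constant on every block of the partition

# The defining equation (1) on each generating set F = A_k:
for mask in partition:
    assert np.isclose((prob[mask] * xi[mask]).sum(),
                      (prob[mask] * cond_exp[mask]).sum())
```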
Proposition 4 (The properties of the conditional expectation) Assume that all variables below are integrable.

1. If ξ and F are independent, then E (ξ | F) = E (ξ) a.s. In particular, if F = {∅, Ω}, then E (ξ | F) = E (ξ) a.s.

2. If ξ is F-measurable and the conditional expectation E (ξ | F) exists, then E (ξ | F) = ξ a.s.

3. If ξ ≤ η, then E (ξ | F) ≤ E (η | F) a.s.; in particular 0 ≤ P (A | F) ≤ 1 a.s.

4. For any constant a, E (aξ | F) = aE (ξ | F) a.s.

5. E (ξ + η | F) = E (ξ | F) + E (η | F) a.s. In particular, if A and B are disjoint sets, then P (A ∪ B | F) = P (A | F) + P (B | F) a.s.; hence if A ⊆ B, then P (B \ A | F) = P (B | F) − P (A | F) a.s.

6. |E (ξ | F)| ≤ E (|ξ| | F) a.s.

7. If G ⊆ F, then E (E (ξ | F) | G) = E (ξ | G) a.s. In particular E (E (ξ | F)) = E (ξ).

8. If G ⊆ F, then E (E (ξ | G) | F) = E (ξ | G) a.s.

9. The monotone convergence theorem holds, that is, if ξ_n ≥ 0 and ξ_n ↗ ξ, then E (ξ_n | F) ↗ E (ξ | F) a.s. In particular, if A_n ↗ A, then P (A_n | F) ↗ P (A | F) a.s. If ξ_n ≥ 0, then Σ_{n=1}^∞ E (ξ_n | F) = E (Σ_{n=1}^∞ ξ_n | F) a.s. If the sets (A_n)_n are disjoint and A = ∪_n A_n, then Σ_{n=1}^∞ P (A_n | F) = P (A | F) a.s.

10. The dominated convergence theorem holds, that is, if lim_{n→∞} ξ_n = ξ and |ξ_n| ≤ η, where E (η) < ∞, then lim_{n→∞} E (|ξ_n − ξ| | F) = 0 a.s. and lim_{n→∞} E (ξ_n | F) = E (ξ | F) a.s. In particular, if A_n ↘ A, then lim_{n→∞} P (A_n | F) = P (A | F) a.s.

11. If the variable η is F-measurable, and E (ξ) and E (ηξ) are finite, then E (ηξ | F) = ηE (ξ | F) a.s. The relation also holds if η and ξ are nonnegative. (Properties 7 and 11 are checked numerically in the sketch following this list.)
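The following sketch (our own illustration on a finite probability space, where conditional expectation reduces to block averages over a partition; the helper cond_exp is hypothetical, not from the text) numerically checks property 7, the tower rule, and the pull-out property 11:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 12
prob = np.full(n, 1 / n)        # uniform P on Omega = {0,...,11}
xi = rng.normal(size=n)         # an arbitrary random variable

def cond_exp(x, labels):
    """E(x | F), where F is generated by the partition {labels == v}."""
    out = np.empty_like(x, dtype=float)
    for v in np.unique(labels):
        mask = labels == v
        out[mask] = (prob[mask] * x[mask]).sum() / prob[mask].sum()
    return out

fine = np.arange(n) // 2        # 6 blocks, generating F
coarse = np.arange(n) // 4      # 3 blocks, generating G; each coarse block is
                                # a union of fine blocks, so G is contained in F

# Property 7 (tower rule): E(E(xi | F) | G) = E(xi | G).
assert np.allclose(cond_exp(cond_exp(xi, fine), coarse), cond_exp(xi, coarse))

# Property 11: E(eta xi | F) = eta E(xi | F) for F-measurable eta.
eta = fine.astype(float)        # constant on every block of F, hence F-measurable
assert np.allclose(cond_exp(eta * xi, fine), eta * cond_exp(xi, fine))
```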
Proof: The proofs are just reformulations of the analogous proofs for the Lebesgue integral.
1. The constant function E (ξ) is F-measurable, and by the independence of ξ and F, for all F ∈ F

∫_F ξ dP = E (χ_F ξ) = E (χ_F) E (ξ) = ∫_F E (ξ) dP.
2. ξ itself satisfies the defining equation of E (ξ | F), and E (ξ | F) is defined only up to a set of measure zero.
3. If ξ ≤ η and F ∈ F, then

∫_F E (ξ | F) dP = ∫_F ξ dP ≤ ∫_F η dP = ∫_F E (η | F) dP,

so E (ξ | F) ≤ E (η | F) a.s.
4. If F ∈ F, then

∫_F aE (ξ | F) dP = a ∫_F E (ξ | F) dP = a ∫_F ξ dP = ∫_F aξ dP.
5. For F ∈ F

∫_F (ξ + η) dP = ∫_F ξ dP + ∫_F η dP = ∫_F E (ξ | F) dP + ∫_F E (η | F) dP = ∫_F (E (ξ | F) + E (η | F)) dP.

As the sum E (ξ | F) + E (η | F) is F-measurable,

E (ξ + η | F) = E (ξ | F) + E (η | F) a.s.
6. As −ξ, ξ ≤ |ξ|, by 3. and 4. the relation

|E (ξ | F)| ≤ E (|ξ| | F) a.s.

is evident.
7. G ⊆ F, hence if G ∈ G, then G ∈ F, and by the definitions of E (E (ξ | F) | G) and E (ξ | F)

∫_G E (E (ξ | F) | G) dP = ∫_G E (ξ | F) dP = ∫_G ξ dP = ∫_G E (ξ | G) dP.

E (E (ξ | F) | G) is G-measurable, and by the uniqueness of the conditional expectation

E (E (ξ | F) | G) = E (ξ | G) a.s.
8. By 2. it is evident, as by the assumptions G ⊆ F and E (ξ | G) is F-measurable.
9. The sequence E (ξ_n | F) is almost surely monotone increasing, so the limit

ζ ≜ lim_{n→∞} E (ξ_n | F)

exists. For any F ∈ F, by definition

∫_F E (ξ_n | F) dP = ∫_F ξ_n dP.

By the classical theorem

∫_F ζ dP = ∫_F lim_{n→∞} E (ξ_n | F) dP = lim_{n→∞} ∫_F E (ξ_n | F) dP = lim_{n→∞} ∫_F ξ_n dP = ∫_F lim_{n→∞} ξ_n dP = ∫_F ξ dP,

that is,

lim_{n→∞} E (ξ_n | F) ≜ ζ = E (ξ | F) a.s.
10. Let

ζ_n ≜ sup_{m≥n} |ξ_m − ξ|.

As ξ_n → ξ, ζ_n ↘ 0. The conditional expectation preserves the ordering, so the sequence E (ζ_n | F) is a.s. decreasing and the limit

ψ ≜ lim_{n→∞} E (ζ_n | F) ≥ 0 a.s.

exists. Since 0 ≤ ζ_n ≤ 2η a.s.,

0 ≤ E (ζ_n | F) ≤ E (2η | F) a.s.

As 2η and E (2η | F) are integrable, by the classical dominated convergence theorem

∫_Ω ψ dP ≜ ∫_Ω lim_{n→∞} E (ζ_n | F) dP = lim_{n→∞} ∫_Ω E (ζ_n | F) dP = lim_{n→∞} ∫_Ω ζ_n dP = ∫_Ω lim_{n→∞} ζ_n dP = 0,

so ψ = 0 a.s. Using that the conditional expectation preserves ordering,

0 ≤ lim sup_{n→∞} |E (ξ_n | F) − E (ξ | F)| = lim sup_{n→∞} |E (ξ_n − ξ | F)| ≤ lim sup_{n→∞} E (|ξ_n − ξ| | F) ≤ lim sup_{n→∞} E (ζ_n | F) = 0 a.s.
11. Let η ≜ χ_G. As η is F-measurable, G ∈ F. If F ∈ F is arbitrary, then using that ξ has a conditional expectation and that G ∩ F ∈ F,

∫_F ηξ dP ≜ ∫_F ξχ_G dP = ∫_{G∩F} ξ dP = ∫_{G∩F} E (ξ | F) dP = ∫_F χ_G E (ξ | F) dP ≜ ∫_F ηE (ξ | F) dP,

so

E (ηξ | F) = ηE (ξ | F) a.s.

Let ξ ≥ 0. By linearity the relation is true if η ≥ 0 is a step function. If 0 ≤ η_n ↗ η, then 0 ≤ ξη_n ↗ ξη, and by the monotone convergence theorem

E (ηξ | F) = E (lim_{n→∞} η_n ξ | F) = lim_{n→∞} E (η_n ξ | F) = lim_{n→∞} η_n E (ξ | F) = ηE (ξ | F) a.s.

As E (ξ) is finite,

E (ξ | F) = E (ξ⁺ | F) − E (ξ⁻ | F).    (2)

Let η ≥ 0 and ξ be arbitrary. Then

ηE (ξ | F) = ηE (ξ⁺ | F) − ηE (ξ⁻ | F) = E (ηξ⁺ | F) − E (ηξ⁻ | F) = E ((ηξ)⁺ | F) − E ((ηξ)⁻ | F) = E (ηξ | F) a.s.

In the general case

E (ηξ | F) = E (η⁺ξ⁺ + η⁻ξ⁻ − η⁺ξ⁻ − η⁻ξ⁺ | F).
Proposition 5 If P : L¹ (Ω, A, P) → L¹ (Ω, A, P) and P is

1. linear,

2. a projection, that is, P² = P,

3. such that P1 = 1,

4. a contraction, that is, ‖Pξ‖₁ ≤ ‖ξ‖₁,

then there is a σ-algebra F ⊆ A such that P ξ = E (ξ | F) for all ξ ∈ L¹ (Ω, A, P).
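For intuition about Proposition 5 (the converse direction; the theorem itself says more), on a finite space the conditional expectation onto a partition-generated F is a block-averaging matrix, and such a P has all four listed properties. A sketch of our own construction:

```python
import numpy as np

n = 6
prob = np.full(n, 1 / n)                  # uniform P
labels = np.array([0, 0, 1, 1, 1, 2])     # a partition generating F

# P xi = E(xi | F) as an n-by-n matrix: weighted averaging over each block.
P = np.zeros((n, n))
for v in np.unique(labels):
    mask = labels == v
    P[np.ix_(mask, mask)] = prob[mask] / prob[mask].sum()

rng = np.random.default_rng(1)
xi, zeta = rng.normal(size=n), rng.normal(size=n)
l1 = lambda x: (prob * np.abs(x)).sum()   # the L1(P) norm

assert np.allclose(P @ (2 * xi + zeta), 2 * (P @ xi) + P @ zeta)  # 1. linear
assert np.allclose(P @ P, P)                                      # 2. projection
assert np.allclose(P @ np.ones(n), np.ones(n))                    # 3. P1 = 1
assert l1(P @ xi) <= l1(xi) + 1e-12                               # 4. contraction
```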
Conditional expectation and density functions
Definition 6 Let (X, A, µ) be a measure space and (Y, B) be a measurable space. If f : X → Y is measurable, then the measure

ν (B) ≜ µ (f⁻¹ (B)), B ∈ B,

is called the distribution of f under µ.
Proposition 7 (Integration by substitution) Let (X, A, µ) be a measure space, (Y, B) a measurable space, and f : X → Y and g : Y → R measurable mappings. If ν is the distribution of f, then for any B ∈ B

∫_B g dν = ∫_{f⁻¹(B)} g (f) dµ,

where either both integrals exist or neither does.
Proof: If g ≜ χ_C, C ∈ B, then

∫_X g (f) dµ = ∫_X χ_C (f) dµ = ∫_X χ_{f⁻¹(C)} dµ ≜ ∫_{f⁻¹(C)} dµ = µ (f⁻¹ (C)) ≜ ν (C) = ∫_Y χ_C dν = ∫_Y g dν.

As the integral is additive, the equation is true if g is a step function. As every non-negative measurable real-valued function is the limit of an increasing sequence of step functions, by the monotone convergence theorem the equation is true if g ≥ 0. If g is an arbitrary measurable function, then taking the positive and negative parts of g one can extend the equation to all measurable g. As

∫_B g dν ≜ ∫_Y gχ_B dν

and

∫_Y gχ_B dν = ∫_X g (f) χ_B (f) dµ = ∫_{f⁻¹(B)} g (f) dµ,

the proposition is true.
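A quick numerical sanity check of the substitution formula (our own sketch; the choices µ = uniform on [0, 1], f (x) = x², g = cos, and B = [0, 1/4] are arbitrary). The distribution ν of f then has density 1/(2√y) on (0, 1), and the two sides of the formula agree:

```python
import numpy as np
from scipy import integrate

# mu = uniform measure on [0,1], f(x) = x**2; then nu, the distribution of f,
# has density 1 / (2 sqrt(y)) on (0, 1).
g = np.cos
B = (0.0, 0.25)          # the Borel set B
preimage = (0.0, 0.5)    # f^{-1}(B) inside [0, 1]

lhs, _ = integrate.quad(lambda y: g(y) / (2 * np.sqrt(y)), *B)   # int_B g dnu
rhs, _ = integrate.quad(lambda x: g(x ** 2), *preimage)          # int g(f) dmu
assert np.isclose(lhs, rhs)   # both ~ 0.4969
```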
Lemma 8 If f : X → R is an arbitrary real-valued function, then

σ (f) = {B ⊆ X : B = f⁻¹ (A), A ∈ B (R)}.
Lemma 9 Let f : X → R be an arbitrary real-valued function. If g : X → R is σ (f)-measurable, then there is a Borel-measurable mapping h : R → R such that

g = h ∘ f.

Proof: σ (f) is the collection of sets of the form f⁻¹ (A), A ∈ B (R); hence if g is a σ (f)-measurable step function, then g = Σ_{k=1}^n c_k χ_{B_k}, where the sets B_k have the form f⁻¹ (A_k), A_k ∈ B (R). In this case, if h ≜ Σ_{k=1}^n c_k χ_{A_k}, then

h (f (x)) = Σ_{k=1}^n c_k χ_{A_k} (f (x)) = Σ_{k=1}^n c_k χ_{f⁻¹(A_k)} (x) = Σ_{k=1}^n c_k χ_{B_k} (x) = g (x).

As every measurable function is a limit of step functions,

g = lim_{n→∞} g_n = lim_{n→∞} h_n (f),

where the h_n are Borel-measurable. If h (y) ≜ lim sup_{n→∞} h_n (y), then h is Borel-measurable and

g (x) = lim_{n→∞} g_n (x) = lim_{n→∞} h_n (f (x)) = h (f (x)).
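On a finite set the content of Lemma 9 is transparent: g is σ (f)-measurable precisely when it is constant on the level sets of f, and h can be read off those level sets. A minimal sketch (entirely our own example):

```python
import numpy as np

x = np.arange(8)                        # the space X = {0,...,7}
f = np.array([0, 0, 1, 1, 2, 2, 3, 3])  # f : X -> R
g = f ** 2 - 3 * f                      # a function of f, hence sigma(f)-measurable

# Read h off the level sets of f: h(v) is the common value of g on {f == v}.
h = {v: g[f == v][0] for v in np.unique(f)}
assert all(h[f[i]] == g[i] for i in x)  # g = h o f pointwise
```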
Definition 10 Let ξ and η be random variables. Define

E (ξ | η) ≜ E (ξ | σ (η)).
Corollary 11 If ξ and η are random variables then there is a Borel-measurable
function h : R → R such that
E (ξ | η) = h (η) .
Definition 12 The function h above is called the regression of ξ on η and is denoted by

E (ξ | η = y).
Proposition 13 If ξ and η are random variables, then E (ξ | η = y) is, up to µ-null sets, the only Borel-measurable function for which

∫_C E (ξ | η = y) dµ (y) = ∫_{η⁻¹(C)} ξ dP, C ∈ B (R),

where

µ (C) ≜ P (η⁻¹ (C))

is the distribution of η.
Proof: By the definition of the conditional expectation, if

F ≜ η⁻¹ (C) = {η ∈ C} ∈ σ (η)

and

g (y) ≜ E (ξ | η = y),

then using the integration by substitution formula

∫_F ξ dP = ∫_F E (ξ | η) dP = ∫_{η⁻¹(C)} g (η) dP = ∫_C g dµ ≜ ∫_C E (ξ | η = y) dµ (y).

If for g and h

∫_C h dµ = ∫_{η⁻¹(C)} ξ dP = ∫_C g dµ, C ∈ B (R),

then

∫_C (h − g) dµ = 0,

hence for all C ∈ B (R)

∫_C |g − h| dµ = 0,

so µ almost surely g = h.
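The defining equation of the regression can be checked by Monte Carlo (our own sketch, with an artificial model where the regression is known: ξ = η² + ε with ε independent of η gives E (ξ | η = y) = y²):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500_000
eta = rng.normal(size=n)
xi = eta ** 2 + rng.normal(size=n)   # by construction E(xi | eta = y) = y**2

C = eta > 0.5                        # eta^{-1}(C) for the Borel set C = (0.5, inf)
lhs = (eta[C] ** 2).sum() / n        # int_C E(xi | eta = y) dmu(y), MC estimate
rhs = xi[C].sum() / n                # int_{eta^{-1}(C)} xi dP, MC estimate
print(lhs, rhs)                      # both ~ 0.48
```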
Let f be the density function of (ξ, η). Recall that

f (x | y) ≜ f (x, y) / g (y) if g (y) ≠ 0, and f (x | y) ≜ 0 if g (y) = 0,

where g is the density function of η, that is,

g (y) ≜ ∫_R f (u, y) du.

Recall also that g (y) = 0 if and only if f (x, y) = 0 for almost all¹ x. If

g̃ (y) ≜ g (y)⁻¹ if g (y) ≠ 0, and g̃ (y) ≜ 0 if g (y) = 0,

then

f (x | y) = f (x, y) g̃ (y).

¹ With respect to the Lebesgue measure.
Corollary 14 If the distribution of (ξ, η) has a density function f, then

E (h (ξ) | η = y) = ∫_R h (x) f (x | y) dx a.s.,

where the relation a.s. is understood with respect to the distribution of η.
Proof: If C ∈ B (R) and µ denotes the distribution of η, then

∫_{η⁻¹(C)} h (ξ) dP = ∫_Ω h (ξ) χ_C (η) dP = ∫_{R²} h (x) χ_C (y) dF (x, y) = ∫_R ∫_R h (x) χ_C (y) f (x, y) dx dy = ∫_C ∫_R h (x) f (x, y) g̃ (y) g (y) dx dy = ∫_C ( ∫_R h (x) f (x, y) g̃ (y) dx ) g (y) dy ≜ ∫_C ( ∫_R h (x) f (x | y) dx ) g (y) dy = ∫_C ∫_R h (x) f (x | y) dx dµ (y),

hence µ almost surely

E (h (ξ) | η = y) = ∫_R h (x) f (x | y) dx.
Example 15 Calculate

E (ξ | η = y) and E ([ξ − E (ξ | η = y)]² | η = y)

if the distribution of (ξ, η) is normal!

By direct calculation

f (x | y) = 1 / (σ₁ √(1 − r²) √(2π)) · exp (− (x − m (y))² / (2σ₁² (1 − r²))),

where m (y) = µ₁ + (σ₁ / σ₂) r (y − µ₂); hence

E (ξ | η = y) = ∫_R x f (x | y) dx = m (y) a.s.,

so the regression is a linear function. Also by direct calculation

∫_R [x − m (y)]² f (x | y) dx = σ₁² (1 − r²).
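The regression line and the conditional variance can be checked by simulation (a sketch; the parameter values are our own choices). Conditioning on η = y₀ is approximated by restricting the sample to a thin slab around y₀:

```python
import numpy as np

# Bivariate normal with test parameters of our own choosing.
mu1, mu2, s1, s2, r = 1.0, -2.0, 2.0, 0.5, 0.7
cov = [[s1 ** 2, r * s1 * s2], [r * s1 * s2, s2 ** 2]]
rng = np.random.default_rng(42)
xi, eta = rng.multivariate_normal([mu1, mu2], cov, size=1_000_000).T

# Approximate conditioning on eta = y0 by a thin slab around y0.
y0, h = -1.5, 0.01
slab = np.abs(eta - y0) < h

m_y0 = mu1 + (s1 / s2) * r * (y0 - mu2)        # the regression value m(y0)
print(xi[slab].mean(), m_y0)                   # ~ 2.40 vs 2.40
print(xi[slab].var(), s1 ** 2 * (1 - r ** 2))  # ~ 2.04 vs 2.04
```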
Example 16 Let the joint density function of (ξ, η) be

f (x, y) ≜ e^{−y} if 0 < x < y, and f (x, y) ≜ 0 otherwise.

Calculate E (ξ | η = y) and E (η | ξ = x)!

The marginal density functions are

g (y) ≜ ∫_R f (x, y) dx = ∫_0^y e^{−y} dx = y e^{−y},

h (x) ≜ ∫_R f (x, y) dy = ∫_x^∞ e^{−y} dy = [−e^{−y}]_x^∞ = e^{−x},

and the conditional density functions are

f (x | y) = f (x, y) / g (y) = e^{−y} / (y e^{−y}) = 1 / y, 0 < x < y,

f (y | x) = f (x, y) / h (x) = e^{−y} / e^{−x} = e^{x−y}, 0 < x < y.

So

E (ξ | η = y) = ∫_R x f (x | y) dx = ∫_0^y x (1/y) dx = (1/y) ∫_0^y x dx = y / 2,

E (η | ξ = x) = ∫_R y f (y | x) dy = ∫_x^∞ y e^{x−y} dy = e^x ( [−y e^{−y}]_x^∞ + ∫_x^∞ e^{−y} dy ) = e^x (x e^{−x} + e^{−x}) = x + 1.
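Both closed forms can be verified numerically from the joint density alone (a small sketch; the evaluation points are arbitrary):

```python
import numpy as np
from scipy import integrate

f = lambda x, y: np.exp(-y) if 0 < x < y else 0.0   # the joint density

def E_xi_given_eta(y):
    g_y, _ = integrate.quad(lambda x: f(x, y), 0, y)          # g(y)
    num, _ = integrate.quad(lambda x: x * f(x, y), 0, y)
    return num / g_y

def E_eta_given_xi(x):
    h_x, _ = integrate.quad(lambda y: f(x, y), x, np.inf)     # h(x)
    num, _ = integrate.quad(lambda y: y * f(x, y), x, np.inf)
    return num / h_x

for y in (0.5, 2.0, 7.0):
    assert np.isclose(E_xi_given_eta(y), y / 2)   # E(xi | eta = y) = y/2
for x in (0.3, 1.0, 4.0):
    assert np.isclose(E_eta_given_xi(x), x + 1)   # E(eta | xi = x) = x + 1
```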