
3. Introduction to conditional expectation
3.1. Doob's measurability lemma. Given a measurable space $(E, \mathcal{E})$ and a function $X : \Omega \to E$, we define the $\sigma$-algebra $\sigma(X)$ generated by $X$ as follows:
\[
\sigma(X) = \{C \subseteq \Omega : C = X^{-1}(A) \text{ for some } A \in \mathcal{E}\} = \{\{X \in A\} : A \in \mathcal{E}\} .
\]
Note that this is a $\sigma$-algebra on $\Omega$. It is the smallest $\sigma$-algebra on $\Omega$ which makes $X$ measurable. More explicitly, the map $X : (\Omega, \mathcal{A}) \to (E, \mathcal{E})$ is measurable if and only if $\mathcal{A} \supseteq \sigma(X)$.
Lemma 3.1 (Doob's measurability lemma). Fix an arbitrary map $X : \Omega \to (E, \mathcal{E})$. A real function $Y : \Omega \to \mathbb{R}$ is $\sigma(X)$-measurable if and only if there exists a measurable function $\varphi : (E, \mathcal{E}) \to \mathbb{R}$ such that $Y = \varphi(X)$.
Proof. The "if" part is immediate: since $X$ is $\sigma(X)$-measurable, also $Y = \varphi(X)$ is $\sigma(X)$-measurable (composition of measurable maps).
We turn to the "only if" part, so we assume that $Y$ is $\sigma(X)$-measurable.
• If $Y$ is a simple random variable, which takes the distinct values $y_1, \dots, y_m \in \mathbb{R}$, we can write $Y = \sum_{i=1}^m y_i \mathbf{1}_{\{Y = y_i\}}$. By assumption $\{Y = y_i\} \in \sigma(X)$, so there exists $A_i \in \mathcal{E}$ such that $\{Y = y_i\} = \{X \in A_i\}$. If we define $\varphi : E \to \mathbb{R}$ by
\[
\varphi(x) := \sum_{i=1}^m y_i \, \mathbf{1}_{A_i}(x) ,
\]
then $\varphi(X) = \sum_{i=1}^m y_i \, \mathbf{1}_{A_i}(X) = \sum_{i=1}^m y_i \, \mathbf{1}_{\{X \in A_i\}} = \sum_{i=1}^m y_i \, \mathbf{1}_{\{Y = y_i\}} = Y$.
• If $Y \ge 0$, there is a sequence $Y_n$ of (finite) simple random variables such that $0 \le Y_n(\omega) \uparrow Y(\omega)$ for every $\omega \in \Omega$. For each $n$ there is a measurable $\varphi_n : E \to [0, \infty)$ such that $Y_n = \varphi_n(X)$, hence
\[
0 \le \varphi_n(X(\omega)) \uparrow Y(\omega) \quad \text{for every } \omega \in \Omega . \tag{3.1}
\]
Let us define the measurable map $\varphi : E \to [0, \infty]$ by
\[
\varphi(x) := \limsup_{n} \varphi_n(x) .
\]
If $x = X(\omega)$ for some $\omega \in \Omega$, relation (3.1) shows that $\varphi_n(x)$ has a limit, hence $\varphi(X(\omega)) = \lim_n \varphi_n(X(\omega)) = Y(\omega)$.
• For arbitrary measurable $Y$, write $Y = Y^+ - Y^-$ and note that $Y^+ \ge 0$ and $Y^- \ge 0$ are both $\sigma(X)$-measurable (composition of measurable maps). By the previous step, there are measurable $\varphi_1, \varphi_2 : E \to [0, \infty]$ such that $Y^+ = \varphi_1(X)$ and $Y^- = \varphi_2(X)$. If we define $\varphi : E \to [-\infty, +\infty]$ by
\[
\varphi(x) := \varphi_1(x) - \varphi_2(x) \tag{3.2}
\]
then $Y(\omega) = Y^+(\omega) - Y^-(\omega) = \varphi_1(X(\omega)) - \varphi_2(X(\omega)) = \varphi(X(\omega))$ for every $\omega \in \Omega$.
This completes the proof.
If $Y$ can attain both values $+\infty$ and $-\infty$, the definition (3.2) of $\varphi(x)$ could be ill-posed on the "exceptional set"
\[
A := \{x \in E : \varphi_1(x) = \varphi_2(x) = \infty\} .
\]
To fix this, we simply set $\varphi(x) = 0$ (or any other value) for $x \in A$, that is we define
\[
\widetilde{\varphi}(x) := \varphi_1(x) \, \mathbf{1}_{A^c}(x) - \varphi_2(x) \, \mathbf{1}_{A^c}(x) . \tag{3.3}
\]
Since $A = \varphi_1^{-1}(\{\infty\}) \cap \varphi_2^{-1}(\{\infty\}) \in \mathcal{E}$, the new function $\widetilde{\varphi}$ is (well-defined for every $x \in E$ and) measurable. Let us show that $Y(\omega) = \widetilde{\varphi}(X(\omega))$ for every $\omega \in \Omega$. To this purpose, we claim that if $x = X(\omega)$ for some $\omega \in \Omega$, then $x \notin A$: this is enough to conclude, because $\widetilde{\varphi}(x)$ for $x \notin A$ coincides with the previous $\varphi(x)$. Finally, to prove the claim, note that if $x = X(\omega)$ for some $\omega \in \Omega$, then $\varphi_1(x) = Y(\omega)^+$ and $\varphi_2(x) = Y(\omega)^-$, so either $\varphi_1(x) = 0$ or $\varphi_2(x) = 0$. (In principle, a more natural way to redefine $\varphi$ would be to restrict it to the image of $X$, but the latter is not necessarily a measurable subset of $E$.)
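The simple-function step of the proof can be made concrete on a finite sample space. The following is a minimal sketch (the helper `factor_through` and all values are my own illustrative choices, not from the notes): a $\sigma(X)$-measurable simple $Y$ is constant on each level set $\{X = x\}$, so the function $\varphi$ with $Y = \varphi(X)$ can simply be read off.

```python
def factor_through(X, Y, omega):
    """Given maps X, Y on a finite sample space `omega`, return a dict phi
    with Y = phi(X) if Y is constant on each level set {X = x}, else None."""
    phi = {}
    for w in omega:
        x, y = X(w), Y(w)
        if x in phi and phi[x] != y:
            return None  # Y separates points that X does not: not sigma(X)-measurable
        phi[x] = y
    return phi

omega = range(12)
X = lambda w: w % 3              # X takes the values 0, 1, 2
Y = lambda w: (w % 3) ** 2       # Y = phi(X) with phi(x) = x^2
phi = factor_through(X, Y, omega)
assert phi is not None and all(Y(w) == phi[X(w)] for w in omega)

# A Y that is NOT sigma(X)-measurable cannot be factored:
assert factor_through(X, lambda w: w, omega) is None
```

The `None` branch is exactly the obstruction of Exercise 3.2 below: two points with the same $X$-value but different $Y$-values.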
Intuitively, $\sigma(X)$ consists of those events which depend on $X$, in the following precise sense: for every event $C \in \sigma(X)$, we can determine whether $C$ happens (i.e. whether $\omega \in C$) only knowing the value taken by $X$ (i.e. $X(\omega)$). Indeed, if $C = \{X \in A\}$,
\[
\omega \in C \iff X(\omega) \in A .
\]
We say that $\sigma(X)$ encodes the "information" of $X$.
Exercise 3.2. Fix two sample points $\omega, \omega' \in \Omega$ such that $X(\omega) = X(\omega')$. Then any event $C \in \sigma(X)$ contains either both points ($\{\omega, \omega'\} \subseteq C$) or neither of them ($\{\omega, \omega'\} \subseteq C^c$). In other terms, it cannot happen that $\omega \in C, \omega' \notin C$, nor $\omega \notin C, \omega' \in C$.
Example 3.3 ($\sigma$-algebra generated by a partition). Let $\Omega$ be a set and let $(C_i)_{i \in I}$ be a finite or countable partition of $\Omega$, that is
\[
|I| \le |\mathbb{N}| , \qquad \bigcup_{i \in I} C_i = \Omega , \qquad C_i \cap C_j = \emptyset \ \text{ for } i \ne j .
\]
The smallest $\sigma$-algebra on $\Omega$ that contains all the $C_i$'s can be described explicitly as follows:
\[
\mathcal{C} := \Big\{ C = \bigcup_{i \in J} C_i : J \subseteq I \Big\} .
\]
In words, the elements of $\mathcal{C}$ are all the possible unions of the $C_i$'s (for the choice $J = \emptyset$ we mean that $C = \emptyset$). We say that $\mathcal{C}$ is the $\sigma$-algebra generated by the partition $(C_i)_{i \in I}$.
Example 3.4 (Discrete random variables). Let $S$ be a finite or countable set: $|S| \le |\mathbb{N}|$. Every discrete random variable $X$ taking values in $S$ defines the partition $\{C_i := \{X = i\}\}_{i \in S}$ of $\Omega$. Then $\sigma(X)$ is the $\sigma$-algebra generated by this partition, that is
\[
C \in \sigma(X) \iff C = \bigcup_{i \in I} C_i \ \text{ for some } I \subseteq S .
\]
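For a finite sample space one can enumerate this $\sigma$-algebra exhaustively. A small sketch (the sample space and the map $X$ are arbitrary toy choices): $\sigma(X)$ is the family of all unions of level sets $C_i = \{X = i\}$, hence it has $2^k$ elements when $X$ takes $k$ distinct values.

```python
from itertools import combinations

Omega = frozenset(range(8))
X = {w: w % 3 for w in Omega}                      # X takes 3 distinct values

blocks = {}
for w in Omega:                                    # partition Omega into level sets {X = i}
    blocks.setdefault(X[w], set()).add(w)
blocks = [frozenset(b) for b in blocks.values()]

sigma_X = set()
for r in range(len(blocks) + 1):
    for J in combinations(blocks, r):              # all unions over subsets J of I
        sigma_X.add(frozenset().union(*J) if J else frozenset())

assert len(sigma_X) == 2 ** len(blocks)            # 2^3 = 8 events
assert frozenset() in sigma_X and Omega in sigma_X
assert all(Omega - C in sigma_X for C in sigma_X)  # closed under complement
```

Closure under (countable) unions holds here for free, since a union of unions of blocks is again a union of blocks.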
Example 3.5. Consider two Bernoulli random variables ("coins") $Y, Z \in \{0, 1\}$ and consider $X := Y + Z$. Since $X \in \{0, 1, 2\}$, it follows that $\sigma(X)$ is generated by the partition
\[
C_0 = \{X = 0\} , \qquad C_1 = \{X = 1\} , \qquad C_2 = \{X = 2\} .
\]
With the canonical construction $\Omega = \{00, 01, 10, 11\}$ and $Y(ab) := a$, $Z(ab) := b$, we have
\[
C_0 = \{00\} , \qquad C_1 = \{01, 10\} , \qquad C_2 = \{11\} .
\]
Note that $D := \{10\}$ does not belong to $\sigma(X)$. Intuitively: one cannot decide whether $D$ has happened only knowing the value of $X$. For a proof, it suffices to apply Exercise 3.2: given $\omega := 10$ and $\omega' := 01$, we have $X(\omega) = X(\omega')$ but $\omega \in D$, $\omega' \notin D$.
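This example is small enough to check mechanically. A sketch, encoding sample points as the strings used in the notes:

```python
Omega = ["00", "01", "10", "11"]          # canonical construction
Y = lambda ab: int(ab[0])
Z = lambda ab: int(ab[1])
X = lambda ab: Y(ab) + Z(ab)

# The partition generating sigma(X):
C = {v: {w for w in Omega if X(w) == v} for v in (0, 1, 2)}
assert C[0] == {"00"} and C[1] == {"01", "10"} and C[2] == {"11"}

# D = {"10"} violates the criterion of Exercise 3.2: X("10") == X("01"),
# but D contains "10" and not "01", so D cannot belong to sigma(X).
D = {"10"}
w, w1 = "10", "01"
assert X(w) == X(w1) and (w in D) != (w1 in D)
```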
Exercise 3.6. Let $\Omega = \mathbb{R}$ with $\mathcal{A} = \mathcal{B}(\mathbb{R})$ and let $X(\omega) = |\omega|$. Show that
\[
\sigma(X) = \{A \cup (-A) : A \in \mathcal{B}(\mathbb{R}), \ A \subseteq \mathbb{R}_+\} ,
\]
that is $\sigma(X)$ consists of the symmetric Borel subsets of $\mathbb{R}$.
Deduce that a Borel function $Y : \Omega \to \mathbb{R}$ is $\sigma(X)$-measurable if and only if it is a symmetric function, equivalently $Y(\omega) = \widetilde{Y}(|\omega|)$, for a suitable measurable map $\widetilde{Y} : \mathbb{R}_+ \to \mathbb{R}$.
3.2. Conditional Expectation knowing the Conditional Law. Let $X$ and $Y$ be random variables, taking values in $(E, \mathcal{E})$ and $(F, \mathcal{F})$. Fix a probability kernel $N(x, dy)$ which is the conditional law of $Y$ given $X$:
\[
N(x, dy) = \mu_Y(dy \,|\, X = x) .
\]
Given a measurable function $\varphi : F \to \mathbb{R}$, either bounded or positive, we consider the function
\[
I(x, \varphi) := \int_F \varphi(y) \, N(x, dy) . \tag{3.4}
\]
We are going to use the expressive notation
\[
\mathrm{E}(\varphi(Y) \,|\, X = x) := I(x, \varphi) .
\]
We know that $I(\cdot, \varphi)$ is a measurable function $E \to \mathbb{R}$, either bounded or positive. Let us now compose $I(\cdot, \varphi)$ with the random variable $X$ (i.e. we replace $x$ by $X$) and denote what we obtain by $\mathrm{E}(\varphi(Y) \,|\, X)$, namely
\[
\mathrm{E}[\varphi(Y) \,|\, X](\omega) := I(X(\omega), \varphi) .
\]
With expressive notation, we can write
\[
\mathrm{E}[\varphi(Y) \,|\, X](\omega) = \mathrm{E}[\varphi(Y) \,|\, X = x] \big|_{x = X(\omega)} .
\]
Since $X : (\Omega, \mathcal{A}, \mathrm{P}) \to (E, \mathcal{E})$ and $I(\cdot, \varphi) : (E, \mathcal{E}) \to \mathbb{R}$, we have defined a map
\[
\mathrm{E}[\varphi(Y) \,|\, X] : (\Omega, \mathcal{A}, \mathrm{P}) \to \mathbb{R} .
\]
Thus $\mathrm{E}[\varphi(Y) \,|\, X]$ is a random variable, called the conditional expectation of $\varphi(Y)$ given $X$.
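In the discrete case the construction above is just a finite sum. A minimal sketch with a toy kernel (the values of `N` and `phi` are arbitrary illustrative choices): `I(x, phi)` computes $I(x, \varphi) = \sum_y \varphi(y) N(x, \{y\})$, and composing with $X$ gives the random variable $\mathrm{E}[\varphi(Y) \,|\, X]$.

```python
# Toy discrete kernel: N[x] is the conditional law of Y given X = x.
N = {0: {0: 0.5, 1: 0.5},
     1: {1: 1.0}}

def I(x, phi):
    """I(x, phi) = sum_y phi(y) N(x, {y}), i.e. E[phi(Y) | X = x]."""
    return sum(phi(y) * p for y, p in N[x].items())

phi = lambda y: 2 * y + 1
assert I(0, phi) == 2.0            # 0.5 * phi(0) + 0.5 * phi(1) = 0.5 + 1.5
assert I(1, phi) == 3.0

# E[phi(Y) | X] is the composition omega -> I(X(omega), phi); as a function
# of the value of X it is simply:
cond_exp = lambda x: I(x, phi)
```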
Remark 3.7. So far we have defined $\mathrm{E}[Z \,|\, X]$ only when $Z = \varphi(Y)$ and we know the conditional law of $Y$ given $X$. We will soon give a more general definition.
The following gives an intrinsic characterization of the conditional expectation and will be our starting point for further generalizations.
Proposition 3.8. Let $Z = \varphi(Y)$ and define $W := \mathrm{E}[Z \,|\, X] = \mathrm{E}[\varphi(Y) \,|\, X]$ as above. Then $W$ is the unique $\sigma(X)$-measurable random variable (up to a.s. equivalence) such that
\[
\mathrm{E}[W \, \mathbf{1}_C] = \mathrm{E}[Z \, \mathbf{1}_C] , \qquad \forall C \in \sigma(X) .
\]
With expanded notation:
\[
\mathrm{E}\big[ \mathrm{E}[\varphi(Y) \,|\, X] \, \mathbf{1}_C \big] = \mathrm{E}[\varphi(Y) \, \mathbf{1}_C] , \qquad \forall C \in \sigma(X) . \tag{3.5}
\]
Proof. By definition $W = \mathrm{E}[\varphi(Y) \,|\, X] = I(X, \varphi)$ is the measurable function $I(\cdot, \varphi)$ composed with $X$, hence $W$ is $\sigma(X)$-measurable by (the easy half of) Doob's measurability Lemma. We notice that $C \in \sigma(X)$ means $C = \{X \in A\}$ for a suitable $A \in \mathcal{E}$. Then by the disintegration formula
\[
\mathrm{E}(\varphi(Y) \mathbf{1}_C) = \mathrm{E}(\varphi(Y) \mathbf{1}_A(X)) = \int_E \mathbf{1}_A(x) \left[ \int_F \varphi(y) \, N(x, dy) \right] \mu_X(dx)
= \int_E \mathbf{1}_A(x) \, I(x, \varphi) \, \mu_X(dx) = \mathrm{E}(I(X, \varphi) \mathbf{1}_A(X)) = \mathrm{E}(\mathrm{E}(\varphi(Y) \,|\, X) \mathbf{1}_C) .
\]
Finally we prove uniqueness: if $W'$ is a $\sigma(X)$-measurable random variable such that
\[
\mathrm{E}(W' \mathbf{1}_C) = \mathrm{E}(Z \mathbf{1}_C) , \qquad \forall C \in \sigma(X) ,
\]
then $\mathrm{E}(W' \mathbf{1}_C) = \mathrm{E}(W \mathbf{1}_C)$ for all $C \in \sigma(X)$. By the identification Lemma, it follows that $W = W'$, $\mathrm{P}$-a.s.
Remark 3.9. A special case of (3.5), very useful in practice, is obtained for $C = \Omega$:
\[
\mathrm{E}\big[ \mathrm{E}[\varphi(Y) \,|\, X] \big] = \mathrm{E}[\varphi(Y)] . \tag{3.6}
\]
In particular, if $F = \mathbb{R}$ and $\varphi$ is the identity,
\[
\mathrm{E}\big[ \mathrm{E}[Y \,|\, X] \big] = \mathrm{E}[Y] . \tag{3.7}
\]
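The tower property (3.7) can be verified exactly on a small example. A sketch with an arbitrary toy joint law (the pmf values are my own choices): averaging $\mathrm{E}[Y \,|\, X = x]$ against the law of $X$ recovers $\mathrm{E}[Y]$.

```python
# Toy joint pmf P(X = x, Y = y) on {0,1} x {0,1}:
joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}

pX = {x: sum(p for (a, y), p in joint.items() if a == x) for x in (0, 1)}
E_Y = sum(y * p for (x, y), p in joint.items())          # E[Y] = 0.7

def E_Y_given_X(x):
    """E[Y | X = x] via the discrete conditional density."""
    return sum(y * p for (a, y), p in joint.items() if a == x) / pX[x]

lhs = sum(E_Y_given_X(x) * pX[x] for x in (0, 1))        # E[ E[Y|X] ]
assert abs(lhs - E_Y) < 1e-12                            # tower property (3.7)
```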
3.3. Examples and explicit computations. Let us discuss some cases in which computations of conditional expectation can be performed explicitly.
• If $X$ and $Y$ are discrete random variables, the conditional law of $Y$ given $X = x$ is determined by the discrete density $p_Y(y \,|\, X = x) := \frac{\mathrm{P}(X = x, \, Y = y)}{\mathrm{P}(X = x)}$ (defined arbitrarily if $\mathrm{P}(X = x) = 0$). It follows that
\[
\mathrm{E}[\varphi(Y) \,|\, X = x] = \sum_{y \in F} \varphi(y) \, p_Y(y \,|\, X = x) . \tag{3.8}
\]
In particular, if $F = \mathbb{R}$ and $\varphi$ is the identity, we get
\[
\mathrm{E}[Y \,|\, X = x] = \sum_{y \in F} y \, p_Y(y \,|\, X = x) . \tag{3.9}
\]
• If $E = \mathbb{R}^n$, $F = \mathbb{R}^m$ and $X$ and $Y$ are jointly absolutely continuous random variables, the conditional law of $Y$ given $X = x$ is determined by the density $f_Y(y \,|\, X = x) := \frac{f_{(X,Y)}(x, y)}{f_X(x)}$ (defined arbitrarily if $f_X(x) = 0$). It follows that
\[
\mathrm{E}[\varphi(Y) \,|\, X = x] = \int_{\mathbb{R}^m} \varphi(y) \, f_Y(y \,|\, X = x) \, dy . \tag{3.10}
\]
In particular, if $m = 1$ and $\varphi$ is the identity, we get
\[
\mathrm{E}[Y \,|\, X = x] = \int_{\mathbb{R}} y \, f_Y(y \,|\, X = x) \, dy . \tag{3.11}
\]
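Formula (3.11) can be checked numerically on a toy density (my own illustrative choice, not from the notes): take $f_{(X,Y)}(x, y) = x + y$ on $[0,1]^2$, so $f_X(x) = x + \frac{1}{2}$ and a direct integration gives $\mathrm{E}[Y \,|\, X = x] = \frac{x/2 + 1/3}{x + 1/2}$. A midpoint Riemann sum reproduces this value.

```python
from math import isclose

def E_Y_given_X(x, n=100_000):
    """Midpoint Riemann-sum version of (3.11): integral of y * f_Y(y | X = x) dy,
    with f_Y(y | X = x) = f(x, y) / f_X(x) for f(x, y) = x + y on [0, 1]^2."""
    h = 1.0 / n
    fX = x + 0.5
    return sum((k + 0.5) * h * (x + (k + 0.5) * h) / fX for k in range(n)) * h

x = 0.7
exact = (x / 2 + 1 / 3) / (x + 0.5)    # closed form by direct integration
assert isclose(E_Y_given_X(x), exact, rel_tol=1e-4)
```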
We now recall some concrete examples, in which we already know the conditional law.
Example 3.10 (Exponential and uniform random variables, reprise). Let $Y, Z \sim \mathrm{Exp}(\lambda)$ be independent random variables and define $X := Y + Z$. We have proved that
\[
f_Y(y \,|\, X = x) =
\begin{cases}
\dfrac{1}{x} \, \mathbf{1}_{[0,x]}(y) & \text{if } x > 0 , \\[2mm]
\text{arbitrary density} & \text{if } x \le 0 ,
\end{cases}
\]
that is the conditional law of $Y$ given $X = x > 0$ is the uniform distribution $U(0, x)$ on the interval $[0, x]$. It follows that $\mathrm{E}[Y \,|\, X = x] = \frac{x}{2}$, hence
\[
\mathrm{E}[Y \,|\, X] = \frac{1}{2} X .
\]
An alternative way to determine the conditional law of $Y$ given $X = x$, without using the conditional density, proceeds as follows. Choose any bounded measurable function $\varphi : \mathbb{R}^2 \to \mathbb{R}$ and calculate:
\[
\mathrm{E}(\varphi(Y, X)) = \mathrm{E}(\varphi(Y, Y + Z)) = \int_{\mathbb{R}} \int_{\mathbb{R}} \lambda^2 \, \mathbf{1}_{[0,\infty)}(y) \, \mathbf{1}_{[0,\infty)}(z) \, \varphi(y, y + z) \, e^{-\lambda(y + z)} \, dy \, dz
= \int_{\mathbb{R}} \lambda^2 w e^{-\lambda w} \left[ \int_{\mathbb{R}} \frac{1}{w} \, \mathbf{1}_{[0,w)}(y) \, \varphi(y, w) \, dy \right] dw .
\]
In the above formula it is easy to recognize the density of the uniform distribution on $[0, w]$ in the inner integral and the density of $X \sim \Gamma(2, \lambda)$ in the outer. Thus $\mu_Y(dy \,|\, X = x) = U_{[0,x]}$, as we already know.
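A Monte Carlo sanity check of Example 3.10 (illustrative parameters of my own choosing): by the characterizing property (3.5) with $C = \{X \le c\} \in \sigma(X)$, one should have $\mathrm{E}[Y \, \mathbf{1}_{\{X \le c\}}] = \mathrm{E}[\frac{X}{2} \, \mathbf{1}_{\{X \le c\}}]$.

```python
import random

random.seed(0)
lam, n, c = 2.0, 200_000, 1.0
lhs = rhs = 0.0
for _ in range(n):
    y = random.expovariate(lam)        # Y ~ Exp(lam)
    z = random.expovariate(lam)        # Z ~ Exp(lam), independent of Y
    x = y + z                          # X := Y + Z
    if x <= c:                         # the event C = {X <= c} in sigma(X)
        lhs += y                       # accumulates E[Y 1_C]
        rhs += x / 2                   # accumulates E[(X/2) 1_C]
assert abs(lhs / n - rhs / n) < 0.01   # the two averages should agree
```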
Example 3.11 (Poisson and binomial random variables, reprise). Let $Y \sim \mathrm{Pois}(\lambda)$ and $Z \sim \mathrm{Pois}(\mu)$ be independent random variables and define $X := Y + Z$. We know that (for $k \in \{0, 1, \dots, n\}$)
\[
p_Y(k \,|\, X = n) = \binom{n}{k} p^k (1 - p)^{n - k} \qquad \text{where} \qquad p := \frac{\lambda}{\lambda + \mu} , \tag{3.12}
\]
that is the conditional law of $Y$ given $X = n$ is $\mathrm{Bin}(n, p)$. Then $\mathrm{E}[Y \,|\, X = n] = np$, hence
\[
\mathrm{E}[Y \,|\, X] = pX .
\]
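A Monte Carlo sketch of this example (parameters and the inverse-CDF Poisson sampler are my own illustrative choices): binning samples by the value of $X = Y + Z$, the empirical average of $Y$ on $\{X = n\}$ should be close to $np$ with $p = \lambda/(\lambda + \mu)$.

```python
import random
from math import exp

random.seed(1)

def poisson(rate):
    """Sample Pois(rate) by inverting the CDF (adequate for small rates)."""
    u, k = random.random(), 0
    p = exp(-rate)               # P(k = 0)
    c = p
    while u > c:
        k += 1
        p *= rate / k
        c += p
    return k

lam, mu, N = 2.0, 3.0, 100_000
tot, cnt = {}, {}
for _ in range(N):
    y, z = poisson(lam), poisson(mu)
    x = y + z                    # X := Y + Z ~ Pois(lam + mu)
    tot[x] = tot.get(x, 0) + y   # sum of Y over samples with this X
    cnt[x] = cnt.get(x, 0) + 1

p = lam / (lam + mu)             # p = 0.4
n = 5                            # a well-populated value of X
assert abs(tot[n] / cnt[n] - n * p) < 0.1   # E[Y | X = 5] should be near 2.0
```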
We next describe a very useful tool to compute conditional expectations.
Lemma 3.12 (Freezing Lemma, reprise). Let $X$ and $Z$ be independent random variables, taking values in $(E, \mathcal{E})$ and $(G, \mathcal{G})$ respectively. If $g : (E \times G, \mathcal{E} \otimes \mathcal{G}) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ is a measurable function, either positive or bounded, then
\[
\mathrm{E}[g(X, Z) \,|\, X = x] = \mathrm{E}[g(x, Z)] ,
\]
hence
\[
\mathrm{E}[g(X, Z) \,|\, X] = \mathrm{E}[g(x, Z)] \big|_{x = X} .
\]
Proof. We know by the previous Freezing Lemma that $Y := g(X, Z)$ has conditional law
\[
\mu_Y(dy \,|\, X = x) = \mu_{g(x,Z)}(dy) .
\]
Thus
\[
\mathrm{E}(Y \,|\, X = x) = \int y \, \mu_{g(x,Z)}(dy) = \mathrm{E}(g(x, Z)) .
\]
Here is an interesting example.
Example 3.13 (Wald's identity). Let $X = (X_n)_{n \in \mathbb{N}}$ be a sequence of (not necessarily i.i.d.) random variables in $L^1$, with the same expected value $m = \mathrm{E}[X_n] \in \mathbb{R}$. For $m \in \mathbb{N}_0 = \{0, 1, 2, \dots\}$ we define
\[
S_m := \sum_{n=1}^m X_n
\]
with the convention $S_0 := 0$. Let $\tau$ be a random variable independent of $X$, with values in $\mathbb{N}$. Define $Y := S_\tau$, i.e.
\[
Y(\omega) := \sum_{n=1}^{\tau(\omega)} X_n(\omega) .
\]
Then $Y \in L^1$ and the following relation, known as Wald's (first) identity, holds:
\[
\mathrm{E}[Y] = \mathrm{E}[\tau] \, m = \mathrm{E}[\tau] \, \mathrm{E}[X_1] . \tag{3.13}
\]
We can look at $X$ as a random variable taking values in the space of real sequences $E = \mathbb{R}^{\mathbb{N}}$. Then $Y = g(X, \tau)$ for a measurable function $g : E \times \mathbb{N} \to \mathbb{R}$, namely $g(x, t) := \sum_{n=1}^t x_n$ for $x = (x_n)_{n \in \mathbb{N}} \in E$ and $t \in \mathbb{N}$. It follows by the freezing lemma (with $X$ replaced by $\tau$, and $Z$ replaced by $X$) that
\[
\mathrm{E}[Y \,|\, \tau = t] = \mathrm{E}[g(X, t)] = \mathrm{E}\Big[ \sum_{n=1}^t X_n \Big] = \sum_{n=1}^t \mathrm{E}[X_n] = m t ,
\]
by linearity of the expected value. This means that $\mathrm{E}[Y \,|\, \tau] = \tau m$. By the property (3.7),
\[
\mathrm{E}[Y] = \mathrm{E}[\mathrm{E}[Y \,|\, \tau]] = \mathrm{E}[\tau m] = \mathrm{E}[\tau] m ,
\]
which proves (3.13).
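A Monte Carlo sketch of Wald's identity (the choices $X_n \sim U[0, 2]$, so $m = 1$, and $\tau$ geometric on $\{1, 2, \dots\}$ are my own illustrations): the sample mean of $Y = S_\tau$ should track the sample mean of $\tau$ times $m$.

```python
import random

random.seed(3)

def geometric(p):
    """Number of trials up to and including the first success, on {1, 2, ...}."""
    k = 1
    while random.random() > p:
        k += 1
    return k

p, n_sim = 0.5, 100_000
m = 1.0                                  # m = E[X_n] for X_n ~ U[0, 2]
acc_Y = acc_tau = 0.0
for _ in range(n_sim):
    tau = geometric(p)                   # tau independent of the sequence X
    acc_tau += tau
    acc_Y += sum(random.uniform(0, 2) for _ in range(tau))  # Y = S_tau

# Wald: E[Y] = E[tau] * m  (both sides should be near 2.0 here)
assert abs(acc_Y / n_sim - (acc_tau / n_sim) * m) < 0.05
```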
We conclude with some exercises.
Exercise 3.14. We continue here Exercise 3.6. Let again $\Omega = \mathbb{R}$ with $\mathcal{A} = \mathcal{B}(\mathbb{R})$. We endow $\Omega$ with the standard Gaussian probability $N(0, 1)$ and set again $X(\omega) = |\omega|$. Show that if $Y$ is a measurable map $\Omega \to \mathbb{R}$, either bounded or positive, then:
\[
\mathrm{E}(Y \,|\, X) = \frac{1}{2} \big( Y(X) + Y(-X) \big) .
\]
Exercise 3.15. Let $X = (Y, Z)$ be a Gaussian vector in $\mathbb{R}^2$ with expectation $m = (m_Y, m_Z)$ and covariance matrix
\[
K = \begin{pmatrix} \sigma_Y^2 & \varrho \sigma_Y \sigma_Z \\ \varrho \sigma_Y \sigma_Z & \sigma_Z^2 \end{pmatrix} ,
\]
where $\sigma_Y > 0$, $\sigma_Z > 0$ and $\varrho \in (-1, 1)$. Prove that
\[
\mathrm{E}(Y \,|\, Z) = \varrho \, \frac{\sigma_Y}{\sigma_Z} (Z - m_Z) + m_Y .
\]
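The formula of Exercise 3.15 can be tested by simulation (parameters are arbitrary illustrative choices): sample a correlated Gaussian pair and check the characterizing property $\mathrm{E}[Y \, \mathbf{1}_{\{Z \le c\}}] = \mathrm{E}[\mathrm{E}(Y \,|\, Z) \, \mathbf{1}_{\{Z \le c\}}]$, with the right-hand side computed from the claimed expression.

```python
import random

random.seed(4)

mY, mZ, sY, sZ, rho = 1.0, -1.0, 2.0, 1.5, 0.6
n, c = 300_000, -1.0
lhs = rhs = 0.0
for _ in range(n):
    u, v = random.gauss(0, 1), random.gauss(0, 1)
    z = mZ + sZ * u
    y = mY + sY * (rho * u + (1 - rho**2) ** 0.5 * v)   # Corr(Y, Z) = rho
    if z <= c:                                           # event {Z <= c} in sigma(Z)
        lhs += y                                         # E[Y 1_C]
        rhs += rho * (sY / sZ) * (z - mZ) + mY           # E[ E(Y|Z) 1_C ]
assert abs(lhs / n - rhs / n) < 0.05
```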