A.1.3 Conditional Expectation: General Definition and Properties

Remark A.25. Note that, given a simple random variable $Z$ on $(\Omega, \mathcal{F}, P)$, the conditional expectation of $X$ with respect to $Z$ can be written as
\[
E[X \mid Z](\omega) = E[X \mid Z = Z(\omega)], \qquad \omega \in \Omega. \tag{A.5}
\]
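On a finite sample space, (A.5) can be evaluated directly: $E[X \mid Z](\omega)$ is the $P$-weighted average of $X$ over the level set $\{Z = Z(\omega)\}$. The following is a minimal Python sketch of this computation; the sample space, the weights, and the random variables are hypothetical, chosen only for illustration.

```python
# Minimal sketch of (A.5) on a finite sample space (hypothetical data).
# E[X|Z](w) is the P-weighted average of X over the level set {Z = Z(w)}.

omega = ["a", "b", "c", "d"]                    # hypothetical sample space
P = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4}    # hypothetical probabilities
X = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}    # hypothetical simple r.v.
Z = {"a": 0, "b": 0, "c": 1, "d": 1}            # hypothetical simple r.v.

def cond_exp_given_Z(X, Z, P, w):
    """E[X | Z = Z(w)] = (1 / P(Z = Z(w))) * integral of X over {Z = Z(w)}."""
    level_set = [u for u in P if Z[u] == Z[w]]
    mass = sum(P[u] for u in level_set)
    return sum(X[u] * P[u] for u in level_set) / mass

for w in omega:
    print(w, cond_exp_given_Z(X, Z, P, w))
# E[X|Z] is constant on each level set of Z, as (A.5) requires.
```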
Lemma A.26. Let $X, Z$ be two simple random variables on $(\Omega, \mathcal{F}, P)$, with $X(\Omega) = \{x_1, \dots, x_n\}$. Then $X, Z$ are independent if and only if the random variable $P(\{X = x_i\} \mid Z)$ is constant on $(\Omega, \mathcal{F}, P)$ for every $i = 1, \dots, n$.
Proof. By (A.5) and (A.2), for any $\omega \in \Omega$ we have
\[
\begin{aligned}
P(\{X = x_i\} \mid Z)(\omega) &= E[\mathbf{1}_{\{X = x_i\}} \mid Z = Z(\omega)] \\
&= \frac{1}{P(Z = Z(\omega))} \int_{\{Z = Z(\omega)\}} \mathbf{1}_{\{X = x_i\}} \, dP \\
&= \frac{P(\{X = x_i\} \cap \{Z = Z(\omega)\})}{P(Z = Z(\omega))} \\
&= P(\{X = x_i\} \mid \{Z = Z(\omega)\}).
\end{aligned}
\]
If $X, Z$ are independent, this is clearly constant, by Remark A.17. Conversely, suppose that $Z(\Omega) = \{z_1, \dots, z_m\}$. If, for all $j = 1, \dots, m$ and for all $\omega \in Z^{-1}(\{z_j\})$,
\[
P(\{X = x_i\} \mid Z)(\omega) = P(\{X = x_i\} \mid \{Z = z_j\}) = p_i,
\]
then
\[
P(\{X = x_i\} \cap \{Z = z_j\}) = p_i \, P(Z = z_j),
\]
and summing over $j = 1, \dots, m$ we obtain $p_i = P(X = x_i)$. Substituting this back into the previous equation shows that the events $\{X = x_i\}$ and $\{Z = z_j\}$ are independent. Since this holds for all $i = 1, \dots, n$ and $j = 1, \dots, m$, it follows that $X, Z$ are independent.
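Lemma A.26 yields a direct numerical test of independence on a finite space: compute $P(\{X = x_i\} \mid Z)$ on each level set of $Z$ and check whether it is constant in the value of $Z$. A minimal, self-contained sketch with hypothetical data:

```python
# Independence test from Lemma A.26 on a finite sample space.
# X, Z are independent iff P({X = xi} | Z) is constant for every value xi.
# All data below are hypothetical, for illustration only.

P = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4}   # probability weights
X = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}   # simple random variable
Z = {"a": 0, "b": 0, "c": 1, "d": 1}           # simple random variable

def cond_prob_given_Z(xi):
    """Return the map z -> P({X = xi} | {Z = z}) over the values of Z."""
    out = {}
    for z in set(Z.values()):
        level_set = [w for w in P if Z[w] == z]
        mass = sum(P[w] for w in level_set)
        out[z] = sum(P[w] for w in level_set if X[w] == xi) / mass
    return out

independent = all(
    len(set(cond_prob_given_Z(xi).values())) == 1
    for xi in set(X.values())
)
print(independent)  # False here: the value of X determines the value of Z
```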
The properties (i)-(ii) shown in Lemma A.23 for the conditional expectation with respect to particular kinds of $\sigma$-algebras, $\mathcal{G} = \sigma(\{B\})$ or $\mathcal{G} = \sigma(Z)$, can be extended to define the conditional expectation with respect to a generic $\sigma$-algebra.
Theorem A.27. Let $X$ be an integrable random variable on $(\Omega, \mathcal{F}, P)$ and let $\mathcal{G}$ be any $\sigma$-algebra contained in $\mathcal{F}$, $\mathcal{G} \subseteq \mathcal{F}$. There exists a random variable $Y$ on $(\Omega, \mathcal{F}, P)$ satisfying:

(i) $Y$ is integrable, i.e. $\int_{\Omega} |Y| \, dP < \infty$, and $\mathcal{G}$-measurable;

(ii) $\int_A X \, dP = \int_A Y \, dP$, or equivalently $E[X \mathbf{1}_A] = E[Y \mathbf{1}_A]$, for all $A \in \mathcal{G}$.
Moreover, $Y$ is unique up to a negligible event, that is: if there exists another random variable $Z$ on $(\Omega, \mathcal{F}, P)$ satisfying (i)-(ii), then $Y = Z$ $P$-almost surely.
Definition A.28. The conditional expectation of $X$ with respect to $\mathcal{G}$ is any member of the equivalence class of random variables on $(\Omega, \mathcal{F}, P)$ satisfying (i)-(ii) in Theorem A.27. It is denoted by $E[X \mid \mathcal{G}]$.
Definition A.28 is given under the most general assumptions on the probability space and random variables considered. In the case of simple random variables or finite sample spaces, we do not require the integrability of $Y$ in (i), because it is always satisfied.
Note that the conditional expectation $E[X \mid \mathcal{G}]$ is $\mathcal{G}$-measurable even when $X$ is not. It represents the best estimate of $X$ based on the information contained in $\mathcal{G}$.
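On a finite sample space, property (ii) can be checked by brute force. If $\mathcal{G}$ is generated by a partition, it suffices to verify $E[X \mathbf{1}_A] = E[Y \mathbf{1}_A]$ on the atoms, since every $A \in \mathcal{G}$ is a disjoint union of atoms. A minimal sketch under this assumption, with hypothetical data:

```python
# Checking property (ii) of Theorem A.27 on a finite space (hypothetical data).
# G is generated by a partition of the sample space; Y is the candidate
# conditional expectation, constant on each atom.

P = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4}
X = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}
partition = [["a", "b"], ["c", "d"]]          # atoms generating G

# Candidate Y: P-weighted average of X on each atom (constant per atom).
Y = {}
for atom in partition:
    mass = sum(P[u] for u in atom)
    avg = sum(X[u] * P[u] for u in atom) / mass
    for u in atom:
        Y[u] = avg

# Property (ii): E[X 1_A] = E[Y 1_A] for every atom A (hence for all A in G,
# since every A in G is a disjoint union of atoms).
for atom in partition:
    ex = sum(X[u] * P[u] for u in atom)
    ey = sum(Y[u] * P[u] for u in atom)
    assert abs(ex - ey) < 1e-12
print("Y satisfies (ii) on every atom of G")
```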
Proof (Theorem A.27: Almost sure uniqueness). We proceed as in the proof of the implication ($\Leftarrow$) in Lemma A.23. Assume that there exist two random variables $Y, Z$ on $(\Omega, \mathcal{F}, P)$ satisfying (i)-(ii). Then $\{Y > Z\} = (Y - Z)^{-1}((0, \infty)) \in \mathcal{G}$ by (i), and by (ii) we have
\[
\int_{\{Y > Z\}} (Y - Z) \, dP = \int_{\{Y > Z\}} (X - X) \, dP = 0.
\]
Thus $P(Y > Z) = 0$. In a symmetric way we obtain $P(Y < Z) = 0$, and so $Y = Z$ $P$-almost surely.
In order to prove the existence, we have to resort to a classical result in probability. First, given two measures $P, Q$ on $(\Omega, \mathcal{F})$, we say that $Q$ is $P$-absolutely continuous if, for all $A \in \mathcal{F}$ such that $P(A) = 0$, we also have $Q(A) = 0$. In this case we write $Q \ll_{\mathcal{F}} P$, or simply $Q \ll P$ when there is no ambiguity. If $Q$ is $P$-absolutely continuous and $P$ is $Q$-absolutely continuous, we say that $P$ and $Q$ are equivalent and we write $Q \sim P$. Note that the notion of absolute continuity depends on the $\sigma$-algebra considered.
Lemma A.29 (Radon-Nikodym Theorem). Let $P, Q$ be finite measures on $(\Omega, \mathcal{F})$ such that $Q \ll P$. Then there exists a map $L : \Omega \to [0, \infty)$ such that:

1. $L$ is $\mathcal{F}$-measurable (i.e. $L$ is a random variable),

2. $L$ is $P$-integrable,

3. $Q(A) = \int_A L \, dP$ for all $A \in \mathcal{F}$.

Moreover, $L$ is unique up to a negligible event. $L$ is called the density, or the Radon-Nikodym derivative, of $Q$ with respect to $P$ on $\mathcal{F}$, and the notation
\[
L = \frac{dQ}{dP} \equiv \frac{dQ}{dP}\bigg|_{\mathcal{F}}
\]
is used.
For instance, any distribution $P$ defined as in Proposition A.8 is absolutely continuous with respect to the Lebesgue measure $m$, that is $P \ll_{\mathcal{B}} m$. The converse also holds: every measure that is $m$-absolutely continuous can be written in the form (A.1).
We are now able to prove the existence of the conditional expectation.
Proof (Theorem A.27: Existence). Assume first that $X \geq 0$. Define a measure $Q$ on $(\Omega, \mathcal{G})$ by $Q(A) = \int_A X \, dP$ for all $A \in \mathcal{G}$. Then $Q$ is finite, because $X$ is $P$-integrable. Moreover, we have $Q \ll_{\mathcal{G}} P$; thus, by Lemma A.29, there exists a $\mathcal{G}$-measurable and $P$-integrable random variable $Y$ such that $Q(A) = \int_A Y \, dP$ for all $A \in \mathcal{G}$. Such a random variable $Y$ satisfies properties (i)-(ii). For a general integrable $X$, it suffices to write $X = X^+ - X^-$ and apply the argument to each of the two non-negative parts.
Remark A.30. Property (ii) in Definition A.28 is equivalent to the following:

(ii bis) $E[XV] = E[YV]$ for all bounded and $\mathcal{G}$-measurable random variables $V$ on $(\Omega, \mathcal{F}, P)$.
Proof. We only prove it in the case where both $X$ and $V$ are simple random variables.

(ii)$\Rightarrow$(ii bis). This follows from the fact that $V$ is $\mathcal{G}$-measurable together with the linearity of the expectation. Indeed, by assumption,
\[
V = \sum_{j=1}^{m} v_j \mathbf{1}_{B_j}, \qquad \text{where } B_j \in \mathcal{G} \subseteq \mathcal{F} \text{ for all } j = 1, \dots, m.
\]
Then
\[
E[XV] = E\Bigg[X \sum_{j=1}^{m} v_j \mathbf{1}_{B_j}\Bigg] = \sum_{j=1}^{m} v_j E[X \mathbf{1}_{B_j}] = \sum_{j=1}^{m} v_j E[Y \mathbf{1}_{B_j}] = E[YV]
\]
by (ii).
(ii bis)$\Rightarrow$(ii). It is enough to consider random variables of the form $V = \mathbf{1}_A$, $A \in \mathcal{G}$, to obtain (ii).
Proposition A.31 (Properties of the conditional expectation). Let $X, Y$ be two integrable random variables on $(\Omega, \mathcal{F}, P)$ and let $\mathcal{H}, \mathcal{G} \subseteq \mathcal{F}$ be two sub-$\sigma$-algebras of $\mathcal{F}$. Then:

1. If $X$ is $\mathcal{G}$-measurable, then $X = E[X \mid \mathcal{G}]$.

2. If $X$ is independent of $\mathcal{G}$, i.e. $\sigma(X)$ and $\mathcal{G}$ are independent, then $E[X \mid \mathcal{G}] = E[X]$.

3. $E[E[X \mid \mathcal{G}]] = E[X]$.

4. [Linearity in the argument] If $a, b \in \mathbb{R}$, then $E[aX + bY \mid \mathcal{G}] = a E[X \mid \mathcal{G}] + b E[Y \mid \mathcal{G}]$.
5. [Linearity w.r.t. convex combinations of measures] Let $\lambda \in [0, 1]$ and let $P, Q$ be two probability measures on $(\Omega, \mathcal{F})$. Then
\[
E^{\lambda P + (1 - \lambda) Q}[X \mid \mathcal{G}] = \lambda \, E^{P}[X \mid \mathcal{G}] + (1 - \lambda) \, E^{Q}[X \mid \mathcal{G}].
\]
6. [Monotonicity] If $X \leq Y$, then $E[X \mid \mathcal{G}] \leq E[Y \mid \mathcal{G}]$.

7. If $Y$ is $\mathcal{G}$-measurable and bounded, then $E[YX \mid \mathcal{G}] = Y E[X \mid \mathcal{G}]$.

8. If $Y$ is independent of $\sigma(X, \mathcal{G})$, then $E[YX \mid \mathcal{G}] = E[Y] E[X \mid \mathcal{G}]$.

9. If $\mathcal{H} \subseteq \mathcal{G}$, then $E[E[X \mid \mathcal{G}] \mid \mathcal{H}] = E[E[X \mid \mathcal{H}] \mid \mathcal{G}] = E[X \mid \mathcal{H}]$.

10. [Jensen's inequality] Let $\varphi : \mathbb{R} \to \mathbb{R}$ be a convex function such that $\varphi(X)$ is $P$-integrable. Then $\varphi(E[X \mid \mathcal{G}]) \leq E[\varphi(X) \mid \mathcal{G}]$.
Proof.

1. Trivial, by Definition A.28.

2. $E[X]$ is a constant, thus $\sigma(E[X]) = \{\emptyset, \Omega\} \subseteq \mathcal{G}$, i.e. $E[X]$ is $\mathcal{G}$-measurable. Then, for every bounded and $\mathcal{G}$-measurable random variable $V$, the variables $X$ and $V$ are independent and
\[
E[XV] = E[X] E[V] = E\big[E[X] V\big].
\]
3. By (ii bis) with $V = 1$.

4. By Definition A.28: the set of $\mathcal{G}$-measurable random variables is a vector space, and the integral is linear.

5. Again by linearity of the integral, but now with respect to the measure.

6. By monotonicity of the integral.
7. Consider the random variable $Z := Y E[X \mid \mathcal{G}]$. We want to prove that $Z$ satisfies properties (i)-(ii bis). Since both $Y$ and $E[X \mid \mathcal{G}]$ are $\mathcal{G}$-measurable, so is $Z$; for every bounded and $\mathcal{G}$-measurable random variable $V$,
\[
E[ZV] = E\big[Y E[X \mid \mathcal{G}] V\big] = E[YVX],
\]
by (ii bis) for $E[X \mid \mathcal{G}]$, applied with the bounded and $\mathcal{G}$-measurable variable $YV$. Thus $Z = E[XY \mid \mathcal{G}]$.
8. As before, let us consider the random variable $Z := E[Y] E[X \mid \mathcal{G}]$ and prove (i)-(ii bis). $Z$ is $\mathcal{G}$-measurable, being the product of a constant and a $\mathcal{G}$-measurable variable. Moreover, for every bounded and $\mathcal{G}$-measurable random variable $V$,
\[
E[ZV] = E\big[E[Y] E[X \mid \mathcal{G}] V\big] = E\big[E[Y] X V\big] = E[Y] E[XV] = E[YXV],
\]
by (ii bis) for $E[X \mid \mathcal{G}]$ and the independence of $Y$ and $XV$. Thus $Z = E[XY \mid \mathcal{G}]$.
9. Consider $Z := E[E[X \mid \mathcal{G}] \mid \mathcal{H}]$. It is $\mathcal{H}$-measurable by definition; for every bounded and $\mathcal{H}$-measurable random variable $V$ (which is also $\mathcal{G}$-measurable, since $\mathcal{H} \subseteq \mathcal{G}$),
\[
\begin{aligned}
E[ZV] &= E\big[E[E[X \mid \mathcal{G}] \mid \mathcal{H}] V\big] \\
&= E\big[E[V E[X \mid \mathcal{G}] \mid \mathcal{H}]\big] && \text{(by 7.)} \\
&= E\big[V E[X \mid \mathcal{G}]\big] && \text{(by 3.)} \\
&= E\big[E[VX \mid \mathcal{G}]\big] && \text{(by 7.)} \\
&= E[VX]. && \text{(by 3.)}
\end{aligned}
\]
Thus $Z = E[X \mid \mathcal{H}]$.
10. We recall a property of convex functions: any convex function $\varphi$ is the supremum of all affine functions dominated by it, i.e. for all $x \in \mathbb{R}$,
\[
\varphi(x) = \sup_{l \in \mathcal{L}} l(x), \qquad \mathcal{L} := \{l : \mathbb{R} \to \mathbb{R} \mid l(x) = ax + b, \ a, b \in \mathbb{R}, \ l \leq \varphi\}.
\]
Then,
\[
\begin{aligned}
E[\varphi(X) \mid \mathcal{G}] &= E\Big[\sup_{l \in \mathcal{L}} l(X) \,\Big|\, \mathcal{G}\Big] \\
&\geq \sup_{l \in \mathcal{L}} E[l(X) \mid \mathcal{G}] && \text{(by 6.)} \\
&= \sup_{l \in \mathcal{L}} l\big(E[X \mid \mathcal{G}]\big) && \text{(by 4.)} \\
&= \varphi\big(E[X \mid \mathcal{G}]\big).
\end{aligned}
\]
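Several items of Proposition A.31 can be verified numerically on a finite sample space. The following sketch checks the tower property (9.) and Jensen's inequality (10.), with $\mathcal{H} \subseteq \mathcal{G}$ represented by nested partitions; all data are hypothetical.

```python
# Numerical check of Proposition A.31, items 9 (tower property) and 10
# (Jensen), on a finite space with nested partitions (H coarser than G).
# All data below are hypothetical, for illustration only.

P = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4}
X = {"a": 1.0, "b": -2.0, "c": 3.0, "d": -4.0}
G_atoms = [["a"], ["b"], ["c", "d"]]      # finer partition, generates G
H_atoms = [["a", "b"], ["c", "d"]]        # coarser partition, generates H

def cond_exp(X, atoms, P):
    """E[X | sigma(atoms)]: P-weighted average of X on each atom."""
    Y = {}
    for atom in atoms:
        mass = sum(P[u] for u in atom)
        avg = sum(X[u] * P[u] for u in atom) / mass
        for u in atom:
            Y[u] = avg
    return Y

# 9. Tower property: E[E[X|G]|H] = E[X|H].
lhs = cond_exp(cond_exp(X, G_atoms, P), H_atoms, P)
rhs = cond_exp(X, H_atoms, P)
assert all(abs(lhs[w] - rhs[w]) < 1e-12 for w in P)

# 10. Jensen with the convex function phi(x) = x^2:
#     phi(E[X|G]) <= E[phi(X)|G] pointwise.
EX_G = cond_exp(X, G_atoms, P)
EphiX_G = cond_exp({w: X[w] ** 2 for w in X}, G_atoms, P)
assert all(EX_G[w] ** 2 <= EphiX_G[w] + 1e-12 for w in P)
print("tower property and Jensen verified")
```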