
BASIC CONDITIONAL EXPECTATION
Abstract. In undergraduate courses, conditional expectation is
sometimes defined twice, once for discrete random variables and
again for continuous random variables. Here, we will review some
of these concepts.
1. Undergraduate definitions
Recall that if X and Y are random variables defined on the same
probability space, then E(X|Y ) is a random variable computed in the
following way. Let A be some countable set, like the integers. If X and
Y are discrete A-valued random variables, then we consider the function
φ(y) := E(X | Y = y) := ∑_{x∈A} x P(X = x | Y = y),
and set the random variable E(X|Y ) := φ(Y ). Two important facts to
note are that E(X|Y ) is a random variable, and that it is a function of
Y . One way to think about E(X|Y ) is that it is an approximation of
X, given the information of Y ; we can easily remember that E(X|Y ) is
a random variable that is a function of Y , since it is an approximation
that depends on Y .
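As a quick illustration (a sketch, not part of the notes), the following Python snippet computes φ directly from a small joint pmf; the particular probabilities are an arbitrary choice made only for the example.

```python
# A small, arbitrary joint pmf P(X = x, Y = y); the probabilities are purely illustrative.
joint = {
    (0, 0): 0.10, (1, 0): 0.20, (2, 0): 0.10,
    (0, 1): 0.15, (1, 1): 0.15, (2, 1): 0.30,
}
xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})

def p_Y(y):
    # marginal pmf P(Y = y)
    return sum(joint[(x, y)] for x in xs)

def phi(y):
    # phi(y) = E(X | Y = y) = sum over x of x * P(X = x | Y = y)
    return sum(x * joint[(x, y)] / p_Y(y) for x in xs)

# The random variable E(X|Y) takes the value phi(y) on the event {Y = y}.
for y in ys:
    print(y, phi(y))
```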
Similarly, for continuous random variables X and Y with joint pdf
f and marginals fX and fY , we define
fX|Y(x|y) = f(x, y)/fY(y),
and define a function φ via
φ(y) := E(X | Y = y) := ∫_{−∞}^{∞} x fX|Y(x|y) dx,
and set E(X|Y ) := φ(Y ).
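Here is a similar sketch for the continuous case, with numerical integration in place of the sum; the joint density f(x, y) = x + y on the unit square is an assumed toy example, not one from these notes.

```python
import numpy as np

# Assumed toy density: f(x, y) = x + y on [0, 1] x [0, 1].
def f(x, y):
    return x + y

dx = 1e-4
xs = np.arange(dx / 2, 1.0, dx)          # midpoint grid on (0, 1)

def f_Y(y):
    # marginal density of Y: integrate f(x, y) over x
    return np.sum(f(xs, y)) * dx

def phi(y):
    # phi(y) = E(X | Y = y) = (1 / f_Y(y)) * integral of x f(x, y) dx
    return np.sum(xs * f(xs, y)) * dx / f_Y(y)

# For this density one can compute by hand: phi(y) = (2 + 3y) / (3(2y + 1)).
for y in (0.25, 0.5, 0.75):
    print(phi(y), (2 + 3 * y) / (3 * (2 * y + 1)))
```

The closed form φ(y) = (2 + 3y)/(3(2y + 1)) used for comparison comes from carrying out the integral by hand for this particular density.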
In more advanced courses, a general, unifying definition of E(X|Y) is given.
Theorem 1. The conditional expectation of X given Y is a random variable that is a function of Y and satisfies
E[E(X|Y)] = EX;
moreover,
E[E(X|Y)g(Y)] = E[Xg(Y)],
for all functions g for which the expectations exist.
In fact, Theorem 1 can be used to define conditional expectations.
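Before proving these identities, one can sanity-check them by simulation. In the sketch below the model is an assumption chosen so that E(X|Y) has a known closed form: Y is a fair die and X = Y + D for an independent fair die D, so that E(X|Y) = Y + 3.5 by the linearity, independence, and E(X|X) = X exercises below.

```python
import numpy as np

# Toy model (an assumption of this sketch): Y is a fair die and X = Y + D for an
# independent fair die D, so E(X|Y) = Y + 3.5 by Exercises 1.4-1.6 below.
rng = np.random.default_rng(0)
Y = rng.integers(1, 7, size=1_000_000)
D = rng.integers(1, 7, size=1_000_000)
X = Y + D

cond_exp = Y + 3.5                 # the random variable E(X|Y) for this model
g = lambda y: (y - 3) ** 2         # any function of Y will do

print(np.mean(cond_exp), np.mean(X))                 # E[E(X|Y)] vs EX
print(np.mean(cond_exp * g(Y)), np.mean(X * g(Y)))   # E[E(X|Y)g(Y)] vs E[Xg(Y)]
```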
Exercise 1.1. In the discrete case, prove the first claim in Theorem 1.
Solution. Let φ(y) = E(X|Y = y). Then
E(E(X|Y )) = E(φ(Y ))
= ∑_y φ(y) P(Y = y)
= ∑_y ∑_x x P(X = x | Y = y) P(Y = y)
= ∑_y ∑_x x P(X = x, Y = y)
= ∑_x x ∑_y P(X = x, Y = y)
= ∑_x x P(X = x)
= EX.
Exercise 1.2. In the discrete case, prove the second, more general,
claim in Theorem 1.
Exercise 1.3. What about the continuous case of Theorem 1?
Exercise 1.4 (Linearity). For constants a and b, prove that
E(aX + bY |Z) = aE(X|Z) + bE(Y |Z).
Exercise 1.5 (Independence). Show that if X is independent of Y ,
then E(X|Y ) = EX.
Exercise 1.6. In the discrete case, prove that
E(X|X) = X.
Exercise 1.6 also holds in the continuous case, but here we already
have a technical problem, since if Y = X, we do not have a joint density
(with respect to dx dy) for X and Y. In order to prove Exercise 1.6 in the continuous case, a more general definition of conditional expectation is required.
Exercise 1.7 (Taking out what is known). In the discrete case, prove
that
E(Xg(Y )|Y ) = g(Y )E(X|Y ).
Exercise 1.7 also holds in the continuous case, but a technical problem similar to that of Exercise 1.6 occurs.
Exercise 1.8 (Jensen’s inequality). Prove Jensen’s inequality for conditional expectation; that is, for a convex function g show that
g(E(X|Y )) ≤ E(g(X)|Y ).
2. Examples
Exercise 2.1. Let X = (X1 , . . . , Xn ) be a random sample, where Xi is
a Poisson random variable with mean µ. Let S = X1 + · · · + Xn . Set
Y = 1[X1 = 0]. Show that E(X1|S) = S/n and
E(Y |S) = (1 − 1/n)^S.
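A Monte Carlo check of these two formulas (with illustrative, assumed values of n and µ) can be run as follows: group the simulated samples by the value of S and average X1 and Y within each group.

```python
import numpy as np

# Monte Carlo check of Exercise 2.1 with illustrative, assumed parameters.
rng = np.random.default_rng(0)
n, mu, reps = 5, 2.0, 200_000

X = rng.poisson(mu, size=(reps, n))      # rows are samples (X1, ..., Xn)
S = X.sum(axis=1)
Y = (X[:, 0] == 0).astype(float)         # Y = 1[X1 = 0]

# Average X1 and Y over all simulated samples with the same value of S.
for s in range(5, 15):
    rows = (S == s)
    if rows.sum() > 1000:
        print(s,
              X[rows, 0].mean(), s / n,            # estimate of E(X1 | S = s) vs s/n
              Y[rows].mean(), (1 - 1 / n) ** s)    # estimate of E(Y | S = s) vs (1 - 1/n)^s
```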
Exercise 2.2. Notice that in Exercise 2.1, X1 is an unbiased estimator for µ, and Y is an unbiased estimator for P(X1 = 0) = e^{−µ}. Are E(X1|S) and E(Y |S) still unbiased estimators for the respective parameters?
Exercise 2.3. Let X = (X1 , . . . , Xn ) be a random sample, where Xi ∼
Bern(p). Let S = X1 + · · · + Xn , and Y = X1 X2 . Compute E(Y |S).
Exercise 2.4. Referring to Exercise 2.3, using your expression for
E(Y |S), show that E(E(Y |S)) = E(Y) = p^2, by direct computation; that is, without using Theorem 1. Show that E(Y |S) is a consistent estimator for p^2.
Exercise 2.5. Prove that (in the discrete case)
E(g(X)|Y = y) = ∑_x g(x) P(X = x | Y = y).
Exercise 2.6. With regards to Exercise 2.5, what happens in the continuous case?
Exercise 2.7. Let λ > 0. Let X and Y be continuous random variables
with joint density given by
f(x, y) = (λe^{−λy}/y) 1[0 < x < y].
Check that f is indeed a joint pdf. Show that E(X|Y) = Y/2, and use your answer to show that E(X) = 1/(2λ).
Exercise 2.8. Referring to Exercise 2.7, show that
3/(12λ^2) = Var(E(X|Y)) < Var(X) = 5/(12λ^2).
In fact, we will later prove that, in general,
Var(E(X|Y )) ≤ Var(X).
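The inequality can be seen in a quick simulation. The toy model below is an assumption of the sketch: X = Y + 2Z with Z independent of Y, so that E(X|Y) = Y by the linearity and independence exercises above.

```python
import numpy as np

# Toy model (an assumption of this sketch): X = Y + 2Z with Z independent of Y,
# so E(X|Y) = Y by the linearity and independence exercises above.
rng = np.random.default_rng(0)
Y = rng.standard_normal(1_000_000)
Z = rng.standard_normal(1_000_000)
X = Y + 2.0 * Z

cond_exp = Y                             # E(X|Y) for this model
print(cond_exp.var(), X.var())           # roughly 1 and 5; the first is smaller
```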
Exercise 2.9 (Baby Wald). Let X1, X2, . . . be identically distributed discrete real-valued random variables, with EXi = µ. Let N be a nonnegative integer-valued random variable that is independent of the Xi. Let
S = ∑_{i=1}^{N} Xi.
Show that E(S|N) = µN. What is E(S)?
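A short simulation (with an assumed choice of Bernoulli summands and a Poisson sample size) illustrates the claim E(S|N) = µN.

```python
import numpy as np

# Monte Carlo check of E(S|N) = mu*N with assumed choices: X_i ~ Bern(mu) i.i.d.
# and N ~ Poisson(4) independent of the X_i.
rng = np.random.default_rng(0)
mu, reps = 0.3, 200_000

N = rng.poisson(4.0, size=reps)
# S = X_1 + ... + X_N; for Bernoulli(mu) summands, S given N = n is Binomial(n, mu)
S = rng.binomial(N, mu)

for n in (2, 4, 6, 8):
    print(n, S[N == n].mean(), mu * n)   # estimate of E(S | N = n) vs mu*n
```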
3. The case of finite variance
In the case where the random variable X has finite variance, one also
has the following geometric characterization.
Theorem 2. Let X and Y be random variables. Suppose E|X|^2 < ∞. Let G be the set of all random variables that can be written as a function of Y and have finite variance. Then there exists an almost surely unique random variable Z ∈ G such that
E(X − Z)^2 = inf_{W∈G} E(X − W)^2;
furthermore, Z = E(X|Y).
In Theorem 2, it might be helpful to think of the set G as a plane, and the random variable X as a point. When we try to reach the plane from the point X, in the shortest possible way, we arrive at the point Z = E(X|Y). This idea is the principle behind the proof of Theorem 2. In more advanced courses, the Hilbert projection theorem is used to give the proof of Theorem 2.
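The projection picture can also be checked numerically: for a toy model (an assumption of this sketch) in which E(X|Y) is known, the conditional expectation attains the smallest mean squared distance to X among several competing functions of Y.

```python
import numpy as np

# Toy model (an assumption of this sketch): Y standard normal, X = Y^2 + Z with Z
# independent noise, so E(X|Y) = Y^2.
rng = np.random.default_rng(0)
Y = rng.standard_normal(1_000_000)
X = Y ** 2 + rng.standard_normal(1_000_000)

# Compare E(X - W)^2 for several candidates W that are functions of Y.
candidates = {
    "E(X|Y) = Y^2": Y ** 2,
    "Y^2 + 0.5":    Y ** 2 + 0.5,
    "2|Y|":         2 * np.abs(Y),
    "constant 1":   np.ones_like(Y),
}
for name, W in candidates.items():
    print(name, np.mean((X - W) ** 2))   # smallest for W = E(X|Y)
```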
Exercise 3.1. Consider the line in the plane R^2 given by the equation ax + by = c. Let (x0, y0) ∈ R^2 be a point in the plane. Find an expression for the (shortest) distance from the point to the line.