BASIC CONDITIONAL EXPECTATION

Abstract. In undergraduate courses, conditional expectation is sometimes defined twice, once for discrete random variables and again for continuous random variables. Here, we will review some of these concepts.

1. Undergraduate definitions

Recall that if X and Y are random variables defined on the same probability space, then E(X|Y) is a random variable computed in the following way. Let A be some countable set, like the integers. If X and Y are discrete A-valued random variables, then we consider the function

    φ(y) := E(X | Y = y) := \sum_{x ∈ A} x P(X = x | Y = y),

and set the random variable E(X|Y) := φ(Y). Two important facts to note are that E(X|Y) is a random variable, and that it is a function of Y. One way to think about E(X|Y) is that it is an approximation of X, given the information of Y; we can easily remember that E(X|Y) is a random variable that is a function of Y, since it is an approximation that depends on Y.

Similarly, for continuous random variables X and Y with joint pdf f and marginals f_X and f_Y, we define

    f_{X|Y}(x|y) := f(x, y) / f_Y(y),

define a function φ via

    φ(y) := E(X | Y = y) := \int_{-∞}^{∞} x f_{X|Y}(x|y) dx,

and set E(X|Y) := φ(Y).

In more advanced courses, a general unifying definition is given to define E(X|Y).

Theorem 1. The conditional expectation of X given Y is a random variable that is a function of Y and satisfies E[E(X|Y)] = EX; moreover,

    E[E(X|Y) g(Y)] = E[X g(Y)],

for all g for which the expectations exist.

In fact, Theorem 1 can be used to define conditional expectations.

Exercise 1.1. In the discrete case, prove the first claim in Theorem 1.

Solution. Let φ(y) = E(X|Y = y). Then

    E(E(X|Y)) = E(φ(Y))
              = \sum_y φ(y) P(Y = y)
              = \sum_y \sum_x x P(X = x | Y = y) P(Y = y)
              = \sum_y \sum_x x P(X = x, Y = y)
              = \sum_x x \sum_y P(X = x, Y = y)
              = \sum_x x P(X = x)
              = EX.

Exercise 1.2. In the discrete case, prove the second, more general, claim in Theorem 1.

Exercise 1.3. What about the continuous case of Theorem 1?

Exercise 1.4 (Linearity). Prove that E(aX + bY | Z) = aE(X|Z) + bE(Y|Z).

Exercise 1.5 (Independence). Show that if X is independent of Y, then E(X|Y) = EX.

Exercise 1.6. In the discrete case, prove that E(X|X) = X.

Exercise 1.6 also holds in the continuous case, but here we already have a technical problem: if Y = X, we do not have a joint density (with respect to dx dy) for X and Y. In order to prove Exercise 1.6 in that setting, a more general definition of conditional expectation is required.

Exercise 1.7 (Taking out what is known). In the discrete case, prove that E(Xg(Y)|Y) = g(Y)E(X|Y).

Exercise 1.7 also holds in the continuous case, but the same technical problem as in Exercise 1.6 occurs.

Exercise 1.8 (Jensen's inequality). Prove Jensen's inequality for conditional expectation; that is, for a convex function g, show that g(E(X|Y)) ≤ E(g(X)|Y).
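As a sanity check on Theorem 1, here is a minimal numerical sketch (an addition, not part of the original notes). It assumes a toy example in which X is the value of the first of two fair dice and Y is their sum, builds φ(y) = E(X | Y = y) directly from the joint pmf, and verifies both claims of Theorem 1 by exact arithmetic with fractions.

```python
# Illustrative sketch only: check E[E(X|Y)] = EX and
# E[E(X|Y) g(Y)] = E[X g(Y)] for a toy discrete example.
from itertools import product
from fractions import Fraction

# Joint pmf of (X, Y), where X is the first die and Y = X + second die.
pmf = {}
for a, b in product(range(1, 7), repeat=2):
    pmf[(a, a + b)] = pmf.get((a, a + b), Fraction(0)) + Fraction(1, 36)

# Marginal pmf of Y and phi(y) = E(X | Y = y).
p_Y = {}
for (x, y), p in pmf.items():
    p_Y[y] = p_Y.get(y, Fraction(0)) + p
phi = {y: sum(x * p for (x, yy), p in pmf.items() if yy == y) / p_Y[y]
       for y in p_Y}

def g(y):
    # Any test function with finite expectations works here.
    return y * y

EX        = sum(x * p for (x, y), p in pmf.items())
E_cond    = sum(phi[y] * p for (x, y), p in pmf.items())
E_X_gY    = sum(x * g(y) * p for (x, y), p in pmf.items())
E_cond_gY = sum(phi[y] * g(y) * p for (x, y), p in pmf.items())

print(EX, E_cond)          # both equal 7/2
print(E_X_gY, E_cond_gY)   # equal, as Theorem 1 asserts
```

Because the sample space is finite, the check is exact rather than a simulation; by symmetry, φ(y) = y/2 in this example.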
2. Examples

Exercise 2.1. Let X = (X_1, ..., X_n) be a random sample, where X_i is a Poisson random variable with mean µ. Let S = X_1 + ··· + X_n. Set Y = 1[X_1 = 0]. Show that E(X_1|S) = S/n and

    E(Y|S) = (1 − 1/n)^S.

Exercise 2.2. Notice that in Exercise 2.1, X_1 is an unbiased estimator for µ, and Y is an unbiased estimator for P(X_1 = 0) = e^{−µ}. Are E(X_1|S) and E(Y|S) still unbiased estimators for the respective parameters?

Exercise 2.3. Let X = (X_1, ..., X_n) be a random sample, where X_i ∼ Bern(p). Let S = X_1 + ··· + X_n, and Y = X_1 X_2. Compute E(Y|S).

Exercise 2.4. Referring to Exercise 2.3, using your expression for E(Y|S), show that E(E(Y|S)) = E(Y) = p^2 by direct computation; that is, without using Theorem 1. Show that E(Y|S) is a consistent estimator for p^2.

Exercise 2.5. Prove that (in the discrete case)

    E(g(X) | Y = y) = \sum_x g(x) P(X = x | Y = y).

Exercise 2.6. With regards to Exercise 2.5, what happens in the continuous case?

Exercise 2.7. Let λ > 0. Let X and Y be continuous random variables with joint density given by

    f(x, y) = (λ/y) e^{−λy} 1[0 < x < y].

Check that f is indeed a joint pdf. Show that E(X|Y) = Y/2, and use your answer to show that E(X) = 1/(2λ).

Exercise 2.8. Referring to Exercise 2.7, show that

    3/(12λ^2) = Var(E(X|Y)) < Var(X) = 5/(12λ^2).

In fact, later we will prove that in general Var(E(X|Y)) ≤ Var(X).

Exercise 2.9 (Baby Wald). Let X_1, X_2, ... be identically distributed discrete real-valued random variables, with EX_i = µ. Let N be a nonnegative integer-valued random variable that is independent of the X_i. Let

    S = \sum_{i=1}^{N} X_i.

Show that E(S|N) = µN. What is E(S)?

3. The case of finite variance

In the case where the random variable X has finite variance, one also has the following geometric characterization.

Theorem 2. Let X and Y be random variables. Suppose E|X|^2 < ∞. Let G be the set of all random variables of finite variance that can be written as a function of Y. Then there exists an almost surely unique random variable Z ∈ G such that

    E(X − Z)^2 = \inf_{W ∈ G} E(X − W)^2;

furthermore, Z = E(X|Y).

In Theorem 2, it might be helpful to think of the set G as a plane, and the random variable X as a point. When we try to reach the plane from the point X in the shortest possible way, we arrive at the point E(X|Y). This idea is the principle behind the proof of Theorem 2. In more advanced courses, the Hilbert projection theorem is used to give the proof of Theorem 2.

Exercise 3.1. Consider the line in the plane R^2 given by the equation ax + by = c. Let (x_0, y_0) ∈ R^2 be a point in the plane. Find an expression for the (shortest) distance from the point to the line.
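To see Theorem 2 in action, the following Monte Carlo sketch (an addition, not part of the original notes) uses the density of Exercise 2.7 with the hypothetical choice λ = 1. It compares the mean squared error E(X − cY)^2 over a few candidate slopes c; since E(X|Y) = Y/2, the error should be smallest near c = 1/2.

```python
# Illustrative simulation: conditional expectation as the least-squares
# predictor (Theorem 2), for the density of Exercise 2.7 with lam = 1.
import random

random.seed(0)
lam = 1.0
n = 200_000

samples = []
for _ in range(n):
    y = random.expovariate(lam)   # Y ~ Exponential(lam)
    x = random.uniform(0.0, y)    # given Y = y, X ~ Uniform(0, y)
    samples.append((x, y))

def mse(c):
    """Estimate E(X - c*Y)^2 by the sample average."""
    return sum((x - c * y) ** 2 for x, y in samples) / n

for c in (0.25, 0.40, 0.50, 0.60, 0.75):
    print(f"c = {c:.2f}, estimated E(X - cY)^2 = {mse(c):.4f}")
# The smallest error among these candidates occurs at c = 0.50,
# consistent with E(X|Y) = Y/2.
```

Restricting attention to the linear candidates cY keeps the sketch short; Theorem 2 itself concerns the infimum over all finite-variance functions of Y.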