Conditional Expectation
Recall that

If X and Y are continuous random variables, then the conditional density function of Y given X = x is given by
$$f_{Y/X}(y/x) = \frac{f_{X,Y}(x,y)}{f_X(x)}$$

If X and Y are discrete random variables, then the conditional probability mass function of Y given X = x is given by
$$p_{Y/X}(y/x) = \frac{p_{X,Y}(x,y)}{p_X(x)}$$
The conditional expectation of Y given X = x is defined by
$$\mu_{Y/X=x} = E(Y/X=x) = \begin{cases} \displaystyle\int_{-\infty}^{\infty} y\, f_{Y/X}(y/x)\,dy & \text{if } X \text{ and } Y \text{ are continuous} \\[2ex] \displaystyle\sum_{y \in R_Y} y\, p_{Y/X}(y/x) & \text{if } X \text{ and } Y \text{ are discrete} \end{cases}$$
The conditional expectation of Y given X = x is also called the conditional mean of Y given X = x. Clearly, $\mu_{Y/X=x}$ denotes the centre of mass of the conditional pdf or the conditional pmf.
Remark

We can similarly define the conditional expectation of X given Y = y, denoted by $E(X/Y=y)$.

Higher-order conditional moments can be defined in a similar manner.

In particular, the conditional variance of Y given X = x is given by
$$\sigma^2_{Y/X=x} = E[(Y - \mu_{Y/X=x})^2 / X = x]$$
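These definitions translate directly into a numeric recipe. Below is a minimal sketch (not part of the original notes): it uses an assumed joint pdf, f(x, y) = x + y on the unit square, and approximates the conditional pdf, the conditional mean and the conditional variance by Riemann sums; the pdf and grid size are illustrative choices only.

```python
import numpy as np

# Minimal sketch (not from the notes): approximate f_{Y/X}(y/x), E(Y/X=x) and
# var(Y/X=x) by Riemann sums for an assumed joint pdf f(x, y) = x + y on [0,1]^2.
y_grid = np.linspace(0.0, 1.0, 2001)
dy = y_grid[1] - y_grid[0]

def joint_pdf(x, y):
    return x + y                          # a valid pdf on the unit square

def conditional_mean_var(x):
    fxy = joint_pdf(x, y_grid)            # slice of the joint pdf at X = x
    fx = np.sum(fxy) * dy                 # marginal f_X(x) = integral of f_{X,Y}(x,y) dy
    f_cond = fxy / fx                     # conditional pdf f_{Y/X}(y/x)
    mean = np.sum(y_grid * f_cond) * dy               # E(Y/X=x)
    var = np.sum((y_grid - mean) ** 2 * f_cond) * dy  # var(Y/X=x)
    return mean, var

m, v = conditional_mean_var(0.5)
# For this pdf, E(Y/X=x) = (3x + 2)/(3(2x + 1)); at x = 0.5 this is 7/12 ~ 0.583,
# and var(Y/X=0.5) = 11/144 ~ 0.0764.
print(m, v)
```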
Example:
Consider the discrete random variables X and Y discussed in an earlier example. The joint probability mass function of the random variables is tabulated below. Find the conditional expectation E(Y/X = 2).

    y            0       1      p_X(x)
    x = 0       0.25    0.14     0.39
    x = 1       0.10    0.35     0.45
    x = 2       0.15    0.01     0.16
    p_Y(y)      0.50    0.50
The conditional probability mass function is given by
$$p_{Y/X}(y/2) = p_{X,Y}(2,y)/p_X(2)$$
so that
$$p_{Y/X}(0/2) = \frac{p_{X,Y}(2,0)}{p_X(2)} = \frac{0.15}{0.16} = 15/16$$
and
$$p_{Y/X}(1/2) = \frac{p_{X,Y}(2,1)}{p_X(2)} = \frac{0.01}{0.16} = 1/16$$
Therefore
$$E(Y/X=2) = 0 \times p_{Y/X}(0/2) + 1 \times p_{Y/X}(1/2) = 1/16$$
Similarly, we can show that
$$p_{Y/X}(0/1) = 2/9 \quad\text{and}\quad p_{Y/X}(1/1) = 7/9, \quad\text{so that}\quad E(Y/X=1) = 7/9$$
$$p_{Y/X}(0/0) = 25/39 \quad\text{and}\quad p_{Y/X}(1/0) = 14/39, \quad\text{so that}\quad E(Y/X=0) = 14/39$$
We also note that
$$EX = 0 \times p_X(0) + 1 \times p_X(1) + 2 \times p_X(2) = 0.77$$
and
$$EY = 0 \times p_Y(0) + 1 \times p_Y(1) = 0.5$$
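These numbers are easy to verify numerically. The following sketch, assuming NumPy is available, recomputes the conditional pmfs and conditional means directly from the tabulated joint pmf.

```python
import numpy as np

# Verification of the worked example: the joint pmf p_{X,Y}(x, y) is taken from
# the table; rows index x = 0, 1, 2 and columns index y = 0, 1.
p_xy = np.array([[0.25, 0.14],
                 [0.10, 0.35],
                 [0.15, 0.01]])

p_x = p_xy.sum(axis=1)                    # marginal pmf of X: [0.39, 0.45, 0.16]
p_y = p_xy.sum(axis=0)                    # marginal pmf of Y: [0.50, 0.50]
x_vals = np.array([0, 1, 2])
y_vals = np.array([0, 1])

for x in x_vals:
    p_cond = p_xy[x] / p_x[x]             # conditional pmf p_{Y/X}(y/x)
    print(x, p_cond, (y_vals * p_cond).sum())   # E(Y/X=x): 14/39, 7/9, 1/16

print((x_vals * p_x).sum())               # EX = 0.77
print((y_vals * p_y).sum())               # EY = 0.5
```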
Example
Suppose X and Y are jointly uniform random variables with the joint probability density function given by
$$f_{X,Y}(x,y) = \begin{cases} \dfrac{1}{2} & x \ge 0,\ y \ge 0,\ x + y \le 2 \\[1ex] 0 & \text{otherwise} \end{cases}$$
Find E(Y/X = x).

The density equals 1/2 on the triangular region bounded by x = 0, y = 0 and the line x + y = 2. We have
$$f_X(x) = \int_0^{2-x} f_{X,Y}(x,y)\,dy = \int_0^{2-x} \frac{1}{2}\,dy = \frac{1}{2}(2-x), \qquad 0 \le x \le 2$$
so that
$$f_{Y/X}(y/x) = f_{X,Y}(x,y)/f_X(x) = \frac{1}{2-x}, \qquad 0 \le y \le 2-x$$
Therefore
$$E(Y/X=x) = \int_{-\infty}^{\infty} y\, f_{Y/X}(y/x)\,dy = \int_0^{2-x} \frac{y}{2-x}\,dy = \frac{2-x}{2}$$
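As a quick check of this result, one can sample the uniform density on the triangle by rejection and estimate the conditional mean empirically. The sketch below (an illustration, not from the notes) conditions on X lying in a thin slice around x = 0.5 and compares the slice average of Y with (2 − x)/2 = 0.75.

```python
import numpy as np

# Monte Carlo check: sample uniformly on the triangle x >= 0, y >= 0, x + y <= 2
# by rejection, then estimate E(Y/X=x) near x = 0.5 and compare with (2 - x)/2.
rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 2.0, size=(2_000_000, 2))
pts = pts[pts.sum(axis=1) <= 2.0]             # keep points inside the triangle

x0, h = 0.5, 0.01                             # condition on X in a thin slice around x0
y_slice = pts[np.abs(pts[:, 0] - x0) < h, 1]
print(y_slice.mean())                         # ~ (2 - 0.5)/2 = 0.75
```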
Example
Suppose X and Y are jointly Gaussian random variables with the joint probability density function given by
$$f_{X,Y}(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho_{XY}^2}}\, e^{-\frac{1}{2(1-\rho_{XY}^2)}\left[\frac{(x-\mu_X)^2}{\sigma_X^2} - \frac{2\rho_{XY}(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} + \frac{(y-\mu_Y)^2}{\sigma_Y^2}\right]}$$
Find E(Y/X = x).

We have
$$f_{Y/X}(y/x) = \frac{f_{X,Y}(x,y)}{f_X(x)} = \frac{\dfrac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho_{XY}^2}}\, e^{-\frac{1}{2(1-\rho_{XY}^2)}\left[\frac{(x-\mu_X)^2}{\sigma_X^2} - \frac{2\rho_{XY}(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} + \frac{(y-\mu_Y)^2}{\sigma_Y^2}\right]}}{\dfrac{1}{\sqrt{2\pi}\,\sigma_X}\, e^{-\frac{(x-\mu_X)^2}{2\sigma_X^2}}}$$
$$= \frac{1}{\sqrt{2\pi}\,\sigma_Y\sqrt{1-\rho_{XY}^2}}\, e^{-\frac{1}{2\sigma_Y^2(1-\rho_{XY}^2)}\left[y - \mu_Y - \frac{\rho_{XY}\sigma_Y}{\sigma_X}(x-\mu_X)\right]^2}$$
which is a Gaussian density in y. Therefore,
$$E(Y/X=x) = \int_{-\infty}^{\infty} y\, f_{Y/X}(y/x)\,dy = \mu_Y + \frac{\rho_{XY}\,\sigma_Y}{\sigma_X}(x - \mu_X)$$
and
$$\operatorname{var}(Y/X=x) = \sigma_Y^2(1 - \rho_{XY}^2)$$
We can similarly show that
$$E(X/Y=y) = \mu_X + \frac{\rho_{XY}\,\sigma_X}{\sigma_Y}(y - \mu_Y)$$
and
$$\operatorname{var}(X/Y=y) = \sigma_X^2(1 - \rho_{XY}^2)$$
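These closed-form conditional moments can be checked by simulation. The sketch below uses assumed parameter values (not from the text) and compares the empirical mean and variance of Y, given that X falls in a thin slice around x₀, with $\mu_Y + \rho_{XY}\sigma_Y/\sigma_X(x_0-\mu_X)$ and $\sigma_Y^2(1-\rho_{XY}^2)$.

```python
import numpy as np

# Monte Carlo check of the conditional mean and variance formulas.
# Parameter values below are assumed for the demonstration only.
rng = np.random.default_rng(1)
mu_x, mu_y, sig_x, sig_y, rho = 1.0, -2.0, 2.0, 0.5, 0.7
cov = [[sig_x**2,            rho * sig_x * sig_y],
       [rho * sig_x * sig_y, sig_y**2]]
xy = rng.multivariate_normal([mu_x, mu_y], cov, size=2_000_000)

x0, h = 2.0, 0.02                            # condition on X in a thin slice around x0
y_slice = xy[np.abs(xy[:, 0] - x0) < h, 1]
print(y_slice.mean())   # ~ mu_y + rho*sig_y/sig_x*(x0 - mu_x) = -1.825
print(y_slice.var())    # ~ sig_y**2 * (1 - rho**2) = 0.1275
```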
Conditional Expectation as a random variable
Note that E(Y/X = x) is a function of x. Using this function, we may define a random variable $\phi(X) = E(Y/X)$. Thus we may consider E(Y/X) as a function of the random variable X, and E(Y/X = x) as the value of E(Y/X) at X = x. In other words, E(Y/X) is a random variable, and E(Y/X = x) is the value taken by E(Y/X) when X = x.
Total expectation theorem
We establish the following results:
$$E[E(Y/X)] = EY \quad\text{and}\quad E[E(X/Y)] = EX$$
Proof:
$$E[E(Y/X)] = \int_{-\infty}^{\infty} E(Y/X=x)\, f_X(x)\,dx$$
$$= \int_{-\infty}^{\infty}\left(\int_{-\infty}^{\infty} y\, f_{Y/X}(y/x)\,dy\right) f_X(x)\,dx$$
$$= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y\, f_X(x)\, f_{Y/X}(y/x)\,dy\,dx$$
$$= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y\, f_{X,Y}(x,y)\,dy\,dx$$
$$= \int_{-\infty}^{\infty} y\left(\int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dx\right)dy$$
$$= \int_{-\infty}^{\infty} y\, f_Y(y)\,dy$$
$$= EY$$
Thus $E[E(Y/X)] = EY$, and similarly $E[E(X/Y)] = EX$.
The above results simplify the calculation of the unconditional expectations EX and EY. We can also show that
$$E[E(g(Y)/X)] = E\,g(Y) \quad\text{and}\quad E[E(g(X)/Y)] = E\,g(X)$$
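The theorem is easy to illustrate numerically. Reusing the uniform-triangle example from earlier (purely as an illustration), E(Y/X = x) = (2 − x)/2, so averaging (2 − X)/2 over samples of X should reproduce EY.

```python
import numpy as np

# Numerical illustration of the total expectation theorem using the uniform
# triangle of the earlier example: E(Y/X=x) = (2 - x)/2, so E[E(Y/X)] should equal EY.
rng = np.random.default_rng(2)
pts = rng.uniform(0.0, 2.0, size=(2_000_000, 2))
pts = pts[pts.sum(axis=1) <= 2.0]             # uniform samples on the triangle
x, y = pts[:, 0], pts[:, 1]

print(((2.0 - x) / 2.0).mean())               # E[E(Y/X)] ~ 2/3
print(y.mean())                               # EY        ~ 2/3
```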
Example
For the discrete random variables of the earlier example:

    x             0        1        2
    E(Y/X=x)    14/39     7/9     1/16
    p_X(x)       0.39     0.45     0.16

$$E[E(Y/X)] = p_X(0)E(Y/X=0) + p_X(1)E(Y/X=1) + p_X(2)E(Y/X=2) = 0.14 + 0.35 + 0.01 = 0.5 = EY$$
Bayesian estimation theory and conditional expectation
Consider two random variables X and Y with joint pdf $f_{X,Y}(x,y)$. Suppose Y is observable and some a priori information about X is available, in the sense that some values of X are more likely than others. We can represent this prior information in the form of a prior density function $f_X(x)$. We have to estimate X for a given value Y = y in some optimal sense.
[Block diagram: a random variable X with density $f_X(x)$ produces the observation Y = y through the conditional density $f_{Y/X}(y/x)$.]
The conditional density function $f_{Y/X}(y/x)$ is called the likelihood function in estimation terminology, and
$$f_{X,Y}(x,y) = f_X(x)\, f_{Y/X}(y/x)$$
We also have the Bayes rule
$$f_{X/Y}(x/y) = \frac{f_X(x)\, f_{Y/X}(y/x)}{f_Y(y)}$$
where $f_{X/Y}(x/y)$ is the a posteriori density function.
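On a computer, the Bayes rule above is often applied on a discretised grid. The sketch below assumes a Gaussian prior $f_X(x)$ and a Gaussian likelihood corresponding to Y = X + noise; both are illustrative assumptions, not part of the notes.

```python
import numpy as np

# Bayes rule on a discretised grid. Assumptions: prior f_X(x) is N(0, 1) and the
# likelihood corresponds to Y = X + N(0, 0.25) noise; y_obs is a hypothetical observation.
x = np.linspace(-5.0, 5.0, 2001)
dx = x[1] - x[0]

prior = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)          # f_X(x)
y_obs, sigma_n = 1.5, 0.5
likelihood = np.exp(-0.5 * ((y_obs - x) / sigma_n) ** 2)    # f_{Y/X}(y_obs/x), up to a constant

evidence = np.sum(prior * likelihood) * dx                  # f_Y(y_obs), with the same constant
posterior = prior * likelihood / evidence                   # a posteriori density f_{X/Y}(x/y_obs)

print(np.sum(x * posterior) * dx)    # posterior mean E(X/Y=y_obs) = 1.2 for these numbers
```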
Suppose the optimum estimator $\hat{X}(Y)$ is a function of the random variable Y such that it minimizes the mean-square estimation error $E(\hat{X}(Y) - X)^2$. Such an estimator is known as the minimum mean-square error (MMSE) estimator.
The estimation problem is

Minimize $\displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (\hat{X}(y) - x)^2\, f_{X,Y}(x,y)\,dx\,dy$ with respect to $\hat{X}(y)$.

This is equivalent to minimizing
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (\hat{X}(y) - x)^2\, f_Y(y)\, f_{X/Y}(x/y)\,dx\,dy = \int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty} (\hat{X}(y) - x)^2\, f_{X/Y}(x/y)\,dx\right] f_Y(y)\,dy$$
Since $f_Y(y)$ is always positive, the above integral will be minimum if the inner integral is minimum for each y. This results in the problem:

Minimize $\displaystyle\int_{-\infty}^{\infty} (\hat{X}(y) - x)^2\, f_{X/Y}(x/y)\,dx$ with respect to $\hat{X}(y)$.

The minimum is given by
$$\frac{\partial}{\partial \hat{X}(y)}\int_{-\infty}^{\infty} (\hat{X}(y) - x)^2\, f_{X/Y}(x/y)\,dx = 0$$
$$\Rightarrow\ 2\int_{-\infty}^{\infty} (\hat{X}(y) - x)\, f_{X/Y}(x/y)\,dx = 0$$
$$\Rightarrow\ \hat{X}(y)\int_{-\infty}^{\infty} f_{X/Y}(x/y)\,dx = \int_{-\infty}^{\infty} x\, f_{X/Y}(x/y)\,dx = E(X/Y=y)$$
Since $\int_{-\infty}^{\infty} f_{X/Y}(x/y)\,dx = 1$, we get
$$\hat{X}(y) = E(X/Y=y)$$
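This result can be illustrated by simulation: among several candidate estimators, the conditional mean attains the smallest mean-square error. The model below (X standard Gaussian, Y = X plus independent Gaussian noise) is an assumption chosen so that E(X/Y = y) = 0.8 y in closed form.

```python
import numpy as np

# Sketch: the conditional mean has the smallest mean-square error.
# Assumed model: X ~ N(0, 1), Y = X + N(0, 0.25), so E(X/Y=y) = 0.8 * y.
rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, size=1_000_000)
y = x + rng.normal(0.0, 0.5, size=x.size)

def mse(x_hat):
    return np.mean((x_hat - x) ** 2)

print(mse(0.8 * y))              # MMSE estimator E(X/Y): ~ 0.20
print(mse(y))                    # using the raw observation: ~ 0.25
print(mse(np.zeros_like(y)))     # ignoring the data: ~ var(X) = 1.0
```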
Example
Suppose X and Y are the jointly Gaussian random variables considered in the earlier example. We have to estimate X from a single observation Y = y. The MMSE estimator $\hat{X}(y)$ is given by
$$\hat{X}(y) = E(X/Y=y) = \mu_X + \frac{\rho_{XY}\,\sigma_X}{\sigma_Y}(y - \mu_Y)$$
If $\mu_X = 0$ and $\mu_Y = 0$, then
$$\hat{X}(y) = \frac{\rho_{XY}\,\sigma_X}{\sigma_Y}\, y$$
Thus the MMSE estimator $\hat{X}(y)$ for two zero-mean jointly Gaussian random variables is linearly related to the data y. This result plays an important role in the optimal filtering of random signals.
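A short sketch of this linear estimator in action, with assumed parameter values: the coefficient $\rho_{XY}\sigma_X/\sigma_Y$ equals Cov(X, Y)/Var(Y), which can also be recovered from simulated data, and the resulting mean-square error is $\sigma_X^2(1-\rho_{XY}^2)$.

```python
import numpy as np

# Zero-mean jointly Gaussian case with assumed parameters: the MMSE estimator is
# x_hat = (rho * sig_x / sig_y) * y, and its coefficient equals Cov(X, Y) / Var(Y).
rng = np.random.default_rng(4)
sig_x, sig_y, rho = 1.5, 2.0, -0.6
cov = [[sig_x**2,            rho * sig_x * sig_y],
       [rho * sig_x * sig_y, sig_y**2]]
xy = rng.multivariate_normal([0.0, 0.0], cov, size=1_000_000)
x, y = xy[:, 0], xy[:, 1]

a_theory = rho * sig_x / sig_y                  # = -0.45
a_data = np.cov(x, y)[0, 1] / np.var(y)         # empirical Cov(X, Y) / Var(Y)
print(a_theory, a_data)
print(np.mean((a_theory * y - x) ** 2))         # MSE = sig_x**2 * (1 - rho**2) = 1.44
```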