Continuous Random Variables: Conditioning,
Expectation and Independence
Berlin Chen
Department of Computer Science & Information Engineering
National Taiwan Normal University
Reference:
- D. P. Bertsekas, J. N. Tsitsiklis, Introduction to Probability, Sections 3.4-3.5
Conditioning PDF Given an Event (1/3)
• The conditional PDF of a continuous random variable X, given an event A
– If A cannot be described in terms of X, the conditional PDF is defined as a
  nonnegative function f_{X|A}(x) satisfying
      P(X ∈ B | A) = ∫_B f_{X|A}(x) dx
• Normalization property
      ∫_{−∞}^{∞} f_{X|A}(x) dx = 1
Conditioning PDF Given an Event (2/3)
– If A can be described in terms of X (A is a subset of the real line with
  P(X ∈ A) > 0), the conditional PDF is defined as a nonnegative function
  f_{X|A}(x) satisfying
      f_{X|A}(x) = f_X(x) / P(X ∈ A),  if x ∈ A
      f_{X|A}(x) = 0,                  otherwise
• The conditional PDF is zero outside the conditioning event; within A, f_{X|A}
  keeps the same shape as f_X, except that it is scaled along the vertical
  axis. For any subset B,
      P(X ∈ B | X ∈ A) = P(X ∈ B, X ∈ A) / P(X ∈ A)
                       = ∫_{A∩B} f_X(x) dx / P(X ∈ A)
                       = ∫_B f_{X|A}(x) dx
– Normalization property
      ∫_{−∞}^{∞} f_{X|A}(x) dx = ∫_A f_{X|A}(x) dx = 1
Conditioning PDF Given an Event (3/3)
• If A_1, A_2, …, A_n are disjoint events with P(A_i) > 0 for each i that form
  a partition of the sample space, then
      f_X(x) = ∑_{i=1}^{n} P(A_i) f_{X|A_i}(x)
– Verification of the above total probability theorem
      P(X ≤ x) = ∑_{i=1}^{n} P(A_i) P(X ≤ x | A_i)
   ⇒ ∫_{−∞}^{x} f_X(t) dt = ∑_{i=1}^{n} P(A_i) ∫_{−∞}^{x} f_{X|A_i}(t) dt
  Taking the derivative of both sides with respect to x,
   ⇒ f_X(x) = ∑_{i=1}^{n} P(A_i) f_{X|A_i}(x)
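A quick numerical sketch of the total probability theorem for PDFs (my
addition, not from the text): mix two hypothetical exponential conditional
densities with arbitrary weights and check that the mixture still normalizes.

```python
import numpy as np

# Hypothetical partition: given A1, X is exponential with rate 1.0;
# given A2, with rate 3.0. The weights P(A1), P(A2) are arbitrary choices.
p = [0.4, 0.6]
rates = [1.0, 3.0]

def f_cond(x, lam):
    """Conditional PDF f_{X|Ai}(x) = lam * exp(-lam * x), x >= 0."""
    return lam * np.exp(-lam * x)

x = np.linspace(0.0, 20.0, 4001)
# Total probability theorem: f_X(x) = sum_i P(Ai) * f_{X|Ai}(x)
f_X = sum(p_i * f_cond(x, lam_i) for p_i, lam_i in zip(p, rates))

print(np.trapz(f_X, x))  # ~1.0: the mixture is itself a valid PDF
```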
An Illustrative Example
• Example 3.9. The exponential random variable is memoryless.
– The time T until a new light bulb burns out is exponentially distributed.
  John turns the light on, leaves the room, and when he returns t time units
  later finds that the light bulb is still on, which corresponds to the event
  A = {T > t}
– Let X be the additional time until the light bulb burns out. What is the
  conditional PDF of X given A?
      X = T − t,  A = {T > t}
  T is exponential:
      f_T(t) = λe^{−λt} for t > 0, and 0 otherwise;   P(T > t) = e^{−λt}
  The conditional CDF of X given A is determined by (for x ≥ 0)
      P(X > x | A) = P(T − t > x | T > t)
                   = P(T > t + x | T > t)
                   = P(T > t + x and T > t) / P(T > t)
                   = P(T > t + x) / P(T > t)
                   = e^{−λ(t+x)} / e^{−λt}
                   = e^{−λx}
  ∴ The conditional PDF of X given the event A is also exponential with
  parameter λ.
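A simulation sketch of Example 3.9 (my addition; λ and t are arbitrary):
conditioning an exponential sample on {T > t} and shifting by t should leave
the distribution unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, t = 0.5, 2.0                          # arbitrary rate and elapsed time
T = rng.exponential(scale=1/lam, size=1_000_000)

# Condition on A = {T > t}; the residual lifetime is X = T - t.
X = T[T > t] - t

# Memorylessness predicts P(X > x | A) = e^{-lam * x} for every x >= 0.
for x in (1.0, 2.0, 4.0):
    print(f"x={x}: empirical {(X > x).mean():.4f} vs exact {np.exp(-lam * x):.4f}")
```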
Conditional Expectation Given an Event
• The conditional expectation of a continuous random variable X, given an
  event A (with P(A) > 0), is defined by
      E[X | A] = ∫_{−∞}^{∞} x f_{X|A}(x) dx
– The conditional expectation of a function g(X) has the same form
      E[g(X) | A] = ∫_{−∞}^{∞} g(x) f_{X|A}(x) dx
– Total Expectation Theorem
      E[X] = ∑_{i=1}^{n} P(A_i) E[X | A_i]
  and
      E[g(X)] = ∑_{i=1}^{n} P(A_i) E[g(X) | A_i]
• where A_1, A_2, …, A_n are disjoint events with P(A_i) > 0 for each i that
  form a partition of the sample space
Illustrative Examples (1/2)
• Example 3.10. Mean and Variance of a Piecewise Constant PDF. Suppose that
  the random variable X has the piecewise constant PDF
      f_X(x) = 1/3, if 0 ≤ x ≤ 1,
               2/3, if 1 < x ≤ 2,
               0,   otherwise.
  Define the events
      A_1 = {X lies in the first interval [0, 1]}
      A_2 = {X lies in the second interval (1, 2]}
  ⇒ P(A_1) = ∫_0^1 (1/3) dx = 1/3,   P(A_2) = ∫_1^2 (2/3) dx = 2/3
  The conditional PDFs are uniform on the respective intervals:
      f_{X|A_1}(x) = f_X(x) / P(X ∈ A_1) = 1, for 0 ≤ x ≤ 1, and 0 otherwise
      f_{X|A_2}(x) = f_X(x) / P(X ∈ A_2) = 1, for 1 < x ≤ 2, and 0 otherwise
  Recall that the mean and second moment of a uniform random variable over an
  interval [a, b] are (a + b)/2 and (a² + ab + b²)/3, respectively.
  ⇒ E[X | A_1] = 1/2,  E[X² | A_1] = 1/3,  E[X | A_2] = 3/2,  E[X² | A_2] = 7/3
  By the total expectation theorem,
      E[X]  = P(A_1) E[X | A_1] + P(A_2) E[X | A_2]
            = (1/3)(1/2) + (2/3)(3/2) = 7/6
      E[X²] = P(A_1) E[X² | A_1] + P(A_2) E[X² | A_2]
            = (1/3)(1/3) + (2/3)(7/3) = 15/9
  ∴ var(X) = 15/9 − (7/6)² = 11/36
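The arithmetic above can be cross-checked by integrating the piecewise PDF on
a grid (a sketch I added):

```python
import numpy as np

x = np.linspace(0.0, 2.0, 200_001)
f = np.where(x <= 1.0, 1/3, 2/3)      # the piecewise constant PDF of Example 3.10

EX  = np.trapz(x * f, x)              # expect 7/6 ≈ 1.1667
EX2 = np.trapz(x**2 * f, x)           # expect 15/9 ≈ 1.6667
print(EX, EX2, EX2 - EX**2)           # variance should be 11/36 ≈ 0.3056
```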
Illustrative Examples (2/2)
• Example 3.11. The metro train arrives at the station near your home every
  quarter hour starting at 6:00 AM. You walk into the station every morning
  between 7:10 and 7:30 AM, with the time in this interval being a uniform
  random variable. What is the PDF of the time you have to wait for the first
  train to arrive?
– The arrival time, denoted by X, is a uniform random variable over the
  interval 7:10 to 7:30
– Let the random variable Y model the waiting time
– Let A be the event A = {7:10 ≤ X ≤ 7:15} (you board the 7:15 train);
  conditioned on A, Y is uniform on [0, 5]
– Let B be the event B = {7:15 < X ≤ 7:30} (you board the 7:30 train);
  conditioned on B, Y is uniform on [0, 15]
– By the total probability theorem (evaluated numerically below),
      f_Y(y) = P(A) f_{Y|A}(y) + P(B) f_{Y|B}(y)
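A short numerical check (my addition). With X uniform over the 20-minute
window, P(A) = 5/20 = 1/4 and P(B) = 15/20 = 3/4, and the mixture gives
f_Y(y) = 1/10 on [0, 5] and 1/20 on (5, 15]:

```python
import numpy as np

P_A, P_B = 1/4, 3/4                                   # board 7:15 vs 7:30 train

def f_Y_given_A(y):                                   # uniform on [0, 5]
    return np.where((y >= 0) & (y <= 5), 1/5, 0.0)

def f_Y_given_B(y):                                   # uniform on [0, 15]
    return np.where((y >= 0) & (y <= 15), 1/15, 0.0)

y = np.linspace(0.0, 15.0, 1501)                      # grid step 0.01
f_Y = P_A * f_Y_given_A(y) + P_B * f_Y_given_B(y)

print(f_Y[200], f_Y[1000])   # y = 2 -> 0.10 (= 1/10); y = 10 -> 0.05 (= 1/20)
print(np.trapz(f_Y, y))      # ~1.0
```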
Multiple Continuous Random Variables (1/2)
• Two continuous random variables X and Y associated with a common experiment
  are jointly continuous and can be described in terms of a joint PDF f_{X,Y}
  satisfying
      P((X, Y) ∈ B) = ∫∫_{(x,y)∈B} f_{X,Y}(x, y) dx dy
– f_{X,Y} is a nonnegative function
– Normalization property
      ∫_{−∞}^{∞} ∫_{−∞}^{∞} f_{X,Y}(x, y) dx dy = 1
• Similarly, f_{X,Y}(a, c) can be viewed as the "probability per unit area" in
  the vicinity of (a, c):
      P(a ≤ X ≤ a + δ, c ≤ Y ≤ c + δ) = ∫_a^{a+δ} ∫_c^{c+δ} f_{X,Y}(x, y) dx dy
                                       ≈ f_{X,Y}(a, c) · δ²
– where δ is a small positive number
Multiple Continuous Random Variables (2/2)
• Marginal probability
      P(X ∈ A) = P(X ∈ A and Y ∈ (−∞, ∞))
               = ∫_{x∈A} ∫_{−∞}^{∞} f_{X,Y}(x, y) dy dx
– We have already defined that
      P(X ∈ A) = ∫_{x∈A} f_X(x) dx
• We thus have the marginal PDF
      f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy
  Similarly,
      f_Y(y) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dx
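A generic marginalization sketch (mine; the product of standard normal
densities is just a convenient test joint PDF with a known marginal):

```python
import numpy as np

def f_XY(x, y):
    """Test joint PDF (an assumption): two independent standard normals."""
    return np.exp(-(x**2 + y**2) / 2) / (2 * np.pi)

y = np.linspace(-8.0, 8.0, 4001)
for x0 in (0.0, 1.0):
    fx = np.trapz(f_XY(x0, y), y)                 # f_X(x0) = ∫ f_{X,Y}(x0, y) dy
    exact = np.exp(-x0**2 / 2) / np.sqrt(2 * np.pi)
    print(f"f_X({x0}) ≈ {fx:.6f}, exact {exact:.6f}")
```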
An Illustrative Example
• Example 3.13. Two-Dimensional Uniform PDF. We are told that the joint PDF of
  the random variables X and Y is a constant c on an area S and is zero
  outside. Find the value of c and the marginal PDFs of X and Y.
  The corresponding uniform joint PDF on an area S is defined to be
  (cf. Example 3.12)
      f_{X,Y}(x, y) = 1 / (size of area S), if (x, y) ∈ S, and 0 otherwise
  Here S is the L-shaped region [1, 2] × [1, 4] ∪ [2, 3] × [2, 3] (as read off
  the integration limits below), whose area is 4, so
      f_{X,Y}(x, y) = 1/4  for (x, y) ∈ S
  Marginal PDF of X:
      for 1 ≤ x ≤ 2:  f_X(x) = ∫_1^4 f_{X,Y}(x, y) dy = ∫_1^4 (1/4) dy = 3/4
      for 2 < x ≤ 3:  f_X(x) = ∫_2^3 f_{X,Y}(x, y) dy = ∫_2^3 (1/4) dy = 1/4
  Marginal PDF of Y:
      for 1 ≤ y ≤ 2:  f_Y(y) = ∫_1^2 f_{X,Y}(x, y) dx = ∫_1^2 (1/4) dx = 1/4
      for 2 < y ≤ 3:  f_Y(y) = ∫_1^3 f_{X,Y}(x, y) dx = ∫_1^3 (1/4) dx = 1/2
      for 3 < y ≤ 4:  f_Y(y) = ∫_1^2 f_{X,Y}(x, y) dx = ∫_1^2 (1/4) dx = 1/4
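A Monte Carlo spot-check of these marginals (my addition), sampling uniformly
on the L-shaped S by first picking a sub-rectangle in proportion to its area:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# S = [1,2]x[1,4] (area 3) union [2,3]x[2,3] (area 1): pick a piece by area.
first = rng.random(n) < 3/4
x = np.where(first, rng.uniform(1, 2, n), rng.uniform(2, 3, n))

# Histogram density over [1,2] and [2,3] estimates the marginal f_X.
dens, _ = np.histogram(x, bins=[1, 2, 3], density=True)
print(dens)   # ≈ [0.75, 0.25], matching f_X(x) = 3/4 and 1/4
```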
Conditioning one Random Variable on Another
• Two continuous random variables X and Y have a joint PDF. For any y with
  f_Y(y) > 0, the conditional PDF of X given that Y = y is defined by
      f_{X|Y}(x | y) = f_{X,Y}(x, y) / f_Y(y)
– Normalization property
      ∫_{−∞}^{∞} f_{X|Y}(x | y) dx = 1
• The marginal, joint and conditional PDFs are related to each other by the
  following formulas:
      f_{X,Y}(x, y) = f_Y(y) f_{X|Y}(x | y),
      f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy.        (marginalization)
Illustrative Examples (1/2)
• Notice that the conditional PDF f_{X|Y}(x | y) has the same shape as the
  joint PDF f_{X,Y}(x, y), because the normalizing factor f_Y(y) does not
  depend on x (cf. Example 3.13):
      f_{X|Y}(x | 1.5) = f_{X,Y}(x, 1.5) / f_Y(1.5) = (1/4) / (1/4) = 1
      f_{X|Y}(x | 2.5) = f_{X,Y}(x, 2.5) / f_Y(2.5) = (1/4) / (1/2) = 1/2
      f_{X|Y}(x | 3.5) = f_{X,Y}(x, 3.5) / f_Y(3.5) = (1/4) / (1/4) = 1
  Figure 3.17: Visualization of the conditional PDF f_{X|Y}(x | y). Let X, Y
  have a joint PDF which is uniform on the set S. For each fixed y, we
  consider the joint PDF along the slice Y = y and normalize it so that it
  integrates to 1.
Illustrative Examples (2/2)
• Example 3.15. Circular Uniform PDF. Ben throws a dart at a circular target
  of radius r. We assume that he always hits the target, and that all points
  of impact (x, y) are equally likely, so that the joint PDF f_{X,Y}(x, y) of
  the random variables X and Y is uniform.
– What is the marginal PDF f_Y(y)?
      f_{X,Y}(x, y) = 1 / (area of the circle), if (x, y) is in the circle,
                      and 0 otherwise
                    = 1 / (πr²), if x² + y² ≤ r², and 0 otherwise
      f_Y(y) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dx
             = (1/πr²) ∫_{x²+y²≤r²} 1 dx
             = (1/πr²) ∫_{−√(r²−y²)}^{√(r²−y²)} 1 dx
             = (2/πr²) √(r²−y²),  if |y| ≤ r
  (Notice that the marginal PDF f_Y(y) is not uniform.)
      f_{X|Y}(x | y) = f_{X,Y}(x, y) / f_Y(y)
                     = (1/πr²) / ((2/πr²) √(r²−y²))
                     = 1 / (2√(r²−y²)),  if x² + y² ≤ r²
  For each value y, f_{X|Y}(x | y) is uniform (in x).
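A rejection-sampling check of the semicircular marginal
f_Y(y) = 2√(r² − y²)/(πr²) (my sketch, with r = 1):

```python
import numpy as np

rng = np.random.default_rng(2)
r, n = 1.0, 2_000_000

# Rejection-sample uniform points in the disk x^2 + y^2 <= r^2.
x = rng.uniform(-r, r, n)
y = rng.uniform(-r, r, n)
ys = y[x**2 + y**2 <= r**2]

# Compare an empirical density estimate with f_Y(y) = 2*sqrt(r^2 - y^2)/(pi r^2).
for y0 in (0.0, 0.5, 0.9):
    h = 0.01
    emp = np.mean(np.abs(ys - y0) < h) / (2 * h)
    exact = 2 * np.sqrt(r**2 - y0**2) / (np.pi * r**2)
    print(f"f_Y({y0}) ≈ {emp:.3f}  (exact {exact:.3f})")
```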
Expectation of a Function of Random Variables
• If X and Y are jointly continuous random variables, and g is some function,
  then Z = g(X, Y) is also a random variable (which can be continuous or
  discrete)
– The expectation of Z can be calculated by
      E[Z] = E[g(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f_{X,Y}(x, y) dx dy
– If Z is a linear function of X and Y, e.g., Z = aX + bY, then
      E[Z] = E[aX + bY] = a E[X] + b E[Y]
  where a and b are scalars
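A numerical instance (mine): computing E[g(X, Y)] by double integration for an
assumed test density, the uniform on the unit square, with g(x, y) = x + 2y;
the result also illustrates the linearity rule.

```python
from scipy.integrate import dblquad

# E[g(X,Y)] = double integral of g(x,y) * f_{X,Y}(x,y); here f = 1 on [0,1]^2.
g = lambda y, x: x + 2 * y          # dblquad's integrand signature is func(y, x)
val, err = dblquad(g, 0, 1, lambda x: 0, lambda x: 1)
print(val)  # 1.5 = a*E[X] + b*E[Y] = 1*0.5 + 2*0.5, matching linearity
```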
Conditional Expectation
• The properties of unconditional expectation carry through, with the obvious
  modifications, to conditional expectation:
      E[X | Y = y] = ∫_{−∞}^{∞} x f_{X|Y}(x | y) dx
      E[g(X) | Y = y] = ∫_{−∞}^{∞} g(x) f_{X|Y}(x | y) dx
      E[g(X, Y) | Y = y] = ∫_{−∞}^{∞} g(x, y) f_{X|Y}(x | y) dx
Total Probability/Expectation Theorems
• Total Probability Theorem
– For any event A and a continuous random variable Y
      P(A) = ∫_{−∞}^{∞} P(A | Y = y) f_Y(y) dy
• Total Expectation Theorem
– For any continuous random variables X and Y
      E[X] = ∫_{−∞}^{∞} E[X | Y = y] f_Y(y) dy
      E[g(X)] = ∫_{−∞}^{∞} E[g(X) | Y = y] f_Y(y) dy
      E[g(X, Y)] = ∫_{−∞}^{∞} E[g(X, Y) | Y = y] f_Y(y) dy
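A simulation sketch of the total expectation theorem (my addition, under an
assumed hierarchical model): Y uniform on (0, 1) and, given Y = y, X
exponential with mean y, so E[X] = ∫ E[X | Y = y] f_Y(y) dy = E[Y] = 1/2.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000_000

# Hypothetical model: Y ~ Uniform(0, 1); given Y = y, X ~ Exponential(mean y).
Y = rng.uniform(0.0, 1.0, n)
X = rng.exponential(Y)                 # per-sample scale (= mean) y

# E[X] = integral of E[X | Y = y] * f_Y(y) dy = ∫_0^1 y dy = 1/2
print(X.mean())                        # ~0.5
```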
Independence
• Two continuous random variables X and Y are independent if
      f_{X,Y}(x, y) = f_X(x) f_Y(y),  for all x, y
– Since
      f_{X,Y}(x, y) = f_Y(y) f_{X|Y}(x | y) = f_X(x) f_{Y|X}(y | x)
• we therefore have
      f_{X|Y}(x | y) = f_X(x),  for all x and all y with f_Y(y) > 0
• or
      f_{Y|X}(y | x) = f_Y(y),  for all y and all x with f_X(x) > 0
More Facts about Independence (1/2)
• If two continuous random variables X and Y are independent, then any two
  events of the form {X ∈ A} and {Y ∈ B} are independent:
      P(X ∈ A, Y ∈ B) = ∫_{x∈A} ∫_{y∈B} f_{X,Y}(x, y) dy dx
                      = ∫_{x∈A} ∫_{y∈B} f_X(x) f_Y(y) dy dx
                      = [∫_{x∈A} f_X(x) dx] [∫_{y∈B} f_Y(y) dy]
                      = P(X ∈ A) P(Y ∈ B)
– The converse statement is also true (see the end-of-chapter problem 28)
More Facts about Independence (2/2)
• If two continuous random variables X and Y are independent, then
      E[XY] = E[X] E[Y]
      var(X + Y) = var(X) + var(Y)
– The random variables g(X) and h(Y) are independent for any functions g and
  h; therefore,
      E[g(X) h(Y)] = E[g(X)] E[h(Y)]
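These consequences of independence are easy to sanity-check by simulation (my
sketch; the two distributions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2_000_000
X = rng.exponential(2.0, n)     # arbitrary independent draws
Y = rng.normal(1.0, 3.0, n)

print(np.mean(X * Y), np.mean(X) * np.mean(Y))        # E[XY] = E[X]E[Y]
print(np.var(X + Y), np.var(X) + np.var(Y))           # var(X+Y) = var(X)+var(Y)
print(np.mean(X**2 * np.sin(Y)),
      np.mean(X**2) * np.mean(np.sin(Y)))             # E[g(X)h(Y)] = E[g(X)]E[h(Y)]
```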
Joint CDFs
• If X and Y are two (either continuous or discrete) random variables, their
  joint cumulative distribution function (CDF) is defined by
      F_{X,Y}(x, y) = P(X ≤ x, Y ≤ y)
– If X and Y further have a joint PDF f_{X,Y}, then
      F_{X,Y}(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f_{X,Y}(s, t) ds dt
  and
      f_{X,Y}(x, y) = ∂²F_{X,Y}(x, y) / ∂x∂y,
  provided F_{X,Y} can be differentiated at the point (x, y)
An Illustrative Example
• Example 3.20. Verify that if X and Y are described by a uniform PDF on the
  unit square, then the joint CDF is given by
      F_{X,Y}(x, y) = P(X ≤ x, Y ≤ y) = xy,  for 0 ≤ x, y ≤ 1
  Differentiating recovers the uniform density:
      ∂²F_{X,Y}(x, y) / ∂x∂y = 1 = f_{X,Y}(x, y),
      for all (x, y) in the unit square
  [Figure: the unit square with corners (0,0), (1,0), (0,1), (1,1) in the
  X–Y plane]
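The differentiation step can be reproduced symbolically (a sketch using sympy,
my addition):

```python
import sympy as sp

x, y = sp.symbols("x y")
F = x * y                     # joint CDF on the unit square, Example 3.20
f = sp.diff(F, x, y)          # f_{X,Y} = d^2 F / (dx dy)
print(f)                      # prints 1: the uniform joint PDF
```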
Recall: the Discrete Bayes’ Rule
• Let A_1, A_2, …, A_n be disjoint events that form a partition of the sample
  space, and assume that P(A_i) > 0 for all i. Then, for any event B such that
  P(B) > 0, we have
      P(A_i | B) = P(A_i) P(B | A_i) / P(B)                        (multiplication rule)
                 = P(A_i) P(B | A_i)
                   / (P(A_1) P(B | A_1) + … + P(A_n) P(B | A_n))   (total probability theorem)
                 = P(A_i) P(B | A_i) / ∑_{k=1}^{n} P(A_k) P(B | A_k)
Inference and the Continuous Bayes’ Rule (1/2)
• Suppose we have a model of an underlying but unobserved phenomenon,
  represented by a random variable X with PDF f_X, and we make a noisy
  measurement Y, which is modeled in terms of a conditional PDF f_{Y|X}. Once
  the experimental value of Y is measured, what information does this provide
  on the unknown value of X?
      X  ──►  Measurement  ──►  Y  ──►  Inference  ──►  f_{X|Y}(x | y)
     f_X      f_{Y|X}(y | x)
  The continuous Bayes' rule:
      f_{X|Y}(x | y) = f_X(x) f_{Y|X}(y | x) / f_Y(y)
                     = f_X(x) f_{Y|X}(y | x) / ∫_{−∞}^{∞} f_X(t) f_{Y|X}(y | t) dt
  Note that
      f_X(x) f_{Y|X}(y | x) = f_{X,Y}(x, y) = f_Y(y) f_{X|Y}(x | y)
Inference and the Continuous Bayes’ Rule (2/2)
• If the unobserved phenomenon is inherently discrete
– Let N be a discrete random variable that represents the different discrete
  possibilities for the unobserved phenomenon of interest, and let p_N be the
  PMF of N. For small δ > 0,
      P(N = n | Y = y) ≈ P(N = n | y ≤ Y ≤ y + δ)
                       = P(N = n) P(y ≤ Y ≤ y + δ | N = n) / P(y ≤ Y ≤ y + δ)
                       ≈ p_N(n) f_{Y|N}(y | n) δ / (f_Y(y) δ)
                       = p_N(n) f_{Y|N}(y | n) / ∑_i p_N(i) f_{Y|N}(y | i)
  where the denominator uses the total probability theorem
Illustrative Examples (1/2)
• Example 3.18. A lightbulb produced by the General Illumination Company is
  known to have an exponentially distributed lifetime Y. However, the company
  has been experiencing quality control problems. On any given day, the
  parameter Λ = λ of the PDF of Y is actually a random variable, uniformly
  distributed in the interval [1, 3/2].
– If we test a lightbulb and record its lifetime (Y = y), what can we say
  about the underlying parameter λ?
  Conditioned on Λ = λ, Y has an exponential distribution with parameter λ:
      f_{Y|Λ}(y | λ) = λe^{−λy},  y ≥ 0, λ > 0
  The prior is uniform:
      f_Λ(λ) = 2 for 1 ≤ λ ≤ 3/2, and 0 otherwise
  By the continuous Bayes' rule,
      f_{Λ|Y}(λ | y) = f_Λ(λ) f_{Y|Λ}(y | λ) / ∫_1^{3/2} f_Λ(t) f_{Y|Λ}(y | t) dt
                     = 2λe^{−λy} / ∫_1^{3/2} 2t e^{−ty} dt,  for 1 ≤ λ ≤ 3/2
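A numerical evaluation of this posterior (my sketch; scipy's quad computes the
normalizing integral):

```python
import numpy as np
from scipy.integrate import quad

def posterior(lam, y):
    """f_{Lambda|Y}(lam | y) for Example 3.18, supported on [1, 3/2]."""
    if not 1.0 <= lam <= 1.5:
        return 0.0
    den, _ = quad(lambda t: 2 * t * np.exp(-t * y), 1.0, 1.5)
    return 2 * lam * np.exp(-lam * y) / den

for lam in (1.0, 1.25, 1.5):
    print(lam, posterior(lam, y=0.5))  # a short lifetime favors larger lam
```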
Illustrative Examples (2/2)
• Example 3.19. Signal Detection. A binary signal S is transmitted, and we are
  given that P(S = 1) = p and P(S = −1) = 1 − p.
– The received signal is Y = S + N, where N is standard normal noise (zero
  mean, unit variance), independent of S.
– What is the probability that S = 1, as a function of the observed value y
  of Y?
  Conditioned on S = s, Y has a normal distribution with mean s and unit
  variance:
      f_{Y|S}(y | s) = (1/√(2π)) e^{−(y−s)²/2},  for s = 1 and −1, −∞ < y < ∞
  By Bayes' rule,
      P(S = 1 | Y = y) = p_S(1) f_{Y|S}(y | 1) / f_Y(y)
                       = p_S(1) f_{Y|S}(y | 1)
                         / (p_S(1) f_{Y|S}(y | 1) + p_S(−1) f_{Y|S}(y | −1))
                       = p e^{−(y−1)²/2} / (p e^{−(y−1)²/2} + (1 − p) e^{−(y+1)²/2})
                       = e^{−(y²+1)/2} · p e^{y}
                         / (e^{−(y²+1)/2} · (p e^{y} + (1 − p) e^{−y}))
                       = p e^{y} / (p e^{y} + (1 − p) e^{−y})
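The closed form packages neatly as a function (my addition); note that it is a
logistic function of 2y:

```python
import numpy as np

def prob_S_is_1(y, p):
    """P(S = 1 | Y = y) = p e^y / (p e^y + (1 - p) e^{-y})."""
    return p * np.exp(y) / (p * np.exp(y) + (1 - p) * np.exp(-y))

print(prob_S_is_1(0.0, 0.5))    # 0.5: y = 0 carries no information when p = 1/2
print(prob_S_is_1(2.0, 0.5))    # ≈0.982: strong evidence for S = 1
print(prob_S_is_1(-2.0, 0.7))   # prior favors S = 1, observation argues against
```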
Recitation
• SECTION 3.4 Conditioning on an Event
– Problems 14, 17, 18
• SECTION 3.5 Multiple Continuous Random Variables
– Problems 19, 24, 25, 26, 28