B.Sc./Cert./M.Sc. Qualif. - Statistical Theory

4 Pairs of Random Variables

4.1 Introduction
In this section, we consider a pair of r.v.'s X, Y on (Ω, F, P), i.e. X, Y : Ω → R. More precisely,
we define a random vector (X, Y) : Ω → R² by setting (X, Y)(ω) = (X(ω), Y(ω)).
In general, there will be some dependence between X and Y ; this dependence can be encapsulated through an appropriately defined distribution function.
It will be useful to define the concept of range for this bivariate situation.
Definition 4.1.1 (Range)
The range of a random vector (X, Y) is given by

    R_{(X,Y)} = {(x, y) : X(ω) = x, Y(ω) = y for some ω ∈ Ω}.
4.2 Discrete case
Definition 4.2.1 (distribution function/p.m.f. (bivariate case))
Suppose X and Y are discrete r.v.'s.
(i) The joint distribution function F_{(X,Y)} : R² → [0, 1] of X and Y is given by

    F_{(X,Y)}(x, y) = P(X ≤ x, Y ≤ y).

(ii) The joint probability mass function p_{(X,Y)} : R² → [0, 1] is given by

    p_{(X,Y)}(x, y) = P(X = x, Y = y).

As with single r.v.'s, we can say that p_{(X,Y)} is a p.m.f. if and only if
(i) p_{(X,Y)}(x, y) ≥ 0 for (x, y) ∈ R²;
(ii) Σ_{x,y} p_{(X,Y)}(x, y) = 1.
Given the joint p.m.f., it is possible to calculate the marginal p.m.f.'s, namely, p_X and p_Y.
In fact, suppose that X takes values in a countable set R_X = {x₁, x₂, . . .}, and Y takes values
in R_Y = {y₁, y₂, . . .}.
Then, for xᵢ ∈ R_X,

    p_X(xᵢ) = P(X = xᵢ) = P({ω ∈ Ω : X(ω) = xᵢ})
            = P(⋃_j {ω ∈ Ω : X(ω) = xᵢ, Y(ω) = y_j})
            = Σ_j P(X = xᵢ, Y = y_j) = Σ_j p_{(X,Y)}(xᵢ, y_j).

To derive the above expression, we have used the fact that, for A_j = {X = xᵢ, Y = y_j},
{A_j, j = 1, 2, . . .} is a partition of the set {X = xᵢ}. The final equality follows from
Definition 4.2.1 (ii).
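The sketch below (Python; the joint p.m.f. values are made up purely for illustration) carries out exactly this marginalisation for a finite joint p.m.f. stored as a table.

```python
# Minimal sketch: marginalising a toy (made-up) finite joint p.m.f.
# stored as a dict {(x, y): probability}.
joint = {
    (0, 0): 0.10, (0, 1): 0.20,
    (1, 0): 0.25, (1, 1): 0.15,
    (2, 0): 0.05, (2, 1): 0.25,
}
assert abs(sum(joint.values()) - 1.0) < 1e-12  # property (ii) of a p.m.f.

# p_X(x_i) = sum_j p_{(X,Y)}(x_i, y_j): sum the joint p.m.f. over y.
p_X = {}
for (x, y), p in joint.items():
    p_X[x] = p_X.get(x, 0.0) + p

print(p_X)  # {0: 0.3, 1: 0.4, 2: 0.3}
```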
Example 4.2.2
Let the number of enquiries arriving into a call centre (in a specified period of time) and
the number of these which are answered be represented by the random variables X and Y
respectively.
Suppose that the joint p.m.f. of X and Y is given by

    p_{(X,Y)}(m, n) = e^{−λ} λ^m θ^n (1 − θ)^{m−n} / ((m − n)! n!),

with 0 < θ < 1, λ > 0.
Find the marginal distributions of X and Y .
Solution:
Note that

    R_{(X,Y)} = {(m, n) : m ∈ N, n ∈ N, n ≤ m},

where N is defined to be the set of non-negative integers, and so includes 0.
To find the p.m.f. of X, first note that the range of X is R_X = {0, 1, 2, . . .}.
For a given m ∈ R_X,

    p_X(m) = P(X = m) = Σ_{n=0}^{m} P(X = m, Y = n) = Σ_{n=0}^{m} p_{(X,Y)}(m, n)

           = Σ_{n=0}^{m} e^{−λ} λ^m θ^n (1 − θ)^{m−n} / ((m − n)! n!)

           = e^{−λ} (λ^m / m!) Σ_{n=0}^{m} [m! / ((m − n)! n!)] θ^n (1 − θ)^{m−n}

           = e^{−λ} (λ^m / m!) × 1 = e^{−λ} λ^m / m!.

We have used the fact that the sum of the p.m.f. of the Bin(m, θ) distribution over its range
is equal to 1.
We recognize the above function, with the stated range, i.e. p_X(m), to be the p.m.f. of the
Po(λ) distribution, i.e. X ∼ Po(λ).
Also, R_Y = {0, 1, 2, . . .}.
So for n ∈ R_Y,

    p_Y(n) = P(Y = n) = Σ_m P(X = m, Y = n) = Σ_m p_{(X,Y)}(m, n)

           = Σ_{m=n}^{∞} e^{−λ} λ^m θ^n (1 − θ)^{m−n} / ((m − n)! n!)

(the restricted summation is valid due to the fact that X ≥ Y). So p_Y(n) is equal to

    e^{−λ} ((λθ)^n / n!) Σ_{m=n}^{∞} (λ(1 − θ))^{m−n} / (m − n)!
        = e^{−λ} ((λθ)^n / n!) Σ_{m=0}^{∞} (λ(1 − θ))^m / m!
        = e^{−λ} e^{λ(1−θ)} (λθ)^n / n!,

which is equal to

    e^{−λθ} (λθ)^n / n!,   n = 0, 1, 2, . . . ,

i.e. Y ∼ Po(λθ).
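As a numerical sanity check on this example (a Python sketch; λ = 3 and θ = 0.4 are arbitrary illustrative values, and the infinite sum for p_Y is truncated), we can sum the joint p.m.f. directly and compare with the Po(λ) and Po(λθ) p.m.f.'s:

```python
import math

lam, theta = 3.0, 0.4
M = 60  # truncation point; the Poisson tail beyond this is negligible for lam = 3

def joint(m, n):
    # p_{(X,Y)}(m, n) = e^{-lam} lam^m theta^n (1-theta)^(m-n) / ((m-n)! n!)
    return math.exp(-lam) * lam**m * theta**n * (1 - theta)**(m - n) \
           / (math.factorial(m - n) * math.factorial(n))

def poisson(mu, k):
    return math.exp(-mu) * mu**k / math.factorial(k)

# Marginal of X: sum over n = 0..m, compare with the Po(lam) p.m.f.
for m in range(5):
    pX = sum(joint(m, n) for n in range(m + 1))
    assert abs(pX - poisson(lam, m)) < 1e-12

# Marginal of Y: sum over m = n..M, compare with the Po(lam*theta) p.m.f.
for n in range(5):
    pY = sum(joint(m, n) for m in range(n, M + 1))
    assert abs(pY - poisson(lam * theta, n)) < 1e-10

print("marginals agree with Po(lam) and Po(lam*theta)")
```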
4.3 Continuous case

The definition of the joint distribution function is exactly the same as that for the discrete
case. The continuous analogue of the p.m.f. is introduced here.

Definition 4.3.1 (Joint p.d.f.)
X and Y are (jointly) continuous with joint p.d.f. f_{(X,Y)} : R² → [0, ∞) if

    F_{(X,Y)}(x, y) = ∫_{−∞}^{y} ∫_{−∞}^{x} f_{(X,Y)}(u, v) du dv

for each x, y ∈ R.
In line with earlier parts of this discussion, a joint p.d.f., f_{(X,Y)} say, of r.v.'s X and Y, can be
characterized by the following conditions:
(i) f_{(X,Y)}(x, y) ≥ 0 for (x, y) ∈ R²;
(ii) ∫∫_{R²} f_{(X,Y)}(x, y) dx dy = 1.
Again, given the joint p.d.f. of X and Y, we can calculate the marginal p.d.f.'s.
First note that

    P(X ∈ A) = ∫_A f_X(x) dx.                                                (1)

However,

    P(X ∈ A) = P(X ∈ A, −∞ < Y < ∞) = ∫_A (∫_{−∞}^{∞} f_{(X,Y)}(x, y) dy) dx.    (2)

Combining (1) and (2) yields

    ∫_A f_X(x) dx = ∫_A (∫_{−∞}^{∞} f_{(X,Y)}(x, y) dy) dx.

Hence

    f_X(x) = ∫_{−∞}^{∞} f_{(X,Y)}(x, y) dy.

In a similar way, it may be shown that

    f_Y(y) = ∫_{−∞}^{∞} f_{(X,Y)}(x, y) dx.
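These marginalisation formulas are easy to check numerically. The Python sketch below uses scipy.integrate.quad; the joint p.d.f. f(x, y) = x + y on the unit square is a toy example chosen purely for illustration, with marginal f_X(x) = x + 1/2.

```python
from scipy.integrate import quad

# Toy joint p.d.f. (an assumption for illustration): f(x, y) = x + y
# on the unit square 0 < x, y < 1; its marginal is f_X(x) = x + 1/2.
def f_joint(x, y):
    return x + y if (0 < x < 1 and 0 < y < 1) else 0.0

def f_X(x):
    # f_X(x) = integral of the joint p.d.f. over y
    value, _err = quad(lambda y: f_joint(x, y), 0.0, 1.0)
    return value

for x in (0.2, 0.5, 0.9):
    assert abs(f_X(x) - (x + 0.5)) < 1e-8

# The marginal itself integrates to 1, as any p.d.f. must.
total, _err = quad(f_X, 0.0, 1.0)
assert abs(total - 1.0) < 1e-8
```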
Remarks 4.3.2
It is easy to extend and generalize results in obvious ways for random vectors of dimension
n > 2, for both the discrete and continuous cases; we will explore some of these results in a
future lecture.
Example 4.3.3
Suppose that

    f_{(X,Y)}(x, y) = { α(1 − x − y)   x > 0, y > 0, x + y < 1
                      { 0              o.w.

Determine the constant α. Hence, determine the marginal p.d.f. of X.
Solution: In this case, the range of (X, Y) is given by

    R_{(X,Y)} = {(x, y) : x > 0, y > 0, x + y < 1},

with α > 0.
α can be determined by exploiting property (ii):

    1 = ∫∫_{R_{(X,Y)}} f_{(X,Y)}(x, y) dx dy = α ∫∫_{R_{(X,Y)}} (1 − x − y) dx dy

      = α ∫_0^1 (∫_0^{1−x} (1 − x − y) dy) dx = α ∫_0^1 (1 − x)²/2 dx = [−α(1 − x)³/6]_0^1 = α/6

    ⇒ α = 6.

Indeed, if we stop after the integration w.r.t. y, with the knowledge that α = 6, we see that

    f_X(x) = 3(1 − x)²,   x ∈ R_X = (0, 1).
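A numerical check of this solution (a Python sketch using scipy.integrate.dblquad; illustrative only): the density with α = 6 should integrate to 1 over the triangle, and integrating out y should recover 3(1 − x)².

```python
from scipy.integrate import dblquad, quad

# Joint p.d.f. of Example 4.3.3 with alpha = 6, on the triangle
# x > 0, y > 0, x + y < 1.
f = lambda x, y: 6.0 * (1.0 - x - y)

# dblquad integrates func(y, x), inner variable first; the inner y-range
# (0, 1 - x) encodes the triangular support.
total, _err = dblquad(lambda y, x: f(x, y), 0.0, 1.0,
                      lambda x: 0.0, lambda x: 1.0 - x)
assert abs(total - 1.0) < 1e-10       # alpha = 6 normalises the density

# Marginal of X: integrate out y; should equal 3(1 - x)^2 on (0, 1).
for x in (0.1, 0.5, 0.8):
    fx, _err = quad(lambda y: f(x, y), 0.0, 1.0 - x)
    assert abs(fx - 3.0 * (1.0 - x) ** 2) < 1e-10
```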
Example 4.3.4 Suppose that

    f_{(X,Y)}(x, y) = { x² + xy/3   0 < x < 1, 0 < y < 2
                      { 0           o.w.

Find the probability that the sum of X and Y is less than 1.
Solution:

    P(X + Y < 1) = ∫∫_{x+y<1} f_{(X,Y)}(x, y) dx dy = ∫_0^1 {∫_0^{1−x} f_{(X,Y)}(x, y) dy} dx

                 = ∫_0^1 {∫_0^{1−x} (x² + xy/3) dy} dx = ∫_0^1 [x²y + xy²/6]_{y=0}^{1−x} dx

                 = ∫_0^1 {x²(1 − x) + x(1 − x)²/6} dx = 7/72.
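The same kind of numerical check applies here (a Python sketch, illustrative only):

```python
from scipy.integrate import dblquad

# Joint p.d.f. of Example 4.3.4 on 0 < x < 1, 0 < y < 2.
f = lambda x, y: x**2 + x * y / 3.0

# P(X + Y < 1): x runs over (0, 1) and, for each x, y over (0, 1 - x).
prob, _err = dblquad(lambda y, x: f(x, y), 0.0, 1.0,
                     lambda x: 0.0, lambda x: 1.0 - x)
print(prob, 7.0 / 72.0)              # both ~ 0.09722
assert abs(prob - 7.0 / 72.0) < 1e-10
```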
4.4 Further Remarks on the Distribution function

In this section, we summarize some of the properties of the distribution function that apply
both to the discrete and continuous cases.

Remarks 4.4.1 (Properties of the Bivariate distribution function)
(i) 0 ≤ F_{(X,Y)}(x, y) ≤ 1,

    lim_{x→∞, y→∞} F_{(X,Y)}(x, y) = 1,

    lim_{x→∞, y→−∞} F_{(X,Y)}(x, y) = lim_{x→−∞, y→∞} F_{(X,Y)}(x, y)
        = lim_{x→−∞, y→−∞} F_{(X,Y)}(x, y) = 0.

(ii) For fixed x, F_{(X,Y)}(x, y) is monotone increasing in y.
Similarly, for fixed y, F_{(X,Y)}(x, y) is monotone increasing in x.

(iii) lim_{x→∞} F_{(X,Y)}(x, y) = lim_{x→∞} P(X ≤ x, Y ≤ y) = P(X < ∞, Y ≤ y) = P(Y ≤ y) = F_Y(y).

      lim_{y→∞} F_{(X,Y)}(x, y) = lim_{y→∞} P(X ≤ x, Y ≤ y) = P(X ≤ x, Y < ∞) = P(X ≤ x) = F_X(x).

(iv) If a₁ < a₂ and b₁ < b₂, then

    P(a₁ < X ≤ a₂, b₁ < Y ≤ b₂)
        = F_{(X,Y)}(a₂, b₂) − F_{(X,Y)}(a₁, b₂) − F_{(X,Y)}(a₂, b₁) + F_{(X,Y)}(a₁, b₁).
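Property (iv) can be illustrated numerically. In the Python sketch below, the d.f. F(x, y) = (1 − e^{−x})(1 − e^{−y}) of a pair of independent Exp(1) r.v.'s is an assumption chosen purely for illustration (independence is discussed in Section 4.5); for this F, the rectangle probability factorises and can be computed directly.

```python
import math

# Illustrative d.f. (an assumption): X, Y independent Exp(1), so
# F_{(X,Y)}(x, y) = (1 - e^{-x})(1 - e^{-y}) for x, y > 0.
def F(x, y):
    return (1 - math.exp(-x)) * (1 - math.exp(-y)) if x > 0 and y > 0 else 0.0

a1, a2, b1, b2 = 0.5, 1.5, 0.2, 1.0

# Rectangle formula, property (iv):
rect = F(a2, b2) - F(a1, b2) - F(a2, b1) + F(a1, b1)

# Direct computation: here the rectangle probability factorises as
# P(a1 < X <= a2) * P(b1 < Y <= b2).
direct = (math.exp(-a1) - math.exp(-a2)) * (math.exp(-b1) - math.exp(-b2))
assert abs(rect - direct) < 1e-12
```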
4.5 Conditional distributions

Let (X, Y) be a 2-dimensional random vector with joint p.m.f. p_{(X,Y)}(x, y) for
(x, y) ∈ R_{(X,Y)} ⊆ R². Consider

    P(X = x | Y = y) = P({ω : X(ω) = x} | {ω : Y(ω) = y})

for P({ω : Y(ω) = y}) > 0.
Then from our definition of conditional probability, the above expression is equivalent to

    P({ω : X(ω) = x} ∩ {ω : Y(ω) = y}) / P({ω : Y(ω) = y})
        = P(X = x, Y = y) / P(Y = y) = p_{(X,Y)}(x, y) / p_Y(y),

thus motivating the following definition:
Definition 4.5.1 (Conditional p.m.f./p.d.f.)
(i) For X and Y discrete, the conditional p.m.f. of X given Y is given by

    p_{X|Y}(x|y) = P(X = x | Y = y) = p_{(X,Y)}(x, y) / p_Y(y)

for any y s.t. p_Y(y) > 0.
(ii) For X and Y continuous, the conditional p.d.f. is given by

    f_{X|Y}(x|y) = f_{(X,Y)}(x, y) / f_Y(y)

for any y s.t. f_Y(y) > 0.

Remarks 4.5.2
It is easy to show that
(i) Σ_x p_{X|Y}(x|y) = 1 for y such that p_Y(y) > 0;
(ii) ∫_{−∞}^{∞} f_{X|Y}(x|y) dx = 1 for y such that f_Y(y) > 0.
Example 4.5.3
Suppose that

    f_{(X,Y)}(x, y) = { 6(1 − x − y)   x > 0, y > 0, 0 < x + y < 1
                      { 0              o.w.

Find the conditional p.d.f. of X given Y.
Solution: It can be shown that

    f_Y(y) = { 3(1 − y)²   y ∈ (0, 1)
             { 0           o.w.

Therefore, given y ∈ (0, 1),

    f_{X|Y}(x|y) = f_{(X,Y)}(x, y) / f_Y(y) = 2(1 − x − y) / (1 − y)²

for 0 < x < 1 − y.
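As a check of Remark 4.5.2 (ii) for this example (a Python sketch, illustrative only), the conditional p.d.f. should integrate to 1 over 0 < x < 1 − y for each fixed y:

```python
from scipy.integrate import quad

# Conditional p.d.f. of Example 4.5.3: f_{X|Y}(x|y) = 2(1-x-y)/(1-y)^2
# on 0 < x < 1 - y, for fixed y in (0, 1).
def f_cond(x, y):
    return 2.0 * (1.0 - x - y) / (1.0 - y) ** 2

# Remark 4.5.2 (ii): a conditional p.d.f. integrates to 1 in x.
for y in (0.1, 0.5, 0.9):
    total, _err = quad(lambda x: f_cond(x, y), 0.0, 1.0 - y)
    assert abs(total - 1.0) < 1e-10
```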
Definition 4.5.4 (Probability from conditional p.d.f.)
If (X, Y) is continuous, then

    P(X ∈ A | Y = y) = ∫_A f_{X|Y}(x|y) dx.

Example 4.5.5
Let X be height in cm and Y be weight in kg, with A = (180, 200) and y = 70.
Then P(X ∈ A | Y = y) gives us the 'probability that the height lies between 180 and 200 cm,
given that the weight is 70 kg'.
Theorem 4.5.6 (Theorem of Total Probability)
(a) If (X, Y) is continuous, then
(i) f_X(x) = ∫_{R_Y} f_{X|Y}(x|y) f_Y(y) dy for x ∈ R_X;
(ii) for A ⊆ R_X, P(X ∈ A) = ∫_{R_Y} P(X ∈ A | Y = y) f_Y(y) dy.
(b) Similarly for the discrete case, with integration and p.d.f.'s replaced by summation and
p.m.f.'s, respectively.
Proof
(a)
(i)

    f_X(x) = ∫_{R_Y} f_{(X,Y)}(x, y) dy = ∫_{R_Y} f_{X|Y}(x|y) f_Y(y) dy.

(ii)

    P(X ∈ A) = ∫_A f_X(x) dx = ∫_A (∫_{R_Y} f_{X|Y}(x|y) f_Y(y) dy) dx

             = ∫_{R_Y} f_Y(y) (∫_A f_{X|Y}(x|y) dx) dy = ∫_{R_Y} f_Y(y) P(X ∈ A | Y = y) dy,

where the final equality follows from Definition 4.5.4.
We can see from this that P(X ∈ A) is found by computing P(X ∈ A | Y = y) and
then 'averaging' these out over all y ∈ R_Y.
Corollary 4.5.7 (Bayes' Rule/Formula/Theorem)
(i) discrete case:

    p_{Y|X}(y|x) = p_{X|Y}(x|y) p_Y(y) / Σ_{n∈R_Y} p_{X|Y}(x|n) p_Y(n);

(ii) continuous case:

    f_{Y|X}(y|x) = f_{X|Y}(x|y) f_Y(y) / ∫_{R_Y} f_{X|Y}(x|u) f_Y(u) du.
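Applied to Example 4.2.2, Bayes' rule gives the conditional p.m.f. of the total number of calls X given the number answered Y = n; dividing the joint p.m.f. by p_Y(n) shows that X − n | Y = n ∼ Po(λ(1 − θ)). The Python sketch below (λ = 3 and θ = 0.4 are illustrative values, and the normalising sum is truncated) verifies this numerically:

```python
import math

lam, theta, n = 3.0, 0.4, 2    # condition on Y = n answered calls
M = 80                          # truncation point for the normalising sum

def binom(m, k, p):
    return math.comb(m, k) * p**k * (1 - p)**(m - k)

def poisson(mu, k):
    return math.exp(-mu) * mu**k / math.factorial(k)

# Bayes' rule (discrete case): posterior p.m.f. of X given Y = n, built
# from the likelihood p_{Y|X}(n|m) = Bin(m, theta) p.m.f. and the
# marginal p_X(m) = Po(lam) p.m.f. (both derived in Example 4.2.2).
norm = sum(binom(m, n, theta) * poisson(lam, m) for m in range(n, M))
for m in range(n, n + 5):
    post = binom(m, n, theta) * poisson(lam, m) / norm
    # Closed form: X - n | Y = n  ~  Po(lam * (1 - theta)).
    assert abs(post - poisson(lam * (1 - theta), m - n)) < 1e-10
```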
Definition 4.5.8 (Independence of r.v.'s)
X, Y are independent ⇔ for all A ⊆ R_X, B ⊆ R_Y,

    P(X ∈ A, Y ∈ B) = P(X ∈ A) P(Y ∈ B).

Proposition 4.5.9 (Independence of r.v.'s via marginals)
X and Y are independent ⇔ for all x and y,

    p_{(X,Y)}(x, y) = p_X(x) p_Y(y)   (discrete case),
    f_{(X,Y)}(x, y) = f_X(x) f_Y(y)   (continuous case).
Proof See Appendix.
Remarks 4.5.10 (Important Observation!)
(i) For independence,

    f_{(X,Y)}(x, y) = f_X(x) f_Y(y).

But the L.H.S. is > 0 ⇔ (x, y) ∈ R_{(X,Y)};
the R.H.S. is > 0 ⇔ x ∈ R_X and y ∈ R_Y.
Hence, if X, Y are independent, then R_{(X,Y)} has a 'rectangular' structure.
(ii) A 'rectangular' R_{(X,Y)} is a necessary but not a sufficient condition for independence.
Corollary 4.5.11
(i) If R_{(X,Y)} is not 'rectangular', then X, Y are not independent.
(ii) If X, Y are independent, then
(a) f_{X|Y}(x|y) = f_X(x) and f_{Y|X}(y|x) = f_Y(y);
(b) F_{(X,Y)}(x, y) = F_X(x) F_Y(y).
Example 4.5.12
Let (X, Y) be continuous, with joint p.d.f.

    f_{(X,Y)}(x, y) = { e^{−(x+y)}   x ≥ 0, y ≥ 0
                      { 0            o.w.

Are X and Y independent?
Solution: Observe that

    f_{(X,Y)}(x, y) = e^{−(x+y)} = e^{−x} × e^{−y}   for (x, y) ∈ R_{(X,Y)} = {(u, v) : u ≥ 0, v ≥ 0}.

Since e^{−x} and e^{−y} are both p.d.f.'s on R_X = {x : x ≥ 0} and R_Y = {y : y ≥ 0} (each
corresponding to the Exp(1) distribution), we may conclude that X and Y are independent
r.v.'s.
Example 4.5.13
Suppose that

    f_{(X,Y)}(x, y) = { x² + xy/3   0 ≤ x ≤ 1, 0 ≤ y ≤ 2
                      { 0           o.w.

Are X and Y independent?
Solution: Since f_{(X,Y)} cannot be factorized into the product of two marginals, X and Y
are not independent, in spite of the fact that R_{(X,Y)} is 'rectangular' in this case.
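We can see this numerically as well (a Python sketch, illustrative only): computing the marginals by integration and comparing f_{(X,Y)}(x, y) with f_X(x) f_Y(y) at a couple of points shows they differ.

```python
from scipy.integrate import quad

# Example 4.5.13: f(x, y) = x^2 + xy/3 on (0,1) x (0,2). Compare the
# joint p.d.f. with the product of its marginals at a few points.
f = lambda x, y: x**2 + x * y / 3.0

def f_X(x):
    return quad(lambda y: f(x, y), 0.0, 2.0)[0]   # = 2x^2 + 2x/3

def f_Y(y):
    return quad(lambda x: f(x, y), 0.0, 1.0)[0]   # = 1/3 + y/6

for (x, y) in [(0.25, 1.5), (0.5, 0.5)]:
    print(f(x, y), f_X(x) * f_Y(y))
    # joint != product of marginals, so X and Y are not independent
    assert abs(f(x, y) - f_X(x) * f_Y(y)) > 1e-3
```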
4.6 Conditional Expectation

Definition 4.6.1 The conditional expectation of X given Y = y is given by:
(i) discrete case:

    E[X | Y = y] = Σ_{x=−∞}^{∞} x p_{X|Y}(x|y),   y ∈ R_Y;

(ii) continuous case:

    E[X | Y = y] = ∫_{−∞}^{∞} x f_{X|Y}(x|y) dx,   y ∈ R_Y.
Remarks 4.6.2
The conditional expectations can be used to compute the total expectation (provided that they
exist), since:

    E[X] = Σ_x x p_X(x) = Σ_x x Σ_y p_{(X,Y)}(x, y) = Σ_{x,y} x p_{(X,Y)}(x, y)

         = Σ_{x,y} x p_{X|Y}(x|y) p_Y(y) = Σ_y {Σ_x x p_{X|Y}(x|y)} p_Y(y) = Σ_y E[X | Y = y] p_Y(y)

for the discrete case.
For the continuous case,

    E[X] = ∫ E[X | Y = y] f_Y(y) dy.
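Applied to Example 4.2.2 once more (a Python sketch; λ = 3 and θ = 0.4 are illustrative values, and the sum is truncated): dividing the joint p.m.f. by p_X(m) shows that Y | X = m ∼ Bin(m, θ), so E[Y | X = m] = mθ, and averaging these conditional expectations over the Po(λ) marginal of X recovers E[Y] = λθ.

```python
import math

lam, theta, M = 3.0, 0.4, 80   # truncate the Poisson range at M

def poisson(mu, k):
    return math.exp(-mu) * mu**k / math.factorial(k)

# E[Y | X = m] = m * theta, since Y | X = m ~ Bin(m, theta).
# Tower property: E[Y] = sum_m E[Y | X = m] p_X(m).
EY = sum(m * theta * poisson(lam, m) for m in range(M))
assert abs(EY - lam * theta) < 1e-10   # matches the mean of Po(lam*theta)
```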
Appendix

Proof of Proposition 4.5.9
We'll restrict our attention to the continuous case.
(⇒)
Set A = (x, x + δx) and B = (y, y + δy), where δx, δy are small.
From Definition 4.5.8, X, Y independent implies that

    P(x ≤ X ≤ x + δx, y ≤ Y ≤ y + δy) = P(x ≤ X ≤ x + δx) P(y ≤ Y ≤ y + δy).

But the L.H.S. is equal to

    ∫_x^{x+δx} ∫_y^{y+δy} f_{(X,Y)}(u, v) dv du ≈ f_{(X,Y)}(x, y) δx δy,

and the R.H.S. is equal to

    (∫_x^{x+δx} f_X(u) du) (∫_y^{y+δy} f_Y(v) dv) ≈ f_X(x) f_Y(y) δx δy.

Hence

    f_{(X,Y)}(x, y) δx δy ≈ f_X(x) f_Y(y) δx δy,

i.e., dividing by δx δy and letting δx, δy → 0, f_{(X,Y)}(x, y) = f_X(x) f_Y(y).
(⇐)

    P(X ∈ A, Y ∈ B) = P((X, Y) ∈ A × B) = ∫∫_{A×B} f_{(X,Y)}(x, y) dx dy

                    = ∫∫_{A×B} f_X(x) f_Y(y) dx dy = (∫_A f_X(x) dx) (∫_B f_Y(y) dy)

                    = P(X ∈ A) P(Y ∈ B),

where the factorization of the double integral follows from Remark 2.3.2 (iii).