TRANSFORMING continuous RVs
Distribution-Function Technique
When $Y \equiv g(X)$, we find $F_Y(y)$ by computing $\Pr(Y < y) = \Pr[g(X) < y]$. This requires solving the inequality $g(X) < y$ for $X$, and integrating $f(x)$ over the resulting interval.
EXAMPLES:
• Consider $X \in U(-\frac{\pi}{2}, \frac{\pi}{2})$. Find the distribution of $Y = b \tan(X) + a$ (the location of the dot that a bi-directional laser beam would leave on a screen placed $b$ units from the wheel's center, with a scale whose origin is $a$ units off the center). Note that $Y$ can have any real value.
Solution: Let's first recall that $F_X(x) = \frac{x + \frac{\pi}{2}}{\pi} = \frac{x}{\pi} + \frac{1}{2}$ when $-\frac{\pi}{2} < x < \frac{\pi}{2}$. Then
$$F_Y(y) = \Pr[b \tan(X) + a < y] = \Pr\!\left[X < \arctan\!\left(\tfrac{y-a}{b}\right)\right] = F_X\!\left[\arctan\!\left(\tfrac{y-a}{b}\right)\right] = \frac{1}{\pi} \arctan\!\left(\frac{y-a}{b}\right) + \frac{1}{2}$$
where $-\infty < y < \infty$. Usually, we can relate better to the corresponding
$$f_Y(y) = \frac{1}{\pi} \cdot \frac{b}{b^2 + (y-a)^2}$$
for any real $y$.
for any real y. This function looks very similar to the Normal pdf (also a ’bell-shaped’
curve), but in terms of its properties, the new distribution turns out to be very different.
The name of this new distribution is Cauchy, notation: $C(a, b)$. Since the integral $\int_{-\infty}^{\infty} y \cdot f_Y(y)\, dy$ leads to $\infty - \infty$, the Cauchy distribution does not have a mean (consequently, its variance is infinite), but its median is well defined and equal to $a$. The most common case is $C(0, 1)$, whose pdf equals
$$f(y) = \frac{1}{\pi} \cdot \frac{1}{1 + y^2}$$
[figure: graph of the C(0, 1) pdf]
Its 'rare' values start at $\pm 70$, we need to go beyond $\pm 3000$ to reach 'extremely unlikely', and only values beyond $\pm 300$ billion become 'practically impossible'. Since the mean does not exist, the central limit theorem breaks down: it is no longer true that $\bar{Y} \to N(\mu, \frac{\sigma}{\sqrt{n}})$. Yet $\bar{Y}$ must have some well-defined distribution; we will find it in the next section.
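As a numerical sanity check, here is a minimal simulation sketch (numpy assumed; the values $a = 1$, $b = 2$ are arbitrary illustrative choices) comparing the empirical cdf of $Y = b \tan(X) + a$ with the arctan formula derived above:

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = 1.0, 2.0                          # arbitrary illustrative parameters
x = rng.uniform(-np.pi/2, np.pi/2, 1_000_000)
y = b*np.tan(x) + a                      # the transformed sample

# F_Y(y) = arctan((y-a)/b)/pi + 1/2, derived above
for pt in (-5.0, 0.0, 1.0, 5.0):
    print(pt, (y < pt).mean(), np.arctan((pt - a)/b)/np.pi + 0.5)
```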
• Let $X$ have a pdf defined by $f(x) = 6x(1 - x)$ for $0 < x < 1$. Find the pdf of $Y = X^3$.
Solution: First we realize that $0 < Y < 1$, based on the range of $X$. Secondly, we find $F_X(x) = 6 \int_0^x (u - u^2)\, du = 3x^2 - 2x^3$. And finally:
$$F_Y(y) \equiv \Pr(Y < y) = \Pr(X^3 < y) = \Pr(X < y^{\frac{1}{3}}) = F_X(y^{\frac{1}{3}}) = 3y^{\frac{2}{3}} - 2y.$$
This easily converts to $f_Y(y) = 2y^{-\frac{1}{3}} - 2$ where $0 < y < 1$.
• Let $X \in U(0, 1)$. Find and identify the distribution of $Y = -\ln X$ (its range is obviously $0 < y < \infty$).
Solution: First we need $F_X(x) = x$ when $0 < x < 1$. Then: $F_Y(y) = \Pr(-\ln X < y) = \Pr(X > e^{-y}) = 1 - F_X(e^{-y}) = 1 - e^{-y}$ where $y > 0$, which implies that $f_Y(y) = e^{-y}$.
This can be easily identified as the exponential distribution with the mean of 1. Note that
Y = −β · ln X would result in the exponential distribution with the mean equal to β.
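This observation is the basis of the standard inverse-transform way of generating exponential variates; a minimal sketch ($\beta = 3$ is an arbitrary illustrative value):

```python
import numpy as np

rng = np.random.default_rng(2)
beta = 3.0                               # arbitrary illustrative mean
u = rng.uniform(size=1_000_000)          # X ~ U(0, 1)
y = -beta*np.log(u)                      # Y = -beta*ln(X)

print(y.mean())                          # close to beta
print((y < beta).mean(), 1 - np.exp(-1.0))  # F_Y(beta) = 1 - e^{-1}
```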
• If $Z \in N(0, 1)$, what is the distribution of $Y = Z^2$?
Solution: $F_Y(y) = \Pr(Z^2 < y) = \Pr(-\sqrt{y} < Z < \sqrt{y}) = F_Z(\sqrt{y}) - F_Z(-\sqrt{y})$. Since we don't have an explicit expression for $F_Z(z)$, it would appear that we are stuck at this point, but we can get the corresponding $f_Y(y)$ by a simple differentiation:
$$f_Y(y) = \frac{dF_Z(\sqrt{y})}{dy} - \frac{dF_Z(-\sqrt{y})}{dy} = \frac{1}{2} y^{-\frac{1}{2}} f_Z(\sqrt{y}) + \frac{1}{2} y^{-\frac{1}{2}} f_Z(-\sqrt{y}) = \frac{y^{-\frac{1}{2}} e^{-\frac{y}{2}}}{\sqrt{2\pi}}$$
where $y > 0$.
The last distribution can be identified as that of gamma, with $k = \frac{1}{2}$ and $\beta = 2$. One can show that a non-integer $k$ is OK: all we have to do is replace $k!$ by $\Gamma(k+1)$. Due to its importance, this distribution has yet another name; it is called the chi-square distribution with one degree of freedom, or $\chi_1^2$ for short. It has the expected value of $(k \cdot \beta =)\ 1$, its variance equals $(k \cdot \beta^2 =)\ 2$, and the MGF is $\frac{1}{\sqrt{1-2t}}$. An independent sum of $n$ of these has the $\chi_n^2$ distribution with $n$ degrees of freedom $\equiv \gamma(\frac{n}{2}, 2)$.
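A quick simulation sketch (numpy/scipy assumed) confirming that $Z^2$ behaves like $\gamma(\frac12, 2) \equiv \chi_1^2$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
y = rng.standard_normal(1_000_000)**2    # Y = Z^2

print(y.mean(), y.var())                 # approx. 1 and 2, as stated
for pt in (0.5, 1.0, 3.84):              # chi2(1) and gamma(1/2, scale=2) agree
    print(pt, (y < pt).mean(), stats.chi2.cdf(pt, df=1),
          stats.gamma.cdf(pt, a=0.5, scale=2))
```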
EXAMPLE:
If $U \in \chi_9^2$, find $\Pr(U < 3.325)$. Solution: Integrate the corresponding pdf:
$$\frac{1}{2^{9/2}\, \Gamma(9/2)} \int_0^{3.325} u^{7/2} \exp(-u/2)\, du = 5.00\%$$
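The same number comes out of any chi-square cdf routine, e.g. scipy's (a one-line sketch):

```python
from scipy import stats

# Pr(U < 3.325) for U ~ chi-square with 9 degrees of freedom
print(stats.chi2.cdf(3.325, df=9))       # approx. 0.0500
```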
Probability-density-function Technique
works faster, but only for one-to-one transformations. It consists of three simple steps:
(i) Express $X$ (the 'old' variable) in terms of $Y$ (the 'new' variable).
(ii) Substitute the result, which we will call $x(y)$ (switching to small letters), for the argument of $f_X(x)$.
(iii) Multiply this by $\left|\frac{dx(y)}{dy}\right|$, and you have $f_Y(y)$.
In summary:
$$f_Y(y) = f_X[x(y)] \cdot \left|\frac{dx(y)}{dy}\right|$$
EXAMPLES:
• $X \in U(-\frac{\pi}{2}, \frac{\pi}{2})$ and $Y = b \tan(X) + a$. (i) $x = \arctan(\frac{y-a}{b})$ (ii) $\frac{1}{\pi}$ (iii) $\frac{1}{\pi} \cdot \frac{1}{b} \cdot \frac{1}{1 + (\frac{y-a}{b})^2} = \frac{1}{\pi} \cdot \frac{b}{b^2 + (y-a)^2}$ where $-\infty < y < \infty$ (check).
• $f(x) = 6x(1-x)$ for $0 < x < 1$ and $Y = X^3$. (i) $x = y^{1/3}$ (ii) $6y^{1/3}(1 - y^{1/3})$ (iii) $6y^{1/3}(1 - y^{1/3}) \cdot \frac{1}{3} y^{-2/3} = 2(y^{-1/3} - 1)$ when $0 < y < 1$ (check).
• $X \in U(0, 1)$ and $Y = -\ln X$. (i) $x = e^{-y}$ (ii) $1$ (iii) $e^{-y}$ for $y > 0$ (check).
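The three steps are mechanical enough to hand to a computer algebra system; a minimal sympy sketch reproducing the $Y = X^3$ example:

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)
f_x = 6*x*(1 - x)                        # pdf of X on (0, 1)

x_of_y = y**sp.Rational(1, 3)            # (i) invert y = x**3
f_y = f_x.subs(x, x_of_y) * sp.Abs(sp.diff(x_of_y, y))   # (ii) and (iii)
print(sp.simplify(f_y))                  # equals 2*(y**(-1/3) - 1) on (0, 1)
```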
Bivariate case
Distribution-Function Technique follows the same logic as in the univariate case: based on $Y \equiv g(X_1, X_2)$, we find $F_Y(y) = \Pr(Y < y) = \Pr[g(X_1, X_2) < y]$, which now requires double integration over the indicated region. The technique is simple in principle, but often quite involved in technical details.
EXAMPLES:
• Suppose that $X_1$ and $X_2$ are independent RVs, both from $E(1)$, and $Y = \frac{X_2}{X_1}$.
Solution: $F_Y(y) = \Pr\left(\frac{X_2}{X_1} < y\right) = \Pr(X_2 < yX_1) = \iint_{0 < x_2 < yx_1} e^{-x_1 - x_2}\, dx_1\, dx_2 = \int_0^\infty e^{-x_1} \int_0^{yx_1} e^{-x_2}\, dx_2\, dx_1 = 1 - \frac{1}{1+y}$, where $y > 0$. This implies that
$$f_Y(y) = \frac{1}{(1+y)^2}$$
when $y > 0$.
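A short simulation sketch of this ratio (the same density reappears below via the pdf technique):

```python
import numpy as np

rng = np.random.default_rng(4)
x1 = rng.exponential(1.0, 1_000_000)
x2 = rng.exponential(1.0, 1_000_000)
y = x2/x1                                # ratio of two independent E(1) variables

for pt in (0.5, 1.0, 4.0):               # F_Y(y) = 1 - 1/(1+y)
    print(pt, (y < pt).mean(), 1 - 1/(1 + pt))
```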
• This time $Z_1$ and $Z_2$ are independent RVs from $N(0, 1)$ and $Y = Z_1^2 + Z_2^2$.
Solution:
$$F_Y(y) = \Pr(Z_1^2 + Z_2^2 < y) = \frac{1}{2\pi} \iint_{z_1^2 + z_2^2 < y} e^{-\frac{z_1^2 + z_2^2}{2}}\, dz_1\, dz_2 = \frac{1}{2\pi} \int_0^{2\pi} \int_0^{\sqrt{y}} e^{-\frac{r^2}{2}} \cdot r\, dr\, d\varphi = \int_0^{y/2} e^{-w}\, dw = 1 - e^{-\frac{y}{2}}$$
where $y > 0$. This is the exponential distribution with $\beta = 2$ (not $\chi_2^2$ as expected; how come?).
• Assume that $X_1$ and $X_2$ are independent RVs from a distribution having $L$ and $H$ as its lowest and highest possible value, respectively. Find the distribution of $X_1 + X_2$.
Solution: Skipping the details, the resulting pdf is
$$f_Y(y) = \int_{\max(L,\, y-H)}^{\min(H,\, y-L)} f(x) \cdot f(y - x)\, dx$$
where $2L < y < 2H$. This is the so-called convolution of two distributions.
Thus, for example, when $X_1, X_2 \in U(0, 1)$, we get (for the independent sum):
$$f_Y(y) = \int_{\max(0,\, y-1)}^{\min(1,\, y)} dx = \begin{cases} \int_0^y dx = y & \text{when } 0 < y < 1 \\[4pt] \int_{y-1}^1 dx = 2 - y & \text{when } 1 < y < 2 \end{cases}$$
[figure: the triangular pdf of $X_1 + X_2$]
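A quick simulation sketch of this triangular pdf:

```python
import numpy as np

rng = np.random.default_rng(5)
y = rng.uniform(size=1_000_000) + rng.uniform(size=1_000_000)

# cdf of the triangular density: F(0.5) = 0.5**2/2, F(1.5) = 1 - 0.5**2/2
print((y < 0.5).mean(), 0.125)
print((y < 1.5).mean(), 0.875)
```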
Similarly, when $X_1, X_2 \in C(0, 1)$, the answer is
$$f_{X_1+X_2}(y) = \frac{1}{\pi^2} \int_{-\infty}^{\infty} \frac{1}{1+x^2} \cdot \frac{1}{1+(y-x)^2}\, dx = \frac{2}{\pi} \cdot \frac{1}{4+y^2}$$
where $-\infty < y < \infty$. Converted to the distribution of $\bar{X} = \frac{X_1 + X_2}{2}$, this yields $f_{\bar{X}}(\bar{x}) = \frac{2}{\pi} \cdot \frac{1}{4 + (2\bar{x})^2} \cdot 2 = \frac{1}{\pi} \cdot \frac{1}{1 + \bar{x}^2}$. Thus, the sample mean $\bar{X}$ has the same $C(0, 1)$ distribution as do the individual observations (this can be extended to any sample size).
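This surprising stability of the Cauchy distribution under averaging is easy to see in simulation (a sketch; each entry of `xbar` below is a sample mean of 10,000 observations, yet the means remain $C(0,1)$-distributed):

```python
import numpy as np

rng = np.random.default_rng(6)
xbar = rng.standard_cauchy((1_000, 10_000)).mean(axis=1)   # 1000 sample means

# For C(0,1), Pr(|Y| < 1) = 1/2 -- averaging has not helped at all
print((np.abs(xbar) < 1).mean())         # approx. 0.5
print(np.abs(xbar).max())                # still occasionally huge
```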
Probability-density-function Technique
works faster, but involves several steps. Furthermore, it works only for one-to-one ('invertible') transformations. This implies that the new RV $Y \equiv g(X_1, X_2)$ must be accompanied by yet another, arbitrary, function of $X_1$ and/or $X_2$ (the usual choice is either $Y_2 \equiv X_2$ or $Y_1 \equiv X_1$). Once we have done this, we proceed as follows.
1. Invert this transformation, i.e. solve the two equations $y_1 = g(x_1, x_2)$ and $y_2 = x_2$ for $x_1$ and $x_2$. Getting a unique solution guarantees that the transformation is one-to-one.
2. Substitute this solution into the joint pdf of the 'old' $X_1, X_2$ pair.
3. Multiply the result by the transformation's Jacobian $\left|\begin{matrix} \frac{\partial x_1}{\partial y_1} & \frac{\partial x_1}{\partial y_2} \\ \frac{\partial x_2}{\partial y_1} & \frac{\partial x_2}{\partial y_2} \end{matrix}\right|$, getting the joint pdf of $Y_1$ and $Y_2$. At the same time, establish the region of possible $(Y_1, Y_2)$ values.
4. Eliminate $Y_2$ by finding the $Y_1$ marginal.
EXAMPLES:
• $X_1, X_2 \in E(1)$, independent, $Y_1 = \frac{X_1}{X_1 + X_2}$.
Solution: $x_2 = y_2$ and $x_1 = \frac{y_1 y_2}{1 - y_1}$. Substitute into $e^{-x_1 - x_2}$, getting $e^{-y_2\left(1 + \frac{y_1}{1-y_1}\right)} = e^{-\frac{y_2}{1-y_1}}$, multiply by
$$\left|\begin{matrix} \frac{y_2(1 - y_1 + y_1)}{(1-y_1)^2} & \frac{y_1}{1-y_1} \\ 0 & 1 \end{matrix}\right| = \frac{y_2}{(1-y_1)^2}$$
getting $f(y_1, y_2) = \frac{y_2}{(1-y_1)^2}\, e^{-\frac{y_2}{1-y_1}}$ with $0 < y_1 < 1$ and $y_2 > 0$. Eliminate $Y_2$ by $\int_0^\infty \frac{y_2}{(1-y_1)^2}\, e^{-\frac{y_2}{1-y_1}}\, dy_2 = \frac{1}{(1-y_1)^2} \cdot (1-y_1)^2 \equiv 1$ when $0 < y_1 < 1$. The distribution of $\frac{X_1}{X_1+X_2}$ is thus $U(0, 1)$. Note that if we started with $X_1, X_2 \in E(\beta)$ instead of $E(1)$, the result would have been the same.
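A simulation sketch of this result, including the remark about $E(\beta)$ ($\beta = 2.5$ is an arbitrary illustrative value):

```python
import numpy as np

rng = np.random.default_rng(7)
beta = 2.5                               # arbitrary scale; the result should not depend on it
x1 = rng.exponential(beta, 1_000_000)
x2 = rng.exponential(beta, 1_000_000)
y = x1/(x1 + x2)

for p in (0.1, 0.5, 0.9):                # U(0,1): cdf at p equals p
    print(p, (y < p).mean())
```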
• Same $X_1$ and $X_2$ as before, $Y_2 = \frac{X_2}{X_1}$.
Solution: $x_1 = y_1$ and $x_2 = y_1 \cdot y_2$. Substituting into $e^{-x_1 - x_2}$ to get $e^{-y_1(1+y_2)}$, times $\left|\begin{matrix} 1 & 0 \\ y_2 & y_1 \end{matrix}\right| = y_1$, gives the joint pdf when $y_1 > 0$ and $y_2 > 0$. Eliminate $y_1$ by $\int_0^\infty y_1\, e^{-y_1(1+y_2)}\, dy_1 = \frac{1}{(1+y_2)^2}$, where $y_2 > 0$ (check).
• In this example we introduce the so-called Beta distribution. Let $X_1$ and $X_2$ be independent RVs from $\gamma(k, 1)$ and $\gamma(m, 1)$ respectively, and $Y_1 = \frac{X_1}{X_1+X_2}$.
Solution: $x_2 = y_2$, $x_1 = \frac{y_1 y_2}{1-y_1}$, and the Jacobian is $\frac{y_2}{(1-y_1)^2}$. Substituting into $f(x_1, x_2) = \frac{x_1^{k-1}\, x_2^{m-1}\, e^{-x_1-x_2}}{\Gamma(k) \cdot \Gamma(m)}$ and multiplying by the Jacobian yields
$$f(y_1, y_2) = \frac{y_1^{k-1}\, y_2^{k-1}\, y_2^{m-1}\, e^{-\frac{y_2}{1-y_1}}}{\Gamma(k)\,\Gamma(m)\,(1-y_1)^{k-1}} \cdot \frac{y_2}{(1-y_1)^2}$$
when $0 < y_1 < 1$ and $y_2 > 0$. Integrating $y_2$ out results in:
$$\frac{y_1^{k-1}}{\Gamma(k)\,\Gamma(m)\,(1-y_1)^{k+1}} \int_0^\infty y_2^{k+m-1}\, e^{-\frac{y_2}{1-y_1}}\, dy_2 = \frac{\Gamma(k+m)}{\Gamma(k) \cdot \Gamma(m)} \cdot y_1^{k-1} (1-y_1)^{m-1}$$
where $0 < y_1 < 1$. One can show (by simple integration) that the mean of this distribution is $\frac{k}{k+m}$ and the variance equals $\frac{km}{(k+m+1)(k+m)^2}$.
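A sketch checking the two moment formulas against scipy's beta distribution ($k = 3$, $m = 5$ are arbitrary illustrative shapes):

```python
from scipy import stats

k, m = 3.0, 5.0                          # arbitrary illustrative shape parameters
mean, var = stats.beta.stats(k, m, moments='mv')
print(mean, k/(k + m))                               # both 0.375
print(var, k*m/((k + m + 1)*(k + m)**2))             # both approx. 0.02604
```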
• In this example we introduce the so-called 'Student' or t-distribution (notation: $t_n$, where $n$ is called 'degrees of freedom'). We start with two independent RVs $X_1 \in N(0, 1)$ and $X_2 \in \chi_n^2$, and introduce a new RV by $Y_1 = \frac{X_1}{\sqrt{X_2/n}}$.
To get its pdf, we solve for $x_2 = y_2$ and $x_1 = y_1 \cdot \sqrt{\frac{y_2}{n}}$, substitute into
$$f(x_1, x_2) = \frac{e^{-\frac{x_1^2}{2}}}{\sqrt{2\pi}} \cdot \frac{x_2^{\frac{n}{2}-1}\, e^{-\frac{x_2}{2}}}{\Gamma(\frac{n}{2}) \cdot 2^{\frac{n}{2}}}$$
and multiply by $\left|\begin{matrix} \sqrt{\frac{y_2}{n}} & \frac{y_1}{2\sqrt{n y_2}} \\ 0 & 1 \end{matrix}\right| = \sqrt{\frac{y_2}{n}}$ to get
$$f(y_1, y_2) = \frac{e^{-\frac{y_1^2 y_2}{2n}}}{\sqrt{2\pi}} \cdot \frac{y_2^{\frac{n}{2}-1}\, e^{-\frac{y_2}{2}}}{\Gamma(\frac{n}{2}) \cdot 2^{\frac{n}{2}}} \cdot \sqrt{\frac{y_2}{n}}$$
where $-\infty < y_1 < \infty$ and $y_2 > 0$. To eliminate $y_2$ we integrate:
$$\frac{1}{\sqrt{2\pi}\,\Gamma(\frac{n}{2})\, 2^{\frac{n}{2}} \sqrt{n}} \int_0^\infty y_2^{\frac{n-1}{2}}\, e^{-\frac{y_2}{2}\left(1 + \frac{y_1^2}{n}\right)}\, dy_2 = \frac{\Gamma(\frac{n+1}{2})}{\Gamma(\frac{n}{2})\sqrt{n\pi}} \cdot \frac{1}{\left(1 + \frac{y_1^2}{n}\right)^{\frac{n+1}{2}}}$$
for any $-\infty < y_1 < \infty$. Note that when $n = 1$, this gives $\frac{1}{\pi} \cdot \frac{1}{1+y_1^2}$ (Cauchy); when $n \to \infty$, we get $N(0, 1)$. Due to the symmetry of the distribution, its mean is zero (when it exists, i.e. when $n \geq 2$). Its variance equals $\frac{n}{n-2}$ (when $n \geq 3$).
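The two limiting cases are easy to inspect with scipy's implementations (a sketch; df = 1000 stands in for $n \to \infty$):

```python
import numpy as np
from scipy import stats

y = np.linspace(-3, 3, 13)
print(np.abs(stats.t.pdf(y, df=1) - stats.cauchy.pdf(y)).max())    # essentially 0
print(np.abs(stats.t.pdf(y, df=1000) - stats.norm.pdf(y)).max())   # already tiny
```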
• And finally, we introduce Fisher's F-distribution (notation: $F_{n,m}$, where $n$ and $m$ are the so-called numerator's and denominator's 'degrees of freedom', respectively), defined by $Y_1 = \frac{X_1/n}{X_2/m}$ where $X_1$ and $X_2$ are independent, having the chi-square distribution with $n$ and $m$ degrees of freedom respectively. First we solve for $x_2 = y_2$ and $x_1 = \frac{n}{m} y_1 y_2$, and find the Jacobian to equal $\frac{n}{m} y_2$. Then we substitute these into
$$\frac{x_1^{\frac{n}{2}-1}\, e^{-\frac{x_1}{2}}}{\Gamma(\frac{n}{2})\, 2^{\frac{n}{2}}} \cdot \frac{x_2^{\frac{m}{2}-1}\, e^{-\frac{x_2}{2}}}{\Gamma(\frac{m}{2})\, 2^{\frac{m}{2}}}$$
and multiply by the Jacobian to get
$$\frac{(\frac{n}{m})^{\frac{n}{2}}}{\Gamma(\frac{n}{2})\,\Gamma(\frac{m}{2})\, 2^{\frac{n+m}{2}}} \cdot y_1^{\frac{n}{2}-1}\, y_2^{\frac{n+m}{2}-1}\, e^{-\frac{y_2}{2}\left(1 + \frac{n}{m} y_1\right)}$$
when $y_1 > 0$ and $y_2 > 0$. Integrating over $y_2$ (from $0$ to $\infty$) yields the following formula for the corresponding pdf:
$$f(y_1) = \frac{\Gamma(\frac{n+m}{2})}{\Gamma(\frac{n}{2})\,\Gamma(\frac{m}{2})} \cdot \left(\frac{n}{m}\right)^{\frac{n}{2}} \cdot \frac{y_1^{\frac{n}{2}-1}}{(1 + \frac{n}{m} y_1)^{\frac{n+m}{2}}}$$
when $y_1 > 0$. The mean of this distribution is $\frac{m}{m-2}$ when $m \geq 3$ (infinite for $m = 1$ and $2$), and the variance equals $\frac{2m^2(n+m-2)}{n\,(m-2)^2(m-4)}$ when $m \geq 5$ (infinite for $m = 1, 2, 3$ and $4$).
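A sketch checking these two formulas against scipy's F distribution ($n = 4$, $m = 10$ are arbitrary illustrative degrees of freedom):

```python
from scipy import stats

n, m = 4.0, 10.0                         # arbitrary illustrative degrees of freedom
mean, var = stats.f.stats(n, m, moments='mv')
print(mean, m/(m - 2))                                   # both 1.25
print(var, 2*m**2*(n + m - 2)/(n*(m - 2)**2*(m - 4)))    # both 1.5625
```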