
ECON 5350 Class Notes
Review of Probability and Distribution Theory
1 Random Variables
Definition. Let c represent an element of the sample space C of a random experiment, $c \in C$. A random variable is a function X = X(c) that assigns one and only one real number to each outcome c. An outcome of X is denoted x.
Example. Single Coin Toss
C = {c = T; c = H}
X(c) = 0 if c = T
X(c) = 1 if c = H
1.1 Probability Density Function (pdf)

Two types:

1. Discrete pdf. A function f(x) such that $f(x) \ge 0 \;\forall x$ and $\sum_x f(x) = 1$.

2. Continuous pdf. A function f(x) such that $f(x) \ge 0 \;\forall x$ and $\int_{x=-\infty}^{\infty} f(x)\,dx = 1$.
See MATLAB example #1 for an example of calculating the area under a pdf.
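As a minimal sketch of that kind of calculation (the standard normal pdf and the interval [-1.96, 1.96] are assumptions here, not necessarily what example #1 uses):

% Area under the standard normal pdf between -1.96 and 1.96.
f = @(x) exp(-0.5*x.^2) / sqrt(2*pi);   % N(0,1) pdf written out by hand
area = integral(f, -1.96, 1.96);        % adaptive numerical integration
fprintf('area = %.4f\n', area)          % approximately 0.9500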
Notes:

1. Pr(X = x) = f(x) in the discrete case, and Pr(X = x) = 0 in the continuous case.

2. $\Pr(a \le X \le b) = \int_{x=a}^{b} f(x)\,dx$.

1.2 Cumulative Distribution Function (cdf)
Two types:

1. Discrete cdf. A function F(x) such that $\sum_{X \le x} f(x) = F(x)$.

2. Continuous cdf. A function F(x) such that $\int_{-\infty}^{x} f(t)\,dt = F(x)$.
Notes:

1. $F(b) - F(a) = \int_{-\infty}^{b} f(t)\,dt - \int_{-\infty}^{a} f(t)\,dt$, where $b \ge a$.

2. $0 \le F(x) \le 1$.

3. $\lim_{x \to -\infty} F(x) = 0$.

4. $\lim_{x \to +\infty} F(x) = 1$.

5. If $x > y$, then $F(x) \ge F(y)$.

2 Mathematical Expectations
Consider the continuous case only.
2.1 Mean

Definition. The mean or expected value of g(X) is given by
$$E[g(X)] = \int_x g(x) f(x)\,dx.$$
Notes:

1. $E(X) = \mu = \int_x x f(x)\,dx$ is called the mean of X or the "first moment of the distribution".

2. $E(\cdot)$ is a linear operator. Let $g(X) = a + bX$. Then
$$E[g(X)] = \int_x (a + bx) f(x)\,dx = \int_x a f(x)\,dx + \int_x bx f(x)\,dx = E(a) + E(bX) = a + bE(X).$$

3. Other measures of central tendency: median, mode.
2.2 Variance

Definition. The variance of g(X) is given by
$$Var[g(X)] = E[\{g(X) - E[g(X)]\}^2] = \int_x \{g(x) - E[g(x)]\}^2 f(x)\,dx.$$
Notes:

1. Let $g(X) = X$. We have
$$Var(X) = \sigma^2 = \int_x (x - \mu)^2 f(x)\,dx = \int_x x^2 f(x)\,dx - 2\mu \int_x x f(x)\,dx + \mu^2 \int_x f(x)\,dx = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - \mu^2.$$
2. $Var(X)$ is NOT a linear operator. Let $g(X) = a + bX$. Then
$$Var[g(X)] = \int_x \{g(x) - g(\mu)\}^2 f(x)\,dx = \int_x b^2 (x - \mu)^2 f(x)\,dx = b^2 Var(X) = b^2 \sigma^2.$$

3. $\sigma = \sqrt{\sigma^2}$ is called the standard deviation of X.
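A minimal simulation sketch of these two properties (the values a = 2, b = 3, and X ~ N(0,1) are arbitrary choices, not from the notes):

% Monte Carlo check: E(a + bX) = a + b*E(X), but Var(a + bX) = b^2*Var(X).
rng(1);                        % fix the seed so draws are reproducible
x = randn(1e6, 1);             % X ~ N(0,1): E(X) = 0, Var(X) = 1
a = 2; b = 3;
g = a + b*x;
fprintf('E(a+bX)   = %.3f (theory %.3f)\n', mean(g), a + b*0);
fprintf('Var(a+bX) = %.3f (theory %.3f)\n', var(g), b^2*1);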
2.3 Other Moments
The measure $E(X^r)$ is called the "rth moment of the distribution" while $E[(X - \mu)^r]$ is called the "rth central moment of the distribution".
r   Central Moment                Measure
1   $E[(X - \mu)] = 0$
2   $E[(X - \mu)^2] = \sigma^2$   variance (dispersion)
3   $E[(X - \mu)^3]$              skewness (asymmetry)
4   $E[(X - \mu)^4]$              kurtosis (tail thickness)
Moment Generating Function (MGF). The MGF uniquely determines a pdf when it exists and is given by
$$M(t) = E(e^{tX}) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx.$$
The rth moment of a distribution is given by
$$\frac{d^r M(t)}{dt^r}\Big|_{t=0}.$$
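As a quick worked illustration (an example added here, not from the notes), take the exponential pdf $f(x) = e^{-x}$ for $x \ge 0$. Then
$$M(t) = \int_0^{\infty} e^{tx} e^{-x}\,dx = \frac{1}{1 - t}, \quad t < 1,$$
so $M'(t) = (1 - t)^{-2}$ and $M''(t) = 2(1 - t)^{-3}$. Evaluating at $t = 0$ gives $E(X) = M'(0) = 1$ and $E(X^2) = M''(0) = 2$, hence $Var(X) = 2 - 1^2 = 1$.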
2.4 Chebyshev's Inequality

Definition. Let X be a random variable with $\sigma^2 < \infty$. For any $k > 0$,
$$\Pr(\mu - k\sigma \le X \le \mu + k\sigma) \ge 1 - \frac{1}{k^2}.$$
Chebyshev's inequality is used to place bounds on the probability that a random variable deviates from its mean without having to know its exact distribution.
Example. Let $X \sim f(x)$ where
$$f(x) = \frac{1}{2\sqrt{3}}, \quad -\sqrt{3} < x < \sqrt{3}$$
and zero elsewhere (so that $\mu = 0$ and $\sigma = 1$). If we let $k = 3/2$, we get
$$\text{Cheb}: \quad \Pr(-3/2 \le X \le 3/2) \ge 1 - \frac{1}{(3/2)^2} = 5/9 \approx 0.556$$
$$\text{Exact}: \quad \Pr(-3/2 \le X \le 3/2) = \int_{-3/2}^{3/2} \frac{1}{2\sqrt{3}}\,dx = \frac{1}{2\sqrt{3}}[(3/2) - (-3/2)] \approx 0.866.$$
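A minimal sketch that reproduces this comparison numerically (the integration endpoints simply restate the example above):

% Chebyshev bound vs. exact probability for X uniform on (-sqrt(3), sqrt(3)).
k = 3/2;
cheb_bound = 1 - 1/k^2;                    % lower bound, 5/9
f = @(x) ones(size(x)) / (2*sqrt(3));      % uniform pdf with mu = 0, sigma = 1
exact = integral(f, -k, k);                % exact Pr(-3/2 <= X <= 3/2)
fprintf('bound = %.3f, exact = %.3f\n', cheb_bound, exact)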
3 Specific Probability Distributions
3.1 Normal pdf

If X has a normal distribution, then
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right]$$
where $-\infty < x < \infty$. In short-hand notation, $X \sim N(\mu, \sigma^2)$.
Notes:

1. The normal pdf is symmetric.

2. $Z = (X - \mu)/\sigma \sim N(0, 1)$ is called a standardized random variable and
$$\phi(z) = \frac{1}{\sqrt{2\pi}} \exp(-0.5 z^2)$$
is called the standard normal distribution.

3. Linear transformations of normal random variables are normal. If $Y = a + bX$ where $X \sim N(\mu, \sigma^2)$, then $Y \sim N(a + b\mu, b^2\sigma^2)$.
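A minimal simulation sketch of note 3 (the parameter values are arbitrary choices):

% Check that Y = a + b*X is N(a + b*mu, b^2*sigma^2) when X ~ N(mu, sigma^2).
rng(2);
mu = 1; sigma = 2; a = 3; b = -0.5;
x = mu + sigma*randn(1e6, 1);
y = a + b*x;
fprintf('mean(Y) = %.3f (theory %.3f)\n', mean(y), a + b*mu);
fprintf('var(Y)  = %.3f (theory %.3f)\n', var(y), b^2*sigma^2);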
3.2 Chi-square pdf

If $Z_i$, $i = 1, \ldots, n$, are independently distributed N(0, 1) random variables, then
$$Y = \sum_{i=1}^{n} Z_i^2 \sim \chi^2(n)$$
where $E(Y) = n$ and $Var(Y) = 2n$.
Exercise. Find the MGF for $Y = Z^2$ and use it to derive the mean and variance.

Answer. We begin by calculating the MGF for $Z^2$, where $t < 0.5$:
$$M(t) = E(e^{tZ^2}) = \int_{-\infty}^{\infty} e^{tz^2} \phi(z)\,dz = \int_{-\infty}^{\infty} (2\pi)^{-0.5} e^{(t - 0.5)z^2}\,dz = \int_{-\infty}^{\infty} (2\pi)^{-0.5} e^{-0.5(1 - 2t)z^2}\,dz.$$
Now using the method of substitution, let $w = \sqrt{1 - 2t}\,z$ so that $dw = (1 - 2t)^{1/2}\,dz$. Making the substitution produces
$$M(t) = (1 - 2t)^{-1/2} \int_{-\infty}^{\infty} (2\pi)^{-0.5} e^{-0.5 w^2}\,dw = (1 - 2t)^{-1/2}.$$
To calculate the mean, we take the first derivative of M(t) and evaluate at t = 0:
$$\mu = \frac{dM(t)}{dt}\Big|_{t=0} = (1 - 2t)^{-3/2}\Big|_{t=0} = 1.$$
To calculate the variance, we take the second derivative of M(t), evaluate at t = 0, and subtract $\mu^2$:
$$\sigma^2 = \frac{d^2 M(t)}{dt^2}\Big|_{t=0} - \mu^2 = 3(1 - 2t)^{-5/2}\Big|_{t=0} - 1 = 2.$$
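A minimal simulation sketch of the chi-square moments (n = 5 is an arbitrary choice):

% Y = sum of n squared independent N(0,1) draws, so Y ~ chi-square(n).
rng(3);
n = 5;
z = randn(1e6, n);             % each row holds n independent N(0,1) draws
y = sum(z.^2, 2);
fprintf('mean(Y) = %.3f (theory %d)\n', mean(y), n);
fprintf('var(Y)  = %.3f (theory %d)\n', var(y), 2*n);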
3.3 F pdf

If $X_1$ and $X_2$ are independently distributed $\chi^2(n_i)$ random variables, then
$$F = \frac{X_1/n_1}{X_2/n_2} \sim F(n_1, n_2).$$

3.4 Student's t pdf

If $Z \sim N(0, 1)$ and $X \sim \chi^2(n)$ are independent, then
$$T = \frac{Z}{\sqrt{X/n}} \sim t(n).$$

3.5 Lognormal pdf

If $X \sim N(\mu, \sigma^2)$ then $Y = \exp(X)$ has the distribution
$$f(y) = \frac{1}{\sqrt{2\pi}\,\sigma y} \exp\left[-0.5\left(\frac{\ln(y) - \mu}{\sigma}\right)^2\right]$$
for $y \ge 0$. Sometimes this is written as $Y \sim LN(\mu, \sigma^2)$. The mean and variance of Y are $E(Y) = \exp(\mu + \sigma^2/2)$ and $Var(Y) = \exp(2\mu + \sigma^2)(\exp(\sigma^2) - 1)$.

Notes:
1. If $Y_1 \sim LN(\mu_1, \sigma_1^2)$ and $Y_2 \sim LN(\mu_2, \sigma_2^2)$ are independent random variables, then
$$Y_1 Y_2 \sim LN(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2).$$
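A minimal simulation sketch of the lognormal mean and variance formulas (the parameter values are arbitrary choices):

% Simulate Y = exp(X) with X ~ N(mu, sigma^2) and compare sample moments
% with the closed-form lognormal mean and variance.
rng(4);
mu = 0.5; sigma = 0.8;
y = exp(mu + sigma*randn(1e6, 1));
fprintf('mean(Y) = %.3f (theory %.3f)\n', mean(y), exp(mu + sigma^2/2));
fprintf('var(Y)  = %.3f (theory %.3f)\n', var(y), exp(2*mu + sigma^2)*(exp(sigma^2) - 1));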
3.6 Gamma pdf
The gamma distribution is given by
$$f(x) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\, x^{\alpha - 1} \exp(-x/\beta)$$
for $0 \le x < \infty$. The mean and variance are $E(X) = \alpha\beta$ and $Var(X) = \alpha\beta^2$.

Notes:

1. $\Gamma(\alpha) = \int_0^{\infty} y^{\alpha - 1} \exp(-y)\,dy$ is called the gamma function, $\alpha > 0$.

2. $\Gamma(\alpha) = (\alpha - 1)!$ if $\alpha$ is a positive integer.

3. Greene sets $\lambda = 1/\beta$ and $\alpha = P$.

4. When $\alpha = 1$, you get the exponential pdf.

5. When $\alpha = n/2$ and $\beta = 2$, you get the chi-square pdf.
Example. Gamma distributions are sometimes used to model "waiting times". Let W be the waiting time until death for a human. Let $W \sim \text{Gamma}(\alpha = 1, \beta = 80)$ so that the expected waiting time until death is 80 years. (Note: $W \sim \text{Exponential}(\beta)$.) Find $\Pr(W \le 30)$.
$$\Pr(W \le 30) = \int_0^{30} \frac{1}{\Gamma(1)\,80} \exp(-w/80)\,dw = \frac{1}{80}\left(-80\exp(-w/80)\right)\Big|_{w=0}^{30} = -[\exp(-3/8) - \exp(0)] = 1 - 0.687 = 0.313.$$
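A minimal sketch that verifies this probability numerically:

% Pr(W <= 30) for W ~ Exponential with mean 80, two ways.
f = @(w) (1/80)*exp(-w/80);            % Gamma(alpha = 1, beta = 80) pdf
p_numeric = integral(f, 0, 30);
p_closed  = 1 - exp(-3/8);             % closed form from the example
fprintf('numeric = %.4f, closed form = %.4f\n', p_numeric, p_closed)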
3.7 Beta pdf
If $X_1$ and $X_2$ are independently distributed Gamma random variables then $Y_1 = X_1 + X_2$ and $Y_2 = X_1/Y_1$ are independently distributed. The marginal distribution $f_2(y_2)$ of $f(y_1, y_2)$ is called the beta pdf:
$$g(y) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, (y/c)^{\alpha - 1}\, [1 - (y/c)]^{\beta - 1}\, (1/c)$$
where $0 \le y \le c$. The mean and variance are $E(Y) = \frac{c\alpha}{\alpha + \beta}$ and $Var(Y) = \frac{c^2 \alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$.
3.8 Logistic pdf

The logistic distribution is
$$f(x) = \Lambda(x)[1 - \Lambda(x)]$$
where $-\infty < x < \infty$ and $\Lambda(x) = (1 + \exp(-x))^{-1}$. The mean and variance are $E(X) = 0$ and $Var(X) = \pi^2/3$. A useful property of the logistic distribution is that the cdf has a closed-form solution: $F(x) = \Lambda(x)$.
3.9 Cauchy pdf

If $X_1$ and $X_2$ are independently distributed N(0, 1), then $Y = X_1/X_2$ has the Cauchy pdf
$$f(y) = \frac{1}{\pi(1 + y^2)}$$
where $-\infty < y < \infty$. The mean and the variance of the Cauchy pdf do not exist because the tails are too thick. See MATLAB example #2 for an example that graphs the Cauchy and standard normal pdfs.
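A minimal plotting sketch in the same spirit (the grid and styling are arbitrary, not necessarily what example #2 uses):

% Overlay the Cauchy and standard normal pdfs to see the thicker tails.
y = linspace(-5, 5, 501);
cauchy = 1 ./ (pi*(1 + y.^2));
normal = exp(-0.5*y.^2) / sqrt(2*pi);
plot(y, cauchy, y, normal);
legend('Cauchy', 'standard normal');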
3.10 Binomial pdf

The distribution for x successes in n trials is
$$b(n, \theta; x) = \binom{n}{x} \theta^x (1 - \theta)^{n - x}$$
where $x = 0, 1, \ldots, n$ and $0 \le \theta \le 1$. The mean and variance of the binomial distribution are $E(X) = n\theta$ and $Var(X) = n\theta(1 - \theta)$. The combinatorial formula for the number of ways to choose x objects from a set of n distinct objects is
$$\binom{n}{x} = \frac{n!}{x!(n - x)!}.$$
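A minimal sketch that builds this pmf from the combinatorial formula (n = 10 and theta = 0.3 are arbitrary choices):

% Binomial pmf, with checks of sum-to-one, mean, and variance.
n = 10; theta = 0.3;
x = 0:n;
pmf = arrayfun(@(k) nchoosek(n, k), x) .* theta.^x .* (1 - theta).^(n - x);
EX = sum(x .* pmf);
VX = sum((x - EX).^2 .* pmf);
fprintf('sum = %.3f, mean = %.3f (theory %.1f), var = %.3f (theory %.2f)\n', sum(pmf), EX, n*theta, VX, n*theta*(1 - theta))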
3.11 Poisson pdf

The Poisson pdf is often used to model the number of changes in a fixed interval. The Poisson pdf is
$$f(x) = \frac{\exp(-\lambda)\lambda^x}{x!}$$
where $x = 0, 1, \ldots$ and $\lambda > 0$. The mean and variance are $E(X) = Var(X) = \lambda$.
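A minimal numerical check of these moment claims (lambda = 2.5 and the truncation at x = 50 are arbitrary; the neglected tail mass is negligible):

% Poisson pmf: sum-to-one, mean, and variance.
lambda = 2.5;
x = 0:50;
pmf = exp(-lambda) * lambda.^x ./ factorial(x);
EX = sum(x .* pmf);
VX = sum((x - EX).^2 .* pmf);
fprintf('sum = %.4f, mean = %.3f, var = %.3f\n', sum(pmf), EX, VX)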
4 Distributions of Functions of Random Variables

Let $X_1, X_2, \ldots, X_n$ have joint pdf $f(x_1, \ldots, x_n)$. What is the distribution of $Y = g(X_1, X_2, \ldots, X_n)$? To answer this question, we will use the change-of-variable technique.
Change of Variable Technique. Let $X_1$ and $X_2$ have joint pdf $f(x_1, x_2)$. Let $Y_1 = g_1(X_1, X_2)$ and $Y_2 = g_2(X_1, X_2)$ be the transformed random variables. If A is the set where $f > 0$, then let B be the set defined by the one-to-one transformation of A to B. Then
$$g(y_1, y_2) = f(h_1(y_1, y_2), h_2(y_1, y_2))\,\mathrm{abs}(J)$$
where $(y_1, y_2) \in B$, $x_1 = h_1(y_1, y_2)$, $x_2 = h_2(y_1, y_2)$ and
$$J = \begin{vmatrix} \partial x_1/\partial y_1 & \partial x_1/\partial y_2 \\ \partial x_2/\partial y_1 & \partial x_2/\partial y_2 \end{vmatrix}.$$
Example. Let $X_1$ and $X_2$ be uniformly distributed on $0 \le X_i \le 1$. The random sample $X_1, X_2$ is jointly distributed
$$f(x_1, x_2) = f_1(x_1) f_2(x_2) = 1$$
over $0 \le x_1, x_2 \le 1$ and zero elsewhere. Find the joint distribution of $Y_1 = X_1 + X_2$ and $Y_2 = X_1 - X_2$.

Answer. We know that $x_1 = h_1(y_1, y_2) = 0.5(y_1 + y_2)$ and $x_2 = h_2(y_1, y_2) = 0.5(y_1 - y_2)$. We also know that
$$J = \begin{vmatrix} 0.5 & 0.5 \\ 0.5 & -0.5 \end{vmatrix} = -0.5.$$
Therefore,
$$g(y_1, y_2) = f_1(h_1(y_1, y_2)) f_2(h_2(y_1, y_2))\,\mathrm{abs}(J) = 0.5$$
where $(y_1, y_2) \in B$ and zero elsewhere.
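A minimal Monte Carlo sketch of this result (the evaluation point (1, 0) and the box width are arbitrary choices):

% Estimate the joint density of (Y1, Y2) near (1, 0); theory says 0.5 on B.
rng(5);
x1 = rand(1e6, 1); x2 = rand(1e6, 1);
y1 = x1 + x2; y2 = x1 - x2;
h = 0.05;                                % half-width of a small box at (1, 0)
inbox = abs(y1 - 1) < h & abs(y2) < h;
density = mean(inbox) / (2*h)^2;         % Pr(box) / area(box)
fprintf('estimated g(1, 0) = %.3f (theory 0.5)\n', density)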
5 Joint Distributions

5.1 Joint pdfs and cdfs

A joint pdf for $X_1$ and $X_2$ gives $\Pr(X_1 = x_1, X_2 = x_2) = f(x_1, x_2)$ in the discrete case. A proper joint pdf will have the property $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x_1, x_2)\,dx_2\,dx_1 = 1$ and $f(x_1, x_2) \ge 0$ for all $x_1$ and $x_2$.

A joint cdf for $X_1$ and $X_2$ is $\Pr(X_1 \le x_1, X_2 \le x_2) = F(x_1, x_2) = \int_{-\infty}^{x_1}\int_{-\infty}^{x_2} f(t_1, t_2)\,dt_2\,dt_1$.
5.2 Marginal Distributions

The marginal pdf of $X_1$ is found by integrating over all $X_2$:
$$f_1(x_1) = \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_2$$
and likewise for $X_2$.
Example. Let $X_1$ and $X_2$ have joint pdf
$$f(x_1, x_2) = 2, \quad 0 < x_1 < x_2 < 1$$
and zero elsewhere. Is this a proper pdf?
$$\int_0^1 \int_{x_1}^1 2\,dx_2\,dx_1 = \int_0^1 2x_2\Big|_{x_2 = x_1}^{1}\,dx_1 = \int_0^1 2(1 - x_1)\,dx_1 = 2x_1\Big|_{x_1 = 0}^{1} - x_1^2\Big|_{x_1 = 0}^{1} = 2 - 1 = 1.$$
So yes, this is a proper pdf. The marginal distribution for $X_1$ is
$$f_1(x_1) = \int_{x_1}^1 2\,dx_2 = 2x_2\Big|_{x_2 = x_1}^{1} = 2(1 - x_1), \quad 0 < x_1 < 1$$
and zero elsewhere. The marginal distribution for $X_2$ is
$$f_2(x_2) = \int_0^{x_2} 2\,dx_1 = 2x_1\Big|_{x_1 = 0}^{x_2} = 2x_2, \quad 0 < x_2 < 1$$
and zero elsewhere. See MATLAB example #4 for a graphical example of a joint and marginal pdf.
Notes:

1. Two random variables are stochastically independent if and only if $f_1(x_1) f_2(x_2) = f(x_1, x_2)$.

2. In our example, $X_1$ and $X_2$ are not independent because $f_1(x_1) f_2(x_2) = 4x_2 - 4x_1 x_2 \ne 2 = f(x_1, x_2)$.

3. Moments (e.g., means and variances) in joint distributions are calculated using marginal densities (e.g., $E(X_1) = \int x_1 f_1(x_1)\,dx_1$).
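A minimal sketch that recovers the marginal numerically (the evaluation point x1 = 0.3 is an arbitrary choice):

% Marginal f1(x1) = 2*(1 - x1), obtained by integrating the joint pdf over x2.
f  = @(x1, x2) 2*(x2 > x1);            % joint pdf, zero off 0 < x1 < x2 < 1
x1 = 0.3;
f1 = integral(@(x2) f(x1, x2), 0, 1);
fprintf('f1(%.1f) = %.3f (theory %.3f)\n', x1, f1, 2*(1 - x1))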
5.3 Covariance and Correlation

Definition. The covariance between X and Y is
$$cov(X, Y) = E[(X - \mu_x)(Y - \mu_y)] = E(XY) - \mu_x\mu_y.$$

Definition. The correlation coefficient between X and Y removes the dependence on the unit of measurement:
$$\rho = corr(X, Y) = \frac{cov(X, Y)}{\sigma_x \sigma_y}$$
where $-1 \le \rho \le 1$.
Notes:

1. If X and Y are independent, then $cov(X, Y) = 0$:
$$cov(X, Y) = E(XY) - \mu_x\mu_y = \int\!\!\int xy f_x(x) f_y(y)\,dy\,dx - \mu_x\mu_y = \int x f_x(x)\,dx \int y f_y(y)\,dy - \mu_x\mu_y = \mu_x\mu_y - \mu_x\mu_y = 0.$$
2. However, $cov(X, Y) = 0$ does not imply stochastic independence. Consider the following joint distribution table:

            y = 0    y = 1    $f_x(x)$
x = -1        0       1/3       1/3
x =  0       1/3       0        1/3
x =  1        0       1/3       1/3
$f_y(y)$     1/3      2/3        1

where $\mu_x = 0$, $\mu_y = 2/3$ and
$$cov(X, Y) = \sum_x \sum_y (x - \mu_x)(y - \mu_y) f(x, y) = (-1)(1/3)(1/3) + (0)(-2/3)(1/3) + (1)(1/3)(1/3) = 0.$$
However, X and Y are not independent because for $(x, y) = (0, 0)$ we have
$$f_x(0) f_y(0) = 1/9 \ne f(0, 0) = 1/3.$$
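A minimal sketch that checks both claims directly from the table:

% cov(X, Y) = 0 even though X and Y are dependent.
xv = [-1; 0; 1]; yv = [0 1];
f  = [0 1/3; 1/3 0; 0 1/3];            % joint pmf: rows are x, columns are y
fx = sum(f, 2); fy = sum(f, 1);        % marginal pmfs
mux = xv' * fx; muy = yv * fy';
covxy = (xv - mux)' * f * (yv - muy)';
fprintf('cov = %.3f\n', covxy);
fprintf('fx(0)*fy(0) = %.3f vs f(0,0) = %.3f\n', fx(2)*fy(1), f(2,1))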
6 Conditional Distributions

Definition. The conditional pdf for X given Y is
$$f(x|y) = \frac{f(x, y)}{f_y(y)}.$$

Notes:

1. If X and Y are independent, $f(x|y) = f_x(x)$ and $f(y|x) = f_y(y)$.

2. The conditional mean is $E(X|Y) = \int x f(x|y)\,dx = \mu_{x|y}$.

3. The conditional variance is $Var(X|Y) = \int (x - \mu_{x|y})^2 f(x|y)\,dx$.
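A minimal sketch that applies this definition to the joint table from Section 5.3:

% Conditional pmf f(x | y = 1): the y = 1 column divided by fy(1) = 2/3.
f  = [0 1/3; 1/3 0; 0 1/3];            % rows x = -1, 0, 1; columns y = 0, 1
fy = sum(f, 1);
f_x_given_y1 = f(:, 2) / fy(2);        % equals [1/2; 0; 1/2]
disp(f_x_given_y1')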
7 Multivariate Distributions
Let $X = (X_1, \ldots, X_n)'$ be an $(n \times 1)$ column vector of random variables. The mean and variance of X are
$$\mu = E(X) = (\mu_1, \ldots, \mu_n)'$$
and
$$\Sigma = Var(X) = E[(X - \mu)(X - \mu)'] = \begin{bmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2n} \\ \vdots & & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_{nn} \end{bmatrix}.$$
Notes:

1. Let $W = A + BX$. Then $E(W) = A + BE(X)$.

2. The variance of W is
$$Var(W) = E[(W - E(W))(W - E(W))'] = E[(BX - BE(X))(BX - BE(X))'] = E[B(X - E(X))(X - E(X))'B'] = B\Sigma B'.$$
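A minimal simulation sketch of note 2 (the particular A, B, and Sigma are arbitrary choices):

% Check that W = A + B*X has covariance B*Sigma*B'.
rng(6);
Sigma = [2 0.5; 0.5 1];
A = [1; -1]; B = [1 2; 0 3];
x = chol(Sigma, 'lower') * randn(2, 1e6);   % draws with covariance Sigma
w = A + B*x;
disp(cov(w'))                               % sample covariance of W
disp(B*Sigma*B')                            % theoretical covariance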
7.1 Multivariate Normal Distributions

Let $X = (X_1, \ldots, X_n)' \sim N(\mu, \Sigma)$. The form of the multivariate normal pdf is
$$f(x) = (2\pi)^{-n/2} |\Sigma|^{-1/2} \exp[-0.5(x - \mu)'\Sigma^{-1}(x - \mu)].$$
See MATLAB example #5 for an example of a bivariate normal density function.
7.2 Quadratic Form in a Normal Vector

If $(X - \mu)$ is a normal vector, then the quadratic form $Q = (X - \mu)'\Sigma^{-1}(X - \mu) \sim \chi^2(n)$.

Proof. The moment generating function of Q is
$$M(t) = E(e^{tQ}) = \int \cdots \int (2\pi)^{-n/2} |\Sigma|^{-1/2} \exp[t(x - \mu)'\Sigma^{-1}(x - \mu) - 0.5(x - \mu)'\Sigma^{-1}(x - \mu)]\,dx_1 \cdots dx_n$$
$$= \int \cdots \int (2\pi)^{-n/2} |\Sigma|^{-1/2} \exp[-0.5(x - \mu)'(1 - 2t)\Sigma^{-1}(x - \mu)]\,dx_1 \cdots dx_n.$$
Next, multiply and divide by $(1 - 2t)^{n/2}$:
$$M(t) = \frac{\int \cdots \int (2\pi)^{-n/2}\, |\Sigma/(1 - 2t)|^{-1/2} \exp[-0.5(x - \mu)'(1 - 2t)\Sigma^{-1}(x - \mu)]\,dx_1 \cdots dx_n}{(1 - 2t)^{n/2}}, \quad t < 0.5.$$
The numerator is the integral of a multivariate normal density with variance $\Sigma/(1 - 2t)$ and so it equals one. M(t) then simplifies to $(1 - 2t)^{-n/2}$, the MGF for a $\chi^2(n)$ random variable.
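A minimal simulation sketch of this result (n = 3 and the particular mu and Sigma are arbitrary choices):

% Q = (X - mu)' * inv(Sigma) * (X - mu) should have mean n and variance 2n.
rng(7);
n = 3;
mu = [1; 2; 3];
Sigma = [2 0.5 0; 0.5 1 0.2; 0 0.2 1.5];    % positive definite
x = mu + chol(Sigma, 'lower') * randn(n, 1e6);
d = x - mu;
q = sum(d .* (Sigma \ d), 1);               % quadratic form, one value per draw
fprintf('mean(Q) = %.3f (theory %d)\n', mean(q), n);
fprintf('var(Q)  = %.3f (theory %d)\n', var(q), 2*n);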
7.3 A Couple of Important Theorems
1. Let $X \sim N(0, I)$ and $A^2 = A$ (i.e., A is idempotent). Then $X'AX \sim \chi^2(r)$, where the rank of A is r.

2. Let $X \sim N(0, I)$. $X'AX$ and $X'BX$ are stochastically independent iff $AB = 0$.
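A minimal simulation sketch of theorem 1 (building A as a random projection matrix of rank r = 2 is an arbitrary choice):

% X'AX with A idempotent of rank r should behave like chi-square(r).
rng(8);
n = 4; r = 2;
Z = randn(n, r);
A = Z * ((Z'*Z) \ Z');                 % projection matrix: idempotent, rank r
x = randn(n, 1e6);
q = sum(x .* (A*x), 1);                % X'AX for each draw
fprintf('mean = %.3f (theory %d), var = %.3f (theory %d)\n', mean(q), r, var(q), 2*r)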