The Normal Distribution

STAT 516: Some continuous distributions
Lecture 13: The Normal Distribution
Prof. Michael Levine
October 20, 2015
Definition
- First, define the pdf
$$\varphi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \qquad -\infty < z < \infty.$$
- It is clearly non-negative; using polar coordinates, it is easy to verify that
$$\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz = 1.$$
- It is equally easy to derive the mgf (complete the square while integrating):
$$M_Z(t) = e^{t^2/2}, \qquad -\infty < t < \infty.$$
Moments of Z ∼ φ(z)
- Using the MGF approach, it is easy to find that
$$EZ = 0 \quad\text{and}\quad \operatorname{Var}(Z) = 1.$$
- Any $X = bZ + a$ with $b > 0$ is also normal; using the standard transformation technique, one obtains
$$f_X(x) = \frac{1}{\sqrt{2\pi}\, b} \exp\left\{-\frac{1}{2}\left(\frac{x-a}{b}\right)^2\right\}, \qquad -\infty < x < \infty.$$
- Immediately, $EX = a$ and $\operatorname{Var}(X) = b^2$.
Formal definition
- A random variable $X$ is said to have a normal distribution if its pdf is
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right\}, \qquad -\infty < x < \infty.$$
- $\mu$ and $\sigma^2$ are the mean and the variance of $X$; we write $X \sim N(\mu, \sigma^2)$.
- The mgf of $X$ is
$$M_X(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2}, \qquad -\infty < t < \infty.$$
- The version with mean zero and variance 1 is called the standard normal random variable.
Some important properties of the normal distribution pdf
1. The normal pdf is symmetric about the vertical line $x = \mu$.
2. The normal pdf is unimodal and achieves its maximum value $\frac{1}{\sigma\sqrt{2\pi}}$ at $x = \mu$.
3. The x-axis is a horizontal asymptote of the normal pdf.
4. The points $x = \mu \pm \sigma$ are the inflection points of the normal pdf.
Computation of the normal cdf
- The normal cdf
$$\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-\omega^2/2}\, d\omega$$
cannot be expressed in closed form.
- For any $X \sim N(\mu, \sigma^2)$, to compute $F_X(x) = P(X \le x)$ we use
$$F_X(x) = P(X \le x) = P\left(Z \le \frac{x-\mu}{\sigma}\right) = \Phi\left(\frac{x-\mu}{\sigma}\right).$$
- The quantiles of $X$ are related to the quantiles of $Z$ as
$$x_p = \sigma z_p + \mu \qquad \text{for any } 0 < p < 1.$$
- It is enough to compute probabilities for $z > 0$, since
$$\Phi(-z) = 1 - \Phi(z)$$
due to the symmetry of the standard normal pdf around 0.
Example
- Let $X \sim N(\mu, \sigma^2)$.
- $P(\mu - 2\sigma < X < \mu + 2\sigma) = \Phi(2) - \Phi(-2) = 0.954$.
- In R, one can use pnorm(x, a, b) to compute the probability $P(X \le x)$ for $X \sim N(a, b^2)$.
- The value of the pdf of $X$ at $x$ is given by dnorm(x, a, b).
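Python's standard library has close analogues of the R calls above. The sketch below assumes Python 3.8+ and `statistics.NormalDist`. The wrappers `pnorm` and `dnorm` are hypothetical helpers named after the R functions. As in R, their third argument is the standard deviation, not the variance. The sketch reproduces the 0.954 figure above.

```python
from statistics import NormalDist


def pnorm(x, mean=0.0, sd=1.0):
    """P(X <= x) for X ~ N(mean, sd**2), mirroring R's pnorm(x, mean, sd)."""
    return NormalDist(mean, sd).cdf(x)


def dnorm(x, mean=0.0, sd=1.0):
    """Density of N(mean, sd**2) at x, mirroring R's dnorm(x, mean, sd)."""
    return NormalDist(mean, sd).pdf(x)


# P(mu - 2 sigma < X < mu + 2 sigma) does not depend on mu or sigma.
mu, sigma = 5.0, 3.0
two_sigma = pnorm(mu + 2 * sigma, mu, sigma) - pnorm(mu - 2 * sigma, mu, sigma)
print(round(two_sigma, 4))   # 0.9545
print(round(dnorm(0.0), 4))  # 0.3989, i.e. 1/sqrt(2*pi)
```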
Example
- Suppose that, for a distribution $N(\mu, \sigma^2)$, 10% of the probability lies below 60 and 5% lies above 90.
- What are the values of $\mu$ and $\sigma$?
- Clearly, $P(X \le 60) = \Phi[(60-\mu)/\sigma] = 0.1$ and $P(X \le 90) = \Phi[(90-\mu)/\sigma] = 0.95$.
- This translates into
$$\frac{60-\mu}{\sigma} = -1.28 \quad\text{and}\quad \frac{90-\mu}{\sigma} = 1.64.$$
- The solution of the above system is $\mu \approx 73.1$ and $\sigma \approx 10.3$.
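The same system can be solved numerically. This sketch assumes Python 3.8+ and takes the standard normal quantiles from `statistics.NormalDist().inv_cdf`. It uses full-precision quantiles in place of the rounded −1.28 and 1.64, so its answer differs slightly in the second decimal.

```python
from statistics import NormalDist

std = NormalDist()  # standard normal Z

# P(X <= 60) = 0.10 and P(X <= 90) = 0.95 give two linear equations:
#   (60 - mu) / sigma = z_{0.10},   (90 - mu) / sigma = z_{0.95}
z_lo = std.inv_cdf(0.10)  # about -1.2816
z_hi = std.inv_cdf(0.95)  # about  1.6449

# Subtracting the equations eliminates mu: (90 - 60) / sigma = z_hi - z_lo
sigma = (90 - 60) / (z_hi - z_lo)
mu = 60 - z_lo * sigma
print(round(mu, 2), round(sigma, 2))  # 73.14 10.25

# Sanity check: the fitted distribution reproduces the given tail probabilities
fitted = NormalDist(mu, sigma)
print(round(fitted.cdf(60), 3), round(1 - fitted.cdf(90), 3))  # 0.1 0.05
```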
Example
- Let $X$ denote the length of time (in minutes) an automobile battery will continue to crank an engine, and let $X \sim N(10, 4)$. What is the probability that the battery will crank the engine longer than $10 + x$ minutes, given that it is still cranking at 10 minutes?
- We find
$$P(X > 10 + x \mid X > 10) = \frac{P(X > 10 + x)}{P(X > 10)} = \frac{P(Z > x/2)}{1/2} = 2[1 - \Phi(x/2)].$$
- Note that the result is a decreasing function of $x$. If $X$ had been exponentially distributed with the same mean, the memoryless property would give $P(X > 10 + x \mid X > 10) = P(X > x) = e^{-x/10}$; the normal distribution has no such memoryless property.
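A quick numerical check of the closed form follows. It assumes Python 3.8+, and the helper names are illustrative. The sketch computes the conditional probability directly as a ratio of normal tail probabilities and compares it with $2[1-\Phi(x/2)]$. It also prints the exponential (memoryless) value $e^{-x/10}$ alongside.

```python
import math
from statistics import NormalDist

X = NormalDist(10, 2)  # N(10, 4): variance 4, standard deviation 2
Z = NormalDist()       # standard normal


def cond_survival_normal(x):
    """P(X > 10 + x | X > 10) as a ratio of normal tail probabilities."""
    return (1 - X.cdf(10 + x)) / (1 - X.cdf(10))


def closed_form(x):
    """The closed form 2[1 - Phi(x/2)]."""
    return 2 * (1 - Z.cdf(x / 2))


def cond_survival_exponential(x, mean=10.0):
    """Exponential with the same mean is memoryless: exp(-x / mean)."""
    return math.exp(-x / mean)


for x in (0.5, 1.0, 2.0, 4.0):
    print(x, round(cond_survival_normal(x), 4), round(closed_form(x), 4),
          round(cond_survival_exponential(x), 4))
```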
Remark
- The normal pdf belongs to a location-scale family with $\mu$ a location parameter and $\sigma$ a scale parameter.
- For any location-scale family with location parameter $a$ and scale parameter $b > 0$, the cdf of any member of the family is
$$F_X(x) = F_Z\left(\frac{x-a}{b}\right),$$
where $F_Z(z)$ is the cdf of the standard member of the family.
- Other examples of location-scale families are the Cauchy, logistic, and double exponential (Laplace) distributions.
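The location-scale relation translates directly into code: given the cdf of the standard member, shift and scale the argument. In this sketch the standard cdfs are the usual textbook closed forms: Cauchy $\tfrac12 + \arctan(z)/\pi$, logistic $1/(1+e^{-z})$, and Laplace. `location_scale_cdf` is a hypothetical helper, not a library function.

```python
import math
from statistics import NormalDist


# Standard members F_Z of some location-scale families (textbook closed forms).
def std_normal_cdf(z):
    return NormalDist().cdf(z)


def std_cauchy_cdf(z):
    return 0.5 + math.atan(z) / math.pi


def std_logistic_cdf(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # written this way to avoid overflow for very negative z
    return e / (1.0 + e)


def std_laplace_cdf(z):
    return 0.5 * math.exp(z) if z < 0 else 1.0 - 0.5 * math.exp(-z)


def location_scale_cdf(std_cdf, a, b):
    """Return the cdf x -> F_Z((x - a) / b) for location a and scale b > 0."""
    if b <= 0:
        raise ValueError("scale b must be positive")
    return lambda x: std_cdf((x - a) / b)


# For the normal family, a = mu and b = sigma reproduce N(mu, sigma^2).
F = location_scale_cdf(std_normal_cdf, a=3.0, b=2.0)
print(abs(F(4.5) - NormalDist(3.0, 2.0).cdf(4.5)) < 1e-12)  # True

# Each standard member here is symmetric about 0, so the location a is the median.
G = location_scale_cdf(std_cauchy_cdf, a=-1.0, b=0.5)
print(G(-1.0))  # 0.5
```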
Two important results concerning the normal distribution
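- For $X \sim N(\mu, \sigma^2)$, $\sigma^2 > 0$,
$$V = \frac{(X-\mu)^2}{\sigma^2} \sim \chi^2_1.$$
- Additive property of the normal distribution: for independent $X_1, \ldots, X_n$ with $X_i \sim N(\mu_i, \sigma_i^2)$,
$$Y = \sum_{i=1}^{n} a_i X_i \sim N\left(\sum_{i=1}^{n} a_i \mu_i,\; \sum_{i=1}^{n} a_i^2 \sigma_i^2\right).$$
- The last result is best proved using the mgf approach.
- A direct consequence of the second result (take $a_i = 1/n$) is that, for a random sample $X_1, \ldots, X_n$ from $N(\mu, \sigma^2)$,
$$\bar{X} \sim N(\mu, \sigma^2/n).$$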
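Both results can be checked numerically. The first part uses the identity $P(V \le c) = P(|Z| \le \sqrt{c}) = 2\Phi(\sqrt{c}) - 1$ with $c = 3.841$, which is approximately the 0.95 quantile of $\chi^2_1$. The second part is a seeded Monte Carlo sketch. Its coefficients $a_i$, means, and standard deviations are arbitrary illustrative choices.

```python
import random
from statistics import NormalDist, fmean, pvariance

Z = NormalDist()

# (i) V = ((X - mu) / sigma)^2 = Z^2, so P(V <= c) = P(|Z| <= sqrt(c)) = 2 Phi(sqrt(c)) - 1.
#     c = 3.841 is (approximately) the 0.95 quantile of chi-square with 1 df.
c = 3.841
p_v = 2 * Z.cdf(c ** 0.5) - 1
print(round(p_v, 4))  # 0.95

# (ii) Monte Carlo check of the additive property for Y = sum a_i X_i.
rng = random.Random(516)      # fixed seed for reproducibility
a = [1.0, -2.0, 0.5]          # illustrative coefficients a_i
mus = [0.0, 1.0, 4.0]         # means mu_i
sigmas = [1.0, 0.5, 2.0]      # standard deviations sigma_i (not variances)

ys = [sum(ai * rng.gauss(m, s) for ai, m, s in zip(a, mus, sigmas))
      for _ in range(200_000)]

theory_mean = sum(ai * m for ai, m in zip(a, mus))              # sum a_i mu_i
theory_var = sum(ai ** 2 * s ** 2 for ai, s in zip(a, sigmas))  # sum a_i^2 sigma_i^2
print(theory_mean, theory_var)                       # 0.0 3.0
print(round(fmean(ys), 2), round(pvariance(ys), 2))  # close to 0 and 3
```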
Multivariate Normal Distribution: an introduction
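- Start with $Z = (Z_1, \ldots, Z_n)'$, where the $Z_i \sim N(0, 1)$ are independent.
- The joint density of $Z$ is
$$f_Z(z) = \left(\frac{1}{2\pi}\right)^{n/2} \exp\left\{-\frac{1}{2} z'z\right\}.$$
- Clearly, $EZ = 0$ and $\operatorname{Cov}(Z) = I_n$.
- Due to the independence of the $Z_i$, the mgf of $Z$ is
$$M_Z(t) = E\left[e^{t'Z}\right] = \exp\left\{\frac{1}{2} t't\right\}.$$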
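Because the $Z_i$ are independent, $f_Z$ is the product of the univariate $\varphi(z_i)$, and $M_Z$ is the product of the $e^{t_i^2/2}$. This short sketch checks the density identity at one arbitrary point. It assumes Python 3.8+ for `math.prod`, and the function names are illustrative.

```python
import math
from statistics import NormalDist


def std_mvn_pdf(z):
    """f_Z(z) = (2 pi)^(-n/2) exp(-z'z / 2) for Z ~ N_n(0, I_n)."""
    n = len(z)
    zz = sum(zi * zi for zi in z)  # the quadratic form z'z
    return (2 * math.pi) ** (-n / 2) * math.exp(-0.5 * zz)


def std_mvn_mgf(t):
    """M_Z(t) = exp(t't / 2), the product of the univariate mgfs exp(t_i^2 / 2)."""
    return math.exp(0.5 * sum(ti * ti for ti in t))


z = [0.3, -1.2, 2.0]
product = math.prod(NormalDist().pdf(zi) for zi in z)  # product of phi(z_i)
print(abs(std_mvn_pdf(z) - product) < 1e-12)           # True
```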
Definition
- $Z$ has a multivariate normal distribution with mean vector $0$ and covariance matrix $I_n$.
- We can also write $Z \sim N(0, I_n)$.
- In general, the covariance matrix need not be the identity.
General multivariate normal distribution
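- A general $n \times n$ covariance matrix $\Sigma$ is symmetric and positive semidefinite.
- The spectral decomposition of $\Sigma$ is
$$\Sigma = \Gamma' \Lambda \Gamma.$$
- $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$ is a diagonal matrix, where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0$ are the eigenvalues of $\Sigma$.
- The columns of $\Gamma'$, namely $v_1, \ldots, v_n$, are the corresponding eigenvectors.
- Note that $\Gamma$ is an orthogonal matrix: $\Gamma^{-1} = \Gamma'$, and therefore $\Gamma\Gamma' = \Gamma'\Gamma = I$.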
General multivariate normal distribution
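- Due to the orthogonality of $\Gamma$, we can represent
$$\Sigma = \Gamma' \Lambda^{1/2} \Gamma\, \Gamma' \Lambda^{1/2} \Gamma = \Sigma^{1/2}\Sigma^{1/2},$$
where $\Sigma^{1/2} = \Gamma' \Lambda^{1/2} \Gamma$ is the square root of the positive semidefinite matrix $\Sigma$.
- It is easy to show that
$$\Sigma^{-1/2} = \Gamma' \Lambda^{-1/2} \Gamma$$
if $\Sigma$ is actually positive definite (so that all $\lambda_i > 0$).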
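For a $2 \times 2$ symmetric matrix, the spectral decomposition can be written down with a single rotation angle. That makes $\Sigma^{1/2} = \Gamma'\Lambda^{1/2}\Gamma$ and $\Sigma^{-1/2}$ easy to compute without a linear algebra library. The sketch below is limited to the $2 \times 2$ case, and the example $\Sigma$ and all helper names are illustrative assumptions. For larger matrices one would normally use a numerical library's symmetric eigensolver.

```python
import math


def spectral_2x2(S):
    """Spectral decomposition of a symmetric 2x2 matrix S = [[a, b], [b, c]].

    Returns (lams, vecs) with lams[0] >= lams[1]; vecs[k] is the unit
    eigenvector for lams[k], i.e. vecs are the columns of Gamma'.
    """
    (a, b), (_, c) = S
    theta = 0.5 * math.atan2(2 * b, a - c)  # rotation angle that diagonalizes S
    cs, sn = math.cos(theta), math.sin(theta)
    lam1 = a * cs * cs + 2 * b * sn * cs + c * sn * sn
    lam2 = a * sn * sn - 2 * b * sn * cs + c * cs * cs
    return (lam1, lam2), ((cs, sn), (-sn, cs))


def matrix_power_2x2(S, p):
    """S^p = Gamma' Lambda^p Gamma = sum_k lam_k^p v_k v_k' (e.g. p = 1/2 or -1/2)."""
    lams, vecs = spectral_2x2(S)
    out = [[0.0, 0.0], [0.0, 0.0]]
    for lam, v in zip(lams, vecs):
        lam = max(lam, 0.0)  # clip tiny negative round-off for PSD input
        w = lam ** p         # p < 0 requires lam > 0, i.e. positive definite S
        for i in range(2):
            for j in range(2):
                out[i][j] += w * v[i] * v[j]
    return out


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


Sigma = [[4.0, 1.2], [1.2, 1.0]]
lams, _ = spectral_2x2(Sigma)
root = matrix_power_2x2(Sigma, 0.5)       # Sigma^{1/2}
inv_root = matrix_power_2x2(Sigma, -0.5)  # Sigma^{-1/2}
print(lams)                    # eigenvalues, largest first
print(matmul(root, root))      # approximately Sigma
print(matmul(inv_root, root))  # approximately the 2x2 identity
```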
General multivariate normal distribution
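- Now, define
$$X = \Sigma^{1/2} Z + \mu.$$
- One can check that $EX = \mu$ and $\operatorname{Cov} X = \Sigma^{1/2}\Sigma^{1/2} = \Sigma$.
- The mgf is
$$M_X(t) = \exp\left\{t'\mu + \frac{1}{2} t'\Sigma t\right\}. \qquad (1)$$
- Formally, for any positive semidefinite matrix $\Sigma$ and $\mu \in \mathbb{R}^n$, a distribution with the mgf (1) is called a multivariate normal distribution, written $X \sim N_n(\mu, \Sigma)$.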
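The construction $X = \Sigma^{1/2}Z + \mu$ also works as a sampler. In place of a general eigendecomposition, this sketch uses a closed-form symmetric square root that is valid only for $2 \times 2$ positive semidefinite matrices: $\Sigma^{1/2} = (\Sigma + sI)/t$ with $s = \sqrt{|\Sigma|}$ and $t = \sqrt{\operatorname{tr}\Sigma + 2s}$, a consequence of the Cayley-Hamilton theorem. The values of $\mu$, $\Sigma$, and the seed are arbitrary.

```python
import random
from statistics import fmean


def sqrt_psd_2x2(A):
    """Symmetric PSD square root of a nonzero 2x2 PSD matrix A:
    A^{1/2} = (A + s I) / t with s = sqrt(det A), t = sqrt(tr A + 2 s)."""
    (a, b), (_, c) = A
    s = max(a * c - b * b, 0.0) ** 0.5
    t = (a + c + 2 * s) ** 0.5
    return [[(a + s) / t, b / t], [b / t, (c + s) / t]]


mu = [1.0, -2.0]
Sigma = [[4.0, 1.2], [1.2, 1.0]]
R = sqrt_psd_2x2(Sigma)  # Sigma^{1/2}

rng = random.Random(2015)  # fixed seed for reproducibility
draws = []
for _ in range(100_000):
    z = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]  # Z ~ N_2(0, I_2)
    # X = Sigma^{1/2} Z + mu
    draws.append([R[i][0] * z[0] + R[i][1] * z[1] + mu[i] for i in range(2)])

m = [fmean(d[i] for d in draws) for i in range(2)]
cov = [[fmean((d[i] - m[i]) * (d[j] - m[j]) for d in draws) for j in range(2)]
       for i in range(2)]
print([round(v, 2) for v in m])                     # close to mu
print([[round(v, 2) for v in row] for row in cov])  # close to Sigma
```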
The pdf of the multivariate normal distribution and linear transformations
- If $\Sigma$ is positive definite, the usual transformation method gives
$$f_X(x) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left\{-\frac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu)\right\}.$$
- For $Y = AX + b$, where $X \sim N_n(\mu, \Sigma)$ and $A$ is an $m \times n$ matrix,
$$Y \sim N_m(A\mu + b,\; A\Sigma A').$$
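The two formulas above can be sketched in a few lines of plain Python. `mvn_pdf_2d` evaluates the density for $n = 2$ only. `affine_params` returns $(A\mu + b, A\Sigma A')$ for any conformable $A$ and $b$. The names and numbers are illustrative. The usage example with $A = [I_2 \;\vdots\; O]$ shows that selecting a subvector simply picks out the corresponding blocks of $\mu$ and $\Sigma$.

```python
import math


def mvn_pdf_2d(x, mu, S):
    """Bivariate case (n = 2) of
    f_X(x) = (2 pi)^{-n/2} |S|^{-1/2} exp{-(x - mu)' S^{-1} (x - mu) / 2}."""
    (a, b), (_, c) = S
    det = a * c - b * b
    if det <= 0:
        raise ValueError("Sigma must be positive definite")
    inv = [[c / det, -b / det], [-b / det, a / det]]  # explicit 2x2 inverse
    d = [x[0] - mu[0], x[1] - mu[1]]
    q = sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(M):
    return [list(row) for row in zip(*M)]


def affine_params(A, b, mu, S):
    """Mean A mu + b and covariance A S A' of Y = A X + b."""
    mean = [sum(A[i][k] * mu[k] for k in range(len(mu))) + b[i] for i in range(len(A))]
    return mean, matmul(matmul(A, S), transpose(A))


mu = [1.0, 2.0, 3.0]
S = [[2.0, 0.5, 0.0],
     [0.5, 1.0, 0.3],
     [0.0, 0.3, 1.5]]

# A = [I_2 : O] picks out the first two coordinates.
print(affine_params([[1, 0, 0], [0, 1, 0]], [0.0, 0.0], mu, S))
# ([1.0, 2.0], [[2.0, 0.5], [0.5, 1.0]])

# Y = X_1 - X_2 + 10 has mean 9 and variance 2 + 1 - 2(0.5) = 2.
print(affine_params([[1, -1, 0]], [10.0], mu, S))  # ([9.0], [[2.0]])

print(mvn_pdf_2d([0.0, 0.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))  # 1/(2 pi)
```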
Marginal distributions of the multivariate normal
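- An elementary consequence of the above is that, for any $m < n$, the $m \times 1$ subvector $X_1$ has a normal distribution as well:
$$X_1 \sim N_m(\mu_1, \Sigma_{11}),$$
where $\mu_1 = EX_1$ and $\Sigma_{11} = \operatorname{Cov} X_1$.
- To justify it, it is enough to notice that $X_1 = AX$, where $A = [I_m \;\vdots\; O_{mp}]$ and $O_{mp}$ is the $m \times p$ zero matrix with $p = n - m$.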
Example
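- For the bivariate normal distribution, let us denote $X \equiv X_1$ and $Y \equiv X_2$.
- Notation: $(X, Y)' \sim N(\mu, \Sigma)$, where $\mu = (\mu_1, \mu_2)'$ and
$$\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{pmatrix}.$$
- Since $\sigma_{12} = \sigma_{21} = \rho\sigma_1\sigma_2$, we have $|\Sigma| = \sigma_1^2\sigma_2^2(1-\rho^2)$, so we need to require $-1 < \rho < 1$ to ensure that $\Sigma$ is positive definite and invertible.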
Example
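- The expression for $f(x, y)$ is
$$f(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\, e^{-q/2},$$
where
$$q = \frac{1}{1-\rho^2}\left[\left(\frac{x-\mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x-\mu_1}{\sigma_1}\right)\left(\frac{y-\mu_2}{\sigma_2}\right) + \left(\frac{y-\mu_2}{\sigma_2}\right)^2\right].$$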
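As a consistency check, the scalar form with $q$ should agree with the matrix form of the density when $\Sigma$ has entries $\sigma_1^2$, $\rho\sigma_1\sigma_2$, $\sigma_2^2$. The sketch below compares the two forms at a few points, using illustrative parameter values. It also confirms that with $\rho = 0$ the density factors into the product of the two univariate normal pdfs.

```python
import math
from statistics import NormalDist


def bvn_pdf_q(x, y, mu1, mu2, s1, s2, rho):
    """f(x, y) written with the scalar quadratic form q."""
    u, v = (x - mu1) / s1, (y - mu2) / s2
    q = (u * u - 2 * rho * u * v + v * v) / (1 - rho * rho)
    return math.exp(-q / 2) / (2 * math.pi * s1 * s2 * math.sqrt(1 - rho * rho))


def bvn_pdf_matrix(x, y, mu1, mu2, s1, s2, rho):
    """The same density from the matrix form, with
    Sigma = [[s1^2, rho s1 s2], [rho s1 s2, s2^2]]."""
    a, b, c = s1 * s1, rho * s1 * s2, s2 * s2
    det = a * c - b * b  # = s1^2 s2^2 (1 - rho^2), positive iff |rho| < 1
    dx, dy = x - mu1, y - mu2
    quad = (c * dx * dx - 2 * b * dx * dy + a * dy * dy) / det  # (x-mu)' Sigma^{-1} (x-mu)
    return math.exp(-quad / 2) / (2 * math.pi * math.sqrt(det))


params = dict(mu1=1.0, mu2=-1.0, s1=2.0, s2=0.5, rho=-0.6)
for x, y in [(0.0, 0.0), (1.0, -1.0), (3.5, -0.2)]:
    print(bvn_pdf_q(x, y, **params), bvn_pdf_matrix(x, y, **params))  # pairs agree

# With rho = 0 the density factors into the two univariate normal pdfs.
x, y = 0.7, -1.4
indep = bvn_pdf_q(x, y, 1.0, -1.0, 2.0, 0.5, 0.0)
print(abs(indep - NormalDist(1.0, 2.0).pdf(x) * NormalDist(-1.0, 0.5).pdf(y)) < 1e-12)  # True
```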
Some remarks
- Note that any two independent $X$ and $Y$ have correlation $\rho = 0$.
- Using the pdf of the bivariate normal distribution obtained above, it is easy to see that, for a bivariate normal pair, $\rho = 0$ implies that $X$ and $Y$ are independent: with $\rho = 0$, $f(x, y)$ factors into the product of the two marginal normal pdfs.
- This remark extends immediately to any multivariate (not just bivariate) normal distribution; see the next slide!
Normal marginals do not guarantee the joint normality
- Let $Z \sim N(0, 1)$ and let $U$ satisfy $P(U = 1) = P(U = -1) = \frac{1}{2}$; let $U$ and $Z$ be independent.
- Transformation: take $X = U|Z|$ and $Y = Z$.
- It is easy to check that both $X$ and $Y$ are $N(0, 1)$.
- Jointly, however, the picture is different. Note that $X^2 = Y^2$ with probability 1.
- This means that the joint distribution of $X$ and $Y$ is supported on just the two lines $y = \pm x$, so it cannot be bivariate normal!
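A simulation makes the counterexample concrete. The seed and sample size are arbitrary. The last line relies on the following fact: if $(X, Y)$ were bivariate normal, $X + Y$ would be normal, so it could not take the single value 0 with positive probability unless it were degenerate.

```python
import random
from statistics import fmean, pstdev

rng = random.Random(13)  # fixed seed for reproducibility
n = 100_000
xs, ys = [], []
for _ in range(n):
    z = rng.gauss(0.0, 1.0)
    u = rng.choice((-1, 1))  # P(U = 1) = P(U = -1) = 1/2, independent of Z
    xs.append(u * abs(z))    # X = U|Z|
    ys.append(z)             # Y = Z

# Each marginal behaves like N(0, 1) ...
print(round(fmean(xs), 2), round(pstdev(xs), 2))  # near 0 and 1
# ... but every pair lies on y = x or y = -x, i.e. X^2 = Y^2 exactly:
print(all(x * x == y * y for x, y in zip(xs, ys)))  # True
# If (X, Y) were bivariate normal, X + Y would be normal with Var(X + Y) > 0
# here, so P(X + Y = 0) would be 0; instead it is about 1/2.
print(sum(x + y == 0 for x, y in zip(xs, ys)) / n)
```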
Independence and Conditional Distributions
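- For $X \sim N_n(\mu, \Sigma)$ and any partition $X = (X_1', X_2')'$, with $\mu$ and $\Sigma$ partitioned conformably, $X_1$ and $X_2$ are independent iff $\Sigma_{12} = O$.
- The conditional distribution of $X_1 \mid X_2$ is
$$N_m\left(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(X_2 - \mu_2),\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right).$$
- As an example, in the bivariate case, apply this with the conditioning variable $X$ (mean $\mu_1$, standard deviation $\sigma_1$) in the role of $X_2$ and $Y$ (mean $\mu_2$, standard deviation $\sigma_2$) in the role of $X_1$. This gives
$$Y \mid X = x \sim N\left(\mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x - \mu_1),\; \sigma_2^2(1-\rho^2)\right).$$
- This implies that
$$E(Y \mid x) = \mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x - \mu_1).$$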
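The bivariate conditional formula is the special case of the partitioned one in which every block is $1 \times 1$. The sketch below computes the conditional mean and variance both ways. The helper names and parameter values are illustrative.

```python
def conditional_y_given_x(x, mu1, mu2, s1, s2, rho):
    """Bivariate formula: Y | X = x ~ N(mu2 + rho (s2/s1)(x - mu1), s2^2 (1 - rho^2))."""
    mean = mu2 + rho * (s2 / s1) * (x - mu1)
    var = s2 ** 2 * (1 - rho ** 2)
    return mean, var


def conditional_partitioned(x, mu1, mu2, s1, s2, rho):
    """Partitioned formula with Y as the first block and X as the second (all 1x1):
    mean = mu_Y + Sigma_YX Sigma_XX^{-1} (x - mu_X),
    var  = Sigma_YY - Sigma_YX Sigma_XX^{-1} Sigma_XY."""
    s_xx, s_yy, s_yx = s1 ** 2, s2 ** 2, rho * s1 * s2
    return mu2 + s_yx / s_xx * (x - mu1), s_yy - s_yx * s_yx / s_xx


# Illustrative parameters: X has mean 1 and sd 2, Y has mean 2 and sd 3, rho = 0.5.
print(conditional_y_given_x(3.0, 1.0, 2.0, 2.0, 3.0, 0.5))    # (3.5, 6.75)
print(conditional_partitioned(3.0, 1.0, 2.0, 2.0, 3.0, 0.5))  # (3.5, 6.75)
```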
Interpretation and example
- Note that the variance of the conditional distribution does not depend on the conditioning value $x$.
- For bivariate normal distributions, the conditional expectation of one variable given the other coincides with the regression line of that variable on the other variable.
- Let the incomes of husbands ($X$) and wives ($Y$) in a population be jointly bivariate normal with $X \sim N(75, 400)$, $Y \sim N(60, 400)$, and correlation $\rho = .75$; all measurements are in thousands of dollars.
Example
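- What is $P(X + Y > 175 \mid Y = 80)$? Since both standard deviations equal 20, note that
$$X \mid Y = 80 \sim N\left(75 + .75(80 - 60),\; 400(1 - .75^2)\right) = N(90, 175).$$
- Thus,
$$P(X + Y > 175 \mid Y = 80) = P(X > 95 \mid Y = 80) = P\left(Z > \frac{95 - 90}{\sqrt{175}}\right) = P(Z > .38) = .3520.$$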
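The income example can be reproduced in a few lines, assuming Python 3.8+ and `statistics.NormalDist`. The code uses the general conditional mean $\mu_X + \rho(\sigma_X/\sigma_Y)(y - \mu_Y)$, where the ratio $\sigma_X/\sigma_Y$ equals 1 here. It does not round $z$ before evaluating $\Phi$, so its last digit differs from the .3520 above.

```python
from statistics import NormalDist

mu_x, mu_y = 75.0, 60.0  # husbands (X), wives (Y), in thousands of dollars
sd_x, sd_y = 20.0, 20.0  # sqrt(400)
rho = 0.75
y = 80.0

# X | Y = y ~ N(mu_x + rho (sd_x/sd_y)(y - mu_y), sd_x^2 (1 - rho^2))
cond_mean = mu_x + rho * (sd_x / sd_y) * (y - mu_y)
cond_var = sd_x ** 2 * (1 - rho ** 2)
print(cond_mean, cond_var)  # 90.0 175.0

# P(X + Y > 175 | Y = 80) = P(X > 95 | Y = 80)
p = 1 - NormalDist(cond_mean, cond_var ** 0.5).cdf(175 - y)
print(round(p, 4))  # 0.3527; rounding z to .38 first gives .3520
```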
Example II: Galton’s Regression to the mean
- Let $X$ and $Y$ be the first and second midterm grades of a student.
- Let $(X, Y)$ be jointly bivariate normal with means 70, standard deviations 10, and correlation .7.
- If $X = 90$, what is $P(Y < X \mid X = 90)$?
Example II: Galton’s Regression to the mean
- We have
$$P(Y < X \mid X = 90) = P(Y < 90 \mid X = 90) = P\left(Z < \frac{90 - 84}{\sqrt{51}}\right) = P(Z < .84) = .7995,$$
since
$$Y \mid X = 90 \sim N\left(70 + .7(90 - 70),\; 100(1 - .7^2)\right) = N(84, 51).$$
- The term "regression to mediocrity" was first used by Francis Galton in his study of inheritance.
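The midterm example can be checked the same way, assuming Python 3.8+ and `statistics.NormalDist`. Because the two standard deviations are equal, the ratio $\sigma_2/\sigma_1$ drops out of the conditional mean. Evaluating $\Phi$ at the unrounded $z = 6/\sqrt{51}$ gives a slightly larger value than the table lookup at .84.

```python
from statistics import NormalDist

mu, sd, rho = 70.0, 10.0, 0.7  # both midterms: mean 70, sd 10; correlation .7
x = 90.0

# Y | X = x ~ N(mu + rho (sd/sd)(x - mu), sd^2 (1 - rho^2)); the sd ratio is 1 here
cond_mean = mu + rho * (x - mu)
cond_var = sd ** 2 * (1 - rho ** 2)
p = NormalDist(cond_mean, cond_var ** 0.5).cdf(x)  # P(Y < 90 | X = 90)
print(round(cond_mean, 6), round(cond_var, 6))  # 84.0 51.0
print(round(p, 4))  # about 0.7996; rounding z to .84 first gives .7995
```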
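χ² and multivariate normal
- If $X \sim N_n(\mu, \Sigma)$ with positive definite $\Sigma$, then
$$W = (X-\mu)'\Sigma^{-1}(X-\mu) \sim \chi^2(n).$$
- The proof is a very easy quadratic form argument. $Z = \Sigma^{-1/2}(X-\mu) \sim N_n(0, I_n)$, and $W = Z'Z = \sum_{i=1}^{n} Z_i^2$ is a sum of $n$ independent $\chi^2_1$ variables.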
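Here is a Monte Carlo sketch of the result for $n = 2$, where the $\chi^2(2)$ cdf has the closed form $1 - e^{-w/2}$. To generate $X \sim N_2(\mu, \Sigma)$ it uses a Cholesky factor $L$ with $LL' = \Sigma$ rather than the symmetric root $\Sigma^{1/2}$. Any factor with this property gives the same distribution. The values of $\mu$, $\Sigma$, and the seed are arbitrary.

```python
import math
import random
from statistics import fmean

mu = [1.0, -2.0]
Sigma = [[4.0, 1.2], [1.2, 1.0]]
(a, b), (_, c) = Sigma
det = a * c - b * b
Sigma_inv = [[c / det, -b / det], [-b / det, a / det]]  # explicit 2x2 inverse

# Cholesky factor L (lower triangular) with L L' = Sigma.
l11 = math.sqrt(a)
l21 = b / l11
l22 = math.sqrt(c - l21 ** 2)

rng = random.Random(498)  # fixed seed for reproducibility
ws = []
for _ in range(100_000):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    x = [mu[0] + l11 * z1, mu[1] + l21 * z1 + l22 * z2]  # X = mu + L Z ~ N_2(mu, Sigma)
    d = [x[0] - mu[0], x[1] - mu[1]]
    ws.append(sum(d[i] * Sigma_inv[i][j] * d[j] for i in range(2) for j in range(2)))

# chi-square(2): mean 2 and cdf 1 - exp(-w/2)
print(round(fmean(ws), 2))  # near 2
for w0 in (1.0, 3.0, 5.991):
    empirical = sum(w <= w0 for w in ws) / len(ws)
    print(w0, round(empirical, 3), round(1 - math.exp(-w0 / 2), 3))
```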