Probability Distributions

Week #4: Probability Distributions (Ch. 4 & 6)
MAT 660--Biostatistics (Dr. Christopher J. Mecklin)
February 8, 2016
Binomial Distribution
Suppose I flip a coin three times and want to compute the probability of getting
heads exactly 2 times. This can be done with a tree diagram.
Bernoulli Trials
You can see that the tree diagram approach will not be viable for a large number of trials,
say flipping a coin 20 times. The binomial distribution is a probability model that will
allow us to make computations such as the probability of getting 12 heads in 20 flips of a
coin without constructing the tree diagram.
The binomial distribution is based on the assumption that we have Bernoulli trials,
where:
1. We have n independent trials of an event, such as flipping a coin 20 times.
2. Each trial has two possible outcomes; one is a success (heads) and the other is a failure (tails).
3. The probability of success is known and the same for each trial; for coin flipping, π = 0.50. (Sometimes we use p instead of π.)
Probability Density Function
The probability density function (or p.d.f.) for the binomial distribution, for x = 0,1,2, … , n,
is:
P(X = x) = C(n, x) π^x (1 − π)^(n−x)
where X is the random variable representing the number of successes, n is the number of
trials, x is the observed number of successes, π is the probability of success, n − x is the
observed number of failures, 1 − π is the probability of failure, and C(n, x) is n choose x (i.e.
combinations).
Combinations
n choose x, which is usually represented as C(n, x) or nCx, represents the number of
combinations, or ways we can select x successes in n trials (without regard to order). It is
computed as:

C(n, x) = n! / (x!(n − x)!)

For example: C(4, 3) = 4!/(3! × 1!) = (4 × 3 × 2 × 1)/(3 × 2 × 1 × 1) = 4
If we are flipping a coin 4 times, there are 4 ways to get 3 heads:
{H, H, H, T}, {H, H, T, H}, {H, T, H, H}, {T, H, H, H}
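As a quick check, base R's choose() function computes this count directly:

choose(4, 3)
## [1] 4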
Binomial Probabilities with the Probability Density Function
Suppose I take an n = 10 question multiple choice test by guessing. Each question has 4
choices, one of which is correct. My probability of 'success' (guessing correctly) is π = 0.25.
If X represents the number of correct answers, then X has a binomial distribution.
X ∼ BIN(10,0.25)
If I want to know the probability of getting exactly 3 questions right:

P(X = 3) = C(10, 3)(0.25)^3 (0.75)^7 = 120(0.25)^3 (0.75)^7 = .2503
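This hand computation can be verified with R's dbinom function (the same function used on the technology slides below):

dbinom(x=3, size=10, prob=0.25)
## [1] 0.2502823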
Histogram of a Binomial Distribution
[Figure omitted: probability histogram of the binomial distribution from the example above]
Binomial Probabilities with Technology
Both the TI-83/84 calculator and R Commander have functions to compute binomial
probabilities that remove the need to use the formula directly or to have a special table.
Suppose now we have X = 6, n = 20, and π = 0.25, where X ∼ BIN(n = 20, π = 0.25).
With the calculator, go to 2nd, VARS and choose Option A: binompdf in the DISTR menu.
The calculator expects one to enter binompdf(n,p,x), so binompdf(20,0.25,6)=.1686.
Binomial Probabilities with Technology
With R Commander, go to Distributions, Discrete Distributions, Binomial
Distribution, Binomial Probabilities and enter 20 for Binomial Trials and 0.25 for
Probability of Success.
Alternatively, in the regular R window, enter
dbinom(x=6,size=20,prob=0.25)
## [1] 0.1686093
Binomial Probabilities Screenshots
[Screenshots omitted: TI-83/84 and R Commander output for the binomial p.d.f.]
Cumulative Distribution Function (CDF)
Often we want to know a cumulative probability, such as P(X ≤ 6) when X ∼ BIN(n =
20, π = 0.25).
Literally, this is
P(X ≤ 6) = P(X = 0) + P(X = 1) + ⋯ + P(X = 6).
This could be computed by using the probability density function ('formula') several times.
If you are patient enough, it is just a matter of addition.

P(X ≤ 6) = .0032 + .0211 + .0669 + .1339 + .1897 + .2023 + .1686 = .7857
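Equivalently, the seven individual probabilities can be generated and summed in one line of R:

sum(dbinom(x=0:6, size=20, prob=0.25))
## [1] 0.7857819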
Cumulative Distribution Function with Technology
Both the TI-83/84 calculator and R Commander also have functions to compute binomial
cumulative probabilities.
With the calculator, go to 2nd, VARS, and choose Option B: binomcdf, in the DISTR menu.
The calculator expects one to enter binomcdf(n,p,x), so binomcdf(20,0.25,6)=.7858.
Cumulative Distribution Function with Technology
With R Commander, go to Distributions, Discrete Distributions, Binomial
Distribution, Binomial Tail Probabilities and enter 6 for Variable Value(s), 20 for
Binomial Trials, 0.25 for Probability of Success, and Lower Tail. (Upper tail will
compute greater-than probabilities.)
Alternatively, in the regular R window, enter
pbinom(q=6, size=20, prob=0.25, lower.tail=TRUE)
## [1] 0.7857819
Binomial Probabilities Screenshots
[Screenshots omitted: TI-83/84 and R Commander output for the binomial c.d.f.]
Other Inequalities
The simplest way to solve problems involving other inequalities besides less-than or equal
to is to be 'clever' with the complement rule.
For example, if X ∼ BIN(n = 20, π = 0.25) and we have already found that P(X = 6) =
.1686 and P(X ≤ 6) = .7858, then:

1. P(X ≠ 6) = 1 − P(X = 6) = 1 − .1686 = .8314
2. P(X < 6) = P(X ≤ 5) = .7858 − .1686 = .6172
3. P(X ≥ 6) = 1 − P(X ≤ 5) = 1 − .6172 = .3828
4. P(X > 6) = 1 − P(X ≤ 6) = 1 − .7858 = .2142
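Each of these can be verified in R with dbinom and pbinom (a quick sketch; the values in the comments are rounded to 4 decimal places):

1 - dbinom(6, size=20, prob=0.25)               # P(X != 6) = .8314
pbinom(5, size=20, prob=0.25)                   # P(X <= 5) = .6172
pbinom(5, size=20, prob=0.25, lower.tail=FALSE) # P(X >= 6) = .3828
pbinom(6, size=20, prob=0.25, lower.tail=FALSE) # P(X > 6)  = .2142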
Mean and Variance for Binomial Distribution
For a binomial distribution, if X ∼ BIN(n, π), then:
μ_X = E(X) = nπ
σ²_X = Var(X) = nπ(1 − π)

So if n = 20 and π = 0.25, then

μ_X = E(X) = 20(.25) = 5
σ²_X = Var(X) = 20(.25)(.75) = 3.75
σ_X = SD(X) = √3.75 = 1.936
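These formulas can be checked numerically in R by applying the definitions E(X) = Σ x P(X = x) and Var(X) = Σ (x − μ)² P(X = x) to the p.d.f. values (a minimal sketch):

x <- 0:20
p <- dbinom(x, size=20, prob=0.25)
sum(x * p)          # mean: 5
sum((x - 5)^2 * p)  # variance: 3.75, using mu = 5 from above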
Poisson Distribution
Another situation in which we want to model the number of occurrences of an event leads to
the Poisson distribution, which like the binomial is discrete. Suppose that I am sick with a
viral infection and we wish to count X, the number of people that I transmit the virus to in a
given period of time (say one day).
In general, we are counting the number of occurrences of an event in a fixed interval of time
or space. We will make the following assumptions:
1. Occurrences of the event (such as transmitting the virus) are independent. (i.e. my chance of infecting you is not affected by the fact that I have already infected someone else)
2. The probability of an occurrence is the same for all possible intervals.
Probability Density Function
The probability density function for the Poisson distribution, for x = 0,1,2, … is:
P(X = x) = e^(−λ) λ^x / x!
where x is the count (or the number of occurrences of the event), λ is a parameter that
represents the mean number of occurrences, and e is a mathematical constant
approximately equal to 2.71828.
An unusual property of the Poisson distribution is that its variance is equal to its mean λ.
That is:
σ²_X = Var(X) = λ
σ_X = SD(X) = √λ
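This mean-variance property can also be checked numerically in R. Since the Poisson has infinite support, the sketch below truncates the sum at x = 100, where the remaining tail probability is negligible for λ = 1.2:

x <- 0:100
p <- dpois(x, lambda=1.2)
sum(x * p)            # mean: essentially 1.2
sum((x - 1.2)^2 * p)  # variance: also essentially 1.2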
Poisson Probabilities with the Probability Density Function
Suppose I have a viral infection and on average infect λ = 1.2 other people with the virus
per day. If X represents the number of people I infect, then X has a Poisson distribution.
X ∼ POI(λ = 1.2)
If I want to know the probability of infecting exactly 2 people:
P(X = 2) = e^(−1.2) × 1.2² / 2! = 0.72 e^(−1.2) = .2169
Histogram of a Poisson Distribution
[Figure omitted: probability histogram of the Poisson distribution from the example above]
Poisson Probabilities with Technology
Both the TI-83/84 calculator and R Commander have functions to compute Poisson
probabilities that remove the need to use the formula directly or to have a special table.
With the calculator, go to 2nd, VARS, and choose Option C: poissonpdf, in the DISTR
menu. The calculator expects one to enter poissonpdf(lambda,x), so
poissonpdf(1.2,2)=.2169, as seen before.
Poisson Probabilities with Technology
With R Commander, go to Distributions, Discrete Distributions, Poisson
Distribution, Poisson Probabilities and enter 1.2 for Mean.
Alternatively, in the regular R window, enter
dpois(x=2,lambda=1.2)
## [1] 0.2168598
Cumulative Distribution Function
Often we want to know a cumulative probability, such as P(X ≤ 2) when X ∼ POI(λ = 1.2).
Literally, this is P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2).
This could be computed by using the probability density function ('formula') several times.
With the Poisson table, it is just a matter of addition and patience.

P(X ≤ 2) = .301194 + .361433 + .216860 = .879487
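Or, in one line of R:

sum(dpois(x=0:2, lambda=1.2))
## [1] 0.8794871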
Cumulative Distribution Function with Technology
Both the TI-83/84 calculator and R Commander also have functions to compute Poisson
cumulative probabilities.
With the calculator, go to 2nd, VARS, and choose Option D: poissoncdf, in the DISTR
menu. The calculator expects one to enter poissoncdf(lambda,x), so
poissoncdf(1.2,2)=.8795.
Cumulative Distribution Function with Technology
With R Commander, go to Distributions, Discrete Distributions, Poisson
Distribution, Poisson Tail Probabilities and enter 2 for Variable Value(s), 1.2 for
Mean, and Lower Tail. (Upper tail will compute greater-than probabilities.)
Alternatively, in the regular R window, enter
ppois(q=2, lambda=1.2, lower.tail=TRUE)
## [1] 0.8794871
Normal Distribution
The normal distribution is the most important probability model in the field of statistics. It
is commonly referred to as the so-called 'bell curve' or sometimes as the Gaussian
distribution.
It is a continuous probability distribution that is important in the study of probability and
statistics for a variety of reasons. First, many natural phenomena will approximately
follow the normal model. Also, many man-made measures, such as standardized test
scores, will closely follow the normal distribution.
This distribution is also vital in the theoretical development of many of the methods
commonly utilized in applied statistics.
Normal probability density function
While we will not directly use this formula, the probability density function for the normal
distribution is:
f(x) = (1/√(2πσ²)) exp[−(1/2)((x − μ)/σ)²]
where the parameter μ is the mean and the parameter σ is the standard deviation.
If you have a calculus background, you'll appreciate not having to evaluate the integral of
this function.
Properties
The normal distribution is a continuous distribution that is bell-shaped, unimodal,
symmetric, and asymptotic.
[Figure omitted: normal density curve for X ∼ N(100,15)]
Standardization
There are an infinite number of normal distribution models, as μ can take on any real
number and σ any positive real number. In order to make finding probabilities associated
with the normal distribution easier, we generally compute what is known as a standardized
score, or z-score.
The process involves subtracting the mean and dividing by the standard deviation. The
resulting z-score measures how many standard deviations above or below average a data
value is.

Z = (X − μ)/σ
Standardization
Suppose a high school senior takes two mathematics tests, the SAT (X) and the ACT (Y).
Both are normally distributed, but with very different means and standard deviations. If
X ∼ N(500,100) and Y ∼ N(20,5) and the student gets a 640 on the SAT and a 28 on the
ACT, then:
Z_X = (640 − 500)/100 = 1.40

Z_Y = (28 − 20)/5 = 1.60

Since the z-score is higher for the ACT, this student did relatively better on that test than on
the SAT.
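The comparison is easy to script in R (a minimal sketch using the scores and parameters above):

(z_sat <- (640 - 500) / 100)
## [1] 1.4
(z_act <- (28 - 20) / 5)
## [1] 1.6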
Normal Probabilities
Assume the length of a human pregnancy can be described by a normal distribution
with μ = 266 days and σ = 16 days, i.e. X ∼ N(266,16). We want to know the probability
that a pregnancy lasts less than 246 days, or P(X < 246), represented by the percentage
of the area under the curve to the left of 246.
[Figure omitted: normal curve with the area below 246 shaded in blue]
Normal Probability With a Table
We will convert X = 246 days into a z-score.
Z = (246 − 266)/16 = −20/16 = −1.25
So P(X < 246) = P(Z < −1.25). Using the standard normal table that I've put on Canvas (I
dislike your textbook's tables),
P(Z < −1.25) = .1056
So a pregnancy lasting 246 days or less happens about 10.56% of the time, or X = 246 is
approximately the 11th percentile of the distribution.
Normal Probability With a Table
To find P(X > 280), first standardize.
Z = (280 − 266)/16 = 0.88

P(X > 280) = P(Z > 0.88)
           = 1 − P(Z < 0.88)
           = 1 − .8106
           = .1894

P(246 < X < 280) = P(−1.25 < Z < 0.88)
                 = .8106 − .1056
                 = .7050
Normal Probabilities With Technology
With the TI 83/84 calculator, go to 2nd, VARS, and choose Option 2: normalcdf in the
DISTR menu. The calculator expects one to enter normalcdf(lower z-score, upper z-score).

1. P(Z < −1.25) = normalcdf(-99999999,-1.25) = .1056 (the large negative value is used since the normal curve will stretch on the left to −∞).
2. P(Z > 0.88) = normalcdf(0.88,99999999) = .1894 (the large positive value is used since the normal curve will stretch on the right to ∞; the answer can differ slightly from the tabled answer because we were forced to round the z-score to 2 decimal places when using the table).
3. P(−1.25 < Z < 0.88) = normalcdf(-1.25,0.88) = .7049
Normal Probabilities With Technology
With R Commander, to find P(Z < −1.25) go to Distributions, Continuous
Distributions, Normal Distribution, Normal Probabilities and enter -1.25 for
Variable Value(s), 0 for Mean, 1 for Standard Deviation and Lower Tail. (Upper tail will
compute greater-than probabilities.)
Alternatively, in the regular R window, enter
pnorm(q=-1.25, mean=0, sd=1, lower.tail=TRUE)
## [1] 0.1056498
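The other two probabilities from the worked example can be checked the same way; a quick sketch (approximate values in the comments):

pnorm(q=0.88, lower.tail=FALSE)  # P(Z > 0.88), about .1894
pnorm(0.88) - pnorm(-1.25)       # P(-1.25 < Z < 0.88), about .7049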
Probabilities without Standardizing
With technology, we can find probabilities without standardizing (finding the z-score). If
we have X ∼ N(180,24) and we want P(X > 220), you would:
1. With the TI 83/84, P(X > 220) = normalcdf(220,99999999,180,24) = .0478
2. With R Commander, go to Distributions, Continuous Distributions, Normal Distribution, Normal Probabilities and enter 220 for Variable Value(s), 180 for Mean, 24 for Standard Deviation and Upper Tail.
3. In the regular R window, enter
pnorm(q=220,mean=180,sd=24,lower.tail=FALSE)
## [1] 0.04779035
Inverse Normal Problem
In the so-called 'inverse' normal problem, we start off knowing the area under the curve
(percentile) and we want to find what value of the variable X corresponds to that
percentage.
1. A college will only accept students who score in the top 15% on a standardized test with scores Y ∼ N(μ = 400, σ = 100). John is applying to that college and he wants to know what actual score Y is necessary on the test to qualify.
2. The length of human pregnancies is approx. normal, where X ∼ N(266,16). Suppose we want to know the 70th percentile of this distribution, or in other words the 'cut-off' value separating the shortest 70% of pregnancies from the longest 30%.
Graph of the Inverse Normal Problem
[Figure omitted: 70th percentile for X ∼ N(266,16)]
Normal Percentiles With The Table
In order to determine the 70th percentile for human pregnancies using a normal table, we
first must determine the z-score that corresponds to the 70th percentile.
This is done by looking up an area of .7000 in the 'middle' of the normal curve table, and
working back to find the row and column to determine the z-score.
The closest area to .7000 is .6984, which is on the row for 0.5 and column for 0.02, making
the desired standardized score of Z = 0.52. If you are at the 70th percentile of a normal
distribution, you are 0.52 standard deviations above average.
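Without a table, R's qnorm function returns the z-score for any percentile directly; for the 70th percentile:

qnorm(p=0.70)
## [1] 0.5244005

(The table value Z = 0.52 is this z-score rounded to two decimal places.)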
Use the z-score formula
Now that we know the z-score for the 70th percentile of a standard normal distribution is
Z = 0.52, and the fact that the distribution of the length of human pregnancies is approx.
normal with mean 266 and standard deviation 16, we use the standardization formula to
solve for X.
Z = (X − μ)/σ
0.52 = (X − 266)/16
X = 266 + 0.52 × 16
X = 274.32
Normal Percentiles With Technology
Let's find some percentiles with the TI-83/84 calculator.
With the calculator, go to 2nd, VARS, and choose Option 3: invNorm in the DISTR menu.
The calculator expects one to enter invNorm(area,mu,sigma).
For the 70th percentile of the pregnancy distribution, enter invNorm(.70,266,16). The
answer is 274.39.
For the top 15% of the standardized test score distribution (area below is 85%), where Y ∼
N(400,100), use invNorm(.85,400,100). The answer is 503.64.
Normal Percentiles With Technology
How about those same percentiles with R?
With R Commander, to find the 70th percentile of the pregnancy distribution, go to
Distributions, Continuous Distributions, Normal Distribution, Normal
Quantiles and enter 0.70 for Probabilities, 266 for Mean, 16 for Standard Deviation and
Lower Tail.
To find the top 15% of the standardized test distribution, go to Distributions,
Continuous Distributions, Normal Distribution, Normal Quantiles and enter 0.15
for Probabilities, 400 for Mean, 100 for Standard Deviation and Upper Tail.
Normal Percentiles With Technology
Alternatively, in the regular R window, enter
qnorm(p=0.70, mean=266, sd=16, lower.tail=TRUE)
## [1] 274.3904
qnorm(p=0.15, mean=400, sd=100, lower.tail=FALSE)
## [1] 503.6433
Central Limit Theorem
Sampling Distribution of the Sample Mean x̄
Imagine taking a sample of size n from some distribution with mean μ and standard
deviation σ. We can compute statistics on the data points in this sample. Suppose we
compute the sample mean, x̄.
Now imagine that we repeat this process many, many times and construct a histogram of
the many sample means (x̄s) that one would accumulate. This is what is known as the
sampling distribution of x̄.
An online simulation of this can be found at
http://www.rossmanchance.com/applets/SampleMeans/SampleMeans.html
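In the same spirit as the applet, the sampling distribution can be simulated in R. The sketch below (variable names are my own) draws 10,000 samples of size n = 50 from a right-skewed exponential distribution that has μ = σ = 1:

set.seed(1)  # for reproducibility
n <- 50
xbars <- replicate(10000, mean(rexp(n, rate=1)))
hist(xbars, main="Simulated sampling distribution of the sample mean")
mean(xbars)  # close to mu = 1
sd(xbars)    # close to sigma/sqrt(n) = 1/sqrt(50) = 0.1414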
Small Samples from a Skewed Distribution
If we take small samples (say n = 10) from a skewed distribution, the sampling
distribution of x̄ (lower left) is still noticeably skewed.
Large Samples from a Skewed Distribution
However, larger samples (n = 50 was used here) yield a sampling distribution that is
approximately normal.
Samples from a Uniform Distribution
The uniform distribution is not bell-shaped like the normal, but it is symmetric, and the
convergence of the sampling distribution of x̄ to normality occurs sooner. Here, n = 15.
Samples from a Normal Distribution
If the parent distribution is normal, the sampling distribution of x̄ will also be normal for
any sample size, even if it is small such as n = 5.
The Central Limit Theorem
While a formal proof of the Central Limit Theorem is beyond the scope of this course, we
will state a version of it:
Central Limit Theorem: Suppose we draw random samples of size n from some parent
distribution (that may or may not be normal) that has a mean μ and standard deviation σ. If
n is 'large' enough, then the sampling distribution of x̄ will be approximately normal; that
is:

x̄ ∼̇ N(μ, σ/√n)
The Central Limit Theorem
Typically, n > 30 is considered 'large' enough for the sampling distribution to be close to
normal, even if the parent distribution is heavily skewed.
If the parent distribution is symmetric, a sample of n ≈ 15 can be sufficient, and if the
parent distribution is normal, any sample size n is large enough.
Since we usually don't know the shape of the parent distribution, we will assume the worst
and use n = 30 as our standard.
Implications of the Central Limit Theorem
The Central Limit Theorem tells us three important facts about the sampling distribution of x̄.

1. Shape: The shape is approximately normal.
2. Center: μ_x̄, or the mean of the many sample means, is μ_x̄ = μ. (i.e. the mean of the sampling distribution is equal to the original mean)
3. Spread: σ_x̄, or the standard deviation of the sampling distribution, is σ_x̄ = σ/√n. (i.e. the standard deviation of the sampling distribution becomes smaller as the sample size increases)
The Standard Error of the Mean
Usually σ_x̄ = σ/√n is called the standard error of the mean. For example, if σ = 100 and
n = 64, then

σ_x̄ = 100/√64 = 12.5

The standard error of the mean will come up again and again over the next several weeks.
The Central Limit Theorem in Action
Suppose we are studying X, the length of hospital stay, in days, for patients who have an
appendectomy. The mean stay is short (μ = 2 with σ = 2) but the shape is right-skewed, as
a small number of patients will have complications and longer stays.
Probability for the Mean of a Small Sample
Since the parent distribution is heavily right-skewed, if I asked the question 'What is the
probability that the mean stay for a random sample of 10 patients is more than 2.5 days?'
or:

P(x̄ > 2.5)

we would be unable to compute the probability, since the sample is too small for the
sampling distribution of x̄ to be normal.
Probability for the Mean of a Large Sample
However, if the sample was larger (at least 30), we could turn the problem into a problem
solvable with the normal distribution.
For instance, if n = 64 and I want to know P(x̄ > 2.5), I can use the Central Limit Theorem
to show that

x̄ ∼̇ N(μ_x̄ = 2, σ_x̄ = 2/√64 = 0.25)
Probability for the Mean of a Large Sample
So the solution is

P(x̄ > 2.5) = P(Z > (2.5 − 2)/0.25)
           = P(Z > 2.00)
           = 1 − .9772
           = .0228
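With technology, the standardization can be skipped entirely; in R, using the mean and standard error from above:

pnorm(q=2.5, mean=2, sd=0.25, lower.tail=FALSE)
## [1] 0.02275013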
t, χ², F Distributions
There are several other continuous probability distributions, which are related to the
standard normal and each other, that are important in applied statistics and introduced in
chapters 4 and 6. I will delay using tables and technology to find critical values and
probabilities associated with them until future chapters.

• The t distribution is similar to the standard normal in that it is bell-shaped and can take on all possible values, but has heavier tails than the standard normal. It is heavily used in finding confidence intervals and hypothesis tests involving a mean or a difference of two means (the famous t-test, for instance).
t, χ², F Distributions
• The χ² distribution is a skewed distribution that only takes on positive values. Its main use for us will be in the analysis of categorical data.
• The F distribution is also a skewed distribution that only takes on positive values. It is very useful in linear regression and ANOVA.