R Lab #5: Central Limit Theorem and Confidence Intervals
Central Limit Theorem
Let X be a random variable with mean µ and standard deviation σ. For a large
sample size n, the distribution of the sample mean is approximately
$$\bar{x} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$$
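The theorem can be checked by simulation. The sketch below is an illustration rather than part of the lab: it assumes an exponential population with mean 2 (so σ is also 2), and the names `n`, `reps`, and `xbars` are arbitrary choices. It draws many samples, keeps each sample mean, and compares the spread of those means with σ/√n.

```r
set.seed(1)                      # arbitrary seed so the run is reproducible

mu    <- 2                       # mean of an exponential population with rate 1/2
sigma <- 2                       # for an exponential, the sd equals the mean
n     <- 50                      # assumed sample size
reps  <- 10000                   # number of simulated samples

# Each replicate draws n values and keeps only the sample mean
xbars <- replicate(reps, mean(rexp(n, rate = 1 / mu)))

mean(xbars)                      # should be close to mu
sd(xbars)                        # should be close to sigma / sqrt(n)
sigma / sqrt(n)                  # theoretical standard error
```

Calling `hist(xbars)` afterwards should show a roughly bell-shaped histogram even though the population itself is strongly skewed.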
Example
The weights of apples collected from a farm are normally distributed with a
mean of 5.2 oz and a standard deviation of 1.1 oz, X ∼ N (5.2, 1.1).
Find the probability that a randomly selected apple weighs less than 5 oz. Note
that this is a single apple.
pnorm(5, mean = 5.2, sd = 1.1)
## [1] 0.4279
Find the probability that a simple random sample of 10 apples has a sample
mean less than 5 oz. Note that this is a mean with a sample size of 10.
pnorm(5, mean = 5.2, sd = 1.1/sqrt(10))
## [1] 0.2827
Find the probability that a simple random sample of 30 apples has a sample
mean less than 5 oz.
pnorm(5, mean = 5.2, sd = 1.1/sqrt(30))
## [1] 0.1597
Find the probability that a sample of 100 apples will have a sample mean greater
than 5.3 oz.
1 - pnorm(5.3, mean = 5.2, sd = 1.1/sqrt(100))
## [1] 0.1817
What is the probability that a sample of 100 apples will have a sample mean
between 5.1 oz and 5.3 oz?
pnorm(5.3, mean = 5.2, sd = 1.1/sqrt(100)) - pnorm(5.1, mean = 5.2, sd = 1.1/sqrt(100))
## [1] 0.6367
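All of the calculations above follow one pattern: the population standard deviation is replaced by the standard error σ/√n. As a sketch, that pattern can be wrapped in a small helper. `prob_mean_between` is a hypothetical name introduced here for illustration, not a built-in R function.

```r
# Hypothetical helper: P(lower < sample mean < upper) for a sample of size n,
# assuming the sample mean is (approximately) normal with sd = sigma / sqrt(n).
prob_mean_between <- function(lower, upper, mu, sigma, n) {
  se <- sigma / sqrt(n)                      # standard error of the mean
  pnorm(upper, mean = mu, sd = se) - pnorm(lower, mean = mu, sd = se)
}

# Same apple questions as above; -Inf and Inf give one-sided probabilities
prob_mean_between(-Inf, 5,   mu = 5.2, sigma = 1.1, n = 10)
prob_mean_between(5.3,  Inf, mu = 5.2, sigma = 1.1, n = 100)
prob_mean_between(5.1,  5.3, mu = 5.2, sigma = 1.1, n = 100)
```

Setting `n = 1` recovers the single-apple case, since the standard error is then just σ.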
Confidence Intervals
How can the central limit theorem help us construct a confidence interval?
First we need to define confidence. Three levels of confidence are commonly
used in statistics: 95% (the most common), 90%, and 99%. A 95% confidence
interval always refers to the middle 95% of the distribution.
For example, take a simple random sample of size 30 from our apple population.
Figure 1: Sampling distribution of the sample mean for n = 30, with the middle 95% shaded.
If the shaded area represents the middle 95%, how much is on either side? The
remaining 5% is split evenly, so 2.5% lies in each tail. These values are
important, as they help us define the 95% confidence interval. So if the mean
is 5.2, the standard deviation is 1.1, and the sample size is 30, I am 95%
confident that a sample mean will be between...
round(qnorm(0.025, 5.2, 1.1/sqrt(30)), 2)
## [1] 4.81
round(qnorm(1 - 0.025, 5.2, 1.1/sqrt(30)), 2)  # or .975
## [1] 5.59
I am 90% confident that the sample mean will be between...
round(qnorm(0.05, 5.2, 1.1/sqrt(30)), 2)
## [1] 4.87
round(qnorm(1 - 0.05, 5.2, 1.1/sqrt(30)), 2)  # or .95
## [1] 5.53
I am 99% confident that the sample mean will be between...
round(qnorm(0.005, 5.2, 1.1/sqrt(30)), 2)
## [1] 4.68
round(qnorm(1 - 0.005, 5.2, 1.1/sqrt(30)), 2)  # or .995
## [1] 5.72
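The three pairs of calls above differ only in the tail probability. Because `qnorm` accepts a vector of probabilities, both endpoints can be computed in one call. The following is a minimal sketch; `middle_interval` is a hypothetical helper name, not something defined in the lab.

```r
# Hypothetical helper: endpoints of the middle `level` of the sampling
# distribution of the mean, assuming it is normal with sd = sigma / sqrt(n).
middle_interval <- function(level, mu, sigma, n) {
  alpha <- 1 - level                         # total area left in the two tails
  qnorm(c(alpha / 2, 1 - alpha / 2), mean = mu, sd = sigma / sqrt(n))
}

round(middle_interval(0.95, mu = 5.2, sigma = 1.1, n = 30), 2)
round(middle_interval(0.90, mu = 5.2, sigma = 1.1, n = 30), 2)
round(middle_interval(0.99, mu = 5.2, sigma = 1.1, n = 30), 2)
```

Each call returns the lower and upper endpoint as a length-two vector, which should match the values found one at a time above.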
This is great in theory, as it tells you the possible values of x̄ when µ is known.
In practice, all you have is x̄, which is an estimate of µ. We can turn this
around. To start, recall that approximately 95% of the data is within 2
standard deviations of the mean, or in this case 2 standard errors.
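$$
\begin{aligned}
P\left(\mu - 2\frac{\sigma}{\sqrt{n}} \le \bar{x} \le \mu + 2\frac{\sigma}{\sqrt{n}}\right) &\approx 0.95 \\
P\left(-2\frac{\sigma}{\sqrt{n}} \le \bar{x} - \mu \le 2\frac{\sigma}{\sqrt{n}}\right) &\approx 0.95 \\
P\left(\bar{x} - 2\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + 2\frac{\sigma}{\sqrt{n}}\right) &\approx 0.95
\end{aligned}
$$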
We can rewrite this in a nicer form, and say we are approximately 95% confident
that the population mean is between
$$\bar{x} \pm 2\frac{\sigma}{\sqrt{n}}$$
If we want to be exact, we can use R to calculate the exact multiplier
qnorm(0.025, 0, 1)
## [1] -1.96
qnorm(0.975, 0, 1)
## [1] 1.96
$$\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$$
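One way to see what "95% confident" means is to simulate it. The sketch below is an illustration under assumed values: it reuses the apple population (µ = 5.2, σ = 1.1), picks n = 30 and 10,000 repetitions arbitrarily, and counts how often the interval x̄ ± 1.96 σ/√n contains µ.

```r
set.seed(42)                                 # arbitrary seed for reproducibility

mu <- 5.2; sigma <- 1.1                      # apple population from the example
n <- 30; reps <- 10000                       # assumed sample size and repetitions
me <- qnorm(0.975) * sigma / sqrt(n)         # margin of error with known sigma

# For each simulated sample, record whether the interval captures mu
covered <- replicate(reps, {
  xbar <- mean(rnorm(n, mean = mu, sd = sigma))
  (xbar - me <= mu) && (mu <= xbar + me)
})

mean(covered)                                # proportion of intervals containing mu
```

The proportion should land near 0.95: the interval moves from sample to sample, and about 95% of those intervals capture the fixed value µ.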
We also don't know the value of σ, so we will replace it with the sample
standard deviation s. Therefore, the formula for a 95% confidence interval for
the mean, assuming n is large, is
$$\bar{x} \pm 1.96\frac{s}{\sqrt{n}}$$
The $1.96\, s/\sqrt{n}$ part of the equation is called the margin of error, and
it is broken up into two parts: a multiplier (1.96 in this case) and the
standard error $s/\sqrt{n}$.
Example 1
For a chemical reaction, 50 repeated trials showed that the mean amount of time
for the reaction to finish was 149.2 seconds with a standard deviation of 44.1
seconds. Construct a 95% confidence interval for the mean reaction time.
1.96 * 44.1/sqrt(50)
## [1] 12.22
149.2 - 1.96 * 44.1/sqrt(50)
## [1] 137
149.2 + 1.96 * 44.1/sqrt(50)
## [1] 161.4
There are multiple ways to report this answer. There is the margin of error
format:
149.2 ± 12.2 seconds
Or you can use the interval format:
(137.0, 161.4) seconds
Note that the units are clearly stated and that the answer uses the same number
of decimal places/significant digits as the reported mean.
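The arithmetic in Example 1 can be collected into one function. This is a minimal sketch: `ci_mean` is a hypothetical name introduced here, and the function assumes n is large enough for the normal multiplier to apply. It uses the unrounded multiplier from `qnorm`, so results may differ from the hand calculation in the later decimal places.

```r
# Hypothetical helper: large-sample confidence interval for a mean
# from summary statistics (sample mean, sample sd, sample size).
ci_mean <- function(xbar, s, n, level = 0.95) {
  z  <- qnorm(1 - (1 - level) / 2)           # multiplier, about 1.96 for 95%
  me <- z * s / sqrt(n)                      # margin of error
  c(lower = xbar - me, upper = xbar + me, margin = me)
}

round(ci_mean(149.2, 44.1, 50), 1)           # Example 1: reaction times
```

Returning the margin alongside the endpoints makes it easy to report the answer in either the margin of error format or the interval format.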
Large Sample Confidence Interval for the Mean
The form for any C% confidence interval for the mean, assuming n is large (30
or more), is
$$\bar{x} \pm Z_c\frac{s}{\sqrt{n}}$$
The value of Zc depends on the level of confidence.
• For C=95%, Zc = 1.96
• For C=90%, Zc = 1.645
• For C=99%, Zc = 2.576
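These multipliers come from the standard normal distribution: for a confidence level C, the multiplier is the quantile that leaves (1 − C)/2 in the upper tail. A short sketch, with illustrative variable names:

```r
level <- c(0.90, 0.95, 0.99)                 # confidence levels C
z_c   <- qnorm(1 - (1 - level) / 2)          # upper-tail quantiles of N(0, 1)
round(z_c, 3)
## [1] 1.645 1.960 2.576
```

The same expression gives the multiplier for any other level, for example `qnorm(1 - (1 - 0.98) / 2)` for 98%.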
Example 2
A drug company claims that the time it takes to relieve a headache after taking
a pill is 10 minutes. The drug is administered to 45 people, and the average
time to relief was 13.4 minutes with a standard deviation of 6.7 minutes. Find
the 99% C.I. for the mean time to relief. Does the claim seem accurate?
13.4 - 2.576 * 6.7/sqrt(45)
## [1] 10.83
13.4 + 2.576 * 6.7/sqrt(45)
## [1] 15.97
The 99% confidence interval is (10.8, 16.0) minutes. Since 10 minutes is not in
the 99% confidence interval, the claim appears to be incorrect.