Chapter 18

Sampling Distribution Models
Topics
• The sampling distribution of a proportion/mean.
• The mean and standard deviation of a sampling distribution of a proportion/mean.
• Normality revisited
• Necessary conditions
• Standard Error
Presidential Election 1996
• In the 1996 presidential election, Bill Clinton received 49% of the popular vote, Bob Dole received 41% of the popular vote, and Ross Perot received 8% of the popular vote.
• If we were to sample the public regarding how they voted (or were planning to vote), what would we expect to obtain?
How accurate would our sample proportion be?
How likely would we be to get a sample predicting that Bob Dole would win the election?
These are the types of questions we will answer using the sampling distribution of a proportion.
Election example ctd.
• Run a simulation in which we randomly sample 200 voters and ask each of them their presidential preference. We are interested in finding the percentage of the voters who would choose Bob Dole. Perform 1000 runs to simulate choosing 1000 different samples of size 200.
What is the shape of our distribution?
What does each data value (or dot) represent?
What is the mean of the distribution?
What is the standard deviation of the distribution?
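The simulation described above can be sketched in Python. This is a minimal sketch, assuming each sampled voter independently prefers Dole with probability 0.41 (his actual 1996 share); the seed and the helper name `sample_proportion` are arbitrary choices for illustration. Each entry of `p_hats` corresponds to one dot in the simulated distribution.

```python
import random
import statistics

P_DOLE = 0.41        # Dole's share of the popular vote (from the slide)
SAMPLE_SIZE = 200    # voters per sample
RUNS = 1000          # number of simulated samples

def sample_proportion(n: int, p: float, rng: random.Random) -> float:
    """Poll n voters, each of whom independently prefers Dole with probability p."""
    votes_for_dole = sum(rng.random() < p for _ in range(n))
    return votes_for_dole / n

rng = random.Random(1996)  # fixed seed so the run is reproducible
p_hats = [sample_proportion(SAMPLE_SIZE, P_DOLE, rng) for _ in range(RUNS)]

print(f"mean of the sample proportions: {statistics.mean(p_hats):.4f}")
print(f"SD of the sample proportions:   {statistics.stdev(p_hats):.4f}")
```

A different seed changes the printed values slightly, but the mean should stay close to 0.41 and the standard deviation close to 0.035.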
Election example ctd.
• What is the probability of choosing a sample predicting that Bob Dole would get more than 50% of the vote?
• What is the probability of choosing a sample predicting that Dole would get between 40% and 42% of the vote?
• What would we expect to happen if we increased our sample size to 500? To 10,000?
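These three questions can also be explored by simulation. A sketch, assuming independent voters who each prefer Dole with probability 0.41; the helper `simulate_p_hats`, the seed, and the choice of 500 runs per sample size (fewer than 1000, only so the n = 10,000 case finishes quickly) are illustrative. It estimates both probabilities for each sample size and reports the spread of the simulated proportions.

```python
import random
import statistics

P_DOLE = 0.41
RUNS = 500  # fewer than 1000 runs, only to keep the n = 10,000 case fast

def simulate_p_hats(n: int, p: float, runs: int, rng: random.Random) -> list[float]:
    """Return `runs` simulated sample proportions, each from a poll of n voters."""
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(runs)]

rng = random.Random(18)
over_half, near_41, spread = {}, {}, {}
for n in (200, 500, 10_000):
    p_hats = simulate_p_hats(n, P_DOLE, RUNS, rng)
    over_half[n] = sum(ph > 0.50 for ph in p_hats) / RUNS          # Dole "wins"
    near_41[n] = sum(0.40 <= ph <= 0.42 for ph in p_hats) / RUNS   # 40% to 42%, inclusive
    spread[n] = statistics.stdev(p_hats)
    print(f"n={n:>6}: P(>50%) ~ {over_half[n]:.3f}, "
          f"P(40% to 42%) ~ {near_41[n]:.3f}, SD ~ {spread[n]:.4f}")
```

As n grows, the simulated proportions cluster more tightly around 0.41, so a sample showing Dole above 50% becomes rarer and one between 40% and 42% becomes more likely. The endpoints are counted as included here; because a sample proportion can only take values k/n, that choice noticeably affects the n = 200 estimate.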
The sampling distribution of a proportion
• The sampling distribution of a proportion is the distribution of the sample proportions from all possible samples of size n.
• Notation: p̂ is the notation for a sample proportion. It is also used to represent the sampling distribution of a proportion; the context makes clear which is meant.
• The mean of the sampling distribution: $\mu(\hat{p}) = \mu_{\hat{p}} = p$
• The standard deviation of the sampling distribution: $SD(\hat{p}) = \sigma(\hat{p}) = \sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}}$
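Plugging numbers into these two formulas is simple arithmetic. A minimal sketch, using Dole's share p = 0.41 and the sample sizes from the election example; `p_hat_mean_sd` is a hypothetical helper name.

```python
import math

def p_hat_mean_sd(p: float, n: int) -> tuple[float, float]:
    """Mean and standard deviation of the sampling distribution of p-hat."""
    return p, math.sqrt(p * (1 - p) / n)

for n in (200, 500, 10_000):
    mu, sd = p_hat_mean_sd(0.41, n)
    print(f"n={n:>6}: mean = {mu:.2f}, SD = {sd:.4f}")
```

This gives standard deviations of about 0.0348, 0.0220 and 0.0049: the mean never changes, but the spread shrinks in proportion to 1/√n.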
Presidential Example ctd.
• Returning to the Dole example:
What is the true mean of the sampling distribution of the proportion for samples of size 200?
What is the true standard deviation of the sampling distribution of the proportion for samples of size 200?
What is the actual probability of obtaining a random sample of size 200 in which Dole obtains more than 50% of the vote?
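Assuming a Normal model for p̂ is appropriate (with n = 200 and p = 0.41, np = 82 and n(1-p) = 118 are both well above 10), the probabilities can be read off with `statistics.NormalDist` from Python's standard library. This is a sketch of the normal approximation only; it applies no continuity correction, so it can differ from an exact binomial count or a simulation, especially for the narrow 40% to 42% interval.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.41, 200
p_hat_model = NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))  # Normal model for p-hat

prob_over_half = 1 - p_hat_model.cdf(0.50)                     # Dole above 50%
prob_40_to_42 = p_hat_model.cdf(0.42) - p_hat_model.cdf(0.40)  # between 40% and 42%
print(f"mean = {p_hat_model.mean}, SD = {p_hat_model.stdev:.4f}")
print(f"P(p-hat > 0.50) ~ {prob_over_half:.4f}")
print(f"P(0.40 < p-hat < 0.42) ~ {prob_40_to_42:.4f}")
```

Under this model, the chance that a sample of 200 shows Dole above 50% is roughly 0.005, about 1 in 200.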
Assumptions and Conditions for Normality (The Central Limit Theorem)
• What does it take for the sampling distribution to be approximately normal?
1) The sampled values must be chosen independently of each other (for example, by sampling with replacement).
2) The sample size must be large enough.
10% condition: If the samples are not obtained with replacement, the sample size must not exceed 10% of the population size.
Success/Failure condition: The sample size has to be large enough that both np and nq = n(1-p) are at least 10.
Example: How large does the sample size have to be in the presidential election when analyzing Bob Dole’s proportion of the vote?
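The success/failure condition can be solved for n directly: both np ≥ 10 and n(1-p) ≥ 10 hold once n ≥ 10 / min(p, 1-p). A minimal sketch with p = 0.41 for Dole; `min_sample_size` is a hypothetical helper, and the threshold of 10 is the one stated above.

```python
import math

def min_sample_size(p: float, threshold: int = 10) -> int:
    """Smallest n with n*p >= threshold and n*(1 - p) >= threshold."""
    return math.ceil(threshold / min(p, 1 - p))

n_min = min_sample_size(0.41)
print(n_min)  # 25: 25 * 0.41 = 10.25 and 25 * 0.59 = 14.75, while 24 * 0.41 = 9.84
```

Only the smaller of p and 1-p matters, which is why Dole's 41%, rather than the 59% who did not vote for him, sets the requirement.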
Quantitative Data and the Sampling Distribution of the Mean
• If we are analyzing quantitative data, and are not interested in the proportion of individuals having a specified attribute, we may instead be interested in the average value we would expect to obtain.
• The sampling distribution of the mean is the distribution of the sample means from all possible samples of size n from a population.
Sampling Distribution of the Mean Example
• Suppose a kennel has 5 dogs with the following weights (in pounds): 30, 36, 48, 60, 72.
Construct the sampling distribution of the mean for samples of size 2.
Construct the sampling distribution of the mean for samples of size 3.
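Because the population has only five dogs, every possible sample can be listed. A minimal sketch, assuming the samples are drawn without replacement so that each of the C(5, 2) = 10 and C(5, 3) = 10 samples is equally likely; `sampling_distribution_of_mean` is a hypothetical helper name.

```python
from itertools import combinations

weights = [30, 36, 48, 60, 72]  # pounds

def sampling_distribution_of_mean(population: list[int], k: int) -> list[float]:
    """Sorted means of every possible sample of size k drawn without replacement."""
    return sorted(sum(sample) / k for sample in combinations(population, k))

means_2 = sampling_distribution_of_mean(weights, 2)
means_3 = sampling_distribution_of_mean(weights, 3)
print(means_2)  # [33.0, 39.0, 42.0, 45.0, 48.0, 51.0, 54.0, 54.0, 60.0, 66.0]
print(means_3)  # [38.0, 42.0, 46.0, 46.0, 48.0, 50.0, 52.0, 54.0, 56.0, 60.0]
```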
Example ctd.
• What is the mean of the sampling distribution of the mean?
• What happens to the spread of the sampling distribution of the mean as the sample size grows larger?
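Enumerating every sample size from 1 to 4 answers both questions numerically. A sketch, assuming samples are drawn without replacement and all are equally likely; `statistics.pstdev` is used because each list holds the entire sampling distribution, not a sample from it.

```python
from itertools import combinations
from statistics import mean, pstdev

weights = [30, 36, 48, 60, 72]  # pounds; population mean 49.2

summary = {}
for k in (1, 2, 3, 4):
    means = [sum(sample) / k for sample in combinations(weights, k)]
    summary[k] = (mean(means), pstdev(means))
    print(f"k={k}: mean of means = {summary[k][0]:.1f}, SD of means = {summary[k][1]:.2f}")
```

The mean of the sample means stays at 49.2 pounds, the population mean, for every k, while the standard deviation falls from about 15.37 (k = 1, the population itself) to 9.41, 6.27 and 3.84.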
Mean and Standard Deviation of the Sampling Distribution
• For a sampling distribution of the mean, $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
• What happens to the standard deviation of the sampling distribution as the sample size increases?
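Since n appears under a square root, the spread shrinks more slowly than the sample size grows. A one-line worked consequence, writing $\sigma_{\bar{x}}(n)$ (notation introduced here only for illustration) for the standard deviation of $\bar{x}$ at sample size n: quadrupling the sample size only halves the standard deviation.

```latex
\sigma_{\bar{x}}(4n) = \frac{\sigma}{\sqrt{4n}} = \frac{1}{2}\cdot\frac{\sigma}{\sqrt{n}} = \frac{1}{2}\,\sigma_{\bar{x}}(n)
```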
Example
• According to a recent study, the average salary of advanced-degree (post-Master’s) holders is $42,000 with a standard deviation of $6,000.
• For samples of size 36, determine the mean and standard deviation of the sampling distribution of the mean.
• Interpret this result.
• What about samples of size 100?
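The arithmetic for both sample sizes, as a minimal sketch using $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ with the study's figures; `sd_xbar` is an illustrative name.

```python
import math

MU, SIGMA = 42_000, 6_000  # salary mean and SD in dollars (from the study above)

sd_xbar = {}
for n in (36, 100):
    sd_xbar[n] = SIGMA / math.sqrt(n)  # sigma / sqrt(n)
    print(f"n={n:>3}: mean of x-bar = ${MU:,}, SD of x-bar = ${sd_xbar[n]:,.0f}")
```

Means of samples of 36 are centered at $42,000 with a standard deviation of $1,000; with samples of 100 the standard deviation drops to $600.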
Notation note
• The variable x̄ is frequently used to denote the sampling distribution of the mean.
• This is the same notation as for individual sample means, so you determine which is being talked about from the context.
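When can we assume that x̄ is normal? (Central Limit Theorem)
• 1) When the original variable under consideration is normal (x̄ is then exactly normal for any sample size).
• 2) When the sample size is 30 or greater (x̄ is then approximately normal, as a rule of thumb, whatever the shape of the original variable).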
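Rule 2 can be checked by simulation. A sketch, assuming for illustration a strongly right-skewed exponential population with mean 10 (its standard deviation is also 10), so the original variable is clearly not normal. Means of samples of size 30 should still center on 10, have a spread close to σ/√n, and put close to the normal-model shares of about 68% and 95% within 1 and 2 standard deviations of the population mean. The seed and run count are arbitrary.

```python
import random
import statistics

POP_MEAN = 10.0  # exponential population: mean = SD = 10 (illustrative assumption)
N = 30           # sample size
RUNS = 2000      # number of simulated samples

rng = random.Random(30)
x_bars = [statistics.fmean(rng.expovariate(1 / POP_MEAN) for _ in range(N))
          for _ in range(RUNS)]

se = POP_MEAN / N ** 0.5  # sigma / sqrt(n)
share_1 = sum(abs(xb - POP_MEAN) <= se for xb in x_bars) / RUNS
share_2 = sum(abs(xb - POP_MEAN) <= 2 * se for xb in x_bars) / RUNS

print(f"mean of x-bars = {statistics.fmean(x_bars):.2f}, "
      f"SD = {statistics.stdev(x_bars):.2f}, sigma/sqrt(n) = {se:.2f}")
print(f"within 1 SD: {share_1:.3f}, within 2 SD: {share_2:.3f}")
```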
Conditions
• Random sampling condition: The data values must be randomly sampled.
• Independence assumption: The sampled values should be mutually independent. If the sampling is done without replacement (this is usually the case), then the sample size should not exceed 10% of the population size (usually not a problem).
Example
• A variable under consideration has a mean of 50 and a standard deviation of 20.
• A) Identify the sampling distribution of the mean for samples of size 36.
• B) What can you say about the sampling distribution of the mean for samples of size 25?
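The numbers for both parts, as a minimal sketch using σ/√n; whether a normal shape can be claimed is a separate question from the mean and standard deviation.

```python
import math

MU, SIGMA = 50, 20  # mean and SD of the variable (from the example)

sd_xbar = {n: SIGMA / math.sqrt(n) for n in (36, 25)}
for n, sd in sd_xbar.items():
    print(f"n={n}: mean of x-bar = {MU}, SD of x-bar = {sd:.3f}")
```

For n = 36 the n ≥ 30 rule lets us call x̄ approximately normal with mean 50 and standard deviation about 3.333. For n = 25 the mean (50) and standard deviation (4) still hold, but normality can be assumed only if the variable itself is normal.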
Empirical Rule
• Recall that 68.26% of all data values in a normal distribution lie within 1 standard deviation of the mean.
• 95.44% of all data values in a normal distribution lie within 2 standard deviations of the mean.
• 99.74% of all data values in a normal distribution lie within 3 standard deviations of the mean.
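The three percentages can be recomputed from the standard normal CDF with `statistics.NormalDist` from Python's standard library. Computed this way they come out as 68.27%, 95.45% and 99.73%; the slightly different figures above most likely come from rounded four-decimal normal tables, and either set serves for the approximations in this chapter.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1
coverage = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}
for k, share in coverage.items():
    print(f"within {k} SD: {share:.2%}")
```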
What are data values in the sampling distribution of the mean?
• Data values are the sample means of all the possible samples of size n.
• 68.26% of all possible samples of size n have means that lie within $\sigma/\sqrt{n}$ of the actual population mean.
• 95.44% of all possible samples of size n have means that lie within $2\sigma/\sqrt{n}$ of the actual population mean.
• 99.74% of all possible samples of size n have means that lie within $3\sigma/\sqrt{n}$ of the actual population mean.
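A concrete instance, using the salary figures (μ = $42,000, σ = $6,000, n = 36, so σ/√n = $1,000) and assuming that, with n ≥ 30, x̄ is approximately normal. The three statements translate into the following ranges of sample means; `intervals` and `shares` are illustrative names.

```python
import math

MU, SIGMA, N = 42_000, 6_000, 36   # salary example, dollars
se = SIGMA / math.sqrt(N)          # $1,000

shares = {1: "68.26", 2: "95.44", 3: "99.74"}  # percentages from the slide
intervals = {k: (MU - k * se, MU + k * se) for k in (1, 2, 3)}
for k, (lo, hi) in intervals.items():
    print(f"about {shares[k]}% of sample means: ${lo:,.0f} to ${hi:,.0f}")
```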
Equivalent Interpretations
• 68.26% of all samples of size n have the property that the population mean is contained in the interval from $\bar{x} - \sigma/\sqrt{n}$ to $\bar{x} + \sigma/\sqrt{n}$.
• 95.44% of all samples of size n have the property that the population mean is contained in the interval from $\bar{x} - 2\sigma/\sqrt{n}$ to $\bar{x} + 2\sigma/\sqrt{n}$.
• 99.74% of all samples of size n have the property that the population mean is contained in the interval from $\bar{x} - 3\sigma/\sqrt{n}$ to $\bar{x} + 3\sigma/\sqrt{n}$.
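The two phrasings describe the same event: $\bar{x}$ lies within $2\sigma/\sqrt{n}$ of μ exactly when the interval $\bar{x} \pm 2\sigma/\sqrt{n}$ contains μ. A simulation sketch, assuming for illustration a normal population with μ = 50 and σ = 20 and samples of size 25, counts both events; the counts should be identical, each near 95% of the runs.

```python
import random

# Illustrative assumption: normal population, mu = 50, sigma = 20, n = 25.
MU, SIGMA, N, RUNS = 50.0, 20.0, 25, 4000
se = SIGMA / N ** 0.5  # sigma / sqrt(n) = 4

rng = random.Random(2)
x_bars = [sum(rng.gauss(MU, SIGMA) for _ in range(N)) / N for _ in range(RUNS)]

# Phrasing 1: x-bar lies within 2 SE of mu.
near_mu = sum(abs(xb - MU) <= 2 * se for xb in x_bars)
# Phrasing 2: the interval from x-bar - 2 SE to x-bar + 2 SE contains mu.
covers_mu = sum(xb - 2 * se <= MU <= xb + 2 * se for xb in x_bars)

print(near_mu == covers_mu, covers_mu / RUNS)
```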
Example
• The average commuting time in a city is 40 minutes with a standard deviation of 10 minutes. Commuting time may or may not be a normal variable. Suppose a sample of 100 commuters is studied. Determine whether the following statements are true or false.
A) There is a roughly 68.26% chance that the mean of the sample will be between 30 and 50.
B) 68.26% of all possible observations of x lie between 30 and 50.
C) There is roughly a 68.26% chance that the mean of the sample will be between 39 and 41.
What if commuting time is a normal variable?
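A sketch for checking the three statements numerically with `statistics.NormalDist`. It assumes, per the n ≥ 30 rule, that with n = 100 the sample mean is approximately normal with mean 40 and standard deviation 10/√100 = 1 minute. Statement B concerns individual commute times, so its model is valid only if commuting time itself is normal.

```python
import math
from statistics import NormalDist

MU, SIGMA, N = 40, 10, 100   # commuting time in minutes; sample size
se = SIGMA / math.sqrt(N)    # SD of x-bar: 1 minute

x_bar_model = NormalDist(MU, se)                 # approximate model for x-bar
p_a = x_bar_model.cdf(50) - x_bar_model.cdf(30)  # statement A: x-bar between 30 and 50
p_c = x_bar_model.cdf(41) - x_bar_model.cdf(39)  # statement C: x-bar between 39 and 41

# Statement B is about individual commute times, so it uses the model for x itself,
# which is valid only if commuting time is a normal variable.
x_model_if_normal = NormalDist(MU, SIGMA)
p_b_if_normal = x_model_if_normal.cdf(50) - x_model_if_normal.cdf(30)

print(f"P(A) ~ {p_a:.4f}, P(C) ~ {p_c:.4f}, P(B if commuting time is normal) ~ {p_b_if_normal:.4f}")
```

Under these assumptions, the roughly 68% figure belongs to the interval 39 to 41 for the sample mean, and to 30 to 50 only for individual commute times, and then only if commuting time is normal.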
Standard Error
• What if we don’t know p or σ?
• We would expect to use the sample values instead.
• The standard error for the sampling distribution of the proportion: $SE(\hat{p}) = \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
• The standard error for the sampling distribution of the mean: $SE(\bar{x}) = \dfrac{s}{\sqrt{n}}$
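Both standard errors are one-line computations. A minimal sketch; the poll result (84 of 200 for Dole) and the eight commute times are hypothetical data made up for illustration, and `se_proportion` and `se_mean` are illustrative helper names. `statistics.stdev` uses the n - 1 divisor, matching the usual sample standard deviation s.

```python
import math
import statistics

def se_proportion(p_hat: float, n: int) -> float:
    """Standard error of a sample proportion: sqrt(p_hat * (1 - p_hat) / n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

def se_mean(sample: list[float]) -> float:
    """Standard error of a sample mean: s / sqrt(n), with s the sample SD."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Hypothetical data, for illustration only.
print(f"{se_proportion(84 / 200, 200):.4f}")       # 0.0349 (poll: 84 of 200 prefer Dole)
commute_sample = [35, 42, 38, 51, 29, 44, 40, 47]  # minutes
print(f"{se_mean(commute_sample):.3f}")            # 2.448
```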