Handout of Oct. 15

Sampling Distributions for p̂ and ȳ
1  Sample Proportions
According to one company source, the proportion of Milk Chocolate M&M’s that are
blue is 24%. Josh Madison took a large sample of these candies and recorded the following frequency table:
Color     Count   Rel. Freq.
Blue        481     18.36%
Brown       371     14.16%
Green       483     18.44%
Orange      544     20.76%
Red         372     14.20%
Yellow      369     14.08%
Total      2620    100.00%
If we consider this sample to be 2620 Bernoulli
trials with two outcomes, blue and non-blue, Josh Madison’s sample distribution appears at right.
This sample produces a single p̂-value,
p̂ = 0.1836,
and its distribution tells us only what was found in his sample, giving us no feel for how much p̂
can vary when computed from a random sample of 2620 M&M candies.
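The single p̂-value reported above comes straight from the frequency table. As a quick check:

```python
# Computing Josh Madison's sample proportion of blue M&M's
# from the counts in the frequency table above.
blue_count = 481
total_count = 2620

p_hat = blue_count / total_count
print(f"p-hat = {p_hat:.4f}")  # 0.1836
```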
In contrast, suppose many random samples
of size 2620 are drawn, each resulting in a p̂-value. At right is a histogram displaying the
results of a simulation in which 100,000 samples of just this size were taken and used to get
a p̂-value. (We based our simulation on the
company-reported parameter value p = 0.24.)
This is called the sampling distribution or,
more explicitly, the sampling distribution for
p̂. [See the footnote at the bottom of p. 459 in
Intro Stats for an “apology” over the similarity in terms sample distribution and sampling
distribution.] Given what we see here, Josh
Madison’s sample appears to be statistically
significant: his p̂ = 0.1836 lies far below essentially
all of the simulated p̂-values, which cluster around p = 0.24.
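A sketch of the simulation just described, written in Python with NumPy. The parameter values (n = 2620, p = 0.24) are the ones reported in the handout; the number of repetitions is reduced from 100,000 to 10,000 here for speed, and the seed is an arbitrary choice for reproducibility.

```python
import numpy as np

# Many samples of n = 2620 candies, each candy blue with
# probability p = 0.24; a p-hat is computed from each sample.
rng = np.random.default_rng(seed=1)
n, p, reps = 2620, 0.24, 10_000

# Each binomial draw counts the blues in one sample of 2620,
# so dividing by n gives that sample's p-hat.
p_hats = rng.binomial(n, p, size=reps) / n

print("mean of p-hats:", p_hats.mean().round(4))  # close to 0.24
print("SD of p-hats:  ", p_hats.std().round(4))   # close to sqrt(p(1-p)/n), about 0.0083
print("smallest p-hat:", p_hats.min().round(4))   # well above Josh Madison's 0.1836
```

Even the smallest of the 10,000 simulated p̂-values sits far above 0.1836, which is what makes the observed sample look statistically significant.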
2  Sample Means
In 2005 there were 134 men playing on varsity
MIAA basketball squads. The population
distribution for points per game is displayed
at right and is noticeably skewed to the right.
Taking a simple random sample of 4 players
from this group, we might get numbers like
these:
3.7,  5.3,  4.8,  1.1,
from which we could compute a sample mean
ȳ = (1/4)(3.7 + 5.3 + 4.8 + 1.1) = 3.725.
One such sample, with so few values, will
not have a very interesting sample distribution, and cannot illustrate the amount of
variability encountered in ȳ-values from one
random sample of 4 players to the next. But
if we simulate the taking of many random
samples with sample size n = 4, using each
to compute a sample mean ȳ and making
a histogram of them, our simulation gives
us a feel for the sampling distribution of
the sample mean ȳ. (See right.) Two things
of note: the sampling distribution is more
symmetric and less spread out than the
population distribution.
If we simulate the taking of samples of
size n = 16 so as to get a feel for the sampling
distribution of ȳ at that larger sample size,
we get the graph at bottom right. This one is
even more symmetric (even normal-looking,
as the overlaid curve suggests) and even
less spread out.
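A sketch of this sample-mean simulation. The actual MIAA points-per-game data are not reproduced in the handout, so a gamma-distributed population of 134 values stands in as a hypothetical right-skewed population; the seeds and gamma parameters are arbitrary choices, not values from the handout.

```python
import numpy as np

# Hypothetical right-skewed stand-in for the 134 players'
# points-per-game values (the real data are not listed here).
rng = np.random.default_rng(seed=2)
population = rng.gamma(shape=2.0, scale=2.0, size=134)

def simulate_ybars(n, reps=10_000):
    """Draw `reps` simple random samples of size n (without
    replacement) and return the sample mean of each."""
    return np.array([rng.choice(population, size=n, replace=False).mean()
                     for _ in range(reps)])

ybars_4 = simulate_ybars(4)
ybars_16 = simulate_ybars(16)

# Larger samples give a less spread-out sampling distribution,
# while both are centered near the population mean.
print("SD of y-bar, n =  4:", ybars_4.std().round(3))
print("SD of y-bar, n = 16:", ybars_16.std().round(3))
```

Running this shows the two effects the handout points out: the n = 16 sampling distribution is noticeably tighter than the n = 4 one, and both are centered at the population mean.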