Sampling Distributions for p̂ and ȳ

1  Sample Proportions

According to one company source, the proportion of Milk Chocolate M&M's that are blue is 24%. Josh Madison took a large sample of these candies and recorded the following frequency table:

    Color        Blue      Brown     Green     Orange    Red       Yellow    Total
    Count        481       371       483       544       372       369       2620
    Rel. Freq.   18.36%    14.16%    18.44%    20.76%    14.20%    14.08%

If we consider this sample to be 2620 Bernoulli trials with two outcomes, blue and non-blue, Josh Madison's sample distribution appears at right. This sample produces a single p̂-value, p̂ = 481/2620 ≈ 0.1836, and its distribution tells us only what was found in his sample, giving us no feel for how much p̂ can vary when computed from a random sample of 2620 M&M candies.

In contrast, suppose many random samples of size 2620 are drawn, each resulting in a p̂-value. At right is a histogram displaying the results of a simulation in which 100,000 samples of just this size were taken and used to get a p̂-value. (We based our simulation on the company-reported parameter value p = 0.24.) This is called the sampling distribution or, more explicitly, the sampling distribution for p̂. [See the footnote at the bottom of p. 459 in Intro Stats for an "apology" over the similarity of the terms sample distribution and sampling distribution.] Given what we see here, Josh Madison's sample appears to be statistically significant.

2  Sample Means

In 2005 there were 134 men playing on varsity MIAA basketball squads. The population distribution for points per game is displayed at right and is noticeably skewed to the right. Taking a simple random sample of 4 players from this group, we might get numbers like these:

    3.7, 5.3, 4.8, 1.1

from which we could compute a sample mean

    ȳ = (3.7 + 5.3 + 4.8 + 1.1)/4 = 3.725.
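As a quick arithmetic check, the sample mean above can be reproduced in a couple of lines (a minimal sketch using plain Python, no libraries required):

```python
# The four sampled points-per-game values from the text
sample = [3.7, 5.3, 4.8, 1.1]

# Sample mean: sum of the observations divided by the sample size n = 4
y_bar = sum(sample) / len(sample)
print(y_bar)  # 3.725 (up to floating-point rounding)
```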
One such sample, with so few values, will not have a very interesting sample distribution, and cannot illustrate the amount of variability encountered in ȳ-values from one random sample of 4 players to the next. But if we simulate the taking of many random samples with sample size n = 4, using each to compute a sample mean ȳ and making a histogram of the results, our simulation gives us a feel for the sampling distribution of the sample mean ȳ. (See right.) Two things are of note: the sampling distribution is more symmetric and less spread out than the population distribution. If we simulate the taking of samples of size n = 16 so as to get a feel for the sampling distribution of ȳ for samples of that size, we get the graph at bottom right. This is even more symmetric (even normal-looking, as the overlaid curve suggests) and even less spread out.
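Both simulations described in this handout can be sketched in code. The sample sizes (2620 for the proportion; 4 and 16 for the means), the proportion p = 0.24, the observed p̂ = 0.1836, and the 100,000 replications all come from the text; however, the real points-per-game data for the 134 MIAA players is not reproduced here, so a right-skewed gamma distribution stands in for that population purely as an illustration. NumPy is assumed to be available.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the sketch is reproducible

# --- Sampling distribution of p-hat (Section 1) ---
# Draw 100,000 samples of n = 2620 Bernoulli trials with p = 0.24
# and record the sample proportion p-hat from each.
n, p, reps = 2620, 0.24, 100_000
p_hats = rng.binomial(n, p, size=reps) / n
print("mean of p-hats:", p_hats.mean())  # close to p = 0.24
print("SD of p-hats:  ", p_hats.std())   # close to sqrt(p(1-p)/n)
# How often does a random sample give a p-hat as small as Josh Madison's?
print("share of p-hats <= 0.1836:", (p_hats <= 0.1836).mean())  # essentially 0

# --- Sampling distribution of y-bar (Section 2) ---
# Hypothetical stand-in for the 134-player points-per-game population:
# a right-skewed gamma distribution (NOT the real MIAA data).
population = rng.gamma(shape=2.0, scale=3.0, size=134)

def simulate_means(sample_size, reps=100_000):
    """Draw `reps` samples (with replacement, for simplicity) and
    return the sample mean y-bar of each."""
    samples = rng.choice(population, size=(reps, sample_size), replace=True)
    return samples.mean(axis=1)

means4 = simulate_means(4)
means16 = simulate_means(16)
print("n = 4:  SD of y-bars =", means4.std())
print("n = 16: SD of y-bars =", means16.std())  # smaller spread than n = 4
```

Consistent with the handout's two observations, the ȳ-values for n = 16 are noticeably less spread out than those for n = 4 (roughly half the standard deviation, since 16 is four times 4), and the p̂ simulation shows that a value as small as 0.1836 essentially never arises when p really is 0.24, which is why Josh Madison's sample looks statistically significant.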