An Example of the Central Limit Theorem in Action Using the 1994 Salaries of Major League Baseball Players

The central limit theorem says that for large n, the sampling distribution of the sample mean \(\bar{X}\) is approximately \(N(\mu, \sigma/\sqrt{n})\) for any population with finite standard deviation \(\sigma\). As the sample size increases, the distribution of \(\bar{X}\) becomes closer to a normal distribution, regardless of what the population distribution may be.

To illustrate this, let's look at a real set of data: the 1994 salaries of major league baseball players. A histogram of the salaries is below.

[Figure 4: 1994 Salaries of Major League Baseball Players. Histogram of Number of Players (vertical axis) against Salary (in millions) (horizontal axis). Salaries include signing bonuses. Data source: USA Today, April 5, 1994.]

Notice that this histogram is very strongly skewed to the right. There were a total of 747 players, and more than a third of them had salaries below $200,000 (the first bar on the left side of the histogram).

For this example, we will treat the collection of all 747 players as our population and choose random samples of the players. In order to make this resemble an infinite population, we will allow repetition in our samples; that is, the same player can be chosen more than once in a single sample.

Suppose that we take a random sample of 10 of these 747 baseball players and find the mean salary of these 10 players. I did this and got a mean of $645,750. Then I did it a second time, picking another 10 players at random, and found a mean of $1,197,900. I repeated this procedure 100 times. Each time, 10 players were chosen at random and the mean salary for those 10 was recorded. There was quite a bit of variation in the results (the means ranged from $166,550 to $2,309,333), but each time we are getting an estimate of the mean salary for all 747 players (which was $1,183,714).

I also repeated this procedure with samples of other sizes. I found 100 samples of size 2, 100 samples of size 5, and 100 samples of size 40.
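The sampling procedure just described can be sketched in a few lines of Python. The 1994 salary data are not reproduced in this handout, so the population below is a hypothetical right-skewed stand-in drawn from a lognormal distribution with arbitrarily chosen parameters. The names `population`, `sample_means`, and `means_by_size` are assumptions made for illustration and are not the author's data or method.

```python
import random
import statistics

# Hypothetical stand-in for the 747 salaries: a right-skewed lognormal
# population. These are NOT the real 1994 salary figures.
rng = random.Random(1994)
population = [rng.lognormvariate(13.3, 1.1) for _ in range(747)]


def sample_means(population, n, reps=100, rng=rng):
    """Draw `reps` samples of size n with replacement; return their means."""
    means = []
    for _ in range(reps):
        # rng.choices samples WITH replacement, so the same player can
        # appear more than once in a single sample.
        sample = rng.choices(population, k=n)
        means.append(statistics.fmean(sample))
    return means


# 100 sample means for each of the four sample sizes used in the text.
means_by_size = {n: sample_means(population, n) for n in (2, 5, 10, 40)}

for n, means in means_by_size.items():
    print(f"n={n:2d}  smallest mean={min(means):12.0f}  "
          f"largest mean={max(means):12.0f}")
```

With a skewed population like this one, the range between the smallest and largest sample mean should typically narrow as n grows, although the exact numbers depend on the stand-in population and the random seed.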
For each sample, I recorded the mean salary. On the next page there are four histograms of the mean salaries from these samples, one for each of the sample sizes.

Important features of these histograms:

• As the sample size increases, the variability of the means from the different samples decreases. This reflects the fact that the variance of \(\bar{X}\) is \(\sigma^2/n\), which decreases as n increases. With larger sample sizes, the mean from your sample is more reliably close to the mean of the entire population.

• With a sample size of two (that is, choose two players at random and average their salaries), the distribution of the means is still very strongly skewed to the right. However, as the sample size increases, the histograms become much closer to symmetric, single-peaked, bell-shaped curves, like the normal distribution!

Finally, on the back of the sheet, there are normal probability plots for the means of the samples of different sizes. These graphs plot the data on the x-axis (in this case, the different values of \(\bar{X}\)) against their percentiles on the y-axis. The y-axis is scaled based on the normal distribution, so if the data closely follow a normal distribution, the points fall almost perfectly along a straight line.

The normal probability plots indicate that:

• The means of samples of size 2 very clearly do not follow the normal distribution. (This was also evident from the histogram, since it was strongly skewed to the right.)

• The distribution of the means of samples of size 5 is also not very normal. It is somewhat skewed to the right, though not as much as for samples of size 2, and it has two high outliers.

• The distribution of the means of samples of size 10 is only very slightly skewed to the right and is very close to symmetric. The normal probability plot indicates that it is pretty close to normal.

• By the time we get to samples of size 40, the distribution of the means is almost perfectly symmetric, and the normal probability plot indicates that it is very close to normal.

So this provides an illustration of the fact that as the sample size increases, the distribution of \(\bar{X}\) becomes closer to a normal distribution, regardless of what the population distribution may be. An implication of this is that when n is fairly large and we need to do probability computations for \(\bar{X}\), we can use the normal distribution.

[Figure: Approximate Sampling Distributions for the Average 1994 Salaries of Random Samples of Major League Baseball Players. Four histograms (Samples of Size 2, Samples of Size 5, Samples of Size 10, Samples of Size 40), each showing Percent of Samples against Average Salary in Sample.]

[Figure: Normal Probability Plots for N = 2, N = 5, N = 10, and N = 40, each showing Percent against Data. Each panel is annotated with the mean and standard deviation of the sample means. Annotations that survive as pairs: Mean 1109190, StDev 950180; Mean 1088984, StDev 473962. Remaining annotated values: means 1193613 and 1282994; standard deviations 247334 and 714721.]
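Two of the claims above lend themselves to a quick numerical check: that the standard deviation of \(\bar{X}\) is \(\sigma/\sqrt{n}\), and that for fairly large n a probability about \(\bar{X}\) can be computed from the normal distribution. The following is a minimal sketch under stated assumptions, not a reproduction of the handout's results. The population is again a hypothetical lognormal stand-in (not the real salaries), 5,000 repetitions are used instead of 100 to reduce simulation noise, and the cutoff `threshold` is an arbitrary value chosen only for illustration.

```python
import math
import random
import statistics

# Hypothetical right-skewed population (NOT the real 1994 salary data).
rng = random.Random(0)
population = [rng.lognormvariate(13.3, 1.1) for _ in range(747)]

mu = statistics.fmean(population)       # population mean
sigma = statistics.pstdev(population)   # population standard deviation


def simulate_means(n, reps=5000):
    """Means of `reps` samples of size n drawn with replacement."""
    return [statistics.fmean(rng.choices(population, k=n))
            for _ in range(reps)]


# Compare the observed spread of the sample means with sigma / sqrt(n).
results = {}
for n in (2, 5, 10, 40):
    observed_sd = statistics.stdev(simulate_means(n))
    theoretical_sd = sigma / math.sqrt(n)
    results[n] = (observed_sd, theoretical_sd)
    print(f"n={n:2d}  observed SD={observed_sd:12.0f}  "
          f"sigma/sqrt(n)={theoretical_sd:12.0f}")

# Normal approximation for samples of size 40:
# P(sample mean > threshold), treating X-bar as N(mu, sigma / sqrt(40)).
n = 40
threshold = 1.2 * mu  # arbitrary illustrative cutoff
approx = statistics.NormalDist(mu, sigma / math.sqrt(n))
p_normal = 1 - approx.cdf(threshold)

# The same probability estimated directly by simulation.
sims = simulate_means(n)
p_simulated = sum(m > threshold for m in sims) / len(sims)

print(f"normal approximation: {p_normal:.3f}   simulation: {p_simulated:.3f}")
```

The observed and theoretical standard deviations should agree more and more closely as the number of repetitions grows. Any remaining gap between the two probabilities for n = 40 reflects both simulation noise and the residual skewness of the sampling distribution.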