October 8, 2012

RANDOM SAMPLING & SAMPLING DISTRIBUTIONS
Chapter 11
The foundation for inferential statistics

Backtracking
A couple of things we skipped in Chapter 10.

Probability
The likelihood that an event will occur.
Ranges from 0 to 1.0.

Quick check
I roll a die with 6 sides. What is the probability it lands with the “2” face up?
a) Who knows?
b) 1 out of six
c) 2 out of six
d) 2
e) 50-50
(If this is hard, read Chapter 10.)

Key Statistical Concepts…
Population: described by population parameters.
Sample: a subset of the population, described by sample statistics.

Sampling…
Inferential statistics permits us to draw conclusions about a population based on a sample.
Sampling (i.e., selecting a subset of a whole population) is done for reasons of cost (it’s less expensive to sample 1,000 television viewers than 100 million TV viewers) and practicality (e.g., performing a crash test on every automobile produced is impractical).
We want the sample to be a good representation of the population.

The risks and rewards of sampling
Risks
1. The sample might not represent the larger population.
2. We might reach inaccurate conclusions.
Rewards
1. The sample might represent the larger population.
2. We might reach accurate conclusions at a very low cost.
Best approach: realize there is some error in sampling, and estimate this error.

Different types of error in research
Inferential statistics will not address:
Nonsampling error: unrelated to the sampling strategy (e.g., lousy measures, sloppy ratings).
Biased sampling.
Inferential statistics will address:
Sampling error: differences between the sample and the population that exist only because of the observations that happened to be selected for the sample. This error is random; we cannot get rid of it.

Selection Bias or Biased Samples…
…occurs when the sampling plan is such that some members of the target population are less likely to be selected for inclusion in the sample.
e.g., I am interested in how my students are doing, but I sample only those in the front row; I am interested in depression, but I sample only those in the hospital.
We need to avoid this. To do so, we use random sampling strategies.

Random sampling
Every observation in the population has an equal chance of being sampled.
Every sample of a given size (n) has an equal chance of being drawn from the population.
How do we do this?
Ideal: use random numbers from a site like www.random.org to guide selection.

Random Sampling
Sampling without replacement: no element may appear more than once in a sample.
Sampling with replacement: an element may appear more than once in a sample.
We will assume sampling with replacement. Differences tend to be negligible when sample size (n) is small relative to the size of the population (N), which is often quite large.

Simple Random Sampling…
A government income tax auditor must choose a sample of 5 of 11 returns to audit.

Step 1: generate a random number for each person.
Person    Random #
baker     0.87487
george    0.89068
ralph     0.11597
mary      0.58635
sally     0.34346
joe       0.24662
andrea    0.47609
mark      0.08350
greg      0.53542
aaron     0.37239
kim       0.73809

Step 2: sort by random number and take the first 5.
Rank  Person    Sorted Random #
1     mark      0.08350
2     ralph     0.11597
3     joe       0.24662
4     sally     0.34346
5     aaron     0.37239
      andrea    0.47609
      greg      0.53542
      mary      0.58635
      kim       0.73809
      baker     0.87487
      george    0.89068

Let’s watch what happens with random sampling.
Assume I want to do a study to understand the population of people who are in the front row of my statistics class.
What data do we want to gather?
I will gather one sample from the front row using simple random sampling, and then a second sample.

Sample 1
Let’s assign everyone in the front row a random number from the random.org Random Integer Generator.
Here are your random numbers: 32 30 95 4 88 32 13 60 51 91 6 88 75 58
Now let’s draw the two people with the lowest numbers as our sample (4 and 6).
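The "assign a random number, sort, keep the lowest" procedure used for the auditor and for Sample 1 can be sketched in Python. This is only an illustration: the function names `lowest_k` and `simple_random_sample` are made up here, and Python's built-in generator stands in for random.org.

```python
import random

def lowest_k(people, numbers, k):
    """Pair each person with their random number, sort, keep the k lowest."""
    ranked = sorted(zip(numbers, people))
    return [person for _, person in ranked[:k]]

def simple_random_sample(people, k, seed=None):
    """Generate one random number per person, then select the k lowest."""
    rng = random.Random(seed)  # assumption: stands in for random.org
    numbers = [rng.random() for _ in people]
    return lowest_k(people, numbers, k)

returns = ["baker", "george", "ralph", "mary", "sally", "joe",
           "andrea", "mark", "greg", "aaron", "kim"]

# Random numbers copied from the auditor example.
slide_numbers = [0.87487, 0.89068, 0.11597, 0.58635, 0.34346, 0.24662,
                 0.47609, 0.08350, 0.53542, 0.37239, 0.73809]
audit_sample = lowest_k(returns, slide_numbers, k=5)
print(audit_sample)

# A fresh draw gives a different sample of the same size.
new_sample = simple_random_sample(returns, k=5, seed=2012)
print(new_sample)
```

Because every person gets an independent random number, every set of 5 returns is equally likely to end up with the 5 lowest numbers, which is the defining property of a simple random sample.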
Sample 2
Let’s assign everyone in the front row a NEW random number from random.org.
Here are your random numbers: 75 94 71 57 14 26 41 6 75 38 41 88 67 23
Now let’s draw the two people with the lowest numbers as our sample (6 and 14).

Gathering our data
Let’s compute the mean for sample 1 and for sample 2.
What do we expect will happen?

Stratified random sampling
Our sample might be more representative with stratified random sampling.
Define strata within the population (e.g., males, females).
Conduct random sampling from each stratum.

Sampling Error…
Sampling error refers to differences between the sample and the population that exist only because of the observations that happened to be selected for the sample.
Every sample is likely to differ slightly from the population, due to random variation.
Individual differences, timing effects, etc., all can lead to subtle differences across samples.

Sampling error
Increasing the sample size will reduce sampling error. It still exists, though.
We need to estimate the sampling error. This is the goal of inferential statistics.

Give it a shot… explain to your neighbor:
What is sampling error?
Is it influenced by the quality of your measures?

The goal of inferential statistics
Even when we use careful random sampling, careful measurement, etc., we still have sampling error.
How can we estimate the magnitude of sampling error?
That is, how do we know how accurate our sample is as an estimate of the population?

Random Sampling
[Diagram: Population ($\mu$, $\sigma$) → Sample 1 ($\bar{X}$, $s_X$) and Sample 2 ($\bar{X}$, $s_X$).]
The values of these statistics will vary from sample to sample.

Sampling distributions
The distribution of a statistic that we would observe if we took all possible samples of a given size from a population.
We want to estimate the population mean, standard deviation, and other parameters.
There is a sampling distribution for each of these, e.g., the sampling distribution of the mean.

Sampling Distribution of Mean
Proper analysis and interpretation of a sample statistic requires knowledge of its distribution.
[Diagram: Process of Inferential Statistics. “Start here.” Population (parameter $\mu$) → select a random sample → Sample (statistic $\bar{x}$) → use $\bar{x}$ to estimate $\mu$.]

Developing a sampling distribution
The sampling distribution of the mean: the distribution of means if we took all possible samples of a given size from the population and calculated their means.

Sampling distribution of the mean
We use this to estimate:
the level of error we would observe if we used sample means to estimate the population mean;
the probability of observing a sample mean that was highly discrepant from the population mean.

Sampling Distribution of the Mean
Some interesting characteristics, which are captured in the Central Limit Theorem…

Central Limit Theorem:
Given any population with mean ($\mu$) and standard deviation ($\sigma$), the sampling distribution of the mean for sample size n will have:
a mean equal to $\mu$;
a standard deviation (known as the standard error of the mean) equal to $\sigma / \sqrt{n}$;
and, as n → ∞, will approach a normal distribution.

Central Limit Theorem
Let’s break down the implications for understanding the mean, the variability, and the shape of the sampling distribution of the mean.

The Mean of the Sampling Distribution of the Mean
With repeated sampling (actually an infinite number of times):
The mean of the sampling distribution of the mean is the same as the population mean ($\mu$) from which the scores were drawn.
True regardless of n, $\sigma$, and the shape of the population.

Standard deviation of the sampling distribution of the mean
The standard deviation of the distribution of sample means describes the spread of the sample means, that is, how close a sample mean is likely to be to the population mean.
● Very important tool for all of inferential statistics.

Sampling Distribution of the Mean
The standard deviation of the sampling distribution of the mean is the standard error of the mean:
$\sigma_{\bar{X}} = \sigma / \sqrt{n}$

Sampling Distribution of the Mean
$\sigma_{\bar{X}} = \sigma / \sqrt{n}$
The sample means vary less when:
the scores from the population vary less (smaller $\sigma$);
the sample size (n) is greater.
Note that there will be many random sampling distributions of the mean, each based on a different sample size (n).

Shape of the sampling distribution of the mean
Even when the underlying population distribution is NOT normally distributed, if your sample size is large enough, the sampling distribution of the mean will be approximately normal.
This is a huge advantage in conducting inferential statistics.

Central Limit Theorem
Allows one to study populations with differently shaped distributions.
Creates the potential for applying the normal distribution to many problems when sample size is sufficiently large.

A brief detour: the Monte Carlo simulation
We can use a Monte Carlo simulation to demonstrate what would happen if we took many samples from a population with known characteristics.
● Several factors will affect the simulation.
● Let’s look at two of these:
● Distribution of the population
● Sample size

Monte Carlo simulations
Assume a population distribution. Draw many samples (of size n) from this population and plot the samples (or their descriptive statistics) across repeats.
● NOTE: The Monte Carlo simulation (MCS) is for demonstration purposes only. We would never do a Monte Carlo simulation to analyze the results of a real experiment.
● Remember, in a real experiment we usually don't know the population parameters; we only have the sample statistics.

Sampling by Monte Carlo simulation
Consider a roulette wheel with numbers 00, 0, and 1-36.
● Spin the wheel n times, and make a frequency histogram of the results.
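As a rough sketch of the roulette step, the following Python snippet spins a simulated wheel n times and prints a text histogram of the results. It assumes a fair wheel in which each of the 38 pockets is equally likely; the names `POCKETS` and `spin_frequencies` are illustrative only.

```python
import random
from collections import Counter

# The 38 pockets on the wheel: 00, 0, and 1 through 36.
POCKETS = ["00", "0"] + [str(i) for i in range(1, 37)]

def spin_frequencies(n, seed=None):
    """Spin the wheel n times and count how often each pocket comes up."""
    rng = random.Random(seed)
    spins = rng.choices(POCKETS, k=n)  # assumption: all pockets equally likely
    return Counter(spins)

freq = spin_frequencies(n=1000, seed=1)

# Text-mode frequency histogram: one row per pocket, one '#' per spin.
for pocket in POCKETS:
    print(f"{pocket:>2} {'#' * freq[pocket]}")
```

With a modest n the bars are visibly uneven even though the wheel is fair, which is sampling error in action.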
[Figure: frequency histogram of the spin results; x-axis: number on wheel, y-axis: n.]

Sampling by Monte Carlo simulation
Draw one sample of size n and calculate the sample mean.
● Now repeat this procedure 10,000 times.
● What does the distribution of means look like?
[Diagram: Samples → mean (M) of each sample (keep going to 10,000) → Sampling Distribution of the Mean.]

Sampling Distribution of the Mean
To visualize the relationship between the sample mean and the underlying population, we can do the following experiment many times, with different numbers of observations.
[Diagram: Population → sample of size n → take mean (M) → Sampling Distribution of the Mean.]

The Sampling Distribution of the Mean
is distributed approximately normally even if the population and the samples from it are not distributed normally!
[Diagram: Population → sample of size n → take mean (M) → Sampling Distribution of the Mean.]

The Sampling Distribution of the Mean
Consider a population with mean 0 and standard deviation 1.
[Figure: density plot titled “Sample of 100000”; x-axis from -4 to 4, y-axis: Density. M = 0, SD = 1.]

The Sampling Distribution of the Mean for large samples
When the size of the sample is large, the sample means will be fairly close to the population mean.
[Figure: three histograms, each titled “Sample of 100”; x-axis from -4 to 4, y-axis: Frequency. Panel statistics: M = -0.01, SD = 1.1; M = -0.06, SD = 0.94; M = 0.03, SD = 1.]

The Sampling Distribution of the Mean for large samples
When the size of the sample is large:
The means across samples will be pretty similar; they don’t vary much.
The mean of the Sampling Distribution of the Mean = population mean.
[Figure: histogram titled “DOSM for sample size 100” (DOSM = distribution of sample means); x-axis from -4 to 4, y-axis: Frequency. M = -0.00052, SD = 0.0999769.]

The Sampling Distribution of the Mean for large samples
When the sample size is large, the Sampling Distribution of the Mean will have a small SD.
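The experiment described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the code behind the slides' figures: it assumes a normal population with mean 0 and SD 1, and the function name `simulate_sample_means` and the seed are made up for the example.

```python
import random
import statistics

def simulate_sample_means(n, repeats=10_000, seed=0):
    """Draw `repeats` samples of size n from a Normal(0, 1) population
    and return the mean of each sample."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(repeats)
    ]

n = 100
means = simulate_sample_means(n)

mean_of_means = statistics.fmean(means)   # should sit close to mu = 0
sd_of_means = statistics.stdev(means)     # should sit close to sigma / sqrt(n)
theoretical_se = 1.0 / n ** 0.5

print(f"mean of sample means: {mean_of_means:.4f}")
print(f"SD of sample means:   {sd_of_means:.4f} (sigma/sqrt(n) = {theoretical_se:.4f})")
```

Calling `simulate_sample_means` with a smaller n and comparing the resulting SD is a quick way to see how the spread of the sample means depends on sample size.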
[Diagram: Population → sample of size 20 → take mean (M) → Distribution of Sample Means.]

The Sampling Distribution of the Mean for small samples
When the sample size is small:
the sample means vary a lot;
the standard deviation of the sampling distribution is larger.
[Figure: three histograms, each titled “Sample of 8”; x-axis from -4 to 4, y-axis: Frequency. Panel statistics: M = 0.9, SD = 0.41; M = -0.3, SD = 0.62; M = -0.4, SD = 1.3.]

The Sampling Distribution of the Mean for small samples
When the sample size is small, the mean of the Sampling Distribution of the Mean is still equal to the population mean, but the SD of the distribution of sample means (DOSM) is large.
[Figure: histogram titled “DOSM for sample size 8”; x-axis from -4 to 4, y-axis: Frequency. M = -0.00034, SD = 0.353312.]

The Sampling Distribution of the Mean for very small samples
When the sample size is very small, the mean of the distribution of sample means is still equal to the population mean, but the standard deviation of the distribution of sample means is huge.
[Figure: histogram titled “DOSM for sample size 2”; x-axis from -4 to 4, y-axis: Frequency. M = -0.001, SD = 0.705306.]

Sample size affects the SD of the Distribution of Sample Means
The standard deviation of the distribution of sample means falls as $\sigma / \sqrt{n}$.
[Figure: SD of DOSM (y-axis, 0.1 to 0.7) plotted against sample size (x-axis, 0 to 100).]

Optimal sample size
In any single sample, the sample mean is more likely to be close to the population mean if the sample size is larger.
Always use the largest sample size that you can afford!
If you must use a small sample, remember that the sample statistics might not accurately reflect the population parameters.

Sampling distribution of the mean
Assume you have taken all samples of size 25 from a population of 400. The population is skewed, with a mean of 0 and an SD of 1.
What will the shape of the sampling distribution of the means be?
a) Skewed
b) Normal
c) Who knows?

What does all this allow us to do?
If we know the distribution of means for samples drawn from a given population, we can estimate the probability of obtaining a particular sample mean if the sample was drawn from a population with a certain mean and SD.
This is inferential statistics.

Converting Sample Mean to z
$z = \dfrac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$

Example 1
Assume a normally distributed population with $\mu = 70$ and $\sigma = 20$. Your sample size is 25. What is the probability of obtaining a random sample with a mean of 80 or higher?
Convert sample mean to z.
$z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \dfrac{80 - 70}{20 / \sqrt{25}} = \dfrac{10}{4} = 2.5$
Refer to z table.
The probability of obtaining such a mean or larger in random sampling is .0062.
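Example 1 can be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist` in place of a printed z table; the helper name `z_for_sample_mean` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def z_for_sample_mean(sample_mean, mu, sigma, n):
    """Convert a sample mean to z using the standard error sigma / sqrt(n)."""
    standard_error = sigma / sqrt(n)
    return (sample_mean - mu) / standard_error

# Values from Example 1: mu = 70, sigma = 20, n = 25, sample mean = 80.
z = z_for_sample_mean(sample_mean=80, mu=70, sigma=20, n=25)

# Upper-tail area of the standard normal: P(Z >= z).
p_upper = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, P(sample mean >= 80) = {p_upper:.4f}")
# prints: z = 2.50, P(sample mean >= 80) = 0.0062
```

The same two steps (compute the standard error, then look up the tail area) apply to any sample mean, as long as the sampling distribution of the mean can be treated as normal.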