Lyndee Labrum
Skittles Term Project

By completing this project, I was able to apply what I have learned throughout the semester of studying statistics to a real collection and study of data. The study began with counting the number of candies in a 2.17 ounce bag of Original Skittles and recording the number of red, orange, yellow, green, and purple candies. After each student submitted their own counts, the data were compiled for the entire class. Charts were then made of the colors and of the number of candies per bag, followed by confidence interval estimates and hypothesis tests. The goal of the Skittles Project is to understand how to organize and analyze data, draw conclusions from the data using confidence intervals and hypothesis testing, and record that information in a well-organized paper.

Organizing & Displaying Categorical Data: Colors

Results of My Skittles Bag

Color     Number of Candies
Red       13
Orange    15
Yellow    13
Green     6
Purple    10
Total     57

Results from Entire Class Sample (21 Bags)

Color     Number of Candies   Proportion
Red       283                 0.224
Orange    271                 0.214
Yellow    254                 0.201
Green     206                 0.163
Purple    250                 0.198
Total     1,264

After observing the data above, I found that the graphs reflect what I expected to see. Each bag of Skittles has a different number of candies and a different mix of colors, as the graphs show. I have never opened a bag of Skittles and believed that there would be the same number of red candies in it as in the last one. That being said, my own bag does not agree closely with the overall data for the entire class. My bag, with a total of 57 candies, was the second smallest of the 21 bags recorded.
While other students' bags were unevenly filled, with widely varying numbers of each color, my bag was almost equal across the board apart from the small number of green candies. Interestingly, I found that green candies seem to be the least abundant.

Organizing and Displaying Quantitative Data: The Number of Candies per Bag

Mean: 60.2
Standard Deviation: 3.75
Five-Number Summary:
1. Minimum: 52
2. Quartile 1: 59
3. Quartile 2 (median): 60
4. Quartile 3: 62
5. Maximum: 67

The distribution was mostly bell-shaped, meaning the number of bags at each candy count rose to a peak near the middle and then fell away. The graph reflected what I expected to see, and the overall data broadly agree with my own bag, although most bags contained more Skittles than mine. The mean number of Skittles per bag in the 21-bag sample was 60.2 candies, whereas my bag contained only 57 candies.

Reflection:

Categorical data refers to categories, such as the colors of the Skittles, whereas quantitative data refers to numerical measurements, such as the number of Skittles in a bag. For categorical data, we most commonly use pie charts and Pareto charts because they are easy to make sense of visually, especially the pie chart, which is the easiest to read. For quantitative data, we use charts such as the histogram and the box plot. Although both represent the same data, histograms are easier to understand because the tallest bar shows where the data cluster, which for a roughly bell-shaped distribution is near the mean and median, whereas the box plot has to be studied more carefully. Calculations such as the mean, standard deviation, and quartiles do not make sense for categorical data because the values are category labels rather than numbers. They do make sense for quantitative data because it consists of numbers and amounts.
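The reflection above notes that counts and proportions, rather than means or standard deviations, are the natural summaries for categorical data. As a minimal sketch in Python (the language and the function name relative_frequencies are illustrative choices, not part of the original project), the class proportions by color can be recomputed from the class totals reported earlier:

```python
# Class totals by color, copied from the class sample table (21 bags).
class_counts = {
    "red": 283,
    "orange": 271,
    "yellow": 254,
    "green": 206,
    "purple": 250,
}


def relative_frequencies(counts):
    """Return each category's share of the total count (hypothetical helper)."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


proportions = relative_frequencies(class_counts)

# Print a small frequency table: color, count, proportion.
for color, p in proportions.items():
    print(f"{color:<8}{class_counts[color]:>5}{p:>8.3f}")
print(f"{'total':<8}{sum(class_counts.values()):>5}")
```

Rounded to three decimal places, these shares should agree with the proportions listed in the class table.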
Confidence Interval Estimates

The purpose of finding a confidence interval is to give a range of values for an estimated population parameter instead of just one value. If the experiment were repeated many times, about the stated percentage (the confidence level) of the intervals constructed would contain the true population parameter. In order to construct a confidence interval, certain conditions need to be met: the sample is a simple random sample; for a proportion, the conditions for a binomial distribution are satisfied; and for a mean, the sample size is greater than 30 or the population is normally distributed. The intervals for the proportion and the mean are built as the sample statistic plus or minus a margin of error E, which is the maximum likely difference between the sample statistic and the population parameter at the chosen confidence level.

A. Construct a 95% confidence interval estimate for the true proportion of purple candies.

n = 1,264
x = 250
p̂ = 0.19778
zα/2 = 1.96
E = 1.96 × √[0.19778(1 - 0.19778)/1264] = 0.021959
p̂ ± E = 0.198 ± 0.022
0.176 < p < 0.220

B. Construct a 99% confidence interval estimate for the true mean number of candies per bag.

n = 21
s = 3.75
x̄ = 60.2
zα/2 = 2.57
E = 2.57 × (3.75/√21) = 2.103074917
x̄ ± E = 60.2 ± 2.103074917
58 < μ < 62

C. Construct a 98% confidence interval estimate for the standard deviation of the number of candies per bag.

n = 21
df = 20
s = 3.75
s² = 14.0625
χ²R = 37.556
χ²L = 8.260
(20)(14.0625)/37.556 < σ² < (20)(14.0625)/8.260
2.737 < σ < 5.835

In calculating the true proportion of purple candies, I used the total number of candies and the total number of purple candies. The confidence interval is interpreted as, "We are 95% confident that the population proportion of purple candies in bags of Skittles falls between 0.176 and 0.220." For the true mean number of candies per bag, the confidence interval contains only three whole-number values strictly between its endpoints: 59, 60, and 61. This shows that the mean number of candies per bag is fairly consistent in the sample data.
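The three interval calculations above can be reproduced as a check. The sketch below is only illustrative: the function names are hypothetical, and the critical values 1.96, 2.57, 37.556, and 8.260 are typed in from the worked examples rather than computed. One extra line, which is not part of the original calculation, uses the t-table value 2.845 (99% confidence, 20 degrees of freedom), since a t critical value is the usual choice when the sample is small and σ is unknown; it gives a slightly wider interval for the mean.

```python
import math


def proportion_interval(x, n, z):
    """Interval for a proportion: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def mean_interval(x_bar, s, n, critical):
    """Interval for a mean: x_bar +/- critical * s / sqrt(n)."""
    margin = critical * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


def std_dev_interval(s, n, chi2_right, chi2_left):
    """Interval for sigma from the chi-square bounds on sigma squared."""
    df = n - 1
    lower = math.sqrt(df * s**2 / chi2_right)
    upper = math.sqrt(df * s**2 / chi2_left)
    return lower, upper


# A. 95% interval for the proportion of purple candies (values from the text).
purple_ci = proportion_interval(x=250, n=1264, z=1.96)

# B. 99% interval for the mean, using 2.57 exactly as in the hand calculation.
mean_ci_z = mean_interval(x_bar=60.2, s=3.75, n=21, critical=2.57)

# Assumption, not in the original: t-table value for 20 degrees of freedom.
mean_ci_t = mean_interval(x_bar=60.2, s=3.75, n=21, critical=2.845)

# C. 98% interval for the standard deviation (chi-square values from the text).
sd_ci = std_dev_interval(s=3.75, n=21, chi2_right=37.556, chi2_left=8.260)

print("purple proportion:", purple_ci)
print("mean (z = 2.57):  ", mean_ci_z)
print("mean (t = 2.845): ", mean_ci_t)
print("standard deviation:", sd_ci)
```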
The interval for the mean is interpreted as, "We are 99% confident that the mean number of candies per bag of Skittles is between 58 and 62." In order to calculate the interval for the standard deviation, I used Table A-4 to find χ²R and χ²L. The results, after plugging my numbers into the formula, were for the variance σ², so I took the square root of the values to obtain the standard deviation interval. This confidence interval is interpreted as, "We are 98% confident that the population standard deviation for the number of candies per bag is between 2.737 and 5.835 candies."

Hypothesis Tests

A hypothesis test is used to test a claim about a population. In hypothesis testing, we use a null hypothesis (a statement that a parameter is equal to some claimed value) and an alternative hypothesis (a statement that the parameter somehow differs from the null value).

a. Use a 0.01 significance level to test the claim that 20% of all Skittles candies are green.

H0: p = 0.2
H1: p ≠ 0.2 (two-tailed test)
α = 0.01
n = 1264
p̂ = 0.16297
z = (0.163 - 0.2) / √[0.2(1 - 0.2)/1264] = -3.2886
P-value = 2 × 0.0005 = 0.001

We are testing the null hypothesis p = 0.2. The P-value (the probability of getting a result at least as extreme, assuming the null hypothesis is true) is 0.001. Because this is less than the significance level of 0.01, we reject the null hypothesis. There is sufficient evidence to warrant rejection of the claim that 20% of all Skittles candies are green. The true percentage may be more or less than 20%, but within our data the claim is not supported (our sample proportion was 16.3%).

b. Use a 0.05 significance level to test the claim that the mean number of candies in a bag of Skittles is 56.

H0: μ = 56
H1: μ ≠ 56 (two-tailed test)
α = 0.05
x̄ = 60.2
s = 3.010299
n = 22
df = 21
t = (60.2 - 56) / (3.010299/√22) = 6.544
Critical value from the t table = ±2.080 (area of 0.0250 in each tail)

We are testing the null hypothesis that the mean is 56 candies per bag. Because the test statistic t = 6.544 is greater than the critical value of 2.080, it falls in the rejection region and we reject the null hypothesis. There is sufficient evidence to warrant rejection of the claim that the mean number of candies is 56; our sample mean of 60.2 suggests that the true mean is higher. Although the math was completed, this result should be treated with caution because the sample size requirement of n > 30 is not met for the mean, so the test relies on the number of candies per bag being approximately normally distributed.

Reflection:

The conditions needed for interval estimates and hypothesis tests include that the sample is a simple random sample, that the conditions for a binomial distribution are satisfied (for a proportion), and that the population is normally distributed or n > 30 (for a mean). Our samples met the size requirement when questioning the claim about green Skittles (1264 > 30), but not when questioning the mean number of candies per bag (22 < 30). If we were to test a standard deviation, the requirements would be a simple random sample from a normally distributed population. The rule that a sample should include at least 10 successes and 10 failures is a check for proportions, so it does not apply to the number of candies per bag, which is not a count of successes or failures.

An error that could have occurred is that the counts entered into the class data table were incorrect. People may have used a larger bag, a smaller bag, or a bag that contained the blue Skittle rather than the original rainbow. We could improve our sampling method by using larger bags of Skittles, having stricter rules on which Skittles to exclude (broken, cracked, etc.), and only using bags that contain 20 or more Skittles to give us a larger sample size.

After completing this exercise, I found that the true mean number of candies per bag of Skittles is likely close to the sample mean we found by compiling our data. Also, it seems that the colors of Skittles are close to being evenly proportioned from one bag to the next.
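The two test statistics in the Hypothesis Tests section can be checked with a short script. This is a sketch under stated assumptions rather than the method used for the project: the function names are hypothetical, the inputs are the values printed in each test (including n = 22 and s = 3.010299 for the mean, which differ from the values used in the confidence interval section), and the critical value 2.080 is typed in from the t table instead of being computed.

```python
import math
from statistics import NormalDist


def one_proportion_z(p_hat, p0, n):
    """z statistic for H0: p = p0, using the null standard error."""
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)


def one_sample_t(x_bar, mu0, s, n):
    """t statistic for H0: mu = mu0 with sample standard deviation s."""
    return (x_bar - mu0) / (s / math.sqrt(n))


# Test a: claim that 20% of Skittles are green (206 green out of 1264).
z_stat = one_proportion_z(p_hat=206 / 1264, p0=0.20, n=1264)
p_value = 2 * NormalDist().cdf(-abs(z_stat))  # two-tailed P-value
reject_a = p_value < 0.01

# Test b: claim that the mean number of candies per bag is 56.
t_stat = one_sample_t(x_bar=60.2, mu0=56, s=3.010299, n=22)
T_CRITICAL = 2.080  # table value for df = 21, alpha = 0.05, two-tailed
reject_b = abs(t_stat) > T_CRITICAL

print(f"test a: z = {z_stat:.4f}, P-value = {p_value:.4f}, reject H0: {reject_a}")
print(f"test b: t = {t_stat:.3f}, critical = {T_CRITICAL}, reject H0: {reject_b}")
```

Using the unrounded sample proportion 206/1264 gives a z value very slightly different from the hand calculation, which rounds p̂ to 0.163 first; the decision is the same either way.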