Skittles Project

Lyndee Labrum
Skittles Term Project
By completing this project, I was able to apply what I have learned throughout a
semester of studying statistics to the collection and study of real data. The study
began with counting the number of candies in a 2.17-ounce bag of Original Skittles
and recording the number of red, orange, yellow, green, and purple candies. After I
submitted my own data, the data were compiled for the entire class. Once the data were
recorded, charts were made of the color counts and of the number of candies per bag,
and confidence intervals and hypothesis tests were carried out.
The goal of the Skittles Project is to understand how to organize and analyze data,
draw conclusions from the data using confidence intervals and hypothesis testing, and
record the results in a well-organized paper.
Organizing & Displaying Categorical Data: Colors
Results of My Skittles Bag
Color      Red   Orange   Yellow   Green   Purple
Count       13       15       13       6       10
Total Number of Candies: 57
Results from Entire Class Sample (21 Bags)
Color        Red   Orange   Yellow   Green   Purple
Count        283      271      254     206      250
Proportion 0.224    0.214    0.201   0.163    0.198
Total Number of Candies: 1,264
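The proportions in the class table can be reproduced from the color counts. The short Python sketch below is only an illustration of that arithmetic; the variable names are my own and are not part of the original project.

```python
# Class totals by color, as reported in the class sample table.
counts = {"red": 283, "orange": 271, "yellow": 254, "green": 206, "purple": 250}

# Total number of candies across all bags.
total = sum(counts.values())

# Proportion of each color, rounded to three decimal places.
proportions = {color: round(n / total, 3) for color, n in counts.items()}

print("Total candies:", total)
for color, p in proportions.items():
    print(f"{color:>6}: {p:.3f}")
```

Each proportion is simply the color count divided by the total of 1,264 candies.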
After observing the data pictured above, I found that the graphs reflect precisely
what I expected to see. Each bag of Skittles has a different number of candies, along
with different counts of each color, as the graphs show. I have never opened a bag of
Skittles and believed that there would be the same number of red candies in this bag as
I found in the last. That being said, my own bag does not closely match the overall
data for the entire class. My bag, with a total of 57 candies, was the second-smallest
of the 21 bags recorded. While other students' bags were unevenly filled, with
drastically varying numbers of each color, my bag was almost equal across the board
apart from the small number of green candies. Interestingly, I found that green candies
seem to be the least abundant color overall.
Organizing and Displaying Quantitative Data: The Number of Candies per Bag
The Mean: 60.2
Standard Deviation: 3.75
Five Number Summary:
1. Minimum- 52
2. Quartile 1- 59
3. Quartile 2- 60 (median)
4. Quartile 3- 62
5. Maximum- 67
The shape of the distribution was mostly bell-shaped, meaning the frequencies rose,
reached their maximum near the center, and then fell. The graph reflected what I
expected to see, and the overall data do agree with my own sample of Skittles, yet it
seems that most bags, unlike my own, contained more Skittles. The average number of
Skittles per bag in the 21-bag sample was 60.2 candies, whereas my bag contained only
57 candies.
Reflection:
Categorical data describe which category an observation belongs to, such as the color
of a Skittle, whereas quantitative data are numerical counts or measurements, such as
the number of Skittles in a bag. For categorical data, we most commonly use pie charts
and Pareto charts because they are easy to make sense of visually, especially the pie
chart, which is the easiest to read. For quantitative data, we use charts such as the
histogram and box plot. Although both represent the same data, histograms are easier
to understand because the tallest bar shows where the data are concentrated, which for
a roughly bell-shaped distribution is near the mean and median, whereas the box plot
has to be studied. Mathematical calculations such as the mean, standard deviation, and
quartiles do not make sense for categorical data because those data are labels rather
than numbers. On the other hand, such calculations make sense for quantitative data
because they refer directly to numbers and amounts.
Confidence Interval Estimates
The purpose of finding a confidence interval is to give a range of values for an
estimated population parameter instead of just one value. If the experiment were
repeated over and over, the stated percentage of the intervals calculated this way
(95%, for example) would contain the true population parameter. In order to construct
a confidence interval, certain conditions need to be met: the sample is a simple
random sample; for a proportion, the sample meets the binomial distribution
conditions; and for a mean, the sample size is greater than 30 or the population is
normally distributed. The margin of error is the maximum likely difference between the
sample statistic and the population parameter.
A. Construct a 95% confidence interval estimate for the true proportion of purple
candies.
n = 1,264
x = 250
p̂ = x/n = 0.19778
z(α/2) = 1.96
E = 1.96 √[0.19778 (1 - 0.19778) / 1264]
E = 0.021959
p̂ ± 0.022
0.176 < p < 0.220
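The calculation in part A can be checked with a few lines of Python. This is a minimal sketch of the standard formula E = z(α/2) √[p̂(1 - p̂)/n]; the variable names are my own, and the critical value 1.96 is entered by hand rather than looked up by the program.

```python
import math

n = 1264   # total candies in the class sample
x = 250    # purple candies in the class sample
z = 1.96   # critical value z(alpha/2) for 95% confidence, entered by hand

# Sample proportion and margin of error.
p_hat = x / n
margin = z * math.sqrt(p_hat * (1 - p_hat) / n)

# Confidence interval limits.
lower = p_hat - margin
upper = p_hat + margin

print(f"p_hat = {p_hat:.5f}, E = {margin:.6f}")
print(f"{lower:.3f} < p < {upper:.3f}")
```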
B. Construct a 99% confidence interval estimate for the true mean number of candies
per bag.
n = 21
s = 3.75
x̄ = 60.2
z(α/2) = 2.57
E = 2.57 (3.75/√21)
E = 2.103074917
x̄ ± 2.103
58 < μ < 62
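The sketch below repeats the part B arithmetic in Python. It also shows, for comparison only, the interval that results from the t critical value for 20 degrees of freedom (2.845 from a standard t table). That value is the usual choice when the population standard deviation is unknown and the sample is small, and it is not the value used in the calculation above. The function name `mean_interval` is my own.

```python
import math

n = 21          # number of bags
x_bar = 60.2    # sample mean candies per bag
s = 3.75        # sample standard deviation
z_crit = 2.57   # critical value used in the calculation above
t_crit = 2.845  # t table value, 20 degrees of freedom, 0.005 in each tail

def mean_interval(center, sd, size, crit):
    """Return (lower, upper) limits: center +/- crit * sd / sqrt(size)."""
    margin = crit * sd / math.sqrt(size)
    return center - margin, center + margin

z_low, z_high = mean_interval(x_bar, s, n, z_crit)
t_low, t_high = mean_interval(x_bar, s, n, t_crit)

print(f"z-based: {z_low:.1f} < mu < {z_high:.1f}")
print(f"t-based: {t_low:.1f} < mu < {t_high:.1f}")
```

The t-based interval is slightly wider, which reflects the extra uncertainty from estimating the standard deviation from only 21 bags.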
C. Construct a 98% confidence interval estimate for the standard deviation of the
number of candies per bag.
n = 21
DF = 20
variance s² = 14.0625
s = 3.75
χ²R = 37.556
χ²L = 8.260
(20)(14.0625)/37.556 < σ² < (20)(14.0625)/8.260
7.489 < σ² < 34.050
2.737 < σ < 5.835
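The interval in part C follows the formula (n - 1)s²/χ²R < σ² < (n - 1)s²/χ²L, followed by a square root. The Python sketch below repeats that arithmetic with the chi-square values exactly as they are listed above; the values are typed in by hand rather than computed, and the variable names are my own.

```python
import math

n = 21
s = 3.75
df = n - 1            # degrees of freedom
chi2_right = 37.556   # right critical value as listed in the calculation above
chi2_left = 8.260     # left critical value as listed in the calculation above

# Interval for the variance.
var_low = df * s**2 / chi2_right
var_high = df * s**2 / chi2_left

# Square roots give the interval for the standard deviation.
sd_low = math.sqrt(var_low)
sd_high = math.sqrt(var_high)

print(f"{var_low:.3f} < variance < {var_high:.3f}")
print(f"{sd_low:.3f} < sigma < {sd_high:.3f}")
```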
In calculating the true proportion of purple candies, I used the total number of
candies and the total number of purple candies. The confidence interval is interpreted
as, “we are 95% confident that the population proportion of purple Skittles candies
falls between 0.176 and 0.220.”
For the true mean number of candies per bag, the confidence interval contained only
3 possible whole-number mean counts: 59, 60 and 61. This shows that the mean number of
candies per bag is fairly consistent in the sample data. This is interpreted as, “we
are 99% confident that the mean number of candies per bag of Skittles is between 58
and 62 candies.”
In order to calculate the standard deviation interval, I used Table A-4 to find χ²R
and χ²L. The results, after plugging my numbers into the formula, were for the
variance σ², so I took the square root of the values to obtain the standard deviation
interval. This confidence interval is interpreted as, “we are 98% confident that the
population standard deviation for the number of candies per bag is between 2.737 and
5.835 candies.”
Hypothesis Tests
A hypothesis test is used to test a claim about a population. In hypothesis testing,
we use a null hypothesis (a statement that a parameter is equal to some claimed value)
and an alternative hypothesis (a statement that the parameter somehow differs from the
null value).
a. Use a 0.01 significance level to test the claim that 20% of all Skittles candies are
green.
H0: p = 0.2
H1: p ≠ 0.2
(two-tailed test)
α = 0.01
n = 1264
p̂ = 0.16297
p = 0.2
z = (0.163 - 0.2) / √[0.2(1 - 0.2)/1264] = -3.2886
P-value = 2 × 0.0005 = 0.001
We are testing the null hypothesis of p = 0.2. The P-value (the probability of getting
a result at least as extreme) is 0.001. As this is less than the significance level of
0.01, we reject the null hypothesis.
There is sufficient evidence to reject the claim that 20% of all Skittles candies are
green. A two-tailed test does not say whether the true proportion is more or less than
20%, but within our data the green proportion was lower.
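The test statistic and P-value for the green candy claim can be reproduced in Python using only the standard library. This is a sketch under the assumption that the normal approximation applies; the two-tailed P-value is obtained from the complementary error function, since P(|Z| ≥ |z|) = erfc(|z|/√2). The variable names are my own.

```python
import math

n = 1264       # total candies in the class sample
p_hat = 0.163  # rounded sample proportion of green candies
p0 = 0.2       # claimed proportion under the null hypothesis
alpha = 0.01   # significance level

# Test statistic for a one-proportion z test.
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# Two-tailed P-value from the standard normal distribution.
p_value = math.erfc(abs(z) / math.sqrt(2))

# Reject the null hypothesis when the P-value is below alpha.
reject = p_value < alpha

print(f"z = {z:.4f}, P-value = {p_value:.4f}, reject H0: {reject}")
```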
b. Use a 0.05 significance level to test the claim that the mean number of candies in a
bag of Skittles is 56.
H0: μ = 56
H1: μ ≠ 56
(two-tailed test)
α = 0.05
x̄ = 60.2
s = 3.010299
D.F. = 21
t = (60.2 - 56)/(3.010299/√22) = 6.544
Critical value from t table = ±2.080
Area in each tail = 0.025
We are testing a null hypothesis of mean = 56 candies per bag. The test statistic
t = 6.544 is larger than the critical value of 2.080, so the P-value (the probability
of getting a result at least as extreme) is less than the significance level of 0.05.
There is sufficient evidence to warrant rejection of the claim that the mean number of
candies is 56.
Because we reject the claim, and our sample mean is 60.2, the data suggest that the
mean number of candies in a bag of Skittles is higher than 56.
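The t statistic for the claim about the mean can be checked in Python as well. The sketch below uses the values exactly as they appear in this test (s = 3.010299 and a sample size of 22 under the square root) and the table critical value of 2.080. Under the critical value rule for a two-tailed test, the null hypothesis is rejected when |t| is larger than the critical value. The variable names are my own.

```python
import math

x_bar = 60.2    # sample mean candies per bag
mu0 = 56        # claimed mean under the null hypothesis
s = 3.010299    # sample standard deviation as listed in this test
n = 22          # sample size as used under the square root in this test
t_crit = 2.080  # two-tailed critical value from the t table, alpha = 0.05

# Test statistic for a one-sample t test.
t = (x_bar - mu0) / (s / math.sqrt(n))

# Two-tailed decision: reject when |t| falls beyond the critical value.
reject = abs(t) > t_crit

print(f"t = {t:.3f}, critical value = {t_crit}, reject H0: {reject}")
```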
Although the math was completed, the results for the mean should be interpreted with
caution because the sample size requirement of n > 30 is not met.
Reflection:
The conditions needed for interval estimates and hypothesis tests include that the
sample is a simple random sample, that the conditions for a binomial distribution are
satisfied (for proportions), and that the population is normally distributed or the
sample is large, n > 30 (for means). Our samples did meet these requirements when
testing the claim about green Skittles (1264 > 30), but not when testing the mean
number of candies per bag of Skittles (22 < 30). If we were to test a standard
deviation, the requirements would include a simple random sample and a normally
distributed population. For a proportion, a sample is considered sufficiently large
when it includes at least 10 successes and 10 failures. Counting green candies as
successes and all other colors as failures, our sample of 206 green and 1,058 other
candies meets this condition.
An error that could have occurred is that the numbers entered into the data table
were incorrect. People may also have used a larger bag, a smaller bag, or a bag that
contained the blue Skittle rather than the original rainbow.
We could improve our sampling method by using larger bags of Skittles, having
stricter rules on which Skittles to remove (broken, cracked, etc.), and only using bags
that contain 20 or more Skittles to give us a larger sample size.
After completing this exercise, I found that the number of candies in a typical bag
of Skittles is close to the mean we found by compiling our data. Also, it seems that
each color of Skittles is close to being evenly proportioned from one bag to the
next.