Section 10: Hypothesis/Significance Testing for Proportions (Major Concept Review)
Example 1: Count Buffon (Georges-Louis Leclerc, Comte de
Buffon, 1707-1788) tossed a coin 4040 times and got 2048 heads.
One story has it that he was in prison at the time. Another
story is that he paid an orphan boy to do it. Either way, he was
either very bored, very rich, or both.
Are his results unusual?
That is, is the coin loaded to favor heads?
For a fair coin, the sampling distribution for the proportion of
heads in 4040 flips would be
$\hat{p} \approx N\left(0.5,\ \sqrt{\frac{0.5(1 - 0.5)}{4040}}\right)$
What is the probability that a fair coin produces at least this
many heads (the right tail)?
$P\left(\hat{p} \ge \frac{2048}{4040}\right) = P\left(Z \ge \frac{\frac{2048}{4040} - 0.5}{\sqrt{\frac{0.5(1 - 0.5)}{4040}}}\right) = P(Z \ge 0.88) = 0.1894$
If a fair coin is flipped 4040 times:
• 19% of the time you’d get at least as many heads as he got.
• Equivalently, 19% of samples result in at least as many heads as he got.
• Judgment call: It is simply not that unusual to get this many heads. It is still plausible that the coin is fair. We
have no evidence that the coin is loaded in favor of heads.
• We have no evidence that p (the probability of a head for this coin) is more than 0.5, so do not reject the theory
that p = 0.5.
Assuming the coin is fair, we run the numbers and conclude that it wouldn’t be unusual for a fair coin to do such a thing.
So it’s still plausible that the coin is fair.
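The arithmetic in Example 1 can be checked numerically. The following is a minimal sketch, assuming Python 3.8 or later and its standard-library `statistics.NormalDist`; the variable names are illustrative and not part of these notes. Because the code keeps the unrounded z, the tail area it prints differs slightly in the fourth decimal place from the table value 0.1894, which comes from rounding z to 0.88 first.

```python
from math import sqrt
from statistics import NormalDist  # standard library, Python 3.8+

n = 4040      # number of tosses
heads = 2048  # observed number of heads
p0 = 0.5      # proportion of heads assumed for a fair coin

p_hat = heads / n                     # sample proportion
sd = sqrt(p0 * (1 - p0) / n)          # standard deviation of p-hat if p = p0
z = (p_hat - p0) / sd                 # how many standard deviations above 0.5
right_tail = 1 - NormalDist().cdf(z)  # area to the right of z

print(f"p_hat = {p_hat:.4f}, z = {z:.2f}, right tail = {right_tail:.4f}")
```

Either way the answer is about 19%, so the judgment call is unchanged.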
Example 2: Kaktus Fabric Softener claims that 60% of customers prefer their brand. An SRS of 2000 customers reveals
that 1140 prefer their brand.
Does this sample constitute evidence that they’re exaggerating?
Let p = the proportion of all customers who prefer their brand.
One theory is that p is 0.6. Another theory is that p is less than 0.6. (It is also possible that p is more than 0.6, but
we’re not interested in this case.)
The null hypothesis 𝑯𝟎 is a theory about what p equals, a potential value for the purpose of running the numbers.
The alternative hypothesis 𝑯𝒂 is a range of possible alternatives for p (less than, greater than, not equal to a number).
𝐻0 : 𝑝 = 0.6
𝐻𝑎 : 𝑝 < 0.6
If 𝐻0 is the truth, then the company is telling the truth.
If 𝐻𝑎 is the truth, then they’re exaggerating.
Notice that if 𝐻0 is the truth, we know what p is, but if 𝐻𝑎 is the truth, we don’t know what p is.
We always assume 𝐻0 (the null hypothesis) for the purpose of running the numbers. In the end, it’s either plausible in
light of the sample (do not reject the theory, no evidence of the alternative), or it’s not plausible in light of the sample
(reject the theory, evidence of the alternative).
If the theory being true would make the sample very unlikely, then we will reject the theory. 𝐻0 is the foundation of the
calculation; if it produces something very unlikely, the foundation crumbles.
Assuming they’re telling the truth (𝐻0 ), what is the probability of a sample as extreme as ours?
What does “as extreme as” mean in context?
• For a coin which might be loaded in favor of heads, high proportions of heads incline you toward that
explanation.
• For a fabric softener company which might be exaggerating, low proportions of customers preferring their brand
incline you to conclude that they’re exaggerating.
In our case, low 𝑝̂ ’s constitute evidence of 𝐻𝑎 .
(In our previous example, high 𝑝̂ ’s did: 𝐻0 : 𝑝 = 0.5, 𝐻𝑎 : 𝑝 > 0.5.)
Assuming 𝐻0 , the probability of a 𝑝̂ as low as ours (𝑝̂ = 1140/2000 = 0.57) is
$P(\hat{p} \le 0.57) = P\left(Z \le \frac{0.57 - 0.6}{\sqrt{\frac{0.6(1 - 0.6)}{2000}}}\right) = P(Z \le -2.74) = 0.0031$
If they’re telling the truth, this sample is very unusual.
If they’re telling the truth, this sample is very unlucky for them. Unbelievably unlucky.
So don’t believe.
Judgment call: If 60% of customers prefer their brand, it would be very unusual to get a sample result this low.
𝐻0 is no longer plausible. We have evidence of 𝐻𝑎 .
We have evidence that p is less than 0.6. Reject the theory that p = 0.6.
The probability of getting a sample as extreme as ours (as low, as high, as different) on the current understanding of the
facts (𝐻0 ) is called the p-value.
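For Example 2, the p-value can be sketched in code as follows. This is a minimal sketch, assuming Python 3.8 or later and the standard-library `statistics.NormalDist`; the helper name `lower_tail_p_value` is hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist  # standard library, Python 3.8+


def lower_tail_p_value(successes, n, p0):
    """Return (z, p-value) for H0: p = p0 against Ha: p < p0.

    Hypothetical helper. It uses the normal approximation to the
    sampling distribution of p-hat, with p0 plugged into the standard
    deviation because H0 is assumed for the purpose of the calculation.
    """
    p_hat = successes / n
    sd = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / sd
    return z, NormalDist().cdf(z)  # area to the left of z


# Example 2: 1140 of 2000 customers prefer the brand; the claim is p = 0.6
z, p_value = lower_tail_p_value(1140, 2000, 0.6)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

The lower tail is used here because low values of p-hat are the ones that count as evidence of exaggeration.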


If the p-value is “very low”, reject 𝐻0 (it is no longer plausible in light of the sample) and find evidence of 𝐻𝑎 .
If the p-value is “not that low”, do not reject 𝐻0 (it is still plausible in light of the sample) and do not find
evidence of 𝐻𝑎 .
At no point do we use the word “accept”. 𝐻0 doesn’t have to be accepted; it already has the benefit of the doubt, until we
reach the breaking point. Nor does it make sense to “accept” 𝐻𝑎 because it isn’t a specific explanation of the facts. If the
company is exaggerating, we still don’t know what proportion of people prefer their brand, only that it’s less than 60%.
As in a court of law, 𝐻0 is the defendant: it is found either guilty (low p-value) or not guilty (p-value not that low). The sample either convicts it or fails to
convict it. The alternative 𝐻𝑎 comes into it (if at all) only by the process of elimination.
In light of the sample we have to work with, 𝐻0 is either plausible (do not reject) or implausible (reject). We either have
evidence of the alternative 𝐻𝑎 or we don’t have evidence of it.
Example 3: Claim: 30% of jellybeans in a large vat are licorice. An SRS of n = 1000 jellybeans contains 270 licorice.
(a) I like licorice. Am I being cheated? The logical alternatives are
𝐻0 : 𝑝 = 0.3
𝐻𝑎 : 𝑝 < 0.3
Assuming they’re telling the truth (𝐻0 ), what is the probability of a sample as extreme as ours?
Low 𝑝̂ ’s constitute evidence of 𝐻𝑎 .
$P(\hat{p} \le 0.27) = P\left(Z \le \frac{0.27 - 0.3}{\sqrt{\frac{0.3(1 - 0.3)}{1000}}}\right) = P(Z \le -2.07) = 0.0192$
If they’re right, 1.92% of samples result in a value this low. Do I consider this to be “unbelievably unlikely”?
Well, unlike the previous examples (which were clear cut), it comes down to what I consider to be “unbelievably
unlikely.”
Testing at the 5% significance level means that any result which would occur less than 5% of the time through random
chance is considered too unlikely to believe in. Having a pre-set standard of what you’ll consider too unlikely to believe
in (chosen before running the test) makes our results scientific rather than subjective.





• At the 5% level, 0.0192 < 0.05, so yes: reject 𝐻0 , evidence of 𝐻𝑎 .
o If occurring less than 5% of the time through chance is my standard, then this sample is too unlikely to
believe.
• At the 1% level, 0.0192 ≮ 0.01, so no: do not reject 𝐻0 , no evidence of 𝐻𝑎 .
o If occurring less than 1% of the time through chance is my standard, then this sample is still believable.
It comes down to what you consider “unbelievably unlikely.”
If a sample has to be among the 5% weirdest to be considered “unacceptably weird”, ours is.
If a sample has to be among the 1% weirdest to be considered “unacceptably weird”, ours isn’t.
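The comparison of a p-value with a pre-set significance level can be written as a tiny rule. The following is a minimal sketch for Example 3(a), assuming Python 3.8 or later and the standard-library `statistics.NormalDist`; the function name `decide` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist  # standard library, Python 3.8+


def decide(p_value, alpha):
    """Compare a p-value with a significance level chosen in advance."""
    if p_value < alpha:
        return "reject H0"
    return "do not reject H0"


# Example 3(a): 270 licorice out of 1000; H0: p = 0.3, Ha: p < 0.3
p0, n = 0.3, 1000
z = (270 / n - p0) / sqrt(p0 * (1 - p0) / n)
p_value = NormalDist().cdf(z)  # lower tail, since low p-hats support Ha

for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: p-value = {p_value:.4f} -> {decide(p_value, alpha)}")
```

The same p-value leads to different conclusions at the two levels, which is why the level should be fixed before looking at the data.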
(b) Suppose that I have no opinion of licorice one way or the other. In this case, the claim being advanced is just a number
which might or might not be true.
Do I have evidence that the claimant is mistaken? (That is, that it is not the case that 30% of the vat is licorice.)
𝐻0 : 𝑝 = 0.3
𝐻𝑎 : 𝑝 ≠ 0.3
Assuming they’re telling the truth (𝐻0 ), what is the probability of a sample as extreme as ours?
𝑝̂ ’s far from 0.3 in either direction constitute evidence of 𝐻𝑎 .
In other words, what is the probability of getting a 𝑝̂ as far from 0.3 as ours is?
𝑝̂ ≤ 0.27 or 𝑝̂ ≥ 0.33
Note that you don’t actually need the other number (0.33). These are two identical tails, one of which we’re set up to
compute (or in this case, already know).
$P(\hat{p} \le 0.27) = 0.0192$
The other tail is the same: $P(\hat{p} \ge 0.33) = 0.0192$
Both tails: $0.0192 \times 2 = 0.0384$
In the two-sided hypothesis test (when you have the ≠ alternative), double the tail you look up.
If 30% of the vat is licorice, the probability of getting a sample as far away from 30% as ours is, is 0.0384. That is, 3.84%
of samples would be as far from the truth as ours is. Do I consider this to be “unbelievably unlikely?”


At the 5% level, 0.0384 < 0.05, so yes: reject 𝐻0 , evidence of 𝐻𝑎
At the 1% level, 0.0384 ≮ 0.01, so no: do not reject 𝐻0 , no evidence of 𝐻𝑎
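The doubling rule for the two-sided test in part (b) can be sketched as follows. This is a minimal sketch, assuming Python 3.8 or later and the standard-library `statistics.NormalDist`; the helper name `two_sided_p_value` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist  # standard library, Python 3.8+


def two_sided_p_value(successes, n, p0):
    """p-value for H0: p = p0 against Ha: p != p0 (normal approximation)."""
    z = (successes / n - p0) / sqrt(p0 * (1 - p0) / n)
    one_tail = NormalDist().cdf(-abs(z))  # area beyond |z| on one side
    return 2 * one_tail                   # the two tails are identical


# Example 3(b): 270 licorice out of 1000; H0: p = 0.3, Ha: p != 0.3
p_value = two_sided_p_value(270, 1000, 0.3)
print(f"two-sided p-value = {p_value:.4f}")
print("reject at 5%:", p_value < 0.05)
print("reject at 1%:", p_value < 0.01)
```

Using `-abs(z)` means the function gives the same answer whether the sample falls below the claimed value (270 licorice) or equally far above it (330 licorice).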
When evaluating a p-value, ask the question, is it “suspiciously small”? If so, it renders the current theory (𝐻0 ) suspect.
It counts against the current theory. Otherwise 𝐻0 is still plausible.
In conclusion, don’t make it more complicated than it is. It is unlikely conclusions which cast doubt on our assumptions.