Math 141 Lecture 9: Inference
Albyn Jones, Library 304, [email protected]
www.people.reed.edu/~jones/courses/141

Inference: Main Topics
- Estimation: What's our best guess? (last time!)
- Hypothesis Testing: Are the data consistent with a hypothesized model?
- Confidence Intervals: How accurate is our estimate?
We will develop these concepts in the context of the binomial distribution first.

Motivation: Hypothesis tests
Imagine I tell you I have tossed a coin 10 times and got 10 heads in the 10 tosses. What are the possible explanations?
- The coin really isn't a fair coin: P(H) ≠ 1/2.
- The coin is fair, and we observed a rare event.
- Other possibilities: the tosses weren't independent, I made it all up, etc.
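To put a number on the "rare event" explanation, the chance of 10 heads in 10 tosses of a fair coin can be computed with R's dbinom, the same function used later in this lecture. This is a minimal sketch assuming independent tosses with P(H) = 1/2; the variable names are illustrative only.

```r
# X = number of heads in 10 independent tosses of a fair coin,
# so under the "fair coin" explanation X ~ Binomial(10, 1/2).
n_tosses <- 10

# Probability of the reported outcome: 10 heads out of 10.
p_all_heads <- dbinom(10, size = n_tosses, prob = 0.5)
print(p_all_heads)       # equals 0.5^10 = 1/1024
print(1 / p_all_heads)   # in the long run, about once per 1024 repetitions

# Every outcome 0, ..., 10 has positive probability: rare, not impossible.
all_probs <- dbinom(0:n_tosses, size = n_tosses, prob = 0.5)
print(all(all_probs > 0))
```

So a fair coin can produce this result; it just does so very rarely.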
The first two possibilities lie at the heart of hypothesis testing.

The problem: No Certainty!
[Figure: the distribution for X ∼ Binomial(10, 1/2), outcomes 0 through 10.]
There are no impossible outcomes, though some are improbable.

Terminology
Null Hypothesis H0: the specification of a probability model corresponding to the substantive question we would like to answer. Often it is a hypothesis we hope to reject!
Example: A researcher tests a new treatment for anorexia and wants to know whether it has any effect on subjects' gain or loss of weight. Let X be the number of the n subjects who gain weight after treatment. Assuming subjects are independent, we might wish to test the null hypothesis that the treatment does no better than chance: the probability of weight gain is p = 1/2, and thus X ∼ Binomial(n, 1/2).

Decisions, Decisions!
The standard solution: choose a Rejection Region, a set of outcomes that you consider to be evidence that the specified probability model should be rejected.
Important: getting an outcome in the rejection region does not prove that the model is wrong, since the outcomes in the rejection region are not impossible, just improbable. We reject hypotheses rather than proving them false: either we have observed a rare event, or the null hypothesis is false.

More Terminology!
Significance Level, α: The significance level, or α-level, also called the size of the test, is the probability of getting an outcome in the rejection region given that the null hypothesis is correct.
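One way to read this definition is as a long-run frequency: if H0 were true and the experiment were repeated many times, a fraction α of the repetitions would land in the rejection region. A minimal simulation sketch, assuming the 10-toss fair-coin example and the rejection region {0, 1, 9, 10} used in this lecture; the seed and the number of simulated experiments are arbitrary choices, not part of the method.

```r
# Monte Carlo check of alpha = P(X in RR | H0) for 10 tosses of a fair coin.
set.seed(141)                 # arbitrary seed, for reproducibility only
n_sim <- 100000               # number of simulated experiments (arbitrary)
RR <- c(0, 1, 9, 10)          # rejection region from the lecture example

# Simulate n_sim experiments under H0: each records X ~ Binomial(10, 1/2).
x_sim <- rbinom(n_sim, size = 10, prob = 0.5)

# Fraction of experiments landing in the rejection region,
# i.e. how often we would reject H0 even though H0 is true.
alpha_sim <- mean(x_sim %in% RR)

# Exact value for comparison.
alpha_exact <- sum(dbinom(RR, size = 10, prob = 0.5))

print(c(simulated = alpha_sim, exact = alpha_exact))
```

The simulated rejection rate should sit close to the exact α, differing only by Monte Carlo error.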
If X is the test statistic and RR the rejection region, then
P(X ∈ RR | H0) = α.
α is a conditional probability: how often do we reject H0 when H0 is really correct?

Example: Testing H0: p = 1/2
The rejection region is colored red.
[Figure: Binomial(10, 0.5), α = 0.021.]

Computing α
H0: p = 1/2 for 10 coin tosses, vs. H1: p ≠ 1/2. Let the rejection region be {0, 1, 9, 10}. What is α?
RR <- c(0, 1, 9, 10)
sum(dbinom(RR, 10, .5))
[1] 0.02148438
α ≈ .02

Two-tailed vs One-tailed tests
Suppose X ∼ Binomial(10, p). The choice of rejection region depends on the alternative hypothesis.
- Two-tailed test: H0: p = .5 vs. H1: p ≠ .5; for α ≈ .02, choose RR = {0, 1, 9, 10}.
- One-tailed test: H0: p ≥ .5 vs. H1: p < .5; for α ≈ .01, choose RR = {0, 1}.
- One-tailed test: H0: p ≤ .5 vs. H1: p > .5; for α ≈ .01, choose RR = {9, 10}.

Testing H0: p ≥ 1/2 vs.
H1: p < 1/2
The rejection region is colored red.
[Figure: Binomial(10, 0.5), α = 0.011.]

Still More Terminology: P-Values
p-value:
- For a two-sided test: the probability of getting an outcome at least as unlikely as the observed outcome, given that the null hypothesis is correct.
- For a one-sided test: the probability of getting an outcome X ≤ Xobs if H1: p < p0, or X ≥ Xobs if H1: p > p0.
In other words, the probability of getting an outcome at least as inconsistent with H0 as the observed outcome.

Another version: P-Values
p-value: the significance level (size) of the smallest rejection region containing the observed statistic, where the form of the rejection region is set by the alternative hypothesis and probabilities are computed under H0: p = p0.

H0: p = 1/2, two-sided p-value
Outcomes contributing to the p-value are colored red.
Suppose we toss a coin n = 10 times to test H0 vs. the alternative p ≠ 1/2. We observe X = 7. What is the p-value?
[Figure: H0: X ∼ Binomial(10, 0.5), p-value = 0.344.]

Testing H0: p ≤ 1/2 vs. H1: p > 1/2
Outcomes contributing to the p-value are colored red.
Suppose we toss a coin n = 10 times to test H0 vs. the alternative p > 1/2. We observe X = 7. What is the p-value?
[Figure: H0: X ∼ Binomial(10, 0.5), p-value = 0.172.]

Testing H0: p ≥ 1/2 vs. H1: p < 1/2
Outcomes contributing to the p-value are colored red.
Suppose we toss a coin n = 10 times to test H0 vs. the alternative p < 1/2. We observe X = 7. What is the p-value?
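The three p-values quoted for X = 7 can be reproduced with R's dbinom and pbinom. A minimal sketch assuming n = 10 and H0: p = 1/2 as in the slides; the two-sided calculation follows the "at least as unlikely" definition, with a small relative tolerance (an implementation choice, not part of the definition) so that outcomes of equal probability are not dropped by rounding error. binom.test is used only as a cross-check.

```r
# p-values for X = 7 successes in n = 10 trials under H0: p = 1/2.
n <- 10
x_obs <- 7
p0 <- 0.5
probs <- dbinom(0:n, size = n, prob = p0)

# Two-sided: total probability of outcomes no more likely than the observed one.
p_two <- sum(probs[probs <= dbinom(x_obs, n, p0) * (1 + 1e-7)])

# One-sided, H1: p > 1/2: P(X >= 7 | H0).
p_greater <- pbinom(x_obs - 1, size = n, prob = p0, lower.tail = FALSE)

# One-sided, H1: p < 1/2: P(X <= 7 | H0).
p_less <- pbinom(x_obs, size = n, prob = p0)

print(c(two.sided = p_two, greater = p_greater, less = p_less))

# Cross-check the two-sided value against R's exact binomial test.
print(binom.test(x_obs, n, p = p0, alternative = "two.sided")$p.value)
```

After rounding, these agree with the figure labels 0.344, 0.172, and 0.945.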
[Figure: H0: X ∼ Binomial(10, 0.5), p-value = 0.945.]

An Argument for Two-sided Tests
In almost all settings, two-sided tests are the standard ritual:
- The p-value for a one-sided test can be half that for a two-sided test with the same observed outcome.
- What if we get a result that would allow us to reject H0 with a one-sided test, but not with the two-sided test we initially conducted?
- Rude question: Did the researcher change the alternative hypothesis after computing the p-value for the two-sided alternative?
- Some journals and federal agencies will not accept one-sided hypothesis tests!

Terminology
Suppose we have tested some null hypothesis H0. What do we say when we reject H0?
Here are some standard expressions you will hear or see in journals:
- We reject H0 at α = .05.
- We reject H0, p = .043. (Here p refers to the p-value, not the binomial parameter p!)
- p̂ is statistically significantly different from p0.
- The result is statistically significant.
- The difference is large enough that it is unlikely to have occurred by chance.
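The two ways of reporting a rejection, "at α = .05" and "p = .043", rest on the same rule: reject H0 exactly when the p-value is at most α. A minimal sketch for the 10-toss example, assuming H0: p = 1/2 and a two-sided alternative, uses R's binom.test to compute the exact p-value for every possible outcome and applies that rule at α = .05.

```r
# Exact two-sided p-value for every possible outcome x = 0, ..., 10
# (n = 10 tosses, H0: p = 1/2), using R's built-in binom.test().
n <- 10
alpha <- 0.05
outcomes <- 0:n
p_values <- sapply(outcomes, function(x) binom.test(x, n, p = 0.5)$p.value)

# Decision rule: reject H0 whenever the p-value is at most alpha.
reject <- p_values <= alpha
rejected_outcomes <- outcomes[reject]

print(data.frame(x = outcomes, p.value = round(p_values, 4), reject = reject))
print(rejected_outcomes)
```

For this example the rejected outcomes are 0, 1, 9, and 10, the same rejection region used to compute α earlier. The p-value for X = 1 (or X = 9) is about 0.0215, the size of that region, in line with the "smallest rejection region" description of the p-value.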
Significance
It is important to distinguish statistical significance from substantive significance:
- statistical significance: the difference is large enough to detect with this sample size.
- substantive significance: the difference is large enough that we care about it!

More Terminology
Suppose we have tested some null hypothesis H0. What do we say when we fail to reject H0? Here are some standard expressions you will hear or see in journals:
- We fail to reject H0 at α = .05.
- The observed difference is no bigger than would be expected due to chance.
- The result is marginally significant at p = .07. We expect a larger replication of the experiment would demonstrate statistical significance.
- Your paper is not accepted for publication.
  The editors agree that your result would be interesting if it were confirmed, and encourage you to resubmit after collecting more data.

Interpretation
Some consider p-values to measure the strength of evidence against H0. This is dangerous, since a p-value only measures the probability of data at least as extreme as those observed, assuming H0 is true, and:
P(X | H0) ≠ P(H0 | X)

Example: Screening Tests
Recall the example of screening tests we studied with Bayes' Theorem. Let H0 be !D, i.e. 'no disease', and let P be 'tests positive'.
- Sensitivity is power, the probability we reject H0 when it is false: P(P | D) = .99
- Specificity is P(not P | !D) = .99, so 1 − specificity is the significance level or, if the test is positive, the p-value: P(P | !D) = .01
- Prevalence: typically not available for hypothesis tests!
  P(D) = .01

P(P | H0) = .01 ≠ .5 = P(H0 | P)
By Bayes' Theorem, P(H0 | P) = (.01)(.99) / [(.01)(.99) + (.99)(.01)] = .5.

Summary
- Null Hypothesis (H0): specifies a probability model.
- Rejection Region: the set of values that we feel cast serious doubt on H0.
- Significance Level (α): the probability of getting an outcome in the rejection region, assuming H0 is true.
- p-value: for a two-sided alternative, the probability, assuming H0 is true, of getting a value at least as unlikely as the observed value.
- Language Matters! Statistical significance is not the same as substantive significance!
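To see why statistical and substantive significance differ, here is a minimal sketch with made-up counts (hypothetical, not data from this lecture): the same estimate p̂ = 0.51 is tested against H0: p = 1/2 with a small and a large sample, using R's exact binom.test. Whether a one-percentage-point difference matters is a substantive question the p-value cannot answer.

```r
# Hypothetical data: the same estimate p-hat = 0.51 from two sample sizes.
small <- binom.test(51, 100, p = 0.5)        # n = 100
large <- binom.test(10200, 20000, p = 0.5)   # n = 20000

# Both estimates are 0.51, but only the larger sample gives p < .05.
print(c(p_hat_small = 51 / 100, p_hat_large = 10200 / 20000))
print(c(p_value_small = small$p.value, p_value_large = large$p.value))
```

The difference between p̂ and 1/2 is identical in both cases; only the sample size, and hence our ability to detect it, has changed.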