Chi-Squared Tutorial
This is significantly important. Get your AP Equations and Formulas sheet.

The Purpose
• The chi-squared analysis exists to help us determine whether two sets of data have a significant difference.
– Remember early in the semester when I said how scientists use the word “significant” only when they really mean it?
• This is one method to tell if you can use the word.
• Take a biostatistics course in college and you’ll learn a buttload more.

The Null Hypothesis
• Also recall that every experiment has a null hypothesis.
– The “not very interesting” possibility, a.k.a. there is no difference between two sets of numbers.
• In order to accept your own hypothesis, you must reject the null hypothesis.
– In other words, determine if the results are significant.
• The chi-squared test is one way to tell if you can do that.

An Example
• To resurrect this analogy and then kill it again, suppose you flip a coin 10 times.
– You get 6 heads and 4 tails.
– Is something fishy?
• That’s a 60% heads rate. If you flip 100 times and get 60 heads and 40 tails, that’s the same rate.
– Now you might think something’s wrong.
– But where do you draw the line? How many flips does it take?
– Looks like you need one of them chi-squared tests.
Image: http://images.nationalgeographic.com/wpf/media-live/photos/000/002/cache/angler-fish_222_600x450.jpg

The Chi-Squared Test
• The Greek letter chi (χ) looks basically like an x, so the chi-squared test usually goes by the name χ².
• To perform the test, you need the following:
– Data you observe (o).
– Data you expect (e).
– The degrees of freedom (df).
• For example, in the 100 flip test, you’d expect 50 heads, but you observed 60 heads.

The Chi-Squared Test: Step 1
• Determine the difference between observed and expected numbers:
– 60 observed − 50 expected = 10 heads difference.
• Square the difference:
– 10² = 100.
• Divide by what you expected:
– 100/50 = 2.
• Do the same for all calculated “differences” and add them together.
– 40 observed − 50 expected = −10 tails difference, squared to 100, divided by 50 = 2.
– 2 + 2 = 4.

The Chi-Squared Test: Step 2
• That “4” we got as our answer is the calculated chi-squared statistic (χ²calc) for our test.
– The higher this is, relatively speaking, the smaller the role “random chance” can play.
– It’s called “calculated” because…you just…calculated it.
• We will compare this statistic to another number to see whether it indicates more variation than chance would suggest.

The Chi-Squared Test: Step 2
• The number to which you’ll compare the calculated χ² value is called the critical chi-squared value (χ²crit).
• To get the critical value, you need to know one other thing: the degrees of freedom.

The Chi-Squared Test: Step 3
• Degrees of freedom goes by “df” and represents…well…this is hard to explain. Let’s try this:
– I flipped 100 times and got 60 heads. Once I know how many times I got heads, the number of times I got tails is a given.
– As a result, though there are two outcomes, there is only one degree of freedom.
• Typically, df is the number of possible outcomes minus one.

The Chi-Squared Test: Step 4
• The p value reflects the probability that “chance” explains the difference between two sets of data.
– In other words, p represents the likelihood that any difference you see is simply a fluke.
• As you might guess, scientists demand a very high confidence that “chance” is not at work.
– So high, in fact, that 95% is the traditional benchmark of confidence…confidence that the data are so unusual that they would have occurred by chance only 5% of the time.
– Sometimes, scientists even raise that benchmark to 99%.

The Chi-Squared Test: Step 4
• This benchmark for how unusual the data must be is also called alpha (α).
– If we demand that the data are so unusual they would occur at random only 5% of the time, then α = 0.05. (Or 5%, get it?) That goes with 95% confidence.
• So how is p different?
• α is like a benchmark for how unusual the data must be.
• p is how unusual the data actually are.
• In short, you’re going to look for p to be under 0.05.
http://courses.washington.edu/p209s07/lecturenotes/Week%205_Monday%20overheads.pdf

Another way to look at p…
• Think of a board game you know. How much strategy is involved? How much luck?
– For example, Sorry! is mostly luck. There is almost no strategy because it depends on random card draws.
– Chess, on the other hand, is entirely strategy.
• So the results of Sorry! are mostly due to chance.
• The results of chess are mostly not.
• In the same way, p is the degree of chance involved in a difference between sets of data.
– If p is 0.25, any difference between data sets is 25% likely due to chance.
– For scientists, traditionally, p should be 0.05 or less. In other words, chance should be, at most, 5% likely to explain the data.

The Chi-Squared Test: Step 5
• Okay, let’s recap.
• You do fancy math operations to get χ²calc.
– It represents how different/unusual your data are.
– Higher values mean “more differenter/unusualer.”
• You figure out your df value (outcomes − 1).
• You then determine how confident you want to be that your results show significance.
– Want to be 95% confident it’s not a fluke? α = 0.05.
– Want to be 99% confident it’s not a fluke? α = 0.01.

The Chi-Squared Test: Step 5
• But how do you actually compare p to α?
– You compare χ²calc to χ²crit!
• You’ll need to look up χ²crit in a table.
• The numbers across the top represent p values. We want our p value to be 0.05 or less, because that’s what science sets α to.

df/prob.  0.99     0.95    0.90   0.80  0.70  0.50  0.30  0.20  0.10  0.05
1         0.00013  0.0039  0.016  0.06  0.15  0.46  1.07  1.64  2.71  3.84
2         0.02     0.10    0.21   0.45  0.71  1.39  2.41  3.22  4.60  5.99
3         0.12     0.35    0.58   1.00  1.42  2.37  3.66  4.64  6.25  7.82
4         0.3      0.71    1.06   1.65  2.20  3.36  4.88  5.99  7.78  9.49
5         0.55     1.14    1.61   2.34  3.00  4.35  6.06  7.29  9.24  11.07

• In our coin flip example, there is only one df.
• 3.84 is therefore our χ²crit value.
• Remember, 4 was our χ²calc value.

The Chi-Squared Test: Step 5
• Once you have both χ²crit and χ²calc, compare:
• χ²crit > χ²calc? Accept the null hypothesis.* There is no significant difference.
– *Technically this is “failing to reject the null hypothesis.”
• χ²calc ≥ χ²crit? Reject the null hypothesis. There’s something going on here.
• On the table above, at the df you are using: a χ²calc below the value in the 0.05 column is insignificant (accept null hypothesis); a χ²calc at or above it is significant (reject null hypothesis).

The Chi-Squared Test: Step 6
• At p = 0.05 (5% likelihood it’s chance) and 1 df, χ²crit is 3.84, which is less than the “4” we got.
• Since χ²crit ≤ χ²calc, we can reject the null hypothesis.
– Something’s up with this coin.
• Just so you know, doing this with 6/4 heads/tails leads to a χ²calc of 0.4, which is not a significant result.
• Let’s look at the table for χ²calc = 0.4.

6 Heads, 4 Tails
• Our χ²calc = 0.4 value corresponds to a p value somewhere between 0.70 and 0.50.
– In the df = 1 row of the table, 0.4 falls between 0.15 (p = 0.70) and 0.46 (p = 0.50).
– So it’s about 60% likely to be chance that we got 6 heads. Makes sense.
• Computer software can often calculate an exact p value for you, but for our purposes we’ll use tables.

Chi-Squared Summary
• o is “observed”
– What you found.
• e is “expected”
– What you would have gotten if there were no difference.
• Σ (sigma) means “sum of”
– Add all the (o − e)²/e results together:
$$\chi^2 = \sum \frac{(o - e)^2}{e}$$
• Look up what you get for χ² on a chi-squared table with the right “degrees of freedom” under p = 0.05.
– If your χ² value is higher, it’s a significant difference!
– If not, find the closest p value.

Scientific Example: Chantix™
• Remember Chantix? The anti-smoking drug we discussed earlier in the year?
– How do we relate this to chi-squared testing?
• First, what’s the null hypothesis?
– Chantix has no effect on smoking cessation.
• The observed data?
– How many smokers quit.
• The expected data?
– How many smokers quit…on a placebo.
• Degrees of freedom?
– One. You either quit or you don’t.

Final Example
• Suppose you have a generation of worms in which a disease strikes only the males.
• So, in generation one, we find 100 males and 900 females.
• A biologist checks on the population when the tenth generation is alive and finds 12,000 males and 13,000 females.
• Are the male worms evolving resistance to the disease?

Final Example
Generation 1: 100 ♂, 900 ♀
Generation 10: 12,000 ♂, 13,000 ♀
• Finding observed is easy: we observed 12,000 ♂ and 13,000 ♀ worms in Generation 10.
– But what about expected?
• If the male worms are not evolving resistance, and Generation 10 totals 25,000 worms, the same proportion of the population should be male.
– Generation 1 was 10% male (100 ♂/1000 total).
– Generation 10 should also be 10% male, or 2500 worms. (0.1 × 25,000)
• So expected is 2500 ♂ and 22,500 ♀. (25,000 − 2500)

Final Example
• Now do the chi-squared analysis:
– ♂: (12,000 − 2500)² / 2500 = 36,100
– ♀: (13,000 − 22,500)² / 22,500 = 4011.11
– Summing those equals 40,111.11, which is χ²calc.
• Now look up χ²crit with df = 1 (since you’re either male or female).
– I’ll use the table you’ll have on the AP Exam…
– You can see that χ²calc (40,111.11) is much larger than χ²crit (3.84 at p = 0.05).
• Thus, we have significance and the worms are evolving resistance!

Chi-Squared Takeaways
• χ² increases with greater differences between data sets.
• So, to be confident it is not a chance effect, you need a χ²calc from the test that is bigger than the critical value listed on the table.
• With more degrees of freedom, you need an even larger difference between the data sets.
• At the end of the test, the big number to report is p.
– There are lots of statistical tests out there that use lots of unique numbers (unique like χ²calc is to chi-squared), but all tests report a p value at the end. Everyone knows the p value.
• Now let’s get to some M&Ms…

M&M Chi-Squared Activity
• Here’s the idea:
– Mars says they measure out how many M&Ms of various colors are in a bag.
– But are they really all equal? How can we tell?
• Perform a chi-squared test to find out!
– Count the number of each color in your bag.
– Convert the given percentages to numbers (no rounding necessary).
– Complete the test and find out if your bag is significantly different from what Mars calls standard.
– Note: We will pool all our data for the second half of the lab during the next class.
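
The steps from this tutorial (add up (o − e)²/e for every outcome, then compare the total against χ²crit at p = 0.05) can be sketched in Python. This is only a minimal sketch and not part of the original slides: the names chi_squared, is_significant, and CHI2_CRIT_P05 are made up for illustration, and the critical values are copied from the 0.05 column of the table in this tutorial, so it only covers df 1 through 5.

```python
# Critical values at p = 0.05, copied from the tutorial's table (df 1 to 5).
# The dictionary name is an assumption made for this sketch.
CHI2_CRIT_P05 = {1: 3.84, 2: 5.99, 3: 7.82, 4: 9.49, 5: 11.07}


def chi_squared(observed, expected):
    """Return chi-squared calc: the sum of (o - e)^2 / e over all outcomes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def is_significant(observed, expected):
    """Reject the null hypothesis when chi-squared calc >= chi-squared crit.

    Uses df = number of outcomes - 1 and the p = 0.05 column only.
    """
    df = len(observed) - 1
    return chi_squared(observed, expected) >= CHI2_CRIT_P05[df]


# Coin flips from the tutorial: 60/40 out of 100, and 6/4 out of 10.
coin_100 = chi_squared([60, 40], [50, 50])
coin_10 = chi_squared([6, 4], [5, 5])

# Worm example: expected counts come from Generation 1's male fraction.
total = 12_000 + 13_000
male_fraction = 100 / 1000
expected_males = male_fraction * total
worms = chi_squared(
    [12_000, 13_000],
    [expected_males, total - expected_males],
)

print("100 flips:", coin_100, is_significant([60, 40], [50, 50]))
print("10 flips:", coin_10, is_significant([6, 4], [5, 5]))
print("worms:", round(worms, 2))
```

For the M&M activity, the same function would take your color counts as observed and each given percentage times your bag total as expected; df would be the number of colors minus one.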