Unit #8 – Chapter 13: The Chi-Square Distribution

Why use it? Two purposes
• Chi-square analysis is used primarily for categorical (frequency) data:
(1) We measure the “goodness of fit” between the observed outcome and the expected outcome for a single variable.
(2) With two variables, we use the same basic approach to test whether they are independent of one another.

Section 13-1: “Goodness of Fit” Tests
• Suppose we want to know how people in a particular area will vote. We take an SRS of n = 60 people and ask which party they prefer:

Republican   Democrat   Independent
    20          30          10

• How will we analyze what’s really going on?

Goodness of Fit Example, continued
• Null hypothesis: There is no preference; all three parties are equally liked.
• Solution: use chi-square analysis to determine whether our outcome differs from what we would expect if there were no preference:
$\chi^2 = \sum \frac{(O - E)^2}{E}$
where O is the observed count and E is the expected count for each category.

Calculations
             Republican   Democrat   Independent
Observed         20          30          10
Expected         20          20          20

• If there were truly no preference (as the null hypothesis states), we would expect equal numbers for each party: 60/3 = 20.
• Plug into the formula:
$\chi^2 = \frac{(20-20)^2}{20} + \frac{(30-20)^2}{20} + \frac{(10-20)^2}{20} = 0 + 5 + 5 = 10$

More Calculations
• With 3 categories, df = 3 − 1 = 2, so $\chi^2(2) = 10$.
• The critical value at α = .05 is $\chi^2_{.05}(2) = 5.99$.
• Since 10 > 5.99, we reject H0.
• More on p-values on the board…

Conclusion
• Note that all we can really conclude is that our data differ from the outcome expected under the null hypothesis; that is all the alternative hypothesis says.
– Although it appears that the district will vote Democratic, we can only conclude that the three parties are not equally preferred.
– The calculations would have been the same regardless of the order in which the outcome categories were written.
– In other words, it is a non-directional test regardless of the prediction.

Summary: The Chi-square distribution
• Skewed to the right, but it becomes more symmetric (and normal-looking) as the degrees of freedom increase
• No, this is NOT a normal distribution, a t-distribution, or a uniform distribution; it’s a different thing!
• So: different graph, different numbers, etc. A chi-square variable with n degrees of freedom is the sum of n squared independent standard normal variables:
$\chi^2(n) = \sum_{i=1}^{n} z_i^2$

Conditions for Chi-Square Tests
• Counts:
– We need every expected frequency to be at least 5.
• Inclusion of non-occurrences:
– Must include all responses, not just the positive ones.
• Independence:
– This does not mean that the variables are independent or related (that is what the test of independence is for); rather, as with our t-tests, the observations (data points) must not have any bearing on one another. So, as usual, make sure the data come from an SRS.
• To help with the last two, make sure that your N equals the total number of people who responded.

Section 13-2: Tests of Independence between 2 Categorical Variables
• What do Stats kids do with their free time?
• We ask an SRS of n = 200 students and get these observed results:

             TV    Nap    Study    Stare at Ceiling
Males        30     40      20            10
Females      20     30      40            10

• Is there a relationship between gender (X) and what the Stats kids do with their free time (Y)?

             TV    Nap    Study    Stare at Ceiling    Totals
Males        30     40      20            10             100
Females      20     30      40            10             100
Totals       50     70      60            20             200

• Expected count for the cell in row i and column j: $E_{ij} = \frac{R_i \cdot C_j}{N}$, where $R_i$ is the row total, $C_j$ is the column total, and N is the grand total.
• Example for Males/TV: the expected count is (100 · 50)/200 = 25.

The same chart, with the Observed and (Expected) counts
             TV        Nap       Study     Stare at Ceiling    Totals
Males        30 (25)   40 (35)   20 (30)   10 (10)             100
Females      20 (25)   30 (35)   40 (30)   10 (10)             100
Totals       50        70        60        20                  200

• df = (R − 1)(C − 1) = (2 − 1)(4 − 1) = 3, where R = number of rows and C = number of columns

Calculations and Conclusion
• Summing $\frac{(O - E)^2}{E}$ over all eight cells:
$\chi^2 = \frac{(30-25)^2}{25} + \frac{(40-35)^2}{35} + \frac{(20-30)^2}{30} + \frac{(10-10)^2}{10} + \frac{(20-25)^2}{25} + \frac{(30-35)^2}{35} + \frac{(40-30)^2}{30} + \frac{(10-10)^2}{10} \approx 10.10$
• Chi-square test of independence: $\chi^2(3) = 10.10$, and the critical value at α = .05 is $\chi^2_{.05}(3) = 7.82$.
• Since 10.10 > 7.82, reject H0 (more on p-values on the board).
• Conclusion: There is some relationship between gender and how Stats students spend their free time.

Conditions for both types of Chi-Square Tests
• Counts:
– Rule of thumb: every expected frequency should be at least 5.
• Inclusion of non-occurrences:
– Must include all responses, not just the positive ones.
• Independence:
– This does not mean that the variables are independent or related (that is what the test of independence is for); rather, as with our t-tests, the observations (data points) must not have any bearing on one another. So, as usual, make sure the data come from an SRS.
• To help with the last two, make sure that your N equals the total number of people who responded.

Other
• An important point about the non-directional nature of the test: the chi-square test by itself cannot address specific hypotheses about how the results will come out.
>> In other words, the null hypothesis is always “no relationship” or “no preference”, while the alternative hypothesis is always “there is some relationship” or “there is some sort of preference”.

For example, we asked: “What do Stats kids do with their free time?”
             TV    Nap    Study    Stare at Ceiling
Males        30     40      20            10
Females      20     30      40            10

• Even though we rejected the null hypothesis, concluding that gender and free-time behavior are associated with each other, that is our only conclusion.
• We can’t also conclude that males nap more or that females study more, for example, even though it looks that way, because that wasn’t in our alternative hypothesis to begin with.

Finally…
• There are two different types of Chi-Square Tests:
> Chi-Square Goodness of Fit Test (13-1)
> Chi-Square Test of Independence (13-2)
* When you write the name of the test (the second C in HCCC), you must specify which one it is; don’t just write “Chi-Square Test.”
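The two worked examples in this unit (the voting goodness-of-fit test from Section 13-1 and the free-time test of independence from Section 13-2) can be checked with a short script. This is a minimal sketch in Python using only the standard library; the function names goodness_of_fit, expected_counts, independence, and chi_square_sf are hypothetical helpers written for illustration, not part of the textbook or any library. chi_square_sf computes the upper-tail probability (the p-value the slides leave for the board) with the standard recursion for the chi-square distribution at integer degrees of freedom, and the cutoffs 5.99 and 7.82 are the table values quoted on the slides. The sketch assumes every expected count is positive.

```python
import math


def chi_square_sf(x: float, df: int) -> float:
    """P(X >= x) for a chi-square variable X with an integer df >= 1 (the p-value)."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    # Start the recursion at Q(x; 0) = 0 for even df, or Q(x; 1) = erfc(sqrt(x/2)) for odd df.
    if df % 2 == 0:
        q, k = 0.0, 0
    else:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    while k < df:
        # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * e^(-x/2) / Gamma(k/2 + 1)
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


def goodness_of_fit(observed, expected):
    """Return (chi-square statistic, df) for a goodness-of-fit test; df = categories - 1."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1


def expected_counts(table):
    """Expected count for every cell of a two-way table: (row total * column total) / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def independence(table):
    """Return (chi-square statistic, df, expected counts); df = (R - 1)(C - 1)."""
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected


if __name__ == "__main__":
    # Section 13-1: 60 voters; under "no preference" we expect 60/3 = 20 per party.
    stat, df = goodness_of_fit([20, 30, 10], [20, 20, 20])
    print(stat, df, round(chi_square_sf(stat, df), 4))  # prints: 10.0 2 0.0067
    print("Reject H0" if stat > 5.99 else "Fail to reject H0")  # 5.99 = table value from the slides

    # Section 13-2: rows = Males, Females; columns = TV, Nap, Study, Stare at Ceiling.
    free_time = [[30, 40, 20, 10],
                 [20, 30, 40, 10]]
    stat, df, expected = independence(free_time)
    print(round(stat, 2), df, expected[0])  # prints: 10.1 3 [25.0, 35.0, 30.0, 10.0]
    # Condition check: every expected count should be at least 5.
    print(all(e >= 5 for row in expected for e in row))
    print("Reject H0" if stat > 7.82 else "Fail to reject H0")  # 7.82 = table value from the slides
```

The recursion is used here only to avoid a third-party dependency. If SciPy is available, scipy.stats.chisquare and scipy.stats.chi2_contingency compute the same statistics; chi2_contingency applies a continuity correction only when df = 1, so it should agree with the 10.10 above for this 2 × 4 table.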