          Blue   Brown   Green/Hazel   Total
Males       6      20         6          32
Females     4      16        12          32
Total      10      36        18          64

1. What percent of females have brown eyes?
2. What percent of brown-eyed students are female?
3. What percent of students are brown-eyed females?
4. What's the distribution of eye color? (Percent of each color for all participants)
5. What's the conditional distribution of eye color for males?
6. Compare the percent of females among blue-eyed students to the percent of all students who are female.

Chapter 26 Part 1
COMPARING COUNTS

Is an observed distribution consistent with what we expect? Are observed differences among several distributions large enough to be significant? These are questions that we will answer using a χ² (chi-squared) model.

The chi-squared tests only involve categorical variables. When we have quantitative variables, we do hypothesis testing for means (the t-test).

χ² Models
• Right-skewed distribution
• The distribution is less skewed as degrees of freedom increase
• The mean of the model, or the expected value, is equal to the degrees of freedom

Three types of testing in this chapter:
1. A goodness-of-fit test compares the distribution of observed outcomes for a single categorical variable to the expected outcomes predicted by a probability model to see if the model is viable.
   → One sample, one variable
2. A test of homogeneity compares observed distributions for several groups to each other to see if there is evidence of differences among the respective populations.
   → Several groups, one variable
3. A test of independence cross-categorizes one group on two variables to see if there is an association between them.
   → One sample, two variables

Would you use a chi-square goodness-of-fit test, a chi-square test of homogeneity, a chi-square test of independence, or some other test?

1) A brokerage firm wants to see whether the type of account a customer has (Silver, Gold, Platinum) affects the type of trades that customer makes (in person, by phone, or by Internet).
It collects a random sample of trades made for its customers over the past year and performs a test.
   Answer: Chi-square test of independence (one sample, two variables: type of account and type of trade).

2) That brokerage firm also wants to know if the type of account affects the size of the account (in dollars). It performs a test to see if the mean size of the account is the same for the three account types.
   Answer: Other test. Account size is quantitative.

3) The academic research office at a large community college wants to see whether the distribution of courses chosen (Humanities, Social Science, or Science) is different for its residential and nonresidential students. It assembles last semester's data and performs a test.
   Answer: Chi-square test of homogeneity (two groups, one variable: courses).

Assumptions & Conditions for a χ² test:
1. Expected Cell Frequency Condition: All expected cell frequencies must be at least 5. (You must show each of the frequencies; you can't state that they are all 5 or greater without giving the values.)
2. Independence Assumption: The individuals of a sample need to be independent of each other.
   *This is not necessary if testing for homogeneity.
3. Randomization Condition: Samples need to be drawn randomly.
4. Counted Data Condition: The data must be counts or frequencies.

Example:
Medical researchers enlisted 90 subjects for an experiment comparing treatments for depression. The subjects were randomly divided into three groups and given pills to take for a period of three months. Unknown to them, one group received a placebo, the second group the "natural" remedy St. John's Wort, and the third group the prescription drug Posrex. After six months, psychologists and physicians evaluated the subjects to see if their depression had returned.

                         Placebo   St. John's Wort   Posrex
Depression returned         24           22            14
No sign of depression        6            8            16

Step 1: Hypotheses
H0: The rate of recurrence is the same for all three treatments.
HA: The rate of recurrence differs for some treatments.
Step 2: Check Conditions and Model
These are counts of categorical data.
Subjects were randomly assigned to treatments.
We still need to check the expected counts before continuing.
The degrees of freedom = (#rows − 1)(#columns − 1)

Checking expected counts…
If the treatments are equally effective, the expected counts come from splitting each row total evenly among the 3 groups (each group has 30 subjects): 60/3 = 20 and 30/3 = 10. Expected counts are shown in parentheses.

                         Placebo    St. John's Wort   Posrex     Total
Depression returned      24 (20)       22 (20)        14 (20)      60
No sign of depression     6 (10)        8 (10)        16 (10)      30
Total                    30            30             30           90

All expected counts are greater than 5, so we can continue with a chi-square test for homogeneity with df = (2 − 1)(3 − 1) = 2.

Step 3: Mechanics
χ² = Σ (Observed − Expected)² / Expected

1. Find expected values
2. Compute residuals
3. Square residuals
4. Divide each by expected value
5. Add components (take the sum)
6. Find d.f. (if not done already)
7. Test hypotheses

χ² = (24 − 20)²/20 + (22 − 20)²/20 + (14 − 20)²/20 + (6 − 10)²/10 + (8 − 10)²/10 + (16 − 10)²/10
   = 0.8 + 0.2 + 1.8 + 1.6 + 0.4 + 3.6
   = 8.4

To get the P-value on TI-84: DIST 8: χ²cdf(χ² score, 999, df)
*We always do a right-tail test for chi-square.
P(χ² > 8.4) = 0.015

Step 4: Conclusion
Because the P-value is low, we reject the null hypothesis. There is strong evidence that the tested treatments are not all equally effective in preventing the recurrence of depression. It appears that people who took the prescription drug Posrex are more likely to remain free of the signs of depression than those who took a placebo or the natural remedy St. John's Wort.

Assignment: Ch 26 Homework: Page 642 #2-4
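
Optional check of the Step 3 mechanics: the arithmetic for the depression example can be reproduced with a short Python sketch. This is only an illustration, not part of the class procedure. The function names (expected_counts, chi_square_statistic, right_tail_p_even_df) are made up for this example, and the P-value helper relies on a closed-form formula that is valid only when the degrees of freedom are even (here df = 2), so it is not a general replacement for the calculator's χ²cdf.

```python
import math

# Observed counts from the depression example.
# Rows: depression returned / no sign of depression.
# Columns: Placebo, St. John's Wort, Posrex.
observed = [
    [24, 22, 14],
    [6, 8, 16],
]


def expected_counts(table):
    """Expected count for each cell = (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(table, expected):
    """Sum of (Observed - Expected)^2 / Expected over every cell."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def right_tail_p_even_df(x, df):
    """Right-tail probability P(chi-square > x), for EVEN df only.

    Assumption: uses the closed form exp(-x/2) * sum((x/2)^i / i!) for
    i = 0 .. df/2 - 1, which holds only when df is a positive even number.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2
    return math.exp(-half) * sum(
        half ** i / math.factorial(i) for i in range(df // 2)
    )


expected = expected_counts(observed)
statistic = chi_square_statistic(observed, expected)
df = (len(observed) - 1) * (len(observed[0]) - 1)  # (#rows - 1)(#columns - 1)
p_value = right_tail_p_even_df(statistic, df)

print("Expected counts:", expected)
print("Chi-square:", round(statistic, 1), " df:", df, " P-value:", round(p_value, 3))
```

The rounded values printed by the sketch can be compared against the hand computation in Step 3 (χ² = 8.4 with df = 2 and a P-value of about 0.015).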