Chapter 12: Comparing Counts

“Goodness-of-Fit” (How closely do the observed numbers fit the “null” model?): involves testing a hypothesis to determine whether the observed values fit the “null” model (there is no confidence interval, since there is no single parameter of interest)

Assumptions and Conditions:
Counted Data Condition: check that the data are counts for the categories
Independence Assumption
o Randomization Condition: the individuals who have been counted and whose counts are available for analysis should be from a random sample of some population
Sample Size Assumption: we need enough data for the methods to work
o Expected Cell Frequency Condition: we should expect to see at least 5 individuals in each cell

Chi-square (or chi-squared) Statistic: the test statistic for this type of calculation
$$\chi^2 = \sum_{\text{all cells}} \frac{(Obs - Exp)^2}{Exp}$$
*this is based on a family of sampling distribution models called the chi-square models:
- differ only in the number of degrees of freedom
- degrees of freedom is n – 1, where n is the number of categories (not the sample size)

One-Sided vs. Two-Sided:
- the chi-square test is ALWAYS one-sided (we’re only interested in high values of the statistic)
- a large value of χ² means we will reject the null hypothesis (rejecting the null means the model didn’t fit)

Steps for Chi-Square Calculations:
1. Find the expected values
2. Compute the residuals (Observed – Expected)
3. Square the residuals: (Observed – Expected)²
4. Compute the components: (Observed – Expected)² / Expected
5. Find the sum of the components (this is the chi-square statistic)
6. Find the degrees of freedom
7. Test the hypothesis (find the P-value for your calculated chi-square statistic)

**Step-by-Step: pg. 609
Does your zodiac sign determine how successful you will be in later life? Fortune magazine collected the zodiac signs of 256 heads of the largest 400 companies.
Here are the numbers of births for each sign:

Sign          Births
Aries         23
Taurus        20
Gemini        18
Cancer        23
Leo           20
Virgo         19
Libra         18
Scorpio       21
Sagittarius   19
Capricorn     22
Aquarius      24
Pisces        29

Is this enough to claim that successful people are more likely to be born under some signs than others?

H0: Births are uniformly distributed over zodiac signs.
HA: Births are not uniformly distributed over zodiac signs.

Counted Data Condition: I have counts of the number of executives in 12 categories.
Randomization Condition: This is a convenience sample of executives, but there’s no reason to suspect bias.
Expected Cell Frequency Condition: The null hypothesis expects that 1/12 of the 256 births, or 21.333, should occur in each sign. These expected values are all at least 5, so the condition is satisfied.

The conditions are satisfied, so I’ll use a χ² model with 12 – 1 = 11 degrees of freedom and do a chi-squared goodness-of-fit test.

The expected value for each zodiac sign is 21.333.
$$\chi^2 = \sum_{\text{all cells}} \frac{(Obs - Exp)^2}{Exp} = \frac{(23 - 21.333)^2}{21.333} + \frac{(20 - 21.333)^2}{21.333} + \cdots = 5.094 \quad \text{(over all 12 signs)}$$
*sketch and label the χ² model with 11 degrees of freedom
$$\text{P-value} = P(\chi^2_{11} > 5.094) = 0.926$$

The P-value of 0.926 says that if the zodiac signs of executives were in fact distributed uniformly, an observed chi-square value of 5.09 or higher would occur about 93% of the time. This certainly isn’t unusual, so I fail to reject the null hypothesis and conclude that these data show virtually no evidence of a nonuniform distribution of zodiac signs among executives.
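The seven steps and the zodiac calculation above can be checked with a short Python sketch. The function names `chi_square_gof` and `chi_square_upper_tail` are illustrative, not from any library, and the P-value uses a power series for the chi-square upper tail area in place of a printed table or calculator; treat it as a teaching sketch, not a validated statistics routine.

```python
import math

def chi_square_gof(observed, expected):
    """Steps 1-6: components (Obs - Exp)^2 / Exp, their sum, and df = categories - 1."""
    components = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    return sum(components), len(observed) - 1

def chi_square_upper_tail(x, df):
    """Step 7: P(chi^2_df > x), using the power series for the regularized
    lower incomplete gamma function P(df/2, x/2); the tail is 1 minus that."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a  # n = 0 term of the sum z^n / (a(a+1)...(a+n))
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)

# Zodiac counts from the Step-by-Step example, Aries through Pisces.
births = [23, 20, 18, 23, 20, 19, 18, 21, 19, 22, 24, 29]
n = sum(births)                                # 256 executives
expected = [n / len(births)] * len(births)     # uniform null: 256/12 = 21.333 per sign

# Expected Cell Frequency Condition: every expected count should be at least 5.
assert all(e >= 5 for e in expected)

chi2, df = chi_square_gof(births, expected)
p_value = chi_square_upper_tail(chi2, df)
print(f"chi-square = {chi2:.3f}, df = {df}, P-value = {p_value:.3f}")
```

Run as a script, it should reproduce the χ² value of 5.094 with 11 degrees of freedom and a P-value of about 0.93, in line with the hand calculation.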
Chi-Square Test of Homogeneity:
*this time we find the expected counts directly from the data
*the degrees of freedom are slightly different
*there is a standard null hypothesis that the distribution does not change from group to group (we already know how to test whether two proportions are the same, but now we have more than two groups)

Assumptions and Conditions:
- Counted Data Condition: the data must be counts; we can’t do a chi-square test of homogeneity for proportions or measurements
- Data are from independently chosen random samples or from subjects who were assigned at random to treatment groups
- Expected Cell Frequency Condition: must expect at least 5 in each cell

Calculations:
Using your null hypothesis (always write your null hypothesis in words), find the expected values. Since the null hypothesis says the category proportions are the same in every group, use the overall proportion for each category: Expected = (overall proportion for that category) × (group total), which is the same as (row total × column total) / grand total.
Check the Expected Cell Frequency Condition.
Compute the component for each cell of the table; add them together to get the chi-square statistic:
$$\chi^2 = \sum_{\text{all cells}} \frac{(Obs - Exp)^2}{Exp}$$
Degrees of Freedom: (R – 1)(C – 1), where R = number of rows and C = number of columns

**Step-by-Step: pg. 616-617

Examining the Residuals: this needs to be done only when rejecting the null
To standardize a cell’s residual:
$$c = \frac{Obs - Exp}{\sqrt{Exp}}$$

**Just Checking: pg. 616-617

Chi-Square Test for Independence: use when you have subjects from a single group categorized on two categorical variables

Assumptions and Conditions:
- Counted Data Condition: we can’t do a chi-square test of independence for proportions or measurements
- Randomization Condition: the observed counts are based on data from a random sample
- Expected Cell Frequency Condition: must expect at least 5 in each cell

Contingency tables: categorize counts on two (or more) variables so that we can see whether the distribution of counts on one variable is contingent on the other. (Is one independent of the other?)
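For a two-way table, the expected counts come straight from the table’s totals, and the arithmetic is the same whether the question is homogeneity (several groups) or independence (one group, two variables). A minimal Python sketch, assuming the table is stored as a list of rows; the counts below are hypothetical, made up only to exercise the arithmetic, and the function names are illustrative.

```python
def expected_counts(table):
    """Expected count for each cell = (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

def chi_square_two_way(table):
    """Return (chi-square statistic, degrees of freedom) for an R x C table of counts."""
    exp = expected_counts(table)
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, exp)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)  # (R - 1)(C - 1)
    return chi2, df

# HYPOTHETICAL counts for illustration only: 3 groups (rows) x 2 outcomes (columns).
table = [[30, 20],
         [25, 25],
         [15, 35]]
exp = expected_counts(table)

# Expected Cell Frequency Condition: every expected count should be at least 5.
print("condition met:", all(e >= 5 for row in exp for e in row))

chi2, df = chi_square_two_way(table)
print(f"chi-square = {chi2:.3f}, df = {df}")
```

For this made-up table every expected count is at least 5, χ² works out to 9.375, and df = (3 – 1)(2 – 1) = 2.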
Remember: Independence… P(A) = P(A|B)
*use the same mechanics as the chi-square test for homogeneity
Conditions: Expected Cell Frequency Condition and Randomization

**Step-by-Step: pg. 620-621

Examine the Residuals…
*find the standardized residuals and look for the largest ones (think absolute value)
**BE CAREFUL** if an expected count was below 5, the residuals need a closer look… you may need to combine categories to get a higher expected count (only combine if it makes sense!)

**TI Tips: pg. 623-624
**Just Checking: pg. 624

Chi-Square and Causation
NEVER assume causation... remember, only controlled experiments can determine causation! The chi-square test for independence can only tell us whether there is evidence that the variables are associated, that is, not independent (think correlation!). Plus… there’s no way to determine which variable is the cause.
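Examining the residuals can be sketched the same way: compute c = (Obs – Exp)/√Exp for every cell, pick the cell with the largest absolute value, and flag any cell whose expected count is below 5. Again, the counts are hypothetical and the helper name `standardized_residuals` is illustrative.

```python
import math

def standardized_residuals(table):
    """c = (Obs - Exp) / sqrt(Exp) for each cell, with
    Exp = row total * column total / grand total.
    Also returns the expected counts so small ones can be flagged."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    exp = [[r * c / grand for c in col_totals] for r in row_totals]
    resid = [[(o - e) / math.sqrt(e) for o, e in zip(obs_row, exp_row)]
             for obs_row, exp_row in zip(table, exp)]
    return resid, exp

# HYPOTHETICAL counts for illustration only (rows and columns are two categorical variables).
table = [[30, 20],
         [25, 25],
         [15, 35]]
resid, exp = standardized_residuals(table)

# Largest residual in absolute value: the cell that departs most from the null model.
cells = [(i, j) for i in range(len(table)) for j in range(len(table[0]))]
worst = max(cells, key=lambda ij: abs(resid[ij[0]][ij[1]]))

# BE CAREFUL: flag cells whose expected count is below 5.
small = [(i, j) for (i, j) in cells if exp[i][j] < 5]

print("largest |c| at cell", worst, "=", round(resid[worst[0]][worst[1]], 2))
print("cells with expected count < 5:", small)
```

For this made-up table the largest |c| is about 1.73, in the third-row, first-column cell, where fewer cases were observed than expected, and no cell has an expected count below 5. Squaring the standardized residuals and adding them gives back the chi-square statistic, since each c² is exactly that cell’s component.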