Notes - Ch. 12

Chapter 12: Comparing Counts
“Goodness-of-Fit”:
(How closely do the observed counts fit the “null” model?) A goodness-of-fit test is a hypothesis
test of whether the observed counts are consistent with the “null” model. There is no confidence
interval, since there is no single parameter of interest.
Assumptions and Conditions:
- Counted Data Condition: check that the data are counts for the categories
- Independence Assumption
  - Randomization Condition: the individuals who have been counted and whose counts
    are available for analysis should be from a random sample of some population
- Sample Size Assumption: we need enough data for the methods to work
  - Expected Cell Frequency Condition: we should expect to see at least 5 individuals in
    each cell
Chi-square (or chi-squared) Statistic: the test statistic for this type of calculation
$$\chi^2 = \sum_{\text{all cells}} \frac{(\text{Obs} - \text{Exp})^2}{\text{Exp}}$$
*this is based on a family of sampling distribution models called the chi-square models:
- differ only in the number of degrees of freedom
- degrees of freedom is n – 1, where n is the number of categories (not sample size)
One-Sided vs. Two-Sided:
- the Chi-Square test is ALWAYS one-sided (we’re only interested in high values of the
statistic)
- a large value of $\chi^2$ means we will reject the null hypothesis (rejecting the null means the
model didn’t fit)
Steps for Chi-Square Calculations:
1. Find the expected values
2. Compute the residuals (Observed – Expected)
3. Square the residuals
4. Compute the components: $\frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$
5. Find the sum of the components (this is the chi-square statistic)
6. Find the degrees of freedom
7. Test the hypothesis (find the P-value for your calculated chi-square statistic)
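The steps above can be sketched in Python. This is only a minimal illustration: the function name `chi_square_steps` and the die-roll counts are made up for the example and do not come from the textbook. Step 7 (the P-value) is left to a chi-square table or calculator.

```python
def chi_square_steps(observed, null_proportions):
    """Follow steps 1-6 for a goodness-of-fit calculation.

    observed: list of counts, one per category
    null_proportions: proportion the null model assigns to each category
    """
    total = sum(observed)
    # Step 1: expected values under the null model
    expected = [total * p for p in null_proportions]
    # Step 2: residuals (Observed - Expected)
    residuals = [o - e for o, e in zip(observed, expected)]
    # Step 3: squared residuals
    squared = [r ** 2 for r in residuals]
    # Step 4: components, (Observed - Expected)^2 / Expected
    components = [s / e for s, e in zip(squared, expected)]
    # Step 5: the chi-square statistic is the sum of the components
    chi_square = sum(components)
    # Step 6: degrees of freedom = number of categories - 1
    df = len(observed) - 1
    return expected, components, chi_square, df


# Hypothetical data: 60 rolls of a die, null model = fair die
rolls = [8, 12, 9, 11, 6, 14]
expected, components, chi_square, df = chi_square_steps(rolls, [1 / 6] * 6)
print(f"chi-square = {chi_square:.2f}, df = {df}")
```

Returning the individual components as well as their sum makes it easy to see which categories contribute most to the statistic.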
**Step-by-Step: pg. 609
Does your zodiac sign determine how successful you will be in later life? Fortune magazine collected
the zodiac signs of 256 heads of the largest 400 companies. Here is the number of births for each
sign:
Sign          Births
Aries         23
Taurus        20
Gemini        18
Cancer        23
Leo           20
Virgo         19
Libra         18
Scorpio       21
Sagittarius   19
Capricorn     22
Aquarius      24
Pisces        29
Is this enough to claim that successful people are more likely to be born under some signs than
others?
H0: Births are uniformly distributed over zodiac signs.
HA: Births are not uniformly distributed over zodiac signs.
Counted Data Condition: I have counts of the number of executives in 12 categories.
Randomization Condition: This is a convenience sample of executives, but there’s no reason to
suspect bias.
Expected Cell Frequency Condition: The null hypothesis expects that 1/12 of the 256 births, or
21.333, should occur in each sign. These expected values are all at least 5, so the condition is
satisfied.
The conditions are satisfied, so I’ll use a $\chi^2$ model with 12 - 1 = 11 degrees of freedom and do a
chi-squared goodness-of-fit test.
The expected value for each zodiac sign is 21.333.
$$\chi^2 = \sum_{\text{all cells}} \frac{(\text{Obs} - \text{Exp})^2}{\text{Exp}} = \frac{(23 - 21.333)^2}{21.333} + \frac{(20 - 21.333)^2}{21.333} + \cdots = 5.094 \quad \text{(for all 12 signs)}$$
*sketch and label model
P-value = P($\chi^2$ > 5.094) = 0.926
The P-value of 0.926 says that if the zodiac signs of executives were in fact distributed uniformly, an
observed chi-square value of 5.09 or higher would occur about 93% of the time. This certainly isn’t
unusual, so I fail to reject the null hypothesis, and conclude that these data show virtually no evidence
of nonuniform distribution of zodiac signs among executives.
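The zodiac calculation can be checked with a short Python sketch. The counts are the ones from the table above. The helper `chi2_upper_tail` is my own addition, not something from the textbook: it gets the upper-tail probability from a standard series for the incomplete gamma function, so that only the standard library is needed. In practice a calculator or a statistics package would do this step.

```python
import math


def chi2_upper_tail(x, df):
    """P(chi-square with df degrees of freedom > x), via a series expansion."""
    if x <= 0:
        return 1.0
    a = df / 2.0
    z = x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z)
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower


births = {
    "Aries": 23, "Taurus": 20, "Gemini": 18, "Cancer": 23,
    "Leo": 20, "Virgo": 19, "Libra": 18, "Scorpio": 21,
    "Sagittarius": 19, "Capricorn": 22, "Aquarius": 24, "Pisces": 29,
}
observed = list(births.values())
expected = sum(observed) / len(observed)  # 256 / 12, the same for every sign
chi_square = sum((o - expected) ** 2 / expected for o in observed)
df = len(observed) - 1
p_value = chi2_upper_tail(chi_square, df)
print(f"chi-square = {chi_square:.3f}, df = {df}, P-value = {p_value:.2f}")
```

The statistic and degrees of freedom should agree with the hand calculation above, and the P-value should land near the "about 93%" quoted in the conclusion.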
Chi-Square Test of Homogeneity:
*this time we find the expected counts directly from the data
*the degrees of freedom are computed differently: (R-1)(C-1) instead of n – 1
*the standard null hypothesis is that the distribution does not change from group to group (we already
know how to test whether two proportions are the same; this test handles more than two groups)
Assumptions and Conditions:
- Counted Data Condition: the data must be counts; we can’t do a chi-square test of homogeneity on
proportions or measurements
- Randomization Condition: data are from independently chosen random samples or from subjects who were
assigned at random to treatment groups
- Expected Cell Frequency Condition: must expect at least 5 in each cell
Calculations:
- Using your null hypothesis (always write your null hypothesis in words), find the expected
values (since the hypothesis is that the distribution is the same in each group, apply the overall
proportion for each category to each group’s total to get the expected counts)
- Check the Expected Cell Frequency Condition
- Compute the component for each cell of the table; add them together to get the chi-square
statistic
$$\chi^2 = \sum_{\text{all cells}} \frac{(\text{Obs} - \text{Exp})^2}{\text{Exp}}$$
- Degrees of Freedom: (R-1)(C-1), where R=rows and C=columns
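As a rough sketch of these calculations in Python: the function name `homogeneity_chi_square` and the counts are hypothetical, chosen only so the arithmetic is easy to follow by hand. Each expected count is (row total × column total) / grand total, which is the overall proportion for that row applied to each group's size.

```python
def homogeneity_chi_square(table):
    """table: list of rows; each row holds one category's counts across the groups."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected counts under the null: same distribution in every group
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    # Expected Cell Frequency Condition: at least 5 expected in each cell
    condition_ok = all(e >= 5 for row in expected for e in row)
    # Sum the components (Obs - Exp)^2 / Exp over all cells
    chi_square = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    # Degrees of freedom: (R - 1)(C - 1)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return expected, condition_ok, chi_square, df


# Hypothetical counts: rows = response (yes / no / unsure), columns = three groups
counts = [
    [30, 40, 50],
    [50, 40, 30],
    [20, 20, 20],
]
expected, condition_ok, chi_square, df = homogeneity_chi_square(counts)
print(f"condition met: {condition_ok}, chi-square = {chi_square:.2f}, df = {df}")
```

Here every group has 100 subjects, so the expected counts are 40, 40, and 20 in each column. Only the first two rows contribute to the statistic.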
**Step-by-Step: pg. 616-617
Examining the Residuals: this needs to be done only when rejecting the null
To standardize a cell’s residual:
$$c = \frac{(\text{Obs} - \text{Exp})}{\sqrt{\text{Exp}}}$$
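A small Python sketch of the standardized residual, (Obs - Exp) / sqrt(Exp), for each cell. The observed and expected counts are hypothetical, and `standardized_residuals` is just an illustrative name.

```python
import math


def standardized_residuals(observed, expected):
    """Return (Obs - Exp) / sqrt(Exp) for each cell."""
    return [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]


# Hypothetical observed and expected counts for four cells
observed = [25, 50, 45, 40]
expected = [40.0, 40.0, 40.0, 40.0]
c = standardized_residuals(observed, expected)

# The cell with the largest residual in absolute value deserves the first look
largest = max(range(len(c)), key=lambda i: abs(c[i]))
for i, value in enumerate(c):
    flag = "  <- largest" if i == largest else ""
    print(f"cell {i}: {value:+.2f}{flag}")
```

Squaring each standardized residual gives that cell's chi-square component, so the squares add up to the chi-square statistic.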
**Just Checking: pg. 616-617
Chi-square Test for Independence:
use when you have subjects from a single group
categorized on two categorical variables
Assumptions and Conditions:
- Counted Data Condition: the data must be counts; we can’t do a chi-square test for independence on
proportions or measurements
- Randomization: the observed counts are based on data from a random sample
- Expected Cell Frequency Condition: must expect at least 5 in each cell
Contingency tables: categorize counts on two (or more) variables so that we can see whether the
distribution of counts on one variable is contingent on the other. (Is one independent of the other?)
Remember: Independence… P(A) = P(A|B)
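To make the reminder concrete, here is a small Python sketch comparing P(A) with P(A|B) in a 2×2 contingency table. Both tables and the function name `prob_a_and_a_given_b` are invented for illustration.

```python
def prob_a_and_a_given_b(table):
    """table[0] = counts with A, table[1] = counts without A;
    column 0 = counts with B, column 1 = counts without B."""
    grand_total = sum(sum(row) for row in table)
    p_a = sum(table[0]) / grand_total      # P(A): row total over grand total
    b_total = table[0][0] + table[1][0]    # everyone with B
    p_a_given_b = table[0][0] / b_total    # P(A | B): restrict to the B column
    return p_a, p_a_given_b


# Hypothetical table where A and B are independent
independent = [
    [30, 20],
    [60, 40],
]
# Hypothetical table where A is more common among those with B
dependent = [
    [45, 5],
    [45, 55],
]
for name, table in [("independent", independent), ("dependent", dependent)]:
    p_a, p_a_given_b = prob_a_and_a_given_b(table)
    print(f"{name}: P(A) = {p_a:.3f}, P(A|B) = {p_a_given_b:.3f}")
```

With real data the two probabilities will rarely match exactly. The chi-square test asks whether the gap is bigger than random sampling would explain.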
*use same mechanics as chi-square test for homogeneity
Conditions: Expected Cell Frequency Condition and Randomization
**Step-by-Step: pg. 620-621
Examine the Residuals…
*find the standardized residuals and see which are the largest (think absolute value)
**BE CAREFUL** if the expected count was below 5, the residuals need a closer look…you may
need to combine categories to get a higher expected count (only combine if it makes sense!)
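One way to picture combining categories, as a hedged Python sketch. The table is hypothetical, `expected_counts` and `combine_rows` are illustrative helper names, and whether two categories really belong together is a judgment call the code cannot make.

```python
def expected_counts(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def combine_rows(table, i, j):
    """Return a new table in which rows i and j are merged into one row."""
    merged = [a + b for a, b in zip(table[i], table[j])]
    rest = [row for k, row in enumerate(table) if k not in (i, j)]
    return rest + [merged]


# Hypothetical counts; the last two rows are sparse categories
table = [
    [40, 35],
    [30, 45],
    [3, 4],
    [2, 5],
]
before = min(e for row in expected_counts(table) for e in row)
combined = combine_rows(table, 2, 3)
after = min(e for row in expected_counts(combined) for e in row)
print(f"smallest expected count: before = {before:.2f}, after = {after:.2f}")
```

Merging the two sparse rows lifts the smallest expected count above 5, at the cost of one row, and therefore fewer degrees of freedom.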
**TI Tips: pg. 623-624
**Just Checking: pg. 624
Chi-square and Causation
NEVER assume causation...remember only controlled experiments can determine causation!
The Chi-square test for independence can only show whether there is evidence of an association
between the variables (think correlation!)
Plus…there’s no way to determine which variable is the cause.