Chapter 26 Part 1

           Blue   Brown   Green/Hazel   Total
Males         6      20             6      32
Females       4      16            12      32
Total        10      36            18      64
1. What percent of females have brown eyes?
2. What percent of brown-eyed students are female?
3. What percent of students are brown-eyed females?
4. What’s the distribution of eye color? (Percent of each color for all
participants)
5. What’s the conditional distribution of eye color for males?
6. Compare the percent of females among blue-eyed students to the
percent of all students who are female.
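One way to check the warm-up answers is a short Python sketch. The dictionary layout and the helper name `pct` are illustrative choices, not part of the original exercise; the counts come straight from the table above.

```python
# Two-way table of counts: rows are sex, columns are eye color.
counts = {
    "Males":   {"Blue": 6, "Brown": 20, "Green/Hazel": 6},
    "Females": {"Blue": 4, "Brown": 16, "Green/Hazel": 12},
}

def pct(part, whole):
    """Return part/whole as a percent rounded to one decimal place."""
    return round(100 * part / whole, 1)

row_totals = {sex: sum(row.values()) for sex, row in counts.items()}
col_totals = {color: sum(counts[sex][color] for sex in counts)
              for color in counts["Males"]}
grand_total = sum(row_totals.values())

# 1. Percent of females with brown eyes (condition on the Females row).
q1 = pct(counts["Females"]["Brown"], row_totals["Females"])
# 2. Percent of brown-eyed students who are female (condition on the Brown column).
q2 = pct(counts["Females"]["Brown"], col_totals["Brown"])
# 3. Percent of all students who are brown-eyed females (use the grand total).
q3 = pct(counts["Females"]["Brown"], grand_total)
# 4. Marginal distribution of eye color.
q4 = {color: pct(n, grand_total) for color, n in col_totals.items()}
# 5. Conditional distribution of eye color for males.
q5 = {color: pct(n, row_totals["Males"]) for color, n in counts["Males"].items()}
# 6. Percent female among blue-eyed students vs. percent female overall.
q6 = (pct(counts["Females"]["Blue"], col_totals["Blue"]),
      pct(row_totals["Females"], grand_total))

for label, answer in zip("123456", (q1, q2, q3, q4, q5, q6)):
    print(label, answer)
```

The only thing that changes from question to question is the denominator: a row total, a column total, or the grand total.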
COMPARING COUNTS
Is an observed distribution consistent with what we expect?
Are observed differences among several distributions large
enough to be significant?
These are questions that we will answer
using a χ² (chi-squared) model.
Chi-squared tests involve only
categorical variables.
When we have quantitative variables, we
do hypothesis testing for means (the t-test).
χ² Models
• Right skewed distribution
• The distribution is less skewed as
degrees of freedom increase
• The mean of the model, or the
expected value, is equal to the
degrees of freedom
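These properties can be checked with a small simulation. This is only a sketch: it relies on the fact that a chi-squared value with df degrees of freedom behaves like a sum of df squared standard normal draws, and the function names, seed, and number of draws are arbitrary choices made for illustration.

```python
import random
import statistics

def simulate_chi_square(df, n_draws=20000, seed=1):
    """Draw from a chi-squared model by summing df squared standard normals."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0, 1) ** 2 for _ in range(df)) for _ in range(n_draws)]

def sample_skewness(values):
    """Average cubed z-score; positive values indicate right skew."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - mean) / sd) ** 3 for v in values)

for df in (2, 5, 20):
    draws = simulate_chi_square(df)
    # The sample mean should land near df; the skewness should stay positive
    # but shrink as df grows.
    print(df, round(statistics.fmean(draws), 2), round(sample_skewness(draws), 2))
```

The printed means should sit close to 2, 5, and 20, with the skewness value dropping from row to row.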
Three types of testing in this chapter:
1. A goodness-of-fit test compares the distribution of observed outcomes for a
single categorical variable to the expected outcomes predicted by a
probability model to see if the model is viable.
→One sample, one variable
2. A test of homogeneity compares observed distributions for several groups to
each other to see if there is evidence of differences among the respective
populations.
→Several groups, one variable
3. A test of independence cross-categorizes one group on two variables to see if
there is an association between them.
→One sample, two variables
Would you use a chi-square goodness-of-fit test, a chi-square test of
homogeneity, a chi-square test of independence, or some other test?
1) A brokerage firm wants to see whether the type of account a customer has (Silver, Gold,
Platinum) affects the type of trades that customer makes (in person, by phone, or by
Internet). It collects a random sample of trades made for its customers over the past year
and performs a test.
Chi-square test of independence (one sample, two variables:
type of account and type of trade)
2) That brokerage firm also wants to know if the type of account affects the size of the
account (in dollars). It performs a test to see if the mean size of the account is the same for
the three account types.
Other test. Account size is quantitative.
3) The academic research office at a large community college wants to see whether the
distribution of courses chosen (Humanities, Social Science, or Science) is different for its
residential and nonresidential students. It assembles last semester’s data and performs a
test.
Chi-square test of homogeneity (two groups, one variable: courses chosen)
Assumptions & Conditions for a χ² test:
1. All expected cell frequencies must be at least 5.
(You must show each of the frequencies; you can’t state that they are
all 5 or greater without giving the values.)
2. Independence Assumption:
The individuals of a sample need to be independent of each other.
*This is not necessary if testing for homogeneity.
3. Randomization Condition: Samples need to be drawn randomly.
4. Counted Data Condition: The data must be counts or frequencies.
Medical researchers enlisted 90 subjects for an experiment comparing
treatments for depression. The subjects were randomly divided into
three groups and given pills to take for a period of three months.
Unknown to them, one group received a placebo, the second group
the “natural” remedy St. John’s Wort, and the third group the
prescription drug Posrex. After six months, psychologists and
physicians evaluated the subjects to see if their depression had
returned.
                        Placebo   St. John’s Wort   Posrex
Depression returned          24                22       14
No sign of depression         6                 8       16
Step 1: Hypotheses
H0: The rate of recurrence is the same for all three treatments.
HA: The rate of recurrence differs for some treatments.
Step 2: Check Conditions and Model
These are counts of categorical data.
Subjects were randomly assigned to treatments.
Need to check for expected counts to continue (next slide).
The degrees of freedom = (#rows − 1)(#columns − 1)
Checking expected counts…
The expected counts if the treatments are equally effective would
come from splitting the totals up evenly among the 3 groups.
Observed counts, with expected counts in parentheses:

                        Placebo   St. John’s Wort   Posrex    Total
Depression returned     24 (20)   22 (20)           14 (20)      60
No sign of depression    6 (10)    8 (10)           16 (10)      30
Total                   30        30                30           90
All expected counts are greater than 5, so we can continue
with a chi-square test for homogeneity with df=(2-1)(3-1) = 2.
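The expected counts and the degrees of freedom can be reproduced with a short sketch. It uses the general rule (row total × column total ÷ grand total), which gives the same result as splitting each row total evenly here because all three groups have 30 subjects. The variable names are illustrative.

```python
# Observed counts from the depression study.
observed = {
    "Depression returned":   {"Placebo": 24, "St. John's Wort": 22, "Posrex": 14},
    "No sign of depression": {"Placebo": 6,  "St. John's Wort": 8,  "Posrex": 16},
}

treatments = list(next(iter(observed.values())))
row_totals = {outcome: sum(row.values()) for outcome, row in observed.items()}
col_totals = {t: sum(observed[o][t] for o in observed) for t in treatments}
grand_total = sum(row_totals.values())

# Expected count for a cell = row total * column total / grand total.
expected = {
    o: {t: row_totals[o] * col_totals[t] / grand_total for t in treatments}
    for o in observed
}

# Condition check: every expected count must be at least 5.
all_at_least_5 = all(e >= 5 for row in expected.values() for e in row.values())

# Degrees of freedom = (#rows - 1)(#columns - 1).
df = (len(observed) - 1) * (len(treatments) - 1)

for outcome, row in expected.items():
    print(outcome, row)
print("All expected counts at least 5:", all_at_least_5)
print("df =", df)
```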
Step 3: Mechanics
\[ \chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} \]
1. Find expected values
2. Compute residuals
3. Square residuals
4. Divide each by expected value
5. Add components (take the sum)
6. Find d.f. (if not done already)
7. Test hypotheses
\[ \chi^2 = \frac{(24 - 20)^2}{20} + \frac{(22 - 20)^2}{20} + \frac{(14 - 20)^2}{20} + \frac{(6 - 10)^2}{10} + \frac{(8 - 10)^2}{10} + \frac{(16 - 10)^2}{10} \]
\[ = 0.8 + 0.2 + 1.8 + 1.6 + 0.4 + 3.6 = 8.4 \]
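The same arithmetic can be sketched in Python: residual, square, divide by the expected count, then sum. The observed and expected counts are typed in from the table; the list layout (one inner list per row, columns in the order Placebo, St. John's Wort, Posrex) is just a convenient assumption.

```python
# Rows: depression returned, no sign of depression.
# Columns: Placebo, St. John's Wort, Posrex.
observed = [[24, 22, 14], [6, 8, 16]]
expected = [[20, 20, 20], [10, 10, 10]]

# Component for each cell: (observed - expected)^2 / expected.
components = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

# The chi-square statistic is the sum of all components.
chi_square = sum(sum(row) for row in components)

for row in components:
    print([round(c, 1) for c in row])
print("chi-square =", round(chi_square, 1))
```

Looking at the individual components also shows which cells contribute most to the total, which helps when interpreting a significant result.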
To get the P-value on TI-84:
DIST → 8: χ²cdf → (χ² score, 999, df)
*We always do a right-tail test for
chi-square
P(χ² > 8.4) = 0.015
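Without a calculator, the right-tail area can be checked by hand for this example. The sketch below uses a closed-form expression that holds only when the degrees of freedom are even; the function name `chi_square_right_tail` is made up for illustration, and odd df would need a statistics library or a table instead.

```python
import math

def chi_square_right_tail(x, df):
    """P(chi-square > x) for an even number of degrees of freedom.

    For df = 2m the right-tail area is
    exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only covers positive even df")
    half = x / 2
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms

# With df = 2 the formula reduces to exp(-x/2).
p_value = chi_square_right_tail(8.4, 2)
print(round(p_value, 3))
```

With df = 2 this is simply exp(−4.2), which rounds to the 0.015 reported above.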
Step 4: Conclusion
Because the P-value (0.015) is low, we reject the null hypothesis.
There is strong evidence that the tested treatments are not
all equally effective in preventing the recurrence of
depression. It appears that people who took the
prescription drug Posrex are more likely to remain free of
the signs of depression than those who took a placebo or
the natural remedy St. John’s Wort.
Assignment:
Ch 26 Homework: Page 642 #2-4