Solution key

MATH 461/661, Homework 4
Sonia Heckler and Connor Dayton
Exercise 1: (2.18) Table 2.13 shows data from the 2002 General Social Survey cross classifying a person’s perceived happiness with their family income. The table displays the observed
and expected cell counts and the standardized residuals for testing independence.
Table 2.13. Cell entries: observed count (expected count), standardized residual.
Income          Not Too Happy         Pretty Happy           Very Happy
Above Average   21 (35.8), -2.973     159 (166.1), -0.947    110 (88.1), 3.144
Average         53 (79.7), -4.403     372 (370.0), 0.224     221 (196.4), 2.907
Below Average   94 (52.5), 7.368      249 (244.0), 0.595     83 (129.5), -5.907
a. Show how to obtain the estimated expected cell count of 35.8 for the first cell.
Income          Not Too Happy   Pretty Happy   Very Happy   Total
Above Average        21             159           110         290
Average              53             372           221         646
Below Average        94             249            83         426
Total               168             780           414        1362
Solution: $\hat{\mu}_{ij} = \frac{n_{i\bullet}\, n_{\bullet j}}{n}$, so $\hat{\mu}_{11} = \frac{290 \times 168}{1362} \approx 35.8$.
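The same calculation can be checked for every cell at once. The following is a minimal sketch in plain Python; the function name `expected_counts` is an illustrative choice, not something from the textbook.

```python
# Observed counts from Table 2.13 (rows: income; columns: happiness).
observed = [
    [21, 159, 110],   # Above Average
    [53, 372, 221],   # Average
    [94, 249, 83],    # Below Average
]

def expected_counts(table):
    """Return the fitted counts n_i. * n_.j / n under independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

expected = expected_counts(observed)
# First cell: Above Average income, Not Too Happy.
print(round(expected[0][0], 1))
```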
b. For testing independence, $\chi^2 = 73.4$. Report the df value and the P-value, and interpret.
Solution: $df = (I - 1)(J - 1) = (3 - 1)(3 - 1) = 4$.
From the chi-squared table, the P-value is less than 0.001.
This is very strong evidence against the null hypothesis of independence, so we reject it and conclude that perceived happiness and family income are associated.
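Instead of a table lookup, the tail probability can be computed directly. For even degrees of freedom the chi-squared upper tail has the closed form $P(X > x) = e^{-x/2} \sum_{k=0}^{df/2 - 1} \frac{(x/2)^k}{k!}$. The sketch below uses that identity; the function name is hypothetical and the closed form only covers even df (which includes the df = 4 and df = 2 cases in this assignment).

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a chi-squared variable with positive even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms

p_value = chi2_upper_tail_even_df(73.4, 4)
print(p_value)            # far below 0.001
print(p_value < 0.001)
```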
c. Interpret the standardized residuals in the corner cells having counts 21 and 83.
Solution: The corner cells with counts 21 and 83 have standardized residuals of -2.973 and -5.907, respectively. Both are negative and exceed 2 in absolute value, so there are significantly fewer subjects in these cells (above-average income and not too happy; below-average income and very happy) than the independence model predicts.
d. Interpret the standardized residuals in the corner cells having counts 110 and 94.
Solution: The corner cells with counts 110 and 94 have standardized residuals of 3.144 and 7.368, respectively. Both are positive and exceed 2 in absolute value, so there are significantly more subjects in these cells (above-average income and very happy; below-average income and not too happy) than the independence model predicts.
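The four corner residuals quoted above can be reproduced from the observed counts. This is a minimal sketch using the standardized-residual formula $(n_{ij} - \hat{\mu}_{ij}) / \sqrt{\hat{\mu}_{ij}(1 - p_{i\bullet})(1 - p_{\bullet j})}$; the function name and row labels are illustrative.

```python
import math

# Observed counts from Table 2.13 (rows: income; columns: happiness).
observed = [
    [21, 159, 110],   # Above Average
    [53, 372, 221],   # Average
    [94, 249, 83],    # Below Average
]

def standardized_residuals(table):
    """Standardized residuals for the independence model."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    result = []
    for i, row in enumerate(table):
        out_row = []
        for j, n_ij in enumerate(row):
            mu = row_totals[i] * col_totals[j] / n
            se = math.sqrt(mu * (1 - row_totals[i] / n) * (1 - col_totals[j] / n))
            out_row.append((n_ij - mu) / se)
        result.append(out_row)
    return result

res = standardized_residuals(observed)
for label, row in zip(["Above Average", "Average", "Below Average"], res):
    print(label, [round(r, 3) for r in row])
```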
Exercise 2: (2.19) Table 2.14 was taken from the 2002 General Social Survey.
Race     Democrat   Independent   Republican   Total
White       871         444           873       2188
Black       302          80            43        425
Total      1173         524           916       2613
a. Test the null hypothesis of independence between party identification and race. Interpret.
Solution: $H_0$: party identification and race are independent. $H_A$: party identification and race are dependent.
Expected values: $\hat{\mu}_{ij} = \frac{n_{i\bullet}\, n_{\bullet j}}{n}$
Race     Democrat    Independent   Republican
White    982.2135     438.7723      767.0142
Black    190.7865      85.2277      148.9858
$X^2 = \sum_{ij} \frac{(n_{ij} - \hat{\mu}_{ij})^2}{\hat{\mu}_{ij}} = 167.8$
$df = (I - 1)(J - 1) = (2 - 1)(3 - 1) = 2$
From the chi-squared table, the P-value is less than 0.001. This is very strong evidence against independence, so we reject the null hypothesis and conclude that party identification and race are associated.
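The statistic and its P-value can be verified numerically. The sketch below is plain Python with a hypothetical helper `pearson_chi2`; it relies on the fact that for df = 2 the chi-squared upper-tail probability is exactly $e^{-x/2}$.

```python
import math

# Observed counts from Table 2.14 (columns: Democrat, Independent, Republican).
observed = [
    [871, 444, 873],   # White
    [302, 80, 43],     # Black
]

def pearson_chi2(table):
    """Return (X^2, df) for the Pearson test of independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            mu = row_totals[i] * col_totals[j] / n
            stat += (n_ij - mu) ** 2 / mu
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

x2, df = pearson_chi2(observed)
p_value = math.exp(-x2 / 2)   # exact upper tail, valid only because df == 2
print(round(x2, 1), df, p_value)
```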
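b. Use standardized residuals to describe the evidence.
Solution: The values below are Pearson residuals, $(n_{ij} - \hat{\mu}_{ij}) / \sqrt{\hat{\mu}_{ij}}$. Dividing each by $\sqrt{(1 - p_{i\bullet})(1 - p_{\bullet j})}$ gives the standardized residuals, which are larger in absolute value and lead to the same conclusions.
Race     Democrat   Independent   Republican
White    -3.5486      0.2496        3.8269
Black     8.0516     -0.5663       -8.6831
The independence model fits the Independent column well for both Blacks and Whites (residuals near 0). It fits the Democrat and Republican columns poorly: the residuals for Whites are moderately large (-3.5 and 3.8) and those for Blacks are very large (8.1 and -8.7). Black respondents identify as Democrat far more often, and as Republican far less often, than independence predicts. The lack of fit found in part a is therefore concentrated in the Democrat and Republican columns.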
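The residuals tabulated above agree with the Pearson form $(n_{ij} - \hat{\mu}_{ij}) / \sqrt{\hat{\mu}_{ij}}$. The standardized form used in Exercise 4b divides additionally by $\sqrt{(1 - p_{i\bullet})(1 - p_{\bullet j})}$. The sketch below computes both so they can be compared side by side; the helper name `residuals` is illustrative.

```python
import math

# Observed counts from Table 2.14 (columns: Democrat, Independent, Republican).
observed = [
    [871, 444, 873],   # White
    [302, 80, 43],     # Black
]

def residuals(table):
    """Return (pearson, standardized) residual tables under independence."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    pearson, standardized = [], []
    for i, row in enumerate(table):
        p_row, s_row = [], []
        for j, n_ij in enumerate(row):
            mu = rows[i] * cols[j] / n
            e = (n_ij - mu) / math.sqrt(mu)                       # Pearson residual
            adj = math.sqrt((1 - rows[i] / n) * (1 - cols[j] / n))
            p_row.append(e)
            s_row.append(e / adj)                                 # standardized residual
        pearson.append(p_row)
        standardized.append(s_row)
    return pearson, standardized

pearson, standardized = residuals(observed)
for name, tab in (("Pearson", pearson), ("Standardized", standardized)):
    for label, row in zip(["White", "Black"], tab):
        print(name, label, [round(v, 4) for v in row])
```

In a table with only two rows, the standardized residuals within each column are equal in size and opposite in sign, so either row carries the full information.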
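c. Partition the chi-squared into two components, and use the components to describe the evidence.
Solution:
Race     Democrat   Independent   Total
White       871         444        1315
Black       302          80         382
Total      1173         524        1697
Democrat vs. Independent: $X^2 = \sum_{ij} \frac{(|n_{ij} - \hat{\mu}_{ij}| - 0.5)^2}{\hat{\mu}_{ij}} = 22.2036$ (with the Yates continuity correction; 22.80 without it)
Race     Democrat + Independent   Republican   Total
White            1315                873        2188
Black             382                 43         425
Total            1697                916        2613
Democrat and Independent vs. Republican: $X^2 = \sum_{ij} \frac{(|n_{ij} - \hat{\mu}_{ij}| - 0.5)^2}{\hat{\mu}_{ij}} = 137.3387$ (with the Yates continuity correction; 138.64 without it)
Each component has $df = 1$. Both are significant, but the second is far larger: the difference between Whites and Blacks in identifying as Republican is much stronger than the difference between identifying as Democrat and identifying as Independent.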
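The two component values reported above (22.2036 and 137.3387) are reproduced when a Yates continuity correction is applied to each 2 x 2 table. The sketch below computes each component both ways; `chi2_2x2` and its `correction` flag are hypothetical names chosen for this example.

```python
def chi2_2x2(table, correction=False):
    """Pearson X^2 for a 2 x 2 table, optionally with the Yates correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            mu = row_totals[i] * col_totals[j] / n
            diff = abs(n_ij - mu)
            if correction:
                # Yates: shrink each |observed - expected| by 0.5 (not below 0).
                diff = max(diff - 0.5, 0.0)
            stat += diff ** 2 / mu
    return stat

dem_vs_ind = [[871, 444], [302, 80]]         # Democrat vs. Independent
dem_ind_vs_rep = [[1315, 873], [382, 43]]    # Democrat + Independent vs. Republican

for name, tab in (("D vs I", dem_vs_ind), ("D+I vs R", dem_ind_vs_rep)):
    print(name, round(chi2_2x2(tab), 4), round(chi2_2x2(tab, correction=True), 4))
```

Neither pair of Pearson components sums exactly to the overall 167.8, since exact additivity is a property of the likelihood-ratio statistic rather than of Pearson's $X^2$.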
Exercise 3: (2.21) Each subject in a sample of 100 men and 100 women is asked to indicate
which of the following factors (one or more) are responsible for increases in teenage crime:
A, the increasing gap in income between the rich and poor; B, the increase in the percentage
of single-parent families; C, insufficient time spent by parents with their children. A cross
classification of the responses by gender is
Gender    A     B     C
Men      60    81    75
Women    75    87    86
a. Is it valid to apply the chi-squared test of independence to this 2 x 3 table? Explain.
Solution: No. Because each subject could choose more than one factor, the same subject can be counted in more than one column. The cell counts are therefore not independent observations, and the chi-squared test of independence does not apply.
b. Explain how this table actually provides information needed to cross-classify gender with each of three variables. Construct the contingency table relating gender to opinion about whether factor A is responsible for increases in teenage crime.
Solution: The table needs to be broken up by factor. Each factor can be cross-classified with gender on its own (yes or no for that factor), but the factors cannot be compared with each other in a single table. Since 100 men and 100 women were surveyed, 100 minus the count for a factor gives the number who did not select it.
Factor A:
Gender   Yes   No
Men       60   40
Women     75   25
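The same subtraction yields a 2 x 2 table for each of the three factors. A minimal sketch, assuming the counts from the table in the exercise and a fixed group size of 100; the names `yes_counts` and `yes_no_table` are illustrative.

```python
# Number of "yes" responses per factor, by gender (from the exercise table).
yes_counts = {
    "A": {"Men": 60, "Women": 75},
    "B": {"Men": 81, "Women": 87},
    "C": {"Men": 75, "Women": 86},
}
GROUP_SIZE = 100  # 100 men and 100 women were surveyed

def yes_no_table(factor):
    """Return {gender: (yes, no)} for one factor."""
    return {
        gender: (yes, GROUP_SIZE - yes)
        for gender, yes in yes_counts[factor].items()
    }

for factor in ("A", "B", "C"):
    print(factor, yes_no_table(factor))
```

Each of these tables classifies every subject exactly once, so a chi-squared test of independence is valid for each factor separately.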
Exercise 4: (2.22) Table 2.15 classifies a sample of psychiatric patients by their diagnosis and
by whether their treatment prescribed drugs.
Diagnosis              Drugs   No Drugs   Total
Schizophrenia           105        8       113
Affective disorder       12        2        14
Neurosis                 18       19        37
Personality disorder     47       52        99
Special symptoms          0       13        13
Total                   182       94       276
a. Conduct a test of independence, and interpret the P-value.
Solution: $H_0: \pi_{ij} = \pi_{i\bullet}\, \pi_{\bullet j}$ for all $i, j$.
$H_a: \pi_{ij} \neq \pi_{i\bullet}\, \pi_{\bullet j}$ for at least one pair $i, j$.
$\chi^2 = \sum_{ij} \frac{(n_{ij} - \hat{\mu}_{ij})^2}{\hat{\mu}_{ij}} = 84.189$, with degrees of freedom $df = (I - 1)(J - 1) = (5 - 1)(2 - 1) = 4$.
The chi-squared distribution has a mean of $k$ and a variance of $2k$, where $k$ is the degrees of freedom. For this distribution, the mean is 4 and the standard deviation is $\sqrt{8} = 2\sqrt{2}$. Our value of 84.189 is very far in the right tail, with a P-value of approximately 0. Therefore, we reject the null hypothesis and conclude that a patient's diagnosis is associated with whether or not drugs are prescribed as part of treatment.
b. Obtain standardized residuals, and interpret.
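Solution: Standardized residuals are given by
$\frac{n_{ij} - \hat{\mu}_{ij}}{\sqrt{\hat{\mu}_{ij}\,(1 - p_{i\bullet})(1 - p_{\bullet j})}}$, where $\hat{\mu}_{ij} = \frac{n_{i\bullet}\, n_{\bullet j}}{n}$, $p_{i\bullet} = \frac{n_{i\bullet}}{n}$, and $p_{\bullet j} = \frac{n_{\bullet j}}{n}$.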
The following table shows the standardized residuals for each i and j
Diagnosis               Drugs     No Drugs
Schizophrenia           7.8742    -7.8745
Affective disorder      1.6022    -1.6023
Neurosis               -2.3853     2.3852
Personality disorder   -4.8416     4.8418
Special symptoms       -5.1393     5.1396
As an example of calculating a standardized residual, I will demonstrate the calculation for the first cell:
$n_{11} = 105$, $\hat{\mu}_{11} = \frac{113 \times 182}{276} = 74.5145$, $p_{1\bullet} = \frac{113}{276} = 0.4094$, $p_{\bullet 1} = \frac{182}{276} = 0.6594$,
so the residual is $\frac{105 - 74.5145}{\sqrt{74.5145\,(1 - 0.4094)(1 - 0.6594)}} = 7.8742$.
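If the variables were truly independent, the standardized residuals would behave approximately like standard normal variables, so values beyond about 2 or 3 in absolute value would be rare. Every row except affective disorder has residuals well beyond that range, indicating significant discrepancies between $n_{ij}$ and $\hat{\mu}_{ij}$. Schizophrenia patients were prescribed drugs more often than independence predicts, while patients with neurosis, personality disorder, or special symptoms were prescribed drugs less often.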
c. Partition chi-squared into three components to describe differences and similarities
among the diagnoses, by comparing (i) the first two rows, (ii) the third and fourth rows, (iii)
the last row to the first and second rows combined and the third and fourth rows combined.
Solution:
Table of first 2 rows for mental health question
Diagnosis            Drugs   No Drugs   Total
Schizophrenia         105        8       113
Affective disorder     12        2        14
Total                 117       10       127
$X^2 = \sum_{ij} \frac{(n_{ij} - \hat{\mu}_{ij})^2}{\hat{\mu}_{ij}} = 0.8917$ (without correction)
Table of 3rd and 4th rows for mental health question
Diagnosis              Drugs   No Drugs   Total
Neurosis                 18       19        37
Personality disorder     47       52        99
Total                    65       71       136
$X^2 = \sum_{ij} \frac{(n_{ij} - \hat{\mu}_{ij})^2}{\hat{\mu}_{ij}} = 0.0149$ (without correction)
Table of 1st and 2nd rows against 3rd and 4th rows against last row
Diagnosis                             Drugs   No Drugs   Total
Schizophrenia + Affective disorder     117       10       127
Neurosis + Personality disorder         65       71       136
Special symptoms                         0       13        13
Total                                  182       94       276
Schizophrenia/Affective Disorder vs. Special Symptoms
$X^2 = \sum_{ij} \frac{(n_{ij} - \hat{\mu}_{ij})^2}{\hat{\mu}_{ij}} = 72.8997$ (without correction)
Neurosis/Personality disorder vs. Special Symptoms
$X^2 = \sum_{ij} \frac{(n_{ij} - \hat{\mu}_{ij})^2}{\hat{\mu}_{ij}} = 11.0211$ (without correction)
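The component values can be checked numerically. The sketch below recomputes Pearson's $X^2$ for the first two subtables and for the full table, and takes component (iii) on the 3 x 2 collapsed table shown above, as the exercise states it. It also computes the likelihood-ratio statistic $G^2 = 2 \sum n_{ij} \log(n_{ij} / \hat{\mu}_{ij})$, which is not used in the solution above; it is included here as an assumption-labeled extra because its components add up exactly to the full-table value, whereas Pearson components add up only approximately. All function names are illustrative.

```python
import math

def margins(table):
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return rows, cols, sum(rows)

def pearson_x2(table):
    """Pearson X^2 for independence, without continuity correction."""
    rows, cols, n = margins(table)
    return sum(
        (table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in range(len(rows))
        for j in range(len(cols))
    )

def likelihood_ratio_g2(table):
    """G^2 = 2 * sum n_ij * log(n_ij / mu_ij); empty cells contribute 0."""
    rows, cols, n = margins(table)
    total = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            if n_ij > 0:
                total += n_ij * math.log(n_ij * n / (rows[i] * cols[j]))
    return 2.0 * total

# Table 2.15 (columns: Drugs, No Drugs).
full = [[105, 8], [12, 2], [18, 19], [47, 52], [0, 13]]
first_two = full[0:2]                       # (i) rows 1 and 2
third_fourth = full[2:4]                    # (ii) rows 3 and 4
collapsed = [                               # (iii) rows 1+2, rows 3+4, row 5
    [a + b for a, b in zip(full[0], full[1])],
    [a + b for a, b in zip(full[2], full[3])],
    full[4],
]
parts = (first_two, third_fourth, collapsed)

x2_parts = [pearson_x2(t) for t in parts]
g2_parts = [likelihood_ratio_g2(t) for t in parts]
print("X^2 components:", [round(v, 4) for v in x2_parts])
print("X^2 sum vs full:", round(sum(x2_parts), 4), round(pearson_x2(full), 4))
print("G^2 sum vs full:", round(sum(g2_parts), 4), round(likelihood_ratio_g2(full), 4))
```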
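Exercise 5: (2.25) For tests of $H_0$: independence, $\hat{\mu}_{ij} = \frac{n_{i+}\, n_{+j}}{n}$ (the row and column totals are written $n_{i\bullet}$ and $n_{\bullet j}$ below).
a. Show that $\hat{\mu}_{ij}$ have the same row and column totals as $n_{ij}$.
Solution: We know $\sum_{ij} \hat{\mu}_{ij} = \sum_{ij} \frac{n_{i\bullet}\, n_{\bullet j}}{n}$. For a fixed $i$, summing over $j$ gives
$\sum_{j} \frac{n_{i\bullet}\, n_{\bullet j}}{n} = \frac{n_{i\bullet}}{n} \sum_{j} n_{\bullet j} = \frac{n_{i\bullet}}{n}\, n = n_{i\bullet}$
So, for any fixed $i$, $\sum_{j} \hat{\mu}_{ij} = n_{i\bullet}$. Repeat this process with a fixed $j$ to show that $\sum_{i} \hat{\mu}_{ij} = n_{\bullet j}$.
Thus, $\hat{\mu}_{ij}$ has the same row totals $n_{i\bullet}$ and column totals $n_{\bullet j}$ as $n_{ij}$.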
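b. For 2x2 tables, show that $\frac{\hat{\mu}_{11}\, \hat{\mu}_{22}}{\hat{\mu}_{12}\, \hat{\mu}_{21}} = 1.0$. Hence, $\hat{\mu}_{ij}$ satisfy $H_0$.
Solution:
$\frac{\hat{\mu}_{11}\, \hat{\mu}_{22}}{\hat{\mu}_{12}\, \hat{\mu}_{21}} = \frac{\frac{n_{1\bullet}\, n_{\bullet 1}}{n} \cdot \frac{n_{2\bullet}\, n_{\bullet 2}}{n}}{\frac{n_{1\bullet}\, n_{\bullet 2}}{n} \cdot \frac{n_{2\bullet}\, n_{\bullet 1}}{n}} = 1$, after canceling all the terms.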
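Both results in this exercise can be spot-checked on a concrete table. The sketch below uses the gender-by-factor-A counts from Exercise 3 as an arbitrary 2 x 2 example; a single numeric check is only an illustration and does not replace the algebra.

```python
# Gender-by-factor-A counts from Exercise 3 (rows: Men, Women; columns: Yes, No).
observed = [[60, 40], [75, 25]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Fitted counts under independence: n_i. * n_.j / n
fitted = [[r * c / n for c in col_totals] for r in row_totals]

fitted_row_totals = [sum(row) for row in fitted]
fitted_col_totals = [sum(col) for col in zip(*fitted)]
odds_ratio = (fitted[0][0] * fitted[1][1]) / (fitted[0][1] * fitted[1][0])

print(fitted_row_totals, row_totals)   # part a: same row totals
print(fitted_col_totals, col_totals)   # part a: same column totals
print(odds_ratio)                      # part b: odds ratio of the fitted counts
```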
Exercise 6: (2.30) Table 2.17 contains results of a study comparing radiation therapy with
surgery in treating cancer of the larynx. Use Fisher’s exact test to test H0 : θ = 1 against
Ha : θ > 1. Interpret results.
Treatment           Cancer Controlled   Cancer Not Controlled   Total
Surgery                    21                     2               23
Radiation therapy          15                     3               18
Total                      36                     5               41
Solution: For Ha : θ > 1, the p-value is a hypergeometric probability that the first element
(n11 ) is at least as large as what it was observed to be (21, in this case.) Note that the highest
possible value is 23, as that is the row total.
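The P-value is $p = P(21) + P(22) + P(23)$. According to Fisher's exact test, the probability of a certain value for $n_{11}$ is given by
$P(n_{11}) = \frac{\binom{n_{1\bullet}}{n_{11}} \binom{n_{2\bullet}}{n_{\bullet 1} - n_{11}}}{\binom{n}{n_{\bullet 1}}}$
So, the P-value is
$p = P(21) + P(22) + P(23) = \frac{\binom{23}{21} \binom{18}{15}}{\binom{41}{36}} + \frac{\binom{23}{22} \binom{18}{14}}{\binom{41}{36}} + \frac{\binom{23}{23} \binom{18}{13}}{\binom{41}{36}} = 0.3808$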
The P-value of 0.3808 is large, so we fail to reject the null hypothesis. The data do not provide evidence that the odds of controlling the cancer are higher with surgery than with radiation therapy; the results are consistent with cancer control being independent of treatment.
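The hypergeometric sum can be evaluated exactly with integer binomial coefficients. A minimal sketch, assuming Python 3.8 or later for `math.comb`; the function name `fisher_upper_p` and its argument names are illustrative.

```python
from math import comb

def fisher_upper_p(n11, row1, row2, col1):
    """One-sided Fisher exact P-value: P(N11 >= n11) with all margins fixed."""
    n = row1 + row2
    max_n11 = min(row1, col1)           # largest value n11 can take
    denominator = comb(n, col1)
    numerator = sum(
        comb(row1, k) * comb(row2, col1 - k)
        for k in range(n11, max_n11 + 1)
    )
    return numerator / denominator

# Table 2.17: surgery row total 23, radiation row total 18, 36 controlled.
p = fisher_upper_p(n11=21, row1=23, row2=18, col1=36)
print(round(p, 4))
```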