MATH 461/661, Homework 4
Sonia Heckler and Connor Dayton

Exercise 1: (2.18) Table 2.13 shows data from the 2002 General Social Survey cross classifying a person’s perceived happiness with their family income. The table displays the observed and expected cell counts and the standardized residuals for testing independence. Each cell lists observed count / expected count / standardized residual.

| Income | Not Too Happy | Pretty Happy | Very Happy |
|---|---|---|---|
| Above Average | 21 / 35.8 / -2.973 | 159 / 166.1 / -0.947 | 110 / 88.1 / 3.144 |
| Average | 53 / 79.7 / -4.403 | 372 / 370.0 / 0.224 | 221 / 196.4 / 2.907 |
| Below Average | 94 / 52.5 / 7.368 | 249 / 244.0 / 0.595 | 83 / 129.5 / -5.907 |

a. Show how to obtain the estimated expected cell count of 35.8 for the first cell.

| Income | Not Too Happy | Pretty Happy | Very Happy | Total |
|---|---|---|---|---|
| Above Average | 21 | 159 | 110 | 290 |
| Average | 53 | 372 | 221 | 646 |
| Below Average | 94 | 249 | 83 | 426 |
| Total | 168 | 780 | 414 | 1362 |

Solution: Since $\hat{\mu}_{ij} = \frac{n_{i\bullet} n_{\bullet j}}{n}$, the first cell has

$$\hat{\mu}_{11} = \frac{n_{1\bullet} n_{\bullet 1}}{n} = \frac{290 \cdot 168}{1362} \approx 35.8$$

b. For testing independence, $\chi^2 = 73.4$. Report the df value and the P-value, and interpret.

Solution: $df = (I - 1)(J - 1) = (3 - 1)(3 - 1) = 4$. From the chi-squared table, the P-value is less than 0.001. This is strong evidence against the null hypothesis of independence, so we reject it and conclude that perceived happiness and family income are associated.

c. Interpret the standardized residuals in the corner cells having counts 21 and 83.

Solution: The corner cells with counts 21 (above-average income, not too happy) and 83 (below-average income, very happy) have standardized residuals of -2.973 and -5.907, respectively. The residuals are large and negative, so there are significantly fewer subjects in these cells than the independence model predicts.

d. Interpret the standardized residuals in the corner cells having counts 110 and 94.

Solution: The corner cells with counts 110 (above-average income, very happy) and 94 (below-average income, not too happy) have standardized residuals of 3.144 and 7.368, respectively. The residuals are large and positive, so there are significantly more subjects in these cells than the independence model predicts.

Exercise 2: (2.19) Table 2.14 was taken from the 2002 General Social Survey.
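
| Race | Democrat | Independent | Republican | Total |
|---|---|---|---|---|
| White | 871 | 444 | 873 | 2188 |
| Black | 302 | 80 | 43 | 425 |
| Total | 1173 | 524 | 916 | 2613 |

a. Test the null hypothesis of independence between party identification and race. Interpret.

Solution: $H_0$: party identification and race are independent. $H_A$: party identification and race are dependent.

Expected values, $\hat{\mu}_{ij} = \frac{n_{i\bullet} n_{\bullet j}}{n}$:

| Race | Democrat | Independent | Republican |
|---|---|---|---|
| White | 982.2135 | 438.7723 | 767.0142 |
| Black | 190.7865 | 85.2277 | 148.9858 |

$$X^2 = \sum_{ij} \frac{(n_{ij} - \hat{\mu}_{ij})^2}{\hat{\mu}_{ij}} = 167.8$$

$df = (I - 1)(J - 1) = (2 - 1)(3 - 1) = 2$

From the chi-squared table, the P-value is less than 0.001. This is strong evidence against the null hypothesis, so we reject it and conclude that party identification and race are associated.

b. Use standardized residuals to describe the evidence.

Solution: The residuals $(n_{ij} - \hat{\mu}_{ij}) / \sqrt{\hat{\mu}_{ij}}$ are:

| Race | Democrat | Independent | Republican |
|---|---|---|---|
| White | -3.5486 | 0.2496 | 3.8269 |
| Black | 8.0516 | -0.5663 | -8.6831 |

These are Pearson residuals. Dividing each by $\sqrt{(1 - p_{i\bullet})(1 - p_{\bullet j})}$ gives standardized residuals of about -11.85, 0.69, and 11.77 in the White row, with the opposite signs in the Black row. Either way, the Independent cells are close to what independence predicts for both races, while the Democrat and Republican cells depart strongly from it: there are more Black Democrats and fewer Black Republicans than independence predicts, and the reverse for Whites. The residuals agree with the rejection in part a and show that the association is driven by the Democrat and Republican columns.

c. Partition the chi-squared into two components, and use the components to describe the evidence.

Solution: Democrat vs. Independent:

| Race | Democrat | Independent | Total |
|---|---|---|---|
| White | 871 | 444 | 1315 |
| Black | 302 | 80 | 382 |
| Total | 1173 | 524 | 1697 |

$$X^2 = \sum_{ij} \frac{(n_{ij} - \hat{\mu}_{ij})^2}{\hat{\mu}_{ij}} = 22.2036$$

Democrat and Independent vs. Republican:

| Race | Democrat + Independent | Republican | Total |
|---|---|---|---|
| White | 1315 | 873 | 2188 |
| Black | 382 | 43 | 425 |
| Total | 1697 | 916 | 2613 |

$$X^2 = \sum_{ij} \frac{(n_{ij} - \hat{\mu}_{ij})^2}{\hat{\mu}_{ij}} = 137.3387$$

Both values match the continuity-corrected statistic for a 2x2 table; without the correction they are about 22.80 and 138.64. Each component has df = 1. The components do not sum exactly to 167.8 because the Pearson statistic, unlike $G^2$, partitions only approximately.

The association between race and identifying as Republican (rather than Democrat or Independent) is much stronger than the association between race and the choice between Democrat and Independent.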
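
The calculations in Exercise 2 can be checked numerically. The following is a minimal Python sketch, not the code used for the homework: the function names are hypothetical, the standardized residual is assumed to follow the definition given in Exercise 4b, $(n_{ij} - \hat{\mu}_{ij}) / \sqrt{\hat{\mu}_{ij}(1 - p_{i\bullet})(1 - p_{\bullet j})}$, and the tail probability uses a closed form that is valid only when df is even.

```python
import math

def margins(table):
    """Return (row totals, column totals, grand total) of a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return row_totals, col_totals, sum(row_totals)

def expected_counts(table):
    """Expected counts n_i. * n_.j / n under independence."""
    row_totals, col_totals, n = margins(table)
    return [[r * c / n for c in col_totals] for r in row_totals]

def pearson_x2(table):
    """Pearson chi-squared statistic (no continuity correction) and its df."""
    mu = expected_counts(table)
    x2 = sum((n_ij - m_ij) ** 2 / m_ij
             for n_row, m_row in zip(table, mu)
             for n_ij, m_ij in zip(n_row, m_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return x2, df

def standardized_residuals(table):
    """(n_ij - mu_ij) / sqrt(mu_ij * (1 - p_i.) * (1 - p_.j)) for every cell."""
    row_totals, col_totals, n = margins(table)
    mu = expected_counts(table)
    return [[(table[i][j] - mu[i][j])
             / math.sqrt(mu[i][j] * (1 - row_totals[i] / n) * (1 - col_totals[j] / n))
             for j in range(len(col_totals))]
            for i in range(len(row_totals))]

def chi2_upper_tail_even_df(x, df):
    """P(chi-squared_df > x); this closed form holds only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

# Table 2.14: rows White, Black; columns Democrat, Independent, Republican.
table_2_14 = [[871, 444, 873],
              [302, 80, 43]]

x2, df = pearson_x2(table_2_14)
p_value = chi2_upper_tail_even_df(x2, df)
residuals = standardized_residuals(table_2_14)

print(f"X2 = {x2:.1f}, df = {df}, P-value = {p_value:.3g}")
for race, row in zip(["White", "Black"], residuals):
    print(race, [round(r, 2) for r in row])
```

The same functions apply to the collapsed 2x2 tables in part c; `pearson_x2` applies no continuity correction, so its output for those tables should be compared with uncorrected statistics.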
Exercise 3: (2.21) Each subject in a sample of 100 men and 100 women is asked to indicate which of the following factors (one or more) are responsible for increases in teenage crime: A, the increasing gap in income between the rich and poor; B, the increase in the percentage of single-parent families; C, insufficient time spent by parents with their children. A cross classification of the responses by gender is

| Gender | A | B | C |
|---|---|---|---|
| Men | 60 | 81 | 75 |
| Women | 75 | 87 | 86 |

a. Is it valid to apply the chi-squared test of independence to this 2 x 3 table? Explain.

Solution: No. Because subjects may choose more than one factor, a single subject can be counted in more than one cell. The cell counts are therefore not independent observations, and the chi-squared test of independence does not apply.

b. Explain how this table actually provides information needed to cross-classify gender with each of three variables. Construct the contingency table relating gender to opinion about whether factor A is responsible for increases in teenage crime.

Solution: The table needs to be broken up by factor. Each factor can be cross-classified with gender on its own, but the factors cannot be compared to each other. Since 100 men and 100 women were surveyed, 100 minus the count for a factor gives the number who did not choose it. For factor A:

| Gender | A: Yes | A: No |
|---|---|---|
| Men | 60 | 40 |
| Women | 75 | 25 |

Exercise 4: (2.22) Table 2.15 classifies a sample of psychiatric patients by their diagnosis and by whether their treatment prescribed drugs.

| Diagnosis | Drugs | No Drugs | Total |
|---|---|---|---|
| Schizophrenia | 105 | 8 | 113 |
| Affective disorder | 12 | 2 | 14 |
| Neurosis | 18 | 19 | 37 |
| Personality disorder | 47 | 52 | 99 |
| Special symptoms | 0 | 13 | 13 |
| Total | 182 | 94 | 276 |

a. Conduct a test of independence, and interpret the P-value.

Solution: $H_0: \pi_{ij} = \pi_{i\bullet} \pi_{\bullet j}$ for all $i, j$, against $H_a: \pi_{ij} \neq \pi_{i\bullet} \pi_{\bullet j}$ for some $i, j$.

$$\chi^2 = \sum_{ij} \frac{(n_{ij} - \hat{\mu}_{ij})^2}{\hat{\mu}_{ij}} = 84.189,$$

with degrees of freedom $df = (I - 1)(J - 1) = (5 - 1)(2 - 1) = 4$. The chi-squared distribution has a mean of $k$ and a variance of $2k$, where $k$ is the degrees of freedom. For this distribution, the mean is 4 and the standard deviation is $\sqrt{8} = 2\sqrt{2}$.
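Our value of 84.189 is very far in the right tail, with a P-value of approximately 0. Therefore, we reject the null hypothesis and conclude that a patient’s diagnosis is associated with whether or not their treatment prescribed drugs.

b. Obtain standardized residuals, and interpret.

Solution: Standardized residuals are given by

$$\frac{n_{ij} - \hat{\mu}_{ij}}{\sqrt{\hat{\mu}_{ij} (1 - p_{i\bullet})(1 - p_{\bullet j})}}, \quad \text{where } \hat{\mu}_{ij} = \frac{n_{i\bullet} n_{\bullet j}}{n}, \quad p_{i\bullet} = \frac{n_{i\bullet}}{n}, \quad p_{\bullet j} = \frac{n_{\bullet j}}{n}.$$

The following table shows the standardized residuals for each $i$ and $j$.

| Diagnosis | Drugs | No Drugs |
|---|---|---|
| Schizophrenia | 7.8742 | -7.8745 |
| Affective disorder | 1.6022 | -1.6023 |
| Neurosis | -2.3853 | 2.3852 |
| Personality disorder | -4.8416 | 4.8418 |
| Special symptoms | -5.1393 | 5.1396 |

As an example of calculating a standardized residual, consider the first cell:

$$n_{11} = 105, \quad \hat{\mu}_{11} = \frac{113 \cdot 182}{276} = 74.5145, \quad p_{1\bullet} = \frac{113}{276} = 0.4094, \quad p_{\bullet 1} = \frac{182}{276} = 0.6594,$$

so the residual is

$$\frac{105 - 74.5145}{\sqrt{74.5145 \, (1 - 0.4094)(1 - 0.6594)}} = 7.8742.$$

With only two columns, the two residuals in each row are equal in magnitude and opposite in sign; the differences in the fourth decimal place come from rounding.

If the variables were truly independent, we would expect small residuals, roughly within ±2, in each of the cells. Most of the residuals here are much larger, indicating a significant discrepancy between $n_{ij}$ and $\hat{\mu}_{ij}$. Schizophrenia patients were prescribed drugs far more often than independence predicts, while patients with personality disorder or special symptoms were prescribed drugs far less often. Neurosis shows a moderate departure in the same direction, and affective disorder shows no clear departure.

c. Partition chi-squared into three components to describe differences and similarities among the diagnoses, by comparing (i) the first two rows, (ii) the third and fourth rows, (iii) the last row to the first and second rows combined and the third and fourth rows combined.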
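Solution:

(i) First two rows:

| Diagnosis | Drugs | No Drugs | Total |
|---|---|---|---|
| Schizophrenia | 105 | 8 | 113 |
| Affective disorder | 12 | 2 | 14 |
| Total | 117 | 10 | 127 |

$$X^2 = \sum_{ij} \frac{(n_{ij} - \hat{\mu}_{ij})^2}{\hat{\mu}_{ij}} = 0.8917 \text{ (without correction)}$$

(ii) Third and fourth rows:

| Diagnosis | Drugs | No Drugs | Total |
|---|---|---|---|
| Neurosis | 18 | 19 | 37 |
| Personality disorder | 47 | 52 | 99 |
| Total | 65 | 71 | 136 |

$$X^2 = \sum_{ij} \frac{(n_{ij} - \hat{\mu}_{ij})^2}{\hat{\mu}_{ij}} = 0.0149 \text{ (without correction)}$$

(iii) First and second rows combined, third and fourth rows combined, and the last row:

| Diagnosis | Drugs | No Drugs | Total |
|---|---|---|---|
| Schizophrenia + Affective disorder | 117 | 10 | 127 |
| Neurosis + Personality disorder | 65 | 71 | 136 |
| Special symptoms | 0 | 13 | 13 |
| Total | 182 | 94 | 276 |

Schizophrenia/Affective disorder vs. Special symptoms:

$$X^2 = \sum_{ij} \frac{(n_{ij} - \hat{\mu}_{ij})^2}{\hat{\mu}_{ij}} = 72.8997 \text{ (without correction)}$$

Neurosis/Personality disorder vs. Special symptoms:

$$X^2 = \sum_{ij} \frac{(n_{ij} - \hat{\mu}_{ij})^2}{\hat{\mu}_{ij}} = 11.0211 \text{ (without correction)}$$

The small statistics in (i) and (ii) show that drug prescription rates are similar for schizophrenia and affective disorder, and similar for neurosis and personality disorder. The large statistics in (iii) show that special symptoms differs from both combined groups, most sharply from schizophrenia and affective disorder.

Exercise 5: (2.25) For tests of $H_0$: independence, $\hat{\mu}_{ij} = \frac{n_{i\bullet} n_{\bullet j}}{n}$.

a. Show that $\hat{\mu}_{ij}$ have the same row and column totals as $n_{ij}$.

Solution: For a fixed $i$, the row total of the expected counts is

$$\sum_{j} \hat{\mu}_{ij} = \sum_{j} \frac{n_{i\bullet} n_{\bullet j}}{n} = \frac{n_{i\bullet}}{n} \sum_{j} n_{\bullet j} = \frac{n_{i\bullet}}{n} \, n = n_{i\bullet}.$$

So, for any fixed $i$, $\sum_{j} \hat{\mu}_{ij} = n_{i\bullet}$. Repeat this process with a fixed $j$ to show that $\sum_{i} \hat{\mu}_{ij} = n_{\bullet j}$. Thus, $\hat{\mu}_{ij}$ has the same row totals $n_{i\bullet}$ and column totals $n_{\bullet j}$ as $n_{ij}$.

b. For 2x2 tables, show that $\frac{\hat{\mu}_{11} \hat{\mu}_{22}}{\hat{\mu}_{12} \hat{\mu}_{21}} = 1.0$. Hence, $\hat{\mu}_{ij}$ satisfy $H_0$.

Solution:

$$\frac{\hat{\mu}_{11} \hat{\mu}_{22}}{\hat{\mu}_{12} \hat{\mu}_{21}} = \frac{\dfrac{n_{1\bullet} n_{\bullet 1}}{n} \cdot \dfrac{n_{2\bullet} n_{\bullet 2}}{n}}{\dfrac{n_{1\bullet} n_{\bullet 2}}{n} \cdot \dfrac{n_{2\bullet} n_{\bullet 1}}{n}} = 1,$$

after canceling all the terms.

Exercise 6: (2.30) Table 2.17 contains results of a study comparing radiation therapy with surgery in treating cancer of the larynx. Use Fisher’s exact test to test $H_0: \theta = 1$ against $H_a: \theta > 1$. Interpret results.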

| Treatment | Cancer Controlled | Cancer Not Controlled | Total |
|---|---|---|---|
| Surgery | 21 | 2 | 23 |
| Radiation therapy | 15 | 3 | 18 |
| Total | 36 | 5 | 41 |

Solution: For $H_a: \theta > 1$, the P-value is the hypergeometric probability that the first cell count $n_{11}$ is at least as large as its observed value (21, in this case). Note that the highest possible value is 23, as that is the row total, so $p = P(21) + P(22) + P(23)$. Under $H_0$, with the row and column totals fixed, the probability of a given value of $n_{11}$ is

$$P(n_{11}) = \frac{\binom{n_{1\bullet}}{n_{11}} \binom{n_{2\bullet}}{n_{\bullet 1} - n_{11}}}{\binom{n}{n_{\bullet 1}}}.$$

So, the P-value is

$$p = P(21) + P(22) + P(23) = \frac{\binom{23}{21}\binom{18}{15}}{\binom{41}{36}} + \frac{\binom{23}{22}\binom{18}{14}}{\binom{41}{36}} + \frac{\binom{23}{23}\binom{18}{13}}{\binom{41}{36}} = 0.3808.$$

With a P-value of 0.3808 we fail to reject the null hypothesis. The data do not provide evidence that surgery controls the cancer better than radiation therapy; they are consistent with cancer control being independent of treatment.
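
The one-sided P-value in Exercise 6 can be reproduced directly from the hypergeometric formula. This is a minimal Python sketch using only the standard library; the function name `fisher_upper_tail` and its argument names are hypothetical, chosen here for illustration.

```python
from math import comb

def fisher_upper_tail(n11, row1_total, row2_total, col1_total):
    """One-sided Fisher exact P-value, P(N11 >= n11), for a 2x2 table.

    Sums the hypergeometric probabilities of every first-cell count from
    the observed n11 up to the largest value the margins allow.
    """
    n = row1_total + row2_total
    largest = min(row1_total, col1_total)
    denominator = comb(n, col1_total)
    numerator = sum(comb(row1_total, k) * comb(row2_total, col1_total - k)
                    for k in range(n11, largest + 1))
    return numerator / denominator

# Table 2.17: surgery row total 23, radiation row total 18,
# "cancer controlled" column total 36, observed n11 = 21.
p_value = fisher_upper_tail(21, 23, 18, 36)
print(round(p_value, 4))  # prints 0.3808
```

The sum runs over $n_{11} = 21, 22, 23$, the same three terms as in the solution, and the exact ratio is 285396/749398.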