STATISTICS 110, FALL 2015
Homework #8 Solutions
Assigned Wed, November 25, Due Wed, December 2

For this assignment there is no computer work. It includes only exercises from the book, with some slight modifications and additions.

1. Do Exercise 6.4, page 311. (Notice that the instructions are given at the top of the page, before Exercise 6.1, and that there are parts a to d.) Now add a part e by explaining whether or not the results as described indicate that there is an interaction between the two factors in their effect on the response.
Solution:
a. The response is the student’s score on the set of math problems they were given to solve.
b. One factor is whether the child was diagnosed as hyperactive or not and the other factor is the noise level (high or low) while the child worked on the problems.
c. The hyperactive factor is observational (because hyperactivity is something that can only be observed, not assigned) and has 2 levels (hyperactive or not). The noise level factor is experimental and has 2 levels (high, low).
d. You can answer this question either way, because we did not discuss the exact definition of a “complete block design.” However, your answer must show that you know what is meant by a “block.” A block is a group of either the same individuals measured at all of the levels of a “treatment” factor, or similar individuals each assigned to one level of the “treatment” factor. In this experiment, each student solved the problems under both conditions, high and low noise. So each student could be considered to be a block. But the technical definition of a complete block design (called a randomized block design in class) is that there is only one measurement in each combination of factor levels, and in this study there were multiple children measured at each hyperactivity by noise level combination. So this study used blocks but is not technically a complete block design.
e. Yes, the results indicate that there is an interaction between hyperactivity group and noise level in their effect on solving math problems, because the control kids did better with high noise, but the hyperactive kids did better with low noise.

2. Do Exercise 6.6, page 312. (Same instructions as for 6.4.) Now add a part e by explaining whether or not the results as described indicate that there is an interaction between the two factors in their effect on the response.
Solution:
a. The response is the amount eaten by a rat after receiving the hormone shot.
b. One factor is the sex of the rat and the other factor is the type of hormone the rat received.
c. The sex factor is observational (can’t be randomly assigned) and has two levels (male, female). The hormone factor is experimental (randomly assigned) with two levels (leptin, insulin).
d. This is not a block design at all. Each rat was given only one shot. In a block design there are multiple measurements taken on each individual (or individuals are grouped into similar sets of units).
e. Yes, the results indicate that there is an interaction between sex and type of hormone in their effect on amount eaten because the females ate less with leptin compared to insulin, while the males ate less with insulin compared to leptin.

3. Do Exercise 6.12, page 313.
Solution: large, less

4. Do Exercise 6.19 (page 314), part (a) only. (The “two interaction graphs” requested in the exercise include one with the categories of Factor A on the x-axis and lines for the categories of Factor B, and the other has those roles reversed.)
Solution: The two plots are shown below. Although you weren’t asked to interpret the plots, it’s worth thinking about the interpretation. First, men took fewer minutes overall on average than did women to burn 200 calories, so there is an effect for the “sex” factor.
Averaging across men and women, the two exercise machines have almost the same average, so there is at most a weak “exercise machine” factor effect, with a slightly lower average for the treadmill. However, there is a strong interaction effect, so it doesn’t make sense to talk about the main effects without taking that into account. Men took fewer minutes using the treadmill than using the rowing machine, while women took fewer minutes using the rowing machine than using the treadmill.

[Figure: two interaction plots of average minutes (vertical axis, about 12 to 17). One plot has Sex (Men, Women) on the x-axis with a line for each Machine (Rowing, Treadmill); the other has Machine (Rowing, Treadmill) on the x-axis with a line for each Sex (Men, Women).]

5. Using the data in Exercise 6.19, give numerical values to estimate all of the parameters in the two-factor ANOVA model $Y_{ikj} = \mu + \alpha_k + \beta_j + \gamma_{kj} + \varepsilon_{ikj}$ in the order listed below:
Solution:
a. $\mu$
This is the overall mean. The estimate is $\hat{\mu} = (12 + 17 + 14 + 16)/4 = 14.75$ minutes.
b. The values of $\alpha_k$ for k = 1, 2 where k = 1 for Treadmill and k = 2 for Rowing machine.
The estimates are (mean for that exercise machine - overall mean), so $\hat{\alpha}_1 = (12 + 17)/2 - 14.75 = 14.5 - 14.75 = -0.25$ minutes, and because they must sum to 0, $\hat{\alpha}_2 = 0.25$. So the estimate is that on average it would take 0.25 minutes less than the mean to burn the calories using the treadmill, and 0.25 minutes more than the mean using the rowing machine.
c. The values of $\beta_j$ for j = 1, 2 where j = 1 for Men and j = 2 for Women.
The estimates are (mean for that sex - overall mean), so $\hat{\beta}_1 = (12 + 14)/2 - 14.75 = 13 - 14.75 = -1.75$ and because they must sum to 0, $\hat{\beta}_2 = 1.75$. So the estimate is that on average it would take men 1.75 minutes less than the mean to burn the calories, and women 1.75 minutes more than the mean to burn them.
d. The values of $\gamma_{kj}$ for all j, k pairs.
The estimate of each interaction term is what’s still left over after accounting for the other terms in the model for that cell. So $\hat{\gamma}_{11} = \bar{y}_{11} - \hat{\mu} - \hat{\alpha}_1 - \hat{\beta}_1 = 12 - 14.75 - (-0.25) - (-1.75) = -0.75$.
This says that for the combination of male and treadmill, the estimated time is 0.75 minutes less than would be estimated by just using the male effect and treadmill effect separately. Because the interaction terms must sum to 0 for each row and column, the remaining estimates are $\hat{\gamma}_{12} = \hat{\gamma}_{21} = 0.75$ and $\hat{\gamma}_{22} = -0.75$.

6. Do Exercise 6.20 (page 314).
Solution: The interaction plots are shown below. (So that the correctness of responses can be judged, plots are shown both ways, but you only needed to do one of them.)
Interpretation: There is a sex effect, as evidenced by the fact that the average for the 3 lines for the females is higher than for the males, indicating that a higher percentage of girls than boys reported having been drunk at least twice in their lives. There is a region effect, with the highest average percentage in the Northern region and the lowest in the Continental region, with the Eastern region between the two. However, the main effects (for sex and region) should not be interpreted without taking into account the strong interaction. For the Northern region, the percentages are almost exactly the same for the girls and boys, whereas for the other two regions the percentages are higher for the girls than the boys. Looking at it in the other direction, the boys’ percentages are almost identical for the Continental and Eastern regions and much lower than for the Northern region, whereas the girls’ percentages are different for all 3 regions, with lowest for Continental and highest for Northern.

[Figure: two interaction plots of Percentage (vertical axis, about 25 to 50). One plot has Sex (Male, Female) on the x-axis with a line for each Region (Continental, Eastern, Northern); the other has Region (Continental, Eastern, Northern) on the x-axis with a line for each Sex (Male, Female).]

7. Do Exercise 6.22 (page 315).
Solution: The table is filled in below. You don’t need to explain how you got your answers, but here is the explanation.
The degrees of freedom can be found by knowing that K = 2, J = 2, n = 25 for each combination of “Face” and “Gender” and N = 100, and then applying the formulas K - 1, J - 1, (K - 1)(J - 1), KJ(n - 1), N - 1 for the two factors, the interaction, the residual, and the total. Next, MS = SS/df, so once you have the df, you can fill in SS and MS for Gender, Interaction and Residuals. Next, you can find SS for Face by subtracting all of the other SS from SSTotal, and then you can find MS for Face. Finally, each F value is the MS for that effect divided by MSE (which is in the Residuals row).

Source           df       SS        MS        F
Face (Yes/No)     1   12,915    12,915   129.15
Gender (M/F)      1    2,500     2,500    25.0
Interaction       1      400       400     4.0
Residual         96    9,600       100
Total            99   25,415

8. Do Exercise 7.8 (page 383).
Solution: The multiplier is different for each of the methods.

9. Do Exercise 7.30 (page 390). (Hint: Exercise 7.31 is very similar, and has answers in the back of the book.)
Solution:
a. $Y = \mu + \alpha_k + \varepsilon$, where $\mu$ is the grand mean shelf life for the population of strawberries, and $\alpha_k$ is the amount added to or subtracted from the population mean for treatment k.
b. $Y = \beta_0 + \beta_1 \text{Lemon} + \beta_2 \text{Paper} + \varepsilon$, where Lemon = 1 if lemon juice was used and 0 otherwise, Paper = 1 if paper towels were used and 0 otherwise. You could have an indicator for control instead of either lemon or paper, but you cannot have all 3 indicator variables because of the intercept.
c. $\beta_0$ is the mean shelf life (or predicted shelf life) for the population of all strawberries if neither of the treatments is applied; $\beta_1$ is the amount by which the population mean (or predicted) shelf life would differ from $\beta_0$ if all strawberries were treated with lemon juice, and $\beta_2$ is the amount by which the population mean (or predicted) shelf life would differ from $\beta_0$ if all strawberries were treated with paper towels.

10. Do Exercise 8.2 (page 435). Note: This last homework exercise brings us full circle, because it involves the basic concepts we discussed on the first day of class!
Solution:
a. Random assignment is not possible because the explanatory variable of racial/ethnic group is a characteristic of individuals, not something that can be assigned. Therefore, the study is observational and cause and effect cannot be concluded. There are possible confounding variables that are both related to ethnicity and might affect the birth weight of the baby, such as economic group, mother’s diet, and so on.
b. Inferences to the population can be made if the sample is representative of the population for the question of interest. This condition is satisfied if a random sample is used, and in Exercise 5.31 it is stated that a random sample was used for this study.
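
Optional check of the arithmetic in Problems 5 and 7: Although this assignment required no computer work, the hand calculations can be verified with a short script. The following is a minimal Python sketch and is not part of the original solutions. The function names (two_factor_estimates, anova_rows) and the dictionary layout are illustrative assumptions, and the code assumes a balanced design, so that marginal means are simple averages of the cell means.

```python
# Cell means (average minutes to burn 200 calories) used in Problem 5.
# Keys are (machine, sex); this layout is an illustrative choice.
cell_means = {
    ("Treadmill", "Men"): 12.0,
    ("Treadmill", "Women"): 17.0,
    ("Rowing", "Men"): 14.0,
    ("Rowing", "Women"): 16.0,
}


def two_factor_estimates(means):
    """Estimate mu, alpha_k, beta_j, gamma_kj from a balanced table of cell means."""
    machines = sorted({k for k, _ in means})
    sexes = sorted({j for _, j in means})
    mu = sum(means.values()) / len(means)  # overall mean
    # Main effects: marginal mean minus overall mean.
    alpha = {k: sum(means[(k, j)] for j in sexes) / len(sexes) - mu for k in machines}
    beta = {j: sum(means[(k, j)] for k in machines) / len(machines) - mu for j in sexes}
    # Interaction: what is left of each cell mean after mu, alpha_k, and beta_j.
    gamma = {(k, j): means[(k, j)] - mu - alpha[k] - beta[j]
             for k in machines for j in sexes}
    return mu, alpha, beta, gamma


def anova_rows(K, J, n, ss):
    """Given sums of squares for a balanced K x J design with n observations
    per cell, return df, MS, and F for each row of the ANOVA table."""
    df = {"A": K - 1, "B": J - 1, "AB": (K - 1) * (J - 1), "Residual": K * J * (n - 1)}
    ms = {row: ss[row] / df[row] for row in df}        # MS = SS / df
    f = {row: ms[row] / ms["Residual"] for row in ("A", "B", "AB")}  # F = MS / MSE
    return df, ms, f


# Problem 5: parameter estimates from the four cell means.
mu, alpha, beta, gamma = two_factor_estimates(cell_means)

# Problem 7: sums of squares taken from the completed table (A = Face, B = Gender).
ss_table = {"A": 12915.0, "B": 2500.0, "AB": 400.0, "Residual": 9600.0}
df, ms, f = anova_rows(K=2, J=2, n=25, ss=ss_table)

print(mu, alpha, beta, gamma)
print(df, ms, f)
```

With the values above, the script gives an overall mean of 14.75, a treadmill effect of -0.25, a male effect of -1.75, and a treadmill-by-male interaction of -0.75, and its sums of squares add up to the SSTotal of 25,415 shown in the Problem 7 table.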