Assignment #8, Due Wed, Dec 2

STATISTICS 110, FALL 2015
Homework #8 Solutions
Assigned Wed, November 25, Due Wed, December 2
For this assignment there is no computer work. It includes only exercises from the book, with some
slight modifications and additions.
1. Do Exercise 6.4, page 311. (Notice that the instructions are given at the top of the page, before
Exercise 6.1, and that there are parts a to d.) Now add a part e by explaining whether or not the
results as described indicate that there is an interaction between the two factors in their effect on
the response.
Solution:
a. The response is the student’s score on the set of math problems they were given to solve.
b. One factor is whether the child was diagnosed as hyperactive or not and the other factor is the
noise level (high or low) while the child worked on the problems.
c. The hyperactive factor is observational (because hyperactivity is something that can only be
observed, not assigned) and has 2 levels (hyperactive or not). The noise level factor is
experimental and has 2 levels (high, low).
d. You can answer this question either way, because we did not discuss the exact definition of a
“complete block design.” However, your answer must show that you know what is meant by a
“block.” A block is either a single individual measured at every level of a “treatment” factor, or a
group of similar individuals, each assigned to one level of the “treatment” factor. In this
experiment, each student solved the problems under both conditions, high and low noise, so each
student could be considered a block. But the technical definition of a complete block design
(called a randomized block design in class) is that there is only one measurement in each
combination of factor levels, and in this study there were multiple children measured at each
hyperactivity-by-noise-level combination. So this study used blocks but is not technically a
complete block design.
e. Yes, the results indicate that there is an interaction between hyperactivity group and noise level
in their effect on solving math problems, because the control kids did better with high noise,
but the hyperactive kids did better with low noise.
2. Do Exercise 6.6, page 312. (Same instructions as for 6.4.) Now add a part e by explaining whether
or not the results as described indicate that there is an interaction between the two factors in their
effect on the response.
Solution:
a. The response is the amount eaten by a rat after receiving the hormone shot.
b. One factor is the sex of the rat and the other factor is the type of hormone the rat received.
c. The sex factor is observational (can’t be randomly assigned) and has two levels (male, female).
The hormone factor is experimental (randomly assigned) with two levels (leptin, insulin).
d. This is not a block design at all. Each rat was given only one shot. In a block design there are
multiple measurements taken on each individual (or individuals are grouped into similar sets of
units).
e. Yes, the results indicate that there is an interaction between sex and type of hormone in their
effect on amount eaten because the females ate less with leptin compared to insulin, while the
males ate less with insulin compared to leptin.
3. Do Exercise 6.12, page 313.
Solution: large, less
4. Do Exercise 6.19 (page 314), part (a) only. (The “two interaction graphs” requested in the exercise
include one with the categories of Factor A on the x-axis and lines for the categories of Factor B,
and the other has those roles reversed.)
Solution: The two plots are shown below. Although you weren’t asked to interpret the plots, it’s
worth thinking about the interpretation. First, men took fewer minutes overall on average than did
women to burn 200 calories, so there is an effect for the “sex” factor. Averaging across men and
women the two exercise machines have almost the same average, so there is at most a weak
“exercise machine” factor effect, with a slightly lower average for the treadmill. However, there is
a strong interaction effect, so it doesn’t make sense to talk about the main effects without taking
that into account. Men took fewer minutes using the treadmill than using the rowing machine,
while women took fewer minutes using the rowing machine than using the treadmill.
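[Figure: two interaction plots of average minutes (y-axis, about 12 to 17). One has Sex (Men, Women) on the x-axis with a line for each Machine (Rowing, Treadmill); the other has Machine (Rowing, Treadmill) on the x-axis with a line for each Sex (Men, Women).]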
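The pattern in the plots can also be checked numerically. The minimal Python sketch below uses the cell means implied by the calculations in Question 5 (men: 12 minutes on the treadmill, 14 on the rowing machine; women: 17 and 16). It computes the marginal means for each factor and the treadmill-minus-rowing difference separately for each sex. If the lines in an interaction plot were parallel, those two differences would be equal; here they have opposite signs.

```python
# Cell means (average minutes to burn 200 calories), as implied by the
# calculations in Question 5.
cell_means = {
    ("Men", "Treadmill"): 12,
    ("Men", "Rowing"): 14,
    ("Women", "Treadmill"): 17,
    ("Women", "Rowing"): 16,
}
sexes = ["Men", "Women"]
machines = ["Treadmill", "Rowing"]


def marginal_mean(level, position):
    """Average the cell means whose key has `level` at index `position` (0 = sex, 1 = machine)."""
    vals = [m for key, m in cell_means.items() if key[position] == level]
    return sum(vals) / len(vals)


sex_means = {s: marginal_mean(s, 0) for s in sexes}
machine_means = {m: marginal_mean(m, 1) for m in machines}

# Treadmill minus rowing, separately for each sex. Parallel lines in an
# interaction plot would make these differences equal.
machine_diff = {s: cell_means[(s, "Treadmill")] - cell_means[(s, "Rowing")] for s in sexes}

print(sex_means)      # {'Men': 13.0, 'Women': 16.5}
print(machine_means)  # {'Treadmill': 14.5, 'Rowing': 15.0}
print(machine_diff)   # {'Men': -2, 'Women': 1}
```

Men are faster on the treadmill by 2 minutes, while women are faster on the rowing machine by 1 minute, which is the interaction described above.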
5. Using the data in Exercise 6.19, give numerical values to estimate all of the parameters in the two-factor ANOVA model $Y_{ikj} = \mu + \alpha_k + \beta_j + \gamma_{kj} + \varepsilon_{ikj}$, in the order listed below:
Solution:
a. $\mu$
This is the overall mean. The estimate is
$$\hat{\mu} = \frac{12 + 17 + 14 + 16}{4} = 14.75 \text{ minutes.}$$
b. The values of $\alpha_k$ for $k = 1, 2$, where $k = 1$ for Treadmill and $k = 2$ for Rowing machine.
The estimates are (mean for that exercise machine – overall mean), so
$$\hat{\alpha}_1 = \frac{12 + 17}{2} - 14.75 = 14.5 - 14.75 = -0.25 \text{ minutes,}$$
and because they must sum to 0, $\hat{\alpha}_2 = 0.25$. So the estimate is that on average it would take
0.25 minutes less than the overall mean to burn the calories using the treadmill, and 0.25 minutes
more than the overall mean using the rowing machine.
c. The values of $\beta_j$ for $j = 1, 2$, where $j = 1$ for Men and $j = 2$ for Women.
The estimates are (mean for that sex – overall mean), so
$$\hat{\beta}_1 = \frac{12 + 14}{2} - 14.75 = 13 - 14.75 = -1.75 \text{ minutes,}$$
and because they must sum to 0, $\hat{\beta}_2 = 1.75$. So the estimate is that on average it would take
men 1.75 minutes less than the overall mean to burn the calories, and women 1.75 minutes more.
d. The values of $\gamma_{kj}$ for all $(k, j)$ pairs.
The estimate of each interaction term is what is still left over in that cell after accounting for the
other terms in the model. So
$$\hat{\gamma}_{11} = \bar{y}_{11} - \hat{\mu} - \hat{\alpha}_1 - \hat{\beta}_1 = 12 - 14.75 - (-0.25) - (-1.75) = -0.75.$$
This says that for the combination of men and treadmill, the estimated mean is 0.75 minutes less
than would be predicted by adding the treadmill effect and the male effect separately to the overall
mean. Because the interaction terms must sum to 0 across each row and down each column, the
remaining estimates are $\hat{\gamma}_{12} = \hat{\gamma}_{21} = 0.75$ and $\hat{\gamma}_{22} = -0.75$.
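The same estimates can be computed mechanically from the four cell means. This minimal Python sketch assumes the cell layout used above ($k = 1$ Treadmill, $k = 2$ Rowing; $j = 1$ Men, $j = 2$ Women, stored with 0-based indices) and weights every cell mean equally, as the hand calculations do.

```python
# Cell means y[k][j] in minutes:
#   k = 0 Treadmill, k = 1 Rowing (rows); j = 0 Men, j = 1 Women (columns).
y = [[12, 17],
     [14, 16]]
K, J = len(y), len(y[0])

# Overall mean: average of the K*J cell means.
mu = sum(sum(row) for row in y) / (K * J)
# Machine effects: row (machine) mean minus overall mean.
alpha = [sum(y[k]) / J - mu for k in range(K)]
# Sex effects: column (sex) mean minus overall mean.
beta = [sum(y[k][j] for k in range(K)) / K - mu for j in range(J)]
# Interaction effects: what remains in each cell after mu, alpha, and beta.
gamma = [[y[k][j] - mu - alpha[k] - beta[j] for j in range(J)] for k in range(K)]

print(mu)     # 14.75
print(alpha)  # [-0.25, 0.25]
print(beta)   # [-1.75, 1.75]
print(gamma)  # [[-0.75, 0.75], [0.75, -0.75]]
```

The sum-to-zero constraints hold automatically: each row and each column of `gamma` adds up to 0.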
6. Do Exercise 6.20 (page 314).
Solution: The interaction plots are shown below. (So that responses drawn either way can be
checked, plots are shown both ways, but you only needed to do one of them.) Interpretation:
• There is a sex effect: averaged over the 3 regions, the percentage is higher for females than for
males, indicating that a higher percentage of girls than boys reported having been drunk at least
twice in their lives.
• There is a region effect, with the highest average percentage in the Northern region and the
lowest in the Continental region, with the Eastern region between the two.
• However, the main effects (for sex and region) should not be interpreted without taking into
account the strong interaction. For the Northern region, the percentages are almost exactly the
same for the girls and boys, whereas for the other two regions the percentages are higher for the
girls than the boys. Looking at it in the other direction, the boys’ percentages are almost
identical for the Continental and Eastern regions and much lower than for the Northern region,
whereas the girls’ percentages differ across all 3 regions, lowest for Continental and highest for
Northern.
[Figure: two interaction plots of Percentage (y-axis, about 25 to 50). One has Sex (Male, Female) on the x-axis with a line for each Region (Continental, Eastern, Northern); the other has Region on the x-axis with a line for each Sex (Male, Female).]
7. Do Exercise 6.22 (page 315).
Solution: The table is filled in below. You don’t need to explain how you got your answers, but
here is the explanation. The degrees of freedom follow from K = 2, J = 2, n = 25 for each
combination of “Face” and “Gender,” and N = 100, using the formulas K – 1, J – 1, (K – 1)(J – 1),
KJ(n – 1), and N – 1. Next, MS = SS/df, so once you have the df, you can fill in SS and MS for
Gender, Interaction and Residuals. Next, you can find SS for Face by subtracting all of the other
SS from SSTotal, and then you can find MS for Face. Finally, each F value is the MS for that effect
divided by MSE (the MS in the Residual row).
Source             df        SS         MS         F
Face (Yes/No)       1    12,915     12,915    129.15
Gender (M/F)        1     2,500      2,500      25.0
Interaction         1       400        400       4.0
Residual           96     9,600        100
Total              99    25,415
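The arithmetic described above can be checked with a short Python sketch. It treats the SS values for Gender, Interaction, Residual, and Total shown in the table as the known inputs (an assumption about which entries are given; the exercise may supply some of them as MS instead) and derives the df, SS for Face, every MS, and every F value.

```python
# Sketch: rebuild the two-factor ANOVA table from the design sizes and known SS values.
K, J, n = 2, 2, 25          # levels of Face, levels of Gender, observations per cell
N = K * J * n               # total sample size (100)

df = {
    "Face": K - 1,
    "Gender": J - 1,
    "Interaction": (K - 1) * (J - 1),
    "Residual": K * J * (n - 1),
    "Total": N - 1,
}

# Assumed known inputs (values from the completed table).
ss = {"Gender": 2500, "Interaction": 400, "Residual": 9600, "Total": 25415}
# SS for Face is whatever is left of SSTotal after the other sources.
ss["Face"] = ss["Total"] - ss["Gender"] - ss["Interaction"] - ss["Residual"]

effects = ["Face", "Gender", "Interaction"]
ms = {src: ss[src] / df[src] for src in effects + ["Residual"]}  # MS = SS / df
mse = ms["Residual"]
f_stats = {src: ms[src] / mse for src in effects}                 # F = MS / MSE

for src in effects + ["Residual", "Total"]:
    line = f"{src:<12} df={df[src]:>3}  SS={ss[src]:>6}"
    if src in ms:
        line += f"  MS={ms[src]:>8.2f}"
    if src in f_stats:
        line += f"  F={f_stats[src]:.2f}"
    print(line)
```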
8. Do Exercise 7.8 (page 383).
Solution: The multiplier is different for each of the methods.
9. Do Exercise 7.30 (page 390). (Hint: Exercise 7.31 is very similar, and has answers in the back of
the book.)
Solution:
a. $Y = \mu + \alpha_k + \varepsilon$, where μ = grand mean shelf life for the population of strawberries, and αk is
the amount added to or subtracted from the population mean for treatment k.
b. Y = β0 + β1Lemon + β2Paper + ε, where Lemon = 1 if lemon juice was used and 0 otherwise,
Paper = 1 if paper towels were used and 0 otherwise. You could have an indicator for control
instead of either lemon or paper, but you cannot have all 3 indicator variables because of the
intercept.
c. β0 is the mean shelf life (or predicted shelf life) for the population of all strawberries if neither
treatment is applied (the control condition); β1 is the amount by which the population mean (or
predicted) shelf life would differ from β0 if all strawberries were treated with lemon juice; and β2 is
the amount by which the population mean (or predicted) shelf life would differ from β0 if all
strawberries were treated with paper towels.
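To see why the coefficients in part c have these interpretations, here is a minimal Python sketch that fits the indicator model of part b by least squares, solving the normal equations exactly with fractions. The shelf-life numbers are hypothetical, made up only for illustration; they are not the data from Exercise 7.30. With one categorical factor coded by indicators, the fitted model reproduces the group means, so β0 equals the control mean and β1 and β2 equal the lemon and paper means minus the control mean.

```python
from fractions import Fraction

# Hypothetical shelf-life data (days), for illustration only; not from the exercise.
data = {
    "control": [5, 6, 7],
    "lemon": [8, 9, 10, 9],
    "paper": [6, 7, 8],
}


def design_row(group):
    """Intercept, Lemon indicator, Paper indicator; control is the baseline group."""
    return [1, 1 if group == "lemon" else 0, 1 if group == "paper" else 0]


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination (exact with Fractions)."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_indicator_model(data):
    """Least-squares estimates (b0, b1, b2) via the normal equations (X'X) b = X'y."""
    rows, ys = [], []
    for group, values in data.items():
        for value in values:
            rows.append(design_row(group))
            ys.append(value)
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(xtx, xty)


def mean(values):
    return Fraction(sum(values), len(values))


b0, b1, b2 = fit_indicator_model(data)
print(b0, b1, b2)  # 6 3 1
```

Exact fractions avoid floating-point round-off in this small example; with real data a library least-squares routine such as `numpy.linalg.lstsq` would normally be used instead.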
10. Do Exercise 8.2 (page 435). Note: This last homework exercise brings us full circle, because it
involves the basic concepts we discussed on the first day of class!
Solution:
a. Random assignment is not possible because the explanatory variable of racial/ethnic group is a
characteristic of individuals, not something that can be assigned. Therefore, the study is
observational and cause and effect cannot be concluded. There are possible confounding
variables that are both related to ethnicity and might affect the birth weight of the baby, such as
economic group, mother’s diet, and so on.
b. Inferences to the population can be made if the sample is representative of the population for
the question of interest. This condition is satisfied if a random sample is used, and in Exercise
5.31 it is stated that a random sample was used for this study.