Chapter 15 Analysis of Variance for Standard Designs

Case Study 1
Ramsey and Schafer (1997) The Statistical Sleuth, Duxbury Press, p. 365. The Pygmalion effect.
• Pygmalion was a king of Cyprus who sculpted a figure of the ideal woman and then
fell in love with the sculpture. The term "Pygmalion effect" refers to the situation in which high expectations
placed on individuals by teachers or supervisors often result in improved performance
by students or subordinates.
• Eden, D. 1990, Pygmalion effects without interpersonal contrast effects, J. Appl. Psychol. 75(4), 395-98. Eden speculated that in most quantitative examples of the Pygmalion effect that compared two groups of subjects (one with high expectations, and
the other without), reduced expectations were also placed on the “no expectation” group. This contrast between high and low expectations may exaggerate the
Pygmalion effect.
• Eden conducted an experiment that attempted to eliminate interpersonal contrasts of
this type. Ten companies of soldiers in a training camp were selected for study. Each
company consisted of three platoons; one platoon in each was selected by a random
mechanism to be the Pygmalion platoon.
• Companies are identified as blocks, and because randomization was within block, this
is a randomized block design.
• Prior to assuming command, each platoon leader met with an army psychologist who
described a nonexistent battery of tests taken by the platoon members. If the platoon
was a Pygmalion platoon, the psychologist reported that the tests predicted superior
performance for the platoon.
• At the end of training, each platoon took a test that measured their ability to operate
weapons and answer questions about their use.
Table 1: Platoon average scores on the test.

Company   Pygmalion   Control
1         80.0        63.2  69.2
2         83.9        63.1  81.5
3         68.2        76.2
4         76.5        59.5  73.5
5         87.8        73.9  78.5
6         89.8        73.9  78.5
7         76.1        60.6  69.6
8         71.5        67.8  73.2
9         69.5        72.3  73.9
10        83.7        63.7  77.7
• Note that the experimental units are platoons - scores on individual soldiers are
averaged, then discarded. Treatments were assigned to platoons - hence these are the
experimental units.
• There is one observation for each combination of company and the Pygmalion treatment, and usually two observations for each combination of company and the control
treatment - so the experiment is not balanced
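The lack of balance can be checked by counting the observations in each company-by-treatment cell. The sketch below is only an illustration: the name `scores` and the dictionary layout are not from the original notes. Table 1 as printed shows the same control scores for companies 5 and 6. The sketch assumes 78.9 and 84.7 for company 6, because those values reproduce the total and error sums of squares (1788.362 and 467.040) reported in the ANOVA tables later in these notes.

```python
# Platoon average scores keyed by company: (Pygmalion scores, control scores).
# Assumption: company 6 controls are 78.9 and 84.7, not the 73.9 and 78.5
# printed in Table 1 (see the note above).
scores = {
    1: ([80.0], [63.2, 69.2]),
    2: ([83.9], [63.1, 81.5]),
    3: ([68.2], [76.2]),
    4: ([76.5], [59.5, 73.5]),
    5: ([87.8], [73.9, 78.5]),
    6: ([89.8], [78.9, 84.7]),
    7: ([76.1], [60.6, 69.6]),
    8: ([71.5], [67.8, 73.2]),
    9: ([69.5], [72.3, 73.9]),
    10: ([83.7], [63.7, 77.7]),
}

# Number of observations in each company-by-treatment cell.
cell_counts = {}
for company, (pygmalion, control) in scores.items():
    cell_counts[(company, "Pygmalion")] = len(pygmalion)
    cell_counts[(company, "Control")] = len(control)

n = sum(cell_counts.values())
# A balanced design has the same number of observations in every cell.
balanced = len(set(cell_counts.values())) == 1

print("total platoons:", n)
print("balanced design:", balanced)
```

The cell counts are what matter here: they do not depend on the assumed company 6 scores.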
Summary The Pygmalion effect adds an estimated 7.22 points to a platoon’s score
(95% CI is 1.80 to 12.64 points). The evidence strongly suggests that the effect is real
(p-value= 0.006), and the design allows for causal inference.
Additive and Non-additive Models for Two-way Tables
• In situations where there are two factors, the data set may be viewed as a two-way (rows
and columns) table where rows and columns correspond to factors. In the Pygmalion
study, rows are company (10 levels), and columns are treatment (2 levels).
• Research questions tend to focus on additive models for two-way tables.
• An additive model assumes that there is no interaction between the row and column
factors.
• An additive model specifies that the effect of one level of one factor (e.g., the row
factor) is the same at all levels of the second factor.
• The effect of factor A is completely unrelated to the levels of B; likewise, the effect of
factor B is completely unrelated to factor A
• For example, the combined effect of factor A at level 1 and factor B at level 3 is the sum of the
effect of A at level 1 and the effect of B at level 3. If the model were not additive, then the
combined effect would not be the sum of the two effects
• Example: Mean test scores according to the additive model, in terms of regression
coefficients in a multiple regression model with indicator variables
• β0 is the expected response for a control platoon in company 1 (the reference company)
• β1 is the Pygmalion effect (the difference in expected response between the Pygmalion
and control treatments)
• β2 is the effect of company 2 (relative to the aliased, or reference, company - company 1), etc.
• The additive model is powerful because it is universal in the sense that the factor effects
are independent of each other. No matter which company is scrutinized, the Pygmalion
effect is the same.

Table 2: Additive Model of the Expected, or Mean Responses.

Company   Pygmalion          Control      Pygmalion − Control
1         β0 + β1            β0           β1
2         β0 + β1 + β2       β0 + β2      β1
3         β0 + β1 + β3       β0 + β3      β1
4         β0 + β1 + β4       β0 + β4      β1
5         β0 + β1 + β5       β0 + β5      β1
6         β0 + β1 + β6       β0 + β6      β1
7         β0 + β1 + β7       β0 + β7      β1
8         β0 + β1 + β8       β0 + β8      β1
9         β0 + β1 + β9       β0 + β9      β1
10        β0 + β1 + β10      β0 + β10     β1
• If the additive model fails to fit well, then all references to the Pygmalion effect must
be stated with respect to a particular company
Fitting the Additive Model
• Let Y represent the platoon score, and let
\[
x_1 = \begin{cases} 1, & \text{if treatment is Pygmalion} \\ 0, & \text{if treatment is control,} \end{cases}
\qquad
x_2 = \begin{cases} 1, & \text{if company is 2} \\ 0, & \text{if company is not 2,} \end{cases}
\qquad \ldots, \qquad
x_{10} = \begin{cases} 1, & \text{if company is 10} \\ 0, & \text{if company is not 10.} \end{cases}
\]
• The additive model, in regression form, is
Y = β0 + β1 x1 + β2 x2 + · · · + β10 x10 + ε
• This model contains 11 parameters:
p = 1 + (r − 1) + (c − 1) = 1 + 9 + 1 = 11
where r is the number of levels of the row factor and c is the number of levels of the
column factor. One additional parameter is counted for the intercept β0
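The indicator coding can be made concrete with a small sketch. The function name `additive_row` and the string labels for the treatments are hypothetical, chosen only for illustration; the coding itself follows the definitions of x1, ..., x10 given above, with company 1 as the reference level.

```python
def additive_row(company, treatment, n_companies=10):
    """Return [1, x1, x2, ..., x10] for one platoon under the additive model.

    The leading 1 multiplies the intercept beta_0; x1 flags the Pygmalion
    treatment; x2..x10 flag companies 2..10 (company 1 is the reference).
    """
    x1 = 1 if treatment == "Pygmalion" else 0
    company_indicators = [1 if company == k else 0
                          for k in range(2, n_companies + 1)]
    return [1, x1] + company_indicators


r, c = 10, 2                      # levels of the row and column factors
p = 1 + (r - 1) + (c - 1)         # number of parameters in the additive model

row = additive_row(company=2, treatment="Pygmalion")
print("parameters:", p)
print("design row:", row)
```

Each row of the design matrix has as many entries as the model has parameters, which is one way to see where the count p = 11 comes from.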
The Saturated, Nonadditive Model
• One competing, alternative model to the additive model is the saturated, nonadditive
model. It allows for interaction between rows and columns, and in doing so, does not
assume that the effect of one particular factor level is the same at each level of the other
factor
• It is called saturated because there are no additional parameters that can be introduced
to reduce the residual variation about this model. Said another way, there are as many
parameters as cells (20 = 2 × 10 cells)
• Every cell has its own, unconstrained, expected value (or mean)
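• Let
\[
x_{11} = x_1 \times x_2 = \begin{cases} 1, & \text{if treatment is Pygmalion and company is 2} \\ 0, & \text{otherwise} \end{cases}
\]
\[
\vdots
\]
\[
x_{19} = x_1 \times x_{10} = \begin{cases} 1, & \text{if treatment is Pygmalion and company is 10} \\ 0, & \text{otherwise} \end{cases}
\]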
• We sometimes call this the cell means model because each cell has a freely
determined mean
• The nonadditive saturated model, in regression form, is
Y = β0 + β1 x1 + β2 x2 + · · · + β10 x10 + β11 x11 + · · · + β19 x19 + ε
• Mean test scores according to the nonadditive saturated model are shown in Table 3

Table 3: Mean test scores according to the nonadditive saturated model, in terms of regression coefficients
in a multiple regression model with indicator variables.

Company   Pygmalion                  Control      Pygmalion − Control
1         β0 + β1                    β0           β1
2         β0 + β1 + β2 + β11         β0 + β2      β1 + β11
3         β0 + β1 + β3 + β12         β0 + β3      β1 + β12
4         β0 + β1 + β4 + β13         β0 + β4      β1 + β13
5         β0 + β1 + β5 + β14         β0 + β5      β1 + β14
6         β0 + β1 + β6 + β15         β0 + β6      β1 + β15
7         β0 + β1 + β7 + β16         β0 + β7      β1 + β16
8         β0 + β1 + β8 + β17         β0 + β8      β1 + β17
9         β0 + β1 + β9 + β18         β0 + β9      β1 + β18
10        β0 + β1 + β10 + β19        β0 + β10     β1 + β19
• This nonadditive saturated model contains 20 parameters:
p = 1 + (r − 1) + (c − 1) + (r − 1) × (c − 1) = 1 + 9 + 1 + 9 = 20
(same as the number of cells)
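The difference between the two models can be illustrated numerically. In the sketch below the coefficient values are hypothetical, chosen only so the arithmetic is easy to follow; they are not estimates from the Pygmalion data. The function `cell_mean` (also a hypothetical name) evaluates the saturated-model mean for one cell, and setting β11, ..., β19 to zero recovers the additive model.

```python
def cell_mean(beta, company, treatment):
    """Mean response for one cell; beta[k] is the coefficient beta_k, k = 0..19."""
    x1 = 1 if treatment == "Pygmalion" else 0
    mean = beta[0] + beta[1] * x1
    if company >= 2:
        mean += beta[company]             # company main effect: beta_2 .. beta_10
        mean += beta[company + 9] * x1    # interaction: beta_11 .. beta_19
    return mean


# Hypothetical coefficients, for illustration only.
beta = ([70.0, 7.0]
        + [float(k) for k in range(2, 11)]      # beta_2 .. beta_10
        + [0.5 * k for k in range(11, 20)])     # beta_11 .. beta_19

# Pygmalion minus control, company by company, under the saturated model.
effects = {i: cell_mean(beta, i, "Pygmalion") - cell_mean(beta, i, "Control")
           for i in range(1, 11)}

# Zeroing the interaction coefficients gives the additive model.
beta_additive = beta[:11] + [0.0] * 9
effects_additive = {i: cell_mean(beta_additive, i, "Pygmalion")
                       - cell_mean(beta_additive, i, "Control")
                    for i in range(1, 11)}

print(effects)
print(effects_additive)
```

Under the saturated model the treatment effect for company i (i ≥ 2) is β1 + β(9+i), so it differs by company; under the additive model it is β1 everywhere.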
• In the saturated model, the cell means are unrelated: there are no constraints on the
means (such as requiring the Pygmalion effect to be the same for all companies). The estimated cell
mean (e.g., for Company 2, Pygmalion) is the sample mean of the platoon scores in that cell. (The same estimate
can be obtained by fitting the model and computing the fitted value)
• Let Yijk denote the platoon mean for level i of company, i = 1, . . . , 10, and level j of
treatment, j = 1, 2, and replicate k (= 1 or 2)
• The number of replicates in cell (i, j) is nij . nij is 1 whenever j = 1 (Pygmalion treatment), and
nij is 2 for the controls (except company 3, which has a single control platoon)
• We may re-express the model as
Yijk = β0 + β1 x1,ij
+ β2 x2,ij + · · · + β10 x10,ij
+ β11 x11,ij + · · · + β19 x19,ij + εijk
• We also write a short-hand expression Yijk = µij + εijk where
µij = β0 + β1 x1,ij
+ β2 x2,ij + · · · + β10 x10,ij
+ β11 x11,ij + · · · + β19 x19,ij
The subscripts i and j on µ imply that there is a distinct, freely determined mean for
each cell
• The estimate of σ², the residual variance, is
\[
s^2 = \frac{1}{n-p} \sum_i \sum_j \sum_{k=1}^{n_{ij}} (y_{ijk} - \hat{y}_{ij})^2 .
\]
• Note that n − p = 29 − 20 = 9 are the degrees of freedom for error
• The predicted or fitted values ŷij can be computed as an output of multiple regression
model fitting, or simply by computing the cell means ȳij . For example, if i = 1 and
j = 1, then ŷ11 = y111 ; if i = 1 and j = 2, then
\[
\hat{y}_{12} = \frac{y_{121} + y_{122}}{2}
\]
A Strategy for Analyzing Two-way Tables With Several Observations per Cell
The fixed effects analysis of variance is approached as a multiple regression analysis
in which backwards elimination is used to determine the importance of the factors in
explaining variation in the response variable
1. Begin with graphically-based initial exploration, and determine if there are outliers,
and if transformations are needed
2. Fit a rich model (the saturated model) with interactions, and examine model assumptions (concentrating on the constant variance assumption, and whether there
are outliers)
3. Test whether the interaction terms are needed (via the F -statistic).
• If interaction terms are needed, then estimate the mean response and its standard
error for each treatment. That is, compute ȳij and the standard error
$\hat{\sigma}_{\bar{y}_{ij}} = s/\sqrt{n_{ij}}$ for each i and j.
• If interaction terms are not needed, then test whether the additive effects of the
row factor are zero, and whether the additive effects of the column factor are zero.
In other words, test whether the coefficients that account for the factor are all zero
versus the alternative that at least one is different from zero.
• Particular comparisons can be carried out at this point. For example, estimate the
differences in expected response for different treatments (when interaction is found to be
present) or different levels of factors (when interaction is not present). The answers to
these questions are ultimately the most useful information coming from the analysis
The Analysis of Variance F -test for Additivity
• If all the interaction parameters are equal to 0, then the nonadditive model reduces
to the additive model. A check on the additive model is obtained by comparing the fit
of the additive and nonadditive models
• A test used to compare fit is the extra-sums-of-squares F -test.
• The hypotheses of interest are (informally)
H0 : there is no interaction between the two factors
Ha : there is interaction between the two factors
• For example, for the Pygmalion problem, the test for additivity is a test of the following
hypotheses
H0 : β11 = 0, . . . , β19 = 0, versus
Ha : βi ≠ 0 for at least one i where 11 ≤ i ≤ 19
• An F -test of these hypotheses compares the error sums of squares of the
model containing all of the interaction indicator variables to the error sums of squares
of the model containing none of them
• A large difference in error sums of squares is evidence that the interaction terms explain
variation in the response variable
• Specifically, we compare the fit of the model
Yij = β0 + β1 x1,j + β2 x2,i + · · · + β10 x10,i + β11 x11,i,j + · · · + β19 x19,i,j + εij
to the fit of the reduced model
Yij = β0 + β1 x1,j + β2 x2,i + · · · + β10 x10,i + εij
• As before, let
1. M1 denote the model with the interaction terms,
SSE1 denote the error sums of squares associated with M1 , and
df1 denote the degrees of freedom associated with M1 (df1 = n − p, where p is the number of parameters in M1 )
2. M2 denote the model without the interaction terms,
SSE2 denote the error sums of squares associated with M2 , and
df2 denote the degrees of freedom associated with M2
• The test statistic is
\[
F = \frac{(SSE_2 - SSE_1)/(df_2 - df_1)}{SSE_1/df_1}
\]
• Under H0 , F has an F -distribution with n1 = df2 − df1 = (r − 1)(c − 1) numerator and
n2 = df1 denominator degrees of freedom, respectively
• We reject H0 at the α-level if F > fα , where α = P (Fn1 ,n2 > fα ) is the probability
that an F random variable with n1 numerator and n2 denominator degrees of freedom
takes on a value larger than fα
• A p-value for the test is P (Fn1 ,n2 > F )
• F has an F -distribution with numerator and denominator degrees of freedom df2 − df1
and df1 , respectively, provided that the random error terms εij are iid N (0, σ). This
assumption must be investigated by residual plots which check for non-constant variance
and approximate normality. The residuals used in this analysis are the residuals from
the full model, since σ is estimated using the residual mean square error from the full
model
• Figure 1 shows residual plots using residuals from the full model. There is some concern
regarding the assumption of constant variance (and no apparent method of improving
the appearance). Why is the left plot symmetric about the horizontal axis?
[Figure 1: Residual plots. Left panel: residuals versus fitted values. Right panel: residuals versus quantiles of the standard normal. Residuals obtained from the interaction model. n = 29.]
• SPSS profile plots for the additive and nonadditive model are fairly similar, indicating
visually that the additive model is adequate
• A formal test of significance usually is necessary. To carry out a test of the hypotheses
H0 : there is no interaction between the two factors, versus
Ha : there is interaction between the two factors,
we formally test
H0 : β11 = 0, . . . , β19 = 0, versus
Ha : βi ≠ 0 for at least one i where 11 ≤ i ≤ 19
• The test statistic is
\[
F = \frac{(SSE_2 - SSE_1)/(df_2 - df_1)}{SSE_1/df_1}
\]
where
SSE1 is the error sums of squares for the interaction model,
df1 are the degrees of freedom associated with the interaction model,
SSE2 is the error sums of squares for the additive model,
df2 are the degrees of freedom for the additive model
• Under H0 , F has an F -distribution with n1 = df2 − df1 numerator and n2 = df1 denominator
degrees of freedom, respectively
• The p-value for the test is P (Fn1 ,n2 > F )
• In this case,
\[
F = \frac{(778.50 - 467.04)/9}{467.04/9} = \frac{34.61}{51.89} = 0.667,
\]
and P (F9,9 > 0.667) = 0.72.
• I conclude that there is no statistical evidence of interaction. This implies that there
is no evidence that the size of the Pygmalion effect depends on the particular company
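The extra-sums-of-squares F -test for additivity can be reproduced from the two error sums of squares alone. The sketch below uses only the Python standard library; the function names are hypothetical, and the tail probability is obtained by numerically integrating the incomplete beta function with Simpson's rule, which is a stand-in for the F -distribution routine of a statistics package, not the method used in the notes. With the values quoted above, the statistic should come out near 0.667 and the tail probability near 0.72.

```python
from math import exp, lgamma


def f_upper_tail(f, df_num, df_den, steps=2000):
    """Approximate P(F > f) for an F(df_num, df_den) random variable.

    Uses P(F > f) = I_x(df_den/2, df_num/2) with x = df_den/(df_den + df_num*f),
    where I_x is the regularized incomplete beta function, evaluated here by
    Simpson's rule. Assumes f > 0, df_den >= 2, and an even number of steps.
    """
    a, b = df_den / 2.0, df_num / 2.0
    x = df_den / (df_den + df_num * f)
    h = x / steps

    def integrand(t):
        return t ** (a - 1.0) * (1.0 - t) ** (b - 1.0)

    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    log_beta = lgamma(a) + lgamma(b) - lgamma(a + b)
    return (total * h / 3.0) / exp(log_beta)


def extra_ss_f(sse_reduced, df_reduced, sse_full, df_full):
    """Extra-sums-of-squares F statistic comparing a reduced and a full model."""
    numerator = (sse_reduced - sse_full) / (df_reduced - df_full)
    denominator = sse_full / df_full
    return numerator / denominator


# Additive (reduced) model: SSE = 778.50 on 18 df.
# Interaction (full) model: SSE = 467.04 on 9 df.
f_stat = extra_ss_f(sse_reduced=778.50, df_reduced=18, sse_full=467.04, df_full=9)
p_value = f_upper_tail(f_stat, df_num=18 - 9, df_den=9)

print("F =", round(f_stat, 3))
print("p-value =", round(p_value, 3))
```

In practice a statistics package would supply the tail probability directly; the hand-rolled integration is used here only to keep the example free of external dependencies.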
Ott and Longnecker’s 15.3 Randomized complete block design (p. 859)
• The discussion of Ott and Longnecker involves t treatments in b blocks. This means
that there are two factors, the blocking factor (corresponding to company in the previous
example), and another factor with t levels (see Table 15.8).
• A blocking factor is one that is not necessarily believed to affect the response variable.
Instead, it is a variable that serves to identify groups of observations that are similar
within groups and possibly different between groups. Company is a variable that was
thought to do this in the Pygmalion study.
There is occasionally some interest in
determining whether the blocking factor improved the efficiency of the
experiment, but usually there is no substantive interest beyond this role. In other words,
the test of significance for a blocking factor is of minor importance. Some statisticians
argue that a test of significance is not justifiable because the role of the blocking factor is to reduce residual
error, not to explain variation in the response variable
• A blocking factor is often treated as a random factor (not a fixed factor). We will study
random factors later.
Treating a block as a fixed factor is not encouraged, though it
does simplify certain aspects of the analysis and interpretation of the results.
• Ott and Longnecker present a slightly different model than the general linear model
presented in these notes. They reserve α1 , . . . , αt to denote the effects associated with the
t levels of the fixed factor, β1 , . . . , βb to denote the effects associated with the b levels of
the blocking factor, and µ to denote the overall average. Ott and Longnecker’s model is
overparameterized (there are 1+t+b parameters whereas only 1+(b−1)+(t−1) = t+b−1
can be estimated). When it comes to data analysis, their model must be discarded and
replaced with the one presented in these notes (or one like it).
• Ott and Longnecker’s Figure 15.1 is a profile plot
• Ott and Longnecker’s ANOVA table (Table 15.1) has an extra line reporting the total
sums of squares. In the context of the Pygmalion data, the total sums of squares is
\[
\sum_i \sum_j \sum_k (y_{ijk} - \bar{y}_{\cdot\cdot\cdot})^2 .
\]
• Because ȳ··· is the mean of all observations, the total sums of squares measures
the error associated with the "no-information" model, that is, the model that uses the
sample mean to predict the value of any future observation.
• The main difference between this expression for the total sums of
squares and the one in Ott and Longnecker’s book is the innermost summation sign. It is
absent in their discussion because they assume that there is exactly one observation per
treatment/block combination, and so there is no need for the innermost summation.
• With only one observation per treatment/block combination, it is impossible to model
interaction. The total number of parameters in the interaction model is p = 1 + (t −
1) + (b − 1) + (t − 1) × (b − 1) = t × b. Since t × b = n, the degrees of freedom to
estimate σ is 0, and the model fits the data perfectly. Ott and Longnecker are forced
into the position of arguing that interaction does not exist in order to proceed with their
analysis of the additive (no-interaction) model. The assumption that
interaction does not exist is usually reasonable when one of the two factors is a blocking
factor because the blocking factor is used to reduce residual variation, not because it is
thought to substantively affect the response variable.
• In the analysis of the Pygmalion data, there are 2 observations in some of the treatment/block combinations. Consequently, it is possible to estimate σ when an interaction
model is adopted, and hence possible to test for interaction between blocks and treatment.
• In general, it is preferable to have some replication of treatment/block combinations
in order to investigate, or at least have the ability to investigate the interaction model
• The process of assigning observational units to treatment/block combinations is carried
out (usually) by randomly choosing blocks, and then, one block at a time, randomly assigning observations to treatments. For example, the companies are formed by randomly
assigning soldiers to platoons, and then forming companies of either 3 or 2 platoons.
Then, within each company, one platoon is randomly selected to be the Pygmalion treatment platoon.
• Often, blocks are contiguous landscape units (e.g., a field or a forest stand). Plots
are experimental units that are treated in some way (say, thinned or fertilized), and
then measured (think of a square with an area of 100 m²). Plot locations are randomly
established within each block. Usually, but not necessarily, every treatment appears
at least once in each block. This serves to reduce environmental variation as a source
of residual error, and generally improves the sensitivity of the experimental design for
detecting treatment differences.
• Blocking can be carried out with more than one experimental factor. For example,
if we are interested in forest fuel reduction, two factors of potential interest are season
(levels are spring and fall), and pre-burning thinning (thinned or not). Since existing
fuel loading and soil moisture are potentially confounding variables, blocking helps to
reduce the residual error associated with these difficult-to-measure variables.
Analysis of Variance Terminology
• Analysis of variance (ANOVA) is a partitioning of the total sums of squares into terms
that are attributable to the model factors
• The following discussion describes ANOVA terms for two-way tables. Ott and Longnecker discuss the ANOVA for one-way tables (i.e., one factor) on page 856, and randomized complete block design (two factors) on page 865.
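• The total sums of squares are
\[
SST = \sum_{i=1}^{r} \sum_{j=1}^{c} \sum_{k=1}^{n_{ij}} (y_{ijk} - \bar{y}_{\cdot\cdot\cdot})^2
    = \sum_{i,j,k} (y_{ijk} - \bar{y}_{\cdot\cdot\cdot})^2
\]
where nij is the number of observations in cell i, j, and
\[
\bar{y}_{\cdot\cdot\cdot} = \frac{1}{n} \sum_{i,j,k} y_{ijk}, \qquad n = \sum_{i,j} n_{ij},
\]
is the grand, or overall, mean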
• The degrees of freedom for SST are n − 1
• The between sum of squares (between cells) SSB is that portion of SST which is
attributable to differences between the cell means and the overall mean, i.e.,
\[
SSB = \sum_i \sum_j \sum_k (\bar{y}_{ij\cdot} - \bar{y}_{\cdot\cdot\cdot})^2
    = \sum_{i,j} n_{ij} (\bar{y}_{ij\cdot} - \bar{y}_{\cdot\cdot\cdot})^2
\]
where the cell means are
\[
\bar{y}_{ij\cdot} = \frac{1}{n_{ij}} \sum_k y_{ijk}
\]
• It is easy to get cell means from SPSS by fitting the interaction model, and saving the
predicted values. The predicted values are the cell means, i.e.,
\[
\hat{y}_{ijk} = \bar{y}_{ij\cdot} \quad \text{for every } i, j \text{ and } k
\]
• Consequently, SSB is exactly equivalent to the regression sums of squares SSR from
the saturated model. Mathematically,
\[
SSR = \sum_{i,j,k} (\hat{y}_{ijk} - \bar{y}_{\cdot\cdot\cdot})^2
\]
and so
\[
SSR = \sum_{i,j,k} (\hat{y}_{ijk} - \bar{y}_{\cdot\cdot\cdot})^2
    = \sum_{i,j} n_{ij} (\bar{y}_{ij\cdot} - \bar{y}_{\cdot\cdot\cdot})^2 = SSB
\]
• The degrees of freedom associated with SSB is the number of parameters used to
account for differences between cell means in the saturated model, namely,
dfB = (r − 1) + (c − 1) + (r − 1)(c − 1)
= 9 + 1 + 9 = 19
(the first two terms are the number of parameters needed to model the row and column
effects, and the last term is the number of interaction parameters)
• The within sums of squares (within cells) SSW is the variation of the observations
about their cell means
\[
SSW = \sum_{i,j,k} (y_{ijk} - \bar{y}_{ij\cdot})^2
\]
It is important to remember that SSW is called the residual (or error) sums of squares
(SSE) in multiple regression terminology, i.e., SSW=SSE
• Also, SSW=SST−SSB.
• The degrees of freedom associated with SSW is the sum, over all groups, of the number
of observations within group minus 1:
\[
df_W = \sum_{i,j} (n_{ij} - 1) = 9
\]
• Also, dfW = n − p, where p is the number of parameters in the saturated model, e.g.,
dfW = 29 − (1 + 1 + 9 + 9) = 9
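The decomposition SST = SSB + SSW can be verified directly from the platoon scores. The sketch below is illustrative: the variable names are hypothetical, and the control scores for company 6 are assumed to be 78.9 and 84.7 rather than the values printed in Table 1, because that choice reproduces the sums of squares reported in the ANOVA tables (SST = 1788.362, SSW = 467.040).

```python
# Platoon scores keyed by company: (Pygmalion scores, control scores).
# Assumption: company 6 controls are 78.9 and 84.7 (see the note above).
scores = {
    1: ([80.0], [63.2, 69.2]),
    2: ([83.9], [63.1, 81.5]),
    3: ([68.2], [76.2]),
    4: ([76.5], [59.5, 73.5]),
    5: ([87.8], [73.9, 78.5]),
    6: ([89.8], [78.9, 84.7]),
    7: ([76.1], [60.6, 69.6]),
    8: ([71.5], [67.8, 73.2]),
    9: ([69.5], [72.3, 73.9]),
    10: ([83.7], [63.7, 77.7]),
}

# One list of observations per (company, treatment) cell.
cells = {}
for company, (pygmalion, control) in scores.items():
    cells[(company, "Pygmalion")] = pygmalion
    cells[(company, "Control")] = control

all_values = [y for values in cells.values() for y in values]
n = len(all_values)
grand_mean = sum(all_values) / n

# Total, between-cell, and within-cell sums of squares.
sst = sum((y - grand_mean) ** 2 for y in all_values)
ssb = sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in cells.values())
ssw = sum((y - sum(v) / len(v)) ** 2 for v in cells.values() for y in v)

df_w = sum(len(v) - 1 for v in cells.values())   # sum over cells of (n_ij - 1)
msw = ssw / df_w                                 # estimate of sigma^2

print("SST =", round(sst, 3))
print("SSB =", round(ssb, 3))
print("SSW =", round(ssw, 3), "on", df_w, "df; MSW =", round(msw, 3))
```

Cells with a single observation contribute nothing to SSW, which is why only the nine control cells with two platoons supply degrees of freedom for error.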
Interpretation of SSB and SSW
• Note that
\[
SSB = \sum_{i,j} n_{ij} (\bar{y}_{ij\cdot} - \bar{y}_{\cdot\cdot\cdot})^2
\]
• If SSB is small, then the sample cell means ȳij· are not very different from the overall
mean, indicating that there is not much variation in the population cell means µij
• If SSB is large, then there are differences between the cell means and the overall
mean, implying that there are differences in mean response among different treatments,
or different blocks, or both
• A formal test of whether there are differences among the cell means is given by the
F -test. Warning: this is not a very interesting test, since its null hypothesis is equivalent to stating
that neither the row nor the column factor has any effect on the response variable.
The interesting tests are formed by decomposing SSB into terms that are attributable to
interaction, the (main) effect of the row factor, and the (main) effect of the column
factor
• Let µij denote the true cell mean for the cell i,j, and let µ denote the true overall
mean. Then, the formal hypothesis is
H0 : µij = µ, for all i and j
Ha : µij ≠ µ, for at least one combination of i and j
• The test statistic is an F defined by
\[
F = \frac{SSB/df_B}{SSW/df_W} = \frac{MSB}{MSW}
\]
• Under H0 , F has an F -distribution with n1 = dfB numerator and n2 = dfW denominator
degrees of freedom, respectively
• We reject H0 at the α-level if F > fα , where α = P (Fn1 ,n2 > fα ) is the probability
that an F random variable with n1 numerator and n2 denominator degrees of freedom
takes on a value larger than fα
• A p-value for the test is P (Fn1 ,n2 > F )
The ANOVA table for the Pygmalion experiment
• The tests of significance are summarized in the ANOVA table(s). Two tables must be
examined by the analyst, though it is necessary to present only the second (Table 5)
• Table 4 gives one important bit of information: the test for interaction on the line
labeled Comp × Treat
• Because interaction is not significant, a second table is necessary to summarize the
significance tests for the main effects. Recall that we do not test for
a main effect while interaction is in the model, as it is in Table 4. Table 5 shows the
significance tests for the main effects
Table 4: ANOVA table for the Pygmalion data set based on the non-additive (interaction) model.
R² = 0.739, adjusted R² = 0.188.

Source            Sums-of-squares   Degrees of Freedom   Mean Square   F       P-value
Regression        1321.322          19                   69.2
Company           672.52            9                    74.72         1.44    0.298
Treatment         328.88            1                    328.88        6.34    0.033
Comp × Treat      311.464           9                    34.607        0.667   0.722
Error             467.040           9                    51.893
Corrected Total   1788.362          28
Table 5: ANOVA table for the Pygmalion data set based on the additive (no-interaction) model.
R² = 0.565, adjusted R² = 0.323.

Source            Sums-of-squares   Degrees of Freedom   Mean Square   F       P-value
Regression        1009.858          10                   100.98
Company           682.517           9                    75.835        1.753   0.148
Treatment         338.883           1                    338.883       7.835   0.012
Error             778.504           18                   43.250
Corrected Total   1788.362          28
• Some details on the Construction of the ANOVA tables
1. The sums-of-squares for the test of interaction (Table 4) is the difference in error
sums-of-squares that were used in the extra-sums-of-squares F -test, that is 311.46 =
778.50−467.04, where 778.50 is the error sums of squares associated with the additive
model and 467.04 is the error sums of squares associated with the nonadditive model.
Similarly, the mean square error (51.893, from the Error line) is the denominator in the F -statistic
2. The test for treatment effect is given in Table 5. Specifically, 338.88 = 1117.38 −
778.50 is the increase in error sums-of-squares between the main effects model and
the company-only model. The F -statistic is
\[
F = \frac{(SSE_2 - SSE_1)/(df_2 - df_1)}{SSE_1/df_1}
  = \frac{(1117.38 - 778.50)/1}{43.25} = \frac{338.88}{43.25} = 7.83
\]
and P (F1,18 > 7.83) = 0.012, since the error degrees of freedom for the additive model are 18
3. The test for company effect is carried out in the same fashion - we compare the
error associated with the additive model (both main effects) versus the model with
only treatment. Specifically, 682.52 = 1461.02 − 778.50 is the increase in error
sums-of-squares between the main effects model and the treatment-only model. The
F -statistic is
\[
F = \frac{(SSE_2 - SSE_1)/(df_2 - df_1)}{SSE_1/df_1}
  = \frac{(1461.02 - 778.50)/9}{43.25} = 1.753
\]
and P (F9,18 > 1.753) = 0.148
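The two main-effect F -statistics are simple arithmetic on the error sums of squares quoted in items 2 and 3. The sketch below repeats that arithmetic; the variable names are hypothetical, and the value 1461.02 for the treatment-only model is taken from the displayed F computation.

```python
n = 29                     # number of platoons
p_additive = 11            # parameters in the additive model
df_error = n - p_additive  # error degrees of freedom for the additive model

sse_additive = 778.50
mse_additive = sse_additive / df_error   # denominator of both F statistics

sse_company_only = 1117.38     # additive model with the treatment term dropped
sse_treatment_only = 1461.02   # additive model with the nine company terms dropped

# Extra-sums-of-squares F statistics: 1 df for treatment, 9 df for company.
f_treatment = ((sse_company_only - sse_additive) / 1) / mse_additive
f_company = ((sse_treatment_only - sse_additive) / 9) / mse_additive

print("error df:", df_error, " MSE:", round(mse_additive, 2))
print("F (treatment) =", round(f_treatment, 3))
print("F (company)   =", round(f_company, 3))
```

Both statistics share the additive model's mean square error as denominator, so their reference distributions have 18 denominator degrees of freedom.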
4. The SPSS ANOVA procedure produces a different summary table
• The final task is to extract the estimate of the Pygmalion effect and construct a 95%
confidence interval
• The estimate is β̂1 = 7.222 and the estimated standard error is 2.58.
• A 95% CI for the improvement in mean score due to the Pygmalion effect is
\[
\hat{\beta}_1 \pm t_{.025,18} \times \hat{\sigma}_{\hat{\beta}_1} = 7.222 \pm 2.101 \times 2.58 = (1.8, 12.6)
\]
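The interval is a one-line calculation once the estimate, its standard error, and the t quantile are in hand. The sketch below hard-codes the three numbers quoted in the notes (the quantile 2.101 is not recomputed), and the variable names are hypothetical. It also checks that the squared t-ratio agrees with the treatment F -statistic from Table 5, as it should for a test with one numerator degree of freedom.

```python
estimate = 7.222      # estimated Pygmalion effect
std_error = 2.58      # its estimated standard error
t_quantile = 2.101    # t_{.025,18}, as quoted in the notes

margin = t_quantile * std_error
lower, upper = estimate - margin, estimate + margin

# For a single coefficient, the squared t-ratio equals the F statistic.
t_ratio = estimate / std_error

print("95% CI: ({:.2f}, {:.2f})".format(lower, upper))
print("t-ratio squared:", round(t_ratio ** 2, 2))
```

The small gap between the squared t-ratio and 7.835 comes from rounding in the quoted estimate and standard error.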
Case Study 2 From Ramsey and Schafer (1997) The Statistical Sleuth, Duxbury Press,
p. 363. (A. Olsen, Evolutionary and Ecological Interactions Affecting Seaweeds, Ph.D
Thesis. Oregon State U. 1993.) To study the influence of grazers on regeneration rates
of seaweed in the intertidal zone, a researcher scraped rocks free of seaweed and observed
the amount of regeneration over time when certain grazers were excluded.
• The grazers were
1. L - limpets (an invertebrate)
2. f - small fish
3. F - big fish
• Each plot was a 100 cm square located on a rock surface. Each plot received one of
six treatment levels:
1. LfF: all three grazers were allowed access
2. fF: fish allowed access (limpets excluded by surrounding the plot with a caustic
paint)
3. Lf: Limpets and small fish allowed access (a coarse net excluded large fish)
4. f: small fish allowed access (paint and coarse net)
5. L: limpets allowed access (fine net)
6. C: (control) limpets, small and large fish excluded
• Exclosures were constructed by mounting nets on a frame bolted to the rock
• All plots had frames to eliminate confounding with the possible effect of the frames
on feeding preference
Objectives of the Study
• Determine the impacts of the three different grazers on seaweed regeneration rates
• Determine which grazer consumes the most seaweed
• Determine if different grazers affect each other
• Determine if grazing effects are the same in all microhabitats
More on design
• There are 3 factors in this study each with 2 levels:
1. limpets (present and absent)
2. small fish (present and absent)
3. large fish (present and absent)
• A factorial design combines each level of each factor with every other level. If this
were a factorial design, then there would be 2 × 2 × 2 = 8 treatments.
• It was not physically possible to form all 8 combinations.
For example, it was not
feasible to exclude small fish and allow large fish in the enclosures.
• A strategy for analyzing these data is to view the experiment as a two-way analysis of
variance using a single treatment factor with 6 levels (shown in the layout below) and a
blocking factor corresponding to inter-tidal environment
                 Limpets absent              Limpets present
                 Small fish   Small fish     Small fish   Small fish
Large fish       absent       present        absent       present
absent           C            f              L            Lf
present          -            fF             -            LfF
• Because the intertidal zone is a highly variable environment, the researcher applied
the 6 levels of treatment in eight blocks, each of which contained 12 plots. Thus, each
treatment level was replicated twice within each block.
• Within block, the six levels were randomly allocated to the 12 plots
• If desired, interaction between grazers (the treatment factor) and environment (blocking factor) can be investigated
• The blocks are
1. Block 1: below high tide, exposed to heavy surf
2. Block 2: below high tide, protected from heavy surf
3. Block 3: Mid-tide, exposed
4. Block 4: Mid-tide, protected
5. Block 5: Low tide, exposed
6. Block 6: Low tide, protected
7. Block 7: On a near-vertical rock wall, mid-tide level and exposed
8. Block 8: On a near-vertical rock wall, low tide level and protected
• Replication is very helpful, because it allows us to measure the inherent variability of
the response variable under (near) identical conditions.
• This is a randomized block experiment - treatment levels were randomly allocated to
experimental units (the plots) within each block
• By allocating treatment level within block, Olsen was assured of having exactly 2
replications of each treatment in each block. The design is said to be balanced
• A primary objective of this design is to be able to compare treatments within block
- this helps prevent environmental variation from confounding or obscuring the comparisons between treatments
• Alternatively, if Olsen had randomly allocated treatments to plots without regard for
blocks, some blocks would have had more of one treatment than another. Perhaps all the
limpet-level observations would have been in blocks with high levels of exposure
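Randomization within blocks is easy to sketch in code. The example below is a hypothetical illustration of the allocation scheme described above, not a record of how Olsen actually randomized: within each of the eight blocks, the six treatment labels are each repeated twice and the twelve labels are shuffled onto the twelve plots. The seed is arbitrary and is fixed only so the layout can be reproduced.

```python
import random

treatments = ["C", "L", "f", "Lf", "fF", "LfF"]
n_blocks = 8
replicates = 2

rng = random.Random(1)   # arbitrary seed, for reproducibility only


def randomize_block(treatments, replicates, rng):
    """Return a random ordering of the treatment labels for one block's plots."""
    labels = treatments * replicates
    rng.shuffle(labels)
    return labels


# layout[block] lists the treatment assigned to plots 1..12 of that block.
layout = {block: randomize_block(treatments, replicates, rng)
          for block in range(1, n_blocks + 1)}

for block, plots in layout.items():
    print(block, plots)
```

Because the shuffle is done separately inside each block, every block is guaranteed exactly two plots of each treatment, which is what makes the design balanced.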
• After four weeks, Olsen estimated regenerating seaweed cover by positioning a metal
sheet with 100 holes over each plot
• The percentage of holes that were positioned over regenerating seaweed was determined
• Data - percent regenerated seaweed cover
                              Treatment
Block   C        L        f        Lf       fF       LfF
1       14 23    4 4      11 24    3 5      10 13    1 2
2       22 35    7 8      14 31    3 6      10 15    3 5
3       67 82    28 58    52 59    9 31     44 50    6 9
4       94 95    27 35    83 89    21 57    57 73    7 22
5       34 53    11 33    33 34    5 9      26 42    5 6
6       58 75    16 31    39 52    26 43    38 42    10 17
7       19 47    6 8      43 53    4 12     29 36    5 14
8       53 61    15 17    30 37    12 18    11 40    5 7
Recall the proposed strategy for analyzing two-way tables when there are several observations per cell:
1. Begin with graphically-based initial exploration, and determine if there are outliers,
and if transformations are needed
2. Fit a rich model (the saturated model) with interactions, and examine model assumptions (concentrating on the constant variance assumption, and whether there
are outliers). Model fitting uses a backwards elimination process
3. Test whether the interaction terms are needed (via the F -statistic).
• If interaction terms are needed, then estimate the mean response and its standard
error for each treatment. That is, compute y ij and σ
byij = s/nij for each i and j.
• If interaction terms are not needed, then test whether the additive effects of the
row factor are zero, and whether the additive effects of the column factor are zero.
In other words, test whether the coefficients that account for the factor are all zero
versus the alternative that at least one is different from zero.
• Particular comparisons can be carried out at this point.
For example, estimate the
differences in expected response for different treatments (when interaction is found to be
present) or different levels of factors (when interaction is not present). The answers to
these questions is ultimately the most useful information coming from the analysis
• Figure 2 is a visual representation of the data
Figure 2: Cell means plotted against block. Seaweed grazers data. (Vertical axis: percent; horizontal axis: block; legend: CONTROL, L, f, Lf, fF, LfF.)
• Treatments with limpets excluded had larger averages than those wherein limpets were
able to graze. This difference is consistent across blocks
• Other treatments are less consistent across blocks
• There is evidence of interaction - the differences between treatments vary with block.
There are substantially larger differences in block 4 than in blocks 1 and 2. However, this apparent
interaction is related to the overall percent regeneration, and it may be possible to remove
it by transformation
• There is evidence of nonconstant variance - the variation in cell means is greater for
blocks 4, 6 and 8. This variation is related to the overall block average
• Figure 3 shows the residuals from the nonadditive model. Nonconstant variance is
present; in particular, the residuals associated with fitted values near 50 percent have the
greatest variability, and those associated with fitted values near zero or 100 have
the least variability
• The data are percentages (or equivalently, sample proportions). Percentages typically
exhibit a variance relationship in which small variances are observed when the values are near
the lower and upper limits of the range, and large variances are observed in the center of
the range.
• Recall that the variance of a sample proportion is p(1 − p)/n where p is the probability
that the event of interest will occur, and n is the number of observations. The maximum
variance occurs when p = 0.5
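The shape of this variance function can be checked numerically. The following is a minimal sketch; the function name `proportion_variance` is illustrative, and taking n = 100 (the number of holes in the metal sheet) as the number of independent observations is an assumption made only for the illustration.

```python
def proportion_variance(p, n):
    """Variance of a sample proportion, p(1 - p)/n."""
    return p * (1.0 - p) / n

# Assumption for illustration: n = 100, the number of holes in the sheet.
n = 100
for p in (0.05, 0.25, 0.50, 0.75, 0.95):
    # The variance is smallest near 0 and 1 and largest at p = 0.5.
    print(f"p = {p:.2f}   variance = {proportion_variance(p, n):.6f}")
```

The printed values rise toward p = 0.5 and fall symmetrically on either side, which is the pattern seen in the residual plot.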
• A common transformation for percentage data is the logit function given by
$$z = \log\left(\frac{y/100}{1 - y/100}\right) = \log\left(\frac{y}{100 - y}\right)$$
Figure 3: Residuals from the nonadditive (cell means) model. Seaweed grazers data. (Vertical axis: standardized residuals; horizontal axis: fitted values.)
• The logit transformation is interpretable as the natural logarithm of the ratio of the proportion of
the plot covered by seaweed to the proportion not covered by seaweed
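As a small illustration of the transformation, the sketch below applies the logit to percentages and inverts it. The function names are illustrative, and the two example values are the control observations from block 1 of the data table.

```python
import math

def logit_percent(y):
    """Logit of a percentage y with 0 < y < 100: log(y / (100 - y))."""
    if not 0 < y < 100:
        raise ValueError("the logit is undefined for y <= 0 or y >= 100")
    return math.log(y / (100.0 - y))

def inverse_logit_percent(z):
    """Map a logit back to the percentage scale."""
    return 100.0 / (1.0 + math.exp(-z))

# Control plots in block 1 had 14 and 23 percent regenerated cover.
for y in (14, 23):
    z = logit_percent(y)
    print(f"y = {y}%   logit = {z:.3f}   back-transformed = {inverse_logit_percent(z):.1f}%")
```

Percentages below 50 give negative logits and percentages above 50 give positive logits, with 50 percent mapping to zero.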
• Figure 4 shows the graph of the logit
• The ANOVA table assuming a nonadditive (cell means) model is shown in Table 6
• Recall that the objectives of the study were to
1. Determine the impacts of the three different grazers on seaweed regeneration rates
2. Determine which grazer consumes the most seaweed
3. Determine if different grazers affect each other
4. Determine if grazing effects are the same in all microhabitats
Figure 4: Graph of the logit function. (Vertical axis: logit(p); horizontal axis: p.)
• The last objective has largely been answered. The F-test for interaction in Table 6 gives
no evidence that treatments and blocks interact (p = 0.121), so the additive model, in which
the treatment effects are the same in all blocks, is adopted. Hence, we can conclude that
there is no evidence that grazing effects differ among microhabitats
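The interaction F-statistic behind this conclusion can be reproduced from the sums of squares and degrees of freedom reported in Table 6. This is a sketch only; the helper name `f_statistic` is illustrative, and the p-value is not recomputed here because it requires the F distribution.

```python
def f_statistic(ss_effect, df_effect, ss_error, df_error):
    """Ratio of the effect mean square to the error mean square."""
    ms_effect = ss_effect / df_effect
    ms_error = ss_error / df_error
    return ms_effect / ms_error

# Interaction and error lines of Table 6 (logit scale).
f_interaction = f_statistic(ss_effect=15.23, df_effect=35, ss_error=14.54, df_error=48)
print(f"F for interaction = {f_interaction:.2f} on 35 and 48 degrees of freedom")
```

The result agrees with the F value of 1.44 listed in Table 6.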
• To begin to address the remainder of the objectives, consider a table (Table 7) showing
the fitted values for each of the block/treatment combinations. Such a table is sometimes
called a table of estimated means from the additive model. For now, the logits are going
to be used.
Figure 5: Logit cell means plotted against block. Seaweed grazers data. (Vertical axis: logit; horizontal axis: block.)
• The values shown in Table 7 (besides the right and bottom margins) are
$$\hat{y}_{ij\cdot} = \hat{\mu}_{ij} = \hat{\beta}_0 + \cdots + \hat{\beta}_p x_{p,ij} \qquad (1)$$
where $\hat{\beta}_0, \ldots, \hat{\beta}_p$ are the parameter estimates for the additive model computed via least
squares regression.
• A row mean is the mean prediction for all observations in a particular block. That is,
the ith row mean is
$$\hat{\mu}_{i\cdot} = \frac{1}{c}\sum_{j=1}^{c} \hat{\mu}_{ij},$$
where c is the number of columns (treatments)
• A column mean is the mean prediction for all observations in a particular treatment
group. That is, the jth column mean is
$$\hat{\mu}_{\cdot j} = \frac{1}{r}\sum_{i=1}^{r} \hat{\mu}_{ij}$$
• For example, $\hat{\mu}_{1\cdot} = -2.64$ is the mean logit of percent regenerated seaweed for block 1
Table 6: ANOVA table for the seaweed grazers data set based on the non-additive (interaction) model.
R2 = 0.928.
Source            Sums-of-squares   Degrees of Freedom   Mean Square   F       P-value
Regression        188.46            47                   4.01          13.24   < 0.0001
  Blocks          76.24             7                    10.89         35.96   < 0.0001
  Treatment       96.99             5                    19.40         64.05   < 0.0001
  Interaction     15.23             35                   0.43          1.44    0.121
Error             14.54             48                   0.303
Corrected Total   203.00            95                   51.893
Table 7: Predicted values (on the logit scale) assuming the additive model.
                          Treatment
Blocks     C       L       f       Lf      fF      LfF     Mean
1        −1.22   −3.12   −1.72   −3.41   −2.22   −4.13   −2.64
2         −.76   −2.66   −1.26   −2.95   −1.76   −3.67   −2.18
3          .88   −1.01     .39   −1.30    −.12   −2.02    −.53
4         1.76    −.13    1.26    −.43     .76   −1.15     .34
5        −0.01   −1.90    −.50   −2.19   −1.01   −2.91   −1.42
6         0.80   −1.09     .31   −1.39    −.20   −2.12    −.61
7         −.11   −2.01    −.61   −2.30   −1.12   −3.02   −1.53
8          .11   −1.79    −.39   −2.08    −.89   −2.80   −1.31
Mean       .18   −1.71    −.31   −2.00    −.82   −2.72   −1.23
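The row and column means of Table 7 can be recomputed directly from the fitted values. The sketch below is illustrative only: the variable names are not from the source, and because the table entries are rounded to two decimals, the recomputed margins agree with the printed margins only to within about 0.01.

```python
# Fitted logits from Table 7: one row per block (1-8),
# columns in the order C, L, f, Lf, fF, LfF.
treatments = ["C", "L", "f", "Lf", "fF", "LfF"]
fitted = [
    [-1.22, -3.12, -1.72, -3.41, -2.22, -4.13],
    [-0.76, -2.66, -1.26, -2.95, -1.76, -3.67],
    [0.88, -1.01, 0.39, -1.30, -0.12, -2.02],
    [1.76, -0.13, 1.26, -0.43, 0.76, -1.15],
    [-0.01, -1.90, -0.50, -2.19, -1.01, -2.91],
    [0.80, -1.09, 0.31, -1.39, -0.20, -2.12],
    [-0.11, -2.01, -0.61, -2.30, -1.12, -3.02],
    [0.11, -1.79, -0.39, -2.08, -0.89, -2.80],
]
r, c = len(fitted), len(treatments)

# Row (block) means: average over the c treatments.
row_means = [sum(row) / c for row in fitted]
# Column (treatment) means: average over the r blocks.
col_means = [sum(fitted[i][j] for i in range(r)) / r for j in range(c)]

# Margins as printed in Table 7, for comparison.
reported_rows = [-2.64, -2.18, -0.53, 0.34, -1.42, -0.61, -1.53, -1.31]
reported_cols = [0.18, -1.71, -0.31, -2.00, -0.82, -2.72]

for name, m in zip(treatments, col_means):
    print(f"treatment {name:>3}: mean logit = {m:.3f}")
```

The column means are the treatment means used in the contrasts that follow.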
Contrasts (Ott and Longnecker, p. 431)
Often, it is desirable to compare specific treatments and obtain a formal test of significance.
• For example, suppose that drug A is the conventional treatment for a particular disease
and drugs B and C are two versions of a new formulation (say fast- and slow-acting
versions).
• Once it has been established that there are differences in expected response due to
treatments (drugs) via an F -test, then we may want to compare B and C versus A, and
then B versus C (and answer the question: which of the new drugs is best?).
• It is desirable to use contrasts for this purpose
Notation and Terminology
• Suppose that there are t treatment group means µ1 , µ2 , . . . , µt . Let µ1 , µ2 , µ3 denote
the mean response when drug A, drug B, and drug C are taken, respectively
• A contrast (or comparison) of these means (µ1 , µ2 , µ3 ) is
$$l = a_1\mu_1 + \cdots + a_t\mu_t = \sum_{i=1}^{t} a_i\mu_i,$$
where $a_1, \ldots, a_t$ are known constants with the property
$$\sum_{i=1}^{t} a_i = 0$$
• For example, to compare µ2 against µ3, set $a_1 = 0$, $a_2 = 1$, $a_3 = -1$, and $a_4 = \cdots = a_t = 0$. Then,
$$l = \sum_{i=1}^{t} a_i\mu_i = \mu_2 - \mu_3$$
• To compare µ1 versus µ2 and µ3, one way is to set $a_1 = 1$, $a_2 = -\tfrac{1}{2}$ and $a_3 = -\tfrac{1}{2}$.
Then,
$$l = \mu_1 - \frac{\mu_2 + \mu_3}{2}$$
• The mathematical statement l = 0 is equivalent to $0 = \mu_1 - \frac{\mu_2 + \mu_3}{2}$ and also $\mu_1 = \frac{\mu_2 + \mu_3}{2}$
• A formal test of $H_0: \mu_1 = \frac{\mu_2 + \mu_3}{2}$ is obtained by testing $H_0: l = 0$ versus $H_a: l \neq 0$
• Because $\mu_1, \mu_2, \ldots, \mu_t$ are unknown, we need to estimate l. The estimate is called a
linear contrast of treatment means, and it is
$$\hat{l} = a_1\hat{\mu}_1 + \cdots + a_t\hat{\mu}_t = \sum_{i=1}^{t} a_i\hat{\mu}_i,$$
where the estimates $\hat{\mu}_1, \ldots, \hat{\mu}_t$ are obtained from the fitted regression model
• To test $H_0: l = 0$ versus $H_a: l \neq 0$, we use the test statistic
$$T = \frac{\hat{l}}{\hat{\sigma}_{\hat{l}}},$$
where the estimated variance of the contrast of treatment means is
$$\hat{\sigma}^2_{\hat{l}} = \hat{\sigma}^2 \sum_{i=1}^{t} \frac{a_i^2}{n_i}$$
• $\hat{\sigma}$ is the residual standard error (from the final model), and $n_i$ is the number of
observations that received treatment i
• If H0 is true, then T has a t distribution whose degrees of freedom are the residual
degrees of freedom associated with the model from which the estimate $\hat{\sigma}$ was obtained
• Thus, a two-sided p-value for the test of H0 is $\Pr(|T_{df}| \geq |t|)$, where t is the observed value of the
test statistic and $T_{df}$ has a t distribution with df degrees of freedom
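The estimate, its standard error, and the test statistic can be collected in one small routine. The sketch below is not taken from the source: the function name `contrast_test` is illustrative, and the drug means, sample sizes, and residual variance are hypothetical numbers chosen only to exercise the formulas.

```python
import math

def contrast_test(means, coefs, ns, sigma2):
    """Return (estimate, standard error, t statistic) for a contrast.

    means  : estimated treatment means
    coefs  : contrast coefficients a_i (must sum to zero)
    ns     : number of observations n_i behind each mean
    sigma2 : residual variance estimate from the fitted model
    """
    if abs(sum(coefs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    estimate = sum(a * m for a, m in zip(coefs, means))
    variance = sigma2 * sum(a * a / n for a, n in zip(coefs, ns))
    se = math.sqrt(variance)
    return estimate, se, estimate / se

# Hypothetical values for drugs A, B, C (not data from the source).
means = [10.0, 12.0, 14.0]
ns = [5, 5, 5]
sigma2 = 4.0

# Compare drug A against the average of drugs B and C.
est, se, t = contrast_test(means, [1.0, -0.5, -0.5], ns, sigma2)
print(f"estimate = {est:.2f}, SE = {se:.3f}, t = {t:.2f}")
```

The p-value would then come from a t distribution with the residual degrees of freedom, which this sketch leaves to a statistical table or library.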
• The objectives of the seaweed grazers study require a number of contrast tests. Recall
the following table showing the grazer treatments
                   Large fish absent                        Large fish present
                   Small fish absent   Small fish present   Small fish absent   Small fish present
Limpets absent     C                   f                    -                   fF
Limpets present    L                   Lf                   -                   LfF
• Since an additive model was adopted, the effect of limpets can be estimated by comparing the average response for the three treatments wherein limpets grazed to the average
response for the three treatments wherein limpets were excluded using a contrast of
treatment means given by
$$\frac{\hat{\mu}_{LfF} + \hat{\mu}_{Lf} + \hat{\mu}_{L}}{3} - \frac{\hat{\mu}_{fF} + \hat{\mu}_{f} + \hat{\mu}_{C}}{3} \qquad (2)$$
• However, before reporting the estimated limpet effect, we ought to verify that it is not
zero by testing the following hypotheses
$$H_0: \frac{\mu_{LfF} + \mu_{Lf} + \mu_{L}}{3} - \frac{\mu_{fF} + \mu_{f} + \mu_{C}}{3} = 0 \qquad (3)$$
versus
$$H_a: \frac{\mu_{LfF} + \mu_{Lf} + \mu_{L}}{3} - \frac{\mu_{fF} + \mu_{f} + \mu_{C}}{3} \neq 0 \qquad (4)$$
• The estimated effect of limpets is
$$\frac{\hat{\mu}_{LfF} + \hat{\mu}_{Lf} + \hat{\mu}_{L}}{3} - \frac{\hat{\mu}_{fF} + \hat{\mu}_{f} + \hat{\mu}_{C}}{3} = \frac{-2.72 - 2.00 - 1.71}{3} - \frac{-.82 - .31 + .18}{3} = -1.83$$
• The values used in this equation are the column means shown in Table 7
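• To carry out the test of significance for the limpet effect, note that $\hat{\sigma}^2 = 0.358$ and that
the number of observations used to estimate each treatment mean is $n_i = 16 = 2 \times 8$
for each i = 1, . . . , 6. Using these values, the estimated variance of the treatment mean
contrast is
$$\hat{\sigma}^2_{\hat{l}} = \hat{\sigma}^2 \sum_{i=1}^{t} \frac{a_i^2}{n_i} = \frac{\hat{\sigma}^2}{16} \sum_{i=1}^{t} a_i^2$$
$$= \frac{0.358}{16}\left( \left[\tfrac{1}{3}\right]^2 + \left[\tfrac{1}{3}\right]^2 + \left[\tfrac{1}{3}\right]^2 + \left[-\tfrac{1}{3}\right]^2 + \left[-\tfrac{1}{3}\right]^2 + \left[-\tfrac{1}{3}\right]^2 \right)$$
$$= \frac{0.358}{16} \times \frac{6}{9} = .01494$$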
• The standard error of the treatment mean contrast is estimated to be
$$\hat{\sigma}_{\hat{l}} = \sqrt{\hat{\sigma}^2_{\hat{l}}} = \sqrt{.01494} = .1222$$
• The test statistic is
$$t = \frac{\hat{l}}{\hat{\sigma}_{\hat{l}}} = \frac{-1.829}{.1222} = -14.97.$$
• Finally,
$$\Pr(|T_{48}| > |-14.97|) = \Pr(|T_{48}| > 14.97) < 0.0001$$
and we conclude that there is abundant evidence that limpets affect the regeneration of
seaweed
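The limpet calculation can be retraced numerically from the treatment means in the bottom margin of Table 7. This is a sketch with illustrative variable names; because it starts from means rounded to two decimals, its results match the values in the text only to within rounding.

```python
import math

# Treatment means on the logit scale (bottom margin of Table 7).
means = {"C": 0.18, "L": -1.71, "f": -0.31, "Lf": -2.00, "fF": -0.82, "LfF": -2.72}

# Limpets present get +1/3, limpets absent get -1/3.
coefs = {"L": 1 / 3, "Lf": 1 / 3, "LfF": 1 / 3,
         "C": -1 / 3, "f": -1 / 3, "fF": -1 / 3}

sigma2 = 0.358          # residual variance reported in the text
n_per_treatment = 16    # 2 replicates x 8 blocks

estimate = sum(coefs[k] * means[k] for k in means)
variance = sigma2 / n_per_treatment * sum(a * a for a in coefs.values())
se = math.sqrt(variance)
t_stat = estimate / se

print(f"estimate = {estimate:.3f}, SE = {se:.4f}, t = {t_stat:.2f}")
```

The estimate is close to −1.83, the standard error close to .122, and the t statistic close to −15, in line with the hand calculation.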
• Because the combinations of grazers that were allowed access are somewhat complicated, some thought is needed to determine how to assess whether different grazers affect
each other
• For example, are limpets affected by small fish? To answer this question, we can
compare the differences between limpets present and absent when small fish are present,
versus when small fish are absent.
• A contrast of these means is
$$\frac{\mu_{LfF} - \mu_{fF}}{2} + \frac{\mu_{Lf} - \mu_{f}}{2} - 2 \times \frac{\mu_{L} - \mu_{C}}{2}$$
and we are interested in determining whether the observed contrast is significantly different from zero
• By substituting the appropriate column means from Table 7, the contrast estimate is
$$\frac{-2.72 - (-.82)}{2} + \frac{-2.00 - (-.31)}{2} - 2 \times \frac{-1.71 - .18}{2} = -.95 - .845 + 1.89 = .095$$
• The estimated standard error of the contrast of treatment means is $\hat{\sigma}_{\hat{l}} = .26$ and the
magnitude of the test statistic is |t| = .095/.26 = 0.37. Further, $\Pr(|T_{48}| > 0.37) = 0.71$, which shows that
there is no evidence that small fish affect limpets, and likewise no evidence that limpets
affect small fish
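The standard error of .26 follows from the general variance formula. As a worked check, assume the contrast coefficients are 1/2 for LfF and Lf, −1/2 for fF and f, −1 for L and +1 for C, with $n_i = 16$ and $\hat{\sigma}^2 = 0.358$ as in the limpet calculation:

```latex
\begin{align*}
\sum_{i=1}^{6} a_i &= \tfrac{1}{2} - \tfrac{1}{2} + \tfrac{1}{2} - \tfrac{1}{2} - 1 + 1 = 0, \\
\sum_{i=1}^{6} a_i^2 &= 4\left(\tfrac{1}{2}\right)^2 + 2\,(1)^2 = 3, \\
\hat{\sigma}_{\hat{l}} &= \sqrt{\hat{\sigma}^2 \sum_{i=1}^{6} \frac{a_i^2}{n_i}}
  = \sqrt{\frac{0.358 \times 3}{16}} \approx 0.26 .
\end{align*}
```

The first line confirms that the coefficients define a valid contrast; the last line reproduces the reported standard error.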
• Another example. To assess whether large fish have an effect on regeneration, we
compare means from the treatments fF and LfF against f and Lf by testing the hypothesis
$$H_0: \frac{\mu_{fF} + \mu_{LfF}}{2} - \frac{\mu_{f} + \mu_{Lf}}{2} = 0 \quad \text{versus} \quad H_a: \frac{\mu_{fF} + \mu_{LfF}}{2} - \frac{\mu_{f} + \mu_{Lf}}{2} \neq 0$$
• The estimated contrast is
$$\frac{-.82 - 2.72}{2} - \frac{-.31 - 2.00}{2} = -1.77 + 1.16 = -.61$$
• In this case, $a_1 = \tfrac{1}{2} = a_2$, $a_3 = -\tfrac{1}{2} = a_4$, and $a_5 = 0 = a_6$
• Then,
$$t = \frac{\hat{l}}{\hat{\sigma}_{\hat{l}}} = \frac{-.61}{.1498} = -4.10.$$
• The p-value is $P(|t_{48}| > 4.10) = .0007$.
Multifactor Studies Without Replication
• So far, the emphasis has been on analyzing two-way tables using categorical indicator
variables. There was more than one observation for some treatment combinations
• Now we consider experiments in which there are two or more factors, and just one
observation per treatment combination.
• If there is more than one observation per treatment combination, we say that treatment combinations are replicated; otherwise we say treatment combinations are not
replicated
• If treatment combinations are replicated, then there are no limitations on the interactions that can be accommodated in the regression analysis.
• For example, if the seaweed grazers experiment had identified four factors (limpets,
large fish, small fish, and blocks) and every combination of factor levels had been
replicated, then we could fit a model with all main effects, all two-way interactions (e.g., block×Limpet), all three-way interactions (e.g., block×Limpet×Large
fish), and the four-way interaction block×Limpet×Large fish×Small fish
• When there is more than one observation per treatment group and every possible
interaction is included in the model, then the variance $\sigma^2$ is estimated by comparing the
variation of the observations about their respective cell means.
• The situation is different if there is only a single observation per treatment group
• When there is only one observation per treatment group and every possible
interaction is included in the model, then the fitted cell means are exactly equal to the single
observations at the corresponding combinations of factor levels. The error variance will be
estimated to be zero, which is obviously unrealistic for most experiments
Case Study 3 (revisited) (Ramsey and Schafer) from Fouts, R.S. 1973. “Acquisition and
testing of gestural signs in four young chimpanzees,” Science, 180, 978-980. Fouts taught
4 chimpanzees 10 signs of American Sign Language with the intent of determining
whether some signs are easier to learn, and whether some chimps tended to learn more
quickly than others.
Table 8: Data: (time in minutes to learn a word).
                    Chimpanzee
Word      Booee   Cindy   Bruno   Thelma
Listen      12      10       2      15
Drink       15      25      36      18
Shoe        14      18      60      20
Key         10      25      25      40
More        10      15     225      24
Food        80      55      14     190
Fruit       80      20     177     297
Hat         78      99     178     297
Look       115      54     345     420
String     129     476     287     372
• Note that there are two factors (chimps with 4 levels and words with 10 levels). Hence,
there are 4 × 10 = 40 treatment groups, and 40 observations
• There is no possibility of replicating treatments (we cannot view teaching a word to an
animal a second time as an independent replication of the first teaching). Thus, there
is exactly one observation per treatment combination (chimp × word)
• An important question is how we view chimps. Chimps are treated as a factor with 4
levels. Note that chimps are not replicable but words are. That is, if I decide to repeat
the study, I can teach the same words to 4 chimps, but, almost certainly, I cannot use
the same four chimps. Hence, chimps are like blocks - they are not replicable, and their
main purpose is to improve the contrast between the learning times of particular words.
• If these chimps were a random sample of chimps, then comparisons among chimps
could be used to draw inferences about the population of chimps. If this were the case,
then we probably would have substantial interest in comparing chimps
• Because the chimps cannot be viewed as a representative sample of any recognizable
and useful population, we should view chimp as a fixed blocking factor. We’ll return to
this matter soon
• Two factors are identified: chimp (with 4 levels), and word (with 10 levels)
• The chimp factor requires 3 indicator variables
• The word factor requires k − 1 = 10 − 1 = 9 indicator variables
• The additive model requires 1 + 3 + 9 = 13 parameters; therefore there are n − p =
40 − 13 = 27 degrees of freedom for estimating $\sigma^2$
• An interaction model requires 3 × 9 = 27 indicator variables, hence, there would be a
total of 1 + 3 + 9 + 27 = 40 parameters and
n − p = 40 − 40 = 0
degrees of freedom for estimating $\sigma^2$. The estimate of $\sigma^2$ from a regression analysis
will be $\hat{\sigma}^2 = 0$. Zero is an under-estimate of $\sigma^2$. Surely there are sources of variation in
the time it takes a chimp to learn a word; for example, physical condition (hunger) or
outside distractions probably induce variation in the time it takes a particular chimp to
learn a particular word
• Clearly, this estimate is biased downward, perhaps severely, and no test that uses the
estimate can be viewed as unbiased
• We cannot carry out any inferential methods that require an estimate of $\sigma^2$ if we believe
the interaction model is correct because our estimate of $\sigma^2$ is 0
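The degrees-of-freedom accounting above can be illustrated with the Table 8 data. The sketch below is not the source's analysis: it works on the raw minutes purely for illustration, the variable names are illustrative, and the assignment of the middle two columns to Cindy and Bruno is an assumption about the table layout. It relies on the standard result that, for a balanced two-way table with one observation per cell, the additive least squares fit is row mean + column mean − grand mean.

```python
# Learning times in minutes (Table 8); rows are words, columns are chimps
# in the assumed order Booee, Cindy, Bruno, Thelma.
times = [
    [12, 10, 2, 15],       # Listen
    [15, 25, 36, 18],      # Drink
    [14, 18, 60, 20],      # Shoe
    [10, 25, 25, 40],      # Key
    [10, 15, 225, 24],     # More
    [80, 55, 14, 190],     # Food
    [80, 20, 177, 297],    # Fruit
    [78, 99, 178, 297],    # Hat
    [115, 54, 345, 420],   # Look
    [129, 476, 287, 372],  # String
]
r, c = len(times), len(times[0])   # 10 words, 4 chimps
n = r * c                          # 40 observations

grand = sum(map(sum, times)) / n
row_means = [sum(row) / c for row in times]
col_means = [sum(times[i][j] for i in range(r)) / r for j in range(c)]

# Additive model: fitted value = row mean + column mean - grand mean.
fitted = [[row_means[i] + col_means[j] - grand for j in range(c)] for i in range(r)]
rss_additive = sum((times[i][j] - fitted[i][j]) ** 2
                   for i in range(r) for j in range(c))
p_additive = 1 + (r - 1) + (c - 1)          # intercept + word + chimp indicators
df_additive = n - p_additive
sigma2_additive = rss_additive / df_additive

# Saturated (interaction) model: each fitted value equals its observation,
# so the residual sum of squares is zero and no degrees of freedom remain.
p_saturated = p_additive + (r - 1) * (c - 1)
df_saturated = n - p_saturated
rss_saturated = 0.0

print(f"additive:  p = {p_additive}, df = {df_additive}, sigma^2 estimate = {sigma2_additive:.1f}")
print(f"saturated: p = {p_saturated}, df = {df_saturated}, residual SS = {rss_saturated}")
```

With the additive model there are 27 degrees of freedom for estimating the variance; with the saturated model nothing is left over, which is why inference is impossible if interaction must be retained.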
Rationale for Designs with One Observation Per Cell
• For some experiments, there is no possibility of replicating the treatment group combinations.
Because a word cannot be taught twice to a particular chimp, we cannot
replicate the (chimp,word) treatment combinations
• For some experiments, the cost of obtaining replicate observations is too great. If
it can be argued in the design stage that interaction is not likely, then this design is
cost-effective. If interaction is present, the value of the experiment is greatly diminished
because there is no way to estimate $\sigma^2$ and it is very difficult, if not impossible, to
test the usual hypotheses of interest
A Strategy for Data Analysis in the Absence of Replicates
1. Begin with graphically-based initial exploration, and determine if there are outliers,
and if transformations are needed
2. Fit a rich model and examine model assumptions (concentrating on whether
there is evidence of interaction and nonconstant variance). If there are three factors,
then we can have two-way, but not three-way interactions. A rich model contains
all reasonable two-way interactions, and the main effects
3. If the rich model contains interactions (there must be at least 3 factors in the model
so that 2-way interaction can be modeled), test whether these interaction terms are
needed (via the extra-sum-of-squares F-statistic)
• If a factor interacts with another factor, then estimate the mean response and its
standard error for each different combination of factor levels
• If interaction terms are not needed, then test whether the additive effects of each
factor are zero
4. Particular comparisons can be carried out at this point (e.g., “which grazer reduces
seaweed growth to the greatest extent, and by how much?”). The answers to these
questions are ultimately the most useful information coming from the analysis
• The main point is that the researcher must be able to argue convincingly, based on her
scientific understanding of the problem, that the highest possible level of interaction is
zero (does not exist)