Chapter 15 Analysis of Variance for Standard Designs

Case Study 1

Ramsey and Schafer (1997) The Statistical Sleuth, Duxbury Press, p. 365. The Pygmalion effect.

• Pygmalion was a king of Cyprus who sculpted a figure of the ideal woman and then fell in love with the sculpture. The term "Pygmalion effect" refers to the situation in which high expectations placed on individuals by teachers or supervisors often result in improved performance by students or subordinates.
• Eden, D. 1990, Pygmalion effects without interpersonal contrast effects, J. Appl. Psychol. 75(4), 395-98. Eden speculated that in most quantitative examples of the Pygmalion effect which compared two groups of subjects (one with high expectations, and the other without), reduced expectations were also placed on the “no expectation” group. This contrast between high and low expectations may exaggerate the Pygmalion effect.
• Eden conducted an experiment that attempted to eliminate interpersonal contrasts of this type. Ten companies of soldiers in a training camp were selected for study. Each company consisted of three platoons; one platoon in each was selected by a random mechanism to be the Pygmalion platoon.
• Companies are identified as blocks, and because randomization was within block, this is a randomized block design.
• Prior to assuming command, each platoon leader met with an army psychologist who described a nonexistent battery of tests taken by the platoon members. If the platoon was a Pygmalion platoon, the psychologist reported that the tests predicted superior performance for the platoon.
• At the end of training, each platoon took a test that measured their ability to operate weapons and answer questions about their use.

Table 1: Platoon average scores on the test.
Company   Pygmalion   Control
1         80.0        63.2   69.2
2         83.9        63.1   81.5
3         68.2        76.2
4         76.5        59.5   73.5
5         87.8        73.9   78.5
6         89.8        73.9   78.5
7         76.1        60.6   69.6
8         71.5        67.8   73.2
9         69.5        72.3   73.9
10        83.7        63.7   77.7

• Note that the experimental units are platoons - scores on individual soldiers are averaged, then discarded. Treatments were assigned to platoons - hence these are the experimental units.
• There is one observation for each combination of company and the Pygmalion treatment, and usually two observations for each combination of company and the control treatment - so the experiment is not balanced.

Summary

The Pygmalion effect adds an estimated 7.22 points to a platoon’s score (95% CI is 1.80 to 12.64 points). The evidence strongly suggests that the effect is real (p-value = 0.006), and the design allows for causal inference.

Additive and Non-additive Models for Two-way Tables

• In situations where there are two factors, the data set may be viewed as a two-way (rows and columns) table where rows and columns correspond to factors. In the Pygmalion study, rows are company (10 levels), and columns are treatment (2 levels).
• Research questions tend to focus on additive models for two-way tables.
• An additive model assumes that there is no interaction between the row and column factors.
• An additive model specifies that the effect of one level of one factor (e.g., the row factor) is the same at all levels of the second factor.
• The effect of factor A is completely unrelated to the levels of B; likewise, the effect of factor B is completely unrelated to the levels of A.
• For example, the combined effect of factor A at level 1 and factor B at level 3 is the sum of the effect of A at level 1 and the effect of B at level 3.
If the model were not additive, then the combined effect would not be the sum of the two effects.
• Example: mean test scores according to the additive model, in terms of regression coefficients in a multiple regression model with indicator variables (Table 2)
• $\beta_0$ is the expected response at the control level (for company 1)
• $\beta_1$ is the Pygmalion effect (the difference in expected response between the Pygmalion and control treatments)
• $\beta_2$ is the effect of company 2 (relative to the aliased company - company 1), etc.
• The additive model is powerful because it is universal in the sense that the factor effects are independent of each other. No matter which company is scrutinized, the Pygmalion effect is the same.

Table 2: Additive Model of the Expected, or Mean Responses.

Company   Pygmalion                           Control                  Pygmalion − Control
1         $\beta_0 + \beta_1$                 $\beta_0$                $\beta_1$
2         $\beta_0 + \beta_1 + \beta_2$       $\beta_0 + \beta_2$      $\beta_1$
3         $\beta_0 + \beta_1 + \beta_3$       $\beta_0 + \beta_3$      $\beta_1$
4         $\beta_0 + \beta_1 + \beta_4$       $\beta_0 + \beta_4$      $\beta_1$
5         $\beta_0 + \beta_1 + \beta_5$       $\beta_0 + \beta_5$      $\beta_1$
6         $\beta_0 + \beta_1 + \beta_6$       $\beta_0 + \beta_6$      $\beta_1$
7         $\beta_0 + \beta_1 + \beta_7$       $\beta_0 + \beta_7$      $\beta_1$
8         $\beta_0 + \beta_1 + \beta_8$       $\beta_0 + \beta_8$      $\beta_1$
9         $\beta_0 + \beta_1 + \beta_9$       $\beta_0 + \beta_9$      $\beta_1$
10        $\beta_0 + \beta_1 + \beta_{10}$    $\beta_0 + \beta_{10}$   $\beta_1$

• If the additive model fails to fit well, then all references to the Pygmalion effect must be stated with respect to a particular company.

Fitting the Additive Model

• Let $Y$ represent the platoon score, and let
\[
x_1 = \begin{cases} 1, & \text{if treatment is Pygmalion} \\ 0, & \text{if treatment is control} \end{cases}
\qquad
x_2 = \begin{cases} 1, & \text{if company is 2} \\ 0, & \text{if company is not 2} \end{cases}
\qquad \cdots \qquad
x_{10} = \begin{cases} 1, & \text{if company is 10} \\ 0, & \text{if company is not 10.} \end{cases}
\]
• The additive model, in regression form, is
\[
Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_{10} x_{10} + \varepsilon
\]
• This model contains 11 parameters:
\[
p = 1 + (r - 1) + (c - 1) = 1 + 9 + 1 = 11
\]
where $r$ is the number of levels of the row factor and $c$ is the number of levels of the column factor. One additional parameter is counted for the intercept $\beta_0$.

The Saturated, Nonadditive Model

• One competing, alternative model to the additive model is the saturated, nonadditive model.
It allows for interaction between rows and columns and, in doing so, does not assume that the effect of one particular factor level is the same at each level of the other factor.
• It is called saturated because there are no additional parameters that can be introduced to reduce the residual variation about this model. Said another way, there are as many parameters as cells (20 = 2 × 10 cells)
• Every cell has its own, unconstrained, expected value (or mean)
• Let
\[
x_{11} = x_1 \times x_2 = \begin{cases} 1, & \text{if treatment is Pygmalion and company is 2} \\ 0, & \text{otherwise} \end{cases}
\]
\[
\vdots
\]
\[
x_{19} = x_1 \times x_{10} = \begin{cases} 1, & \text{if treatment is Pygmalion and company is 10} \\ 0, & \text{otherwise} \end{cases}
\]
• We sometimes call this the cell means model because each cell has a freely determined mean
• The nonadditive saturated model, in regression form, is
\[
Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_{10} x_{10} + \beta_{11} x_{11} + \cdots + \beta_{19} x_{19} + \varepsilon
\]

Table 3: Mean test scores according to the nonadditive saturated model, in terms of regression coefficients in a multiple regression model with indicator variables.

Company   Pygmalion                                       Control                  Pygmalion − Control
1         $\beta_0 + \beta_1$                             $\beta_0$                $\beta_1$
2         $\beta_0 + \beta_1 + \beta_2 + \beta_{11}$      $\beta_0 + \beta_2$      $\beta_1 + \beta_{11}$
3         $\beta_0 + \beta_1 + \beta_3 + \beta_{12}$      $\beta_0 + \beta_3$      $\beta_1 + \beta_{12}$
4         $\beta_0 + \beta_1 + \beta_4 + \beta_{13}$      $\beta_0 + \beta_4$      $\beta_1 + \beta_{13}$
5         $\beta_0 + \beta_1 + \beta_5 + \beta_{14}$      $\beta_0 + \beta_5$      $\beta_1 + \beta_{14}$
6         $\beta_0 + \beta_1 + \beta_6 + \beta_{15}$      $\beta_0 + \beta_6$      $\beta_1 + \beta_{15}$
7         $\beta_0 + \beta_1 + \beta_7 + \beta_{16}$      $\beta_0 + \beta_7$      $\beta_1 + \beta_{16}$
8         $\beta_0 + \beta_1 + \beta_8 + \beta_{17}$      $\beta_0 + \beta_8$      $\beta_1 + \beta_{17}$
9         $\beta_0 + \beta_1 + \beta_9 + \beta_{18}$      $\beta_0 + \beta_9$      $\beta_1 + \beta_{18}$
10        $\beta_0 + \beta_1 + \beta_{10} + \beta_{19}$   $\beta_0 + \beta_{10}$   $\beta_1 + \beta_{19}$

• This nonadditive saturated model contains 20 parameters:
\[
p = 1 + (r - 1) + (c - 1) + (r - 1) \times (c - 1) = 1 + 9 + 1 + 9 = 20
\]
(the same as the number of cells)
• In the saturated model, the means are unrelated. There are no constraints on the means (such as the constraint that the Pygmalion effect is the same for all companies). The estimated cell mean (e.g., for Company 2, Pygmalion) is the platoon sample mean.
(The same estimate can be obtained by fitting the model and computing the fitted value.)
• Let $Y_{ijk}$ denote the platoon mean for level $i$ of company, $i = 1, \ldots, 10$, level $j$ of treatment, $j = 1, 2$, and replicate $k$ ($= 1$ or $2$)
• The number of replicates is $n_{ij}$. $n_{ij}$ is 1 whenever $j = 1$ (Pygmalion treatment), and $n_{ij}$ is 2 for the controls (except company 3, which has one control platoon)
• We may re-express the model as
\[
Y_{ijk} = \beta_0 + \beta_1 x_{1,ij} + \beta_2 x_{2,ij} + \cdots + \beta_{10} x_{10,ij} + \beta_{11} x_{11,ij} + \cdots + \beta_{19} x_{19,ij} + \varepsilon_{ijk}
\]
• We also write a short-hand expression
\[
Y_{ijk} = \mu_{ij} + \varepsilon_{ijk}
\]
where
\[
\mu_{ij} = \beta_0 + \beta_1 x_{1,ij} + \beta_2 x_{2,ij} + \cdots + \beta_{10} x_{10,ij} + \beta_{11} x_{11,ij} + \cdots + \beta_{19} x_{19,ij}
\]
The subscripts $i$ and $j$ on $\mu$ imply that there is a distinct, freely determined mean for each cell
• The estimate of $\sigma^2$, the residual variance, is
\[
s^2 = \frac{1}{n - p} \sum_i \sum_j \sum_{k=1}^{n_{ij}} (y_{ijk} - \hat{y}_{ij})^2 .
\]
• Note that $n - p = 29 - 20 = 9$ are the degrees of freedom for error
• The predicted or fitted values $\hat{y}_{ij}$ can be computed as an output of multiple regression model fitting, or simply by computing the cell means $\bar{y}_{ij}$. For example, if $i = 1$ and $j = 1$, $\hat{y}_{11} = y_{111}$; if $i = 1$ and $j = 2$,
\[
\hat{y}_{12} = \frac{y_{121} + y_{122}}{2}
\]

A Strategy for Analyzing Two-way Tables With Several Observations per Cell

The fixed effects analysis of variance is approached as a multiple regression analysis in which backwards elimination is used to determine the importance of the factors in explaining variation in the response variable.
1. Begin with graphically-based initial exploration, and determine if there are outliers, and if transformations are needed
2. Fit a rich model (the saturated model) with interactions, and examine model assumptions (concentrating on the constant variance assumption, and whether there are outliers)
3. Test whether the interaction terms are needed (via the F-statistic).
• If interaction terms are needed, then estimate the mean response and its standard error for each treatment. That is, compute $\bar{y}_{ij}$ and $\hat{\sigma}_{\bar{y}_{ij}} = s/\sqrt{n_{ij}}$ for each $i$ and $j$.
• If interaction terms are not needed, then test whether the additive effects of the row factor are zero, and whether the additive effects of the column factor are zero. In other words, test whether the coefficients that account for the factor are all zero versus the alternative that at least one is different from zero.
• Particular comparisons can be carried out at this point. For example, estimate the differences in expected response for different treatments (when interaction is found to be present) or different levels of factors (when interaction is not present). The answers to these questions are ultimately the most useful information coming from the analysis

The Analysis of Variance F-test for Additivity

• If all the interaction parameters are equal to 0, then the nonadditive model reduces to the additive model. A check on the additive model is obtained by comparing the fit of the additive and nonadditive models
• A test used to compare fit is the extra-sums-of-squares F-test.
• The hypotheses of interest are (informally)
H0 : there is no interaction between the two factors
Ha : there is interaction between the two factors
• For example, for the Pygmalion problem, the test for additivity is a test of the following hypotheses
\[
H_0: \beta_{11} = 0, \ldots, \beta_{19} = 0 \quad \text{versus} \quad H_a: \beta_i \neq 0 \text{ for at least one } i \text{ where } 11 \le i \le 19
\]
• An F-test of this hypothesis compares the error sums of squares of the model containing all of the interaction indicator variables to the error sums of squares of the model containing none of them
• A large difference in error sums of squares is evidence that the interaction terms explain variation in the response variable
• Specifically, we compare the fit of the model
\[
Y_{ij} = \beta_0 + \beta_1 x_{1,j} + \beta_2 x_{2,i} + \cdots + \beta_{10} x_{10,i} + \beta_{11} x_{11,i,j} + \cdots + \beta_{19} x_{19,i,j} + \varepsilon_{ij}
\]
to the fit of the reduced model
\[
Y_{ij} = \beta_0 + \beta_1 x_{1,j} + \beta_2 x_{2,i} + \cdots + \beta_{10} x_{10,i} + \varepsilon_{ij}
\]
• As before, let 1.
$M_1$ denote the model with the interaction terms in the model,
2. $SSE_1$ denote the error sums of squares associated with $M_1$,
3. $df_1$ denote the degrees of freedom associated with $M_1$ ($df_1 = n - p$),
4. $M_2$ denote the model without the interaction terms in the model,
5. $SSE_2$ denote the error sums of squares associated with $M_2$,
6. $df_2$ denote the degrees of freedom associated with $M_2$
• The test statistic is
\[
F = \frac{(SSE_2 - SSE_1)/(df_2 - df_1)}{SSE_1/df_1}
\]
• Under $H_0$, $F$ has an F-distribution with $n_1 = df_2 - df_1 = (r - 1)(c - 1)$ numerator and $n_2 = df_1$ denominator degrees of freedom, respectively
• We reject $H_0$ at the $\alpha$-level if $F > f_\alpha$, where $\alpha = P(F_{n_1,n_2} > f_\alpha)$ is the probability that an F random variable with $n_1$ numerator and $n_2$ denominator degrees of freedom takes on a value larger than $f_\alpha$
• A p-value for the test is $P(F_{n_1,n_2} > F)$
• $F$ has an F-distribution with numerator and denominator degrees of freedom $df_2 - df_1$ and $df_1$, respectively, provided that the random error terms $\varepsilon_{ij}$ are iid $N(0, \sigma)$. This assumption must be investigated by residual plots which check for non-constant variance and approximate normality. The residuals used in this analysis are the residuals from the full model, since $\sigma$ is estimated using the residual mean square error from the full model
• Figure 1 shows residual plots using residuals from the full model. There is some concern regarding the assumption of constant variance (and no apparent method of improving the appearance). Why is the left plot symmetric about the horizontal axis?

[Figure 1: Residual plots. Left panel: residuals against fitted values. Right panel: residuals against quantiles of the standard normal. Residuals obtained from the interaction model. n = 29.]

• SPSS profile plots for the additive and nonadditive model are fairly similar, indicating visually that the additive model is adequate
• A formal test of significance usually is necessary.
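The extra-sums-of-squares F-test defined above can be sketched numerically. The following is a minimal illustration, not the SPSS procedure used in these notes: it takes the error sums of squares and degrees of freedom of the reduced and full models and returns the F statistic and its p-value. The function names are hypothetical, and the upper-tail probability is computed with a textbook continued-fraction evaluation of the regularized incomplete beta function so that only the standard library is needed. The example values are the error sums of squares reported for the Pygmalion data in these notes (778.50 on 18 df for the additive model, 467.04 on 9 df for the interaction model).

```python
from math import exp, lgamma, log


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_upper_tail(f_value, df_num, df_den):
    """P(F_{df_num, df_den} > f_value)."""
    if f_value <= 0.0:
        return 1.0
    x = df_den / (df_den + df_num * f_value)
    return _reg_inc_beta(df_den / 2.0, df_num / 2.0, x)


def extra_ss_f_test(sse_reduced, df_reduced, sse_full, df_full):
    """Extra-sums-of-squares F-test of a reduced model against a full model."""
    df_num = df_reduced - df_full
    f_stat = ((sse_reduced - sse_full) / df_num) / (sse_full / df_full)
    return f_stat, df_num, df_full, f_upper_tail(f_stat, df_num, df_full)


# Test for interaction in the Pygmalion data (values reported in these notes)
f_stat, n1, n2, p_value = extra_ss_f_test(778.50, 18, 467.04, 9)
print(f"F = {f_stat:.3f} on ({n1}, {n2}) df, p-value = {p_value:.3f}")
```

The same function applies to any pair of nested models, since only the two error sums of squares and their degrees of freedom are required.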
To carry out a test of the hypotheses H0 : there is no interaction between the two factors, versus Ha : there is interaction between the two factors, we formally test
\[
H_0: \beta_{11} = 0, \ldots, \beta_{19} = 0 \quad \text{versus} \quad H_a: \beta_i \neq 0 \text{ for at least one } i \text{ where } 11 \le i \le 19
\]
• The test statistic is
\[
F = \frac{(SSE_2 - SSE_1)/(df_2 - df_1)}{SSE_1/df_1}
\]
where $SSE_1$ is the error sums of squares for the interaction model, $df_1$ are the degrees of freedom associated with the interaction model, $SSE_2$ is the error sums of squares for the additive model, and $df_2$ are the degrees of freedom for the additive model
• Under $H_0$, $F$ has an F-distribution with $n_1 = df_2 - df_1$ numerator and $n_2 = df_1$ denominator degrees of freedom, respectively
• The p-value for the test is $P(F_{n_1,n_2} > F)$
• In this case,
\[
F = \frac{(778.50 - 467.04)/9}{467.04/9} = \frac{34.61}{51.89} = 0.667,
\]
and $P(F_{9,9} > 0.667) = 0.72$.
• I conclude that there is no statistical evidence of interaction. This implies that there is no evidence that the size of the Pygmalion effect depends on the particular company

Ott and Longnecker’s 15.3 Randomized complete block design (p. 859)

• The discussion of Ott and Longnecker involves t treatments in b blocks. This means that there are two factors, the blocking factor (corresponding to company in the previous example), and another factor with t levels (see Table 15.8).
• A blocking factor is one that is not necessarily believed to affect the response variable. Instead, it is a variable that serves to identify groups of observations that are similar within groups and possibly different between groups. Company is a variable that was thought to do this in the Pygmalion study. There is occasionally some interest in determining whether the blocking factor improved the efficiency of the experiment, but usually there is no substantive interest beyond this role. In other words, the test of significance for a blocking factor is of minor importance.
Some statisticians argue that a test of significance for a blocking factor is not justifiable because the factor's role is to reduce residual error, not to explain variation in the response variable.
• A blocking factor is often treated as a random factor (not a fixed factor). We will study random factors later. Treating a block as a fixed factor is not encouraged, though it does simplify certain aspects of the analysis and interpretation of the results.
• Ott and Longnecker present a slightly different model than the general linear model presented in these notes. They reserve $\alpha_1, \ldots, \alpha_t$ to denote the effects associated with the t levels of the fixed factor, $\beta_1, \ldots, \beta_b$ to denote the effects associated with the b levels of the blocking factor, and $\mu$ to denote the overall average. Ott and Longnecker’s model is overparameterized (there are $1 + t + b$ parameters whereas only $1 + (b - 1) + (t - 1) = t + b - 1$ can be estimated). When it comes to data analysis, their model must be discarded and replaced with the one presented in these notes (or one like it).
• Ott and Longnecker’s Figure 15.1 is a profile plot
• Ott and Longnecker’s ANOVA table (Table 15.1) has an extra line reporting the total sums of squares. In the context of the Pygmalion data, the total sums of squares is
\[
\sum_i \sum_j \sum_k (y_{ijk} - \bar{y}_{\cdots})^2 .
\]
• Because $\bar{y}_{\cdots}$ is the mean of all observations, the total sums of squares measures the error associated with the "no-information" model - that is, the model that uses the sample mean to predict the value of any future observation.
• The main difference between this expression $\sum_i \sum_j \sum_k (y_{ijk} - \bar{y}_{\cdots})^2$ for the total sums of squares and the one in Ott and Longnecker's book is the inner-most summation sign (it’s absent in their discussion because they assume that there is exactly one observation per treatment/block combination, and so there is no need for the inner-most summation).
• With only one observation per treatment/block combination, it is impossible to model interaction.
The total number of parameters in the interaction model is $p = 1 + (t - 1) + (b - 1) + (t - 1) \times (b - 1) = t \times b$. Since $t \times b = n$, the degrees of freedom to estimate $\sigma$ is 0, and the model fits the data perfectly. Ott and Longnecker are forced into the position of arguing that interaction does not exist in order to proceed with their analysis of the additive (no-interaction) model. The assumption that interaction does not exist is usually reasonable when one of the two factors is a blocking factor because the blocking factor is used to reduce residual variation, not because it is thought to substantively affect the response variable.
• In the analysis of the Pygmalion data, there are 2 observations in some of the treatment/block combinations. Consequently, it is possible to estimate $\sigma$ when an interaction model is adopted, and hence possible to test for interaction between blocks and treatment.
• In general, it is preferable to have some replication of treatment/block combinations in order to investigate, or at least have the ability to investigate, the interaction model
• The process of assigning observational units to treatment/block combinations is carried out (usually) by randomly choosing blocks, and then, one block at a time, randomly assigning observations to treatments. For example, the companies are formed by randomly assigning soldiers to platoons, and then forming companies of either 2 or 3 platoons. Then, within each company, one platoon is randomly selected to be the Pygmalion treatment platoon.
• Often, blocks are contiguous landscape units (e.g., a field or a forest stand). Plots are experimental units that are treated in some way (say, thinned or fertilized), and then measured (think of a square with an area of 100 m$^2$). Plot locations are randomly established within each block. Usually, but not necessarily, every treatment appears at least once in each block.
This serves to reduce environmental variation as a source of residual error, and generally improves the sensitivity of the experimental design for detecting treatment differences.
• Blocking can be carried out with more than one experimental factor. For example, if we are interested in forest fuel reduction, two factors of potential interest are season (levels are spring and fall), and pre-burning thinning (thinned or not). Since existing fuel loading and soil moisture are potentially confounding variables, blocking helps to reduce the residual error associated with these difficult-to-measure variables.

Analysis of Variance Terminology

• Analysis of variance (ANOVA) is a partitioning of the total sums of squares into terms that are attributable to the model factors
• The following discussion describes ANOVA terms for two-way tables. Ott and Longnecker discuss the ANOVA for one-way tables (i.e., one factor) on page 856, and randomized complete block design (two factors) on page 865.
• The total sums of squares are
\[
SST = \sum_{i=1}^{r} \sum_{j=1}^{c} \sum_{k=1}^{n_{ij}} (y_{ijk} - \bar{y}_{\cdots})^2 = \sum_{i,j,k} (y_{ijk} - \bar{y}_{\cdots})^2
\]
where $n_{ij}$ is the number of observations in cell $i, j$, and
\[
\bar{y}_{\cdots} = \frac{1}{n} \sum_{i,j,k} y_{ijk} \quad \text{and} \quad n = \sum_{i,j} n_{ij}
\]
so that $\bar{y}_{\cdots}$ is the grand, or overall, mean
• The degrees of freedom for SST are $n - 1$
• The between sum of squares (between cells) SSB is that portion of SST which is attributable to differences between the cell means and the overall mean, i.e.,
\[
SSB = \sum_i \sum_j \sum_k (\bar{y}_{ij\cdot} - \bar{y}_{\cdots})^2 = \sum_{i,j} n_{ij} (\bar{y}_{ij\cdot} - \bar{y}_{\cdots})^2
\]
where the cell means are
\[
\bar{y}_{ij\cdot} = \frac{1}{n_{ij}} \sum_k y_{ijk}
\]
• It is easy to get cell means from SPSS by fitting the interaction model, and saving the predicted values. The predicted values are the cell means, i.e., $\hat{y}_{ijk} = \bar{y}_{ij\cdot}$ for every $i$, $j$ and $k$
• Consequently, SSB is exactly equivalent to the regression sums of squares SSR from the saturated model.
Mathematically,
\[
SSR = \sum_{i,j,k} (\hat{y}_{ijk} - \bar{y}_{\cdots})^2
\]
and so
\[
SSR = \sum_{i,j,k} (\hat{y}_{ijk} - \bar{y}_{\cdots})^2 = \sum_{i,j} n_{ij} (\bar{y}_{ij\cdot} - \bar{y}_{\cdots})^2 = SSB
\]
• The degrees of freedom associated with SSB is the number of parameters used to account for differences between cell means in the saturated model, namely,
\[
df_B = (r - 1) + (c - 1) + (r - 1)(c - 1) = 9 + 1 + 9 = 19
\]
(the first two terms are the number of parameters needed to model the row and column effects, and the last term is the number of interaction parameters)
• The within sums of squares (within cells) SSW is the variation of the observations about their cell means
\[
SSW = \sum_{i,j,k} (y_{ijk} - \bar{y}_{ij\cdot})^2
\]
It is important to remember that SSW is called the residual (or error) sums of squares (SSE) in multiple regression terminology, i.e., SSW = SSE
• Also, SSW = SST − SSB.
• The degrees of freedom associated with SSW is the sum, over all groups, of the number of observations within group minus 1:
\[
df_W = \sum_{i,j} (n_{ij} - 1) = 9
\]
• Also, $df_W = n - p$, where $p$ is the number of parameters in the saturated model, e.g., $df_W = 29 - (1 + 1 + 9 + 9) = 9$.

Interpretation of SSB and SSW

• Note that
\[
SSB = \sum_{i,j} n_{ij} (\bar{y}_{ij\cdot} - \bar{y}_{\cdots})^2
\]
• If SSB is small, then the sample cell means $\bar{y}_{ij\cdot}$ are not very different from the overall mean, indicating that there is not much variation in the population cell means $\mu_{ij}$
• If SSB is large, then there are differences between the cell means and the overall mean, implying that there are differences in mean response among different treatments, or different blocks, or both
• A formal test of whether there are differences among the cell means is given by the F-test. Warning: this is not a very interesting test, since its null hypothesis is equivalent to stating that neither the row nor the column factor has any effect on the response variable.
The interesting tests are formed by decomposing SSB into terms that are attributable to interaction, the (main) effect of the row factor, and the (main) effect of the column factor
• Let $\mu_{ij}$ denote the true cell mean for the cell $i, j$, and let $\mu$ denote the true overall mean. Then, the formal hypothesis is
\[
H_0: \mu_{ij} = \mu \text{ for all } i \text{ and } j \quad \text{versus} \quad H_a: \mu_{ij} \neq \mu \text{ for at least one } i \text{ or } j
\]
• The test statistic is an F defined by
\[
F = \frac{SSB/df_B}{SSW/df_W} = \frac{MSB}{MSW}
\]
• Under $H_0$, $F$ has an F-distribution with $n_1 = df_B$ numerator and $n_2 = df_W$ denominator degrees of freedom, respectively
• We reject $H_0$ at the $\alpha$-level if $F > f_\alpha$, where $\alpha = P(F_{n_1,n_2} > f_\alpha)$ is the probability that an F random variable with $n_1$ numerator and $n_2$ denominator degrees of freedom takes on a value larger than $f_\alpha$
• A p-value for the test is $P(F_{n_1,n_2} > F)$

The ANOVA table for the Pygmalion experiment

• The tests of significance are summarized in the ANOVA table(s). Two tables must be examined by the analyst, though it is necessary to present only the second (Table 5)
• Table 4 gives one important bit of information: the test for interaction on the line labeled Comp × Treat
• Because interaction is not significant, a second table is necessary to summarize the significance tests for the main effects. Recall that we do not test for a main effect if interaction is in the model, as it is in Table 4. Table 5 shows the significance tests for the main effects

Table 4: ANOVA table for the Pygmalion data set based on the non-additive (interaction) model. $R^2 = 0.739$, $R^2_{adjusted} = 0.188$.

Source            Sums-of-squares   Degrees of Freedom   Mean Square   F       P-value
Regression        1321.322          19                   69.2
Company           672.52            9                    74.72         1.44    0.298
Treatment         328.88            1                    328.88        6.34    0.033
Comp × Treat      311.464           9                    34.607        0.667   0.722
Error             467.040           9                    51.893
Corrected Total   1788.362          28

Table 5: ANOVA table for the Pygmalion data set based on the additive (no-interaction) model. $R^2 = 0.565$, $R^2_{adjusted} = 0.323$.
Source            Sums-of-squares   Degrees of Freedom   Mean Square   F       P-value
Regression        1009.858          10                   100.98
Company           682.517           9                    75.835        1.753   0.148
Treatment         338.883           1                    338.883       7.835   0.012
Error             778.504           18                   43.250
Corrected Total   1788.362          28

• Some details on the construction of the ANOVA tables
1. The sums-of-squares for the test of interaction (Table 4) is the difference in error sums-of-squares that was used in the extra-sums-of-squares F-test, that is, 311.46 = 778.50 − 467.04, where 778.50 is the error sums of squares associated with the additive model and 467.04 is the error sums of squares associated with the nonadditive model. Similarly, the error mean square in Table 4 (51.893) is the denominator in the F statistic
2. The test for treatment effect is given in Table 5. Specifically, 338.88 = 1117.38 − 778.50 is the increase in error sums-of-squares between the main effects model and the company-only model. The F-statistic is
\[
F = \frac{(SSE_2 - SSE_1)/(df_2 - df_1)}{SSE_1/df_1} = \frac{(1117.38 - 778.50)/1}{43.25} = \frac{338.88}{43.25} = 7.83
\]
and $P(F_{1,18} > 7.83) = 0.012$
3. The test for company effect is carried out in the same fashion - we compare the error associated with the additive model (both main effects) versus the model with only treatment. Specifically, 682.52 = 1461.02 − 778.50 is the increase in error sums-of-squares between the main effects model and the treatment-only model. The F-statistic is
\[
F = \frac{(SSE_2 - SSE_1)/(df_2 - df_1)}{SSE_1/df_1} = \frac{(1461.02 - 778.50)/9}{43.25} = 1.753
\]
and $P(F_{9,18} > 1.753) = 0.148$
4. The SPSS ANOVA procedure produces a different summary table
• The final task is to extract the estimate of the Pygmalion effect and construct a 95% confidence interval
• The estimate is $\hat{\beta}_1 = 7.222$ and the estimated standard error is 2.58.
• A 95% CI for the improvement in mean score due to the Pygmalion effect is
\[
\hat{\beta}_1 \pm t_{.025,18} \times \hat{\sigma}_{\hat{\beta}_1} = 7.222 \pm 2.101 \times 2.58 = (1.8, 12.6)
\]

Case Study 2

From Ramsey and Schafer (1997) The Statistical Sleuth, Duxbury Press, p. 363. (A.
Olsen, Evolutionary and Ecological Interactions Affecting Seaweeds, Ph.D Thesis. Oregon State U. 1993.)

To study the influence of grazers on regeneration rates of seaweed in the intertidal zone, a researcher scraped rocks free of seaweed and observed the amount of regeneration over time when certain grazers were excluded.
• The grazers were
1. L - limpets (an invertebrate)
2. f - small fish
3. F - big fish
• Each plot was a 100 cm square located on a rock surface. Each plot received one of 6 treatment levels:
1. LfF: all three grazers were allowed access
2. fF: fish allowed access (limpets excluded by surrounding the plot with a caustic paint)
3. Lf: limpets and small fish allowed access (a coarse net excluded large fish)
4. f: small fish allowed access (paint and coarse net)
5. L: limpets allowed access (fine net)
6. C: (control) limpets, small and large fish excluded
• Exclosures were constructed by mounting nets on a frame bolted to the rock
• All plots had frames to eliminate confounding with the possible effect of the frames on feeding preference

Objectives of the Study

• Determine the impacts of the three different grazers on seaweed regeneration rates
• Determine which grazer consumes the most seaweed
• Determine if different grazers affect each other
• Determine if grazing effects are the same in all microhabitats

More on design

• There are 3 factors in this study, each with 2 levels:
1. limpets (present and absent)
2. small fish (present and absent)
3. large fish (present and absent)
• A factorial design combines each level of each factor with every level of every other factor. If this were a factorial design, then there would be 2 × 2 × 2 = 8 treatments.
• It was not physically possible to form all 8 combinations. For example, it was not feasible to exclude small fish and allow large fish in the enclosures.
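The count of treatments can be checked by enumerating the full 2 × 2 × 2 factorial and dropping the combinations that could not be built. The sketch below is only an illustration: the function names are hypothetical, and the feasibility rule (large fish cannot be admitted while small fish are excluded) is an assumption generalized from the single example given above. Under that assumption the enumeration reproduces the six treatment labels used in the study.

```python
from itertools import product


def treatment_label(limpets, small_fish, large_fish):
    """Build the label used in these notes (L, f, F) from presence flags; C = control."""
    label = ("L" if limpets else "") + ("f" if small_fish else "") + ("F" if large_fish else "")
    return label or "C"


def is_feasible(limpets, small_fish, large_fish):
    # Assumed rule: a net that excludes small fish also excludes large fish,
    # so large fish cannot be present when small fish are absent.
    return not (large_fish and not small_fish)


# All 2 x 2 x 2 = 8 combinations of (limpets, small fish, large fish)
all_combinations = list(product([False, True], repeat=3))

feasible = [treatment_label(*combo) for combo in all_combinations if is_feasible(*combo)]
infeasible = [treatment_label(*combo) for combo in all_combinations if not is_feasible(*combo)]

print("Number of factorial combinations:", len(all_combinations))
print("Feasible treatments:", feasible)
print("Infeasible combinations:", infeasible)
```

The two dropped combinations are the ones that admit large fish without small fish, which leaves the six-level treatment factor analyzed in this case study.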
• A strategy for analyzing these data is to view the experiment as a two-way analysis of variance using a single treatment factor with 6 levels (shown in the layout below) and a blocking factor corresponding to inter-tidal environment

                     Limpets absent                     Limpets present
                     Small fish      Small fish         Small fish      Small fish
                     absent          present            absent          present
Large fish absent    C               f                  L               Lf
Large fish present                   fF                                 LfF

• Because the intertidal zone is a highly variable environment, the researcher applied the 6 levels of treatment in eight blocks, each of which contained 12 plots. Thus, each treatment level was replicated twice within each block.
• Within block, the six levels were randomly allocated to the 12 plots
• If desired, interaction between grazers (the treatment factor) and environment (blocking factor) can be investigated
• The blocks are
1. Block 1: below high tide, exposed to heavy surf
2. Block 2: below high tide, protected from heavy surf
3. Block 3: Mid-tide, exposed
4. Block 4: Mid-tide, protected
5. Block 5: Low tide, exposed
6. Block 6: Low tide, protected
7. Block 7: On a near-vertical rock wall, mid-tide level and exposed
8. Block 8: On a near-vertical rock wall, low tide level and protected
• Replication is very helpful, because it allows us to measure the inherent variability of the response variable under (near) identical conditions.
• This is a randomized block experiment - treatment levels were randomly allocated to experimental units (the plots) within each block
• By allocating treatment levels within block, Olsen was assured of having exactly 2 replications of each treatment in each block. The design is said to be balanced
• A primary objective of this design is to be able to compare treatments within block - this helps prevent environmental variation from confounding or obscuring the comparisons between treatments
• Alternatively, if Olsen had randomly allocated treatments to plots without regard for blocks, some blocks would have more of one treatment than another.
Perhaps all the limpet level observations would have been in blocks with high levels of exposure
• After four weeks, Olsen estimated regenerating seaweed cover by positioning a metal sheet with 100 holes over each plot
• The percentage of holes that were positioned over regenerating seaweed was determined
• Data - percent regenerated seaweed cover (two replicate plots per treatment in each block)

         Treatment
Block    C        L        f        Lf       fF       LfF
1        14 23    4  4     11 24    3  5     10 13    1  2
2        22 35    7  8     14 31    3  6     10 15    3  5
3        67 82    28 58    52 59    9  31    44 50    6  9
4        94 95    27 35    83 89    21 57    57 73    7  22
5        34 53    11 33    33 34    5  9     26 42    5  6
6        58 75    16 31    39 52    26 43    38 42    10 17
7        19 47    6  8     43 53    4  12    29 36    5  14
8        53 61    15 17    30 37    12 18    11 40    5  7

Recall the proposed strategy for analyzing two-way tables when there are several observations per cell:
1. Begin with graphically-based initial exploration, and determine if there are outliers, and if transformations are needed
2. Fit a rich model (the saturated model) with interactions, and examine model assumptions (concentrating on the constant variance assumption, and whether there are outliers). Model fitting uses a backwards elimination process
3. Test whether the interaction terms are needed (via the F-statistic).
• If interaction terms are needed, then estimate the mean response and its standard error for each treatment. That is, compute $\bar{y}_{ij}$ and $\hat{\sigma}_{\bar{y}_{ij}} = s/\sqrt{n_{ij}}$ for each $i$ and $j$.
• If interaction terms are not needed, then test whether the additive effects of the row factor are zero, and whether the additive effects of the column factor are zero. In other words, test whether the coefficients that account for the factor are all zero versus the alternative that at least one is different from zero.
• Particular comparisons can be carried out at this point. For example, estimate the differences in expected response for different treatments (when interaction is found to be present) or different levels of factors (when interaction is not present).
The answers to these questions are ultimately the most useful information coming from the analysis
• Figure 2 is a visual representation of the data

[Figure 2: Cell means plotted against block. Seaweed grazers data. Horizontal axis: Block; vertical axis: Percent; one line per treatment (CONTROL, L, f, Lf, fF, LfF).]

• Treatments with limpets excluded had larger averages than those wherein limpets were able to graze. This difference is consistent across blocks
• Other treatments are less consistent across blocks
• There is evidence of interaction - the differences between treatments vary with block. There are substantially larger differences in block 4 than in blocks 1 and 2. However, this apparent interaction is related to the overall percent regeneration, and it may be possible to remove it by transformation
• There is evidence of nonconstant variance - the variation in cell means is greater for blocks 4, 6 and 8. This variation is related to the overall block average
• Figure 3 shows the residuals from the nonadditive model. Nonconstant variance is present; in particular, the residuals associated with fitted values near 50 percent have the greatest variability, and those residuals associated with fitted values near zero or 100 have the least variability
• The data are percentages (or equivalently, sample proportions). Percentages typically exhibit a variance relationship where small variances are observed when the values are near the lower and upper limits of the range, and large variances are observed in the center of the range.
• Recall that the variance of a sample proportion is $p(1 - p)/n$ where p is the probability that the event of interest will occur, and n is the number of observations. The maximum variance occurs when p = 0.5
• A common transformation for percentage data is the logit function given by
$$z = \log\left(\frac{y/100}{1 - y/100}\right) = \log\left(\frac{y}{100 - y}\right)$$
where y is the observed percentage

[Figure 3: Residuals from the nonadditive (cell means) model. Seaweed grazers data. Horizontal axis: Fitted values; vertical axis: Standardized residuals.]
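The logit transformation above can be sketched in a few lines of Python. This is only an illustration, not code from the notes: the function names `logit_percent` and `inverse_logit_percent` are hypothetical, and the four input values are percent-cover observations picked from the data table.

```python
import math

def logit_percent(y):
    """Logit of a percentage y (0 < y < 100): log(y / (100 - y))."""
    if not 0 < y < 100:
        raise ValueError("logit is undefined at 0 and 100 percent")
    return math.log(y / (100 - y))

def inverse_logit_percent(z):
    """Back-transform a logit z to a percentage."""
    return 100 / (1 + math.exp(-z))

# A few observed percent-cover values from the seaweed grazers data.
for y in (1, 14, 50, 95):
    z = logit_percent(y)
    back = inverse_logit_percent(z)
    print(f"y = {y:3d}%  logit = {z:6.2f}  back-transformed = {back:5.1f}%")
```

The transformation stretches values near 0 and 100 percent and leaves values near 50 percent almost linear, which is why it can help with the variance pattern described above. It is undefined at exactly 0 and 100 percent, so the sketch rejects those inputs rather than guessing at an adjustment.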
• The logit transformation is interpretable as the natural logarithm of the ratio of the proportion of the plot covered by seaweed to the proportion not covered by seaweed
• Figure 4 shows the graph of the logit
• The ANOVA table assuming a nonadditive (cell means) model is shown in Table 6
• Recall that the objectives of the study were to
  1. Determine the impacts of the three different grazers on seaweed regeneration rates
  2. Determine which grazer consumes the most seaweed
  3. Determine if different grazers affect each other
  4. Determine if grazing effects are the same in all microhabitats

[Figure 4: Graph of the logit function. Horizontal axis: p; vertical axis: logit(p).]

Table 6: ANOVA table for the seaweed grazers data set based on the non-additive (interaction) model. $R^2 = 0.928$.

Source            Sums-of-squares   Degrees of freedom   Mean square   F       P-value
Regression        188.46            47                   4.01          13.24   < 0.0001
Blocks             76.24             7                   10.89         35.96   < 0.0001
Treatment          96.99             5                   19.40         64.05   < 0.0001
Interaction        15.23            35                    0.43          1.44   0.121
Error              14.54            48                    0.303
Corrected total   203.00            95                   51.893

• The last objective has largely been answered. Because there is little evidence of interaction between treatments and blocks on the logit scale (Table 6: F = 1.44, p-value = 0.121), the treatment effects can be taken to be the same in all blocks. Hence, we conclude that there is no evidence that grazing effects differ among microhabitats
• To begin to address the remainder of the objectives, consider a table (Table 7) showing the fitted values for each of the block/treatment combinations. Such a table is sometimes called a table of estimated means from the additive model. For now, the logits are going to be used.

[Figure 5: Logit cell means plotted against block. Seaweed grazers data. Horizontal axis: Block; vertical axis: logit.]

• The values shown in Table 7 (apart from the right and bottom margins) are
$$\hat{y}_{ij\cdot} = \hat{\mu}_{ij} = \hat{\beta}_0 + \cdots + \hat{\beta}_p x_{p,ij} \qquad (1)$$
where $\hat{\beta}_0, \ldots, \hat{\beta}_p$ are the parameter estimates for the additive model computed via least squares regression.
• A row mean is the mean prediction for all observations in a particular block. That is, the ith row mean is
$$\hat{\mu}_{i\cdot} = \frac{1}{c}\sum_{j=1}^{c} \hat{\mu}_{ij}$$
• A column mean is the mean prediction for all observations in a particular treatment group. That is, the jth column mean is
$$\hat{\mu}_{\cdot j} = \frac{1}{r}\sum_{i=1}^{r} \hat{\mu}_{ij}$$
• For example, $\hat{\mu}_{1\cdot} = -2.64$ is the mean predicted logit of percent regenerated seaweed for block 1

Table 7: Predicted values (on the logit scale) assuming the additive model.

                          Treatment
Block     C       L       f       Lf      fF      LfF     Mean
1        −1.22   −3.12   −1.72   −3.41   −2.22   −4.13   −2.64
2        −0.76   −2.66   −1.26   −2.95   −1.76   −3.67   −2.18
3         0.88   −1.01    0.39   −1.30   −0.12   −2.02   −0.53
4         1.76   −0.13    1.26   −0.43    0.76   −1.15    0.34
5        −0.01   −1.90   −0.50   −2.19   −1.01   −2.91   −1.42
6         0.80   −1.09    0.31   −1.39   −0.20   −2.12   −0.61
7        −0.11   −2.01   −0.61   −2.30   −1.12   −3.02   −1.53
8         0.11   −1.79   −0.39   −2.08   −0.89   −2.80   −1.31
Mean      0.18   −1.71   −0.31   −2.00   −0.82   −2.72   −1.23

Contrasts (Ott and Longnecker, p. 431)
Often, it is desirable to compare specific treatments and obtain a formal test of significance.
• For example, suppose that drug A is the conventional treatment for a particular disease and drugs B and C are two versions of a new formulation (say fast- and slow-acting versions).
• Once it has been established that there are differences in expected response due to treatments (drugs) via an F-test, then we may want to compare B and C versus A, and then B versus C (and answer the question: which of the new drugs is best?).
• It is desirable to use contrasts for this purpose

Notation and Terminology
• Suppose that there are t treatment group means $\mu_1, \mu_2, \ldots, \mu_t$. In the drug example, let $\mu_1, \mu_2, \mu_3$ denote the mean response when drug A, drug B, and drug C are taken, respectively
• A contrast (or comparison) of the means $\mu_1, \ldots, \mu_t$ is
$$l = a_1\mu_1 + \cdots + a_t\mu_t = \sum_{i=1}^{t} a_i\mu_i,$$
where $a_1, \ldots, a_t$ are known constants with the property
$$\sum_{i=1}^{t} a_i = 0$$
• For example, to compare $\mu_2$ against $\mu_3$, set $a_2 = 1$, $a_3 = -1$, and every other $a_i = 0$.
Then,
$$l = \sum_{i=1}^{t} a_i\mu_i = \mu_2 - \mu_3$$
• To compare $\mu_1$ versus $\mu_2$ and $\mu_3$, one way is to set $a_1 = 1$, $a_2 = -\frac{1}{2}$ and $a_3 = -\frac{1}{2}$. Then,
$$l = \mu_1 - \frac{\mu_2 + \mu_3}{2}$$
• The mathematical statement $l = 0$ is equivalent to $0 = \mu_1 - \frac{\mu_2 + \mu_3}{2}$ and also to $\mu_1 = \frac{\mu_2 + \mu_3}{2}$
• A formal test of $H_0: \mu_1 = \frac{\mu_2 + \mu_3}{2}$ is obtained by testing $H_0: l = 0$ versus $H_a: l \neq 0$
• Because $\mu_1, \mu_2, \ldots, \mu_t$ are unknown, we need to estimate l. The estimate is called a linear contrast of treatment means, and it is
$$\hat{l} = a_1\hat{\mu}_1 + \cdots + a_t\hat{\mu}_t = \sum_{i=1}^{t} a_i\hat{\mu}_i,$$
where the estimates $\hat{\mu}_1, \ldots, \hat{\mu}_t$ are obtained from the fitted regression model
• To test $H_0: l = 0$ versus $H_a: l \neq 0$, we use the test statistic
$$T = \frac{\hat{l}}{\hat{\sigma}_{\hat{l}}},$$
where the estimated variance of the contrast of treatment means is
$$\hat{\sigma}^2_{\hat{l}} = \hat{\sigma}^2 \sum_{i=1}^{t} \frac{a_i^2}{n_i}$$
• $\hat{\sigma}$ is the residual standard error (from the final model), and $n_i$ is the number of observations which received treatment i
• If $H_0$ is true, then T has a t distribution where the degrees of freedom are the residual degrees of freedom associated with the model from which the estimate $\hat{\sigma}$ was obtained
• Thus, a p-value for the test of $H_0$ is $\Pr(|T_{df}| \geq |t|)$ where t is the observed value of the test statistic and $T_{df}$ has a t distribution with df degrees of freedom
• The objectives of the seaweed grazers study require a number of contrast tests.
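The contrast estimate, its standard error, and the test statistic defined above can be sketched in Python as follows. This is a hedged illustration, not the notes' own code: the function name `contrast_test` is hypothetical, the treatment means are the rounded column means from Table 7, and the values $\hat{\sigma}^2 = 0.358$ and $n_i = 16$ are the ones the notes use for the seaweed grazers data. The coefficients shown compare the three limpet treatments with the three treatments without limpets.

```python
import math

def contrast_test(means, coefs, sigma2, n):
    """Estimate a contrast of treatment means and its t statistic.

    means  : estimated treatment means (mu-hat_i)
    coefs  : contrast coefficients a_i, which must sum to zero
    sigma2 : residual variance estimate (sigma-hat squared)
    n      : number of observations per treatment (n_i)
    """
    if abs(sum(coefs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    estimate = sum(a * m for a, m in zip(coefs, means))
    variance = sigma2 * sum(a * a / ni for a, ni in zip(coefs, n))
    se = math.sqrt(variance)
    return estimate, se, estimate / se

# Treatment (column) means from Table 7, logit scale.
treatments = ["C", "L", "f", "Lf", "fF", "LfF"]
means = [0.18, -1.71, -0.31, -2.00, -0.82, -2.72]

# Limpets present (L, Lf, LfF) versus limpets absent (C, f, fF).
coefs = [-1/3, 1/3, -1/3, 1/3, -1/3, 1/3]

est, se, t = contrast_test(means, coefs, sigma2=0.358, n=[16] * 6)
print(f"estimate = {est:.3f}, SE = {se:.4f}, t = {t:.2f}")
```

Because the inputs are rounded to two or three digits, the printed values can differ slightly in the last digit from results computed with unrounded estimates. The p-value is left out of the sketch; it requires the cumulative distribution function of the t distribution, which is available in standard statistical software.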
Recall the following table showing the grazer treatments

                  Limpets absent         Limpets present
                  Small fish             Small fish
Large fish        absent     present     absent     present
absent            C          f           L          Lf
present                      fF                     LfF

• Since an additive model was adopted, the effect of limpets can be estimated by comparing the average response for the three treatments wherein limpets grazed to the average response for the three treatments wherein limpets were excluded, using a contrast of treatment means given by
$$\frac{\hat{\mu}_{LfF} + \hat{\mu}_{Lf} + \hat{\mu}_{L}}{3} - \frac{\hat{\mu}_{fF} + \hat{\mu}_{f} + \hat{\mu}_{C}}{3} \qquad (2)$$
• However, before reporting the estimated limpet effect, we ought to verify that it is not zero by testing the following hypotheses
$$H_0: \frac{\mu_{LfF} + \mu_{Lf} + \mu_{L}}{3} - \frac{\mu_{fF} + \mu_{f} + \mu_{C}}{3} = 0 \qquad (3)$$
versus
$$H_a: \frac{\mu_{LfF} + \mu_{Lf} + \mu_{L}}{3} - \frac{\mu_{fF} + \mu_{f} + \mu_{C}}{3} \neq 0 \qquad (4)$$
• The estimated effect of limpets is
$$\frac{\hat{\mu}_{LfF} + \hat{\mu}_{Lf} + \hat{\mu}_{L}}{3} - \frac{\hat{\mu}_{fF} + \hat{\mu}_{f} + \hat{\mu}_{C}}{3} = \frac{-2.72 - 2.00 - 1.71}{3} - \frac{-.82 - .31 + .18}{3} = -1.83$$
• The values used in this equation are the column (treatment) means shown in the bottom row of Table 7
• To carry out the test of significance for the limpet effect, note that $\hat{\sigma}^2 = 0.358$ and that the number of observations used to estimate each treatment mean is $n_i = 16 = 2 \times 8$ for each $i = 1, \ldots, 6$. Using these values, the estimated variance of the treatment mean contrast is
$$\hat{\sigma}^2_{\hat{l}} = \hat{\sigma}^2 \sum_{i=1}^{t} \frac{a_i^2}{n_i} = \frac{\hat{\sigma}^2}{16} \sum_{i=1}^{t} a_i^2 = \frac{0.358}{16}\left( \left[\tfrac{1}{3}\right]^2 + \left[\tfrac{1}{3}\right]^2 + \left[\tfrac{1}{3}\right]^2 + \left[-\tfrac{1}{3}\right]^2 + \left[-\tfrac{1}{3}\right]^2 + \left[-\tfrac{1}{3}\right]^2 \right) = \frac{0.358}{16} \times \frac{6}{9} = .01494$$
• The standard error of the treatment mean contrast is estimated to be
$$\hat{\sigma}_{\hat{l}} = \sqrt{\hat{\sigma}^2_{\hat{l}}} = \sqrt{.01494} = .1222$$
• The test statistic is
$$t = \frac{\hat{l}}{\hat{\sigma}_{\hat{l}}} = \frac{-1.829}{.1222} = -14.97.$$
• Finally, $\Pr(|T_{48}| > 14.97) < 0.0001$ and we conclude that there is abundant evidence that limpets affect the regeneration of seaweed
• Because the set of grazer combinations that were allowed access is somewhat complicated, some thought is needed to determine how to assess whether different grazers affect each other
• For example, are limpets affected by small fish?
To answer this question, we can compare the differences between limpets present and absent when small fish are present, versus when small fish are absent.
• A contrast of these means is
$$\frac{\mu_{LfF} - \mu_{fF}}{2} + \frac{\mu_{Lf} - \mu_{f}}{2} - 2 \times \frac{\mu_{L} - \mu_{C}}{2}$$
and we are interested in determining whether the observed contrast is significantly different from zero
• By substituting the appropriate column means from Table 7, the contrast estimate is
$$\frac{-2.72 - (-.82)}{2} + \frac{-2.00 - (-.31)}{2} - 2 \times \frac{-1.71 - .18}{2} = -.95 - .845 + 1.89 = .095$$
• The estimated standard error of the contrast of treatment means is $\hat{\sigma}_{\hat{l}} = .26$ and the test statistic is $t = .095/.26 = 0.37$. Further, $\Pr(|T_{48}| > 0.37) = 0.71$, which shows that there is no evidence that small fish affect limpets, and likewise no evidence that limpets affect small fish
• Another example. To assess whether large fish have an effect on regeneration, we compare means from the treatments fF and LfF against f and Lf by testing the hypothesis
$$H_0: \frac{\mu_{fF} + \mu_{LfF}}{2} - \frac{\mu_{f} + \mu_{Lf}}{2} = 0 \quad \text{versus} \quad H_a: \frac{\mu_{fF} + \mu_{LfF}}{2} - \frac{\mu_{f} + \mu_{Lf}}{2} \neq 0$$
• The estimated contrast is
$$\frac{-.82 - 2.72}{2} - \frac{-.31 - 2.00}{2} = -1.77 + 1.16 = -.61$$
• In this case, $a_1 = \frac{1}{2} = a_2$ (for fF and LfF), $a_3 = -\frac{1}{2} = a_4$ (for f and Lf), and $a_5 = 0 = a_6$ (for C and L)
• Then,
$$t = \frac{\hat{l}}{\hat{\sigma}_{\hat{l}}} = \frac{-.61}{.1498} = -4.10.$$
• The p-value is $P(|t_{48}| > 4.10) = .0007$.

Multifactor Studies Without Replication
• So far, the emphasis has been on analyzing two-way tables using categorical indicator variables. There was more than one observation for some treatment combinations
• Now we consider experiments in which there are two or more factors, and just one observation per treatment combination.
• If there is more than one observation per treatment combination, we say that treatment combinations are replicated; otherwise we say treatment combinations are not replicated
• If treatment combinations are replicated, then there are no limitations on the interactions that can be accommodated in the regression analysis.
• For example, if the seaweed grazers experiment had identified four factors (limpets, large fish, small fish, and blocks) and every combination of factor levels had been replicated more than once, then we could fit a model with all main effects, all two-way interactions (e.g., block×Limpet), all three-way interactions (e.g., block×Limpet×Large fish), and the four-way interaction block×Limpet×Large fish×Small fish
• When there is more than one observation per treatment group and every possible interaction is included in the model, then the variance $\sigma^2$ is estimated by comparing the variation of the observations about their respective cell means.
• The situation is different if there is only a single observation per treatment group. When every possible interaction is included in the model, the fitted cell means are exactly equal to the single observation at the corresponding combination of factor levels. The error variance will be estimated to be zero, which is obviously unrealistic for most experiments

Case Study 3 (revisited)
(Ramsey and Schafer) from Fouts, R.S. 1973. “Acquisition and testing of gestural signs in four young chimpanzees,” Science, 180, 978-980.
Fouts taught 4 chimpanzees 10 signs of American Sign Language with the intent of determining whether some signs are easier to learn, and whether some chimps tended to learn more quickly than others.

Table 8: Data: time in minutes to learn a word.

                     Chimpanzee
Word      Booee   Cindy   Bruno   Thelma
Listen     12      10       2      15
Drink      15      25      36      18
Shoe       14      18      60      20
Key        10      25      25      40
More       10      15     225      24
Food       80      55      14     190
Fruit      80      20     177     297
Hat        78      99     178     297
Look      115      54     345     420
String    129     476     287     372

• Note that there are two factors (chimps with 4 levels and words with 10 levels).
Hence, there are 4 × 10 = 40 treatment groups, and 40 observations
• There is no possibility of replicating treatments (we cannot view teaching a word to an animal a second time as an independent replication of the first teaching). Thus, there is exactly one observation per treatment combination (chimp × word)
• An important question is how we view chimps. Chimps are treated as a factor with 4 levels. Note that chimps are not replicable but words are. That is, if I decide to repeat the study, I can teach the same words to 4 chimps, but, almost certainly, I cannot use the same four chimps. Hence, chimps are like blocks - they are not replicable, and their main purpose is to improve the comparison between the learning times of particular words.
• If these chimps were a random sample of chimps, then comparisons among chimps could be used to draw inferences about the population of chimps. If this were the case, then we probably would have substantial interest in comparing chimps
• Because the chimps cannot be viewed as a representative sample of any recognizable and useful population, we should view chimp as a fixed blocking factor. We’ll return to this matter soon
• Two factors are identified: chimp (with 4 levels), and word (with 10 levels)
• The chimp factor requires 4 − 1 = 3 indicator variables
• The word factor requires k − 1 = 10 − 1 = 9 indicator variables
• The additive model requires 1 + 3 + 9 = 13 parameters; therefore there are n − p = 40 − 13 = 27 degrees of freedom for estimating $\sigma^2$
• An interaction model requires 3 × 9 = 27 additional indicator variables; hence, there would be a total of 1 + 3 + 9 + 27 = 40 parameters and n − p = 40 − 40 = 0 degrees of freedom for estimating $\sigma^2$. The estimate of $\sigma^2$ from a regression analysis will be $\hat{\sigma}^2 = 0$. Zero is an underestimate of $\sigma^2$.
Surely there are sources of variation in the time it takes a chimp to learn a word; for example, physical condition (hunger) or outside distractions probably induce variation in the time it takes a particular chimp to learn a particular word
• Clearly, this estimate is biased downward, perhaps severely, and no test that uses the estimate can be viewed as unbiased
• We cannot carry out any inferential methods that require an estimate of $\sigma^2$ if we believe the interaction model is correct, because our estimate of $\sigma^2$ is 0

Rationale for Designs with One Observation Per Cell
• For some experiments, there is no possibility of replicating the treatment group combinations. Because a word cannot be taught twice to a particular chimp, we cannot replicate the (chimp, word) treatment combinations
• For some experiments, the cost of obtaining replicate observations is too great. If it can be argued in the design stage that interaction is not likely, then this design is cost-effective. If interaction is present, the value of the experiment is greatly diminished because there is no way to estimate $\sigma^2$ and it is very difficult, if not impossible, to test the usual hypotheses of interest

A Strategy for Data Analysis in the Absence of Replicates
1. Begin with graphically-based initial exploration, and determine if there are outliers, and if transformations are needed
2. Fit a rich model and examine model assumptions (concentrating on whether there is evidence of interaction and nonconstant variance). If there are three factors, then we can have two-way, but not three-way, interactions. A rich model contains all reasonable two-way interactions, and the main effects
3. If the rich model contains interactions (there must be at least 3 factors in the model so that 2-way interaction can be modeled), test whether these interaction terms are needed (via the extra-sums-of-squares F-statistic)
  • If a factor interacts with another factor, then estimate the mean response and its standard error for each different combination of factor levels
  • If interaction terms are not needed, then test whether the additive effects of each factor are zero
4. Particular comparisons can be carried out at this point (e.g., “which grazer reduces seaweed growth to the greatest extent, and by how much?”). The answers to these questions are ultimately the most useful information coming from the analysis
• The main point is that the researcher must be able to argue convincingly, based on her scientific understanding of the problem, that the highest-order interaction is zero (does not exist)
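The degrees-of-freedom argument for the chimpanzee data can be checked with a short Python sketch. It is an illustration under stated assumptions, not the notes' analysis: the data are the Table 8 learning times on the raw minutes scale (no transformation is applied, although step 1 of the strategy above may call for one), and the variable names are hypothetical. The additive fit uses the closed form that holds for a complete two-way layout with one observation per cell: row mean plus column mean minus grand mean.

```python
# Learning times in minutes from Table 8; rows are words, columns are chimps.
chimps = ["Booee", "Cindy", "Bruno", "Thelma"]
times = {
    "Listen": [12, 10, 2, 15],
    "Drink":  [15, 25, 36, 18],
    "Shoe":   [14, 18, 60, 20],
    "Key":    [10, 25, 25, 40],
    "More":   [10, 15, 225, 24],
    "Food":   [80, 55, 14, 190],
    "Fruit":  [80, 20, 177, 297],
    "Hat":    [78, 99, 178, 297],
    "Look":   [115, 54, 345, 420],
    "String": [129, 476, 287, 372],
}

rows = list(times.values())
r, c = len(rows), len(chimps)          # 10 words, 4 chimps
n = r * c                              # 40 observations, one per cell

grand = sum(map(sum, rows)) / n
row_means = [sum(row) / c for row in rows]
col_means = [sum(row[j] for row in rows) / r for j in range(c)]

# Residual sum of squares about the additive fit
# (row mean + column mean - grand mean).
rss = sum(
    (rows[i][j] - (row_means[i] + col_means[j] - grand)) ** 2
    for i in range(r) for j in range(c)
)

p_additive = 1 + (c - 1) + (r - 1)     # intercept + chimp + word indicators
df_additive = n - p_additive           # residual degrees of freedom
sigma2_hat = rss / df_additive         # additive-model estimate of sigma^2

p_interaction = p_additive + (c - 1) * (r - 1)
df_interaction = n - p_interaction     # nothing left to estimate sigma^2

print(df_additive, df_interaction)
print(f"additive-model estimate of sigma^2: {sigma2_hat:.1f}")
```

The first printed line is `27 0`, matching the counts 40 − 13 = 27 and 40 − 40 = 0 given above. With the interaction terms included, every fitted value equals its observation, so the residual sum of squares is zero and only the additive model leaves information for estimating $\sigma^2$.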