ANOVA: Analysis of Variance

(In this class, we will only do this for balanced data.)
[This means that there are the same number of observations in each group; i.e., the total number of observations = N = IJr.]

Why to do ANOVA:
To answer the questions
1. Is the interaction between Factor A and Factor B statistically significant?
   If yes: Keep $\alpha\beta_{ij}$, $\alpha_i$, and $\beta_j$ in the model; stop here.
   If no: Drop $\alpha\beta_{ij}$ and move on to questions 2 and 3.
2. Is the main effect of Factor A significant?
   If yes: Keep $\alpha_i$ in the model. If no: drop it.
3. Is the main effect of Factor B significant?
   If yes: Keep $\beta_j$ in the model. If no: drop it.

How to do ANOVA:

Step 1: Calculate the Total Sum of Squares (SST).

$$SST = \sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{r} (Y_{ijk} - \bar{Y}_{\cdot\cdot\cdot})^2 = \sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{r} (Y_{ijk} - \bar{Y}_{\cdot\cdot})^2 \quad \text{since we have balanced data}$$

Here $\bar{Y}_{\cdot\cdot\cdot}$ is the mean of all N observations and $\bar{Y}_{\cdot\cdot}$ is the mean of the IJ cell means; with balanced data the two are equal.

For the wood-joint example:

$$SST = (1518 - 1375.9)^2 + (1927 - 1375.9)^2 + (1348 - 1375.9)^2 + \dots + (1493 - 1375.9)^2 = 4{,}521{,}604.98$$

Step 2: Decompose the Total Sum of Squares into components due to each factor, the interaction between the factors, and experimental error:

SST = SSA + SSB + SSAB + SSE

| Component of Variability | Formula | Interpretation |
| --- | --- | --- |
| SSA | $rJ \sum_{i=1}^{I} (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2 = rJ \sum_{i=1}^{I} a_i^2$ | The variability in the data caused by variations in Factor A. |
| SSB | $rI \sum_{j=1}^{J} (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2 = rI \sum_{j=1}^{J} b_j^2$ | The variability in the data caused by variations in Factor B. |
| SSAB | $r \sum_{i=1}^{I}\sum_{j=1}^{J} (\bar{Y}_{ij} - \bar{Y}_{\cdot\cdot} - a_i - b_j)^2 = r \sum_{i=1}^{I}\sum_{j=1}^{J} ab_{ij}^2$ | The variability in the data caused by the interaction between the variability in Factors A and B. |
| SSEfull | $\sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{r} (Y_{ijk} - \bar{Y}_{ij})^2$ [for the full model; this is used to test if the interaction is significant] | The variability in the data caused by experimental error (i.e., something other than Factors A and B and their interaction). Noise (as opposed to signal). The variability that would be expected for two observations given the same treatment. Within-group variability. ("Within" means within the same treatment group.) |
| SSEadd | SSEfull + SSAB [for the additive model, in which there is no interaction] | |
| SST | $\sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{r} (Y_{ijk} - \bar{Y}_{\cdot\cdot})^2$ | Total variability in the data. |

Step 3: Make an ANOVA Table

Start with the Full (or Saturated) Model Table.

Notes about the p-values: You find the p-value for each row using Table A.9 in the book, a calculator, or other statistical software. For a TI-83 or TI-89 calculator, use the FCDF function:

FCDF(lower, upper, numerator df, denominator df)

where lower is the value of the F Statistic, upper is a really large number (use EE99, i.e., 10^99), and the df come from the rows used to calculate the F statistic. On a TI-83, type: 2nd > VARS > 9 and enter the appropriate inputs.

This first table is me communicating to you what the formulas are and how you get the numbers for the table.

| Source | Degrees of freedom (df) | Sum of Squares (SS) | Mean Square (MS) | F Statistic (F) | p-value |
| --- | --- | --- | --- | --- | --- |
| A | I-1 | SSA | SSA/(I-1) | MSA/MSE | see notes above |
| B | J-1 | SSB | SSB/(J-1) | MSB/MSE | see notes above |
| A*B | (I-1)(J-1) | SSAB | SSAB/((I-1)(J-1)) | MSAB/MSE | see notes above |
| Error | IJ(r-1) | SSEfull | SSE/(IJ(r-1)) | --- | --- |
| Total | IJr-1 | SST | --- | --- | --- |

[Note: dferror = dftotal - dfA - dfB - dfAB]

Example for the Wood-Joint data:

| Source | Degrees of freedom (df) | Sum of Squares (SS) | Mean Square (MS) | F Statistic (F) | p-value [taken from Table A.9] |
| --- | --- | --- | --- | --- | --- |
| Joint | 2 | 2126875.3 | 1063437.65 | 47.47 | <0.001 |
| Wood | 2 | 1686394.2 | 843197.1 | 37.64 | <0.001 |
| J*W | 4 | 507080.3 | 126770.1 | 5.66 | Between 0.01 and 0.05 |
| Error | 9 | 201622.5 | 22402.5 | | |
| Total | 17 | 4521605 | | | |

Sums of Squares Calculations:

$SSA = (2)(3)(459.8^2 + 366.6^2 + 93.1^2)$
$SSB = (2)(3)(64.1^2 + 402.7^2 + 338.8^2)$
$SSAB = (2)(177.3^2 + 155.5^2 + \dots + 130.6^2)$
$SSE = (1518 - 1722.5)^2 + (1927 - 1722.5)^2 + (1348 - 1277.5)^2 + \dots + (1493 - 1491)^2$
SST is calculated in Step 1.

[Notes:
In practice, we would stop the analysis at this point since the interaction is significant, so it and each of the main effects must remain in the model.
When I say "model" I am referring to the "model equation": a symbolic way to represent the relationship between the factors and the response.
For example:
o When there is no relationship between the response Y and Factors A and B, then the model would show that the value of Y for each individual is made up of the overall mean $\mu$ and some deviation (or error, denoted e) from that mean:
  $Y_{ijk} = \mu + e_{ijk}$
o When there is a relationship between the response and Factor A but not Factor B, then the model would show that Y for each individual is made up of the overall mean, the effect of Factor A, and some error:
  $Y_{ijk} = \mu + \alpha_i + e_{ijk}$
o Similarly, when there is a relationship between the response and Factor B but not Factor A, the model would be:
  $Y_{ijk} = \mu + \beta_j + e_{ijk}$
o The model becomes more complex (includes more terms) as the relationship becomes more complex.
  - It can include terms for both the main effects of Factors A and B.
  - It can include a term for the interaction ($\alpha\beta_{ij}$; the book uses the notation $\gamma_{ij}$).
o In each model, the error term $e_{ijk}$ reflects the fact that the values of Y for each individual are not the same (there is variation in the data due to unexplained causes). These errors often follow a normal distribution. On average, the errors would be zero. The amount of variation in the errors reflects the amount of variation in the data. So we say that $e_{ijk} \sim N(0, \sigma^2)$.

For the Wood-Joint example, the final model would be:
$Y_{ijk} = \mu + \alpha_i + \beta_j + \alpha\beta_{ij} + e_{ijk}$, where $e_{ijk} \sim N(0, \sigma^2)$.
However, next we will explore the additive model for this data, so you can see what that would be like.]

Now for the Additive Model Table.

This first table is me communicating to you what the formulas are, how you get the numbers for the table, and what the numbers mean.
| Source | Degrees of freedom (df) | Sum of Squares (SS) | Mean Square (MS) | F Statistic (F) | p-value [see the notes about p-values in Step 3] |
| --- | --- | --- | --- | --- | --- |
| A | I-1 | SSA | SSA/(I-1) | MSA/MSE | |
| B | J-1 | SSB | SSB/(J-1) | MSB/MSE | |
| Error | * | SSEadd | SSE/dferror | --- | --- |
| Total | IJr-1 | SST | --- | --- | --- |

* dferror = dftotal - dfA - dfB

Example for the Wood-Joint data:

| Source | Degrees of freedom (df) | Sum of Squares (SS) | Mean Square (MS) | F Statistic (F) | p-value |
| --- | --- | --- | --- | --- | --- |
| Joint | 2 | 2126875.3 | 1063437.65 | 19.51 | <0.001 |
| Wood | 2 | 1686394.2 | 843197.1 | 15.47 | <0.001 |
| Error | 13 | 708702.8 | 54515.6 | | |
| Total | 17 | 4521605 | | | |

[Notes:
The df, SS, and MS entries for the two factors and the Total row are the same as for the full model.
The F Statistics for Factors A and B change since MSE changed.
The 2nd handout from class shows an example where the interaction is not significant, so it would be appropriate to calculate the additive model. We will go over this example more on Tuesday.]
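The arithmetic behind both tables can be checked with a short script. The following is only a sketch in Python, not part of the class materials: it takes the sums of squares reported above as given, assumes I = 3, J = 3, and r = 2 (inferred from the degrees of freedom in the tables), and uses made-up names such as `anova_rows` and `show`. It computes the MS and F columns only; p-values would still come from Table A.9, a calculator, or software.

```python
# Sketch: rebuild the MS and F columns of the full and additive ANOVA tables
# from the sums of squares reported in the handout.
# All function and variable names here are illustrative assumptions.

def anova_rows(effects, ss_error, df_error):
    """Return {source: (df, SS, MS, F)}; each effect is tested against MSE."""
    mse = ss_error / df_error
    rows = {}
    for name, (df, ss) in effects.items():
        ms = ss / df
        rows[name] = (df, ss, ms, ms / mse)
    rows["Error"] = (df_error, ss_error, mse, None)  # no F for the error row
    return rows

def show(title, rows):
    """Print one table, one source per line."""
    print(title)
    for name, (df, ss, ms, f) in rows.items():
        f_text = "" if f is None else f"{f:.2f}"
        print(f"  {name:<6} df={df:<3} SS={ss:<12.1f} MS={ms:<12.2f} F={f_text}")

# Assumed design sizes, inferred from df = 2, 2, 4, 9 in the full table.
I, J, r = 3, 3, 2

# Sums of squares copied from the Wood-Joint example.
ss = {"Joint": 2126875.3, "Wood": 1686394.2, "J*W": 507080.3}
sse_full = 201622.5

df = {"Joint": I - 1, "Wood": J - 1, "J*W": (I - 1) * (J - 1)}
df_error_full = I * J * (r - 1)

# Full model: both main effects and the interaction are tested against SSEfull.
full = anova_rows({k: (df[k], ss[k]) for k in ss}, sse_full, df_error_full)

# Additive model: the interaction SS and df are folded into the error row,
# i.e. SSEadd = SSEfull + SSAB and dferror = dftotal - dfA - dfB.
sse_add = sse_full + ss["J*W"]
df_error_add = df_error_full + df["J*W"]
additive = anova_rows(
    {k: (df[k], ss[k]) for k in ("Joint", "Wood")}, sse_add, df_error_add
)

show("Full model", full)
show("Additive model", additive)
```

Running it prints one line per source for each model. The factor rows keep the same df, SS, and MS in both tables, while the F statistics shrink in the additive table because the larger SSEadd is spread over 13 error degrees of freedom instead of 9.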