Filled in notes

ANOVA
Analysis of Variance (In this class, we will only do this for balanced data.)
[This means that there are the same number of observations in each group; i.e.,
the total number of observations = N = IJr]
Why to do ANOVA:
To answer the questions:
1. Is the interaction between Factor A and Factor B statistically significant?
If yes: Keep αβij, αi, and βj in the model; stop here.
If no: Drop αβij and move on to questions 2 and 3.
2. Is the main effect of Factor A significant?
If yes: Keep αi in the model. If no: drop it.
3. Is the main effect of Factor B significant?
If yes: Keep βj in the model. If no: drop it.
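The three questions above amount to a small decision procedure. The sketch below is one way to write it down; the function name `terms_to_keep`, the term labels, and the 0.05 significance level are assumptions made for illustration, not something fixed by these notes.

```python
def terms_to_keep(p_interaction, p_a, p_b, alpha=0.05):
    """Return the model terms retained by the three-question procedure.

    alpha is an assumed significance level. p_a and p_b are only consulted
    when the interaction is dropped; in that case they should be the
    p-values from the additive-model table.
    """
    if p_interaction < alpha:
        # Question 1 answered "yes": keep everything and stop.
        return ["alpha_i", "beta_j", "alphabeta_ij"]
    kept = []
    if p_a < alpha:  # Question 2
        kept.append("alpha_i")
    if p_b < alpha:  # Question 3
        kept.append("beta_j")
    return kept


# Hypothetical p-values: a significant interaction keeps all three terms.
print(terms_to_keep(p_interaction=0.02, p_a=0.40, p_b=0.30))
```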
How to do ANOVA:
Step 1: Calculate the Total Sum of Squares (SST).
\[
SST = \sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{r} (Y_{ijk} - \bar{Y}_{\cdot\cdot\cdot})^2
    = \sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{r} (Y_{ijk} - \bar{Y}_{\cdot\cdot})^2
    \quad \text{since we have balanced data}
\]
For the wood-joint example:
\[
SST = (1518 - 1375.9)^2 + (1927 - 1375.9)^2 + (1348 - 1375.9)^2 + \dots + (1493 - 1375.9)^2
    = 4{,}521{,}604.98
\]
Step 2: Decompose the Total Sum of Squares into components due to each factor,
the interaction between the factors, and experimental error:
SST = SSA + SSB + SSAB + SSE
Component of Variability, Formula, and Interpretation:

SSA
  Formula: \( SSA = rJ \sum_{i=1}^{I} (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2 = rJ \sum_{i=1}^{I} a_i^2 \)
  Interpretation: The variability in the data caused by variations in Factor A.

SSB
  Formula: \( SSB = rI \sum_{j=1}^{J} (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2 = rI \sum_{j=1}^{J} b_j^2 \)
  Interpretation: The variability in the data caused by variations in Factor B.

SSAB
  Formula: \( SSAB = r \sum_{i=1}^{I} \sum_{j=1}^{J} (\bar{Y}_{ij} - \bar{Y}_{\cdot\cdot} - a_i - b_j)^2 = r \sum_{i=1}^{I} \sum_{j=1}^{J} ab_{ij}^2 \)
  Interpretation: The variability in the data caused by the interaction between Factors A and B.

SSEfull [for the full model; this is used to test if the interaction is significant]
  Formula: \( SSE_{full} = \sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{r} (Y_{ijk} - \bar{Y}_{ij})^2 \)
  Interpretation: The variability in the data caused by experimental error (i.e., something
  other than Factors A and B and their interaction). Noise (as opposed to signal).

SSEadd [for the additive model, in which there is no interaction]
  Formula: SSEadd = SSEfull + SSAB
  Interpretation: The variability that would be expected for two observations given the same
  treatment. Within-group variability. (Means within the same treatment group.)

SST
  Formula: \( SST = \sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{r} (Y_{ijk} - \bar{Y}_{\cdot\cdot})^2 \)
  Interpretation: Total variability in the data.
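The decomposition in Step 2 can be checked numerically. The sketch below uses a small hypothetical balanced data set (not the wood-joint data) and plain Python lists; the variable names are illustrative. It computes the fitted effects a_i, b_j, and ab_ij from the cell means, then each sum of squares, and prints the pieces next to SST.

```python
# Hypothetical balanced data: y[i][j] is the list of r replicates
# for level i of Factor A and level j of Factor B.
y = [
    [[10, 12], [14, 16], [18, 20]],
    [[11, 13], [19, 21], [15, 17]],
]
I, J, r = len(y), len(y[0]), len(y[0][0])


def mean(values):
    values = list(values)
    return sum(values) / len(values)


cell = [[mean(y[i][j]) for j in range(J)] for i in range(I)]         # cell means
grand = mean(cell[i][j] for i in range(I) for j in range(J))         # overall mean
a = [mean(cell[i]) - grand for i in range(I)]                        # Factor A effects
b = [mean(cell[i][j] for i in range(I)) - grand for j in range(J)]   # Factor B effects
ab = [[cell[i][j] - grand - a[i] - b[j] for j in range(J)] for i in range(I)]

ssa = r * J * sum(ai ** 2 for ai in a)
ssb = r * I * sum(bj ** 2 for bj in b)
ssab = r * sum(ab[i][j] ** 2 for i in range(I) for j in range(J))
sse_full = sum((obs - cell[i][j]) ** 2
               for i in range(I) for j in range(J) for obs in y[i][j])
sse_add = sse_full + ssab  # error SS when the interaction is dropped
sst = sum((obs - grand) ** 2
          for i in range(I) for j in range(J) for obs in y[i][j])

print(ssa, ssb, ssab, sse_full, sst)
```

For these made-up numbers the four components add up to SST, as the identity SST = SSA + SSB + SSAB + SSE requires for balanced data.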
Step 3: Make an ANOVA Table
Start with Full or Saturated Model Table.
Notes about the p-values:
• You find the p-value for each row using Table A.9 in the book, a
  calculator, or other statistical software.
• For a TI-83 or TI-89 calculator, use the FCDF function:
  FCDF(lower, upper, numerator df, denominator df)
  where lower is the value of the F statistic, upper is a really large
  number (use EE99), and the df come from the rows used to calculate
  the F statistic.
• On a TI-83, type: 2nd > VARS > 9 and enter the appropriate inputs.
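For readers without a TI calculator, the same upper-tail area can be approximated in software. The sketch below uses only the Python standard library and evaluates the F upper tail through the regularized incomplete beta function, computed with a standard continued-fraction method. The function names are illustrative; if SciPy is installed, `scipy.stats.f.sf(F, df_num, df_den)` should return the same quantity with less code.

```python
import math


def _guard(value, tiny=1e-300):
    """Keep a denominator away from zero."""
    return tiny if abs(value) < tiny else value


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (Lentz's method)."""
    c = 1.0
    d = 1.0 / _guard(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even-numbered term of the continued fraction.
        term = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 / _guard(1.0 + term * d)
        c = _guard(1.0 + term / c)
        h *= d * c
        # Odd-numbered term of the continued fraction.
        term = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 / _guard(1.0 + term * d)
        c = _guard(1.0 + term / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_upper_tail(f_value, df_num, df_den):
    """P(F > f_value): the area FCDF(f_value, 1E99, df_num, df_den) reports."""
    x = df_den / (df_den + df_num * f_value)
    return regularized_incomplete_beta(df_den / 2.0, df_num / 2.0, x)


# Interaction row of the wood-joint example: F = 5.66 on 4 and 9 df.
print(f_upper_tail(5.66, 4, 9))
```

The printed value falls between 0.01 and 0.05, which agrees with the Table A.9 lookup reported for the interaction row.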
This first table is me communicating to you what the formulas are and/or how you
get the numbers for the table.
Source   Degrees of freedom (df)   Sum of Squares (SS)   Mean Square (MS)      F Statistic (F)   p-value
A        I-1                       SSA                   SSA/(I-1)             MSA/MSE
B        J-1                       SSB                   SSB/(J-1)             MSB/MSE
A*B      (I-1)(J-1)                SSAB                  SSAB/((I-1)(J-1))     MSAB/MSE
Error    IJ(r-1)                   SSEfull               SSE/(IJ(r-1))         -----             -----
Total    (IJr)-1                   SST                   ---
[Note: dferror = dftotal - dfA - dfB - dfAB]
Example for the Wood-Joint data:
Source   df   SS          MS           F       p-value [taken from Table A.9]
Joint    2    2126875.3   1063437.65   47.47   <0.001
Wood     2    1686394.2   843197.1     37.64   <0.001
J*W      4    507080.3    126770.1     5.66    Between 0.01 & 0.05
Error    9    201622.5    22402.5
Total    17   4521605
Sums of Squares Calculations:
SSA = (2)(3)(459.8² + 336.6² + 93.1²)
SSB = (2)(3)(64.1² + 402.7² + 338.8²)
SSAB = (2)(177.3² + 155.5² + … + 130.6²)
SSE = (1518 - 1722.5)² + (1927 - 1722.5)² + (1348 - 1277.5)² + … + (1493 - 1491)²
SST is calculated in Step 1.
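The MS and F columns of the full-model table follow mechanically from the df and SS columns. The sketch below reproduces them from the sums of squares reported for the wood-joint example. The values I = 3, J = 3, and r = 2 are inferred from the degrees of freedom and the (2)(3) multipliers above rather than stated outright, and the dictionary layout is just one way to organize the table.

```python
# Inferred design sizes: df of 2 for each factor gives I = J = 3,
# and the multiplier (2)(3) = rJ gives r = 2.
I, J, r = 3, 3, 2

# Sums of squares as reported in the full-model table.
ss = {"A": 2126875.3, "B": 1686394.2, "A*B": 507080.3, "Error": 201622.5}

df = {
    "A": I - 1,
    "B": J - 1,
    "A*B": (I - 1) * (J - 1),
    "Error": I * J * (r - 1),
}

ms = {source: ss[source] / df[source] for source in ss}  # mean squares
f_stat = {source: ms[source] / ms["Error"] for source in ("A", "B", "A*B")}

for source in ("A", "B", "A*B"):
    print(f"{source:5s} df={df[source]}  MS={ms[source]:.1f}  F={f_stat[source]:.2f}")
print(f"Error df={df['Error']}  MS={ms['Error']:.1f}")
```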
[Notes:
• In practice, we would stop the analysis at this point since the interaction is
  significant, so it and each of the main effects must remain in the model.
• When I say “model” I am referring to the “model equation”: a symbolic way to
  represent the relationship between the factors and the response. For example:
  o When there is no relationship between the response Y and Factors A and B,
    then the model would show that the value of Y for each individual is made up
    of the overall mean µ and some deviation (or error, denoted e) from that
    mean: Yijk = µ + eijk
  o When there is a relationship between the response and Factor A but not
    Factor B, then the model would show that Y for each individual is made up of
    the overall mean, the effect of Factor A, and some error: Yijk = µ + αi + eijk
  o Similarly, when there is a relationship between the response and Factor B
    but not Factor A, the model would be: Yijk = µ + βj + eijk
  o The model becomes more complex (includes more terms) as the relationship
    becomes more complex.
    ▪ It can include terms for both the main effects of Factors A and B.
    ▪ It can include a term for the interaction (αβij; the book uses the
      notation γij).
  o In each model, the error term eijk reflects the fact that the values of Y for each
    individual are not the same (there is variation in the data due to unexplained
    causes).
    ▪ These errors often follow a normal distribution.
    ▪ On average, the errors would be zero.
    ▪ The amount of variation in the errors reflects the amount of variation
      in the data.
    ▪ So we say that eijk ~ N(0, σ²)
• For the Wood-Joint example, the final model would be: Yijk = µ + αi + βj + αβij + eijk,
  where eijk ~ N(0, σ²).
• However, next we will explore the additive model for this data, so you
  can see what that would be like.]
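One way to get a feel for the model equation is to generate data from it. The sketch below simulates a balanced data set from Yijk = µ + αi + βj + αβij + eijk with normally distributed errors. Every parameter value (µ, the effects, σ, and the design sizes) is hypothetical and chosen only for illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

mu = 100.0                      # overall mean (hypothetical)
alpha = [-5.0, 5.0]             # Factor A effects, I = 2
beta = [-3.0, 0.0, 3.0]         # Factor B effects, J = 3
alphabeta = [                   # interaction effects, one per (i, j) cell
    [1.0, 0.0, -1.0],
    [-1.0, 0.0, 1.0],
]
sigma = 2.0                     # standard deviation of the errors
r = 2                           # replicates per cell (balanced)

# y[i][j][k] = mu + alpha_i + beta_j + alphabeta_ij + e_ijk, with e_ijk ~ N(0, sigma^2)
y = [
    [
        [mu + alpha[i] + beta[j] + alphabeta[i][j] + random.gauss(0.0, sigma)
         for _ in range(r)]
        for j in range(len(beta))
    ]
    for i in range(len(alpha))
]

for i, row in enumerate(y):
    for j, cell in enumerate(row):
        print(i, j, [round(obs, 1) for obs in cell])
```

Setting every entry of `alphabeta` to zero gives data from the additive model instead.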
Now for the Additive Model Table.
This first table is me communicating to you what the formulas are and/or how you
get the numbers for the table and/or what the numbers mean.
Source   Degrees of freedom (df)   Sum of Squares (SS)   Mean Square (MS)   F Statistic (F)   p-value
A        I-1                       SSA                   SSA/(I-1)          MSA/MSE
B        J-1                       SSB                   SSB/(J-1)          MSB/MSE
Error    *                         SSEadd                SSE/dferror        -----             -----
Total    (IJr)-1                   SST                   ---
* dferror = dftotal - dfA - dfB
[For the p-value column, see the notes about the p-values above.]
Example for the Wood-Joint data:
Source   df   SS          MS           F       p-value
Joint    2    2126875.3   1063437.65   19.51   <0.001
Wood     2    1686394.2   843197.1     15.47   <0.001
Error    13   708702.8    54515.6
Total    17   4521605
[Notes:
• The highlighted portions of the table (the df, SS, and MS for each factor, and the Total row)
  are the same as for the full model.
• The F Statistics for Factors A and B change since MSE changed.
• The 2nd handout from class shows an example where the interaction is not significant, so it would
  be appropriate to calculate the additive model. We will go over this example more on Tuesday.]
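The additive-model table can be derived from the full-model table by pooling the interaction into the error. The sketch below does this with the sums of squares reported for the wood-joint example; the variable names are illustrative.

```python
# Values reported in the full-model table.
ss_a, ss_b = 2126875.3, 1686394.2
ss_ab, sse_full = 507080.3, 201622.5
df_a, df_b, df_total = 2, 2, 17

# Dropping the interaction moves its sum of squares into the error term.
sse_add = sse_full + ss_ab
df_error = df_total - df_a - df_b

mse = sse_add / df_error
f_a = (ss_a / df_a) / mse
f_b = (ss_b / df_b) / mse

print(f"SSEadd={sse_add:.1f}  df_error={df_error}  MSE={mse:.1f}")
print(f"F for A={f_a:.2f}  F for B={f_b:.2f}")
```

The mean squares for the two factors are unchanged; only the larger MSE in the denominator makes the F statistics smaller than in the full model.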