February 3

ST 516
Experimental Statistics for Engineers II
Model Adequacy
Usual residual plots:
Residuals versus predicted (fitted) values;
Probability plot (q-q plot) of residuals;
Residuals by treatment level.
And for blocks: Residuals by block.
R command for the usual plots
plot(aov(Yield ~ factor(Batch) + factor(Pressure), graftLong))
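The default plot() method does not draw the residuals-by-treatment and residuals-by-block plots listed above, so they have to be drawn separately. The following is a minimal sketch on simulated data; the data frame graftSim, its pressure levels, and all of its values are hypothetical stand-ins, not the course's graft data.

```r
# Simulated RCBD (hypothetical values): 6 batches (blocks) x 4 pressures.
set.seed(516)
graftSim <- expand.grid(Batch = 1:6, Pressure = c(8500, 8700, 8900, 9100))
graftSim$Yield <- 90 + rnorm(6)[graftSim$Batch] -
  0.005 * (graftSim$Pressure - 8500) + rnorm(nrow(graftSim))

fit <- aov(Yield ~ factor(Batch) + factor(Pressure), graftSim)

# The four default diagnostic plots on one page.
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)

# Residuals by treatment level and by block, drawn by hand.
graftSim$Resid <- residuals(fit)
op <- par(mfrow = c(1, 2))
stripchart(Resid ~ Pressure, data = graftSim, vertical = TRUE,
           xlab = "Pressure", ylab = "Residuals")
stripchart(Resid ~ Batch, data = graftSim, vertical = TRUE,
           xlab = "Batch", ylab = "Residuals")
par(op)
```

In a balanced additive fit the residuals sum to zero within every treatment level and every block, so these plots are read for differences in spread, not in location.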
[Figure: "Residuals vs Fitted". Residuals against fitted values for aov(Yield ~ factor(Batch) + factor(Pressure)); observations 1.2, 1.3, and 4.2 are labeled.]
[Figure: "Normal Q-Q". Standardized residuals against theoretical quantiles for aov(Yield ~ factor(Batch) + factor(Pressure)); observations 1.2, 1.3, and 4.2 are labeled.]
[Figure: "Scale-Location". Square root of absolute standardized residuals against fitted values for aov(Yield ~ factor(Batch) + factor(Pressure)); observations 1.2, 1.3, and 4.2 are labeled.]
[Figure: "Constant Leverage: Residuals vs Factor Levels". Standardized residuals against factor level combinations, grouped by factor(Batch); observations 1.2, 1.3, and 4.2 are labeled.]
General Regression Approach
Any design may be analyzed based on the statistical model; e.g. RCBD:
$y_{i,j} = \mu + \tau_i + \beta_j + \epsilon_{i,j}$, $i = 1, 2, \ldots, a$; $j = 1, 2, \ldots, b$.
Parameter estimates: use least squares:
for balanced design, gives the usual estimates;
for unbalanced design, e.g. with missing values, gives optimal
estimates.
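As an illustration of the balanced case, the sketch below fits a small simulated RCBD by least squares with sum-to-zero contrasts and compares the fitted treatment effects with the usual estimates (treatment mean minus grand mean). The data frame d and its values are hypothetical.

```r
# Simulated balanced RCBD (hypothetical values): 5 blocks x 3 treatments.
set.seed(1)
d <- expand.grid(block = factor(1:5), trt = factor(1:3))
d$y <- 10 + as.integer(d$trt) + rnorm(5)[as.integer(d$block)] +
  rnorm(nrow(d))

# Least squares with sum-to-zero constraints on both sets of effects.
fit <- lm(y ~ trt + block, data = d,
          contrasts = list(trt = "contr.sum", block = "contr.sum"))

tau_ls    <- coef(fit)[c("trt1", "trt2")]            # least-squares estimates
tau_usual <- tapply(d$y, d$trt, mean) - mean(d$y)    # treatment mean - grand mean

cbind(least_squares = as.numeric(tau_ls),
      usual = as.numeric(tau_usual[1:2]))
```

The third treatment effect is not printed because the constraint makes it minus the sum of the first two.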
General Regression Significance Test
For any fitted model, write
$\sum y^2 = R(\text{parameters}) + SS_E$.
E.g. RCBD:
$\sum_{i=1}^{a} \sum_{j=1}^{b} y_{i,j}^2 = R(\mu, \tau, \beta) + SS_E$.
$R(\mu, \tau, \beta)$ is the "sum of squares explained by $\mu$, $\tau_1, \ldots, \tau_a$, and $\beta_1, \ldots, \beta_b$".
A hypothesis such as H0 : τ1 = τ2 = · · · = τa = 0 specifies a reduced
model with those parameters omitted (but still containing
β1 , . . . , βb ).
The “sum of squares associated with those parameters” is the
difference between the explained sums of squares.
E.g. R(τ |µ, β) = R(µ, τ, β) − R(µ, β) is “the sum of squares
associated with τ1 , . . . , τa ”.
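The difference of explained sums of squares can be computed directly by fitting the full and reduced models. This is a sketch on simulated data (the data frame d and its values are hypothetical); it also shows that anova() on the two nested fits reports the same quantity.

```r
# Simulated RCBD (hypothetical values): 5 blocks x 3 treatments.
set.seed(2)
d <- expand.grid(block = factor(1:5), trt = factor(1:3))
d$y <- 10 + as.integer(d$trt) + rnorm(5)[as.integer(d$block)] +
  rnorm(nrow(d))

full    <- lm(y ~ block + trt, data = d)   # mu, beta, tau
reduced <- lm(y ~ block, data = d)         # mu, beta only (H0: all tau = 0)

# R(.) = uncorrected total sum of squares minus SSE of the fitted model.
R_full    <- sum(d$y^2) - deviance(full)
R_reduced <- sum(d$y^2) - deviance(reduced)
R_tau     <- R_full - R_reduced            # R(tau | mu, beta)

# The same quantity is the "Sum of Sq" entry when the two fits are compared.
cmp <- anova(reduced, full)
c(by_difference = R_tau, from_anova = cmp[["Sum of Sq"]][2])
```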
In a balanced design:
R(τ | µ, β) is exactly SS_Treatments;
R(β | µ, τ) is exactly SS_Blocks.
With unbalanced data, e.g. a balanced design with missing data,
R(τ |µ, β) is the correct numerator for an F -statistic to test the
hypothesis H0 : τ1 = τ2 = · · · = τa = 0.
So in either case (balanced or unbalanced), R(τ |µ, β) is the correct
numerator.
Missing Observation
Suppose the yield for 8700 psi is missing in batch 4:
R commands
graftCopy <- graftLong
graftCopy["2.4", "Yield"] <- NA
summary(aov(Yield ~ factor(Batch) + factor(Pressure), graftCopy))
Output
                 Df Sum Sq Mean Sq F value   Pr(>F)
factor(Batch)     5 190.12  38.024  5.2346 0.006448 **
factor(Pressure)  3 163.40  54.466  7.4981 0.003130 **
Residuals        14 101.70   7.264
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
1 observation deleted due to missingness
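The order matters! With a missing value the design is unbalanced and aov() computes sequential sums of squares, so Blocks must be entered before Treatments for the factor(Pressure) line to be R(τ | µ, β).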
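The effect of the ordering can be seen by fitting both orders to an unbalanced data set. The sketch below uses simulated data with one value set to NA; the data frame d and its values are hypothetical.

```r
# Simulated RCBD (hypothetical values): 6 blocks x 4 treatments.
set.seed(3)
d <- expand.grid(block = factor(1:6), trt = factor(1:4))
d$y <- 90 + as.integer(d$trt) + rnorm(6)[as.integer(d$block)] +
  rnorm(nrow(d))
d$y[10] <- NA                      # one missing observation

blocks_first <- summary(aov(y ~ block + trt, d))[[1]]
trt_first    <- summary(aov(y ~ trt + block, d))[[1]]

# Treatment sum of squares under each ordering.
ss_adj   <- blocks_first[2, "Sum Sq"]   # R(tau | mu, beta): adjusted for blocks
ss_unadj <- trt_first[1, "Sum Sq"]      # R(tau | mu): blocks ignored
c(adjusted = ss_adj, unadjusted = ss_unadj)
```

The residual line is the same under both orderings; only the split between the two factors changes.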
Montgomery describes an approximate method for handling a missing
data point, essentially by replacing it by its predicted (“imputed”)
value.
To use the imputation method, we could use the predict() method
to impute the missing value.
R commands
p <- predict(aov(Yield ~ factor(Batch) + factor(Pressure), graftCopy),
             newdata = graftCopy["2.4", ])
#   2.4
# 91.08
graftCopy["2.4", "Yield"] <- p
summary(aov(Yield ~ factor(Batch) + factor(Pressure), graftCopy))
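For a single missing cell in an RCBD, the least-squares prediction agrees with the standard closed-form missing-value estimate $x = (a\,y'_{i.} + b\,y'_{.j} - y'_{..}) / ((a-1)(b-1))$, where the primed totals are taken over the remaining observations. The sketch below checks this on simulated data; the data frame d, the choice of missing row, and the variable names are hypothetical.

```r
# Simulated RCBD (hypothetical values): b = 6 blocks, a = 4 treatments.
set.seed(4)
a <- 4; b <- 6
d <- expand.grid(block = factor(1:b), trt = factor(1:a))
d$y <- 90 + as.integer(d$trt) + rnorm(b)[as.integer(d$block)] +
  rnorm(nrow(d))
miss <- 10                          # row treated as missing (trt 2, block 4)
d$y[miss] <- NA

# Least-squares prediction for the missing cell.
fit <- aov(y ~ block + trt, d)
p <- predict(fit, newdata = d[miss, ])

# Closed-form value from totals of the remaining observations.
trt_total   <- sum(d$y[d$trt == d$trt[miss]], na.rm = TRUE)
block_total <- sum(d$y[d$block == d$block[miss]], na.rm = TRUE)
grand_total <- sum(d$y, na.rm = TRUE)
x <- (a * trt_total + b * block_total - grand_total) / ((a - 1) * (b - 1))

c(predicted = as.numeric(p), closed_form = x)
```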
Output
                 Df Sum Sq Mean Sq F value  Pr(>F)
factor(Batch)     5  189.5   37.90   5.591 0.00419 **
factor(Pressure)  3  166.1   55.38   8.169 0.00185 **
Residuals        15  101.7    6.78
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
But note:
The imputed value is not an independent observation, so the degrees of freedom for Residuals should be 14, not 15, and the table must be adjusted.
By hand:
                 Df Sum Sq Mean Sq F value  Pr(>F)
factor(Batch)     5  189.5   37.90   5.218 0.00653
factor(Pressure)  3  166.1   55.38   7.624 0.00292
Residuals        14  101.7   7.264
Adjusting residual df using R:
a <- aov(Yield ~ factor(Batch) + factor(Pressure), graftCopy)
a$df.residual <- a$df.residual - 1
summary(a)
Output
                 Df Sum Sq Mean Sq F value  Pr(>F)
factor(Batch)     5  189.5   37.90   5.218 0.00653 **
factor(Pressure)  3  166.1   55.38   7.624 0.00292 **
Residuals        14  101.7    7.26
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Compare with the correct table from the fit that simply omits the missing observation: close, but still only an approximation. Moral: do it the correct way!
Latin Squares
The RCBD allows you to remove variability due to one controllable
nuisance factor.
More complex designs are needed for more than one controllable
nuisance factor:
Latin Square for two nuisance factors;
Graeco-Latin Square for three nuisance factors.
A Latin Square design has one treatment factor and two nuisance
factors.
Same number p of levels of each factor.
All p² combinations of levels of the nuisance factors are run.
Treatment assignments are balanced across both nuisance factors.
Examples (row = first nuisance factor, column = second nuisance
factor, letter = level of treatment):
4 × 4:
A B D C
B C A D
C D B A
D A C B

5 × 5:
A D B E C
D A C B E
C B E D A
B E A C D
E C D A B
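The defining property of these squares, each treatment letter exactly once in every row and every column, is easy to check in code. The sketch below builds a cyclic square and tests the property; the helper names latin_cyclic() and is_latin() are hypothetical, and the cyclic construction is only one of many ways to obtain a Latin square.

```r
# Cyclic p x p Latin square: entry (i, j) is letter ((i + j - 2) mod p) + 1.
latin_cyclic <- function(p) {
  idx <- outer(seq_len(p), seq_len(p), function(i, j) (i + j - 2) %% p + 1)
  matrix(LETTERS[idx], nrow = p)
}

# TRUE if the square uses p symbols, each exactly once per row and per column.
is_latin <- function(m) {
  p <- nrow(m)
  once <- function(v) length(unique(v)) == p
  ncol(m) == p &&
    length(unique(as.vector(m))) == p &&
    all(apply(m, 1, once)) &&
    all(apply(m, 2, once))
}

sq <- latin_cyclic(5)
print(sq)
is_latin(sq)
```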
E.g. Rocket Propellant (rocket-propellant.txt):
Batch  Operator  Formulation  BurningRate
1      1         a            24
1      2         b            20
1      3         c            19
1      4         d            24
1      5         e            24
2      1         b            17
...
R commands
rocketPropellant <- read.table("data/rocket-propellant.txt",
header = TRUE)
summary(aov(BurningRate ~ factor(Batch) + factor(Operator) +
Formulation, rocketPropellant))
Output
                 Df Sum Sq Mean Sq F value   Pr(>F)
factor(Batch)     4     68  17.000  1.5937 0.239059
factor(Operator)  4    150  37.500  3.5156 0.040373 *
Formulation       4    330  82.500  7.7344 0.002537 **
Residuals        12    128  10.667
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Model equation:
$y_{i,j,k} = \mu + \alpha_i + \tau_j + \beta_k + \epsilon_{i,j,k}$,
with $1 \le i \le p$, $1 \le j \le p$, $1 \le k \le p$.
Here:
$\alpha_i$ = $i$th row effect,
$\beta_k$ = $k$th column effect,
$\tau_j$ = $j$th treatment effect.
But we have only one observation in the (i, k) cell of the table; the
corresponding value of j can be read from the design.
Random Effects
If Batch is a random effect with variance $\sigma^2_{\text{Batches}}$, then the expected mean square is
$E(MS_{\text{Batches}}) = \sigma^2 + p\,\sigma^2_{\text{Batches}}$.
So we estimate $\sigma^2_{\text{Batches}}$ by
$\hat{\sigma}^2_{\text{Batches}} = \dfrac{MS_{\text{Batches}} - MS_{\text{Residuals}}}{p} = \dfrac{17.000 - 10.667}{5} = 1.267$.
R note: using reshape()
The data file rocket-propellant.txt has a different layout from
Table 4.8 in the text.
table-04-8.txt has essentially the same layout as Table 4.8:
Batch F1 R1 F2 R2 F3 R3 F4 R4 F5 R5
1 A 24 B 20 C 19 D 24 E 24
2 B 17 C 24 D 30 E 27 A 36
3 C 18 D 38 E 26 A 27 B 21
4 D 26 E 31 A 26 B 23 C 22
5 E 22 A 30 B 20 C 29 D 31
Note that we need to reshape this with all the Fn columns stacked
into a single Formulation column, and with all the Rn columns
stacked into a single BurningRate column.
Read it and reshape it like this:
a <- read.table("data/table-04-8.txt", header = TRUE)
rocketPropellant <- reshape(a,
    varying = list(c("F1", "F2", "F3", "F4", "F5"),
                   c("R1", "R2", "R3", "R4", "R5")),
    v.names = c("Formulation", "BurningRate"),
    timevar = "Operator", direction = "long")
Note that varying is now a list of two vectors of variable names, and v.names is a vector of names for the stacked columns.
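A self-contained version of the same steps can be run without the data file by reading the table shown above from an inline string. This is a sketch: the object names txt, wide, long, and tab are hypothetical, and the values are copied from the table-04-8.txt listing above.

```r
# Wide layout, copied from the listing of table-04-8.txt above.
txt <- "Batch F1 R1 F2 R2 F3 R3 F4 R4 F5 R5
1 A 24 B 20 C 19 D 24 E 24
2 B 17 C 24 D 30 E 27 A 36
3 C 18 D 38 E 26 A 27 B 21
4 D 26 E 31 A 26 B 23 C 22
5 E 22 A 30 B 20 C 29 D 31"
wide <- read.table(text = txt, header = TRUE)

# Stack F1..F5 into Formulation and R1..R5 into BurningRate;
# the position within each group (1..5) becomes Operator.
long <- reshape(wide,
                varying = list(paste0("F", 1:5), paste0("R", 1:5)),
                v.names = c("Formulation", "BurningRate"),
                timevar = "Operator", direction = "long")
head(long)

# Latin square analysis on the reshaped data.
tab <- summary(aov(BurningRate ~ factor(Batch) + factor(Operator) +
                     Formulation, long))[[1]]
tab[["Sum Sq"]]   # 68 150 330 128, as in the Latin square output above
```

reshape() also adds an id column and row names of the form id.time; neither is used in the analysis.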
Two nuisance factors or one?
A different design for the rocket propellant experiment, also with 25
runs, is:
Batch  Operator  Formulation
1      1         a
1      1         b
1      1         c
1      1         d
1      1         e
2      2         a
2      2         b
...
This is a complete blocked design.
In each block, a single operator tests every formulation using a single
batch of raw material.
If we observe significant block effects, we cannot distinguish operator
effects from batch effects.
But we get 16 degrees of freedom for Residuals, instead of 12, and
hence a more powerful test for Formulations.
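The residual degrees of freedom and their effect on the test can be sketched numerically. The comparison of 5% critical values below assumes both designs would have a similar residual variance, which the data cannot confirm; the variable names are hypothetical.

```r
p <- 5
n <- p^2                                   # 25 runs in either design

# Residual df = total df minus df for each fitted factor.
df_latin <- (n - 1) - 3 * (p - 1)          # rows, columns, treatments
df_rcbd  <- (n - 1) - 2 * (p - 1)          # blocks, treatments

# 5% critical values of the F test for Formulations (p - 1 numerator df).
crit_latin <- qf(0.95, p - 1, df_latin)
crit_rcbd  <- qf(0.95, p - 1, df_rcbd)

c(df_latin = df_latin, df_rcbd = df_rcbd)
c(crit_latin = crit_latin, crit_rcbd = crit_rcbd)
```

The smaller critical value with 16 residual degrees of freedom is what makes the blocked design's test more sensitive for the same effect size.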
So we do not need to use the Latin Square design just because we
have two controllable nuisance factors.
If the primary focus is on Formulations, the RCBD is better.
But the RCBD only allows us to estimate
$\sigma^2_\beta = \sigma^2_{\text{Operators}} + \sigma^2_{\text{Batches}}$.
The Latin Square allows us to estimate $\sigma^2_{\text{Operators}}$ and $\sigma^2_{\text{Batches}}$ separately.
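As a sketch of the separate estimates, the mean squares from the Latin square table for the rocket propellant data can be plugged into the expected-mean-square formula. Treating Operator as random with the same form of expected mean square as Batch is an assumption made here for illustration; the vector names are hypothetical.

```r
p <- 5

# Mean squares copied from the Latin square ANOVA output for these data.
ms <- c(Batch = 17.000, Operator = 37.500, Residuals = 10.667)

# Method-of-moments estimates: (MS_factor - MS_Residuals) / p.
sigma2_hat <- (ms[c("Batch", "Operator")] - ms["Residuals"]) / p
round(sigma2_hat, 3)    # Batch 1.267, Operator 5.367

# The sum corresponds to the single block variance an RCBD could estimate.
sum(sigma2_hat)
```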