April 23

ST 516
Experimental Statistics for Engineers II
Nonnormal Responses
We have usually assumed that experimental data are
at least approximately normally distributed,
with at least approximately constant variance.
When either assumption is violated, we can try
transforming the response to remove the violation, or
using another model for the response distribution.
Other Topics
Box-Cox approach
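The power transformations $y^* = y^\lambda$ are useful.
Box and Cox developed a systematic approach to finding a good $\lambda$,
based on
$$
y^{(\lambda)} =
\begin{cases}
\dfrac{y^\lambda - 1}{\lambda \dot{y}^{\lambda - 1}}, & \lambda \neq 0, \\[2ex]
\dot{y} \ln y, & \lambda = 0,
\end{cases}
$$
where
$$
\dot{y} = \exp\left( \frac{1}{n} \sum \ln y \right)
$$
is the geometric mean response.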
Procedure
Fit the model to the transformed response for a range of λ values, and graph SSE against λ.
The λ with the lowest SSE is the best choice.
All $\lambda$ with $SSE(\lambda) \le SS^*$ comprise a $100(1 - \alpha)\%$ confidence
interval, where
$$
SS^* = SSE(\lambda_{\text{opt}}) \left( 1 + \frac{t^2_{\alpha/2,\,\mathrm{df}_E}}{\mathrm{df}_E} \right).
$$
Example
Peak discharge data (peak-discharge.txt):
(peak-discharge-box-cox.R).
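The procedure can be sketched in R as follows. This is a minimal sketch on simulated data, not the contents of peak-discharge-box-cox.R; the data and the names `box_cox`, `sse`, `lambdas`, and `ss_star` are assumptions made for illustration.

```r
# Simulated data (an assumption for illustration; not the peak discharge data).
set.seed(1)
x <- seq(1, 10, length.out = 40)
y <- exp(0.5 + 0.3 * x + rnorm(40, sd = 0.2))  # log(y) is linear in x

# Scaled Box-Cox transformation, using the geometric mean of y.
box_cox <- function(y, lambda) {
  gm <- exp(mean(log(y)))
  if (abs(lambda) < 1e-8) {
    gm * log(y)
  } else {
    (y^lambda - 1) / (lambda * gm^(lambda - 1))
  }
}

# SSE of the linear model fitted to the transformed response.
sse <- function(lambda) sum(resid(lm(box_cox(y, lambda) ~ x))^2)

# Fit over a grid of lambda values and graph SSE against lambda.
lambdas <- seq(-2, 2, by = 0.05)
sse_values <- sapply(lambdas, sse)
plot(lambdas, sse_values, type = "l", xlab = "lambda", ylab = "SSE")

# Best lambda and the cutoff SS* for a 95% confidence interval.
lambda_opt <- lambdas[which.min(sse_values)]
df_e <- length(y) - 2                      # error df: n minus 2 coefficients
alpha <- 0.05
ss_star <- min(sse_values) * (1 + qt(1 - alpha / 2, df_e)^2 / df_e)
ci <- range(lambdas[sse_values <= ss_star])
abline(h = ss_star, lty = 2)               # lambdas below this line are in the interval
```

Dividing by the power of the geometric mean keeps the transformed responses on a comparable scale, which is what makes the SSE values for different λ directly comparable.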
Generalized Linear Model
Sometimes a better approach is to use a different statistical model.
E.g., for counted data, assume that Y has the Poisson distribution.
Replace the linear model
$$
E(Y) = \mu = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k = x'\beta
$$
by
$$
g(\mu) = x'\beta \iff E(Y) = \mu = g^{-1}(x'\beta)
$$
for some nonlinear link function $g(\cdot)$.
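As an illustration of the link relationship, the following sketch fits a Poisson model with a log link to simulated counts and checks that the fitted mean equals the inverse link applied to the linear predictor. The data and variable names are assumptions made for this example.

```r
# Simulated counts (an assumption for illustration).
set.seed(2)
x1 <- runif(200)
y <- rpois(200, lambda = exp(0.5 + 1.2 * x1))  # true model: log(mu) = 0.5 + 1.2 * x1

# Poisson GLM with the log link g(mu) = log(mu).
fit <- glm(y ~ x1, family = poisson(link = "log"))

# Linear predictor x'beta and fitted mean mu = g^{-1}(x'beta) = exp(x'beta).
eta <- predict(fit, type = "link")
mu <- predict(fit, type = "response")
print(all.equal(unname(mu), unname(exp(eta))))
print(coef(fit))
```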
If the distribution is in the exponential family and the link function is
chosen to match it, estimation by maximum likelihood is relatively
easy.
In general, the variance of Y also depends on µ; examples from the
exponential family:
Distribution            $g(\mu)$                  $V(\mu)$
Normal, $\sigma^2 = 1$  $\mu$                     $1$
Poisson                 $\log \mu$                $\mu$
Gamma                   $1/\mu$                   $\mu^2$
Inverse Gaussian        $1/\mu^2$                 $\mu^3$
Binomial                $\log \frac{\mu}{1-\mu}$  $\mu(1 - \mu)$
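These link and variance functions can be inspected in R, since each family object stores its default link name and a variance function. The sketch below evaluates them at an arbitrary mean of 0.5, chosen only for illustration.

```r
# Family objects for the five distributions, each with its default link.
families <- list(gaussian(), poisson(), Gamma(), inverse.gaussian(), binomial())

# Name of the default link function g for each family.
links <- sapply(families, function(f) f$link)

# Variance function V(mu) evaluated at mu = 0.5 (an arbitrary illustrative value).
variances <- sapply(families, function(f) f$variance(0.5))

print(data.frame(family = sapply(families, function(f) f$family),
                 link = links,
                 variance = variances))
```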
Other combinations of distribution, $g(\cdot)$, and $V(\cdot)$ may also be used,
but are not supported by standard software.
The binomial case is widely used:
$$
P(Y = 1) = \frac{e^{x'\beta}}{1 + e^{x'\beta}} = \frac{1}{1 + e^{-x'\beta}}.
$$
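The two forms of the logistic expression can be checked numerically. In this short sketch `eta` stands for the linear predictor, and the grid of values is an arbitrary assumption; `plogis()` and `qlogis()` are base R's inverse logit and logit functions.

```r
# Grid of linear predictor values (arbitrary, for illustration).
eta <- seq(-5, 5, by = 0.5)

p1 <- exp(eta) / (1 + exp(eta))   # first form
p2 <- 1 / (1 + exp(-eta))         # second form

print(all.equal(p1, p2))           # the two forms agree
print(all.equal(p1, plogis(eta)))  # plogis() is the inverse logit
print(all.equal(qlogis(p1), eta))  # qlogis() is the logit link, recovering eta
```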
Example
Coupon redemption:
Y is the number of customers out of 1000 who redeem the
coupon;
three factors were used in a $2^3$ factorial design.
R commands
Generalized linear models are fitted using glm():
summary(glm(cbind(Redeemed, Customers - Redeemed) ~ A * B + A * C + B * C,
            data = coupon, family = "binomial"))
Output
Call:
glm(formula = cbind(Redeemed, Customers - Redeemed) ~ A * B +
A * C + B * C, family = "binomial", data = coupon)
Deviance Residuals:
      1        2        3        4        5        6        7        8
 0.4723  -0.4307  -0.4228   0.3949  -0.4572   0.4166   0.4238  -0.3987
Output, continued
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.011545   0.025515 -39.645  < 2e-16 ***
A            0.169208   0.025509   6.633 3.28e-11 ***
B            0.169622   0.025515   6.648 2.97e-11 ***
C            0.023317   0.025510   0.914    0.361
A:B         -0.006285   0.025512  -0.246    0.805
A:C         -0.002773   0.025432  -0.109    0.913
B:C         -0.041020   0.025434  -1.613    0.107
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 93.0238  on 7  degrees of freedom
Residual deviance:  1.4645  on 1  degrees of freedom
AIC: 72.286
Number of Fisher Scoring iterations: 3
Reduced model
The analyst decides to fit a reduced model including A, B, and BC
(and, to keep it hierarchical, C):
summary(glm(cbind(Redeemed, Customers - Redeemed) ~ A + B * C,
            data = coupon, family = "binomial"))
Output
Call:
glm(formula = cbind(Redeemed, Customers - Redeemed) ~ A + B *
C, family = "binomial", data = coupon)
Deviance Residuals:
      1        2        3        4        5        6        7        8
 0.3402  -0.3114  -0.3783   0.3531  -0.5142   0.4692   0.5509  -0.5171
Output, continued
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.01142    0.02551 -39.652  < 2e-16 ***
A            0.16868    0.02542   6.635 3.25e-11 ***
B            0.16912    0.02543   6.650 2.94e-11 ***
C            0.02308    0.02543   0.908    0.364
B:C         -0.04097    0.02543  -1.611    0.107
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 93.0238  on 7  degrees of freedom
Residual deviance:  1.5360  on 3  degrees of freedom
AIC: 68.358
Number of Fisher Scoring iterations: 3
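One way to compare the reduced model with the full model is a likelihood ratio (deviance difference) test. The sketch below uses only the residual deviances and degrees of freedom printed in the two outputs; the variable names are assumptions. With the coupon data frame available, the same comparison could presumably be obtained by passing both fitted models to anova() with test = "Chisq".

```r
# Residual deviances and degrees of freedom copied from the two outputs.
dev_full <- 1.4645
df_full <- 1
dev_reduced <- 1.5360
df_reduced <- 3

# Deviance difference and its degrees of freedom (the two dropped terms A:B and A:C).
lr_stat <- dev_reduced - dev_full
lr_df <- df_reduced - df_full

# Upper-tail chi-squared probability; a large value means the dropped
# terms do not significantly improve the fit.
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
print(c(statistic = lr_stat, df = lr_df, p.value = p_value))
```

The deviance rises by only 0.0715 on 2 degrees of freedom, which agrees with the lower AIC of the reduced model (68.358 against 72.286).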