SolutionExercisesFinaL_August032010.pdf

ST512
Exercises Final Exam
SSII 2010
I. I used multiple regression to fit the model
Y = beta_0 + beta_1 X1 + beta_2 X2 + beta_3 X3 + e
and here is part of the PROC REG output, including the Type I (sequential)
sums of squares:
Analysis of Variance

                           Sum of         Mean
Source             DF     Squares       Square    F Value    Pr > F
Model             (3)    23.14600      7.71533       9.87    0.0098
Error             (6)     4.69000      0.78167
Corrected Total   (9)    27.83600

Parameter Estimates

                 Parameter    Standard
Variable    DF    Estimate       Error    t Value    Pr > |t|    Type I SS
Intercept    1    14.20813     3.03548       4.68      0.0034    806.40400
x1           1    -0.39313    (0.3954)      *****      ******   (19.10412)
x2           1    -0.62438     0.29491      -2.12      0.0786      2.57188
x3           1     0.43750     0.31903       1.37      0.2193      1.47000
The X'X matrix and its inverse are

X'X =
   10      25      75      10
   25     145     100      65
   75     100     665      35
   10      65      35      37.68

inverse of X'X =
   11.79    -1.41    -1.14     0.36
   -1.41     0.20     0.13    -0.10
   -1.14     0.13     0.11    -0.03
    0.36   (-0.10)   -0.03     0.13
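(A) Fill in the 6 blanks in the display above. Do not use the inverse of X'X
in calculating elements of X'X - it will not be accurate enough.

DF: the (1,1) element of X'X gives n = 10, so Corrected Total = 10 - 1 = 9,
Model = 3 (three predictors) and Error = 9 - 3 = 6.
s.e.(x1) = sqrt(0.20*0.78167) = 0.3954
Type I SS (x1) = Model SS - Type I SS (x2) - Type I SS (x3)
               = 23.14600 - 2.57188 - 1.47000 = 19.10412
The missing element of the inverse is -0.10, because (X'X)^-1 is symmetric.

(B) Calculate F tests for these hypotheses. In each case H1 is just "not H0."

H0: beta_2 = 0
F = (-2.12)^2 = 4.4944   (the t value for x2, squared)

H0: beta_2 = beta_3 = 0
The Type I SS for x2 and x3 are sequential (fitted after x1), so together they
are the extra sum of squares for adding x2 and x3, on 2 df:
F = [(2.57188 + 1.47000)/2] / 0.78167 = 2.585

H0: beta_1+beta_2+beta_3 = 0, beta_2-beta_3 = 0, and beta_1-beta_2 = 0
This set of three null hypotheses is equivalent to
H0: beta_1 = beta_2 = beta_3 = 0
(the three betas are equal and sum to 0, so each is 0), hence
F = 9.87 (Model F)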
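The arithmetic for (A) and (B) can be checked with a short Python sketch. It assumes only the printed output values above; the variable names (for example `inv_x1` for the x1 diagonal entry of (X'X)^-1, which is printed rounded to 0.20) are illustrative.

```python
import math

# Values read from the Section I output (variable names are illustrative)
n = 10                # (1,1) element of X'X = number of observations
n_predictors = 3      # x1, x2, x3
model_ss, error_ss = 23.14600, 4.69000
mse = 0.78167
type1 = {"x2": 2.57188, "x3": 1.47000}
inv_x1 = 0.20         # diagonal element of (X'X)^-1 for x1, rounded to 2 decimals
t_x2 = -2.12

# (A) degrees of freedom, s.e.(x1) and the Type I SS for x1
df_total = n - 1
df_model = n_predictors
df_error = df_total - df_model
se_x1 = math.sqrt(inv_x1 * mse)
type1_x1 = model_ss - type1["x2"] - type1["x3"]   # Type I SS add up to Model SS

# (B) F statistics
f_beta2 = t_x2 ** 2                                 # 1 numerator df: F = t^2
f_beta23 = (type1["x2"] + type1["x3"]) / 2 / mse    # 2 df: sequential SS after x1
f_model = (model_ss / df_model) / mse               # H0: beta_1 = beta_2 = beta_3 = 0

print(df_model, df_error, df_total, round(se_x1, 4), round(type1_x1, 5))
print(round(f_beta2, 4), round(f_beta23, 3), round(f_model, 2))
```

Recomputing the model F from the sums of squares, rather than copying 9.87, is a quick consistency check on the ANOVA table.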
(C) Compute the standard error of b1 - 2*b2, where b1 and b2 are the estimates
of beta_1 and beta_2.

With A = [0 1 -2 0], var(b1 - 2*b2) = A (X'X)^-1 A' * MSE:

$$A(X'X)^{-1}A' = \begin{bmatrix}0 & 1 & -2 & 0\end{bmatrix}
\begin{bmatrix}
11.79 & -1.41 & -1.14 & 0.36\\
-1.41 & 0.20 & 0.13 & -0.10\\
-1.14 & 0.13 & 0.11 & -0.03\\
0.36 & -0.10 & -0.03 & 0.13
\end{bmatrix}
\begin{bmatrix}0\\ 1\\ -2\\ 0\end{bmatrix}$$

$$\begin{bmatrix}0 & 1 & -2 & 0\end{bmatrix}(X'X)^{-1} = \begin{bmatrix}-1.41-2(-1.14) & 0.20-2(0.13) & 0.13-2(0.11) & -0.10-2(-0.03)\end{bmatrix} = \begin{bmatrix}0.87 & -0.06 & -0.09 & -0.04\end{bmatrix}$$

$$A(X'X)^{-1}A' = (-0.06)(1) + (-0.09)(-2) = 0.12$$

$$\mathrm{se}(b_1 - 2b_2) = \sqrt{0.12 \times \mathrm{MSE}} = \sqrt{0.12 \times 0.78167} = 0.3063$$
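The quadratic form in (C) can be evaluated mechanically. This is a minimal Python sketch that assumes the two-decimal (X'X)^-1 printed above (so the result is only approximate); the names `xtx_inv` and `a` are illustrative.

```python
import math

# (X'X)^-1 as printed (rounded); rows/columns: intercept, x1, x2, x3
xtx_inv = [
    [11.79, -1.41, -1.14, 0.36],
    [-1.41, 0.20, 0.13, -0.10],
    [-1.14, 0.13, 0.11, -0.03],
    [0.36, -0.10, -0.03, 0.13],
]
mse = 0.78167
a = [0, 1, -2, 0]   # coefficients of the linear combination b1 - 2*b2

# Row vector a (X'X)^-1, then the quadratic form a (X'X)^-1 a'
row = [sum(a[i] * xtx_inv[i][j] for i in range(4)) for j in range(4)]
quad = sum(row[j] * a[j] for j in range(4))

# var(a b) = a (X'X)^-1 a' * MSE
se_comb = math.sqrt(quad * mse)
print([round(v, 2) for v in row], round(quad, 2), round(se_comb, 4))
```

It gives a (X'X)^-1 a' = 0.12 and a standard error of about 0.306; the MSE factor is what turns the quadratic form into a variance.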
II. I used 4 drugs, each on three patients. In addition to the response
variable Y, I have these dummy variables X1, X2, and X3 in my dataset.
Note that there is no dummy variable for drug C.

X1: drug A;  X2: drug B;  X3: drug D

drug   X1   X2   X3
 A      1    0    0
 A      1    0    0
 A      1    0    0
 B      0    1    0
 B      0    1    0
 B      0    1    0
 C      0    0    0
 C      0    0    0
 C      0    0    0
 D      0    0    1
 D      0    0    1
 D      0    0    1

I ran PROC REG; MODEL Y = X1 X2 X3; getting this partial output:

Parameter Estimates

                  Parameter    Standard
Variable    DF     Estimate       Error    t Value    Pr > |t|
Intercept    1     11.00000     1.47196       7.47      <.0001
X1           1      9.00000     2.08167       4.32      0.0025
X2           1      6.00000     2.08167       2.88      0.0204
X3           1      8.00000     2.08167       3.84      0.0049
(A) Give the sample average response for drug A: 20.00, and for drug C: 11.00.
Since drug C has no dummy variable, its coefficient is in effect set to zero: the
intercept is $\bar{y}_{C\cdot}$, and the parameter estimates for X1, X2, and X3 are the
differences $\bar{y}_{i\cdot} - \bar{y}_{C\cdot}$ for drugs A, B, and D.
sample average for A = 11.00 + 9.00 = 20.00
sample average for C = 11.00
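(B) Give a t test for testing the null hypothesis that drugs B and D have
the same mean response. t = -0.9608

$H_0: \mu_{B\cdot} = \mu_{D\cdot}$, or equivalently $H_0: \mu_{B\cdot} - \mu_{D\cdot} = 0$
$H_1: \mu_{B\cdot} \neq \mu_{D\cdot}$, or equivalently $H_1: \mu_{B\cdot} - \mu_{D\cdot} \neq 0$

Note that $\mathrm{var}(\bar{y}_{i\cdot} - \bar{y}_{j\cdot}) = 2\,\mathrm{var}(\bar{y}_{i\cdot}) = 2\,\mathrm{MSE}/r = 2.08167^2$,
so any difference of two drug means has the same standard error, 2.08167, as
each dummy-variable coefficient.

$$t = \frac{\bar{y}_{B\cdot} - \bar{y}_{D\cdot} - 0}{2.08167} = \frac{(11+6) - (11+8) - 0}{2.08167} = \frac{-2}{2.08167} = -0.9608$$

with 12 - 4 = 8 error degrees of freedom.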
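Parts (A) and (B) amount to reading drug means off the dummy-variable estimates. A small Python sketch, assuming only the printed estimates; `coef`, `means` and `t_bd` are illustrative names.

```python
# Estimates from the PROC REG output; drug C is the reference level (no dummy)
intercept = 11.0
coef = {"A": 9.0, "B": 6.0, "D": 8.0}   # estimates for X1, X2, X3
se_diff = 2.08167                        # common standard error of the three coefficients

# Each drug mean is the intercept plus that drug's coefficient (0 for drug C)
means = {"C": intercept, **{drug: intercept + b for drug, b in coef.items()}}

# t for H0: mu_B = mu_D; any difference of two drug means has SE sqrt(2*MSE/r) = se_diff
t_bd = (means["B"] - means["D"]) / se_diff
print(means, round(t_bd, 4))
```

Equivalently, t is the difference of the X2 and X3 coefficients divided by 2.08167, since the intercept cancels.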
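(C) Find, if possible, the error mean square, MSE = 6.50, for this
regression.

$$2.08167^2 = \mathrm{var}(\bar{y}_{i\cdot} - \bar{y}_{C\cdot}) = 2\,\mathrm{var}(\bar{y}_{i\cdot}) = \frac{2\,\mathrm{MSE}}{r}, \qquad r = 3$$

$$\mathrm{MSE} = \frac{3 \times 2.08167^2}{2} = 6.50$$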
(D) I create a t statistic whose numerator is the mean response
for drug A minus the average of the B and C means. Complete this
formula for the standard error of this linear combination of
means (i.e., the denominator of t): std error = sqrt[(1/2)MSE],
where, as always, MSE is the error mean square.

Contrast: Mean_A - average(Mean_B and Mean_C)
Contrast Ĉ1 = 1*mean_A - (1/2)*mean_B - (1/2)*mean_C + 0*mean_D

$$\mathrm{var}(\hat{C}_1) = \frac{\mathrm{MSE}}{r}\left[1^2 + \left(-\tfrac{1}{2}\right)^2 + \left(-\tfrac{1}{2}\right)^2 + 0^2\right] = \frac{\mathrm{MSE}}{3}\cdot\frac{6}{4} = \frac{\mathrm{MSE}}{2}$$
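Parts (C) and (D) can be reproduced from the two printed standard errors. This Python sketch assumes r = 3 patients per drug, as stated in the problem; the names are illustrative.

```python
import math

r = 3                    # patients per drug
se_diff = 2.08167        # SE of each dummy coefficient = sqrt(2*MSE/r)
se_intercept = 1.47196   # SE of the intercept (drug C mean) = sqrt(MSE/r)

# (C) Solve se_diff^2 = 2*MSE/r for MSE; the intercept SE gives a cross-check
mse = r * se_diff ** 2 / 2
mse_check = r * se_intercept ** 2

# (D) Contrast mean_A - (mean_B + mean_C)/2, coefficients in drug order A, B, C, D
coefs = [1, -0.5, -0.5, 0]
var_multiplier = sum(c ** 2 for c in coefs) / r   # var(C1) = var_multiplier * MSE
se_contrast = math.sqrt(var_multiplier * mse)
print(round(mse, 2), round(mse_check, 2), var_multiplier, round(se_contrast, 3))
```

Both routes give MSE of about 6.50, and the contrast multiplier is exactly 1/2, so the denominator of t is sqrt(MSE/2).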
III. An experiment is done to investigate the effects of factor S=salinity
and factor T=water temperature on growth of a certain type of underwater
plant. Three equally spaced levels of each factor are investigated in a
factorial treatment arrangement. Each replicate of the experiment uses 9
containers, one for each treatment combination. The 9 treatments are
assigned at random to the containers, each of which has been stocked with
young plants. After 1 month the plants are harvested and growth
measured. The equipment is cleaned out, new plants inserted and the
whole process is repeated the next month and the month after for a total
of 3 replicates (REP). Note: the replicates are blocks in time, so the design
is a Randomized Complete Block Design.
1. How many observations do we have in this experiment? 3*3*3 = 27
2. Compute the error degrees of freedom resulting from each of these
analyses (SAS code is given for clarity - note the CLASS statements).
Note: the CLASS statement identifies a factor as a classification (ANOVA)
variable, for which 0/1 dummy variables are created, so it uses (levels - 1) df;
a variable not listed in CLASS enters as a single numeric regressor with 1 df.

a. Factorial ANOVA with interaction, no blocks.
PROC GLM;
CLASS S T;
MODEL YIELD = S T S*T;
error df = 3*3*3 - 1 - 2 - 2 - 4 = 18

b. Factorial ANOVA with blocks, no interaction.
PROC GLM;
CLASS REP S T;
MODEL YIELD = REP S T;
error df = 3*3*3 - 1 - 2 - 2 - 2 = 20

c. Linear regression in S and T with blocks, no interaction.
PROC GLM;
CLASS REP;
MODEL YIELD = REP S T;
error df = 3*3*3 - 1 - 2 - 1 - 1 = 22

d. Linear regression in S and T with (linear by linear) interaction,
blocks.
PROC GLM;
MODEL YIELD = S T S*T;
error df = 3*3*3 - 1 - 1 - 1 - 1 = 23

e. Full quadratic surface in S and T, no blocks.
PROC GLM;
MODEL YIELD = S S*S T T*T T*S;
error df = 3*3*3 - 1 - 1 - 1 - 1 - 1 - 1 = 21
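The error df above all follow one counting rule. A Python sketch of that bookkeeping, assuming the model terms exactly as written in the SAS statements above; the dictionary keys are just the item letters.

```python
n_obs = 3 * 3 * 3   # 3 levels of S x 3 levels of T x 3 replicates

# Model df for each analysis (excluding the intercept).
# CLASS factors contribute (levels - 1) df; a CLASS interaction multiplies them;
# any term built only from non-CLASS (numeric) variables contributes 1 df.
model_df = {
    "a": 2 + 2 + 2 * 2,    # S, T, S*T all CLASS
    "b": 2 + 2 + 2,        # REP, S, T all CLASS
    "c": 2 + 1 + 1,        # REP CLASS; S and T numeric
    "d": 1 + 1 + 1,        # S, T, S*T numeric
    "e": 5 * 1,            # S, S*S, T, T*T, T*S numeric
}
error_df = {item: n_obs - 1 - df for item, df in model_df.items()}
print(error_df)
```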
3. Some of the models in question 2 include block effects (REP) and some
don't. Based on the description of the experiment, which is
appropriate? (Include REP: the experiment was replicated in blocks of time.)
4. Here are some treatment totals. You may want to recall that the
orthogonal polynomial coefficients for a factor at 3 equally spaced
levels are
Linear:     -1   0   1
Quadratic:  -1   2  -1
Treatment totals (each a total of r = 3 observations):

                  T
             1     2     3  |  Total
       ---------------------|-------
S   1       50    44    46  |   140
    2       45    35    40  |   120
    3       35    21    24  |    80
       ---------------------|-------
Total      130   100   110  |   340
Compute the sum of squares for the linear effect of T within level 1 of S: 2.6667
Q = (-1)*50 + 0*44 + 1*46 = -4
SS(Q) = Q^2/(r*SUM_coef_squared)

$$SS(Q) = \frac{(-4)^2}{3\,[(-1)^2 + 0^2 + 1^2]} = \frac{16}{6} = 2.6667$$
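The contrast sum of squares used here, and again below, is a one-line formula. A minimal Python helper, assuming each total is the sum of r observations; `contrast_ss` is a hypothetical name.

```python
def contrast_ss(totals, coefs, r):
    """Return (Q, SS) for a contrast in treatment totals, each the sum of r
    observations: Q = sum(c * total), SS = Q^2 / (r * sum(c^2))."""
    q = sum(c * t for c, t in zip(coefs, totals))
    return q, q ** 2 / (r * sum(c ** 2 for c in coefs))


# S1 totals at T = 1, 2, 3, with the linear coefficients -1, 0, 1 and r = 3
q, ss = contrast_ss([50, 44, 46], [-1, 0, 1], r=3)
print(q, round(ss, 4))
```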
Compute the sum of squares for the T linear by S linear interaction: 4.0833
Each of the above sums of squares is associated with a contrast in the 9
totals in our table. Are these 2 contrasts orthogonal? (yes, no) No
              S1T1  S1T2  S1T3  S2T1  S2T2  S2T3  S3T1  S3T2  S3T3
Totals          50    44    46    45    35    40    35    21    24
Tlin in S1      -1     0     1     0     0     0     0     0     0
Slinear         -1    -1    -1     0     0     0     1     1     1
TLinear         -1     0     1    -1     0     1    -1     0     1
SlinxTlin        1     0    -1     0     0     0    -1     0     1
QSlin*Tlin = (1)*50+0*44+(-1)*46+0*45+0*35+0*40+(-1)*35+0*21+1*24 = -7
SS(QSlin*Tlin) = (QSlin*Tlin)^2/(r*SUM_coef_squared)

$$SS(Q_{S_{lin} \times T_{lin}}) = \frac{(-7)^2}{3\,[1^2 + 0^2 + (-1)^2 + 0^2 + 0^2 + 0^2 + (-1)^2 + 0^2 + 1^2]} = \frac{49}{3 \times 4} = 4.0833$$
“Tlinear within S1” and “Tlinear” are not orthogonal, since the
sum of the products of their coefficients is not 0:
(-1)*(-1) + (0)*(0) + (1)*(1) = 2
“Tlinear within S1” and “Tlinear x Slinear” are not orthogonal, since the
sum of the products of their coefficients is not 0:
(-1)*(1) + (0)*(0) + (1)*(-1) = -2
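The interaction coefficients, its sum of squares and the orthogonality check can all be generated from the linear coefficients. A Python sketch, assuming the nine totals in the order S1T1, ..., S3T3 with r = 3; the helper `dot` and the other names are illustrative.

```python
# Nine treatment totals in the order S1T1, S1T2, ..., S3T3 (r = 3 each)
totals = [50, 44, 46, 45, 35, 40, 35, 21, 24]
r = 3
lin = [-1, 0, 1]


def dot(u, v):
    """Sum of products of two coefficient (or coefficient/total) vectors."""
    return sum(a * b for a, b in zip(u, v))


t_lin_in_s1 = [-1, 0, 1] + [0] * 6
# Interaction coefficients are products of the S-linear and T-linear coefficients
s_lin_x_t_lin = [s * t for s in lin for t in lin]

q = dot(s_lin_x_t_lin, totals)
ss = q ** 2 / (r * dot(s_lin_x_t_lin, s_lin_x_t_lin))
# Two contrasts are orthogonal exactly when this sum of products is 0
cross = dot(t_lin_in_s1, s_lin_x_t_lin)
print(s_lin_x_t_lin, q, round(ss, 4), cross)
```

The nonzero cross product confirms the "no" answer: the two contrasts share the S1 cells with coefficients of the same magnitude but opposite sign.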
IV. A multiple regression equation
Y_t = Beta0 + Beta1 X1t + Beta2 X2t + Beta3 X3t + e_t
is fit to some data. We obtain:

                    Sum of Squares
Source    df     Type I     Type II
X1         1       180          40
X2         1        90         100
X3         1        50          50
Error     20       300

             +                            +
             :  .03    .05    .04    .01  :
(X'X)^-1  =  :  .05    .30    .20    .12  :
             :  .04    .20    .48    .10  :
             :  .01    .12    .10    .01  :
             +                            +
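Give, if possible, the computed F statistic for testing:
(a) H0: Beta2 = Beta3 = 0
The Type I SS for X2 and X3 are sequential (after X1), so together they form
the extra sum of squares for X2 and X3, on 2 df:
F = [(90+50)/2]/(300/20) = 4.6667
(b) H0: Beta2 = 0
The Type II SS for X2 is adjusted for both X1 and X3:
F = [(100)/1]/(300/20) = 6.6667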
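Which column of sums of squares answers which question is the whole point of Section IV. A Python sketch of the two F statistics, assuming the table values above; the dictionary names are illustrative.

```python
error_ss, error_df = 300, 20
mse = error_ss / error_df
type1 = {"X1": 180, "X2": 90, "X3": 50}    # sequential SS
type2 = {"X1": 40, "X2": 100, "X3": 50}    # each adjusted for the other two

# (a) H0: Beta2 = Beta3 = 0 -> extra SS for X2, X3 after X1 (2 df)
f_a = (type1["X2"] + type1["X3"]) / 2 / mse
# (b) H0: Beta2 = 0 -> Type II SS for X2 (1 df)
f_b = type2["X2"] / 1 / mse
print(round(f_a, 4), round(f_b, 4))
```

The Type I and Type II values for X3 agree (50) because X3 is entered last, so its sequential SS is already adjusted for X1 and X2.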
V. A factorial experiment has quantitative factors A at 3 equally spaced
levels and B at 4 equally spaced levels. The 10 replications are in
blocks. Here are the totals for A and B, each being a total of 10
original observations:

         b0     b1     b2     b3
     +----------------------------+
 a0  :  530 :  700 :  720 :  850  :
     :------:------:------:-------:
 a1  :  410 :  500 :  620 :  700  :
     :------:------:------:-------:
 a2  :  400 :  470 :  500 :  650  :
     +----------------------------+

BLOCK SS = 5000          Total SS = 28000

Linear orthogonal polynomials
  4 levels:  -3  -1   1   3
  3 levels:  -1   0   1
Compute the sums of squares for:
AL (A linear), BL (B linear) and AL x BL

Contrast coefficients for the 12 treatment totals (r = 10):

cell    total    AL    BL    ALxBL
a0b0     530     -1    -3      3
a0b1     700     -1    -1      1
a0b2     720     -1     1     -1
a0b3     850     -1     3     -3
a1b0     410      0    -3      0
a1b1     500      0    -1      0
a1b2     620      0     1      0
a1b3     700      0     3      0
a2b0     400      1    -3     -3
a2b1     470      1    -1     -1
a2b2     500      1     1      1
a2b3     650      1     3      3

Contrast      Q      div = r*SUM_coef_squared      SS
AL          -780     10*8  = 80                  7605
BL          2750     10*60 = 600                 12604.17
ALxBL       -200     10*40 = 400                 100

Contrasts Q
QAL = (-1)*530 + (-1)*700 + (-1)*720 + (-1)*850 + (0)*410 + (0)*500 + (0)*620 + (0)*700 + (1)*400 + (1)*470 + (1)*500 + (1)*650
    = -780
QBL = (-3)*530 + (-1)*700 + (1)*720 + (3)*850 + (-3)*410 + (-1)*500 + (1)*620 + (3)*700 + (-3)*400 + (-1)*470 + (1)*500 + (3)*650
    = 2750
QALxBL = (3)*530 + (1)*700 + (-1)*720 + (-3)*850 + (0)*410 + (0)*500 + (0)*620 + (0)*700 + (-3)*400 + (-1)*470 + (1)*500 + (3)*650
       = -200

Sum of Squares
SS(AL) = (QAL^2)/div = (-780)^2/(10*8) = 7605
SS(BL) = (QBL^2)/div = (2750)^2/(10*3*20) = 12604.17
SS(ALxBL) = (QALxBL^2)/div = (-200)^2/(10*2*20) = 100

For BL, the coefficients -3, -1, 1, 3 (sum of squares 9+1+1+9 = 20) are applied at
each of the 3 levels of A, so SUM_coef_squared = 3*20 = 60; for ALxBL they are
applied (with sign) only at a0 and a2, so SUM_coef_squared = 2*20 = 40. As a check,
SS(BL) cannot exceed the B main-effect SS,
(1340^2 + 1670^2 + 1840^2 + 2200^2)/30 - 7050^2/120 = 12815.83.

Treatment SS =
(530^2+700^2+720^2+850^2+410^2+500^2+620^2+700^2+400^2+470^2+500^2+650^2)/10
 - (530+700+720+850+410+500+620+700+400+470+500+650)^2/(10*3*4)
= 435770 - 414187.5 = 21582.5

Or, from the means:

means     a0      a1      a2      mean
b0        53      41      40      44.67
b1        70      50      47      55.67
b2        72      62      50      61.33
b3        85      70      65      73.33
mean      70.00   55.75   50.50   58.75

Treatment SS =
10*( (53-58.75)^2 + (70-58.75)^2 + (72-58.75)^2 + (85-58.75)^2 + (41-58.75)^2 + (50-58.75)^2
   + (62-58.75)^2 + (70-58.75)^2 + (40-58.75)^2 + (47-58.75)^2 + (50-58.75)^2 + (65-58.75)^2 ) = 21582.5

Error SS = Total SS - Block SS - Treatment SS = 28000 - 5000 - 21582.5 = 1417.5

Source                  DF                            SS          MS          F
BLOCK                   10-1 = 9                      5000
TREATMENTS              3*4-1 = 11                    21582.5     1962.045    137.0317
  A                     3-1 = 2
    AL (A linear)       1                             7605        7605
  B                     4-1 = 3
    BL (B linear)       1                             12604.17    12604.17
  A*B                   (3-1)*(4-1) = 6
    ALxBL               1                             100         100
ERROR                   (10-1)*(3*4-1) = 9*11 = 99    1417.5      14.31818
TOTAL                   10*3*4-1 = 119                28000

Treatment MS: 21582.5/11 = 1962.045
Error MS: 1417.5/99 = 14.31818
Treatment F: 1962.045/14.31818 = 137.0317

The test statistic which tests the null hypothesis that nothing other
than the above effects is needed to describe the effects of A and B:
F = hypothesis MS / Error MS = 159.17/14.31818 = 11.12

We want to test whether the variation due to treatments (main effect of A with 2 df, main effect of B
with 3 df and interaction effects with 6 df) is due solely to the linear effect of A (1 df), the linear
effect of B (1 df) and their A_linear by B_linear effect (1 df), i.e., whether the remaining effects are
jointly all equal to 0.
Full model (treatments) with 11 df: Full Model SS = 21582.5
Reduced model with 3 df (AL, BL, ALxBL): Reduced Model SS = 7605 + 12604.17 + 100 = 20309.17
Full Model SS - Reduced Model SS = 21582.5 - 20309.17 = 1273.33, with 11 - 3 = 8 df
Hypothesis MS = (21582.5 - 20309.17)/8 = 159.17
Hypothesis F = hypothesis MS / Error MS = 159.17/14.31818 = 11.12

How many numerator and denominator degrees of freedom for F? Numerator 8, denominator 99.
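The whole Section V computation can be reproduced from the twelve cell totals. This Python sketch assumes the totals, r = 10, BLOCK SS = 5000 and Total SS = 28000 given in the problem; names such as `contrast` and `coefs` are illustrative, and each divisor is computed as r times the sum of squared coefficients over all 12 cells.

```python
# Treatment totals; rows a0..a2, columns b0..b3, each a total of r = 10 observations
totals = [
    [530, 700, 720, 850],
    [410, 500, 620, 700],
    [400, 470, 500, 650],
]
r = 10
block_ss, total_ss = 5000, 28000
a_lin = [-1, 0, 1]          # linear coefficients, 3 equally spaced levels
b_lin = [-3, -1, 1, 3]      # linear coefficients, 4 equally spaced levels
n_a, n_b = len(a_lin), len(b_lin)


def contrast(coef):
    """Return (Q, SS) for a contrast in the cell totals: SS = Q^2 / (r * sum c^2)."""
    q = sum(coef[i][j] * totals[i][j] for i in range(n_a) for j in range(n_b))
    div = r * sum(c * c for row in coef for c in row)
    return q, q * q / div


coefs = {
    "AL": [[a] * n_b for a in a_lin],
    "BL": [b_lin[:] for _ in a_lin],
    "ALxBL": [[a * b for b in b_lin] for a in a_lin],
}
q = {name: contrast(c)[0] for name, c in coefs.items()}
ss = {name: contrast(c)[1] for name, c in coefs.items()}

# Treatment, error and B main-effect sums of squares
grand = sum(sum(row) for row in totals)
cf = grand ** 2 / (r * n_a * n_b)
trt_ss = sum(t * t for row in totals for t in row) / r - cf
err_df = (r - 1) * (n_a * n_b - 1)
err_ss = total_ss - block_ss - trt_ss
err_ms = err_ss / err_df
b_totals = [sum(row[j] for row in totals) for j in range(n_b)]
ss_b = sum(t * t for t in b_totals) / (r * n_a) - cf

# Lack of fit: treatment SS not explained by AL, BL and ALxBL (11 - 3 = 8 df)
lof_df = (n_a * n_b - 1) - 3
lof_ms = (trt_ss - sum(ss.values())) / lof_df
lof_f = lof_ms / err_ms

print(q, {k: round(v, 2) for k, v in ss.items()})
print(round(trt_ss, 1), round(err_ss, 1), err_df, round(ss_b, 2), round(lof_f, 2))
```

With these totals it gives SS(AL) = 7605, SS(BL) = 12604.17, SS(ALxBL) = 100, treatment SS 21582.5, and a lack-of-fit F of about 11.12 on 8 and 99 df. Computing SS(B) alongside is a cheap guard: a single-df component of B can never exceed the main-effect SS.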
VI. Eight trees are selected at random and from each tree 12 identical boards
are cut for a total of 96 boards. Breaking strengths of the boards are
measured. From each tree, three randomly selected boards are broken while
dry at temperature 40 degrees F, three are broken wet at 40 degrees F,
three dry at 90 degrees F and three wet at 90 degrees F. Let factor A be
temperature, B be tree, and C be wetness. Label as random or fixed.
Factor A (temperature) is (random, fixed): Fixed
Factor B (tree) is (random, fixed): Random
Factor C (wetness) is (random, fixed): Fixed
NOTE:
Wetness is not selected at random from a normal population.
In a factorial experiment, the experimenter would try to hit the same wetness (saturation level) each time,
but might not be successful. Still, half the numbers would be close to 0 and half close to, say, 60%
saturation, so we would expect effects that are bimodal, not normal looking. Also, wetness (like drug
dosage, fertilizer level, etc.) would likely be analyzed as a "regression" type variable, that is, extension to
other levels of wetness would be done by running a (linear) regression rather than by describing the
variance of some normal population. These "regression" factors are considered fixed.
Who would be interested in these results? Why was the experiment done? Someone interested in
breaking strength of wood in construction applications might realize that items he constructs might be in
warm or cold, wet or dry environments and hence would be interested in these (fixed) effects. Wood will
come from various trees and tree-to-tree variation is likely of interest, but not the comparison of one
tree to another since these were randomly selected (likely all of one variety - e.g. knotty pine). Anyone
interested in this experiment will not want results restricted to these trees and extrapolation by regression
(on tree number ??) would obviously not be the correct way to extrapolate (in contrast to wetness factor).
Trees are thus random.
VII. Trucks deliver loads of corn to a breakfast cereal company. Several
trucks were selected at random and from each, six sample jars of corn
(from six randomly selected locations in the truck bed) were taken. Three
of these (randomly selected) were refrigerated and the other three left
at room temperature. Call this factor “storage”. After a period of time,
an aflatoxin measurement Y was made on each jar. The company wants to
compare these two storage methods in terms of the effect on aflatoxin.
How many trucks _____ were selected? One (there is a total of 6 jars
stored)
There is only one fixed effect here. Which is it? STORAGE
VIII. A 2x3 factorial set of treatments, namely 2 wheat varieties (A,B) and 3
levels of fertilizer (1, 2, or 3 fertilizer applications) were set up
for research. The experiment was done by selecting 5 farms at random
in North Carolina, laying out 6 plots per farm and assigning the 6
treatment combinations to the six plots randomly within each farm.
a. Present the layout of the experiment.
Design is a Randomized Complete Block Design
Factors
- Blocking factor: Farm (Random), recognizing differences (soil, management, fertility, …) between farms
  across North Carolina
- Treatment factors
  o Varieties: 2 (A, B), Fixed
  o Fertilizer applications: 3 (1, 2, 3), Fixed
- Plots (experimental units): nested within each farm, Random
- Number of treatment combinations: 2 x 3 = 6
- Number of blocks (repetitions): 5
- Total number of observations: 2 x 3 x 5 = 30
In any given farm: [diagram: the six treatment combinations (A1, A2, A3, B1, B2, B3) assigned at random to the six plots]
b. ANOVA table: SOURCES and DF. Indicate whether factors are random or
fixed.

Linear Model
$$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + d_k + e_{ijk}, \qquad i = 1,2;\; j = 1,2,3;\; k = 1,\dots,5$$
- Variety is a Fixed effect: $\sum_{i=1}^{2} \alpha_i = 0$
- Fertilizer is a Fixed effect: $\sum_{j=1}^{3} \beta_j = 0$
- Variety*Fertilizer is a Fixed effect: $\sum_{i=1}^{2} (\alpha\beta)_{ij} = \sum_{j=1}^{3} (\alpha\beta)_{ij} = 0$
- Farm is a Random effect, all six treatments present in each farm: $d_k \sim N(0, \sigma^2_{farm})$
- Plot is a Random effect, nested within farm and treatment combination: $e_{ijk} \sim N(0, \sigma^2)$
- $e_{ijk}$ and $d_k$ are independent random effects

ANOVA Table
Source            DF                          E(MS)
BLOCK (FARMS)     5-1 = 4                     $\sigma^2 + 2 \cdot 3\,\sigma^2_{farm}$
TREATMENTS        2*3-1 = 5
  Variety (V)     2-1 = 1                     $\sigma^2 + 5 \cdot 3 \sum_i \alpha_i^2/(2-1)$
  Fertilizer (F)  3-1 = 2                     $\sigma^2 + 5 \cdot 2 \sum_j \beta_j^2/(3-1)$
  V*F             (2-1)*(3-1) = 2             $\sigma^2 + 5 \sum_{i,j} (\alpha\beta)_{ij}^2/[(2-1)(3-1)]$
ERROR             (5-1)*(2*3-1) = 4*5 = 20    $\sigma^2$
TOTAL             5*2*3-1 = 29
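The degrees of freedom and the multipliers in the E(MS) column follow from the design sizes alone. A Python sketch, assuming 2 varieties, 3 fertilizer levels and 5 farms as stated; the dictionary keys simply mirror the table's source labels.

```python
# Design sizes from the problem statement
a, b, k = 2, 3, 5     # varieties, fertilizer levels, farms (blocks)

df = {
    "BLOCK (FARMS)": k - 1,
    "Variety (V)": a - 1,
    "Fertilizer (F)": b - 1,
    "V*F": (a - 1) * (b - 1),
    "ERROR": (k - 1) * (a * b - 1),
}
df_total = k * a * b - 1

# Multiplier on each variance/fixed-effect term in E(MS):
# the number of observations averaged into each level mean of that source
ems_multiplier = {
    "BLOCK (FARMS)": a * b,   # multiplies sigma^2_farm
    "Variety (V)": k * b,     # multiplies sum alpha_i^2 / (a-1)
    "Fertilizer (F)": k * a,  # multiplies sum beta_j^2 / (b-1)
    "V*F": k,                 # multiplies sum (alpha beta)_ij^2 / ((a-1)(b-1))
}
print(df, df_total, ems_multiplier)
```

The source df add up to the total df (4 + 1 + 2 + 2 + 20 = 29), which is a quick check on the table.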