NCSU ST512 QUIZ 3, Summer 2 2011: Solution (July 22, 2011)
1) A study of the effect of water acidity on plant growth uses 10 plants watered at each of 5 acidity levels (Treatment = A, B, C, D, E) in a completely randomized design, with these treatment means and variances:
Treatment                   A        B        C        D        E        Overall mean   SUM
Mean ȳ_i.                  16       12       11        9       10        11.6
Variance s_i²              10       10        6       10        6
Σ_j (y_ij − ȳ_i.)²     10*9=90  10*9=90   6*9=54  10*9=90   6*9=54                     378
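a) Compute the error sum of squares SS[E] and its degrees of freedom.
SS[E] = 378, with 5 × (10 − 1) = 45 degrees of freedom.

Since $s_i^2 = \frac{1}{10-1}\sum_{j=1}^{10} (y_{ij} - \bar{y}_{i\cdot})^2$, each within-treatment sum of squares is $\sum_{j=1}^{10} (y_{ij} - \bar{y}_{i\cdot})^2 = (10-1)s_i^2$, and therefore

$$SS[E] = \sum_{i=1}^{5}\sum_{j=1}^{10} (y_{ij} - \bar{y}_{i\cdot})^2 = \sum_{i=1}^{5} (10-1)s_i^2 = 9(10 + 10 + 6 + 10 + 6) = 378$$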
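As a quick check of part (a), the Python sketch below recomputes SS[E] and its degrees of freedom from the summary table alone. The variable names are illustrative; the MSE line goes one step beyond what the question asks.

```python
# Treatment means and sample variances from the table (10 plants per treatment).
means = {"A": 16, "B": 12, "C": 11, "D": 9, "E": 10}
variances = {"A": 10, "B": 10, "C": 6, "D": 10, "E": 6}
n = 10

# Within-treatment sum of squares for each treatment: (n - 1) * s_i^2.
ss_within = {trt: (n - 1) * s2 for trt, s2 in variances.items()}

ss_error = sum(ss_within.values())     # SS[E], pooled over treatments
df_error = len(variances) * (n - 1)    # 5 * (10 - 1)
mse = ss_error / df_error              # error mean square
overall_mean = sum(means.values()) / len(means)  # equal n, so mean of the means

print(ss_within)                       # {'A': 90, 'B': 90, 'C': 54, 'D': 90, 'E': 54}
print(ss_error, df_error, mse)         # 378 45 8.4
print(overall_mean)                    # 11.6
```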
2) All questions refer to the following. I have regressed Y (yield) on P = soil pH, M = soil moisture, T = soil temperature, and MT, where MT = M*T is the product of moisture and temperature. Let Y be the column vector of observed yields and X the X matrix containing a column of 1s plus columns for the explanatory variables.
Here is a partial PROC REG output:
Dependent Variable: Y
Analysis of Variance

Source    DF   Sum of Squares   Mean Square   F Value   Prob>F
Model      4       2214.67512     553.66878    34.051   0.0001
Error     46        747.95233       16.2598
C Total   50       2962.62745
Model: $y_i = \beta_0 + \beta_1 P + \beta_2 M + \beta_3 T + \beta_4 MT + e_i$
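A minimal Python sketch of how the ANOVA entries fit together, using only the numbers printed above. It also shows where the dimensions of X come from: the C Total df is n − 1, and the Model df is the number of non-intercept columns. Variable names are illustrative.

```python
# Entries from the PROC REG Analysis of Variance table.
ss_model, df_model = 2214.67512, 4
ss_error, df_error = 747.95233, 46

ss_total = ss_model + ss_error   # C Total SS
df_total = df_model + df_error   # C Total df = n - 1
ms_model = ss_model / df_model   # Model mean square
mse = ss_error / df_error        # Error mean square
f_value = ms_model / mse         # overall F for H0: beta1 = ... = beta4 = 0

n_rows = df_total + 1            # number of observations = rows of X
n_cols = df_model + 1            # column of 1s plus P, M, T, MT

print(round(ss_total, 5), df_total)   # 2962.62745 50
print(round(ms_model, 5), round(mse, 4), round(f_value, 3))
print(n_rows, n_cols)                 # 51 5
```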
Parameter Estimates

Variable   DF   Parameter Estimate   Standard Error   T for H0: Parameter=0   Prob > |T|
INTERCEP    1           -34.468222      93.43489693                  -0.369       0.7139
P           1            -1.866855       2.16613910                  -0.862       0.3932
M           1             1.854880       2.40241982                   0.772       0.4440
T           1             1.611417       1.39034011                   1.159       0.2524
MT          1             0.009275       0.03598928                   0.258       0.7978

Variable   DF     Type I SS   Type II SS
INTERCEP    1       1249887     2.212767
P           1       14.4796    12.077159
M           1   1406.756208     9.692811
T           1    792.359286    21.841861
MT          1      1.079988     1.079988

Type I SS for P (the Type I SS of P, M, T and MT add up to the Model SS):
2214.67512 − (1406.756208 + 792.359286 + 1.079988) = 14.4796
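The two relationships used to fill in and check this output can be sketched in Python: Type I SS are sequential and add up to the Model SS, and for a 1-df term the Type II SS equals t² × MSE. This is a check based on those standard identities, not the SAS computation itself; names are illustrative, and the recomputed Type II SS match the printed ones only up to rounding of the estimates.

```python
# Type I (sequential) SS add up to the Model SS, so the blank for P
# is the Model SS minus the other three Type I SS.
ss_model = 2214.67512
type1 = {"M": 1406.756208, "T": 792.359286, "MT": 1.079988}
type1_p = ss_model - sum(type1.values())

# For a 1-df term, Type II SS = t^2 * MSE, with t = estimate / standard error.
mse = 747.95233 / 46
estimates = {                      # variable: (estimate, standard error)
    "P": (-1.866855, 2.16613910),
    "M": (1.854880, 2.40241982),
    "T": (1.611417, 1.39034011),
    "MT": (0.009275, 0.03598928),
}
type2 = {v: (b / se) ** 2 * mse for v, (b, se) in estimates.items()}

print(round(type1_p, 4))
for v, ss in type2.items():
    print(v, round(ss, 4))
```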
a) How many rows and columns does my X matrix have?
X has 51 rows (n = C Total df + 1 = 50 + 1) and 5 columns (the column of 1s plus P, M, T and MT).
b) Where possible, fill in the blanks in the above output.
Your boss says that temperature has no effect on yield and that no term involving temperature should be included in the regression. She says, "Look at the p-values 0.2524 and 0.7978 and you will see you can leave out both terms with T."
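c) What hypothesis is being tested by the test statistic tcalc = 1.159 for temperature (T)? Give the degrees of freedom for the test statistic.
$H_0: \beta_3 = 0$ vs $H_1: \beta_3 \neq 0$, i.e., whether T contributes to a model that already contains P, M and MT.
The t statistic has n − 1 − 4 = 51 − 5 = 46 df (the error df).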
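A short sketch of the t test for T, assuming the values printed in the Parameter Estimates and Type SS tables. It also illustrates that the square of a 1-df t statistic equals the partial F statistic built from the Type II SS, so both give the same test of $H_0: \beta_3 = 0$. Names are illustrative.

```python
beta3_hat, se_beta3 = 1.611417, 1.39034011   # T row of Parameter Estimates
n, p = 51, 5                                  # rows and columns of X
mse = 747.95233 / 46                          # Error mean square
type2_ss_t = 21.841861                        # Type II SS for T

t_calc = beta3_hat / se_beta3   # test of H0: beta3 = 0 given P, M, MT
df = n - p                      # error df, 51 - 5 = 46
f_partial = type2_ss_t / mse    # equivalent 1-df partial F test

print(round(t_calc, 3), df)     # 1.159 46
print(round(t_calc ** 2, 3), round(f_partial, 3))
```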
d) Compute the test statistic for testing the null hypothesis that the regression coefficients for T and MT are both simultaneously equal to 0.
Reduced model (under $H_0$): $y_i = \beta_0 + \beta_1 P + \beta_2 M + e_i$
Because T and MT enter the model last, $SS[E]_r - SS[E]_f$ equals the sum of their Type I SS, so

$$F = \frac{(SS[E]_r - SS[E]_f)/(df_{E,r} - df_{E,f})}{MSE_f} = \frac{(792.359286 + 1.079988)/2}{16.2598} = 24.3988$$
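e) Write the null hypothesis and the df of the test statistic. Use 4.5 as the critical value. Do you reject $H_0$?
$H_0: \beta_3 = \beta_4 = 0$ vs $H_1$: at least one of $\beta_3, \beta_4$ is not 0.
The F statistic has 2 numerator and 46 denominator df. Since Fcalc = 24.3988 > 4.5, we reject $H_0$; it is also far above the tabled value F(0.05; 2, 46) ≈ 3.20.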
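Parts (d) and (e) can be reproduced with a small sketch of the general linear (extra-sum-of-squares) F test; the function names are illustrative. Because the numerator has 2 df, the sketch also uses the closed-form tail of the F(2, ν) distribution, P(F > f) = (1 + 2f/ν)^(−ν/2), a standard property that holds only for 2 numerator df, to get the 5% critical value and an exact p-value without tables.

```python
def general_linear_f(sse_reduced, df_reduced, sse_full, df_full):
    """Extra-sum-of-squares F: [(SSE_r - SSE_f)/(df_r - df_f)] / (SSE_f/df_f)."""
    numerator = (sse_reduced - sse_full) / (df_reduced - df_full)
    return numerator / (sse_full / df_full)

def f2_tail(f, nu):
    """P(F > f) for F with 2 numerator and nu denominator df (closed form)."""
    return (1 + 2 * f / nu) ** (-nu / 2)

def f2_critical(alpha, nu):
    """Upper-alpha critical value of F(2, nu), obtained by inverting f2_tail."""
    return (nu / 2) * (alpha ** (-2 / nu) - 1)

sse_full, df_full = 747.95233, 46
# T and MT enter last, so dropping both adds their Type I SS back to SS[E].
sse_reduced = sse_full + 792.359286 + 1.079988
df_reduced = df_full + 2

f_calc = general_linear_f(sse_reduced, df_reduced, sse_full, df_full)
print(round(f_calc, 2))                                      # 24.4
print("reject H0" if f_calc > 4.5 else "do not reject H0")   # reject H0
print(round(f2_critical(0.05, df_full), 2))                  # 3.2
print(f2_tail(f_calc, df_full) < 1e-4)                       # True
```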
f) Do you agree with your boss's statement? Explain your answer.
No. Although the individual t tests for T and MT are not significant, the simultaneous test of both coefficients $\beta_3$ and $\beta_4$ is highly significant (F = 24.3988 on 2 and 46 df), so we should not drop both variables. The nonsignificant p-values for the tests that each partial regression coefficient equals zero are due to the high collinearity among the explanatory variables: MT = M*T is built from T, so each of the two terms adds little once the other is already in the model. It will be necessary to analyze the pattern of correlation among the variables and the nature of the MT interaction. Note that MT has a small Type I SS = 1.079988 and a p-value of 0.7978, while T, entered after P and M, has a Type I SS of 792.359286.
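One way to see the overlap described above is to compare each variable's Type I SS (entered in the order P, M, T, MT) with its Type II SS (adjusted for all other terms). The sketch below is illustrative and uses only the printed SS, recomputing P's Type I SS by subtraction; the interpretation in the comments is the standard one, not output from the original analysis.

```python
# Type I (sequential) and Type II (partial) SS from the PROC REG output.
ss_model = 2214.67512
later = {"M": 1406.756208, "T": 792.359286, "MT": 1.079988}
type1 = {"P": ss_model - sum(later.values()), **later}
type2 = {"P": 12.077159, "M": 9.692811, "T": 21.841861, "MT": 1.079988}

# For the last term entered (MT) the two always agree. For earlier terms they
# would also agree if the regressors were mutually uncorrelated; a Type II SS
# far below the Type I SS means the term's contribution overlaps with others.
for v in type1:
    ratio = type2[v] / type1[v]
    print(f"{v:>2}  Type I {type1[v]:>12.4f}  Type II {type2[v]:>10.4f}  ratio {ratio:.3f}")
```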
3) In an experimental study, a researcher incorporated a polychlorinated biphenyl (PCB) mixture in the diets of lab mice fed ad libitum. Rates were 0, 62.5, 250 and 1000 ppm. After two weeks, the mice were injected with Nembutal and their sleeping times were recorded. Each diet was randomly assigned to three mice, for a total of 12 mice. The researcher wants to determine whether the four PCB diets (treatments) have an effect on sleeping times and whether this effect, if significant, may be presented as a linear relationship between sleeping time and PCB rate.
Analysis of variance table for Treatments
Dependent Variable: SleepingTimes

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Treatment          3      5352.916667   1784.305556     12.10   0.0024
Error              8      1180.000000    147.500000
Corrected Total   11      6532.916667
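The entries of this ANOVA table can be reconstructed from the SS and df alone, as in the sketch below (names are illustrative).

```python
# Entries from the one-way ANOVA for the four PCB diets.
ss_trt, df_trt = 5352.916667, 3
ss_err, df_err = 1180.0, 8

ms_trt = ss_trt / df_trt    # Treatment mean square
mse = ss_err / df_err       # Error mean square
f_value = ms_trt / mse      # F with (3, 8) df

print(round(ms_trt, 6), mse, round(f_value, 2))     # 1784.305556 147.5 12.1
print(round(ss_trt + ss_err, 6), df_trt + df_err)   # 6532.916667 11
```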
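a) Give the model (regression) sum of squares when fitting the following regression model:
$$Y_j = \beta_0 + \beta_1 X_j + \beta_2 X_j^2 + \beta_3 X_j^3 + e_j, \quad j = 1, \ldots, 12$$
SS[R] = 5352.916667, the Treatment SS: with only 4 distinct rates, the cubic has 4 parameters and passes exactly through the 4 treatment means, so it is equivalent to the one-way ANOVA model.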
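The reason SS[R] equals the Treatment SS is that a cubic in X has 4 coefficients and there are only 4 distinct rates, so the cubic can pass exactly through the 4 treatment means. The sketch below demonstrates this with exact rational arithmetic. The treatment means in it are HYPOTHETICAL, because the quiz does not report them; any four values lead to the same conclusion. The helper `solve_exact` is a plain Gauss-Jordan elimination written for this illustration.

```python
from fractions import Fraction

def solve_exact(a, b):
    """Solve a x = b by Gauss-Jordan elimination in exact rational arithmetic."""
    size = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(size):
            if r != col and m[r][col] != 0:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][size] / m[i][i] for i in range(size)]

# PCB rates (ppm) from the quiz; 62.5 is written as 125/2 to stay exact.
rates = [Fraction(0), Fraction(125, 2), Fraction(250), Fraction(1000)]
# HYPOTHETICAL treatment means: the quiz does not report them.
means = [Fraction(20), Fraction(35), Fraction(60), Fraction(70)]

design = [[x ** k for k in range(4)] for x in rates]   # columns 1, X, X^2, X^3
beta = solve_exact(design, means)
fitted = [sum(b * x ** k for k, b in enumerate(beta)) for x in rates]
print(fitted == means)   # True: the cubic reproduces every treatment mean
```

Since the fitted values equal the treatment means, the residual SS of the cubic is the within-treatment SS, SS[E] = 1180, and the regression SS is the Treatment SS.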
To answer whether a linear regression line is adequate to represent the relationship between Treatment (PCB-tailored diets) and mean response (SleepingTimes), the following analysis was carried out.
b) Fill in the blanks in the table below.

The GLM Procedure
Class Level Information
Class    Levels   Values
CRATES        4   0 62.5 250 1000

Dependent Variable: SleepingTimes
Source   DF     Type I SS   Mean Square   F Value   Pr > F
rates     1    542.688300    542.688300      3.68   0.0914
CRATES    2   4810.228367   2405.114183     16.31   0.0015

Type I SS for CRATES = 5352.916667 − 542.688300 = 4810.228367
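The blank for CRATES follows from the fact that the Type I SS for rates and CRATES partition the Treatment SS. A minimal sketch, with illustrative names:

```python
ss_trt = 5352.916667     # Treatment SS (3 df) from the one-way ANOVA
ss_rates = 542.688300    # Type I SS for the linear term 'rates' (1 df)

ss_crates = ss_trt - ss_rates   # what the treatments explain beyond the line
df_crates = 3 - 1               # treatment df left after the linear term
ms_rates = ss_rates / 1
ms_crates = ss_crates / df_crates

print(round(ss_crates, 6), df_crates)   # 4810.228367 2
print(round(ms_rates, 6), round(ms_crates, 4))
```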
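c) Calculate the F value for CRATES. What hypothesis is tested by this F test? Write out the hypotheses and p-value, and then interpret the results of this test.
F = MS(CRATES)/MSE = 2405.114183/147.5 = 16.31, with numerator df = 2 and denominator df = 8.
$H_0$: $\mu_i = \beta_0 + \beta_1 x_i$ for all four rates $x_i$ (the treatment means lie on a straight line; no lack of fit) vs $H_1$: the means do not all lie on a straight line (lack of fit).
p-value = 0.0015 < 0.05, so we reject $H_0$: the linear regression does not adequately describe the treatment means.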
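The sketch below reproduces both F values in the table by dividing each mean square by the full-model MSE = 147.5 (not by MS(rates)). For the 2-df lack-of-fit test it also recovers the printed p-value using the closed-form tail of the F(2, ν) distribution, P(F > f) = (1 + 2f/ν)^(−ν/2), which holds only for 2 numerator df; the p-value for `rates` would need a general F or t routine and is not computed here. Names are illustrative.

```python
mse, df_error = 147.5, 8      # full-model error mean square and its df

def f2_tail(f, nu):
    """P(F > f) for F with 2 and nu df (closed form valid only for 2 numerator df)."""
    return (1 + 2 * f / nu) ** (-nu / 2)

f_rates = 542.688300 / mse        # 1-df test of the linear term
f_crates = 2405.114183 / mse      # 2-df lack-of-fit test
p_crates = f2_tail(f_crates, df_error)

print(f"rates:  F(1, {df_error}) = {f_rates:.2f}")                         # rates:  F(1, 8) = 3.68
print(f"CRATES: F(2, {df_error}) = {f_crates:.2f}, p = {p_crates:.4f}")   # CRATES: F(2, 8) = 16.31, p = 0.0015
```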
d) Compute the pure error sum of squares.
Pure error SS = SS[E] from the full (one-way ANOVA) model = 1180.000 with 8 df; the pure error mean square is MSE = 147.5.
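The lack-of-fit decomposition can be checked numerically: the residual SS of the straight-line fit (Corrected Total SS minus the SS for rates, on 11 − 1 = 10 df) should split into lack-of-fit SS (2 df) plus pure error SS (8 df). A short sketch with illustrative names:

```python
ss_total = 6532.916667         # Corrected Total SS (11 df)
ss_rates = 542.688300          # straight-line regression SS (1 df)
ss_lack_of_fit = 4810.228367   # CRATES after rates (2 df)
ss_pure_error = 1180.000000    # SS[E] of the one-way ANOVA (8 df)

ss_resid_linear = ss_total - ss_rates             # residual SS of the line, 10 df
print(round(ss_resid_linear, 6))                  # 5990.228367
print(round(ss_lack_of_fit + ss_pure_error, 6))   # 5990.228367
print(ss_pure_error / 8)                          # 147.5
```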
e) Summarize the findings of this analysis. Would you recommend using a linear equation to represent the relationship between SleepingTimes and PCB diet content? Support your answer.
No. The lack-of-fit test from the linear regression of SleepingTimes on rates was significant (p = 0.0015 < 0.05), so there is evidence that the relationship between rates and SleepingTimes is not just linear but of higher degree, either quadratic or cubic. Testing the quadratic regression coefficient and the lack of fit from the quadratic model is necessary.