Exercises_Quiz2_Answers4and5.pdf

St512 Exercises SSII10

1.
I have 5 fish tanks with water at each of 4 pH values (4, 5, 6, and 7) in a
completely randomized design. In each of these 20 tanks I have 1 fish so
altogether there are 20 fish.
The weight gain of the fish in tank j at pH level i is $Y_{ij}$.
a)
Write down the linear model.
b)
The following model recognizes the fact that we have four groups from an experimental setting, where interest is in comparing the effects of the different pH levels on the response, weight gain:
$Y_{ij} = \mu + \tau_i + e_{ij}$, $i = 1, \dots, 4$ pH levels, $j = 1, \dots, 5$ tanks per pH level.
An associated regression model is given by
$Y_{ij} = \beta_0 + \beta_1 X_{1j} + \beta_2 X_{2j} + \beta_3 X_{3j} + \beta_4 X_{4j} + e_{ij}$, $i = 1, \dots, 4$ pH levels, $j = 1, \dots, 5$ tanks per pH level,
where
X1 = 1 if the observation belongs to pH level 1, and 0 otherwise;
similarly,
X2 = 1 if the observation belongs to pH level 2, and 0 otherwise;
X3 = 1 if the observation belongs to pH level 3, and 0 otherwise;
X4 = 1 if the observation belongs to pH level 4, and 0 otherwise.
These 0/1 dummy variables identify the group to which each observation belongs.
c)
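Write down the X matrix.
For the dummy variables X1-X4, the X matrix (columns: intercept, X1, X2, X3, X4) is given by
$$X = \begin{bmatrix}
1 & 1 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 1
\end{bmatrix}$$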
(taken from DD ST512 notes)
Note
a. This matrix is called the design matrix because it incorporates the experimental setting and the experimental factor of the study.
b. Because every observation belongs to exactly one pH level, the four dummy columns add up to the intercept column (X1 + X2 + X3 + X4 = 1 in every row). The columns of X are therefore linearly dependent (X has rank 4, not 5), X'X is singular, and one column could be dropped without losing any information about the effect of pH on weight gain.
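The rank deficiency described in note b can be checked numerically. The sketch below assumes Python with NumPy (the course itself uses SAS); it rebuilds the 20 x 5 X matrix in the same row order, 5 tanks per pH level, and confirms that the intercept column is the sum of the dummies.

```python
import numpy as np

# 4 pH levels (4, 5, 6, 7) with 5 tanks each, rows ordered as in the X matrix above
n_levels, n_tanks = 4, 5
group = np.repeat(np.arange(n_levels), n_tanks)  # 0,0,0,0,0,1,...,3,3

# Dummy columns X1..X4: 1 if the row belongs to that pH level, else 0
dummies = (group[:, None] == np.arange(n_levels)).astype(int)

# Full design matrix: intercept column followed by X1..X4
X = np.column_stack([np.ones(group.size, dtype=int), dummies])

print(X.shape)                                       # (20, 5)
print(np.array_equal(X[:, 0], dummies.sum(axis=1)))  # True: intercept = X1+X2+X3+X4
print(np.linalg.matrix_rank(X))                      # 4, one less than the 5 columns
print(np.linalg.matrix_rank(X.T @ X))                # 4, so the 5 x 5 matrix X'X is singular
```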
d)
I am interested in analyzing the following comparisons between the pH levels:
a. The largest vs. the smallest pH level
b. The two central pH levels
c. Find a third comparison that is orthogonal to the first two.
This question is directly tied to the main objective of running an experiment: we want to be able to say something about the pH levels studied and their effects on (and differences in) the response. A simple way to do this is to set up comparisons of interest to the researcher. These comparisons are described as linear combinations of the pH-level effects (group means). A linear combination whose coefficients sum to zero is called a contrast. In a contrast we compare two groups of levels, one with positive coefficients and the other with negative coefficients, so that the sum of all coefficients is 0 (levels not involved in the comparison get a coefficient of 0). Two contrasts C1 and C2 are orthogonal if the sum of the cross products of their coefficients is zero, SUM_i C1_i*C2_i = 0. In our case a complete set of mutually orthogonal contrasts has (number of pH levels - 1) = 3 contrasts, and among the several possible sets we can choose the one that best serves our objectives, if possible. Objectives cannot always be represented by an orthogonal set.
How do we choose the coefficients of each orthogonal contrast?
Read them off the description of each comparison:

pH level                        4    5    6    7
C1: lowest vs. largest         -1    0    0    1
C2: compare middle levels       0   -1    1    0
Note that the coefficients are listed in the natural order of the pH levels, the same order used for the columns of the X matrix (this is important!). Now:
Are C1 and C2 orthogonal?
$C_1 = -1\,\mu_{pH4} + 0\,\mu_{pH5} + 0\,\mu_{pH6} + 1\,\mu_{pH7}$
$C_2 = 0\,\mu_{pH4} - 1\,\mu_{pH5} + 1\,\mu_{pH6} + 0\,\mu_{pH7}$
$\sum_i c_{1i} c_{2i} = (-1)(0) + (0)(-1) + (0)(1) + (1)(0) = 0$, thus C1 and C2 are orthogonal.
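The two checks used here (coefficients summing to zero, and a zero sum of cross products) can be written as small helpers. This is a Python/NumPy sketch; the names `is_contrast` and `are_orthogonal` are illustrative, not from the notes, and the plain cross-product rule assumes equal numbers of tanks per pH level, as in this design.

```python
import numpy as np

def is_contrast(c):
    """A contrast is a linear combination whose coefficients sum to zero."""
    return np.isclose(np.sum(c), 0.0)

def are_orthogonal(c1, c2):
    """With equal group sizes, two contrasts are orthogonal if sum_i c1_i * c2_i = 0."""
    return np.isclose(np.dot(c1, c2), 0.0)

# Coefficients in the pH order 4, 5, 6, 7
C1 = np.array([-1, 0, 0, 1])  # lowest vs largest
C2 = np.array([0, -1, 1, 0])  # compare middle levels

print(is_contrast(C1), is_contrast(C2))  # True True
print(np.dot(C1, C2))                    # 0
print(are_orthogonal(C1, C2))            # True
```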
How many orthogonal contrasts do we have available?
There are 4 - 1 = 3 orthogonal contrasts that may be used instead of the dummy variables to analyze the effect of pH on fish weight gain in this experiment.
We need a third contrast, C3, orthogonal to C1 and C2. A suggestion is to compare the two groups involved in C1 and C2, i.e., (pH 4 and 7) vs. (pH 5 and 6).

pH level                        4    5    6    7
C3: extremes vs. middle         1   -1   -1    1
C1*C3 = (-1)(1) + (0)(-1) + (0)(-1) + (1)(1) = 0
C2*C3 = (0)(1) + (-1)(-1) + (1)(-1) + (0)(1) = 0
Thus C1, C2, and C3 are an orthogonal set of contrasts that may be used instead
of the dummy variables to analyze the effect of pH on weight gain.
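To see why the orthogonal set can replace the dummy variables, the sketch below (Python/NumPy, assumed for illustration) codes each of the 20 tanks by its C1, C2 and C3 coefficients instead of X1 to X4. Unlike the dummy-variable X matrix, this intercept-plus-contrasts matrix has full column rank and a diagonal X'X.

```python
import numpy as np

# Orthogonal contrast coefficients in the pH order 4, 5, 6, 7
contrasts = np.array([
    [-1,  0,  0, 1],   # C1: lowest vs largest
    [ 0, -1,  1, 0],   # C2: compare middle levels
    [ 1, -1, -1, 1],   # C3: extremes vs middle
])

n_tanks = 5
group = np.repeat(np.arange(4), n_tanks)  # pH level index of each of the 20 rows

# Intercept plus one column per contrast, in place of the dummies X1..X4
Xc = np.column_stack([np.ones(group.size), contrasts[:, group].T])

XtX = Xc.T @ Xc
print(Xc.shape)                                 # (20, 4)
print(np.linalg.matrix_rank(Xc))                # 4: full column rank, no redundant column
print(np.allclose(XtX, np.diag(np.diag(XtX))))  # True: X'X is diagonal
print(np.diag(XtX))                             # [20. 10. 10. 20.]
```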
2.
Here is a PROC PRINT and a PROC REG where you see I have 5 treatments A through E
(from a completely randomized design) and some ORTHOGONAL columns C1 through C4.
My model is the usual:
$Y_{ij} = \mu + \tau_i + e_{ij}$, $i = 1, \dots, 5$ treatments, $j = 1, \dots, 3$ repetitions (per treatment).
SAS output:
5 treatments, completely randomized design

TRT   Y     C1    C2    C3    C4
A     ??     1    -1     0     2
A     ??     1    -1     0     2
A     ??     1    -1     0     2
B     ??     0     0    -1    -3
B     ??     0     0    -1    -3
B     ??     0     0    -1    -3
C     ??    -1    -1     0     2
C     ??    -1    -1     0     2
C     ??    -1    -1     0     2
D     ??     0     2     0     2
D     ??     0     2     0     2
D     ??     0     2     0     2
E     ??     0     0     1    -3
E     ??     0     0     1    -3
E     ??     0     0     1    -3
C1 is A vs C
C2 is D vs (A and C)
C3 is B vs E
C4 is (A, C, and D) vs (B and E)
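The claim that C1 through C4 are orthogonal (to each other and to the intercept) can be verified from the listing. This Python/NumPy sketch, which is not part of the SAS program, rebuilds the four 15-row columns and checks that X'X is diagonal; its diagonal entries are the sums of squared coefficients of each column.

```python
import numpy as np

# Per-treatment coefficients copied from the PROC PRINT listing (A, B, C, D, E)
coef = {
    "C1": [1, 0, -1, 0, 0],     # A vs C
    "C2": [-1, 0, -1, 2, 0],    # D vs (A and C)
    "C3": [0, -1, 0, 0, 1],     # B vs E
    "C4": [2, -3, 2, 2, -3],    # (A, C, D) vs (B, E)
}
reps = 3  # 3 repetitions per treatment

# Expand to the 15-row columns as they appear in the data set
cols = np.column_stack([np.repeat(coef[k], reps) for k in ("C1", "C2", "C3", "C4")])
X = np.column_stack([np.ones(len(cols)), cols])  # intercept, C1..C4

XtX = X.T @ X
print(np.allclose(XtX, np.diag(np.diag(XtX))))  # True: all columns mutually orthogonal
print(np.diag(XtX).astype(int).tolist())        # [15, 6, 18, 6, 90]
```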
Model: MODEL1
Dependent Variable: Y

Analysis of Variance

Source     DF   Sum of Squares   Mean Square   F Value   Prob>F
Model       4        631.00403     157.75101    32.621   0.0001
Error      10         48.35803       4.83580
C Total    14        679.36207
Parameter Estimates

Variable   DF   Parameter   Standard     T for H0:     Prob > |T|   Type I SS
                Estimate    Error        Parameter=0
INTERCEP    1   36.501133   0.56779123       64.286       0.0001     19984.98
C1          1    4.271500   0.89775677        4.758       0.0008     109.4742
C2          1   -0.350833   0.51832011       -0.677       0.5138     2.215507
C3          1   -7.514333   0.89775677       -8.370       0.0001      338.791
C4          1    1.416267   0.23179980        6.110       0.0001     180.5230

a)
Compute, if possible, the F test for the null hypothesis that the 5 treatments all have the same mean.
$H_0: \tau_1 = \tau_2 = \tau_3 = \tau_4 = \tau_5$
$F = \frac{\text{Model MS}}{\text{Error MS}} = \frac{157.75101}{4.83580} = 32.621$
The p-value is < 0.0001 (printed by SAS as 0.0001), so we reject the null hypothesis and conclude that at least one treatment mean differs significantly from the others.
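The arithmetic can be reproduced from the SS and DF entries of the ANOVA table. A minimal Python sketch (variable names are illustrative):

```python
# Entries of the Analysis of Variance table (SAS PROC REG output above)
ss_model, df_model = 631.00403, 4
ss_error, df_error = 48.35803, 10

ms_model = ss_model / df_model   # Model mean square
ms_error = ss_error / df_error   # Error mean square (MSE)
F = ms_model / ms_error          # overall F for H0: tau_1 = ... = tau_5

print(f"MS model = {ms_model:.5f}, MS error = {ms_error:.5f}, F = {F:.3f}")
```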
b) Compute, if possible, the F test for the null hypothesis that treatments E and B have
the same mean in the population. (H0: Tau(2) = Tau(5) )
Note that:
- the mean for treatment 1 is given by $\mu_1 = \mu + \tau_1$, $\mu_2 = \mu + \tau_2$, and so on;
- equality of means implies equality of effects;
- we want to test $H_0: \tau_2 = \tau_5$, or equivalently $H_0: \tau_2 - \tau_5 = 0$. This comparison (B vs. E) is exactly contrast C3, so the null hypothesis may be expressed as $H_0: C_3 = 0$.
From the ANOVA results, the requested F value is the F value for C3:
$F_{C3} = \frac{338.791 / 1}{4.83580} = 70.06 = (-8.370)^2 = t^2$ (the t statistic for C3).
The p-value is 0.0001 (i.e., < 0.0001), so we reject the null hypothesis and conclude that treatments B and E have significantly different means.
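For a single-df contrast the F test and the t test are equivalent. This Python sketch recomputes both from the printed C3 values; the small differences come only from rounding in the SAS output.

```python
# Values for C3 from the SAS output
ss_c3 = 338.791                     # Type I SS for C3 (1 df)
ms_error = 4.83580                  # Error mean square (10 df)
b3, se_b3 = -7.514333, 0.89775677   # parameter estimate and standard error

F_c3 = (ss_c3 / 1) / ms_error  # single-df F test for H0: C3 = 0
t_c3 = b3 / se_b3              # t statistic as printed by PROC REG

# For a 1-df contrast, F equals t^2 (up to rounding of the printed values)
print(f"F = {F_c3:.2f}, t = {t_c3:.3f}, t^2 = {t_c3**2:.2f}")
```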
c) Let b2 and b3 denote the estimated regression coefficients for columns C2 and C3
respectively.
Compute the standard error of the difference b2 - b3 from the standard
errors of the two coefficients. Standard error of b2 - b3 is ____
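$H_0: \beta_{C2} = \beta_{C3}$, or equivalently $H_0: \beta_{C2} - \beta_{C3} = 0$.
Since columns C2 and C3 are orthogonal (to each other and to the remaining columns), the estimates b2 and b3 are uncorrelated, so
$\operatorname{var}(b_2 - b_3) = \operatorname{var}(b_2) + \operatorname{var}(b_3)$ and $\operatorname{s.e.}(b_2 - b_3) = \sqrt{\operatorname{var}(b_2) + \operatorname{var}(b_3)}$,
which gives
$\operatorname{s.e.}(b_2 - b_3) = \sqrt{0.51832011^2 + 0.89775677^2} = 1.036640$.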
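The zero covariance can be seen directly in the estimated covariance matrix MSE * (X'X)^{-1}. The NumPy sketch below assumes the design from the PROC PRINT listing (3 reps per treatment) and the printed MSE; the vector `L` selecting b2 - b3 is an illustrative name.

```python
import numpy as np

ms_error = 4.83580  # Error mean square from the ANOVA table

# Per-treatment coefficients (A..E) for C1..C4, expanded to 3 reps each
coef = np.array([
    [1, 0, -1, 0, 0],    # C1
    [-1, 0, -1, 2, 0],   # C2
    [0, -1, 0, 0, 1],    # C3
    [2, -3, 2, 2, -3],   # C4
])
X = np.column_stack([np.ones(15), np.repeat(coef, 3, axis=1).T])  # intercept, C1..C4

# Estimated covariance matrix of (b0, b1, b2, b3, b4)
cov_b = ms_error * np.linalg.inv(X.T @ X)

# b2 - b3 (coefficients of C2 and C3) written as the linear combination L'b
L = np.array([0, 0, 1, -1, 0])
se_diff = np.sqrt(L @ cov_b @ L)

print(np.isclose(cov_b[2, 3], 0.0))                # True: b2 and b3 are uncorrelated
print(np.sqrt(cov_b[2, 2]), np.sqrt(cov_b[3, 3]))  # agree with the printed standard errors
print(round(se_diff, 6))                           # agrees with the hand calculation
```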
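d) Compute the partial SS for C2 and C4.
Because the columns are mutually orthogonal, each partial SS equals its sequential (Type I) SS:
Partial SS (C2 | int, C1, C3, C4) = 2.215507 = Seq SS (C2 | int, C1)
Partial SS (C4 | int, C1, C2, C3) = 180.5230 = Seq SS (C4 | int, C1, C2, C3)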
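For a column orthogonal to all the others, its SS does not depend on the order of entry and can be recomputed from its estimate as $b^2 \sum c^2$ (a standard least-squares identity, not stated in the notes). A pure-Python check against the Type I SS column, assuming the coefficients from the PROC PRINT listing:

```python
# Parameter estimates from PROC REG
estimates = {"C1": 4.271500, "C2": -0.350833, "C3": -7.514333, "C4": 1.416267}
# Per-treatment coefficients (A..E); each is repeated for 3 reps
coef = {
    "C1": [1, 0, -1, 0, 0],
    "C2": [-1, 0, -1, 2, 0],
    "C3": [0, -1, 0, 0, 1],
    "C4": [2, -3, 2, 2, -3],
}
reps = 3

# SS for an orthogonal column: b^2 * (sum of squared coefficients over all 15 rows)
ss = {k: estimates[k] ** 2 * reps * sum(c * c for c in coef[k]) for k in coef}

for k, v in ss.items():
    print(f"{k}: {v:.4f}")
print(f"total: {sum(ss.values()):.4f}")  # should match the Model SS up to rounding
```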
e) I want to test the null hypothesis that the coefficients of C2 and C3 can
simultaneously be set to 0 in the above regression.
Give the F test statistic $F_{2,10}$ = _________ for this hypothesis.
$H_0: \beta_{C2} = \beta_{C3} = 0$
The variables should enter the model in the order int, C1, C4, C2, C3, so that the two variables being tested enter last, and we use
$F = \frac{(SS_{\text{Regression, full}} - SS_{\text{Regression, reduced}})/2}{MS_{\text{Error}}} = \frac{(2.215507 + 338.791)/2}{4.83580} = 35.2585$
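The extra-sum-of-squares arithmetic, as a minimal Python sketch (variable names are illustrative; the SS values are the Type I SS, which here equal the SS for C2 and C3 entering last):

```python
# Sequential SS for C2 and C3 when they enter last (equal to their Type I SS here)
ss_c2, ss_c3 = 2.215507, 338.791
ms_error, df_error = 4.83580, 10
q = 2  # number of coefficients set to 0 under H0

# Extra sum of squares F test: [(SS_reg,full - SS_reg,reduced) / q] / MSE
extra_ss = ss_c2 + ss_c3
F = (extra_ss / q) / ms_error
print(f"F({q}, {df_error}) = {F:.4f}")
```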
Note that because the columns C1 through C4 are mutually orthogonal, the Type I (sequential) and Type II (partial) SS are the same, so the Type I SS from the original fit can be used directly. In general this shortcut is not valid, and we would have to rerun the model with C2 and C3 entering last (sequentially).