Practice_Questions_Final_ST512.pdf

The following exercises were taken from Dr. D. Dickey's website for ST512: http://www.stat.ncsu.edu/people/dickey/courses/st512/
Question I
I use multiple regression to fit the model
Y = beta_0 + beta_1 X1 + beta_2 X2 + beta_3 X3 + e
and here is part of the PROC REG output, including the Type I
(sequential) sums of squares:

                          Analysis of Variance

                              Sum of          Mean
Source              DF       Squares        Square    F Value    Pr > F
Model             (____)    23.14600       7.71533       9.87    0.0098
Error             ( __ )     4.69000       0.78167
Corrected Total   ( __ )    27.83600
                          Parameter Estimates

                 Parameter    Standard
Variable    DF    Estimate       Error    t Value    Pr > |t|    Type I SS
Intercept    1    14.20813     3.03548       4.68      0.0034    806.40400
x1           1    -0.39313    (______)      *****      ******    (_______)
x2           1    -0.62438     0.29491      -2.12      0.0786      2.57188
x3           1     0.43750     0.31903       1.37      0.2193      1.47000
The X'X matrix and its inverse are

        [ 10      25      75      10    ]
X'X  =  [ 25     145     100      65    ]
        [ 75     100     665      35    ]
        [ 10      65      35      37.68 ]

                   [ 11.79   -1.41    -1.14     0.36 ]
inverse of X'X  =  [ -1.41    0.20     0.13    -0.10 ]
                   [ -1.14    0.13     0.11    -0.03 ]
                   [  0.36   (_____)  -0.03     0.13 ]
(A) Fill in the 6 blanks in the display above. Do not use the inverse
of X'X in calculating elements of X'X - it will not be accurate enough.
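A minimal Python sketch of the arithmetic behind the blanks in (A), assuming the model has an intercept plus three slopes and reading n from the (1,1) entry of X'X. The helper `invert` is a hypothetical plain Gauss-Jordan routine (no external libraries), used so that the x1 standard error comes from a full-precision inverse rather than the rounded two-decimal display.

```python
def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting; fine for a small dense matrix."""
    n = len(matrix)
    # Augment each row with the matching row of the identity: [A | I].
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# X'X from the display (order: intercept, x1, x2, x3).
XtX = [
    [10.0, 25.0, 75.0, 10.0],
    [25.0, 145.0, 100.0, 65.0],
    [75.0, 100.0, 665.0, 35.0],
    [10.0, 65.0, 35.0, 37.68],
]

n_obs = int(XtX[0][0])          # (1,1) entry of X'X is the number of observations
n_slopes = 3
df_model = n_slopes
df_error = n_obs - n_slopes - 1
df_total = n_obs - 1

ss_model, ss_error = 23.146, 4.690
mse = ss_error / df_error       # should match the printed 0.78167

# Type I (sequential) SS for x1, x2, x3 add up to the model SS.
type1_x1 = ss_model - 2.57188 - 1.47000

C = invert(XtX)
se_x1 = (C[1][1] * mse) ** 0.5  # standard error of the x1 estimate
blank_inverse = C[3][1]         # row x3, column x1; equals C[1][3] by symmetry

print("DF:", df_model, df_error, df_total)
print("Type I SS x1:", round(type1_x1, 5))
print("SE x1:", round(se_x1, 4))
print("missing inverse entry:", round(blank_inverse, 2))
```

The Type I SS for x1 relies only on the sequential sums of squares for the three slopes adding to the model SS, and the missing inverse entry follows from symmetry alone (it should match the entry in row x1, column x3).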
(B) (15 pts.) Calculate F tests for these hypotheses. In each case the
alternative H1 is just "not H0."
H0: beta_2=0.
F = ______________
H0: beta_2=beta_3=0
F = _______________
H0: beta_1+beta_2+beta_3=0, beta_2-beta_3=0, and beta_1-beta_2=0
F = __________________
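The three F statistics in (B) can be checked with a short Python sketch. It assumes, as the Type I SS column suggests, that the variables were entered in the order x1, x2, x3, so the Type I SS of x2 and x3 together form the extra sum of squares for the two-df test. It also checks that the three restrictions in the last hypothesis are linearly independent, so together they force beta_1 = beta_2 = beta_3 = 0.

```python
mse = 4.690 / 6                  # error mean square (SS / df)

# H0: beta_2 = 0. One restriction, so F is the square of the printed t ratio.
b2, se_b2 = -0.62438, 0.29491
f_beta2 = (b2 / se_b2) ** 2

# H0: beta_2 = beta_3 = 0. With the order x1, x2, x3, the Type I SS of
# x2 and x3 together equal SS(x2, x3 | intercept, x1) on 2 df.
f_beta2_beta3 = ((2.57188 + 1.47000) / 2) / mse

# Third H0: rows of the restriction matrix on (beta_1, beta_2, beta_3).
R = [[1, 1, 1],    # beta_1 + beta_2 + beta_3 = 0
     [0, 1, -1],   # beta_2 - beta_3 = 0
     [1, -1, 0]]   # beta_1 - beta_2 = 0


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion on the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# A nonzero determinant means the three restrictions are equivalent to
# beta_1 = beta_2 = beta_3 = 0, i.e. the overall model F test.
f_all = (23.146 / 3) / mse

print(round(f_beta2, 2), round(f_beta2_beta3, 2), det3(R), round(f_all, 2))
```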
(C) (5 pts.) Compute the standard error of b1 - 2 b2, where b1 and b2
are the estimates of beta_1 and beta_2.
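For (C), Var(b1 - 2 b2) = MSE (c11 + 4 c22 - 4 c12), where c11, c22, and c12 are the x1/x2 entries of the inverse of X'X. A hedged Python sketch, assuming the X'X shown in the display and again using a hypothetical `invert` helper (plain Gauss-Jordan) so the entries are not the rounded two-decimal ones:

```python
def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting; fine for a small dense matrix."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# X'X in the order intercept, x1, x2, x3.
XtX = [
    [10.0, 25.0, 75.0, 10.0],
    [25.0, 145.0, 100.0, 65.0],
    [75.0, 100.0, 665.0, 35.0],
    [10.0, 65.0, 35.0, 37.68],
]
mse = 4.690 / 6

C = invert(XtX)
# Coefficient vector for b1 - 2 b2 over (intercept, x1, x2, x3).
a = [0.0, 1.0, -2.0, 0.0]
var_factor = sum(a[i] * C[i][j] * a[j] for i in range(4) for j in range(4))
se_contrast = (var_factor * mse) ** 0.5

# Same formula with the rounded two-decimal inverse from the display.
rounded_factor = 0.20 + 4 * 0.11 - 4 * 0.13
se_rounded = (rounded_factor * mse) ** 0.5

print(round(se_contrast, 4), round(se_rounded, 4))
```

The two results differ noticeably (roughly 0.29 versus 0.31), which illustrates the warning in (A) about the limited accuracy of the rounded inverse.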
Question II
I used 4 drugs, each on three patients. In addition to the response
variable Y, I have these dummy variables X1, X2, and X3 in my dataset.
Note that there is no dummy variable for drug C.

drug   X1   X2   X3
 A      1    0    0
 A      1    0    0
 A      1    0    0
 B      0    1    0
 B      0    1    0
 B      0    1    0
 C      0    0    0
 C      0    0    0
 C      0    0    0
 D      0    0    1
 D      0    0    1
 D      0    0    1

I ran PROC REG; MODEL Y = X1 X2 X3; getting this partial output:

                          Parameter Estimates

                    Parameter    Standard
Variable     DF      Estimate       Error    t Value    Pr > |t|
Intercept     1      11.00000     1.47196       7.47      <.0001
X1            1       9.00000     2.08167       4.32      0.0025
X2            1       6.00000     2.08167       2.88      0.0204
X3            1       8.00000     2.08167       3.84      0.0049
(A) Give the sample average response for drug A ______ and for drug C ______.
(B) Give a t test for testing the null hypothesis that drugs B and D
have the same mean response. t = _________
(C) Find, if possible, the error mean square, MSE = _______, for this
regression.
(D) I create a t statistic whose numerator is the mean response
for drug A minus the average of the B and C means. Complete this
formula for the standard error of this linear combination of
means (i.e., the denominator of t): std error = sqrt[(________) MSE],
where, as always, MSE is the error mean square.
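A minimal Python sketch of (A) through (D), assuming the reference-cell coding shown in the table (drug C is the baseline, so the intercept is the C mean and each slope is a difference from C) and r = 3 patients per drug.

```python
from fractions import Fraction

r = 3                                  # patients per drug
intercept, b1, b2, b3 = 11.0, 9.0, 6.0, 8.0
se_intercept = 1.47196

# (A) Drug means: C is the baseline; X1 -> A, X2 -> B, X3 -> D.
means = {"C": intercept, "A": intercept + b1,
         "B": intercept + b2, "D": intercept + b3}

# (C) The intercept is the C mean, whose variance is MSE / r.
mse = r * se_intercept ** 2

# (B) B versus D: a difference of two means, each based on r observations.
t_b_vs_d = (means["B"] - means["D"]) / (mse * (1 / r + 1 / r)) ** 0.5

# (D) Coefficients (1, -1/2, -1/2) on the A, B, C means:
#     Var = MSE * sum(c_i^2) / r, so the blank is sum(c_i^2) / r.
coefs = [Fraction(1), Fraction(-1, 2), Fraction(-1, 2)]
multiplier = sum(c * c for c in coefs) / r

print(means, round(mse, 3), round(t_b_vs_d, 3), multiplier)
```

The same denominator, sqrt(MSE (1/3 + 1/3)), reproduces the printed slope standard error 2.08167, since each slope is also a difference of two drug means.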
III.
An experiment is done to investigate the effects of factor S=salinity
and factor T=water temperature on growth of a certain type of underwater
plant. Three equally spaced levels of each factor are investigated in a
factorial treatment arrangement. Each replicate of the experiment uses 9
containers, one for each treatment combination. The 9 treatments are
assigned at random to the containers, each of which has been stocked with
young plants. After 1 month the plants are harvested and growth measured.
The equipment is cleaned out, new plants inserted and the whole process is
repeated the next month and the month after for a total of 3 replicates
(REP).
1. How many observations do we have in this experiment? _______
2. Compute the error degrees of freedom resulting from each of
these analyses (SAS code is given for clarity; note the CLASS
statements: categorical factors are listed there).
a. Factorial ANOVA with interaction, no blocks.
   PROC GLM;
   CLASS S T;
   MODEL YIELD = S T S*T;
   error df = ______
b. Factorial ANOVA with blocks, no interaction.
   PROC GLM;
   CLASS REP S T;
   MODEL YIELD = REP S T;
   error df = ______
c. Linear regression in S and T with blocks, no interaction.
   PROC GLM;
   CLASS REP;
   MODEL YIELD = REP S T;
   error df = ______
d. Linear regression in S and T with (linear by linear) interaction,
   no blocks.
   PROC GLM;
   MODEL YIELD = S T S*T;
   error df = ______
e. Full quadratic surface in S and T, no blocks.
   PROC GLM;
   MODEL YIELD = S S*S T T*T T*S;
   error df = ______
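The degrees-of-freedom bookkeeping for questions 1 and 2 can be sketched in Python: error df = (n - 1) minus the model df, counting k - 1 df for a CLASS factor with k levels and 1 df for each term in a continuous variable. This assumes no missing containers.

```python
reps, s_levels, t_levels = 3, 3, 3
n = reps * s_levels * t_levels   # question 1: one container per treatment per rep

model_df = {
    # a. CLASS S T with S*T: the full two-way treatment model.
    "a": (s_levels - 1) + (t_levels - 1) + (s_levels - 1) * (t_levels - 1),
    # b. CLASS REP S T, no interaction.
    "b": (reps - 1) + (s_levels - 1) + (t_levels - 1),
    # c. REP is CLASS; S and T are continuous (1 df each).
    "c": (reps - 1) + 1 + 1,
    # d. S, T, and their product, all continuous.
    "d": 3,
    # e. S, S*S, T, T*T, T*S, all continuous.
    "e": 5,
}
error_df = {part: (n - 1) - df for part, df in model_df.items()}
print(n, error_df)
```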
3. Some of the models in question 2 include block effects (REP)
and some don't. Based on the description of the experiment, which is
appropriate? (Include REP: the experiment was replicated in blocks of time.)
4. Here are some treatment totals. You may want to recall
that the orthogonal polynomial coefficients for a factor at 3 equally
spaced levels are
   Linear:     -1   0   1
   Quadratic:  -1   2  -1
Treatment totals:

                    T
              1     2     3
         --------------------|
     1   |   50    44    46  |   140
  S  2   |   45    35    40  |   120
     3   |   35    21    24  |    80
         --------------------
            130   100   110      340
Compute the sum of squares for the linear effect of T within level 1 of S _______
Compute the sum of squares for the T linear by S linear interaction ______
Each of the above sums of squares is associated with a contrast in the 9
totals in our table. Are these 2 contrasts orthogonal? (yes, no)
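A hedged Python sketch of the two contrast sums of squares, using SS = L^2 / (r * sum of c^2) for a contrast L = sum of c * total, with r = 3 replicates per total. It assumes the table rows are S = 1: 50, 44, 46; S = 2: 45, 35, 40; S = 3: 35, 21, 24 (columns T = 1, 2, 3).

```python
totals = [
    [50, 44, 46],   # S = 1
    [45, 35, 40],   # S = 2
    [35, 21, 24],   # S = 3
]
r = 3
linear = [-1, 0, 1]


def contrast_ss(coef):
    """SS for a contrast in the 3x3 treatment totals."""
    L = sum(coef[i][j] * totals[i][j] for i in range(3) for j in range(3))
    denom = r * sum(coef[i][j] ** 2 for i in range(3) for j in range(3))
    return L * L / denom


# T linear within S = 1: linear coefficients on the first row, zeros elsewhere.
c_t_in_s1 = [[linear[j] if i == 0 else 0 for j in range(3)] for i in range(3)]
# T linear by S linear: products of the two sets of linear coefficients.
c_tl_sl = [[linear[i] * linear[j] for j in range(3)] for i in range(3)]

ss_t_in_s1 = contrast_ss(c_t_in_s1)
ss_tl_sl = contrast_ss(c_tl_sl)
# With equal replication, two contrasts are orthogonal iff the
# products of their coefficients sum to zero.
dot = sum(c_t_in_s1[i][j] * c_tl_sl[i][j] for i in range(3) for j in range(3))
print(round(ss_t_in_s1, 4), round(ss_tl_sl, 4), dot)
```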
IV.
A multiple regression equation Yt = Beta0 + Beta1 X1t + Beta2 X2t
+ Beta3 X3t + et is fit to some data. We obtain:
              df     Type I     Type II
X1             1       180          40
X2             1        90         100
X3             1        50          50
Error         20       300

            [ .03   .05   .04   .01 ]
(X'X)^-1 =  [ .05   .30   .20   .12 ]
            [ .04   .20   .48   .10 ]
            [ .01   .12   .10   .01 ]
Give, if possible, the computed F statistic for testing:
(a) H0: Beta2 = Beta3 = 0          F = ___________________
(b) H0: Beta2 = 0                  F = ___________________
Parts (c), (d), and (e) should be ignored.
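A minimal Python sketch for (a) and (b), assuming the Type I sums of squares were computed in the order listed (X1, X2, X3) and that the Error line gives 20 df and SS 300.

```python
mse = 300 / 20                      # error SS / error df

# (a) X2 and X3 enter after X1, so Type I SS(X2) + Type I SS(X3)
#     is the extra SS for dropping both, on 2 df.
f_a = ((90 + 50) / 2) / mse

# (b) Type II SS for X2 is its SS adjusted for X1 and X3, on 1 df.
f_b = (100 / 1) / mse

print(round(f_a, 3), round(f_b, 3))
```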
V.
A factorial experiment has quantitative factors A at 3 equally spaced
levels and B at 4 equally spaced levels. The 10 replications are in
blocks. Here are the totals for the A x B combinations, each being a
total of 10 original observations:
          b0      b1      b2      b3
       +-------------------------------+
   a0  :  530  :  700  :  720  :  850  :
       :-------:-------:-------:-------:
   a1  :  410  :  500  :  620  :  700  :
       :-------:-------:-------:-------:
   a2  :  400  :  470  :  500  :  650  :
       +-------------------------------+

Linear orthogonal polynomials:
   4 levels:   -3   -1    1    3
   3 levels:   -1    0    1

BLOCK SS = 5000        Total SS = 28000
Compute the sums of squares for:
   AL = A linear         ______________________
   BL = B linear         ______________________
   AL x BL               ______________________
The test statistic which tests the null hypothesis that nothing other
than the above effects is needed to describe the effects of A and B:
F = _________________________________
How many numerator ________ and denominator ________ degrees of freedom
are there for F?
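The following Python sketch computes the contrast sums of squares and the lack-of-fit F test, assuming each cell total sums 10 observations (one per block), the 10 blocks use 9 df, and Total SS = 28000 is the corrected total. The lack-of-fit SS is the treatment SS minus the three contrast SS, on 11 - 3 = 8 df.

```python
totals = [
    [530, 700, 720, 850],   # a0
    [410, 500, 620, 700],   # a1
    [400, 470, 500, 650],   # a2
]
r = 10                       # observations per cell total
a_lin = [-1, 0, 1]
b_lin = [-3, -1, 1, 3]
block_ss, total_ss = 5000.0, 28000.0


def contrast_ss(coef):
    """SS = L^2 / (r * sum c^2) for a contrast in the 12 cell totals."""
    L = sum(coef[i][j] * totals[i][j] for i in range(3) for j in range(4))
    return L * L / (r * sum(c * c for row in coef for c in row))


ss_al = contrast_ss([[a_lin[i]] * 4 for i in range(3)])
ss_bl = contrast_ss([b_lin[:] for _ in range(3)])
ss_albl = contrast_ss([[a_lin[i] * b_lin[j] for j in range(4)] for i in range(3)])

n = r * 12
grand = sum(sum(row) for row in totals)
correction = grand ** 2 / n
ss_trt = sum(t * t for row in totals for t in row) / r - correction

ss_lof = ss_trt - ss_al - ss_bl - ss_albl     # what the 3 contrasts miss
df_lof = 12 - 1 - 3
ss_error = total_ss - block_ss - ss_trt
df_error = (n - 1) - (r - 1) - (12 - 1)         # 119 - 9 - 11
f_lof = (ss_lof / df_lof) / (ss_error / df_error)
print(ss_al, round(ss_bl, 2), ss_albl, round(f_lof, 3), df_lof, df_error)
```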
VI. Eight trees are selected at random and from each tree 12 identical
boards are cut for a total of 96 boards. Breaking strengths of the
boards are measured. From each tree, three randomly selected boards are
broken while dry at temperature 40 degrees F, three are broken wet at 40
degrees F, three dry at 90 degrees F and three wet at 90 degrees F. Let
factor A be temperature, B be tree, and C be wetness. Label as random or
fixed.
Factor A is (random, fixed)   fixed
Factor B is (random, fixed)   random
Factor C is (random, fixed)   fixed
NOTE: Wetness is not selected at random from a normal population.
In a factorial experiment, the experimenter would try to hit the
same wetness (saturation level) each time, but might not be
successful. Still, half the numbers would be close to 0 and half
close to, say, 60% saturation so we would expect effects that
are bimodal, not normal looking. Also, wetness (like drug dosage,
fertilizer level, etc.) would likely be analyzed as a "regression"
type variable, that is, extension to other levels of wetness would
be done by running a (linear) regression rather than by describing
the variance of some normal population. These "regression" factors
are considered fixed.
Who would be interested in these results? Why was the experiment
done? Someone interested in breaking strength of wood in construction
applications might realize that items he constructs might be in warm
or cold, wet or dry environments and hence would be interested in
these (fixed) effects. Wood will come from various trees and
tree-to-tree variation is likely of interest, but not the comparison of
one tree to another since these were randomly selected (likely all of
one variety - e.g. knotty pine). Anyone interested in this experiment
will not want results restricted to these trees and extrapolation
by regression (on tree number ??) would obviously not be the correct
way to extrapolate (in contrast to wetness factor). Trees are thus random.
VII. Trucks deliver loads of corn to a breakfast cereal company. Several trucks were
selected at random and from each, six sample jars of corn (from six randomly selected
locations in the truck bed) were taken. Three of these (randomly selected) were
refrigerated and the other three left at room temperature. Call this factor “storage”. After
a period of time, an aflatoxin measurement Y was made on each jar. The company wants
to compare these two storage methods in terms of the effect on aflatoxin.
How many trucks _____ were selected?
There is only one fixed effect here. Which is it?
VIII. A 2x3 factorial set of treatments, namely 2 wheat varieties (A,B) and 3 levels of fertilizer
(1, 2, or 3 fertilizer applications) were set up for research. The experiment was done by selecting
5 farms at random in North Carolina, laying out 6 plots per farm and assigning the 6 treatment
combinations to the six plots randomly within each farm.
a. Present the layout of the experiment.
b. ANOVA table: SOURCES and DF. Indicate whether factors are random or fixed.
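One way to sketch the layout and the degrees of freedom in Python, assuming the analysis treats farms as random blocks and pools the farm-by-treatment interaction as error (one plot per treatment per farm). The seed is arbitrary and only makes this illustrative randomization reproducible.

```python
import random

varieties = ["A", "B"]
applications = [1, 2, 3]
treatments = [(v, f) for v in varieties for f in applications]
n_farms = 5

# a. Layout: an independent random assignment of the 6 treatments
#    to the 6 plots within each farm (randomized complete blocks).
rng = random.Random(2024)
layout = {}
for farm in range(1, n_farms + 1):
    plots = treatments[:]
    rng.shuffle(plots)
    layout[farm] = plots

for farm, plots in layout.items():
    print(f"Farm {farm}:", "  ".join(f"{v}{f}" for v, f in plots))

# b. ANOVA sources and df.
a, b = len(varieties), len(applications)
anova_df = {
    "Farm (random)": n_farms - 1,
    "Variety (fixed)": a - 1,
    "Fertilizer (fixed)": b - 1,
    "Variety x Fertilizer (fixed)": (a - 1) * (b - 1),
    "Error = Farm x Treatment (random)": (n_farms - 1) * (a * b - 1),
}
total_df = n_farms * a * b - 1
for source, df in anova_df.items():
    print(f"{source:36s} {df}")
print(f"{'Total':36s} {total_df}")
```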