The following exercises were taken from Dr. D. Dickey's web site for ST512:
http://www.stat.ncsu.edu/people/dickey/courses/st512/

Question I

I use multiple regression to fit the model

    Y = beta_0 + beta_1 X1 + beta_2 X2 + beta_3 X3 + e

and here is part of the PROC REG output, including the Type I (sequential)
sums of squares.

                         Analysis of Variance

                                Sum of        Mean
Source               DF        Squares      Square    F Value    Pr > F
Model              (____)     23.14600     7.71533       9.87    0.0098
Error              ( __ )      4.69000     0.78167
Corrected Total    ( __ )     27.83600

                         Parameter Estimates

                  Parameter    Standard
Variable    DF     Estimate       Error    t Value    Pr > |t|    Type I SS
Intercept    1     14.20813     3.03548       4.68      0.0034    806.40400
x1           1     -0.39313    (______)      *****      ******    (_______)
x2           1     -0.62438     0.29491      -2.12      0.0786      2.57188
x3           1      0.43750     0.31903       1.37      0.2193      1.47000

The X'X matrix and its inverse are

    X'X =    10      25      75      10
             25     145     100      65
             75     100     665      35
             10      65      35      37.68

    inverse of X'X =    11.79     -1.41     -1.14      0.36
                        -1.41      0.20      0.13     -0.10
                        -1.14      0.13      0.11     -0.03
                         0.36    (_____)    -0.03      0.13

(A) Fill in the 6 blanks in the display above. Do not use the inverse of
    X'X in calculating elements of X'X - it will not be accurate enough.

(B) (15 pts.) Calculate F tests for these hypotheses. In each case the
    alternative hypothesis is just "not H0."

    H0: beta_2 = 0                             F = ______________

    H0: beta_2 = beta_3 = 0                    F = ______________

    H0: beta_1 + beta_2 + beta_3 = 0,
        beta_2 - beta_3 = 0, and
        beta_1 - beta_2 = 0                    F = ______________

(C) (5 pts.) Compute the standard error of b1 - 2 b2, where b1 and b2 are
    the estimates of beta_1 and beta_2.

Question II

I used 4 drugs, each on three patients. In addition to the response
variable Y, I have these dummy variables X1, X2, and X3 in my dataset.
Note that there is no dummy variable for drug C.

    drug    X1    X2    X3
    A        1     0     0
    A        1     0     0
    A        1     0     0
    B        0     1     0
    B        0     1     0
    B        0     1     0
    C        0     0     0
    C        0     0     0
    C        0     0     0
    D        0     0     1
    D        0     0     1
    D        0     0     1

I ran PROC REG; MODEL Y = X1 X2 X3; getting this partial output:

                         Parameter Estimates

                  Parameter    Standard
Variable    DF     Estimate       Error    t Value    Pr > |t|
Intercept    1     11.00000     1.47196       7.47      <.0001
X1           1      9.00000     2.08167       4.32      0.0025
X2           1      6.00000     2.08167       2.88      0.0204
X3           1      8.00000     2.08167       3.84      0.0049

(A) Give the sample average response for drug A ____ and for drug C _____

(B) Give a t test for testing the null hypothesis that drugs B and D have
    the same mean response. t = _________

(C) Find, if possible, the error mean square, MSE = _______ for this
    regression.

(D) I create a t statistic whose numerator is the mean response for drug A
    minus the average of the B and C means. Complete this formula for the
    standard error of this linear combination of means (i.e. the
    denominator of t):

        std error = sqrt[(________) MSE]

    where, as always, MSE is the error mean square.

III. An experiment is done to investigate the effects of factor S=salinity
and factor T=water temperature on growth of a certain type of underwater
plant. Three equally spaced levels of each factor are investigated in a
factorial treatment arrangement. Each replicate of the experiment uses 9
containers, one for each treatment combination. The 9 treatments are
assigned at random to the containers, each of which has been stocked with
young plants. After 1 month the plants are harvested and growth measured.
The equipment is cleaned out, new plants inserted and the whole process is
repeated the next month and the month after for a total of 3 replicates
(REP).

1. How many observations do we have in this experiment? _______

2. Compute the error degrees of freedom resulting from each of these
   analyses (SAS code is given for clarity - note the CLASS statements:
   categorical factors are listed there).

   a. Factorial ANOVA with interaction, no blocks.
      PROC GLM; CLASS S T; MODEL YIELD = S T S*T;
      error df = ______

   b. Factorial ANOVA with blocks, no interaction.
      PROC GLM; CLASS REP S T; MODEL YIELD = REP S T;
      error df = ______

   c. Linear regression in S and T with blocks, no interaction.
      PROC GLM; CLASS REP; MODEL YIELD = REP S T;
      error df = ______

   d. Linear regression in S and T with (linear by linear) interaction,
      no blocks.
      PROC GLM; MODEL YIELD = S T S*T;
      error df = ______

   e. Full quadratic surface in S and T, no blocks.
      PROC GLM; MODEL YIELD = S S*S T T*T T*S;
      error df = ______

3. Some of the models in question 2 include block effects (REP) and some
   don't. Based on the description of the experiment, which is appropriate?
   (Answer: include REP; the experiment was replicated in blocks of time.)

4. Here are some treatment totals. You may want to recall that the
   orthogonal polynomial coefficients for a factor at 3 equally spaced
   levels are

       Linear:      -1    0    1
       Quadratic:   -1    2   -1

   Treatment totals:

                       T
                 1     2     3
              -------------------
          1  |  50    44    46  |  140
       S  2  |  45    35    40  |  120
          3  |  35    21    24  |   80
              -------------------
               130   100   110     340

   Compute the sum of squares for linear effect of T within level 1 of S
   _______

   Compute the sum of squares for the T linear by S linear interaction
   _______

   Each of the above sums of squares is associated with a contrast in the
   9 totals in our table. Are these 2 contrasts orthogonal? (yes, no)

IV. A multiple regression equation

    Yt = Beta0 + Beta1 X1t + Beta2 X2t + Beta3 X3t + et

is fit to some data. We obtain:

             df    Type I    Type II
    X1        1      180        40
    X2        1       90       100
    X3        1       50        50
    Error    20      300

                 +                         +
                 :  .03   .05   .04   .01  :
    (X'X)^-1  =  :  .05   .30   .20   .12  :
                 :  .04   .20   .48   .10  :
                 :  .01   .12   .10   .01  :
                 +                         +

Give, if possible, the computed F statistic for testing:

(a) H0: Beta2 = Beta3 = 0      F = ___________________

(b) H0: Beta2 = 0              F = ___________________

Parts (c), (d), and (e) should be ignored.

V. A factorial experiment has quantitative factors A at 3 equally spaced
levels and B at 4 equally spaced levels. The 10 replications are in
blocks. Here are the totals for A and B, each being a total of 10 original
observations:

             b0      b1      b2      b3
          +-------------------------------+
      a0  :  530  :  700  :  720  :  850  :
          :-------:-------:-------:-------:
      a1  :  410  :  500  :  620  :  700  :
          :-------:-------:-------:-------:
      a2  :  400  :  470  :  500  :  650  :
          +-------------------------------+

    Linear Orthogonal Polynomials
      4 levels:   -3   -1    1    3
      3 levels:   -1    0    1

    BLOCK SS = 5000        Total SS = 28000

Compute the sums of squares for:

    AL = A linear ______________________
    BL = B linear ______________________
    AL x BL       ______________________

The test statistic which tests the null hypothesis that nothing other than
the above effects is needed to describe the effects of A and B:

    F = _________________________________

How many numerator ________ and denominator _________ degrees of freedom
for F?

VI. Eight trees are selected at random and from each tree 12 identical
boards are cut for a total of 96 boards. Breaking strengths of the boards
are measured. From each tree, three randomly selected boards are broken
while dry at temperature 40 degrees F, three are broken wet at 40 degrees
F, three dry at 90 degrees F and three wet at 90 degrees F. Let factor A be
temperature, B be tree, and C be wetness. Label as random or fixed.

    Factor A is (random, fixed)    Answer: fixed
    Factor B is (random, fixed)    Answer: random
    Factor C is (random, fixed)    Answer: fixed

NOTE: Wetness is not selected at random from a normal population. In a
factorial experiment, the experimenter would try to hit the same wetness
(saturation level) each time, but might not be successful. Still, half the
numbers would be close to 0 and half close to, say, 60% saturation, so we
would expect effects that are bimodal, not normal looking.
Also, wetness (like drug dosage, fertilizer level, etc.)
would likely be analyzed as a "regression" type variable, that is,
extension to other levels of wetness would be done by running a (linear)
regression rather than by describing the variance of some normal
population. These "regression" factors are considered fixed.

Who would be interested in these results? Why was the experiment done?
Someone interested in breaking strength of wood in construction
applications might realize that the items he constructs might be in warm or
cold, wet or dry environments and hence would be interested in these
(fixed) effects. Wood will come from various trees and tree-to-tree
variation is likely of interest, but not the comparison of one tree to
another, since these were randomly selected (likely all of one variety,
e.g. knotty pine). Anyone interested in this experiment will not want
results restricted to these trees, and extrapolation by regression (on tree
number ??) would obviously not be the correct way to extrapolate (in
contrast to the wetness factor). Trees are thus random.

VII. Trucks deliver loads of corn to a breakfast cereal company. Several
trucks were selected at random and from each, six sample jars of corn (from
six randomly selected locations in the truck bed) were taken. Three of
these (randomly selected) were refrigerated and the other three left at
room temperature. Call this factor “storage”. After a period of time, an
aflatoxin measurement Y was made on each jar. The company wants to compare
these two storage methods in terms of the effect on aflatoxin.

How many trucks _____ were selected? There is only one fixed effect here.
Which is it?

VIII. A 2x3 factorial set of treatments, namely 2 wheat varieties (A, B)
and 3 levels of fertilizer (1, 2, or 3 fertilizer applications), was set up
for research. The experiment was done by selecting 5 farms at random in
North Carolina, laying out 6 plots per farm and assigning the 6 treatment
combinations to the six plots randomly within each farm.

a. Present the layout of the experiment.

b. ANOVA table: SOURCES and DF. Indicate whether factors are random or
   fixed.
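
The randomization described in Question VIII (the 6 variety-by-fertilizer
combinations assigned at random to the 6 plots within each of 5 farms) can
be sketched in Python as below. This is only an illustration and not part
of the original exercise: the function name randomize_layout, the seed, and
the plot numbering are assumptions made for the sketch, and it does not
attempt the ANOVA table asked for in part b.

```python
import random

VARIETIES = ["A", "B"]          # 2 wheat varieties, as in the text
FERTILIZER_LEVELS = [1, 2, 3]   # 1, 2, or 3 fertilizer applications
N_FARMS = 5                     # farms play the role of blocks


def randomize_layout(n_farms=N_FARMS, seed=0):
    """Return {farm: [treatment on plot 1, ..., treatment on plot 6]}.

    Every farm receives all 6 variety-by-fertilizer combinations once,
    in an order shuffled independently for that farm.
    """
    rng = random.Random(seed)  # the seed is an illustrative assumption
    treatments = [(v, f) for v in VARIETIES for f in FERTILIZER_LEVELS]
    layout = {}
    for farm in range(1, n_farms + 1):
        plots = treatments[:]   # copy, so each farm starts from the full set
        rng.shuffle(plots)      # fresh randomization within this farm
        layout[farm] = plots
    return layout


if __name__ == "__main__":
    for farm, plots in randomize_layout().items():
        row = "  ".join(f"{v}{f}" for v, f in plots)
        print(f"Farm {farm}: {row}")
```

Each printed row is one farm, and the position within the row is the plot
number. Changing the seed gives a different, equally valid randomization.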