NCSU ST512 TAKE HOME QUIZ SUM 2 2011
July 23, 2011

1. A friend has asked you for help in fitting a line for a class project. He has collected world record running times at ten distances for men and women. He needs to fit a single linear trend over all data points to show the relationship between time and distance, ignoring the gender of the runner.

The world records at ten distances (outdoor running) are listed for men and women in the table below. The records were taken from the website of the International Association of Athletics Federations (IAAF), http://www.iaaf.org, on July 23, 2011. The marathon is a long-distance running event with an official distance of 42.195 kilometers (26 miles and 385 yards).

                      Male                        Female
    Distance (m)   time (sec)   date           time (sec)   date
    100                  9.77   14/06/2005          10.49   16/07/1988
    200                 19.32   01/08/1996          21.34   29/09/1988
    400                 43.18   26/08/1999          47.60   06/10/1985
    800                101.11   24/08/1997         113.28   26/07/1983
    1500               206.00   14/07/1998         230.46   11/09/1993
    3000               440.67   01/09/1996         486.11   13/09/1993
    5000               757.35   31/05/2004         864.53   03/06/2006
    10000             1577.53   26/08/2005        1771.78   08/09/1993
    21097.5           3535.00   15/01/2006        4004.00   15/01/1999
    42195             7495.00   28/09/2003        8125.00   13/04/2003

As requested, you run a simple linear regression on this dataset and present your friend with the results.

a) Write the estimated simple linear regression equation and test whether the linear regression coefficient is significantly different from 0. (Page 1)

\[ \widehat{time}_i = -67.83242 + 0.18424 \, distance_i \]

$H_0: \beta_1 = 0$, $H_1: \beta_1 \neq 0$, $\alpha = 0.05$

$t_{calc} = 68.41$, $p < 0.0001$; $68.41 > 2.101$ ($t_{18,\,0.025}$). Reject $H_0$. There is a significant linear relationship between time and distance for world running records at the 0.05 significance level.

b) As learned in class, you run a lack of fit test on this data to ensure that the linear fit is adequate.

Lack of fit test: test the hypothesis that a higher-degree polynomial may be needed.
(Page 6)

$H_0$: Linear regression is adequate
$H_1$: A higher-degree polynomial is needed
$\alpha = 0.05$

MSLOF = 8943.0, MSE = 35988.8
Calculated F for the lack of fit test: $F_{LOF} = 8943.0 / 35988.8 = 0.25$, p-value = 0.9699.

Do not reject $H_0$; there is not enough statistical evidence at the 0.05 significance level to reject $H_0$. The linear fit seems adequate.

Based on the lack of fit test, you decided that a linear trend is fine, and prepared a plot of the observed records and the linear trend.

c) Include the plot of predicted vs distance.

Still, a look at the residuals would not be a bad idea.

d) Present a plot of residuals against predicted values and discuss the linear regression fit. Does the plot of predicted against distance fit the data well? Does the residual plot show whether the linear fit was adequate?

e) After looking at the residuals, you decided to try a power function for this relationship. This power function is expressed as a linear function after a log transformation of both the x- and y-variables, as shown next:

\[ time = M \cdot distance^{b_1} \]
\[ \log_{10}(time) = \log_{10}(M) + b_1 \log_{10}(distance) \]
\[ y = b_0 + b_1 x, \quad \text{where } b_0 = \log_{10} M, \; y = \log_{10}(time), \; x = \log_{10}(distance) \]

f) Write the power function equation estimated for this data. What are the estimated values for M and $b_1$? (page 12)

\[ b_0 = \log_{10} M = -1.21326, \quad M = 10^{-1.21326} = 0.061198, \quad b_1 = 1.10905 \]
\[ \hat{y}_i = -1.21326 + 1.10905 \, x_i \]
\[ \widehat{time} = 0.061198 \cdot distance^{1.10905} \]

g) Compute the predicted mean for a distance of 100 meters, $time_{100}$. Repeat the computation for a distance of 1000 meters, $time_{1000}$.

Note that $time_{x_{new}} = 10^{\hat{y}}$ where $\hat{y} = b_0 + b_1 x_{new}$.

For distance = 100 meters, $x_{100} = \log_{10}(100) = 2$:
\[ \hat{y}_{distance=100} = -1.21326 + 1.10905 \times 2 = 1.00484, \quad \widehat{time}_{distance=100} = 10^{1.00484} = 10.11207 \text{ sec.} \]

For distance = 1000 meters, $x_{1000} = \log_{10}(1000) = 3$:
\[ \hat{y}_{distance=1000} = -1.21326 + 1.10905 \times 3 = 2.11389, \quad \widehat{time}_{distance=1000} = 10^{2.11389} = 129.984 \text{ sec.} \]

h) Find the ratio $time_{1000} / time_{100}$. Interpret $b_1$.
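The back-transformed predictions of part g) and the ratio asked for in part h) can be checked numerically. The following is a minimal Python sketch that plugs in the estimates reported in part f); the function name `predict_time` is an illustrative choice and is not part of the SAS analysis.

```python
import math

# Estimates reported in part f) for the log10-log10 fit.
b0 = -1.21326  # intercept, equal to log10(M)
b1 = 1.10905   # slope on log10(distance)

def predict_time(distance_m):
    """Predict time in seconds by back-transforming the log10-scale fit."""
    y_hat = b0 + b1 * math.log10(distance_m)
    return 10 ** y_hat

M = 10 ** b0
time_100 = predict_time(100)
time_1000 = predict_time(1000)
ratio = time_1000 / time_100

print(f"M         = {M:.6f}")
print(f"time_100  = {time_100:.5f} sec")
print(f"time_1000 = {time_1000:.3f} sec")
print(f"ratio     = {ratio:.4f}  (10**b1 = {10 ** b1:.4f})")
```

Because the model is linear on the log10 scale, the ratio depends only on $b_1$ and on the 10-fold change in distance, not on the starting distance.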
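\[ \log_{10}\!\left(\frac{time_{1000}}{time_{100}}\right) = \log_{10}(time_{1000}) - \log_{10}(time_{100}) = 1.10905 \]
\[ \frac{time_{1000}}{time_{100}} = \frac{129.984}{10.11207} = 12.8543 = 10^{1.10905} \]

For a 10-fold increase in distance, the new time is 12.8543 times the base time, since $10^{1.10905} = 12.8543$.

2. Estimate the mean record time for a distance of 25 km. Calculate the 95% confidence interval for this predicted mean.

\[ x_{25000} = \log_{10}(25000) = 4.39794 \]
\[ \hat{y}_{distance=25000} = -1.21326 + 1.10905 \times 4.39794 = 3.664275 \]
\[ \widehat{time}_{distance=25000} = 10^{3.664275} = 4616.102 \text{ sec.} \]
(SAS output predicted time is 4615.99.)

Confidence interval for the estimated mean time for a distance of 25000 meters

    The MEANS Procedure
    Variable        Corrected SS         Mean        Variance       Std Dev
    ------------------------------------------------------------------------
    distance          3949839136      9191.04       171732136      13104.66
    time               112592378      1485.19      5925914.65       2434.32
    LOG_TIME          16.8274621    2.4585597       0.8856559     0.9410929
    LOG_distance      16.1219348    3.3754829       0.7009537     0.8372298
    ------------------------------------------------------------------------

Confidence interval for $E(Y \mid X = x_0)$, the conditional mean of Y when distance = 25000, that is, $x_0 = \log_{10}(25000) = 4.39794$:

\[ \hat{\mu}_{Y|X=x_0} = -1.21326 + 1.10905 \times 4.39794 = 3.664275 \]
\[ \widehat{Var}\!\left(\hat{\mu}_{Y|X=x_0}\right) = MSE \left[ \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right] \]

The hand calculation substitutes

\[ MSE \left[ \frac{1}{20} + \frac{(3.664275 - 3.375483)^2}{16.121935} \right] = 0.00005903525 = 0.00768344^2, \qquad t_{18,\,0.05/2} = 2.100922 \]

The $100(1-\alpha)\%$ confidence interval for $\mu_{Y|X=x_0}$ is given by

\[ \hat{\mu}_{Y|X=x_0} \pm t_{n-2,\,\alpha/2} \sqrt{ MSE \left[ \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right] } = 3.664275 \pm 2.100922 \times 0.00768344 \]

95% confidence: $3.648133 \le \mu_{y|x=4.39794} \le 3.680417$

Lower limit for distance 25000: $10^{3.648133} = 4447.671$
Upper limit for distance 25000: $10^{3.680417} = 4790.902$

95% confidence: $4447.671 \le Time \mid distance = 25000 \le 4790.902$

In SAS output:
log10 scale: lower limit = 3.63890, upper limit = 3.68963
Original scale: lower limit = 4354.14, upper limit = 4893.59

Note that the hand calculation does not reproduce the SAS limits. The deviation term uses the predicted value 3.664275 where $x_0 = 4.39794$ belongs, and the MEANS summary for LOG_distance is based on 24 rows (corrected SS / variance = 23) rather than the 20 observations used in the fit. The SAS limits are the ones to report.

i) A plot of residuals against the predicted values is presented below. Discuss whether a separate fit is needed for males and females.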
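As a numerical companion to the residual plot in part i), the sketch below fits one common log10-log10 line to the records tabulated at the start of the quiz and compares the residuals by gender at each distance. It uses plain least squares in Python, not the SAS run, so the fitted coefficients need not match the SAS output digit for digit; the variable names are illustrative.

```python
import math

# World records from the table at the start of the quiz (seconds).
distances = [100, 200, 400, 800, 1500, 3000, 5000, 10000, 21097.5, 42195]
male = [9.77, 19.32, 43.18, 101.11, 206.00, 440.67, 757.35,
        1577.53, 3535.00, 7495.00]
female = [10.49, 21.34, 47.60, 113.28, 230.46, 486.11, 864.53,
          1771.78, 4004.00, 8125.00]

# Stack both genders into a single sample on the log10 scale.
x = [math.log10(d) for d in distances] * 2
y = [math.log10(t) for t in male + female]
n = len(x)

# Ordinary least squares for y = b0 + b1 * x.
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residuals from the common line, split back by gender.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
res_m = residuals[:len(distances)]
res_f = residuals[len(distances):]
mean_m = sum(res_m) / len(res_m)
mean_f = sum(res_f) / len(res_f)

print(f"b0 = {b0:.5f}, b1 = {b1:.5f}")
for d, rm, rf in zip(distances, res_m, res_f):
    print(f"{d:>8}  male {rm:+.4f}  female {rf:+.4f}")
print(f"mean residual: male {mean_m:+.4f}, female {mean_f:+.4f}")
```

Because each distance contributes one male and one female record at the same x, the female residual exceeds the male residual at every distance, and the two group means have opposite signs.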
[Figure: Residual plot. Residual (-0.06 to 0.06) against Predicted Value of LOG_TIME (1 to 4), by gender (F, M).]

Note that women tend to have higher residuals than men in a clear pattern, which may be evidence that a common linear equation for both men and women underestimates the record time for a given distance when the runner is a woman and overestimates it when the runner is a man. It is also important to note that these records provide just one observation per distance in the considered range for each group. When fitting a separate curve for each group, any statistical test is done against an MSE that basically measures the lack of fit from the linear trend. No measure of "pure error" is available.

2. The following data present the results of a study of the effect of ambient temperature and liquid viscosity on the amount of energy (joules/sec) honeybees spend while drinking. Temperature levels were 20 and 30 °C. Levels of liquid viscosity refer to the percent of sucrose in total solids dissolved in the liquid. There were two levels for sucrose, 20% and 40%. Each of the 4 combinations of temperature and viscosity was repeated three times in controlled conditions, randomly assigning the bees to each of the four experimental groups.

The following variables were used in the analysis to simplify calculations:

\[ X_1 = \frac{Temperature - 25}{5}, \qquad X_2 = \frac{Sucrose - 30}{10}, \qquad X_3 = X_1 X_2 \]

    Temperature   X1          Sucrose   X2
    20            -1          20        -1
    30             1          40         1

    Temperature   Sucrose   X3
    20            20         1
    20            40        -1
    30            20        -1
    30            40         1
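The coding of temperature and sucrose into X1, X2 and X3 can be sketched in a few lines of Python. The function name `code_levels` is a hypothetical helper introduced here for illustration only.

```python
def code_levels(temperature, sucrose):
    """Return the coded variables (x1, x2, x3) for one treatment combination."""
    x1 = (temperature - 25) / 5   # 20 -> -1, 30 -> +1
    x2 = (sucrose - 30) / 10      # 20 -> -1, 40 -> +1
    x3 = x1 * x2                  # interaction term
    return x1, x2, x3

# Coded values for the four treatment combinations.
design = {(t, s): code_levels(t, s) for t in (20, 30) for s in (20, 40)}
for (t, s), (x1, x2, x3) in design.items():
    print(f"Temperature {t}, Sucrose {s}: x1={x1:+.0f}, x2={x2:+.0f}, x3={x3:+.0f}")
```

With this coding each of the three columns sums to zero over the four combinations, which is what makes the intercept equal to the overall mean in a balanced design.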
    Obs i   temperature   sucrose   rep   energy   x1   x2   x3
    1       20            20        1      3.1     -1   -1    1
    2       20            20        2      3.7     -1   -1    1
    3       20            20        3      4.7     -1   -1    1
    4       20            40        1      5.5     -1    1   -1
    5       20            40        2      6.7     -1    1   -1
    6       20            40        3      7.3     -1    1   -1
    7       30            20        1      6.0      1   -1   -1
    8       30            20        2      6.9      1   -1   -1
    9       30            20        3      7.5      1   -1   -1
    10      30            40        1     11.5      1    1    1
    11      30            40        2     12.9      1    1    1
    12      30            40        3     13.4      1    1    1

a) The following regression model was fit to study the effect of temperature, sucrose and their interaction on the amount of energy spent.

\[ y_j = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + e_j, \quad j = 1, \ldots, 12 \]

b) Test the following hypothesis:

$H_0: \beta_1 = \beta_2 = \beta_3 = 0$
$H_1$: not all $\beta_i = 0$, $i = 1, 2, 3$

(page 9) SS(Model) = 122.78, df(Model) = 3, MS(Model) = 122.78 / 3 = 40.93, F = 53.97, p-value < 0.0001. Reject $H_0$; at least one of the regression coefficients $\beta_i \neq 0$.

c) Write the estimated regression equation (replace each regression coefficient by its estimated value).

\[ \hat{y}_j = 7.4333 + 2.2667 X_1 + 2.1167 X_2 + 0.7833 X_3, \quad j = 1, \ldots, 12 \]

d) Write the test hypothesis for each parameter, and the conclusion.

$H_0: \beta_1 = 0$, $H_1: \beta_1 \neq 0$, $\alpha = 0.05$: p-value < 0.0001. Reject $H_0$; the regression coefficient for X1 is significantly different from zero.

$H_0: \beta_2 = 0$, $H_1: \beta_2 \neq 0$, $\alpha = 0.05$: p-value < 0.0001. Reject $H_0$; the regression coefficient for X2 is significantly different from zero.

$H_0: \beta_3 = 0$, $H_1: \beta_3 \neq 0$, $\alpha = 0.05$: p-value ≈ 0.014 (t = 3.12 on 8 df). Reject $H_0$; the regression coefficient for X3 is significantly different from zero.

e) How much is the change in energy when the temperature increases from 20 to 30 and the viscosity is 20%?

Temperature 20: $X_1 = (20 - 25)/5 = -1$
Temperature 30: $X_1 = (30 - 25)/5 = 1$
Sucrose 20: $X_2 = (20 - 30)/10 = -1$

$X_3 = X_1 X_2 = (-1)(-1) = 1$ for $X_1 = -1$ and $X_2 = -1$
$X_3 = X_1 X_2 = (1)(-1) = -1$ for $X_1 = 1$ and $X_2 = -1$

\[ \hat{y}_{X_1=-1,\,X_2=-1,\,X_3=1} = 7.4333 + 2.2667(-1) + 2.1167(-1) + 0.7833(1) = 3.8332 \]
\[ \hat{y}_{X_1=1,\,X_2=-1,\,X_3=-1} = 7.4333 + 2.2667(1) + 2.1167(-1) + 0.7833(-1) = 6.8 \]

Note that for $X_2 = -1$ we have $X_3 = -X_1$, so

\[ \hat{y}_j = 7.4333 + 2.2667 X_1 + 2.1167(-1) + 0.7833 X_3 = (7.4333 - 2.1167) + (2.2667 - 0.7833) X_1 = 5.3166 + 1.4834 X_1 \]

6.8 - 3.8332 = 2.9668 = 2 × 2.2667 - 2 × 0.7833 = 2 × 1.4834, which is twice the 1-unit change in X1 when X2 = -1.
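The estimates in part c), the overall F test in part b) and the change computed in part e) can be reproduced from the data table. The sketch below is plain Python rather than the SAS run, and it relies on one stated assumption: with a balanced design and ±1 coding the columns of the design matrix are orthogonal, so each coefficient is simply the sum of x·y divided by n. Variable names are illustrative.

```python
# Energy (joules/sec) for each (temperature, sucrose) cell, three replicates.
cells = {
    (20, 20): [3.1, 3.7, 4.7],
    (20, 40): [5.5, 6.7, 7.3],
    (30, 20): [6.0, 6.9, 7.5],
    (30, 40): [11.5, 12.9, 13.4],
}

# Build rows (1, x1, x2, x3, y) using the coding defined for the analysis.
rows = []
for (temp, suc), energies in cells.items():
    x1 = (temp - 25) / 5
    x2 = (suc - 30) / 10
    for energy in energies:
        rows.append((1.0, x1, x2, x1 * x2, energy))
n = len(rows)

# Orthogonal +/-1 columns: X'X = n * I, so b_k = sum(x_k * y) / n.
b = [sum(r[k] * r[4] for r in rows) / n for k in range(4)]

def predict(x1, x2):
    """Fitted energy at coded temperature x1 and coded sucrose x2."""
    return b[0] + b[1] * x1 + b[2] * x2 + b[3] * x1 * x2

# ANOVA quantities for the overall test of b1 = b2 = b3 = 0.
y_bar = sum(r[4] for r in rows) / n
ss_model = sum((predict(r[1], r[2]) - y_bar) ** 2 for r in rows)
ss_error = sum((r[4] - predict(r[1], r[2])) ** 2 for r in rows)
ms_model = ss_model / 3
ms_error = ss_error / (n - 4)
f_stat = ms_model / ms_error

# Part e): change from 20 to 30 degrees at sucrose 20% (x2 = -1).
change = predict(1, -1) - predict(-1, -1)

print("coefficients:", [round(v, 4) for v in b])
print(f"SS(Model) = {ss_model:.2f}, MS(Model) = {ms_model:.2f}, F = {f_stat:.2f}")
print(f"change at sucrose 20%: {change:.4f}")
```

The change in part e) equals 2(b1 - b3) because X1 moves by two coded units while X3 = -X1 moves in the opposite direction when X2 = -1.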
The table of means for each experimental group is presented next.

    Group mean
                      Temperature
    Sucrose           20       30       Mean
    20                3.8      6.8      5.3
    40                6.5     12.6      9.6
    Mean              5.2      9.7      7.4

f) Show that the predicted mean for Temperature = 20 when sucrose is at its average value is equal to the observed mean for this temperature over both sucrose levels.

Temperature 20: $X_1 = -1$; Sucrose 30: $X_2 = 0$; $X_3 = (-1)(0) = 0$

\[ \hat{y}_{x_1=-1,\,x_2=0,\,x_3=0} = 7.4333 + 2.2667(-1) + 2.1167(0) + 0.7833(0) = 7.4333 - 2.2667 = 5.1666 \]

which is the same as $\bar{y}_{Temperature=20,\,\cdot} = 5.2$, the mean response for Temperature = 20 averaged over sucrose.

g) Show that the predicted mean for Sucrose = 40 when temperature is at its average value is equal to the observed mean for that sucrose level over both temperature levels.

Temperature 25: $X_1 = 0$; Sucrose 40: $X_2 = 1$; $X_3 = (0)(1) = 0$

\[ \hat{y}_{x_1=0,\,x_2=1,\,x_3=0} = 7.4333 + 2.2667(0) + 2.1167(1) + 0.7833(0) = 7.4333 + 2.1167 = 9.5500 \]

which is the same as $\bar{y}_{\cdot,\,Sucrose=40} = 9.6$, the mean response for Sucrose = 40 when averaged over temperature.

h) Show that the predicted value for Temperature = 20 and Sucrose = 40 is equal to the observed mean for the corresponding experimental group.

Temperature 20: $X_1 = -1$; Sucrose 40: $X_2 = 1$; $X_3 = (-1)(1) = -1$

\[ \hat{y}_{x_1=-1,\,x_2=1,\,x_3=-1} = 7.4333 + 2.2667(-1) + 2.1167(1) + 0.7833(-1) = 6.5 \]

which is the same as the mean response for Temperature = 20 and Sucrose = 40, $\bar{y}_{Temperature=20,\,Sucrose=40} = 6.5$.

i) Use the following graph to explain the significance of X3.

[Figure: Energy against temperature by sucrose level. Mean energy (0 to 14) against temperature (20 to 30), one line per sucrose level (20, 40).]

When Sucrose = 20, the slope of the regression line of energy on coded temperature X1 is 2.2667 - 0.7833 = 1.4834, while if Sucrose = 40 (its high value) this slope is 2.2667 + 0.7833 = 3.05. Each unit of X1 corresponds to 5 °C. The energy bees require to drink per unit increase in temperature therefore depends on the viscosity of the liquid, with more energy required when viscosity is higher, for the observed range of 20-40% sucrose in the liquid and ambient temperature of 20-30 °C.
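The role of X3 in part i), and the checks in parts f) to h), can be illustrated with a short Python sketch. It plugs in the rounded coefficients of the estimated regression equation and compares them with cell means computed from the raw data; the dictionary and function names are illustrative choices, not part of the original analysis.

```python
# Rounded coefficients from the estimated regression equation.
b0, b1, b2, b3 = 7.4333, 2.2667, 2.1167, 0.7833

def predict(x1, x2):
    """Fitted energy at coded temperature x1 and coded sucrose x2."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

# Observed cell means from the three replicates per cell.
cell_means = {
    (20, 20): (3.1 + 3.7 + 4.7) / 3,
    (20, 40): (5.5 + 6.7 + 7.3) / 3,
    (30, 20): (6.0 + 6.9 + 7.5) / 3,
    (30, 40): (11.5 + 12.9 + 13.4) / 3,
}

# Slope of energy on coded temperature at each sucrose level: b1 + b3 * x2.
slope_coded = {20: b1 + b3 * (-1), 40: b1 + b3 * (+1)}
# One coded unit of X1 is 5 degrees C, so divide by 5 for a per-degree slope.
slope_per_degree = {s: v / 5 for s, v in slope_coded.items()}
# Observed slopes from the cell means over the 10-degree range.
observed_slope = {s: (cell_means[(30, s)] - cell_means[(20, s)]) / 10
                  for s in (20, 40)}

# Parts f), g), h): predictions against observed means.
mean_temp20 = (cell_means[(20, 20)] + cell_means[(20, 40)]) / 2
mean_suc40 = (cell_means[(20, 40)] + cell_means[(30, 40)]) / 2

for s in (20, 40):
    print(f"Sucrose {s}: slope per coded unit {slope_coded[s]:.4f}, "
          f"per degree {slope_per_degree[s]:.4f}, observed {observed_slope[s]:.4f}")
print(f"f) predicted {predict(-1, 0):.4f} vs observed {mean_temp20:.4f}")
print(f"g) predicted {predict(0, 1):.4f} vs observed {mean_suc40:.4f}")
print(f"h) predicted {predict(-1, 1):.4f} vs observed {cell_means[(20, 40)]:.4f}")
```

If b3 were zero, the two slopes would coincide and the lines in the graph would be parallel; the gap between them is 2·b3 per coded unit of temperature.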