STATISTICS 110, FALL 2015, MIDTERM EXAM

NAME: _____KEY_____
Your assigned homework number: ___________
First 6 digits of Student ID: __________________

Open notes, calculator required. You should have 5 pages plus a page of R output, handed out separately. Make sure you have them all. Each part of each problem is worth 4 points unless specified otherwise. Use the back of the pages if you need more space, but tell us to turn the page over and look.

1. The Kids198 dataset accompanying the book provides data on a random sample of 198 children between the ages of 8 and 18. Three of the variables measured, and the way we will use them for this question, are: Y = Weight (in pounds), X1 = Age (in months), X2 = 0 if Male and 1 if Female. For each of the parts of this question, use the notation with Y and Xs rather than the names of the variables.

a. Write the population model that specifies a linear relationship between Y = Weight and X1 = Age and for which that relationship has the same slope and the same intercept for males and females. Include information about the normality assumption. (The left hand side of the model is provided, to get you started.)

Y = β0 + β1X1 + ε, where the condition is that the ε are independent and each from a N(0, σε) distribution.

For parts (b) and (c) you don’t need to include information about the normality assumption. Continue using the Y and X notation rather than names of variables.

b. Write the population model for the linear relationship between Weight and Age that includes the same intercept but different slopes for males and females.

Y = β0 + β1X1 + β2X1X2 + ε
(It’s okay if you use β3 for the coefficient for X1X2 in anticipation of adding separate intercepts.)

c. Write the population model for the linear relationship between Weight and Age that includes different intercepts and different slopes for males and females.

Y = β0 + β1X1 + β2X2 + β3X1X2 + ε

d.
Using the notation from your model in part (c), write the null and alternative hypotheses that would be used to test whether the population regression lines are the same for males and females, versus that they are not the same in some way.

H0: β2 = β3 = 0
Ha: At least one of β2 and β3 is not 0

2. [2 pts each] For each of the following situations, specify whether the statement provided is always true, could be true for some populations and/or samples, or is never true. (Circle your answer.)

a. When X and Y have a deterministic linear relationship, the slope of the line is 1.
Always true     Could be true     Never true

b. If 100 independent 95% confidence intervals are created for a mean, each based on a different sample, exactly 95 of them will cover the true population mean.
Always true     Could be true     Never true

c. Consider a regression situation with Y as the response, and 2 possible predictors X1 and X2. SSTotal will be the same for the model with X1 and X2 as predictors as it is for the model with only X1 as a predictor.
Always true     Could be true     Never true

d. When the slope of a regression line is negative, r² will also be negative.
Always true     Could be true     Never true

e. When the correlation between X and Y is positive (and not 0), the slope of the least squares regression line for simple linear regression is also positive.
Always true     Could be true     Never true

f. In a simple linear regression setting, the numerical values of β1 and β̂1 are equal.
Always true     Could be true     Never true

g. The sum of the residuals from fitting a least squares regression line will be 0.
Always true     Could be true     Never true

For Questions 3 to 9: The R output distributed with the exam includes an analysis of the Kids198 data described in question 1, but with Height added as a predictor. It includes the response variable Y = Weight (in pounds), and predictor variables X1 = Height (in inches), X2 = Age (in months) and X3 = Sex with 0 for males and 1 for females.
Use the R output to answer all of the rest of the questions except the multiple choice.

3. As shown on the output, the correlation between Weight and Sex is −0.245. Explain why the correlation is negative.

A negative correlation indicates a general pattern that as one variable increases the other decreases. In this case, Sex is 0 for males and 1 for females. So as Sex increases, Weight decreases. This makes sense because women in general weigh less than men do.

4. Interpret the value of the coefficient for Sex, which is −2.28.

For a male and female of the same age and height, the female is predicted to weigh 2.28 pounds less than the male.
OR
For the population of males and females of any fixed age and height, the average weight for the females is estimated to be 2.28 pounds less than the average weight of the males.

5. [2 pts each blank] Fill in numerical values in each of the blanks where possible. If a numerical value cannot be determined from the computer output, write NA (not available). No extensive computations are required, but if you need to compute something you can show your work on the side. (If you make a mistake in computation, including your work might get you partial credit because we can see where you went wrong.)

a. β̂0 = __−174.33329__
b. MSE = __14.48__
c. β1 = __NA (It’s a population value, so it’s unknown.)__
d. The value of the test statistic for testing H0: β1 = β2 = β3 = 0 is __277.7__
e. The standard error of β̂1 = __0.33094__
f. SSModel = __174,622 (It’s 173615 + 785 + 222)__

6. Write out the sample regression equation using the numbers from the output. Use the variable names instead of the X notation for the predictor variables. Round the coefficients to 3 digits after the decimal place.

Ŷ = −174.333 + 4.284(Height) + 0.123(Age) − 2.281(Sex)

7. Row 3 after the “head” command near the top of the output gives values of all of the variables for the 3rd child in the data set.
Find the predicted weight and then the residual for this child. Show your work.

a. Predicted weight = −174.333 + 4.284(Height) + 0.123(Age) − 2.281(Sex)
= −174.333 + 4.284(50.1) + 0.123(119) − 2.281(0) = 54.93 pounds

b. Residual = Actual − Predicted = 54 − 54.93 = −0.93 pounds

c. Interpret the value of the residual you found in part (b).

This child’s actual weight was 0.93 pounds lower than it was predicted to be based on his height, age and sex.

8. Using the R output, give the value of the test statistic and the p-value for testing the following hypotheses.

a. In the simple linear regression model with Y = Weight and X = Age, the null hypothesis is that the population slope is 0 and the alternative hypothesis is that the slope is not 0.
Test statistic = __17.5313__     p-value = __< 2.2 × 10⁻¹⁶__

b. In the multiple linear regression model with Y = Weight and the three predictors Height, Age and Sex, the null hypothesis is that Age is not needed given that Height and Sex are included in the model, and the alternative hypothesis is that Age is needed. In other words, the null hypothesis is that the population coefficient for Age is 0 and the alternative hypothesis is that it is not 0.
Test statistic = __2.144__     p-value = __0.0333__

c. In the multiple linear regression model with Y = Weight and the three predictors Height, Age and Sex, the null hypothesis is that Age is not needed given that Height and Sex are included in the model, and the alternative hypothesis is that Age is needed and has a positive coefficient. In other words, the null hypothesis is that the population coefficient for Age is 0 and the alternative hypothesis is that it is greater than 0.
Test statistic = __2.144__     p-value = __0.0333/2 = 0.0167__

9. (1 pt each) Fill in the blanks in the ANOVA table below, where F is the test statistic for H0: β1 = β2 = β3 = 0.
Hint: All of the information you need is in the output, but you will need to do some arithmetic to get some of the values.

Source    Df       SumSq       MeanSq        F          p-value
Model     _3_      _174622_    _58207.33_    _277.2_    _< 2.2 × 10⁻¹⁶_
Error     _194_    _40656_     _210_
Total     _197_    _215278_

MULTIPLE CHOICE (3 pts each) Circle the best choice

1. In simple linear regression, a plot of residuals versus fitted values is useful for checking some of the necessary conditions for inference in regression. Which one of the following is it not useful for checking?
A. The relationship between Y and X is approximately linear.
B. The standard deviation of the errors remains constant across the x values.
C. The n pairs of observations are all independent.
D. It is useful for checking all of the above conditions.

2. A researcher is interested in predicting the relationship between Y = percent body fat and X = average number of calories consumed per day for college freshmen who eat in the dining hall at a large university. She employs two research assistants, and they each are going to independently take a random sample of 100 students who eat in the dining hall and ask them to provide this information. Which of the following definitely will be the same for the two research assistants?
A. The population regression line.
B. The SSTotal for their samples.
C. The intercepts for the regression lines computed from their samples.
D. None of the above; all of those would differ.

3. In simple linear regression, MSE is used as an estimate of σ. In this context, what is σ?
A. The standard deviation of the population of all Y values, ignoring the values of X.
B. The standard deviation of the residuals from the sample.
C. The standard deviation of the population of X values at each value of Y.
D. The standard deviation of the population of Y values at each value of X.

4. Which of the following is a correct way to write the sample regression equation for simple linear regression?
A.
Y = β0 + β1X1 + e
B. Ŷ = β0 + β1X1
C. Y = β̂0 + β̂1X1 + e
D. Ŷ = β̂0 + β̂1X1 + e     This one.

The output below from R uses the data set Kids198, accompanying the book. Some irrelevant parts of the output have been removed for space reasons. Variables include:
Y = Weight in pounds
X1 = Height in inches
X2 = Age in months
X3 = Sex = 0 if Male, 1 if Female
The population model using variable names is µY = β0 + β1(Height) + β2(Age) + β3(Sex).

> head(Kids198)   #First 3 rows of data, to give you a feel for it
  Height Weight Age Sex
1   67.8    166 210   0
2   63.0     93 144   1
3   50.1     54 119   0

> cor(Kids198)
           Height     Weight         Age         Sex
Height  1.0000000  0.8980355  0.83292636 -0.25811645
Weight  0.8980355  1.0000000  0.78141242 -0.24549691
Age     0.8329264  0.7814124  1.00000000 -0.06732361
Sex    -0.2581165 -0.2454969 -0.06732361  1.00000000

> cor.test(Kids198$Weight,Kids198$Age)
t = 17.5313, df = 196, p-value < 2.2e-16
sample estimates:
      cor
0.7814124

> Kid.mod <- lm(Weight ~ Height + Age + Sex, data = Kids198)
> summary(Kid.mod)
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -174.33329   13.80581 -12.628   <2e-16
Height         4.28446    0.33094  12.946   <2e-16
Age            0.12301    0.05737   2.144   0.0333
Sex           -2.28138    2.21699  -1.029   0.3047
---
Residual standard error: 14.48 on 194 degrees of freedom
Multiple R-squared: 0.8111, Adjusted R-squared: 0.8082
F-statistic: 277.7 on 3 and 194 DF, p-value: < 2.2e-16

> anova(Kid.mod)
Analysis of Variance Table
Response: Weight
           Df Sum Sq Mean Sq  F value  Pr(>F)
Height      1 173615  173615 828.4373 < 2e-16
Age         1    785     785   3.7454 0.05441
Sex         1    222     222   1.0589 0.30474
Residuals 194  40656     210
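The arithmetic behind Questions 7, 8c and 9 can be re-checked with a short R sketch. This is only an illustration, not part of the exam or the original output: it uses the rounded numbers printed in the output rather than the Kids198 data itself, and all variable names (b0, ss_model, f_stat and so on) are hypothetical names chosen here.

```r
# All numbers are copied from the printed R output; the Kids198 data are not read.
# Variable names are illustrative assumptions, not names used in the exam.

# Question 6 coefficients, rounded to 3 decimals.
b0 <- -174.333
b_height <- 4.284
b_age <- 0.123
b_sex <- -2.281

# Question 7: the child in row 3 of head(Kids198).
height <- 50.1
weight <- 54
age <- 119
sex <- 0
predicted <- b0 + b_height * height + b_age * age + b_sex * sex
residual <- weight - predicted   # actual minus predicted

# Question 9: rebuild the ANOVA table from the sequential sums of squares.
ss_model <- 173615 + 785 + 222   # Height + Age + Sex
ss_error <- 40656
ss_total <- ss_model + ss_error
df_model <- 3
df_error <- 194
df_total <- df_model + df_error
ms_model <- ss_model / df_model
ms_error <- ss_error / df_error
f_stat <- ms_model / ms_error                          # unrounded mean squares
f_from_rounded <- round(ms_model, 2) / round(ms_error) # 58207.33 / 210

# Question 8c: halve the two-sided p-value; appropriate here because the
# Age estimate (0.12301) is on the side named by the alternative.
p_one_sided <- 0.0333 / 2

cat("Predicted weight:", round(predicted, 2), "\n")
cat("Residual:", round(residual, 2), "\n")
cat("SSModel, SSTotal:", ss_model, ss_total, "\n")
cat("MSModel, MSE:", round(ms_model, 2), round(ms_error, 2), "\n")
cat("sqrt(MSE):", round(sqrt(ms_error), 2), "\n")
cat("F (unrounded), F (rounded inputs):",
    round(f_stat, 2), round(f_from_rounded, 2), "\n")
cat("One-sided p-value:", p_one_sided, "\n")
```

Dividing by the rounded mean square error of 210 gives the 277.2 shown in the Question 9 table, while the unrounded mean squares give a value close to the 277.7 that R reports. The square root of the mean square error matches the residual standard error of 14.48 in the summary output.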