Mathematics 243
Day 40 - April 19
Inference in Linear Models - Continued

Observation = Model + Error: $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
Observation = Fitted + Residual: $y_i = b_0 + b_1 x_i + e_i$

The distribution of $b_1$: the statistic $\frac{b_1 - \beta_1}{se(b_1)}$ has a t-distribution with $n - 2$ degrees of freedom.

Rust

The dataframe corrosion has data on corrosion in 13 specimens of 90-10 Cu-Ni alloys with varying iron content. The bars were submerged in sea water for 60 days, and the loss of material due to corrosion was recorded. The variables are Fe (iron content in percent) and loss (the weight loss in mg per square decimeter per day). Of course, Fe is the explanatory variable and loss is the response variable.

> rust=lm(loss~Fe,data=corrosion)
> summary(rust)

Call:
lm(formula = loss ~ Fe, data = corrosion)

Residuals:
    Min      1Q  Median      3Q     Max
-3.7980 -1.9464  0.2971  0.9924  5.7429

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  129.787      1.403   92.52  < 2e-16 ***
Fe           -24.020      1.280  -18.77 1.06e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.058 on 11 degrees of freedom
Multiple R-squared:  0.9697,  Adjusted R-squared:  0.967
F-statistic: 352.3 on 1 and 11 DF,  p-value: 1.055e-09

1. Confidence interval for the mean of y given $x = x_0$:

> x=data.frame(Fe=c(.25,.5,.75,1))
> predict(rust,newdata=x,interval='confidence')
       fit      lwr      upr
1 123.7816 121.2195 126.3437
2 117.7767 115.6346 119.9187
3 111.7717 109.8732 113.6702
4 105.7667 103.8662 107.6672

2. Prediction interval for an individual y given $x = x_0$:

> x=data.frame(Fe=c(.25,.5,.75,1))
> predict(rust,newdata=x,interval='prediction')
       fit       lwr      upr
1 123.7816 116.58030 130.9829
2 117.7767 110.71385 124.8395
3 111.7717 104.77890 118.7645
4 105.7667  98.77338 112.7600

1980 Census Undercount

The dataset Ericksen in the package car has data on undercounting in the 1980 census. It includes data on the 16 largest cities, the rest of the states in which those cities are located, and the other US states.
We are interested in whether the percentage of minority residents (minority) is related to the percentage by which the 1980 census undercounted the city or state population (undercount).

1. Write a linear function to predict undercount percentage from minority population percentage.
2. Examine a residual plot to determine whether the three assumptions necessary for inference are plausible for this situation.
3. Test the hypothesis that undercount percentage is positively related to minority percentage. Be sure to write the hypotheses carefully in terms of the parameters of the model.
4. Write and interpret a 95% confidence interval for the slope of the regression line.
5. According to the 1980 census, Grand Rapids had a population of 181,843 and a minority population of 18.9%.
(a) Predict the undercount for Grand Rapids and, hence, the true population.
(b) Give a 95% prediction interval for the true population of Grand Rapids in 1980.

Homework, due Thursday, April 26 (due to advising days)

1. A famous dataset (Pierce, 1948) contains data on the relationship between cricket chirps and temperature. The dataset is in the M241 package and is named crickets. The variables are Temperature, in degrees Fahrenheit, and Chirps, the number of chirps per second of crickets at that temperature.
(a) Write the equation of the regression line that could be used to predict the temperature from the number of cricket chirps per second.
(b) Write a 95% confidence interval for the slope of the line.
(c) Write a 95% confidence interval for the mean temperature for each of the values 12, 14, 16, and 18 of cricket chirps per second.
(d) You hear a cricket chirping 15 times per second. What is an interval that is likely to capture the value of the temperature? Explain what "likely" means here.
(e) Is there anything in the residual plot that gives you any concern about using a linear model here?

2. The faraway package contains a dataset cpd, which has the projected and actual sales of 20 different products of a company. (The data were transformed to disguise the company.)
(a) Write a regression line that describes a linear relationship between projected and actual sales.
(b) Identify one data point that has a particularly large influence on the regression.
(c) Recompute the regression line after removing the data point that you identified. How does the equation of the line change?

Regression in R:

> model=lm(height~foot,data=statstu)      # fits the linear model
> r=residuals(model)                       # defines the vector of residuals
> f=fitted(model)                          # defines the vector of fitted values
> bs=coefficients(model)                   # a vector with the intercept and slope
> xyplot(r~foot,data=statstu)              # plots the residuals
> xyplot(height~foot,data=statstu)         # plots the data
> ladd(panel.abline(model))                # adds the regression line to the data plot
> summary(model)                           # summary of statistics and inference
> confint(model)                           # confidence intervals for the betas
> x=data.frame(foot=c(25,30,35))           # a dataframe of x values
> predict(model,x,interval='confidence')   # predictions and confidence interval for mean of y at x
> predict(model,x,interval='prediction')   # predictions and prediction interval for value of y at x
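The fact that $\frac{b_1 - \beta_1}{se(b_1)}$ has a t-distribution with $n - 2$ degrees of freedom is the basis for the slope test in summary(model) and the slope interval in confint(model). As a sketch, the R code below rebuilds both for the rust model by hand, using the estimate (-24.020), standard error (1.280), and sample size (13) printed in summary(rust). These values are typed in rather than extracted from the fitted object, so the results should agree with summary(rust) and confint(rust) only up to rounding.

```r
# Slope inference for the rust model, rebuilt from the printed summary(rust)
# values (typed in by hand, so results match R's output only up to rounding).
b1    <- -24.020   # estimated slope for Fe
se_b1 <- 1.280     # standard error of the slope
n     <- 13        # number of specimens
df    <- n - 2     # degrees of freedom for the t-distribution

# t statistic for H0: beta1 = 0, and its two-sided p-value
t_stat  <- (b1 - 0) / se_b1
p_value <- 2 * pt(-abs(t_stat), df)

# 95% confidence interval: b1 +/- t* se(b1)
t_star <- qt(0.975, df)
ci     <- b1 + c(-1, 1) * t_star * se_b1

cat("t =", round(t_stat, 2), "  p =", signif(p_value, 3), "\n")
cat("95% CI for beta1:", round(ci, 3), "\n")
```

Here $t^* \approx 2.201$, so the interval is roughly $-24.020 \pm 2.817$, that is, from about $-26.837$ to $-21.203$ mg per square decimeter per day for each additional percent of iron.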
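The two predict() calls differ only in the standard error they use. At a new value $x_0$, the confidence interval for the mean of y is $\hat{y} \pm t^* s\sqrt{1/n + (x_0 - \bar{x})^2 / S_{xx}}$, and the prediction interval adds the residual variance $s^2$ under the square root, which is why it is always wider. A minimal sketch that checks these formulas against predict() follows. It assumes R's built-in cars data (dist against speed) as a stand-in, since statstu and corrosion are course datasets not assumed to be loaded here; the x0 values are chosen only for illustration.

```r
# Sketch: rebuild predict()'s confidence and prediction intervals by hand.
# Uses R's built-in 'cars' data as a stand-in for the course datasets.
fit <- lm(dist ~ speed, data = cars)

n     <- nrow(cars)
s     <- summary(fit)$sigma           # residual standard error
b     <- unname(coef(fit))            # b[1] = intercept, b[2] = slope
xbar  <- mean(cars$speed)
Sxx   <- sum((cars$speed - xbar)^2)
tstar <- qt(0.975, df = n - 2)        # t critical value with n - 2 df

x0   <- c(10, 15, 20)                 # new x values (illustrative choice)
yhat <- b[1] + b[2] * x0

se_mean <- s * sqrt(1 / n + (x0 - xbar)^2 / Sxx)  # SE of the estimated mean of y
se_pred <- sqrt(s^2 + se_mean^2)                  # SE for a single new y

conf_manual <- cbind(fit = yhat, lwr = yhat - tstar * se_mean, upr = yhat + tstar * se_mean)
pred_manual <- cbind(fit = yhat, lwr = yhat - tstar * se_pred, upr = yhat + tstar * se_pred)

new    <- data.frame(speed = x0)
conf_r <- predict(fit, newdata = new, interval = "confidence")
pred_r <- predict(fit, newdata = new, interval = "prediction")

# Compare numeric values only (row names differ)
print(all.equal(unname(conf_manual), unname(conf_r)))  # TRUE
print(all.equal(unname(pred_manual), unname(pred_r)))  # TRUE
```

The rust output fits the same pattern. At Fe = 0.25 the confidence half-width is 126.3437 - 123.7816 = 2.5621 and the prediction half-width is 130.9829 - 123.7816 = 7.2013; dividing each by $t^* \approx 2.201$ gives about 1.164 and 3.272, and $\sqrt{1.164^2 + 3.058^2} \approx 3.272$.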