Lab 8: Linear Regression

Simple Linear Regression

Exercise: Statistics that summarize personal health care expenditures by state for the years 1966 through 1982 have been examined in an attempt to understand the issues related to rising health care costs. Suppose that you are interested in focusing on the relationship between expense per admission into a community hospital and average length of stay in the facility. The data set hospital contains information for each state in the United States for 1982. The measures of mean expense per admission are saved under the variable name expadm; the corresponding average lengths of stay are saved under los.

a) Generate numerical summary statistics (mean, standard deviation, minimum, maximum) for the variables expense per admission and length of stay in the hospital.

Analyze → Descriptive Statistics → Descriptives

Descriptive Statistics
                                  N    Minimum    Maximum    Mean         Std. Deviation
mean expense per admission ($)    51   1772,00    4612,00    2716,8039    603,94708
average length of stay (days)     51   5,40       9,70       7,4902       1,01514
average salary ($)                51   11928,00   23594,00   14852,4118   1965,51403
Valid N (listwise)                51

b) Construct a two-way scatter plot of mean expense per admission versus length of stay. What does the scatter plot suggest about the nature of the relationship between these variables?

Graphs → Chart Builder → Scatter/Dot → Simple Scatter

c) Using expense per admission as the response and length of stay as the explanatory variable, compute the regression line. Interpret the estimated slope and intercept.

Analyze → Regression → Linear

ANOVA(a)
Model            Sum of Squares    df    Mean Square    F       Sig.
1  Regression    1890784,672       1     1890784,672    5,668   ,021(b)
   Residual      16346819,367      49    333608,559
   Total         18237604,039      50
a. Dependent Variable: mean expense per admission ($)
b. Predictors: (Constant), average length of stay (days)

Coefficients(a)
                                     Unstandardized Coefficients    Standardized Coefficients
Model                                B           Std. Error         Beta                         t       Sig.
1  (Constant)                        1281,959    608,104                                         2,108   ,040
   average length of stay (days)     191,563     80,465             ,322                         2,381   ,021
a. Dependent Variable: mean expense per admission ($)

d) Interpret the 95% confidence interval for the true slope of the population regression line.

Analyze → Regression → Linear → Statistics → Confidence intervals

Coefficients(a)
                                                                                       95,0% Confidence Interval for B
Model                                B           Std. Error   Beta    t       Sig.     Lower Bound    Upper Bound
1  (Constant)                        1281,959    608,104              2,108   ,040     59,929         2503,990
   average length of stay (days)     191,563     80,465       ,322    2,381   ,021     29,862         353,264
a. Dependent Variable: mean expense per admission ($)

e) What is the coefficient of determination (R squared) for the regression?

Model Summary
Model    R          R Square    Adjusted R Square    Std. Error of the Estimate
1        ,322(a)    ,104        ,085                 577,58857
a. Predictors: (Constant), average length of stay (days)

f) What is the correlation between expense per admission and length of stay? How is it related to the coefficient of determination?

Analyze → Correlate → Bivariate

Correlations
                                                        mean expense per admission ($)    average length of stay (days)
mean expense per admission ($)   Pearson Correlation    1                                 ,322*
                                 Sig. (2-tailed)                                          ,021
                                 N                      51                                51
average length of stay (days)    Pearson Correlation    ,322*                             1
                                 Sig. (2-tailed)        ,021
                                 N                      51                                51
*. Correlation is significant at the 0.05 level (2-tailed).

g) Construct a plot of the residuals versus the fitted values of expense per admission. In what three ways does the residual plot help you to evaluate the fit of the regression model?

To save the fitted values and standardized residuals, use the Save button of the regression model fitting window:

Analyze → Regression → Linear → Save → Predicted Values: Unstandardized; Residuals: Standardized

The saved fitted values and standardized residuals then appear as new columns in the data set.

Now plot the standardized residuals against the fitted values. Also put a horizontal line at y = 0 using the graph editor menu.

A plot of the residuals serves three purposes.

1) It can help us to detect outlying observations in the sample.
(In our plot, no residual is markedly larger than the others.)

2) It can suggest a failure of the assumption of homoscedasticity. Homoscedasticity means that the standard deviation of the outcomes (y) is constant across all values of x. If the range of the magnitudes of the residuals either increases or decreases as the fitted values become larger, this implies that the standard deviation of the outcomes (y) does not take the same value for all values of x. (No such pattern is evident in our plot, suggesting that the assumption of homoscedasticity is not violated.)

3) If the residuals do not exhibit a random scatter but instead follow a trend, this would suggest that the true relationship between x and y might not be linear, or that some important explanatory variables might have been left out of the model. (Our plot does not exhibit a random scatter.)

Multiple Linear Regression

Exercise: Consider the hospital data set above, which also contains the average salary per employee in 1982.

a) Summarize the average salary per employee both graphically (boxplot) and numerically (mean, standard deviation, minimum, maximum).

The 50th observation looks like an outlier.

b) Construct a two-way scatter plot of mean expense per admission versus average salary. What does the scatter plot suggest about the nature of the relationship between these variables?

c) Fit a regression model taking mean expense per admission as the response and average length of stay and average salary as the explanatory variables. Interpret the estimated regression coefficients.

d) What happens to the estimated coefficient of length of stay when average salary is added to the model?

e) Does the inclusion of salary in addition to average length of stay improve your ability to predict mean expense per admission? (Check whether its coefficient is significant, whether R squared has increased, etc.)

f) Examine a plot of the residuals versus the fitted values of expense per admission. What does the plot tell you about the fit of the model?
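
The numbers in the simple linear regression output of the first exercise (parts c to f) are tied together by a few identities, and checking them by hand is a useful way to see how the tables relate. The sketch below is only an illustration, not part of the SPSS workflow: it recomputes R squared, the correlation, the F and t statistics, the standard error of the estimate and the confidence interval for the slope from the reported sums of squares and coefficients. The function and variable names are made up for this example, and the critical value 2.0096 (t distribution, 49 degrees of freedom, two-sided 95%) is an assumption taken from a standard t table rather than from the output above.

```python
import math

# Values transcribed from the SPSS tables above (decimal commas written as points).
N = 51
SS_REGRESSION = 1890784.672
SS_RESIDUAL = 16346819.367
SLOPE, SLOPE_SE = 191.563, 80.465

# Assumption: two-sided 95% critical value of t with N - 2 = 49 degrees of
# freedom, looked up in a standard t table (not given in the SPSS output).
T_CRIT_49 = 2.0096


def check_regression_summary():
    """Recompute the summary statistics of the simple regression (hypothetical helper)."""
    df_resid = N - 2
    ss_total = SS_REGRESSION + SS_RESIDUAL
    ms_resid = SS_RESIDUAL / df_resid

    r_squared = SS_REGRESSION / ss_total
    # With one predictor, r is the square root of R squared with the sign of the slope.
    r = math.copysign(math.sqrt(r_squared), SLOPE)
    adj_r_squared = 1 - (1 - r_squared) * (N - 1) / df_resid

    f_stat = (SS_REGRESSION / 1) / ms_resid   # regression has 1 degree of freedom
    t_slope = SLOPE / SLOPE_SE                # with one predictor, t squared equals F
    half_width = T_CRIT_49 * SLOPE_SE

    return {
        "r_squared": r_squared,
        "r": r,
        "adj_r_squared": adj_r_squared,
        "std_error_estimate": math.sqrt(ms_resid),
        "f_stat": f_stat,
        "t_slope": t_slope,
        "slope_ci": (SLOPE - half_width, SLOPE + half_width),
    }


for name, value in check_regression_summary().items():
    print(name, value)
```

Up to rounding, the printed values should agree with the Model Summary, ANOVA, Coefficients and Correlations tables; in particular the square of the correlation equals R squared, which is the relationship asked about in part f).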