Essentials of Business Statistics: Communicating with Numbers
By Sanjiv Jaggia and Alison Kelly
Copyright © 2014 by McGraw-Hill Higher Education. All rights reserved.

Chapter 12 Learning Objectives
LO 12.1 Estimate the simple linear regression model and interpret the coefficients.
LO 12.2 Estimate the multiple linear regression model and interpret the coefficients.
LO 12.3 Calculate and interpret the standard error of the estimate.
LO 12.4 Calculate and interpret the coefficient of determination R².
LO 12.5 Differentiate between R² and adjusted R².
LO 12.6 Conduct tests of individual significance.
LO 12.7 Conduct a test of joint significance.

12.1 The Simple Linear Regression Model
LO 12.1 Estimate the simple linear regression model and interpret the coefficients.

With regression analysis, we explicitly assume that one variable, called the response variable, is influenced by other variables, called the explanatory variables. Using regression analysis, we can predict the response variable given values for the explanatory variables.

If the value of the response variable is uniquely determined by the values of the explanatory variables, we say that the relationship is deterministic. But if, as we find in most fields of research, relevant factors are omitted and the relationship cannot be determined exactly, we say that the relationship is inexact. In regression analysis, we therefore include a random error term that acknowledges that the actual relationship between the response and explanatory variables is not deterministic.

The simple linear regression model is defined as

    y = β0 + β1x + ε,

where y and x are, respectively, the response and explanatory variables, and ε is the random error term. The coefficients β0 and β1 are the unknown parameters to be estimated.

By fitting our data to the model, we obtain the sample regression equation

    ŷ = b0 + b1x,

where ŷ is the predicted value of the response variable, b0 is the estimate of β0, and b1 is the estimate of β1. Since the predictions cannot be totally accurate, the difference between the actual and predicted values is called the residual, e = y − ŷ.

A scatterplot of debt payments against income, with the sample regression equation superimposed, shows that debt payments rise with income; the vertical distance between y and ŷ represents the residual e.

The two parameters β0 and β1 are estimated by minimizing the sum of squared residuals. The slope coefficient is estimated first as

    b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)².

Then the intercept is computed as

    b0 = ȳ − b1x̄.
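The two formulas above are easy to apply directly. Here is a minimal Python sketch that estimates b1 and b0 by hand; the income and debt values are hypothetical numbers chosen for illustration, not the textbook's data.

    # Least-squares estimates for simple linear regression,
    # computed from the formulas above. Data are made up.
    income = [35, 50, 22, 40, 65, 30, 48, 55]          # x: income ($1,000s)
    debt = [520, 720, 410, 600, 900, 470, 710, 760]    # y: monthly debt payments ($)

    n = len(income)
    x_bar = sum(income) / n
    y_bar = sum(debt) / n

    # b1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(income, debt))
    sxx = sum((x - x_bar) ** 2 for x in income)
    b1 = sxy / sxx

    # b0 = y_bar - b1 * x_bar
    b0 = y_bar - b1 * x_bar

    print(f"Sample regression equation: Debt-hat = {b0:.2f} + {b1:.2f} Income")

The estimated slope b1 is interpreted as the predicted change in debt payments for a one-unit ($1,000) increase in income.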
12.2 The Multiple Regression Model
LO 12.2 Estimate the multiple linear regression model and interpret the coefficients.

If more than one explanatory variable is available, we can use multiple regression. For example, we analyzed how debt payments are influenced by income but ignored the possible effect of unemployment. Multiple regression allows us to explore how several explanatory variables influence the response variable.

Suppose there are k explanatory variables. The multiple linear regression model is defined as

    y = β0 + β1x1 + β2x2 + … + βkxk + ε,

where x1, x2, …, xk are the explanatory variables and the βj values are the unknown parameters that we estimate from the data. As before, ε is the random error term.

The sample multiple regression equation is

    ŷ = b0 + b1x1 + b2x2 + … + bkxk.

In multiple regression, there is a slight modification in the interpretation of the slopes b1 through bk: they show "partial" influences. For example, if there are k = 3 explanatory variables, b1 estimates how a change in x1 influences y, assuming x2 and x3 are held constant.

12.3 Goodness-of-Fit Measures
LO 12.3 Calculate and interpret the standard error of the estimate.

We introduce three measures for judging how well the sample regression fits the data: the standard error of the estimate, the coefficient of determination R², and the adjusted R².

To compute the standard error of the estimate, we first compute the mean squared error, starting from the sum of squares due to error:

    SSE = Σei² = Σ(yi − ŷi)²,  summing over i = 1, …, n.

Dividing SSE by the appropriate degrees of freedom, n − k − 1, yields the mean squared error:

    MSE = SSE / (n − k − 1).

The square root of the MSE is the standard error of the estimate:

    se = √MSE = √[ Σ(yi − ŷi)² / (n − k − 1) ].

In general, the less dispersion there is around the regression line, the smaller se is, which implies a better fit of the model to the data.

LO 12.4 Calculate and interpret the coefficient of determination R².

The coefficient of determination, commonly referred to as R², is another goodness-of-fit measure, one that is easier to interpret than the standard error of the estimate. R² quantifies the fraction of the variation in the response variable that is explained by the explanatory variables.

The coefficient of determination is computed as

    R² = 1 − SSE/SST,  where SST = Σ(yi − ȳ)² and SSE = Σ(yi − ŷi)².

SST, called the total sum of squares, denotes the total variation in the response variable. SST can be broken down into two components: the variation explained by the regression equation (the sum of squares due to regression, SSR) and the unexplained variation (the sum of squares due to error, SSE).

LO 12.5 Differentiate between R² and adjusted R².

Adding explanatory variables always results in a higher R², yet some of those variables may be unimportant and should not be in the model. The adjusted R² tries to balance raw explanatory power against the desire to include only important predictors.

The adjusted R² is computed as

    Adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1).

The adjusted R² penalizes R² for the addition of explanatory variables. As with our other goodness-of-fit measures, we typically let the computer calculate the adjusted R²; it appears directly below R² in the Excel regression output.
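The slides compute these measures in Excel; the sketch below does the same arithmetic in Python with numpy, continuing the hypothetical debt example with k = 2 explanatory variables (Income and Unemployment). All data values are made up for illustration.

    # Goodness-of-fit measures for a multiple regression, following
    # the SSE, MSE, se, R^2, and adjusted R^2 formulas above.
    import numpy as np

    income = np.array([35.0, 50, 22, 40, 65, 30, 48, 55])        # x1
    unemp = np.array([6.2, 5.1, 7.4, 6.0, 4.3, 7.0, 5.5, 4.8])   # x2
    debt = np.array([520.0, 720, 410, 600, 900, 470, 710, 760])  # y

    n, k = len(debt), 2
    X = np.column_stack([np.ones(n), income, unemp])  # intercept column + k regressors
    b, *_ = np.linalg.lstsq(X, debt, rcond=None)      # least-squares estimates b0, b1, b2

    y_hat = X @ b
    sse = np.sum((debt - y_hat) ** 2)                 # sum of squares due to error
    sst = np.sum((debt - debt.mean()) ** 2)           # total sum of squares

    mse = sse / (n - k - 1)                           # mean squared error
    se = np.sqrt(mse)                                 # standard error of the estimate
    r2 = 1 - sse / sst                                # coefficient of determination
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)     # adjusted R^2

    print(f"se = {se:.2f}, R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")

Note that the adjusted R² is always below R² here, and it falls when an added regressor contributes too little explanatory power to justify the lost degree of freedom.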
12.4 Tests of Significance
LO 12.6 Conduct tests of individual significance.

With two explanatory variables (Income and Unemployment) to choose from, we can formulate three possible linear models:
Model 1: Debt = β0 + β1Income + ε
Model 2: Debt = β0 + β1Unemployment + ε
Model 3: Debt = β0 + β1Income + β2Unemployment + ε

Consider our standard multiple regression model:

    y = β0 + β1x1 + β2x2 + … + βkxk + ε.

In general, we can test whether βj is equal to, greater than, or less than some hypothesized value βj0. The test takes one of three forms:
Two-tailed test: H0: βj = βj0 versus HA: βj ≠ βj0
Right-tailed test: H0: βj ≤ βj0 versus HA: βj > βj0
Left-tailed test: H0: βj ≥ βj0 versus HA: βj < βj0

The test statistic follows the t-distribution with df = n − k − 1 degrees of freedom and is calculated as

    tdf = (bj − βj0) / se(bj),

where se(bj) is the standard error of the estimator bj.

By far the most common hypothesis test for an individual coefficient is a test of whether its value differs from zero. To see why, consider our model y = β0 + β1x1 + β2x2 + … + βkxk + ε. If a coefficient equals zero, the corresponding explanatory variable is not a significant predictor of the response variable.

LO 12.7 Conduct a test of joint significance.

In addition to conducting tests of individual significance, we may also want to test the joint significance of all k explanatory variables at once. The competing hypotheses for a test of joint significance are:
H0: β1 = β2 = … = βk = 0
HA: at least one βj ≠ 0

The test statistic for a test of joint significance is

    F(df1, df2) = MSR/MSE = (SSR/k) / [SSE/(n − k − 1)],

where MSR and MSE are, respectively, the mean square regression and the mean square error. The numerator degrees of freedom, df1, equal k, while the denominator degrees of freedom, df2, equal n − k − 1. Fortunately, statistical software generally reports the value of F(df1, df2) and its p-value as standard output, so computing it by hand is rarely necessary.
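To make the two tests concrete, here is a Python sketch that computes the individual t-statistics and the joint F-statistic for Model 3, continuing the hypothetical data above. It assumes numpy and scipy are available; in practice these values would more likely be read off standard regression output.

    # Individual t-tests (H0: beta_j = 0) and the joint F-test for Model 3.
    # Hypothetical data; numpy/scipy stand in for Excel's regression output.
    import numpy as np
    from scipy import stats

    income = np.array([35.0, 50, 22, 40, 65, 30, 48, 55])
    unemp = np.array([6.2, 5.1, 7.4, 6.0, 4.3, 7.0, 5.5, 4.8])
    debt = np.array([520.0, 720, 410, 600, 900, 470, 710, 760])

    n, k = len(debt), 2
    X = np.column_stack([np.ones(n), income, unemp])
    b, *_ = np.linalg.lstsq(X, debt, rcond=None)

    resid = debt - X @ b
    df2 = n - k - 1
    sse = resid @ resid
    mse = sse / df2

    # se(bj): square roots of the diagonal of MSE * (X'X)^-1.
    se_b = np.sqrt(np.diag(mse * np.linalg.inv(X.T @ X)))

    # t-statistics and two-tailed p-values for H0: beta_j = 0.
    t = b / se_b
    p_t = 2 * stats.t.sf(np.abs(t), df2)

    # Joint F-test of H0: beta_1 = beta_2 = 0, using SSR = SST - SSE.
    sst = np.sum((debt - debt.mean()) ** 2)
    F = ((sst - sse) / k) / mse
    p_F = stats.f.sf(F, k, df2)

    print("t-statistics:", np.round(t, 2), "p-values:", np.round(p_t, 4))
    print(f"F = {F:.2f}, p-value = {p_F:.4f}")

A small p-value on an individual t-test rejects H0: βj = 0 and flags that variable as significant; a small p-value on the F-test indicates that the explanatory variables are jointly significant.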