12.1 The Simple Linear Regression Model

Essentials of Business Statistics: Communicating
with Numbers
By Sanjiv Jaggia and Alison Kelly
Copyright © 2014 by McGraw-Hill Higher Education. All rights reserved.
Chapter 12 Learning Objectives
LO 12.1 Estimate the simple linear regression model and interpret the coefficients.
LO 12.2 Estimate the multiple linear regression model and interpret the coefficients.
LO 12.3 Calculate and interpret the standard error of the estimate.
LO 12.4 Calculate and interpret the coefficient of determination R2.
LO 12.5 Differentiate between R2 and adjusted R2.
LO 12.6 Conduct tests of individual significance.
LO 12.7 Conduct a test of joint significance.
12.1 The Simple Linear Regression Model
LO 12.1 Estimate the simple linear regression model and interpret
the coefficients.
 With regression analysis, we explicitly assume that one variable, called the response variable, is influenced by other variables, called the explanatory variables.
 Using regression analysis, we may predict the response variable given values for our explanatory variables.
LO 12.1
12.1 The Simple Linear Regression Model
 If the value of the response variable is uniquely determined by the values of the explanatory variables, we say that the relationship is deterministic.
 But in most fields of research the relationship is inexact, due to the omission of relevant factors.
 In regression analysis, we therefore include a random error term that acknowledges that the actual relationship between the response and explanatory variables is not deterministic.
LO 12.1
12.1 The Simple Linear Regression Model
 The simple linear regression model is defined as:
y = β0 + β1x + ε
where y and x are, respectively, the response and explanatory variables, and ε is the random error term.
 The coefficients β0 and β1 are the unknown parameters to be estimated.
LO 12.1
12.1 The Simple Linear Regression Model
 By fitting our data to the model, we obtain the sample regression equation:
ŷ = b0 + b1x
where ŷ is the predicted value of the response variable, b0 is the estimate of β0, and b1 is the estimate of β1.
 Since predictions cannot be totally accurate, the difference between the actual and predicted value is called the residual, e = y – ŷ.
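 For instance, if a fitted equation were ŷ = 200 + 10x (hypothetical numbers for illustration) and an observation had x = 80 and y = 1,040, then ŷ = 200 + 10(80) = 1,000 and the residual would be e = 1,040 – 1,000 = 40.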
LO 12.1
12.1 The Simple Linear Regression Model
 This is a scatterplot of debt payments against income
with a superimposed sample regression equation.
 Debt payments rise with income. The vertical distance between y and ŷ represents the residual, e.
LO 12.1
12.1 The Simple Linear Regression Model
 The two parameters β0 and β1 are estimated by minimizing the sum of squared residuals.
 The slope coefficient is first estimated as:
b1 = Σ(xi – x̄)(yi – ȳ) / Σ(xi – x̄)²
 Then the intercept is computed as:
b0 = ȳ – b1x̄
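 As a minimal sketch, the two formulas can be computed directly in Python; the income and debt-payment values below are hypothetical, chosen only for illustration:

# Least-squares estimates for the simple linear regression model.
# x (income) and y (debt payments) are hypothetical illustrative data.
import numpy as np

x = np.array([45.0, 52.0, 61.0, 70.0, 83.0])
y = np.array([610.0, 660.0, 740.0, 790.0, 910.0])

x_bar, y_bar = x.mean(), y.mean()

# b1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)

# b0 = y_bar - b1 * x_bar
b0 = y_bar - b1 * x_bar

y_hat = b0 + b1 * x     # fitted values
e = y - y_hat           # residuals

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")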
12.2 The Multiple Regression Model
LO 12.2 Estimate the multiple linear regression model and interpret
the coefficients.
 If there is more than one explanatory variable available, we can use multiple regression.
 For example, we analyzed how debt payments are influenced by income, but ignored the possible effect of unemployment.
 Multiple regression allows us to explore how several explanatory variables influence the response variable.
LO 12.2
12.2 The Multiple Regression Model
 Suppose there are k explanatory variables. The multiple linear regression model is defined as:
y = β0 + β1x1 + β2x2 + … + βkxk + ε
 where x1, x2, …, xk are the explanatory variables and the βj values are the unknown parameters that we will estimate from the data.
 As before, ε is the random error term.
LO 12.2
12.2 The Multiple Regression Model
 The sample multiple regression equation is:
ŷ = b0 + b1x1 + b2x2 + … + bkxk.
 In multiple regression, there is a slight modification in the interpretation of the slopes b1 through bk, as they show “partial” influences.
 For example, if there are k = 3 explanatory variables, the value b1 estimates how a change in x1 will influence y, assuming x2 and x3 are held constant.
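 As a sketch, the same estimates can be obtained with NumPy's least-squares solver; the income, unemployment, and debt figures below are hypothetical:

# Multiple linear regression estimated by least squares.
# All data values are hypothetical, for illustration only.
import numpy as np

income = np.array([45.0, 52.0, 61.0, 70.0, 83.0, 58.0])
unemployment = np.array([6.2, 5.8, 7.1, 4.9, 5.0, 6.5])
debt = np.array([610.0, 660.0, 740.0, 790.0, 910.0, 700.0])

# Design matrix: a column of ones for the intercept, then x1 and x2.
X = np.column_stack([np.ones_like(income), income, unemployment])

# Solve min ||Xb - y||^2 for b = (b0, b1, b2).
b, _, _, _ = np.linalg.lstsq(X, debt, rcond=None)

# b[1] estimates the change in debt for a one-unit change in income,
# holding unemployment constant (a "partial" influence).
print(f"b0 = {b[0]:.2f}, b1 = {b[1]:.2f}, b2 = {b[2]:.2f}")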
12.3 Goodness-of-Fit Measures
LO 12.3 Calculate and interpret the standard error of the estimate.
 We will introduce three measures to judge how well the sample regression fits the data:
 The Standard Error of the Estimate
 The Coefficient of Determination, R2
 The Adjusted R2
LO 12.3
12.3 Goodness-of-Fit Measures
 To compute the standard error of the estimate, we first compute the mean squared error, MSE.
 We begin with the sum of squares due to error, SSE:
SSE = Σei² = Σ(yi – ŷi)², where the sum runs over i = 1, …, n.
 Dividing SSE by the appropriate degrees of freedom, n – k – 1, yields the mean squared error, MSE:
MSE = SSE / (n – k – 1)
LO 12.3
12.3 Goodness-of-Fit Measures
 The square root of the MSE is the standard error of
the estimate, se.
se  MSE 
e
2
i
n  k 1

2
ˆ


 yi  yi
n  k 1
 In general, the less dispersion there is around the regression line, the smaller se is, which implies a better fit of the model to the data.
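 Continuing the earlier hypothetical sketch, these quantities follow directly from the residuals:

# Standard error of the estimate, continuing the simple-regression
# sketch above (x, y, b0, b1 are the hypothetical values from there).
import numpy as np

n, k = len(y), 1                  # k = 1 explanatory variable
y_hat = b0 + b1 * x               # fitted values
sse = np.sum((y - y_hat) ** 2)    # SSE: sum of squared residuals
mse = sse / (n - k - 1)           # MSE = SSE / (n - k - 1)
se = np.sqrt(mse)                 # standard error of the estimate

print(f"SSE = {sse:.2f}, MSE = {mse:.2f}, se = {se:.2f}")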
12.3 Goodness-of-Fit Measures
LO 12.4 Calculate and interpret the coefficient of determination R2.
 The coefficient of determination, commonly referred to as R2, is another goodness-of-fit measure that is easier to interpret than the standard error of the estimate.
 R2 quantifies the fraction of variation in the response variable that is explained by changes in the explanatory variables.
LO 12.4
12.3 Goodness-of-Fit Measures
 The coefficient of determination is computed as:
R2 = 1 – SSE/SST
where SST = Σ(yi – ȳ)² and SSE = Σ(yi – ŷi)².
 The SST, called the total sum of squares, denotes the
total variation in the response variable.
 The SST can be broken down into two components: the variation explained by the regression equation (the sum of squares due to regression, or SSR) and the unexplained variation (the sum of squares due to error, or SSE). That is, SST = SSR + SSE.
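 A short sketch of the computation, continuing the hypothetical example above:

# Coefficient of determination, using y and y_hat from the
# hypothetical sketch above.
import numpy as np

sst = np.sum((y - y.mean()) ** 2)   # total variation in y
sse = np.sum((y - y_hat) ** 2)      # unexplained variation
ssr = sst - sse                     # explained variation (SST = SSR + SSE)

r2 = 1 - sse / sst                  # fraction of variation explained
print(f"R2 = {r2:.3f}")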
12.3 Goodness-of-Fit Measures
LO 12.5 Differentiate between R2 and adjusted R2.
 Adding more explanatory variables always results in a higher (or at least no lower) R2.
 Some of these variables may be unimportant and should not be in the model.
 The Adjusted R2 tries to balance raw explanatory power against the desire to include only important predictors.
LO 12.5
12.3 Goodness-of-Fit Measures
 The Adjusted R2 is computed as:
Adjusted R2 = 1 – (1 – R2) × (n – 1)/(n – k – 1)
 The Adjusted R2 penalizes the R2 for each additional explanatory variable included in the model.
As with our other goodness-of-fit measures, we
typically allow the computer to compute the Adjusted
R2. It’s shown directly below the R2 in the Excel
regression output.
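 The formula is a one-liner in code; this sketch continues the hypothetical example (r2, n, and k as computed above):

# Adjusted R2 penalizes R2 for each additional explanatory variable.
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(f"Adjusted R2 = {adj_r2:.3f}")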
12.4 Tests of Significance
LO 12.6 Conduct tests of individual significance.
 With two explanatory variables (Income and Unemployment) to choose from, we can formulate three possible linear models:
Model 1: Debt = β0 + β1Income + ε
Model 2: Debt = β0 + β1Unemployment + ε
Model 3: Debt = β0 + β1Income + β2Unemployment + ε
LO 12.6
12.4 Tests of Significance
 Consider our standard multiple regression model:
y = β0 + β1x1 + β2x2 + … + βkxk + ε
 In general, we can test whether βj is equal to, greater than, or less than some hypothesized value βj0.
 This test can take one of three forms:
H0: βj = βj0 versus HA: βj ≠ βj0 (two-tailed)
H0: βj ≤ βj0 versus HA: βj > βj0 (right-tailed)
H0: βj ≥ βj0 versus HA: βj < βj0 (left-tailed)
LO 12.6
12.4 Tests of Significance
 The test statistic follows the t distribution with df = n – k – 1 degrees of freedom.
 It is calculated as:
tdf = (bj – βj0) / se(bj)
 where se(bj) is the standard error of the estimator bj.
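 A sketch of the most common version of this test (βj0 = 0); the estimate and standard error below are hypothetical stand-ins for values taken from regression output:

# Two-tailed t test of H0: beta_j = 0 against HA: beta_j != 0.
# b_j and se_bj are hypothetical values from a regression printout.
from scipy import stats

b_j = 10.44      # estimated coefficient (illustrative)
se_bj = 2.98     # its standard error (illustrative)
n, k = 26, 2     # sample size and number of explanatory variables

df = n - k - 1
t_stat = (b_j - 0) / se_bj                 # t = (bj - bj0) / se(bj)
p_value = 2 * stats.t.sf(abs(t_stat), df)  # two-tailed p-value

print(f"t({df}) = {t_stat:.2f}, p-value = {p_value:.4f}")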
LO 12.6
12.4 Tests of Significance
 By far the most common hypothesis test for an individual coefficient is to test whether its value differs from zero.
 To see why, consider our model:
y = β0 + β1x1 + β2x2 + … + βkxk + ε
 If a coefficient βj equals zero, the corresponding explanatory variable is not a significant predictor of the response variable.
12.4 Tests of Significance
LO 12.7 Conduct a test of joint significance.
 In addition to conducting tests of individual significance, we may also want to test the joint significance of all k explanatory variables at once.
 The competing hypotheses for a test of joint significance are:
H0: β1 = β2 = … = βk = 0
HA: At least one βj ≠ 0
LO 12.7
12.4 Tests of Significance
 The test statistic for a test of joint significance is:
F(df1, df2) = (SSR/k) / (SSE/(n – k – 1)) = MSR/MSE
where MSR and MSE are, respectively, the mean square regression and the mean square error.
 The numerator degrees of freedom, df1, equal k, while the denominator degrees of freedom, df2, equal n – k – 1.
 Fortunately, statistical software generally reports the value of F(df1, df2) and its p-value as standard output, making computation by hand rarely necessary.
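 A sketch of the computation; the SSR, SSE, n, and k values below are hypothetical stand-ins for numbers taken from an ANOVA table:

# F test of joint significance: H0: beta_1 = ... = beta_k = 0.
# SSR, SSE, n, and k are hypothetical illustrative inputs.
from scipy import stats

ssr, sse = 1508.0, 432.0   # explained and unexplained variation
n, k = 26, 2               # sample size, number of explanatory variables

msr = ssr / k              # mean square regression, df1 = k
mse = sse / (n - k - 1)    # mean square error, df2 = n - k - 1
f_stat = msr / mse

df1, df2 = k, n - k - 1
p_value = stats.f.sf(f_stat, df1, df2)   # right-tail p-value

print(f"F({df1}, {df2}) = {f_stat:.2f}, p-value = {p_value:.4g}")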