March 3

ST 370
Probability and Statistics for Engineers
Multiple Linear Regression
Often more than one predictor variable can be used to predict the
value of a response variable.
The basic approach of the simple linear regression model may be
extended to include multiple predictors.
The principles carry over, but the computations are more tedious, and
hand calculation is largely infeasible.
For example, consider the data on the strength of the bond between
a component and its frame:
wireBond <- read.csv("Data/Table-01-02.csv")
pairs(wireBond)
Clearly Length ($x_1$) could be used to predict Strength ($y$), but possibly also Height ($x_2$) or Length$^2$ ($x_1^2$).
Multiple Linear Regression Model
The multiple linear regression model with $k$ predictors is
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \epsilon.$$
Notation
When we have $n$ observations on data like these, we write them
$$Y_i = \beta_0 + \sum_{j=1}^{k} x_{i,j}\,\beta_j + \epsilon_i, \qquad i = 1, 2, \ldots, n;$$
that is, $x_{i,j}$ is the value of the $j$th predictor $x_j$ in the $i$th observation.
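As an illustration (not part of the wire bond example), we can simulate data from this model with $k = 2$ predictors and check that lm() approximately recovers the coefficients; the sample size and coefficient values below are arbitrary choices:
# simulate n observations from Y = 2 + 1.5*x1 - 0.8*x2 + eps (values arbitrary)
set.seed(1)
n <- 30
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 5)
eps <- rnorm(n, sd = 1)
y <- 2 + 1.5 * x1 - 0.8 * x2 + eps
coef(lm(y ~ x1 + x2))  # estimates should be near (2, 1.5, -0.8)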
Predictors and Variables
Each term in the equation is a predictor, but the predictors are not necessarily distinct independent variables.
For example, consider the relationship between Strength and Length:
plot(Strength ~ Length, wireBond)
Strength increases with Length, and roughly linearly, so we could use
the single-variable equation
$$Y = \beta_0 + \beta_1 x + \epsilon.$$
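Fitting this single-predictor model in R is the familiar simple linear regression; a short sketch (the name fit1 is our own):
# simple linear regression of Strength on Length
fit1 <- lm(Strength ~ Length, wireBond)
summary(fit1)
abline(fit1)  # add the fitted line to the scatterplot above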
Close examination suggests that the relationship may be curved, not
linear, so we might want to fit the quadratic equation
$$Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \epsilon.$$
If we write
$$x_1 = x, \qquad x_2 = x^2,$$
this becomes
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon,$$
the multiple regression model with $k = 2$ predictors.
But the equation brings in only one independent variable, Length.
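One way to make this concrete in R (a sketch; the column name Length2 is our own choice) is to create the squared term as an explicit second column, so lm() sees two predictors built from one variable:
# create x2 = x^2 as an explicit column, then fit the two-predictor model
wireBond$Length2 <- wireBond$Length^2
fit2 <- lm(Strength ~ Length + Length2, wireBond)
coef(fit2)  # same fit as lm(Strength ~ Length + I(Length^2), wireBond)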
Least Squares
As with the single-predictor model, we usually find parameter
estimates using the least squares approach.
For any proposed values $b_0, b_1, \ldots, b_k$ we form the predicted values
$$b_0 + b_1 x_{i,1} + \cdots + b_k x_{i,k}, \qquad i = 1, 2, \ldots, n,$$
and the residuals
$$e_i = y_i - (b_0 + b_1 x_{i,1} + \cdots + b_k x_{i,k}), \qquad i = 1, 2, \ldots, n.$$
The sum of squares to be minimized is
$$L(b_0, b_1, \ldots, b_k) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left[ y_i - (b_0 + b_1 x_{i,1} + \cdots + b_k x_{i,k}) \right]^2.$$
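To make the criterion concrete, here is a small sketch (the helper L and the variable names are our own) that evaluates the sum of squares for candidate coefficients on the wire bond data; lm()'s estimates should achieve the minimum:
# sum of squared residuals for candidate coefficients b = (b0, b1, ..., bk)
L <- function(b, y, X) sum((y - (b[1] + X %*% b[-1]))^2)
X <- as.matrix(wireBond[, c("Length", "Height")])
y <- wireBond$Strength
fit <- lm(Strength ~ Length + Height, wireBond)
L(c(0, 1, 1), y, X)  # an arbitrary guess
L(coef(fit), y, X)   # the least squares minimum; equals deviance(fit)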
The least squares estimates $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$ that minimize $L(b_0, b_1, \ldots, b_k)$ cannot in general be written out in simple closed form; they are found by solving a system of linear equations (the normal equations).
The residual sum of squares is again
$$\mathrm{SSE} = \sum_{i=1}^{n} e_i^2,$$
but the degrees of freedom for residuals are $n - (k + 1)$, so the estimate of $\sigma^2$ is
$$\hat\sigma^2 = \mathrm{MSE} = \frac{\mathrm{SSE}}{n - (k + 1)}.$$
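These quantities can be pulled from a fitted model in R; for example, for the two-predictor fit (so $k = 2$; the variable names are our own):
fit <- lm(Strength ~ Length + Height, wireBond)
SSE <- sum(resid(fit)^2)   # residual sum of squares; also deviance(fit)
dfres <- df.residual(fit)  # n - (k + 1)
MSE <- SSE / dfres         # estimate of sigma^2
sqrt(MSE)                  # matches the "Residual standard error" in summary(fit)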
Fitting the model
Use lm() to fit the multiple regression model:
# the quadratic model:
summary(lm(Strength ~ Length + I(Length^2), wireBond))
# the two-variable model:
summary(lm(Strength ~ Length + Height, wireBond))
# quadratic in Length, plus Height:
summary(lm(Strength ~ Length + I(Length^2) + Height, wireBond))
Note
The arithmetic operators “+”, “-”, “*”, “/”, and “^” have special
meanings within a formula, so the predictor Length^2 must be
“wrapped” in the identity function I(), otherwise it is misparsed as
part of the formula.
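An alternative worth knowing: poly() builds polynomial terms directly inside a formula. By default it uses orthogonal polynomials, so the individual coefficients differ from the I() version although the fitted values agree; raw = TRUE reproduces the plain powers:
# two equivalent ways to fit the quadratic model
summary(lm(Strength ~ poly(Length, 2, raw = TRUE), wireBond))  # plain x, x^2
summary(lm(Strength ~ poly(Length, 2), wireBond))              # orthogonal basis, same fit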