Introduction to Multiple Regression
Keegan Korthauer, Department of Statistics, UW Madison
Basic MLR Model
• Dependent continuous variable y
• p independent continuous variables x1, x2,…, xp
• n observations: ordered (p+1)-tuples (yi, x1i, x2i,…, xpi)
yi = β0 + β1 x1i + β2 x2i + ... + βp xpi + εi
• Predicted yi for a set of x1i, x2i,…, xpi:
ŷi = β̂0 + β̂1 x1i + ... + β̂p xpi
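As a minimal sketch of the model and its fitted values, the R code below simulates data with p = 2 predictors, fits the model, and compares a hand-computed ŷ with predict(). The simulated data, the true coefficients (150, -1, -0.5) and the names dat, x1, x2 are illustrative assumptions, not the lecture's data.

```r
# Simulated data (an assumption for illustration): n = 46 observations, p = 2 predictors
set.seed(1)
n  <- 46
x1 <- runif(n, 20, 60)
x2 <- runif(n, 40, 60)
y  <- 150 - 1.0 * x1 - 0.5 * x2 + rnorm(n, sd = 10)  # y = b0 + b1*x1 + b2*x2 + error
dat <- data.frame(y, x1, x2)

fit <- lm(y ~ x1 + x2, data = dat)
b   <- coef(fit)                      # estimated beta0, beta1, beta2

# Predicted value for the first observation, by hand and via predict()
yhat_hand <- b[1] + b[2] * dat$x1[1] + b[3] * dat$x2[1]
yhat_r    <- predict(fit, newdata = dat[1, ])
```

The two predictions agree because predict() evaluates the same fitted equation.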
Least Squares Coefficients
• Minimize the sum of squared residuals (SSE) to obtain coefficient point estimates β̂0, β̂1,..., β̂p
  – Analogous to SLR, involves taking p+1 partial derivatives, setting them equal to zero, and solving a system of p+1 equations…
  – OR a much more elegant expression using linear algebra…
  – But we’ll rely on R to calculate the values for us
• SSE is still the sum of squared differences between observed y and predicted ŷ (it can no longer be visualized in 2D)
$e_i = y_i - \hat{y}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{1i} - \dots - \hat{\beta}_p x_{pi}$
$SSE = \sum_{i=1}^{n} e_i^2$
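The slides do not spell out the linear algebra expression. A common form, assumed here, is the normal equations (X'X) β̂ = X'y, where X is the design matrix with a leading column of ones. The sketch below solves them on simulated data (an illustrative assumption) and compares the result with lm().

```r
# Simulated data (illustrative assumption)
set.seed(2)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 3 * x2 + rnorm(n)

X <- cbind(1, x1, x2)                       # design matrix with an intercept column
beta_hat <- solve(t(X) %*% X, t(X) %*% y)   # solves (X'X) b = X'y

fit <- lm(y ~ x1 + x2)                      # R's own least squares fit
e   <- y - X %*% beta_hat                   # residuals e_i = y_i - yhat_i
SSE <- sum(e^2)                             # sum of squared residuals
```

lm() uses a QR decomposition internally rather than forming X'X, which is numerically more stable, but both give the same estimates on well-conditioned data.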
Interpreting Coefficients
• β̂0 is the predicted value of y when all xi are equal to zero
  – Often this is nonsensical or involves extrapolation
• β̂i is the predicted change in y for every one unit increase in xi while holding all other predictors constant
  – Or, after adjusting for all other predictors in the model
  – Often of the most interest to us
• We can use these estimates to predict the value of y for a new observation with x1,…,xp (as long as each of the values x1,…,xp is within the range of the observed values)
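A small sketch of the "one unit increase, all else held constant" interpretation follows, together with a simple range check before predicting. The data are simulated and the names dat, new and in_range are hypothetical.

```r
# Simulated data (illustrative assumption)
set.seed(3)
n   <- 46
dat <- data.frame(x1 = runif(n, 20, 60), x2 = runif(n, 40, 60))
dat$y <- 150 - 1.0 * dat$x1 - 0.5 * dat$x2 + rnorm(n, sd = 10)
fit <- lm(y ~ x1 + x2, data = dat)

# Two new observations that differ by one unit in x1, with x2 held constant
new  <- data.frame(x1 = c(40, 41), x2 = c(50, 50))
pred <- predict(fit, newdata = new)
change <- unname(pred[2] - pred[1])   # equals the estimated coefficient of x1

# Check that each new x value lies within the observed range of that predictor
in_range <- all(new$x1 >= min(dat$x1) & new$x1 <= max(dat$x1),
                new$x2 >= min(dat$x2) & new$x2 <= max(dat$x2))
```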
Interpreting Interactions
We can rearrange the following model:
yi = β0 + β1 x1i + β2 x2i + β3 x1i x2i + εi
= β0 + (β1 + β3 x2i )x1i + β2 x2i + εi
= β0 + β1 x1i + (β2 + β3 x1i )x2i + εi
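The rearrangement above can be checked numerically. In this sketch (simulated data, illustrative names), the estimated slope of x1 at a given x2 is b1 + b3·x2, and it matches the change in the prediction for a one unit increase in x1.

```r
# Simulated data with a true interaction (illustrative assumption)
set.seed(4)
n  <- 100
x1 <- runif(n, 0, 5)
x2 <- runif(n, 0, 5)
y  <- 10 + 2 * x1 - 1 * x2 + 0.5 * x1 * x2 + rnorm(n)

fit <- lm(y ~ x1 * x2)          # x1 * x2 expands to x1 + x2 + x1:x2
b   <- coef(fit)

# Estimated slope of x1 at a given value of x2: b1 + b3 * x2
slope_x1 <- function(x2_value) unname(b["x1"] + b["x1:x2"] * x2_value)

# Change in prediction for a one-unit increase in x1 while x2 is fixed at 3
p <- predict(fit, newdata = data.frame(x1 = c(1, 2), x2 = c(3, 3)))
change_at_3 <- unname(p[2] - p[1])
```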
• The presence of β3 here indicates that the effect of x1 depends on the value of x2
• In other words, the model predicts that y will change by β̂1 + β̂3 x2 units for every one unit increase in x1

What Does Interaction Look Like?
Price = 504.2 − 98.16*Bath − 78.91*Bed + 30.39*Bath*Bed

Example: Patient Satisfaction
A hospital administrator wished to study the relation between
• Y: patient satisfaction (percent)
• X1: patient’s age
• X2: severity of illness (an index)
• X3: patient’s anxiety level (an index)
They randomly selected 46 patients and collected each measurement above. The first 5 observations are:
Satis  Age  Sev  Anx
48     50   51   2.3
57     36   46   2.3
66     40   48   2.2
70     41   44   1.8
89     28   43   1.8
...
Estimating Coefficients in R
> # read the data, make its columns visible by name, fit the model
> ptsat <- read.table("patient_satisfaction.txt", header=TRUE)
> attach(ptsat)
> fit1 <- lm(Satis ~ Age + Sev + Anx)
> summary(fit1)
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  158.4913    18.1259   8.744 5.26e-11 ***
Age           -1.1416     0.2148  -5.315 3.81e-06 ***
Sev           -0.4420     0.4920  -0.898   0.3741
Anx          -13.4702     7.0997  -1.897   0.0647 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 10.06 on 42 degrees of freedom
Multiple R-squared: 0.6822, Adjusted R-squared: 0.6595
F-statistic: 30.05 on 3 and 42 DF, p-value: 1.542e-10

The Estimate column holds the coefficient estimates β̂i.
What about those other Sums of Squares?
We’re in luck! They’re exactly the same as in SLR.
Sums of Squares
• Error Sum of Squares:
$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} y_i^2 - \sum_{i=1}^{n} \hat{y}_i^2$
• Total Sum of Squares:
$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2$
• Regression Sum of Squares:
$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$
Analysis of Variance property: SST = SSR + SSE
Coefficient of Determination (goodness-of-fit measure):
$R^2 = \frac{\text{Regression sum of squares}}{\text{Total sum of squares}} = \frac{SSR}{SST}$
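The sums of squares can be computed directly from a fitted model. The sketch below uses simulated data (an illustrative assumption) to check the analysis of variance property and the definition of R².

```r
# Simulated data (illustrative assumption): 3 predictors, x3 has no true effect
set.seed(5)
n   <- 46
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 5 + 1.5 * dat$x1 - 2 * dat$x2 + rnorm(n)
fit <- lm(y ~ x1 + x2 + x3, data = dat)

yhat <- fitted(fit)
ybar <- mean(dat$y)

SSE <- sum((dat$y - yhat)^2)   # error sum of squares
SST <- sum((dat$y - ybar)^2)   # total sum of squares
SSR <- sum((yhat - ybar)^2)    # regression sum of squares
R2  <- SSR / SST               # coefficient of determination
```

The identity SST = SSR + SSE relies on the model containing an intercept, as every model in these slides does.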
Assumptions for Inference
• Again we have to make some assumptions in order to perform any inference (CIs/PIs/HTs)
• Simplest case: same four assumptions on the errors:
1. Errors ε1,…, εn are random and independent. In particular, the magnitude of any error εi does not influence the value of the next error εi+1
2. Errors ε1,…, εn all have mean 0
3. Errors ε1,…, εn all have the same variance, denoted by σ²
4. Errors ε1,…, εn are normally distributed
Consequences of the Assumptions
• The errors ε1,…, εn are independent normal random variables with mean zero and variance σ²:
$\varepsilon_i \sim N(0, \sigma^2)$
• Since yi = β0 + β1 x1i + β2 x2i + ... + βp xpi + εi, each yi is a constant plus εi, so the yi are also normally distributed:
$y_i \sim N(\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_p x_{pi},\ \sigma^2)$
Estimating the Error Variance σ²
• In SLR:
$\hat{\sigma}^2 = s^2 = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-2} = \frac{SSE}{n-2}$
• In MLR:
$\hat{\sigma}^2 = s^2 = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-p-1} = \frac{SSE}{n-p-1}$
• Estimates of $s_{\hat{\beta}_0}$ and $s_{\hat{\beta}_1}$ are the same as before, but using the appropriate value of s
• We’ll obtain them from R
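The following sketch computes s by hand on simulated data (an illustrative assumption) and compares it with R's residual standard error. The standard error formula used here, the square root of the diagonal of s²(X'X)⁻¹, is the standard least squares result and is not stated in the slides.

```r
# Simulated data (illustrative assumption): n = 46, p = 3
set.seed(6)
n <- 46
p <- 3
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 5 + 1.5 * dat$x1 - 2 * dat$x2 + rnorm(n, sd = 2)
fit <- lm(y ~ x1 + x2 + x3, data = dat)

SSE <- sum(resid(fit)^2)
s2  <- SSE / (n - p - 1)     # estimate of sigma^2 on n - p - 1 degrees of freedom
s   <- sqrt(s2)              # "Residual standard error" in summary(fit)

# Coefficient standard errors: sqrt of the diagonal of s^2 * (X'X)^{-1}
X  <- model.matrix(fit)
se <- sqrt(diag(s2 * solve(t(X) %*% X)))
```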
Other MLR Quantities in R
In the summary(fit1) output for fit1 <- lm(Satis ~ Age + Sev + Anx):
• Estimate column: the coefficient estimates β̂i (e.g. -1.1416 for Age)
• Std. Error column: the standard errors sβ̂i (e.g. 0.2148 for Age)
• Residual standard error: s = 10.06 on 42 degrees of freedom
• Multiple R-squared: R² = 0.6822
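These quantities can also be pulled out of the summary object instead of being read off the printout. This is a sketch on simulated data (an illustrative assumption), and the variable names are hypothetical.

```r
# Simulated data (illustrative assumption): n = 46, p = 3
set.seed(7)
n   <- 46
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 5 + 1.5 * dat$x1 - 2 * dat$x2 + rnorm(n, sd = 2)
fit <- lm(y ~ x1 + x2 + x3, data = dat)
sm  <- summary(fit)

coef_table <- coef(sm)                    # matrix: Estimate, Std. Error, t value, Pr(>|t|)
beta_hat   <- coef_table[, "Estimate"]    # coefficient estimates
se_beta    <- coef_table[, "Std. Error"]  # their standard errors
s          <- sm$sigma                    # residual standard error
r2         <- sm$r.squared                # multiple R-squared
df_resid   <- sm$df[2]                    # residual degrees of freedom, n - p - 1
```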
HTs for Coefficients (One at a Time)
• Under assumptions 1-4,
$\frac{\hat{\beta}_i - \beta_i}{s_{\hat{\beta}_i}} \sim t_{n-p-1}$
• We can test a hypothesis for any of the βi (one at a time) using a t-test where the quantity above is the test statistic
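As a worked check, the t statistic for Age can be recomputed from the estimate and standard error reported in the lecture's summary(fit1) output. This is a sketch: the inputs are rounded, so the results only match the printed values approximately.

```r
# Values copied from the Age row of the lecture's summary(fit1) output
beta_hat <- -1.1416     # estimate
se_beta  <-  0.2148     # standard error
n <- 46                 # patients
p <- 3                  # predictors: Age, Sev, Anx

beta_null <- 0                               # H0: coefficient of Age is 0
t_stat <- (beta_hat - beta_null) / se_beta   # about -5.315
df     <- n - p - 1                          # 42
p_val  <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)  # two-sided p-value
```

The resulting p-value can be compared with the 3.81e-06 printed in the Pr(>|t|) column for Age.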
Other MLR Quantities in R
In the summary(fit1) output, the t value and Pr(>|t|) columns give the test statistics and p-values for the null that the coefficient = 0 (e.g. for Age: t = -5.315, p-value = 3.81e-06).
HTs for Coefficients (Globally)
• Say we want to test whether all the predictor coefficients are equal to zero, i.e.
H0: β1 = ... = βp = 0, versus H1: at least one of the βi is not zero
• The test statistic (with its distribution under H0) is:
$F = \frac{SSR / p}{SSE / (n - p - 1)} \sim F_{p,\, n-p-1}$
  – Looks like a ratio of ‘variances’!
• This uses the F distribution, which we used previously to test whether the ratio of two (normal) variances was different from 1 (section 6.11)
• If this null hypothesis is not rejected, the model may not be useful
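The global F statistic can be computed by hand from the sums of squares. The sketch below does this on simulated data (an illustrative assumption), then recovers the lecture's reported F from its rounded R², using SSR/SSE = R²/(1 − R²).

```r
# Simulated data (illustrative assumption): n = 46, p = 3
set.seed(9)
n <- 46
p <- 3
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 5 + 1.5 * dat$x1 - 2 * dat$x2 + rnorm(n, sd = 2)
fit <- lm(y ~ x1 + x2 + x3, data = dat)

ybar <- mean(dat$y)
SSR  <- sum((fitted(fit) - ybar)^2)
SSE  <- sum(resid(fit)^2)

F_stat <- (SSR / p) / (SSE / (n - p - 1))
p_val  <- pf(F_stat, df1 = p, df2 = n - p - 1, lower.tail = FALSE)  # upper-tail p-value

# Same statistic from the R-squared reported in the lecture output (rounded input)
F_lecture <- (0.6822 / 3) / ((1 - 0.6822) / 42)   # close to the reported 30.05
```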
Other MLR Quantities in R
In the summary(fit1) output:
• The t value and Pr(>|t|) columns give the test statistics and p-values for the null that each coefficient = 0 (one at a time)
• The last line gives the global test: F-statistic: 30.05 on 3 and 42 DF, p-value: 1.542e-10
Checking Assumptions
• All of what we discussed for SLR diagnostics applies
• In addition, look at a plot of each of the independent variables against the residuals and look for trends or heteroscedasticity
Residual Plot
# residual plot: residuals against fitted values
plot(fitted(fit1), resid(fit1), xlab="Fitted", ylab="Residuals")
abline(h=0, col="red")
Time Order (Independence) Plot
# time order plot: residuals in the order the observations were recorded
plot(seq_along(resid(fit1)), resid(fit1), xlab="Observation Number", ylab="Residuals")
abline(h=0, col="red")
QQ (Normality) Plot
# QQ plot: points near the reference line suggest approximately normal residuals
qqnorm(resid(fit1))
qqline(resid(fit1))
Residuals Against All Predictors
# residuals against all predictors (columns referenced through ptsat, so attach() is not required)
plot(ptsat$Age, resid(fit1), xlab="Age", ylab="Residuals")
abline(h=0, col="red")

plot(ptsat$Sev, resid(fit1), xlab="Severity", ylab="Residuals")
abline(h=0, col="red")

plot(ptsat$Anx, resid(fit1), xlab="Anx", ylab="Residuals")
abline(h=0, col="red")
Next
• More on Multiple Linear Regression
  – Collinearity and Confounding
  – Model Selection