
ST430: Introduction to Regression Analysis, Chapter 9, Sections 1-4
Luo Xiao
November 16, 2015
Special Topics
Some complex model-building problems can be handled using the linear
regression approach covered up to this point.
For example, piecewise regression, including piecewise linear regression
and spline regression.
Some require more general nonlinear approaches.
For example, logistic and probit regression for binary responses.
Sometimes, theory may suggest that a relationship is a straight line, but
with different slopes in different parts of the data.
For example,
$$E(Y) = \begin{cases} \beta_0 + \beta_1 x, & x \le \xi, \\ \beta_0^* + \beta_1^* x, & x > \xi. \end{cases}$$
Usually we want the model to be continuous at $x = \xi$:
$$\beta_0 + \beta_1 \xi = \beta_0^* + \beta_1^* \xi.$$
Instead of imposing this as a constraint, the model is usually reparameterized as
$$E(Y) = \beta_0 + \beta_1 x + \beta_2 \max(x - \xi, 0).$$
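To see why this is equivalent, note that for $x \le \xi$ the mean is $\beta_0 + \beta_1 x$, while for $x > \xi$ it is
$$\beta_0 + \beta_1 x + \beta_2 (x - \xi) = (\beta_0 - \beta_2 \xi) + (\beta_1 + \beta_2)\, x,$$
so $\beta_0^* = \beta_0 - \beta_2 \xi$ and $\beta_1^* = \beta_1 + \beta_2$. Continuity at $x = \xi$ now holds automatically, and $\beta_2$ measures the change in slope at the knot.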
Piecewise regression
Example 9.1:
Y : compressive strength of concrete;
X : ratio of water to cement (by mass);
Note: the compressive strength decreases as the ratio increases.
R code (output in “output1.txt”):
setwd("~/Dropbox/teaching/2015Fall/R_datasets/Exercises&Exampl
load("CEMENT.RData")
with(CEMENT,plot(STRENGTH~RATIO,pch=20))
lq = lm(STRENGTH ~ RATIO + I(RATIO^2), CEMENT)
summary(lq)
curve(predict(lq, data.frame(RATIO = x)), add = TRUE)
[R graph: scatter plot of STRENGTH vs. RATIO with the fitted quadratic curve]
In the example, suppose the theory suggests that $\xi = 70$ is the critical ratio:
R code (output in “output2.txt”):
setwd("~/Dropbox/teaching/2015Fall/R_datasets/Exercises&Exampl
load("CEMENT.RData")
with(CEMENT,plot(STRENGTH~RATIO,pch=20))
xi = 70
lp70 = lm(STRENGTH ~ RATIO + pmax(RATIO - xi, 0), CEMENT)
summary(lp70)
curve(predict(lp70, data.frame(RATIO = x)), add = TRUE,
col = "red")
Note the slightly higher $R^2$ and $R_a^2$.
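If theory does not suggest a knot, one option (a sketch, not from the slides) is to profile the residual sum of squares over a grid of candidate knots, here taken to span most of the observed RATIO range:
# Choose the knot by minimizing the SSE over a grid (assumes CEMENT is loaded)
knots = seq(55, 75, by = 0.5)
sse = sapply(knots, function(k)
  deviance(lm(STRENGTH ~ RATIO + pmax(RATIO - k, 0), CEMENT)))
knots[which.min(sse)]  # candidate knot with the smallest SSE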
[R graph: STRENGTH vs. RATIO with the piecewise linear fit (red)]
Sometimes the lines are not constrained to be continuous.
Example 9.2:
Y : reading test score;
X : age of children/young adults.
R code:
setwd("~/Dropbox/teaching/2015Fall/R_datasets/Exercises&Exampl
load("READSCORES.RData")
with(READSCORES, plot(SCORE~AGE,pch=20))
[Scatter plot: SCORE vs. AGE]
Models
We may fit a quadratic regression.
To fit separate lines for AGE $\le 14$ and AGE $> 14$, include the interaction of AGE and the indicator variable for AGE $> 14$.
The piecewise regression model has higher $R_a^2$ than the quadratic model.
R code:
setwd("~/Dropbox/teaching/2015Fall/R_datasets/Exercises&Exampl
load("READSCORES.RData")
with(READSCORES, plot(SCORE~AGE,pch=20))
l14 = lm(SCORE ~ AGE * (AGE > 14), READSCORES)
lq = lm(SCORE~AGE + I(AGE^2), READSCORES)
curve(predict(l14, data.frame(AGE = x)), to = 14,
add = TRUE, col = "blue")
curve(predict(l14, data.frame(AGE = x)), from = 15,
add = TRUE, col = "blue")
curve(predict(lq, data.frame(AGE = x)),add=TRUE,col="red")
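To check the claim that the piecewise model has the higher $R_a^2$, the two fits can be compared directly:
summary(l14)$adj.r.squared  # piecewise linear
summary(lq)$adj.r.squared   # quadratic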
[R fits: SCORE vs. AGE with the piecewise linear fit (blue) and the quadratic fit (red)]
Inverse prediction
One use of a regression model
$$E(Y) = \beta_0 + \beta_1 X$$
is to predict $Y$ for a new value of $X$, say $x_0$.
Sometimes, instead, we observe a new y0 , and want to make an inference
about the new x0 .
Often X is expensive to measure, but Y is cheap; the relationship is
determined from a calibration dataset.
Because
$$y_0 = \beta_0 + \beta_1 x_0 + \varepsilon_0,$$
we can solve for $x_0$:
$$x_0 = \frac{y_0 - \beta_0 - \varepsilon_0}{\beta_1}.$$
We do not observe $\varepsilon_0$, but we know that $E(\varepsilon_0) = 0$.
Similarly, we do not know $\beta_0$ and $\beta_1$, but we have estimates $\hat\beta_0$ and $\hat\beta_1$.
This suggests the estimate
$$\hat x_0 = \frac{y_0 - \hat\beta_0}{\hat\beta_1}.$$
This is known as inverse prediction.
An approximate $100(1 - \alpha)\%$ prediction interval for $x_0$ is
$$\hat x_0 \pm t_{\alpha/2} \times \frac{s}{\hat\beta_1} \times \sqrt{1 + \frac{1}{n} + \frac{(\hat x_0 - \bar x)^2}{SS_{xx}}}.$$
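As an illustration, the interval can be computed directly from a fitted simple linear regression; a minimal sketch, where fit = lm(y ~ x, dat) is a hypothetical calibration fit and y0 the newly observed response:
b = coef(fit)                    # b[1] = beta0-hat, b[2] = beta1-hat
s = summary(fit)$sigma           # residual standard error
n = nrow(dat)
x0hat = (y0 - b[1]) / b[2]       # inverse prediction of x0
SSxx = sum((dat$x - mean(dat$x))^2)
me = qt(0.975, n - 2) * (s / abs(b[2])) *
  sqrt(1 + 1/n + (x0hat - mean(dat$x))^2 / SSxx)
c(lower = x0hat - me, upper = x0hat + me)  # approximate 95% interval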
An alternative approach is to fit the inverse regression
$$X = \gamma_0 + \gamma_1 Y + \varepsilon.$$
Then use the standard prediction interval
$$\hat x_0 \pm t_{\alpha/2} \times s_{x|y} \times \sqrt{1 + \frac{1}{n} + \frac{(y_0 - \bar y)^2}{SS_{yy}}},$$
where
$$\hat x_0 = \hat\gamma_0 + \hat\gamma_1 y_0.$$
This is not supported by the standard theory, because, in the calibration
data, X is fixed and Y is random.
But it has been shown to work well in practice.
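In R the inverse-regression interval is just an ordinary prediction interval from the reversed fit; a sketch with the same hypothetical dat and y0 as above:
inv = lm(x ~ y, dat)             # regress the expensive X on the cheap Y
predict(inv, newdata = data.frame(y = y0),
        interval = "prediction") # standard 95% PI for x0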
Weighted least squares
Recall the linear regression equation
$$E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k.$$
We have estimated the parameters $\beta_0, \beta_1, \beta_2, \ldots, \beta_k$ by minimizing the sum of squared residuals
$$\mathrm{SSE} = \sum_{i=1}^n \left( Y_i - \hat Y_i \right)^2 = \sum_{i=1}^n \left[ Y_i - \left( \hat\beta_0 + \hat\beta_1 X_{i,1} + \hat\beta_2 X_{i,2} + \cdots + \hat\beta_k X_{i,k} \right) \right]^2.$$
Sometimes we want to give some observations more weight than others.
We achieve this by minimizing a weighted sum of squares:
$$\mathrm{WSSE} = \sum_{i=1}^n w_i \left( Y_i - \hat Y_i \right)^2 = \sum_{i=1}^n w_i \left[ Y_i - \left( \hat\beta_0 + \hat\beta_1 X_{i,1} + \hat\beta_2 X_{i,2} + \cdots + \hat\beta_k X_{i,k} \right) \right]^2.$$
The resulting $\hat\beta$s are called weighted least squares (WLS) estimates, and the WLS residuals are
$$\sqrt{w_i}\,(Y_i - \hat Y_i).$$
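In R, WLS is requested through the weights argument of lm(); a minimal sketch, assuming a hypothetical data frame dat with response y, predictor x, and chosen weights w:
wfit = lm(y ~ x, data = dat, weights = w)
coef(wfit)                # WLS estimates
weighted.residuals(wfit)  # WLS residuals sqrt(w_i) * (y_i - yhat_i)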
Why use weights?
Suppose that the variance is not constant:
$$\mathrm{var}(Y_i) = \sigma_i^2.$$
If we use weights
$$w_i \propto \frac{1}{\sigma_i^2},$$
the WLS estimates have smaller standard errors than the ordinary least
squares (OLS) estimates.
That is, the OLS estimates are inefficient, relative to the WLS estimates.
In fact, using weights proportional to $1/\sigma_i^2$ is optimal: no other weights give smaller standard errors.
When you specify weights, regression software calculates standard errors on the assumption that they are proportional to $1/\sigma_i^2$.
How to choose the weights
If you have many replicates for each unique combination of the $X$s, use $s_i^2$ to estimate $\mathrm{var}(Y \mid X_i)$.
Often you will not have enough replicates to give good variance
estimates.
The text suggests grouping observations that are “nearest neighbors”.
Alternatively you can use the regression diagnostic plots.
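A sketch of the replicate-based approach, assuming a hypothetical data frame dat with several y values at each unique x:
v = tapply(dat$y, dat$x, var)    # s_i^2 at each unique x
w = 1 / v[as.character(dat$x)]   # weights proportional to 1/s_i^2
wfit = lm(y ~ x, data = dat, weights = w)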
Example 9.4: Florida road contracts
Y : winning (lowest) bid price
X : length of new road
R code (run the code!):
setwd("~/Dropbox/teaching/2015Fall/R_datasets/Exercises&Exampl
load("DOT11.RData")
fit = lm(BIDPRICE~LENGTH,DOT11)
plot(fit)
The first plot uses unweighted residuals $y_i - \hat y_i$, but the others use weighted residuals.
Also recall that these are “Standardized residuals”,
$$z_i^* = \frac{y_i - \hat y_i}{s \sqrt{1 - h_i}},$$
which are called Studentized residuals in the text.
With weights, the standardized residuals are
$$z_i^* = \sqrt{w_i} \left( \frac{y_i - \hat y_i}{s \sqrt{1 - h_i}} \right).$$
Note that the “Scale-Location” plot shows an increasing trend.
Try weights that are proportional to powers of x = LENGTH:
fit1 = lm(BIDPRICE~LENGTH, DOT11, weights = 1/LENGTH)
plot(fit1)
fit2 = lm(BIDPRICE~LENGTH, DOT11, weights = 1/LENGTH^2)
plot(fit2)
summary() shows that the fitted equations are all very similar, and
weights = 1/LENGTH gives the smallest standard errors.
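To see exactly what the diagnostic plots display, the weighted standardized residuals can be reproduced by hand from fit1 above; they agree with R's rstandard():
h = hatvalues(fit1)
s = summary(fit1)$sigma
z = weighted.residuals(fit1) / (s * sqrt(1 - h))
head(cbind(z, rstandard(fit1)))  # the two columns match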
Note
Standard errors are computed as if the weights are known constants.
But for the “nearest neighbors” method in the textbook, the weights are based on a preliminary OLS fit.
Theory shows that in large samples the standard errors are also valid with
estimated weights.
Note
When you specify weights $w_i$, lm() fits the model
$$\sigma_i^2 = \frac{\sigma^2}{w_i},$$
and the “Residual standard error” $s$ is an estimate of $\sigma$:
$$s^2 = \frac{\sum_{i=1}^n w_i (y_i - \hat y_i)^2}{n - k - 1}.$$
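This can be checked numerically from any weighted fit, e.g. fit1 from the road-contract example:
r = residuals(fit1)              # unweighted residuals y_i - yhat_i
w = weights(fit1)                # the prior weights w_i
sqrt(sum(w * r^2) / df.residual(fit1))  # equals summary(fit1)$sigma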
If you change the weights, the meaning of $\sigma$ (and $s$) changes.
You cannot compare the residual standard errors for different weighting schemes (cf. page 488, foot).