
ST430: Introduction to Regression Analysis
Chapter 8, Sections 5-7
Luo Xiao
November 9, 2015
Residual Analysis
Checking normality
One of the standard assumptions ensuring that inferences are valid is that the random errors ε = Y − E(Y | X) are normally distributed.
Standard error calculations do not depend on the normality assumption, but
P-values do.
Except in small samples, departures from normality do not usually invalidate
hypothesis tests or confidence intervals.
Often, when data are not normal, they show longer/heavier tails.
Heavy tails generally make inferences conservative.
For instance, a 95% confidence interval actually covers the true
parameter value with a probability higher than 95%.
Similarly, the Type I error rate in a hypothesis test is less than the
nominal α.
Conservative inferences are not optimal (for instance, confidence intervals
are wider than they need to be).
One approach to checking normality is a hypothesis test: H0: ε is normally distributed, versus Ha: ε is not normally distributed.
The Shapiro-Wilk test is often recommended.
All such tests have relatively low power in small samples, and even in moderately large samples. That is, the chance of detecting moderate non-normality is not close to 1; equivalently, the Type II error rate is not close to 0.
Graphical checks
Use a histogram with a normal curve overlaid.
Use a normal quantile-quantile plot (Q-Q plot): quantiles of the observed data against quantiles of the standard normal distribution.
The normal Q-Q plot is more useful than the histogram.
R code for normality checks
setwd("~/Dropbox/teaching/2015Fall/R_datasets/Exercises&Exampl
load("SOCWORK.RData")
fit = lm(SALARY ~ EXP + I(EXP^2), SOCWORK)
r = residuals(fit)
par(mfrow=c(1,2))
hist(r,probability=TRUE) #histogram
#add a normal curve
curve(dnorm(x,mean = 0,sd = sd(r)),col=2,add=TRUE)
qqnorm(r,pch=20) #q-q plot
qqline(r,col=2)
shapiro.test(r) #Shapiro-Wilk test gives a P-value 0.3475
Histogram and Q-Q plot
[Figure: histogram of the residuals r with a normal density overlaid (left) and normal Q-Q plot of r (right).]
A Cauchy distribution (heavy-tailed)
[Figure: histogram and normal Q-Q plot of a sample y from a Cauchy (heavy-tailed) distribution.]
Another “non-normal” example
[Figure: histogram and normal Q-Q plot of another non-normal sample r.]
Outliers
Recall that ε̂i = Yi − Ŷi is the i-th residual; it has the same units as Y.
Residuals are often scaled in some way to make them dimensionless.
Terminology varies! Here we follow R ('rstandard()' and 'rstudent()') and SAS, not the textbook.
Scaled residual ("standardized" residual in the text):

$$z_i = \frac{\hat\varepsilon_i}{s} = \frac{Y_i - \hat{Y}_i}{s}.$$

Rule of thumb
If |zi| > 3, the i-th observation is an outlier.
Equivalently, |Yi − Ŷi| > 3s, a "3-σ event".
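
A quick way to flag such points in R (a minimal sketch, reusing the fit and r from the SOCWORK code above; the cutoff of 3 is just the rule of thumb):

s <- summary(fit)$sigma  # residual standard error s
z <- r / s               # scaled residuals z_i
which(abs(z) > 3)        # observations flagged as outliers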
The “hat” matrix
Each observation contributes to the value of β̂; in matrix notation,

$$\hat\beta = (X'X)^{-1} X'Y.$$

So it also contributes to the predicted values:

$$\hat{Y} = X\hat\beta = X(X'X)^{-1}X'Y = HY,$$

where

$$H = X(X'X)^{-1}X'$$

is the hat matrix.
H "puts the hat on Y".
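
The hat matrix can be formed explicitly in R (a small illustrative sketch; in practice one uses hatvalues() rather than inverting X'X by hand):

X <- model.matrix(fit)                 # design matrix of the fitted model
H <- X %*% solve(t(X) %*% X) %*% t(X)  # hat matrix H = X (X'X)^{-1} X'
y <- model.response(model.frame(fit))  # response vector Y
all.equal(as.numeric(H %*% y), as.numeric(fitted(fit)))  # H puts the hat on Y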
The residuals are ε̂ = Y − Ŷ = (I − H)Y, and consequently (with some matrix algebra)

$$\mathrm{var}(\hat\varepsilon_i) = \sigma^2 (1 - h_i),$$

where hi is the i-th diagonal entry of H.

The standardized residual

$$z_i^* = \frac{\hat\varepsilon_i}{s\sqrt{1 - h_i}} = \frac{Y_i - \hat{Y}_i}{s\sqrt{1 - h_i}} = \frac{z_i}{\sqrt{1 - h_i}}$$

("studentized" residual in the text) is adjusted for these different variances.

We can also use the rule of thumb with standardized residuals: if |zi*| > 3, the i-th observation is an outlier.
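
In R, rstandard() returns the standardized residuals; a sketch verifying the formula above on the earlier fit:

h  <- hatvalues(fit)                           # leverages h_i (diagonal of H)
zs <- r / (summary(fit)$sigma * sqrt(1 - h))   # standardized residuals by hand
all.equal(unname(zs), unname(rstandard(fit)))  # agrees with R's rstandard()
which(abs(rstandard(fit)) > 3)                 # rule-of-thumb outliers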
Leverage
Recall that Ŷ = HY, where H is the hat matrix:

$$\hat{Y}_i = \sum_{j=1}^{n} h_{i,j} Y_j.$$

The diagonal entry hi,i = hi is the weight attached to Yi itself in computing Ŷi.
The diagonal entry hi is defined to be the leverage of the i-th observation.
Leverage measures the contribution of Yi to its own predicted value Ŷi.
Leverage satisfies 0 < hi ≤ 1, and the average leverage is always

$$\bar{h} = \frac{k + 1}{n},$$

where k is the number of predictors.

In many designed experiments, all observations have the same leverage: hi ≡ h̄; in observational studies, leverage can vary widely.

Rule of thumb
If hi > 2h̄, the i-th observation is a leverage point.

In the fourth residual plot, the standardized residuals are plotted against leverage.
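
In R, the rule of thumb amounts to comparing hatvalues() with twice their mean (a minimal sketch on the earlier fit):

h    <- hatvalues(fit)  # leverages h_i
hbar <- mean(h)         # average leverage, exactly (k + 1)/n
which(h > 2 * hbar)     # observations flagged as leverage points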
Influence
An observation can be a leverage point but not have a great influence on β̂.

Write β̂^(i) for the parameter estimates when the i-th observation is omitted.

If β̂^(i) is very different from β̂, the i-th observation has high influence.
One measure of the magnitude of β̂^(i) − β̂ is Cook's distance,

$$D_i = \frac{\sum_{j=1}^{n} \bigl(\hat{Y}_j^{(i)} - \hat{Y}_j\bigr)^2}{(k+1)s^2} = \frac{\bigl(\hat\beta^{(i)} - \hat\beta\bigr)' \, (X'X) \, \bigl(\hat\beta^{(i)} - \hat\beta\bigr)}{(k+1)s^2},$$

where Ŷj is the usual predicted value of Yj and Ŷj^(i) is the predicted value using β̂^(i).
It can be shown that

$$D_i = \frac{z_i^2}{k+1} \cdot \frac{h_i}{(1-h_i)^2} = \frac{(z_i^*)^2}{k+1} \cdot \frac{h_i}{1-h_i},$$

where zi is the scaled residual, zi* is the standardized residual, and hi is the leverage.

If the i-th observation has a large standardized residual zi* and high leverage hi, Cook's distance Di will be large.

Rule of thumb
If Di > 1, the i-th observation is highly influential.

The fourth residual plot shows contours of Cook's distance, so the rule of thumb is easy to use.
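
In R, cooks.distance() returns the Di; a short sketch checking the identity above and applying the rule of thumb on the earlier fit:

D <- cooks.distance(fit)                          # Cook's distances D_i
k <- length(coef(fit)) - 1                        # number of predictors
h <- hatvalues(fit)
D2 <- rstandard(fit)^2 * h / ((k + 1) * (1 - h))  # identity via z_i* and h_i
all.equal(unname(D), unname(D2))                  # the two agree
which(D > 1)                                      # highly influential observations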
Detecting correlation
Time series data
Regression models are sometimes used with responses Y1 , Y2 , . . . , Yn that
are collected over time.
Often one response is similar to the immediately preceding responses, which means the responses are serially correlated.

Since standard errors are usually calculated under the assumption of zero correlation, they can be quite incorrect, often too small by a factor of 2 or more.
When such serial correlation is present, both the estimation procedure (least
squares) and the calculation of standard errors need to be modified.
First we need to know when significant correlation is present.
Durbin-Watson test
The widely available Durbin-Watson test was developed by James Durbin and Geoffrey Watson.
It is based on the statistic

$$d = \frac{\sum_{i=2}^{n} (\hat\varepsilon_i - \hat\varepsilon_{i-1})^2}{\sum_{i=1}^{n} \hat\varepsilon_i^2}.$$
d statistic
Range of d: 0 ≤ d ≤ 4.
If there is no correlation, d ≈ 2.
If observations are positively correlated, d < 2; under strong positive correlation, d ≈ 0.
If observations are negatively correlated, d > 2; under strong negative correlation, d ≈ 4.
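
The statistic is easy to compute directly from the residuals (a sketch, assuming the observations of the fitted model are in time order):

e <- residuals(fit)             # residuals, in time order
d <- sum(diff(e)^2) / sum(e^2)  # Durbin-Watson d statistic
d                               # near 2 if there is no serial correlation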
Example: trend in sales (run the code!)
setwd("~/Dropbox/teaching/2015Fall/R_datasets/Exercises&Exampl
load("SALES35.RData")
plot(SALES ~ T, data = SALES35, pch=20)
fit = lm(SALES ~ T, SALES35)
summary(fit)
plot(fit)
library(car)
durbinWatsonTest(fit)
The usual four plots give no information about correlation.
Looking Ahead...
The 'arima()' function fits a regression model given an assumed model for the residual correlation.

One simple model is the first-order autoregression, AR(1) (output in "output1.txt"):
fitAR = arima(SALES35$SALES, order = c(1, 0, 0), xreg = SALES35$T)  # AR(1) errors
fitAR          # AR model
summary(fit)   # regular linear regression, for comparison
Note the increase in standard error of the estimated trend, from 0.1069 to
0.1760.
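
These standard errors can be read off in R (a sketch; var.coef is the estimated covariance matrix of the coefficients that arima() returns):

summary(fit)$coefficients[, "Std. Error"]  # OLS standard errors
sqrt(diag(fitAR$var.coef))                 # standard errors under the AR(1) model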