
Autocorrelation
David Gerbing
School of Business Administration
Portland State University
January 30, 2016
Autocorrelation
The difference between an actual data value and the forecasted data value from a model is
the residual for that forecasted value.
Residual:
ei = Yi − Ŷi
One of the assumptions of the least squares estimation procedure for the coefficients of a
regression model is that the residuals are purely random. One consequence of randomness
is that the residuals would not correlate with anything else, including with each other at
different time points. A value above the mean, that is, a value with a positive residual, contains no information as to whether the next value in time will have a positive residual or a negative residual (a data value below the mean).
For example, flipping a fair coin yields random flips, with half of the flips resulting in a Head
and the other half a Tail. If a Head is scored as a 1 and a Tail as a 0, and the probability
of both Heads and Tails is 0.5, then calculate the value of the population mean as:
Population Mean:
µ = (0.5)(1) + (0.5)(0) = 0.5
The forecast of the outcome of the next flip of a fair coin is the mean of the process, 0.5,
which is stable over time. What are the corresponding residuals?
A residual value is the difference of the corresponding data value minus the mean. With
this scoring system, a Head generates a positive residual from the mean, µ,
Head:
ei = 1 − µ = 1 − 0.5 = 0.5
A Tail generates a negative residual from the mean,
Tail:
ei = 0 − µ = 0 − 0.5 = −0.5
The error terms of the coin flips are independent of each other, so if the current flip is a
Head, or if the last 5 flips are Heads, the forecast for the next flip is still µ = 0.5. That
is, the error terms of successive coin flips are unrelated. Neither a residual of +0.5 nor a
residual of −0.5 predicts anything about the value of the next residual.
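This lack of relation among the residuals is easy to verify by simulation. The following R sketch is illustrative only; the number of flips and the random seed are arbitrary choices. It scores simulated fair coin flips, forms the residuals about the mean of 0.5, and computes the correlation between successive residuals, which should be near zero.

# Simulate fair coin flips scored as 1 (Head) or 0 (Tail)
set.seed(123)
flips <- rbinom(1000, size = 1, prob = 0.5)
# Residual of each flip from the process mean of 0.5
e <- flips - 0.5
# Correlation of each residual with the next residual: near 0 for a fair coin
cor(e[-length(e)], e[-1])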
However, this independence of the residuals from each other may very well not hold, particularly for time series data.
Autocorrelation: Residuals, ei, correlate with each other; for time series
data, the correlation is across different time periods.
Autocorrelation is literally expressed as a correlation coefficient.
A correlation of the residuals, ei , leads also to a correlation of the corresponding data values,
Yi . Autocorrelation occurs in time-oriented data in which values of the same variable, Y,
correlate with each other over time. For time series data the correlation is between two
values of the same variable, Yi and Yi+k, observed k time periods apart. If k = 1, that is,
for one time period, the autocorrelation is at a lag of 1, sometimes called a first-order
autocorrelation.
The process that generates the autocorrelation is summarized by a regression equation in
which the autocorrelated error term is expressed as the sum of the autocorrelated component
and a pure random component.
First-order autoregression: An error term at time t, εt, follows a first-order autoregression if
εt = ρ εt−1 + ut
where ρ is the autocorrelation coefficient and ut is a random variable with
no autocorrelation.
Consider the standard regression model with a single predictor variable, perhaps another
variable entirely or a lagged version of the response variable in a time series.
Yt = β0 + β1 Xt + εt
When applied to a time series, the predictor Xt is a lagged version of the response Y, such
as, for a lag of one time period, Yt−1.
Yt = β0 + β1 Yt−1 + εt
If the underlying process is first-order autoregressive, then the model can be expressed according
to the following error structure.
Yt = β0 + β1 Yt−1 + (ρ εt−1 + ut)
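As a concrete sketch of this error structure, the R code below generates a series directly from the defining equations. The parameter values are illustrative assumptions only: β0 = 0 and β1 = 0, so the series is just the process mean plus a first-order autoregressive error with ρ = 0.7.

set.seed(123)
n   <- 100
rho <- 0.7                       # assumed autocorrelation coefficient
u   <- rnorm(n)                  # pure random component u_t
e   <- numeric(n)
e[1] <- u[1]
for (t in 2:n) e[t] <- rho * e[t-1] + u[t]   # first-order autoregressive errors
y <- 0 + e                       # Y_t = beta0 + error, with beta0 = 0 here
plot(y, type = "l")              # the series meanders about its mean
abline(h = mean(y), lty = 2)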
What does data that conforms to this autocorrelated model look like?
Given the autocorrelation, successive values of the time series are not entirely random. The
value of the time series at time t depends to some extent on the value of the residual at
the previous time. For a positive autocorrelation, what occurs now influences what occurs
next in the same direction. For example, if one value has a positive error term, then the
next value in the time sequence is more likely to have a positive error term. Similarly, a
negative error term implies a more likely negative error term for the next value. The extent
of this dependence follows from the size of the autocorrelation coefficient, ρ, as defined in
the previous equation.
The resulting time-series with a positive value of ρ tends to change direction slowly, to
meander. Once the error term is positive it tends to remain positive and once negative it
tends to remain negative.
Persistence: A tendency for a system to remain in the same state from
one observation to the next.
Occasionally some shock occurs that creates a qualitative change in the system. The
system then again more or less stays near its new value until another shock occurs.
Why bother understanding and assessing autocorrelation? The answer is improved forecasting of the values for the variable of interest. A model can successfully forecast only to the
extent that it successfully mirrors aspects of reality. The assessed amount of autocorrelation
present contributes to the forecast of the future data values.
Stock prices are a prime financial example of autocorrelation. A stock that is gaining over
recent time periods would be forecasted, in the absence of any other information, to continue
those gains, more or less, for at least the immediate future. Of course, at any specific time
a shock could occur, such as a stock market crash, and reset the system, so that what was
consistently positive could become negative and vice versa.
Lagged correlations
A common tool for assessing pattern in a time series is the lagged autocorrelation function.
Obtain a lag 1 autocorrelation by correlating the variable Yt with a new variable Yt−1 , which
is the value of Yt lagged one observation, that is, the value of Y obtained one observation
earlier.
Begin with some illustrative data, values of Y at 5 time points: 2, 9, 4, 3, and 5.
The lag one autocorrelation is obtained by correlating the variables Yt and Yt−1 . One
observation is lost in the calculation of a lag one correlation.
t   Yt   Yt−1
-------------
1    2
2    9    2
3    4    9
4    3    4
5    5    3
More generally, compute a lag k autocorrelation by correlating Yt and Yt−k. The R autocorrelation
function, acf, provides both a table of the lagged autocorrelation for each time lag
and a graph of the lagged autocorrelations as a function of the lag k. A 95% confidence interval
around 0 is provided as part of the graph to assess whether the lagged autocorrelation
at each time lag significantly differs from zero.
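To make the calculation concrete, the lag 1 autocorrelation for the five illustrative values can be computed directly. Note that acf demeans by the overall series mean and uses a different divisor, so its lag 1 estimate differs slightly from the simple correlation of the paired values.

y <- c(2, 9, 4, 3, 5)
# Correlate Y_t with Y_(t-1); one observation is lost in forming the pairs
cor(y[-1], y[-length(y)])
# The acf function's lag 1 estimate (computed with the full-series mean)
acf(y, lag.max = 1, plot = FALSE)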
Consider data generated by a stable process, that is, with only random noise accounting for
the difference of each data value from its mean. Here the 25 data values are sampled from a
normal population with µ = 100 and σ = 15.
> y <- rnorm(25, 100, 15)
If the time-series error terms are random (white) noise, the data values portray random
variation about a common mean.
> lc(y)
Figure 1: Data over time from a stable process (y plotted against the observation index, with the median marked).
Just as there is no pattern to the original data, it follows that there is no pattern to the
lagged autocorrelations. These correlations for different lags are shown in the following
figure, the output of the R autocorrelation function acf.
> acf(y)
Figure 2: Autocorrelations up to a lag of 12 (ACF plotted against Lag for Series y).
The 0th lag is just the correlation of a residual with itself, so that correlation is always 1.0.
For these data, the largest autocorrelation appears to be about 0.3 for the 3rd lag.
Most of the lagged autocorrelations of a stable process, about 95%, should fall
within this 95% confidence interval, a condition met by the plotted autocorrelations in Figure 2. If many
lagged correlations fall outside this interval, then there is some evidence of pattern.
If the lagged autocorrelation function shows a cyclic pattern, this may suggest that seasonality
is present in the original time series data. For example, if there is quarterly seasonality,
then the data value at Time 0 will correlate with the data value at Time 4, and so forth.
(Time 0 is the beginning or first point of the time series.) If there are large autocorrelations
at large time lags, this may be evidence of a trend or another type of changing process in
the time series.
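For example, a made-up series with a repeating quarterly pattern (the specific numbers below are arbitrary) shows this behavior: its lagged autocorrelations spike at lags 4, 8, 12, and so on.

set.seed(123)
# Ten "years" of an artificial quarterly pattern plus random noise
quarters <- rep(c(10, 0, -5, 5), times = 10)
y <- 100 + quarters + rnorm(40, sd = 2)
acf(y)    # large positive autocorrelations appear at lags 4, 8, 12, ...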
To illustrate, ten different time series were simulated with the R function arima.sim. The
autocorrelation varies from 0 to 0.9, indicated by the population (auto) correlation coefficient, rho or ρ. The first plot, with autocorrelation of 0, exhibits pure random variation
about the process mean, what is called white noise. As successively more autocorrelation is
introduced, the independence of successive observations diminishes, with the consequence
of successively more meandering.
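A call of the following form reproduces one such simulation; the series length, the seed, and the choice of ρ = 0.5 are arbitrary here, with the ar value varied from near 0 up to 0.9 across the plots below.

set.seed(123)
# Simulate 100 values of a first-order autoregressive process with rho = 0.5
y <- arima.sim(model = list(ar = 0.5), n = 100)
plot(y)                          # the series meanders about its mean
abline(h = mean(y), lty = 2)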
Examples of First-Order Autocorrelation
Autocorrelation of 0: y plotted over 100 time points about the process mean.
For a stable process model, with autocorrelation of 0, the forecasted value is just the mean
of the process. When autocorrelation is present, the forecasting model is more accurate
when this autocorrelation is accounted for. R provides such forecasting procedures.
Autocorrelation of 0.1 through 0.9: nine additional panels, each plotting y over 100 time points about the process mean, with successively more meandering as the autocorrelation increases.
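As a minimal sketch of one such procedure (assuming an autocorrelated series stored in y, such as one generated with arima.sim above), fit a first-order autoregressive model with the base R arima function and then forecast ahead.

# Fit a first-order autoregressive model to the series y
fit <- arima(y, order = c(1, 0, 0))
fit                              # estimated ar1 coefficient and intercept
# Forecast the next 5 values, with a standard error for each forecast
predict(fit, n.ahead = 5)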