Ch. 15 Forecasting
Having considered in Chapter 14 some of the properties of ARMA models, we now show how they may be used to forecast future values of an observed time series. For the present we proceed as if the model were known exactly.

One of the central problems in time series analysis is the following: given T observations on a realization, predict the (T + s)th observation in the realization, where s is a positive integer. This prediction is usually called the forecast of the (T + s)th observation. Forecasting plays a special role in time series analysis. In a regression setting we usually begin with an existing economic model and estimate its parameters; the estimated coefficients already serve a purpose, such as confirming or refuting an economic theory, so whether or not to forecast from the estimated model depends on the researcher's own interest. The estimated coefficients of a time series model, by contrast, have no direct economic interpretation. An important task of time series analysis is therefore to forecast precisely from this purely mechanical model.
1 Principle of Forecasting

1.1 Forecasts Based on Conditional Expectations
Suppose we are interested in forecasting the value of a variable Y_{t+1} based on a set of variables x_t observed at date t. For example, we might want to forecast Y_{t+1} based on its m most recent values. In this case, x_t = [Y_t, Y_{t-1}, ..., Y_{t-m+1}]'.

Let Y_{t+1|t} denote a forecast of Y_{t+1} based on x_t (a function of x_t, i.e., depending on how x_t is realized). To evaluate the usefulness of this forecast, we need to specify a loss function. A quadratic loss function means choosing the forecast Y*_{t+1|t} so as to minimize
$$MSE(Y_{t+1|t}) = E(Y_{t+1} - Y_{t+1|t})^2,$$
which is known as the mean squared error.
Theorem:
The forecast Y*_{t+1|t} with the smallest mean squared error is the (statistical) expectation of Y_{t+1} conditional on x_t:
$$Y^*_{t+1|t} = E(Y_{t+1}|x_t).$$
Proof:
Let g(x_t) be a forecasting function of Y_{t+1} other than the conditional expectation E(Y_{t+1}|x_t). Then the MSE associated with g(x_t) would be
$$E[Y_{t+1} - g(x_t)]^2 = E[Y_{t+1} - E(Y_{t+1}|x_t) + E(Y_{t+1}|x_t) - g(x_t)]^2$$
$$= E[Y_{t+1} - E(Y_{t+1}|x_t)]^2 + 2E\{[Y_{t+1} - E(Y_{t+1}|x_t)][E(Y_{t+1}|x_t) - g(x_t)]\} + E\{[E(Y_{t+1}|x_t) - g(x_t)]^2\}.$$
Denote $\eta_{t+1} \equiv [Y_{t+1} - E(Y_{t+1}|x_t)][E(Y_{t+1}|x_t) - g(x_t)]$. Since $E(Y_{t+1}|x_t) - g(x_t)$ is a function of x_t alone, we have
$$E(\eta_{t+1}|x_t) = [E(Y_{t+1}|x_t) - g(x_t)] \times E\{[Y_{t+1} - E(Y_{t+1}|x_t)]\,|\,x_t\} = [E(Y_{t+1}|x_t) - g(x_t)] \times 0 = 0.$$
By the law of iterated expectations, it follows that
$$E(\eta_{t+1}) = E_{x_t}[E(\eta_{t+1}|x_t)] = 0.$$
Therefore we have
$$E[Y_{t+1} - g(x_t)]^2 = E[Y_{t+1} - E(Y_{t+1}|x_t)]^2 + E\{[E(Y_{t+1}|x_t) - g(x_t)]^2\}. \qquad (1)$$
The second term on the right-hand side of (1) cannot be made smaller than zero, and the first term does not depend on g(x_t). The function g(x_t) that makes the mean squared error (1) as small as possible is therefore the one that sets the second term in (1) to zero:
$$g(x_t) = E(Y_{t+1}|x_t).$$
The MSE of this optimal forecast is
$$E[Y_{t+1} - g(x_t)]^2 = E[Y_{t+1} - E(Y_{t+1}|x_t)]^2.$$
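As a quick numerical illustration (not part of the original notes), the following Python/NumPy sketch simulates pairs (Y_t, Y_{t+1}) from an AR(1) model, for which the conditional expectation is known in closed form, and compares its MSE with that of an arbitrary alternative forecast rule; the parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate (Y_t, Y_{t+1}) pairs from an AR(1): Y_{t+1} = phi*Y_t + eps_{t+1},
# so that E(Y_{t+1} | Y_t) = phi*Y_t is the optimal forecast in closed form.
phi, n = 0.8, 100_000
y_t = rng.normal(0.0, 1.0 / np.sqrt(1 - phi**2), size=n)   # draws from the stationary distribution
y_next = phi * y_t + rng.normal(size=n)

mse_conditional = np.mean((y_next - phi * y_t) ** 2)   # forecast g(Y_t) = E(Y_{t+1}|Y_t)
mse_alternative = np.mean((y_next - 0.5 * y_t) ** 2)   # some other rule g(Y_t) = 0.5*Y_t

print(mse_conditional, mse_alternative)   # the conditional-mean forecast attains the smaller MSE
```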
1.2 Forecasts Based on Linear Projection
Suppose we now restrict attention to the class of forecasts in which Y_{t+1|t} is a linear function of x_t:
$$Y_{t+1|t} = \alpha' x_t.$$
Definition:
The forecast α'x_t is called the linear projection of Y_{t+1} on x_t if the forecast error (Y_{t+1} − α'x_t) is uncorrelated with x_t:
$$E[(Y_{t+1} - \alpha' x_t)\,x_t'] = \mathbf{0}'. \qquad (2)$$
Theorem:
The linear projection produces the smallest mean squared error among the class of linear forecasting rules.
Proof:
Let g'x_t be an arbitrary linear forecasting function of Y_{t+1}. Then the MSE associated with g'x_t would be
$$E[Y_{t+1} - g'x_t]^2 = E[Y_{t+1} - \alpha'x_t + \alpha'x_t - g'x_t]^2$$
$$= E[Y_{t+1} - \alpha'x_t]^2 + 2E\{[Y_{t+1} - \alpha'x_t][\alpha'x_t - g'x_t]\} + E[\alpha'x_t - g'x_t]^2.$$
Denote $\eta_{t+1} \equiv [Y_{t+1} - \alpha'x_t][\alpha'x_t - g'x_t]$. Then we have
$$E(\eta_{t+1}) = E\{[Y_{t+1} - \alpha'x_t][(\alpha - g)'x_t]\} = (E[Y_{t+1} - \alpha'x_t]x_t')[\alpha - g] = \mathbf{0}'[\alpha - g] = 0.$$
Therefore we have
$$E[Y_{t+1} - g'x_t]^2 = E[Y_{t+1} - \alpha'x_t]^2 + E[\alpha'x_t - g'x_t]^2. \qquad (3)$$
The second term on the right-hand side of (3) cannot be made smaller than zero, and the first term does not depend on g'x_t. The linear rule g'x_t that makes the mean squared error (3) as small as possible is therefore the one that sets the second term in (3) to zero:
$$g'x_t = \alpha'x_t.$$
The MSE of this optimal forecast is
$$E[Y_{t+1} - g'x_t]^2 = E[Y_{t+1} - \alpha'x_t]^2.$$
Since α'x_t is the linear projection of Y_{t+1} on x_t, we will use the notation
$$\hat{P}(Y_{t+1}|x_t) = \alpha'x_t$$
to indicate this projection. Notice that
$$MSE[\hat{P}(Y_{t+1}|x_t)] \ge MSE[E(Y_{t+1}|x_t)],$$
since the conditional expectation offers the best possible forecast.
For most applications a constant term will be included in the projection. We will use the symbol Ê to indicate a linear projection on a vector of random variables x_t along with a constant term:
$$\hat{E}(Y_{t+1}|x_t) \equiv \hat{P}(Y_{t+1}|1, x_t).$$
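A minimal sketch of how the projection coefficients can be computed from sample second moments (the data-generating coefficients below are chosen purely for illustration, not taken from the notes):

```python
import numpy as np

rng = np.random.default_rng(1)

# Generate data in which Y_{t+1} is linearly related to x_t plus noise, then form
# the sample analogue of the linear projection P(Y_{t+1} | 1, x_t).
n = 50_000
x = rng.normal(size=(n, 2))
y_next = 1.0 + 0.6 * x[:, 0] - 0.3 * x[:, 1] + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])             # include a constant, as in the E-hat notation
alpha = np.linalg.solve(X.T @ X, X.T @ y_next)   # alpha' = E(Y x')[E(x x')]^{-1}, sample version

residual = y_next - X @ alpha
print(alpha)               # close to [1.0, 0.6, -0.3]
print(X.T @ residual / n)  # close to zero: the forecast error is uncorrelated with x_t, cf. (2)
```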
2 Forecasts Based on an Infinite Number of Observations
Recall that a general stationary and invertible ARMA(p, q) process is written in the form
$$\phi(L)(Y_t - \mu) = \theta(L)\varepsilon_t, \qquad (4)$$
where φ(L) = 1 − φ_1 L − φ_2 L^2 − ... − φ_p L^p, θ(L) = 1 + θ_1 L + θ_2 L^2 + ... + θ_q L^q, and all the roots of φ(L) = 0 and θ(L) = 0 lie outside the unit circle.
2.1 Forecasting Based on Lagged ε's: the MA(∞) Form
Assume first that the lagged ε's are observed directly; the principle developed here will be useful for the forecasting problems that follow. Consider the MA(∞) form of (4):
$$Y_t - \mu = \varphi(L)\varepsilon_t \qquad (5)$$
with ε_t white noise and
$$\varphi(L) = \theta(L)\phi^{-1}(L) = \sum_{j=0}^{\infty} \varphi_j L^j, \qquad \varphi_0 = 1, \qquad \sum_{j=0}^{\infty} |\varphi_j| < \infty.$$
Suppose that we have an infinite number of observations on ε through date t, that is {ε_t, ε_{t−1}, ε_{t−2}, ...}, and further know the values of µ and {ϕ_1, ϕ_2, ...}. Say we want to forecast the value of Y_{t+s}, s periods from now. Note that (5) implies
$$Y_{t+s} = \mu + \varepsilon_{t+s} + \varphi_1\varepsilon_{t+s-1} + ... + \varphi_{s-1}\varepsilon_{t+1} + \varphi_s\varepsilon_t + \varphi_{s+1}\varepsilon_{t-1} + ....$$
Since the ARMA models considered so far are only assumed to be covariance-stationary, with no assumption made on the distribution of the ε's, we cannot form the conditional expectation; we therefore consider the best linear forecast. The best linear forecast takes the form¹
$$\hat{E}(Y_{t+s}|\varepsilon_t, \varepsilon_{t-1}, ...) = \mu + \varphi_s\varepsilon_t + \varphi_{s+1}\varepsilon_{t-1} + .... \qquad (6)$$
$$= [\mu, \varphi_s, \varphi_{s+1}, ...][1, \varepsilon_t, \varepsilon_{t-1}, ...]' \qquad (7)$$
$$= \alpha' x_t. \qquad (8)$$
That is, the unknown future ε's are set to their expected value of zero. The error associated with this forecast is uncorrelated with x_t = [1, ε_t, ε_{t-1}, ...]', or
$$E\{x_t \cdot [Y_{t+s} - \hat{E}(Y_{t+s}|\varepsilon_t, \varepsilon_{t-1}, ...)]\} = E\left\{\begin{bmatrix}1 \\ \varepsilon_t \\ \varepsilon_{t-1} \\ \vdots\end{bmatrix}(\varepsilon_{t+s} + \varphi_1\varepsilon_{t+s-1} + ... + \varphi_{s-1}\varepsilon_{t+1})\right\} = \mathbf{0}.$$
The mean squared error associated with this forecast is
$$E[Y_{t+s} - \hat{E}(Y_{t+s}|\varepsilon_t, \varepsilon_{t-1}, ...)]^2 = (1 + \varphi_1^2 + \varphi_2^2 + ... + \varphi_{s-1}^2)\sigma^2.$$
Example:
For an MA(q) process, the optimal linear forecast is
$$\hat{E}(Y_{t+s}|\varepsilon_t, \varepsilon_{t-1}, ...) = \begin{cases} \mu + \theta_s\varepsilon_t + \theta_{s+1}\varepsilon_{t-1} + ... + \theta_q\varepsilon_{t-q+s} & \text{for } s = 1, 2, ..., q \\ \mu & \text{for } s = q+1, q+2, .... \end{cases}$$
The MSE is
$$\begin{cases} \sigma^2 & \text{for } s = 1 \\ (1 + \theta_1^2 + \theta_2^2 + ... + \theta_{s-1}^2)\sigma^2 & \text{for } s = 2, 3, ..., q \\ (1 + \theta_1^2 + \theta_2^2 + ... + \theta_q^2)\sigma^2 & \text{for } s = q+1, q+2, .... \end{cases}$$
The MSE increases with the forecast horizon s up until s = q. If we try to forecast an MA(q) farther than q periods into the future, the forecast is simply the unconditional mean of the series (E(Y_{t+s}) = µ) and the MSE is the unconditional variance of the series (Var(Y_{t+s}) = (1 + θ_1² + θ_2² + ... + θ_q²)σ²).
¹ We cannot use the conditional expectation E(ε_{t+i}|ε_t, ε_{t-1}, ...) = 0 for i = 1, 2, ..., s, since the ε's are only assumed to be uncorrelated, which does not imply that they are independent.
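The MA(q) example translates directly into a few lines of code. The sketch below (hypothetical θ values; it assumes, as in this subsection, that the ε history is observed) computes the s-step forecast and its MSE:

```python
def ma_forecast(mu, thetas, eps_hist, s):
    """s-step-ahead linear forecast of an MA(q), given mu, [theta_1, ..., theta_q]
    and the observed history eps_hist = [eps_t, eps_{t-1}, ...]."""
    q = len(thetas)
    if s > q:
        return mu                                   # beyond q steps the forecast is the mean
    coefs = thetas[s - 1:]                          # theta_s, ..., theta_q
    return mu + sum(c * e for c, e in zip(coefs, eps_hist))

def ma_forecast_mse(thetas, sigma2, s):
    """MSE = sigma^2 * (1 + theta_1^2 + ... + theta_{s-1}^2), capped at the unconditional variance."""
    horizon = min(s - 1, len(thetas))
    return sigma2 * (1.0 + sum(t**2 for t in thetas[:horizon]))

thetas = [0.5, -0.3]                                            # an illustrative MA(2)
print(ma_forecast(2.0, thetas, [0.8, -0.2, 0.1], s=1))          # = 2 + 0.5*0.8 + (-0.3)*(-0.2)
print(ma_forecast_mse(thetas, sigma2=1.0, s=3))                 # = (1 + 0.25 + 0.09) * 1.0
```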
A compact lag operator expression for the forecast in (6) is sometimes used. Rewrite Y_{t+s} as in (5) as
$$Y_{t+s} = \mu + \varphi(L)\varepsilon_{t+s} = \mu + \varphi(L)L^{-s}\varepsilon_t.$$
Consider the polynomial obtained by dividing ϕ(L) by L^s:
$$\frac{\varphi(L)}{L^s} = L^{-s} + \varphi_1 L^{1-s} + \varphi_2 L^{2-s} + ... + \varphi_{s-1}L^{-1} + \varphi_s L^0 + \varphi_{s+1}L^1 + \varphi_{s+2}L^2 + ....$$
The annihilation operator replaces negative powers of L by zero; for example,
$$\left[\frac{\varphi(L)}{L^s}\right]_+ = \varphi_s L^0 + \varphi_{s+1}L^1 + \varphi_{s+2}L^2 + ....$$
Therefore the optimal forecast (6) could be written in lag operator notation as
$$\hat{E}(Y_{t+s}|\varepsilon_t, \varepsilon_{t-1}, ...) = \mu + \left[\frac{\varphi(L)}{L^s}\right]_+ \varepsilon_t. \qquad (9)$$

2.2 Forecasting Based on Lagged Y's
The previous forecasts were based on the assumption that ε_t is observed directly. In the usual forecasting situation, we actually have observations on lagged Y's, not lagged ε's. Suppose that the general ARMA(p, q) has an AR(∞) representation given by
$$\eta(L)(Y_t - \mu) = \varepsilon_t \qquad (10)$$
with ε_t white noise and
$$\eta(L) = \theta^{-1}(L)\phi(L) = \sum_{j=0}^{\infty}\eta_j L^j = \varphi^{-1}(L), \qquad \eta_0 = 1, \qquad \sum_{j=0}^{\infty}|\eta_j| < \infty.$$
Under these conditions, we can substitute (10) into (9) to obtain the forecast of Y_{t+s} as a function of lagged Y's:
$$\hat{E}(Y_{t+s}|Y_t, Y_{t-1}, ...) = \mu + \left[\frac{\varphi(L)}{L^s}\right]_+ \eta(L)(Y_t - \mu) \qquad (11)$$
or
$$\hat{E}(Y_{t+s}|Y_t, Y_{t-1}, ...) = \mu + \left[\frac{\varphi(L)}{L^s}\right]_+ \frac{1}{\varphi(L)}(Y_t - \mu). \qquad (12)$$
Equation (12) is known as the Wiener-Kolmogorov prediction formula.
2.2.1 Forecasting an AR(1) Process

There are two ways to forecast an AR(1) process:
(a). Wiener-Kolmogorov prediction formula:
For the covariance-stationary AR(1) process, we have
$$\varphi(L) = \frac{1}{1 - \phi L} = 1 + \phi L + \phi^2 L^2 + \phi^3 L^3 + ...$$
and
$$\left[\frac{\varphi(L)}{L^s}\right]_+ = \phi^s + \phi^{s+1}L^1 + \phi^{s+2}L^2 + ... = \frac{\phi^s}{1 - \phi L}.$$
The optimal linear s-period-ahead forecast for a stationary AR(1) process is therefore
$$\hat{E}(Y_{t+s}|Y_t, Y_{t-1}, ...) = \mu + \frac{\phi^s}{1 - \phi L}(1 - \phi L)(Y_t - \mu) = \mu + \phi^s(Y_t - \mu).$$
The forecast decays geometrically from (Y_t − µ) toward µ as the forecast horizon s increases.
(b). Recursive Substitution and Lag Operator:
The AR(1) process can be represented as (using (1.1.9) on p. 3 of Hamilton)
$$Y_{t+s} - \mu = \phi^s(Y_t - \mu) + \phi^{s-1}\varepsilon_{t+1} + \phi^{s-2}\varepsilon_{t+2} + ... + \phi\varepsilon_{t+s-1} + \varepsilon_{t+s}.$$
Setting the forecasts of ε_{t+h}, h = 1, 2, ..., s, to zero, the optimal linear s-period-ahead forecast for a stationary AR(1) process is therefore
$$\hat{E}(Y_{t+s}|Y_t, Y_{t-1}, ...) = \mu + \phi^s(Y_t - \mu),$$
with forecast error φ^{s−1}ε_{t+1} + φ^{s−2}ε_{t+2} + ... + φε_{t+s−1} + ε_{t+s}, which is uncorrelated with Y_t (= µ + ε_t + φε_{t−1} + φ²ε_{t−2} + ...) and hence with the forecast. The MSE of this forecast is
$$E(\phi^{s-1}\varepsilon_{t+1} + \phi^{s-2}\varepsilon_{t+2} + ... + \phi\varepsilon_{t+s-1} + \varepsilon_{t+s})^2 = (1 + \phi^2 + \phi^4 + ... + \phi^{2(s-1)})\sigma^2.$$
Notice that this grows with s and asymptotically approaches σ²/(1 − φ²), the unconditional variance of Y.²
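Both routes give the same AR(1) forecast, which is easy to check numerically; the sketch below uses illustrative parameter values (not taken from the notes).

```python
def ar1_forecast(mu, phi, y_t, s):
    """Optimal s-step-ahead forecast of a stationary AR(1): mu + phi^s * (Y_t - mu)."""
    return mu + phi**s * (y_t - mu)

def ar1_forecast_mse(phi, sigma2, s):
    """MSE = sigma^2 * (1 + phi^2 + ... + phi^(2(s-1))); tends to sigma^2/(1 - phi^2) as s grows."""
    return sigma2 * sum(phi ** (2 * j) for j in range(s))

mu, phi, sigma2 = 1.0, 0.9, 1.0
print([round(ar1_forecast(mu, phi, y_t=3.0, s=s), 3) for s in (1, 2, 5, 20)])   # decays toward mu
print(round(ar1_forecast_mse(phi, sigma2, s=20), 3), round(sigma2 / (1 - phi**2), 3))
```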
2.2.2 Forecasting an AR(p) Process
(a). Recursive Substitution and Lag Operator:
Following (11) in Chapter 13, the value of Y at date t + s for an AR(p) process can be represented as
$$Y_{t+s} - \mu = f_{11}^{(s)}(Y_t - \mu) + f_{12}^{(s)}(Y_{t-1} - \mu) + ... + f_{1p}^{(s)}(Y_{t-p+1} - \mu) + f_{11}^{(s-1)}\varepsilon_{t+1} + f_{11}^{(s-2)}\varepsilon_{t+2} + ... + f_{11}^{(1)}\varepsilon_{t+s-1} + \varepsilon_{t+s},$$
where f_{11}^{(j)} is the (1, 1) element of F^j, with
$$F \equiv \begin{bmatrix} \phi_1 & \phi_2 & \phi_3 & \cdots & \phi_{p-1} & \phi_p \\ 1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 1 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & 0 \end{bmatrix}.$$
Setting E(ε_{t+h}) = 0, h = 1, 2, ..., s, the optimal linear s-period-ahead forecast for a stationary AR(p) process is therefore
$$\hat{E}(Y_{t+s}|Y_t, Y_{t-1}, ...) = \mu + f_{11}^{(s)}(Y_t - \mu) + f_{12}^{(s)}(Y_{t-1} - \mu) + ... + f_{1p}^{(s)}(Y_{t-p+1} - \mu).$$
The associated forecast error is
$$Y_{t+s} - \hat{E}(Y_{t+s}|Y_t, Y_{t-1}, ...) = f_{11}^{(s-1)}\varepsilon_{t+1} + f_{11}^{(s-2)}\varepsilon_{t+2} + ... + f_{11}^{(1)}\varepsilon_{t+s-1} + \varepsilon_{t+s}.$$
² That is, when the information in Y_t is used to forecast Y_{t+s}, the MSE can never be larger than the unconditional variance of Y_{t+s}.
It is important to note that to forecast an AR(p) process, the optimal s-period-ahead linear forecast based on an infinite number of observations {Y_t, Y_{t−1}, ...} in fact makes use of only the p most recent values {Y_t, Y_{t−1}, ..., Y_{t−p+1}}.
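A short sketch of the companion-matrix computation (Python/NumPy, hypothetical AR(2) coefficients): the first row of F^s supplies the weights f_{11}^{(s)}, ..., f_{1p}^{(s)} used in the forecast.

```python
import numpy as np

def arp_forecast(mu, phis, y_recent, s):
    """s-step-ahead forecast of an AR(p) via the companion matrix F.
    y_recent = [Y_t, Y_{t-1}, ..., Y_{t-p+1}] are the p most recent observations."""
    p = len(phis)
    F = np.zeros((p, p))
    F[0, :] = phis                       # first row: phi_1, ..., phi_p
    F[1:, :-1] = np.eye(p - 1)           # ones on the subdiagonal
    Fs = np.linalg.matrix_power(F, s)
    dev = np.asarray(y_recent, dtype=float) - mu
    return mu + Fs[0, :] @ dev           # mu + f11^(s)(Y_t - mu) + ... + f1p^(s)(Y_{t-p+1} - mu)

# AR(2) illustration: the same number is obtained by iterating the recursion forward s times.
print(arp_forecast(mu=0.0, phis=[1.1, -0.3], y_recent=[2.0, 1.5], s=3))
```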
2.2.3 Forecasting an MA(1) Process
An invertible MA(1) process is
$$Y_t - \mu = (1 + \theta L)\varepsilon_t, \qquad |\theta| < 1.$$
(a). Applying the Wiener-Kolmogorov formula, we have
$$\hat{Y}_{t+s|t} = \mu + \left[\frac{1 + \theta L}{L^s}\right]_+ \frac{1}{1 + \theta L}(Y_t - \mu).$$
To forecast an MA(1) process one period ahead (s = 1),
$$\left[\frac{1 + \theta L}{L^1}\right]_+ = \theta,$$
and so
$$\hat{Y}_{t+1|t} = \mu + \frac{\theta}{1 + \theta L}(Y_t - \mu) = \mu + \theta(Y_t - \mu) - \theta^2(Y_{t-1} - \mu) + \theta^3(Y_{t-2} - \mu) - ....$$
To forecast an MA(1) process for s = 2, 3, ... periods into the future,
$$\left[\frac{1 + \theta L}{L^s}\right]_+ = 0 \quad \text{for } s = 2, 3, ...,$$
and so
$$\hat{Y}_{t+s|t} = \mu \quad \text{for } s = 2, 3, ....$$
(b). From recursive substitution:
An MA(1) process at period t + 1 is
$$Y_{t+1} - \mu = \varepsilon_{t+1} + \theta\varepsilon_t.$$
At date t the future shocks are forecast by zero, so the optimal linear one-period-ahead forecast for a stationary MA(1) process is
$$\hat{Y}_{t+1|t} = \mu + \theta\varepsilon_t = \mu + \theta(1 + \theta L)^{-1}(Y_t - \mu) = \mu + \theta(Y_t - \mu) - \theta^2(Y_{t-1} - \mu) + \theta^3(Y_{t-2} - \mu) - ....$$
An MA(1) process at period t + s is
$$Y_{t+s} - \mu = \varepsilon_{t+s} + \theta\varepsilon_{t+s-1},$$
so the forecast of an MA(1) process for s = 2, 3, ... periods into the future is simply
$$\hat{Y}_{t+s|t} = \mu \quad \text{for } s = 2, 3, ....$$
2.2.4 Forecasting an MA(q) Process
For an invertible MA(q) process,
$$Y_t - \mu = (1 + \theta_1 L + \theta_2 L^2 + ... + \theta_q L^q)\varepsilon_t,$$
the forecast becomes
$$\hat{Y}_{t+s|t} = \mu + \left[\frac{1 + \theta_1 L + \theta_2 L^2 + ... + \theta_q L^q}{L^s}\right]_+ \frac{1}{1 + \theta_1 L + \theta_2 L^2 + ... + \theta_q L^q}(Y_t - \mu).$$
Now
$$\left[\frac{1 + \theta_1 L + \theta_2 L^2 + ... + \theta_q L^q}{L^s}\right]_+ = \begin{cases} \theta_s + \theta_{s+1}L + \theta_{s+2}L^2 + ... + \theta_q L^{q-s} & \text{for } s = 1, 2, ..., q \\ 0 & \text{for } s = q+1, q+2, .... \end{cases}$$
Thus for horizons s = 1, 2, ..., q, the forecast is given by
$$\hat{Y}_{t+s|t} = \mu + (\theta_s + \theta_{s+1}L + \theta_{s+2}L^2 + ... + \theta_q L^{q-s})\frac{1}{1 + \theta_1 L + \theta_2 L^2 + ... + \theta_q L^q}(Y_t - \mu).$$
A forecast farther than q periods into the future is simply the unconditional mean µ.
It is important to note that to forecast an MA(q) process, the optimal s-period-ahead linear forecast (s ≤ q) in principle requires all of the historical values of Y, {Y_t, Y_{t−1}, ...}.
2.2.5 Forecasting an ARMA(1, 1) Process
Consider an ARMA(1, 1) process
$$(1 - \phi L)(Y_t - \mu) = (1 + \theta L)\varepsilon_t \qquad (13)$$
that is stationary and invertible.
Denoting (1 + θL)ε_t = η_t, this ARMA(1, 1) process can be represented as (using (1.1.9) on p. 3 of Hamilton)
$$Y_{t+s} - \mu = \phi^s(Y_t - \mu) + \phi^{s-1}\eta_{t+1} + \phi^{s-2}\eta_{t+2} + ... + \phi\eta_{t+s-1} + \eta_{t+s}$$
$$= \phi^s(Y_t - \mu) + \phi^{s-1}(1 + \theta L)\varepsilon_{t+1} + \phi^{s-2}(1 + \theta L)\varepsilon_{t+2} + ... + \phi(1 + \theta L)\varepsilon_{t+s-1} + (1 + \theta L)\varepsilon_{t+s}$$
$$= \phi^s(Y_t - \mu) + (\phi^{s-1}\varepsilon_{t+1} + \phi^{s-1}\theta\varepsilon_t) + (\phi^{s-2}\varepsilon_{t+2} + \phi^{s-2}\theta\varepsilon_{t+1}) + ... + (\phi\varepsilon_{t+s-1} + \phi\theta\varepsilon_{t+s-2}) + (\varepsilon_{t+s} + \theta\varepsilon_{t+s-1}).$$
Setting E(ε_{t+h}) = 0, h = 1, 2, ..., s, the optimal linear s-period-ahead forecast for a stationary ARMA(1, 1) process is therefore
$$\hat{E}(Y_{t+s}|Y_t, Y_{t-1}, ...) = \mu + \phi^s(Y_t - \mu) + \phi^{s-1}\theta\varepsilon_t$$
$$= \mu + \phi^s(Y_t - \mu) + \phi^{s-1}\theta\frac{1 - \phi L}{1 + \theta L}(Y_t - \mu) \qquad \text{(from (13))}$$
$$= \mu + \frac{(1 + \theta L)\phi^s + \phi^{s-1}\theta(1 - \phi L)}{1 + \theta L}(Y_t - \mu)$$
$$= \mu + \frac{\phi^s + \theta L\phi^s + \phi^{s-1}\theta - \theta L\phi^s}{1 + \theta L}(Y_t - \mu)$$
$$= \mu + \frac{\phi^s + \phi^{s-1}\theta}{1 + \theta L}(Y_t - \mu).$$
It is important to note that to forecast an ARMA(1, 1) process, the optimal s-period-ahead linear forecast in principle requires all of the historical values of Y, {Y_t, Y_{t−1}, ...}.
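One way to compute this forecast in practice (a sketch with hypothetical parameter values, not a prescription from the notes) is to recover ε_t recursively from the observed Y's, setting the pre-sample ε's to zero as in Section 3.1 below, and then apply Ê(Y_{t+s}) = µ + φ^s(Y_t − µ) + φ^{s−1}θε_t:

```python
import numpy as np

def arma11_forecast(mu, phi, theta, y_hist, s):
    """s-step-ahead forecast of a stationary, invertible ARMA(1,1).
    y_hist = [Y_t, Y_{t-1}, ...]; pre-sample epsilons are approximated by zero."""
    devs = np.asarray(y_hist, dtype=float)[::-1] - mu   # deviations from mu, oldest first
    eps, prev_dev = 0.0, 0.0
    for dev in devs:
        # invert (1 - phi*L)(Y - mu) = (1 + theta*L)*eps recursively
        eps = dev - phi * prev_dev - theta * eps
        prev_dev = dev
    return mu + phi**s * devs[-1] + phi ** (s - 1) * theta * eps

print(arma11_forecast(mu=0.0, phi=0.7, theta=0.4, y_hist=[1.2, 0.5, -0.3, 0.8], s=2))
```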
Exercise:
Suppose that the U.S. quarterly seasonally adjusted unemployment rate for the period 1948-1972 behaves much like the time series
$$Y_t - 4.77 = 1.54(Y_{t-1} - 4.77) - 0.67(Y_{t-2} - 4.77) + \varepsilon_t,$$
where the ε_t are uncorrelated (0, 1) random variables. Assume that we know this is the proper representation. The four most recent observations up to 1972 are Y_1972 = 5.30, Y_1971 = 5.53, Y_1970 = 5.77, and Y_1969 = 5.83. Make the best linear forecast of Y_{1972+s}, s = 1, 2, 3, 4, based on the information available.
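One possible way to carry out the computation (a sketch for checking your answer, not part of the exercise statement) is to iterate the AR(2) recursion forward, replacing the future shocks by zero; all numbers below come from the exercise itself.

```python
mu, phi1, phi2 = 4.77, 1.54, -0.67
y_t, y_tm1 = 5.30, 5.53            # Y_1972 and Y_1971; older observations are not needed for an AR(2)

forecasts = []
for s in range(1, 5):
    y_next = mu + phi1 * (y_t - mu) + phi2 * (y_tm1 - mu)
    forecasts.append(round(y_next, 3))
    y_t, y_tm1 = y_next, y_t       # earlier forecasts feed into the later ones
print(forecasts)                    # best linear forecasts of Y_{1972+s}, s = 1, 2, 3, 4
```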
3 Forecasts Based on a Finite Number of Observations

This section continues to assume that the population parameters are known with certainty, but develops forecasts based on a finite number m of observations, {Y_t, Y_{t−1}, ..., Y_{t−m+1}}.
3.1 Approximations to the Optimal Forecast
For forecasting an AR(p) process, the optimal s-period-ahead linear forecast based on an infinite number of observations {Y_t, Y_{t−1}, ...} in fact makes use of only the p most recent values {Y_t, Y_{t−1}, ..., Y_{t−p+1}}. For an MA or ARMA process, however, we would in principle require all of the historical values of Y in order to implement the formulas of the preceding section.
In reality we do not have an infinite number of observations for forecasting. One approach to forecasting based on a finite number of observations is to act as if the pre-sample Y's were all equal to their mean value µ (equivalently, as if the pre-sample ε's were all zero). The idea is thus to use the approximation
$$\hat{E}(Y_{t+s}|1, Y_t, Y_{t-1}, ...) \cong \hat{E}(Y_{t+s}|Y_t, Y_{t-1}, ..., Y_{t-m+1}; Y_{t-m} = \mu, Y_{t-m-1} = \mu, ...).$$
For example, in forecasting an MA(1) process, the one-period-ahead forecast
$$\hat{Y}_{t+1|t} = \mu + \theta(Y_t - \mu) - \theta^2(Y_{t-1} - \mu) + \theta^3(Y_{t-2} - \mu) - ....$$
is approximated by
$$\hat{Y}_{t+1|t} \cong \mu + \theta(Y_t - \mu) - \theta^2(Y_{t-1} - \mu) + \theta^3(Y_{t-2} - \mu) - ... + (-1)^{m-1}\theta^m(Y_{t-m+1} - \mu).$$
For m large and |θ| small (so that the deviations of Y from µ are multiplied by smaller and smaller numbers), this clearly gives an excellent approximation. For |θ| closer to unity, the approximation may be poor.
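The quality of this approximation is easy to check by simulation. The sketch below (hypothetical µ and θ) compares the exact one-step forecast, computed from the true ε_t, with the truncated expansion based on the m most recent Y's:

```python
import numpy as np

def ma1_one_step_approx(mu, theta, y_recent):
    """Approximate one-step forecast of an MA(1) from the m most recent observations,
    treating pre-sample epsilons (equivalently, pre-sample Y's) as equal to their mean."""
    approx = mu
    for j, y in enumerate(y_recent):                    # y_recent = [Y_t, Y_{t-1}, ..., Y_{t-m+1}]
        approx += (-1) ** j * theta ** (j + 1) * (y - mu)
    return approx

rng = np.random.default_rng(2)
mu, theta, n = 0.0, 0.5, 200
eps = rng.normal(size=n + 1)
y = mu + eps[1:] + theta * eps[:-1]                     # simulated MA(1) sample

exact = mu + theta * eps[-1]                            # forecast using the true eps_t
print(exact, ma1_one_step_approx(mu, theta, y[::-1][:20]))  # close for moderate m when |theta| < 1
```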
3.2 Exact Finite-Sample Forecasts

An alternative approach is to calculate the exact projection of Y_{t+1} on its m most recent values.³ Let
$$x_t = \begin{bmatrix} 1 \\ Y_t \\ Y_{t-1} \\ \vdots \\ Y_{t-m+1} \end{bmatrix}.$$
We thus seek a linear forecast of the form
$$\hat{E}(Y_{t+1}|x_t) = \alpha^{(m)\prime} x_t = \alpha_0^{(m)} + \alpha_1^{(m)}Y_t + \alpha_2^{(m)}Y_{t-1} + ... + \alpha_m^{(m)}Y_{t-m+1}. \qquad (14)$$
The coefficient relating Y_{t+1} to Y_t in a projection of Y_{t+1} on the m most recent values of Y is denoted α_1^{(m)} in (14). This will in general be different from the coefficient relating Y_{t+1} to Y_t in a projection of Y_{t+1} on the (m + 1) most recent values of Y; the latter coefficient would be denoted α_1^{(m+1)}.
From (2) we know that
$$E(Y_{t+1}x_t') = \alpha' E(x_t x_t'),$$
or
$$\alpha' = E(Y_{t+1}x_t')[E(x_t x_t')]^{-1}, \qquad (15)$$
assuming that E(x_t x_t') is a nonsingular matrix.
Since for a covariance-stationary process Y_t, γ_j = E(Y_{t+j} − µ)(Y_t − µ) = E(Y_{t+j}Y_t) − µ², here
$$E(Y_{t+1}x_t') = E\big(Y_{t+1}[1 \;\; Y_t \;\; Y_{t-1} \;\; \cdots \;\; Y_{t-m+1}]\big) = [\mu \;\; (\gamma_1 + \mu^2) \;\; (\gamma_2 + \mu^2) \;\; \cdots \;\; (\gamma_m + \mu^2)]$$
³ So we do not need to assume any particular model, such as an ARMA, here.
and
$$E(x_t x_t') = E\left(\begin{bmatrix}1 \\ Y_t \\ Y_{t-1} \\ \vdots \\ Y_{t-m+1}\end{bmatrix}[1 \;\; Y_t \;\; Y_{t-1} \;\; \cdots \;\; Y_{t-m+1}]\right) = \begin{bmatrix} 1 & \mu & \mu & \cdots & \mu \\ \mu & \gamma_0 + \mu^2 & \gamma_1 + \mu^2 & \cdots & \gamma_{m-1} + \mu^2 \\ \mu & \gamma_1 + \mu^2 & \gamma_0 + \mu^2 & \cdots & \gamma_{m-2} + \mu^2 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \mu & \gamma_{m-1} + \mu^2 & \gamma_{m-2} + \mu^2 & \cdots & \gamma_0 + \mu^2 \end{bmatrix}.$$
Then
$$\alpha^{(m)\prime} = [\mu \;\; (\gamma_1 + \mu^2) \;\; (\gamma_2 + \mu^2) \;\; \cdots \;\; (\gamma_m + \mu^2)]\begin{bmatrix} 1 & \mu & \mu & \cdots & \mu \\ \mu & \gamma_0 + \mu^2 & \gamma_1 + \mu^2 & \cdots & \gamma_{m-1} + \mu^2 \\ \mu & \gamma_1 + \mu^2 & \gamma_0 + \mu^2 & \cdots & \gamma_{m-2} + \mu^2 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \mu & \gamma_{m-1} + \mu^2 & \gamma_{m-2} + \mu^2 & \cdots & \gamma_0 + \mu^2 \end{bmatrix}^{-1}.$$
When a constant term is included in x_t, it is more convenient to express variables in deviations from the mean. We can then calculate the projection of (Y_{t+1} − µ) on x_t = [(Y_t − µ), (Y_{t−1} − µ), ..., (Y_{t−m+1} − µ)]':
$$\hat{Y}_{t+1|t} - \mu = \alpha_1^{(m)}(Y_t - \mu) + \alpha_2^{(m)}(Y_{t-1} - \mu) + ... + \alpha_m^{(m)}(Y_{t-m+1} - \mu).$$
For this definition of x_t, the coefficients can be calculated from (15) to be
$$\alpha^{(m)} = \begin{bmatrix}\alpha_1^{(m)} \\ \alpha_2^{(m)} \\ \vdots \\ \alpha_m^{(m)}\end{bmatrix} = \begin{bmatrix}\gamma_0 & \gamma_1 & \cdots & \gamma_{m-1} \\ \gamma_1 & \gamma_0 & \cdots & \gamma_{m-2} \\ \vdots & \vdots & \ddots & \vdots \\ \gamma_{m-1} & \gamma_{m-2} & \cdots & \gamma_0\end{bmatrix}^{-1}\begin{bmatrix}\gamma_1 \\ \gamma_2 \\ \vdots \\ \gamma_m\end{bmatrix}. \qquad (16)$$
To generate an s-period-ahead forecast Ŷ_{t+s|t} we would use
$$\hat{Y}_{t+s|t} - \mu = \alpha_1^{(m,s)}(Y_t - \mu) + \alpha_2^{(m,s)}(Y_{t-1} - \mu) + ... + \alpha_m^{(m,s)}(Y_{t-m+1} - \mu),$$
where
$$\begin{bmatrix}\alpha_1^{(m,s)} \\ \alpha_2^{(m,s)} \\ \vdots \\ \alpha_m^{(m,s)}\end{bmatrix} = \begin{bmatrix}\gamma_0 & \gamma_1 & \cdots & \gamma_{m-1} \\ \gamma_1 & \gamma_0 & \cdots & \gamma_{m-2} \\ \vdots & \vdots & \ddots & \vdots \\ \gamma_{m-1} & \gamma_{m-2} & \cdots & \gamma_0\end{bmatrix}^{-1}\begin{bmatrix}\gamma_s \\ \gamma_{s+1} \\ \vdots \\ \gamma_{s+m-1}\end{bmatrix}.$$
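Computationally, (16) and its s-step generalization amount to solving an m × m Toeplitz system built from the autocovariances. A minimal sketch follows (AR(1) autocovariances are used purely as a check; the resulting weights should then be φ on Y_t and zero on the older observations):

```python
import numpy as np

def exact_projection_coefs(gammas, m, s=1):
    """Coefficients alpha^(m,s) of the exact s-step projection on the m most recent
    demeaned observations, given autocovariances gammas = [gamma_0, gamma_1, ...]."""
    Gamma = np.array([[gammas[abs(i - j)] for j in range(m)] for i in range(m)])
    rhs = np.array(gammas[s:s + m])                  # [gamma_s, ..., gamma_{s+m-1}]
    return np.linalg.solve(Gamma, rhs)

# Check with AR(1) autocovariances gamma_j = phi^j * sigma^2 / (1 - phi^2):
phi, sigma2 = 0.8, 1.0
gammas = [phi**j * sigma2 / (1 - phi**2) for j in range(10)]
print(np.round(exact_projection_coefs(gammas, m=4, s=1), 6))   # approximately [0.8, 0, 0, 0]
```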
4 Sums of ARMA Processes
This section explores the nature of the series that results from adding two different ARMA processes together.
4.1 Sum of an MA(1) Process and White Noise
Suppose that a series X_t follows a zero-mean MA(1) process:
$$X_t = u_t + \delta u_{t-1},$$
where u_t is white noise with variance σ_u². The autocovariances of X_t are thus
$$E(X_t X_{t-j}) = \begin{cases}(1 + \delta^2)\sigma_u^2 & \text{for } j = 0 \\ \delta\sigma_u^2 & \text{for } j = \pm 1 \\ 0 & \text{otherwise.}\end{cases} \qquad (17)$$
Let v_t indicate a separate white noise series with variance σ_v². Suppose, furthermore, that v and u are uncorrelated at all leads and lags:
$$E(u_t v_{t-j}) = 0 \quad \text{for all } j,$$
implying that
$$E(X_t v_{t-j}) = 0 \quad \text{for all } j. \qquad (18)$$
Let an observed series Y_t represent the sum of the MA(1) and the white noise process:
$$Y_t = X_t + v_t \qquad (19)$$
$$= u_t + \delta u_{t-1} + v_t. \qquad (20)$$
The question now posed is, What are the time series properties of Y ?
Clearly, Y_t has mean zero, and its autocovariances can be deduced from (17) through (18):
$$E(Y_t Y_{t-j}) = E(X_t + v_t)(X_{t-j} + v_{t-j}) = E(X_t X_{t-j}) + E(v_t v_{t-j}) = \begin{cases}(1 + \delta^2)\sigma_u^2 + \sigma_v^2 & \text{for } j = 0 \\ \delta\sigma_u^2 & \text{for } j = \pm 1 \\ 0 & \text{otherwise.}\end{cases} \qquad (21)$$
Thus, the sum X_t + v_t is covariance-stationary, and its autocovariances are zero beyond one lag, as for an MA(1). We might naturally then ask whether there exists a zero-mean MA(1) representation for Y,
$$Y_t = \varepsilon_t + \theta\varepsilon_{t-1}, \qquad (22)$$
with
$$E(\varepsilon_t\varepsilon_{t-j}) = \begin{cases}\sigma^2 & \text{for } j = 0 \\ 0 & \text{otherwise,}\end{cases} \qquad (23)$$
whose autocovariances match those implied by (21). The autocovariances of (22) would be given by
$$E(Y_t Y_{t-j}) = \begin{cases}(1 + \theta^2)\sigma^2 & \text{for } j = 0 \\ \theta\sigma^2 & \text{for } j = \pm 1 \\ 0 & \text{otherwise.}\end{cases}$$
In order to be consistent with (21), it would have to be the case that
$$(1 + \theta^2)\sigma^2 = (1 + \delta^2)\sigma_u^2 + \sigma_v^2 \qquad (24)$$
and
$$\theta\sigma^2 = \delta\sigma_u^2. \qquad (25)$$
Take the value⁴ (θ*, σ*²) associated with the invertible representation, obtained by solving (24) and (25) simultaneously. Let us consider whether (22) could indeed characterize the data Y_t generated by (20). This would require
$$(1 + \theta^* L)\varepsilon_t = (1 + \delta L)u_t + v_t,$$
⁴ Two values of θ satisfy (24) and (25); they can be found from the quadratic formula:
$$\theta = \frac{[(1 + \delta^2) + (\sigma_v^2/\sigma_u^2)] \pm \sqrt{[(1 + \delta^2) + (\sigma_v^2/\sigma_u^2)]^2 - 4\delta^2}}{2\delta}.$$
or
εt = (1 + θ∗ L)−1 [(1 + δL)ut + vt ]
= (ut − θ∗ ut−1 + θ∗2 ut−2 − θ∗3 ut−3 + · · · )
+δ(ut−1 − θ∗ ut−2 + θ∗2 ut−3 − θ∗3 ut−4 + · · · )
+(vt − θ∗ vt−1 + θ∗2 vt−2 − θ∗3 vt−3 + · · · ).
(26)
The series ε_t defined in (26) is a distributed lag on past values of u and v, so it might seem to possess a rich autocorrelation structure. In fact, it turns out to be white noise! To see this, note from (21) that the autocovariance-generating function of Y can be written
$$g_Y(z) = \delta\sigma_u^2 z^{-1} + [(1 + \delta^2)\sigma_u^2 + \sigma_v^2]z^0 + \delta\sigma_u^2 z^1 = \sigma_u^2(1 + \delta z)(1 + \delta z^{-1}) + \sigma_v^2, \qquad (27)$$
so the autocovariance-generating function of ε_t = (1 + θ*L)^{-1}Y_t is
$$g_\varepsilon(z) = \frac{\sigma_u^2(1 + \delta z)(1 + \delta z^{-1}) + \sigma_v^2}{(1 + \theta^* z)(1 + \theta^* z^{-1})}. \qquad (28)$$
But θ* and σ*² were chosen so as to make the autocovariance-generating function of (1 + θ*L)ε_t, namely
$$(1 + \theta^* z)\sigma^{*2}(1 + \theta^* z^{-1}), \qquad (29)$$
identical to the right side of (27). Thus, (28) is simply equal to
$$g_\varepsilon(z) = \sigma^{*2},$$
the autocovariance-generating function of a white noise series.
To summarize, adding an MA(1) process to a white noise series with which it is uncorrelated at all leads and lags produces a new MA(1) process characterized by (22).
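The pair (θ*, σ*²) can be computed directly from (24)-(25). A small sketch follows (illustrative values of δ, σ_u², σ_v²), which also verifies that the implied autocovariances reproduce (21):

```python
import numpy as np

def ma1_plus_noise_params(delta, sigma_u2, sigma_v2):
    """Solve (24)-(25) for the invertible MA(1) representation of Y = X + v, where X is an
    MA(1) with parameter delta and innovation variance sigma_u2, and v is independent
    white noise with variance sigma_v2."""
    c = (1 + delta**2) + sigma_v2 / sigma_u2
    roots = np.roots([delta, -c, delta])              # delta*theta^2 - c*theta + delta = 0
    theta = float(roots[np.abs(roots) < 1][0].real)   # pick the invertible root, |theta| < 1
    sigma2 = delta * sigma_u2 / theta                 # from (25)
    return theta, sigma2

theta_star, sigma2_star = ma1_plus_noise_params(delta=0.6, sigma_u2=1.0, sigma_v2=0.5)
print((1 + theta_star**2) * sigma2_star, (1 + 0.6**2) * 1.0 + 0.5)   # both sides of (24)
print(theta_star * sigma2_star, 0.6 * 1.0)                            # both sides of (25)
```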
4.2 Sum of Two Independent Moving Average Processes
As a necessary preliminary to what follows, consider a stochastic process W_t which is the sum of two independent moving average processes of orders q_1 and q_2, respectively. That is,
$$W_t = \theta_1(L)u_t + \theta_2(L)v_t,$$
where θ_1(L) and θ_2(L) are polynomials in L of orders q_1 and q_2, and the white noise processes u_t and v_t have zero means and are mutually uncorrelated at all leads and lags. Let q = max(q_1, q_2); then it is clear that the autocovariance function γ_j of W_t must be zero for j > q.⁵ It follows that there exists a representation of W_t as a single moving average process of order q:
$$W_t = \theta_3(L)\varepsilon_t, \qquad (30)$$
where ε_t is a white noise process with mean zero. Thus the sum of two uncorrelated moving average processes is another moving average process, whose order is the same as that of the higher-order component process.
4.3 Sum of an ARMA(p, q) Plus White Noise
Consider the general stationary and invertible ARMA(p, q) model
$$\phi(L)X_t = \theta(L)u_t,$$
where u_t is a white noise process with variance σ_u². Let v_t indicate a separate white noise series with variance σ_v². Suppose, furthermore, that v and u are uncorrelated at all leads and lags.
Let an observed series Y_t represent the sum of the ARMA(p, q) and the white noise process:
$$Y_t = X_t + v_t. \qquad (31)$$
In general we have
$$\phi(L)Y_t = \phi(L)X_t + \phi(L)v_t = \theta(L)u_t + \phi(L)v_t.$$
Let Q = max(p, q); then from (30) we have that Y_t is an ARMA(p, Q) process.
⁵ To see this, let W_t = X_t + Y_t; then
$$\gamma_j^W = E(W_t W_{t-j}) = E(X_t + Y_t)(X_{t-j} + Y_{t-j}) = E(X_t X_{t-j}) + E(Y_t Y_{t-j}) = \begin{cases}\gamma_j^X + \gamma_j^Y & \text{for } j = 0, \pm 1, \pm 2, ..., \pm q, \\ 0 & \text{otherwise.}\end{cases}$$
4.4 Adding Two Autoregressive Processes
Suppose now that X_t is an AR(p_1) process and W_t is an AR(p_2) process:
$$\phi_1(L)X_t = u_t$$
$$\phi_2(L)W_t = v_t, \qquad (32)$$
where u_t and v_t are white noise processes uncorrelated with each other at all leads and lags. Suppose again that we observe
$$Y_t = X_t + W_t,$$
and we wish to determine the nature of the observed process Y_t.
From (32) we have
$$\phi_2(L)\phi_1(L)X_t = \phi_2(L)u_t$$
$$\phi_1(L)\phi_2(L)W_t = \phi_1(L)v_t,$$
so that
$$\phi_1(L)\phi_2(L)(X_t + W_t) = \phi_1(L)v_t + \phi_2(L)u_t,$$
or
$$\phi_1(L)\phi_2(L)Y_t = \phi_1(L)v_t + \phi_2(L)u_t.$$
Therefore, by (30), Y_t is an ARMA(p_1 + p_2, max{p_1, p_2}) process.
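The AR part of the resulting model is simply the product of the two lag polynomials, which can be formed by polynomial multiplication; the sketch below uses hypothetical AR(1) coefficients.

```python
import numpy as np

# phi_1(L) = 1 - 0.5L and phi_2(L) = 1 - 0.8L (illustrative values).
phi1 = np.array([1.0, -0.5])           # coefficients of phi_1(L) in ascending powers of L
phi2 = np.array([1.0, -0.8])
ar_poly = np.convolve(phi1, phi2)      # phi_1(L)*phi_2(L) = 1 - 1.3L + 0.4L^2
print(ar_poly)

# The right-hand side phi_1(L)v_t + phi_2(L)u_t is the sum of two MA(1) processes in
# independent noises, hence (by (30)) a single MA(1), so Y_t here is ARMA(2, 1).
```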
5 Wold's Decomposition
All of the covariance-stationary ARMA processes considered in Chapter 14 can be written in the form
$$Y_t = \mu + \sum_{j=0}^{\infty}\varphi_j\varepsilon_{t-j}, \qquad (33)$$
where ε_t is a white noise process and $\sum_{j=0}^{\infty}\varphi_j^2 < \infty$ with ϕ_0 = 1.
One might think that we are able to write all these processes in the form of (33) only because the discussion was restricted to a convenient class of models (parametric ARMA models). However, the following result establishes that the representation (33) is in fact fundamental for any covariance-stationary time series.
Theorem (Wold's Decomposition):
Let Y_t be any covariance-stationary stochastic process with EY_t = 0. Then it can be written as
$$Y_t = \sum_{j=0}^{\infty}\varphi_j\varepsilon_{t-j} + \eta_t, \qquad (34)$$
where ϕ_0 = 1, $\sum_{j=0}^{\infty}\varphi_j^2 < \infty$, Eε_t² = σ² ≥ 0, E(ε_tε_s) = 0 for t ≠ s, Eε_t = 0, and Eε_tη_s = 0 for all t and s; and η_t is a process that can be predicted arbitrarily well by a linear function of only past values of Y_t, i.e., η_t is linearly deterministic.⁶ Furthermore, ε_t = Y_t − Ê(Y_t|Y_{t−1}, Y_{t−2}, ...).
Proof:
See Sargent (1987, pp. 286-90) or Fuller (1996, pp. 94-98), or Brockwell and
Davis (1991, pp. 187-89).
Wold's decomposition is important for us because it explains the sense in which ARMA models (stochastic difference equations) provide a general model for the indeterministic part of any univariate stationary stochastic process, and also the sense in which there exists a white-noise process ε_t (which is in fact the forecast error from the one-period-ahead linear projection) that is the building block for the indeterministic part of Y_t.
⁶ When η_t = 0, the process (34) is called purely linearly indeterministic.