Forecasting
1. Optimal Forecast Criterion - Minimum Mean Square Error Forecast
We have now considered how to determine which ARIMA model to fit to our data, how to estimate the parameters of the ARIMA model, and how to examine the goodness of fit of the fitted model. So we are finally in a position to use our model to forecast new values of the time series. Suppose we have observed a time series up to time $T$, so that we know the values $X_1, X_2, \dots, X_T$, and consequently we also know $Z_1, Z_2, \dots, Z_T$. This collection of known information will be denoted
$$I_T = \{X_1, X_2, \dots, X_T;\ Z_1, Z_2, \dots, Z_T\}$$
Suppose that we want to forecast a value for $X_{T+k}$, $k$ time units into the future. We now introduce the most prevalent optimality criterion for forecasting.
Theorem: The optimal k-step ahead forecast $X_{T+k,T}$ (which is a function of $I_T$) that will minimize the mean square error
$$E\big(X_{T+k} - X_{T+k,T}\big)^2$$
is
$$X_{T+k,T} = E(X_{T+k} \mid I_T)$$
This optimal forecast is referred to as the Minimum Mean Square Error Forecast. It is unbiased because
$$E(X_{T+k} - X_{T+k,T}) = E\big[E\big((X_{T+k} - X_{T+k,T}) \mid I_T\big)\big] = E\big[X_{T+k,T} - X_{T+k,T}\big] = 0$$
Lemma: Suppose $Z$ and $W$ are real random variables. Then
$$\min_{h}\ E\big[\big(Z - h(W)\big)^2\big] = E\big[\big(Z - E(Z \mid W)\big)^2\big]$$
That is, the posterior mean $E(Z \mid W)$ will minimize the quadratic loss (mean square error).
Proof:
$$
\begin{aligned}
E\big[\big(Z - h(W)\big)^2\big] &= E\big[\big(Z - E(Z \mid W) + E(Z \mid W) - h(W)\big)^2\big] \\
&= E\big[\big(Z - E(Z \mid W)\big)^2\big] + 2E\big[\big(Z - E(Z \mid W)\big)\big(E(Z \mid W) - h(W)\big)\big] \\
&\quad + E\big[\big(E(Z \mid W) - h(W)\big)^2\big]
\end{aligned}
$$
Conditioning on $W$, the cross term is zero: given $W$, the factor $E(Z \mid W) - h(W)$ is a constant and $E\big[Z - E(Z \mid W) \mid W\big] = 0$. Thus
$$E\big[\big(Z - h(W)\big)^2\big] = E\big[\big(Z - E(Z \mid W)\big)^2\big] + E\big[\big(E(Z \mid W) - h(W)\big)^2\big]$$
The second term is nonnegative and equals zero when $h(W) = E(Z \mid W)$, which proves the lemma.
Note: We usually omit the word "optimal" in the k-periods-ahead (optimal) forecast, because in the time series context, "forecast" means "the optimal minimum mean square error forecast" by default.
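To make the lemma concrete, here is a small Monte Carlo sketch (my own illustration, not part of the original notes): with $Z = W^2 + \varepsilon$, the conditional mean $E(Z \mid W) = W^2$ attains a smaller mean square error than other predictors.
```python
# Monte Carlo check of the lemma: among predictors h(W), the conditional mean
# E(Z|W) minimizes the mean square error.  Here Z = W^2 + noise, so E(Z|W) = W^2.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
W = rng.normal(size=n)
Z = W**2 + rng.normal(scale=1.0, size=n)

predictors = {
    "conditional mean W^2": W**2,
    "shifted predictor W^2 + 0.5": W**2 + 0.5,
    "constant predictor E(Z) = 1": np.ones(n),
}
for name, h in predictors.items():
    print(f"{name:30s} MSE = {np.mean((Z - h)**2):.3f}")
# The conditional mean gives MSE close to 1 (the noise variance); the others are larger.
```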
2. Forecast in MA(1)
The MA(1) model is:
$$X_t = Z_t + \theta Z_{t-1}, \qquad \text{where } Z_t \sim WN(0, \sigma^2)$$
Given the observed data $\{X_1, X_2, \dots, X_T\}$, the white noise terms $Z_t$ are not directly observed; however, we can perform a recursive estimation of the white noise as follows.
Given the initial condition $Z_0$ (its exact value is not important), we have:
$$Z_1 = X_1 - \theta Z_0$$
$$Z_2 = X_2 - \theta Z_1$$
$$\cdots$$
$$Z_T = X_T - \theta Z_{T-1}$$
Therefore, given $\{X_1, X_2, \dots, X_T\}$, we can also obtain $\{Z_1, Z_2, \dots, Z_T\}$, so the entire information set can be written as
$$I_T = \{X_1, X_2, \dots, X_T;\ Z_1, Z_2, \dots, Z_T\}$$
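A minimal sketch of this recursion in code (assuming $\theta$ is known and taking $Z_0 = 0$; the function name is my own):
```python
import numpy as np

def ma1_residuals(x, theta, z0=0.0):
    """Recursively recover Z_1,...,Z_T from X_1,...,X_T under X_t = Z_t + theta*Z_{t-1}."""
    z = np.empty(len(x))
    z_prev = z0
    for t, x_t in enumerate(x):
        z[t] = x_t - theta * z_prev   # Z_t = X_t - theta * Z_{t-1}
        z_prev = z[t]
    return z
```
For an invertible MA(1) ($|\theta| < 1$), the effect of the arbitrary choice of $Z_0$ dies out quickly, echoing the remark above that the initial condition is not important.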
Since $X_{T+1} = Z_{T+1} + \theta Z_T$, the one period ahead (optimal) forecast is
$$
\begin{aligned}
X_{T+1,T} = E(X_{T+1} \mid I_T) &= E(Z_{T+1} + \theta Z_T \mid I_T) \\
&= E(Z_{T+1} \mid I_T) + \theta E(Z_T \mid I_T) \\
&= E(Z_{T+1}) + \theta E(Z_T \mid Z_T) \\
&= 0 + \theta Z_T = \theta Z_T
\end{aligned}
$$
Note: Following the convention in the time series literature, we write
$$E(Z_T \mid Z_T) = Z_T$$
although one can write this more rigorously as
$$E(Z_T \mid Z_T = z_T) = z_T$$
Similarly, we write
$$E(X_T \mid X_T) = X_T$$
which can be written more rigorously as
$$E(X_T \mid X_T = x_T) = x_T$$
The forecast error is
$$e_{T+1,T} = X_{T+1} - X_{T+1,T} = Z_{T+1}$$
Since the forecast is unbiased, the minimum mean square error is equal to the forecast error variance:
$$E(X_{T+1} - X_{T+1,T})^2 = \mathrm{Var}(X_{T+1} - X_{T+1,T}) = \mathrm{Var}(Z_{T+1}) = \sigma^2$$
Given $I_T = \{X_1, X_2, \dots, X_T;\ Z_1, Z_2, \dots, Z_T\}$ and $X_{T+2} = Z_{T+2} + \theta Z_{T+1}$, the two periods ahead (optimal) forecast is
$$
\begin{aligned}
X_{T+2,T} = E(X_{T+2} \mid I_T) &= E(Z_{T+2} + \theta Z_{T+1} \mid I_T) \\
&= E(Z_{T+2} \mid I_T) + \theta E(Z_{T+1} \mid I_T) \\
&= 0 + \theta \cdot 0 = 0
\end{aligned}
$$
The forecast error is
$$e_{T+2,T} = X_{T+2} - X_{T+2,T} = Z_{T+2} + \theta Z_{T+1}$$
Since the forecast is unbiased, the minimum mean square error is equal to the forecast error variance:
$$E(X_{T+2} - X_{T+2,T})^2 = \mathrm{Var}(X_{T+2} - X_{T+2,T}) = \mathrm{Var}(Z_{T+2} + \theta Z_{T+1}) = (1 + \theta^2)\sigma^2$$
In summary, for more than one period ahead the forecast is zero, the unconditional mean, just like the two-periods-ahead forecast. That is,
$$X_{T+2,T} = E(X_{T+2} \mid I_T) = 0$$
$$X_{T+3,T} = E(X_{T+3} \mid I_T) = 0$$
$$\cdots$$
The MA(1) process is not forecastable for more than one period ahead (apart from the unconditional mean).
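The MA(1) forecasts above can be packaged into a small helper (a sketch of my own; $\theta$, $\sigma^2$, and $Z_T$ are assumed known):
```python
def ma1_forecast(z_T, theta, sigma2, k):
    """Return (point forecast, forecast error variance) for X_{T+k} in a zero-mean MA(1)."""
    if k == 1:
        return theta * z_T, sigma2                 # X_{T+1,T} = theta * Z_T
    return 0.0, (1.0 + theta**2) * sigma2          # k >= 2: unconditional mean and variance

print(ma1_forecast(z_T=0.8, theta=0.5, sigma2=1.0, k=1))  # (0.4, 1.0)
print(ma1_forecast(z_T=0.8, theta=0.5, sigma2=1.0, k=3))  # (0.0, 1.25)
```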
Forecast for MA(1) with Intercept (non-zero mean)
If the MA(1) model includes an intercept,
$$X_t = \mu + Z_t + \theta Z_{t-1}, \qquad \text{where } Z_t \sim WN(0, \sigma^2)$$
we can perform forecasting using the same approach.
For example, since $X_{T+1} = \mu + Z_{T+1} + \theta Z_T$, the one period ahead (optimal) forecast is
$$
\begin{aligned}
X_{T+1,T} = E(X_{T+1} \mid I_T) &= E(\mu + Z_{T+1} + \theta Z_T \mid I_T) \\
&= \mu + E(Z_{T+1} \mid I_T) + \theta E(Z_T \mid I_T) \\
&= \mu + 0 + \theta Z_T = \mu + \theta Z_T
\end{aligned}
$$
The forecast error is
$$e_{T+1,T} = X_{T+1} - X_{T+1,T} = Z_{T+1}$$
Since the forecast is unbiased, the minimum mean square error is equal to the forecast error variance:
$$E(X_{T+1} - X_{T+1,T})^2 = \mathrm{Var}(X_{T+1} - X_{T+1,T}) = \mathrm{Var}(Z_{T+1}) = \sigma^2$$
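For concreteness, a small worked example (the numbers are illustrative, not from the notes): with $\mu = 5$, $\theta = 0.4$, $\sigma^2 = 1$, and the last recovered noise value $Z_T = 1.2$, the one period ahead forecast and its error variance are
$$X_{T+1,T} = \mu + \theta Z_T = 5 + 0.4 \times 1.2 = 5.48, \qquad \mathrm{Var}(e_{T+1,T}) = \sigma^2 = 1$$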
3. Generalization to MA(q):
For an MA(q) process, we can forecast up to q out-of-sample periods. Can you write
down the (optimal) point forecast for each period, the forecast error, and the
forecast error variance? Please practice before the exam.
4. Forecast in AR(1)
For the AR(1) model,
$$X_t = \phi X_{t-1} + Z_t, \qquad \text{where } Z_t \sim WN(0, \sigma^2)$$
and given $I_T = \{X_1, X_2, \dots, X_T;\ Z_1, Z_2, \dots, Z_T\}$, the One Period Ahead (Optimal) Forecast can be computed as follows:
$$X_{T+1} = \phi X_T + Z_{T+1}$$
So, given data $I_T$, the one period ahead forecast is
$$
\begin{aligned}
X_{T+1,T} = E(X_{T+1} \mid I_T) &= E(\phi X_T + Z_{T+1} \mid I_T) \\
&= E(\phi X_T \mid I_T) + E(Z_{T+1} \mid I_T) \\
&= \phi X_T + 0 = \phi X_T
\end{aligned}
$$
The forecast error is
$$e_{T+1,T} = X_{T+1} - X_{T+1,T} = Z_{T+1}$$
The forecast error variance is
$$\mathrm{Var}(e_{T+1,T}) = \mathrm{Var}(Z_{T+1}) = \sigma^2$$
Two Periods Ahead Forecast
By back-substitution,
$$
\begin{aligned}
X_{T+2} &= \phi X_{T+1} + Z_{T+2} \\
&= \phi(\phi X_T + Z_{T+1}) + Z_{T+2} \\
&= \phi^2 X_T + \phi Z_{T+1} + Z_{T+2}
\end{aligned}
$$
So, given data $I_T$, the two periods ahead (optimal) forecast is
$$
\begin{aligned}
X_{T+2,T} = E(X_{T+2} \mid I_T) &= E(\phi^2 X_T + \phi Z_{T+1} + Z_{T+2} \mid I_T) \\
&= \phi^2 X_T + \phi \cdot 0 + 0 \\
&= \phi^2 X_T
\end{aligned}
$$
That is, the two periods ahead forecast is also a linear function of the final observed value, but with the coefficient $\phi^2$.
The forecast error is
$$e_{T+2,T} = X_{T+2} - X_{T+2,T} = \phi Z_{T+1} + Z_{T+2}$$
The forecast error variance is
$$\mathrm{Var}(e_{T+2,T}) = \mathrm{Var}(\phi Z_{T+1} + Z_{T+2}) = (1 + \phi^2)\sigma^2$$
K Periods Ahead Forecast
Similarly,
$$X_{T+k} = \phi^k X_T + \phi^{k-1} Z_{T+1} + \cdots + \phi Z_{T+k-1} + Z_{T+k}$$
So, given $I_T$, the k periods ahead optimal forecast is
$$X_{T+k,T} = E(X_{T+k} \mid I_T) = \phi^k X_T$$
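Following the same pattern as the one- and two-step cases, the back-substitution above also gives the k-step forecast error and its variance (this display is my addition; it is not written out in the notes, but it reduces to the formulas above for $k = 1, 2$):
$$e_{T+k,T} = \phi^{k-1} Z_{T+1} + \phi^{k-2} Z_{T+2} + \cdots + Z_{T+k}, \qquad \mathrm{Var}(e_{T+k,T}) = \big(1 + \phi^2 + \cdots + \phi^{2(k-1)}\big)\sigma^2$$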
AR(1) with Intercept
If the AR(1) model includes an intercept,
$$X_t = \alpha + \phi X_{t-1} + Z_t, \qquad \text{where } Z_t \sim WN(0, \sigma^2)$$
then the one period ahead forecast is
$$
\begin{aligned}
X_{T+1,T} = E(X_{T+1} \mid I_T) &= \alpha + E(\phi X_T \mid I_T) + E(Z_{T+1} \mid I_T) \\
&= \alpha + \phi X_T
\end{aligned}
$$
What is the two periods ahead forecast?
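One way to see the answer (a sketch using the recursion introduced just below; this step is not worked out in the notes): replace the unknown $X_{T+1}$ by its own forecast, so that
$$X_{T+2,T} = \alpha + \phi X_{T+1,T} = \alpha + \phi(\alpha + \phi X_T) = (1 + \phi)\alpha + \phi^2 X_T$$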
Forecasting AR(1) Recursively
Alternatively, rather than using the back-substitution method directly as we have shown above, we can forecast the AR(1) recursively as follows:
$$X_{T+2} = \phi X_{T+1} + Z_{T+2}$$
So, given data $I_T$, the two periods ahead (optimal) forecast is
$$X_{T+2,T} = E(X_{T+2} \mid I_T) = E(\phi X_{T+1} + Z_{T+2} \mid I_T) = \phi E(X_{T+1} \mid I_T) = \phi X_{T+1,T}$$
Similarly, one can obtain the general recursive forecast relationship:
$$
\begin{aligned}
X_{T+k,T} = E(X_{T+k} \mid I_T) &= E(\phi X_{T+k-1} + Z_{T+k} \mid I_T) \\
&= \phi E(X_{T+k-1} \mid I_T) = \phi X_{T+k-1,T} \\
&= \cdots = \phi^{k-1} X_{T+1,T}
\end{aligned}
$$
This way, once you have computed the one-step ahead forecast as
$$
\begin{aligned}
X_{T+1,T} = E(X_{T+1} \mid I_T) &= E(\phi X_T + Z_{T+1} \mid I_T) \\
&= E(\phi X_T \mid I_T) + E(Z_{T+1} \mid I_T) \\
&= \phi X_T + 0 = \phi X_T
\end{aligned}
$$
you can obtain the rest of the forecasts easily.
Note: As mentioned in class, I prefer the back-substitution method (introduced first)
in general because it can provide you with the forecast error directly as well.
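A minimal code sketch of the recursive method (my own helper; $\alpha$ and $\phi$ are assumed known, and the intercept case is included so that $\alpha = 0$ recovers the zero-mean model):
```python
def ar1_forecasts(x_T, alpha, phi, k_max):
    """Recursive k-step forecasts for X_t = alpha + phi*X_{t-1} + Z_t, k = 1,...,k_max."""
    forecasts = []
    prev = x_T                      # X_{T+k-1,T}, starting from the observed X_T
    for _ in range(k_max):
        prev = alpha + phi * prev   # X_{T+k,T} = alpha + phi * X_{T+k-1,T}
        forecasts.append(prev)
    return forecasts

# With alpha = 0 this reduces to phi**k * X_T, matching the back-substitution result.
print(ar1_forecasts(x_T=2.0, alpha=0.0, phi=0.5, k_max=3))  # [1.0, 0.5, 0.25]
```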
5. AR(p) Process
Consider an AR(2) model with intercept,
$$X_t = \alpha + \phi_1 X_{t-1} + \phi_2 X_{t-2} + Z_t, \qquad \text{where } Z_t \sim WN(0, \sigma^2)$$
We have
$$X_{T+1} = \alpha + \phi_1 X_T + \phi_2 X_{T-1} + Z_{T+1}$$
Then the one period ahead forecast is
$$
\begin{aligned}
X_{T+1,T} = E(X_{T+1} \mid I_T) &= \alpha + E(\phi_1 X_T \mid I_T) + E(\phi_2 X_{T-1} \mid I_T) + E(Z_{T+1} \mid I_T) \\
&= \alpha + \phi_1 X_T + \phi_2 X_{T-1}
\end{aligned}
$$
The two periods ahead forecast (by the recursive forecasting method) is
$$
\begin{aligned}
X_{T+2,T} = E(X_{T+2} \mid I_T) &= \alpha + E(\phi_1 X_{T+1} \mid I_T) + E(\phi_2 X_T \mid I_T) + E(Z_{T+2} \mid I_T) \\
&= \alpha + \phi_1 X_{T+1,T} + \phi_2 X_T \\
&= \alpha + \phi_1(\alpha + \phi_1 X_T + \phi_2 X_{T-1}) + \phi_2 X_T \\
&= (1 + \phi_1)\alpha + (\phi_1^2 + \phi_2) X_T + \phi_1 \phi_2 X_{T-1}
\end{aligned}
$$
Alternatively, the two periods ahead forecast (by the back-substitution method) can
be derived as:
$$
\begin{aligned}
X_{T+2} &= \alpha + \phi_1 X_{T+1} + \phi_2 X_T + Z_{T+2} \\
&= \alpha + \phi_1(\alpha + \phi_1 X_T + \phi_2 X_{T-1} + Z_{T+1}) + \phi_2 X_T + Z_{T+2} \\
&= (1 + \phi_1)\alpha + (\phi_1^2 + \phi_2) X_T + \phi_1 \phi_2 X_{T-1} + \phi_1 Z_{T+1} + Z_{T+2}
\end{aligned}
$$
Therefore
$$X_{T+2,T} = E(X_{T+2} \mid I_T) = (1 + \phi_1)\alpha + (\phi_1^2 + \phi_2) X_T + \phi_1 \phi_2 X_{T-1}$$
The forecast error is
$$e_{T+2,T} = X_{T+2} - X_{T+2,T} = \phi_1 Z_{T+1} + Z_{T+2}$$
The forecast error variance is
$$\mathrm{Var}(e_{T+2,T}) = \mathrm{Var}(\phi_1 Z_{T+1} + Z_{T+2}) = (1 + \phi_1^2)\sigma^2$$
What are the forecasts for more future periods?
Please practice before the exam.
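As one way to practice, the recursion extends directly to further periods by replacing future values with their own forecasts. Here is a sketch of my own (parameter values are purely illustrative):
```python
def ar2_forecasts(x_T, x_Tm1, alpha, phi1, phi2, k_max):
    """Recursive forecasts for X_t = alpha + phi1*X_{t-1} + phi2*X_{t-2} + Z_t."""
    forecasts = []
    prev1, prev2 = x_T, x_Tm1              # forecasts (or observations) at lags 1 and 2
    for _ in range(k_max):
        nxt = alpha + phi1 * prev1 + phi2 * prev2
        forecasts.append(nxt)
        prev2, prev1 = prev1, nxt
    return forecasts

# k = 2 reproduces (1 + phi1)*alpha + (phi1**2 + phi2)*X_T + phi1*phi2*X_{T-1}.
print(ar2_forecasts(x_T=1.0, x_Tm1=0.5, alpha=0.2, phi1=0.6, phi2=0.3, k_max=2))
```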
6. ARMA Process
Consider an ARMA(1,1) model,
$$X_t = \phi X_{t-1} + Z_t + \theta Z_{t-1}, \qquad \text{where } Z_t \sim WN(0, \sigma^2)$$
Again, we can use the back-substitution method to compute point forecasts.
$$X_{T+1} = \phi X_T + Z_{T+1} + \theta Z_T$$
$$
\begin{aligned}
X_{T+2} &= \phi X_{T+1} + Z_{T+2} + \theta Z_{T+1} \\
&= \phi(\phi X_T + Z_{T+1} + \theta Z_T) + Z_{T+2} + \theta Z_{T+1}
\end{aligned}
$$
One-step ahead forecast:
$$
\begin{aligned}
X_{T+1,T} = E(X_{T+1} \mid I_T) &= E(\phi X_T \mid I_T) + E(Z_{T+1} \mid I_T) + E(\theta Z_T \mid I_T) \\
&= \phi X_T + \theta Z_T
\end{aligned}
$$
Two-step ahead forecast:
$$
\begin{aligned}
X_{T+2,T} = E(X_{T+2} \mid I_T) &= E(\phi X_{T+1} \mid I_T) + E(Z_{T+2} \mid I_T) + E(\theta Z_{T+1} \mid I_T) \\
&= \phi E(X_{T+1} \mid I_T) = \phi(\phi X_T + \theta Z_T)
\end{aligned}
$$
The two-step ahead forecast error is
$$e_{T+2,T} = X_{T+2} - X_{T+2,T} = \phi Z_{T+1} + Z_{T+2} + \theta Z_{T+1} = (\phi + \theta) Z_{T+1} + Z_{T+2}$$
Again, since the forecast is unbiased, the minimum mean square error is equal to the forecast error variance:
$$E(X_{T+2} - X_{T+2,T})^2 = \mathrm{Var}(X_{T+2} - X_{T+2,T}) = \mathrm{Var}\big[(\phi + \theta) Z_{T+1} + Z_{T+2}\big] = \big[(\phi + \theta)^2 + 1\big]\sigma^2$$
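These ARMA(1,1) results can be checked with a small helper (a sketch of my own; $\phi$, $\theta$, $\sigma^2$, $X_T$, and $Z_T$ are assumed known, and only $k = 1, 2$ are implemented):
```python
def arma11_forecast(x_T, z_T, phi, theta, sigma2, k):
    """Return (point forecast, forecast error variance) for X_{T+k} in an ARMA(1,1)."""
    one_step = phi * x_T + theta * z_T
    if k == 1:
        return one_step, sigma2
    if k == 2:
        return phi * one_step, ((phi + theta)**2 + 1.0) * sigma2
    raise NotImplementedError("only k = 1, 2 are derived in this section")

print(arma11_forecast(x_T=1.5, z_T=0.3, phi=0.7, theta=0.4, sigma2=1.0, k=2))
```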
7. ARIMA Process
Suppose $X_t$ follows the ARIMA(1,1,0) model. That means its first difference
$$Y_t = \nabla X_t = X_t - X_{t-1}$$
follows the ARMA(1,0) model, which is simply the AR(1) model:
$$Y_t = \phi Y_{t-1} + Z_t, \qquad \text{where we assume } Z_t \sim WN(0, \sigma_Z^2)$$
Note: we use $\sigma_Z^2$ instead of $\sigma^2$ here so that you can get familiar with different notations.
Substituting $Y_t = \nabla X_t = X_t - X_{t-1}$ and $Y_{t-1} = \nabla X_{t-1} = X_{t-1} - X_{t-2}$, we can write the original ARIMA(1,1,0) model as
$$X_t - X_{t-1} = \phi(X_{t-1} - X_{t-2}) + Z_t$$
That is,
$$X_t = (1 + \phi) X_{t-1} - \phi X_{t-2} + Z_t$$
One-step ahead forecast:
$$X_{T+1,T} = E[X_{T+1} \mid I_T] = E\big[(1 + \phi) X_T - \phi X_{T-1} + Z_{T+1} \mid I_T\big] = (1 + \phi) X_T - \phi X_{T-1}$$
Thus the forecast error is
$$e_{T+1,T} = X_{T+1} - X_{T+1,T} = (1 + \phi) X_T - \phi X_{T-1} + Z_{T+1} - \big[(1 + \phi) X_T - \phi X_{T-1}\big] = Z_{T+1}$$
and the variance of the forecast error is
$$\mathrm{Var}(e_{T+1,T}) = \mathrm{Var}(Z_{T+1}) = \sigma_Z^2$$
Two-step ahead forecast:
$$
\begin{aligned}
X_{T+2} &= (1 + \phi) X_{T+1} - \phi X_T + Z_{T+2} \\
&= (1 + \phi)\big[(1 + \phi) X_T - \phi X_{T-1} + Z_{T+1}\big] - \phi X_T + Z_{T+2} \\
&= (1 + \phi + \phi^2) X_T - (\phi + \phi^2) X_{T-1} + Z_{T+2} + (1 + \phi) Z_{T+1}
\end{aligned}
$$
Therefore
$$
\begin{aligned}
X_{T+2,T} = E[X_{T+2} \mid I_T] &= E\big[(1 + \phi + \phi^2) X_T - (\phi + \phi^2) X_{T-1} + Z_{T+2} + (1 + \phi) Z_{T+1} \mid I_T\big] \\
&= (1 + \phi + \phi^2) X_T - (\phi + \phi^2) X_{T-1}
\end{aligned}
$$
Thus the forecast error is
$$e_{T+2,T} = X_{T+2} - X_{T+2,T} = Z_{T+2} + (1 + \phi) Z_{T+1}$$
and the variance of the forecast error is
$$\mathrm{Var}(e_{T+2,T}) = \big[1 + (1 + \phi)^2\big]\sigma_Z^2 = (2 + 2\phi + \phi^2)\sigma_Z^2$$
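A minimal sketch of forecasting the ARIMA(1,1,0) in levels (my own helper; $\phi$ and the last two observed levels $X_T$, $X_{T-1}$ are assumed known):
```python
def arima110_forecasts(x_T, x_Tm1, phi, k_max):
    """Recursive forecasts using X_t = (1 + phi)*X_{t-1} - phi*X_{t-2} + Z_t."""
    forecasts = []
    prev1, prev2 = x_T, x_Tm1
    for _ in range(k_max):
        nxt = (1.0 + phi) * prev1 - phi * prev2
        forecasts.append(nxt)
        prev2, prev1 = prev1, nxt
    return forecasts

# k = 2 reproduces (1 + phi + phi**2)*X_T - (phi + phi**2)*X_{T-1}.
print(arima110_forecasts(x_T=10.0, x_Tm1=9.0, phi=0.5, k_max=2))  # [10.5, 10.75]
```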
8. Prediction (Forecast) Error, and Prediction Interval
Recall the k-step ahead prediction error is defined as
$$e_{T+k,T} = X_{T+k} - E(X_{T+k} \mid I_T)$$
By the law of total variance,
$$\mathrm{Var}(Y) = E[\mathrm{Var}(Y \mid X)] + \mathrm{Var}[E(Y \mid X)]$$
we can derive the variance of the prediction error:
$$\mathrm{Var}(e_{T+k,T}) = \mathrm{Var}(X_{T+k} \mid I_T)$$
Based on the predictor and its error variance, we may construct prediction intervals. For example, if the $X$'s are normally distributed, we can consider the 95% prediction interval
$$E(X_{T+k} \mid I_T) \pm z_{0.025}\sqrt{\mathrm{Var}(X_{T+k} \mid I_T)}$$
where $z_{0.025} \approx 1.96$ is the upper 2.5% quantile of the standard normal distribution.
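A short sketch of constructing such an interval in code (my own helper; it simply combines a point forecast with its error variance under the normality assumption above):
```python
from scipy.stats import norm

def prediction_interval(point_forecast, error_variance, level=0.95):
    """Return (lower, upper) prediction interval: forecast +/- z * standard error."""
    z = norm.ppf(1.0 - (1.0 - level) / 2.0)      # ~1.96 for a 95% interval
    half_width = z * error_variance ** 0.5
    return point_forecast - half_width, point_forecast + half_width

# Example: the two-step ARIMA(1,1,0) forecast 10.75 from the sketch above,
# with error variance (2 + 2*0.5 + 0.25) * sigma_Z^2 = 3.25 when sigma_Z^2 = 1.
print(prediction_interval(point_forecast=10.75, error_variance=3.25))
```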