Chapter 3: Autoregressive and moving average processes
Yining Chen∗
26 January 2015†
1 Two operators
Definition. We define the backshift operator by BX_t = X_{t−1} and extend it to powers, i.e. B²X_t = B(BX_t) = BX_{t−1} = X_{t−2} and so on. Thus, B^k X_t = X_{t−k}.
Definition. Differences of order d are defined as ∇^d = (1 − B)^d, where we may expand the operator (1 − B)^d algebraically to evaluate it for higher integer values of d. When d = 1, we usually drop d from the notation.
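As a quick numerical sanity check (a NumPy sketch, not part of the notes), the algebraic expansion of (1 − B)^d agrees with repeated differencing:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10).cumsum()  # an arbitrary series

# First difference: (1 - B) X_t = X_t - X_{t-1}
d1 = x[1:] - x[:-1]

# Second difference: expand (1 - B)^2 = 1 - 2B + B^2,
# i.e. X_t - 2 X_{t-1} + X_{t-2}
d2 = x[2:] - 2 * x[1:-1] + x[:-2]

# Both agree with NumPy's built-in repeated differencing
print(np.allclose(d1, np.diff(x)), np.allclose(d2, np.diff(x, n=2)))
```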
2 Moving average models
Definition. The moving average model of order q, or MA(q), is defined to be

X_t = ε_t + θ_1 ε_{t−1} + θ_2 ε_{t−2} + ⋯ + θ_q ε_{t−q},

where ε_t ~ i.i.d. N(0, σ²).
Remarks:
1. Without loss of generality, we assume the mean of the process to be zero.
2. Here θ_1, …, θ_q (θ_q ≠ 0) are the parameters of the model.
3. Sometimes it suffices to assume that ε_t ~ WN(0, σ²). Here we assume normality mainly to simplify our discussion.
4. By defining the moving average operator as

Θ(B) = 1 + θ_1 B + θ_2 B² + ⋯ + θ_q B^q,

we may also write the MA(q) process in the equivalent form X_t = Θ(B)ε_t. Θ(z) is also known as the MA polynomial for z ∈ ℂ.
Proposition. Let {X_t} follow the MA(q) model. Then
1. E X_t = 0,
2. var X_t = (1 + θ_1² + ⋯ + θ_q²) σ²,
3. ρ_k = 1 for k = 0; ρ_k = (Σ_{i=0}^{q−k} θ_i θ_{i+k}) / (1 + θ_1² + ⋯ + θ_q²) for 1 ≤ k ≤ q, where we denote θ_0 = 1 for notational convenience; and ρ_k = 0 for k > q.

∗ Please send any comments and corrections to [email protected]. † Updated on 28 Jan.
Remarks:
1. For an MA(q) model, its ACF vanishes after lag q.
2. It is both weakly and strongly stationary.
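The cut-off in the ACF after lag q is easy to see in simulation. The following NumPy sketch (not part of the notes; the helper name is mine) simulates an MA(2) process and compares the sample ACF against the formula from the proposition:

```python
import numpy as np

rng = np.random.default_rng(1)
thetas, sigma, q = [0.6, 0.3], 1.0, 2
n = 200_000
eps = rng.normal(0.0, sigma, size=n + q)

# X_t = eps_t + theta_1 eps_{t-1} + theta_2 eps_{t-2}
x = eps[q:].copy()
for i, th in enumerate(thetas, start=1):
    x += th * eps[q - i:n + q - i]

def sample_acf(x, k):
    """Sample autocorrelation at lag k (a hypothetical helper)."""
    xc = x - x.mean()
    return (xc[k:] * xc[:-k]).mean() / (xc * xc).mean() if k else 1.0

# Theoretical rho_1 = (theta_0 theta_1 + theta_1 theta_2) / (1 + theta_1^2 + theta_2^2)
denom = 1 + sum(th ** 2 for th in thetas)
rho1 = (thetas[0] + thetas[0] * thetas[1]) / denom

print(round(sample_acf(x, 1), 2), round(rho1, 2))  # these should roughly agree
print(round(sample_acf(x, 5), 2))                  # close to 0: ACF vanishes after lag q
```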
Now consider an MA(1) model, X_t = ε_t + θε_{t−1}. We have shown previously that

γ_k = (1 + θ²)σ², k = 0;   γ_k = θσ², k = 1;   γ_k = 0, k > 1,

and

ρ_k = 1, k = 0;   ρ_k = θ/(1 + θ²), k = 1;   ρ_k = 0, k > 1.
Observe that |ρ_1| ≤ 1/2 and ρ_1 is the same for θ as for 1/θ. Moreover, the pair (θ, σ²) and (1/θ, σ²θ²) yield the same autocovariance function γ_k. For example, the following two models

X_t = ε_t + (1/5)ε_{t−1},   ε_t ~ i.i.d. N(0, 25),
X_t = η_t + 5η_{t−1},       η_t ~ i.i.d. N(0, 1),

are the same because of normality.
Let’s express {ε_t} backwards in terms of the data for both models. For the pair (θ, σ²), we get

ε_t = X_t − θε_{t−1} = X_t − θ(X_{t−1} − θε_{t−2}) = ⋯
    = X_t − θX_{t−1} + θ²X_{t−2} − θ³X_{t−3} + ⋯ ,        (1)

while for the pair (1/θ, σ²θ²), we get

ε_t = X_t − θ^{−1}X_{t−1} + θ^{−2}X_{t−2} − θ^{−3}X_{t−3} + ⋯ .
Note that if |θ| < 1, the first expression converges¹, while the second diverges. When we want to interpret or estimate the noises ε_t, it is more desirable to deal with a convergent expression, so in the previous MA(1) example, we prefer θ = 1/5 to θ = 5.
In the first case, if |θ| < 1, we can also derive (1) via the moving average operator, i.e.

ε_t = Θ^{−1}(B)X_t = (1 + θB)^{−1}X_t = (1 − θB + θ²B² − ⋯)X_t
    = X_t − θX_{t−1} + θ²X_{t−2} − θ³X_{t−3} + ⋯ .
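A numerical check of this expansion (a NumPy sketch, not from the notes): with θ = 1/5, a truncation of the series recovers ε_t essentially exactly, since the terms decay geometrically.

```python
import numpy as np

rng = np.random.default_rng(2)
theta, sigma = 0.2, 5.0  # the invertible parameterisation (1/5, sigma^2 = 25)
n = 1_000
eps = rng.normal(0.0, sigma, size=n + 1)
x = eps[1:] + theta * eps[:-1]  # X_t = eps_t + theta * eps_{t-1}

# Truncated version of the expansion: eps_t ~ sum_{j=0}^{J} (-theta)^j X_{t-j}
t, J = n - 1, 30
eps_hat = sum((-theta) ** j * x[t - j] for j in range(J + 1))

# The truncation error is of order theta^(J+1), i.e. negligible here
print(abs(eps_hat - eps[t + 1]) < 1e-6)
```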
In general, if the contemporaneous error term of an MA(q) process can be expressed as a linear combination of its past observations {X_s, s ≤ t}, then the process is said to be invertible. More formally,
Definition. An MA(q) (or more generally, ARMA(p, q)) process is said to be invertible if the time series {X_t} can be written as

Π(B)X_t = Σ_{j=0}^∞ π_j X_{t−j} = ε_t,        (2)

where Π(B) = Σ_{j=0}^∞ π_j B^j and Σ_{j=0}^∞ |π_j| < ∞. Here we set π_0 = 1.
¹ This convergence should be interpreted in the sense of a mean-square limit. A rigorous treatment of such issues represents all the variables as elements of a suitable Hilbert space, with norm given by ‖X‖² = var(X), and proves that Z_j = X_t + Σ_{i=1}^{j} (−θ)^i X_{t−i} is a Cauchy sequence, and so converges. See also the example sheet.
For the following theorem, the proof of its “if” part is given in the lecture. See the example
sheet for a proof of its “only if” part.
Theorem. An MA(q) process is invertible if and only if the roots of the equation Θ(z) = 0 all lie
outside the unit circle.
Remarks:
1. In the case q = 1, the equation Θ(z) = 0 becomes 1 + θz = 0. Requiring the root to be outside the unit circle is equivalent to asking for |−θ^{−1}| > 1, which is equivalent to |θ| < 1.
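The root condition is easy to verify numerically. A sketch (the function name is mine) using NumPy's polynomial root finder, applied to the two MA(1) parameterisations from the example above:

```python
import numpy as np

def is_invertible(thetas):
    """Check whether all roots of Theta(z) = 1 + theta_1 z + ... + theta_q z^q
    lie strictly outside the unit circle (a hypothetical helper)."""
    # np.roots expects coefficients ordered from highest degree to lowest
    roots = np.roots([*thetas[::-1], 1.0])
    return bool(np.all(np.abs(roots) > 1.0))

print(is_invertible([0.2]))  # theta = 1/5: root at z = -5, so invertible
print(is_invertible([5.0]))  # theta = 5:   root at z = -1/5, not invertible
```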
2. The coefficients π_j of Π given in (2) can be determined by solving

Π(z) = Σ_{j=0}^∞ π_j z^j = 1/Θ(z),   |z| ≤ 1.

3 Autoregressive models
Definition. The autoregressive model of order p, or AR(p), is of the form

X_t = ε_t + φ_1 X_{t−1} + φ_2 X_{t−2} + ⋯ + φ_p X_{t−p},        (3)

where ε_t ~ i.i.d. N(0, σ²), and where {X_t} is stationary.
Remarks:
1. (3) implies that the mean of X_t, µ, is zero. If we want to model a series with non-zero mean by AR, we can replace X_t by X_t − µ in the previous definition, so

X_t − µ = ε_t + φ_1(X_{t−1} − µ) + φ_2(X_{t−2} − µ) + ⋯ + φ_p(X_{t−p} − µ),

or equivalently, we can write

X_t = c + ε_t + φ_1 X_{t−1} + φ_2 X_{t−2} + ⋯ + φ_p X_{t−p},

where c = µ(1 − φ_1 − ⋯ − φ_p).
2. Here φ_1, …, φ_p (φ_p ≠ 0) are the parameters of the model.
3. By defining the autoregressive operator as

Φ(B) = 1 − φ_1 B − φ_2 B² − ⋯ − φ_p B^p,

we may write (3) as Φ(B)X_t = ε_t. Φ(z) is also known as the AR polynomial for z ∈ ℂ.
4. The existence of such a process is guaranteed. The basic idea should be clear when we study AR(1).
Now consider the AR(1) model with X_t = φX_{t−1} + ε_t and ε_t ~ i.i.d. N(0, σ²). Iterating backwards k times, we get

X_t = φ^k X_{t−k} + Σ_{j=0}^{k−1} φ^j ε_{t−j}.

Provided that |φ| < 1 and X_t is stationary, we can represent² AR(1) as

X_t = Σ_{j=0}^∞ φ^j ε_{t−j}.        (4)

² Again in the mean-square sense; see the previous footnote.
The mean of AR(1) is zero. The autocovariance function is γk = σ 2 φk /(1 − φ2 ) for k ≥ 0, and the
autocorrelation is ρk = φk . Therefore, AR(1) is weakly stationary by construction via (4).
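The geometric decay ρ_k = φ^k shows up clearly in simulation. A NumPy sketch (not part of the notes; the helper name is mine):

```python
import numpy as np

rng = np.random.default_rng(3)
phi, sigma, n = 0.7, 1.0, 200_000
eps = rng.normal(0.0, sigma, size=n)

# Build AR(1) recursively: X_t = phi * X_{t-1} + eps_t
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]
x = x[1000:]  # drop a burn-in so the start-up transient is negligible

def sample_acf(x, k):
    """Sample autocorrelation at lag k (a hypothetical helper)."""
    xc = x - x.mean()
    return (xc[k:] * xc[:-k]).mean() / (xc * xc).mean()

for k in (1, 2, 3):
    # sample ACF vs. the theoretical phi^k; they should roughly agree
    print(k, round(sample_acf(x, k), 2), round(phi ** k, 2))
```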
Note that for AR(1) with |φ| > 1, the process is explosive, because the values of the series can quickly become large in magnitude. A stationary solution still exists by the following construction:

X_t = φ^{−1}X_{t+1} − φ^{−1}ε_{t+1} = φ^{−1}(φ^{−1}X_{t+2} − φ^{−1}ε_{t+2}) − φ^{−1}ε_{t+1}
    = φ^{−k}X_{t+k} − Σ_{j=1}^{k} φ^{−j}ε_{t+j}
    = −Σ_{j=1}^∞ φ^{−j}ε_{t+j}.

However, this solution is future-dependent.
Definition. An AR(p) (or more generally, ARMA(p, q)) process is said to be causal if the time series {X_t} can be written as

X_t = Σ_{j=0}^∞ ψ_j ε_{t−j} = Ψ(B)ε_t,        (5)

where Ψ(B) = Σ_{j=0}^∞ ψ_j B^j and Σ_{j=0}^∞ |ψ_j| < ∞. Here we set ψ_0 = 1.
The following theorem should be compared to its MA analogue stated in the previous section.
It can be proved using exactly the same techniques shown in the lecture and the example sheet.
Theorem. An AR(p) process is causal if and only if the roots of the equation Φ(z) = 0 all lie
outside the unit circle.
4 Autoregressive moving average models
Definition. The autoregressive moving average model of orders p and q, or ARMA(p, q), is of the form

X_t = φ_1 X_{t−1} + ⋯ + φ_p X_{t−p} + ε_t + θ_1 ε_{t−1} + ⋯ + θ_q ε_{t−q},        (6)

where ε_t ~ i.i.d. N(0, σ²), and where {X_t} is stationary.
Remarks:
1. Using the AR operator and the MA operator, the ARMA model in (6) can be written in concise form as Φ(B)X_t = Θ(B)ε_t.
2. The process reduces to AR(p) if q = 0, or to MA(q) if p = 0.
3. The usefulness of ARMA models lies in their parsimonious representation.
The problem of parameter redundancy (or over-parameterisation) can arise in ARMA models. Consider a white noise process X_t = ε_t. It also satisfies

X_t = 0.5X_{t−1} + ε_t − 0.5ε_{t−1},

which looks like an ARMA(1, 1) model. When writing the model in operator form, we have

(1 − 0.5B)X_t = (1 − 0.5B)ε_t.
We can easily detect this problem by observing that the AR polynomial Φ(z) = 1 − 0.5z and the
MA polynomial Θ(z) = 1 − 0.5z have a common factor 1 − 0.5z. Discarding the common factor
in each leaves Φ(z) = Θ(z) = 1, and we deduce that the model is actually white noise.
More generally, the problem of parameter redundancy arises only if there are common roots
in Φ(z) = 0 and Θ(z) = 0. If so, this issue can be resolved by discarding the common factors in
Φ(z) and Θ(z).
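Detecting a common root numerically is straightforward. A sketch (variable names are mine) applied to the redundant "ARMA(1, 1)" above:

```python
import numpy as np

# Phi(z) = 1 - 0.5 z and Theta(z) = 1 - 0.5 z, constant term first
phi_poly = [1.0, -0.5]
theta_poly = [1.0, -0.5]

# np.roots wants highest-degree coefficients first, hence the reversal
ar_roots = np.roots(phi_poly[::-1])
ma_roots = np.roots(theta_poly[::-1])
print(ar_roots, ma_roots)  # both polynomials have the single root z = 2

# A shared root signals parameter redundancy: cancel the common factor
common = [r for r in ar_roots if np.any(np.isclose(r, ma_roots))]
print(len(common) > 0)
```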
Theorem. An ARMA(p, q) process is causal and invertible if there are no common factors in Φ(z) and Θ(z), and the roots of Φ(z) = 0 and Θ(z) = 0 all lie outside the unit circle.
Remark: ε_t = Π(B)X_t is called the invertible representation of ARMA, where the coefficients π_j of Π given in (2) can be determined by solving

Π(z) = Σ_{j=0}^∞ π_j z^j = Φ(z)/Θ(z),   |z| ≤ 1.
On the other hand, X_t = Ψ(B)ε_t is called the causal representation of ARMA, where the coefficients ψ_j of Ψ given in (5) can be determined by solving

Ψ(z) = Σ_{j=0}^∞ ψ_j z^j = Θ(z)/Φ(z),   |z| ≤ 1.
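Matching coefficients of z^j in Ψ(z)Φ(z) = Θ(z) gives the recursion ψ_j = θ_j + Σ_{i=1}^{min(j,p)} φ_i ψ_{j−i}, which is easy to implement. A sketch (the function name is mine), checked against the known ARMA(1, 1) closed form ψ_j = φ^{j−1}(φ + θ) for j ≥ 1:

```python
import numpy as np

def psi_weights(phis, thetas, n):
    """First n coefficients psi_j of Theta(z)/Phi(z), via the recursion
    psi_j = theta_j + sum_i phi_i psi_{j-i}, under the sign convention
    Phi(z) = 1 - phi_1 z - ... (a hypothetical helper)."""
    psi = [1.0]
    for j in range(1, n):
        th = thetas[j - 1] if j <= len(thetas) else 0.0
        psi.append(th + sum(phis[i - 1] * psi[j - i]
                            for i in range(1, min(j, len(phis)) + 1)))
    return np.array(psi)

phi, theta = 0.5, 0.4
psi = psi_weights([phi], [theta], 6)
print(np.round(psi, 4))  # geometric decay at rate phi after the first step
```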
5 Autoregressive integrated moving average models
Often we pre-process a time series before analysis (e.g. by differencing). It is therefore natural to consider the following generalisation of ARMA:
Definition. {Xt } is said to be an ARIMA(p, d, q) process if ∇d Xt is an ARMA(p, q) process.
Remarks:
1. In the operator form, Φ(B)(1 − B)d Xt = Θ(B)t .
2. Typically, we pick d to be a small integer (≤ 2).
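The simplest example is a random walk, which is ARIMA(0, 1, 0): differencing once recovers white noise. A NumPy sketch (not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(4)
eps = rng.normal(size=50_000)
x = np.cumsum(eps)  # random walk: (1 - B) X_t = eps_t, i.e. ARIMA(0, 1, 0)

dx = np.diff(x)     # apply the difference operator once (d = 1)
print(np.allclose(dx, eps[1:]))  # differencing exactly recovers the noise here

# The lag-1 sample autocorrelation of the differenced series is near zero,
# consistent with dx being white noise
dxc = dx - dx.mean()
print(round((dxc[1:] * dxc[:-1]).mean() / (dxc * dxc).mean(), 2))
```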