A. Introduction

The purpose of this paper is to establish a notation and derive certain distributional results needed in subsequent work. The topic of this research is statistical time series analysis in the time domain, especially autoregressive-moving average (ARMA) models. The particular goal is a structured Bayesian approach to ARMA time series models. In this paper, a portion of the entire work, a notation is established for the analysis of these models. Then the results of the classical statistical approach, maximum likelihood, are obtained. The structure of the prior distributions to be used in the Bayesian analysis is next described, followed by some derivations of the necessary conditional and/or posterior distributions. The proofs of these derivations are given in Appendix G.
B. Notation and Definitions

Consider the autoregressive-moving average (ARMA) process $\{Z_t\}$ of order $(p,q)$ described by (B.1):

(B.1)   $Z_t - \mu = \sum_{i=1}^{p} \phi_i (Z_{t-i} - \mu) + e_t - \sum_{j=1}^{q} \theta_j e_{t-j}$

where the $e_t$'s are iid normal random variables each with mean zero and variance $\sigma_e^2$. A finite segment of this process is observed: $z = (z_1, \ldots, z_N)^T$, which has an $N$-dimensional multivariate normal distribution with mean vector $\mu 1_N$ and covariance matrix $\sigma_e^2 A_N$. The matrix $A_N$ is described by (B.2),

(B.2)   $(A_N)_{st} = \sigma(s-t)/\sigma_e^2$,

where the covariance function $\sigma(\cdot)$ characterizes this time series process.
By taking variances of both sides of (B.1), the covariance function $\sigma$ can be determined from

$\sum_{i=0}^{p} \sum_{j=0}^{p} \phi_i \phi_j\, \sigma(s+i-j) \;=\; \sigma_e^2 \sum_{j=0}^{q} \theta_j \theta_{j+s}$   for any integer $s \ge 0$,

where $\phi_0 = \theta_0 = -1$ and $\phi_i = \theta_j = 0$ for $i > p$ or $j > q$. (See Anderson (1971, p. 237); McLeod (1975) gives an algorithm for computing $\sigma$.)
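McLeod (1975) is the exact reference method; as a rough numerical stand-in, the sketch below (a hypothetical helper, not from the paper) approximates $\sigma(s)$ by truncating the moving-average representation $Z_t - \mu = \sum_{j \ge 0} c_j e_{t-j}$ implied by (B.1), assuming $\{Z_t\}$ is stationary so the weights $c_j$ decay:

```python
import numpy as np

def arma_acovf(phi, theta, sigma2_e=1.0, max_lag=10, trunc=500):
    """Approximate sigma(s) for a stationary ARMA(p,q) process by
    truncating the MA(infinity) form Z_t - mu = sum_j c_j e_{t-j}."""
    p, q = len(phi), len(theta)
    c = np.zeros(trunc)
    c[0] = 1.0
    for j in range(1, trunc):
        # c_j = sum_i phi_i c_{j-i} - theta_j  (theta_j = 0 for j > q)
        c[j] = sum(phi[i - 1] * c[j - i] for i in range(1, min(j, p) + 1))
        if j <= q:
            c[j] -= theta[j - 1]
    # sigma(s) = sigma_e^2 * sum_j c_j c_{j+s}
    return np.array([sigma2_e * c[: trunc - s] @ c[s:] for s in range(max_lag + 1)])

# Example: ARMA(1,1) with phi_1 = 0.5, theta_1 = 0.3
print(arma_acovf([0.5], [0.3], max_lag=5))
```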
Since forecasting is of primary interest, the distribution of the future $n$ observations, that is, $z_F = (z_{N+1}, \ldots, z_{N+n})^T$, conditional on the observed $z$, has a multivariate normal distribution with mean $\mu 1_n + A_{21} A_N^{-1}(z - \mu 1_N)$ and covariance matrix $\sigma_e^2 A_{nN}^*$, where the covariance matrix of $(z^T, z_F^T)^T$ is partitioned as

$A_{N+n} = \begin{pmatrix} A_N & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \begin{matrix} \}\, N \\ \}\, n \end{matrix}$

and $A_{nN}^* = A_{22} - A_{21} A_N^{-1} A_{12}$. In the notation to be used here, this is written

(B.4)   $(z_F \mid z, \mu, \sigma_e^2, \psi) \sim N(\mu 1_n + A_{21} A_N^{-1}(z - \mu 1_N),\; \sigma_e^2 A_{nN}^*)$.

Notice that $A_{N+n}$ is a function of $\psi = (\phi_1, \ldots, \phi_p, \theta_1, \ldots, \theta_q)^T$.
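A minimal sketch of (B.4), reusing the arma_acovf helper above: build the $(N+n) \times (N+n)$ covariance matrix from $\sigma(\cdot)$, partition it, and condition on $z$ (names are illustrative):

```python
import numpy as np

def arma_forecast_moments(z, mu, phi, theta, sigma2_e, n):
    """Mean and covariance of (z_F | z) as in (B.4)."""
    N = len(z)
    gam = arma_acovf(phi, theta, sigma2_e, max_lag=N + n - 1)
    idx = np.arange(N + n)
    C = gam[np.abs(np.subtract.outer(idx, idx))]   # full covariance of (z, z_F)
    C11, C12, C21, C22 = C[:N, :N], C[:N, N:], C[N:, :N], C[N:, N:]
    K = C21 @ np.linalg.inv(C11)          # A_21 A_N^{-1} in the paper's notation
    mean = mu + K @ (z - mu)              # mu 1_n + A_21 A_N^{-1}(z - mu 1_N)
    cov = C22 - K @ C12                   # sigma_e^2 A*_nN
    return mean, cov
```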
C. Maximum Likelihood

The analysis for applying the maximum likelihood approach is rather straightforward when the model $M = (p,q)$ is assumed to be known. The density of the observations is given by

(C.1)   $p(z \mid \mu, \sigma_e^2, \psi) = (2\pi\sigma_e^2)^{-N/2}\, |A_N|^{-1/2} \exp\{-\frac{1}{2\sigma_e^2}(z - \mu 1_N)^T A_N^{-1}(z - \mu 1_N)\}$

By taking the natural logarithm of both sides above, the log-likelihood function is written

(C.2)   $\ell(\mu, \sigma_e^2, \psi) = C_1 - \frac{N}{2}\ln\sigma_e^2 - \frac{1}{2}\ln|A_N| - \frac{1}{2\sigma_e^2}(z - \mu 1_N)^T A_N^{-1}(z - \mu 1_N)$
where $C_1$ is a constant (as are later $C_i$'s). First, by setting $\partial \ell(\mu, \sigma_e^2, \psi)/\partial \mu$ to zero, the maximum likelihood estimator (MLE) for $\mu$ can be found to be

(C.3)   $\hat{\mu} = \dfrac{1_N^T A_N^{-1} z}{1_N^T A_N^{-1} 1_N}$.

The MLE of $\sigma_e^2$ can then be found by solving $\partial \ell(\hat{\mu}, \sigma_e^2, \psi)/\partial \sigma_e^2 = 0$, which yields

(C.4)   $\hat{\sigma}_e^2 = Q(\psi)/N$,   where $Q(\psi) = (z - \hat{\mu} 1_N)^T A_N^{-1}(z - \hat{\mu} 1_N)$.

The concentrated log-likelihood function can now be written

(C.5)   $\ell^*(\psi) = C_2 - \frac{N}{2} \ln Q(\psi) - \frac{1}{2} \ln|A_N|$.
The MLE of the remaining parameters, $\psi$, is that vector which maximizes $\ell^*(\psi)$, which must be found numerically. Note that for identifiability purposes, the values of the parameter vector $\psi$ must be restricted to that region which insures the process $\{Z_t\}$ to be stationary and invertible.
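The numerical maximization of (C.5) can be sketched as follows (again reusing arma_acovf, with $\sigma_e^2 = 1$ so that the resulting matrix is $A_N$ itself; enforcing the stationarity and invertibility restriction is left out and would need a penalty or reparametrization):

```python
import numpy as np
from scipy.optimize import minimize

def neg_concentrated_loglik(psi, z, p, q):
    """-l*(psi) from (C.5), up to the constant C_2."""
    N = len(z)
    phi, theta = psi[:p], psi[p:]
    gam = arma_acovf(phi, theta, 1.0, max_lag=N - 1)
    idx = np.arange(N)
    A = gam[np.abs(np.subtract.outer(idx, idx))]       # A_N at this psi
    Ainv1 = np.linalg.solve(A, np.ones(N))
    mu_hat = Ainv1 @ z / Ainv1.sum()                   # (C.3)
    resid = z - mu_hat
    Q = resid @ np.linalg.solve(A, resid)              # Q(psi); sigma2_hat = Q/N by (C.4)
    _, logdet = np.linalg.slogdet(A)
    return 0.5 * N * np.log(Q) + 0.5 * logdet

# e.g. for an ARMA(1,1):
# res = minimize(neg_concentrated_loglik, x0=np.zeros(2), args=(z, 1, 1),
#                method="Nelder-Mead")
```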
The maximum likelihood approach to forecasting requires the examination of the statistical properties of maximum likelihood estimates under standard regularity conditions. The great appeal of maximum likelihood methods lies in the performance of MLE's when large samples (i.e., large $N$) are encountered. Subject to mild regularity conditions, MLE's are consistent and asymptotically normally distributed about the true parameter values.
Let the entire parameter vector be written as $\eta = (\mu, \sigma_e^2, \phi_1, \ldots, \phi_p, \theta_1, \ldots, \theta_q)^T$; $\hat{\eta} = (\hat{\mu}, \hat{\sigma}_e^2, \hat{\psi}^T)^T$ denotes the MLE for $\eta$, that is, that vector $\eta$ which maximizes $\ell(\eta)$ from (C.2); and let $\eta^*$ denote the true values of the unknown parameters. The asymptotic statement just mentioned can be written more practically as

(C.6)   $(\hat{\eta}) \sim N(\eta^*, D_N(\eta^*))$   (approx)

where $D_N(\eta^*)$ is the asymptotic covariance matrix of $\hat{\eta}$, based on $N$ observations $z$. Theoretically, $D_N(\eta^*)$ can be obtained from Fisher's Information matrix (Theil (1971, p. 192 ff.) explains the simple vector case), but this is not feasible here. However, a consistent estimate of $D_N(\eta^*)$ can be obtained by taking numerical second partial derivatives of $\ell(\eta)$ at the MLE $\hat{\eta}$.
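That consistent estimate amounts to inverting a numerical Hessian; a sketch with central second differences, where loglik is any callable returning $\ell(\eta)$:

```python
import numpy as np

def neg_hessian_inv(loglik, eta_hat, h=1e-4):
    """Estimate D_N as the inverse of minus the numerical Hessian of
    the log-likelihood at the MLE eta_hat."""
    k = len(eta_hat)
    H = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            ei, ej = np.eye(k)[i] * h, np.eye(k)[j] * h
            H[i, j] = (loglik(eta_hat + ei + ej) - loglik(eta_hat + ei - ej)
                       - loglik(eta_hat - ei + ej) + loglik(eta_hat - ei - ej)) / (4 * h**2)
    return np.linalg.inv(-H)
```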
Since the problem at hand is forecasting, the forecasting vector following the maximum likelihood approach is

(C.7)   $\hat{z}_F = \hat{\mu} 1_n + \hat{A}_{21} \hat{A}_N^{-1}(z - \hat{\mu} 1_N)$

which is found by replacing the unknown parameters in the mean given in (B.4). Since the elements of $\eta$ are unknown, the "covariance matrix" needed to obtain standard errors for $\hat{z}_F$ must be augmented.
An asymptotic distribution result similar to (C.6) can be written

(C.8)   $(z_F - \hat{z}_F) \sim N(0,\; \sigma_e^2 A_{nN}^* + H\, D_N(\eta^*)\, H^T)$   (approx)

where $H = [\partial \hat{z}_F / \partial \eta^T]_{\eta = \eta^*}$. The covariance matrix above in (C.8) can be consistently estimated by evaluating the same expression at $\eta = \hat{\eta}$, with $D_N$ estimated as described above.
D. Bayesian Analysis

For the Bayesian analysis in this paper, the prior distribution takes on a highly structured form, described below, in order that the analysis flows smoothly. It is hoped that the structure is sufficiently flexible to admit the expression of a great variety of opinions.
The prior distribution is expressed in many levels of increasing fineness. At the coarsest level, the prior probability distribution admits probability to the discrete set of models $M$ described by the order of the ARMA process, $(p,q)$. That is, $p(m)$, $m = 1, 2, 3, \ldots$ are the probabilities for which

$p(m) = \Pr(M = m) = \Pr(z \text{ arises from an ARMA } (p_m, q_m) \text{ process})$

where, by the convention chosen here, $m = (q_m + 1) + (p_m + q_m)(p_m + q_m + 1)/2$.1/
A density function (with respect to Lebesgue measure on $R^{p_m + q_m}$), $\pi(\psi \mid M = m)$, describes the prior distribution of the ARMA parameters, $\psi = (\phi_1, \ldots, \phi_p, \theta_1, \ldots, \theta_q)^T$, of the ARMA $(p,q)$ process yielding the observations $z$. The support of $\pi(\psi \mid M = m)$ is restricted to that subset of $R^{p_m + q_m}$ for which the ARMA process described by (B.1) is stationary and invertible.2/
1/ Hence $m = 1$ for $(0,0)$, $m = 2$ for $(1,0)$, $\ldots$, $m = 6$ for $(0,2)$.

2/ This is not a constraint but a manner in which identifiability is established. See Box and Jenkins (1976, pp. 195-198); they use the term "model multiplicity." The identifiability regions (or stationarity and invertibility regions) become complicated for higher order models. For the Bayesian approach, this problem is easily surmountable. A method for integrating over these regions is explained in Monahan (1977).
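The enumeration convention of footnote 1 can be checked in a few lines (an illustrative helper, not part of the paper):

```python
def model_index(p, q):
    """Diagonal enumeration of ARMA orders: m = (q+1) + (p+q)(p+q+1)/2."""
    return (q + 1) + (p + q) * (p + q + 1) // 2

# m = 1 for (0,0), m = 2 for (1,0), ..., m = 6 for (0,2)
assert [model_index(p, q) for p, q in
        [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]] == [1, 2, 3, 4, 5, 6]
```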
The remaining parameters are now $\mu$, the mean of the process, and $\sigma_e^2$, the disturbance variance, which will hereafter (in the Bayesian analysis) be expressed as $r$, the disturbance precision, related by $r = 1/\sigma_e^2$. The prior distribution of $\mu$ given the precision $r$ (and, implied, $M$ and $\psi$) is normal with mean $\gamma$ and precision $\tau r$:

(D.1)   $(\mu \mid r) \sim N(\gamma, (\tau r)^{-1})$

Finally, $r$ has a gamma distribution with shape parameter $\alpha$ and scale parameter $\beta$:3/

(D.2)   $(r) \sim \text{gamma}(\alpha, \beta)$
Notice that the parameters of these last levels of prior distributions, $\gamma, \tau, \alpha, \beta$, may depend on $\psi$ and $M$.
Since the distribution of the observations $z$ has already been stated (in Section B),

(D.3)   $(z \mid \mu, r, \psi, M) \sim N(\mu 1_N, r^{-1} A_N)$,

certain posterior distributions follow. Proofs of the analyses are given in Appendix G. To summarize, understanding that all are for given $\psi$ and $M$,

(D.4a)   $(z \mid r) \sim N(\gamma 1_N,\; r^{-1}(A_N + \tau^{-1} 1_N 1_N^T))$

and

(D.4b)   $(\mu \mid z, r) \sim N\!\left(\dfrac{\gamma\tau + 1_N^T A_N^{-1} z}{\tau + 1_N^T A_N^{-1} 1_N},\; r^{-1}(\tau + 1_N^T A_N^{-1} 1_N)^{-1}\right)$
3/ Definitions of the distributional notation are given in Appendix E.
(D.4c)   $(r \mid z) \sim \text{gamma}\!\left(\alpha + \tfrac{N}{2},\; \beta + \tfrac{1}{2}(z - \gamma 1_N)^T (A_N + \tau^{-1} 1_N 1_N^T)^{-1}(z - \gamma 1_N)\right)$

(D.4d)   $(z \mid \psi, M) \sim t_{N, 2\alpha}\!\left(\gamma 1_N,\; \tfrac{\alpha}{\beta}(A_N + \tau^{-1} 1_N 1_N^T)^{-1}\right)$

(D.4e)   $(z_F \mid z, r) \sim N\!\left(\gamma^* a + b,\; r^{-1}(A_{nN}^* + \tau^{-*} a a^T)\right)$

(D.4f)   $(z_F \mid z) \sim t_{n, 2\alpha+N}\!\left(\gamma^* a + b,\; \left[\tfrac{2\alpha+N}{2\beta^*}\right](A_{nN}^* + \tau^{-*} a a^T)^{-1}\right)$

where the following notation has been used:

$a = 1_n - A_{21} A_N^{-1} 1_N$,   $b = A_{21} A_N^{-1} z$,

$\tau^* = \tau + 1_N^T A_N^{-1} 1_N$,   $\tau^{-*} = (\tau^*)^{-1}$,   $\gamma^* = (\gamma\tau + 1_N^T A_N^{-1} z)/\tau^*$,

$\beta^* = \beta + \tfrac{1}{2}(z - \gamma 1_N)^T (A_N + \tau^{-1} 1_N 1_N^T)^{-1}(z - \gamma 1_N)$,   as in (D.4c).
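A sketch collecting the (D.4) quantities for one fixed $(\psi, M)$; the matrices $A_N$, $A_{21}$ and $A_{nN}^*$ are assumed precomputed at $\psi$ (e.g. from the covariance sketches in Section B), and all names are illustrative:

```python
import numpy as np

def posterior_quantities(z, A_N, A21, A_star, gamma, tau, alpha, beta):
    """gamma*, tau*, beta*, and the forecast moments of (D.4e)-(D.4f)."""
    N = len(z)
    one_N = np.ones(N)
    AinvZ = np.linalg.solve(A_N, z)
    Ainv1 = np.linalg.solve(A_N, one_N)
    tau_star = tau + one_N @ Ainv1                         # tau*
    gamma_star = (gamma * tau + one_N @ AinvZ) / tau_star  # gamma*
    V = A_N + np.outer(one_N, one_N) / tau                 # A_N + tau^{-1} 1 1^T
    resid = z - gamma * one_N
    beta_star = beta + 0.5 * resid @ np.linalg.solve(V, resid)   # beta*, (D.4c)
    a = np.ones(A21.shape[0]) - A21 @ Ainv1                # a = 1_n - A_21 A_N^{-1} 1_N
    b = A21 @ AinvZ                                        # b = A_21 A_N^{-1} z
    mean_F = gamma_star * a + b                            # location in (D.4e)-(D.4f)
    # t scale matrix in (D.4f): inverse of the precision matrix there
    scale_F = (2 * beta_star / (2 * alpha + N)) * (A_star + np.outer(a, a) / tau_star)
    return gamma_star, tau_star, beta_star, mean_F, scale_F
```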
The remainder of the Bayesian analysis now follows easily:

(D.5)   $\pi(\psi \mid M, z) = \pi(\psi \mid M)\, p(z \mid \psi, M) / p(z \mid M)$

where

(D.6)   $p(z \mid M) = \int \pi(\psi \mid M)\, p(z \mid \psi, M)\, d\psi$

and

(D.7)   $p(M \mid z) = p(M)\, p(z \mid M) \big/ \sum_j p(j)\, p(z \mid j)$.
And the posterior distribution of the forecasts follows from

(D.8)   $p(z_F \mid z) = \sum_j \int p(z_F \mid z, \psi, M = j)\, \pi(\psi \mid z, M = j)\, p(j \mid z)\, d\psi$

where $p(z_F \mid z, \psi, M)$ is found from (D.4f).
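Once $\log p(z \mid m)$ has been obtained for each candidate model — in practice by numerically integrating (D.6) over the stationarity and invertibility region — the mixture in (D.7) and (D.8) is immediate. A hedged sketch:

```python
import numpy as np

def model_posterior(log_pz_given_m, prior_m):
    """(D.7): posterior model probabilities from log p(z|m) and p(m)."""
    log_post = np.log(np.asarray(prior_m)) + np.asarray(log_pz_given_m)
    log_post -= log_post.max()          # stabilize before exponentiating
    w = np.exp(log_post)
    return w / w.sum()

# (D.8) is then a mixture: draw model j with probability p(j|z), draw psi
# from pi(psi|z,M=j), and draw z_F from the t distribution in (D.4f).
```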
In the event that $\mu$ and/or $r$ are known, the preceding analysis must be modified somewhat. If both $\mu$ and $r$ are known, all of the analysis in the Appendix can be bypassed; the results listed in (D.4) are unnecessary; the only distributions needed are those of $z$ and $z_F$: (D.3) and (B.4). If, however, $r$ alone is known, (D.4a) and (D.4b) are unchanged, (D.4c) and (D.4f) are dropped, and (D.4d) is replaced by
$(z \mid r, \psi, M) \sim N(\gamma 1_N,\; r^{-1}(A_N + \tau^{-1} 1_N 1_N^T))$.

If, on the other hand, $\mu$ alone is known, the necessary results can be obtained by taking $\tau \to \infty$, so that $\mu = \gamma$ with probability one. Then

(D.4c**)   $(r \mid z) \sim \text{gamma}(\alpha + \tfrac{N}{2},\; \beta^{**})$,   where $\beta^{**} = \beta + \tfrac{1}{2}(z - \gamma 1_N)^T A_N^{-1}(z - \gamma 1_N)$

(D.4d**)   $(z \mid \psi, M) \sim t_{N, 2\alpha}(\gamma 1_N,\; \tfrac{\alpha}{\beta} A_N^{-1})$

(D.4e**)   $(z_F \mid z, r) \sim N(\gamma a + b,\; r^{-1} A_{nN}^*)$

(D.4f**)   $(z_F \mid z) \sim t_{n, 2\alpha+N}\!\left(\gamma a + b,\; \left[\tfrac{2\alpha+N}{2\beta^{**}}\right](A_{nN}^*)^{-1}\right)$

where $\beta^{**}$ is the scale parameter in (D.4c**).
Appendix E: Distributional Definitions

A random variable $X$ is said to have the ($k$-dimensional) multivariate normal distribution with mean vector $\mu$ and covariance matrix $A$, denoted by $(X) \sim N(\mu, A)$, where $X$ and $\mu$ are $k$ by 1 vectors and $A$ is $k$ by $k$ and positive definite. The probability density function of such an $X$ is

(E.1)   $p(x) = (2\pi)^{-k/2} |A|^{-1/2} \exp\{-\tfrac{1}{2}(x - \mu)^T A^{-1}(x - \mu)\}$
If $k = 1$, $X$ (now a scalar) has a univariate normal distribution. The same notation is used to write this; $A$ is now a scalar, say $\sigma^2$, is called the variance, and must be positive. The density function has a simpler form:

$p(x) = (2\pi\sigma^2)^{-1/2} \exp\left\{-\tfrac{1}{2\sigma^2}(x - \mu)^2\right\}$.

Also, $A^{-1}$ is sometimes referred to as the precision matrix of the multivariate normal; $1/\sigma^2$, the precision of the univariate normal.

A random variable $X$ is said to have a gamma distribution with shape parameter $\alpha$ and scale parameter $\beta$, denoted by

$(X) \sim \text{gamma}(\alpha, \beta)$

where both $\alpha$ and $\beta$ are positive. The probability density function of such an $X$ is

$p(x) = \dfrac{\beta^\alpha}{\Gamma(\alpha)}\, x^{\alpha - 1} e^{-\beta x}$   for $x > 0$.
A random variable $X$ is said to have a ($k$-dimensional) multivariate t-distribution with $n$ degrees of freedom, location vector $\mu$, and precision matrix $B$ (see the definition in DeGroot (1970, pp. 59-62)), described by

$(X) \sim t_{k,n}(\mu, B)$

where $X$ and $\mu$ are $k$ by 1 and $B$ is a $k$ by $k$ positive definite matrix. The probability density function of such an $X$ is given by

$p(x) = \dfrac{\Gamma\!\left(\tfrac{n+k}{2}\right) |B|^{1/2}}{\Gamma\!\left(\tfrac{n}{2}\right)(n\pi)^{k/2}} \left[1 + \tfrac{1}{n}(x - \mu)^T B (x - \mu)\right]^{-\frac{n+k}{2}}$

If $k = 1$, $X$ is now a scalar and is said to have the univariate (Student's) t distribution, written

$(X) \sim t_n(\mu, b)$

where $\mu$ and $b$ are scalars and $b > 0$. The density function of such an $X$ is given by

$p(x) = \dfrac{\Gamma\!\left(\tfrac{n+1}{2}\right) b^{1/2}}{\Gamma\!\left(\tfrac{n}{2}\right)(n\pi)^{1/2}} \left[1 + \tfrac{b(x - \mu)^2}{n}\right]^{-\frac{n+1}{2}}$
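These definitions imply the normal-gamma mixture property used repeatedly in Appendix G: if $(X \mid r) \sim N(\mu, (\tau r)^{-1})$ and $(r) \sim \text{gamma}(\alpha, \beta)$, then $(X) \sim t_{2\alpha}(\mu, (\alpha/\beta)\tau)$. A Monte Carlo sanity check:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu, tau, alpha, beta = 1.0, 2.0, 3.0, 1.5
r = rng.gamma(shape=alpha, scale=1 / beta, size=200_000)   # rate beta
x = rng.normal(mu, 1 / np.sqrt(tau * r))                   # X | r
scale = np.sqrt(beta / (alpha * tau))        # t scale = precision^{-1/2}
print(stats.kstest(x, stats.t(df=2 * alpha, loc=mu, scale=scale).cdf))
```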
Appendix F. Lemmas

Lemma 2. If $A$ is positive definite, $d$ is a $k$ by 1 vector, and $\tau > 0$, then $|A + \tau^{-1} dd^T| = |A|(\tau + d^T A^{-1} d)/\tau$.

Proof: Since $|A + \tau^{-1} dd^T| = |A| \cdot |I + \tau^{-1} A^{-1} dd^T|$, it suffices to evaluate a determinant of the form $|I + ab^T|$, whose entries are $1 + a_i b_i$ on the diagonal and $a_i b_j$ elsewhere. Subtracting $(a_i/a_1)$ times the first row from the $i$th, rows $2, \ldots, k$ are reduced to a $1$ on the diagonal, $-a_i/a_1$ in the first column, and zeros elsewhere. These operations do not change the determinant, and the last determinant can be found by partitioning: $|I + ab^T| = 1 + b^T a$. With $a = \tau^{-1} A^{-1} d$ and $b = d$, this gives $|A + \tau^{-1} dd^T| = |A|(1 + \tau^{-1} d^T A^{-1} d)$.
Lemma 3. If $A$ is positive definite and $\tau > 0$,

$(A + \tau^{-1} dd^T)^{-1} = A^{-1} - \dfrac{A^{-1} dd^T A^{-1}}{\tau + d^T A^{-1} d}$.

Proof: Direct multiplication verifies that $(A + \tau^{-1} dd^T)$ times the right-hand side is the identity.
Lemma 4. Given a positive definite matrix $A$, vectors $d$ and $z$, and scalars $\gamma$ and $\tau > 0$, LHS $=$ RHS below:

LHS $= \gamma^2 \tau + z^T A^{-1} z - (\gamma\tau + z^T A^{-1} d)^2/(\tau + d^T A^{-1} d)$

RHS $= (z - \gamma d)^T \left(A^{-1} - A^{-1} dd^T A^{-1}/(\tau + d^T A^{-1} d)\right)(z - \gamma d)$
$\quad\;\; = (z - \gamma d)^T (A + \tau^{-1} dd^T)^{-1}(z - \gamma d)$.

Proof: Subtract $z^T A^{-1} z$ from both LHS and RHS and multiply both by $(\tau + d^T A^{-1} d)$ to get, respectively, LHS$^*$ and RHS$^*$:

LHS$^* = \gamma^2 \tau(\tau + d^T A^{-1} d) - (\gamma\tau + z^T A^{-1} d)^2$
$\quad\quad = \gamma^2 \tau^2 + \gamma^2 \tau\, d^T A^{-1} d - \gamma^2 \tau^2 - 2\gamma\tau\, z^T A^{-1} d - (z^T A^{-1} d)^2$
$\quad\quad = \gamma^2 \tau\, d^T A^{-1} d - 2\gamma\tau\, z^T A^{-1} d - (z^T A^{-1} d)^2$

RHS$^* = (-2\gamma\, z^T A^{-1} d + \gamma^2 d^T A^{-1} d)(\tau + d^T A^{-1} d) - (z - \gamma d)^T (A^{-1} dd^T A^{-1})(z - \gamma d)$
$\quad\quad = -2\gamma\tau\, z^T A^{-1} d + \tau\gamma^2 d^T A^{-1} d - 2\gamma (z^T A^{-1} d)(d^T A^{-1} d) + \gamma^2 (d^T A^{-1} d)^2$
$\quad\quad\quad - (z^T A^{-1} d)^2 + 2\gamma (z^T A^{-1} d)(d^T A^{-1} d) - \gamma^2 (d^T A^{-1} d)^2$
$\quad\quad = \tau\gamma^2 d^T A^{-1} d - 2\gamma\tau\, z^T A^{-1} d - (z^T A^{-1} d)^2$

so LHS$^*$ = RHS$^*$, and the lemma follows.
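Lemma 4 (and, implicitly, Lemma 3) can be spot-checked numerically with random inputs:

```python
import numpy as np

rng = np.random.default_rng(1)
k = 4
M = rng.normal(size=(k, k))
A = M @ M.T + k * np.eye(k)                 # positive definite
d, z = rng.normal(size=k), rng.normal(size=k)
gam, tau = 0.7, 1.3
Ainv = np.linalg.inv(A)
c = tau + d @ Ainv @ d
lhs = gam**2 * tau + z @ Ainv @ z - (gam * tau + z @ Ainv @ d)**2 / c
rhs = (z - gam * d) @ np.linalg.inv(A + np.outer(d, d) / tau) @ (z - gam * d)
print(np.isclose(lhs, rhs))                 # True
```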
Appendix G. Theorems

NB: Subscripts have been dropped when unnecessary. Throughout this section, $z$ is $N$ by 1 and $z_F$ is $n$ by 1.

Theorem 1. Given $(z \mid \mu, r) \sim N(\mu 1, r^{-1} A)$ and $(\mu \mid r) \sim N(\gamma, (\tau r)^{-1})$, then

$(\mu \mid z, r) \sim N(\gamma^*, r^{-1} \tau^{-*})$   and   $(z \mid r) \sim N(\gamma 1,\; r^{-1}(A + \tau^{-1} 1 1^T))$

where $\tau^* = \tau + 1^T A^{-1} 1$, $\tau^{-*} = (\tau^*)^{-1}$, and $\gamma^* = (\gamma\tau + z^T A^{-1} 1)/\tau^*$.

Proof: $p(\mu \mid r)\, p(z \mid \mu, r) = p(\mu, z \mid r) = p(\mu \mid z, r)\, p(z \mid r)$, and

$p(\mu, z \mid r) = (2\pi)^{-\frac{N+1}{2}} \tau^{1/2} |A|^{-1/2} r^{\frac{N+1}{2}} \exp\{-\tfrac{r}{2}[(\mu - \gamma)^2 \tau + (z - \mu 1)^T A^{-1}(z - \mu 1)]\}$.

The bracketed exponent can be rearranged:

$[\;] = \mu^2 \tau^* - 2\mu(\gamma\tau + z^T A^{-1} 1) + (\gamma\tau + z^T A^{-1} 1)^2/\tau^* + z^T A^{-1} z + \gamma^2 \tau - (\gamma\tau + z^T A^{-1} 1)^2/\tau^*$

$\quad = \tau^*(\mu - \gamma^*)^2 + (z - \gamma 1)^T (A^{-1} - \tau^{-*} A^{-1} 1 1^T A^{-1})(z - \gamma 1)$

$\quad = \tau^*(\mu - \gamma^*)^2 + (z - \gamma 1)^T (A + \tau^{-1} 1 1^T)^{-1}(z - \gamma 1)$   via Lemma 4.

Hence

$p(\mu \mid z, r)\, p(z \mid r) = (2\pi)^{-1/2} (\tau^*)^{1/2} r^{1/2} \exp\{-\tfrac{r}{2} \tau^* (\mu - \gamma^*)^2\}$
$\quad\quad \times\; (2\pi)^{-N/2} |A|^{-1/2} (\tau^*/\tau)^{-1/2} r^{N/2} \exp\{-\tfrac{r}{2}(z - \gamma 1)^T (A + \tau^{-1} 1 1^T)^{-1}(z - \gamma 1)\}$.
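Theorem 1's factorization can be verified pointwise with scipy's density functions:

```python
import numpy as np
from scipy.stats import norm, multivariate_normal as mvn

rng = np.random.default_rng(2)
N, r, tau, gamma = 5, 1.7, 0.8, 0.4
M = rng.normal(size=(N, N))
A = M @ M.T + N * np.eye(N)                 # positive definite A
one = np.ones(N)
Ainv1 = np.linalg.solve(A, one)
mu, z = 0.3, rng.normal(size=N)
tau_star = tau + one @ Ainv1
gamma_star = (gamma * tau + z @ Ainv1) / tau_star
lhs = norm.pdf(mu, gamma, 1 / np.sqrt(tau * r)) * mvn.pdf(z, mu * one, A / r)
rhs = (norm.pdf(mu, gamma_star, 1 / np.sqrt(tau_star * r))
       * mvn.pdf(z, gamma * one, (A + np.outer(one, one) / tau) / r))
print(np.isclose(lhs, rhs))                 # p(mu|r)p(z|mu,r) = p(mu|z,r)p(z|r)
```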
Theorem 2. Given $(z \mid r) \sim N(\gamma 1, r^{-1}(A + \tau^{-1} 1 1^T))$ and $(r) \sim \text{gamma}(\alpha, \beta)$, then

$(r \mid z) \sim \text{gamma}(\alpha + \tfrac{N}{2}, \beta^*)$   where   $\beta^* = \beta + \tfrac{1}{2}(z - \gamma 1)^T (A + \tau^{-1} 1 1^T)^{-1}(z - \gamma 1)$,

and   $(z) \sim t_{N, 2\alpha}\!\left(\gamma 1,\; \tfrac{\alpha}{\beta}(A + \tau^{-1} 1 1^T)^{-1}\right)$.

Proof. $p(z, r) = p(z \mid r)\, p(r)$ and $p(z, r) = p(r \mid z)\, p(z)$, and

$p(z, r) = (2\pi)^{-N/2} |A + \tau^{-1} 1 1^T|^{-1/2} r^{N/2} \exp\{-\tfrac{r}{2}(z - \gamma 1)^T (A + \tau^{-1} 1 1^T)^{-1}(z - \gamma 1)\} \cdot \dfrac{\beta^\alpha}{\Gamma(\alpha)} r^{\alpha - 1} e^{-\beta r}$

$= \dfrac{\beta^\alpha}{\Gamma(\alpha)} (2\pi)^{-N/2} |A + \tau^{-1} 1 1^T|^{-1/2}\, r^{\alpha + \frac{N}{2} - 1} \exp\{-\beta^* r\}$.

As a function of $r$ this is proportional to a gamma$(\alpha + \tfrac{N}{2}, \beta^*)$ density, which establishes $(r \mid z)$; integrating over $r$,

$p(z) = \dfrac{\Gamma(\alpha + \tfrac{N}{2})}{\Gamma(\alpha)(2\pi\alpha)^{N/2}} \left|\tfrac{\alpha}{\beta}(A + \tau^{-1} 1 1^T)^{-1}\right|^{1/2} \left[1 + \tfrac{1}{2\beta}(z - \gamma 1)^T (A + \tau^{-1} 1 1^T)^{-1}(z - \gamma 1)\right]^{-\frac{2\alpha+N}{2}}$

which is the $t_{N, 2\alpha}(\gamma 1, \tfrac{\alpha}{\beta}(A + \tau^{-1} 1 1^T)^{-1})$ density.
Corollary. Given $(\mu \mid z, r) \sim N(\gamma^*, r^{-1} \tau^{-*})$ from Theorem 1 and $(r \mid z) \sim \text{gamma}(\alpha + \tfrac{N}{2}, \beta^*)$, then

$(\mu \mid z) \sim t_{1, 2\alpha+N}\!\left(\gamma^*, \left[\tfrac{2\alpha+N}{2\beta^*}\right] \tau^*\right)$.

Proof. $p(\mu \mid z) = \int p(\mu \mid z, r)\, p(r \mid z)\, dr = \dfrac{(\beta^*)^{\alpha + \frac{N}{2}} (\tau^*)^{1/2}}{\Gamma(\alpha + \tfrac{N}{2})(2\pi)^{1/2}} \displaystyle\int_0^\infty r^{\alpha + \frac{N+1}{2} - 1} \exp\{-r[\beta^* + \tfrac{\tau^*}{2}(\mu - \gamma^*)^2]\}\, dr$

$= \dfrac{\Gamma(\alpha + \tfrac{N+1}{2})(\beta^*)^{\alpha + \frac{N}{2}} (\tau^*)^{1/2}}{\Gamma(\alpha + \tfrac{N}{2})(2\pi)^{1/2}} \left[\beta^* + \tfrac{\tau^*}{2}(\mu - \gamma^*)^2\right]^{-\frac{2\alpha+N+1}{2}}$

which is the univariate $t_{2\alpha+N}\!\left(\gamma^*, \left[\tfrac{2\alpha+N}{2\beta^*}\right]\tau^*\right)$ density.
Theorem 3. Given $(z_F \mid z, \mu, r) \sim N(\mu 1_n + A_{21} A_N^{-1}(z - \mu 1),\; r^{-1} A_{nN}^*)$ and $(\mu \mid z, r)$ as available from Theorem 1, then

$(z_F \mid z, r) \sim N(\gamma^* a + b,\; r^{-1}(A_{nN}^* + \tau^{-*} a a^T))$

where $\tau^{**} = \tau^* + a^T A^{-*} a$, $a = 1_n - A_{21} A_N^{-1} 1_N$, $b = A_{21} A_N^{-1} z$, and $A^{-*} = (A_{nN}^*)^{-1}$.

Proof. $p(z_F \mid z, r) = \int p(z_F \mid z, \mu, r)\, p(\mu \mid z, r)\, d\mu$, where $p(\mu \mid z, r)$ is available from Theorem 1, and

$p(z_F \mid z, \mu, r)\, p(\mu \mid z, r) = (2\pi)^{-n/2} |A_{nN}^*|^{-1/2} r^{n/2} \cdot (2\pi)^{-1/2} (\tau^* r)^{1/2} \exp\{-\tfrac{r}{2}[\;]\}$

with

$[\;] = \mu^2 \tau^{**} - 2\mu(\gamma^* \tau^* + a^T A^{-*}(z_F - b)) + (\gamma^* \tau^* + a^T A^{-*}(z_F - b))^2/\tau^{**}$
$\quad\quad - (\gamma^* \tau^* + a^T A^{-*}(z_F - b))^2/\tau^{**} + (z_F - b)^T A^{-*}(z_F - b) + (\gamma^*)^2 \tau^*$

$\quad = \tau^{**}\left(\mu - \dfrac{\gamma^* \tau^* + a^T A^{-*}(z_F - b)}{\tau^{**}}\right)^2 + (z_F - b)^T A^{-*}(z_F - b) + (\gamma^*)^2 \tau^* - \dfrac{(\gamma^* \tau^* + a^T A^{-*}(z_F - b))^2}{\tau^{**}}$.

Integrating out $\mu$ and applying Lemma 4 while substituting $d = a$, $\gamma = \gamma^*$, $z = z_F - b$, $A = A_{nN}^*$, $\tau = \tau^*$ will yield

$p(z_F \mid z, r) = (2\pi)^{-n/2} |A_{nN}^*|^{-1/2} (\tau^{**}/\tau^*)^{-1/2} r^{n/2} \exp\left\{-\tfrac{r}{2}(z_F - b - \gamma^* a)^T \left[A_{nN}^* + \dfrac{a a^T}{\tau^*}\right]^{-1}(z_F - b - \gamma^* a)\right\}$

and notice that $|A_{nN}^* + a a^T/\tau^*| = |A_{nN}^*| (\tau^{**}/\tau^*)$.
Corollary. ("Flz)'
tn,
2n+N (/a+b, [2;:*N] (A~ + aaT/T*)-l)
Proof follows directly from
p(~Flz)
=
f~(zFlz,r)p(rlz) dr •
18
=
J; (2~)-n/2IA*I-~(T**/T*)~
r
* -1 (zF- Y*a-b)}
x exp{ - 2"
(zF-Y *a-b) T (A*+aaTIT;·)
N
a.+(s*)
2
x r(a.+
~)-
2a.+NJ
r [---2-·
N
a.+2"-1
r
(~(2a.+N+W)
e
-13 *r
n/2
dr
*
- *
T *
T *
*
[s +(z -Y a-b) (A +aa IT ) (ZF- Y a-b)]
2a.+N+n
2
References

Anderson, T. W. (1971). The Statistical Analysis of Time Series, Wiley, New York.

Box, G. E. P. and Jenkins, G. M. (1976). Time Series Analysis: Forecasting and Control (Revised edition), Holden-Day, San Francisco.

DeGroot, M. H. (1970). Optimal Statistical Decisions, McGraw-Hill, New York.

McLeod, I. (1975). Derivation of the Theoretical Autocovariance Function of Autoregressive-Moving Average Time Series, Applied Statistics 24, 2, pp. 255-256. Correction: 26, p. 194.

Monahan, J. F. (1977). Integration Over a Parameter Space Arising in Time Series Models, Report No. 748, Applied Math. Dept., Brookhaven Natl. Lab., Upton, New York.

Theil, H. (1971). Principles of Econometrics, Wiley, New York.