REGRESSION ANALYSIS OF TIME SERIES DATA

by

William S. Cleveland

Department of Statistics
University of North Carolina at Chapel Hill

Institute of Statistics Mimeo Series No. 753

April 1971
ABSTRACT

In fitting regression-autoregression models to time series data, it is very convenient to be able to treat the model like a single large regression. The asymptotic sampling theory justification for doing this is reviewed. It is shown that from the likelihood inference and Bayesian inference points of view, using standard regression inferences is justifiable when the amount of information from the initial assumptions and initial observations is small compared with the additional information from the rest of the sample. The use of various regression techniques such as transformations, subset regression calculations, and nonlinear least squares in fitting regression-autoregression models is discussed.
REGRESSION ANALYSIS OF TIME SERIES DATA

William S. Cleveland*

*William S. Cleveland is Assistant Professor, Department of Statistics, University of North Carolina, Chapel Hill.
1. INTRODUCTION

Let $t_1,\ldots,t_N$ be a time series; $a$, $b$, and $d_{n,k}$, for $n=a+1,\ldots,N$ and $k=1,\ldots,b$, a set of "known" numbers; $\alpha=(\alpha_1,\ldots,\alpha_a)'$, $\beta=(\beta_1,\ldots,\beta_b)'$, and $\sigma^2$ unknown parameters; and $z_n$, for $n=a+1,\ldots,N$, a sequence of independent normal random variables with mean 0 and variance $\sigma^2$. Suppose that, for $n=a+1,\ldots,N$, $t_n$ satisfies the equation

$$t_n = \sum_{j=1}^{a} \alpha_j t_{n-j} + \sum_{k=1}^{b} \beta_k d_{n,k} + z_n \eqno(1.1)$$

and that the joint distribution of $t_1,\ldots,t_a$ is normal with mean $\mu = (\mu_1,\ldots,\mu_a)'$ and covariance matrix $\Sigma$. This model will be referred to as a regression-autoregression model and will be abbreviated to linear RAR model. $\sum_{j=1}^{a} \alpha_j t_{n-j}$ is the autoregression part of the model and $\sum_{k=1}^{b} \beta_k d_{n,k}$ the regression part of the model.

We could allow nonlinearities in $\alpha$ and $\beta$ and will do so in a few places in the paper. $a$, $b$, and $d_{n,k}$ are "known" in the sense that when it comes time to make inferences about the unknown parameters $\mu$, $\Sigma$, $\alpha$, $\beta$, and $\sigma^2$, the former will be assumed fixed and known. But in any particular application $a$, $b$, and $d_{n,k}$ must somehow be chosen and their adequacy checked.

Usually, interest centers on making inferences about $\alpha$, $\beta$, and $\sigma^2$, with $\mu$ and $\Sigma$ regarded as nuisance parameters. Often, a priori restrictions are made on the values that the vector of parameters $(\mu, \Sigma, \alpha, \beta, \sigma^2)$
can take. For example, it is sometimes assumed from the start that $t_n$ is stationary. That is, the joint distribution of $t_{1+n},\ldots,t_{a+n}$, for any value of $n$, does not depend on $n$. This implies that the roots of the polynomial in $B$, $1-\alpha_1 B-\cdots-\alpha_a B^a$, lie outside the unit circle and that the $(r,s)$th element of $\sigma^2\Sigma^{-1}$ is $\sum_i (\alpha_{r-i}\alpha_{s-i} - \alpha_{a+i-r}\alpha_{a+i-s})$, where $\alpha_0 = -1$ (cf. [15, p. 968]).

Let $Y$ be the $(N-a)\times 1$ vector $(t_{a+1},\ldots,t_N)'$; $Z$ be the $(N-a)\times 1$ vector $(z_{a+1},\ldots,z_N)'$; $\theta$ be the $(a+b)\times 1$ vector $(\alpha_1,\ldots,\alpha_a,\beta_1,\ldots,\beta_b)'$; and $X$ be the $(N-a)\times(a+b)$ matrix whose first $a$ columns are the vectors $(t_{a+1-j},\ldots,t_{N-j})'$, for $j=1,\ldots,a$, and whose last $b$ columns are the vectors $(d_{a+1,k},\ldots,d_{N,k})'$, for $k=1,\ldots,b$. Then the $N-a$ equations in (1.1) may be written in matrix form as

$$Y = X\theta + Z. \eqno(1.2)$$
Writing (1.1) in this way makes the model look like a single big regression and tempts us to make inferences about $\theta$ and $\sigma^2$ by using standard regression inferences. The meaning of this phrase will be taken to be the following. From the sampling theory point of view it means estimating $\theta$ and $\sigma^2$ by least squares and applying the usual analysis of variance theory. From the likelihood inference (the approach to inference described in [2] and [12]) or the Bayesian inference point of view it means taking the likelihood or the posterior of $h = \sigma^{-2}$ and $\theta$ to be proportional to

$$h^{\frac{N-a}{2}} \exp\left(-\tfrac{1}{2}h(Y-X\theta)'(Y-X\theta)\right) \eqno(1.3)$$

where this function of $h$ and $\theta$ is considered to be defined for all $h > 0$ and all real $(a+b)$-tuples $\theta$. This normal-gamma density is discussed in [17, p. 342-5]. The standard regression inferences for each of the three approaches are in essence the same (except for minor differences in degrees of freedom). Only the language is different.
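Since all three approaches agree on the point estimates, it may help to see (1.2) assembled concretely. The following is a minimal sketch, not from the paper: it builds $Y$ and $X$ from a series and computes the least squares estimates; the simulated series and the regression variables (a constant and a linear trend) are illustrative assumptions.

```python
import numpy as np

def rar_design(t, a, d):
    """Build Y and X of (1.2) from series t, AR order a, and an (N, b)
    array d of 'known' regression variables d_{n,k}."""
    N = len(t)
    Y = t[a:]                                      # (t_{a+1}, ..., t_N)'
    lags = np.column_stack([t[a - j:N - j] for j in range(1, a + 1)])
    return Y, np.hstack([lags, d[a:]])             # first a columns are lagged t

rng = np.random.default_rng(0)
N, a = 200, 2
d = np.column_stack([np.ones(N), np.arange(N)])    # assumed d_{n,1}=1, d_{n,2}=n
t = np.zeros(N)
for n in range(a, N):
    t[n] = 0.5 * t[n - 1] - 0.3 * t[n - 2] + 0.01 * n + rng.normal()

Y, X = rar_design(t, a, d)
theta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)  # theta_hat = (X'X)^{-1} X'Y
resid = Y - X @ theta_hat
s2 = resid @ resid / (len(Y) - X.shape[1])         # usual residual mean square
print(np.round(theta_hat, 3), round(s2, 3))
```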
The situations where it is justifiable to make standard regression inferences about $\theta$ and $\sigma^2$ are discussed in Section 2 from the sampling theory point of view and in Section 3 from the likelihood and Bayesian points of view. In Section 4 these different justifications are compared. In Section 5 the regression with stationary autoregressive errors model is discussed. Comparisons are made between this model and the linear RAR model.

In Sections 6-10 it is assumed that for the particular set of data being analyzed it is justifiable to use standard regression inferences. Under this assumption the use of various regression techniques in setting up and analyzing a RAR model is discussed. The general philosophy of using regression techniques in fitting a RAR model is discussed in Section 6. Sections 7 and 8 treat the question of selecting some transformation of the series $t_n$ in order to get a better fit. In Section 7 likelihood techniques are used to estimate the parameter of a power transformation of $t_n$. In Section 8 a procedure is suggested for selecting a transformation of $t_n$ in order to remove inhomogeneities in the variation of $t_n$. In Section 9 an algorithm is given for calculating all subset regressions (really autoregressions) in the stationary autoregressive model with mean 0. In Section 10 a method of unconstrained optimization is given for finding the maximum of the likelihood function for the particularly useful nonlinear seasonal RAR model discussed in [4, p. 300].

The techniques suggested in this paper are illustrated by their application to birth rate, air pollution, airline passenger, telephone installation, and solar flux data.
2. SAMPLING THEORY STANDARD REGRESSION INFERENCES

Sampling theory justifications for making standard regression inferences about the parameters in a RAR model have been given by Mann and Wald [14] and Durbin [9]. Mann and Wald deal with the stationary RAR model. It is shown that making standard regression inferences is justifiable if the sample size is sufficiently large. Durbin gives a similar asymptotic result for the more general linear RAR model but requires the assumption that $(EX'X)^{-1}X'X$ converges stochastically to the identity matrix. No conditions, other than stationarity, are given which assure this convergence.
3. BAYESIAN AND LIKELIHOOD STANDARD REGRESSION INFERENCES

The following conventions will be useful in this section. First, if $\Gamma$ is a set, "$\Gamma$" will also be used to denote the indicator function of the set $\Gamma$. For example, the indicator function $[y: |y| < 1]$ equals one if $|y| < 1$ and is zero otherwise. As in the previous section we will let $h = \sigma^{-2}$. Also, $R^k$ will denote the set of $k$-tuples of real numbers and $R^+$ will denote the positive reals.

The important idea used in this section will be the principle of stable estimation [10, p. 201]. Suppose $F_1$ and $F_2$ are two functions of a vector of parameters $\phi$ such that inferences about $\phi$ are to be solely based on $F_3(\phi) = F_1(\phi)F_2(\phi)$. In the likelihood approach $F_3$ would be the likelihood function and in the Bayesian approach $F_3$ would be the posterior density of $\phi$. Now if $F_1$ is relatively flat in the region where most of the mass of $F_2$ is concentrated and is nowhere too big outside this region, then $F_3$ (for the purpose of making inferences about $\phi$) may be approximated by $F_2$. We shall now see how this notion of stable estimation can be used to decide when the RAR model can be treated like a standard regression. It will be assumed that $\alpha$, $\beta$, and $h$ are the parameters of interest and $\mu$ and $\Sigma$ are nuisance parameters.

The likelihood function $L$ of a sample $t_1,\ldots,t_N$ from the linear RAR model (1.1) may be factored into two functions

$$L(t_1,\ldots,t_N|\mu,\Sigma,\alpha,\beta,h) = L_1(t_1,\ldots,t_a|\mu,\Sigma)\, L_2(t_{a+1},\ldots,t_N|t_1,\ldots,t_a,\alpha,\beta,h). \eqno(3.1)$$

As previously stated in Section 1, the vector of parameters $(\mu,\Sigma,\alpha,\beta,h)$ may be a priori restricted to lie in some region $\Gamma_1$. The stationary case was given as a particular example. (It would however be quite rare that one would restrict the range of $\beta$ to be less than all of $R^b$ or the range of $h$ to be less than all of $R^+$.) Let $\Gamma_2$ be the region in which the vector of parameters $(\alpha,\beta,h)$ must lie. $\Gamma_2$ is then a subset of $R^a \times R^b \times R^+$. The likelihood function in (3.1) and the two factors on the right side are, once the sample is taken, regarded as functions of the parameters; we will therefore write them as $L(\mu,\Sigma,\alpha,\beta,h)$, $L_1(\mu,\Sigma)$, and $L_2(\alpha,\beta,h)$. Now it is easy to see that $L_2(\alpha,\beta,h)$ is identical to the function in (1.3) (recall that $\theta = (\alpha,\beta)'$), except that $L_2(\alpha,\beta,h) = 0$ if $(\alpha,\beta,h)$ is not in $\Gamma_2$. This is a bit inconvenient for what is about to be said, so we will change the definition of $L_2$ slightly: $L_2$ will be taken as identical to (1.3) and (3.1) will then be rewritten as

$$L(\mu,\Sigma,\alpha,\beta,h) = L_1(\mu,\Sigma)\,\Gamma_2 L_2(\alpha,\beta,h).$$
We will now show when it is justifiable from the likelihood and the Bayesian points of view to make standard regression inferences about $\alpha$, $\beta$, and $h$. The Bayesian case will be given first. Let $P(\mu,\Sigma,\alpha,\beta,h)$ be a prior distribution on the parameters. The posterior density of the parameters after observing the first $a$ sample values $t_1,\ldots,t_a$ is proportional to $P(\mu,\Sigma,\alpha,\beta,h)L_1(\mu,\Sigma)$. Let $F_1(\alpha,\beta,h)$ be the marginal posterior density of $\alpha$, $\beta$, and $h$ after observing $t_1,\ldots,t_a$, which is proportional to the marginal function in $\alpha$, $\beta$, and $h$ of the product $PL_1$. Let $F_2(\alpha,\beta,h) = \Gamma_2 L_2(\alpha,\beta,h)$ and $F_3(\alpha,\beta,h) = F_1(\alpha,\beta,h)F_2(\alpha,\beta,h)$. Then $F_3$ is proportional to the marginal posterior of $\alpha$, $\beta$, and $h$ after observing the entire sample $t_1,\ldots,t_N$.

Standard regression inferences about $\alpha$, $\beta$, and $h$ may be made if:

(3.2) $F_3$ is well approximated by $F_2 = \Gamma_2 L_2$;

(3.3) most of the mass of $L_2$ lies in $\Gamma_2$.

From (3.2) the principle of stable estimation as described earlier applies to the $F_3$, $F_2$, and $F_1$ just defined; thus $F_3$ is well approximated by $\Gamma_2 L_2$, and from (3.3) $F_3$ is well approximated by $L_2$. But as previously stated, $L_2$ is just the standard regression inference function given in (1.3).

Condition (3.2) says that if the amount of information about $\alpha$, $\beta$, and $h$ after observing $t_1,\ldots,t_a$ is small compared with the additional amount of information from $t_{a+1},\ldots,t_N$, then $\Gamma_2 L_2 = F_2$ is a good approximation of $F_3$. Note that if the prior distribution on the parameters $\mu$, $\Sigma$ is independent of $\alpha$, $\beta$, $h$, and the prior density of $\alpha$, $\beta$, $h$ is proportional to $\Gamma_2$, then there is no information about $\alpha$, $\beta$, $h$ from $t_1,\ldots,t_a$ and $F_3 = F_2$. In addition, if $\Gamma_2 = R^a \times R^b \times R^+$ then $L_2 = F_2$ and $F_3$ is exactly the standard regression inference function in (1.3).

The discussion of the likelihood case proceeds exactly as the Bayesian case with the word "posterior" replaced by "likelihood" and the prior density $P$ replaced by $\Gamma_1$.
One expects the approximation typically to be valid when $N$ is large. With a large $N$ we can justifiably expect the additional information about $\alpha$, $\beta$, and $h$ to be much greater than the information after observing $t_1,\ldots,t_a$, which remains fixed as $N$ increases. Of course this does not happen with probability 1, so it is necessary to check the approximation.

As an example suppose $t_n$ is a stationary first order autoregression with mean 0. That is, $t_n = \alpha_1 t_{n-1} + z_n$, for $n=2,\ldots,N$; $|\alpha_1| < 1$; and $t_1$ is normal with mean 0 and variance $h^{-1}(1-\alpha_1^2)^{-1}$. Suppose the prior density on $\alpha_1$ is uniform and the prior on $h$ is $h^{-1}$, independently of $\alpha_1$. Then

$$L_1(\alpha_1,h) = h^{\frac{1}{2}}(1-\alpha_1^2)^{\frac{1}{2}} \exp\left(-\tfrac{1}{2}h(1-\alpha_1^2)t_1^2\right)$$

and

$$L_2(\alpha_1,h) = h^{\frac{N-1}{2}} \exp\left(-\tfrac{1}{2}h\sum_{n=2}^{N}(t_n-\alpha_1 t_{n-1})^2\right),$$

and the posterior of $\alpha_1$ and $h$ will be proportional to $[\alpha_1: |\alpha_1|<1]\,h^{-1}L_1(\alpha_1,h)L_2(\alpha_1,h)$. If (3.2) and (3.3) hold then $L_2(\alpha_1,h)$, normalized to integrate to one, is a good approximation of the posterior.
This last paragraph will now be applied to the 199 daily changes from July 16, 1968 through January 31, 1969 in the level of the solar power flux density (defined in [20, Section 16.3]) at 2800 megahertz, which were recorded at the National Research Council, Ottawa, Canada. The observations are given by the following string of numbers; an italicized number has a negative value:
6 8 0 1 2 6 7 6 5 3 8 3 1 6 3 1 0 7 0 0 12 8 2 5 0 4 10 10 13 1 6 10 4 0 0
1 0 10 'l 6 6 4 5 2 4 2 6 5 8 1 4 3 5 8 0 7 3 3 0 4 11 3 1 8 2 0 7 6 1 16 15
2 2 1 4 10 8 0 9 1 2 3 2 0 4 3 1 3 9 5 5 0 6 7 4 2 8 2 2 6 1 1 0 2 21 28 4
6 2 0 6 8 1 2 4 2 0 3 3 2 0 3 6 13 8 'l 2 6 3 2 2 1 0 1 0 1 0 12 10 0 2 1 4
2 1 4 3 0 0 'l 3 1 ? 4 3 9 3 2 0 1 8 2 2 10 13 8 1 0 1 2 2 11 8 14 7 0 1 10
1 0 6 4 0 1 4 0 13 4 4 2 9 6 3 9 11 2 1 3 O.
From the knowledge about the physical mechanism generating the series it seems quite reasonable to assume the mean of $t_n$ is 0. This is substantiated by the sample since the sample mean is -.08 and the sample standard deviation is 6.20.
From a histogram of the data it seemed quite reasonable to assume normality, although one might be a bit hesitant in view of the accuracy of the data. The first 5 sample autocorrelations are .20, .12, .04, -.05, and -.06. The first 5 sample partial autocorrelations [4, p. 64] are .20, .08, .00, -.07, and -.04. The first 5 sample inverse autocorrelations [7] are -.15, -.08, -.03, .05, and .04. From these estimates it seemed quite reasonable to try fitting a first order stationary autoregression.

The amount of information in $L_2$ in this example appears to be large. The maximum of $L_2$ occurs at $\alpha_1 = .19945$ with a standard error of .069. From this rough sort of analysis it would be quite reasonable to suppose $L_2$ serves as a good approximation. Let us, however, look closely at the approximation for making inferences about $\alpha_1$. The actual marginal posterior of $\alpha_1$ is

$$m_1(\alpha_1) \propto [\alpha_1: |\alpha_1|<1]\,(1-\alpha_1^2)^{\frac{1}{2}}\left[(1-\alpha_1^2)t_1^2 + \sum_{n=2}^{199}(t_n-\alpha_1 t_{n-1})^2\right]^{-99.5}.$$

This was got by integrating out $h$ in the posterior and using numerical integration to find the normalizing constant. The posterior approximation, got by integrating $h$ out of $L_2(\alpha_1,h)$, is

$$m_2(\alpha_1) \propto \left[1 + \frac{208.266\,(\alpha_1-.19945)^2}{199}\right]^{-\frac{199+1}{2}},$$

which is a t-density and very nearly normal with mean .19945 and variance .0048.
Very nearly all the mass of both $m_1$ and $m_2$ lies in the interval 0 to .4. The maximum of $m_1$ occurs at .19943 and the maximum of $m_2$ occurs at .19945. Table 1 gives, for 11 equally spaced values of $\alpha_1$ from 0 to .4, the values of $m_1$ and $m_2$ and their ratio. Clearly the approximation of $m_1$ by $m_2$ is quite good.
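The check behind Table 1 is easy to script. The sketch below, run on assumed stand-in data rather than the solar flux series, computes the exact marginal posterior $m_1$ (with the $h^{-1}$ prior and $L_1$ included) and the approximation $m_2$ (from $L_2$ alone) on a grid, working in log space to avoid underflow.

```python
import numpy as np

def log_m1(a1, t):
    # log of (1-a1^2)^{1/2} [ (1-a1^2)t_1^2 + sum (t_n - a1 t_{n-1})^2 ]^{-N/2}
    N = len(t)
    q = (1 - a1**2) * t[0]**2 + np.sum((t[1:] - a1 * t[:-1])**2)
    return 0.5 * np.log(1 - a1**2) - 0.5 * N * np.log(q)

def log_m2(a1, t):
    # h integrated out of L_2 alone: [ sum (t_n - a1 t_{n-1})^2 ]^{-(N-1)/2}
    q = np.sum((t[1:] - a1 * t[:-1])**2)
    return -0.5 * (len(t) - 1) * np.log(q)

rng = np.random.default_rng(1)
t = np.zeros(199)                        # stand-in for the 199 solar flux changes
for n in range(1, 199):
    t[n] = 0.2 * t[n - 1] + rng.normal()

grid = np.linspace(-0.99, 0.99, 1999)
dx = grid[1] - grid[0]
m1 = np.exp([log_m1(a, t) - log_m1(0.2, t) for a in grid])
m2 = np.exp([log_m2(a, t) - log_m2(0.2, t) for a in grid])
m1 /= m1.sum() * dx                      # normalize numerically, as in the paper
m2 /= m2.sum() * dx
mass = m1 > 0.01 * m1.max()              # high-mass region, as in Table 1
print(np.max(np.abs(m1[mass] / m2[mass] - 1)))
```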
4. COMPARISON OF SAMPLING THEORY, BAYESIAN, AND LIKELIHOOD APPROACHES

At present the sampling theory justification for making standard regression inferences in a RAR model is unfortunately quite incomplete. The results in Section 2 are asymptotic and restrict the class of RAR models considered. The main problem is that it is extremely difficult to derive finite sample distributions for the least squares estimates; even if these were calculated there would remain the problem of deciding when these distributions are well approximated by the standard regression inferences.

The Bayesian and likelihood justifications in Section 3 are (statistically) complete. That is, a routine has been described for deciding, with a particular sample at hand, when standard regression inferences may be made. Carrying out the routine is a numerical analysis problem, albeit in some cases a difficult one. The reason for attempting to make such an approximation is that if the approximation is valid, the resulting inferences are more easily understood and more easily communicated.
5. COMPARISON OF THE LINEAR RAR MODEL AND THE REGRESSION WITH STATIONARY AUTOREGRESSIVE ERRORS MODEL

A model often used in time series studies is the regression model with stationary autoregressive errors. That is, $t_n$ is assumed to satisfy, for $n=1,\ldots,N$, the equation

$$t_n = \sum_{k=1}^{b}\beta_k d_{n,k} + e_n$$

where $e_n$ is a stationary autoregression $e_n = \sum_{j=1}^{a}\alpha_j e_{n-j} + z_n$. For example, economists frequently use the model $t_n = T_n + S_n + e_n$ where $T_n$ accounts for trend-cycle and $S_n$ accounts for seasonality.

We could proceed as in Section 3 by writing the likelihood function of a sample from a regression model with stationary autoregressive errors as the product $L = L_1 L_2$ and see if $L_2$ provides a good approximation of $L$ or is nearly proportional to the posterior density of $\alpha$, $\beta$, and $h$. But the problem is that this $L_2$ for the regression model with stationary autoregressive errors is not in the form of the standard regression inference function (1.3) and is not an easy function to work with because of the nonlinearities in $\alpha$ and $\beta$.

This added complication in the regression with stationary autoregressive errors model should, I think, encourage us to try and fit a linear RAR model when it is possible to do so. Of course, a situation might arise where the knowledge of the mechanism generating the series is such that one can write down the first model and a priori specify the $d_{n,k}$. But usually, choosing the $d_{n,k}$ involves both prior knowledge and a good hard look at the data; there tends to be a lot of guessing, fitting, and checking. In this sort of a situation the RAR model seems more attractive simply because it is easier to work with if it is justifiable to make standard regression inferences. Of course, if this approximation is not valid then each of the models has its own difficult nonlinearities.
6. THE USE OF REGRESSION TECHNIQUES

From this point on in the paper it will be assumed that a RAR model is being fit to a particular set of data and that a preliminary analysis from the likelihood or Bayesian point of view has indicated that making standard regression inferences is justifiable. The RAR model will be treated as if it is the single large regression in (1.2) with dependent variable $Y$ and independent variables $X_k$, for $k=1,\ldots,a+b$, which are the columns of $X$.

In fitting the RAR model we would like to make use of what will be called regression techniques - procedures such as transformations of the data, variable selection, analysis of residuals, graphical displays, and nonlinear least squares. We could straightforwardly apply these techniques to the RAR model as they are applied in the general regression situation. However, while with the RAR model we are now faced with a regression, it is one that has a special structure (indeed sometimes, as we shall see later, an extremely special structure) that is not present in the general regression problem for which these techniques were devised. There is a special relationship between the dependent variable $Y = (t_{a+1},\ldots,t_N)'$ and the independent variables $X_k = (t_{a+1-k},\ldots,t_{N-k})'$, for $k=1,\ldots,a$, which are the first $a$ columns of $X$.

Thus it is fruitful to rethink the use of regression techniques in the specially structured RAR model. Some techniques may possibly be modified so they work much more effectively; some may not change at all; others may be totally inappropriate; and completely new techniques may be developed. The following sections are devoted to discussions of the use of a few of the many regression techniques in fitting RAR models.
7. TRANSFORMATIONS: ESTIMATING THE BEST POWER TRANSFORMATION

In the standard regression theory with independent observations there have been many discussions about transforming the dependent variable or transforming the independent variables in order to better satisfy the assumptions of the model. For example, Anscombe and Tukey [1] have discussed the use of power transformations, $t^{(\tau)} = (t^\tau - 1)/\tau$ for $\tau \neq 0$ and $t^{(0)} = \log t$, for reducing inhomogeneities in variance and nonadditivity for positive data. If the data is not positive (or even if it is) then the two-parameter family of power transformations may be used. In this paper only the one-parameter family will be discussed. A discussion of the two-parameter family would involve the same general principles; the details would change in an obvious way.

Suppose the dependent variable in (1.2) is transformed by the power transformation $t^{(\tau)}$. This is equivalent to rewriting the equations (1.1) as

$$t_n^{(\tau)} = \sum_{j=1}^{a}\alpha_j t_{n-j} + \sum_{k=1}^{b}\beta_k d_{n,k} + z_n.$$

Such a model, however, involves the original series in both a transformed and untransformed way, which is not particularly congenial since it makes understanding the results of the analysis more difficult. Of course, if such a model is a very good approximation of the truth then one must be prepared to face up to it. But what one would like to get away with is to fit a RAR model to $t_n^{(\tau)}$, that is, to fit the model

$$t_n^{(\tau)} = \sum_{j=1}^{a}\alpha_j t_{n-j}^{(\tau)} + \sum_{k=1}^{b}\beta_k d_{n,k} + z_n. \eqno(7.1)$$

In the language of regression, the dependent variable and the first $a$ independent variables are being transformed by the same power transformation. In some instances one might also want to transform some of the other independent variables, that is, to replace $d_{n,k}$ in the above equation by $d_{n,k}^{(\tau)}$ for certain values of $k$. The choice of $\tau$ will be discussed in this and the next section.

There are a number of ways of choosing the value of $\tau$. If the transformation has been introduced to remedy a particular ill, then $\tau$ will be chosen to minimize some measure of that ill. A particular case of this will be discussed in the next section. If the power transformation is being incorporated into the model to improve its overall health then it would seem most reasonable to treat $\tau$ as a parameter in the model and utilize whatever one believes to be a sound approach to estimation, to estimate $\tau$. This approach has been used for transforming the dependent variable in the general regression situation by Box and Cox [3]. The mechanics of estimating $\tau$ in the RAR model (7.1) will now be discussed from the likelihood and Bayesian points of view.
It was stated in Section 6 that we would assume the situation is such that the standard regression inference approximation is valid. But for the model in (7.1), since we have the added parameter $\tau$, this statement must be amended to say that the likelihood or the posterior is nearly proportional to $L_2(\alpha,\beta,h,\tau) = L_2(t_{a+1},\ldots,t_N|\alpha,\beta,h,\tau,t_1,\ldots,t_a)$. That is, for each fixed $\tau$, standard regression inferences may be made about $\alpha$, $\beta$, and $h$.

Define the $(N-a)\times a$ matrix $T(\tau) = [t_{n-j}^{(\tau)}]$, $n=a+1,\ldots,N$, $j=1,\ldots,a$, and the $(N-a)\times b$ matrix $D = [d_{n,k}]$, $n=a+1,\ldots,N$, $k=1,\ldots,b$. As previously stated, one might want to transform, in addition to $t_{n-j}$, for $j=1,\ldots,a$, some of the other independent variables $d_{n,k}$ for certain values of $k$. To allow for this possibility define $D(\tau)$ to be the $(N-a)\times b$ matrix obtained by transforming a subset of the columns of $D$. Define the $(N-a)\times(a+b)$ matrix $X(\tau) = [T(\tau); D(\tau)]$. We will assume that the columns of $X(\tau)$ are linearly independent. (The discussion would be similar if this restriction were removed.) Let $\theta = (\alpha,\beta)'$; $\hat\theta(\tau) = (X'(\tau)X(\tau))^{-1}X'(\tau)Y$; and $S(\tau) = (Y-X(\tau)\hat\theta(\tau))'(Y-X(\tau)\hat\theta(\tau))$, where $Y$ now denotes the transformed dependent vector $(t_{a+1}^{(\tau)},\ldots,t_N^{(\tau)})'$. Then $\hat\theta(\tau)$ is the maximum likelihood estimate of $\theta$ for a fixed $\tau$ and in the Bayesian approach is the conditional mean of $\theta$ given $\tau$.
For fixed $\tau$ and $h$, $L_2$ as a function of $\theta$ is proportional to a multivariate normal density. Thus the marginal likelihood or the marginal posterior of $\tau$ and $h$ is proportional to

$$\left(\prod_{n=a+1}^{N} t_n^{\tau-1}\right) h^{\frac{N-2a-b}{2}}\,|X'(\tau)X(\tau)|^{-\frac{1}{2}} \exp\left(-\tfrac{1}{2}h\,S(\tau)\right).$$

If $h$ is integrated out and we let $\nu = N-2a-b+2$, the marginal of $\tau$ is proportional to

$$\left(\prod_{n=a+1}^{N} t_n\right)^{\tau-1} |X'(\tau)X(\tau)|^{-\frac{1}{2}}\,S(\tau)^{-\frac{\nu}{2}}. \eqno(7.2)$$

To help keep the calculations within reasonable bounds it is better to compute a function $m(\tau)$ which, as a function of $\tau$, is proportional to (7.2); $m(\tau)$ can be computed by first computing $\log m(\tau)$ and then exponentiating. The estimation of $\tau$ will be governed by $m(\tau)$ and the physical mechanism generating the series. Two possible estimates of $\tau$ are $\hat\tau$, the value of $\tau$ which maximizes $m(\tau)$, and $\bar\tau = \left(\int \tau\, m(\tau)\, d\tau\right)/\left(\int m(\tau)\, d\tau\right)$, the mean likelihood estimate or the posterior mean of $\tau$. Often a value $\tau_0$ will be chosen which is close to an estimate that is believed reliable and which is such that $t_n^{(\tau_0)}$ has an easy to understand interpretation.
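A grid evaluation of $\log m(\tau)$ is direct. The sketch below is an illustration under assumed data, with $a = 1$ and an untransformed constant-plus-trend $D$; the grid and the series are placeholders, not the air pollution example that follows.

```python
import numpy as np

def power(t, tau):
    return np.log(t) if tau == 0 else (t**tau - 1) / tau

def log_m(tau, t, d, a=1):
    N = len(t)
    s = power(t, tau)
    y = s[a:]                                       # transformed dependent vector
    T = np.column_stack([s[a - j:N - j] for j in range(1, a + 1)])
    X = np.hstack([T, d[a:]])                       # X(tau) = [T(tau); D]
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    S = np.sum((y - X @ theta) ** 2)                # S(tau)
    b = X.shape[1] - a
    nu = N - 2 * a - b + 2
    _, logdet = np.linalg.slogdet(X.T @ X)
    return (tau - 1) * np.sum(np.log(t[a:])) - 0.5 * logdet - 0.5 * nu * np.log(S)

rng = np.random.default_rng(2)
n = np.arange(234)
t = np.exp(3 + 0.002 * n + 0.2 * rng.normal(size=234))   # positive stand-in series
d = np.column_stack([np.ones(234), n])
taus = np.arange(-0.5, 1.01, 0.1)
logm = np.array([log_m(tau, t, d) for tau in taus])
m = np.exp(logm - logm.max())                       # relative likelihood m(tau)/m(tau_hat)
print(np.round(taus[np.argmax(m)], 2), np.round(m, 2))
```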
This method of choosing the transformation will be applied to 234 measurements of the level of air pollution in nonurban areas in the United States from 1958 through 1966. These measurements are reported in [18] and resulted from averaging daily observations in two-week intervals. The observations are
24 15 15 19 16 15 11 20 23 18 48 30 33 31 21 39 30 21 28 26
30 22 22 10 25 20 20 11 20 21 17 21 21 28 39 37 31 31 41 21 34 34 37 37 24 20
20 15 20 19 20 13 13 15 19 18 15 21 22 33 31 26 29 33 38 29 42 32 40 35 28 34
29 23 20 24 22 20 16 17 22 19 25 24 18 22 26 27 35 33 39 32 28 32 33 34 23 28
30 30 21 16 21 17 14 21 19 22 20 17 18 18 33 29 30 31 32 34 35 32 36 32 28 26
-
26 28 25 18 19 20 21 29 17 19 17 29 44 36 23 35 35 39 39 35 34 29 28 26 33 30
37 31 20 17 18 18 18 25 21 15 29 27 19 34 29 42 SO 41 39 44 39 30 28 33 28 23
29 36 30 18 19 26 17 18 20 16 29 31 26 22 23 37 28 37 48 39 24 37 45 34 25 29
23 34 38 16 23 24 20 17 18 20 21 28 23 33 28 34 42 31 57 44 42 33 38 39 30 29
32 39 27 22 23 20.
There is a strong yearly seasonal effect (of order 26) and an increasing linear trend. The model fit to the data was (7.1) with $a = 1$ and $b = 4$. The results of the fit are summarized in Table 2. Column 1 gives the values of $\tau$ at which $m(\tau)$ was calculated. Column 2 gives $m(\tau)/10^5$. The maximum of $m(\tau)$ occurs at $\hat\tau = .16$; $m$ is very nearly symmetric about $\tau = .16$. Column 3 gives the values $m(\tau)/m(.16)$, that is, the relative likelihood function. Since $m(0)/m(\hat\tau) = .50$, it is very tempting to use the logarithms of the observations, since it makes sense to think in terms of percentage fluctuation. Columns 4-8 give the values of $\hat\theta(\tau)' = (\hat\alpha_1(\tau), \hat\beta_1(\tau),\ldots,\hat\beta_4(\tau))$. The dependence of $\hat\alpha_1(\tau)$ on $\tau$ is slight in the interval [-.3, .6] where most of the mass of $m(\tau)$ is contained. Column 9 gives the values of $g_1(\tau)$, the coefficient of skewness of the fitted residuals $\hat z_n(\tau)$, for $n=2,\ldots,234$:

$$g_1(\tau) = \sum_{n=2}^{234} \hat z_n^3(\tau) \Big/ \left(\sum_{n=2}^{234} \hat z_n^2(\tau)\right)^{1.5}.$$

The untransformed data (corresponding to $\tau=1$) yields residuals $\hat z_n(1)$ that are skewed to the right. The indication from $m(\tau)$ that some transformation is needed thus seems in large part due to the deviation of these residuals from normality. Indeed it can be seen that in the region of high likelihood the coefficient of skewness has been greatly reduced. At $\tau = \hat\tau = .16$ some skewness is still present but is only roughly a third of the value at $\tau = 1$. A value of $g_1(\tau) = 0$ occurs between -.2 and -.1. At $\tau = 0$ the coefficient of skewness is positive but slight. This perhaps makes taking the logarithms even somewhat more attractive.
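The skewness diagnostic in column 9 is a one-liner. A minimal sketch, with `resid` standing for the fitted residuals $\hat z_n(\tau)$ of any of the fits above:

```python
import numpy as np

def g1(resid):
    # coefficient of skewness: sum z^3 / (sum z^2)^{1.5}
    return np.sum(resid**3) / np.sum(resid**2) ** 1.5
```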
8. VARIANCE HOMOGENIZING TRANSFORMATIONS

Suppose that an initial inspection of a particular time series reveals that the variation of the series is changing more or less continuously through time. By variation is meant the magnitude of the up and down oscillations that have a frequency greater than the frequency of what would be called trend. Variation is used rather than variance since we want to avoid any probabilistic notions; the following analysis will be purely structural.

The above initial inspection might involve plotting the series and making a visual inspection. It also might involve, particularly if the series is a long one, breaking the series into blocks, calculating a measure of the variation in each block (perhaps by a sample variance), and inspecting these measures. Or it might involve calculating a measure of the change in variation of the series.

It is, of course, highly desirable to have a series whose variation does not exhibit any patterned change; that is, to have a series with homogeneous variation. If $t_n$ is a series with inhomogeneous variation then we can try transforming $t_n$ by a power transformation $t_n^{(\tau)}$ in order to homogenize the variation. Choosing a value of $\tau$ to do this will now be discussed.

One possibility would be to make plots of $t_n^{(\tau)}$ for several values of $\tau$ and select a value of $\tau$ by visual inspection which appears to homogenize the variation. An alternative to this graphical method is to develop some measure of the change in variation and select the $\tau$ which minimizes this measure. One way of doing this will now be described.
First a value of $\tau$ is selected, then $t_n$ is transformed to $t_n^{(\tau)}$, and then the trend in $t_n^{(\tau)}$ is removed. This removal procedure will be discussed in more detail a bit later. Let $s_n^{(\tau)}$ be the square of the detrended series and let $\bar s^{(\tau)} = N^{-1}\sum_{n=1}^{N} s_n^{(\tau)}$. One would like the variation of the detrended series to be as uniform as possible. Another way of saying this is that one would like $s_n^{(\tau)}$ to be as near to constant as possible. A measure of how close $s_n^{(\tau)}$ is to a constant is the percentage reduction in sum of squares of the regression of $s_n^{(\tau)}$ on a constant. Percentage rather than absolute reduction is used since the latter is not scale invariant and would therefore not allow meaningful comparisons for different values of $\tau$. Thus the measure of the change in variation is

$$V(\tau) = \sum_{n=1}^{N}\left(s_n^{(\tau)} - \bar s^{(\tau)}\right)^2 \Big/ \sum_{n=1}^{N}\left(s_n^{(\tau)}\right)^2.$$

The value of $\tau$ which minimizes $V(\tau)$ will be used in the transformation.
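A sketch of computing $V(\tau)$, assuming the trend is removed by least squares with a quadratic polynomial in $n$ (the choice used in the two examples later in this section); the series is a stand-in with seasonal oscillations whose magnitude grows with the level.

```python
import numpy as np

def power(t, tau):
    return np.log(t) if tau == 0 else (t**tau - 1) / tau

def V(tau, t):
    n = np.arange(len(t))
    y = power(t, tau)
    Q = np.column_stack([np.ones(len(t)), n, n**2])      # quadratic trend in n
    coef, *_ = np.linalg.lstsq(Q, y, rcond=None)
    s = (y - Q @ coef) ** 2                              # s_n(tau): squared detrended series
    return np.sum((s - s.mean()) ** 2) / np.sum(s**2)    # small when s_n is near constant

rng = np.random.default_rng(3)
n = np.arange(144)
t = np.exp(0.01 * n + 0.15 * np.sin(2 * np.pi * n / 12) + 0.05 * rng.normal(size=144))
for tau in (-0.5, 0.0, 0.5, 1.0):
    print(tau, round(V(tau, t), 3))
```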
The trend in the series can be removed by calculating either a moving average of $t_n^{(\tau)}$ and subtracting it from $t_n^{(\tau)}$, or by fitting some function to $t_n^{(\tau)}$ using least squares and subtracting the fitted function from $t_n^{(\tau)}$. In choosing the coefficients in the moving average or the particular function to be fit, sufficient latitude must be given to be able to account for the trend of $t_n^{(\tau)}$ for all $\tau$ in the range of interest. Sometimes there is slow variation in the series which has a higher frequency than the variation that would be called trend but which is homogeneous and has a lower frequency than the variation which shows a pattern. In this case it will sometimes be useful to remove this slow variation by a moving average before calculating $V(\tau)$.

Another possibility for removing inhomogeneities in the variation of $t_n$, which won't be described in detail here, would be to transform $t_n$ to $c_n t_n$ where $c_n$ is some positive, smooth function of $n$. The above measure $V$ can be used to assess the relative success of various $c_n$ sequences in removing inhomogeneities.
The technique of using a power transformation to homogenize the variation was applied to two series. The first is the well-known monthly international airline passenger series ([4, p. 531] or [5, p. 429]). The second is the monthly inward station movement series reported in [19], which consists of the number of telephones installed each month in Wisconsin from January 1951 through October 1966. Both series have an increasing trend and seasonal oscillations (of order 12) whose magnitudes are increasing with time. For both analyses a quadratic polynomial was used to eliminate trend.

For the airline series, $V(\tau)$ was calculated at the 11 equally spaced points -.5, -.4, ..., .5. The values of $V(\tau)$ corresponding to these points were .618, .596, .580, .572, .572, .568, .593, .611, .630, .651, and .672. The minimum of $V(\tau)$ occurs at $\tau = 0$. From a visual inspection of a graph of the logarithms of the series (cf. [4, p. 308]) it is seen that the increasing trend in the magnitudes of the seasonal oscillations of the series has been removed. Indeed, it makes sense to take logarithms since, as Box and Jenkins have written, it is percentage fluctuation which might be expected to be comparable at different sales volumes.

For the telephone data $V(\tau)$ was calculated at the points -1.2, -1.1, -1.0, -.9, -.8, 0, .5 and 1. The values of $V(\tau)$ corresponding to these values of $\tau$ were .568, .564, .562, .562, .565, .664, .747 and .815. The minimum of $V(\tau)$ occurs near -1.0. Again, from a visual inspection of a graph of the inverses of the telephone data it is seen that the increase in the magnitudes of the seasonal oscillations has been removed. The inverses have the interpretation of being the average time between installations during each month.
9. CALCULATION OF ALL SUBSET REGRESSIONS UP TO A PARTICULAR SIZE FOR A STATIONARY AUTOREGRESSION WITH MEAN 0

Suppose $t_n$ is a stationary autoregression with mean 0 and the standard regression inference approximation of Section 3 is justified. That is, the likelihood function or the posterior density is approximately proportional to

$$h^{\frac{N-a}{2}} \exp\left(-\tfrac{1}{2}h\sum_{n=a+1}^{N}\left(t_n - \sum_{j=1}^{a}\alpha_j t_{n-j}\right)^2\right). \eqno(9.1)$$

If the model is written in the form (1.2) then $X = [t_{n-j}]$, $n=a+1,\ldots,N$, $j=1,\ldots,a$, and $Y = (t_{a+1},\ldots,t_N)'$. Let $\hat\alpha = (X'X)^{-1}X'Y$; then (9.1) is

$$h^{\frac{N-a}{2}} \exp\left(-\tfrac{1}{2}h\left[(Y-X\hat\alpha)'(Y-X\hat\alpha) + (\alpha-\hat\alpha)'X'X(\alpha-\hat\alpha)\right]\right). \eqno(9.2)$$

The $(i,j)$th element of $X'X$ is

$$\sum_{n=a+1}^{N} t_{n-i}t_{n-j} \eqno(9.3)$$

and the $j$th element of $X'Y$ is

$$\sum_{n=a+1}^{N} t_{n-j}t_n. \eqno(9.4)$$

Let $c_k = \sum_n t_n t_{n+k}$, the sum extending over those $n$ for which both factors are observed. If $N$ is sufficiently large, (9.3) may be approximated by $c_{|i-j|}$ and (9.4) by $c_j$. Let $\Gamma = [c_{|i-j|}]$, $i=1,\ldots,a$, $j=1,\ldots,a$; $\gamma = (c_1,\ldots,c_a)'$; $\tilde\alpha = (\tilde\alpha_1,\ldots,\tilde\alpha_a)' = \Gamma^{-1}\gamma$; and $\tilde h = (N-a)/(c_0 - \sum_{j=1}^{a}\tilde\alpha_j c_j)$. Then (9.2) is approximately

$$h^{\frac{N-a}{2}} \exp\left(-\tfrac{1}{2}h\left[\frac{N-a}{\tilde h} + (\alpha-\tilde\alpha)'\Gamma(\alpha-\tilde\alpha)\right]\right),$$

and $\tilde\alpha$ and $\tilde h$, the values of $\alpha$ and $h$ which maximize this function, are the well-known Yule-Walker estimates (cf. [13, p. 476]) of $\alpha$ and $h$.
It is desirable, just as in the standard regression model, to investigate which of the columns of $X$ could be deleted and still maintain a good fit. That is, which of the $\alpha_j$, for $j=1,\ldots,a$, can be assumed to be zero. One way to carry out this investigation is to calculate all subset regressions up to a particular size. In the particular regression problem of this section, $X'X$ and $X'Y$ have a very special structure: $\Gamma$ is a Toeplitz matrix and its elements determine in a very special way those of $\gamma$. One would expect therefore to be able to devise an algorithm for calculating all subset regressions up to a particular size which makes efficient use of this special structure. One such algorithm will now be described.

Let $X_0$ denote the dependent variable $Y$, and let $X(i_0|i_1,\ldots,i_r)$ denote the regression of $X_{i_0}$ on $X_{i_1},\ldots,X_{i_r}$. Then $X(0|j) = (c_j/c_0)X_j$, for $j=1,\ldots,a$, which gives all subset regressions of size 1. Now suppose all subset regressions of size $r$ have been computed; all subset regressions of size $r+1$ can then be computed in the following manner. Let $i_1,\ldots,i_{r+1}$ be a subset of $1,\ldots,a$. Then $X(0|i_1,\ldots,i_{r+1})$ is equal to $X(0|i_1,\ldots,i_r)$ plus the regression of $X_0 - X(0|i_1,\ldots,i_r)$ on $X_{i_{r+1}} - X(i_{r+1}|i_1,\ldots,i_r)$.

The saving in the algorithm comes from noting that $X(i_{r+1}|i_1,\ldots,i_r)$ essentially has already been calculated. Let $j_k = i_{r+1} - i_k$, for $k=1,\ldots,r$, and suppose $X(0|j_1,\ldots,j_r) = \xi_1 X_{j_1} + \cdots + \xi_r X_{j_r}$, which has already been computed. Then it is easy to see, in view of the special structure of $\Gamma$ and $\gamma$, that

$$X(i_{r+1}|i_1,\ldots,i_r) = \xi_1 X_{i_1} + \cdots + \xi_r X_{i_r},$$

and the sum of squares of $X_{i_{r+1}} - X(i_{r+1}|i_1,\ldots,i_r)$ equals $c_0 - \xi_1 c_{j_1} - \cdots - \xi_r c_{j_r}$. From these calculations the regression $X(0|i_1,\ldots,i_{r+1}) = \eta_1 X_{i_1} + \cdots + \eta_{r+1} X_{i_{r+1}}$ is obtained, and the residual sum of squares of this regression of $X_0$ on $X_{i_1},\ldots,X_{i_{r+1}}$ is $c_0 - \eta_1 c_{i_1} - \cdots - \eta_{r+1} c_{i_{r+1}}$.
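The same residual sums of squares can be computed without the recursion, which makes the structure easy to verify. This sketch solves, for every lag subset, the normal equations built directly from the $c_k$ and reports $c_0 - \eta_1 c_{i_1} - \cdots - \eta_r c_{i_r}$; the recursion above produces the same numbers with less work. The simulated series is illustrative.

```python
import numpy as np
from itertools import combinations

def subset_autoregressions(t, a, max_size):
    N = len(t)
    c = np.array([t[: N - k] @ t[k:] for k in range(a + 1)])   # c_0, ..., c_a
    out = {}
    for r in range(1, max_size + 1):
        for lags in combinations(range(1, a + 1), r):
            G = np.array([[c[abs(i - j)] for j in lags] for i in lags])  # Toeplitz pieces
            g = np.array([c[i] for i in lags])
            eta = np.linalg.solve(G, g)                        # regression coefficients
            out[lags] = (eta, c[0] - eta @ g)                  # coefs, residual SS
    return out

rng = np.random.default_rng(4)
t = np.zeros(500)
for n in range(2, 500):
    t[n] = 0.6 * t[n - 1] - 0.2 * t[n - 2] + rng.normal()
for lags, (eta, rss) in subset_autoregressions(t, a=4, max_size=2).items():
    print(lags, np.round(eta, 2), round(rss, 1))
```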
The assumptions being made in this section are that $t_n$ has mean 0 and is stationary. Such assumptions will generally arise when the $t_n$ are the residuals after subtracting off an estimate of trend, or the $t_n$ are the result of differencing an original time series (cf. [4, p. 85]). Such a stagewise procedure is not the most efficient with regard to estimation but does tend to make working with the series less complicated.
A note of skepticism will now be entered about calculating subset autoregressions. We have been able to exploit the special structure of $\Gamma$ and $\gamma$ in order to reduce the computations in calculating subset autoregressions. The question, however, is whether there is generally much marginal benefit from calculating subset autoregressions after the estimates of the autocorrelations and inverse autocorrelations [7] have been studied. In the general regression situation $X'X$ and $X'Y$ are usually too complicated to simply look at and decide which independent variables should be in the regression model. Consequently a crutch such as calculating subset regressions is needed. But for the stationary RAR model of this section we can do a fairly good job of looking at the values of $\Gamma$ and $\gamma$ (i.e., looking at the autocorrelation function) and making some statement about which of the $\alpha_j$ should be in the model.
10. FITTING THE NONLINEAR SEASONAL RAR MODEL

The particularly useful nonlinear seasonal RAR model discussed in [4, p. 300] may be written

$$\alpha(B)\pi(B)t_n = \sum_{k=1}^{b}\beta_k d_{n,k} + z_n, \eqno(10.1)$$

where $\alpha(B) = 1-\alpha_1 B-\cdots-\alpha_a B^a$ and $\pi(B) = 1-\pi_1 B-\cdots-\pi_p B^p$ are polynomials in the backward shift operator $B$. Under the standard regression inference approximation, the likelihood of $h$, $\alpha$, $\pi$, and $\beta$ is nearly proportional to

$$F(h,\alpha,\pi,\beta) = h^{\frac{N-a-p}{2}}\exp\left(-\tfrac{1}{2}h\,SS(\alpha,\pi,\beta)\right), \qquad SS(\alpha,\pi,\beta) = \sum_{n=a+p+1}^{N}\left(\alpha(B)\pi(B)t_n - \sum_{k=1}^{b}\beta_k d_{n,k}\right)^2.$$

The likelihood is maximized by $\hat\alpha$, $\hat\pi$, and $\hat\beta$, which minimize $SS$, that is, the least squares estimates, and $\hat h = (N-a-p)/SS(\hat\alpha,\hat\pi,\hat\beta)$.
One possibility for iteratively finding the least squares solution of this nonlinear least squares problem is to use an algorithm such as the Gauss method (cf. [16, p. 148] or [8, p. 267]), which is capable of handling the general nonlinear least squares problem. But the special structure of $SS$ suggests an alternative algorithm. If $\alpha$ is fixed, then the minimum over $\pi$ and $\beta$ of $SS(\alpha,\pi,\beta)$ can be found by the usual linear least squares formula, since $SS(\alpha,\pi,\beta)$ is linear in $\pi$ and $\beta$ for $\alpha$ fixed. A similar statement holds with $\alpha$ and $\pi$ interchanged. A quick description of the algorithm is: minimize $SS(\alpha,\pi,\beta)$ over $\pi$ and $\beta$, for fixed $\alpha$; then minimize over $\alpha$ and $\beta$, for fixed $\pi$; and repeat this process until the points converge.

A more detailed description will now be given. In reading it, it should be kept in mind that since the polynomials commute, $\alpha(B)\pi(B)t_n = \pi(B)\alpha(B)t_n$. Also, $0_j$ will be used to denote a vector of $j$ zeros.

1. The starting values $\alpha^0 = 0_a$, $\pi^0 = 0_p$, and $\beta^0 = 0_b$ are generally quite satisfactory.
2. Suppose at the completion of the $r$th iteration the values of the parameters are $\alpha^r$, $\pi^r$, and $\beta^r$. The $(r+1)$st iteration is given by steps 3 and 4.

3. Minimize $SS(\alpha^r,\pi,\beta)$ over $\pi$ and $\beta$. This is done by calculating $\tilde t_n = \alpha^r(B)t_n$ for $n=a+1,\ldots,N$ and minimizing

$$\sum_{n=a+p+1}^{N}\left(\pi(B)\tilde t_n - \sum_{k=1}^{b}\beta_k d_{n,k}\right)^2$$

over $\pi$ and $\beta$ using the linear least squares formula, yielding $\pi^{r+1}$.

4. Minimize $SS(\alpha,\pi^{r+1},\beta)$ over $\alpha$ and $\beta$. This is done by calculating $\tilde t_n = \pi^{r+1}(B)t_n$ for $n=p+1,\ldots,N$ and minimizing

$$\sum_{n=a+p+1}^{N}\left(\alpha(B)\tilde t_n - \sum_{k=1}^{b}\beta_k d_{n,k}\right)^2$$

over $\alpha$ and $\beta$ using the linear least squares formula, yielding $\alpha^{r+1}$ and $\beta^{r+1}$. This completes the $(r+1)$st iteration and the algorithm is at the point $\alpha^{r+1}$, $\pi^{r+1}$, $\beta^{r+1}$.
5. A reasonable stopping point can be given in terms of the likelihood function approximation $F$. Define $h^r$ to be the value of $h$ which maximizes $F(h,\alpha^r,\pi^r,\beta^r)$; then $h^r = (N-a-p)/SS(\alpha^r,\pi^r,\beta^r)$. The algorithm can be stopped after the $r$th iteration if the likelihood ratio

$$\frac{L(h^r,\alpha^r,\pi^r,\beta^r)}{L(h^{r-1},\alpha^{r-1},\pi^{r-1},\beta^{r-1})} = \left(\frac{h^r}{h^{r-1}}\right)^{\frac{N-a-p}{2}}$$

is nearly one, or equivalently, if $\frac{N-a-p}{2}\log\frac{h^r}{h^{r-1}}$ is nearly 0.
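For the example model fit later in this section, $(1-\alpha_1 B)(1-\pi_1 B^{12})t_n = z_n$ with no regression part, each half-step is a one-parameter linear fit, so the whole algorithm is a few lines. A sketch under simulated data:

```python
import numpy as np

def relax(t, iters=25, tol=1e-9):
    a, p = 1, 12
    n0 = a + p                                   # sums run over n = a+p+1, ..., N
    alpha = pi = 0.0                             # step 1: zero starting values
    h_prev = None
    for _ in range(iters):
        u = t - alpha * np.roll(t, 1)            # u_n = alpha(B) t_n  (u[0] unused)
        x = u[n0 - 12:-12]                       # lag-12 values of u
        pi = (u[n0:] @ x) / (x @ x)              # step 3: minimize over pi
        v = t - pi * np.roll(t, 12)              # v_n = pi(B) t_n  (v[:12] unused)
        x = v[n0 - 1:-1]                         # lag-1 values of v
        alpha = (v[n0:] @ x) / (x @ x)           # step 4: minimize over alpha
        resid = v[n0:] - alpha * x
        h = (len(t) - n0) / (resid @ resid)      # h_r = (N-a-p)/SS
        # step 5: stop when ((N-a-p)/2) log(h_r/h_{r-1}) is nearly 0
        if h_prev is not None and abs(0.5 * (len(t) - n0) * np.log(h / h_prev)) < tol:
            break
        h_prev = h
    return alpha, pi, 1.0 / h

rng = np.random.default_rng(5)
t = np.zeros(400)
for n in range(13, 400):                         # simulate alpha_1=.56, pi_1=.9
    t[n] = 0.56 * t[n-1] + 0.9 * t[n-12] - 0.56 * 0.9 * t[n-13] + rng.normal()
print(np.round(relax(t), 4))
```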
In the remainder of the paper this algorithm will be referred to as the relaxation algorithm. (Actually it is only one particular relaxation algorithm.) It will now be shown that the relaxation algorithm converges (mathematically, that is, ignoring round-off error) to a local minimum of $SS$. The method of proof easily extends to the case where there are more than two polynomials of $B$ in (10.1).
Let $\alpha^k$, for $k=1,2,\ldots$, be a sequence of points converging to a point $\alpha^0$ (using the usual Euclidean distance). Let $\pi^k$ and $\beta^k$, for $k=0,1,\ldots$, denote the unique values of $\pi$ and $\beta$ which minimize $SS(\alpha^k,\pi,\beta)$ over $\pi$ and $\beta$ for fixed $\alpha^k$. Let $\Phi^k = (\alpha^k,\pi^k,\beta^k)$, for $k=0,1,2,\ldots$. We will show that $\Phi^k$ converges to $\Phi^0$.

First note that $SS$ is a non-negative polynomial in $\alpha$, $\pi$, and $\beta$. Since $SS(\Phi^k) \le SS(\alpha^k,0_p,0_b)$, and these latter numbers are uniformly bounded in $k$, the $\Phi^k$ lie in a closed bounded region $R$. $SS$ is uniformly continuous on $R$; given $\varepsilon > 0$ we can construct a family of spheres $S(\Phi)$ of radius $\delta$ about the points $\Phi$ such that $|SS(\tilde\Phi) - SS(\Phi)| < \varepsilon$ for all $\tilde\Phi$ in $S(\Phi)$. If the Euclidean distance of $(\alpha^k,\pi^0,\beta^0)$ from $(\alpha^0,\pi^0,\beta^0)$ is less than $\delta$, then $SS(\Phi^k) \le SS(\alpha^k,\pi^0,\beta^0) < SS(\Phi^0) + \varepsilon$. By a similar argument $SS(\Phi^0) < SS(\Phi^k) + \varepsilon$. Thus $SS(\Phi^k)$ converges to $SS(\Phi^0)$. The $\Phi^k$ lie in a bounded region and therefore have a point of accumulation, $(\alpha^0,\pi^*,\beta^*)$. Now $SS(\alpha^0,\pi^*,\beta^*) = SS(\Phi^0)$. But since $\pi^0$ and $\beta^0$ are the unique values at which the minimum of $SS(\alpha^0,\pi,\beta)$ over $\pi$ and $\beta$ occurs, $\pi^* = \pi^0$ and $\beta^* = \beta^0$. Thus $\Phi^k$ converges to $\Phi^0$. By a completely analogous proof this same convergence occurs if the roles of $\alpha$ and $\pi$ are interchanged.

Now let $\Phi^r = (\alpha^r,\pi^r,\beta^r)$ be a sequence of points of the relaxation algorithm. Since $SS(\Phi^r) \le SS(\Phi^{r-1})$ and $SS$ is a non-negative polynomial, the $\Phi^r$ lie in a bounded region and have a point of accumulation $\Phi^* = (\alpha^*,\pi^*,\beta^*)$. Let $\Psi^r$ be a subsequence of $\Phi^r$ which converges to $\Phi^*$, and let $\gamma^r$ be the subsequence of $\Psi^r$ which corresponds to minimization with respect to $\alpha$ and $\beta$. (If there are only a finite number of terms in this subsequence then consider the other minimization.) Let $\hat\gamma^r$ be the point to which the algorithm moves as a result of this minimization. From the convergence fact proved in the previous paragraph, $\hat\gamma^r$ converges to the point $\hat\Phi = (\hat\alpha,\pi^*,\hat\beta)$, where $\hat\alpha$ and $\hat\beta$ minimize $SS(\alpha,\pi^*,\beta)$ over $\alpha$ and $\beta$ with $\pi^*$ fixed. Now suppose $(\hat\alpha,\hat\beta) \ne (\alpha^*,\beta^*)$. Then since $(\hat\alpha,\hat\beta)$ is the unique point of minimization, we would have $SS(\hat\Phi) < SS(\Phi^*)$. But since $SS(\hat\gamma^r)$ converges to $SS(\hat\Phi)$, we would have $SS(\hat\gamma^r) < SS(\Phi^*)$ for some $r$, which contradicts the fact that $\Phi^*$ is a point of accumulation of the sequence along which $SS$ is non-increasing. Therefore $\hat\Phi = \Phi^*$; that is, $\Phi^*$ minimizes $SS$ with respect to $\alpha$ and $\beta$ for $\pi^*$ fixed. A similar statement holds with $\alpha$ replaced by $\pi$. Furthermore, since neither minimization can take the algorithm infinitely often out of a small neighborhood of $\Phi^*$, we see that $\Phi^*$ is a local minimum of $SS$. Thus the algorithm converges to a local minimum of $SS$.
The relaxation algorithm has the property that it tends very quickly to get in the general area of a minimum but has difficulty descending a long narrow valley to a minimum. This difficulty can often be easily overcome by using the elegant pattern move part of the Hooke and Jeeves [11] algorithm. This would be used after the $r$th iteration in the following manner. Let $\Phi^r = (\alpha^r,\pi^r,\beta^r)$ be the point to which the algorithm moves at the $r$th iteration. The result of the $r$th iteration is a move from the point $\Phi^{r-1}$ to the point $\Phi^r$; the direction of the move is $\Phi^r - \Phi^{r-1}$. The pattern move goes in the direction $\Phi^r - \Phi^{r-1}$ but goes twice as far; that is, the move is to $\Phi^{r-1} + 2(\Phi^r - \Phi^{r-1})$. The pattern move exploits the common sense notion that a little more of a good thing might even be better. Now the relaxation algorithm is used to improve this point; that is, steps 3 and 4 are carried out at the point $\Phi^{r-1} + 2(\Phi^r - \Phi^{r-1})$, yielding a point $\Phi^{r+1}$. If the resulting point is an improvement over $\Phi^r$, then repeat the pattern move by going to $\Phi^r + 2(\Phi^{r+1} - \Phi^r)$. If the point is not an improvement over $\Phi^r$, then return to $\Phi^r$ and carry out the relaxation algorithm as usual.
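A sketch of wrapping one pattern move around a relaxation iteration, with `relax_step` standing for one application of steps 3 and 4 and `SS` for the sum of squares; both are assumed supplied by the fitting code, and the parameter points are numeric vectors.

```python
def pattern_move(phi_prev, phi, relax_step, SS):
    """phi_prev, phi: parameter vectors after iterations r-1 and r."""
    # improve the pattern point phi^{r-1} + 2(phi^r - phi^{r-1}) = 2 phi - phi_prev
    trial = relax_step(2 * phi - phi_prev)
    # keep the improved pattern point only if it beats phi^r; otherwise
    # return to phi^r and take an ordinary relaxation step
    return trial if SS(trial) < SS(phi) else relax_step(phi)
```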
Just what strategy one uses in inserting pattern moves will depend on the type of computing system used. If batch processing is used, then one strategy would be to allow the relaxation algorithm to run for a number of iterations without any pattern moves and then use pattern moves at every iteration. If an interactive system is used, the strategy can be defined as the algorithm proceeds; the programmer can insert the pattern move when it seems advisable.

Another procedure would be to combine the relaxation algorithm with the Gauss method. The relaxation algorithm tends to get in the neighborhood of a minimum more quickly than the Gauss method, and a single iteration of the former requires less computation than a single iteration of the latter. But the Gauss method seems to be a better valley descender. Thus the two might be combined by alternating them, or perhaps by letting the relaxation method run for the first few iterations and the Gauss algorithm thereafter.

If time permits it is advantageous to use more than one algorithm. This tends to guard against the possibility of thinking a minimum has been found when, in fact, the algorithm has merely slowed down.
As an example the model $(1-\alpha_1 B)(1-\pi_1 B^{12})t_n = z_n$ was fit to the monthly birth rate data reported in [6, p. 240]. The results of using the relaxation algorithm are shown in Table 3. Convergence was quite rapid and it was unnecessary to use a pattern move. The results of the Gauss method are shown in Table 4. The convergence was also rapid, but not quite as fast as the relaxation algorithm. It is reassuring to see convergence to the same values.
Table 1. POSTERIOR AND APPROXIMATION

  $\alpha_1$   $m_1(\alpha_1)$   $m_2(\alpha_1)$   $m_1(\alpha_1)/m_2(\alpha_1)$
  .00          .101              .097              1.036
  .04          .425              .416              1.021
  .08          1.320             1.308             1.010
  .12          2.984             2.980             1.001
  .16          4.874             4.893             .996
  .20          5.724             5.757             .994
  .24          4.825             4.848             .995
  .28          2.923             2.926             .999
  .32          1.279             1.273             1.005
  .36          .407              .402              1.014
  .40          .095              .093              1.024
Table 2. TRANSFORMATION OF AIR POLLUTION DATA

  $\tau$   $m(\tau)/10^5$   $m(\tau)/m(.16)$   $\hat\alpha_1(\tau)$   $\hat\beta_1(\tau)$   $\hat\beta_2(\tau)$   $\hat\beta_3(\tau)$   $\hat\beta_4(\tau)$   $g_1(\tau)$
  -1.0     .01              .00                .17                    --                    --                    --                    --                    -.98
   -.5     .60              .01                .15                    --                    --                    --                    --                    -.14
   -.3     .78              .02                .13                    1.27                  --                    --                    --                    -.03
   -.2     1.65             .04                .12                    1.90                  .07                   --                    --                    -.01
   -.1     6.98             .16                .11                    2.21                  .07                   -.21                  -.01                   .05
    0      21.69            .50                .11                    2.59                  .07                   -.29                  -.01                   .12
    .1     38.94            .90                .11                    3.07                  .07                   -.41                  -.02                   .19
    .16    43.03            1.00               .10                    3.41                  .06                   -.50                  -.02                   .23
    .2     41.60            .97                .10                    3.66                  .06                   -.57                  -.03                   .25
    .3     27.27            .63                .10                    4.41                  .06                   -.75                  -.04                   .31
    .4     11.26            .26                .10                    5.35                  .05                   -1.10                 -.05                   .36
    .5     3.00             .07                .10                    6.56                  .04                   -1.52                 -.07                   .41
    .6     .53              .01                .10                    8.10                  .04                   -2.11                 -.09                   .46
   1.0     .00              $10^{-5}$          .11                    20.26                 .03                   -7.85                 -.25                   .68
Table 3. RELAXATION ALGORITHM

  $r$   $\alpha_1^r$   $\pi_1^r$   $(h^r)^{-1}$   $61.5\,\log(h^r/h^{r-1})$
  0     0              0           601.91
  1     .55993         .99990      .35104         443.09
  2     .56168         .99844      .35079         .042801
  3     .56171         .99842      .35079         $4.4239\times10^{-6}$
  4     .56171         .99842      .35079         $1.2463\times10^{-9}$
Table 4. GAUSS METHOD

  $r$   $\alpha_1^r$   $\pi_1^r$   $(h^r)^{-1}$   $61.5\,\log(h^r/h^{r-1})$
  0     0              0           601.91
  1     .32183         .67829      28.963         180.53
  2     .64387         .84687      2.0996         156.14
  3     .70624         .97076      .39487         99.423
  4     .59214         1.0077      .36050         5.4183
  5     .55488         .99751      .35093         1.6007
  6     .56235         .99842      .35079         .02296
  7     .56171         .99842      .35079         $3.5179\times10^{-5}$
  8     .56171         .99842      .35079         $7.9949\times10^{-8}$
REFERENCES

[1] Anscombe, F.J. and Tukey, John W., "The examination and analysis of residuals," Technometrics, 5, 1963, 141-160.

[2] Barnard, G.A.; Jenkins, G.M.; and Winsten, C.B., "Likelihood inference and time series," Journal of the Royal Statistical Society A-125, 1962, 321-372.

[3] Box, G.E.P. and Cox, D.R., "An analysis of transformations," Journal of the Royal Statistical Society B-26, 1964, 211-252.

[4] Box, George E.P. and Jenkins, Gwilym M., Time Series Analysis, Holden-Day, San Francisco, 1970.

[5] Brown, Robert G., Smoothing, Forecasting and Prediction of Discrete Time Series, Prentice-Hall, New Jersey, 1962.

[6] Chakravarti, I.M.; Laha, R.G.; and Roy, J., Handbook of Methods of Applied Statistics, Vol. 1, Wiley, New York, 1967.

[7] Cleveland, William S., "The inverse autocorrelations of a time series," Institute of Statistics Mimeo Series No. 689, Department of Statistics, University of North Carolina, Chapel Hill, 1970 (to appear in Technometrics).

[8] Draper, N.R. and Smith, H., Applied Regression Analysis, Wiley, New York, 1966.

[9] Durbin, J., "Estimation of parameters in time series regression models," Journal of the Royal Statistical Society B-22, 1960, 139-153.

[10] Edwards, Ward; Lindman, Harold; and Savage, Leonard J., "Bayesian statistical inference for psychological research," Psychological Review, 70, 1963, 193-242.

[11] Hooke, R. and Jeeves, T.A., "Direct search solution of numerical and statistical problems," Journal of the Association for Computing Machinery, 8, 1961, 212-229.

[12] Kalbfleisch, John D. and Sprott, D.A., "Application of likelihood methods to models involving large numbers of parameters," Journal of the Royal Statistical Society B-32, 1970, 175-208.

[13] Kendall, Maurice G. and Stuart, Alan, The Advanced Theory of Statistics, Vol. 3, Griffin, London, 1966.

[14] Mann, H.B. and Wald, A., "On the statistical treatment of linear stochastic difference equations," Econometrica, 11, 1943, 173-220.

[15] Parzen, Emanuel, "An approach to time series analysis," Annals of Mathematical Statistics, 32, 1961, 951-989.

[16] Powell, M.J.D., "Minimization of functions of several variables," Numerical Analysis: An Introduction, J. Walsh, ed., Academic Press, London, 1966, 143-157.

[17] Raiffa, Howard and Schlaifer, Robert, Applied Statistical Decision Theory, Division of Research, Harvard Business School, Boston, 1961.

[18] Spirtas, Robert and Levin, Howard J., "Patterns and trends in levels of suspended particulate matter at 78 NASN sites from 1957 through 1966," presented at the Annual Meeting of the Air Pollution Control Association, New York, June 22-26, 1969.

[19] Tiao, G.C. and Thompson, H.E., "Analysis of telephone data," Technical Report No. 222, Department of Statistics, University of Wisconsin, 1970.

[20] Valley, Shea L., Handbook of Geophysics and Space Environments, McGraw-Hill, New York, 1965.