Gallant, A. R. and Goebel, J. J. (1975). "Nonlinear regression with autoregressive errors."

NONLINEAR REGRESSION WITH AUTOREGRESSIVE ERRORS
by
A. R. GALLANT and J. J. GOEBEL
Institute of Statistics
Mimeograph Series #986
Raleigh - March 1975
ABSTRACT

The article sets forth an estimator of the parameters of a nonlinear time series regression when the statistical behavior of the disturbances can be reasonably approximated by an autoregressive model. The sampling distribution of the estimator and relevant statistics is investigated both theoretically and using Monte-Carlo simulations.
NONLINEAR REGRESSION WITH AUTOREGRESSIVE ERRORS

A. R. Gallant and J. J. Goebel*

*A. R. Gallant is assistant professor of statistics and economics, Department of Statistics, North Carolina State University, Raleigh, N. C. J. J. Goebel is assistant professor of statistics, Department of Statistics, Iowa State University, Ames, Iowa 50010.
1. INTRODUCTION
This article examines estimation of the unknown parameter $\theta^*$ of the nonlinear time series regression model

$$y_t = f(x_t, \theta^*) + u_t \qquad (t = 1, 2, \ldots, n).$$

The process $\{u_t\}_{t=-\infty}^{\infty}$ generating the realized disturbances is assumed to be a covariance stationary time series.
That is to say, the covariances of the time series depend only on the gap $h$ and not on the position $t$ in time. In consequence, the variance-covariance matrix $\Gamma_n$ of the $(n \times 1)$ disturbance vector $u = (u_1, u_2, \ldots, u_n)'$ will have a banded structure with typical element $\gamma_{ij} = \gamma(i-j)$, where $\gamma(h)$ $(h = 0, \pm 1, \pm 2, \ldots)$ is the autocovariance function of the process.

The appropriate estimator of $\theta^*$, were $\Gamma_n$ known, is the generalized nonlinear least squares estimator. Specifically, one would estimate $\theta^*$ by minimizing

$$[y - f(\theta)]' \Gamma_n^{-1} [y - f(\theta)]$$

where $y = (y_1, y_2, \ldots, y_n)'$ $(n \times 1)$ and $f(\theta) = (f(x_1, \theta), f(x_2, \theta), \ldots, f(x_n, \theta))'$ $(n \times 1)$.
When $\Gamma_n$ is not known, as we assume here, the obvious approach is to substitute an estimator of $\Gamma_n$ in the formula above. Recently, Hannan [6] and Goebel [5] have obtained estimators for $\theta^*$ employing this approach. They use a circular symmetric matrix [4, Ch. 4] to approximate $\Gamma_n$. The approximating matrix has easily computed eigenvectors which do not depend on any unknown parameters. This fact considerably reduces computational difficulties and avoids inversion of a matrix of the same order as the sample size; see, e.g., [3]. The eigenvalues of the approximating matrix are proportional to the spectral density of the time series $\{u_t\}_{t=-\infty}^{\infty}$ at appropriate frequencies. Hannan and Goebel estimate the spectral density from ordinary nonlinear least squares residuals to obtain consistent estimates of the eigenvalues of the approximating matrix. Their estimators differ in choice of procedures for estimating the spectral density but share the same desirable asymptotic properties under fairly unrestrictive assumptions on the time series generating the disturbances. Moreover, the Monte-Carlo simulations reported by Goebel indicate that this type of estimator has better efficiency in small samples than the ordinary nonlinear least squares estimator.
Presently, we shall demonstrate by an example that the approximating matrix used by Hannan and Goebel can be very inaccurate in small samples. This is not due to sampling variation but due to the form of a circular symmetric matrix which, in effect, imposes the unrealistic condition $\gamma(h) = \gamma(n-h)$ $(h = 1, 2, \ldots, n)$ on the approximating autocovariances. Often, the serially correlated disturbances appear to satisfy more restrictive assumptions than Hannan and Goebel were willing to assume. In these cases, it would seem possible to exploit these assumptions and obtain a better approximating matrix. One would naturally expect that the use of a better approximating matrix in the generalized nonlinear least squares formula would lead to better small sample efficiency. In view of Hannan and Goebel's theoretical results, we would not expect to obtain better large sample efficiency.
An assumption which is frequently satisfied in applications - at least to within errors which can reasonably be ascribed to sampling variation - is that the disturbances can be reduced to a white noise process by using a short linear filter. To be more specific, the time series $\{u_t\}_{t=-\infty}^{\infty}$ is assumed to satisfy the equations

$$u_t + a_1 u_{t-1} + \cdots + a_q u_{t-q} = e_t \qquad (t = 0, \pm 1, \pm 2, \ldots)$$

where $\{e_t\}_{t=-\infty}^{\infty}$ is a sequence of independently and identically distributed random variables each with mean zero and variance $\sigma^2$. In addition, we assume that the roots of the polynomial $m^q + a_1 m^{q-1} + \cdots + a_q$ are less than one in absolute value. A time series $\{u_t\}_{t=-\infty}^{\infty}$ which satisfies these assumptions is called an autoregressive process of order $q$.
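Such a process is simple to simulate. The following sketch (our illustration, not part of the original text) generates an autoregressive process of order $q = 2$; the coefficients are those fitted to the Wholesale Price Index later in this section, and the burn-in length is our own choice.

```python
import numpy as np

# A minimal sketch: simulate u_t + a1*u_{t-1} + a2*u_{t-2} = e_t.
a = np.array([-1.048, 0.1287])      # (a1, a2), from the WPI fit below
sigma2 = 34.09                      # variance of e_t, from the WPI fit below
rng = np.random.default_rng(0)

n, burn = 254, 500                  # burn-in lets the process reach stationarity
e = rng.normal(0.0, np.sqrt(sigma2), n + burn)
u = np.zeros(n + burn)
for t in range(2, n + burn):
    u[t] = -a[0] * u[t - 1] - a[1] * u[t - 2] + e[t]
u = u[burn:]

# Stationarity check: the roots of m^2 + a1*m + a2 must be less than one
# in absolute value.
print(np.abs(np.roots(np.concatenate(([1.0], a)))))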
The Wholesale Price Index for the years 1720 through 1973, plotted as Figure A, appears to satisfy these assumptions. To these data we have fitted an exponential growth model $(t = 1, 2, \ldots, n = 254)$ by ordinary nonlinear least squares to obtain residuals $\{\hat{u}_t\}_{t=1}^{254}$. From these residuals we have estimated the autocovariances using

$$\hat{\gamma}(h) = (1/n) \sum_{t=1}^{n-h} \hat{u}_t \hat{u}_{t+h} \qquad (h = 0, 1, \ldots, 60)$$

and plotted them as Plot AUTOCOVARIANCE in Figure B.
[Figure A. Wholesale Price Index, plotted annually for the years 1720 through 1973; plot omitted.]
FOOTNOTE TO FIGURE A

Source: Composite derived from: Wholesale Prices for Philadelphia, 1720 to 1861, Series E82, [14]; Wholesale Prices, All Commodities, 1749 to 1890, Series E1, [14]; Wholesale Prices, All Commodities, 1890 to 1951, Series E13, [14]; Wholesale Prices, All Commodities, 1929-1971, [12]; Wholesale Prices, All Commodities, 1929-1973, [13].
[Figure B. AUTOCOVARIANCES, AUTOREGRESSIVE APPROXIMATION, HANNAN-GOEBEL IMPLICIT APPROXIMATION. Autocovariance is plotted against lag (0 to 60) for the estimated autocovariances (Plot AUTOCOVARIANCE), the autoregressive approximation (Plot AUTOREGRESSIVE), and the Hannan-Goebel implicit approximation (Plot HANNAN-GOEBEL); plot omitted.]
Using the methods set forth in Section 3 and assuming $q = 2$, we estimate $(a_1, a_2, \sigma^2) = (-1.048,\ 0.1287,\ 34.09)$. Using the Yule-Walker equations and these estimates, we obtain the approximating autocovariances shown as Plot AUTOREGRESSIVE in Figure B.
As we mentioned earlier, the entries on the diagonal bands of the variance-covariance matrix $\Gamma_n$ are the autocovariances of the process; that is, $\gamma_{ij} = \gamma(i-j)$ where $\gamma(h)$ is the autocovariance function. Thus, the approximating circular symmetric matrix used by Hannan and Goebel implicitly defines an estimator of the autocovariance function; see Box and Jenkins [2, p. 31] for a schematic illustration. Due to the nature of a circular symmetric matrix, the autocovariance function implicitly defined will vary with the sample size $n$ and with the choice of an estimator for the spectral density. We used $n = 60$ and estimated the spectral density using a Tukey-lag window [9, p. 243 ff.] with $M = 25$ to obtain the approximating autocovariances shown as Plot HANNAN-GOEBEL in Figure B.
As can be seen from Figure B, the approximation of the autocovariance function obtained using the autoregressive assumption is considerably more realistic than the approximation implied by the Hannan-Goebel procedure. Correspondingly, for these data we would expect that a generalized nonlinear least squares estimator using the autoregressive assumption to obtain the approximating variance-covariance matrix would perform better in small samples than a Hannan-Goebel estimator. In general, when the ordinary nonlinear least squares residuals $\{\hat{u}_t\}_{t=1}^{n}$ appear to satisfy the assumption that they were generated by a low order autoregressive process, we would expect that a generalized nonlinear least squares estimator which exploits this assumption would have better efficiency in small samples than a Hannan-Goebel estimator.
This article investigates this hypothesis. In Section 2 we describe an estimation procedure. In Section 3 we summarize the consequences of the autoregressive assumption which motivate the procedure. In Section 4 we set forth the asymptotic properties of the estimator. In Section 5 we investigate the small sample properties of the estimator using a Monte-Carlo simulation and compare them with Goebel's results.

We conclude, on the basis of the Monte-Carlo evidence, that the estimator suggested here will have better efficiency in small samples than the Hannan-Goebel estimator when the autoregressive assumption is at least approximately satisfied. Gains in efficiency relative to ordinary least squares can be dramatic when the autoregressive assumption is exactly satisfied and $q$ is correctly identified.
2. ESTIMATION PROCEDURE

Our objective is to estimate the unknown parameter $\theta^*$ appearing in the nonlinear regression equations

$$y_t = f(x_t, \theta^*) + u_t \qquad (t = 1, 2, \ldots, n)$$

when we assume that the disturbance terms are an autoregressive process of order $q$. A detailed description of such a process is deferred to the next section, where methods of estimating $q$ from the data are discussed. The inputs $x_t$ are known $k$ by $1$ vectors and the unknown parameter $\theta^*$ is a $p$ by $1$ vector known to be contained in the parameter space $\Theta$. The regression equations may be written in vector form as $y = f(\theta^*) + u$ using the notation of the previous section.
The first step of the procedure is to compute the ordinary least squares estimator $\hat{\theta}$ which minimizes $\sum_{t=1}^{n} [y_t - f(x_t, \theta)]^2$ over $\Theta$ using, for example, Hartley's [7] modified Gauss-Newton method or Marquardt's [11] algorithm. A program implementing at least one of these algorithms is usually found at computing centers with a statistical program library.

The second step is to compute the ordinary least squares residuals $\hat{u} = y - f(\hat{\theta})$ and from these estimate the autocovariances up to lag $q$ of the disturbances using

$$\hat{\gamma}(h) = (1/n) \sum_{t=1}^{n-h} \hat{u}_t \hat{u}_{t+h} \qquad (h = 0, 1, \ldots, q).$$
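These first two steps can be sketched in a few lines of Python; this is an illustration we have added, and the response function shown is a hypothetical stand-in for $f(x, \theta)$.

```python
import numpy as np
from scipy.optimize import least_squares

def f(x, theta):
    # Hypothetical response function standing in for f(x, theta).
    return theta[0] * np.exp(theta[1] * x)

def gamma_hat(r, q):
    # Autocovariances of the residuals r up to lag q:
    # gamma_hat(h) = (1/n) sum_{t=1}^{n-h} r_t r_{t+h}.
    n = len(r)
    return np.array([r[: n - h] @ r[h:] / n for h in range(q + 1)])

def steps_1_and_2(x, y, theta0, q):
    # Step 1: ordinary least squares via a Gauss-Newton-type algorithm.
    ols = least_squares(lambda th: y - f(x, th), theta0)
    # Step 2: residuals and their autocovariances up to lag q.
    resid = y - f(x, ols.x)
    return ols.x, gamma_hat(resid, q)
```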
For the third step, let

$$\hat{\Gamma}_q = \begin{pmatrix} \hat{\gamma}(0) & \hat{\gamma}(1) & \cdots & \hat{\gamma}(q-1) \\ \hat{\gamma}(1) & \hat{\gamma}(0) & \cdots & \hat{\gamma}(q-2) \\ \vdots & \vdots & & \vdots \\ \hat{\gamma}(q-1) & \hat{\gamma}(q-2) & \cdots & \hat{\gamma}(0) \end{pmatrix} \quad (q \times q), \qquad \hat{\gamma}_q = [\hat{\gamma}(1), \hat{\gamma}(2), \ldots, \hat{\gamma}(q)]' \quad (q \times 1)$$

and compute

$$\hat{a} = -\hat{\Gamma}_q^{-1} \hat{\gamma}_q \quad (q \times 1), \qquad \hat{\sigma}^2 = \hat{\gamma}(0) + \hat{a}' \hat{\gamma}_q$$

using, e.g., Cholesky's method [8, p. 158]. Factor $\hat{\sigma}^2 \hat{\Gamma}_q^{-1}$ as $\hat{P}_q' \hat{P}_q$, as described in Section 3, and set $\hat{P}$ equal to the $n \times n$ matrix whose first $q$ rows are $(\hat{P}_q \;\; 0)$ and whose $t$-th row, for $t = q+1, \ldots, n$, contains $\hat{a}_q, \hat{a}_{q-1}, \ldots, \hat{a}_1, 1$ in columns $t-q$ through $t$ with zeroes elsewhere:

$$\hat{P} = \begin{pmatrix} \hat{P}_q & & & 0 \\ \hat{a}_q \ \cdots \ \hat{a}_1 & 1 & & \\ & \ddots & \ddots & \\ 0 & & \hat{a}_q \ \cdots \ \hat{a}_1 & 1 \end{pmatrix} \begin{matrix} \}\ q \text{ rows} \\ \\ \}\ n-q \text{ rows} \\ \end{matrix}$$

Note that to perform the multiplication $\hat{P}w$ on a digital computer it is not necessary to store $\hat{P}$; only $\hat{a}_1, \hat{a}_2, \ldots, \hat{a}_q$ and $\hat{P}_q$ are needed.
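In code, the third step amounts to one Cholesky solve and one banded filter. The sketch below is ours (numpy/scipy assumed) and follows the construction just described; scipy's Cholesky routines stand in for Cholesky's method [8].

```python
import numpy as np
from scipy.linalg import toeplitz, cholesky, cho_factor, cho_solve

def step_3(g):
    # g = [gamma_hat(0), ..., gamma_hat(q)].
    q = len(g) - 1
    G = toeplitz(g[:q])                   # Gamma_hat_q, element gamma_hat(i-j)
    a = -cho_solve(cho_factor(G), g[1:])  # a_hat = -Gamma_q^{-1} gamma_q
    sigma2 = g[0] + a @ g[1:]             # sigma2_hat = gamma(0) + a' gamma_q
    Pq = cholesky(sigma2 * np.linalg.inv(G))   # upper triangular; Pq'Pq = sigma2 * G^{-1}
    return a, sigma2, Pq

def apply_P(w, a, Pq):
    # Form P w without storing the n x n matrix P: the first q entries are
    # Pq w_(1:q); entry t > q is a_q w_{t-q} + ... + a_1 w_{t-1} + w_t.
    q = len(a)
    out = np.empty(len(w))
    out[:q] = Pq @ w[:q]
    coef = np.concatenate((a[::-1], [1.0]))
    for t in range(q, len(w)):
        out[t] = coef @ w[t - q : t + 1]
    return out
```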
Define $F(\theta)$ to be the $n$ by $p$ matrix whose $t$-th row is the $(1 \times p)$ vector of partial derivatives $(\partial/\partial\theta') f(x_t, \theta)$.

The fourth step is to compute $\tilde{\theta}$ minimizing

$$Q_n(\theta) = (1/n)\, [y - f(\theta)]' \hat{P}' \hat{P}\, [y - f(\theta)]$$

and from this value obtain $\tilde{\sigma}^2 = [n/(n-p)]\, Q_n(\tilde{\theta})$ and $\tilde{C} = (1/n)\, F'(\tilde{\theta})\, \hat{P}' \hat{P}\, F(\tilde{\theta})$. As shown in Section 4, $\sqrt{n}\,(\tilde{\theta} - \theta^*)$ is asymptotically normally distributed with a variance-covariance matrix for which $\tilde{\sigma}^2 \tilde{C}^{-1}$ is a strongly consistent estimator.

Either Hartley's [7] or Marquardt's [11] algorithm may be used for the final minimization to obtain $\tilde{\theta}$. Put $z = \hat{P} y$ and $g(\theta) = \hat{P} f(\theta)$. The problem is then to minimize

$$\sum_{t=1}^{n} [z_t - g_t(\theta)]^2.$$

The derivatives needed by the program may be obtained as the $t$-th row of $A = \hat{P} F(\theta)$. As mentioned above, this matrix multiplication does not require the storage of the $n$ by $n$ matrix $\hat{P}$. The matrix $(A'A)^{-1}$ printed by the program will satisfy $\tilde{C}^{-1} = n\,(A'A)^{-1}$.
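The final minimization can likewise be sketched with a library least squares routine; again this is our illustration, reusing the hypothetical f and the apply_P routine above.

```python
import numpy as np
from scipy.optimize import least_squares

def step_4(x, y, theta0, a, Pq):
    # Minimize sum_t [z_t - g_t(theta)]^2 with z = P y and g(theta) = P f(theta).
    resid = lambda th: apply_P(y - f(x, th), a, Pq)
    fit = least_squares(resid, theta0)
    theta_t = fit.x

    n, p = len(y), len(theta_t)
    Qn = resid(theta_t) @ resid(theta_t) / n
    sigma2_t = n / (n - p) * Qn              # sigma2_tilde = [n/(n-p)] Q_n(theta_tilde)
    A = fit.jac                              # rows: derivatives of the transformed model
    C_inv = n * np.linalg.inv(A.T @ A)       # C_tilde^{-1} = n (A'A)^{-1}
    # Estimated variance-covariance of theta_tilde: (sigma2_t / n) * C_inv.
    return theta_t, sigma2_t, C_inv
```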
The estimation procedure may be iterated by returning to the second step with $\tilde{\theta}$ replacing $\hat{\theta}$. The asymptotic properties of this two-stage estimator do not differ from those of the one-stage estimator $\tilde{\theta}$. Intuitively, it would seem that the two-stage estimator would be more efficient, but the Monte-Carlo evidence, Section 5, suggests otherwise. The only case in which we find improved performance is when the autoregressive assumption is exactly satisfied.
3. AUTOREGRESSIVE PROCESSES

The autoregressive process $\{u_t\}_{t=-\infty}^{\infty}$ of order $q$ is defined implicitly as a sequence of random variables which satisfies the difference equation

$$u_t + a_1 u_{t-1} + \cdots + a_q u_{t-q} = e_t$$

where the process $\{e_t\}_{t=-\infty}^{\infty}$ is a sequence of independently and identically distributed random variables, each with mean zero, finite variance $\sigma^2$, and finite fourth moment. The parameters $a_i$ $(i = 1, \ldots, q)$ and $\sigma^2$ of the defining equation are assumed to satisfy the condition that the roots of the polynomial $m^q + a_1 m^{q-1} + \cdots + a_q$ are less than one in absolute value.
The process $\{u_t\}$ can be given an explicit moving average representation in terms of weights $\{w_j\}_{j=0}^{\infty}$ defined by the equations $w_0 = 1$,

$$w_j = -a_1 w_{j-1} - a_2 w_{j-2} - \cdots - a_j w_0 \qquad (j = 1, \ldots, q-1)$$

and

$$w_j = -a_1 w_{j-1} - a_2 w_{j-2} - \cdots - a_q w_{j-q} \qquad (j = q, q+1, \ldots).$$

These weights are absolutely summable $(\sum_{j=0}^{\infty} |w_j| < \infty)$ and $u_t = \sum_{j=0}^{\infty} w_j e_{t-j}$ almost surely [4]. The autocovariances of the process $\{u_t\}$ are defined by

$$\gamma(h) = \mathcal{E}(u_t u_{t+|h|}) \qquad (h = 0, \pm 1, \pm 2, \ldots).$$

From the moving average representation we have that the autocovariances are given by

$$\gamma(h) = \sigma^2 \sum_{j=0}^{\infty} w_j w_{j+|h|} \qquad (h = 0, \pm 1, \pm 2, \ldots).$$
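The weight recursion and the autocovariance formula are easy to evaluate numerically; the sketch below (our illustration, using the AR(2) values fitted to the Wholesale Price Index in Section 1) truncates the absolutely summable infinite sum.

```python
import numpy as np

def ma_weights(a, m):
    # w_0 = 1 and w_j = -a_1 w_{j-1} - ... - a_q w_{j-q}, with w_j = 0 for j < 0.
    q, w = len(a), np.zeros(m)
    w[0] = 1.0
    for j in range(1, m):
        w[j] = -sum(a[i] * w[j - 1 - i] for i in range(min(q, j)))
    return w

a, sigma2 = np.array([-1.048, 0.1287]), 34.09    # AR(2) values from Section 1
w = ma_weights(a, 2000)                          # truncation of the infinite sum
gamma = lambda h: sigma2 * (w[: len(w) - h] @ w[h:])
print(gamma(0), gamma(1), gamma(2))
```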
In addition, the autocovariances satisfy the Yule-Walker equations

$$\gamma(j) + a_1 \gamma(j-1) + \cdots + a_q \gamma(j-q) = \begin{cases} \sigma^2, & j = 0 \\ 0, & j > 0 \end{cases}$$

and are absolutely summable $(\sum_{h=-\infty}^{\infty} |\gamma(h)| < \infty)$.

The Yule-Walker equations may be used to compute the parameters $a_1, a_2, \ldots, a_q, \sigma^2$ defining the process $\{u_t\}$ from knowledge of the first $q$ autocovariances $\gamma(h)$ $(h = 0, 1, \ldots, q)$. Define

$$\Gamma_s = \begin{pmatrix} \gamma(0) & \gamma(1) & \cdots & \gamma(s-1) \\ \gamma(1) & \gamma(0) & \cdots & \gamma(s-2) \\ \vdots & \vdots & & \vdots \\ \gamma(s-1) & \gamma(s-2) & \cdots & \gamma(0) \end{pmatrix} \quad (s \times s), \qquad \gamma_s = [\gamma(1), \gamma(2), \ldots, \gamma(s)]' \quad (s \times 1),$$

and $a = (a_1, a_2, \ldots, a_q)'$ $(q \times 1)$. The parameters are computed from the equations

$$a = -\Gamma_q^{-1} \gamma_q, \qquad \sigma^2 = \gamma(0) + a' \gamma_q.$$
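Continuing the sketch above, the parameters can be recovered from the first autocovariances exactly as these equations indicate; step_3 from the Section 2 sketch computes $-\Gamma_q^{-1}\gamma_q$ and $\gamma(0) + a'\gamma_q$.

```python
g = np.array([gamma(0), gamma(1), gamma(2)])  # gamma from the previous sketch
a_rec, s2_rec, _ = step_3(g)
print(a_rec, s2_rec)                          # approximately (-1.048, 0.1287) and 34.09
```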
The observed portion $\{u_t\}_{t=1}^{n}$ of a realization of the process has variance-covariance matrix $\Gamma_n$. For $n$ larger than $q$, a matrix $P$ which diagonalizes $\Gamma_n$ may be obtained as follows. Factor $\sigma^2 \Gamma_q^{-1}$ as $P_q' P_q$ and set

$$P = \begin{pmatrix} P_q & & & 0 \\ a_q \ \cdots \ a_1 & 1 & & \\ & \ddots & \ddots & \\ 0 & & a_q \ \cdots \ a_1 & 1 \end{pmatrix} \begin{matrix} \}\ q \text{ rows} \\ \\ \}\ n-q \text{ rows} \\ \end{matrix}$$

Using the Yule-Walker equations and the equations defining the weights $\{w_j\}$, one can verify that $P \Gamma_n P' = \sigma^2 I_n$. This fact motivates the use of $\hat{P}$ in the estimation procedure described in the previous section.
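This identity is easy to confirm numerically. The check below is our illustration; it reuses ma_weights from the sketch above and the same AR(2) values.

```python
import numpy as np
from scipy.linalg import toeplitz, cholesky

a, sigma2, n = np.array([-1.048, 0.1287]), 34.09, 12
q = len(a)

w = ma_weights(a, 4000)
gam = np.array([sigma2 * (w[: len(w) - h] @ w[h:]) for h in range(n)])
Gamma_n = toeplitz(gam)                      # banded variance-covariance of (u_1,...,u_n)'

Pq = cholesky(sigma2 * np.linalg.inv(toeplitz(gam[:q])))  # Pq'Pq = sigma2 * Gamma_q^{-1}
P = np.zeros((n, n))
P[:q, :q] = Pq
for t in range(q, n):
    P[t, t - q : t + 1] = np.concatenate((a[::-1], [1.0]))  # row (a_q, ..., a_1, 1)

print(np.allclose(P @ Gamma_n @ P.T, sigma2 * np.eye(n)))  # True: P diagonalizes Gamma_n
```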
Estimators of the autocovariances may be obtained from

$$\tilde{\gamma}(h) = (1/n) \sum_{t=1}^{n-|h|} u_t u_{t+|h|} \qquad (h = 0, \pm 1, \ldots, \pm(n-q)).$$

In the nonlinear regression context we cannot observe $\{u_t\}_{t=1}^{n}$ directly and must substitute the estimators $\hat{\gamma}(h)$ computed from the ordinary nonlinear least squares residuals $\{\hat{u}_t\}_{t=1}^{n}$. We will show in the next section that the estimators $\tilde{\gamma}(h)$ and $\hat{\gamma}(h)$ are asymptotically equivalent under appropriate regularity conditions.
We have [4] that

$$\lim_{n \to \infty} n\, \mathrm{Cov}[\tilde{\gamma}(h), \tilde{\gamma}(\ell)] = [\mathcal{E}(e_t^4)/\sigma^4 - 3]\, \gamma(h)\gamma(\ell) + \sum_{j=-\infty}^{\infty} [\gamma(j)\gamma(j-h+\ell) + \gamma(j+\ell)\gamma(j-h)]$$

and [5] that $\tilde{\gamma}(h)$ converges almost surely to $\gamma(h)$. Using Chebyshev's inequality, the covariance formula implies that $\sqrt{n}\,[\tilde{\gamma}(h) - \gamma(h)]$ is bounded in probability; that is, given $\delta > 0$ there is a finite bound $B$ and an $N$ such that for all $n > N$

$$P[\sqrt{n}\,|\tilde{\gamma}(h) - \gamma(h)| > B] < \delta.$$
If we estimate the parameters $a_1, \ldots, a_q, \sigma^2$ by substituting $\tilde{\gamma}(h)$ $(h = 0, 1, \ldots, q)$ in the equations $a = -\Gamma_q^{-1} \gamma_q$ and $\sigma^2 = \gamma(0) + a' \gamma_q$ to obtain

$$\bar{a} = -\tilde{\Gamma}_q^{-1} \tilde{\gamma}_q, \qquad \bar{\sigma}^2 = \tilde{\gamma}(0) + \bar{a}' \tilde{\gamma}_q,$$

we have, as consequences of the facts set forth in the previous paragraph, that each $\bar{a}_i$ converges almost surely to $a_i$ and that $\sqrt{n}\,(\bar{a}_i - a_i)$ is bounded in probability. The same is true for $\bar{\sigma}^2$. These equations may be solved subject to a priori restrictions that some of the $a_i = 0$. This will not disturb the properties of the estimators provided that the restrictions are, in fact, true.
To determine the order $q$ of the process from the data, we consider a test of $H\colon a_q = 0$ against $A\colon a_q \neq 0$ at the $\alpha$-level of significance. We may use a two-sided "t-test" by comparing

$$\hat{t} = \hat{a}_q / (s^2 \hat{c}^{qq})^{1/2}, \qquad s^2 = [\hat{\gamma}(0) + \hat{a}' \hat{\gamma}_q]/(n-q),$$

where $\hat{c}^{qq}$ is the $q$-th diagonal element of $\hat{\Gamma}_q^{-1}$, to the two-sided $\alpha$-level critical point $c$ of a t-variate with $n-q$ degrees of freedom. We have [1, Sec. 5.6.3] that

$$\lim_{n \to \infty} P[\,|\hat{t}| > c \mid H\,] = \alpha \qquad \text{and} \qquad \lim_{n \to \infty} P[\,|\hat{t}| > c \mid A\,] = 1.$$

This test is employed to determine the order $q$ of an autoregressive process using procedures analogous to those used to determine the appropriate degree of a polynomial in polynomial regression analysis. One may, for example, test sequentially upward or, alternatively, start from a very high order and use Anderson's [1, Sec. 3.2.2] downward selection procedure.
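A sketch of the downward variant in code (ours; it assumes the studentization written above and refits the Yule-Walker equations at each candidate order):

```python
import numpy as np
from scipy.linalg import toeplitz
from scipy.stats import t as t_dist

def choose_order(g, n, q_max, alpha=0.05):
    # g = [gamma_hat(0), ..., gamma_hat(q_max)]; test H: a_q = 0 downward from q_max.
    for q in range(q_max, 0, -1):
        G = toeplitz(g[:q])
        a = -np.linalg.solve(G, g[1 : q + 1])
        s2 = (g[0] + a @ g[1 : q + 1]) / (n - q)
        c_qq = np.linalg.inv(G)[q - 1, q - 1]        # q-th diagonal element of G^{-1}
        t_stat = a[q - 1] / np.sqrt(s2 * c_qq)
        if abs(t_stat) > t_dist.ppf(1 - alpha / 2, n - q):
            return q                                 # reject H: a_q = 0; retain order q
    return 0                                         # no significant autoregression found
```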
For a sequence of weights $\{c_t(\theta)\}_{t=1}^{\infty}$ depending on a p-dimensional parameter $\theta \in \Theta$ there is a Uniform Strong Law [5, 6] which, in the present context, states that $(1/n) \sum_{t=1}^{n} c_t(\theta)\, u_t$ converges almost surely to zero uniformly for $\theta$ in $\Theta$ provided that:

(i) The process $\{u_t\}$ is as described above.

(ii) Each $c_t(\theta)$ is a continuous function for $\theta$ in $\Theta$.

(iii) The set $\Theta$ is compact.

(iv) The sum $(1/n) \sum_{t=1}^{n-|h|} c_t(\theta)\, c_{t+|h|}(\theta)$ converges uniformly for all $h$ and all $\theta$ in $\Theta$ as $n$ tends to infinity.
4. ASYMPTOTIC PROPERTIES OF THE ESTIMATOR

The assumptions used by Goebel [5] to show that $\hat{\theta}$ converges almost surely to $\theta^*$ and that $\sqrt{n}\,(\hat{\theta} - \theta^*)$ is bounded in probability may be stated in the present context as:

Assumptions. The errors $\{u_t\}$ are an autoregressive process of order $q$ as described in the previous section. The true parameter value $\theta^*$ is contained in an open sphere $S$, which is, in turn, contained in $\Theta$; the parameter space $\Theta$ is a compact set. The response function $f(x, \theta)$ and its first and second partial derivatives in $\theta$ are continuous over the parameter space. The sequence of inputs $\{x_t\}$ are chosen so that the functions

$$(1/n) \sum_{t=1}^{n-|h|} g_1(x_t, \theta)\, g_2(x_{t+|h|}, \theta) \qquad (h = 0, \pm 1, \ldots, \pm q)$$

converge uniformly for all $\theta$ in $\Theta$ as $n$ tends to infinity, where $g_1(x, \theta)$ and $g_2(x, \theta)$ may variously be $f(x, \theta)$, $(\partial/\partial\theta_i) f(x, \theta)$, or $(\partial^2/\partial\theta_i \partial\theta_j) f(x, \theta)$. The function $\lim_{n \to \infty} (1/n) \sum_{t=1}^{n} \delta^2(x_t, \theta)$, where $\delta(x, \theta) = f(x, \theta^*) - f(x, \theta)$, is assumed to be non-zero on $\Theta$ except at the point $\theta^*$. The $p \times p$ matrix $V(h)$ with typical element

$$v_{ij}(h) = \lim_{n \to \infty} (1/n) \sum_{t=1}^{n-|h|} (\partial/\partial\theta_i) f(x_t, \theta^*)\, (\partial/\partial\theta_j) f(x_{t+|h|}, \theta^*) \qquad (h = 0, \pm 1, \ldots, \pm q)$$

is assumed to be non-singular when $h = 0$.
These assumptions are sufficient to prove that the autocovariance estimators $\hat{\gamma}(h)$ computed from ordinary nonlinear least squares residuals are asymptotically equivalent to the estimators $\tilde{\gamma}(h)$ computed from the unobservable disturbance terms.

Theorem 1. Under the assumptions above,

$$\hat{\gamma}(h) = \tilde{\gamma}(h) + o_{1n} \qquad (h = 0, \pm 1, \ldots, \pm q)$$

where $o_{1n}$ converges almost surely to zero and $\sqrt{n}\, o_{1n}$ converges in probability to zero.
Proof. Let $\bar{\theta} = \hat{\theta}$ when $\hat{\theta}$ is in $S$ and let $\bar{\theta} = \theta^*$ when $\hat{\theta}$ is not in $S$. Since, by Goebel's results, $\hat{\theta} = \bar{\theta}$ almost surely for $n$ sufficiently large, it will suffice to prove the theorem with $\bar{\theta}$ replacing $\hat{\theta}$. This allows the use of Taylor's theorem in the proof.

The estimator with $\bar{\theta}$ replacing $\hat{\theta}$ may be written as

$$\bar{\gamma}(h) = (1/n) \sum_{t=1}^{n-|h|} [u_t + \delta(x_t, \bar{\theta})][u_{t+|h|} + \delta(x_{t+|h|}, \bar{\theta})] = \tilde{\gamma}(h) + o_{1n}.$$

Using Taylor's theorem we may write

$$o_{1n} = \{\,\cdot\,\}'(\bar{\theta} - \theta^*)$$

where the arguments of the partial derivatives collected in the term in braces are evaluated at points on the line segment joining $\theta^*$ to $\bar{\theta}$. The term in braces converges almost surely by the Uniform Strong Law, the assumption that $(1/n) \sum_{t=1}^{n-|h|} \delta(x_t, \alpha)\, \nabla' f(x_{t+|h|}, \beta)$ converges uniformly, and the almost sure convergence of $\bar{\theta}$ to $\theta^*$. Thus, $o_{1n}$ converges almost surely to zero by the almost sure convergence of $\bar{\theta}$; $\sqrt{n}\, o_{1n}$ converges in probability to zero because $\sqrt{n}\,(\bar{\theta} - \theta^*)$ is bounded in probability. □
The following additional assumptions are needed to derive the asymptotic properties of $\tilde{\theta}$.

Assumptions (continued). The $q \times q$ matrix with typical element $\bar{\delta}(i-j, \theta)$, where

$$\bar{\delta}(h, \theta) = \lim_{n \to \infty} (1/n) \sum_{t=1}^{n-|h|} \delta(x_t, \theta)\, \delta(x_{t+|h|}, \theta),$$

is assumed to be non-singular on $\Theta$ except at the point $\theta^*$. The matrix

$$C = \sum_{i=0}^{q} \sum_{j=0}^{q} a_i a_j\, \tfrac{1}{2}\, [V(i-j) + V'(i-j)]$$

is non-singular, where the $a_i$ are the coefficients of the stochastic difference equation defining $\{u_t\}$ and $a_0 = 1$.

Theorem 2. Under the assumptions above, $\tilde{\theta}$ converges almost surely to $\theta^*$ and $\tilde{\sigma}^2$ converges almost surely to $\sigma^2$.

Proof. Write

$$Q_n(\theta) = a_n(\theta)/n + b_n(\theta)/n,$$

where $a_n(\theta)$ collects the first $q$ squared elements of $\hat{P}[y - f(\theta)]$ and

$$b_n(\theta) = \sum_{t=q+1}^{n} \Big\{ \sum_{j=0}^{q} \hat{a}_j\, [u_{t-j} + \delta(x_{t-j}, \theta)] \Big\}^2, \qquad \hat{a}_0 = 1.$$

Now $a_n(\theta)/n$ converges almost surely to zero uniformly in $\theta$ because the typical element of $\hat{\sigma}^2 \hat{\Gamma}_q^{-1}$ converges almost surely to the corresponding element of $\sigma^2 \Gamma_q^{-1}$ and the continuous functions $\delta(x, \theta)$ are bounded over the compact set $\Theta$; thus $Q_n(\theta)$ differs from $b_n(\theta)/n$ by only a finite number of terms, so they share asymptotic properties. The second term converges almost surely, uniformly in $\theta$, to

$$\sum_{i=0}^{q} \sum_{j=0}^{q} a_i a_j\, \gamma(i-j) + \sum_{i=0}^{q} \sum_{j=0}^{q} a_i a_j\, \bar{\delta}(i-j, \theta)$$

by the Uniform Strong Law, the definition of $\bar{\delta}(h, \theta)$, and the almost sure convergence of $\hat{a}_i$ and $\tilde{\gamma}(h)$. Using the Yule-Walker equations we have $\sum_{i=0}^{q} \sum_{j=0}^{q} a_i a_j\, \gamma(i-j) = \sigma^2$, whence $Q_n(\theta)$ converges to

$$Q(\theta) = \sigma^2 + \sum_{i=0}^{q} \sum_{j=0}^{q} a_i a_j\, \bar{\delta}(i-j, \theta)$$

almost surely, uniformly in $\theta$. The term on the right is the limit of a sum of squares and is therefore positive or zero. By our non-singularity assumption, $Q(\theta) = \sigma^2$ only when $\theta = \theta^*$.

Consider a sequence of points $\{\tilde{\theta}_n\}_{n=1}^{\infty}$ minimizing $Q_n(\theta)$ over $\Theta$ corresponding to a realization of the process $\{u_t\}_{t=-\infty}^{\infty}$. Since $\Theta$ is compact, there is at least one limit point $\theta^0$ and one subsequence $\{\tilde{\theta}_{n_m}\}_{m=1}^{\infty}$ such that $\lim_{m \to \infty} \tilde{\theta}_{n_m} = \theta^0$. Unless this realization belongs to the exceptional set $E$, where $P(E) = 0$, $Q_n(\theta)$ converges uniformly to $Q(\theta)$, whence

$$Q(\theta^0) = \lim_{m \to \infty} Q_{n_m}(\tilde{\theta}_{n_m}) \leq \lim_{m \to \infty} Q_{n_m}(\theta^*) = \sigma^2,$$

which implies $\theta^0 = \theta^*$. Thus $\{\tilde{\theta}_n\}$ has the single limit point $\theta^*$, except for realizations of the process $\{u_t\}$ in $E$, and we have the almost sure convergence of $\tilde{\theta}$ to $\theta^*$. Since $\tilde{\sigma}^2 = [n/(n-p)]\, Q_n(\tilde{\theta})$, the almost sure convergence of $\tilde{\sigma}^2$ to $\sigma^2$ follows. □
Theorem 3. Under the assumptions above, $\sqrt{n}\,(\tilde{\theta} - \theta^*)$ converges in distribution to a p-variate normal with mean vector zero and variance-covariance matrix $\sigma^2 C^{-1}$. The elements of the matrix $\tilde{C}$ converge almost surely to the corresponding elements of $C$.
Proof. Let $\bar{\theta} = \tilde{\theta}$ if $\tilde{\theta}$ is in $S$ and let $\bar{\theta} = \theta^*$ if $\tilde{\theta}$ is not in $S$. Since $\sqrt{n}\,(\tilde{\theta} - \bar{\theta})$ converges almost surely to the zero vector by Theorem 2, it will suffice to prove the theorem for $\bar{\theta}$.

By Taylor's theorem we may expand the gradient of $Q_n$ about $\theta^*$, where the points of evaluation vary with $t$ and lie on the line segment joining $\theta^*$ to $\bar{\theta}$; the remainder involves matrices $D(r, s)$ with typical element $d_{ij}(r, s)$ built from the second partial derivatives of $f$. As a consequence of the Uniform Strong Law, $(1/n)\, d_{ij}(r, s)$ converges almost surely to zero. Using these expressions we can write

$$\Big[ \sum_{i=0}^{q} \sum_{j=0}^{q} \hat{a}_i \hat{a}_j A_n(i, j) \Big] \sqrt{n}\,(\bar{\theta} - \theta^*) = -(\sqrt{n}/2)\, \nabla Q_n(\theta^*) + (\sqrt{n}/2)\, \nabla Q_n(\bar{\theta})$$

where the matrices $A_n(i, j)$ are built from the cross products of the first partial derivatives of $f$ together with the term $-(1/n)\, D(j, i)$. Note that $A_n(i, j)$ converges almost surely to $V(i-j)$ if $i \geq j$ and to $V'(i-j)$ if $i < j$. The vector $(\sqrt{n}/2)\, \nabla Q_n(\bar{\theta})$ converges to the zero vector almost surely because $\bar{\theta}$ is a stationary point of $Q_n(\theta)$ when $\tilde{\theta}$ is in $S$.

We may write

$$-(\sqrt{n}/2)\, \nabla Q_n(\theta^*) = \sum_{i=0}^{q} \sum_{j=0}^{q} a_i a_j\, (1/\sqrt{n}) \sum_{t=q+1}^{n} \nabla f(x_{t-i}, \theta^*)\, u_{t-j} + \sum_{i=0}^{q} \sum_{j=0}^{q} a_i\, \sqrt{n}\,(\hat{a}_j - a_j)\, (1/n) \sum_{t=q+1}^{n} u_{t-j}\, \nabla f(x_{t-i}, \theta^*) + \cdots,$$

where each term not shown converges almost surely to zero because it is a finite sum of random variables which converge almost surely, divided by $\sqrt{n}$. The right hand side converges in distribution to a p-variate normal with mean vector zero and variance-covariance matrix $\sigma^2 C$, where $C = \sum_{i=0}^{q} \sum_{j=0}^{q} a_i a_j \lim_{n \to \infty} A_n(i, j)$, in consequence of Corollary 1 of Theorem 5 of [10], the Uniform Strong Law, and the fact that $\sqrt{n}\,(\hat{a}_i - a_i)$ is bounded in probability. Thus $\sqrt{n}\,(\bar{\theta} - \theta^*)$ is asymptotically normally distributed with mean vector zero and variance-covariance matrix $\sigma^2 C^{-1}$.

The second conclusion is obtained by writing $\tilde{C}$ as the sum of two terms which converge almost surely to the zero matrix and to $C$, respectively, using arguments similar to those employed above. □
5. MONTE-CARLO SIMULATIONS

The simulations summarized in Tables 1 and 2 were performed to gain information on three questions:

1. Does the autoregressive estimation procedure described in Section 2 have better small sample efficiency than the Hannan-Goebel estimator?

2. Is the small sample distribution of the "t-ratios" $t_i = \sqrt{n}\,(\tilde{\theta}_i - \theta_i^*)/(\tilde{\sigma}^2 \tilde{c}_{ii})^{1/2}$ $(i = 1, 2, \ldots, p)$ derived from the asymptotic theory approximated by the t-distribution with $n-p$ degrees freedom with sufficient accuracy for use in applications?

3. Is the two-stage autoregressive estimator an improvement over the one-stage autoregressive estimator in small samples?

The Monte-Carlo evidence presented here indicates that the answers are:

1. Yes.
Table 1. MONTE-CARLO ESTIMATES: MEAN, VARIANCE, MEAN SQUARE ERROR, AND RELATIVE MEAN SQUARE ERROR EFFICIENCY FOR EXPONENTIAL MODEL WITH n = 60 (a)

Parameter θ1

Error       Ordinary Least Squares       Hannan-Goebel                One Stage (q = 2)            Two Stage (q = 2)
Structure   Mean   Var.    M.S.E.  Eff.  Mean   Var.    M.S.E.  Eff.  Mean   Var.    M.S.E.  Eff.  Mean   Var.    M.S.E.  Eff.
IID         .751   .00087  .00087  1     .749   .00086  .00086  0.92  .751   .00090  .00090   .96  .751   .00091  .00091   .96
MA(4)       .758   .00753  .00759  1     .756   .00658  .00661  1.28  .755   .00534  .00536  1.42  .755   .00536  .00538  1.41
AR(1)       .751   .00518  .00518  1     .751   .00394  .00394  1.34  .751   .00317  .00317  1.63  .751   .00315  .00315  1.65
AR(2)       .775   .04444  .04507  1     ----   ------  ------  ----  .762   .00950  .00963  4.68  .761   .00759  .00772  5.84

Parameter θ2

IID         1.150  .00022  .00022  1     1.151  .00023  .00023  0.91  1.150  .00023  .00023   .96  1.150  .00024  .00024   .95
MA(4)       1.149  .00151  .00151  1     1.149  .00127  .00127  1.34  1.149  .00102  .00102  1.48  1.149  .00103  .00103  1.47
AR(1)       1.151  .00099  .00099  1     1.151  .00072  .00072  1.38  1.151  .00058  .00058  1.71  1.151  .00057  .00057  1.73
AR(2)       1.152  .00757  .00757  1     ----   ------  ------  ----  1.148  .00153  .00153  4.95  1.147  .00122  .00122  6.19

(a) Computed from two thousand trials except the Hannan-Goebel estimator, which was computed from four hundred trials for MA(4) and two hundred trials in the remaining cases.
Source: The values for the Hannan-Goebel estimator were obtained from [5].
Table 2. EMPIRICAL DISTRIBUTION OF "t-RATIOS" COMPARED TO THE t-DISTRIBUTION (a)

The column P[t≤c] gives the tabular value of the t-distribution; the remaining columns give the empirical distributions P[t1≤c] and P[t2≤c] of the "t-ratios" for θ1 and θ2 under each error structure, and the standard error of the empirical estimates.

Ordinary Least Squares

                    IID              MA(4)            AR(1)            AR(2)
c       P[t≤c]  P[t1≤c] P[t2≤c]  P[t1≤c] P[t2≤c]  P[t1≤c] P[t2≤c]  P[t1≤c] P[t2≤c]  Std. Error
-2.663  .005    .0060   .0050    .0445   .0455    .0470   .0610    .2205   .1810    .0016
-2.002  .025    .0290   .0270    .1035   .1005    .0860   .1155    .2750   .2430    .0035
-1.672  .050    .0515   .0615    .1385   .1180    .1210   .1480    .3045   .2875    .0049
-0.679  .250    .2590   .2555    .3085   .3590    .2980   .3325    .4045   .4185    .0097
 0.0    .500    .5000   .5045    .4805   .4985    .4875   .5110    .4820   .5190    .0112
 0.679  .750    .7485   .7465    .6555   .7075    .6685   .7070    .5700   .6150    .0097
 1.672  .950    .9435   .9535    .8690   .8715    .8585   .8960    .6860   .7465    .0049
 2.002  .975    .9770   .9775    .9105   .9045    .9000   .9300    .7240   .7860    .0035
 2.663  .995    .9960   .9950    .9640   .9800    .9465   .9645    .7940   .8570    .0016

One Stage Autoregressive

-2.663  .005    .0115   .0075    .0105   .0080    .0290   .0160    .0320   .0765    .0016
-2.002  .025    .0330   .0320    .0340   .0340    .0615   .0515    .1330   .0785    .0035
-1.672  .050    .0630   .0695    .0625   .0685    .0890   .0855    .1830   .1435    .0049
-0.679  .250    .2590   .2655    .2635   .2735    .2905   .2755    .3265   .3300    .0097
 0.0    .500    .4995   .5015    .4810   .5240    .5060   .5020    .5240   .5005    .0112
 0.679  .750    .7380   .7365    .7310   .7505    .7215   .7205    .6540   .6690    .0097
 1.672  .950    .9370   .9455    .9450   .9560    .9175   .9230    .8990   .8280    .0049
 2.002  .975    .9720   .9755    .9735   .9755    .9535   .9470    .9360   .8700    .0035
 2.663  .995    .9950   .9895    .9950   .9955    .9830   .9860    .9420   .9585    .0016

Two Stage Autoregressive

-2.663  .005    .0115   .0075    .0100   .0075    .0290   .0155    .0350   .0480    .0016
-2.002  .025    .0330   .0330    .0350   .0330    .0615   .0485    .0725   .1000    .0035
-1.672  .050    .0635   .0700    .0610   .0665    .0905   .0810    .1075   .1360    .0049
-0.679  .250    .2600   .2650    .2630   .2710    .2870   .2715    .2805   .3260    .0097
 0.0    .500    .5015   .5005    .4790   .5205    .5045   .5030    .4900   .5120    .0112
 0.679  .750    .7370   .7350    .7350   .7505    .7255   .7220    .6695   .7230    .0097
 1.672  .950    .9375   .9455    .9460   .9555    .9200   .9225    .8700   .9100    .0049
 2.002  .975    .9720   .9745    .9750   .9760    .9570   .9485    .9155   .9425    .0035
 2.663  .995    .9945   .9895    .9950   .9960    .9845   .9870    .9590   .9760    .0016

(a) Empirical distributions were computed from two thousand Monte-Carlo trials.
2. No. On balance, the small sample distributions of the "t-ratios" have heavier tails than the t-distribution with $n-p$ degrees freedom. Perhaps a reasonable procedure would be to enter tables of the t-distribution with $n-p-q$ degrees of freedom in applications.

3. Ambiguous. Performance improved when the disturbances were, in fact, autoregressive and deteriorated slightly when they were not.
The details of the simulations are as follows. So as to be able to compare our results with Goebel's [5], we used his choice of a response function, his choice of inputs for $n = 60$ as shown in Table 3, his choice of parameters, $\theta^* = (.75,\ 1.15)$, and his choice of error structures - IID, MA(4), and AR(1) - where in each case $e_t \sim \mathrm{NID}(0, .25)$. We chose the autoregressive estimator with $q = 2$ for the simulations. With this choice, the autoregressive estimator is not exactly appropriate for any of these cases - IID, MA(4), and AR(1). However, one cannot really hope to satisfy the assumptions exactly in applications, and we expect that a near miss such as the one used here is a more realistic imitation of an applied situation. To gain information on what would happen were our assumptions exactly satisfied, we included a fourth case not considered by Goebel:
an AR(2) error structure, where again $e_t \sim \mathrm{NID}(0, .25)$. The gains in mean square error efficiency relative to ordinary least squares are dramatic for this case, as seen in Table 1.
Relying on the asymptotic theory for purposes of statistical inference - such as finding a confidence interval for $\theta_i^*$ - one would enter tables of the t-distribution with $n-p$ degrees freedom using the statistic

$$t_i = \sqrt{n}\,(\tilde{\theta}_i - \theta_i^*) / (\tilde{\sigma}^2 \tilde{c}_{ii})^{1/2}$$

where $\tilde{c}_{ii}$ is the $i$-th diagonal element of $\tilde{C}^{-1}$ and $\tilde{\sigma}^2$ and $\tilde{C}$ are as defined in Section 2. (Set $\hat{P} = I$ in the formulae for $\tilde{\sigma}^2$ and $\tilde{C}$ to obtain the appropriate quantities for ordinary least squares.) We see from Table 2 that if one uses ordinary least squares formulae when the disturbances are, in fact, autocorrelated, his probability statements can be quite erroneous - confidence intervals, e.g., would be much too narrow. We could argue, on the basis of Table 2, that one should use the formulae of Section 2 not so much to gain efficiency in nonlinear time series regressions as to be able to make reasonably accurate probability statements in applications.
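For example, a confidence interval for $\theta_i^*$ follows directly from this statistic; the sketch below (ours) consumes the outputs of the Section 2 procedure.

```python
import numpy as np
from scipy.stats import t as t_dist

def conf_interval(theta_t, sigma2_t, C_inv, n, i, level=0.95):
    # Invert t_i = sqrt(n)(theta_i - theta_i*) / sqrt(sigma2 * c_ii), with c_ii
    # the i-th diagonal element of C_tilde^{-1}, against the t(n-p) distribution.
    p = len(theta_t)
    c = t_dist.ppf(0.5 + level / 2.0, n - p)
    half = c * np.sqrt(sigma2_t * C_inv[i, i] / n)
    return theta_t[i] - half, theta_t[i] + half
```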
The standard errors shown in Table 2 refer to the fact that if $t_i$ does, in fact, follow the t-distribution, then the Monte-Carlo estimate of $P[t_i \leq c]$ has a standard error of $\{P[t \leq c]\, P[t > c]/2000\}^{1/2}$.
Table 3. INPUTS FOR THE SIMULATIONS

2.12500   2.09400   2.98500   2.4?300   1.54200   2.03600
2.65400   2.75400   1.23000   2.06680   2.00300   2.20300
1.00330   2.45000   2.40000   1.56000   1.77000   1.23068
2.02000   2.75000   1.32040   2.42100   2.12300   3.00200
2.65200   1.03300   1.56300   2.10300   1.00330   2.45000
2.40000   1.56000   1.77000   1.23068   2.02000   2.75000
0.99800   1.65400   2.56800   2.12300   2.02300   2.00200
2.98600   1.33200   2.00123   2.54000   1.30000   1.65000
1.03300   2.03600   2.65400   2.75400   1.23000   2.06680
2.00300   1.32100   2.02300   2.42100   2.12300   3.00200
REFERENCES

[1] Anderson, T. W., The Statistical Analysis of Time Series, New York: John Wiley and Sons, Inc., 1971.

[2] Box, G. E. P. and Jenkins, G. M., Time Series Analysis: Forecasting and Control, San Francisco: Holden-Day, Inc., 1970.

[3] Duncan, D. B. and Jones, R. H., "Multiple Regression with Stationary Errors", Journal of the American Statistical Association, (December 1966), 917-928.

[4] Fuller, W. A., Introduction to Statistical Time Series, Ames, Iowa: University Bookstore, Iowa State University, 1971.

[5] Goebel, J. J., "Nonlinear Regression in the Presence of Autocorrelated Errors", Ph.D. Dissertation, Iowa State University, 1974.

[6] Hannan, E. J., "Nonlinear Time Series Regression", Journal of Applied Probability, 8 (December 1971), 767-80.

[7] Hartley, H. O., "The Modified Gauss-Newton Method for the Fitting of Nonlinear Regression Functions by Least Squares", Technometrics, 3 (May 1961), 269-80.

[8] I.B.M. Corporation, System/360 Scientific Subroutine Package, Version III, White Plains, New York: I.B.M. Corporation, Technical Publications Department, 1968.

[9] Jenkins, G. M. and Watts, D. G., Spectral Analysis and Its Applications, San Francisco: Holden-Day, Inc., 1968.

[10] Jennrich, R. I., "Asymptotic Properties of Nonlinear Least Squares Estimators", The Annals of Mathematical Statistics, 40 (March 1969), 633-43.

[11] Marquardt, D. W., "An Algorithm for Least Squares Estimation of Nonlinear Parameters", Journal of the Society for Industrial and Applied Mathematics, 11 (June 1963), 431-41.

[12] Office of the President, Economic Report of the President, Transmitted to Congress January 1972, Washington, D. C.: U. S. Government Printing Office, 1972.

[13] Office of the President, Economic Report of the President, Transmitted to Congress February 1974, Washington, D. C.: U. S. Government Printing Office, 1974.

[14] U. S. Bureau of the Census, Historical Statistics of the United States, Washington, D. C.: U. S. Government Printing Office, 1960.