Gallant, A.R.; (1985)Nonlinear Statistical Models. Ch. 9 Dynamic Nonlinear Models."

•
•
Nonlinear Statistical Models
by
A. Ronald Gallant
Chapter 9. Dynamic Nonlinear Models
Institute of Statistics Mimeograph Series No. 1667
July 1985
•
NORTH CAROLINA STATE UNIVERSITY
Raleigh, North Carolina
NONLINEAR STATISTICAL MODELS
by
A. Ronald Gallant
CHAPTER 9.
Copyright
©
Dynamic Nonlinear Models
1984 by A. Ronald Gallant.
All rights reserved.
NOT FOR GENERAL DISTRIBUTION
This copy is reproduced by special permission of the author and
publisher from:
A. Ronald Callant.
John
~iley
Nonlinear Statistical Models.
New York:
and Sons, Inc, forthcoming.
and is being circulated privately for discussion purposes only.
Reproduction
from this copy is eapressly prohibited.
Please send comments and report errors to the folloWing address:
A. Ronald Callant
Institute of Statistics
North Carolina State University
Post Office Boa 8203
Raleigh, NC 27695-8203
Phone 1-919-737-2531
NONLINEAR STATISTICAL MODELS
Table of Contents
Anticipated
Completion Date
1.
Univariate Nonlinear Regression
1.0
1.1
1.2
1.3
1.4
I.S
1.6
1.7
1.8
Completed
Preface
Introduction
Taylor's Theorem and Matters of Notation
Statistical Properties of Least Squares
Estimators
Methods of Computing Least Squares Estimators
Hypothesis Testing
Confidence Intervals
References
Indn
2.
Univariate Nonlinear Regression:
Special Situations
3.
A Unified Asymptotic Theory of Nonlinear Statistical
Models
December 1985
Completed
3.0
3.1
3.2
Prehce
Introduction
The Data Cenerating Model and Limits of Cesaro
Sums
3.3 Least Mean Distance Estimators
3.4 Method of Moments Estimators
3.S Tests of Hypotheses
3.6 Alternative Representations of a Hypothesis
3.1 Indepently and Identically Distributed Regressors
3.8 Constrained Estimation
3.9 References
3.10 Indn
4.
Univariate Nonlinear Regression:
Asymptotic Theory
Completed
4.0
4.1
4.2
4.3
Preface
Introduction
Regularity Conditions
Characterisations of Least Squares Estimators and
Test Statistics
4.4 References
4. S Index
S.
Multivariate Linear Models:
Review
Deletion likely
Table of Contents (continued)
Anticipated
Completion Date
6.
Multiv~ri~te
Nonlinear Models
Completed
6.0
6.1
6.2
6.3
6.4
6.5
6.6
6.7
Preface
Introduction
Least Squares Estimators and Matters of Notation
Asymptotic Theory
Hypothesis Testing
Confidence Intervals
Mazimum Likelihood Estimators
An Illustration of the Bias in Inference
Caused by Misspecification
6.8 References
6.9 Indu
7.
Linear Simultaneous Equations Models:
Review
8.
Nonlinear Simultaneous Equations Models
June 1985
9.
Dynamic Nonlinear Models
December 1984
9.0 Preface
9.1 Introduction
9.2 A Uniform Strong Law and! Central Limit
Theorem for Dependent, Nonstationary
Random Variables.
9.3 Data Generating Process
9.4 Least Mean Distance Estimators
9.5 Method of Moments Estimators
9.6 Hypothesis Testing
9.7 Ref erences
9.8 Indu
Deletion likely
9-0-1
The statistical analysis of dynamic nonlinear models, for instance a model
such as
t
= 0, ±1, ±2, ...
with serially correlated errors, is little different from the analysis of static
models as far as applications is concerned.
The formula for estimating the
~
-
variance of the average of the scores -- the formula for J and J -- changes but
little else.
Thus, as far as applications is concerned, the previous intuition
and methodology carries over directly to the dynamic situation.
The main theoretical difficulty is to establish regularity conditions that
permit a uniform strong law and nearly uniform central limit theorem that are
both plausible (reasonably easy to verify) and resilient to nonlinear
transformation.
The time series literature is heavily oriented toward linear
models and is thus not of much use.
The more recent martingale central limit
theorems and strong laws are not of much use either because martingales are
essentially a linear concept
a nonlinear transformation of a martingale is
not a martingale of necessity.
In a series of four papers McLeish (1974, 1975a,
1975b, 1977) developed a notion of asymptotic martingales which he termed
mixingales.
This is a concept that does extend to nonlinear situations and
the bulk of this chapter is a verification of this assertion.
the extension is this.
Conceptually Yt in the model above is a function of
all previous errors e t , e - 1 , . . . .
t
is a function of e , ... , e
t
The flavor of
t -m
But if Yt can be approximated by
and the error of approximation Ily -ytll
t
Yt that
p
falls
off at a polynomial rate in m then smooth transformations of the form
g(Yt' ... ' Yt-l'x t , ... , xt-l'Y) follow a uniform strong law and a nearly uniform
central limit theorem provided that the error process is strong
mixing.
rest of the analysis follows along the lines laid down in Chapter 3 with a
The
9-0-2
reverification made necessary by a weaker form of the uniform strong law:
an average of random variables becomes close to the expectation of the average
but the expectation
itself
does not necessarily converge.
These results
were obtained in collaborative research with Charles Bates, Halbert White, and
Jeffrey M. Wooldridge while they visited Raleigh in the summer of 1984.
National
Science Foundation support for this work is gratefully acknowledged.
The reader who is applications oriented is invited to scan the regularity
conditions to become aware of various pitfalls, isolate the formula for J or J
relevant to the application, and then apply the methods of the previous chapters
forthwith.
A detailed reading of this chapter is not essential to applications.
The material in this chapter is intended to be accessible to readers familiar
with an introductory, measure theoretic, probability text such as Ash (1972),
Billingsly (1979), Chung (1974), or Tucker (1967).
In those instances where the
proof in an original source was too terse to be read at that level, proofs with
the missing details are supplied here.
Proofs of new results or significant
modifications to existing results are, of course, given as well.
Proofs by
citation occur only in those instances when the argument in the original source
was reasonably self contained and readable at the intended level.
~
9-1-1
1.
INTRODUCTION
This chapter is concerned with models which have lagged dependent variables
as explanatory variables and (possibly) serially correlated errors.
Something
such as
u
t
= e
t
+
y;
e -l
t
=0.±1.±2 • . . .
t
might be envisaged as the data generating process with {e } a sequence of. say.
t
independently and identically distributed random variables.
As in Chapter 3.
one presumes that the model is well posed so that in principle. given Yt-l' x '
t
yi.
Ut' one could solve for Yt.
Thus an equivalent representation of the model
is
yO)
Yt = Y(u t • Yt-l' x t ' I
u
t
t
= e
t
+ y; e t - l
...
= O. ±l. ±2.
substitution yields
and if this substitution process is continued indefinitely the data generating
process is seen to be of the form
Yt = Y(t. e , x , yO)
""
""
t
0, ±l, ±2 •...
with
e"" = ( •.. , e_ l • eO' e l •.•• )
x"" = ( ••• , X_I' x
o'
Xl' ••• )
Throughout. we shall accommodate models with a finite past by setting Yt' x ' e
t
t
equal to zero for negative t; the values of Yo' x ' and eO are the initial conditions
o
in this case.
9-1-2
If one has this sort of data generating process in mind, then a least
mean distance estimator could assume the form
with sample objective function
or
n
~
s n (A) = (l/n)E t= lS t (y,y
t t- l'···'YO'x,x
t t- 1, ... ,xO,T n ,A);
the distinction between the two being that one distance function has a finite
number of arguments and the number of arguments in the other grows with t.
Writing
with it depending on t accommodates either situation.
Similarly, a method of
moments estimator can assume the form
A = argminl\ s (A)
n
n
= d[mn (A),; n J
with moment equations
m (A)
n
In the literature, the analysis of dynamic models is unconditional for the
most part and we shall follow that tradition here.
amongst the components of x
t
are accommodated by viewing them as random variables
that take on a single value with probability one.
is no
Fixed (nonrandom) variables
Under these conventions there
mathematical distinction between the error process {e}=
and the process
t t=-=
{xt}::_=describing the independent variables.
the independent variables {x }=
t t=-=
The conceptual distinction is that
are viewed as being determined externally
to the model and independently of the error process {e t }:=_=.
In an unconditional ~
9-1-3
analysis of a dynamic setting we must permit the process {x}~
to be dependent
t t=-~
and, since fixed (nonrandom) variables are permitted, we must rule out stationarity.
We shall also permit the error process {et};=_~ to be dependent and nonstationary
primarily because nothing is gained by assuming the contrary.
Since there is no
mathematical distinction between the errors and the independent variables, we can
economize on notation by collecting them into the process {v}~
with
t t=-~
denote a realization of the process by
Recall that if the process has a finite past then we set v
t
= a for t
<
a and
take the value of va as the initial condition.
Previously, we induced a Pitman drift by considering data generating
processes of the form
and letting yO tend to a point y*.
n
In the present context it is very difficult
technically to handle drift in this way so instead of moving the data generating
model to the hypothesis as in Chapter 3 we shall move the hypothesis to the model
by considering
H: h(AO) = h*
n
n
against
*
A: h(AO) 1 h
n
n
and letting h(AO) -h * drift toward zero at the rate O(l/~).
n
n
This method of
inducing drift is less traditional but in some respects is philosophically more
palatable.
It makes more sense to assume that an investigator slowly discovers
the truth as more data becomes available than to assume that nature slowly
accommodates to the investigator's pigheadedness. But withal, the drift is only
a technical artifice to obtain approximations to the sampling distributions of
test statistics that are reasonably accurate in applications so that
9-1-4
phi1isophical nitpicking of this sort is irrelevant.
If the data generating model is not going to be subjected to drift there
is no reason to put up with the cumbersome notation:
Yt = Y(t,e 00 ,x00 ,yO)
sO.)
= (l/n)I;nt= lS t (y,y
o,x,x
n
t t- l'···'Y t-~
t t- l'···'x t-~0"; n ,A)
t
t
Much simpler is to stack the variables entering the distance function into a
single vector
and view w as obtained from the doubly infinite sequence
t
by a mapping of the form
Let w be kt-dimensional.
t
Estimators then take the form
in the case of least mean distance estimators and
A = argmin. s (A) = d[m (A),; J
n
1l
n
n
n
m (A) = (l/n)I;~=lmt(Wt' ;n,A)
n
in the case of method of moments estimators.
We are
led then to consider limit
9-1-5
theorems for composite functions of the form
n
(l/n)E t= Ig t [W t (v~ ),yJ
which is the subject of the next section.
There y is treated as a generic
parameter which could be variously yO, (T,A), or an arbitrary infinite
dimensional vector.
9-2-1
2.
A UNIFORM STRONG LAW AND A CENTRAL LIMIT
THEOREM FOR DEPENDENT, NONSTATIONARY RANDOM VARIABLES
Consider a sequence of vector-valued random variables
t = 0,±1,±2,
l
defined on a complete probability space (12,a,p) with range in, say, lR .
Let
, Borel measurable
l
of the form Wt (v )
is in lR and consider vector-valued/functions
A
where each v
0)
t
with range in lR
kt
for t
=
0, 1,
The subscript t serves three functions.
It indicates that time may enter as a variable.
It indicates that the focus of
the function W is the component v of VO) and that other components V enter the
t
t
s
computation, as a rule, according as the distance of the index s from the index
t, for instance
W (v )
t
0)
0)
= E.J= OA.V
.
J t -J
And, it indicates that the dimension k
t
of the vector w = Wt (v ) may depend on t.
t
0)
Put
then Wt[VO)(w)J is a kt-dimensional random variable depending (possibly) on infinitely
many of the random variables Vt(w).
often write Wt(w) or W instead.
t
{gnt(wt,y):
{gt(wt,y):
This notation is rather cumbersome and we shall
Let (f,p) be a compact metric space and let
n=1,2, ... ; t=0,1, .•. }
t=0,1, .•. }
be sequences of real valued functions defined over lR
kt
x f.
shall set forth plausible regularity conditions such that
almost surely (12, G,p) and such that
In this section we
9-2-2
for any sequence
{yO}
n
from
r,
convergent or not.
We have seen in Chapter 3 that
these are the basic tools with which one constructs an asymptotic theory for
nonlinear models.
As mentioned earlier, these results represent adaptations
and extensions of dependent strong laws and central limit theorems obtained in
a series of articles by McLeish (1974, 1975a, 1975b, 1977).
Additional details
and some of the historical development of the ideas may be had by consulting
that series of articles.
We begin with a few definitions.
The first defines a quantitative measure
of the dependence amongst the random variables {v}~
.
t t=-~
STRONG MIXING.
A measure of dependence between two sigma-algebras J
and
~
is
IP(FG)
a(3,q) = sUPFE3' ,G~
- P(F)P(G) I
The measure will be zero if the two sigma-algebras are independent and positive
otherwise.
Let {Vt};=_~be the sequence of random variables defined on the
complete probability space (12,a,p) described above and let
denote the smallest complete (with respect to p) sub-sigma-algebra such that the
random variables V for t = m, m+l,
t
t
~
a m = sup t a(J -~ ,Jt+m).
Observe that the faster a
l~
exhibits.
{VtIt=_~
m
m
... ,
n are measurable.
Define
[J
converges to zero, the less dependence the sequence
An independent sequence has a m
>0
for m = 0 and a m = 0 for
> O.
Following McLeish
(1975b~
we shall express the rate at which such a sequence
of nonnegative real numbers approaches zero in terms of size.
9-Z-3
SIZE.
-q if a
{a m}:=1
A sequence
8
m
= O(m ) for some 8
of nonnegative real numbers is said to be of size
< -q.
o
This definition is stronger than that of McLeish.
However, the slight
sacrifice in generality is irrelevant to our purposes and the above definition
of size is much easier to work with.
a bound B with
la mI
~
Bm
e
Recall that a
m
8
= O(m ) means that there is
for all m larger than some M.
Let {€t:
Withers (1981, Corollary 4.a) proves the following.
t=O,±l,±Z,···t
be a sequence of independent and identically distributed random variables each
with mean zero, variance one, and a density P€(t) which satisfies
P€(t+h)ldt ~ IhlB for some finite bound B.
distributed then
this condition
is
satisfied.
If each €t is normally
Let
~
e t = Ej=O dj€t_j
where d
j
Izi ~ 1.
= O(j-v) for some v
>
3/2 and E;=OdjZ
Suppose that I/€tl/o ~ const.
<~
j
~
0 for complex valued z with
for some 0 with Z/(v-l)
Then {e } is strong-mixing with {a } of size -[6(v-1)-Z]/(6+1).
t
m
distributed {€t} there will always be such a 6 for any v.
<0 <v
+~.
For normally
These conditions are
not the weakest possible for a linear process to be strong-mixing; see
Withers (1981) and his references for weaker conditions.
The most frequently used time series models are stationary autoregressive
moving average models, often denoted ARMA (p,q),
with the roots of the characteristic polynomials
p-l
+
mP + aIm
q-l
q
+
m + blm
+ a
...
+ b
p
q
= 0
= 0
9-2-4
less than one in absolute value.
Such processes can be put in the form
where the d. falloff exponentially and
]
E;=O
djZ
j
~ 0 for complex valued z
with Izi S 1 (Fuller, 197&, Theorem 2.7.1 and Section 2.4)
for any v
> O.
whence d.
]
= O(j-v)
Thus, a normal ARMA (p,q) process is strong-mixing of size -q
for q arbitrarily large; the same is true for any innovation process {€t} that
satisfies Withers' conditions for large enough
o.
It would seem from these remarks that an assumption made repeatedly in
the sequel:
"{V}
t
00
t=-oo
is strong-mixing of size -r/(r-2) for some r
is not unreasonable in applications.
> 2,"
If the issue is in doubt, it is probably
easier to take {V } to be a sequence of independent random variables or a
t
finite moving average of independent random variables which will certainly
be strong-mixing of arbitrary size and then show that the dependence of
observed data W on far distant V is limited in a sense we make precise below.
t
s
This will provide access to our results without the need to verify strong-mixing.
We shall see an example of this approach when we verify that our results apply
to a nonlinear autoregression (Example 1).
9-2-5
Consider the vector-valued function W (V ) which, recall, depends (possibly)
t
""
on infinitely many of the coordinates of the vector
If the dependence of W (V ) on coordinates V
t
s
""
far removed from the position
occupied by V is too strong, the sequence of random variables
t
W
t
= W (V
t
""
)
will not inherit any limits on dependence from limits placed on {V}""
as measured
t t=-""
by {am}:=O'
In order to insure that limits placed on the dependence exhibited by
carryover to {W}""
{V}""
t t=-""
t t=O
we shall limit the influence of V on the values
s
taken on by Wt(V"") for values of s far removed from the current epoch t.
A
quantative measure of this notion is as follows.
NEAR EPOCH DEPENDENCE.
Let {V}""
be a sequence of vector-valued random
t t=-""
variables defined on the complete probability space (~~, a,p) and let J n denote
m
the smallest complete sub-sigma-algebra such that the random variables V for
t
t
= m,
m+l, ... , n are measurable.
=
W (v )
t
for
""
sequence of Borel measurable, functions with range in
on infinitely many of the coordinates of the vector
~
kt
t
=
0, 1, ... denote
that depends (possibly)
9-2-6
Let {gnt(w )} for n = 1, 2, ... and t = 0, 1, 2, '"
t
be a doubly indexed sequence
of real valued, Borel measurable functions each of which is defined over
The doubly indexed sequence {g
nt
~
kt
(W )} is said to be near epoch dependent of size -q if
t
is size -q.
Let (f,p) be a separable metric space and let {g
nt
(w ,y)} be a doubly indexed
t
family of real valued functions each of which is continuous in y for each fixed
kt
w and Borel measurable in WtE ~
for fixed y. The family {gnt(Wt,y)} is said
t
to be near epoch dependent of size -q if
(a)
The sequence {gO (W ) = g (W ,yO)} is near epoch dependent of size -q for
nt t
nt t n
every sequence yO from
n
(b)
r.
The sequences
and
are near epoch dependent of size -q for each yO in r and all positive
o
less than some 0° which can depend on yO.
0
The above definition is intended to include singly indexed sequences
in this instance.
~
For singly indexed families {gt(wt,y)};=O ' the definition
retains its doubly indexed flavor as {gO (W ) = gt(W ,yO)} is doubly indexed even
nt t
t n
9-2-7
Note that if W depends on only finitely many of the V , for instance
t
t
then any sequence {gnt(w t )} or any family {gnt(wt,y)} will be near epoch
dependent because one has
for m larger than i; similarly for
{gOnt (w t )}, {gnt (w t )},
and {g (w )}.
_nt t
The situations of most interest here will have the dimension of W fixed
t
at k
t
where
= k for all t and gnt(w) or gnt(w,y) will be smooth so that
w is
on the line segment joining w to w.
Letting
where Ixl
using
II:
x·y.1
1. 1.
Ixl Iyl·
:li
For functions g
nt
(w) or g t(w,y) that are smooth
n
enough to satisfy this inequality, the following lemma and proposition aid
in
showing near epoch dependence.
{W}'"
0' and {g (w,y): t=0,1,2, ... ; n=1,2, ... }
nt
t t=
be as in the definition of near epoch dependence but with k
where
Iw-wAI
= [I:.k
1.=1
1
A 2 J~
(w.-w.)
or any other convenient norm on
1.
1.
At+m
there exist random variables W
of the form
t-m
At+m
W
t-m
t
= W(V t-m , ••• , Vt , ••• , Vt +m)
= k for all t.
mk .
Let
Suppose that
9-2-8
such that for some r
>2
and some pair p, q with 1
~
p,q
~ ~,
l/p + l/q = 1
we have:
dominated by random variables d ntm with
(a)
II dntm II q
= [f Id
ntm
IqdP J1/ q
:il
r:. < ~
Y)IW _wt+ml} dominated by random variables d
with
{Bnt (W t' Wt+m
t-m'
t t-m
ntm
II d ntm II r
:il
r:. < ~ .
If
is size -2q(r-l)/(r-2) then {g nt (W t ,y)} is near epoch dependent of size -q.
o
First, we prove the following lemma.
LEMMA 1.
Let {Vt}~=_~ and {Wt}~=O be as in Proposition 1 and let {gnt(w)}
be a sequence of functions defined over
mk
with
t
For {Wt+m}, rand q as in Proposition 1 let liB t(W ,w +m)1I
t-m
n
t t-m q
t m
II B (W W
+ ) I W _wt+ml
nt t' t-m
t t-m
PROOF.
II r ~ r:. < ~.
:il
r:. < ~
and let
(W )} is near epoch dependent of size -q.
nt t
A At+m
t+m
Let g(w) = g (w), W=Wt,W=W
,and:J =3'
. For
nt
t-m
t-m
Then {g
c =
let
A
Bl(W,W)
= {B(Wo'W)
A
B(w,w)IW-wl
A
A
B(W,W) w-WI
A
A
A
and let B (W,W) = B(W,W) - Bl(W,W).
2
:il
c
>c
Then
because e(gWIJ) is the best J-measurable approximation to gW in L2-norm and W
is
~-measurable
9-Z-9
- -
~ IIB(w,w)lw-WI
liz
AI>....
....,..
~ II Bl (w,w) Iw-wl
by
liz + IIBZ(W,W) IW-wl liz
the triangle inequality
= {J[B l (W,W)j ZIW-WI 2 dP}~ + {J[BZ(W,W)j2IW-wI2dP}~
~ C~{JBl(W,W) Iw-wldP}~ + c(2-r)/2 {Jc-2+r[B2(W,W)j2IW-WlzdP}~
~ c~IIB1(w,w)II~1I Iw-wl II~ + /Z-r)/2{J[B2(W,W)jrlw-wlrdP}~
by the Holder inequality
= c~IIB1(w,w)II~
=
2~
II Iw-wl
II Iw-wl II; + c(Z-r)/2 1IB (w,w) Iw-wl
II~(~:I)IIB(W,W)II~(~:I>
IIB(W,W) Iw-wl
after substituting the above expression for c and some algebra.
and II B(W, w)
Iw-wl
II r ~ t. then we have
IIgW - e(gwIJ)1I
2
:;l
2~
t. II Iw-wl
II;(~:I)
whence
If n
m
is size -Zq(r-2)/(r-Z) then v
PROOF OF PROPOSITION 1.
m
is size -q.
Now
implies
-gnt(w,y) ~ Bnt(w,;,y)lw-;1 - gnt(;'y)
-gnt(;'y) ~ Bnt(w,;,y)lw-;1 - gnt(w,y)
[J
11:/ 2
11~(r:I)
9-2-10
whence, using sup{-x} = -inf{x}, one has
~ sup p ( y,y O)<ru Bnt (w,;,y) Iw-;I
A similar argument applied to
yields
Isupp(y,yO)<o gnt(w,y) - supp(y,yO)<o gnt(;,y)1
~ sup p ( y,y O)<ru Bnt (w,;,y) Iw-;I.
We also have
All three inequalities have the form
Ignt(w) - gnt(;)I ~ Bnt(w,;)lw-;1
with
IIBnt(Wt,W~~)lIq :£
applies to all three.
t.
<
00
and
IIBnt(Wt,W~~)IWt- w~~:1
II r
~
t.
<
00
whence Lemma 1
Thus part (a) of the definition of near epoch dependence
obtains for any sequence {yO}
n
and part (b) obtains for all positive 0.
(J
The following example illustrates how Proposition 1 may be used in applications,
EXAMPLE 1.
(Nonlinear Autoregression)
Consider data generated according to
the model
t
=
1, 2,
t
~
0
Assume that f(y,x,S) is a contraction mapping in y; viz
I <a/3y)f(y,x,S)1
:;; d < 1 .
9-2-11
Let the errors {e } be strong mixing with Iletllp ~ K
t
e
t
=
a
for t
~
e
with
t
t
t
O.
=
00
for some p
E.t
y e:
,. e:
j =0 j t - j' '" t
=a
'
e Ie
t
Ip ~
K
<
00
~
0 and V = (et,x ) for
t
t
A
Suppose that eO is estimated by least squares -- e
...
> 4; set
As an instance, let
With this structure, V = (0,0) for t
t
finite.
= 1, 2,
<
n
minimizes
We shall show that this situation satisfies the hypotheses of Proposition 1.
To this end, define a predictor of y
of the form
s
as follows:
>\
=
a
t
Yt = f(y t _ 1 ,x t ,eO)
AS
Yt,m = Ys - m
AS
As-1
f(y
,x ,eO) + e
Yt,m =
t,m s
s
For m
~
0, t
~
0 there
:;l
a
a <t
s
~
max(t-m,O)
<s
max(t-m,O)
~
is a Y on the line segment joining Y to Y such that
t
t
t
Iyt-ytl = If(y t _ 1 ,x t ,eO) + e t - f(y t _1 ,x t ,eO)1
= 1(3/3y)f(Yt,xt,eO)(Yt_l- Y - )
t 1
:;l
~
t
+ etl
dly t - 1 - yt-11 + let'
2
d ly t - 2 - y t - 2 1 + diet_II + Ie t
I
9-2-12
For t-m)O the same argument yields
where the last inequality obtains by substituting the bound for IYt - ytl obtained
previously.
For t - m
<0
we have
In either event,
This construction is due to Bierns (1981, Chapter 5).
Letting
we have
with
For t
~
1
9-2-13
At-ll
I
At
I
I
At-ll
+ d I Yt - l - Yt,m )( Yt - Yt,m + d Yt - l - Yt,m )
+ dmE t -m- 2 d j
j=O
t
Ie
t - m- l - j
I)( IY _ Yt,m
At I
t
+
-1, )
IY - _ Yt,m
At
t l
t
Wt-m ) Iw t - Wt-m I
where we take as a convenient norm
We have at once:
IIB(wt'w~_m)lIp S 2K[l + dm/(l-d)J ~
II
Iwt
t
- W I II ~ 2K dm/(l-d) ~ t:,
t-m
p
Using the Holder inequality, we have for r
=
<
t:,
<
00
00
all m,t
all m,t .
p/2 that
t
II B(W t ,wtt-m )IW t - Wt-m I II r
At
Note that B(W ,W
) is not indexed by
t t-m
variables.
Put q = p/(p+l)
<p
whence
e
so the above serve as dominating random
9-2-14
Thus, the example satisfies conditions (a) and (b)
of
Proposition 1.
The rate at which
Lastly, note that
nm falls off with m is exponential since d < 1 whence nm is
> 2.
size -q(r-1)/(r-2) for any r
Thus all conditions of Proposition 1 are
satisfied.
If the starting point of the autoregression is random with Yo
II YII p
~
K the same conclusion obtains.
= Y where
One can see that this is so as follows.
In the case of random initial conditions, the sequence {V } is taken as Vt
t
for t
< 0,
Va
= (Y,O),
and V
t
= (et,x t )
for t
> O.
For t - m
>a
=
(0,0)
the predictor
Y~,m has prediction error (Problem 2)
~ dtlYI + dmL:-m-ldjle
]=0
~ dmlYI + dmL:-m-1djle
]=0
For t-m
<a
~t
t,m
.1·
t-m-]
one is permitted knowledge of Y and the errors up to time t so that Yt
can be predicted perfectly for t-m
Y
.1
t-m-]
< O.
Thus, it is possible to devise a predictor
with
The remaining details to verify the conditions of Proposition 1 for random initial
conditions are as above.
U
McLeish (1975b) introducted the concept of mixingales -- asymptotic martingales -on which we rely heavily in our treatment of the subject of dynmaic nonlinear models.
The definition is as follows
MIXINGALE.
{x
nt
Let
: n = 1, 2,
... ,
t
=
1, 2, ... }
9-2-15
be a doubly indexed sequence of real valued random variables in LZ(n,u,p) and
let ~t-00 be an increasing sequence of sub-sigma-algebras.
mixingale if for sequences of nonnegative constants {c
lim
nt
Then (Xnt ,~t)
is a
-00
} and {W } with
m
W = 0 we have for all t ~ 1, n ~ 1, and m ~ 0 that
m-o m
II e(Xnt I ~t-m) II Z
(a)
:5
-00
w.m c nt
(b)
c
[J
nt
The intention is to include singly indexed sequences {Xt};=l as a special
case of the definition.
c
t
Thus (Xt'~:oo) is a mixingale if for nonnegative W and
m
W = 0 we have
with lim
m-o
m
(a)
lI e ( x t l;;::m)11 2
(b)
Ilx t -e( x t
~ Wmc t
l;;::m)lI z ~
Wm+1Ct
There are some indirect consequences of the definition.
We must have
(Problem 3)
II Xnt - e (Xnt I;; t+(m+l» II Z
II Xnt
:5
-00
- e(x
nt
I;; t+m) II Z
-00
Thus, Wm appearing in the definition could be replaced by W'm = min n_m
<
Wn so that
one can assume that W satisfies Wm+l
m
= n
00
t=-oo
;;t
-00
and letting
00
;;-00
~
W without loss of generality.
m
denote the smallest complete sub-sigma-
algebra such that all the V are measurable, we have from
t
Consequently, e(X
nt
) = 0 for all n, t
X - e (X I'i")
= 0 almost surely.
nt
n t - oo
Letting
~
1.
By the same sort of argument
9-2-16
Every example that we consider will have X
a function of the past, not
nt
the future, so that X
will perforce be ~
nt
-~
measurable.
This being the case,
condition (b) in the definition of mixingale will be satisfied trivially and is
just excess baggage from our point of view.
Nonetheless, we shall carry it along
through Theorem 2 because it is not that much trouble and it keeps us in conformity
with the literature.
The concept of a mixingale and the concepts of strong mixing and near epoch
dependence are related by the following two propositions.
random variable with range in lR
PROPOSITION 2.
k
Recall that if X is a
then
Suppose that a random variable X defined over the probability
space (12,a,p) is measurable with respect to the sub-sigma-algebra
in lR
k
•
q and has range
Let g(x) be a real valued, Borel measurable function defiaed over lR
with eg(X) '"' 0 and
II g(X) II r <
co
for some r
> 2.
k
Then
~ 2(2~ + l)[a(:J,q)J~-l/r
lIe(gxl:J) II 2
for any sub-sigma-algebra J.
PROOF.
(Hall and Heyde, 1980, Theorem A.S; McLeish, 1975b, Lemma 2.1)
Suppose that U and V are univariate random variables, each bounded in absolute
value by one, and measurable with respect to J and
v '"' sgnle(VIJ) -e VJ which is :J measurable.
s
e
v
l e(V 13<) -
= ele(vvl:J)
e VJ
-e veV J
'"' e(vV) - eveV .
q respectively.
We have
Let
9-2-17
The argument is symmetric so that for
~
.. sgn[e(ulq)
-e uJ
we have
leuv -euevl ~ e(~U) - e~eu
But' \) is just a particular instance of an:J measurable function that is bounded
by one so we have from this inequality that
Combining this inequality with the first we have
Put F_ l '" {w: \) .. -I}, F .. {w: \) .. I}, G_ .. {w: ~ .. -I}, and G .. {w: ~ .. I}.
l
l
l
Then
We have
of which the second inequality will be used below and the first is of some
interest in its own right (Hall and Heyde, 1980, p. 277).
The rest of the proof is much the same as the proof of Lemma 1.
Put
a" aCJ,Q), c,. a-l/rllgXll , Xl ,. r<lgxl s c), X ,. gX - Xl where r<lgXl :ii c) ,. 1
r
2
if
I gXI
and
e (gX /3)
:ii c and zero otherwise.
..
o.
For ex
>0
by the triangle inequality
If;] and q
are independent we will have ex .. 0
9-Z-l8
by the conditional Jensen's inequality (Problem 4) and the fact that Ix '
z
+ Zlc p - r
J
c r - p IXzlP dPjl/p
~ l(Zc)p-l Jle(xlIJ) - eXlldPjl/p + Zlc p - r Jlxzl r dPjl/p
~ (Zc>(p-U/p lele(xlIJ) - e x , jl/p + z c(p-d/p Ilx211~/p
l
~ (2c)(P-U/p (4ccd l / p + 2c(p-d/p
II gXII r/p
r
by the inequality derived above and the fact that IgXI
~ 2(Zl/p + Ua l / p - l/r IIgxll
~
IXzl
r
after substituting the above expression for c and some algebra.
PROPOSITION 3.
U
be a sequence of vector valued random variables
that is strong-mixing of size -Zqr/(r-Z) for some r
>Z
and q
> O.
Let Wt
=. Wt (V
k
denote a sequence of functions with range in R t that depends (possibly) on
infinitely many of the coordinates of the vector
Let {g (w)} for n ~ 1, Z,
nt t
and t
~
0, 1, Z, ••• be a near epoch dependent
sequence of real valued functions that is near epoch dependent of size -q.
Let
J~oo denote the smallest complete sub-sigma-algebrasuch that the random variables
V for t • n, n-l, ..• are measurable.
t
Then
00
)
9-2-19
with {tjlm J of size - q and c
PROOF.
nt
Recall that 3 n
m
= max{1, IIgnt Wt // r fl
•
u
denotes the smallest complete
that Vm, Vm+l' ... , Vn are measurable.
Let m be even.
sub-sigma-algebra such
Then
= lie le (gnt Wt /3 t-m ) IJ t- (m+l) 11/ 2
-00
-00
by the law of iterated expectations
by the conditional Jensen's inequality (Problem 4)
t m
~ Ile[e( gnt Wt l..."t-m/2
t+m/Z)la - JII
2
-00
by the triangle inequality
~ 2(2~+1)[(l(3t-m 3t+m/2)J~-1/r
-00
+ Ilg
w -
nt t
'
e(g
t-m/2
II gnt wt II r
W 13t+m/2)II
nt t t-m/2
2
by Proposition 2 applied to the first term and by the conditional Jensen's
inequality (Problem 4) applied to the second
~ 2(2~+1)«lm/2)~-1/rllgntWt II r
by definition of near epoch dependence.
and c
t
= max {1, II gnt Wt II r }
whence
+ v /
m 2
Put W
m+l
= tjIm = 2(2~+1)«l m)~-l/r+
v
/2m/2
9-2-20
for all m
~
0, n
~
1, t
~
1 and
~
m
is size -q.
Again, let m be even whence
is the best L2 approximation to gntWt by an ~~:m+l
t+m
t+m+l
measurable function and e(gntWt~_~ ) is 3_~
measurable (Problem 5)
because
e(gntWtl~:m+I)
II gnt Wt
-<
e ( gnt Wt lat+tn)II
t -m 2
-
by the same best L approximation argument
2
~
'J
m
by the definition of near epoch dependence
by the best L
for all m
approximation argument.
2
~ O.
We have
0
A mixingale (Xnt~:~) with
{W m}
of size -~ will obey a strong law of large
numbers and a central limit theorem provided that additional regularity conditions
are imposed on the sequence {c
nt
}.
An inequality that is critical in showing
both the strong law and the central limit theorem is the following.
LEMMA 2. (McLeish's inequality) Let (X t,-:l ) be a mixingale and put 8 .=E J" IX t
n
-~
nJ
t= n
be a doubly indexed sequence of constants with a
Then
e(maxo<U 8
]=..(,
2
nj
)
k
= a_ k
and
9-2-21
PROOF.
(McLeish, 1975a) We have from Doob (1953, Theorem 4.3) that
tim
e.(X
m+'X'
almost surely.
nt
I~ t_+tn
oo ) = e. (X
Q'
nt
1:;00)
-00 = Xnt
It follows that
almost surely since
Em
k
o
[e(x
~
nt
I:Jt+k) - e.(X
-~
= e(x
I:Jt+k-l)J
n t - oo
lat+tn) - e.(X
nt
-00
nt
l:;t-t-1)
-00
Put
Y = Ej vO(X I~t+k) _ o(X I~t+k-l)
jk
t=1
nt Q'-oo
v
nt Q'-oo
00
whence SnJ. = E -__ 00 YOJ almost surely.
k
k
By the Cauchy-Schwartz inequality
whence
By the monotone convergence theorem
e.(maXj~t S~j)
:£
(E:=_ooalC)E:=_oo e,(maxj:ilt
o+k
For fixed k, {(Yok,:;J ): 1 ~ j ~
J
-00
CXYo l;yk+ j -1)
Jk -00
t}
=Y
j-l,k
A martingale with a last element
t,
Y~k)/ak
is a martingale since
+ e.[e.(X .I:;j+k) - e(x .1:;j+k-l)l:;k+j-1 J
nJ -00
nJ -00
-00
such as the above, satisfies Doob's inequality
(Hall and Hyde, 1980, Theorem 2.2 or Doob, 1953, Theorem 3.4) whence
9-2-22
Now (Problem () )
Let Z k = X
nt
nt
e (X
t I;,t+k) whence
n-~
e [xe<xl~) la J
= e,2(xle) implies
and
=
ee2(xntl;':~)/ao
+
e
- ee2(xntI3::1)/ao
I
2
2
t
ZntO/al - ee (X nt :J_~)/al
(Problem 7)
-1
2
2
+ Ek =1 c nt Wk + 1 (a k+ 1
~
a
-1
)
k
2
-1
-1
2
+ Ek =1 c nt Wk (a k - a k + 1 )
~
= c
Thus,
2
[WO/a
nt
O+
2
w2l /a O +
~
2E k =1
wk2
(a
-1
k
-1
a _ )J
k 1
9-2-23
A consequence of Lemma 2 is the following inequality from which the strong
law of large numbers obtains directly.
LEMMA 3.
Let
(Xnt'~:~) be a mixingale with {~m}
of size -~ and let S . = L
X
nJ
t=1 nt
j
Then there is a finite constant K that depends
only on {~ } such that
m
2
e(max.<u S .)
J-.(,.. nJ
If
~
m
> 0 for all m then
PROOF.
for some m since
put a
k
By Lemma 2 the result is trivially true if
(McLeish, 1977)
Then assume that
~m ~ ~m+1·
2
~
= [~k(~k + 4a _ ) k 1
2
~kJ/ak_1
for k
~
~m
>0
for all m, put a O =
so that
<
-~
and using an integral approximation we have
B'k- 2S +1 for some B'.
~
::.l
<
~
k
Lk=l(Lm=O
(B' )-~
~
~
Thus
-2 -~
m )
~
Lk=1 k
S-~
m
~O'
= 0
and
I whence a k is positive and solves
Then
B mS for some S
~
9-2-24
Further,
2
2
(W o
+ w )/a
1
O
Putting a_
result.
k
= a
k
~
2a
O
because
2
2
Wm is a decreasing sequence and a O
= ~O .
and substituting into the inequality given by Lemma 2 yields the
[j
A strong law of large numbers for mixingales follows directly using the
same argument that is used to deduce the classical strong law of large numbers
from Kolmogorov's
inequality.
For detailed presentation of this argument see
the proof of Theorem 2, Section 5.1, and Theorem 1, Section 5.3, of Tucker (19b7).
The strong law reads as follows.
PROPOSITION 4.
Wm of size
-~
Suppose that (Xt,J:~) is a singly indexed mixingale with
(whence, recall, eXt = 0 for all t).
~
(a)
(b)
~
If E
t=l
n
then Et=l X converges almost surely.
t
< ~ then timn-+<><> (l/n)E nt =1 Xt = 0 almost surely.
[j
We are now in a position to state and prove a uniform strong law of large
numbers.
The approach is due to Hoadey (1971).
Below, we state some standard
definitions and results (Neveu, 19b5, p. 49-54), prove an intermediate result,
then state and prove the uniform strong law.
UNIFORMLY INTEGRABLE.
A collection {X : A
A
E
A} of integrable random variables
is uniformly integrable if
PROPOSITION 5.
r
>
If
II XA II r
~ t::.
<~
for all A in some index set A and for some
1 then {X : A E A} is uniformly integrable.
A
PROPOSITION b.
The following are equivalent
(a)
{Xn}:=l is uniformly integrable and Xn
(b)
X is integrable and timn-+<><>
Ilxn
-
xIII
~X
= O.
9-2-25
PROPOSITION 7.
Let (f.p) be a separable metric space and let X (y) be
o
continuous in y for n = 1. 2 •...
integrable and limy~yo Xn(y)
integrable.
= Xn
If {X (y): 0=1.2 •... ; y
o
E
f} is uniformly
almost surely then {Xn }:=l is uniformly
If in addition.
lim
0
sup IX (y) - X I = 0
y-ry
0
n
n
almost surely then
lim
0
sup elx (y) - X I = 0
y-ry
n
n
n
PROOF (Hoadley. 1971).
Holding
0
Choose M large enough that
fixed. let
A = {w: lim
0
X (y) = X }
y..,.y
0
0
B(y)
= {w:
IXn(y)1
> M}
{w: IXo I > M}
B =
continuity and separability insure the measurability of these sets.
equal one if w is in A and zero otherwise.
Then
lim
0
I[AnB(y)J IX (y)1 = I(AnB)IX I
y-ry
0
n
and by Fatou's lemma (Royden. 1963)
II
I
X >M
n
Ix
0
I
dP
=I
I(AnB)lx
= I IX
n
<~ .
(y)
n
I
dP
I >M IXn (y) I dP
Let I(A)
9-2-26
This proves the first assertion.
As to the second, suppose that sup e IX (Y) - X
n
Then there is an €>O, a subsequence n.
does not converge to zero.
J
<€ <e
sequence y. with p(y.. yO) +0 such that 0
J
Zj
Now {X .(Y.)}
nJ
J
J
=
+
n
n
I
00, and a
Iz.1 where
J
Ixnj(y j ) - xnjl .
is uniformly integrable by assumption and we have just shown that
{X .} is uniformly integrable.
nJ
must be uniformly integrable.
This contradicts liz
.11 >
J
€
Using Iz.1 ~ IX .(y.)1 + Ix
J
nJ
J
.1
nJ
we see that {Z.}
J
Proposition 6 implies that lim.
Ilz.1I
J+OO
J
1
= o.
whence it must be true that
limy+y ° sup n e IXn (y) - Xn I
=
Using elx (y) - X I ~ I e X (y) -e X I
n
n
n
n
O.
we have the last assertion.
[j
We are now in a position to state and prove the Uniform Strong Law. In reading it,
the statement "f (y) is continuous in y uniformly in til means that for each fixed
t
yO in r
If a family {ft(y)} is continuous in y uniformly in t and (f,p) is a compact metric
space then the family {ft(y)} is equicontinuous; that is, given
o>0
that depends only on
for all y,yO and all t.
€
such that
(Problem 8)
€
>
0 there is a
e
9-2-27
THEOREM 1.
(Uniform Strong Law)
Let {Vt(W)};=_~ be a sequence of vector-
valued random variables defined on a complete probability space (n,a,p) that is
strongly mixing of size -r/(r-2) for some r
space and let W = W (V)
t ~
t
.
.
w1th range 1n
]R
kt
2.
Let (r,p) be a compact metric
- W (w) be a Borel measurable function of
t
Let {gt(wt,y)};=O
each Borel measurable for fixed y.
be a sequence of real-valued functions,
Suppose:
is a near epoch dependent family of size -;.,2
(a)
(b)
>
•
gt[Wt(w),y] is continuous in y uniformly in t for each fixed w in
some set A with P(A) = 1.
(c)
There is a sequence {d } of random variables with
t
lid t
II r
St:.<~
for t = 0, 1, 2, . . . .
Then
almost surely and
is an equicontinuous family.
PROOF
(Problem 9).
(Hoadley, 1971).
A compact metric space is perforce separable
Hypothesis (c) together with Proposition 5 implies that {gt(wt,y)}
is uniformly integrable.
Hypothesis (b) states that
9-2-28
almost surely whence by Proposition 7-
Then
timy-ry ° sup n
I (l/n)E nt= 1
e[gt(Wt'Y) - gt(W t ,yO)]1
S
timy-ry0 sUPn(l/n)E~=l elgt(wt,y) - gt(Wt,yO)1
=
o.
This proves the second assertion; see the remarks preceding the statement of
Theorem 1.
Proposition 7 also implies
whence hypothesis (b) is satisfied by
Since hypotheses (a) and (c) are trivially satisfied as well, we may assume that
eg(Wt,y) = 0 in the rest of the proof without loss of generality.
Let
The continuity of gt(w,y) in y and the separability of (f,p) insure measurability.
From hypothesis (b) we have
9-2-29
almost surely.
Hypothesis (c) implies uniform integrability and application of
Proposition 7 yields
Thus, given
€
for all t.
>0
there is for each yO in r a 0°
>0
so small that
The collection {~o} ° r with~ ° = {y: p(y,yO)
y
y t::
Y
covering of the compact set r so there is a finite subcovering
<
0°} is an open
{~y~}~=l
.
1
The sequence {ht(Wt'y~,o~)} satisfies the hypotheses of Proposition 3 whence
ht(Wt'y~,o~) is a mixingale with {~t} of size -~ and
max{l,IIh (W ,y~,o~)11 } ~ ~ < ~ (taking ~ > 1 if necessary). Now
t t
1
1
r
2 2
En
c /t < ~ and Proposition 4'applies. Then for w not in the exceptional
t=l t
c
t
set E
i
given by Proposition 4 put w = Wt(w) and there is an N such that n
i
t
>
N
i
implies
n
n
~ (l/ n )E t =
1ht(wt'y~,o~)
- O/n>E t -_ 1 e.ht(Wt'y~,o~)
11
11
whence
Now every y is in
i and we have that n
-
for w
t
€
~
n
Ui=l
E
(lin)
i
> max
~
°
Yi
for some
N. implies
1
E~=l gt(Wt,y) ~
with P(U N=l E ) = O.
i
i
€
This establishes the first assertion.
[J
9-2-30
As seen in Chapter 3, there are two constituants to an asymptotic theory
of inference for nonlinear models:
a Uniform Strong Law of large numbers and a
"nearly uniform" Central Limit Theorem.
Let {V t~
be a sequence of vector
t t=-~
(Central Limit Theorem)
THEOREM 2.
valued random variables that is strongly mixing of size -r/(r-2) for some r
Let (f,p) be a separable metric space and let W = Wt (V ~ )
t
V~
>
2.
be a function of
= ( ... , V_I' VO' VI' .•. )
with range in lR
kt
Let
{gn t (Wt ' y ): n = 1, 2, ...; t = 0, 1, 2, .•• t
be a near epoch dependent sequence of real valued functions that is near epoch
dependent of size -~.
0
Given a sequence {y;t:=l from
2 = Var[E n 1 g (W yO)]
t=
nt t' n
n
n
=
r,
put
1, 2, ...
o :5
s :5 1
where [nsJ denotes the integer part of ns -- the largest integer that does not
exceed ns -- and w (0) = O.
n
Suppose that:
(a)
1/0 2
(b)
= s,
lim
n-+<Xl Vadwn (s)J
(c)
II gn t (Wt ' y ~ ) -
n
= O( lin)
o
:5 s
~
1
eg t(W ,yO)1!
t n r
n
~
tJ.
<~ ,
1
~
t
~
n,
t = 1, 2,
Then w (.0) converges weakly in D[O,lJ to a standard Wiener process.
n
...
In particular,
9-2-31
The terminology appearing in the conclusion of Theorem 2 is defined as
follows.
0[0,1] is the space of functions x or x(·) on [O,IJ that are right
continuous and have
x(t+)
= ti~+O
exists.
left hand limits; that is, for 0
x(t+h) exists and x(t+)
= x(t)
~
and for 0
t
< 1,
<t
~
1, x(t-)
A metric d(x,y) on O[O,lJ may be defined as follows.
= ti~+Ox(t-h)
Let A denote the
class of strictly increasing, continuous mappings A of [O,lJ onto itself; such a
A will have A(O) = 0 and A(l) =1 of necessity.
d(x,y) to be the infinum of those positive
€
For x and y in O[O,lJ define
for which there is a A in A with
The idea is that one is permitted
to shift points on the time axis by an amount
€
in an attempt to make x and y
coincide to within €; note that the points 0 and 1 cannot be so shifted.
A
verification that d(x,y) is a metric is given by Billingsly (1968, Section 14).
If
~
denotes the smallest sigma-algebra containing the open sets
< o} --
form ~= {y: d(x,y)
Borel subsets of oW,lJ.
then (O,~) is a measurable space.
= Pwn-1 (A)
~ is called the
The random variables w (.) have range in 0[0,1] and,
n
preforce, induce a probability measure on
P (A)
n
sets of the
(O,~)
defined by
=p{w: w (.) in A} for each A in~.
n
w(·) has two determining properties.
A standard Wiener process
For each t, the (real valued) random
variable w(t) is normally distributed with mean zero and variance t.
partition 0
~
to
~
t
1
~
•..
~
t
k
~
For each
1, the (real valued) random variables
are independent; this property is known as independent increments.
-1
probability measure on (O,~) induced by this process; W(A) = P w
Let W be the
(A) = p{w: w(·) in A}.
It exists and puts mass one on the space C[O,lJ of continuous functions defined on
[O,lJ (Billinglsy, 1968, Section 9).
Weiner process means that
Weak convergence of w (.) to a standard
n
9-2-32
tim
n~
fD
f dP
n
fD
=
f dW
for every bounded, continuous function f defined over D[O,lj.
The term weak
convergence derives from the fact that the collection of finite signed measures
is the dual (Royden, 1963, Chapter 10) of the space of bounded, continuous
f
functions defined on D[O,IJ and timn~
fdP n =
f
fdW for every such f is weak*
convergence (pointwise convergence) in this dual space.
Section 2)
(Billingsly, 1908,
If h is a continuous mapping from D[O,IJ into B
implies tim
n~
fD
f h dP
n
fD
=
n~
-1
n
weakly to Wh
-1
On B
1
then weak convergence
f h dW for all f bounded and continuous on B'
whence by the change of variable formula tim
the probability measures P h
1
fBf
dP hn
1
=
fBf
defined on the Borel subsets of
dWh-
m1
1
. Thus
converge
convergence in distribution and weak convergence are
equivalent (Billingsly, 1968, Section 3) so that the distribution of h w ('),
n
F (x) = P[hw (.)
n
n
~
xj = P h
n
-1
(-=,xJ,
converges at every continuity point to the distribution of h w(·).
~lx(')
the mapping
tim
n~
y (1)
n
=
x(l) is continuous because timn~ d(y n ,x) =
In particular
° implies
x(l); recall one cannot shift the point 1 by choice of A in A.
Thus we have that the random variable w (1) converges in distribution to the
n
random variable w(l) which is normally distributed with mean zero and unit variance.
The proof of Theorem 2 is due to Wooldridge (1984) and is an adaptation of
the methods of proof used by McLeish (1975, 1977).
We shall need some preliminary
definitions and lemmas.
Recall that {Vt(w)}~=_= is the underlying stochastic process on (~l,a,p);
that ~ denotes the smallest complete sub-sigma-algebra such that Vm, Vm+l' •.. , Vn
are measurable, 3-=
-=
= n
=
t=-=
=
cr(U~=_= ~~=); that Wt(V=) is a function of
k
possibly infinitely many of the V with range in B t for t = 0, 1, ... , and that
t
gnt(Wt'Y~)
maps B
kt
into the real line for n = 1, 2, ... and t = 0, 1, . . . .
9-2-33
Set
~
for t
0 and Z
= 0 for t
nt
02
n
=
<0
whence
n
Var(E t= IXnt )
w (s) = E[ns] X 10 2
n
t=l
nt n
By Proposition 4, (Xnt'~:~ ) for n = 1, 2, ... and t = 1, 2,
with {ljIm} of size -~ and Cnt = max {1, IIxntllr}.
for n
~
(a)
II e (Xnt I~t-m)
-~ II 2
(b)
IIXnt -
1, t
1, m
~
~
That is,
-< ljIm Cnt
C
O.
is a mixingale
nt
We also have from the definition of near epoch
dependence that
is size
-~.
Define
and
S
nj
j
= Et=l
Take S (0) = S
n
nO
MARTINGALE.
algebras.
X
nt
= Sn (Join)
.
= O.
'1/1
.
Note that '1/0
q_~, q_~, ... 1S an increasing sequence
0
Relative to these sigma-algebras, a doubly indexed process
{(Z
nt
,~t):
-~
n =
1, 2, ... ; t = 0, 1, ... }
is said to be a martingale if
f
su b -sigma-
9-2-34
(a)
t
is measurable with respect to 3' -00
Z
nt
(b)
elz nt I
(c)
e(z nt
<
I3 s
00
< t.
)
= Xns
for s
n
=
1, 2,
... ,
and Ynt
=
Znt - Zn,t-l for t
-00
The sequence
{ (Y
with Y
nO
, :;t
nt
-co
= ZnO'
):
t
=
1, 2,
=
... }
I, 2, •.. and
(Znt,3~co)
as above
is called a martingale difference sequence.
t
LEMMA 4.
Let (Xnt':;-co) be a mixingale and put
>
Then for any y
PROOF.
1 and nonnegative sequence {ail
we have
(McLeish, 1975b, Lemma 6.2).
LEMMA 5.
=
Let (y t,3F ) for n
n
-co
difference sequence (soe(y
that IYntl ~ K c
nt
nt
1~-1)
co
1, 2,
and t
=
1, 2, ... be a martingale
= 0 almost surely for all t
~ 1) and assume
almost surely for some sequence of positive constants {c nt } .
Then
PROOF.
McLeish (1977, Lemma 3.1).
Let (Xnt':;:oo) be a mixingale with {$m} of size -~ and
LEMMA 6.
c:
nt
= max {I, II Xnt II r }
{X:
t
:
t
=
for r
> 2.
1, 2, ... ,
is uniformly integrable then
n;
If
n
= 1, 2, ... }
9-2-35
2
{maxj~l ( 5n,j+k - 5 nk )2/ Ek+l
t=k+l c nt
1 ~ k+l ~ n, k ~ 0, n ~ I}
is uniformly integrable.
PROOF.
(McLeish, 1977)
For c
= Y + Z
and note that X
nt
nt
nt + U
nt
_2
-z = Ej z
-u = Ej u
t=1 nt' nj
t=1 nt' col
nj
~
1 and m to be determined later, put
=
j
Let E X =
X dP, Y = E
t=1 Ynt '
a
nj
l
2
E
Jensen's inequality implies that
c
t=1 nt'
f[x~aJ
(E Pix ) 2 :;; E Pix 2 for any positive Pi with E Pi = 1 whence
i
i
by taking p.
1.
= 1/3.
In general
(X + Y + Z
> a)
c
(X
> a/3)
U
(y
> a/3)
U
(Z
> a/3)
whence
(X + Y + Z) I(X + Y + Z
~
3 X I(X
> a)
> a/3)
+ 3 Y I(Y
> a/3)
+ 3 Z I(Z
and
Ea (X + Y + Z) ~ 3 Ea / 3 X + 3 Ea / 3 Y + 3 Ea / 3 Z .
It follows that
where
> a/3)
9-2-36
e < -~
For some
o
~ 1/1
e
>
-%
= O(k ) = o[k 2(ln k)
k
Note that for k
and for m
we have
~
-2
J.
m
k
t k
c nt
II Unt -e(Unt la + )II Z = IIXnt -e(Xn t13t+k)lIz~
- oo
I/Ik·
-00
II e(Unt la::k ) liz
Similarly
a
k
k
~ B/[k~
= l/(k
for k ~ m and is less than c nt I/I k
Therefore (U n t' at-00 ) is a mixingale with $k =
for k ;;: m.
and I/I
is less than c nt I/J~
ifi
ifi(k)] for all k
k) for Ikl
> m.
By Lemma 2 with a
k
1/1
max ( m. k) of size -~
= m ifi m for Ikl
~
m and
~ m we have
u :;;;
Now
r2 x
-1
(in x)
-Z
dx =
J~n2
by Taylor's theorem 0 ::i k
k - 1 :5 k :5 k
for arbitrary €
ifi
-Z
<
00
Further.
implies that 0 :;;; Lk=-oo a k < 00
2
2
k - (k-l)in(k-l) :5 k[ifi k - in(k-l) J :5 Z in k for
u
du
00
00
-1
2 -1
whence 0 :5 Lk=Z I/Ik(a
k - a k- 1 )
>0
~
00
4
Z Lk=Z in k/(k in k)
<
00
Thus.
we may choose and fix m sufficiently large that u :5 €/27.
Note the choice of m depends only on the sequence {I/Ik}' not on n.
Also note that
if some of the leading U wre set to zero U would be a mixingale with the
nt
nt
A
same W but the leading c
would be zero.
k
nt
on where the sum starts.
Thus the choice of m does not depend
9-Z-37
Similarly, for k
than II Znt liz
~
m IIZnt - e(Zntl:J::k)ll z and lie (ZntI:J::k)lIz
are less
and
II Znt liz :5 II Xnt -
X~t liz
:li
max t :5n e cX:t
For k > m IIZnt -e(ZntI3::k)lIz = lIe(ZntI3::k)lIz = O.
for k S m+l and
~
= a
k
= k
Z
By Lenuna Z, with a k
for k > m+1 we have
For our now fixed value of m we may choose c large enough that z
{XZ } is a uniformly integrable set.
nt
< €/Z7 since
Note again that c depends neither on n nor
on where the sum starts.
With c and m thus fixed apply Lenuna 4 to the sequence
for Iii ~ m and a. = i
1.
Z
{Ynt } with
y=4 and a=l
for Iii> m to obtain
where
YUk = ~t
e(y 13t + k ) - e(y laF+ k - 1 )
-{..
t=l
nt .-co
nt-co
For fixed m and c as chosen previously, one sees from this inequality that there
is an a large enough that y
< €/Z7.
Thus
Note once again that the choice of a depends neither on n nor on where the sum
starts thus
Z
k t
c ): 1
+
{max jS-{..U (S n,j+k - Sn,k )Z/(L t=k+l
nt
is a uniformly integrable set.
U
~
k +
t ~
n, k
~
0, n
~ I}
1
9-2-38
TIGHTNESS.
{p n } defined on the Borel
A family of probability measures
subsets of D[O,11 is tight if for every positive
K such that p (K) > 1 - €
n
for all n.
€
there exists a compact set
The importance of tightness derives from
the fact that it implies relative compactness:
every sequence from {p } contains
n
a weakly convergent subsequence.
LEMMA 7.
Let w
n
be a sequence of random variables with range in D[O,11
and suppose that
is a uniformly integrable set where N(t,o) is some nonrandom finite valued
function.
[t,11.
It is understood that if t+o>l then the maximum above is taken over
Then {p } with
n
P (A) = p{w: w (.) in A}
n
n
is tight and if pI is the weak limit of a subsequence from P
n
then pI puts mass
one on the space C[O,l].
PROOF.
The proof consists in verifying the conditions of Theorem 15.5 of
Billingsly (1968).
(a)
These are:
For each positive
n
there exists an a such that p{w: Iw (0)1 > a} ~
n
n
for n ;;: 1.
(b)
For each positive € and n, there exists a 0, 0
such that
for all n ;;: nO
<
° < 1,
and an integer
9-2-39
Because
W (0)
n
= 0 for all n, (a) is trivially satisfied.
let positive € and
n be given.
To show (b)
As in the proof of Lemma 6 let E
a
the integral of X over the set {w: X ~
a}.
X denote
Note that
~ A2 P[max t_s_t
< < +~ Iw (s) - w (t)1 ~ A{6] .
u
n
n
By hypothesis A can be chosen so large that both €2/ A2<1 and the left hand side
of the inequality is less than n€2 for members of U.
... ,
equal to the largest of the N(io,o) for i = 0, 1,
Set 0 = €2/ A2 and set nO
[i/o].
If Is-tl < 0 then
either both t and s lie in an interval of the form [io, (i+1)0] or in abutting
intervals of that form whence
p{w: sUPls_tl<o Iwn(s) - wn(t>
<
=
~ 3€}
[i/o] {w: maxio~s~(i+1)0
P Ui=O
~ Ei:~OJA-2
and if n
I
E
A2
Iwn ()
s
[maxiO~S(i+1)0
- wn (t
)I
~ €
}
2
Iwn(s) - wn (t)1 /0]
> nO
::i [1
~
+ 1/010 n
2n
CONTINUITY SET.
Let Y be a (possibly) vector valued random variable.
Y-continuity set is a Borel set B whose boundary aB has p(y in aB) = O.
A
The
boundary aB of B consists of those limit points of B that are also limit points
of some seuence of points not in B.
I f P (B)
n
p(y
n
in B), P'(B) = p(y in B),
and B is a Y-continuity set then limn-+<x> P'(B)
= P(B) (Billingsly, 1968,
n
Theorem 2.1).
9-2-40
LEMMA 8.
Let V ., Y ., Y. for i = 1, 2, ... , k be random variables defined
nl.
nl.
l.
on a probability space (11,a,p) such that
- Yni
P
for i = 1, 2,
... ,
k
for i = 1, 2,
... ,
k
(a)
V
ni
(b)
Y .~-+ Y.
l.
nl.
(c)
k
J
timn-+<X> {p[n i=1 (V ni in A.)
l.
-+0
k
- lli=1
P(V
in A.) } = 0
l.
ni
Condition (c) is called asymptotic independence; the condition must hold for all
possible choices of Borel subsets A.l. of the real line.
B.)}
timn-+<X>
l.
Conditions (a) and (b) imply that V.
nl.
PROOF.
Then for all Y.l. - continuity
s: -+
= O.
Y. whence for Y.-continuity
l.
l.
sets B. we have
l.
timn-+<X>
B. )
timn-+<X>
l.
k
since X =1 B. is a Y-continuity set of the random variable Y = (Y 1 , Y2 ,
l.
i
k
with boundary X =1 aBol. (Problem 10). Condition (c) implies the result.
i
PROOF OF THEOREM 2.
X
Recall that we have set
nt
w
n
(s)
= ~[nsJX
t=1
nt
/0
n
o :i
s
~
1
and that we have the following conditions in force:
... ,
Y )1
k
9-2-41
(a)
1/0 2 = 0 (l/n)
n
(b)
lim
(c)
II Xnt II r
(d)
(X
(e)
\)
n~
nt
m
Var[w (s)j = s, 0
n
S t:,.
< ""
r
> 2,
~
s
1
~
1 S t :i n, n = 1, 2,
,3:",,) is a mixinga1e with {I/J m} of size
sUPn SUPt
II X nt
e(X
nt 1;Y~::) 11 2
-~
and c
is of size
nt
= maxi I,
II Xnt II r}
-~
(£)
Condition (c) implies that
2
{Xnt
: t = 1 , 2 , ... ,n;n=I,2, ... }
is a uniformly integrable set by Proposition 5.
This taken together with
condition (d) implies that
15
=
{
(
S )2/1;k+l
2
maxj~ Sn,j+k - nk
t=k+1 c nt
is a uniformly integrable set by Lemma 6.
t, 0
~
t
~
I, and any 0, 0
< 0 S l,
~
if t
~
I ~ k + l S n, k ~ 0, n ~ I}
Condition (a) implies that for any
s
~
t + 0 then
([nsj - [ntJ)t:,.2/(o02)
n
(n+l) 0 t:,. 2/( 0 ( 2 )
n
l
S (n+l) t:,.2 O(n- )
~
::; B t:,.2
for n larger than some nO.
For each t and 0 put N(t,o) = nO whence
is dominated by B t:,.2 times some member of 15 for n
> N(t,o).
Thus Lemma 7 applies
whence {p } is tight and if pi is the weak limit of a sequence from {p } then pI
n
puts mass one C[O,lj; recall P is defined by P (A) = p{w: w (.) in A}
n
n
n
n
9-2-42
for every Borel subset A of 0[0,1].
Theorem 19.2 of Billingsly (1968) states that if:
(i)
w (s) has asymptotically independent increments
(ii)
{w~(s)}:=l is uniformly integrable for each s
n
2
(iii) ew (s) + 0 and ew (s) + 0 as n +
n
(iv)
n
For each positive
~
00
and n there is a positive 6, 0 < 6 < 1, and
an integer nO such that p{w: sUPls_tl<6 Iwn(s) - wn(t)1 i1:~}:ii n
for all n
> nO
then w converges weakly in 0[0,1] to a standard Weiner process.
n
We shall verify
these four conditions.
We have condition (iii) at once from the definition of Xnt and condition (b).
We have just shown that for given t and 6 the set
is uniformly integrable so put 6=1 and t=O and condition (ii) obtains.
verified condition (iv) as an intermediate step in the proof of Lemma 7.
We
It
remains to verify condition (i).
Consider two intervals (O,a) and (b,c) with 0 < a < b < c
Un =
e[wn(a)13:~a]]
Vn =e[w(c)-w(b)I:J""r
n
n
nc ]J
Thus
By Minkowski's inequality and condition (e)
~
1.
Define
9-2-43
II wn (a)
- U II 2
- 1 E[na J II x
n
~ an
t=l
nt
- e (Xnt 1'1: [na J ) II 2
fJ'_00
a-1E[naJ Ilx
- e(x l:Jt+[naJ-t)11
n t=l
nt
nt t-[naJ+t
:ii
-l.. [naJ
S an 4 t =1 v[naj-t
-~
Since {v Jl is of size -~, E
v m
m
m=O m
00
<
00.
By Kronecker's lemma (Hall and
Heyde, 1980, Section 2.6)
converges to zero as n tends to infinity.
Ilw (a) - U 11
n
n 2
-+-
n
-1
->.:
is O(n 2) we have that
n
0 as m ... 0 whence w (a) - U p... O.
n
n
that [w (c) - w (b)J - V
n
Since a
n
p
... O.
A similar argument shows
For any Borel sets Band C, U-1(B) g:J [naJ
n
-00
and v:l(C) g:J 7ncJ ' thus
Ip (U n
in A) n ( V in B)
n
P(U
n
in A) P(V
n
in B)I
which tends to zero as n tends to infinity by condition (f).
conditions (a) and (c) of Lemma 8.
We have now verified
Given an arbitrary sequence from {p } there
n
is a weakly convergent subsequence {p ,} with limit P' by relative compactness.
n
Since, by Lemma 7, pI puts mass one on C[O,lj the finite dimensional distributions
of w
n
converge to the corresponding finite dimensional distributions of P' by
Theorem 5.1 by Billingsly (1968).
This implies that condition (b) of Lemma 8
holds for the subsequence, whence the conclusion of Lemma 8 obtains for the
subsequence.
Since the limit given by Lemma 8 is the single value zero and the
choice of a sequence from {p } was arbitrary we have that condition (i),
n
asymptotically independent increments, holds for the three points 0
The same argument can be repeated for more points.
[j
< a < b < c.
9-2-44
Theorem 2 provides a central limit theorem for the sequence of random
variables
{gnt (W t
=
.yO): n
n
1. 2.
... ,
t=O.l .... } .
To make practical use of it. we need some means to estimate the variance of
a sum, in particular
Putting
this variance is
with
T
= 0,
±l, ±2, ... , ±(n-l).
The natural estimator of a 2 is
n
A
R
nT
with
where w
T
is some set of weights chosen so that
0n2
is guaranteed to be positive.
Any sequence of weights of the form (Problem 11)
W
T
will guarantee positivity of which the simplest such sequence is the modified
Bartlett sequence
9-2-45
The truncation est1°mator
(JA2
n
=
A
~l(n)
~,=-l(n) Rn, d oe s no t h
aveO
we1g ht s th a t sa to1S f y
the positivity condition and can thus assume negative values.
We shall not
consider it for that reason.
If {X_nt} were a stationary time series, then estimating the variance of a sum would be the same problem as estimating the value of the spectral density at zero. There is an extensive literature on the optimal choice of weights for the purpose of estimating a spectral density; see for instance Anderson (1971, Chapter 9) or Bloomfield (1976, Chapter 7). In the theoretical discussion we shall use Bartlett weights because of their analytical tractability, but in applications we recommend Parzen weights

    w_τ = 1 - 6|τ/ℓ(n)|² + 6|τ/ℓ(n)|³    for 0 ≤ |τ|/ℓ(n) ≤ 1/2,
    w_τ = 2[1 - |τ|/ℓ(n)]³               for 1/2 ≤ |τ|/ℓ(n) ≤ 1,

with ℓ(n) taken as that integer nearest n^{1/5}. See Anderson (1971, Chapter 9) for a verification of the positivity of Parzen weights and for a verification that the choice ℓ(n) ≐ n^{1/5} minimizes the mean square error of the estimator.

At this point we must assume that W_t is a function of past values of V_t so that W_t is measurable with respect to F_{-∞}^t. This is an innocuous assumption in view of the intended applications, while proceeding without it would entail inordinately burdensome regularity conditions. The following describes the properties of σ̂_n², subject to this restriction, for Bartlett weights; see Problem 12 for Parzen weights.
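Before stating the theorem, here is a minimal computational sketch of the recommended estimator. The function names and the centering step are our own; in an application x would hold the realized g_nt values, which the code demeans to mimic X_nt = g_nt - E g_nt.

    import numpy as np

    def parzen(x):
        """Parzen lag window: 1 - 6x^2 + 6|x|^3 on [0,1/2]; 2(1-|x|)^3 on [1/2,1]; else 0."""
        x = abs(x)
        if x <= 0.5:
            return 1.0 - 6.0 * x**2 + 6.0 * x**3
        if x <= 1.0:
            return 2.0 * (1.0 - x)**3
        return 0.0

    def hac_variance(x, weight=parzen):
        """Estimate sigma_n^2 = Var(n^{-1/2} sum x_t) by
        sum_{|tau| <= l(n)} w(tau/l(n)) R_hat_tau, with l(n) nearest n^{1/5}."""
        x = np.asarray(x) - np.mean(x)
        n = len(x)
        ell = max(1, round(n ** 0.2))
        sigma2 = np.dot(x, x) / n          # tau = 0 term
        for tau in range(1, ell + 1):
            R_tau = np.dot(x[tau:], x[:n - tau]) / n
            sigma2 += 2.0 * weight(tau / ell) * R_tau
        return sigma2

For the modified Bartlett weights of the theorem below one would pass weight = lambda x: 1 - abs(x) instead of the Parzen window.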
THEOREM 3. Let {V_t}_{t=-∞}^∞ be a sequence of vector valued random variables that is strong-mixing of size -2qr/(r-2) with q = 2(r-2)/(r-4) for some r > 4. Let (Γ, ρ) be a separable metric space and let W_t = W_t(V_∞) be a function of the past with range in R^{k_t}; that is, W_t is a function of only (..., V_{t-1}, V_t). Let

    {g_nt(W_t, γ_n°): n, t = 1, 2, ...}

be a sequence of random variables that is near epoch dependent of size -q. Given a sequence {γ_n°}_{n=1}^∞ from Γ, put

    X_nt = g_nt(W_t, γ_n°) - E g_nt(W_t, γ_n°),    n, t = 1, 2, ...,

and suppose that ||X_nt||_r ≤ Δ < ∞ for 1 ≤ t ≤ n, n = 1, 2, ...; put

    R̂_nτ = (1/n) Σ_{t=1+|τ|}^n X_nt X_{n,t-|τ|},    τ = 0, ±1, ±2, ..., ±(n-1),
    σ̂_n² = Σ_{τ=-ℓ(n)}^{ℓ(n)} [1 - |τ|/ℓ(n)] R̂_nτ,    1 ≤ ℓ(n) ≤ n-1.

Then there is a bound B that does not depend on n such that

    |σ_n² - E σ̂_n²| ≤ B/ℓ(n)

and, for each ε > 0,

    P{|σ̂_n² - E σ̂_n²| > ε} ≤ B ℓ(n)⁴/(ε² n);

in particular, σ̂_n² - σ_n² converges in probability to zero whenever ℓ(n) → ∞ with ℓ(n)⁴/n → 0, as with the choice ℓ(n) ≐ n^{1/5}.
PROOF. To establish the first inequality, note that

    |σ_n² - E σ̂_n²| ≤ 2 Σ_{τ=0}^{ℓ(n)} [τ/ℓ(n)] |(1/n) Σ_{t=1+τ}^n E(X_nt X_{n,t-τ})|
                          + 2 Σ_{τ=ℓ(n)}^{n-1} |(1/n) Σ_{t=1+τ}^n E(X_nt X_{n,t-τ})|
                     ≤ [2/ℓ(n)] Σ_{τ=0}^{n-1} τ |(1/n) Σ_{t=1+τ}^n E(X_nt X_{n,t-τ})|.

Now, by Proposition 3 (note that q > 2),

    |E(X_nt X_{n,t-τ})| = |E[X_{n,t-τ} E(X_nt | F_{-∞}^{t-τ})]| ≤ c_nt ψ_τ,

where 0 ≤ c_nt ≤ max{1, ||X_nt||_r ||X_{n,t-τ}||_{r/2}} ≤ (1 + Δ^{r/2}) and Σ_{τ=0}^∞ τ ψ_τ < ∞. Thus, we have

    |σ_n² - E σ̂_n²| ≤ [2/ℓ(n)] (1 + Δ^{r/2}) Σ_{τ=0}^∞ τ ψ_τ,

which establishes the first inequality.
To establish the second inequality, note that

    P{|σ̂_n² - E σ̂_n²| ≤ ε}
        ≥ P ∩_{τ=-ℓ(n)}^{ℓ(n)} {[1 - |τ|/ℓ(n)] |R̂_nτ - E R̂_nτ| < ε/[2ℓ(n) + 1]}
        = 1 - P ∪_{τ=-ℓ(n)}^{ℓ(n)} {[1 - |τ|/ℓ(n)] |R̂_nτ - E R̂_nτ| > ε/[2ℓ(n) + 1]},

so that, by Chebyshev's inequality, it suffices to bound Var(R̂_nτ). Suppress the subscript n and put X_t = 0 for t ≤ 0. By applying in succession a change of variable formula, the law of iterated expectations, and Hölder's inequality, we have

    n² Var(R̂_nτ) = Σ_{h=-(n-1-τ)}^{n-1-τ} Σ_{t=1+τ+|h|}^n Cov(X_{t-|h|-τ} X_{t-|h|}, X_{t-τ} X_t)
        ≤ 2 Σ_{h=0}^{2τ} Σ_{t=1+τ+h}^n |Cov(X_{t-h-τ} X_{t-h}, X_{t-τ} X_t)|
          + 2 Σ_{h=2τ}^∞ Σ_{t=1+τ+h}^n ||X_{t-h-τ}||_4 ||X_{t-h}||_4 ||E(X_{t-τ} X_t | F_{t-h/2}^{t+h/2}) - E(X_{t-τ} X_t)||_2
        ≤ 4 τ n (1 + Δ²) + 2 Σ_{h=2τ}^∞ Σ_{t=1+τ+h}^n ||X_{t-h-τ}||_4 ||X_{t-h}||_4 ||E(X_{t-τ} X_t | F_{t-h/2}^{t+h/2}) - E(X_{t-τ} X_t)||_2.

Write X̃_{t-τ} = E(X_{t-τ} | F_{t-h/2}^{t+h/2}) and X̃_t = E(X_t | F_{t-h/2}^{t+h/2}). By applying the triangle inequality twice and the conditional Jensen's inequality (Problem 4), we obtain

    ||E(X_{t-τ} X_t | F_{t-h/2}^{t+h/2}) - E(X̃_{t-τ} X̃_t | F_{t-h/2}^{t+h/2})||_2
        ≤ ||X_t (X_{t-τ} - X̃_{t-τ})||_2 + ||E[X_{t-τ} (X_t - X̃_t) | F_{t-h/2}^{t+h/2}]||_2.

The argument used to prove Lemma 1 can be repeated to obtain the inequality

    ||X_t (X_{t-τ} - X̃_{t-τ})||_2 ≤ 2 Δ^{(s-2)/(s-1)} (1 + Δ^{r/2})^{1/(s-1)} ||X_{t-τ} - X̃_{t-τ}||_2^{(s-2)/(s-1)}

for s = r/2 > 2. Then we have

    ||X_t (X_{t-τ} - X̃_{t-τ})||_2 ≤ (const.) ||X_{n,t-τ} - E(X_{n,t-τ} | F_{t-h/2}^{t+h/2})||_2^{(r-4)/(r-2)},

where the constant does not depend on n, h, or τ. By the definition of near epoch dependence we will have

    ||X_{n,t-τ} - E(X_{n,t-τ} | F_{t-h/2}^{t+h/2})||_2 ≤ ν_{h/2-τ}

provided that τ ≤ h/2, and by the same argument

    ||E[X_{t-τ} (X_t - X̃_t) | F_{t-h/2}^{t+h/2}]||_2 ≤ (const.) (ν_{h/2})^{(r-4)/(r-2)},

where the constant does not depend on n, h, or τ. Using Proposition 2 we have

    |Cov(X̃_{t-h-τ} X̃_{t-h}, X̃_{t-τ} X̃_t)| ≤ 2(2^{1/2} + 1) Δ² (α_{h/2})^{1/2 - 1/r}.

Combining the various inequalities we have

    Σ_{τ=-ℓ(n)}^{ℓ(n)} Var(R̂_nτ)
        ≤ (const.) n^{-1} Σ_{τ=-ℓ(n)}^{ℓ(n)} [|τ| + Σ_{h=2|τ|}^∞ ((ν_{h/2-|τ|})^{(1/2)(r-4)/(r-2)} + (α_{h/2-|τ|})^{(r-2)/(2r)})]
        ≤ (const.) n^{-1} Σ_{τ=-ℓ(n)}^{ℓ(n)} [|τ| + Σ_{m=0}^∞ ((m)^{-(q/2)(r-4)/(r-2)} + (m)^{-q})]
        ≤ (const.) ℓ(n)²/n,

the two series converging because {ν_m} is of size -q and {α_m} is of size -2qr/(r-2). Thus we have

    P{|σ̂_n² - E σ̂_n²| > ε} ≤ [2ℓ(n) + 1]² ε^{-2} Σ_{τ=-ℓ(n)}^{ℓ(n)} Var(R̂_nτ) ≤ B ℓ(n)⁴/(ε² n),

which establishes the second inequality. □
PROBLEMS

1. (Nonlinear ARMA) Consider data generated according to the model

    y_t = g(e_t, e_{t-1}, ..., e_{t-q}, θ°),    t = 1, 2, ...,

where {e_t} is a sequence of independent random variables. Show that {g(e_t, e_{t-1}, ..., e_{t-q}, θ°)} is strongly mixing of size -q for all positive q. Let w_t = (e_t, e_{t-1}, ...), put g_t(w_t, θ) = g(e_t, e_{t-1}, ..., e_{t-q}, θ), and suppose that

    || sup_θ g(e_t, e_{t-1}, ..., e_{t-q}, θ) ||_p ≤ K < ∞

for some p > 4. Show that {g_t(w_t, θ)} is near epoch dependent.

2. Referring to Example 1, show that if v_0 = (y_0, 0) then Y_{t,m} has prediction error

    |y_t - Y_{t,m}| ≤ d^t |y_0| + d^m Σ_{j=0}^{t-m-1} d^j |e_{t-m-j}|

for t - m > 0. Use this to show that {y_t} is near epoch dependent.

3. Show that the definition of a mixingale implies that one can assume that the mixingale numbers ψ_m decrease monotonically without loss of generality. Hint: See the proof of Proposition 3.

4. The conditional Jensen's inequality is g[E(X | F)] ≤ E(g∘X | F) for convex g. Show that this implies E|E(X | F)|^p ≤ E|X|^p, whence ||E(X | F)||_p ≤ ||X||_p for p ≥ 1.

5. Show that if X and Y are in L_2(Ω, G, P) and Y is F-measurable with F ⊂ G, then ||X - E(X | F)||_2 ≤ ||X - Y||_2. Hint: Consider [X - E(X | F) + E(X | F) - Y]² and show that E{[X - E(X | F)][E(X | F) - Y]} = 0.

6. Show that the random variables U_tk = E(X_nt | F_{-∞}^{t+k}) - E(X_nt | F_{-∞}^{t+k-1}) appearing in the proof of Lemma 2 form a two dimensional array with uncorrelated rows and columns, where t is the row index and k is the column index.

7. Show that the hypothesis Σ_k w_k^{1/2} a_k^{-1} a_{k-1}^{-1} < ∞ permits the reordering of terms in the proof of Lemma 2.

8. Show that if f_t(γ) is continuous in γ uniformly in t and (Γ, ρ) is a compact metric space, then {f_t(γ)} is an equicontinuous family.

9. Show that a compact metric space (X, ρ) is separable. Hint: Center a ball of radius 1/n at each point in X. Thus, there are points x_{1n}, ..., x_{m_n n} with ρ(x, x_{in}) < 1/n for some i, for each x in X. Show that the triangular array that results by taking n = 1, 2, ... is a countable dense subset of X.

10. Show that the boundary of ×_{i=1}^k B_i is contained in ∪_{i=1}^k B̄_1 × ... × ∂B_i × ... × B̄_k, where ∂B_i denotes the boundary of B_i and B̄_i its closure.

11. Write

    X = | x_1    0        ...  0          |
        | x_2    x_1      ...  0          |
        | ...                             |
        | x_n    x_{n-1}  ...  x_{n-ℓ(n)} |
        | 0      x_n      ...             |
        | ...                             |
        | 0      0        ...  x_n        |,      a = (a_0, a_1, ..., a_{ℓ(n)})',

and show that weights of the form w_τ = Σ_j a_j a_{j+|τ|} yield σ̂_n² = (1/n) a'X'Xa, and hence that σ̂_n² ≥ 0. Show that the truncation estimator σ̂_n² = Σ_{τ=-ℓ(n)}^{ℓ(n)} R̂_nτ can be negative.

12. Prove Theorem 3 for Parzen weights assuming that q ≥ 3. Hint: Verification of the second inequality only requires that the weights be less than one. As to the first, Parzen weights differ from one by a homogeneous polynomial of degree three for |τ|/ℓ(n) ≤ 1/2 and are smaller than one for |τ|/ℓ(n) ≥ 1/2.
3. DATA GENERATING PROCESS

In this section we shall give a formal description of a data generating mechanism that is general enough to accept the intended applications yet sufficiently restrictive to permit application of the results of the previous section, notably the uniform strong law of large numbers and the central limit theorem. As the motivation behind our conventions was set forth in Section 1, we can be brief here.

The independent variables {x_t}_{t=-∞}^∞ and the errors {e_t}_{t=-∞}^∞ are grouped together into a single process {v_t}_{t=-∞}^∞ with v_t taking its values in R^ℓ. In instances where we wish to indicate clearly that v_t is being regarded as a random variable mapping the underlying (complete) probability space (Ω, G, P) into R^ℓ we shall write V_t(ω) or V_t and write {V_t(ω)}_{t=-∞}^∞ or {V_t}_{t=-∞}^∞ for the process itself. But for the most part we shall follow the usual convention in statistical writings and let {v_t}_{t=-∞}^∞ denote either a realization of the process or the process itself as determined by context.

Recall that F_m^n is the smallest sub-sigma-algebra of G, complete with respect to (Ω, G, P), such that V_m, V_{m+1}, ..., V_n are measurable; F_{-∞}^{-∞} = ∩_{t=-∞}^∞ F_{-∞}^t. Situations with a finite past are accommodated by putting V_t = 0 for t < 0 and letting V_0 represent initial conditions, fixed or random, if any. Note that if {V_t} has a finite past then F_{-∞}^t will be the trivial sigma-algebra {∅, Ω} plus its completion for t < 0.

ASSUMPTION 1. {V_t(ω)}_{t=-∞}^∞ is a sequence of random variables each defined over a complete probability space (Ω, G, P) and each with range in R^ℓ. □

Let v_∞ denote a doubly infinite sequence, a point in ×_{t=-∞}^∞ R^ℓ. Recall, Section 1, that the dependent variables {y_t}_{t=-∞}^∞ are viewed as obtaining from v_∞ via a reduced form such as

    y_t = Y_t(v_∞, γ°),    t = 0, ±1, ±2, ...,

but, since we shall be studying the limiting behavior of functions of the form s_n(λ), it is more convenient to group observations into a vector, dispense with consideration of Y_t(v_∞, γ°), and put conditions directly on the mapping

    w_t = W_t(v_∞)

with range in R^{k_t}, k_t fixed for all t. The most common choices for k_t are k_t = const. and k_t = (const.)·t. Recall that the subscript t associated to W_t(v_∞) serves three functions. It indicates that time may enter as a variable, it indicates that W_t(v_∞) depends primarily on the component v_t of v_∞ and to a lesser extent on components v_s of v_∞ with |t-s| > 0, and it indicates that the dimension k_t of the vector w_t = W_t(v_∞) may depend on t. As W_t(v_∞) represents data, it need only be defined for t = 0, 1, ... with W_0 representing initial conditions, fixed or random, if any.

We must also require that W_t depends only on the past in order to invoke Theorem 3.

ASSUMPTION 2. Each function W_t(v_∞) in the sequence {W_t}_{t=0}^∞ is a Borel measurable mapping of ×_{t=-∞}^∞ R^ℓ into R^{k_t}. That is, if B is a Borel subset of R^{k_t} then the pre-image W_t^{-1}(B) is an element of the smallest sigma-algebra B_{-∞}^∞ containing all cylinder sets of the form

    ... × R^ℓ × B_m × B_{m+1} × ... × B_n × R^ℓ × ...,

where each B_i is a Borel subset of R^ℓ. Each function W_t(v_∞) depends only on the past; that is, depends only on (..., v_{t-2}, v_{t-1}, v_t). □

The concern in the previous section was to find conditions such that a sequence of real valued random variables of the form

    {g_t(W_t, γ): γ in Γ; t = 0, 1, ...}

will obey a uniform strong law and such that a sequence of the form

    {g_nt(W_t, γ_n°): γ_n° in Γ; t = 0, 1, ...; n = 1, 2, ...}

will follow a central limit theorem. Aside from some technical conditions, the inquiry produced three conditions.

The first condition limits the dependence that {V_t}_{t=-∞}^∞ can exhibit.

ASSUMPTION 3. {V_t}_{t=-∞}^∞ is strong-mixing of size -4r/(r-4) for some r > 4. □

The second is a bound ||d_t||_r ≤ Δ < ∞ on the r-th moment of the dominating functions d_t ≥ |g_t(W_t, γ)| in the case of the strong law and a similar r-th moment condition ||g_nt(W_t, γ_n°) - E g_nt(W_t, γ_n°)||_r ≤ Δ < ∞ in the case of the central limit theorem; r above is that of Assumption 3. There is a trade-off: the larger the moment r that can be so bounded, the more dependence {V_t} is allowed to exhibit.

The third condition is a requirement that g_t(W_t, γ) or g_nt(W_t, γ) be nearly a function of the current epoch. In perhaps the majority of applications the condition of near epoch dependence will obtain trivially because W_t(v_∞) will be of the form

    W_t(v_∞) = W_t(v_{t-m}, ..., v_t)

for some finite value of m that does not depend on t. In other applications, notably the nonlinear autoregression, the dimension of W_t does not depend on t, g_t(w, γ) or g_nt(w, γ) is smooth in the argument w, and W_t is nearly a function of the current epoch in the sense that ν_m = ||W_t - E(W_t | F_{t-m}^{t+m})||_2 falls off at a geometric rate in m, in which case the near epoch dependence condition obtains by Proposition 1. For applications not falling into these two categories, the near epoch dependence condition must be verified directly. □
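The geometric-decay route to near epoch dependence can be seen numerically. The following sketch is our own, with the map f, the contraction constant d = 0.5, and the sample sizes invented for illustration; it compares the full recursion with an m-step restart from zero (a function of the current epoch only) and shows the prediction error falling off roughly like d^m.

    import numpy as np

    # Nonlinear AR(1): y_t = f(y_{t-1}) + e_t with |f'| <= d < 1 (here f = 0.5*tanh).
    rng = np.random.default_rng(1)
    f = lambda y: 0.5 * np.tanh(y)
    T, reps = 200, 500

    for m in (2, 5, 10, 20):
        err = []
        for _ in range(reps):
            e = rng.standard_normal(T)
            y = 0.0
            for t in range(T):
                y = f(y) + e[t]      # full recursion from the (long) past
            ym = 0.0
            for t in range(T - m, T):
                ym = f(ym) + e[t]    # restart m steps back: current-epoch function
            err.append((y - ym) ** 2)
        print(m, np.sqrt(np.mean(err)))   # L2 prediction error, decays like 0.5^m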
4. LEAST MEAN DISTANCE ESTIMATORS

Recall that a least mean distance estimator λ̂_n is defined as the solution of the optimization problem

    Minimize: s_n(λ) = (1/n) Σ_{t=1}^n s_t(w_t, τ̂_n, λ).

As with {v_t}_{t=-∞}^∞, we shall let {w_t}_{t=0}^∞ denote either a realization of the process -- that is, data -- or the process itself as determined by context. For emphasis, we shall write W_t(v_∞) when considered as a function defined on ×_{t=-∞}^∞ R^ℓ, and write W_t(V_∞), W_t, W_t[V_∞(ω)], or W_t(ω) when considered as a random variable. The random variable τ̂_n corresponds conceptually to a preliminary estimator of nuisance parameters; λ is a p-vector and each s_t(w_t, τ, λ) is a real valued, Borel measurable function defined on some subset of R^{k_t} × R^u × R^p. A constrained least mean distance estimator λ̃_n is the solution of the optimization problem

    Minimize: s_n(λ) subject to h(λ) = h_n*.

The objective of this section is to find the asymptotic distribution of the estimator λ̂_n under regularity conditions that do not rule out specification error. Some ancillary facts regarding the asymptotic distribution of the constrained estimator λ̃_n under a Pitman drift are also derived for use in later sections on hypothesis testing. We shall leave the data generating mechanism fixed and impose drift by moving h_n*; this is the exact converse of the approach taken in Chapter 3. Example 1, least squares estimation of the parameters of a nonlinear autoregression, will be used for illustration throughout this section.

EXAMPLE 1 (Continued). The data generating model is

    y_t = f(y_{t-1}, x_t, γ°) + e_t,    t = 1, 2, ...;    y_t = 0,    t ≤ 0,

with |(∂/∂y) f(y, x, γ)| ≤ d < 1 for all relevant x and γ. The process

    v_t = (e_t, x_t),    t = 1, 2, ...;    v_t = 0,    t ≤ 0,

generates the underlying sub-sigma-algebras F_{-∞}^t that appear in the definitions of strong mixing and near epoch dependence. Data consists of w_t = (y_t, y_{t-1}, x_t), t = 0, 1, 2, .... As we saw in Section 2, ||e_t||_p ≤ Δ < ∞ for some p > 4 is enough to guarantee that, for the least squares sample objective function

    s_n(λ) = (1/n) Σ_{t=1}^n [y_t - f(y_{t-1}, x_t, λ)]²,

the family

    {s_t(w_t, λ) = [y_t - f(y_{t-1}, x_t, λ)]²: t = 0, 1, ...}

is near epoch dependent of size -q for any q > 0. The same is true of the family of scores

    {(∂/∂λ) s_t(w_t, λ): t = 0, 1, ...},

assuming suitable smoothness (Problem 2). If we take ||V_t||_r ≤ Δ < ∞ for some r > 4 and assume that {V_t} is strong-mixing of size -r/(r-2), then Theorems 1 and 2 can be applied to the sample objective function and the scores respectively. If {V_t} is strong-mixing of size -4r/(r-4) then Theorem 3 may be applied to the scores.

As we shall see later, if the parameter λ is to be identified by least squares, it is convenient if the orthogonality condition

    E e_t g(y_{t-1}, x_t) = 0

holds for all square integrable g(y_{t-1}, x_t). The easiest way to guarantee that the orthogonality condition holds is to assume that {e_t} is a sequence of independent random variables and that the process {e_t} is independent of {x_t}, whence e_t and (y_{t-1}, x_t) are independent. □
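As a concrete, entirely invented instance of this setup, the following sketch simulates such an autoregression and minimizes the least squares objective s_n(λ) numerically. The function f, the parameter values, and the optimizer are our own choices, not the text's.

    import numpy as np
    from scipy.optimize import minimize

    def f(y_prev, x, lam):
        # A smooth f with |df/dy| < 1 whenever |lam[0]*lam[1]| < 1.
        return lam[0] * np.tanh(lam[1] * y_prev) + lam[2] * x

    def s_n(lam, y, x):
        resid = y[1:] - f(y[:-1], x[1:], lam)
        return np.mean(resid ** 2)           # the sample objective of Example 1

    rng = np.random.default_rng(2)
    n, gamma0 = 500, np.array([0.8, 1.0, 0.5])
    x = rng.standard_normal(n)
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = f(y[t - 1], x[t], gamma0) + 0.1 * rng.standard_normal()

    lam_hat = minimize(s_n, x0=np.array([0.5, 0.5, 0.0]), args=(y, x)).x
    print(lam_hat)    # should be near gamma0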
In contrast to Chapter 3, s_n(λ) and, hence, λ̂_n do not, of necessity, possess almost sure limits. To some extent this is a simplification, as the ambivalence as to whether some fixed point λ* or a point λ_n° that varies with n ought to be regarded as the location parameter of λ̂_n is removed. Here, λ_n° is the only possibility. This situation obtains due to the use of a weaker strong law, Theorem 1 of this chapter, instead of Theorem 1 of Chapter 3. The estimator τ̂_n is centered at τ_n° as defined in Assumption 4.

NOTATION 1.

    s_n(λ) = (1/n) Σ_{t=1}^n s_t(W_t, τ̂_n, λ);    s_n°(λ) = (1/n) Σ_{t=1}^n E s_t(W_t, τ_n°, λ);
    λ̂_n minimizes s_n(λ);
    λ̃_n minimizes s_n(λ) subject to h(λ) = h_n*;
    λ_n° minimizes s_n°(λ);
    λ_n* minimizes s_n°(λ) subject to h(λ) = h_n*. □

In the above, the expectation is computed as

    E s_t(W_t, τ, λ) = ∫_Ω s_t[W_t(ω), τ, λ] dP(ω).

Identification does not require that the minimum of s_n°(λ) becomes stable as in Chapter 3, but does require that the curvature near each λ_n° becomes stable for large n.
ASSUMPTION 4. (Identification) The nuisance parameter estimator τ̂_n is centered at τ_n° in the sense that lim_{n→∞} (τ̂_n - τ_n°) = 0 almost surely and √n (τ̂_n - τ_n°) is bounded in probability. The estimation space Λ* is compact and for each ε > 0 there is an N such that

    inf_{n>N} inf_{|λ-λ_n°|≥ε} [s_n°(λ) - s_n°(λ_n°)] > 0.

In the above, |λ - λ_n°| = [Σ_{i=1}^p (λ_i - λ_{in}°)²]^{1/2} or any other convenient norm, and it is understood that the infimum is taken over λ in Λ* with |λ - λ_n°| ≥ ε. □

For the example, sufficient conditions such that the identification condition obtains are as follows.

EXAMPLE 1. (Continued) We have

    s_n°(λ) = (1/n) Σ_{t=1}^n E e_t² + (2/n) Σ_{t=1}^n E e_t [f(y_{t-1}, x_t, γ°) - f(y_{t-1}, x_t, λ)]
                + (1/n) Σ_{t=1}^n E [f(y_{t-1}, x_t, γ°) - f(y_{t-1}, x_t, λ)]²
            = (1/n) Σ_{t=1}^n σ_t² + (1/n) Σ_{t=1}^n E [f(y_{t-1}, x_t, γ°) - f(y_{t-1}, x_t, λ)]².

Using Taylor's theorem and the fact that γ° minimizes s_n°(λ),

    s_n°(λ) - s_n°(γ°) = (λ - γ°)' {(1/n) Σ_{t=1}^n E [(∂/∂λ) f(y_{t-1}, x_t, λ̄)][(∂/∂λ) f(y_{t-1}, x_t, λ̄)]'} (λ - γ°).

A sufficient condition for identification is that the smallest eigenvalue of

    (1/n) Σ_{t=1}^n E [(∂/∂λ) f(y_{t-1}, x_t, λ)][(∂/∂λ) f(y_{t-1}, x_t, λ)]'

be bounded from below for all λ in Λ* and all n larger than some N. We are obliged to impose this same condition later in Assumption 6. □
We append some additional conditions to the identification condition to permit application of the Uniform Strong Law.

ASSUMPTION 5. The sequences {τ̂_n} and {τ_n°} are contained in T, which is a closed ball with finite, nonzero radius. On T × Λ*, the family {s_t[W_t(ω), τ, λ]}_{t=0}^∞ is near epoch dependent of size -1/2, is continuous in (τ, λ) uniformly in t for each fixed ω in some set A with P(A) = 1 (Problem 1), and there is a sequence of random variables {d_t} with sup_{T×Λ*} |s_t[W_t(ω), τ, λ]| ≤ d_t(ω) and ||d_t||_r ≤ Δ < ∞ for all t, where r is that of Assumption 3. □

LEMMA 9. Let Assumptions 1 through 5 hold. Then

    lim_{n→∞} sup_{Λ*} |s_n(λ) - s_n°(λ)| = 0

almost surely and {s_n°(λ)}_{n=0}^∞ is an equicontinuous family.

PROOF. Writing E s_t(W_t, τ̂_n, λ) to mean E s_t(W_t, τ, λ) evaluated at τ = τ̂_n, we have

    sup_{Λ*} |s_n(λ) - s_n°(λ)|
        ≤ sup_{T×Λ*} |(1/n) Σ_{t=1}^n [s_t(W_t, τ, λ) - E s_t(W_t, τ, λ)]|
          + sup_{Λ*} |(1/n) Σ_{t=1}^n [E s_t(W_t, τ̂_n, λ) - E s_t(W_t, τ_n°, λ)]|.

Except on an event that occurs with probability zero, we have that the first term on the right hand side of the last inequality converges to zero as n tends to infinity by Theorem 1, and the same for the second term by the equicontinuity of the average guaranteed by Theorem 1 and the almost sure convergence of τ̂_n - τ_n° to zero guaranteed by Assumption 4. □

THEOREM 4. (Consistency) Let Assumptions 1 through 5 hold. Then lim_{n→∞} (λ̂_n - λ_n°) = 0 almost surely.

PROOF. Fix ω not in the exceptional set given by Lemma 9 and let ε > 0 be given. For N given by Assumption 4 put

    δ = inf_{n>N} inf_{|λ-λ_n°|≥ε} [s_n°(λ) - s_n°(λ_n°)].

Applying Lemma 9, there is an N' such that sup_{Λ*} |s_n(λ) - s_n°(λ)| < δ/2 for all n > N'. Since s_n(λ̂_n) ≤ s_n(λ_n°), we have for all n > N' that

    s_n°(λ̂_n) - δ/2 ≤ s_n(λ̂_n) ≤ s_n(λ_n°) ≤ s_n°(λ_n°) + δ/2,

or |s_n°(λ̂_n) - s_n°(λ_n°)| < δ. Then for all n > max(N, N') we must have |λ̂_n - λ_n°| < ε. □

The asymptotic distribution of λ̂_n is characterized in terms of the following notation.
NOTATION 2.

    𝒰_nτ(λ) = (1/n) Σ_{t=1+τ}^n [E(∂/∂λ) s_t(W_t, τ_n°, λ)][E(∂/∂λ) s_{t-τ}(W_{t-τ}, τ_n°, λ)]',    τ ≥ 0;
    𝒰_nτ(λ) = 𝒰'_{n,-τ}(λ),    τ < 0;
    𝒰_n(λ) = Σ_{τ=-(n-1)}^{n-1} 𝒰_nτ(λ);
    ℐ_nτ(λ) = (1/n) Σ_{t=1+τ}^n E{[(∂/∂λ) s_t(W_t, τ_n°, λ)][(∂/∂λ) s_{t-τ}(W_{t-τ}, τ_n°, λ)]'} - 𝒰_nτ(λ),    τ ≥ 0;
    ℐ_nτ(λ) = ℐ'_{n,-τ}(λ),    τ < 0;
    ℐ_n(λ) = Σ_{τ=-(n-1)}^{n-1} ℐ_nτ(λ);
    𝒥_n(λ) = (1/n) Σ_{t=1}^n E(∂²/∂λ∂λ') s_t(W_t, τ_n°, λ);
    ℐ_n° = ℐ_n(λ_n°),    𝒥_n° = 𝒥_n(λ_n°),    𝒰_n° = 𝒰_n(λ_n°);
    ℐ_n* = ℐ_n(λ_n*),    𝒥_n* = 𝒥_n(λ_n*),    𝒰_n* = 𝒰_n(λ_n*). □
We illustrate their computation with the example.

EXAMPLE 1. (Continued) The first and second partial derivatives of s_t(w_t, λ) = [y_t - f(y_{t-1}, x_t, λ)]² are

    (∂/∂λ) s_t(w_t, λ) = -2 [y_t - f(y_{t-1}, x_t, λ)] (∂/∂λ) f(y_{t-1}, x_t, λ),
    (∂²/∂λ∂λ') s_t(w_t, λ) = 2 [(∂/∂λ) f(y_{t-1}, x_t, λ)][(∂/∂λ) f(y_{t-1}, x_t, λ)]'
                              - 2 [y_t - f(y_{t-1}, x_t, λ)] (∂²/∂λ∂λ') f(y_{t-1}, x_t, λ).

Evaluating the first derivative at λ = γ°, and recalling that y_t - f(y_{t-1}, x_t, γ°) = e_t and that e_t and (y_{t-1}, x_t) are independent,

    E(∂/∂λ) s_t(W_t, λ)|_{λ=γ°} = -2 E e_t E(∂/∂λ) f(y_{t-1}, x_t, γ°) = 0,

whence 𝒰_n° = 0. Put

    F_t = (∂/∂λ) f(y_{t-1}, x_t, λ)|_{λ=γ°}.

Then

    E[e_s e_t F_s F_t'] = σ_t² E F_t F_t'    for s = t,    = 0    for s < t,

so that ℐ_nτ(γ°) = 0 for τ ≠ 0. In summary,

    ℐ_n° = (4/n) Σ_{t=1}^n σ_t² E F_t F_t',    𝒥_n° = (2/n) Σ_{t=1}^n E F_t F_t'. □

General purpose estimators of (ℐ_n°, 𝒥_n°) and (ℐ_n*, 𝒥_n*) -- denoted (ℐ̂, 𝒥̂) and (ℐ̃, 𝒥̃) respectively -- may be defined as follows.

NOTATION 3.

    ℐ̂_nτ(λ) = (1/n) Σ_{t=1+τ}^n [(∂/∂λ) s_t(w_t, τ̂_n, λ)][(∂/∂λ) s_{t-τ}(w_{t-τ}, τ̂_n, λ)]',    τ ≥ 0;
    ℐ̂_nτ(λ) = ℐ̂'_{n,-τ}(λ),    τ < 0;
    ℐ̂_n(λ) = Σ_{τ=-ℓ(n)}^{ℓ(n)} w[τ/ℓ(n)] ℐ̂_nτ(λ);
    w(x) = 1 - 6|x|² + 6|x|³    for 0 ≤ |x| ≤ 1/2,    w(x) = 2(1 - |x|)³    for 1/2 ≤ |x| ≤ 1;
    ℓ(n) = the integer nearest n^{1/5};
    𝒥̂_n(λ) = (1/n) Σ_{t=1}^n (∂²/∂λ∂λ') s_t(w_t, τ̂_n, λ);
    ℐ̂ = ℐ̂_n(λ̂_n),    𝒥̂ = 𝒥̂_n(λ̂_n),    ℐ̃ = ℐ̂_n(λ̃_n),    𝒥̃ = 𝒥̂_n(λ̃_n). □

The special structure of specific applications will suggest alternative estimators. For instance, with Example 1 one would prefer to take ℐ̂_nτ(λ) = 0 for τ ≠ 0.
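In code, the weighted estimator ℐ̂_n(λ) of Notation 3 is a direct transcription. The sketch below is ours, reusing the parzen function from the earlier sketch; scores is assumed to be the n × p array of (∂/∂λ) s_t(w_t, τ̂_n, λ) evaluated at λ̂_n.

    import numpy as np

    def score_covariance(scores, weight):
        """I_hat = sum_{|tau| <= l(n)} w(tau/l(n)) * I_hat_ntau, where I_hat_ntau is
        the (1/n)-weighted sum of lag-tau outer products of the score vectors."""
        n, p = scores.shape
        ell = max(1, round(n ** 0.2))           # integer nearest n^{1/5}
        I_hat = scores.T @ scores / n           # tau = 0 term
        for tau in range(1, ell + 1):
            J_tau = scores[tau:].T @ scores[:n - tau] / n
            I_hat += weight(tau / ell) * (J_tau + J_tau.T)   # tau and -tau together
        return I_hat

The Hessian estimator 𝒥̂_n(λ) is simply the average of the second-derivative matrices and needs no weighting.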
The normalized sum of the scores is asymptotically normally distributed under the following regularity conditions, as we show in Theorem 5 below.

ASSUMPTION 6. The estimation space Λ* contains a closed ball Λ with finite, non-zero radius. The points {λ_n°} are contained in a concentric ball of smaller radius. Let g_t(W_t, τ, λ) be a generic term that denotes an element of (∂/∂λ) s_t(W_t, τ, λ), (∂²/∂λ∂λ') s_t(W_t, τ, λ), (∂²/∂τ∂λ') s_t(W_t, τ, λ), or [(∂/∂λ) s_t(W_t, τ, λ)][(∂/∂λ) s_t(W_t, τ, λ)]'. On T × Λ, the family {g_t[W_t(ω), τ, λ]} is near epoch dependent of size -q with q = 2(r-2)/(r-4) where r is that of Assumption 3, g_t[W_t(ω), τ, λ] is continuous in (τ, λ) uniformly in t for each fixed ω in some set A with P(A) = 1, and there is a sequence of random variables {d_t} with sup_{T×Λ} |g_t[W_t(ω), τ, λ]| ≤ d_t(ω) and ||d_t||_r ≤ Δ < ∞ for all t. There is an N and constants c_0 > 0, c_1 < ∞ such that for all δ in R^p we have

    c_0 δ'δ ≤ δ' 𝒥_n(λ) δ ≤ c_1 δ'δ,    all n > N, all λ in Λ,
    c_0 δ'δ ≤ δ' ℐ_n° δ ≤ c_1 δ'δ,      all n > N,
    c_0 δ'δ ≤ δ' ℐ_n* δ ≤ c_1 δ'δ,      all n > N.

Also,

    lim_{n→∞} δ'(ℐ_n°)^{-1/2} ℐ_{⌊ns⌋}° (ℐ_n°)^{-1/2} δ = s δ'δ,    all 0 < s ≤ 1,
    lim_{n→∞} δ'(ℐ_n*)^{-1/2} ℐ_{⌊ns⌋}* (ℐ_n*)^{-1/2} δ = s δ'δ,    all 0 < s ≤ 1,

and

    lim_{n→∞} (1/n) Σ_{t=1}^n E(∂²/∂τ∂λ') s_t(W_t, τ_n°, λ_n°) = 0. □

Recall that ⌊ns⌋ denotes the integer part of ns, that ℐ^{-1/2} denotes a matrix with ℐ^{-1} = (ℐ^{-1/2})'(ℐ^{-1/2}), and ℐ^{1/2} a matrix with ℐ = (ℐ^{1/2})(ℐ^{1/2})', and that factorizations are always taken to be compatible so that ℐ^{-1/2} ℐ^{1/2} = I.

As mentioned in Chapter 3, the condition

    lim_{n→∞} (1/n) Σ_{t=1}^n E(∂²/∂τ∂λ') s_t(W_t, τ_n°, λ_n°) = 0

permits two-step (first τ, then λ) estimation. If it is not satisfied, the easiest approach is to estimate τ and λ jointly. The requirement that

    lim_{n→∞} δ'(ℐ_n°)^{-1/2} ℐ_{⌊ns⌋}° (ℐ_n°)^{-1/2} δ = s δ'δ

is particularly unfortunate because it is nearly the same as requiring that the scores be covariance stationary on average, as in Chapter 3. This has the effect of either restricting the amount of heteroscedasticity that (∂/∂λ) s_t(W_t, τ_n°, λ_n°) can exhibit or requiring the use of a variance stabilizing transformation (see Section 2 of Chapter 3). But the restriction is dictated by the regularity conditions of the Central Limit Theorem and there is no way to get around it, because asymptotic normality cannot obtain if the condition is violated (Ibragimov, 1962). We verify that the condition holds for the example.
We verify that the
condition holds for the example.
EXAMPLE 1.
(Continued)
For the example,
1, 2, ...
t
y
t
= 0
with l(a/ay)f(y,x,yO)1 S d
t
<
1, we shall verify that
satisfies the condition
0' 0 •
To do so, define
~
0
9-4-11
Yt
0
t
Yt
f(yt_l,xt,yO)
o <
~s
Yt,m
~s-l
~s
Yt ,m
y
t
s
0
= f(Y
t ,m
0
,x ,y ) + e
s
s
~
~
0
t
max(t-m,O}
max(t-m,O)
<
s
~
t
t-l j
L. 0 die
J=
t- J
.1
g(y,x) = a typical element of G(y,x}
and assume that {e } and tXt} are sequences of identically distributed random
t
variables, and that Iytl and 1(3/3y)g(y,x)1 are bounded by some ~
in Section 2, for m
Y to
t
Yt
~
0 and t
~
0 there is a
Yt
>0
the same argument yields
;:;;dml-Y t l
I
~ d m(~
y
+
+dml Yt-m - Y
- - I
t m
t-m
)
00.
As
on the line segment joining
such that
For t - m
<
9-4-12
The assumption that the sequences of random variables {e } and {x } are
t
t
At-l
identically distributed causes the sequence of random variables {G(Yt-l,m'x )}
t
to be identically distributed.
Thus
But
where the constant does not depend on n or t.
=
o'lv +
O(dm)j-~lV +
Thus
O(dm)jlV +
O(dm)j-~
,
0
Now m is arbitrary so we must have
u
With Assumption 6 one has access to Theorems 1 through 3, and asymptotic normality of the scores and the estimator λ̂_n follows directly using basically the same methods of proof as in Chapter 3. The details are as follows.

LEMMA 10. Under Assumptions 1 through 6, interchange of differentiation and integration is permitted in these instances:

    (∂/∂λ) s_n°(λ) = (1/n) Σ_{t=1}^n E(∂/∂λ) s_t(W_t, τ_n°, λ),
    (∂²/∂λ∂λ') s_n°(λ) = (1/n) Σ_{t=1}^n E(∂²/∂λ∂λ') s_t(W_t, τ_n°, λ).

Moreover,

    lim_{n→∞} sup_Λ |(∂/∂λ) s_n(λ) - (∂/∂λ) s_n°(λ)| = 0    almost surely,
    lim_{n→∞} sup_Λ |(∂²/∂λ∂λ') s_n(λ) - (∂²/∂λ∂λ') s_n°(λ)| = 0    almost surely,

and the families {(∂/∂λ) s_n°(λ)} and {(∂²/∂λ∂λ') s_n°(λ)} are equicontinuous on Λ.

PROOF. The proof that interchange is permitted is the same as in Lemma 3 of Chapter 3. Almost sure convergence and equicontinuity follow directly from Theorem 1 using the same argument as in Lemma 9. □

THEOREM 5. (Asymptotic normality of the scores) Under Assumptions 1 through 6,

    (ℐ_n°)^{-1/2} √n (∂/∂λ) s_n(λ_n°) →d N(0, I),
    lim_{n→∞} (ℐ_n° + 𝒰_n° - ℐ̂) = 0    in probability.

PROOF. For each i, where i = 1, 2, ..., p, we have

    √n (∂/∂λ_i) s_n(λ_n°) = (1/√n) Σ_{t=1}^n (∂/∂λ_i) s_t(W_t, τ_n°, λ_n°)
                            + [(1/n) Σ_{t=1}^n (∂/∂τ')(∂/∂λ_i) s_t(W_t, τ̄_n, λ_n°)] √n (τ̂_n - τ_n°),

where τ̄_n is on the line segment joining τ̂_n to τ_n°. By Assumption 4, lim_{n→∞} (τ̂_n - τ_n°) = 0 almost surely and √n (τ̂_n - τ_n°) = O_p(1), whence

    (1/n) Σ_{t=1}^n (∂/∂τ')(∂/∂λ_i) s_t(W_t, τ̄_n, λ_n°) - (1/n) Σ_{t=1}^n E(∂/∂τ')(∂/∂λ_i) s_t(W_t, τ_n°, λ_n°) → 0

almost surely. By Assumption 6 we have

    lim_{n→∞} (1/n) Σ_{t=1}^n E(∂/∂τ')(∂/∂λ_i) s_t(W_t, τ_n°, λ_n°) = 0.

As the elements of (ℐ_n°)^{-1/2} must be bounded (Problem 3), we have

    (ℐ_n°)^{-1/2} √n (∂/∂λ) s_n(λ_n°) = (ℐ_n°)^{-1/2} (1/√n) Σ_{t=1}^n (∂/∂λ) s_t(W_t, τ_n°, λ_n°) + o_p(1),

where the interchange of integration and differentiation is permitted by Lemma 10. Let δ be a non-zero p-vector and put

    g_nt(W_t, γ_n°) = δ'(ℐ_n°)^{-1/2} (∂/∂λ) s_t(W_t, τ_n°, λ_n°).

Assumption 3 guarantees that {V_t}_{t=-∞}^∞ is strong-mixing of size -4r/(r-4) with r > 4, so {V_t}_{t=-∞}^∞ is strong-mixing of size -1/2 as required by Theorem 2, and Assumption 6 guarantees that {g_nt(W_t, γ_n°)} is near epoch dependent of size -q with q = 2(r-2)/(r-4) > 1/2 (Problem 4). Put σ_n² = Var[Σ_{t=1}^n g_nt(W_t, γ_n°)], which, by Assumption 6, satisfies

    (a) 1/σ_n² = O(1/n),
    (b) lim_{n→∞} σ_{⌊ns⌋}²/σ_n² = s.

Further, Assumption 6 and Problem 3 imply ||g_nt(W_t, γ_n°) - E g_nt(W_t, γ_n°)||_r ≤ Δ < ∞. Thus,

    δ'(ℐ_n°)^{-1/2} √n (∂/∂λ) s_n(λ_n°) = (δ'δ)^{1/2} (1/σ_n) Σ_{t=1}^n g_nt(W_t, γ_n°) + o_p(1) →d N(0, δ'δ)

by Theorem 2. This proves the first assertion.

To prove the second, put

    σ̂_n² = Σ_{τ=-ℓ(n)}^{ℓ(n)} w[τ/ℓ(n)] R̂_nτ = δ' ℐ̂ δ,

where w(x) can be either Bartlett or Parzen weights. By Assumption 3, {V_t}_{t=-∞}^∞ is strong-mixing of size -4r/(r-4) for r > 4 as required by Theorem 3; by Assumption 2, W_t depends only on the past; by Assumption 6, {g_nt(W_t, γ)} is near epoch dependent of size -q with q = 2(r-2)/(r-4). So we have from Theorem 3, together with lim_{n→∞} 1/ℓ(n) = 0 and lim_{n→∞} ℓ(n)⁴/n = 0, that

    δ'(ℐ_n° + 𝒰_n° - ℐ̂)δ → 0    in probability

for every δ ≠ 0. □

THEOREM 6. (Asymptotic normality) Let Assumptions 1 through 6 hold. Then:

    √n (λ̂_n - λ_n°) = -(𝒥_n°)^{-1} √n (∂/∂λ) s_n(λ_n°) + o_p(1),
    (ℐ_n°)^{-1/2} 𝒥_n° √n (λ̂_n - λ_n°) →d N(0, I),
    lim_{n→∞} (𝒥̂ - 𝒥_n°) = 0    almost surely,
    lim_{n→∞} (ℐ̂ - ℐ_n° - 𝒰_n°) = 0    in probability.

PROOF. By Lemma 2 of Chapter 3 we may assume without loss of generality that λ̂_n is in Λ and that (∂/∂λ) s_n(λ̂_n) = o_s(n^{-1/2}), (∂/∂λ) s_n°(λ_n°) = o(n^{-1/2}). By Taylor's theorem

    √n (∂/∂λ) s_n(λ̂_n) = √n (∂/∂λ) s_n(λ_n°) + 𝒥̄ √n (λ̂_n - λ_n°),

where 𝒥̄ has rows (∂/∂λ')(∂/∂λ_i) s_n(λ̄_{in}) with ||λ̄_{in} - λ_n°|| ≤ ||λ̂_n - λ_n°||. Lemma 10 permits interchange of differentiation and integration, we have lim_{n→∞} ||λ̂_n - λ_n°|| = 0 almost surely by Theorem 4, so that application of Theorem 1 yields lim_{n→∞} (𝒥̄ - 𝒥_n°) = 0 almost surely (Problem 5). Thus, we may write

    [𝒥_n° + o_s(1)] √n (λ̂_n - λ_n°) = -√n (∂/∂λ) s_n(λ_n°) + o_s(1),

recalling that (∂/∂λ) s_n(λ̂_n) = o_s(n^{-1/2}) and that (ℐ_n°)^{-1/2} = O(1) by Assumption 6 (Problem 3). The right hand side is O_p(1) by Theorem 5, and (𝒥_n°)^{1/2} and (𝒥_n°)^{-1} are O(1) by Assumption 6, so that √n (λ̂_n - λ_n°) = O_p(1) and we can write

    √n (λ̂_n - λ_n°) = -(𝒥_n°)^{-1} √n (∂/∂λ) s_n(λ_n°) + o_p(1),

which proves the first result; the second follows from the first and Theorem 5. The same argument used to show lim_{n→∞} (𝒥̄ - 𝒥_n°) = 0 almost surely can be used to show that lim_{n→∞} (𝒥̂ - 𝒥_n°) = 0 almost surely, and the last result is the second conclusion of Theorem 5. □
Next we shall establish some ancillary facts concerning the estimator λ̃_n that minimizes s_n(λ) subject to H: h(λ) = h_n*, under the assumption that the elements of the q-vector √n [h(λ_n°) - h_n*] are bounded. Here h_n* is a variable quantity chosen to adjust to λ_n° so that the elements of the vector are bounded, which contrasts with Chapter 3, where λ_n° was taken as the variable quantity and h* was held fixed at zero. As in Chapter 3, these results are for use in deriving asymptotic distributions of test statistics and are not meant to be used as a theory of constrained estimation. See Section 8 of Chapter 3 for a discussion of how a general asymptotic theory of estimation can be adapted to estimation subject to constraints.

ASSUMPTION 7. (Pitman drift) The function h(λ) that defines the null hypothesis H: h(λ_n°) = h_n* is a twice continuously differentiable mapping of Λ, as defined by Assumption 6, into R^q with Jacobian denoted as H(λ) = (∂/∂λ') h(λ). The eigenvalues of H(λ) H'(λ) are bounded below over Λ by c_0² > 0 and above by c_1² < ∞. In the case where p = q, h(λ) is assumed to be a one-to-one mapping with a continuous inverse. In the case q < p, there is a continuous function φ(λ) such that the mapping

    (ρ, τ) = [φ(λ), h(λ)]

has a continuous inverse λ = ψ(ρ, τ) defined over S = {(ρ, τ): ρ = φ(λ), τ = h(λ), λ in Λ}. Moreover, ψ(ρ, τ) has a continuous extension to the set

    R × T = {ρ: ρ = φ(λ), λ in Λ} × {τ: τ = h(λ), λ in Λ}.

The sequence {h_n*} is chosen such that √n [h(λ_n°) - h_n*] = O(1). □

The purpose of the functions φ(λ) and ψ(ρ, τ) in Assumption 7 is to insure the existence of a sequence {λ_n^#} that satisfies h(λ_n^#) = h_n* but has lim_{n→∞} (λ_n^# - λ_n°) = 0. This is the same as assuming that the distance between λ_n° and the projection of λ_n° onto {λ: h(λ) = h_n*} decreases as |h_n* - h(λ_n°)| decreases. The existence of the sequence {λ_n^#} and the Identification Condition (Assumption 4) is enough to guarantee that |λ_n° - λ_n*| decreases as |h(λ_n°) - h_n*| decreases (Problem 7). The bounds on the eigenvalues of H(λ)H'(λ) (Assumption 7) and 𝒥_n(λ) (Assumption 6) guarantee that |λ_n° - λ_n*| decreases as fast as |h(λ_n°) - h_n*| decreases, as we show in the next two lemmas.
LEMMA 11. Let 𝒥 be a symmetric p by p matrix and let H be a matrix of order q by p with q < p. Suppose that the eigenvalues of 𝒥 are bounded below by c_0 > 0 and above by c_1 < ∞, and that those of HH' are bounded below by c_0² and above by c_1². Then there is a matrix G of order p by (p-q) with orthonormal columns such that HG = 0, the elements of

    A = | G'𝒥 |
        |  H  |

are bounded above by p c_1 in absolute value, and (c_0)^{2p} ≤ det AA'.

PROOF. Let

    H = U S V_(1)

be the singular value decomposition (Lawson and Hanson, 1974, Chapter 4) of H, where S is a diagonal matrix of order q with positive entries on the diagonal, V_(1) is of order q by p, and U'U = UU' = V_(1) V'_(1) = I of order q. From HH' = U S² U' we see that c_0² ≤ s_ii² ≤ c_1². Choose V_(2) of order (p-q) by p such that

    V = | V_(1) |
        | V_(2) |

satisfies V'V = VV' = I. Put G = V'_(2); note that HG = U S V_(1) V'_(2) = 0, and consider

    AA' = | V_(2) 𝒥² V'_(2)        V_(2) 𝒥 V'_(1) S U' |
          | U S V_(1) 𝒥 V'_(2)     U S² U'             |.

Each block has the form BCD where the elements of B and D are bounded by one and the eigenvalues of C are bounded above by c_1² (Mood and Graybill, 1963, p. 206), so each element of AA' is bounded above by p² c_1². Since a diagonal element of AA' has the form Σ_j a_ij², we must have |a_ij| ≤ p c_1. Now, by the determinant formula for a partitioned matrix,

    det AA' = det(U S² U') det[V_(2) 𝒥² V'_(2) - V_(2) 𝒥 V'_(1) V_(1) 𝒥 V'_(2)]
            = det S² det[V_(2) 𝒥 (I - V'_(1) V_(1)) 𝒥 V'_(2)]
            = det S² det[(V_(2) 𝒥 V'_(2))(V_(2) 𝒥 V'_(2))]
            ≥ (c_0)^{2q} [det(V_(2) 𝒥 V'_(2))]².

But (c_0)^{p-q} ≤ det(V_(2) 𝒥 V'_(2)), whence (c_0)^{2p} ≤ det AA'. □
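Numerically the construction of G is one line given an SVD routine. The following sketch is ours, with a random matrix standing in for H purely for illustration.

    import numpy as np

    # From the SVD H = U S V1, take G = V2', an orthonormal basis for the
    # null space of H, so that H G = 0 as in Lemma 11.
    rng = np.random.default_rng(3)
    q, p = 2, 5
    H = rng.standard_normal((q, p))

    U, s, Vt = np.linalg.svd(H)       # Vt stacks V1 (first q rows) over V2
    G = Vt[q:].T                      # p x (p - q), orthonormal columns

    print(np.allclose(H @ G, 0.0))                 # True: HG = 0
    print(np.allclose(G.T @ G, np.eye(p - q)))     # True: orthonormal columns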
LEMMA 12. Under Assumptions 1 through 7 there is a bound B that does not depend on n such that

    |λ_n° - λ_n*| ≤ B |h(λ_n°) - h_n*|,

where |λ| = (Σ_{i=1}^p λ_i²)^{1/2}.

PROOF. The proof for the case q = p is immediate, as the one-to-one mapping h(λ) has a Jacobian whose inverse has bounded elements. Consider the case q < p. Let ε > 0 be given. For N given by Assumption 4 put

    δ = inf_{n>N} inf_{|λ-λ_n°|≥ε} [s_n°(λ) - s_n°(λ_n°)].

Let ψ(ρ, τ) be the continuous function defined on R × T given by Assumption 7. Now h_n* = h(λ_n*) by definition; put ρ_n* = φ(λ_n*), h_n° = h(λ_n°), and ρ_n° = φ(λ_n°). The image of a compact set is compact and the Cartesian product of two compact sets is compact, so R × T is compact. A continuous function on a compact set is uniformly continuous, so lim_{n→∞} |h_n° - h_n*| = 0 implies that

    lim_{n→∞} |ψ(ρ_n°, h_n°) - ψ(ρ_n°, h_n*)| = 0.

In particular, putting λ_n^# = ψ(ρ_n°, h_n*), lim_{n→∞} |λ_n° - λ_n^#| = 0. By Assumption 6 the points {λ_n°} are in a concentric ball of radius strictly smaller than the radius of Λ, so we must have λ_n^# in Λ for all n greater than some N_1. By Lemma 9, the family {s_n°(λ)} is equicontinuous, so that there is an N_2 such that |λ - λ_n°| < η implies |s_n°(λ) - s_n°(λ_n°)| < δ for all n > N_2. Choose N_3 large enough that |λ_n^# - λ_n°| < η for all n > N_3. The point λ_n^# satisfies the constraint h(λ_n^#) = h_n*, so we must have s_n°(λ_n*) ≤ s_n°(λ_n^#), whence |s_n°(λ_n*) - s_n°(λ_n°)| < δ, and we must have |λ_n* - λ_n°| < ε. We have shown that λ_n° - λ_n* = o(1) as |h_n° - h_n*| tends to zero.

The first order conditions for the problem, minimize s_n°(λ) subject to h(λ) = h_n*, are

    (∂/∂λ) s_n°(λ_n*) + H'(λ_n*) θ_n = 0,    h(λ_n*) = h_n*,

for some q-vector θ_n of Lagrange multipliers. By Taylor's theorem we have

    (∂/∂λ) s_n°(λ_n*) = (∂/∂λ) s_n°(λ_n°) + 𝒥̄_n (λ_n* - λ_n°),
    h(λ_n°) = h(λ_n*) + H̄_n (λ_n° - λ_n*),

where 𝒥̄_n and H̄_n are evaluated at points on the line segment joining λ_n* to λ_n°. Using (∂/∂λ) s_n°(λ_n°) = 0 for large n and h(λ_n*) - h_n* = 0, we have upon substitution into the first order conditions that

    [𝒥_n* + o(1)] (λ_n° - λ_n*) = -H_n*' θ_n + o(|λ_n° - λ_n*|).

Let G_n* be the matrix given by Lemma 11 with orthonormal columns, H_n* G_n* = 0, max_{ij} |a*_{ijn}| ≤ p c_1, and (c_0)^{2p} ≤ det A_n* A_n*', where

    A_n* = | G_n*' 𝒥_n* |
           |   H_n*     |.

Multiplying through by G_n*' annihilates the multiplier term, so that

    G_n*' [𝒥_n* + o(1)] (λ_n° - λ_n*) = o(|λ_n° - λ_n*|),

while the Taylor expansion of h gives

    H_n* (λ_n° - λ_n*) = [h(λ_n°) - h_n*] + o(|λ_n° - λ_n*|).

Stacking these two equations,

    [A_n* + o(1)] (λ_n° - λ_n*) = [0', (h(λ_n°) - h_n*)']' + o(|λ_n° - λ_n*|).

Let a^{ij} denote an element of A^{-1} and consider the region of matrices A with |a_ij| ≤ p c_1 + 1 and det AA' ≥ (c_0)^{2p}/2. On this region we must have |a^{ij}| ≤ B < ∞. For large n the matrix A_n* is in this region by Lemma 11, as is the matrix A_n* + o(1), since the elements of G_n* are bounded by one. In consequence we have

    |λ_n° - λ_n*| ≤ B |h(λ_n°) - h_n*|

for large n. □
THEOREM 7. Let Assumptions 1 through 7 hold. Then:

    lim_{n→∞} (λ̃_n - λ_n*) = 0    almost surely,
    lim_{n→∞} [s_n(λ̃_n) - s_n°(λ_n*)] = 0    almost surely,
    (ℐ_n*)^{-1/2} √n [(∂/∂λ) s_n(λ_n*) - (∂/∂λ) s_n°(λ_n*)] →d N(0, I),
    lim_{n→∞} (ℐ̃ - ℐ_n* - 𝒰_n*) = 0    in probability,
    lim_{n→∞} (𝒥̃ - 𝒥_n*) = 0    almost surely.

PROOF. The first result obtains from Lemma 12, since √n [h(λ_n°) - h_n*] = O(1) by Assumption 7. The proof of the second is nearly word for word the same as the first part of the proof of Lemma 12. One puts

    δ = inf_{n>N} inf_{|λ-λ_n°|≥ε} [s_n°(λ) - s_n°(λ_n°)],

and for fixed ω has from Lemma 9 that |λ - λ_n°| < η implies |s_n(λ) - s_n°(λ_n°)| < δ for all n larger than some N_1. For n larger than some N_2 one has |λ_n^# - λ_n°| < η as in the proof of Lemma 12. The critical inequality becomes, for the same fixed ω,

    s_n°(λ_n*) - δ/2 ≤ s_n(λ̃_n) ≤ s_n(λ_n^#) ≤ s_n°(λ_n°) + δ.

Combining this with the first result gives the second.

The proof of the third and fourth results is the same as the proof of Theorem 5, recalling that (∂/∂λ) s_n°(λ_n*) is the mean of (∂/∂λ) s_n(λ_n*) by Lemma 10. The fifth result is an immediate consequence of Lemma 10 and the second result. □
PROBLEMS

1. Show that if w_t has fixed dimension, k_t = k for all t, and the dependence of s_t(w_t, τ, λ) on t is trivial, s_t(w, τ, λ) = s(w, τ, λ) for all t, then continuity of s(w, τ, λ) in (w, τ, λ) implies that s_t[W_t(ω), τ, λ] is continuous in (τ, λ) uniformly in t for each fixed ω; that is,

    lim_{(τ,λ)→(τ°,λ°)} sup_t |s_t[W_t(ω), τ, λ] - s_t[W_t(ω), τ°, λ°]| = 0

for each fixed ω.

2. Show that the family of scores of Example 1 is near epoch dependent of size -q for any q > 0. List the regularity conditions used.

3. Let c_{0n} be the smallest eigenvalue of ℐ_n° and c_{1n} the largest. Assumption 6 implies that c_0 ≤ c_{0n} and c_{1n} ≤ c_1 for all n ≥ N. Prove that det ℐ_n° ≥ (c_0)^p for all n ≥ N. Show that (ℐ_n°)^{-1} can always be factored such that the elements of (ℐ_n°)^{-1/2} are bounded.

4. Show that if the elements of (∂/∂λ) s_t(W_t, τ_n°, λ_n°) are near epoch dependent of size -q then so are the elements of δ' A_n (∂/∂λ) s_t(W_t, τ_n°, λ_n°) if A_n has bounded elements.

5. Let lim_{n→∞} sup_{T×Λ} |(1/n) Σ_{t=1}^n [f_t(τ, λ) - E f_t(τ, λ)]| = 0 almost surely, let {(1/n) Σ_{t=1}^n E f_t(τ, λ)}_{n=1}^∞ be an equicontinuous family on T × Λ, and let lim_{n→∞} |(τ̂_n, λ̂_n) - (τ_n°, λ_n°)| = 0 almost surely. Show that

    lim_{n→∞} |(1/n) Σ_{t=1}^n f_t(τ̂_n, λ̂_n) - (1/n) Σ_{t=1}^n E f_t(τ_n°, λ_n°)| = 0    almost surely.

6. Prove Lemma 11 with 𝒥 not necessarily symmetric but with the singular values of 𝒥 bounded below by c_0 > 0 and above by c_1 < ∞.

7. The purpose of the function ψ(ρ, τ) in Assumption 7 is to guarantee the existence of a sequence {λ_n^#} that satisfies h(λ_n^#) = h_n* and lim_{n→∞} (λ_n^# - λ_n°) = 0. Prove Lemma 12 using this condition instead of the existence of ψ(ρ, τ).
5. METHOD OF MOMENTS ESTIMATORS

Recall that a method of moments estimator λ̂_n is defined as the solution of the optimization problem

    Minimize: s_n(λ) = d[m_n(λ), τ̂_n],

where d(m, τ) is a measure of the distance of m from zero, τ̂_n is an estimator of nuisance parameters, and m_n(λ) is a vector of sample moments,

    m_n(λ) = (1/n) Σ_{t=1}^n m_t(w_t, τ̂_n, λ).

The dimensions involved are as follows: w_t is a k_t-vector, τ is a u-vector, λ is a p-vector, and each m_t(w_t, τ, λ) is a Borel measurable function defined on some subset of R^{k_t} × R^u × R^p and with range in R^v. Note that v is a constant; specifically, it does not depend on t. As previously, we use lower case w_t to mean either a random variable or data as determined by context. For emphasis, we shall write W_t(v_∞) when considered as a function on ×_{t=-∞}^∞ R^ℓ, and write W_t(V_∞), W_t, W_t[V_∞(ω)], or W_t(ω) when considered as a random variable depending on the underlying probability space (Ω, G, P) through function composition with the process {V_t(ω)}_{t=-∞}^∞.

A constrained method of moments estimator λ̃_n is the solution of the optimization problem

    Minimize: s_n(λ) subject to h(λ) = h_n*,

where h(λ) maps R^p into R^q. As in the previous section, the objective is to find the asymptotic distribution of the estimator λ̂_n under regularity conditions that do not rule out specification error. Some ancillary facts regarding λ̃_n under a Pitman drift are also derived for use in the next section. As in the previous section, drift is imposed by moving h_n*.

As the example, we shall consider the estimation procedure that is most commonly used to analyze data that are presumed to follow a nonlinear dynamic model. The estimator is called nonlinear three-stage least squares by some authors (Jorgenson and Laffont, 1974; Gallant, 1977; Amemiya, 1977; Gallant and Jorgenson, 1979) and generalized method of moments by others (Hansen, 1982). The estimation procedure is as follows.

EXAMPLE 2. (Three-stage least-squares) Data is presumed to follow the model

    q_t(y_t, x_t, γ°) = e_t,    t = 0, 1, ...,

where y_t is an M-vector of endogenous variables, x_t is a k_t-vector with exogenous variables and (possibly) lagged values of y_t as elements (the elements of x_t are collectively termed predetermined variables rather than exogenous variables due to the presence of lagged values of y_t), γ° is a p-vector, and q_t(y, x, γ) maps R^M × R^{k_t} × R^p into R^L with L ≤ M. Note that M, L, and p do not depend on t. Instrumental variables -- a sequence of K-vectors {z_t} -- are assumed available for estimation. These variables have the form z_t = Z_t(x_t), where Z_t(x) is some (possibly) nonlinear, vector valued function of the predetermined variables, and are presumed to satisfy

    E(e_t ⊗ z_t) = 0,    t = 0, 1, ...,

where, recall (Chapter 6, Section 2), e_t ⊗ z_t denotes the Kronecker product of the vectors e_t and z_t, an LK-vector with typical element e_{it} z_{jt}. More generally, z_t may be any K-vector that has E(e_t ⊗ z_t) = 0, but since a trivial dependence of q_t(y_t, x_t, γ) on elements of x_t is permitted, the form z_t = Z_t(x_t) is not restrictive. Also, Z_t may depend on some preliminary estimator τ̂_n and be of the form z_t = Z_t(x_t, τ̂_n), or depend on the parameter γ° (Hansen, 1982) with z_t = Z_t(x_t, γ°).

The moment equations are

    m_n(λ) = (1/n) Σ_{t=1}^n m_t(w_t, τ̂_n, λ)

with w_t' = (y_t', x_t') and

    m_t(w_t, τ̂_n, λ) = q_t(y_t, x_t, λ) ⊗ z_t

or, in the more general cases, m_t(w_t, τ̂_n, λ) = q_t(y_t, x_t, λ) ⊗ Z_t(x_t, τ̂_n). Hereafter, we shall consider the case z_t = Z_t(x_t) because it occurs most frequently in practice. Our theory covers the other cases, but application is more tedious because the partial derivatives of m_t(w_t, τ, λ) with respect to τ and λ become more complicated.

If the number of moment equations equals the number of parameters, one can use the method of moments in the classical sense by putting sample moments equal to population moments, viz. m_n(λ) = 0, and solving for λ to get λ̂_n. But in most applications the dimension of m_n(λ) exceeds p and the equations cannot be solved. However, one can view the equation as a nonlinear regression with p parameters and L·K observations and apply the principle of generalized least squares to estimate γ°. Let τ_n° denote the upper triangle of

    S_n° = (1/n) E(Σ_{t=1}^n e_t ⊗ z_t)(Σ_{s=1}^n e_s ⊗ z_s)'

and put

    D(τ_n°) = [(1/n) E(Σ_{t=1}^n e_t ⊗ z_t)(Σ_{s=1}^n e_s ⊗ z_s)']^{-1} = (S_n°)^{-1}.

Using the generalized least squares heuristic, one estimates γ° by the λ̂_n that minimizes

    d[m_n(λ), τ̂_n] = (1/2) m_n'(λ) D(τ̂_n) m_n(λ).

We shall assume that the estimator τ̂_n satisfies lim_{n→∞} (τ̂_n - τ_n°) = 0 almost surely and that √n (τ̂_n - τ_n°) is bounded in probability. The obvious approach to obtain such an estimate is to find the minimum λ̂_n^# of m_n'(λ) m_n(λ) and put τ̂_n equal to the upper triangle of

    Ŝ_n = Σ_{τ=-ℓ(n)}^{ℓ(n)} w[τ/ℓ(n)] Ŝ_nτ(λ̂_n^#),

where

    Ŝ_nτ(λ) = (1/n) Σ_{t=1+τ}^n [q_t(y_t, x_t, λ) ⊗ z_t][q_{t-τ}(y_{t-τ}, x_{t-τ}, λ) ⊗ z_{t-τ}]',    τ ≥ 0;
    Ŝ_nτ(λ) = Ŝ'_{n,-τ}(λ),    τ < 0.

If e_t ⊗ z_t and e_s ⊗ z_s are uncorrelated for all time gaps |s-t| larger than some ℓ_0, as in many applications to financial data (Hansen and Singleton, 1982), then we can obtain the conditions lim_{n→∞} (τ̂_n - τ_n°) = 0 and √n (τ̂_n - τ_n°) bounded in probability using Taylor's series expansions and Theorems 1 and 2 with ℓ(n) = ℓ_0 and w(x) ≡ 1. But if e_t ⊗ z_t and e_s ⊗ z_s are correlated for every s, t pair, then this sort of approach will fail for any ℓ(n) with lim_{n→∞} ℓ(n) = ∞, because Theorem 3 is not enough to imply the critical result that √n δ'[D^{-1}(τ̂_n) - D^{-1}(τ_n°)]δ is bounded in probability. But as noted in the discussion preceding Theorem 3, δ'D^{-1}(τ̂_n)δ is an estimate of a spectral density at zero, so that if {e_t ⊗ z_t} were stationary we should have the critical result with w(x) taken as Parzen weights and ℓ(n) the integer nearest n^{1/5}. It is an open question as to whether √n δ'[D^{-1}(τ̂_n) - D^{-1}(τ_n°)]δ is bounded in probability under the sort of heteroscedasticity permitted by Theorem 2 or if stationarity is essential. □
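To fix ideas, here is a minimal numerical sketch of the estimator. Everything in it is invented for illustration: a single-equation model q_t, polynomial instruments, and the iid-case first-stage weight matrix; the distance function is the quadratic d(m, τ) = (1/2) m'D(τ)m used in the text.

    import numpy as np
    from scipy.optimize import minimize

    def moments(lam, y, x, z):
        """m_n(lambda) = (1/n) sum q_t(y_t, x_t, lambda) (x) z_t; here L = 1, so the
        Kronecker product q_t (x) z_t is just q_t * z_t."""
        q = y - lam[0] - lam[1] * x - lam[2] * x ** 2
        return (q[:, None] * z).mean(axis=0)

    def s_n(lam, y, x, z, D):
        m = moments(lam, y, x, z)
        return 0.5 * m @ D @ m              # d(m, tau) = (1/2) m' D(tau) m

    rng = np.random.default_rng(4)
    n = 1000
    x = rng.standard_normal(n)
    y = 1.0 + 2.0 * x + 0.5 * x ** 2 + rng.standard_normal(n)
    z = np.column_stack([np.ones(n), x, x ** 2, x ** 3])   # K = 4 instruments, p = 3

    D = np.linalg.inv(z.T @ z / n)          # first-stage weight, iid heuristic
    lam_hat = minimize(s_n, x0=np.zeros(3), args=(y, x, z, D)).x
    print(lam_hat)                          # near (1.0, 2.0, 0.5)

In a serially correlated application, D would instead be the inverse of the Parzen-weighted estimate Ŝ_n described above.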
We call the reader's attention to some heavily used notation and then state the identification condition.

NOTATION 4.

    s_n(λ) = d[m_n(λ), τ̂_n];
    λ̂_n minimizes s_n(λ);
    λ̃_n minimizes s_n(λ) subject to h(λ) = h_n*;
    m_n°(λ) = (1/n) Σ_{t=1}^n E m_t(W_t, τ_n°, λ);    s_n°(λ) = d[m_n°(λ), τ_n°];
    λ_n° solves m_n°(λ) = 0;
    λ_n* minimizes s_n°(λ) subject to h(λ) = h_n*. □

ASSUMPTION 8. (Identification) The nuisance parameter estimator τ̂_n is centered at τ_n° in the sense that lim_{n→∞} (τ̂_n - τ_n°) = 0 almost surely and √n (τ̂_n - τ_n°) is bounded in probability. Either the solution λ_n° of the moment equations m_n°(λ) = 0 is unique for each n, or there is one solution that can be regarded as being naturally associated to the data generating process. Put M_n* = (∂/∂λ') m_n°(λ_n*); there is an N and constants c_0 > 0, c_1 < ∞ such that for all δ in R^p we have

    c_0² δ'δ ≤ δ' M_n*' M_n* δ ≤ c_1² δ'δ,    all n > N. □

As mentioned in Section 4 of Chapter 3, the assumption that m_n°(λ_n°) = 0 is implausible in misspecified models when the range of m_n(λ) is in a higher dimension than the domain. As the case m_n°(λ_n°) ≠ 0 is much more complicated than the case m_n°(λ_n°) = 0 and we have no need of it in the body of the text, consideration of it is deferred to Problem 1. The example has m_n°(λ_n°) = 0 with λ_n° = γ° for all n by construction.
The following notation defines the parameters of the asymptotic distribution of λ̂_n.

NOTATION 5.

    K_nτ(λ) = (1/n) Σ_{t=1+τ}^n [E m_t(W_t, τ_n°, λ)][E m_{t-τ}(W_{t-τ}, τ_n°, λ)]',    τ ≥ 0;
    K_nτ(λ) = K'_{n,-τ}(λ),    τ < 0;
    K_n(λ) = Σ_{τ=-(n-1)}^{n-1} K_nτ(λ);
    S_nτ(λ) = (1/n) Σ_{t=1+τ}^n E[m_t(W_t, τ_n°, λ) m'_{t-τ}(W_{t-τ}, τ_n°, λ)] - K_nτ(λ),    τ ≥ 0;
    S_nτ(λ) = S'_{n,-τ}(λ),    τ < 0;
    S_n(λ) = Σ_{τ=-(n-1)}^{n-1} S_nτ(λ);
    M̄_n(λ) = (1/n) Σ_{t=1}^n E(∂/∂λ') m_t(W_t, τ_n°, λ);
    D̄_n(λ) = (∂²/∂m∂m') d[m_n°(λ), τ_n°];
    ℐ_n(λ) = M̄'_n(λ) D̄_n(λ) S_n(λ) D̄_n(λ) M̄_n(λ);
    𝒥_n(λ) = M̄'_n(λ) D̄_n(λ) M̄_n(λ);
    𝒰_n(λ) = M̄'_n(λ) D̄_n(λ) K_n(λ) D̄_n(λ) M̄_n(λ);
    ℐ_n° = ℐ_n(λ_n°),  𝒥_n° = 𝒥_n(λ_n°),  𝒰_n° = 𝒰_n(λ_n°),  M_n° = M̄_n(λ_n°),  S_n° = S_n(λ_n°),  D_n° = D̄_n(λ_n°);
    ℐ_n* = ℐ_n(λ_n*),  𝒥_n* = 𝒥_n(λ_n*),  𝒰_n* = 𝒰_n(λ_n*),  M_n* = M̄_n(λ_n*),  S_n* = S_n(λ_n*),  D_n* = D̄_n(λ_n*). □

We shall illustrate the computations with the example.
EXAMPLE 2. (Continued) Recall that data follows the model

    q_t(y_t, x_t, γ°) = e_t,    t = 1, 2, ..., n,

with m_t(w_t, λ) = q_t(y_t, x_t, λ) ⊗ z_t and

    E m_t(W_t, γ°) = E(e_t ⊗ z_t) = 0.

Since E m_t(W_t, γ°) = 0 we have K_n° = 0. Further,

    S_n° = Σ_{τ=-(n-1)}^{n-1} (1/n) Σ_{t=1+|τ|}^n E(e_t ⊗ z_t)(e_{t-|τ|} ⊗ z_{t-|τ|})'.

We have

    M̄_n(λ) = (1/n) Σ_{t=1}^n E(∂/∂λ') m_t(W_t, λ)
            = (1/n) Σ_{t=1}^n E[(∂/∂λ') q_t(y_t, x_t, λ)] ⊗ z_t
            = (1/n) Σ_{t=1}^n E Q_t(λ) ⊗ z_t,

where Q_t(λ) = (∂/∂λ') q_t(y_t, x_t, λ). Recall that D(τ_n°) = (S_n°)^{-1}. Thus,

    D̄_n(λ) = (∂²/∂m∂m') (1/2) m' D(τ_n°) m |_{m=m_n°(λ)} = (S_n°)^{-1}

and

    ℐ_n° = M_n°' (S_n°)^{-1} S_n° (S_n°)^{-1} M_n° = M_n°' (S_n°)^{-1} M_n° = 𝒥_n°.

An important special case is the instance where the x_t are taken as fixed (random variables with zero variance) and the errors {e_t} are taken as independently and identically distributed with E e_t e_t' = Σ. In this case

    S_n° = Σ ⊗ (1/n) Σ_{t=1}^n z_t z_t'

and

    ℐ_n° = 𝒥_n° = M_n°' [Σ ⊗ (1/n) Σ_{t=1}^n z_t z_t']^{-1} M_n°. □

General purpose estimators of (ℐ_n°, 𝒥_n°) and (ℐ_n*, 𝒥_n*), denoted (ℐ̂, 𝒥̂) and (ℐ̃, 𝒥̃) respectively, may be defined as follows.

NOTATION 6.

    Ŝ_nτ(λ) = (1/n) Σ_{t=1+τ}^n m_t(w_t, τ̂_n, λ) m'_{t-τ}(w_{t-τ}, τ̂_n, λ),    τ ≥ 0;
    Ŝ_nτ(λ) = Ŝ'_{n,-τ}(λ),    τ < 0;
    Ŝ_n(λ) = Σ_{τ=-ℓ(n)}^{ℓ(n)} w[τ/ℓ(n)] Ŝ_nτ(λ);
    w(x) = 1 - 6|x|² + 6|x|³    for 0 ≤ |x| ≤ 1/2,    w(x) = 2(1 - |x|)³    for 1/2 ≤ |x| ≤ 1;
    ℓ(n) = the integer nearest n^{1/5};
    M̂_n(λ) = (1/n) Σ_{t=1}^n (∂/∂λ') m_t(w_t, τ̂_n, λ);
    D̂_n(λ) = (∂²/∂m∂m') d[m_n(λ), τ̂_n];
    ℐ̂_n(λ) = M̂'_n(λ) D̂_n(λ) Ŝ_n(λ) D̂_n(λ) M̂_n(λ);
    𝒥̂_n(λ) = M̂'_n(λ) D̂_n(λ) M̂_n(λ);
    ℐ̂ = ℐ̂_n(λ̂_n),    𝒥̂ = 𝒥̂_n(λ̂_n),    ℐ̃ = ℐ̂_n(λ̃_n),    𝒥̃ = 𝒥̂_n(λ̃_n). □

For the example, three-stage least squares, one is presumed to have an estimate D(τ̂_n) of (S_n°)^{-1} available in advance of the computations. In applications, it is customary to reuse this estimate to obtain an estimate of 𝒥_n° rather than trying to estimate S_n° afresh. We illustrate.

EXAMPLE 2. (Continued) Recall that by assumption lim_{n→∞} [D(τ̂_n) - (S_n°)^{-1}] = 0 almost surely and √n [D(τ̂_n) - (S_n°)^{-1}] is bounded in probability. Thus, for the case z_t = Z_t(x_t) we will have

    𝒥̂ = M̂'_n(λ̂_n) D(τ̂_n) M̂_n(λ̂_n) = ℐ̂,

where, recall,

    M̂_n(λ) = (1/n) Σ_{t=1}^n [(∂/∂λ') q_t(y_t, x_t, λ)] ⊗ z_t.

In the special case where {e_t} is a sequence of independent and identically distributed random variables with E e_t e_t' = Σ and the z_t taken as fixed, we have

    D(τ̂_n) = [Σ̂ ⊗ (1/n) Σ_{t=1}^n z_t z_t']^{-1},

where Σ̂ is some estimate of Σ, and

    𝒥̂ = M̂'_n(λ̂_n) [Σ̂ ⊗ (1/n) Σ_{t=1}^n z_t z_t']^{-1} M̂_n(λ̂_n) = ℐ̂. □
The following conditions permit application of the uniform strong law for dependent observations to the moment equations, the Jacobian, and the Hessian of the moment equations.

ASSUMPTION 9. The sequences {τ̂_n} and {τ_n°} are contained in T, which is a closed ball with finite non-zero radius. The sequence {λ_n°} is contained in Λ*, which is a closed ball with finite, non-zero radius. Let g_t(W_t, τ, λ) be a generic term that denotes, variously, m_{αt}(W_t, τ, λ), (∂/∂λ_i) m_{αt}(W_t, τ, λ), (∂²/∂λ_i∂λ_j) m_{αt}(W_t, τ, λ), or (∂²/∂τ_ℓ∂λ_i) m_{αt}(W_t, τ, λ), for i, j = 1, 2, ..., p, ℓ = 1, 2, ..., u, and α = 1, 2, ..., v. On T × Λ*, the family {g_t[W_t(ω), τ, λ]} is near epoch dependent of size -q with q = 2(r-2)/(r-4) where r is that of Assumption 3, g_t[W_t(ω), τ, λ] is continuous in (τ, λ) uniformly in t for each fixed ω in some set A with P(A) = 1, and there is a sequence of random variables {d_t} with sup_{T×Λ*} |g_t[W_t(ω), τ, λ]| ≤ d_t(ω) and ||d_t||_r ≤ Δ < ∞ for all t. □

Observe that the domination condition in Assumption 9 guarantees that m_n°(λ) takes its range in some compact ball, because

    |m_{αn}°(λ)| ≤ (1/n) Σ_{t=1}^n E|m_{αt}(W_t, τ_n°, λ)| ≤ Δ.

We shall need to restrict the behavior of the distance function d(m, τ) on a slightly larger ball 𝔪. The only distance functions used in the text are quadratic,

    d(m, τ) = (1/2) m' D(τ) m,

with D(τ) continuous and positive definite on T. Thus, we shall abstract minimally beyond the properties of quadratic functions. See Problem 1 for the more general case.

ASSUMPTION 10. Let 𝔪 be a closed ball that contains ∪_{n=1}^∞ {m_n°(λ): λ in Λ*} as a concentric ball of smaller radius. The distance function d(m, τ) and the derivatives (∂/∂m) d(m, τ), (∂²/∂m∂m') d(m, τ), and (∂²/∂m∂τ') d(m, τ) are continuous on 𝔪 × T. Moreover, (∂/∂m) d(0, τ) = 0 for all τ in T (which implies (∂²/∂m∂τ') d(0, τ) = 0 for all τ in T) and (∂²/∂m∂m') d(m, τ) is positive definite over 𝔪 × T. □
Before proving consistency, we shall collect together, as a lemma, a number of facts needed throughout this section.

LEMMA 13. Under Assumptions 1 through 3 and 8 through 10, interchange of differentiation and integration is permitted in these instances:

    (∂/∂λ_i) m_n°(λ) = (1/n) Σ_{t=1}^n E(∂/∂λ_i) m_t(W_t, τ_n°, λ),
    (∂²/∂λ_i∂λ_j) m_n°(λ) = (1/n) Σ_{t=1}^n E(∂²/∂λ_i∂λ_j) m_t(W_t, τ_n°, λ).

Moreover,

    lim_{n→∞} sup_{Λ*} |m_{αn}(λ) - m_{αn}°(λ)| = 0    almost surely,
    lim_{n→∞} sup_{Λ*} |(∂/∂λ_i)[m_{αn}(λ) - m_{αn}°(λ)]| = 0    almost surely,
    lim_{n→∞} sup_{Λ*} |(∂²/∂λ_i∂λ_j)[m_{αn}(λ) - m_{αn}°(λ)]| = 0    almost surely,
    lim_{n→∞} sup_{Λ*} |s_n(λ) - s_n°(λ)| = 0    almost surely,
    lim_{n→∞} sup_{Λ*} |(∂/∂λ_i)[s_n(λ) - s_n°(λ)]| = 0    almost surely,
    lim_{n→∞} sup_{Λ*} |(∂²/∂λ_i∂λ_j)[s_n(λ) - s_n°(λ)]| = 0    almost surely,

and the families {m_{αn}°(λ)}, {(∂/∂λ_i) m_{αn}°(λ)}, {(∂²/∂λ_i∂λ_j) m_{αn}°(λ)}, {s_n°(λ)}, {(∂/∂λ_i) s_n°(λ)}, and {(∂²/∂λ_i∂λ_j) s_n°(λ)} are equicontinuous; indices range over i, j = 1, 2, ..., p; α = 1, 2, ..., v; and n = 1, 2, ...

PROOF. The proof for the claims involving m_n(λ) and m_n°(λ) is the same as the proof of Lemma 10. For m_n(λ) in 𝔪 we have

    s_n(λ) = d[m_n(λ), τ̂_n],
    (∂/∂λ_i) s_n(λ) = Σ_α (∂/∂m_α) d[m_n(λ), τ̂_n] (∂/∂λ_i) m_{αn}(λ),

and

    (∂²/∂λ_i∂λ_j) s_n(λ) = Σ_α Σ_β (∂²/∂m_α∂m_β) d[m_n(λ), τ̂_n] (∂/∂λ_i) m_{αn}(λ) (∂/∂λ_j) m_{βn}(λ)
                           + Σ_α (∂/∂m_α) d[m_n(λ), τ̂_n] (∂²/∂λ_i∂λ_j) m_{αn}(λ).

Consider the second equation. A continuous function on a compact set is uniformly continuous, thus (∂/∂m_α) d(m, τ) is uniformly continuous on 𝔪 × T. Given ε > 0, choose δ > 0 small enough that |m - m°| < δ and |τ - τ°| < δ imply

    |(∂/∂m_α) d(m, τ) - (∂/∂m_α) d(m°, τ°)| < ε.

Fix a realization of {V_t}_{t=1}^∞ for which lim_{n→∞} |τ̂_n - τ_n°| = 0 and lim_{n→∞} sup_{Λ*} |m_n(λ) - m_n°(λ)| = 0; almost every realization is such by Assumption 9 and Theorem 1. Choose N large enough that n > N implies sup_{Λ*} |m_n(λ) - m_n°(λ)| < δ and |τ̂_n - τ_n°| < δ. This implies uniform convergence, since we have

    sup_{Λ*} |(∂/∂m_α){d[m_n(λ), τ̂_n] - d[m_n°(λ), τ_n°]}| < ε    for n > N.

By equicontinuity, we can choose η such that |λ - λ°| < η implies |m_n°(λ) - m_n°(λ°)| < δ; for |λ - λ°| < η we will then have

    |(∂/∂m_α) d[m_n°(λ), τ_n°] - (∂/∂m_α) d[m_n°(λ°), τ_n°]| < ε,

which implies that {(∂/∂m_α) d[m_n°(λ), τ_n°]} is an equicontinuous family. As (∂/∂λ_i) s_n(λ) is a sum of products of uniformly convergent, equicontinuous functions, it has the same properties. The argument for s_n(λ) and (∂²/∂λ_α∂λ_β) s_n(λ) is the same. □
As we have noted earlier, in many applications it is implausible to assume that m_n°(λ) has only one root over Λ*. Thus, the best consistency result that we can show is that s_n(λ) will eventually have a local minimum near λ_n° and that all other local minima of s_n(λ) must be some fixed distance δ away from λ_n°, where this distance does not depend on λ_n° itself. Hereafter, we shall take λ̂_n to mean the root given by Theorem 8.

THEOREM 8. (Existence of consistent local minima) Let Assumptions 1 through 3 and 8 through 10 hold. Then there is a δ > 0 such that the value λ̂_n which minimizes s_n(λ) over |λ - λ_n°| ≤ δ satisfies

    lim_{n→∞} (λ̂_n - λ_n°) = 0    almost surely.

PROOF. By Lemma 13 the family {M̄_n(λ) = (∂/∂λ') m_n°(λ)} is equicontinuous over Λ*. Then there is a δ small enough that |λ̄ - λ_n°| ≤ δ implies

    (λ - λ_n°)' [M̄'_n(λ̄) M̄_n(λ̄) - M_n°' M_n°] (λ - λ_n°) > -(c_0²/2) |λ - λ_n°|²,

where c_0² is the eigenvalue bound defined in Assumption 8. Let γ_0 be the smallest eigenvalue of (∂²/∂m∂m') d(m, τ) over 𝔪 × T, which is positive by Assumption 10 and continuity over a compact set. Recalling that m_n°(λ_n°) = 0, d(0, τ) = 0, and (∂/∂m) d(0, τ) = 0, we have by Taylor's theorem that for N given by Assumption 8 and 0 < ε ≤ δ

    inf_{n>N} inf_{ε≤|λ-λ_n°|≤δ} [s_n°(λ) - s_n°(λ_n°)]
        ≥ inf_{n>N} inf_{ε≤|λ-λ_n°|≤δ} (γ_0/2) (λ - λ_n°)' M̄'_n(λ̄) M̄_n(λ̄) (λ - λ_n°)
        ≥ (γ_0/2)(c_0²/2) ε²,

where m̄ is on the line segment joining the zero vector to m_n°(λ), and λ̄ is on the line segment joining λ to λ_n°. Fix ω not in the exceptional set given by Lemma 13. Choose N' large enough that n > N' implies sup_{Λ*} |s_n(λ) - s_n°(λ)| < γ_0 c_0² ε²/8. Since s_n(λ̂_n) ≤ s_n(λ_n°), we have for all n > max(N, N') that

    |λ̂_n - λ_n°| < ε. □
We append some additional conditions needed to prove the asymptotic normality of the score function (∂/∂λ) s_n(λ_n°).

ASSUMPTION 11. The points {λ_n°} are contained in a closed ball Λ that is concentric with Λ* but with smaller radius. There is an N and constants c_0 > 0, c_1 < ∞ such that for δ in R^p we have

    c_0 δ'δ ≤ δ' 𝒥_n(λ) δ ≤ c_1 δ'δ,    all n > N, all λ in Λ.

Also,

    lim_{n→∞} δ'(S_n°)^{-1/2} S_{⌊ns⌋}° (S_n°)^{-1/2} δ = s δ'δ,    all 0 < s ≤ 1,
    lim_{n→∞} δ'(S_n*)^{-1/2} S_{⌊ns⌋}* (S_n*)^{-1/2} δ = s δ'δ,    all 0 < s ≤ 1. □

THEOREM 9. (Asymptotic normality of the scores) Under Assumptions 1 through 3 and 8 through 11,

    (ℐ_n°)^{-1/2} √n (∂/∂λ) s_n(λ_n°) →d N(0, I),
    lim_{n→∞} (ℐ_n° + 𝒰_n° - ℐ̂) = 0    in probability.

PROOF. By the same argument used to prove Theorem 5, we have

    √n (S_n°)^{-1/2} [m_n(λ_n°) - m_n°(λ_n°)] →d N(0, I),
    lim_{n→∞} [S_n° + K_n° - Ŝ_n(λ̂_n)] = 0    in probability.

A typical element of the vector √n (∂/∂m) d[m_n(λ_n°), τ̂_n] can be expanded about [m_n°(λ_n°), τ_n°] to obtain

    √n (∂/∂m_α) d[m_n(λ_n°), τ̂_n] = √n (∂/∂m_α) d[m_n°(λ_n°), τ_n°]
        + (∂/∂m')(∂/∂m_α) d(m̄, τ̄) √n [m_n(λ_n°) - m_n°(λ_n°)]
        + (∂/∂τ')(∂/∂m_α) d(m̄, τ̄) √n (τ̂_n - τ_n°),

where (m̄, τ̄) is on the line segment joining [m_n(λ_n°), τ̂_n] to [m_n°(λ_n°), τ_n°]. We have that √n [m_n(λ_n°) - m_n°(λ_n°)] converges in distribution and so is bounded in probability; we have assumed that √n (τ̂_n - τ_n°) is bounded in probability. Then using the uniform convergence of [m_n(λ) - m_n°(λ)] to zero given by Lemma 13, the convergence of (τ̂_n - τ_n°) to zero, and the continuity of d(m, τ) and its derivatives, we can write

    √n (∂/∂m) d[m_n(λ_n°), τ̂_n] = √n (∂/∂m) d[m_n°(λ_n°), τ_n°]
        + (∂²/∂m∂m') d[m_n°(λ_n°), τ_n°] √n [m_n(λ_n°) - m_n°(λ_n°)]
        + (∂²/∂m∂τ') d[m_n°(λ_n°), τ_n°] √n (τ̂_n - τ_n°) + o_p(1).

Since m_n°(λ_n°) = 0, Assumption 10 gives (∂/∂m) d[m_n°(λ_n°), τ_n°] = 0 and (∂²/∂m∂τ') d[m_n°(λ_n°), τ_n°] = 0, and since λ_n° is an interior point of Λ* by Assumption 11, we have √n (∂/∂λ) s_n°(λ_n°) = o(1), whence

    √n (∂/∂λ) s_n(λ_n°) = M_n°' D_n° √n [m_n(λ_n°) - m_n°(λ_n°)] + o_p(1).

In general this simplification will not obtain, and the asymptotic distribution of √n (∂/∂λ) s_n(λ_n°) will be more complicated than the distribution that we shall obtain here (Problem 1). Assumptions 8, 10, and 11 assure the existence of the various inverses and factorizations involved and the existence of a uniform (in n) bound on their elements, so that

    (ℐ_n°)^{-1/2} √n (∂/∂λ) s_n(λ_n°) = (ℐ_n°)^{-1/2} M_n°' D_n° √n [m_n(λ_n°) - m_n°(λ_n°)] + o_p(1) →d N(0, I),

and the first result obtains. Lemma 13 and Theorem 8 guarantee that lim_{n→∞} [M̂_n(λ̂_n) - M_n°] = 0 almost surely, Assumption 10 and Theorem 8 guarantee that lim_{n→∞} [D̂_n(λ̂_n) - D_n°] = 0 almost surely, and we have already that lim_{n→∞} [S_n° + K_n° - Ŝ_n(λ̂_n)] = 0 in probability, whence the second result obtains. □

Asymptotic normality of the unconstrained estimator follows at once.

THEOREM 10. (Asymptotic normality) Let Assumptions 1 through 3 and 8 through 11 hold. Then:

    √n (λ̂_n - λ_n°) = -(𝒥_n°)^{-1} √n (∂/∂λ) s_n(λ_n°) + o_p(1),
    (ℐ_n°)^{-1/2} 𝒥_n° √n (λ̂_n - λ_n°) →d N(0, I),
    lim_{n→∞} (𝒥̂ - 𝒥_n°) = 0    almost surely.

PROOF. The proof is much the same as the proof of Theorem 6. □
Next we establish some ancillary facts regarding the constrained estimator subject to a Pitman drift, for use in the next section.

ASSUMPTION 12. (Pitman drift) The function h(λ) that defines the null hypothesis H: h(λ_n°) = h_n* is a twice continuously differentiable mapping of Λ, as defined by Assumption 11, into R^q with Jacobian denoted as H(λ) = (∂/∂λ') h(λ). The eigenvalues of H(λ) H'(λ) are bounded below over Λ by c_0 > 0 and above by c_1 < ∞. In the case where p = q, h(λ) is assumed to be a one-to-one mapping with a continuous inverse. In the case q < p, there is a continuous function φ(λ) such that the mapping

    (ρ, τ) = [φ(λ), h(λ)]

has a continuous inverse λ = ψ(ρ, τ) defined over S = {(ρ, τ): ρ = φ(λ), τ = h(λ), λ in Λ}. Moreover, ψ(ρ, τ) has a continuous extension to the set

    R × T = {ρ: ρ = φ(λ)} × {τ: τ = h(λ)}.

The sequence {h_n*} is chosen such that √n [h(λ_n°) - h_n*] = O(1). □

THEOREM 11. Let Assumptions 1 through 3 and 8 through 12 hold. Then there is a δ > 0 such that the value λ̃_n which minimizes s_n(λ) over |λ - λ_n*| ≤ δ subject to h(λ) = h_n* satisfies

    lim_{n→∞} (λ̃_n - λ_n*) = 0    almost surely.

Moreover,

    lim_{n→∞} (ℐ̃ - ℐ_n* - 𝒰_n*) = 0    in probability,
    lim_{n→∞} (𝒥̃ - 𝒥_n*) = 0    almost surely.

PROOF. The proof is much the same as the proof of Theorem 7. □
PROBLEMS

1. Let Assumptions 1 through 3 and 8 through 11 hold, except that m_n°(λ_n°) ≠ 0; also (∂/∂m) d(0, τ) and (∂²/∂m∂m') d(0, τ) need not be zero. Presume that the estimator of the nuisance parameter τ_n° can be put in the form

    √n (τ̂_n - τ_n°) = A_n° (1/√n) Σ_{t=1}^n f_t(W_t) + o_p(1),

where {f_t(W_t)} satisfies the hypotheses of Theorem 2 and c_0 δ'δ ≤ δ'(A_n°)'(A_n°)δ ≤ c_1 δ'δ for finite, non-zero c_0, c_1 and all n larger than some N. Define

    Z_t = [m_t(W_t, τ_n°, λ_n°)', f_t(W_t)']',
    K_nτ° = (1/n) Σ_{t=1+τ}^n (E Z_t)(E Z_{t-τ})',    τ ≥ 0;    K_nτ° = (K_{n,-τ}°)',    τ < 0;
    K_n° = Σ_{τ=-(n-1)}^{n-1} K_nτ°,
    𝒮_nτ° = (1/n) Σ_{t=1+τ}^n E(Z_t Z'_{t-τ}) - K_nτ°,    τ ≥ 0;    𝒮_nτ° = (𝒮_{n,-τ}°)',    τ < 0;
    𝒮_n° = Σ_{τ=-(n-1)}^{n-1} 𝒮_nτ°.

Show that, with ℐ_n° suitably redefined in terms of 𝒮_n° and K_n°,

    (ℐ_n°)^{-1/2} √n [(∂/∂λ) s_n(λ_n°) - (∂/∂λ) s_n°(λ_n°)] →d N(0, I).
6. HYPOTHESIS TESTING

The results obtained thus far may be summarized as follows.

SUMMARY. Let Assumptions 1 through 3 hold and let either Assumptions 4 through 7 or 8 through 12 hold. Then on a closed ball Λ with finite, non-zero radius:

    s_n(λ) - s_n°(λ) → 0    almost surely, uniformly on Λ,
    (∂/∂λ)[s_n(λ) - s_n°(λ)] → 0    almost surely, uniformly on Λ,
    (∂²/∂λ∂λ')[s_n(λ) - s_n°(λ)] → 0    almost surely, uniformly on Λ,
    {s_n°(λ)}_{n=1}^∞ is equicontinuous,
    {(∂/∂λ) s_n°(λ)}_{n=1}^∞ is equicontinuous,
    √n (ℐ_n°)^{-1/2} (∂/∂λ) s_n(λ_n°) →d N(0, I),
    √n (ℐ_n*)^{-1/2} (∂/∂λ)[s_n(λ_n*) - s_n°(λ_n*)] →d N(0, I),
    λ̂_n - λ_n° → 0    almost surely,    λ̃_n - λ_n* → 0    almost surely,
    𝒥̂ - 𝒥_n° → 0    almost surely,    𝒥̃ - 𝒥_n* → 0    almost surely,
    ℐ̂ - (ℐ_n° + 𝒰_n°) → 0    in probability,    ℐ̃ - (ℐ_n* + 𝒰_n*) → 0    in probability,
    c_0 δ'δ ≤ δ' 𝒥_n(λ) δ ≤ c_1 δ'δ,    all δ in R^p, all λ in Λ, all n > N,
    c_0 δ'δ ≤ δ' ℐ_n° δ ≤ c_1 δ'δ,    c_0 δ'δ ≤ δ' ℐ_n* δ ≤ c_1 δ'δ,    all δ in R^p, all n > N,
    δ' 𝒰_n° δ ≤ c_1 δ'δ,    δ' 𝒰_n* δ ≤ c_1 δ'δ,    all δ in R^p, all n > N,

where 0 < c_0 < c_1 < ∞. Moreover, λ̂_n and λ̃_n are tail equivalent to random variables that take their values in the interior of Λ, and λ_n° and λ_n* are interior to Λ for large n. Thus, in the sequel we may take λ̂_n, λ̃_n, λ_n°, and λ_n* interior to Λ without loss of generality. □

Taking the summary as the point of departure, consider testing

    H: h(λ_n°) = h_n*    against    A: h(λ_n°) ≠ h_n*,

where h(λ) maps Λ ⊂ R^p into R^q. As in Chapter 3, we shall study three test statistics for this problem: the Wald test, the "likelihood ratio" test, and the Lagrange multiplier test. Each statistic, say T_n as a generic term, is decomposed into a sum of two random variables

    T_n = X_n + a_n,

where a_n converges in probability to zero and X_n has a known, finite sample distribution. Such a decomposition permits the statement

    lim_{n→∞} [P(T_n > t) - P(X_n > t)] = 0.

Because we allow specification error and nonstationarity, we will not necessarily have T_n converging in distribution to a random variable X. However, the practical utility of convergence in distribution in applications derives from the statement

    lim_{n→∞} [P(T_n > t) - P(X > t)] = 0,

because P(X > t) is computable and so can be used to approximate P(T_n > t). Since the value P(X_n > t) that we shall provide is computable, we shall capture the full benefits of a classical asymptotic theory.

We introduce some additional notation.
NOTATION 7.

    V_n° = (𝒥_n°)^{-1} ℐ_n° (𝒥_n°)^{-1},    V_n* = (𝒥_n*)^{-1} ℐ_n* (𝒥_n*)^{-1},
    V̂ = 𝒥̂^{-1} ℐ̂ 𝒥̂^{-1},    Ṽ = 𝒥̃^{-1} ℐ̃ 𝒥̃^{-1},
    ĥ = h(λ̂_n),    H(λ) = (∂/∂λ') h(λ),
    H_n° = H(λ_n°),    H_n* = H(λ_n*),
    Ĥ = H(λ̂_n),    H̃ = H(λ̃_n).

In Theorem 12: V = V_n°, 𝒥 = 𝒥_n°, ℐ = ℐ_n°, 𝒰 = 𝒰_n°, H = H_n°.
In Theorems 13, 14, 15, and 16: V = V_n*, 𝒥 = 𝒥_n*, ℐ = ℐ_n*, 𝒰 = 𝒰_n*, H = H_n*. □
The first test statistic considered is the Wald test statistic

    W_n = n [h(λ̂_n) - h_n*]' (Ĥ V̂ Ĥ')^{-1} [h(λ̂_n) - h_n*].

As shown below, one rejects the hypothesis H: h(λ_n°) = h_n* when W_n exceeds the upper α × 100% critical point of a chi-square random variable with q degrees of freedom, to achieve an asymptotically level α test in a correctly specified situation. As noted earlier, the principal advantage of the Wald test is that it requires only one unconstrained optimization to compute it. The principal disadvantages are that it is not invariant to reparameterization and that its sampling distribution is not as well approximated by our characterizations as are the "likelihood ratio" and Lagrange multiplier tests.
THEOREM 12.  Let Assumptions 1 through 3 hold and let either Assumptions 4
through 7 or 8 through 12 hold.  Let

    $W = n\,[h(\hat\lambda_n) - h_n^*]'\,(\hat{H}\hat{V}\hat{H}')^{-1}\,[h(\hat\lambda_n) - h_n^*]$ .

9-6-4

Then $W = Y + o_p(1)$ where

    $Y = Z'\,[H\mathcal{I}^{-1}(\mathcal{J} + \mathcal{U})\mathcal{I}^{-1}H']^{-1}\,Z$

and

    $Z \sim N\{\sqrt{n}\,[h(\lambda_n^0) - h_n^*],\; HVH'\}$ .

Recall:  $V = V_n^0$, $\mathcal{J} = \mathcal{J}_n^0$, $\mathcal{I} = \mathcal{I}_n^0$, $\mathcal{U} = \mathcal{U}_n^0$, and $H = H_n^0$.
If $\mathcal{U} = 0$ then $Y$ has the non-central chi-square distribution with $q$ degrees
of freedom and non-centrality parameter

    $\alpha = n\,[h(\lambda_n^0) - h_n^*]'\,(HVH')^{-1}\,[h(\lambda_n^0) - h_n^*]/2$ .

Under the null hypothesis, $\alpha = 0$.
PROOF.  We may assume without loss of generality that $\hat\lambda_n$ and $\lambda_n^0$ are in
$\Lambda$ and that $(\partial/\partial\lambda)s_n(\hat\lambda_n) = o_s(n^{-1/2})$ and $(\partial/\partial\lambda)s_n^0(\lambda_n^0) = o(n^{-1/2})$.
By Taylor's theorem

    $\sqrt{n}\,[h_i(\hat\lambda_n) - h_i(\lambda_n^0)] = (\partial/\partial\lambda')h_i(\bar\lambda_{in})\,\sqrt{n}\,(\hat\lambda_n - \lambda_n^0)$ ,    $i = 1, 2, \ldots, q$ ,

where $\|\bar\lambda_{in} - \lambda_n^0\| \leq \|\hat\lambda_n - \lambda_n^0\|$.  By the almost sure convergence of
$\|\hat\lambda_n - \lambda_n^0\|$ to zero, $\lim_{n\rightarrow\infty} \|\bar\lambda_{in} - \lambda_n^0\| = 0$ almost surely, whence
$\lim_{n\rightarrow\infty} [(\partial/\partial\lambda)h_i(\bar\lambda_{in}) - (\partial/\partial\lambda)h_i(\lambda_n^0)] = 0$ almost surely.  Thus we may
write

    $\sqrt{n}\,[h(\hat\lambda_n) - h(\lambda_n^0)] = [H + o_s(1)]\,\sqrt{n}\,(\hat\lambda_n - \lambda_n^0)$ .

Again by Taylor's theorem

    $\sqrt{n}\,(\partial/\partial\lambda)s_n(\lambda_n^0) = -[\mathcal{I} + o_s(1)]\,\sqrt{n}\,(\hat\lambda_n - \lambda_n^0) + o_s(1)$ .

By the Summary, the left-hand side is $O_p(1)$, and $(\mathcal{J}_n^0)^{1/2}$ and $(\mathcal{I}_n^0)^{-1}$
are both $O(1)$, whence $\sqrt{n}\,(\hat\lambda_n - \lambda_n^0) = O_p(1)$ and

    $\sqrt{n}\,(\hat\lambda_n - \lambda_n^0) = -\mathcal{I}^{-1}\,\sqrt{n}\,(\partial/\partial\lambda)s_n(\lambda_n^0) + o_p(1)$ .

Combining these two equations we have

9-6-5

    $\sqrt{n}\,[h(\hat\lambda_n) - h_n^*] = \sqrt{n}\,[h(\lambda_n^0) - h_n^*] - H\mathcal{I}^{-1}\,\sqrt{n}\,(\partial/\partial\lambda)s_n(\lambda_n^0) + o_p(1)$

because all terms are $O(1)$ save the $o_s(1)$ and $o_p(1)$ terms.  The
equicontinuity of $\{\mathcal{J}_n(\lambda)\}$, $\{\mathcal{U}_n(\lambda)\}$, and $\{\mathcal{I}_n(\lambda)\}$, the almost sure
convergence of $\|\hat\lambda_n - \lambda_n^0\|$ to zero, and $\det \mathcal{I}(\lambda),\, \det \mathcal{J}(\lambda) \geq \Delta > 0$
imply that

    $(\hat{H}\hat{V}\hat{H}')^{-1} = [H\mathcal{I}^{-1}(\mathcal{J} + \mathcal{U})\mathcal{I}^{-1}H']^{-1} + o_s(1)$ .

Since all terms on the right are bounded in probability we have that

    $W = n\,[h(\hat\lambda_n) - h_n^*]'\,(\hat{H}\hat{V}\hat{H}')^{-1}\,[h(\hat\lambda_n) - h_n^*]$
      $= n\,[h(\hat\lambda_n) - h_n^*]'\,[H\mathcal{I}^{-1}(\mathcal{J} + \mathcal{U})\mathcal{I}^{-1}H']^{-1}\,[h(\hat\lambda_n) - h_n^*] + o_p(1)$ .

By the Skorokhod representation theorem (Serfling, 1980, Section 1.6), there are
random variables $X_n$ with the same distribution as $\mathcal{J}^{-1/2}\,\sqrt{n}\,(\partial/\partial\lambda)s_n(\lambda_n^0)$
such that $X_n = X + o_s(1)$ where $X \sim N(0, I)$.  Substituting $\mathcal{J}^{1/2}X_n$ for
$\sqrt{n}\,(\partial/\partial\lambda)s_n(\lambda_n^0)$ changes the right-hand side above by at most an $o_p(1)$
term because $H$, $\mathcal{I}^{-1}$, and $\mathcal{J}^{1/2}$ are bounded, and the result follows.  □
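Under the condition of Theorem 12, approximate power against a drifting
alternative is computable from the non-central chi-square.  A sketch follows;
note that the non-centrality parameter above is $\alpha = \mu'(HVH')^{-1}\mu/2$,
whereas scipy parameterizes the same family by twice that quantity.

```python
import numpy as np
from scipy import stats

def wald_power(mu, HVH, level=0.05):
    # mu = sqrt(n) [h(lam_n_0) - h_star_n];  HVH = H V H' (q x q)
    q = mu.size
    a = mu @ np.linalg.solve(HVH, mu) / 2.0       # non-centrality, text convention
    crit = stats.chi2.ppf(1.0 - level, df=q)      # null-case critical point
    return stats.ncx2.sf(crit, df=q, nc=2.0 * a)  # scipy's nc equals 2 * alpha
```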
In order to characterize the distributions of the Lagrange multiplier and
"likelihood ratio" test statistics we shall need the following characterization
of the distribution of the score vector evaluated at the constrained value $\lambda_n^*$.
THEOREM 13.  Let Assumptions 1 through 3 hold and let either Assumptions 4
through 7 or 8 through 12 hold.  Then

9-6-6

    $\sqrt{n}\,(\partial/\partial\lambda)s_n(\lambda_n^*) = X + o_s(1)$

where

    $X \sim N[\sqrt{n}\,(\partial/\partial\lambda)s_n^0(\lambda_n^*),\; \mathcal{J}_n^*]$ .

PROOF.  By either Theorem 7 or Theorem 11,

    $\sqrt{n}\,(\mathcal{J}_n^*)^{-1/2}(\partial/\partial\lambda)[s_n(\lambda_n^*) - s_n^0(\lambda_n^*)] \xrightarrow{\;L\;} N(0, I)$ .

By the Skorokhod representation theorem (Serfling, 1980, Section 1.6) there are
random variables $Y_n$ with the same distribution as
$\sqrt{n}\,(\mathcal{J}_n^*)^{-1/2}(\partial/\partial\lambda)[s_n(\lambda_n^*) - s_n^0(\lambda_n^*)]$ such that
$Y_n = Y + o_s(1)$ where $Y \sim N(0, I)$.  Let

    $X = (\mathcal{J}_n^*)^{1/2}\,Y + \sqrt{n}\,(\partial/\partial\lambda)s_n^0(\lambda_n^*)$

whence $X \sim N[\sqrt{n}\,(\partial/\partial\lambda)s_n^0(\lambda_n^*),\; \mathcal{J}_n^*]$.  Since $(\mathcal{J}_n^*)^{1/2}$
is bounded, $(\mathcal{J}_n^*)^{1/2}\,o_s(1) = o_s(1)$ and the result follows.  □
Both the "likelihood ratio" and Lagrange multiplier test statistics are
effectively functions of the score vector evaluated at A.
n
The following result
gives an essential representation.
THEOREM 14.  Let Assumptions 1 through 3 hold and let either Assumptions
4 through 7 or 8 through 12 hold.  Then

    $\sqrt{n}\,(\partial/\partial\lambda)s_n(\tilde\lambda_n) = H'(H\mathcal{I}^{-1}H')^{-1}H\mathcal{I}^{-1}\,\sqrt{n}\,(\partial/\partial\lambda)s_n(\lambda_n^*) + o_p(1)$

where $\mathcal{I} = \mathcal{I}_n^*$ and $H = H_n^*$.
PROOF.  By Taylor's theorem

    $\sqrt{n}\,h(\tilde\lambda_n) = \sqrt{n}\,h(\lambda_n^*) + \bar{H}\,\sqrt{n}\,(\tilde\lambda_n - \lambda_n^*)$ ,

    $\sqrt{n}\,(\partial/\partial\lambda)s_n(\tilde\lambda_n) = \sqrt{n}\,(\partial/\partial\lambda)s_n(\lambda_n^*) + \bar{\mathcal{I}}\,\sqrt{n}\,(\tilde\lambda_n - \lambda_n^*)$

9-6-7

where $\bar{\mathcal{I}}$ has rows

    $(\partial/\partial\lambda')(\partial/\partial\lambda_i)s_n(\bar\lambda_{in})$ ,    $i = 1, 2, \ldots, p$ ,

and $\bar{H}$ has rows

    $(\partial/\partial\lambda')h_j(\bar{\bar\lambda}_{jn})$ ,    $j = 1, 2, \ldots, q$ ,

with $\bar\lambda_{in}$ and $\bar{\bar\lambda}_{jn}$ on the line segment joining $\tilde\lambda_n$ to $\lambda_n^*$.
Recalling that $\sqrt{n}\,[h(\tilde\lambda_n) - h_n^*] = 0$ and $\sqrt{n}\,[h(\lambda_n^*) - h_n^*] = o_s(1)$,
we have

    $\bar{H}\,\sqrt{n}\,(\tilde\lambda_n - \lambda_n^*) = o_s(1)$ .

Since $\|\tilde\lambda_n - \lambda_n^*\|$ converges almost surely to zero, $(\bar\lambda_{in} - \lambda_n^*)$ and
$(\bar{\bar\lambda}_{jn} - \lambda_n^*)$ converge almost surely to zero, and $\bar{\mathcal{I}} = \mathcal{I} + o_s(1)$ by
the equicontinuity of $\{\mathcal{I}_n(\lambda)\}_{n=1}^{\infty}$; continuity of $H(\lambda)$ on $\Lambda$ compact
implies equicontinuity, whence $\bar{H} = H + o_s(1)$.  Moreover, there is an $N$
corresponding to almost every realization of $\{V_t\}$ such that $\det(\bar{\mathcal{I}}) > 0$ for
all $n > N$.  Defining $\bar{\mathcal{I}}^{-1}$ arbitrarily when $\det(\bar{\mathcal{I}}) = 0$, we have

    $H\,\sqrt{n}\,(\tilde\lambda_n - \lambda_n^*) = \bar{H}\,\sqrt{n}\,(\tilde\lambda_n - \lambda_n^*) + o_s(1) = o_s(1)$ .

Combining these observations, we may write

    $\sqrt{n}\,(\tilde\lambda_n - \lambda_n^*) = \bar{\mathcal{I}}^{-1}\,\sqrt{n}\,(\partial/\partial\lambda)s_n(\tilde\lambda_n) - \bar{\mathcal{I}}^{-1}\,\sqrt{n}\,(\partial/\partial\lambda)s_n(\lambda_n^*) + o_s(1)$

whence

    $H\mathcal{I}^{-1}\,\sqrt{n}\,(\partial/\partial\lambda)s_n(\tilde\lambda_n) = H\mathcal{I}^{-1}\,\sqrt{n}\,(\partial/\partial\lambda)s_n(\lambda_n^*) + o_s(1)$ .

Now $\sqrt{n}\,\mathcal{J}^{-1/2}[(\partial/\partial\lambda)s_n(\lambda_n^*) - (\partial/\partial\lambda)s_n^0(\lambda_n^*)]$ converges in
distribution and by Taylor's theorem

    $\sqrt{n}\,\mathcal{J}^{-1/2}(\partial/\partial\lambda)s_n^0(\lambda_n^*) = O(1) + o(1)$

so we have that $\sqrt{n}\,\mathcal{J}^{-1/2}(\partial/\partial\lambda)s_n^0(\lambda_n^*)$ is bounded.  Since $\mathcal{J}^{1/2}$ is
bounded,

9-6-8

$\sqrt{n}\,(\partial/\partial\lambda)s_n(\lambda_n^*)$ is bounded in probability.  By Lemma 2 of Chapter 3,
there is a sequence of Lagrange multipliers $\tilde\theta_n$ such that

    $\sqrt{n}\,(\partial/\partial\lambda)s_n(\tilde\lambda_n) + \tilde{H}'\,\sqrt{n}\,\tilde\theta_n = o_s(1)$ .

By continuity of $H(\lambda)$ and the almost sure convergence of $\|\tilde\lambda_n - \lambda_n^*\|$ to
zero we have $\tilde{H} = H + o_s(1)$.  Defining $(\bar{H}\bar{\mathcal{I}}^{-1}\bar{H}')^{-1}$ similarly to
$\bar{\mathcal{I}}^{-1}$ above and recalling that $\sqrt{n}\,(\partial/\partial\lambda)s_n(\lambda_n^*)$ is bounded in
probability,

    $H'(H\mathcal{I}^{-1}H')^{-1}H\mathcal{I}^{-1}\,\sqrt{n}\,(\partial/\partial\lambda)s_n(\lambda_n^*)$
      $= H'(\bar{H}\bar{\mathcal{I}}^{-1}\bar{H}')^{-1}\bar{H}\bar{\mathcal{I}}^{-1}\,\sqrt{n}\,(\partial/\partial\lambda)s_n(\lambda_n^*) + o_p(1)$
      $= H'(\bar{H}\bar{\mathcal{I}}^{-1}\bar{H}')^{-1}\bar{H}\bar{\mathcal{I}}^{-1}\,\sqrt{n}\,(\partial/\partial\lambda)s_n(\tilde\lambda_n) + o_p(1)$
      $= -H'(\bar{H}\bar{\mathcal{I}}^{-1}\bar{H}')^{-1}\bar{H}\bar{\mathcal{I}}^{-1}\tilde{H}'\,\sqrt{n}\,\tilde\theta_n + o_p(1)$
      $= -H'\,\sqrt{n}\,\tilde\theta_n + o_p(1)$
      $= \sqrt{n}\,(\partial/\partial\lambda)s_n(\tilde\lambda_n) + o_p(1)$ .  □
The second test statistic considered is the "likelihood ratio" test statistic

    $L = 2n\,[s_n(\tilde\lambda_n) - s_n(\hat\lambda_n)]$ .

As shown below, one rejects the hypothesis H: $h(\lambda_n^0) = h_n^*$ when $L$ exceeds
the upper $\alpha \times 100\%$ critical point of a chi-square random variable with $q$
degrees of freedom to achieve an asymptotically level-$\alpha$ test in a correctly
specified situation.  The principal disadvantages of the "likelihood ratio" test
are that it takes two minimizations to compute it and that it requires the
condition $HVH' = H\mathcal{I}^{-1}H'$ to achieve its null case asymptotic distribution.
As seen earlier, when this condition holds, there is Monte Carlo evidence
indicating that the asymptotic approximation is quite accurate if degrees of
freedom corrections are applied.
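A sketch of the two-minimization computation follows, assuming a hypothetical
sample objective function `s_n` coded in Python and an equality-constrained
optimizer; it is an illustration, not the text's algorithm.

```python
import numpy as np
from scipy.optimize import minimize

def lr_test(s_n, h, h_star, lam0, n):
    # s_n: sample objective, lambda -> scalar, to be minimized
    # h, h_star: the hypothesis is H: h(lambda) = h_star
    unc = minimize(s_n, lam0, method="BFGS")              # lam_hat
    con = minimize(s_n, lam0, method="SLSQP",
                   constraints={"type": "eq",
                                "fun": lambda lam: h(lam) - h_star})  # lam_tilde
    L = 2.0 * n * (con.fun - unc.fun)   # 2n [ s_n(lam_tilde) - s_n(lam_hat) ]
    return L, unc.x, con.x
```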
9-6-9
THEOREM 15.  Let Assumptions 1 through 3 hold and let either Assumptions
4 through 7 or 8 through 12 hold.  Let

    $L = 2n\,[s_n(\tilde\lambda_n) - s_n(\hat\lambda_n)]$ .

Then $L = Y + o_p(1)$ where

    $Y = X'\,\mathcal{I}^{-1}H'(H\mathcal{I}^{-1}H')^{-1}H\mathcal{I}^{-1}\,X$

and

    $X \sim N[\sqrt{n}\,(\partial/\partial\lambda)s_n^0(\lambda_n^*),\; \mathcal{J}]$ .

Recall:  $V = V_n^*$, $\mathcal{J} = \mathcal{J}_n^*$, $\mathcal{I} = \mathcal{I}_n^*$, $\mathcal{U} = \mathcal{U}_n^*$, and $H = H_n^*$.
If $HVH' = H\mathcal{I}^{-1}H'$ then $Y$ has the non-central chi-square distribution with
$q$ degrees of freedom and non-centrality parameter

    $\alpha = n\,[(\partial/\partial\lambda)s_n^0(\lambda_n^*)]'\,\mathcal{I}^{-1}H'(H\mathcal{I}^{-1}H')^{-1}H\mathcal{I}^{-1}\,[(\partial/\partial\lambda)s_n^0(\lambda_n^*)]/2$ .

Under the null hypothesis, $\alpha = 0$.
PROOF.  By Taylor's theorem

    $2n\,[s_n(\tilde\lambda_n) - s_n(\hat\lambda_n)] = 2n\,[(\partial/\partial\lambda)s_n(\hat\lambda_n)]'(\tilde\lambda_n - \hat\lambda_n)$
        $+\; n\,(\tilde\lambda_n - \hat\lambda_n)'\,[(\partial^2/\partial\lambda\,\partial\lambda')s_n(\bar\lambda_n)]\,(\tilde\lambda_n - \hat\lambda_n)$

where $\|\bar\lambda_n - \hat\lambda_n\| \leq \|\tilde\lambda_n - \hat\lambda_n\|$.  By the Summary, $\|\tilde\lambda_n - \lambda_n^*\|$
and $\|\hat\lambda_n - \lambda_n^0\|$ converge almost surely to zero and $\{(\partial^2/\partial\lambda\,\partial\lambda')s_n(\lambda)\}_{n=1}^{\infty}$
is equicontinuous, whence

    $(\partial^2/\partial\lambda\,\partial\lambda')s_n(\bar\lambda_n) = \mathcal{I} + o_s(1)$ .

By Lemma 2 of Chapter 3, $2n\,[(\partial/\partial\lambda)s_n(\hat\lambda_n)]'(\tilde\lambda_n - \hat\lambda_n) = o_s(1)$, whence

    $2n\,[s_n(\tilde\lambda_n) - s_n(\hat\lambda_n)] = n\,(\tilde\lambda_n - \hat\lambda_n)'\,[\mathcal{I} + o_s(1)]\,(\tilde\lambda_n - \hat\lambda_n) + o_s(1)$ .
9-6-10

Again by Taylor's theorem

    $\sqrt{n}\,(\partial/\partial\lambda)s_n(\tilde\lambda_n) = [\mathcal{I} + o_s(1)]\,\sqrt{n}\,(\tilde\lambda_n - \hat\lambda_n) + o_s(1)$

whence, using the same type of argument as in the proof of Theorem 14,

    $\sqrt{n}\,(\tilde\lambda_n - \hat\lambda_n) = \mathcal{I}^{-1}\,\sqrt{n}\,(\partial/\partial\lambda)s_n(\tilde\lambda_n) + o_p(1)$ ,

which is bounded in probability by Theorem 14.  Thus

    $n\,(\tilde\lambda_n - \hat\lambda_n)'\,\mathcal{I}\,(\tilde\lambda_n - \hat\lambda_n) = n\,[(\partial/\partial\lambda)s_n(\tilde\lambda_n)]'\,\mathcal{I}^{-1}\,[(\partial/\partial\lambda)s_n(\tilde\lambda_n)] + o_p(1)$

whence

    $2n\,[s_n(\tilde\lambda_n) - s_n(\hat\lambda_n)] = n\,[(\partial/\partial\lambda)s_n(\tilde\lambda_n)]'\,\mathcal{I}^{-1}\,[(\partial/\partial\lambda)s_n(\tilde\lambda_n)] + o_p(1)$

and the distributional result follows at once from Theorems 13 and 14.  To see
that $Y$ is distributed as the non-central chi-square when $HVH' = H\mathcal{I}^{-1}H'$,
note that $\mathcal{I}^{-1}H'(H\mathcal{I}^{-1}H')^{-1}H\mathcal{I}^{-1}\mathcal{J}$ is idempotent under this
condition.  □
The last statistic considered is the Lagrange multiplier test statistic

    $R = n\,[(\partial/\partial\lambda)s_n(\tilde\lambda_n)]'\,\tilde{\mathcal{I}}^{-1}\tilde{H}'(\tilde{H}\tilde{V}\tilde{H}')^{-1}\tilde{H}\tilde{\mathcal{I}}^{-1}\,[(\partial/\partial\lambda)s_n(\tilde\lambda_n)]$ .

As shown below, one rejects the hypothesis H: $h(\lambda_n^0) = h_n^*$ when $R$ exceeds
the upper $\alpha \times 100\%$ critical point of a chi-square random variable with $q$
degrees of freedom to achieve an asymptotically level-$\alpha$ test in a correctly
specified situation.  Using the first order condition

    $(\partial/\partial\lambda)\mathcal{L}(\tilde\lambda_n, \tilde\theta_n) = (\partial/\partial\lambda)\{s_n(\lambda) + \theta'\,[h(\lambda) - h_n^*]\}\big|_{(\tilde\lambda_n, \tilde\theta_n)} = 0$

for the problem

9-6-11

    Minimize $s_n(\lambda)$ subject to $h(\lambda) = h_n^*$ ,

one can replace $(\partial/\partial\lambda)s_n(\tilde\lambda_n)$ by $-\tilde{H}'\tilde\theta_n$ in the expression for $R$,
whence the term Lagrange multiplier test; it is also called the efficient score
test.  Its principal advantage is that it requires only one constrained
optimization for its computation.  If the constraint $h(\lambda) = h_n^*$ completely
specifies $\tilde\lambda_n$, or results in a linear model, this can be an overwhelming
advantage.  The test can have rather bizarre structural characteristics,
however.  Suppose that $h(\lambda) = h_n^*$ completely specifies $\tilde\lambda_n$.  Then the
test will accept any $h_n^*$ for which $\tilde\lambda_n$ is a local minimum, maximum, or
saddle point of $s_n(\lambda)$, regardless of how large $\|h(\hat\lambda_n) - h_n^*\|$ is.  As we
have seen, Monte Carlo evidence suggests that the asymptotic approximation
is reasonably accurate.
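A sketch of the one-constrained-fit computation follows, with hypothetical
inputs: `score` for $(\partial/\partial\lambda)s_n(\tilde\lambda_n)$ and `I_tilde`, `V_tilde`,
`H_tilde` for the constrained versions of the Notation 7 quantities.

```python
import numpy as np
from scipy import stats

def lm_test(score, I_tilde, V_tilde, H_tilde, n, alpha=0.05):
    # R = n s' I^{-1} H' (H V H')^{-1} H I^{-1} s, everything at lam_tilde
    g = np.linalg.solve(I_tilde, score)           # I^{-1} times the score
    Hg = H_tilde @ g
    HVH = H_tilde @ V_tilde @ H_tilde.T
    R = n * Hg @ np.linalg.solve(HVH, Hg)
    q = H_tilde.shape[0]
    crit = stats.chi2.ppf(1.0 - alpha, df=q)
    return R, crit, R > crit                      # reject for large R
```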
THEOREM 16.  Let Assumptions 1 through 3 hold and let either Assumptions
4 through 7 or 8 through 12 hold.  Let

    $R = n\,[(\partial/\partial\lambda)s_n(\tilde\lambda_n)]'\,\tilde{\mathcal{I}}^{-1}\tilde{H}'(\tilde{H}\tilde{V}\tilde{H}')^{-1}\tilde{H}\tilde{\mathcal{I}}^{-1}\,[(\partial/\partial\lambda)s_n(\tilde\lambda_n)]$ .

Then $R = Y + o_p(1)$ where

    $Y = X'\,\mathcal{I}^{-1}H'\,[H\mathcal{I}^{-1}(\mathcal{J} + \mathcal{U})\mathcal{I}^{-1}H']^{-1}\,H\mathcal{I}^{-1}\,X$

and

    $X \sim N[\sqrt{n}\,(\partial/\partial\lambda)s_n^0(\lambda_n^*),\; \mathcal{J}]$ .

Recall:  $V = V_n^*$, $\mathcal{J} = \mathcal{J}_n^*$, $\mathcal{I} = \mathcal{I}_n^*$, $\mathcal{U} = \mathcal{U}_n^*$, and $H = H_n^*$.
If $\mathcal{U} = 0$ then $Y$ has the non-central chi-square distribution with $q$ degrees
of freedom and non-centrality parameter

    $\alpha = n\,[(\partial/\partial\lambda)s_n^0(\lambda_n^*)]'\,\mathcal{I}^{-1}H'(HVH')^{-1}H\mathcal{I}^{-1}\,[(\partial/\partial\lambda)s_n^0(\lambda_n^*)]/2$ .

Under the null hypothesis, $\alpha = 0$.
9-6-12

PROOF.  By the Summary, $\tilde{\mathcal{I}} = \mathcal{I} + o_s(1)$, $\tilde{V} = \mathcal{I}^{-1}(\mathcal{J} + \mathcal{U})\mathcal{I}^{-1} + o_s(1)$,
and $\tilde{H} = H + o_s(1)$.  By Theorem 14, $\sqrt{n}\,(\partial/\partial\lambda)s_n(\tilde\lambda_n)$ is bounded in
probability, whence we have

    $R = n\,[(\partial/\partial\lambda)s_n(\tilde\lambda_n)]'\,\mathcal{I}^{-1}H'\,[H\mathcal{I}^{-1}(\mathcal{J} + \mathcal{U})\mathcal{I}^{-1}H']^{-1}\,H\mathcal{I}^{-1}\,[(\partial/\partial\lambda)s_n(\tilde\lambda_n)] + o_p(1)$ .

The distributional result follows by Theorem 13.  The matrix

    $\mathcal{J}^{1/2}\,\mathcal{I}^{-1}H'\,[H\mathcal{I}^{-1}(\mathcal{J} + \mathcal{U})\mathcal{I}^{-1}H']^{-1}\,H\mathcal{I}^{-1}\,\mathcal{J}^{1/2}$

is idempotent when $\mathcal{U} = 0$, so $Y$ follows the non-central chi-square
distribution if $\mathcal{U} = 0$.  □
9-7-1

7.  REFERENCES

Anderson, T. W. (1971), The Statistical Analysis of Time Series. New York:
    John Wiley and Sons, Inc.

Ash, Robert B. (1972), Real Analysis and Probability. New York: Academic
    Press, Inc.

Bierens, Herman J. (1981), "Robust Methods and Asymptotic Theory," Lecture
    Notes in Economics and Mathematical Systems 192. Berlin: Springer-Verlag.

Billingsley, Patrick (1968), Convergence of Probability Measures. New York:
    John Wiley and Sons, Inc.

Billingsley, Patrick (1979), Probability and Measure. New York: John Wiley
    and Sons, Inc.

Bloomfield, Peter (1976), Fourier Analysis of Time Series: An Introduction.
    New York: John Wiley and Sons, Inc.

Chung, Kai Lai (1974), A Course in Probability Theory, Second Edition. New
    York: Academic Press, Inc.

Doob, J. L. (1953), Stochastic Processes. New York: John Wiley and Sons, Inc.

Fuller, Wayne A. (1976), Introduction to Statistical Time Series. New York:
    John Wiley and Sons, Inc.

Gallant, A. Ronald (1977), "Three-Stage Least-Squares Estimation for a System
    of Simultaneous, Nonlinear, Implicit Equations," Journal of Econometrics
    5, 71-88.

Gallant, A. Ronald, and Dale W. Jorgenson (1979), "Statistical Inference for
    a System of Simultaneous, Nonlinear, Implicit Equations in the Context of
    Instrumental Variable Estimation," Journal of Econometrics 11, 275-302.

Hall, P., and C. C. Heyde (1980), Martingale Limit Theory and Its Application.
    New York: Academic Press, Inc.

Hansen, Lars Peter (1982), "Large Sample Properties of Generalized Method of
    Moments Estimators," Econometrica 50, 1029-1054.

Hansen, Lars Peter, and Kenneth J. Singleton (1982), "Generalized Instrumental
    Variables Estimation of Nonlinear Rational Expectations Models,"
    Econometrica 50, 1269-1286.

9-7-2

Hoadley, Bruce (1971), "Asymptotic Properties of Maximum Likelihood Estimators
    for the Independent Not Identically Distributed Case," The Annals of
    Mathematical Statistics 42, 1977-1991.

Ibragimov, I. A. (1962), "Some Limit Theorems for Stationary Processes,"
    Theory of Probability and Its Applications 7, 349-382.

Jorgenson, D. W., and J. Laffont (1974), "Efficient Estimation of Nonlinear
    Simultaneous Equations with Additive Disturbances," Annals of Economic
    and Social Measurement 3, 615-640.

Lawson, Charles L., and Richard J. Hanson (1974), Solving Least Squares
    Problems. Englewood Cliffs, New Jersey: Prentice-Hall, Inc.

McLeish, D. L. (1974), "Dependent Central Limit Theorems and Invariance
    Principles," The Annals of Probability 2, 620-628.

McLeish, D. L. (1975a), "A Maximal Inequality and Dependent Strong Laws,"
    The Annals of Probability 3, 829-839.

McLeish, D. L. (1975b), "Invariance Principles for Dependent Variables,"
    Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 32,
    165-178.

McLeish, D. L. (1977), "On the Invariance Principle for Nonstationary
    Mixingales," The Annals of Probability 5, 616-621.

Mood, Alexander M., and Franklin A. Graybill (1963), Introduction to the
    Theory of Statistics, Second Edition. New York: McGraw-Hill Book Company.

Neveu, J. (1965), Mathematical Foundations of the Calculus of Probability.
    San Francisco: Holden-Day.

Royden, H. L. (1963), Real Analysis. New York: The Macmillan Company.

Serfling, Robert J. (1980), Approximation Theorems of Mathematical Statistics.
    New York: John Wiley and Sons, Inc.

Tucker, Howard G. (1967), A Graduate Course in Probability. New York:
    Academic Press, Inc.

Withers, C. S. (1981), "Conditions for Linear Processes to be Strong-Mixing,"
    Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 57,
    477-480.

Wooldridge, Jeffrey M. (1985), Nonlinear Econometric Models with Dependent
    Observations. Ph.D. Dissertation, University of California at San Diego.
9-8-1

8.  INDEX TO CHAPTER 9

Assumption 1, 9-3-1
Assumption 2, 9-3-2
Assumption 3, 9-3-3
Assumption 4, 9-4-1
Assumption 5, 9-4-4
Assumption 6, 9-4-9
Assumption 7, 9-4-17
Assumption 8, 9-5-5
Assumption 9, 9-5-10
Assumption 10, 9-5-11
Assumption 11, 9-5-14
Assumption 12, 9-5-17
Asymptotic normality
    of least mean distance estimators, 9-4-16, 9-4-23
    of method of moments estimators, 9-5-16, 9-5-17
Asymptotic normality of scores
    least mean distance estimators, 9-4-13
    method of moments estimators, 9-5-14
    constrained estimators under Pitman drift, 9-6-6
Bartlett weights, 9-2-44
Central Limit Theorem
    stated, 9-2-30
    definition of terminology, 9-2-31
    variance estimation, 9-2-44
Conditional analysis, 9-1-2
Conditional Jensen's inequality, 9-2-51
Consistency
    of least mean distance estimators, 9-4-5
    of method of moments estimators, 9-5-13
Constraint
    equivalent representations, 9-4-17
Continuity set, 9-2-39
Data generating model
    defined, 9-3-1
    introductory discussion, 9-1-1
Distance function
    for least mean distance estimators, 9-4-1
    for method of moments estimators, 9-5-1
Efficient score test
    see Lagrange multiplier test
Estimated parameter
    discussed, 9-4-3
Identification condition
    least mean distance estimators
        defined, 9-4-3
        example, 9-4-5
    method of moments estimators
        defined, 9-5-5
        discussed, 9-5-5

9-8-2

Instrumental variable estimator
    see three-stage least-squares estimator
Lagrange multiplier test
    asymptotic distribution, 9-6-11
    defined, 9-6-10
    discussed, 9-6-10
Least mean distance estimator
    consistency, 9-4-5
    constrained, 9-4-1
    defined, 9-4-1
    introductory discussion, 9-1-2
    identification, 9-4-3
    summary of results, 9-6-1
Likelihood ratio test
    asymptotic distribution, 9-6-9
    defined, 9-6-8
    discussed, 9-6-8
Martingale, 9-2-33
McLeish's inequality, 9-2-20
Method of moments estimator
    consistency, 9-5-13
    constrained, 9-5-1
    defined, 9-5-1
    introductory discussion, 9-1-2
    identification, 9-5-5
    summary of results, 9-6-1
    under misspecification, 9-5-5, 9-6-19
Mixingale
    defined, 9-2-14
    sufficient conditions for, 9-2-18
Near epoch dependence
    defined, 9-2-5
    sufficient conditions for, 9-5-7
    example of, 9-2-10
Nonlinear autoregression, 9-2-10, 9-4-1, 9-4-4, 9-4-7, 9-4-10
Nonlinear ARMA, 9-2-51
Notation 1, 9-4-3
Notation 2, 9-4-6
Notation 3, 9-4-8
Notation 4, 9-5-5
Notation 5, 9-5-6
Notation 6, 9-5-8
Notation 7, 9-6-3
Null hypothesis
    defined, 9-6-2
Parzen weights, 9-2-44
Pitman drift
    introductory discussion, 9-1-3
    least mean distance estimators, 9-4-17
    method of moments estimators, 9-5-17
Sample objective function
    for least mean distance estimators, 9-4-1
    for method of moments estimators, 9-5-1

9-8-3

Score
    see asymptotic normality of scores
Size, 9-2-3
Strong law of large numbers, 9-2-24, 9-2-27
Strong mixing
    sufficient conditions for, 9-2-3
    defined, 9-2-2
Tightness, 9-2-38
Theorem 1, 9-2-27
Theorem 2, 9-2-30
Theorem 3, 9-2-45
Theorem 4, 9-4-5
Theorem 5, 9-4-13
Theorem 6, 9-4-16
Theorem 7, 9-4-23
Theorem 8, 9-5-13
Theorem 9, 9-5-14
Theorem 10, 9-5-16
Theorem 11, 9-5-17
Theorem 12, 9-6-3
Theorem 13, 9-6-5
Theorem 14, 9-6-6
Theorem 15, 9-6-9
Theorem 16, 9-6-11
Three-stage least-squares, 9-5-2, 9-5-6, 9-5-9
Underlying probability space
    formal description, 9-2-1
Uniform Strong Law of Large Numbers, 9-2-27
Uniformly integrable, 9-2-24
Variance of a sum, 9-2-44
Wald test statistic
    asymptotic distribution, 9-6-3
    defined, 9-6-3
    discussed, 9-6-3