No. 1229
,
STATISTICAL INFERENCE BASED ON M-ESTIMATORS FOR THE
MULTIVARIATE NONLINEAR REGRESS ION MODEL· IN IMPLICIT FORM
by Geraldo Souza and A. Ronald Gallant
North Carolina State University
SUMMARY
The general regression model postulates that an observed multivariate response is a function of an observed multivariate input,
an unknown parameter, and an unobservable additive multivariate error.
In principle one may solve for the response given the input, parameter,
and error but this is not required in applications.
Given an optimi-
zation procedure which defines an estimator, a companion theory of
large sample inference is developed.
This theory includes strong con-
sistency and asymptotic normality of the estimator and the asymptotic
null and non-null distributions of the Wald test statistic, an analog
of the likelihood ratio test statistic, and an analog of Rao's
efficient score test statistic.
A~
lQ70 subiect classifications.
~o.J"~~~~
Primary 62F05, 62H15; Secondary
62G35,62P20.
Key words and phrases.
~ ~
Multivariate nonlinear regression, M-estimators,
nonlinear simultaneous equation models, Wald test, Rao's efficient
score test, likelihood ratio test, non-null asymptotic distributions.
,
Abbreviated Title.
~
regress ion.
Tests from M-estimators in multivariate nonlinear
~.
Introduction.
An M-variate response Y follows the statistical model
t
,~
t
= 1,2,
... , n
where x t denotes a k-dimensional input variable, yO denotes an unknown
s-dimensional parameter, and e t denotes an M-variate random error.
variables are contained in the Borel sets
~,
1, f, and
e
These
respectively.
It is convenient to absorb the scale parameters of the error distribution
into y and impose
ASSUMPTION 1.
The errors e t are independently and identically dis-
tributed each with mean zero and variance-covariance matrix the identity.
An example, which occurs in the study of consumer demand (Jorgenson,
Christenson and Lau, 1975), is
or, say,
Ylt - fl(xt,e)
Y2t - f 2 (xt
=
,e) =
= (tnPl'
e lt
6
2t
y
tnp , tnp , tnI)t'
In this model, termed a translog
2
3
expenditure system in the econometric literature, Y and y are the tth
lt
2t
with x
t
consumer's expenditures on non-durable goods and services, of durable goods
expressed as a proportion of the consumer I s income It; Pl' P2' and P3
are the prices of non-durable goods, services of durable goods, and services respectively.
The scale parameters may be absorbed by writing
,
2
or, say,
The models envisaged here are supposed to describe the behavior of
a physical, biological, economic, or social system.
If so, to each value
of (e ,x,yO) there should correspond one and on1¥ one outcane y.
This
condition and continuity are imposed.
ASSUMPTION 2.
For each (x,y) e
defines a one-to-one mapping of
Y(e,x,y) is continuous on
e
x 1. x
Throughout, q(y,x,y)
model while y
,.
= Y(e,x,y)
e
=
r
x r the equation q(y,x,y)
into U denoted as Y(e,x,y).
=e
Moreover,
r .
e will be referred to as the structural
will be termed the reduced form following the con-
ventions of the econometric literature.
It should be emphasized that it is
not necessary to have a closed form expression for the reduced form, or even
to be able to compute it using numerical methods. in order to use the statistical methods set forth here.
Interest is focused on a p-vector of parameters A.
will equal y or some subvector of y.
The parameter A is contained in A
and is estimated by finding that value ~
n
...
where Tn is a random variable;
Typically, A
typical~,
in A which maximizes
.
T corresponds to some subvector
...
of Y which is regarded as a nuisance parameter and Tn is its estimator.
This formulation of the estimation problem is motivated by a consideration of the statistical methods presently in use in nonlinear regression
,
ana~sis
and some others one might wish to employ.
The translog example
may be used for illustration.
Maximum likelihood methods for nonlinear models with explicit, separable reduced forms - those which may be written as y = f(x,6) + e - have
3
been studied by Barnett (1976) and Holly (1978).
,
The translog example fits
this description and if normally distributed errors are assumed then the
log likelihood is
S(y,X,'i,A) = Ln det R - (1/2)IIR(y - f(x,e)JII
2
and the method of maximum likelihood may be formulated as above; the
dependence of s(y,X,'i,~) on 'i is trivial in this case.
The esttmation method which probably finds most frequent
us~
in
applications is "iterated Aitken" also termed "Zellner-type," "the seemingly
unrelated regression method," and "minimum distance."
studied in Gallant (1975) and Holly (1978).
applied to the translog.
This method has been
The method is 'as follows when
First least squares residuals
obtained by fitting the two models Ylt
individually by least squares.
= fl(xt,e)
Ult
and
+ u1t and Y2t
U2t are
= f 2 (xt ,e)
+ u
Let
The iterated Aitken estUnator is obtained by finding
e which minimizes
and the iterated Aitken estimator is seen to be of the general form considered
here.
•
2t
,
4
,
The asymptotic properties of the estimator ~n are considered in
Sections 3 and 4.
To indicate the nature of these results, for the moment
assume that a density pee) and Jacobian (a/Oy')q(y,x,y) are available so
that a conditional density for the endogeneous variables y is given by
p(ylx,y) =
I
det(a/ay')q(y,x,y)\Rq(y,x,y)].
Set the true value yO = Y* and suppose that.tim
;. =
n
n~
,r*
almost surely.
Then
~n converges almost surely to the point A.* which :naximizes
Moreov~r,
N(O,V).
- A.* ) converges in distribution to a multivariate normal
l
where gj
A strongly consistent estimator of V is V= ,n-ljnif
In(A.
A
n
•
This result suggest that if the true value is yO in each finite
n
then the estimator ~
n
0
maximizes s(y
,T * ,A. ) •
n
is to be regarded
estimating that "Ooint
AOn rtThich
-
9.S
Typically, A and Tare subvectors of y whence there
c'J!'resnond A0 and TO obtained as subvectors of Y:' .
-
n
n
L~
as a nuisance par~eter to be held fixed at T~
a constraint on y~.
It may happen that
),,0
n
= T*
Usually, TOn is regarded
for all n
cases, the definition of
),,0
n
n
as a maximum is to prevail.
In these
Similarly, the
definition of T* as the almost sure limit of Tn is to prevail when
A
,
This implies
obtained as a subvector of yO
does not coincide with 1.0 defined as the maximum of s- (0
Yn,r* ,A ) .
n
arise.
s~ple
a~biguities
The results of Section 3 and 4 are obtained with yO subJ'ect to drift
'n
to facilitate the derivation of non-null distributions in Section 5.
4a
Inference is considered as an adjunct to the estimation procedure in
Section 5.
To indicate the nature of these results, consider testing
H: h(~)
=a
against
*a
A: h(~)
where h(A) is an r-vector valued function with Jacobian
let h
= h(~)
and
H= H(\n)
W = n h/(S ~ H,)-l
.
sn(~)
= (%~')h(~)
Following Wald, the test statistic
n
may be employed to test H against A.
maximizing
H(~)
,
~
Let A denote an estimator obtainedqr
n
subject to the constraint
h(~)
=a
By analogy with the
likelihood ratio test, the test statistic
•
,
5
L
,
=- 2 n( s n (~n )
- s (~ )]
nn
may be employed to test H against A.
.~ =
Let
'if = H(1:n)
-(1/n)~1(a2/aA.OA./) S'Yt,Xt'~n'>;:n) ,
J = (l/n)~l[ (a/aA.) s(Yt,Xt'~n'~n) I
,...
';:,-1'" ?!,-l
and V = ~
J
.
~
(a/aA,)
s(Yt,Xt'~n'~)]
I
,
The analogy with Rao's efficient score test statistic is
These test statistics are shown to converge in distribution to the same noncentral chi-square distribution with
r
degrees of freedom.
The nqn-
centrality parameter of this non-central chi-square distribution may be
approximated by
•
The validity of this approximation requires the additional assumption that
~im
n~
,
(Jo _
go) = 0
in the case of the statistic L .
6
~.
The Probabilit
S ace and Limits of Cesaro Sums.
Repeatedly, in the
sequel, the almost sure uniform limit of a Cesaro sum such as
is required.
In the nonlinear regression literature much attention has been
devoted to finding conditions which insure this behavior yet are plausible
and can be easily recognized as obtaining or not obtaining in an application
(Jennrich, 1969; Malinvaud, 1970; Gallant, 1977; Gallant and Holly, 1978).
A precis of these ideas appears here for the readers convenience.
The .independent variables (x l ,x ' •.• ) are either fixed, the realization
2
of a random process or the components of the vectors x
these.
t
are a mixture of
If, say, the data for the translog example were generated by randomly
selecting individuals from a population in each of several years then a
plausible and convenient assumption is that xt = (~nPl' ~nP2' ~nP3' ~nI)t
follows some distribution which is absolutely continuous with respect to
Lebesque measure on the cube X.
Then, denoting this distribut ion by
~(x)
,
~im~(1/n)~=lf(Plt'P2t'P3t,It)
=
Sf(Pl,P2,P3,I) d ~(x)
for integrable f (strong law of large numbers).
assumption is that the independent variables x
uncorrelated.
•
t
The typical regression
and the errors e
t
are
If this is strengthened to independence then, by Assumption 1,
~im~(1/n)~=lf(Plt'P2t,P3t,It,et)
=
SSf(Pl,P2,P3,I,e) dP(e) d~(x)
for integrable f where P is the distribution of the errors.
These considerations motivate the notion of regarding the joint sequence
of independent variables and errors as being defined on a probability space
~
7
,
and the definition of Cesaro summability.
Theorem 1 demonstrates that the
concept is not void for examples which come immediately to mind, and
Theorem 2 yields the desired uniform, almost sure convergence.
DEFINITION.
(Gallant and Holly, 1978)
A sequence [v ) of points from
t
a Borel set U is said to generate Cesaro summable sequences with respect to
a probability measure
~
defined on the Borel subsets of u and a dominating
function b(v) with Jb d~ < ~
if
for every real valued, continuous function f with If(v)\ < b(v) •
THEOREM 1.
(Gallant and Holly, 1978)
Let V , t = 1,2, •.. be a
t
sequence of independent and identically distributed s-dimensional random
variables defined on a complete probability space (O,a,p* ) with co~non distri-
•
bution
~.
Let
~
be absolutely continuous with respect to ,some product mea-
sure on RS and let b be a non-negative function with Sbd ~< ~
exists E with P* (E) = 0 such that if w
i
Then there
E
for every continuous function with If(v)\ $ b(v) .
Note that the null set depends on b and not on f .
THEOREM 2.
(Gallant, 1977)
Let f(v,A) be a real valued continuous
function on V x A where A is compact.
Let [vt ) generate Cesaro summable
sequences with respect to v and b(v).
If sUPA!f(V,A)\ < b(v) then
J,im~suPAI(l/n)~=lf(Vt'A.)- Jf(v,A) d~(v)\ = 0
and Sf(V,A) d~ is continuous on A .
,
8
ASSUMPTION 3.
(Gallant and Holly, 1978)
Almost every realization of
tVt} with v = (et,xt ) generates Cesaro summable sequences with respect to
t
the product measure
,
and a dominating function b (e ,x) • Almost every sequence tXt} generates
Cesaro summable sequences with respect to ~ and b(x) =
each x e I there is a neighborhood N such that
x
COROLLARY.
(Gallant and Holly, 1978)
Sb(e,x)
e
J sUPN b(e,x)
e
Q:)
•
Let Assu.'lJlptions 1 through 3 hold.
!f(y,x,p)l~ b[q(y,x,yO),x] for all (y,x) e ~ x I
Then both
dP(e) <
For
x.
Let f(y,x,p) be continuous on U x I x K where K is compact.
where A* is compact.
dP(e).
Let
and all (p,yO) in ~ x A*
(l/n)~lf(Yt'Xt'p) and
(l/n)~=lS f[Y(e,xt,yO),xt,p] dP(e) converge uniformly to
e
except on the event E with P* (E) = 0 given by Assumption 3.
In typical applications, a density pee) and a Jacobian
J(y,x,yO)
are available.
=
•
(2l/2ly')q(y,x,yO)
With these in hand, the conditional density
p(ylx,yO)
=
!det J(y,x,yO)\p[q(y,x,yO)]
may be used for computing limits since
SS f[Y(e,x,yO) ,x,y] dP(e) d~(x)
X. e
= SS f(y,x,y) p(Ylx,yO) dy
I U
d~(x)
The choice of integration formulas is dictated by convenience.
The probabilisitc structure which is usually assumed in asymptotic
regression analysis is as follows.
One fix.es a sequence
,
9
,
and works with a sequence of regular conditional probability distributions
Pn('lxm,yO) defined on the measurable subsets of ~n = ~l~.
The assumption
of independent and identically distributed errors with common distribution
p('), Assumption 1, induces a product measure Pco (.) defined on the measurm
able subsets of e co
= xt=le
distribution on
and the error distribut ion on e
Pn (Alxco ,yO)
~
n
= Pcom
te
•
The xelationship between the conditional
*
~
n
•
...
A statement such as A converges almost
n
means that ..A is a random variable with argument
~
n
(e , •.. ,en ,xl" .• ,Xn ,yO)
and that Pco (E)
n
l
E =
•
=0
ne>on.coJ= lUcon=J. teco : \/\~ n - A,*1 >
A statement that In(~n
distribution
is
s e co :[Y(el,xl,yO), •.• , Y(e nn
,x ,yO)] e A}
for every measurable subset of
surely to
m
N( '\o,V)
J!
where
e '\J
A* ) converges in distribution to a multivariate normal
means that for A of the form
it is true that
tim
P (Alxmn
,yO) = S dN(z\o,V)
'A
~n
In the sequel, the usual conventions will be followed; the probabilistic
structure is as described in the preceeding paragraph.
conditional on a fixed sequence
holds with respect to
~
X
00
and b(x) of
The analysis is
for which the Cesaro summability property
AssQ~ption
3. The link with the usual
probabilistic structure and Assumption 3 is provided by Theorem 3 below.
,
It shows that the event E c e m on which Cesaro
s~ability
fails for the
occurs with Pm probability zero for
10
almost every choice of x
=
THEOREM 3.
Let Assumption 1 hold and let [V } with V
t
t
= (Et,Xt )
be
a sequence of random variables defined on (n,u,p*) which satisfies Assumption
3.
,
=
Let P= be the measure on the measurable subsets of e= = xt=le induced by
the sequence of random. variables [E } and, similarly,
t
eXt} .
Then there is a subset N of X,<:0 with
IJo
<:0
= on I = induced by
IJo
(N) = 0 such that each
X
O, N
<:0
both satisfies the Cesaro summability property (with respect to b(x) and 1Jo)
and
has P<:0 (EO) = 0 •
By Assumption 3, 1Jo= (N ) = 0 where
l
PROOF.
Nl = n c>on;=ou:=je X=:3:\f(X)
1< b(x)
3
Moreover, Assumption 3 implies that P
.;Q
F =
I (l/n)t""~=lf(Xt)
x
nc>on~J= ou'»n=J.[(e=, x<:c ):3:!f(e,x)\ <
x.o ,
Note that for ever-J
P (EO) =
=
=
EO x [XO}
=
r 4o (e
Je.l:!.i
-= F
IJo
<:c
- Jf dlJol > e} .
(F) = 0 where
b(e,x) 3 \(l/n)}'~lf(et,Xt) -Jrrf dP d~}
-e=
oJ'~
whence
) dP (e )
<:c
<:c
=
<:c
~OxeX~}(eCD'x~) dPCD(eCD )
= Se
CD
But
P :{
IJo ( F)
::0
CD
= Jr' S
I .... ( e ,XO) d P (e ) d \-it (XO)
x.e~COCD::ooo-:eCD
CD
CD
whence PCD(EO) = 0 except for x~ in some event N r..rith
2
N
= N1U
N2 .
0
CD (N2 ) = O.
IJo
Let
,
11
~.
~.
It is necessary to introduce a dependence of the true
parameter on sample size in order to derive the non-null asymptotic distribution of test statistics.
Also, the large sample behavior of ~
n
must be
specified.
NOTATION.
= SS s[Y(e,x,y),X,T,A]
I e
S(y,T,A)
The parameter yO is indexed by nand 1- imn-lOJy~ = y * for
ASSUMPl'ION 4.
some point y *
and
In(~n -
•
r . The sequence
Tn
converges almost surely to a point T*
T*) is bounded in probability.Y
°
A* ,~° 'A2'
over A.
$
dP(e) d~(x)
corresponding to. y
There are unique points.
° °
= Y* 'Yl'Y2'
which maximize S(y,T*,A)
=0
The function hC\) of the hypothesis H: h( AO)
n
vector valued function on A; the point A* satisfies h(A* )
is a continuous
== 0
and
To illustrate, consider the translog example with the iterated Aitken
estimation method.
In this case,
s(yO 'T*'A) = -M/2 n
where
r. =
l
! S [f(x,e)
I
- f(x,e°)J'l;-l[f(X,e) n
l
(R- ) (R- ) I provided T* is such that S*
=s
I
* =
f(x,e~)J
r. .
ctJ.(x)
It is seen
'T'=T
at sight that
A~
=
(61' 82 , ... , 88)~
maximizes -s (yOn ,T* ,A ) .
~(A)
,
> 0 when 8
A
* eO
= [x:
Uniqueness obtains if
where
f(x,e)
* f(x,eO)l > 0
r.
is positive definite and
12
= (1.n};)l'
Recall that x
1.n};)2' 1.n};)3' 1.nI) eland ~
with res};)ect to Lebesque measure on 1 whence ~(A)
f(x,e)
= f(x,eO)
is absolutely continuous
=0
if and only if
a.e. with res};)ect to Lebesque measure on 1 .
= f(x,eO)
mani};)ulation shows that f(x,S)
a.e. im};)lies 8
= 8°
Some algebraic
};)rovided that,
for example, 8],. , eO 4 are non-zero and at least one of the values
or
is non-zero. 21 Thus,
8S
A~
e6 '
is uniquely determined for the translog
example with minimum distance estimation.
The almost sure convergence imposed in Assumption 4 lm};)lies that there
is a sequence which takes its values in a neighborhood of T* and is tail
equivalent to ~n.
~
n
Thus, without loss of generality, it may be assumed that
takes its values in a compact sphere T for which T* is an interior point.
Similarly,
r may be taken as a compact s};)here with interior
p~int
y*
Sufficient conditions such that A may effectively be taken as a compact sphere
are set forth in Theorem 4; they are patterned after Huber .(1964).
TrlEOREM
4.
Let Assumptions 1 through 4 hold.
Suppose that rand Tare
compact, that A is an unbounded closed set, and that there is a continuous
positive function W(A) such that
(
.)
\1.
•
sup.!\s[Y(e,x,y),X,T,~J/W(A) is continuous and sUPAls[Y(e,x,y),X,T,>..]/w(Y)1
is dominated by b (e ,x)
(ii)
~
* ),X,T* ,i"J.h.,(i,,)} S; -1
Jr \:suPi\[s[Y(e,x,y
1
(iii)
1.im inf\\AI\~
W
()
A
- * ,T* ,A* )
> -s(y
Then there is a com};)act set A' containing y * such that there corresponds to
almost every realization of tXt} an N for which n > N implies
N depends on the realization but A' does not.
,
14
The function S(y,x,T,A) is continuous on U x I x T X A' and
S(y,X,T,A.) S b[q(y,x,y),x] on 1J x I x T X A' x
given by Assumption
THEOREM 5.
Then )"n and
1:n
(The function b(e,x) is
3.)
Let Assumptions 1 through 5 hold.
(Strong consistency)
converge almost surely to ).,*
Let let} be such that Lim~Tn
PROOF.
r.
= T*
, ((et,xt )} has the Cesaro
summability property, and ~ e A' for large n ; almost every error sequence
n
....
*
*
*. .
is such. Since A satisfies h().. ) = 0 and).. e A'it follows that reA'
n
for large n.
Since A' is co~pact, the sequences ~ and
1n
corresponding to
let} have subsequences ~n and ~n converging to limit points ~ and .~
m
m
respectively. By the Corollary of Theorem 2,
• •
converges uniformly to S(y,T,A) on
S
(*
y ,T* ,,,')
r x T x A'
Then
(0
yn ' ,.Tn ' ~"n )
m m m m
= ~n;m ~ S n
oJ.
~ ~im
~
"
*)
s n ( y0
n ,Tn ,~.
m m m
- * ,T* ,A* )
= s(y
because in each finite sample (Y~ ,Tn '~n ) maximizes sn (Y~ ,Tn ,A) while
m m m*
m m n
(yO ,T ~A*) ~eed not. Si~ilarly, because A satisfies h(A*) = 0 ,
n
n
:n
m
- *,1"* ,/-.)
..
s(y
~
- *,1"*,A* ) •
s(y
. . = A* •
implies ~ = A
4.
~
here.
The assumption of a unique maximum, Assumption 4,
~
*
~
Then (A ) and [A ) have only the one limit point A .
n
Asy~pt~tic Normalitv.
#~v~~~vv~~~~~
n
0
The asymptotic normality of in is established
The verification that the "scores" (O/OA)S(Yt,Xt'~n,A~) are
asymptotically normally distributed is the critical result; the notion of
Cesaro slunmable sequences plays a key role in the proof.
From this result
13
PROOF.
By (iii) there is a compact set A' and an c with 0 < e < 1
such that
()
-s + e
infA , A' w A > 1 - e
where s-
- * ,T* ,A* ) < 0
= s(y
By (i), (ii) and the Corollary of Theorem 2,
there exists Nl such that for n > N •
..
Then A , A' and n > Nl imply
By (i) and the Corollary of Theorem 2,
so there is an N2 such that n > N
2
i~plies
Then if n > N = max(Nl ,N2 } it follows that ~* e A' and
Theorem 4 suggests that an assumption that it eventually suffices to
maximize s n (~) over a compact set is not as restrictive as might appear at
every realization of (e t } there corresponds an N for which n > N implies
sup ,s (A)
A
n
= sun- As n (A)
15
asymptotic normality of
~
n
follows by Taylor's series expansions and
related arguments of the sort which are typical of nonlinear regression
theory.
* r ,
There are open spheres r * , T* , and A* with y*ere
A8SUMPrION 6.
* T , and A* cAe
* A' .
T*eTc
The elements of (O/OA)S(y,s,T,~) ,
(02/0AOA ')S(y,X,T,A) , (02/0TOA ')S(y,X,T,A) , and [(O/OA)S(y,X,T,A)] x
C(O/OX)S(y,X,T,A)]' are continuous and dominated by b[q(y,x,y),x] on
~
-* x A
-*
x 1 x r-* x T
whe~e
the overbar indicates the closure of a set.
Moreover,
* ,>.,0]
Se (%>.)s[Y(e,x,yO),X,T
n
n
dP(e) = 0
SS (02/0TO A.')s[Y(e,x,y*),X,T*,A*]
e
1
1P(e) d~(x)
=0
The two integral conditions imposed in Assumption 6 do not appear to
impede application of the results in instances which come readily to mind.
•
Apparently they are intrinsic properties of reasonable estimation procedures.
The translog example with interated Aitken estimation serves to illustrate.
The score is
(O/O>')S(y,x,T,~) = [(%e')f(x,e)]' 8- 1 [y - f(x,e)]
-Nhence
Since
Se
e
= 0 , one notes at sight that the two integral conditions
dP(e)
are satisfied.
NOTATION.
c9
,
= JI"
1
~
.
*
**
*
**,
S t(O/OA)s[Y(e,x,y
),X,T ,A JH(O/OA)s[Y(e,x,y ),X,T ,x J} dP(e)
e
.
2
*
* *
= - SS
(0 /OAOA')s[Y(e,x,y ),X,T ,A ]
1
e
dP(e) d~(x)
16
In(~) = (l!n)~=l[(o!oA)S(Yt,Xt'Tn,A)][(o!oA)S(Yt,Xt'Tn,A)]'
~n(~)
-(1!n)~=1(o2!OAO~')S(Yt,Xt'~n,A)
=
For the translog example with iterated Aitken estimation
J =
S[(o!oe')f(x,e)]'
(S*)-l ~ (S*)-l [(o!oe')f(x,S)] d~(x)
1.
~ =
S [(o!o~')f(x,e)]'
*
(S)
-1
[(o!oe')f(x,e)] d~(x)
1.
If
~
= S* then
THEOREM 6.
J
=~ .
(Asymptotic Normality of the Scores)
Under Assumptions 1
through 6
J may be singular.
PROOF.
Given 1. with \11.\1 = 1 consider the triangular array of random
•
variables
Each Ztn has mean, SeZtn(e) dP(e) , zero by assumption and variance
2
0t n =
t'Se [(O!OA)s[Y(e'Xt,yO),Xt,T*,AoJH(o!OA)s[Y(e'Xt,yO),Xt,T*,)..O]}'dP(e)t
n
n
n
n .
* *
By the Corollary of Theorem 2 and the assumption that timn~ (yOn ,)..0)
n = (y ,A )
it follows
that Lim
.
~
(l!n)Vn
= t'J
1. where
Now (l!n)Vn is the variance of (l!Ji)r::=lZtn and if L'Jt = 0 then (l!./ri)r::=lZtn
converges in distribution to N(O,L'JL) by Chebyshev's inequality.
Suppose, then, that L'Jt > O.
tim~n
B
= 0 where
If it is shown that for every e > 0
,
.
17
B = (l/n)Etn_,S I
[Zt (e)]z2t (e) dP(e)
n
-~ e [lzl>eKJ
n
n
n
then tim~ (n/V)B
= O.
n n
This is the Lindberg-Feller condition (Chung, 1974);
it implies that (l/Jn)~=lZtn converges in distribution to N(O,t'Jt) .
Let n> 0 and c > 0 be given.
Choose a> 0 such that B(y*'A*) < n/2
where
.
,
*
* * 2
x [t (O/OA)s[Y(e,x,y ),X,T ,).. J) dP(e) d~(x)
-( y,A
* *) exists when a
This is possible because B
function
~(z)
I
•
=0
.
Choose a continuous
and an Nl such that, for all n > Nl '
(z)$~(z)SI
[lzl>e~J
(z)
[lzl>e a ]
and set
= (l/n)~_lS ~[t'(O/OA)s[Y(e,x,y),Xt'T*'A]}
- e
x [t'(O/OA)s[Y(e,x,y),Xt ,T*,A]}2 dP(e)
-* to, say,
By the Corollary of Theorem 2, '"
Bn (y,A ) converges uniformly on r-* x A
0
0
'B(y,)..) . By assumption timn~(yO,'A
B (yO,'A
.
) = (y*,'A*) whence tim
) ='B'(y*, ",*)
n ''n
~ n
n n
°) < "'(
Then there is an N2 auch that, for all n > N2 ' '"
Bn (0
Yn'~
B Y* ,A*) + h / 2 .
But, for all n > N = max[Nl ,N2 }, Bn S 13n (yO,
1..0) whence
n n
'" (yO,'A.0 )<B(y
,. . * ,A* )+h2SB(y
/
- * ,A*) +h2<11
/
Bn <B
n n n
By Taylor's theorem
,
(l/Jn) ~lZtn = (l/~)t'E~=l(o/OA)S(Yt,Xt'~n''''~)
+ [(l/n)(%T')t'~=l(%)..)s(Yt,Xt'~n,A~)~ (~n- T*)
18
where
\ITn - 'i*\I s \IT n - 'i*\I. 21
By the Corollary of Theorem 2, the aJJnost sure
r= - T* ) conconvergence of ''i" , and Assumption 5, the vector multiplying ~n('i
A
n
n
verges aJJnost surely to zero.
In(Tn - 'i* )
This and the assumed probability bound on
tmply that the last term converges in probability to zero whence
(l/JO)t'~=l(O/O~)S(Yt,Xt'~n,A~) ~ N(O,t'Jt). This holds for every t with
\ltH = 1 whence the desired result obtains.
THEOREM 7.
0
Let Assumptions 1 through 6 hold.
(l/Jh)~=l(O/O~)S(Yt,Xt'~n,A*)~
Then p is nons ingular,
N(p6,J) ,
JO(""n- A*) ~ N(6,p- 1Jp-l) ,
In(\n) converges aJJnost surely to J , and Pn(\n) converges aJJnost surely to P •
PROOF.
By the mean value theorem and the dominated
c~nvergence
theorem,
interchange of differentiation and integration is permitted whence
P
= - (0-?/OAoA ') -(
s y * ,T* ,A*)
* ) has a
By Assumption 4, s-( y * ,'i,A
uni~ue
•
maximum at A = A* whence
(02/0AO~') S(y*,T*,A) must be negative definite at A = A*
By Taylor's theorem,
n
A
*
(l/Jn) Lt=l(o/OA) s(Yt,Xt,'in,A )
=
(l/Jn)~l(%>,,)s(Yt,Xt'Tn,A~) - PJOO.*- ~~)
where ~ has rows
- (l/n)~n
l(o/OA')(O/O>...)s[Y(et,xt,yO),xt,T
,~.
~1
n
n m
and II~.In - A* \I
J
s II ~*- AOn\l . From the Corollary of Theorem 2 and the assumption
,
19
... ,A0) converges almost surely to (y * ,T* ,A* ) it follows that ~that (yO ,T
n
n
n
Since J.imn-+=..;n(;..~- A* ) = 6 by Assumption,
converges almost surely to ~.
the left hand side converges in distribution to
N(~
6,J) .
By Theorem 5 there is a sequence which is tail equivalent to
takes its values in A*
...
'~
and
The remarks apply to the tail equivalent sequence
but a new notation is not introduced.
Taylor's theorem applies to this
sequence whence
where ~ is similar to
to
9 of the
for similar reasons.
~
previous paragraph and converges almost surely
Now the first term on the right is the gradient
of the objective function evaluated at a random variable which is tail
•
equivalent to the optimizing random variable; it must, therefore, converge
r.: ~ - i-,* ) ~
SThus, ,yn(A
n
almost surely to zero.
N(6,~
J~
) by Slutsky's theorem.
By the -Corollary of Theorem 2 and the almost sure convergence of
~
0'"
* * *)
(Yn,Tn'An ) to (y ,T ,A
surely.
~.
~%v~.
=0
are considered here.
necessary.
cal
~
...
it follows that J.imn~[Jn(~n)'~n(An)]= (J,~) almost
0
H: h(AO)
Tests of the hypothesis
against
A: h(;"O) :\: 0
A full rank assumption is imposed which is not strictly
However, the less than full rank case appears to be of no practi-
L~portance
and a full rank assumption eliminates much clutter from the
theorems and proofs.
,
-1-1
NOTATION
."-J
A maximizes sn(i-,) subject to h(A.)
n
~
= J n (~n ), 'J = J n (~n )
=
0
20
~
V=~
-1
-1
...
... -1.... ~-l
'" "'-1 "t( .-v-l
JJ
,V=~ Jil
,V=~
cJ~
H(t..) = (%X')h()..)
(the Jacobian of h of order r x p)
h= h()..* )
h = h(~n) , '"h = h~n)
hO = h(}.,~)
H= H(>..* )
if
If = h(A~)
ASSUMPTION 7.
H: h(AO)
,
= ~ n (~)
n ' 9 = ~ n (!n )
=0
= H(\ ) , '"
H = H(1' )
n
n
The r-vector valued function h()..) defining the hypothesis
is twice continuously differentiable with Jacobian
H()..) = (0 / oA')h(t..) ; H()..) has full rank at )..
has full rank.
= ;.,*
The matrix V = ~
-1
-1
J ~
The statement "the null hypothesis is true" means that
h(~) = 0 for all n .
THEO~~
8. Under Assumptions 1 through 7 the statistics
converge in distribution to the non-central Chi square distribution with
degrees of freedom and noncentrality parameter
r
•
11
Under the null hypothesis, the limiting distribution is the central chi
square with r degrees of freedom.
PROOF.
(The statistic W)
By Theorem 5 there is a sequence which is
tail equivalent to ~n and takes its values in A*
The remarks refer to the
tail equivalent sequence but a new notation is not introduced.
theoram applies to this sequence whence
Taylor's
,
21
= 1,2, ... ,
i
where \I~."J.n·
- A.*\1 :S
sureJ.¥ whence 1,im
\I ~n- ·A*\1
I:H<X>
.
1, im
II}:..J.n - ). .*\\ = 0
(%"')h. (",*)
J.
aJmost sureJ.¥.
By Theorem 5,
(%"')h. (~.)
J. J.n
=
I:H<X>
r
almos t
Now, in
addition, h(e*) = 0 so the Taylor's expansion may be writtenY
In
h(~ n )
= [H
+
0
*
s (l)}Jn(t''n - '" ) •
Then by Theorem 7,
In
h(~n) has the
same asymptotic distribution as H JD.( AA. - ~* ) • Now (~ ~ li')-2 exists for n
1
n
1
sufficiently large and converges almost sureJ.¥ to (H V H')-2 whence
.!.
1
A
(H V H')-2Jn h()...n ) and (H V H')-2H In(~
A
JIII.".
/It.
n
distribution.
- ",*)
have the same asymptotic
But
whence W converges in distribution to the non-central chi-s,quare.
•
When the null hypothesis is true, it follows from Taylor's theorem that
Taking the limit as n tends to infinity this equation becomes
o
=
*
(%>"')h i ()...)e
(The statistic R)
equivalent to
-~n
whence He
=
0 and>.,
=
0 .
By Theorem 5 there is a sequence which is tail
and takes its values in A* .
The remarks below refer to
the tail equivalent sequence but a new notation is not introduced.
Taylor's theorem
,
By
22
where
\IL'J.n - A*\\ , litJ.n - A*\1 s \11:n- Aifo\l
for i = i, 2, ••• , r
• By tail
~uivalence, there is for. every realization of Let} an N such that h~)
~
Thus h(A
)
n
for all n > N.
= 0s (1)
and recall that h(e * )
continuity of H(A), the almost sure convergence of
=
O.
=
0
Then the
An to A* given by Theorem
~
5, and the Corollary of Theorem 2 permit these Taylor's expansions to be
rewritten as
[H+
0
*
(l)J(>--'il - A)
f"J
s
=
0
(1)
s
These equations may be reduced algebraicly to
[H + Os (1) J(p + Os (1) rl(O/OA)Sn ~n)
= [H
+ Os (1) J[p + Os (1)
rl
(O/OA)Sn(A* ) + 0s(l) •
•
Then it follows from Theorem 7 that
In(0/0 A) s n (~)
n
[H + 0 (1) J[ P + 0 (1) ] -1
s
s
~ N(H
0, H V H')
The continuity of H(A), Theorem 5, and the Corollary of Theorem 2 permit the
conclusion that
whence R converges in distribution to the non-central chi square.
This
completes the argument but note for the next proof that
THEOREM 9,
L
=
Under Assumptions 1 through 6 the statistic
-2[ s
(1' ) - s n (\n )]
n'n
,
23
converges in distribution to the law of the
y
= Z'~
~uadratic
form
Z
where Z is distributed as the multivariate normal
If J
=~
then Y has the non-central
chi-s~uare
distribution with r degrees
of freedom and non-centrality parameter
chi-s~uare
Under the null qypothesis Y is distributed as the central
degrees of freedom provided that J
PROOF.
By Theorem 5 there are
=~ .
s~uences
~n and -An and take their values in A*
•
with r
which are tail
~uivalent
to
The remarks below refer to the tail
~uivalent s~uences but a new notatiQn is not intrQduced.
By Taylor's theorem
-2n[ s (1 ) - s (~ )]
n n
n n
= -2n[(%).,)s n (~n )~J'~
- An ) - n(~n - ~ n )'r(02/
0AOA ')S n (~n no:
\~n
•
- n - ~'n )
where
II>:.n - }"n 1\ :~ il'rn - ~n \\
convergence of
"' ")
(""'An'
An
The Corollary of Theorem 2 and the almost sure
/ , )Sn(A
\ - )
to (A* ,A *) imply that (0 2
oAeA
n
-2n[s n(~)
n - s n (~ n )J
~ n )'[:;
= n(t.J.'l -
+ 0
= -CJ
s (l)J(>::
'
'n - ~n ) +
+ 05(1)J
°s (1)
By tail e~uivalence, there is for every realizatiQn of ~e+} an N such that
,
-
v
A solves
n
Max s (A)
n
for all n > N.
subject to h(A.)
Then for n
..rn(2) /0, A) s n (\n )
~
=0
N there are Lagrange multipliers
- H' (~ )
n
,j;. 9n
=0
en
such that
24
Thus,
[H +
0
s
(l)J'Jn
-
en = In
by a previous argument In(O/OA)S
(o/OA.)s (A. ) +
n n
n
(1n ) = 0 s (1)
0
(1)
s
whence
by Taylor's theorem and previous arguments
= [nif +
0
'il-(~
s (1)'..l\IU
- -n - ~n ) +
0
s (1)
From this string of equalities one has
H ,9-\H +
0
s
(1) J 'In
en
In
=
rw-l(%A)s
n
(Xn )
+
0
s
(1)
whence by the last line of the previous proof
•
Thus
Again from the string of equalities one has
whence
Then In(~n- ~n) converges in distribution to the distribution of the random
variable Z and In(~n-
t n ) = Opel)
.
From the first paragraph of the proof,
-2n[s n (>:-n ) - s n ().n )J
= n(~n -
'~ ) '[,9 +
n
= n()..n - ~n ) ',9(1-n -
0
s
(1) J()" n
rn ) +
1:n ) +
Opel)
0
0
s
(1)
s (1) 0p(l) +
0
s (1)
,
25
If J =
~
then V =
~
-1
and
The conclusion that Y is chi-square follows at once from Theorem 2 of
Searle (1971, p. 57).
0
In a typical application,
~
and Tare
or some easily computed function of y •
subvectors of y
Thus, if yO is specified then
n
and TO become specified and this ~~ will satisfy Assumption 4.
n
~o
n
Thus, in a
typical application, (yO, TO, >..0) is specified and the noncentra1ity
n
parameter
ct
n
n
= 0 'H' (HVH') -~O/2
is to be computed.
The annoyance of having
to specify (y * , T* , A* ) in order to make this computation may be eliminated
by application of Theorem 10.
NOTATION.
•
J
O
= (l/n)~J [(O/OA)s[Y(e,Xt,y~),Xt,T~,>..~]}
e
x [(%~)s(y( e ,xt ,y~) ,xt ' T~, A~]}' dP( e)
~O
=
-(l/n)~=lS (o2/0~o>"')s[Y(e'Xt,y~),Xt'T~,A~]
e
dP(e)
VO = (~O f1 JO (~O )-1
THEOREM 10. Let Assumptions 1 through 7 hold and let [yO, TO ,AO)} be any
n
sequence with i.im
PROOF.
,
~
n
'n
(yO ,TO ,i-0 ) = (y * ,T* ,~* ) and tim~Jll(A~- ~* ) = 0
n n n
Then
By the continuity of H( A), the Corollary of Theorem 2, and the
assumption that tim
IH'X'
(yO ,TO ,AO) = (y * ,T* ,>..* ) it follows that
n
n
n
By Taylor's theorem
where
\I~in- A*\1
S;
\I~~- ~*\1
for i
= 1,
2, ... , r .
Thus,
FOOTNOTES
!I
See Caves and Christensen (1978) for a detailed discussion of the
domain of applicability of this demand system.
Since the purpose here
is illustrative, a simple expedient is adopted:
4
8
pact cubes in R and R such that on e x 1
let
e
and 1 be com-
gj For a real valued function g(u,v) of the k-vector u and p-vector v,
(%~)g(U,V)J' ,
(%u)g(u,v) denotes the k-vector [(%ul)g(u,v), .. "
•
and (o2/ouov ')g(u,v) denotes the k x p matrix with typical element
(02/ou ,ov.)g(u,v)
J
~
For a k-vector valued function g(v) of the p-
vector v , (%v')g(v) denotes the k x p matrix with typical element
(%v.)g.(v) ,
J
J
For a k x p vector matrix function g(z), Sg(u) d~(u)
denotes the k x p matrix with typical element
Sg. .(u)
~J
d~(u) .
]I Typically ~n depends on (Yl' ""yn ,xl' •. "xn ) and the dependence on
(el,·",en'Y~) enters via the relation Yt
y
Given 0 :> 0 there exists M and N such that
for all n :> N
= Y(et,xt'Y~)
,
pCJn lITn - ,.*\\ <
M) > 1 - 0
,
27
•
21
The weakest condition is that a particular matrix has full rank
but a display of this matrix would consume too much space.
§) The function;: n may be taken as measurable; however, measurability of
Tn is not necessary for the validity of the proof.
11
Searle's (1971) definition of the non-central ch;-square is used.
~
The notation is standard:
the random variables a..
~Jn
a
0s(n ) denotes a matrix
each with tim
~
a ..
Ina
~Jn
whose elements are
= 0 almost surely,
os (na ) denotes a matrix whose elements are the randoms variable a ~Jn
..
each with a ..
~Jn
Ina
is bounded almost surely.
Similarly op(na ) and
a
0p(n ) for ccnvergence in probability.
REFERENCES
•
[ 1 J Barnett, W. A. (1976).
Maximum likelihood and iterated Aitken
esti:nation in nonlinear systems of equations.
Assoc.
[2J
J. Amer. Statist.
71 354-360.
,..,...
Caves, D. W. and Christensen, L. R. (1978).
of flexible functional forms.
Global properties of
Technical report.
Social
Systems Research Institute, University of Wisconsin.
[3J
Christensen, L. R., Jorgenson, D. W., and Lau, L. J. (1975).
Transcendental logarithmetric utility functions.
Rev.
[4J
65
,..,v
Amer. Econ.
367-383.
Chung, K. L. (1974).
A Course in Probability, 2nd ed.
Academic
Press.
,
[5J
Gallant, A. R. (1975).
J. Econometrics
Seemingly unrelated nonlinear regressions.
2
35-50.
28
[6J
Three-stage least s~uares estimation for
Gallant, A. R. (1977).
a system of simultaneous, nonlinear, implicit
J. Econometrics
[7]
5
I'V
71-88.
Gallant, A. R. and Holly, A. (1978).
Statistical inference for an
implicit, nonlinear, simultaneous
e~uation
of maximum likelihood estimation.
Holly, A. (1978).
multiple
e~uation
nonlinear models.
A 176 0178, Ecole
~
HUber, P. J. (1964).
Paris.
Cahiers du Laboratoire
Polytechni~ue,
Paris.
The behavior of maximum likelihood estimators
under nonstandard conditions.
Statist. Prob. 1
~-------
[lOJ Jennrich, R. I. (1969).
s~uares.
Cahiers du Laboratoire
Tests of nonlinear statistical hypotheses in
d'Econometrie
[9]
model in the context
~,Ecole Polytechni~ue,
d 'Econometrie
[8J
e~uations.
Proc. Fifth Berkeley Symp. Math.
73-101.
Asymptotic properties of non-linear least
Ann. Math. Statist. ,....,....
40 633-643.
[llJ Malinvaud, E. (1970).
The consistency of nonlinear regressions.
•
Ann. Math. Statist. 41 965-969.
"""'
[12J Searle, S. R. (1971).
Linear Models.
Wiley.
,
© Copyright 2026 Paperzz