TESTING A SUBSET OF THE PARAMETERS OF A
NONLINEAR REGRESSION MODEL

by

A. R. Gallant*

Institute of Statistics
Mimeograph Series No. 943
Raleigh - August 1974

* Assistant Professor of Economics and Statistics, Institute of Statistics, North Carolina State University, Raleigh, North Carolina 27607.
ABSTRACT

The problem of testing for the location of a subset of the parameters entering the response function of a nonlinear regression model is considered. Two test statistics for this problem are discussed and compared: the Likelihood Ratio Test statistic and a test statistic derived from the asymptotic normality of the least squares estimator. The large sample distributions of these statistics are derived, and Monte-Carlo power estimates are compared with power computed using the large sample distributions.
1. INTRODUCTION

This study is a continuation of the work reported in [4], which contains a brief review of the literature. Here, testing for the location of a subset of the parameters entering the response function of a nonlinear regression model is considered, rather than a joint test of all the parameters. This problem is more useful in applications but more tedious to deal with analytically.
Two test statistics are considered for this problem: the Likelihood Ratio Test statistic and a test statistic derived from the asymptotic normality of the least squares estimator. The tests based on these statistics are equivalent in linear regression but differ in nonlinear regression. Two approximating random variables with known small sample distributions are derived; they are asymptotically equivalent to these test statistics. (The mathematical detail is deferred to the Appendix.) A Monte-Carlo study is employed to verify that estimates of the power of these tests computed using the distribution functions of the approximating random variables are sufficiently accurate for use in applications. Lastly, considerations governing a choice between the two tests are discussed.
A more formal description of the problem studied in this article is the following. We consider testing the hypothesis of location

    H: τ = τ⁰  against  A: τ ≠ τ⁰

at the α level of significance when the data are responses y_t generated according to the nonlinear regression model

    y_t = f(x_t, θ*) + e_t    (t = 1, 2, ..., n)

at the inputs x_t, where the parameter θ has been partitioned according to θ' = (ρ', τ'). The form of the response function f(x_t, θ) is known. The unknown parameter θ is known to be contained in the parameter space Θ, which is a subset of the p-dimensional reals; the first r coordinates ρ of the parameter θ' = (ρ', τ') are regarded as nuisance parameters. The inputs x_t are contained in X, which is a subset of the k-dimensional reals. The errors e_t are assumed independent and normally distributed with mean zero and unknown variance σ².
An example of a problem fitting this description is that of testing whether selected treatment contrasts are zero in a completely randomized experimental design where some covariate w, such as age, is known to affect the experimental material by shifting its expectation exponentially. Such a model would be written

    z_ij = μ + t_i + α e^(β w_ij) + e_ij    (i = 1, 2, ..., I; j = 1, 2, ..., J)

following the usual conventions. Assume for simplicity of exposition that I = 2 and that the hypothesis of interest is H: t_1 = t_2 against A: t_1 ≠ t_2. This hypothesis suggests a reparameterization to the model

    y_t = θ_1 x_1t + θ_2 x_2t + θ_4 e^(θ_3 x_3t) + e_t    (t = 1, 2, ..., n)
where

    y_t = z_1t,  x_t' = (1, 1, w_1t),  e_t = ε_1t    (t = 1, 2, ..., J),
    y_t = z_2t,  x_t' = (0, 1, w_2t),  e_t = ε_2t    (t = J+1, ..., 2J),

and

    θ* = (t_1 − t_2, μ + t_2, β, α)'.

With this parameterization the hypothesis of interest becomes H: θ_1 = 0 against A: θ_1 ≠ 0. The parameter tested is θ_1, and the remaining parameters (θ_2, θ_3, θ_4) are the nuisance parameters. This example is used for the Monte-Carlo simulations.
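The data-generating step of such a simulation can be sketched as follows. The covariate values and the helper name `generate_sample` are illustrative assumptions, not taken from the paper; the parameter values match the Monte-Carlo settings quoted later in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_sample(theta, w, sigma):
    """Generate y_t = theta1*x1t + theta2*x2t + theta4*exp(theta3*x3t) + e_t
    for the two-treatment design: group 1 has x_t = (1, 1, w), group 2 has
    x_t = (0, 1, w)."""
    t1, t2, t3, t4 = theta
    x1 = np.concatenate([np.ones_like(w), np.zeros_like(w)])  # treatment contrast
    x2 = np.ones(2 * len(w))                                  # common mean term
    x3 = np.concatenate([w, w])                               # covariate
    mean = t1 * x1 + t2 * x2 + t4 * np.exp(t3 * x3)
    y = mean + sigma * rng.normal(size=mean.size)
    return y, np.column_stack([x1, x2, x3])

w = np.linspace(0.0, 1.0, 15)   # assumed covariate values, J = 15 per group
y, X = generate_sample((0.0, 1.0, -1.0, -0.5), w, np.sqrt(0.001))
```

Each simulated trial of the studies reported in Tables 1 and 2 amounts to one such draw of y followed by the constrained and unconstrained fits.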
2. NOTATION, ASSUMPTIONS, AND A PRELIMINARY THEOREM

The following notation will be useful in the remainder of the paper.

Notation: Given the regression model

    y_t = f(x_t, θ*) + e_t    (t = 1, 2, ..., n)

where θ* denotes the true but unknown value of θ, the observations y_t (t = 1, 2, ..., n), and the hypothesis of location

    H: τ = τ⁰  against  A: τ ≠ τ⁰

where θ has been partitioned according to θ' = (ρ', τ'), we define:
    y = (y_1, y_2, ..., y_n)'    (n × 1),
    f(θ) = (f(x_1, θ), f(x_2, θ), ..., f(x_n, θ))'    (n × 1),
    e = (e_1, e_2, ..., e_n)'    (n × 1),
    F(θ) = the n × p matrix with typical element ∂f(x_t, θ)/∂θ_j,
    F_1(θ) = the n × r matrix formed by deleting the last p − r columns of F(θ),
    P = F(θ*)[F'(θ*)F(θ*)]⁻¹F'(θ*)    (n × n),
    P_1 = F_1(θ*)[F_1'(θ*)F_1(θ*)]⁻¹F_1'(θ*)    (n × n),
    P⊥ = I − P,  P_1⊥ = I − P_1    (n × n),
    θ̂ = the p × 1 vector minimizing Σ_{t=1}^n (y_t − f(x_t, θ))² over Θ,
    s² = (y − f(θ̂))'(y − f(θ̂))/(n − p),
    σ̂² = (y − f(θ̂))'(y − f(θ̂))/n,
    ρ̃ = the r × 1 vector minimizing Σ_{t=1}^n (y_t − f(x_t, (ρ, τ⁰)))² for (ρ, τ⁰) in Θ,
    θ̃ = (ρ̃', τ⁰')',
    σ̃² = (y − f(θ̃))'(y − f(θ̃))/n,
    δ_0(ρ) = f(θ*) − f(ρ, τ⁰)    (n × 1),
    C_22 = the q × q (q = p − r) matrix formed by deleting the first r rows and columns of [F'(θ*)F(θ*)]⁻¹,
    Ĉ_22 = the q × q matrix formed by deleting the first r rows and columns of [F'(θ̂)F(θ̂)]⁻¹,
    u_2 = the q × 1 vector obtained by deleting the first r rows of [F'(θ*)F(θ*)]⁻¹F'(θ*)e.

For simplicity we will write F for F(θ*), F_1 for F_1(θ*), and δ_0 for δ_0(ρ⁰). Also, the following vectors and subvectors are related according to the equalities θ̂ = (ρ̂', τ̂')', θ* = (ρ*', τ*')', and θ̃ = (ρ̃', τ⁰')'.
In the remainder of the paper it will be convenient to use the following conventions to refer to various distribution and density functions.

Densities and Distributions: Let n(t; μ, σ²) denote the normal density function with mean μ and variance σ², and let N(t; μ, σ²) denote the corresponding distribution function. Let g(t; ν, λ) denote the non-central chi-squared density function with ν degrees of freedom and non-centrality λ [5, p. 74], and let G(t; ν, λ) denote the corresponding distribution function. Let F(t; ν_1, ν_2, λ) denote the non-central F-distribution function with ν_1 numerator degrees of freedom, ν_2 denominator degrees of freedom, and non-centrality λ [5, pp. 77-78]. The central F-distribution with corresponding degrees of freedom is denoted by F(t; ν_1, ν_2). Define H(x; ν_1, ν_2, λ_1, λ_2) to be the distribution function given for x > 1 by

    H(x; ν_1, ν_2, λ_1, λ_2) = 1 − ∫_0^∞ G(t/(x−1) + 2xλ_2/(x−1)²; ν_2, λ_2/(x−1)²) g(t; ν_1, λ_1) dt

and by H(x; ν_1, ν_2, λ_1, λ_2) = 0 for x < 1 when λ_2 = 0; the remaining cases are given in [4]. It is shown in [4] that

    H(x; ν_1, ν_2, λ_1, 0) = F(ν_2(x − 1)/ν_1; ν_1, ν_2, λ_1).
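The λ_2 = 0 reduction to the non-central F-distribution can be checked numerically from the integral form above. The sketch below is an illustration, not code from the paper; the parameter values are arbitrary, and note that scipy parameterizes non-centrality as nc = 2λ in the notation used here.

```python
import numpy as np
from scipy import integrate, stats

def H_lambda2_zero(x, nu1, nu2, lam1):
    """H(x; nu1, nu2, lam1, 0) from its integral form for x > 1:
    1 - integral over t of G(t/(x-1); nu2) g(t; nu1, lam1)."""
    integrand = lambda t: (stats.chi2.cdf(t / (x - 1.0), nu2)
                           * stats.ncx2.pdf(t, nu1, 2.0 * lam1))
    val, _ = integrate.quad(integrand, 0.0, np.inf)
    return 1.0 - val

x, nu1, nu2, lam1 = 1.5, 2, 20, 0.5
direct = H_lambda2_zero(x, nu1, nu2, lam1)
# the reduction: H(x; nu1, nu2, lam1, 0) = F(nu2*(x-1)/nu1; nu1, nu2, lam1)
via_F = stats.ncf.cdf(nu2 * (x - 1.0) / nu1, nu1, nu2, 2.0 * lam1)
```

The two evaluations agree to the accuracy of the quadrature.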
The following regularity conditions are taken from [4] and are repeated here for the reader's convenience.

Definition: Let 𝒜 be the Borel subsets of X and let (x_t)_{t=1}^∞ be the sequence of inputs chosen from X. Let I_A(x) be the indicator function of a subset A of X. The measure μ_n on (X, 𝒜) is defined by

    μ_n(A) = the proportion of x_t with t ≤ n which are in A

for each A ∈ 𝒜.

Definition: A sequence of measures (μ_n) on (X, 𝒜) is said to converge weakly to a measure μ on (X, 𝒜) if for every real valued, bounded, continuous function g with domain X, ∫ g dμ_n converges to ∫ g dμ as n → ∞.
Assumptions: The parameter space Θ and the set X are compact subsets of the p-dimensional and k-dimensional reals, respectively. The response function f(x, θ) and the partial derivatives (∂/∂θ_j)f(x, θ) and (∂²/∂θ_i∂θ_j)f(x, θ) are continuous on X × Θ. The sequence of inputs (x_t)_{t=1}^∞ is chosen such that the sequence of measures (μ_n)_{n=1}^∞ converges weakly to a measure μ defined over (X, 𝒜). The true value of θ, denoted by θ*, is contained in an open set which, in turn, is contained in Θ. If f(x, θ) = f(x, θ*) except on a set of μ measure zero, it is assumed that θ = θ*. The p × p matrix with typical element

    ∫ (∂/∂θ_i)f(x, θ*) (∂/∂θ_j)f(x, θ*) dμ(x)

is non-singular. As mentioned earlier, the errors (e_t) are independent with density n(x; 0, σ²) where σ² is non-zero, finite, and unknown.
An additional assumption is used to obtain the characterizations of the test statistics studied in this paper. See [7, p. 305] for its motivation and justification.

Assumptions (continued): As n increases, τ⁰ and θ* vary at a rate such that √n(τ⁰ − τ*) and √n(ρ⁰ − ρ*) converge to finite limits.
Theorem 1 gives the asymptotic properties of θ̂, σ̂², and σ̃² which are used to characterize the two test statistics studied in this paper. This theorem is proved in the Appendix.

Theorem 1: Let the assumptions listed above be satisfied. The random variable θ̂ converges almost surely to θ* and may be characterized as

    θ̂ = θ* + [F'F]⁻¹F'e + α_n

where √n α_n converges in probability to zero. The random variable σ̂² converges almost surely to σ², and its reciprocal may be characterized as

    1/σ̂² = n/(e'P⊥e) + a_n

where n a_n converges in probability to zero. The random variable σ̃² may be characterized as

    σ̃² = (e + δ_0)'P_1⊥(e + δ_0)/n + b_n

where n b_n converges in probability to zero.
3. THE LIKELIHOOD RATIO TEST AND ITS LARGE SAMPLE DISTRIBUTION

The likelihood of the sample y is

    L(y; θ, σ²) = (2πσ²)^(−n/2) exp[−(y − f(θ))'(y − f(θ))/(2σ²)].

The maximum of the likelihood subject to H is

    L(H) = (2πσ̃²)^(−n/2) exp[−n/2]

and its maximum over the entire parameter space is

    L(Ω) = (2πσ̂²)^(−n/2) exp[−n/2].

Thus, the Likelihood Ratio is L(H)/L(Ω) = (σ̃²/σ̂²)^(−n/2), and the Likelihood Ratio Test has the form: Reject the null hypothesis H when the statistic

    T = σ̃²/σ̂²

is larger than c, where P[T > c | H] = α.
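The maxima quoted above come from concentrating σ² out of the likelihood; written out, the step is:

```latex
\max_{\sigma^2>0}\,(2\pi\sigma^2)^{-n/2}
  \exp\!\Big[-\frac{(y-f(\theta))'(y-f(\theta))}{2\sigma^2}\Big]
\quad\text{is attained at}\quad
\sigma^2(\theta)=\frac{1}{n}\,(y-f(\theta))'(y-f(\theta)),
\quad\text{giving}\quad
\bigl(2\pi\sigma^2(\theta)\bigr)^{-n/2}e^{-n/2}.
```

Minimizing the residual sum of squares subject to τ = τ⁰ then gives L(H), and minimizing it over all of Θ gives L(Ω).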
The computation of the statistic requires that two nonlinear models be fitted to the data using, say, either Hartley's [6] or Marquardt's [9] algorithm. The constrained model must be fit to obtain σ̃² and the unconstrained model to obtain σ̂².
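As a sketch of the two fits, the fragment below uses scipy's least-squares routine in place of Hartley's or Marquardt's algorithm; the exponential response, the data, and the constrained value τ⁰ = −1 are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(1)
x = np.linspace(0.0, 2.0, 30)
y = 1.0 + 0.5 * np.exp(-1.2 * x) + 0.05 * rng.normal(size=30)

def resid(theta):                    # unconstrained model f(x, (rho, tau))
    rho1, rho2, tau = theta
    return y - (rho1 + rho2 * np.exp(tau * x))

def resid_H(rho, tau0=-1.0):         # constrained model: tau fixed at tau0
    rho1, rho2 = rho
    return y - (rho1 + rho2 * np.exp(tau0 * x))

fit_H = least_squares(resid_H, x0=[1.0, 0.5])         # constrained fit
fit_O = least_squares(resid, x0=[*fit_H.x, -1.0])     # unconstrained fit
n = len(y)
sigma2_tilde = np.sum(fit_H.fun**2) / n
sigma2_hat = np.sum(fit_O.fun**2) / n
T = sigma2_tilde / sigma2_hat        # Likelihood Ratio Test statistic
```

Starting the unconstrained fit at the constrained solution guarantees that the unconstrained sum of squares does not exceed the constrained one, so T ≥ 1.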
The statistical behavior of T is given by Theorem 2, which is proved in the Appendix.
Theorem 2: Under the assumptions listed in Section 2, the Likelihood Ratio Test statistic may be characterized as

    T = X + c_n

where n c_n converges in probability to zero and

    X = (e + δ_0)'P_1⊥(e + δ_0)/(e'P⊥e).

The random variable X has the distribution function H(x; q, n−p, λ_1, λ_2) where

    λ_1 = δ_0'(P − P_1)δ_0/(2σ²)  and  λ_2 = δ_0'P⊥δ_0/(2σ²).
A relationship between the distribution function of the approximating random variable X and the non-central F-distribution was mentioned in the previous section. Using this relationship, the critical point c* of X (P[X > c* | λ_1 = λ_2 = 0] = α) can be obtained from the formula

    c* = 1 + q F_α/(n − p)

where F_α denotes the upper α × 100 percentage point of an F-distribution with q numerator degrees of freedom and n − p denominator degrees of freedom. This critical point may be used to approximate the critical point c of T; in applications one rejects H when the calculated value of T exceeds c*.

The calculation of the non-centralities λ_1 and λ_2 for trial values of θ* and σ² requires the minimization of the sum of squares [f(θ*) − f(ρ, τ⁰)]'[f(θ*) − f(ρ, τ⁰)] to obtain ρ⁰ for computing δ_0. This may be done using either Hartley's [6] or Marquardt's [9] algorithm with f(θ*) replacing y. The remaining computations are straightforward matrix algebra.
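Assuming F, F_1, δ_0, and σ² are available as arrays, the critical point and the non-centralities reduce to a few lines of matrix algebra; the projection construction below is a sketch, not code from the paper.

```python
import numpy as np
from scipy.stats import f as f_dist

def critical_point(alpha, q, n, p):
    """c* = 1 + q*F_alpha/(n-p), F_alpha the upper alpha point of F(q, n-p)."""
    return 1.0 + q * f_dist.ppf(1.0 - alpha, q, n - p) / (n - p)

def proj(A):
    """Orthogonal projection onto the column space of A."""
    return A @ np.linalg.solve(A.T @ A, A.T)

def noncentralities(F, F1, delta0, sigma2):
    """lambda1 = delta0'(P - P1)delta0/(2 sigma^2),
       lambda2 = delta0' P-perp delta0/(2 sigma^2)."""
    P, P1 = proj(F), proj(F1)
    lam1 = delta0 @ (P - P1) @ delta0 / (2.0 * sigma2)
    lam2 = delta0 @ (np.eye(len(delta0)) - P) @ delta0 / (2.0 * sigma2)
    return lam1, lam2
```

For the α = .05, q = 1, n − p = 28 case used in the simulations, critical_point(.05, 1, 30, 2) is roughly 1.15.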
A series expansion of the distribution function H(x; ν_1, ν_2, λ_1, λ_2) is given in [4] which may be used to compute P[X > c*] once λ_1 and λ_2 have been obtained. However, in most applications λ_2 will be small (< .1) and P[X > c*] can be adequately approximated by charts of the non-central F-distribution [1, 10]. That is, the charts are entered at the α level of significance with q numerator degrees of freedom, n − p denominator degrees of freedom, and non-centrality

    λ_1 = δ_0'(P − P_1)δ_0/(2σ²).
A Monte-Carlo study reported in [4] for an exponential model with p = 2 and n = 30 suggests that the approximation

    P[T > c*] ≈ P[X > c*]

will be fairly accurate in applications. Additional evidence is given in Table 1. Using the example of Section 1, five thousand trials were generated with the parameters set at n = 30, σ² = .001, θ_2 = 1, θ_4 = −.5, and θ_1 and θ_3 varied as shown in the table. The standard errors shown in the table refer to the fact that the Monte-Carlo estimator of P[T > c*] is binomially distributed with variance estimated by P[X > c*]·P[X ≤ c*]/5000.

Table 1 about here
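The standard errors quoted in the tables follow directly from the binomial sampling variance of a Monte-Carlo proportion; for example, at the null rejection rate:

```python
import math

def mc_std_err(p_approx, trials=5000):
    """Standard error of a Monte-Carlo rejection proportion estimated from
    `trials` binary outcomes, variance p(1-p)/trials."""
    return math.sqrt(p_approx * (1.0 - p_approx) / trials)

se = mc_std_err(0.05)   # about .003, matching the first rows of the tables
```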
4. A TEST STATISTIC DERIVED FROM THE ASYMPTOTIC NORMALITY OF THE
LEAST SQUARES ESTIMATOR AND ITS LARGE SAMPLE DISTRIBUTION

The least squares estimator θ̂ is asymptotically normally distributed with mean θ* and a variance-covariance matrix estimated consistently by [F'(θ̂)F(θ̂)]⁻¹s² under the assumptions of Section 2 [2, 3]. This suggests the use of the test statistic

    S = (τ̂ − τ⁰)'Ĉ_22⁻¹(τ̂ − τ⁰)/(q s²).

The test has the form: Reject the null hypothesis H when S exceeds d, where P[S > d | H] = α.
The program used to compute the least squares estimator will usually print θ̂, [F'(θ̂)F(θ̂)]⁻¹, and s². If q = 1, the test statistic can easily be computed from the printed results. However, if q > 1 and accuracy is to be maintained, some programming effort will be required to recover θ̂, [F'(θ̂)F(θ̂)]⁻¹, and s², and perform the necessary matrix algebra.
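Given the printed quantities, the matrix algebra is brief; the sketch below follows the definition of S in this section, and the numerical values are illustrative assumptions.

```python
import numpy as np

def S_statistic(theta_hat, FtF_inv, s2, tau0, r):
    """S = (tau_hat - tau0)' C22_hat^{-1} (tau_hat - tau0) / (q s^2), where
    C22_hat is the lower-right q x q block of [F'(theta_hat)F(theta_hat)]^{-1}
    and tau_hat is the last q coordinates of theta_hat."""
    tau_hat = theta_hat[r:]
    q = len(tau_hat)
    C22_hat = FtF_inv[r:, r:]
    d = tau_hat - tau0
    return d @ np.linalg.solve(C22_hat, d) / (q * s2)

# illustrative printed output: p = 3 parameters, last q = 1 tested
S = S_statistic(np.array([1.0, 0.5, -0.8]), np.diag([0.04, 0.02, 0.01]),
                s2=0.002, tau0=np.array([-1.0]), r=2)   # about 2000 here
```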
The statistical behavior of the statistic S is given by Theorem 3, which is proved in the Appendix.

Theorem 3: Under the assumptions listed in Section 2, the test statistic based on the asymptotic normality of the least squares estimator may be characterized as

    S = Y + Y_n

where Y_n converges in probability to zero and

    Y = (u_2 + τ* − τ⁰)'C_22⁻¹(u_2 + τ* − τ⁰)/[q e'P⊥e/(n − p)]

which is distributed as F(y; q, n−p, λ) where

    λ = (τ⁰ − τ*)'C_22⁻¹(τ⁰ − τ*)/(2σ²).

The critical point d* of Y (P[Y > d* | λ = 0] = α) is obtained directly from a table of the F-distribution by setting d* = F_α, where F_α denotes the upper α × 100 percentage point with q numerator degrees of freedom and n − p denominator degrees of freedom. This critical point may be used to approximate the critical point d of S; in applications one rejects H when the calculated value of S exceeds F_α.
Monte-Carlo evidence, Table 2, suggests that, under the alternative, the approximation

    P[S > d*] ≈ P[Y > d*]

is not as accurate as the approximation of T by X. However, the approximation appears to be sufficiently accurate for use in applications. Table 2 was constructed in the same manner as Table 1.

Table 2 about here
5. COMPARISON

Tests based on S and T are equivalent in linear regression; they are not in nonlinear regression. This raises the question of which should be used in applications.

Approaching this question on the basis of power, the analytical and Monte-Carlo evidence suggests that when a component of the parameter θ enters the model nonlinearly, the Likelihood Ratio Test with respect to a hypothesis concerning it has better power than the test based on the asymptotic normality of the least squares estimator. When a component enters the model linearly the two tests have the same power. However, the evidence presented here is too limited to place much reliance on this generalization. If the question is of critical importance in an application, the non-centralities λ_1 and λ can be computed and compared. Provided λ_2 is small (< .1), the Likelihood Ratio Test will have better power than the test based on the asymptotic normality of the least squares estimator when λ_1 is larger than λ. It is not necessary to evaluate P[X > c*] or P[Y > d*] if relative power is the only matter of concern; the comparison of non-centralities will suffice.
As regards computational convenience, we have seen previously that for q = 1 the test based on the asymptotic normality is more convenient; for q > 1 the Likelihood Ratio Test is more convenient. One qualification is necessary. Hartley's [6] and Marquardt's [9] algorithms do not always converge in practice. Thus, it is possible that θ̂ can be successfully computed but not ρ̃. In such situations, ρ̃ must be computed by other means, e.g., grid search, or the test based on the asymptotic normality of the least squares estimator employed by default.

In most applications, a hypothesis involves only a single component of θ and, therefore, q = 1. The author's practice is to use the test based on the asymptotic normality of the least squares estimator for computational convenience when the purpose is to assist a judgment as to whether the model employed adequately represents the data. He uses the Likelihood Ratio Test when the hypothesis is an important aspect of the study.
FOOTNOTES

1. "a.s." denotes convergence almost surely; "→P" denotes convergence in probability.

2. ρ̃ is usually a good start value for computing θ̂.
TABLE 1. MONTE-CARLO POWER ESTIMATES FOR THE LIKELIHOOD RATIO TEST

H: θ_1 = 0 against A: θ_1 ≠ 0

    Parameters                                      Monte-Carlo
    θ_1     θ_3     λ_1     λ_2     P[X > c*]   P[T > c*]   STD. ERR.
    0.0     -1.0    0.0     0.0     .050        .050        .003
    0.008   -1.1    0.2353  0.0000  .101        .094        .004
    0.015   -1.2    0.8307  0.0000  .237        .231        .006
    0.030   -1.4    3.3343  0.0000  .700        .687        .006

H: θ_3 = -1 against A: θ_3 ≠ -1

    Parameters                                      Monte-Carlo
    θ_1     θ_3     λ_1     λ_2     P[X > c*]   P[T > c*]   STD. ERR.
    0.0     -1.0    0.0     0.0     .050        .052        .003
    0.008   -1.1    0.2421  0.0006  .103        .110        .004
    0.015   -1.2    0.8503  0.0078  .243        .248        .006
    0.030   -1.4    2.6669  0.0728  .618        .627        .007
TABLE 2. MONTE-CARLO POWER ESTIMATES FOR THE TEST BASED ON THE ASYMPTOTIC NORMALITY OF THE LEAST SQUARES ESTIMATOR

H: θ_1 = 0 against A: θ_1 ≠ 0

    Parameters                              Monte-Carlo
    θ_1     θ_3     λ       P[Y > F_α]  P[S > F_α]  STD. ERR.
    0.0     -1.0    0.0     .050        .050        .003
    0.008   -1.1    0.2353  .101        .094        .004
    0.015   -1.2    0.8309  .237        .231        .006
    0.030   -1.4    3.3343  .700        .687        .006

H: θ_3 = -1 against A: θ_3 ≠ -1

    Parameters                              Monte-Carlo
    θ_1     θ_3     λ       P[Y > F_α]  P[S > F_α]  STD. ERR.
    0.0     -1.0    0.0     .050        .056        .003
    0.008   -1.1    0.2220  .098        .082        .004
    0.015   -1.2    0.7332  .215        .183        .006
    0.030   -1.4    2.1302  .511        .513        .007
REFERENCES

[1] Fox, M., "Charts of the Power of the F-test," The Annals of Mathematical Statistics, 27 (June 1956), 484-97.

[2] Gallant, A. R., "Statistical Inference for Nonlinear Regression Models," Ph.D. dissertation, Iowa State University, 1971.

[3] Gallant, A. R., "Inference for Nonlinear Models," Institute of Statistics Mimeograph Series, No. 875. Raleigh: North Carolina State University, 1973.

[4] Gallant, A. R., "The Power of the Likelihood Ratio Test of Location in Nonlinear Regression Models," Journal of the American Statistical Association, 70, in press.

[5] Graybill, F. A., An Introduction to Linear Statistical Models, Volume 1. New York: McGraw-Hill Book Company, 1961.

[6] Hartley, H. O., "The Modified Gauss-Newton Method for the Fitting of Non-linear Regression Functions by Least Squares," Technometrics, 3 (May 1961), 269-80.

[7] Lehmann, E. L., Testing Statistical Hypotheses. New York: John Wiley and Sons, Inc., 1959.

[8] Malinvaud, E., "The Consistency of Nonlinear Regressions," The Annals of Mathematical Statistics, 41 (June 1970), 956-69.

[9] Marquardt, D. W., "An Algorithm for Least-squares Estimation of Nonlinear Parameters," Journal of the Society for Industrial and Applied Mathematics, 11 (June 1963), 431-41.

[10] Pearson, E. S. and Hartley, H. O., "Charts of the Power Function of the Analysis of Variance Tests, Derived from the Non-central F-distribution," Biometrika, 38 (June 1951), 112-30.
APPENDIX. PROOFS OF THEOREMS

Three technical lemmas are needed in the proofs of the theorems.

Lemma 1: Let g(x, λ) be a real valued continuous function defined on X × Λ where Λ is a compact set. Under the assumptions listed in Section 2:

i) The series (1/n)Σ_{t=1}^n g(x_t, λ) converges to the integral ∫ g(x, λ) dμ(x) uniformly in λ.

ii) The random variable (1/n)Σ_{t=1}^n g(x_t, λ)e_t converges almost surely to zero, uniformly in λ.

iii) If λ_n converges almost surely to λ*, then (1/n)Σ_{t=1}^n g(x_t, λ_n) converges almost surely to ∫ g(x, λ*) dμ(x).

iv) If λ_n converges in probability to λ*, then (1/n)Σ_{t=1}^n g(x_t, λ_n) converges in probability to ∫ g(x, λ*) dμ(x).

Proof: Part i is proved in [8, p. 967]. Part ii is proved as Lemma 3.4 in [3]. Parts iii and iv follow from the uniform convergence given in Part i. □

Lemma 2: Under the assumptions listed in Section 2, the random variable ρ̃ converges almost surely to ρ*.

Proof: Let

    Q_n(θ) = (y − f(θ))'(y − f(θ))/n
           = e'e/n + 2e'(f(θ*) − f(θ))/n + (f(θ*) − f(θ))'(f(θ*) − f(θ))/n

which converges almost surely to

    Q(θ) = σ² + ∫ (f(x, θ*) − f(x, θ))² dμ(x)

uniformly in θ by the Strong Law of Large Numbers and Lemma 1, i, ii. Let (ρ̃_n) be a sequence of points minimizing Q_n(ρ, τ⁰) corresponding to a realization of the errors (e_t). Since Θ is compact there is at least one limit point ρ# and at least one subsequence (ρ̃_{n_m}) such that lim_{m→∞} ρ̃_{n_m} = ρ#. Unless the realization of the errors belongs to the exceptional set,

    σ² + ∫ [f(x, θ*) − f(x, (ρ#, τ*))]² dμ(x) = lim_{m→∞} Q_{n_m}(ρ̃_{n_m}, τ⁰) ≤ lim_{m→∞} Q_{n_m}(ρ⁰, τ⁰) = σ²

so that ∫ [f(x, θ*) − f(x, (ρ#, τ*))]² dμ(x) = 0, which implies ρ# = ρ* by assumption. Thus, excepting realizations in the exceptional set, the sequence (ρ̃_n) has only one limit point, which is ρ*. □
Lemma 3: Under the assumptions listed in Section 2, the random vector (1/√n)F'e converges in distribution to a p-variate normal and (1/√n)F'δ_0 converges to a finite limit.

Proof: By Lemma 3.5 of [3] or Lemma 4.2 of [2], (1/√n)F'e converges in distribution to a p-variate normal. By Taylor's theorem,

    (1/√n)F'δ_0 = −(1/√n)F'F(θ̄)(θ⁰ − θ*)

where θ⁰ = (ρ⁰', τ⁰')' and θ̄ is on the line segment joining θ⁰ to θ*. The assumption that √n(θ⁰ − θ*) tends to a finite limit and Lemma 1, i applied to (1/n)F'F(θ̄) imply the result. □

Proof of Theorem 1: The first conclusion is proved as Theorem 3 of [3] or as Proposition 6.1 of [2]. The second conclusion is proved as Lemma 1 of [4]. By Lemma 2, (ρ̃, τ⁰) will almost surely be contained in the open subset of Θ containing θ*, allowing the use of Taylor's expansions in the proof and causing ρ̃ to eventually become a stationary point of Q_n(ρ, τ⁰).
In this paragraph we will obtain intermediate results based on Taylor's expansions which will be used later in the proof. Using Taylor's theorem,

    f(ρ̃, τ⁰) = f(ρ⁰, τ⁰) + H(ρ̃ − ρ⁰)

where H is the n × r matrix with typical row

    ∇'_ρ f(x_t, θ⁰) + (1/2)(ρ̃ − ρ⁰)'∇²_ρ f(x_t, (ρ̄, τ⁰))

and ρ̄ is on the line segment joining ρ̃ to ρ⁰. Using Lemmas 1 and 2 and the assumption that √n(θ⁰ − θ*) converges to a finite limit, one can show that (1/n)F'H, (1/n)H'H, and (1/n)F_1'(ρ̃, τ⁰)H converge almost surely, their differences from the corresponding matrices with H replaced by F_1 converging almost surely to the zero matrix.

Applying Taylor's theorem twice one can write the derivative of the sum of squares in terms of an r × p matrix E with typical element e_ij and an r × r matrix D with typical element

    d_ij = (1/n) Σ_{t=1}^n (∂²/∂ρ_i∂ρ_j) f(x_t, (ρ̄, τ⁰)) [e_t + f(x_t, θ*) − f(x_t, (ρ̄, τ⁰))].

We will write E and D when these matrices depend on ρ̃, and E_0 and D_0 when the dependence is on ρ⁰. Using Lemma 1 and the assumed convergence of θ⁰ to θ*, one can show that (1/n)E and (1/n)D converge almost surely to the zero matrix.

In this paragraph we will obtain the probability order of ρ̃ − ρ⁰. As mentioned earlier, for almost every realization of the errors (e_t), ρ̃ is eventually a stationary point of Q_n(ρ, τ⁰), so that the random vector

    (√n/2)∇_ρ Q_n(ρ̃, τ⁰) = (1/√n)F_1'(ρ̃, τ⁰)(y − f(ρ̃, τ⁰))

converges almost surely to the zero vector. Substituting the expansions in the previous paragraph, we have that

    [(1/n)F_1'(ρ̃, τ⁰)H − (1/n)D] √n(ρ̃ − ρ⁰) + (1/√n)E_0(θ⁰ − θ*) − (1/√n)F_1'(e + δ_0)

converges almost surely to zero. Lemma 1 and our previous results imply that the matrix in brackets converges almost surely to a positive definite submatrix of the limit of (1/n)F'F. Lemma 3 implies that (1/√n)F_1'(e + δ_0) converges in distribution to an r-variate normal; the remaining term converges almost surely to the zero vector. These facts allow us to conclude that v_n = √n(ρ̃ − ρ⁰) is bounded in probability.

The sum of squares n σ̃² may be written as

    n σ̃² = ‖P_1⊥(e + δ_0) + P_1(e + δ_0) − F_1(ρ̃ − ρ⁰) − H(ρ̃ − ρ⁰)‖²

and expanded about ‖P_1⊥(e + δ_0)‖²; the cross product term may be written as −2(e + δ_0)'P_1⊥H(ρ̃ − ρ⁰). Each of the remaining terms converges almost surely to zero by Lemmas 1, 2, and 3 and our previous results. By the triangle inequality,

    [u_n'((1/n)F_1'F_1)u_n]^(1/2) ≤ ‖(1/√n)[P_1(e + δ_0) − H(ρ̃ − ρ⁰)]‖ + [v_n'((1/n)H'H)v_n]^(1/2).

The two terms on the right converge in probability to zero by our previous results, which yields the characterization of σ̃² and completes the proof. □
Proof of Theorem 2: By Theorem 1,

    T = σ̃²(1/σ̂²) = X + c_n

where

    c_n = a_n b_n + (n/e'P⊥e)b_n + a_n(e + δ_0)'P_1⊥(e + δ_0)/n.

Applying Theorem 1 we have that n a_n b_n and n(n/e'P⊥e)b_n converge in probability to zero. As in the proof of Lemma 2, (e + δ_0)'(e + δ_0)/n converges almost surely to a finite limit, and this bounds (e + δ_0)'P_1⊥(e + δ_0)/n from above as P_1⊥ is idempotent; since n a_n converges in probability to zero by Theorem 1, n times the last term converges in probability to zero as well. Thus n c_n converges in probability to zero.

Set z = (1/σ)e, z = (z_1, z_2, ..., z_n)', γ = (1/σ)δ_0, and R = P − P_1. The random variables z_t are independent with density n(t; 0, 1). For an arbitrary constant b, the random variable (z + bγ)'R(z + bγ) is a non-central chi-squared with q degrees freedom and non-centrality b²γ'Rγ/2 because R is idempotent with rank q. Similarly, (z + bγ)'P⊥(z + bγ) is a non-central chi-squared with n − p degrees freedom and non-centrality b²γ'P⊥γ/2. These two random variables are independent because RP⊥ = 0; see Graybill [5, p. 79 ff].

Let a > 0. Then

    P[X > a + 1] = P[(z + γ)'P_1⊥(z + γ) > (a + 1)z'P⊥z]
                 = P[(z + γ)'R(z + γ) > a(z − a⁻¹γ)'P⊥(z − a⁻¹γ) − (1 + a⁻¹)γ'P⊥γ]
                 = ∫_0^∞ G(t/a + (a + 1)γ'P⊥γ/a²; n − p, γ'P⊥γ/(2a²)) g(t; q, γ'Rγ/2) dt.

By substituting x = a + 1, λ_1 = γ'Rγ/2, and λ_2 = γ'P⊥γ/2, one obtains the form of the distribution function for x > 1. The derivations for the remaining cases are analogous and are omitted. □
Proof of Theorem 3: The random variable S may be written as

    S = (τ̂ − τ⁰)'C_22⁻¹(τ̂ − τ⁰)/(q s²) + u_n.

The second term on the right, u_n, converges in probability to zero because √n(τ̂ − τ⁰) is bounded in probability by Theorem 1 and Lemma 3, √n(τ⁰ − τ*) converges to a finite limit by assumption, s² converges almost surely to σ², and the difference between Ĉ_22 and C_22 converges almost surely to the zero matrix by Lemma 1, iii.

Using the expression for θ̂ given in Theorem 1,

    τ̂ − τ⁰ = Z_2 + α_2

where Z_2 = u_2 + τ* − τ⁰, and u_2 and α_2 are formed from [F'F]⁻¹F'e and α_n by deleting the first r rows. Substituting,

    S = Z_2'C_22⁻¹Z_2/(q s²) + v_n.

√n Z_2 is bounded in probability, √n α_2 converges in probability to the zero vector as stated in Theorem 1, and by Lemma 1, i the matrix n⁻¹C_22⁻¹ converges. Consequently v_n converges in probability to zero.

We now use the expression for 1/σ̂² given in Theorem 1, together with s² = n σ̂²/(n − p), to obtain

    S = [Z_2'C_22⁻¹Z_2/q]/[e'P⊥e/(n − p)] + w_n

where w_n involves a_n as in Theorem 1. For the reasons noted previously, Z_2'C_22⁻¹Z_2 is bounded in probability while a_n converges almost surely to zero, so w_n converges in probability to zero. Thus S = Y + Y_n where Y_n converges in probability to zero and Y has the familiar form of the test statistic used in linear regression [5, pp. 77-78]. □