Gallant, A. R. (1973) "Seemingly unrelated nonlinear regressions."

SEEMINGLY UNRELATED NONLINEAR REGRESSIONS

by

A. Ronald Gallant

Institute of Statistics
Mimeograph Series No. 900
Raleigh - November 1973

SEEMINGLY UNRELATED NONLINEAR REGRESSIONS

by

A. Ronald Gallant*
ABSTRACT

The paper considers the estimation of the parameters of a set of nonlinear regression equations when the responses are contemporaneously but not serially correlated. Conditions are set forth such that the estimator obtained is strongly consistent, asymptotically normally distributed, and asymptotically more efficient than the single-equation least squares estimator. The methods presented allow estimation of the parameters subject to nonlinear restrictions across equations. The paper includes a discussion of methods to perform the computations, a worked example, and a Monte Carlo simulation.

* Assistant Professor of Statistics and Economics, Institute of Statistics, North Carolina State University, Raleigh, North Carolina 27607.
1.  INTRODUCTION

This paper may be viewed as a generalization of Zellner's (1962) paper in the following respects. The response functions are allowed to be nonlinear in the parameters as well as linear. The parameters of the model may be estimated subject to nonlinear restrictions across equations as well as linear restrictions across equations.

We rely on Zellner's paper to provide the general economic considerations which motivate the study of the Seemingly Unrelated Regressions situation. The extension to the nonlinear case is motivated by concern for the specification error due to requiring that economic theory accommodate the assumptions of linear regression. The argument based on Taylor's Theorem used to justify the linear model as an adequate approximation of the economic model often leaves the reader of an econometric study less than satisfied. The case when the approximation is inadequate provides the primary motivation for this paper. It attempts to give the practitioner the freedom to let his econometric model more adequately represent his economic model.

The estimation procedure set forth in Section 2 is the same as Zellner's (1962) Seemingly Unrelated Regressions technique with the exception that nonlinear response functions and nonlinear parametric constraints are permitted. The variances and covariances of the disturbances are estimated from the residuals derived from an equation-by-equation application of least squares. All parameters are then estimated simultaneously by applying Aitken's (1935) generalized least squares to the whole system of equations using these estimated variances and covariances. Nonlinear constraints on the parameters may be imposed in the Aitken phase of the procedure.
Conditions are set forth in Section 3 under which this procedure yields estimators which are strongly consistent (almost sure convergence) and asymptotically normally distributed. These assertions are proved in Section 4 using a set of lemmas stated and proved in the Appendix. The conditions used to obtain these results are patterned after the assumptions used by Malinvaud (1970) in a nonlinear regression context.

Either Hartley's (1961) Modified Gauss-Newton Method or Marquardt's (1963) Algorithm may be used to obtain the restricted or unrestricted estimates of the parameters in the Aitken phase of the procedure as well as in the equation-by-equation phase. The details and a worked example are in Section 5.

The unrestricted estimator obtained according to the Zellner procedure is shown to be asymptotically more efficient than the equation-by-equation least squares estimator except in two special cases. The first case would be more apt to occur in a designed multivariate experiment than in an econometric investigation. It is the case when each of the contemporaneous responses has the same response function and same independent variables. This special situation is entirely analogous with the one discussed by Zellner (1962, p. 170). A second case where the equation-by-equation least squares estimator is fully efficient is when there are no contemporaneous correlations among the responses.

The reader who is only interested in the statistical and numerical methods proposed in this paper need only read Section 2 for a description of the method and Section 5 for a discussion of how available computer programs may be used to carry out the computations. Section 7 reports the results of a Monte-Carlo simulation.
2.  ESTIMATION PROCEDURE

The estimation procedure is entirely analogous to Zellner's (1962) Seemingly Unrelated Regressions estimation method except that (possibly) nonlinear response functions are substituted for linear response functions. We have a set of $M$ nonlinear regression equations

$$y_{t\alpha} = f_\alpha(x_{t\alpha}, \theta_\alpha^0) + e_{t\alpha} \qquad (\alpha = 1, 2, \ldots, M;\ t = 1, 2, \ldots, n)$$

where the inputs $x_{t\alpha}$ are $k_\alpha$ by 1 vectors and the unknown parameters $\theta_\alpha^0$ are $p_\alpha$ by 1 vectors known to be contained in the sets $\Theta_\alpha$. The errors $e_t = (e_{t1}, e_{t2}, \ldots, e_{tM})'$ ($M \times 1$) are assumed to be independent, each having mean $0$, the same distribution function, and positive definite variance-covariance matrix $\Sigma$.
Each of the $M$ regression equations may be written in a convenient vector form as

$$y_\alpha = f_\alpha(\theta_\alpha) + e_\alpha \qquad (\alpha = 1, 2, \ldots, M)$$

where

$$y_\alpha = (y_{1\alpha}, y_{2\alpha}, \ldots, y_{n\alpha})' \quad (n \times 1),$$
$$f_\alpha(\theta_\alpha) = \bigl(f_\alpha(x_{1\alpha}, \theta_\alpha),\ f_\alpha(x_{2\alpha}, \theta_\alpha),\ \ldots,\ f_\alpha(x_{n\alpha}, \theta_\alpha)\bigr)' \quad (n \times 1),$$
$$e_\alpha = (e_{1\alpha}, e_{2\alpha}, \ldots, e_{n\alpha})' \quad (n \times 1).$$
The first step of the procedure is to obtain the least squares estimators $\tilde\theta_\alpha$ by minimizing

$$Q_\alpha(\theta_\alpha) = \frac{1}{n}\,\bigl(y_\alpha - f_\alpha(\theta_\alpha)\bigr)'\bigl(y_\alpha - f_\alpha(\theta_\alpha)\bigr)$$

over $\Theta_\alpha$, equation by equation. The second step is to form the residual vectors

$$\tilde e_\alpha = y_\alpha - f_\alpha(\tilde\theta_\alpha) \qquad (\alpha = 1, 2, \ldots, M)$$

and estimate the elements $\sigma_{\alpha\beta}$ of the variance-covariance matrix $\Sigma$ by

$$\hat\sigma_{\alpha\beta} = \frac{1}{n}\, \tilde e_\alpha' \tilde e_\beta \qquad (\alpha, \beta = 1, 2, \ldots, M)$$

to obtain the estimate $\hat\Sigma$ of $\Sigma$.
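As a concrete illustration of these two steps, the following minimal sketch (ours, not the author's program) carries them out numerically; SciPy's general-purpose least squares routine stands in for Hartley's or Marquardt's algorithm, and the response functions f1 and f2 are hypothetical placeholders.

    # Steps 1 and 2 in code: equation-by-equation nonlinear least squares,
    # then estimation of Sigma from the OLS residuals.
    import numpy as np
    from scipy.optimize import least_squares

    def f1(x, theta):               # hypothetical: two inputs, four parameters
        return theta[0] + theta[1] * x[:, 0] + theta[2] * np.exp(theta[3] * x[:, 1])

    def f2(x, theta):               # hypothetical: one input, three parameters
        return theta[0] + theta[1] * np.exp(theta[2] * x[:, 0])

    def ols_step(y, x, f, theta_start):
        """Step 1: minimize Q_a(theta_a) = (1/n) ||y_a - f_a(x_a, theta_a)||^2."""
        return least_squares(lambda th: y - f(x, th), theta_start).x

    def sigma_step(residual_vectors):
        """Step 2: sigma_hat[a, b] = (1/n) e_a' e_b from the OLS residuals."""
        E = np.column_stack(residual_vectors)     # n by M matrix of residuals
        return E.T @ E / E.shape[0]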
To carry out the next step, the set of $M$ regressions is arranged in a single regression

$$y = f(\theta) + e$$

where $\theta = (\theta_1', \theta_2', \ldots, \theta_M')'$ and

$$y = (y_1', y_2', \ldots, y_M')' \quad (nM \times 1), \qquad f(\theta) = \bigl(f_1'(\theta_1), f_2'(\theta_2), \ldots, f_M'(\theta_M)\bigr)' \quad (nM \times 1), \qquad e = (e_1', e_2', \ldots, e_M')' \quad (nM \times 1).$$

The variance-covariance matrix for this regression is

$$\mathcal{E}(e e') = \begin{pmatrix} \sigma_{11} I & \sigma_{12} I & \cdots & \sigma_{1M} I \\ \sigma_{21} I & \sigma_{22} I & \cdots & \sigma_{2M} I \\ \vdots & & & \vdots \\ \sigma_{M1} I & \sigma_{M2} I & \cdots & \sigma_{MM} I \end{pmatrix} = \Sigma \otimes I \quad (nM \times nM)$$

where $I$ is the $n$ by $n$ identity matrix. This variance-covariance matrix is estimated by $\hat\Sigma \otimes I$, obtained in the second step of the procedure.
The third step of the procedure is to obtain the Aitken-type estimator $\hat\theta$ by minimizing

$$Q(\theta) = \frac{1}{n}\,\bigl(y - f(\theta)\bigr)'(\hat\Sigma^{-1} \otimes I)\bigl(y - f(\theta)\bigr)$$

over $\Theta = \times_{\alpha=1}^M \Theta_\alpha$. Define

$$\nabla f_\alpha(x_\alpha, \theta_\alpha) = \text{the } p_\alpha \text{ by } 1 \text{ vector whose } i\text{th element is } \frac{\partial}{\partial\theta_{i\alpha}}\, f_\alpha(x_\alpha, \theta_\alpha),$$

$$F_\alpha(\theta_\alpha) = \text{the } n \text{ by } p_\alpha \text{ matrix whose } t\text{th row is } \nabla' f_\alpha(x_{t\alpha}, \theta_\alpha),$$

$$F(\theta) = \begin{pmatrix} F_1(\theta_1) & 0 & \cdots & 0 \\ 0 & F_2(\theta_2) & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & F_M(\theta_M) \end{pmatrix} \quad \Bigl(nM \times p,\ \ p = \textstyle\sum_{\alpha=1}^M p_\alpha\Bigr).$$
The fourth and final step is to obtain the inverse of the matrix

$$\hat\Omega = \frac{1}{n}\, F'(\hat\theta)\,(\hat\Sigma^{-1} \otimes I)\, F(\hat\theta).$$

In Section 4 it is shown that $\sqrt{n}\,(\hat\theta - \theta^0)$ is asymptotically normally distributed with a variance-covariance matrix for which $\hat\Omega^{-1}$ is a strongly consistent estimator.
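The third and fourth steps can be sketched in the same style (again our illustration, for the two-equation case; derivatives are taken numerically and a quasi-Newton routine replaces the Modified Gauss-Newton Method).

    # Steps 3 and 4: minimize the Aitken criterion Q(theta), then invert
    # Omega_hat = (1/n) F'(theta_hat) (Sigma_hat^{-1} kron I) F(theta_hat).
    import numpy as np
    from scipy.optimize import minimize

    def aitken_step(r1, r2, S_hat, theta_start):
        # r1(theta), r2(theta): residual functions y_a - f_a(x_a, theta_a),
        # each mapping the full parameter vector theta to an n-vector.
        S_inv = np.linalg.inv(S_hat)

        def Q(theta):
            e = np.column_stack([r1(theta), r2(theta)])        # n by 2
            return np.einsum('ta,ab,tb->', e, S_inv, e) / len(e)

        theta_hat = minimize(Q, theta_start, method='BFGS').x

        def jac(r, theta, h=1e-6):
            """Numerical Jacobian F_a = d f_a / d theta (note r = y - f)."""
            cols = []
            for i in range(len(theta)):
                d = np.zeros(len(theta)); d[i] = h
                cols.append((r(theta - d) - r(theta + d)) / (2 * h))
            return np.column_stack(cols)

        F1, F2 = jac(r1, theta_hat), jac(r2, theta_hat)
        n = F1.shape[0]
        Omega_hat = (S_inv[0, 0] * F1.T @ F1 + S_inv[0, 1] * F1.T @ F2 +
                     S_inv[1, 0] * F2.T @ F1 + S_inv[1, 1] * F2.T @ F2) / n
        # inv(Omega_hat) estimates the variance of sqrt(n)(theta_hat - theta0).
        return theta_hat, np.linalg.inv(Omega_hat)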
One may wish to impose restrictions across equations in the Aitken improvement phase of the estimation. For our purposes, the most convenient way to represent these restrictions is by reparameterization. Let $\rho$ be an $r$ by 1 vector of new parameters and let $g_\alpha$ be a $p_\alpha$ by 1 vector valued function relating the original parameters to the new parameters according to

$$\theta_\alpha = g_\alpha(\rho) \qquad (\alpha = 1, 2, \ldots, M).$$

We assume that $r \le p = \sum_{\alpha=1}^M p_\alpha$ and that $\rho$ is contained in the parameter space $P$. Define

$$\nabla g_{i\alpha}(\rho) = \text{the } r \text{ by } 1 \text{ vector whose } j\text{th element is } \frac{\partial}{\partial\rho_j}\, g_{i\alpha}(\rho),$$

$$G_\alpha(\rho) = \text{the } p_\alpha \text{ by } r \text{ matrix whose } i\text{th row is } \nabla' g_{i\alpha}(\rho),$$

$$G(\rho) = \bigl(G_1'(\rho), G_2'(\rho), \ldots, G_M'(\rho)\bigr)' \quad (p \times r).$$
The third step of the procedure is modified to read: Obtain the estimator $\hat\rho$ by minimizing $Q\bigl(g(\rho)\bigr)$ over the parameter space $P$.
The fourth step of the procedure is modified to read: Obtain the inverse of the matrix

$$\hat G' \hat\Omega \hat G = G'(\hat\rho)\Bigl[\frac{1}{n}\, F'\bigl(g(\hat\rho)\bigr)(\hat\Sigma^{-1} \otimes I)\, F\bigl(g(\hat\rho)\bigr)\Bigr] G(\hat\rho).$$

In Section 4 it is shown that $\sqrt{n}\,(\hat\rho - \rho^0)$ is asymptotically normally distributed with a variance-covariance matrix for which $(\hat G' \hat\Omega \hat G)^{-1}$ is a strongly consistent estimator.
Either Hartley's (1961) Modified Gauss-Newton Method or Marquardt's (1963) Algorithm may be used to find $\tilde\theta_\alpha$ minimizing $Q_\alpha(\theta_\alpha)$. Using a suitable transformation, either of these methods may be applied to find $\hat\theta$ minimizing $Q(\theta)$ or $\hat\rho$ minimizing $Q\bigl(g(\rho)\bigr)$, and to obtain $\hat\Omega^{-1}$ or $(\hat G' \hat\Omega \hat G)^{-1}$ as the case may be. The details are discussed in Section 5.
One may wish to estimate the parameters of the model subject to the restrictions $\theta = g(\rho)$ but present the results in terms of the original parameters $\theta$. This is done by setting $\hat\theta = g(\hat\rho)$. As shown in Section 4, $\hat\theta$ is strongly consistent for $\theta^0$ and $\sqrt{n}\,(\hat\theta - \theta^0)$ is asymptotically normally distributed with a variance-covariance matrix for which $G(\hat\rho)(\hat G' \hat\Omega \hat G)^{-1} G'(\hat\rho)$ is a strongly consistent estimator. This matrix will be singular when $r < p$.
3.  ASSUMPTIONS

In order to obtain asymptotic results, it is necessary to specify the behavior of the inputs $x_{t\alpha}$ as $n$ becomes large. A general way of specifying the limiting behavior of nonlinear regression inputs is due to Malinvaud (1970). Two of his definitions are repeated below for the reader's convenience; a more complete discussion and examples are contained in his paper. In reading the two definitions, it will help if the reader keeps in mind two situations where the inputs satisfy Definitions 3.1 and 3.2. Let $X = \times_{\alpha=1}^M X_\alpha$ be the subset of $R^k$, $k = \sum_{\alpha=1}^M k_\alpha$, from which the inputs

$$x_t = (x_{t1}', x_{t2}', \ldots, x_{tM}')' \quad (k \times 1)$$

are to be chosen. The first is to choose some set of points $\{x_t^*\}_{t=1}^T$ and choose inputs according to the scheme $x_1 = x_1^*$, $x_2 = x_2^*$, ..., $x_T = x_T^*$, $x_{T+1} = x_1^*$, $x_{T+2} = x_2^*$, ..., that is, "constant in repeated samples" (Theil, 1971, p. 364). The second is to choose inputs by randomly sampling from a distribution function defined over the set $X$, for example, the $k$-dimensional uniform distribution.
Definition 3.1: Let $\mathfrak{A}$ be the Borel subsets of $X$ and let $\{x_t\}_{t=1}^\infty$ be a sequence of inputs chosen from $X$. For a set $A$ in $\mathfrak{A}$ let $I_A(x)$ be its indicator function. The probability measure $\mu_n$ on $(X, \mathfrak{A})$ is defined by

$$\mu_n(A) = \frac{1}{n} \sum_{t=1}^n I_A(x_t).$$
Definition 3.2: A sequence of measures $\{\mu_n\}_{n=1}^\infty$ on $(X, \mathfrak{A})$ is said to converge weakly to a measure $\mu$ on $(X, \mathfrak{A})$ if for every real valued, bounded, and continuous function $g$ with domain $X$,

$$\lim_{n\to\infty} \int g(x)\, d\mu_n(x) = \int g(x)\, d\mu(x).$$
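As an illustration of these definitions (ours, not Malinvaud's): under the "constant in repeated samples" scheme, for any real valued, bounded, continuous $g$,

$$\int g(x)\, d\mu_n(x) = \frac{1}{n} \sum_{t=1}^n g(x_t) \;\longrightarrow\; \frac{1}{T} \sum_{t=1}^T g(x_t^*) = \int g(x)\, d\mu(x),$$

so $\mu_n$ converges weakly to the measure $\mu$ placing mass $1/T$ on each design point $x_t^*$; under the random sampling scheme, $\mu_n$ converges weakly, with probability one, to the distribution from which the inputs are drawn.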
The following Assumptions are sufficient to obtain the asymptotic properties of the estimator $\hat\theta$ in the case when no restrictions are imposed on the parameters $\theta$.

Assumptions: The parameter spaces $\Theta_\alpha$ are compact. The response function $f_\alpha(x_\alpha, \theta_\alpha)$, the first partial derivatives $(\partial/\partial\theta_{i\alpha})\, f_\alpha(x_\alpha, \theta_\alpha)$, and the second partial derivatives $(\partial^2/\partial\theta_{i\alpha}\,\partial\theta_{j\alpha})\, f_\alpha(x_\alpha, \theta_\alpha)$ are continuous on $X_\alpha \times \Theta_\alpha$ for each $\alpha = 1, 2, \ldots, M$. The sequence of inputs $\{x_t\}_{t=1}^\infty$ is chosen such that the sequence of measures $\{\mu_n\}_{n=1}^\infty$ converges weakly to a measure $\mu$ defined on $(X, \mathfrak{A})$. The true parameter value $\theta_\alpha^0$ is contained in an open sphere $S_\alpha$ which is, in turn, contained in $\Theta_\alpha$ for each $\alpha = 1, 2, \ldots, M$. If $f_\alpha(x_\alpha, \theta_\alpha) = f_\alpha(x_\alpha, \theta_\alpha^0)$ except for $x$ in $A$ where $\mu(A) = 0$, it is assumed that $\theta_\alpha = \theta_\alpha^0$. The matrix

$$\Omega = \begin{pmatrix} \sigma^{11} V_{11} & \sigma^{12} V_{12} & \cdots & \sigma^{1M} V_{1M} \\ \sigma^{21} V_{21} & \sigma^{22} V_{22} & \cdots & \sigma^{2M} V_{2M} \\ \vdots & & & \vdots \\ \sigma^{M1} V_{M1} & \sigma^{M2} V_{M2} & \cdots & \sigma^{MM} V_{MM} \end{pmatrix} \quad (p \times p)$$

is non-singular, where the $\sigma^{\alpha\beta}$ are the elements of $\Sigma^{-1}$ and the $p_\alpha$ by $p_\beta$ matrices $V_{\alpha\beta}$ have the $ij$th element

$$\int \frac{\partial}{\partial\theta_{i\alpha}}\, f_\alpha(x_\alpha, \theta_\alpha^0)\; \frac{\partial}{\partial\theta_{j\beta}}\, f_\beta(x_\beta, \theta_\beta^0)\; d\mu(x).$$

As mentioned earlier, the errors $e_t$ are independently and identically distributed with mean $0$ and unknown positive-definite variance-covariance matrix $\Sigma$.
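To make the last assumption concrete (again our illustration): under the "constant in repeated samples" scheme the limit measure $\mu$ places mass $1/T$ on each design point, so the $ij$th element of $V_{\alpha\beta}$ reduces to the finite sum

$$\frac{1}{T} \sum_{t=1}^T \frac{\partial}{\partial\theta_{i\alpha}}\, f_\alpha(x_{t\alpha}^*, \theta_\alpha^0)\; \frac{\partial}{\partial\theta_{j\beta}}\, f_\beta(x_{t\beta}^*, \theta_\beta^0),$$

the limit of the $ij$th element of $n^{-1} F_\alpha' F_\beta$ (cf. Lemma A.3).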
In the case where the restrictions $\theta = g(\rho)$ are imposed on the parameters, the following additional assumptions are required.

Assumptions (Continued): The parameter space $P$ is compact. The range of $g(\rho) = \bigl(g_1'(\rho), \ldots, g_M'(\rho)\bigr)'$ is a subset of $\Theta = \times_{\alpha=1}^M \Theta_\alpha$. The function $g(\rho)$ and its first and second partial derivatives in $\rho$ are continuous on $P$. The true parameter value $\rho^0$ satisfies $g(\rho^0) = \theta^0$, and $\rho^0$ is contained in an open sphere $S$ which is, in turn, contained in $P$. If $g(\rho) = g(\rho^0)$ it is assumed that $\rho = \rho^0$. The matrix $G' \Omega G$, where $G = G(\rho^0)$, is non-singular.
4.  STRONG CONSISTENCY AND ASYMPTOTIC NORMALITY

Two theorems establishing the strong consistency and asymptotic normality of the estimator $\hat\rho$ are proved in this section; the strong consistency and asymptotic normality of $\hat\theta$ follow as corollaries to these two theorems. In order to simplify the notation in this section and the remainder of the paper we will adopt the convention of writing $f_\alpha$ for the function $f_\alpha(\theta_\alpha)$ when it is evaluated at the true value $\theta_\alpha = \theta_\alpha^0$ of the parameter. Similarly, we will write $f$ for $f(\theta^0)$, $F_\alpha$ for $F_\alpha(\theta_\alpha^0)$, $F$ for $F(\theta^0)$, and $G$ for $G(\rho^0)$.

Theorem 1: The estimator $\hat\rho$ converges almost surely to $\rho^0$ under the Assumptions listed in Section 3.
Proof: Consider the quadratic form

$$Q(\theta) = \frac{1}{n} \sum_{\alpha=1}^M \sum_{\beta=1}^M \hat\sigma^{\alpha\beta}\, \bigl(y_\alpha - f_\alpha(\theta_\alpha)\bigr)'\bigl(y_\beta - f_\beta(\theta_\beta)\bigr).$$

The random variables $\hat\sigma^{\alpha\beta}$ converge almost surely to the elements $\sigma^{\alpha\beta}$ of $\Sigma^{-1}$ by Lemma A.5. Using arguments similar to those employed in the proof of Lemma A.5, the terms $n^{-1}\bigl(y_\alpha - f_\alpha(\theta_\alpha)\bigr)'\bigl(y_\beta - f_\beta(\theta_\beta)\bigr)$ converge almost surely, and uniformly for $\theta$ in $\Theta$, to

$$\sigma_{\alpha\beta} + \int \bigl(f_\alpha(x_\alpha, \theta_\alpha^0) - f_\alpha(x_\alpha, \theta_\alpha)\bigr)\bigl(f_\beta(x_\beta, \theta_\beta^0) - f_\beta(x_\beta, \theta_\beta)\bigr)\, d\mu(x).$$

Thus, $Q(\theta)$ converges almost surely, and uniformly for $\theta$ in $\Theta$, to

$$Q^*(\theta) = M + \int v(x, \theta)\, d\mu(x)$$

where

$$v(x, \theta) = \sum_{\alpha=1}^M \sum_{\beta=1}^M \sigma^{\alpha\beta}\, \bigl(f_\alpha(x_\alpha, \theta_\alpha^0) - f_\alpha(x_\alpha, \theta_\alpha)\bigr)\bigl(f_\beta(x_\beta, \theta_\beta^0) - f_\beta(x_\beta, \theta_\beta)\bigr).$$

The matrix $\Sigma^{-1}$ is positive definite, so the integrand $v(x, \theta)$ is non-negative for all $(x', \theta')'$ in $X \times \Theta$ and is zero only if $f_\alpha(x_\alpha, \theta_\alpha) = f_\alpha(x_\alpha, \theta_\alpha^0)$ for each $\alpha$. Consequently, $Q^*(\theta) = M$ implies $v(x, \theta) = 0$ except for $x$ in $A$ where $\mu(A) = 0$, which implies $f_\alpha(x_\alpha, \theta_\alpha) = f_\alpha(x_\alpha, \theta_\alpha^0)$ except for $x$ in $A$, which implies $\theta = \theta^0$ by Assumption. Thus, $Q^*\bigl(g(\rho)\bigr) = M$ implies $g(\rho) = g(\rho^0)$, which implies $\rho = \rho^0$ by Assumption.

Consider a sequence $\{\hat\rho_n\}_{n=1}^\infty$ of points minimizing $Q_n\bigl(g(\rho)\bigr)$ over $P$, corresponding to a realization of the errors $\{e_t\}_{t=1}^\infty$, where we write $Q_n$ for $Q$ to emphasize the dependence on $n$. Since $P$ is compact, there is at least one limit point $\rho^*$ and one subsequence $\{\hat\rho_{n_m}\}_{m=1}^\infty$ such that $\lim_{m\to\infty} \hat\rho_{n_m} = \rho^*$. Unless this realization belongs to the exceptional event $E$ (which occurs with probability zero), $Q_n\bigl(g(\rho)\bigr)$ converges to $Q^*\bigl(g(\rho)\bigr)$ uniformly for $\rho$ in $P$, whence

$$M \le Q^*\bigl(g(\rho^*)\bigr) = \lim_{m\to\infty} Q_{n_m}\bigl(g(\hat\rho_{n_m})\bigr) \le \lim_{m\to\infty} Q_{n_m}\bigl(g(\rho^0)\bigr) = Q^*\bigl(g(\rho^0)\bigr) = M.$$

This implies $Q^*\bigl(g(\rho^*)\bigr) = M$, which implies $\rho^* = \rho^0$. Thus, the sequence $\{\hat\rho_n\}_{n=1}^\infty$ has only one limit point, $\rho^0$, except for realizations of the errors $\{e_t\}_{t=1}^\infty$ contained in $E$, where $P(E) = 0$. $\square$
Corollary 1: The estimator $\hat\theta$ converges almost surely to $\theta^0$ under the Assumptions listed in Section 3.

Proof: Let $g(\rho)$ be the identity mapping of $\Theta$ onto $\Theta$. The set $\times_{\alpha=1}^M S_\alpha$ is an open set containing $\theta^0$ which is, in turn, contained in $\Theta$. Thus, there exists an open sphere $S$ containing $\theta^0$ which is, in turn, contained in $\Theta$. Since $G(\rho) = I$, the matrix $G' \Omega G = \Omega$ is nonsingular. Consequently, the identity mapping satisfies the full set of assumptions. $\square$

Theorem 2: If the Assumptions listed in Section 3 are satisfied, then $\sqrt{n}\,(\hat\rho - \rho^0)$ converges in distribution to an $r$-variate normal with mean $0$ and variance-covariance matrix $(G' \Omega G)^{-1}$. The matrix

$$\hat G' \hat\Omega \hat G = G'(\hat\rho)\Bigl[\frac{1}{n}\, F'\bigl(g(\hat\rho)\bigr)(\hat\Sigma^{-1} \otimes I)\, F\bigl(g(\hat\rho)\bigr)\Bigr] G(\hat\rho)$$

converges almost surely to $G' \Omega G$.
Proof: Define $\rho^* = \hat\rho$ if $\hat\rho$ is in $S$ and $\rho^* = \rho^0$ if $\hat\rho$ is not in $S$; set $\theta^* = g(\rho^*)$. Since $\sqrt{n}\,(\hat\rho - \rho^*)$ converges almost surely to $0$ by Theorem 1, it will suffice to prove the theorem for $\rho^*$.

The first order Taylor series expansion of $f\bigl(g(\rho^*)\bigr)$ may be written as

$$f\bigl(g(\rho^*)\bigr) = f + H\,(\rho^* - \rho^0)$$

where $H = (H_1', H_2', \ldots, H_M')'$ is an $nM$ by $r$ matrix composed of the $n$ by $r$ submatrices $H_\alpha$; the $t$th row of $H_\alpha$ is $\nabla' f_\alpha\bigl(x_{t\alpha}, g_\alpha(\bar\rho)\bigr)\, G_\alpha(\bar\rho)$, where $\bar\rho$ is on the line segment joining $\rho^*$ to $\rho^0$. Consider the matrix $\frac{1}{n}\, G'(\rho^*)\, F'(\theta^*)(\Sigma^{-1} \otimes I)\bigl[H - F(\theta^*)\, G(\rho^*)\bigr]$, which is composed of the submatrices

$$\sum_{\beta=1}^M \sigma^{\alpha\beta}\; G_\alpha'(\rho^*)\, \frac{1}{n} \sum_{t=1}^n \nabla f_\alpha(x_{t\alpha}, \theta_\alpha^*)\Bigl[\nabla' f_\beta\bigl(x_{t\beta}, g_\beta(\bar\rho)\bigr)\, G_\beta(\bar\rho) - \nabla' f_\beta(x_{t\beta}, \theta_\beta^*)\, G_\beta(\rho^*)\Bigr].$$

The elements of these submatrices converge almost surely to zero by the continuity of the first and second order partial derivatives of $f_\alpha(x_\alpha, \theta_\alpha)$ and $g(\rho)$, the compactness of the set $X \times \Theta \times P$, and the almost sure convergence of $\hat\rho$ to $\rho^0$. In consequence, the matrix $\frac{1}{n}\, G'(\rho^*)\, F'(\theta^*)(\Sigma^{-1} \otimes I)\bigl[H - F(\theta^*)\, G(\rho^*)\bigr]$ converges almost surely to $0$.

Using Taylor's Theorem, we may write

$$G'(\rho^*)\, F'(\theta^*)(\Sigma^{-1} \otimes I)\, e = G' F'(\Sigma^{-1} \otimes I)\, e + D\,(\rho^* - \rho^0)$$

where $D$ is the $r$ by $r$ matrix with $ij$th element

$$d_{ij} = \sum_{\alpha=1}^M \sum_{\beta=1}^M \sigma^{\alpha\beta} \sum_{t=1}^n \frac{\partial^2}{\partial\rho_j\, \partial\rho_i}\, f_\alpha\bigl(x_{t\alpha}, g_\alpha(\bar\rho)\bigr)\, e_{t\beta}.$$

Note that $\frac{1}{n}\, d_{ij}$ converges almost surely to $0$ by Lemma A.4.

The estimator $\hat\rho$ is contained in $S$ for $n$ sufficiently large, except on an exceptional event $E$ occurring with probability zero, by Theorem 1; thus $\rho^*$ is a stationary point of $Q\bigl(g(\rho)\bigr)$ for $n$ sufficiently large. Using the Taylor series expansions obtained in the two previous paragraphs, we may write the $r$ by 1 vector of partial derivatives as

$$0 = -\frac{\sqrt n}{2}\, \nabla_\rho\, Q\bigl(g(\rho^*)\bigr) = \frac{1}{\sqrt n}\, G' F'(\Sigma^{-1} \otimes I)\, e - \Bigl[\frac{1}{n}\, G'(\rho^*)\, F'(\theta^*)(\Sigma^{-1} \otimes I)\, H - \frac{1}{n}\, D\Bigr]\, \sqrt{n}\,(\rho^* - \rho^0)$$
$$\qquad + \sum_{\alpha=1}^M \sum_{\beta=1}^M \sqrt{n}\,(\hat\sigma^{\alpha\beta} - \sigma^{\alpha\beta})\; G_\alpha'(\rho^*)\, \frac{1}{n}\, F_\alpha'(\theta_\alpha^*)\, e_\beta + \sum_{\alpha=1}^M \sum_{\beta=1}^M \sqrt{n}\,(\hat\sigma^{\alpha\beta} - \sigma^{\alpha\beta})\; G_\alpha'(\rho^*)\, \frac{1}{n}\, F_\alpha'(\theta_\alpha^*)\bigl(f_\beta - f_\beta(\theta_\beta^*)\bigr).$$

In consequence, solving for $\sqrt{n}\,(\rho^* - \rho^0)$, we find that it converges in distribution to an $r$-variate normal with mean $0$ and variance-covariance matrix $(G' \Omega G)^{-1}$ because:

A. The first term on the right converges in distribution to an $r$-variate normal with mean $0$ and variance-covariance matrix $G' \Omega G$ by Lemma A.6.

B. The matrix in brackets appearing in the second term on the right converges almost surely to $G' \Omega G$. This follows because $\frac{1}{n}\, F'(\theta^*)(\Sigma^{-1} \otimes I)\, F(\theta^*)$ converges almost surely to $\Omega$ by Lemma A.3, $G(\rho^*)$ converges almost surely to $G$, $\frac{1}{n}\, D$ converges almost surely to $0$, and the remaining terms converge almost surely to $0$ by our preceding remarks.

C. The third term on the right converges in probability to $0$. This follows because the subvectors $\sqrt{n}\,(\hat\sigma^{\alpha\beta} - \sigma^{\alpha\beta})\, \frac{1}{n}\, F_\alpha'(\theta_\alpha^*)\, e_\beta$ converge in probability to $0$ by Lemma A.5 and Lemma A.4.

D. The fourth term on the right converges in probability to $0$. This follows because the subvectors $\sqrt{n}\,(\hat\sigma^{\alpha\beta} - \sigma^{\alpha\beta})\, \frac{1}{n}\, F_\alpha'(\theta_\alpha^*)\bigl(f_\beta - f_\beta(\theta_\beta^*)\bigr)$ converge in probability to $0$ by Lemma A.5, Lemma A.1, and Theorem 1.

The last sentence of the theorem follows by applying Lemma A.5 to obtain the almost sure convergence of $\hat\sigma^{\alpha\beta}$ to $\sigma^{\alpha\beta}$, by applying Theorem 1 to obtain the almost sure convergence of $G(\hat\rho)$ to $G$, and by applying Lemma A.3 to obtain the almost sure convergence of the submatrices $\frac{1}{n}\, F_\alpha'\bigl(g_\alpha(\hat\rho)\bigr)\, F_\beta\bigl(g_\beta(\hat\rho)\bigr)$ to $V_{\alpha\beta}$. $\square$
Corollary 2: If the Assumptions listed in Section 3 are satisfied, then $\sqrt{n}\,(\hat\theta - \theta^0)$ converges in distribution to a $p$-variate normal with mean $0$ and variance-covariance matrix $\Omega^{-1}$. The matrix $\hat\Omega$ converges almost surely to $\Omega$.

Proof: The proof is the same as the proof of Corollary 1. $\square$
Theorem 3: Let the Assumptions of Section 3 be satisfied and set $\hat\theta = g(\hat\rho)$. Then $\hat\theta$ converges almost surely to $\theta^0$, and $\sqrt{n}\,(\hat\theta - \theta^0)$ is asymptotically normally distributed with mean $0$ and (possibly) singular variance-covariance matrix $G(G' \Omega G)^{-1} G'$. The matrix $G(\hat\rho)(\hat G' \hat\Omega \hat G)^{-1} G'(\hat\rho)$ is strongly consistent for $G(G' \Omega G)^{-1} G'$.

Proof: The almost sure convergence of $g(\hat\rho)$ to $\theta^0$ follows from the continuity of $g(\rho)$ and Theorem 1. Let $\rho^*$ be as in the proof of Theorem 2. It will suffice to prove the theorem for $g(\rho^*)$ instead of $g(\hat\rho)$ because $\sqrt{n}\,\bigl(g(\hat\rho) - g(\rho^*)\bigr)$ converges almost surely to zero by Theorem 1. The first order Taylor series expansion of $g(\rho^*)$ is

$$g(\rho^*) = g(\rho^0) + \bar H\,(\rho^* - \rho^0)$$

where $\bar H = (\bar H_1', \bar H_2', \ldots, \bar H_M')'$; the $i$th row of the $p_\alpha$ by $r$ submatrix $\bar H_\alpha$ is $\nabla' g_{i\alpha}(\bar\rho)$, where $\bar\rho$ lies on the line segment joining $\rho^*$ to $\rho^0$. The asymptotic normality of $\sqrt{n}\,\bigl(g(\rho^*) - \theta^0\bigr)$ is due to the fact that $\sqrt{n}\,(\rho^* - \rho^0)$ is asymptotically normally distributed as shown in the proof of Theorem 2, the elements of $\bar H$ are bounded by continuity and the compactness of $P$, and $\bar H - G$ converges almost surely to $0$ by Theorem 1.

The last sentence of the Theorem follows from Theorem 1, Theorem 2, and the continuity of $G(\rho)$. $\square$
5.  COMPUTATIONAL CONSIDERATIONS

The minimization of $Q_\alpha(\theta_\alpha)$ to obtain the ordinary least squares estimators $\tilde\theta_\alpha$ and residual vectors $\tilde e_\alpha$ may be carried out using either Hartley's (1961) Modified Gauss-Newton Method or Marquardt's (1963) Algorithm. Either of these methods may be used to minimize $Q(\theta)$ by proceeding as follows. Factor $\hat\Sigma^{-1}$ to obtain a matrix $R$ such that $\hat\Sigma^{-1} = R'R$. Let $r_{\alpha\beta}$ denote the element of $R$ with row index $\alpha$ and column index $\beta$. Define new input vectors

$$w_{t\alpha} = (r_{\alpha 1}, r_{\alpha 2}, \ldots, r_{\alpha M}, x_{t1}', x_{t2}', \ldots, x_{tM}')' \quad (M + k \times 1),$$

new responses

$$z_{t\alpha} = \sum_{\beta=1}^M r_{\alpha\beta}\, y_{t\beta},$$

and a new response function

$$h(w_{t\alpha}, \theta) = \sum_{\beta=1}^M r_{\alpha\beta}\, f_\beta(x_{t\beta}, \theta_\beta).$$

When the model is transformed in this fashion it is in a form permitting the application of either Hartley's or Marquardt's algorithm to minimize

$$Q(\theta) = \frac{1}{n} \sum_{\alpha=1}^M \sum_{t=1}^n \bigl(z_{t\alpha} - h(w_{t\alpha}, \theta)\bigr)^2$$

and obtain $\hat\theta$. The vector $\tilde\theta$ is a good start value for the iterations. Most implementations of either algorithm will print the matrix

$$(P'P)^{-1} = \Bigl(\sum_{\alpha=1}^M \sum_{t=1}^n \nabla h(w_{t\alpha}, \hat\theta)\, \nabla' h(w_{t\alpha}, \hat\theta)\Bigr)^{-1} = \frac{1}{n}\, \hat\Omega^{-1},$$

so that the estimated variance-covariance matrix of $\hat\theta$ may be obtained without additional computations.
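The transformation is mechanical and easily scripted. The sketch below (ours; the function and variable names are not from the paper) obtains $R$ from a Cholesky factorization of $\hat\Sigma^{-1}$ and forms the new responses and response function.

    # Transform the M-equation system into a single nonlinear regression so a
    # standard Gauss-Newton/Marquardt code can minimize Q(theta).  A sketch;
    # f_list[b](x_list[b][t], theta) is assumed to return f_b(x_tb, theta_b).
    import numpy as np

    def transform(Y, S_hat):
        """Y is n by M; returns R with R'R = inv(S_hat), and Z with
        z[t, a] = sum_b R[a, b] * y[t, b]."""
        # np.linalg.cholesky gives lower L with inv(S_hat) = L L'; the upper
        # factor R = L' then satisfies R'R = inv(S_hat).
        R = np.linalg.cholesky(np.linalg.inv(S_hat)).T
        Z = Y @ R.T
        return R, Z

    def h(R, f_list, x_list, theta, t, a):
        """New response function h(w_ta, theta) = sum_b r_ab f_b(x_tb, theta)."""
        return sum(R[a, b] * f_list[b](x_list[b][t], theta)
                   for b in range(len(f_list)))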
The same transformation of the inputs, responses, and response function may be used to obtain $\hat\rho$ minimizing $Q\bigl(g(\rho)\bigr)$ as was used to obtain $\hat\theta$ minimizing $Q(\theta)$. The only modification required is to substitute $g(\rho)$ for $\theta$ in the response function, obtaining $h\bigl(w_{t\alpha}, g(\rho)\bigr)$. Be careful that the number of parameters supplied to the program is reduced from $p$ to $r$ and that derivatives are taken with respect to $\rho$. Since

$$\nabla_\rho\, h\bigl(w_{t\alpha}, g(\rho)\bigr) = G'(\rho)\, \nabla_\theta\, h\bigl(w_{t\alpha}, g(\rho)\bigr),$$

the matrix printed by the program,

$$(P'P)^{-1} = \Bigl(\sum_{\alpha=1}^M \sum_{t=1}^n \nabla_\rho\, h\bigl(w_{t\alpha}, g(\hat\rho)\bigr)\, \nabla_\rho'\, h\bigl(w_{t\alpha}, g(\hat\rho)\bigr)\Bigr)^{-1},$$

will satisfy $(P'P)^{-1} = \frac{1}{n}\,(\hat G' \hat\Omega \hat G)^{-1}$. Thus it may be used to estimate the variance-covariance matrix of $\hat\rho$. Start values for the iterations may be obtained by finding $\rho$ which will satisfy at least $r$ of the equations in the system $g(\rho) = \tilde\theta$.
The computations will be illustrated using the data presented in Table 1. These observations were generated according to the model

$$y_{t1} = \theta_{11} + \theta_{21}\, x_{1t1} + \theta_{31}\, e^{\theta_{41} x_{2t1}} + e_{t1},$$
$$y_{t2} = \theta_{12} + \theta_{22}\, e^{\theta_{32} x_{1t2}} + e_{t2}.$$

Using the Modified Gauss-Newton Method, one finds that the least squares estimator for the first equation is

$$\tilde\theta_1 = (1.0127,\ 1.0077,\ .9903,\ -1.0263)',$$

and for the second equation it is

$$\tilde\theta_2 = (1.0450,\ .9709,\ -1.0979)'.$$

The estimated variance-covariance matrix is

$$\hat\Sigma = \begin{pmatrix} .0008856 & .0004516 \\ .0004516 & .0008258 \end{pmatrix}$$

and its inverse may be factored as $\hat\Sigma^{-1} = R'R$ with

$$R = \begin{pmatrix} 39.57 & -21.64 \\ 0 & 34.80 \end{pmatrix}.$$

The transformed inputs are, therefore,

$$w_{t1} = (39.57,\ -21.64,\ x_{1t1},\ x_{2t1},\ x_{1t2})', \qquad w_{t2} = (0,\ 34.80,\ x_{1t1},\ x_{2t1},\ x_{1t2})'.$$
Table 1.  Inputs and Responses

  t     y_t1    x_1t1  x_2t1     y_t2    x_1t2
  1   1.98803    0      0      2.05446    0
  2   1.74975    0     .25     1.73545   .25
  3   1.56687    0     .5      1.58786   .5
  4   1.48967    0     .75     1.49833   .75
  5   1.4015     0     1       1.37966   1
  6   2.28432   .25     0      2.02124    0
  7   1.96907   .25    .25     1.74193   .25
  8   1.83365   .25    .5      1.60170   .5
  9   1.66293   .25    .75     1.43230   .75
 10   1.61976   .25    1       1.34858   1
 11   2.51247   .5      0      2.00361    0
 12   2.31425   .5     .25     1.80896   .25
 13   2.08653   .5     .5      1.55098   .5
 14   2.05508   .5     .75     1.48081   .75
 15   1.84513   .5     1       1.33590   1
 16   2.74747   .75     0      1.96754    0
 17   2.52438   .75    .25     1.76454   .25
 18   2.37002   .75    .5      1.63567   .5
 19   2.21222   .75    .75     1.41481   .75
 20   2.09650   .75    1       1.36679   1
 21   2.98463   1       0      1.98960    0
 22   2.77650   1      .25     1.76581   .25
 23   2.60803   1      .5      1.57198   .5
 24   2.49681   1      .75     1.47238   .75
 25   2.42138   1      1       1.40579   1
 26   2.05712    0      0      2.04711    0
 27   1.82886    0     .25     1.78453   .25
 28   1.62770    0     .5      1.65160   .5
 29   1.48741    0     .75     1.48088   .75
 30   1.33268    0     1       1.38178   1
 31   2.28643   .25     0      2.02916    0
 32   2.08228   .25    .25     1.80538   .25
 33   1.82655   .25    .5      1.60661   .5
 34   1.68291   .25    .75     1.45213   .75
 35   1.61076   .25    1       1.38909   1
 36   2.47966   .5      0      1.96843    0
 37   2.27283   .5     .25     1.75310   .25
 38   2.09720   .5     .5      1.64080   .5
 39   1.98836   .5     .75     1.48162   .75
 40   1.85849   .5     1       1.37142   1
 41   2.73098   .75     0      2.02424    0
 42   2.54987   .75    .25     1.81938   .25
 43   2.37716   .75    .5      1.62442   .5
 44   2.25082   .75    .75     1.51076   .75
 45   2.14113   .75    1       1.36210   1
 46   2.99249   1       0      2.06032    0
 47   2.78713   1      .25     1.82534   .25
 48   2.63906   1      .5      1.60557   .5
 49   2.48154   1      .75     1.48878   .75
 50   2.36727   1      1       1.34132   1
The transformed responses are

$$z_{t1} = 39.57\, y_{t1} - 21.64\, y_{t2}, \qquad z_{t2} = 34.80\, y_{t2},$$

and the new response function is

$$h(w_{t\alpha}, \theta) = r_{\alpha 1}\bigl(\theta_{11} + \theta_{21}\, x_{1t1} + \theta_{31}\, e^{\theta_{41} x_{2t1}}\bigr) + r_{\alpha 2}\bigl(\theta_{12} + \theta_{22}\, e^{\theta_{32} x_{1t2}}\bigr).$$

This transformed model is in a suitable form for the use of the Modified Gauss-Newton Method. Using $\tilde\theta$ as the start value, one obtains

$$\hat\theta = (1.0119,\ 1.0096,\ .9902,\ -1.0266,\ 1.0450,\ .9709,\ -1.0979)'$$

and

$$(P'P)^{-1} = \begin{pmatrix}
.006368 & .000051 & -.006047 & -.010380 & .002844 & -.002692 & -.005237 \\
.000051 & .000102 & 0 & 0 & 0 & 0 & 0 \\
-.006047 & 0 & .005833 & .009781 & -.002716 & .002606 & .004939 \\
-.010380 & 0 & .009781 & .017418 & -.004639 & .004332 & .008774 \\
.002844 & 0 & -.002716 & -.004639 & .004376 & -.004339 & -.008394 \\
-.002692 & 0 & .002606 & .004332 & -.004339 & .004180 & .007847 \\
-.005237 & 0 & .004939 & .008774 & -.008394 & .007847 & .015849
\end{pmatrix}$$

after four iterations.
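As an arithmetic check on this example (our verification, not part of the original text), the factor $R$ can be reproduced from $\hat\Sigma$ in a few lines:

    # Reproduce the factor R of the worked example from Sigma_hat.
    import numpy as np

    S_hat = np.array([[.0008856, .0004516],
                      [.0004516, .0008258]])
    R = np.linalg.cholesky(np.linalg.inv(S_hat)).T   # upper triangular, R'R = inv(S_hat)
    print(np.round(R, 2))   # approximately [[39.57, -21.64], [0., 34.8]]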
Suppose, now, that we wish to impose the constraints

$$\theta_{11} = \theta_{12}, \qquad \theta_{31} = \theta_{22}, \qquad \theta_{41} = \theta_{32}.$$

The reparameterization corresponding to these restrictions is

$$g(\rho) = (\rho_1,\ \rho_2,\ \rho_3,\ \rho_4,\ \rho_1,\ \rho_3,\ \rho_4)'.$$

The matrix of partial derivatives is

$$G = \begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}.$$

Putting $g(\rho) = \hat\theta$ and solving the first four equations for $\rho$, we obtain the start values for the iterations

$$\rho = (1.0119,\ 1.0096,\ .9902,\ -1.0266)'.$$

After five iterations of the Modified Gauss-Newton Method one obtains the restricted estimator $\hat\rho$.
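In code, this reparameterization and its constant Jacobian are simply (our rendering):

    # g(rho) = (rho1, rho2, rho3, rho4, rho1, rho3, rho4)' imposes
    # theta_11 = theta_12, theta_31 = theta_22, theta_41 = theta_32.
    import numpy as np

    def g(rho):
        r1, r2, r3, r4 = rho
        return np.array([r1, r2, r3, r4, r1, r3, r4])

    G = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1],
                  [1, 0, 0, 0],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)   # the 7 by 4 matrix above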
6.  ASYMPTOTIC EFFICIENCY

In Lemma A.7 it is shown that $\sqrt{n}\,(\tilde\theta - \theta^0)$ converges in distribution to a $p$-variate normal with mean $0$ and variance-covariance matrix $V^{-1} T V^{-1}$, where

$$V = \begin{pmatrix} V_{11} & 0 & \cdots & 0 \\ 0 & V_{22} & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & V_{MM} \end{pmatrix} \qquad \text{and} \qquad T = \begin{pmatrix} \sigma_{11} V_{11} & \sigma_{12} V_{12} & \cdots & \sigma_{1M} V_{1M} \\ \sigma_{21} V_{21} & \sigma_{22} V_{22} & \cdots & \sigma_{2M} V_{2M} \\ \vdots & & & \vdots \\ \sigma_{M1} V_{M1} & \sigma_{M2} V_{M2} & \cdots & \sigma_{MM} V_{MM} \end{pmatrix}.$$

We have seen in Theorem 2 that $\sqrt{n}\,(\hat\theta - \theta^0)$ converges in distribution to a $p$-variate normal with mean $0$ and variance-covariance matrix $\Omega^{-1}$.

The Aitken improved estimator $\hat\theta$ is asymptotically more efficient than the equation by equation least squares estimator $\tilde\theta$ in the sense that $V^{-1} T V^{-1} - \Omega^{-1}$ is a positive semi-definite matrix. This is proven as follows. By Lemma A.3, $V^{-1} T V^{-1}$ is the limit of

$$\Bigl(\frac{1}{n}\, F'F\Bigr)^{-1}\Bigl(\frac{1}{n}\, F'(\Sigma \otimes I)F\Bigr)\Bigl(\frac{1}{n}\, F'F\Bigr)^{-1}$$

while $\Omega^{-1}$ is the limit of $\bigl(\frac{1}{n}\, F'(\Sigma^{-1} \otimes I)F\bigr)^{-1}$. The difference

$$n\Bigl[(F'F)^{-1}\bigl(F'(\Sigma \otimes I)F\bigr)(F'F)^{-1} - \bigl(F'(\Sigma^{-1} \otimes I)F\bigr)^{-1}\Bigr]$$

is positive semi-definite by Aitken's Theorem (Theil, 1971, p. 238), so the limit $V^{-1} T V^{-1} - \Omega^{-1}$ is positive semi-definite.

There is one situation, likely to occur in a designed experiment, where the Aitken improvement of $\tilde\theta$ does not result in a gain in asymptotic efficiency. When the response functions $f_\alpha(x_\alpha, \theta_\alpha)$ all have the same functional form and the inputs $x_{t\alpha}$ are the same for $\alpha = 1, 2, \ldots, M$, the matrices $V_{\alpha\beta}$ are the same matrix, say $W$, for $\alpha, \beta = 1, 2, \ldots, M$. This follows directly from the fact that, in this case, $V = I \otimes W$, $T = \Sigma \otimes W$, and $\Omega = \Sigma^{-1} \otimes W$, so that

$$\Omega^{-1} = \Sigma \otimes W^{-1} = V^{-1} T V^{-1}.$$

In the case when the response functions and inputs are the same, the recommended procedure is to skip step three and estimate the variance-covariance matrix of $\sqrt{n}\,(\tilde\theta - \theta^0)$ by

$$\hat\Omega^{-1} = \Bigl(\frac{1}{n}\, F'(\tilde\theta)(\hat\Sigma^{-1} \otimes I)\, F(\tilde\theta)\Bigr)^{-1}.$$
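The inequality is easy to inspect numerically. The sketch below (ours, with a randomly generated block-diagonal $F$ and positive definite $\Sigma$ standing in for the limits) forms the finite-sample analogues of $V^{-1} T V^{-1}$ and $\Omega^{-1}$ and confirms that their difference is positive semi-definite, as Aitken's Theorem requires.

    # Check that the equation-by-equation covariance dominates the Aitken one:
    # V^{-1} T V^{-1} - Omega^{-1} is positive semi-definite.
    import numpy as np

    rng = np.random.default_rng(0)
    n, p1, p2 = 200, 3, 2
    F = np.zeros((2 * n, p1 + p2))              # block diagonal "design"
    F[:n, :p1] = rng.normal(size=(n, p1))
    F[n:, p1:] = rng.normal(size=(n, p2))
    A = rng.normal(size=(2, 2))
    Sigma = A @ A.T + np.eye(2)                 # an arbitrary p.d. Sigma
    W = np.kron(Sigma, np.eye(n))
    W_inv = np.kron(np.linalg.inv(Sigma), np.eye(n))

    ols = np.linalg.inv(F.T @ F) @ (F.T @ W @ F) @ np.linalg.inv(F.T @ F)
    aitken = np.linalg.inv(F.T @ W_inv @ F)
    print(np.linalg.eigvalsh(n * (ols - aitken)).min() >= -1e-8)   # True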
7.  MONTE-CARLO SIMULATION

This section reports the results of a Monte-Carlo study which was undertaken to gain an indication of the adequacy of the asymptotic theory in applications. Briefly, the study indicates the need for some conservatism in setting confidence intervals and testing hypotheses using the standard error estimates obtained in step four of the procedure. This is primarily due to evidence that these estimates understate the actual standard deviations of the estimators $\hat\theta_{i\alpha}$ and, to some extent, due to departures in the shape of the small sample distributions of the $\hat\theta_{i\alpha}$ from the limiting normal distributions.

The details of the study are as follows. Bivariate normal errors were generated with

$$\Sigma = \begin{pmatrix} .001 & .0005 \\ .0005 & .001 \end{pmatrix}$$

and added to the response functions of Section 5 with

$$\theta^0 = (1,\ 1,\ 1,\ -1,\ 1,\ 1,\ -1)'$$

using the inputs in Table 1. The replication scheme is "constant in repeated samples" of size twenty-five. The sample size $n$ was taken as fifty, yielding two replicates of each input. The results of the simulation are summarized in Tables 2 and 3.
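A modern rendering of the data-generating step of this study might look as follows (a sketch of ours; the response functions are those of Section 5, the random seed is arbitrary, and the estimation calls are omitted).

    # Generate one Monte-Carlo replicate of the Section 5 model: bivariate
    # normal errors added to the response functions at theta0, inputs
    # "constant in repeated samples" with n = 50 (two replicates of 25 points).
    import numpy as np

    rng = np.random.default_rng(1973)
    Sigma = np.array([[.001, .0005], [.0005, .001]])
    theta0 = np.array([1., 1., 1., -1., 1., 1., -1.])

    x1 = np.tile(np.repeat([0, .25, .5, .75, 1.], 5), 2)   # constant within blocks
    x2 = np.tile(np.tile([0, .25, .5, .75, 1.], 5), 2)     # cycles within blocks

    def replicate():
        e = rng.multivariate_normal(np.zeros(2), Sigma, size=50)
        y1 = theta0[0] + theta0[1] * x1 + theta0[2] * np.exp(theta0[3] * x2) + e[:, 0]
        y2 = theta0[4] + theta0[5] * np.exp(theta0[6] * x2) + e[:, 1]
        return y1, y2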
Table 2.  Location and Shape Parameters of the Sampling and Asymptotic Distributions

                            Monte-Carlo                           Asymptotic
Parameter    Mean     Std. Dev.  Skewness  Kurtosis    Mean   Std. Dev.  Skewness  Kurtosis

Ordinary Least Squares
θ_11         .9789    .1082      1.0325    4.7232        1    .0892        0         3
θ_21        1.0001    .0124       .0251    2.7972        1    .0126        0         3
θ_31        1.0211    .1039     -1.0827    4.8957        1    .0855        0         3
θ_41       -1.0160    .1530      1.0728    2.9165       -1    .1407        0         3
θ_12         .9831    .0973       .0684    5.5266        1    .0890        0         3
θ_22        1.0164    .0939      -.9942    5.5581        1    .0855        0         3
θ_32        -.9952    .1378      -.0212    3.0470       -1    .1407        0         3

Seemingly Unrelated Regressions
θ_11         .9792    .1079      1.0249    4.7110        1    .0892        0         3
θ_21         .9996    .0106       .0258    2.8214        1    .0110        0         3
θ_31        1.0210    .1038     -1.0780    4.8379        1    .0855        0         3
θ_41       -1.0171    .1530      1.0673    2.9141       -1    .1407        0         3
θ_12         .9832    .0972       .0650    5.5006        1    .0890        0         3
θ_22        1.0164    .0938      -.9943    5.5267        1    .0855        0         3
θ_32        -.9952    .1377       .1084    3.0510       -1    .1407        0         3

Constrained Seemingly Unrelated Regressions
ρ_1          .9850    .0883       .9386    4.6118        1    .0771        0         3
ρ_2         1.0000    .0066       .0558    2.8926        1    .0069        0         3
ρ_3         1.0147    .0849      -.9445    4.6365        1    .0740        0         3
ρ_4         -.9950    .1275       .0770    3.1294       -1    .1219        0         3
Table 3.  Estimated Standard Errors

Parameter    Mean Estimated     Monte-Carlo Estimate      Ratio
             Standard Error     of the Standard Error

Seemingly Unrelated Regressions
θ_11             .0922                .1079                1.17
θ_21             .0104                .0106                1.02
θ_31             .0887                .1038                1.17
θ_41             .1335                .1530                1.15
θ_12             .0912                .0972                1.07
θ_22             .0878                .0938                1.07
θ_32             .1346                .1377                1.02

Constrained Seemingly Unrelated Regressions
ρ_1              .0782                .0883                1.13
ρ_2              .0067                .0066                 .99
ρ_3              .0753                .0849                1.13
ρ_4              .1164                .1275                1.10
Table 2 conveys the impression that the bias, skewness, and peakedness of the small sample distributions of the estimators $\hat\theta_{i\alpha}$ or $\hat\rho_i$ are set in the ordinary least squares phase of the estimation procedure and are not appreciably modified in the later unrestricted or restricted Aitken phase. No general observations are possible. Some parameters are estimated with positive bias and some with negative bias, some distributions are left skewed and some right skewed, some are leptokurtic and some platykurtic. The only effect of the Aitken phase of the estimation procedure is to reduce the standard errors of the estimators.

When each single equation model is linear, the variance reduction in the Aitken phase is entirely due to linear restrictions across equations. Usually these restrictions take the form of knowledge that some parameters entering the model are zero (see Zellner, 1962, p. 353). In a sense this is true in the nonlinear situation. We have seen previously that if the inputs $x_{t\alpha}$ are the same and each of the response functions $f_\alpha(x_\alpha, \theta_\alpha)$ have the same functional form, then there is no asymptotic variance reduction in the Aitken phase of the estimation procedure. Only when the functional forms differ or restrictions across equations are imposed does an opportunity for variance reduction arise.

The Monte-Carlo study indicates that these same considerations remain valid in the small sample distributions of the parameter estimators. The slight variation in functional form between $f_1(x_1, \theta_1)$ and $f_2(x_2, \theta_2)$ yields an improvement in efficiency. Interestingly, the asymptotic theory indicates an improvement for $\theta_{21}$ only, whereas the Monte-Carlo study indicates a slight gain in efficiency in the remaining parameters. The imposition of the restriction $\theta = g(\rho)$ results in additional improvement.
Table 3 indicates that standard errors obtained from the diagonal elements of $\hat\Omega^{-1}$ or $(\hat G' \hat\Omega \hat G)^{-1}$, as the case may be, will lead one astray. In the absence of further evidence, it would not be an unreasonable practice to increase estimates of standard errors by the factor 1.10 before use in setting confidence intervals or testing hypotheses.
ACKNOWLEDGEMENT

The author wishes to express his appreciation to his colleagues Professors Thomas M. Gerig and T. Dudley Wallace for their comments and suggestions.
REFERENCES

[1] Aitken, A. C. (1935) "On least squares and linear combination of observations." Proceedings of the Royal Society of Edinburgh, 55, pp. 42-48.

[2] Gallant, A. R. (1973) "Inference for linear models." Institute of Statistics Mimeograph Series, No. 889. Raleigh: North Carolina State University.

[3] Hartley, H. O. (1961) "The modified Gauss-Newton method for the fitting of nonlinear regression functions by least squares." Technometrics, 3, pp. 269-280.

[4] Jennrich, R. I. (1969) "Asymptotic properties of nonlinear least squares estimators." Annals of Mathematical Statistics, 40, pp. 633-643.

[5] Malinvaud, E. (1970) "The consistency of nonlinear regressions." Annals of Mathematical Statistics, 41, pp. 956-969.

[6] Marquardt, D. W. (1963) "An algorithm for least squares estimation of nonlinear parameters." Journal of the Society for Industrial and Applied Mathematics, 11, pp. 431-441.

[7] Rao, C. R. (1965) Linear Statistical Inference and Its Applications. New York: John Wiley and Sons, Inc.

[8] Theil, H. (1971) Principles of Econometrics. New York: John Wiley and Sons, Inc.

[9] Zellner, A. (1962) "An efficient method of estimating seemingly unrelated regressions and tests for aggregation bias." Journal of the American Statistical Association, 57, pp. 348-368.
APPENDIX: LEMMAS

The appendix contains a sequence of technical lemmas which are used to obtain the asymptotic properties of the estimators $\hat\rho$ and $\hat\theta$ in Section 4 and to compare the asymptotic efficiency of $\hat\theta$ with $\tilde\theta$ in Section 6. The almost sure convergence of $\tilde\theta$ to $\theta^0$ is given in Lemma A.2; the asymptotic distribution of $\sqrt{n}\,(\tilde\theta - \theta^0)$ is given by Lemma A.7. The almost sure convergence of $\hat\Sigma^{-1}$ to $\Sigma^{-1}$ and the rate of convergence are given by Lemma A.5. The remaining lemmas establish those implications of the weak convergence of the measures $\{\mu_n\}$ which are useful in Sections 4 and 6.
Lemma A.1: Let $K$ be a compact subset of the $m$-dimensional real numbers for some integer $m \ge 1$ and let $g(x, \tau)$ be continuous on $X \times K$. Under the Assumptions of Section 3, the sequence of integrals $\int g(x, \tau)\, d\mu_n(x)$ converges to the integral $\int g(x, \tau)\, d\mu(x)$ uniformly in $\tau$ over $K$.

Proof: Malinvaud (1970, p. 967). $\square$
Lemma A.2: Under the Assumptions of Section 3, the least squares estimators $\tilde\theta_\alpha$ converge almost surely to $\theta_\alpha^0$ and are of the form

$$\sqrt{n}\,(\tilde\theta_\alpha - \theta_\alpha^0) = \Bigl(\frac{1}{n}\, F_\alpha' F_\alpha\Bigr)^{-1} \frac{1}{\sqrt n}\, F_\alpha' e_\alpha + \sqrt{n}\, a_{\alpha n}$$

where $\sqrt{n}\, a_{\alpha n}$ converges in probability to zero. Moreover, $\sqrt{n}\,(\tilde\theta_\alpha - \theta_\alpha^0)$ converges in distribution to a $p_\alpha$-variate normal with mean $0$ and variance-covariance matrix $\sigma_{\alpha\alpha} V_{\alpha\alpha}^{-1}$.

Proof: Let $\mathfrak{A}_\alpha$ be the Borel subsets of $X_\alpha$ and $A_\alpha$ a set in $\mathfrak{A}_\alpha$. Define $\mu_{\alpha n}(A_\alpha) = \frac{1}{n} \sum_{t=1}^n I_{A_\alpha}(x_{t\alpha})$. Set $A = A_\alpha \times \bigl(\times_{\beta \ne \alpha} X_\beta\bigr)$ and set $\mu_\alpha(A_\alpha) = \mu(A)$. The measures $\mu_{\alpha n}$ converge weakly to the measure $\mu_\alpha$, so that Assumption 6 of Gallant (1973) is satisfied. If $g(x_\alpha) = 0$ a.e. $\mu_\alpha$ then $g(x_\alpha) = 0$ a.e. $\mu$ on $X$, so that Assumption 7 is satisfied. Since $\Omega$ is non-singular, $V_{\alpha\alpha}$ must be non-singular and Assumption 11 is satisfied. Assumptions 1, 2, 4, 5, 9, 10, 14, and 15 of Gallant (1973) have been assumed directly, so the Lemma follows from Theorems 2, 3, 4, and 5 of Gallant (1973). $\square$
Lemma A.3: Let the Assumptions of Section 3 hold and let $\theta_\alpha^*$ and $\theta_\beta^*$ converge almost surely to $\theta_\alpha^0$ and $\theta_\beta^0$. Then $n^{-1} F_\alpha'(\theta_\alpha^*)\, F_\beta(\theta_\beta^*)$ converges almost surely to $V_{\alpha\beta}$ for $\alpha, \beta = 1, 2, \ldots, M$.

Proof: The $ij$th element of $n^{-1} F_\alpha'(\theta_\alpha^*)\, F_\beta(\theta_\beta^*)$ may be written as

$$v_{ijn}(\theta_\alpha^*, \theta_\beta^*) = \int \frac{\partial}{\partial\theta_{i\alpha}}\, f_\alpha(x_\alpha, \theta_\alpha^*)\; \frac{\partial}{\partial\theta_{j\beta}}\, f_\beta(x_\beta, \theta_\beta^*)\; d\mu_n(x).$$

We may apply Lemma A.1 since $\Theta_\alpha \times \Theta_\beta$ is compact, so that $v_{ijn}(\theta_\alpha, \theta_\beta)$ has the uniform limit $v_{ij}(\theta_\alpha, \theta_\beta)$, which is continuous on $\Theta_\alpha \times \Theta_\beta$ being the uniform limit of continuous functions. The difference $v_{ijn}(\theta_\alpha^*, \theta_\beta^*) - v_{ij}(\theta_\alpha^*, \theta_\beta^*)$ converges almost surely to zero by uniform convergence, and the difference $v_{ij}(\theta_\alpha^*, \theta_\beta^*) - v_{ij}(\theta_\alpha^0, \theta_\beta^0)$ converges almost surely to zero by continuity. The sum of these two differences is $v_{ijn}(\theta_\alpha^*, \theta_\beta^*) - v_{ij}(\theta_\alpha^0, \theta_\beta^0)$ which, therefore, converges almost surely to zero. $\square$
Lemma A.4: Let $K$ be a compact subset of the $m$-dimensional real numbers for some integer $m \ge 1$ and let $g(x, \tau)$ be continuous on $X \times K$. Then for almost all realizations of the errors $\{e_{t\alpha}\}_{t=1}^\infty$ the series $n^{-1} \sum_{t=1}^n g(x_t, \tau)\, e_{t\alpha}$ converges to zero uniformly in $\tau$ over $K$.

Proof: By Lemma A.1, $n^{-1} \sum_{t=1}^n g^2(x_t, \tau)$ converges to $\int g^2(x, \tau)\, d\mu(x)$ uniformly in $\tau$ over $K$. The Lemma follows from Theorem 4 of Jennrich (1969). $\square$

Lemma A.5: Let the Assumptions of Section 3 hold. Then the estimators $\hat\sigma_{\alpha\beta} = n^{-1}\, \tilde e_\alpha' \tilde e_\beta$ converge almost surely to the corresponding elements $\sigma_{\alpha\beta}$ of $\Sigma$. For $n$ sufficiently large, the matrix $\hat\Sigma$ with elements $\hat\sigma_{\alpha\beta}$ is non-singular except on an event occurring with probability zero. The elements $\hat\sigma^{\alpha\beta}$ of $\hat\Sigma^{-1}$ converge almost surely to the corresponding elements $\sigma^{\alpha\beta}$ of $\Sigma^{-1}$. Moreover, $\sqrt{n}\,(\hat\sigma_{\alpha\beta} - \sigma_{\alpha\beta})$ is bounded in probability; that is, given $\delta > 0$ there is a bound $M_\delta$ such that for all $n$ sufficiently large we have $P\bigl(|\sqrt{n}\,(\hat\sigma_{\alpha\beta} - \sigma_{\alpha\beta})| \le M_\delta\bigr) > 1 - \delta$; and the elements of $\sqrt{n}\,(\hat\Sigma^{-1} - \Sigma^{-1})$ are bounded in probability.

Proof: The estimator may be rewritten as

$$\hat\sigma_{\alpha\beta} = \frac{1}{n}\, e_\alpha' e_\beta - \frac{1}{n}\,\bigl(f_\alpha(\tilde\theta_\alpha) - f_\alpha\bigr)' e_\beta - \frac{1}{n}\,\bigl(f_\beta(\tilde\theta_\beta) - f_\beta\bigr)' e_\alpha + \frac{1}{n}\,\bigl(f_\alpha(\tilde\theta_\alpha) - f_\alpha\bigr)'\bigl(f_\beta(\tilde\theta_\beta) - f_\beta\bigr).$$

The first term $\frac{1}{n}\, e_\alpha' e_\beta$ converges almost surely to $\sigma_{\alpha\beta}$ by the Strong Law of Large Numbers. The second term converges almost surely to zero by Lemma A.4; similarly for the third term. The fourth term is

$$w_{\alpha\beta n}(\tilde\theta_\alpha, \tilde\theta_\beta) = \int \bigl(f_\alpha(x_\alpha, \tilde\theta_\alpha) - f_\alpha(x_\alpha, \theta_\alpha^0)\bigr)\bigl(f_\beta(x_\beta, \tilde\theta_\beta) - f_\beta(x_\beta, \theta_\beta^0)\bigr)\, d\mu_n(x)$$

which, by Lemma A.1, converges uniformly in $(\theta_\alpha', \theta_\beta')'$ over the compact set $\Theta_\alpha \times \Theta_\beta$ to a continuous limit $w_{\alpha\beta}(\theta_\alpha, \theta_\beta)$. By the same arguments employed in Lemma A.3 we have that $w_{\alpha\beta n}(\tilde\theta_\alpha, \tilde\theta_\beta)$ converges almost surely to $w_{\alpha\beta}(\theta_\alpha^0, \theta_\beta^0) = 0$. The eventual non-singularity of $\hat\Sigma$ follows from the continuity of $\det(\Sigma)$ and the assumption that $\Sigma$ is positive definite. The almost sure convergence of $\hat\Sigma^{-1}$ to $\Sigma^{-1}$ follows from the standard continuity argument.

By Taylor's theorem, for given $\theta_\alpha$ in $\Theta_\alpha$ and $x_\alpha$ in $X_\alpha$ there is a $\bar\theta_\alpha$ on the line segment joining $\theta_\alpha$ to $\theta_\alpha^0$ such that

$$f_\alpha(x_\alpha, \theta_\alpha) = f_\alpha(x_\alpha, \theta_\alpha^0) + \nabla' f_\alpha(x_\alpha, \bar\theta_\alpha)\,(\theta_\alpha - \theta_\alpha^0).$$

Substituting this expansion in the four terms above, we may write

$$\sqrt{n}\,(\hat\sigma_{\alpha\beta} - \sigma_{\alpha\beta}) = \sqrt{n}\,\Bigl(\frac{1}{n}\, e_\alpha' e_\beta - \sigma_{\alpha\beta}\Bigr) - \sqrt{n}\,(\tilde\theta_\alpha - \theta_\alpha^0)'\, \frac{1}{n} \sum_{t=1}^n \nabla f_\alpha(x_{t\alpha}, \bar\theta_\alpha)\, e_{t\beta} - \sqrt{n}\,(\tilde\theta_\beta - \theta_\beta^0)'\, \frac{1}{n} \sum_{t=1}^n \nabla f_\beta(x_{t\beta}, \bar\theta_\beta)\, e_{t\alpha}$$
$$\qquad + \sqrt{n}\,(\tilde\theta_\alpha - \theta_\alpha^0)'\, \Bigl[\frac{1}{n} \sum_{t=1}^n \nabla f_\alpha(x_{t\alpha}, \bar\theta_\alpha)\, \nabla' f_\beta(x_{t\beta}, \bar\theta_\beta)\Bigr]\,(\tilde\theta_\beta - \theta_\beta^0).$$

The first term on the right is bounded in probability because $\sqrt{n}\,\bigl(\frac{1}{n} e_\alpha' e_\beta - \sigma_{\alpha\beta}\bigr)$ is asymptotically normally distributed by the Central Limit Theorem. The second term converges in probability to zero because $\sqrt{n}\,(\tilde\theta_\alpha - \theta_\alpha^0)$ is asymptotically normally distributed by Lemma A.2 and $\frac{1}{n} \sum_{t=1}^n \nabla f_\alpha(x_{t\alpha}, \bar\theta_\alpha)\, e_{t\beta}$ converges almost surely to zero by Lemma A.4; similarly for the third term. The fourth term converges in probability to zero because $\sqrt{n}\,(\tilde\theta_\alpha - \theta_\alpha^0)$ is asymptotically normally distributed, $(\tilde\theta_\beta - \theta_\beta^0)$ converges in probability to zero, and the elements of the matrix in brackets are uniformly bounded by

$$\sup\bigl\{\, \|\nabla f_\alpha(x_\alpha, \theta_\alpha)\|\, \|\nabla f_\beta(x_\beta, \theta_\beta)\| : (x', \theta_\alpha', \theta_\beta')' \in X \times \Theta_\alpha \times \Theta_\beta \,\bigr\},$$

which is finite by the compactness of $X \times \Theta_\alpha \times \Theta_\beta$ and the continuity of the partial derivatives. Thus $\sqrt{n}\,(\hat\sigma_{\alpha\beta} - \sigma_{\alpha\beta})$ is bounded in probability.

Using a Taylor series expansion, one can show that if $g(\tau)$ is a function with continuous first derivatives on a bounded open sphere $T$, $P(\hat\tau \in T)$ tends to one as $n$ tends to infinity, and $\sqrt{n}\,(\hat\tau - \tau)$ is bounded in probability, then $\sqrt{n}\,\bigl(g(\hat\tau) - g(\tau)\bigr)$ is bounded in probability. The matrix $\Sigma$ is positive-definite, so there is an $\epsilon > 0$ such that $\det(\Sigma) - \epsilon > 0$ and $P\bigl(\det(\Sigma) - \epsilon < \det(\hat\Sigma) < \det(\Sigma) + \epsilon\bigr)$ tends to one as $n$ tends to infinity. Since the elements of $\hat\Sigma^{-1}$ are continuously differentiable functions of the elements of $\hat\Sigma$ on this event, the elements of $\sqrt{n}\,(\hat\Sigma^{-1} - \Sigma^{-1})$ are bounded in probability. $\square$
Lemma A.6: Under the Assumptions listed in Section 3, $\frac{1}{\sqrt n}\, F'(\Sigma^{-1} \otimes I)\, e$ converges in distribution to a $p$-variate normal with mean $0$ and variance-covariance matrix $\Omega$.

Proof: We will apply the Central Limit Theorem stated in Problem 4.7 of Rao (1965, p. 118). We may write

$$\frac{1}{\sqrt n}\, F'(\Sigma^{-1} \otimes I)\, e = \frac{1}{\sqrt n} \sum_{t=1}^n K_t\, \Sigma^{-1} e_t = \frac{1}{\sqrt n} \sum_{t=1}^n \xi_t$$

where $K_t$ is the $p$ by $M$ block diagonal matrix

$$K_t = \begin{pmatrix} \nabla f_1(x_{t1}, \theta_1^0) & 0 & \cdots & 0 \\ 0 & \nabla f_2(x_{t2}, \theta_2^0) & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & \nabla f_M(x_{tM}, \theta_M^0) \end{pmatrix}.$$

The $\xi_t = K_t\, \Sigma^{-1} e_t$ are independent by the independence of the $e_t$, and

$$\frac{1}{n} \sum_{t=1}^n \mathcal{E}(\xi_t \xi_t') = \frac{1}{n} \sum_{t=1}^n K_t\, \Sigma^{-1} K_t',$$

which converges to $\Omega$ by the weak convergence of the measures $\mu_n$. Let $w_t = \Sigma^{-1} e_t$ and let $H(w)$ be the distribution function of $w_t$ (which does not depend on $t$). For given $\epsilon > 0$ it remains to verify the Lindeberg-type condition

$$\lim_{n\to\infty} \frac{1}{n} \sum_{t=1}^n \int_{B_{tn}} \|K_t w\|^2\, dH(w) = 0, \qquad B_{tn} = \{w : \|K_t w\|^2 > n\epsilon^2\}.$$

Set

$$\bar K = \sup_\alpha \sup_{X_\alpha} \|\nabla f_\alpha(x_\alpha, \theta_\alpha^0)\|^2,$$

which is finite because each $X_\alpha$ is compact and the partial derivatives are continuous. Then $\|K_t w\|^2 \le \bar K\, w'w$, so that $B_{tn} \subset C_n = \{w : w'w > n\epsilon^2 / \bar K\}$ and

$$\frac{1}{n} \sum_{t=1}^n \int_{B_{tn}} \|K_t w\|^2\, dH(w) \le \bar K \int_{C_n} w'w\, dH(w).$$

This term converges to zero as $n$ tends to infinity because $\int w'w\, dH(w)$ is finite. $\square$

Lemma A.7: Under the Assumptions listed in Section 3, $\sqrt{n}\,(\tilde\theta - \theta^0)$ converges in distribution to a $p$-variate normal with mean $0$ and variance-covariance matrix $V^{-1} T V^{-1}$.

Proof: By Lemma A.2,

$$\sqrt{n}\,(\tilde\theta - \theta^0) = \Bigl(\frac{1}{n}\, F'F\Bigr)^{-1} \frac{1}{\sqrt n}\, F' e + \sqrt{n}\, a_n$$

where $\sqrt{n}\, a_n$ converges in probability to $0$. The matrix $\bigl(\frac{1}{n}\, F'F\bigr)^{-1}$ converges to $V^{-1}$ by Lemma A.3. Using arguments similar to those used in the proof of Lemma A.6, we have that

$$\frac{1}{\sqrt n}\, F' e = \frac{1}{\sqrt n} \sum_{t=1}^n K_t\, e_t$$

converges in distribution to a $p$-variate normal with mean $0$ and variance-covariance matrix $T$. $\square$