A TEST FOR NONLINEARITY
IN THE GENERAL MULTIPLE REGRESSION MODEL
BASED UPON SQUARED DIFFERENCES
BETWEEN PAIRS OF RESIDUALS
by
Nancy Irene Llfons
Institute of Statistics
Mimeograph Series No. 1039
Raleigh, N.C.
1975
TABLE OF CONTENTS

1  INTRODUCTION ............................................... 1

2  REVIEW OF LITERATURE ....................................... 8

3  TEST PROCEDURE ............................................. 13
   3.1  Distance Function ..................................... 13
   3.2  Construction of D_L ................................... 15
   3.3  Test Statistic ........................................ 16
   3.4  Moments of r under H_0 ................................ 20
   3.5  Distribution of r ..................................... 26
   3.6  Bounds for the Critical Values of the Test Statistic .. 32

4  ALTERNATIVE HYPOTHESIS ..................................... 37
   4.1  The Alternative Hypothesis ............................ 37
   4.2  Properties of r under H_A ............................. 38
   4.3  Method of Approximating the Power of the Test ......... 42

5  LARGE SAMPLE PROPERTIES .................................... 46
   5.1  Large Sample Comparisons .............................. 46
   5.2  Asymptotic Normality under Pitman's Alternative ....... 48
   5.3  Asymptotic Relative Efficiency of r and F ............. 55
   5.4  Selection of Pairs for D_L ............................ 58

6  RESULTS OF SIMULATION STUDIES AND APPLICATIONS ............. 60
   6.1  Accuracy of Approximations to the Critical Region
        under H_0 ............................................. 60
   6.2  Power Studies ......................................... 66
   6.3  Application of the Test Procedure ..................... 71

7  SUMMARY AND CONCLUSIONS .................................... 74
   7.1  Summary and Conclusions ............................... 74
   7.2  Suggestions for Future Research ....................... 77

8  REFERENCES ................................................. 80

9  APPENDIX ................................................... 83
CHAPTER 1
INTRODUCTION
The motivation for the following research arises from a problem in the field of sociology.  Suppose a general multiple linear regression model is proposed to represent a relationship in which each observation of the dependent variable and k independent variables represents a set of measurements describing a particular collection of sociological characteristics of an individual.
The general linear model is expressed as

    y = Xβ + ε,                                                  (1.1.1)

where y' = (y_1, y_2, ..., y_n) is the vector of observations of the dependent variable.  The matrix X is an n x k matrix of independent variables assumed to be fixed, β is a vector of k unknown parameters, and ε is a vector of n unobservable independent error terms, each distributed normally with mean zero and variance σ², or ε ~ N(0, Iσ²).  It is assumed that the matrix X has full column rank and that a term representing a general mean is included in the model.  This assumption implies that one column of the matrix X is an n x 1 vector with each element equal to 1.  Model (1.1.1) with its corresponding assumptions will be referred to as H_0.
The matrix X may be written as

    X = (x_1', x_2', ..., x_n')',                                (1.1.2)

where x_i' = (x_i1, x_i2, ..., x_ik) is a 1 x k vector for i=1,2,...,n,¹ and the x_ij's are expressed as deviations from the mean for each variable.  Then the model (1.1.1) can be written alternatively as

    y_i = Σ_{j=1}^k β_j x_ij + ε_i,    i=1,2,...,n.
According to the method of least squares, β is estimated under H_0 by b = (X'X)^{-1}X'y, and the vector

    z = y - ŷ,  where  ŷ = Xb,                                  (1.1.3)

is called the vector of residuals.  The vector z can be written as

    z = y - Xb = (I - X(X'X)^{-1}X')y = My,                      (1.1.4)

where M = I - X(X'X)^{-1}X'.  Furthermore, since MX = 0,

    z = M(Xβ + ε) = Mε,                                          (1.1.5)

and z is therefore a random variable distributed as N(0, Mσ²).
In this situation a test is often needed for lack of fit of the proposed model which is sensitive to an alternative specification, namely that the relationship between the dependent and the independent variables is expressed by the addition to the model of one or more terms, each of which is a nonlinear function of one or more of the independent variables.  The total number of observations is often small, and there may be no replications at any combination of values of the independent variables.

¹A typical vector from the set x_i, i=1,2,...,n, will often be referred to as x.
If it is assumed that the alternative specification of the relationship is given by the addition to the model equation of a single term which is a nonlinear function of one of the k independent variables, then the usual procedure for testing the significance of a regression coefficient can be used.  This procedure involves including the suspected term in the model equation and performing the partial F or t test that its coefficient is zero.  If there is no indication from sources independent of the fitted model as to the functional form of the possible alternatives, then a common procedure is to test the significance of numerous probable alternative models.  However, the interpretation of the significance levels of such a sequence of tests is difficult (see Ashar and Wallace (1972)).
The test commonly used against a general nonlinear alternative is the classical lack of fit test due to R. A. Fisher (1922).  Under the above assumptions this procedure requires that there be repeated observations at one or more values of the independent variable.  Within the framework of model (1.1.1) suppose there are p < n distinct values of x_i, which are denoted by x̃_1, x̃_2, ..., x̃_p.  Let each x̃_i occur n_i times with associated observations y_ij, j=1,...,n_i, where n = Σ_{i=1}^p n_i.  The error sum of squares (SSE) from fitting model (1.1.1) by the method of least squares has n-k degrees of freedom and can be partitioned into two component sums of squares.  The first, called the pure error sum of squares (SSPE), is defined by

    SSPE = Σ_{i=1}^p Σ_{j=1}^{n_i} (y_ij - ȳ_i)²,               (1.1.6)
where ȳ_i = Σ_{j=1}^{n_i} y_ij/n_i, i=1,...,p, and is associated with n-p degrees of freedom.  The second sum of squares, called the lack of fit sum of squares (SSLF), is obtained as SSLF = SSE - SSPE and is associated with p-k degrees of freedom.  The quantities SSPE/(n-p) and SSLF/(p-k) are independent, the former being an unbiased estimate of random error under both H_0 and H_A, and the latter being an unbiased estimate of σ² under H_0 but sensitive to lack of fit of the proposed model.  The test statistic

    F = [SSLF/(p-k)] / [SSPE/(n-p)]                              (1.1.7)

is compared to tabulated percentage points of the F(p-k, n-p) distribution to determine the adequacy of the model.  When the above assumptions are met, this is a UMP invariant test against a p-parameter nonlinear alternative model (see Kendall and Stuart (1963)).
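The partition SSE = SSPE + SSLF and the ratio (1.1.7) are easy to compute directly.  The following sketch (the helper name and data layout are ours, not part of the thesis) carries out the test for a design with replicate observations:

```python
import numpy as np

def lack_of_fit_F(X, y, groups):
    """Classical lack-of-fit F statistic of (1.1.7).

    X      : n x k design matrix (full column rank, includes the mean column)
    y      : n-vector of observations
    groups : length-n labels; rows sharing a label are replicates at one x value
    """
    n, k = X.shape
    b = np.linalg.lstsq(X, y, rcond=None)[0]        # least-squares estimate of beta
    z = y - X @ b                                   # residual vector, n - k d.f.
    sse = float(z @ z)
    labels = np.unique(groups)
    p = len(labels)                                 # number of distinct x values
    # pure-error sum of squares: variation of replicates about their group means
    sspe = sum(float(np.sum((y[groups == g] - y[groups == g].mean()) ** 2))
               for g in labels)
    sslf = sse - sspe                               # lack-of-fit sum of squares
    F = (sslf / (p - k)) / (sspe / (n - p))
    return F, (p - k, n - p)
```

When the group means fall exactly on a line, SSLF vanishes and F is zero, which gives a quick sanity check of the partition.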
In practice, if there are no replications but the number of observations is large enough, groups of observations at "nearly identical" values of the independent variable vector are formed, and the F-type test of (1.1.7) is carried out assuming that observations within these groups are replicates.  An example of the use of this modification is given in Daniel and Wood (1971).  Corresponding to the notation in (1.1.6), suppose there are p disjoint sets G_i, i=1,2,...,p, each of n_i near-replicates, with n_i > 1.  For each x_j ∈ G_i let y_ij, j=1,...,n_i, be the associated observations.  In this situation SSPE is now computed as
    SSPE = Σ_{i=1}^p Σ_{j∈G_i} (y_ij - ȳ_i)²,

where ȳ_i = Σ_{j∈G_i} y_ij/n_i, and is treated as though it had a chi-square distribution with n-p degrees of freedom under H_0.  Otherwise the test is carried out in the same manner.
If replication does not exist, the test is no longer a likelihood ratio test, since the actual distribution of the test statistic is altered.  This alteration may result in a disruption of the significance levels of the test.  Attempting to determine appropriate critical values by approximating the distribution of the test statistic is difficult due to the form of the function.  In addition, the rules for forming the groups G_i can be very arbitrary.  When the number of observations is small, it may not be possible to group together values of x in G_i which are close together relative to the overall dispersion of the set of data points, and it is reasonable to suspect that the optimal property of the F-type test might be affected when the distances between values within the groups G_i are large.  For these reasons it becomes meaningful to consider alternatives to the F-type lack of fit test.
Although the responses associated with near-neighbor pairs of the independent variable cannot be considered as being from the same population, the means of the two populations do not differ appreciably compared to means associated with an arbitrary pair of values of x.  The purpose of this investigation is to develop an alternative test procedure based upon a test statistic which is a function of pairs of residuals associated with pairs of near-neighbor values of x.  The basic functional unit is

    b_ij = (z_i - z_j)²/2,

where z_i and z_j are the residuals associated with the pair (x_i, x_j).  Under H_0 the quantity b_ij is a biased estimate of σ²; however, the bias may be small for near-neighbor pairs relative to that of a b_ij associated with an arbitrary pair (x_i, x_j).  In the case of small samples, rather than ignore the distances between these pairs, the approach taken will be to quantify them and allow them to exert their relative influences on the test procedure.  This is done by approximating the distribution of the test statistic using its moments under H_0.  Analytic expressions for these moments can be found for the particular test statistic selected.
The test procedure is then compared with the modification of the F test in a two-variable case as to its power against certain nonlinear alternatives under several patterns of dispersion of the independent variable values.  A review of the relevant literature is given in Chapter 2.  Chapters 3 and 4 deal with the test statistic and its basic properties under the null and general alternative hypotheses, respectively.  In Chapter 3 an approximate test is derived for the case in which the set of near-neighbors consists of disjoint groups of equal size.  The asymptotic relative efficiency of the test with respect to the F test is considered in Chapter 5.  Data are simulated under several alternatives, and discussion and presentation of results concerning the power, as well as a demonstration of the application of the test procedure to some sociological data, are included in Chapter 6.  A summary of the results of the thesis and some suggested problems for future research are discussed in Chapter 7.
CHAPTER 2
REVIEW OF LITERATURE
One of the oldest tests for general lack of fit in the linear regression model is the one discussed in Chapter 1 due to R. A. Fisher.  As do many of the classical methods, this test requires an independent estimate of random error about the model.  This estimate is usually obtained from the presence of several observations of y at one or more values of x.  Alternatively, if the sample is large enough, the x-values are grouped so that several y-values are associated with each group, and an estimate of error is obtained within groups.  Often neither of these two conditions can be met in practice, especially for survey data.
When this is the case, several approaches may be taken.  One approach is to choose statistics which express the overall shape of the data points and are therefore sensitive to departures from the conditions in question.  Included in these are graphical techniques such as the plotting of residuals versus fitted values of y (see Anscombe and Tukey (1963)).  A second approach, which was introduced in Chapter 1, involves the formation of groups of near-replicates in order to obtain an estimate of the variance of random error.  Still another approach is through the use of transformations.  For example, in the case of the BLUS (Best Linear Unbiased Scalar covariance matrix) residuals of Theil (1965), the residuals from the fitted regression are transformed into a set of n-k BLUS residuals with variance-covariance matrix equal to a scalar times the identity matrix.
In a paper by Ramsey (1969), four tests based upon statistics which are functions of the BLUS residuals are developed to test against four basic types of departures from the assumed model.  The approach taken in this investigation is to consider only the untransformed residuals.  The effects of using transformed residuals could be considered in future research.  In this chapter a review is given of a few selected techniques which depend upon different types of functions of the residuals from the fitted regression.  It is assumed throughout that there are no replications of observations at any values of x.
For obvious reasons, more work seems to have been done concerning the problem of lack of fit in the simple rather than the multiple regression model.  Many problems concerning the handling of a multiple regression model are best attacked by considering first a simple regression model involving one dependent variable, possibly the residuals from a fitted regression, and one independent variable.  Although his emphasis is in applications to design problems, Anscombe (1961) proposed the use of two statistics in this connection.  These statistics, g_1 and g_2, are based upon the third and fourth power sums of the residuals and are estimates of the skewness and kurtosis, respectively, of the error distribution of the proposed model.  Examples of approximate tests of significance are given.  Tukey (1949) proposes a test which he calls "one degree of freedom for additivity", designed to detect a nonlinear relationship between the independent and dependent variables.  The technique essentially involves examination of a regression of the residuals on the squares of the predicted y values, ŷ.
The test is based upon the statistic

    f = (z'ŷ²)² / (ŷ²'Mŷ²),

where ŷ²' = (ŷ_1², ŷ_2², ..., ŷ_n²).  This quantity is subtracted from the residual sum of squares and compared with the remainder against the values of an F distribution with 1 and n-k-1 degrees of freedom.
In the simple regression case, Thornby (1972) develops a nonparametric test based upon the signs of the residuals for testing
linearity of regression against convex alternatives.
The procedure
involves division of the independent variable axis into at least
three disjoint intervals and recording the sign of the residuals
in each group for comparison.
Using the distribution theory of
U statistics, he develops the test procedure and examines its
behavior under various distributions of the error term.
The mean-square-successive-difference statistic based on residuals from a simple regression model is used for testing against
an alternative of serial correlation between successive observations (see Durbin and Watson (1950), (1951)).
The statistic is
employed by Prais and Houthakker (1955) to test against nonlinearity.
Although it is not mentioned in their discussion, the procedure
would seem to give the best results in the case of nearly equally
spaced values of the independent variable.
In the case of equally
spaced observations the statistic proposed in Chapter 3 is equivalent to that of Prais and Houthakker in the simple regression
case.
However the method of determining significant values of the
statistic proposed here is different.
As mentioned earlier, the second approach, which is basically the one taken here, is the determination of groups of near-replicates through the introduction of an appropriate distance function defined on the k-dimensional space of the independent variable.  In a method suggested by Daniel and Wood (1971), an estimate of pure error is obtained from a set of near-neighbors which is determined by the following interpoint distance function:

    WSSD_jj' = Σ_{i=1}^k [b_i(x_ij - x_ij')/S_y]²,

where b_i, i=1,...,k, are the estimates of the regression coefficients, j, j'=1,2,...,n, j ≠ j', and S_y is the error mean square from the fitted regression equation.  The first step in the procedure is to order the observations according to the values of ŷ.  Then WSSD_jj' is computed for the 4n-10 pairs of points separated by 0, 1, 2, and 3 intervening ŷ values.  A running average estimate of σ, based on the assumption that the residuals could be treated as if they were independent normal observations, is obtained as

    S_n(a) = 0.886 Σ_{D(a)} |z_j - z_j'| / a,

where Σ_{D(a)} indicates summation over the first a of the 4n-10 pairs of residuals in increasing order of WSSD_jj'.  The point a at which the value of S_n(a) "stabilizes" is taken as the estimate of σ.  This value is then compared with the estimate of σ obtained from the error sum of squares from fitting the original regression model.  No explicit decision rule is given either for selecting the value of a or for determining significant values of the ratio.  Furthermore, Daniel and Wood give no consideration to the possibility that a high positive correlation may exist between the two estimates of σ.  The presence of such correlation would tend to bias the results of the test toward acceptance of the model.
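Daniel and Wood's procedure can be sketched as follows.  The construction of the 4n-10 pairs and the choice of the stopping point a are the loosely specified parts, so this is only an illustration under our own reading of the procedure (all names are ours):

```python
import numpy as np

def running_sigma_estimates(X, b, z, s_y):
    """Sketch of the running estimate S_n(a) of sigma.

    Pairs of observations with 0 to 3 intervening positions in the ordering
    by fitted value give 4n - 10 pairs in all; S_n(a) is 0.886 times the
    mean |z_j - z_j'| over the a pairs of smallest WSSD.
    """
    n = X.shape[0]
    order = np.argsort(X @ b)                     # order observations by fitted value
    pairs = [(order[t], order[t + lag])
             for lag in (1, 2, 3, 4) for t in range(n - lag)]
    # weighted squared standardized distance WSSD for each pair
    wssd = [float(np.sum((b * (X[j] - X[jp]) / s_y) ** 2)) for j, jp in pairs]
    ranked = [pairs[i] for i in np.argsort(wssd)]
    diffs = np.array([abs(z[j] - z[jp]) for j, jp in ranked])
    # running average: S_n(a) for a = 1, ..., 4n - 10
    return 0.886 * np.cumsum(diffs) / np.arange(1, len(diffs) + 1)
```

The constant 0.886 ≈ √π/2 converts the mean absolute difference of two independent normal residuals into an estimate of σ.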
In the following investigation relatively few assumptions are made in developing the test procedure.  Consequently the set of alternatives is very large, and the conclusions that can be drawn as to the true relationship when the null hypothesis is rejected are necessarily limited.
CHAPTER 3
TEST PROCEDURE
3.1  Distance Function
In order to determine the set of near-neighbors in the k-dimensional space of the independent variable, a distance function must be chosen which is defined between every possible pair of points (x_i, x_j).  The Mahalanobis generalized interpoint distance, D, is chosen as the criterion.  For two sample points x_i and x_j, this function is defined by

    D(x_i, x_j) = [(x_i - x_j)'S^{-1}(x_i - x_j)]^{1/2},         (3.1.1)

where S^{-1} = (X'X/n)^{-1} = nC_n^{-1}, with C_n = X'X.  When X is of full column rank, X'X is nonsingular and positive definite, and therefore D is a metric on R^k.²
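Definition (3.1.1) is straightforward to compute for all pairs at once; the following sketch (the function name is ours) does so from the matrix of independent variables:

```python
import numpy as np

def mahalanobis_D(X):
    """All Mahalanobis interpoint distances D(x_i, x_j) of (3.1.1).

    X is the n x k matrix of independent variables (deviations from the
    mean); S^{-1} = (X'X/n)^{-1} = n C_n^{-1}.  Returns an n x n matrix.
    """
    n, _ = X.shape
    S_inv = np.linalg.inv(X.T @ X / n)
    diff = X[:, None, :] - X[None, :, :]        # (n, n, k) array of x_i - x_j
    D2 = np.einsum('abi,ij,abj->ab', diff, S_inv, diff)
    return np.sqrt(D2)
```

The invariance of D to a change of scale of the data points (discussed below) can be verified directly: mahalanobis_D(c * X) equals mahalanobis_D(X) for any c > 0.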
The interest in this investigation lies in detecting nonlinearity in one or more of the independent variables.  Therefore the distance function for determining the near-neighbor vector pairs should weight the independent variables according to the relative spread of the data points for each component of the vector x.  To illustrate this property of the Mahalanobis distance function it is helpful to begin by making the assumption that

    lim_{n→∞} (X'X/n) = Σ,                                       (3.1.2)

where Σ is a k x k matrix of constants.  If the vectors x_i are orthogonal, the matrix C_n is diagonal for each n.  As n increases, C_n/n approaches a matrix which must itself be diagonal.  The set of all points x in R^k which are a constant distance d_0 from a vector x_0 forms a k-dimensional ellipsoid around x_0.  For fixed n the length of the i-th axis measured on the Euclidean scale, E_0i, is d_0/√(n(C_n^{-1})_ii), where (C_n^{-1})_ii is the i-th diagonal element of C_n^{-1} (i=1,...,k).  As n increases, the length of the corresponding axis on the Mahalanobis scale tends to the value E_0i √((Σ^{-1})_ii), where (Σ^{-1})_ii is the corresponding element of Σ^{-1}.  Therefore in the limit the Mahalanobis distance weights the relative influence of each component of a vector as if each were measured on the same scale.  If the vectors x are not orthogonal, the crossproducts of the components are also taken into account in the weighting imposed by D.

²See Lancaster (1969).
The function D is invariant to changes in the scale of the data points.  If x_i* = λx_i, i=1,...,n, and λ > 0, then

    D(x_i*, x_j*) = [λ²(x_i - x_j)'(λ²C_n/n)^{-1}(x_i - x_j)]^{1/2} = D(x_i, x_j).

If the set of all possible unordered pairs of the vectors x_i (i=1,2,...,n) is denoted by Π, then the elements of the set Π may be ordered by D.  The ordering imposed by D is unchanged by any transformation of the form aD^b, a, b > 0.  Therefore, to simplify matters when the actual calculations are performed in computer program routines, the quantity D²(x_i, x_j)/n is often used when the concern is only with the ordering of the pairs (x_i, x_j).

3.2  Construction of D_L
Let the set of near-neighbor pairs of the vectors x_i be denoted by D_L, and let the number of distinct unordered pairs in D_L be denoted by n_L.  With respect to the construction of D_L it is useful to introduce the notion of the index of a point.  Let the index I_i of a point x_i (i=1,2,...,n) be defined as the number of distinct unordered pairs in D_L for which x_i is one of the coordinates.  It is clear that the index of any point x_i can take values 0,1,2,...,n-1.

In forming D_L it may be necessary to impose a uniform bound on the indices of x.  In other words the restriction

    I_i ≤ I_L,    i=1,2,...,n,

may be imposed.  Alternatively, an upper bound N_L may be placed upon the size of D_L, or a maximum value d_L may be placed upon the distance allowed between a pair of elements in D_L.  Then the pairs in Π are ordered by their value of D.  In order, beginning with the pair with smallest interpoint distance, all the pairs are included into D_L until one or more of the three restrictions causes the inclusion process to stop, i.e., until either n_L > N_L, or D > d_L, or the index of every point is equal to I_L.  Specifically, let D_L be written as D_L = D(N_L, d_L, I_L), where N_L, d_L, and I_L are, respectively, the maximum values allowed for the number of pairs in D_L, the distance between any pair in D_L, and the index of any point x.  Then D_L = D(N_L,-,-) implies that D_L is restricted only by size, D_L = D(-,d_L,-) implies that D_L is restricted only by distance, and D_L = D(-,-,I_L) implies that D_L is restricted only by point indices.  Any combination of these three restrictions may also be imposed, yielding a total of seven possibilities.
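One reading of these stopping rules is the greedy sketch below.  The text does not fully specify what happens to a pair whose endpoint has already reached the index bound I_L; here such a pair is skipped and the process stops once every point is saturated (an assumption of ours):

```python
from itertools import combinations

def build_DL(Dmat, N_L=None, d_L=None, I_L=None):
    """Greedy construction of D_L = D(N_L, d_L, I_L) from a distance matrix."""
    n = len(Dmat)
    # all unordered pairs of Pi, ordered by interpoint distance
    pairs = sorted(combinations(range(n), 2), key=lambda p: Dmat[p[0]][p[1]])
    DL, index = [], [0] * n               # index[i] is I_i, the index of x_i
    for i, j in pairs:
        if N_L is not None and len(DL) >= N_L:
            break                         # size bound reached
        if d_L is not None and Dmat[i][j] > d_L:
            break                         # every later pair is at least this distant
        if I_L is not None:
            if all(c >= I_L for c in index):
                break                     # the index of every point equals I_L
            if index[i] >= I_L or index[j] >= I_L:
                continue                  # pair would exceed the index bound; skip it
        DL.append((i, j))
        index[i] += 1
        index[j] += 1
    return DL
```

Any combination of the three bounds can be passed, matching the seven possibilities noted above.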
The set D_L will be called the set of near-neighbor pairs of the vectors x_i, i=1,2,...,n.  For convenience in notation, D_L is considered interchangeably with the set of subscripts of its elements, i.e., alternatively,

    D_L = {(i,j) | (x_i, x_j) is a near-neighbor pair of the set x_i, i=1,2,...,n}.

3.3  Test Statistic
The test statistic proposed in this investigation is

    r = [(n-k)/(2n_L)] Σ_{(i,j)∈D_L} (z_i - z_j)² / Σ_{i=1}^n z_i²,    (3.3.1)

where z_i and z_j are the residuals from the fitted regression model which correspond to the pair (x_i, x_j).  In the case that I_L of (3.2.1) is 2 (i.e., D_L = D(-,-,2)) while k=1 and the observations are equally spaced, this statistic reduces to the mean-square-successive-difference statistic discussed in Chapter 2.
Suppose the argument z of the statistic (3.3.1) is replaced by a vector y of independently distributed random normal variables whose observations are taken simultaneously with an equally spaced variable (e.g., time).  In this situation Kamat (1963) considers several alternatives to (3.3.1), including

    Σ_{i=1}^{n-1} |y_i - y_{i+1}|                                (3.3.2)

and

    [(n-1) Σ_{i=1}^{n-2} (y_i - 2y_{i+1} + y_{i+2})²] / [(n-2) Σ_{i=1}^n (y_i - ȳ)²],    (3.3.3)

in testing independence versus pure quadratic and sinusoidal trend.  Using the criterion of asymptotic relative efficiency, Kamat concludes that the form (3.3.1) is the best among the alternatives.
Both the numerator and denominator of (3.3.1) are quadratic forms in z, so that r may be represented in matrix notation as the ratio

    r = [(n-k)/(2n_L)] z'P_L z / z'z,                            (3.3.4)

where the matrix P_L is defined elementwise by

    (P_L)_ij = -1,  if (x_i, x_j) ∈ D_L and i ≠ j,
    (P_L)_ij =  0,  if (x_i, x_j) ∉ D_L and i ≠ j,
    (P_L)_ii = -Σ_{j≠i} (P_L)_ij,    i=1,2,...,n.
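The equivalence of (3.3.1) and the matrix form (3.3.4) can be checked numerically; the sketch below (helper names are ours) computes r both from the pairwise differences and from z'P_L z:

```python
import numpy as np

def residuals(X, y):
    """z = My, with M = I - X(X'X)^{-1}X' the residual projector of (1.1.4)."""
    M = np.eye(len(y)) - X @ np.linalg.inv(X.T @ X) @ X.T
    return M @ y

def r_statistic(X, y, DL):
    """The statistic r of (3.3.1)."""
    n, k = X.shape
    z = residuals(X, y)
    num = sum((z[i] - z[j]) ** 2 for i, j in DL)
    return (n - k) / (2 * len(DL)) * num / float(z @ z)

def P_L_matrix(n, DL):
    """The matrix P_L of (3.3.4): off-diagonal -1 on pairs in D_L,
    diagonal entries equal to the point indices I_i."""
    P = np.zeros((n, n))
    for i, j in DL:
        P[i, j] = P[j, i] = -1.0
        P[i, i] += 1.0
        P[j, j] += 1.0
    return P
```

For any pair list, z'P_L z reproduces the sum of squared residual differences exactly, which is the content of (3.3.4).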
Alternatively, using (1.1.5),

    r = [(n-k)/(2n_L)] ε'MP_L Mε / ε'Mε.                         (3.3.5)

Since M is idempotent, M and MP_L M commute with each other.  Therefore there exists an orthogonal transformation which simultaneously reduces the matrices of the numerator and denominator of r to a diagonal matrix and the identity matrix, respectively.  If ε = Hε* represents this orthogonal transformation, r becomes

    r = [(n-k)/(2n_L)] ε*'Vε* / ε*'I_0 ε*,                       (3.3.6)

where V is an n x n diagonal matrix whose elements are the eigenvalues of MP_L M, and I_0 = diag(I_{n-k}, 0).  Let the rank of MP_L M be denoted by m.  Then the rank of V is also m, where m is at most equal to n-k, implying that at least k of the eigenvalues of V are zero.  Denote the remaining n-k eigenvalues by v_1, v_2, ..., v_{n-k}, where n-k-m of these are also zero.  Since H is an orthogonal matrix, the vector ε* is distributed as ε* ~ N(0, Iσ²), and without loss of generality the statistic r can be considered to have the form

    r = [(n-k)/(2n_L)] Σ_{i=1}^{n-k} v_i ε_i*² / Σ_{i=1}^{n-k} ε_i*²    (3.3.7)

under H_0.
Any ratio of the form (3.3.6) is known as a Rayleigh quotient and has the following useful properties (see Lancaster (1969)).

Theorem 3.3.1.  The quotient r = ε*'Vε*/ε*'ε* is invariant to changes in scale of the vector ε*, i.e., if w* = λε*, λ ≠ 0, then

    w*'Vw* / w*'w* = λ²ε*'Vε* / (λ²ε*'ε*) = ε*'Vε* / ε*'ε*.     (3.3.8)

Theorem 3.3.2.  If v_1 and v_{n-k} are the smallest and largest eigenvalues, respectively, of V, then for every vector ε* ≠ 0,

    v_1 ≤ ε*'Vε* / ε*'ε* ≤ v_{n-k}.                              (3.3.9)

In terms of the distribution theory of r, equation (3.3.8) implies that the value of r is independent of σ² under H_0.  The inequality (3.3.9) guarantees that for fixed n all finite moments of r exist, since r is bounded by two constant functions.
3.4  Moments of r under H_0
Analytic expressions for the moments of r can be obtained by
using the following theorem due to Pitman (1937).
Theorem 3.4.1 (Pitman).  Let the random variable w have a continuous distribution with density function

    f_m(w) = [1/Γ(m)] e^{-w} w^{m-1},    w > 0,

where m > 0 and Γ(m) denotes the gamma function with parameter m.  Let w_1, w_2, ..., w_n be n independent random variables, each with w_i ~ f_{m_i}(w_i), i=1,2,...,n, with the m_i's not necessarily equal.  Let H(w_1, w_2, ..., w_n) be any function of the w_i's which is independent of scale, i.e., H(λw_1, ..., λw_n) = H(w_1, ..., w_n) for fixed λ.  Then the variables h = Σ_{i=1}^n w_i and H(w_1, w_2, ..., w_n) are independent.  The proof can be found in Pitman (1937).

Let r = v/h, where v = ε*'Vε* and h = ε*'I_0 ε*.  Under the null hypothesis, each ε_i*² is distributed as σ²χ²_1 for each i.  By Theorem 3.3.1, the function r in the form (3.3.7) is independent of scale.  Therefore by Pitman's theorem h and r are independently distributed under H_0.  In this case, for any positive integer s,

    E_{H_0}(r^s) = E_{H_0}(v^s) / E_{H_0}(h^s).                  (3.4.1)

Obtaining the moments of r thus simplifies to obtaining the moments of two quadratic forms in independent standard normal variables.
In order to evaluate these moments it will be useful at this point to state three well-known results concerning quadratic forms and matrices.  The proofs of the first two theorems can be found in Searle (1971); the proof of the last can be found in Lancaster (1969).

Theorem 3.4.2.  If w ~ N(0,I), then the s'th cumulant of w'Aw is

    K_s(w'Aw) = 2^{s-1}(s-1)! tr(A^s).

In the case that s=1 it is not necessary that w be normally distributed.

Theorem 3.4.3.  If w ~ N(0,I_n), then

    E[(w'w)^s] = n(n+2)(n+4)···(n+2s-2).

Theorem 3.4.4.  If λ_1, λ_2, ..., λ_n are the eigenvalues of the matrix A, then for any positive integer s,

    Σ_{i=1}^n λ_i^s = tr(A^s).
Using Theorems 3.4.1 - 3.4.4, the expected value of r under H_0, E_{H_0}(r) (or μ_r0), is equal to

    μ_r0 = [(n-k)/(2n_L)] tr(V)σ² / [tr(M)σ²]
         = [(n-k)/(2n_L)] Σ_{i=1}^{n-k} v_i / (n-k)
         = (n-k)v̄ / (2n_L),                                      (3.4.2)

where v̄ = Σ_{i=1}^{n-k} v_i / (n-k).  Note that μ_r0 does not involve the parameter σ².  The moments of r about μ_r0 can be derived using a technique described in Durbin and Watson (1950).
Specifically, under H_0,

    r - μ_r0 = [(n-k)/(2n_L)] Σ_{i=1}^{n-k} v_i ε_i*² / Σ_{i=1}^{n-k} ε_i*²  -  [(n-k)/(2n_L)] v̄
             = [(n-k)/(2n_L)] Σ_{i=1}^{n-k} (v_i - v̄) ε_i*² / Σ_{i=1}^{n-k} ε_i*²
             = [(n-k)/(2n_L)] Σ_{i=1}^{n-k} δ_i ε_i*² / Σ_{i=1}^{n-k} ε_i*²,  say,    (3.4.3)

where δ_i = v_i - v̄.  As before, since ε* ~ N(0, Iσ²), r - μ_r0 is of the required form for the application of Pitman's theorem, and the central moments of r can be written as

    E_{H_0}(r - μ_r0)^s = [(n-k)/(2n_L)]^s E_{H_0}[(Σ_{i=1}^{n-k} δ_i ε_i*²)^s] / E_{H_0}[(Σ_{i=1}^{n-k} ε_i*²)^s],    (3.4.4)

which is the s'th central moment of the numerator of (3.4.3) over the s'th noncentral moment of its denominator.  Thus the next three higher moments of r are as follows (Durbin and Watson (1950)):
    Var_{H_0}(r) = [(n-k)/(2n_L)]² · 2 Σ_{i=1}^{n-k} (v_i - v̄)² / [(n-k)(n-k+2)],

    E_{H_0}(r - μ_r0)³ = [(n-k)/(2n_L)]³ · 8 Σ_{i=1}^{n-k} (v_i - v̄)³ / [(n-k)(n-k+2)(n-k+4)],

    E_{H_0}(r - μ_r0)⁴ = [(n-k)/(2n_L)]⁴ · [48 Σ_{i=1}^{n-k} (v_i - v̄)⁴ + 12 (Σ_{i=1}^{n-k} (v_i - v̄)²)²] / [(n-k)(n-k+2)(n-k+4)(n-k+6)].
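The mean and variance formulas can be evaluated numerically from the eigenvalues of MP_L M; since this matrix is positive semidefinite, the k eigenvalues guaranteed to vanish are the smallest ones.  The sketch below (function name ours) does this:

```python
import numpy as np

def r_moments(X, DL):
    """Null mean and variance of r from the eigenvalues of M P_L M,
    using (3.4.2) and the variance formula of Section 3.4."""
    n, k = X.shape
    M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T
    P = np.zeros((n, n))
    for i, j in DL:
        P[i, j] = P[j, i] = -1.0
        P[i, i] += 1.0
        P[j, j] += 1.0
    w = np.sort(np.linalg.eigvalsh(M @ P @ M))
    v = w[k:]                                   # the n - k eigenvalues v_1, ..., v_{n-k}
    nL = len(DL)
    c = (n - k) / (2 * nL)
    mean = v.sum() / (2 * nL)                   # (3.4.2); does not involve sigma^2
    var = c**2 * 2 * np.sum((v - v.mean())**2) / ((n - k) * (n - k + 2))
    return mean, var
```

For a design with N = 2 replications per point, the output can be checked against the exact scaled beta distribution derived in Section 3.5, whose mean and variance are available in closed form.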
The first moment of r under H_0 is particularly interesting since it can be expressed as an explicit function of the distances between the pairs in D_L.  Let the matrix C_n = X'X be written as C_n = Σ_{i=1}^n x_i x_i', and let the matrix M be rewritten as a matrix of row vectors, i.e.,

    M = (m_1', m_2', ..., m_n')',                                (3.4.5)

where m_i', i=1,2,...,n, are 1 x n vectors.  Elementwise, M is given by

    M_ij = 1 - x_i'C_n^{-1}x_i,  if i = j,
    M_ij = -x_i'C_n^{-1}x_j,     if i ≠ j.                       (3.4.6)
Since MP_L M is orthogonally similar to V,

    Σ_{a=1}^{n-k} v_a = tr(V) = tr(MP_L M),

and

    tr(MP_L M) = Σ_{a=1}^n m_a'P_L m_a.

By the construction of P_L,

    m_a'P_L m_a = Σ_{(i,j)∈D_L} (m_ai - m_aj)².

Therefore,

    tr(MP_L M) = Σ_{a=1}^n Σ_{(i,j)∈D_L} (m_ai - m_aj)² = Σ_{(i,j)∈D_L} Σ_{a=1}^n (m_ai - m_aj)².

Consider only one of the terms Σ_{a=1}^n (m_ai - m_aj)².  This equals
    Σ_{a≠i,j} (m_ai - m_aj)² + (m_ii - m_ij)² + (m_ij - m_jj)²

    = Σ_{a≠i,j} [x_a'C_n^{-1}(x_i - x_j)]² + [1 - x_i'C_n^{-1}(x_i - x_j)]² + [1 + x_j'C_n^{-1}(x_i - x_j)]²

    = Σ_{a=1}^n (x_i - x_j)'C_n^{-1} x_a x_a'C_n^{-1}(x_i - x_j) - 2(x_i - x_j)'C_n^{-1}x_i + 2(x_i - x_j)'C_n^{-1}x_j + 2.

By the distributive property of matrices it is also equal to

    (x_i - x_j)'C_n^{-1}(x_i - x_j) + 2 - 2(x_i - x_j)'C_n^{-1}(x_i - x_j)
    = 2 - (x_i - x_j)'C_n^{-1}(x_i - x_j)
    = 2 - D²(x_i, x_j)/n.
Therefore

    tr(MP_L M) = Σ_{(i,j)∈D_L} (2 - D²(x_i, x_j)/n) = 2n_L - Σ_{(i,j)∈D_L} D²(x_i, x_j)/n.

From (3.4.2),

    E_{H_0}(r) = (1/2n_L)[2n_L - Σ_{(i,j)∈D_L} D²(x_i, x_j)/n]
               = 1 - D̄²/(2n_L),                                  (3.4.7)

where

    D̄² = Σ_{(i,j)∈D_L} D²(x_i, x_j)/n.

Since (z_i - z_j)² is non-negative for all pairs (i,j),
    E_{H_0}(z_i - z_j)² = σ²(2 - D²(x_i, x_j)/n) ≥ 0             (3.4.8)

implies

    0 ≤ D²(x_i, x_j)/n ≤ 2    for all pairs (x_i, x_j).

Since D is always greater than or equal to zero, it follows that

    0 ≤ D(x_i, x_j) ≤ √(2n)    for all pairs (x_i, x_j).
It can be seen from (3.4.7) that when the specified model is the correct one, the expected value of r is in general larger over a set D_L, where D̄² is small, than over an arbitrary set of n_L pairs.  This fact is partly explained by the fact that the least squares technique for fitting a regression model tends to be more sensitive to the remote points of the independent variables.  This results in somewhat of an "overfit"³ of the data at these points.
3.5  Distribution of r

In the special case that there are equal numbers of replications at several values of the independent variable, the exact distribution of r can be derived.  Suppose there are exactly N > 1 replications at each of p ≥ 1 different values.  Without loss of generality it can be assumed that the observations are arranged so that the vectors x_i associated with each set of replications occur as adjacent rows of X and, taken together, occur as the first pN rows of X.  Then the matrix P_L can be written as

³This term is used in Daniel and Wood (1971).
    P_L = diag(P_1, P_2, ..., P_p, 0).                           (3.5.1)

Each P_i of (3.5.1) is an N x N matrix of the form

    P_i = N(I - N^{-1}J),                                        (3.5.2)

where J is the matrix with all elements equal to 1, and 0 is an (n-pN) x (n-pN) zero matrix.  It is well-known (see Searle (1971)) that a matrix of the form I - N^{-1}J is idempotent and has rank N-1.  Therefore the matrix N^{-1}P_L is also idempotent and has rank p(N-1).  Accordingly, the eigenvalues of N^{-1}P_L are

    1, 1, ..., 1  (p(N-1) times),    0, 0, ..., 0  (n - p(N-1) times).

Since X'P_L = 0 in the case of replications,

    MP_L = (I - X(X'X)^{-1}X')P_L = P_L.

Therefore the eigenvalues of MP_L M and P_L are identical, and from (3.3.7), r is distributed like the quantity

    [N(n-k)/(2n_L)] Σ_{i=1}^{p(N-1)} ε_i*² / Σ_{i=1}^{n-k} ε_i*²,    ε* ~ N(0, Iσ²),
    = [N(n-k)/(2n_L)] Σ_{i=1}^{p(N-1)} ε_i*² / [Σ_{i=1}^{p(N-1)} ε_i*² + Σ_{i=p(N-1)+1}^{n-k} ε_i*²].

This in turn is distributed like the quantity

    [N(n-k)/(2n_L)] B(p(N-1)/2, (n-k-p(N-1))/2),

where B(a,b) denotes a random variable having a beta distribution with parameters a and b.
Furthermore, in the special case that N = 2, the statistic r can be written explicitly as a function of the lack of fit statistic F. For any pair (x_i, x_j),

    (z_i − z_j)² = ((y_i − ŷ_i) − (y_j − ŷ_j))² = (y_i − y_j)²,

because the predicted values of y associated with identical x's are the same.
Since there are only two observations per group,

    Σ_{i=1}^{2} (y_i − ȳ)² = (y_1 − y_2)²/2.

Therefore r becomes

    r = [ Σ_{(i,j)∈D_L} Σ_{i=1}^{2} (y_i − ȳ)² / n_L ] / [ Σ_{i=1}^{n} z_i² / (n−k) ]
      = (SSPE/n_L) / (SSE/(n−k))
      = (n−k) / [n_L + (n−k−n_L)F],
where F is the usual lack of fit test statistic and is distributed as F(n−k−n_L, n_L). In this case r is distributed as

    [(n−k)/n_L] · B( n_L/2 , (n−k−n_L)/2 ),    for n−k−n_L > 0.
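The N = 2 relation between r and the lack of fit F can be verified numerically. The sketch below uses simulated data (NumPy; the design and error variance are hypothetical), and the closed-form identity it checks, r = (n−k)/[n_L + (n−k−n_L)F], is the one that follows from the decomposition SSE = SSPE + SSLOF rather than a formula quoted from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
nL, k = 10, 2                              # n_L duplicated x-values; intercept + slope
x = np.repeat(rng.uniform(0, 1, nL), 2)    # N = 2 observations per x-value
n = 2 * nL
y = 1.0 + 2.0 * x + rng.normal(0, 0.3, n)

X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
z = y - X @ beta                           # residuals
SSE = z @ z

# pure error: within-pair sum of squares is (y1 - y2)^2 / 2 per pair
pairs = y.reshape(nL, 2)
SSPE = np.sum((pairs[:, 0] - pairs[:, 1]) ** 2) / 2
SSLOF = SSE - SSPE
F = (SSLOF / (n - k - nL)) / (SSPE / nL)   # lack of fit statistic

r = (n - k) * SSPE / (nL * SSE)            # r = (SSPE/n_L)/(SSE/(n-k))
assert np.isclose(r, (n - k) / (nL + (n - k - nL) * F))
```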
The statistic r can be viewed as a special case of the more general statistic

    U = w'Aw / w'w.   (3.5.3)

This type of statistic is known as a serial correlation coefficient (see, e.g., Anderson (1948)). For a completely general n × n matrix A, the distribution of U under H_0 is not expressible in closed form. However an exact expression for the density is given by R. L. Anderson (1942) in two cases. These are: (1) n−k is even and the eigenvalues of A are equal in pairs, and (2) n−k is odd and the eigenvalues of A are equal in pairs with the remaining eigenvalue being either the smallest or largest value.
J. von Neumann (1941) derives an exact expression for the [½(n−k) − 1] order derivative of the density function of U for the case that n−k is even and all the eigenvalues are different. However these cases do not occur in general for the statistic r. Anderson's expression becomes very complicated for n−k moderately large and, similarly, von Neumann's result can only be used for very small n−k. In all cases it is necessary to know the individual eigenvalues. For practical applications a simple method of approximation is desirable.
T. W. Anderson (1948) shows that √n·U is asymptotically normally distributed for a sequence of matrices, A(n), for which the eigenvalues satisfy fairly general regularity conditions. However there are indications that the convergence to normality is slow enough that use of a normal approximation does not always give satisfactory results, even for moderately large sample sizes (see, e.g., Anderson (1942), and Koopmans (1942)).
From (3.3.7) the statistic r can be written as

    r = [(n−k)/2n_L] Σ_{i=1}^{n−k} ν_i · w_i*² / [ w_i*² + Σ_{j≠i} w_j*² ]
      = [(n−k)/2n_L] Σ_{i=1}^{n−k} ν_i W_i, say,   (3.5.4)

where the random variables W_i (i = 1,2,…,n−k) are correlated beta variables, each with parameters 1/2 and (n−k−1)/2. This expression suggests that a Pearson type I (beta) distribution be used to approximate the distribution of r. Consistent with this result, a series of beta approximations arise in the literature for special forms of A. In all cases the parameters of the approximating distributions are estimated by the method of moments.
In the case that (3.5.3) is the mean-square-successive-difference
statistic, Durbin and Watson (1951) suggest a two parameter beta distribution on the interval (0,4).
Bloch and Watson (1967) use a
constant times a beta distribution.
Press (1969) suggests that for small n the distribution of any statistic of the form

    Σ_{i=1}^{n} λ_i w_i² / Σ_{i=1}^{n} w_i²

can be approximated by

    λ_n B(a, b),

where λ_n is the largest value of the λ_i, i = 1,2,…,n. This method essentially fits a two parameter beta distribution with the proper end points.
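A method-of-moments beta fit of this kind can be sketched as follows (NumPy; the eigenvalues and the Monte Carlo sample are hypothetical). The ratio statistic is simulated, rescaled to (0,1) by its largest eigenvalue as in Press's device, and the two beta parameters are solved from the sample mean and variance.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
lam = np.sort(rng.uniform(0, 1, n))      # hypothetical eigenvalues; lam[-1] largest

# simulate U = sum(lam_i w_i^2) / sum(w_i^2) for standard normal w
w = rng.normal(size=(20000, n))
U = (w ** 2 @ lam) / np.sum(w ** 2, axis=1)

# fit lam_max * Beta(a, b) by matching the first two moments
V = U / lam[-1]                          # rescaled to (0, 1)
m, v = V.mean(), V.var()
common = m * (1 - m) / v - 1
a, b = m * common, (1 - m) * common      # method-of-moments estimates

assert 0 < U.min() and U.max() < lam[-1]
assert a > 0 and b > 0
```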
A four parameter beta distribution is used by Henshaw
(1966) to approximate the distribution of the serial correlation
coefficient of Durbin and Watson, yielding better results than the
two parameter beta approximation for small n-k but involving substantially more computation.
For larger n the two parameter approx-
imation of Durbin and Watson gives relatively good results considering the amount of computation involved (see Durbin and Watson (1971)).
For purposes of approximating the entire distribution function of r, a technique similar to one or more of the above can be used. However in practical application of the test all that may be needed are several critical values of the test statistic.
The Pearson Curves are a family of absolutely continuous probability distributions whose density function, f(x), satisfies

    (1/f(x)) · df(x)/dx = −(x + a)/F(x),

where F(x) is a quadratic function of x, and whose first four moments exist. Assuming different relationships between a and these moments yields three main, and nine transitional, types of Pearson Curves. Included among these are the beta and normal distributions. Methods are available to estimate the parameters of the appropriate approximating density function, if this is desired.
By Theorem 3.3.2 the moments of r exist and, as was seen in the previous section, exact expressions for these moments are available. For this reason the Pearson Curves are recommended to obtain approximate critical values. This method was also suggested by von Neumann, Kent, Bellinson, and Hart (1941) for the mean-square-successive-difference statistic.
3.6
Bounds for the Critical Values of the Test Statistic
The critical values for the test statistic which are obtained according to the methods of the previous section depend upon the matrix X. Similar to the situation for replications described in Section 3.5, suppose the data points, x_i, i = 1,2,…,n, are arranged in k-space in such a manner that p disjoint groups each containing N points can be formed. Suppose further that the function D generates a set of near-neighbors, D_L, such that a pair of points is in D_L if and only if both points belong to one and the same group. In this case bounds can be obtained for the critical values of the statistic r under H_0 which do not depend upon the values in the matrix X. These bounding values are actually critical values of two bounding random variables and can be used in an approximate test procedure to test against alternative hypotheses of the type described in Chapter 4. In this section the derivation of the bounding random variables and their distributions is presented, and the general method for conducting the approximate test for nonlinearity is indicated. The derivations draw heavily upon the work of Durbin and Watson (1950) and Anderson (1958).
First some bounds upon the eigenvalues of MP_LM are derived using the following theorem adapted from Anderson (1958).

Theorem 3.6.1.  Let M = I − X(X'X)⁻¹X', where X is an n × k matrix of rank k. Let λ_1 ≥ λ_2 ≥ … ≥ λ_n be the eigenvalues of the symmetric matrix P_L, and let ν_1 ≥ ν_2 ≥ … ≥ ν_{n−k} be the n−k eigenvalues of MP_LM other than k zeros, arranged in descending order. Then

    λ_{i+k} ≤ ν_i ≤ λ_i,    i = 1,2,…,n−k.

The proof of the theorem is found in Durbin and Watson (1950) and is omitted here.
Letting ε* = Hε be the orthogonal transformation diagonalizing MP_LM, it follows immediately that

    Σ_{i=1}^{n−k} λ_{i+k} ε_i*²  ≤  Σ_{i=1}^{n−k} ν_i ε_i*²  ≤  Σ_{i=1}^{n−k} λ_i ε_i*².

Furthermore,

    Σ_{i=1}^{n−k} λ_{i+k} ε_i*² / Σ_{i=1}^{n−k} ε_i*²  ≤  Σ_{i=1}^{n−k} ν_i ε_i*² / Σ_{i=1}^{n−k} ε_i*²  ≤  Σ_{i=1}^{n−k} λ_i ε_i*² / Σ_{i=1}^{n−k} ε_i*².
Therefore for every real number r_0,

    Pr{d_U ≤ r_0} ≤ Pr{d ≤ r_0} ≤ Pr{d_L ≤ r_0},

where

    d   = Σ_{i=1}^{n−k} ν_i ε_i*² / Σ_{i=1}^{n−k} ε_i*²,

    d_U = Σ_{i=1}^{n−k} λ_i ε_i*² / Σ_{i=1}^{n−k} ε_i*²,

and

    d_L = Σ_{i=1}^{n−k} λ_{i+k} ε_i*² / Σ_{i=1}^{n−k} ε_i*².
In certain circumstances in which the columns of X are eigenvectors of P_L these bounds can be improved. Consider the following theorem adapted from Anderson (1958).

Theorem 3.6.2.  Under the hypothesis of Theorem 3.6.1, suppose that s (≤ k) linearly independent linear combinations of the columns of X are s eigenvectors of the symmetric matrix P_L. Let λ_1' ≥ λ_2' ≥ … ≥ λ_{n−s}' be the eigenvalues of P_L remaining after deleting the s eigenvalues corresponding to these s vectors. Then

    d_L' ≤ d ≤ d_U',

where

    d_U' = Σ_{i=1}^{n−k} λ_i' ε_i*² / Σ_{i=1}^{n−k} ε_i*²

and

    d_L' = Σ_{i=1}^{n−k} λ_{i+k−s}' ε_i*² / Σ_{i=1}^{n−k} ε_i*².

The proof is in Durbin and Watson (1950).
From the previous section the eigenvalues of P_L are

    N, N, …, N (p(N−1) times), 0, 0, …, 0 (n − p(N−1) times).
In general the value of s will be at least 1, since the vector 1, where 1' = (1,1,…,1), appears as a column of X and is an eigenvector of P_L associated with the eigenvalue 0. In the case that a subset of the independent variables is replicated at particular points, s can be greater than 1. For example, let m < k and let each vector x_i' be written as (x_i1', x_i2'), where x_i1' is a 1 × m vector and x_i2' is a 1 × (k−m) vector. If the vectors x_i, i = 1,2,…,n, occur so that for every (i,j) ∈ D_L, x_i1 = x_j1, then s = m and each of these s vectors is associated with a 0 eigenvalue, provided the s vectors are linearly independent.
In this case the n−s eigenvalues, λ_j', are

    N, N, …, N (p(N−1) times), 0, 0, …, 0 (n − p(N−1) − s times).

Letting ε* = Hε, d_L' simplifies to

    d_L' = Σ_{j=k−s+1}^{n−s} λ_j' (ε*_{j−k+s})² / Σ_{j=1}^{n−k} ε_j*².

Therefore, substituting in the values for λ_j',

    d_L' = N Σ_{j=k−s+1}^{p(N−1)} (ε*_{j−k+s})² / Σ_{j=1}^{n−k} ε_j*²,

and

    d_U' = N Σ_{i=1}^{p(N−1)} ε_i*² / Σ_{i=1}^{n−k} ε_i*².
Under H_0 the random variables ε_i*² are independent and each is distributed as σ²χ₁². Therefore d_L' is distributed as

    N χ²_{m₁} / (χ²_{m₁} + χ²_{m₂}),

where m₁ = p(N−1) − k + s, m₂ = n − k − m₁ = n − p(N−1) − s, and χ²_{m₁} and χ²_{m₂} are independent chi-square random variables with m₁ and m₂ degrees of freedom, respectively. Thus it follows that d_L' is distributed as N·B(m₁/2, m₂/2). Similarly, d_U' is distributed as N·B(n₁/2, n₂/2), where n₁ = p(N−1) and n₂ = n − k − p(N−1).

It follows, since r = [(n−k)/2n_L]·d, that r satisfies

    r_L ≤ r ≤ r_U,

where r_L ~ [N(n−k)/2n_L]·B(m₁/2, m₂/2) and r_U ~ [N(n−k)/2n_L]·B(n₁/2, n₂/2). Therefore for every r_0,

    Pr{r_U ≤ r_0} ≤ Pr{r ≤ r_0} ≤ Pr{r_L ≤ r_0}.

Therefore if r'(α) satisfies Pr{r ≤ r'(α)} = α, then

    r_L'(α) ≤ r'(α) ≤ r_U'(α),

where r_L'(α) is the lower level α critical value for the distribution of r_L, and r_U'(α) is the lower level α critical value of the distribution of r_U. The values r_L'(α) and r_U'(α) depend only upon n, n_L, k, and s.
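The bounding critical values can be evaluated directly from scaled beta quantiles. A minimal sketch (SciPy; the design constants n, p, N, k, and s below are hypothetical illustrations):

```python
from scipy.stats import beta

# hypothetical design: n points in p groups of N, k regressors, s = 1
n, p, N, k, s = 40, 10, 4, 3, 1
nL = p * N * (N - 1) // 2           # all within-group pairs
n1 = p * (N - 1)
n2 = n - k - n1
m1 = n1 - k + s
m2 = n - k - m1

scale = N * (n - k) / (2 * nL)      # common factor N(n-k)/(2 n_L)
alpha = 0.05
rL_crit = scale * beta.ppf(alpha, m1 / 2, m2 / 2)   # r_L'(alpha)
rU_crit = scale * beta.ppf(alpha, n1 / 2, n2 / 2)   # r_U'(alpha)

# the lower-tail critical value of r itself lies between these two numbers
assert 0 < rL_crit <= rU_crit
```

Since B(m₁/2, m₂/2) has a smaller first parameter and a larger second parameter than B(n₁/2, n₂/2), it is stochastically smaller, so the lower bound never exceeds the upper bound.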
CHAPTER 4
ALTERNATIVE HYPOTHESIS
4.1
The Alternative Hypothesis
In general the alternative hypothesis for a test for lack of fit of H_0 would result when any one or more of the assumptions of H_0 are relaxed or are differently specified. However the possible departures from the assumptions which are considered in this investigation are restricted to those which can be expressed by the addition to the model equation (1.1.1) of a term which is itself a nonlinear function of one or more of the independent variables. Two basic types of terms are considered here: 1) a second degree polynomial function of the x_ij's, and 2) a general nonlinear function of the x_ij's. The model for the alternative relationship in each of these two cases can be represented as

    y = Xβ + γ g(X) + u,   (4.1.1)

where β is a k × 1 vector of unknown parameters, u is an n × 1 vector of unobserved random error terms with u ~ N(0, Iσ_u²), and g(X) is an n × 1 vector of unknown nonlinear functions of X with unknown coefficient γ. The vector g(X) can be written as

    g(X) = (g_1(X), g_2(X), …, g_n(X))'.   (4.1.2)

In case 1), g_i(X) may be defined as

    g_i(X) = x_i'Ax_i,    i = 1,2,…,n,   (4.1.3)
for some k × k matrix A of fixed coefficient parameters. For case 2), g_i(X) = f(x_i), i = 1,2,…,n, where f(x_i) is some nonlinear function of the x_ij's. In the case that there are replications at p values of the independent variable it was noted in Chapter 1 that the alternative hypothesis for which the F test is uniformly most powerful is that g_i(X) is a p parameter (1 ≤ p ≤ n−k) nonlinear function of the k independent variables. The same alternative will be assumed for the r test as well as for the F test when there are no replications.
4.2
Properties of r under H_A

Under the alternative hypothesis, H_A, the residual vector z is equal to

    z = My = M(γg + u).

Therefore

    r = [(n−k)/2n_L] · (u + γg)'MP_LM(u + γg) / (u + γg)'M(u + γg).

If H = (h_1, h_2, …, h_n) is the matrix of the orthogonal transformation used in Section 3.2 and w* = H(u + γg), then w* ~ N(γHg, Iσ_u²). Subsequently r becomes

    r = [(n−k)/2n_L] · w*'Vw*/w*'Iw* = [(n−k)/2n_L] Σ_{i=1}^{n−k} ν_i w_i*² / Σ_{i=1}^{n−k} w_i*².

Under this alternative each of the variables w_i*² is distributed as σ_u² χ₁²'(λ_i), where χ₁²'(λ_i) is a random variable having a noncentral chi-square distribution with 1 degree of freedom and noncentrality parameter

    λ_i = γ² g'h_i h_i'g / σ_u²,    (i = 1,2,…,n).
Using the methods of Section 3.5, r can be expressed as

    r = [(n−k)/2n_L] Σ_{i=1}^{n−k} ν_i W_i*,

although now the W_i*'s, i = 1,2,…,n−k, are n−k correlated doubly-noncentral beta random variables. The fact that the distribution of the doubly-noncentral beta random variable can be approximated by a constant times a beta distribution (see Scheffé (1959)) suggests that if necessary the distribution of r under H_A could be approximated by one of the variations of a beta distribution suggested in Section 3.5.
Under H_A the moments of r are now not only functions of the omitted vector g but also are dependent upon σ_u², the variance of the error term in model (4.1.1). The distribution of the w_i*²'s is not in the form of the gamma distribution discussed in Section 3.4; therefore (Pitman's) Theorem (3.4.1) is not applicable and exact analytic expressions for the moments of r under H_A are not obtainable. However the moments of r can be approximated by using a Taylor series expansion of r about its mean. These approximations are used to study the behavior of the statistic r under H_A as compared to its behavior under H_0.
Let r = r(v,h) = v/h, where v = w*'Vw*/2n_L and h = w*'Iw*/(n−k), and denote the means of v and h under H_A by E_A(v) and E_A(h), respectively. Since the variances of v and h and the covariance of v and h are all of order 1/n, first and second order Taylor series approximations to the moments of r have error terms of order 1/n and 1/n², respectively. Consider the first order Taylor series approximation to E_A(r),⁴

    E_A(r(v,h)) = E_A(v)/E_A(h) + O(1/n).   (4.2.1)

Using Theorems 3.4.2 – 3.4.4 the moments of v and h under H_A can be expressed as functions of the moments of v and h under H_0 as follows:
    E_A(v) = E_0(v)ρ + G_P(n),

    E_A(h) = E_0(h)ρ + G(n),

    Var_A(h) = [2σ_u²/(n−k)] [E_0(h)ρ + 2G(n)],   (4.2.2)

    Cov_A(v,h) = [2σ_u²/(n−k)] [E_0(v)ρ + 2G_P(n)],

where G_P(n) = γ² g'MP_LMg/2n_L, G(n) = γ² g'Mg/(n−k), and ρ = σ_u²/σ².
Therefore

    E_A(r) ≈ [E_0(v)ρ + G_P(n)] / [E_0(h)ρ + G(n)],   (4.2.3)

and for large n, E_A(r) ≤ E_0(r) if

    G_P(n) E_0(h) ≤ G(n) E_0(v).   (4.2.4)

It is useful to note that (4.2.4) is equivalent to

    G_P(n) ≤ (1 − D̄²/2n) G(n).   (4.2.5)

In the limit (4.2.4) becomes equivalent to

    G_P ≤ G,

where G = lim_{n→∞} G(n) and G_P = lim_{n→∞} G_P(n).

⁴The second order approximation leads to the same result.
For large n, condition (4.2.4) implies that E_A(r) ≤ E_0(r); therefore the rejection region of the test for nonlinearity using r is in the lower tail of its distribution.

Noting that z = Mu + γMg, the sum of a random and a fixed component, suppose the function r is computed on the elements of the vector representing the fixed portion of z. If the value of this function is less than the expected value of the same statistic, r, calculated on the random component of z, then (4.2.4) holds and for large n E_A(r) ≤ E_0(r). It is also noteworthy that the approximate condition (4.2.4) is independent of the values of γ and σ_u².
In the case that there are replications at values of x_i, the set D_L may be defined by D_L = D_L(·,0,·), i.e., by taking the distance bound to be zero. Since the function D is a metric it follows that x_i = x_j for every (x_i, x_j) ∈ D_L. By the definition of P_L, X'P_L = 0 in this special case. Therefore

    MP_L = (I − X(X'X)⁻¹X')P_L = P_L,

and the numerator of r becomes

    (u + γg)'P_L(u + γg).

Furthermore, since g_i(X) = f(x_i), g'P_L = 0. Therefore r simplifies to

    r = [(n−k)/2n_L] Σ_{(i,j)∈D_L} (u_i − u_j)² / Σ_{i=1}^{n} z_i²,

with g'MP_LMg = 0. Condition (4.2.4) always holds in this case, since G_P(n) = 0 and (1 − D̄²/2n) = 1 > 0.
4.3
Method of Approximating the Power of the Test
For purposes of determining the power of the test procedure the interest lies in tail probabilities of the form

    P_{H_A}{r ≤ r_0},

where r_0 is defined by P_{H_0}{r ≤ r_0} = α, α being the size of the test. However, since the denominator is never negative, this quantity can be rewritten as

    P_{H_A}{(u + γg)'MΩM(u + γg) ≤ 0},

where Ω = (1/2n_L)P_L − (r_0/(n−k))I. The quantity ξ = (u + γg)'MΩM(u + γg) is an indefinite quadratic form in the variable u + γg. The distribution of ξ can be approximated by a Pearson three-moment chi-square distribution. This method, as described by Imhof (1961), assumes that the distribution of ξ is approximately that of

    μ₁(ξ) + [μ₃(ξ)/4μ₂(ξ)] (χ_q² − q),   (4.3.1)

where χ_q² is a central chi-square random variable with q degrees of freedom. The parameter q is estimated using the first three moments of ξ. This estimate, q̂, is⁵

    q̂ = 8μ₂(ξ)³/μ₃(ξ)².   (4.3.2)

Although Imhof cautions that this method of approximation gives more accurate results in the case of definite quadratic forms than for indefinite forms, it will be considered sufficiently accurate here to indicate trends in the power of the test against specific types of alternatives.

⁵It is assumed that μ₃(ξ) > 0; otherwise the distribution of −ξ should be approximated, with the power of the test being approximated by P_{H_A}(−ξ > 0) (Imhof (1961)).
The first three central moments of ξ are

    E_{H_A}(ξ) = tr(MΩM)σ_u² + γ² g'MΩMg,

    Var_{H_A}(ξ) = 2 tr(MΩM)² σ_u⁴ + 4γ² g'(MΩM)²g σ_u²,  and

    μ₃,_{H_A}(ξ) = 8 tr(MΩM)³ σ_u⁶ + 24γ² g'(MΩM)³g σ_u⁴.
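These moment expressions translate directly into an approximate power computation. The sketch below (NumPy/SciPy) forms Ω = P_L/2n_L − r₀I/(n−k), evaluates the three central moments of ξ from the trace formulas, and applies a three-moment chi-square approximation with q̂ = 8μ₂³/μ₃². The design, γ, σ_u, r₀, and the nearest-neighbor pair set are all hypothetical illustrations.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
n, k = 30, 2
gamma, sigma_u, r0 = 0.8, 1.0, 0.6           # hypothetical values

X = np.column_stack([np.ones(n), np.sort(rng.uniform(0, 1, n))])
g = X[:, 1] ** 2                              # a simple nonlinear term g(X)
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T

# P_L from the n-1 nearest-neighbor pairs (i, i+1) of the sorted x's
PL = np.zeros((n, n))
for i in range(n - 1):
    PL[i, i] += 1; PL[i + 1, i + 1] += 1
    PL[i, i + 1] -= 1; PL[i + 1, i] -= 1
nL = n - 1

Omega = PL / (2 * nL) - r0 / (n - k) * np.eye(n)
B = M @ Omega @ M
Bp = [np.linalg.matrix_power(B, j) for j in (1, 2, 3)]

mu1 = np.trace(Bp[0]) * sigma_u**2 + gamma**2 * (g @ Bp[0] @ g)
mu2 = 2 * np.trace(Bp[1]) * sigma_u**4 + 4 * gamma**2 * (g @ Bp[1] @ g) * sigma_u**2
mu3 = 8 * np.trace(Bp[2]) * sigma_u**6 + 24 * gamma**2 * (g @ Bp[2] @ g) * sigma_u**4

# three-moment fit: xi ~ mu1 + (mu3/4mu2)(chi2_q - q) with q = 8 mu2^3/mu3^2
q_hat = 8 * mu2**3 / mu3**2
x0 = q_hat - 4 * mu2 * mu1 / mu3
power = chi2.cdf(x0, q_hat) if mu3 > 0 else 1 - chi2.cdf(x0, q_hat)
assert 0.0 <= power <= 1.0
```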
The terms tr(MΩM)^k, k = 1,2,3, can be expanded into a sum of terms each of the form tr M(P_LM)^j, j = 0,…,k−1. In case the individual eigenvalues of the matrix MP_LM are known the evaluation of these terms can be simplified. Let the matrix H be the orthogonal matrix defined in Section 3.3. Then

    tr(MΩM)^k = tr(MΩMHH')^k = tr(H'MΩMH)^k,

since HH' = I. But from the choice of the matrix H this becomes tr D(λ_i)^k, where D(λ_i) is an n × n diagonal matrix with elements λ_i given by

    λ_i = ν_i/2n_L − r_0/(n−k)

for the n−k indices corresponding to the unit eigenvalues of M, and λ_i = 0 otherwise, where ν_i, i = 1,2,…,n, are the eigenvalues of MP_LM (including the k zero eigenvalues). In this case

    tr(MΩM)^k = Σ_{i=1}^{n} λ_i^k,    k = 1,2,3.
Similarly the terms g'(MΩM)^k g can be expressed as a sum of terms of the form

    g'M(P_LM)^j g / [(2n_L)^j (n−k)^{k−j}],    j = 1,…,k,   (4.3.3)

for k = 1,2,3, and the term g'Mg/(n−k). For small samples when there is no replication these terms influence the power of the test and reflect the degree of departure from the ideal condition of replication.

An ordering exists among the terms of the form (4.3.3). The matrix P_L is a real symmetric positive semidefinite matrix. Therefore there exists a real symmetric positive semidefinite matrix U of the same rank as P_L such that UU' = P_L. Letting g*' = g'MU, then

    g'MP_LMg = g*'g*,

and

    g'(MP_LM)²g = g'MUU'MUU'Mg = g*'U'MUg*.   (4.3.4)
By Theorem 3.3.2 the two quadratic forms satisfy the inequality

    λ₁ ≤ g*'U'MUg* / g*'g* ≤ λ_n,   (4.3.5)

where λ₁ and λ_n are the smallest and largest eigenvalues, respectively, of U'MU. But the eigenvalues of a product of matrices are not affected by a rotation of the product, so that U'MU and MUU' = MP_L have the same eigenvalues. Therefore

    0 ≤ g'(MP_LM)²g / g'MP_LMg ≤ ν_n.   (4.3.6)
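The Rayleigh-quotient bound here is easy to check numerically. A small sketch (NumPy; the random design and pair set are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 25, 3
X = rng.normal(size=(n, k))
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T

# P_L from disjoint near-neighbor pairs: symmetric PSD by construction
PL = np.zeros((n, n))
for i in range(0, n - 1, 2):
    j = i + 1
    PL[i, i] += 1; PL[j, j] += 1
    PL[i, j] -= 1; PL[j, i] -= 1

A = M @ PL @ M                           # the matrix MP_LM
vmax = np.linalg.eigvalsh(A).max()       # its largest eigenvalue
g = rng.normal(size=n)

ratio = (g @ A @ A @ g) / (g @ A @ g)    # g'(MP_LM)^2 g / g'MP_LM g
assert -1e-10 <= ratio <= vmax + 1e-10
```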
Similarly, defining g*' = g'MP_LM, it can be shown that

    0 ≤ g'(MP_LM)³g / g'(MP_LM)²g ≤ ν_n.   (4.3.7)

Inequalities (4.3.5) – (4.3.7) imply that

    g'(MP_LM)³g ≤ ν_n g'(MP_LM)²g ≤ ν_n² g'MP_LMg.   (4.3.8)

Since MP_LM is positive semidefinite, ν_i ≥ 0 for every i, so that

    ν_n ≤ Σ_{i=1}^{n−k} ν_i = 2n_L(1 − D̄²/2n).

Therefore

    g'(MP_LM)³g ≤ [2n_L(1 − D̄²/2n)]² g'MP_LMg.   (4.3.9)

If the set D_L consists of pairs of replicates then g'MP_L = g'P_L = 0, implying that all the terms of inequality (4.3.9) are identically zero.

The process leading to (4.3.9) can be extended by induction to yield the inequality

    g'(MP_LM)^m g ≤ [2n_L(1 − D̄²/2n)]^{m−1} g'MP_LMg

for any positive integer m. If the eigenvalues of MP_LM are uniformly bounded by some value λ_N, say, for all n, then from (4.3.8),

    g'(MP_LM)^m g ≤ λ_N^{m−1} g'MP_LMg.   (4.3.10)

Therefore the term g'(MP_LM)^m g has the same order of magnitude for any fixed positive integer m. This result is used in Chapter 5 when the large sample properties of the test are considered.
CHAPTER 5

LARGE SAMPLE PROPERTIES

5.1  Large Sample Comparisons
Any comparison of the F test and the r test depends upon
the number of observations available.
When there is a sufficient
number of replications in the data, of course the F test would
be preferable to use at any sample size.
As discussed in Chapter
1, the F test is also preferred when the number of observations
is large, regardless of the dispersion of the data points.
An
additional argument in favor of the F test for large samples is
that as the number of observations increases, the exact moments of r under H_0 become increasingly expensive to compute.
When the sample size is small, problems arise in the determination of the groups of near-neighbors. Although computer algorithms for selecting such groups are available, the criterion for determining the best grouping often depends upon some arbitrary starting point, especially in a procedure which allows the groups to form at variable sizes. With the r test this problem is not encountered, since the formation of groups is replaced by the selection of pairs of nearest-neighbors, which do not necessarily have to be disjoint pairs. Since it is a straightforward task to order all pairs, (x_i, x_j), by a well-defined distance function, determining a usable set of pairs is a simpler task. Although some arbitrariness is introduced in deciding upon how many pairs to include in D_L, this same type of decision is made in forming groups for the F test.
A meaningful comparison of the two tests should involve a consideration of the relative power of the tests. Therefore for small sample sizes this comparison is done by simulation of the behavior of the test statistic under several alternative model specifications. A description of the simulation experiment and discussion of these results are given in Chapter 6. In the present chapter the two tests are compared for large samples by using the criterion of asymptotic relative efficiency (ARE) as defined by Pitman (see Noether (1955)).
For large sample considerations it is necessary to make the following assumptions:

    lim_{n→∞} X'X/n = Σ,   (5.1.1)

where Σ is a fixed, finite-valued, k × k matrix, and

    lim_{n→∞} γ² g'g/n = σ_g² < ∞,   (5.1.2)

where g is the vector of nonlinear terms of model (4.1.1). Assumption (5.1.1) implies that the vectors x_i, i = 1,2,…, are increased so that their matrix of cross-products converges to a limit. It follows that

    lim_{n→∞} n(X'X)⁻¹ = Σ⁻¹.

Subsequently for a fixed pair, (x_i, x_j),

    lim_{n→∞} n D²(x_i, x_j) = (x_i − x_j)'Σ⁻¹(x_i − x_j) = δ²(x_i, x_j),  say.

This guarantees that

    lim_{n→∞} D²(x_i, x_j) = 0.

Throughout the remainder of the chapter it will be implicitly assumed that (5.1.1) and (5.1.2) are satisfied.
5.2  Asymptotic Normality of r under Pitman's Alternative
Given two tests F and r of identical size and of the same hypothesis, the relative efficiency of r to F is generally defined as the ratio n_F/n_r, where n_r is the sample size required using test r to give the same power as test F with sample size n_F. In general the ratio n_F/n_r is a complex function of the parameters of the particular alternative chosen. However in the limiting case this dependence can often be simplified. The asymptotic relative efficiency of r relative to F (E_rF) is defined by

    E_rF = lim_{n→∞} n_F/n_r.   (5.2.1)

Suppose that the set of hypotheses to be tested is given by H_0: γ = 0 versus H_A: γ ≠ 0, where γ is defined as in (4.1.1). Consider an alternative parameter γ_n which satisfies lim_{n→∞} γ_n = 0. For particular forms of γ_n, lim_{n→∞} n_F/n_r can be evaluated under H_1: γ = γ_n under certain conditions given by Pitman (see Noether (1955)). The value of E_rF obtained in this way is considered to be a reasonably good indication of the relative efficiency of the two tests against local alternatives, i.e., alternatives in which the value of γ is in the neighborhood of 0.
At this stage, in order to evaluate (5.2.1) using Pitman's theory, it must be assumed that lim_{n→∞} tr(MP_LM)²/2n_L exists. This condition is satisfied for the r test under certain regularity conditions on the eigenvalues of the matrix MP_LM.

One of the requirements of Pitman's theorem is that √n·r have a limiting normal distribution under both H_0 and H_1. The demonstration of this property is given here in considerable detail because it illustrates the development of the regularity conditions imposed upon the eigenvalues of MP_LM. The general method of approach is that used by Kamat (1962) to show the asymptotic normality of the mean-square-successive-difference statistic under alternatives of serial dependence. The satisfaction of the other requirements for Pitman efficiency can be demonstrated using first order Taylor series approximations to the first two moments of r under H_1. These are straightforward and are omitted here. In the following derivations it is necessary to assume that n_L, the number of pairs in D_L, remains proportional to n as n increases, i.e., n_L/n = O(1). The following two normal convergence theorems are used. The proofs of both theorems are found in Hoeffding and Robbins (1948).⁸
Theorem 5.2.1.  Let {p(n)} be a non-decreasing sequence of positive integers approaching infinity. Let (X_nj, Y_nj), j = 1,2,…,p(n), be an infinite sequence of finite sequences of p(n) random variables in R² which are independent for each fixed n. Suppose E(X_nj) = E(Y_nj) = 0, and define

    μ_{i,2−i}(nj) = E(X_nj^i Y_nj^{2−i}),    i = 0,1,2,

and

    ρ_n = (1/p(n)) Σ_{j=1}^{p(n)} E|X_nj|³.

If

    lim_{n→∞} [Σ_{j=1}^{p(n)} μ_{i,2−i}(nj)] / p(n) = μ_{i,2−i}

and

    lim_{n→∞} ρ_n/√p(n) = 0,

then the random vector

    (1/√p(n)) ( Σ_{j=1}^{p(n)} X_nj , Σ_{j=1}^{p(n)} Y_nj )

has a limiting normal distribution with mean (0,0) and variance-covariance matrix

    | μ₂₀  μ₁₁ |
    | μ₁₁  μ₀₂ | .

⁸See Orey (1955) for an extension of Theorem 5.2.1 to sequences of m-dependent random variables.
Theorem 5.2.2.  If S and T are random variables with E(S) = s₀ and E(T) = t₀, and if √n(S − s₀) and √n(T − t₀) have a limiting bivariate normal distribution with nonsingular variance-covariance matrix

    | a  b |
    | b  c | ,

then √n(S/T − s₀/t₀) has a limiting normal distribution with mean 0 and variance

    a/t₀² − 2bs₀/t₀³ + cs₀²/t₀⁴.   (5.2.2)
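Theorem 5.2.2 is the delta method for a ratio, and its variance formula can be checked by simulation. A sketch (NumPy; the moments s₀, t₀ and the covariance entries a, b, c are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(5)
s0, t0, n = 1.0, 2.0, 10000
a, b, c = 0.5, 0.2, 0.4                       # limiting covariance [[a, b], [b, c]]

# draw sqrt(n)(S - s0, T - t0) directly from the limiting bivariate normal
L = np.linalg.cholesky(np.array([[a, b], [b, c]]))
UV = rng.normal(size=(200000, 2)) @ L.T
S = s0 + UV[:, 0] / np.sqrt(n)
T = t0 + UV[:, 1] / np.sqrt(n)

emp = np.var(np.sqrt(n) * (S / T - s0 / t0))  # empirical variance of sqrt(n)(S/T - s0/t0)
theory = a / t0**2 - 2 * b * s0 / t0**3 + c * s0**2 / t0**4
assert abs(emp - theory) / theory < 0.05
```

With these values the theoretical variance is 0.125 − 0.05 + 0.025 = 0.1, and the Monte Carlo estimate agrees to within a few percent.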
Necessary for the results of this section is the following lemma. The proof appears in the appendix.

Lemma 5.2.1.  Let the eigenvalues of MP_LM be uniformly bounded for all n. Then for every finite fixed positive integer m, and for every α > 0,

    tr(MP_LM)^m = o(n^{1+α})

and

    g'(MP_LM)^m g = o(n^{1+α}).   (5.2.3)
The Pitman alternative takes the specific form H_1: γ² = θ/√n for some θ ≠ 0. In the notation of Theorem 5.2.1, let p(n) = n−k and define

    X_nj = [ν_j(n)(n−k)/2n_L] [w_j*² − E(w_j*²)],

    Y_nj = w_j*² − E(w_j*²).

Then under H_1, E(X_nj) = E(Y_nj) = 0, and the μ_{i,2−i}(nj) are such that

    μ₂₀(nj) = [ν_j(n)(n−k)/2n_L]² σ_u⁴ Var(χ₁²'(λ_j)),    λ_j = (θ/√n) g'h_j h_j'g / σ_u².

Furthermore

    Σ_{j=1}^{n−k} Var(X_nj) = [(n−k)/2n_L]² Var(w*'Vw*)
        = [(n−k)/2n_L]² [2 tr(MP_LM)² σ_u⁴ + (4θ/√n) g'(MP_LM)²g σ_u²].

Therefore

    μ₂₀ = lim_{n→∞} (1/(n−k)) Σ_{j=1}^{n−k} μ₂₀(nj) = 2σ_u⁴ δ²,

where

    δ² = lim_{n→∞} [(n−k)/2n_L]² tr(MP_LM)² / (n−k).

Similarly

    μ₀₂ = lim_{n→∞} (1/(n−k)) Σ_{j=1}^{n−k} μ₀₂(nj)
        = lim_{n→∞} (1/(n−k)) [2(n−k)σ_u⁴ + (4θ/√n) g'Mg σ_u²] = 2σ_u⁴.

It is well known (see Cramér (1946), p. 176) that

    (E|X|^a)^{1/a} ≤ (E|X|^{a+1})^{1/(a+1)},    a = 1,2,3,…,

for any random variable whose absolute moments exist. Therefore, for a = 3,
53
_1_ 3
In - k
n-k
L:
Elx .1 3
.
J=1
nJ
n-k
< _1_ 3
=
3/4
L: (E (X
. 1
J=
In- k
>
nJ
4) )
3/4
But
E(x .4)
nJ
<
=
E(X .4)
nJ
n-k
3/4
~
E(X .4)
j=l
nJ
L:
1
+
for all j, no
Therefore
n-k
L:
- j=l
E(X .4)
nJ
+
(n-k).
Using Kolmogorov's Inequality for a = 4,
n-k
n 4
E(X .4) ~ L: v.(n)[60
j=1
nJ
j=l J
L:
+
2408C1'h.h.'C1/lrlo2
.Q. -J-J £!.!
u
+
2
48(8g'h.h. 'C1/lrlo )2]
4
v. (n) [60
j=l J
--J-J
n
<
+
L:
T
~
u
222
240(8S.'YIrl"0u) + 48(8S.'K.!Irl0 u ) ].
(5.2.4)
Vj 4 (n)
The values
are uniformly bounded by some number AN' say.
Therefore (5.2.4) is bounded above by
nA~[60
+
240 0(1rl)
+
which is a term of order n 2 .
,.--,/vn-k
p
n
= - 1In-k
3
48 D(n)
+
(n-k)] ,
(5.2.5)
Therefore
n-k
L:
j=l
Elx .1
3
nJ
is less than a term of order n-1/6, and limit Pn /In-k = O.
By
n-+<>o
letting AN = 1 in (5.205) the same procedure yields the corresponding
result for Y
Replacing AN by max(AN.I) in (5.2.5). it follows
nJ
that limit Pn/ln-k = O. Therefore. under HI'
~k :~>nj' I~-k :~:~ is
S4
asymptotically normally distributed with limiting mean (0,0) and
variance-covariance matrix
The main result of this section is stated in the following two theorems.

Theorem 5.2.3.  Under H_1, √n·r is asymptotically normally distributed with mean 1 and variance 2(δ² − 1) if the eigenvalues of MP_LM are uniformly bounded for all n.

Proof.  Define S = (1/2n_L) w*'Vw* and T = w*'Iw*/(n−k). Then r = S/T. By Theorem 5.2.1, √n(S − s₀) and √n(T − t₀) have an asymptotically bivariate normal distribution, where t₀ = s₀ = σ_u². Therefore S and T satisfy the conditions of Theorem 5.2.2, and √n·r has a limiting normal distribution with mean 1 and variance 2(δ² − 1).
Corollary 5.2.1.  √n·r is asymptotically normally distributed under H_1 if the indices of the vectors x_i (i = 1,2,…,n) are uniformly bounded for all n.

Proof.  There exists a value N_r such that I_i ≤ N_r for all i. It follows immediately that any diagonal element of P_L is less than or equal to N_r, i.e.,

    (P_L)_jj ≤ N_r    for every j = 1,2,…,n.

Define

    ρ_j = Σ_{k=1, k≠j}^{n} |(P_L)_jk|.

Then

    max_j ρ_j ≤ Σ_{k=1, k≠j}^{n} a_jk ≤ N_r,

where a_jk = 1 if (j,k) ∈ D_L, and a_jk = 0 otherwise. By Geršgorin's Theorem (see Lancaster (1969)) every eigenvalue of P_L lies in one of the intervals

    0 ≤ z ≤ (P_L)_jj + ρ_j,    j = 1,2,…,n,

so that

    0 ≤ z ≤ N_r + N_r = 2N_r.
By Theorem 3.6.1,

    λ_{i+k}(n) ≤ ν_i'(n) ≤ λ_i(n),    i = 1,2,…,n−k,

where ν_1'(n) ≥ ν_2'(n) ≥ … ≥ ν_{n−k}'(n) are the eigenvalues of MP_LM arranged in descending order. Since the λ_i(n)'s are uniformly bounded by 2N_r, there exists a value, λ_N, such that ν_i'(n) ≤ λ_N for all i and for all n. Since the eigenvalues of MP_LM are bounded, the result follows from Theorem 5.2.3.
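The Geršgorin step of the proof can be illustrated numerically: if each point appears in at most N_r pairs, every eigenvalue of P_L lies in [0, 2N_r]. A sketch (NumPy; the random pair set is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(6)
n, Nr = 30, 3                       # each point allowed in at most Nr pairs

PL = np.zeros((n, n))
count = np.zeros(n, dtype=int)      # index (number of pairs) of each point
for _ in range(200):                # propose random pairs, respecting the cap
    i, j = rng.integers(0, n, 2)
    if i != j and count[i] < Nr and count[j] < Nr:
        PL[i, i] += 1; PL[j, j] += 1
        PL[i, j] -= 1; PL[j, i] -= 1
        count[i] += 1; count[j] += 1

eig = np.linalg.eigvalsh(PL)
assert eig.min() >= -1e-10          # P_L is positive semidefinite
assert eig.max() <= 2 * Nr + 1e-10  # Gersgorin bound 2 N_r
```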
That all the conditions of this section hold for H_0 as well as H_1 follows immediately by letting θ = 0 in the preceding results.

5.3  Asymptotic Relative Efficiency of r and F
The classical lack of fit statistic developed in Chapter 1 takes the form

    F = [y'(M−A)y/(n−k−a)] / [y'Ay/a],   (5.3.1)

where A is the idempotent matrix of rank a of the quadratic form which represents the pure error sum of squares, y'Ay. In the usual case of true replications A(M−A) = 0, implying that AM = A. However this is only an approximation when near-replicates are used.
To
evaluate the Pitman efficiency of r relative to F some restrictions must be placed upon the manner in which n tends to infinity so that the conditions are compatible with both tests.

For the r test suppose that for each fixed n the x_i's are grouped into m(n) intervals or clusters so that (x_i, x_j) ∈ D_L if and only if x_i and x_j belong to one and the same cluster. Suppose for each n there exists a d_L(n) which, with D, generates this set D_L. Furthermore, it is assumed that the index of any point x_i is not greater than N_r and that the proportion of points whose indices are equal to N_r tends to one as n tends to infinity.

For the F test it is assumed that the x_i's can be grouped into p(n) groups, G_i, i = 1,2,…,p(n), with each G_i containing n_i points on which the pure error sum of squares is computed. Suppose that n_i ≤ N_F for some N_F and for all i. As was the case for the r test, some regularity assumptions must be made concerning the noncentrality parameters of the numerator and denominator of F for the case that there are no replications. These are

    β'X'AXβ = o(a)  and  g'Ag = O(a).   (5.3.2)

Then using second order Taylor series approximations to the moments of the F ratio, it is straightforward to show that the Pitman alternative hypothesis for the F test is also H_1: γ² = θ/√n. The numerator and denominator of F both have central chi-square distributions under H_0 and noncentral chi-square distributions under H_1 with noncentrality parameter of order 1/√n. Using the methods of Section 5.2, it can be shown that F is asymptotically normally distributed under both H_0 and H_1.
57
The ARE of r relative to F $rF) in the general case is given
by (Noether (1955)):
                                                              (5.3.3)

where B_n = β'X'AXβ/(aσ²).  The term B_n appears in the expression
because the pure error sum of squares is calculated on near-replications.
Using assumptions (5.1.1), (5.1.4), and (5.3.2), the limit
of expression (5.3.3) simplifies to

                                                              (5.3.4)

provided the limit exists.  Furthermore it can be shown that
tr(MA)/a → 1 and tr(MPLM)/2n_L → N_r (see appendix for proof).⁹
In the limit 2n_L/a → N_rN_F/(N_F-1); therefore (5.3.4) reduces to

                                                              (5.3.5)
⁹If the data points are increased in the manner described, the
term K'MP_LMK/2n_L tends to zero by the continuity of the vector
function K'M.
where N_r is the maximum index of a point x_i and N_F is the maximum
number of elements in each set G_i, i=1,2,···,p(n).
In the simple regression model, suppose that the x_i's are
bounded and equally spaced for each n, so that the G_i's are formed
by partitioning the range of values of x into m equal intervals.
The G_i's consist of points for which the maximum distance between
any pair does not exceed the width of a subinterval G_i.  In order
to make the r test equivalent in terms of distances between pairs
of near-neighbors, the maximum value that N_r is considered to
attain is 2(N_F-1).  For example if each G_i contains N_F points,
then a reasonable restriction is that N_r satisfies N_r ≤ 2(N_F-1).
For the case N_r = N_F-1, E_rF is 1 regardless of the value of N_F.
In general E_rF satisfies

                                                              (5.3.7)
5.4  Selection of Pairs for D_L.

The results of the previous section indicated that the ARE of
an r-type test with smaller index per point relative to the same
test with a larger index per point is less than 1.  It is assumed
that the distances between all pairs in the two sets tend to zero
in the limit.  This would suggest that the preferred scheme for
selecting pairs for the set D_L would be to include as many pairs
as possible, and specifically all pairs whose distance is less than
the critical value d_L.  At the same time the index of any point
should not be allowed to become considerably larger than the
others.
Suppose r_1 is an r-type test statistic based upon a set of
near-neighbors, D_1, and let r_2 be a test statistic of the same
form using the same set of data points but based upon a different
set of near-neighbors, D_2.  Then using the results of the previous
section, a comparison can be made between the two tests involving
r_1 and r_2.  Let N_1 be the maximum index of any point using set
D_1 and N_2 be the maximum index using D_2.  Suppose that for each
test the proportion of points whose index is less than its maximum
value tends to zero as n tends to infinity.  Then the ARE of r_1
relative to r_2 reduces to

    E_12 = limit_{n→∞} σ²_20(n) / σ²_10(n),                   (5.4.1)

where σ²_10(n) and σ²_20(n) are the respective variances of the
statistics r_1 and r_2 evaluated at γ = 0 for sample size n.  Since
the tests are of the same form (see Noether (1955)), this simplifies
further to

                                                              (5.4.2)
CHAPTER 6

RESULTS OF SIMULATION STUDIES AND APPLICATIONS

6.1  Accuracy of Approximations to the Critical Regions Under H_0.

A relatively small-scale Monte Carlo experiment was performed
to compare the behavior of the statistics r and F under H_0 for
small sample sizes.  Variations of the following factors were taken
into account in the study:

1.  sample size
2.  arrangement of the data points of the independent variable
3.  magnitude of the variance of random error about the model
4.  significance level
A sequence of twenty-one sets of X values was chosen for a
sequence of regression models in two variables.
The data generated
for this experiment were used to evaluate the Pearson Curve approximation to the critical values of r, and to determine the amount of
error in using critical values of F assuming replications.
To generate the x vectors, initially a set of twelve equally
spaced two-dimensional lattice points was chosen.  These are:

    (-6, 4)   (-2, 4)   ( 2, 4)   ( 6, 4)
    (-6, 0)   (-2, 0)   ( 2, 0)   ( 6, 0)
    (-6,-4)   (-2,-4)   ( 2,-4)   ( 6,-4)
At each of the twelve lattice points, sets of 2, 3, and 4 data
points were generated in turn, to give the three different sample
sizes of 24, 36, and 48, respectively.
The values were generated by independently selecting each of the two
coordinates of each vector x from a normal distribution with mean
equal to the corresponding coordinate value of the associated lattice
point.  The dispersion of the x_ij's, which was the same for both
coordinates, varied from 0.0 to 1.8 in steps of 0.09 to yield the
twenty-one sets of x values.
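The design-generation scheme just described can be sketched in a few lines of Python; this is a minimal illustration in which the standard library's normal generator stands in for the Chen (1971) subroutine used in the thesis, and all function and variable names are hypothetical:

```python
import random

# Twelve equally spaced lattice points used as cluster centres.
LATTICE = [(x1, x2) for x2 in (4, 0, -4) for x1 in (-6, -2, 2, 6)]

def make_design(n_per_point, sigma_x, seed=None):
    """Displace each lattice point n_per_point times with independent
    N(0, sigma_x^2) noise on each coordinate: n = 12 * n_per_point points."""
    rng = random.Random(seed)
    pts = []
    for (c1, c2) in LATTICE:
        for _ in range(n_per_point):
            pts.append((rng.gauss(c1, sigma_x), rng.gauss(c2, sigma_x)))
    return pts

# The twenty-one dispersions 0.00, 0.09, ..., 1.80 of Section 6.1.
dispersions = [round(0.09 * j, 2) for j in range(21)]
designs = {s: make_design(2, s, seed=1) for s in dispersions}  # n = 24
```

With `sigma_x = 0.0` the design reduces to exact replicates at the lattice points; increasing it walks the data from a clumped toward a random arrangement, as in the study.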
The model used to generate the values of y under H_0 was:

    y_i = 10.0 + 2.0x_1i + 2.0x_2i + u_i,   i=1,2,···,n,      (6.1.1)

where x_i' = (x_1i, x_2i) represents the displaced lattice
point values and u_i ~ IIDN(0, σ_u²).  For each sample, the normal
deviates u_i were selected by a computer subroutine which generates
random normal deviates according to a method developed by
Chen (1971).  Based upon the results of several trial runs, two
values of σ_u were chosen (8.0 and 9.0).¹⁰
The pairs of points in D_L were selected as follows.  First
the modified squared interpoint distance, d, between two adjacent
lattice points was determined.  One-half of this distance was
taken as the value of d_L and the set D_L* was formed.  The pairs
in D_L* were then ordered according to the values of their
interpoint distances.  The smallest n_L pairs in D_L* were then
chosen for the set D_L.  The number n_L was determined by

    n_L = min( n_L*, 12 n_r(n_r-1)/2 ),

where n_L* is the number of elements in D_L* and n_r is equal to
the number of replications at each of the twelve lattice points.
The result was equivalent to choosing D_L = D(n_L, d_L, -), but the
two stage procedure resulted in more efficient use of computing time
due to the substantial reduction in the number of elements to be
sorted.

¹⁰Reasons for selecting these particular values are discussed
in Section 6.2.
The exact moments of the
statistic r were determined in a separate computer run along with
the eigenvalues of MPLM, and the parameters necessary for using
the Pearson Curve approximation for the r test.
Critical values
of the r statistic were then determined for five levels of
significance (0.10, 0.05, 0.025, 0.01, 0.005) from tables of
the Pearson Curves (Johnson, Nixon and Amos (1963)).
The sets G_i, i=1,2,···,p, were formed using the pairs in D_L.
The size of each G_i was set equal to n_r for each i and for
each set of X values.  The value of p was allowed to vary over the
sets of X's.  A criterion distance variable, d_c, was used.
Beginning at zero, the value of d_c was increased until it exceeded
the smallest interpoint distance available.  These two points were
then included in G_1 and the points associated with this pair were
removed from further consideration.  The value of d_c was increased
further until n_r such points were included in G_1 or until the set
D_L was exhausted.  Then d_c was returned to zero and G_2 was formed
in a similar manner with the remaining points.  This process yielded
p groups.
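One way to realize this greedy grouping sweep is sketched below; the exact tie-breaking and removal rules are assumptions rather than the thesis's own code:

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def form_groups(points, n_r):
    """Greedily partition points into near-replicate groups of size n_r
    for the F test: sweep a criterion distance d_c up from zero, seed a
    group with the closest remaining pair, grow it to n_r points by
    nearest-neighbour accretion, then restart with d_c = 0."""
    remaining = set(range(len(points)))
    groups = []
    while len(remaining) >= 2:
        _, i, j = min((sq_dist(points[a], points[b]), a, b)
                      for a in remaining for b in remaining if a < b)
        group = {i, j}
        remaining -= group
        while len(group) < n_r and remaining:
            _, nxt = min((min(sq_dist(points[m], points[g]) for g in group), m)
                         for m in remaining)
            group.add(nxt)
            remaining.remove(nxt)
        groups.append(sorted(group))
    return groups
```

Any leftover singleton contributes no pure-error degree of freedom, so it is simply not grouped.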
Critical values for the F test were obtained for identical
levels of significance (α) from tables in Pearson and Hartley (1966).
Results of this study are based upon 1500 samples of observations
at each sample size (24, 36, and 48) and at each value of σ_x.
These results for α = 0.05 appear in Table 6.1.1.  The entries
in the table represent the number of rejections of H_0 out of the
1500 samples of data generated under model (6.1.1).¹¹
The number of rejections of H_0 for each model follows a
binomial distribution with mean nα and variance nα(1-α).  A test for
a significant difference between the true and the estimated critical
values for both tests was made using the normal approximation.  Sub-
sequently under H_0 the number of rejections at significance level
α = 0.05 should fall within the interval [57.6, 92.4] for approxi-
mately 95% of the samples.  For the r test only 3 out of 126 = 21 x 3 x 2
counts (or 2.4%) fell outside the interval.  All three of these values
occurred for samples of size 24.  For the F test a total of 19 out of
126 (or 15.1%) of the counts fell outside the interval [57.6, 92.4].
A breakdown of these values by sample size and by value of σ_u is
given in Table 6.1.2.

¹¹Results for the other values of α are available from the
author.
Table 6.1.1: Number of rejections of H_0 out of 1500 samples generated
under H_0, using the r test and the F test with α = 0.05.  Entries are
tabulated for σ_u = 8.0 and σ_u = 9.0, by test (r, F), sample size
(24, 36, 48), and sample number (1-21).
Table 6.1.2: Number of rejection counts out of 126 samples falling
outside the 95% confidence interval [57.6, 92.4] for the F test
with α = 0.05.

              σ_u = 8.0    σ_u = 9.0    Total
    n = 24        1            0            1
    n = 36        4            0            4
    n = 48        8            6           14
    Total        13            6           19
A simple chi-square test of independence indicates no change
in the distribution of rejections over the three sample sizes for
the two values of σ_u.  However a similar test procedure used on
the three marginal totals corresponding to the three sample sizes
indicates a significant difference.  For smaller sample sizes (and
also smaller size groups of near-neighbors) the critical values of
the F test are adequate.  However for n = 48 (using groups of near-
replicates of size 4) the approximation to the true critical value
is inadequate in terms of numbers of rejections falling outside
the 95% confidence interval.
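The first of these chi-square checks can be reproduced from the Table 6.1.2 counts; a pure-Python sketch, where 5.991 is the standard df = 2, α = 0.05 critical value:

```python
def chi_square_independence(table):
    """Pearson chi-square statistic for an r x c contingency table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    return sum((obs - r * c / n) ** 2 / (r * c / n)
               for row, r in zip(table, row_tot)
               for obs, c in zip(row, col_tot))

# Rejection counts outside [57.6, 92.4] from Table 6.1.2:
# rows n = 24, 36, 48; columns sigma_u = 8.0, 9.0.
counts = [[1, 0], [4, 0], [8, 6]]
chi2 = chi_square_independence(counts)
# Critical value for df = (3-1)(2-1) = 2 at alpha = 0.05 is 5.991:
independent = chi2 < 5.991
```

The statistic is well below 5.991, consistent with the conclusion that the distribution of rejections does not change with σ_u across the three sample sizes (the expected cell counts here are small, so the test is only a rough check).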
6.2  Power Studies

A second simulation experiment was simultaneously conducted to
compare the power of the r test and the F test for small samples.
In terms of power, it was hypothesized that the r test would do no
better than the F test if there were replications of the independent
variable.  As the data points changed from a clumped distribution
gradually to a random distribution, it was assumed that the F test
would begin to look worse either in terms of power or in terms of
the deterioration of the significance levels under H_0, or both.  It
was desirable to determine if a point existed in such a sequence of
dispersions at which the r test would give better results than the
F test.  For this reason relatively more consideration was given to
the arrangement of the values of the independent variable than to
variations of other parameters.  Subsequently the sequence of twenty-
one sets of values of the independent variable described in the
previous section was used to form a corresponding sequence of
twenty-one regression models in two variables.
The data were the same as those used for the experiment described
in Section 6.1.  The method of generating the data is described there,
as well as the method for choosing D_L and forming the disjoint groups
for the F test.  The model proposed for fitting the data was

    y_i = β_0 + β_1 x_1i + β_2 x_2i + ε_i,   i=1,···,n,

where ε_i ~ IIDN(0, σ²).

Three alternative models were selected for generating values
of the dependent variable.  In each case the true relationship
was expressed by the addition to the basic model of Section 6.1 of
one term which was non-linear in one or more of the independent
variables.  Then a simulation of random error about the model was
conducted.  Two quadratic terms and one interaction term were con-
sidered, yielding the three models.  Specifically, they are

    I:    y_i = 10.0 + 2.0x_1i + 2.0x_2i + 0.5x_1i² + u_i
    II:   y_i = 10.0 + 2.0x_1i + 2.0x_2i + 0.5x_2i² + u_i
    III:  y_i = 10.0 + 2.0x_1i + 2.0x_2i + 0.5x_1i x_2i + u_i
68
For each sample the variables u.1
'')
tV
IIDN(O,a U ") were selected by
computor as described in Section 6.1,12
two val uos of
(f
u
(8,0 and 9,0).
Simulations were done for
Based upon several preliminal'y
trial runs of the simulation program it was found that these values
of a
u
resulted in estimated power for both tests that was sufficiently
far from both 0 and 1 for all 63
= 21
x
3 regression models.
This
allowed for meaningful comparisons of power for the two tests.
Estimates of the power of the tests were obtained from the number
of rejections of tho null hypothesis out of 500 trials at each of
three sample sizes.
The
results of the power considerations are
presented in Table 6.2.1 for au ,- 9.0,
Results for a
u
= 8.0
are
similar in all aspects.
The results indicate that both tests are sensitive to the non-
linear alternatives.  Except in only a very few cases, the estimated
power of the r test is at least as great as the estimated power of
the F test.  The magnitude of the difference increases both with the
sample size and with the value of σ_x, the parameter representing the
dispersion of the x values, for each alternative.  As discussed in
Section 6.1, this is in part due to the apparent reduction in the
actual significance level of the F test when it is based upon near-
replicates.  As the values of the independent variable become more
randomly dispersed (σ_x increases), the power of the r test seems to
increase relative to that of the F test.  However it steadily
increases up to a point about half way through the sequence, and

¹²The same random normal deviates were used in this experiment
as were used in the experiment of the previous section.
Table 6.2.1: Number of rejections of H_0 out of 500 samples of data
generated under models I, II, and III with σ_u = 9.0, using the r
test and the F test with α = 0.05.  Entries are tabulated by model
(I, II, III), sample size N (24, 36, 48), test (r, F), and sample
number (1-21).
then the power begins to vary considerably.  This may be due to
the chance variation possible in the method of generation of the
data.  This trend is also evident for the F test.

The r test as well as the F test seems to be most powerful
against model I and least powerful against model II (both quadratic
models), while the interaction alternative falls between these two
in terms of power.  This seems to indicate that the power for small
samples may be influenced more by the relative number of points
appearing in a line in the direction of the nonlinearity than by
the functional form of the alternative term.  The quadratic term
of model I involves the variable with the lattice points having
the largest number of points in the direction of the nonlinearity,
while those of model II have the fewest number.  Both tests are more
powerful for σ_u = 8.0 than for σ_u = 9.0.

The test was also compared with the usual t test by including
the nonlinear term in the model and testing that its coefficient is
zero.  In nearly all cases considered the value of the power for this
test was estimated to be nearly equal to one.  Subsequently the
results are not presented here.  This might have been expected since
the t test makes use of prior indications as to the nature of the
functional form of the nonlinear alternative.
6.3  Application of the Test Procedure

The test procedure was illustrated on a set of sociological
data.  The data represent observations of five sociological charac-
teristics on 51 individual school children.  The model proposed to
represent the relationship is

    y_i = β_0 + β_1 x_1i + β_2 x_2i + β_3 x_3i + β_4 x_4i + ε_i,
                                                    i=1,2,···,51.

A graph was made of the smallest 5% of the pairwise distances.
There were four noticeable jump points in the graph near the values
0.012, 0.014, 0.017, and 0.020.  The value of d_L was set equal to
the smallest of these points, 0.012.  There were 15 pairs of points
with interpoint distance less than d_L.  The maximum value for the
index of each point was one, so that the beta bounds test procedure
could be illustrated.  Thus D_L was defined by D_L = D(0.012, 15, 1).
This yielded a total of 9 disjoint pairs in D_L.  The same nine pairs
were used as nine groups of two for the comparison with the F test.
used as nine groups of two for the comparison with the F test.
The
first four power sums of the eigenvalues of MPLM are computed as
n
13
s
s
E v. = tr(MPLM)
for s=1,2,3,4. The first four moments of r
. 1 1
1=
(~1'~2'~3'~4)
are then computed using the formulae of Section 3.4.
The Pearsonian parameters β_1 and β_2 are then computed as

    β_1 = μ_3²/μ_2³   and   β_2 = μ_4/μ_2².

This yielded the values β_1 = 0.60 and β_2 = 3.29, indicating a Type
I, bell-shaped (beta) approximation to the distribution of standard-
ized r.  The lower 5% critical value from the tables of the Pearson
Curves is -1.44.  The actual value of the statistic r and its first
two moments under H_0 are r = 0.895, μ_1 = 0.996, and μ_2 = 0.412.
The smallest and largest eigenvalues of MPLM are 0.0000 and 1.9999.
Therefore the value of r can fall in the interval [0, 5.11].
¹³A computer program for application of the test procedure is
available from the author.

The critical value for the distribution of the non-standardized r
statistic is (-1.44)(0.412) + 0.996 ≈ 0.403.  Since the observed
value r = 0.895 exceeds this lower critical value, it was concluded
that there was no indication of significant nonlinearity in the model.
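The conversion from the standardized Pearson Curve percentage point back to the original r scale is simple arithmetic, treating the reported 0.412 as the standard deviation entering the thesis's own computation:

```python
# Quantities reported in Section 6.3 for the sociological data.
mean_r = 0.996     # mean of r under H0
sd_r = 0.412       # scale of r under H0 used in the thesis's computation
z_lower = -1.44    # lower 5% point of the fitted Pearson curve
                   # (standardized scale)

# Critical value on the original r scale:
crit = z_lower * sd_r + mean_r        # 0.403 to three decimals

r_obs = 0.895
reject = r_obs < crit                 # small r signals nonlinearity
```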
Using the beta bounds approximate test, the estimated lower
critical value is the lower critical value of the random variable
r_L, where

    r_L ~ (46/9) B(5/2, 41/2).

The lower 5% critical value for r_L is 0.120.

Using the F test with the assumption of replications corre-
sponds to using as an approximation for r the distribution of r_U,
where

    r_U ~ (46/9) B(9/2, 37/2).

The lower 5% critical value of r_U is 0.400.  In all three tests the
assumed model cannot be rejected against any general p(9)-parameter
nonlinear alternative model.
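The two scaled beta bounds can be evaluated numerically rather than from printed tables; a pure-Python sketch in which the quantile is obtained by integrating the beta density and bisecting, an assumption in place of the Pearson-Hartley tables (small numerical differences from the tabled values are to be expected):

```python
import math

def beta_cdf(x, a, b, steps=4000):
    """Regularized incomplete beta I_x(a, b) by midpoint integration."""
    norm = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    h = x / steps
    total = sum(((i + 0.5) * h) ** (a - 1) * (1.0 - (i + 0.5) * h) ** (b - 1)
                for i in range(steps))
    return total * h / norm

def beta_ppf(p, a, b):
    """p-th quantile of Beta(a, b) by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    for _ in range(50):
        mid = (lo + hi) / 2.0
        if beta_cdf(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Scale (n - k)/n_L = 46/9 with n = 51, k = 5 and n_L = 9 disjoint pairs.
scale = 46.0 / 9.0
crit_lower = scale * beta_ppf(0.05, 5 / 2, 41 / 2)   # lower 5% point of r_L
crit_upper = scale * beta_ppf(0.05, 9 / 2, 37 / 2)   # lower 5% point of r_U

r_obs = 0.895
rejected = r_obs < crit_upper   # r falls above both bounds: no rejection
```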
CHAPTER 7

SUMMARY AND CONCLUSIONS

7.1  Summary and Conclusions.
The statistic r was developed for use in a test against non-
linearity of the general linear regression model.
The classical
lack of fit tests usually involve the estimation of the variance
of random error about the model from observations associated with
repeated values of the independent variable.
An alternative to
this procedure was developed for small to moderate sample sizes
when there are no replications at any values of the independent
variable.
The alternative test is based upon the statistic r
which is a ratio of an appropriate estimate of the error variance,
to the error mean square.
This estimate is the average squared
difference between residuals associated with a set of near-neighbor
pairs of the independent variable.
The Mahalanobis inter-
point distance was used to determine the set of near-neighbors.
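The statistic summarized above can be sketched directly; a minimal illustration assuming the near-neighbor pairs have already been selected, fitting by ordinary least squares through the normal equations (the helper names are hypothetical):

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination (A small, well conditioned)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))   # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def r_statistic(X, y, pairs):
    """r = (average squared difference of residuals over the near-neighbor
    pairs) / (residual mean square); X includes the intercept column."""
    n, k = len(X), len(X[0])
    XtX = [[sum(X[i][s] * X[i][t] for i in range(n)) for t in range(k)]
           for s in range(k)]
    Xty = [sum(X[i][s] * y[i] for i in range(n)) for s in range(k)]
    beta = solve(XtX, Xty)
    e = [y[i] - sum(X[i][s] * beta[s] for s in range(k)) for i in range(n)]
    num = sum((e[i] - e[j]) ** 2 for i, j in pairs) / (2 * len(pairs))
    den = sum(ei * ei for ei in e) / (n - k)
    return num / den
```

Under a nonlinear departure the near-neighbor differences still estimate pure error while the residual mean square is inflated, so small values of r signal lack of fit, which is why the critical values of Chapter 6 are lower-tail points.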
The function was expressed as the ratio of two quadratic forms
in standard normal variables and the relevant theory concerning the
exact distribution of a general statistic of this type was reviewed.
It was noted that the statistic has an asymptotically normal distribution under the null hypothesis, but that the convergence to normality in general may be very slow.
The exact distribution of r
was derived for several special cases.
All of these involve linear
combinations of beta random variables.
It was shown that when the
independent variable consists of pairs of replicates, the r statistic
can be expressed as a one-to-one function of the classical F
statistic used to test for lack of fit.
Considerable attention was given to a review of the existing
methods for approximating the distribution of the test statistic,
since the exact expressions available are not practical for routine
application.
All of these approximations involve variations of the
beta distribution.
For purposes of conducting the test, it was
recommended that critical values be approximated using Pearson Curves.
The recommendation was based upon the existence of the moments of r,
and upon the fact that the availability of extensive tables allows
for the elimination of the intermediate step of determining the
specific form of the distribution function used in the approximation.
Finally it was noted that the family of Pearson Curves includes the
beta and the normal distributions.
A small scale Monte Carlo experiment was conducted to check
the accuracy of the Pearson Curve approximation at six critical
values.
At the same time the significance levels of the corresponding
F test were examined.
It was concluded that the Pearson Curves gave
relatively good approximations to the critical regions examined.
However the F approximation tended to have smaller tails than the
true distribution of the modified F statistic when the number of
points involved
in the groups of replicates was increased.
When the proposed regression model expresses the actual relationship between the variables, the expected value of the test statistic
was expressed as a simple function of the average, over the pairs of
near-neighbors, of the squared interpoint distances.
Using a tech-
nique described in Durbin and Watson (1950), random variables which
bound the statistic r were derived.
For certain arrangements of the
data points, it was shown that the distribution of these random
variables could be expressed as a constant times a simple beta
distribution whose parameters do not depend upon the specific
values of the independent variables.
A possible test procedure
using these results was described.
A general expression for the nonlinear alternative model was
given.
Large sample conditions were derived which indicated the
behavior of the statistic r under H_A, and a method was described
to approximate the power of the test.  A large sample comparison
of the test with the F test was made using the criterion of asymptotic relative efficiency as defined by Pitman.
Considerable
attention was given to determining conditions under which the
statistic is asymptotically normal under the Pitman alternative
hypothesis.
Sufficient conditions are essentially that the
eigenvalues of the matrix of the quadratic form of the numerator
of r be bounded for all n.
This can be accomplished by requiring
that the index of any point be uniformly bounded for all points
as n increases.
In these cases the Pitman efficiency of r relative
to F is in the neighborhood of one.
When the index pairs overlap the boundaries of the groups of points
formed for the F test, the efficiency becomes greater than one.
As a small sample comparison, a simulation experiment was
conducted for a two-variable regression model to compare the power
of the r and F tests against alternative models containing an
additional term which is nonlinear in form.
Two quadratic terms
and one interaction term were considered, yielding three alternatives.
Results indicated that the r test is at least as
powerful as the F test in nearly all of the cases considered.
However the simulation results also indicated that the actual
significance levels of the F test are decreased when near-
replications are used.
Apparently the sensitivity of the r
test, as well as that of the F test, depended more upon the
number of points available in the direction of the nonlinear
variable than on the functional form of the alternative for
the cases considered in this experiment.
7.2
Suggestions for Future Research
Numerous problems related to the work done in this thesis
seem to deserve further consideration.  This is especially true
if the r test is ever to be considered for routine use.  Many of
these problems are purely computational and some are theoretical
in nature.
Of major concern is the creation of the set D_L.  Subsequently
the choice of the parameter d_L (or N_L) should be more
rigorously investigated than is done here.
The computation of
moments, pairwise distances, and possibly eigenvalues prohibits the
routine use of the r test in large samples.  A study should be
made of computer methods and computational shortcuts to minimize
the computer time needed for larger sample sizes.  If pairwise dis-
tances are computed only for pairs of points within subsets formed
from one or two initial partitionings of the x-space, considerable
computation can be saved while sacrificing relatively few near-
neighbor pairs.  Other methods similar to this may be discovered.
Although the r test is suggested for testing against alternatives of nonlinearity, most of the theoretical development was
done in the framework of testing independent observations against
correlated error structures (see Anderson (1948)).
The use of
the r test against such alternatives should be considered.
The
sensitivity of the test to other types of misspecifications should
be examined theoretically.
A more extensive study could be made to determine the relationship between the patterns of indices of the points and the patterns
of eigenvalues of MPLM for small samples.
A study of this type
should be made to determine if a test procedure could be developed
which, like the beta approximate bounds test, is independent of
the actual values in the matrix X.
The relative merits of pre-
choosing the indices of the points in patterns could be considered
for use in the case of data in which exact replicates cannot be
selected.
As mentioned in Chapter 1, the behavior of the r test when
used on transformed residuals should be studied.
For example, consider the BLUS residuals of Theil (1965), which are
independently distributed.  Many of the problems of the r test
resulting from the dependence of the usual residuals might be
avoided if the r statistic were based upon such independently
distributed residuals.
This is a particularly important consideration since
the r test is recommended for small samples.
Since the r test is suitable for small samples, research
concerning the properties of the test leads itself to almost
endless possibilities for Monte Carlo studies.
Studies of the
type described in the previous sections could be extended to include
different emphases than are given here.  Particularly important is
the case in which the x values have a more normal-shaped dispersion
pattern, since observations would be clumped in the center.  By the
same methods, the r test could be compared, with respect to sen-
sitivity and power, to other tests of nonlinearity such as the Prais-
Houthakker test (1955) and Thornby's nonparametric test (1972).  The
work done in this investigation is of a rather general nature, even
though numerous assumptions are made throughout.  It is believed
that many useful analytical results can be obtained by considering
several more specific cases.  For example interesting cases would
include alternatives of polynomial trend using equally-spaced vari-
ables or indicator variables in the simple regression scheme.
80
REFERENCES
Anderson, R. L. 1942. Distribution of the serial correlation
coefficient. The Annals of Mathematical Statistics
13:
1-13.
Anderson, T. W. 1948. On the theory of testing serial correlation. Skandinavisk Aktuarietidskrift 31: 81-116.
Anderson, To W. 1958. The Statistical Analysis of Time Series.
John Wiley and Sons, Inc., New York.
Anscombe, F. J. 1961. Examination of residuals. Proceedings
of the Fourth Berkeley Symposium on Mathematical
Statistics and Probability.!.: 1-36.
Anscombe, F. J. and Tukey, J. W. 1963. The examination and
analysis of residuals. Technometrics 5: 141-160.
Ashar, V. G. and Wallace, T. D. 1972. Sequential methods in
model construction. Review of Economics and Statistics 54: 172-178.
Chen, E. H. 1971. A random normal number generator for 32bit-word computers. Journal of the American
Statistical Association 66: 400-403.
Cram~r,
H. 1946. Mathematical Methods of Statistics, Princeton
University Press, Princeton, N.J.
Bloch, D. A. and Watson, G. S. 1967. A Bayesian study of the
multinomial distribution. The Annals of Mathematical
Statistics 38: 1423-1435.
Daniel, C. and Wood, F. S. 1971. Fitting Equations to Data.
John Wiley and Sons Inc., New York.
Durbin, J. and Watson, G. S. 1950. Testing for serial correlation in least squares regression I. Biometrika 37:
409-428.
Durbin, J. and Watson, G. S. 1951. Testing for serial correlation in least squares regression II. Biometrika 38:
159-178.
Durbin, J. and Watson, G. S. 1971. Testing for serial correlation in least squares regression III. Biometrika 58:
1-19.
81
Fisher, R. Ao 1922. The goodness of fit of regression formulae
and the distribution of regression coefficients.
Journal of the Royal Statistical Society~: 597-612.
Henshaw, R. C, 1966. Testing single-equation least squares
regression models for autocorrelated disturbances.
Econometrica 34: 646-660.
Hoeffding, W. and Robbins, H. E. 1948. The central limit theorem
for dependent variables. Duke Mathematical Journal 15:
773-780.
Imhoff, J" P. 1961. Computing the distribution of quadratic forms
in normal variables. Biometrika 48: 419-426.
Johnson, N. L., Nixon, E. and Amos, D. E, 1963. Tables of percentage points of Pearson Cruves for given
~ and S2 expressed in standard measure.
50:
Biometrika
459-498.
Kamat, A. R. 1963. Asymptotic relative efficiencies of certain
test criteria based on first and second differences.
Journal of the Indian Society of AgriCUltural Statistics
15: 213-222,
Kendall, M. G. and Stuart, A. 1963. The Advanced Theory of
Statistics, volume ~, Charles Griffin and Company, Inc.,
London.
Koopmans, T. 1942. Serial correlation and quadratic forms in normal
variables. The Annals of Mathematical Statistics 13: 14-33.
Lancaster, P. 1969.
Loeve, M. 1955.
Theory of Matrices.
Academic Press, New York.
Probability Theory. van Nostrand, New York,
Noether, G. 1955. On a theorem of Pitman.
matical Statistics 26: 64-69.
The Annals of Mathe-
Orey, S. 1957. A Central limit theorem for m-dependent random
variables. Duke Mathematical Joul~al 25: 543-546.
Pearson, E. S. and Hartley, H. O. 1966. Biometrika Tables for Statisticians. Cambridge University Press, Cambridge.
Pitman, E. J. G. 1937. The 'closest' estimates of statistical parameters. Proceedings of the Cambridge Philosophical Society 33: 212-222.
Prais, S. J. and Houthakker, H. S. 1955. The Analysis of Family Budgets. Cambridge University Press, Cambridge.
Press, S. J. 1969. On serial correlation. The Annals of Mathematical Statistics 40: 188-190.
Ramsey, J. B. 1969. Tests for specification errors in classical
linear least squares regression analysis. Journal of
the Royal Statistical Society, series B, 31: 350-371.
Scheffé, H. 1959. The Analysis of Variance. John Wiley and Sons,
Inc., New York.
Searle, S. R. 1970. Linear Models. John Wiley and Sons, Inc., New York.
Thornby, J. I. 1972. A robust test for linear regression. Biometrics 28: 533-543.
Theil, H. 1965. Analysis of disturbances in regression analysis. Journal of the American Statistical Association 60: 1067-1079.
Tukey, J. W. 1949. One degree of freedom for non-additivity. Biometrics 5: 232-242.
von Neumann, J. 1941. Distribution of the ratio of the mean square successive difference to the variance. The Annals of Mathematical Statistics 12: 367-395.
von Neumann, J., Kent, R. H., Bellinson, H. R., and Hart, B. I.
1941. The mean square successive difference. The
Annals of Mathematical Statistics 12: 153-162.
APPENDIX
Lemma A1: Let I_i be the index of the point x_i (i = 1,2,···,n). Suppose I_i ≤ N_r for all i, and let ℓ(n) be the number of points for which I_i < N_r. If

    limit  ℓ(n)/n = 0,
    n→∞

then

    limit  tr[(M P_L M)²]/(2n) = N_r(N_r + 1)/2.
    n→∞

Proof. Let μ_1 ≥ μ_2 ≥ ··· ≥ μ_n be the eigenvalues of P_L and let ν_1 ≥ ν_2 ≥ ··· ≥ ν_n be the eigenvalues of M P_L M. Since X is of full column rank,

    0 ≤ μ_{i+k} ≤ ν_i ≤ μ_i,    i = 1,2,···,n-k,

by Theorem 3.6.1. Therefore

    Σ_{i=k+1}^{n} μ_i² ≤ Σ_{i=1}^{n-k} ν_i² ≤ Σ_{i=1}^{n} μ_i²,

but

    Σ_{i=1}^{n} μ_i² = tr(P_L²) = Σ_{i=1}^{n} I_i(I_i + 1).

Since the indices of the x_i are uniformly bounded, the squared eigenvalues of P_L are also uniformly bounded by some value, λ_N, by the definition of P_L. Therefore

    Σ_{i=1}^{n} I_i(I_i + 1) − kλ_N ≤ Σ_{i=1}^{n-k} ν_i² ≤ Σ_{i=1}^{n} I_i(I_i + 1),

and, since n − ℓ(n) of the points have I_i = N_r while 0 ≤ I_i ≤ N_r for the remainder,

    (n − ℓ(n)) N_r(N_r + 1) − kλ_N ≤ Σ_{i=1}^{n-k} ν_i² ≤ n N_r(N_r + 1).

Since M has rank n − k, M P_L M has rank at most n − k, so ν_i = 0 for i > n − k and Σ_{i=1}^{n-k} ν_i² = tr[(M P_L M)²]. Dividing through by 2n,

    (1 − ℓ(n)/n) N_r(N_r + 1)/2 − kλ_N/(2n) ≤ tr[(M P_L M)²]/(2n) ≤ N_r(N_r + 1)/2.

But

    limit  (1 − ℓ(n)/n) = 1    and    limit  kλ_N/(2n) = 0,
    n→∞                              n→∞

therefore

    limit  tr[(M P_L M)²]/(2n) = N_r(N_r + 1)/2.
    n→∞
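The eigenvalue separation invoked above (Theorem 3.6.1) can be verified numerically. The sketch below, written in numpy and not part of the thesis, uses a generic symmetric nonnegative definite matrix P as a stand-in for P_L (whose exact construction is given in Chapter 3); the names n, k, X, and P are illustrative assumptions. It checks that the descending eigenvalues of M P M are separated by those of P, i.e. μ_{i+k} ≤ ν_i ≤ μ_i for i = 1,···,n-k.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 30, 3

# X of full column rank (almost surely), residual projection M of rank n-k.
X = rng.standard_normal((n, k))
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

# Generic symmetric nonnegative definite stand-in for P_L.
B = rng.standard_normal((n, n))
P = B @ B.T

mu = np.sort(np.linalg.eigvalsh(P))[::-1]          # descending eigenvalues of P
nu = np.sort(np.linalg.eigvalsh(M @ P @ M))[::-1]  # descending eigenvalues of MPM

# Separation check: mu_{i+k} <= nu_i <= mu_i for i = 1, ..., n-k.
tol = 1e-8
ok_upper = all(nu[i] <= mu[i] + tol for i in range(n - k))
ok_lower = all(mu[i + k] <= nu[i] + tol for i in range(n - k))
print(ok_upper and ok_lower)
```

This is an instance of the Poincaré separation theorem: the nonzero eigenvalues of M P M are those of P compressed onto the (n-k)-dimensional range of M.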
Lemma A2.
For fixed n let a(n) be the rank of the matrix A. If a(n) is of order n, then

    limit  tr(MA)/a(n) = 1.
    n→∞

Proof. Let μ_1(n) ≥ μ_2(n) ≥ ··· ≥ μ_n(n) be the eigenvalues of MA, and let α_1(n) ≥ α_2(n) ≥ ··· ≥ α_n(n) be the eigenvalues of A. Since X is of full column rank,

    α_{i+k}(n) ≤ μ_i(n) ≤ α_i(n),    i = 1,2,···,n-k,

by Theorem 3.6.1. Therefore

    Σ_{i=k+1}^{n} α_i(n) ≤ Σ_{i=1}^{n-k} μ_i(n) ≤ Σ_{i=1}^{n} α_i(n).

Since A is idempotent, α_i(n) ≥ 0 for every i and for all n, and the eigenvalues of A are

    1, 1, ···, 1 (a(n) of them),  0, 0, ···, 0 (n − a(n) of them).

Therefore

    Σ_{i=k+1}^{a(n)} 1 ≤ Σ_{i=1}^{n-k} μ_i(n) ≤ a(n),

or

    a(n) − k ≤ Σ_{i=1}^{n-k} μ_i(n) ≤ a(n).

Since M has rank n − k, MA has rank at most n − k, so μ_i(n) = 0 for i > n − k and tr(MA) = Σ_{i=1}^{n-k} μ_i(n). This implies

    1 − k/a(n) ≤ tr(MA)/a(n) ≤ 1.

Since a(n) is of order n,

    limit  k/a(n) = 0,
    n→∞

and therefore

    limit  tr(MA)/a(n) = 1.
    n→∞
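Lemma A2 can also be checked numerically. The sketch below (not part of the thesis; the choices of n, k, a, X, and A are illustrative assumptions) builds a symmetric idempotent A of rank a proportional to n and confirms that tr(MA)/a falls inside the bracket [1 − k/a, 1] established in the proof.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 400, 3
a = n // 2                                   # rank a(n), of order n

# Residual projection M of rank n-k.
X = rng.standard_normal((n, k))
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

# Symmetric idempotent A of rank a: projection onto a random a-dim subspace.
Q, _ = np.linalg.qr(rng.standard_normal((n, a)))
A = Q @ Q.T

ratio = np.trace(M @ A) / a
print(ratio)
```

With k = 3 and a = 200 the ratio must lie in [0.985, 1], and as n grows with a(n)/n fixed the lower end of the bracket rises to 1, which is the content of the lemma.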