SOME ALTERNATIVE MEASURES OF ASYMPTOTIC RELATIVE
EFFICIENCY FOR THE MULTIPARAMETER TESTING PROBLEM WITH
APPLICATION TO THE GROWTH CURVE PROBLEM
By
Robert Francis Woolson
Department of Biostatistics
University of North Carolina at Chapel Hill, N. C.
Institute of Statistics Mimeo Series No. 903
JANUARY 1974
SOME ALTERNATIVE MEASURES OF ASYMPTOTIC RELATIVE EFFICIENCY
FOR THE MULTIPARAMETER TESTING PROBLEM WITH
APPLICATION TO THE GROWTH CURVE PROBLEM
by
Robert Francis Woolson
A dissertation submitted to the faculty
of the University of North Carolina at
Chapel Hill in partial fulfillment of
the requirements for the degree of
Doctor of Philosophy in the Department
of Biostatistics
Chapel Hill
1972
Adviser

Reader
ABSTRACT
WOOLSON, ROBERT FRANCIS. Some Alternative Measures of Asymptotic Relative Efficiency for the Multiparameter Testing Problem with Application to the Growth Curve Problem. (Under the direction of PRANAB KUMAR SEN.)
Criteria for evaluating the loss in underfitting or overfitting
a growth curve model are proposed.
This problem has been formulated
as one of comparing two test statistic sequences which have limiting
chi-square distributions.
As the degrees of freedom of the two com-
petitive chi-square distributions may be different, alternative
measures to the standard Pitman asymptotic relative efficiency (ARE)
are considered.
Two such criteria are the trace asymptotic relative
efficiency (TARE) and the curvature asymptotic relative efficiency
(CARE).
Each of these quantities is shown to be the product of two factors: the first factor reflects the degrees of freedom of the two tests and the common significance level, while the second factor is a function of the two noncentrality parameters.
This second factor for the TARE is the ratio of the traces of the matrices in the noncentrality parameters, while the second factor for the CARE is the q-th root of the ratio of the determinants of the matrices in the noncentrality parameters, where q is the number of parameters in the common hypothesis of interest.
The CARE selects the test whose power
function has the greater generalized Gaussian curvature at the null
point, and the TARE selects the test with greater average local power,
where the average is taken as the average over the family of spheres.
The CARE and TARE are applied to the one-sample and multi-sample
growth curve problems.
Bounds for the TARE similar to the bounds
which exist for the CARE are derived.
It is shown that the c-sample
efficiencies are multiples of the one-sample results.
Numerical
illustrations of the TARE and the CARE are presented for specific
covariance matrices.
ACKNOWLEDGMENTS
The author appreciates the assistance of his advisor,
Dr. P. K. Sen, and of the members of his advisory committee, Drs. S. K.
Chatterjee, J. C. Cornoni, J. E. Grizzle, and D. E. Quade.
He wishes
to especially thank Dr. P. K. Sen for suggesting the topic of this
dissertation and for thoughtful discussion at any time during the
research.
Special thanks also go to Dr. James E. Grizzle for serving
as the author's course work advisor and for providing the author with
the invaluable experience of working on numerous consulting projects with him.
The financial assistance of the Department of Biostatistics
during the course of the author's graduate studies is gratefully
acknowledged.
The author also wishes to thank his family for their encouragement during the course of this work. In particular he wishes to thank his wife, Linda, his son, Rob, and his mother, Mrs. Sallie Hagey.
The author also expresses sincere appreciation to Mrs. Gay Hinnant
for the typing of the dissertation.
TABLE OF CONTENTS

                                                                 Page

ACKNOWLEDGMENTS . . . . . . . . . . . . . . . . . . . . . . . .    ii

LIST OF TABLES  . . . . . . . . . . . . . . . . . . . . . . . .     v

Chapter

I.   INTRODUCTION AND REVIEW OF THE LITERATURE  . . . . . . . .     1
     1.1  Introduction  . . . . . . . . . . . . . . . . . . . .     1
     1.2  Asymptotic Relative Efficiency  . . . . . . . . . . .     2
     1.3  Review of Optimal Parametric Tests  . . . . . . . . .     8
     1.4  Organization of the Study . . . . . . . . . . . . . .    11

II.  MEASURES OF ASYMPTOTIC RELATIVE EFFICIENCY FOR THE
     MULTIPARAMETER TESTING PROBLEM . . . . . . . . . . . . . .    14
     2.1  Introduction and Summary  . . . . . . . . . . . . . .    14
     2.2  The Test Statistics and Their Limiting
          Distributions . . . . . . . . . . . . . . . . . . . .    16
     2.3  The Asymptotic Power Function of {Q_N}  . . . . . . .    27
     2.4  Pitman ARE When t₁ = t₂ . . . . . . . . . . . . . . .    39
     2.5  Local Asymptotic Relative Efficiency (LARE) . . . . .    41
     2.6  Curvature Asymptotic Relative Efficiency (CARE) . . .    50
     2.7  Trace Asymptotic Relative Efficiency (TARE) . . . . .    57
     2.8  Bahadur Efficiency  . . . . . . . . . . . . . . . . .    59

III. APPLICATION OF TARE TO THE ONE-SAMPLE GROWTH
     CURVE PROBLEM  . . . . . . . . . . . . . . . . . . . . . .    67
     3.1  Introduction  . . . . . . . . . . . . . . . . . . . .    67
     3.2  The Statistical Model . . . . . . . . . . . . . . . .    68
     3.3  Comments on Data Reduction  . . . . . . . . . . . . .    71
     3.4  Parametric Procedures . . . . . . . . . . . . . . . .    73
     3.5  Nonparametric Procedures  . . . . . . . . . . . . . .    81
     3.6  Comparison of Nonparametric to Parametric
          Procedure . . . . . . . . . . . . . . . . . . . . . .    94
     3.7  Covariance Adjustment . . . . . . . . . . . . . . . .   109
     3.8  Comments  . . . . . . . . . . . . . . . . . . . . . .   118

IV.  APPLICATION OF TARE AND CARE TO THE MULTISAMPLE
     GROWTH CURVE PROBLEM . . . . . . . . . . . . . . . . . . .   120
     4.1  Introduction  . . . . . . . . . . . . . . . . . . . .   120
     4.2  The Statistical Model and Data Reduction  . . . . . .   120
     4.3  Parametric Procedures . . . . . . . . . . . . . . . .   122
     4.4  Nonparametric Procedures  . . . . . . . . . . . . . .   130

V.   NUMERICAL ILLUSTRATIONS OF THE CARE AND TARE . . . . . . .   135
     5.1  Introduction  . . . . . . . . . . . . . . . . . . . .   135
     5.2  Special Covariance Patterns . . . . . . . . . . . . .   136
     5.3  Underfitting  . . . . . . . . . . . . . . . . . . . .   140
     5.4  Overfitting . . . . . . . . . . . . . . . . . . . . .   155

VI.  SUMMARY AND SUGGESTIONS FOR FURTHER RESEARCH . . . . . . .   159

REFERENCES  . . . . . . . . . . . . . . . . . . . . . . . . . .   161
Table                                                            Page

5.3.9   Ratio of Traces of Noncentrality Parameters for
        Underfitting: First Order Model p = .1, h = 10  . . . .   153

5.3.10  Ratio of Traces of Noncentrality Parameters for
        Underfitting: First Order Model p = -.95, h = 10  . . .   154

5.4.1   Values of D and CARE for Overfitting p = .5 . . . . . .   157

5.4.2   Values of D and CARE for Overfitting p = -.5  . . . . .   157

5.4.3   Values of D and CARE for Overfitting p = +.9  . . . . .   158

5.4.4   Values of D and CARE for Overfitting p = -.9  . . . . .   158
NOTATION

B            is a matrix of real elements.

|B|          is the determinant of B.

‖A‖          is the Euclidean norm of a vector A.

L(X)         is the law or distribution of a random vector X.

→ a.s., → p  denote convergence almost surely and in probability,
             respectively.

tr(B)        is the trace of the matrix B.

A ⊗ B        is the Kronecker or direct product of A and B.

δ_ij         is the Kronecker delta.

ch(B)        denotes the set of characteristic roots of B.

E_t          denotes Euclidean t-space.

N_t          denotes the t-dimensional normal distribution.

I            is the identity matrix.

J            is the matrix of ones.
CHAPTER I
INTRODUCTION AND REVIEW OF THE LITERATURE
1.1  Introduction
When two or more test procedures are available to test the
same hypothesis we are faced with the problem of deciding which one
to use.
While the null hypothesis may be parametric or nonparametric, in most situations one is interested in a specific family of alternative hypotheses. Hence we assume that this family is parameterized by a parameter θ and, furthermore, that the value of θ when the null hypothesis is true is denoted by θ₀. For this reason we shall notationally represent the null hypothesis in this study as:

    H₀: θ = θ₀ .
In this thesis the specific problem of comparing two test
statistic sequences which have limiting chi-square distributions with
possibly different degrees of freedom is to be studied.
The main purpose of this work is to suggest and justify some measures of asymptotic relative efficiency (ARE) which may be used in comparing the two test sequences. Obviously, the measures of efficiency proposed should be such that, if the efficiency of test 2 relative to that of test 1 is greater than one, then test 2 possesses a limiting power function which in some sense is better than the limiting power function of test 1.
In addition, it is desirable for any measure of ARE proposed to be free of arbitrary or unknown quantities and, furthermore, it should take on a single numerical value when sampling is performed from a completely specified distribution.
Another purpose of
this work is to apply the proposed criteria to the comparison of tests
used for growth curve models.
Of particular interest is the loss in
efficiency when we underfit or overfit the true model.
We shall see,
however, that the efficiency criteria proposed are applicable to a
larger set of problems and are not restricted to the study of growth
curve problems alone.
1.2  Asymptotic Relative Efficiency
For comparing two α-level tests, φ₁ and φ₂, a reasonable measure of relative efficiency to use is:

    e₂,₁ = N₁ / N₂ ,

where N₁ and N₂ are the minimum sample sizes for which φ₁ and φ₂ at level α have power β against the alternative that θ = θₐ, where for simplicity we take θ to be scalar. To study this ratio for all values of (α, β, θₐ) is quite an involved study and, as an alternative, we may consider asymptotic comparisons of the two tests. One asymptotic approach is to fix θₐ, θₐ ≠ θ₀, and compare the limiting powers of φ₁ and φ₂ as sample size increases without bound for this fixed alternative.
An obvious shortcoming of this approach is that if φ₁ and φ₂ are both consistent tests then both of the limiting powers are one, and this method of comparison does not discriminate between the two tests. On the other hand, if the tests are inconsistent they are of little interest. Bahadur (1960) considers the comparison of the inverse ratio of the sample sizes in the limit for the two tests to achieve the same significance level; we shall see in Chapter II that this efficiency has serious deficiencies for our problem.
Pitman (1948) [see e.g. Noether (1955)] proposed that rather than considering a fixed alternative hypothesis, a sequence of alternative hypotheses depending on the sample size, N, be chosen such that the limit of this sequence approaches the null point and simultaneously the power is bounded away from one. In the Pitman sense we consider testing

    H₀: θ = θ₀

against the sequence

    H_N: θ_N = θ₀ + N^(−δ) A ,

where A is a fixed but arbitrary nonzero real number and δ > 0. The Pitman procedure for obtaining a measure of ARE is to consider two sequences of sample sizes, {N₁} and {N₂}, chosen such that the limiting powers of the two tests through the sequence {θ_N} are the same. Then according to Pitman we have:
Definition:  The asymptotic relative efficiency (ARE) of test φ₂ relative to test φ₁ is the limiting value of N₁/N₂, where N₁ is the number of observations required by test φ₁ to equal the power of test φ₂ based on N₂ observations, while simultaneously N₂ → ∞ and θ_N → θ₀.

If φ₁ and φ₂ are tests based on the statistics T_N1 and T_N2, respectively, and if the limiting distributions of T_Ni through H_N converge to the normal distribution as N → ∞, then with suitable restrictions on the limiting behavior of the first two moments of T_Ni we may find the Pitman ARE quite easily. Denote the mean of T_Ni when θ = θ_N by μ_Ni(θ_N) and its variance by σ²_Ni(θ_N). Consider the hypothesis H₀: θ = θ₀ against the one-sided sequence H_N: θ_N = θ₀ + N^(−δ) A, where A > 0 (< 0), and suppose we have the following three conditions in addition to the limiting normality of T_Ni:

    (1.2.1)    (d/dθ) μ_Ni(θ₀) > 0   (< 0) ,

    (1.2.2)    lim_{N→∞} [(d/dθ) μ_Ni(θ_N)] / [(d/dθ) μ_Ni(θ₀)] = 1 ,

    (1.2.3)    lim_{N→∞} [(d/dθ) μ_Ni(θ₀)] / [N^δ σ_Ni(θ₀)] = c_i > 0   (< 0) ;

then the ARE of φ₂ relative to φ₁ can be seen to be [c₂/c₁]² when δ = 1/2, and in general it is [c₂/c₁]^(1/δ). The quantity c_i is called the efficacy of test φ_i.
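As a concrete illustration of the efficacy calculation (a numerical sketch, not part of the original text), consider the classical comparison of the sign test with the mean test in a normal location family; the resulting Pitman ARE is the familiar 2/π:

```python
import math

# Pitman efficacy (M = 1, delta = 1/2): c_i is the limiting value of
# [d mu_Ni(theta_0)/d theta] / [sqrt(N) sigma_Ni(theta_0)] for a
# N(theta, sigma^2) location family.
sigma = 1.0

# Test 1: mean test, T_N1 = sample mean.  mu(theta) = theta and
# Var(T_N1) = sigma^2 / N, so c_1 = 1 / sigma.
c1 = 1.0 / sigma

# Test 2: sign test, T_N2 = proportion of observations exceeding 0.
# mu(theta) = 1 - Phi(-theta/sigma) has derivative phi(0)/sigma at
# theta = 0, and Var(T_N2) = 1/(4N) at the null, so c_2 = 2 phi(0) / sigma.
phi0 = 1.0 / math.sqrt(2.0 * math.pi)
c2 = 2.0 * phi0 / sigma

# ARE of the sign test relative to the mean test: (c2 / c1)^2 = 2 / pi.
are = (c2 / c1) ** 2
print(are)
```

The exponent 2 on the efficacy ratio is the 1/δ of the general formula with δ = 1/2.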
Noether (1955) extended Pitman's definition of ARE to the case where the first (M_i − 1) derivatives of μ_Ni(θ) at θ₀ are zero and the M_i-th derivative is nonzero at θ₀. When M₁ = M₂ = M, he found that the ARE of φ₂ relative to φ₁ is [c₂/c₁]^(1/Mδ), where

    c_i = lim_{N→∞} [d^M μ_Ni(θ₀)/dθ^M] / [N^(Mδ) σ_Ni(θ₀)] .
Although not explicitly pointed out in Noether's article, it should be observed that if one does not require M₁ = M₂ then the ARE is indeterminate, depending as it does on the unknown A₁ (or A₂) raised to a power of (M₁ − M₂). This point causes no practical limitation, since if M₁ ≠ M₂ then we are comparing tests which behave quite differently.
For the one-sided alternative hypothesis, Blomqvist (1950) proposed an asymptotic local relative efficiency defined as the limit of the ratio of the sample sizes chosen so that the two limiting power functions have the same slope at θ₀. Under the conditions M = 1 and δ = 1/2, Noether (1955) established the equivalence of this definition and the Pitman definition. Kendall and Stuart (1963) discuss other cases and show, for example, for the two-sided test with M = 1, that under mild restrictions the ARE is equal to the limiting ratio of the second derivatives of the power functions at θ₀.
It has been observed [see e.g. Puri & Sen (1971)] that the
theory for comparing the two limiting distributions does not require
asymptotic normality.
It is sufficient that the two power functions
can be made analytically the same by an appropriate choice of the
sample sizes.
They present the requisite theory for the comparison of
two test statistics which have noncentral chi-square distributions
both with the same degrees of freedom, p.
In this case, since the noncentral chi-square is a monotonically increasing function of its noncentrality parameter, they found that when M = 1 and δ = 1/2 the ARE is simply the ratio of the two respective noncentrality parameters. They point out that this definition is entirely equivalent to the Pitman-Noether definition for comparison of tests based on one degree of freedom.
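The monotonicity of the noncentral chi-square power in its noncentrality parameter, on which this definition rests, can be checked by simulation (an illustrative sketch; with t = 2 degrees of freedom the central chi-square critical value has the closed form −2 log α):

```python
import math
import random

random.seed(1)

alpha = 0.05
crit = -2.0 * math.log(alpha)   # upper-alpha point of the central chi-square, 2 df

def power(nc, reps=100_000):
    """Monte Carlo power of the chi-square test when the two normal
    components have means (sqrt(nc), 0), i.e. noncentrality nc."""
    shift = math.sqrt(nc)
    hits = 0
    for _ in range(reps):
        z1 = random.gauss(0.0, 1.0) + shift
        z2 = random.gauss(0.0, 1.0)
        if z1 * z1 + z2 * z2 > crit:
            hits += 1
    return hits / reps

# Power is increasing in the noncentrality parameter.
powers = [power(nc) for nc in (0.0, 1.0, 4.0, 9.0)]
print(powers)
```

At noncentrality zero the estimated power recovers the significance level α, as it must.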
Turning now to the multiparameter testing problem, we consider

    H₀: θ = θ₀   (θ a q×1 vector)

and the sequence of alternative hypotheses

    H_N: θ_N = θ₀ + N^(−δ) A ,

where A is a fixed but arbitrary non-null vector and δ > 0. Puri and Sen study the case p = q in their text and, when M = 1, δ = 1/2, observe that the Pitman-Noether efficiency is the ratio of two positive definite quadratic forms in A.
Clearly, in this
case, there is no unique answer regarding the ARE since it depends, in
general, on the arbitrary vector A.
By application of a theorem due to
Courant on the extrema of the ratio of two positive definite quadratic
forms [Puri & Sen (1971, p. 122)], bounds may be placed on the ARE
over all non-null A. This is, in fact, done for a number of cases in their text.
The approach of placing bounds on the ARE provides some information on the ARE. However, some tests may be placed in a misleading position in the spectrum of tests for a given problem, since the minimum bound on the ARE could be quite near zero while the typical or average performance of the test may be better than that of its competitors. To develop appropriate average measures free from A is one of the goals of this investigation. In our study we shall consider a Pitman type of sequence of alternative hypotheses; however, a comparison in the sense of Bahadur (1960) is a possibility. Hence, we shall summarize the necessary points for the Bahadur criterion of efficiency.
In the Bahadur definition of efficiency it is assumed that we have a family of probability measures parameterized by θ ∈ Ω, defined on the same probability space. The parameter set is denoted by Ω and is partitioned into Ω₀ and Ω − Ω₀, where Ω₀ is the single point set containing θ₀. Let us suppose that the two competing test statistic sequences are {T_N1} and {T_N2}.
Definition:  The sequence of test statistics {T_Ni} is called a standard sequence for testing H₀ if, and only if, {T_Ni} satisfies the following conditions:

    I.   there exists a continuous distribution function F_i(x) such that

             lim_{N→∞} Pr_{θ₀}[T_Ni ≤ x] = F_i(x)   for all real x ;

    II.  there exists a constant a_i ∈ (0, ∞) such that

             −log_e [1 − F_i(x)] = (a_i x² / 2)[1 + o(1)]   as x → ∞ ;

    III. there exists a function b_i(θ) from Ω − Ω₀ to the positive real line such that for every ε > 0 we have

             lim_{N→∞} Pr_θ [ |N^(−1/2) T_Ni − b_i(θ)| > ε ] = 0

         for all θ ∈ Ω − Ω₀.
If {T_Ni} is a standard sequence for testing H₀, then the quantity a_i b²_i(θ) is called the asymptotic slope of the sequence {T_Ni}. With this in mind, if both {T_N1} and {T_N2} are standard sequences for testing H₀, then the Bahadur efficiency of test 2 relative to test 1 is a₂b₂²(θ)/a₁b₁²(θ) for all θ ∈ Ω − Ω₀. It can be shown that the Bahadur efficiency is the limit of the inverse ratio of the sample sizes needed for the two tests to have the same level of significance in large samples.
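A small simulation (an illustrative sketch, not from the text) shows the role of b_i(θ): for the one-sided normal-mean test the statistic divided by √N converges to b(θ) = θ/σ, and since the standard normal tail gives a_i = 1, the asymptotic slope is θ²/σ²:

```python
import math
import random

random.seed(7)

# One-sided normal-mean test: T_N = sqrt(N) * xbar / sigma is a standard
# sequence with F(x) = Phi(x), so a = 1 and b(theta) = theta / sigma.
theta, sigma, N = 0.8, 1.0, 20_000

xbar = sum(random.gauss(theta, sigma) for _ in range(N)) / N
T_N = math.sqrt(N) * xbar / sigma

b_hat = T_N / math.sqrt(N)      # estimates b(theta) = 0.8
slope_hat = b_hat ** 2          # estimates the asymptotic slope theta^2 = 0.64
print(b_hat, slope_hat)
```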
In the later chapters we will show that under mild assumptions the Bahadur efficiency of φ₂ relative to φ₁ is

    [η₂′(θ) Γ₂^(−1)(θ) η₂(θ)] / [η₁′(θ) Γ₁^(−1)(θ) η₁(θ)]   for θ ∈ Ω − Ω₀ ,

where both η_i(θ) and Γ_i(θ) depend on the alternative value of θ. The Bahadur efficiency will also be shown to be insensitive to differences in degrees of freedom of the two tests.

The measures of ARE we shall propose will be defined using various notions of test optimality. Hence, we shall review the relevant areas of the theory of hypothesis testing in the following section.
1.3  Review of Optimal Parametric Tests
Uniformly most powerful tests (or critical regions) are known
to exist in only the rarest situations.
In order to derive tests with uniformly most powerful properties when restricting attention to a subset of all available tests, Neyman and Pearson (1936) generalized their fundamental lemma (for testing a simple hypothesis against a simple alternative) to power functions subject to more than one side condition. This generalized lemma is stated and proven in several places (e.g., Lehmann (1959)).
For testing the hypothesis H₀: θ = θ₀ against the two-sided alternative Hₐ: θ ≠ θ₀, where θ is a one-dimensional parameter, Neyman & Pearson (1936) propose the type A critical region, which may be described as the locally best unbiased critical region. This test is the one which maximizes the second derivative of the power function evaluated at θ₀, subject to size and unbiasedness restrictions on the power function. For the two-parameter testing problem, Neyman & Pearson (1938) proposed type C critical regions; these regions can be constructed if one knows the relative importance locally of type II errors, since this region is
defined to be the one with best local power along a given family of
concentric ellipses with the same shape and direction of principal
axes.
If the family of ellipses consists of concentric circles, then we say the type C region is regular; otherwise it is said to be nonregular. Two main objections to type C regions can be raised: in the first place, one may not be able to state the relative importance of type II errors and, secondly, regular regions are not invariant under one-to-one, twice differentiable transformations of the parameter space. The last point simply means that regular regions can become nonregular regions even under some elementary transformations of the parameter space.
To overcome these problems Isaacson (1951) proposed a type D
critical region which does not require knowledge of the relative
importance of type II errors.
To motivate the type D test, Isaacson observes that the type A power function satisfies an attractive geometrical property. Namely, if one considers a horizontal chord drawn at a fixed infinitesimal distance above θ₀, the length of this chord for the type A power function is a minimum when compared to the length of the chord for any other of the power functions satisfying the stated conditions of size and unbiasedness.
A type D region can be defined in the q-parameter testing problem as that region which maximizes the generalized Gaussian curvature of the power function at θ₀ subject to size and unbiasedness conditions. In order to be more specific, let us denote by β(θ|w) the power function of a test with critical region w, and let β^(i)(θ₀|w) denote the first partial derivative of β(θ|w) with respect to θ_i at θ₀, i = 1,2,…,q. Further let β^(i,j)(θ₀|w) denote the second partial derivative of β(θ|w) with respect to θ_i θ_j evaluated at θ = θ₀, i, j = 1,2,…,q.
Letting the determinant of a matrix A be denoted by |A|, the following is Isaacson's definition of a type D region for testing H₀: θ = θ₀:

Definition:  A region w₀ is said to be an unbiased critical region of type D for testing H₀ if:

    a.  β(θ₀|w₀) = α ,
    b.  β^(i)(θ₀|w₀) = 0 ,  i = 1,2,…,q ,
    c.  ((β^(i,j)(θ₀|w₀))) is positive definite,
    d.  |((β^(i,j)(θ₀|w₀)))| > |((β^(i,j)(θ₀|w)))| for any other
        region w satisfying conditions a, b and c.
In the two-dimensional case it follows that the type D region minimizes the area of an ellipse at an infinitesimal distance above the point θ₀, and in the general q-parameter case the type D region would be the one whose power function minimizes the volume of a certain ellipsoid at a given cross-section of the power function. The type D region is then a generalization of the type A region to more than one dimension.
Type D regions are characterized by Isaacson by use of the generalized Neyman-Pearson lemma. The generalized Gaussian curvature of β(θ|w) at θ₀ is given by:

Definition:  The generalized Gaussian curvature of β(θ|w) at θ₀ is

    K = |((β^(i,j)(θ₀|w)))| / (1 + Σ_{j=1}^q β^(j)(θ₀|w)²)^((q+2)/2) .

Thus, if a test is unbiased then K = |((β^(i,j)(θ₀|w)))|. If we compare two critical regions (tests), then the test with the larger K has a power function which encloses an ellipsoid of smaller volume than the other power function along any of a family of infinitesimal contours. This presumes, of course, that both of the ((β^(i,j)(θ₀|w)))'s are positive definite, since a lack of definiteness of this matrix would mean that the intersection of the power function with the fixed hyperplane would not be an ellipsoid.
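A numerical sketch (illustrative matrices only, not from the text) makes the comparison concrete: for two hypothetical unbiased tests the curvature comparison reduces to comparing the determinants of the Hessians of the power functions at θ₀, and the determinant and trace orderings need not agree:

```python
import numpy as np

# Hypothetical Hessians of two power functions at the null point
# (positive definite, as required for the curvature comparison).
B1 = np.array([[2.0, 0.0],
               [0.0, 2.0]])
B2 = np.array([[5.0, 0.0],
               [0.0, 0.5]])

# For an unbiased test the generalized Gaussian curvature at theta_0
# reduces to the determinant of the Hessian: test 1 wins on curvature.
K1, K2 = np.linalg.det(B1), np.linalg.det(B2)
print(K1, K2)

# ...but test 2 wins on the trace (average local power) criterion.
print(np.trace(B1), np.trace(B2))
```

That the two orderings can disagree is exactly why the CARE and the TARE are distinct criteria.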
Wald (1943) defines a critical region w₀ to have uniformly best average power with respect to a family of surfaces, K_c, and a weight function, g(θ), if for any other region w the surface integral of the power function of w₀ times g(θ) over K_c is greater than the surface integral of the power function of w times g(θ) over the surface K_c. If we define the surface as the unit sphere, i.e., ‖A‖ = 1, then we can show that the surface integral of a quadratic form A′BA over ‖A‖ = 1 is proportional to the trace of the matrix B. That is,

    ∫_{‖A‖=1} A′ B A dA = k tr(B) .

Using this fact, it will be possible to deduce that, of two tests, the one with the larger trace of ((β^(i,j)(θ₀|w))) has greater average power locally over the family of spheres than the other.
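The proportionality of the surface integral to the trace can be checked numerically: averaging the quadratic form over random directions uniform on the unit sphere gives tr(B)/q (a sketch with an arbitrary symmetric B):

```python
import numpy as np

rng = np.random.default_rng(0)

q = 3
B = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 0.5],
              [0.0, 0.5, 2.0]])   # arbitrary symmetric matrix

# Uniform points on the unit sphere: normalize standard normal vectors.
Z = rng.standard_normal((200_000, q))
U = Z / np.linalg.norm(Z, axis=1, keepdims=True)

# Average of A'BA over the sphere; theory says it equals tr(B)/q.
avg = np.einsum('ni,ij,nj->n', U, B, U).mean()
print(avg, np.trace(B) / q)
```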
1.4 Organization of the Study
Since many of the standard parametric and nonparametric multivariate test procedures are based on test statistics which are quadratic forms, we shall consider in the general development in Chapter II two sequences of statistics which are quadratic forms in two sequences of random vectors. When the null hypothesis, H₀: θ = θ₀ (θ a q×1 vector), is true it is assumed that each of the sequences of test statistics has a limiting central chi-square distribution; one with t₁ degrees of freedom, the other with t₂ degrees of freedom. Sufficient conditions are presented under which we may compute the limiting power of our statistics through the sequence of alternative hypotheses H_N. Simplifications of the power functions are obtained in terms of the parameters.
Three new definitions of ARE are proposed in Chapter II. These are: a) local asymptotic relative efficiency (LARE), b) curvature asymptotic relative efficiency (CARE), and c) trace asymptotic relative efficiency (TARE). The precise definitions are presented in Chapter II. All three criteria depend on the level of significance of the tests, α, and the degrees of freedom, t₁ and t₂. Tabulations in Chapter II show that the dependence on α is slight while the dependence on t₂ − t₁ is strong. In addition, all three criteria depend on the noncentrality parameters, and the latter two criteria are shown to be "average" measures of efficiency, independent of the direction of approach of θ_N to θ₀. The curvature criterion of ARE is a function of the ratio of the determinants of the noncentrality parameters, while the trace criterion is a function of the ratio of the traces of the noncentrality parameters. There is, it seems, an interesting connection between these criteria and type D and type E optimality in the field of experimental design. Chapter II is concluded with a brief study of the Bahadur efficiency and its relationship to the LARE in a limiting sense.
Chapters III and IV are applications of the measures of ARE to
the one-sample and multisample growth curve problems, respectively.
Polynomial models are studied and the efficiency results are presented
for the cases of underfitting and overfitting the correct model.
These efficiency results are presented for the least squares procedures
and the rank scores tests.
The least squares and nonparametric
procedures are also compared using the curvature and trace criteria.
Bounds using the trace criterion are obtained similar to those known
for the curvature criterion.
The use of the higher order polynomial
terms as covariables is also studied and the ARE is evaluated in this
case.
Chapter V contains numerical computations of the results obtained
in Chapters III and IV for some special cases.
CHAPTER II
MEASURES OF ASYMPTOTIC RELATIVE EFFICIENCY FOR THE
MULTIPARAMETER TESTING PROBLEM
2.1  Introduction and Summary
The purpose of this chapter is to propose and study several
competitive measures of asymptotic relative efficiency (ARE) for the
multiparameter testing problem.
We shall assume throughout that we
have two sequences of test statistics available for testing the same
hypothesis.
It is customary in both parametric and nonparametric inference to consider a specified type of alternative hypothesis, e.g., translation alternatives, scale alternatives, etc. We label this family of alternatives by a parameter θ, and we let θ₀ be the value of θ when the hypothesis we wish to test is true. In the text of this chapter we shall occasionally refer to the null hypothesis as H₀: θ = θ₀; however, we should keep in mind that the hypothesis may in fact be much more general. To compare the two test statistic sequences in large samples we shall consider a Pitman type sequence of alternative hypotheses, and we shall present in the form of a theorem (Theorem 2.2.1) sufficient conditions under which these test statistics have limiting chi-square distributions through the sequence of alternative hypotheses. Under the assumptions of Theorem 2.2.1 we shall study in detail the limiting power functions of the statistics and obtain several simplifications. To derive these results we shall
find it useful to prove some results for homogeneous polynomials in
general and then apply these results to the expanded power functions.
In order to compare the two test statistics we consider a common
sequence of alternative hypotheses and derive the Pitman ARE of test 2
to test 1 when they both have limiting chi-square distributions with
equal degrees of freedom.
When the degrees of freedom are unequal we propose a local asymptotic relative efficiency (LARE) for the comparison of the two tests. This LARE is found to be equal to a scalar function, R, multiplied by the ratio of the noncentrality parameters of the limiting power functions. The function, R, has as its arguments the common significance level of the tests, α, and the two respective degrees of freedom of the tests. In the general setting θ is a vector of q parameters and the i-th test statistic has a limiting power function with t_i degrees of freedom for i = 1,2. The power depends also on a positive integer, M (defined by the conditions of Theorem 2.2.1). In considering other measures of ARE we briefly discuss the problems when M > 1 and then restrict ourselves to the case M = 1. When at least one t_i ≥ q for i = 1 or 2, we define a measure of ARE based on the generalized Gaussian curvature of the limiting power functions at the null value of θ.
We shall show that this criterion of ARE produces
a quantity which is independent of the sequence of alternative hypotheses and, furthermore, when t₁ = t₂ this criterion reduces to the ratio of the geometric means of the characteristic roots of the matrices in the noncentrality parameters. When t₁ < q and t₂ < q, the generalized Gaussian curvatures of the power functions are zero at θ₀ and this method of comparison is useless. In this situation we propose a
criterion of comparison based on the average power of the local power
functions over a given family of surfaces.
This approach leads to
comparison of the trace of the matrices in the noncentrality parameters
multiplied by the function, R.
We consider at the end of this chapter the efficiency criterion
proposed by Bahadur (1960).
We present sufficient conditions under
which the statistics we consider form a standard sequence for testing
the null hypothesis, and obtain the Bahadur efficiency of test 2 to
test 1.
It is shown that this efficiency (Bahadur) depends on the unknown alternative θ in both the covariance matrix and the location vector. The difference in degrees of freedom of the two tests is also not reflected in this efficiency criterion. We discuss briefly the limiting form of the Bahadur efficiency as θ → θ₀ and, under certain assumptions, show that this limiting situation is equivalent to the LARE without the adjustment for degrees of freedom of the tests.
The
insensitivity of the Bahadur efficiency to differences in degrees of
freedom of the tests, and the complicated manner in which this efficiency depends on the unknown alternative value of θ, lead us to dismiss
this criterion as a possible mode of comparison of the two tests.
2.2  The Test Statistics and Their Limiting Distributions
For testing a specified null hypothesis against a parametric family of alternatives we have two tests of size α available, φ₁ and φ₂. The family of alternative hypotheses is parameterized by the vector θ, which has q elements. When the null hypothesis is true the value of θ is denoted by θ₀; for this reason we sometimes denote the null hypothesis as:
(2.2.1)    H₀: θ = θ₀ ,

and the alternative by

    Hₐ: θ = θₐ ,  θₐ ≠ θ₀ .

The tests, φ₁ and φ₂, are functions of the statistics {Q_N^(1)} and {Q_N^(2)}, respectively. These statistic sequences are further assumed to be quadratic forms in random vector sequences {T_N^(1)} and {T_N^(2)}, respectively. Each vector T_N^(i) is composed of t_i elements and the test statistics are written as:
(2.2.2)    Q_N^(i) = N (T_N^(i) − μ_N^(i)(θ₀))′ [Σ_N^(i)]^(−1) (T_N^(i) − μ_N^(i)(θ₀)) ;   i = 1,2 ,

where √N μ_N^(i)(θ₀) is the mean vector of √N T_N^(i) when θ = θ₀, and Σ_N^(i) is some consistent estimator of the covariance matrix of √N T_N^(i). The tests would reject the null hypothesis for large Q_N^(i) and accept the null hypothesis for small Q_N^(i). We may assume that the discriminant of Q_N^(i) is of full rank, since if it is not it is always possible to express Q_N^(i) as a quadratic form in fewer variables whose discriminant would be nonsingular. To illustrate this point more
To illustrate this point more
clearly let us consider the multivariate multisample location problem.
One statistic in common use is the Hotelling- Lawley Trace ,,,hich is
defined as:
c
(2.2.3)
P
P
2
2: sij(x(i)_~(i»(x(j)_~(j»
T = 2: nK 2:
K
K
N
i=l j=l
K=l
= «(N-c)
c
N=
2: ~
K=l
-1
,,,here «sij) )
c
2:~xii)-x(i»(x(j)-xij»)-1
2:
i
K
Ki
K=l i=l
i,j = 1, ••• , P
e
18
T_N² is used to test the null hypothesis:

(2.2.4)    H₀: μ₁ = μ₂ = ⋯ = μ_c   (each μ_K p×1) ,

where μ_K is the location vector of an observation from population K. We can rewrite μ_K as μ_K = μ + γ_K, where Σ_{K=1}^c γ_K = 0. With this restriction the hypothesis (2.2.4) can be written equivalently as

(2.2.5)    γ₁ = γ₂ = ⋯ = γ_{c−1} = 0 .
If x_{K1}, …, x_{K n_K} are independent and identically distributed p-variate normal vectors with mean μ_K and covariance matrix Σ, symmetric and positive definite, then

    x̄_K = n_K^(−1) Σ_{t=1}^{n_K} x_{Kt}

is distributed N_p(μ_K, n_K^(−1) Σ), and

    x̄ = N^(−1) Σ_{K=1}^c n_K x̄_K

is distributed N_p(Σ_{K=1}^c (n_K/N) μ_K, N^(−1) Σ).
As a result we see that for each K = 1,2, ••• ,c we have
\
(x..-~) is distributed N. (llK-ll , (n- l _N-1 ) E).
-K ~
P - K
The covariance of
- = -
=
-1
(~-~)(~K,-~)'is- N
~
and consequently the joint
19
distribution of [(~-~); K
mean
= 1,2, ••• ,c]
is pc-variate normal with
K = l,2, ••• ,c] and covariance matrix defined by:
[(~K-~)'
(2.2.6)
>:
@
[~Kq ~
_
]
K
K,q
= l,2, ••• ,c
c
with the restriction
>: nK(~-~} = ...0, we see that the rank of this
K=l
..
distribution is p(c-l}, which means the quadratic form in (2.2.3),
if viewed as a quadratic form in [(~-~); K = 1,2, ••• ,c], has a discriminant of rank p(c-1}.
On the other hand if we consider the
p(c-l} vector;
[(~-~); K = 1,2, ••• ,c-l],
(2.2.7)
this is p(c-1} variate normal with mean
and covariance matrix given by
(2.2.8)
K,q
= 1,2, ••• ,c-1
The inverse of (2.2.8) is
(2.2.9)
l,2, ••• ,c-1
Since we have sampled from normal populations we know from the strong
laws of large numbers that S
=
«s .. )} a.s. >: as N +
~J
+
...
00.
So a con-
sistent estimate of (2.2.9) is provided when we substitute S-l in
place of
E-1
in (2.2.9).
With some algebra, and using the restriction $\sum_{K=1}^{c} n_K(\bar x_K - \bar x) = 0$, we see that $T_N^2$, defined by (2.2.3), can be written equivalently as:

(2.2.10)    $\big[\sqrt{N}(\bar x_K - \bar x),\ K = 1,\ldots,c-1\big]'\,\Big(\big((\delta_{Kq}\tfrac{N}{n_K} - 1)\big) \otimes S\Big)^{-1}\,\big[\sqrt{N}(\bar x_K - \bar x),\ K = 1,\ldots,c-1\big]$.

But (2.2.10) is a quadratic form whose discriminant is of full rank, so we see that quadratic forms can be written in a "full rank" form. In the theory presented in this chapter we may, therefore, assume without loss of generality that the quadratic forms which we use as test statistics have nonsingular distributions.
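The algebraic identity behind (2.2.10) can be spot-checked numerically. The following is a minimal sketch, with illustrative group sizes and simulated data (the variable names are mine, not the dissertation's): it computes $T_N^2$ both in its original form (2.2.3) and as the full-rank quadratic form (2.2.10), and verifies that the two agree.

```python
import numpy as np

rng = np.random.default_rng(0)
p, ns = 3, [8, 10, 12]                 # p variables, c = 3 groups (illustrative)
c, N = len(ns), sum(ns)
xs = [rng.normal(size=(n, p)) for n in ns]

means = [x.mean(axis=0) for x in xs]
grand = sum(n * m for n, m in zip(ns, means)) / N

# pooled within-group covariance ((s_ij)) with divisor N - c, as in (2.2.3)
W = sum((x - m).T @ (x - m) for x, m in zip(xs, means)) / (N - c)
S_inv = np.linalg.inv(W)

# T_N^2 in its original form (2.2.3)
T2 = sum(n * (m - grand) @ S_inv @ (m - grand) for n, m in zip(ns, means))

# full-rank form (2.2.10): quadratic form in the first c-1 centered means
y = np.concatenate([np.sqrt(N) * (m - grand) for m in means[:c - 1]])
Lam = np.diag([N / n for n in ns[:c - 1]]) - 1.0   # ((delta_Kq N/n_K - 1))
T2_full_rank = y @ np.linalg.inv(np.kron(Lam, W)) @ y

assert np.allclose(T2, T2_full_rank)
```

The agreement rests on the restriction $\sum_K n_K(\bar x_K - \bar x) = 0$, which lets the $c$th centered mean be eliminated exactly as in the text.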
We now present a
theorem which allows us to compute the limiting power of our statistics
through a sequence of alternative hypotheses.
Theorem 2.2.1. For testing a specified hypothesis $H_0$ against a sequence of alternative hypotheses $H_N$ defined by:

$H_N\colon\ \theta_N = \theta_0 + N^{-\delta}\lambda$,

where $\delta > 0$, $\lambda$ is a fixed non-null vector and $\theta_0$ is the value of $\theta$ when $H_0$ is true, we have a $t$-vector $T_N$ with mean $\mu_N(\theta)$ and positive definite covariance matrix $\Sigma(\theta)$. Suppose the following five conditions are true:

(2.2.11)
a) $\lim_{N\to\infty} N^{(M-r)\delta}\Big(\sum_{\ell=1}^{q} \lambda_\ell \frac{\partial}{\partial\theta_\ell}\Big)^{r} \mu_{Nj}(\theta_0) = 0$ for $r = 1,2,\ldots,M-1$ and $j = 1,2,\ldots,t$;

b) $\lim_{N\to\infty} \frac{1}{M!}\Big(\sum_{\ell=1}^{q} \lambda_\ell \frac{\partial}{\partial\theta_\ell}\Big)^{M} \mu_{Nj}(\theta_0) \neq 0$ for some $M \geq 1$ and at least one $j \in (1,2,\ldots,t)$;

(2.2.12)
a) $\mathrm{ch}_j\big[\hat\Sigma_N\,\Sigma^{-1}(\theta_N)\big] \xrightarrow{P} 1$ as $N \to \infty$, $j = 1,\ldots,t$;
b) $\mathrm{ch}_j\big[\Sigma(\theta)\,\Sigma^{-1}(\theta_0)\big] \to 1$ as $\theta \to \theta_0$, $j = 1,\ldots,t$;

(2.2.13)    the $M$th partial derivatives of $\mu_{Nj}(\theta)$ are continuous at $\theta_0$ for each $j = 1,2,\ldots,t$;

(2.2.14)    $c_j = \lim_{N\to\infty} \Big(\sum_{\ell=1}^{q} \lambda_\ell \frac{\partial}{\partial\theta_\ell}\Big)^{M} \mu_{Nj}(\theta_0)$ exists for each $j = 1,2,\ldots,t$, defining the $t \times 1$ vector $c$;

(2.2.15)    $\mathcal{L}\big[N^{M\delta}\,(T_N - \mu_N(\theta_N))\big] \to N_t\big(0,\ \Sigma(\theta_0)\big)$ as $N \to \infty$, uniformly in $\theta$, and the distribution is non-degenerate.

Then

(2.2.16)    $\mathcal{L}(Q_N) \to \chi^2(t, \Delta)$ as $N \to \infty$,

where $\chi^2(t,\Delta)$ is a noncentral chi-square with $t$ degrees of freedom and noncentrality parameter

(2.2.17)    $\Delta = \dfrac{c'\,\Sigma^{-1}(\theta_0)\,c}{(M!)^2}$.
Proof: For each $j = 1,2,\ldots,t$, by (2.2.11) and (2.2.13) we can write, by Taylor's theorem and the mean-value theorem:

$\mu_{Nj}(\theta_N) = \mu_{Nj}(\theta_0) + \sum_{r=1}^{M-1} \dfrac{N^{-r\delta}}{r!}\Big(\sum_{\ell=1}^{q}\lambda_\ell\dfrac{\partial}{\partial\theta_\ell}\Big)^{r}\mu_{Nj}(\theta_0) + \dfrac{N^{-M\delta}}{M!}\Big(\sum_{\ell=1}^{q}\lambda_\ell\dfrac{\partial}{\partial\theta_\ell}\Big)^{M}\mu_{Nj}(\theta^{*})$

where $\theta^{*} = \theta_0 + h\,N^{-\delta}\lambda$, $h \in (0,1)$; therefore

$\mu_{Nj}(\theta_N) = \mu_{Nj}(\theta_0) + \sum_{r=1}^{M-1} \dfrac{N^{-r\delta}}{r!}\Big(\sum_\ell\lambda_\ell\dfrac{\partial}{\partial\theta_\ell}\Big)^{r}\mu_{Nj}(\theta_0) + \dfrac{N^{-M\delta}}{M!}\Big[\Big(\sum_\ell\lambda_\ell\dfrac{\partial}{\partial\theta_\ell}\Big)^{M}\mu_{Nj}(\theta_0) + \varepsilon_{Nj}(\theta^{*},\theta_0)\Big]$.

But $\varepsilon_{Nj}(\theta^{*},\theta_0) = o(1)$ as $N \to \infty$ because $\theta^{*} \to \theta_0$ as $N \to \infty$; hence by (2.2.13) and (2.2.11a),

(2.2.18)    $N^{M\delta}\big[\mu_{Nj}(\theta_N) - \mu_{Nj}(\theta_0)\big] = \dfrac{1}{M!}\Big(\sum_\ell\lambda_\ell\dfrac{\partial}{\partial\theta_\ell}\Big)^{M}\mu_{Nj}(\theta_0) + o(1)$ as $N \to \infty$.

Let us define $Q_N$ and $Q_N^{*}$ by the following:

$Q_N = N^{2M\delta}\,(T_N - \mu_N(\theta_0))'\,\hat\Sigma_N^{-1}\,(T_N - \mu_N(\theta_0))$,
$Q_N^{*} = N^{2M\delta}\,(T_N - \mu_N(\theta_0))'\,\Sigma^{-1}(\theta_N)\,(T_N - \mu_N(\theta_0))$.

We first shall show that $\mathcal{L}(Q_N^{*}) \to \chi^2(t,\Delta)$ as $N \to \infty$; as we shall show $|Q_N - Q_N^{*}| = o_p(1)$, it would then follow that $\mathcal{L}(Q_N) \to \chi^2(t,\Delta)$ as $N \to \infty$. Now

$N^{M\delta}(T_N - \mu_N(\theta_0)) = N^{M\delta}(T_N - \mu_N(\theta_N)) + N^{M\delta}(\mu_N(\theta_N) - \mu_N(\theta_0))$

and by (2.2.18) we have, as $N \to \infty$,

(2.2.19)    $N^{M\delta}(T_N - \mu_N(\theta_0)) = N^{M\delta}(T_N - \mu_N(\theta_N)) + \dfrac{1}{M!}\Big(\sum_\ell\lambda_\ell\dfrac{\partial}{\partial\theta_\ell}\Big)^{M}\mu_N(\theta_0) + o(1)$.

By (2.2.15), and since

$P_N = N^{2M\delta}\,(T_N - \mu_N(\theta_N))'\,\Sigma^{-1}(\theta_N)\,(T_N - \mu_N(\theta_N))$

is a continuous function of $N^{M\delta}(T_N - \mu_N(\theta_N))$, by (2.2.12b) and the fact that $\theta_N \to \theta_0$ as $N \to \infty$ we also have that $\mathcal{L}(P_N) \to \chi^2(t)$ as $N \to \infty$. By (2.2.19), (2.2.14) and (2.2.15), $N^{M\delta}(T_N - \mu_N(\theta_0))$ converges in law to $N_t\big(c/M!,\ \Sigma(\theta_0)\big)$; by (2.2.12b) we can replace $\Sigma^{-1}(\theta_N)$ by $\Sigma^{-1}(\theta_0)$ without affecting the limit law, hence

$\mathcal{L}(Q_N^{*}) \to \chi^2(t,\Delta)$ as $N \to \infty$, with $\Delta = \dfrac{c'\,\Sigma^{-1}(\theta_0)\,c}{(M!)^2}$.

To show the asymptotic equivalence of $Q_N$ and $Q_N^{*}$ we first note that $Q_N^{*}$ is bounded in probability since it has a limiting $\chi^2$ distribution; to be more explicit, for any $\varepsilon > 0$ there exists a $K(N,\varepsilon)$, depending on $N$ and $\varepsilon$, such that

$P\big[Q_N^{*} \leq K(N,\varepsilon)\big] \geq 1 - \varepsilon$ for every $N > N_\varepsilon$.

We denote this property by writing $Q_N^{*} = O_p(1)$ as $N \to \infty$.

Consider $|Q_N - Q_N^{*}|$. There exists an $N_0$ such that for all $N > N_0$ we have $Q_N^{*} > 0$. So we may write for $N > N_0$

$|Q_N - Q_N^{*}| = Q_N^{*}\,\Big|\dfrac{Q_N}{Q_N^{*}} - 1\Big|$.

Now by Courant's theorem on the ratio of two positive definite quadratic forms we know that

$\gamma_{1,N} \leq \dfrac{Q_N}{Q_N^{*}} \leq \gamma_{t,N}$

where $\gamma_{1,N}$ is the smallest root of $\hat\Sigma_N^{-1}\,\Sigma(\theta_N)$ and $\gamma_{t,N}$ is the largest root of $\hat\Sigma_N^{-1}\,\Sigma(\theta_N)$. So we obtain

$|Q_N - Q_N^{*}| \leq Q_N^{*}\,\max\big(|\gamma_{1,N} - 1|,\ |\gamma_{t,N} - 1|\big)$;

but by (2.2.12a) we know $\gamma_{1,N}$ and $\gamma_{t,N}$ converge stochastically to one, so

$|Q_N - Q_N^{*}| = O_p(1)\,o_p(1) = o_p(1)$ as $N \to \infty$,

and we see that $Q_N$ and $Q_N^{*}$ are asymptotically equivalent and the theorem is proven. Q.E.D.
We observe that if we have two sequences of test statistics $\{Q_{N_1}^{(1)}\}$ and $\{Q_{N_2}^{(2)}\}$ and a common sequence of alternative hypotheses:

$H_N\colon\ \theta_N = \theta_0 + N^{-\delta}\lambda$,

where $\{N_1\}$ and $\{N_2\}$ both depend on $N$, and furthermore if there exist constants $\rho_1$ and $\rho_2$ defined by:

(2.2.20)    $\rho_i = \lim_{N\to\infty} \dfrac{N_i}{N} \in (0,1], \quad i = 1,2$,

then Theorem 2.2.1 can be seen to yield the conclusion that

(2.2.21)    $\mathcal{L}\Big(N_i^{2M_i\delta}\,\big(T_{N_i}^{(i)} - \mu_{N_i}^{(i)}(\theta_0)\big)'\,\hat\Sigma_{N_i}^{(i)-1}\,\big(T_{N_i}^{(i)} - \mu_{N_i}^{(i)}(\theta_0)\big)\Big) \to \chi^2\big(t_i,\ \rho_i^{2M_i\delta}\,\Delta_i\big)$.

This fact will be used in later sections to obtain some efficiency results.
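The limiting power implied by (2.2.21) is a noncentral chi-square tail probability, which modern software evaluates directly. A minimal sketch (the function name and parameter values are illustrative):

```python
from scipy.stats import chi2, ncx2

def limiting_power(t, delta, alpha=0.05):
    """Limiting power Pr[chi-square(t, delta) > chi-square_{t,alpha} point]
    of a size-alpha quadratic-form test: t degrees of freedom,
    noncentrality delta, significance level alpha."""
    return ncx2.sf(chi2.ppf(1 - alpha, t), t, delta)

# as delta -> 0 the power approaches the significance level alpha,
# and the power increases with the noncentrality parameter
assert abs(limiting_power(4, 1e-12) - 0.05) < 1e-6
assert limiting_power(4, 5.0) < limiting_power(4, 10.0)
```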
The only value of $M$ which we consider in the applications in the later chapters is $M = 1$; no practical situations are known to us in which $M > 1$. With regard to $\delta$, we note that for the multivariate location problems and the growth curve application $\delta = 1/2$; for tests of independence one would require $\delta = 1/4$ in order to keep the limiting power bounded away from zero and one.
Since $M = 1$ for the cases we study, we present, in the form of a theorem, an observation concerning the noncentrality parameter when $t < q$. If $t$, $q$, $\lambda$, $c$, $M$ and $\Sigma(\theta_0)$ are as defined in Theorem 2.2.1 we have the following theorem:
Theorem 2.2.2. If $t < q$ and $M = 1$ then there exists at least one $\lambda \neq 0$ such that $\Delta = 0$.

Proof: If $M = 1$ then

$\Delta = \lambda'\, D'\, \Sigma^{-1}(\theta_0)\, D\, \lambda$

where $D$ ($t \times q$) is defined by $c = D\lambda$ ($\lambda$ being $q \times 1$). Since $t < q$ we know that $\mathrm{rank}(D) \leq t < q$, and further $\mathrm{rank}\big(D'\,\Sigma^{-1}(\theta_0)\,D\big) \leq t < q$. So the matrix $D'\,\Sigma^{-1}(\theta_0)\,D$ of size $q \times q$ has rank $s$ where $s < q$.

By the symmetry of $D'\Sigma^{-1}(\theta_0)D$ there exists an orthogonal matrix $P$ such that

$P'\,\big(D'\,\Sigma^{-1}(\theta_0)\,D\big)\,P = \mathrm{diag}(r_1, r_2, \ldots, r_s, 0, 0, \ldots, 0) = R$

where $R$ has the roots of $D'\,\Sigma^{-1}(\theta_0)\,D$ on the diagonal. Define the full rank transformation

$\lambda^{*} = P'\lambda$;

therefore

$\lambda'\, D'\, \Sigma^{-1}(\theta_0)\, D\, \lambda = \lambda^{*\prime}\, R\, \lambda^{*}$.

Let the first $s$ elements of $\lambda^{*}$ be zero and at least one of the remaining $(q - s)$ elements be nonzero; then

$\lambda^{*\prime}\, R\, \lambda^{*} = 0$,

but $P\lambda^{*} = \lambda$ and $|P| \neq 0$, therefore $\lambda \neq 0$ and the result is proven. Q.E.D.
We see that if $t < q$ there will exist directions for which $\Delta$ will be zero, and hence our test will have power $\alpha$ in those directions. If $t \geq q$ and if $D$ has rank $q$, then it follows that $D'\,\Sigma^{-1}(\theta_0)\,D$ is positive definite and for any $\lambda \neq 0$ we would have $\Delta > 0$. We now study the asymptotic power function of our statistics under the conditions of Theorem 2.2.1.
2.3 The Asymptotic Power Function of $\{Q_N\}$

2.3.1 Taylor Series Expansion of the Power Function

Under the assumptions of Theorem 2.2.1 and the uniform convergence in $\theta$, we may write the limiting power function $P(\lambda)$ as:

(2.3.1)    $P(\lambda) = \sum_{r=0}^{\infty} \dfrac{1}{r!}\Big[\Big(\sum_{\ell=1}^{q} \lambda_\ell \dfrac{\partial}{\partial\lambda_\ell}\Big)^{r} P(\lambda)\Big]_{\lambda=0}$.

If we are going to use a Taylor series representation of $P(\lambda)$ we need to evaluate terms of the form

$\dfrac{\partial^r P(\lambda)}{\partial\lambda_{\ell_1}\cdots\partial\lambda_{\ell_r}}\Big|_{\lambda=0}$.

Since $P(\lambda)$ is a function of $\Delta$, one could evaluate the partial derivatives by application of the chain rule of differential calculus.
Proceeding with this idea we get the following:

a) $\dfrac{\partial P(\lambda)}{\partial\lambda_{K_1}} = \dfrac{\partial P}{\partial\Delta}\,\dfrac{\partial\Delta}{\partial\lambda_{K_1}}$ for $K_1 = 1,2,\ldots,q$

as the first partial derivatives, and

b) $\dfrac{\partial^2 P(\lambda)}{\partial\lambda_{K_1}\partial\lambda_{K_2}} = \dfrac{\partial^2 P}{\partial\Delta^2}\,\dfrac{\partial\Delta}{\partial\lambda_{K_1}}\,\dfrac{\partial\Delta}{\partial\lambda_{K_2}} + \dfrac{\partial P}{\partial\Delta}\,\dfrac{\partial^2\Delta}{\partial\lambda_{K_1}\partial\lambda_{K_2}}$

for $K_1, K_2 = 1,2,\ldots,q$ as the second partial derivatives, and

c) $\dfrac{\partial^3 P(\lambda)}{\partial\lambda_{K_1}\partial\lambda_{K_2}\partial\lambda_{K_3}} = \dfrac{\partial^3 P}{\partial\Delta^3}\,\dfrac{\partial\Delta}{\partial\lambda_{K_1}}\,\dfrac{\partial\Delta}{\partial\lambda_{K_2}}\,\dfrac{\partial\Delta}{\partial\lambda_{K_3}} + \dfrac{\partial^2 P}{\partial\Delta^2}\Big[\dfrac{\partial^2\Delta}{\partial\lambda_{K_1}\partial\lambda_{K_2}}\,\dfrac{\partial\Delta}{\partial\lambda_{K_3}} + \dfrac{\partial^2\Delta}{\partial\lambda_{K_1}\partial\lambda_{K_3}}\,\dfrac{\partial\Delta}{\partial\lambda_{K_2}} + \dfrac{\partial^2\Delta}{\partial\lambda_{K_2}\partial\lambda_{K_3}}\,\dfrac{\partial\Delta}{\partial\lambda_{K_1}}\Big] + \dfrac{\partial P}{\partial\Delta}\,\dfrac{\partial^3\Delta}{\partial\lambda_{K_1}\partial\lambda_{K_2}\partial\lambda_{K_3}}$

for $K_1, K_2, K_3 = 1,2,\ldots,q$ as the third partial derivatives. Continuing in this fashion will lead to complicated expressions to evaluate, so we seek alternative methods of computing these derivatives. Since each term in the series representation of $P(\lambda)$ in (2.3.1) is a polynomial of degree $r$ in the $\lambda_i$, we will derive some results for homogeneous polynomials in general and with these results obtain a reduction of (2.3.1).
2.3.2 Results on Homogeneous Polynomials

Definition: A function $h(z_1, \ldots, z_q)$ is said to be homogeneous of degree $n$ in a region $E \subset R^q$ if and only if for every positive number $\beta$ and every $(z_1, \ldots, z_q)$ with both $(z_1, \ldots, z_q)$ and $(\beta z_1, \ldots, \beta z_q)$ in $E$ we have $h(\beta z_1, \ldots, \beta z_q) = \beta^n\, h(z_1, \ldots, z_q)$.

We remark that if $h$ is a polynomial in $(z_1, \ldots, z_q)$ then we say $h$ is a homogeneous polynomial of degree $n$ if it satisfies the condition of the above definition. Denoting a homogeneous polynomial of degree $K$ as $h_K = h_K(z_1, \ldots, z_q)$, we now state and prove some results concerning $h_K$.
Lemma 2.3.1. $\dfrac{\partial h_K}{\partial z_s}$ is a homogeneous polynomial of degree $K - 1$ ($K = 1,2,\ldots$), $s = 1,\ldots,q$.

Proof: Fix an arbitrary $s \in (1,2,\ldots,q)$. We can represent $h_K$ as

$h_K = \sum_{I_K} a_{i_1,\ldots,i_q} \prod_{j=1}^{q} z_j^{i_j}$

where $I_K$ is the set defined as:

$I_K = \Big\{(i_1,\ldots,i_q)\colon\ i_j = 0,1,\ldots,K;\ j = 1,2,\ldots,q;\ \sum_{j=1}^{q} i_j = K\Big\}$

and the $a_{i_1,\ldots,i_q}$ are real numbers (possibly zero). Now $h_K$ is a finite sum, so there is no problem in exchanging the derivative and summation operators, and we get

$\dfrac{\partial h_K}{\partial z_s} = \sum_{I_K} i_s\, a_{i_1,\ldots,i_q}\, z_s^{i_s - 1} \prod_{\substack{j=1 \\ j \neq s}}^{q} z_j^{i_j}$.

Let $i_s' = i_s - 1$; then we see the sum runs over the set

$\Big\{(i_1,\ldots,i_s',\ldots,i_q)\colon\ i_j = 0,\ldots,K,\ j \neq s;\ i_s' = 0,\ldots,K-1;\ \sum_{j \neq s} i_j + i_s' = K - 1\Big\}$.

So we see that $\partial h_K/\partial z_s$ is a polynomial of degree $K - 1$; in addition note that

$\dfrac{\partial h_K}{\partial z_s}(\beta z_1,\ldots,\beta z_q) = \beta^{K-1}\,\dfrac{\partial h_K}{\partial z_s}(z_1,\ldots,z_q)$,

so it is a homogeneous polynomial of degree $K - 1$. Since $s$ was arbitrary the result holds for $s = 1,2,\ldots,q$. Q.E.D.
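Lemma 2.3.1 is easy to spot-check numerically. A minimal sketch, using an illustrative homogeneous cubic $h_3(z) = z_1^2 z_2 + 2 z_1 z_3^2$ of my own choosing:

```python
import numpy as np

# illustrative homogeneous cubic and its partial derivative in z1
h3 = lambda z: z[0]**2 * z[1] + 2.0 * z[0] * z[2]**2
dh3_dz1 = lambda z: 2.0 * z[0] * z[1] + 2.0 * z[2]**2

z = np.array([0.7, -1.3, 2.1])
beta = 1.9
# h3 is homogeneous of degree 3; its derivative is homogeneous of degree 3 - 1 = 2
assert np.isclose(h3(beta * z), beta**3 * h3(z))
assert np.isclose(dh3_dz1(beta * z), beta**2 * dh3_dz1(z))
```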
It is worth noting that if $h$ is a homogeneous polynomial of degree 1 then each of its first partial derivatives is itself a constant; hence all higher derivatives are identically zero.

Lemma 2.3.2. Any $t$th partial derivative of $h_K(z_1,\ldots,z_q)$, $K = 1,2,\ldots$, is:
i) a homogeneous polynomial of degree $K - t$ for $t < K$;
ii) zero for all $t > K$.
Proof: i) follows immediately from repeated application of Lemma 2.3.1; at the $K$th step the homogeneous polynomial of degree 0 is a constant, so all further derivatives are zero, and ii) follows. Q.E.D.

Lemma 2.3.3. Let $h_a$, $g_b$ denote homogeneous polynomials of degrees $a$ and $b$. For any two positive integers $a$ and $b$ we have:
i) $h_a\, g_b$ is a homogeneous polynomial of degree $a + b$;
ii) $h_a + g_a$ is a homogeneous polynomial of degree $a$ (unless identically zero);
iii) $c\,h_a$, where $c$ is a nonzero real number, is a homogeneous polynomial of degree $a$.

Proof:
i) The product of two polynomials is a polynomial of degree equal to the sum of the degrees of the individual polynomials; the homogeneity follows since:

$h_a(\beta z_1,\ldots,\beta z_q)\, g_b(\beta z_1,\ldots,\beta z_q) = \beta^a\,\beta^b\, h_a\, g_b = \beta^{a+b}\, h_a\, g_b$.

ii) The sum of two polynomials is a polynomial of degree less than or equal to the maximum degree of the individual polynomials, hence $h_a + g_a$ is a polynomial of degree $\leq a$, and

$h_a(\beta z_1,\ldots,\beta z_q) + g_a(\beta z_1,\ldots,\beta z_q) = \beta^a\,(h_a + g_a)$.

iii) follows trivially. Q.E.D.
Lemma 2.3.4. Let $f(h_K(z_1,\ldots,z_q))$ be a function of $h_K$, a homogeneous polynomial of degree $K$, possessing $K$th partial derivatives with respect to $z$ and continuous in $z$. Then for any $s = 1,2,\ldots,K$ we have

$\dfrac{\partial^s f}{\partial z_{i_1}\cdots\partial z_{i_s}} = \sum_{j=1}^{s} \dfrac{\partial^j f}{\partial h_K^j}\; h^{(s,j)}_{jK - s}$,

where the superscripts on $h^{(s,j)}_{jK-s}$ are just to indicate that these homogeneous polynomials are in general distinct, for $i_1,\ldots,i_s \in (1,2,\ldots,q)$.

Proof:

i) $\dfrac{\partial f}{\partial z_{i_1}} = \dfrac{\partial f}{\partial h_K}\,\dfrac{\partial h_K}{\partial z_{i_1}} = \dfrac{\partial f}{\partial h_K}\, h^{(1,1)}_{K-1}$ by Lemma 2.3.1, for $i_1 = 1,2,\ldots,q$.

ii) $\dfrac{\partial^2 f}{\partial z_{i_2}\partial z_{i_1}} = \dfrac{\partial^2 f}{\partial h_K^2}\,\dfrac{\partial h_K}{\partial z_{i_2}}\, h^{(1,1)}_{K-1} + \dfrac{\partial f}{\partial h_K}\,\dfrac{\partial h^{(1,1)}_{K-1}}{\partial z_{i_2}} = \dfrac{\partial f}{\partial h_K}\, h^{(2,1)}_{K-2} + \dfrac{\partial^2 f}{\partial h_K^2}\, h^{(2,2)}_{2K-2} = \sum_{j=1}^{2} \dfrac{\partial^j f}{\partial h_K^j}\, h^{(2,j)}_{jK-2}$

by Lemmas 2.3.2 and 2.3.3, for $i_1, i_2 = 1,2,\ldots,q$.

iii) Similarly we find

$\dfrac{\partial^3 f}{\partial z_{i_3}\partial z_{i_2}\partial z_{i_1}} = \dfrac{\partial^3 f}{\partial h_K^3}\, h_{3K-3} + \dfrac{\partial^2 f}{\partial h_K^2}\, h_{2K-3} + \dfrac{\partial f}{\partial h_K}\, h_{K-3} = \sum_{j=1}^{3} \dfrac{\partial^j f}{\partial h_K^j}\, h^{(3,j)}_{jK-3}$

for $i_1, i_2, i_3 = 1,2,\ldots,q$, and in general the result follows with repeated application of Lemmas 2.3.2 and 2.3.3. Q.E.D.
If we define $h_r \equiv 0$ for all $r < 0$ then the same formula in Lemma 2.3.4 will hold if $f$ has more than $K$ continuous partial derivatives. It should be noted that Lemma 2.3.4 gives us no idea as to the precise homogeneous polynomials on the right-hand side, but for the reduction of the power function this is no limitation.

Lemma 2.3.5. $h_K(0,\ldots,0) = 0$ for all $K \geq 1$.

Proof: $h_K = \sum_{I_K} a_{i_1,\ldots,i_q} \prod_{j=1}^{q} z_j^{i_j}$. Now if $a_{i_1,\ldots,i_q} = 0$ for all $(i_1,\ldots,i_q)$ then of course $h_K \equiv 0$. On the other hand if $a_{i_1,\ldots,i_q} \neq 0$ for at least one tuple $(i_1,\ldots,i_q)$, then $i_j > 0$ for at least one $j$, and hence the terms with nonzero $a_{i_1,\ldots,i_q}$ are multiplied by zeroes; hence $h_K(0,\ldots,0) = 0$. Q.E.D.
2.3.3 A Simplification of the Power Function

In this section we shall substantially simplify (2.3.1) in terms of the parameter $M$ in Theorem 2.2.1.

Lemma 2.3.6. Under the assumptions of Theorem 2.2.1 the noncentrality parameter $\Delta$ is a homogeneous polynomial of degree $2M$ in $\lambda$.

Proof: By (2.2.17),

$\Delta = \dfrac{1}{(M!)^2}\, c'\,\Sigma^{-1}(\theta_0)\,c$,

where, by (2.2.14),

$c_j = \sum_{\ell_1=1}^{q}\cdots\sum_{\ell_M=1}^{q} \lambda_{\ell_1}\cdots\lambda_{\ell_M}\; d^{(j)}_{\ell_1,\ldots,\ell_M}, \quad j = 1,\ldots,t$,

the $d^{(j)}_{\ell_1,\ldots,\ell_M}$ being the limiting mixed partial derivatives of $\mu_{Nj}$ of order $M$ at $\theta_0$; clearly for each $j = 1,2,\ldots,t$ we see that $c_j$ is a homogeneous polynomial of degree $M$. Let $\Sigma^{-1}(\theta_0) = ((\sigma^{jj'}))$. So

$\Delta = \dfrac{1}{(M!)^2} \sum_{j=1}^{t}\sum_{j'=1}^{t} \sigma^{jj'}\, c_j\, c_{j'}$;

by Lemma 2.3.3 we see that $\Delta$ is a homogeneous polynomial of degree $2M$. Q.E.D.
Lemma 2.3.7. Under the assumptions of Theorem 2.2.1, all of the first $(2M-1)$ partial derivatives of the limiting power function with respect to the $\lambda_i$'s are zero at $\lambda = 0$. That is,

$\dfrac{\partial^s P(\lambda)}{\partial\lambda_{i_1}\cdots\partial\lambda_{i_s}}\Big|_{\lambda=0} = 0; \quad s = 1,2,\ldots,2M-1$.

Proof: $P(\lambda) = P(\Delta(\lambda_1,\ldots,\lambda_q))$, and by Lemma 2.3.6 $\Delta$ is a homogeneous polynomial of degree $2M$; furthermore $P$ is continuous in its partial derivatives of all orders, so by Lemma 2.3.4 we can represent the partial derivatives as:

$\dfrac{\partial^s P}{\partial\lambda_{i_1}\cdots\partial\lambda_{i_s}} = \sum_{j=1}^{s} \dfrac{\partial^j P}{\partial\Delta^j}\; h^{(s,j)}_{2jM - s}, \quad \text{for all } \lambda \text{ with } \|\lambda\| < K \text{ and } s = 1,2,\ldots,2M-1.$

Now for $2jM - s \geq 1$ we know by Lemma 2.3.5 that $h^{(s,j)}_{2jM-s}(0,\ldots,0) = 0$; but $j \geq 1$ and $s < 2M$, so $2Mj > s$ and therefore $2Mj - s \geq 1$ for all $s = 1,2,\ldots,2M-1$. So we obtain the required result. Q.E.D.
The preceding lemma yields a substantial reduction of the power function. We see that (2.3.1) can now be written as:

(2.3.2)    $P(\lambda) = \alpha + \sum_{r=2M}^{\infty} \dfrac{1}{r!}\Big[\Big(\sum_{\ell=1}^{q} \lambda_\ell \dfrac{\partial}{\partial\lambda_\ell}\Big)^{r} P(\lambda)\Big]_{\lambda=0}$

where $\alpha$ is the significance level of the test. We now reduce the expansion by the following lemma.
Lemma 2.3.8. $\dfrac{\partial^s P(\lambda)}{\partial\lambda_{i_1}\cdots\partial\lambda_{i_s}}\Big|_{\lambda=0} = 0$ if $s$ is not an integer multiple of $2M$; i.e., if there does not exist an integer $b$ such that $s = 2Mb$.

Proof: By Lemma 2.3.4 and the comment after it we can take derivatives higher than the $K$th and use the representation

$\dfrac{\partial^s P}{\partial\lambda_{i_1}\cdots\partial\lambda_{i_s}}\Big|_{\lambda=0} = \sum_{j=1}^{s} \dfrac{\partial^j P}{\partial\Delta^j}\Big|_{\lambda=0}\; h^{(s,j)}_{2Mj - s}(0,\ldots,0)$,

where it is understood that $h_K \equiv 0$ if $K < 0$. But $h_{2Mj - s}(0,\ldots,0) = 0$ whenever $2Mj - s \neq 0$, by Lemma 2.3.5, and if $s$ is not an integer multiple of $2M$ there is no integer $j$ with $2Mj - s = 0$, so the result follows. Q.E.D.
So we can write the power function in its simplified form as

(2.3.3)    $P(\lambda) = \alpha + \sum_{r=1}^{\infty} \dfrac{1}{(2Mr)!}\Big[\Big(\sum_{\ell=1}^{q} \lambda_\ell \dfrac{\partial}{\partial\lambda_\ell}\Big)^{2Mr} P(\lambda)\Big]_{\lambda=0}$.

We now give an alternative representation for the power function and then proceed to use these simplifications in discussing the proposed efficiency criteria.

2.3.4 An Alternative Representation of the Power Function

If we denote the $i$th derivative of $P(\Delta)$ with respect to $\Delta$ by $P^{(i)}(\Delta)$, then we can write $P(\Delta)$ in a power series about $\Delta = 0$ as:

$P(\Delta) = \sum_{r=0}^{\infty} \dfrac{\Delta^r}{r!}\, P^{(r)}(0)$;

but $\Delta$ itself is a function of $\lambda_1,\ldots,\lambda_q$ and can be written in a power series in $\lambda$ about $\lambda = 0$:

$\Delta(\lambda) = \sum_{s=0}^{\infty} \dfrac{1}{s!}\Big[\Big(\sum_{\ell=1}^{q} \lambda_\ell \dfrac{\partial}{\partial\lambda_\ell}\Big)^{s} \Delta\Big]_{\lambda=0}$.

So

$P(\Delta) = \sum_{r=0}^{\infty} \dfrac{1}{r!}\Big\{\sum_{s=0}^{\infty} \dfrac{1}{s!}\Big[\Big(\sum_{\ell=1}^{q} \lambda_\ell \dfrac{\partial}{\partial\lambda_\ell}\Big)^{s} \Delta\Big]_{\lambda=0}\Big\}^{r}\, P^{(r)}(0)$.
Lemma 2.3.9. Under the conditions of Theorem 2.2.1,

$P(\lambda) = \sum_{r=0}^{\infty} \dfrac{1}{r!}\Big\{\dfrac{1}{(2M)!}\Big[\Big(\sum_{\ell=1}^{q} \lambda_\ell \dfrac{\partial}{\partial\lambda_\ell}\Big)^{2M} \Delta\Big]_{\lambda=0}\Big\}^{r}\, P^{(r)}(0)$.

Proof: $\Delta$ is a homogeneous polynomial of degree $2M$; therefore all of its derivatives of order less than the $2M$th with respect to the $\lambda$'s must vanish at $\lambda = 0$. On the other hand all derivatives of order greater than the $2M$th are zero, since the $2M$th derivative itself is a constant in $\lambda$, so the result follows; the bracketed quantity is exactly $\Delta(\lambda)$ and the representation of section 2.3.4 reduces to $P(\lambda) = \sum_{r} \Delta^r P^{(r)}(0)/r!$. Q.E.D.
We now give a result which provides a representation of $P^{(r)}(\Delta)$ in general.

Lemma 2.3.10. $P^{(r)}(\Delta) = \Big(\dfrac{1}{2}\Big)^{r} \sum_{s=0}^{r} \binom{r}{s}(-1)^{s}\, P\big(t + 2(r-s),\ \Delta\big)$, $r = 1,2,\ldots$,

where $t$ is as defined in Theorem 2.2.1 and

$P(t + 2j,\ \Delta) = \Pr\big[\chi^2(t + 2j,\ \Delta) \geq \chi^2_{t,\alpha}\big]$,

i.e. the probability that a noncentral chi-square random variable with noncentrality parameter $\Delta$ and $t + 2j$ degrees of freedom is greater than or equal to the $(1-\alpha)100\%$ point of a central chi-square random variable with $t$ degrees of freedom.

Proof: We know that

$P(\Delta) = \sum_{j=0}^{\infty} \dfrac{e^{-\Delta/2}(\Delta/2)^j}{j!}\, P(t + 2j,\ 0)$.

Now the infinite series converges absolutely, so there is no problem in exchanging the derivative operator and the infinite summation operator, and we get

$P^{(1)}(\Delta) = \dfrac{dP}{d\Delta} = -\dfrac{1}{2}\sum_{j=0}^{\infty} \dfrac{e^{-\Delta/2}(\Delta/2)^j}{j!}\, P(t+2j,\ 0) + \dfrac{1}{2}\sum_{j=1}^{\infty} \dfrac{e^{-\Delta/2}(\Delta/2)^{j-1}}{(j-1)!}\, P(t+2j,\ 0)$.

Therefore

$P^{(1)}(\Delta) = \dfrac{1}{2}\big[P(t+2,\Delta) - P(t,\Delta)\big] = \Big(\dfrac{1}{2}\Big)^{1}\sum_{s=0}^{1}\binom{1}{s}(-1)^{s}\, P\big(t + 2(1-s),\ \Delta\big)$.

We shall prove the result by induction. Assume the result of the lemma is true for $r = n$, i.e.

$P^{(n)}(\Delta) = \Big(\dfrac{1}{2}\Big)^{n}\sum_{s=0}^{n}\binom{n}{s}(-1)^{s}\, P\big(t + 2(n-s),\ \Delta\big)$.

Then

$P^{(n+1)}(\Delta) = \dfrac{dP^{(n)}(\Delta)}{d\Delta} = \Big(\dfrac{1}{2}\Big)^{n}\sum_{s=0}^{n}\binom{n}{s}(-1)^{s}\Big\{\dfrac{1}{2}\big[P(t + 2(n-s) + 2,\ \Delta) - P(t + 2(n-s),\ \Delta)\big]\Big\}$

$= \Big(\dfrac{1}{2}\Big)^{n+1}\Big\{\sum_{s=0}^{n}\binom{n}{s}(-1)^{s}\, P\big(t + 2(n+1-s),\ \Delta\big) + \sum_{s=0}^{n}\binom{n}{s}(-1)^{s+1}\, P\big(t + 2(n-s),\ \Delta\big)\Big\}$.

In the second sum let $s_1 = s + 1$, so that it becomes

$\sum_{s_1=1}^{n+1}\binom{n}{s_1 - 1}(-1)^{s_1}\, P\big(t + 2(n+1-s_1),\ \Delta\big)$;

combining the two sums and using the identity

$\binom{n}{s} + \binom{n}{s-1} = \dfrac{n!}{(n-s)!\,s!} + \dfrac{n!}{(n-s+1)!\,(s-1)!} = \binom{n+1}{s}$

we obtain

$P^{(n+1)}(\Delta) = \Big(\dfrac{1}{2}\Big)^{n+1}\sum_{s=0}^{n+1}\binom{n+1}{s}(-1)^{s}\, P\big(t + 2(n+1-s),\ \Delta\big)$.

So by the principle of mathematical induction the result holds for all $r = 1, 2, \ldots$. Q.E.D.
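Both the Poisson-mixture representation used at the start of the proof and the finished formula of Lemma 2.3.10 can be verified numerically. A sketch with illustrative values $t = 3$, $\alpha = 0.05$, $\Delta = 1.7$ (the second derivative $r = 2$ is checked against a central finite difference):

```python
import numpy as np
from math import factorial
from scipy.stats import chi2, ncx2
from scipy.special import comb

t, alpha, delta = 3, 0.05, 1.7
cut = chi2.ppf(1 - alpha, t)                     # (1-alpha)100% point of chi2(t)

def P(df, nc):
    """P(df, Delta) = Pr[chi-square(df, Delta) >= chi2_{t,alpha} point]."""
    return ncx2.sf(cut, df, nc) if nc > 0 else chi2.sf(cut, df)

# Poisson-mixture representation used in the proof of Lemma 2.3.10
mix = sum(np.exp(-delta / 2) * (delta / 2) ** j / factorial(j) * P(t + 2 * j, 0)
          for j in range(60))
assert abs(mix - P(t, delta)) < 1e-10

# Lemma 2.3.10 with r = 2, against a central second finite difference
r, h = 2, 1e-3
lemma = 0.5 ** r * sum(comb(r, s) * (-1) ** s * P(t + 2 * (r - s), delta)
                       for s in range(r + 1))
fd = (P(t, delta + h) - 2 * P(t, delta) + P(t, delta - h)) / h ** 2
assert abs(lemma - fd) < 1e-5
```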
In the next section we compute the Pitman ARE of test 2 with respect to test 1 when $t_1 = t_2$; in the following sections we make use of the reductions we have obtained in this section.
2.4 Pitman ARE when $t_1 = t_2$

We consider now the problem of comparing the performance of $\{Q_N^{(2)}\}$ to $\{Q_N^{(1)}\}$ for large samples. We assume that both sequences of test statistics satisfy the conditions of Theorem 2.2.1 with parameters $M_i$, $t_i$ and $\Delta_i$, and denote the limiting power functions by $P_i(\lambda)$ for $i = 1, 2$. We first notice that if $M_1 \neq M_2$ then one sequence of test statistics is behaving differently from the other. For example, if $t_i \geq q$ and $M_1 = 1$, $M_2 > 1$, then test 1 has a limiting power function which attains a relative minimum at $\theta_0$, while test 2 is changing so rapidly at $\theta_0$ that we cannot guarantee that it has an extremum at $\theta_0$. For this reason we restrict consideration to $M_1 = M_2 = M$ in all the sections to follow.
Since it involves a slight extension of existing work (since $q$, $t_1$ and $t_2$ are general) we now derive the Pitman ARE of $\phi_2$ with respect to $\phi_1$ in the situation $t_1 = t_2$. For convenience we denote the Pitman ARE of $\phi_2$ with respect to $\phi_1$ as $e^P_{2,1}$.

Theorem 2.4.1. If $\{Q_N^{(1)}\}$ and $\{Q_N^{(2)}\}$ satisfy the conditions of Theorem 2.2.1 with parameters $t_i$ and $\Delta_i$ respectively, and if $t_1 = t_2$, then

$e^P_{2,1} = \Big(\dfrac{\Delta_2}{\Delta_1}\Big)^{1/2M\delta}$.

Proof: Recall that we require $\lim_{N\to\infty} N_1/N_2$, where $\{N_1(N)\}$ and $\{N_2(N)\}$ are chosen such that the two tests $\{Q_{N_1}^{(1)}\}$ and $\{Q_{N_2}^{(2)}\}$ have the same limiting power against the same alternatives

$H_N\colon\ \theta_N = \theta_0 + N^{-\delta}\lambda$.

Since $t_1 = t_2$ we know that the limiting powers are the same when the noncentralities are the same, so if the $\rho_i$ are defined as in (2.2.20) we see by (2.2.21) that we require

$\rho_1^{2M\delta}\,\Delta_1 = \rho_2^{2M\delta}\,\Delta_2$;

hence

$e^P_{2,1} = \lim_{N\to\infty} \dfrac{N_1}{N_2} = \dfrac{\rho_1}{\rho_2} = \Big(\dfrac{\Delta_2}{\Delta_1}\Big)^{1/2M\delta}$. Q.E.D.
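The sample-size interpretation of Theorem 2.4.1 can be illustrated numerically. With $M = 1$, $\delta = 1/2$ the noncentralities scale linearly in the sample-size fractions, so matching limiting powers forces $N_1/N_2 = \Delta_2/\Delta_1$. A sketch with illustrative per-observation noncentrality rates of my own choosing:

```python
from scipy.stats import chi2, ncx2

t, alpha = 4, 0.05
cut = chi2.ppf(1 - alpha, t)
power = lambda nc: ncx2.sf(cut, t, nc)   # limiting power at noncentrality nc

# illustrative per-observation noncentrality rates of the two tests,
# so that Delta_i is proportional to N_i * k_i
k1, k2 = 0.020, 0.025
N2 = 400
N1 = N2 * k2 / k1                        # Pitman relation N1/N2 = Delta2/Delta1
assert abs(power(N1 * k1) - power(N2 * k2)) < 1e-12
assert abs(N1 / N2 - k2 / k1) < 1e-12
```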
We remark that if exactly one $\Delta_i$ is zero the other test is clearly superior; if both $\Delta_1$ and $\Delta_2$ are zero then both are useless for that particular direction $\lambda$. A common set of values for the parameters $M$, $\delta$ is 1, 1/2 respectively, and then $e^P_{2,1} = \Delta_2/\Delta_1$, which will be indeterminate for some $\lambda$, and care must be taken in the interpretation of $e^P_{2,1}$. If $q \leq t$ then one can apply Courant's theorem and place bounds on $e^P_{2,1}$ for all $\lambda \neq 0$. If $q = 1$ no problem arises in the evaluation of $e^P_{2,1}$, since it would not depend on $\lambda$ in this case.
2.5 Local Asymptotic Relative Efficiency (LARE)

If in general $t_1 \neq t_2$, then the preceding section's approach of computing the Pitman ARE does not work, since for $t_1 \neq t_2$ the powers are not the same when $\Delta_1 = \Delta_2$. In this section we propose an alternative criterion of comparison in which we compare the limiting powers locally, i.e. when $\lambda$ is in some arbitrarily small neighborhood of the origin. To be precise we adopt the following:

Definition: The local asymptotic relative efficiency (LARE) of $\phi_2$ with respect to $\phi_1$ is defined to be $\lim_{N\to\infty} \frac{N_1}{N_2}$, where $\{N_1\}$ and $\{N_2\}$ are chosen such that the two tests have the same limiting power locally through the same sequence of alternative hypotheses. Local power is the power function expansion up to the $(2M)$th derivative terms.
Theorem 2.5.1. The LARE of $\phi_2$ with respect to $\phi_1$, when $\{Q_N^{(1)}\}$ and $\{Q_N^{(2)}\}$ satisfy the conditions of Theorem 2.2.1 each with limiting $\chi^2(t_i, \Delta_i)$ distribution ($i = 1,2$) as $N \to \infty$ through $H_N\colon\ \theta_N = \theta_0 + N^{-\delta}\lambda$, is

$\mathrm{LARE} = \Bigg\{\dfrac{\big[P_2(t_2 + 2,\ 0) - \alpha\big]\,\Delta_2}{\big[P_1(t_1 + 2,\ 0) - \alpha\big]\,\Delta_1}\Bigg\}^{1/2M\delta}$.

Proof: From section 2.3.4 we know that each power function $P_i(\lambda)$ can be written as

$P_i(\lambda) = \alpha + \Delta_i\, P_i^{(1)}(0) + \sum_{r=2}^{\infty} \dfrac{\Delta_i^r}{r!}\, P_i^{(r)}(0)$;

but the remainder is the tail of a convergent series, so we know that it is bounded by a finite number $K$. In particular, if $\lambda$ is sufficiently small then

$\sum_{r=2}^{\infty} \dfrac{\Delta_i^r}{r!}\, P_i^{(r)}(0) = O\big(\lambda_{\max}^{4M}\big)$ as $\lambda_{\max} \to 0$,

where $\lambda_{\max} = \max_\ell |\lambda_\ell|$. So for $\|\lambda\| < \varepsilon$, a small number, we see that

$P_i(\lambda) \doteq \alpha + \Delta_i\, P_i^{(1)}(0)$.

So if we define local power as $\alpha + \Delta_i\, P_i^{(1)}(0)$, then we require the equality of these terms in the limit. Now choose $\{N_1(N)\}$ and $\{N_2(N)\}$ such that

$\rho_1^{2M\delta}\,\Delta_1\, P_1^{(1)}(0) = \rho_2^{2M\delta}\,\Delta_2\, P_2^{(1)}(0)$;

but by Lemma 2.3.10 we know

$P_i^{(1)}(0) = \dfrac{1}{2}\big[P_i(t_i + 2,\ 0) - \alpha\big]$,

and we see

$\Big(\dfrac{\rho_1}{\rho_2}\Big)^{2M\delta} = \dfrac{\big[P_2(t_2+2,\ 0) - \alpha\big]\,\Delta_2}{\big[P_1(t_1+2,\ 0) - \alpha\big]\,\Delta_1}$,

and hence

$\mathrm{LARE} = \lim_{N\to\infty}\dfrac{N_1}{N_2} = \Bigg\{\dfrac{\big[P_2(t_2+2,\ 0) - \alpha\big]\,\Delta_2}{\big[P_1(t_1+2,\ 0) - \alpha\big]\,\Delta_1}\Bigg\}^{1/2M\delta}$. Q.E.D.
Let us define

$R(t_1, t_2, \alpha) = \dfrac{P_2(t_2 + 2,\ 0) - \alpha}{P_1(t_1 + 2,\ 0) - \alpha}$.

In the case where $M = 1$, $\delta = 1/2$ we get

(2.5.1)    $e^L_{2,1} = R(t_1, t_2, \alpha)\; \dfrac{\lambda'\, D^{(2)\prime}\, \Sigma^{(2)-1}\, D^{(2)}\, \lambda}{\lambda'\, D^{(1)\prime}\, \Sigma^{(1)-1}\, D^{(1)}\, \lambda}$.

It can be seen that we now have a measure of ARE which takes into account (a) the difference in degrees of freedom and (b) the difference in the noncentrality parameters for $\lambda$ near the origin. Furthermore, these two components are factored so that their product is the efficiency. The second quantity is the same as we get when we compute the Pitman ARE for $t_1 = t_2$, and we could proceed to place bounds on this ratio by Courant's theorem.
The comments that were made in the previous section on the relationship of $q$ and $t_i$ should be kept in mind, since either of the quadratic forms in (2.5.1) may be positive semidefinite. The scalar factor $R(t_1, t_2, \alpha)$ is given in Tables 2.5.1 to 2.5.5 for $t_1, t_2 = 1(1)10$ and $\alpha = .10, .05, .01, .005$ and $.001$. We note that the scalar adjustment can be quite large for large values of $|t_2 - t_1|$, but this factor varies little with the value of $\alpha$ unless $|t_2 - t_1|$ is large.

One major drawback of this LARE is that it depends on the arbitrary vector $\lambda$, even though we have assumed the vector to have arbitrarily small elements. This is an undesirable feature of this measure of ARE. We now propose a measure of ARE, based on the generalized curvature of the power function at the null point, which avoids this drawback.
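The adjustment factor is straightforward to compute from central chi-square tail probabilities; a sketch (the rounded values reproduce the table entries, e.g. 1.64 for $t_2 = 1$, $t_1 = 2$, $\alpha = .005$):

```python
from scipy.stats import chi2

def R(t1, t2, alpha):
    """Adjustment factor R(t1,t2,alpha) = [P2(t2+2,0)-alpha]/[P1(t1+2,0)-alpha],
    where Pi(ti+2,0) = Pr[chi-square(ti+2) >= (1-alpha)100% point of chi-square(ti)]."""
    num = chi2.sf(chi2.ppf(1 - alpha, t2), t2 + 2) - alpha
    den = chi2.sf(chi2.ppf(1 - alpha, t1), t1 + 2) - alpha
    return num / den

assert abs(R(2, 1, 0.005) - 1.645) < 0.01   # Table 2.5.4 entry (t2 = 1, t1 = 2)
assert abs(R(2, 1, 0.001) - 1.694) < 0.01   # Table 2.5.5 entry (t2 = 1, t1 = 2)
assert abs(R(1, 1, 0.05) - 1.0) < 1e-12     # equal degrees of freedom
```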
TABLE 2.5.4

VALUES OF THE ADJUSTMENT FACTOR, R(t1,t2,α), FOR α = .005

t2\t1     1     2     3     4     5     6     7     8     9    10
  1    1.00  1.64  2.18  2.66  3.09  3.49  3.86  4.22  4.55  4.87
  2     .61  1.00  1.33  1.62  1.88  2.12  2.35  2.56  2.77  2.96
  3     .46   .75  1.00  1.22  1.42  1.60  1.77  1.93  2.08  2.23
  4     .38   .62   .82  1.00  1.16  1.31  1.45  1.58  1.71  1.83
  5     .32   .53   .71   .86  1.00  1.13  1.25  1.36  1.47  1.57
  6     .29   .47   .63   .76   .89  1.00  1.11  1.21  1.30  1.39
  7     .26   .43   .57   .69   .80   .90  1.00  1.09  1.18  1.26
  8     .24   .39   .52   .63   .73   .83   .92  1.00  1.08  1.15
  9     .22   .36   .48   .59   .68   .77   .85   .93  1.00  1.07
 10     .21   .34   .45   .55   .64   .72   .79   .87   .93  1.00

Note: See note for Table 2.5.1.
TABLE 2.5.5

VALUES OF THE ADJUSTMENT FACTOR, R(t1,t2,α), FOR α = .001

t2\t1     1     2     3     4     5     6     7     8     9    10
  1    1.00  1.69  2.28  2.81  3.29  3.73  4.15  4.53  4.91  5.27
  2     .59  1.00  1.35  1.66  1.94  2.20  2.45  2.68  2.90  3.11
  3     .44   .74  1.00  1.23  1.44  1.63  1.82  1.99  2.15  2.31
  4     .36   .60   .81  1.00  1.17  1.33  1.48  1.62  1.75  1.88
  5     .30   .52   .69   .85  1.00  1.14  1.26  1.38  1.49  1.60
  6     .27   .45   .61   .75   .88  1.00  1.11  1.22  1.32  1.41
  7     .24   .41   .55   .68   .79   .90  1.00  1.10  1.18  1.27
  8     .22   .37   .50   .62   .72   .82   .91  1.00  1.08  1.16
  9     .20   .34   .46   .57   .67   .76   .84   .92  1.00  1.07
 10     .19   .32   .43   .53   .62   .71   .79   .86   .93  1.00

Note: See note for Table 2.5.1.
2.6 Curvature Asymptotic Relative Efficiency (CARE)

We consider now another new measure of ARE for the special case $M_1 = M_2 = 1$. At the end of this section we discuss briefly the case where $M_i > 1$ and point out problems in the interpretation of this case.

We recall from Chapter I the definition of the generalized Gaussian curvature of a function of several variables. We propose now as a measure of efficiency a criterion based on the generalized Gaussian curvature of the two limiting power functions $P_i(\lambda)$ at the origin.

Definition: The curvature asymptotic relative efficiency (CARE) of test $\phi_2$ with respect to $\phi_1$ is $\lim_{N\to\infty} \frac{N_1}{N_2}$, where $\{N_1\}$ and $\{N_2\}$ are chosen such that the two tests have limiting power functions with the same generalized Gaussian curvature at $\lambda = 0$ through the same sequence of alternative hypotheses.

Of course this definition of efficiency, denoted by CARE, is not meaningful if $M_1$ and $M_2$ are both greater than one, since one can then show by application of Lemma 2.3.7 that the two generalized Gaussian curvatures must be zero at $\lambda = 0$. Let us now derive the generalized curvatures (we omit the adjective Gaussian for convenience) of the limiting power functions at $\lambda = 0$.

Theorem 2.6.1. Under the conditions of Theorem 2.2.1, when $M_i = 1$ the generalized curvature of the limiting power function of $\{Q_N^{(i)}\}$ at $\lambda = 0$ is:

$G_i = \big[P_i(t_i + 2,\ 0) - \alpha\big]^{q}\; \big|D^{(i)\prime}\, \Sigma^{(i)-1}\, D^{(i)}\big|$

where $D^{(i)}$ is defined by $c^{(i)} = D^{(i)}\lambda$.
Proof: The generalized curvature at $\lambda = 0$ is the determinant of the matrix of second partial derivatives divided by a power of one plus the squared gradient; by Lemma 2.3.7 the gradient vanishes at $\lambda = 0$, so the denominator is one and

$G_i = \Big|\Big(\Big(\dfrac{\partial^2 P_i(\lambda)}{\partial\lambda_\ell\,\partial\lambda_K}\Big|_{\lambda=0}\Big)\Big)\Big|$.

But by Lemmas 2.3.9 and 2.3.10 we get

$\dfrac{\partial^2 P_i(\lambda)}{\partial\lambda_\ell\,\partial\lambda_K}\Big|_{\lambda=0} = \dfrac{P_i(t_i + 2,\ 0) - \alpha}{2}\;\dfrac{\partial^2 \Delta_i}{\partial\lambda_\ell\,\partial\lambda_K} = \big[P_i(t_i + 2,\ 0) - \alpha\big]\,\big(D^{(i)\prime}\,\Sigma^{(i)-1}\,D^{(i)}\big)_{\ell K}$,

where the second factor is the $\ell K$ element of the matrix $D^{(i)\prime}\Sigma^{(i)-1}D^{(i)}$, since $\Delta_i = \lambda'\,D^{(i)\prime}\Sigma^{(i)-1}D^{(i)}\,\lambda$ when $M_i = 1$. Therefore we get

$G_i = \big[P_i(t_i + 2,\ 0) - \alpha\big]^{q}\; \big|D^{(i)\prime}\,\Sigma^{(i)-1}\,D^{(i)}\big|$. Q.E.D.

Now we observe that if $q > t_i$ with $M_i = 1$ then $G_i = 0$ by Theorem 2.2.2; on the other hand if $q \leq t_i$ then $G_i > 0$ if we assume that $D^{(i)}$ is of rank $q$, which we do throughout. We are now in a position to find the CARE.
Theorem 2.6.2. If $\{Q_N^{(1)}\}$ and $\{Q_N^{(2)}\}$ satisfy the conditions of Theorem 2.2.1 with $M_1 = M_2 = 1$, and if $q \leq t_i$ ($i = 1,2$), then

$\mathrm{CARE} = R(t_1, t_2, \alpha)^{1/2\delta}\; \Bigg\{\dfrac{\big|D^{(2)\prime}\,\Sigma^{(2)-1}\,D^{(2)}\big|}{\big|D^{(1)\prime}\,\Sigma^{(1)-1}\,D^{(1)}\big|}\Bigg\}^{1/2q\delta}$.

Proof: We must determine $\{N_1\}$ and $\{N_2\}$ such that $G_1 = G_2$, where $G_i$ is the generalized curvature of the limiting power function of $\{Q_{N_i}^{(i)}\}$ through $H_N\colon\ \theta_N = \theta_0 + N^{-\delta}\lambda$ as $N \to \infty$. By (2.2.21) the noncentrality of test $i$ is $\rho_i^{2\delta}\Delta_i$, where $\rho_i$ is defined in (2.2.20), and since $M = 1$,

$\rho_i^{2\delta}\,\Delta_i = \lambda'\,\big(\rho_i^{2\delta}\, D^{(i)\prime}\,\Sigma^{(i)-1}\,D^{(i)}\big)\,\lambda$.

By Theorem 2.6.1,

$G_i = \big[P_i(t_i + 2,\ 0) - \alpha\big]^{q}\;\rho_i^{2q\delta}\,\big|D^{(i)\prime}\,\Sigma^{(i)-1}\,D^{(i)}\big|$.

Setting $G_1 = G_2$ we obtain

$\Big(\dfrac{\rho_1}{\rho_2}\Big)^{2q\delta} = \dfrac{\big[P_2(t_2+2,\ 0) - \alpha\big]^{q}\,\big|D^{(2)\prime}\Sigma^{(2)-1}D^{(2)}\big|}{\big[P_1(t_1+2,\ 0) - \alpha\big]^{q}\,\big|D^{(1)\prime}\Sigma^{(1)-1}D^{(1)}\big|}$,

hence

$\mathrm{CARE} = \dfrac{\rho_1}{\rho_2} = R(t_1, t_2, \alpha)^{1/2\delta}\,\Bigg\{\dfrac{\big|D^{(2)\prime}\Sigma^{(2)-1}D^{(2)}\big|}{\big|D^{(1)\prime}\Sigma^{(1)-1}D^{(1)}\big|}\Bigg\}^{1/2q\delta}$,

and the proof is complete. Q.E.D.
We now give a result for the special case of $t_1 = t_2$.

Theorem 2.6.3. Under the conditions of Theorem 2.6.2 with $t_1 = t_2$ (and $\delta = 1/2$), the CARE is the geometric mean of the roots of the product matrix, $\big(D^{(2)\prime}\Sigma^{(2)-1}D^{(2)}\big)\big(D^{(1)\prime}\Sigma^{(1)-1}D^{(1)}\big)^{-1}$, of the Pitman ARE.

Proof: From section 2.5 we know that when $q \leq t_i$, $t_1 = t_2$, $M_1 = M_2 = 1$ we have

$e^P_{2,1} = \Bigg[\dfrac{\lambda'\,D^{(2)\prime}\,\Sigma^{(2)-1}\,D^{(2)}\,\lambda}{\lambda'\,D^{(1)\prime}\,\Sigma^{(1)-1}\,D^{(1)}\,\lambda}\Bigg]^{1/2\delta}$.

Since both $D^{(2)\prime}\Sigma^{(2)-1}D^{(2)}$ and $D^{(1)\prime}\Sigma^{(1)-1}D^{(1)}$ are positive definite we can find a nonsingular $F$ such that $F'\,D^{(2)\prime}\Sigma^{(2)-1}D^{(2)}\,F = \mathrm{diag}(r_1,\ldots,r_q)$ and $F'\,D^{(1)\prime}\Sigma^{(1)-1}D^{(1)}\,F = I$, where $r_1,\ldots,r_q$ are the roots of $\big(D^{(2)\prime}\Sigma^{(2)-1}D^{(2)}\big)\big(D^{(1)\prime}\Sigma^{(1)-1}D^{(1)}\big)^{-1}$. If we let $\lambda = F\lambda^{*}$ then we can represent $e^P_{2,1}$ in $\lambda^{*}$ space as

$e^P_{2,1} = \Bigg[\dfrac{\sum_{\ell=1}^{q} r_\ell\,\lambda_\ell^{*2}}{\sum_{\ell=1}^{q} \lambda_\ell^{*2}}\Bigg]^{1/2\delta}$.

Now the geometric mean of $r_1,\ldots,r_q$ is given by

$\Big(\prod_{\ell=1}^{q} r_\ell\Big)^{1/q} = \Big\{\big|F'D^{(2)\prime}\Sigma^{(2)-1}D^{(2)}F\big|\,\big/\,\big|F'D^{(1)\prime}\Sigma^{(1)-1}D^{(1)}F\big|\Big\}^{1/q} = \Bigg\{\dfrac{\big|D^{(2)\prime}\Sigma^{(2)-1}D^{(2)}\big|}{\big|D^{(1)\prime}\Sigma^{(1)-1}D^{(1)}\big|}\Bigg\}^{1/q}$,

which, by Theorem 2.6.2 with $t_1 = t_2$ (so that $R(t_1,t_2,\alpha) = 1$) and $\delta = 1/2$, is exactly the CARE. So the CARE is the geometric mean of the roots of the product matrix. We also note that the CARE is equal to the ratio of the two geometric means of the roots of the individual matrices. Q.E.D.
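The determinant-ratio and geometric-mean characterizations of Theorems 2.6.2 and 2.6.3, and the Courant bounds on the Pitman ratio, can be checked numerically. A sketch with illustrative positive definite matrices standing in for $D^{(i)\prime}\Sigma^{(i)-1}D^{(i)}$:

```python
import numpy as np

rng = np.random.default_rng(2)
q = 3

def random_pd(q):
    """Illustrative positive definite q x q matrix."""
    A = rng.normal(size=(q, q))
    return A @ A.T + q * np.eye(q)

A2, A1 = random_pd(q), random_pd(q)

care = (np.linalg.det(A2) / np.linalg.det(A1)) ** (1 / q)   # Theorem 2.6.2, delta = 1/2
roots = np.linalg.eigvals(A2 @ np.linalg.inv(A1)).real      # roots of the product matrix
geo_mean = np.prod(roots) ** (1 / q)
assert np.isclose(care, geo_mean)                           # Theorem 2.6.3

# the Pitman ratio lambda'A2 lambda / lambda'A1 lambda lies between the extreme roots
lam = rng.normal(size=q)
ratio = (lam @ A2 @ lam) / (lam @ A1 @ lam)
assert roots.min() - 1e-12 <= ratio <= roots.max() + 1e-12
```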
Thus the efficiency using the curvature provides us with a typical or average efficiency of our tests over all $\lambda \neq 0$, and in the case $t_1 = t_2$ it considerably simplifies the Pitman ARE, since we have an alternative to simply placing bounds on the efficiency. Theorem 2.6.3 is a generalization of Bickel's (1965) result for the multivariate one-sample location problem with $q = t_1 = t_2$.

We can see that if $q \leq t_i$ for one $i$ but $q > t_i$ for the other, then the latter power function has zero curvature at the origin while the former has positive curvature, so we could conclude that the former test is unquestionably superior to the other when using this efficiency criterion. If on the other hand both $t_i < q$ ($i = 1,2$) then both curvatures are zero and this method of comparison is not able to discriminate between the tests. In this latter case we consider an alternative measure of ARE in a later section.
Returning to the conditions of Theorem 2.6.2, let us define an equi-power contour as the set of $\lambda$ such that $P_i(\lambda) = \alpha + c$ ($0 < c \leq 1 - \alpha$), where $c$ is arbitrary. Consider the power function as written in equation (2.3.3) at the end of section 2.3.3; with $M_i = 1$ we find that an equi-power contour is equal to:

$\dfrac{1}{2}\,\lambda'\Big[\Big(\Big(\dfrac{\partial^2 P_i(\lambda)}{\partial\lambda_\ell\,\partial\lambda_K}\Big)\Big)\Big|_{\lambda=0}\Big]\lambda + \sum_{r=2}^{\infty} \dfrac{1}{(2r)!}\sum_{\ell_1,\ldots,\ell_{2r}} \lambda_{\ell_1}\cdots\lambda_{\ell_{2r}}\,\dfrac{\partial^{2r} P_i(\lambda)}{\partial\lambda_{\ell_1}\cdots\partial\lambda_{\ell_{2r}}}\Big|_{\lambda=0} = c$.
Now if we choose $\lambda$ sufficiently small so that we can ignore infinitesimals in $\lambda$ of order 4 and higher, we see that the equi-power contour reduces to

$\lambda'\, A_i\, \lambda = c$, where $A_i = \dfrac{1}{2}\Big(\Big(\dfrac{\partial^2 P_i(\lambda)}{\partial\lambda_\ell\,\partial\lambda_K}\Big|_{\lambda=0}\Big)\Big), \quad \ell, K = 1,\ldots,q$.

The matrix in this equation is positive definite when $q \leq t_i$, and hence this contour is an ellipsoid. Now the volume of this ellipsoid is given by

$V = \int_{\lambda' A_i \lambda \leq c} \prod_{\ell=1}^{q} d\lambda_\ell$.

Now $A_i$ is symmetric positive definite, so there exists an orthogonal matrix $B$ such that $B'\,A_i\,B = R = \mathrm{diag}(r_{i1},\ldots,r_{iq})$, the $r_{ij}$ being the roots of $A_i$. Let $C = B R^{-1/2}$ and let $\lambda = C\lambda^{*}$; therefore

$V = \big|B R^{-1/2}\big| \int_{\lambda^{*\prime}\lambda^{*} \leq c} \prod_{\ell=1}^{q} d\lambda_\ell^{*} = |A_i|^{-1/2}\, S_q(\sqrt{c})$,

since $R^{-1/2}\,B'\,A_i\,B\,R^{-1/2} = I$ and $\big|B R^{-1/2}\big| = |A_i|^{-1/2}$, where $S_q(\sqrt{c})$ is the volume of the $q$-dimensional sphere of radius $\sqrt{c}$. The volume of the equi-power contour is thus inversely proportional to the square root of the generalized curvature of the power function at the origin. We see that to increase the curvature is to decrease the volume of a certain infinitesimal ellipsoid.
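The volume formula $V = |A_i|^{-1/2}\,S_q(\sqrt c)$ can be verified by Monte Carlo for an illustrative $2 \times 2$ positive definite matrix (all numerical choices below are mine):

```python
import numpy as np
from math import gamma, pi, sqrt

rng = np.random.default_rng(3)
q, c = 2, 0.4
A = np.array([[2.0, 0.3], [0.3, 1.0]])      # illustrative positive definite A_i

# exact volume |A|^{-1/2} S_q(sqrt(c)), with S_q(rho) = pi^{q/2} rho^q / Gamma(q/2 + 1)
sphere = pi ** (q / 2) * sqrt(c) ** q / gamma(q / 2 + 1)
exact = sphere / sqrt(np.linalg.det(A))

# Monte Carlo volume of the ellipsoid {lambda : lambda' A lambda <= c}
box = 1.0                                    # this ellipsoid fits inside [-1, 1]^2
pts = rng.uniform(-box, box, size=(400_000, q))
inside = np.einsum('ij,jk,ik->i', pts, A, pts) <= c
mc = inside.mean() * (2 * box) ** q
assert abs(mc - exact) < 0.02
```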
In comparing the power functions of test 2 and test 1 along an equi-power contour, we see that if CARE > 1 then the power function of test 2 encloses an ellipsoid of smaller volume than test 1 along the same contour; intuitively the second test satisfies an attractive property of faster average growth locally than test 1.

Under the assumptions of Theorem 2.6.2 both tests are unbiased, and we see that if CARE > 1 then test 2 is more nearly optimum in the type D sense described by Isaacson (1951). If CARE > 1 we may say that test 2 has faster average growth locally when compared to test 1, but we should keep in mind that there may be some directions for which test 2 has lower power than test 1. This type of deficiency is almost certain to exist in any single quantity which attempts to measure multivariate efficiency.
If we consider the case where M_1 = M_2 = M with M > 1, then, neglecting terms in λ of power greater than the 2M-th, the equi-power contour of each power function reduces to an equation of the form (the sum of the terms of degrees 2 through 2M in λ) = c for c ∈ (0, 1−α].
For general values of M it is not clear to me what type of contour this is, or what conditions (similar to the positive definiteness assumed when M = 1) are needed on the higher-order derivatives to ensure a region which encloses an interior. Consequently attempts to generalize the arguments on minimizing the volume of ellipsoids fail.
One should anticipate problems in handling this for general M, since even sufficient conditions to ensure a relative minimum of the power function at λ = 0 are not generally available. A sufficient condition for M = 1 is, of course, the positive definiteness of the 2nd partial derivative matrix. The tests considered in this work have M = 1, so we discuss the case M > 1 no longer.
2.7
Trace Asymptotic Relative Efficiency (TARE)
Again confining ourselves to the case M_1 = M_2, we propose another measure of ARE whose range of applicability extends to a wider class of problems than does the CARE. The criterion we propose is as follows:

Definition: The trace asymptotic relative efficiency (TARE) of test φ_2 with respect to φ_1 is the lim_{N_1→∞} N_1/N_2, where {N_1} and {N_2} are chosen such that the two tests have the same limiting average power locally over the unit sphere through the same sequence of alternative hypotheses.
Again, as in Section 2.6, we define locally to involve the terms up to the 2M-th derivatives in the limiting power function, and we assume negligible terms beyond the 2M-th derivatives.
We expect this
ARE to be an average efficiency since we are taking the average local
power over all possible directions with respect to the surface area
element on the q-dimensional sphere.
When M_1 = M_2 = 1 we can write

    P_i(λ) = α + (1/2)[P_i(t_i+2, 0) − α] Δ_i + O(λ⁴_max)
           = α + (1/2)[P_i(t_i+2, 0) − α] λ' D^(i)' Γ^(i)^{-1} D^(i) λ + O(λ⁴_max).
So, ignoring higher order terms, we obtain a local power of α + (1/2)[P_i(t_i+2, 0) − α] Δ_i. Let ∫_{K_q(R)} f(λ) dλ denote the surface integral of the function f(λ) over the surface K_q(R). Then we obtain the following theorem for the TARE of test 2 to test 1.
Theorem 2.7.1. If {Q_N^(1)} and {Q_N^(2)} satisfy the conditions of Theorem 2.2.1 and M_1 = M_2 = 1, then

    TARE = R(t_1, t_2, α) · tr[D^(2)' Γ^(2)^{-1} D^(2)] / tr[D^(1)' Γ^(1)^{-1} D^(1)].
Proof: We are required to choose {N_1} and {N_2} such that the two tests have the same average local power over the surface ||λ|| = 1; we must choose the sample sizes to guarantee this condition through the sequence H_N: θ_N = θ_0 + N^{-1/2} λ. By (2.2.21),

    L(Q_N^(i) | H_N) → χ²(t_i, Δ_i)  as N → ∞,

where Δ_i is defined in (2.2.20) and

    Δ_i = λ' D^(i)' Γ^(i)^{-1} D^(i) λ.

So by (2.7.1) we require that

(2.7.2)  ∫_{||λ||=1} P_1(λ) dλ = ∫_{||λ||=1} P_2(λ) dλ

for the limiting local powers through the chosen sample-size sequences.
But by a result in differential geometry [Weyl (1939)],

(2.7.3)  ∫_{||λ||=1} λ' B λ dλ = (tr B / q) Area(S_{q−1}(1)),

so (2.7.2) reduces to an equality involving the traces of the matrices in the two noncentrality parameters. Recalling how the P_i are defined, we see the result follows. Q.E.D.
The trace of a matrix is equal to the sum of its roots so using
this definition of ARE we find that we get a quantity which is equal
to the arithmetic mean of the roots.
Hence, if we had defined the TARE as the limiting ratio of sample sizes such that the tests have the same arithmetic mean of the roots of the 2nd partial derivative matrix at the origin, then we would have arrived at the same result. We see that if TARE > 1 then test 2 is more nearly optimum in the sense of Wald (1943) for sufficiently small λ, since it has greater average local power over the unit sphere.
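The factorization above — a degrees-of-freedom/level factor times a ratio of traces — can be sketched numerically. In the sketch below the explicit form of R(t_1, t_2, α) is an assumption inferred from the local power expansion P_i(λ) ≈ α + ½[P_i(t_i+2, 0) − α]Δ_i (it is not spelled out in this section), and the function names are illustrative:

```python
import numpy as np
from scipy.stats import chi2

def R(t1, t2, alpha=0.05):
    # Assumed first factor of the TARE: depends only on the two degrees of
    # freedom and the common significance level.  From the local expansion,
    # R = (Pr[chi2_{t2+2} > c_{t2,alpha}] - alpha)
    #   / (Pr[chi2_{t1+2} > c_{t1,alpha}] - alpha),
    # with c_{t,alpha} the upper-alpha point of a central chi2_t.
    num = chi2.sf(chi2.ppf(1 - alpha, t2), t2 + 2) - alpha
    den = chi2.sf(chi2.ppf(1 - alpha, t1), t1 + 2) - alpha
    return num / den

def tare(A1, A2, t1, t2, alpha=0.05):
    # TARE of test 2 w.r.t. test 1: R(t1,t2,alpha) * tr(A2)/tr(A1), where
    # A_i = D^(i)' Gamma^(i)^{-1} D^(i) is the matrix in the noncentrality.
    return R(t1, t2, alpha) * np.trace(A2) / np.trace(A1)
```

Note that R(t, t, α) = 1, so for tests with equal degrees of freedom the TARE reduces to the trace ratio alone, while an extra degree of freedom (t_2 > t_1) is penalized through R < 1.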
2.8
Bahadur Efficiency
To be consistent with the notation in Chapter I, let Ω be the parameter set consisting of the values of θ, and let Ω_0 be that subset of the parameter set consisting of the single point θ_0. We consider, in the Bahadur method of comparison, a fixed alternative hypothesis, H_a: θ = θ_a ∈ Ω − Ω_0; the statistics we compare are

    Q_N^(i) = (T_N^(i) − μ^(i)(θ_0))' Σ̂_N^(i)^{-1} (T_N^(i) − μ^(i)(θ_0)),  i = 1,2.

We give sufficient conditions under which {N^{-1} Q_N^(i)} is a standard sequence for testing H_0.
Theorem 2.8.1. Suppose that to test the hypothesis θ = θ_0 we have the two sequences of test statistics, {Q_N^(1)} and {Q_N^(2)}, satisfying the following three conditions:

(2.8.1) a) Q_N^(i) has a limiting central χ² distribution with t_i degrees of freedom as N → ∞ when θ = θ_0.

(2.8.2) b) N^{-1/2}(T_N^(i) − μ^(i)(θ_0)) → η^(i)(θ) a.s. as N → ∞ for all θ ∈ Ω − Ω_0, where η^(i)(θ) is a fixed non-null vector of t_i components.

(2.8.3) c) Σ̂_N^(i) → Σ^(i)(θ) a.s., where Σ^(i)(θ) is positive definite.

Then the sequence {N^{-1} Q_N^(i)} is a standard sequence for testing H_0: θ = θ_0.

Proof: We must now verify the three conditions for a standard sequence as indicated in the definition in Chapter I.
I. Q_N^(i) → χ²(t_i) implies (Q_N^(i))^{1/2} → χ(t_i), where χ(t_i) is a chi distribution with t_i degrees of freedom, for θ ∈ Ω_0, since (Q_N^(i))^{1/2} is a continuous function of Q_N^(i). Hence we see there exists a continuous distribution function F_i(X) such that

    lim_{N→∞} Pr[(Q_N^(i))^{1/2} ≤ X] = F_i(X)  for all X and θ ∈ Ω_0.

II. For each X and θ ∈ Ω_0 we notice, since F_i(X) = Pr[χ²(t_i) ≤ X²], that

    1 − F_i(X) = (2^{t_i/2} Γ(t_i/2))^{-1} ∫_{X²}^{∞} e^{-z/2} z^{(t_i−2)/2} dz.

Integrating by parts with u = z^{(t_i−2)/2} and dv = e^{-z/2} dz, so that v = −2e^{-z/2}, we obtain for each X

(2.8.4)  1 − F_i(X) = (2^{t_i/2} Γ(t_i/2))^{-1} [2 e^{-X²/2} X^{t_i−2} + (t_i−2) ∫_{X²}^{∞} e^{-z/2} z^{(t_i−4)/2} dz].

Let w = z/2; then (2.8.4) becomes

(2.8.5)  1 − F_i(X) = (2^{t_i/2} Γ(t_i/2))^{-1} 2[e^{-X²/2} X^{t_i−2} + (t_i−2) 2^{(t_i−4)/2} ∫_{X²/2}^{∞} e^{-w} w^{(t_i−4)/2} dw].

Now we know that

    ∫_{X²/2}^{∞} e^{-w} w^{(t_i−4)/2} dw = O(e^{-X²/2} (X²)^{(t_i−4)/2})  as X → ∞;

therefore as X → ∞, (2.8.5) is

    1 − F_i(X) = e^{-X²/2} X^{t_i−2} [2(2^{t_i/2} Γ(t_i/2))^{-1} + O(X^{-2})]
               = e^{-X²/2} X^{t_i−2} · 2(2^{t_i/2} Γ(t_i/2))^{-1} [1 + o(1)]  as X → ∞.

Hence

    log_e(1 − F_i(X)) = −X²/2 + (t_i−2) log_e X + log_e c + o(1)  as X → ∞,

where c = 2(2^{t_i/2} Γ(t_i/2))^{-1}, and we see

(2.8.6)  log_e(1 − F_i(X)) = −(X²/2)[1 − 2(t_i−2) log_e X / X² + o(1)] = −(X²/2)[1 + o(1)]  as X → ∞.

So condition II required for a standard sequence is satisfied for (Q_N^(i))^{1/2} with a_i = 1.
III. Since N^{-1/2}(T_N^(i) − μ^(i)(θ_0)) → η^(i)(θ) a.s. for θ ∈ Ω − Ω_0 as N → ∞, and Σ̂_N^(i) → Σ^(i)(θ) a.s. as N → ∞ for θ ∈ Ω − Ω_0, therefore we see that

    N^{-1} Q_N^(i) → η^(i)'(θ) Σ^(i)^{-1}(θ) η^(i)(θ)  a.s. as N → ∞.

So if η^(i)(θ) ≠ 0 for θ ∈ Ω − Ω_0, then by the assumed positive definiteness of Σ^(i)(θ) for all θ, the limit is positive and the result of the theorem follows. Q.E.D.
As a result, if we assume that our statistics satisfy the conditions of Theorem 2.8.1, which they do for all the applications considered, then the Bahadur efficiency of φ_2 with respect to φ_1 is simply

(2.8.7)  e^B_{2,1}(θ) = η^(2)'(θ) Σ^(2)^{-1}(θ) η^(2)(θ) / η^(1)'(θ) Σ^(1)^{-1}(θ) η^(1)(θ)  for θ ∈ Ω − Ω_0.
We notice that (2.8.7) depends on the parameter θ in a most complicated fashion; primarily we notice that the covariance Σ^(i)(θ) depends on the fixed alternative value chosen. For some problems Σ^(i)(θ) would not, of course, depend on θ; for example, in the standard multivariate one sample problem when the data represent observations from a normal population with specified location μ_0 under the null hypothesis and location μ for the alternatives. For other problems this dependence of the covariance on θ is real; for example, if in the above problems we consider nonparametric tests constructed under an appropriate invariance structure, we would find that Σ^(i)(θ) would depend on θ.
So we see this dependence of the matrix Σ^(i)(θ) on θ creates a problem in the evaluation of this efficiency. We should also notice by (2.8.6) that the measure of efficiency, which was constructed for large X, is not sensitive to the degrees of freedom of the test, as the dominant terms in (2.8.6) do not involve t_i as X → ∞. For these reasons we find the Bahadur criterion an unattractive measure of efficiency for our problem.
Bahadur (1960) established sufficient conditions under which the Pitman ARE is equal to the limiting value of the Bahadur efficiency when the univariate parameter θ approaches the null value θ_0. We now establish a relationship between the limiting Bahadur efficiency as θ → θ_0 and the LARE which we proposed earlier.
Theorem 2.8.2. If for testing H_0: θ = θ_0 we have the two sequences of test statistics {Q_N^(i)}, i = 1,2, each sequence satisfying the conditions of Theorem 2.8.1, and

(2.8.8) a) η^(i)(θ) has first partial derivatives with respect to the elements of θ, uniformly continuous in a small neighborhood of θ_0,

together with conditions b) through d) of (2.8.9)–(2.8.11), then

(2.8.12)  e^B_{2,1}(θ) = (θ−θ_0)' D^(2)' Σ^(2)^{-1}(θ_0) D^(2) (θ−θ_0) [1 + o(1)] / (θ−θ_0)' D^(1)' Σ^(1)^{-1}(θ_0) D^(1) (θ−θ_0) [1 + o(1)]  as θ → θ_0,

where D^(i) is defined by (2.8.9).
Proof: By Theorem 2.8.1, for every θ ∈ Ω − Ω_0 the Bahadur efficiency is given by (2.8.7). But by (2.8.10) there exist matrices Γ^(i)(θ) defined by

    Σ^(i)^{-1}(θ) = Σ^(i)^{-1}(θ_0) + Γ^(i)(θ),

where Γ^(i)(θ) → 0 as θ → θ_0. So for each i we see that for θ close to θ_0 the quadratic form in (2.8.7) is dominated by the term in Σ^(i)^{-1}(θ_0). Now choose θ close to θ_0; by (2.8.8),

    [∂η^(i)(θ)/∂θ] = [∂η^(i)(θ_0)/∂θ][1 + o(1)]  as θ → θ_0.

So we see that (2.8.14) becomes, for each i,

(2.8.15)  (θ−θ_0)' [∂η^(i)(θ_0)/∂θ]' Σ^(i)^{-1}(θ_0) [∂η^(i)(θ_0)/∂θ] (θ−θ_0) + o(1)  as θ → θ_0.

Now this is true for each N; hence the result (2.8.12) follows. Q.E.D.
We notice that if we let λ = (θ − θ_0), then λ → 0 as θ → θ_0, and we see that the quantity (2.8.12) is similar to the LARE. In short, if the {Q_N^(i)} satisfied the conditions of Theorem 2.2.1 with M = 1, we see that the LARE and the limiting Bahadur efficiency we have considered here would be nearly identical. The limiting Bahadur efficiency in (2.8.12), even though simple to compute, does not account for differences in degrees of freedom.
CHAPTER III
APPLICATION OF TARE AND CARE TO THE ONE SAMPLE
GROWTH CURVE PROBLEM
3.1
Introduction
In this chapter we apply the measures of ARE proposed in
Chapter II to evaluate the efficiencies of some common procedures used
in the study of polynomial growth curve models. The general statistical model, along with its reduction for the one sample growth curve model, is given in Section 3.2.
While the results of this chapter are
easily extended for other hypotheses, the hypothesis we study is the
hypothesis of a constant growth curve over time.
In Section 3.4 we
discuss the reduction of the basic data to estimates of the assumed
model.
Once a given model has been decided upon a common procedure to
follow in practice is to estimate the growth curve parameters by
unweighted least squares.
This basic reduction is assumed throughout this chapter.
After obtaining this set of summary statistics for each observation vector we apply the Hotelling T² test to test the null hypothesis.
We then evaluate the TARE and CARE of an incorrect specification of the model to the correct specification of the number of parameters.
Similar results are obtained in Section 3.5 for the cor-
responding nonparametric rank scores procedure.
A brief summary of
the one sample rank scores procedure is presented in Section 3.5.2.
Section 3.6 is devoted to a comparison of the parametric and
nonparametric procedures based on the trace and curvature criteria of
ARE.
Bounds for the TARE are derived similar to those available for
the CARE.
The chapter is concluded with a brief section on the use of
covariance adjustment of the statistics with the higher order polynomial coefficients.
3.2
The Statistical Model
The model given in this section is sufficiently general to
encompass the c-sample problem and more complicated designs and will
be used in a later chapter on the c-sample problem.
We consider two index sets, I and T. We notice that I contains K_1 K_2 ⋯ K_m distinct points, and we may think of I as specifying the design across individuals; T is the set of distinct time points. Corresponding to each i ∈ I and t_ℓ ∈ T we have a set of n(i) random vectors of b elements, namely

(3.2.3)  X_S(i, t_ℓ)  (1×b),  S = 1,2,…,n(i).

In addition, let the h×b matrix

(3.2.4)  X_S(i) = (X_S(i, t_1)', …, X_S(i, t_h)')',  S = 1,2,…,n(i).
We assume that the X_S(i) are a collection of independent stochastic matrices from a bh-variate continuous distribution function G(X; i) for X ∈ R^{bh}. For the normal theory we would assume normality with location M(i), i ∈ I, and a bh×bh covariance matrix Σ which does not depend on i. We note that with this specification, for S = 1,…,n(i), the distribution of X_S(i) − M(i) does not depend on i. To eliminate the normality assumption, in the nonparametric approach we assume that

    G(X; i) = G(X + M(i)),  i ∈ I.

In this way X_S(i) − M(i) is distributed independently of i.
For the growth curve model one assumes that the μ(i, t_ℓ) can be written as a function of certain parameters β(i) and the time point t_ℓ. For example, we may assume that

(3.2.7)  μ_j(i, t_ℓ) = Y_j(β_j(i), t_ℓ),  i ∈ I,  1 ≤ j ≤ b,  1 ≤ ℓ ≤ h,

where β_j(i) is a vector of r elements (r ≤ h). Since the β_j(i) are
assumed to be unknown, we can consider problems in testing hypotheses concerning the β_j(i) and the estimation of β_j(i). If we let

(3.2.8)  β(i) = (β_1(i), …, β_b(i))  (r×b),  i ∈ I,

then we see that we have a reparameterization from M(i) to β(i) which is dimension-reducing. In this chapter we are interested in evaluating the loss in the incorrect specification of β(i) for the one sample problem.
We shall consider only polynomial functions of t_ℓ, where the β_j(i) are the coefficients of these polynomials. To be explicit, we require the functions Y_j defined by (3.2.7) to be polynomials in t_ℓ. The set T corresponds to h distinct abscissae, and it is well known that corresponding to this set T there exists a unique set of h orthogonal polynomials P_0(t), …, P_{h−1}(t) satisfying the conditions:

(3.2.9)  Σ_ℓ P_i(t_ℓ) P_K(t_ℓ) = 1 if i = K,  0 if i ≠ K,

and P_i(t) is a polynomial of exactly degree i in t. We denote by the vector b_i the value of P_i(t) at each t_ℓ, ℓ = 1,…,h; that is,

(3.2.10)  b_i = (P_i(t_1), …, P_i(t_h))'  (h×1),  i = 0, 1, …, h−1.
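The orthonormality conditions (3.2.9)–(3.2.10) can be realized numerically. A minimal sketch (the function name and the QR construction are illustrative assumptions, not the author's method): Gram-Schmidt on the Vandermonde matrix of the abscissae yields the vectors b_0, …, b_{h−1} as columns of an orthogonal matrix:

```python
import numpy as np

def orthonormal_poly_matrix(times):
    # Columns b_0, ..., b_{h-1}: values of the orthonormal polynomials
    # P_0(t), ..., P_{h-1}(t) at the h distinct abscissae, so that B'B = I
    # as in (3.2.9).  Built by QR (Gram-Schmidt) on the Vandermonde matrix.
    t = np.asarray(times, dtype=float)
    h = t.size
    V = np.vander(t, N=h, increasing=True)   # columns 1, t, t^2, ...
    Q, Rm = np.linalg.qr(V)
    # Fix signs so each P_i has positive leading coefficient.
    Q = Q * np.sign(np.diag(Rm))
    return Q

B = orthonormal_poly_matrix([1, 2, 3, 4, 5])
```

Because the QR factorization triangularizes the Vandermonde matrix, column i is a polynomial of exactly degree i, as required below (3.2.9).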
We shall consider in this study only the case b = 1. We observe that for the one sample problem I consists of only one element, so we omit the subscript i in the remaining discussion of the one sample case. In the one sample problem with b = 1 we have:

    X_S  (h×1),  S = 1,2,…,n,

which are independent and identically distributed as G(X).

3.3
Comments on Data Reduction
Comments on Data Reduction
Since we consider only polynomial models we see that
~(t)
can
be any of the following:
With each of these models one could construct a test of the
hypothesis that the growth curve is constant with respect to t£.
example~
for the model (3.3.l) the appropriate hypothesis to test is
0
~l
(3.3.4)
For
H :
O
~2
~-1
=
0
~O
unspecified.
0
While under the assumption of model (3.3.3) the appropriate
hypothesis to test is:
(3.3.5)
Given what maximum degree of the model is assumed then we can
determine which parameters to test for equality to zero.
Once it has
been decided how many parameters we wish to estimate we then have the
problem of reducing each observation vector of h elements to a vector of t elements which represents the estimates for the individual's growth curve. One reduction technique is the method of unweighted least squares, which amounts to choosing for each individual the vector a which minimizes the quantity

(3.3.6)  (X_S − B a)' (X_S − B a),

where B a is the assumed growth curve model. The estimate for each X_S is denoted by Y_S and is equal to (B'B)^{-1} B' X_S, i.e.,

    Y_S = (B'B)^{-1} B' X_S.

Now if B is the matrix of orthogonal polynomials P satisfying (3.2.9), then

(3.3.7)  Y_S = B' X_S.

Under the assumptions of the Gauss-Markov theorem this is the minimum variance linear unbiased estimator for a if the covariance matrix of X_S (assumed to exist) is equal to σ² I_{h×h}. When this covariance pattern is incorrect, the estimator Y_S is not the best linear unbiased estimator (BLUE), which in this case is given by the method of generalized least squares.
In the absence of information on the true covariance Σ, the estimator (3.3.7) is the estimator suggested by several authors [e.g. Potthoff and Roy, 1964]. Other estimates have also been proposed which introduce stochastic weights [e.g. Grizzle and Allen, 1969]; however, we explore estimators of the form (3.3.7) since these are also used frequently in practice.
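The simplification in (3.3.7) — that unweighted least squares reduces to B'X_S when B has orthonormal columns — can be checked numerically. A small sketch (the time points and random data are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.array([1., 2., 3., 4., 5.])
h = t.size

# Orthonormal design matrix B (columns = discrete orthogonal polynomials),
# obtained here by QR on the Vandermonde matrix; any B with B'B = I works.
Q, Rm = np.linalg.qr(np.vander(t, N=h, increasing=True))
B = Q * np.sign(np.diag(Rm))

X = rng.normal(size=h)                    # one observation vector X_S

# Unweighted least squares: a_hat = (B'B)^{-1} B' X ...
a_full = np.linalg.solve(B.T @ B, B.T @ X)
# ... which, since B'B = I, reduces to B'X as in (3.3.7).
a_orth = B.T @ X
```

The two estimates agree to machine precision, which is the content of (3.3.7).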
The estimators needed to test the null hypothesis would not involve the first element of Y_S, since this estimates a_0. The estimators for testing the null hypothesis are:

(3.3.8)  Y_{Si} = B_i' X_S,  S = 1,…,n,

where B_i corresponds to the orthogonal polynomials b_1, …, b_i to be used to test that a_1, a_2, …, a_i are zero.
We now consider the standard
parametric analysis which is applied to these reduced data.
3.4
Parametric Procedures
In the standard parametric analysis we assume that G(X) is N(θ, Σ); however, if one assumes that G(X) has finite moments up to order 2 + δ for some δ > 0, then the parametric procedure in large samples will still yield the same probability distributions. To be more explicit, we suppose that G(X) is an h-variate continuous distribution function with location vector μ and dispersion Σ (h×h) positive definite. Consider now the random sample X_S, S = 1,…,n, and define

    X̄_n = n^{-1} Σ_{S=1}^{n} X_S,
    S_n = (n−1)^{-1} { Σ_{S=1}^{n} X_S X_S' − n X̄_n X̄_n' };

by the arguments presented in Puri and Sen (1971, p. 173) we see that

    L(n^{1/2} X̄_n) → N(0, Σ)  if μ = 0,  and  S_n →_p Σ  as n → ∞.

In addition, for a sequence of alternative hypotheses H_n: μ_n = n^{-1/2} Δ, where Δ is fixed and non-null,

    L(n^{1/2} X̄_n | H_n) → N(Δ, Σ)  as n → ∞.
Clearly, if we now consider the functions defined by Y_S = B_i' X_S for each S = 1,…,n, then we can apply the same logic to the random variables Y_S to find the distribution of n^{1/2} Ȳ_n for large n. In particular, consider now the special problem of comparing a linear to a quadratic model when the true model is linear.
3.4.1
Parametric Procedure - True Model Linear
If μ(t_ℓ) is linear in t_ℓ, then to test the null hypothesis we consider the following:

(3.4.1)  Y_{S1} = b_1' X_S,  S = 1,…,n,

and if we (incorrectly) tested for the quadratic also, we consider the transformation:

(3.4.2)  Y_{S2} = (b_1, b_2)' X_S,  S = 1,…,n.
Since we know E(X_S), it follows that E(Y_{S1}) = a_1, and its covariance matrix is given by b_1' Σ b_1. Similarly E(Y_{S2}) = (a_1, 0)', and the covariance matrix of Y_{S2} is given by (b_1, b_2)' Σ (b_1, b_2). Consider the sequence of alternative hypotheses

(3.4.3)  H_n: a_{1n} = n^{-1/2} Δ,  Δ ≠ 0.

Then the usual parametric one sample estimates are

    Ȳ_i = n^{-1} Σ_{S=1}^{n} Y_{Si},  i = 1,2.

Since G has moments of order 2 + δ (δ > 0), the conditions of Theorem 2.2.1 are satisfied; the first four conditions are obvious and the fifth follows from the Berry-Esseen Theorem.
Therefore the test based on Ȳ_1 has a limiting χ²(1, Δ_1) distribution, where Δ_1 = Δ² (b_1' Σ b_1)^{-1}. In a similar fashion the test based on

    Q_n^(2) = n Ȳ_2' [ b_1'S_n b_1   b_1'S_n b_2 ;
                       b_2'S_n b_1   b_2'S_n b_2 ]^{-1} Ȳ_2

has a limiting χ²(2, Δ_2) distribution as n → ∞, where

    Δ_2 = Δ (1, 0) [ b_1'Σb_1   b_1'Σb_2 ;
                     b_2'Σb_1   b_2'Σb_2 ]^{-1} (1, 0)' Δ.
If we let φ_1 be the test based on the correct model and φ_2 the test on the quadratic model, we see that the ARE of φ_2 with respect to φ_1 using trace, curvature or local criteria becomes R(1, 2, α) multiplied by the ratio Δ_2/Δ_1 of the noncentrality parameters.
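As a numerical illustration of this comparison, the two noncentrality parameters Δ_1 = Δ²(b_1'Σb_1)^{-1} and Δ_2 = Δ²[(B_2'ΣB_2)^{-1}]_{11} can be evaluated for an assumed covariance; the AR(1)-type Σ and equally spaced times below are purely illustrative assumptions:

```python
import numpy as np

t = np.arange(1., 6.)
h = t.size
Q, Rm = np.linalg.qr(np.vander(t, N=h, increasing=True))
Bm = Q * np.sign(np.diag(Rm))          # orthonormal polynomial matrix
b1, b2 = Bm[:, 1], Bm[:, 2]            # linear and quadratic contrasts

rho = 0.5                               # assumed AR(1) covariance Sigma
Sigma = rho ** np.abs(np.subtract.outer(np.arange(h), np.arange(h)))

delta = 1.0
D1 = delta**2 / (b1 @ Sigma @ b1)       # noncentrality of the correct test
B2 = np.column_stack([b1, b2])
D2 = delta**2 * np.linalg.inv(B2.T @ Sigma @ B2)[0, 0]  # overfit test
ratio = D2 / D1                         # second factor of the ARE
```

Note that Δ_2 ≥ Δ_1 always (a partitioned-inverse inequality), so the second factor is at least one; the overfit test pays for its extra degree of freedom only through the first factor R(1, 2, α) < 1.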
Continuing with the parametric procedure let us derive the
efficiency for the general case of overfitting the model.
3.4.2
Parametric Procedures - General Case of Overfitting
Suppose that the correct model is (3.4.5). Make the transformation

(3.4.6)  Y_{S1} = B_1' X_S,  B_1 = (b_1, …, b_q),  S = 1,…,n.

Now we see that under the model (3.4.5), E(Y_{S1}) = (a_1, …, a_q)'.
Consider now the sequence of alternative hypotheses

    H_n: (a_{1n}, …, a_{qn})' = n^{-1/2} Δ,  Δ ≠ 0.

Then, similarly to the previous section, we see that

    Q_n^(1) = n Ȳ_1' (B_1' S_n B_1)^{-1} Ȳ_1

will have a χ²(q, Δ_1) distribution as n → ∞, where

    Δ_1 = Δ' (B_1' Σ B_1)^{-1} Δ.
If we incorrectly assumed a model with p parameters, p ≤ (h−1), denote

    B_2' = (b_{q+1}, …, b_p)'

and the accompanying transformation to Y_{S2} by

    Y_{S2} = (B_1, B_2)' X_S,  S = 1,2,…,n.

We see that E(Y_{S2}) = (a_1, …, a_q, 0, …, 0)', and the covariance of Y_{S2} is (B_1, B_2)' Σ (B_1, B_2).
We observe that in the notation of Theorem 2.2.1

    μ_n(a_1, …, a_q) = (a_1, …, a_q, 0, …, 0)',

thus

    D^(2) = [ I_{q×q} ; 0_{(p−q)×q} ]

in the notation of Theorem 2.2.1. Clearly, the limiting distribution of Q_n^(2) is χ²(p, Δ_2) as n → ∞, where

    Q_n^(2) = n Ȳ_2' [ B_1'S_nB_1   B_1'S_nB_2 ;
                       B_2'S_nB_1   B_2'S_nB_2 ]^{-1} Ȳ_2

and

    Δ_2 = Δ' D^(2)' [ B_1'ΣB_1   B_1'ΣB_2 ;
                      B_2'ΣB_1   B_2'ΣB_2 ]^{-1} D^(2) Δ.
By Theorem 8.2.1 of Graybill (1969), the upper-left q×q block of the partitioned inverse above is

(3.4.7)  [B_1'ΣB_1 − B_1'ΣB_2 (B_2'ΣB_2)^{-1} B_2'ΣB_1]^{-1},

the other blocks being eliminated by D^(2). Hence, using (3.4.7), we obtain

    Δ_2 = Δ' [B_1'ΣB_1 − B_1'ΣB_2 (B_2'ΣB_2)^{-1} B_2'ΣB_1]^{-1} Δ.
Again designating the test with the correct number of parameters as φ_1 and letting φ_2 be the overfit, we obtain the following efficiencies of φ_2 relative to φ_1:

(3.4.8)  CARE = R(q,p,α) { |B_1'ΣB_1| / |B_1'ΣB_1 − B_1'ΣB_2(B_2'ΣB_2)^{-1}B_2'ΣB_1| }^{1/q},  q ≤ p,

and

(3.4.9)  TARE = R(q,p,α) tr[B_1'ΣB_1 − B_1'ΣB_2(B_2'ΣB_2)^{-1}B_2'ΣB_1]^{-1} / tr(B_1'ΣB_1)^{-1},  q ≤ p.

3.4.3
Parametric Procedures - General Case of Underfitting
For this situation we consider the same model (3.4.5), and suppose φ_1 is again based on the test Q_n^(1) of Section 3.4.2, but φ_2 is based on Q_n^(2), a quadratic form in p variables, where p < q. We define
Y_{S2} by

(3.4.10)  Y_{S2} = (b_1, …, b_p)' X_S  (p×1),  S = 1,…,n.

Notice that E(Y_{S2}) = (a_1, …, a_p)'.
In the context of Theorem 2.2.1 we have

    μ_{nj}(a_1, …, a_q) = a_j,  j = 1,…,p,

and

    D^(2) = [ I_{p×p} : 0_{p×(q−p)} ]  (p×q).

The conditions of Theorem 2.2.1 are satisfied again, and the limiting distribution of Q_n^(2) is χ²(p, Δ_2). As pointed out in Chapter II, this is a situation in which the test has power α in (q−p) principal directions, and hence the generalized Gaussian curvature is zero.
We can consider the trace criterion, which yields the TARE of φ_2 with respect to φ_1:

    TARE = R(q,p,α) tr[D^(2)' (B_2'ΣB_2)^{-1} D^(2)] / tr(B_1'ΣB_1)^{-1},

which reduces to

(3.4.11)  TARE = R(q,p,α) tr(B_2'ΣB_2)^{-1} / tr(B_1'ΣB_1)^{-1}  for q > p,

where B_2 = (b_1, …, b_p).
This quantity will also be numerically evaluated for some
specific covariance structures in Section 5.3.
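In the spirit of those evaluations, the second factor of (3.4.11) can be computed for an assumed covariance structure; the AR(1) Σ, the time points, and the function name below are illustrative assumptions:

```python
import numpy as np

t = np.arange(1., 9.)          # h = 8 time points (assumed)
h = t.size
Qm, Rm = np.linalg.qr(np.vander(t, N=h, increasing=True))
Bfull = Qm * np.sign(np.diag(Rm))   # orthonormal polynomial columns b_0..b_{h-1}

rho = 0.3                           # assumed AR(1) covariance Sigma
Sigma = rho ** np.abs(np.subtract.outer(np.arange(h), np.arange(h)))

def trace_factor(p, q):
    # Second factor of the TARE in (3.4.11) for underfitting p < q:
    #   tr(B2' Sigma B2)^{-1} / tr(B1' Sigma B1)^{-1}.
    B1 = Bfull[:, 1:q + 1]     # b_1, ..., b_q  (correct model)
    B2 = Bfull[:, 1:p + 1]     # b_1, ..., b_p  (underfit model)
    num = np.trace(np.linalg.inv(B2.T @ Sigma @ B2))
    den = np.trace(np.linalg.inv(B1.T @ Sigma @ B1))
    return num / den

f = trace_factor(p=2, q=4)
```

Since B_2'ΣB_2 is the leading block of B_1'ΣB_1, this factor is always below one: underfitting discards trace noncentrality, while R(q,p,α) with p < q works in the opposite direction.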
3.5
Nonparametric Procedures
When using nonparametric methods for growth curve analysis two
approaches suggest themselves.
The first is to estimate each indi-
vidual's growth curve parameters using some robust procedure and
then apply a nonparametric test to these estimates.
The second approach is to use a least squares reduction
of each observation as in the previous section and then apply a
standard nonparametric test to these least squares estimates.
The latter approach is taken in this section.

3.5.1
Data Reduction
The observations X_S − μ, S = 1,…,n (h×1), are assumed to have been selected from an h-variate continuous distribution function G(X), diagonally symmetric about 0, where μ(t_ℓ) is represented by one of the models (3.3.1), (3.3.2) or (3.3.3). As we did in the parametric case, we shall first consider the special case where the model is linear and evaluate the efficiency by overfitting this linear model. The tests we compare will be rank order tests applied to Y_{S1} and Y_{S2} defined by (3.4.1) and (3.4.2) respectively.
Since X_S − μ is assumed to be from G, which is continuous and diagonally symmetric about zero, it follows from the symmetry alone that the characteristic function φ of X_S − μ is real for all t. If B is any h×q matrix of rank q (q ≤ h) and we define

    Z_S = B'(X_S − μ),

then the characteristic function of Z_S is

    ψ(t) = E(e^{i t' Z_S}) = E(e^{i t' B'(X_S − μ)}) = φ(B t),

where φ is the characteristic function of X_S − μ. But φ is real for all t; therefore ψ is real, and hence Z_S is diagonally symmetric about 0. If we let H be the distribution function of Z_S, then we can think of Z_S, S = 1,…,n, as a sample from H; and hence, letting Y_S = B'X_S, the Y_S can be assumed to be from a q-variate continuous distribution function F, diagonally symmetric about B'μ.
We are now in a position to apply the multivariate one
sample tests as outlined in Puri and Sen (1971).
We conclude from
this section that we have a reduction of the data to a set of data to
which we may apply the nonparametric multivariate one sample test
procedures.
We shall discuss, as in the parametric case, the problem
of overfitting a linear growth curve model but first we briefly
summarize the multivariate nonparametric results which we shall use.
3.5.2
One Sample Procedures
Let V_S, S = 1,…,n (q×1), be a random sample from a diagonally symmetric continuous distribution function F. Under the null hypothesis we assume the location vector of V_S is 0, and for the large sample study we have H_n: θ_n = n^{-1/2} λ. We denote by R_n the rank matrix of the absolute values of the V_{Sj}; i.e.,

    R_n = ((R_{jS})),  j = 1,…,q;  S = 1,…,n,

where R_{jS} is the rank of |V_{Sj}| among |V_{1j}|, …, |V_{nj}|.
Ties can be ignored, at least in theory, by the assumed continuity of F. A transformation of the ranks is made to a matrix of rank scores, E_n, where

    E_n = ((E_{n,R_{jS}})).

The transformation is defined by the score function J_n:

    E_{n,a} = J_n(a/(n+1)),  1 ≤ a ≤ n,

where J_n is required to satisfy the following conditions:

I. lim_{n→∞} J_n(u) = J(u) exists for 0 < u < 1, and J(u) is not constant.

II. ∫_{−∞}^{∞} { J_n[ (n/(n+1)) H_{nj}(X) ] − J[ (n/(n+1)) H_{nj}(X) ] }² dF_{nj}(X) = o_p(n^{-1/2}),  j = 1,2,…,q, where for each j = 1,2,…,q,

    H_{nj}(X) = n^{-1} [# of |V_{Sj}| ≤ X;  S = 1,2,…,n].

III. |J^{(i)}(u)| ≤ K[u(1−u)]^{−i−1/2+δ},  i = 0,1, for some δ > 0.

IV. lim_{n→∞} ∫_0^1 [J_n(u) − J(u)]² du = 0.

Also define c_{Sj} to be

    c_{Sj} = +1 if V_{Sj} > 0,  −1 if V_{Sj} < 0.
Under the basic sign invariance structure a class of conditionally distribution free tests for the null hypothesis exists and is characterized in Puri and Sen (1971). We shall use from their text Corollary 4.4.31 for the limiting distribution of n^{1/2}(T_n − γ) through a sequence of alternatives H_n. To summarize their result:
If

    T_{nj} = n^{-1} Σ_{S=1}^{n} E_{n,R_{jS}} c_{Sj},  j = 1,2,…,q,

and

    γ = (1/2) ∫_0^1 J(u) du,

then through H_n the vector n^{1/2}(T_n − γ) has a q-variate normal distribution as n → ∞, with mean θ whose j-th element is (1/2) λ_j A_j(F_j), j = 1,…,q, and covariance matrix

    Σ = (1/4) ((σ_{jj'})),  j, j' = 1,…,q,

where σ_{jj'} = ∫∫ J*(F_j(X)) J*(F_{j'}(y)) dF_{jj'}(X, y).
The additional assumptions have been made that E_{n,a} is the expectation of the a-th smallest observation from a sample of size n drawn from a distribution Ψ*_j(X) with

    Ψ*_j(X) = 2Ψ_j(X) − 1  for X ≥ 0,  Ψ*_j(X) = 0  for X < 0,

where Ψ_j(X) + Ψ_j(−X) = 1 for all X, so J(0) = 0 and

    J(u) = Ψ*_j^{-1}(u) = Ψ_j^{-1}((1+u)/2) = J*((1+u)/2),  0 < u < 1,  each j = 1,…,q,

and we assume that f_j(X) J*'(F_j(X)) is bounded as X → ±∞.
We see that we may construct quadratic forms in n^{1/2}(T_n − γ), with Σ̂_n a consistent estimate for Σ, and test H_0. Alternatively, one may use the test statistic (quadratic form) constructed under the permutation group generated by the sign invariance structure and use these test statistics. In large samples these two statistics are power equivalent through the sequence H_n.
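The statistic T_{nj} above can be sketched for the Wilcoxon score E_{n,a} = a/(n+1) (introduced in Section 3.5.7). This is a minimal illustration; the function name, the mid-rank handling of ties, and the n^{-1} normalization shown are assumptions of the sketch:

```python
import numpy as np
from scipy.stats import rankdata

def wilcoxon_T_vector(V):
    # V: n x q data matrix (rows = observations).  For each coordinate j,
    #   T_nj = n^{-1} * sum_S E_{n,R_{jS}} * c_{Sj},
    # where R_{jS} is the rank of |V_{Sj}| among |V_{1j}|,...,|V_{nj}|,
    # c_{Sj} = sign(V_{Sj}), and E_{n,a} = a/(n+1) is the Wilcoxon score.
    n, q = V.shape
    T = np.empty(q)
    for j in range(q):
        r = rankdata(np.abs(V[:, j]))       # ranks of |V_Sj| (ties mid-ranked)
        T[j] = np.mean((r / (n + 1)) * np.sign(V[:, j]))
    return T

rng = np.random.default_rng(1)
T = wilcoxon_T_vector(rng.normal(size=(50, 2)))
```

Under the null (diagonal symmetry about 0) each coordinate of T fluctuates around 0, which is what the quadratic-form tests exploit.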
3.5.3
Application of the One Sample Test
For our problem we started with an original h-variate continuous distribution function, diagonally symmetric about B a, where B is the portion of the orthogonal polynomial matrix needed for the model. To test constancy of the growth curve model we must test

    H_0: a_1 = a_2 = ⋯ = a_q = 0,

hence we transform from X_S to Y_S by (3.3.8). By our previous discussion, Y_S is diagonally symmetric about B_1' B a, which in the orthogonal polynomial representation is (a_1, …, a_q)'.
We observe that if B_1 and B are orthogonal polynomial matrices, then

    B_1' B = [ 0 : I ],

where B_1' is q×h, B is h×(q+1), 0 is q×1 and I is q×q. We note that a sufficient condition for Y_S to have a distribution which does not depend on a_0 is that the first column of B_1'B is the zero vector. If in this more general situation we let θ denote the location of Y_S, we complete our connection between this problem and the multivariate nonparametric one-sample problem by noting that H_0 implies θ = 0, and the sequence of alternative values of θ is, in terms of the a's,

    H_n: (a_{1n}, …, a_{qn})' = n^{-1/2} (λ_1, …, λ_q)'.

Hence we may test H_0 by testing the hypothesis θ = 0, with the sequence in terms of θ equal to θ_n. Hence, in the notation of Section 3.5.2, if we consider general rank scores applied to the Y_S, then the test statistics constructed would be noncentral χ² with q degrees of freedom and noncentrality Δ, where
    T = ((T_{jK})),  T_{jK} = σ_{jK} / (A_j(F_j) A_K(F_K)),  j, K = 1,…,q,

where σ_{jK} is defined in Section 3.5.2. Then we see that

    Δ = λ' (B_1'B_2)' T^{-1} (B_1'B_2) λ,

and we see that (B_1'B_2) corresponds therefore to the D^(i) of Theorem 2.2.1.
The effect of testing the hypothesis with too few or too many parameters, i.e., underfitting or overfitting respectively, simply changes B_1'B_2 to a non-square matrix. For example, if we had too few statistics, say p (p < q), then B_1'B_2 is p×q, and hence the matrix in Δ is positive semidefinite. We now discuss the linear growth curve model and compute the efficiency of a quadratic assumed model relative to the test with the correct number of parameters.
3.5.4
True Model Linear
If the model (3.3.3) is the true one, then the hypothesis to be tested is (3.3.5) against the sequence of alternatives (3.4.3). Letting Y_{S1} be as in (3.4.1) and Y_{S2} as in (3.4.2), then Y_{S1} can be viewed as a sample from F_1 (a univariate continuous distribution function symmetric about a_1) and Y_{S2} can be viewed as a sample from F_2 (a bivariate continuous distribution function, diagonally symmetric about (a_1, 0)'). On applying rank scores to Y_{S1} and Y_{S2} as described in Section 3.5.2, and letting Q_n^(i) be the quadratic form in the rank statistics of Y_{Si} (i = 1,2), we see that each Q_n^(i) has a limiting noncentral χ² distribution as n → ∞, where F_{2,j} is the j-th marginal distribution function of F_2 and

    ν_{2,jK} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} J*(F_{2,j}(X)) J*(F_{2,K}(y)) dF_{2,jK}(X, y),  j, K = 1, 2.
We observe that we have used the same score function for each
test procedure and each variate.
Obviously we want to use the same
score function for the common variates in the competitive tests;
however, one may be interested in using a different score function for the extraneous variates (in the case of overfitting). These points are other possibilities which could be explored; however, we continue in our discussion under the assumption of a common score function.
We note that F_1 = F_{2,1}, and hence T_1 = T_{2,11}. The ARE of φ_2 with respect to φ_1 using any of the three criteria (curvature, trace, or local) is:

(3.5.1)  R(1, 2, α) T_{2,11} T_{2,22} / (T_{2,11} T_{2,22} − T_{2,12} T_{2,21}).
We now consider the general case of overfitting in order to
obtain results comparable to the parametric results.
3.5.5 Nonparametric Procedures - General Case of Overfitting
For the general case of overfitting we let φ_1 be the test which is based on the rank scores applied to Y_{S1} (q×1) defined by (3.4.6), i.e., Y_{S1} = B_1' X_S with B_1' of dimension q×h. In the context of Section 3.5.3 we see that B_2 = B_1; hence the test based on Q_n^(1) (the quadratic form in T_n^(1)) would be noncentral χ²(q, Δ_1) through the sequence

(3.5.5.1)  H_n: (a_{1n}, …, a_{qn})' = n^{-1/2} (λ_1, …, λ_q)'

with Δ_1 = λ' T_1^{-1} λ, where T_1 is defined as T in Section 3.5.3.
On the other hand, if we tested for an additional (p−q) parameters a_{q+1}, …, a_p, we reduce X_S to Y_{S2} by Y_{S2} = (B_1, B_3)' X_S, S = 1,…,n, where B_3 = (b_{q+1}, …, b_p). We then see that Q_n^(2), the quadratic form in the rank scores of Y_{S2}, has, through the same sequence of alternative hypotheses, a limiting noncentral χ²(p, Δ_2) distribution with

    Δ_2 = λ' [ I_{q×q} : 0 ] T_2^{-1} [ I_{q×q} : 0 ]' λ.
We denote the null distribution functions of Y_{S1} and Y_{S2} by F_1 and F_2 respectively. The corresponding marginals (bivariate and univariate) are denoted by F_{1,j}, F_{1,jK} and F_{2,ℓ}, F_{2,ℓm}, where j, K = 1,…,q while ℓ, m = 1,…,q, q+1,…,p. It is apparent by the construction of Y_{S1} and Y_{S2} that F_{1,j} = F_{2,j} for j = 1,2,…,q, and by equality of the score functions we therefore have that T_1 = T_{2,11}, where
T
T
....2,11
qXq
T =
...2
•
T
.... 2,21
(p-q)xq
....2,12
qX(p-q)
T
....2,22
(p-q)x(p-q)
and
-1
-1
T
...2
=
(~2,11 - ~2,12 ....T 2,22
.
,
.
.
Thus the ARE of φ_2 with respect to φ_1 using the curvature criterion is:

(3.5.5.1)   CARE = R(q,p,α) { |T_{2,11}| / |T_{2,11} − T_{2,12} T_{2,22}^{−1} T_{2,21}| }^{1/q}.

Using the trace criterion the ARE is:

(3.5.5.2)   TARE = R(q,p,α) tr[(T_{2,11} − T_{2,12} T_{2,22}^{−1} T_{2,21})^{−1}] / tr T_1^{−1},   q ≤ p.
3.5.6 Nonparametric Procedures - General Case of Underfitting
We let φ_1 be the test in the previous section for the same hypothesis and we let φ_2 be the corresponding rank test when we have based the test on only the first p of the (α_1, …, α_q), where p < q. Letting Y_{S2} = B_4' X_S (p×1), S = 1,…,n, we note that B_1 = [B_4 : b_{p+1}, …, b_q]. From the previous discussions the quadratic forms constructed in the rank order statistics of the Y_{S2}, denoted by Q_n^{(2)}, have a limiting noncentral chi-square distribution with noncentrality

θ_2 = λ' [ I_p ; 0_{(q−p)×p} ] T_2^{−1} [ I_p : 0_{p×(q−p)} ] λ.

The ARE of φ_2 with respect to φ_1 using the trace criterion is therefore:

(3.5.6.1)   TARE = R(q,p,α) tr T_2^{−1} / tr T_1^{−1},   p < q.

3.5.7 Reduction for Wilcoxon Signed Rank Test
In the notation of Section 3.5.2 the Wilcoxon score or signed rank defines E_n(α) = J_n(α/(n+1)) = α/(n+1); α = 1,…,n. Then T = ((T_{jK})), j,K = 1,…,q, is equal to

(3.5.7.1)   T_{jK} = (1/12) [∫ f_j(x) dF_j(x)]^{−2},   K = j = 1,…,q;
            T_{jK} = (ρ^g_{jK}/12) [∫ f_j(x) dF_j(x)]^{−1} [∫ f_K(x) dF_K(x)]^{−1},   j ≠ K = 1,…,q,

where F is the distribution function of the Y_S and ρ^g_{jK} is the grade correlation, i.e.

ρ^g_{jK} = 12 ∫∫ [F_j(x) − 1/2][F_K(y) − 1/2] dF_{jK}(x,y),   j,K = 1,…,q.

Further, if G is multivariate normal then F is multivariate normal and

(3.5.7.2)   ∫ f_j(x) dF_j(x) = [2(π γ_jj)^{1/2}]^{−1},   j = 1,…,q,

where γ_jj is the variance of the j-th component of Y_S. Denoting by ρ_jK the correlation of the j-th and K-th variates of Y_S, we have

ρ^g_{jK} = (6/π) sin^{−1}(ρ_jK/2),   j,K = 1,…,q.

We note that if we assume Y_{Sj} = b_j' X_S, then γ_jj = b_j' Σ b_j, where Σ is the covariance matrix of G. We see that

(3.5.7.3)   T_{jj} = (1/12)(4π b_j' Σ b_j),   j = 1,…,q,

and in general

(3.5.7.4)   T_{jK} = (1/12)[(6/π) sin^{−1}(ρ_jK/2)][2π^{1/2}(b_j' Σ b_j)^{1/2}][2π^{1/2}(b_K' Σ b_K)^{1/2}],   j ≠ K = 1,…,q,

where

ρ_jK = b_j' Σ b_K / [(b_j' Σ b_j)^{1/2}(b_K' Σ b_K)^{1/2}],   j,K = 1,…,q.
Obviously, the additional subscript (1 or 2) needs to be added for the comparison of the two tests.
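Under normality the grade correlation reduces to ρ^g = (6/π) sin^{−1}(ρ/2). The following sketch (an added numerical illustration, not part of the original text; the sample size and seed are arbitrary choices) checks this identity by estimating the grade correlation from the ranks of simulated bivariate normal data.

```python
import numpy as np

def grade_corr(x, y):
    # sample analogue of the grade correlation:
    # 12 * E[(F1(X) - 1/2)(F2(Y) - 1/2)], estimated via ranks
    n = len(x)
    rx = np.argsort(np.argsort(x)) + 1.0
    ry = np.argsort(np.argsort(y)) + 1.0
    u = rx / (n + 1) - 0.5
    v = ry / (n + 1) - 0.5
    return 12.0 * np.mean(u * v)

rng = np.random.default_rng(0)
rho = 0.6
x, y = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=50000).T

closed_form = (6.0 / np.pi) * np.arcsin(rho / 2.0)  # grade correlation under normality
estimate = grade_corr(x, y)
print(closed_form, estimate)  # the two agree to Monte Carlo accuracy
```

The rank-based estimate converges to the closed form as the sample size grows.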
3.6 Comparison of Nonparametric to Parametric Procedures
In this section we consider a comparison of the nonparametric
procedures available for the one sample problem to the standard
parametric procedure.
The results in this section have applications
to the general one sample shift problem so we shall first present
results for the general one sample problem.
We then discuss the
extension of these results to the comparison of the two procedures when
applied to the growth curve problem when
a) the model is correctly specified;
b) the model is overspecified; and
c) the model is underspecified.
3.6.1 One Sample Location Problem

In this case we shall let φ_1 be the test based on the parametric (T_0^2) procedure and φ_2 be the nonparametric (rank scores) procedure. We can have either of the two measures of ARE in this case, the curvature or the trace; i.e.

(3.6.1)   CARE = { |Σ| / |T| }^{1/q}

or

(3.6.2)   TARE = tr T^{−1} / tr Σ^{−1}.
We shall want to place bounds on (3.6.1) and (3.6.2) over suitable subclasses of the entire class of distribution functions, 𝓕. The class 𝓕, for the one sample location problem, is the class of q-variate absolutely continuous distribution functions, diagonally symmetric about the median, θ, and possessing moments of order 2 + δ for some δ > 0. Bounds for (3.6.1) are well known since (3.6.1) is nothing more than the efficiency of the rank scores estimator to the least squares estimator using the Wilks criterion of asymptotic generalized variance; this fact was noted by Bickel (1965). Bounds on (3.6.1) over subclasses in 𝓕 are presented in Puri and Sen (1971) and Bickel (1964).
We summarize these results briefly. If φ_2 is the normal scores estimator and

a) F is q-variate normal, then CARE = 1;
b) F(x) = Π_{j=1}^q F_j(x_j), then CARE ≥ 1.

If φ_2 is the Wilcoxon scores estimator, then

(3.6.3)   a) CARE ≥ .864 { |(ρ_jK)| / |(ρ^g_jK)| }^{1/q} for all F in 𝓕;

b) if F is q-variate normal, then

(3.6.4)   CARE = (3/π) { |(ρ_jK)| / |((6/π) sin^{−1}(ρ_jK/2))| }^{1/q};

c) if F is bivariate normal, then

CARE = (3/π) [ (1 − ρ²) / (1 − (36/π²)(sin^{−1}(ρ/2))²) ]^{1/2},

and if we find the maximum and minimum with respect to ρ (−1 ≤ ρ ≤ 1) we find .91 ≤ CARE ≤ .95, where the lower bound is reached as |ρ| → 1 and .95 is the value at ρ = 0;

d) if F is pairwise independent, then (3.6.3) has a lower bound of .864 and (3.6.4) has a lower bound of 3/π.
We now shall find bounds similar to the above bounds when we use the trace criterion of efficiency. Let us consider the normal scores test, i.e. φ_2 is the normal scores procedure. We first note that if F is q-variate normal then T = Σ and therefore TARE = 1.
Theorem 3.6.1.  If F(x) = Π_{j=1}^q F_j(x_j), if F_j(x_j) has density f_j(x_j) with finite variance σ_j², and if (d/dx) Φ^{−1}(F_j(x)) is bounded as x → ±∞, then TARE ≥ 1.
Proof:  For all j,K = 1,2,…,q we know that

T_{jK} = ∫∫ Φ^{−1}(F_j(x)) Φ^{−1}(F_K(y)) dF_{jK}(x,y) / [ (∫ (d/dx) Φ^{−1}(F_j(x)) dF_j(x)) (∫ (d/dy) Φ^{−1}(F_K(y)) dF_K(y)) ],

but for j ≠ K the numerator of T_{jK} is zero by the diagonal symmetry of F(x) and the symmetric nature of the score function Φ^{−1}. Let

A(F_j) = ∫ (d/dx) Φ^{−1}(F_j(x)) dF_j(x),   j = 1,2,…,q.

Therefore

T_{jK} = 0,   j ≠ K;
T_{jK} = ∫ [Φ^{−1}(F_j(x))]² dF_j(x) / A²(F_j) = 1/A²(F_j),   j = K = 1,2,…,q.

But by the boundedness assumption and general properties of the normal distribution it has been shown [see e.g. Gastwirth and Wolf (1968)] that

A²(F_j) ≥ 1/σ_j²,   j = 1,2,…,q.

We know that T^{−1} = diag(A²(F_1), …, A²(F_q)). Therefore, since

Σ^{−1} = diag(1/σ_1², …, 1/σ_q²),

the result follows.  Q.E.D.
We note that if the distribution function, F, has a diagonal covariance matrix then the above theorem still follows, since each diagonal element of T^{−1} is greater than or equal to A²(F_j) ≥ 1/σ_j². Hence the result of the theorem is valid for a wider class of distributions. If φ_2 is the Wilcoxon score then from Puri and Sen (1971, p. 175) we have:

T_{jK} = [12 (∫ f_j(x) dF_j(x))²]^{−1},   j = K = 1,2,…,q;
T_{jK} = (ρ^g_{jK}/12) A(F_j)^{−1} A(F_K)^{−1},   j ≠ K,

where now

A(F_j) = ∫ f_j(x) dF_j(x).

In general tr T^{−1} / tr Σ^{−1} is not easily evaluated for arbitrary F(x). We consider some special cases.
Theorem 3.6.2.  If X is an interchangeable random vector then

(3.6.5)   TARE = 12 a² σ² (1 − ρ)(1 + (q−2)ρ^g)(1 + (q−1)ρ) / [(1 − ρ^g)(1 + (q−2)ρ)(1 + (q−1)ρ^g)],

where

a = A(F_j) = ∫ f_j(x) dF_j(x) = ∫ f(x) dF(x),

σ² and ρ are the common variance and correlation, and ρ^g is the common grade correlation.

Proof:  Since

Σ = σ²[(1 − ρ) I + ρ J]   and   T = (12a²)^{−1}[(1 − ρ^g) I + ρ^g J],

we have

Σ^{−1} = [σ²(1 − ρ)]^{−1} [ I − (ρ/(1 + (q−1)ρ)) J ],
T^{−1} = 12a²(1 − ρ^g)^{−1} [ I − (ρ^g/(1 + (q−1)ρ^g)) J ],

hence the result follows.  Q.E.D.
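The formula (3.6.5) can be checked numerically. The sketch below (an added illustration; the values of σ², ρ, and q are arbitrary) compares the closed form against tr T^{−1}/tr Σ^{−1} computed directly from the equicorrelated matrices used in the proof; for a normal parent, a = [2(πσ²)^{1/2}]^{−1} and ρ^g = (6/π) sin^{−1}(ρ/2), so at ρ = 0 the expression reduces to 3/π.

```python
import numpy as np

def tare_rhs(q, rho, rho_g, a, sigma2):
    # right-hand side of (3.6.5)
    num = 12 * a**2 * sigma2 * (1 - rho) * (1 + (q - 2) * rho_g) * (1 + (q - 1) * rho)
    den = (1 - rho_g) * (1 + (q - 2) * rho) * (1 + (q - 1) * rho_g)
    return num / den

def tare_direct(q, rho, rho_g, a, sigma2):
    # tr T^{-1} / tr Sigma^{-1} with the equicorrelated forms used in the proof
    I, J = np.eye(q), np.ones((q, q))
    Sigma = sigma2 * ((1 - rho) * I + rho * J)
    T = ((1 - rho_g) * I + rho_g * J) / (12 * a**2)
    return np.trace(np.linalg.inv(T)) / np.trace(np.linalg.inv(Sigma))

sigma2 = 2.0
a = 1.0 / (2.0 * np.sqrt(np.pi * sigma2))  # A(F) for a normal margin with variance sigma2
for q in (2, 3, 5):
    for rho in (0.0, 0.3, 0.6):
        rho_g = (6.0 / np.pi) * np.arcsin(rho / 2.0)  # normal-theory grade correlation
        assert abs(tare_direct(q, rho, rho_g, a, sigma2) - tare_rhs(q, rho, rho_g, a, sigma2)) < 1e-10

print(tare_rhs(3, 0.0, 0.0, a, sigma2))  # 3/pi = .9549..., the Theorem 3.6.4 value
```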
Theorem 3.6.3.  If F(x) has covariance matrix Σ which is diagonal then TARE ≥ .864.

Proof:  We know that

tr T^{−1} ≥ 12 Σ_{j=1}^q (∫ f_j(x) dF_j(x))² = 12 Σ_{j=1}^q A²(F_j)

and

tr Σ^{−1} = Σ_{j=1}^q 1/σ_j²,

therefore

TARE ≥ 12 Σ_{j=1}^q A²(F_j) / Σ_{j=1}^q (1/σ_j²) = Σ_{j=1}^q [12 σ_j² A²(F_j)] (1/σ_j²) / Σ_{j=1}^q (1/σ_j²),

but 12 σ_j² A²(F_j) ≥ .864 [Hodges and Lehmann (1956)]. Thus, TARE ≥ .864.  Q.E.D.
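The constant .864 = 108/125 in the Hodges–Lehmann bound is attained by a parabolic density. The sketch below (an added numerical check, not part of the original argument) evaluates 12σ²A²(F) for that density by simple numerical integration.

```python
import numpy as np

# Hodges-Lehmann extremal density: f(x) = 3(5 - x^2)/(20*sqrt(5)) on [-sqrt(5), sqrt(5)];
# it integrates to one and has mean zero and variance one.
s5 = np.sqrt(5.0)
x = np.linspace(-s5, s5, 200001)
dx = x[1] - x[0]
f = 3.0 * (5.0 - x**2) / (20.0 * s5)

mass = np.sum(f) * dx            # ~ 1
var = np.sum(x**2 * f) * dx      # ~ 1
A = np.sum(f**2) * dx            # A(F) = integral of f^2 = 3*sqrt(5)/25
print(mass, var, 12.0 * var * A**2)  # ~1, ~1, ~0.864 = 108/125
```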
Theorem 3.6.4.  If F(x) is q-variate normal with pairwise independent coordinates then TARE = 3/π.

Proof:  When F is normal we have

T_{jj} = (π/3) σ_j²   and   T_{jK} = 2 σ_j σ_K sin^{−1}(ρ_jK/2),   j ≠ K.

Pairwise independence implies ρ_jK = 0, thus ρ^g_{jK} = 0. Therefore,

T^{−1} = (3/π) diag(1/σ_1², …, 1/σ_q²),
Σ^{−1} = diag(1/σ_1², …, 1/σ_q²),

and the result follows.  Q.E.D.
We now establish a result for the trace criterion which parallels the curvature result when F is q-variate normal (q ≥ 3).

Theorem 3.6.5.  For q > 2,

inf_{F ∈ Ω} TARE = 0,

where Ω is the class of nonsingular q-variate normal distributions.

Proof:  As in Bickel (1964) it is sufficient to establish the result for q = 3. Consider the covariance matrix

Σ(a) = σ² [ 1, 0, (1−a)/√2 ; 0, 1, (1−a)/√2 ; (1−a)/√2, (1−a)/√2, 1 ].

Hence

Σ^{−1}(a) = [σ²(1 − (1−a)²)]^{−1} [ 1 − (1−a)²/2, (1−a)²/2, −(1−a)/√2 ; (1−a)²/2, 1 − (1−a)²/2, −(1−a)/√2 ; −(1−a)/√2, −(1−a)/√2, 1 ],

and tr Σ^{−1}(a) → ∞ as a → 0. On the other hand,

T(a) = (π/3) σ² [ 1, 0, b ; 0, 1, b ; b, b, 1 ],   where b = (6/π) sin^{−1}[(1−a)/(2√2)],

so that

T^{−1}(a) = (3/(πσ²)) (1 − 2b²)^{−1} [ 1 − b², b², −b ; b², 1 − b², −b ; −b, −b, 1 ]

and

tr T^{−1}(a) = (3/(πσ²)) (3 − 2b²)/(1 − 2b²).

As a → 0, b² → (36/π²)[sin^{−1}(1/(2√2))]² < 1/2, so tr T^{−1}(a) tends to a finite limit, while tr Σ^{−1}(a) → ∞. So TARE → 0 as a → 0 and the theorem follows.  Q.E.D.
Hence the same type of degeneracy occurs using the trace
criterion as occurs with the curvature and the smallest root criterion.
Let us now study the bivariate normal distribution function and derive
bounds on the TARE.
Theorem 3.6.6.  If F is bivariate normal with mean 0 and

Σ = [ σ_1², ρσ_1σ_2 ; ρσ_1σ_2, σ_2² ]

positive definite, then

a) TARE is independent of σ_1² and σ_2²;
b) .87 ≤ TARE ≤ .95, where the lower bound is reached as |ρ| → 1 and the upper bound is reached at ρ = 0.

Proof:  Since F is bivariate normal we know

T = (π/3) [ σ_1², (6/π) σ_1 σ_2 sin^{−1}(ρ/2) ; (6/π) σ_1 σ_2 sin^{−1}(ρ/2), σ_2² ].

In the 2×2 case we know that for a nonsingular matrix A, tr A^{−1} = tr A / |A|. Hence

TARE = tr T^{−1} / tr Σ^{−1} = (3/π)(1 − ρ²) / (1 − (36/π²)(sin^{−1}(ρ/2))²),

which proves part a) of the theorem. We first notice that the TARE is symmetric about ρ = 0, since if we denote h(ρ) = TARE we see that h(−ρ) = h(ρ). For 0 ≤ ρ ≤ 1, sin^{−1}(ρ/2) is increasing and therefore [sin^{−1}(ρ/2)]² is increasing; similarly, for −1 ≤ ρ ≤ 0, sin^{−1}(ρ/2) is decreasing and therefore [sin^{−1}(ρ/2)]² is increasing; thus [sin^{−1}(ρ/2)]² is increasing as |ρ| increases. Bickel (1964) has shown that sin^{−1}(ρ/2) = π/2 − cos^{−1}(ρ/2), hence

TARE = (3/π)(1 − ρ²) [1 − 9(1 − (2/π) cos^{−1}(ρ/2))²]^{−1}.

Further, Bickel (1964, p. 1087) demonstrated that this function has a maximum at ρ = 0 and is monotonically decreasing in |ρ|. Hence the maximum of the TARE is at ρ = 0, which is 3/π or .95. For the lower bound consider the monotonic transformation u = sin^{−1}(ρ/2), ρ = 2 sin u; thus

TARE = (3/π)(1 − 4 sin² u) / (1 − (36/π²) u²),

and we need

lim_{|ρ| → 1} (1 − ρ²) / (1 − (36/π²)(sin^{−1}(ρ/2))²).

Applying L'Hospital's rule we obtain

lim_{|u| → π/6} (3/π)(−8 sin u cos u) / (−(72/π²) u) = (3/π) [8 sin(π/6) cos(π/6)] / [(72/π²)(π/6)] = √3/2 = .866 ≈ .87.   Q.E.D.
We see that the trace leads to slightly wider bounds than the curvature criterion does, but we have the same basic result that the efficiency decreases as |ρ| increases. In any case we see that using curvature or the trace is likely to produce similar results for the efficiency for an underlying bivariate normal distribution.
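The bounds in Theorem 3.6.6 are easy to confirm numerically. The following sketch (an added illustration, not part of the original text) sweeps ρ over (−1, 1) using the closed form obtained in the proof.

```python
import numpy as np

def tare(rho):
    # closed form from the proof of Theorem 3.6.6
    return (3.0 / np.pi) * (1.0 - rho**2) / (1.0 - (36.0 / np.pi**2) * np.arcsin(rho / 2.0)**2)

rho = np.linspace(-0.9999, 0.9999, 100001)
vals = tare(rho)

print(vals.max())  # 3/pi = .9549..., attained at rho = 0
print(vals.min())  # approaches sqrt(3)/2 = .8660... as |rho| -> 1
```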
3.6.2 Growth Curve Model Correctly Specified

In the notation of Chapter II we have t_1 = t_2 = q, and if φ_2 is the nonparametric procedure and φ_1 the parametric procedure, then the efficiencies of φ_2 relative to φ_1 are:

(3.6.6)   CARE = { |B_1' Σ B_1| / |T| }^{1/q}

(3.6.7)   TARE = tr T^{−1} / tr (B_1' Σ B_1)^{−1}.

Here Σ is the covariance matrix of the original observations from the parent distribution function G, an h-variate absolutely continuous distribution function diagonally symmetric about zero with moments of order 2 + δ for some δ > 0. F is, in this problem, the distribution function of the reduced quantities Y_S defined by Y_S = B_1' X_S. The least squares procedure and the rank procedures are applied to the Y_S.
Practically speaking, the bounds attained in the general one sample problem apply to this problem as well. The only point of interest is to determine what set of circumstances in the growth curve problem lead to these bounds. Consider the normal scores procedure. If F is q-variate normal then the efficiency using either criterion is 1; if F has pairwise independent components it follows that the CARE and TARE are bounded below by unity. Pairwise independence requires that b_j' Σ b_K = 0 for all j ≠ K; one case where this happens is Σ = σ² I.
Turning now to the Wilcoxon scores, we see that if F has pairwise independent components then the efficiency (trace and curvature) is bounded below by .864, and if F is normal the bound is 3/π; again pairwise independence necessitates that b_j' Σ b_K = 0 for all j ≠ K. If F is bivariate normal then, using the curvature efficiency, the lower bound of .91 is reached when |ρ| → 1, while the trace criterion achieves a lower bound of .87 for the same limit. The upper bound is attained for both when ρ = 0.
One case
107
where p
=0
situation.
is
~
= a 2 !,
so the upper bound is attained for this
The lower bound for the growth curve model cannot in
general be reached for the bivariate case (i.e. linear and quadratic).
To see this point more clearly, we know there exists a nonsingular D such that

(D^{−1})' Σ D^{−1} = I,   i.e.   Σ = D' D;

letting a_i = D b_i, i = 1,2, we thus have

ρ = a_1' a_2 / [(a_1' a_1)^{1/2} (a_2' a_2)^{1/2}],

and this can be one only if there is a nonzero constant c such that a_1 = c a_2, which implies that b_1 = c b_2, which we know cannot happen since, for h time points, the elements of b_1 are monotonically increasing while the elements of b_2 are not monotonic. Thus the correlation coefficient of the linear and quadratic components cannot be 1, so the lower bound in general cannot be reached.
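The non-proportionality of the linear and quadratic contrasts can be illustrated numerically. In the sketch below (an added illustration; the number of time points and the randomly generated covariance matrices are arbitrary choices), the correlation b_1' Σ b_2 / [(b_1' Σ b_1)(b_2' Σ b_2)]^{1/2} stays strictly below one over many positive definite Σ.

```python
import numpy as np

h = 5
t = np.arange(h, dtype=float)

# orthonormal polynomial contrasts of degree 1 and 2 via QR on the Vandermonde basis
V = np.vander(t, 3, increasing=True)   # columns: 1, t, t^2
Q, _ = np.linalg.qr(V)
b1, b2 = Q[:, 1], Q[:, 2]              # b1 is monotone in t; b2 is not

rng = np.random.default_rng(1)
worst = 0.0
for _ in range(200):
    A = rng.standard_normal((h, h))
    Sigma = A @ A.T + 0.1 * np.eye(h)  # an arbitrary positive definite covariance
    num = b1 @ Sigma @ b2
    den = np.sqrt((b1 @ Sigma @ b1) * (b2 @ Sigma @ b2))
    worst = max(worst, abs(num / den))

print(worst)  # strictly below 1: b1 and b2 are never proportional
```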
3.6.3 Growth Curve Model Overspecified

In the notation of the previous sections we have t_1 and t_2 both equal to t, which is greater than q. We denote

(3.6.8)   B = [b_1, …, b_q, b_{q+1}, …, b_t] = [B_1 : B_2],

where the underlying model is given by B_1 and B_2 corresponds to the orthonormal polynomials we have overfit. The rank scores and least squares procedures are applied to the

Y_S = B' X_S = [B_1' X_S ; B_2' X_S]',   S = 1,2,…,n.
Under the assumptions of Theorem 2.2.1 the quadratic forms constructed in the rank statistics have a limiting chi-square distribution with t degrees of freedom and noncentrality parameter

λ' [I_q : 0_{q×(t−q)}] T^{−1} [I_q : 0_{q×(t−q)}]' λ,

where each term of T is the corresponding covariance of the rank scores of the Y_S divided by the A(F_j) A(F_K) defined earlier in this section. The corresponding parametric test would also have a limiting chi-square distribution with t degrees of freedom and noncentrality

λ' [I_q : 0] (B' Σ B)^{−1} [I_q : 0]' λ.

The efficiency of φ_2 relative to φ_1 is therefore given by:

CARE = { |B_1' Σ B_1 − B_1' Σ B_2 (B_2' Σ B_2)^{−1} B_2' Σ B_1| / |T_{11} − T_{12} T_{22}^{−1} T_{21}| }^{1/q},

where T_{ij} is the portion of T due to the covariance of the rank scores applied to B_i' X_S and B_j' X_S for i,j = 1,2.

3.6.4 Growth Curve Model Underspecified
In this case t < q and we let B (h×t) be the first t columns of B_1. The least squares procedure and nonparametric procedure are applied to

Y_S = B' X_S,   S = 1, 2, …, n.

The curvature criterion cannot be applied in this case, so we consider the trace criterion, which yields the efficiency of φ_2 relative to φ_1 as:

(3.6.11)   TARE = tr T^{−1} / tr (B' Σ B)^{−1}.
3.7 Covariance Adjustment

In this section we investigate the effect of the use of the higher order polynomial terms as covariables on the TARE and CARE. We compute the efficiencies of the procedure with covariance adjustment with respect to the procedure without covariance adjustment when the model has been a) correctly specified, b) overspecified, and c) underspecified. In cases a and c we are able to show that one always gains by covariance adjustment.

3.7.1 Model Correctly Specified

The hypothesis of interest in this problem is
(3.7.1)   H_0: (α_1, …, α_q)' = 0   against   H_N: (α_{1N}, …, α_{qN})' = N^{−1/2}(λ_1, …, λ_q)'.

From the previous sections we know that the parametric test based on

Y_{S1} = B_1' X_S (q×1),   S = 1, 2, …, n,

will have a limiting noncentral χ²(q, Δ_1) through H_N as N → ∞, where Δ_1 = λ'(B_1' Σ B_1)^{−1} λ, while the nonparametric test using rank scores would also have a limiting noncentral χ²(q, Δ_2) through H_N, where Δ_2 = λ' T^{−1} λ. We assume that we shall use r of the higher degree terms as covariables in the analysis, i.e. we let

Y_{S2} = B_2' X_S (r×1),   S = 1, 2, …, n,

where B_2 = [b_{q+1}, …, b_{q+r}], r = 1,2,…,(h−q)−1. The expected value of Y_{S1} given Y_{S2} is

α + (B_1' Σ B_2)(B_2' Σ B_2)^{−1} Y_{S2},

and its covariance matrix is

B_1' Σ B_1 − (B_1' Σ B_2)(B_2' Σ B_2)^{−1}(B_2' Σ B_1).

The residuals are thus

Y_{S1} − (B_1' Σ B_2)(B_2' Σ B_2)^{−1} Y_{S2},

and under H_0 they have the same covariance matrix; through H_N the quadratic form in these residuals has a limiting noncentral chi-square distribution with noncentrality

λ' [B_1' Σ B_1 − (B_1' Σ B_2)(B_2' Σ B_2)^{−1}(B_2' Σ B_1)]^{−1} λ.

In a similar fashion [see e.g. Sen and Puri (1970)] the test based on the rank scores procedure would have a limiting noncentral χ²(q, Δ_3) through H_N as N → ∞ with

Δ_3 = λ' T_{1.2}^{−1} λ,   where   T_{1.2} = T_{11} − T_{12} T_{22}^{−1} T_{21}.
We then see that the efficiencies of the test with covariables relative to the test without covariables for the parametric procedure are:

CARE = { |B_1' Σ B_1| / |B_1' Σ B_1 − B_1' Σ B_2 (B_2' Σ B_2)^{−1} B_2' Σ B_1| }^{1/q}

and

TARE = tr[B_1' Σ B_1 − B_1' Σ B_2 (B_2' Σ B_2)^{−1} B_2' Σ B_1]^{−1} / tr[B_1' Σ B_1]^{−1}.

The corresponding efficiencies for the nonparametric procedures are:

CARE = { |T_{11}| / |T_{1.2}| }^{1/q}   and   TARE = tr T_{1.2}^{−1} / tr T_{11}^{−1}.

We know that B_1' Σ B_2 (B_2' Σ B_2)^{−1} B_2' Σ B_1 is symmetric positive semi-definite, while B_1' Σ B_1 and B_1' Σ B_1 − B_1' Σ B_2 (B_2' Σ B_2)^{−1} B_2' Σ B_1 are symmetric positive definite. Hence by Theorem 1.44 of Graybill (1961) we see that

|B_1' Σ B_1 − B_1' Σ B_2 (B_2' Σ B_2)^{−1} B_2' Σ B_1| ≤ |B_1' Σ B_1|.

A similar argument shows that |T_{1.2}| ≤ |T_{11}|. Therefore, using the curvature criterion of ARE, we see that the efficiency of the procedures using covariance adjustment to the corresponding procedure without covariance adjustment is bounded below by unity.
Using the trace criterion we also find that the ARE is bounded below by unity. To demonstrate this we consider the following argument. There exists a nonsingular matrix C such that

(3.7.2)   B_1' Σ B_1 = C C'   and   B_1' Σ B_2 (B_2' Σ B_2)^{−1} B_2' Σ B_1 = C Λ C',

where Λ = diag(γ_1, …, γ_q) and γ_i is a characteristic root of

(B_1' Σ B_1)^{−1} B_1' Σ B_2 (B_2' Σ B_2)^{−1} B_2' Σ B_1.

Each γ_i is between zero and one, since the roots of this matrix are the same as the roots of

(3.7.3)   |B_1' Σ B_1 − B_1' Σ B_2 (B_2' Σ B_2)^{−1} B_2' Σ B_1 − λ B_1' Σ B_1| = 0

for λ, where λ = 1 − γ. The roots of (3.7.3) are greater than or equal to zero, i.e. 1 − γ ≥ 0, therefore γ ≤ 1; clearly each γ_i is between zero and one. So we see from (3.7.2) that

tr[B_1' Σ B_1 − B_1' Σ B_2 (B_2' Σ B_2)^{−1} B_2' Σ B_1]^{−1} = tr[C (I − Λ) C']^{−1} ≥ tr (C C')^{−1} = tr (B_1' Σ B_1)^{−1},

since each element of the diagonal matrix (I − Λ)^{−1} is greater than or equal to one. Therefore TARE ≥ 1. A similar argument can be applied to the nonparametric procedure.
Covariance adjustment does improve both the CARE and TARE in this case.
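Both inequalities of this section can be seen numerically. The sketch below (an added illustration; the dimensions and the covariance matrix are arbitrary choices) computes the CARE and TARE factors for covariance adjustment with q model terms and r covariables and confirms that each is at least one.

```python
import numpy as np

rng = np.random.default_rng(2)
h, q, r = 6, 2, 2

# orthonormal polynomial columns: B1 carries the model terms, B2 the covariable terms
t = np.arange(h, dtype=float)
Q, _ = np.linalg.qr(np.vander(t, q + r + 1, increasing=True))
B1, B2 = Q[:, 1:q + 1], Q[:, q + 1:q + r + 1]

A = rng.standard_normal((h, h))
Sigma = A @ A.T + np.eye(h)            # an arbitrary positive definite covariance

M11 = B1.T @ Sigma @ B1
M12 = B1.T @ Sigma @ B2
M22 = B2.T @ Sigma @ B2
Madj = M11 - M12 @ np.linalg.inv(M22) @ M12.T   # covariance after adjustment

care = (np.linalg.det(M11) / np.linalg.det(Madj)) ** (1.0 / q)
tare = np.trace(np.linalg.inv(Madj)) / np.trace(np.linalg.inv(M11))
print(care, tare)  # both are at least 1
```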
3.7.2 Model Underspecified
In this situation the hypothesis is (3.7.1); however, we have one test with t statistics (t < q) and zero covariables and the second test with t statistics and r covariables. We let

B = [b_1, …, b_t : b_{t+1}, …, b_q] = [B_1 : B_2]

and assume the least squares reduction

Y_{S1} = B_1' X_S (t×1),   S = 1, 2, …, n.
From the previous sections the parametric procedure based on the Y_{S1} will have a limiting noncentral χ²(t, Δ_1) through H_N as N → ∞, where

Δ_1 = λ' [ I_t ; 0 ] (B_1' Σ B_1)^{−1} [ I_t : 0 ] λ = λ' [ (B_1' Σ B_1)^{−1} (t×t), 0 (t×(q−t)) ; 0 ((q−t)×t), 0 ((q−t)×(q−t)) ] λ.
The nonparametric procedure with rank scores applied to the Y_{S1} would result in a limiting χ²(t, Δ_2) through H_N with

Δ_2 = λ' [ T_{11}^{−1} (t×t), 0 (t×(q−t)) ; 0 ((q−t)×t), 0 ((q−t)×(q−t)) ] λ.

We wish to compare these tests to the tests which use r of the higher order terms as covariables. We let

Y_{S2} = B_3' X_S (r×1),   r = 1, 2, …, (h−t)−1.

We notice that B_3 will contain terms from B_2 and in fact may actually equal B_2. The expected value of Y_{S1} given Y_{S2} is given by

(α_1, …, α_t)' + (B_1' Σ B_3)(B_3' Σ B_3)^{−1} Y_{S2},

with covariance matrix

B_1' Σ B_1 − (B_1' Σ B_3)(B_3' Σ B_3)^{−1}(B_3' Σ B_1).

The residuals are formed by subtraction and have the same covariance matrix.
Under similar assumptions as before, the quadratic form in these residuals is noncentral χ²(t, Δ_3) through H_N as N → ∞, where

Δ_3 = λ' [ [B_1' Σ B_1 − (B_1' Σ B_3)(B_3' Σ B_3)^{−1}(B_3' Σ B_1)]^{−1}, 0 ; 0, 0 ] λ.

In a like fashion the nonparametric procedure with covariables would have a noncentral χ²(t, Δ_4) through H_N as N → ∞, where

Δ_4 = λ' [ [T_{11} − T_{13} T_{33}^{−1} T_{31}]^{−1}, 0 ; 0, 0 ] λ.

We thus have the following as the efficiency of the test using covariables to the test without covariables for the parametric procedure:

TARE = tr[B_1' Σ B_1 − (B_1' Σ B_3)(B_3' Σ B_3)^{−1}(B_3' Σ B_1)]^{−1} / tr[B_1' Σ B_1]^{−1},

and for the corresponding nonparametric procedure:

TARE = tr[T_{11} − T_{13} T_{33}^{−1} T_{31}]^{−1} / tr T_{11}^{−1}.

The trace criterion is the only method of comparison which we can use in this case since the curvature of the power function of each test is zero. We also note these efficiencies are bounded below by one from the arguments in Section 3.7.1. Even if we have underspecified the model we always do better to use covariance adjustment.
3.7.3 Model Overspecified
In this situation the hypothesis is again given in (3.7.1); however, we have one test with t statistics (t > q) and no covariables and the second test with t statistics and r covariables. We let B = [b_1, …, b_q, b_{q+1}, …, b_t] and assume the least squares reduction

Y_{S1} = B' X_S (t×1),   S = 1, 2, …, n.

From the previous sections the parametric procedure based on the Y_{S1} will have a limiting noncentral χ²(t, Δ_1) through H_N as N → ∞, where

Δ_1 = λ' [B_1' Σ B_1 − B_1' Σ B_2 (B_2' Σ B_2)^{−1} B_2' Σ B_1]^{−1} λ.

The nonparametric procedure with rank scores applied to the Y_{S1} would result in a limiting χ²(t, Δ_2) through H_N as N → ∞, where

Δ_2 = λ' [T_{11} − T_{12} T_{22}^{−1} T_{21}]^{−1} λ.

The partitioned matrices are defined by B = [B_1 (h×q) : B_2 (h×(t−q))], and the T_{ij} are the covariances of the rank scores of B_i' X_S and B_j' X_S. We wish to compare these tests to the tests which use r of the higher order terms as covariables; we let

Y_{S2} = B_3' X_S (r×1),   r = 1, 2, …, (h−t)−1.
where B_3 = [b_{t+1}, …, b_{t+r}]. The expected value of Y_{S1} given Y_{S2} is

[ α ; 0 ] + (B' Σ B_3)(B_3' Σ B_3)^{−1} Y_{S2},

with covariance matrix

B' Σ B − (B' Σ B_3)(B_3' Σ B_3)^{−1}(B_3' Σ B).

The residuals are

Y_{S1} − [ (α ; 0) + (B' Σ B_3)(B_3' Σ B_3)^{−1} Y_{S2} ],

with the same covariance matrix, and through H_N we assume that

√n [ Ȳ_1 − (B' Σ B_3)(B_3' Σ B_3)^{−1} Ȳ_2 − n^{−1/2} (λ ; 0) ] → N_t( 0, A** )   as n → ∞.

We denote

A_{ij} = B_i' Σ B_j,   i,j = 1,2,3,

and

A_{ij.3} = A_{ij} − A_{i3} A_{33}^{−1} A_{3j},   i,j = 1,2.

Also let

A* = [ A_{11}, A_{12} ; A_{21}, A_{22} ]   and   A** = [ A_{11.3}, A_{12.3} ; A_{21.3}, A_{22.3} ].

We know that

|A_{11} − A_{12} A_{22}^{−1} A_{21}| = |A*| / |A_{22}|

and

|A_{11.3} − A_{12.3} A_{22.3}^{−1} A_{21.3}| = |A**| / |A_{22.3}|.
Thus using the curvature criterion we get

(3.7.4)   CARE = |A_{11} − A_{12} A_{22}^{−1} A_{21}| / |A_{11.3} − A_{12.3} A_{22.3}^{−1} A_{21.3}| = ( |A*| / |A**| ) ( |A_{22.3}| / |A_{22}| )

for the efficiency of the test with covariables relative to the test without covariables. The first ratio in (3.7.4) is at least unity; however, the second term is less than or equal to one. It does not seem obvious whether the CARE is greater than or equal to one. Using the nonparametric rank scores procedures we have, for the efficiency of the test with covariables to the test without covariables:

CARE = |T_{11} − T_{12} T_{22}^{−1} T_{21}| / |T_{11.3} − T_{12.3} T_{22.3}^{−1} T_{21.3}|.

As in the parametric case it is not clear if this quantity can be less than unity.
3.8 Comments

It should be observed that the hypothesis that the last q − ℓ of the (α_1, …, α_q) are null, with (α_1, …, α_ℓ)' unspecified (ℓ < h), could have been the hypothesis tested in this chapter in place of the hypothesis given in (3.3.4). To test this hypothesis we would apply the parametric or nonparametric one sample procedure to

Y_S = B' X_S,   S = 1, 2, …, n.

The bounds attained in this chapter would not change; however, the actual efficiency formulae would change to reflect the B matrix. In general the results of this chapter are applicable to tests for the one sample problem. In Chapter IV we find that the TARE and CARE reduce to scalar multiples of the corresponding one sample results for the hypothesis that the intercept is a fixed number and the curve is stationary.
CHAPTER IV
APPLICATION OF TARE AND CARE TO THE MULTI-SAMPLE
GROWTH CURVE PROBLEM
4.1 Introduction

We shall present results in this chapter for the c-sample problem (c > 1) similar to the results obtained in Chapter III for the one-sample problem. As in Chapter III we shall restrict attention to the polynomial growth curve model. The hypothesis of interest will be the hypothesis of the equality of the c growth curves; under various assumptions about the model, we shall see that this test of homogeneity gives rise to different test statistics. The unweighted least squares reduction will be used for reduction of each observation vector to the estimates for the assumed model. We shall apply the general rank scores statistics to these reduced vectors and also shall apply the Hotelling-Lawley trace statistic to test the null hypothesis. Efficiency formulae are presented for each procedure.
4.2 The Statistical Model and Data Reduction

The model for the multi-sample (c-sample) problem is a special case of the general model presented in Section 3.2. The index set, Γ, is the set of c indices 1,2,…,c. The index set, T, is the same as in (3.2.2), and the data are characterized by (3.2.3) and (3.2.4). For notational ease, we write n_i in place of n(i) since i = 1,2,…,c.
Restricting attention to the case b = 1 we see that we have the following for the multi-sample problem:

(4.2.1)   X_{Si} (h×1),   i = 1,2,…,c;  S = 1,2,…,n_i,

where for each i the X_{Si} are i.i.d. as G(x; i), an h-variate absolutely continuous distribution function. The location vector of X_{Si} is M(i), i = 1,2,…,c, and in the orthogonal polynomial models which we consider we have that

(4.2.2)   M(i) = B_1 α_i,   i = 1,2,…,c,

where B_1 (h×r) contains the orthogonal polynomials of degree 0 to r−1 satisfying (3.2.9) and (3.2.10). Hence, we have again a dimension reducing transformation from M(i) to α_i. The assumption

(4.2.3)   G(x; i) = G(x + M(i)),   i = 1,2,…,c,

is made so that X_{Si} − M(i) is distributed independently of i. In the normal theory we assume

(4.2.4)   X_{Si} ~ N_h(M(i), Σ),   i = 1,2,…,c,

if M(i) is given by (4.2.2). The hypothesis of homogeneity of the growth curves is given by

(4.2.5)   H_0: α_1 = α_2 = ⋯ = α_c = α (unspecified).
Defining the b_j, j = 0,1,…,h−1, as in (3.2.10), we shall consider the unweighted least squares estimate of α_i for each observation; hence, denoting B_1 = [b_0, …, b_{r−1}], we then consider the transformation

(4.2.6)   Y_{Si} = B_1' X_{Si},   i = 1,2,…,c;  S = 1,2,…,n_i.

In the next sections we compute the efficiencies of the parametric and nonparametric procedures for the overspecified and underspecified growth curve models.
4.3 Parametric Procedure

Under the assumption that the X_{Si} are from a multivariate normal distribution defined by (4.2.4), it follows that Y_{Si} defined by (4.2.6) has a multivariate normal distribution. Hence, to test the null hypothesis (4.2.5) we could follow any of several procedures, for example, the likelihood criterion which reduces to Wilks' lambda, the Hotelling-Lawley trace, or Roy's largest root. Under the assumption of normality the Hotelling-Lawley trace criterion (denoted T_0²) and the likelihood ratio criterion are asymptotically equivalent; in fact, both would lead to noncentral chi-square statistics for large samples. On the other hand, if the parent distribution is not necessarily normal but has moments of order 2 + δ for some δ > 0, then T_0² still has a central chi-square distribution under the null hypothesis and a noncentral chi-square distribution through an appropriate sequence of alternatives. We shall use the T_0² statistic, but the equivalence of the two criteria in large samples for normal distribution functions is noteworthy.
We consider the data characterized by (3.2.3) and (3.2.4) and define

X̄_i = n_i^{−1} Σ_{S=1}^{n_i} X_{Si},   i = 1,2,…,c,

X̄ = Σ_{i=1}^c (n_i/N) X̄_i,   where   N = Σ_{i=1}^c n_i,

S = (N − c)^{−1} Σ_{i=1}^c Σ_{S=1}^{n_i} (X_{Si} − X̄_i)(X_{Si} − X̄_i)'.

Assume there exist c constants γ_1, γ_2, …, γ_c, each in the open interval from zero to one, such that n_i/N → γ_i, i = 1,2,…,c, and Σ_{i=1}^c γ_i = 1.
Since the parent distribution function of the X_{Si} has moments of order 2 + δ for some δ > 0, it follows from the central limit theorem that

(4.3.1)   n_i^{1/2}(X̄_i − E X̄_i) → N_h(0, Σ)   as n_i → ∞.

Furthermore, we know that S →_P Σ by the laws of large numbers. From (4.3.1) it follows that

(4.3.2)   n_i^{1/2}(Ȳ_i − E Ȳ_i) → N_r(0, B_1' Σ B_1)   as n_i → ∞,

as Ȳ_i is a continuous function of X̄_i. In addition, it is obvious from (4.3.2) that

(4.3.3)   (N^{1/2}(Ȳ_i − E Ȳ_i); i = 1,2,…,c) → N_{rc}( 0, (B_1' Σ B_1) ⊗ Γ )
where

Γ = diag(1/γ_1, …, 1/γ_c).

Writing α_i as α + θ_i, where Σ_{i=1}^c γ_i θ_i = 0, we observe that only c − 1 of the θ_i are linearly independent; hence the null hypothesis implies that θ_1 = θ_2 = ⋯ = θ_c = 0. The sequence of alternative hypotheses, H_N, is defined by

(4.3.4)   H_N: θ_i = N^{−1/2} λ_i,   i = 1,2,…,c,

where λ_i is non-null for at least one i. To test H_0, we may define T_0² in its symmetric, less than full rank, form

(4.3.5)   T_0² = Σ_{i=1}^c n_i (Ȳ_i − Ȳ)' (B_1' S B_1)^{−1} (Ȳ_i − Ȳ),

or we may consider the statistic in one of its full rank forms, for example:

(4.3.6)   T_0² = N [(Ȳ_i − Ȳ); i = 1,2,…,c−1]' { (B_1' S B_1) ⊗ [δ_{ii'}(N/n_i) − 1]_{i,i'=1,2,…,c−1} }^{−1} [(Ȳ_i − Ȳ); i = 1,2,…,c−1].
By (4.3.3) and computation of the covariance we know, under H_0, that

[√N(Ȳ_i − Ȳ); i = 1,2,…,c−1] → N_{r(c−1)}( 0, (B_1' Σ B_1) ⊗ [δ_{ii'}/γ_i − 1]_{i,i'=1,2,…,c−1} ).

Furthermore, through H_N,

(4.3.7)   [√N(Ȳ_i − Ȳ); i = 1,2,…,c−1] → N_{r(c−1)}( [(λ_i − λ̄); i = 1,2,…,c−1], (B_1' Σ B_1) ⊗ [δ_{ii'}/γ_i − 1]_{i,i'=1,2,…,c−1} )   as N → ∞,

where λ̄ = Σ_{i=1}^c γ_i λ_i. Without loss of generality we may assume λ̄ is zero because of the r restrictions.
Clearly, by virtue of (4.3.7), T_0² has a limiting χ²(r(c−1), Δ_1) through H_N as N → ∞ with

(4.3.8)   Δ_1 = [λ_i; i = 1,2,…,c−1]' { (B_1' Σ B_1) ⊗ [δ_{ii'}/γ_i − 1] }^{−1} [λ_i; i = 1,2,…,c−1].

Written in a symmetric form in the λ_i, i = 1,2,…,c, Δ_1 becomes

Δ_1 = Σ_{i=1}^c γ_i λ_i' (B_1' Σ B_1)^{−1} λ_i.

We shall need to compute the trace and determinant of the discriminant in (4.3.8), so we reduce this discriminant to a simpler form. Noting that
(4.3.9)   [δ_{ii'}/γ_i − 1]_{i,i'=1,2,…,c−1} = diag(1/γ_1, …, 1/γ_{c−1}) + (−1) 1 1',

where 1 = (1,1,…,1)', by Theorem 8.3.3 in Graybill (1969) the inverse of (4.3.9) is

diag(γ_1, …, γ_{c−1}) + K_0 (γ_1, …, γ_{c−1})' (γ_1, …, γ_{c−1}),

where

K_0 = −(−1)[1 + (−1) Σ_{i=1}^{c−1} γ_i]^{−1} = 1/γ_c.

Consequently, we have

(4.3.10)   [δ_{ii'}/γ_i − 1]^{−1} = [δ_{ii'} γ_i + γ_i γ_{i'}/γ_c]_{i,i'=1,2,…,c−1}.
Furthermore, since the inverse of a direct product is the direct product of the inverses, it follows from (4.3.10) that

{ (B_1' Σ B_1) ⊗ [δ_{ii'}/γ_i − 1] }^{−1} = (B_1' Σ B_1)^{−1} ⊗ [δ_{ii'} γ_i + γ_i γ_{i'}/γ_c]_{i,i'=1,2,…,c−1}.

In addition, by Theorems 8.8.10 and 9.1.11 in Graybill (1969) it is apparent that

(4.3.11)   | (B_1' Σ B_1)^{−1} ⊗ [δ_{ii'} γ_i + γ_i γ_{i'}/γ_c] | = |B_1' Σ B_1|^{−(c−1)} ( Π_{i=1}^{c−1} γ_i / γ_c )^r

and

(4.3.12)   tr{ (B_1' Σ B_1)^{−1} ⊗ [δ_{ii'} γ_i + γ_i γ_{i'}/γ_c] } = tr (B_1' Σ B_1)^{−1} · Σ_{i=1}^{c−1} γ_i (1 + γ_i/γ_c).
Let us define

(4.3.13)   A = [δ_{ii'} γ_i + γ_i γ_{i'}/γ_c]_{i,i'=1,2,…,c−1}.
4.3.1 Parametric Procedure - Overfitting

If we had incorrectly assumed that the polynomial model was of degree (t−1) in each group (t > r), then we would make the transformation:

(4.3.14)   Y_{Si} = [b_0, b_1, …, b_{r−1}, …, b_{t−1}]' X_{Si},   i = 1,2,…,c;  S = 1,2,…,n_i,

and for notational convenience we let B = [B_1 : B_2], where B_1 = [b_0, …, b_{r−1}] and B_2 = [b_r, …, b_{t−1}].
Following the same reasoning as in (4.3.6) and (4.3.7), we see that through H_N,

(4.3.15)   [√N(Ȳ_i − Ȳ); i = 1,2,…,c−1] → N_{t(c−1)}( [(λ_i' ; 0')'; i = 1,2,…,c−1], (B' Σ B) ⊗ [δ_{ii'}/γ_i − 1]_{i,i'=1,2,…,c−1} )   as N → ∞,

where for each i, λ_i is r×1 and 0 is (t−r)×1.
Therefore, the statistic T_0² constructed from the Y_{Si} has a limiting noncentral χ²(t(c−1), Δ_2) through H_N. By (3.4.7) and the fact that the last (t−r) elements of each (λ_i' ; 0')' vector are zeroes, we have

Δ_2 = [λ_i; i = 1,2,…,c−1]' { [B_1' Σ B_1 − B_1' Σ B_2 (B_2' Σ B_2)^{−1} B_2' Σ B_1]^{−1} ⊗ A } [λ_i; i = 1,2,…,c−1].

As in (4.3.11) and (4.3.12) we have the determinant and trace of the discriminant of Δ_2 as

|B_1' Σ B_1 − B_1' Σ B_2 (B_2' Σ B_2)^{−1} B_2' Σ B_1|^{−(c−1)} |A|^r

and

tr[B_1' Σ B_1 − B_1' Σ B_2 (B_2' Σ B_2)^{−1} B_2' Σ B_1]^{−1} · tr A,

respectively.
Representing the test based on the correct number of parameters as φ_1 and the overfitted procedure as φ_2, we easily find the following efficiencies of φ_2 relative to φ_1:

CARE = R(r(c−1), t(c−1), α) { |B_1' Σ B_1|^{(c−1)} / |B_1' Σ B_1 − B_1' Σ B_2 (B_2' Σ B_2)^{−1} B_2' Σ B_1|^{(c−1)} }^{1/r(c−1)},

which becomes

(4.3.16)   CARE = R(r(c−1), t(c−1), α) { |B_1' Σ B_1| / |B_1' Σ B_1 − B_1' Σ B_2 (B_2' Σ B_2)^{−1} B_2' Σ B_1| }^{1/r},   t ≥ r.

Using the trace criterion we obtain

(4.3.17)   TARE = R(r(c−1), t(c−1), α) tr[B_1' Σ B_1 − B_1' Σ B_2 (B_2' Σ B_2)^{−1} B_2' Σ B_1]^{−1} / tr[B_1' Σ B_1]^{−1},   t ≥ r.
We see therefore that the c-sample efficiencies for overfitting are the one-sample efficiencies, in both cases adjusted by a scalar function. The equations (4.3.16) and (4.3.17) do illustrate the fact that the loss in efficiency may be more severe for the c-sample problem than it was for the one-sample problem.
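The two second factors in (4.3.16) and (4.3.17) are straightforward to evaluate numerically. The modern numerical sketch below is an illustration, not part of the thesis; the AR(1) covariance, the five time points, and the particular (r, t) values are assumed for the example. It builds orthonormal polynomial columns by a QR decomposition and computes the determinant and trace ratios.

```python
import numpy as np

def orthonormal_polys(h, t):
    """Columns: orthonormal polynomials of degrees 0,...,t-1 on h equally spaced points."""
    x = np.arange(h, dtype=float)
    V = np.vander(x, t, increasing=True)      # columns 1, x, x^2, ...
    Q, _ = np.linalg.qr(V)                    # Gram-Schmidt -> orthonormal columns
    return Q

def overfit_factors(Sigma, r, t):
    """Second factors of CARE (4.3.16) and TARE (4.3.17) when t terms are fit but r suffice."""
    h = Sigma.shape[0]
    B = orthonormal_polys(h, t)
    B1, B2 = B[:, :r], B[:, r:]
    S11 = B1.T @ Sigma @ B1
    S12 = B1.T @ Sigma @ B2
    S22 = B2.T @ Sigma @ B2
    # Schur complement B1'SB1 - B1'SB2 (B2'SB2)^{-1} B2'SB1
    schur = S11 - S12 @ np.linalg.solve(S22, S12.T)
    care2 = (np.linalg.det(S11) / np.linalg.det(schur)) ** (1.0 / r)
    tare2 = np.trace(np.linalg.inv(schur)) / np.trace(np.linalg.inv(S11))
    return care2, tare2

# AR(1) covariance, h = 5, rho = 0.5 (illustrative values only)
h, rho = 5, 0.5
Sigma = rho ** np.abs(np.subtract.outer(np.arange(h), np.arange(h)))
care2, tare2 = overfit_factors(Sigma, r=2, t=4)
```

Since the Schur complement is dominated by B_1' Σ B_1, both factors are at least one, reflecting the fact that the penalty for overfitting enters only through the factor R.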
4.3.2  Parametric Procedure - Underfitting

If we had assumed that the polynomial growth curve was of degree t-1 (t < r) in each group, then we would make the transformation

(4.3.18)    Y*_{si} = [b_0, b_1, ..., b_{t-1}]' X_{si} ,    i = 1,...,c ;  s = 1,2,...,n_i .

Let B* = [b_0, b_1, ..., b_{t-1}]. Then, through the sequence of alternatives defined in (4.3.4), we see that

(4.3.19)    (√N(Ȳ*_i - Ȳ*) ;  i = 1,2,...,c-1)  →  N_{t(c-1)}( [λ*_i ;  i = 1,2,...,c-1] ;  (B*' Σ B*) ⊗ A⁻¹ )

as N → ∞,
where for each i, λ*_i is the t×1 limiting mean vector of √N(Ȳ*_i - Ȳ*). Hence, the quadratic form in the √N(Ȳ*_i - Ȳ*) would have a limiting χ²(t(c-1), Δ_3) distribution as N → ∞ through this sequence, where

Δ_3 = [λ*_i ;  i = 1,2,...,c-1]' [(B*' Σ B*)⁻¹ ⊗ A] [λ*_i ;  i = 1,2,...,c-1] .
Using the trace criterion, the ARE of the underfitting procedure, φ_2, to the procedure with the correct number of parameters, φ_1, is:

(4.3.20)    TARE = R(r(c-1), t(c-1), α) · tr[B*' Σ B*]⁻¹ / tr[B_1' Σ B_1]⁻¹ .

4.4  Nonparametric Procedures
We shall present efficiency formulae in this section for the general rank scores procedures for the multivariate multisample problem as outlined in Puri and Sen (1971). Since the X_{si} have been observed from a continuous h-variate distribution function G(x, i) with location vector M(i), where M(i) = B_1 a_i (B_1 being h×r and a_i being r×1), it is obvious that the B_1' X_{si} have a continuous r-variate distribution function with location vector a_i. Hence, the multisample rank scores procedure may be applied to the B_1' X_{si}.

The procedure is simply to rank each coordinate of Y_{si} in the set of all the Y_{si} and apply a score function to these ranks. Mean scores are computed for each sample and, to test the null hypothesis of equality of location vectors, a set of r(c-1) contrasts in the mean scores is constructed. A quadratic form in these r(c-1) contrasts is defined to be the test statistic; numerically large values of this statistic lead to rejection of the null hypothesis.
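As a concrete modern illustration of the procedure just described (not from the thesis; the sample sizes, the Wilcoxon score a(R) = R/(N+1), and all variable names are assumptions for the example), one can rank each coordinate in the pooled sample, average the scores within samples, and form a quadratic form in the deviations of the sample mean scores:

```python
import numpy as np

def rank_scores_statistic(samples, score=lambda u: u):
    """Sketch of a Puri-Sen-type multisample rank-scores statistic.
    samples: list of (n_i, r) arrays; score: J applied to R/(N+1)."""
    Y = np.vstack(samples)                      # pooled N x r data
    N, r = Y.shape
    # rank each coordinate in the pooled sample (1..N), then apply the score function
    ranks = np.argsort(np.argsort(Y, axis=0), axis=0) + 1
    S = score(ranks / (N + 1.0))
    Sbar = S.mean(axis=0)
    V = (S - Sbar).T @ (S - Sbar) / N           # pooled covariance of the scores
    stat = 0.0
    start = 0
    for sample in samples:                      # sum of n_i (m_i - m)' V^{-1} (m_i - m)
        n_i = len(sample)
        d = S[start:start + n_i].mean(axis=0) - Sbar
        stat += n_i * d @ np.linalg.solve(V, d)
        start += n_i
    return stat                                 # approx chi-square, r(c-1) df, under H0

rng = np.random.default_rng(0)
samples = [rng.normal(size=(12, 2)), rng.normal(size=(15, 2)),
           rng.normal(loc=0.5, size=(10, 2))]
L_stat = rank_scores_statistic(samples)         # Wilcoxon (linear) scores
```

With three bivariate samples the statistic is referred to a chi-square distribution with r(c-1) = 4 degrees of freedom under the null hypothesis.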
Referring specifically to Section 5.6 of Puri and Sen (1971), we note, through the sequence of alternatives defined in (4.3.4), that the quadratic form would be noncentral chi-square with r(c-1) degrees of freedom with noncentrality Δ_1 as N → ∞. The noncentrality Δ_1 is defined by
(4.4.1)    Δ_1 = Σ_{i=1}^{c} γ_i λ_i' T_11⁻¹(F) λ_i ,

where the γ_i are defined in the previous section,

T_11(F) = [ σ_{jj'} / ( C(F_j) C(F_{j'}) ) ] ,    j,j' = 1,2,...,r ,

σ_{jj'} is ν_{jj'}(F) defined by (5.4.28) of Puri and Sen (1971), and

C(F_j) = ∫_{-∞}^{∞} (d/dx) J(F_j(x)) dF_j(x) ,    for j = 1,2,...,r .

We have again assumed that a_c = 0, which is no loss of generality. We observe that (4.4.1) can be written in its full rank form as:

(4.4.2)    Δ_1 = [λ_i ;  i = 1,2,...,c-1]' [T_11⁻¹(F) ⊗ A] [λ_i ;  i = 1,2,...,c-1] ,

where A is defined by (4.3.13).
Hence, we have

(4.4.3)    |T_11(F)|^{-(c-1)} |A|^r

and

(4.4.4)    tr T_11⁻¹(F) · tr A .
4.4.1  Nonparametric Procedures - Overfitting

If we had overfit the polynomial model in each group with polynomials of degree t-1, then we would apply the rank scores procedures to the variables defined by (4.3.14). The quadratic form in the t(c-1) contrasts in rank scores would have a limiting noncentral chi-square distribution with t(c-1) degrees of freedom and noncentrality

Δ_2 = [(δ_i) ;  i = 1,2,...,c-1]' [T⁻¹(F) ⊗ A] [(δ_i) ;  i = 1,2,...,c-1]

through the sequence of alternatives, where

T(F) = [ T_11(F)  T_12(F) ]
       [ T_21(F)  T_22(F) ]

is the covariance matrix of the entire set of t scores. In an obvious way we have

Δ_2 = [λ_i ;  i = 1,2,...,c-1]' [(T_11(F) - T_12(F) T_22⁻¹(F) T_21(F))⁻¹ ⊗ A] [λ_i ;  i = 1,2,...,c-1] .
The efficiency of the incorrect test φ_2 relative to φ_1, the test based on the correct number of parameters, using the curvature criterion is:

(4.4.5)    CARE = R(r(c-1), t(c-1), α) [ |T_11(F)| / |T_11(F) - T_12(F) T_22⁻¹(F) T_21(F)| ]^{1/r} ,    r < t ,

and using the trace criterion we obtain:

(4.4.6)    TARE = R(r(c-1), t(c-1), α) · tr[T_11(F) - T_12(F) T_22⁻¹(F) T_21(F)]⁻¹ / tr[T_11(F)]⁻¹ ,    r < t .
We note the similarity to the one-sample results and observe that the scalar adjustment, R(r(c-1), t(c-1), α), may result in large reductions when comparing many samples.
4.4.2  Nonparametric Procedure - Underfitting

Underfitting the model leads to applying the rank scores to the Y*_{si} defined by (4.3.18). The quadratic form in the t(c-1) contrasts of the mean rank scores would lead to a limiting noncentral chi-square distribution with t(c-1) degrees of freedom and noncentrality parameter Δ_3 through the sequence of alternatives, where

Δ_3 = [λ*_i ;  i = 1,2,...,c-1]' [T*⁻¹(F) ⊗ A] [λ*_i ;  i = 1,2,...,c-1] ,

and λ*_i and B* are defined as in Section 4.3.2. The efficiency of the underfitted procedure to the correctly fitted procedure is:

(4.4.7)    TARE = R(r(c-1), t(c-1), α) · tr T*⁻¹(F) / tr T_11⁻¹(F) ,

where T*(F) is the portion of T_11(F) corresponding to the upper t×t portion of T_11(F).
4.5  Comments

We conclude from Sections 4.3 and 4.4 that for large values of c, the adjustment of the one-sample efficiencies to obtain the c-sample efficiencies may be quite large. The loss in efficiency in overfitting is greater for the c-sample problem than for the corresponding one-sample problem. On the other hand, the loss in efficiency in underfitting the correct model is not as great for the c-sample problem as it is for the one-sample problem.

A comparison of the parametric and nonparametric procedures would obviously lead to the same formulae as presented for the one-sample problem. This fact is deduced by simply observing that the function R(q,t,α) is 1 when q = t, and the ratio of the determinants or traces of the noncentralities would be independent of the matrix A. Hence, the one-sample bounds obtained in Chapter III are applicable to the c-sample problem.

Adjustment of the statistics with the higher order terms as covariables would also lead to results identical to those presented in Section 3.7, and the bounds attained in that section would apply for the c-sample problem as well.
CHAPTER V

NUMERICAL ILLUSTRATIONS OF THE CARE AND TARE

5.1  Introduction

The purpose of this chapter is to provide some numerical illustrations of the CARE and TARE in specific situations. We have seen in Chapters III and IV that the CARE and TARE for the polynomial growth curve model depend in general on: a) the covariance matrix, Σ, b) the number of parameters in the model, q, c) the number of time points, h, d) the use of covariates, and e) the score function. We concern ourselves in this chapter with the parametric procedures only. We shall find, however, that the results in Section 5.3 for the parametric tests are similar to the results obtained for the Wilcoxon score. Sections 5.3 and 5.4 are devoted to the evaluation of the TARE and CARE for the problems of underfitting and overfitting, respectively. Particular attention is paid to the situation where the data vectors are observations at five equally spaced time points. Section 5.2 is included to show the reduction of the TARE and CARE for the uniform correlation model, and a tabulation of the TARE in this case is given. The covariance matrix defined in (5.2.6) is the one which is used to provide numerical examples of the values of TARE and CARE.
5.2  Special Covariance Patterns

If the covariance matrix is uniform, for example,

(5.2.1)    Σ = σ² [ 1  ρ  ...  ρ ]
                  [ ρ  1  ...  ρ ]
                  [ .  .       . ]
                  [ ρ  ρ  ...  1 ]

then the TARE and CARE for the test of stationarity studied in detail in Chapter III do not depend on σ² or ρ. To demonstrate this we let B = [b_1, ..., b_t] be the orthonormal polynomials of degrees 1 to t; then

B' Σ B = σ² (1-ρ) I .

Consequently, the TARE and CARE for overfitting with t statistics relative to the test with q statistics are:

(5.2.2)    TARE = R(q,t,α) · t/q    and    CARE = R(q,t,α) ,    for t > q.

If t < q then

(5.2.3)    TARE = R(q,t,α) · t/q .

The quantity (5.2.3) is tabulated in Table 5.2.1, with q = t_1 and t = t_2 in the notation used in that table. The entries in Table 5.2.1 are valid for all values of the number of time points, h, and are valid for both the parametric and Wilcoxon procedures discussed in Chapter III. Inspection of the table reveals that the efficiency is not greatly reduced for underfitting the model by only one or two parameters but is quite low for underfitting by a great deal.
As we pointed out in the concluding comments of Chapter III, the results of that chapter are applicable to other hypotheses of interest; e.g., we may test the hypothesis that the intercept is a fixed quantity, say a_00, and that the curve is stationary. In this case the B matrix is augmented by b_0. This hypothesis is of special interest since the c-sample efficiencies have been shown to be multiples of the corresponding one-sample efficiencies of that particular test. Defining B as B = [b_0, b_1, ..., b_q], we find that

(5.2.4)    B' Σ B = σ² [ 1-ρ+hρ    0     ...    0   ]
                       [   0      1-ρ    ...    0   ]
                       [   .       .            .   ]
                       [   0       0     ...   1-ρ  ]

Hence, the TARE and CARE are independent of σ²; however, the TARE depends in general on ρ. Inspection of (5.2.4) and its determinant readily shows that the CARE would be independent of ρ. In fact, the quantity (5.2.2) would define the CARE for this problem. We remark also that covariance adjustment by the higher order polynomial estimates does not alter the results discussed up to this point in this section, since B' Σ B is diagonal in both cases. While the CARE does not depend on ρ for the second hypothesis, the TARE does depend on ρ. With some algebra the TARE of the test with t statistics relative to the test with q statistics can be shown to be:

(5.2.5)    TARE = R(q,t,α) [t(1-ρ) + h(t-1)ρ] / [q(1-ρ) + h(q-1)ρ] .

We have not tabulated values of (5.2.5), but it may be of interest to do so in future work.
The point to be made is that we get different efficiency results by including the intercept in the hypothesis.
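The closed form (5.2.5) is easy to check numerically. The sketch below is an illustration, not part of the thesis; h, ρ, q, and t are assumed values. It builds the uniform covariance matrix, forms orthonormal polynomial columns including the intercept, and compares the ratio of traces of (B' Σ B)⁻¹ with the bracketed factor of (5.2.5).

```python
import numpy as np

def orthonormal_polys(h, t):
    """Orthonormal polynomials of degrees 0,...,t-1 on h equally spaced points (columns)."""
    x = np.arange(h, dtype=float)
    Q, _ = np.linalg.qr(np.vander(x, t, increasing=True))
    return Q

def trace_ratio(Sigma, q, t):
    """Ratio tr(B_t' Sigma B_t)^{-1} / tr(B_q' Sigma B_q)^{-1}, intercept included."""
    h = Sigma.shape[0]
    Bt, Bq = orthonormal_polys(h, t), orthonormal_polys(h, q)
    num = np.trace(np.linalg.inv(Bt.T @ Sigma @ Bt))
    den = np.trace(np.linalg.inv(Bq.T @ Sigma @ Bq))
    return num / den

h, rho, q, t = 5, 0.4, 4, 2        # illustrative values, t < q
Sigma = np.full((h, h), rho) + (1 - rho) * np.eye(h)   # uniform correlation, sigma^2 = 1
ratio = trace_ratio(Sigma, q, t)
closed_form = (t * (1 - rho) + h * (t - 1) * rho) / (q * (1 - rho) + h * (q - 1) * rho)
```

The agreement follows because, for the uniform pattern, B' Σ B is exactly the diagonal matrix in (5.2.4).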
A model considered in time series analysis is the first order autoregressive model, with covariance structure defined by (5.2.6):

(5.2.6)    Σ = σ² [ 1        ρ       ρ²      ...   ρ^{h-1} ]
                  [ ρ        1       ρ       ...   ρ^{h-2} ]
                  [ ρ²       ρ       1       ...   ρ^{h-3} ]
                  [ .                               .      ]
                  [ ρ^{h-1}  ...     ρ²      ρ     1       ]

In Sections 5.3 and 5.4 we shall evaluate the TARE and CARE for this correlation pattern for various values of ρ and h. The value of σ² is, of course, immaterial since the TARE and CARE are independent of σ². It is known that the inverse of Σ, defined by (5.2.6), is:

Σ⁻¹ = [σ²(1-ρ²)]⁻¹ [ 1    -ρ     0     ...    0    0 ]
                   [ -ρ   1+ρ²   -ρ    ...    0    0 ]
                   [ 0    -ρ     1+ρ²  ...    0    0 ]
                   [ .                             . ]
                   [ 0    0      0     ...   -ρ    1 ]

which is useful in computing the covariance matrix of the weighted least squares estimates. This covariance matrix is, of course, given by (B' Σ⁻¹ B)⁻¹. When the underlying distribution function is normal, the covariance matrix (B' Σ⁻¹ B)⁻¹ is identical to the covariance matrix for the maximum likelihood estimates. Hence, we see that (B' Σ⁻¹ B) is the matrix in the noncentrality parameter of these statistics. In Section 5.3 we compare the underfitting procedure based on the unweighted least squares estimates to the trace of (B' Σ⁻¹ B).
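The tridiagonal form of the AR(1) inverse can be verified directly. The sketch below is illustrative (h and ρ are assumed values): it builds Σ from (5.2.6) and its stated inverse and checks that their product is the identity.

```python
import numpy as np

def ar1_cov(h, rho, sigma2=1.0):
    """AR(1) covariance (5.2.6): sigma^2 * rho^{|i-j|}."""
    idx = np.arange(h)
    return sigma2 * rho ** np.abs(np.subtract.outer(idx, idx))

def ar1_inv(h, rho, sigma2=1.0):
    """Tridiagonal inverse of (5.2.6): [sigma^2(1-rho^2)]^{-1} times
    tridiag(-rho, 1+rho^2, -rho), with 1 (not 1+rho^2) in the two corners."""
    M = np.diag(np.full(h, 1 + rho ** 2))
    M[0, 0] = M[-1, -1] = 1.0
    off = np.full(h - 1, -rho)
    M += np.diag(off, 1) + np.diag(off, -1)
    return M / (sigma2 * (1 - rho ** 2))

h, rho = 6, 0.7                      # illustrative values
Sigma = ar1_cov(h, rho)
prod = Sigma @ ar1_inv(h, rho)       # should equal the identity matrix
```

The sparsity of Σ⁻¹ is what makes the weighted least squares reduction cheap to compute for this model.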
5.3  Underfitting

Because of the relationship between the multi-sample problem and the one-sample problem for the hypothesis of a specified intercept and a stationary or constant response over time, we shall consider this hypothesis in this section. This section specifically considers the evaluation of the TARE of the unweighted least squares procedure which underfits the model with respect to the unweighted least squares procedure which specifies the correct number of parameters. We also consider the TARE when covariance adjustment has been used in the underfitting procedure. These two comparisons are analogous to two possible procedures one may follow in practice; the first procedure simply being an unweighted least squares with no covariables, while the second is the unweighted least squares estimates with all remaining higher order polynomial estimates used as covariables. In addition to comparing these two underfitting procedures to the unweighted least squares procedure based on the correct number of parameters, we also compare them to the weighted least squares procedure based on the correct number of parameters. The covariance matrix (5.2.6) was used for the covariance matrix of the original observations. Tables of the ratio of the traces of the noncentrality parameters were generated for ρ = ±.1, ±.5, ±.9 and ±.95 for five and also ten equally spaced time points. For five time points these figures are found in Tables 5.3.1 - 5.3.8; two illustrations for ten time points are given in Tables 5.3.9 and 5.3.10. To compute the TARE one only needs to obtain the value of R corresponding to the α-level and the number of samples and multiply by the appropriate number selected from the tables. Values of R are found in Chapter II.

While the numerical results in Tables 5.3.1 - 5.3.10 are for the parametric test procedures, the corresponding tabulation for the Wilcoxon score yields nearly identical results for the figures below the diagonals. The maximum difference was, in fact, .01. Each figure below the diagonal is the ratio of the trace of the matrix in the noncentrality parameter of the test based on the unweighted least squares procedure with too few parameters to the corresponding trace for the unweighted least squares test with the correct number of parameters. For this problem t_1 in the tables is the number of parameters in the true model while t_2 is the number of parameters assumed. In the upper triangle of the tables, we compare the underfitted tests to the test using the weighted least squares reduction. The roles of t_1 and t_2 are thus reversed in this comparison. The entries above the diagonal in the 'a' tables are therefore:

(5.3.1)    tr(B_{t_1}' Σ B_{t_1})⁻¹ / tr(B_{t_2}' Σ⁻¹ B_{t_2}) ,

and below the diagonal are:

(5.3.2)    tr(B_{t_2}' Σ B_{t_2})⁻¹ / tr(B_{t_1}' Σ B_{t_1})⁻¹ .

The entries in the 'b' tables have identical denominators, but the covariance matrices in the numerator have been changed to:

(5.3.3)    B_{t_1}' Σ B_{t_1} - B_{t_1}' Σ B_{h-t_1} (B_{h-t_1}' Σ B_{h-t_1})⁻¹ B_{h-t_1}' Σ B_{t_1} ,

where B_{h-t_1} is used to denote the matrix of orthonormal polynomials of degrees t_1 to (h-1).
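The below-diagonal entries just defined can be reproduced directly. The sketch below is an illustration, not the thesis's original program; the AR(1) values are assumptions matching the ρ = .1, h = 5 setting of Table 5.3.1. It computes the ratio in (5.3.2) from orthonormal polynomial columns.

```python
import numpy as np

def orthonormal_polys(h, t):
    """Orthonormal polynomials of degrees 0,...,t-1 on h equally spaced points (columns)."""
    x = np.arange(h, dtype=float)
    Q, _ = np.linalg.qr(np.vander(x, t, increasing=True))
    return Q

def underfit_ratio(Sigma, t1, t2):
    """Below-diagonal entry (5.3.2): tr(B_t2' Sigma B_t2)^{-1} / tr(B_t1' Sigma B_t1)^{-1},
    with t1 parameters in the true model and t2 <= t1 parameters assumed."""
    h = Sigma.shape[0]
    B1 = orthonormal_polys(h, t1)
    B2 = orthonormal_polys(h, t2)
    num = np.trace(np.linalg.inv(B2.T @ Sigma @ B2))
    den = np.trace(np.linalg.inv(B1.T @ Sigma @ B1))
    return num / den

h, rho = 5, 0.1                      # illustrative setting of Table 5.3.1
idx = np.arange(h)
Sigma = rho ** np.abs(np.subtract.outer(idx, idx))   # AR(1), sigma^2 = 1
r32 = underfit_ratio(Sigma, t1=3, t2=2)
r31 = underfit_ratio(Sigma, t1=3, t2=1)
```

Since the trace of (B_t' Σ B_t)⁻¹ grows with t, the ratio is below one and shrinks as the degree of underfitting increases, in agreement with the pattern of the tables.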
Examining the tables we see that for ρ = ±.1 covariance adjustment does not change the ratio of traces of the underfit relative to either procedure. For all t_1 and t_2 the ratio of traces is quite low when h = 5. When h = 10 the ratio is near .90 if the degree of the true model is high and the degree of the incorrect test is only one less. For ρ = .5 and h = 5 or 10, slight improvements are observed using covariance adjustment; these are small, however. The ratio of traces improves by as much as .07 with covariance adjustment for ρ = -.5.

If ρ is .9 or .95 the ratio of traces is extremely small whether covariance adjustment is used or not. Nearly all of the ratios of traces are less than .50 in this case and many are near zero. Hence, for large positive correlations the loss in the TARE in underfitting the model is substantial. For large negative values of ρ we see that covariance adjustment greatly improves the efficiency results. For example, in comparing the test based on the unweighted least squares with two parameters to the test based on unweighted least squares with three, we see that the ratio of traces is .44. Adjustment of the former test with the three higher order terms as covariables increases the ratio of traces to 1.06. This is not a surprising result, since the test with covariance adjustment is providing information about the underlying covariance matrix which the unweighted least squares procedure does not obtain.
Marked improvements are also noted for the adjusted test as compared to the weighted least squares procedure. When adjustment is made for the factor R, one will find that the TARE will in several cases be greater than one for the test based on fewer degrees of freedom relative to the test based on a larger number of degrees of freedom. This can be explained by one of two possibilities in the case of comparing the unweighted least squares tests: either the covariance is such that not weighting causes a reduction in the trace of the matrix in the expression for the noncentrality, or the increase in the size of the trace of the test with more degrees of freedom is too small to account for the loss in sensitivity of a test with more degrees of freedom. The interpretation of this must be made in light of the fact that we have considered the local power functions, which means the alternatives are close to the null point. No generalization to alternatives at a greater distance from the null point should be made, since the truncated power function is not an adequate approximation to the entire power function for these alternatives. A fuller discussion of the relationship between the degrees of freedom and the noncentrality parameter is found in Krishnaiah (1966, pp. 91-92). The interpretation of the increase in the TARE of the test with fewer degrees of freedom relative to the test with more degrees of freedom based on the weighted least squares procedure lies simply in the fact that the gain in degrees of freedom of the latter test has not been accompanied by a sufficiently large increase in the trace of the matrix in its noncentrality.

The weighted least squares procedure, which is equivalent to the maximum likelihood procedure in the normal theory case, is the uniformly most powerful test in the class of invariant tests for the one-sample problem. Underspecifying the true model results in choosing estimates which are singular transformations of the estimates for the correct number of parameters. Hence, tests constructed for the underfitted model do not belong to the same invariant class as the weighted least squares test statistics. According to a result of Wald (1943), the test based on the maximum likelihood statistics (weighted estimates) has best average power over the family of ellipsoids defined by equating its noncentrality parameter to a constant. Integration over the family of spheres for local alternatives weights all the parameters equally, while in using the ellipsoids as the surface of integration the weights are determined by the covariance matrix. Since the likelihood criterion provides the best test in the sense of best average power on a given family of ellipsoids, it may not perform as well when the surfaces of integration are spheres and the information matrix is not proportional to I.

Care must be used in the interpretation of the efficiency results presented in this section because we have compared the tests in a very restricted manner. First, we have allowed only local alternatives and, second, we have taken average power in this local sense. While the test based on fewer degrees of freedom may have better average local power, it is at the same time inferior in those directions which have not been fit.

In conclusion, we see the numerical results show that we never lose by using covariance adjustment of the primary variates with the higher order polynomial terms. Furthermore, there are covariance matrices that show great improvements in the TARE by use of the covariance technique. While underfitting the model is not desirable, the results in this section seem to suggest that the loss in the average local power is not large for certain covariance matrices. For others, the loss is severe, and we see that the decision to include or exclude higher degree polynomial terms depends to some extent on the underlying covariance pattern.
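The claim that covariance adjustment never hurts can be checked numerically: the adjusted covariance (5.3.3) is a Schur complement of the higher-order block, so its trace-of-inverse can never be smaller than the unadjusted one. The sketch below is illustrative (ρ = -0.5 and h = 5 are assumed values) and compares the two numerator traces.

```python
import numpy as np

def orthonormal_polys(h, t):
    """Orthonormal polynomials of degrees 0,...,t-1 on h equally spaced points (columns)."""
    x = np.arange(h, dtype=float)
    Q, _ = np.linalg.qr(np.vander(x, t, increasing=True))
    return Q

def numerator_traces(Sigma, t):
    """Unadjusted tr(B_t' Sigma B_t)^{-1} and the covariance-adjusted version using
    the higher-order polynomials (degrees t,...,h-1) as covariables, as in (5.3.3)."""
    h = Sigma.shape[0]
    B = orthonormal_polys(h, h)
    Bt, Bh = B[:, :t], B[:, t:]
    Stt = Bt.T @ Sigma @ Bt
    Sth = Bt.T @ Sigma @ Bh
    Shh = Bh.T @ Sigma @ Bh
    adj = Stt - Sth @ np.linalg.solve(Shh, Sth.T)   # Schur complement (5.3.3)
    return np.trace(np.linalg.inv(Stt)), np.trace(np.linalg.inv(adj))

h, rho = 5, -0.5
idx = np.arange(h)
Sigma = rho ** np.abs(np.subtract.outer(idx, idx))   # AR(1), sigma^2 = 1
plain, adjusted = numerator_traces(Sigma, t=2)
```

Because the subtracted term in (5.3.3) is positive semidefinite, the adjusted trace is always at least as large as the unadjusted one, which is the numerical counterpart of the "we never lose" conclusion above.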
TABLE 5.3.1

RATIO OF TRACES OF NONCENTRALITY PARAMETERS FOR UNDERFITTING:
FIRST ORDER MODEL ρ = .1, h = 5

a. Without Covariance Adjustment

 t_1\t_2    1      2      3      4      5
   1             .48    .30    .22    .17
   2      .48           .64    .46    .35
   3      .31    .64           .72    .55
   4      .22    .46    .72           .77
   5      .17    .35    .55    .77

b. With Covariance Adjustment

 t_1\t_2    1      2      3      4      5
   1             .48    .31    .22    .17
   2      .48           .64    .46    .35
   3      .31    .64           .72    .55
   4      .22    .46    .72           .77
   5      .17    .35    .55    .77

Note: Entries are defined by (5.3.1), (5.3.2) and (5.3.3).
TABLE 5.3.5

RATIO OF TRACES OF NONCENTRALITY PARAMETERS FOR UNDERFITTING:
FIRST ORDER MODEL ρ = .9, h = 5

a. Without Covariance Adjustment

 t_1\t_2    1      2      3      4      5
   1             .09    .02    .01    .01
   2      .10           .25    .11    .06
   3      .03    .26           .43    .24
   4      .01    .11    .43           .56
   5      .01    .06    .24    .56

b. With Covariance Adjustment

 t_1\t_2    1      2      3      4      5
   1             .09    .03    .01    .01
   2      .10           .27    .12    .07
   3      .03    .27           .44    .25
   4      .01    .12    .44           .57
   5      .01    .07    .25    .57

Note: See note for Table 5.3.1.
TABLE 5.3.7

RATIO OF TRACES OF NONCENTRALITY PARAMETERS FOR UNDERFITTING:
FIRST ORDER MODEL ρ = -.95, h = 5

a. Without Covariance Adjustment

 t_1\t_2    1      2      3      4      5
   1             .10    .08    .07    .06
   2      .32           .24    .21    .20
   3      .11    .33           .62    .59
   4      .07    .24    .71           .83
   5      .06    .20    .59    .83

b. With Covariance Adjustment

 t_1\t_2     1       2       3       4       5
   1               .61     .48     .42     .40
   2      2.01             .78     .68     .65
   3       .67    1.10             .87     .83
   4       .48     .78     .99             .95
   5       .40     .65     .83     .95

Note: See note for Table 5.3.1.
TABLE 5.3.10

RATIO OF TRACES OF NONCENTRALITY PARAMETERS FOR UNDERFITTING:
FIRST ORDER MODEL ρ = -.95, h = 10

a. Without Covariance Adjustment

[10 × 10 triangular array of trace ratios, t_1, t_2 = 1, 2, ..., 10; entries garbled in this copy.]

b. With Covariance Adjustment

[10 × 10 triangular array of trace ratios, t_1, t_2 = 1, 2, ..., 10; entries garbled in this copy.]

Note: See note for Table 5.3.1.
5.4  Overfitting

We present numerical results for the CARE of the parametric test that overspecifies the model relative to the parametric test which correctly specifies the model for h equal to five. We consider the unweighted least squares procedure for both tests. We restrict attention to the first order autoregressive model and the hypothesis discussed in Section 5.3. In Tables 5.4.1 - 5.4.4 we have tabulated the t_1-th root of the ratio of the determinants above the diagonal, while below the diagonal we have the CARE of the overspecified test relative to the test based on the correct number of parameters for the one-sample problem (or two-sample problem) for α = .05. Hence, in the notation of Section 4.3.1 with r = t_1 and t = t_2, above the diagonals are:

(5.4.1)    D = [ |B_1' Σ B_1| / |B_1' Σ B_1 - B_1' Σ B_2 (B_2' Σ B_2)⁻¹ B_2' Σ B_1| ]^{1/t_1} ,

and below the diagonal are:

(5.4.2)    CARE = R(t_1, t_2, .05) · D .

We have tabulated the results for ρ = ±.5, ±.90. The tables were generated for other values of ρ; however, the results for intermediate values of ρ lie between the values listed in the tables. Small values of |ρ| result in D, defined by (5.4.1), being very close to one. The comparison to the test based on the weighted least squares reduction yields similar values of the CARE for |ρ| ≤ .5. For larger values of |ρ| the CARE for comparison to the weighted least squares procedure is much lower (see Table 5.4.4).

We note that for ρ = ±.5 the CARE for overfitting is about .40 for fitting five terms when only one is needed. If we overfit a four parameter model by a five parameter model, the CARE is about .90. When ρ = +.9 we obtain results identical with the case where ρ = +.5, while for ρ = -.9 we see that overfitting in most cases produces a CARE greater than one. This most probably is caused by the fact that the overfitting yields information about the covariance matrix not used in the unweighted least squares procedure which specifies the correct number of parameters. Comparison to the weighted least squares procedure based on the correct number of parameters for ρ = -.9 produces rather low values of the CARE. On the other hand, with underfitting in this same case we have a loss in the ratio of the traces of the noncentralities; however, a large portion of this loss is recovered by covariance adjustment (see Table 5.3.6). Furthermore, all the numbers in Table 5.3.6 will increase when the adjustment is made for R. This suggests that overfitting has more loss associated with it than does underfitting for certain covariance matrices.
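The quantity D is simple to evaluate numerically. The sketch below is an illustration, not part of the thesis; the AR(1) parameters are assumed values chosen to match the setting of the tables. It computes the determinant-ratio root (5.4.1) for an overfit of t_2 terms when t_1 suffice.

```python
import numpy as np

def orthonormal_polys(h, t):
    """Orthonormal polynomials of degrees 0,...,t-1 on h equally spaced points (columns)."""
    x = np.arange(h, dtype=float)
    Q, _ = np.linalg.qr(np.vander(x, t, increasing=True))
    return Q

def D(Sigma, t1, t2):
    """t1-th root of the determinant ratio (5.4.1), fitting t2 terms when t1 suffice."""
    h = Sigma.shape[0]
    B = orthonormal_polys(h, t2)
    B1, B2 = B[:, :t1], B[:, t1:]
    S11 = B1.T @ Sigma @ B1
    S12 = B1.T @ Sigma @ B2
    S22 = B2.T @ Sigma @ B2
    schur = S11 - S12 @ np.linalg.solve(S22, S12.T)   # Schur complement
    return (np.linalg.det(S11) / np.linalg.det(schur)) ** (1.0 / t1)

h, rho = 5, -0.5
idx = np.arange(h)
Sigma = rho ** np.abs(np.subtract.outer(idx, idx))    # AR(1), sigma^2 = 1
d_values = {(t1, t2): D(Sigma, t1, t2) for t1 in range(1, 5) for t2 in range(t1 + 1, 6)}
```

Since the Schur complement is dominated by B_1' Σ B_1, every D is at least one, so the CARE for overfitting falls below one only through the factor R(t_1, t_2, .05).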
Note:
Entries above diagonal are defined by (5.4.1).
Entries below diagonal are defined by (5.4.2).
TABLE 5.4.2

VALUES OF D AND CARE FOR OVERFITTING ρ = -.5

 t_1\t_2     1       2       3       4       5
   1               1.00    1.10    1.10    1.11
   2       .65             1.05    1.13    1.13
   3       .56     .82             1.05    1.12
   4       .47     .74     .88             1.05
   5       .41     .65     .82     .91

Note: See note for Table 5.4.1.
TABLE 5.4.3

VALUES OF D AND CARE FOR OVERFITTING ρ = +.9

 t_1\t_2     1       2       3       4       5
   1               1.00    1.03    1.03    1.03
   2       .65             1.02    1.04    1.04
   3       .53     .79             1.02    1.02
   4       .44     .68     .85             1.01
   5       .39     .59     .75     .88

Note: See note for Table 5.4.1.
TABLE 5.4.4

VALUES OF D AND CARE FOR OVERFITTING ρ = -.9

 t_1\t_2       1           2           3           4           5
   1                     1.00        3.15        3.15        3.25
   2       .65(.20)                  1.77        2.34        2.38
   3      1.60(.58)   1.38(.49)                  1.20        2.19
   4      1.35(.46)   1.53(.64)   1.01(.41)                  1.56
   5      1.22(.87)   1.36(.73)   1.60(.57)   1.37(.37)

Note: See note for Table 5.4.1. Numbers in parentheses are the CARE
of the overfit with respect to the weighted least squares test.
CHAPTER VI

SUMMARY AND SUGGESTIONS FOR FURTHER RESEARCH

We have considered in this study two measures of ARE which may be used when comparing two statistics with limiting chi-square distributions. The two quantities, the CARE and TARE, may be used when the chi-square distributions have different degrees of freedom. The CARE selects that test of the two competing tests whose power function has the greater generalized Gaussian curvature at the null point. On the other hand, the TARE selects the test whose power function has the greater average local power over the family of spheres. Both the CARE and TARE depend on the degrees of freedom of the tests, the significance level and the noncentrality parameters. The CARE and TARE computing formulae have been derived for the one-sample and the multi-sample polynomial growth curve problems. Particular attention has been paid to the formulae for the underfitting and overfitting problems. Numerical examples of the CARE and the TARE have also been computed to provide some idea of the efficiency results for some special cases.

Several areas of future work may be suggested. In the area of applications it would be useful to study the CARE and TARE for examples other than the ones chosen in Chapter V. In addition, the work of Chapters III and IV should be generalized to include more complicated designs. Hopefully, these results would reduce to the one-sample problem as the multi-sample problem did, so the same bounds on the TARE and CARE would apply, excepting the factor R. It seems reasonable that the noncentrality parameters for the more complicated models would factor as the noncentrality parameters for the multi-sample problem did.

In addition to the growth curve problem it would be desirable to study other applied problems. The results of Chapter II are in no way restricted to the study of growth curve models, but may be applied to virtually any situation where a reduction to a set of summary statistics is made and we want to determine the efficiency of the reduction. For example, if the course of a disease is characterized by a stochastic process and it is of interest to compare different groups of persons in this disease process, then certain summary statistics may define or describe the process. One way to compare the groups is to reduce each person's observations to the basic summary statistics by some suitable estimation procedure and then compare the summary statistics of the different groups. A natural question arises as to whether the reduction is a 'good' reduction to use. The basic approach used in this study would be helpful in this problem.

Another area of possible future work is to consider other efficiency criteria. From the results obtained in Chapter V for the underfitting problem, we see that it may be meaningful to define the TARE as the ratio of the average local powers where the average is taken over the family of ellipsoids defined by the noncentrality of the likelihood ratio test. In addition, the restriction to local alternatives makes the TARE a somewhat restricted measure of ARE.
REFERENCES

[1]  Bahadur, R. R. (1960). "Stochastic comparison of tests," Annals of Mathematical Statistics, 31, 276-295.

[2]  Blomqvist, N. (1950). "On a measure of dependence between two random variables," Annals of Mathematical Statistics, 21, 593-600.

[3]  Bickel, P. J. (1964). "On some alternative estimates for shift in the p-variate one-sample problem," Annals of Mathematical Statistics, 35, 1079-1090.

[4]  Bickel, P. J. (1965). "On some asymptotically nonparametric competitors of Hotelling's T²," Annals of Mathematical Statistics, 36, 160-173.

[5]  Gastwirth, J. L. and Wolff, S. S. (1968). "An elementary method for obtaining lower bounds on the asymptotic power of rank tests," Annals of Mathematical Statistics, 39, 2128-2131.

[6]  Graybill, F. A. (1961). An Introduction to Linear Statistical Models, Volume I, McGraw-Hill Book Company, Inc., New York.

[7]  Graybill, F. A. (1969). Introduction to Matrices with Applications in Statistics, Wadsworth Publishing Company, Inc., Belmont, California.

[8]  Grizzle, J. E. and Allen, D. (1969). "Analysis of growth and dose response curves," Biometrics, 25, 357-382.

[9]  Hodges, J. L., Jr. and Lehmann, E. L. (1956). "The efficiency of some nonparametric competitors of the t-test," Annals of Mathematical Statistics, 27, 324-335.

[10] Isaacson, S. L. (1951). "On the theory of unbiased tests of simple statistical hypotheses specifying the values of two or more parameters," Annals of Mathematical Statistics, 22, 217-234.

[11] Kendall, M. and Stuart, A. (1963). The Advanced Theory of Statistics, Volume II, Second Edition, Hafner Publishing Company, New York.

[12] Krishnaiah, P. R. (1966). Multivariate Analysis, Academic Press, New York.

[13] Lehmann, E. L. (1959). Testing Statistical Hypotheses, John Wiley and Sons, Inc., New York.

[14] Neyman, J. and Pearson, E. S. (1936). "Contributions to the theory of testing statistical hypotheses I. Unbiased critical regions of type A and type A₁," Statistical Research Memoirs, Volume I, 1-37.

[15] Neyman, J. and Pearson, E. S. (1938). "Contributions to the theory of testing statistical hypotheses," Statistical Research Memoirs, Volumes II and III, 25-57.

[16] Noether, G. E. (1955). "On a theorem of Pitman," Annals of Mathematical Statistics, 26, 64-68.

[17] Potthoff, R. F. and Roy, S. N. (1964). "A generalized multivariate analysis of variance model useful especially for growth curve problems," Biometrika, 51, 313-326.

[18] Puri, M. L. and Sen, P. K. (1971). Nonparametric Methods in Multivariate Analysis, John Wiley and Sons, Inc., New York.

[19] Sen, P. K. and Puri, M. L. (1970). "Asymptotic theory of likelihood ratio and rank order tests in some multivariate linear models," Annals of Mathematical Statistics, 41, 87-100.

[20] Wald, A. (1943). "Tests of statistical hypotheses concerning several parameters when the number of observations is large," Transactions of the American Mathematical Society, 54, 426-482.

[21] Weyl, H. (1939). "On the volume of tubes," American Journal of Mathematics, 61, 461-472.