BIOMATHEMATICS TRAINING PROGRAM

SOME ALTERNATIVE MEASURES OF ASYMPTOTIC RELATIVE
EFFICIENCY FOR THE MULTIPARAMETER TESTING PROBLEM WITH
APPLICATION TO THE GROWTH CURVE PROBLEM

By

Robert Francis Woolson

Department of Biostatistics
University of North Carolina at Chapel Hill, N. C.

Institute of Statistics Mimeo Series No. 903

JANUARY 1974
SOME ALTERNATIVE MEASURES OF ASYMPTOTIC RELATIVE EFFICIENCY
FOR THE MULTIPARAMETER TESTING PROBLEM WITH APPLICATION
TO THE GROWTH CURVE PROBLEM

by

Robert Francis Woolson

A dissertation submitted to the faculty
of the University of North Carolina at
Chapel Hill in partial fulfillment of
the requirements for the degree of
Doctor of Philosophy in the Department
of Biostatistics

Chapel Hill
1972

Approved by:

Adviser

Reader

Reader
ABSTRACT
WOOLSON, ROBERT FRANCIS. Some Alternative Measures of Asymptotic
Relative Efficiency for the Multiparameter Testing Problem with
Application to the Growth Curve Problem. (Under the direction of
PRANAB KUMAR SEN.)
Criteria for evaluating the loss in underfitting or overfitting
a growth curve model are proposed.
This problem has been formulated
as one of comparing two test statistic sequences which have limiting
chi-square distributions.
As the degrees of freedom of the two com-
petitive chi-square distributions may be different, alternative
measures to the standard Pitman asymptotic relative efficiency (ARE)
are considered.
Two such criteria are the trace asymptotic relative
efficiency (TARE) and the curvature asymptotic relative efficiency
(CARE).
Each of these quantities is shown to be the product of two
factors: the first factor reflecting the degrees of freedom of the
two tests and the common significance level, while the second factor
is a function of the two noncentrality parameters. This second
factor for the TARE is the ratio of the traces of the matrices in
the noncentrality parameters, while the second factor for the CARE
is the q-th root of the ratio of the determinants of the matrices in
the noncentrality parameters, where q is the number of parameters in
the common hypothesis of interest.
The CARE selects the test whose power is better in an average sense,
where the average is taken over the family of spheres.
The CARE and TARE are applied to the one-sample and multi-sample
growth curve problems.
Bounds for the TARE similar to the bounds
which exist for the CARE are derived.
It is shown that the c-sample
efficiencies are multiples of the one-sample results.
Numerical
illustrations of the TARE and the CARE are presented for specific
covariance matrices.
ACKNOWLEDGMENTS
The author appreciates the assistance of his advisor,
Dr. P. K. Sen, and of the members of his advisory committee, Drs. S. K.
Chatterjee, J. C. Cornoni, J. E. Grizzle, and D. E. Quade.
He wishes
to especially thank Dr. P. K. Sen for suggesting the topic of this
dissertation and for thoughtful discussion at any time during the
research.
Special thanks also go to Dr. James E. Grizzle for serving
as the author's course work advisor and for providing the author with
the invaluable experience of working on numerous consulting projects
with him.
The financial assistance of the Department of Biostatistics
during the course of the author's graduate studies is gratefully
acknowledged.
The author also wishes to thank his family for their encouragement during the course of this work.
In particular he wishes to thank
his wife, Linda, his son, Rob, and his mother, Mrs. Sallie Hagey.
The author also expresses sincere appreciation to Mrs. Gay Hinnant
for the typing of the dissertation.
TABLE OF CONTENTS

                                                                Page

ACKNOWLEDGMENTS  . . . . . . . . . . . . . . . . . . . . . . .    ii

LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . .     v

Chapter

I.   INTRODUCTION AND REVIEW OF THE LITERATURE . . . . . . . .     1

     1.1  Introduction . . . . . . . . . . . . . . . . . . . .     1
     1.2  Asymptotic Relative Efficiency . . . . . . . . . . .     2
     1.3  Review of Optimal Parametric Tests . . . . . . . . .     8
     1.4  Organization of the Study  . . . . . . . . . . . . .    11

II.  MEASURES OF ASYMPTOTIC RELATIVE EFFICIENCY FOR THE
     MULTIPARAMETER TESTING PROBLEM  . . . . . . . . . . . . .    14

     2.1  Introduction and Summary . . . . . . . . . . . . . .    14
     2.2  The Test Statistics and Their Limiting
          Distributions  . . . . . . . . . . . . . . . . . . .    16
     2.3  The Asymptotic Power Function of {Q_N} . . . . . . .    27
     2.4  Pitman ARE When t_1 = t_2  . . . . . . . . . . . . .    39
     2.5  Local Asymptotic Relative Efficiency (LARE)  . . . .    41
     2.6  Curvature Asymptotic Relative Efficiency (CARE)  . .    50
     2.7  Trace Asymptotic Relative Efficiency (TARE)  . . . .    57
     2.8  Bahadur Efficiency . . . . . . . . . . . . . . . . .    59

III. APPLICATION OF TARE TO THE ONE-SAMPLE GROWTH
     CURVE PROBLEM . . . . . . . . . . . . . . . . . . . . . .    67

     3.1  Introduction . . . . . . . . . . . . . . . . . . . .    67
     3.2  The Statistical Model  . . . . . . . . . . . . . . .    68
     3.3  Comments on Data Reduction . . . . . . . . . . . . .    71
     3.4  Parametric Procedures  . . . . . . . . . . . . . . .    73
     3.5  Nonparametric Procedures . . . . . . . . . . . . . .    81
     3.6  Comparison of Nonparametric to Parametric
          Procedure  . . . . . . . . . . . . . . . . . . . . .    94
     3.7  Covariance Adjustment  . . . . . . . . . . . . . . .   109
     3.8  Comments . . . . . . . . . . . . . . . . . . . . . .   118

IV.  APPLICATION OF TARE AND CARE TO THE MULTISAMPLE
     GROWTH CURVE PROBLEM  . . . . . . . . . . . . . . . . . .   120

     4.1  Introduction . . . . . . . . . . . . . . . . . . . .   120
     4.2  The Statistical Model and Data Reduction . . . . . .   120
     4.3  Parametric Procedures  . . . . . . . . . . . . . . .   122
     4.4  Nonparametric Procedures . . . . . . . . . . . . . .   130

V.   NUMERICAL ILLUSTRATIONS OF THE CARE AND TARE  . . . . . .   135

     5.1  Introduction . . . . . . . . . . . . . . . . . . . .   135
     5.2  Special Covariance Patterns  . . . . . . . . . . . .   136
     5.3  Underfitting . . . . . . . . . . . . . . . . . . . .   140
     5.4  Overfitting  . . . . . . . . . . . . . . . . . . . .   155

VI.  SUMMARY AND SUGGESTIONS FOR FURTHER RESEARCH  . . . . . .   159

REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . .   161
LIST OF TABLES

Table                                                           Page

2.5.1   Values of the Adjustment Factor, R(t_1, t_2, α),
        for α = .10  . . . . . . . . . . . . . . . . . . . . .    45

2.5.2   Values of the Adjustment Factor, R(t_1, t_2, α),
        for α = .05  . . . . . . . . . . . . . . . . . . . . .    46

2.5.3   Values of the Adjustment Factor, R(t_1, t_2, α),
        for α = .01  . . . . . . . . . . . . . . . . . . . . .    47

2.5.4   Values of the Adjustment Factor, R(t_1, t_2, α),
        for α = .005 . . . . . . . . . . . . . . . . . . . . .    48

2.5.5   Values of the Adjustment Factor, R(t_1, t_2, α),
        for α = .001 . . . . . . . . . . . . . . . . . . . . .    49

5.2.1   Values of TARE for Underfitting for Uniform
        Covariance Patterns  . . . . . . . . . . . . . . . . .   139

5.3.1   Ratio of Traces of Noncentrality Parameters for
        Underfitting: First Order Model ρ = .1, h = 5  . . . .   145

5.3.2   Ratio of Traces of Noncentrality Parameters for
        Underfitting: First Order Model ρ = -.1, h = 5 . . . .   146

5.3.3   Ratio of Traces of Noncentrality Parameters for
        Underfitting: First Order Model ρ = .5, h = 5  . . . .   147

5.3.4   Ratio of Traces of Noncentrality Parameters for
        Underfitting: First Order Model ρ = -.5, h = 5 . . . .   148

5.3.5   Ratio of Traces of Noncentrality Parameters for
        Underfitting: First Order Model ρ = .9, h = 5  . . . .   149

5.3.6   Ratio of Traces of Noncentrality Parameters for
        Underfitting: First Order Model ρ = -.9, h = 5 . . . .   150

5.3.7   Ratio of Traces of Noncentrality Parameters for
        Underfitting: First Order Model ρ = -.95, h = 5  . . .   151

5.3.8   Ratio of Traces of Noncentrality Parameters for
        Underfitting: First Order Model ρ = .95, h = 5 . . . .   152

5.3.9   Ratio of Traces of Noncentrality Parameters for
        Underfitting: First Order Model ρ = .1, h = 10 . . . .   153

5.3.10  Ratio of Traces of Noncentrality Parameters for
        Underfitting: First Order Model ρ = -.95, h = 10 . . .   154

5.4.1   Values of D and CARE for Overfitting ρ = -.5 . . . . .   157

5.4.2   Values of D and CARE for Overfitting ρ = .5  . . . . .   157

5.4.3   Values of D and CARE for Overfitting ρ = +.9 . . . . .   158

5.4.4   Values of D and CARE for Overfitting ρ = -.9 . . . . .   158
NOTATION

B             is a matrix of real elements.

|B|           is the determinant of B.

||A||         is the Euclidean norm of a vector A.

L(X)          is the law or distribution of a random vector X.

→ a.s., → p   denote convergence almost surely and in probability,
              respectively.

tr(B)         is the trace of the matrix B.

A ⊗ B         is the Kronecker or direct product of A and B.

δ_ij          is the Kronecker delta.

ch(B)         denotes the set of characteristic roots of B.

R^t           denotes Euclidean t-space.

N_t           denotes the t-dimensional normal distribution.

I             is the identity matrix.

J             is the matrix of ones.
CHAPTER I
INTRODUCTION AND REVIEW OF THE LITERATURE
1.1  Introduction
When two or more test procedures are available to test the
same hypothesis we are faced with the problem of deciding which one
to use.
While the null hypothesis may be parametric or nonparametric,
in most situations one is interested in a specific family of
alternative hypotheses. Hence we assume that this family is
parameterized by a parameter θ and, furthermore, that the value of θ
when the null hypothesis is true is denoted by θ_0. For this reason
we shall notationally represent the null hypothesis in this study as:

     H_0:  θ = θ_0 .
In this thesis the specific problem of comparing two test
statistic sequences which have limiting chi-square distributions with
possibly different degrees of freedom is to be studied. The main
purpose of this work is to suggest and justify some measures of
asymptotic relative efficiency (ARE) which may be used in comparing
the two test sequences. Obviously, the measures of efficiency
proposed should be such that, if the efficiency of test 2 relative to
that of test 1 is greater than one, then test 2 possesses a limiting
power function which in some sense is better than the limiting power
function of test 1. In addition, it is desirable for any measure of
ARE proposed to be free of arbitrary or unknown quantities and,
furthermore, it should take on a single numerical value when sampling
is performed from a completely specified distribution. Another
purpose of this work is to apply the proposed criteria to the
comparison of tests used for growth curve models. Of particular
interest is the loss in efficiency when we underfit or overfit the
true model. We shall see, however, that the efficiency criteria
proposed are applicable to a larger set of problems and are not
restricted to the study of growth curve problems alone.
For comparing two α-level tests, φ_1 and φ_2, a reasonable
measure of relative efficiency to use is the ratio N_1/N_2, where N_1
and N_2 are the minimum sample sizes for which φ_1 and φ_2 at level α
have power β against the alternative that θ = θ_a, where for
simplicity we take θ to be scalar. To study this ratio for all
values of (α, β, θ_a) is quite an involved study and, as an
alternative, we may consider asymptotic comparisons of the two tests.
One asymptotic approach is to fix θ_a, θ_a ≠ θ_0, and compare the
limiting powers of φ_1 and φ_2 as sample size increases without bound
for this fixed alternative. An obvious shortcoming of this approach
is that if φ_1 and φ_2 are both consistent tests then both of the
limiting powers are one, and this method of comparison does not
discriminate between the two tests. On the other hand, if the tests
are inconsistent they are of little interest. Bahadur (1960)
considers the comparison of the inverse ratio of the sample sizes in
the limit for the two tests to achieve the same significance level;
we shall see in Chapter II that this efficiency has serious
deficiencies for our problem.
Pitman (1948) [see e.g. Noether (1955)] proposed that rather than
considering a fixed alternative hypothesis, a sequence of alternative
hypotheses depending on the sample size, N, be chosen such that the
limit of this sequence approaches the null point and simultaneously
the power is bounded away from one. In the Pitman sense we consider
testing

     H_0:  θ = θ_0

against the sequence

     H_N:  θ_N = θ_0 + N^{-δ} λ ,

where λ is a fixed but arbitrary nonzero real number and δ > 0. The
Pitman procedure for obtaining a measure of ARE is to consider two
sequences of sample sizes, {N_1} and {N_2}, chosen such that the
limiting powers of the two tests through the sequence {θ_N} are the
same. Then according to Pitman we have:
Definition:  The asymptotic relative efficiency (ARE) of test φ_2
relative to test φ_1 is the limiting value of N_1/N_2, where N_1 is
the number of observations required by test φ_1 to equal the power
of test φ_2 based on N_2 observations, while simultaneously N_2 → ∞
and θ_N → θ_0.

If φ_1 and φ_2
are tests based on the statistics t_N1 and t_N2, respectively, and if
the limiting distributions of t_Ni through {θ_N} converge to the
normal distribution as N → ∞, then with suitable restrictions on the
limiting behavior of the first two moments of t_Ni we may find the
Pitman ARE quite easily. Denote the mean of t_Ni when θ = θ_N by
μ_Ni(θ_N) and its variance by σ²_Ni(θ_N). Consider the hypothesis
H_0: θ = θ_0 against the one-sided sequence H_N: θ_N = θ_0 + N^{-δ} λ
where λ > 0 (< 0), and suppose we have the following three conditions
in addition to the limiting normality of t_Ni:

(1.2.1)   dμ_Ni(θ_0)/dθ > 0   (< 0) ,

(1.2.2)   lim_{N→∞} [dμ_Ni(θ_N)/dθ] / [dμ_Ni(θ_0)/dθ] = 1 ,

(1.2.3)   lim_{N→∞} [dμ_Ni(θ_0)/dθ] / [N^δ σ_Ni(θ_0)] = c_i > 0   (< 0) ;

then the ARE of φ_2 relative to φ_1 can be seen to be [c_2/c_1]^{1/δ},
and in particular it is [c_2/c_1]² when δ = 1/2. The quantity c_i is
called the efficacy of test φ_i.
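For concreteness, the efficacy calculation above can be carried
through for a classical pair of tests. The sketch below is an
illustration not taken from this chapter: it evaluates the Pitman ARE
of the sign test relative to the mean test for a normal location
shift with δ = 1/2, using the standard efficacies of those two tests.

```python
import numpy as np

# Pitman ARE via efficacies, illustrated with the sign test versus the
# mean test for a shift of a N(0, sigma^2) distribution (a standard
# textbook example, not one from this dissertation).
sigma = 1.0
f0 = 1.0 / (sigma * np.sqrt(2.0 * np.pi))   # normal density at the median

# c_i = lim [d mu_Ni(theta_0)/d theta] / [N^delta sigma_Ni(theta_0)]:
# mean test: slope 1,    sd sigma/sqrt(N)    -> c_1 = 1/sigma
# sign test: slope f(0), sd 1/(2 sqrt(N))    -> c_2 = 2 f(0)
c_mean = 1.0 / sigma
c_sign = 2.0 * f0

delta = 0.5
are_sign_vs_mean = (c_sign / c_mean) ** (1.0 / delta)   # [c_2/c_1]^2
print(are_sign_vs_mean)   # 2/pi, about 0.637, for the normal case
```

The value 2/π is the familiar one: the sign test needs roughly 57%
more observations than the mean test to match its power under
normality.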
Noether (1955) extended Pitman's definition of ARE to the case
where the first (M_i − 1) derivatives of μ_Ni(θ) at θ_0 are zero and
the M_i-th derivative is nonzero. Requiring M_1 = M_2 = M, he found
that the ARE of φ_2 relative to φ_1 takes the same form with the
efficacy now given by

     c_i = lim_{N→∞} [d^M μ_Ni(θ_0)/dθ^M] / [N^{Mδ} σ_Ni(θ_0)] .

Although not explicitly pointed out in Noether's article, it should
be observed that if one does not require M_1 = M_2 then the ARE is
indeterminate, depending as it does on the unknown λ_1 (or λ_2)
raised to a power of (M_2 − M_1). This point causes no practical
limitation since if M_1 ≠ M_2 then we are comparing tests which
behave quite differently.
For the one-sided alternative hypothesis, Blomqvist (1950)
proposed an asymptotic local relative efficiency defined as the
limit of the ratio of the sample sizes chosen so that the two
limiting power functions have the same slope at θ_0. Under the
conditions M = 1 and δ = 1/2, Noether (1955) established the
equivalence of this definition and the Pitman definition. Kendall
and Stuart (1963) discuss other cases and show, for example, for the
two-sided test with M = 1 that under mild restrictions the ARE is
equal to the limiting ratio of the second derivatives of the power
functions at θ_0.
It has been observed [see e.g. Puri & Sen (1971)] that the
theory for comparing the two limiting distributions does not require
asymptotic normality. It is sufficient that the two power functions
can be made analytically the same by an appropriate choice of the
sample sizes. They present the requisite theory for the comparison
of two test statistics which have noncentral chi-square
distributions, both with the same degrees of freedom, p. In this
case, since the noncentral chi-square is a monotonically increasing
function of its noncentrality parameter, they found that when M = 1
and δ = 1/2 the ARE is simply the ratio of the two respective
noncentrality parameters. They point out that this definition is
entirely equivalent to the Pitman-Noether definition for comparison
of tests based on one degree of freedom.
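The monotonicity just cited is easy to check numerically. The sketch
below (an illustration, not part of the original text) simulates
noncentral chi-square statistics with p = 2 degrees of freedom as
sums of squared shifted normals; the critical point −2 log α is exact
here because a central chi-square variate on 2 degrees of freedom is
exponential with mean 2.

```python
import numpy as np

rng = np.random.default_rng(0)
p, alpha = 2, 0.05
crit = -2.0 * np.log(alpha)   # upper alpha point of a central chi-square(2)

def power(nc, n=200_000):
    """Monte Carlo power of the chi-square(p) test at noncentrality nc."""
    shift = np.zeros(p)
    shift[0] = np.sqrt(nc)                   # ||shift||^2 = nc
    z = rng.standard_normal((n, p)) + shift  # noncentral chi-square draws
    q = (z ** 2).sum(axis=1)
    return (q > crit).mean()

p1, p4 = power(1.0), power(4.0)
print(p1, p4)   # the power increases with the noncentrality parameter
```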
Turning now to the multiparameter testing problem we consider

     H_0:  θ = θ_0   (q×1)

and the sequence of alternative hypotheses

     H_N:  θ_N = θ_0 + N^{-δ} λ ,

where λ is a fixed but arbitrary non-null vector and δ > 0. Puri and
Sen study the case p = q in their text and, when M = 1, δ = 1/2,
observe that the Pitman-Noether efficiency is the ratio of two
positive definite quadratic forms in λ. Clearly, in this case, there
is no unique answer regarding the ARE since it depends, in general,
on the arbitrary vector λ. By application of a theorem due to
Courant on the extrema of the ratio of two positive definite
quadratic forms [Puri & Sen (1971, p. 122)], bounds may be placed on
the ARE over all non-null λ. This is, in fact, done for a number of
cases in their text. The approach of placing bounds on the ARE
provides some information on the ARE. However, some tests may be
placed in a misleading position in the spectrum of tests for a given
problem, since the minimum bound on the ARE could be quite near zero
while the typical or average performance of the test may be better
than that of its competitors. To develop appropriate average
measures free from λ is one of the goals of this investigation.
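The Courant bounds can be illustrated numerically. In this sketch
the two symmetric positive definite matrices A and B are arbitrary,
hypothetical stand-ins for the matrices of the two quadratic forms;
the extreme characteristic roots of B⁻¹A bound the ratio
λ'Aλ / λ'Bλ for every non-null λ.

```python
import numpy as np

rng = np.random.default_rng(1)
q = 4
# Hypothetical symmetric positive definite matrices standing in for
# the matrices of the two quadratic forms; any such pair would do.
X, Y = rng.standard_normal((q, q)), rng.standard_normal((q, q))
A, B = X @ X.T + q * np.eye(q), Y @ Y.T + q * np.eye(q)

# Courant: the extreme characteristic roots of B^{-1} A bound
# lambda' A lambda / lambda' B lambda over all non-null lambda.
roots = np.linalg.eigvals(np.linalg.solve(B, A)).real
lo, hi = roots.min(), roots.max()

lam = rng.standard_normal(q)
ratio = (lam @ A @ lam) / (lam @ B @ lam)
print(lo <= ratio <= hi)   # True for every non-null lambda
```

The gap between lo and hi is exactly the indeterminacy described in
the text: the ratio can sit anywhere in that interval depending on
the direction of λ.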
In our study we shall consider a Pitman type of sequence of
alternative hypotheses; however, a comparison in the sense of
Bahadur (1960) is a possibility. Hence, we shall summarize the
necessary points for the Bahadur criterion of efficiency.
In the Bahadur definition of efficiency it is assumed that we
have a family of probability measures parameterized by θ ∈ Ω,
defined on the same probability space. The parameter set is denoted
by Ω and is partitioned into Ω_0 and Ω − Ω_0, where Ω_0 is the
single point set containing θ_0. Let us suppose that the two
competing test statistic sequences are {T_N1} and {T_N2}.

Definition:  The sequence of test statistics {T_Ni} is called a
standard sequence for testing H_0 if, and only if, {T_Ni} satisfies
the following conditions:

     I.   there exists a continuous distribution function F_i(x)
          such that lim_{N→∞} P_{θ_0}[T_Ni ≤ x] = F_i(x) for all
          real x,

     II.  there exists a constant a_i ∈ (0, ∞) such that
          −log[1 − F_i(x)] = (a_i x²/2)[1 + o(1)] as x → ∞,

     III. there exists a function b_i(θ) from Ω − Ω_0 to the
          positive real line such that for every ε > 0,
          lim_{N→∞} P_θ[ ∪_{K>N} {|K^{-1/2} T_Ki − b_i(θ)| > ε} ] = 0
          for all θ ∈ Ω − Ω_0.

If {T_Ni} is a standard sequence for testing H_0 then the quantity
a_i b_i²(θ) is called the asymptotic slope of the sequence {T_Ni}.
With this in mind, if both {T_N1} and {T_N2} are standard sequences
for testing H_0, then the Bahadur efficiency of test 2 relative to
test 1 is a_2 b_2²(θ) / a_1 b_1²(θ) for all θ ∈ Ω − Ω_0. It can be
shown that the Bahadur efficiency is the limit of the inverse ratio
of the sample sizes needed for the two tests to have the same level
of significance in large samples.
In the later chapters we will show that under mild
assumptions the Bahadur efficiency of φ_2 relative to φ_1 is

     η_2(θ)' Γ_2^{-1}(θ) η_2(θ) / η_1(θ)' Γ_1^{-1}(θ) η_1(θ)

for θ ∈ Ω − Ω_0, where both η_i(θ) and Γ_i(θ) depend on the
alternative value of θ. The Bahadur efficiency will also be shown to
be insensitive to differences in degrees of freedom of the two tests.
The measures of ARE we shall propose will be defined using
various notions of test optimality. Hence, we shall review the
relevant areas of the theory of hypothesis testing in the following
section.
1.3  Review of Optimal Parametric Tests
Uniformly most powerful tests (or critical regions) are known
to exist in only the rarest situations.
In order to derive tests with
uniformly most powerful properties when restricting attention to a
subset of all available tests, Neyman and Pearson (1936) generalized
their fundamental lemma (for testing a simple hypothesis against a
simple alternative) to power functions subject to more than one side
condition.
This generalized lemma is stated and proven in several
places (e.g. Lehmann (1959)).
For testing the hypothesis H_0: θ = θ_0 against the two-sided
alternative H_a: θ ≠ θ_0, where θ is a one-dimensional parameter,
Neyman & Pearson (1936) propose the type A critical region, which
may be described as the locally best unbiased critical region. This
test is the one which maximizes the second derivative of the power
function evaluated at θ_0, subject to size and unbiasedness
restrictions on the power function. For the two-parameter testing
problem, Neyman & Pearson (1938) proposed type C critical regions;
these regions can be constructed if one knows the relative
importance locally of type II errors, since this region is defined
to be the one with best local power along a given family of
concentric ellipses with the same shape and direction of principal
axes. If the family of ellipses are concentric circles then we say
the type C region is regular; otherwise it is said to be nonregular.
Two main objections to type C regions can be raised: in the first
place, one may not be able to state the relative importance of type
II errors and, secondly, regular regions are not invariant under
one-to-one, twice differentiable transformations of the parameter
space. The last point simply means that regular regions can become
nonregular regions even under some elementary transformations of the
parameter space.
To overcome these problems Isaacson (1951) proposed a type D
critical region which does not require knowledge of the relative
importance of type II errors. To motivate the type D test Isaacson
observes that the type A power function satisfies an attractive
geometrical property. Namely, if one considers a horizontal chord
drawn at a fixed infinitesimal distance above θ_0, then the length
of this chord for the type A power function is a minimum when
compared to the length of the chord for any other of the power
functions satisfying the stated conditions of size and unbiasedness.
A type D region can be defined in the q-parameter testing
problem as that region which maximizes the generalized Gaussian
curvature of the power function at θ_0 subject to size and
unbiasedness conditions. In order to be more specific, let us
denote by β(θ|w) the power function of a test with critical region
w, let β^(i)(θ_0|w) denote the first partial derivative of β(θ|w)
with respect to θ_i at θ_0, i = 1,2,…,q, and let β^(i,j)(θ_0|w)
denote the second partial derivative of β(θ|w) with respect to
θ_i, θ_j evaluated at θ = θ_0, i, j = 1,2,…,q.
Letting the determinant of a matrix A be denoted by |A|, the
following is Isaacson's definition of a type D region for testing
H_0: θ = θ_0:

Definition:  A region w_0 is said to be an unbiased critical region
of type D for testing H_0 if:

     a.  β(θ_0|w_0) = α;
     b.  β^(i)(θ_0|w_0) = 0, i = 1,2,…,q;
     c.  ((β^(i,j)(θ_0|w_0))) is positive definite;
     d.  |((β^(i,j)(θ_0|w_0)))| ≥ |((β^(i,j)(θ_0|w)))| for any
         other region w satisfying conditions a, b and c.
In the two-dimensional case it follows that the type D region
minimizes the area of an ellipse at an infinitesimal distance above
the point θ_0, and in the general q-parameter case the type D region
would be the one whose power function minimizes the volume of a
certain ellipsoid at a given cross-section of the power function.
The type D region is then a generalization of the type A region to
more than one dimension. Type D regions are characterized by
Isaacson by use of the generalized Neyman-Pearson lemma. We know
the generalized Gaussian curvature of β(θ|w) at θ_0 is:

Definition:  The generalized Gaussian curvature of β(θ|w) at θ_0 is

     K = |((β^(i,j)(θ_0|w)))| / (1 + Σ_{j=1}^{q} [β^(j)(θ_0|w)]²)^{(q+2)/2} .

Thus, if a test is unbiased then K = |((β^(i,j)(θ_0|w)))|. If we
compare two critical regions (tests) then the test with the larger K
has a power function which encloses an ellipsoid of smaller volume
than the other power function along any of a family of infinitesimal
contours. This presumes, of course, that both of the
((β^(i,j)(θ_0|w)))'s are positive definite, since a lack of
definiteness of this matrix would mean that the intersection of the
power function with the fixed hyperplane would not be an ellipsoid.
Wald (1943) defines a critical region w_0 to have uniformly best
average power with respect to a family of surfaces, K_c, and a
weight function, g(θ), if for any other region w the surface
integral of the power function of w_0 times g(θ) over K_c is greater
than the surface integral of the power function of w times g(θ) over
the surface K_c. If we define the surface as the unit sphere, i.e.
||λ|| = 1, then we can show that the surface integral of a quadratic
form λ' B λ over ||λ|| = 1 is proportional to the trace of the
matrix B. That is,

     ∫_{||λ||=1} λ' B λ dλ = k tr(B) .

Using this fact, it will be possible to deduce that of two tests the
one with a larger trace of ((β^(i,j)(θ_0|w))) has greater average
power locally over the family of spheres than the other.
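The proportionality ∫ λ' B λ dλ = k tr(B) over the unit sphere can be
checked by Monte Carlo: for λ uniform on the unit sphere in R^q, the
average of λ' B λ is tr(B)/q. The sketch below uses an arbitrary,
hypothetical symmetric matrix B.

```python
import numpy as np

rng = np.random.default_rng(2)
q = 3
X = rng.standard_normal((q, q))
B = X @ X.T                    # an arbitrary symmetric matrix (hypothetical)

# Sample lambda uniformly on the unit sphere ||lambda|| = 1 ...
lam = rng.standard_normal((500_000, q))
lam /= np.linalg.norm(lam, axis=1, keepdims=True)

# ... and average the quadratic form; the surface average is tr(B)/q.
avg = np.einsum('ni,ij,nj->n', lam, B, lam).mean()
print(avg, np.trace(B) / q)    # the two values agree
```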
1.4  Organization of the Study

Since many of the standard parametric and nonparametric
multivariate test procedures are based on test statistics which are
quadratic forms, we shall consider in the general development in
Chapter II two sequences of statistics which are quadratic forms in
two sequences of random vectors. When the null hypothesis, H_0:
θ = θ_0 (q×1), is true, it is assumed that each of the sequences of
test statistics has a limiting central chi-square distribution; one
with t_1 degrees of freedom, the other with t_2 degrees of freedom.
Sufficient conditions are presented under which we may compute the
limiting power of our statistics through the sequence of alternative
hypotheses. Simplifications of the power functions are obtained in
terms of the parameters.
Three new definitions of ARE are proposed in Chapter II.
These are: a) local asymptotic relative efficiency (LARE), b)
curvature asymptotic relative efficiency (CARE) and c) trace
asymptotic relative efficiency (TARE). The precise definitions are
presented in Chapter II. All three criteria depend on the level of
significance of the tests, α, and the degrees of freedom, t_1 and
t_2. Tabulations in Chapter II show the dependence on α is slight
while the dependence on t_2 − t_1 is strong. In addition, all three
criteria depend on the noncentrality parameters, and the latter two
criteria are shown to be "average" measures of efficiency,
independent of the direction of approach of θ_N to θ_0. The second
criterion of ARE is a function of the ratio of the determinants of
the noncentrality parameters, while the trace criterion is a
function of the ratio of the traces of the noncentrality parameters.
There is, it seems, an interesting connection between these criteria
and type D and type E optimality in the field of experimental design.
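In terms of the matrices A_1 and A_2 appearing in the two
noncentrality parameters, the second factors just described can be
sketched as follows; the matrices here are hypothetical, and only
the trace-ratio and determinant-ratio formulas come from the text.

```python
import numpy as np

def tare_factor(A2, A1):
    """Second factor of the TARE: ratio of traces."""
    return np.trace(A2) / np.trace(A1)

def care_factor(A2, A1):
    """Second factor of the CARE: q-th root of the ratio of determinants."""
    q = A1.shape[0]
    return (np.linalg.det(A2) / np.linalg.det(A1)) ** (1.0 / q)

# Hypothetical 3 x 3 noncentrality-parameter matrices for the two tests.
A1 = np.diag([1.0, 1.0, 1.0])
A2 = np.diag([4.0, 1.0, 0.25])
print(tare_factor(A2, A1))   # 1.75: arithmetic-mean comparison of the roots
print(care_factor(A2, A1))   # 1.0:  geometric-mean comparison of the roots
```

With these diagonal matrices the trace ratio exceeds one while the
determinant ratio equals one, illustrating that the two "average"
criteria can rank a pair of tests differently.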
Chapter II is concluded with a brief study of the Bahadur
efficiency and its relationship to the LARE in a limiting sense.

Chapters III and IV are applications of the measures of ARE to
the one-sample and multisample growth curve problems, respectively.
Polynomial models are studied and the efficiency results are
presented for the cases of underfitting and overfitting the correct
model. These efficiency results are presented for the least squares
procedures and the rank scores tests. The least squares and
nonparametric procedures are also compared using the curvature and
trace criteria. Bounds using the trace criterion are obtained
similar to those known for the curvature criterion. The use of the
higher order polynomial terms as covariables is also studied and the
ARE is evaluated in this case. Chapter V contains numerical
computations of the results obtained in Chapters III and IV for some
special cases.
CHAPTER II
MEASURES OF ASYMPTOTIC RELATIVE EFFICIENCY FOR THE
MULTIPARAMETER TESTING PROBLEM
2.1  Introduction and Summary

The purpose of this chapter is to propose and study several
competitive measures of asymptotic relative efficiency (ARE) for the
multiparameter testing problem. We shall assume throughout that we
have two sequences of test statistics available for testing the same
hypothesis. It is customary in both parametric and nonparametric
inference to consider a specified type of alternative hypothesis,
e.g. translation alternatives, scale alternatives, etc. We label
this family of alternatives by a parameter θ and we let θ_0 be the
value of θ when the hypothesis we wish to test is true. In the text
of this chapter we shall occasionally refer to the null hypothesis
as H_0: θ = θ_0; however, we should keep in mind that the hypothesis
may in fact be much more general. To compare the two test statistic
sequences in large samples we shall consider a Pitman type sequence
of alternative hypotheses, and we shall present in the form of a
theorem (Theorem 2.2.1) sufficient conditions under which these test
statistics have limiting chi-square distributions through the
sequence of alternative hypotheses. Under the assumptions of
Theorem 2.2.1 we shall study in detail the limiting power functions
of the statistics and obtain several simplifications. To derive
these results we shall
find it useful to prove some results for homogeneous polynomials in
general and then apply these results to the expanded power functions.

In order to compare the two test statistics we consider a common
sequence of alternative hypotheses and derive the Pitman ARE of test
2 to test 1 when they both have limiting chi-square distributions
with equal degrees of freedom. When the degrees of freedom are
unequal we propose a local asymptotic relative efficiency (LARE) for
the comparison of the two tests. This LARE is found to be equal to
a scalar function, R, multiplied by the ratio of the noncentrality
parameters of the limiting power functions. The function, R, has as
its arguments the common significance level of the tests, α, and the
two respective degrees of freedom of the tests. In the general
setting θ is a vector of q parameters and the i-th test statistic
has a limiting power function with t_i degrees of freedom for
i = 1, 2. The power depends also on a positive integer, M (defined
by the conditions of Theorem 2.2.1). In considering other measures
of ARE we briefly discuss the problems when M > 1 and then restrict
ourselves to the case M = 1. When at least one t_i ≥ q for i = 1 or
2 we define a measure of ARE based on the generalized Gaussian
curvature of the limiting power functions at the null value of θ.
We shall show that this criterion of ARE produces a quantity which
is independent of the sequence of alternative hypotheses and,
furthermore, when t_1 = t_2 this criterion reduces to the ratio of
the geometric means of the characteristic roots of the matrices in
the noncentrality parameters. When t_1 < q and t_2 < q, the
generalized Gaussian curvatures of the power functions are zero at
θ_0 and this method of comparison is useless. In this situation we
propose a
2.2  The Test Statistics and Their Limiting Distributions

For testing a specified null hypothesis against a parametric
family of alternatives we have two tests of size α available, φ_1
and φ_2. The family of alternative hypotheses is parameterized by
the vector θ, which has q elements. When the null hypothesis is
true the value of θ is denoted by θ_0; for this reason we sometimes
denote the null hypothesis as:

(2.2.1)        H_0:  θ = θ_0   (q×1)

and the alternative by

               H_a:  θ ≠ θ_0 .

The tests, φ_1 and φ_2, are functions of the statistics {Q_N^(1)}
and {Q_N^(2)}, respectively. These statistic sequences are further
assumed to be quadratic forms in random vector sequences {T_N^(1)}
and {T_N^(2)}, respectively. Each vector T_N^(i) is composed of t_i
elements and the test statistics are written as:
(2.2.2)   Q_N^(i) = N(T_N^(i) − μ_N^(i)(θ_0))' [Σ̂_N^(i)]^{-1} (T_N^(i) − μ_N^(i)(θ_0)) ;   i = 1, 2 ,

where √N μ_N^(i)(θ_0) is the mean vector of √N T_N^(i) when θ = θ_0
and Σ̂_N^(i) is some consistent estimator of the covariance matrix
of √N T_N^(i). The tests would reject the null hypothesis for large
Q_N^(i) and accept the null hypothesis for small Q_N^(i). We may
assume that the discriminant of Q_N^(i) is of full rank, since if it
is not, it is always possible to express Q_N^(i) as a quadratic form
in fewer variables whose discriminant would be nonsingular.
To illustrate this point more clearly let us consider the
multivariate multisample location problem. One statistic in common
use is the Hotelling-Lawley trace, which is defined as:

(2.2.3)   T_N² = Σ_{K=1}^{c} n_K Σ_{i=1}^{p} Σ_{j=1}^{p} s^{ij} (x̄_{Ki} − x̄_i)(x̄_{Kj} − x̄_j) ,
          N = Σ_{K=1}^{c} n_K ,

where s^{ij} denotes the (i,j)-th element of the inverse of the
sample covariance matrix S.
T_N² is used to test the null hypothesis:

(2.2.4)        H_0:  μ_1 = μ_2 = … = μ_c   (each p×1) ,

where μ_K is the location vector of an observation from population
K. We can rewrite μ_K as μ_K = μ + γ_K, where Σ_{K=1}^{c} n_K γ_K = 0.
With this restriction the hypothesis (2.2.4) can be written
equivalently as

(2.2.5)        γ_1 = γ_2 = … = γ_{c-1} = 0 .

If X_{K1}, …, X_{Kn_K} are independent and identically distributed
p-variate normal vectors with mean μ_K and covariance matrix Σ,
symmetric and positive definite, then x̄_K = n_K^{-1} Σ_{i=1}^{n_K} X_{Ki}
is distributed N_p(μ_K, n_K^{-1} Σ), and x̄ = N^{-1} Σ_{K=1}^{c} n_K x̄_K
is distributed N_p(Σ_{K=1}^{c} (n_K/N) μ_K, N^{-1} Σ).
As a result we see that for each K = 1 t 2 t ••• t c we have
The covariance of
- = -
=
-1
<~-~)<~K,-~)'is- N
~
and consequently the joint
19
distribution of [(~-~); K
mean
[(~K-~)'
(2.2.6)
= 1,2, ••• ,c]
is pc-variate normal with
e·
K = 1,2, •.• ,c] and covariance matrix defined by:
I:
@
[~Kq ~
_
]
KK,q
c
with the restriction
E nK(~-~)
K=I . .
= ..0,
= 1,2, •.• ,c
we see that the rank of this
distribution is p(c-I), which means the quadratic form in (2.2.3),
if viewed as a quadratic form in [(~-~); K
criminant of rank p(c-l).
= 1,2, ••• ,c],
has a dis-
On the other hand if we consider the
p (c-I) vector;
[(~-~); K
(2.2.7)
= 1,2, .•• ,c-l],
this is p(c-l) variate normal with mean
and covariance matrix given by
(2.2.8)
K,q
= 1,2, .•• ,c-l
The inverse of (2.2.8) is
(2.2.9)
1,2, .•• ,c-l
Since we have sampled from normal populations we know from the strong law of large numbers that S = ((s_{ij})) \to \Sigma almost surely as N \to \infty. So a consistent estimate of (2.2.9) is provided when we substitute S^{-1} in place of \Sigma^{-1} in (2.2.9).
With some algebra, and using the restriction \sum_{K=1}^{c} n_K(\bar{x}_K - \bar{x}) = 0, we see that T_N^2, defined by (2.2.3), can be written equivalently as:

(2.2.10)    T_N^2 = [\sqrt{N}(\bar{x}_K - \bar{x}),\ K = 1,2,\ldots,c-1]'\,\Bigl(S \otimes \Bigl[\frac{N\delta_{KK'}}{n_K} - 1\Bigr]\Bigr)^{-1}\,[\sqrt{N}(\bar{x}_K - \bar{x}),\ K = 1,2,\ldots,c-1].

But (2.2.10) is a quadratic form whose discriminant is of full rank, so we see that quadratic forms can be written in a "full rank" form. In the theory presented in this chapter we may therefore consider, without loss of generality, that the quadratic forms which we use as test statistics have nonsingular distributions.
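This reduction can be carried out mechanically: diagonalize the (possibly singular) discriminant and keep only the coordinates belonging to nonzero roots. A numpy sketch with an illustrative rank-2 discriminant (our matrices, not the thesis's example):

```python
import numpy as np

rng = np.random.default_rng(0)

# A rank-deficient discriminant: A = B B' with B of size 4 x 2, so rank(A) = 2.
B = rng.standard_normal((4, 2))
A = B @ B.T
x = rng.standard_normal(4)

q_full = x @ A @ x                       # quadratic form with singular discriminant

# Spectral decomposition; keep only eigenvectors with nonzero eigenvalues.
vals, vecs = np.linalg.eigh(A)
keep = vals > 1e-10
y = vecs[:, keep].T @ x                  # new variables, one per nonzero root
q_reduced = y @ np.diag(vals[keep]) @ y  # full-rank quadratic form in fewer variables

assert np.isclose(q_full, q_reduced)
assert int(keep.sum()) == 2              # the nonsingular form has rank 2
```

The same computation applied to the pc-dimensional form is what passing from (2.2.3) to (2.2.10) accomplishes analytically.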
We now present a
theorem which allows us to compute the limiting power of our statistics
through a sequence of alternative hypotheses.
Theorem 2.2.1. For testing a specified hypothesis H_0 against a sequence of alternative hypotheses H_N, defined by:

    H_N: \theta_N = \theta_0 + N^{-\delta}\lambda    (\lambda\ q \times 1),

where \delta > 0, \lambda is a fixed non-null vector and \theta_0 is the value of \theta when H_0 is true, suppose we have a t-vector T_N such that N^{M\delta}T_N has mean N^{M\delta}\mu_N(\theta) and positive definite covariance matrix \Sigma(\theta). Suppose the following five conditions are true:

(2.2.11)    the partial derivatives of order r of \mu_{Nj}(\theta) vanish at \theta_0 for r = 1,2,\ldots,M-1 and j = 1,2,\ldots,t, for some M \ge 1, while an M-th order partial derivative is nonzero for at least one j \in \{1,2,\ldots,t\};

(2.2.12)    (a) the characteristic roots of \hat\Sigma_N\,\Sigma^{-1}(\theta_N) converge stochastically to one as N \to \infty, and (b) \Sigma^{-1}(\theta_N) \to \Sigma^{-1}(\theta_0) as N \to \infty;

(2.2.13)    the M-th partial derivatives of \mu_{Nj}(\theta) are continuous at \theta_0, j = 1,2,\ldots,t;

(2.2.14)    for each j = 1,2,\ldots,t the limits

    d_{j,\ell_1,\ldots,\ell_M}^{(M)} = \lim_{N\to\infty}\frac{\partial^M\mu_{Nj}(\theta)}{\partial\theta_{\ell_1}\cdots\partial\theta_{\ell_M}}\Big|_{\theta_0}

exist;

(2.2.15)    \mathcal{L}[N^{M\delta}(T_N - \mu_N(\theta_N))] \to N_t(0, \Sigma(\theta_0)) as N \to \infty uniformly in \theta, and the distribution is non-degenerate.

Then

(2.2.16)    \mathcal{L}[N^{2M\delta}(T_N - \mu_N(\theta_0))'\,\hat\Sigma_N^{-1}\,(T_N - \mu_N(\theta_0))] \to \chi^2(t, \Delta),

where \chi^2(t,\Delta) is a noncentral chi-square with t degrees of freedom and noncentrality parameter

(2.2.17)    \Delta = \frac{C'\,\Sigma^{-1}(\theta_0)\,C}{(M!)^2},

C being the t-vector with elements c_j = \sum_{\ell_1=1}^{q}\cdots\sum_{\ell_M=1}^{q}\lambda_{\ell_1}\cdots\lambda_{\ell_M}\,d_{j,\ell_1,\ldots,\ell_M}^{(M)}.
Proof: For each j = 1,2,\ldots,t, by (2.2.11) and (2.2.13) we can write, by Taylor's theorem and the mean-value theorem:

    \mu_{Nj}(\theta_N) - \mu_{Nj}(\theta_0) = \frac{N^{-M\delta}}{M!}\sum_{\ell_1=1}^{q}\cdots\sum_{\ell_M=1}^{q}\lambda_{\ell_1}\cdots\lambda_{\ell_M}\,\frac{\partial^M\mu_{Nj}(\theta)}{\partial\theta_{\ell_1}\cdots\partial\theta_{\ell_M}}\Big|_{\theta=\theta_N^*},

where \theta_N^* = \theta_0 + h N^{-\delta}\lambda, h \in (0,1); therefore

    N^{M\delta}[\mu_{Nj}(\theta_N) - \mu_{Nj}(\theta_0)] = \frac{1}{M!}\sum_{\ell_1}\cdots\sum_{\ell_M}\lambda_{\ell_1}\cdots\lambda_{\ell_M}\,d_{j,\ell_1,\ldots,\ell_M}^{(M)} + \varepsilon_{Nj}^*(\lambda,\theta_0),

where \varepsilon_{Nj}^*(\lambda,\theta_0) collects the remainder. But \varepsilon_{Nj}^*(\lambda,\theta_0) = o(1) as N \to \infty, because \theta_N^* \to \theta_0 as N \to \infty; hence by (2.2.13) \varepsilon_{Nj}^* = o(1). So

(2.2.18)    N^{M\delta}[\mu_N(\theta_N) - \mu_N(\theta_0)] \to \frac{1}{M!}\,C \quad\text{as } N \to \infty.
Let us define Q_N and Q_N^* by the following:

    Q_N = N^{2M\delta}(T_N - \mu_N(\theta_0))'\,\hat\Sigma_N^{-1}\,(T_N - \mu_N(\theta_0)),

    Q_N^* = N^{2M\delta}(T_N - \mu_N(\theta_0))'\,\Sigma^{-1}(\theta_N)\,(T_N - \mu_N(\theta_0)).

We first shall show that \mathcal{L}(Q_N^*) \to \chi^2(t,\Delta) as N \to \infty; it would then follow that \mathcal{L}(Q_N) \to \chi^2(t,\Delta) as N \to \infty, since, as we shall show, |Q_N - Q_N^*| converges to zero in probability.

Now

    N^{M\delta}(T_N - \mu_N(\theta_0)) = N^{M\delta}(T_N - \mu_N(\theta_N)) + N^{M\delta}(\mu_N(\theta_N) - \mu_N(\theta_0)),

and by (2.2.15) and (2.2.18) we have as N \to \infty

(2.2.19)    \mathcal{L}[N^{M\delta}(T_N - \mu_N(\theta_0))] \to N_t\bigl(\tfrac{1}{M!}C,\ \Sigma(\theta_0)\bigr).

By (2.2.15), and since N^{2M\delta}(T_N - \mu_N(\theta_N))'\,\Sigma^{-1}(\theta_0)\,(T_N - \mu_N(\theta_N)) is a continuous function of N^{M\delta}(T_N - \mu_N(\theta_N)), we have that

    \mathcal{L}[N^{2M\delta}(T_N - \mu_N(\theta_N))'\,\Sigma^{-1}(\theta_0)\,(T_N - \mu_N(\theta_N))] \to \chi^2(t) \quad\text{as } N \to \infty;

but by (2.2.12b) and the fact that \theta_N \to \theta_0 as N \to \infty, we also have that

    \mathcal{L}[N^{2M\delta}(T_N - \mu_N(\theta_N))'\,\Sigma^{-1}(\theta_N)\,(T_N - \mu_N(\theta_N))] \to \chi^2(t) \quad\text{as } N \to \infty.

Let

    P_N = N^{2M\delta}(T_N - \mu_N(\theta_N))'\,\Sigma^{-1}(\theta_N)\,(T_N - \mu_N(\theta_N));

by (2.2.19) we can write Q_N^* as

    Q_N^* = P_N + \Bigl[\frac{1}{M!}\Bigl(\sum_{\ell=1}^{q}\lambda_\ell\frac{\partial}{\partial\theta_\ell}\Bigr)^M\mu_N(\theta_0)\Bigr]'\,\Sigma^{-1}(\theta_N)\,\Bigl[\frac{1}{M!}\Bigl(\sum_{\ell=1}^{q}\lambda_\ell\frac{\partial}{\partial\theta_\ell}\Bigr)^M\mu_N(\theta_0)\Bigr] + o_p(1).

By (2.2.12b) we can replace \Sigma^{-1}(\theta_N) by \Sigma^{-1}(\theta_0) + \varepsilon, where \varepsilon = o(1) as N \to \infty; hence by (2.2.14) the second term becomes \Delta as N \to \infty, and therefore

    \mathcal{L}(Q_N^*) \to \chi^2(t, \Delta).
To show the asymptotic equivalence of Q_N and Q_N^* we first note that Q_N^* is bounded in probability, since it has a limiting \chi^2 distribution; to be more explicit, for any \varepsilon > 0 there exists a K(N,\varepsilon), depending on N and \varepsilon, such that

    P[Q_N^* \le K(N,\varepsilon)] \ge 1 - \varepsilon \quad\text{for every } N > N_\varepsilon.

We denote this property by writing Q_N^* = O_p(1) as N \to \infty.

Consider |Q_N - Q_N^*|. There exists an N_0 such that for all N > N_0 we have Q_N^* > 0. So we may write for N > N_0

    |Q_N - Q_N^*| = \Bigl|\frac{Q_N}{Q_N^*} - 1\Bigr|\,Q_N^*.

Now by Courant's theorem on the ratio of two positive definite quadratic forms we know that

    \hat\gamma_{1,N} \le \sup_{(T_N - \mu_N(\theta_0)) \ne 0}\frac{Q_N}{Q_N^*} \le \hat\gamma_{t,N},

where \hat\gamma_{1,N} = smallest root of \hat\Sigma_N\,\Sigma^{-1}(\theta_N) and \hat\gamma_{t,N} = largest root of \hat\Sigma_N\,\Sigma^{-1}(\theta_N). So we obtain

    |Q_N - Q_N^*| \le \max\bigl[\,|\hat\gamma_{1,N} - 1|,\ |\hat\gamma_{t,N} - 1|\,\bigr]\,Q_N^*;

but by (2.2.12a) we know \hat\gamma_{1,N} and \hat\gamma_{t,N} converge stochastically to one, so

    |Q_N - Q_N^*| \le o_p(1)\,O_p(1) = o_p(1) \quad\text{as } N \to \infty,

and we see that Q_N and Q_N^* are asymptotically equivalent and the theorem is proven. Q.E.D.
We observe that if we have two sequences of test statistics \{Q_{N_1}^{(1)}\} and \{Q_{N_2}^{(2)}\} and a common sequence of alternative hypotheses:

    H_N: \theta_N = \theta_0 + N^{-\delta}\lambda,

where \{N_1\} and \{N_2\} both depend on N, and furthermore if there exist constants \rho_1 and \rho_2 defined by:

(2.2.20)    \lim_{N\to\infty}\frac{N_i}{N} = \rho_i, \qquad \rho_i \in (0,1], \quad i = 1,2,

then Theorem 2.2.1 could be seen to yield the conclusion that

(2.2.21)    \mathcal{L}(Q_{N_i}^{(i)}) \to \chi^2(t_i,\ \rho_i^{2M\delta}\Delta_i) \quad\text{as } N \to \infty, \quad i = 1,2.

This fact will be used in later sections to obtain some efficiency results.
The only value of M which we consider in the applications in the later chapters is M = 1; no practical situations are known to us in which M > 1. With regard to \delta we note that for the multivariate location problems and the growth curve application \delta = 1/2, while for tests of independence one would require \delta = 1/4 in order to keep the limiting power bounded away from zero and one.

Since M = 1 for the cases we study, we present, in the form of a theorem, an observation concerning the noncentrality parameter when t < q. If t, q, \lambda, \Delta, M and \Sigma(\theta_0) are as defined in Theorem 2.2.1, we have the following theorem:
Theorem 2.2.2. If t < q and M = 1 then there exists at least one \lambda \ne 0 such that \Delta = 0.

Proof: If M = 1 then \Delta = \lambda'\,D'\,\Sigma^{-1}(\theta_0)\,D\,\lambda, where the t \times q matrix D is defined by C = D\lambda. Since t < q we know that rank(D) \le t < q, and further the matrix D'\,\Sigma^{-1}(\theta_0)\,D of size q \times q has rank s where s \le t < q.

By the symmetry of D'\Sigma^{-1}(\theta_0)D there exists an orthogonal matrix P such that

    P'(D'\,\Sigma^{-1}(\theta_0)\,D)P = diag(r_1, r_2, \ldots, r_s, 0, 0, \ldots, 0) = R,

where R has the roots of D'\Sigma^{-1}(\theta_0)D on the diagonal. Define the full rank transformation P'\lambda = \lambda^*; therefore \lambda = P\lambda^* and

    \lambda'\,D'\,\Sigma^{-1}(\theta_0)\,D\,\lambda = \lambda^{*\prime}\,R\,\lambda^*.

Let the first s elements of \lambda^* be zero and at least one of the remaining (q-s) elements be nonzero; then

    \lambda^{*\prime}\,R\,\lambda^* = 0,

but P\lambda^* = \lambda and |P| \ne 0, therefore \lambda \ne 0 and the result is proven. Q.E.D.
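Theorem 2.2.2 can be seen concretely: when t < q, any \lambda in the null space of D yields zero noncentrality. A numpy sketch with illustrative D and \Sigma(\theta_0) = I (our choices, not the thesis's):

```python
import numpy as np

rng = np.random.default_rng(1)

t, q = 2, 4                          # t < q, M = 1
D = rng.standard_normal((t, q))      # illustrative t x q matrix D of rank t
Sig_inv = np.eye(t)                  # Sigma(theta_0) taken as the identity

A = D.T @ Sig_inv @ D                # q x q matrix of rank at most t < q

# A right singular vector of D for a zero singular value lies in its null space.
_, _, Vt = np.linalg.svd(D)
lam = Vt[-1]                         # unit-length direction with D lam = 0
delta = lam @ A @ lam                # Delta = lam' D' Sigma^{-1} D lam

assert np.linalg.norm(lam) > 0.99    # lam is non-null ...
assert abs(delta) < 1e-12            # ... yet the noncentrality vanishes
```

In such a direction the limiting power of the test is exactly \alpha.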
We see that if t < q there will exist directions for which \Delta will be zero, and hence our test will have power \alpha in those directions. If t \ge q and if D has rank q, then it follows that D'\,\Sigma^{-1}(\theta_0)\,D is positive definite and for any \lambda \ne 0 we would have \Delta > 0.

We now study the asymptotic power function of our statistics under the conditions of Theorem 2.2.1.
2.3  The Asymptotic Power Function of \{Q_N\}

2.3.1  Taylor Series Expansion of the Power Function

Under the assumptions of Theorem 2.2.1 and the uniform convergence in \theta we may write the limiting power function P(\lambda) as:

(2.3.1)    P(\lambda) = \sum_{r=0}^{\infty}\frac{1}{r!}\sum_{\ell_1=1}^{q}\cdots\sum_{\ell_r=1}^{q}\lambda_{\ell_1}\cdots\lambda_{\ell_r}\,\frac{\partial^r P(\lambda)}{\partial\lambda_{\ell_1}\cdots\partial\lambda_{\ell_r}}\Big|_{\lambda=0}.

If we are going to use a Taylor series representation of P(\lambda) we need to evaluate terms of the form

    \frac{\partial^r P(\lambda)}{\partial\lambda_{\ell_1}\cdots\partial\lambda_{\ell_r}}\Big|_{\lambda=0}.

Since P(\lambda) is a function of \Delta, one could evaluate the partial derivatives by application of the chain rule of differential calculus.
Proceeding with this idea we get the following:

a)    \frac{\partial P(\lambda)}{\partial\lambda_{K_1}} = \frac{dP(\lambda)}{d\Delta}\,\frac{\partial\Delta}{\partial\lambda_{K_1}}, \qquad K_1 = 1,2,\ldots,q,

as the first partial derivatives, and

b)    \frac{\partial^2 P(\lambda)}{\partial\lambda_{K_1}\partial\lambda_{K_2}} = \frac{d^2P(\lambda)}{d\Delta^2}\,\frac{\partial\Delta}{\partial\lambda_{K_1}}\frac{\partial\Delta}{\partial\lambda_{K_2}} + \frac{dP(\lambda)}{d\Delta}\,\frac{\partial^2\Delta}{\partial\lambda_{K_1}\partial\lambda_{K_2}}, \qquad K_1, K_2 = 1,2,\ldots,q,

as the second partial derivatives, and

c)    \frac{\partial^3 P(\lambda)}{\partial\lambda_{K_1}\partial\lambda_{K_2}\partial\lambda_{K_3}} = \frac{d^3P(\lambda)}{d\Delta^3}\,\frac{\partial\Delta}{\partial\lambda_{K_1}}\frac{\partial\Delta}{\partial\lambda_{K_2}}\frac{\partial\Delta}{\partial\lambda_{K_3}} + \frac{d^2P(\lambda)}{d\Delta^2}\Bigl[\frac{\partial^2\Delta}{\partial\lambda_{K_1}\partial\lambda_{K_2}}\frac{\partial\Delta}{\partial\lambda_{K_3}} + \frac{\partial^2\Delta}{\partial\lambda_{K_2}\partial\lambda_{K_3}}\frac{\partial\Delta}{\partial\lambda_{K_1}} + \frac{\partial^2\Delta}{\partial\lambda_{K_1}\partial\lambda_{K_3}}\frac{\partial\Delta}{\partial\lambda_{K_2}}\Bigr] + \frac{dP(\lambda)}{d\Delta}\,\frac{\partial^3\Delta}{\partial\lambda_{K_1}\partial\lambda_{K_2}\partial\lambda_{K_3}},

for K_1, K_2, K_3 = 1,2,\ldots,q, as the third partial derivatives. Continuing in this fashion will lead to complicated expressions to evaluate, so we seek alternative methods of computing these derivatives. Since we see that each term in the series representation of P(\lambda) in (2.3.1) is a polynomial of degree r in the \lambda_i, we will derive some results for homogeneous polynomials in general and with these results obtain a reduction of (2.3.1).
2.3.2  Results on Homogeneous Polynomials

Definition: A function h(z_1, \ldots, z_q) is said to be homogeneous of degree n in a region E \subset R^q if and only if for every positive number \beta and every (z_1, \ldots, z_q) with both (z_1, \ldots, z_q) and (\beta z_1, \ldots, \beta z_q) in E we have h(\beta z_1, \ldots, \beta z_q) = \beta^n h(z_1, \ldots, z_q).

We remark that if h is a polynomial in (z_1, \ldots, z_q) then we would say h is a homogeneous polynomial of degree n if it satisfies the condition of the above definition. Denoting a homogeneous polynomial of degree K as h_K = h_K(z_1, \ldots, z_q), we now state and prove some results concerning h_K.
Lemma 2.3.1. \partial h_K/\partial z_s is a homogeneous polynomial of degree K-1 (K = 1,2,\ldots), s = 1,\ldots,q.

Proof: Fix an arbitrary s \in \{1,2,\ldots,q\}. We can represent h_K as

    h_K = \sum_{I_K} a_{i_1,\ldots,i_q}\prod_{j=1}^{q} z_j^{i_j},

where I_K is the set defined as:

    I_K = \Bigl\{(i_1,\ldots,i_q):\ i_j = 0,1,\ldots,K,\ j = 1,2,\ldots,q;\ \sum_{j=1}^{q} i_j = K\Bigr\},

and the a_{i_1,\ldots,i_q} are real numbers (possibly zero). Now h_K is a finite sum so there is no problem in exchanging the derivative and summation operators, so we get that

    \frac{\partial h_K}{\partial z_s} = \sum_{I_K} i_s\,a_{i_1,\ldots,i_q}\,z_s^{i_s - 1}\prod_{j \ne s} z_j^{i_j} = h_{K-1}^*.

Let i_s' = i_s - 1; then we see that the summation runs over the set

    \{i_j = 0,\ldots,K,\ j = 1,2,\ldots,q,\ j \ne s;\ i_s' = 0,\ldots,K-1;\ \sum_{j \ne s} i_j + i_s' = K-1\}.

So we see that h_{K-1}^* is a polynomial of degree K-1; in addition note that

    h_{K-1}^*(\beta z_1, \ldots, \beta z_q) = \beta^{K-1}\,h_{K-1}^*(z_1, \ldots, z_q),

so it is a homogeneous polynomial of degree K-1. Since s was arbitrary, the result holds for s = 1,2,\ldots,q. Q.E.D.

It is worth noting that if h is a homogeneous polynomial of degree 1 then each of its partial derivatives is itself a constant; hence all higher derivatives are identically zero.
Lemma 2.3.2. Any t-th partial derivative of h_K(z_1,\ldots,z_q), K = 1,2,\ldots, is:

i) a homogeneous polynomial of degree K-t for t < K;

ii) zero for all t > K.

Proof: i) follows immediately from repeated application of Lemma 2.3.1; at the K-th step the homogeneous polynomial of degree 0 would be a constant, so all further derivatives would be zero, and ii) follows. Q.E.D.
Lemma 2.3.3. For any two positive integers a and b we have:

i) h_a h_b = h_{a+b};

ii) h_a + g_a = h_a';

iii) c\,h_a = h_a, where c is a nonzero real number.

Proof: i) The product of two polynomials is a polynomial of degree equal to the sum of the degrees of the individual polynomials; the homogeneity follows since:

    h_a(\beta z_1,\ldots,\beta z_q)\,h_b(\beta z_1,\ldots,\beta z_q) = \beta^{a+b}\,h_a(z_1,\ldots,z_q)\,h_b(z_1,\ldots,z_q) = \beta^{a+b}\,h_{a+b}(z_1,\ldots,z_q).

ii) The sum of two polynomials is a polynomial of degree less than or equal to the maximum degree of the individual polynomials; hence h_a + g_a is a polynomial of degree \le a, and

    h_a(\beta z_1,\ldots,\beta z_q) + g_a(\beta z_1,\ldots,\beta z_q) = \beta^a h_a + \beta^a g_a = \beta^a(h_a + g_a).

iii) follows trivially. Q.E.D.
Lemma 2.3.4. Let f(h_K(z_1,\ldots,z_q)) be a function of h_K, a homogeneous polynomial of degree K, possessing s-th partial derivatives with respect to h_K, s = 1,2,\ldots,K, and continuous in z. Then for any s we have

    \frac{\partial^s f}{\partial z_{i_1}\cdots\partial z_{i_s}} = \sum_{j=1}^{s}\frac{\partial^j f}{\partial h_K^j}\,h_{jK-s}^{(s,j)},

where the superscripts (s,j) on h_{jK-s} are just to indicate that these homogeneous polynomials are in general distinct for i_1,\ldots,i_s \in \{1,2,\ldots,q\}.

Proof:

i)    \frac{\partial f}{\partial z_{i_1}} = \frac{\partial f}{\partial h_K}\,h_{K-1}^{(1,1)} by Lemma 2.3.1, for i_1 = 1,2,\ldots,q.

ii)   \frac{\partial^2 f}{\partial z_{i_1}\partial z_{i_2}} = \frac{\partial^2 f}{\partial h_K^2}\,h_{2K-2}^{(2,2)} + \frac{\partial f}{\partial h_K}\,h_{K-2}^{(2,1)} = \sum_{j=1}^{2}\frac{\partial^j f}{\partial h_K^j}\,h_{jK-2}^{(2,j)} by Lemmas 2.3.2 and 2.3.3, for i_1, i_2 = 1,2,\ldots,q.

iii)  Similarly we find

    \frac{\partial^3 f}{\partial z_{i_1}\partial z_{i_2}\partial z_{i_3}} = \frac{\partial^3 f}{\partial h_K^3}\,h_{3K-3}^{(3,3)} + \frac{\partial^2 f}{\partial h_K^2}\,h_{2K-3}^{(3,2)} + \frac{\partial f}{\partial h_K}\,h_{K-3}^{(3,1)} = \sum_{j=1}^{3}\frac{\partial^j f}{\partial h_K^j}\,h_{jK-3}^{(3,j)} by Lemmas 2.3.2 and 2.3.3,

for i_1, i_2, i_3 = 1,2,\ldots,q, and in general the result follows with repeated application of Lemmas 2.3.2 and 2.3.3. Q.E.D.

If we define h_r \equiv 0 for all r < 0 then the same formula of Lemma 2.3.4 will hold if f has more than K continuous partial derivatives. It should be noted that Lemma 2.3.4 gives us no idea as to the precise homogeneous polynomials on the right-hand side, but for the reduction of the power function this is no limitation.
Lemma 2.3.5. h_K(0, \ldots, 0) = 0 if K \ge 1.

Proof:

    h_K(0,\ldots,0) = \sum_{I_K} a_{i_1,\ldots,i_q}\prod_{j=1}^{q} 0^{i_j}.

Now if a_{i_1,\ldots,i_q} = 0 for all (i_1,\ldots,i_q) then of course h_K \equiv 0. On the other hand, if a_{i_1,\ldots,i_q} \ne 0 for at least one tuple (i_1,\ldots,i_q), then i_j > 0 for at least one j, and hence the terms with nonzero a_{i_1,\ldots,i_q} are multiplied by zeroes; hence h_K(0,\ldots,0) = 0. Q.E.D.
2.3.3  A Simplification of the Power Function

In this section we shall substantially simplify (2.3.1) in terms of the parameter M in Theorem 2.2.1.

Lemma 2.3.6. Under the assumptions of Theorem 2.2.1 the noncentrality parameter \Delta is a homogeneous polynomial of degree 2M.

Proof:

    \Delta = \frac{1}{(M!)^2}\,C'\,\Sigma^{-1}(\theta_0)\,C,

where C = (c_1, c_2, \ldots, c_t)' and

    c_j = \sum_{\ell_1=1}^{q}\cdots\sum_{\ell_M=1}^{q}\lambda_{\ell_1}\cdots\lambda_{\ell_M}\,d_{j,\ell_1,\ldots,\ell_M}^{(M)}, \qquad j = 1,\ldots,t,

the d_{j,\ell_1,\ldots,\ell_M}^{(M)} being the limits in (2.2.14). Clearly for each j = 1,2,\ldots,t we see that c_j is a homogeneous polynomial of degree M. Let \Sigma^{-1}(\theta_0) = ((a_{jj'})). So

    \Delta = \frac{1}{(M!)^2}\sum_{j=1}^{t}\sum_{j'=1}^{t} a_{jj'}\,c_j\,c_{j'};

by Lemma 2.3.3 we see that \Delta is a homogeneous polynomial of degree 2M. Q.E.D.
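For M = 1 the lemma asserts \Delta(\beta\lambda) = \beta^2\,\Delta(\lambda); a quick numerical check with illustrative matrices (our choices, not the thesis's):

```python
import numpy as np

rng = np.random.default_rng(2)

# With M = 1, Delta(lambda) = lambda' D' Sigma^{-1}(theta_0) D lambda.
D = rng.standard_normal((3, 3))                           # illustrative D
S_inv = np.linalg.inv(np.eye(3) + 0.1 * np.ones((3, 3)))  # illustrative Sigma^{-1}
delta = lambda lam: lam @ D.T @ S_inv @ D @ lam

lam = rng.standard_normal(3)
for beta in (0.5, 2.0, 7.0):
    # homogeneous of degree 2M = 2
    assert np.isclose(delta(beta * lam), beta ** 2 * delta(lam))
```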
Lemma 2.3.7. Under the assumptions of Theorem 2.2.1, all of the first (2M-1) partial derivatives of the limiting power function with respect to the \lambda's are zero at \lambda = 0. That is,

    \frac{\partial^s P(\lambda)}{\partial\lambda_{\ell_1}\cdots\partial\lambda_{\ell_s}}\Big|_{\lambda=0} = 0, \qquad s = 1, 2, \ldots, 2M-1.

Proof: P(\lambda) = P(\Delta(\lambda_1,\ldots,\lambda_q)), and by Lemma 2.3.6 \Delta is a homogeneous polynomial of degree 2M; furthermore P is continuous in its partial derivatives of all orders, so by Lemma 2.3.4 we can represent the partial derivatives as:

    \frac{\partial^s P}{\partial\lambda_{\ell_1}\cdots\partial\lambda_{\ell_s}} = \sum_{j=1}^{s}\frac{\partial^j P}{\partial\Delta^j}\,h_{2jM-s}^{(s,j)}, \qquad \forall\,\lambda \text{ with } \|\lambda\| < K,\ s = 1,2,\ldots,2M-1.

Now for 2jM - s \ge 1 we know by Lemma 2.3.5 that h_{2jM-s}^{(s,j)}(0,\ldots,0) = 0; but j \ge 1 and s < 2M, so 2Mj - s \ge 1 for all s = 1, 2, \ldots, 2M-1. So we obtain the required result. Q.E.D.
The preceding lemma yields a substantial reduction of the power function. We see that (2.3.1) can now be written as:

(2.3.2)    P(\lambda) = \alpha + \sum_{r=2M}^{\infty}\frac{1}{r!}\sum_{\ell_1=1}^{q}\cdots\sum_{\ell_r=1}^{q}\lambda_{\ell_1}\cdots\lambda_{\ell_r}\,\frac{\partial^r P(\lambda)}{\partial\lambda_{\ell_1}\cdots\partial\lambda_{\ell_r}}\Big|_{\lambda=0},

where \alpha is the significance level of the test. We now reduce the expansion by the following lemma.
Lemma 2.3.8.

    \frac{\partial^s P(\lambda)}{\partial\lambda_{\ell_1}\cdots\partial\lambda_{\ell_s}}\Big|_{\lambda=0} = 0

if s is not an integer multiple of 2M, i.e. if there does not exist an integer b such that s = 2Mb.

Proof: By Lemma 2.3.4 and the comment after it we can take derivatives higher than the 2M-th and use the representation

    \frac{\partial^s P}{\partial\lambda_{\ell_1}\cdots\partial\lambda_{\ell_s}}\Big|_{\lambda=0} = \sum_{j=1}^{s}\frac{\partial^j P}{\partial\Delta^j}\Big|_{\lambda=0}\,h_{2Mj-s}\Big|_{\lambda=0},

where it is understood that h_K \equiv 0 if K < 0. But h_{2Mj-s}|_{\lambda=0} = 0 whenever 2Mj - s \ne 0, by Lemma 2.3.5, so the result follows. Q.E.D.
So we can write the power function in its simplified form as

(2.3.3)    P(\lambda) = \alpha + \sum_{r=1}^{\infty}\frac{1}{(2Mr)!}\sum_{\ell_1=1}^{q}\cdots\sum_{\ell_{2Mr}=1}^{q}\lambda_{\ell_1}\cdots\lambda_{\ell_{2Mr}}\,\frac{\partial^{2Mr} P(\lambda)}{\partial\lambda_{\ell_1}\cdots\partial\lambda_{\ell_{2Mr}}}\Big|_{\lambda=0}.

We now give an alternative representation for the power function and then proceed to use these simplifications in discussing the proposed efficiency criteria.
2.3.4  An Alternative Representation of the Power Function

If we denote the i-th derivative of P(\Delta) with respect to \Delta by P^{(i)}(\Delta), then we can write P(\Delta) in a power series about \Delta = 0 as:

    P(\Delta) = \sum_{r=0}^{\infty}\frac{\Delta^r}{r!}\,P^{(r)}(0);

but \Delta itself is a function of \lambda_1, \ldots, \lambda_q and can be written in a power series in \lambda about \lambda = 0 as

    \Delta(\lambda) = \sum_{s=0}^{\infty}\frac{1}{s!}\Bigl(\sum_{\ell=1}^{q}\lambda_\ell\frac{\partial}{\partial\lambda_\ell}\Bigr)^s\Delta\Big|_{\lambda=0}.

So

    P(\Delta) = \sum_{r=0}^{\infty}\frac{1}{r!}\Bigl\{\sum_{s=0}^{\infty}\frac{1}{s!}\Bigl(\sum_{\ell=1}^{q}\lambda_\ell\frac{\partial}{\partial\lambda_\ell}\Bigr)^s\Delta\Big|_{\lambda=0}\Bigr\}^r\,P^{(r)}(0).
Lemma 2.3.9. Under the conditions of Theorem 2.2.1,

    P(\Delta) = \sum_{r=0}^{\infty}\frac{1}{r!}\Bigl\{\frac{1}{(2M)!}\Bigl(\sum_{\ell=1}^{q}\lambda_\ell\frac{\partial}{\partial\lambda_\ell}\Bigr)^{2M}\Delta\Big|_{\lambda=0}\Bigr\}^r\,P^{(r)}(0).

Proof: \Delta is a homogeneous polynomial of degree 2M; therefore all of its derivatives of order less than 2M with respect to the \lambda's must vanish at \lambda = 0. On the other hand, all derivatives of order greater than 2M are zero, since the 2M-th derivative itself is a constant in \lambda; so the result follows. Q.E.D.
We now give a result which provides a representation of P^{(r)}(\Delta) in general.

Lemma 2.3.10.

    P^{(r)}(\Delta) = \Bigl(\frac{1}{2}\Bigr)^r\sum_{s=0}^{r}\binom{r}{s}(-1)^s\,P(t + 2(r-s), \Delta), \qquad r = 1, 2, \ldots,

where t is as defined in Theorem 2.2.1 and

    P(t + 2j, \Delta) = \Pr[\chi^2(t + 2j, \Delta) \ge \chi^2_{t,\alpha}],

i.e. the probability that a noncentral chi-square random variable with noncentrality parameter \Delta and t + 2j degrees of freedom is greater than or equal to the (1-\alpha)100\% point of a central chi-square random variable with t degrees of freedom.
Proof: We know that

    P(\Delta) = e^{-\Delta/2}\sum_{j=0}^{\infty}\frac{(\Delta/2)^j}{j!}\,P(t + 2j, 0).

Now the infinite series converges absolutely, so there is no problem in exchanging the derivative operator and the infinite summation operator, and we get

    P^{(1)}(\Delta) = \frac{\partial P}{\partial\Delta} = -\frac{1}{2}e^{-\Delta/2}\sum_{j=0}^{\infty}\frac{(\Delta/2)^j}{j!}\,P(t+2j,0) + \frac{1}{2}e^{-\Delta/2}\sum_{j=1}^{\infty}\frac{(\Delta/2)^{j-1}}{(j-1)!}\,P(t+2j,0).

Therefore

    P^{(1)}(\Delta) = \frac{1}{2}\bigl[P(t+2,\Delta) - P(t,\Delta)\bigr] = \Bigl(\frac{1}{2}\Bigr)^1\sum_{s=0}^{1}\binom{1}{s}(-1)^s\,P(t+2(1-s),\Delta).

We shall prove the result by induction. Assume the result of the lemma is true for r = n, i.e.

    P^{(n)}(\Delta) = \Bigl(\frac{1}{2}\Bigr)^n\sum_{s=0}^{n}\binom{n}{s}(-1)^s\,P(t+2(n-s),\Delta).

Differentiating once more and applying the case r = 1 to each term,

    P^{(n+1)}(\Delta) = \Bigl(\frac{1}{2}\Bigr)^n\sum_{s=0}^{n}\binom{n}{s}(-1)^s\,\frac{1}{2}\bigl[P(t+2(n-s)+2,\Delta) - P(t+2(n-s),\Delta)\bigr]

    = \Bigl(\frac{1}{2}\Bigr)^{n+1}\binom{n+1}{0}P(t+2(n+1),\Delta) + \Bigl(\frac{1}{2}\Bigr)^{n+1}\sum_{s=0}^{n-1}(-1)^{s+1}\Bigl[\binom{n}{s+1} + \binom{n}{s}\Bigr]P(t+2(n-s),\Delta) + \Bigl(\frac{1}{2}\Bigr)^{n+1}(-1)^{n+1}P(t,\Delta);

but

    \binom{n}{s+1} + \binom{n}{s} = \frac{[s+1+n-s]\,n!}{(n-s)!\,(s+1)!} = \binom{n+1}{s+1};

therefore, letting s' = s + 1, we have

    P^{(n+1)}(\Delta) = \Bigl(\frac{1}{2}\Bigr)^{n+1}\sum_{s=0}^{n+1}\binom{n+1}{s}(-1)^s\,P(t+2(n+1-s),\Delta).

So by the principle of mathematical induction the result holds for all r = 1, 2, \ldots. Q.E.D.
In the next section we compute the Pitman ARE of test 2 to test 1 when t_1 = t_2; in the following sections we make use of the reductions we have obtained in this section.
2.4  Pitman ARE when t_1 = t_2

We consider now the problem of comparing the performance of \{Q_N^{(2)}\} to \{Q_N^{(1)}\} for large samples. We assume that both sequences of test statistics satisfy the conditions of Theorem 2.2.1, with parameters M_i, t_i and \Delta_i, and denote the limiting power functions by P_i(\lambda) for i = 1, 2. We first notice that if M_1 \ne M_2 then one sequence of test statistics is behaving differently from the other. For example if t_i \ge q and M_1 = 1, M_2 > 1, then test 1 has a limiting power function which attains a relative minimum at \theta_0, while test 2 is changing so rapidly at \theta_0 that we cannot guarantee that it has an extremum at \theta_0. For this reason we restrict consideration to M_1 = M_2 = M in all the sections to follow.

Since it involves a slight extension of existing work (since q, t_1 and t_2 are general) we now derive the Pitman ARE of \phi_2 with respect to \phi_1 in the situation t_1 = t_2. For convenience we denote the Pitman ARE of \phi_2 with respect to \phi_1 as e_{2,1}^P.

Theorem 2.4.1. If \{Q_N^{(1)}\} and \{Q_N^{(2)}\} satisfy the conditions of Theorem 2.2.1 with parameters t_i and \Delta_i respectively, and if t_1 = t_2, then

    e_{2,1}^P = \Bigl(\frac{\Delta_2}{\Delta_1}\Bigr)^{1/2M\delta}.
Proof: Recall that we require the \lim_{N\to\infty} N_1(N)/N_2(N), where \{N_1(N)\} and \{N_2(N)\} are chosen such that the two tests \{Q_{N_1}^{(1)}\} and \{Q_{N_2}^{(2)}\} have the same limiting power against the same H_N. Since t_1 = t_2, we know that the limiting powers are the same when the noncentralities are the same, so if the \rho_i are defined in (2.2.20) we see by (2.2.21) that we require

    \rho_1^{2M\delta}\,\Delta_1 = \rho_2^{2M\delta}\,\Delta_2;

hence,

    \lim_{N\to\infty}\frac{N_1(N)}{N_2(N)} = \frac{\rho_1}{\rho_2} = \Bigl(\frac{\Delta_2}{\Delta_1}\Bigr)^{1/2M\delta}. \qquad Q.E.D.
We remark that if exactly one \Delta_i is zero the other test is clearly superior; if both \Delta_1 and \Delta_2 are zero then both are useless for that particular direction \lambda.

A common set of values for the parameters M, \delta is 1, 1/2 respectively, and then

    e_{2,1}^P = \frac{\lambda'\,D^{(2)\prime}\Sigma^{(2)-1}D^{(2)}\,\lambda}{\lambda'\,D^{(1)\prime}\Sigma^{(1)-1}D^{(1)}\,\lambda},

where D^{(i)} is defined by C^{(i)} = D^{(i)}\lambda. In this form we have a general representation of the Pitman ARE for q and t arbitrary. If q > t then e_{2,1}^P will be indeterminate for some \lambda and care must be taken in the interpretation of e_{2,1}^P. If q \le t then one can apply Courant's theorem and place bounds on e_{2,1}^P for all \lambda \ne 0. If q = 1 no problem arises in the evaluation of e_{2,1}^P, since it would not depend on \lambda in this case.
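The Courant bounds mentioned here are the extreme roots of (D^{(1)\prime}\Sigma^{(1)-1}D^{(1)})^{-1}(D^{(2)\prime}\Sigma^{(2)-1}D^{(2)}). A numpy sketch with illustrative positive definite matrices (with M = 1, \delta = 1/2 the exponent is one):

```python
import numpy as np

rng = np.random.default_rng(3)

def make_pd(q):
    """Illustrative q x q positive definite matrix."""
    B = rng.standard_normal((q, q))
    return B @ B.T + q * np.eye(q)

q = 3
M1 = make_pd(q)   # stands in for D(1)' Sigma(1)^{-1} D(1)
M2 = make_pd(q)   # stands in for D(2)' Sigma(2)^{-1} D(2)

# Pitman ARE in direction lambda (M = 1, delta = 1/2):
are = lambda lam: (lam @ M2 @ lam) / (lam @ M1 @ lam)

roots = np.linalg.eigvals(np.linalg.inv(M1) @ M2).real
lo, hi = roots.min(), roots.max()

for _ in range(100):
    lam = rng.standard_normal(q)
    # Courant's theorem: the ratio is bounded by the extreme roots
    assert lo - 1e-10 <= are(lam) <= hi + 1e-10
```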
2.5  Local Asymptotic Relative Efficiency (LARE)

If in general t_1 \ne t_2, then the preceding section's approach of computing the Pitman ARE does not work, since for t_1 \ne t_2 the powers are not the same when \Delta_1 = \Delta_2. In this section we propose an alternative criterion of comparison in which we compare the limiting powers locally, i.e. when \lambda is in some arbitrarily small neighborhood of the origin. To be precise we adopt the following:

Definition: The local asymptotic relative efficiency (LARE) of \phi_2 with respect to \phi_1 is defined to be the \lim_{N\to\infty} N_1/N_2, where \{N_1\} and \{N_2\} are chosen such that the two tests have the same limiting power locally through the same sequence of alternative hypotheses. Local power is the power function expansion up to the (2M)-th derivative terms.
Theorem 2.5.1. The LARE of \phi_2 with respect to \phi_1, when \{Q_N^{(1)}\} and \{Q_N^{(2)}\} satisfy the conditions of Theorem 2.2.1, each with limiting \chi^2(t_i, \Delta_i) (i = 1,2) distribution as N \to \infty through H_N, is

    LARE = \Bigl\{\frac{[P_2(t_2+2,0) - \alpha]\,\Delta_2}{[P_1(t_1+2,0) - \alpha]\,\Delta_1}\Bigr\}^{1/2M\delta},

where P_i(t_i+2,0) is as defined in Lemma 2.3.10.
Proof: From section 2.3.4 we know that each power function P_i(\lambda) can be written as

    P_i(\lambda) = \alpha + \sum_{r=1}^{\infty}\frac{\Delta_i^r}{r!}\,P_i^{(r)}(\Delta_i)\Big|_{\Delta_i=0} = \alpha + \Delta_i\,P_i^{(1)}(\Delta_i)\Big|_{\Delta_i=0} + \sum_{r=2}^{\infty}\frac{\Delta_i^r}{r!}\,P_i^{(r)}(\Delta_i)\Big|_{\Delta_i=0};

but the last sum is the tail of a convergent series, so we know that it is bounded by a finite number K. In particular, if \lambda_{max} = \max_\ell |\lambda_\ell| is sufficiently small, then

    \sum_{r=2}^{\infty}\frac{\Delta_i^r}{r!}\,P_i^{(r)}(\Delta_i)\Big|_{\Delta_i=0} = O(\lambda_{max}^{4M}) \quad\text{as } \lambda_{max} \to 0.

So for \|\lambda\| < \varepsilon, a small number, we see that

    P_i(\lambda) \doteq \alpha + \Delta_i\,P_i^{(1)}(\Delta_i)\Big|_{\Delta_i=0}.

So if we define local power as \alpha + \Delta_i P_i^{(1)}(\Delta_i)|_{\Delta_i=0}, then we require the equality of these first-order terms in the limit. Now choose \{N_1(N)\} and \{N_2(N)\} such that

    \rho_1^{2M\delta}\,\Delta_1\,P_1^{(1)}(0) = \rho_2^{2M\delta}\,\Delta_2\,P_2^{(1)}(0);

but by Lemma 2.3.10 we know

    P_i^{(1)}(0) = \frac{1}{2}\bigl[P_i(t_i+2,0) - \alpha\bigr],

and we see

    \frac{\rho_1}{\rho_2} = \Bigl\{\frac{[P_2(t_2+2,0) - \alpha]\,\Delta_2}{[P_1(t_1+2,0) - \alpha]\,\Delta_1}\Bigr\}^{1/2M\delta};

hence

    \lim_{N\to\infty}\frac{N_1(N)}{N_2(N)} = \Bigl\{\frac{[P_2(t_2+2,0) - \alpha]\,\Delta_2}{[P_1(t_1+2,0) - \alpha]\,\Delta_1}\Bigr\}^{1/2M\delta}. \qquad Q.E.D.
Let us define

    R(t_1, t_2, \alpha) = \frac{P_2(t_2+2,0) - \alpha}{P_1(t_1+2,0) - \alpha}.

In the case where M = 1, \delta = 1/2 we get

(2.5.1)    LARE = R(t_1,t_2,\alpha)\;\frac{\lambda'\,D^{(2)\prime}\Sigma^{(2)-1}D^{(2)}\,\lambda}{\lambda'\,D^{(1)\prime}\Sigma^{(1)-1}D^{(1)}\,\lambda}.

It can be seen that we now have a measure of ARE which takes into account (a) the difference in degrees of freedom and (b) the difference in the noncentrality parameters for \lambda near the origin. Furthermore, these two components are factored so that their product is the efficiency. The second quantity is the same as we get when we compute the Pitman ARE for t_1 = t_2, and we could proceed to place bounds on this ratio by Courant's theorem.
The comments that were made in the previous section on the relationship of q and t_i should be kept in mind, since either of the quadratic forms in (2.5.1) may be positive semidefinite. The scalar factor R(t_1,t_2,\alpha) is given in Tables 2.5.1 to 2.5.5 for t_1, t_2 = 1(1)10 and \alpha = .10, .05, .01, .005 and .001. We note that the scalar adjustment can be quite large for large values of |t_2 - t_1|, but this factor varies little with the value of \alpha unless |t_2 - t_1| is large.

One major drawback of this LARE is that it depends on the arbitrary vector \lambda, even though we have assumed the vector to have arbitrarily small elements. This is an undesirable feature of this measure of ARE. We now propose a measure of ARE, based on the generalized curvature of the power function at the null point, which avoids this drawback.
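Each tabulated factor is a ratio of two central chi-square tail probabilities, so the tables can be reproduced with the standard library alone. A sketch (the helper names are ours, not the thesis's):

```python
import math

def chi2_sf(x, k):
    """Pr[chi-square(k df) >= x] for integer k >= 1, by the usual recurrence in k."""
    if k == 1:
        return math.erfc(math.sqrt(x / 2))
    if k == 2:
        return math.exp(-x / 2)
    # Q(x; k) = Q(x; k-2) + (x/2)^((k-2)/2) e^(-x/2) / Gamma(k/2)
    return chi2_sf(x, k - 2) + (x / 2) ** ((k - 2) / 2) * math.exp(-x / 2) / math.gamma(k / 2)

def chi2_quantile(p, k):
    """Solve Pr[chi-square(k) <= x] = p by bisection."""
    lo, hi = 0.0, 500.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if 1 - chi2_sf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def local_slope(t, alpha):
    """P(t+2, 0) - alpha = Pr[chi2(t+2) >= chi2_{t,alpha}] - alpha."""
    return chi2_sf(chi2_quantile(1 - alpha, t), t + 2) - alpha

alpha = 0.005
r12 = local_slope(1, alpha) / local_slope(2, alpha)
print(round(r12, 2), round(1 / r12, 2))   # 1.64 0.61, the (1,2) and (2,1)
                                          # entries of Table 2.5.4
```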
TABLE 2.5.4

VALUES OF THE ADJUSTMENT FACTOR, R(t_1, t_2, \alpha), FOR \alpha = .005

t_1\t_2    1     2     3     4     5     6     7     8     9    10
   1    1.00  1.64  2.18  2.66  3.09  3.49  3.86  4.22  4.55  4.87
   2     .61  1.00  1.33  1.62  1.88  2.12  2.35  2.56  2.77  2.96
   3     .46   .75  1.00  1.22  1.42  1.60  1.77  1.93  2.08  2.23
   4     .38   .62   .82  1.00  1.16  1.31  1.45  1.58  1.71  1.83
   5     .32   .53   .71   .86  1.00  1.13  1.25  1.36  1.47  1.57
   6     .29   .47   .63   .76   .89  1.00  1.11  1.21  1.30  1.39
   7     .26   .43   .57   .69   .80   .90  1.00  1.09  1.18  1.26
   8     .24   .39   .52   .63   .73   .83   .92  1.00  1.08  1.15
   9     .22   .36   .48   .59   .68   .77   .85   .93  1.00  1.07
  10     .21   .34   .45   .55   .64   .72   .79   .87   .93  1.00

Note: See note for Table 2.5.1.
TABLE 2.5.5

VALUES OF THE ADJUSTMENT FACTOR, R(t_1, t_2, \alpha), FOR \alpha = .001

t_1\t_2    1     2     3     4     5     6     7     8     9    10
   1    1.00  1.69  2.28  2.81  3.29  3.73  4.15  4.53  4.91  5.27
   2     .59  1.00  1.35  1.66  1.94  2.20  2.45  2.68  2.90  3.11
   3     .44   .74  1.00  1.23  1.44  1.63  1.82  1.99  2.15  2.31
   4     .36   .60   .81  1.00  1.17  1.33  1.48  1.62  1.75  1.88
   5     .30   .52   .69   .85  1.00  1.14  1.26  1.38  1.49  1.60
   6     .27   .45   .61   .75   .88  1.00  1.11  1.22  1.32  1.41
   7     .24   .41   .55   .68   .79   .90  1.00  1.10  1.18  1.27
   8     .22   .37   .50   .62   .72   .82   .91  1.00  1.08  1.16
   9     .20   .34   .46   .57   .67   .76   .84   .92  1.00  1.07
  10     .19   .32   .43   .53   .62   .71   .79   .86   .93  1.00

Note: See note for Table 2.5.1.
2.6  Curvature Asymptotic Relative Efficiency (CARE)

We consider now another new measure of ARE for the special case M_1 = M_2 = 1. At the end of this section we discuss briefly the case where M_i > 1 and point out problems in the interpretation of this case.

We recall from Chapter I the definition of generalized Gaussian curvature of a function in several variables. We propose now as a measure of efficiency a criterion based on the generalized Gaussian curvature of the two limiting power functions, P_i(\lambda), at the origin.

Definition: The curvature asymptotic relative efficiency (CARE) of test \phi_2 with respect to \phi_1 is the \lim_{N\to\infty} N_1/N_2, where \{N_1\} and \{N_2\} are chosen such that the two tests have limiting power functions with the same generalized Gaussian curvature at \lambda = 0 through the same sequence of alternative hypotheses.

Of course this definition of efficiency, denoted by CARE, is not meaningful if M_1 and M_2 are both greater than one, since one can then show, by application of Lemma 2.3.7, that the two generalized Gaussian curvatures must be zero at \lambda = 0. Let us now derive the generalized curvatures (we omit the adjective Gaussian for convenience) of the limiting power functions at \lambda = 0.

Theorem 2.6.1. Under the conditions of Theorem 2.2.1, when M_i = 1 the generalized curvature of the limiting power function of \{Q_N^{(i)}\} at \lambda = 0 is:

    G_i = [P_i(t_i+2,0) - \alpha]^q\;|D^{(i)\prime}\,\Sigma^{(i)-1}\,D^{(i)}|,

where D^{(i)} is defined by C^{(i)} = D^{(i)}\lambda.
Proof: By the definition of G_i in Chapter I we have

    G_i = \frac{\Bigl|\Bigl(\frac{\partial^2 P_i(\lambda)}{\partial\lambda\,\partial\lambda'}\Bigr)\Bigr|}{\Bigl[1 + \sum_{\ell=1}^{q}\Bigl(\frac{\partial P_i(\lambda)}{\partial\lambda_\ell}\Bigr)^2\Bigr]^{(q+2)/2}}\Bigg|_{\lambda=0};

by Lemma 2.3.7 the denominator is one, so G_i = |(\partial^2 P_i(\lambda)/\partial\lambda\,\partial\lambda')|_{\lambda=0}. But by Lemmas 2.3.9 and 2.3.10 we get

    \frac{\partial^2 P_i(\lambda)}{\partial\lambda_\ell\,\partial\lambda_K}\Big|_{\lambda=0} = \frac{[P_i(t_i+2,0) - \alpha]}{2}\cdot 2\,(D^{(i)\prime}\Sigma^{(i)-1}D^{(i)})_{\ell K},

where the second factor on the right is twice the (\ell,K) element of the matrix D^{(i)\prime}\Sigma^{(i)-1}D^{(i)}. Therefore we get

    G_i = [P_i(t_i+2,0) - \alpha]^q\;|D^{(i)\prime}\,\Sigma^{(i)-1}\,D^{(i)}|. \qquad Q.E.D.
Now we observe that if q > t_i with M_i = 1, then G_i = 0 by Theorem 2.2.2; on the other hand, if q \le t_i, then G_i > 0 if we assume that D^{(i)} is of rank q, which we do throughout. We are now in a position to find the CARE.
Theorem 2.6.2. If \{Q_N^{(1)}\} and \{Q_N^{(2)}\} satisfy the conditions of Theorem 2.2.1, and if q \le t_i (i = 1,2) with M_1 = M_2 = 1, then

    CARE = R(t_1,t_2,\alpha)^{1/2\delta}\;\Biggl[\frac{|D^{(2)\prime}\Sigma^{(2)-1}D^{(2)}|}{|D^{(1)\prime}\Sigma^{(1)-1}D^{(1)}|}\Biggr]^{1/2q\delta}.

Proof: We must determine \{N_1\} and \{N_2\} such that G_1 = G_2, where G_i is the generalized curvature of the limiting power function of \{Q_{N_i}^{(i)}\} through H_N as N \to \infty, where

    H_N: \theta_N = \theta_0 + N^{-\delta}\lambda.

By (2.2.21) we have

    \mathcal{L}(Q_{N_i}^{(i)}) \to \chi^2(t_i,\ \rho_i^{2\delta}\Delta_i) \quad\text{as } N \to \infty,

where \rho_i is defined in (2.2.20), and since M = 1,

    \Delta_i = \lambda'\,D^{(i)\prime}\Sigma^{(i)-1}D^{(i)}\,\lambda.

By Theorem 2.6.1,

    G_i = [P_i(t_i+2,0) - \alpha]^q\;\rho_i^{2\delta q}\;|D^{(i)\prime}\Sigma^{(i)-1}D^{(i)}|.

Setting G_1 = G_2 we obtain

    \frac{\rho_1}{\rho_2} = R(t_1,t_2,\alpha)^{1/2\delta}\,\Biggl[\frac{|D^{(2)\prime}\Sigma^{(2)-1}D^{(2)}|}{|D^{(1)\prime}\Sigma^{(1)-1}D^{(1)}|}\Biggr]^{1/2q\delta},

and the proof is complete. Q.E.D.
We now give a result for the special case of t_1 = t_2.

Theorem 2.6.3. Under the conditions of Theorem 2.6.2 with t_1 = t_2, the CARE is obtained from the Pitman ARE by replacing the ratio of quadratic forms with the geometric mean of the roots of the product matrix (D^{(2)\prime}\Sigma^{(2)-1}D^{(2)})(D^{(1)\prime}\Sigma^{(1)-1}D^{(1)})^{-1}.

Proof: From section 2.5 we know that when q \le t_1 = t_2 and M_1 = M_2 = 1 we have

    e_{2,1}^P = \Biggl[\frac{\lambda'\,D^{(2)\prime}\Sigma^{(2)-1}D^{(2)}\,\lambda}{\lambda'\,D^{(1)\prime}\Sigma^{(1)-1}D^{(1)}\,\lambda}\Biggr]^{1/2\delta}.

Since both D^{(2)\prime}\Sigma^{(2)-1}D^{(2)} and D^{(1)\prime}\Sigma^{(1)-1}D^{(1)} are positive definite, we can find a nonsingular F such that

    F'\,D^{(2)\prime}\Sigma^{(2)-1}D^{(2)}\,F = diag(r_1,\ldots,r_q), \qquad F'\,D^{(1)\prime}\Sigma^{(1)-1}D^{(1)}\,F = I,

where r_1,\ldots,r_q are the roots of (D^{(2)\prime}\Sigma^{(2)-1}D^{(2)})(D^{(1)\prime}\Sigma^{(1)-1}D^{(1)})^{-1}. If we let \lambda = F\lambda^*, then we can represent e_{2,1}^P in \lambda^* space as

    e_{2,1}^P = \Biggl[\frac{\sum_{\ell=1}^{q} r_\ell\,\lambda_\ell^{*2}}{\sum_{\ell=1}^{q}\lambda_\ell^{*2}}\Biggr]^{1/2\delta}.

Now the geometric mean of r_1,\ldots,r_q is given by

    \Bigl(\prod_{\ell=1}^{q} r_\ell\Bigr)^{1/q} = \Biggl[\frac{|D^{(2)\prime}\Sigma^{(2)-1}D^{(2)}|}{|D^{(1)\prime}\Sigma^{(1)-1}D^{(1)}|}\Biggr]^{1/q},

so by Theorem 2.6.2 (in which R(t_1,t_2,\alpha) = 1 when t_1 = t_2) the CARE is the geometric mean of the roots of the product matrix, raised to the same power 1/2\delta that appears in the Pitman ARE. We also note that the CARE is equal to the ratio of the two geometric means of the roots of the individual matrices. Q.E.D.
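Theorem 2.6.3 is easy to verify numerically: the geometric mean of the roots of the product matrix equals the q-th root of the determinant ratio, and it necessarily lies between the Courant bounds of the Pitman ARE. An illustrative numpy sketch (our matrices):

```python
import numpy as np

rng = np.random.default_rng(4)

def make_pd(q):
    """Illustrative q x q positive definite matrix."""
    B = rng.standard_normal((q, q))
    return B @ B.T + q * np.eye(q)

q = 4
M1, M2 = make_pd(q), make_pd(q)   # stand-ins for D(i)' Sigma(i)^{-1} D(i), t1 = t2

roots = np.linalg.eigvals(np.linalg.inv(M1) @ M2).real
geo_mean = roots.prod() ** (1.0 / q)
det_ratio = (np.linalg.det(M2) / np.linalg.det(M1)) ** (1.0 / q)

assert np.isclose(geo_mean, det_ratio)         # the CARE computed two ways
assert roots.min() <= geo_mean <= roots.max()  # it lies within the Pitman bounds
```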
Thus the efficiency using the curvature provides us with a typical or average efficiency of our tests over all \lambda \ne 0, and in the case t_1 = t_2 it considerably simplifies the Pitman ARE, since we have an alternative to simply placing bounds on the efficiency. Theorem 2.6.3 is a generalization of Bickel's (1965) result for the multivariate one-sample location problem with q = t_1 = t_2.

We can see that if q \le t_i for one i but q > t_i for the other, the latter power function has zero curvature at the origin while the former has positive curvature, so we could conclude that the former test is unquestionably superior to the other when using this efficiency criterion. If on the other hand both t_i < q (i = 1,2), then both curvatures are zero and this method of comparison is not able to discriminate between the tests. In this latter case we consider an alternative measure of ARE in a later section.
Returning to the conditions of Theorem 2.6.2, let us define an equi-power contour as the set of \lambda such that P_i(\lambda) = \alpha + c (0 < c < 1 - \alpha), where c is arbitrary. Consider the power function as written in equation (2.3.3) at the end of section 2.3.3; with M_i = 1 we find that an equi-power contour satisfies:

    \alpha + \frac{1}{2}\,\lambda'\Bigl[\frac{\partial^2 P_i(\lambda)}{\partial\lambda\,\partial\lambda'}\Big|_{\lambda=0}\Bigr]\lambda + \sum_{r=2}^{\infty}\sum_{\ell_1}\cdots\sum_{\ell_{2r}}\frac{\lambda_{\ell_1}\cdots\lambda_{\ell_{2r}}}{(2r)!}\,\frac{\partial^{2r}P_i(\lambda)}{\partial\lambda_{\ell_1}\cdots\partial\lambda_{\ell_{2r}}}\Big|_{\lambda=0} = \alpha + c.

Now if we choose the \lambda_\ell sufficiently small so that we can ignore infinitesimals in \lambda of order 4 and higher, we see that the equi-power contour reduces to

    \lambda'\,A_i\,\lambda = c, \qquad A_i = \frac{1}{2}\,\frac{\partial^2 P_i(\lambda)}{\partial\lambda\,\partial\lambda'}\Big|_{\lambda=0}.

The matrix in this equation is positive definite when q \le t_i, and hence this contour is an ellipsoid. Now the volume of this ellipsoid is given by

    V = \int_{\lambda'A_i\lambda \le c}\prod_{\ell=1}^{q} d\lambda_\ell.

Now A_i is symmetric positive definite, so there exists an orthogonal matrix B such that B'A_iB = R = diag(r_{11},\ldots,r_{qq}), the r_{jj} being the roots of A_i. Let C = BR^{-1/2} and let \lambda = C\lambda^*; since

    C'A_iC = R^{-1/2}B'A_iBR^{-1/2} = I,

and the Jacobian of the transformation is |C| = |BR^{-1/2}| = |A_i|^{-1/2}, we obtain V = |A_i|^{-1/2}\,S_q(c), where S_q(c) is the volume of the q-dimensional sphere of radius \sqrt{c}. The volume of the equi-power contour is thus inversely proportional to the square root of the generalized curvature of the power function at the origin. We see that to increase the curvature is to decrease the volume of a certain infinitesimal ellipsoid.
In comparing the power functions of test 2 and test 1 along an equi-power contour, we see that if CARE > 1 then the power function of test 2 encloses an ellipsoid of smaller volume than test 1 along the same contour; intuitively the second test satisfies an attractive property of faster average growth locally than test 1.

Under the assumptions of Theorem 2.6.2 both tests are unbiased, and we see that if CARE > 1 then test 2 is more nearly optimum in the type D sense described by Isaacson (1951). If CARE > 1 we may say that test 2 has faster average growth locally when compared to test 1, but we should keep in mind that there may be some directions for which test 2 has lower power than test 1. This type of deficiency is almost certain to exist in any single quantity which attempts to measure multivariate efficiency.
If we consider the case where M_1 = M_2 = M with M > 1, neglecting terms in \lambda of power greater than 2M, the equi-power contour of each power function reduces to

    \sum_{\ell_1}\cdots\sum_{\ell_{2M}}\frac{\lambda_{\ell_1}\cdots\lambda_{\ell_{2M}}}{(2M)!}\,\frac{\partial^{2M}P_i(\lambda)}{\partial\lambda_{\ell_1}\cdots\partial\lambda_{\ell_{2M}}}\Big|_{\lambda=0} = c \quad\text{for } c \in (0,\ 1-\alpha].

For general values of M it is not clear to me what type of contour this is, or what conditions (similar to the positive definiteness assumed when M = 1) are needed on the derivatives to ensure a region which encloses an interior. Consequently attempts to generalize the arguments on minimizing the volume of ellipsoids fail. One should anticipate problems in handling this for general M, since even sufficient conditions to ensure a relative minimum of the power function at \lambda = 0 are not generally available. A sufficient condition for M = 1 is, of course, the positive definiteness of the 2nd partial derivative matrix. The tests considered in this work have M = 1, so we discuss the case M > 1 no longer.
2.7 Trace Asymptotic Relative Efficiency (TARE)

Again confining ourselves to the case M₁ = M₂ = 1, we propose another measure of ARE whose range of applicability extends to a wider class of problems than does the CARE. The criterion we propose is as follows:

Definition: The trace asymptotic relative efficiency (TARE) of test φ₂ with respect to φ₁ is lim N₁/N₂ as N₁ → ∞, where {N₁} and {N₂} are chosen such that the two tests have the same limiting average power locally over the unit sphere through the same sequence of alternative hypotheses.

Again as in Section 2.6 we define locally to involve the terms up to the 2M-th derivatives in the limiting power function, and we assume negligible terms beyond the 2M-th derivatives. We expect this ARE to be an average efficiency, since we are taking the average local power over all possible directions with respect to the surface area element on the q-dimensional sphere. When M₁ = M₂ = 1 we can write

    P_i(λ) = α + ½ [P_i(t_i + 2, 0) - α] λ' D^(i)' Σ^(i)⁻¹ D^(i) λ + O(λ⁴_max).

So ignoring higher order terms we obtain a local power of

(2.7.1)    P_i(λ) = α + ½ [P_i(t_i + 2, 0) - α] λ' D^(i)' Σ^(i)⁻¹ D^(i) λ.

Let ∫_{K_q(R)} f(λ) dλ denote the surface integral of the function f(λ) over the surface K_q(R). Then we obtain the following theorem for the TARE of test 2 to test 1.

Theorem 2.7.1. If {Q_N^(1)} and {Q_N^(2)} satisfy the conditions of Theorem 2.2.1 and M₁ = M₂ = 1, then

    TARE = { [P₂(t₂+2, 0) - α] tr(D^(2)' Σ^(2)⁻¹ D^(2)) } / { [P₁(t₁+2, 0) - α] tr(D^(1)' Σ^(1)⁻¹ D^(1)) }.
Proof: We are required to choose {N₁} and {N₂} such that the two tests have the same average local power over the surface ||λ|| = 1; we must choose the sample sizes to guarantee this condition through the sequence

    H_N: θ_N = θ₀ + N^{-1/2} λ.

By (2.2.21),

    L(Q_{N_i}^(i) | H_N) → χ²(t_i, (N_i/N) Δ_i)   as N → ∞,

where P_i is defined in (2.2.20) and Δ_i = λ' D^(i)' Σ^(i)⁻¹ D^(i) λ. So by (2.7.1) we require that

(2.7.2)    [P₁(t₁+2,0) - α] (N₁/N) ∫_{||λ||=1} λ' D^(1)' Σ^(1)⁻¹ D^(1) λ dλ = [P₂(t₂+2,0) - α] (N₂/N) ∫_{||λ||=1} λ' D^(2)' Σ^(2)⁻¹ D^(2) λ dλ.

But by a result in differential geometry [Weyl (1939)],

(2.7.3)    ∫_{||λ||=1} λ' B λ dλ = (tr B / q) Area(S^{q-1}),

so (2.7.2) becomes

    [P₁(t₁+2,0) - α] (N₁/N) tr(D^(1)' Σ^(1)⁻¹ D^(1)) = [P₂(t₂+2,0) - α] (N₂/N) tr(D^(2)' Σ^(2)⁻¹ D^(2)).

Recalling how the P_i are defined, we see the result follows. Q.E.D.
The trace of a matrix is equal to the sum of its roots, so using this definition of ARE we find that we get a quantity whose second factor is the ratio of the arithmetic means of the roots. Hence, if we had defined the TARE as the limiting ratio of sample sizes such that the tests have the same arithmetic mean of the roots of the 2nd partial derivative matrix at the origin, then we would have arrived at the same result. We see that if TARE > 1 then test 2 is more nearly optimum in the sense of Wald (1943) for sufficiently small λ, since it has greater average local power over the unit sphere.
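The second (noncentrality) factor of the TARE is the ratio tr(D^(2)'Σ^(2)⁻¹D^(2)) / tr(D^(1)'Σ^(1)⁻¹D^(1)). A minimal sketch for the 2×2 case (helper names are mine):

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mmul(a, b):
    """Plain matrix product for nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def tr(m):
    return sum(m[i][i] for i in range(len(m)))

def tare_second_factor(d2, sig2, d1, sig1):
    """tr(D2' Sig2^-1 D2) / tr(D1' Sig1^-1 D1): the noncentrality part of
    the TARE; the degrees-of-freedom factor is computed separately."""
    def t(d, s):
        dt = [[d[i][j] for i in range(len(d))] for j in range(len(d[0]))]
        return tr(mmul(mmul(dt, inv2(s)), d))
    return t(d2, sig2) / t(d1, sig1)
```

With D^(1) = D^(2) = I and Σ^(2) = 2Σ^(1), for example, the factor is 1/2, reflecting the doubled dispersion of the second test.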
2.8 Bahadur Efficiency

To be consistent with the notation in Chapter I, we let Ω be the parameter set consisting of the values of θ, and let Ω₀ be that subset of the parameter set consisting of the single point θ₀. We consider, in the Bahadur method of comparison, a fixed alternative hypothesis,

    H_a: θ = θ_a ∈ Ω - Ω₀;

the statistics we compare are {(Q_N^(i))^{1/2}}, i = 1,2. We give sufficient conditions under which {(Q_N^(i))^{1/2}} is a standard sequence for testing H₀.
Theorem 2.8.1. Suppose that to test the hypothesis θ = θ₀ we have the two sequences of test statistics, {Q_N^(1)} and {Q_N^(2)}, satisfying the following three conditions:

(2.8.1) a) Q_N^(i) has a central χ² distribution with t_i degrees of freedom as N → ∞ when θ = θ₀;

(2.8.2) b) (T_N^(i) - μ_N^(i)(θ₀)) → η^(i)(θ) a.s. as N → ∞ for all θ ∈ Ω - Ω₀, where η^(i)(θ) is a fixed non-null vector of t_i components;

(2.8.3) c) Σ̂_N^(i) → Σ^(i)(θ) a.s. as N → ∞ for θ ∈ Ω - Ω₀, where Σ^(i)(θ) is positive definite.

Then the sequence {(Q_N^(i))^{1/2}} is a standard sequence for testing H₀.

Proof: We must verify the three conditions for a standard sequence as indicated in the definition in Chapter I.

I. L((Q_N^(i))^{1/2}) → χ(t_i), where χ(t_i) is a chi distribution with t_i degrees of freedom, for θ ∈ Ω₀, since (Q_N^(i))^{1/2} is a continuous function of Q_N^(i). Hence we see there exists a continuous distribution function F_i(χ) such that

    lim_{N→∞} Pr[(Q_N^(i))^{1/2} ≤ χ] = F_i(χ)   for all χ and θ ∈ Ω₀.

II. For each χ and θ ∈ Ω₀ we notice that

    1 - F_i(χ) = Pr[χ²(t_i) ≥ χ²] = (2^{t_i/2} Γ(t_i/2))⁻¹ ∫_{χ²}^∞ e^{-z/2} z^{(t_i-2)/2} dz.

Integrating by parts, let u = z^{(t_i-2)/2} and dv = e^{-z/2} dz, so that du = ((t_i-2)/2) z^{(t_i-4)/2} dz and v = -2e^{-z/2}. So for each χ,

(2.8.4)    1 - F_i(χ) = (2^{t_i/2} Γ(t_i/2))⁻¹ [2 e^{-χ²/2} χ^{t_i-2} + (t_i-2) ∫_{χ²}^∞ e^{-z/2} z^{(t_i-4)/2} dz].

Let w = z/2; then (2.8.4) becomes

(2.8.5)    1 - F_i(χ) = (2^{t_i/2} Γ(t_i/2))⁻¹ [2 e^{-χ²/2} χ^{t_i-2} + (t_i-2) 2^{(t_i-2)/2} ∫_{χ²/2}^∞ e^{-w} w^{(t_i-2)/2 - 1} dw].

Now we know that

    ∫_{χ²/2}^∞ e^{-w} w^{(t_i-2)/2 - 1} dw = O(e^{-χ²/2} (χ²/2)^{(t_i-2)/2 - 1})   as χ → ∞;

therefore as χ → ∞, (2.8.5) is

    1 - F_i(χ) = (2^{t_i/2} Γ(t_i/2))⁻¹ 2 [e^{-χ²/2} χ^{t_i-2} + (t_i-2) O(e^{-χ²/2} χ^{t_i-4})]
               = e^{-χ²/2} χ^{t_i-2} 2 (2^{t_i/2} Γ(t_i/2))⁻¹ [1 + o(1)]   as χ → ∞.

Hence

    -log_e(1 - F_i(χ)) = χ²/2 - (t_i-2) log_e χ - log_e c_{t_i} + o(1)   as χ → ∞,

where c_{t_i} = 2(2^{t_i/2} Γ(t_i/2))⁻¹, and we see

(2.8.6)    log_e(1 - F_i(χ)) = -(χ²/2)[1 - 2(t_i-2) log_e χ / χ² + o(1)] = -(χ²/2)[1 + o(1)]   as χ → ∞.

So condition II required for a standard sequence is satisfied for (Q_N^(i))^{1/2} with a_i = 1.

III. Since (T_N^(i) - μ_N^(i)(θ₀)) → η^(i)(θ) a.s. for all θ ∈ Ω - Ω₀ as N → ∞, and Σ̂_N^(i) → Σ^(i)(θ) a.s. as N → ∞ for all θ ∈ Ω - Ω₀, therefore we see that

    N⁻¹ Q_N^(i) → η^(i)(θ)' Σ^(i)(θ)⁻¹ η^(i)(θ)   a.s. as N → ∞.

As a result, we have

    N^{-1/2} (Q_N^(i))^{1/2} → [η^(i)(θ)' Σ^(i)(θ)⁻¹ η^(i)(θ)]^{1/2}   a.s. as N → ∞.

So if η^(i)(θ) ≠ 0 for θ ∈ Ω - Ω₀, then by the assumed positive definiteness of Σ^(i)(θ) for all θ, the result of the theorem follows. Q.E.D.
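The tail behaviour used in condition II can be checked numerically for even degrees of freedom, where 1 - F_i(χ) has the closed form e^{-χ²/2} ∑_{k<t/2} (χ²/2)^k/k!; the ratio -log_e(1 - F_i(χ))/(χ²/2) should approach 1 as χ grows. A small sketch (function names are mine):

```python
import math

def chi2_sf_even(x, t):
    """P[chi-square(t) > x] for even df t, via the closed form
    exp(-x/2) * sum_{k=0}^{t/2-1} (x/2)^k / k!."""
    assert t % 2 == 0 and t >= 2
    return math.exp(-x / 2) * sum((x / 2) ** k / math.factorial(k)
                                  for k in range(t // 2))

def tail_ratio(chi, t):
    """-log(1 - F_i(chi)) divided by chi^2/2 for the chi statistic sqrt(Q);
    computed on the log scale to avoid underflow for large chi."""
    x = chi * chi
    log_sum = math.log(sum((x / 2) ** k / math.factorial(k)
                           for k in range(t // 2)))
    return (x / 2 - log_sum) / (x / 2)
```

The slow (logarithmic) approach of this ratio to 1 is exactly why the Bahadur slope ends up insensitive to the degrees of freedom t_i.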
As a result, if we assume that our statistics satisfy the conditions of Theorem 2.8.1, which they do for all the applications considered, then the Bahadur efficiency of φ₂ with respect to φ₁ is simply

(2.8.7)    e^B_{2,1}(θ) = η^(2)(θ)' Σ^(2)(θ)⁻¹ η^(2)(θ) / η^(1)(θ)' Σ^(1)(θ)⁻¹ η^(1)(θ)   for θ ∈ Ω - Ω₀.

We notice that (2.8.7) depends on the parameter θ in a most complicated fashion; primarily we notice that the covariance Σ^(i)(θ) depends on the fixed alternative value chosen. For some problems Σ^(i)(θ) would not of course depend on θ; for example, in the standard multivariate one sample problem when the data represent observations from a normal population with specified location θ₀ under the null hypothesis and location θ for the alternatives. For other problems this dependence of the covariance on θ is real; for example, if in the above problems we consider nonparametric tests constructed under an appropriate invariance structure, we would find that Σ^(i)(θ) would depend on θ. So we see this dependence of the matrix Σ^(i)(θ) on θ creates a problem in the evaluation of this efficiency. We should also notice by (2.8.6) that the measure of efficiency, which was constructed for large χ, is not sensitive to the degrees of freedom of the test, as the dominant terms in (2.8.6) do not involve t_i as χ → ∞. For these reasons we find the Bahadur criterion an unattractive measure of efficiency for our problem.

Bahadur (1960) established sufficient conditions under which the Pitman ARE is equal to the limiting value of the Bahadur efficiency when the univariate parameter θ approaches the null value θ₀. We now establish a relationship between the limiting Bahadur efficiency as θ → θ₀ and the LARE which we proposed earlier.
Theorem 2.8.2. If for testing H₀: θ = θ₀ we have the two sequences of test statistics {Q_N^(i)}, i = 1,2, each sequence satisfying the conditions of Theorem 2.8.1, and

(2.8.8) a) μ_N^(i)(θ) has first partial derivatives with respect to the elements of θ which are uniformly continuous in a small open neighborhood of θ₀, for every N;

(2.8.9) b) lim_{N→∞} ∂μ_N^(i)(θ₀)/∂θ = D^(i) exists;

(2.8.10) c) Σ^(i)(θ) Σ^(i)(θ₀)⁻¹ → I as θ → θ₀;

(2.8.11) d) η^(i)(θ) = lim_{N→∞} (μ_N^(i)(θ) - μ_N^(i)(θ₀));

then

(2.8.12)    e^B_{2,1}(θ) = { (θ-θ₀)' D^(2)' Σ^(2)(θ₀)⁻¹ D^(2) (θ-θ₀) [1 + o(1)] } / { (θ-θ₀)' D^(1)' Σ^(1)(θ₀)⁻¹ D^(1) (θ-θ₀) [1 + o(1)] }   as θ → θ₀,

where D^(i) is defined by (2.8.9).
Proof: By Theorem 2.8.1,

    e^B_{2,1}(θ) = η^(2)(θ)' Σ^(2)(θ)⁻¹ η^(2)(θ) / η^(1)(θ)' Σ^(1)(θ)⁻¹ η^(1)(θ)   for every θ ∈ Ω - Ω₀.

But by (2.8.11) we have for every θ ∈ Ω - Ω₀

(2.8.13)    η^(i)(θ)' Σ^(i)(θ)⁻¹ η^(i)(θ) = lim_{N→∞} (μ_N^(i)(θ) - μ_N^(i)(θ₀))' Σ^(i)(θ)⁻¹ (μ_N^(i)(θ) - μ_N^(i)(θ₀)),

and by (2.8.10), Σ^(i)(θ)⁻¹ = Σ^(i)(θ₀)⁻¹ [1 + o(1)] as θ → θ₀. So for each i we see that for θ close to θ₀ we have, for each N, by the mean value theorem,

(2.8.14)    (μ_N^(i)(θ) - μ_N^(i)(θ₀))' Σ^(i)(θ₀)⁻¹ (μ_N^(i)(θ) - μ_N^(i)(θ₀)) = (θ-θ₀)' [∂μ_N^(i)(θ*)/∂θ]' Σ^(i)(θ₀)⁻¹ [∂μ_N^(i)(θ*)/∂θ] (θ-θ₀),

where θ* lies between θ and θ₀. Now choose θ such that θ* falls in the neighborhood of (2.8.8); but by (2.8.8),

    ∂μ_N^(i)(θ*)/∂θ = [∂μ_N^(i)(θ₀)/∂θ] [1 + o(1)]   as θ → θ₀.

So we see that (2.8.14) becomes for each i

(2.8.15)    (θ-θ₀)' [∂μ_N^(i)(θ₀)/∂θ]' Σ^(i)(θ₀)⁻¹ [∂μ_N^(i)(θ₀)/∂θ] (θ-θ₀) [1 + o(1)]   as θ → θ₀.

Now this is true for each N; hence, letting N → ∞ and using (2.8.9), the result (2.8.12) follows. Q.E.D.
We notice that if we let λ = (θ - θ₀), then λ → 0 as θ → θ₀, and we see that the quantity (2.8.12) is similar to the LARE. In short, if the {Q_N^(i)} satisfied the conditions of Theorem 2.2.1 with M = 1, we see that the LARE and the limiting Bahadur efficiency we have considered here would be nearly identical. The limiting Bahadur efficiency in (2.8.12), even though simple to compute, does not account for differences in degrees of freedom.
CHAPTER III
APPLICATION OF TARE AND CARE TO THE ONE SAMPLE
GROWTH CURVE PROBLEM
3.1 Introduction

In this chapter we apply the measures of ARE proposed in Chapter II to evaluate the efficiencies of some common procedures used in the study of polynomial growth curve models. The general statistical model, along with its reduction for the one sample growth curve model, is given in Section 3.2. While the results of this chapter are easily extended for other hypotheses, the hypothesis we study is the hypothesis of a constant growth curve over time. In Section 3.4 we discuss the reduction of the basic data to estimates of the assumed model. Once a given model has been decided upon, a common procedure to follow in practice is to estimate the growth curve parameters by unweighted least squares. This basic reduction is assumed throughout this chapter. After obtaining this set of summary statistics for each observation vector, we apply the Hotelling T² test to test the null hypothesis. We then evaluate the TARE and CARE of an incorrect specification of the model to the correct specification of the number of parameters. Similar results are obtained in Section 3.5 for the corresponding nonparametric rank scores procedure. A brief summary of the one sample rank scores procedure is presented in Section 3.5.2. Section 3.6 is devoted to a comparison of the parametric and nonparametric procedures based on the trace and curvature criteria of ARE. Bounds for the TARE are derived similar to those available for the CARE. The chapter is concluded with a brief section on the use of covariance adjustment of the statistics with the higher order polynomial coefficients.
3.2 The Statistical Model

The model given in this section is sufficiently general to encompass the c-sample problem and more complicated designs, and will be used in a later chapter on the c-sample problem. We consider two index sets, I and T. We notice that I contains K₁K₂···K_m distinct points, and we may think of I as specifying the design across individuals; T is the set of h distinct time points t₁,...,t_h. Corresponding to each i ∈ I and t_ℓ ∈ T we have a set of n(i) random vectors of b elements, namely

(3.2.3)    X_S(i, t_ℓ)  (1×b),   S = 1,2,...,n(i).

In addition, let X_S(i) be the h×b matrix whose ℓ-th row is X_S(i, t_ℓ):

(3.2.4)    X_S(i) = (X_S(i,t₁)', ..., X_S(i,t_h)')',   S = 1,2,...,n(i).
We assume that {X_S(i)} is a collection of independent stochastic matrices from a bh-variate continuous distribution function G(X; i). For the normal theory we would assume

    X_S(i) ~ N(M(i), Σ),   S = 1,...,n(i),  i ∈ I,

where the location matrix is M(i) and the bh×bh covariance matrix Σ does not depend on i; that is, the distribution of X_S(i) - M(i) does not depend on i. To eliminate the normality assumption, in the nonparametric approach we assume that G(X; i) depends on i only through the location M(i); in this way X_S(i) - M(i) is distributed independently of i.
For the growth curve model one assumes that the μ(i, t_ℓ) can be written as functions of certain parameters θ(i) and the time point t_ℓ. For example we may assume that

(3.2.7)    μ_j(i, t_ℓ) = Y_j(θ_j(i), t_ℓ),   i ∈ I,  1 ≤ j ≤ b,  1 ≤ ℓ ≤ h,

where θ_j(i) is a vector of r elements (r ≤ h). Since the θ_j(i) are assumed to be unknown, we can consider problems in testing hypotheses concerning the θ_j(i) and the estimation of the θ_j(i). If we let

(3.2.8)    θ(i) = (θ₁(i), ..., θ_b(i))  (r×b),   i ∈ I,

then we see that we have a reparameterization from M(i) to θ(i) which is dimension-reducing. In this chapter we are interested in evaluating the loss in the incorrect specification of θ(i) for the one sample problem.
We shall consider only polynomial functions of t_ℓ, where the θ_j(i) are the coefficients of these polynomials. To be explicit, we require the functions Y_j defined by (3.2.7) to be polynomials in t_ℓ. The set T corresponds to h distinct abscissae, and it is well known that corresponding to this set T there exists a unique set of h orthogonal polynomials P₀(t), ..., P_{h-1}(t) satisfying the conditions:

(3.2.9)    ∑_{j=1}^h P_i(t_j) P_K(t_j) = 0 if i ≠ K,  = 1 if i = K,

where P_i(t) is a polynomial of exactly degree i in t. We denote by the vector p_i the values of P_i(t_ℓ) for each t_ℓ, ℓ = 1, ..., h; that is,

(3.2.10)    p_i = (P_i(t₁), ..., P_i(t_h))'  (h×1),   i = 0,1,...,h-1.

We shall consider in this study only the case b = 1. We observe that for the one sample problem I consists of only one element, so we omit the subscript i in the remaining discussion of the one sample case.
In the one sample problem with b = 1 we have observations

    X_S  (h×1),   S = 1,2,...,n,

which are independent and identically distributed as G(X).
3.3 Comments on Data Reduction

Since we consider only polynomial models, we see that μ(t_ℓ) can be any of the following:

(3.3.1)    μ(t_ℓ) = α₀P₀(t_ℓ) + α₁P₁(t_ℓ) + ··· + α_{h-1}P_{h-1}(t_ℓ),

(3.3.2)    μ(t_ℓ) = α₀P₀(t_ℓ) + α₁P₁(t_ℓ) + ··· + α_qP_q(t_ℓ),   1 < q < h-1,

(3.3.3)    μ(t_ℓ) = α₀P₀(t_ℓ) + α₁P₁(t_ℓ).

With each of these models one could construct a test of the hypothesis that the growth curve is constant with respect to t_ℓ. For example, for the model (3.3.1) the appropriate hypothesis to test is

(3.3.4)    H₀: α₁ = α₂ = ··· = α_{h-1} = 0,   α₀ unspecified,

while under the assumption of model (3.3.3) the appropriate hypothesis to test is:

(3.3.5)    H₀: α₁ = 0,   α₀ unspecified.

Given the maximum degree of the model that is assumed, we can determine which parameters to test for equality to zero. Once it has been decided how many parameters we wish to estimate, we then have the
problem of reducing each observation vector of h elements to a vector of fewer elements which represents the estimates for the individual's growth curve. One reduction technique is the method of unweighted least squares, which amounts to choosing for each individual the vector a which minimizes the quantity

(3.3.6)    (X_S - Ba)'(X_S - Ba),

where Ba is the assumed growth curve model. The estimate for each X_S is denoted by Y_S and is equal to

    Y_S = (B'B)⁻¹ B' X_S.

Now if B is the matrix of orthogonal polynomials satisfying (3.2.9), then

(3.3.7)    Y_S = B' X_S.

Under the assumptions of the Gauss-Markov theorem this is the minimum variance linear unbiased estimator for a if the covariance matrix of X_S (assumed to exist) is equal to σ²I (h×h). When this covariance pattern is incorrect, the estimator Y_S is not the best linear unbiased estimator (BLUE), which in this case is given by the method of generalized least squares. In the absence of information on the true covariance Σ, the estimator (3.3.7) is the estimator suggested by several authors [e.g. Potthoff and Roy, 1964]. Other estimates have also been proposed which introduce stochastic weights [e.g. Grizzle and Allen, 1969]; however, we explore estimators of the form (3.3.7) since these are also used frequently in practice.
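A small sketch of this reduction (function names are mine): build the orthonormal polynomial matrix B of (3.2.9) by Gram-Schmidt on the powers of the design points, then apply (3.3.7):

```python
def ortho_poly_matrix(tpts, deg):
    """Columns p_0,...,p_deg of orthonormal polynomial values over the
    design points tpts, satisfying sum_j P_i(t_j) P_K(t_j) = delta_iK,
    as in (3.2.9)."""
    cols = []
    for d in range(deg + 1):
        v = [t ** d for t in tpts]
        for c in cols:  # Gram-Schmidt: remove components along earlier columns
            proj = sum(vi * ci for vi, ci in zip(v, c))
            v = [vi - proj * ci for vi, ci in zip(v, c)]
        norm = sum(vi * vi for vi in v) ** 0.5
        cols.append([vi / norm for vi in v])
    return cols  # cols[i][l] = P_i(t_l)

def reduce_observation(x, cols):
    """Unweighted least squares reduction (3.3.7): Y_S = B' X_S."""
    return [sum(c[l] * x[l] for l in range(len(x))) for c in cols]
```

Because the columns are orthonormal, an observation generated exactly by a low-degree model is reduced to its coefficients, with zeros in the higher-degree positions.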
The estimators needed to test the null hypothesis would not involve the first element of Y_S, since this estimates α₀. The estimators for testing the null hypothesis are:

(3.3.8)    Y_{Si} = B_i' X_S,   S = 1,...,n,

where the columns of B_i correspond to the orthogonal polynomials p₁, ..., p_i to be used to test that α₁, α₂, ..., α_i are zero. We now consider the standard parametric analysis which is applied to these reduced data.
3.4 Parametric Procedures

In the standard parametric analysis we assume that G(X) is N(μ, Σ); however, if one assumes that G(X) has finite moments up to order 2 + δ for some δ > 0, then the parametric procedure in large samples will still yield the same probability distributions. To be more explicit, we suppose that G(X) is an h-variate continuous distribution function with location vector μ and dispersion Σ (h×h), positive definite. Consider now the random sample X_S, S = 1,...,n, and define

    X̄_n = n⁻¹ ∑_{S=1}^n X_S,
    S_n = (n-1)⁻¹ [∑_{S=1}^n X_S X_S' - n X̄_n X̄_n'].

By the arguments presented in Puri and Sen (1971, p. 173) we see that S_n → Σ a.s. as n → ∞. In addition, for a sequence of alternative hypotheses H_n: μ_n = n^{-1/2} λ, where λ is fixed and non-null,

    L(n^{1/2} X̄_n | H_n) → N(λ, Σ)   as n → ∞.

Clearly, if we now consider the function defined by

    Y_{Si} = B_i' X_S   for each S = 1,...,n,

then we can apply the same logic to the random variables Y_S to find the distribution of n^{1/2} Ȳ_n for large n. In particular, consider now the special problem of comparing a linear to a quadratic model when the true model is linear.
3.4.1 Parametric Procedure - True Model Linear

If μ(t_ℓ) is linear in t_ℓ, then to test the null hypothesis we consider the following:

(3.4.1)    Y_{S1} = b₁' X_S,   S = 1,...,n,

and if we (incorrectly) tested for the quadratic also, we consider the transformation:

(3.4.2)    Y_{S2} = (b₁, b₂)' X_S,   S = 1,...,n.

Since under the linear model E(X_S) = α₀b₀ + α₁b₁, it follows from the orthogonality (3.2.9) that E(Y_{S1}) = α₁, and its covariance is given by b₁'Σb₁. Similarly E(Y_{S2}) = (α₁, 0)', and the covariance matrix of Y_{S2} is given by

    ( b₁'Σb₁   b₁'Σb₂ )
    ( b₂'Σb₁   b₂'Σb₂ ).

Consider the sequence of alternative hypotheses

(3.4.3)    H_n: α_{1n} = n^{-1/2} λ,   λ ≠ 0.

Then the usual parametric one sample estimates are

    Ȳ_i = n⁻¹ ∑_{S=1}^n Y_{Si},   i = 1,2.

Since G has moments of order 2 + δ (δ > 0), the conditions of Theorem 2.2.1 are satisfied; the first four conditions are obvious and the fifth follows from the Berry-Esseen theorem. Therefore

    Q_n^(1) = n Ȳ₁ (b₁' S_n b₁)⁻¹ Ȳ₁

has a limiting χ²(1, Δ₁) distribution as n → ∞, where Δ₁ = λ²(b₁'Σb₁)⁻¹. In a similar fashion the test based on

    Q_n^(2) = n Ȳ₂' ( b₁'S_nb₁  b₁'S_nb₂ ; b₂'S_nb₁  b₂'S_nb₂ )⁻¹ Ȳ₂

has a limiting χ²(2, Δ₂) as n → ∞, where

    Δ₂ = λ (1, 0) ( b₁'Σb₁  b₁'Σb₂ ; b₂'Σb₁  b₂'Σb₂ )⁻¹ (1, 0)' λ.

This reduces to

    Δ₂ = λ² (b₂'Σb₂) [(b₁'Σb₁)(b₂'Σb₂) - (b₁'Σb₂)(b₂'Σb₁)]⁻¹.

If we let φ₁ be the test based on the correct model and φ₂ the test on the quadratic model, we see that the ARE of φ₂ with respect to φ₁ using trace, curvature or local criteria becomes:

(3.4.4)    ARE = R(1,2,α) (b₁'Σb₁)(b₂'Σb₂) [(b₁'Σb₁)(b₂'Σb₂) - (b₁'Σb₂)²]⁻¹.
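For this q = 1, p = 2 case the two noncentrality parameters can be computed directly; the sketch below (illustrative names and values, mine) also confirms that Δ₂ ≥ Δ₁, the noncentrality gain that the degrees-of-freedom factor must offset:

```python
def quad(b, sig, c):
    """b' Sig c for vectors b, c and matrix Sig (plain lists)."""
    return sum(b[i] * sig[i][j] * c[j]
               for i in range(len(b)) for j in range(len(c)))

def noncentralities(lam, b1, b2, sig):
    """Delta_1 for the correct (linear) test and Delta_2 for the overfit
    (linear plus quadratic) test, following the scalar reductions above."""
    s11, s12, s22 = quad(b1, sig, b1), quad(b1, sig, b2), quad(b2, sig, b2)
    d1 = lam ** 2 / s11
    d2 = lam ** 2 * s22 / (s11 * s22 - s12 ** 2)  # partitioned-inverse form
    return d1, d2
```

Note that d2 equals lam**2 / (s11 - s12**2 / s22), the Schur-complement form, so Δ₂ = Δ₁ exactly when b₁'Σb₂ = 0.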
Continuing with the parametric procedure, let us derive the efficiency for the general case of overfitting the model.

3.4.2 Parametric Procedures - General Case of Overfitting

Suppose that the correct model is:

(3.4.5)    μ(t_ℓ) = α₀P₀(t_ℓ) + α₁P₁(t_ℓ) + ··· + α_qP_q(t_ℓ).

Make the transformation

(3.4.6)    Y_{S1} = B₁' X_S,   S = 1,...,n,   where B₁ = (b₁, ..., b_q).

Now we see that under the model (3.4.5), E(Y_{S1}) = (α₁, ..., α_q)'. Consider now the sequence of alternative hypotheses

    H_n: α_{1n} = n^{-1/2}λ₁, ..., α_{qn} = n^{-1/2}λ_q,   λ ≠ 0.

Then, similarly to the previous section, we see that

    Q_n^(1) = n Ȳ₁' (B₁' S_n B₁)⁻¹ Ȳ₁

will have a χ²(q, Δ₁) distribution as n → ∞, where Δ₁ = λ'(B₁'ΣB₁)⁻¹λ. Suppose now we incorrectly assumed that p (q < p ≤ h-1) parameters beyond α₀ were needed. Denote

    B₂ = (b_{q+1}, ..., b_p)

and the accompanying transformation to Y_{S2} by

    Y_{S2} = (B₁, B₂)' X_S,   S = 1,2,...,n.

We see that E(Y_{S2}) = (α₁, ..., α_q, 0, ..., 0)', and the covariance of Y_{S2} is

    ( B₁'ΣB₁   B₁'ΣB₂ )
    ( B₂'ΣB₁   B₂'ΣB₂ ).

We observe that in the notation of Theorem 2.2.1,

    μ_n(α₁, ..., α_q) = (α₁, ..., α_q, 0, ..., 0)',

thus

    D^(2) = ( I_{q×q} ; 0_{(p-q)×q} )

in the notation of Theorem 2.2.1. Clearly, the limiting distribution of Q_n^(2) is χ²(p, Δ₂) as n → ∞, where

    Q_n^(2) = n Ȳ₂' ( B₁'S_nB₁  B₁'S_nB₂ ; B₂'S_nB₁  B₂'S_nB₂ )⁻¹ Ȳ₂

and

    Δ₂ = λ' D^(2)' ( B₁'ΣB₁  B₁'ΣB₂ ; B₂'ΣB₁  B₂'ΣB₂ )⁻¹ D^(2) λ.

By Theorem 8.2.1 of Graybill (1969) we have:

(3.4.7)    ( B₁'ΣB₁  B₁'ΣB₂ ; B₂'ΣB₁  B₂'ΣB₂ )⁻¹ = ( [B₁'ΣB₁ - B₁'ΣB₂(B₂'ΣB₂)⁻¹B₂'ΣB₁]⁻¹  * ; *  * ),

where * indicates other terms which are eliminated by multiplication by D^(2)' and D^(2). Hence using (3.4.7) we obtain

    Δ₂ = λ' [B₁'ΣB₁ - B₁'ΣB₂(B₂'ΣB₂)⁻¹B₂'ΣB₁]⁻¹ λ.

Again designating the test with the correct number of parameters as φ₁ and letting φ₂ be the overfit, we obtain the following efficiencies of φ₂ relative to φ₁:

(3.4.8)    CARE = R(q,p,α) { |B₁'ΣB₁| / |B₁'ΣB₁ - B₁'ΣB₂(B₂'ΣB₂)⁻¹B₂'ΣB₁| }^{1/q},   q ≤ p,

and

(3.4.9)    TARE = R(q,p,α) tr[B₁'ΣB₁ - B₁'ΣB₂(B₂'ΣB₂)⁻¹B₂'ΣB₁]⁻¹ / tr[B₁'ΣB₁]⁻¹,   q ≤ p.

3.4.3 Parametric Procedures - General Case of Underfitting
For this situation we consider the same model (3.4.5) and suppose φ₁ is again based on the test Q_n^(1) of Section 3.4.2, but φ₂ is based on Q_n^(2), a quadratic form in p variables, where p < q. We define Y_{S2} (p×1) by

(3.4.10)    Y_{S2} = B₂' X_S,   each S = 1,...,n,   where here B₂ = (b₁, ..., b_p).

Notice that E(Y_{S2}) = (α₁, ..., α_p)'. In the context of Theorem 2.2.1 we have

    μ_{nj}(α₁, ..., α_q) = α_j,   j = 1,...,p,

and

    D^(2) = ( I_{p×p} : 0_{p×(q-p)} )   (p×q).

The conditions of Theorem 2.2.1 are satisfied again, and the limiting distribution of Q_n^(2), where

    Q_n^(2) = n Ȳ₂' (B₂' S_n B₂)⁻¹ Ȳ₂,

is χ²(p, Δ₂), where Δ₂ = λ' D^(2)' (B₂'ΣB₂)⁻¹ D^(2) λ. As pointed out in Chapter II, this is a situation in which the test has power α in (q-p) principal directions, and hence the generalized Gaussian curvature is zero. We can consider the trace criterion, which yields the TARE of φ₂ with respect to φ₁:

    TARE = R(q,p,α) tr[D^(2)'(B₂'ΣB₂)⁻¹D^(2)] / tr[(B₁'ΣB₁)⁻¹],

which reduces to

(3.4.11)    TARE = R(q,p,α) tr(B₂'ΣB₂)⁻¹ / tr(B₁'ΣB₁)⁻¹   for q > p.

This quantity will also be numerically evaluated for some specific covariance structures in Section 5.3.
3.5 Nonparametric Procedures

When using nonparametric methods for growth curve analysis, two approaches suggest themselves. The first is to estimate each individual's growth curve parameters using some robust procedure and then apply a nonparametric test to these estimates. The second approach is to use a least squares reduction of each observation as in the previous section and then apply a standard nonparametric test to these least squares estimates. The latter approach is taken in this section.

3.5.1 Data Reduction
The observations X_S (h×1), S = 1,...,n, are assumed to have been selected from an h-variate continuous distribution function G(X), diagonally symmetric about μ, where μ(t_ℓ) is represented by one of the models (3.3.1), (3.3.2) or (3.3.3). As we did in the parametric case, we shall first consider the special case where the model is linear and evaluate the efficiency of overfitting this linear model. The tests we compare will be rank order tests applied to Y_{S1} and Y_{S2} defined by (3.4.1) and (3.4.2) respectively.

Since X_S - μ is assumed to be from G, which is continuous and diagonally symmetric about zero, it follows from the symmetry alone that the characteristic function of X_S - μ,

    φ(t) = E(e^{i t'(X_S - μ)}),

is real for all t. If B is any h×q matrix of rank q (q ≤ h) and we define Z_S = B'(X_S - μ), then the characteristic function of Z_S,

    ψ(t) = E(e^{i t' Z_S}) = E(e^{i t' B'(X_S - μ)}) = φ(Bt),

is also real, since φ is real for all t; hence Z_S is diagonally symmetric about 0. If we let H be the distribution function of Z_S, then we can think of Z_S, S = 1,...,n, as a sample from H, and hence, letting Y_S = B'X_S, the Y_S can be assumed to be from a q-variate continuous distribution function F, diagonally symmetric about B'μ. We are now in a position to apply the multivariate one sample tests as outlined in Puri and Sen (1971). We conclude from this section that we have a reduction of the data to a set of data to which we may apply the nonparametric multivariate one sample test procedures. We shall discuss, as in the parametric case, the problem of overfitting a linear growth curve model, but first we briefly summarize the multivariate nonparametric results which we shall use.
3.5.2 One Sample Procedures

Let V_S (q×1), S = 1,...,n, be a random sample from a diagonally symmetric continuous distribution function F. Under the null hypothesis we assume the location vector of V_S is 0, and for large sample study we have the sequence H_n: θ_n = n^{-1/2} λ. We denote by R_n the rank matrix of the absolute values of the V_{jS}; i.e.,

    R_n = ( R₁₁ ··· R₁ₙ ; ... ; R_{q1} ··· R_{qn} ).

Ties can be ignored, at least in theory, by the assumed continuity of F. A transformation of the ranks is made to a matrix of rank scores E_n, where

    E_n = ( E_{n,R₁₁} ··· E_{n,R₁ₙ} ; ... ; E_{n,R_{q1}} ··· E_{n,R_{qn}} ).

The transformation is defined by the score function J_n, defined by

    E_{n,a} = J_n(a/(n+1)),   1 ≤ a ≤ n,

where J_n is required to satisfy the following conditions:

I. lim_{n→∞} J_n(u) = J(u) exists for 0 < u < 1, and J(u) is not constant.

II. ∫_{-∞}^∞ [J_n((n/(n+1)) H_{nj}(X)) - J((n/(n+1)) H_{nj}(X))] dF_{nj}(X) = o_p(n^{-1/2}),   j = 1,2,...,q,

where for each j,

    H_{nj}(X) = n⁻¹ [# of |V_{jS}| ≤ X;  S = 1,2,...,n].

III. |d^i J(u)/du^i| ≤ K[u(1-u)]^{-i-1/2+δ},   i = 0,1,   for some δ > 0.

IV. lim_{n→∞} ∫₀¹ [J_n(u) - J(u)]² du = 0.

Also define c_{ja}, a = 1,...,n, to be 1 or 0 according as V_{ja} > 0 or V_{ja} < 0.

Under the basic sign invariance structure, a class of conditionally distribution free tests for the null hypothesis exists and is characterized in Puri and Sen (1971). We shall use from their text Corollary 4.4.31 for the limiting distribution of n^{1/2}(T_n - γ) through a sequence of alternatives H_n. To summarize their result: if

    T_{nj} = n⁻¹ ∑_{a=1}^n E_{n,R_{ja}} c_{ja}   for j = 1,2,...,q,

and

    γ = ½ ∫₀¹ J(u) du,

then through H_n the vector n^{1/2}(T_n - γ) has a q-variate normal distribution as n → ∞ with mean vector whose j-th component is λ_j A_j(F_j), where

    A_j(F_j) = ∫_{-∞}^∞ (d/dX) J*(F_j(X)) dF_j(X),   j = 1,...,q,

and covariance matrix

    ν = ((σ_{jj'})),   j,j' = 1,...,q,   σ_{jj'} = ∫∫ J*(F_j(X)) J*(F_{j'}(Y)) dF_{jj'}(X,Y).

The additional assumptions have been made that E_{n,a} is the expectation of the a-th smallest observation from a sample of size n drawn from a distribution Ψ_j(X) with

    Ψ_j(X) = 2Ψ_j*(X) - 1   for X ≥ 0,   Ψ_j(X) = 0   for X < 0,   for all X.

So J(0) = 0 and

    J(u) = Ψ_j⁻¹(u) = Ψ_j*⁻¹((1+u)/2),   0 < u < 1,   each j = 1,...,q,

and we assume that f_j(X) J*'(F_j(X)) is bounded as X → ±∞.

We see that we may construct quadratic forms in n^{1/2}(T_n - γ), with ν̂_n a consistent estimate for ν, and test H₀. Alternatively, one may use the test statistic (quadratic form) constructed under the permutation group generated by the sign invariance structure and use these test statistics. In large samples these two statistics are power equivalent through the sequence H_n.
3.5.3 Application of the One Sample Test

For our problem we started with an original h-variate continuous distribution function, diagonally symmetric about Bα, where B is the portion of the orthogonal polynomial matrix needed for the model. To test constancy of the growth curve model we must test

    H₀: α₁ = ··· = α_q = 0,   α₀ unspecified;

hence we transform from X_S to Y_S by (3.3.8). By our previous discussion, Y_S is diagonally symmetric about B₁'Bα. We observe that if B₁ and B are orthogonal polynomial matrices, then

    B₁'  B  = ( 0  :  I ),
    (q×h)(h×(q+1))   (q×1)(q×q)

so this location is (α₁,...,α_q)'. We note that a sufficient condition for Y_S to have a distribution which does not depend on α₀ is that the first column of B₁'B is the zero vector. If in this more general situation we let

    B₁'B = ( 0 : B₁'B₂ ),

then Y_S is diagonally symmetric about B₁'B₂(α₁, ..., α_q)'. Denoting

    θ = B₁'B₂ (α₁, ..., α_q)',

we complete our connection between this problem and the multivariate nonparametric one-sample problem by noting that H₀ implies θ = 0, and the sequence of alternative values n^{-1/2}(λ₁, ..., λ_q)' in terms of θ is

    θ_n = n^{-1/2} B₁'B₂ (λ₁, ..., λ_q)'.

Hence we may test H₀ by testing the hypothesis θ = 0, and the sequence H_n in terms of the test on θ is equal to θ_n. Hence, in the notation of Section 3.5.2, if we consider general rank scores applied to the Y_S, then the test statistics constructed would be noncentral χ² with q degrees of freedom and noncentrality Δ defined by

    Δ = c' ν⁻¹ c,

where c is the vector whose j-th component is A_j(F_j) times the j-th component of B₁'B₂λ, ν is as defined in Section 3.5.2, and F_j is the marginal distribution function. Thus if we let

    T = ((T_{jK})),   T_{jK} = σ_{jK} / (A_j(F_j) A_K(F_K)),   j,K = 1,...,q,

where σ_{jK} is defined in Section 3.5.2, then we see

    Δ = λ' (B₁'B₂)' T⁻¹ (B₁'B₂) λ.

The effect of testing the hypothesis with too few or too many parameters (i.e., underfitting or overfitting respectively) simply changes B₁'B₂ to a non-square matrix. For example, if we had too few statistics, say p (p < q), then B₁'B₂ is p×q, and hence the matrix of the noncentrality parameter is positive semidefinite. We now discuss the linear growth curve model and compute the efficiency of a quadratic assumed model relative to the test with the correct number of parameters.
3.5.4 True Model Linear

If the model (3.3.3) is the true one, then the hypothesis to be tested is (3.3.5) against the sequence of alternatives (3.4.3). Letting Y_{S1} be as in (3.4.1) and Y_{S2} as in (3.4.2), then Y_{S1} can be viewed as a sample from F₁ (a univariate continuous distribution function symmetric about α₁), and Y_{S2} can be viewed as a sample from F₂ (a bivariate continuous distribution function, diagonally symmetric about (α₁, 0)'). On applying rank scores to Y_{S1} and Y_{S2} as described in Section 3.5.2, and letting Q_n^(i) be the quadratic form in the rank statistics of Y_{Si} (i = 1,2), we see that

    L(Q_n^(1)) → χ²(1, Δ₁)  and  L(Q_n^(2)) → χ²(2, Δ₂)   as n → ∞,

where

    Δ₁ = λ² T₁⁻¹  and  Δ₂ = λ² (1, 0) T₂⁻¹ (1, 0)',

with T₂ = ((T_{2,jK})), T_{2,jK} = ν_{2,jK}/(A_j(F_{2,j}) A_K(F_{2,K})), j,K = 1,2, where F_{2,j} is the j-th marginal distribution function of F₂ and

    ν_{2,jK} = ∫∫ J*(F_{2,j}(X)) J*(F_{2,K}(Y)) dF_{2,jK}(X,Y),   j,K = 1,2.

We observe that we have used the same score function for each test procedure and each variate. Obviously we want to use the same score function for the common variates in the competitive tests; however, one may be interested in using a different score function for the extraneous variates (in the case of overfitting). These are other possibilities which could be explored; however, we continue our discussion under the assumption of a common score function. We note that F₁ = F_{2,1} and hence A₁(F₁) = A₁(F_{2,1}), ν₁ = ν_{2,11}, and T₁ = T_{2,11}. The ARE of φ₂ with respect to φ₁, using any of the three criteria (curvature, trace, or local), is:

(3.5.1)    ARE = R(1,2,α) T_{2,11} T_{2,22} (T_{2,11}T_{2,22} - T_{2,12}T_{2,21})⁻¹.
We now consider the general case of overfitting in order to
obtain results comparable to the parametric results.
3.5.5 Nonparametric Procedures - General Case of Overfitting
For the general case of overfitting we let phi_1 be the test which is
based on the rank scores applied to Y_{S1} (q x 1) defined by (3.4.6),
i.e. Y_{S1} = B_1' X_S. In the context of Section 3.5.3 we see that
beta_2 = beta_1; hence the test based on Q_n^(1) (the quadratic form in
T_n^(1)) would have, through the sequence of alternatives considered in
Section 3.5.3, a limiting noncentral chi-square distribution with
noncentrality

    Delta_1 = lambda' T^{-1} lambda,

where T is defined as in Section 3.5.3. On the other hand, if we
tested for an additional (p-q) parameters by reducing X_S to Y_{S2}
(p x 1) by

    Y_{S2} = [B_1' X_S ; B_3' X_S],    S = 1,...,n,  where B_3 = [b_{q+1},...,b_p],

we then see that Q_n^(2), the quadratic form in the rank scores of
Y_{S2}, has, through the same sequence of alternative hypotheses, a
limiting noncentral chi-square distribution with noncentrality

    Delta_2 = lambda' [I_{q x q} : 0_{q x (p-q)}] T_2^{-1} [I_{q x q} : 0_{q x (p-q)}]' lambda.

We denote the null distribution functions of Y_{S1} and Y_{S2} by F_1
and F_2 respectively. The corresponding marginals (bivariate and
univariate) are denoted by F_{1,j}, F_{1,jK} and F_{2,l}, F_{2,lm}, where
j,K = 1,...,q while l,m = 1,...,q,q+1,...,p. It is apparent by the
construction of Y_{S1} and Y_{S2} that F_{1,j} = F_{2,j} and
F_{1,jK} = F_{2,jK} for j,K = 1,2,...,q, and by equality of the score
functions we therefore have that T_1 = T_{2,11}, where T_2 is
partitioned as

    T_2 = [ T_{2,11}   T_{2,12} ]
          [ T_{2,21}   T_{2,22} ]

with T_{2,11} q x q, T_{2,12} q x (p-q), T_{2,21} (p-q) x q, and
T_{2,22} (p-q) x (p-q). With this partition,

    Delta_2 = lambda' (T_{2,11} - T_{2,12} T_{2,22}^{-1} T_{2,21})^{-1} lambda.

Thus the ARE of phi_2 with respect to phi_1 using the curvature
criterion is:

(3.5.5.1)    CARE = R(q,p,alpha) { |T_{2,11}| / |T_{2,11} - T_{2,12} T_{2,22}^{-1} T_{2,21}| }^{1/q}.

Using the trace criterion the ARE is:

(3.5.5.2)    TARE = R(q,p,alpha) tr[T_{2,11} - T_{2,12} T_{2,22}^{-1} T_{2,21}]^{-1} / tr[T_{2,11}]^{-1},    q <= p.
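The matrix factors in (3.5.5.1)-(3.5.5.2) can be illustrated numerically. The sketch below (Python, not part of the original derivation; the matrix T_2 is an arbitrary positive definite stand-in for the rank score covariance) verifies that the determinant and trace factors multiplying R(q,p,alpha) are each at least one, so the efficiency loss from overfitting is carried entirely by the factor R(q,p,alpha).

```python
import numpy as np

def efficiency_factors(T2, q):
    """Second factors of (3.5.5.1)-(3.5.5.2): determinant and trace ratios
    built from the Schur complement T_{2,11} - T_{2,12} T_{2,22}^{-1} T_{2,21}."""
    T11, T12 = T2[:q, :q], T2[:q, q:]
    T21, T22 = T2[q:, :q], T2[q:, q:]
    schur = T11 - T12 @ np.linalg.solve(T22, T21)
    care_factor = (np.linalg.det(T11) / np.linalg.det(schur)) ** (1.0 / q)
    tare_factor = np.trace(np.linalg.inv(schur)) / np.trace(np.linalg.inv(T11))
    return care_factor, tare_factor

# A hypothetical p x p positive definite "rank score" covariance (p = 5, q = 2).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
T2 = A @ A.T + 5 * np.eye(5)          # ensures positive definiteness
care_f, tare_f = efficiency_factors(T2, q=2)
print(care_f >= 1.0, tare_f >= 1.0)
```

Since the Schur complement never exceeds T_{2,11} in the positive definite ordering, both factors are at least one for any such T_2.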
3.5.6 Nonparametric Procedures - General Case of Underfitting
We let phi_1 be the test in the previous section for the same
hypothesis, and we let phi_2 be the corresponding rank test when we have
based the test on only the first p of the (a_1,...,a_q)', where p < q.
Letting

    Y_{S2} = B_1*' X_S  (p x 1),    S = 1,...,n,

we note B_1 = [B_1*, b_{p+1},...,b_q], where B_1* is h x p. From the
previous discussions the quadratic forms constructed in the rank order
statistics of the Y_{S2}, denoted by Q_n^(2), have a limiting
noncentral chi-square distribution with noncentrality

    Delta_2 = lambda' [ T_2^{-1}        0_{p x (q-p)} ] lambda,
                      [ 0_{(q-p) x p}   0             ]

where T_2 (p x p) is defined as T in Section 3.5.3 but computed from
the scores of the Y_{S2}. The ARE of phi_2 with respect to phi_1 using
the trace criterion is therefore:

(3.5.6.1)    TARE = R(q,p,alpha) tr T_2^{-1} / tr T_1^{-1},    p < q.

3.5.7  Reduction for Wilcoxon Signed Rank Test
In the notation of Section 3.5.2 the Wilcoxon score or signed rank
defines

    E_{n,alpha} = J_n(alpha/(n+1)) = alpha/(n+1),    alpha = 1,...,n.

Then T_{jK}, j,K = 1,...,q, is equal to

(3.5.7.1)    T_{jK} = (1/12) [integral f_j(x) dF_j(x)]^{-2},    K = j = 1,...,q,
             T_{jK} = (rho^g_{jK}/12) [integral f_j(x) dF_j(x)]^{-1} [integral f_K(x) dF_K(x)]^{-1},    j != K = 1,...,q,

(all integrals over (-inf, inf)) where F is the distribution function
of Y_S and rho^g_{jK} is the grade correlation, i.e.

    rho^g_{jK} = 12 double-integral [F_j(x) - 1/2][F_K(y) - 1/2] dF_{jK}(x,y),    j,K = 1,...,q.

Further, if G is multivariate normal then F is multivariate normal and

(3.5.7.2)    integral f_j(x) dF_j(x) = [2 (pi gamma_jj)^{1/2}]^{-1},    j = 1,...,q,

where gamma_jj is the variance of the j-th component of Y_S. Denoting
by rho_jK the correlation of the j-th and K-th variates of Y_S we have

    rho^g_{jK} = (6/pi) sin^{-1}(rho_jK / 2).

We note that if we assume that G has a covariance matrix Sigma then
gamma_jj = b_j' Sigma b_j. We see that

(3.5.7.3)    T_{jK} = (pi/3) b_j' Sigma b_j,    j = K = 1,...,q.

In general,

(3.5.7.4)    T_{jK} = (1/12) [4 pi (b_j' Sigma b_j)^{1/2} (b_K' Sigma b_K)^{1/2}] [(6/pi) sin^{-1}(rho_jK / 2)],    j != K,

where

    rho_jK = b_j' Sigma b_K / [(b_j' Sigma b_j)^{1/2} (b_K' Sigma b_K)^{1/2}].
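The grade correlation formula rho^g = (6/pi) sin^{-1}(rho/2) used in this reduction can be verified by direct numerical integration; the following sketch (Python, with an explicitly coded standard bivariate normal density) is an illustration, not part of the original text.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import dblquad

def grade_correlation(rho):
    """12 E[(F1(X) - 1/2)(F2(Y) - 1/2)] for a standard bivariate normal pair."""
    def integrand(y, x):
        q = (x * x - 2 * rho * x * y + y * y) / (2 * (1 - rho ** 2))
        pdf = np.exp(-q) / (2 * np.pi * np.sqrt(1 - rho ** 2))
        return (norm.cdf(x) - 0.5) * (norm.cdf(y) - 0.5) * pdf
    val, _ = dblquad(integrand, -8, 8, lambda x: -8, lambda x: 8)
    return 12 * val

rho = 0.6
g = grade_correlation(rho)
closed = (6 / np.pi) * np.arcsin(rho / 2)
print(abs(g - closed) < 1e-4)
```

The numerical value agrees with the arcsine closed form to the accuracy of the quadrature.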
Obviously, the additional subscript (1 or 2) needs to be added for the
comparison of the two tests.

3.6  Comparison of Nonparametric to Parametric Procedures

In this section we consider a comparison of the nonparametric
procedures available for the one sample problem to the standard
parametric procedure. The results in this section have applications
to the general one sample shift problem, so we shall first present
results for the general one sample problem. We then discuss the
extension of these results to the comparison of the two procedures when
applied to the growth curve problem when
a) the model is correctly specified;
b) the model is overspecified; and
c) the model is underspecified.
3.6.1 One Sample Location Problem
In this case we shall let phi_1 be the test based on the parametric
(T_0^2) test and phi_2 be the nonparametric (rank scores) procedure. We
can have either of the two measures of ARE in this case, the curvature
or the trace; i.e.

(3.6.1)    CARE = { |Sigma| / |T| }^{1/q}

or

(3.6.2)    TARE = tr T^{-1} / tr Sigma^{-1}.

We shall want to place bounds on (3.6.1) and (3.6.2) over suitable
subclasses of the entire class of distribution functions, F*. The
class F*, for the one sample location problem, is the class of
q-variate absolutely continuous distribution functions, diagonally
symmetric about the median, theta, and possessing moments of order
2 + delta for some delta > 0.

Bounds for (3.6.1) are well known since (3.6.1) is nothing more
than the efficiency of the rank scores estimator to the least squares
estimator using Wilks' criterion of asymptotic generalized variance;
this fact was noted by Bickel (1965). Bounds on (3.6.1) over
subclasses in F* are presented in Puri and Sen (1971) and Bickel (1964).
We summarize these results briefly:
If phi_2 is the normal scores estimator and
a) F is q-variate normal, then CARE = 1;
b) F(x) = product_{j=1}^{q} F_j(x_j), then CARE >= 1.

If phi_2 is the Wilcoxon scores estimator, then
a) for all F in F*,

(3.6.3)    CARE >= .864 { |(rho_jK)| / |(rho^g_jK)| }^{1/q},

where (rho_jK) and (rho^g_jK) denote the q x q matrices of ordinary
and grade correlations;
b) if F is q-variate normal then

(3.6.4)    CARE = (3/pi) { |(rho_jK)| / |((6/pi) sin^{-1}(rho_jK/2))| }^{1/q};

c) if F is bivariate normal then

    CARE = (3/pi) [ (1 - rho^2) / (1 - (36/pi^2)(sin^{-1}(rho/2))^2) ]^{1/2},

and if we find the maximum and minimum with respect to rho
(-1 <= rho <= 1) we find .91 <= CARE <= .95, where the lower bound is
reached as |rho| -> 1 and .95 is the value at rho = 0;
d) if F is pairwise independent then (3.6.3) has a lower bound of
.864 and (3.6.4) has a lower bound of 3/pi.
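The bivariate normal bounds in part c) can be reproduced numerically; the sketch below (Python; the function name is ours) sweeps rho over (-1, 1).

```python
import numpy as np

def care_bvn_wilcoxon(rho):
    """Curvature efficiency of the Wilcoxon scores procedure versus least
    squares for bivariate normal F, as in part c)."""
    num = 1 - rho ** 2
    den = 1 - (36 / np.pi ** 2) * np.arcsin(rho / 2) ** 2
    return (3 / np.pi) * np.sqrt(num / den)

rhos = np.linspace(-0.999, 0.999, 2001)
vals = care_bvn_wilcoxon(rhos)
print(round(vals.max(), 2), round(vals.min(), 2))  # 0.95 at rho = 0; 0.91 as |rho| -> 1
```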
We now shall find bounds similar to the above bounds when we use the
trace criterion of efficiency. Let us consider the normal scores test,
i.e. phi_2 is the normal scores procedure. We first note that if F is
q-variate normal then Sigma = T and therefore TARE = 1.
Theorem 3.6.1. If F(x) = product_{j=1}^{q} F_j(x_j), and if F_j(x_j)
has density f_j(x_j) with finite variance sigma_j^2 and
(d/dx) Phi^{-1}(F_j(x)) is bounded as x -> +-inf, then TARE >= 1.

Proof:
Let

    A(F_j) = integral (d/dx) Phi^{-1}(F_j(x)) dF_j(x),    j = 1,2,...,q.

For all j,K = 1,2,...,q we know the form of T_{jK}, but for j != K the
numerator of T_{jK} is zero by the diagonal symmetry of F(x) and the
symmetric nature of the score function Phi^{-1}. Therefore

    T_{jK} = 0,  j != K;    T_{jj} = integral [Phi^{-1}(F_j(x))]^2 dF_j(x) / A^2(F_j),    j = K = 1,2,...,q.

But by the boundedness assumption and general properties of the
normal distribution it has been shown [see e.g. Gastwirth and Wolff
(1968)] that

    A^2(F_j) >= 1/sigma_j^2,    j = 1,2,...,q.

We know that

    integral [Phi^{-1}(F_j(x))]^2 dF_j(x) = 1,    j = 1,2,...,q.

Therefore

    T^{-1} = diag(A^2(F_1), ..., A^2(F_q)),    Sigma^{-1} = diag(1/sigma_1^2, ..., 1/sigma_q^2),

and the result follows.
Q.E.D.
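The inequality A^2(F_j) >= 1/sigma_j^2 invoked in the proof can be checked numerically for a particular nonnormal marginal; the sketch below (Python) evaluates A(F) for a standard logistic marginal, rewriting the integral in terms of u = F(x).

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def A_normal_scores(quantile_density):
    """A(F) = integral (d/dx) Phi^{-1}(F(x)) dF(x), rewritten with u = F(x) as
    the integral over (0,1) of f(F^{-1}(u)) / phi(Phi^{-1}(u))."""
    integrand = lambda u: quantile_density(u) / norm.pdf(norm.ppf(u))
    val, _ = quad(integrand, 0.0, 1.0, limit=200)
    return val

# Logistic marginal with scale s: f(F^{-1}(u)) = u(1-u)/s, variance pi^2 s^2 / 3.
s = 1.0
A = A_normal_scores(lambda u: u * (1.0 - u) / s)
sigma2 = np.pi ** 2 * s ** 2 / 3.0
print(A ** 2 >= 1.0 / sigma2)
```

The bound holds with a modest margin here, consistent with the fact that equality occurs only at the normal distribution.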
We note that if the distribution function, F, has a diagonal
covariance matrix then the above theorem still follows, since each
diagonal element of T^{-1} is at least A^2(F_j), which in turn is at
least 1/sigma_j^2. Hence the result of the theorem is valid for a
wider class of distributions.

If phi_2 is the Wilcoxon score then from Puri and Sen (1971, p. 175)
we have:

    T_{jj} = (1/12) [integral f_j(x) dF_j(x)]^{-2},    j = K = 1,2,...,q,
    T_{jK} = (rho^g_{jK}/12) [integral f_j(x) dF_j(x)]^{-1} [integral f_K(x) dF_K(x)]^{-1},    j != K,

where

    rho^g_{jK} = 12 double-integral [F_j(x) - 1/2][F_K(y) - 1/2] dF_{jK}(x,y).

Let A(F_j) = integral f_j(x) dF_j(x). In general tr T^{-1} / tr Sigma^{-1}
is not easily evaluated for arbitrary F(x). We consider some special
cases.
Theorem 3.6.2. If X is an interchangeable random vector then

(3.6.5)    TARE = 12 a^2 sigma^2 (1 - rho) (1 + (q-2) rho^g)(1 + (q-1) rho) / [(1 - rho^g)(1 + (q-2) rho)(1 + (q-1) rho^g)],

where

    a = A(F_j) = integral f_j(x) dF_j(x) = integral f(x) dF(x).

Proof:
Denote by rho and rho^g the common correlation and grade correlation.
Then we represent Sigma and T (each q x q) and their inverses by the
following four equations, in which J is the q x q matrix of ones:

    Sigma = sigma^2 [(1 - rho) I + rho J],
    T = (12 a^2)^{-1} [(1 - rho^g) I + rho^g J],
    Sigma^{-1} = (sigma^2 (1 - rho))^{-1} [I - (rho / (1 + (q-1) rho)) J],
    T^{-1} = 12 a^2 (1 - rho^g)^{-1} [I - (rho^g / (1 + (q-1) rho^g)) J];

hence the result follows.
Q.E.D.
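Formula (3.6.5) may be verified against a direct matrix computation; the following sketch (Python; the parameter values are hypothetical) compares the closed form with tr T^{-1} / tr Sigma^{-1} computed by explicit inversion.

```python
import numpy as np

def tare_closed_form(q, rho, rho_g, a, sigma2):
    """Closed form (3.6.5) for an interchangeable random vector."""
    return (12 * a ** 2 * sigma2 * (1 - rho)
            * (1 + (q - 2) * rho_g) * (1 + (q - 1) * rho)
            / ((1 - rho_g) * (1 + (q - 2) * rho) * (1 + (q - 1) * rho_g)))

q, rho, rho_g, a, sigma2 = 4, 0.3, 0.25, 0.2, 2.0   # hypothetical values
I, J = np.eye(q), np.ones((q, q))
Sigma = sigma2 * ((1 - rho) * I + rho * J)
T = ((1 - rho_g) * I + rho_g * J) / (12 * a ** 2)
direct = np.trace(np.linalg.inv(T)) / np.trace(np.linalg.inv(Sigma))
print(abs(direct - tare_closed_form(q, rho, rho_g, a, sigma2)) < 1e-9)
```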
Theorem 3.6.3. If F(x) has covariance matrix Sigma which is diagonal,
then TARE >= .864.

Proof:
We know that

    tr T^{-1} >= 12 sum_{j=1}^{q} [integral f_j(x) dF_j(x)]^2    and    tr Sigma^{-1} = sum_{j=1}^{q} 1/sigma_j^2;

therefore

    TARE >= 12 sum_{j=1}^{q} A^2(F_j) / sum_{j=1}^{q} (1/sigma_j^2)
          = sum_{j=1}^{q} [12 sigma_j^2 A^2(F_j)] (1/sigma_j^2) / sum_{j=1}^{q} (1/sigma_j^2),

but 12 sigma_j^2 A^2(F_j) >= .864 [Hodges and Lehmann (1956)]. Thus,
TARE >= .864.
Q.E.D.
Theorem 3.6.4. If F(x) is q-variate normal with pairwise independent
coordinates, then TARE = 3/pi.

Proof:
When F is normal we have

    A(F_j) = [2 (pi sigma_j^2)^{1/2}]^{-1}    and    rho^g_{jK} = (6/pi) sin^{-1}(rho_jK / 2).

Pairwise independence implies rho_jK = 0, and thus rho^g_{jK} = 0.
Therefore,

    T^{-1} = (3/pi) diag(1/sigma_1^2, ..., 1/sigma_q^2),    Sigma^{-1} = diag(1/sigma_1^2, ..., 1/sigma_q^2),

and the result follows.
Q.E.D.
We now establish a result for the trace criterion which parallels
the curvature result when F is q-variate normal (q >= 3).

Theorem 3.6.5. For q > 2,

    inf_{F in Omega} TARE = 0,

where Omega is the class of nonsingular q-variate normal distribution
functions.

Proof:
As in Bickel (1964) it is sufficient to establish the result for
q = 3. Consider the covariance matrix

    Sigma(alpha) = [ 1                 0                 (1-alpha)/sqrt(2) ]
                   [ 0                 1                 (1-alpha)/sqrt(2) ]
                   [ (1-alpha)/sqrt(2) (1-alpha)/sqrt(2) 1                 ].

Hence |Sigma(alpha)| = 1 - (1-alpha)^2 and

    tr Sigma^{-1}(alpha) = [3 - (1-alpha)^2] / [1 - (1-alpha)^2],

and we see as alpha -> 0 that tr Sigma^{-1}(alpha) -> inf. On the
other hand, since F is normal,

    T(alpha) = (pi/3) [ 1  0  b ]
                      [ 0  1  b ]
                      [ b  b  1 ],    where b = (6/pi) sin^{-1}((1-alpha)/(2 sqrt(2))),

and

    T^{-1}(alpha) = (3/pi)(1 - 2b^2)^{-1} [ 1-b^2   b^2    -b ]
                                          [ b^2     1-b^2  -b ]
                                          [ -b      -b     1  ].

Clearly we have

    tr T^{-1}(alpha) = (3/pi)(1 - 2b^2)^{-1}(3 - 2b^2).

Notice that b^2 -> (36/pi^2)[sin^{-1}(1/(2 sqrt(2)))]^2, which is
approximately .48, as alpha -> 0, so that tr T^{-1}(alpha) remains
bounded while tr Sigma^{-1}(alpha) -> inf as alpha -> 0. So TARE -> 0
as alpha -> 0 and the theorem follows.
Q.E.D.
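The degeneracy in the proof can be seen numerically; the sketch below (Python) evaluates the TARE for the covariance Sigma(alpha) above as alpha decreases.

```python
import numpy as np

def tare_bickel_construction(alpha):
    """TARE = tr T^{-1}(alpha) / tr Sigma^{-1}(alpha) for the 3-variate normal
    covariance used in the proof of Theorem 3.6.5."""
    c = (1 - alpha) / np.sqrt(2)
    Sigma = np.array([[1, 0, c], [0, 1, c], [c, c, 1]])
    b = (6 / np.pi) * np.arcsin(c / 2)     # grade correlation of the (1,3) and (2,3) pairs
    T = (np.pi / 3) * np.array([[1, 0, b], [0, 1, b], [b, b, 1]])
    return np.trace(np.linalg.inv(T)) / np.trace(np.linalg.inv(Sigma))

vals = [tare_bickel_construction(a) for a in (0.5, 0.1, 0.01, 0.001)]
print([round(v, 4) for v in vals])   # decreasing toward zero
```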
Hence the same type of degeneracy occurs using the trace criterion as
occurs with the curvature and the smallest root criteria. Let us now
study the bivariate normal distribution function and derive bounds on
the TARE.

Theorem 3.6.6. If F is bivariate normal with mean theta and covariance
matrix Sigma positive definite, then
a) TARE is independent of sigma_1^2 and sigma_2^2;
b) .87 <= TARE <= .95, where the lower bound is reached as |rho| -> 1
and the upper bound is reached at rho = 0.
Proof:
Since F is bivariate normal we know

    T = (pi/3) [ sigma_1^2                                (6/pi) sigma_1 sigma_2 sin^{-1}(rho/2) ]
               [ (6/pi) sigma_1 sigma_2 sin^{-1}(rho/2)   sigma_2^2                              ].

In the 2 x 2 case we know that for a nonsingular matrix A,
tr A^{-1} = tr A / |A|. Hence

    tr T^{-1} = tr T / |T|,    tr Sigma^{-1} = tr Sigma / |Sigma|,

and

    TARE = tr T^{-1} / tr Sigma^{-1} = (3/pi)(1 - rho^2) / (1 - (36/pi^2)(sin^{-1}(rho/2))^2),

which proves part a) of the theorem. We first notice that the TARE is
symmetric about rho = 0, since if we denote h(rho) = TARE then

    h(-rho) = (3/pi)(1 - rho^2)(1 - (36/pi^2)(sin^{-1}(-rho/2))^2)^{-1} = h(rho).

For 0 <= rho <= 1, sin^{-1}(rho/2) is increasing, and therefore
[sin^{-1}(rho/2)]^2 is increasing; similarly, for -1 <= rho <= 0,
sin^{-1}(rho/2) is decreasing, and therefore (sin^{-1}(rho/2))^2 is
increasing; thus [sin^{-1}(rho/2)]^2 is increasing as |rho| increases.
Bickel (1964) has shown that

    rho^g = 3(1 - (2/pi) cos^{-1}(rho/2)),

hence

    TARE = (3/pi)(1 - rho^2)(1 - 9(1 - (2/pi) cos^{-1}(rho/2))^2)^{-1}.

Further, Bickel (1964, p. 1087) demonstrated that this function has a
maximum at rho = 0 and is monotonically decreasing in |rho|. Hence
the maximum of the TARE is at rho = 0, which is 3/pi or .95. For the
lower bound consider the monotonic transformation

    u = sin^{-1}(rho/2),    2 sin u = rho,

thus

    TARE = (3/pi)(1 - 4 sin^2 u) / (1 - (36/pi^2) u^2),

and the limit as |rho| -> 1 corresponds to |u| -> pi/6. Applying
L'Hospital's rule we obtain

    lim_{|u| -> pi/6} (-8 sin u cos u) / (-(72/pi^2) u)
        = 8 sin(pi/6) cos(pi/6) / [(72/pi^2)(pi/6)] = pi sqrt(3)/6,

therefore TARE -> (3/pi)(pi sqrt(3)/6) = sqrt(3)/2 = .866, or
approximately .87.
Q.E.D.
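Both endpoints of the bound can be confirmed from the closed form for the TARE; a minimal sketch (Python):

```python
import numpy as np

def tare_bvn(rho):
    """(3/pi)(1 - rho^2)/(1 - (36/pi^2)(sin^{-1}(rho/2))^2), the trace
    efficiency for a bivariate normal F."""
    return (3 / np.pi) * (1 - rho ** 2) / (1 - (36 / np.pi ** 2) * np.arcsin(rho / 2) ** 2)

print(round(tare_bvn(0.0), 4))        # 3/pi, about 0.9549
print(round(tare_bvn(1 - 1e-8), 4))   # approaches sqrt(3)/2, about 0.8660
```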
We see that the trace leads to slightly wider bounds than the
curvature criterion does, but we have the same basic result that the
efficiency decreases as |rho| increases. In any case we see that using
the curvature or the trace criterion is likely to produce similar
results for the efficiency for an underlying bivariate normal
distribution.
3.6.2  Growth Curve Model Correctly Specified

In the notation of Chapter II we have t_1 = t_2 = q, and if phi_1 is
the parametric procedure and phi_2 is the nonparametric procedure then
the efficiencies of phi_2 relative to phi_1 are:

(3.6.6)    CARE = { |B' Sigma B| / |T| }^{1/q}

and

(3.6.7)    TARE = tr T^{-1} / tr (B' Sigma B)^{-1},

where B = [b_1,...,b_q]. Sigma is the covariance matrix of the original
observations from the parent distribution function G, an h-variate
absolutely continuous distribution function diagonally symmetric about
zero with moments of order 2 + delta for some delta > 0. F is, in this
problem, the distribution function of the reduced quantities Y_S
defined by Y_S = B' X_S. The least squares procedure and the rank
procedures are applied to the Y_S.
Practically speaking, the bounds attained in the general one sample
problem apply to this problem as well. The only point of interest
is to determine what set of circumstances in the growth curve problem
lead to these bounds. Consider the normal scores procedure. If F is
q-variate normal then the efficiency using either criterion is 1; if
F has pairwise independent components it follows that

    b_j' Sigma b_K = 0    for all j != K.

One case where this happens is when Sigma = sigma^2 I; hence the CARE
and TARE are bounded below by unity.

Turning now to the Wilcoxon scores, we see that if F has pairwise
independent components then the efficiency (trace and curvature) is
bounded below by .864, and if F is normal the bound is 3/pi; again
pairwise independence necessitates that b_j' Sigma b_K = 0 for all
j != K. If F is bivariate normal then using curvature efficiency the
lower bound of .91 is reached as |rho| -> 1, while the trace criterion
achieves a lower bound of .87 for the same limit. The upper bound is
attained for both when rho = 0. One case where rho = 0 is
Sigma = sigma^2 I, so the upper bound is attained for this situation.
The lower bound for the growth curve model cannot in general be
reached for the bivariate case (i.e. linear and quadratic). To see
this point more clearly, we know there exists a nonsingular D such that

    (D^{-1})' Sigma D^{-1} = I,    i.e.  Sigma = D'D;

therefore let a_i = D b_i, i = 1,2; thus

    rho = a_1' a_2 / [(a_1' a_1)^{1/2} (a_2' a_2)^{1/2}],

and this can be one only if there is a nonzero constant c such that
a_1 = c a_2, which implies that b_1 = c b_2, which we know cannot
happen since for h time points the elements of b_1 are monotonically
increasing while the elements of b_2 are not monotonic. Thus the
correlation coefficient of the linear and quadratic cannot be 1, so
the lower bound in general cannot be reached.
3.6.3  Growth Curve Model Overspecified

In the notation of the previous sections we have t_1 and t_2 both
equal to t, which is greater than q. We denote

(3.6.8)    B = [b_1,...,b_q, b_{q+1},...,b_t] = [B_1 : B_2],

where the underlying model is given by B_1 and B_2 corresponds to the
orthonormal polynomials we have overfit. The rank scores and least
squares procedures are applied to the

    Y_S = B' X_S = [B_1' X_S ; B_2' X_S],    S = 1,2,...,n.

Under the assumptions of Theorem 2.2.1 the quadratic forms constructed
in the rank statistics have a limiting chi-square distribution with t
degrees of freedom and noncentrality parameter

    Delta_2 = lambda' [I_{q x q} : 0_{q x (t-q)}] T^{-1} [I_{q x q} : 0_{q x (t-q)}]' lambda

through H_N. Each term of T is the corresponding covariance of the
rank scores of the Y_S divided by the A(F_j) A(F_K) defined earlier in
this section.
The corresponding parametric test would also have a limiting
chi-square distribution with t degrees of freedom and noncentrality

    Delta_1 = lambda' [I_{q x q} : 0_{q x (t-q)}] (B' Sigma B)^{-1} [I_{q x q} : 0_{q x (t-q)}]' lambda

through H_N. The efficiency of phi_2 relative to phi_1 is therefore
given by:

(3.6.9)    CARE = { |B_1' Sigma B_1 - B_1' Sigma B_2 (B_2' Sigma B_2)^{-1} B_2' Sigma B_1| / |T_{11} - T_{12} T_{22}^{-1} T_{21}| }^{1/q}

and

(3.6.10)   TARE = tr(T_{11} - T_{12} T_{22}^{-1} T_{21})^{-1} / tr(B_1' Sigma B_1 - B_1' Sigma B_2 (B_2' Sigma B_2)^{-1} B_2' Sigma B_1)^{-1},
where T_{ij} is the portion of T due to the covariance of the rank
scores applied to B_i' X_S and B_j' X_S for i,j = 1,2.
3.6.4  Growth Curve Model Underspecified
In this case t < q and we let B* (h x t) be the first t columns of
B_1. The least squares procedure and nonparametric procedure are
applied to

    Y_S = B*' X_S,    S = 1, 2,..., n.

The curvature criterion cannot be applied in this case, so we consider
the trace criterion, which yields the efficiency of phi_2 relative to
phi_1 as the ratio of the traces of the matrices in the two
noncentrality parameters. This reduces to:

(3.6.11)    TARE = tr T*^{-1} / tr (B*' Sigma B*)^{-1},

where T* is the t x t matrix of rank score covariances computed from
the Y_S.
3.7  Covariance Adjustment

In this section we investigate the effect of the use of the higher
order polynomial terms as covariables on the TARE and CARE. We compute
the efficiencies of the procedure with covariance adjustment with
respect to the procedure without covariance adjustment when the model
has been a) correctly specified, b) overspecified, and
c) underspecified. In cases a and c we are able to show that one
always gains by covariance adjustment.
3.7.1 Model Correctly Specified
The hypothesis of interest in this problem is

(3.7.1)    H_0: (alpha_1,...,alpha_q)' = 0    against    H_N: (alpha_{1N},...,alpha_{qN})' = N^{-1/2}(lambda_1,...,lambda_q)'.

From the previous sections we know that the parametric test based on

    Y_{S1} = B_1' X_S  (q x 1),    S = 1, 2,..., n,

will have a limiting noncentral chi^2(q, Delta_1) through H_N as
N -> inf, where Delta_1 = lambda'(B_1' Sigma B_1)^{-1} lambda, while
the nonparametric test using rank scores would also have a limiting
noncentral chi^2(q, Delta_2) through H_N, where
Delta_2 = lambda' T_{11}^{-1} lambda and T_{11} is the q x q matrix of
rank score covariances of the Y_{S1}. We assume that we shall use r of
the higher degree terms as covariables in the analysis, i.e. we let

    Y_{S2} = B_2' X_S  (r x 1),    S = 1, 2,..., n,

where r = 1, 2,..., (h-q)-1. The expected value of Y_1 given Y_2 is
linear in Y_2, and the residuals

    Y_1 - (B_1' Sigma B_2)(B_2' Sigma B_2)^{-1} Y_2

have, under H_0, covariance matrix

    B_1' Sigma B_1 - (B_1' Sigma B_2)(B_2' Sigma B_2)^{-1}(B_2' Sigma B_1).

Hence the parametric test with covariance adjustment has a limiting
noncentral chi-square distribution with q degrees of freedom through
H_N and noncentrality

    lambda' [B_1' Sigma B_1 - (B_1' Sigma B_2)(B_2' Sigma B_2)^{-1}(B_2' Sigma B_1)]^{-1} lambda.

In a similar fashion [see e.g. Sen and Puri (1970)] the test based on
the rank scores procedure would have a limiting noncentral
chi^2(q, Delta_3) through H_N as N -> inf with

    Delta_3 = lambda' T_{11.2}^{-1} lambda,    where T_{11.2} = T_{11} - T_{12} T_{22}^{-1} T_{21}.
We then see that the efficiencies of the test with covariables
relative to the test without covariables for the parametric procedure
are:

    CARE = { |B_1' Sigma B_1| / |B_1' Sigma B_1 - B_1' Sigma B_2 (B_2' Sigma B_2)^{-1} B_2' Sigma B_1| }^{1/q}

and

    TARE = tr[B_1' Sigma B_1 - B_1' Sigma B_2 (B_2' Sigma B_2)^{-1} B_2' Sigma B_1]^{-1} / tr[B_1' Sigma B_1]^{-1}.

The corresponding efficiencies for the nonparametric procedures are:

    CARE = { |T_{11}| / |T_{11.2}| }^{1/q}    and    TARE = tr T_{11.2}^{-1} / tr T_{11}^{-1}.

We know B_1' Sigma B_2 (B_2' Sigma B_2)^{-1} B_2' Sigma B_1 is
symmetric positive semidefinite while B_1' Sigma B_1 and
B_1' Sigma B_1 - B_1' Sigma B_2 (B_2' Sigma B_2)^{-1} B_2' Sigma B_1
are symmetric positive definite. Hence by Theorem 1.44 of Graybill
(1961) we see that

    |B_1' Sigma B_1 - B_1' Sigma B_2 (B_2' Sigma B_2)^{-1} B_2' Sigma B_1| <= |B_1' Sigma B_1|.

A similar argument shows that |T_{11.2}| <= |T_{11}|. Therefore, using
the curvature criterion of ARE we see that the efficiency of the
procedures using covariance adjustment to the corresponding procedure
without covariance adjustment is bounded below by unity.
Using the trace criterion we also find that the ARE is bounded below
by unity. To demonstrate this we consider the following argument.
There exists a nonsingular matrix C such that

(3.7.2)    B_1' Sigma B_1 = C C'    and    B_1' Sigma B_1 - B_1' Sigma B_2 (B_2' Sigma B_2)^{-1} B_2' Sigma B_1 = C Lambda C',

where Lambda = diag(gamma_1,...,gamma_q) and gamma_i is a
characteristic root of

    (B_1' Sigma B_1)^{-1}[B_1' Sigma B_1 - B_1' Sigma B_2 (B_2' Sigma B_2)^{-1} B_2' Sigma B_1].

Each gamma_i is between zero and one, since the roots of

    |B_1' Sigma B_1 - B_1' Sigma B_2 (B_2' Sigma B_2)^{-1} B_2' Sigma B_1 - gamma B_1' Sigma B_1| = 0

for gamma and the roots of

(3.7.3)    |B_1' Sigma B_2 (B_2' Sigma B_2)^{-1} B_2' Sigma B_1 - zeta B_1' Sigma B_1| = 0

for zeta, where zeta = 1 - gamma, are the same. The roots of (3.7.3)
are greater than or equal to zero, i.e. 1 - gamma >= 0, therefore
gamma <= 1. Clearly each gamma_i is between zero and one. So we see
from (3.7.2) that

    tr[B_1' Sigma B_1 - B_1' Sigma B_2 (B_2' Sigma B_2)^{-1} B_2' Sigma B_1]^{-1}
        = tr[Lambda^{-1}(C'C)^{-1}] >= tr(C'C)^{-1} = tr(B_1' Sigma B_1)^{-1},

since each element of the diagonal matrix Lambda^{-1} is greater than
or equal to one. Therefore TARE >= 1. A similar argument can be
applied to the nonparametric procedure. Covariance adjustment does
improve both the CARE and TARE in this case.
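The conclusion TARE >= 1 and CARE >= 1 can be illustrated with an arbitrary positive definite Sigma and orthonormal B = [B_1 : B_2] (the numbers below are hypothetical; Python):

```python
import numpy as np

rng = np.random.default_rng(1)
h, q, r = 8, 3, 2
M = rng.standard_normal((h, h))
Sigma = M @ M.T + h * np.eye(h)                     # hypothetical h x h covariance
B = np.linalg.qr(rng.standard_normal((h, q + r)))[0]
B1, B2 = B[:, :q], B[:, q:]

A11 = B1.T @ Sigma @ B1
S = A11 - B1.T @ Sigma @ B2 @ np.linalg.solve(B2.T @ Sigma @ B2, B2.T @ Sigma @ B1)

tare = np.trace(np.linalg.inv(S)) / np.trace(np.linalg.inv(A11))
care = (np.linalg.det(A11) / np.linalg.det(S)) ** (1.0 / q)
print(tare >= 1.0, care >= 1.0)
```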
3.7.2 Model Underspecified
In this situation the hypothesis is (3.7.1); however, we have one
test with t statistics (t < q) and zero covariables and the second test
with t statistics and r covariables. We let

    B = [b_1,...,b_t : b_{t+1},...,b_q] = [B_1 : B_2]

and assume the least squares reduction

    Y_{S1} = B_1' X_S  (t x 1),    S = 1, 2,..., n.

From the previous sections the parametric procedure based on the Y_{S1}
will have a limiting noncentral chi^2(t, Delta_1) through H_N as
N -> inf, where

    Delta_1 = lambda' [ (B_1' Sigma B_1)^{-1}   0_{t x (q-t)} ] lambda.
                      [ 0_{(q-t) x t}           0             ]

The nonparametric procedure with rank scores applied to the Y_{S1}
would result in a limiting chi^2(t, Delta_2) through H_N with

    Delta_2 = lambda' [ T_{11}^{-1}      0_{t x (q-t)} ] lambda,
                      [ 0_{(q-t) x t}    0             ]

where T_{11} is t x t. We wish to compare these tests to the tests
which use r of the higher order terms as covariables. We let

    Y_{S2} = B_3' X_S  (r x 1),    S = 1, 2,..., n,    r = 1, 2,..., (h-t)-1.

We notice that B_3 will contain terms from B_2 and in fact may
actually equal B_2.
The expected value of Y_1 given Y_2 is linear in Y_2, with the
residuals having covariance matrix
B_1' Sigma B_1 - B_1' Sigma B_3 (B_3' Sigma B_3)^{-1} B_3' Sigma B_1.
Under similar assumptions as before we have that the covariance
adjusted parametric statistic is noncentral chi^2(t, Delta_3) through
H_N as N -> inf, where

    Delta_3 = lambda' [ [B_1' Sigma B_1 - B_1' Sigma B_3 (B_3' Sigma B_3)^{-1} B_3' Sigma B_1]^{-1}   0 ] lambda.
                      [ 0                                                                             0 ]

In a like fashion the nonparametric procedure with covariables would
have a noncentral chi^2(t, Delta_4) through H_N as N -> inf, where

    Delta_4 = lambda' [ [T_{11} - T_{13} T_{33}^{-1} T_{31}]^{-1}   0 ] lambda.
                      [ 0                                           0 ]

We thus have the following as the efficiency of the test using
covariables to the test without the use of covariables for the
parametric procedure:

    TARE = tr[B_1' Sigma B_1 - B_1' Sigma B_3 (B_3' Sigma B_3)^{-1} B_3' Sigma B_1]^{-1} / tr[B_1' Sigma B_1]^{-1},

and for the corresponding nonparametric procedure:

    TARE = tr[T_{11} - T_{13} T_{33}^{-1} T_{31}]^{-1} / tr T_{11}^{-1}.

The trace criterion is the only method of comparison which we can use
in this case since the curvature of the power function of each test is
zero. We also note these efficiencies are bounded below by one from
the arguments in Section 3.7.1. Even if we have underspecified the
model we always do better to use covariance adjustment.
3.7.3  Model Overspecified

In this situation the hypothesis is again given in (3.7.1); however,
we have one test with t statistics (t > q) and no covariables and the
second test with t statistics and r covariables. We let
B = [b_1,...,b_q, b_{q+1},...,b_t] and assume the least squares
reduction

    Y_{S1} = B' X_S  (t x 1),    S = 1, 2,..., n.

From the previous sections the parametric procedure based on the Y_{S1}
will have a limiting noncentral chi^2(t, Delta_1) through H_N as
N -> inf, and the nonparametric procedure with rank scores applied to
the Y_{S1} would result in a limiting chi^2(t, Delta_2) through H_N as
N -> inf, where

    Delta_2 = lambda' [T_{11} - T_{12} T_{22}^{-1} T_{21}]^{-1} lambda.

The partitioned matrices are defined by B = [B_1 : B_2], where B_1 is
h x q and B_2 is h x (t-q), and the T_{ij} are the covariances of the
rank scores of B_i' X_S and B_j' X_S. We wish to compare these tests
to the tests which use r of the higher order terms as covariables; we
let

    Y_{S2} = B_3' X_S  (r x 1),    S = 1, 2,..., n,    r = 1, 2,..., (h-t)-1.

The expected value of Y_1 given Y_2 is

    [mu_1 ; 0] + (B' Sigma B_3)(B_3' Sigma B_3)^{-1} Y_2

with covariance matrix
B' Sigma B - (B' Sigma B_3)(B_3' Sigma B_3)^{-1}(B_3' Sigma B). The
residuals are

    Y_1 - [[mu_1 ; 0] + (B' Sigma B_3)(B_3' Sigma B_3)^{-1} Y_2]

with the same covariance matrix. We denote

    A_{ij} = B_i' Sigma B_j,    i,j = 1,2,3,

and

    A_{ij.3} = A_{ij} - A_{i3} A_{33}^{-1} A_{3j},    i,j = 1,2.

Also let

    A* = [A_{ij}; i,j = 1,2]    and    A** = [A_{ij.3}; i,j = 1,2].
We know that

    |A_{11} - A_{12} A_{22}^{-1} A_{21}| = |A*| / |A_{22}|    and    |A_{11.3} - A_{12.3} A_{22.3}^{-1} A_{21.3}| = |A**| / |A_{22.3}|.

Thus using the curvature criterion we get

(3.7.4)    CARE = { (|A*| / |A**|)(|A_{22.3}| / |A_{22}|) }^{1/q}

for the efficiency of the test with covariables relative to the test
without covariables. The first ratio in (3.7.4) is at least unity;
however, the second term is less than or equal to one. It does not
seem obvious whether the CARE is greater than or equal to one. Using
the nonparametric rank scores procedures we have, for the efficiency of
the test with covariables to the test without covariables:

    CARE = { |T_{11} - T_{12} T_{22}^{-1} T_{21}| / |T_{11.3} - T_{12.3} T_{22.3}^{-1} T_{21.3}| }^{1/q}.

As in the parametric case it is not clear if this quantity can be less
than unity.
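The two competing factors in (3.7.4) can be computed for an arbitrary example (Python; the covariance and basis below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
h, q, tq, r = 10, 2, 2, 3          # t = q + tq overfitted statistics, r covariables
M = rng.standard_normal((h, h))
Sigma = M @ M.T + h * np.eye(h)
B = np.linalg.qr(rng.standard_normal((h, q + tq + r)))[0]
idx = [slice(0, q), slice(q, q + tq), slice(q + tq, q + tq + r)]
A = [[B[:, a].T @ Sigma @ B[:, b] for b in idx] for a in idx]

def schur(i, j):
    """A_{ij.3} = A_ij - A_i3 A_33^{-1} A_3j."""
    return A[i][j] - A[i][2] @ np.linalg.solve(A[2][2], A[2][j])

Astar = np.block([[A[0][0], A[0][1]], [A[1][0], A[1][1]]])
Astar2 = np.block([[schur(0, 0), schur(0, 1)], [schur(1, 0), schur(1, 1)]])
f1 = np.linalg.det(Astar) / np.linalg.det(Astar2)         # at least one
f2 = np.linalg.det(schur(1, 1)) / np.linalg.det(A[1][1])  # at most one
print(f1 >= 1.0, f2 <= 1.0)
```

Whether the product of the two factors exceeds one depends on the particular Sigma, which is the indeterminacy noted in the text.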
3.8  Comments

It should be observed that the hypothesis that the growth curve is a
polynomial of degree l-1 with (alpha_1,...,alpha_l)' unspecified
(l < h) could have been the hypothesis tested in this chapter in place
of the hypothesis given in (3.3.4). To test this hypothesis we would
apply the parametric or nonparametric one sample procedure to

    Y_S = B' X_S,    S = 1, 2,..., n,

where B consists of the orthonormal polynomial vectors excluded from
the assumed model. The bounds attained in this chapter would not
change; however, the actual efficiency formulae would change to
reflect the B matrix.
In general the results of this chapter are applicable to tests for the
one sample problem. In Chapter IV we find that the TARE and CARE
reduce to scalar multiples of the corresponding one sample results for
the hypothesis that the intercept is a fixed number and the curve is
stationary.
CHAPTER IV
APPLICATION OF TARE AND CARE TO THE MULTI-SAMPLE
GROWTH CURVE PROBLEM
4.1  Introduction

We shall present results in this chapter for the c-sample problem
(c > 1) similar to the results obtained in Chapter III for the
one-sample problem. As in Chapter III we shall restrict attention to
the polynomial growth curve model. The hypothesis of interest will be
the hypothesis of the equality of the c growth curves; under various
assumptions about the model, we shall see that this test of
homogeneity will give rise to different test statistics. The
unweighted least squares reduction will be used for reduction of each
observation vector to the estimates for the assumed model. We shall
apply the general rank scores statistics to these reduced vectors and
also shall apply the Hotelling-Lawley trace statistic to test the null
hypothesis. Efficiency formulae are presented for each procedure.
4.2  The Statistical Model and Data Reduction

The model for the multi-sample (c-sample) problem is a special case of
the general model presented in Section 3.2. The index set, I, is the
set of c indices 1,2,...,c. The index set, T, is the same as in
(3.2.2) and the data are characterized by (3.2.3) and (3.2.4). For
notational ease, we write n_i in place of n(i) since i = 1,2,...,c.
Restricting attention to the case b = 1 we see that we have the
following for the multi-sample problem:

(4.2.1)    X_{Si}  (h x 1),    i = 1,2,...,c;    S = 1,2,...,n_i,

where for each i the X_{Si} are i.i.d. as G(x;i), an h-variate
absolutely continuous distribution function. The location vector of
X_{Si} is M(i), and in the orthogonal polynomial models which we
consider we have that

(4.2.2)    M(i) = B_1 alpha_i,    i = 1,2,...,c,

where the columns of B_1 (h x r) are the orthogonal polynomials of
degree 0 to r-1 satisfying (3.2.9) and (3.2.10), and alpha_i is r x 1.
Hence, we have again a dimension reducing transformation from M(i) to
alpha_i.
The assumption

(4.2.3)    G(x;i) = G(x - M(i)),    i = 1,2,...,c,

is made so that X_{Si} - M(i) is distributed independently of i. In
the normal theory we assume

(4.2.4)    X_{Si} ~ N_h(B_1 alpha_i, Sigma),    i = 1,2,...,c,

if M(i) is given by (4.2.2). The hypothesis of homogeneity of the
growth curves is given by

(4.2.5)    H_0: alpha_1 = alpha_2 = ... = alpha_c = alpha  (unspecified).
Defining b_j, j = 0,1,...,h-1, as in (3.2.10), we shall consider the
unweighted least squares estimate of alpha_i for each observation;
hence, denoting B_1 = [b_0, b_1,...,b_{r-1}], we then consider the
transformation

(4.2.6)    Y_{Si} = B_1' X_{Si},    i = 1,2,...,c;    S = 1,2,...,n_i.

In the next sections we compute the efficiencies of the parametric and
nonparametric procedures for the overspecified and underspecified
growth curve models.
4.3  Parametric Procedure

Under the assumption that the X_{Si} are from a multivariate normal
distribution defined by (4.2.4), it follows that Y_{Si} defined by
(4.2.6) has a multivariate normal distribution. Hence, to test the
null hypothesis (4.2.5) we could follow any of several procedures, for
example, the likelihood ratio criterion, which reduces to Wilks'
lambda, the Hotelling-Lawley trace, or Roy's largest root. Under the
assumption of normality the Hotelling-Lawley trace criterion (denoted
T_0^2) and the likelihood ratio criterion are asymptotically
equivalent; in fact, both would lead to noncentral chi-square
statistics for large samples. On the other hand, if the parent
distribution is not necessarily normal but has moments of order
2 + delta for some delta > 0, then T_0^2 still has a central
chi-square distribution under the null hypothesis and a noncentral
chi-square distribution through an appropriate sequence of
alternatives. We shall use the T_0^2 statistic, but the equivalence of
the two criteria in large samples for normal distribution functions is
noteworthy. We consider the data characterized by (3.2.3) and (3.2.4)
and define
    Ybar_i = n_i^{-1} sum_{S=1}^{n_i} Y_{Si},    i = 1,2,...,c,

    Ybar = sum_{i=1}^{c} (n_i/N) Ybar_i,    where N = sum_{i=1}^{c} n_i,

and

    Sigma-hat = (N-c)^{-1} sum_{i=1}^{c} sum_{S=1}^{n_i} (X_{Si} - Xbar_i)(X_{Si} - Xbar_i)'.

Assume there exist c constants, gamma_1, gamma_2,...,gamma_c, each in
the open interval from zero to one, such that

    lim_{N -> inf} n_i/N = gamma_i,    i = 1,2,...,c,    and    sum_{i=1}^{c} gamma_i = 1.
Since the parent distribution function of the X_{Si} has moments of
order 2 + delta for some delta > 0, it follows from the central limit
theorem that

(4.3.1)    L(n_i^{1/2}(Xbar_i - M(i))) -> N_h(0, Sigma)    as n_i -> inf.

Furthermore we know that Sigma-hat -> Sigma by the laws of large
numbers. From (4.3.1) it follows that

(4.3.2)    L(n_i^{1/2}(Ybar_i - alpha_i)) -> N_r(0, B_1' Sigma B_1)    as n_i -> inf,

as Ybar_i is a continuous function of Xbar_i. In addition, it is
obvious from (4.3.2) that

(4.3.3)    L(N^{1/2}(Ybar_i - alpha_i); i = 1,2,...,c) -> N_{rc}(0, (B_1' Sigma B_1) (x) Gamma),

where

    Gamma = diag(gamma_1^{-1},...,gamma_c^{-1})  (c x c)

and (x) denotes the Kronecker product.
Writing alpha_i = alpha-bar + theta_i, where
alpha-bar = sum_{i=1}^{c} gamma_i alpha_i, so that
sum_{i=1}^{c} gamma_i theta_i = 0, we observe that only c - 1 of the
theta_i are linearly independent; hence the null hypothesis implies
that theta_1 = theta_2 = ... = theta_c = 0. The sequence of
alternative hypotheses, H_N, is defined by

(4.3.4)    H_N: theta_{iN} = N^{-1/2} lambda_i,    i = 1,2,...,c,

where lambda_i is non-null for at least one i.
To test H_0 we may define T_0^2 in its symmetric, less than full rank,
form, (4.3.5), or we may consider the statistic in one of its full
rank forms, for example:

(4.3.6)    T_0^2 = N [(Ybar_i - Ybar); i = 1,2,...,c-1]' {(B_1' Sigma-hat B_1) (x) [delta_{ii'}(N/n_i) - 1]_{i,i'=1,2,...,c-1}}^{-1} [(Ybar_i - Ybar); i = 1,2,...,c-1].
By (4.3.3) and computation of the covariance we know, under H_0, that
T_0^2 has a limiting central chi-square distribution with r(c-1)
degrees of freedom as N -> inf. Furthermore, through H_N,

(4.3.7)    L(N^{1/2}(Ybar_i - Ybar); i = 1,2,...,c-1) -> N_{r(c-1)}([(lambda_i - lambda-bar); i = 1,2,...,c-1], (B_1' Sigma B_1) (x) [delta_{ii'} gamma_i^{-1} - 1]_{i,i'=1,2,...,c-1})

as N -> inf, where lambda-bar = sum_{i=1}^{c} gamma_i lambda_i.
Without loss of generality we may assume lambda-bar is zero because of
the r restrictions. Clearly, by virtue of (4.3.7), we see that
T_0^2 -> chi^2(r(c-1), Delta_1) through H_N as N -> inf with

(4.3.8)    Delta_1 = [lambda_i; i = 1,2,...,c-1]' {(B_1' Sigma B_1) (x) [delta_{ii'} gamma_i^{-1} - 1]_{i,i'=1,2,...,c-1}}^{-1} [lambda_i; i = 1,2,...,c-1].

Written in a symmetric form in the lambda_i, i = 1,2,...,c, Delta_1
becomes the corresponding quadratic form in all c vectors. We shall
need to compute the trace and determinant of the discriminant in
(4.3.8), so we reduce this discriminant to a simpler form. Noting
that
(4.3.9)    [delta_{ii'} gamma_i^{-1} - 1]_{i,i'=1,2,...,c-1} = diag(gamma_1^{-1},...,gamma_{c-1}^{-1}) + (-1) 1 1',

where 1 = (1,1,...,1)', by Theorem 8.3.3 in Graybill (1969) the
inverse of (4.3.9) is

    diag(gamma_1,...,gamma_{c-1}) + K (gamma_1,...,gamma_{c-1})'(gamma_1,...,gamma_{c-1}),

where

    K = -(-1)[1 + (-1) sum_{i=1}^{c-1} gamma_i]^{-1} = gamma_c^{-1}.
Consequently, we have

(4.3.10)    [delta_{ii'} gamma_i^{-1} - 1]^{-1}_{i,i'=1,2,...,c-1} = [delta_{ii'} gamma_i + gamma_i gamma_{i'} / gamma_c]_{i,i'=1,2,...,c-1}.

Furthermore, since the inverse of a direct product is the direct
product of the inverses, it follows from (4.3.10) that

    {(B_1' Sigma B_1) (x) [delta_{ii'} gamma_i^{-1} - 1]}^{-1} = (B_1' Sigma B_1)^{-1} (x) [delta_{ii'} gamma_i + gamma_i gamma_{i'} / gamma_c].

In addition, by Theorems 8.8.10 and 9.1.11 in Graybill (1969) it is
apparent that:

(4.3.11)    |(B_1' Sigma B_1)^{-1} (x) [delta_{ii'} gamma_i + gamma_i gamma_{i'} / gamma_c]| = |B_1' Sigma B_1|^{-(c-1)} |[delta_{ii'} gamma_i + gamma_i gamma_{i'} / gamma_c]|^{r}

and

(4.3.12)    tr{(B_1' Sigma B_1)^{-1} (x) [delta_{ii'} gamma_i + gamma_i gamma_{i'} / gamma_c]} = tr(B_1' Sigma B_1)^{-1} tr[delta_{ii'} gamma_i + gamma_i gamma_{i'} / gamma_c].

Let us define

(4.3.13)    A = [delta_{ii'} gamma_i + gamma_i gamma_{i'} / gamma_c]_{i,i'=1,2,...,c-1}.
4.3.1 Parametric Procedure - Overfitting
If we had incorrectly assumed that the polynomial model was of degree
(t-1) in each group (t > r) then we would make the transformation:

(4.3.14)    Y_{Si} = [b_0, b_1,...,b_{r-1},...,b_{t-1}]' X_{Si},    i = 1,2,...,c;    S = 1,2,...,n_i,

and for notational convenience we let B = [B_1 : B_2]. Following the
same reasoning as in (4.3.6) and (4.3.7) we see that, through H_N,

(4.3.15)    L(N^{1/2}(Ybar_i - Ybar); i = 1,2,...,c-1) -> N_{t(c-1)}([(lambda_i ; 0); i = 1,2,...,c-1], (B' Sigma B) (x) [delta_{ii'} gamma_i^{-1} - 1]_{i,i'=1,2,...,c-1})

as N -> inf, where for each i, lambda_i is r x 1 and 0 is (t-r) x 1.
Therefore, the statistic
T_0^2, defined as the quadratic form in the t(c-1) vectors in
(4.3.15), would have a limiting noncentral chi^2(t(c-1), Delta_2) as
N -> inf, where

    Delta_2 = [(lambda_i ; 0); i = 1,2,...,c-1]' {(B' Sigma B)^{-1} (x) [delta_{ii'} gamma_i + gamma_i gamma_{i'} / gamma_c]} [(lambda_i ; 0); i = 1,2,...,c-1].

By (3.4.7) and the fact that the last (t-r) elements of each
(lambda_i ; 0) vector are zeroes, we have

    Delta_2 = [lambda_i; i = 1,2,...,c-1]' {[B_1' Sigma B_1 - B_1' Sigma B_2 (B_2' Sigma B_2)^{-1} B_2' Sigma B_1]^{-1} (x) A} [lambda_i; i = 1,2,...,c-1].

As in (4.3.11) and (4.3.12), we have the determinant and trace of the
discriminant of Delta_2 as

    |B_1' Sigma B_1 - B_1' Sigma B_2 (B_2' Sigma B_2)^{-1} B_2' Sigma B_1|^{-(c-1)} |A|^{r}

and

    tr[B_1' Sigma B_1 - B_1' Sigma B_2 (B_2' Sigma B_2)^{-1} B_2' Sigma B_1]^{-1} tr A,

respectively.
Representing the test based on the correct number of parameters as
phi_1 and the overfitted procedure as phi_2, we easily find the
following efficiencies of phi_2 relative to phi_1:

    CARE = R(r(c-1), t(c-1), alpha) { |B_1' Sigma B_1|^{c-1} / |B_1' Sigma B_1 - B_1' Sigma B_2 (B_2' Sigma B_2)^{-1} B_2' Sigma B_1|^{c-1} }^{1/r(c-1)},

which becomes

(4.3.16)    CARE = R(r(c-1), t(c-1), alpha) { |B_1' Sigma B_1| / |B_1' Sigma B_1 - B_1' Sigma B_2 (B_2' Sigma B_2)^{-1} B_2' Sigma B_1| }^{1/r}    for t > r.

Using the trace criterion we obtain

(4.3.17)    TARE = R(r(c-1), t(c-1), alpha) tr[B_1' Sigma B_1 - B_1' Sigma B_2 (B_2' Sigma B_2)^{-1} B_2' Sigma B_1]^{-1} / tr[B_1' Sigma B_1]^{-1}    for t > r.

We see therefore that the c-sample efficiencies for overfitting are
the one sample efficiencies, in both cases adjusted by a scalar
function. The equations (4.3.16) and (4.3.17) do illustrate the fact
that the loss in efficiency may be more severe for the c-sample
problem than it was for the one-sample problem.
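The matrix factors in (4.3.16)-(4.3.17) can be evaluated for a concrete design; the sketch below (Python) builds orthonormal polynomial vectors by a QR factorization of a Vandermonde matrix and uses a hypothetical AR(1)-type Sigma (both are our assumptions for illustration only).

```python
import numpy as np

h, r, t = 6, 2, 4                        # correct degree r-1 = 1, overfit to degree t-1 = 3
time = np.arange(1.0, h + 1)
V = np.vander(time, t, increasing=True)  # columns 1, s, s^2, s^3
B = np.linalg.qr(V)[0]                   # orthonormal polynomial vectors b_0,...,b_{t-1}
B1, B2 = B[:, :r], B[:, r:t]

rho = 0.5                                # hypothetical within-subject AR(1) covariance
Sigma = rho ** np.abs(np.subtract.outer(time, time))

A11 = B1.T @ Sigma @ B1
S = A11 - B1.T @ Sigma @ B2 @ np.linalg.solve(B2.T @ Sigma @ B2, B2.T @ Sigma @ B1)
det_factor = (np.linalg.det(A11) / np.linalg.det(S)) ** (1.0 / r)
tr_factor = np.trace(np.linalg.inv(S)) / np.trace(np.linalg.inv(A11))
print(det_factor >= 1.0, tr_factor >= 1.0)
```

As in the one sample problem, both matrix factors are at least one, so any net loss comes from the degrees of freedom factor R(r(c-1), t(c-1), alpha).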
4.3.2  Parametric Procedure - Underfitting

If we had assumed that the polynomial growth curve was of degree t-1
(t < r) in each group then we would make the transformation

(4.3.18)    Y_{Si} = [b_0, b_1,...,b_{t-1}]' X_{Si},    i = 1,2,...,c;    S = 1,2,...,n_i.

Let B* = [b_0, b_1,...,b_{t-1}] and let lambda_i* (t x 1) denote the
first t coordinates of lambda_i. Then through H_N we see that

(4.3.19)    L(N^{1/2}(Ybar_i - Ybar); i = 1,2,...,c-1) -> N_{t(c-1)}([lambda_i*; i = 1,2,...,c-1], (B*' Sigma B*) (x) [delta_{ii'} gamma_i^{-1} - 1])

as N -> inf. Hence, the quadratic form in the statistics
N^{1/2}(Ybar_i - Ybar) would have a limiting chi^2(t(c-1), Delta_3) as
N -> inf through H_N, where

    Delta_3 = [lambda_i*; i = 1,2,...,c-1]' {(B*' Sigma B*)^{-1} (x) A} [lambda_i*; i = 1,2,...,c-1].

Using the trace criterion, the ARE of the underfitting procedure,
phi_2, to the procedure with the correct number of parameters, phi_1,
is:

(4.3.20)    TARE = R(r(c-1), t(c-1), alpha) tr[B*' Sigma B*]^{-1} / tr[B_1' Sigma B_1]^{-1}.
4.4  Nonparametric Procedures

We shall present efficiency formulae in this section for the general rank scores procedures for the multivariate multisample problem as outlined in Puri and Sen (1971). Since the X_Si have been observed from a continuous h-variate distribution function G(x, i) with location vector M(i), where M(i) = B αᵢ (B being h×r and αᵢ being r×1), it is obvious that the B'X_Si have a continuous r-variate distribution function with location vector αᵢ. Hence, the multisample rank scores procedure may be applied to the B'X_Si.

The procedure is simply to rank each coordinate of Y_Si in the set of all the Y_Si and to apply a score function to these ranks. Mean scores are computed for each sample and, to test the null hypothesis of equality of location vectors, a set of r(c−1) contrasts in the mean scores is constructed. A quadratic form in these r(c−1) contrasts is defined to be the test statistic; numerically large values of this statistic lead to rejection of the null hypothesis.
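The ranking and mean-score step just described can be sketched as follows (an illustration only; the function name and the choice of the Wilcoxon score J(u) = u are assumptions, not part of the original text):

```python
import numpy as np

def mean_rank_scores(samples):
    """Coordinatewise Wilcoxon rank scores J(u) = u for c samples.

    samples : list of (n_k x r) arrays, one per sample.
    Returns a (c x r) array of mean scores per sample and coordinate.
    """
    pooled = np.vstack(samples)            # N x r pooled observations
    n_total = pooled.shape[0]
    # rank each coordinate among all N pooled values (ranks 1..N)
    order = np.argsort(pooled, axis=0)
    ranks = np.empty_like(order)
    np.put_along_axis(ranks, order, np.arange(1, n_total + 1)[:, None], axis=0)
    scores = ranks / (n_total + 1.0)       # Wilcoxon scores J(R/(N+1)) = R/(N+1)
    # mean score for each sample
    splits = np.cumsum([s.shape[0] for s in samples])[:-1]
    return np.array([g.mean(axis=0) for g in np.split(scores, splits)])

# two well-separated bivariate samples of size 3 each
ms = mean_rank_scores([np.zeros((3, 2)), np.ones((3, 2))])
```

The contrasts in the rows of `ms` are what the quadratic-form statistic is built from.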
Referring specifically to Section 5.6 of Puri and Sen (1971), we note through the sequence of alternatives defined in (4.3.4) that the quadratic form would be noncentral chi-square with r(c−1) degrees of freedom with noncentrality Δ₁ as N → ∞. The noncentrality Δ₁ is defined by

    (4.4.1)  Δ₁ = [γᵢ; i = 1,2,...,c]' [T₁₁⁻¹(F)] [γᵢ; i = 1,2,...,c],

where the γᵢ, i = 1,2,...,c, are defined in the previous section,

    T₁₁(F) = [[ ν_jj'(F) / (C(F_j)C(F_j')) ]; j, j' = 1,2,...,r],

ν_jj'(F) is V_jj'(F) defined by (5.4.28) of Puri and Sen (1971), and

    C(F_j) = ∫ (d/dx) J(F_j(x)) dF_j(x),   j = 1,2,...,r,

the integral extending over (−∞, ∞). We have again made the assumption of Section 4.3 on A, which is no loss of generality. We observe that (4.4.1) can be written in its full rank form as:

    (4.4.2)  Δ₁ = [λᵢ; i = 1,2,...,c−1]' [T₁₁⁻¹(F) ⊗ A] [λᵢ; i = 1,2,...,c−1],

where λᵢ is defined by (4.3.13). Hence, we have

    (4.4.3)  |T₁₁⁻¹(F) ⊗ A| = |T₁₁(F)|^(−(c−1)) |A|^r

and

    (4.4.4)  tr(T₁₁⁻¹(F) ⊗ A) = tr T₁₁⁻¹(F) · tr A.
4.4.1  Nonparametric Procedures - Overfitting

If we had overfit the polynomial model in each group with polynomials of degree t−1, then we would apply the rank scores procedures to the variables defined by (4.3.14). The quadratic form in the t(c−1) contrasts in rank scores would have a noncentral chi-square distribution with t(c−1) degrees of freedom and noncentrality Δ₂, where

    Δ₂ = [λ̂ᵢ; i = 1,2,...,c−1]' [T⁻¹(F) ⊗ A] [λ̂ᵢ; i = 1,2,...,c−1]

and

    T(F) = [ T₁₁(F)  T₁₂(F) ]
           [ T₂₁(F)  T₂₂(F) ],

the covariance matrix of the entire set of t scores. In an obvious way we have

    Δ₂ = [λᵢ; i = 1,2,...,c−1]' [(T₁₁(F) − T₁₂(F)T₂₂⁻¹(F)T₂₁(F))⁻¹ ⊗ A] [λᵢ; i = 1,2,...,c−1].
The efficiency of the incorrect test φ₂ relative to φ₁, the test based on the correct number of parameters, using the curvature criterion is:

    (4.4.5)  CARE = R(r(c−1), t(c−1), α) [ |T₁₁(F)| / |T₁₁(F) − T₁₂(F)T₂₂⁻¹(F)T₂₁(F)| ]^(1/r),   r < t,

and using the trace criterion we obtain:

    (4.4.6)  TARE = R(r(c−1), t(c−1), α) · tr[T₁₁(F) − T₁₂(F)T₂₂⁻¹(F)T₂₁(F)]⁻¹ / tr[T₁₁(F)]⁻¹,   r < t.
We note the similarity to the one-sample results and observe that the scalar adjustment R(r(c−1), t(c−1), α) may result in large reductions when comparing many samples.
4.4.2  Nonparametric Procedure - Underfitting

Underfitting the model leads to applying the rank scores to the Y_Si defined by (4.3.18). The quadratic form in the t(c−1) contrasts of the mean rank scores would lead to a limiting noncentral chi-square distribution with t(c−1) degrees of freedom and noncentrality parameter

    Δ₃ = [λᵢ*; i = 1,2,...,c−1]' [T*⁻¹(F) ⊗ A] [λᵢ*; i = 1,2,...,c−1],

where λᵢ* and B* are defined as in Section 4.3.2. The efficiency of the underfitted procedure to the correctly fitted procedure is:

    (4.4.7)  TARE = R(r(c−1), t(c−1), α) · tr T*⁻¹(F) / tr T₁₁⁻¹(F),

where T*(F) is the portion of T₁₁(F) corresponding to its upper t×t block.
4.5  Comments

We conclude from Sections 4.3 and 4.4 that for large values of c, the adjustment of the one-sample efficiencies to obtain the c-sample efficiencies may be quite large. The loss in efficiency in overfitting is greater for the c-sample problem than for the corresponding one-sample problem. On the other hand, the loss in efficiency in underfitting the correct model is not as great for the c-sample problem as it is for the one-sample problem.
A comparison of the parametric and nonparametric procedures would obviously lead to the same formulae as presented for the one-sample problem. This fact is deduced by simply observing that the function R(q,t,α) is 1 when q = t, and that the ratio of the determinants or traces of the noncentralities would be independent of the matrix A. Hence, the one-sample bounds obtained in Chapter III are applicable to the c-sample problem.

Adjustment of the statistics with the higher order terms as covariables would also lead to results identical to those presented in Section 3.7, and the bounds attained in that section would apply for the c-sample problem as well.
CHAPTER V

NUMERICAL ILLUSTRATIONS OF THE CARE AND TARE

5.1  Introduction

The purpose of this chapter is to provide some numerical illustrations of the CARE and TARE in specific situations. We have seen in Chapters III and IV that the CARE and TARE for the polynomial growth curve model depend in general on: a) the covariance matrix, Σ, b) the number of parameters in the model, q, c) the number of time points, h, d) the use of covariates and e) the score function. We concern ourselves in this chapter with the parametric procedures only. We shall find, however, that the results in Section 5.3 for the parametric tests are similar to the results obtained for the Wilcoxon score. Sections 5.3 and 5.4 are devoted to the evaluation of the TARE and CARE for the problems of underfitting and overfitting, respectively. Particular attention is paid to the situation where the data vectors are observations at five equally spaced time points. Section 5.2 is included to show the reduction of the TARE and CARE for the uniform correlation model, and a tabulation of the TARE in this case is given. The covariance matrix defined in (5.2.6) is the one which is used to provide numerical examples of the values of TARE and CARE.
5.2  Special Covariance Patterns

If the covariance matrix is uniform, for example,

    (5.2.1)  Σ = σ² [ 1  ρ  ...  ρ
                      ρ  1  ...  ρ
                      ...
                      ρ  ρ  ...  1 ],

then the TARE and CARE for the test of stationarity studied in detail in Chapter III do not depend on σ² or ρ. To demonstrate this we let B = [b₁, ..., b_t] be the orthonormal polynomials of degrees 1 to t; then

    B'ΣB = σ²(1 − ρ) I.

Consequently, the TARE and CARE for overfitting with t statistics relative to the test with q statistics are:

    (5.2.2)  TARE = R(q,t,α) · t/q   and   CARE = R(q,t,α),   for t ≥ q.

If t < q then

    (5.2.3)  TARE = R(q,t,α) · t/q.

The quantity (5.2.3) is tabulated in Table 5.2.1 with q = t₁, t = t₂ in the notation used in that table.
The entries in Table 5.2.1 are valid for all values of the number of time points, h, and are valid for both the parametric and Wilcoxon procedures discussed in Chapter III. Inspection of the table reveals that the efficiency is not greatly reduced by underfitting the model by only one or two parameters, but is quite low for underfitting by a great deal.
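Table 5.2.1 can be reproduced numerically. The sketch below is illustrative only: it assumes the local-power form of the adjustment factor, R(q,t,α) = [P(χ²_{t+2} > χ²_{t,α}) − α] / [P(χ²_{q+2} > χ²_{q,α}) − α], from Chapter II (not reproduced in this excerpt), and evaluates (5.2.3) with q = t₁, t = t₂ using only the standard library:

```python
import math

def chi2_sf(x, k):
    # P(chi^2_k > x) for integer df k, via the closed-form series for the
    # regularized upper incomplete gamma function
    if x <= 0:
        return 1.0
    h = x / 2.0
    if k % 2 == 0:
        term, s = 1.0, 1.0
        for j in range(1, k // 2):
            term *= h / j
            s += term
        return math.exp(-h) * s
    term, s = math.sqrt(h) / math.gamma(1.5), 0.0
    for j in range(1, (k - 1) // 2 + 1):
        s += term
        term *= h / (j + 0.5)
    return math.erfc(math.sqrt(h)) + math.exp(-h) * s

def chi2_isf(alpha, k):
    # upper-alpha critical value by bisection on the survival function
    lo, hi = 0.0, 1000.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if chi2_sf(mid, k) > alpha else (lo, mid)
    return 0.5 * (lo + hi)

def R(q, t, alpha):
    # assumed local-power slope ratio (Chapter II form)
    def slope(k):
        return chi2_sf(chi2_isf(alpha, k), k + 2) - alpha
    return slope(t) / slope(q)

def tare_uniform(t1, t2, alpha=0.05):
    # eq. (5.2.3): TARE for underfitting under the uniform covariance pattern
    return R(t1, t2, alpha) * t2 / t1
```

For α = .05 this yields tare_uniform(2, 1) ≈ .76 and tare_uniform(3, 1) ≈ .65, matching the first column of Table 5.2.1; R(q, q, α) = 1, consistent with the remark in Section 4.5.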
As we pointed out in the concluding comments of Chapter III, the results of that chapter are applicable to other hypotheses of interest; e.g., we may test the hypothesis that the intercept is a fixed quantity, say α₀₀, and that the curve is stationary. In this case the B matrix is augmented by b₀. This hypothesis is of special interest since the c-sample efficiencies have been shown to be multiples of the corresponding one-sample efficiencies of that particular test. Defining B as B = [b₀, b₁, ..., b_q], we find that

    (5.2.4)  B'ΣB = σ² diag(1 − ρ + hρ, 1 − ρ, ..., 1 − ρ).
Hence, the TARE and CARE are independent of σ²; the TARE, however, depends in general on ρ. Inspection of (5.2.4) and its determinant readily shows that the CARE would be independent of ρ. In fact, the quantity (5.2.2) would define the CARE for this problem. We remark also that covariance adjustment by the higher order polynomial estimates does not alter the results discussed up to this point in this section, since B'ΣB is diagonal in both cases. While the CARE does not depend on ρ for the second hypothesis, the TARE does depend on ρ. With some algebra the TARE of the test with t statistics relative to the test with q statistics can be shown to be:

    (5.2.5)  TARE = R(q,t,α) [t(1−ρ) + h(t−1)ρ] / [q(1−ρ) + h(q−1)ρ].
We have not tabulated values of (5.2.5), but it may be of interest to do so in future work. The point to be made is that we get different efficiency results by including the intercept in the hypothesis.
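As a quick check (illustrative only; the QR construction of b₀ and the orthonormal polynomials is an assumed but standard device), the second factor of (5.2.5) can be compared against a direct trace computation for the uniform pattern:

```python
import numpy as np

def tare_factor_direct(h, rho, q, t):
    """Ratio of traces of the noncentrality matrices, computed directly."""
    sigma = (1 - rho) * np.eye(h) + rho * np.ones((h, h))  # sigma^2 = 1
    # columns: orthonormal polynomials of degrees 0, ..., max(q, t) - 1
    x = np.arange(1.0, h + 1)
    b, _ = np.linalg.qr(np.vander(x, max(q, t), increasing=True))
    tr = lambda k: np.trace(np.linalg.inv(b[:, :k].T @ sigma @ b[:, :k]))
    return tr(t) / tr(q)

def tare_factor_closed(h, rho, q, t):
    """Second factor of (5.2.5)."""
    return (t * (1 - rho) + h * (t - 1) * rho) / (q * (1 - rho) + h * (q - 1) * rho)

direct = tare_factor_direct(5, 0.3, 4, 2)
closed = tare_factor_closed(5, 0.3, 4, 2)
```

The two agree to machine precision, since B'ΣB is exactly the diagonal matrix (5.2.4) for this pattern.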
A model considered in time series analysis is the first order auto-regressive model, with covariance structure defined by (5.2.6):

    (5.2.6)  Σ = σ² [ 1        ρ        ρ²      ...  ρ^(h−1)
                      ρ        1        ρ       ...  ρ^(h−2)
                      ρ²       ρ        1       ...  ρ^(h−3)
                      ...
                      ρ^(h−1)  ρ^(h−2)  ...     ρ    1       ],

that is, Σ_jk = σ² ρ^|j−k|. In Sections 5.3 and 5.4 we shall evaluate the TARE and CARE for this correlation pattern for various values of ρ and h. The value of σ² is, of course, immaterial since the TARE and CARE are independent of σ².
It is known that the inverse of Σ, defined by (5.2.6), is:

    Σ⁻¹ = [σ²(1−ρ²)]⁻¹ [ 1    −ρ     0     ...   0
                         −ρ   1+ρ²   −ρ    ...   0
                         0    −ρ     1+ρ²  ...   0
                         ...
                         0    ...    −ρ    1+ρ²  −ρ
                         0    ...    0     −ρ    1  ],

the tridiagonal matrix with 1, 1+ρ², ..., 1+ρ², 1 on the diagonal and −ρ on the off-diagonals, which is useful in computing the covariance matrix of the weighted least squares estimates. This covariance matrix is, of course, given by (B'Σ⁻¹B)⁻¹. When the underlying distribution function is normal, the covariance matrix (B'Σ⁻¹B)⁻¹ is identical to the covariance matrix of the maximum likelihood estimates. Hence, we see that (B'Σ⁻¹B) is the matrix in the noncentrality parameter of these statistics. In Section 5.3 we compare the underfitting procedure based on the unweighted least squares estimates to the trace of (B'Σ⁻¹B).
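A short check (illustrative only; the parameter values are arbitrary) confirms the tridiagonal form of the inverse:

```python
import numpy as np

h, rho, s2 = 6, 0.7, 2.0   # arbitrary illustrative values
sigma = s2 * rho ** np.abs(np.subtract.outer(np.arange(h), np.arange(h)))

# claimed inverse: tridiagonal, diagonal (1, 1+rho^2, ..., 1+rho^2, 1),
# off-diagonals -rho, all divided by sigma^2 (1 - rho^2)
inv = np.diag(np.r_[1.0, np.full(h - 2, 1 + rho ** 2), 1.0])
inv += np.diag(np.full(h - 1, -rho), 1) + np.diag(np.full(h - 1, -rho), -1)
inv /= s2 * (1 - rho ** 2)

err = np.abs(sigma @ inv - np.eye(h)).max()
```

The product Σ Σ⁻¹ recovers the identity to machine precision.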
TABLE 5.2.1

VALUES OF TARE FOR UNDERFITTING FOR UNIFORM COVARIANCE PATTERNS

    t₁\t₂   1    2    3    4    5    6    7    8    9   10   11   12   13
      2   .76
      3   .65  .86
      4   .58  .76  .89
      5   .54  .70  .82  .92
      6   .50  .65  .76  .85  .93
      7   .47  .61  .72  .80  .87  .94
      8   .44  .58  .68  .76  .83  .89  .95
      9   .42  .55  .65  .72  .79  .85  .90  .95
     10   .40  .53  .62  .69  .75  .81  .86  .91  .96
     11   .39  .51  .59  .66  .73  .78  .83  .88  .92  .96
     12   .37  .49  .57  .64  .70  .75  .80  .85  .89  .93  .96
     13   .36  .47  .55  .62  .68  .73  .77  .82  .86  .90  .93  .97
     14   .35  .46  .54  .60  .66  .71  .75  .79  .83  .87  .90  .94  .97

Note: Entries in the table are the TARE of the incorrect test with t₂ degrees of freedom relative to the test specifying the correct number of statistics, t₁. Values in the table were computed with α = .05.
5.3  Underfitting

Because of the relationship between the multi-sample problem and the one-sample problem for the hypothesis of a specified intercept and a stationary or constant response over time, we shall consider this hypothesis in this section. This section specifically considers the evaluation of the TARE of the unweighted least squares procedure which underfits the model with respect to the unweighted least squares procedure which specifies the correct number of parameters. We also consider the TARE when covariance adjustment has been used in the underfitting procedure. These two comparisons are analogous to two possible procedures one may follow in practice; the first procedure simply being unweighted least squares with no covariables, while the second is the unweighted least squares estimates with all remaining higher order polynomial estimates used as covariables. In addition to comparing these two underfitting procedures to the unweighted least squares procedure based on the correct number of parameters, we also compare them to the weighted least squares procedure based on the correct number of parameters.

The covariance matrix (5.2.6) was used for the covariance matrix of the original observations. Tables of the ratio of the traces of the noncentrality parameters were generated for ρ = ±.1, ±.5, ±.9 and ±.95 for five and also ten equally spaced time points. For five time points these figures are found in Tables 5.3.1 - 5.3.8; two illustrations for ten time points are given in Tables 5.3.9 and 5.3.10. To compute the TARE one only needs to obtain the value of R corresponding to the α-level and the number of samples and multiply by the appropriate number selected from the tables. Values of R are found in Chapter II.
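The ratios of traces below the diagonals of the 'a' tables can be computed directly. The sketch below is illustrative (the QR construction of the orthonormal polynomials is an assumption) and reproduces the (t₁ = 2, t₂ = 1) entry of Table 5.3.1a:

```python
import numpy as np

def trace_ratio(h, rho, t1, t2):
    """Ratio of traces of the noncentrality matrices: underfitted
    unweighted LS test (t2 parameters) vs. correct test (t1 parameters),
    for the AR(1) covariance pattern (5.2.6) with sigma^2 = 1."""
    sigma = rho ** np.abs(np.subtract.outer(np.arange(h), np.arange(h)))
    x = np.arange(1.0, h + 1)
    b, _ = np.linalg.qr(np.vander(x, max(t1, t2), increasing=True))
    tr = lambda k: np.trace(np.linalg.inv(b[:, :k].T @ sigma @ b[:, :k]))
    return tr(t2) / tr(t1)

ratio = trace_ratio(5, 0.1, 2, 1)   # Table 5.3.1a entry (t1 = 2, t2 = 1)
```

Multiplying such a ratio by the appropriate R from Chapter II gives the TARE, as described above.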
While the numerical results in Tables 5.3.1 - 5.3.10 are for the parametric test procedures, the corresponding tabulation for the Wilcoxon score yields nearly identical results for the figures below the diagonals. The maximum difference was, in fact, .01. Each figure below the diagonal is the ratio of the trace of the matrix in the noncentrality parameter of the test based on the unweighted least squares procedure with too few parameters to the corresponding trace for the unweighted least squares test with the correct number of parameters. For this problem, t₁ in the tables is the number of parameters in the true model while t₂ is the number of parameters assumed. In the upper triangle of the tables, we compare the underfitted tests to the test using the weighted least squares reduction. The roles of t₁ and t₂ are thus reversed in this comparison. The entries above the diagonal in the 'a' tables are therefore given by (5.3.1), and those below the diagonal by (5.3.2). The entries in the 'b' tables have identical denominators, but the covariance matrix in the numerator has been changed to the covariance-adjusted form

    (5.3.3)  B'_{t₁}ΣB_{t₁} − B'_{t₁}ΣB_{h−t₁}(B'_{h−t₁}ΣB_{h−t₁})⁻¹B'_{h−t₁}ΣB_{t₁},

where B_{h−t₁} denotes the matrix of the orthonormal polynomials of degrees t₁ to (h−1).

Examining the tables we see that for ρ = ±.1 covariance adjustment does not change the ratio of traces of the underfit relative to either procedure. For all t₁ and t₂ the ratio of traces is quite low when h = 5.
When h = 10 the ratio is near .90 if the degree of the true model is high and the degree of the incorrect test is only one less. For ρ = .5 and h = 5 or 10 slight improvements are observed using covariance adjustment; these are small, however. The ratio of traces improves by as much as .07 with covariance adjustment for ρ = −.5. If ρ is .9 or .95 the ratio of traces is extremely small whether covariance adjustment is used or not. Nearly all of the ratios of traces are less than .50 in this case and many are near zero. Hence, for large positive correlations the loss in the TARE in underfitting the model is substantial. For large negative values of ρ we see that covariance adjustment greatly improves the efficiency results. For example, in comparing the test based on unweighted least squares with two parameters to the test based on unweighted least squares with three, we see that the ratio of traces is .44. Adjustment of the former test with the three higher order terms as covariables increases the ratio of traces to 1.06. This is not a surprising result, since the test with covariance adjustment is providing information about the underlying covariance matrix which the unweighted least squares procedure does not obtain. Marked improvements are also noted for the adjusted test as compared to the weighted least squares procedure. When adjustment is made for the factor R, one will find that the TARE will in several cases be greater than one for the test based on fewer degrees of freedom relative to the test based on a larger number of degrees of freedom. This can be explained by one of two possibilities in the case of comparing the unweighted least squares tests: either the covariance is such that not weighting causes a reduction in the trace of the matrix in the expression for the noncentrality, or the increase in the size of the trace of the test with more degrees of freedom is too small to account for the loss in sensitivity of a test with more degrees of freedom. The interpretation of this must be made in light of the fact that we have considered the local power functions, which means the alternatives are close to the null point. No generalization to alternatives at a greater distance from the null point should be made, since the truncated power function is not an adequate approximation to the entire power function for these alternatives.
A fuller discussion of the relationship between the degrees of freedom and the noncentrality parameter is found in Krishnaiah (1966, pp. 91-92). The interpretation of the increase in the TARE of the test with fewer degrees of freedom relative to the test with more degrees of freedom based on the weighted least squares procedure lies simply in the fact that the gain in degrees of freedom of the latter test has not been accompanied by a sufficiently large increase in the trace of the matrix in its noncentrality. The weighted least squares procedure, which is equivalent to the maximum likelihood procedure in the normal theory case, is the uniformly most powerful test in the class of invariant tests for the one-sample problem. Underspecifying the true model results in choosing estimates which are singular transformations of the estimates for the correct number of parameters. Hence, tests constructed for the underfitted model do not belong to the same invariant class as the weighted least squares test statistics. According to a result of Wald (1943) the test based on the maximum likelihood statistics (weighted estimates) has best average power over the family of ellipsoids defined by equating its noncentrality parameter to a constant. Integration over the family of spheres for local alternatives weights all the parameters equally, while in using the ellipsoids as the surface of integration the weights are determined by the covariance matrix. Since the likelihood criterion provides the best test in the sense of best average power on a given family of ellipsoids, it may not perform as well when the surfaces of integration are spheres and the information matrix is not proportional to I.
Care must be used in the interpretation of the efficiency results
presented in this section because we have compared the tests in a very
restricted manner.
First, we have allowed only local alternatives and,
second, we have taken average power in this local sense.
While the
test based on fewer degrees of freedom may have better average local
power, it is at the same time inferior in those directions which have
not been fit.
In conclusion, we see the numerical results show that we never
lose by using covariance adjustment of the primary variates with the
higher order polynomial terms.
Furthermore, there are covariance
matrices that show great improvements in the TARE by use of the
covariance technique.
While underfitting the model is not desirable,
the results in this section seem to suggest that the loss in the
average local power is not large for certain covariance matrices.
For
others, the loss is severe and we see that the decision to include or
exclude higher degree polynomial terms depends to some extent on the
underlying covariance pattern.
TABLE 5.3.1

RATIO OF TRACES OF NONCENTRALITY PARAMETERS FOR UNDERFITTING:
FIRST ORDER MODEL ρ = .1, h = 5

a. Without Covariance Adjustment

    t₁\t₂    1     2     3     4     5
      1          .48   .30   .22   .17
      2    .48         .64   .46   .35
      3    .31   .64         .72   .55
      4    .22   .46   .72         .77
      5    .17   .35   .55   .77

b. With Covariance Adjustment

    t₁\t₂    1     2     3     4     5
      1          .48   .31   .22   .17
      2    .48         .64   .46   .35
      3    .31   .64         .72   .55
      4    .22   .46   .72         .77
      5    .17   .35   .55   .77

Note: Entries are defined by (5.3.1), (5.3.2) and (5.3.3).
TABLE 5.3.3

RATIO OF TRACES OF NONCENTRALITY PARAMETERS FOR UNDERFITTING:
FIRST ORDER MODEL ρ = .5, h = 5

a. Without Covariance Adjustment

    t₁\t₂    1     2     3     4     5
      1          .34   .16   .09   .06
      2    .35         .46   .26   .17
      3    .16   .47         .56   .36
      4    .09   .27   .56         .64
      5    .06   .17   .36   .64

b. With Covariance Adjustment

    t₁\t₂    1     2     3     4     5
      1          .35   .17   .09   .06
      2    .36         .47   .27   .17
      3    .17   .48         .57   .37
      4    .09   .27   .57         .64
      5    .06   .17   .37   .64

Note: See note for Table 5.3.1.
TABLE 5.3.5

RATIO OF TRACES OF NONCENTRALITY PARAMETERS FOR UNDERFITTING:
FIRST ORDER MODEL ρ = .9, h = 5

a. Without Covariance Adjustment

    t₁\t₂    1     2     3     4     5
      1          .09   .02   .01   .01
      2    .10         .25   .11   .06
      3    .03   .26         .43   .24
      4    .01   .11   .43         .56
      5    .01   .06   .24   .56

b. With Covariance Adjustment

    t₁\t₂    1     2     3     4     5
      1          .09   .03   .01   .01
      2    .10         .27   .12   .07
      3    .03   .27         .44   .25
      4    .01   .12   .44         .57
      5    .01   .07   .25   .57

Note: See note for Table 5.3.1.
TABLE 5.3.6

RATIO OF TRACES OF NONCENTRALITY PARAMETERS FOR UNDERFITTING:
FIRST ORDER MODEL ρ = −.9, h = 5

a. Without Covariance Adjustment

    t₁\t₂    1     2     3     4     5
      1          .19   .15   .13   .12
      2    .45         .32   .28   .27
      3    .20   .44         .64   .61
      4    .14   .32   .72         .84
      5    .12   .27   .61   .84

b. With Covariance Adjustment

    t₁\t₂    1      2      3     4     5
      1           .61    .48   .41   .39
      2    1.48          .78   .68   .65
      3    .65    1.06         .87   .83
      4    .47    .77    .99         .95
      5    .39    .65    .83   .95

Note: See note for Table 5.3.1.
TABLE 5.3.7

RATIO OF TRACES OF NONCENTRALITY PARAMETERS FOR UNDERFITTING:
FIRST ORDER MODEL ρ = −.95, h = 5

a. Without Covariance Adjustment

    t₁\t₂    1     2     3     4     5
      1          .10   .08   .07   .06
      2    .32         .24   .21   .20
      3    .11   .33         .62   .59
      4    .07   .24   .71         .83
      5    .06   .20   .59   .83

b. With Covariance Adjustment

    t₁\t₂    1      2      3     4     5
      1           .61    .48   .42   .40
      2    2.01          .78   .68   .65
      3    .67    1.10         .87   .83
      4    .48    .78    .99         .95
      5    .40    .65    .83   .95

Note: See note for Table 5.3.1.
TABLE 5.3.8

RATIO OF TRACES OF NONCENTRALITY PARAMETERS FOR UNDERFITTING:
FIRST ORDER MODEL ρ = .95, h = 5

a. Without Covariance Adjustment

    t₁\t₂    1     2     3     4     5
      1          .05   .01   .01   .00
      2    .05         .23   .10   .06
      3    .01   .24         .41   .23
      4    .00   .10   .42         .56
      5    .00   .06   .23   .56

b. With Covariance Adjustment

    t₁\t₂    1     2     3     4     5
      1          .05   .01   .01   .00
      2    .05         .24   .10   .06
      3    .01   .25         .42   .24
      4    .01   .10   .43         .56
      5    .00   .06   .24   .56

Note: See note for Table 5.3.1.
5.4  Overfitting

We present numerical results for the CARE of the parametric test that overspecifies the model relative to the parametric test which correctly specifies the model, for h equal to five. We consider the unweighted least squares procedure for both tests. We restrict attention to the first order auto-regressive model and the hypothesis discussed in Section 5.3. In Tables 5.4.1 - 5.4.4 we have tabulated the t₁-th root of the ratio of the determinants above the diagonal, while below the diagonal we have the CARE of the overspecified test to the test based on the correct number of parameters for the one-sample problem (or two-sample problem) for α = .05. Hence, the entries above the diagonals are given by (5.4.1) and those below the diagonal by (5.4.2).

We have tabulated the results for ρ = ±.5, ±.90. The tables were generated for other values of ρ; however, the results for intermediate values of ρ are between the values listed in the tables. Small values of |ρ| result in D, defined by (5.4.1), being very close to one. The comparison to the test based on the weighted least squares reduction yields similar values of the CARE for |ρ| ≤ .5. For larger values of |ρ| the CARE for comparison to the weighted least squares procedure is much lower (see Table 5.4.4).
We note that for ρ = ±.5 the CARE for overfitting is about .40 for fitting five terms when only one is needed. If we overfit a four parameter model by a five parameter model the CARE is about .90. When ρ = +.9 we obtain results identical with the case where ρ = +.5, while for ρ = −.9 we see that overfitting in most cases produces a CARE greater than one. This most probably is caused by the fact that the overfitting yields information about the covariance matrix not used in the unweighted least squares procedure which specifies the correct number of parameters. Comparison to the weighted least squares procedure based on the correct number of parameters for ρ = −.9 produces rather low values of the CARE. On the other hand, with underfitting in this same case we have a loss in the ratio of the traces of the noncentralities; however, a large portion of this loss is recovered by covariance adjustment (see Table 5.3.6). Furthermore, all the numbers in Table 5.3.6 will increase when the adjustment is made for R. This suggests that overfitting has more loss associated with it than does underfitting for certain covariance matrices.
TABLE 5.4.1

VALUES OF D AND CARE FOR OVERFITTING ρ = .5

    t₁\t₂    1      2      3      4      5
      1          1.00   1.04   1.04   1.04
      2    .65          1.02   1.04   1.04
      3    .53    .79          1.01   1.02
      4    .44    .68    .85          1.00
      5    .39    .59    .75    .88

Note: Entries above the diagonal are defined by (5.4.1). Entries below the diagonal are defined by (5.4.2).
TABLE 5.4.2

VALUES OF D AND CARE FOR OVERFITTING ρ = −.5

    t₁\t₂    1      2      3      4      5
      1          1.00   1.10   1.10   1.11
      2    .65          1.05   1.13   1.13
      3    .56    .82          1.05   1.12
      4    .47    .74    .88          1.05
      5    .41    .65    .82    .91

Note: See note for Table 5.4.1.
TABLE 5.4.3

VALUES OF D AND CARE FOR OVERFITTING ρ = +.9

    t₁\t₂    1      2      3      4      5
      1          1.00   1.03   1.03   1.03
      2    .65          1.02   1.04   1.04
      3    .53    .79          1.02   1.02
      4    .44    .68    .85          1.01
      5    .39    .59    .75    .88

Note: See note for Table 5.4.1.
TABLE 5.4.4

VALUES OF D AND CARE FOR OVERFITTING ρ = −.9

    t₁\t₂      1          2          3          4          5
      1                 1.00       3.15       3.15       3.25
      2    .65(.20)                1.77       2.34       2.38
      3    1.60(.58)  1.38(.49)              1.20       2.19
      4    1.35(.46)  1.53(.64)  1.01(.41)              1.56
      5    1.22(.87)  1.36(.73)  1.60(.57)  1.37(.37)

Note: See note for Table 5.4.1. Numbers in parentheses are the CARE of the overfit with respect to the weighted least squares test.
CHAPTER VI

SUMMARY AND SUGGESTIONS FOR FURTHER RESEARCH

We have considered in this study two measures of ARE which may be used when comparing two statistics with limiting chi-square distributions. The two quantities, the CARE and TARE, may be used when the chi-square distributions have different degrees of freedom. The CARE selects that test of the two competing tests whose power function has the greater generalized Gaussian curvature at the null point. On the other hand, the TARE selects the test whose power function has the greater average local power over the family of spheres. Both the CARE and TARE depend on the degrees of freedom of the tests, the significance level and the noncentrality parameters. The CARE and TARE computing formulae have been derived for the one-sample and the multi-sample polynomial growth curve problems. Particular attention has been paid to the formulae for the underfitting and overfitting problems. Numerical examples of the CARE and the TARE have also been computed to provide some idea of the efficiency results for some special cases.

Several areas of future work may be suggested. In the area of applications it would be useful to study the CARE and TARE for examples other than the ones chosen in Chapter V. In addition, the work of Chapters III and IV should be generalized to include more complicated designs. Hopefully, these results would reduce to the one-sample problem as the multi-sample problem did, so the same bounds on the TARE and CARE would apply, excepting the factor R.
It seems reasonable that the noncentrality parameters for the more complicated models would factor as the noncentrality parameters for the multi-sample problem did.

In addition to the growth curve problem it would be desirable to study other applied problems. The results of Chapter II are in no way restricted to the study of growth curve models, but may be applied to virtually any situation where a reduction to a set of summary statistics is made and we want to determine the efficiency of the reduction. For example, if the course of a disease is characterized by a stochastic process and it is of interest to compare different groups of persons in this disease process, then certain summary statistics may define or describe the process. One way to compare the groups is to reduce each person's observations to the basic summary statistics by some suitable estimation procedure and then compare the summary statistics of the different groups. A natural question arises as to whether the reduction is a 'good' reduction to use. The basic approach used in this study would be helpful in this problem.

Another area of possible future work is to consider other efficiency criteria. From the results obtained in Chapter V for the underfitting problem, we see that it may be meaningful to define the TARE as the ratio of the average local powers where the average is taken over the family of ellipsoids defined by the noncentrality of the likelihood ratio test. In addition, the restriction to local alternatives makes the TARE a somewhat restricted measure of ARE.
REFERENCES

[1] Bahadur, R. R. (1960). "Stochastic comparison of tests," Annals of Mathematical Statistics, 31, 276-295.

[2] Blomqvist, N. (1950). "On a measure of dependence between two random variables," Annals of Mathematical Statistics, 21, 593-600.

[3] Bickel, P. J. (1964). "On some alternative estimates for shift in the p-variate one-sample problem," Annals of Mathematical Statistics, 35, 1079-1090.

[4] Bickel, P. J. (1965). "On some asymptotically nonparametric competitors to Hotelling's T²," Annals of Mathematical Statistics, 36, 160-173.

[5] Gastwirth, J. L. and Wolff, S. S. (1968). "An elementary method for obtaining lower bounds on the asymptotic power of rank tests," Annals of Mathematical Statistics, 39, 2128-2131.

[6] Graybill, F. A. (1961). An Introduction to Linear Statistical Models, Volume I, McGraw-Hill Book Company, Inc., New York.

[7] Graybill, F. A. (1969). Introduction to Matrices with Applications in Statistics, Wadsworth Publishing Company, Inc., Belmont, California.

[8] Grizzle, J. E. and Allen, D. (1969). "Analysis of growth and dose response curves," Biometrics, 25, 357-382.

[9] Hodges, J. L., Jr. and Lehmann, E. L. (1956). "The efficiency of some nonparametric competitors of the t-test," Annals of Mathematical Statistics, 27, 324-335.

[10] Isaacson, S. L. (1951). "On the theory of unbiased tests of simple statistical hypotheses specifying the values of two or more parameters," Annals of Mathematical Statistics, 22, 217-234.

[11] Kendall, M. and Stuart, A. (1963). The Advanced Theory of Statistics, Volume II, Second Edition, Hafner Publishing Company, New York.

[12] Krishnaiah, P. R. (1966). Multivariate Analysis, Academic Press, New York.

[13] Lehmann, E. L. (1959). Testing Statistical Hypotheses, John Wiley and Sons, Inc., New York.

[14] Neyman, J. and Pearson, E. S. (1936). "Contributions to the theory of testing statistical hypotheses I. Unbiased critical regions of type A and type A₁," Statistical Research Memoirs, Volume I, 1-37.

[15] Neyman, J. and Pearson, E. S. (1938). "Contributions to the theory of testing statistical hypotheses," Statistical Research Memoirs, Volumes II and III, 25-57.

[16] Noether, G. E. (1955). "On a theorem of Pitman," Annals of Mathematical Statistics, 26, 64-68.

[17] Potthoff, R. F. and Roy, S. N. (1964). "A generalized multivariate analysis of variance model useful especially for growth curve problems," Biometrika, 51, 313-326.

[18] Puri, M. L. and Sen, P. K. (1971). Nonparametric Methods in Multivariate Analysis, John Wiley and Sons, Inc., New York.

[19] Sen, P. K. and Puri, M. L. (1970). "Asymptotic theory of likelihood ratio and rank order tests in some multivariate linear models," Annals of Mathematical Statistics, 41, 87-100.

[20] Wald, A. (1943). "Tests of statistical hypotheses concerning several parameters when the number of observations is large," Transactions of the American Mathematical Society, 54, 426-482.

[21] Weyl, H. (1939). "On the volume of tubes," American Journal of Mathematics, 61, 461-472.