Williams, George W.; (1972)Restrained multisample nonparametric multivariate estimation."

.e
RESTRAINED MULTISAMPLE NONPARAMETRIC
MULTIVARIATE ESTIMATION
by
George W. Williams
Department of Biostatistics
University of North Carolina, Chapel Hill, N. C.
Institute of Statistics Mimeo Series No. 836
AUGUST 1972
RESn);~H:~D :~ULTISi\i:PLE tlO::PM~/\i:ETRIC
MULTIVARIATE ESTIMATION
by
George W. Williams
·e
A dissertation submitted to the faculty
of the University of North Carolina at
Chapel Hill in partial fulfillment of
the requirements for the degree of
Doctor of Philosophy in the Department
of Biostatistics
Chapel Hi 11
1972
Approved by:
Adviser
Reader
Reader
ii
TABLE OF CONTENTS
CHAPTER
PAGE
ACKNOWLEDGEMENTS
ABSTRACT
I
INTRODUCTION AND SUMMARY
Review of the Literature
Rel ated Results
Summary of Results and Notation
Some Applications
1.1.
1.2.
1.3.
1.4.
II
TWO SAMPLE
2.1.
2.2.
2.3.
2.4.
2.5.
.e
III
WITH FIXED TOTAL SAMPLE SIZE
Introduction
Procedure
Strong Convergence of Estimators
Asymptotic Normality
Optimality Results
26
27
31
39
63
Introduction
Procedure
Consistency
Efficiency
Remarks
70
70
73
82
95
SMALL SAMPLE BEHAVIOR
4.1.
4.2.
4.3.
4.4.
4.5.
e
ESTI~~TION
1
6
10
23
A TWO SAMPLE CONFIDENCE ELLIPSOID OF FIXED
MAXIMUM WIDTH
3.1.
3.2.
3.3.
3.4.
3.5.
IV
iv
v
Introduction
The Wilcoxon Statistic
Small Sample Behavior for the Wilcoxon
Statistic
Tables 4.3.1
4.3.12
Lehmann's Two Sample Scale Statistic
Small Sample Behavior for Lehmann's Two
Sample Scale Statistic
Tables 4.5.1 - 4.5.2
-
100
101
105
110
122
125
128
iii
PAGE
CHAPTER
4.6.
v
·e
Calculation of the Asymptotic Variance
of Lehmann's Statistic
130
SUMMARY AND SUGGESTIONS FOR FURTHER RESEARCH
133
REFERENCES
137
iv
ACKNOWLEDGEMENTS
I would like to express my sincere gratitude to Professor
P. K. Sen for suggesting the problems considered in this dissertation
and for many helpful discussions during its preparation.
I would also like to thank the other members of my committee:
Professors S. K. Chatterjee, J. Grizzle, R. Kuebler, and J. Cornoni.
Their encouragement and helpful comments were greatly appreciated.
I wish to acknowledge the Department of Biostatistics who
·e
supported my graduate studies through N. I. H. Training Grant No.
TOl GM00038 from the National Institute of General Medical Sciences.
Finally, a very special thanks is due my wife, Marcia, for
her willing assistance and understanding.
v
GEORGE W. WILLIAMS.
Restrained Multisample Nonparametric
Multivariate Estimation (Under the direction of p. K. Sen.)
ABSTRACT
In this dissertation two problems are considered.
First. we discuss a procedure for estimating a vector of
functionals of two distribution functions by means of a vector of
generalized U-statistics with the restriction that the total sample
size is fixed.
It will be shown that the estimate which is obtained
by the proposed procedure has an asymptotic generalized variance
equal to the minimum asymptotic generalized variance of any two
sample U-statistic estimator for which the total sample size is
·e
fixed.
The extension to the c-sample case (c
~
2) is indicated.
Second, we discuss a sequential procedure for obtaining a
confidence ellipsoid for a vector of functionals of two distribution
functions that has a fixed maximum width and prescribed confidence
coefficient, a.
It will be shown that under certain conditions the
confidence coefficient is asymptotically
a.
It will also be shown
that the expected total sample size of our procedure is asymptotically
equal to the sample size one would use if the population parameters
were known.
Again. the extension to the c-sample case is indicated.
Finally, numerical illustrations are provided for both
procedures with particular attention given to the Wilcoxon statistic
and Lehmann's two sample scale statistic.
CHAPTER I
INTRODUCTION AND SUMMARY
1.1. Review of the Literature
Several workers have been concerned with the problem of
estimating and testing (with prescribed accuracy) the mean of a
distribution when the variance was unknown.
a two sample test for the mean of a
norm~l
Stein [1945] developed
distribution when the
variance was unknown. The size of the second sample depended upon
the result of the first sample. Ghurye and Robbins [1954] developed
a procedure to estimate the difference between the means of two
populations that were normal, poisson, binomial, or possessed moments
up to order eight.
If the cost of sampling is a linear function of
the number of observations, then they showed that the optimum values
of the sample sizes that will yield the minimum variance, V ' of
o
Xl - X2 subject to the cost function depends upon the ratio of the
unknown variances. To overcome this difficulty a two stage procedure
was used.
It was shown that under certain conditions the ratio of the
variance of the two stage estimator to Vo tends to unity.
Richter
[1960] considered the problem of estimating the common mean of two
normal populations using a fixed number of observations.
He
discussed the problem of how the first stage sample size might be
2
chosen as a function of the total sample size using a minimax
approach.
Yen [1964] improved upon the results of these earlier workers
by consideringa wider class of distribution functions than just the
nonmal distribution. Moreover, instead of estimating a function of
means, she dealt with the more general case of a real valued estimable
parameter e
=
e(F,G). More explicitly, she considered the case for
which there exists an unbiased estimator ,(Xl' ••• 'Xr,Yl, ••• ,Y r ) of
e where , is a function of independent observations from two
populations with cumulative distribution functions F(x) and G(y).
The class of distribution functions considered were those for which.
a U-statistic [see (1.2.2)] based on , is the unique minimum variance
unbiased estimator of e.
For a discussion of the requirements for
such a class of distribution functions, one should see Fraser
[1957, p. 142].
Her procedure for estimating e was a two stage procedure.
The total sample size was a fixed number N.
First, Mobservations
were taken from each population. The remaining N - 2M observations
were allocated between the two populations based upon the first 2M
observations. Two kinds of estimators (U' and Un) were proposed.
Both estimators were U-statistics with random sample sizes. U' was
based primarily on the second stage observations, while Un was
based on all N observations.
Let Vo denote the smallest variance of
any U-statistic estimator of e under the condition that the total
number of observations is fixed.
The sample sizes from the two
3
populations such that the variance of a U-statistic is V were
o
shown to depend upon certain population parameters.
Yen developed
estimates for these parameters directly from the theory of
U-statistics.
Yen showed that U· is unbiased and under certain
conditions, including the existence of the eighth moment of 4»,
that the variance of U· approaches asymptotically VO'
Although
it was shown that U" is biased, E(U,,~)2 also approaches asymptotically
VO'
Moreover, Yen discussed the optimal choice of Mrelative to
N.
The optimal choice depended upon whether one assumes that the
first eight moments of ,exist, all moments of 4» exist, or , is
bounded.
Finally, the asymptotic normality of U· and
un
were
proven.
In the preceeding discussion we have been considering the
·e
problem of estimating an unknown parameter by a statistic that has
minimum variance and for which the total sample size is fixed.
We
will now consider the closely related problem of determining
confidence intervals for an unknown parameter of specified length
and confidence coefficient for which the expected sample size is
minimum.
Wald [1947] who was one of the original workers in
sequential analysis suggested this problem for the particular case
of estimating the mean of a normal population when the variance
was unknown.
At approximately the same time Stein [1945] considered
a two sample procedure for this problem.
However, his procedure
rejected some of the information provided by the sample, and thus,
for practical purposes is unlikely to be acceptable.
Anscombe
[1953], besides considering this problem, discussed the related
4
problem of estimating the difference between the means of two
normal populations when the variances are unknown.
Ray [1957]
considered a small sample approach to the estimation of the mean
of a normal population by a confidence interval of prescribed
width and confidence coefficient when the population variance is
unknown. To consider the small sample theory for this problem,
he modified Anscombe's sequential procedure. As a result of his
numerical computations, he expressed some doubt in the asymptotic
consistency of the procedure. This doubt later proved to be
unjustified.
A significant advance in the work on these problems came
as a result of the work of Chow and Robbins [1965]. They wanted
to find a confidence interval of prescribed width and confidence
coefficient for the unknown mean of some population. They showed,
under the sole condition of a finite variance, the asymptotic
consistency and efficiency of their procedure. Starr [1966] by
means of numerical results showed that the procedure recommended
by Chow and Robbins [1965] was reasonably consistent and efficient
for all values of the unknown variance when considering the normal
distribution. He also showed that their procedure had a slightly
reduced minimum confidence coefficient compared with Stein's two
stage procedure. However, the Chow and Robbins procedure is more
efficient than the two stage procedure. The difference in efficiencies
is sizeable whenever the first stage sample size is chosen poorly.
Simons [1968] showed that the mean of a normal distribution with
5
unknown variance may be estimated to be within an interval of
given fixed width at a prescribed confidence level using a
procedure which overcomes ignorance about the variance with no
more than a finite number of observations. That is, the expected
sample size exceeds the (fixed) sample size one would use if the
variances were known by a finite amount; the difference depending
on the confidence level but not depending on the values of the
mean, the variance, and the interval width.
Robbins et al. [1967] extended the work of Chow and Robbins
(1965] and Starr [1966] to the problem of estimating the difference
of the means of two populations. Their problem was to find a
confidence interval of fixed width and coverage probability for the
difference of means from two distributions when the variances are
·e
unknown. Their sequential procedure consisted of a sampling scheme
which determines at each stage from which population to take the
next observation and a stopping rule.
Actually three related
stopping rules were discussed. Although the asymptotic consistency
and efficiency results were proven under the assumption that the
two distributions are normal, they hasten to point out that all that
is required is the existence of the fourth moment and even that
can be relaxed. Some numerical computations are presented in a
discussion of the small sample behavior of the procedure. Moreover,
it is also shown that the expected sample size exceeds the (fixed)
sample size one would use if the variances were known by a finite
amount.
6
Sproule [1969] extended the work of Chow and Robbins [1965]
to the problem of estimating a real valued estimable parameter of
a distribution function F(x). Based on estimates derived from the
theory of U-statistics, a sequential procedure was developed, and
its asymptotic consistency and efficiency were demonstrated.
Making use of the fact that U-stat1stics are reverse martingales,
the asymptotic efficiency was established in much the same way as
Simons [1968].
1.2. Related Results
In the study of our estimation procedure we. like Yen [1964]
and Sproule [1969]. will rely heavily on the theory of U-statistics
developed originally by Hoeffding [1948] and extended in many
directions by Sen and other workers. A systematic review of the
various properties of U-statistics can be found in Section 3.2 of
Puri and Sen [1971].
Let {Xi = (Xil •••••Xi )'
..
P
i
~
1 } be a sequence of independent
and identically distributed random vectors with a distribution function
F(x). x £ RP• the p (>l)-dimensional Euclidean space. If to any
F belonging to a subset
t:1'-'
~
of the set of all distribution functions
1n RP a quantity e(F) is assigned, then e(F) is called a functional
of F. defined on """
~.
If
(1.2.1)
where k 1s a positive integer and the Stie1tjes-lebesgue integral
1s extended over the entire space of
~l
' •.. '~k' then the estimate
e·
7
~
~(~l'''''~k)
of e(F) .is unbiased over.r.
If a functional has
an unbiased estimate over 7, then the functional is regular over 1'.
tl"
If e(F) is regular over I and k is the smallest sample size for
which there exists an unbiased estimate of e over
c.t , then
k
will be called the degree over ~ of the regular functional e (F).
For any regular functional e{F) there exists a kernel
to(~l '···'~k) symmetric in ~l '···'~k
since ~O{~l '···'~k) can
be formulated by the average of any kernel of e (F) over all
permutations of its arguments. An example of a regular functional
of degree 2 is the variance of Xij. The symmetric kernel for the
variance of X.. is
lJ
.e
For every m ~ k, a U-statistic is defined as
* t
Um = em)
k -1 E m
(1 .2.2)
(X
..°1
, •• .,X
where the summation extends over all 1
.. ok
~
01
)
< ••• <
ok
~m.
Many statistics in current use can be expressed 1n terms of Ustatistics such as moments and rank correlation statistics. For
every 0
~
c
~
k, 1et
8
where
~l'
••. '~k are arbitrary fixed vectors and the expected value
is taken with respect to the random vectors ~C+l' ••• '~k. Assuming
that the following expectations exist, for every 0
~
c
~
k
Hoeffding [1948] showed that if the variance of Um exists,
(1.2.3)
Although in our work we will consider only the situation
where X
,a
_a
= 1, ••• ,m
are identically distributed, the above
definitions can be easily extended to the case for which the {Xa }
are not identically distributed. Hoeffding [1948] showed that if
~l' ••• '~m
are identically distributed and
.(~l'
••• '~k)
independent of m, then the distribution function of
tends to a normal distribution function as m
+
m
is
~(Um-
e(F»
under the sole
condition of the existence of E .2(~l' ••• '~m). He showed that
similar results hold for the joint distribution of several Ustatistics, for statistics U· which are asymptotically equivalent
to U, for certain functions of U or U', and, under certain additional
's having different distributions.
assumptions, for the case of X
_a
Let
{~j' j ~
l} be an independent sequence of independent
and identically distributed random variables with a distribution
function G(y), y
£
Rp·, p.
>
1. We can now consider an estimable
e·
9
~
parameter [functional of (F,G)]
e (F,G)
~ dF(Xi)·.~
- J=l
= f... f •
where k and
q
(xl'···' xk' Y1'··.' Yq )
- .. i=l
dG(Yj)
..
are positive integers, and the kernel is symmetric in
(~1'···' ~k) as well as in (~1'···' ~q).
For every m ~ k and n
~
q, a generalized U-statistic is
defined as
-1
(n)
q
where the summation extends over all
1 ~ a1
< ••• <
For every
Bq
~
0 ~
1
~a1 < ••• <
a k ~ m,
n.
c
~ k
and
0 ~
d
~
q, 1et
.~
so that
.00
=e
(F,G) and too = O.
One can easily see by a direct extension of Hoeffding's [1948]
one sample result that
(1.2.4)
Var (U)
mn
=
-1
( mk>
-1
(qn>
10
1.3.Su~ary
of Results and Notation
Before discussing our results it will be necessary to formulate
some notation.
Consider a vector of estimable parameters [functionals
of (F,G»
= ( 0l(F,G), ••• ,
e (F,G)
0t(F,G»'
t
> 1
where
where ki , qi are positive integers, and the kernel
~i(Xl"'" Xk ' Y , ••• , Y ) is symmetric in (Xl"'" Xk ) as well
..
.. i .. l
.. qi
.... i
as
(Y" ••• , Yq ) for i
..
.. i
For every m ~ ki
defined as
Um
' = (mk )
n
i
-1
= 1, ... ,
and n
-1
(qn)
Emn
i
~
qi' a generalized U-statistic is
~i(Xa , ••• , Xa ' VB , ••• , Y )
.. 1
.. k
.. 1
.. Bqi
i
where the summation extends over all
1
~
81
< ••• <
8q
i
For every 0
t~~
=
E
~
~
c
e·
t.
~
t
al
< ••• <
ak
<
i -
m,
n for i = l, ••• ,t.
~
ki and 0
~
d
~
qi' let
~~d (~l""'~c'!l""'!d)
•
~~d (Xl"",X
'Y ""'Yd) - 6i(F,G) 0j(F,G)
... c -1
...
11
i (X) = (m-1) -1 (n )-1 '-~ 'I'... i ( X , X , ••• ,X
y
Y)
V10
.. h
k;-l
qi
h
- h _a2
..ak '0
"P1 .···'0
.. Pq
1
where the summation extends over all possible 1 < a2
1 ~61 <... < 6q ~ n with at. f- h, t. = 2, ... , m.
1
< ••• <
a
<
k1 -
m,
1
We will now define the following statistics 1n a manner
analogous to Sen [1960].
s1j
10
=
1
m-1
V~l(!t.) and S~~ follow similarly.
The following notation will also be introduced at this point
.
~
since it will be required in the proof of Theorem 2.3.2.
ij
uc,d;m,n
=
• t
where the summation extends over all possible
1 ~
al
< ••• <
1 .i. 61 < ••• <
ak
<
i -
aqi
m
< n
12
1
1
~
al*
~
61* <
< ••• <
*
ak.
~
m
*
Sq.
~
n
J
••• <
J
with the restriction that ah
e
= a~ , h = O,l, ••• ,c and ah ~
a:
= 6h ' h = O,l, ••• ,d
and Bh ~ 8;
for any other (h,t) where c = O,l, ••• ,k i and d = O,l, ••• ,qi
and ki ~ kj , qi ~ qj •
for any other (h,t) and Sh
Now with our notation defined, we can summarize our results in
Chapters II-IV.
In Chapter II we will consider a sequential procedure
to estimate e(F,G) by means of
~mn
with the restriction that m+n
=N
(fixed). An initial sanlp1e of equal size is taken from both
populations. Based on estimates of the generalized variance of
~mn' an estimate is obtained of A (
= mWn)
such that the generalized
,
variance of
~mn
is minimum. Another sample is taken from both
populations so that the initial sample and the second sample is
divided between the two populations in accordance with our estimate
of A. The procedure is repeated so that at each step the estimate
of A is improved. This sequential procedure not only allows our
final estimate of
A
to be based upon a larger number of observations
than would be possible in a two stage procedure, but it also reduces
the effect of the first stage sample size. Certainly, a special
case of our sequential procedure is Yen1s [1964] univariate two stage
procedure.
In order to estimate the generalized variance of
~mn
' we will
e.
13
~
use S~~ and S6{ , i,j
= l, •.. ,t.
These estimators are much easier to
calculate than the comparable estimators proposed by Yen [1964]
directly from the theory of U-statistics.
based upon vtO(~h)
(and V51(~t»
sl5
(and S~~)
is
which is a U-statistic with
terms while Yen1s estimator is based on a U-statistic with
m-k. n-q.
1) (
1)
ki-l qi
(m- 1 ) (n ) (
ki-l
qi
terms.
i .
.~
i .
We shall prove that Sl~ (and SOl ) converges almost surely
ij
..
to tlO (and t~~) using a decomposition suggested by Sproule
(1969] in Theorem 3.2. Since Sproule [1969] was dealing with the
one sample case, he only required the existence of the second moment
of the kernel.
Unlike Yen [1964] who required the existence of the
eighth moment of the kernel, we will only require the fourth moment
to derive comparable results. A couple of examples for which our
procedure would apply and Yen1s procedure would not apply are the
following:
Example 1.3.1
Consider the distributions
14
8
311'
dF=-
1
dG = _8_
1
-. oS. X ~ CD
- . oS. Y oS.
311'
CD
Moments about the origin exist up to the fourth moment, but higher
order moments do not. Thus, if 8(F,G)
= 81 -
6
2 , Yen's procedure
would not apply, and our procedure would apply.
Example 1.3.2
Consider the distributions
dF
a
c X- 6 e
dG = c y
- v1/ x
dx
-6 - v2/y
e
dy
o
< X <
CD
O~y~CD
Moments up to the fourth moment exist, but higher order moments do
not. Thus, if e(F,G)
=
Vl - v2' Yen's procedure would not apply,
and our procedure would apply.
A straightforward extension of Hoeffding's [1948, p. 305]
one sample result (see, for instance, Puri and Sen [1971, p. 65])
demonstrates that Umn is asymptotically normally distributed for
fixed sample sizes. However, in our procedure even though the total
15
sample size is fixed. the sample sizes of the U-statistic are random'
..
variables.
The asymptotic normality for this case follows by an
argument analogous to that presented by Mogyorodi [1967].
In such
an argument. it is necessary to consider the concept of "uniform
continuity in probability" developed by Anscombe [1952. p. 600].
In order to prove uniform continuity in probability. we will make
use of the following arguments:
Berk [1966. p. 56] showed that if
e(F) is estimable. {Urn' m ~ k} is a reverse martingale. Chow
[1960] extended the Hajek-Renyi inequality to a semi-martingale
inequality.
Sen [1971] extended Chow's result to stochastic vectors.
Sen [1972] extended Chow's result to two sample generalized Ustatistics.
·e
After showing that under certain conditions upon Mr and Nr
that UM N converges in distribution to the same limiting distribution
~ r' r
,
as Um n where M and N are random variables and mr and nr are
r
r
_ r' r
constants, it is shown that the estimate of
A
obtained by our proce-
dure converges almost surely to the value which would yield the
,
U-statistic with minimum asymptotic generalized variance of any two
sample U-statistic estimator of e{F,G) for which the total sample
size is N (fixed).
Also. it is shown that the asymptotic generalized
variance of our estimate of e{F,G) (appropriately normalized) is
equal to the minimum asymptotic generalized variance of any two sample
U-statistic estimator of e(F,G) (also appropriately normalized) fer
which the total sample size is N (fixed).
In Chapter III we will consider the problem of finding a
16
sequential confidence ellipsoid for e(F,G) of fixed maximum width
2d, where d
0, and such that the confidence coefficient is
>
asymptotically a specified a, where 0
~a
<
We will define an
1.
ellipsoid with maximum width 2d as
(1.3.1)
sup
t: t't =1
I
I.'
~
[U
~mn
-
e (F ,G) ] I
-
<
d
where .t is a t x 1 vector of constants. After some manipulations,
if
~ >
(1.3.2)
0 (1.3.1) can be written as
N[U_mn - e{F,G)]'
r- 1 [~mn - e(F,G)]
<
v*
where
-rtxt
=
and v* is the largest root of r.
+
In Chapter II we will have shown
that IN [~mn - ~(F,G)] has an asymptotic Nt(O,:) distribution.
Thus,
(1.3.3)
N[U
~n
- e(F,G)]1
r- l [U
_
has an asymptotic x ~ distribution.
we would select mand n as follows:
~n
- e(F,G)
]
_
Therefore, if r were known,
-
From (1.3.2) and (1.3.3) we
17
~
note that if
sup
I.: .t
..
1..
-1
I .t [U
- 6.. (F .G) ]
1
'Omn
I ~ d] =
CI
'O'O
then
(1.3.4)
v*
It is evident from (1.3.2) that Ncould be minimized for fixed d if
v* wereminimized. Hence. if r were known. we would find the minimum
'O
value of v* (say v*( Amin) ) by proper choice of A. Thus. A would
be selected and from (1.3.4) Nwould be selected.
Since r is not known. a
~equentia1
procedure will be used
'O
.~
to estimate ~ and correspondingly v*( Amin ). An initial sample
of equal size will be taken from both populations. Based on
estimates of
~.
an estimate of Amin will be calculated. Another
sample will be taken such that the total number of Observations is
divided in accord with our estimate of Amin. A stopping rule will
be formulated based on the stopping random variable N*.
procedure is terminated.
~mn
ellipsoid can be constructed.
If the
will be calculated and the confidence
If the procedure is not terminated.
the process is repeated.
In order to estimate r • we will ~se
stg
and s~~ ;i.j
=
l •••••t. We shall prove the strong convergence property of our
estimates of ~in and v*( ~in).
If we assume the existence of
the fourth moment of the kernel and the uniqueness of
~in'
then
18
we can prove the asymptotic consistency of our procedure. With
certain additional assumptions, the asymptotic efficiency of our
procedure will be proved in a manner similar to Miller and Sen [1972].
It is important at this point to note that the results
obtained in Chapters II and III can easily be extended to the c
sample case (where c
>
2). Since the notation that will be
required for the c sample case is quite analogous to the notation
previously discussed for the two sample case, we will only briefly
outline the notation for the c sample case.
Let
be a kernel for the estimable parameter ei(F1 , ••• ,Fc ) where
kil, ••• ,k ic are positive integers. The kernel is assumed to be
symmetric in the kij arguments of the jth set for j = 1, ••• ,c,
i = l, ••• ,t though the roles of the c different sets may not be
symmetric. The U-statistic corresponding to ,i, i
given by
;
UN(c)
c
=
r
n
j=l
where the summation extends over all possible
1 ~ajl < ... <
ajk;j ~nj
for
j:l
1, ••• ,c.
= l, ••• ,t
is
e·
19
For 1
= 1, ... ,t; o -<.qj
=
< k . , j
- 1J
= 1, ••• ,c
E.1(x 11,···,x1 'Xl
1",·,X
, ••• ,
•
• q1 • q1+
• 1k 11
xc 1,···,xcq ,X cq +l"·,,X ck )
•
• c· c
• ic
·e
For 1
= l, ••• ,t
,
c
n
j=2
•
1
n
( j ) -1
kij
(X 1h ,Xl a. , •• "Xl u..
•
• 12
• Ik
~«
cl
t
t · • ••
11
, ••• '~a
ck ic
)
where the summation extends over all possible
1<
-
a.
12
< ••• < a 1kn ~ n wi th a 1.. ~ h, t
1
over all possible 1 ~
a
jl
<... < a
jk
1j
~
= 2, ... ,k l1
nj for
j •
and
2•••• ,c.
20
= l ••.•• t
For i.j
•
nl
1
sij
=
1.0 •••••0
•
t
h=l
(V
i
j
(X
1,0 •••••0.lh
j
) - U
N(c)
)
i
Notation for VO•l •O••••• 0(~2h) • VO,O.l ,0, ••• ,0(~3h) , etc. and
13
ij
f·
50 ,1.0 •••••0 .50.0.1.0 •••••0 ' etc. ollow slmilarly.
A straightforward modification can be made of the sequential
procedure discussed for the two sample case in order to estimate
~
(Fl •••••Fc ) by means of
~N(C)
with the restriction that
n1 + ••• + nc = N (fixed). In-the c sample case we must now
obtain estimates for A j = njtN • j = 1•••••c-1 such that the
asymptotic generalized variance of
~N(C)
of the asymptotic generalized variance of
of S13
1 .0 •••• ,0'
s~~ i.i
s ij
,
0.1.0 •.•••0
= l, ••• ,t
convergence of
...
is minimum. Our estimate
~N(c)
will be a function
as opposed to just s~~ and
for the two sample case. The almost sure
s~:o, ... ,o to t~~o, ... ,o will follow by exactly
the same argument as before.
By a similar decomposition to that
used in the two sample case. asymptotic normality of
sample sizes will follow.
~N(C)
for fixed
Although the bound that will be obtained
for the Hajek-Renyi type inequality that is necessary to prove the
e·
21
asymptotic normality of
~N(C)
for random sample sizes will be quite
complicated, the basic proof as in the two sample case will suffice.
Special care should be given to the decomposition of the remainaer
term in the extension of Lemma 2.4.3. With appropriate modifications
of restrictions, the optimality results for the c sample case are
obtained in an analogous manner to the two sample case.
With only slight modifications it is possible to use the results
of Chapter III to obtain a confidence ellipsoid for
~
(F l , ••• , Fc )
of fixed maximum width such that the confidence coefficient is
asymptotically a specified a. For the c sample case (1.3.2) becomes
N[ ~N(c) - ~ (F l ,···, Fc ) ] :(~)
.e
<
[~N(C) - ~ (F l ,···, Fe) ]
N·d 2
Vfc)
where
i °
1 .0, ••• ,0
).1
r; J
!
(e)
+
k,02 k_ r; 0,1,
ij •. .,0
J2
o
t )( t
+ ••• +
0,0, ••• ,1
kie kje . r;1j
e-l
1 -
and vte}
is the largest root of
~(e).
t
11
).1
1=1
Making use of the same
estimates as in the c sample extension of Chapter II, we can estimate
:(e) and correspondingly v(c}. Computationally, however, it may be
quite laborious to obtain the estimates of v(c). The same
22
sequential procedure and stopping rule can be used as 1n the two
sample case. Moreover, the consistency, efficiency, and related
results follow as in the two sample case with only slight modifi·
cations for the c samples.
Chapter IV will be devoted to an application of the theory
developed in Chapters II and III.
In particular, we will first
consider a sequential analogue of the Behrens·Fisher problem. We
will not review the literature on the Behrens-Fisher problem explicitly since it is quite ancillary to our particular topic. However,
good discussions of this problem can be found on pp. 118-122 of
Anderson [1958], pp. 369-371 of Wilks [1962], and pp. 134-151 of
Kendall and Stuart [1967].
Basically, the Behrens-Fisher problem is
concerned with testing for a difference in location of two normal
distributions when there is also a difference in dispersion. Therefore, in Chapter IV, we will be concerned with the estimation of a
difference in location of two normal distributions when there is also
a difference in dispersion and the total sample size is fixed.
Similarly, we will want to obtain a fixed-width confidence ellipsoid
for a difference in location of two normal distributions when there
is also a difference in dispersion. As our estimator of a parameter
that will reflect a difference in location, we will use the Wilcoxon
statistic. Next, we will estimate a difference in scale of two
normal distributions when the total sample size is fixed.
Similarly,
we will determine a fixed-width confidence ellipsoid for a difference
in scale of two normal distributions. As our estimator of a parameter
e·
23
that will reflect a difference in scale, we will use Lehmann's two
sample scale statistic.
More explicitly the content of Chapter IV will be based on a
computer simulation. Random observations will be generated from
bivariate normal distributions. One should note that such workers
as Ray [1957], Starr [1966], and Robbins et a1. [1967] have used
computer simulations in discussions of their respective procedures.
We will be concerned in our simulation of the procedure described
in Chapter II with the effect of various parameters on a comparison
of an estimate of the asymptotic generalized variance of our estimator
with the minimum asymptotic generalized variance of any two sample
U-statistic estimator for which the total sample size is fixed.
.
~
In
our simulation of the procedure described in Chapter III, we will
be concerned with the effect of various parameters on the coverage
frequency and the ratio of the average sample size of our procedure
to the sample size one would expect if the covariance matrices were
known.
l.~
Some Applications
Since we are considering generalized U-statistics in this paper,
many applications of this theory are possible.
will consider t
= 1 and p = p* = 1 in. the
For simplicity, we
following examples.
Example 1.4.1
Let e(F,G) • P (X
<
Y) - P (X
>
Y) where X and Y have the
cumulative distribution functions F and G respectively. The kernel
24
for a(F,G) is
~
=
(X,V)
1
if X < V
o
if X = V
if X > V
-1
The corresponding U-statistic is the Wilcoxon two sample statistic"
In this case, V10(x h) = ! Ah where Ah is equal to the number of Y's
n
which are greater than xh minus the number of Y's which are less than
xh" V01(Y t ) = ! Bt where Bt is equal to the number of x's less than
m
Yt minus the number of x's greater than Yt " Thus,
m
I
5 •
10
h=l
(m - 1) n2
n
I
=
- 2
(A h - A)
t=l
e.
(B
t
9)2
-
m2 (n - 1)
Example 1.4.2
Let e(F,G) =
,- [F(x) - G(x)]2
d {F(x)
2G(x)l
_00
The kernel for e(F,G)
~
(Xl'X 2,V 1 ,V 2) =
1 if max(x1 ,x )
2
or
<
min(V l ,V 2)
min(X1 ,X 2)
>
max(Y l ,Y 2)
0
otherwise
25
~
The corresponding U-statistic is known as lehmann's two sample
statistic. S10 and SOl can be expressed as functions of the Ai's
and Bj's considered in Example 1.4.1. but will be omitted here due
to their complexity.
One should note that the theory discussed in this paper will
only apply in this case if F and G do not agree at least on a set of
points of measure nonzero. This restriction is necessary since under
the null hypothesis the asymptotic distribution of this two sample
U-statistic is not nonna1.
Example 1.4.3
let e(F.G)
.~
= ,- ,-
-. -.
(y-x) dF(x) dG(y)
The kernel for e(F.G) is
+(X.Y)
The corresponding U-statistic is
= y -
X •
Y- X.
In this case.
Thus.
n
=
i~1
(Yi - y)2
n-1
CHAPTER II
TWO SAMPLE ESTIMATION WITH FIXED TOTAL SAMPLE SIZE
2.1. Introduction
In this chapter we will discuss a procedure for estimating a(F,G)
-
by means of U
_m,n with the restriction that m+n = N (fixed). After a
brief discussion of the mechanics of the procedure, the strong convergence of certain estimators is demonstrated.
In order to prove the
asymptotic normality of a generalized U-statistic with random sample
sizes, we require several preliminary results.
First, a straight-
forward extension of Hoeffdingls [1948, p. 305] one sample result
(see, for instance Puri and Sen [1971, p. 65]) demonstrates that U
_m,n
is asymptotically normally distributed for fixed sample sizes. Second,
by means of a Hajek-Renyi type inequality developed by Sen [1971, 1972],
"uniform continuity in probability", a concept developed by Anscombe
[1952, p. 600], is proven for all possible linear compounds {t_l U
-,
m n'
t ~ ol. Third, the proof of the asymptotic normality of a generalized
U-statistic with random sample sizes is completed in the manner of
Mogyorod [1967].
Finally, the asymptotic optimality of our procedure
is demonstrated by showing that almost surely our procedure will
allocate the total sample size in the ratio that will minimize the
asymptotic generalized variance of any two sample U-statistic estimator
of a(F,G) for which the total sample size is fixed. Moreover, based
e·
27
on the above results. it is easily shown that the asymptotic generalized variance of our estimate of e(F.G) (appropriately normalized)
is equal to the minimum asymptotic generalized variance of any two
sample U-statistic estimator of e(F.G)
(also appropriately
..
normalized) for which the total sample size is fixed.
2.2. Procedure
let the total number of observations be fixed at N. Select
Mo/2 observations from the first population and MO/2 observations
from the second population where MO is even and
MO >2[max {kit i • 1•••••t; qj' j • 1•••••t} J.
·e
1j
ij (as defined in Section 1.3 ) •
Compute for i,j _- 1•••••t. 510
and SOl
Consider the txt matrix
28
t - O•••••t; the summation is over all possible 1
1 ~ 81
< ••• <
8t-t
~
~
a1
< ••• <
a1
~
t.
t where Si i aj; i-1 ••••• t-t; j • 1•••••1 ;
and I denotes the number of inversions of 1••••• t in the permutation
a1· •••• at. B1' ••• , Bt-t· For example consider t-2.
2 11
2 11
k1 S10 + q1 SOl
m
n
..
r..m.n •
c0 -
cl -
21
21
k1k2 S10
ql q2 SOl
+
m
n
12
12
k2k1 S10
q2 q1 SOl
+
m
n
q2 S22
k2 S22
2 10 + 2 01
m
n
q2 S11
1 01
12
q2 ql SOl
21
q1q2 SOl
2 22
q2 SOl
11
k21 S10
12
q
q2 1 SOl
12
k2k1 S10
2 11
q1 SOl
. 21
k1k2 SlO
q2 S22
2 01
k2 s22
2 10
21
ql q2 SOl
k2 s11
1 10
k 2k 1
sl~
21
k2
s22
kl k2 SlO
e·
2 10
In order to find the values of mand n that will minimize
...
I~m,nl
subject to the constraint that m+n - N, the method of Lagrange
29
multipliers may be used. That is, if
1
u·-
m
and
1
v
=- ,
n
then we must minimize
where AI is the Lagrange multiplier. The solution for t
= 1 and
kl = ql is given by Yen [1964, p. 1101]. By appropriate algebraic
manipulations it can be shown that the desired solution is obtained
by solving for u in the following equation:
.e
(2.2.2)
It is important to note that a valid solution in this case is real
l
va1ued and 1i es in the i nterva 1 -N
< u ~ l/w
l where
-w2 wl
= max
{k i , i
= l, ••• ,t}
and w2 • max {qj' j-l, ••• ,t}. Certain
iterative techniques such as the Newton-Rhapson method can be used in
the solution of (2.2.2). Generally, there will be more than one
solution for u. The appropriate solution in this case will minimize
where 1
+1
u V
= N.
One should not only check the values of u obtained
30
1
by the Newton-Rhapson method in the interval
N-w2
and l/w l •
1
~
u ~ l/wl
but also the boundary values of ----N-w2
It is of interest to note that if t=l, then
a
au
[ -co + c l (Nu - 1) 2 ]
>
a
in the interval under consideration, and there is only one solution
to
in that interval. Thus, for t=l, any solution to (2.2.2) in the
interval considered is the unique 'solution. However, in general,
since (2.2.2) is a polynomial expression, there may be more than
one solution in the interval which could denote either a maximum
or a minimum.
Based upon the minimiation of
Ir_m,n I,
A
calculate >.
=
V
u+v
where u and v are obtained as above. At the next step in our
sequential procedure, select
Mi
M
l
observations, but this time take
observations from the first population where
Mil
1
= o
•
if
<
(MO/2) + 1
>
otherwise
e.
31
~
and M1 - Mi observations from the second population. Now using the
i .
MO + Hl observations recalculate the Slij and S J t i,j c l, ••• ,t
O.
01
and the corresponding i based upon these estimates. Repeat this
procedure until all Nobservations have been taken. Thus, we will
calculate a sequence
~
{A}
~
at various stages of iteration with AN
at the last stage. Calculate the estimate of
~(F,G)
and its generalized sample variance as I~ MI,NI
as
~MI,NI
I where HI is the
number of observations from the first population and NI is the
number of observations from the second population.
In our procedure the number of steps and the sample sizes for
each step are quite arbitrary (influenced only by w1 and w2 which
are defined earlier).
It would be nice to have some theoretical
justification for the selection of the number of steps and the
sample size for each step, but, unlike Yenls [1964] approach, the
theoretical development used in this chapter does not lead to answers
to such questions. Perhaps simulation studies could be of benefit
here.
2.3. Strong Convergence of Estimators
kq < . and m~ nO ' n ~ nO ' then Umn converges
almost surely to its expected value, e(F,G), as nO + • •
Lemma 2.3.1. If
t
Proof. We must show that for any two
arbitra~ly
small positive
quantities £ and 6, there exists an no(£,6) such that
(2.3.1)
P{I Umn - e(F,G)
I
n ~ nO(t,6»
>
£ for at least one m~ nO(£,6) •
<
6
32
For m~ k, n ~ q, we define
where
I
* extends over all possible 1 ~
extends over all possible 1 ~ 81
<---<
CI'J
<.. -< Ok ~
mand
I
**
8 q ~ n•.
By the Bonferroni inequality, we obtain
(2.3.2)
p {IUmn - e(F,G)
I
>
p {IUm(l) - e(F,G) I
£ for at least one m~
£/3
>
p {IUn(2) - e(F,G) I
£/3
>
p {IUmn (3) + e(F,G) I
>
£/3
nO(£'~)
,
for at least one
for at least one
for at least one m ~
nO(£'~)'
33
~
•
Let e(l) (and e(2»
be the a-field generated by the
m
n
unordered collection {~l' ••• '~m} and by ~m+l'~m+2' ••• (and
and by Y+l'Y
..n ..n+2 •••• ) so that e(l)
m is ~
{.
Yl.•••••Y}
..n
in '
mand e(2) is ~ in n. Also let e(3) be the product a-field
n
mn
e (~) x
c(~). for all m.n ~ 1. Now {Um(l) • m ~ k} and
{ Un(2) , n ~ q} are the usual Hoeffding [1948] one sample Ustatistics, so that by the same argument as in Berk [1966. p. 56],
{ Um(l) •
e(~), m~ k}
is a reverse martingale. and {Un(2)'
e(~) ,
n ~ q} is a reverse martingale. Therefore, {U;(l) • e(~) • m ~ k}
is a non-negative reverse semi-martingale, and {U~(2) • e(~), n ~ q}
is a non-negative reverse semi-martingale by a direct application of
Jensen's Inequality. Now applying Ko1mogorov's Inequality for
non-negative semi-martingales modified for reverse martingales (see
Feller [1965, p. 235]).
(2.3.3)
p
{I Um(l)
-
e(F,G)1
>
£/3 for at least one
..
..
o.
By an argument similar to that used in deriving (1.2.3),
•
34
(2.3.4)
p {IUn(2) - O(F,G)I
>
<
£/3 for at least one
9 £-2
E{ [Un (2) -
o
8(F,G)]2 }
where
let us now classify the rectangular state { Umn (3) + O(F,G),
m ~ nO(£ ,IS) , n ~ n (£ ,6) } into 1inear arrays
O
(1) {Umn (3) + O(F,G) ,n· m,m+l, ••• } , m· nO,nO+l, ...
(ii) { Umn (3) + o(F,G) , m· n+l,n+2, ••• } , n· nO,nO+l, •••
Now by an extension of the same argument as in Berk [1966, p. 56],
it can be shown that for every m (m
0:>
{Umn (3) + e (F ,6) ,
c~)
and for every n (n
= "0'"0+1, •••
= nO,nO+l, •••
),
1s a reverse martingale,
, n = m,m+l, ••• }
),
e
(2.3.5)
{ Umn (3)
+ o(F,G) , C ~~) ,m = n+l,n+2, ••• }
1s a reverse martingale.
One tan easily verify (2.3.5)
a
< j
k-
, 1
~
a
1
by
< ••• <
noting that for any
B <
q-
E ( +(X a ""'Xa 'Va ""'V s ) I
1
k 1
q
n,
C (3) }
jn
35
4It
is the same, and hence, equal to
a.s.
• Ujn
Hence,
(2.3.6)
.
.'~
for all h
= l, ••• ,j-no
•
A similar argument can be applied to show that
(2.3.7)
(3)]
C jn
E[ U
nO+h(l) I
for all h
= 1, ... ,j-nO •
=
U
j(l)
a.s.
Therefore, (2.3.5) follows from (2.3.6)
and (2.3.7).
Once again by a direct application of Jensen's Inequality,
one can verify that for every m (m = nO,nO+l, ••• )
{[umn (3) + e(F,G)]2 , c(3)
mn ~ n
= m,m+l, •••
} is a non-negative
reverse semi-martingale, and for every n (n=no,no+l, •••
{[Umn (3) + e(F ,G)]2 ,
c~) , m = n+l ,n+2, ... }
)
is a non-negative
reverse semi-martingale. Now again by application of Kolmogorov's
~
Inequality for non-negative semi-martingales modified for reverse
martingales, we see that
(2.3.8)
P{ IUmn (3) + e(F,G)\
n ~ m}
for every m• no,no+l, •••
<
>
£/3
for at least one
9 £-2 E { [Umm(3) + e(F,G)]2 }
36
By arguments similar to those used in deriving (1.2.3) and (1.2.4)
E { [Umm (3)
+
e(F,G)] 2 } =
Also
(2.3.9)
Umn (3) + e(F,G)1
P{I
m ~ n+1 }
~
£/3
>
-2
for at least one
2
E { [U n+1,n(3) + e(F,G)] }
9 £
for every n = n ,nO+1, •••
O
Similarly
E {[Un+1 ,n(3) + e(F,G)]
2
}
=
2 2
k q t 11
n(n+1)
Then by (2.3.8), (2.3.9), and the Bonferroni Inequality, we obtain
(2.3.10) P {I Umn (3) + e(F,G)1
+
I
k
aD
I
n-n
£/3
>
O
for at least one
2 2
q
1;11
n(n+1)
Thus, from (2.3.2), (2.3.3), (2.3.4), and (2.3.10), we see that
37
P{ \Umn - e(F ,G) I
(2.3.11)
> £
for at least one m~ nO' n ~ nO
2
2
<
9 .-2 [ k '0
nO
+
q l:ol
+
•
[ k2 q2 t
ll
m2
m:no
nO
}
Therefore, since the r.h.s. of (2.3.11) can be made arbitran1y small
by a proper choice of nO' (2.3.1) holds.
Theorem 2.3.2.
If E{ [ • i (X1 , ••• ,X k ,Vl •••••Vq )] 4 }
..
.. -1-1
S~~ converges almost surely to
surely to tb~ as N +
Proof.
m
t
for i.j
<
m
~~ and S~~ converges almost
= l, ••• ,t.
(For the notation used in this proof, see Section 1.3.)
ki
t
c-O
k
(c i) (
m-k
q
n-qi i .
i) ( i) (
)
J
kj-C d
qj-d Uc,d;m,n
where we shall assume without any loss of generality that k;
q. < q .•
1 -
(m-1 )
•
J
-1
m
t
hel
~
kj
,
38
For c
= O.l •••••ki and d = O.l •••••qi • let
all other (c,d), except (0.0) and (1,0). amn(c.d)
(2.3.12)
ij
s~~ = u1.0;m,n
=
-
1j
uO.O;m.n
0(N- 1) for c
= 0(N- 1).,
Thus.
Qmn(c.d) U1j
c d c .d ;m.n
+ I
= O.l ••••• ki
I
and d
= O.l ••••• Qi.
39
The result follows almost immediately from (2.3.12). By
ij
Lemma 2.3.1, the U-statistic uij
U
1 ,o;m,n
oI o;m,n converges
almost surely to its expectation
amn(c,d)
= O(N -1 )
t{g
as N + ~. Secondly, since
for c • O,l, ••• ,k i and d
= O,l, ••• ,qi
and
U~~d;m,n is a U-statistc with finite variance for c = O,l, ••• ,k i
and d • O,l, ••• ,qi '
a.s.
ij
«--(c,d)
u
c d mn
c,d;m,n
t
t
ij
This completes the proof that S10
i .
proof for SO~
a.s.
+
r,;U
01
as N +
+
a.s.
+
~
ij
r,; 10
0
as N +
as N +
~.
~
•
The
follows in a similar fashion.
(The decomposition used in this proof is quite analogous to the
one used in Theorem 3.2 of Sproule [1969] who considered the case
of single sample U-statistics.)
2.4. ASymptotic Normality
Theorem 2.4.1 and its proof are direct extensions of the work
of Hoeffding [1948]. Although the theorem and its proof have
essentially been stated elsewhere by other workers, they are included
here for completeness.
In our proof we will use the decomposition
suggested by Puri and Sen [1971, p. 65] and the argument given by
Hoeffding [1948] in his proof of Theorem 7.1.
40
Theorem 2.4.1. Assume that there exists a sequence { (mr,n r ), r
of positive integers such that
11m m
r... r
=
mr
11m
r... mr+n r
11m n
lID
r...
• >.
r
where
0
=
<
>.
~
1 }
lID
<
1
i-1, ... ,t,
then as r
+
,
lID
[Urn
- r,n r
- e(F,G) ]]
-
+
where
Proof. Consider the t quantities Y • Wi + 2 where
i
i
mr
t
h=l
i
[.lO(X_ h) - 8.(F,G) ]
1
and
z.1 =
nr
i=l, ••• ,t.
Note that Wi is a sum of mr independent, identically distributed, "random
variables and Zi is a sum of nr independent, identically distributed
41
~
random variables. and Wi and Zi are independent. Thus. y1 ••••• yt
are random variables with zero means whose covariance matrix is
[[
+
r 1j
e
"10
m
r
By the Central Limit Theorem for vectors given by Cramer [1937,
p. 112],
1[
mr n
+r
mr nr
as r
]1/ 2
eo
(W + Z)
+
+ ••
• 2
2 11
[ k j °tl0
- 2
mr
2
+
qj
mr -1
n
(
(k _l) q1r)
1
m n
(k r ) (qP
i
:r
11
t01
]
k1 1';11
10
+ O(N;2)
+
m n-1
(k r ) (qr_ 1)
1
1
m
n
( r) ( r)
ki qi
qi 1';11
01
42
Let {~i' i ~ 1} be a sequence of stochastic t-
Lemma 2.4.2.
dimensional random vectors where t
be the a-field generated by
E ~i •
~
and
~i =
=
that E (~hl Cn)
E !i
~n
>
(~i'~i+1'
~i
1. Let Ci = C (~i'~i+l' •••
••• ) i
~
1. Suppose that
exists for all i
~
1. Also suppose
For a non-decreasing sequence
{ci} of positive constants,
c {sup (1' A 1)-1/2 I l'Z I }
P [ max
n<h<n+N h 1 ~O
• •
• -h
> £ ]
Proof. By the Schwarz Inequality
(1' A 1)-1/2
sup
- - -
for all h
>
I
l' Z
- -h
I
h~l.
(>0)
=
1. Hence.
c {sup (1' A 1)-1/2
P [ max
n<h<n+N h 1JfO - - .
•
P [ max
n<h<n+N
)
a.s. for all n ~ h ~ 1. Let ~ be an arbitrary
(t x t) positive definite matrix.
1~0
~
c~
Yh
>
£2]
l' ~h
I }
>
where Yh
£
]
1 Z)
= (Z'
.h A_ .h
43
= Y + 2 E { (Zh - Z ) I
n
>
Yn
.....n
I Cn
}
A-1 Z
......n
a.s.
for all n ~ h ~ 1. Hence {!h' Ch, h ~ l} fo..-ms a non-negative
reverse semi-martingale sequence. By reversing the role of the
index set and thereby converting to a forward semi-martingale,
we get our desired result by Theorem 1 of Chow [1960].
(This
theorem and proof constitute only a slight modification of Theorem 1
of Sen [1971].)
Define
+'
+
r* • max (r,n)
s* • max (s+l,m)
N*
= min
(M,N).
44
With these quantities defined. we can now consider the following
1enma.
Lemma 2.4.3. Let A be an arbitrary (t x t) positive definite matrix.
If E {[ .i(X 1 •••• :x k 'Y1 •••••Yq ) ]2} < • • i = 1•••••t .{Cmn }
-i -i
is a sequence of positive constants non-decreasing in each
argument. and m ~ max (k i • i=l •••••t) • n
then for every £ > O.
~
max (qi' i=l, ••. ,t) ,
c {SUP (1' A 1)-1/2
(2.4.1) P [ max
max
m<r<M n<s<N rs 1 ~O - ••
I~' [~rs - ~(F.G)
+ c2 tr (A- 1 r(2»
Mn
• .n
+
]1 }
+
N*
s=n
{
<
2
-1 (2)
- cMs-l) tr (A r s )
(c~s
'
..
s=n+1
1 r(3»
t
{ c2rr* tr (A• .rr*
r=m
t
£]
N
E
N*
+
>
+
N
1 r(3»}
(c~s
- c~,s-l) tr (A- .rs
s=r*+l
t
2
1 r(31) + M
) tr (A- 1 r(3»}]
tr
(A(c 2rs _ c2
E
css *
_ .rs
• .ss
r.s-1
r=s*+l
where for N* < m (or < n) the corresponding summation is understood
to be zero.
45
Proof. For m~ max(k p 1=1, •• ott) and n ~ max(Q1' 1=1, •••,t)
we define
~m(l) = (U~(l)""'U~(l»' where
•
f
Un(2)
~
0 (X
1
. . al
, ••• ,X
...ak
)
1
Y)
_ (n) -1 . ** 1 (Y
q.
t
.Oq B , ••• , Q
i
i ... 1
...... qf
-
and ~mn(3) = ~mn - ~m(l) - ~n(2) where the summation
over all 1 ~ al
1 ~ 81
< ... <
< ... <
aq
f
~
ak
i
~
1:
* extends
m and t ** extends over all
n. Then
E [ (~m(l) - ~(F,G»' ~-1
(~m(l) - ~(F,G»] =
tr [ ~-1 E (~m(l) - !(F,G» (~m(l) - ~(F,G»' ]
• tr
(A- 1
...
r (1»
... m
E [
(~n(2) - !(F,G»' ~-1 (~n(2) - !(F,G»]
E[
(~mn(3) - !(F,G»' ~-1 (~mn(3) - ~(F,G»] = tr (~-1 ~~»
=
tr (A- 1 r(2»
...
...n
By vfrtue of the monotonfc1ty property of {c rs }' we obtain by the
46
Bonferron1 Inequality that for every £ >0. m> max(k .• i=l •••••t)
-
1
and n ~ max(qi' i=l, ••• ,t) ,
max
c { sup (t' A t)-1/2 •
(2.4.2) P [ max
me r< M n< s< N rs
.t;0 - --
.t'
P [max
crN
mer<M
+ P[max
n~s<N
+ P[max
~r<
(~rs
- a (F ,G})
{sup (t' A t)-1/2
t;O - _ ..
CMs {sup (t' ~ ~)-1/2
t;O
I}
>
£
]
<
t' (_U r (l) - a(F.G» I } > £/3]
I ~' (~s(2) - ~(F,G)) I } >
£/3]
max
c { sup (t' A.t}-1/2 I l'(U rs (3)+a(F.G»1 } > £/3]
M n< s< N rs t;O - ....
....
..
As before. let C(l) ( and C(2» be the a-field generated by the
.
m
n
unordered collection {X 1 •••••X } and by ~m+1'~m+2""
(and
..
..m
(l) is oJ. 1n
{y1.···. yn} and by yn+1. yn+2 •••• ).
. so that Cm
- ..
....
mand c(2) 1s
n
oJ.
in n. Also let C(3) be the product a-field
mn
C~l) X C~2). for all m.n ~ 1. Now.{ ~m(l)' m ~ (k i , i=l, ••. ,t) }
and (
~n(2)'
n ~ max(qi' i=l, ••••t)} are the usual Hoeffding (1948]
one sample U-statistics. so that by the same argument as in Berk
[1966. p. 56]. ( ~m(l)' C ~1). m~max(ki' i=l ••.•• t)} is a reverse
martingale. and (~n(2)' C ~2), n ~ max(qi' i=l, ••• ,t)} is a
e·
47
reverse martingale. Therefore. by Lemma 2.4.2.
(2.4.3) P [ max
crN {SUP (1 1 A 1)-1/2 •
mkr<H
t~O
~
~t' (~r(l) 9 e- 2
{c 2
!(F.G» I }
>
e/3]
<
M
tr (A- l r(l»
+
mN.m
(c~N - c2r_1 N) tr (A- l rr(l»}
r=m+l
•
~.
I
(2.4.4) P [ max
cMs { sup (t' A t)-1/2 •
n<s<N
t~O·
~ ~
t' (~s(2) -
9
.
~
e- 2
{C~n
tr (A-1
~(F.G»
~~2»
+
I }
~
s=n+l
Let us now classify the rectangular state
m~ r
~M.
n~s
~
e/3] ~
(c~s - c~ s-l)
•
{~rs(3)
+
tr (A- l r(2»}
·s
~(F.G),
N } into linear arrays
{~rs(3)
(i)
>
+ !(F.G). s = r* •••••N }. r • m••••• N* and
(ii) (~rs(3) + !(F.G). r
In our proof. we assume that N*
>
= s* •••••M}.
s • n•••••N* •
(m.n) otherwise one of these
subsets will be null. Now by an extension of the same argument as
in Berk [1966, p. 56], it can be shown that for every r (m<r<N*)
{U rs (3) + e(F,G), C rs
(3), s = r* •••• ,N} is a reverse martingale.
~
~
and for every s (n~s<N*)
(~rs(3) + !(F.G). C ~~). r • s*, .•• ,M}
48
is a reverse martingale. Again. by lemma 2.4.2,
C {sup (t' A t)-1/2 •
(2.4.5) P ( max
dO
- -r*<s<N rs
I
t' (U rs (3) +
- -
e
-
-2 {C2 * tr (A- 1 r(3»
£
rr
_ _rr*
9
I }>
(F.G»
N
t
+
s=r*+l
£/3]
<
(c 2 - c 2
rs
r.s-1
) tr (A- l r(3»}
_ _rs
for every r = m, •••• N* , and
(2.4.6)
P [
max
s*~r<M
c rs {SUP
tfO
(1' A1)-1/2 •
- --
-
It' (U rs (3) + e(F.,G»
_...
9
-2
£
I } > £/3]
~
M
1
tr
(Ar(31)
+
t
(C~s - c2r ,S_1) tr (~-1 ~r(3s»}
{C ss *
- _ss
r=s*+l
- -
2
for every s
= n••.. ,N*. Then by (2.4.5), (2.4.6). and the Bonferroni
Inequality, we obtain that
(2.4.7)
P (
C { sup (t' A t)-1/2 •
max
max
m<r<M n~s<N rs t fO - --
I t'(U rs (3) +
--
9 £ -2
{
£/3] <
e (F.G»I } >
N
N*
t
t
[C~r* tr (A- 1 r(31) +
s=r*+l
ram
. .rr
tr (A-1 r(3»] +
..rs
N*
t
[c 2
S5*
(c2
rs
- c2
tr (A- 1 r (3\) +
-
- 55'
r,s-l
).
49
+
M
E
r=s*+l
Thus, (2.4.1) follows from (2.4.2), (2.4.3), (2.4.4), and (2.4.7).
(This proof and the proof of Lemma 2.4.4 are quite analogous to
the arguments presented in the proofs of Theorem 1 and Theorem 2
of Sen [1972] respectively.)
Next, we will study the "unifonn continuity in probability"
(with respect to [m n / (m + n) ]-1/2 ) of all possible linear
compounds {
~' ~mn
'
g}.
~ ~
Lemma 2.4.4. Assume E { [. i (X , ••• ,X k ,Y1 , ••• ,Yq )] 2 } <
1
-i
i = 1, ••• ,t. For every positive E and n , however small, there
CD
~
exists a 6
>
,
-i~
0, such that
P { max
max
Ir-ml<6m Is-nl<6n
for all t
~
O}
> £
< n
where A is an arbitrary (t x t) positive definite matrix.
r
= m, ••• ,m* } is a reverse martingale. { ~s(2) - ~n*(2)' C ~2),
s = n, ••• ,n* } is a reverse martingale. { ~rs(3) - ~m*n*(3)' C ~~),
50
s
= r*, ••• ,n*
} is a reverse martingale for every r
and {~rs(3) - ~m*n*(3)' c~~) , r
martingale for every s
= n, ••• ,n*.
= s*, ••• ,m*}
= m, ••• ,m*
is a reverse
Then, by the same proof as in
lemma 2.4.3 with some simplifications,
(2.4.8) P {max
max
m<r<m* n<s<n*
+
J--m+n
(t' A t)1/2 , for all t
£
mn
m+n
,-ron
~ 0 }
<
{ tr [A- l (r(l) - r(l»] + tr [A-l (r(2) - r(~» ]
_m
~m*
~
~n
~n
t
t
t
tr [A- l (r(3) _ r(3) ) ] .+ t tr [A- l (r(31 _ r(3) ) ] }
rr*
m*n*
~ss
~m*n*
r=m
s=n
where t= min (m*,n*) and for
t~<
m ( or < n) the corresponding
summation is understood to be zero. The above follows since
"
for r
=m, ••. ,m*
r(l) - r (1)
~r
~
m*
=
and s
m*-r
mW r
= n, ••• ,n*.
Note that
51
..
(2)
rs
n*-s
n* 5
r (2) •
.. n*
r (3) - r (3)
.. rs
.. m*n*
for all m*
>
=
« q1 qj tM
m*n* - rs
---« k1kj Q1qJ'
m*n*rs
r , n*
~
+ 0(5-1 »)
t
1j
11
for n*> s
1
1
+ O(r-) + 0(5- ) »
s. The r.h.s. of (2.4.8) becomes
6
(2.4.9)
1+6
m 6
tr [A-1 rO)l +-_tr[A- 1 r(2)]
m+n 1+6
where c is a positive constant and
Note that (2.4.9) can be bounded by
(2.4.10) 9 £-2
m c2
~ + _
1+ 6
n+m
n cl
_
[ n+m
where c1' c2' and c3 are positive constants. Thus, (2.4.10) can
always be made less than 0/4 by proper choice of 6. The cases
for (i) m - [6m]
~
r
~
m, n - [6n]
~
s
~
n; (ii) m - [6m]
~
r
~
m,
52
n~s
~
n* ; (iii) m ~ r
~
m* , n -
[~n] ~
s
~
n follow similarly •.
To complete the proof of the asymptotic normality of a generalized U-statistic with random sample sizes, we will prove Theorem 2.4.7
by an argument, only slightly modified, given by Mogyorodi [1967].
In the proof of Theorem 2.4.7 we, like Mogyorodi, will use the fo110w-
-
ing notation; A is the random event consisting of the non-occurrence
of the random event A; B 0 C is an abbreviation for the symmetric
difference of the random events Band C. Also, in the proof of
Theorem 2.4.7 we will use the following lemmas given by Mogyorodi
[1962].
xl ,X 2 , •••
lemma 2.4.5. let
be a sequence of random variables
converging in probability to the' random variable X as n +
further a and b (a
<
let
-.
b) be continuity points of the distribution
o
function of X. let A denote the event {a
{a
~
Xn
<
b}. Then P (An
A)
0
+
0 as
~
X < b} and An the event
n +-.
lemma 2.4.6.
If X1 ,X 2 , ••• is a sequence of random variables which
converge in probability to a random variable X and a is a continuity
point of the distribution function of X such that P (X
(or P (X
~
that for n
a)
~
<
<
a)
< £
£) then there exists a positive integer nO such
nO' P (X n
<
a)
<
2£
(or P (X n
~
a)
<
2£).
Theorem 2.4.7. Consider a sequence of stochastic vectors
{(M~, N~),
r
~
1}, where
M~
and
N~
only assume positive integer
values. Moreover, assume there exists a sequence {(mr , nr ), r
>
1}
e·
53
~
of positive integers such that
lim m • •
r... r
= •
lim nr
r...
•
o<
A
A<
1
and
M'
r
N'
r
p
p
+
nr
where A(l) and
If 0
<
A(2) are independent positive random variables.
1
2
E {[, (~l""'~ki'!l•••••!qi)] } < • • i = 1••••• t. then
lim
r...
.~
Proof. We suppose first that A(l) and A(2) have continuous distribution functions.
o
Pea ~ A(l)
<
b)
Let us choose 0
1-
>
£
a
<
band 0
and P(c ~ A(2)
<
d)
<
>
1 -
< C <
£
d so that
are satisfied.
For simplicity of notation let F(x) be the limiting (as r... )
distribution function of
Ym n
r r
where
1
nr
.
[~r 1/2
1
-
1
[U- mr nr - o(F,G)]
_
mr+n r
•
•
is an arbitrary non-null vector. Note that r O is assumed to be
positive definite.
and choose v
>
Let x be an arbitrary continuity point of F(x)
0, so that
I F(x) - F(x
±
v)1
<
£
is satisfied.
It follows from Lemma 2.4.4 that for v
exists a C (V.n)
>
>
0 and n
0 and PO (V.n) such that for all n
~
>
0 there
PO' m ~
Po
54
(1' r•O 1)-1/2
•
p [( :.: ]1/2
I l• ' (U.rs - U
.mn )1
least one (r,s) such that Ir - m I
Is - n I
<
en)
<
<
> V
for at
C m and
0
Consider the following possibilities
(l)
U1 C (v,o)
>
b- a
and
U2 C (v,o)
>
d- c
(2)
U1 C (v,o)
<
b- a
and
U2 C (V,o)
>
d - c
(3)
U1 C (v,n)
>
b- a
and
U2 C (V,o)
~
d- c
(4)
Ul C (V,o)
<
b- a
and
U2 C (V,o)
~
d - c
e·
where U1 and U2 are arbi trari.1 y 1arge constants, but U1 U2 n
and (U1 + U2) £ are small.
First we shall deal with possibility (1).
let k (
~
and h (
~
In thi.s case
U1) be the minimal integer constant for which k C (V,n)
U2) be the minimal integer constant for which h C (v,n)
~
b-a
~
d-c.
Then we divide the interval [a,b) into k subintervals by the
subdivision points ai (1=O,1,2, ••• ,k) such that
and divide the interval [c,d) into h subintervals
points b (1=O,1,2, ••• ,h) such that
i
by
the subdivision
55
Let us introduce the following notation
. J
H'r
(i • 1, ••• ,k)
m,B(j)
= { bj _1 ~ ).(2)
B(j)
N'
= { b.J- 1 -< -L.
<
r
< b }
j
b }
(j • 1, ••• ,h)
j
nr
We have by Lemma 2.4.6 that there exists an r(l)
o
=
r(l)(£) such
0
that
H'
P{ a~-L.<
.e
=
b}
~
k
PC
Lt
ACi»
1=1
r
for r ~ r~l), and that there exists an r~2)
N'
P { c ~ ---.!:... < d h:
nr
h
P (
t
B( 1»
1-2£
>
=
r~2)(£) such that
>
i=l r
1 - 2
for r ~ r~2).
Again for simplicity of notation let
1
YH,r, H'r
2
=
t
l
[UMr,H r - !(F,G)]
(I.' r O 1.)1/2
Thus, if we denote by A the event
Mr'
{a<_
<b}
-
mr
£
56
~
and by B the event {c
P ( YH, N'
r' r
<
x)
N'
_r_
nr
<
e
d }, then
= P ( YH, N'
r' r
< X ,
P ( VH, N'
r' r
<
A B)
x ,
+
Xu ~ )
Since the second member on the right-hand side of this equality
is smaller than 4£ for r ~max(r~l), r~2)} , we have only to deal
with the first member. let us denote by C (r,k,h,i,j,v) the event
{ I VH, N'
r' r
i
= l, ... ,k
(2.4.12)
=
, j
Y
r' r
< X ,
H'r
a~-mr<b
h
I
t
P ( YH, N'
i=l j=l
r' r
k
h
i=l
j=l
t
mrai_1,nrbj_1
I < v
}
= 1•... ,h. Then we can write
P ( VH, N'
k
-
t
P ( VH, N'
r' r
and c
N'r
nr
<-<
d )
<
x, A~i) , B~j) , C (r,k,h,i,j,v»
<
x, Ar{i) , B~j) , C (r,k,h,i,j,v) )
+
Applying the inequalities
P ( ABC)
~
we have from (2.4.12)
P ( A B) - P ( A C ),
P ( A B)
<
P (A),
57
(2.4.13)
k
h
t
t
i=l j-1
k
t
P(Y
b
h
x - v, A(i), B(j»
r
r
<
mr a1_1,nr j-1
p(~i), B~j), C(r,k,h,i,j,v»
I
1-1 j-l
a
<
-
M'r
m
r
<
band c
<
N'
...!. < d)
- nr
h
+ kt
t P«A 1) ' Br(j) ,C ( r,k,h,i,j,v »
r
1-1 j-l
It follows from Lemma 2.4.5 that for any
positive integer r~l)
for r
.e
>
-
£ >
•
0 there exists a
= r~l)(e) (r~l) ~max(r~l), r~2»)sUCh
that
r(l)
1
(2.4.14)
holds. Similarly it follows that for any e > 0, there exists a
positive integer r~2) = rf 2)(e) (rf2) ~max(r~l), r~2») such that
for r
>
-
r(2)
1
(2.4.15)
holds. Obviously we have for any three events A, B, and C
(2.4.16)
I P(AB) - P(AC)I
~
PCB
0
C)
and thus from (2.4.14) and (2.4.15) and repeated application of
(2.4.16) it follows that
h
I
I
P( y
. 1
mr a.1-l,n r bj - 1
i =1 J=
58
k
~
k
I
i=l
k
h
I
I
P ( Ym a
nb
r i-1' r j-1
j=l
i~l
< X
+ v. A(1), B(j»- (k + h)£
<
x :!:. v, A(1), B(j) ) ~
r
r
h
I
j-1
Hence and from (2.4.13) we have

(2.4.17)  Σ_{i=1}^k Σ_{j=1}^h P(Y_{m_r a_{i−1}, n_r b_{j−1}} < x − ν, A^(i), B^(j)) − (k + h)ε
          − Σ_{i=1}^k Σ_{j=1}^h P(A_r^(i), B_r^(j), C̄(r,k,h,i,j,ν))
          ≤ P(Y_{M'_r,N'_r} < x, a ≤ M'_r/m_r < b, c ≤ N'_r/n_r < d)
          ≤ Σ_{i=1}^k Σ_{j=1}^h P(Y_{m_r a_{i−1}, n_r b_{j−1}} < x + ν, A^(i), B^(j)) + (k + h)ε
          + Σ_{i=1}^k Σ_{j=1}^h P(A_r^(i), B_r^(j), C̄(r,k,h,i,j,ν)).

Since the sequence {Y_{m_r,n_r}} has the limiting distribution function F(x), there exists a positive integer r_2 = r_2(ε) (r_2 ≥ max(r_1^(1), r_1^(2))) such that for r ≥ r_2 we have

    | Σ_{i=1}^k Σ_{j=1}^h P(Y_{m_r a_{i−1}, n_r b_{j−1}} < x ± ν, A^(i), B^(j))
      − F(x ± ν) P(a ≤ λ^(1) < b) P(c ≤ λ^(2) < d) | < ε.
Hence from (2.4.17) and (2.4.11) it follows that

(2.4.18)  F(x) − (k + h + 4)ε − Σ_{i=1}^k Σ_{j=1}^h P(A_r^(i), B_r^(j), C̄(r,k,h,i,j,ν))
          ≤ P(Y_{M'_r,N'_r} < x, a ≤ M'_r/m_r < b, c ≤ N'_r/n_r < d)
          ≤ F(x) + (k + h + 2)ε + Σ_{i=1}^k Σ_{j=1}^h P(A_r^(i), B_r^(j), C̄(r,k,h,i,j,ν)).

Consider the following inequality:

(2.4.19)  P(A_r^(i), B_r^(j), C̄(r,k,h,i,j,ν)) ≤ P( |Y_{u,v} − Y_{m_r a_{i−1}, n_r b_{j−1}}| > ν
          for at least one (u,v) such that [m_r a_{i−1}] ≤ u ≤ [m_r a_i] and [n_r b_{j−1}] ≤ v ≤ [n_r b_j] ).

We choose r_3 such that for r ≥ r_3, where r_3 = r_3(ε) and r_3 ≥ r_2, min(m_r, n_r) ≥ ρ_0(ν,η) is satisfied. Thus, we see from (2.4.19) and from the choice of ρ_0(ν,η) that for r ≥ r_3,

    Σ_{i=1}^k Σ_{j=1}^h P(A_r^(i), B_r^(j), C̄(r,k,h,i,j,ν)) ≤ h k η.

Now on the basis of (2.4.18) we see that for r ≥ r_3,

    F(x) − kh η − (k + h + 8)ε ≤ P(Y_{M'_r,N'_r} < x) ≤ F(x) + kh η + (k + h + 6)ε.

Since η and ε are arbitrarily small and k and h are bounded by U_1 and U_2 respectively for possibility (1), the desired result is obtained for this case.
We can reduce possibilities (2), (3), and (4) to possibility (1) by the following remark: Let U_1 C(ν,η) = A, U_2 C(ν,η) = B, and let K and L be constants such that in cases (2) and (4)

    (b − a)/K < A/2

and in cases (3) and (4)

    (d − c)/L < B/2.

Then if m'_r = K m_r and n'_r = L n_r, we see that M'_r/m'_r converges in probability to the random variable λ*^(1) = λ^(1)/K and N'_r/n'_r converges in probability to the random variable λ*^(2) = λ^(2)/L, for which

    P( a/K ≤ λ*^(1) < b/K ) > 1 − ε   and   P( c/L ≤ λ*^(2) < d/L ) > 1 − ε.

Let now, in the appropriate cases, k be the minimal integer constant for which

(2.4.20)  k C(ν,η) ≥ (b − a)/K

and h be the minimal integer constant for which

(2.4.21)  h C(ν,η) ≥ (d − c)/L.

Both (2.4.20) and (2.4.21) can be obtained for k (≤ U_1) and h (≤ U_2) by taking into account the choice of K and L. From this point the proof is the same as possibility (1).

We omit now the supposition that λ^(1) and λ^(2) have continuous distribution functions. First of all we can choose a and b such that they are continuity points of the distribution function of λ^(1), and c and d such that they are continuity points of the distribution function of λ^(2), which means that

    (h − 1)C'(ν,η) + C(ν,η) > d − c

(or, in case of the other possibilities, similar expressions) will be satisfied if ε > 0 is small enough. Thus,

(2.4.22)  lim_{r→∞} P(Y_{M'_r,N'_r} < x) = F(x)
at every continuity point x of F(x). By Theorem 2.4.1 and the fact that (2.4.22) holds for all non-null ℓ, we obtain that

    [M'_r N'_r/(M'_r + N'_r)]^{1/2} [U_{M'_r,N'_r} − θ(F,G)]

converges in law to a t-variate normal distribution with mean vector 0 and covariance matrix Γ_0 as r → ∞.
Corollary 2.4.8. Consider a sequence of stochastic vectors {(M'_r, N'_r), r ≥ 1} where M'_r and N'_r only assume positive integer values. Moreover, assume there exists a sequence {(m_r, n_r), r ≥ 1} of positive integers such that

    lim_{r→∞} m_r = ∞,   lim_{r→∞} n_r = ∞,   lim_{r→∞} m_r/(m_r + n_r) = λ,   0 < λ < 1,

and

(2.4.23)  M'_r/m_r →_P 1,   N'_r/n_r →_P 1.

If E{ [φ_i(X_1,…,X_{k_i}, Y_1,…,Y_{q_i})]^2 } < ∞, i = 1,…,t, then

    [M'_r N'_r/(M'_r + N'_r)]^{1/2} [U_{M'_r,N'_r} − θ(F,G)] →_d N_t(0, Γ_0).

Proof. The proof follows immediately from Theorem 2.4.7 and (2.4.23).
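As a numerical sanity check of the corollary, the following sketch (our illustration, not from the text) simulates the univariate Mann–Whitney kernel with θ = 1/2 under F = G, randomly perturbs the target sample sizes so that M'_r/m_r and N'_r/n_r are near 1, and verifies that the standardized statistic is approximately standard normal; the variance constant 1/12 (the value of ζ_10 = ζ_01 for F = G) is a standard fact, and all routine names are our own.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 200, 300
z = []
for _ in range(1000):
    # random sample sizes with M'/m ->p 1 and N'/n ->p 1
    Mp = m + int(rng.integers(-m // 20, m // 20 + 1))
    Np = n + int(rng.integers(-n // 20, n // 20 + 1))
    x, y = rng.standard_normal(Mp), rng.standard_normal(Np)
    U = (x[:, None] < y[None, :]).mean()  # estimate of P(X < Y), theta = 1/2
    # for F = G: Var{[M'N'/(M'+N')]^(1/2) (U - 1/2)} -> 1/12
    z.append(np.sqrt(Mp * Np / (Mp + Np)) * (U - 0.5) / np.sqrt(1 / 12))
print(round(float(np.mean(z)), 3), round(float(np.var(z)), 3))  # near 0 and 1
```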
2.5. Optimality Results
Lemma 2.5.1. If c_i < ∞ and X_i(n) → c_i a.s. as n → ∞, i = 1,…,k, then

    Π_{i=1}^k X_i(n) → Π_{i=1}^k c_i   a.s. as n → ∞.

Proof. Let X_i(n) = c_i + Y_{ni} where Y_{ni} → 0 a.s., 1 ≤ i ≤ k, as n → ∞. Thus,

    Π_{i=1}^k X_i(n) = Π_{i=1}^k c_i + Σ_{i=1}^k Y_{ni} ( Π_{j≠i} c_j ) + Σ_{i≠j} Y_{ni} Y_{nj} ( Π_{l≠i,j} c_l ) + ⋯ .

Let C be the maximum over all index sets 1 ≤ f_1 < ⋯ < f_q ≤ k of | Π_{j∉{f_1,…,f_q}} c_j |. Then, for each q,

    | Σ_{f_1<⋯<f_q} Y_{n f_1} ⋯ Y_{n f_q} ( Π_{j∉{f_1,…,f_q}} c_j ) | ≤ C Σ_{f_1<⋯<f_q} |Y_{n f_1}| ⋯ |Y_{n f_q}| → 0   a.s.,

since |Y_{ni}| → 0 a.s. for all i. The result follows.
Lemma 2.5.2. If X_{ij}(N) → c_{ij} a.s., i,j = 1,…,t, as N → ∞, then, defining |X(N)| = |((X_{ij}(N)))| and |C| = |((c_{ij}))|,

    |X(N)| → |C|   a.s. as N → ∞.

Proof. We must show that for ε > 0,

    lim_{n→∞} P{ ∪_{N>n} ( | |X(N)| − |C| | > ε ) } = 0.

We shall write |X(N)| as

    |X(N)| = Σ_a (−1)^a X_{1a_1}(N) ⋯ X_{ta_t}(N),

where a denotes the number of inversions of 1,…,t in the permutation a_1,…,a_t and the summation Σ_a is over all possible permutations a_1,…,a_t. Thus,

    |X(N)| − |C| = Σ_a (−1)^a [ X_{1a_1}(N) ⋯ X_{ta_t}(N) − c_{1a_1} ⋯ c_{ta_t} ],

giving

    P{ ∪_{N>n} ( | |X(N)| − |C| | > ε ) } ≤ Σ_a P{ ∪_{N>n} ( | X_{1a_1}(N) ⋯ X_{ta_t}(N) − c_{1a_1} ⋯ c_{ta_t} | > ε' ) },

where ε' > 0. Finally, by Lemma 2.5.1,

    lim_{n→∞} P{ ∪_{N>n} ( | |X(N)| − |C| | > ε ) } = 0.
+
m
Ir mnl can be written as
'
-
I~m,nl
~ I. • t (-1)
1.=
where
I:
I
O, ••• ,t, the summation 1s over all possible 1 ~ a1
1~ 8
1
< ••• < 8t_~
t, where 81
~
aj ; i
= l, ••• ,t-I.,
< ••• <
j
at
~
t,
= 1, ••• ,t.
and I denotes the number of inversions of l, ••• ,t 1n the permutation
By Theorem 2.3.2 and Lemma 2.5.2, we can state that if

(2.5.1)  E{ [φ_i(X_1,…,X_{k_i}, Y_1,…,Y_{q_i})]^4 } < ∞,   i = 1,…,t,

then the (random) coefficients of the above expansion of |Γ̂_{m,n}| converge almost surely, as N → ∞, to the corresponding coefficients ν_l, l = 0,…,t, of the expansion of |Γ_{m,n}|.

We have seen in Section 2.2 that the values of m and n which minimize |Γ̂_{m,n}| subject to the constraint that m + n = N are given by (2.2.2). The values of m and n which minimize |Γ_{m,n}| subject to the constraint that m + n = N are given similarly by

(2.5.2)  Σ_{l=0}^t ν_l [ lN u − t ] (N u − 1)^l = 0,   u = m/N.

Theorem 2.5.3. Let λ̂_N = m'/N, where m' is chosen so that (m', n') minimizes |Γ̂_{m,n}| when m + n = N, and let λ_N = m''/N (> 0), where m'' is that value such that (m'', n'') minimizes |Γ_{m,n}| when m + n = N. Then

    λ̂_N → λ   a.s. as N → ∞.

Proof. If we can show that the solutions to (2.2.2) converge almost surely to the solutions to (2.5.2), then our proof will be complete.
In this way we can be assured that the solution which leads to the global minimum of |Γ̂_{m,n}| will converge to the solution that leads to the global minimum of |Γ_{m,n}| under the constraint. That is, we must show that if ε > 0, then

(2.5.3)  P{ | Σ_{l=0}^t ĉ_l [ lN u_0(N) − t ] [ N u_0(N) − 1 ]^l | < ε  for all N ≥ N_0 } = 1,

where u_0(N) denotes the solution to (2.5.2) and ĉ_l denotes the corresponding random coefficient in (2.2.2). By (2.5.1), we know that for all N ≥ N_0,

    ĉ_l = ν_l + η_l   (a.s.),

where η_l can be made arbitrarily small. Thus, since the ν_l terms vanish at u_0(N), for all N ≥ N_0,

    | Σ_{l=0}^t ĉ_l [ lN u_0(N) − t ] [ N u_0(N) − 1 ]^l | ≤ Σ_{l=0}^t |η_l| | [ lN u_0(N) − t ] [ N u_0(N) − 1 ]^l |   (a.s.).

Therefore, by proper choice of η_i, i = 1,…,t, (2.5.3) follows.
vectors {
(M~.N;).
r
~
II ,
68
where M' and N' only assume positive integer values. Moreover.
r
r
assume that there exists a sequence {(mr.n r ). r
integers such that
lim mr
I'\+eD
11m
I'\+eD
=
l}
lim n =
r-.c» r
CD
mr
mr+n r
~
=
o<
A
of positive
CD
A< 1
I
where Nr = mr + nr = MrI + Nr'
Let A be chosen so that the
asymptotic generalized variance of
[~
r
(2.5.4)
n
1/2
r
m +n
r r
Ir*1
is minimized. and let
[U
.
- a(F.G)]
_mr·n r
denote that minimum value.
..
Therefore •
since Theorem 2.5.3 holds and MI = AN (M I + NI ) . we can apply
r
r
r
Corollary 2.4.8 and note that the asymptotic generalized variance of
[
(2.5.5)
is
Ir*1 .
M~ N~
] 1/2
MI+N 1
r
[UMI NI
- r' r
r
-
a(F.G)]
-
Thus, the asymptotic optimality of our procedure is proven.
One should note that we have proven the asymptotic optimality of our procedure by reference to asymptotic generalized variances. In large sample statistical inference for U-statistics, this technique is quite common due to Theorem 2.4.1. However, from the above result one can not conclude that the ratio of the actual generalized variances of (2.5.4) and (2.5.5) converges almost surely to 1 as r → ∞. It would be quite difficult to express the generalized variance of (2.5.5). Such an expression would necessarily be based upon a conditional expectation conditioned on the sample sizes. In general, the distribution of the sample sizes will need to be known. Yen [1964], however, could formulate her optimality results in terms of actual variances since she was concerned with the special case of a two stage procedure.
As stated in Section 1.3, these results can easily be extended to the c-sample case (where c ≥ 2). In such a case we would be interested in estimating θ(F_1,…,F_c) by means of U_N(c) with the restriction that n_1 + ⋯ + n_c = N (fixed). (For notation see Section 1.3.) The almost sure convergence of the appropriate estimates, the asymptotic normality of U_N(c) for random sample sizes, and the optimality results will all follow by analogous arguments to those presented here for the two sample case.
CHAPTER III

A TWO SAMPLE CONFIDENCE ELLIPSOID OF FIXED MAXIMUM WIDTH

3.1. Introduction

In this chapter we will discuss a procedure that will enable us to obtain a confidence ellipsoid for θ(F,G) of fixed maximum width 2d, where d > 0, and such that the confidence coefficient is asymptotically a specified α, where 0 < α < 1. After a brief discussion of the mechanics of the procedure, we will show under certain conditions that the confidence coefficient is asymptotically α; that is, the procedure is consistent. We will also show under certain additional conditions that the expected total sample size of our procedure is asymptotically equal to the sample size one would use if the covariance matrix, Γ, were known; that is, the procedure is efficient.
3.2. Procedure

Select M_0/2 observations from the first population and M_0/2 observations from the second population, where M_0 is even and M_0 ≥ 2[max{k_i, i = 1,…,t; q_j, j = 1,…,t}]. Compute, for i,j = 1,…,t, S^{ij}_{10} and S^{ij}_{01} (as defined in Section 1.3). Consider for 0 < λ' < 1,

    Γ̂_{M_0}(λ') = (( k_i k_j S^{ij}_{10}/λ' + q_i q_j S^{ij}_{01}/(1 − λ') )),

where the subscript M_0 denotes the total number of observations on which S^{ij}_{10} and S^{ij}_{01}, i,j = 1,…,t, are based. Determine the value, λ̂_{min,M_0}, of λ', such that

    v*_{M_0}(λ') = max_{1≤i≤t} v_{i,M_0}(λ')

is minimized, where v_{i,M_0}(λ'), i = 1,…,t, are the characteristic roots of Γ̂_{M_0}(λ').

Select M_1 observations at the next step, but this time take M'_1 observations from the first population, where

    M'_1 = 0     if λ̂_{min,M_0} ≤ (M_0/2)/(M_0 + M_1),
    M'_1 = M_1   if λ̂_{min,M_0} ≥ (M_0/2 + M_1)/(M_0 + M_1),

and otherwise M'_1 is chosen so that (M_0/2 + M'_1)/(M_0 + M_1) is as close as possible to λ̂_{min,M_0}; the remaining M_1 − M'_1 observations are taken from the second population. Hence, λ̂_{M_0+M_1} = (M_0/2 + M'_1)/(M_0 + M_1). Now, using the M_0 + M_1 observations, recalculate S^{ij}_{10} and S^{ij}_{01}, i,j = 1,…,t.

Consider the stopping rule:

(3.2.1)  N*(d) = smallest h ≥ 2[max(k_i, q_i, i = 1,…,t)] such that v*_h(λ̂_{min,h}) ≤ h d²/a_h,

where a_1, a_2, … is any sequence of positive constants such that

    lim_{h→∞} a_h = χ²_{t,α},

and χ²_{t,α} is the 100α% point of the chi-square distribution with t degrees of freedom. If the procedure is not stopped, repeat the above. If the procedure is stopped, calculate the ellipsoid:

    E_{N*} = { θ : N* [U_{M',N'} − θ(F,G)]' Γ̂⁻¹_{N*}(λ̂_{N*}) [U_{M',N'} − θ(F,G)] ≤ a_{N*} },

where M' is the number of observations from the first population, N' is the number of observations from the second population, and λ̂_{N*} = M'/(M' + N'). In our procedure the sample sizes, M_i, for each step are quite arbitrary (influenced only by the degrees of the kernels involved).
3.3. Consistency

Define for 0 < λ' < 1, the non-null positive definite matrix,

    Γ(λ') = (( k_i k_j ζ^{ij}_{10}/λ' + q_i q_j ζ^{ij}_{01}/(1 − λ') )),

where ζ^{ij}_{10}, ζ^{ij}_{01}, i,j = 1,…,t, are defined in Section 1.3. Let λ_min be the value of λ' such that v*(λ') is minimized, where

    v*(λ') = max_{1≤i≤t} v_i(λ')

and v_i(λ'), i = 1,…,t, are the characteristic roots of Γ(λ').

The characteristic root, v_N(λ̂_N), defined earlier, can be expressed as a solution to the following characteristic equation:

(3.3.1)  (−1)^t v^t + b_1 v^{t−1} + ⋯ + b_{t−1} v + b_t = 0,

where b_i is a sum of products of

    k_i k_j S^{ij}_{10}/λ̂_N + q_i q_j S^{ij}_{01}/(1 − λ̂_N),   i,j = 1,…,t,

and the total number of observations on which S^{ij}_{10} and S^{ij}_{01} are based is N. Similarly, the characteristic root, v*(λ_min), can be expressed as a solution to the following characteristic equation:

(3.3.2)  (−1)^t v^t + a_1 v^{t−1} + ⋯ + a_{t−1} v + a_t = 0,

where a_i is the same function as b_i except that a_i has arguments of

    k_i k_j ζ^{ij}_{10}/λ_min + q_i q_j ζ^{ij}_{01}/(1 − λ_min),   i,j = 1,…,t.

In general, for any p.d. symmetric matrix, we can express the greatest characteristic root of that matrix as a function of its elements. Hence,

    v*(λ_min) = g_t( ζ^{ij}_{10}, ζ^{ij}_{01}, i,j = 1,…,t; λ_min ).

Equivalently,

    v_N(λ̂_N) = g_t( S^{ij}_{10}, S^{ij}_{01}, i,j = 1,…,t; λ̂_N ).
Lemma 3.3.1. The function g_t(X), where X' = (a_{ij}, b_{ij}, i,j = 1,…,t; λ), is a continuous function in its arguments if ε ≤ λ ≤ 1 − ε (ε > 0), a_{ij} = a_{ji}, b_{ij} = b_{ji} (i,j = 1,…,t), and a_{ij}, b_{ij} < ∞.

Proof. By definition, g_t(X) denotes the greatest characteristic root of the symmetric matrix A, where

    A = (( k_i k_j a_{ij}/λ + q_i q_j b_{ij}/(1 − λ) )).

As stated before, the characteristic equation for the characteristic roots of A is:

(3.3.3)  (−1)^t v^t + (−1)^{t−1} v^{t−1} f*_1(X) + ⋯ + (−1) v f*_{t−1}(X) + f*_t(X) = 0,

where f*_i(X), i = 1,…,t, is a sum of products of the elements of A. Consider a*_{ij} = a_{ij} + ε_{ij}, b*_{ij} = b_{ij} + ε'_{ij}, and λ* = λ + ε_λ. Then, by the continuity of f*_i(X) in the region considered,

    f*_i(X*) = f*_i(X) + η_i,

where η_i → 0 as {ε_{ij}, ε'_{ij}, i,j = 1,…,t; ε_λ} → 0. Thus, since the matrices considered are symmetric (i.e. have real roots), we obtain

    | [(−1)^t v^t + (−1)^{t−1} v^{t−1} f*_1(X*) + ⋯ + (−1) v f*_{t−1}(X*) + f*_t(X*)]
      − [(−1)^t v^t + (−1)^{t−1} v^{t−1} f*_1(X) + ⋯ + (−1) v f*_{t−1}(X) + f*_t(X)] | ≤ Σ_{i=1}^t η*_i,

where η*_i = c^{t−i} η_i, i = 1,…,t, and c bounds |v| in the region considered; hence the roots of the perturbed equation converge to those of (3.3.3), and g_t is continuous.
Lemma 3.3.2. For 0 < λ < 1, the function g_t(X), where X' = (a_{ij}, b_{ij}, i,j = 1,…,t; λ), attains its minimum value with respect to λ at some interior point of (0,1) if a_{ij}, b_{ij} < ∞, i,j = 1,…,t.

Proof. By definition, the function g_t(X) denotes the greatest characteristic root of the t-dimensional symmetric matrix, A, defined above. It can easily be shown that

(3.3.4)  g_t(X) ≥ t⁻¹ Σ_{i=1}^t [ k_i² a_{ii}/λ + q_i² b_{ii}/(1 − λ) ].

Hence, as λ → 0,1, g_t(X) → ∞. Moreover, for λ = 0.5, g_t(X) < C. By a suitable choice of ε > 0, we note by (3.3.4) that for all λ < ε and λ > 1 − ε, g_t(X) > C. Thus, the function g_t(X) can not attain its minimum for 0 < λ < 1 in the region λ < ε or λ > 1 − ε. Hence, by Buck [1965, p. 74], g_t(X) takes its minimum value somewhere in the closed interval ε ≤ λ ≤ 1 − ε.

Remark. Based on Lemma 3.3.2, we know that λ_min lies in the closed
interval [ε, 1−ε]. Therefore, in the remainder of our discussion we will limit ourselves to that interval.

Lemma 3.3.3. If E{ [φ_i(X_1,…,X_{k_i}, Y_1,…,Y_{q_i})]^4 } < ∞, i = 1,…,t, and λ_min is unique, then λ̂_{min,N} → λ_min a.s. as N → ∞.

Proof. By Theorem 2.3.2, we know that for ε ≤ λ' ≤ 1 − ε,

(3.3.5)  Γ̂_N(λ') − Γ(λ') → 0   a.s. as N → ∞.

Consider the equation for the characteristic roots of Γ̂_N(λ'): |Γ̂_N(λ') − v I| = 0. We can write an equivalent equation as:

(3.3.6)  (−1)^t v^t + b_1 v^{t−1} + ⋯ + b_t = 0,

with the b_i as in (3.3.1) evaluated at λ'. Thus, (3.3.5) and (3.3.6) imply for ε ≤ λ' ≤ 1 − ε,

(3.3.7)  b_i − a_i → 0   a.s., i = 1,…,t, as N → ∞.

By (3.3.7) we know that

(3.3.8)  v*_N(λ') → v*(λ')   a.s. as N → ∞,

for all λ' such that ε ≤ λ' ≤ 1 − ε. By our definition of λ_min,

(3.3.9)  v*(λ_min) ≤ v*(λ')

for all λ' such that ε ≤ λ' ≤ 1 − ε. Thus,

(3.3.10)  v*_N(λ') ≥ v*(λ_min) − η   a.s.

for η > 0. Also, by the definition of λ̂_{min,N},

(3.3.11)  v*_N(λ̂_{min,N}) ≤ v*_N(λ_min).

By (3.3.8) and (3.3.11),

(3.3.12)  v*_N(λ̂_{min,N}) ≤ v*(λ_min) + η   a.s.

Hence, from (3.3.10) and (3.3.12),

(3.3.13)  | v*_N(λ̂_{min,N}) − v*(λ_min) | ≤ η   a.s.

By (3.3.8) and (3.3.13), we obtain

(3.3.14)  v*(λ̂_{min,N}) → v*(λ_min)   a.s. as N → ∞.

By our assumption λ_min is unique, and by Lemma 3.3.1, v*(λ') is a continuous function in λ'. Hence, by (3.3.14),

    λ̂_{min,N} → λ_min   a.s. as N → ∞.
Lemma 3.3.4. If E{ [φ_i(X_1,…,X_{k_i}, Y_1,…,Y_{q_i})]^4 } < ∞, i = 1,…,t, and λ_min is unique, then

(3.3.15)  Γ̂_N(λ̂_N) → Γ(λ_min)   a.s. as N → ∞.

Proof. It is helpful to note again that λ̂_{min,N'} is the value of λ' that minimizes v*_{N'}(λ'), where N' is the total number of observations taken prior to the stage in which the total number of observations taken reaches N. For ε sufficiently small,

    λ̂_{min,N'} → λ_min   a.s. as N' → ∞.

Thus, by Lemma 2.5.1 and Lemma 3.3.3,

(3.3.16)  λ̂_N → λ_min   a.s. as N → ∞.

By (3.3.16) and Theorem 2.3.2,

    Γ̂_N(λ̂_N) → Γ(λ_min)   a.s. as N → ∞.

Thus, (3.3.15) follows directly.
1 • l •••• ,t and Amin is unique, then
(i) N*{d) 1s well defined and is a non-increasing function
of d
(11 )
11m E {N*(d)}
d-tO
lim N*(d) •
d...o
d2 N*{d)
(1v) 11m
d-40 a ~(~in)
(iii)
=
co
a.s.
GO
=
1
a.s.
80
Proof. The same method of proof as in the proof of (4) and (5)
of Chow and Robbins [1965] will be adopted here.
notation except that
y'
and t' replace their Yn and t,
•
n
a
Y~
Using their
2
Xt,a
I;
al,a 2, ••• be any sequence of positive constants such that
lim an
n-...
a
na
fen)
t'
III
=
I;
We can see from (3.2.1) that our stopping rule can be written as
f(k)
Yk'
~
t'
Thus, our proof 1s complete by reference to Lemma 1 of Chow and
Robbins [1965].
Theorem 3.3.6. If E{ [φ_i(X_1,…,X_{k_i}, Y_1,…,Y_{q_i})]^4 } < ∞, i = 1,…,t, and λ_min is unique, then

    lim_{d→0} P{ θ(F,G) ∈ E_{N*} } = α,

where E_{N*} is the ellipsoid of maximum width 2d defined in Section 3.2.

Proof. By Theorem 3.3.5,

(3.3.17)  N*(d) → ∞   a.s. as d → 0.

By Theorem 2.3.2, (3.3.17), and (3.3.16), it can be shown that

(3.3.18)  Γ̂_{N*}(λ̂_{N*}) → Γ(λ_min)   a.s. as d → 0.

Let m = λ_min t', n = (1 − λ_min) t', where t' is as defined in Theorem 3.3.5. Recall that M' = λ̂_{N*} N*(t'), N' = (1 − λ̂_{N*}) N*(t'). (Note that N* can be written depending upon d or t'.) Thus, m + n = t' and M' + N' = N*(t'). As t' → ∞, we will generate a sequence {(m_r, n_r), r ≥ 1} of positive integers and a sequence {(M'_r, N'_r), r ≥ 1} of stochastic vectors. By Theorem 3.3.5,

(3.3.19)  N*(t')/t' → 1   a.s. as t' → ∞.

By (3.3.19), (3.3.16), and Lemma 2.5.1, the conditions for Corollary 2.4.8 are satisfied, and it follows that

(3.3.20)  (N*)^{1/2} [U_{M',N'} − θ(F,G)] →_d N_t(0, Γ(λ_min))   as d → 0.

Thus, based on (3.3.18) and (3.3.20), we conclude that

    N* [U_{M',N'} − θ(F,G)]' Γ̂⁻¹_{N*}(λ̂_{N*}) [U_{M',N'} − θ(F,G)]

is asymptotically (as d → 0) distributed as a χ²_t random variable. By (3.3.17), (3.3.19), Lemma 3.3.4, and Lemma 2.5.1, we know that a_{N*} → χ²_{t,α} a.s. as d → 0. Therefore,

    lim_{d→0} P{ θ(F,G) ∈ E_{N*} } = P{ χ²_t ≤ χ²_{t,α} } = α.
3.4. Efficiency

We will start this section with a lemma that is closely related to a theorem given by Cramer [1946, p. 353].

Lemma 3.4.1. If

    (1) in some neighborhood of X = θ, g(X) is continuous and for all X ≠ θ the first partial derivative of g(X) exists,
    (2) |g(T_n)| < C n^q for 0 < C < ∞, 0 ≤ q < ∞, and
    (3) n^k E[T_{nj} − θ_j]^{2k} ≤ c_{2k} < ∞ for some k ≥ r(q+1), where r ≥ 1,

then

    E[g(T_n) − g(θ)]^r = O(n^{−r/2}).
Proof. For simplicity we will consider the univariate case. The more general result will follow analogously. Let x_n = (X_1,…,X_n) ∈ R_n and denote by P_n the joint distribution of (X_1,…,X_n). Further, let

    Z_n = { x_n : |T_n − θ| ≤ ε },   Z_n^c = { x_n : |T_n − θ| > ε }.

Consider

    E[g(T_n) − g(θ)]^r = ∫_{Z_n} [g(T_n) − g(θ)]^r dP_n + ∫_{Z_n^c} [g(T_n) − g(θ)]^r dP_n.

For the second integral,

    | ∫_{Z_n^c} [g(T_n) − g(θ)]^r dP_n | ≤ 2^r C^r n^{qr} ∫_{Z_n^c} dP_n ≤ 2^r C^r n^{qr} c_{2k}/(n^k ε^{2k}) ≤ c*/n^r,

since k ≥ r(q+1). Now, within Z_n, for T_n > θ,

    g(T_n) − g(θ) = g'(θ + t*)(T_n − θ),   where t* > 0,

and for T_n ≤ θ,

    g(T_n) − g(θ) = g'(θ + t')(T_n − θ),   where t' ≤ 0.

Thus,

    | g(T_n) − g(θ) | ≤ max{ |g'(θ_n)|, |g'(θ'_n)| } |T_n − θ|,

where θ − ε ≤ θ_n ≤ θ and θ ≤ θ'_n ≤ θ + ε. Note that, by condition (3) and the Hölder inequality,

    ∫_{Z_n} |T_n − θ|^r dP_n ≤ [ E(T_n − θ)^{2k} ]^{r/2k} ≤ [ c_{2k}/n^k ]^{r/2k} = O(n^{−r/2}),

since k ≥ 1. Thus, the proof is complete.
Lemma 3.4.2. If E{ [φ(X_1,…,X_k, Y_1,…,Y_q)]^{2r} } < ∞, then

    E{ [U_{mn} − θ(F,G)]^{2r} } = O(N^{−r}),

where m + n = N.

Proof. By definition, for every m ≥ k, n ≥ q,

    U_{mn} = ( m^{[k]} n^{[q]} )⁻¹ Σ φ(X_{α_1},…,X_{α_k}, Y_{β_1},…,Y_{β_q}),

where the summation extends over all 1 ≤ α_1 < ⋯ < α_k ≤ m, 1 ≤ β_1 < ⋯ < β_q ≤ n, and m^{[k]} = m(m−1)⋯(m−k+1). Thus, in a manner analogous to expression (2.18) of Miller and Sen [1972], we can write

    U_{mn} = ( m^{[k]} n^{[q]} )⁻¹ Σ_{P_{m,n;k,q}} ∫⋯∫ φ(x_1,…,x_k, y_1,…,y_q) Π_{i=1}^k d[c(x_i − X_{α_i})] Π_{j=1}^q d[c(y_j − Y_{β_j})],

where P_{m,n;k,q} = { 1 ≤ α_1 ≠ ⋯ ≠ α_k ≤ m, 1 ≤ β_1 ≠ ⋯ ≠ β_q ≤ n }. Consider

    Π_{i=1}^k d[c(x_i − X_{α_i})] = Π_{i=1}^k { d[c(x_i − X_{α_i}) − F(x_i)] + dF(x_i) }

and

    Π_{j=1}^q d[c(y_j − Y_{β_j})] = Π_{j=1}^q { d[c(y_j − Y_{β_j}) − G(y_j)] + dG(y_j) }.

Thus, due to the symmetry of the kernel,

    U_{mn} = θ(F,G) + k U_{m,n;1,0} + q U_{m,n;0,1} + Σ_{h=1}^k Σ_{l=1}^q C(k,h) C(q,l) U_{m,n;h,l},

where C(k,h) denotes the binomial coefficient and, for 1 ≤ h ≤ k, 1 ≤ l ≤ q,

    U_{m,n;h,l} = ( m^{[h]} n^{[l]} )⁻¹ Σ_{P_{m,n;h,l}} ∫⋯∫ φ_{h,l}(x_1,…,x_h, y_1,…,y_l)
                  Π_{i=1}^h d[c(x_i − X_{α_i}) − F(x_i)] Π_{j=1}^l d[c(y_j − Y_{β_j}) − G(y_j)].

Before we proceed, one should note (see expression (2.9) of Miller and Sen [1972]) that, for all r_j ≥ 1, j = 1,…,h* (≥ 1), with Σ_{j=1}^{h*} r_j = 2s,

    ∫⋯∫ Π_{j=1}^{h*} [c(x − X_j) − F(x)]^{r_j} dF(x_1) ⋯ dF(x_{h*}) = 0  if at least one of r_1,…,r_{h*} = 1,

and is bounded otherwise. When we examine E[U_{m,n;h,l}]^{2r}, we see that there are

    f_{h,l}(r) m(m−1)⋯(m−h+1) n(n−1)⋯(n−l+1)

terms which are non-zero. Hence,

    E[U_{m,n;h,l}]^{2r} = O[ m^{h(1−2r)} n^{l(1−2r)} ] = O(N^{−2r})

for r ≥ 1, h ≥ 1, l ≥ 1. Therefore,

    E{ [ Σ_{h=1}^k Σ_{l=1}^q C(k,h) C(q,l) U_{m,n;h,l} ]^{2r} } = O(N^{−2r}).

Now note that U_{m,n;1,0} and U_{m,n;0,1} are averages of i.i.d. random variables. Therefore, by a well known result,

    E[U_{m,n;1,0}]^{2r} = O(m^{−r}),   E[U_{m,n;0,1}]^{2r} = O(n^{−r}).

Now, since

    U_{mn} − θ(F,G) = k U_{m,n;1,0} + q U_{m,n;0,1} + Σ_{h=1}^k Σ_{l=1}^q C(k,h) C(q,l) U_{m,n;h,l},

by the C_r inequality,

    E{ [U_{mn} − θ(F,G)]^{2r} } ≤ c { E[k U_{m,n;1,0}]^{2r} + E[q U_{m,n;0,1}]^{2r}
                                    + E[ Σ_{h=1}^k Σ_{l=1}^q C(k,h) C(q,l) U_{m,n;h,l} ]^{2r} } = O(N^{−r}).
Lemma 3.4.3. If E{ [φ_i(X_1,…,X_{k_i}, Y_1,…,Y_{q_i})]^{4k} } < ∞, k ≥ 1, i = 1,…,t, then

(3.4.1)  E[ (S^{ij}_{10} − ζ^{ij}_{10})^{2k} ] = O(N^{−k})

and

(3.4.2)  E[ (S^{ij}_{01} − ζ^{ij}_{01})^{2k} ] = O(N^{−k}),   i,j = 1,…,t.

Proof. We shall only prove (3.4.1) since (3.4.2) will follow analogously. From Theorem 2.3.2 we note that for m + n = N,

    S^{ij}_{10} = U^{ij}_{1,0;m,n} − U^{ij}_{0,0;m,n} + Σ_{c=0}^{k_i} Σ_{d=0}^{q_i} a_{m,n}(c,d) U^{ij}_{c,d;m,n},

where a_{m,n}(c,d) = O(N⁻¹). By the C_r inequality and Lemma 3.4.2,

(3.4.3)  E[ ( U^{ij}_{1,0;m,n} − U^{ij}_{0,0;m,n} − ζ^{ij}_{10} )^{2k} ] = O(N^{−k}),

(3.4.4)  E[ ( Σ_{c=0}^{k_i} Σ_{d=0}^{q_i} a_{m,n}(c,d) U^{ij}_{c,d;m,n} )^{2k} ] = O(N^{−2k}).

Therefore, (3.4.1) follows.
As noted earlier, v_N(λ') can be written as g_t(S^{ij}_{10}, S^{ij}_{01}, i,j = 1,…,t; λ'). Since λ̂_{min,N} is the value of λ' that minimizes v_N(λ'), we can write λ̂_{min,N}, for N sufficiently large, as the appropriate solution to the following differential equation:

    (∂/∂λ') g_t( S^{ij}_{10}, S^{ij}_{01}, i,j = 1,…,t; λ' ) = 0,

if ∂g_t/∂λ' exists. Otherwise, λ̂_{min,N} is determined by that value of λ' (ε ≤ λ' ≤ 1−ε) which minimizes g_t. In other words,

    λ̂_{min,N} = g*_t( S^{ij}_{10}, S^{ij}_{01}, i,j = 1,…,t ).

Similarly,

    λ_min = g*_t( ζ^{ij}_{10}, ζ^{ij}_{01}, i,j = 1,…,t ).

[Note that in the above definition of λ_min, we are making the implicit assumption that λ_min is unique.]
Lemma 3.4.4. Assume

    (1) in some neighborhood of X = θ, where θ' = (ζ^{ij}_{10}, ζ^{ij}_{01}, i,j = 1,…,t; λ_min), g_t(X) is continuous and for all X ≠ θ the first partial derivative of g_t(X) exists;
    (2) in some neighborhood of Y = φ, where φ' = (ζ^{ij}_{10}, ζ^{ij}_{01}, i,j = 1,…,t), g*_t(Y) is continuous and for all Y ≠ φ the first partial derivative of g*_t(Y) exists;
    (3) |g_t(X_N)| < C N^q for 0 < C < ∞, 0 ≤ q < ∞, where X_N = (S^{ij}_{10}, S^{ij}_{01}, i,j = 1,…,t; λ̂_N);
    (4) E{ [φ_i(X_1,…,X_{k_i}, Y_1,…,Y_{q_i})]^{4k} } < ∞, i = 1,…,t, for some k ≥ 4(q+1);

then, for N sufficiently large,

(3.4.5)  P{ |v_N(λ̂_N) − v*(λ_min)| > ε } ≤ c_1 N^{−2}   (0 < c_1 < ∞).

Proof. By the C_r inequality,

    E[λ̂_N − λ_min]^{2k} ≤ c { E[λ̂_N − λ̂_{min,N'}]^{2k} + E[λ̂_{min,N'} − λ_min]^{2k} }.

From the definitions of λ̂_N and λ̂_{min,N'} given earlier,

    |λ̂_N − λ̂_{min,N'}| ≤ (M_0 + 2)/(2N).

Thus, E[λ̂_N − λ̂_{min,N'}]^{2k} = O(N^{−2k}). By Lemma 3.4.3 and assumption (4),

(3.4.6)  E[ (S^{ij}_{10} − ζ^{ij}_{10})^{2k} ] = O(N'^{−k})

and

(3.4.7)  E[ (S^{ij}_{01} − ζ^{ij}_{01})^{2k} ] = O(N'^{−k}),

where m' + n' = N'. Since ε ≤ λ̂_N ≤ 1 − ε and assumption (2) holds, (3.4.6) and (3.4.7) enable us to apply Lemma 3.4.1 and obtain, for N sufficiently large,

(3.4.8)  E[λ̂_{min,N'} − λ_min]^{2k} = O(N^{−k}).

Again by Lemma 3.4.3 and assumption (4),

(3.4.9)  E[ (S^{ij}_{10} − ζ^{ij}_{10})^{2k} ] = O(N^{−k})

and

(3.4.10)  E[ (S^{ij}_{01} − ζ^{ij}_{01})^{2k} ] = O(N^{−k}),

where m + n = N. By assumptions (1) and (3), Lemma 3.3.1, and (3.4.8), (3.4.9), and (3.4.10), we can apply Lemma 3.4.1 and obtain

    E[ v_N(λ̂_N) − v*(λ_min) ]^4 = O(N^{−2}).

Thus, by Chebychev's inequality, (3.4.5) follows.
In the proofs of Lemma 3.4.5 and Theorem 3.4.6 which follow, we will use an argument analogous to that given by Sen and Ghosh [1971] in their proof of Lemma 5.5 and statement (3.6). Define

    n_1(d) = [ (1 − ε) a v*(λ_min)/d² ],   n_2(d) = [ (1 + ε) a v*(λ_min)/d² ].

Lemma 3.4.5. If conditions (1) – (4) of Lemma 3.4.4 hold, then

    lim_{d→0} Σ_{n=n_2(d)}^∞ P{ N*(d) > n } = 0.

Proof.

    Σ_{n=n_2(d)}^∞ P{ N*(d) > n } = Σ_{n=n_2(d)}^∞ P( v_h(λ̂_h) > h d²/a_h for all h = 1,…,n )
                                  ≤ Σ_{n=n_2(d)}^∞ P( v_n(λ̂_n) > n d²/a_n ).

Since for n ≥ n_2(d) the threshold n d²/a_n exceeds v*(λ_min)(1 + ε)(a/a_n), and since by Lemma 3.4.4 this event occurs with probability O(n⁻²) (as n → ∞), by letting d → 0 (i.e. n_2(d) → ∞) the lemma follows directly.
Theorem 3.4.6. If conditions (1) – (4) of Lemma 3.4.4 hold, then

(3.4.11)  lim_{d→0} d² E{N*(d)} / [a v*(λ_min)] = 1.

Proof. Consider

    d² E{N*(d)} / [a v*(λ_min)] = [d²/(a v*(λ_min))] ( Σ_1 + Σ_2 + Σ_3 ) n P{ N*(d) = n },

where Σ_1 extends over all n < n_1(d), Σ_2 over all n with n_1(d) ≤ n ≤ n_2(d), and Σ_3 over all n ≥ n_2(d). Since

    lim_{d→0} N*(d) = ∞   a.s.,

and by Theorem 3.3.5

    d² N*(d)/[a v*(λ_min)] → 1   a.s.,

for every ε (> 0), there exists a value of d, say d_0, such that for all 0 < d ≤ d_0,

    P{ n_1(d) < N*(d) < n_2(d) } > 1 − η',

η' being arbitrarily small. Hence for d ≤ d_0,

(3.4.12)  [d²/(a v*(λ_min))] Σ_1 n P{ N*(d) = n } ≤ (1 − ε) P{ N*(d) < n_1(d) } < η'(1 − ε).

Also, for all n_1(d) ≤ n ≤ n_2(d),

    | d² n/[a v*(λ_min)] − 1 | < ε,

and hence,

(3.4.13)  [d²/(a v*(λ_min))] Σ_2 n P{ N*(d) = n } ≤ (1 + ε) Σ_2 P{ N*(d) = n } ≤ 1 + ε + η'.

Finally,

(3.4.14)  [d²/(a v*(λ_min))] Σ_3 n P{ N*(d) = n }
          ≤ Σ_{n≥n_2(d)} P{ N*(d) > n } + [ 1 + ε + d²/(a v*(λ_min)) ] P{ N*(d) ≥ n_2(d) }.

Since P{ N*(d) ≥ n_2(d) } → 0 as d → 0, and, using Lemma 3.4.5, both the terms on the r.h.s. of (3.4.14) converge to 0 as d → 0. Thus, (3.4.11) follows from (3.4.12), (3.4.13), and (3.4.14).
3.5. Remarks

(1) As stated in Section 1.3, these results can easily be extended to the c-sample case (where c ≥ 2). In such a case we would be interested in obtaining a confidence ellipsoid for θ(F_1,…,F_c) of fixed maximum width and such that the confidence coefficient is asymptotically a specified α. (For notation see Section 1.3.) The asymptotic consistency and asymptotic efficiency results will all follow by analogous arguments to those presented here for the two sample case.
(2) It is of interest to consider why we can not use an argument similar to Simons [1968] in the proof of Theorem 3.4.6. To proceed as in Simons [1968], we would like to define a "reverse" stopping variable by

    L = h_0 − 1                              if v_h(λ̂_h) ≤ h d²/a_h for all h ≥ h_0,
    L = sup{ h : v_h(λ̂_h) > h d²/a_h }       if there is such an h,
    L = ∞                                    if v_h(λ̂_h) > h d²/a_h infinitely often,

where h_0 = 2[max(k_i, q_i, i = 1,…,t)]. However, by the definition of λ̂_n, we see that λ̂_n depends on the entire past, so that L certainly does not depend solely on the future. Moreover, we note from the argument given in Theorem 2.3.2 that S^{ij}_{10} and S^{ij}_{01} can be written as a linear combination of generalized U-statistics, but the coefficients of those linear combinations depend upon the sample sizes from the two populations and hence upon the past. Therefore, the argument presented by Simons [1968] can not be applied in our situation.

In the work of Robbins et al. [1967] and Simons [1968], it has been shown that the expected sample sizes for their procedures exceed the sample sizes one would use if the appropriate population parameters were known by a finite quantity for all d > 0. It appears that the proof of such a result in our situation is not obvious. Part of the difficulty arises from the problem of reverse stopping variables presented above.
(3) It is of interest to note that, since

    g_t(X_N) ≤ Σ_{i=1}^t [ k_i² S^{ii}_{10}/λ̂_N + q_i² S^{ii}_{01}/(1 − λ̂_N) ]

and ε ≤ λ̂_N ≤ 1 − ε, a sufficient condition for |g_t(X_N)| < C N^q is, for i = 1,…,t,

(3.5.1)  |S^{ii}_{10}| < c N^q   and   |S^{ii}_{01}| < c N^q.

Certainly any set of bounded kernels (the case of the most practical value) will satisfy (3.5.1) with q = 0.
(4)
o
For t
>
5, it may not be possible to solve equations (3.3.1)
and (3.3.2) explicitly for vACi N) and v*(Amin). Thus, an expression
for 9t(~) may not be obtained explicitly. In practice. however.
one can st 111 determi ne if Theorem 3.4.6 app11 es. . Consider the
t-dimensional matrix.
+
As stated before, the polynomial expression for the characteristic
roots of A is: y • 0 where
-
Xl
•
(a ij • bij , i.j • l .....t; A) • and
f1(~)'
i • l •••• ,t are
98
sums of products of
+
For 0 < A <1, Y possesses continuous partial derivatives of all orders.
Therefore, by Hildebrand [1962, p. 340],
3 "
(3.5.2)
whenever
aY
~
=-
•
ay -1
O. By examining the behavior of
aY
aX
and
i
ay
3" ' we can determine if assumption (1) of Theorem 3.4.4 is
satisfied.
Moreover, by definition, gi(!) is the solution for A in the
o
differential equation
=
0
such that
>
A-g*(Y)
t
3gt(~)
if aA
agt(~)
exists. As before, if aX
partial derivatives,
o
has continuous first
99
=-
(3.5.3)
whenever
if
•
is not zero. However. we know by definition
agt(~)
i»'
has a continuous first partial derivative which
requires that ~ ~ O. Thus. to verify assumptions (1) and (2)
of Theorem 3.4.4 we need only examine the behavior of ay
.
aX i
and
ay • the relationships (3.5.2) and (3.5.3). and the value of
av
o
CHAPTER IV
SMALL SAMPLE BEHAVIOR
4.1. Introduction

This chapter will be devoted to a numerical illustration of the theory developed in Chapters II and III. In particular, we will consider, first, an analogue to the Behrens-Fisher problem. In other words, we will be concerned with the estimation of a difference in location of two normal distributions when there is also a difference in dispersion and the total sample size is fixed. Similarly, we will want to obtain a fixed-width confidence ellipsoid for a difference in location of two normal distributions when there is also a difference in dispersion. As our estimator of a parameter that will reflect a difference in location, we will use the Wilcoxon statistic. Next, we will estimate a difference in scale of two normal distributions when the total sample size is fixed. Similarly, we will determine a fixed-width confidence ellipsoid for a difference in scale of two normal distributions. As our estimator of a parameter that will reflect a difference in scale, we will use Lehmann's two sample scale statistic.

In order to evaluate the performance of our procedure in the above two situations, we will examine the effect of various parameters on several quantities of interest. For the procedure
described in Chapter II, it will be of interest to consider the
efficiency of our estimator, i.e.

    [ |Γ̂_{M',N'}| / |Γ_{m_0,n_0}| ]^{1/2},

where Γ̂_{M',N'} is defined in Section 2.2, Γ_{m,n} is defined in Section 2.5, and m_0, n_0 are chosen to minimize |Γ_{m,n}|.
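A one-line helper for this criterion (our illustration, not the author's code), taking the two matrices as inputs:

```python
import numpy as np

def efficiency(Gamma_hat, Gamma_opt):
    """Ratio of generalized variances, [ |Gamma_hat| / |Gamma_opt| ]^(1/2)."""
    return float(np.sqrt(np.linalg.det(Gamma_hat) / np.linalg.det(Gamma_opt)))
```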
For the procedure
described in Chapter III, we will be concerned with the coverage
frequency and a comparison of the average sample size of our procedure
with the sample size one would expect if the covariance matrices
were known. For simplicity, we will consider, in both cases,
bivariate normal distributions. Therefore, one parameter that will
be examined will be the correlation coefficient in each of the
two distributions. Other parameters which will be varied for our
test of a difference in location will be the ratio of the scales of
the two distributions and the sample size (in the case of estimation,
the fixed total sample size, while for the fixed width confidence
ellipsoid, the sample size one would expect if the covariance matrices
were known).
4.2. The Wilcoxon Statistic

This section will be an elaboration of Example 1.4.1. Let {X_1, X_2, …} be a sequence of i.i.d.r.v. with each X_i = (X_{i1}, X_{i2})' having a distribution function F(x), where F(x) denotes a bivariate normal distribution with mean μ^(1) = (μ_{11}, μ_{21})' and dispersion Σ^(1). Let {Y_1, Y_2, …} be an independent sequence of i.i.d.r.v. with each Y_j = (Y_{j1}, Y_{j2})' having a distribution function G(y), where G(y) denotes a bivariate normal distribution with mean μ^(2) and dispersion Σ^(2). Define F_1(x_{i1}) = F(x_{i1}, ∞), F_2(x_{i2}) = F(∞, x_{i2}), G_1(y_{j1}) = G(y_{j1}, ∞), and G_2(y_{j2}) = G(∞, y_{j2}). Let θ(F,G) = (θ_1(F,G), θ_2(F,G))'. The U-statistic estimator of θ(F,G) is, for m ≥ 1, n ≥ 1, U_{mn} = (U¹_{mn}, U²_{mn})', where

    U^i_{mn} = (mn)⁻¹ Σ φ_i(X_α, Y_β),
    φ_i(X_α, Y_β) = 1 if X_{αi} < Y_{βi},  = −1 if X_{αi} > Y_{βi},  = 0 if X_{αi} = Y_{βi},

and the summation extends over all 1 ≤ α ≤ m, 1 ≤ β ≤ n, for i = 1,2. It can easily be seen that
    φ^i_{10}(X_α) = 1 − 2G_i(X_{αi})   and   φ^i_{01}(Y_β) = 2F_i(Y_{βi}) − 1,   i = 1,2.

Also, for example,

    b_{22} = [ 1 + σ^(2)_{22}/σ^(1)_{22} ]⁻¹,

with the remaining constants a_1, a_2, b_{11}, b_{12}, b_{21} defined analogously in terms of the variance ratios and the correlation coefficients ρ^(1), ρ^(2). By an extension of Pearson [1907, p. 11], it can be shown that if θ(F,G) = 0, the functionals ζ^{ij}_{10} and ζ^{ij}_{01} take the form (2/π) Sin⁻¹(·); in particular, by application of Lemma 4.2 of Hoyland [1965], we obtain

    ζ^{11}_{10} = (2/π) Sin⁻¹(b_{11}),

with the remaining ζ's given by the corresponding arguments a_1, a_2, b_{12}, b_{21}, b_{22} (these appear explicitly in the matrix Γ displayed in Section 4.3). For simplicity we will consider θ(F,G) = 0. Therefore, we can, also for simplicity, assume that μ^(1) = μ^(2) = 0.
As in Example 1.4.1 we note that

    A^i_h = Σ_{β=1}^n φ_i(X_h, Y_β),

where A^i_h is equal to the number of Y_{βi}'s which are greater than x_{hi} minus the number of Y_{βi}'s which are less than x_{hi}, i = 1,2, h = 1,…,m; and

    B^i_l = Σ_{α=1}^m φ_i(X_α, Y_l),

where B^i_l is equal to the number of X_{αi}'s less than y_{li} minus the number of X_{αi}'s greater than y_{li}, i = 1,2, l = 1,…,n. Thus,

    (m − 1) n² S^{ij}_{10} = Σ_{h=1}^m (A^i_h − Ā^i)(A^j_h − Ā^j)

and

    m² (n − 1) S^{ij}_{01} = Σ_{l=1}^n (B^i_l − B̄^i)(B^j_l − B̄^j).
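The counting formulas above translate directly into vectorized form. The following sketch (our illustration, not the author's program) computes U_{mn}, S^{ij}_{10}, and S^{ij}_{01} for bivariate samples X (m × 2) and Y (n × 2):

```python
import numpy as np

def wilcoxon_U_and_S(X, Y):
    m, n = len(X), len(Y)
    sgn = np.sign(Y[None, :, :] - X[:, None, :])   # shape (m, n, 2)
    U = sgn.mean(axis=(0, 1))                      # (U^1_mn, U^2_mn)
    A = sgn.sum(axis=1)                            # A[h, i]: #greater - #less over Y
    B = sgn.sum(axis=0)                            # B[l, i]: #less - #greater over X
    Ac = A - A.mean(axis=0)
    Bc = B - B.mean(axis=0)
    S10 = Ac.T @ Ac / ((m - 1) * n**2)             # (m-1) n^2 S10 = sum of products
    S01 = Bc.T @ Bc / ((n - 1) * m**2)             # m^2 (n-1) S01 = sum of products
    return U, S10, S01
```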
4.3. Small Sample Behavior for the Wilcoxon Statistic

For the Wilcoxon statistic, k_1 = k_2 = q_1 = q_2 = 1. We will consider the case described in Section 4.2. Thus, using the notation introduced in Section 4.2,

    Γ(λ) = (2/π) [ Sin⁻¹(b_{11})/λ + Sin⁻¹(b_{12})/(1−λ)    Sin⁻¹(a_1)/λ + Sin⁻¹(a_2)/(1−λ)  ]
                 [ Sin⁻¹(a_1)/λ + Sin⁻¹(a_2)/(1−λ)          Sin⁻¹(b_{21})/λ + Sin⁻¹(b_{22})/(1−λ) ].
Using the notation of Chapters II and III, if

    ν_1 = (4/π²) { Sin⁻¹(b_{11}) Sin⁻¹(b_{22}) − 2 Sin⁻¹(a_1) Sin⁻¹(a_2) + Sin⁻¹(b_{12}) Sin⁻¹(b_{21}) },

then

    |Γ(λ)| = Σ_{l=0}^2 (1/λ)^l (1/(1−λ))^{2−l} ν_l.

The greatest characteristic root of Γ is:

    v*(λ) = π⁻¹ ( λ⁻¹[Sin⁻¹(b_{11}) + Sin⁻¹(b_{21})] + (1−λ)⁻¹[Sin⁻¹(b_{12}) + Sin⁻¹(b_{22})]
            + [ { λ⁻¹[Sin⁻¹(b_{11}) − Sin⁻¹(b_{21})] + (1−λ)⁻¹[Sin⁻¹(b_{12}) − Sin⁻¹(b_{22})] }²
              + 4 { λ⁻¹ Sin⁻¹(a_1) + (1−λ)⁻¹ Sin⁻¹(a_2) }² ]^{1/2} ).
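For numerical work, the closed form for v*(λ) can be coded directly and λ_min located by a grid search; this is our sketch, and the example arguments at the end are hypothetical:

```python
import numpy as np

def v_star_wilcoxon(lam, a1, a2, b11, b12, b21, b22):
    s = np.arcsin
    tr  = (s(b11) + s(b21)) / lam + (s(b12) + s(b22)) / (1 - lam)
    dif = (s(b11) - s(b21)) / lam + (s(b12) - s(b22)) / (1 - lam)
    off = s(a1) / lam + s(a2) / (1 - lam)
    return (tr + np.sqrt(dif**2 + 4 * off**2)) / np.pi

def lambda_min(args, eps=0.05, grid=361):
    lams = np.linspace(eps, 1 - eps, grid)
    return lams[int(np.argmin(v_star_wilcoxon(lams, *args)))]

# symmetric (equal-scale) arguments make v*(.) symmetric about 0.5
print(lambda_min((0.5, 0.5, 0.5, 0.5, 0.5, 0.5)))   # -> 0.5
```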
In our analysis, we will arbitrarily select our initial sample, M_0, to be 10, and the sample size of each subsequent step equal to 10. We will consider the following cases for the ratio of the scales of the two distributions:

    [ σ^(2)_{11}/σ^(1)_{11}, σ^(2)_{22}/σ^(1)_{22} ] = (1,1), (0.25,1), (0.25,0.25).

When considering such cases as σ^(2)_{11}/σ^(1)_{11} = 1, let σ^(1)_{11} = σ^(2)_{11} = 1 for simplicity. For the case σ^(2)_{11}/σ^(1)_{11} = 0.25, let σ^(2)_{11} = 1 and σ^(1)_{11} = 4.

In the case of estimation with fixed total sample size, we will let that fixed total take on the values of 50 and 100. In the case of the fixed width confidence ellipsoid, we will let the sample size one would expect if Γ were known take on the values of 50 and 100. Also, in the latter case, α = 0.95 and a_h = 5.99.

In order to generate observations from a bivariate normal distribution, we will use the following fact. Let {X_1, X_2} be a pair of independent normal deviates with mean zero and unit variances. Then

    ( σ_{11}^{1/2} X_1 ,  σ_{22}^{1/2} [ ρ X_1 + (1 − ρ²)^{1/2} X_2 ] )

represents a pair of deviates from a bivariate normal distribution with zero means, variances equal to σ_{11}, σ_{22} respectively, and correlation coefficient ρ.
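A sketch of this generator (the function name and seed are ours):

```python
import numpy as np

rng = np.random.default_rng(1972)

def bvn_pairs(sigma11, sigma22, rho, size, rng=rng):
    """Transform two independent N(0,1) deviates into bivariate normal pairs
    with zero means, variances sigma11, sigma22 and correlation rho."""
    x1 = rng.standard_normal(size)
    x2 = rng.standard_normal(size)
    z1 = np.sqrt(sigma11) * x1
    z2 = np.sqrt(sigma22) * (rho * x1 + np.sqrt(1.0 - rho**2) * x2)
    return np.column_stack([z1, z2])
```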
Therefore, in Tables 4.3.1 – 4.3.6, the sequential procedure described in Chapter II is simulated for the particular cases described above. From these tables we see that, in general, our estimate of the asymptotic generalized variance of our estimator (described in Chapter II) is approximately equal to the minimum asymptotic generalized variance of any two sample U-statistic estimator for which the total sample size is fixed. It appears, however, that for a sample size of 50, the asymptotic generalized variance of our estimator does not closely approximate the minimum asymptotic generalized variance when either ρ^(1) or ρ^(2) has a large absolute value. However, when the sample size increases to 100, the efficiency (defined above) seems to improve. It appears that our procedure works slightly better for the cases described in Tables 4.3.3 and 4.3.4 than the other two cases considered, but no clear pattern seems to exist.

It is of interest to note that a comparison of computing time was made for the sequential procedure described in Chapter II when estimators based directly on the theory of U-statistics (as in Yen [1964]) were used to estimate ζ^{ij}_{10}, ζ^{ij}_{01}, i,j = 1,…,t, as opposed to S^{ij}_{10}, S^{ij}_{01}, i,j = 1,…,t. For the case described in Table 4.3.1, it was found, for a few selected combinations of ρ^(1), ρ^(2), that for the calculation of the estimators of ζ^{ij}_{10}, ζ^{ij}_{01}, i,j = 1,…,t, the procedure which employed S^{ij}_{10}, S^{ij}_{01}, i,j = 1,…,t, was approximately 250 times as fast as the other method. The difference in time was particularly evident in the later stages of the sequential procedure when the sample sizes were large.
Tables 4.3.1 and 4.3.2 have been included in their entirety instead of an upper triangular form due to the fact that our procedure is not exactly symmetric in those cases. The lack of symmetry is due, in large part, to the fact that a greatest integer function is employed, and, to some extent, our minimization routine. The same comment will also apply to Tables 4.3.7 and 4.3.8. This lack of symmetry may become particularly evident in these latter tables when we consider the efficiency results, since for any one application of the procedure the total sample size will differ from the sample size one would use if the covariance matrix were known by multiples of the size of the sample taken at each stage.

In Tables 4.3.7 – 4.3.12, the sequential procedure described in Chapter III is simulated for the particular cases described earlier. From these tables we see that, in general, the sample size required by our procedure is quite close to the sample size one would use if the covariance matrix were known. Moreover, the coverage frequency of the confidence ellipsoid is quite close to the specified confidence coefficient. It appears, however, that the efficiency of our procedure is low when ρ^(1) and ρ^(2) are quite different. There appears to be no obvious pattern with respect to the other variables considered.
TABLE 4.3.1

EFFICIENCY OF ESTIMATION WITH FIXED SAMPLE SIZE IN THE CASE OF THE WILCOXON STATISTIC

N = 50,  [σ^(2)_{11}/σ^(1)_{11}, σ^(2)_{22}/σ^(1)_{22}] = (1,1)

ρ^(1)\ρ^(2)   -0.8   -0.6   -0.4   -0.2    0.0    0.2    0.4    0.6    0.8
   -0.8       1.56   1.37   1.30   1.22   1.18   1.14   1.03   1.03   0.95
   -0.6       1.39   1.25   1.23   1.16   1.11   1.09   1.03   1.02   0.93
   -0.4       1.17   1.12   1.09   1.08   1.05   1.01   1.00   0.99   0.94
   -0.2       1.06   1.07   1.08   1.04   1.02   1.01   1.00   0.98   0.98
    0.0       1.03   1.01   1.04   1.01   1.03   1.03   1.03   1.02   1.02
    0.2       1.01   0.98   1.01   1.00   1.02   1.03   1.05   1.08   1.09
    0.4       0.97   0.99   1.01   1.04   1.05   1.06   1.09   1.12   1.22
    0.6       0.95   1.00   1.00   1.03   1.08   1.11   1.15   1.20   1.33
    0.8       0.94   1.01   1.02   1.09   1.11   1.18   1.25   1.39   1.55

Note: The entries in the table are equal to

    [ ave |Γ̂_{M',N'}| / |Γ_{m_0,n_0}| ]^{1/2},

where m_0 and n_0 are chosen to minimize |Γ_{m,n}|. All entries are based on 25 repetitions of the simulation.
TABLE 4.3.2

EFFICIENCY OF ESTIMATION WITH FIXED SAMPLE SIZE IN THE CASE OF THE WILCOXON STATISTIC

N = 100,  [σ^(2)_{11}/σ^(1)_{11}, σ^(2)_{22}/σ^(1)_{22}] = (1,1)

ρ^(1)\ρ^(2)   -0.8   -0.6   -0.4   -0.2    0.0    0.2    0.4    0.6    0.8
   -0.8       1.50   1.42   1.27   1.21   1.15   1.08   1.06   1.01   0.97
   -0.6       1.32   1.25   1.20   1.13   1.12   1.06   1.04   1.01   0.95
   -0.4       1.12   1.10   1.08   1.06   1.03   1.01   1.00   0.97   0.94
   -0.2       1.06   1.07   1.03   1.03   1.01   1.01   0.99   0.97   0.97
    0.0       1.00   0.99   1.01   1.01   1.01   1.01   1.01   1.00   1.01
    0.2       0.99   1.01   1.00   1.01   1.01   1.03   1.03   1.05   1.09
    0.4       0.97   0.97   1.00   1.01   1.03   1.05   1.08   1.12   1.18
    0.6       0.93   0.96   0.99   1.03   1.06   1.10   1.14   1.21   1.29
    0.8       0.93   0.99   1.02   1.07   1.11   1.17   1.24   1.35   1.54

Note: See note for Table 4.3.1.
TABLE 4.3.3

EFFICIENCY OF ESTIMATION WITH FIXED SAMPLE SIZE IN THE CASE OF THE WILCOXON STATISTIC

N = 50,  [σ^(2)_{11}/σ^(1)_{11}, σ^(2)_{22}/σ^(1)_{22}] = (0.25,1)

ρ^(1)\ρ^(2)   -0.8   -0.6   -0.4   -0.2    0.0    0.2    0.4    0.6    0.8
   -0.8       1.11   1.08   1.07   1.02   0.99   0.94   0.92   0.87   0.83
   -0.6       1.08   1.06   1.03   0.97   0.95   0.92   0.91   0.86   0.81
   -0.4       0.97   1.00   0.97   0.98   0.97   0.97   0.94   0.94   0.84
   -0.2       1.00   1.02   1.02   0.99   1.01   1.00   0.98   0.97   0.96
    0.0       1.05   1.03   1.03   1.03   1.04   1.02   1.02   1.02   1.05
    0.2       1.02   1.03   1.01   1.02   1.05   1.03   1.05   1.07   1.10
    0.4       1.01   1.02   1.03   1.05   1.05   1.10   1.09   1.15   1.17
    0.6       1.05   1.03   1.04   1.05   1.08   1.12   1.16   1.20   1.28
    0.8       1.00   1.03   1.07   1.08   1.11   1.15   1.19   1.26   1.41

Note: See note for Table 4.3.1.
TABLE 4.3.4

EFFICIENCY OF ESTIMATION WITH FIXED SAMPLE SIZE IN THE CASE OF THE WILCOXON STATISTIC

N = 100,  [σ^(2)_{11}/σ^(1)_{11}, σ^(2)_{22}/σ^(1)_{22}] = (0.25,1)

ρ^(1)\ρ^(2)   -0.8   -0.6   -0.4   -0.2    0.0    0.2    0.4    0.6    0.8
   -0.8       1.16   1.09   1.04   1.01   0.98   0.94   0.92   0.88   0.77
   -0.6       1.03   1.00   1.02   1.00   0.94   0.94   0.95   0.87   0.80
   -0.4       0.98   1.00   0.99   0.98   0.96   0.94   0.92   0.88   0.87
   -0.2       1.03   1.01   1.00   1.00   1.00   0.98   0.97   0.96   0.93
    0.0       1.01   1.00   1.01   0.99   1.01   1.02   0.99   0.99   1.01
    0.2       0.99   1.00   1.01   1.01   1.02   1.03   1.04   1.08   1.09
    0.4       1.01   1.01   1.03   1.03   1.06   1.07   1.09   1.11   1.15
    0.6       1.00   0.99   1.04   1.05   1.08   1.09   1.13   1.18   1.29
    0.8       0.98   1.01   1.03   1.07   1.10   1.15   1.19   1.27   1.41

Note: See note for Table 4.3.1.
TABLE 4.3.5

EFFICIENCY OF ESTIMATION WITH FIXED SAMPLE SIZE IN THE CASE OF THE WILCOXON STATISTIC

N = 50,  [σ^(2)_{11}/σ^(1)_{11}, σ^(2)_{22}/σ^(1)_{22}] = (0.25,0.25)

ρ^(1)\ρ^(2)   -0.8   -0.6   -0.4   -0.2    0.0    0.2    0.4    0.6    0.8
   -0.8       1.36   1.22   1.11   1.05   1.06   1.02   0.97   0.93   0.84
   -0.6       1.19   1.11   1.07   1.10   1.08   1.00   0.98   0.94   0.95
   -0.4       1.12   1.08   1.03   1.03   0.99   0.97   0.97   0.96   0.93
   -0.2       1.00   1.08   1.04   1.02   1.01   1.03   1.00   0.99   0.98
    0.0       0.97   1.03   1.03   1.03   1.05   1.01   1.02   1.04   1.06
    0.2       1.02   0.99   1.03   1.04   1.01   1.05   1.05   1.06   1.10
    0.4       1.00   1.05   1.01   1.04   1.04   1.07   1.10   1.14   1.18
    0.6       1.00   1.02   1.02   1.03   1.08   1.10   1.12   1.17   1.26
    0.8       0.92   0.99   1.04   1.06   1.07   1.14   1.19   1.28   1.41

Note: See note for Table 4.3.1.
TABLE 4.3.6

EFFICIENCY OF ESTIMATION WITH FIXED SAMPLE SIZE IN THE CASE OF THE WILCOXON STATISTIC

N = 100,  [σ^(2)_{11}/σ^(1)_{11}, σ^(2)_{22}/σ^(1)_{22}] = (0.25,0.25)

ρ^(1)\ρ^(2)   -0.8   -0.6   -0.4   -0.2    0.0    0.2    0.4    0.6    0.8
   -0.8       1.35   1.19   1.06   1.09   1.03   1.00   0.96   0.93   0.89
   -0.6       1.17   1.14   1.06   1.02   0.99   0.98   0.99   0.89   0.92
   -0.4       1.06   1.08   1.02   1.02   0.98   0.97   0.97   0.95   0.94
   -0.2       1.06   1.03   1.01   1.01   1.00   1.00   0.98   0.98   0.95
    0.0       1.02   1.02   1.01   1.01   1.00   1.02   1.01   1.02   1.03
    0.2       0.99   1.07   1.01   1.01   1.01   1.03   1.04   1.07   1.10
    0.4       1.00   0.99   1.00   1.02   1.04   1.05   1.09   1.13   1.17
    0.6       0.96   1.00   1.01   1.02   1.05   1.09   1.13   1.16   1.25
    0.8       0.96   0.99   1.01   1.03   1.07   1.10   1.17   1.22   1.40

Note: See note for Table 4.3.1.
TABLE 4.3.7

CONSISTENCY AND EFFICIENCY FOR FIXED WIDTH CONFIDENCE ELLIPSOID IN THE CASE OF THE WILCOXON STATISTIC

N = 50,  [σ^(2)_{11}/σ^(1)_{11}, σ^(2)_{22}/σ^(1)_{22}] = (1,1)

ρ^(1)\ρ^(2)    -0.8      -0.6      -0.4      -0.2       0.0       0.2       0.4       0.6       0.8
   -0.8      48/0.92   44/0.80   44/0.88   46/0.80   50/0.92   54/0.92   59/0.92   72/0.96   80/0.96
   -0.6      49/0.92   50/0.96   47/0.84   49/0.92   52/0.96   56/0.72   66/0.88   73/0.88   76/0.92
   -0.4      50/0.92   49/0.76   53/0.96   51/0.96   54/0.96   62/0.92   67/0.88   72/0.96   70/0.88
   -0.2      53/0.92   51/0.84   54/1.00   54/0.92   59/0.96   65/0.96   64/0.88   62/0.96   61/0.92
    0.0      56/1.00   56/0.92   57/0.92   60/0.96   62/1.00   58/0.96   56/0.76   58/0.92   54/0.92
    0.2      59/0.96   61/1.00   62/1.00   62/0.96   58/0.96   54/1.00   53/1.00   52/0.92   51/1.00
    0.4      65/1.00   68/0.96   64/0.92   60/1.00   54/0.88   51/1.00   48/0.96   48/0.92   48/0.88
    0.6      73/0.96   72/1.00   64/1.00   54/1.00   50/0.96   49/1.00   46/1.00   45/0.96   44/0.92
    0.8      72/0.84   68/1.00   59/0.92   52/0.88   48/1.00   47/0.88   44/1.00   44/0.96   44/0.96

Note: The first number in each cell is the average value of N*. The second number in each cell is the coverage frequency of the ellipsoid E_{N*}. All entries are based on 25 repetitions of the simulation.
TABLE 4.3.8

CONSISTENCY AND EFFICIENCY FOR FIXED WIDTH CONFIDENCE ELLIPSOID IN THE CASE OF THE WILCOXON STATISTIC

N = 100,  [σ^(2)_{11}/σ^(1)_{11}, σ^(2)_{22}/σ^(1)_{22}] = (1,1)

ρ^(1)\ρ^(2)    -0.8       -0.6       -0.4       -0.2        0.0        0.2        0.4        0.6        0.8
   -0.8       90/0.96    87/0.96    84/0.96    84/0.92    92/1.00    98/0.84   117/0.96   140/0.96   154/0.96
   -0.6       92/0.96    92/0.92    89/0.88    88/0.96    92/0.96   107/0.96   124/0.92   143/0.96   152/0.96
   -0.4       97/1.00    95/0.96    94/0.96    93/0.96    98/1.00   108/1.00   124/0.84   132/1.00   131/0.96
   -0.2      100/0.96   101/0.96    99/0.96   100/0.96   104/0.96   117/0.96   118/0.92   117/0.96   115/0.84
    0.0      105/0.96   107/0.96   109/1.00   107/1.00   116/0.92   110/0.96   107/0.92   105/0.96   105/0.92
    0.2      114/1.00   115/1.00   114/1.00   118/0.96   104/0.96    99/0.88    96/0.96    99/0.96    98/0.92
    0.4      125/0.92   126/0.96   124/0.96   109/0.96    99/1.00    92/1.00    88/0.92    94/0.88    91/0.92
    0.6      138/0.96   126/0.88   118/0.92   104/1.00    92/0.92    87/1.00    85/0.84    84/0.92    86/0.96
    0.8      138/0.88   128/0.96   110/0.96    97/0.96    87/1.00    81/1.00    79/0.96    76/0.80    80/1.00

Note: See note for Table 4.3.7.
TABLE 4.3.9

CONSISTENCY AND EFFICIENCY FOR FIXED WIDTH CONFIDENCE ELLIPSOID IN THE CASE OF THE WILCOXON STATISTIC

N = 50,  [σ^(2)_{11}/σ^(1)_{11}, σ^(2)_{22}/σ^(1)_{22}] = (0.25,1)

ρ^(1)\ρ^(2)    -0.8      -0.6      -0.4      -0.2       0.0       0.2       0.4       0.6       0.8
   -0.8      45/0.92   45/1.00   45/0.92   43/0.80   48/0.96   50/0.80   62/0.88   68/0.96   68/0.96
   -0.6      48/0.92   48/0.96   49/0.96   46/1.00   49/1.00   55/0.92   64/0.92   63/0.96   66/0.96
   -0.4      50/0.96   49/0.88   50/0.88   52/0.96   53/1.00   61/1.00   59/1.00   58/0.88   64/1.00
   -0.2      53/0.96   55/0.96   52/0.92   54/0.88   58/0.92   60/0.92   59/0.80   56/0.92   58/1.00
    0.0      56/0.88   57/0.92   56/0.92   58/0.92   63/0.96   57/0.92   56/0.92   56/0.92   53/0.96
    0.2      59/0.88   58/0.84   61/0.84   63/0.92   60/1.00   55/1.00   53/0.88   54/0.96   52/0.88
    0.4      62/0.92   60/0.88   63/0.96   63/0.96   56/1.00   52/0.88   51/1.00   52/1.00   51/1.00
    0.6      65/0.92   67/0.88   67/0.92   60/0.92   54/0.92   50/0.96   51/0.96   50/0.88   49/1.00
    0.8      70/0.92   66/1.00   65/0.92   56/0.92   50/0.84   48/0.96   46/0.96   48/0.96   45/0.84

Note: See note for Table 4.3.7.
TABLE 4.3.10

CONSISTENCY AND EFFICIENCY FOR FIXED WIDTH CONFIDENCE ELLIPSOID IN THE CASE OF THE WILCOXON STATISTIC

N = 100,  [σ^(2)_{11}/σ^(1)_{11}, σ^(2)_{22}/σ^(1)_{22}] = (0.25,1)

ρ^(1)\ρ^(2)    -0.8       -0.6       -0.4       -0.2        0.0        0.2        0.4        0.6        0.8
   -0.8       88/0.96    84/0.92    82/1.00    82/0.92    86/0.84    96/0.88   115/0.92   132/0.96   132/0.88
   -0.6       89/0.92    90/0.96    90/1.00    88/0.96    90/0.96   103/1.00   120/0.96   124/0.84   124/0.96
   -0.4       97/0.92    95/0.92    92/0.96    92/0.96    96/0.92   112/0.92   115/0.96   115/0.92   118/0.84
   -0.2      101/0.88   100/0.96   100/0.96   101/0.96   106/0.96   113/0.96   112/1.00   113/1.00   113/1.00
    0.0      106/0.96   105/0.84   102/0.96   107/1.00   114/0.96   106/0.96   104/0.92   108/1.00   106/1.00
    0.2      108/0.88   112/0.92   115/0.96   117/1.00   110/0.96   100/0.88   100/0.80    98/0.96   100/1.00
    0.4      120/1.00   119/1.00   120/0.96   116/1.00   102/1.00    97/0.96    95/0.92    96/0.92    92/0.96
    0.6      126/0.92   128/0.96   125/0.96   109/0.92    98/1.00    92/0.96    91/1.00    90/0.92    90/1.00
    0.8      132/0.92   132/0.96   122/0.96   108/0.96    94/0.92    90/1.00    86/0.96    86/0.92    88/0.84

Note: See note for Table 4.3.7.
TABLE 4.3.11

CONSISTENCY AND EFFICIENCY FOR FIXED WIDTH CONFIDENCE ELLIPSOID IN THE CASE OF THE WILCOXON STATISTIC

N = 50,  [σ^(2)_{11}/σ^(1)_{11}, σ^(2)_{22}/σ^(1)_{22}] = (0.25,0.25)

ρ^(1)\ρ^(2)    -0.8      -0.6      -0.4      -0.2       0.0       0.2       0.4       0.6       0.8
   -0.8      47/0.88   46/0.96   48/1.00   48/0.92   51/0.96   57/0.88   66/0.88   70/0.92   72/0.88
   -0.6      48/0.88   48/0.88   50/0.92   48/0.92   53/0.96   60/0.84   68/0.92   66/0.92   66/0.96
   -0.4      50/0.96   52/0.92   48/0.96   52/0.92   56/0.96   62/0.96   62/0.96   62/0.88   64/1.00
   -0.2      53/0.92   52/0.96   55/1.00   55/0.96   57/0.96   63/0.92   56/0.88   59/0.84   59/0.92
    0.0      57/0.96   57/0.92   57/0.88   57/0.96   62/0.88   58/0.96   54/0.88   54/0.84   54/0.88
    0.2      58/0.96   58/0.88   59/0.96   60/0.92   59/0.92   57/0.92   54/1.00   53/0.84   52/0.92
    0.4      61/0.84   64/0.92   64/0.88   62/0.96   59/1.00   53/0.96   52/1.00   51/0.88   52/0.92
    0.6      67/0.92   68/0.88   68/0.92   63/0.96   54/1.00   53/0.96   52/0.92   50/0.92   50/0.96
    0.8      72/0.96   72/0.96   67/0.88   61/1.00   52/0.96   51/0.92   48/1.00   47/0.92   47/0.92

Note: See note for Table 4.3.7.
TABLE 4.3.12

CONSISTENCY AND EFFICIENCY FOR FIXED WIDTH CONFIDENCE ELLIPSOID IN THE CASE OF THE WILCOXON STATISTIC

N = 100,  [σ^(2)_{11}/σ^(1)_{11}, σ^(2)_{22}/σ^(1)_{22}] = (0.25,0.25)

ρ^(1)\ρ^(2)    -0.8       -0.6       -0.4       -0.2        0.0        0.2        0.4        0.6        0.8
   -0.8       90/0.92    90/0.96    91/0.96    86/0.96    93/1.00   110/0.92   128/0.92   137/1.00   141/0.92
   -0.6       94/0.92    94/0.96    90/0.84    91/0.92   100/0.96   114/0.92   130/0.96   130/0.96   128/0.96
   -0.4       95/0.88    96/0.92    92/0.96    96/1.00   100/0.84   115/0.96   118/0.92   119/1.00   118/0.96
   -0.2      100/0.92   103/1.00   100/1.00   100/0.96   106/1.00   113/0.92   109/0.96   109/1.00   111/0.92
    0.0      106/0.96   106/1.00   106/0.84   108/0.96   116/0.96   107/0.96   109/0.96   107/1.00   104/0.92
    0.2      112/0.96   112/0.92   112/1.00   118/0.96   110/0.96   103/0.96   103/0.96    98/0.88   102/0.92
    0.4      118/0.92   120/0.96   124/0.84   119/0.96   105/1.00    98/0.84    95/0.96    96/0.96    98/1.00
    0.6      128/1.00   131/1.00   132/0.92   118/0.96   101/1.00    98/1.00    92/0.92    95/1.00    94/0.92
    0.8      141/0.96   136/0.92   133/0.92   110/0.96    99/0.96    93/0.84    90/1.00    93/0.96    89/0.92

Note: See note for Table 4.3.7.
4.4. Lehmann's Two Sample Scale Statistic

As in Section 4.2, let {X_1, X_2, …} be a sequence of i.i.d.r.v. with each X_i = (X_{i1}, X_{i2})' having a distribution function F(x), where F(x) denotes a bivariate normal distribution with mean μ^(1) and dispersion Σ^(1). Let {Y_1, Y_2, …} be an independent sequence of i.i.d.r.v. with each Y_j = (Y_{j1}, Y_{j2})' having a distribution function G(y), where G(y) denotes a bivariate normal distribution with mean μ^(2) and dispersion Σ^(2). Let θ(F,G) = (θ_1(F,G), θ_2(F,G))', where

    θ_i(F,G) = P{ |Y_{β_1 i} − Y_{β_2 i}| > |X_{α_1 i} − X_{α_2 i}| },   i = 1,2.

From the definition of θ_i(F,G), it is obvious that we can, without loss of generality, assume μ^(1) = μ^(2) = 0. The U-statistic estimator of θ(F,G) is, for m ≥ 2, n ≥ 2, U_{mn} = (U¹_{mn}, U²_{mn})', where

    U^i_{mn} = [ C(m,2) C(n,2) ]⁻¹ Σ c( |X_{α_1 i} − X_{α_2 i}|, |Y_{β_1 i} − Y_{β_2 i}| ),

with the summation over all 1 ≤ α_1 < α_2 ≤ m, 1 ≤ β_1 < β_2 ≤ n, and

    c(u,v) = 1 if u < v,   = 0 otherwise.

It has been shown by Sukhatme [1957] that:

    ζ^{ij}_{10} = E[ c(|X_{1i} − X_{2i}|, |Y_{1i} − Y_{2i}|) c(|X_{1j} − X_{3j}|, |Y_{3j} − Y_{4j}|) ]
                 − E[ c(|X_{1i} − X_{2i}|, |Y_{1i} − Y_{2i}|) ] E[ c(|X_{1j} − X_{2j}|, |Y_{1j} − Y_{2j}|) ],

    ζ^{ij}_{01} = E[ c(|X_{1i} − X_{2i}|, |Y_{1i} − Y_{2i}|) c(|X_{3j} − X_{4j}|, |Y_{1j} − Y_{3j}|) ]
                 − E[ c(|X_{1i} − X_{2i}|, |Y_{1i} − Y_{2i}|) ] E[ c(|X_{1j} − X_{2j}|, |Y_{1j} − Y_{2j}|) ].

We shall assume that σ^(1)_{11} = σ^(1)_{22} and σ^(2)_{11} = σ^(2)_{22}. Moreover, for simplicity, we shall assume σ^(1)_{11} = σ^(2)_{11} = 1. Thus,

    P{ |Y_{β_1 i} − Y_{β_2 i}| > |X_{α_1 i} − X_{α_2 i}| } = 0.5,   i = 1,2,

and θ(F,G) = (0.5, 0.5)'.
To compute ζ^{ij}_{10}, i,j = 1,2, consider |X_{1i} − X_{2i}| = U_{1i}, |X_{1i} − X_{3i}| = U_{2i}, |Y_{1i} − Y_{2i}| = V_{1i}, and |Y_{3i} − Y_{4i}| = V_{2i} (i,j = 1,2). The first expectation above may then be evaluated from

    ∫ [ F_1(x_{α1} + u_{1i}) − F_1(x_{α1} − u_{1i}) ] [ F_j(x_{αj} + u_{2j}) − F_j(x_{αj} − u_{2j}) ] dF(x_α),

the product of the two single expectations contributing the term 0.25. In the same manner, ζ^{ij}_{01} is evaluated through

    h(t_{1i}, t_{2j}) = ∫ [ g_i(y_{βi} + t_{1i}) + g_i(y_{βi} − t_{1i}) ] [ g_j(y_{βj} + t_{2j}) + g_j(y_{βj} − t_{2j}) ] dG(y_β),

where g_i denotes the corresponding marginal density. More specifically,

    ζ^{12}_{10} = ζ^{21}_{10} = (1/π²) log{ 8/(8 − ρ^(1)²) },
    ζ^{12}_{01} = ζ^{21}_{01} = (1/π²) log{ 8/(8 − ρ^(2)²) },
    ζ^{11}_{10} = ζ^{22}_{10} = ζ^{11}_{01} = ζ^{22}_{01} = (1/π²) log(8/7).

(See Section 4.6 for proof of the above.)
We note that S^{ij}_{10} can be computed from the counts A^i_{hh'}, i = 1,2, 1 ≤ h < h' ≤ m, where A^i_{hh'} is the number of pairs of observations (y_{ji}, y_{j'i}), 1 ≤ j < j' ≤ n, for which |y_{ji} − y_{j'i}| > |x_{hi} − x_{h'i}|. Similarly, S^{ij}_{01} can be computed from the counts B^i_{kk'}, 1 ≤ k < k' ≤ n, where B^i_{kk'} is the number of pairs of observations (x_{hi}, x_{h'i}), 1 ≤ h < h' ≤ m, for which |x_{hi} − x_{h'i}| < |y_{ki} − y_{k'i}|. Thus, S^{ij}_{10} and S^{ij}_{01} can be calculated directly from the above and the definitions in Section 1.3.
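As an illustration (our sketch, not the author's program), the statistic for a single coordinate can be computed by comparing all within-sample absolute differences:

```python
import numpy as np

def lehmann_U(x, y):
    """Proportion of pairs-of-pairs with |Y_b1 - Y_b2| > |X_a1 - X_a2|."""
    dx = np.abs(x[:, None] - x[None, :])[np.triu_indices(len(x), 1)]
    dy = np.abs(y[:, None] - y[None, :])[np.triu_indices(len(y), 1)]
    return float((dy[None, :] > dx[:, None]).mean())

x = np.random.default_rng(0).standard_normal(20)
y = np.random.default_rng(1).standard_normal(20)
print(lehmann_U(x, y))   # near 0.5 when the two scales are equal
```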
4.5. Small Sample Behavior for Lehmann's Two Sample Scale Statistic

For Lehmann's scale statistic, k_1 = k_2 = q_1 = q_2 = 2. We will consider the case described in Section 4.4. Thus, using the notation introduced in Section 4.4,

    Γ(λ) = (( 4ζ^{ij}_{10}/λ + 4ζ^{ij}_{01}/(1 − λ) )),   i,j = 1,2,

where ζ^{ij}_{10}, ζ^{ij}_{01}, i,j = 1,2, have been defined for this case in Section 4.4. Again using the notation of Chapters II and III, if

    ν_1 = 2 { [ (4/π²) log(8/7) ]² − [ (4/π²) log{8/(8 − ρ^(1)²)} ] [ (4/π²) log{8/(8 − ρ^(2)²)} ] },

then

    |Γ(λ)| = Σ_{l=0}^2 (1/λ)^l (1/(1−λ))^{2−l} ν_l.

The greatest characteristic root of Γ is:

    v*(λ) = [ 4/(π² λ(1−λ)) ] log(8/7) + (4/π²) [ λ⁻¹ log{8/(8 − ρ^(1)²)} + (1−λ)⁻¹ log{8/(8 − ρ^(2)²)} ].
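For numerical work (e.g., locating λ_min), the closed form above can be coded directly; this is our sketch, with rho1, rho2 the two correlation coefficients:

```python
import numpy as np

def v_star_lehmann(lam, rho1, rho2):
    c = 4.0 / np.pi**2
    return (c * np.log(8 / 7) / (lam * (1 - lam))
            + c * (np.log(8 / (8 - rho1**2)) / lam
                   + np.log(8 / (8 - rho2**2)) / (1 - lam)))

lams = np.linspace(0.05, 0.95, 181)
print(lams[int(np.argmin(v_star_lehmann(lams, 0.5, 0.0)))])  # lambda_min on a grid
```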
As before, for our simulation, we will select our initial sample, M_0, to be 10, and the sample size of each subsequent step equal to 10. In the case of estimation with fixed total sample size, we will let that fixed total be 50. In the case of the fixed width confidence ellipsoid, we will let the sample size one would expect if the population parameters were known be 50. Also, in the latter case, α = 0.95 and a_h = 5.99. Due to the fact that there is more computation required for Lehmann's statistic as opposed to Wilcoxon's statistic, and due to limited computer time, we will only be able to consider a few combinations of ρ^(1), ρ^(2).
The results of a simulation of the sequential procedure described in Chapter II are presented in Table 4.5.1. It appears that our estimate of the asymptotic generalized variance of our estimator (described in Chapter II) is approximately equal to the minimum asymptotic generalized variance of any two sample U-statistic estimator for which the total sample size is fixed. (For the same reason as outlined in Section 4.3, Table 4.5.1 has been included in its entirety.) The results of a simulation of the sequential procedure described in Chapter III are presented in Table 4.5.2. It appears that the sample size required by our procedure is approximately equal to the sample size one would use if the covariance matrix were known. Moreover, the coverage frequency is approximately equal to the specified confidence coefficient. (It is of interest to note that if we had included the next several terms in our calculation of ζ^{ij}_{10}, ζ^{ij}_{01}, i,j = 1,…,t, the width of the confidence ellipsoid would have only been changed by about 4%.)
TABLE 4.5.1

EFFICIENCY OF ESTIMATION WITH FIXED SAMPLE SIZE IN THE CASE OF LEHMANN'S STATISTIC

ρ^(1)\ρ^(2)   -0.50   0.00   0.50
   -0.50       1.01   1.12   1.03
    0.00       1.11   1.02   1.04
    0.50       1.06   1.04   1.17

Note: The entries in the table are equal to

    [ ave |Γ̂_{M',N'}| / |Γ_{m_0,n_0}| ]^{1/2},

where m_0 and n_0 are chosen to minimize |Γ_{m,n}|. All entries are based on 25 repetitions of the simulation.
TABLE 4.5.2
CONSISTENCY AND EFFICIENCY FOR FIXED WIDTH CONFIDENCE
ELLIPSOID IN THE CASE OF LEHMANN'S STATISTIC
Average N* Coverage Frequency
.-
-0.50
-0.50
57
0.92
-0.50
0.00
63
0.92
-0.50
0.50
53
0.84
0.00
-0.50
62
0.80
0.00
0.00
65
0.92
Note: All entries are based on 25 repetitions of the simulation
except the case p(l)
= p(2)
= 0.00 which is based on 12 repetitions.
4.6. Calculation of the Asymptotic Variance of Lehmann's Statistic

Consider

    f(x) = (2π)^{−1/2} exp[−(x²/2)],   −∞ < x < ∞,

and

    k(v_1) = ∫_{−∞}^∞ [ f(x + v_1) + f(x − v_1) ] f(x) dx = π^{−1/2} exp[−(v_1²/4)].
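The identity for k(v_1) is easy to verify numerically; the following check is our own sketch:

```python
import numpy as np
from scipy.integrate import quad

# check: integral of [f(x+v) + f(x-v)] f(x) dx equals pi**(-1/2) exp(-v**2/4)
f = lambda x: np.exp(-x * x / 2) / np.sqrt(2 * np.pi)
v = 1.3
val, _ = quad(lambda x: (f(x + v) + f(x - v)) * f(x), -np.inf, np.inf)
print(val, np.pi ** -0.5 * np.exp(-v * v / 4))   # the two values agree
```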
The companion integral has the same value as (4.6.1), except that −ρ/2 is replaced by ρ/2. Let p denote the resulting orthant probability as a function of the correlation parameter τ; then

    dp/dτ = 2 π⁻¹ (4 − τ²)^{−1/2} P( X* > 0, Y* > 0 ),

where (X*, Y*) is distributed as a bivariate normal with zero means, variances of

    (2 − τ²)(1 − τ²) / [ (2 − τ²)² − τ² ],

and correlation τ(2 − τ²)⁻¹. Thus,

(4.6.2)  p = ∫ 2 π⁻¹ (4 − τ²)^{−1/2} P( X* > 0, Y* > 0 ) dτ + constant.

Considering just the first term in the series expansion of Sin⁻¹,

    ∫ (4 − τ²)^{−1/2} Sin⁻¹( τ/(2 − τ²) ) dτ ≈ − (1/4) log(2 − τ²).

Since p = 1/16 when τ = 0, we can now evaluate the constant of integration in (4.6.2) and obtain

    constant = (1/16) + [ log(2)/(4π²) ],

from which p − 1/16 = (1/(4π²)) log{ 2/(2 − τ²) }; this yields, apart from a constant multiplier, the logarithmic expressions for the ζ's quoted in Section 4.4. Finally, one should note that

    ∫ (2π)^{−3/2} exp[ −{ (x − t_1)² + (x + t_2)² + x² }/2 ] dx

has the same value as (4.6.3), except that the correlation in the bivariate normal is −1/2.
CHAPTER V

SUMMARY AND SUGGESTIONS FOR FURTHER RESEARCH

In this thesis we have considered two related estimation procedures. In Chapter II we discussed a procedure for estimating θ(F,G) by means of U_{mn} with the restriction that m + n = N (fixed). In Chapter III we discussed a procedure that will enable us to obtain a confidence ellipsoid for θ(F,G) of fixed maximum width 2d, where d > 0, and such that the confidence coefficient is asymptotically α, where 0 < α < 1. Finally, in Chapter IV, a simulation study was performed for the theory presented in Chapters II and III with particular reference to the Wilcoxon statistic and Lehmann's two sample statistic.

Further research in this area could proceed in several directions.
(1) In this thesis we have dealt exclusively with U-statistics as our estimators of θ(F,G). Another method of estimation is based on an alignment consideration. For example, consider the problem of estimating a difference in location for the univariate two sample problem. Suppose that x_1,…,x_m and y_1,…,y_n, N = m + n, are independent samples from F(x) and F(x − Δ), respectively. Consider a test statistic h(x_1,…,x_m, y_1,…,y_n) for the hypothesis Δ = 0 against the alternative Δ > 0. Assume that h(x_1,…,x_m, y_1 + a,…,y_n + a) is a non-decreasing function of a for each x_1,…,x_m, y_1,…,y_n, and that when Δ = 0, E(h(x_1,…,x_m, y_1,…,y_n)) = μ is independent of F. Let
Let
134
A*
N
=
SUp {
A**
=
inf { a:
N
a:
...
For a suitable function h. AN is an aligned
2
estimator of A.
If we let h be the Wilcoxon statistic. the resulting
...
estimator AN turns out to be the set of mn differences Yj - xi'
i
= 1•••••m •
j
= 1••••• n.
If h is the median test.
the resulting
estimator becomes median(Yl •••••Yn) - median(xl ••.• '~).
If
h is the normal scores statistic, the resulting estimator has to be
computed by trial and error.
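For the Wilcoxon choice of h, the aligned estimator is the classical Hodges–Lehmann shift estimator; a minimal sketch (ours, for illustration):

```python
import numpy as np

def hodges_lehmann_shift(x, y):
    """Median of the mn pairwise differences y_j - x_i."""
    return float(np.median(np.subtract.outer(y, x)))
```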
Using estimators based on this alignment procedure, Sen and Ghosh [1971] have considered the problem of a fixed-width confidence interval for the univariate one sample case. It would be of interest to extend the work of Sen and Ghosh [1971] and develop the procedures considered in this thesis for this type of estimator.

(2) In this thesis, we have considered {X_i, i = 1,2,…} and {Y_j, j = 1,2,…} to be independent sequences of independent and identically distributed random variables. Additional research could consider the case where {X_i, i = 1,2,…} and {Y_j, j = 1,2,…} are independent m-dependent processes, or the case where {X_i, i = 1,2,…} and {Y_j, j = 1,2,…} are independent sequences from finite universes.
(3) In Chapter II we considered the restriction that m + n = N (fixed). A more general discussion would involve the cost of sampling in the different samples. That is, in the two sample case, let the cost of sampling be a known linear function of the observations, a_1 m + a_2 n + a_3. Then our restriction would involve a prescribed upper bound A_0 on the cost of sampling.
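As a hedged illustration of how such a cost restriction would interact with the allocation of the two sample sizes (using a simple univariate variance criterion for clarity, not the generalized-variance criterion of Chapter II): minimizing an asymptotic variance of the form σ_1²/m + σ_2²/n subject to a_1 m + a_2 n + a_3 = A_0 gives, by a Lagrange-multiplier argument,

$$m^* = \frac{(A_0 - a_3)\,\sigma_1/\sqrt{a_1}}{\sigma_1\sqrt{a_1} + \sigma_2\sqrt{a_2}}, \qquad n^* = \frac{(A_0 - a_3)\,\sigma_2/\sqrt{a_2}}{\sigma_1\sqrt{a_1} + \sigma_2\sqrt{a_2}},$$

which, when a_1 = a_2 = 1 and a_3 = 0, reduces to the familiar allocation m* = N σ_1/(σ_1 + σ_2) for the fixed total sample size case of Chapter II.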
(4) The simulation study of Chapter IV could obviously be extended to consider many different situations other than those discussed here. Other statistics besides the Wilcoxon statistic and Lehmann's two sample scale statistic could be examined. Higher dimensional cases could be considered. A study of the effect of different sample sizes for both the initial sample and subsequent samples would be of interest. Trivially, the simulation could be extended by considering additional values for the various parameters studied in Chapter IV. More sequences on which to base the average total sample size and the average coverage frequency of the ellipsoid would help to reduce the variation in these averages. Similarly, more sequences would be helpful for the averages computed in the simulation of the results of Chapter II.
(5) As stated in Section 3.5, it would be of interest to consider whether the procedure described in Chapter III has the property that, for all d > 0, the expected sample size of that procedure exceeds by only a finite quantity the sample size one would use if the appropriate population parameters were known.

(6) In this thesis we have allowed the sample sizes of the various
stages to be selected quite arbitrarily (only requiring the sample
sizes to exceed a minimum value depending on the degrees of the
kernels).
Further investigation could consider a theoretical
justification for determining an optimal selection of these sample
sizes. Consideration could be given to a sampling scheme that would
reduce the sample size of successive steps.
REFERENCES
Anderson, T. W. 1958. An Introduction to Multivariate Statistical Analysis. New York: John Wiley & Sons, Inc.
Anscombe, F. J. 1952. Large sample theory of sequential estimation.
Proceedings of the Cambridge Philosophical Society 48: 600-7.
Anscombe, F. J. 1953. Sequential estimation. Journal of the Royal
Statistical Society Series B 15: 1-21.
Berk, R. H. 1966. Limiting behavior of posterior distributions when the model is incorrect. The Annals of Mathematical Statistics 37: 51-8.
Buck, R. 1965. Advanced Calculus. New York: McGraw-Hill Book Co.
Chow, Y. S. 1960. A martingale inequality and the law of large
numbers. Proceedings of the American Mathematical Society 11:
107-11.
Chow, Y. S., and Robbins, H. 1965. On the asymptotic theory of fixed-width sequential confidence intervals for the mean. The Annals of Mathematical Statistics 36: 457-62.
Cramer, H. 1937. Random Variables and Probability Distributions.
Cambridge: Cambridge Tracts in Mathematics.
Cramer, H. 1946. Mathematical Methods of Statistics. Princeton:
Princeton University Press.
Feller, W. 1966. An Introduction to Probability Theory and Its Applications, Vol. II. New York: John Wiley & Sons, Inc.
Fraser, D. A. S. 1957. Nonparametric Methods in Statistics. New York: John Wiley & Sons, Inc.
Ghurye, S. G., and Robbins, H. 1954. Two-stage procedures for estimating the difference between means. Biometrika 41: 146-52.
Hildebrand, F. 1962. Advanced Calculus for Applications. Englewood Cliffs: Prentice-Hall, Inc.
Hoeffding, W. 1948. A class of statistics with asymptotically normal distribution. The Annals of Mathematical Statistics 19: 293-325.
Hoeffding, W. 1961. The strong law of large numbers for U-statistics. University of North Carolina Institute of Statistics Mimeo Series No. 302.
Hoyland, A. 1965. Robustness of the Hodges-Lehmann estimates for
shift. The Annals of Mathematical Statistics 36: 174-97.
Kendall, M. G., and Stuart, A. 1967. The Advanced Theory of Statistics.
Vol. II. New York: Hafner Publishing Company.
Miller, R. G., and Sen, P. K. 1972. Weak convergence of U-statistics and von Mises' differentiable statistical functions. The Annals of Mathematical Statistics 43: 31-41.
Mogyorodi, J. 1967. Limit distributions for sequences of random variables with random indices. Transactions of the Fourth Prague Conference on Information Theory, Statistical Decision Functions, Random Processes. Prague: Academia Publishing House of the Czechoslovak Academy of Sciences.
Pearson, K. 1907. On further methods of determining correlation. Draper's Company Research Memoirs. London: Cambridge University Press.
Puri, M. L., and Sen, P. K. 1971. Nonparametric Methods in Multivariate Analysis. New York: John Wiley & Sons, Inc.
Ray, W. D. 1957. Sequential confidence intervals for the mean of a
normal distribution with unknown variance. Journal of the
Royal Statistical Society Series B 19: 133-43.
Richter, D. L. 1960. Two-stage experiments for estimating a common
mean. The Annals of Mathematical Statistics 31: 1164-73.
Robbins, H., Simons, G., and Starr, N. 1967. A sequential analogue
of the Behrens-Fisher problem. The Annals of Mathematical
Statistics 38: 1384-91.
Sen, P. K. 1960. On some convergence properties of U-statistics.
Calcutta Statistical Association Bulletin 11: 125-43.
Sen, P. K. 1971. A Hajek-Renyi type inequality for stochastic vectors with applications to simultaneous confidence regions. The Annals of Mathematical Statistics 42: 1132-34.
Sen, P. K. 1972. A Hajek-Renyi type inequality for generalized
U-statistics. (unpublished).
Sen, P. K., and Ghosh, M. 1971. On bounded length sequential
confidence intervals based on one-sample rank order statistics.
The Annals of Mathematical Statistics 42: 189-203.
Simons, G. 1968. On the cost of not knowing the variance when
making a fixed-width confidence interval for the mean.
The Annals of Mathematical Statistics 39: 1946-52.
Sproule, R. N. 1969. A sequential fixed-width confidence interval
for the mean of a U-statistic. (unpublished Ph.D. dissertation,
University of North Carolina).
Starr, N. 1966. The performance of a sequential procedure for the
fixed-width interval estimation of the mean. The Annals of
Mathematical Statistics 37: 36-50.
Stein, C. 1945. A two-sample test for a linear hypothesis whose
power is independent of the variance. The Annals of
Mathematical Statistics 16: 243-58.
Sukhatme, B. V. 1957. On certain two-sample non-parametric tests
for variances. The Annals of Mathematical Statistics 28:
188-94.
Wald, A. 1947. Sequential Analysis. New York: John Wiley & Sons, Inc.
Whittaker, E., and Robinson, G. 1944. The Calculus of Observations. 4th ed. London: Blackie & Son Limited.
Wilks, S. S. 1932. Certain generalizations in the analysis of
variance. Biometrika 24: 471-94.
Wilks, S. S. 1962. Mathematical Statistics. New York: John Wiley &
Sons, Inc.
Yen, E. H. 1964. On two-stage non-parametric estimation. The Annals
of Mathematical Statistics 35: 1099-144.