Sprould, R.N.; (1969)A sequential fixed-width confidence interval for the mean of a U-statistics."

Work supported by the National Institute of Health, Public Health
Service, Grant GM-I03~7 and the United States Air Force Grant
AFOSR-68-1415.
A SEQUENTIAL FIXED-WIDTH CONFIDENCE INTERVAL
FOR THE MEAN OF AU-STATISTIC
by
RAYMOND NELSON SPROULE
Department of Statistics
University of North Carolina at Chapel Hill
Institute of Statistics Mimeo Series No. 636
AUGUST 1969
RAYMOND NELSON SPROULE.
A sequential fixed-width confidence interval
for the mean of a U-statistic.
w.
(Under the direction of
J. HALL.)
Preliminary properties of aU-statistic U , introduced by
n
Hoeffding (Ann. Math. Statist.
21
(1948) 293-325), are developed along
with an investigation of several related statistics.
The problem of estimating n Var(U } is considered in some detail,
n
with emphasis placed on the asymptotic nature of the estimates.
Two
prime candidates emerge each possessing good asymptotic properties.
Hoeffding (Univ. of North Carolina, Inst. of Statist., Mimeo
Series No. 302 (1961»
showed that a U-statistic may be expressed as
an average of independent and identically distributed random variables
plus a remainder term.
A Ko1mogorov-1ike inequality for this remainder
term is developed and its (a.s.) convergence properties are examined.
These properties are then related to the U-statistic.
In addition,
using a result of Anscombe (Proc. Cambridge Philos. Soc.
~
(1952)
600-607), the asymptotic normality of UN' where N is a positive
integer-valued random variable, is established under certain conditions.
Equipped with the preceding results, a sequential fixed-width
confidence interval for the mean of a U-statistic, having coverage
probability approximately equal to some preassigned a, is developed.
It is also shown that the sequential procedure (or confidence interval)
is asymptotically efficient, in the sense of Chow and Robbins (Ann.
Math. Statist. 36 (1965) 457-462).
"""
A SEQUENTIAL FIXED-WIDTH CONFIDENCE INTERVAL
FOR THE MEAN OF AU-STATISTIC
by
Raymond Nelson Sproule
A thesis submitted to the faculty of the University of
North Carolina at Chapel Hill in partial fulfillment of
the requirements for the degree of Doctor of Philosophy
in the Department of Statistics
Chapel Hill
1969
Approved by:
Adviser
ii
TABLE OF CONTENTS
CHAPTER
I
II
III
PAGE
INTRODUCTION
1
PRELIMINARY CONCEPTS AND RESULTS
3
2.1
Introduction
3
2.2
Functionals
3
2.3
U-statistics
4
2.4
The H-decompoaition
9
2.5
The W-statistic
15
2.6
The S-decomposition
17
2.7
The Z-statistic
18
ESTIMATION OF THE VARIANCE OF AU-STATISTIC
29
3.1
Introduction
29
3.2
The U-statistic estimate of P1
30
3.3
Estimation ofp
33
3.4
2
Sen's estimate of cr =
3.5
Estimate of cr
3.6
Example
2
c
= r 2 P1
based on the Z-statistic
35
38
49
iii
PAGE
CHAPTER
IV
CONTRIBUTIONS TO Tlili ASYMPTOTIC THEORY
OF U-STATISTICS
53
4.1
Introduction
53
4.2
Kolmogorov inequalities
53
4.3
Strong convergence results
55
4.4
The asymptotic normality of UN
58
V SEQUENTIAL FIXED-WIDTH CONFIDENCE INTERVALS
FOR REGULAR FUNCTIONALS
5.1
5.2
5.3
Introduction
2
The sequential procedure using swn
2
The sequential procedure using s
zn
5.4 Examples
BIBLIOGRAPHY
62
62
63
70
77
81
iv
ACKNOWLEDGMENT
I wish to express my sincere appreciation to Professor W. J. Hall
for his significant guidance, criticism and understanding during the
course of this work.
I would also like to thank Professors W. Hoeffding, P. K. Sen
and G. Simons for their helpful suggestions regarding the original
manuscript and the Public Health Service for financing a Research
Assistantship through grant GM-10397.
A very special thank you is due my wife Bonnie and Electra Sproule
for their constant encouragement.
CHAPTER I
INTRODUCTION
Our primary goal is the development of a sequential confidence
interval for the mean of a U-statistic, having fixed-width equal to 2d
and coverage probability approximately equal to some preassigned a,
where 0 < a < 1.
The problem was solved by Chow and Robbins [3] for
the special U-statistic, the sample mean.
Starr [16] evaluated the
Chow and Robbins [3] sequential procedure, assuming that the underlying distribution is normal, and concluded that the procedure is
reasonably consistent and efficient for all values of the variance of
the underlying distribution.
Generally speaking a U-statistic is a
generalization of a sample mean.
This is clearly revealed by a de-
composition due to Hoeffding [9] and which we refer to as the Hdecomposition.
By means of this decomposition aU-statistic U can be
.
n
expressed as the sum of a sample mean and a remainder term R •
n
In Chapter II, notation, concepts and preliminary results used
throughout this work are developed.
In addition to an examination of
the H-decomposition, two important statistics are introduced, the
W-statistic and the Z-statistic.
In Chapter III we consider the problem of estimating the variance
of a U-statistic.
Two estimates, s2 and s2 , of n Var(U } emerge as
wn
zn
n
prime candidates - based on the W- and Z-statistics.
It turns out that
, s l'19h t 1Y superlor
'h
teoret i ca 11y, b ut t h at s 2 can be calculated
s 2 lS
wn
zn
much more easily, in general.
Therefore, both estimates are used in
2
Chapter V to define sequential procedures.
In Chapter IV a Kolmogorov-like inequality for U-statistics is
established.
Also, it is shown that the remainder term R converges
n
to 0 in a strong sense.
However, the main purpose of Chapter IV is to
develop the asymptotic normality of a U-statistic based on a random
number of observations, in the fashion of Anscombe [1].
This result
leads to the asymptotic consistency of the sequential procedures
offered in Chapter V.
In Chapter V, equipped with the results of the previous chapters,
we present a sequential confidence interval for the mean of a Ustatistic.
The confidence interval has fixed-width and is asymptoti-
cally consistent.
Making use of the fact that U-statistics are reverse
martingales, the asymptotic efficiency of our sequential procedure is
established, in much the same manner as Simons [15].
The theory is
illustrated explicitly by obtaining sequential (non-parametric) fixedwidth confidence intervals for (1) the population variance and
(2) the probability of a concordant pair of observations in sampling
from a bivariate population.
As a secondary goal, it is hoped that the techniques and results
developed here can be used to extend other sequential procedures
already available for the sample mean (where the variance is unknown)
to the case of aU-statistic.
CHAPTER II
PRELIMINARY CONCEPTS AND RESULTS
2.1
Introduction.
~
U-statistics, as well as some related
statistics, are defined and some relevant properties are presented.
A particular emphasis is placed on a decomposition of aU-statistic
due to Hoeffding [9], and referred to as the H-decomposition.
2.2
Functiona1s.
We now introduce some basic terminology
~
following closely the lines of Hoeffding [7].
Let
~
be a subset of
the set of all c.d.f.'s defined on a finite-dimensional Euclidean
space.
Suppose that the random variable X has c.d.f. F
that "e(F) is a functional of F defined on
number e(F) is assigned.
~"
E~.
We say
i f for each F E
~
a real
A functional e(F) is "regular over £I" if
there exists a positive integer n and a function
for each F
E~.
In such a case
unbiased estimate of e(F) over
~(x1'··· ,x
n
~".
~(x1,···,xn)
such that
) is said to be "an
Let r be the smallest number of
arguments required for a function to be an unbiased estimate of e(F)
over £I.
Then r is said to be "the degree of e(F) over
function
~(x1,···,xr)
functional e(F)".
~"
and the
is referred to as a "kernel of the regular
Clearly, for any regular functional e(F), we can
always find a kernel which is a symmetric function of its r arguments,
namely,
4
(r:)-l~ ~(x Q"
1
••• , x Q' )
r
where the summation is over the r: permutations (Q'l,···,Q'r) of the
integers [1,2,".,r}.
For example, suppose
~
is the set of all c.d.f. 's defined on the
real line and having finite variance.
random variable X having c.d.f. F
E~.
Let e(F) be the variance of the
Then
2
is a kernel of e(F) and (x -x ) /2 is the
1 2
corresponding symmetric kernel.
Hoeffding [7] shows that a polynomial in regular functiona1s is
itself a regular functional.
This useful result has a very straight-
forward proof.
2.3
U-statistics.
U-statistics were introduced by Hoeffding [7].
~
Let X ,X ,···,X be independent and identically distributed random
n
1 2
variables (henceforth referred to as I.I.D. random variables) having
c.d.f. F.
Let f(x ,···,x ) be a function of r arguments.
1
r
Then a
U-statistic is defined by
=
[n(n-1)··· (n-r+1) ]
-1
~
f (xQ' , .•• ,xQ' )
1
r
where the summation is over all permutations (Q'l,···,Q'r) formed from
the integers [1,2, ••• ,n}.
the U-statistic".
We refer to f(xl'''.'x ) as a "kernel of
r
Notice that Un is sYmmetric in x ,x ,···,x •
n
1 2
i f f(x ,···,x ) is a kernel of a regular functional e(F) over
1
r
~
Also,
then
5
Un
is an unbiased estimate
of e(F) over~. Letr
f (x 1 ""'x ) be the
o
symmetric function corresponding to f(x 1 ,··· ,x r ).
n)-l L; (n , r) f (x
n = ( r
0
Q"
U
• •• x
1
,
We can then write
Q' )
r
where L;(n,r) represents here, and in the sequel, the summation over all
the combinations (Q'l,···,Q'r) formed from the integers {l,Z,···,n}.
refer to f (x ,···,:x ) as the "symmetric kernel" of the U-statistic.
r
o 1
Assume from this point on, without loss of generality, that
f(x 1 ,···,xr ) is symmetric in x ,x Z,···,xr '
1
We now introduce regular functiona1s denoted by p which playa
central role in what follows.
Assume that the symmetric function
f(x , ••• ,x ) has existing expectation for F
r
1
E~.
Write
Define
for c = 1,2,···,r.
Note that f r (x ,···,x r ) = f(x ,···,x r ).
1
1
We
interpret e{f(x1,.·.,xC'Xc+l"",Xr)} as the expected value of
f(X ,···,Xr ) given that X "",Xc are fixed at the values x ,"',xc '
1
1
1
respectively.
Notice that e = e{f c (X ,···,Xc )} for c = 1,2,···,r.
1
Define
pc = Var{fc (X 1 ,"',Xc )}
for c= 1,2,···,r.
In particular f 1 (x 1) = e{£(x 1 ,X Z""'Xr )} and
P1 = var{f 1 (X 1)}·
Now Pc = pc(F) is a polynomial in regular func-
tiona1s of F, and so, is itself a regular functional of F.
We
6
If e(f(Xl,···,X )}
r
2
<
00,
then Hoeffding [7] shows that the
variance of U is given by
n
Var(U}= (n)-lL;rlc r \) (n-r)
n
r
c=
c
r-c Pc
(2.1)
= n -1 r 2 PI +
for n
r.
~
O(n
-2
)
This result is generalized in Theorem 2.2 below.
In
Chapter III we consider the problem of estimation of the regular
2
2
functional cr = r Pl.
LEMMA 2.1.
(i)
(ii)
(iii)
(iv)
The following lemma appears in Hoeffding [7].
Assume e(f(Xl,···,X )}
r
2
<
00.
Then
0 ~ pc/c ~ pd/d for 1 ~ c < d ~ r,
2
r p /n ~ Var(un } ~ rpr/n,
l
n Var(D } is a non-increasing function _of n, and
n
--
-
Var(U } = P and lim
r
r --
n .... oo
2
n Var(U } = r p •
n
l
We now introduce notation which is used to represent the covariance between two U-statistics.
Let X ,X ,···,X be I.I.D. random
n
l 2
variables, and let f(xl,···,x ) and g(xl,···,x s ) be two symmetric
r
functions with rand s arguments, respectively.
f
c
(x
l'
Define
••• x )
, c
and
Sc = Cov(f c (Xl,···,Xc ),gc (Xl,···,X)},
c = 1,2,···,min(r,s).
c
Define two U-statistics by
D
n
=
n) -1 L; (n , r) f (x
• •• x )
( r
et'
'et
1
r
7
and
V = (. m ) -1 ~ (m , s) g(x A ,···,xA )
s
m
~1
~s
where n
2
rand n
THEOREM 2.2.
2
m 2 s.
Assume e[f(X ,· .. ,x )} 2 <
1
= n
PROOF.
-1
rS~l
r
00
and e[g(x ,.·.,X) }2 < 00.
1
--
s
-2
+ O(n ).
First
Cov(u ,V } = en) -l(m) -l~(n,r)~(m,s)Cov(f(X ••• X ) (X" .... X-..)}.
n mrs
Q" s
~ ,S
Q'"
Q' ' g j j '
, jj
r
1
s
1
Now, for each combination (~l'···'~s) formed from (1,2, •• ·,m}, the
total number of combinations (Q'l' ••• , Q')
r formed from (1,2,··· ,n} and
having exactly c suffixes in common with (~1'··· ,~s) is (~) (~=~)
where c = O,l,···,min(r,s).
Notice also that Cov(f(XQ' ,···,XQ')'
1
r
g(X~ , .•. ,x~ )} is zero if c = 0, that is, if there are no suffixes
1
s
in common.
Cov(u
Thus
V } = (n) -1 (m) -1 {em) ~min(r,s) e S) (n-s) S }
n' m r s
.
s
c=l
c
r-c
c
which completes the proof.
Cov(u ,V } is free of m.
n m
Notice that the final expression for
,
8
and
Correlation[U ,U }
n
We
ne~t
m
=
q
(c)
n
m
introduce notation used to represent the covariance
between the squares of U and U.
n
m
(2.2)
[Var[U }/Var[U }]1/2
For c
= O,l,···,r
define
(xl' ••• ,x 2
r-c )
where the summation ~(c) is over all combinations (al,· •• ,a ) and
r
(~l'···'~r) each formed from [1,2,·.· ,2r-c} and such that there are
exactly c integers in common.
Pc +
e2 for c
=
O,l, .. ·,r.
Put Po
= 0.
Then e(q (c) (X· •• X
)} =
l'
, 2r-c
Now, for c = O,l,· .. ,r and
t
=
1,2, ••• ,2r-c
define
qlC)(Xl,···,X;)
= e(q(C)(Xl,···,Xt,XU1'···'X2r_c)}
and
c
= O,l,···,r
(2.3)
define a U-statistic by
=
n) -1,,(n,2r-c)
(c) ( x , · · · ,x
i.J
q
).
( 2 r-c
a
a
1
2r-c
Put t = min(2r-c,2r-d).
define
Finally, for
t
= 1,2,"',t and c,d = O,l, .. ·,r
9
Notice that for L = 1,2,···,2r-c and c = d = 0,1,··· ,r, SlC,C)
=
(c)
PL
so that, in particular,
LEMMA 2.3.
Assume
2 2
-1 en) -1 r
r er)(r)(n-r)(m-r) e n ) -1
Cov[U n ,Um} = ( m)
r
r
~c=o~d=o c
d
r-c
r-d
2r-c
~t
b:1
(2r-d) cn-2r+d) S(c ,d)
L
2r-c-L
L
= n -1 4r 2e2 P1 +
O(m
-1 -1
n ).
First, from (2.2) and (2.3),
PROOF.
(
n) -2 ~ (n , r) ~ (n , r) f(x
r
0" s
~
, sO"
•.• x
1
'0'
)f(x
r
~'
••. x )
1
,
~
r
= en) -l~r (r) (n-r) U(c).
r
c=o c
r-c
n
Then
cov[u 2 U2 }
n'm
=
(m)
r
-I( n)
-l~r ~r (r)( r)· (n-r) (m-r)
r
c=o d=o c d
r-c
r-d
Cov{ U(c) U(d) }.
. n ' m
But from Theorem 2.2
Cov[u(c) U(d)}
n ' m
for c,d
= O,l,···,r.
2.4
=
(n ).-l~t (2r-d) (n-2r+d) S(c,d)
2r-c
L=l
L,
2r-c-,e
,e
This completes the proof.
~.
In Hoeffding [9] a means for decom-
posing U is developed, one which has great value in establishing
n
properties of- U.
n
decomposition".
We refer to this decomposition as the "HContinuing in our development of notation we define
10
gh(xl'·"'~)
=
fh(xl'·"'~)
-
e for h = 1,2,· .. ,r. Also, let
g(l)(X ) = gl(x ) and
l
l
(2.4)
= ~(xl.' ••• '~) - ~~-l~(h,j)g(j)(x
••. x )
~n
n
J=l
Q'l"
Q'j
for h = 2,3,···,r.
For example, if h = 2,
From (2.4) it follows that
= e/g(h)(x ••• x X · •• X.)}
gc(h)(x l' ••• ,x)
c
l
1"
c' c+l'
'--h
for h = 1,2,···,r and c = 1,2,···,h-l.
For n > rand h = 1,2,···,r
define
n
£.J
g (h)( x,· . . ,x ) .
V(h) __ (h ) -l,,(n,h)
n
Q'l
'b.
(2.5)
(1)
-1 n
(1)
-1 n
Note that Vn
= n ~i=lg
(xi) = n ~i=lfl(xi) -
e.
Strictly
speaking V(h) is not a U-statistic, as it may depend upon unknown
n
functionals.
Nevertheless, it does have most of the attributes of a
U-statistic.
The proof of the following lemma is simple and appears
in Hoeffding [9].
LEMMA 2.4.
If e[ If(X ' ••• ,Xr ) I} < 00, then erg (h) (Xl' ••• ,~)} = o and
l
.•. ,x) = 0 for h = 1,2,···,r and c = 1,2, ••• ,h-l.
synunetric.
g(h) (x
c
l'
For h = 1,2,···,r the function g(h)(xl'.··'~) is
c
11
LEMMA 2.5.
Assume that e(f(X 1 ,··· ,Xr )}
2
0h = var(g(h)(X1'···'~)} for h = 1,2,··· ,r.
<
00
and let
Then
(i) for h = 1,2,···,r the ~ of v~h) is 0 and the variance is
given EY
O(n
-h
).
Also
for r < m < n we have
(ii)
---
-
Cov(v(h) v(I-)} =
n ' m
PROOF.
Var (Von(h) }
{
First, since e(f(X ,···,X )}
1
r
h = 1,2,···,r.
h = I- = 1,2,···,r
h
~
I- = 1,2,···,r.
2
<
00,
then 0h <
00
for
Part (i) follows from Lemma 2.4 and the fact that
Part (ii) follows from Theorem 2.2.
We now introduce the H-decomposition by means of the following
theorem given in Hoeffding [9].
THEOREM 2.6.
~
U-statistic may be decomposed into
~
linear
combination of U-statistics, specifically,
(2.6)
U =
n
=
e +>,.r (r) v(h) = e + rV(l) + R
n=l
e
+ rn
h
-1 n
~.
~=
n
n
n
1(f1 (x.)-e)
+ Rn
~
where R = >,.r (r) V(h) and Correlation (v(l),R } = O.
n
in=2 h. n
n
n
n ) Vn(h)
satisfies the martingale property, that is,
(h
Further,
12
for r.$ m < n and h
REMARK 1.
1,2,·",r.
=
Theorem 2.6 is extremely useful in establishing
properties of U-statistics.
In fact, it states that U is a linear
n
combination of U-statistics, mutually uncorre1ated (by Lemma 2.5) and
each successive term having a variance of smaller order.
It shows
that a U-statistic is essentially the sum of an average of I.I.D.
random variables V(l) and a zero-mean remainder term R , and that the
n
n
two are uncorre1ated.
zero.
Of course, if r
=1
From Lemma 2.5 we see that Var(R }
n
the remainder term R
= O(n- 2).
show that under the assumption that e(f(X , •••
1
converges to zero almost surely as n
This implies that nYR
n
Y < 1.
-> ~
is
n
In Chapter IV we
,xr )}2 <~,
nYv(h)
n
for Y < h/2 and h == 1,2,"·,r.
converges to zero almost surely as n
->
~ for
Hoeffding [9] uses the H-decomposition to show that, under the
assumption that e(lf(X ,···,X )
r
1
mean almost surely as n
->~.
I}
<~, a U-statistic converges to its
Sen [13] proves a somewhat weaker result
in that he assumes that e(lf(X ,···,X )
1
r
1 e
1
+ } < ~ for some e> 1 - r- l
Berk [2] contains a rather simple proof of the almost sure convergence
of a U-statistic by recognizing that U-statistics are reverse martingales.
More will be said about U-statistics as reverse martingales
in Chapter V.
REMARK 2.
2
Hoe ffd,ing [7] prove s that i f e( f (Xl' ... ,X ) } <
r
~
and
2
PI > 0 then In(Un-e) has an asymptotic normal distribution N(O,cr )
2
2
where cr = r Pl'
The result follows directly from the H-decomposition
2
by noticing that r/n v(l) is asymptotically N(O,cr ) by the Lindbergn
13
Levy central limit theorem and that lim
REMARK 3.
n-+ oo
e[/n R }2 =
n
o.
The H-decomposition along with Lemma 2.5 yields a
second expression for the variance of a U-statistic, namely,
(2.7)
Compare (2.7) with (2.1).
Equating these two expressions for the
variance of a U-statistic enables us to obtain explicitly the re1ationship between the p's and the o's.
This leads us to the following
lemma, also proved by Hoeffding [7], but in an entirely different
manner.
LEMMA 2.7.
The p's and the o's
(2.8)
0.
'-n
=
~
related for h = 1,2,···,r
~
L;hc=l (h)
0c
c
and
(2.9)
PROOF.
To prove (2.8) we proceed by induction.
and equating (2.7) with (2.1) we obtain Pr -holds for h = r.
some j > 1.
Putting n = r
c=l (r)
c 0c·
Thus (2.8)
L;r
Now assume that (2.8) is true for r - j
<h
~
r for
Put n = r + j and equate (2.7) with (2.1) so as to obtain
)'.r
. (r+j ) -1 ( r) ( j )
= )'.r
(r+j) -1 (r) 2 0 •
~=r-J ~ r
h
r-h Ph
~=1 \ h
h
h
.
Mu1t1p1y
through b y( .r ) -1 (r+
r j ) and obtain
J
Pr - j
+ )'.r
( r) -1 ( r) ( j )
~=r- j+1 j
h
r-h
Thus, from (2.8), we have
.
r (r+j )
% = L;h=1
h
-1 (r ) -1 (r ) 2 (r+j) 0h.
j
h
r
14
(2.10)
The double sum on the left side uf (2.10) with i = r - h becomes
°
r i
t:~-l(~)
-l(:)(~
)t: - (r-i)
1=0 J
1
1
c=l
c
c
= t:~-l (~ ) -1 (~ ) t:r ( r ) (r:c)
1=0 J
1
c=l c
1
°c
°
= t:r ( ~ ) -1 ( r) t:~-l (~ ) ( r:c)
c=l J
c
c 1=0 1 \ 1
Thus (2.10) becomes
Pr - j =
=
~=l ( ~ )
-1
~:{ (r~j)
which is (2.8) with h = r - j.
(~) (rjh)
0h
0h
This completes the proof of (2.8).
order to prove (2.9) we make use of (2.8) as follows:
= t:~ t:h . (-1) h-c ( h) ( : ) 0..
J=l c=J
c
J
J
But
In
15
h . (-1) h-c (h) (~ )
C=J
c
J
2::
=
= h,
which equals 1 whenever j
(~) (l_l)h-j
and 0 whenever j < h.
This completes
the proof of (2.9).
2.5
The W-statistic.
~
For each i = 1,2,···,n define aU-statistic
based on xl,···,x.~- l'x,~+l'···'xn by
U.
(~) n
=
(
n-l)-L(n-l
r) f (x
L:
'
• •. , x 0: )
r
0:'
1
r
where the summation is over all combinations (O:l,···,O:r) formed from
{l, ••• ,i-l,i+l, ••• ,n}.
(2.11)
Define the W-statistics by
Win
= nUn
- (n-r)u(i)n
for i = 1,2,···,n and notice that they are identically distributed.
Furthermore, since Un = n
Wn
-1
=n
n
2::. lU(,) ,
~=
~ n
-1 n
2::.~= lW,~n
= rUn •
The W-statistics can be conveniently decomposed.
To do so, for each
i = 1,2,···,n and h = 1,2,···,r, define
V(h)
(i)n
where the summation is over all combinations
the integers {l,···,i-l,i+l,.··,n}.
Let
(O:l'···'~)
formed from
16
(h)
nVn(h) - (n-r)V(i)n
(2.12)
for i = 1,2,···,n and h = 1,2,···,r.
Then
r (r)
(h)
h Win
(2.13)
Win = r8 + ~l
for i = 1,2,···,n.
The following lemma gives us additional insight
into the decomposition (2.13).
LEMMA 2.8.
Assume e[f(Xl,···,Xr )}
and i,j = 1,2,···,n.
2
<
~ and suppose tha~ n
> r
Then, for h = 1,2, .•• ,r we-have
that e[w~h)} = 0
----
- - --
~n
and
Also, for h ~ t = 1 2 •.• r we have that Cov[w~h) w~t)} = 0, whereas
"
, - -- -~n ' In
for h = t = 1,2,···, r ,
(h) (h) }
(2.15) Cov[W.~n
= ( n-l)
h __ -1 [(r 2 -h)-(n-l) -1 h(r-l) 2 ]oh
' W.In
=
PROOF.
O(n
-h
).
Lemma 2.4 and (2.12) imply that e[w~h)} = O.
~n
follows from Lemma 2.5 and (2.12).
Cov[w~h)
,w~t)}
= 0 by Lemma 2.4.
~n
In
If h
~
Also, (2.14)
t = 1,2,···,r, then
If h = t -= 1,2,··· ,r, then by
Lemma 2.5 and (2.12), (2.15) holds.
LEMMA 2.9.
Assume e[f(X1 ,···,Xr )}
and i, j = 1,2, •• ·,n •
2
<
~ and suppose that n
Then e[w. } = r8,
~n
r (n-l)
-1 ( hr) 2 [r 2 + h(n-2r)]oh = 0(1)
h
Var [Win } = Lh=l
>r
17
and
O(n
The proof follows directly from (2.13) and Lemma 2.8.
-1
).
The
W-statistic is closely related to a statistic introduced by Sen [13],
as we shall see in the next
sectio~,
and plays an important role in
Chapter III.
2.6
The S-decomposition.
~
V
in
=
Sen [13] defined a U-statistic by
n-1) -1",(n-1,r-1)f(
'"'
x.,x , ••• ,x )
( r- 1
1
~2
~r
for i = 1,2,···,n where the summation is over all combinations
(~2'···'~r) formed from the integers [1, ••• ,i-1,i+1, ••• ,n}.
The
S-decomposition is
Un = n
-1 n
2::. 1V, •
1=
1n
Notice that W. = rV. for i = 1,2,···,n so that Lemma 2.9 may be used
1n
1n
to determine variance and covariance expressions for the V. 'so An
1n
2
2
estimate of a = r P1 can be constructed from the W. 's (or the V. IS)
1n
1n
and such an estimate is examined in Chapter III.
V converges in probability to f (x ) as n
in
1 i
integer i.
~
00
Sen [13] proves that
for any positive
This also follows from Lemma 2.9.
Both the H-decomposition and the S-decomposition indicate that for
large n a U-statistic behaves like a sample mean.
However, the H-
decomposition has the advantage that the remainder term R of
n
Theorem 2.6 has a fairly explicit representation.
There is a great
bulk of theory in the literature developed for the sample mean.
It
18
appears then that this theory might, in certain cases, be extended to
the case of a U-statistic by showing that, at least asymptotically,
the remainder term is in some sense negligible.
A specific result due
to Chow and Robbins [3) is extended in Chapter v.
2.7
The Z-statistic.
For n > r define the
Let Zr = rU.
r
~
Z-statistic by
Z = nU - (n-1)U
•
n-1
n
n
(2.16)
Note that U = n
n
-1 n
~.
Z ••
1.=r 1.
Now, define
Z(h) = nv(h) - (n-1)v(h)
n
n
n-1
(2.17)
for n > rand h = 1,2,···,r.
By (2.16) and (2.17), along with
Theorem 2.6, we have
Z =
n
(2.18)
.
of
An est1.mate
for n > r.
e + L;r (r)
h=1
0
2
= r 2 PI
h
Z (h)
n
can be constructed from the Zn's
and such an estimate is examined in Chapter III.
The following lemma,
an immediate consequence of (2.17) and Lemma 2.5, gives us some
insight into the decomposition (2.18).
LEMMA 2.10.
Assume e{f(X ,···,X )}
r
1
var{z(h)}
n
~ and suppose that
= ( n -1) -1 [(n-2)h+1)Oh = O(n -h+1 ).
h
Also, for h
~ ~ = 1,2,···,r ~
for h =
1,2, •• ·, r ,
~
<
Then, for h = 1,2,···,r ~ have that e{z~h)} = 0 and
r < m < n.
(2.19)
2
=
have that
cov{Z~h) ,Z~~)}
= 0, whereas
19
Cov[Z(h) Z(h)} = _(n-1)-1(h_1)O
(2.20)
n
LEMMA 2.11.
h
' m
Assume e[f(X ,···,X )}2 <
1
r
h
= O(n- h ).
00.
Then e[z } = re,
--
r
e[z } = e for n > rand
n
-
-
(2.21) Var[Z } =
n
{ r
r2 p
r
(n-1) -1 r 2
2-L
h..
(h) [(n-2)h+1]~= r P1 +O(n)
n = r
lb=l \
n> r.
Also, for m < n,
(2.22) Cov[Z ,Z } =
n m
PROOF.
(2.16).
1
-rL;~=2 (n~l) -1 (~) 2(h_1) ~
_L;r
h=2
2
= 0(n- )
(n-1) -1 ( r) 2 (h-1) 0 _ O( -2)
h
h
h n
m
=
r
m
>
r.
The expectations follow from the definition of Z and
r
The variance expression (2.21) follows from (2.18), (2.19)
and (2.20).
Suppose r < m < n.
Then, by (2.18),
which, upon applying (2.20), reduces to (2.22).
m = r < n.
Next, suppose
Then, using (2.16), the corollary to Theorem 2.2, and
(2.7) gives us
Cov[Z n ,Z r } = rn Cov[Un ,U r } - r(n-1)Cov[U n- l'U}
r
= rn Var[Un } - r(n-1)Var[Un- 1}
20
which reduces to (2.22).
REMARKS.
This completes the proof.
If r = 1, notice that W. and Z. each reduce to f(x.).
1n
1
1
In general rU
n
is the average of WI ,·.·,W , whereas U is a near
n
nn
n
.
average of Z , ••• ,Z , that 1S, U =
r
n
n
-1 n
Z.•
1=r 1
n~.
In Chapter III we are
concerned with the problem of estimating 02 = r2pl'
One estimate is
the sample variance of WI n ,···,Wnn , while a second estimate somewhat
resembles the sample variance of Z ,···,Z.
r
n
From the computational
point of view, the Z-statistic is a little more suited to a sequential
setting than is the W-statistic.
The next two lemmas are used in Chapter III to establish
Theorem 3.3.
LEMMA 2.12.
Assume e[f(X , ••• ,X )}4 <
r
l
Let ~l = var[[g(1)(X )]2}.
l
00
and suppose that n> r.
Then
(2.23)
PROOF.
and set P
n
(2.24)
For convenience write z
"r
L..h= 2 ( hr ) Z(h)
n •
= Z(l) = g(l)(x ) (see (2.17))
n
n
Th en, b y (2 • 18) ,
n
22
(Z _8)2 = r 2
z + 2rz P + P
n
n
n n
n
and so
242
2
(2.25) Var[(Z n -8) } = r ~l. + 4r Var[z n Pn } + Var[pn }
3
2
222
2
+ 4r Cov[z n ,z n Pn } + 2r Cov[zn ,Pn } + 4rCov[zn Pn ,Pn }.
Our major task is to show that
(2.26)
2
-2
Var[Pn } = O(n ).
21
Once this has been accomplished, it is a simple matter to show
that each of the remaining variance and covariance terms in (2.25) is
of order n- l
To prove (2.26), first notice that
(2.27)
Now, using (2.17) and (2.15), we can write
W*(h) _ (h-l)V(h)
nn
n-l
(2.28)
where
ok (h)
(2.29)
W
nn
h
ehn-I)
-1.
-l,,(n-l,h-l) g (h) ( x
~
Q/' S
for h = 2,3,···,r and n > r.
x
..• x )
n' Q/2'
, ~
(Notice that W*(h) is related to V(h) in
nn
n
much the same way as W , defined in (2.11), is related to U , and
nn
n
that W*(h) differs slightly from w(h), defined in (2.12).)
nn
nn
Therefore,
from (2.28)
zn(h) 2 = W~.(nn(h) 2
(2.30)
for h
(h) (h)
2 (h) 2
- 2(h-l)Wnn Vn-l + (h-l) Vn-l
ft
oJ.
= 2,3,···,r. We now are equipped to prove that
(2.31)
for h
= 2,3, .•• ,r.
e[v(h)}
n-l
= 0,
From Lemma 2.3 (with m = n) and the fact that
we have
(2.32)
for h = 2,3,···,r.
(h)2
Var [ V _ }
n l
= O(n -2 )
Although W*(h) is not a U-statistic, the proof of
nn
Lemma 2.3 can be adapted to show that
22
r ,\, (h) 2}
Var1. Wnn
(2.33)
for h
h
=
=
2,3,···,r.
2,3, ••• ,r.
Notice that, by Lemma 2.4
,
o
e[w*(h)v(h)}
nn
n-1
for
Then, by the Schwarz inequality
= 2,3,···,r. Thus, (2.32), (2.33) and (2.34), along with
for h
Lemmas 2.5 and 2.8, imply that
(2.35)
for h = 2,3,···,r.
Also, (2.32), (2.33), (2.35) and the Schwarz
inequality imply that the three covariances involving W*(h)2 V(h)2
nn
' n-1
2
and w:~h)v~~i are each of order n- for h = 2,3, •.. ,r. This proves
(2.31).
An argument similar to that in (2.34), along with (2.31) gives
us
(2.36)
for h
1
~
h
2
= 2,3, ..• ,r.
Again, by the Schwarz inequality, the
covariances between the terms in (2.27) are each of order n
-2
.
This
proves (2.26).
We now tackle the remaining terms in (2.25).
k
erz
p } = 0 for k = 1,3 so that
1. nn
(2.37)
cov[z2,z p } = cov[z3,p } = O.
n n n
n n
From Lemma 2.4,
23
By Lemma 2.10, Var[p }
n
= O(n- l );
also var[zk}
n
= 0(1)
for k
= 1,2.
Hence, by an argument similar to that in (2.34) we obtain
Var[z P }
(2.38)
n n
= O(n- l ).
An application of (2.26) and the Schwarz inequality proves that
2
l
2
cov[z2,p } and Cov[z P ,p } are each of order n- .
n n
n n n
This, along with
(2.26), (2.37), (2.38) and (2.25), completes the proof of (2.23).
LEMMA 2.13.
r
<
m
(2.39)
<
Assume e[f(X ,··· ,X )}
r
l
4
<
00
and suppose that
Then
n.
2
2
Cov[ (2 -8) , (2 -8) }
n
r
=
m
4
[(r-l)~2
+ (r-l)
2
~3]n
-1
-1 -1
+ O(n m )
where ~2 = e[g(1)2(X )g(1) (X )g(2)(X ,X )} and
2
l 2
l
~3
= e[g(l) (Xl)g(l) (X2 )g(2) (X
PROOF.
l
,X )g(2)(X ,X )}.
2 3
3
From (2.24), writing zn for g(l)(X ),
n
3
2
2
2
+ 2r Cov[znnm
P ,z } + 4r Cov[z P ,z P } + 2rCov[z P ,P }
nnmm
nnm
22
2
2 2
+ r2
Cov[p ,z } + 2rCov[P ,z P } + Cov[p ,P }.
n m
n m
n mm
We treat the seven terms in (2.40) one at a time.
z2 is independent of (2 _8)2, so that
n
m
(2.41)
From (2.26) and the Schwarz inequality
Since m < n,
24
2 2
-1 -1
Cov [ p ,P } = O(n m ).
n m
(2.42)
2
m
We now consider the term Cov[z P ,z } in (2.40).
n n
(2.29) and the fact that z
n
(2.43)
2}
Cov [ zn P,z
n m =
r
~h
From (2.28),
is independent of z2 v (h)1 for m < n,
m n-
(r) ,,[
2Z (h)} =
=2 h U zn zm n
L:hr
=2
(r) U"[z z2W'1(h)}
h
n m nn
= ~r (r)(n-1)-1~(n-1,h-1)e[z z2 g (h)(X X ... X)}.
h=2 h
h-1
a's
n m
n' a '
, a
2
For convenience we shall write g
combination Cap."
,~).
a1"·~
= g(h)(x
a1
,···,x
~
h
) for any
Now, for h = 3,4, ... ,r, by Lemma 2.4,
o
(2.44)
for any combination (a2'···'~) chosen from [1,2, •.. ,n-1}.
Also,
for h = 2
2g . }
(n-1) -1 ~.n-1
1e[ zn zm
~=
n~
(2.45)
=
(n-1)
-1
~2'
Putting (2.44) and (2.45) into (2.43) gives us
(2.46)
Next, we consider the term Cov[z n Pn ,zmPm } in (2.40).
e[zn Pn } = 0
(2.47)
Cov[z P ,z P }
n n
m m
Since
25
(hI) (h 2)
From (2.28) and the independence of z from z VIZ
n
m nm
(h ) (h )
e[z n zmZn
(2.48)
1 Z
m
2} =
i(
(h )
i(
1 W
e[z n zmWnn
we have
(h )
mm
2 }
*(h ) (h 2)
l
- (h 2 -l)e[zn zmWnn
VI}
m-
*(2)}
e[ zn zmW*(2)
nn Wmm
where
e[z n zmg
= (n-l)
-1
(m-l)
-1 n-l m-l [
~. l~' Ie z z g .g .}
~=
J=
n m n~ mJ
.g .} = 0 except possibly when i = j = 1,2,···,m-l.
mJ
n~
Thus
(2.49) e[z z w*(2)w*(2)} = (n-l)-l(m-l)-l~~-lle[z z g .g .}
mm
n m nn
J=
=
Higher values of hI and h
(n-l)
2
-1
n m nJ mJ
13 ,
3
lead to terms of order n
-2
or higher,
that is
(2.50)
*(h l ) *(h 2)
W
}
mm
e[zn zmWnn
O(n
-2
).
A close examination yields
(2.51)
*(h l ) (h 2)
VI}
m-
e[zn zmWnn
4
= O(n- )
~'(
(More specifically,
for hI = 2,3 and h
2
= 2,3, .• ·,r.)
(h ) (h )
e[zn zmWnn
1 V 2 } = 0
m- l
Putting (2.49), (2.50) and (2.51)
into (2.48), and then, (2.48) into (2.47) yields
(2.52)
2
We now consider the term Cov[z P ,p } in (2.40).
n n
m
First,
26
from (2.28)
(2.53)
since z
n
is independent of v(h)lP2 for m < n.
n-
m
For h
= 2,3,"',r
(2.55)
But
~'(2) *(2)2}
e[ znWnn
W
rom
=
(n-l)
where e[z g .g n g n }
n n~ m~l m~2
,R,l
-1
(m-l)
-2 n-l m-l m-l [
}
L:._lL: n _lL: n _Ie z g .g n g n
~~l~2n n~ m~l m~2
= 0 except possibly when i = m and
= 1-2 = 1,2"" ,m-l and when i = 1-1 = /;2 = 1,2,'" ,m-l,
(2.56)
Also, it is not difficult to show that
(2.57)
and
Thus
27
(2.58)
Putting (2.56), (2.57) and (2.58) into (2.55) gives us
(2.59)
We now show that
(2.60)
For h
= h1 =
2 and h
2
=3
e( z nW*(2)
W*(2) W*(3)}
nun
nun
nn
where e(z g .g ~g ~ ~ } = 0 except possibly when (i,t) equals
n n~ m"", m""l ""'2
-1 -1
(2.61)
O(n m ).
It is a simple matter to show that
(2.62)
e(z W*(2)W*(2)V(3)}
n nn
nun
m-1
=
0
'
(2.63)
and
(2.64)
Then (2.60) follows from (2.61), (2.62), (2.63) and (2.64).
Higher
28
values of h, h
l
and h 2 yield terms of order n
-1 -2
m or higher.
There-
fore, from (2.53), (2.54), (2.59), (2.60) and the extensions of
(2.59) and (2.60) to higher values of h, h
l
and h , we obtain
2
(2.65)
Further application of the techniques already used gives us
(2.66)
Cov [ p 2 ,z 2} = 0(n-2)
n m
and
(2.67)
Cov [ p 2 ,z P } = O(n -2 ).
n
mm
We have now completed our treatment of each of the seven terms
in (2.40).
Combining (2.41), (2.42), (2.46), (2.52), (2.65), (2.66)
and (2.67) yields (2.39).
This completes the proof of the lemma.
CHAPTER III
ESTIMATION OF THE VARIANCE OF AU-STATISTIC
3.1
Introduction.
~
Our chief purpose in this chapter is to
obtain an estimate for the variance of a U-statistic having certain
desirable properties.
Assume e[f(X ,···,X )}
1
r
2
<
00
and that P1 > O.
Then, from (2.1), recall that
var[u n } = n
-1 2
-2
r P + O(n )
1
so that we may confine our attention to estimation of
since any good estimate of
0
2
~
00.
=
2
r P1'
n
More specifically, we are
concerned with obtaining an estimate for
n
2
will be a good estimate of n Var[U },
if second order terms are negligible.
basic properties.
0
(1) It converges to
0
0
2
2
= r 2 P1
= r 2 P1
that has several
almost surely as
(2) For each n, it is positive almost surely.
variance of the estimate may be evaluated for large n.
(3) The
(4) The
nature of the estimate is such that, for it, we can establish the
asymptotic efficiency of the sequential procedure appearing in
Chapter V.
(5) In addition, we would hope that the sequential
calculation of the estimate is not too tedious.
three suitable candidates are examined.
In this chapter
The two leading candidates
are compared when U is the unbiased estimate of the population
n
variance.
30
3.2
P1
The U-statistic estimate of Pl'
2
2
= var[f 1 (X 1 )} = e[f 1 (X 1 )} - e.
and so, has a U-statistic estimate.
Recall that
Now P is a regular functional,
1
From definition (2.2) in
section 2.3, notice that P1 has a kernel
(3.1)
2r) -1 ~ (0) f(x
••• x )f(x
••. x )
( r
a"a~"~
1
r
1
r
-
where the summation ~(1) is over all combinations (a ,···,a ) and
r
1
(~l""'~r) each formed from [1,2,"',2r-1} and such that there is
exactly one integer in common, and the summation ~(O) is over all
combinations (a1 ,···,ar ) and
(~l""'~r)
each formed from
[1,2,"',2r} and such that there are no integers in common.
From
(2.3), the U-statistic estimate of P is given by
1
(3.2)
= u(l)
n
_ u(O)
n
n ) -1,,(n,2r-1)
Lq (1) ( X
( 2r-1
a'
1
-
• ••
n ) -1,,(n,2r)
Lq (0) ( X
( 2r
a ' ••• , x a
.
1
)
x
, a
2r-1
)
2r
.
We next evaluate the variance of this estimate of Pl'
(2.1) and Theorem 2.2,
= £1 n
1
-1
+
O(n
-2
)
From
31
where
and the functionals siO,l) and pi l ) are defined in section 2.3.
Then, from (3.4),
(3.5) 6
1
= 4r 2e 2var[g(1)(X l )}
- 4r(2r-l)Cov[qiO)(X ),qi l )(X )}
l
l
+ (2r-l)2var [qi l ) (Xl)}
We would now like to express (2r-l)qi l )(X ) - 2rqiO)(x ) in
l
l
terms of the g(h) functions introduced in section 2.4.
(0)
ql
(xl) = ef l (xl) = eg
fo(x l )
=
(1)
e[fl(X2)f2(xl,X2)}.
(3.6 )
From Lenuna 2.4
2
(xl) + e.
.
Def1ne
Then, from section 2.3,
First
32
(3.7) fo(x )
l
= e[(g(1)(X 2 )+Q)(g(2)(x l ,X 2 )
= gl(x l )
+ Qg
(1)
+ Q + g(l)(x ) + g(1)(X ))}
l
2
2
(xl) + Pl + Q
Thus, putting (3.7) into (3.6) yields
and so
Before we put (3.8) into (3.5), notice that
(3.9)
and
33
=
I{Ig(l) (X )g(2) (X ,X )dF(X )JfIg(1) (X )g(2) (x ,x )dF(x 2~dF(Xy
l
l
l 3
2
2 3
=
Ig~(Xl)dF(Xl)
Then, putting (3.8) into (3.5) gives us
(3.11)
61 =
~l
+
r(r-l)~2
+ 4(r-l)
2
~3·
The variance of Q is then given by (3.3) with 6 given by (3.11).
ln
1
Qln inherits all the good properties of aU-statistic.
an unbiased estimate of Pl.
e[f(Xl,···,Xr )}
2
It
is
Under the assumption that
<~, Q
ln converges almost surely to PI as n ~~.
One drawback is that Q may possibly take on negative values.
ln
Also,
it appears that Q would require much time for computation.
ln
Estimation of p.
3.3
Pc for c
~ c
=
1,2,· •• ,r.
In section 2.3 we defined the functionals
Now Pc is a regular functional of degree 2r
and has a kernel given by q(C)(X ,···,x _ ) - q(O)(X ,···,x ) for
2r c
2r
l
l
c = 1,2,···,r.
The U-statistic estimate of P is given by
c
_ U(O) f or c = 1 , 2 , ••• , r.
Qcn = U(c)
n
n
Then, from (2.1) and
Theorem 2.2,
Var[Q
cn
}
= var[u(c)}
n
=
6 n
c
-1
- 2 cov[U(C) u(O)} + var[u(O)}
n ' n
n
+ O(n -2 )
34
where
for c = 1,2,···,r.
It is also possible to write
for c = 1,2,···,r.
Clearly, for c = 1 we have the situation con-
sidered in section 3.2.
Suppose we define
n) -1 r (r)(n-r)
Qn = r
~c=l c
r-c Qcn·
e
Then Q is an unbiased estimate of Var(U}.
n
n
Also
var(Q } = (n) -2~r (r) 2 (n-r) 2va r(Q }
n
r
c=l c
r-c
cn
r
r (r)(r)(n-r)(n-r)
+ ( n)-2
r
~c=l~d=l
c
d
r-c
r-d Cov ( Qcn,Qdn }
C:;fd
=
where 6
1
4
r 6 n
1
-3
-4
+ O(n )
is given by (3.11).
For small values ofn, where the higher order terms of Var[U }
n
may not be negligible, Q might be a satisfactory estimate of
n
Var(U}.
n
However, it is a very tedious estimate to compute, and we
2 -1
are concerned mainly with large values of n, so that r n Q is to
1n
.e
be preferred, so far, as an estimate of Var(U }.
n
35
3.4
Sen's estimate of cr
2
In section 2.5 we introduced
=
~
the W-statistic.
Now define
(3.12)
s
2
(n-l)
wn
-1 n
2
-
L:, l(W. -W ) •
~=
~n
n
Sen [13] introduced (see section 2.6)
Recall that W,
as an estimate of Pl.
W = rU.
n
·d
s~
~n
It then follows that s
n
ere d as an
.
est~mate
0
2
~
converges to cr2
~n
for i
= 1,2,···,n and
222
r s , so that s
can be convn
~
=
f 0 2 = r 2 Pl.
asymptotically unbiased estimate of
= rV.
is an
Sen [13] showed that s2
wn
0
2
2
= r
= r2
Pl .~n pro b a b'l'
~ ~ty as n
2
PI' and further, that s wn
~ ~.
In the next two
theorems we derive first order expressions for the bias and the
variance of s
to cr
2
2
wn
, as well as prove that s
2
= r Pl as n
2
wn
converges almost surely
~ ~.
THEOREM 3. 1 •
(i)
2
2
-1-2
(3.13) e(s2 } = r Pl + r (r-l) [(r-l)o2-2P ]n
+ O(n ).
~
l
(ii)
(3.14)
2
Var(s}
wn
where 6 is given
1
~
PROOF.
= r 4 6l n -1 + O(n -2 )
(3.11).
To prove (i) notice that
s
2
~
=
(n-l)
-1 n
2
L:, lW,
~=
~n
(n-l)
-1 -2
nWn •
36
Then
(n-1)
-1 n
[}2
L:i =l e Win
= (n-1)-ln [var[w
1n
- (n-1)
-1
-
2
ne[Wn }
} - Var[W }]
n
which, using Lemma 2.9 and (2.7), reduces to (3.13).
To prove (ii) we express s2
wn
U-statistics.
as a linear combination of
From the proof of Lemma 2.3
r (r)(n-r)u(c).
( n)-lL:
r
c=o c
r-c
n
(3.15)
In a similar fashion we may write
(3.16)
(n-1)
-1
n
2
L:.1= 1W.1n
= (n-1) -1 (n-1) -l r 2n L: r (r-1) (n-r) U(c) •
r-1
c=l c-1
r-c
n
Then, from (3.15) and (3.16), after some rearranging,
(3.17)
For c = O,l,"',r let
(n)
(r)(n-r)
2 •
an (c) = (n-1) -1 n
r-1c
r-c [cn-r]
In particular, a (0) = -r
n
2
-1
+ O(n ) and a n (l) =
c = 2,3,"',r notice that a (c) = O(n
-1
n
(3.18)
s
).
For
Thus
2
wn
1
where a (c) = 0(n- ) for c
O,l,···,r.
n
an explicit expression for a (c).
n
Note that we do not require
For convenience let
r
a (c)u(c) and recall from section 3.2 that Q
T = L:
n
c=o n
n'
1n
37
Then
242
Var[s wn } = r Var(Ql n } + 2r cov[Ql n ,Tn } + Var(T n }.
(3.19)
From (2.1), Theorem 2.2 and the fact that ~ (c)
n
= O(n- l )
for
-2
3
c = O,l,···,r, we have that Var[T } = O(n- ) and cov[Ql ,T } = O(n ).
n
n
n
Therefore, from (3.19) and (3.3)
2
Var(s wn }
where ~l
= ~l +
4(r-l)~2
THEOREM 3.2.
surely .!2 (J
2
PROOF.
= r 2 PI
= r 4 ~ln -1 +
+ 4(r-l)
2
~
n
surely to its expectation PI as n
= O,l,···,r,
<
co,
2
then s
converges almost
- - wn
-
First, by Hoeffding [9]
= U(l)
- U(O) converges almost
n
n
-+ co.
Secondly, since
n
n
= ~rc=o~n (C)U(C)
n
converges almost surely to 0
This completes the proof.
2
REMARKS.
.
est~mates
= O(n-1 )
~n(c)
and U(c) is a U-statistic with finite expectation,
c = O,l,···,r, then T
-+ co.
2
The theorem follows almost immediately from expression
(or Berk [2]), the U-statistic Q
ln
as n
)
-+ co.
(3.18) in the proof of Theorem 3.1.
for c
-2
This completes the proof.
~3'
If e(f(Xl,···,X )}
r
O(n
2
There is not much difference between swn and r Q as
ln
2
f
0
(J 2 = r 2 Pl' From expression (3.18), s2
wn = r Ql n + Tn •
The difference T converges to 0 almost surely as n
n
expectation of order n-
l
and variance of order n- 3
-+ co,
has
Both estimates
have good properties, including the same first order variance.
of course, converge to (J2
= r 2 PI
almost surely as n
differ on the question of unbiasedness.
2
wn
They only
2
The estimate r Qln is
unbiased whereas s2 is asymptotically unbiased.
wn
view of small sample theory s
-+ co.
Both,
From the point of
has the good property that it is
38
2
always non-negative, whereas it is possible for r Q to take on
ln
negative values. For the problem of finding a fixed-width sequential
e,
confidence interval for
discussed in Chapter V, the estimate s
2
wn
2
serves our purposes better than does r Qln.
3.5
~ 0
2
2
= r PI based on the Z-statistic.
section 2.7 for an introduction to the Z-statistic.
Refer to
For n > r define
(3.20)
Notice that if r = 1 then
variance.
o
2
=
0
2
= var(f(X )} and s2 reduces to a sample
l
zn
We now consider the merits of s
2
zn
as an estimate of
2
r Pl.
THEOREM 3.3.
If e(f(X ,···,X )}2 <
r
l
(i)
(3.21)
2}
e( szn
00,
then
2
2
2
-1
-1
= r PI + r (r-l) 02 n logn + An
+ o(n -1 )
where A is defined
~
with Al = -1, A2 = 2 (y -
r-2. -1
~i=l~
),
~
= h(h_2)-1 for h = 3,4,··· ,r
and Y = 0.5772 ••• (Euler's constant).
(ii)
Var ( s 2} = r 4b.n -1 + O(n -2 (log n) 2 )
zn
(3.22)
where b. =
~l
PROOF.
+
2(r-l)~2
+ 2(r-l)
To prove (i) let
2
~3.
39
s
~'(2
zn
=
(n-1)
-1 n
2:.
~=r+1
(Z.-U )
~
n
so that s2 = s*2 + r(n-1)-1(U -U )2.
zn
zn
r n
(3.23)
}2
-1 r
r(n-1) -1 e[u-u
= rn Lh=l
r n
2
Now
(rh )
~ + O(n
-2
).
Next
= (n-1)
-1
n
n
[2:.~=r+lvar[z.}
- 22:.~=r+lcov[Z.,u}
~
~
n + (n-r)Var[U n }].
From (2.16) and the corollary to Theorem 2.2, Cov[Z.,U
} = Var[Un }
~
n
for i = r+1, •.• ,n, so that using Lemma 2.11
*2 = (n-1) -1 2:i=r+1Var
n
[Zi
} - (n-1) -1 (n-r)Var [Un
}
(3.24) e [ szn}
r (r) 2
-1 2
-2
= 2:h=l h
~Kn(h,r) - n r P1 + O(n )
where, for n > rand h = 1,2,··· ,r,
Kn(h,r) = (n-1) -1 2:in=r+1 (i-1)-1
h
[(i-2)h+1].
Note that Kn (l,r) = 1 - n
-1
(r-1) + O(n
-2
) and that
K (2,r) = 4,{'n-1) -12:~ +1(i-2) -1 - 2(n-1) -12:~ 1(i-1) -1(i_2) -1.
n
~=r
~=r+
Now, let yn
=
n
.-1
2:;... __ 1~
- log n for n > 2; then limn-+oo yn = y, where
y = 0.5772 •.. is Euler's constant.
Also, notice that
(r-1)
Then Kn (2,r) becomes
-1
- (n-1)
-1
.
40
(3.25) K (2,r)
4n
n
-1
r-2 -1
-1
-1
[y+10g n-L:i=li ] - 2n (r-1)
+ O(n
4n
-1
-2
10gn) + O(n
10gn+n
-1
-1
E:)
n
r-2 -1
-1
-1
[4y-4L:i=li -2(r-1)
] +o(n ).
From the theory of infinite series
L:~
i- 1 (i+1) -1 •.• (i+k) -1 = k-1(k~) -1 (m+k) -1
~=m+1
k
for k = 1,2, ••. and m = 0,1,···.
Notice that the above infinite
-k
series is of order m
for k = 1,2,···.
Then, making use of this
series result, for h = 3,4,···,r, we have that
Kn(h,r) = n
-1
n
h(h~)L:i=r+1 (i-2)
-n
= n
-1
-1
n
<Xl
1 hi
~=r+ -
-1
.•. (i-h)
(h-1)(h~)L:.
(i-l)
~=r+1
h (h : ) L:.
-n
-1
-1
( i +1)
<Xl
(h-1) (h:)L:.
-1
-1
-1
···(i-h)
• . . (i+h - 2)
1hi
~=r+ -.
-1
(i+1)
-1
-1
-2
+O(n)
-1
···(i+h-1)
-1-2
+O(n)
and so
= n
r (r)
h
-1 rL: =3
h
\ (h-2)
-1 [(r-2) h+2] + 0 (n -2 ).
Combining (3.20), (3.23), (3.24), (3.25) and (3.26) gives us (3.21).
To prove (ii) first set
e
= 0, without loss of generality.
Then
41
*2
-1 n
2
s zn = (n-1) 2:.~=r+1Z.~
= An +
(n-1)
-1
(n+r)U
2
-1
+ (n-1) 2rU U
n
r n
B
n
where we have set
(3.27)
A
n
(n-1)
-1 n
2
2:.~=r+lZ,~
and
(3.28)
Therefore
(3.29)
var[s*2} = Var[A } + 2 Cov[A ,B } + Var[B }.
zn
n
n n
n
We now divide the proof of (ii) into two parts.
In Part (a) we
show that Var[A } is given by the expression in (3.22).
In Part (b)
n
we show that Var[B } and Cov[A ,B } are each of order n
n
n n
-2
log n or
higher.
PART (a).
(3.30)
From (3.27)
Var[A } = (n-1) -2 2:.n
IVar [ Z.2}
~=r+
~
n
+ 2(n-1)
~2
n
2:.
i-I
[2 2}
12:.
lCoV Z. ,Z .•
J=r+
~
J
~=r+
By Lemma 2.12
(3.31) (n-1) -2 2:.n
~=r+
-2 n
4
-1
1Var [ Z.2} = (n-l) 2: =r+l[r ~1 + O(i )]
~
i
= r
4
~ln
-1
+ O(n
-2
logn).
42
By Lennna 2.13
(3.32) (n-1)
-2 n
~.
~=r
(n-1)
i-I CoV [2 Z2}
z.~ t J.
J=r+l
+l~'
-2 n
[4 [(r-1)~2+(r-1) 2 ~3]~.-1 + 0(4-1J.-1)}
i-I
~i=r+1~j=r+1 r
4
= r [(r-1) ~2+(r-1)
2
~3]n
L
-1
+ O(n
-2
2
(log n) ).
Putting (3.31) and (3.32) into (3.30) yields
4
(3.33)
Var [A } = r t:.n
n
PART (b).
2
that var[u }
n
-1
+ 0 (n
-2
2
(log n) ).
From Lennna 2.3 (with m = n)t since
O(n
-2
).
(3.34)
In order to prove
Cov[AntBn } = O(n
-2
log n)
it is sufficient t by (3.28)t to show that
(3.36)
Cov[A tU U } = 0(n- 1 )
n
r n
and
(3.37)
= 0t we have
It is then a simple matter to show from
(3.28) that
(3.35)
e
Cov [ An tUn2} = O(n -2 logn).
We now prove (3.36).
First
43
(3.38)
2 2
= cov[U ,U } + Var[U }Var[U } - [Cov[U ,U }]2
r n
r
n
r n
= O(n-1)
by Lemma 2.3 (since
e
= 0), (2.1) and the corollary to Theorem 2.2.
l
Then (3.36) follows from (3.38), the fact that Var[A } = O(n- ) and
n
the Schwarz inequality.
We now prove (3.37).
From (3.27) and (2.24)
2 n
(3.39) Cov[A ,U 2 } = (n-l) -1 r~.
lCoV [2
z.,U 2}
n n
~=r+
~
n
+ (n-l)
where Pi =
-1
~=2 (~)
n
2
2r~.~=r+lCov[z.P.,u}
~ ~
n + (n-l)
zih )
for i = r+l,r+2,'" ,no
-1 n
[2
2}
~.~=r +lCoV P.,U
~
n
Now, using symmetry
and (3.15),
n +lCoV [2
2}
(3.40) (n-l) -1 ~.~=r
z.,U
~
n
(n-l) -1 (n-r) Cov [2
zl' Un2}
= (n-l)-l(n-r) (n) -l~r (r)(n-r) Cov[n-l~? z7 U(c)}.
r
c=o c
r-c
~=l ~' n
Notice that n
-1 n
~.
~=
2
lZ' is a U-statistic, so that, applying Theorem 2.2,
~
we obtain cov[n-l~~ lz7,u(0)}
~=
~
n
= O(n -2 ).
Theorem 2.2 equals zero because q(O)(x)
1
1
(In this case the Sl of
=
0.)
For c = 1,2, ... ,r it
is clear, by the Schwarz inequality and the fact that the variance
44
of a U-statistic is of order n -1
2 (c)} = O(n -1 ).
that Cov [ n -1 ~.n 1z.,U
~=
~
n
Thus
(n-1) -1 ~.n
(3.41)
1CoV [2
z,U 2} = O(n -2 ).
i n
~=r+
In order to prove that
n +lCoV [2
(n-1) -1 ~.~=r
P.~ ,Un2} = O(n -2 log n)
(3.42)
2
recall from (2.26) that var[p7} = O(i- ).
~
The Schwarz inequality
2 2
-1 -1
then implies that Cov[P.,U } = O(i n ) and (3.42» follows.
n
~
To prove
(3.43)
n
[
2}
(n-1) -1 ~.~=r
+lCovz.P.,U
~ ~
n
O(n -2 10gn)
notice from (3.15) that
n
2}
(n-1) -1 ~.~=r
+lCov[z.P.,U
~ ~
n
= (n) -l~r (r)(n-r) (n_1)-1~~
Cov(z.P. U(c)}.
r
c=o c
r-c
~=r+1
~ ~' n
We can therefore establish (3.43) by showing that
(3.44)
Cov[(n-1)
-1 n
~.
~=r
(1) }
-1
+lz.P.,U
= O(n )
~ ~
n
and
(3.45)
n
(O)}
(n-1) -1 ~.~=r
+lCoV ( z.P.,U
~ ~
n
O(n -2 log n).
Now
(3.46)
n
}
}
Var ( (n-1) -1 ~.~=r
+lz.P.
= (n-1) -2 ~.n~=r+lVar [
z.P.
~ ~
~ ~
+ (n-1) -2 2~.n
i-1 1CoV [ z.P.,z.P .}.
J=r+
~ ~
J J
1~'
~=r+
45
Then, from (2.38) and (2.52) in the proof of Letmna 2.13
Cov[z.P.,z.P.}
~
for r < j .:S i.
(3.47)
J J
~
= 0(i- 1 )
Thus, from (3.46),
Var[(n-1)
-1
~.
n
+lz.P.
~ ~
}
~=r
= O(n -1 ).
Therefore (3.44) follows from (3.47), the fact that var[u(l)}
n
= 0(n- 1 )
and the Schwarz inequality.
To prove (3.45) it is sufficient to prove (by (2.28»
(h)
(3.48)
(O)}
Cov[z.V.
U
~ ~-1' n
that
= O(i -1 n -1 )
and
*(h) (0)
Cov[z.W..
}
~ ~~
, Un
(3.49)
for h
= 2,3,"',r
= O(n -2 )
= r+1,r+2, ••• ,n.
and i
Since z. is independent of
~
vi~i, Letmna 2.5 implies that
for h
~
Also, var[U(O)}
2,3,"',r and i = r+1,r+2, ••. ,n.
n
= O(n -2 ),
so that (3.48) follows from the Schwarz inequality.
To prove (3.49) notice that for h
*(2) (O)} -_
Cov [ z.W..,U
~ ~~
n
•
for i
~
J=
~
= r+1,r+2, ••• ,n.
e[z.g(2)(X.,X.)f(X
~
._ 1) -l(n)-l(n-r)-l
r
r
(~
i-11~ (0) e[ z.g (2) (X.,X.)f(X
~.
~
J
a1
=2
J
a1
,"',X
ar
)f(X Q ""'XQ
~1
~r
)
}
But
,"',X
ar
)f(XQ ,"',X Q
~1
~r
)}
=0
except possibly
46
when i appears among (a ,···,ar ) and j appears among
1
or vice versa.
(~l""'~r)'
The number of possibly non-zero terms is
•
2 ( n-2)
r-1 (n-r-1)
r-1
Thus, (3.49) holds for h = 2.
h = 3,4, .•. ,r follow in analogous fashion.
The cases where
This completes the
proof of (3.49).
2
4
We have therefore shown that var[s*2} = r t.n- 1 + 0(n- (10g n)2).
zn
It is a simple matter to see that var[(n-1)-1(U -u )2} and
r n
-2
cov[s*2,(n-1)~1(U -u )2} are each of order n , and so, (3.22) is
zn
r n
finally verified.
Assume e[f(X ,··· ,X )}
1
r
THEOREM 3.4.
to
(J
2
2
= r P1 almost surely
PROOF.
Assume that
e
n ....
~
2
<
00.
Then s
2
converges
zn
00.
= 0, without loss of
genera1it~and
recall
that
(3.50)
s
2
zn
where
*2
-1 n
2
-1
2
-1
(3.51) szn = (n-1) ~i=r+1Zi - (n-1) (n+r)U + (n-1) 2rU U •
r n
n
Clearly r(n-1)
-1
(U -U)
r n
2
converges to 0 almost surely as n ....
00.
By
Hoeffding [9] (or Berk [2]), the second and third terms in (3.51)
each converge to 0 almost surely as n ....
00.
From (3.27) and (2.24)
(3.52)
= (n-1)
"'1 2 n
r~.
2
-1 n
+l z ~. + 2r(n-1) ~.~=r+lz.P.
~ ~
~=r
+ (n-1)
-1 n
~.
2
+l P ~..
~=r
47
Now
e[Z:} =
~
e[g(1)2(X.)}
~
= PI' so that, by the strong law of large
2
numbers, the first term in (3.52) converges to cr
surely as n
almost
Thus, to complete the proof we need only show that
~~.
.
(3.53)
2
= r PI
-1 n
l~m
(n-l)~.
n"'~
2
o
+lP,~
~=r
(a. s.)
and
(3.54)
(a. s.) .
From (2.28)
(r)w~~h) _ ~r (r) (h-l)V~h)
1h=2 h
~~
h=2 h
~-l
P; = )'.r
(3.55)
~
= r+l,r+2,···,n where W~~h) is given by (2.29).
for i
From
~~
Hoeffding [9], V~h) converges almost surely to zero as i ... ~ for
~
h
=
2,3,···,r.
Also, it can be shown, in a proof almost identical
to that on pages 108-110 of Wilks [17] (due to Feller [5]), that
w~~h) converges almost surely to zero as i ... ~ for h
~~
=
2,3,···,r.
2
Then, (3.55) implies that P., and therefore P., converges almost
~
~
surely to zero and hence in Cesaro-mean a.s. - that is, (3.53) holds.
Next, by the Schwarz inequality,
I(n-l) -1 ~.n
(3.56)
so that (3.54) holds.
If r
REMARK 1.
I
-1 n
~.
~=
This completes the proof.
= 1,
then W.
~n
2
2
Also, both sand s
sample mean.
(n-l)
+lz.P.
~ ~
~=r
wn
2
-
l(x.-x).
~
n
= x.,
wn
~
Z. = x. and U =
~
~
n
xn , the
reduce to the sample variance
(For notational convenience, when r
=
1, we
48
assume that f(x) = x.)
2
A few comments about the relative merits of sand
wn
.
f a2=2
. or der. Both estimates are
0
r P are 1n
s 2 as est1mates
zn
l
2
asymptotically unbiased. However, the bias in the case of s
is of
zn
2
lower order than that of swn' that is
REMARK 2.
BIAS(S2 )
zn
-1
= r 2 (r-l) 2 02n -1 logn + An - 1
+ o(n )
whereas the bias of s
2
is of order n
wn
.22
overest1mate a = r Pl.
~l
Notice that s
2
tends to
zn
The variances of the estimates each have
leading terms of order n
possible to compare
-1
-1
and
See (3.14) and (3.22).
~,
as
~2
+
(r-l)~3
It is not
may be either negative
or non-negative depending upon f(xl, ... ,x ) and the c.d.£. F. The
r
2
second order term of var[s2 } is of order n- (log n)2, which compares
zn
2
with n- , the order of the second order term of var[s2}. The one
wn
advantage, and it is an important one, that s
2
zn
has over s
it is by nature more suited for sequential calculation.
2
wn
In s
is that
2
wn
,
each of the Win terms depend upon x ,x 2 ,·.·,x and must therefore
n
l
(in general) be calculated at each stage of the sequential procedure,
whereas, in s2 , Z. depends only on the first i observations and need
zn
1
only be calculated once.
More will be said about the
computatio~ of
2
2
2
and s
at the end of Chapter v. Theoretically, s
appears a
wn
zn
wn
2
2
2
as an estimate of a = r Pl· However, in
little superior to s
zn
s
2
many cases, szn can be calculated with relative ease, and in such
cases, might be preferred over s
2
wn
Thus, the final choice of
estimate depends upon the function f(xl,···,x ).
r
In Chapter V the
sequential procedure is examined with respect to both s2 and s2
wn
zn
49
~. Define ~
3.6
j
= 2,3,'"
f(x ,x )
1 2
=
(when eXistent).
Assume that
(X -X )2/2, so that
1 2
=
e{x 1} and ~j
e
=
e{(x1-~) j} for
=
~4
<
~
e{(x 1-x 2)2/2}
and
~2
~2'
=
> O.
Let
The corres-
ponding U-statistic is
(3.57)
Un
where x
n
n
2
= (n-1) -1 L:.1=
l(x.-x)
= n(n-1) -1 (m2 -m 12 )
1
n
is the sample mean and m.
J
n
j
= n -1 L:.1=
1x.
1
.
for J
= 1,2, ••..
2
2
Next, f 1 (xl) = e{(x 1 -x2 ) /2} = (x1-~) /2 + ~zl2 and
222
2
P1 = 1 = (~4-~2)/4, so that 0 = r P1 = ~4 - ~2'
°
From (2.4) we obtain g(1)(x ) = (X1-~)2/2 - ~2/2 and
1
g
(2)
(x 1 ,x 2 )
=
-(x1-~)(x2-~)'
so that
°2 =
2
~2'
Therefore, it follows
from (2.7) that
var{U }
(3.58)
n
as is well known.
We now present
s~n
and s;n' the estimates of
0
2
=
~4 -
2
~2'
From (2.11)
W.1n
for i
= 1,2,'"
,n, and so, after some manipulation,
(3.59)
3
The factor n (n-1)
properties of s
(3.60)
2
wn
-3
in (3.59) may be omitted without affecting the
to any appreciable extent.
From (3.20)
50
where Z. for i
1.
= 3,4,···
is given by
Z.1. = iU.1. - (i-1)U.1.- 1
(3.61)
and U. is given by (3.57).
1.
In this example s
2
wn
is just as easy to
2
2
calculate sequentially as szn' and so, is to be preferred over s
zn
.
f
2
2
as an est1.mate 0 a = ~4 - ~2.
From (3.13) of Theorem 3.1
(3.62)
which indicates that the first order bias of s
2
wn
From (3.21) of Theorem 3.3
(3.63)
-1
+ o(n )
where Y = 0.5772 •... The dominant term in the bias of s2 is
zn
2 -1
4~2n log n, which is non-negative, and so, as we have already
.
. d , s 2 ten d s to overest1.mate a
not1.ce
zn
n
2
=
~4
-
2
~2.
The term of order
-1 .
1.n (3.63) may be negative or non-negative depending upon the
c.d.f. F, and therefore, may either help to decrease or increase the
bias.
.
.
of s2 an d s 2 assume t h at
To d eterm1.ne
t h e var1.ances
wn
zn
(as well as
~2
> 0).
From the statement of Lemma 2.12
~8
<
=
51
Next, from section 3.2
= e[g(l)(X )g(2)(x X)} =
go (x 1)
2
l' 2
Therefore, by (3.9)
and, by (3.10)
The variance of s
2
wn
is then
(3.64)
The variance of s
Var[s
(3.65)
where ~
=
~1
+
2~2
2
zn
is given by
2
-1
-2
2
} = 16~n + O(n (log n) )
zn
+
2~3·
be either negative or non-negative depending upon the c.d.f. F, and
so, it is 'not possible to compare the first order terms of (3.64)
and (3.65).
For this particular example we. might consider
s
2
2
-1
-1
::: n (n-2) (n-3) [S4
2
(n -3) (n-l)
-2 2
s2]
52
as an estimate of n Var{U } where s. = 2:~ l(x.-x )j for j = 2,3,···.
J
n
~=
~
n
The estimate s2 is motivated by the classical theory of sampling
distributions of sample moments.
Kendall and Stuart [10J.
See, for example, Chapter 12 of
The chief merit of s2 is that it is an
unbiased estimate of n Var {U}.
n
As is evident from (3.59), which
may be written as
2
3
Swn = n (n-1)
2
2
sand s
wn
with s2
wn
-3
2
[s4-s2J,
differ very little (especially for large values of n)
being slightly easier to compute.
concerned with large values of n, we favor s
Since we are mainly
2
wn
over s
2
as an estimate
For further examples of U-statistics see Hoeffding [7J, [8J
and Fraser [6J.
CHAPTER IV
CONTRIBUTIONS TO THE ASYMPTOTIC
THEORY OF U-STATISTICS
4.1
Introduction.
~
The results of this chapter were developed
with the idea of solving the fixed-width sequential confidence
interval problem of Chapter V.
We utilize the H-decomposition intro-
duced in Theorem 2.6, namely
U
n
= 8
+
L;r
h=l
(r) V (h)
h
n
'
to establish a Kolmogorov-like inequality for U-statistics
(Theorem 4.2), and then, to show that, for y < 1/2, nY(U -8) converges
n
almost surely to 0 as n
~ ~
(Theorem 4.4).
In addition, if for each
positive integer s, N is a positive integer-valued random variable,
s
then Theorem 4.5 states that the U-statistic UN
based on
s
x ,x ,···,x
is asymptotically normal as s
l 2
N
s
ditions.
~~,
under certain con-
Theorem 4.5 is used in Chapter V to establish the asymptotic
consistency of the sequential procedure (Theorem 5.2).
4.2
Kolmo orov inequalities.
h = 1,2,.··,r,
s~h)
=
Theorem 2.6 states that for each
(~)v~h) forms a martingale sequence.
This
fact is used to prove
LEMMA 4.1.
Assume that 0 < 6 <
h
~
for
~
the following Kolmogorov-like inequality holds:
h = 1,2, .• · ,r.
for A > 0
Then
54
(4.1)
PROOF.
By Lemma 2.5,
e[s~h)2}:::;: (~) 0h'
Thus, by the
Kolmogorov inequality for martingales (see page 399 of Loeve (11]),
> 0,
for any
€
Putting
€:::;:
AO l/2(n)"
h
h
REMARKS.
1/2 completes the proof of (4.1).
The Kolmogorov-like inequality (4.1) can be applied
directly to show that for any y < h/2, n Yv~h) converges almost surely
to 0 as n
~
See pages 109-110 of Wilks (17].
00.
However, a different
approach leads to stronger convergence results, as we shall see in
Theorem 4.3.
In a somewhat classical approach a proof similar to that
on pages 107-108 of Wilks (17], utilizing Lemmas 2.4 and 2.5, can be
used to establish (4.1).
We now use Lemma 4.1 to derive a Kolmogorov-like inequality for
a U-statistic.
From Theorem 2.6
where we have set S
n
THEOREM 4.2.
°=
r
for A > O.
r
n
-
Assume e[f(Xl,···,X r )}
1/2
L: h=l (r)
h 0h
= ( n ) U for n > r.
.
Then
2
<
00
and
°1 > 0, and let
55
PROOF.
First, note that 0h <
~
for h
= 1,2,···,r
as a conse-
quence of our assumption, Lemma 2.1(i), Lemma 2.7 and the Schwarz
inequality.
Define the events
and
for h = 1,2,···,r.
If each of E ,E ,'"
1 2
,Er
occurs, then for a ~ r
ISa- ( ~) e 1.$ (~) ~~=1 ( ~) (~ ) -11 S;h) I
<
(~)~~=1(~)(~) -lAO~/2(~)
1/2
r
This implies that E E- lJh=lEh' so that, by Lemma 4.1
which completes the proof.
4.3
~~~.
THEOREM 4.3.
The main theorem is
Let (bnJ; be .~ positive increasing sequence of real
numbers with limn-too bn
=~.
If, for
~
h = 1,2,'" ,r, 0 < 0h <
and
(4.3)
then b~ls~h) converges almost surely to 0 as n ~ ~.
~
56
PROOF.
From Lemma 4.1 for any e > 0
(4.4)
Then (4.3), (4.4) and the Borel-Cantelli lemma imply that
(4.5)
(a.s.).
Next, define
for j
=
1,2,··· and Y
n
= s(~)
2J+n
-
s(~) for n = 1,2,···. Then [Yn}~
2J
is a martingale sequence, so that, by the Kolmogorov inequality for
martingales (page 399 of Loeve [11])
(4.6)
P[ T. > eb . } < e-2 b -2.e[ Y. }2 •
J 2J 2J
2J
(4.7)
2 j +l
2j
A little computation shows that ( h
) - (h)
constant 0 < K <
00.
h'
< K2 J for some
Thus (4.3), (4.6), (4.7) and the Borel-Cantelli
lemma imply that
(4.8)
-1
b.T.
J"'oo 2J J
lim.
=0
(a.s.).
Now, for each n, let j be the positive integer such that
57
Then, since [bnJ; is positive increasing,
(4.9)
for n = h,h+1,···.
Combining (4.5), (4.8) and (4.9) completes the
proof of the theorem.
COROLLARY.
If e
(i)
to 0 as
n ...
<\
> lIZ, then n
< GO for
-h/Z
~
(log n)
h = 1,Z,···,r.
-e (h)
converges almost surely
Sn
GO.
If Y < h/Z, then nYV(h) converges almost surely to 0 as
(ii)
n ...
Assume 0 <
n
GO.
If Y < 1, then nYR
(iii)
n
converges almost surely to 0 as n ...
00,
where R is defined bv (Z.6).
n---";;'';;;''';;;=~~
PROOF.
To prove (i) let bn
= nh/Z (log
n)
e
so that, since Ze > 1,
(4.3) becomes
To prove (ii) let b
n
= nh- y •
Then, since h - Zy > 0, (4.3)
becomes
GO
~.
J=
1Z
-j(h-ZY)
< GO.
Thus ny-hS(h) converges almost surely to 0 as n ... GO, which is equivan
lent to (ii).
Part (iii) follows directly from (ii).
REMARK. Theorem 4.3 may be easily generalized to the following result. Let {b_n} be a positive increasing sequence of real numbers with lim_{n→∞} b_n = ∞. If, for n ≥ r, S_n forms a zero-mean martingale sequence such that E{|S_n|^s} = O(n^h) for some s > 1 and some h > 0, and if the analogue of (4.3) holds with exponent s, namely Σ_{j=1}^∞ 2^{jh} b_{2^j}^{-s} < ∞, then b_n^{-1} S_n converges almost surely to 0 as n → ∞.
THEOREM 4.4. Assume E{f(X_1,...,X_r)}² < ∞ and δ_1 > 0. If γ < 1/2, then n^γ (U_n − θ) converges almost surely to 0 as n → ∞.

PROOF. The theorem follows directly from the H-decomposition (2.6) and Corollary (ii) of Theorem 4.3.
4.4 The asymptotic normality of U_N. In this section we extend to U-statistics Anscombe's theorem on the asymptotic normality of averages of a random number of i.i.d. random variables. From Theorem 2.6 recall that the H-decomposition is

U_n = θ + r V_n^{(1)} + R_n,

where R_n = Σ_{h=2}^r (r choose h) V_n^{(h)} and V_n^{(1)} = n^{-1} Σ_{i=1}^n (f_1(X_i) − θ).

THEOREM 4.5. Assume E{f(X_1,...,X_r)}² < ∞ and ρ_1 > 0. Denote the standard normal c.d.f. by Φ(x). Let {n_s} be an increasing sequence of positive integers tending to ∞ as s → ∞, and let {N_s} be a sequence of proper random variables taking on positive integer values with p-lim_{s→∞} n_s^{-1} N_s = 1. Then, with σ² = r² ρ_1,

lim_{s→∞} P{ U_{N_s} − θ ≤ n_s^{-1/2} x σ } = Φ(x).
PROOF. Anscombe [1] introduced the following situation. Let {Y_n} be a sequence of random variables. Assume that there exist a real number θ, a sequence of positive numbers {w_n} and a c.d.f. F(x) such that:

C1. For any x such that F(x) is continuous,

lim_{n→∞} P{ Y_n − θ ≤ x w_n } = F(x).

C2. Given ε > 0 and η > 0, there exist a large ν_{ε,η} and a small c > 0 such that, for any n > ν_{ε,η},

P{ |Y_{n'} − Y_n| < ε w_n for all n' such that |n' − n| < cn } > 1 − η.

Theorem 1 of Anscombe [1] states that if {Y_n} satisfies C1 and C2, then

lim_{s→∞} P{ Y_{N_s} − θ ≤ x w_{n_s} } = F(x)

at all continuity points of F(x). Let C3 be the condition that {w_n} is decreasing, tending to 0 as n → ∞, and lim_{n→∞} w_{n+1}^{-1} w_n = 1. Theorem 3 of Anscombe [1] states that C2 is satisfied if Y_n is the average of n i.i.d. random variables, if C1 and C3 hold, and if F(x) is continuous.
We now apply these results to our situation. Here w_n = n^{-1/2} σ, so that C3 is satisfied. Hoeffding [7] has shown that U_n satisfies C1 with F(x) = Φ(x). (See Remark 2 following Theorem 2.6.) We now show that U_n satisfies C2. First, r V_n^{(1)} satisfies C2 by Theorem 3 of Anscombe [1]. Thus, from the H-decomposition, given ε > 0 and η > 0 there exist a ν_{ε,η} and a c > 0 such that

P{ |U_{n'} − U_n − R_{n'} + R_n| < ε σ n^{-1/2} for all n' such that |n' − n| < cn } > 1 − η

for all n > ν_{ε,η}, and hence

(4.10)  P{ |U_{n'} − U_n| − |R_{n'} − R_n| < ε σ n^{-1/2} for all n' such that |n' − n| < cn } > 1 − η

for all n > ν_{ε,η}. By Corollary (iii) of Theorem 4.3, lim_{n→∞} n^{1/2} R_n = 0 (a.s.). Thus, given ε > 0 and η > 0, there exists an N'_{ε,η} such that

P{ |R_{n'} − R_n| < ε σ n^{-1/2} for n' = n, n+1, ..., n+k } > 1 − η

for all n > N'_{ε,η} and for k = 0,1,.... Let N_{ε,η} = (1 − c)^{-1} N'_{ε,η}. Then n > N_{ε,η} implies that n(1 − c) > N'_{ε,η}. Therefore

(4.11)  P{ |R_{n'} − R_n| < ε σ n^{-1/2} for all n' such that |n' − n| < cn } > 1 − η

for all n > N_{ε,η}. Let ν = max(ν_{ε,η}, N_{ε,η}) and define the events

A = { |U_{n'} − U_n| − |R_{n'} − R_n| < ε σ n^{-1/2} for all n' such that |n' − n| < cn },
B = { |R_{n'} − R_n| < ε σ n^{-1/2} for all n' such that |n' − n| < cn }

and

C = { |U_{n'} − U_n| < 2 ε σ n^{-1/2} for all n' such that |n' − n| < cn }.

Then A ∩ B ⊆ C, and so, from (4.10) and (4.11),

P(C) ≥ P(A ∩ B) ≥ P(A) − P(B^c) > 1 − 2η

for all n > ν. Thus U_n satisfies C2, and the asymptotic normality of U_{N_s} follows from Theorem 1 of Anscombe [1].
The following corollary is a consequence of Theorems 3.2 and 4.5.

COROLLARY. Under the assumptions of Theorem 4.5,

(4.12)  lim_{s→∞} P{ n_s^{1/2} s_wN_s^{-1} (U_{N_s} − θ) ≤ x } = Φ(x),

where s_wN_s is given by (3.12).

REMARKS. As a result of Theorem 3.4, s_wN_s in (4.12) may be replaced by s_zN_s, which is given by (3.20). As a special case of (4.12) it follows that n^{1/2} s_wn^{-1} (U_n − θ) is asymptotically normal with mean 0 and variance 1 as n → ∞. (We may, of course, again substitute s_zn for s_wn.)
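As a quick numerical illustration of the last remark, the following Monte Carlo sketch (not from the thesis) studentizes a variance U-statistic. The kernel f(x,y) = (x − y)²/2 (so r = 2 and θ = Var X_1) and the r = 2 form of the W-based estimate s_wn² are assumptions of the illustration, since (3.12) itself is not reproduced in this excerpt.

    # Monte Carlo sketch: n^{1/2} s_wn^{-1} (U_n - theta) should be roughly N(0,1).
    # Assumed forms (illustration only):
    #   f(x, y) = (x - y)^2 / 2, so U_n is the unbiased sample variance;
    #   W_in = 2 (n-1)^{-1} sum_{j != i} f(x_i, x_j);
    #   s_wn^2 = (n-1)^{-1} sum_i (W_in - mean of the W_in)^2.
    import numpy as np

    rng = np.random.default_rng(0)
    n, reps, theta = 100, 1000, 1.0       # X_i ~ N(0,1), so theta = Var X_1 = 1

    z = np.empty(reps)
    for k in range(reps):
        x = rng.standard_normal(n)
        fsum = np.array([np.sum((x - xi) ** 2) / 2.0 for xi in x])  # sum_j f(x_i, x_j)
        u_n = fsum.sum() / (n * (n - 1))  # U_n: average over all ordered pairs
        w = 2.0 * fsum / (n - 1)          # W_in
        swn2 = np.sum((w - w.mean()) ** 2) / (n - 1)
        z[k] = np.sqrt(n) * (u_n - theta) / np.sqrt(swn2)

    print(f"mean = {z.mean():.3f} (expect ~0), var = {z.var():.3f} (expect ~1)")

For X_i ~ N(0,1) the limiting variance of n^{1/2}(U_n − θ) is σ² = r²ρ_1 = 2, and the studentized values above should be close to standard normal.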
CHAPTER V

SEQUENTIAL FIXED-WIDTH CONFIDENCE INTERVALS FOR REGULAR FUNCTIONALS

5.1 Introduction. Assume that X_1, X_2, ... are i.i.d. random variables. Let f(x_1,...,x_r) be the symmetric kernel of a U-statistic U_n whose expectation is θ. The problem is to find a sequential confidence interval for θ of fixed width 2d, where d > 0, such that the coverage probability either equals, or approaches in some way, a specified α, where 0 < α < 1. The problem was solved by Chow and Robbins [3] for a special U-statistic, the sample mean. To adapt their procedure to deal with a general U-statistic is the raison d'être of Chapter V.

Chow and Robbins [3] use n^{-1} s_n², where s_n² is the sample variance, to estimate the unknown variance of the sample mean. In section 5.2, n^{-1} s_wn² is used as an estimate of the unknown variance of U_n, and in section 5.3, n^{-1} s_zn² is used.

The sequential procedure may be simply described as follows: at each stage of sampling the U-statistic U_n and an estimate of its variance are calculated, and sampling is terminated as soon as the approximate coverage probability for the interval [U_n − d, U_n + d], based on a normal approximation, is at least α. (A sketch of this sampling loop appears at the end of this section.) It is shown that the coverage probability is, in a certain sense, asymptotically α; that is, the sequential procedures are consistent (Theorem 5.2). It is also shown that the expected sample size of the procedures is asymptotically equal to the sample size of the corresponding non-sequential scheme used when the variance of the U-statistic is known (Theorems 5.3 and 5.8); that is, the sequential procedures are efficient.

In section 5.4 the procedures are illustrated with the estimation of (1) the variance of X_1, and (2) the probability of concordance for bivariate X_1.
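The following is a minimal sketch of the sampling loop just described, under stated assumptions: `draw` yields one observation, and `u_stat` and `var_est` compute U_k and a consistent estimate of σ² = r²ρ_1 (for instance s_wk² or s_zk² of the next two sections); these names, and the initial sample size k0, are placeholders of the illustration, not thesis notation.

    # Hedged sketch of the stopping rule (5.1) of the next section: stop at the
    # first k >= k0 with  s_k^2 <= k d^2 / a_k^2,  i.e. as soon as the
    # normal-approximation coverage of [U_k - d, U_k + d] reaches alpha.
    # Here a_k is taken to be the constant a with Phi(a) - Phi(-a) = alpha.
    from statistics import NormalDist

    def sequential_interval(draw, u_stat, var_est, d, alpha=0.95, k0=10):
        a = NormalDist().inv_cdf((1.0 + alpha) / 2.0)   # the constant "a"
        xs = [draw() for _ in range(k0)]                # initial sample
        while True:
            k = len(xs)
            if var_est(xs) <= k * d * d / (a * a):      # stopping rule (5.1)
                u = u_stat(xs)
                return u - d, u + d, k                  # interval I_N, and N(d)
            xs.append(draw())                           # otherwise sample again

Theorems 5.2, 5.3 and 5.8 below say, respectively, that the coverage of the returned interval tends to α and that the expected stopping time is asymptotically the optimal fixed sample size a²σ²d^{-2} as d → 0.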
5.2 The sequential procedure using s_wn². For 0 < α < 1, define a (> 0) by

(2π)^{-1/2} ∫_{-a}^{a} exp(−u²/2) du = α.

Let {a_n} be a sequence of positive real numbers such that lim_{n→∞} a_n = a. For d > 0, define the stopping variable

(5.1)  N(d) = smallest integer k ≥ r such that s_wk² ≤ k d² a_k^{-2}.

Define a closed confidence interval I_N = [U_N − d, U_N + d] of width 2d. Notice that N and I_N have properties similar to those stated in the theorem appearing in Chow and Robbins [3].
LEMMA 5.1. Assume E{f(X_1,...,X_r)}² < ∞ and ρ_1 > 0. Then

(i) N(d) is well-defined and is a non-increasing function of d,

(ii) lim_{d→0} N(d) = ∞ (a.s.),

(iii) lim_{d→0} E{N(d)} = ∞, and

(iv) lim_{d→0} a^{-2} σ^{-2} d² N(d) = 1 (a.s.).

PROOF. Recall from Theorem 3.2 that lim_{n→∞} s_wn² = σ² (a.s.). Let Y_n = σ^{-2} s_wn², f(n) = a_n^{-2} n a² and t = d^{-2} a² σ². Then all parts of the lemma follow from Lemma 1 of Chow and Robbins [3].
THEOREM 5.2. lim_{d→0} P{θ ∈ I_N} = α.

Hence, for sufficiently short intervals, the coverage probability is approximately equal to α.

PROOF. Let t = d^{-2} a² σ² and let N_t be defined by (5.1) with d replaced by t^{-1/2} a σ. (Note that N_t = N(d).) Refer to Theorem 4.5 and identify N_t with N_s and t with n_s; by Lemma 5.1(iv), t^{-1} N_t → 1 (a.s.) as t → ∞. Then

P{θ ∈ I_N} = P{ |U_{N_t} − θ| < d } = P{ |U_{N_t} − θ| < t^{-1/2} a σ } → Φ(a) − Φ(−a) = α,

and so the theorem follows.
THEOREM 5.3.

(5.2)  lim_{d→0} d² a^{-2} σ^{-2} E{N(d)} = 1.

Before we tackle the proof of Theorem 5.3 we establish a series of four lemmas, which are required in the proof.
LEMMA 5.4. Let X_1, X_2, ... be i.i.d. random variables and Y_n a function of X_1, X_2, ..., X_n. For each t > 0, let N_t be a positive integer-valued random variable depending on (X_1, X_2, ...) such that the event {N_t = n} is in B_n, the σ-field generated by {X_1, X_2, ..., X_n}, for n = 1,2,... (i.e., N_t is a stopping variable). If lim_{n→∞} Y_n = θ (a.s.) and lim_{t→∞} N_t = ∞ (a.s.), then lim_{t→∞} Y_{N_t} = θ (a.s.).

PROOF. Define the events

A = { (X_1, X_2, ...) | lim_{n→∞} Y_n = θ },
B = { (X_1, X_2, ...) | lim_{t→∞} N_t = ∞ }

and

C = { (X_1, X_2, ...) | lim_{t→∞} Y_{N_t} = θ }.

Then P(A) = P(B) = 1, which implies that P(A ∩ B) = 1. But it can easily be shown that A ∩ B ⊆ C. Thus P(C) = 1.

LEMMA 5.5. If E{|f(X_1,...,X_r)|} < ∞, then {U_n}_r^∞ is a reverse martingale.
PROOF. The proof appears in Berk [2] but, because of its simple nature and the fact that it is referred to several times in the ensuing pages, is repeated here. Let n > m ≥ r and let (α_1,...,α_r) be any r-combination from {1,2,...,n}. Then

E{ f(X_{α_1},...,X_{α_r}) | U_n, U_{n+1}, ... } = E{ f(X_1,...,X_r) | U_n, U_{n+1}, ... } = q  (say).

Sum over all (n choose r) combinations and obtain

Σ^{(n,r)} E{ f(X_{α_1},...,X_{α_r}) | U_n, U_{n+1}, ... } = (n choose r) q,

so that E{U_n | U_n, U_{n+1}, ...} = q. That is, U_n = q (a.s.). Next, sum over all (m choose r) combinations from {1,2,...,m} and obtain E{U_m | U_n, U_{n+1}, ...} = q. Thus E{U_m | U_n, U_{n+1}, ...} = U_n (a.s.), and {U_n}_r^∞ is a reverse martingale.
LEMMA 5.6. If E{|f(X_1,...,X_r)|} < ∞ then, for every ε > 0,

(5.3)  E{ sup_{n≥r} n^{-(r+ε)} |S_n| } < ∞,

where S_n = Σ^{(n,r)} f(X_{α_1},...,X_{α_r}).

PROOF. This method of proof by truncation is similar to that in Hoeffding [9] and Sen [13]; also see Siegmund [14]. Define

f'(x_{α_1},...,x_{α_r}) = f(x_{α_1},...,x_{α_r})  if |f(x_{α_1},...,x_{α_r})| ≤ (max_j α_j)^{ε/2},
                        = 0  otherwise,

and f''(x_{α_1},...,x_{α_r}) = f(x_{α_1},...,x_{α_r}) − f'(x_{α_1},...,x_{α_r}). Then set

S'_n = Σ^{(n,r)} f'(x_{α_1},...,x_{α_r})  and  S''_n = Σ^{(n,r)} f''(x_{α_1},...,x_{α_r}).

(a) To prove E{ sup_n n^{-(r+ε)} |S'_n| } < ∞, note that

sup_n n^{-(r+ε)} |S'_n| ≤ sup_n n^{-(r+ε)} Σ^{(n,r)} (max_j α_j)^{ε/2}
                        ≤ sup_n Σ^{(n,r)} (max_j α_j)^{-r-ε/2}
                        ≤ sup_n Σ_{j=r}^n (j−1 choose r−1) j^{-r-ε/2}
                        ≤ Σ_{j=r}^∞ j^{-1-ε/2} < ∞,

since max_j α_j ≤ n and the number of r-combinations with max_j α_j = j is (j−1 choose r−1).

(b) To prove E{ sup_n n^{-(r+ε)} |S''_n| } < ∞, note that

E{ sup_n n^{-(r+ε)} |S''_n| } ≤ E{ sup_n n^{-(r+ε)} Σ^{(n,r)} |f''(X_{α_1},...,X_{α_r})| }
 = E{ sup_n n^{-(r+ε)} Σ_{j=r}^n Σ^{(j-1,r-1)} |f''(X_j, X_{α_2},...,X_{α_r})| }
 ≤ E{ Σ_{j=r}^∞ j^{-(r+ε)} Σ^{(j-1,r-1)} |f''(X_j, X_{α_2},...,X_{α_r})| }
 = Σ_{j=r}^∞ j^{-(r+ε)} Σ^{(j-1,r-1)} E{ |f''(X_j, X_{α_2},...,X_{α_r})| }
 = Σ_{j=r}^∞ j^{-(r+ε)} (j−1 choose r−1) ∫_{[|f(x_1,...,x_r)| > j^{ε/2}]} |f(x_1,...,x_r)| Π_{i=1}^r dF(x_i)
 ≤ b_r Σ_{j=r}^∞ j^{-1-ε} < ∞,

where we have set

b_j = ∫_{[|f(x_1,...,x_r)| > j^{ε/2}]} |f(x_1,...,x_r)| Π_{i=1}^r dF(x_i)

for j = r, r+1, ..., so that b_j ≥ 0 and b_j ≥ b_{j+1}.

(c) Finally, we have S_n = S'_n + S''_n and

E{ sup_n n^{-(r+ε)} |S_n| } ≤ E{ sup_n n^{-(r+ε)} |S'_n| } + E{ sup_n n^{-(r+ε)} |S''_n| },

which, along with (a) and (b), proves the lemma.
A positive integer-valued random variable M depending on (X_1, X_2, ...) such that, for n = 1,2,..., the event {M = n} is in B_n^*, the σ-field generated by {X_n, X_{n+1}, ...}, is called a "reverse stopping variable". The following lemma appears in Simons [15] and follows from Theorem 2.2 on page 302 of Doob [4].

LEMMA 5.7. Let Z_{-m_2}, ..., Z_{-m_1}, where −∞ ≤ −m_2 < −m_1, be a martingale with E{Z_{-m_1}}² < ∞, and let M be a reverse stopping variable with P{m_1 ≤ M ≤ m_2} = 1. Then E{Z_{-M}} = E{Z_{-m_1}}.
PROOF OF THEOREM 5.3. (a) As in Simons [15], define a reverse stopping variable for d > 0 by

(5.4)  M = the last integer n ≥ n_0 such that s_wn² > n d² a_n^{-2}, if there is such an n;
         = n_0 − 1, if s_wn² ≤ n d² a_n^{-2} for all n ≥ n_0;
         = ∞, if s_wn² > n d² a_n^{-2} infinitely often;

where n_0 ≥ r + 1. Let I represent the indicator function and define t and N_t as in the proof of Theorem 5.2. Then for every t > 0

N_t ≤ d^{-2} a_M² s_wM² I[M ≥ n_0] + n_0 ≤ t a^{-2} σ^{-2} a_M² s_wM² + n_0.

Thus, for every t > 0,

(5.5)  t^{-1} E{N_t} ≤ a^{-2} σ^{-2} E{ a_M² s_wM² } + t^{-1} n_0.

(b) We next show that lim_{t→∞} E{s_wM²} = σ². From expression (3.18) in the proof of Theorem 3.1,

(5.6)  s_wn² = r² (U_n^{(1)} − U_n^{(0)}) + Σ_{c=0}^r a_n(c) U_n^{(c)}.

Define Z_{-n}^{(c)} = U_n^{(c)} and Z_{-∞}^{(c)} = lim_{n→∞} Z_{-n}^{(c)} for c = 0,1,...,r. Then lim_{n→∞} U_n^{(c)} = ρ_c + θ² (a.s.) for c = 0,1,...,r. (Recall that ρ_0 = 0.) By Lemma 5.5, {Z_{-∞}^{(c)}, ..., Z_{-n_0+1}^{(c)}} is a martingale. Therefore, from Lemma 5.7 with m_1 = n_0 − 1 and m_2 = ∞, we obtain

(5.7)  E{U_M^{(c)}} = E{U_{n_0-1}^{(c)}} = ρ_c + θ²  for c = 0,1,...,r.

In particular, E{U_M^{(1)}} = ρ_1 + θ² and E{U_M^{(0)}} = θ².

From (5.1) and (5.4) note that, for every t > 0, N_t ≤ M + 1, so that, as a consequence of Lemma 5.1(ii), lim_{t→∞} M = ∞ (a.s.). Lemma 5.4 then implies that lim_{t→∞} U_M^{(c)} = ρ_c + θ² (a.s.) for c = 0,1,...,r.

(c) Now a_n(c) = O(n^{-1}) for c = 0,1,...,r, so that lim_{t→∞} a_M(c) U_M^{(c)} = 0 (a.s.) for c = 0,1,...,r. Furthermore, by Lemma 5.6, E{ sup_n a_n(c) |U_n^{(c)}| } < ∞ for c = 0,1,...,r. We then use the Lebesgue dominated convergence theorem to obtain

(5.8)  lim_{t→∞} Σ_{c=0}^r E{ a_M(c) U_M^{(c)} } = 0.

Finally, from (5.6), (5.7) and (5.8) we conclude that

(5.9)  lim_{t→∞} E{ s_wM² } = σ².
Guided by Pratt [12], we next show that

(5.10)  lim_{t→∞} E{ a_M² s_wM² } = a² σ².

From Lemma 5.4 it follows that lim_{t→∞} s_wM² = σ² (a.s.) and lim_{t→∞} a_M² s_wM² = a² σ² (a.s.). Now, let A = inf_n a_n² and B = sup_n a_n². Then, for every t > 0, A s_wM² ≤ a_M² s_wM² ≤ B s_wM². Thus 0 ≤ a_M² s_wM² − A s_wM² and, by Fatou's lemma together with (5.9),

(5.11)  lim inf_{t→∞} E{ a_M² s_wM² } ≥ a² σ².

Also,

0 ≤ B s_wM² − a_M² s_wM²

and, by invoking Fatou's lemma once more,

(5.12)  B σ² − lim sup_{t→∞} E{ a_M² s_wM² } ≥ B σ² − a² σ².

Then (5.10) follows from (5.11) and (5.12).

(d) We conclude from (5.5) and (5.10) that lim sup_{t→∞} t^{-1} E{N_t} ≤ 1. However, Fatou's lemma implies that lim inf_{t→∞} t^{-1} E{N_t} ≥ 1. This completes the proof of Theorem 5.3.

5.3 The sequential procedure using s_zn². The results of section 5.2 also hold if s_zn² is used as an estimate of σ². Throughout this section N(d) is defined by (5.1) with s_zk² substituted for s_wk². Results analogous to Lemma 5.1 and Theorem 5.2 follow immediately. We now consider the analog of Theorem 5.3.
THEOREM 5.8. Assume E{f(X_1,...,X_r)}² < ∞ and ρ_1 > 0. Then

(5.13)  lim_{d→0} d² a^{-2} σ^{-2} E{N(d)} = 1.

PROOF. (a) Examine the proof of Theorem 5.3. It is clear that, in analogy to (5.5), for every t > 0,

(5.14)  t^{-1} E{N_t} ≤ a^{-2} σ^{-2} E{ a_M² s_zM² } + t^{-1} n_0,

where t = d^{-2} a² σ². In order to establish (5.13), it is sufficient to prove

(5.15)  lim_{t→∞} E{ s_zM² } = σ².

For, assume for the moment that (5.15) is true. Then it is easy to see that Part (c) of the proof of Theorem 5.3 can be applied, as it stands, to prove that

(5.16)  lim_{t→∞} E{ a_M² s_zM² } = a² σ².

As a result of (5.14), (5.16) and Part (d) of the proof of Theorem 5.3, it follows that (5.13) is true.

We therefore begin the proof of (5.15). First, set θ = 0, without loss of generality. Then, from the proof of Theorem 3.3,

s_zn² = s*_zn² + r (n−1)^{-1} (U_r − U_n)²

and

s*_zn² = (n−1)^{-1} Σ_{i=r+1}^n Z_i² − (n−1)^{-1} (n+r) U_n² + (n−1)^{-1} 2r U_r U_n.

We establish (5.15) by proving each of the following four statements:
(5.17)  lim_{t→∞} E{ (M−1)^{-1} Σ_{i=r+1}^M Z_i² } = σ²,

(5.18)  lim_{t→∞} E{ (M−1)^{-1} (M+r) U_M² } = 0,

(5.19)  lim_{t→∞} E{ (M−1)^{-1} U_r U_M } = 0,

and

(5.20)  lim_{t→∞} E{ (M−1)^{-1} (U_r − U_M)² } = 0.
(b) Proof of (5.17). From the proof of Lemma 2.12 recall that

(5.21)  Z_i² = r² z_i² + 2r z_i P_i + P_i²,

where P_i = Σ_{h=2}^r (r choose h) Z_i^{(h)} for i = r+1,...,n, and

(5.22)  P_i² ≤ (r−1) Σ_{h=2}^r (r choose h)² Z_i^{(h)2}.

Now write

(5.23)  E{ (M−1)^{-1} Σ_{i=r+1}^M r² z_i² } = E{ (M−r)^{-1} Σ_{i=r+1}^M r² z_i² } + E{ b(M) Σ_{i=r+1}^M r² z_i² },

where b(M) = O(M^{-2}). Clearly, (n−r)^{-1} Σ_{i=r+1}^n z_i² is a reverse martingale, so that, by Lemma 5.7,

(5.24)  E{ (M−r)^{-1} Σ_{i=r+1}^M r² z_i² } = σ².

Recall that lim_{t→∞} M = ∞ (a.s.). Therefore, from Lemmas 5.4 and 5.6 and the Lebesgue dominated convergence theorem, we obtain

(5.25)  lim_{t→∞} E{ b(M) Σ_{i=r+1}^M r² z_i² } = 0.

Putting (5.24) and (5.25) into (5.23) yields

(5.26)  lim_{t→∞} E{ (M−1)^{-1} Σ_{i=r+1}^M r² z_i² } = σ².
Because of (5.21) and (5.26), in order to prove (5.17) we need only prove

(5.27)  lim_{t→∞} E{ (M−1)^{-1} Σ_{i=r+1}^M P_i² } = 0

and

(5.28)  lim_{t→∞} E{ (M−1)^{-1} Σ_{i=r+1}^M z_i P_i } = 0.

But, by the Schwarz inequality (for both the summation and the expectation),

(5.29)  | E{ (M−1)^{-1} Σ_{i=r+1}^M z_i P_i } |
        ≤ E{ ((M−1)^{-1} Σ_{i=r+1}^M z_i²)^{1/2} ((M−1)^{-1} Σ_{i=r+1}^M P_i²)^{1/2} }
        ≤ [ E{ (M−1)^{-1} Σ_{i=r+1}^M z_i² } E{ (M−1)^{-1} Σ_{i=r+1}^M P_i² } ]^{1/2}.

From (5.29) and (5.26), notice that (5.27) implies (5.28).
We now prove (5.27). From the proof of Lemma 2.12 recall that

(5.30)  Z_i^{(h)} = W_ii^{*(h)} − (h−1) V_{i-1}^{(h)},

so that

(5.31)  Z_i^{(h)2} ≤ 2 W_ii^{*(h)2} + 2 (h−1)² V_{i-1}^{(h)2}

for h = 2,3,...,r and i = r+1,...,n. Then, because of (5.22), (5.31) and an argument similar to (5.29), in order to prove (5.27) it is sufficient to prove

(5.32)  lim_{t→∞} E{ (M−1)^{-1} Σ_{i=r+1}^M W_ii^{*(h)2} } = 0

and

(5.33)  lim_{t→∞} E{ (M−1)^{-1} Σ_{i=r+1}^M V_{i-1}^{(h)2} } = 0

for h = 2,3,...,r.
To prove (5.32), for h = 2,3,...,r; c = 0,1,...,h−1 and i = r+1,...,n, define

(5.34)  V_i^{(h,c)} = [ (i−1 choose h−1) (h−1 choose c) (i−h choose h−1−c) ]^{-1} Σ^{(c)} g^{(h)}(x_i, x_{α_2},...,x_{α_h}) g^{(h)}(x_i, x_{β_2},...,x_{β_h}),

where the summation Σ^{(c)} is over all combinations (α_2,...,α_h) and (β_2,...,β_h), each chosen from {1,2,...,i−1}, such that there are exactly c integers in common. (Compare (5.34) with (2.3), which defines the U_n^{(c)}'s.) Then, in analogy to (3.15),

(5.35)  W_ii^{*(h)2} = (i−1 choose h−1)^{-1} Σ_{c=0}^{h-1} (h−1 choose c) (i−h choose h−1−c) V_i^{(h,c)}

for h = 2,3,...,r and i = r+1,...,n. Because of (5.35), (5.32) becomes

(5.36)  lim_{t→∞} E{ (M−1)^{-1} Σ_{i=r+1}^M (i−1 choose h−1)^{-1} (i−h choose h−1) V_i^{(h,0)} }
        + lim_{t→∞} Σ_{c=1}^{h-1} (h−1 choose c) E{ (M−1)^{-1} Σ_{i=r+1}^M (i−1 choose h−1)^{-1} (i−h choose h−1−c) V_i^{(h,c)} } = 0

for h = 2,3,...,r, which is to be proved.
We now examine the first term of (5.36), which may be written as

(5.37)  lim_{t→∞} E{ (M−1)^{-1} Σ_{i=r+1}^M V_i^{(h,0)} } + lim_{t→∞} E{ (M−1)^{-1} Σ_{i=r+1}^M a(h,i) V_i^{(h,0)} },

where a(h,i) = O(i^{-1}) for h = 2,3,...,r. Notice that the first term of (5.37) is equal to

(5.38)  lim_{t→∞} E{ (M−r)^{-1} Σ_{i=r+1}^M V_i^{(h,0)} } + lim_{t→∞} E{ b(M) Σ_{i=r+1}^M V_i^{(h,0)} }

for h = 2,3,...,r, where b(M) = O(M^{-2}). By Lemma 2.4, E{V_i^{(h,0)}} = 0 for h = 2,3,...,r and i = r+1,...,n. Also, (n−r)^{-1} Σ_{i=r+1}^n V_i^{(h,0)} is a reverse martingale (the proof is similar to that of Lemma 5.5), so that, by Lemma 5.7,

(5.39)  lim_{t→∞} E{ (M−r)^{-1} Σ_{i=r+1}^M V_i^{(h,0)} } = 0

for h = 2,3,...,r. Next, Theorem 4.3 can be adapted to show that lim_{n→∞} b(n) Σ_{i=r+1}^n V_i^{(h,0)} = 0 (a.s.), and Lemma 5.6 can be adapted to show that E{ sup_n b(n) Σ_{i=r+1}^n |V_i^{(h,0)}| } < ∞, for h = 2,3,...,r. Thus, from the Lebesgue dominated convergence theorem,

(5.40)  lim_{t→∞} E{ b(M) Σ_{i=r+1}^M V_i^{(h,0)} } = 0

for h = 2,3,...,r. From (5.38), (5.39) and (5.40) we find that the first term of (5.37) equals zero. In a similar fashion it is possible to show that the second term of (5.37), and hence the first term of (5.36), equals zero.
We now examine the second term of (5.36). Again, Theorem 4.3 can be adapted to show that

(5.41)  lim_{n→∞} (n−1)^{-1} Σ_{i=r+1}^n i^{-1} V_i^{(h,c)} = 0  (a.s.),

and a proof similar to that of Lemma 5.6 demonstrates that

(5.42)  E{ sup_n (n−1)^{-1} Σ_{i=r+1}^n i^{-1} |V_i^{(h,c)}| } < ∞

for h = 2,3,...,r and c = 0,1,...,h−1. Then (5.41), (5.42) and the Lebesgue dominated convergence theorem combine to show that the second term of (5.36) equals zero. This completes the proof of (5.32). The proof of (5.33) is similar to the proof of (5.32) and is therefore omitted. We have thus established (5.27), and hence (5.17).
(c) Proof of (5.18), (5.19) and (5.20). From the proof of Theorem 3.1 recall that

(5.43)  U_n² = (n choose r)^{-1} Σ_{c=0}^r (r choose c) (n−r choose r−c) U_n^{(c)},

with U_n^{(c)} given by (2.3). Notice that (n choose r)^{-1} (r choose c) (n−r choose r−c) = O(n^{-c}) for c = 0,1,...,r. For c = 0, by Lemma 5.7, E{U_M^{(0)}} = 0. For c = 1,2,...,r, by Lemma 5.6,

(5.44)  E{ sup_n n^{-c} |U_n^{(c)}| } < ∞.

Also, for c = 1,2,...,r,

(5.45)  lim_{t→∞} M^{-c} U_M^{(c)} = 0  (a.s.).

Thus, by (5.43), (5.44), (5.45) and the Lebesgue dominated convergence theorem, (5.18) holds. Both (5.19) and (5.20) can be easily proved using the Schwarz inequality. This completes the proof of (5.15) and the theorem.
5.4 Examples.

EXAMPLE 1. We continue our discussion of the example of section 3.6, in which θ = Var{X_1}. To be specific, let a_k = a for k = 2,3,..., although any positive sequence {a_k} such that lim_{k→∞} a_k = a would do, since we are only investigating asymptotic behavior. From (5.1) define, for d > 0,

(5.46)  N(d) = smallest integer k ≥ 2 such that s_wk² ≤ k d² a^{-2},

where s_wk² is given by (3.59). Then I_N = [U_N − d, U_N + d] is a sequential confidence interval for θ = Var{X_1} having width equal to 2d and coverage probability approximately equal to α, for small values of d. The sequential procedure is asymptotically efficient in the sense that (5.2) holds. In this case s_wn² is not difficult to calculate sequentially, as it depends only on the first four sample moments.

We could also define N(d) by (5.46) with s_wk² replaced by s_zk², where s_zk² is computed using (3.57), (3.61) and (3.60), in that order. However, for this example, the procedure using s_wk² is to be preferred.

Note, incidentally, that the sequential procedure is invariant under a location shift. (A worked sketch of this example appears below.)
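The following is a hedged, worked sketch of Example 1 (not thesis code). Since (3.59) is not reproduced in this excerpt, s_wn² is taken in the r = 2 W-form used in Example 2 below, namely s_wn² = (n−1)^{-1} Σ_i (W_in − W̄_n)² with W_in = (n−1)^{-1} Σ_{j≠i} (x_i − x_j)²; this form, and the choice of initial sample size k0, are assumptions of the illustration.

    # Hedged sketch of Example 1: fixed-width interval for theta = Var(X_1)
    # via the kernel f(x, y) = (x - y)^2 / 2, with stopping rule (5.46).
    import numpy as np
    from statistics import NormalDist

    def example1_interval(draw, d, alpha=0.95, k0=4):
        a = NormalDist().inv_cdf((1.0 + alpha) / 2.0)   # a_k = a for all k
        xs = [draw() for _ in range(k0)]
        while True:
            x = np.asarray(xs)
            n = len(x)
            u_n = x.var(ddof=1)                         # U_n, the unbiased sample variance
            w = np.array([np.sum((xi - x) ** 2) for xi in x]) / (n - 1)   # W_in
            swn2 = np.sum((w - w.mean()) ** 2) / (n - 1)                  # s_wn^2
            if swn2 <= n * d * d / (a * a):             # rule (5.46)
                return u_n - d, u_n + d, n
            xs.append(draw())

    rng = np.random.default_rng(1)
    lo, hi, n_stop = example1_interval(rng.standard_normal, d=0.25)
    print(f"I_N = [{lo:.3f}, {hi:.3f}],  N(d) = {n_stop}")

Observe that the sketch depends on the data only through the differences x_i − x_j, which exhibits the location-shift invariance remarked above; the O(n) recomputation of W_in at each stage could be replaced by an update of the first four sample moments, as the text notes.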
EXAMPLE 2. Suppose that x_1 = (x_1^{(1)}, x_1^{(2)}), ..., x_n = (x_n^{(1)}, x_n^{(2)}) is a bivariate random sample of a random variable X = (X^{(1)}, X^{(2)}) with continuous marginal distribution functions. Let

s(u) = −1 if u < 0;  0 if u = 0;  +1 if u > 0;

and f(x_1, x_2) = s(x_1^{(1)} − x_2^{(1)}) s(x_1^{(2)} − x_2^{(2)}). The corresponding U-statistic is

(5.47)  U_n = (n choose 2)^{-1} Σ_{1≤i<j≤n} s(x_i^{(1)} − x_j^{(1)}) s(x_i^{(2)} − x_j^{(2)}),

and is referred to as the difference sign covariance of the sample. See Hoeffding [7].
Two points x_1 and x_2 are said to be concordant if s(x_1^{(1)} − x_2^{(1)}) s(x_1^{(2)} − x_2^{(2)}) = +1 and are discordant if s(x_1^{(1)} − x_2^{(1)}) s(x_1^{(2)} − x_2^{(2)}) = −1. Let

(5.48)  π = P{X_1 and X_2 are concordant} = P{ (X_1^{(1)} − X_2^{(1)})(X_1^{(2)} − X_2^{(2)}) > 0 }.

Then the expectation of the U-statistic is

θ = E{ s(X_1^{(1)} − X_2^{(1)}) s(X_1^{(2)} − X_2^{(2)}) } = 2π − 1.
Now, let C_n equal the number of concordant pairs among x_1, x_2, ..., x_n. Then (5.47) becomes

(5.49)  U_n = 4 n^{-1} (n−1)^{-1} C_n − 1.

Next, for i = 1,2,...,n, let T_in equal the number of points among x_1, ..., x_n that are concordant with x_i. Then Σ_{i=1}^n T_in = 2 C_n and C_n = Σ_{i=2}^n T_ii, so that C_{n+1} = C_n + T_{n+1,n+1}. To determine s_wn², notice that W_in = 4 (n−1)^{-1} T_in − 2, and so

W_in − W̄_n = 4 (n−1)^{-1} T_in − 8 n^{-1} (n−1)^{-1} C_n

for i = 1,2,...,n, where W̄_n denotes the mean of the W_in's. Thus, after some rearrangement,

(5.50)  s_wn² = 16 (n−1)^{-3} [ Σ_{i=1}^n T_in² − 4 n^{-1} C_n² ].
Define

a_i(n+1) = 1 if x_i and x_{n+1} are concordant, and a_i(n+1) = 0 otherwise,

for i = 1,2,...,n. Then T_{i,n+1} = T_in + a_i(n+1) for i = 1,2,...,n. Notice that the T_in's may be arranged in a triangle as follows:

T_12  T_22
T_13  T_23  T_33
 ...   ...   ...
T_1n  T_2n  T_3n  ...  T_nn
Suppose that the observations x_1, x_2, ..., x_n have been taken and that C_n, T_1n, ..., T_nn have been determined numerically. Now, if a further observation x_{n+1} is taken, then T_{n+1,n+1} can be determined either by plotting x_{n+1} and comparing with x_1, x_2, ..., x_n, or otherwise. Then compute C_{n+1} from C_{n+1} = C_n + T_{n+1,n+1}. The T_{i,n+1}'s are determined, for i = 1,2,...,n, from the last row of the above triangle, and finally, s_{w,n+1}² is given by (5.50).
The estimate s_zn² is given by

(5.51)  s_zn² = (n−1)^{-1} [ Σ_{i=3}^n Z_i² − n U_n² + 2 U_n² ],

where, for n > 2,

(5.52)  Z_n = n U_n − (n−1) U_{n-1}.

Suppose that C_n, U_n and Σ_{i=3}^n Z_i² are known numerically and a further observation x_{n+1} is taken. Determine T_{n+1,n+1} and C_{n+1} as before. Compute U_{n+1} from (5.49) and Z_{n+1} from (5.52). Then, finally, s_{z,n+1}² can be computed from (5.51), as in the sketch below.
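A small sketch of this one-step update follows, under the assumption that (5.51), as reconstructed above, is the combining formula; the state carried between steps is n, C_n, U_n and the running sum of the Z_i².

    # Hedged sketch of the s_z update order just described (r = 2):
    # T_{n+1,n+1} -> C_{n+1} -> U_{n+1} by (5.49) -> Z_{n+1} by (5.52)
    # -> s_{z,n+1}^2 by (5.51).
    def sz_update(n, c_n, u_n, sum_z2, t_new):
        """Advance from n to n+1; t_new is T_{n+1,n+1}."""
        c_next = c_n + t_new                           # C_{n+1} = C_n + T_{n+1,n+1}
        m = n + 1
        u_next = 4.0 * c_next / (m * (m - 1)) - 1.0    # (5.49)
        z_next = m * u_next - n * u_n                  # (5.52)
        sum_z2 += z_next ** 2                          # running sum of Z_i^2, i >= 3
        sz2 = (sum_z2 - m * u_next ** 2 + 2.0 * u_next ** 2) / (m - 1)   # (5.51)
        return m, c_next, u_next, sum_z2, sz2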
Define N(d) using either s_wn² or s_zn² as an estimate of σ². Then I_N = [U_N − d, U_N + d] is a sequential confidence interval for θ = 2π − 1 having fixed width equal to 2d and coverage probability approximately equal to α, for small values of d. (A worked sketch of the whole procedure follows.)
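Finally, a hedged, worked sketch of the whole of Example 2 by the s_wn² route: the triangle of T_in's is carried as a single list (its last row), and C_n, U_n and s_wn² are updated by the recursions above. The names `draw_pair`, the guard n0 on the first few stages, and the bivariate model used in the demonstration are assumptions of the illustration, not part of the thesis.

    # Hedged sketch of Example 2: sequential fixed-width interval for
    # theta = 2*pi - 1, with U_n from (5.49), s_wn^2 from (5.50) and the
    # stopping rule (5.46).  Ties are ignored (continuous marginals assumed).
    import numpy as np
    from statistics import NormalDist

    def concordant(p, q):
        return int((p[0] - q[0]) * (p[1] - q[1]) > 0)

    def example2_interval(draw_pair, d, alpha=0.95, n0=10):
        a = NormalDist().inv_cdf((1.0 + alpha) / 2.0)
        pts = [draw_pair(), draw_pair()]
        T = [concordant(pts[0], pts[1])] * 2           # T_12 and T_22
        C, n = T[0], 2                                 # C_2
        while True:
            if n >= n0:                                # guard the first few stages
                u_n = 4.0 * C / (n * (n - 1)) - 1.0                                  # (5.49)
                swn2 = 16.0 * (sum(t * t for t in T) - 4.0 * C * C / n) / (n - 1) ** 3   # (5.50)
                if swn2 <= n * d * d / (a * a):                                      # (5.46)
                    return u_n - d, u_n + d, n
            x_new = draw_pair()                        # take a further observation
            a_new = [concordant(p, x_new) for p in pts]
            T = [t + ai for t, ai in zip(T, a_new)]    # T_{i,n+1} = T_in + a_i(n+1)
            T.append(sum(a_new))                       # T_{n+1,n+1}
            C += T[-1]                                 # C_{n+1} = C_n + T_{n+1,n+1}
            pts.append(x_new)
            n += 1

    rng = np.random.default_rng(2)
    def draw_pair():                                   # a correlated bivariate pair
        u = rng.standard_normal()
        return (u, 0.5 * u + rng.standard_normal())

    print(example2_interval(draw_pair, d=0.2))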
BIBLIOGRAPHY

[1] Anscombe, F. J. (1952). Large-sample theory of sequential estimation. Proc. Cambridge Philos. Soc. 48 600-607.

[2] Berk, R. H. (1966). Limiting behavior of posterior distributions when the model is incorrect. Ann. Math. Statist. 37 51-58.

[3] Chow, Y. S. and Robbins, H. (1965). On the asymptotic theory of fixed-width sequential confidence intervals for the mean. Ann. Math. Statist. 36 457-462.

[4] Doob, J. L. (1953). Stochastic Processes. Wiley, New York.

[5] Feller, W. (1957). An Introduction to Probability Theory and Its Applications, 1. Second edition. Wiley, New York.

[6] Fraser, D. A. S. (1957). Nonparametric Methods in Statistics. Wiley, New York.

[7] Hoeffding, W. (1948). A class of statistics with asymptotically normal distribution. Ann. Math. Statist. 19 293-325.

[8] Hoeffding, W. (1960). An upper bound for the variance of Kendall's tau and of related statistics. In Contributions to Probability and Statistics, edited by I. Olkin and others. Stanford University Press, Stanford.

[9] Hoeffding, W. (1961). The strong law of large numbers for U-statistics. Institute of Statistics Mimeo Series No. 302, University of North Carolina, Chapel Hill.

[10] Kendall, M. G. and Stuart, A. (1958). The Advanced Theory of Statistics, 1. Hafner, New York.

[11] Loeve, M. (1960). Probability Theory. Second edition. Van Nostrand, New York.

[12] Pratt, J. W. (1960). On interchanging limits and integrals. Ann. Math. Statist. 31 74-77.

[13] Sen, P. K. (1960). On some convergence properties of U-statistics. Calcutta Statist. Assoc. Bull. 10 1-18.

[14] Siegmund, D. (1969). On moments of the maximum of normed partial sums. Ann. Math. Statist. 40 527-531.

[15] Simons, G. (1968). On the cost of not knowing the variance when making a fixed-width confidence interval for the mean. Ann. Math. Statist. 39 1946-1952.

[16] Starr, N. (1966). The performance of a sequential procedure for the fixed-width interval estimation of the mean. Ann. Math. Statist. 37 36-50.

[17] Wilks, S. S. (1962). Mathematical Statistics. Wiley, New York.