Lee, Alan J. "On the Asymptotic Distribution of Certain Incomplete U-Statistics."

ON THE ASYMPTOTIC DISTRIBUTION OF CERTAIN INCOMPLETE U-STATISTICS
by
Alan J. Lee
University of Auckland
and
University of North Carolina at Chapel Hill
Abstract
We consider incomplete U-statistics based on the arithmetic mean of m
quantities $g_i$, where $g_i$ is the kernel of the complete U-statistic
evaluated at a randomly chosen subsample of a sample of size n. The
asymptotic distribution of the incomplete U-statistic is obtained in terms
of that of the complete statistic, and the asymptotic relative efficiency of
the incomplete statistic is discussed. Comparisons are made with other
incomplete U-statistics.
Some Key Words:
Incomplete U-statistic, asymptotic distribution, asymptotic
relative efficiency.
*The work of this author was partially supported by the Air Force Office of
Scientific Research under Grant AFOSR-75-2796.
§1. Introduction. Let $X_1, \ldots, X_n$ be independent and identically
distributed random variables with distribution function F, and let

$$\theta = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(x_1, \ldots, x_\nu) \prod_{i=1}^{\nu} F(dx_i) \tag{1}$$

where the kernel g is symmetric in its $\nu$ arguments. As is well known, an
unbiased estimator of $\theta$ is furnished by the U-statistic

$$U_n = \binom{n}{\nu}^{-1} \sum g(X_{i_1}, \ldots, X_{i_\nu}) \tag{2}$$

where the sum ranges over all $\binom{n}{\nu}$ $\nu$-subsets of the $X_i$.
Such estimators may require excessive computation when $\nu$ and n are not
small.
The dependence among the terms in (2) suggests that an incomplete statistic
of the form

$$U_m^* = \frac{1}{m} \sum g(X_{i_1}, \ldots, X_{i_\nu}), \tag{3}$$

where the sum ranges over $m < \binom{n}{\nu}$ $\nu$-subsets chosen
systematically or randomly, may estimate $\theta$ almost as well as (2).
Such incomplete U-statistics have been studied by Blom [1] and Brown and
Kildea [2]. In the present paper we consider the case where the m subsets in
(3) are chosen at random with replacement from the $\binom{n}{\nu}$ possible
subsets and derive the asymptotic distribution of the resulting (normalized)
incomplete U-statistic. We discuss the efficiency of such statistics
compared to the corresponding complete U-statistic and also compared to the
balanced incomplete U-statistic proposed by Brown and Kildea.
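As a minimal illustration of (2) and (3), the following Python sketch
computes a complete U-statistic and an incomplete one built from m subsets
drawn at random with replacement. The function names, the variance kernel
g(x, y) = (x - y)^2 / 2 and the sample values are illustrative assumptions,
not taken from the paper.

```python
import itertools
import random
import statistics

def var_kernel(x, y):
    """Symmetric kernel of order 2 whose complete U-statistic is the sample variance."""
    return 0.5 * (x - y) ** 2

def complete_u(xs, kernel, v):
    """Average the kernel over all C(n, v) subsets, as in (2)."""
    subsets = list(itertools.combinations(xs, v))
    return sum(kernel(*s) for s in subsets) / len(subsets)

def incomplete_u(xs, kernel, v, m, rng):
    """Average the kernel over m subsets drawn at random with replacement, as in (3)."""
    total = 0.0
    for _ in range(m):
        subset = rng.sample(xs, v)  # one v-subset, chosen uniformly at random
        total += kernel(*subset)
    return total / m

xs = [2.0, 4.0, 4.0, 5.0, 7.0]  # hypothetical sample
u_n = complete_u(xs, var_kernel, 2)
u_star = incomplete_u(xs, var_kernel, 2, m=6, rng=random.Random(0))
print(u_n, statistics.variance(xs), u_star)
```

Each call to `rng.sample` draws one subset independently of the others, so
the same subset may be selected more than once, which is the with-replacement
scheme considered in this paper.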
We first summarize some results on complete U-statistics due to
Hoeffding [3]. Let

$$\psi_c(x_1, \ldots, x_c) = E\big(g(x_1, \ldots, x_c, X_{c+1}, \ldots, X_\nu)\big)$$

and define $\zeta_c = \mathrm{Var}\, \psi_c(X_1, \ldots, X_c)$ for
$c = 1, 2, \ldots, \nu$; $\zeta_0 = 0$. If $\zeta_k = 0$ for $k \le d$ and
$\zeta_{d+1} > 0$, $U_n$ is said to be stationary of order d at F. In this
case $n^{(d+1)/2}(U_n - \theta)$ has a nondegenerate limit distribution, and
$n^{d+1}\, \mathrm{Var}\, U_n$ converges to

$$(d+1)!\, \binom{\nu}{d+1}^2 \zeta_{d+1}.$$

See e.g. Puri and Sen [6], p. 54.

§2. Efficiency of the incomplete U-statistic.
Suppose now the incomplete U-statistic $U_m^*$ is constructed by selecting m
subsets at random with replacement. Then (Blom [1]) $U_m^*$ is unbiased, and
its variance is given by

$$\mathrm{Var}\, U_m^* = \frac{\zeta_\nu}{m} + \Big(1 - \frac{1}{m}\Big) \mathrm{Var}\, U_n.$$

Other relations between the second moments of $U_n$ and $U_m^*$ are
$\mathrm{Var}\, U_m^* - \mathrm{Var}\, U_n = E|U_m^* - U_n|^2$ and
$\mathrm{Cov}(U_n, U_m^*) = \mathrm{Var}\, U_n$. The relative efficiency of
$U_m^*$ compared with $U_n$ is thus

$$\mathrm{Var}\, U_n \Big/ \Big[\frac{\zeta_\nu}{m} + \Big(1 - \frac{1}{m}\Big) \mathrm{Var}\, U_n\Big],$$

which converges as n, m tend to infinity to

$$\frac{(d+1)!\, \binom{\nu}{d+1}^2 \rho_{d+1}}{\alpha + (d+1)!\, \binom{\nu}{d+1}^2 \rho_{d+1}},$$

where $\alpha = \lim n^{d+1} m^{-1}$ and $\rho_{d+1} = \zeta_{d+1}/\zeta_\nu$.
Note that since $0 < \zeta_{d+1}/(d+1) \le \zeta_\nu/\nu$ (Hoeffding [3]),
$0 < \rho_{d+1} \le (d+1)/\nu$. Thus the asymptotic relative efficiency of
$U_m^*$ is zero, unity or between these two values according as $\alpha$ is
$\infty$, 0 or a finite positive number. In the first case the asymptotic
distribution of $\sqrt{m}(U_m^* - \theta)$ is normal with zero mean and
variance $\zeta_\nu$, as proved below. In the second case, the asymptotic
distribution of $n^{(d+1)/2}(U_m^* - \theta)$ coincides with that of
$n^{(d+1)/2}(U_n - \theta)$, since

$$E\big|n^{(d+1)/2}(U_m^* - \theta) - n^{(d+1)/2}(U_n - \theta)\big|^2 = n^{d+1} E|U_m^* - U_n|^2 = n^{d+1}(\mathrm{Var}\, U_m^* - \mathrm{Var}\, U_n)$$

which converges to zero. The case $0 < \alpha < \infty$ is treated in §4 below.
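The variance formula and the limiting efficiency above are simple enough to
evaluate directly. The sketch below does so in Python. The function names
and numerical inputs are hypothetical, and Var U_n is replaced by its
leading-order approximation (d+1)! C(v, d+1)^2 zeta_{d+1} / n^(d+1), which is
an assumption of the example rather than an exact value.

```python
from math import comb, factorial

def var_incomplete(zeta_v, var_un, m):
    """Blom's variance of U*_m under sampling with replacement."""
    return zeta_v / m + (1.0 - 1.0 / m) * var_un

def relative_efficiency(zeta_v, var_un, m):
    """Var U_n / Var U*_m for finite n and m."""
    return var_un / var_incomplete(zeta_v, var_un, m)

def limiting_efficiency(alpha, rho, v, d):
    """Limit of the relative efficiency when n^(d+1) / m tends to alpha."""
    c = factorial(d + 1) * comb(v, d + 1) ** 2
    return c * rho / (alpha + c * rho)

# Hypothetical inputs: d = 0, v = 2, zeta_1 = 1, zeta_2 = 4, m = K * n.
n, K = 10**6, 2
zeta_1, zeta_2 = 1.0, 4.0
var_un = 4.0 * zeta_1 / n  # leading-order approximation to Var U_n (assumed)
finite = relative_efficiency(zeta_2, var_un, K * n)
limit = limiting_efficiency(1.0 / K, zeta_1 / zeta_2, v=2, d=0)
print(finite, limit)
```

For large n the finite-sample ratio should sit close to the limit, and the
two boundary cases alpha = 0 and alpha = infinity give efficiencies of one
and zero respectively.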
§3. Comparison with Brown and Kildea's incomplete U-statistic.
In [2] Brown and Kildea propose an incomplete U-statistic for the case
$d = 0$, $\nu = 2$ based on systematic selection of $Kn$ pairs $X_i, X_j$ in
such a way that each r.v. appears in exactly $2K$ pairs and each pair shares
an r.v. with $2(2K-1)$ other pairs.
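The asymptotic efficiency of this statistic, relative to the complete
U-statistic, is $2K\rho_1\big(\tfrac{1}{2} + (2K-1)\rho_1\big)^{-1}$, which is
superior to that of $U_m^*$, which is $4K\rho_1(1 + 4K\rho_1)^{-1}$ when
$m = Kn$, $d = 0$, $\nu = 2$ (the §2 limit with $\alpha = 1/K$ and
$(d+1)!\binom{\nu}{d+1}^2 = 4$).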
Brown and Kildea's statistic, which has an asymptotically normal
distribution, is thus to be preferred over $U_m^*$ in this case. However,
the efficiency of $U_m^*$ improves as $\nu$ and d increase, provided that
$\rho_{d+1}$ is not too small. The construction of "balanced" incomplete
U-statistics poses difficult combinatorial problems for high $\nu$ and d,
and the resulting asymptotics are also not simple, so the use of statistics
of type $U_m^*$ may be regarded as a satisfactory albeit nonoptimal
alternative.
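To make the comparison concrete, the following sketch tabulates the Brown
and Kildea efficiency quoted above against the §2 limit for the randomly
sampled statistic, evaluated at alpha = n/m = 1/K with d = 0 and v = 2. The
function names and the chosen values of K and rho_1 are illustrative
assumptions.

```python
from math import comb, factorial

def efficiency_balanced(K, rho1):
    """Brown and Kildea's asymptotic efficiency, 2K rho1 / (1/2 + (2K - 1) rho1)."""
    return 2 * K * rho1 / (0.5 + (2 * K - 1) * rho1)

def efficiency_random(K, rho1, v=2, d=0):
    """Section 2 limit for U*_m with alpha = 1/K, that is, m = K * n."""
    c = factorial(d + 1) * comb(v, d + 1) ** 2
    alpha = 1.0 / K
    return c * rho1 / (alpha + c * rho1)

# rho1 must lie in (0, 1/2] when v = 2 and d = 0.
for K in (1, 2, 5, 10):
    for rho1 in (0.1, 0.25, 0.5):
        print(K, rho1,
              round(efficiency_balanced(K, rho1), 4),
              round(efficiency_random(K, rho1), 4))
```

Both efficiencies approach one as K grows, so the advantage of the balanced
design is largest when only a few pairs per observation are used.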
§4. Asymptotic distribution of $U_m^*$ when $0 < \alpha < \infty$. Let
$\Psi_n(t)$ be the characteristic function of $n^{(d+1)/2}(U_n - \theta)$ and
$\Psi(t)$ the c.f. of the limit distribution. The c.f. of
$\sqrt{m}(U_n - \theta)$ is thus $\Psi_n\big(m^{1/2} n^{-(d+1)/2} t\big)$,
which converges to $\Psi(t\alpha^{-1/2})$ or 1 according as $\alpha$ is finite
or infinite. Using these facts, we can now prove our asymptotic result.
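Theorem. Suppose $\lim_{n,m\to\infty} n^{d+1} m^{-1} = \alpha$ where
$0 < \alpha \le \infty$. Then $\sqrt{m}(U_m^* - \theta)$ converges in
distribution to a distribution with characteristic function

$$\varphi(t) = \begin{cases} e^{-\zeta_\nu t^2/2}\, \Psi(t\alpha^{-1/2}), & 0 < \alpha < \infty, \\ e^{-\zeta_\nu t^2/2}, & \alpha = \infty. \end{cases}$$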
Proof. Given a sample $x_1, \ldots, x_n$, let $g_i$, $i = 1, \ldots, N$, be
the values of the kernel $g(x_{i_1}, \ldots, x_{i_\nu})$ where
$x_{i_1}, \ldots, x_{i_\nu}$ range over the $N = \binom{n}{\nu}$ subsets of
$\{x_1, \ldots, x_n\}$. The conditional distribution of $U_m^*$ given
$X = (x_1, \ldots, x_n)$ is that of $\frac{1}{m}\sum_{j=1}^{m} Y_j$ where the
$Y_j$ are independent and identically distributed random variables with
distribution $\Pr[Y_j = g_i] = 1/N$, which has mean
$\frac{1}{N}\sum_{i=1}^{N} g_i = U_n$ and variance
$\sigma_n^2 = \frac{1}{N}\sum_{i=1}^{N} (g_i - U_n)^2$. Let $\varphi_n(t)$ be
the conditional characteristic function of the r.v.
$\sqrt{m}\big(\frac{1}{m}\sum_{j=1}^{m} Y_j - U_n\big)/\sigma_n$, so that
$\varphi_n(t)$ converges to $e^{-t^2/2}$ by the central limit theorem. It
follows that

$$E\big[\exp\big(it\sqrt{m}(U_m^* - \theta)\big) \mid X\big] = \varphi_n(t\sigma_n)\, e^{i\sqrt{m}(U_n - \theta)t}$$

and so

$$E \exp\big(it\sqrt{m}(U_m^* - \theta)\big) = E\big[\varphi_n(t\sigma_n)\, e^{i\sqrt{m}(U_n - \theta)t}\big].$$
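Consider

$$\big|E\, \varphi_n(t\sigma_n) e^{i\sqrt{m}(U_n - \theta)t} - \varphi(t)\big| \le \big|E\, \varphi_n(t\sigma_n) e^{i\sqrt{m}(U_n - \theta)t} - \varphi_n(t\zeta_\nu^{1/2}) \Psi_n(t\alpha_n^{-1/2})\big| + \big|\varphi_n(t\zeta_\nu^{1/2}) \Psi_n(t\alpha_n^{-1/2}) - \varphi(t)\big|$$

where $\varphi$ is the characteristic function in the Theorem and
$\alpha_n = m^{-1} n^{d+1}$. The second term obviously converges to 0, while
the first is less than

$$E\big|\varphi_n(t\sigma_n) e^{i\sqrt{m}(U_n - \theta)t} - \varphi_n(t\zeta_\nu^{1/2}) e^{i\sqrt{m}(U_n - \theta)t}\big| \le \int \big|\varphi_n(ts) - \varphi_n(t\zeta_\nu^{1/2})\big|\, dH_n(s) \tag{4}$$

where $H_n$ is the distribution function of the r.v. $\sigma_n$. Now
$\sigma_n^2 = \frac{1}{N}\sum g^2(X_{i_1}, \ldots, X_{i_\nu}) - U_n^2$ and
$\frac{1}{N}\sum g^2(X_{i_1}, \ldots, X_{i_\nu})$ is a U-statistic and so
converges in probability to $E(g^2(X_1, \ldots, X_\nu)) = \zeta_\nu + \theta^2$.
Similarly $U_n^2$ converges in probability to $\theta^2$ and so $\sigma_n^2$
converges in probability to $\zeta_\nu$, which implies that
$\sigma_n \xrightarrow{P} \zeta_\nu^{1/2}$.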
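Now write the integral in (4) above as

$$\int_{|s - \zeta_\nu^{1/2}| \le \varepsilon} \big|\varphi_n(ts) - \varphi_n(t\zeta_\nu^{1/2})\big|\, dH_n(s) + \int_{|s - \zeta_\nu^{1/2}| > \varepsilon} \big|\varphi_n(ts) - \varphi_n(t\zeta_\nu^{1/2})\big|\, dH_n(s). \tag{5}$$

The second integral in (5) is less than
$2\Pr[|\sigma_n - \zeta_\nu^{1/2}| > \varepsilon]$ while the first is less than

$$\sup_{|s - \zeta_\nu^{1/2}| \le \varepsilon} |\varphi_n(ts) - \Phi(ts)| + \sup_{|s - \zeta_\nu^{1/2}| \le \varepsilon} |\Phi(ts) - \Phi(t\zeta_\nu^{1/2})| + |\varphi_n(t\zeta_\nu^{1/2}) - \Phi(t\zeta_\nu^{1/2})| \tag{6}$$

where $\Phi(t) = \lim \varphi_n(t) = e^{-t^2/2}$. Now $\varphi_n$ converges
uniformly to $\Phi$ on compact sets so by (4), (5) and (6)

$$\limsup E\big|\varphi_n(t\sigma_n) - \varphi_n(t\zeta_\nu^{1/2})\big| \le \sup_{|s - \zeta_\nu^{1/2}| \le \varepsilon} |\Phi(ts) - \Phi(t\zeta_\nu^{1/2})|$$

which by the uniform continuity of characteristic functions can be made
arbitrarily small by suitable choice of $\varepsilon$. Thus
$\lim E|\varphi_n(t\sigma_n) - \varphi_n(t\zeta_\nu^{1/2})| = 0$ and the proof
is complete.

Note that if $d = 0$, the limit distribution of $n^{1/2}(U_n - \theta)$ is
normal with mean 0 and variance $\nu^2\zeta_1$ and so the asymptotic
distribution of $\sqrt{m}(U_m^* - \theta)$ is normal with mean zero and
variance $\zeta_\nu + \nu^2\zeta_1\alpha^{-1}$. When $d = 1$,
$n(U_n - \theta)$ has an asymptotic distribution equal to that of
$\sum_j \lambda_j (Z_j^2 - 1)$ for iid N(0,1) $Z_j$, where the $\lambda_j$ are
the eigenvalues of a certain integral operator (see [4] for details).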
The limit distribution of $\sqrt{m}(U_m^* - \theta)$ is thus that of
$X + \alpha^{-1/2}\sum_j \lambda_j (Z_j^2 - 1)$, where X has an
$N(0, \zeta_\nu)$ distribution independent of the $Z_j$. Finally, when
$\alpha = \infty$, the asymptotic distribution of $\sqrt{m}(U_m^* - \theta)$
is $N(0, \zeta_\nu)$ irrespective of d.
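A small Monte Carlo check of the d = 0 case can be sketched as follows,
under the illustrative assumptions that the X_i are standard normal and
g(x, y) = (x - y)^2 / 2, so that theta = 1, zeta_1 = 1/2, zeta_2 = 2 and
v = 2. With n = m (alpha = 1) the predicted variance zeta_v + v^2 zeta_1 / alpha
is 4. The function names, sample sizes and seed are hypothetical choices, and
the agreement is only approximate at these finite sample sizes.

```python
import random
import statistics

def var_kernel(x, y):
    """Kernel of the sample variance (assumed here for illustration)."""
    return 0.5 * (x - y) ** 2

def scaled_errors(n, m, reps, seed):
    """Simulate sqrt(m) * (U*_m - theta) for N(0, 1) data, where theta = 1."""
    rng = random.Random(seed)
    errors = []
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total = 0.0
        for _ in range(m):
            x, y = rng.sample(xs, 2)  # one pair, chosen uniformly at random
            total += var_kernel(x, y)
        errors.append(m ** 0.5 * (total / m - 1.0))
    return errors

n = m = 30
errors = scaled_errors(n, m, reps=4000, seed=0)
estimate = statistics.variance(errors)
predicted = 2.0 + 2 ** 2 * 0.5 * (m / n)  # zeta_2 + v^2 * zeta_1 / alpha
print(estimate, predicted)
```

The simulated variance should fall near the predicted value, with the gap
shrinking as the number of replications grows.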
§5. U-statistics based on sampling without replacement. The above results,
with slight modifications, remain true when we sample without replacement in
the construction of the incomplete U-statistic.
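Let $U_m'$ be such a U-statistic, then

$$\mathrm{Var}\, U_m' = E\big(E((U_m' - \theta)^2 \mid X)\big) = E\Big[\frac{\sigma_n^2}{m}\Big(\frac{N-m}{N-1}\Big) + (U_n - \theta)^2\Big]$$

where $\sigma_n^2 = \frac{1}{N}\sum_{i=1}^{N} (g_i - U_n)^2$. Now

$$E\sigma_n^2 = E\Big(\frac{1}{N}\sum_{i=1}^{N} (g_i - \theta)^2\Big) - E(U_n - \theta)^2 = \zeta_\nu - \mathrm{Var}\, U_n$$

so

$$\mathrm{Var}\, U_m' = \frac{\zeta_\nu}{m}\Big(\frac{N-m}{N-1}\Big) + \Big(1 - \frac{1}{m}\Big(\frac{N-m}{N-1}\Big)\Big) \mathrm{Var}\, U_n,$$

which corresponds to Blom's result for the with replacement case.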
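The two variance formulas can be compared numerically. The sketch below is a
hypothetical illustration: the function names and the values chosen for N,
zeta_v and Var U_n are assumptions, not figures from the paper.

```python
def var_with_replacement(zeta_v, var_un, m):
    """Blom's formula for m subsets drawn with replacement."""
    return zeta_v / m + (1.0 - 1.0 / m) * var_un

def var_without_replacement(zeta_v, var_un, m, N):
    """Variance when the m subsets are drawn without replacement from N."""
    fpc = (N - m) / (N - 1)  # finite population correction
    return (zeta_v / m) * fpc + (1.0 - fpc / m) * var_un

# Hypothetical inputs: N = C(10, 2) = 45 subsets, zeta_2 = 2, Var U_n = 0.25.
N, zeta_v, var_un = 45, 2.0, 0.25
for m in (1, 5, 20, 45):
    print(m,
          var_with_replacement(zeta_v, var_un, m),
          var_without_replacement(zeta_v, var_un, m, N))
```

At m = 1 the two schemes coincide, and at m = N sampling without replacement
recovers the complete statistic, so its variance equals Var U_n.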
The theorem of section 4 remains true, with only slight modifications needed
in the proof. From Madow [5], Corollary 1, we note that conditional on X,

$$\Pr\left[\frac{\sqrt{m}(U_m' - U_n)}{\sigma_n \big(1 - (m-1)/(N-1)\big)^{1/2}} \le x\right]$$

converges to $\Phi(x)$, where $\Phi$ here denotes the standard normal
distribution function, as $m, n \to \infty$, provided $\sigma_n^2$ does not
converge to zero, $(m-1)/(N-1) < 1 - \varepsilon$ for all sufficiently large
m, n and the $g_i$ are uniformly bounded.
Assuming the latter (without loss of generality, see [2]) the proof of
the theorem of section 4 applies almost without change.
References
[1] Blom, G. (1976). Some properties of incomplete U-statistics.
Biometrika, 63, pp. 573-580.
[2] Brown, B.M. and Kildea, D.G. (1978). Reduced U-statistics and the
Hodges-Lehmann estimator. Ann. Statist., 6, pp. 828-835.
[3] Hoeffding, W. (1948). A class of statistics with asymptotically normal
distribution. Ann. Math. Statist., 19, pp. 293-325.
[4] Lee, A.J. (1979). On the asymptotic distribution of U-statistics.
Institute of Statistics Mimeo Series #1255, University of North
Carolina at Chapel Hill.
[5] Madow, W.G. (1948). On the limiting distributions of estimates based on
samples from finite universes. Ann. Math. Statist., 19,
pp. 535-545.
[6] Puri, M. and Sen, P.K. (1971). Nonparametric Methods in Multivariate
Analysis. Wiley, New York.
REPORT DOCUMENTATION PAGE
Title: On the Asymptotic Distribution of Certain Incomplete U-Statistics
Type of report: Technical
Performing organization report number: Mimeo Series No. 1259
Author: Alan J. Lee
Contract or grant number: Grant AFOSR-75-2796
Controlling office: Air Force Office of Scientific Research, Bolling AFB,
Washington, DC 20332
Report date: December 1979
Number of pages: 9
Security classification: Unclassified
Distribution statement: Approved for public release; distribution unlimited.