ON THE ASYMPTOTIC DISTRIBUTION OF U-STATISTICS
by
Alan J. Lee*
University of Auckland
and
University of North Carolina at Chapel Hill
Abstract
The asymptotic distribution of a U-statistic is found in the case when
the corresponding Von Mises functional is stationary of order 1. Practical
methods for the tabulation of the limit distributions are discussed, and the
results extended to certain incomplete U-statistics.
Key Words and Phrases: asymptotic distribution, stationary statistical
functional, incomplete U-statistic
*The work of this author was partially supported by the Air Force Office of
Scientific Research under Contract AFOSR-75-2796.
§1.
Introduction.
Let $X_1,\ldots,X_n$ be independent random $k$-vectors with common distribution function $F$. Let $\phi(x_1,\ldots,x_m)$ be a function on $\mathbb{R}^{km}$, symmetric in the $k$-vectors $x_1,\ldots,x_m$, and let

$$\theta(F) = \int\cdots\int \phi(x_1,\ldots,x_m)\,\prod_{i=1}^{m} F(dx_i) \tag{1}$$

be a functional defined on the space of d.f.s on $\mathbb{R}^k$, assuming that the integral exists. An unbiased estimate of $\theta = \theta(F)$ is furnished by the U-statistic

$$U_n = \binom{n}{m}^{-1} \sum \phi(X_{i_1},\ldots,X_{i_m}) \tag{2}$$

where the sum extends over all $\binom{n}{m}$ subsets of the $X_i$.
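As an illustration of (2), the following Python sketch averages a symmetric kernel over all $m$-subsets of a sample. The kernel `variance_kernel` and the sample values are hypothetical choices made only for this example; with this kernel and $m = 2$ the U-statistic is the usual unbiased sample variance.

```python
from itertools import combinations
from math import comb


def u_statistic(xs, kernel, m):
    """Average a symmetric kernel over all m-subsets of the sample xs, as in (2)."""
    n = len(xs)
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= n")
    total = sum(kernel(*subset) for subset in combinations(xs, m))
    return total / comb(n, m)


def variance_kernel(x, y):
    # Hypothetical illustrative kernel: E[(X - Y)^2 / 2] = Var(X).
    return 0.5 * (x - y) ** 2


sample = [1.0, 2.0, 4.0, 7.0]  # made-up data
u_var = u_statistic(sample, variance_kernel, 2)
```

The direct enumeration costs $\binom{n}{m}$ kernel evaluations, so it is only practical for small $m$ and moderate $n$.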
Following Hoeffding [10], define

$$\phi_c(x_1,\ldots,x_c) = E[\phi(x_1,\ldots,x_c,X_{c+1},\ldots,X_m)], \qquad c = 1,2,\ldots,m$$

and $\psi_c(x_1,\ldots,x_c) = \phi_c(x_1,\ldots,x_c) - \theta$, assuming all expectations exist. Also define

$$\zeta_c = E[\psi_c^2(X_1,\ldots,X_c)], \qquad c = 1,2,\ldots,m; \qquad \zeta_0 = 0.$$

If for some nonnegative integer $d < m$, $\zeta_{d+1} \neq 0$ but $\zeta_c = 0$ for $c = 0,\ldots,d$, the functional (1) is said to be stationary of order $d$ at $F$. If $d = 0$, then (2) has an asymptotically normal distribution after suitable normalization [10]. If $d \neq 0$, the distribution is no longer normal and may be obtained by applying the theory of differentiable statistical functionals due to Von Mises and Filippova ([13], [7]).
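The order of stationarity can be checked exactly when $F$ has finite support. The sketch below assumes Hoeffding's usual definitions, $\phi_c$ as the conditional expectation of the kernel given its first $c$ arguments and $\zeta_c = E[\psi_c^2]$; the two-point distribution and the kernel `product_kernel` are hypothetical examples, not taken from the text.

```python
from itertools import product


def zetas(kernel, m, support, probs):
    """Return (theta, [zeta_1, ..., zeta_m]) for a discrete F on `support`."""
    points = list(zip(support, probs))

    def expect(fixed):
        # E[kernel(fixed..., X_{c+1}, ..., X_m)] with the X's i.i.d. from F
        total = 0.0
        for rest in product(points, repeat=m - len(fixed)):
            weight = 1.0
            for _, p in rest:
                weight *= p
            total += weight * kernel(*fixed, *(x for x, _ in rest))
        return total

    theta = expect(())
    out = []
    for c in range(1, m + 1):
        z = 0.0
        for args in product(points, repeat=c):
            weight = 1.0
            for _, p in args:
                weight *= p
            psi_c = expect(tuple(x for x, _ in args)) - theta
            z += weight * psi_c ** 2
        out.append(z)
    return theta, out


def order_of_stationarity(zeta_list, tol=1e-12):
    """Number of leading zetas that vanish, i.e. d with zeta_1 = ... = zeta_d = 0."""
    d = 0
    for z in zeta_list:
        if abs(z) > tol:
            break
        d += 1
    return d


def product_kernel(x, y):
    # Hypothetical kernel; degenerate when E[X] = 0.
    return x * y


theta, zeta = zetas(product_kernel, 2, [-1.0, 1.0], [0.5, 0.5])
d = order_of_stationarity(zeta)
```

For this example $\zeta_1 = 0$ and $\zeta_2 = 1$, so the functional is stationary of order 1 at the chosen $F$.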
Let $F_n$ be the empirical distribution function of $X_1,\ldots,X_n$. The distribution of

$$\theta(F_n) = \frac{1}{n^m}\sum_{i_1=1}^{n}\cdots\sum_{i_m=1}^{n} \phi(X_{i_1},\ldots,X_{i_m}) \tag{3}$$

is closely related to the distribution of $U_n$ as described in §2, and the asymptotic distribution of $\theta(F_n)$ is given by the following theorem.
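Theorem (Von Mises, Filippova). Let $\theta$ be a stationary functional of order $d$. Then the asymptotic distribution of $n^{(d+1)/2}(\theta(F_n) - \theta)$ is identical to that of

$$n^{(d+1)/2}\binom{m}{d+1}\int\cdots\int \psi_{d+1}(x_1,\ldots,x_{d+1})\,\prod_{i=1}^{d+1}\bigl(F_n(dx_i) - F(dx_i)\bigr). \tag{4}$$

In the case $d = 1$,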
the asymptotic distribution of (4) can be found using integral equation techniques. Filippova gives an expression for the characteristic function of the asymptotic distribution of (4) in terms of Fredholm determinants. Gregory [8] gives a more concrete representation of the asymptotic distribution of (4) as an infinite series of random variables, in the case $m = 2$. Below we combine these results to give explicit expressions for the c.f.s of the limit distributions and discuss practical methods for the tabulation of the limit distributions. We also make some remarks about incomplete U-statistics.
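§2.
Relationship between $\theta(F_n)$ and $U_n$.
Let $I$ be the set of indices $\{(i_1,\ldots,i_m): 1 \le i_\ell \le n,\ \ell = 1,\ldots,m\}$ and let $I_j$ be the subset of $I$ consisting of those $m$-tuples with exactly $j$ distinct integers. Define for each $j = 1,\ldots,m$ a symmetric kernel $\phi_{(j)}(x_1,\ldots,x_j)$ by

$$j!\,\phi_{(j)}(x_1,\ldots,x_j) = \sum\nolimits_{(j)} \phi(x_{i_1},\ldots,x_{i_m}),$$

where the sum $\sum_{(j)}$ is taken over all $m$-tuples of indices $(i_1,\ldots,i_m)$ with $1 \le i_\ell \le j$ such that exactly $j$ of the $i_\ell$ are distinct. Then

$$\theta(F_n) = \frac{1}{n^m}\sum_{I} \phi(X_{i_1},\ldots,X_{i_m}) = \frac{1}{n^m}\sum_{j=1}^{m}\sum_{I_j} \phi(X_{i_1},\ldots,X_{i_m}) = \frac{1}{n^m}\sum_{j=1}^{m} j!\binom{n}{j}U_n^{(j)},$$

where $U_n^{(j)}$ is the U-statistic corresponding to the kernel $\phi_{(j)}$. Note that $U_n^{(m)} = U_n$. Thus

$$n(\theta(F_n) - \theta) = n(U_n - \theta) + Z_n \tag{5}$$

where

$$Z_n = n\left[\left(\frac{m!}{n^m}\binom{n}{m} - 1\right)U_n + \frac{1}{n^m}\sum_{j=1}^{m-1} j!\binom{n}{j}U_n^{(j)}\right]. \tag{6}$$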
If $\theta$ is stationary of order 1, then $n(\theta(F_n) - \theta)$ converges to a nondegenerate distribution as seen in §3, and $\mathrm{Var}(nU_n)$ is bounded as $n \to \infty$ ([10]). Moreover, $E(Z_n)$ converges to the quantity $-\frac{m(m-1)}{2}\theta + \theta_{(m-1)}$, where $\theta_{(m-1)} = E[\phi_{(m-1)}(X_1,\ldots,X_{m-1})]$, and since $\mathrm{Var}\,U_n = O(n^{-2})$ and $\mathrm{Var}\,U_n^{(j)} = O(n^{-1})$ for $j < m$, it follows from (6) that $Z_n$ converges in probability to $-\frac{m(m-1)}{2}\theta + \theta_{(m-1)}$, and so the asymptotic distributions of $n(\theta(F_n) - \theta)$ and $n(U_n - \theta)$ differ only by a shift.
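For $m = 2$ the relationship between $\theta(F_n)$ and $U_n$ can be verified directly, since then $n(\theta(F_n) - U_n) = -U_n + n^{-1}\sum_i \phi(X_i,X_i)$. The sketch below checks this identity numerically; the helper names, the kernel `product_kernel` and the data are hypothetical and serve only as an illustration.

```python
from itertools import combinations, product
from math import comb


def u_statistic(xs, kernel, m):
    # average over the C(n, m) subsets (no repeated indices)
    return sum(kernel(*s) for s in combinations(xs, m)) / comb(len(xs), m)


def v_statistic(xs, kernel, m):
    # theta(F_n): average over all n**m index tuples, repeats allowed
    return sum(kernel(*t) for t in product(xs, repeat=m)) / len(xs) ** m


def shift_term_m2(xs, kernel):
    # The difference n * (theta(F_n) - U_n) for m = 2, written via the diagonal terms
    n = len(xs)
    return -u_statistic(xs, kernel, 2) + sum(kernel(x, x) for x in xs) / n


def product_kernel(x, y):
    # Hypothetical symmetric kernel
    return x * y


sample = [1.0, 2.0, 4.0, 7.0]  # made-up data
n = len(sample)
gap = n * (v_statistic(sample, product_kernel, 2) - u_statistic(sample, product_kernel, 2))
z_n = shift_term_m2(sample, product_kernel)
```

Both `gap` and `z_n` evaluate the same quantity, one through the two statistics and one through the diagonal terms alone.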
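A more explicit expression for $\theta_{(m-1)}$ is

$$\theta_{(m-1)} = \frac{1}{(m-1)!}\sum\nolimits_{(m-1)} E(\phi(X_{i_1},\ldots,X_{i_m})) = \frac{1}{(m-1)!}\,\frac{m!\,(m-1)}{2}\,E(\phi(X_1,X_1,X_2,\ldots,X_{m-1})) = \frac{m(m-1)}{2}\,E(\phi_2(X_1,X_1)),$$

and so we may write $\lim_n E(Z_n) = \lim_n E\,n(\theta(F_n) - \theta) = \binom{m}{2}E(\psi_2(X_1,X_1))$.
§3.
Asymptotic distribution of $n(\theta(F_n) - \theta)$ and $n(U_n - \theta)$.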
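By the Von Mises-Filippova theorem, the asymptotic distribution of $n(\theta(F_n) - \theta)$ coincides with that of

$$\binom{m}{2}\int\!\!\int \psi_2(x_1,x_2)\,G_n(dx_1)\,G_n(dx_2), \tag{7}$$

where $G_n(x) = \sqrt{n}\,(F_n(x) - F(x))$. Consider the kernel

$$\tilde\psi_2(x_1,x_2) = \psi_2(x_1,x_2) - \int \psi_2(x_1,y)\,F(dy) - \int \psi_2(y,x_2)\,F(dy) + \int\!\!\int \psi_2(y,z)\,F(dy)\,F(dz),$$

where all integrals are over $\mathbb{R}^k$. Then it is easily seen that

$$\int\!\!\int \psi_2(x_1,x_2)\,G_n(dx_1)\,G_n(dx_2) = \int\!\!\int \tilde\psi_2(x_1,x_2)\,G_n(dx_1)\,G_n(dx_2) = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n} \tilde\psi_2(X_i,X_j). \tag{8}$$

Now let $\lambda_j$, $j \ge 1$, be the (real) eigenvalues of the linear operator $L_2(\mathbb{R}^k,dF) \to L_2(\mathbb{R}^k,dF)$ with (symmetric) kernel $\tilde\psi_2$. We note that by the assumption that $\zeta_2$ exists, $\int\!\!\int |\tilde\psi_2(u,v)|^2\,F(du)\,F(dv) < \infty$.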
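The chain of equalities in (8) can be checked numerically when $F$ has finite support. The sketch below assumes that the modified kernel is the doubly centred one, obtained by subtracting the two one-variable integrals of the kernel and adding back its double integral; the kernel `psi`, the support, the probabilities and the sample are all hypothetical.

```python
def centred_kernel(psi, support, probs):
    """Doubly centre a symmetric kernel with respect to a discrete F."""
    pts = list(zip(support, probs))
    g = {a: sum(p * psi(a, b) for b, p in pts) for a in support}  # integral over one argument
    c = sum(p * g[a] for a, p in pts)                              # double integral

    def psi_tilde(x, y):
        return psi(x, y) - g[x] - g[y] + c

    return psi_tilde


def double_integral_against_gn(psi, support, probs, sample):
    """n * sum_a sum_b psi(a, b) (f_a - p_a)(f_b - p_b), with f the empirical frequencies."""
    n = len(sample)
    diff = [sample.count(a) / n - p for a, p in zip(support, probs)]
    return n * sum(
        psi(a, b) * da * db
        for a, da in zip(support, diff)
        for b, db in zip(support, diff)
    )


def double_sum(kernel, sample):
    """(1/n) * sum_i sum_j kernel(X_i, X_j)."""
    return sum(kernel(x, y) for x in sample for y in sample) / len(sample)


def psi(x, y):
    # Hypothetical symmetric kernel, not centred on purpose
    return x * y + x + y


support, probs = [0, 1, 2], [0.2, 0.5, 0.3]
sample = [0, 2, 2, 1, 0]  # made-up observations from the support
psi_tilde = centred_kernel(psi, support, probs)
lhs = double_integral_against_gn(psi, support, probs, sample)
rhs = double_sum(psi_tilde, sample)
```

The two quantities agree because $F_n - F$ has total mass zero and the centred kernel integrates to zero in each argument.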
By the results of [8], we can see that

(i) the asymptotic distribution of (7) is the same as that of

$$\binom{m}{2}\Bigl\{\sum_{j=1}^{\infty} \lambda_j(Y_j - 1) + E(\psi_2(X_1,X_1))\Bigr\}, \tag{9}$$

where the $Y_j$ are independent $\chi^2_1$ random variables; and

(ii) if $\sum_{j=1}^{\infty}|\lambda_j| < \infty$, then the asymptotic distribution of (7) is that of $\binom{m}{2}\sum_{j=1}^{\infty}\lambda_jY_j$.
To prove these, set the functions $h_n$ in Theorem 2.1 of [8] to be 0; then

$$\frac{1}{n}\sum_{i \ne j} \tilde\psi_2(X_i,X_j)$$

converges in distribution to $\sum_{j=1}^{\infty}\lambda_j(Y_j - 1)$. Since

$$\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n} \tilde\psi_2(X_i,X_j) = \frac{1}{n}\sum_{i \ne j} \tilde\psi_2(X_i,X_j) + \frac{1}{n}\sum_{i=1}^{n} \tilde\psi_2(X_i,X_i)$$

and $\frac{1}{n}\sum_{i=1}^{n}\tilde\psi_2(X_i,X_i)$ converges in probability to $E[\tilde\psi_2(X_1,X_1)]$ by the WLLN, (9) follows upon noting that

$$E[\tilde\psi_2(X_1,X_1)] = \int \tilde\psi_2(x_1,x_1)\,F(dx_1) = \int \psi_2(x_1,x_1)\,F(dx_1) - E(\psi_2(X_1,X_2)) = E[\psi_2(X_1,X_1)],$$

since $E[\psi_2(X_1,X_2)] = 0$ by [10]. (ii) follows as in 1.2.3 of [8].
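Using (i) and (ii) above and (5) we see that the asymptotic distribution of $n(U_n - \theta)$ is that of $\binom{m}{2}\sum_{j=1}^{\infty}\lambda_j(Y_j - 1)$, and if $\sum_{j=1}^{\infty}|\lambda_j| < \infty$, the distribution is that of $\binom{m}{2}\bigl\{\sum_{j=1}^{\infty}\lambda_jY_j - E[\psi_2(X_1,X_1)]\bigr\}$.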
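The characteristic functions of the limit distributions are thus seen to be

$$\phi(t) = \prod_{j=1}^{\infty} e^{-it\lambda_j'}(1 - 2i\lambda_j't)^{-1/2}$$

in the former case and

$$\phi(t) = e^{-it\mu}\prod_{j=1}^{\infty}(1 - 2i\lambda_j't)^{-1/2}$$

in the latter, where $\mu = \binom{m}{2}E[\psi_2(X_1,X_1)]$ and $\lambda_j' = \binom{m}{2}\lambda_j$, $j \ge 1$.
§4.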
Tabulation of the limit distribution.
In this section we discuss practical means of computing the limit distribution of $n(U_n - \theta)$. We first consider the case when the eigenvalues $\lambda_j$ are summable. The method employed to tabulate the limit distribution will depend on the amount of information we possess about the eigenvalues. There are three subcases.
(i) The eigenvalues must be determined numerically, and we have accurate determinations of $\lambda_j$ for $1 \le j \le N$.
(ii) The eigenvalues can be explicitly determined for all $j$.
(iii) The characteristic function of $\sum_{j=1}^{\infty}\lambda_jY_j$, namely $\phi(t) = \prod_{j=1}^{\infty}(1 - 2it\lambda_j)^{-1/2}$, can be expressed in closed form.
In subcase (iii), the limit distribution may most conveniently be found by numerical inversion of $\phi(t)$. Various methods are available (e.g. Davies [5], Bohman [3], Martynov [12]).
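To make the idea of numerical inversion concrete, the following sketch inverts the c.f. of a finite weighted sum of $\chi^2_1$ variables with the Gil-Pelaez formula $P(X \le x) = \frac12 - \frac{1}{\pi}\int_0^\infty \mathrm{Im}[e^{-itx}\phi(t)]\,t^{-1}\,dt$ and a plain midpoint rule. This is a deliberately simple stand-in, not one of the methods of [5], [3] or [12]; the truncation point `upper`, the step size and the weights are hypothetical choices.

```python
import cmath
from math import exp, pi


def cf_weighted_chisq(t, lambdas):
    """c.f. of sum_j lambda_j * Y_j with independent chi-square(1) variables Y_j."""
    value = complex(1.0, 0.0)
    for lam in lambdas:
        value *= (1 - 2j * lam * t) ** (-0.5)
    return value


def cdf_by_inversion(x, lambdas, upper=100.0, step=0.01):
    """Gil-Pelaez inversion, midpoint rule on (0, upper); accuracy depends on both settings."""
    n_steps = int(round(upper / step))
    total = 0.0
    for k in range(n_steps):
        t = (k + 0.5) * step  # midpoints avoid the removable singularity at t = 0
        total += (cmath.exp(-1j * t * x) * cf_weighted_chisq(t, lambdas)).imag / t
    return 0.5 - total * step / pi


# Four equal weights give a chi-square(4) variable, whose d.f. is known in closed form.
approx = cdf_by_inversion(4.0, [1.0, 1.0, 1.0, 1.0])
exact = 1 - exp(-2.0) * 3.0
```

The integrand decays only as fast as the c.f. does, so with few nonzero weights a larger `upper` (or a more refined method such as those cited) is needed.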
In subcase (ii), computation of the infinite product is necessary. A reasonable way to accomplish this is described in [2] and consists of expressing

$$\phi(t) = \prod_{j=1}^{N-1}(1 - 2it\lambda_j)^{-1/2}\,\phi_2(t), \qquad\text{where}\qquad \log\phi_2(t) = -\tfrac12\sum_{j=N}^{\infty}\log(1 - 2it\lambda_j).$$

Formally, we have

$$\log\phi_2(t) = \tfrac12\sum_{j=N}^{\infty}\sum_{k=1}^{\infty}(2it\lambda_j)^k/k = \tfrac12\sum_{k=1}^{\infty} c_{N,k}(2it)^k/k, \qquad\text{where}\qquad c_{N,k} = \sum_{j=N}^{\infty}\lambda_j^k.$$

The formal manipulations above are valid if the last power series converges. Since $|c_{N,k}| \le \bigl[\sum_{j=N}^{\infty}|\lambda_j|\bigr]^k = C_N^k$, say, this convergence will take place if $|t| < (2C_N)^{-1}$. The numerical inversion methods above will require the computation of $\phi_2$ at a finite number $M$ of ordinates $t_m$, so choose $N$ so large that $C_N < (2\max_m|t_m|)^{-1}$ and use the power series to compute $\phi_2$.
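The power series for $\log\phi_2(t)$ can be sketched as follows for a case in which the tail sums $c_{N,k}$ are available in closed form. The geometric eigenvalue sequence $\lambda_j = r^j$ is a hypothetical example chosen so that $c_{N,k} = r^{Nk}/(1 - r^k)$; the number of series terms and the truncation index of the direct sum are likewise assumptions of this illustration.

```python
import cmath


def tail_power_sum(r, N, k):
    """c_{N,k} = sum_{j >= N} lambda_j**k for the hypothetical sequence lambda_j = r**j."""
    return r ** (N * k) / (1 - r ** k)


def log_phi2_series(t, r, N, terms=60):
    """Power series (1/2) * sum_k c_{N,k} (2it)^k / k, valid for |t| < 1 / (2 C_N)."""
    C_N = tail_power_sum(r, N, 1)
    if abs(t) >= 1 / (2 * C_N):
        raise ValueError("series need not converge: require |t| < 1 / (2 * C_N)")
    return 0.5 * sum(
        tail_power_sum(r, N, k) * (2j * t) ** k / k for k in range(1, terms + 1)
    )


def log_phi2_direct(t, r, N, last=200):
    """Reference value: -(1/2) * sum_{j=N}^{last} log(1 - 2 i t lambda_j)."""
    return -0.5 * sum(cmath.log(1 - 2j * t * r ** j) for j in range(N, last + 1))


r, N, t = 0.5, 5, 3.0  # here C_N = 0.0625, so the series converges for |t| < 8
series_value = log_phi2_series(t, r, N)
direct_value = log_phi2_direct(t, r, N)
```

Increasing $N$ shrinks $C_N$ and so widens the range of ordinates $t$ for which the series can be used.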
In special cases other methods may be used. If the eigenvalues $\lambda_j$ are all positive, the asymptotic results of Zolotarev [14] and Hoeffding [11] furnish good approximations for the tail probabilities. If the eigenvalues are all positive and of multiplicity unity, then Smirnov's formula may be used as an efficient numerical inversion method; see Martynov [12] for details.
In subcase (i) two possible approaches suggest themselves. The use of the distribution of $\sum_{j=1}^{N}\lambda_jY_j$ to approximate that of $\sum_{j=1}^{\infty}\lambda_jY_j$ will not be satisfactory in general for reasonable values of $N$. For a discussion of this point and bounds on the truncation error see [2]. Rather, some method of approximating the tail of the series is required. If all the $\lambda_j$ for $j > N$ are positive, we may approximate (see [6]) the distribution of $\sum_{j=1}^{\infty}\lambda_jY_j$ by that of $\sum_{j=1}^{N}\lambda_jY_j + cY$, where $Y$ is a $\chi^2_\nu$ variate and $c$ and $\nu$ are chosen to make the mean and variance of the two distributions coincide. Thus $c$ and $\nu$ are obtained from

$$\sum_{j=1}^{N}\lambda_j + c\nu = E[\psi_2(X_1,X_1)],$$

$$\sum_{j=1}^{N}\lambda_j^2 + c^2\nu = \mathrm{Var}(\psi_2(X_1,X_2)) = \zeta_2,$$

yielding

$$c = \frac{\zeta_2 - \sum_{j=1}^{N}\lambda_j^2}{E[\psi_2(X_1,X_1)] - \sum_{j=1}^{N}\lambda_j}, \qquad \nu = \frac{1}{c}\Bigl[E(\psi_2(X_1,X_1)) - \sum_{j=1}^{N}\lambda_j\Bigr].$$

The c.f. of the approximating r.v. is

$$\phi(t) = \prod_{j=1}^{N}(1 - 2i\lambda_jt)^{-1/2}\,(1 - 2ict)^{-\nu/2} \tag{10}$$

and a numerical inversion of (10) furnishes the desired approximation.
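The moment-matching step amounts to solving two equations for $c$ and $\nu$. A minimal sketch follows, in which `trace` stands for $E[\psi_2(X_1,X_1)]$ and `zeta2` for $\zeta_2$; the function name and the geometric eigenvalue sequence used in the example are hypothetical.

```python
def match_tail(lambdas_known, trace, zeta2):
    """Solve sum(lambda) + c*nu = trace and sum(lambda^2) + c^2*nu = zeta2 for (c, nu)."""
    mean_gap = trace - sum(lambdas_known)
    var_gap = zeta2 - sum(lam * lam for lam in lambdas_known)
    if mean_gap <= 0 or var_gap <= 0:
        raise ValueError("tail must contribute positive mean and variance")
    c = var_gap / mean_gap
    nu = mean_gap / c
    return c, nu


# Hypothetical spectrum lambda_j = 0.5**j, j >= 1: trace = 1 and sum of squares = 1/3.
known = [0.5, 0.25, 0.125]  # the first N = 3 eigenvalues
c, nu = match_tail(known, trace=1.0, zeta2=1.0 / 3.0)
```

In this example the tail is replaced by $c\,\chi^2_\nu$ with $c = 1/24$ and $\nu = 3$; note that $\nu$ need not be an integer in general.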
If the $\lambda_j$ are not, apart from a finite number, all of the same sign, one may compute the density of $\sum_{j=1}^{N-1}\lambda_jY_j$ by e.g. the integral equation method of Grenander et al. [9] and approximate the distribution function of the remainder $\sum_{j=N}^{\infty}\lambda_jY_j$ by means of a Cornish-Fisher expansion. The cumulants of the remainder are easily seen to be

$$\kappa_r = 2^{r-1}(r-1)!\,c_{N,r},$$

so $\kappa_1 = E[\psi_2(X_1,X_1)] - \sum_{j=1}^{N-1}\lambda_j$, $\kappa_2 = 2\bigl(\zeta_2 - \sum_{j=1}^{N-1}\lambda_j^2\bigr)$, and for $r \ge 3$ the $\kappa_r$ may be approximated reasonably well by computing a few more eigenvalues. The limit d.f. may then be obtained by a numerical convolution.
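The cumulant formula for the remainder, together with a first-order Cornish-Fisher quantile, can be sketched as follows. Only the skewness correction is included, which is a simplification of a full expansion; the geometric tail $\lambda_j = 0.5^j$, $j \ge 4$, is a hypothetical example with tail power sums available in closed form.

```python
from math import factorial, sqrt
from statistics import NormalDist


def tail_cumulant(r, c_Nr):
    """r-th cumulant of sum_{j >= N} lambda_j Y_j, given c_{N,r} = sum_{j >= N} lambda_j**r."""
    return 2 ** (r - 1) * factorial(r - 1) * c_Nr


def cornish_fisher_quantile(p, k1, k2, k3):
    """Quantile approximation using the first (skewness) Cornish-Fisher term only."""
    z = NormalDist().inv_cdf(p)
    skew = k3 / k2 ** 1.5
    return k1 + sqrt(k2) * (z + skew * (z * z - 1) / 6)


# Hypothetical tail lambda_j = 0.5**j for j >= 4: c_{4,r} = 0.5**(4r) / (1 - 0.5**r)
c = [0.5 ** (4 * r) / (1 - 0.5 ** r) for r in (1, 2, 3)]
k1, k2, k3 = (tail_cumulant(r, c[r - 1]) for r in (1, 2, 3))
median_estimate = cornish_fisher_quantile(0.5, k1, k2, k3)
```

Higher cumulants would enter further correction terms; they are omitted here to keep the sketch short.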
If all but a finite number of eigenvalues are positive (or negative) then the kernel $\psi_2(x_1,x_2)$ can be expressed as the sum of a degenerate and a positive definite (negative definite) kernel and hence its eigenvalues will be summable. Thus in the nonsummable case, an infinite number of both positive and negative eigenvalues will be encountered. If only a finite number are known, we may employ the Cornish-Fisher method suggested above. If all are known, we may compute

$$\phi_2(t) = \prod_{j=N}^{\infty} e^{-it\lambda_j}(1 - 2i\lambda_jt)^{-1/2} \qquad\text{by}\qquad \log\phi_2(t) = \tfrac12\sum_{k=2}^{\infty} c_{N,k}(2it)^k/k$$

and proceed as before.
§5.
Some remarks on incomplete U-statistics.
The calculation of the U-statistic (2) requires the averaging of $\binom{n}{m}$ terms, which may not be practical if $m$ and $n$ are not small. To reduce the volume of computation Blom [1] and Brown and Kildea [4] have proposed the use of incomplete U-statistics of the form

$$U = \frac{1}{N}\sum \phi(X_{i_1},\ldots,X_{i_m}) \tag{11}$$

where the sum in (11) is taken over $N$ specified or randomly selected $m$-subsets of the indices. The asymptotic distributions of nondegenerate incomplete U-statistics ($\zeta_1 > 0$) are studied in the references above. For the degenerate case $\zeta_1 = 0$, $\zeta_2 > 0$ some of their results remain true while others need modification.
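A minimal sketch of the incomplete U-statistic (11) follows, covering both a specified list of subsets and subsets drawn at random with replacement from all $m$-subsets. The function names, the kernel and the data are hypothetical and serve only as an illustration.

```python
import random
from itertools import combinations


def incomplete_u_statistic(xs, kernel, subsets):
    """Average the kernel over the given index subsets, as in (11)."""
    subsets = list(subsets)
    return sum(kernel(*(xs[i] for i in s)) for s in subsets) / len(subsets)


def random_subsets(n, m, N, rng):
    """Draw N index m-subsets independently and uniformly (with replacement between draws)."""
    return [tuple(sorted(rng.sample(range(n), m))) for _ in range(N)]


def variance_kernel(x, y):
    # Hypothetical symmetric kernel
    return 0.5 * (x - y) ** 2


sample = [1.0, 2.0, 4.0, 7.0]  # made-up data
rng = random.Random(0)  # fixed seed so the draw is reproducible
chosen = random_subsets(len(sample), 2, 5, rng)
u_incomplete = incomplete_u_statistic(sample, variance_kernel, chosen)
u_complete = incomplete_u_statistic(sample, variance_kernel, combinations(range(len(sample)), 2))
```

Passing every $m$-subset exactly once recovers the complete U-statistic, which gives a convenient check of the implementation.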
Denote by $\phi_i$ the r.v. $\phi(X_{i_1},\ldots,X_{i_m})$ where $(i_1,\ldots,i_m)$ is the $i$-th subset of indices in the summation (11). As in [1], let $p_c$ denote the proportion of the $N^2$ pairs $(\phi_i,\phi_j)$ having $c$ indices in common. $p_c$ will be a fixed constant or an r.v. depending on the method of subset selection. From [1] we have

$$\mathrm{Var}\,U = \sum_{c=2}^{m} E(p_c)\,\zeta_c$$

and the assumption is made that $n\,\mathrm{Var}\,U \to \zeta$, where $\zeta$ is some nonnegative constant. Note that the $E(p_c)$ do not depend on $\phi$ but only on the method of subset selection.
The asymptotic distribution of $U$ depends on the ratio $n/N$ as $n, N$ both $\to \infty$. If $n/N$ converges to a positive constant, the quantities $\phi_i$ are asymptotically independent, and the degeneracy of the complete U-statistic is irrelevant to the limit distribution of $\sqrt{n}(U - \theta)$, which will be normal with mean zero and variance $\zeta$.

On the other hand, if $n/N \to 0$, the limit distribution of $U$ will coincide with that of the complete U-statistic under certain conditions. For example, from [1] we have (denoting the complete U-statistic by $U_0$)

$$\mathrm{Var}\,U - \mathrm{Var}\,U_0 = E(U - U_0)^2,$$

so $n(U_0 - \theta)$ and $n(U - \theta)$ will have the same asymptotic distribution if $n^2(\mathrm{Var}\,U - \mathrm{Var}\,U_0)$ converges to zero. Now $n^2\,\mathrm{Var}\,U_0$ converges to $2\binom{m}{2}^2\zeta_2$, so a sufficient condition for the coincidence of the asymptotic distributions is

$$\lim_n n^2E(p_c) = \begin{cases} 2\binom{m}{2}^2, & c = 2, \\ 0, & c > 2. \end{cases}$$

Alternatively, if the $N$ subsets are chosen at random, with replacement from the $\binom{n}{m}$ possible subsets, then ([1])

$$\mathrm{Var}\,U - \mathrm{Var}\,U_0 = \frac{1}{N}\bigl(\zeta_m - \mathrm{Var}\,U_0\bigr),$$

so the asymptotic distributions coincide if $n^2/N$ converges to zero. Thus the incomplete U-statistic based on $n^{2+\varepsilon}$ randomly chosen subsets will have the same asymptotic distribution as that of the complete U-statistic based on $\binom{n}{m}$ subsets.
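The effect of the number of randomly chosen subsets can be illustrated with a few lines of arithmetic. The sketch assumes the variance identity $\mathrm{Var}\,U = \zeta_m/N + (1 - 1/N)\,\mathrm{Var}\,U_0$ for sampling with replacement, which follows by conditioning on the sample, and it replaces $\mathrm{Var}\,U_0$ by its leading term $2\zeta_2/n^2$ for $m = 2$; the value $\zeta_2 = 1$ and the choice $N = n^{2.5}$ are hypothetical.

```python
def var_incomplete_with_replacement(zeta_m, var_complete, N):
    """Var U when N subsets are drawn at random with replacement (assumed identity)."""
    return zeta_m / N + (1 - 1 / N) * var_complete


def scaled_excess(n, N, zeta_m, var_complete):
    """n^2 * (Var U - Var U_0), the quantity that must vanish for the limits to coincide."""
    return n ** 2 * (var_incomplete_with_replacement(zeta_m, var_complete, N) - var_complete)


zeta2 = 1.0  # hypothetical value; for m = 2, zeta_m = zeta_2
excess = []
for n in (10, 100, 1000):
    N = int(n ** 2.5)              # n^(2 + epsilon) subsets with epsilon = 0.5
    var_u0 = 2 * zeta2 / n ** 2    # leading term of Var U_0 in the degenerate case, m = 2
    excess.append(scaled_excess(n, N, zeta2, var_u0))
```

The values in `excess` shrink roughly like $n^{-1/2}$, in line with the requirement that $n^2/N$ converge to zero.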
References
[1] Blom, G. (1976). Some properties of incomplete U-statistics. Biometrika, 63, pp. 573-580.
[2] Blum, J.R., Kiefer, J., and Rosenblatt, M. (1961). Distribution-free tests of independence based on the sample distribution function. Ann. Math. Statist., 32, pp. 485-498.
[3] Bohman, H. (1972). From the characteristic function to the distribution function via Fourier analysis. BIT, 12, pp. 279-283.
[4] Brown, B.M. and Kildea, D.G. (1978). Reduced U-statistics and the Hodges-Lehmann estimator. Ann. Statist., 6, pp. 828-835.
[5] Davies, R.B. (1973). Numerical inversion of the characteristic function. Biometrika, 60, pp. 415-417.
[6] Durbin, J. and Knott, M. (1972). Components of Cramer-Von Mises statistics I. J.R. Statist. Soc., 34, pp. 290-307.
[7] Filippova, A.A. (1962). Von Mises' theorem on the asymptotic behavior of functionals of empirical distribution functions and its statistical applications. Theor. Prob. Appl., 7, pp. 24-57.
[8] Gregory, G.G. (1977). Large sample theory for U-statistics and tests of fit. Ann. Statist., 5, pp. 110-115.
[9] Grenander, U., Pollak, H.O., and Slepian, D. (1959). The distribution of quadratic forms in normal variates. J.S.I.A.M., 7, pp. 374-401.
[10] Hoeffding, W. (1948). A class of statistics with asymptotically normal distribution. Ann. Math. Statist., 19, pp. 293-325.
[11] Hoeffding, W. (1964). On a theorem of V.M. Zolotarev. Theor. Prob. Appl., 9, pp. 89-91.
[12] Martynov, G.V. (1975). Computation of distribution functions of quadratic forms in normally distributed random variables. Theor. Prob. Appl., 20, pp. 782-793.
[13] Von Mises, R. (1947). On the asymptotic distribution of differentiable statistical functionals. Ann. Math. Statist., 18, pp. 309-348.
[14] Zolotarev, V.M. (1961). Concerning a certain probability problem. Theor. Prob. Appl., 6, pp. 201-204.
REPORT DOCUMENTATION PAGE
Security classification: UNCLASSIFIED
Title: On the Asymptotic Distribution of U-Statistics
Type of report: Technical
Performing organization report number: Mimeo Series No. 1255
Author: Alan J. Lee
Contract or grant number: AFOSR-75-2796
Controlling office: Air Force Office of Scientific Research, Bolling AFB, DC 20332
Report date: September 1979
Number of pages: 13
Distribution statement: Approved for public release; distribution unlimited.