DISTINGUISHABILITY OF SETS OF DISTRIBUTIONS

(The case of independent and identically distributed chance variables.)

By

Wassily Hoeffding¹ and J. Wolfowitz²

University of North Carolina and Cornell University

Institute of Statistics
Mimeograph Series No. 166
April, 1957
1. Introduction.

Suppose it is desired to make one of two decisions, d₁ and d₂, on the basis of independent observations on a chance variable whose distribution F is known to belong to a set 𝓕. There are given two subsets 𝓖 and 𝓗 of 𝓕 such that decision d₁ (d₂) is strongly preferred if F is in 𝓖 (𝓗). Then it is reasonable to look for a test (decision rule) which makes the probability of an erroneous decision small when F belongs to 𝓖 or 𝓗, and at the same time exercises some control over the number of observations required to reach a decision when F is in 𝓕 (not only in 𝓖 or 𝓗).
Distinguishability in the class of tests of bounded sample size is equivalent to the existence of what has been called a uniformly consistent sequence of tests for testing F ∈ 𝓖 against F ∈ 𝓗. The sets 𝓖 and 𝓗 will be called indistinguishable in a class 𝒯 of tests if for any test in 𝒯 the sum of the maximum error probability in 𝓖 and the maximum error probability in 𝓗 is at least one. (There always exists a trivial test for which this sum is equal to one.) In section 2 it will be shown that, with the present restriction to sequences of independent and identically distributed chance variables, two sets are either distinguishable or indistinguishable in any of the classes 𝒯 which we shall consider.
Since we confine ourselves to tests based on a sequence X₁, X₂, ... of independent, identically distributed chance variables, we may restrict ourselves to sequential tests. A sequential test is determined by the sample size function N and the terminal decision function φ, and will be denoted by (N, φ). Here N is a chance variable whose values are the non-negative integers and +∞, and whose conditional distribution, given any sequence x = (x₁, x₂, ...) of possible values of X₁, X₂, ..., is such that the probability of N ≤ n does not depend on x_{n+1}, x_{n+2}, ..., for all non-negative integers n. The function φ is a function of (x₁, ..., x_N) whose values range from 0 to 1. The test (N, φ) consists in taking one observation on each of the first N chance variables of the sequence, finding the corresponding value of φ, and making decision d₁ or d₂ with respective probabilities 1 − φ and φ.

The function φ and the conditional distribution function of N given x are always understood to be measurable on the appropriate σ-field. The sample size function N and the terminal decision function φ are said to be non-randomized if the respective functions P[N ≤ n | x] and φ(x) take on the values 0 and 1 only. A test (N, φ) will be called non-randomized if both N and φ are non-randomized.
We use the term distribution synonymously with probability measure. The set 𝓕 consists of distributions on a fixed σ-field 𝓐 of subsets of a space 𝓧. Unless we state otherwise, we shall assume that 𝓧 is a k-dimensional Euclidean space and 𝓐 the k-dimensional Borel field. A distribution on 𝓐 will then be called a k-dimensional or k-variate distribution. If F is a distribution on 𝓐, we denote by F[A] the probability of the set A ∈ 𝓐 and by F(x) = F(x⁽¹⁾, ..., x⁽ᵏ⁾), x ∈ 𝓧, the associated distribution function, that is, F(x) = F[{y | y⁽¹⁾ ≤ x⁽¹⁾, ..., y⁽ᵏ⁾ ≤ x⁽ᵏ⁾}]. With the usual definition (see [5]) of the distribution of a sequence X = (X₁, X₂, ...) of independent chance variables with identical marginal distribution F, we denote by P_F[B] the probability of a measurable set B in the range of X, and write E_F ψ for the expected value of a function ψ of X.
According to our definitions, the probability of an erroneous decision when test (N, φ) is used is equal to E_F φ if F ∈ 𝓖, and to E_F(1 − φ) if F ∈ 𝓗. Thus the sets 𝓖 and 𝓗 are distinguishable in a class 𝒯 of tests if and only if for every ε > 0 there exists a test (N, φ) in 𝒯 such that E_F φ < ε for F ∈ 𝓖 and E_F(1 − φ) < ε for F ∈ 𝓗.
2. Modes of distinguishability.

We shall be concerned with the distinguishability of two sets of distributions in various classes 𝒯 of tests, which are defined in terms of properties of the distribution of the sample size function N. Some classes of particular interest are the following.

𝒯₀:  P_F[N < ∞] = 1 if F ∈ 𝓕.

𝒯₁(r):  E_F N^r < ∞ if F ∈ 𝓕  (r > 0).

𝒯₁:  E_F N^r < ∞ for all r > 0 if F ∈ 𝓕.

𝒯₂:  E_F e^{tN} < ∞ for some t = t(F) > 0 if F ∈ 𝓕.

𝒯₃:  max(N) < ∞.

It will be noted that each of the successive classes contains the one following. Some classes of obvious interest have been omitted because, for the purposes of our investigation, they are equivalent to some of the classes listed above.
Thus if two sets are distinguishable in one of the classes 𝒯₀, ..., 𝒯₃, they are also distinguishable in the corresponding subclass which contains only the non-randomized tests; this follows from Theorem 2.1 below. If two sets are distinguishable in 𝒯₃ (the class of "truncated" sequential tests), they are clearly distinguishable in the class of all fixed sample size tests; for if (N, φ) is any test in 𝒯₃ and we put

N' = max(N),  φ'(x₁, ..., x_{N'}) = E[φ | x₁, ..., x_{N'}],

then (N', φ') is a fixed sample size test such that E_F φ' = E_F φ for all F.
In view of the importance of the two extreme classes 𝒯₀ and 𝒯₃, we shall use the following terms. If two sets of distributions are distinguishable (indistinguishable) in 𝒯₀, they will be called distinguishable (𝒯₀) (indistinguishable (𝒯₀)). If two sets are distinguishable (indistinguishable) in 𝒯₃, we shall say that they are finitely distinguishable (finitely indistinguishable).

The classes 𝒯₀, ..., 𝒯₃ have been defined in terms of the set 𝓕 to which the distribution of the Xⱼ is assumed to belong (without displaying 𝓕 in the notation). It may be of interest to consider also the corresponding classes where 𝓕 is replaced by some subset of 𝓕 (compare Lemma 4.1 in section 4). It will be clear that theorems 2.1 and 4.1 below can be immediately extended to such classes.
Our list does not contain the subclass of 𝒯₁(r) in which E_F N^r is bounded, uniformly for F ∈ 𝓕, nor the subclass of 𝒯₀ in which P_F[N > n] → 0 as n → ∞, uniformly for F ∈ 𝓕. The reason for this omission is that two sets 𝓖 and 𝓗 which are distinguishable in one of these classes are finitely distinguishable. This follows from the following fact: If (N, φ) is a test such that P_F[N > n] → 0 as n → ∞, uniformly for F ∈ 𝓖 ∪ 𝓗, then for every ε > 0 there exists a test (N', φ') such that max(N') < ∞ and |E_F φ' − E_F φ| < ε for all F ∈ 𝓖 ∪ 𝓗. This is so since, by our assumption, we can choose an integer n = n(ε) such that P_F[N > n] < 2ε for all F ∈ 𝓖 ∪ 𝓗, and the test (N', φ') defined by

N' = N, φ' = φ if N ≤ n;  N' = n, φ' = 1/2 if N > n

has the stated property.
Let 𝒯 be any class of tests. If L = L(𝒯) denotes the class of all terminal decision functions φ of the tests in 𝒯, the statement that 𝓖 and 𝓗 are distinguishable in 𝒯 can be expressed by the equation

(2.1)  sup_{φ ∈ L} inf_{G ∈ 𝓖, H ∈ 𝓗} (E_H φ − E_G φ) = 1.

Whenever 𝒯 contains a trivial test such that φ = const, the left side of (2.1) is at least zero. Let us say that a test in 𝒯 is nontrivial for distinguishing between 𝓖 and 𝓗 if

sup_{G ∈ 𝓖} E_G φ < inf_{H ∈ 𝓗} E_H φ.

Thus the left side of (2.1) is positive if and only if 𝒯 contains a nontrivial test for distinguishing between 𝓖 and 𝓗. The following theorem shows that if 𝒯 is one of the classes 𝒯₀, ..., 𝒯₃ (or one of the "equivalent" classes mentioned above), then the existence in 𝒯 of a nontrivial test for distinguishing between 𝓖 and 𝓗 is sufficient for 𝓖 and 𝓗 to be distinguishable in 𝒯, and even in the class 𝒯' which consists of the non-randomized tests in 𝒯. The special case of the theorem where 𝒯 is the class of all non-randomized fixed sample size tests is contained in a lemma of Berger [1] (which is there attributed to Bernoulli).
We denote by L and L' the classes of the terminal decision functions of the tests in 𝒯 and 𝒯', respectively.

Theorem 2.1. If 𝒯 is one of the classes 𝒯₀, ..., 𝒯₃, then

(2.2)  sup_{φ ∈ L} inf_{G ∈ 𝓖, H ∈ 𝓗} (E_H φ − E_G φ) > 0

implies

(2.3)  sup_{φ ∈ L'} inf_{G ∈ 𝓖, H ∈ 𝓗} (E_H φ − E_G φ) = 1.

Hence

sup_{φ ∈ L} inf_{G ∈ 𝓖, H ∈ 𝓗} (E_H φ − E_G φ) = 0 or 1.
For the proof of Theorem 2.1 we require the following

Lemma. If 𝒯 is one of the classes 𝒯₀, ..., 𝒯₃ and (N, φ) is in 𝒯, then for every ε > 0 there is a test (N', φ') in 𝒯 such that N' is non-randomized and |E_F φ' − E_F φ| < ε for all F.

Proof. Let N' be the least integer n ≥ 1 such that P[N > n | x] < ε. Define φ' by

φ' = E[φ | N ≤ n, x] if N' = n,  n = 1, 2, ....

Thus (N', φ') is a test, and N' is non-randomized. We have for every n ≥ 1

P_F[N' > n] ≤ ε⁻¹ P_F[N > n].

Since for any increasing function h on the nonnegative integers

E h(N) = h(0) + Σ_{n=0}^∞ [h(n+1) − h(n)] P[N > n],

it follows that if N satisfies the condition for any of the classes 𝒯₀, ..., 𝒯₃, so does N'. Hence (N', φ') is in 𝒯. Now if N' = n, we have from the definition of φ'

|E[φ | x] − φ'| ≤ P[N > n | x] if N' = n.

But N' = n implies P[N > n | x] < ε, for all n. This completes the proof of the Lemma.
Proof of Theorem 2.1. If condition (2.2) is satisfied, 𝒯 contains a test (N, φ) such that

sup_{G ∈ 𝓖} E_G φ = α < β = inf_{H ∈ 𝓗} E_H φ.

By the preceding lemma we may and shall assume that N is non-randomized. Let ε be any positive number. The theorem will be proved by showing that there is a non-randomized test (N', φ') in 𝒯 such that

(2.5)  sup_{G ∈ 𝓖} E_G φ' < ε,  inf_{H ∈ 𝓗} E_H φ' > 1 − ε.

Choose a positive integer m which satisfies the inequality

4 / [m(β − α)²] < ε.

Define the test (N', φ') as follows. First apply test (N, φ), and denote the resulting values of N and φ by N₁ and φ₁. Then apply the same test to a new independent sequence of observations and note the values N₂ and φ₂ of N and φ. Continue in this way until m independent sequences of observations have been taken. The total sample size is N' = N₁ + ... + N_m. Since N is non-randomized, so is N'. Now put

φ̄ = m⁻¹ Σ_{i=1}^m φᵢ,

φ' = 1 if φ̄ > (α + β)/2,  φ' = 0 if φ̄ ≤ (α + β)/2.

Thus (N', φ') is a non-randomized test.
The chance variables φ₁, ..., φ_m are independent, and each has the same distribution as φ. Hence E φ̄ = E φ, and the variance of φ̄ is less than 1/m. If G ∈ 𝓖, then E_G φ̄ ≤ α, so that

E_G φ' = P_G[φ̄ > (α + β)/2] ≤ P_G[φ̄ − E_G φ̄ > (β − α)/2] < 4 / [m(β − α)²] < ε

by Chebyshev's inequality. In a similar way it is seen that if H ∈ 𝓗,

E_H(1 − φ') = P_H[φ̄ ≤ (α + β)/2] ≤ P_H[E_H φ̄ − φ̄ ≥ (β − α)/2] < 4 / [m(β − α)²] < ε,

so that inequality (2.5) is satisfied.
We now show that the test (N', φ') is in 𝒯. For 𝒯 = 𝒯₀ and 𝒯 = 𝒯₃ this is obvious. Since for r > 0

(N')^r = (Σ_{i=1}^m Nᵢ)^r ≤ (m max_{i=1,...,m} Nᵢ)^r = m^r max_{i=1,...,m} Nᵢ^r ≤ m^r Σ_{i=1}^m Nᵢ^r,

and each Nᵢ has the same distribution as N, we have E(N')^r ≤ m^{r+1} E N^r < ∞. This proves the statement for 𝒯 = 𝒯₁(r) and 𝒯 = 𝒯₁. Finally, if E e^{tN} < ∞, where t > 0, put t' = t/m. Since N₁, ..., N_m are independent and distributed as N,

E e^{t'N'} = (E e^{tN/m})^m ≤ E e^{tN} < ∞.

Thus (N', φ') is in 𝒯 in every case. The proof is complete.

It should be noted that if X₁, X₂, ... are not independent and identically distributed, the analog of Theorem 2.1 is not true in general.
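The amplification step in the proof of Theorem 2.1 can be illustrated numerically. The sketch below is not part of the paper: it assumes a hypothetical base test whose terminal decision value is Bernoulli with mean α = 0.4 under G and β = 0.6 under H, averages m independent replications, and thresholds the average at (α + β)/2, as in the construction of (N', φ').

```python
import random

def amplify(p_mean, m, threshold, trials=5000, seed=0):
    """Estimate P(average of m Bernoulli(p_mean) replications > threshold)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        avg = sum(rng.random() < p_mean for _ in range(m)) / m
        if avg > threshold:
            hits += 1
    return hits / trials

alpha, beta = 0.4, 0.6          # sup_G E_G(phi) and inf_H E_H(phi) of the base test
thr = (alpha + beta) / 2
# Error of the amplified test phi' under G: P_G(phi_bar > thr); under H: 1 - P_H(phi_bar > thr).
err_G = amplify(alpha, m=200, threshold=thr)
err_H = 1 - amplify(beta, m=200, threshold=thr)
print(err_G, err_H)
```

Both estimated errors fall well below the base error levels, in agreement with (though much faster than) the Chebyshev bound 4/[m(β − α)²].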
3. Sufficient Conditions for Distinguishability.

Let 𝓚 be a set of distributions on 𝓐. A distance in 𝓚 is a nonnegative function δ of the pairs (G, H) of distributions in 𝓚 such that δ(G, G) = 0, δ(G, H) = δ(H, G), and δ(G, H) ≤ δ(G, K) + δ(H, K) for all G, H, and K in 𝓚. (We do not require that δ(G, H) = 0 imply G = H.) We write δ(G, 𝓗) for inf_{H ∈ 𝓗} δ(G, H), and δ(𝓖, 𝓗) for inf_{G ∈ 𝓖} δ(G, 𝓗).
Let F_n denote the empiric distribution of the first n members X₁, ..., X_n of a sequence of independent chance variables with the common distribution F ∈ 𝓕. That is, n F_n[A] is the number of indices i ≤ n for which Xᵢ ∈ A. We assume throughout that the set 𝓚 in which a distance δ is defined contains 𝓕 and all empiric distributions. We shall say that a distance δ is consistent in 𝓕 if for every ε > 0

(3.1)  lim_{n→∞} P_F[δ(F_n, F) > ε] = 0

whenever F ∈ 𝓕. The distance δ will be called uniformly consistent in 𝓕 if the convergence in (3.1) is uniform for F ∈ 𝓕.
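The behavior required of a consistent distance can be observed directly for the Kolmogorov distance between the empiric distribution F_n and F. The following sketch (the uniform distribution on (0,1) and the sample sizes are illustrative choices, not taken from the paper) computes that distance exactly:

```python
import random

def d_kolmogorov_uniform(sample):
    """D(F_n, F) for F uniform on (0,1): the empiric cdf jumps from i/n to (i+1)/n
    at the i-th order statistic, while the true cdf there equals x."""
    xs = sorted(sample)
    n = len(xs)
    dmax = 0.0
    for i, x in enumerate(xs):
        dmax = max(dmax, abs((i + 1) / n - x), abs(i / n - x))
    return dmax

rng = random.Random(1)
for n in (100, 10000):
    sample = [rng.random() for _ in range(n)]
    print(n, d_kolmogorov_uniform(sample))
```

The computed distance shrinks roughly like n^{-1/2}, the rate behind the uniform consistency used below.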
In this section we derive sufficient conditions for distinguishability in terms of uniformly consistent distances. We first mention a few examples of such distances. If 𝓕 is the set of all distributions on the k-dimensional Borel field 𝓐 and 𝓧 denotes the k-dimensional Euclidean space, the distance

(3.2)  D(G, H) = sup_{x ∈ 𝓧} |G(x) − H(x)|

is known to be uniformly consistent in 𝓕 (see, for example, [4]). So is the distance

(3.3)  (∫ |G(x) − H(x)|^r dK)^{1/r},

where r > 0 and K is a fixed distribution on 𝓐, since it is bounded above by D(G, H). A further example of a uniformly consistent distance is

D_w(G, H) = D(G_w, H_w),

where G_w and H_w are the distributions, according to G and H, of a fixed, real- or vector-valued measurable function w on 𝓧. If μ(F) denotes the mean of a one-dimensional distribution F, the distance |μ(G) − μ(H)| is uniformly consistent in any class of distributions with bounded variances.
A sufficient condition for finite distinguishability is the following. If the distance δ is uniformly consistent in 𝓖 ∪ 𝓗 and

(3.4)  δ(𝓖, 𝓗) > 0,

then the sets 𝓖 and 𝓗 are finitely distinguishable. This can be seen by using the test with N = n fixed and φ = 1 or 0 according as δ(F_n, 𝓖) − δ(F_n, 𝓗) > 0 or ≤ 0. If F ∈ 𝓖, then φ = 1 implies δ(F_n, F) ≥ δ(𝓖, 𝓗)/2. Hence E_F φ does not exceed

P_F[δ(F_n, F) ≥ δ(𝓖, 𝓗)/2].

We obtain the same upper bound for E_F(1 − φ), F ∈ 𝓗. Our assumptions imply that the bound tends to 0 as n → ∞.
14
In the proof of the next theorem we shall make use of a test
defined as follows.
Let 5 be a distance,
Lc i },
sequence of' positive numbers, and ini} , i =
sequence of positive integers.
= 1,2, ••• , a
1,2, •• 0' an increasing
Put
5(F
continue sampling as long as 0i < Cit
5i
~
1
n
,'1f-)
i
-
7.
stop sampling as soon as
c i ' and apply the terminal decision function
¢. {
o if
°(F
n
i
, ~) < 5 (Fn , '1( ) .
i
Thus N = ni , where i is the least integer for which 01
We shall refer to this test as the test T(5,
Theorem 3.1.
(a)
~
t C i}
ci •
J
[n i } ).
If the distance 5 is uniformly consistent in
if ,then any two subsets ~ and rf{ of '1 for which
are distinguishable (rj').
(b)
If, for every c
> 0, there exist two positive numbers A(c)
and B( c) such that for all integers n
>0
and all F
€
if
15
~
then any two subsets
'f
and rj( of
which satisfy (:; .6) are
distinguishable in the cla~s of tests (N,¢) such that EFe
for some t = t(F)
Proof.
>0
Let
0:
if F
be a
I:
tN
< co
"fo
~e>sitive
number
Part (a) will be proved by showing that the sequences {cᵢ} and {nᵢ} can be so chosen that the test (N, φ) = T(δ, {cᵢ}, {nᵢ}) satisfies the conditions

(3.8)  E_F φ ≤ α if F ∈ 𝓖,  E_F(1 − φ) ≤ α if F ∈ 𝓗,

and

(3.9)  P_F[N < ∞] = 1 if F ∈ 𝓕.

Let {cᵢ} be a sequence of positive numbers such that

(3.10)  lim_{i→∞} cᵢ = 0.

Choose the positive numbers α₁, α₂, ... so that

(3.11)  Σ_{i=1}^∞ αᵢ ≤ α.

Since δ is uniformly consistent in 𝓕, we can choose the integers n₁ < n₂ < ... in such a way that

(3.12)  P_F[δ(F_{nᵢ}, F) ≥ cᵢ] ≤ αᵢ

for all F ∈ 𝓕. If F ∈ 𝓖, an erroneous decision at stage j implies δ(F_{nⱼ}, 𝓖) ≥ cⱼ and hence δ(F_{nⱼ}, F) ≥ cⱼ, so that

E_F φ ≤ Σ_{j=1}^∞ P_F[δ(F_{nⱼ}, F) ≥ cⱼ].

It now follows from (3.12) and (3.11) that E_F φ ≤ α if F ∈ 𝓖. In a similar way it is seen that E_F(1 − φ) ≤ α if F ∈ 𝓗. Thus the conditions (3.8) are satisfied.
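The test T(δ, {cᵢ}, {nᵢ}) used in this proof can be sketched as follows. The distance (a mean distance), the singleton sets (Bernoulli means 0.3 and 0.7), and the particular sequences cᵢ and nᵢ are all illustrative assumptions of this sketch; the test stops as soon as the empiric distribution is at distance at least cᵢ from at least one of the two sets, and decides for the nearer set:

```python
import random

def T_test(stream, mu_G=0.3, mu_H=0.7,
           c=lambda i: 0.5 / (i + 1), n=lambda i: 25 * (i + 1)):
    """Sequential test T: at stage i inspect the first n(i) observations; stop as soon
    as max(dist to G, dist to H) >= c(i); return (sample size N, decision phi)."""
    i = 0
    while True:
        ni = n(i)
        m = sum(stream[:ni]) / ni
        dG, dH = abs(m - mu_G), abs(m - mu_H)
        if max(dG, dH) >= c(i):
            return ni, (1 if dG >= dH else 0)
        i += 1

rng = random.Random(3)
stream = [1 if rng.random() < 0.3 else 0 for _ in range(10000)]
N, phi = T_test(stream)
print(N, phi)
```

Since here dist-to-G + dist-to-H ≥ 0.4 always while c(i) → 0, the test is guaranteed to stop within the first three stages.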
The terminal sample size N takes on the values n₁, n₂, ..., and we have

(3.13)  P_F[N > nⱼ] ≤ P_F[δⱼ < cⱼ].

By the triangle inequality,

δⱼ ≥ δ* − δ(F_{nⱼ}, F),

where

δ* = max[δ(F, 𝓖), δ(F, 𝓗)].

By assumption, δ* > 0 for all F ∈ 𝓕. Hence if F ∈ 𝓕,

P_F[δⱼ < cⱼ] ≤ P_F[δ(F_{nⱼ}, F) > δ* − cⱼ].

Since cⱼ → 0, we have δ* − cⱼ ≥ cⱼ for j sufficiently large, and then the right side of (3.13) is ≤ αⱼ. By (3.11), αⱼ → 0 as j → ∞, which implies (3.9). This completes the proof of part (a).
Now suppose that the assumption of part (b) is satisfied. The sequences {cᵢ} and {nᵢ} can be so chosen that, in addition to (3.10), (3.11) and (3.12),

(3.14)  lim inf_{i→∞} n_{i−1}/nᵢ > 1/2

and

(3.15)  Σ_{i=1}^∞ A(cᵢ) e^{−nᵢ B(cᵢ)} ≤ α.

(For instance, put M(c) = max[A(c), 1/B(c)]; choose c₁, c₂, ... so that cᵢ > 0, lim cᵢ = 0 and M(cᵢ) ≤ m i^{1/2}, i = 1, 2, ..., with a suitable number m; and put nᵢ = n i, where n is so large that

Σ_{i=1}^∞ m i^{1/2} e^{−n m⁻¹ i^{1/2}} ≤ α.)

The inequalities (3.7) and (3.15) imply that conditions (3.11) and (3.12) are fulfilled, with αᵢ = A(cᵢ) e^{−nᵢ B(cᵢ)}. Hence the conditions (3.8) are satisfied.

For a fixed F ∈ 𝓕, choose the integer h so that cᵢ ≤ δ*/2 for i > h. Then for i > h, due to (3.13) and (3.7),

P_F[N > nᵢ] ≤ a e^{−b nᵢ},

where a = A(δ*/2) and b = B(δ*/2) are positive numbers. Now for any real t,

E_F e^{tN} = Σ_{i=1}^∞ e^{t nᵢ} P_F[N = nᵢ].

Thus E_F e^{tN} < ∞ if the series Σᵢ e^{t nᵢ} P_F[N > n_{i−1}] converges. If t ≤ b/2, then for i sufficiently large

e^{t nᵢ} P_F[N > n_{i−1}] ≤ a e^{b(nᵢ/2 − n_{i−1})},

so that the series converges due to (3.14). The proof is complete.
The assumption of Theorem 3.1, part (b) is satisfied if 𝓕 is any set of k-dimensional distributions and δ = D, the distance defined by (3.2). This is implied by the following theorem due to Kiefer and one of the authors [4]: For every integer k ≥ 1 there exist two positive numbers a and b such that for all ε > 0, all integers n > 0, and all k-dimensional distributions F,

(3.16)  P_F[D(F_n, F) > ε] ≤ a e^{−b n ε²}.

(For k = 1 the inequality (3.16), with b = 2, was proved by Dvoretzky, Kiefer and one of the authors [2].) Hence we can state the following corollary.

Corollary 3.1. If 𝓕 is any set of k-dimensional distributions (k ≥ 1), then any two subsets 𝓖 and 𝓗 of 𝓕 for which

max[D(F, 𝓖), D(F, 𝓗)] > 0 if F ∈ 𝓕

are distinguishable in the class of tests (N, φ) such that E_F e^{tN} < ∞ for some t = t(F) > 0 if F ∈ 𝓕.
4. Necessary Conditions for Distinguishability.

Let P and Q be two distributions on a σ-field 𝓑 of subsets of an arbitrary space 𝓨, and let Φ be the class of all measurable functions on 𝓨 with values ranging from 0 to 1. We denote by d the distance defined by

(4.1)  d(P, Q) = sup_{φ ∈ Φ} |E_P φ − E_Q φ|.

We note some alternative expressions for d. Let ν be any σ-finite measure with respect to which P and Q are absolutely continuous (for instance, ν = P + Q), and denote by p and q densities (Radon-Nikodym derivatives) of P and Q with respect to ν. Then

(4.2)  d(P, Q) = ∫_{p > q} (p − q) dν = ½ ∫ |p − q| dν = 1 − ∫ min(p, q) dν.

(Here and in what follows, an integral whose domain of integration is not indicated is extended over the entire space.) Also

(4.3)  d(P, Q) = sup_B |P[B] − Q[B]|,

the supremum being taken over all measurable sets B.
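The alternative expressions (4.1) and (4.2) can be verified directly on a finite space; the three-point distributions below are an arbitrary illustration:

```python
# Two distributions on a 3-point space, given as probability vectors p and q.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

# sup over all subsets B of |P[B] - Q[B]|
subsets = [(), (0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]
d_sup = max(abs(sum(p[i] for i in B) - sum(q[i] for i in B)) for B in subsets)

d_half = 0.5 * sum(abs(a - b) for a, b in zip(p, q))      # half the L1 distance
d_min = 1 - sum(min(a, b) for a, b in zip(p, q))          # 1 - integral of min(p, q)
d_pos = sum(a - b for a, b in zip(p, q) if a > b)         # integral of (p - q) over {p > q}
print(d_sup, d_half, d_min, d_pos)
```

All four expressions agree, as (4.1) and (4.2) assert.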
For any distribution G on 𝓐 we denote by G^(n) the distribution of n independent chance variables each of which has the distribution G. We write 𝓖^(n) for the set of all G^(n) such that G ∈ 𝓖. It is easily seen from (4.1) that

(4.4)  d(G^(n), H^(n)) ≤ d(G^(n+1), H^(n+1)),  n = 1, 2, ....

We also have (Kruskal [6], p. 29)

(4.5)  d(G^(n), H^(n)) ≤ 1 − [1 − d(G, H)]^n ≤ n d(G, H).

The convex hull, C𝓖, of a set 𝓖 of distributions on a common σ-field is defined as the set of all distributions λ₁P₁ + ... + λ_rP_r, where r is any positive integer, P₁, ..., P_r are in 𝓖, and λ₁, ..., λ_r are positive numbers whose sum is 1.
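Inequalities (4.4) and (4.5) can be checked by exact enumeration of the n-fold product measures on a two-point space (the distributions below are illustrative):

```python
from itertools import product

def tv(p, q):
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def tv_product(p, q, n):
    """Total variation distance between the n-fold product measures of p and q."""
    k = len(p)
    dist = 0.0
    for idx in product(range(k), repeat=n):
        pp = qq = 1.0
        for i in idx:
            pp *= p[i]
            qq *= q[i]
        dist += abs(pp - qq)
    return 0.5 * dist

g = [0.6, 0.4]
h = [0.5, 0.5]
d1 = tv(g, h)
prev = 0.0
for n in range(1, 6):
    dn = tv_product(g, h, n)
    assert prev - 1e-12 <= dn                      # monotonicity (4.4)
    assert dn <= 1 - (1 - d1) ** n + 1e-12         # Kruskal's inequality (4.5)
    assert dn <= n * d1 + 1e-12
    prev = dn
print(round(prev, 6))
```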
In order that two sets 𝓖 and 𝓗 be finitely distinguishable it is necessary that

(4.6)  for every ε > 0, d(C𝓖^(n), C𝓗^(n)) > 1 − ε for some n,

or, equivalently,

(4.7)  lim_{n→∞} d(C𝓖^(n), C𝓗^(n)) = 1.

This is known and follows easily from the definition (4.1) and Theorem 2.1. If the set 𝓖 ∪ 𝓗 is dominated, that is to say, if the distributions in 𝓖 ∪ 𝓗 are absolutely continuous with respect to a fixed σ-finite measure, then condition (4.7) is also sufficient for 𝓖 and 𝓗 to be finitely distinguishable. This is contained in Theorem 6 of Kraft [7] and follows from a theorem of Le Cam (Theorem 5 of Kraft [7]) which is equivalent to the statement that if the set 𝓟₁ ∪ 𝓟₂ is dominated, then

(4.8)  max_{φ ∈ Φ} inf_{P₁ ∈ 𝓟₁, P₂ ∈ 𝓟₂} (E_{P₂} φ − E_{P₁} φ) = d(C𝓟₁, C𝓟₂),

where Φ denotes the set of all measurable functions φ such that 0 ≤ φ ≤ 1. If condition (4.6) is satisfied, then

(4.9)  d(𝓖, 𝓗) > 0.

This weaker but much simpler necessary condition for finite distinguishability will be shown in section 5 to be also sufficient under certain assumptions.

To obtain necessary conditions for non-finite distinguishability we first prove the following lemma.
Lemma 4.1. If

(4.10)  lim_{n→∞} d(F₀^(n), C𝓖^(n)) = lim_{n→∞} d(F₀^(n), C𝓗^(n)) = 0,

then the sets 𝓖 and 𝓗 are indistinguishable in the class of tests (N, φ) such that P_{F₀}[N < ∞] = 1.

Proof. Let (N, φ) be any test such that P_{F₀}[N < ∞] = 1. Define

φₙ = φ if N ≤ n,  φₙ = 0 if N > n.

Thus φₙ is a function of x₁, ..., xₙ only, and φₙ ≤ φ. Let K be a member of C𝓖^(n), so that

E_K φₙ ≥ E_{F₀} φₙ − d(F₀^(n), K).

Hence

sup_{G ∈ 𝓖} E_G φ ≥ E_K φₙ ≥ E_{F₀} φₙ − d(F₀^(n), K)

for all K ∈ C𝓖^(n). Since P_{F₀}(N > n) → 0 as n → ∞, E_{F₀} φₙ converges to E_{F₀} φ. Hence

(4.11)  sup_{G ∈ 𝓖} E_G φ ≥ E_{F₀} φ.

In a similar way, if we use, instead of φₙ, the function φ'ₙ = φ if N ≤ n, φ'ₙ = 1 if N > n, we find that

(4.12)  sup_{H ∈ 𝓗} E_H(1 − φ) ≥ 1 − E_{F₀} φ.

Inequalities (4.11) and (4.12) imply the Lemma.
Theorem 4.1. In order that the sets 𝓖 and 𝓗 be distinguishable (𝒯₀) it is necessary that

(4.13)  lim_{n→∞} max[d(F^(n), C𝓖^(n)), d(F^(n), C𝓗^(n))] > 0 if F ∈ 𝓕,

and hence that

(4.14)  max[d(F, 𝓖), d(F, 𝓗)] > 0 if F ∈ 𝓕.

Proof. The necessity of (4.13) follows immediately from Lemma 4.1. That (4.13) implies (4.14) follows from inequality (4.5).

That the condition (4.13) can be violated when inequality (4.14) is satisfied can be seen from an example given by Kraft ([7], p. 132) to show the non-equivalence of two conditions equivalent to (4.6) and (4.9). Nevertheless the simple necessary condition (4.14) is also sufficient under certain restrictions on the set of distributions, as will be seen in section 5.
We conclude this section by showing that a known necessary condition for distinguishability is implied by condition (4.14) of Theorem 4.1. For any two distributions F and G on 𝓐 and any set 𝓖 of distributions on 𝓐 define

τ(F, G) = ∫ f log(f/g) dν,  τ(F, 𝓖) = inf_{G ∈ 𝓖} τ(F, G),

where ν denotes a σ-finite measure with respect to which F and G are absolutely continuous, with densities f and g. Note that 0 ≤ τ(F, G) ≤ ∞. It has been shown in [3] that if τ(F, 𝓖) = 0, then F and 𝓖 are indistinguishable in the class of tests with E_F N < ∞. We shall show that

(4.15)  d(F, G) ≤ τ(F, G) + [2 τ(F, G)]^{1/2}.

Hence τ(F, 𝓖) = 0 implies d(F, 𝓖) = 0. Thus, by Lemma 4.1 (with F₀ = F and 𝓗 consisting only of F), F and 𝓖 are even indistinguishable in the class of tests with P_F[N < ∞] = 1. It is easy to construct examples where d(F, 𝓖) = 0 and τ(F, 𝓖) > 0, so that condition (4.14) is actually better than the corresponding condition with d replaced by τ.

To prove (4.15) put

τ⁺ = ∫_{f > g} f log(f/g) dν,  τ⁻ = −∫_{f < g} f log(f/g) dν,

and write d and τ for d(F, G) and τ(F, G).
Note that τ⁺ ≥ 0, τ⁻ ≥ 0, and

(4.16)  τ = τ⁺ − τ⁻.

Since

∫_{f > g} (f − g) dν = ∫_{f > g} f (1 − g/f) dν ≤ ∫_{f > g} f log(f/g) dν,

we have

(4.17)  d ≤ τ⁺.

Next we show that

(4.18)  (τ⁻)² ≤ 2(d − τ⁻).

Put h = log(g/f). Then

τ⁻ = ∫_{h > 0} f h dν,  d = ∫_{g > f} (g − f) dν = ∫_{h > 0} f (e^h − 1) dν.

By Schwarz's inequality, and since e^h − 1 − h ≥ h²/2 for h > 0,

(τ⁻)² = (∫_{h > 0} f h dν)² ≤ ∫_{h > 0} f dν ∫_{h > 0} f h² dν ≤ 2 ∫_{h > 0} f (e^h − 1 − h) dν = 2(d − τ⁻),

which implies (4.18). Inequality (4.15) now follows from (4.16), (4.17), and (4.18) by an easy calculation.
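Inequality (4.15) can be spot-checked numerically for random distributions on a four-point space; the sketch records the worst observed value of d − (τ + (2τ)^{1/2}), which should never be positive:

```python
import math, random

def tv(p, q):
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def kl(p, q):
    """tau(P, Q) for strictly positive discrete distributions."""
    return sum(a * math.log(a / b) for a, b in zip(p, q))

rng = random.Random(5)
worst = -1.0
for _ in range(1000):
    rp = [rng.random() + 0.01 for _ in range(4)]
    rq = [rng.random() + 0.01 for _ in range(4)]
    p = [x / sum(rp) for x in rp]
    q = [x / sum(rq) for x in rq]
    t = kl(p, q)
    worst = max(worst, tv(p, q) - (t + math.sqrt(2 * t)))
print(worst)
```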
5. Necessary and Sufficient Conditions for Distinguishability.

In this section we shall show that the necessary conditions of section 4 are also sufficient for distinguishability under certain restrictions on the sets of distributions. Most of our results will be such that if the necessary condition is satisfied, the sets are not only distinguishable (𝒯₀), but even distinguishable in a stronger sense.

If 𝓖 consists of a single distribution G, then, by Theorem 4.1, 𝓖 and 𝓗 are distinguishable (𝓖 ∪ 𝓗) only if d(G^(n), C𝓗^(n)) > 0 for some n. If 𝓗 is dominated, this condition is sufficient for 𝓖 and 𝓗 to be finitely distinguishable, by the Le Cam-Kraft theorem mentioned in section 4. More generally, we can state the following.

If 𝓖 is finite and 𝓗 is dominated, then 𝓖 and 𝓗 are either finitely distinguishable or are indistinguishable (𝓖 ∪ 𝓗), depending on whether the condition

(5.1)  d(G^(n), C𝓗^(n)) > 0 for every G ∈ 𝓖 and some n = n(G)

is or is not satisfied. Condition (5.1) is equivalent to

(5.2)  lim_{n→∞} d(G^(n), C𝓗^(n)) = 1 for every G ∈ 𝓖.

That condition (5.1) is necessary for 𝓖 and 𝓗 to be distinguishable (𝓖 ∪ 𝓗) follows from Theorem 4.1. On the other hand, if (5.1) is satisfied, so is (5.2). Hence if the distributions in 𝓖 are denoted by G₁, ..., G_r, then, by Le Cam's theorem, Gᵢ and 𝓗 are finitely distinguishable, for each i. Thus, given ε > 0, there exists an integer n and tests (n, φᵢ) such that E_{Gᵢ} φᵢ < ε and E_H(1 − φᵢ) < ε/r if H ∈ 𝓗, i = 1, ..., r. Put

1 − φ = min[1, Σ_{i=1}^r (1 − φᵢ)].

Then E_H(1 − φ) < ε if H ∈ 𝓗, and, since φ ≤ φᵢ, E_{Gᵢ} φ ≤ E_{Gᵢ} φᵢ < ε for all i. Therefore 𝓖 and 𝓗 are finitely distinguishable, and condition (5.2) is equivalent to (5.1).
If both 𝓖 and 𝓗 are countably infinite sets, it is no longer true that 𝓖 and 𝓗 are either finitely distinguishable or indistinguishable. To see this, let 𝓖 = {Gᵢ} and 𝓗 = {Hᵢ}, i = 1, 2, ..., where Gᵢ and Hᵢ are univariate normal distributions with respective means a and b (a ≠ b) and common variance σᵢ², such that lim σᵢ = ∞. It follows from a result of Stein [8] (or from Theorem 3.1) that 𝓖 and 𝓗 are distinguishable (𝒩), where 𝒩 denotes the class of univariate normal distributions. But one readily verifies that lim_{i→∞} d(Gᵢ, Hᵢ) = 0, so that the sets are finitely indistinguishable.
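For the example just given the distance d(Gᵢ, Hᵢ) has a closed form: the two densities cross only at the midpoint of the means, so d = 2Φ(|a − b|/(2σᵢ)) − 1, where Φ is the standard normal distribution function. The sketch below (with a = 0, b = 1 as illustrative values) shows the decay as σᵢ grows:

```python
import math

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def tv_normal_same_var(a, b, sigma):
    """d(N(a, sigma^2), N(b, sigma^2)): the densities cross once, at (a + b)/2."""
    return 2 * Phi(abs(a - b) / (2 * sigma)) - 1

a, b = 0.0, 1.0
ds = [tv_normal_same_var(a, b, sigma) for sigma in (1, 10, 100, 1000)]
print(ds)
```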
In what follows it will be shown that the simple condition

(5.3)  max[d(F, 𝓖), d(F, 𝓗)] > 0 if F ∈ 𝓕,

which, by Theorem 4.1, is necessary for 𝓖 and 𝓗 to be distinguishable (𝒯₀), is also sufficient under rather general assumptions. Under somewhat more stringent assumptions the necessary condition d(𝓖, 𝓗) > 0 for finite distinguishability (see (4.9)) will also be shown to be sufficient.

A comparison of the results of sections 3 and 4 shows that if δ is any uniformly consistent distance in a set 𝓕, then d(𝓖, 𝓗) = 0 implies δ(𝓖, 𝓗) = 0 whenever 𝓖 ⊂ 𝓕 and 𝓗 ⊂ 𝓕. Theorems 3.1 and 4.1 also show that if the set 𝓕 has the property that there exists a uniformly consistent δ such that, for all F ∈ 𝓕 and all 𝓖 ⊂ 𝓕, δ(F, 𝓖) = 0 implies (and hence is equivalent to) d(F, 𝓖) = 0, then the necessary condition (4.14) is also sufficient for two subsets 𝓖 and 𝓗 of 𝓕 to be distinguishable (𝒯₀). Similarly, if for all 𝓖 ⊂ 𝓕 and all 𝓗 ⊂ 𝓕, δ(𝓖, 𝓗) = 0 implies d(𝓖, 𝓗) = 0, then any two subsets 𝓖 and 𝓗 of 𝓕 are finitely distinguishable if and only if d(𝓖, 𝓗) > 0.
We first consider conditions which ensure that D(F, 𝓖) = 0 implies d(F, 𝓖) = 0. Let F and G be two k-dimensional distributions and ε a nonnegative number. Suppose that there is an integer J with the following property. There exist J non-overlapping k-dimensional intervals I₁, ..., I_J such that (i) F − G is monotone in each Iⱼ, and (ii) if V denotes the complement of ∪_{j=1}^J Iⱼ, then min(F[V], G[V]) ≤ ε. Write J(F, G; ε) for the least integer J having this property. If such a finite J does not exist, define J(F, G; ε) = ∞.

Note that if F − G is monotone in a set C, the difference of the densities, f − g, is of constant sign in C except in a subset of probability 0 according to both F and G.

Lemma 5.1. If F and G are two k-dimensional distributions,

(5.4)  d(F, G) ≤ 2^k J(F, G; ε) D(F, G) + ε.
Proof. We may assume that J = J(F, G; ε) is finite. Then there exist J nonoverlapping intervals I₁, ..., I_J which satisfy the conditions (i) and (ii). We have

2 d(F, G) = ∫ |f − g| dν = Σ_{j=1}^J ∫_{Iⱼ} |f − g| dν + ∫_V |f − g| dν.

Now

∫_V |f − g| dν ≤ ∫_V (f + g) dν = 2 ∫_V f dν + ∫_V (g − f) dν = 2 F[V] + Σ_{j=1}^J ∫_{Iⱼ} (f − g) dν.

Also, by condition (i),

∫_{Iⱼ} |f − g| dν = |∫_{Iⱼ} (f − g) dν| ≤ 2^k D(F, G).

Hence

2 d(F, G) ≤ 2^{k+1} J D(F, G) + 2 F[V].

By symmetry, the term 2F[V] can be replaced by 2G[V], and hence also by 2ε. This implies (5.4).

Theorem 5.1. Let 𝓕 be a set of k-dimensional distributions, k ≥ 1.

(a) If

(5.5)  sup_{G ∈ 𝓕} J(F, G; ε) < ∞ for all F ∈ 𝓕 and all ε > 0,

then two subsets 𝓖 and 𝓗 of 𝓕 are distinguishable (𝒯₀) if and only if

(5.6)  max[d(F, 𝓖), d(F, 𝓗)] > 0 for all F ∈ 𝓕.
Moreover, if condition (5.6) is satisfied, then 𝓖 and 𝓗 are distinguishable in the class of tests (N, φ) such that E_F e^{tN} < ∞ for some t = t(F) > 0 if F ∈ 𝓕.

(b) If

(5.7)  sup_{F ∈ 𝓕, G ∈ 𝓕} J(F, G; ε) < ∞ for all ε > 0,

then two subsets 𝓖 and 𝓗 of 𝓕 are finitely distinguishable if and only if

(5.8)  d(𝓖, 𝓗) > 0.

Proof. The necessity of conditions (5.6) and (5.8) has been proved in section 4. If condition (5.5) is satisfied, then, by Lemma 5.1, D(F, 𝓖) = 0 implies d(F, 𝓖) = 0 for all F ∈ 𝓕 and all 𝓖 ⊂ 𝓕. Hence if (5.6) is satisfied, the assumption of Corollary 3.1 is fulfilled, which implies part (a). The proof of part (b) is similar, referring to (3.4) with δ = D.
The assumption of Theorem 5.1, part (b) (and hence that of part (a)) is satisfied for most parametric sets of univariate distributions which are commonly used as models in statistics. In such sets 𝓕 the minimum number of intervals in which f − g is of constant sign is usually bounded, and then even sup_{F ∈ 𝓕, G ∈ 𝓕} J(F, G; 0) is finite. For example, if F and G are any two univariate normal distributions, then J(F, G; 0) ≤ 3. This is also true if the singular normal distributions (with zero variance) are included.
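The bound J(F, G; 0) ≤ 3 for univariate normal distributions reflects the fact that the difference of two distinct normal densities vanishes where a quadratic does, hence changes sign at most twice. A grid-based count of the intervals of constant sign (the grid limits and the example parameters are illustrative assumptions):

```python
import math

def normal_pdf(x, mu, s):
    return math.exp(-((x - mu) ** 2) / (2 * s * s)) / (s * math.sqrt(2 * math.pi))

def constant_sign_intervals(mu1, s1, mu2, s2, lo=-50.0, hi=50.0, steps=200001):
    """Count maximal grid intervals on which f - g keeps a constant nonzero sign."""
    prev_sign = 0
    intervals = 0
    for i in range(steps):
        x = lo + (hi - lo) * i / (steps - 1)
        diff = normal_pdf(x, mu1, s1) - normal_pdf(x, mu2, s2)
        sign = (diff > 0) - (diff < 0)
        if sign != 0 and sign != prev_sign:
            intervals += 1
            prev_sign = sign
    return intervals

print(constant_sign_intervals(0.0, 1.0, 0.5, 2.0))  # unequal variances: 3 intervals
print(constant_sign_intervals(0.0, 1.0, 1.0, 1.0))  # equal variances: 2 intervals
```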
The assumption of part (a) is satisfied if 𝓕 is any subclass of the class of all distributions on the subsets of a fixed countable set S. Since the points of S can be arranged in a sequence, we may assume that S is the set of the positive integers. If F ∈ 𝓕 and ε > 0, choose the integer M so that F[{x | x > M}] < ε. Since we can choose M intervals each of which contains exactly one positive integer ≤ M, we have J(F, G; ε) ≤ M for all G ∈ 𝓕, so that condition (5.5) is satisfied.

Actual statistical observations are either integer-valued or integer multiples of a fixed unit of measurement. In this sense it can be said that the assumption of part (a) is satisfied for all classes of distributions which actually occur in statistics.
If 𝓖 and 𝓗 are two arbitrary sets of distributions over a fixed countable set, then 𝓖 and 𝓗 can be finitely indistinguishable even when d(𝓖, 𝓗) > 0. This is shown by the following example. For r = 1, 2, ... and k = 1, ..., r define the sets

A_r = {i 2^{−r} | i = 1, 2, ..., 2^r},

A_{r,k} = {j 2^{−k+1} + i 2^{−r} | i = 1, 2, ..., 2^{r−k}; j = 0, 1, ..., 2^{k−1} − 1}.

Let G_r and H_{r,k} be the discrete distributions whose elementary probability functions are

g_r(x) = 2^{−r} χ(x; A_r),  h_{r,k}(x) = 2^{−r+1} χ(x; A_{r,k}),

where χ(x; A) = 1 or 0 according as x ∈ A or x ∉ A. Let 𝓖 = {G_r | r = 1, 2, ...} and 𝓗 = {H_{r,k} | r = 1, 2, ...; k = 1, ..., r}. The reader can verify that

d(G_r, H_{s,k}) ≥ 1/2

for all r, s, and k, so that d(𝓖, 𝓗) > 0.
Now denote by G_r^(n) and H_{r,k}^(n) the distributions of n independent chance variables each of which has the distribution G_r and H_{r,k}, respectively, and by g_r^(n) and h_{r,k}^(n) their elementary probability functions. Let H̄_r^(n) denote the distribution in C𝓗^(n) whose elementary probability function is

h̄_r^(n) = r^{−1} Σ_{k=1}^r h_{r,k}^(n).

Writing g_r^(n), g_r, etc. for the chance variables g_r^(n)(X₁, ..., X_n), g_r(Xᵢ), etc., and E for the expected value when the distribution of the Xᵢ is G_r, we calculate

E(h̄_r^(n)/g_r^(n)) = 1,  E(h̄_r^(n)/g_r^(n))² = 1 + (2^n − 1)/r,

so that

d(G_r^(n), H̄_r^(n)) = ½ E|1 − h̄_r^(n)/g_r^(n)| ≤ ½ [(2^n − 1)/r]^{1/2}.

It follows that lim_{r→∞} d(G_r^(n), H̄_r^(n)) = 0 for all n, so that

d(C𝓖^(n), C𝓗^(n)) = 0 for every n.

Therefore the sets 𝓖 and 𝓗 are finitely indistinguishable. Note, however, that since d(𝓖, 𝓗) > 0, the sets are distinguishable in the sense of part (a) of Theorem 5.1 with 𝓕 = 𝓖 ∪ 𝓗 and, more generally, with 𝓕 denoting any class of distributions on the subsets of ∪_{r=1}^∞ A_r such that condition (5.6) is satisfied.
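The computation behind this example, namely that d(G_r^(n), H̄_r^(n)) → 0 as r → ∞ for fixed n, can be reproduced exactly for small r and n. The sketch below encodes points by their integer indices i (for the point i·2^{−r}) and takes A_r to be the full grid {i·2^{−r} | i = 1, ..., 2^r}; that reading of A_r, like the small parameter values, is an assumption of this sketch:

```python
from fractions import Fraction
from itertools import product

def A(r, k):
    """Indices i (for the point i * 2**-r) belonging to A_{r,k}."""
    return {j * 2 ** (r - k + 1) + i
            for j in range(2 ** (k - 1)) for i in range(1, 2 ** (r - k) + 1)}

def d_mixture(r, n):
    """Exact total variation d(G_r^(n), Hbar_r^(n)); G_r uniform on the 2**r grid points."""
    g = Fraction(1, 2 ** r)        # elementary probability of G_r
    h = Fraction(1, 2 ** (r - 1))  # elementary probability of each H_{r,k} on its support
    sets = [A(r, k) for k in range(1, r + 1)]
    total = Fraction(0)
    for pt in product(range(1, 2 ** r + 1), repeat=n):
        gp = g ** n
        hp = sum((h ** n for s in sets if all(x in s for x in pt)), Fraction(0)) / r
        total += abs(gp - hp)
    return total / 2

vals = [d_mixture(r, 2) for r in (2, 3, 4)]
print([float(v) for v in vals])   # decreasing in r
```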
We shall see that all conclusions of Theorem 5.1 are true also for arbitrary sets of k-dimensional normal distributions, for any k ≥ 1. However, for k > 1 this cannot be deduced from Theorem 5.1, since the multidimensional D distance does not have the properties required by the theorem. It can be shown that if 𝓕 is any set of non-singular bivariate normal distributions, the assumption of part (a) is satisfied. But for arbitrary sets of bivariate (possibly singular) normal distributions, D(F, 𝓖) = 0 does not imply d(F, 𝓖) = 0. (For instance, if F_c denotes the bivariate normal distribution with means (c, −c), unit variances and correlation coefficient 1, and 𝓖 = {F_c | c > 0}, then D(F₀, 𝓖) = 0 but d(F₀, 𝓖) = 1.) Moreover, D(𝓖, 𝓗) = 0 does not imply d(𝓖, 𝓗) = 0 even for sets of non-singular bivariate normal distributions. (Thus if G_c denotes the bivariate normal distribution with means (c, −c), unit variances, and correlation coefficient (1 + c²)^{−1}, 𝓖 = {G_c | c < 0} and 𝓗 = {G_c | c > 0}, then D(𝓖, 𝓗) = 0 but d(𝓖, 𝓗) > 0.)
For a fixed $k \ge 1$ let $\mathcal{N}$ denote the set of all $k$-dimensional normal distributions. To prove the statement at the beginning of the preceding paragraph it is sufficient to display a distance $\delta$ such that $\delta(\mathcal{G}, \mathcal{H}) = 0$ implies $d(\mathcal{G}, \mathcal{H}) = 0$ whenever $\mathcal{G} \subset \mathcal{N}$ and $\mathcal{H} \subset \mathcal{N}$, and $\delta$ satisfies assumption (3.7) of Theorem 3.1 with $\mathcal{F} = \mathcal{N}$. We shall show this to be true for the distance $\delta^*$ defined as follows.
For any $k$-dimensional distribution $F$ with finite moments of the second order define $Q(F) = (\mu(F), \Sigma(F))$, where $\mu(F)$ denotes the vector of the means and $\Sigma(F)$ the covariance matrix of $F$. Denote by $\Theta$ the range of $Q(F)$. Define the function $d^*(Q_1, Q_2)$, $Q_1, Q_2 \in \Theta$, by

Now define $\delta^*$ by

for any two $k$-dimensional distributions $F_1$ and $F_2$ with finite moments of the second order.
The function $\delta^*$ is a distance$^7$ in the set of distributions for which it is defined. Obviously $\delta^*(\mathcal{G}, \mathcal{H}) = 0$ if and only if $d(\mathcal{G}, \mathcal{H}) = 0$ for $\mathcal{G} \subset \mathcal{N}$ and $\mathcal{H} \subset \mathcal{N}$.
Now let $F_n$ be the empiric distribution of $n$ independent chance variables $X_1, \ldots, X_n$, each of which has the distribution $F \in \mathcal{N}$. Put $Q(F) = Q = (\mu, \Sigma)$ and $Q(F_n) = \hat{Q} = (\hat{\mu}, \hat{\Sigma})$. Thus $\hat{\mu}$ is the sample mean vector and $\hat{\Sigma}$ the sample covariance matrix. We have
It follows from the definition of $d^*$ that the distribution of $d^*(\hat{Q}, Q)$ does not change if each $X_i$ is subjected to the same non-singular linear transformation. Hence the distribution of $d^*(\hat{Q}, Q)$ depends only on the rank $r$ of $\Sigma$. If $r = k$, we may assume that $Q = (0, I) = Q^0$ (say), where $0$ denotes the zero vector with $k$ components and $I$ the $k \times k$ unit matrix. If $1 \le r < k$, the distribution of $d^*(\hat{Q}, Q)$ is the same, only with $k$ replaced by $r$. If $r = 0$, $d^*(\hat{Q}, Q) = 0$ with probability one. Thus we may confine ourselves to the case $r = k$, $Q = Q^0$.

Now the function $d^*(Q, Q^0)$ is continuous at $Q = Q^0$ in the usual sense. Hence it is easily seen that (5.9) is satisfied if for every $\varepsilon > 0$ there exist numbers $A(\varepsilon)$ and $B(\varepsilon)$, with $B(\varepsilon) > 0$, such that for all integers $n > 0$ the probability of each of the inequalities
$$ |\hat{\mu}_i| > \varepsilon, \qquad |\hat{\sigma}_{ii} - 1| > \varepsilon, \qquad |\hat{\rho}_{ij}| > \varepsilon, $$
where $\hat{\rho}_{ij} = \hat{\sigma}_{ij} (\hat{\sigma}_{ii} \hat{\sigma}_{jj})^{-1/2}$ and $\hat{\mu}_i$ and $\hat{\sigma}_{ij}$ are the components of $\hat{\mu}$ and $\hat{\Sigma}$, does not exceed a bound of the form $A(\varepsilon) \exp(-B(\varepsilon) n)$. That the latter is true is seen by considering the well-known distributions of $\hat{\mu}_i$, $\hat{\sigma}_{ii}$, and $\hat{\rho}_{ij}$. This completes the proof.
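The statistics in the proof can be illustrated empirically. The following sketch is ours, not part of the proof; it only shows that for draws from a $k$-variate standard normal (the case $Q = Q^0 = (0, I)$) the sample mean vector $\hat{\mu}$ and sample covariance matrix $\hat{\Sigma}$ concentrate near $0$ and $I$:

```python
import random

random.seed(0)
k, n = 2, 5000
# n independent observations from a k-variate standard normal, Q = Q0 = (0, I)
xs = [[random.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n)]

# sample mean vector mu_hat
mu = [sum(x[i] for x in xs) / n for i in range(k)]
# sample covariance matrix Sigma_hat
sigma = [[sum((x[i] - mu[i]) * (x[j] - mu[j]) for x in xs) / n
          for j in range(k)] for i in range(k)]

# with Q = (0, I), mu_hat is close to 0 and Sigma_hat close to I for large n
assert all(abs(m) < 0.1 for m in mu)
assert all(abs(sigma[i][j] - (1.0 if i == j else 0.0)) < 0.1
           for i in range(k) for j in range(k))
```

The exponential bounds asserted in the proof say much more, of course: the probability that any of these estimates deviates by more than $\varepsilon$ decays like $A(\varepsilon)\exp(-B(\varepsilon)n)$.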
In the proof we could have equally well used, instead of $d$, the distance
$$ \bar{d}(F, G) = 1 - \rho(F, G), $$
where $\nu$ denotes a measure such that $F$ and $G$ have densities, $f$ and $g$, with respect to $\nu$, and
$$ \rho(F, G) = \int (fg)^{1/2} \, d\nu . $$
For we have (see, for instance, Kraft [7], Lemma 1)
$$ 1 - \rho(F, G) \le d(F, G) \le \bigl(1 - \rho^2(F, G)\bigr)^{1/2} , $$
so that the distances $d$ and $\bar{d}$ are equivalent for our purposes. ($d^*$ and $\delta^*$ were defined in terms of $d$.) We shall write $\rho(Q_1, Q_2)$ for $\rho(G_1, G_2)$, where $G_i$ denotes the normal distribution with $Q(G_i) = Q_i$.
If $\Sigma_1$ and $\Sigma_2$ are nonsingular,
$$ (5.10) \qquad \rho(Q_1, Q_2) = \frac{|\Sigma_1|^{1/4} \, |\Sigma_2|^{1/4}}{\bigl| \tfrac{1}{2}(\Sigma_1 + \Sigma_2) \bigr|^{1/2}} \exp \Bigl\{ - \tfrac{1}{8} \, (\mu_1 - \mu_2)' \bigl[ \tfrac{1}{2}(\Sigma_1 + \Sigma_2) \bigr]^{-1} (\mu_1 - \mu_2) \Bigr\} , $$
where $\mu_1$ and $\mu_2$ are regarded as column vectors and the prime denotes the transpose. (Compare Kraft [7], p. 129, where there are some misprints.) If $\Sigma_1$ has rank $r$, $1 \le r < k$, then $\rho(Q_1, Q_2) = 0$ unless $\Sigma_2$ also has rank $r$ and the normal distributions with $Q = Q_1$ and $Q = Q_2$ assign probability one to the same $r$-dimensional plane, $H$; in this case $\rho(Q_1, Q_2)$ is equal to an expression like (5.10), with $\mu_i$ and $\Sigma_i$ now denoting the means and covariances, in a common coordinate system, of the corresponding $r$-dimensional normal distributions on $H$. If the rank of $\Sigma_1$ is $0$, then $\rho(Q_1, Q_2) = 0$ or $1$ according as $Q_1 \ne Q_2$ or $Q_1 = Q_2$.
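As a numerical sanity check (ours, not in the original), the univariate case of the normal affinity and Kraft's inequality can be verified directly. The helper names below are our own; for two normals with a common variance the total variation distance $d$ has a closed form, since the two densities cross at the midpoint of the means:

```python
import math

def rho_normal(mu1, s1, mu2, s2):
    """Affinity rho(F, G) = integral of sqrt(f*g) for univariate normals
    with means mu1, mu2 and standard deviations s1, s2."""
    return math.sqrt(2 * s1 * s2 / (s1**2 + s2**2)) * \
        math.exp(-(mu1 - mu2)**2 / (4 * (s1**2 + s2**2)))

def tv_equal_var(mu1, mu2, s):
    """Total variation distance d(F, G) for two normals with common
    standard deviation s: d = 2*Phi(|mu1 - mu2| / (2s)) - 1."""
    z = abs(mu1 - mu2) / (2 * s)
    return math.erf(z / math.sqrt(2))   # equals 2*Phi(z) - 1

rho = rho_normal(0.0, 1.0, 1.0, 1.0)    # exp(-1/8)
d = tv_equal_var(0.0, 1.0, 1.0)
# Kraft's inequality: 1 - rho <= d <= (1 - rho^2)^(1/2)
assert 1 - rho <= d <= math.sqrt(1 - rho**2)
```

The function `rho_normal` is the $k = 1$ specialization of the affinity for normal distributions discussed above.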
If $\Gamma$ and $\Delta$ are subsets of $\Theta$, write $\rho(Q, \Delta)$ for $\sup_{Q_1 \in \Delta} \rho(Q, Q_1)$ and $\rho(\Gamma, \Delta)$ for $\sup_{Q \in \Gamma} \rho(Q, \Delta)$. If $\mathcal{G} \subset \mathcal{N}$, define $Q(\mathcal{G}) = \{ Q(F) \mid F \in \mathcal{G} \}$. Expressing the conditions (5.6) and (5.8) of Theorem 5.1 in terms of $\rho$, we can summarize the foregoing as follows.
Theorem 5.2. Let $\mathcal{N}$ be the set of all $k$-dimensional normal distributions, $k \ge 1$.

(a) If $\mathcal{F} \subset \mathcal{N}$, then two subsets $\mathcal{G}$ and $\mathcal{H}$ of $\mathcal{F}$ are distinguishable ($\mathcal{F}$) if and only if condition (5.11) is satisfied. Moreover, if condition (5.11) is satisfied, $\mathcal{G}$ and $\mathcal{H}$ are distinguishable in the class of tests $(N, \phi)$ such that $E_F e^{tN} < \infty$ for some $t = t(F) > 0$ if $F \in \mathcal{F}$.

(b) Two subsets $\mathcal{G}$ and $\mathcal{H}$ of $\mathcal{F}$ are finitely distinguishable if and only if condition (5.12) is satisfied.
We observe that condition (5.11) can be expressed in an alternative form. Note that $\rho(Q_1, Q_2) = 1$ if and only if $Q_1 = Q_2$. If $Q = (\mu, \Sigma) \in \Theta$, where $\Sigma$ is nonsingular, and $\Delta \subset \Theta$, then $\rho(Q, \Delta) = 1$ if and only if there is a sequence $\{Q_i\}$ in $\Delta$ such that each of the real components of $Q_i$ converges to the corresponding component of $Q$ (in the ordinary sense). If $\Sigma$ is singular of rank $r$, the same is true, but with the additional condition that the normal distributions with parameters $Q_i$ and $Q$ assign probability one to the same $r$-dimensional plane. Thus, for instance, if $\mathcal{F}$ is a set of non-singular distributions, condition (5.11) is equivalent to the statement that, for every $F \in \mathcal{F}$, the Euclidean distance of $Q(F)$ from $Q(\mathcal{G})$ or from $Q(\mathcal{H})$ is positive.
Condition (5.12) does not seem to have an equally simple interpretation. By way of illustration, let $\mathcal{G}$ and $\mathcal{H}$ denote two sets of univariate normal distributions with positive variances such that $\mu < 0$ if $(\mu, \sigma^2) \in Q(\mathcal{G})$, and $Q(\mathcal{H}) = \{ (\mu, \sigma^2) \mid (-\mu, \sigma^2) \in Q(\mathcal{G}) \}$. Then $\mathcal{G}$ and $\mathcal{H}$ are finitely distinguishable if and only if $\mu/\sigma$ is bounded away from $0$ in $Q(\mathcal{G})$. They are always distinguishable ($\mathcal{G} \cup \mathcal{H}$). If $\mathcal{F}$ denotes a set of normal distributions with positive variances which contains $\mathcal{G} \cup \mathcal{H}$, then $\mathcal{G}$ and $\mathcal{H}$ are distinguishable ($\mathcal{F}$) if and only if the distance of every point $(0, \sigma^2) \in Q(\mathcal{F})$ from $Q(\mathcal{G})$ is positive.
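A brief computation of ours (assuming, as the surrounding discussion suggests, that condition (5.12) amounts to $\rho(Q(\mathcal{G}), Q(\mathcal{H})) < 1$) shows where the ratio $\mu/\sigma$ comes from: for a distribution and its mirror image the affinity depends on the parameters only through this ratio.

```latex
% For N(\mu, \sigma^2) and its mirror N(-\mu, \sigma^2), the univariate case
% of (5.10) gives (the variance factors cancel since the variances are equal)
\rho\bigl((\mu, \sigma^2), (-\mu, \sigma^2)\bigr)
  = \exp\Bigl\{ -\tfrac{1}{8} \, \frac{(2\mu)^2}{\sigma^2} \Bigr\}
  = \exp\bigl\{ -\tfrac{1}{2} \, (\mu/\sigma)^2 \bigr\} ,
% which is bounded away from 1 over Q(\mathcal{G}) exactly when |\mu|/\sigma
% is bounded away from 0.
```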
FOOTNOTES

1. The research of this author was supported by the United States Air Force through the Air Force Office of Scientific Research of the Air Research and Development Command, under contract No. AF 18(600)-458. Reproduction in whole or in part is permitted for any purpose of the United States Government.

2. The research of this author was supported by the United States Air Force through the Air Force Office of Scientific Research of the Air Research and Development Command, under contract No. AF 18(600)-685. Reproduction in whole or in part is permitted for any purpose of the United States Government.

3. In [3] the term distinguishable was used in another sense.

4. The numbers in square brackets refer to the bibliography listed at the end.
5. Here $E_K \phi_n$ denotes the expected value of $\phi_n$ when the joint distribution of $(X_1, \ldots, X_n)$ is $K$. We keep the notation $E_G \phi_n$ when $X_1, \ldots, X_n$ are independent and each $X_i$ has the distribution $G$.
6. An additive function $L$ on $\mathcal{A}$ is monotone in a set $C$ if either $L[A] \le L[B]$ whenever $A \subset B \subset C$ or $L[A] \ge L[B]$ whenever $A \subset B \subset C$.
REFERENCES

[1] A. Berger, "On uniformly consistent tests", Ann. Math. Stat., Vol. 22 (1951), pp. 289-293.

[2] A. Dvoretzky, J. Kiefer, and J. Wolfowitz, "Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator", Ann. Math. Stat., Vol. 27 (1956), pp. 642-669.

[3] W. Hoeffding, "The role of assumptions in statistical decisions", Proc. Third Berkeley Symposium on Math. Stat. and Prob., University of California Press (1956), pp. 105-116.

[4] J. Kiefer and J. Wolfowitz, "On the deviations of the empiric distribution of vector chance variables", to appear in Trans. Amer. Math. Soc.

[5]

[6] W. H. Kruskal, "On the problem of nonnormality in relation to hypothesis testing". (Dittoed)

[7] C. Kraft, "Some conditions for consistency and uniform consistency of statistical procedures", University of California Publications in Statistics, Vol. 2 (1955), pp. 125-142.

[8] A. Kolmogorov, Foundations of the Theory of Probability (English translation), Chelsea, 1950.

[9] C. Stein, "A two-sample test for a linear hypothesis whose power is independent of the variance", Ann. Math. Stat., Vol. 16 (1945), pp. 243-258.