THE ROLE OF ASSUMPTIONS IN STATISTICAL DECISIONS
by
Wassily Hoeffding
University of North Carolina
This research was supported by the United
States Air Force through the Office of
Scientific Research of the Air Research
and Development Command
Institute of Statistics
Mimeographed Series No. 136
July, 1955
1. Introduction. In order to obtain a good decision rule for some statistical problem we start by making assumptions concerning the class of distributions, the loss function, and other data of the problem. Usually these assumptions only approximate the actual conditions, either because the latter are unknown, or in order to simplify the mathematical treatment of the problem. Hence the assumptions under which a decision rule is derived are ordinarily not satisfied in a practical situation to which the rule is applied. It is therefore of interest to investigate how the performance of a decision rule is affected when the assumptions under which it was derived are replaced by another set of assumptions.
We shall confine ourselves to the consideration of assumptions concerning the class of distributions. Investigations of particular problems of this type are numerous in the literature. There are many studies of the performance of "standard" tests under "non-standard" conditions, for example [3], where further references are given. Most of them considered only the effect of deviations from the assumptions on the significance level of the test. The relatively few studies of the effect on the power function include several papers by David and Johnson, the latest of which is [6]. For some problems tests have been proposed whose significance level is little affected by certain deviations from standard assumptions, for instance R. A. Fisher's randomization tests (cf. section 3); see also Box and Andersen [4]. Some other relevant work will be mentioned later.
In sections 2 - 4 we shall be concerned with problems of the following type. Let P denote the joint distribution of the random variables under observation. Suppose that we contemplate making the assumption that P belongs to a class 𝒫_1, but we admit the possibility that actually P is contained in another class, 𝒫_2. The performance of a decision rule (decision function) d is assumed to be expressed by the given risk function r(P, d), defined for all P ∈ 𝒫_1 + 𝒫_2 and all d in D, the class of decision rules available to the statistician. Let d_i be a decision rule which is optimal in some specified sense (for example, minimax) under the assumption P ∈ 𝒫_i, i = 1, 2. Suppose first that the optimal rule d_i is unique except for equivalence in 𝒫_1 + 𝒫_2 for i = 1, 2; that is, if d_i' is also optimal for P ∈ 𝒫_i, then r(P, d_i') = r(P, d_i) for all P ∈ 𝒫_1 + 𝒫_2. Then we may assess the consequences of the assumption P ∈ 𝒫_1 when actually P ∈ 𝒫_2 by comparing the values r(P, d_1) and r(P, d_2) for P ∈ 𝒫_2. If the optimal rules are not unique, we may pick out from the class of rules which are optimal for P ∈ 𝒫_1 a subclass of rules which come closest to optimality under the assumption P ∈ 𝒫_2, and compare their performance with that of the rules which are optimal under the latter assumption. In some situations other ways of approaching the problem may be more adequate (see, e.g., section 2).
In section 2 the consequences of assuming that a distribution is continuous are discussed. Problems involved in comparing assumptions of varying generality are considered in section 3. Section 4 is concerned with cases where decision rules derived under assumptions of normality retain their optimal properties when these assumptions are relaxed.

The last three sections deal with distinguishable sets of distributions, a concept related to the problem of the existence of unbiased or consistent tests under given assumptions. Criteria for the distinguishability of two sets by means of a test based on finitely many observations and by a sequential test are considered and their uses illustrated in sections 5 and 7. An example where two sets are indistinguishable by a non-randomized test but distinguishable by a randomized test is discussed in section 6.
2. The assumption of a continuous distribution. The assumption that we are dealing with a class of continuous distributions is usually made when actually the observations are integer multiples of the unit of measurement h, a (small) positive constant. Suppose that a sample x = (x_1, ..., x_n) is a point in R^n, and let 𝒫_1 be a class of distributions (probability measures) which are absolutely continuous with respect to n-dimensional Lebesgue measure. Let S be the set of all points in R^n whose coordinates are integer multiples of h. Let us suppose that when we say that the distribution is P_1 ∈ 𝒫_1, we "have in mind" that the distribution is P_2 = f(P_1), where the probability measure P_2 is defined by

(1)    P_2({y}) = P_1({x: y_j − h/2 < x_j ≤ y_j + h/2, j = 1, ..., n})

for all y = (y_1, ..., y_n) in S. Let 𝒫_2 = {f(P): P ∈ 𝒫_1}. Thus we are interested in the consequences of assuming P ∈ 𝒫_1 when actually P ∈ 𝒫_2.

Let d be a decision function which is optimal in some sense under the assumption P ∈ 𝒫_1. Then any decision rule which differs from d only on the set S is equivalent to d for P ∈ 𝒫_1. Since P(S) = 1 for all P ∈ 𝒫_2, the mere fact that a rule is optimal for P ∈ 𝒫_1 does not tell us anything about its performance when P ∈ 𝒫_2; indeed, it can be as bad as we please under the latter assumption. Of course, in general there are rules which are optimal under either assumption. But the main reason for making the simplifying assumption of continuity is that we don't want to bother with rules which are optimal for P ∈ 𝒫_2. (The author's attention to situations of this kind was drawn by H. Robbins some years ago.) Now it is clear that if there is a determination d' of d which is sufficiently regular, its risk at P_2 = f(P_1) will differ arbitrarily little from the risk at P_1 if h is small enough; also, d' may not be much worse than an optimal rule for P ∈ 𝒫_2. We shall here not investigate under what conditions such a regular decision rule exists or how small h has to be in order that the assumption of continuity cause little harm. These questions may deserve attention. Fortunately, when a statistician applies a decision rule, he is likely to choose the most regular determination available anyway. But the theoretical statistician might do well to be careful when he neglects sets of measure zero.
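The harmlessness of the map f for a regular rule can be sketched numerically. The following illustration is a modern addition: the standard normal distribution, the sample size, and the squared-error risk of the sample mean are illustrative choices, not from the text. Each observation is rounded to the nearest multiple of h, and the risk of the sample mean is compared under P_1 and under P_2 = f(P_1).

```python
import random

random.seed(1)

h = 0.01      # unit of measurement (illustrative)
n = 10        # sample size (illustrative)
reps = 20000  # Monte Carlo repetitions

def round_to_grid(x, h):
    # the map f(P1): each coordinate is replaced by the nearest
    # integer multiple of h, so the sample falls in the set S
    return round(x / h) * h

se_cont = 0.0
se_disc = 0.0
for _ in range(reps):
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    ys = [round_to_grid(x, h) for x in xs]
    # squared error of the sample mean about the true mean 0
    se_cont += (sum(xs) / n) ** 2
    se_disc += (sum(ys) / n) ** 2

risk_cont = se_cont / reps  # approximately 1/n under P1
risk_disc = se_disc / reps  # nearly the same under P2 = f(P1)
```

For a regular rule such as the sample mean the two risks differ by an amount that vanishes with h, in line with the remark above.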
3. Assumptions of varying generality. Suppose we consider making one of two assumptions, P ∈ 𝒫_1 and P ∈ 𝒫_2, where 𝒫_1 ⊂ 𝒫_2. The second assumption is safer, but with the first assumption we may achieve a smaller risk.

The consequences of making the broader assumption when actually the narrower assumption is justified may be called serious if any decision rule which is "good" under the broader assumption is "bad" under the narrower assumption. Thus the consequences will depend on what we mean by a good decision rule. But even with a given definition of "good" or "best" the consequences may depend on the class of decision rules at our disposal. For example, suppose we require a minimax estimator of the mean μ of a normal distribution when the loss function is the squared deviation from μ, and we assume that the variance σ² does not exceed a given number A. If we are restricted to estimators based on a sample of fixed size, the minimax estimator is the sample mean x̄ and does not depend on A. On the other hand, if we are permitted to choose the sample size in advance, and the cost of sampling is taken into account, the minimax estimator will depend on A. If A_2 is substantially larger than A_1, the assumption σ² ≤ A_2 will give us a unique minimax estimator whose performance is poor under the assumption σ² ≤ A_1.
It is not uncommon that a considerable broadening of the assumption does not lead to serious consequences when the narrower assumption is justified. Thus in the standard problems concerning the variance of a normal distribution we need, when the mean is completely unknown, just one more observation to obtain the same expected loss as when the mean is known. Somewhat similar results have been obtained in certain cases where a parametric class of distributions is enlarged to a nonparametric class. Several examples can be found in [9]. Thus in the problem of testing, on the basis of two random samples of fixed size, whether two distributions are equal (and not otherwise specified) against the alternative that the distributions are normal with common variance and means μ_1 < μ_2, the uniformly most powerful similar test is asymptotically as powerful in large samples (in a sense explained in [9]) as the corresponding standard test for testing the equality of the means of two normal distributions. (The former test is of the randomization type introduced by R. A. Fisher; its optimal properties were proved by Lehmann and Stein [12].) Here we assumed that the class of alternatives is the same under both assumptions. Actually the test retains its property of being uniformly most powerful similar even when the class of alternatives is enlarged to a nonparametric class of distributions of an exponential type (see Lehmann and Stein [12]). If the class is further extended, a uniformly most powerful similar test will in general not exist, and it will be necessary to specify against what types of alternatives the power of a test should be large. This can be done in many ways, and an optimal test and its performance in the class of normal distributions will depend on this specification.
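The "one more observation" fact can be checked in a small computation. Under squared-error loss the best multiple of a sum of squares SS ~ σ²χ²_m is SS/(m + 2), with risk 2σ⁴/(m + 2); this is a standard instance of the claim (the choice of loss and of estimators proportional to the sum of squares is an assumption made here for illustration).

```python
# Best multiple of SS ~ sigma^2 * chi2_m under squared-error loss:
# E(c*SS - sigma^2)^2 is minimized at c = 1/(m+2), giving risk
# 2*sigma^4/(m+2), where m is the chi-square degrees of freedom.
def min_risk(m, sigma4=1.0):
    return 2.0 * sigma4 / (m + 2)

n = 10
# mean known, n observations: SS about the known mean has m = n
risk_known = min_risk(n)
# mean unknown, n + 1 observations: SS about the sample mean has
# m = (n + 1) - 1 = n degrees of freedom
risk_unknown = min_risk((n + 1) - 1)
```

The two risks coincide: the broader assumption (unknown mean) costs exactly one observation.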
4. Nonparametric justifications of assumptions of normality. Given a decision rule d which is optimal in a specified sense under the assumption that P is in a class 𝒫_1, it is of interest to determine other classes 𝒫 such that d is optimal (in the same or a suitably extended sense) under the assumption P ∈ 𝒫. If optimal means minimax, an obvious sufficient condition for d to remain a minimax rule in 𝒫 ⊃ 𝒫_1 is that the risk of d in 𝒫 attain its maximum in 𝒫_1. Situations of this type were considered by Hodges and Lehmann [8].

In certain cases we find that a decision rule derived under the assumption of a normal distribution retains its optimal character in a large, nonparametric class of distributions. One result of this type, concerning the minimax character of Markov estimators, can be found in [8]. Similar though weaker results can be obtained in certain testing problems.
As an example consider the following extension of Student's problem. Let ℱ be the class of distributions F with finite mean μ(F), positive variance σ²(F), and such that

(2)    ∫ |x − μ(F)|³ dF(x) ≤ M σ³(F),

where M is fixed and the integral extends over the real line. Let ℱ_δ be the subclass of ℱ with n^(1/2) μ(F)/σ(F) = δ. We want to test the hypothesis F ∈ ℱ_δ, δ ≤ 0, against the alternative F ∈ ℱ_δ, δ ≥ δ_1, where δ_1 > 0. We restrict ourselves to the class D of tests d based on n observations X = (X_1, ..., X_n), independent with the common distribution F, with critical region W = W(d). We choose the risk function

(3)    r(F, d) = a P(W|F)         if F ∈ ℱ_δ, δ ≤ 0,
               = b[1 − P(W|F)]    if F ∈ ℱ_δ, δ ≥ δ_1,
               = 0                elsewhere,

where P(W|F) denotes the probability of X ∈ W when the X_j are independent with the common distribution F, and a, b, and δ_1 are positive constants.

Let d_0 be the test with critical region W_0: t > c, where t = n^(1/2) x̄/s is Student's ratio and the constant c is determined by

(4)    a[1 − S_{n−1}(c, 0)] = b S_{n−1}(c, δ_1);

here S_{n−1}(x, δ) denotes the non-central Student distribution function with n − 1 degrees of freedom and non-centrality parameter δ. It can be shown by standard methods that d_0 is the minimax test in the subclass of ℱ which consists of the normal distributions.

By an inequality of Berry and Esseen (see, e.g., [7]) the distribution function F_n(y) of n^(1/2)(x̄ − μ(F))/σ(F) satisfies

(5)    sup_y |F_n(y) − Φ(y)| ≤ C M n^(−1/2),

where Φ is the standard normal distribution function and C an absolute constant, so that F_n(y) converges to Φ(y) uniformly in y and in F ∈ ℱ as n → ∞. Also, for any η > 0, P(|s/σ(F) − 1| < η | F) → 1 uniformly for F ∈ ℱ. Hence it can be shown that for any real δ and for all F ∈ ℱ_δ we have

(6)    |P(t ≤ y|F) − Φ(y − δ)| ≤ C_n(δ),    −∞ < y < ∞,

where C_n(δ) depends on n, δ, and M only and tends to 0 as n → ∞, for δ fixed. It follows that

(7)    |P(t > c|F) − [1 − S_{n−1}(c, δ)]| ≤ 2 C_n(δ)    for all F ∈ ℱ_δ.

Now if ℱ* denotes the subclass of ℱ with μ(F) = 0 and σ²(F) = 1, we have

(8)    sup_{F ∈ ℱ_δ} P(t > c|F) = sup_{F ∈ ℱ*} P((n^(1/2) x̄ + δ)/s > c | F),

which is a nondecreasing function of δ. The same is true of the infimum in ℱ*. Hence we obtain

(9)    sup_{F ∈ ℱ} r(F, d_0) ≤ inf_{d ∈ D} sup_{F ∈ ℱ} r(F, d) + ε,

where ε = 2 max[a C_n(0), b C_n(δ_1)]. Thus the maximum risk in ℱ of Student's test d_0 exceeds the minimax risk in ℱ by at most ε, where ε is arbitrarily small for n sufficiently large. (Note that the minimax risk is bounded away from zero as n → ∞.)
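The determination of c in (4) is a routine numerical step today. A sketch, assuming SciPy's noncentral Student distribution is available; the values of n, a, b, and δ_1 are illustrative choices, not from the text.

```python
from scipy.optimize import brentq
from scipy.stats import nct, t as tdist

n = 20
df = n - 1
a, b, delta1 = 1.0, 1.0, 3.0  # illustrative constants of the risk (3)

# (4): equalize the two maximal risks,
#   a * [1 - S_{n-1}(c, 0)] = b * S_{n-1}(c, delta1);
# S_{n-1}(., 0) is the central Student distribution.
def gap(c):
    return a * tdist.sf(c, df) - b * nct.cdf(c, df, delta1)

c = brentq(gap, -10.0, 10.0)  # gap is decreasing in c, so the root is unique
risk_null = a * tdist.sf(c, df)          # maximal risk under delta <= 0
risk_alt = b * nct.cdf(c, df, delta1)    # maximal risk under delta >= delta1
```

At the root the two maximal normal-theory risks are equal, which is the defining property of the minimax test d_0 in the normal subclass.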
In the corresponding problem with σ(F) = σ_0 fixed we find in a similar way the stronger result that the maximum risk of the x̄-test in ℱ_δ (the class with μ(F) = δ σ_0/n^(1/2)) lies within a small ε of its "normal" risk, uniformly in δ. The argument which was used above does not permit us to decide whether an analogous result is true when σ(F) is unrestricted.

The explanation for the near-optimal behavior of the "normal" decision rules in these cases is, of course, the distribution-free character of the central limit theorem, combined with the fact that the class ℱ was so chosen as to make the approach to the normal distribution uniform.
5. Distinguishable sets of distributions. If we relax the assumptions more and more, the minimax risk will in general increase, and eventually we may reach a point where the maximum risk of any decision rule is not smaller than the risk of a rule which does not depend on the observations. We shall consider criteria for recognizing when this or a similar situation occurs in testing problems.

Consider a testing (or two-decision) problem such that one or the other decision is definitely preferred according as the distribution P belongs to 𝒫_1 or 𝒫_2, two disjoint subsets of the given class 𝒫. Unless otherwise stated we assume that each P in 𝒫 is a probability measure on (𝒳, 𝒜), where 𝒳 is the space of infinite sequences x = (x_1, x_2, ...) of real numbers and 𝒜 is Kolmogorov's extension to 𝒳 of the ordinary Borel field.
A test will be called finite if it depends only on a finite number of coordinates (observations) x_j. By the critical function of a finite test we mean a measurable function ψ from 𝒳 to the interval [0, 1] such that 1 − ψ(x) [ψ(x)] is the probability of taking the decision corresponding to P ∈ 𝒫_1 [P ∈ 𝒫_2] when x is the sequence of observations. Let D be any class of finite tests, and let Ψ be the class of the critical functions of tests in D. We shall say that the sets 𝒫_1 and 𝒫_2 are distinguishable in D if there exists a ψ in Ψ such that

(10)    sup_{P ∈ 𝒫_1} E(ψ|P) + sup_{P ∈ 𝒫_2} E(1 − ψ|P) < 1,

where E(ψ|P) is the expected value of ψ(x) when x has the distribution P. Otherwise 𝒫_1 and 𝒫_2 are said to be indistinguishable in D. (The property expressed in (10) has an obvious relation with unbiasedness.)
Let D_f denote the class of all finite tests, and let D_n (n = 1, 2, ...) be the class of all fixed sample size tests based on the observations (x_1, ..., x_n). Two sets which are distinguishable in D_f will be called finitely distinguishable. We observe that two sets are finitely distinguishable if and only if they are distinguishable in D_n for some n.

Berger and Wald [2] gave conditions under which two sets of distributions are distinguishable in the class of all non-randomized tests in D_n if and only if they are disjoint. (Their Theorem 3.1 is stated in a slightly more special form.)
A sufficient condition for two sets to be indistinguishable in D_n can be stated as follows. Let 𝒳_n be the space of points (x_1, ..., x_n), and let 𝒜_n, Ψ_n, and 𝒫_i^n be the σ-field of subsets of 𝒳_n, the class of critical functions on 𝒳_n, and the classes of distributions on 𝒳_n which are determined by 𝒜, Ψ, and 𝒫_i in an obvious way. For any two distributions P_1 and P_2 on 𝒜_n we denote by ν any measure on 𝒜_n relative to which P_1 and P_2 are absolutely continuous, and by p_1 and p_2 the respective densities (Radon-Nikodym derivatives). With this notation, the sets 𝒫_1 and 𝒫_2 (or, equivalently, the sets 𝒫_1^n and 𝒫_2^n) are indistinguishable in D_n if for any ε > 0 there exist two distributions P_1 ∈ 𝒫_1^n and P_2 ∈ 𝒫_2^n such that

(11)    ∫ |p_1 − p_2| dν < ε,

where the integral extends over 𝒳_n. This follows from the inequality

(12)    inf_{P ∈ 𝒫_2^n} ∫ ψ dP − sup_{P ∈ 𝒫_1^n} ∫ ψ dP ≤ ∫ ψ(p_2 − p_1) dν.

The statement of the condition remains true in the more general form where P_i is any mixture of distributions in 𝒫_i^n with respect to some probability measure μ_i on a σ-field of subsets of 𝒫_i^n, subject to an obvious measurability condition. The proof is similar and uses Theorem 3 of Robbins [13].

With S = {x: p_1(x) > p_2(x)} we have

(13)    (1/2) ∫ |p_1 − p_2| dν = sup_{A ∈ 𝒜_n} |P_1(A) − P_2(A)| = P_1(S) − P_2(S).

The first equation (13) shows that condition (11) is independent of the choice of ν. The last expression in (13) is often convenient when applying this condition.
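The relations (13) are easy to verify in a finite sample space. The following check is a modern illustration: two arbitrary four-point distributions, with ν the counting measure, so that the densities are the point probabilities themselves.

```python
from itertools import chain, combinations

p1 = [0.5, 0.3, 0.1, 0.1]  # two illustrative distributions on a
p2 = [0.2, 0.2, 0.3, 0.3]  # 4-point space (nu = counting measure)

points = range(4)

# left expression of (13): half the L1 distance of the densities
half_l1 = 0.5 * sum(abs(a - b) for a, b in zip(p1, p2))

# right expression of (13): P1(S) - P2(S) with S = {x: p1(x) > p2(x)}
S = [x for x in points if p1[x] > p2[x]]
gap_S = sum(p1[x] for x in S) - sum(p2[x] for x in S)

# middle expression of (13): sup over all events A of |P1(A) - P2(A)|
def subsets(it):
    s = list(it)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

sup_A = max(abs(sum(p1[x] for x in A) - sum(p2[x] for x in A))
            for A in subsets(points))
```

All three expressions agree, as (13) asserts.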
It follows from an earlier remark that two sets 𝒫_1 and 𝒫_2 are finitely indistinguishable if the condition expressed in (11) is satisfied for every n.

We shall say that 𝒫_1 and 𝒫_2 are finitely absolutely distinguishable if for any ε > 0 there exists a finite test with critical function ψ such that

(14)    sup_{P ∈ 𝒫_1} E(ψ|P) + sup_{P ∈ 𝒫_2} E(1 − ψ|P) < ε.

This property has also been expressed by saying that there exists a uniformly consistent sequence of tests [1].

Now suppose that each P in 𝒫 is the distribution of a sequence of independent, identically distributed random variables. Then if two sets are finitely distinguishable, they are finitely absolutely distinguishable. This is a simple partial extension of a theorem of Berger [1]; the theorem gives a necessary and sufficient condition for the existence of a uniformly consistent sequence of non-randomized tests.
We now give three examples of finitely indistinguishable sets.

Example 5.1. If P is the distribution of independent, normal random variables with mean μ and variance σ², and 𝒫_i is the set with μ = μ_i, 0 < σ² < ∞, then 𝒫_1 and 𝒫_2 are finitely indistinguishable. Condition (11) is satisfied for every n if P_i is the distribution with μ = μ_i and σ sufficiently large. The corresponding result for tests with constant power in 𝒫_1 and 𝒫_2 was proved by Dantzig [5].
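In example 5.1 the left side of (11) admits a closed form: the two n-fold product measures are normal with a common covariance, so their total variation distance is 2Φ(Δ/2) − 1 with Δ = n^(1/2)|μ_1 − μ_2|/σ. This is a standard identity, stated here as an illustration of how the distance is driven to zero by taking σ large.

```python
from math import erf, sqrt

def Phi(x):
    # standard normal distribution function via the error function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def tv_product_normals(mu1, mu2, sigma, n):
    # total variation distance between the n-fold products of
    # N(mu1, sigma^2) and N(mu2, sigma^2): both are normal with the
    # same covariance, so the distance is 2*Phi(Delta/2) - 1, where
    # Delta = sqrt(n) * |mu1 - mu2| / sigma
    delta = sqrt(n) * abs(mu1 - mu2) / sigma
    return 2.0 * Phi(delta / 2.0) - 1.0

n = 25
tv_small_sigma = tv_product_normals(0.0, 1.0, 1.0, n)     # well separated
tv_large_sigma = tv_product_normals(0.0, 1.0, 1000.0, n)  # nearly equal
```

For fixed n the distance can be made arbitrarily small by letting σ grow, which is exactly how condition (11) is satisfied in the example.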
Example 5.2. If P is the distribution of independent, normal random variables with means μ_1, μ_2, ... and variance σ², and 𝒫_i is the set with σ = σ_i, −∞ < μ_j < ∞, j = 1, 2, ..., then 𝒫_1 and 𝒫_2 are finitely indistinguishable. For if P_i is the mixture of the P in 𝒫_i^n according to which the means μ_1, ..., μ_n are independent normal with zero mean and variance τ_i², such that σ_1² + τ_1² = σ_2² + τ_2², the two mixtures coincide. Here we can apply the general form of condition (11).

Example 5.3. This is a further extension of Student's problem (cf. section 4). Let ℱ_i be the class of all distributions F on the real line with finite mean μ(F) and positive variance σ²(F) such that μ(F)/σ(F) = γ_i, γ_1 < γ_2. Let 𝒫_i be the class of distributions of independent random variables with common distribution F ∈ ℱ_i. Then 𝒫_1 and 𝒫_2 are finitely absolutely distinguishable if γ_1 < 0 < γ_2, and finitely indistinguishable if γ_2 ≤ 0 or γ_1 ≥ 0.

If γ_1 < 0 < γ_2, it is easy to show with the aid of Chebyshev's inequality that the tests with critical functions ψ_n(x) = 0 or 1 according as x̄_n ≤ 0 or x̄_n > 0 form a uniformly consistent sequence.

If γ_1 ≥ 0, condition (11) is satisfied for every n if P_i is the distribution with F = F_i, where F_i ascribes probabilities 1 − π_i and π_i = (1 + t_i²)^(−1) to the respective points γ_2 − t_2^(−1) and γ_2 + t_2; here t_2 > 0, t_1 = f(t_2) is the positive root (unique for t_2 small) of the equation

(15)    μ(F_1)/σ(F_1) = γ_1,

and we let t_2 → 0. The case γ_2 ≤ 0 can be reduced to this case.
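For the case γ_1 < 0 < γ_2 the Chebyshev bound behind the uniform consistency of the sign-of-x̄ tests is explicit: for any F with μ(F)/σ(F) = γ ≠ 0, the probability of a wrong decision is at most 1/(nγ²), uniformly in F. A simulation sketch; the shifted exponential below is an illustrative member of such a class, not a distribution from the text.

```python
import random

random.seed(7)

def chebyshev_error_bound(gamma, n):
    # P(wrong decision) <= Var(xbar)/mu^2 = 1/(n * gamma^2),
    # uniformly over all F with mu(F)/sigma(F) = gamma
    return 1.0 / (n * gamma * gamma)

# Illustrative F with gamma = mu/sigma = 0.5: Exponential(1) has mean 1
# and standard deviation 1, so shifting by -0.5 gives mu = 0.5, sigma = 1.
gamma, n, reps = 0.5, 400, 2000
errors = 0
for _ in range(reps):
    xbar = sum(random.expovariate(1.0) - 0.5 for _ in range(n)) / n
    if xbar <= 0:  # the test decides for gamma < 0: an error here
        errors += 1

error_rate = errors / reps
bound = chebyshev_error_bound(gamma, n)
```

The observed error rate stays below the distribution-free bound, and the bound itself tends to 0 as n grows, which gives the uniformly consistent sequence.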
6. Sets distinguishable only by randomized tests: An example. Some results of Lehmann [11] suggest that two sets may be distinguishable in D_n but indistinguishable in the class D_n' of non-randomized tests in D_n. We shall consider a problem where this situation occurs. We denote by Ψ_n (Ψ_n') the class of critical functions of the tests in D_n (D_n'). Thus if ψ ∈ Ψ_n', ψ(x) = 0 or 1 for all x.

Let ℱ_μ be a class of distributions F on the real line with mean μ and variance 1, which contains all distributions with this property which assign probability 1 to at most three points. Let 𝒫_{μ,n} be the class of all distributions of n independent random variables with a common distribution in ℱ_μ. We shall show that 𝒫_{λ,n} and 𝒫_{μ,n} are distinguishable in D_n for all λ ≠ μ and all n = 1, 2, ..., but indistinguishable in D_n' for any n unless |λ − μ| exceeds a positive constant (which depends on n). It is clearly sufficient to take λ = −h, μ = h > 0. We denote by E(ψ|F) the expected value of ψ(X) when the components of X are independent with the common distribution F.
We first prove the second part of the statement in the stronger form: For any n and for any α ∈ (0, 1) the inequalities

(16)    E(ψ|F) ≤ α for all F ∈ ℱ_{−h},    E(ψ|F) ≥ α for all F ∈ ℱ_h

cannot both be satisfied with ψ ∈ Ψ_n' unless h exceeds a positive number which depends only on n (and is of order n^(−1/2)). If ψ is in Ψ_n' and satisfies the first inequality (16), we must have

(17)    ψ(y, ..., y) ≤ α[1 + (y + h)²]^n

for all real y. For if t = y + h ≠ 0, let F' be the distribution (in ℱ_{−h}) which assigns the probabilities (1 + t²)^(−1) and 1 − (1 + t²)^(−1) to the respective points t − h and −t^(−1) − h. Then α ≥ E(ψ|F') ≥ ψ(t − h, ..., t − h)(1 + t²)^(−n). This implies (17) for y + h ≠ 0. If y + h = 0, we use a similar argument with F' any distribution in ℱ_{−h} which assigns to the point −h a probability arbitrarily close to 1.

Similarly, for any ψ ∈ Ψ_n' which satisfies the second inequality (16) we must have

(18)    1 − ψ(y, ..., y) ≤ (1 − α)[1 + (y − h)²]^n

for all real y. Taking y = −h and y = h, and noting that ψ takes only the values 0 and 1, we find that a ψ ∈ Ψ_n' cannot satisfy both inequalities (16) if [1 + (2h)²]^n < max[α^(−1), (1 − α)^(−1)], and hence cannot satisfy them for any α if [1 + (2h)²]^n < 2. (This is not the best bound which can be obtained from (17) and (18).)
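The bound [1 + (2h)²]^n < 2 makes the order of the critical separation explicit: non-randomized tests fail whenever h < h*(n) = (2^(1/n) − 1)^(1/2)/2, and h*(n) n^(1/2) → (log 2)^(1/2)/2, confirming the stated order n^(−1/2). A quick check:

```python
from math import log, sqrt

def h_threshold(n):
    # largest h with (1 + (2h)^2)^n < 2 is h*(n) = sqrt(2**(1/n) - 1)/2;
    # below this separation no non-randomized test can satisfy (16)
    return sqrt(2.0 ** (1.0 / n) - 1.0) / 2.0

# h*(n) * sqrt(n) should approach sqrt(log 2)/2 from above
ratios = [h_threshold(n) * sqrt(n) for n in (10, 100, 1000, 10000)]
limit = sqrt(log(2.0)) / 2.0
```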
We now show that for any h > 0, any n > 1, and any α ∈ (0, 1) condition (16), with at least one strict inequality, can be satisfied by a randomized test in D_n. Let a = h n^(1/2),

(19)    k(c) = a + c + (a − c)^(−1),    b = −k(−c)/2,    d = k(c)/2,    −a < c < a,

and let ψ ∈ Ψ_n depend on x only through y = n^(−1/2)(x_1 + ... + x_n), with

(20)    ψ(y) = 0 if y ≤ b,    ψ(y) = (y − b)/(d − b) if b < y < d,    ψ(y) = 1 if d ≤ y.

Then

(21)    sup_{F ∈ ℱ_{−h}} E(ψ|F) < inf_{F ∈ ℱ_h} E(ψ|F)

for |c| < a. As c increases from −a to a, either side of (21) decreases continuously from 1 to 0. Hence for a suitable choice of c the test satisfies condition (16), with at least one strict inequality.

We sketch the proof of (21). Let f(y) be any polynomial of the second degree such that ψ(y) ≤ f(y) for all real y. If g(x) = f(n^(−1/2) Σ_j x_j), then E(ψ|F) ≤ E(g|F), and E(g|F) is constant in ℱ_μ for each μ. Now choose f so as to minimize E(g|F), F ∈ ℱ_{−h}; the minimum is attained by a distribution in ℱ_{−h} which assigns probability 1 to at most three points, and equals the left side of (21). The right side of (21) is treated in the same way.
7. Sequentially distinguishable sets of distributions. We shall restrict ourselves to sequences of independent random variables with a common distribution F. Suppose that F ∈ ℱ, and let ℱ_1 and ℱ_2 be two disjoint subsets of ℱ. Let D_s = D_s(ℱ) be the class of all sequential tests for taking one of two decisions, a_1 and a_2, which terminate with probability one for all F ∈ ℱ. We denote by Pr(a_i|F, d) and E(n|F, d) respectively the probability of the decision a_i and the expected number of observations required to reach a decision when the distribution is F and test d is used.

The sets ℱ_1 and ℱ_2 will be called sequentially distinguishable (indistinguishable) at F if there exists (does not exist) a d in D_s such that E(n|F, d) < ∞ and

(22)    sup_{F ∈ ℱ_1} Pr(a_2|F, d) + sup_{F ∈ ℱ_2} Pr(a_1|F, d) < 1.

If the left side of (22) is arbitrarily small for some d in D_s with E(n|F, d) < ∞, then ℱ_1 and ℱ_2 are said to be sequentially absolutely distinguishable at F. If ℱ_1 and ℱ_2 are sequentially [absolutely] distinguishable (indistinguishable) at every F in a class ℱ*, then ℱ_1 and ℱ_2 will be said to be sequentially [absolutely] distinguishable (indistinguishable) in ℱ*.

Note that these definitions are stated in terms of the sets ℱ_1 and ℱ_2 rather than in terms of the corresponding sets of distributions of sequences. Statements such as "ℱ_1 and ℱ_2 are finitely indistinguishable" will have an obvious meaning in this context.
A sufficient condition for two sets to be sequentially indistinguishable is implied by an inequality proved in [10]. Let F_1 ∈ ℱ_1, F_2 ∈ ℱ_2, F ∈ ℱ, and let ν be a measure relative to which these three distributions are absolutely continuous, with respective densities f_1, f_2, and f. By a trivial extension of [10, (4)], if d is any test in D_s such that

(23)    sup_{F ∈ ℱ_1} Pr(a_2|F, d) ≤ α_1,    sup_{F ∈ ℱ_2} Pr(a_1|F, d) ≤ α_2,

then

(24)    E(n|F, d) ≥ −log[α_1^c (1 − α_2)^(1−c) + (1 − α_1)^c α_2^(1−c)] / [c ∫ f log(f/f_1) dν + (1 − c) ∫ f log(f/f_2) dν]

for 0 < c < 1, where the integrals are taken over the entire space. If, in particular, F ∈ ℱ_1, the right side of (24) is maximized with F_1 = F and c → 1, and we obtain

(25)    E(n|F, d) ≥ [α_1 log(α_1/(1 − α_2)) + (1 − α_1) log((1 − α_1)/α_2)] / ∫ f log(f/f_2) dν.

We note that the numerators and denominators in (24) and (25) are positive; the denominators may be infinite. Hence if for any positive number M and any two positive numbers α_1 and α_2 with α_1 + α_2 < 1 the distributions F_1 ∈ ℱ_1 and F_2 ∈ ℱ_2 and the number c can be so chosen that the right side of (24) exceeds M, the sets ℱ_1 and ℱ_2 are sequentially indistinguishable at F. If F ∈ ℱ_1, the two sets are sequentially indistinguishable at F if for any ε > 0 we can find an F_2 ∈ ℱ_2 such that

(26)    ∫ f log(f/f_2) dν < ε.
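The bound (25) is easy to evaluate numerically. The normal example and the error levels below are illustrative assumptions: for F = N(0, 1) and F_2 = N(μ_2, 1) the denominator ∫ f log(f/f_2) dν equals μ_2²/2, and the bound grows without limit as μ_2 → 0, which is the content of condition (26).

```python
from math import log

def asn_lower_bound(alpha1, alpha2, kl):
    # (25): E(n|F,d) >= [alpha1*log(alpha1/(1-alpha2))
    #        + (1-alpha1)*log((1-alpha1)/alpha2)] / integral(f*log(f/f2))
    num = alpha1 * log(alpha1 / (1.0 - alpha2)) \
        + (1.0 - alpha1) * log((1.0 - alpha1) / alpha2)
    return num / kl

# Illustrative case: F = N(0,1) in F_1 versus F2 = N(mu2, 1);
# the Kullback-Leibler integral is mu2**2 / 2.
mu2 = 0.5
kl = mu2 ** 2 / 2.0
bound = asn_lower_bound(0.05, 0.05, kl)   # about 21 observations
```

With both error bounds at 0.05 and μ_2 = 0.5 the expected sample size must exceed about 21; as the divergence in the denominator tends to 0, the bound exceeds any prescribed M.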
By example 5.1 two sets of normal distributions with fixed means and unrestricted variances are finitely indistinguishable. On the other hand, by a well-known result of Stein [14], these sets are sequentially absolutely distinguishable in the class of all normal distributions. However, if the requirement E(n|F, d) < ∞ is replaced by the stronger condition that E(n|F, d) = E(n|μ, σ; d) be bounded in σ for μ fixed, inequality (24) easily implies that condition (22) cannot be satisfied.
As an application of condition (26) we shall show that the sets ℱ_1 and ℱ_2 of example 5.3, with γ_1 = 0 < γ_2, are sequentially indistinguishable in ℱ_2. Let F be any distribution in ℱ_2, so that μ(F)/σ(F) = γ_2. Let F_1 = (1 − t)F + tG, where 0 < t < 1 and G is the distribution which assigns probability one to the point a = −μ(F)(1 − t)/t. Then F_1 ∈ ℱ_1. Both F_1 and F are absolutely continuous relative to ν = F_1, with respective densities f_1(x) = 1 and

(27)    f(x) = 1/(1 − t) if x ≠ a,    f(x) = b/(b + t − bt) if x = a,

where b = b(t) is the F-probability of the point a. Hence

(28)    ∫ f log(f/f_1) dν = −(1 − b) log(1 − t) + b log[b/(b + t − bt)],

where the last term is to be omitted if b = 0. By Chebyshev's inequality, b ≤ t²/γ_2². Hence the right side of (28) tends to 0 as t → 0. Thus condition (26) (with ℱ_1 and ℱ_2 interchanged) can be satisfied for any ε > 0.

The proof shows that this result still holds if ℱ_2 and ℱ_1 consist only of the normal distributions and of the distributions (1 − t)F + tG, with F normal and t less than an arbitrarily small number (which, in a sense, are very close to normal distributions).
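The right side of (28) can be evaluated directly. The following sketch (γ_2 = 2 is an illustrative value) evaluates (28) at the extreme value b = t²/γ_2² permitted by Chebyshev's inequality and shows it tending to 0 with t, as the argument requires.

```python
from math import log

def integral_28(t, b):
    # (28): integral of f*log(f/f1) d(nu)
    #     = -(1-b)*log(1-t) + b*log(b/(b+t-b*t)),
    # where b is the F-probability of the point a carrying G's mass;
    # the second term is omitted when b = 0
    val = -(1.0 - b) * log(1.0 - t)
    if b > 0:
        val += b * log(b / (b + t - b * t))
    return val

gamma2 = 2.0
vals = []
for t in (0.1, 0.01, 0.001):
    b_max = t * t / gamma2 ** 2   # Chebyshev: b <= t^2 / gamma2^2
    vals.append(integral_28(t, b_max))
```

The values decrease roughly like t, so for any ε > 0 a small enough t satisfies condition (26).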
REFERENCES

[1] A. Berger, "On uniformly consistent tests," Ann. Math. Stat., Vol. 22 (1951), pp. 289-293.

[2] A. Berger and A. Wald, "On distinct hypotheses," Ann. Math. Stat., Vol. 20 (1949), pp. 104-109.

[3] G. E. P. Box, "Non-normality and tests on variances," Biometrika, Vol. 40 (1953), pp. 318-335.

[4] G. E. P. Box and S. L. Andersen, "Robust tests for variances and effect of non-normality and variance heterogeneity on standard tests," Institute of Statistics (Consolidated University of North Carolina, Raleigh, N. C.) Mimeo. Series, No. 101 (1954).

[5] G. B. Dantzig, "On the non-existence of tests of 'Student's' hypothesis having power functions independent of σ," Ann. Math. Stat., Vol. 11 (1940), pp. 186-192.

[6] F. N. David and N. L. Johnson, "Extension of a method of investigating the properties of analysis of variance tests to the case of random and mixed models," Ann. Math. Stat., Vol. 23 (1952), pp. 594-601.

[7] B. V. Gnedenko and A. N. Kolmogorov, Limit Distributions for Sums of Independent Random Variables, Cambridge, Mass., Addison-Wesley, 1954.

[8] J. L. Hodges, Jr. and E. L. Lehmann, "Some problems in minimax point estimation," Ann. Math. Stat., Vol. 21 (1950), pp. 182-197.

[9] W. Hoeffding, "The large-sample power of tests based on permutations of observations," Ann. Math. Stat., Vol. 23 (1952), pp. 169-192.

[10] W. Hoeffding, "A lower bound for the average sample number of a sequential test," Ann. Math. Stat., Vol. 24 (1953), pp. 127-130.

[11] E. L. Lehmann, "Consistency and unbiasedness of certain nonparametric tests," Ann. Math. Stat., Vol. 22 (1951), pp. 165-179.

[12] E. L. Lehmann and C. Stein, "On the theory of some non-parametric hypotheses," Ann. Math. Stat., Vol. 20 (1949), pp. 28-45.

[13] H. Robbins, "Mixture of distributions," Ann. Math. Stat., Vol. 19 (1948), pp. 360-369.

[14] C. Stein, "A two-sample test for a linear hypothesis whose power is independent of the variance," Ann. Math. Stat., Vol. 16 (1945), pp. 243-258.