THE RELATIONSHIP BETWEEN SUFFICIENCY AND INVARIANCE
WITH APPLICATIONS IN SEQUENTIAL ANALYSIS

By

W. J. Hall, R. A. Wijsman, and J. K. Ghosh
University of North Carolina, University of Illinois, and Calcutta University

Institute of Statistics Mimeo Series No. 403
September 1964

This research was supported at various times during the years 1959-1964 by the United States Government (through the Air Force Office of Scientific Research, the Office of Naval Research, the National Institutes of Health, and the National Science Foundation) and the Government of India; final preparation and distribution was arranged in Chapel Hill under N.I.H. Grant GM-10397.

DEPARTMENT OF STATISTICS
UNIVERSITY OF NORTH CAROLINA
Chapel Hill, N. C.
THE RELATIONSHIP BETWEEN SUFFICIENCY AND INVARIANCE
WITH APPLICATIONS IN SEQUENTIAL ANALYSIS

By W. J. Hall¹, R. A. Wijsman², and J. K. Ghosh³
University of North Carolina, University of Illinois, and Calcutta University

Explanatory note.
The main result in this paper relating sufficiency and invariance was originally found by Charles Stein before 1950 but was not published and not widely known. It has since been rediscovered independently by Burkholder in 1958 (reported in [7]), by Hall in 1959 [18, 19], and by Ghosh in 1960 [16]; the best theorems of this kind have since been developed by Wijsman. This result is closely related to a theorem of D. R. Cox [9], published in 1952 and widely used in sequential analysis (e.g., [14, 17, 22, 23]), though Cox made no explicit use of invariance concepts. The result, together with extensions (due to Wijsman), related results on transitivity (due to Ghosh), and sequential applications (due to Hall and Ghosh), is now finally published as a joint contribution, with the permission of Stein and Burkholder.
This paper is presented in two parts. Part I, largely written by Hall and Ghosh, discusses the implications of the main result and sketches a proof. It also discusses a result on transitivity and the application of it and the main result to sequential analysis. Several normal theory examples and a sequential rank test are treated in some detail. Part II, largely written by Wijsman, presents the general theory in the subfield mode, including related results on conditional independence and transitivity, and additional examples.

The authors wish to thank H. K. Nandi for the research guidance given to one of them, D. L. Burkholder for helpful discussions, and E. L. Lehmann, J. L. Hodges, and W. Kruskal for making this joint endeavor possible.
TABLE OF CONTENTS

Explanatory note

Part I. EXPOSITION AND SEQUENTIAL APPLICATIONS
I.1. Introduction                                                        I-1
I.2. Invariantly sufficient statistics                                   I-3
I.3. Some examples                                                       I-10
I.4. Invariant sufficiency and transitivity                              I-12
I.5. Application to sequential tests of composite hypotheses: v-rules    I-16
I.6. Sequential F, t², and T² tests                                      I-19
I.7. Alternative sequential tests: z-rules                               I-25
I.8. Nonparametric sequential applications                               I-28

Part II. GENERAL THEORY
II.0. Summary                                                            II-1
II.1. Introduction                                                       II-2
II.2. Preliminaries on transformations and conditional independence      II-4
II.3. Sufficiency and invariance in the nonsequential case. Assumption A II-8
II.4. Sufficiency, invariance and transitivity in the sequential case    II-13
II.5. Discussion of Assumption A                                         II-17
II.6. Assumption B: invariant conditional probability distribution       II-18
II.7. Assumption C: invariant conditional density                        II-21
References                                                               II-34
Footnotes                                                                II-38
PART I: EXPOSITION AND SEQUENTIAL APPLICATIONS

I.1. Introduction. We investigate in what sense sufficiency properties are preserved under the invariance principle and thereby obtain an interpretation of the sufficiency of a statistic in the presence of nuisance parameters--an interpretation which facilitates the derivation of some sequential tests of composite hypotheses.
Sufficiency and invariance are reduction principles--principles for condensing or reducing the data x to a few statistics which can then be used for purposes of drawing inferences concerning the probability model of the data. Loosely speaking, a sufficiency reduction replacing x by s = s(x) discards information which is not relevant to the parameter θ of the model; an invariance reduction⁴ replacing x by u = u(x) discards information about θ which does not pertain solely to a parametric function γ = γ(θ) of special interest; the two reductions applied in tandem, replacing x by v = v(x) where v(x) = v_1(s(x)) or v_2(u(x)), retain only relevant information pertaining solely to γ. It is the interpretation of this latter statement that concerns us here.
More specifically, suppose we consider a family of distributions indexed by θ and some group of transformations on the sample space (e.g., changes in sign, location, scale or order) which leaves the family of distributions unchanged. Then decision procedures will not be affected by the transformations if they are based on invariant functions on the sample space; such invariant functions have distributions depending only on some function, say γ, of θ. We shall show that, loosely speaking, if a statistic s contains all relevant information about θ then a maximal invariant function of s contains all the relevant information about γ that is available in any invariant function. We call such functions invariantly sufficient; they are sufficient for the family of distributions of a maximal invariant function--or in a sense to be explained, of any invariant function--on the sample space. This result is a consequence of the Stein Theorem presented in this paper.
It may be illustrated by the following diagram, in which vertical arrows indicate maximal invariance reductions and horizontal arrows sufficiency reductions:

    X_θ ───> S_θ
     │        │
     v        v
    U_γ ───> V_γ

Here X_θ may be thought of as the probability model for the data x with distribution depending on θ; S_θ as the probability model for a sufficient statistic s whose distribution also depends on θ; U_γ as the probability model for a maximal invariant function u, with distribution depending only on γ; and V_γ as the probability model for an invariantly sufficient statistic v, obtained by either route in the diagram--as a sufficiency reduction on U_γ (the definition of invariant sufficiency) or as a maximal invariance reduction on S_θ. In applications it is usually the sufficiency reduction on U_γ that is wanted, while it is the maximal invariance reduction on S_θ that is the easier to perform. That these two routes are equal under certain conditions is the content of the Stein Theorem.
These results seem to be implicitly assumed by other authors. Thus Lehmann [30] derives most of the common significance tests about normal models as uniformly most powerful invariant tests based on statistics obtained via the upper route (X_θ --> S_θ --> V_γ). In these normal theory problems, sufficiency and invariance frequently reduce the data to a single numerical-valued statistic--the sample mean or its magnitude (see I.3), the sample variance (I.3), the t-statistic (I.3), the F-ratio (I.6), Hotelling's T²-statistic (I.6), the (multiple) correlation coefficient (II.7), and others [30, 13, 1].
In Lehmann's treatment of rank tests [30], sufficiency and invariance reductions are applied in alternating order. For example, when testing whether two random samples come from the same population with a continuous distribution function against the alternative that one variable is stochastically larger than the other, sufficiency and invariance reductions reduce the two samples to the ranks (in the combined sample) associated with one sample (I.8). Without the Stein Theorem the justification and interpretation of these reductions is not clear.
The situation is similar as regards sequential analysis. Bahadur [4] has shown that reduction by the sufficiency principle is also possible in sequential analysis if the sequence of sufficient statistics, for the data up to stage n, n = 1, 2, ..., is transitive (as defined by him). By interpreting the Stein Theorem as a theorem on conditional independence, it can be shown that the transitivity of a sufficient sequence is preserved under an invariance reduction. Hence, in this case also, one may either reduce first by invariance and then by sufficiency or follow the reverse procedure, and the latter is usually easier.
Moreover, the sufficiency assertion in the Stein Theorem may be used to construct sequential tests for certain kinds of composite hypotheses. The method essentially consists of applying a Wald SPRT for simple hypotheses about γ (or any other sequential test based on likelihood ratios) to what we shall call an invariantly sufficient sequence of statistics, the successive values of v. This method has been described by Cox [9] who gave conditions under which the joint density of n terms in this sequence factors conveniently. Application of the Stein Theorem clarifies the motivation of these tests, reinterpreting them through the principle of invariance, and also constitutes a simplification and extension of Cox's factorization theorem. Moreover, it should be noted that Cox's result is incorrectly stated, some vital assumptions having been omitted (see I.5).
In these sequential applications, and also in many nonsequential contexts, invoking invariance can be considered as a way of handling nuisance parameters⁵. Sometimes we are not concerned with θ as a whole but only some function (say γ) of it--the remaining variability in the parameter space being ascribed to nuisance parameters. For example, a normal distribution with mean θ and known variance may be considered as a two-parameter distribution, one parameter being the magnitude of the mean (γ = |θ|) and the other its sign. By invoking invariance under changes in sign, one obtains the magnitude of the sample mean as an invariantly sufficient statistic--a statistic which contains all the information about |θ| that is available in any invariant statistic, the sign of θ being a nuisance parameter (see I.3). Similarly, with normal mean and variance unknown and θ = (μ, σ²), the sample variance is invariantly sufficient for the population variance γ = σ² under changes in location, and the t-statistic is invariantly sufficient under changes in scale (γ = μ/σ); in fact, the t-statistic is invariantly sufficient in a wider context (see II, Example 7.3). Unfortunately, however, invariance theory is not always applicable in problems with nuisance parameters--seldom is it applicable in discrete models.
Invoking invariance in inference or decision problems insures that the error probabilities or risk functions will be independent of the nuisance parameters. However, it may be possible to reduce the error probabilities or the risk functions by using non-invariant decision procedures, and in fact, as shown by Stein (see pp. 338-9 in [30]), there are examples where all invariant procedures are inadmissible. On the other hand the (extended) Hunt-Stein Theorem shows that in a number of important cases minimax solutions may be invariant [30], and invariance theory may then provide a useful means of deriving or characterizing minimax procedures.
I.2. Invariantly sufficient statistics. We represent a probability model (space) by X_θ = (X, A, P_θ) where X is a sample space of points x, A is a given σ-field of subsets of X, and P_θ is a probability measure on A. For simplicity, one might like to think of X_θ as a notation for a random sample from a population with density or mass function p_θ. We represent a class of probability models indexed by θ by X_Θ = {X_θ : θ ∈ Θ} where Θ is some index set.

Any (measurable) function t on X induces a new probability model which we denote by T_θ = (T, A^t, P_θ^t), with analogous notations for other functions. Here P_θ^t is the induced probability measure on T = t(X), the sample space of t (that is, the density or mass function p_θ^t of t is obtained by transformation from p_θ).
We consider a group G with elements g of one-to-one (measurable) transformations from X onto itself, and assume, as in [30], that each g induces a transformation ḡ from Θ onto itself defined by P_θ(gx ∈ A) = P_ḡθ(x ∈ A), A ∈ A, θ ∈ Θ. Hence, the transformed model is among those considered originally. We represent these assumptions symbolically as gX_θ = X_ḡθ, and say the class of models is invariant under G. Ḡ will denote the group with elements ḡ.
The group G partitions X into equivalence classes or orbits. A function t on X which is constant on each orbit is invariant. More specifically, t is called invariant on X under G if t(x) = t(gx) for all g and x. If an invariant function u on X assumes a different value on each orbit, it is a maximal invariant. In other words, u is a maximal invariant on X under G if u(x) = u(gx) for all g and x and if u(x') = u(x) implies the existence of a g ∈ G for which x = gx'. Maximal invariants always exist, and they have the property that all invariant functions are functions of the maximal invariant; also, if t is invariant under G then its distribution depends only on a maximal invariant function, say γ, on Θ under Ḡ [30].
We denote the probability model corresponding to an invariant statistic t by T_γ = (T, A^t, P_γ^t), γ ∈ Γ = γ(Θ). A set A ∈ A is an invariant set if x ∈ A implies gx ∈ A for every g ∈ G. Any invariant set is of the form {x: u(x) ∈ A^u} where u is a maximal invariant.
A (measurable) function s on X is said to be a sufficient statistic for X_Θ if for every A ∈ A and s_0 ∈ S there is a version of the conditional probability P_θ(A|s_0) = P_θ(x ∈ A | s(x) = s_0) which does not depend on θ.
If t is any statistic for which t(gx) = t(gx') whenever t(x) = t(x'), we say that G induces a group G^t of transformations g^t on T. Here g^t is defined by g^t t' = t(gx') for t' ∈ T and x' satisfying t(x') = t'. Clearly, if u_t⁶ is invariant on T under G^t, then u = u_t t (= u_t[t(x)]) is invariant on X under G.
Finally, we shall assume that any sufficient statistic s which we consider is such that G actually induces a group G_s of transformations on the sample space S of s. Although this assumption holds in all interesting examples known to us, counterexamples can easily be constructed; but without this assumption the invariance reduction on S_θ indicated in the diagram cannot even be defined.

In summary then, our basic assumptions are that we are considering a class of probability models X_Θ, a group G of one-to-one transformations on the sample space X which leaves the class of models invariant (gX_θ = X_ḡθ), and a sufficient statistic s on X which has the property that G induces a group G_s of transformations on the sample space of s.
"III
II
II
II
Ii
II
I
-I'
I
I
I
I
I
I
.-I
I
I
1-7
\
I
I
'I
I
I
I
I
II
I
I
I
1
I
I
\t
I
We now define invariant sufficiency, which describes the lower route in the diagram:

DEFINITION. A function v on X is invariantly sufficient for X_Θ under G if

(i) v is invariant under G, and

(ii) the conditional probability of any invariant set A given v is parameter-free for θ ∈ Θ (for suitable determination of the conditional probability).

By (i) we may write v = v_u u where v_u is a function on U and u is a maximal invariant on X. Then a statement equivalent to (ii) is

(ii') v_u is sufficient for U_Γ,

since P_γ(u ∈ A^u | v_u = v_0) is the same as P_θ(u(x) ∈ A^u | v(x) = v_0). It is tempting to think of (ii) as stating that v is sufficient for the distributions of any invariant t, but this would require an extended definition of sufficiency which did not require v to be expressible as v_t t.⁷ We can say, however, that v (or rather v_w where v = v_w w) is sufficient for the distributions of w = (v, t) where t is any invariant function. It is with these various interpretations in mind that we state loosely that v contains all the information about γ that is available in any invariant function.
We are now prepared to give an informal statement of the main theorem:

STEIN THEOREM. Under certain assumptions, if s is sufficient for X_Θ and u_s is a maximal invariant on S under G_s, then v = u_s s is invariantly sufficient for X_Θ under G.

In the case of discrete distributions (actually, only s need be discrete), the Stein Theorem can be proved without any additional assumptions. The proof is elementary and is given here in two parts:
1°. For any θ and s_0 for which P_θ(s(x) = s_0) > 0, any g, and any invariant set A (i.e., gA = {gx: x ∈ A} = A), we have

    P_θ(A|s_0) = P_θ(x ∈ A, s(x) = s_0) / P_θ(s(x) = s_0)
               = P_ḡθ(x ∈ gA, s(x) = g_s s_0) / P_ḡθ(s(x) = g_s s_0)
               = P_ḡθ(gA | g_s s_0) = P_ḡθ(A | g_s s_0).

Since s is a sufficient statistic, the extreme members are parameter-free, so that p(A|s_0) = p(A|g_s s_0).
2°. Let A be an invariant set and let P_θ(s|v_0) denote P_θ(s(x) = s | v(x) = v_0); then P_θ(A|v_0) = Σ P_θ(s|v_0) p(A|s), where the summation is over all s-values for which u_s(s) = v_0. Since u_s is a maximal invariant, all those s-values are of the form g_s s_0, with s_0 fixed, g_s ∈ G_s. Therefore, P_θ(A|v_0) = Σ P_θ(g_s s_0 | v_0) p(A|g_s s_0), where the summation is over all g_s. But p(A|g_s s_0) = p(A|s_0) by 1°, so that it factors out of the summation, leaving a sum which is unity. We have thus proved that P_θ(A|v_0) = p(A|s_0), free of θ, which concludes the proof.
Part 2° of the proof easily generalizes to the general (instead of discrete) case, by noting from the conclusion of part 1° that p(A|s) depends on s only through u_s s = v, and writing P_θ(A|v_0) = E_θ(p(A|s) | v_0) = p(A|s_0), which is parameter-free.
Part 1° is not so immediately generalized. In the discrete case, it shows that the conditional probability of an invariant set, given s, is invariant. It is not even known whether this result is true in general. Therefore, in order to obtain more generally the invariance of p(A|s), and with it the Stein Theorem, another assumption has to be made. The choice of the most convenient such assumption depends on the type of problem at hand, and so in Part II three possibilities are offered: Assumptions A, B, and C. We indicate below specialized versions of each of these which will enable us to apply the Stein Theorem to a variety of examples; the sufficiency of these assumptions is verified in Part II.

Assumption A is satisfied if every almost invariant function on S is equivalent to an invariant function. (We really refer to Assumption A(ii) of Part II, since we assume A(i) as part of our basic assumptions in Part I.) This is commonly true in parametric problems, a useful sufficient condition being the existence of an invariant measure on G (see pp. 225-8 and 335 in [30] and II.3, II.5). Moreover, it always holds for finite groups, such as sign changes or permutations (see I.3).
Assumption B is, essentially, that there exist an invariant conditional probability distribution, p(A|s) = p(gA|g_s s), from which the theorem readily follows, and this assumption is easily verified in some important nonparametric applications. In fact, as a useful special case, suppose the sufficient statistic s has the property that any s-set B may be partitioned into sets B_1, B_2, ..., in such a way that the set of x-values mapping into B_i may be partitioned into i subsets of equal probability (for all θ) on which s is one-to-one; hence, there are a finite number of x-values, all "equally likely", which map into any s-value. Suppose G is any group which induces a group on S. Then p(A|s_0) may be taken as the proportion of the x-values mapping into s_0 which are in A. That this p(A|s) is invariant is immediate; that it is a version of the conditional probability is straightforward to verify. An example is provided by s being the order statistic(s) corresponding to samples from one or more populations, gx being obtained from x by applying an order-preserving transformation to each observation (see I.8 and Example II.6.1). Another example may be provided by a sufficiency reduction which is simply the dropping of signs in data with a symmetric distribution.
Assumption C concerns regular continuous cases, and the Stein Theorem under this assumption may be considered a rigorous version of the Cox Theorem. The conditions are that x has a multivariate (non-singular) continuous distribution, the region of positive density not varying with θ, and that the factorization of the joint density of x may be written g_θ(s(x)) h(x), where the transformations g in G, the sufficient statistic s, and the factor h satisfy certain regularity conditions, namely: for all x-values except those lying in an invariant set A_0 having probability zero and satisfying the condition that s(x) ≠ s(x') if x, but not x', is in A_0, we have (a) each g is continuously differentiable and both the Jacobian and h(gx)/h(x) depend only on s(x), and (b) s is continuously differentiable with matrix of partial derivatives of maximal rank. Most normal theory examples satisfy these conditions (see I.3 and II.7).
Finally, it is trivial to show that completeness of a family of distributions is preserved under invariance reductions. However, in the absence of completeness, minimality of sufficiency is not in general preserved (see II.3). Hence, it is possible for maximal reductions made by the upper route in the diagram in I.1 to lead to a lesser reduction than maximal reductions made by the lower route.
I.3. Some examples. As a simple illustrative example consider the following: x = (x_1, x_2) where the x_i's are independent and normal with means θ and unit variances, Θ = {θ: |θ| < a} (a finite or infinite), and G = {g⁺, g⁻} where g⁺x = x and g⁻x = -x = (-x_1, -x_2). Then Ḡ is found to be {ḡ⁺, ḡ⁻} where ḡ^± θ = ±θ, so that the class of models is invariant under G. Moreover, s(x) = x_1 + x_2 is sufficient for X_Θ, and G induces the group G_s = {g_s⁺, g_s⁻}, where g_s^± s = ±s, on S, so that the basic assumptions are satisfied. Moreover, Assumption A holds since G is finite.

The maximal invariants are found to be: γ(θ) = |θ|; u = (u_1, u_2, u_3) where u_1(x) = |x_1|, u_2(x) = |x_2|, and u_3(x) = 1 if x_1 x_2 > 0 and = 0 otherwise; and u_s = |s|, or v(x) = u_s s(x) = |x_1 + x_2|. We prove it for u only: (i) u(g^± x) = u(x); (ii) u(x) = u(x') implies x_i = ±x_i' with the same sign for i = 1, 2, that is, x = g^± x'; the magnitude of each coordinate x_i, together with the knowledge of which pair of diagonally opposite quadrants contains (x_1, x_2), form a maximal invariant u (a diagram may be helpful).
The theorem states that any conditional probability statement about any invariant function t--that is, any function for which t(x) = t(-x)--given v is free of θ-dependence; i.e., among invariant functions, |x_1 + x_2| contains all the available information about |θ|. For example, any statement about |x_1|, or about (|x_1|, |x_2|), given |x_1 + x_2|, is free of dependence on |θ|. This example is readily extended to samples of size n > 2.
For a second example, suppose x = (x_1, ..., x_n) where the x_i's are independent and normal with θ = (μ, σ), Θ is the upper half-plane, and G is the group of scale changes, an element of which multiplies each x_i by a specific positive constant c (see pp. 98-99 in [13]). The sample mean and standard deviation together constitute a sufficient statistic, and the basic assumptions are readily verified (the induced groups Ḡ and G_s again being groups of scale changes). Assumption A is satisfied since the absolutely continuous measure with derivative 1/c is an invariant measure on the Borel sets of the positive reals {c}, and correspondingly on G. Alternatively, Assumption C may be verified, the set A_0 being the line on which x_1 = x_2 = ... = x_n (see II, Examples 7.1 and 7.3).
The maximal invariants are found to be γ(θ) = μ/σ; u(x) = (x_1/x_n, ..., x_{n-1}/x_n, sgn x_n); and v is Student's t-statistic. The theorem thus states that the t-statistic is sufficient for the class of distributions of u, this class being indexed by γ. If t' is any invariant statistic--e.g., the sample mean divided by the sample range--then the t-statistic is sufficient for the distributions of t' in the sense that any probability statement about t'(x) given v(x) is parameter-free.

Similarly, the magnitude (or square) of the t-statistic is invariantly sufficient for the distributions of the maximal invariant u(x) = (x_1/x_n, ..., x_{n-1}/x_n) under changes in sign and scale, the distributions being indexed by γ². Uniform and exponential location-scale parameter examples may be treated analogously.
I.4. Invariant sufficiency and transitivity. In this section we discuss sufficiency and invariance for a sequential experiment. The experiment may be terminated at any stage, but performance of stage n implies previous performance of stages 1, 2, ..., n-1. We must distinguish three kinds of probability models:
(i) the component or marginal models X_nθ = (X_n, A_n, P_nθ) for the stage n data x_n (n = 1, 2, ...),

(ii) the joint (n-stage) models X_(n)θ = (X_(n), A_(n), P_(n)θ) for the accumulated data x_(n) = (x_1, ..., x_n) through stage n, and

(iii) the sequential model X_θ = (X, A, P_θ) for the whole sequence of data x = (x_1, x_2, ...).

Here, X_(n) and X are the product (n-fold and infinite, respectively) sample spaces with components X_1, X_2, ..., X_n, ..., and A_(n) and A are the respective product σ-fields of events [32]. For each θ ∈ Θ, P_θ is a probability measure on (X, A), and P_(n)θ and P_nθ are the corresponding joint and marginal probability measures derived therefrom; thus, for A ∈ A_(n), P_(n)θ(A) is the probability according to P_θ that the n-tuple x_(n), obtained by truncating the sequence x, lies in A, and similarly for P_nθ.

We shall largely be concerned with the sequence of joint models {X_(n)θ}. The concepts of sufficiency and transitivity are defined (below) in terms of this sequence. Invariance, however, is more suitably defined in terms of the sequential model X_θ, although, by giving up justification for invariance reductions, we could avoid the sequential model altogether.
If, for each n, s_n is sufficient for the class X_(n)Θ of joint models (θ ∈ Θ), then s = (s_1, s_2, ...) is called a sufficient sequence for X_Θ; s_n is a function of the first n observations.

For each n, suppose t_n is a function of x_(n). If, for all θ and each n, the conditional distribution of t_{n+1} given x_(n) is identical with the conditional distribution of t_{n+1} given t_n, then t = (t_1, t_2, ...) is said to be a transitive sequence for X_Θ. This definition is adequate for the discrete and continuous case examples treated here; a general definition appears in II.4 and [4]. The idea is that all the information about t_{n+1} contained in x_(n) is carried by t_n = t_n(x_(n)).
In sequential inference problems about θ, Bahadur [4] has shown that attention may be confined to sequential decision rules, here called s-rules, which depend at each stage n only on s_n, provided s = (s_1, s_2, ...) is a sufficient and transitive sequence for X_Θ.
We now introduce an invariance structure on X_Θ. Suppose G is a group of transformations g on the sequential sample space X for which gX_θ = X_ḡθ, with maximal invariant γ on Θ. We shall further assume that each g induces a transformation g_(n) on the n-fold sample space X_(n); that is, if x' = gx then x'_(n) = g_(n)x_(n), where x_(n) is the n-fold truncation of x. It is easily seen that g_(n)X_(n)θ = X_(n)ḡθ; that is, the joint models are also invariant (but see the next paragraph). In particular, this further assumption holds if g acts component-wise: gx = (g_1 x_1, g_2 x_2, ...), and this commonly occurs in applications (see also [26]). Typically, the stages in the sequential experiment are mutually independent copies, in which case g must act component-wise with identical components g_n--for example, each X_n is a Euclidean plane and each g_n is a rotation, g_(n) rotating each of the n component planes the same amount.
(Strictly speaking, all identical joint (and marginal) models--which need not be distinct for all θ ∈ Θ even though the sequential models are--should be given the same index value, by introducing a reduced parameter index θ_n = w_n(θ), say; the maximal invariant on Θ_n may also vary with n, being a function of γ, however. For example, suppose the x_n's are mutually independent N(μ_n, μ_n²), θ = (μ_1, μ_2, ...), and θ_n = (μ_1, ..., μ_n), a new parameter being introduced at each stage; let g act component-wise with g_n x_n = c_n x_n (c_n > 0), and the maximal invariants under Ḡ and Ḡ_(n) are γ = {sgn μ_n} and the first n components thereof, respectively. We choose to avoid this notational complexity, feeling the imprecision should cause no real difficulty.)
Now let u_n denote a maximal invariant on X_(n) under G_(n). Since, as is clear, G_(n) induces G_(m) for m < n, u_m considered as a function of x_(n) (depending on the first m coordinates) is invariant and hence a function of u_n; that is, from knowledge of the value of one term in the sequence u = (u_1, u_2, ...), all prior terms may be evaluated. However, u itself is not necessarily a maximal invariant under G; but it is this sequence of maximal invariants (under G_(n)) that is relevant in the sequential decision problem, since a maximal invariant under G would depend on the whole sequence x = (x_1, x_2, ...) which is not available to the decision-maker.
We therefore interpret the principle of invariance in the sequential case as stipulating that attention be confined to u-rules--that is, to decision procedures that depend at stage n only on the value of u_n⁸. In effect, we replace the original sequence of joint probability models {X_(n)θ} with the sequence {U_nγ} where U_nγ is the model for u_n. A component-wise sufficiency reduction on u leads to a sequence v = (v_1, v_2, ...) which may be called an invariantly sufficient sequence for X_Θ under G, each v_n being invariantly sufficient for X_(n)θ under G_(n). Hence, when invoking invariance, restriction to v-rules is justified so long as v is transitive for the sequence of models U_nγ.
The Stein Theorem provides an alternative means of reduction from the sequence s to the sequence v, assuming G_(n) induces a group of transformations on the sample space of s_n. The Theorem asserts (under certain assumptions) that a maximal invariance reduction applied component-wise to s leads to an invariantly sufficient sequence v. (The diagram of I.1 is relevant only if we append subscripts (n) to X and U and subscripts n to S and V.)
One problem then remains--that of verifying the transitivity of v. Fortunately, as shown in II.4, it is sufficient to verify the transitivity of s, so that the upper route is completely justified; that is, we may make a sufficiency reduction from x_(n) to s_n, verify the transitivity of s, and then make a maximal invariance reduction from s_n to v_n, and we will obtain an invariantly sufficient and transitive sequence v. The reason this is permissible is that, in proving the Stein Theorem in Part II, a stronger result is actually obtained; this result may be roughly described as asserting the conditional independence of s_n and u_n given v_n--i.e., the factoring of the conditional joint distribution of s_n and u_n into the conditional distribution of s_n and the conditional distribution of u_n--and this result can be used to show that transitivity of s implies transitivity of v.
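In symbols, the assertion is that, given v_n, the statistics s_n and u_n are conditionally independent,

    P_θ(s_n ∈ A, u_n ∈ B | v_n) = P_θ(s_n ∈ A | v_n) P_θ(u_n ∈ B | v_n),

a schematic restatement of the preceding sentence; the precise form is given in Part II.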
Finally then, a problem of interest in applying the Stein Theorem to obtain v-rules is the verification of the transitivity of the sufficient sequence s. The following result (see Theorem 4.3 in II) is quite useful for this purpose. Suppose the original random variables with values x = (x_1, x_2, ...) are mutually independent; then s is a transitive sequence if s_{n+1}, a function of x_(n+1) = (x_(n), x_{n+1}), is a function of s_n(x_(n)) and x_{n+1} only, i.e., if s_{n+1} depends on x_(n) only through s_n. This condition is easily verified for exponential class laws, where s_{n+1} is of the form s_n + f(x_{n+1}), and for nonparametric problems where sets of ordered observations constitute the sufficient statistic at any stage. In particular, this condition holds in all examples discussed in this paper (I.5-8).
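For instance (an illustration in our notation), for independent observations from a one-parameter exponential class law with density p_θ(x) = exp{θT(x) - ψ(θ)} h(x), the statistic s_n = T(x_1) + ... + T(x_n) is sufficient for X_(n)Θ and satisfies

    s_{n+1} = s_n + T(x_{n+1}),

so that s_{n+1} depends on x_(n) only through s_n and the sequence s is transitive.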
I.5. Application to sequential tests of composite hypotheses: v-rules. Cox [9] has proposed that sequential tests of simple hypotheses about a parametric function γ, which are composite hypotheses about θ, may be obtained by applying a SPRT--or any generalized sequential probability ratio test (GSPRT) [27] for that matter--to a sequence of statistics whose distributions depend only on γ⁹. In the framework of the previous section, an invariantly sufficient and transitive sequence v is such a sequence, and restriction to v-rules is simply a consequence of invoking the invariance principle; restriction to SPRT's applied to v_1, v_2, ..., which turn out to be v-rules, is on the other hand largely a matter of convenience.
Since v_n is sufficient for the distributions of any invariant function of which v_n is a function (see remarks after the definition of invariant sufficiency in I.2), v_n is sufficient for the distributions of v_(n) = (v_1, ..., v_n). Therefore, the joint density (with respect to a suitable dominating measure) of v_(n) factors according to the Fisher-Neyman factorization theorem for sufficient statistics. The ratio of densities of v_(n) at fixed values of γ, say γ_1 and γ_0--on which any GSPRT is based--thus reduces to the ratio of densities of v_n at γ_1 and γ_0; hence a GSPRT based on v depends only on v_n, not v_(n), at stage n, and is thus a v-rule. The (joint) density of v_(n) need not be known since only the (marginal) density of v_n is required. This factorization is the essence of Cox's Theorem.
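In display form (restating the argument of this paragraph): sufficiency of v_n for v_(n) gives f_γ(v_(n)) = g_γ(v_n) h(v_(n)) with h free of γ, whence

    f_{γ_1}(v_(n)) / f_{γ_0}(v_(n)) = g_{γ_1}(v_n) / g_{γ_0}(v_n),

so that the GSPRT at stage n requires only the marginal density of v_n.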
Actually, Cox's Theorem and proof are incorrect as stated [9, 22, 14]. The omitted invariance assumption gX_θ = X_ḡθ is really quite vital. As a counterexample to this theorem (unless some special interpretation is placed on Cox's assumption that the T's be estimators of the θ's) we sketch the following example: T_1 and T_2 are independent N(μ, 1) and N(μ + ν, 1); U given (t_1, t_2) is N(t_2, 1), so that T_1 and U are independent and U is N(μ + ν, 2); and the transformations are t_2 --> t_2 + a. Then the joint density of (T_1, U) factors but, according to Cox's Theorem, the second factor, which is here the density of U, should not involve ν--a contradiction.
Our basic assumptions and Assumption C provide a corrected version of Cox's conditions; however, it is usually simpler to verify Assumption A than Assumption C, and, through Assumption B, extension to some nonparametric applications is also made possible. Moreover, the justification for confining attention to v-rules is made precise, and this enables further investigation of the properties of sequential procedures based thereon. Thus our approach corrects, simplifies, extends and motivates the application of Cox's technique.
Since the v_n's are not independent and identically distributed, SPRT's applied to them do not in general have any known optimum property (but see the end of I.7). We list under four headings below the properties that are known:

(i) strength: For tests of simple hypotheses about γ, use of Wald's boundaries B = β/(1-α) and A = (1-β)/α provides approximate upper bounds¹⁰ α and β on the true error probabilities, whatever the values of the nuisance parameters.

(ii) termination: Many such procedures, including those in which v_n has a monotone likelihood ratio (MLR), may be shown to terminate with certainty (for all θ or even more generally). For some specific examples, termination with probability one has been known for some time (e.g. [10]); more recently, rather general termination results were obtained by Wirjosudirdjo [47], Ifram [21] and Berk [6]. One or more of these references provides proof of termination for all examples in this paper (though only under H_0 and H_1 in the rank test example). When termination under H_0 and H_1 is assured, the approximate bounds in (i) become approximate equalities.

(iii) OC-function: The operating characteristic functions of these tests--or of any GSPRT's of γ_0 vs. γ_1 applied to v--depend on θ only through γ, and, if v_n has a MLR for γ in Γ, the OC-functions are monotone in γ (real) [15]; this occurs in most normal theory and exponential class examples. Thus, assuming γ_0 < γ_1, these tests are effectively tests of the hypotheses γ ≤ γ_0 vs. γ ≥ γ_1. But approximations to the OC functions are not generally available. (Wald [45] has given a monotonicity theorem but his proof is incomplete. In the problem considered by him, he claims that it is sufficient to prove that the density ratio at γ_1 and γ_0 is monotone in v_n, but he did not verify this claim. Also, the proof of a similar monotonicity theorem in [29] is invalid for tests based on a sequence of dependent variables, such as v. A valid monotonicity theorem for the case of independent variables, such as the z-rules of I.7, appears in [29] and [30]. Valid theorems for the dependent case appear in [15] and [16] and, implicitly, in [47].)

(iv) ASN-function: Little is known about the average sample number functions for these tests except for some heuristic approximations supported by empirical investigations (see [25, 35, 2, 22]). The results of Ifram [21] may be used to obtain alternative approximations.
We now consider the examples introduced in I.3. First, suppose the observations are independent N(θ, 1) and we wish to test |θ| = γ_0 against the two-sided alternative |θ| = γ_1 (γ_1 > γ_0 ≥ 0). The class of probability models is invariant under the group of sign changes for every n, as are the hypotheses. In I.3 we found v_n = |x_1 + ... + x_n| to be invariantly sufficient for the joint models of the first n observations, and v = (v_1, v_2, ...) is then found to be an invariantly sufficient and transitive sequence; restriction to v-rules is then justified under the principle of invariance. A SPRT of γ_0 vs. γ_1 based on v depends at stage n only on the ratio of densities of v_n (using the sufficiency factorization), and this is readily found to be as follows.
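Since x_1 + ... + x_n is N(nθ, n) under θ, v_n has density proportional to exp(-v_n²/2n - nθ²/2) cosh(θv_n); hence the ratio of densities of v_n at |θ| = γ_1 and |θ| = γ_0 is

    R_n = exp{-n(γ_1² - γ_0²)/2} cosh(γ_1 v_n)/cosh(γ_0 v_n),

a function of v_n and n alone.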
Sampling is continued as long as the ratio remains between Wald's boundaries B and A (or B_n and A_n for a GSPRT). Since v_n may be verified to have a MLR in γ = |θ|, the OC-function--which depends only on γ--is monotone in γ, and the test terminates with certainty for all γ [21] (and even more generally [6]). The true error probabilities are approximately equal to the prescribed ones.
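A minimal computational sketch of this test (an illustration added here, not part of the original development; the function and variable names are ours, and the cosh ratio is the one displayed above):

    import math

    def log_cosh(y):
        """Numerically stable log(cosh(y))."""
        y = abs(y)
        return y + math.log1p(math.exp(-2.0 * y)) - math.log(2.0)

    def two_sided_sprt(xs, gamma0, gamma1, alpha, beta):
        """SPRT of |theta| = gamma0 vs. |theta| = gamma1 from N(theta, 1) data.

        xs is an iterable of observations; returns the decision and the
        number of observations used (illustrative sketch only).
        """
        log_B = math.log(beta / (1.0 - alpha))   # lower (acceptance) boundary
        log_A = math.log((1.0 - beta) / alpha)   # upper (rejection) boundary
        s, n = 0.0, 0
        for x in xs:
            n += 1
            s += x
            v = abs(s)                           # invariantly sufficient statistic v_n
            log_ratio = (log_cosh(gamma1 * v) - log_cosh(gamma0 * v)
                         - 0.5 * n * (gamma1 ** 2 - gamma0 ** 2))
            if log_ratio >= log_A:
                return "accept |theta| = gamma1", n
            if log_ratio <= log_B:
                return "accept |theta| = gamma0", n
        return "no decision", n

By Wald's bounds (property (i) above), feeding the procedure N(θ, 1) deviates with |θ| = γ_0 should lead to the wrong decision with probability approximately bounded by α.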
This example has application in the sequential testing for the significance of the difference between two means against two-sided alternatives when the observations are normal with equal and known variances, a difference being observed at each stage of sampling. The Sobel-Wald [43], Armitage [3], and Schneiderman-Armitage [40] sequential test procedures are alternatives to the one above, and not based on invariance theory, but the symmetric version of each is still a v-rule, and in fact a GSPRT based on v.
Consideration of the second example of I.3 would lead to the WAGR one-sided sequential t-test of μ/σ ≤ γ_0 vs. μ/σ ≥ γ_1, with v_n equal to the t-statistic (or a monotone function thereof) based on the first n observations ([10], [36] and page 250 of [30]); its properties are analogous to those above ((i)-(iv)). We omit further consideration of it, but consider the two-sided case--the variance unknown analog of the first example above--in the next section; see also I.7. That these t-tests have a broader applicability may be seen from Example 7.3 in II.7. Sequential tests about one or two normal variances and sequential tests about normal correlation coefficients may be derived analogously.
I.6. Sequential F, t², and T² tests. Sequential F-tests and T²-tests appear to have been introduced by Stein [44]. Sequential F-tests were also considered by Nandi [33]. Johnson [23] invoked Cox's method and Hoel [20] employed the weight function method of Wald [45] to justify sequential F-tests; Hoel also pointed out invariance properties of them. F-tests are also treated in [35], [38], and [14]. Cox's method has been recently applied to sequential T²-tests (and χ²-tests) by Jackson and Bradley [22]. The two-sided t-test, or t²-test, is treated here as a special case of the F-test (or the T²-test); it appears in [34] and [37]. Some comparisons with alternative t²-tests appear in [41]. Sequential tests for components of variance problems ([24], [14]) may also be derived through sufficiency and invariance considerations but will not be considered here. Our purpose here is to show in outline how these tests (or slight extensions thereof) can be derived from sufficiency and invariance considerations; we thus provide a rigorous basis for justifying and interpreting them. Alternative tests may be constructed by the methods of I.7.
F-test for fixed-effects model: We consider a sequential experiment in which each stage consists of a replication of a fixed-effects linear model experiment. We assume that the data from each stage are separately reduced to canonical form as described in Sec. 7.1 of [30], so that stage n yields data x_n = (x_n1, ..., x_np) where the x_ni's are independent normal observations with common (unknown) variance σ² and with means θ_1, ..., θ_ℓ, 0, ..., 0 (ℓ ≤ p). The hypotheses to be tested are H_0: γ ≤ γ_0 vs. H_1: γ ≥ γ_1, where γ = (θ_1² + ... + θ_k²)^{1/2}/σ (k ≤ ℓ). If γ_0 is taken to be zero the null hypothesis may be described as θ_1 = ... = θ_k = 0; however, since such a hypothesis is usually known to be false a priori, it may be more reasonable to assign a type I error bound to a larger parameter set, and a γ-interval is a mathematically convenient choice.

The sequential F-test for this problem will now be briefly described and justified. Many of the arguments sketched below can be verified in analogy with results in [30]. We define a group G of transformations g which act component-wise in an identical way on the canonical forms of the respective stages of the experiment. Each transformation is defined by an arbitrary positive number b, an arbitrary k×k orthogonal matrix C, and ℓ-k arbitrary numbers a_{k+1}, ..., a_ℓ, and transforms the stage n data as follows:

    (x_n1, ..., x_nk)        --> b(x_n1, ..., x_nk)C,
    (x_{n,k+1}, ..., x_nℓ)   --> b(x_{n,k+1} + a_{k+1}, ..., x_nℓ + a_ℓ),
    (x_{n,ℓ+1}, ..., x_np)   --> b(x_{n,ℓ+1}, ..., x_np).
This group leaves the model for the data through stage n invariant, with γ (defined above) as a maximal invariant on the parameter space. The sample means x̄_n1, ..., x̄_nℓ of the first ℓ components of the observations through stage n, together with the conventional error mean square E_n (based on ν_n = (n-1)ℓ + n(p-ℓ) degrees of freedom), constitute a stage n sufficient statistic, say s_n, and s = (s_1, s_2, ...) is a sufficient and transitive sequence. The following transformation is induced on the sample space of s_n:

    (x̄_n1, ..., x̄_nk)        --> b(x̄_n1, ..., x̄_nk)C,
    (x̄_{n,k+1}, ..., x̄_nℓ)   --> b(x̄_{n,k+1} + a_{k+1}, ..., x̄_nℓ + a_ℓ),
    E_n                      --> b²E_n.
A maximal invariant under this induced group is the F-statistic F_n (with k and ν_n degrees of freedom) conventionally used to test the hypothesis γ = 0 (or γ ≤ γ_0) based on all the data available through stage n; thus F_n is the ratio of the mean squares due to hypothesis and error based on the first n replications.
Since every almost invariant function of s_n is known to be equivalent to an invariant function, Assumption A holds, and the Stein Theorem leads to the conclusion that F = (F_1, F_2, ...) is an invariantly sufficient (and transitive) sequence¹¹; hence, F_n is sufficient for the distributions of any invariant function of which it is a function, specifically for the distributions of (F_1, ..., F_n). The likelihood ratio for (F_1, ..., F_n) at γ_1 and γ_0 then reduces to the likelihood ratio for F_n at γ_1 and γ_0, that is, the ratio of two non-central F-densities. This ratio may be conveniently expressed as R_n = h_n(γ_1)/h_n(γ_0), where

    h_n(γ) = exp(-nγ²/2) M((k + ν_n)/2, k/2; (nγ²/2) z_n/(1 + z_n)),

M(·,·;·) is the confluent hypergeometric function and z_n = F_n k/ν_n, the ratio of sums of squares due to hypothesis and error [38].
A SPRT based on the ratios {R_n} has properties (i)-(iv) listed in I.5, since the non-central F-statistic has a MLR. References to available tables for the confluent hypergeometric function may be found in [42]; asymptotic expansions given there may also be used to develop approximate procedures; see also [35, 36, 38, 39] in these regards.
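A small numerical sketch for evaluating the stage n ratio R_n (an illustration added here, not part of the original text; it presumes the form of h_n displayed above, all names are ours, and scipy's hyp1f1 supplies M(·,·;·)):

    import math
    from scipy.special import hyp1f1  # confluent hypergeometric function M(a, b; x)

    def log_h(gamma, n, k, nu_n, z_n):
        """log h_n(gamma) for the sequential F-test (illustrative sketch).

        z_n = k*F_n/nu_n is the ratio of hypothesis to error sums of squares.
        """
        lam = n * gamma ** 2                    # noncentrality through stage n
        x = 0.5 * lam * z_n / (1.0 + z_n)
        return -0.5 * lam + math.log(hyp1f1(0.5 * (k + nu_n), 0.5 * k, x))

    def log_R(gamma1, gamma0, n, k, nu_n, z_n):
        """log of the stage n probability ratio R_n = h_n(gamma1)/h_n(gamma0)."""
        return log_h(gamma1, n, k, nu_n, z_n) - log_h(gamma0, n, k, nu_n, z_n)

Sampling would continue while log B < log R_n < log A; for the two-sided t-test below, k = 1, ν_n = n - 1, and z_n/(1 + z_n) = r_n.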
Note that we do not reduce the data available through stage n to canonical form, but only the data of each stage separately. This is essential in our formulation to permit a consistent component-wise group structure. Actually the successive stages need not be perfect replications of one basic experiment so long as the canonical forms are perfect replications; in fact, we can permit any number of the last p-ℓ components, or all of the first k components, of the data in canonical form to be missing at any stage by making only minor alterations above. Indeed, p and the number (ℓ-k) of nuisance parameters may be infinite, provided only that each row in the design matrix have a finite number of non-zero entries. Thus, new nuisance parameters may be introduced at each stage to adjust for suspected stage-to-stage effects. (We call the stages replications, but they include any time effects.) Some accounting for such effects, even stage-wise variation in σ, is also possible by using an alternative procedure, namely an SPRT based on a sequence of independent F-statistics, one computed from each stage (assuming ℓ < p). Such procedures are considered in I.7 (specifically in the context of t-tests rather than F-tests).
Finally, the sequential F-test may be shown to be valid in any situation in which the non-sequential F_n-test, based on the accumulated data through stage n, has a power function depending on γ only. In this case, one can transform to new observations x'_(n), where x'_(n) is a function of x_(n), and in terms of the new observations the first k components are replicated equally, or not at all, at each stage, so that the preceding analysis is applicable for the primed data.
Two-sided t-test: The above F-test reduces to a two-sided t-test, or t²-test, when k = 1; we further assume below that p = ℓ = 1. (The case k = 1, ℓ = 2 is discussed in [41].) Thus, based on a single sequence of observations x_1, x_2, ... from a normal population with unknown mean μ and unknown variance σ², we wish to test H_0: |μ|/σ ≤ γ_0 vs. H_1: |μ|/σ ≥ γ_1. (The transformations take each x_i into ±bx_i, and thus constitute changes in sign and scale; see I.5.)
The likelihood ratio after n observations is as given above for the F-test with ν_n = n-1, k = 1, and F_n the square of Student's t-statistic based on n-1 degrees of freedom; more conveniently,

    z_n/(1 + z_n) = (Σ_{i=1}^n x_i)² / (n Σ_{i=1}^n x_i²) = r_n, say.

When γ_0 = 0, the SPRT reduces to the common WAGR two-sided t-test treated in [34] and [37]. The restricted [3] and wedge [40] procedures are alternative GSPRT's based on the same invariantly sufficient sequence of t²-statistics; see also I.7. (Note: using Wald's weight function one can obtain a similar test with the modification of reducing by unity the first argument of the confluent hypergeometric function. However, even for γ_0 = 0, the case considered by Wald, we know of no rigorous proof of Wald's inequalities on the two error probabilities; for the kind of arguments required, see [5].)
Tables [34] are available for carrying out the SPRT when γ_0 = 0. Otherwise, tables of the confluent hypergeometric function are required (see above). Alternatively, one can approximate the confluent hypergeometric function to obtain a simpler form of the test. Following Rushton [36], one obtains an approximate sampling procedure, using his simplest approximation: continue sampling only if a_n < r_n < b_n, where, with ā = log A and b̄ = log B, the critical value a_n involves only n, ā, and the constants (γ_1 + γ_0)/2 and (γ_1 - γ_0)/2, and b_n is obtained from a_n upon replacing ā by b̄. Better approximations or the exact formulas could be used whenever r_n lies close to a critical value a_n or b_n.
T²-test: The same approach as for the F-test above could be carried through for multivariate linear models, as in Sec. 7.9 of [30], but a single (numerical) maximal invariant on the space of the sufficient statistic or on the parameter space is usually not available; instead, the roots of certain determinantal equations play these roles. Sequential tests of simple hypotheses about these parametric roots could be carried out, but such hypotheses would seem to be of little practical interest; there is no available sequential analog of the maximum root, trace, or likelihood ratio tests for this problem (but see I.7). However, whenever there is a single non-zero root, the problem reduces to that of a sequential T²-test, an important special case of which we outline below. (See [30] and [1].)
Consider two p-variate multinormal populations with equal (and unknown) non-singular dispersion matrices Σ and unknown mean vectors θ_1 and θ_2. At each stage of the experiment an observation from each population is drawn independently, say x_n1 and x_n2 (p-vectors). We wish to test the hypotheses γ ≤ γ_0 vs. γ ≥ γ_1 (γ_1 > γ_0 ≥ 0) where γ² = (θ_1 - θ_2)'Σ⁻¹(θ_1 - θ_2). With γ_0 = 0, this is the problem of sequentially testing equality of two mean vectors.
Let g be a component-wise transformation with identical components taking x_ni into (x_ni + a)L (i = 1, 2), where a is an arbitrary vector of constants and L is an arbitrary non-singular matrix. These transformations leave the problem invariant, and γ is a maximal invariant on the parameter space. The two vectors of sample means, x̄_n1 and x̄_n2, and the pooled sample dispersion matrix D_n form a sufficient statistic based on the data through stage n; the sequence of sufficient statistics is transitive, and a transformation is induced on the sufficient statistic taking x̄_ni into (x̄_ni + a)L (i = 1, 2) and D_n into L'D_nL; moreover, Assumption A holds. A maximal invariant is T²_n, the non-zero root of the determinantal equation |A_n - λD_n| = 0, where A_n = (x̄_n1 - x̄_n2)(x̄_n1 - x̄_n2)'. (T²_n is an arbitrary constant for n ≤ (p+1)/2.) Hence, T² = (T²_1, T²_2, ...) is an invariantly sufficient and transitive sequence and may be used to construct sequential tests about the parameter function γ.
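Since A_n = d_n d_n' with d_n = x̄_n1 - x̄_n2 has rank one, the single non-zero root is available in closed form,

    T²_n = d_n' D_n⁻¹ d_n,

which is the form convenient for computation.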
Since T²_n is a Hotelling T²-statistic (with 2(n-1) degrees of freedom), its distribution is essentially that of non-central F; hence, the stage n likelihood ratio is of the same form as in the F-test, with k = p, ν_n = 2(n-1) - p + 1, and z_n = T²_n/2(n-1). When p = 1, this reduces to the two-sample t²-test [17]. Similarly, an invariantly sufficient and transitive sequence of Hotelling T²-statistics (with n-1 degrees of freedom) may be constructed for testing hypotheses about γ² = θ_1'Σ⁻¹θ_1 when sampling from a single p-variate normal population with mean vector θ_1 and dispersion matrix Σ; the transformations then are of the form x_n1 --> x_n1 L. If Σ is assumed known in these problems, one may obtain analogously sequential χ²-tests [22].
L"7o
Alternative sequential tests:
z-ruleso
construction of sequential tests about a parameter
An alternative approach to the
"I
when the successive stages of
the sequential experiment are mutually independent is as follows:
let t
n
be a
statistic based on the stage n data (a function on X ) whose distribution depends
-n
I
1-26
only on 7, and let zn be a function of t(n) • (tl , ••• , t n ) which is sufficient
for the distributions of t(n).
Then a GSPRT of simple hypotheses about 7 can be
bued on the t-sequence, or equivalently (by sufficiency)
We call such tests
z.
n
z-~,
on the z-sequence.
decisions at stage n depending only on the value of
An advantage of z-rules is that the t_n's are mutually independent and possibly identically distributed as well, in which case Wald's ASN and OC approximations and his termination proofs are applicable; Lehmann's [30] monotonicity theorem for the OC function may also be applicable. However, unless such procedures are also v-rules (see below), they waste pertinent information about γ and so are presumably less efficient than v-rules; in fact, they may perform no better, as far as the ASN is concerned, than non-sequential tests of equal strength (see [24]).
To derive z-rules from sufficiency and invariance considerations, we first note that, in the framework of I.4, G induces a group G_n on X_n with elements g_n. Let w_n be invariantly sufficient for the component models X_nθ under G_n. For z-rules, invariance reductions (to w_n) are made separately stage-wise, and then sufficiency reductions yield the z-sequence; for v-rules, invariance reductions (to u_n) are made on the accumulated data, and then sufficiency reductions yield the v-sequence.
Now it may be necessary to group the successive stages of the experiment in order that the w-sequence does not degenerate into a sequence of constants. As an example, consider a one-sided t-test situation (see I.5 and I.3) in which observations are taken in successive groups of size k (>1) from a normal population. Student's t-statistic calculated from the stage n data (k-1 degrees of freedom) is invariantly sufficient under scale changes on the stage n data, and γ = μ/σ. Thus w is a sequence of independent t-statistics. An SPRT of γ_0 vs. γ_1 based on w is easily constructed; this is a z-rule (we can let z_n = w_(n)). After a total of nk observations this z-rule would utilize n t-statistics with a total of n(k-1) degrees of freedom; in contrast, the WAGR t-test, which is a v-rule, would utilize one t-statistic with nk-1 degrees of freedom. Hence, the z-rule wastes n-1 degrees of freedom (the between-stages degrees of freedom); also, it only permits termination after multiples of k observations. On the other hand, approximations to its OC and ASN functions are available.
A compromise between these two approaches would retain some of the advantages of each; specifically, observations may be taken one at a time and, after nk + m observations (1 < m ≤ k, n = 0, 1, ...), decisions based on the probability ratio of the mutually independent t-statistics t_1, t_2, ..., t_n, t'_{n+1}, each of the first n t's being based on k-1 degrees of freedom and t'_{n+1} being based on m-1 degrees of freedom (termination is not permitted when m = 1).
Examples of z-rules in the literature include a range test for normal variances introduced by Cox [8], some sequential tests for variance components introduced by Johnson [24], and two sequential rank tests for the two-sample problem proposed by Wilcoxon, Rhodes and Bradley [46] (see I.8). Other possibilities are abundant; for example, a group sequential test for the multivariate linear hypothesis could be constructed based on a sequence of independent maximum root statistics, trace statistics, or likelihood ratio statistics.
Now it may happen, though not in any of the examples considered so far in this paper, that a z-rule is in fact a v-rule. This occurs whenever w_(n) = (w_1, ..., w_n) is a maximal invariant under G_(n), that is, w_(n) = u_n. This is so in particular if G = G_1 × G_2 × ... . An example of this is provided by a modification of the t-test example above. Suppose the k observations in stage n are independent N(γσ_n, σ_n²); thus the means and standard deviations (all unknown) may now vary from stage to stage but the ratio remains constant. Let g = (g_1, g_2, ...), where g_n applies a scale change to the stage n data, and g_n may now vary with n. Then w_(n) above is easily seen to be invariantly sufficient under G_(n) and w_n under G_n. Thus, under these conditions, an SPRT based on the sequence of independent t-statistics is both a z-rule and a v-rule. The sequential F-test and other normal theory examples could be modified analogously.
When w(n) does coincide With un' SPRT's constructed from the w-sequence (as
in the t-test example above) are easily seen to have a Wald-type optimality among
all invariant procedures, that is, the ASN is minimized for all
or
'Y
l
e
such that .,(0)
= 'Yo
among all invariant procedures with the same or smaller error probabilities.
We conclude this section With a second example; a third example will be given in
the next section.
(If the w 's are Fraser-sufficient as well as invariantly
n
sufficient, as in the example below, then the restriction to invariant procedures
in this optimality may be removed; see
[16J and [12J.)
Suppose stage n of the experiment yields two independent normal observations
with \.U1it varinnces and unknown menns
I.l. +1'
n
and I.l. ,reopectively.
n
group of transformations which adds en arbitrary constant a
from stege n (= 1, 2, ••• ).
n
Suppose
G
is the
to both observations
Then wn may
.- be taken as the difference between the sta(Y,e
= (w l '
) is a maximal invariant under (](n) as is I' under
n
the induced group on the parameter space. (The same statements hold if I.l. and a are
n
n
n observations, and un
constanta
••• ,
W
n
An SPRT of simple hypotheses about y
Then zn = vn =~.~=1 w.•
~
based on the w-sequence has the Wald optimal property.
I.u.
I.l.
and a.)
Nonparametric sequential applications.
In a
se~lential
test (at least in
a SERT), the t,."o hypotheses and two kinds of error are treated in a similar fashion.
In most
nonpar[u~etric
tests, however, the alternative hypotheses are
vague or all-encolnpassing.
tJ~ically
rather
Thus, to obtain a sequential nonparametric test from
II
II
II
II
II
II
II
ell
II
Ii
II
I
I
I
I
availo.ble theory, one must consider rather specific alternatives--sufficiently specific _
so thC'.t the probability distribution of some test statis.tic (perhaps invariantly
I
I
I
..I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I.
I
I
1-29
sufficient) is cOlnpletely specified by the alternative hypothesis, as it is by the
null hypothesis.
1~e
practicality of such a specification is perhaps rare.
One such example, however, is the sign
~
for which one reduces nonparametric
measurement data to binomial data. by classifying successive independent observations
(or
p['~irs
[50]).
of observations) simply as "successes" or "failures" (see pp. 147-149 in
Sequential binomial tests of whether the success probability is large or
small can then be performed.
Sometimes such data. reductions may be justified by
invuriQnce considerations, e.g., in paired comparisons (pg. 220 in [50J) or when
testinL for ~~netry (pg. 242 in [30J); see also the exrunple below.
Here, the
lower route is the convenient one--reducing by invariance and then by sufficiency-so that the Stein Theorem is not required.
which 2.re
sJ~1linetric
However, when testing two-sided hypotheses
about 1/2 in the success probability p, invariance may again be
applied after sufficiency to reduce the data further to the
dev-iotion 01' the proportion of success from 1/2; here 'Y
maL~itude
= ip
- 1/2).
of the
Then the two-
sided sequential binomial test may be derived, using the methods of 1.5.
Both of
these sequential sign tests have properties (i)-(iV) described in 1.5.
Some expmples from the theory of most powerful rank order tests [28J can also
be ho.n(Ued.
In this tl'eory (nonsequential),
one does specify a particular type
of' alternative against which maximum power is desired, for a test of level
sequential test of specified strength
(a,~)
Q.
A
can, at least in principle, be based on
a sequence of most powerful test statistics, calculated stage-wise or from the
accumulated data.
\.Jt.en invariance considerations are 8.pplicable, the Stein Theorem
may fncilitate such tests.
We exempl)fy this with a two-sample sequential rank test;
ananalogous one-sample rank test of
s~'llIIl1etry
about the orit;in may also belEveloped.
Suppose at each stage of experimentation an
from each of two popUlations
'~ith
observ~tion
is taken independently
(unspecified) continuous distribution functions
I
1-30
against H : F' : I F2
l
This is a particular case of the "one variable is stochastically
F and F', respectively, and one wishes to test H :
O
(I~e
(28) or (13)).
larger than the other" alternative.
F'
=F
Letting r l' r , ••• , r be the ordered ranks
2
n
(increasing order) of the observations from the second population from a combined
ranking of all the data available through stage n, we find that v
n
= (r l ,
••• , r )
n
is invariantlJr sufficient (and transitive) under the group G, an element of which
applies an identical monotone continuous transformation to each observation (see
remarks on Assumption B in
I.2). Using the Stein Theorem and [28), the probability
ratio at stage n is found to be
and a SPRT is readily performed by comparing this ratio at each stage with Wald's
boundaries A and B.
(k > 1) or F'
Similar results are available when the alternative is F'
= h(F)
= Fk
for specified h(') (F' ~ F); also,sampling in pairs is not
essential (see final paragraph).
Nothing is known about the properties of these
tests other than that specified bounds (approXimate equalities) on the error
probabilities are met (i), and termination occurs with certainty if either of the
two hypotheses is true (ii); the latter follows from a theorem of Wirjosudirdjo
(47), but whether temination is certain under other hypotheses is not known.
How-
ever, under the invariance principle, any good test must be a v-rule, and these
tests are v-rules.
An alternative group sequential procedure, a z-rule, will be
giyen in the last paragraph below.
Now let us consider a variation on the above example.
We replace
F and F' by
Fn and F'n in the assumptions and hypotheses, and no longer reqUire the components
go of g
= (gl'
~,
••• ) to be identical; that is, the pair of observations from
stage n, having distributions F and F', are both transformed by the same monotone
n
n
J
I
I
I
I
I
I
I
til
I
I,
II
II
I:
I
.'I
I
I
..I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I.
I
I
I-31
continuous transformation, but the transformations and distributions may vary with
n.
The required invariance assumptiom still hold.
=1
Letting w
n
or 0 according
as the sign of the difference between the stage n observations is positive or
negative, we find that w is a maximal invariant under G and u
n
n
a m.axi:mal invariant under G(n).
P
= 1/2 = p(wn = IjHO)
n
= (loll' ••• , w ) is
n
A sequential binomial test or sign test
= 1/3 = p(wn = IIHl »
against p
(of
is then a v-rule ~ a z-rule
and has the Wald optimal property among invariant procedures.
Finally, suppose k and
n
k~
independent observations are to be taken at stage n,
if stage n is performed, from populations with distribution functions F
n and F'n ,
respectively.
(The sequences of numbers k
n
and k l , n
n
= 1,
2, ••• , are non-negative
integers, arbitrarily determined but not dependent on the observations.)
distributions lUay vary from stage to stage but not within stage.
paralle1 structure:
Thus, the
The group G has a
the component g of g transforms each observation :from stage n
n
by the same monotone continuous transformation but the g 's may vary with n.
n
Letting
w denote the ordered ranks from the second (pr1llled) population from a combined
n
ranking of only the stage n data, a SPRT basedw the w-sequence is both a v-rule and
a z-rule and has the Wald optimal property among invariant procedures.
(At stage n
the probability ratio is given by the product o:f the stage-wise probability ratios,
and each of them is of the form 2
(k+2k l ).)
k'
r (r + 1) ••• (rkl + k ' - l)/(k + k l +1) (k+k l +2)
l 2
However, if the hypotheses (and transformations) do not pennit
variation from stage to stage (F
n
= F,
F'
n
= F'
and the components of g are identical),
then this procedure is still a valid z-rule but not a v-rule ; it terminates with
certainty and has a known ASN (apprOxilllate), but preswnably is less efficient than
the v-rule which would re-rank all the available data at each stage rather than rank
only within stages.
This z-rule, and an analogous z-rule based on the rank sum L:r.
J.
(there is no rank sum v-rule since the rank sum is not invariantly sufficient), have
been proposed by Wilcoxon, Rhodes and Bradley [46], and designated the configural
rank test.
I
..I
I
I
I
I
I
I
lit
I
I
I
I
I
I
..I
I
THE RELATIONSHIP BE'IWEN SUFFICIENCY AND nNARIANCE
WITH APPLICATIONS IN SEQUENTIAL ANAL ISIS
PART II:
q~_~~~.
!,
of
~
~
GENERAL THEORY
is a family of distributions on a a-field
is a sufficient subfie1d of
~,
~I
is the a-field of invariant members of
intersection of
~
and
ditions, that
~I
~,
and
~I
is the
The main result establishes, under certain con-
is sufficient for
stronger conclusion that
of subsets
G is a group of invariance transfor-
mations g on !,
~I.
~
~
and
~I
~I.
This is implied by the slightly
are conditionally independent given
~I.
Both conclusions have been established under anyone of three assumptions.
Assumption A is that
~
•
~
for each g, and that every
almost invariant function is equivalent to an
~-measurab1e
~I-measurab1e
and
function.
Assumption B is that there eXists a conditional probability distribution Q
such that Q(gA,gx) • Q(A,x) for all A c ~, XC!, g
€
G.
Assumption C is
that P is a family of densities on n-space of the form Pe(x) • g (s(x»h(X),
-
e
the functions geand h being positive, and that on some
~I
set of probability
1 each transformation g is differentiable with Jacobian depending only on
sex), sex) = sex') implies s(gx) ~ s(gx'), s is differentiable with its
matrix of partial derivatives of maXimal rank, and h(gx)/h(x) depends only
A counter example shows that ~I need not be minimal sufficient
on s(x).
for
of
~I
~
on
if
~
~
is minimal sufficient
is inherited by
~I.
the sequential case are given.
for~.
On the other hand, completeness
Some theorems concerning transitivity in
Theorem 4.1 states that (~n J being transitive
for (~n Jis eqUivalent to ~n and ~(n+1) being conditionally independent given
B"
for each n.
A
- 3n
C
-vn
A
- 2n
C
Theorem 4.2 states that if, for each n, A c A
- 3n
- ln
A , and A
-n
- ln
C
A ,
-n
and An- are conditionally independent given A ' then
-c::u
- 3n
II-2
(~ln) transitive for (~n) implies (~3n) transitive for (~n)·
ibis implies
(under Assumption A, B or C) that (~In) is transitive for (~In) if (~Sn) is
transitive for (~).
= -Bl
subfields, A
-n
Am c
Theorem 4.3 states that if !!l' ~, ••• are independent
V ••• V B
-n
~n and ~(n+l) c ~n
Introduction.
1.
~~~---~---~-----
for each n, and (A" ) any sequence such that
-vn
V !!n+l' then
(~n) is transitive for (~n)·
The question with which this paper is concerned is
phrased in Part I essentially as follows:
If a sufficiency reduction and
an invariance reduction of a problem are performed in succession, is the
result independent of the order in which these two reductions are carried out?
The purpose of Part II is to present a treatment of this question and its
various solutions entirely in the language of subfields.
!
Let
on
~,
and
be a space,
transformations g of
~
n ~I
a cr-field of subsets of
a sufficient subfield of A.
~
2 and 3).
~
Let
~I
!
Let
G
!'
~
a family of distributions
be a group of invariance
onto itself (precise definitions are given in sections
be the cr-field of invariant members of A.
The intersection
will be denoted by ~I.
Suppose that ~ is induced by a sufficient statistic
(£, ~s,
s), where
s is a function from X onto S and AS is a cr-field of subsets of S such that
-l( s)
~S=s
~
•
The notion of a sufficiency reduction followed by an invariance
reduction cannot be formulated very well unless every g induces a transformaimplies s(gX ) = s(gX2 ). In the subfield
l
language this means that g transforms any member of ~ into a member of
tion in~, i.e. s(~)
= s(x2 )
We shall assume throughout that every g
~S.
G has this property (As sumption A(i) ) •
s
It is clear that the inverse images under s of the invariant sets of A constitute the subfield
€
~r
to a maximal invariant function on
£'
say u (Where u is supposed to be A
-.
J
I
I
I
I
I
I
I
et
I
I
I
I
I
I
J
An invariance reduction applied after a sufficiency reduction leads
S
I
_
I
I
I
..I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
..I
I
II-3
measurable).
Since u induces the rr-field of invariant sets in ~s, the func-
tion u(s(·)) on
!
induces ~r
The question stated in Part I is whether
a maximal invariant function on § is invariantly sufficient, i.e. sufficient
for P restricted to the invariant sets.
this question becomes:
Is
~I
Translated into the subfield language
sufficient for AI?
It is not known whether the answer to the question in the preceding
sentence is yes in general if only Assumption A(i) is made.
C. M. Stein,
in an unpublished manuscript (see explanatory note), was the first to recogni ze
the problem, and to give sufticient conditions under which a maximal invariant
function on § is invariantly sufficient.
In the present paper we shall prove
the desired result under various other sets of conditions, different from
those of Stein.
More specifically, we shall propose three different sets
of conditions, called Assumption A, B, and C.
In applications it is convenient
to have several possibilities to choose from, for, to give an example, conditions
that are easy to check in situations involving normal distributions may be hard
to verify in nonparametric situations, and vice versa.
It turns out that
Assumptions A and C are usually easier to apply in parametric problems, B
in nonparametric problems.
The sufficiency of
~I
for AI turns out to be a consequence of the
following interesting relationship between the subfields As, AI and
and
~I
are conditionally independent given
~I'
~I:
~
This conditional independence
obtains under anyone of the Assumptions A, B or C.
The formulation in terms
of conditional independence of subfields has the advantage of being symmetric
in
As
and AI' and in simplifying proofs, e.g. in section 4.
Sequential aspects are treated in section 4.
we have the subfields An' ~n' AIn , ~SIn·
At each sampling stage n
Whether or not ~In is sufficient
I
II-4
for
~In
depends only on the three subfields
whole sequence.
~, ~
and
~In'
not on the
However, an additional notion enters that does depend on
the whole sequence, namely the notion of trandtivity, introduced by Bahadur
[4J.
Questions of transitivity are answered in section 4, relying heavily
on the notion of conditional independence of subfields.
In section 5, Assumption A is discussed.
In the language of statistics
Assumption A (ii) means that an almost invariant function on S is equivalent
to an invariant function on S.
There are problems where Assumption A (11)
cannot be checked, since the only known theorem that covers Assumption A (ii)
assumes a certain property of the structure of G, and, in addition can be
applied only if
~
is a dominated family.
parametric problems.
Theorem
This excludes application to non-
6.1, in section 6, avoids those handicaps by
using Assumption B, which is the existence of an invariant conditional
probability distribution.
Such a distribution is usually easily exhibited
in cases where the conditional probability distribution is discrete, such as
in certain nonparametric problems.
Example
6.1 illustrates this case. On
the other hand, in a large class of problems involving a 1'amily 01' densities
with respect to Lebesgue measure, Assumption B may be dif1'icult to verity
directly.
In order to cope with such cases, 'Iheorem 7.1 in section 7 gives
suf1'icient conditions (ca.l.led Assumption C) tor Assumption B to be valid it
~ is a 1'amily 01' densities on n-space 01' the 1'orm. P (x) ;: ge (s(x) )h(x), where
a
the various 1'unctions,
regularity conditions.
ga'
s, h, and each trans1'ormation g satis1'y certain
'lhis theorem could perhaps be regarded as a rigorized
version 01' Cox's theorem. [9].
Its advantage over Assumption A is that its
conditions can be checked in a straightforward way.
not involve the topological structure 01' G.
trated in examples 7.1, 7.2 and 7.3.
In particular, it does
The use 01' Theorem 7.1 is illus.
J
I
I
I
I
I
I
I
--I
I
I
I
I
I
J
I
I
I
..I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I.
I
I
11-5
Let
~
measure on A.
by
12
be a space of points x,
~ c~,
subfield.
If ~
c!2
and
~
a a-field of subsets of
~,
P a probability
is a a-field, we shall simply denote this
and we shall, for short, call!20 a subfield of
~,
or simply a
If ~l and ~2 are two subfields, their intersection
also a subfield.
shall denote by
~l
n ~2
is
With the union ~l U ~2 this is not usually the case.
~l
V
~2
the smallest subfield containing both
~l
and
We
~2.
All
functions will be understood to be A-measurable real-valued functions on
and, if required in the context, P-integrable.
~,
If two functions f 1 and f
are equal except on a set of P-measure 0, i.e. f
2
=- f 2 a.e. P, we shall denote
l
this by f 1 - f 2.' and sometimes term this "f1 is equivalent to f 2· "
It'!20
C
and f is ~ -measurable, we shall sometimes term this "an !20 function f."
12
The
conditional expectation of f given!20' written E(f I~), is defined as any ~
function whose integral over any A
O
€ ~
equals the integral of f over A
O
(thiS follows Loevels definition [32]; Doob [11J relaxes the definition somewhat by including all functions that are equivalent to an !20 function with
the above mentioned property).
The conditional expectation of f given ~
is defined up to an equivalence within the
~
functions, and we shall some-
times speak of the various "versionsll of this conditional expectation.
The
conditional probability of a set A, p(AI~), is the conditional expectation
of the indicator of A.
Let g be a 1-1 transformation of
write
1.
=
~,
~
onto a space
and y • gx i f y is the image of x.
1
of points y.
We
The point transformation
g induces in a natural way a set transformation, which we shall also denote
by g.
Thus, gA is the image of A
a a-field of subsets of
1,
€!2.
The collection of all gA is obviously
which we shall denote
by~.
induces a probability distribution, denoted gP, on
8!2:
Furthermore, g
gP(gA) is defined as
II-6
Finally, to each tunction t on
P(A).
gt(gx) is detined as t(x).
gt:
!
corresponds a function on!, denoted
(This definition ot gt makes sense even
if f has an arbitrary range space.)
The transformation g produces an isomorphism between
(~, ~, sF).
function on
!
(!,
~,
p) and
Thus, it ~ c ~ then ~ c ~, and it f is a P-integrable
then gt is a sF-integrable function on
~
aDd a possible
verSion of E(gt I~) is gE(f I~), so that
(the eqUivalence - is here with respect to
gP
on ~).
The considerations gi.ven so far will be applied in the case !
i.e. g is a 1-1 transformation of
to talk about the possib1lities
!
onto itself.
~ .~,
= !,
In that case it makes sense
gt • f, etc.
The rest of this section is devoted to propositions on conditional
independence.
Let
~l' ~ and ~3 be
three subfieldB, then
~l
and
~2
are
defined in [32], p. 351, to be conditionally independent given ~3 if' for any
Al
E ~l
and A2
E ~2
we have
Instead of' giving the definition in terms of conditional probabilities of
sets, we may, equivalently, give it in terms of conditional expectations of
integrable functions.
make
Since this is more convenient in the sequel, we shall
I
J
I
I
I
I
I
I
I
--I
I
I
I
I
I
.-I
I
I
..I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
II-7
DEFmrrION 2.1.
~3
Subfields
~l ~ ~
are conditionally independent given
11" for any ~l function f 1 aad ~ function f 2 we have
The definition of unconditional independence of
~l
and
~
fOllow8 by
taking in (2.2) ~3 • (~, ~) (i. e. ~3 is the triVial eubtield) and by replaCing
- by..
The following theorem is proved in (32 J, p. 351.
~1!!!2. ~
THEOREM 2 .1.
only if for every
~
are conditionally independent given
~3
if and
function f 2
From Theorem 2.1 and the fact that (~l V ~) V ~3 • ~l V ~3 follows
COROLLARY 2 .1.
if
~l
V
~3
and
~
~l ~ ~
are conditionally independent given
are conditionally independent g1ven
By taking in Theorem 2.1 ~3·
COROLLARY 2.2,
~l
~3
if and only
~3·
t!,¢) we have
and ~2 are independent if and only if for every ~2
function f 2 we have E(f21~1) - Ef2 •
If ~l and ~ are conditionally independent g1ven ~3' i t ~ c ~l and it
is ~-measurable, then f is also ~l-lIIll!tasurable so that (2.2) holds.
l
l
have therefore
f
We
LEMMA 2.1. I f ~l !!!2. ~ are conditionally independent given ~3' and ~c ~l'
I.
I
I
~ ~
and ~ are conditionally independent g1ven !3'
I
1I-8
and
By taking ~3 •
(!, ¢}
COROLLARY 2.3.
.!! ~l
~
J
in Leuma 2.1 we have
~ ~ are 1Ddependent and ~
c ~l' ~ ~
are independent.
LEMMA 2. 2.
1!. ~l
and ~2 are independent and ~3 c ~l ~ ~l ~ ~2 !!:!
conditionally independent §iven
PROOF.
~l V ~3
Let t
2
be
~3.
~-measurable.
Using Theorem 2.1 and observing
• ~l' we have to show
(2.4)
This is true because both sides of (2.4) are - Ef •
2
For the lett hand side
this tollows from Corollary 2.2, and for the right hand side by first applying
Corollary 2.3 with ~ • ~3.
The various propositions on conditional independence in this section have
their obvious analogues in terms of random variables.
2.1 Would read:
For instance, Corollary
X and Y are conditionally independent given Z if and only if
(X, z) and Y are conditionally independent given Z.
Leoma 2.2 would read:
if
X and Yare independent, h a function of X, then X and Y are conditionally
independent given h (X) •
Let
!
P.
If we write f
and
~
be as in section 2, and let
!:
be a family of probability measures
l - f 2 this will mean now that f l = f 2 a.e.
!:'
on which the equality does not hold has P-measure 0 for every P
functions are understood to be P-integrable for every P
is req1ltired in the context.
written Ep •
€
!:
i.e. the set
€
P.
All
whenever this
The expectation with resp!ct to P will now be
I
I
I
I
I
I
I
_I
I
I
I
I
I
I
.-I
1
1
I
II-9
..I
I
I
I
I
I
I
--I
I
I
I
I
I
..I
I
If ~l and ~ are any subfields of ~, with ~2
~l
is sufficient for
f
2
c
~l'
we say that
if for every ~l function f 1 there is an
such that Ep(fll~) - f
2
for all P
€
!::e
!l2
function
In particular, let ~S be
P.
sufficient for A.
!
Let G be a group of transformations g of
one-one onto itself, such
that for each g
A,
(a)
~ ..
(b)
gP € P whenever P € P.
A function f is called invariant [30) if sf • f for each g
sf -
called almost invariant if
set may depend on g).
Hence, for each A
f for each g € G (where the exceptional ~-null
!l
form clearly a subfield.
€ ~I
we have gA .. A.
!ls
and of
We shall denote it by
The members of A that are in
both ~ and !lI constitute the subf'ield !lSI •
field both of
G; f is
A subset of X is called invariant if its indicator is.
The invariant members of
!lI.
€
!ls n !;I·
Clearly, !lSI is a sub-
~I.
Concerning the relation between G,
!ls
and AI we shall make the following
assumption:
ASSUMPTION A.
(i)
~ =
!ls
for each g
and almost invariant, there exists an
~SI
€
G; (ii)
~ f
S is AS-measurable
function f SI such that f SI
~
f S.
Before stating the main result in this section (conclusion of Theorem
.5 • l), it is convenient to state first the two following lemmas.
LEMMA. 3.1. Under Assumption
of E(fl!ls) is almost invariant.
A(i), if f is invariant then any version
I
II-10
In (2.1) on the left hand side we have gf ::: f since f is invariant,
PROOF.
and, replacing!:.o by ~S' we have ~
..
reads E(f I~)
gE(f I~).
If f
S
:lIl
~S by Assumption A (i).
Thus, (2.1)
is any version of E(f I~), we have f
S
... gfS'
which is the conclusion of the lemma.
Using lemma 3.1, and Assumption A (11) we have immediately
LEMMA 3.2.
function
f
Under Assumption A, it f is invariant there exists an
such that f
SI
SI
~SI
... E(fl~).
The following theorem was first stated and proved by Stein (unpublished)
under slightly different assumptions.
is given in Part I, Section 2.
The analogue in terms of statistics
The statement and proof given here follow
consistently the Janguage of subfields.
THEOREM 3.1.
PROOF.
~SI
Under Assumption A,
~I
We have to show that if f is
is sufficient for
~I-measurable,
~I'
there exists an
function f SI such that
To show thiS, let f
S
be any version of E(f I~).
Since ~I c~, we have by
a well-known property of iterated conditional expectations ([11], p. 37):
From lemma 3.2 we know that there is an
Substituting f
SI
for f
S
~I
function f SI such that f SI ... f S.
on the right hand side in (3.2), and observing
Ep(fSII~I) ... f SI ' we have (3.l).
This concludes the proof.
JI
I
I
I
I
I
I
til
I
I
I
I
I
I
JI
I
I
..I
I
I
I
I
I
I
--I
I
I
I
I
I
I.
I
I
II-II
There are a few additional properties of a subfield of interest besides
sufficiency.
One is completeness, another is minimal sufficiency.
these properties inherited by ~I if valid for ~S?
We recall [31] that P
on AS is called complete if, for an AS function f, E f
-
-
implies f -
o.
is complete on
on
~r.
-
Are
p
=0
for all P
P
€
-
It follows then immediately from the definition that if
~S'
it is also complete on any subfield of
Hence, completeness is inherited by
for minimal sufficiency.
~SI.
~,
~
in.particular
The situation is different
We recall that a subfield
~2'
sufficient for
~l'
is called minimal sufficient [31] (or necessary and sufficient in [4J) if
every subfield sufficient for
~l
contains
counter example shows that if
~S
is minimal sufficient for
necessarily true that
~I
~2'
up to
is minimal sufficient for
~-null
sets.
~,
The following
it is not
~I·
Let Xl' ••• , X
n
be independently normal with common unknown standard deviation rr and common
mean crr, where c ~ 0 is a known real number.
Let G consist of all transforma-
tions of the form xi --> gx , i = 1, ••. , n, where g is any positive number.
i
Then AI is induced by the maximal invariant (xliX, ... , xlix, sgn X ).
n
nn
n
Let X and S be the sample mean and standard deviation, respectively; then
~S is induced by the minimal sufficient (but not complete) statistic
and ~I is induced by the statistic
xis.
(X, S),
However, ~I is not minimal suffi-
cient for ~I since the trivial subfield (~, ¢} is contained in ~SI' not equivalent to it, and also sufficient for
~I.
The latter is true because the
distribution of the maximal invariant is free of rr, so that any
~I
function
has a fixed distribution.
The relevance of the conclusion of Theorem 3.1 for testing problems is
that for every invariant test function CPr there exists an
test function CPSI with the same power function.
If
~S
~I-measurable
is complete, the
I
II-l2
property mentioned in the preceding sentence is possessed not only by the
invariant tests, but by all test functions cP whose power function is invariant,
including the CPI as special cases.
To see this we apply first Lemma 2, p. 227,
in [30J to E(cpl~), then Assumption A(ii), and conclude that there is an
invariant version CPSI of E(cpl~s)' Under these circumstances, if a test
enjoys a certain optimum property within the class of
~I-measurable
tests,
it also enjoys this property among all tests whose power function is invariant.
(An analogous statement may be made in the sequential case, replacing "power
function" by "joint distribution of decision and sample size".)
We conclude this section by an interpretation of Theorem 3.1 in the
language of conditional independence.
The latter notion was defined in
section 2 in the case of one probability measure P.
this paper we shall call
for every P
€
~l
and
~
In the remainder of
conditionally independent given
~
if
P (2.2) holds, With E replaced by E •
P
LEMMA 3.3. The following statements are eqUivalent:
(i)
(11)
(iii )
If' 1" I is invariant, there exists an ~I function 1" SI such that 1" SI - E(1" I I~s) •
If' 1" I is invariant, then for every P
€
P
~ and ~I are conditionally independent giYen ~SI.
PROOF.
(i) follows from (ii) immediately by taking fSI to be any version
of Ep(fII~SI) for any P.
right hand side of
Conversely, (ii) follows from (i) by writing the
(3.3) as
Ep(E(fII~) I~I)'
Then
(3.3) follows after
remarking that both sides are equivalent to fSI' using (i).
The equivalence
JI
I
I
I
I
I
I
•I
I
I
I
I
I
.-I
I
I
..I
I
I
I
I
I
I
lit
I
I
I
I
I
I
..I
I
II-13
of (ii) and (iii) follows immediately from Theorem 2.1 by taking in this
theorem ~l = ~, ~2 = ~I' ~3 = ~SI (so that ~l V ~3 = ~), f 2
=fI,
and
replacing E by Ep •
We recognize (i) of Lemma 3.3 as the conclusion of Lemma 3.2.
Using
Lemma 3.3 (i) and (iii) we see then that Lemma 3.2 is equivalent to
THEOREM 3.2.
given
Under Assumption A,
~S ~ ~I
are conditionally independent
~I.
Since the conclusion of Theorem 3.1 followed from the conclusion of
Lemma 3.2, it follows then also from the conclusion of Theorem 3.2.
therefore interpret the results as follows:
the conditional independence of
the sufficiency of
~I
for
~
and
~I
Assumption A is used to establish
given
~SI'
and this in turn implies
~I. 12
One of the advantages of Theorem 3.2 is that it is symmetric in
~I.
We could
~S
and
Therefore, any statement implied by the conclusion of Theorem 3.2 remains
true if the subscripts S and I are interchanged.
For example, if we do this
in (3.3) (after replacing on the left hand side E by ~) we get for every
~S function f s and every P Ep(fSI~I) - ~(fsl~I)·
Let [A , n > lJ be a sequence of subfields of A, and let, for each n > 1,
-n
- ~n' ~In
~SI
and ~In be subfields of ~n' defined in the same way as ~, ~I and
were in section 3.
this by saying that l.Aco
For each n,
-on
J is
~n
is sufficient
for~.
a sufficient sequence for {A ).
-n
We shall express
From Theorem
3.1 we know that if ASSUilption A is valid for each n, then {~In} is a
sufficient sequence for {~In}.
Besides the notion of sufficiency there
is in the sequential case an additional notion, called transitivity, and
I
II-14
introduced by Bahadur [4].
DEFmITION 4.1. ~ ~, n ~ 1} and %n' n ~ 1} be two sequences of
subfields such that ~ c ~ for each n.
'nle sequence ~). is said to be
a transitive sequence for ~} if for every n, every ~(n+l) function f and
every PEP we have
EP (fiB)
- EP-Vll
(fl~).
-n
(4.1)
The importance of (~} being a sufficient and transitive sequence for
{~} has been pointed out in [4].
section
4.
A discussion can also be found in Part I,
This section will be concerned mainly with the question of
transitivity.
It is of some interest that Definition 4.1 is equivalent to
a statement in terms of conditional independence of subfields, as follows:
THEOREM 4.1.
{BOn} is a transitive sequence for (Bn) if and only if for
~ n ~ 1 ~n ~ ~(n+l) are conditionally independent given ~n·
PROOF.
Let f be !!o(n+l)-m.easurable.
by Ep ' ~l = ~, ~2
~3 = ~ (so that ~l
= !!o(n+l) ,
Then Theorem 2.1 states that
Apply Theorem 2.1 with E replaced
~
and
~(n+l)
V
~
= ~n)
and f 2
::I
f.
are conditionally independent
given ~ if and only if for each P (4.1) holds.
Two questions will be investigated in the remainder of this section.
The first is whether (~In) is a transitive sequence for (~n) if (~n) is
a transitive sequence for (A).
-n
This question was answered by Ghosh [16],
Theorem 2, Chapter 4, in the affirmative under slightly more restrictive
conditions than we shall impose in Theorem 4.2.
I
I
I
I
I
I
I
til
I
I
I
I
I
I
The second question is under
what conditions (Ac< ) is a transitive sequence for (A ).
-on
J
-n
This question was
J
I
I
I
..I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
..I
I
II-15
suggested by Ghosh I s theorem quoted above, and the answer, as formulated in
Theorem. 4.3 below, is essentially contained in the proof of Theorem 11.5 in
[4].
Our proof of 'D1eorem 4.3 is entirely in terms of coDditional independence
of subfields.
Note that 'D1eorems 4.2 and 4.3 are detached from sutticiency
and invariance considerations.
1!! !.n,
For each n ~ 1,
THEORa{ 4.2.
~ln c~, ~ c ~ ~ ~n c ~ln
n ~n'
!en and ~n be
~ln'
subfields, with
such that !In ~ ~ are conditionally
A-.
=in;;;;;d;;,;e.pe..;;,n;;;;.d;;,;e~n;;.;;t~g1;;.va~n
-:>n
1hen it {A ) is a tra.naitiva sequence for {A ), {A )
- ln
-il
- 3n
is a transitive sequence for ~).
PROOF.
We have to show that if f is
~(n+l)-measurable,
then for each P
(4.2)
Because f is ~3(n+l)-measurable, and ~(n+l)
measurable.
C
!l(n+l)' f is also !l(n+l)-
Since by assumption {A;"J.n ) is a transitive sequence for {A
-n ), we
have for each P
Ep (fiA
)
-n
Ep
(fIA ).
-ln
N
We apply now Theorem 2.1, replacing in (2.3) E by Ep ' !l by ~2n' ~ bY!ln '
~ by ~n' f 2 by Ep (f I~ln) •
'Blen ~l V ~ is replaced by !2n V ~ • !2n' and
(2.3) reads
(4.4)
We have now
E (E (f IAln) I~)
P
P
-
-c;u
N
E (E (fIA
P
P
)
IA- ).
- ln -:>n
II-16
Ep (f I~.J
-cu
..
J
Ep (Ep (f IA
) I~)
-n -c;,u
... E (E (fIA )IA ) by (4.4)
p p
- ln - 3n
... E (f IA )
p
-n
3
which is (4.2).
We are especially interested in applying Theorem 4.2 to the case
A
=A
nA •
-3n
-In
-2n
Taking in Theorem 4. 2 ~ln = ~n' ~ = ~In' ~3n • ~In'
we have
COROLLARY 4.1.
It the conclusion of 'lheorem 3.2 is valid for each
n ~ 1, and if { ~Sn} is a sufficient and transitive sequence for {~n),
~ {!sIn} is a sufficient and transitive sequence for {A
THEOREM 4.3.
In }.
Suppose a sequence ~l' ~2' ••. of independent subfields
of f; is given; suppose ~n = ~l
V •••
v ~n; let {~n} be given such that for
each n ~ cA and ~( n +1) c ~ V B +1; then {~ } is a transitive sequence
-vn - n - - v
-vn
-n
-"'-In
for {A }.
-n
PROOF.
By the construction of {A }, A and B +1 are independent.
-n
-n
= -n
A ,
-n
We
= ~+
B l'
~n+l
A = ~ and conclude that A and
-3
"'-In
-n
are conditionally independent given~. Applying Corollary 2.1 we have
that
~
apply Lemma 2.2 with Al
-
and
~n+1 V ~n
now Lenma 2.1 with ~l
A,..,
~
are conditionally independent given
= ~n+l
V
~n' ~
= ~(n+l)'
I
~.
We apply
~2 = ~n' ~3 =~, and
conclude that ~(n+l) and ~n are conditionally independent given~.
The
I
I
I
I
I
I
I
--I
I
I
I
I
I
J
I
I
I
\
I
I
I
I
I
I
I
--I
I
I
I
I
I
..I
I
II-l?
desired result now follows from Theorem 4.1 with
~
replaced by
~.
When sampling from an exponential family of distributions the assumptions
of Theorem 4.3 usually apply.
For instance, if Xl' X , ••• are independent
2
and identically distributed according to a normal distributior. with unknown
mean and known variance, then B is induced by X ,A by (Xl' ••. , X ), and
-n
n-n
n
~n
by Tn
= Xl
+ ..• + Xn • Hence AO(n+l)' which is induced by Tn + Xn +1 , is
a subfield of ~ V B 1. Other examples of the use of Theorem 4.3 are given
---vn -n+
in Part I.
~~
__~!~£~~~!~~_~f_~~~~~~!~~_~.
A(i) is a very natural one to make.
As explained in section 1, Assumption
If
~S
is induced by a sufficient statistic
s with range ~, then Assumption A(i) is closely related to the property that
every g
€
G induces a 1-1 transformation of S onto itself.
The following
theorem is due to C. M. Stein (unpublished) and gives conditions under which
Assumption A(i) holds.
THEOREM 5.1.
If!:.S is minimal sufficient for
~,
and if
~S
contains all
P-null sets, then Assumption A(i) is valid.
PROOF.
For each g
€
G, due to the isomorphism described in section 2
between (~, ~, p) and (g~, ~, gP), we have that ~S is minimal sufficient
for
~ on~.
But
~ = ~
sufficient for P on A.
and g!;
=~,
so that both
~
and
~S
are minimal
Since they both contain all P-null sets, they must
be the same.
It should be remarked that Assumption A(i) is usually easy to check
directly, and has been found to hold in all interesting examples, whereas
it is often not true that !:.S contains all P-null sets, in which case
Theorem 5.1 is not applicable,
I
11-18
Assumption A can be phrased in the following way.
Noting that
~S
by Assumption A(i), we can consider ~ as our basic ~-field, instead of
1.e. consider only ~s-measurable functions.
A,
Assumption A(ii) then says that
every almost invariant function is equivalent to an invariant function.
applications
~s
~,
In
is often induced by a statistic s, and G induces a group of
transformations on the range
tions on
~S
:II
~
of s.
Considering then only measurable func-
and invariance relative to the induced group of transformations
on ~, Assumption A(ii) again says that every almost invariant function is
equivalent to an invariant function.
This assumption holds in a good many
cases, as implied by a theorem of Lehmann [30], p. 225.
theorem cannot be applied unless
nonparametric cases.
of a
~-finite
~
However, Lehmann's
is dominated, which excludes many interesting
Furthermore, Lehmann's theorem reqUires the existence
measure on G possessing a certain invariance property, so
that the applicability depends rather heavily on the topological structure
of G.
In nonparametric examples the group G is usually of such a nature
that it is not known how to verify the eXistence of a measure with the
desired properties.
In some problems G is finite.
In that case, and, more generally, when
G is countable, Assumption A(ii) is automatically fulfilled.
discussion in section 5 brings out the desirability of having another assumption,
alternative to Assumption A, that also permits the conclusions of Theorems
3.1 and 3.2.
The following Assumption B achieves this aim, by introducing
a function Q, which we shall call an invariant conditional probability distribution.
If
Q satisfies merely (i) and (ii) of Assumption B, it has been
called a conditional probability distribution by Doob [11] (except that in
J
I
I
I
I
I
I
I
--I
I
I
I
I
I
.-I
I
I
\
I
I
I
I
I
I
I
--I
I
I
I
I
I
I.
I
I
II-19
[11] the measurability condition is slightly weaker), and a regular conditional probability by Lo~ve [32].
(We use here the symbol Q instead of the
more customary P, since there is only one conditional probability distribution
due to the sufficiency of
tions P.)
~,
whereas there is a whole family
!:
of distribu-
The additional condition (iii) gives Q a certain invariance pro-
perty.
ASSUMPTION B.
There is a set ASI e
valued function Q ~ ~ 'f(
!'
~SI ~ ~-measure
With Q(A,x) • 0 for every A
€
1, and a real
~,
X
~ ASI,such
that
for every x E ASI ' Q(·,x) is a probability distribution on ~;
(i)
(ii)
(iii)
for every A e ~, Q(A,·) is a version of p(AI~);
for every x E X, A E
~
and g E G, Q(gA,gx)
= Q(A,x).
The reason we have to distinguish in Assumption B between
!
and ASI is
that it is not always possible in applications to satisfy (i) for all x E !'
and at the same time satisfy (iii).
As regards to condition (iii), it is
ASI ' since if x ~ ASI then also
so that by definition Q(gS,gx) • Q(A,x) • o.
merely necessary to verify this for x
gx
4 Asr ;
E
Strictly speaking, (ii) and (iii) of Assumption B would be sufficient
to obtain the conclusions of Theorems 3.1 and 3.2, since for any invariant
set A (11) and (iii) say that there is an invariant version of P(A I~) .
However, it is mere natural to make (i) also part of assumption B since
the purpose of this assumption is to apply it to cases where a conditional
probability distribution satisfying (i) and (11) is readily exhibited, and
where (iii) can then subsequently be verified.
It should also be remarked
at this point that seemingly Assumption B does not contain or imply Assumption
A(i), even though the latter was announced in section 1 to be assumed
throughout this paper.
It is true that Assumption A(i) is not needed for
II
II-20
Theorem 6.1 below, but it is equally true that essentially Assumption A(i)
is implied by Assumption B.
Tb see this, consider the subfield generated
by the totality of functions Q(A,·), for A €~.
be checked to be sufficient, contained
in~,
null sets, and satisfying Assumption A( i) •
satisfied
This subfield can easily
differing from
~
only in
'!bus, if Assumption A( i) is not
the latter can be replaced by an equivalent subfield that
by~,
does satisfy it.
Doob [11] has shown (a proof is also in [32]) that a possible version
of E(fl~) is defined by
E(fl~)(x)
(6.1)
=
J f(x' )Q(dx' ,x).
If f is invariant, and Q satisfIes Assumption B, then E(fl~) as defined by
(6.1) can immediately be checked to be invariant.
Since E(fl~) is also
~-measurable, it follows that the version of E(fl~) given by (6.1) is
an ~I function.
This is precisely (i) of Lemma 3.3, which is equivalent
to (iii) of the sarne lemma, i.e. the conclusion of Theorem 3.2.
We have
therefore proved
THEOREM 6.1.
If ASSumption B holds, then the conclusions of Theorems
3.1 and 3.2 are valid.
We shall now give a nonpararnetric example of the use of Theorem 6.1.
EXAMPLE 6.1.
Let~,
••• ,
common unknown distribution.
g defined by gx
= (h(~),
n be independent random variables with
Let the group G consist of transformations
X
••• , h(X )), where h is strictly monotonic,
n
continuous, and maps the real line onto itself. A sufficient statistic is
J
Ii
II
II!
II
I
I
I
--I
I
I
I
I
I
.'I
I
I
\
I
I
I
I
I
I
I
--I
I
I
I
I
I
I.
I
I
II-2l
the unordered set {Xl' ••• , X }. If 1f stands generically for a permutation
n
of the coordinates of a point x, then ~ can be described as the family of
those sets A
E
A that have the property:
x
E
A =->
A for every
1fX E
For any x, let m(x) be the number of distinct points of the form
coordinates of x are distinct, m(x)
= n~).
m (x) be the number of distinct points
A
= mA(x) /m(x).
with ASI =
~
1fX
1fX
1f.
(it the
Given x and a set A E~, let
that are in A.
Then define Q(A,x)
One can verify irmnediately that Q satisfiel!l Assumption B,
= n-space.
1~ __~~~~~!2~_£l
__ !~!~!~~_£2~~!~!2~~!_~~~~!!l'
Assumption B is
less easy to verity if Q(',x) is not a discrete probability distribution.
Theorem 7.1 in this section is designed to cope with the continuous case.
More specifically, it gives sufficient conditions for Assumption B to hold.
These conditions will be called Assumption C.
Assumption C is usually very
easy to verify, and Theorem 7.1 may therefore be used as an alternative to
Theorems 3.1 and 3.2 in those cases where the latter also apply.
cases will be illustrated in Examples 7.1 and 7.2.
Two such
The normal theory
examples in Part I may also be treated with Theorem 7.1.
'lhe real advantage
of this theorem lies in the fact that Assumption C does not involve the
topological structure of the group G, so that the theorem may be used in
cases where we don't know how to verify Assumption A(ii).
Example 7.3 will
illustrate such a case, in which the family of distributions is nonparametric,
and the conditional probability distribution continuous.
We shall precede 'lheorem 7.1 by
LEMMA 7.1. Suppose F is an open subset of n-space, G a group of transformations of F onto itself, s a differentiable function from F into k-space
I
II-22
(k < n) with range~. Let D(x) be the n X k matrix whose ij element i8
~s/~xi
evaluated at x
€
F.
We make the following assumptions:
for each g E G, the transformation x -> gx is continuously differentiable;
(i)
(ii)
for each g
(iii)
€
G, sex)
= s(x')
implies s(gx) ~ s(gx')j
D(x) is continuous and of rank k for each x E F.
Then each g
€
G induces a 1-1 and bi-continuously differentiable transformation
g !2! ~
onto itself, where for s'
that s'
= sex).
PROOF.
€
S we define gs'
:II
s(gx) for any x ~
That the transformation g is well-defined follows from (11).
The transformation is 1-1 since if for some x,x', gs(x)
=gs(x'),
s(gx) = s(gx'), then sex) = sex'), using (ii) with g-l.
g obviously form a group
To
G,
i.e.
The transformations
which is a homomorphism of G.
show that g is bi- continuously differentiable (Le. g and g -1 are
continuously differentiable), it is sufficient to show that
since the same conclusion will then apply to g -1.
g is
differentiable,
Incidentally, this will
show that the Jacobian of the transformation g is everywhere positive on
s(F).
In the following, points in Euclidean space will be denoted by row
vectors, and the same notation will be used for vector-valued functions.
Furthermore, g will be an arbitrary but fixed element of G.
o € F be such that s(xO) = sO' so that
It follows from (iii) that there is an n X (n-k) matrix
arbitrary point in
gSO
= s(gxo).
2.,
Let So be an
and let
X
EO such that the n x n matrix (D(x ) ,Eo) is nonsingular. Since D(x) is
O
continuous at xo ' (D(x),E ) is nonsingular in a neighborhood NO of xO•
O
Define the function U from No into (n-k)-space by Uo = xEO' and define
o
v = (s,uo )' so that V is differentiable, with matrix of partial derivatives
o
o
(D(x) ,EO) nonsingular on No. By an implicit function theorem there is then
a neighborhood NO of xo , No
C
No, such that on NO the function v0 is 1-1 and
JII
I'
I
I
I
I
I
--I
I
I
I
I
I
.'I
I
I
\t
I
I
I
I
I
I
I
--I
I
I
I
I
I
II-23
Let M = vo(No ); then V o is a 1-1 bi-cont1nuously
o
differentiable map of No onto MO. Similarly, there is a neighborhood N of
l
gxo ' and a function ul on Nl , such that vI = (s,ul ) is a 1-1 bi-continuously
bi-continuously differentiable.
differentiable map of N onto M = vI (N ). Without loss of generality we
l
l
l
may assume N = gNO. By (i) the transformation g maps NO continuously diffel
-1
rentiably onto N • Let w be the composition of the three functions v0 ' g
l
and vI; then w maps Mo onto ~ continuously differentiably (actually bi-continuously differentiably).
Wri te wI for the first k components of w, so
that wI maps M continuously differentiably onto s(N ). By the construction
o
l
of w, any point (s' ,u) € M is mapped by wI into gs'. Hence gS' is a cono
tinuously differentiable function of (s' ,u). But we know that gs' depends
only on s' ; hence gs' is a continuously differentiable function of s', for
s' in a neighborhood of sO.
This concludes the proof of the letmna.
Although not needed here, we shall give without proof an explicit
expression for the matrix of partial derivatives of the transformation g,
i.e. the matrix Wl(s') whose ij element is o(gS')j/osi' for s'
S.
Let
Go(x) be the matrix whose ij element is o(gx)/ox , let D(x), So and X be
i
o
as in Lemma 7.1, and put DO = D(Xo ), Dl = D(gxo), GO = Go(xo ). Then
WI (sO)
= (D~DoflnoGODl·
In the following we shall write g instead of g, in conformity with
the notation in sections 2 and 3.
ASSUMPTION
of
!,
~ = (P
c. ! is an n-dimensional Borel set,
e, e
€
®Eith
@
~
the Borel subsets
~bitrary index set, and with resrpect
to n-dimensional Lebesgue measure P has a density
I.
I
I
€
e
I:
1I-24
in which s is a measurable function from.
~~,
!
into k-space (k < n) with
ga and h are positive, real-valued measurable functions on
~
respectively, and s and h satisfy the conditions below.
~I
~,
~, ~I
G,
be as in section 3 and suppose that there is an open set ASI E
~SI
!'
and
of
P-measure 1, such that on A :
SI
(i)
for each g E G the transformation x ---> gx is continuously differentiable,
and the Jacobian depends only on s{x);
(ii )
(iii)
(iv)
for each g E G, s{x)
S
is
= s{x')
implies s{gx)
= s{gx
t
);
continuously differentiable, and the matrix D{x), ~ ij
element is OSj/ox , is of rank k;
i
for each g E G, h{gx)/h{x) depends only on s{x).
Note that Assumption C{ii) implies Assumption A(i).
THEOREM 7.1.
Assumption C implies Assumption B and therefore the
conclusions of Theorems 3.1 and 3.2.
PROOF.
As in the proof of Lenuna 7.1, to each x
E
A we can assign a
SI
neighborhood and a 1-1 bi-continuously differentiable function on this neighborhood into n-space whose first k components coincide with s.
Since A is
SI
separable, we can cover it with a countable subfamily of these neighborhoods,
and from this subfamily we may construct a family {N , a
a
= 1,2,
••• ) of
disjoint sets whose union is AS.. (the N are not necessarily open, but each
a
J.
Na contains an open set).
space, such that va
n-space.
On each N we have then a function u
= (s,ua )
a
a into (n-k)-
maps N 1-1 bi-continuously differentiably into
a
Let J a be the Jacobian of v~l and define the real-valued function
J'I:
II
II
I
I
I
I
II
I
I
I
I
I
I
.'I
I
I
II-25
'-
I
I
I
I
I
I
I
lit
I
I
I
I
I
I
I.
I
I
Note that h[va] > 0 for each a.
In the following, the indicator of any set
B will be denoted by I[B], and the probability with respect to P e of B by
P eB.
For any A
€ ~
and s'
K[A](s')
=E
a
€
§ we put
J I[Va(NcI)](s' ,U)h[Va](S' ,u)du
and for K[~] (s') we shall simply write K( s' ).
X €
Now we define, for A
€
~,
ASI :
Q(A,x)
(7.4)
= K[AJ(s'
)
= s(x)
s'
•
K(s')
We shall assume for the time being that K( s') is neither 0 nor
to a discussion of the possibilities K(s')
=0
or
00
later.
does not change if' on any N the function uQ is changed to
a
00,
and return
Note that K[AJ(s')
u~,
such that
(s,u~) is again a 1-1 bi-continuously differentiable function on N •
This
a
remark also can be used to show that K[AJ(s') does not depend on the partiCUlar chOice of the family (N }:
a
if (N~, ~
= 1,
2, ••• } is another choice,
with 1-1 bi-continuously differentiable function (s,u~) on N~, then we may
employ the family (NaN~} and on NJl~ we may take either the function (s, ua )
or (s,u~), giving the same contribution to the double series defining K[AJ(s').
We shall show now that Q defined in
of Assumption B.
(7.4) satisfies (i), (ii) and (iii)
That Q(·,x) is a probability distribution for each x
is immediate, so that (i) is true.
A
SI
To show (11) we first remark that each
€
term on the right hand side in (7.3) is a measurable function of s', so that
Q(A,·) is
~-measurable.
Now let B
O
€ ~,
then
1I-26
II
JII
It is readily verified that
Therefore, we obtain from
=
J
(7.5):
ga(s' )K[AJ(s' )ds'
s(B )
O
using (7.3).
By taking in (7.6) A
(7.7)
It follows from
J
PaBo =
S(B
=~
we get
ga(S')K(S')dS'.
o)
(7.7) that the random variable sex) has a density P: with
respect to k-dimensional Lebesgue measure, given by
(7.8)
We still have to show that the integral of Q(A,·) over B with respect to
O
PaequalsPaABo.
II
II
II
II
II
II
III
II
II
II
II
II
II
Now, using (7.4), we may compute this integral as
.1
I:
1
I
II-27
'-
I
I
I
I
I
I
I
--I
I
I
I
I
I
I.
I
I
and substituting (7.8) into (7.9) we obtain the right hand side of (7.6).
This concludes the verification of (11) of Assumption B.
Next we shall
verify (i11).
From (7.4) we have Q(gA,gx)
Put N~
that this equals Q(A,x).
= K[gA](gSI)/K(gs'),
= gNl3'
and we have to show
13 = 1, 2, ••• , and let u~ be a func-
tion on N~ into (n-k)-space, defined by
(7.10)
(i.e. u~
X €
= su13 ). Putting
checked that
v~
Since {N~, '13
= 1,2,
maps
N~
v~
NI
13
= (s,u~) and using (i), it can easily be
1-1 bi-continuously differentiably into n-space.
.,.} is a family of disjoint sets, covering A ' we
SI
have, for any B C ASI ' B = U N~B, and the sets of this union are disjoint.
13
Applying this to B = N~ in the expression for K [gA](gS'), we have
so that
(7.11)
On NbNa we shall use now
v~
instead of va'
By a previous remark this does
not change the contribution to the at3th term on the right hand side of (7.11).
II-28
I
JI,
Thus:
I!
By virtue of the construction of
v~
one can easily check
Substituting (7,13) into (7.12) and summing over ex yields
(7.14)
Using (iv) of Assumption C, let h(gx)/h(x) = cl(s'), where s' = sex).
Using
(i) of Assumption C, let the Jacobian la(gx)/axl be c 2 (s'). Finally, using
Lemma 7.1, let the Jacobian las'/a(gS') I be c (s'). All three CIS are > 0 on
3
ASI ' so their product c(s') = cl(s' )c 2 (s')c (s') is also positive on ASI ' With
3
help of (7.2) one can verify then that
I f we
substitute (7.15) into (7.14) and compare the result with (7.3) we see
that
(7.16)
K[gA](gs')
= c(s')K[A](s')
where, as remarked before, c(s') > o.
(7.17)
K(gs')
Applying (7.16) to A =
= c(s' )K(s').
! gives
I
I
I
I
I
--I
I
I
I
I
I
.-I
I
I
1I-29
'-
I
I
I
I
I
I
I
--I
I
I
I
I
I
I.
I
I
Taking the ratio of (7.16) and (7.17), and using (7.4), gives the desired
result Q(gA,gx) = Q(A,x), demonstrating the validity of (iii) of Assumption
B.
We return now to the question whether K(st), in the denominator of
(7.4), can be a or
CD.
Since ga > a on ~, we see from (7. 8 ) that the
subset ASIa of A on which K(s(x)) = a or
SI
00
is of ~-measure a (actually
one can easily show that K(st) carmot be a on s(A )' but we shall not need
SI
Moreover , it follows from (7.17) that ASIa is an invariant set.
this fact).
The invariant conditional probability distribution Q, given by (7.4), is
well-defined on ASI - ASIa' which is an
~I
set of
~-measure
Theorem 7.1 can be applied with ASI - ASIa instead of ASI '
1, so that
This concludes
the proof of the theorem.
EXAMPLE 7.1.
Let
)L,
-~
••• ,
Xn
(n> 3) be independent and identically
distributed according to a normal distribution With mean
er, both unknown.
~,
standard deviation
Then we may take a= (~,er), s = (sl' s2 ) with sl(x) = E xi'
2
s2(x) = E Xi' h(x) = 1, and
Let G be the totality of transformations x -> cx,
C
> a.
The matrix n(x)
of Assumption C is of rank 2 unless all components of x are equal, 1.e.
unless x is on the equiangular line.
and in
~I'
The latter is of Lebesgue measure a
so that its complement can be taken as the set A in Assumption C.
SI
All assumptions are easily verified to hold.
As a maximal invariant statistic
based on s one can take sl/\/s2-sf/n, which is essentially Student's t-ratio.
I
II-30
JI
Theorem 7.1 implies then that in a sequential t-test the t-ratio at the nth
stage is invariantly sufficient.
EXAMPLE 7.2. (multiple correlation coefficient).
I
I
I
I
I
I
Let Xl' ••• , X be
n
independent and identically distributed according to a p-variate normal
distribution (p < n), with unknown mean vector and unlalown nonsingular
covariance matrix.
Let X = (Xl' ••• , X ) so that X is a p X n matriX, and
n
let Y be the matrix obtained from X by deleting the first row x of X. Define
x = L:
xcJn (in this example
0:
will always run from 1 to n) and A
= XX'
so that X and A/(n-l) are the sample mean and sample covariance matriX.
larly, define
Y = L:
YcJn and B
= YY' -
- nXX' ,
Simi-
nlY'.
Suppose inference is desired about the population multiple correlation
coefficient -2
R between the first and the remaining variates.
The corresponding
2
2
sample multiple correlation coefficient R can be written as [lJ R
in which all is the 11 element of A.
= 1 -IAI/(al1IBI),
2
Both R:2 and R are invariant under the
group G composed of the follOWing transformations:
(i) Xo: -> lb + b, b an arbitrary p
(ii)
(iii)
X
1 vector;
Y -> CY, C an arbitrary nonsingular (p-l)
X
(p-l) matriX;
z -> cz, c arbitrary real and ~ O.
If the density of
X is written down, it is seen to be of the form (7.1), With
~= pn-space, and for s we may take the vector-valued function (X,A), whose
1 p(p+l) elements a .. of A, with
components are the p components of X and the -2
lJ
(say) i ~ j (for simplicity of notation no distinction is made here between
random variables and the values they may take on).
can be verified to hold.
All parts of Assumption C
The only step that is not immediate is checking
that the matrix D(x) in (iii) of Assumption C is of maximal rank.
It turns
out that this condition is true on the set on which A is nonsingu1ar, i.e.
~
I
I
I
I
I
I
.-I
I
I
l-I
I
I
I
I
I
I
II
I
I
I
I
I
I
I.
I
I
II-51
an ~I set of ~-measure 1.
tated by considering
x
ia
X and
Thts can be shown by direct computation (faciliA functions of new variables, obta~ned from the
by an orthogonal transformation), or by the followine
eeom~tric
argument:
If the rows of X are projected on the n-l dimensional orthogonal complement
of the equiangular line, there results a matriX X·)~ such that A = x*X*'.
The
only nontriVial part of the proof is to show that the matriX of partiul
derivatives of the mapping x* --> A is of full rank.
Now if
t~e
rows of
X" t are linearly independent, then by the Gram-Schmidt orthogonalization
process we can write x* = Ttl, where T is p X plower trio.ngutar \.,rit1+ posit~.ve
diagonal elements, and U has orthonormal rows.
We h2.ve then
!, =
TT', and
the assertion follows from the fact that X* --> (T,U) and T -> A are 1-1
bi-continuously differentiable.
Considering the transformations induced by G in the range of s, we can
show readily that R2 is a maximal invariant statistic b~csed on s, co thc.t it
induces the rr-field ~SI' FrOm the Conclusion of theorem 7.1 we know then
that R2 is invariantly sufficient. We can use thi~ fa~t for 8 sequential
test of a hypothesis concerning
R2 , by tasing the test on the seq~ence of
R2 at the successive stages of sampling.
The above stated result implies
2
then that R at the nth stage is invariantly sufficient.
The ordinary correlation coefficient between tHO
varic~tes
can be
treated in a completely analogous way.
EXAMPLE 7.3.
Let
~
be n-space with the equiangular line deleted, and
the function s as in Example 7.1.
In contrast to the lr,tter, let Pe (x)
:l:
go (s (x) ) ,
where {gel consists of all positive measurable functions on S such that
j
ge(s(x)) dx = 1.
Denote z = (l/7t) arc tan [(s2 - si/n)1/2/slJ, so that 0 ";;; - 1.
Let G be the totality of transformations x --> c(z)
~,
where c is any positive
1I-32
analytic function on [0,1] such that c(O)
=1
(note that z is a function of
x, and that the transformation does not change the value of x).
Part (i) of Assumption C can be verified by direct computation; parts (ii) and (iii) are the same as in Example 7.1. The group G produces the same orbits as in Example 7.1, hence the same $\mathcal{A}_I$. Therefore, the same conclusion obtains as in Example 7.1, i.e. the sequence of Student's ratios is an invariantly sufficient sequence.
In Example 7.3 Theorem 7.1 is easily applied.
On the other hand, we
cannot apply Theorem 3.1 directly, since we do not know how to verify Assumption
A(ii) in this case due to the more complicated structure of G.
Care has been
taken in Example 7.3 that G does not contain an obvious subgroup that produces
the same orbits and for which Assumption A(ii) can be verified (if there were
such a subgroup, it could be used instead of G to yield the desired conclusion).
The group of Example 7.1 is not a subgroup of G in Example 7.3 because G does not contain any transformation $x \to cx$ with c constant, except when $c = 1$.
We can get a similar example by replacing the group (under multiplication) of functions c(z) in Example 7.3 by the group of functions c(z) defined by $\ln c(z) = \int (\exp zy)\, a(dy)$, where a runs through the additive group of signed measures on the real line such that $a(\{0\}) = 0$ and $\int (\exp y)\, |a|(dy) < \infty$. The restriction $a(\{0\}) = 0$ prevents the group of Example 7.1 from being a subgroup of G.
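The group structure can be seen by specializing the signed measures a to finite collections of point masses (here dictionaries mapping atoms y to weights), under which the integral becomes a finite sum; a sketch (NumPy; all atoms and weights are arbitrary illustrative values):

```python
import numpy as np

def c(a, z):
    """c_a(z) with ln c_a(z) = sum over atoms y of a({y}) * exp(z*y)."""
    return np.exp(sum(w * np.exp(z * y) for y, w in a.items()))

def plus(a, b):                            # addition of signed measures
    return {y: a.get(y, 0.0) + b.get(y, 0.0) for y in set(a) | set(b)}

a = {1.0: 0.5, -2.0: -0.3}                 # note a({0}) = 0 in both cases
b = {0.5: -1.2, 1.0: 0.4}
z = 0.37
print(np.isclose(c(a, z) * c(b, z), c(plus(a, b), z)))   # closure: c_a c_b = c_{a+b}
neg_a = {y: -w for y, w in a.items()}
print(np.isclose(c(a, z) * c(neg_a, z), 1.0))            # inverse: c_{-a} = 1/c_a
```

Addition of measures thus corresponds to multiplication of the functions c(z), which is the sense in which a "runs through the additive group."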
The essential difference between Assumptions A and C, as far as their
verifiability is concerned, is that in Assumption A(ii) the structure of the
group G comes into play, whereas in Assumption C conditions have to be verified
only for each g separately.
Example 7.3, even though admittedly artificial,
shows that even when the family of distributions is dominated there may be
cases where G is so complicated that the verification of Assumption A(ii) is either impossible or more difficult than the verification of Assumption C.
REFERENCES

[1] ANDERSON, T. W. (1958). An Introduction to Multivariate Statistical Analysis. Wiley, New York.
[2] APPLEBY, R. H. and FREUND, R. J. (1962). An empirical evaluation of multivariate sequential procedure for testing means. Ann. Math. Statist. 33 1413-1420.
[3] ARMITAGE, P. (1957). Restricted sequential procedures. Biometrika 44 9-26.
[4] BAHADUR, R. R. (1954). Sufficiency and statistical decision functions. Ann. Math. Statist. 25 423-462.
[5] BARNARD, G. A. (1952, 1953). The frequency justification of certain sequential tests. Biometrika 39 144-150 and 40 468-469.
[6] BERK, ROBERT H. (1964). Asymptotic properties of sequential probability tests. Ph.D. thesis (unpublished), Harvard Univ.
[7] BURKHOLDER, DONALD L. (1960). The relation between sufficiency and invariance, I: theory. Invited address at the Central Regional Meeting of the Institute of Mathematical Statistics, Lafayette, Indiana.
[8] COX, D. R. (1949). The use of the range in sequential analysis. J. Roy. Statist. Soc. Ser. B 11 101-114.
[9] COX, D. R. (1952). Sequential tests for composite hypotheses. Proc. Cambridge Philos. Soc. 48 290-299.
[10] DAVID, HERBERT T. and KRUSKAL, WILLIAM H. (1956). The WAGR sequential t-test reaches a decision with probability one. Ann. Math. Statist. 27 797-805 and 29 936.
[11] DOOB, J. L. (1953). Stochastic Processes. Wiley, New York.
[12] FRASER, D. A. S. (1956). Sufficient statistics with nuisance parameters. Ann. Math. Statist. 27 838-842.
[13] FRASER, D. A. S. (1957). Nonparametric Methods in Statistics. Wiley, New York.
[14] GHOSH, BHASKAR K. and FREEMAN, HAROLD (1961). Introduction to Sequential Experimentation: Sequential Analysis of Variance. A monograph prepared by Groton Associates, Groton, Massachusetts, for the Quartermaster Field Evaluation Agency, Fort Lee, Virginia.
[15] GHOSH, JAYANTA KUMAR (1960). On the monotonicity of the OC of a class of sequential probability ratio tests. Calcutta Statist. Assoc. Bull. 9 139-144.
[16] GHOSH, JAYANTA KUMAR (1962). Optimum properties of sequential tests of simple and composite hypotheses and other related inference procedures. D. Phil. thesis (unpublished), Calcutta University.
[17] HAJNAL, J. (1961). A two-sample sequential t-test. Biometrika 48 65-75.
[18] HALL, WM. JACKSON (1959). On sufficiency and invariance with applications in sequential analysis. Inst. of Statist. Mimeo Ser. No. 228, Univ. of North Carolina, Chapel Hill (contains errors).
[19] HALL, WM. JACKSON (1960). The relation between sufficiency and invariance, II: applications. Invited address at the Central Regional Meeting of the Inst. of Math. Statist., Lafayette, Indiana.
[20] HOEL, PAUL G. (1955). On a sequential test for the general linear hypothesis. Ann. Math. Statist. 26 136-139.
[21] IFRAM, A. F. (1963). On the asymptotic behavior of densities with applications to several tests of composite hypotheses. Ph.D. thesis (unpublished), Univ. of Illinois.
[22] JACKSON, J. EDWARD and BRADLEY, RALPH A. (1961). Sequential $\chi^2$- and $T^2$-tests. Ann. Math. Statist. 32 1063-1077.
[23] JOHNSON, N. L. (1953). Some notes on the application of sequential methods in the analysis of variance. Ann. Math. Statist. 24 614-623.
[24] JOHNSON, N. L. (1954). Sequential procedures in certain component of variance problems. Ann. Math. Statist. 25 357-366.
[25] JOHNSON, N. L. (1961). Sequential analysis: a survey. J. Roy. Statist. Soc. Ser. A 124 372-411.
[26] KIEFER, J. (1957). Invariance, minimax sequential estimation, and continuous time processes. Ann. Math. Statist. 28 573-601.
[27] KIEFER, J. and WEISS, LIONEL (1957). Some properties of generalized sequential probability ratio tests. Ann. Math. Statist. 28 57-74.
[28] LEHMANN, E. L. (1953). The power of rank tests. Ann. Math. Statist. 24 23-43.
[29] LEHMANN, E. L. (1955). Ordered families of distributions. Ann. Math. Statist. 26 399-419.
[30] LEHMANN, E. L. (1959). Testing Statistical Hypotheses. Wiley, New York.
[31] LEHMANN, E. L. and SCHEFFE, HENRY (1950). Completeness, similar regions, and unbiased estimation - Part I. Sankhya 10 305-340.
[32] LOEVE, MICHEL (1963). Probability Theory (3rd ed.). Van Nostrand, Princeton, N. J.
[33] NANDI, H. K. (1948). Use of well-known statistics in sequential analysis. Sankhya 8 339-344.
[34] NATIONAL BUREAU OF STANDARDS (1951). Tables to Facilitate Sequential t-Tests. Applied Math. Ser. 7, U. S. Government Printing Office, Washington.
[35] RAY, W. D. (1956). Sequential analysis applied to certain experimental designs in the analysis of variance. Biometrika 43 388-403.
[36] RUSHTON, S. (1950). On a sequential t-test. Biometrika 37 326-333.
[37] RUSHTON, S. (1952). On a two-sided sequential t-test. Biometrika 39 302-308.
[38] RUSHTON, S. (1954). On the confluent hypergeometric function M(a,r,x). Sankhya 13 369-376.
[39] RUSHTON, S. and LANG, E. D. (1954). Tables of the confluent hypergeometric function. Sankhya 13 377-411.
[40] SCHNEIDERMAN, M. A. and ARMITAGE, P. (1962). A family of closed sequential procedures. Biometrika 49 41-56.
[41] SCHNEIDERMAN, M. A. and ARMITAGE, P. (1962). Closed sequential t-tests. Biometrika 49 359-366.
[42] SLATER, L. J. (1960). Confluent Hypergeometric Functions. Cambridge Univ. Press.
[43] SOBEL, MILTON and WALD, ABRAHAM (1949). A sequential decision procedure for choosing one of three hypotheses concerning the unknown mean of a normal distribution. Ann. Math. Statist. 20 502-522.
[44] STEIN, CHARLES (1948). On sequences of experiments (abstract). Ann. Math. Statist. 19 117-118.
[45] WALD, ABRAHAM (1947). Sequential Analysis. Wiley, New York.
[46] WILCOXON, FRANK, RHODES, L. J. and BRADLEY, RALPH A. (1963). Two sequential two-sample grouped rank tests with applications to screening experiments. Biometrics 19 58-84.
[47] WIRJOSUDIRDJO, SUNARDI (1961). Limiting behavior of a sequence of density ratios. Ph.D. thesis (unpublished), Univ. of Illinois. Abstract in Ann. Math. Statist. 33 296-297.
FOOTNOTES
1. Research was supported in part by the Air Force Office of Scientific Research, the Office of Naval Research, and the National Institutes of Health under Grant GM-10397. Reproduction is permitted for any purpose of the United States Government.
2. Research was supported by the National Science Foundation under Grants G-11,382 and G-21,507.
3. Research was supported by a junior research training scholarship grant of the Government of India.
4. For an explanation of the invariance principle, see [30].
5. Fraser [12] has considered a generalized sufficiency definition in the presence of nuisance parameters, differing from this approach through invariance. It is shown in [16] that Fraser-sufficiency and invariant sufficiency sometimes, though rarely, coincide.
6. Hereafter, we drop the brackets in our notation for composition of functions, writing $zy(\cdot)$ for $z[y(\cdot)]$.
7. With such a definition, namely that the probability that t(x) lies in any t-set given v is known, the sufficient statistic v may well contain more information about the parameter than does t, but certainly not less. See also the end of 1.7.
8. This is consistent with the definition in [26], where g acts component-wise. Application of the invariance principle in the sequential decision problem presumes a cost function which is invariant under both G and $\bar{G}$, for example, constant cost per observation.
9. It should be noted that Wald [45] and Barnard [5] have proposed alternative approaches to deriving sequential tests of such hypotheses; see also [25, 14, 36].
10. It is frequently suggested (e.g., [9, 10] and pp. 98 and 250 in [30]) that in order to use Wald's boundaries for a SPRT one must prove termination with certainty. However, it is easily verified that the requirements on the error probabilities are fulfilled as approximate upper bounds, rather than approximate equalities, whether or not termination is certain. (The word "approximate" before "upper bounds" is really only justified if the error probabilities are small, but may in fact be deleted throughout if Wald's conservative boundaries $B = \beta$ and $A = 1/\alpha$ are used.)
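A sketch of the easy verification, in our notation (not the paper's): write $L_N$ for the likelihood ratio at the stopping stage and $\alpha'$, $\beta'$ for the attained error probabilities; since $L_N \ge A$ on the rejection event and $L_N \le B$ on the acceptance event,

```latex
\alpha' = \int_{\{\mathrm{reject}\}} dP_0
        \le \frac{1}{A} \int_{\{\mathrm{reject}\}} L_N \, dP_0
        = \frac{1}{A}\, P_1(\mathrm{reject}) \le \frac{1 - \beta'}{A},
\qquad
\beta' = \int_{\{\mathrm{accept}\}} dP_1
       = \int_{\{\mathrm{accept}\}} L_N \, dP_0
       \le B \, P_0(\mathrm{accept}) \le B\,(1 - \alpha').
```

Nonterminating paths lie in neither event and only shrink the left-hand sides, so the bounds hold without any termination hypothesis; with the conservative boundaries $A = 1/\alpha$ and $B = \beta$ they give $\alpha' \le \alpha$ and $\beta' \le \beta$ exactly.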
11. Application of the Stein Theorem could also be validated by verifying Assumption C, in analogy with Cox's Theorem, but the verification is both tedious and unnecessary.
12. It was noticed by J. K. Ghosh that if the family of distributions on $\mathcal{A}_S$ is complete, then the conclusions of Theorems 3.1 and 3.2 are equivalent.