Carroll, R.J. (1977). "On Sequential Estimation of the Largest Normal Mean When the Variance is Known."

ON SEQUENTIAL ESTIMATION OF THE LARGEST NORMAL MEAN
WHEN THE VARIANCE IS KNOWN
by
Raymond J. Carroll*
University of North Carolina at Chapel Hill
Abstract
We define a class of stopping times for estimating the largest of k normal means when the variance is known. The class can achieve significant reduction in sample size compared to a related procedure due to Blumenthal (1976) because it employs an elimination feature which halts sampling on populations furnishing no information about the largest mean. The asymptotic behavior of the stopping times and the mean square consistency of the estimators are studied.
AMS 1970 Subject Classifications: Primary 62F07; Secondary 62L12
Key Words and Phrases: Sequential Estimation, Elimination, Largest Normal Mean, Ranking and Selection
*This research was supported by the Air Force Office of Scientific Research under Grant No. AFOSR-75-2796. The author wishes to thank the Department of Theoretical Statistics, University of Minnesota for providing office space during part of this research.
1. Introduction

Let $\theta_1, \ldots, \theta_k$ be the unknown means of $k$ normal populations with common known variance $\sigma^2$ (henceforth taken as unity). Let $\bar X_{1n}, \ldots, \bar X_{kn}$ be the sample means of $n$ observations taken from the $k$ populations, and define the ordered population and sample means by $\theta_{[1]} \le \cdots \le \theta_{[k]}$ and $\bar X_{[1]n} \le \cdots \le \bar X_{[k]n}$. The problem is to construct a sequential stopping time for the estimation of $\theta_{[k]}$ (by an estimator $\theta_n^*$, often taken as $\bar X_{[k]n}$) with prespecified mean square error (MSE) $r$. The procedures we investigate depend on the estimates $\hat\Delta_{in} = \bar X_{[k]n} - \bar X_{[i]n}$ of $\Delta_i = \theta_{[k]} - \theta_{[i]}$ $(i = 1, \ldots, k)$.
Blumenthal (1976) constructed a purely sequential stopping time $N_B$ and a related two-stage procedure $N_B^*$, obtaining results which may be summarized as follows. For $\Delta_1, \ldots, \Delta_{k-1}$ fixed, $rN_B$ and $rN_B^*$ have almost sure limits as $r \to 0$, but asymptotic mean square consistency is verified only for the two-stage procedure for $k = 2$. If each $\Delta_i$ is proportional to $r^{1/2}$, neither has an almost sure limit, but the limit distribution is computed only for the two-stage procedure $N_B^*$ when $k = 2$; asymptotic mean square consistency is not checked in this case for the sequential procedure $N_B$. Blumenthal indicates that for $k = 2$, his procedures will achieve approximately 10% savings in sample size when compared to a conservative, fixed-sample procedure.
In this note we generalize Blumenthal's results in two ways: first, we introduce a class of stopping times $N$ for which $N_B$ is (in a sense to be made precise) a "least favorable" member, and second, we answer a number of open questions in his paper by obtaining limit distributions and asymptotic mean square errors. In particular, when the parameters $\Delta_i$ are proportional to $r^{\beta_i}$ $(0 \le \beta_i < \infty)$, we compute the limit distributions of this class, finding that if $\beta_{k-1} \ge 1/2$, the limit distribution is a stopping time for a function of Brownian motion, and that in general, the mean square error is proportional to $r$.
Define $H(\infty) = 1$, let $n^{-1} H^2(n^{1/2}\Delta_1, \ldots, n^{1/2}\Delta_{k-1})$ be the MSE due to estimating $\theta_{[k]}$ by $\theta_n^* = \bar X_{[k]n}$, and let $n(\Delta)$ be the smallest $n$ for which this MSE is at most $r$. The stopping time $N_B = N_B(k)$ is defined as follows: after obtaining $t$ observations from each population, compute the estimates $\hat\Delta_{1t}, \ldots, \hat\Delta_{k-1,t}$ and compute $\hat n(t) = n(\hat\Delta)$ by using the estimates $\hat\Delta_{1t}, \ldots, \hat\Delta_{k-1,t}$ in the definition of $n(\Delta)$; then

$$N_B = \inf\{t : \hat n(t) \le t\}.$$
The difficulty with $N_B$ is that it continues to sample populations which obviously are not associated with the largest population mean, i.e., it fails to eliminate inferior populations. We correct this difficulty by defining a "selection sequence" (Swanepoel and Geertsema (1976)); define $b$ to be the solution to $1 - \Phi(b) + b\phi(b) + \phi^2(b)/\Phi(b) = \alpha/(k-1)$ (here $\Phi$ and $\phi$ are the standard normal distribution function and density), so that as $\alpha \to 0$, $b^2 \approx 2\log((k-1)/\alpha)$, and define
$$M_i = \inf\{n : \bar X_{[k]n} - \bar X_{in} \ge 2^{1/2}((b^2 + \log n)/n)^{1/2} = 2^{1/2} g(n,b)\}.$$

Assuming $\Delta_{k-1} > 0$ and $\theta_j = \theta_{[k]}$, it follows (Robbins (1970)) that

$$\Pr\{M_j \ge M_i \text{ for all } i \ne j\} \ge 1 - \alpha.$$
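The defining equation for $b$ is easily solved numerically. The sketch below (function names are ours) uses bisection, building $\Phi$ and $\phi$ from `math.erf`, and can be used to check the stated asymptotic approximation $b^2 \approx 2\log((k-1)/\alpha)$.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def solve_b(alpha, k):
    """Solve 1 - Phi(b) + b*phi(b) + phi(b)^2/Phi(b) = alpha/(k-1).
    The left side is decreasing over the bracket, so bisection applies."""
    f = lambda b: 1 - Phi(b) + b * phi(b) + phi(b) ** 2 / Phi(b) - alpha / (k - 1)
    lo, hi = 0.0, 40.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For small $\alpha$ (say $\alpha = 10^{-4}$, $k = 2$) the computed $b^2$ lies close to $2\log((k-1)/\alpha)$, as claimed.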
Thus, our plan will be to continue to sample from population $i$ as long as $N_B \le M_i$; once $N_B > M_i$, we will discontinue sampling from that population. Formally, we make the following definition.
Definition: Reorder the populations so that $M_1 \le M_2 \le \cdots \le M_k$. If $N_B(k) \le M_1$, take $N_B(k)$ observations from each population. Otherwise, completely eliminate the first population from further consideration, and continue as if there were $k-1$ populations in the experiment (although $b^2$ is unchanged). Then, if $N_B(k-1) \le M_2$, take $N_B(k-1)$ observations from each population; otherwise, eliminate population two. Continue in this manner until stopping, denoting the number of observations on each population by $(N_1 \le N_2 \le \cdots \le N_k) = \underline{N}$. The total sample size is $T = N_1 + \cdots + N_k$.
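Once the values $N_B(k), \ldots, N_B(2)$ and the ordered elimination times are in hand, the Definition reduces to a simple scan. A minimal sketch (the helper name and the dict encoding of $N_B(j)$ are ours; we also assume stopping occurs while at least two populations remain):

```python
def sample_sizes(NB, M):
    """Apply the elimination definition.  M is the sorted list of
    elimination times M_1 <= ... <= M_k, and NB[j] is the realized
    stopping time N_B(j) based on the j populations still in play.
    Returns (per-population sample sizes, total sample size T)."""
    k = len(M)
    sizes = []                     # N_i for eliminated populations
    for j in range(k - 1):         # j populations eliminated so far
        if NB[k - j] <= M[j]:      # N_B(k-j) <= M_{j+1}: stop now
            n = NB[k - j]
            sizes = sizes + [n] * (k - j)
            return sizes, sum(sizes)
        sizes.append(M[j])         # population j+1 stops at its M
    raise ValueError("rule did not stop with two or more populations left")
```

For example, with $k = 3$, $M = (10, 20, 30)$, $N_B(3) = 25$ and $N_B(2) = 15$: population one is eliminated after 10 observations, and the remaining two populations each receive 15, so $T = 40$. With $b^2 = \infty$ every $M_i = \infty$ and all populations receive $N_B(k)$ observations, recovering Blumenthal's rule.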
Note that by choosing $b^2 = \infty$ $(\alpha = 0)$, we obtain $N_1 = \cdots = N_k = N_B(k)$, so that Blumenthal's stopping time is a special case of ours. The benefits of this class of stopping times are discussed in the next section. For notational convenience, we limit ourselves to the special case $k = 2$, but the proofs are structured so as to extend immediately. In order to indicate the precise nature of the extension, we make no use of the following facts which hold only for $k = 2$:
2. Asymptotic Distributions

For $k = 2$, the limit distributions of $N$ and $T$ are basic functions of $N_B$ and $M$. We assume throughout this section that $\Delta \approx r^\beta$ for some $\beta \ge 0$ and $\Delta r^{-1/2} \to c_0$ $(0 \le c_0 \le \infty)$. Let $W$ be Brownian motion with mean zero and variance $2t$ at time $t$, and define

(1) $\quad W^*(s,t,c_0) = s^{1/2}(t^{-1}W(t) + c_0)$ if $c_0 < \infty$, while $W^*(s,t,\infty) = \infty$.

Consider $0 < a < b < \infty$. Let $[\cdot]$ denote the greatest integer function, and define $G_r(s,t) = [s/r]^{1/2}\hat\Delta_{[t/r]}$, which is a stochastic process on the multidimensional time parameter space $D_2[a,b]$ (Bickel and Wichura (1971), Billingsley (1968)). Assuming that $\theta_1 < \theta_2$, we see that

(2) $\quad G_r(s,t) = [s/r]^{1/2}(\bar X_{2m} - \bar X_{1m} - \Delta) + [s/r]^{1/2}\Delta, \qquad m = [t/r],$

and, since $\bar X_{2n} - \bar X_{1n} - \Delta$ is an average of mean zero normal random variables, (2) tells us that on $D_2[a,b]$,
(3) $\quad G_r \Rightarrow W^*(\cdot,\cdot,c_0) \quad (\beta \ge 1/2); \qquad G_r \Rightarrow \infty \quad (\beta < 1/2),$

where "$\Rightarrow$" denotes weak convergence. Let $H_{\min} = \min_x H^2(x)$. Thus we obtain

Lemma 1. For $\beta < 1/2$, $rN_B \xrightarrow{P} 1$. For $\beta \ge 1/2$ and $u \ge H_{\min}$,

$$\Pr\{rN_B > u\} \to \Pr\Big\{\min_{H_{\min} \le s \le t \le u}\big(H^2(W^*(s,t,c_0)) - s\big) > 0\Big\} \equiv G^*(u).$$

Further, $G^*$ is a proper distribution function.
Proof of Lemma 1. By definition,

(4) $\quad \Pr\{rN_B > u\} = \Pr\{\hat n(m) > m \text{ for all } H_{\min} \le rm \le u\} = \Pr\big\{\min\{H^2(k^{1/2}\hat\Delta_m) - rk : H_{\min} \le rk \le rm \le u\} > 0\big\}.$

For $\beta < 1/2$, $\min\{k^{1/2}\hat\Delta_m : H_{\min} \le rk \le rm \le u\} \xrightarrow{P} \infty$, so that $rN_B \xrightarrow{P} 1$ since $H(x) \to 1$ as $x \to \infty$. For $\beta \ge 1/2$, (3) and (4) show that

$$\Pr\{rN_B > u\} = \Pr\Big\{\min_{H_{\min} \le s \le t \le u}\big(H^2(G_r(s,t)) - s\big) > 0\Big\} \to G^*(u).$$
The following result (Carroll (1976)) delineates the behavior of $M$ for a particular choice of $b^2$.

Lemma 2. If $b^{-2}\log\Delta^{-1} \to 0$, then $\Delta^2 M/b^2 \xrightarrow{P} 1$. Thus, if $r^{1-2\beta_0} b^2 \to 1$ for some $0 < \beta_0 < 1/2$, then

$$rM \xrightarrow{P} \begin{cases} 0 & (\text{if } \beta_0 > \beta) \\ 1 & (\text{if } \beta_0 = \beta) \\ \infty & (\text{if } \beta_0 < \beta). \end{cases}$$
Now, letting $T$ be the total sample size using $b^2 = r^{2\beta_0 - 1}$ and $T_B = 2N_B$ the total sample size of the Blumenthal procedure, we see that

Lemma 3. $T/T_B \xrightarrow{P} 1/2$ if $\beta < \beta_0$, while $T/T_B \xrightarrow{P} 1$ if $\beta \ge \beta_0$.

Remark.
Lemma 3 is easily extended to the case of general $k$ as follows. Let $T_B = kN_B$ be the total sample size of the Blumenthal procedure, set $b^2 = r^{2\beta_0 - 1}$, and let $\Delta_i \approx r^{\beta_i}$ $(i = 1, \ldots, k-1)$. Let $p$ be the number of $\beta_i < \beta_0$, i.e., $p$ is the number of populations furnishing little information about $\theta_{[k]}$. Then

$$T/T_B \xrightarrow{P} 1 - p/k.$$

In other words, the elimination can result in a significant savings in sample size.
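A small simulation illustrates the savings for $k = 2$ in the regime $\beta = 0 < \beta_0$: with $b^2 = r^{2\beta_0 - 1}$, the elimination time $M$ is of smaller order than $N_B \approx r^{-1}$, so $T/T_B$ falls near $1/2$. As a simplifying assumption of this sketch (not part of the procedure), $r^{-1}$ stands in for $N_B$, which is justified by $rN_B \xrightarrow{P} 1$ when $\beta < 1/2$.

```python
import math
import random

def elimination_time(delta, b2, n_max):
    """First n with Xbar2_n - Xbar1_n >= sqrt(2*(b2 + log n)/n),
    for two normal populations with means 0 and delta (variance 1)."""
    s1 = s2 = 0.0
    for n in range(1, n_max + 1):
        s1 += random.gauss(0.0, 1.0)
        s2 += random.gauss(delta, 1.0)
        if (s2 - s1) / n >= math.sqrt(2 * (b2 + math.log(n)) / n):
            return n
    return n_max

random.seed(1)
r = 1e-3                     # prescribed MSE
beta0 = 0.25
b2 = r ** (2 * beta0 - 1)    # b^2 = r^(2*beta0 - 1), about 31.6 here
NB = round(1 / r)            # surrogate for N_B (r*N_B -> 1 when beta < 1/2)
ratios = []
for _ in range(50):
    M = elimination_time(1.0, b2, NB)  # Delta = 1, i.e. beta = 0 < beta0
    T = NB + M               # inferior population stops at M, best at N_B
    ratios.append(T / (2 * NB))
avg = sum(ratios) / len(ratios)
print(avg)                   # near 1/2, as Lemma 3 predicts
```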
3. Asymptotic MSE

Our goal is to find an estimate $\theta_N^*$ of $\theta_{[2]}$ for which the following mean square consistency result holds:

(5) $\quad r^{-1} E(\theta_N^* - \theta_{[2]})^2$ converges to a finite limit as $r \to 0$.
In the proof given below, we are forced to make the convention that for a small constant $a > 0$, at least $ar^{-1}$ observations are taken from each population. Suppose that upon stopping, $N_i$ observations have been taken on the $i$th population $(i = 1, 2)$. Our estimate of $\theta_{[2]}$ is taken to be

$$\theta_N^* = \max(\bar X_{1N_1}, \bar X_{2N_2}).$$

This estimate does allow the possibility of estimating $\theta_{[2]}$ by the mean of an eliminated population, but the nature of the elimination shows this possibility to be quite unlikely.
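As a Monte Carlo sanity check on (5) in the well-separated regime, one can fix the sample sizes at the $r^{-1}$ order that the stopping times attain (a simplifying assumption of this sketch, not the procedure itself) and verify that $r^{-1}$ times the MSE of $\max(\bar X_1, \bar X_2)$ is near 1: with $\theta_2 - \theta_1$ large relative to the sampling error, the maximum is almost always $\bar X_2$, whose variance is $N^{-1} = r$.

```python
import random

random.seed(2)
r = 0.02
N = round(1 / r)              # 50 observations per population
theta1, theta2 = 0.0, 2.0     # well separated, so beta = 0 < 1/2
reps = 4000
se = 0.0
for _ in range(reps):
    xbar1 = sum(random.gauss(theta1, 1.0) for _ in range(N)) / N
    xbar2 = sum(random.gauss(theta2, 1.0) for _ in range(N)) / N
    se += (max(xbar1, xbar2) - theta2) ** 2
mse_over_r = se / reps / r
print(mse_over_r)             # near 1: MSE proportional to r
```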
Lemma 4. Let $b^2 = r^{2\beta_0 - 1}$ and $\Delta \approx r^\beta$, with $0 \le \beta_0, \beta < 1/2$. Then (5) holds.
Proof of Lemma 4. By Bickel and Yahav (1968), it suffices to show that $r^{-1}(\theta_N^* - \theta_{[2]})^2$ has a limit distribution and that for some $r_0 > 0$,

(6) $\quad \sum_{m=1}^\infty \sup_{0 < r < r_0} \Pr\{r^{-1}(\theta_N^* - \theta_{[2]})^2 > m\} < \infty.$

Now, set $\theta_1 < \theta_2$ without loss of generality. The event $\{r^{-1}(\theta_N^* - \theta_{[2]})^2 > m\}$ is contained in the union of the events $\{|\bar X_{iN_i} - \theta_i| > (mr)^{1/2}\}$ $(i = 1, 2)$. Now, by the maximal inequality for reverse martingales (Doob (1953), pp. 317-318) and the fact that $\min(N_1, N_2) \ge ar^{-1}$,

$$\Pr\{|\bar X_{2N_2} - \theta_2| > (mr)^{1/2}\} \le \Pr\Big\{\sup_{k \ge ar^{-1}} |\bar X_{2k} - \theta_2| > (mr)^{1/2}\Big\} \le c_0 m^{-2}$$

for some $c_0 > 0$. This verifies (6).
To show that $r^{-1}(\theta_N^* - \theta_{[2]})^2$ has a limit distribution, first note that $rN_i$ converges in probability to a constant $(i = 1, 2)$, so by an extension of Anscombe's (1952) Theorem 1, the vector $(N_1^{1/2}(\bar X_{1N_1} - \theta_1),\, N_2^{1/2}(\bar X_{2N_2} - \theta_2))$ converges in law to a jointly normal random vector. We complete the proof by noting that $\Pr\{\bar X_{2N_2} \ge \bar X_{1N_1}\} \to 1$ and

$$\Pr\{r^{-1/2}(\bar X_{1N_1} - \theta_2) \le z \text{ and } \bar X_{1N_1} > \bar X_{2N_2}\} = \Pr\{r^{-1/2}(\bar X_{1N_1} - \theta_1) \le z + r^{-1/2}(\theta_2 - \theta_1) \text{ and } \bar X_{1N_1} > \bar X_{2N_2}\}. \qquad \square$$
An alternate definition of $\theta_N^*$ takes the maximum of the sample means only if elimination has not occurred; we have not been able to verify (6) in this case. Note that by choosing $b^2 = \infty$, the proof of Lemma 4 shows that the Blumenthal procedure satisfies (5). The situation for $\Delta \approx r^{1/2}$ is considerably more complicated.
Define $m = [t/r]$, assume that $\theta_1 < \theta_2$, and let

(7) $\quad Y_r(t) = m^{1/2}(\max(\bar X_{1m}, \bar X_{2m}) - \theta_2) = m^{1/2}\max(\bar X_{1m} - \theta_1 - (\theta_2 - \theta_1),\ \bar X_{2m} - \theta_2).$

This is a stochastic process on $D[1/2,2]$ which, for $\Delta \approx r^\beta$ $(\beta \ge 1/2)$, converges weakly to a process $Y$ in $C[1/2,2]$. We also know that $rN_B$ converges in law to a random variable (call it $W_1$) with distribution function as in Lemma 1. Define a process

$$Y^*(t) = W_1^{-1/2}\, Y(tW_1).$$
Lemma 5. Let $b^2 = r^{2\beta_0 - 1}$ and $\Delta \approx r^\beta$ with $0 < \beta_0 < 1/2 \le \beta < \infty$. Then $E(Y^*(1))^2$ exists and

$$r^{-1} E(\theta_N^* - \theta_{[2]})^2 \to E(Y^*(1))^2.$$
Proof of Lemma 5. By Lemma 4, since (6) holds, it suffices to show that $r^{-1/2}(\theta_N^* - \theta_{[2]})$ converges in distribution to $Y^*(1)$. By Lemma 3, $N_i/N_B \xrightarrow{P} 1$, so we may take $\theta_N^* = \max(\bar X_{1N_B}, \bar X_{2N_B})$; note that $1/2 \le rN_B \le 2$ with probability one. We first show that on $D_2[1/2,2] \times D_2[1/2,2] \times R^1$,

(8) $\quad (Y_r^{(1)}, Y_r^{(2)}, rN_B) \Rightarrow (Y^{(1)}, Y^{(2)}, W_1).$

Here, for $m_1 = [s/r]$ and $m_2 = [t/r]$,

$$Y_r^{(j)}(s,t) = m_1^{1/2}(\bar X_{j m_2} - \theta_j) \qquad (j = 1, 2),$$

and $Y^{(j)}$ is the limit distribution of $Y_r^{(j)}$. Because $Y_r^{(1)}$ and $Y_r^{(2)}$ are tight, (8) will follow if we can prove the convergence of the finite dimensional distributions.
To do this, define two processes:

$$Y_r^{(3)}(s,t) = H^2\big(|Y_r^{(2)}(s,t) - Y_r^{(1)}(s,t) + [s/r]^{1/2}(\theta_2 - \theta_1)|\big) - [s/r]r,$$
$$Z_r(u) = Z_r(u, H_{\min}) = \inf\{Y_r^{(3)}(s,t) : H_{\min} \le s \le t \le u\}.$$

If $u < H_{\min}$, define $Z_r(u) = Z_r(H_{\min})$. By the continuous mapping theorem, since $H^2$ is continuous and both $Y_r^{(1)}$ and $Y_r^{(2)}$ have weak limits in $C_2[1/2,2]$, it follows that $Y_r^{(3)}$ and $Z_r$ have weak limits (call them $Y^{(3)}$, $Z$) in $C_2[1/2,2]$ and $C[1/2,2]$, and on $D_2[1/2,2] \times D_2[1/2,2] \times D[1/2,2]$,

(9) $\quad (Y_r^{(1)}, Y_r^{(2)}, Z_r) \Rightarrow (Y^{(1)}, Y^{(2)}, Z).$
To check the convergence of the finite dimensional distributions in (8), we consider only a special case; note that

(10) $\quad \Pr\{Y_r^{(1)}(s,t) \le u_1,\ Y_r^{(2)}(s,t) \le u_2,\ rN_B \le u_3\} = \Pr\{Y_r^{(1)}(s,t) \le u_1,\ Y_r^{(2)}(s,t) \le u_2\} - \Pr\{Y_r^{(1)}(s,t) \le u_1,\ Y_r^{(2)}(s,t) \le u_2,\ Z_r(u_3) > 0\}.$

Since the first term on the right hand side of (10) has a limit, equation (9) and Theorem 2.1 of Billingsley (1968) prove the weak convergence of the finite dimensional distributions.
We next apply a modification of Theorem 17.2 of Billingsley, replacing his equation (17.18) by our (8) and remembering in the proof that $1/2 \le rN_B,\ W_1 \le 2$ with probability one. To see this, define $\phi_r(t) = t\, rN_B$ and $\phi_0(t) = tW_1$, and note that (8) implies that on $D[1/2,2] \times D[1/2,2] \times R^1$, $(Y_r, \phi_r, rN_B) \Rightarrow (Y, \phi_0, W_1)$. Then by the continuous mapping theorem, $(rN_B)^{-1/2}(Y_r \circ \phi_r)(1) \Rightarrow W_1^{-1/2}(Y \circ \phi_0)(1) = Y^*(1)$, where "$\circ$" denotes composition. Since $r^{-1/2}(\theta_N^* - \theta_{[2]}) = (rN_B)^{-1/2}(Y_r \circ \phi_r)(1)$, the proof is complete. $\square$
4. Conclusions

In this note we have defined a class of stopping times for estimating the largest of $k$ normal means. This class includes a procedure due to Blumenthal but, by building in an elimination feature, allows the possibility of significant savings in sample size. We have obtained the asymptotic behavior of the stopping times, showing that they are related to stopping times for a function of Brownian motion. Finally, we define an estimator with asymptotic MSE proportional to $r$.
References

Anscombe, F. J. (1952). Large sample theory of sequential estimation. Proc. Camb. Phil. Soc. 48, 600-607.

Bickel, P. J. and Wichura, M. J. (1971). Convergence criteria for multiparameter stochastic processes and some applications. Ann. Math. Statist. 42, 1656-1670.

Bickel, P. J. and Yahav, J. A. (1968). Asymptotically optimal Bayes and minimax procedures in sequential estimation. Ann. Math. Statist. 39, 442-456.

Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York.

Blumenthal, S. (1976). Sequential estimation of the largest normal mean when the variance is known. Ann. Statist. 4, 1077-1087.

Carroll, R. J. (1976). On sequential elimination procedures. Institute of Statistics Mimeo Series No. 1078, University of North Carolina at Chapel Hill.

Doob, J. L. (1953). Stochastic Processes. Wiley, New York.

Robbins, H. (1970). Statistical methods related to the law of the iterated logarithm. Ann. Math. Statist. 41, 1397-1409.

Swanepoel, J. W. H. and Geertsema, J. C. (1976). Sequential procedures with elimination for selecting the best of k normal populations. S. Afr. Statist. J. 10, 9-36.