ON SEQUENTIAL ESTIMATION OF THE LARGEST NORMAL MEAN WHEN THE VARIANCE IS KNOWN

by Raymond J. Carroll*
University of North Carolina at Chapel Hill

Abstract

We define a class of stopping times for estimating the largest of k normal means when the variance is known. The class can achieve significant reduction in sample size compared to a related procedure due to Blumenthal (1976) because it employs an elimination feature which halts sampling on populations furnishing no information about the largest mean. The asymptotic behavior of the stopping times and the mean square consistency of the estimators are studied.

AMS 1970 Subject Classifications: Primary 62F07; Secondary 62L12

Key Words and Phrases: Sequential Estimation, Elimination, Largest Normal Mean, Ranking and Selection

*This research was supported by the Air Force Office of Scientific Research under Grant No. AFOSR-75-2796. The author wishes to thank the Department of Theoretical Statistics, University of Minnesota for providing office space during part of this research.

1. Introduction

Let $\theta_1,\ldots,\theta_k$ be the unknown means of $k$ normal populations with common known variance $\sigma^2$ (henceforth taken as unity). Let $\bar X_{1n},\ldots,\bar X_{kn}$ be the sample means of $n$ observations taken from the $k$ populations, and define the ordered population and sample means by $\theta_{[1]} \le \cdots \le \theta_{[k]}$ and $\bar X_{[1]n} \le \cdots \le \bar X_{[k]n}$. The problem is to construct a sequential stopping time $N$ for the estimation of $\theta_{[k]}$ (by an estimator $\theta^*_N$, often taken as $\bar X_{[k]N}$) with prespecified mean square error (MSE) $r$. The procedures we investigate depend on the estimates $\hat\Delta_{in} = \bar X_{[k]n} - \bar X_{in}$ of $\Delta_i = \theta_{[k]} - \theta_i$ $(i = 1,\ldots,k)$.

Blumenthal (1976) constructed a purely sequential stopping time $N_B$ and a related two-stage procedure $N^*_B$, obtaining results which may be summarized as follows. For $\Delta_1,\ldots,\Delta_{k-1}$ fixed, $rN_B$ and $rN^*_B$ have almost sure limits as $r \to 0$, but asymptotic mean square consistency is verified only for the two-stage procedure for $k = 2$.
If each $\Delta_i$ is proportional to $r^{1/2}$, neither has an almost sure limit, but the limit distribution is computed only for the two-stage procedure $N^*_B$ when $k = 2$; asymptotic mean square consistency is not checked in this case for the sequential procedure $N_B$. Blumenthal indicates that for $k = 2$ his procedures will achieve approximately 10% savings in sample size when compared to a conservative, fixed-sample procedure.

In this note we generalize Blumenthal's results in two ways: first, we introduce a class of stopping times $N$ for which $N_B$ is (in a sense to be made precise) a "least favorable" member, and second, we answer a number of open questions in his paper by obtaining limit distributions and asymptotic mean square errors. In particular, when the parameters $\Delta_i$ are proportional to $r^{\delta_i}$ $(0 \le \delta_i < \infty)$, we compute the limit distributions of this class, finding that if $\delta_{k-1} \ge 1/2$ the limit distribution is that of a stopping time for a function of Brownian motion, and that in general the mean square error is proportional to $r$.

Define $H(\infty) = 1$ and let $n^{-1} H^2(n^{1/2}\Delta_1, \ldots, n^{1/2}\Delta_{k-1})$ be the MSE due to estimating $\theta_{[k]}$ by $\hat\theta_n = \bar X_{[k]n}$; let $n(\Delta)$ denote the smallest $n$ for which this MSE is at most $r$. The stopping time $N_B = N_B(k)$ is defined as follows: after obtaining $t$ observations from each population, compute the estimates $\hat\Delta_{1t},\ldots,\hat\Delta_{k-1,t}$ and compute $n(t) = n(\hat\Delta)$ by using the estimates $\hat\Delta_{1t},\ldots,\hat\Delta_{k-1,t}$ in the definition of $n(\Delta)$; then $N_B = \inf\{t : n(t) \le t\}$.

The difficulty with $N_B$ is that it continues to sample populations which obviously are not associated with the largest population mean, i.e., it fails to eliminate inferior populations.
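As a concrete illustration for $k = 2$ (ours, not part of the original paper), $H^2(d)$ — the quantity $n \times \mathrm{MSE}$ of $\max(\bar X_{1n}, \bar X_{2n})$ at standardized gap $d = n^{1/2}\Delta$ — has a closed form via the standard moment formula for the maximum of two independent normals, and the rule $N_B = \inf\{t : n(t) \le t\}$ can then be sketched with a naive search. The names `H2` and `n_required` are ours.

```python
import math

SQRT2 = math.sqrt(2.0)

def Phi(x):
    # standard normal distribution function
    return 0.5 * (1.0 + math.erf(x / SQRT2))

def phi(x):
    # standard normal density
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def H2(d):
    """n * MSE of max(X1bar, X2bar) as an estimate of the larger mean,
    for k = 2, unit variances, standardized gap d = sqrt(n) * Delta.
    Uses the second moment of max(U, V), U ~ N(-d, 1), V ~ N(0, 1)."""
    a = SQRT2              # sd of the difference U - V
    alpha = -d / a
    return (d * d + 1.0) * Phi(alpha) + Phi(-alpha) - d * a * phi(alpha)

def n_required(delta, r, n_max=10**6):
    """Smallest n with MSE = H2(sqrt(n)*delta)/n <= r (simple linear search)."""
    n = 1
    while H2(math.sqrt(n) * delta) > r * n and n < n_max:
        n += 1
    return n
```

For example, `n_required(0.0, 0.01)` is 100, the conservative fixed-sample size $1/r$; intermediate gaps require fewer observations because $\min_x H^2(x) < 1$.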
We correct this difficulty by defining a "selection sequence" (Swanepoel and Geertsema (1976)). Define $b$ to be the solution of

$1 - \Phi(b) + b\phi(b) + \phi^2(b)/\Phi(b) = \alpha/(k-1)$

(here $\Phi$ and $\phi$ are the standard normal distribution function and density), so that as $\alpha \to 0$, $b^2 \sim 2\log((k-1)/\alpha)$, and define

$M_i = \inf\{n : \bar X_{[k]n} - \bar X_{in} \ge (2(b^2 + \log n)/n)^{1/2} = g(n,b)\}.$

Assuming $\Delta_{k-1} > 0$ and $\theta_j = \theta_{[k]}$, it follows (Robbins (1970)) that $\Pr\{M_j \ge M_i$ for all $i \ne j\} \ge 1 - \alpha$. Thus, our plan will be to continue to sample from population $i$ as long as $N_B \le M_i$; once $N_B > M_i$, we will discontinue sampling from that population. Formally, we make the following definition.

Definition: Reorder the populations so that $M_1 \le M_2 \le \cdots \le M_k$. If $N_B(k) \le M_1$, take $N_B(k)$ observations from each population. Otherwise, completely eliminate the first population from further consideration and continue as if there were $k-1$ populations in the experiment (although $b^2$ is unchanged). Then, if $N_B(k-1) \le M_2$, take $N_B(k-1)$ observations from each population; otherwise, eliminate population two. Continue in this manner until stopping, denoting the number of observations on each population by $(N_1 \le N_2 \le \cdots \le N_k) = N$. The total sample size is $T = N_1 + \cdots + N_k$.

Note that by choosing $b^2 = \infty$ $(\alpha = 0)$ we obtain $N_1 = \cdots = N_k = N_B(k)$, so that Blumenthal's stopping time is a special case of ours. The benefits of this class of stopping times are discussed in the next section. For notational convenience, we limit ourselves to the special case $k = 2$, but the proofs are structured so as to extend immediately; in order to indicate the precise nature of the extension, we make no use of facts which hold only for $k = 2$.

2. Asymptotic Distributions

For $k = 2$, the limit distributions of $N$ and $T$ are basic functions of $N_B$ and $M$. We assume throughout this section that $\Delta \sim r^\delta$ for some $\delta \ge 0$
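To make the elimination rule concrete, here is a small Python sketch (ours, not the paper's), assuming the boundary $g(n,b) = (2(b^2 + \log n)/n)^{1/2}$ and the asymptotic choice $b^2 \approx 2\log((k-1)/\alpha)$; `elimination_times` simply records the first $n$ at which each population trails the current sample leader by at least $g(n,b)$.

```python
import math, random

def g(n, b):
    # elimination boundary: g(n, b) = sqrt(2 * (b^2 + log n) / n)
    return math.sqrt(2.0 * (b * b + math.log(n)) / n)

def elimination_times(means, b, n_max=10000, rng=random):
    """First n at which each population falls behind the current sample
    leader by at least g(n, b); math.inf if never within n_max steps."""
    k = len(means)
    sums = [0.0] * k
    M = [math.inf] * k
    for n in range(1, n_max + 1):
        for i in range(k):
            sums[i] += rng.gauss(means[i], 1.0)  # unit variance, as in the paper
        xbar = [s / n for s in sums]
        leader = max(xbar)
        for i in range(k):
            if M[i] == math.inf and leader - xbar[i] >= g(n, b):
                M[i] = n
    return M

random.seed(1)
# alpha = 0.05, k = 3, so b^2 ~ 2 * log((k-1)/alpha)
b = math.sqrt(2.0 * math.log(2 / 0.05))
print(elimination_times([0.0, 0.0, 1.0], b))
```

With $\alpha = 0$ (so $b = \infty$) nothing is ever eliminated, recovering Blumenthal's procedure as the special case noted above.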
and $\Delta r^{-1/2} \to c_0$ $(0 \le c_0 \le \infty)$. Let $W$ be Brownian motion with mean zero and variance $2t$ at time $t$, and define

(1) $W^*(s,t,c_0) = s^{1/2}(c_0 + t^{-1}W(t))$ if $c_0 < \infty$, while $W^*(s,t,\infty) = \infty$.

Consider $0 < a < b < \infty$. Let $[\cdot]$ denote the greatest integer function, and define $G_r(s,t) = [s/r]^{1/2} \hat\Delta_{[t/r]}$, which is a stochastic process on the multidimensional time parameter space $D_2[a,b]$ (Bickel and Wichura (1971), Billingsley (1968)). Assuming that $\theta_1 < \theta_2$, we see that

(2) $\hat\Delta_n = \Delta + (\bar X_{2n} - \bar X_{1n} - \Delta)$ with probability tending to one,

and, since $\bar X_{2n} - \bar X_{1n} - \Delta$ is an average of mean zero normal random variables, (2) tells us that on $D_2[a,b]$,

(3) $G_r \Rightarrow W^*(\cdot,\cdot,c_0)$ $(\delta \ge 1/2)$, $\quad G_r \Rightarrow \infty$ $(\delta < 1/2)$,

where "$\Rightarrow$" denotes weak convergence. Let $H_{\min} = \min_x H^2(x)$. Thus we obtain

Lemma 1: For $\delta < 1/2$, $rN_B \to_p 1$. For $\delta \ge 1/2$,

$\Pr\{rN_B > u\} \to \Pr\{\min_{H_{\min} \le s \le t \le u} (H^2(W^*(s,t,c_0)) - s) > 0\} = G^*(u).$

Further, $G^*$ is a proper distribution function.

Proof of Lemma 1: By definition,

(4) $\Pr\{rN_B > u\} = \Pr\{n(m) > m$ for all $H_{\min} \le rm \le u\} = \Pr\{\min\{H^2(k^{1/2}\hat\Delta_m) - rk : H_{\min} \le rk \le rm \le u\} > 0\}.$

For $\delta < 1/2$, $\min\{k^{1/2}\hat\Delta_m : H_{\min} \le rk \le rm \le u\} \to_p \infty$, so that $rN_B \to_p 1$ since $H(x) \to 1$ as $x \to \infty$. For $\delta \ge 1/2$, (3) and (4) show that $\Pr\{rN_B > u\} \to \Pr\{\min_{H_{\min} \le s \le t \le u} (H^2(W^*(s,t,c_0)) - s) > 0\} = G^*(u)$.

The following result (Carroll (1976)) delineates the behavior of $M$ for a particular choice of $b^2$.

Lemma 2: If $b^{-2}\log \Delta^{-1} \to 0$, then $\Delta^2 M / b^2 \to_p 1$. Thus, if $r^{1-2\delta_0} b^2 \to 1$ for some $0 < \delta_0 < 1/2$, then

$rM \to_p 0$ (if $\delta_0 > \delta$), $\quad rM \to_p 1$ (if $\delta_0 = \delta$), $\quad rM \to_p \infty$ (if $\delta_0 < \delta$).

Now, letting $T$ be the total sample size using $b^2 = r^{2\delta_0 - 1}$ and $T_B = 2N_B$ the total sample size of the Blumenthal procedure, we see that

Lemma 3: $T/T_B \to_p 1/2$ if $\delta < \delta_0$, while $T/T_B \to_p 1$ if $\delta \ge \delta_0$.

Remark: Lemma 3 is easily extended to the case of general $k$ as follows. Let $T_B = kN_B$ be the total sample size of the Blumenthal procedure, set $b^2 = r^{2\delta_0 - 1}$, and let $\Delta_i \sim r^{\delta_i}$ $(i = 1,\ldots,k-1)$. Let $p$ be the number of $\delta_i < \delta_0$, i.e., $p$ is the number of populations furnishing little information about $\theta_{[k]}$. Then $T/T_B \to_p 1 - p/k$. In other words, the elimination can result in a significant savings in sample size.
3. Asymptotic MSE

Our goal is to find an estimate $\theta^*_N$ of $\theta_{[2]}$ for which the following mean square consistency result holds:

(5) $r^{-1} E(\theta^*_N - \theta_{[2]})^2$ converges as $r \to 0$.

In the proof given below, we are forced to make the convention that for a small constant $a > 0$, at least $ar^{-1}$ observations are taken from each population. Suppose that upon stopping, $N_i$ observations have been taken on the $i$th population $(i = 1,2)$. Our estimate of $\theta_{[2]}$ is taken to be $\theta^*_N = \max(\bar X_{1N_1}, \bar X_{2N_2})$. This estimate does allow the possibility of estimating $\theta_{[2]}$ by the mean of an eliminated population, but the nature of the elimination shows this possibility to be quite unlikely.

Lemma 4: Let $b^2 = r^{2\delta_0 - 1}$ and $\Delta \sim r^\delta$, with $0 \le \delta$ and $0 < \delta_0 < 1/2$. Then (5) holds.

Proof of Lemma 4: By Bickel and Yahav (1968), it suffices to show that $r^{-1}(\theta^*_N - \theta_{[2]})^2$ has a limit distribution and that for some $r_0 > 0$,

(6) $\sum_{m=1}^\infty \sup_{0 < r < r_0} \Pr\{r^{-1}(\theta^*_N - \theta_{[2]})^2 > m\} < \infty.$

Now, set $\theta_1 < \theta_2$ without loss of generality. The event in (6) is contained in the union of the events $\{|\bar X_{iN_i} - \theta_i| > (mr)^{1/2}\}$ $(i = 1,2)$. Now, by the maximal inequality of reverse martingales (Doob (1953), pp. 317-318) and the fact that $\min(N_1, N_2) \ge ar^{-1}$,

$\Pr\{|\bar X_{2N_2} - \theta_2| > (mr)^{1/2}\} \le \Pr\{\sup_{k \ge ar^{-1}} |\bar X_{2k} - \theta_2| > (mr)^{1/2}\} \le \exp(-c_0 m)$

for some $c_0 > 0$. This verifies (6).

To show that $r^{-1}(\theta^*_N - \theta_{[2]})^2$ has a limit distribution, first note that $rN_i$ converges in probability to a constant $(i = 1,2)$, so by an extension of Anscombe's (1952) Theorem 1, the vector $(r^{-1/2}(\bar X_{1N_1} - \theta_1),\; r^{-1/2}(\bar X_{2N_2} - \theta_2))$ converges in law to a jointly normal random vector. We complete the proof by noting that

$\Pr\{r^{-1/2}(\theta^*_N - \theta_2) \le z\} = \Pr\{r^{-1/2}(\bar X_{2N_2} - \theta_2) \le z$ and $\bar X_{2N_2} \ge \bar X_{1N_1}\} + \Pr\{r^{-1/2}(\bar X_{1N_1} - \theta_1) \le z + r^{-1/2}(\theta_2 - \theta_1)$ and $\bar X_{1N_1} > \bar X_{2N_2}\}.$

An alternate definition of $\theta^*_N$ takes the maximum of the sample means only if elimination has not occurred; we have not been able to verify (6) in this case. Note that by choosing $b^2 = \infty$, the proof of Lemma 4 shows that the Blumenthal procedure satisfies (5). The situation for $\Delta \sim r^{1/2}$ is considerably more complicated.
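As a numerical companion to Lemma 4 (our sketch, not from the paper), the MSE of the max-of-means estimate of $\theta_{[2]}$ for $k = 2$ can be approximated by Monte Carlo with fixed, non-sequential sample sizes; the function name and all constants below are ours.

```python
import math, random

def mse_of_max(theta1, theta2, n, reps=20000, seed=0):
    """Monte Carlo MSE of estimating theta[2] = max(theta1, theta2) by
    max(X1bar, X2bar), each mean based on n unit-variance observations."""
    rng = random.Random(seed)
    top = max(theta1, theta2)
    s = 0.0
    for _ in range(reps):
        x1 = rng.gauss(theta1, 1.0 / math.sqrt(n))  # X1bar ~ N(theta1, 1/n)
        x2 = rng.gauss(theta2, 1.0 / math.sqrt(n))  # X2bar ~ N(theta2, 1/n)
        s += (max(x1, x2) - top) ** 2
    return s / reps
```

With $\theta_1 = \theta_2$ the product $n \times \mathrm{MSE}$ is close to 1, the single-population rate, and the MSE shrinks at rate $1/n$ as the consistency result suggests.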
Define $m = [t/r]$, assume that $\theta_1 < \theta_2$, and let

(7) $Y_r(t) = m^{1/2}(\max(\bar X_{1m}, \bar X_{2m}) - \theta_2) = \max(m^{1/2}(\bar X_{1m} - \theta_1) - m^{1/2}(\theta_2 - \theta_1),\; m^{1/2}(\bar X_{2m} - \theta_2)).$

This is a stochastic process on $D[1/2,2]$ which, for $\Delta \sim r^\delta$ $(\delta \ge 1/2)$, converges weakly to a process $Y$ in $C[1/2,2]$. We also know that $rN_B$ converges in law to a random variable (call it $W_1$) with distribution function as in Lemma 1. Define a process $Y^*(t) = W_1^{-1/2}\, Y(tW_1)$.

Lemma 5: Let $b^2 = r^{2\delta_0 - 1}$ and $\Delta \sim r^\delta$ with $0 < \delta_0 < 1/2 \le \delta < \infty$. Then $E(Y^*(1))^2$ exists and $r^{-1} E(\theta^*_N - \theta_{[2]})^2 \to E(Y^*(1))^2$.

Proof of Lemma 5: By Lemma 4, since (6) holds, it suffices to show that $r^{-1/2}(\theta^*_N - \theta_{[2]})$ converges in distribution to $Y^*(1)$. By Lemma 3, $N_i/N_B \to_p 1$, so we may take $\theta^*_N = \max(\bar X_{1N_B}, \bar X_{2N_B})$. We first show that on $D_2[1/2,2] \times D_2[1/2,2] \times R$,

(8) $(Y_r^{(1)}, Y_r^{(2)}, rN_B) \Rightarrow (Y^{(1)}, Y^{(2)}, W_1).$

Here, for $m_1 = [s/r]$ and $m_2 = [t/r]$,

$Y_r^{(j)}(s,t) = m_1^{1/2}(\bar X_{j m_2} - \theta_j) \quad (j = 1,2),$

and $Y^{(j)}$ is the weak limit of $Y_r^{(j)}$. Because $Y_r^{(1)}$ and $Y_r^{(2)}$ are tight, (8) will follow if we can prove the convergence of the finite dimensional distributions. To do this, define two processes:

$Y_r^{(3)}(s,t) = H^2(|Y_r^{(2)}(s,t) - Y_r^{(1)}(s,t) + [s/r]^{1/2}(\theta_2 - \theta_1)|) - [s/r]r,$

$Z_r(u) = Z_r(u, H_{\min}) = \inf\{Y_r^{(3)}(s,t) : H_{\min} \le s \le t \le u\}.$

If $u < H_{\min}$, define $Z_r(u) = Z_r(H_{\min})$.
By the continuous mapping theorem, since $H^2$ is continuous and both $Y_r^{(1)}$ and $Y_r^{(2)}$ have weak limits in $C_2[1/2,2]$, it follows that $Y_r^{(3)}$ and $Z_r$ have weak limits (call them $Y^{(3)}$, $Z$) in $C_2[1/2,2]$ and $C[1/2,2]$, and on $D_2[1/2,2] \times D_2[1/2,2] \times D[1/2,2]$,

(9) $(Y_r^{(1)}, Y_r^{(2)}, Z_r) \Rightarrow (Y^{(1)}, Y^{(2)}, Z).$

To check the convergence of the finite dimensional distributions in (8), we consider only a special case; note that

(10) $\Pr\{Y_r^{(1)}(s,t) \le u_1,\; Y_r^{(2)}(s,t) \le u_2,\; rN_B \le u_3\} = \Pr\{Y_r^{(1)}(s,t) \le u_1,\; Y_r^{(2)}(s,t) \le u_2\} - \Pr\{Y_r^{(1)}(s,t) \le u_1,\; Y_r^{(2)}(s,t) \le u_2,\; Z_r(u_3) > 0\}.$

Since the first term on the right hand side of (10) has a limit, equation (9) and Theorem 2.1 of Billingsley (1968) prove the weak convergence of the finite dimensional distributions.

We next apply a modification of Theorem 17.2 of Billingsley, replacing his equation (17.18) by our (8) and remembering in the proof that $1/2 \le rN_B, W_1 \le 2$ with probability one. To see this, define $\phi_r(t) = t\, rN_B$ and $\phi_0(t) = tW_1$, and note that (8) implies that on $D[1/2,2] \times D[1/2,2] \times R$, $(Y_r^{(1)}, Y_r^{(2)}, \phi_r) \Rightarrow (Y^{(1)}, Y^{(2)}, \phi_0)$. Then by the continuous mapping theorem, $Y_r \circ \phi_r \Rightarrow Y \circ \phi_0$, where "$\circ$" denotes composition. Since $r^{-1/2}(\theta^*_N - \theta_{[2]}) = (rN_B)^{-1/2}(Y_r \circ \phi_r)(1)$, the proof is complete.

4. Conclusions

In this note we have defined a class of stopping times for estimating the largest of $k$ normal means. This class includes a procedure due to Blumenthal but, by building in an elimination feature, allows the possibility of significant savings in sample size. We have obtained the asymptotic behavior of the stopping times, showing that they are related to stopping times for a function of Brownian motion. Finally, we define an estimator with asymptotic MSE proportional to $r$.

References

Anscombe, F. J. (1952). Large sample theory of sequential estimation. Proc. Camb. Phil. Soc. 48, 600-607.

Bickel, P. J. and Wichura, M. J. (1971). Convergence criteria for multiparameter stochastic processes and some applications. Ann. Math. Statist. 42, 1656-1670.

Bickel, P. J. and Yahav, J. A. (1968).
Asymptotically optimal Bayes and minimax procedures in sequential estimation. Ann. Math. Statist. 39, 442-456.

Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York.

Blumenthal, S. (1976). Sequential estimation of the largest normal mean when the variance is known. Ann. Statist. 4, 1077-1087.

Carroll, R. J. (1976). On sequential elimination procedures. Institute of Statistics Mimeo Series No. 1078, University of North Carolina at Chapel Hill.

Doob, J. L. (1953). Stochastic Processes. Wiley, New York.

Robbins, H. (1970). Statistical methods related to the law of the iterated logarithm. Ann. Math. Statist. 41, 1397-1409.

Swanepoel, J. W. H. and Geertsema, J. C. (1976). Sequential procedures with elimination for selecting the best of k normal populations. S. Afr. Statist. J. 10, 9-36.