Carroll, Raymond J. (1976). "On the uniformity of sequential ranking procedures."

On the Uniformity of Sequential Ranking Procedures*
by
Raymond J. Carroll
Department of Statistics
University of North Carolina at Chapel Hill
May 9 1976
Institute of Statistics Mimeo Series No. 1067
* This research was supported in part by the Office of Naval Research Contract N00014-67-A-00014 and in part by the Air Force Office of Scientific Research under Grant No. AFOSR-75-2796.
SUMMARY
In the class of ranking procedures based on sequential confidence intervals of fixed length, the probability statements are not uniform in the scale parameter. Further, the available generalizations to rankings based on stochastic orderings are not uniform over the parameter space $\Omega$. Two methods are proposed for solving such problems; the first is based on the theory of weak convergence, while the second is more direct but only solves the problem when $\Omega$ is compact.
American Mathematical Society 1970 subject classifications: Primary 62L99; Secondary 62F07.
Keywords and phrases: Ranking and Selection, Sequential Confidence Intervals, Uniformity in Weak Convergence.
1. Introduction
In a certain class of sequential ranking and selection procedures based on fixed-length confidence intervals (Geertsema (1972), Chow and Robbins (1965)), the standard work assumes an unknown d.f. $F_{\mu,\sigma}(x) = F(\sigma^{-1}(x-\mu))$. It then provides, for each fixed $F$ and fixed $k$-vector $\sigma$, a statement

$$\liminf_{d\to 0}\ \inf_{\mu\in\Omega(d)} \Pr\{CS \mid \mu,\sigma\} = P^*,$$

where $d$ is the length of the preference zone $\Omega(d)$, the infimum is taken over $\Omega(d)$, and CS denotes a correct selection. However, the main historical concern in ranking theory has been lower bounds for $\Pr\{CS \mid \mu,\sigma\}$ that are uniform in both $\Omega(d)$ and certain sets of nuisance parameters. In fact, therefore, one requires

$$\liminf_{d\to 0}\ \inf_{\mu,\sigma} \Pr\{CS \mid \mu,\sigma\} = P^*. \tag{1}$$
In the general problem, the distributions take the form $F_{\mu,\sigma}(x) = F_\sigma(x-\mu)$, where $\sigma$ ranges over an indexing set $\Gamma$ (taken here as a subset of $[0,\infty)$). Taking $\mu = 0$, this example includes the case of stochastic orderings, where the ranking might be with respect to $EX$.
Although important applications of our results are in ranking theory, the following reduction holds. Apply the sequential rule defined below (with $b$ chosen so that $\int \Phi^{k-1}(x+b)\,d\Phi(x) = P^*$) to each of the $k$ independent populations, and choose the population with the largest observed value of the ranking statistic $T_{n\theta}$. Then, to show (1), it is sufficient to show (4) below, i.e., to show the uniformity of the corresponding sequential fixed-width confidence interval. The problem of uniform bounds for the vast number of such confidence intervals is itself an important and neglected area. The usual results show only that, under $F_\sigma(x-\mu)$,

$$\inf_{\sigma}\ \lim_{d\to 0} \Pr\{\text{correct coverage}\} \ge P^*. \tag{2}$$
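The constant $b$ in the ranking rule is defined only implicitly. The following minimal Python sketch (standard library only) solves $\int \Phi^{k-1}(x+b)\,d\Phi(x) = P^*$ for $b$ numerically. It assumes that truncating the integral to $[-10, 10]$ and applying Simpson's rule is accurate enough; the names `pcs` and `solve_b` are hypothetical. Bisection is justified because the integral increases in $b$, from $1/k$ at $b = 0$ towards 1, so a root with $b \ge 0$ exists whenever $1/k \le P^* < 1$.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard normal d.f. Phi(x), computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def std_normal_pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def pcs(b: float, k: int, lo: float = -10.0, hi: float = 10.0, steps: int = 4000) -> float:
    """Approximate int Phi^{k-1}(x + b) dPhi(x) by composite Simpson's rule on [lo, hi].

    `steps` must be even; the tails outside [lo, hi] are ignored (an assumption).
    """
    h = (hi - lo) / steps
    total = 0.0
    for j in range(steps + 1):
        x = lo + j * h
        weight = 1 if j in (0, steps) else (4 if j % 2 == 1 else 2)
        total += weight * std_normal_cdf(x + b) ** (k - 1) * std_normal_pdf(x)
    return total * h / 3.0


def solve_b(k: int, p_star: float, tol: float = 1e-8) -> float:
    """Bisection on [0, 10] for b with pcs(b, k) = p_star (pcs is increasing in b)."""
    lo, hi = 0.0, 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pcs(mid, k) < p_star:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For $k = 2$ the integral equals $\Phi(b/\sqrt{2})$, which gives an easy check: `solve_b(2, std_normal_cdf(1.0))` should return approximately $\sqrt{2}$.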
The object of this work, then, is to find sequential selection and confidence interval procedures, and conditions under which (1) and (4) hold for all compact indexing sets $\Gamma$. Section 2 gives a counterexample showing that compactness alone is not sufficient to guarantee (1) and (4); the problem is that a certain stochastic process in $D[0,1]$ (Billingsley (1968)) fails to be tight. This leads, in Section 3 (see Theorem 1), to a weak convergence theory that guarantees (1) and (4) in the sample means case. In Section 4, two examples are given which illustrate the use of Theorem 1. Finally, in Section 5, a generalization of Theorem 1 is presented.
Tong (1971) considered a problem similar to (1) in the fixed sample size case, making use of results on the uniformity of convergence (in $\sigma$) in the Central Limit Theorem (Parzen (1954)). These results are not applicable in our case since the number of observations is random; hence, this paper might also be considered a slight extension of Parzen's work to the sequential case.
To set the notation for our solution to the confidence interval and ranking problems, we assume there are i.i.d. observations $X_1, X_2, \ldots$ from a d.f. $F_\theta(x)$. For each $n$, one defines a location-scale equivariant statistic $T_{n\theta}$ such that, under $F_\theta$, $\sqrt{n}\,(T_{n\theta} - \mu(\theta))/h(\theta) \to N(0,1)$ in distribution; the ranking is with respect to $\mu(\theta)$. By translation invariance, one may assume $\mu(\theta) \equiv 0$. Hence $\theta$ may be considered a scalar, and $h(\theta)$ will satisfy a Lipschitz condition. For the rest of this paper, assume for convenience that $h(\theta) \equiv \theta$. One also defines a sequence $\hat\sigma^2_{n\theta}$ of location invariant, scale equivariant statistics for which $\hat\sigma^2_{n\theta} \to \theta^2$ almost surely. Finally, for $\sigma^* > 0$, define

$$\sigma(\theta) = \max\{\sigma^*, \theta\}, \qquad s^2_{n\theta} = \max\{\sigma^{*2}, \hat\sigma^2_{n\theta}\},$$

$$N_d(\theta) = \inf\{n \ge 5 : n \ge (b\,s_{n\theta}/d)^2\}. \tag{3}$$
Although $N_d(\theta)$ is slightly different from Chow and Robbins' (1965) rule, choosing $\sigma^*$ very small results in no great change. The problem is to show, for all compact $\Gamma$ in $[0,\infty)$, that

$$\liminf_{d\to 0}\ \inf_{\theta\in\Gamma} \Pr\{|T_{N_d(\theta)\theta}| \le d\} = 2\Phi(b) - 1, \tag{4}$$

where $\Phi$ is the standard normal d.f. Note that $\theta$ need not be a scale parameter and $F_\theta$ need not be known.
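Rule (3) and the probability in (4) can be explored by simulation. The Python sketch below is illustrative only. It assumes $F_\theta$ is the $N(0,\theta^2)$ d.f. (so $h(\theta) = \theta$), and it takes $T_{n\theta}$ to be the sample mean and $\hat\sigma^2_{n\theta}$ the unbiased sample variance (the sample means case of Section 3). It estimates the coverage for one fixed $\theta$, so it illustrates the pointwise behaviour, not the uniformity over $\Gamma$ that (4) requires. The function names are hypothetical.

```python
import random


def stopping_rule_trial(theta: float, d: float, b: float, sigma_star: float,
                        rng: random.Random) -> tuple[int, bool]:
    """One run of rule (3) on N(0, theta^2) data (an illustrative assumption).

    Returns (N_d(theta), covered), where covered means |T_N| <= d and T_n is the
    sample mean. The sample variance is updated with Welford's method.
    """
    n, mean, m2 = 0, 0.0, 0.0
    while True:
        x = rng.gauss(0.0, theta)
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n >= 5:
            s2 = max(sigma_star ** 2, m2 / (n - 1))  # s^2_{n theta}
            if n >= b * b * s2 / (d * d):            # n >= (b s_{n theta} / d)^2
                return n, abs(mean) <= d


def coverage(theta: float, d: float, b: float, sigma_star: float,
             reps: int, seed: int = 0) -> tuple[float, float]:
    """Monte Carlo estimates of Pr{|T_N| <= d} and of E N_d(theta)."""
    rng = random.Random(seed)
    hits, total_n = 0, 0
    for _ in range(reps):
        n, covered = stopping_rule_trial(theta, d, b, sigma_star, rng)
        hits += covered
        total_n += n
    return hits / reps, total_n / reps
```

With $\theta = 1$, $d = 0.2$, $b = 1.96$ and $\sigma^* = 0.1$, the average stopping time should be close to $(b\theta/d)^2 \approx 96$, and the estimated coverage should be close to $2\Phi(1.96) - 1 \approx 0.95$.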
2. An Example
The following example (which can easily be extended to the ranking problem) shows that, even if $T_{n\theta}$ is the sample mean and $\hat\sigma^2_{n\theta}$ the sample variance, it is possible for $\Gamma = [0,1]$ that (4) fails but (2) does not. Let $U_1, U_2, \ldots$ be i.i.d. uniform (0,1) random variables.
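For $0 \le \theta \le 1$, define $\alpha(\theta) = (\tfrac{1}{2})^{(1-\theta)/\theta}$. With $I_A$ the indicator of the event $A$, let

$$X_i(\theta) = \left(\frac{\theta}{1-\theta}\right)^{1/2} \big(1-\alpha(\theta)\big)^{-1} I_{\{U_i > \alpha(\theta)\}}.$$

Thus, for $\theta < 1$,

$$\mu(\theta) = EX_1(\theta) = \left(\frac{\theta}{1-\theta}\right)^{1/2}, \qquad \sigma^2(\theta) = \operatorname{Var} X_1(\theta) = \frac{\theta}{1-\theta}\left(2^{(1-\theta)/\theta} - 1\right)^{-1},$$

while for $\theta = 1$, $\mu(\theta) = \sigma^2(\theta) = 0$. Then

$$N_d(\theta)^{-1} \sum_{i=1}^{N_d(\theta)} \big(X_i(\theta) - \mu(\theta)\big)$$

is a stochastic process in $D[0,1]$. However, choosing $\theta(d) = d^{-4}/(1+d^{-4})$, we obtain

$$\limsup_{d\to 0}\ \sup_{0\le\theta\le 1} \Pr\Big\{\Big|N_d(\theta)^{-1}\sum_{i=1}^{N_d(\theta)} \big(X_i(\theta)-\mu(\theta)\big)\Big| \ge d\Big\} \ge \lim_{d\to 0} \Pr\{X_k(\theta(d)) = 0,\ k = 1,\ldots,(b\sigma^*/d)^2\} = \lim_{d\to 0} \big(2^{-d^4}\big)^{(b\sigma^*/d)^2} = 1.$$

The inequality holds because $X_k(\theta(d)) = 0$ for $k = 1, \ldots, (b\sigma^*/d)^2$ implies $N_d(\theta(d)) = (b\sigma^*/d)^2$, while $\mu(\theta(d)) = d^{-2} \ge d$. On the other hand, from Anscombe (1952), it is clear that (2) holds.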
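The quantities in this counterexample are explicit, so they can be checked numerically. The sketch below evaluates $\mu(\theta)$ and $\sigma^2(\theta)$. It also computes the probability that the first $\lceil (b\sigma^*/d)^2 \rceil$ observations are all zero at $\theta(d)$, using $\alpha(\theta(d)) = 2^{-d^4}$. The function names are hypothetical, and $b = 1.96$, $\sigma^* = 1$ in the usage note are arbitrary illustrative choices.

```python
import math
import random


def alpha(theta: float) -> float:
    """alpha(theta) = (1/2)^((1 - theta)/theta), for 0 < theta < 1."""
    return 0.5 ** ((1.0 - theta) / theta)


def mu(theta: float) -> float:
    """mu(theta) = E X_1(theta) = (theta / (1 - theta))^(1/2), for theta < 1."""
    return math.sqrt(theta / (1.0 - theta))


def variance(theta: float) -> float:
    """sigma^2(theta) = theta/(1 - theta) * (2^((1 - theta)/theta) - 1)^(-1)."""
    return (theta / (1.0 - theta)) / (2.0 ** ((1.0 - theta) / theta) - 1.0)


def theta_of_d(d: float) -> float:
    """theta(d) = d^{-4} / (1 + d^{-4}), so that (1 - theta)/theta = d^4."""
    return d ** -4 / (1.0 + d ** -4)


def sample_x(theta: float, rng: random.Random) -> float:
    """One draw of X_i(theta) = mu(theta) (1 - alpha)^{-1} I{U_i > alpha}."""
    a = alpha(theta)
    return mu(theta) / (1.0 - a) if rng.random() > a else 0.0


def prob_all_zero(d: float, b: float, sigma_star: float) -> float:
    """Pr{X_k(theta(d)) = 0 for k = 1..n} = alpha^n, with n = ceil((b sigma*/d)^2)."""
    n = math.ceil((b * sigma_star / d) ** 2)
    return alpha(theta_of_d(d)) ** n
```

At $d = 0.1$ this gives $\mu(\theta(d)) = 100$ and an all-zero probability of about 0.97, and the probability increases as $d$ decreases. On that event $N_d = 385$ and the sample mean is 0, so the interval $0 \pm d$ misses $\mu(\theta(d))$.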
3. Weak Convergence
Suppose the distributions $F_\theta$ have inverses $G_\theta$ which are continuous in $\theta$. The crucial idea is that, if $X_1, X_2, \ldots$ are i.i.d. uniform (0,1) random variables, the statistics $T_{n\theta} = T_n(G_\theta(X_1), \ldots, G_\theta(X_n))$ form a stochastic process in $\theta$. To exploit this idea, the first example considered is the sample mean $T_{n\theta} = n^{-1}\sum_{i=1}^n G_\theta(X_i)$, with $\hat\sigma^2_{n\theta}$ the sample variance. Consider the following assumptions:
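(A1) On each finite interval of $\theta$, $\int_0^1 G_\theta^4(x)\,dx$ is bounded, and there exist positive constants $M_1$, $\alpha_0$ such that

$$\int_0^1 \big(G_{\theta_1}(x) - G_{\theta_2}(x)\big)^4\,dx \le M_1\,|\theta_2 - \theta_1|^{1+\alpha_0}.$$

(A2) For arbitrary positive constants $\varepsilon$, $M$, […]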
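The sketch below makes the coupling concrete. Purely for illustration, it assumes the normal scale family $F_\theta(x) = \Phi(x/\theta)$, so that $G_\theta(u) = \theta\,\Phi^{-1}(u)$. Because every $\theta$ uses the same uniforms $X_i$, $\theta \mapsto T_{n\theta}$ is a single random function, here linear in $\theta$. The integral in (A1) equals $|\theta_1-\theta_2|^4 \int x^4\,d\Phi(x) = 3|\theta_1-\theta_2|^4$, so (A1) holds with $\alpha_0 = 3$ and $M_1 = 3$. The code uses Python's `statistics.NormalDist`; the other names are hypothetical.

```python
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal; STD.inv_cdf is Phi^{-1}


def G(theta: float, u: float) -> float:
    """Quantile function G_theta(u) of the illustrative family F_theta(x) = Phi(x / theta)."""
    return theta * STD.inv_cdf(u)


def open_uniforms(n: int, seed: int = 0) -> list[float]:
    """n i.i.d. uniform (0,1) draws; 0.0 is rejected because inv_cdf needs 0 < u < 1."""
    rng = random.Random(seed)
    out: list[float] = []
    while len(out) < n:
        u = rng.random()
        if u > 0.0:
            out.append(u)
    return out


def sample_mean_process(thetas, uniforms):
    """T_{n theta} = n^{-1} sum_i G_theta(X_i), using the SAME X_i for every theta."""
    n = len(uniforms)
    return [sum(G(t, u) for u in uniforms) / n for t in thetas]


def fourth_moment_gap(theta1: float, theta2: float, grid: int = 100_000) -> float:
    """Midpoint-rule approximation of int_0^1 (G_theta1(x) - G_theta2(x))^4 dx."""
    h = 1.0 / grid
    total = 0.0
    for j in range(grid):
        x = (j + 0.5) * h
        total += (G(theta1, x) - G(theta2, x)) ** 4
    return total * h
```

For example, `fourth_moment_gap(1.0, 1.5)` should be close to $3 \cdot 0.5^4 = 0.1875$.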
Note that (A1) and (A2) hold in the important special case of scale orderings ($F_{\mu,\theta}(x) = F((x-\mu)/\theta)$) as long as $\int x^4\,dF(x) < \infty$. The key is the following lemma.
Lemma 1. Suppose $V_n \Rightarrow V$ in $D[0,\infty)$, where $V(t)$ is normally distributed with mean zero and variance at most $M_2$, a finite constant. Suppose $\Pr\{V \in C[0,\infty)\} = 1$. Then, for any compact set $\Gamma$,

$$\limsup_{n\to\infty}\ \sup_{t\in\Gamma} \big|\Pr\{|V_n(t)| \le b\} - \Pr\{|V(t)| \le b\}\big| = 0.$$

Proof: Let $A_t = \{x \in D[0,\infty) : |x(t)| \le b\}$ and $\mathcal{A} = \{A_t : t \in \Gamma\}$. By Theorem 3 of Topsøe (1967), we have to show that, if $\varepsilon_n \downarrow 0$ and $t_n \in \Gamma$, then

$$P_V\Big(\bigcap_{n=1}^{\infty} (\partial A_{t_n})^{\varepsilon_n}\Big) = 0. \tag{5}$$

Assume $\mathcal{A}$ is a $V$-continuity class. Then, since $\{t_n\}$ has a limit point, (5) follows by the method of proof of Topsøe's Theorem 8. To verify that $\mathcal{A}$ is a $V$-continuity class, one must show $P_V(\partial A_t) = 0$ for each $t$. But

$$P_V(\partial A_t) = P_V(\partial A_t \cap C) = P_V\{x \in C : |x(t)| = b\} = 0.$$
Theorem 1. Under (A1) and (A2), if $\Gamma$ is compact, (4) and hence (1) hold.
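Proof: We freely use the notation and results of Bickel and Wichura (1971), hereafter denoted B-W. Fix $M^* > 0$ and divide $0 \le \theta \le M^*$ into intervals $[0, a_{1n}), [a_{1n}, a_{2n}), \ldots, [a_{kn}, M^*]$, each of length at most $\exp(-n^2)$. If $a_{jn} \le \theta < a_{j+1,n}$, let

$$\hat\sigma^2_{n2}(\theta) = \sup\{\hat\sigma^2_{na} : a_{jn} \le a \le a_{j+1,n}\}, \qquad \hat\sigma^2_{n1}(\theta) = \inf\{\hat\sigma^2_{na} : a_{jn} \le a \le a_{j+1,n}\}.$$

Let $s_{n1}(\theta)$ and $s_{n2}(\theta)$ be formed from these exactly as $s_{n\theta}$ is formed from $\hat\sigma^2_{n\theta}$, and define

$$N_d^{(1)}(\theta) = \inf\{n \ge 5 : n \ge (b\,s_{n1}(\theta)/d)^2\}, \qquad N_d^{(2)}(\theta) = \inf\{n \ge 5 : n \ge (b\,s_{n2}(\theta)/d)^2\}.$$

Clearly $N_d^{(1)}(\theta) \le N_d(\theta) \le N_d^{(2)}(\theta)$, and the above rules are members of $D[0,\infty)$. Define $n_d^{(1)}(\theta)$ and $n_d^{(2)}(\theta)$ in a manner similar to the above. Let $D_2(M^*) = D([0,2] \times [0,M^*])$ and, for $0 \le t \le 2$ and $0 \le \theta \le M^*$, let $V_d(t,\theta)$ […], where $[\cdot]$ is the greatest integer function. Then $V_d$ is an element of $D_2(M^*)$ and, for fixed $t$, $V_d(t,\cdot)$ is an element of $C[0,M^*]$.

Let $B = (s,t] \times [\theta_1, \theta_2)$ be any block (B-W, page 1658), where $ns$ and $nt$ are integers. By (A1), the fourth moment of the increment of $V_d$ around $B$ is bounded by a constant multiple of

$$|\theta_2-\theta_1|^{1+\alpha_0}\{|t-s|/n + |t-s|^2\} \le 2\,|\theta_2-\theta_1|^{1+\alpha_0}\,|t-s|^2,$$

since $|t-s| \ge 1/n$. Hence, for any pair of blocks $B$, $C$, the Schwarz inequality shows […]. By the Corollary to Theorem 3 of B-W, this proves that $\{V_d\}$ is tight. By Theorems 2 and 4 of B-W, $V_d$ converges weakly to a Gaussian process $V$ in $D_2(M^*)$.

Now define $V_d^{(1)}$ […]. The processes $\{V_d^{(1)}\}$ are also in $D_2(M^*)$, and the sequence is tight. Indeed, suppose $(t_i, \theta_i)$ $(i = 1,2)$ lie in a rectangle $R$ of diameter $\delta > 0$. Then the points $\big(t_i\,n_d^{(2)}(\theta_i)/n_d^{(2)}(M^*), \theta_i\big)$ lie in a rectangle with the same center as $R$ but of diameter $c\delta$, for a fixed constant $c$ and $d$ sufficiently small.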
Here we have used the fact that $n_d^{(2)}(\theta)/n_d^{(2)}(M^*) \ge (\sigma^*/2M^*)^2$. Note that $n_d^{(2)}(\theta)$ was defined precisely to keep $V_d^{(1)}$ in $D_2$. Next define

$$V_d^{(2)}(t,\theta) = \left(\frac{n_d^{(2)}(M^*)\,\sigma^2(M^*)}{n_d^{(2)}(\theta)\,\sigma^2(\theta)}\right)^{1/2} V_d^{(1)}(t,\theta).$$

Then $V_d^{(2)}$ is tight in $D_2(M^*)$, and its finite-dimensional distributions converge to those of a Gaussian process $V$. For $t = 1$ and $\theta_1 < \theta_2$, the covariance function of $V$ is

$$\frac{\int G_{\theta_1}(x)\,G_{\theta_2}(x)\,dx}{\sigma^2(\theta_2)}.$$
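The covariance formula can be checked heuristically, under an assumption: this is a hypothetical reading of the construction, not a formula stated above. Suppose $V_d^{(2)}(1,\theta)$ is the normalized sum $\{n(\theta)\sigma^2(\theta)\}^{-1/2}\sum_{i=1}^{n(\theta)} G_\theta(X_i)$, with $n(\theta) = n_d^{(2)}(\theta)$ idealized as exactly proportional to $\sigma^2(\theta)$. Recall that $\int_0^1 G_\theta(x)\,dx = \mu(\theta) = 0$. Also, $\sigma(\theta) = \max\{\sigma^*, \theta\}$ is nondecreasing, so $\min\{n(\theta_1), n(\theta_2)\} = n(\theta_1)$ for $\theta_1 < \theta_2$. Then

$$
\begin{aligned}
\operatorname{Cov}\big(V_d^{(2)}(1,\theta_1),\,V_d^{(2)}(1,\theta_2)\big)
&= \frac{\min\{n(\theta_1), n(\theta_2)\}\int_0^1 G_{\theta_1}(x)\,G_{\theta_2}(x)\,dx}{\{n(\theta_1)\,n(\theta_2)\}^{1/2}\,\sigma(\theta_1)\,\sigma(\theta_2)} \\
&= \frac{\sigma(\theta_1)}{\sigma(\theta_2)} \cdot \frac{\int_0^1 G_{\theta_1}(x)\,G_{\theta_2}(x)\,dx}{\sigma(\theta_1)\,\sigma(\theta_2)}
= \frac{\int_0^1 G_{\theta_1}(x)\,G_{\theta_2}(x)\,dx}{\sigma^2(\theta_2)}.
\end{aligned}
$$

Setting $\theta_1 = \theta_2 = \theta$ gives the variance $\int G_\theta^2(x)\,dx/\sigma^2(\theta)$, which enters the variance of the increment below.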
Hence $V_d^{(2)}(1,\theta_2) - V_d^{(2)}(1,\theta_1)$ converges to a Gaussian random variable with mean zero and variance

$$\int \left\{\frac{G_{\theta_2}^2(x)}{\sigma^2(\theta_2)} + \frac{G_{\theta_1}^2(x)}{\sigma^2(\theta_1)} - \frac{2\,G_{\theta_1}(x)\,G_{\theta_2}(x)}{\sigma^2(\theta_2)}\right\} dx \le M_3\,|\theta_2-\theta_1|^{\varepsilon}$$

for some $\varepsilon > 0$. The inequality follows from (A1) and the fact that $|\sigma^2(\theta_2) - \sigma^2(\theta_1)| \le M_4|\theta_2-\theta_1|$. Hence, by Billingsley (1968, page 97), for fixed $t$, $V_d^{(2)}(t,\cdot)$ converges weakly to a Gaussian process on $C[0,M^*]$.
Now define $V_d^{(3)}(t,\theta) = V_d^{(2)}\big(t\,N_d^{(2)}(\theta)/n_d^{(2)}(\theta), \theta\big)$. Note that condition (A2) shows that, for $\varepsilon > 0$,

(6) […]

By (6) and the Kolmogorov inequality (see Anscombe (1952)), one shows that, for fixed $t$ and $\theta$,

$$V_d^{(2)}(t,\theta) - V_d^{(3)}(t,\theta) \to 0 \quad \text{(in probability)}.$$

By (6) and the argument used to prove the tightness of $V_d^{(1)}$, $V_d^{(3)}$ (and hence $V_d^{(2)} - V_d^{(3)}$) is tight. Hence, for fixed $t$, $V_d^{(3)}(t,\cdot)$ converges weakly to a Gaussian element of $C[0,M^*]$. We need the following result.
Proposition 1. Define $W_d$ […]. Then $W_d \to 0$ in probability.

Proof of Proposition 1: First note that (A2) implies

(7) […]

Let $\eta > 0$ be arbitrary. Then

$$|V_d^{(3)}(t,\theta) - V_d^{(3)}(1,\theta)| \le |V_d^{(3)}(1,\theta) - V_d^{(3)}(1-\eta,\theta)| + \min\big\{|V_d^{(3)}(t,\theta) - V_d^{(3)}(1,\theta)|,\ |V_d^{(3)}(1-\eta,\theta) - V_d^{(3)}(t,\theta)|\big\},$$

so that, by (7), with probability approaching one,

$$W_d \le \sup_\theta |V_d^{(3)}(1,\theta) - V_d^{(3)}(1-\eta,\theta)| + \sup_{1-\eta\le t\le 1} \min\Big\{\sup_\theta |V_d^{(3)}(t,\theta) - V_d^{(3)}(1,\theta)|,\ \sup_\theta |V_d^{(3)}(1-\eta,\theta) - V_d^{(3)}(t,\theta)|\Big\}. \tag{8}$$

By Chebyshev's inequality, the first term on the right-hand side of (8) converges in probability to zero as $\eta, d \to 0$. The second term is bounded by the modulus of continuity $w''_\eta$ of B-W, and hence converges to zero in probability as $\eta \to 0$.
Returning to the proof of Theorem 1, we see that the conclusion of Proposition 1 also holds for the process

$$V_d^{(4)}(t,\theta) = b\,\big(d\,N_d^{(1)}(\theta)\big)^{-1} \sum_{i=1}^{[t N_d^{(2)}(\theta)]} G_\theta(X_i),$$

since $b^2\sigma^2(\theta)\big(d^2 N_d^{(1)}(\theta)\big)^{-1} \to 1$ uniformly on $0 \le \theta \le M^*$. Thus, for all $\varepsilon > 0$ and $d$ sufficiently small,

$$\Pr\Big\{\Big|N_d(\theta)^{-1}\sum_{i=1}^{N_d(\theta)} G_\theta(X_i)\Big| \le d\Big\} = \Pr\Big\{\Big|\frac{b}{d\,N_d(\theta)}\sum_{i=1}^{N_d(\theta)} G_\theta(X_i)\Big| \le b\Big\} \ge \Pr\{|V_d^{(4)}(1,\theta)| \le b - \varepsilon\} - \varepsilon. \tag{9}$$

$V_d^{(4)}(1,\cdot)$ converges weakly on $D[0,M^*]$ to a Gaussian process (call it $V$) on $C[0,M^*]$, and $M^*$ is arbitrary. Hence the convergence holds on $D[0,\infty)$, to a Gaussian process $V$ which is in $C[0,\infty)$ with probability one. Thus, Lemma 1 (with $b$ replaced by $b - \varepsilon$) says that, as $d \to 0$, the last term in (9) is bounded uniformly in $\theta \in \Gamma$. Letting $\varepsilon \to 0$ completes the proof.
In the sample means case, (A1) and (A2) hold if $\int G_\theta^4(x)\,dx$ is bounded on each finite interval and there exists a function $H$ with finite fourth moment such that

$$|G_{\theta_1}(x) - G_{\theta_2}(x)| \le |\theta_2 - \theta_1|\,H(x).$$

4. Applications
Theorem 1 holds for many types of estimators; we illustrate two of them here (the sample median and a smooth M-estimate). The key idea is that Theorem 1 will hold when $T_{n\theta}$ can be expanded as a sample mean plus a uniformly small order term.

Definition. $\{T_{n\theta}\}$ converges to zero almost surely uniformly (denoted $T_{n\theta} \to 0$ (a.s.u.)) if, for all $C$,

$$\sup\{|T_{n\theta}| : 0 \le \theta \le C\} \to 0 \quad \text{(a.s.)}.$$

Example 4.1. Here we take $X_1, X_2, \ldots$ to be i.i.d. uniform (0,1) random variables and define $T_{n\theta}$ by $\sum_{i=1}^n \psi(G_\theta(X_i) - T_{n\theta}) = 0$. Here $\psi$ is bounded and nondecreasing with two bounded continuous derivatives. Further, $\psi'(x) > 0$ in a neighborhood of zero and $\psi'(x) = 0$ outside an interval $[-k,k]$. These $\psi$ functions include smoothed versions of the Huber M-estimate (Huber (1964)). $\hat\sigma^2_{n\theta}$ is defined by

$$\hat\sigma^2_{n\theta} = n^{-1}\sum_{i=1}^n \psi^2(G_\theta(X_i) - T_{n\theta})\,\Big\{n^{-1}\sum_{i=1}^n \psi'(G_\theta(X_i) - T_{n\theta})\Big\}^{-2}.$$

We again have $\{T_{n\theta}\}$ with "mean" zero; more precisely, $\int \psi(x)\,dF_\theta(x) = 0$. We make the following assumptions for every $C > 0$:

(B1) There exists $C^*$ such that […]

(B2) For every $\varepsilon > 0$ there exists $C_\varepsilon > 0$ such that […]

(B3) There exist positive constants $c_1$, $c_2$ such that […]

Condition (B1) is weaker than the sufficient condition for the sample means given at the end of Section 3, while (B2) and (B3) are quite reasonable.
Lemma 2. Under (B1)-(B3), the conclusion of Theorem 1 holds.
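Proof: We first show $|T_{n\theta}| \to 0$ (a.s.u.). By a Taylor expansion and the fact that $\psi'$ vanishes outside $[-k,k]$, for any $\varepsilon$ there exists $\eta_1 > 0$ such that

$$\Big|n^{-1}\sum_{i=1}^n \psi(G_\theta(X_i)-\varepsilon) - n^{-1}\sum_{i=1}^n \psi(G_{\theta_0}(X_i)-\varepsilon)\Big| \le \sup_x \psi'(x)\; n^{-1}\sum_{i=1}^n |G_\theta(X_i) - G_{\theta_0}(X_i)|\, I\{\eta_1 \le X_i \le 1-\eta_1\} \le M_1|\theta-\theta_0|, \tag{10}$$

where $0 \le \theta, \theta_0 \le C$ and $M_1$ depends only on $C$. Now choose the parameters $\theta_0$ from the grid $\{\theta_{in} = i/n^{1/2} : 0 \le i \le Cn^{1/2}\}$. By Theorem 1 of Hoeffding (1963) and the Borel-Cantelli Lemma,

$$\sup\Big\{\Big|n^{-1}\sum_{i=1}^n \psi(G_\theta(X_i)-\varepsilon) - \int \psi(x-\varepsilon)\,dF_\theta(x)\Big| : \theta \in \{\theta_{in}\}\Big\} \to 0 \quad \text{(a.s.)}. \tag{11}$$

Since […], (10) and (11) give $|T_{n\theta}| \to 0$ (a.s.u.).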
Again by a Taylor expansion, […], where $\xi_{\theta n}(X_i)$ lies between $G_\theta(X_i)$ and $G_\theta(X_i) - T_{n\theta}$. Since $\psi'(x) > 0$ in a neighborhood of zero and (B3) holds, there is a positive constant $\eta_2$ such that […] almost surely as $n \to \infty$. One now shows

$$n^{1/4}\,|T_{n\theta}| \to 0 \quad \text{(a.s.u.)} \tag{12}$$

by showing $n^{-3/4}\sum_{i=1}^n \psi(G_\theta(X_i)) \to 0$ (a.s.u.); this is accomplished by following the steps in (10) and (11).
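Now, using (12) and the fact that $\psi''$ is bounded, one more Taylor expansion shows

$$n^{1/2}\,\Big|T_{n\theta} - n^{-1}\sum_{i=1}^n \psi(G_\theta(X_i)) \Big/ \int \psi'(x)\,dF_\theta(x)\Big| \to 0 \quad \text{(a.s.u.)}. \tag{13}$$

Now define $\rho_\theta(x) = a_1(\theta)\psi^2(x) + a_2(\theta)\psi'(x) + a_3(\theta)\psi(x)$, where

$$a_1(\theta) = (E\psi'(X))^{-2}, \qquad a_2(\theta) = -2E\psi^2(X)\,(E\psi'(X))^{-3}, \qquad a_3(\theta) = -\big\{a_2(\theta)E\psi''(X) + 2a_1(\theta)E\psi(X)\psi'(X)\big\}(E\psi'(X))^{-1},$$

and the expectations are taken under $F_\theta$.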
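Using (13), Taylor's theorem, and the same type of arguments as in (10) and (11), one shows

$$n^{1/2}\,\Big|\hat\sigma^2_{n\theta} - h(\theta) - n^{-1}\sum_{i=1}^n \rho_\theta(G_\theta(X_i)) + \int \rho_\theta(x)\,dF_\theta(x)\Big| \to 0 \quad \text{(a.s.u.)}, \tag{14}$$

where

$$h(\theta) = \int \psi^2(x)\,dF_\theta(x)\,\Big\{\int \psi'(x)\,dF_\theta(x)\Big\}^{-2}$$

is clearly Lipschitz in $\theta$ on each subinterval $0 \le \theta \le M$.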
Now reconsider the proof of Theorem 1. One can redefine all the processes there in terms of $T_{n\theta}$. For example,

(15) […]

Because of (B1), (A1) holds for the sample means generated by $\psi(G_\theta(X))/E\psi'(X)$, while (14) shows that (A2) holds for $\hat\sigma^2_{n\theta}$. Because of (13), all the weak convergence arguments in Theorem 1 apply to processes such as (15), and the proof is complete.
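For concreteness, here is a minimal Python sketch of the statistics in Example 4.1. The $\psi$ used is a hypothetical choice, not one taken from the text: $\psi'(x) = (1-(x/k)^2)^3$ on $|x| < k$ and $0$ outside, integrated to give $\psi$. It is bounded and nondecreasing, has $\psi' > 0$ near zero, and has two bounded continuous derivatives, as Example 4.1 requires. The value $k = 1.5$ and all function names are illustrative assumptions. $T_{n\theta}$ is found by bisection, which is valid because $t \mapsto \sum \psi(y_i - t)$ is nonincreasing.

```python
def psi(x: float, k: float = 1.5) -> float:
    """Hypothetical smooth, bounded, nondecreasing psi.

    psi'(x) = (1 - (x/k)^2)^3 for |x| < k and 0 otherwise; integrating gives
    psi(x) = k (u - u^3 + 3u^5/5 - u^7/7) with u = x/k clipped to [-1, 1].
    """
    u = max(-1.0, min(1.0, x / k))
    return k * (u - u ** 3 + 3.0 * u ** 5 / 5.0 - u ** 7 / 7.0)


def psi_prime(x: float, k: float = 1.5) -> float:
    """Derivative of psi: (1 - (x/k)^2)^3 inside (-k, k), zero outside."""
    u = x / k
    return (1.0 - u * u) ** 3 if abs(u) < 1.0 else 0.0


def m_estimate(ys, k: float = 1.5, tol: float = 1e-10) -> float:
    """Solve sum_i psi(y_i - t) = 0 for t by bisection.

    The sum is nonincreasing in t, positive at min(ys) - k and negative at
    max(ys) + k, so the bracket always contains a root.
    """
    def score(t: float) -> float:
        return sum(psi(y - t, k) for y in ys)

    lo, hi = min(ys) - k, max(ys) + k
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sigma2_hat(ys, t: float, k: float = 1.5) -> float:
    """n^{-1} sum psi^2(y_i - t) / {n^{-1} sum psi'(y_i - t)}^2."""
    n = len(ys)
    num = sum(psi(y - t, k) ** 2 for y in ys) / n
    den = sum(psi_prime(y - t, k) for y in ys) / n
    if den == 0.0:
        raise ValueError("every residual lies outside [-k, k]; denominator is zero")
    return num / den ** 2
```

Here `ys` plays the role of the values $G_\theta(X_1), \ldots, G_\theta(X_n)$. Shifting every observation by a constant shifts the estimate by the same constant, and `sigma2_hat` evaluates the ratio defining $\hat\sigma^2_{n\theta}$.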
This type of result can clearly be extended to a more general class of smooth $\psi$ functions and to the Huber $\psi$, but for brevity we will stay with Lemma 2. Further, the scale equivariant version given by […] could also be considered. Here $q_{n\theta} = q_n(G_\theta(X_1), \ldots, G_\theta(X_n))$ is location invariant and scale equivariant; an example would be the interquartile range. The proof of Lemma 2 will still apply if we can write $q_{n\theta}$ as a sum of i.i.d. variates plus a remainder term as in (13), say

$$q_{n\theta} = n^{-1}\sum_{i=1}^n H(G_\theta(X_i)) + o(n^{-1/2}) \quad \text{(a.s.u.)}.$$

The next example includes implicit conditions under which the interquartile range admits such an expansion.
Example 4.2. Let $X_{1n} < X_{2n} < \cdots < X_{nn}$ be the order statistics from the uniform sample $X_1, X_2, \ldots, X_n$. Let $0 < p < 1$ and define $k_n = np + O(n^{1/2}\log n)$. Bahadur (1966) has shown that […], where $n^{1/2}R_n \to 0$ (a.s.). Following Kiefer (1967, page 1324), it is possible to prove the following.

Proposition 2. Define $\xi_\theta = G_\theta(p)$, and suppose that, uniformly on $0 \le \theta \le C$, for some $\varepsilon > 0$, […]. Then […].

This proposition gives an (a.s.u.) expansion for the interquartile range, the median, and the variance estimate for the median suggested by Geertsema (1972). As in the previous example, it suffices to verify Theorem 1 when the random variables are given by […] and $\hat\sigma^2_{n\theta}$ is given by the expansion for Geertsema's variance estimate. Under the conditions of Proposition 2, (A1) and (A2) reduce to requiring that $F'_\theta(\xi_\theta)$ be Lipschitz in $\theta$ of order $\alpha > \tfrac{1}{2}$.
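Example 4.2 rests on a simple fact. Because $G_\theta$ is nondecreasing, the $k$-th smallest of $G_\theta(X_1), \ldots, G_\theta(X_n)$ is $G_\theta(X_{kn})$. Quantile-based statistics such as the median and the interquartile range are therefore functions of the same uniform order statistics for every $\theta$. The sketch below checks this numerically. It assumes an illustrative normal scale family and the hypothetical index choice $k_n = \lceil np \rceil$, which satisfies $k_n = np + O(1)$; the function names are also hypothetical.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()


def G(theta: float, u: float) -> float:
    """Quantile function of the illustrative scale family F_theta(x) = Phi(x / theta)."""
    return theta * STD.inv_cdf(u)


def open_uniforms(n: int, seed: int = 0) -> list[float]:
    """n i.i.d. uniform (0,1) draws; 0.0 is rejected because inv_cdf needs 0 < u < 1."""
    rng = random.Random(seed)
    out: list[float] = []
    while len(out) < n:
        u = rng.random()
        if u > 0.0:
            out.append(u)
    return out


def k_index(n: int, p: float) -> int:
    """Hypothetical choice k_n = ceil(n p), clipped to at least 1."""
    return max(1, math.ceil(n * p))


def order_stat(values, k: int) -> float:
    """k-th smallest value (1-based), i.e. X_{kn} in the notation above."""
    return sorted(values)[k - 1]


def iqr(values) -> float:
    """Interquartile range built from the order statistics at k_n(0.75) and k_n(0.25)."""
    n = len(values)
    return order_stat(values, k_index(n, 0.75)) - order_stat(values, k_index(n, 0.25))
```

For a large uniform sample, `order_stat([G(theta, u) for u in us], k)` should equal `G(theta, order_stat(us, k))` exactly. The interquartile range of the transformed sample should also be close to $\theta\,(\Phi^{-1}(0.75) - \Phi^{-1}(0.25)) \approx 1.349\,\theta$.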
5. A General Result

The theorem we actually gave in Section 3 is stronger than stated. We place it here in its fullest generality, as it should be of independent interest in the theory of weak convergence of partial sum processes with random indices.
Theorem 2. Suppose (A1) holds and there exist elements $N_d^{(1)}(\theta)$, $N_d^{(2)}(\theta)$ and $n_d(\theta)$ in $D[0,\infty)$, with $\{n_d(\theta)\}$ non-random and $N_d^{(1)}(\theta) \le N_d(\theta) \le N_d^{(2)}(\theta)$. Suppose further that

$$N_d^{(2)}(\theta)/n_d(\theta) \Rightarrow 1 \quad\text{and}\quad N_d^{(1)}(\theta)/n_d(\theta) \Rightarrow 1 \quad \text{on } D[0,\infty), \qquad d^2\,n_d(\theta)/\sigma^2(\theta) \to b^2,$$

where $\Rightarrow$ denotes weak convergence. Then (4) and (1) hold. Further, if we define […], then $V_d$ converges weakly to a Gaussian process $V$ on $D[0,\infty)$ whose sample paths are in $C[0,\infty)$ with probability one.
ACKNOWLEDGEMENT
The author wishes to thank Professors S. Gupta and H. Rootzen for
their encouragement and advice.
REFERENCES
ANSCOMBE, F.J. (1952). Large sample theory of sequential estimation. Proc. Camb. Phil. Soc. 48, 600-617.

BAHADUR, R.R. (1966). A note on quantiles in large samples. Ann. Math. Statist. 37, 577-580.

BICKEL, P.J. and WICHURA, M.J. (1971). Convergence criteria for multiparameter stochastic processes and some applications. Ann. Math. Statist. 42, 1656-1670.

BILLINGSLEY, P. (1968). Convergence of Probability Measures. New York: John Wiley & Sons, Inc.

CHOW, Y.S. and ROBBINS, H. (1965). On the asymptotic theory of fixed-width sequential confidence intervals for the mean. Ann. Math. Statist. 36, 463-467.

GEERTSEMA, J.C. (1972). Nonparametric sequential procedures for selecting the best of k populations. J. Am. Statist. Assoc. 67, 614-616.

HOEFFDING, W. (1963). Probability inequalities for sums of bounded random variables. J. Am. Statist. Assoc. 58, 13-30.

HUBER, P.J. (1964). Robust estimation of a location parameter. Ann. Math. Statist. 35, 73-101.

KIEFER, J. (1967). On Bahadur's representation of sample quantiles. Ann. Math. Statist. 38, 1323-1342.

LINDVALL, T. (1973). Weak convergence of probability measures and random functions in the function space D[0,∞). J. Appl. Prob. 10, 109-121.

PARZEN, E. (1954). On uniform convergence of families of sequences of random variables. Univ. of Calif. Publ. in Statist. 2, 23-54.

TONG, Y.L. (1971). On the consistency of single-stage ranking procedures. Ann. Institute Statist. Math. 22, 271-284.

TOPSØE, F. (1967). On the connection between P-continuity and P-uniformity in weak convergence. Theor. Prob. Appl. 12, 241-250.