Biometrika (1969), 56, 3, p. 553
Printed in Great Britain
The use of random allocation for the control of selection bias
BY STEPHEN M. STIGLER
University of Wisconsin, Madison
SUMMARY
In comparing two treatments, suppose the suitable subjects arrive sequentially and must be treated at once. In such situations, if the experiment calls for fixed treatment numbers, the experimenter can, using his knowledge of the number of treatments that have been assigned, bias the experiment by his selection of subjects. If we consider the method of assigning treatments as an experimental design, Blackwell & Hodges (1957) have shown that the minimax design is the truncated binomial. In this paper we show that random allocation is a restricted Bayes design within the class of Markov designs, and is in many senses preferable to the minimax design. In particular, the random allocation design can effectively eliminate the bias asymptotically when the minimax design does not, and in no case will random allocation perform much worse than the minimax design.

1. INTRODUCTION
In the problem of comparing the effectiveness of two treatments, it is common for the statistician to have the experimenter select $2n$ subjects suitable for treatment and treat $n$ subjects with each treatment. Clearly, if the experimenter is aware of or guesses which treatment a subject will receive before he selects the subject, then he can, consciously or unconsciously, bias the experiment by his choice; therefore the assignment of treatments to subjects is usually done 'randomly'. Ideally, the $2n$ subjects will be selected in advance, and the treatment assignments made by random sampling without replacement. This is, however, not always possible. In many cases, suitable subjects arrive sequentially and must be treated immediately or not at all. Two examples which have been cited are clinical trials and cloud seeding experiments. In clinical trials a patient often must be treated as soon as the disease has been diagnosed; in cloud seeding it is physically impossible to collect the subjects (storm clouds) for simultaneous assignment of treatments.

In some experiments it may be possible to eliminate this bias by having the selection of suitable subjects done by a third party who is ignorant of the past assignment of treatments, or by defining 'suitability' in a sufficiently objective manner to permit selection without an exercise of judgment. There remain many cases in which the best or only judges available are those involved in administering the treatments. For such situations it is reasonable to ask what strategy the statistician should adopt in the assignment of treatments in order to reduce the expected bias as much as possible.

This problem was first considered by Blackwell & Hodges (1957), who proposed the following model. In order to compare two treatments it is decided to treat $2n$ subjects, $n$ with treatment A and $n$ with treatment B. Candidates for treatment arrive sequentially, and as each arrives, the experimenter E decides whether or not it is a suitable subject. If it is declared suitable, the statistician S then tells E which treatment to administer, A or B. The experimenter E attempts to bias the experiment in favour of A by his selection of a suitable
subject. If he guesses that the next treatment will be an A, he selects a subject with a high expected response, say $\mu + \Delta$. If he guesses B, he waits for a subject with a low expected response, say $\mu - \Delta$. For this model the bias, the expected difference of treatment sums when there is in fact no difference between the effects of the two treatments, can easily be seen to be $2\Delta(G - n)$, where $G$ is the number of times E guesses correctly (each correct guess contributes $+\Delta$ to the difference and each of the $2n - G$ incorrect guesses contributes $-\Delta$). If we consider an experimental design for the statistician S to be a decision rule by which he assigns treatment A or B to the selected subject, and if we call the expected bias for such a rule the risk of the design, which is $2\Delta\{E(G) - n\}$ in the above model, then Blackwell and Hodges have found the minimax design for this model, namely the design which minimizes the maximum risk, the maximum being taken with respect to all possible guessing strategies for E. Their solution is the truncated binomial design; that is, the statistician assigns A or B to the subject with probability $\frac12$ each, independently of all other assignments, until all $n$ of one of the treatments have been used. The risk for this design was found to be

$$\frac{\Delta n}{2^{2n-1}}\binom{2n}{n} = \frac{2\Delta n^{1/2}}{\pi^{1/2}} + o(1) \quad \text{as } n \to \infty,$$

independently of E's strategy, as long as E guesses in the obvious manner once one of the treatments has been exhausted.

In this paper we shall consider a slightly different model. Rather than necessarily biasing the experiment by $\Delta$ on each trial, it will only be assumed that E picks a subject with expected response between $\mu - \Delta$ and $\mu + \Delta$. Thus he may choose not to bias the experiment at all, picking a subject with expected response $\mu$. While this model may seem more general, it in fact is not. For any strategy available to E in the new model there is one in the Blackwell-Hodges model with identical risk; E need only randomize appropriately between $\mu - \Delta$ and $\mu + \Delta$. It follows in particular that the truncated binomial design is still minimax in this new model. It is only for the convenience of representing E's strategies as nonrandomized that this different formulation is introduced.

If the truncated binomial design is minimax, even in this different formulation, why do we consider the problem anew? The principal reason is that although this design is optimal in the minimax sense, it is not satisfactory. In the first place, as Blackwell and Hodges have pointed out, the bias for this design is of the order of $n^{1/2}$. Thus when the difference of treatment sums is standardized by dividing by $n^{1/2}$, the bias does not tend to zero as $n$ increases and could cause incorrect inferences to be drawn. Secondly, for this design the risk does not depend on the experimenter's strategy. Thus S does no better against a timid E who refuses to bias unless sure than against a bold E who, knowing his game theory, always uses his optimal strategy.

In fact, it seems reasonable to expect that in realistic situations E will tend to be timid, either consciously or unconsciously. One possible strategy which exhibits this tendency to timidity we will call the proportional convergence strategy, denoted by $\theta_1$. Under strategy $\theta_1$, if E observes that in the first $t$ of the $2n$ trials exactly $j$ A treatments and $t - j$ B treatments have been assigned, then on the $(t+1)$st trial E picks a subject with expected response

$$\mu + 2\Delta\left(\frac{n-j}{2n-t} - \frac12\right).$$

Thus with this strategy, E biases by an amount proportional to the fraction of treatments yet to be assigned which must be A's; if one of the treatments has been exhausted he will bias by $\Delta$, and if the treatments have been assigned equally he will not bias at all. If S knew that E was using $\theta_1$ he would alternate treatments. However, that design is clearly unacceptable; should E guess, or recognize, that the design is being used, he can bias the experiment by as much as $2\Delta n$. What is needed is a design which has a quite small risk when E is using strategy $\theta_1$, yet has a maximum risk only slightly higher than that of the truncated binomial. It is the purpose of this paper to show that these demands are met by the random allocation design. This is the design where S picks $n$ of the first $2n$ integers at random without replacement, and gives treatment A to the subjects corresponding to the integers selected. Equivalently, if after the $t$th trial S has assigned $j$ A's and $t - j$ B's, he assigns an A on the $(t+1)$st trial with probability $(n-j)/(2n-t)$.
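As a quick check of the equivalence just stated, the following Python sketch (an illustration, not part of the original paper) draws assignments with the sequential rule and confirms that every sequence contains exactly $n$ A's and that all $\binom{2n}{n}$ arrangements occur about equally often. The function name `random_allocation_sequence` and the seeds are illustrative assumptions.

```python
import random
from collections import Counter
from math import comb


def random_allocation_sequence(n, rng):
    """Return one assignment of 2n treatments as a string of 'A'/'B'.

    After t trials with j A's assigned, A is chosen on trial t + 1 with
    probability (n - j) / (2n - t), the sequential form of random allocation.
    """
    labels = []
    j = 0  # number of A's assigned so far
    for t in range(2 * n):
        if rng.random() < (n - j) / (2 * n - t):
            labels.append("A")
            j += 1
        else:
            labels.append("B")
    return "".join(labels)


if __name__ == "__main__":
    rng = random.Random(1969)
    n, trials = 3, 100_000
    counts = Counter(random_allocation_sequence(n, rng) for _ in range(trials))
    # Number of distinct arrangements seen versus C(2n, n), then their frequencies.
    print(len(counts), comb(2 * n, n))
    print(sorted(round(c / trials, 3) for c in counts.values()))
```

Multiplying the sequential probabilities along any fixed sequence gives $n!\,n!/(2n)! = \binom{2n}{n}^{-1}$, which is why the two descriptions of the design agree.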
Blackwell & Hodges (1957) showed that for this design the maximum risk is

$$\Delta\left\{2^{2n}\binom{2n}{n}^{-1} - 1\right\} = \Delta\{(\pi n)^{1/2} - 1\} + o(1) \quad \text{as } n \to \infty,$$

which is of the same order as, and only slightly larger than, the maximum risk of the truncated binomial design. In what follows we shall show that if S uses the random allocation design and E follows the proportional convergence strategy $\theta_1$, then the risk is only

$$\frac{2n}{2n-1}\,\Delta\sum_{r=2}^{2n}\frac{1}{r} = \Delta\{\log n + \log 2 + \gamma - 1\} + o(1) \quad \text{as } n \to \infty,$$

where $\gamma = 0.577\ldots$ is Euler's constant. Furthermore, of all 'simple' designs with maximum risk less than or equal to the maximum risk of the random allocation design, the random allocation design minimizes the risk against $\theta_1$. Thus against a timid experimenter the random allocation design is a great improvement over the truncated binomial design (in fact the bias is then asymptotically negligible), while against a bold experimenter it is only slightly worse.

2. RANDOM ALLOCATION IS A RESTRICTED BAYES DESIGN
Since the bias of a design will be measured by the expected difference of treatment sums, we can assume without loss of generality that $\mu = 0$. It will be convenient in what follows to generalize the problem by allowing the numbers of the treatments to be assigned to be unequal. Let $D(m, k)$ denote the design problem where $m$ A's and $k$ B's are to be assigned. We assume that E is attempting to bias the experiment by choosing subjects with high expected response if he suspects A will be assigned and low expected response if he suspects B. If, in the problem $D(m, k)$, the statistician S uses design $\delta$ and the experimenter E uses strategy $\theta$, we shall represent the expected bias (the risk to S) by $R_{mk}(\delta, \theta)$. In particular, let $\delta_0$ denote the random allocation design, $\theta_1$ the proportional convergence strategy and $\theta_2$ the convergence strategy. Then at any stage of the experiment, if there remain $i$ A's and $j$ B's to be assigned and S is using $\delta_0$, he will assign A with probability $i/(i+j)$ and B with probability $j/(i+j)$; if E is using $\theta_1$ he will pick a subject with expected response $2\Delta\{i/(i+j) - \frac12\}$; and if E is using $\theta_2$ he will pick a subject with expected response $\Delta\,\mathrm{sgn}(i-j)$, where $\mathrm{sgn}(0) = 0$.
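Because the designs and strategies considered here depend only on the numbers $i$ and $j$ of A's and B's remaining, their risks can be computed exactly by recursing over those counts, applying the one-step decomposition used as recursion (1) in the proof of Theorem 1 at every stage. The Python sketch below is an illustration rather than the author's procedure. It assumes $\Delta = 1$ and $\mu = 0$, and its helper names (`risk`, `truncated_binomial`, and `majority_first` for the Bayes design $\delta_2$ discussed later in the paper) are hypothetical.

```python
from functools import lru_cache
from math import comb


# Designs: probability that S assigns A when i A's and j B's remain.
def random_allocation(i, j):            # delta_0
    return i / (i + j)


def truncated_binomial(i, j):           # delta_1: fair coin until one treatment is used up
    if i > 0 and j > 0:
        return 0.5
    return 1.0 if i > 0 else 0.0


def majority_first(i, j):               # delta_2: give the treatment with more remaining
    if i == j:
        return 0.5
    return 1.0 if i > j else 0.0


# Strategies: expected response chosen by E (mu = 0, Delta = 1).
def proportional_convergence(i, j):     # theta_1: 2 * {i / (i + j) - 1/2}
    return 2 * (i / (i + j) - 0.5)


def convergence(i, j):                  # theta_2: sgn(i - j)
    return float((i > j) - (i < j))


def risk(design, strategy, m, k):
    """Exact expected bias R_mk(design, strategy) / Delta."""

    @lru_cache(maxsize=None)
    def r(i, j):
        if i == 0 and j == 0:
            return 0.0
        p, b = design(i, j), strategy(i, j)
        value = 0.0
        if i > 0:
            value += p * (b + r(i - 1, j))         # A assigned: response counts +b
        if j > 0:
            value += (1 - p) * (-b + r(i, j - 1))  # B assigned: response counts -b
        return value

    return r(m, k)


if __name__ == "__main__":
    n = 3
    print(risk(random_allocation, proportional_convergence, n, n))
    print(risk(random_allocation, convergence, n, n), 4 ** n / comb(2 * n, n) - 1)
    print(risk(truncated_binomial, convergence, n, n), n * comb(2 * n, n) / 2 ** (2 * n - 1))
```

With these helpers, `risk(random_allocation, convergence, n, n)` agrees with Blackwell & Hodges's $2^{2n}\binom{2n}{n}^{-1} - 1$, and the truncated binomial gives $n\binom{2n}{n}/2^{2n-1}$ against either strategy.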
Hodges & Lehmann (1952) defined a restricted Bayes solution to a decision problem as follows. If $R(\delta, \theta)$ is the risk of procedure $\delta$ when the parameter equals $\theta$, then $\delta_0$ is said to be the $C$-restricted Bayes solution with respect to the prior distribution $\lambda$ if $\delta_0$ minimizes $\int R(\delta, \theta)\,d\lambda(\theta)$ subject to $R(\delta, \theta) \le C$ for all $\theta$. In our problem the procedures $\delta$ are the designs of S, the parameters $\theta$ are the strategies of E, and the risk is the expected bias. Define a Markov design $\delta'$ to be one such that, for any positive integers $i$ and $j$, if there remain $i$ A's and $j$ B's to be assigned, then $\delta'$ assigns treatments with probabilities depending only on $i$ and $j$ and not on the number of treatments already assigned or the order in which they were assigned. Then in the design problem $D(m, k)$ we shall prove that, within the class of all Markov designs, $\delta_0$ is the $R_{mk}(\delta_0, \theta_2)$-restricted Bayes solution with respect to the prior distribution $\lambda$, where $\lambda(\theta_1) = 1$.

THEOREM 1. In the design problem $D(m, k)$, the random allocation design $\delta_0$ uniquely minimizes $R_{mk}(\delta, \theta_1)$ among all Markov designs $\delta$ satisfying

$$R_{mk}(\delta, \theta) \le R_{mk}(\delta_0, \theta_2) \quad \text{for all strategies } \theta.$$

The following two lemmas will be useful in the proof.

LEMMA 1. We have that

$$R_{mk}(\delta_0, \theta_1) = \Delta\left\{\frac{4mk}{(m+k)(m+k-1)}\sum_{r=2}^{m+k}\frac{1}{r} + \frac{(m-k)^2}{m+k}\right\}.$$

Proof. If E uses strategy $\theta_1$, then upon observing $s$ A's in the first $r - 1$ trials, he will choose a subject with expected response

$$2\Delta\left(\frac{m-s}{m+k-r+1} - \frac12\right)$$

on the $r$th trial. Let $\Delta B(r)$ be the expected response of the $r$th subject selected, let $a(r) = 1$ or $-1$ according as the $r$th treatment assigned is an A or a B, and let

$$p_r(s) = \frac{m-s}{m+k-r+1}.$$

Then the bias induced by E is $\Delta\sum_{r=1}^{m+k}B(r)a(r)$, so that

$$R_{mk}(\delta_0, \theta_1) = \Delta\sum_{r=1}^{m+k}E\{B(r)a(r)\}.$$

Let $X_r$ be the number of A's assigned in the first $r - 1$ trials. Now $E\{B(r)a(r)\} = E[E\{B(r)a(r) \mid X_r\}]$, and since, when using $\delta_0$, S assigns an A on the $r$th trial with probability $p_r(X_r)$, we have

$$E\{B(r)a(r) \mid X_r\} = 2\{p_r(X_r) - \tfrac12\}p_r(X_r) - 2\{p_r(X_r) - \tfrac12\}\{1 - p_r(X_r)\} = \{2p_r(X_r) - 1\}^2,$$

so that

$$E\{B(r)a(r)\} = 4\,\mathrm{var}\{p_r(X_r)\} + [2E\{p_r(X_r)\} - 1]^2.$$

When S uses $\delta_0$, $X_r$ has a hypergeometric distribution,

$$P(X_r = x) = \binom{m}{x}\binom{k}{r-1-x}\bigg/\binom{m+k}{r-1}$$

for $\max(0, r-1-k) \le x \le \min(r-1, m)$, with

$$E(X_r) = \frac{m(r-1)}{m+k}, \qquad \mathrm{var}(X_r) = \frac{mk(r-1)(m+k-r+1)}{(m+k)^2(m+k-1)}.$$

Thus

$$E\{p_r(X_r)\} = \frac{m}{m+k}, \qquad \mathrm{var}\{p_r(X_r)\} = \frac{mk(r-1)}{(m+k)^2(m+k-1)(m+k-r+1)},$$

and we have

$$R_{mk}(\delta_0, \theta_1) = \Delta\left\{\frac{4mk}{(m+k)^2(m+k-1)}\sum_{r=1}^{m+k}\frac{r-1}{m+k-r+1} + \frac{(m-k)^2}{m+k}\right\}.$$

Since, with $N = m + k$ and the substitution $u = N - r + 1$, $\sum_{r=1}^{N}(r-1)/(N-r+1) = \sum_{u=1}^{N}(N-u)/u = N\sum_{u=2}^{N}u^{-1}$, this is the stated result.
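Lemma 1 is easy to check numerically. The Python sketch below is an illustration under the assumption $\Delta = 1$, with hypothetical function names. It evaluates the sum $\frac{4mk}{(m+k)^2(m+k-1)}\sum_{r=1}^{m+k}\frac{r-1}{m+k-r+1} + \frac{(m-k)^2}{m+k}$ and the equivalent harmonic form $\frac{4mk}{(m+k)(m+k-1)}\sum_{r=2}^{m+k} r^{-1} + \frac{(m-k)^2}{m+k}$ in exact rational arithmetic, then compares $R_{nn}(\delta_0, \theta_1)$ with the asymptotic expression $\log n + \log 2 + \gamma - 1$ quoted in Section 1.

```python
from fractions import Fraction
from math import log

EULER_GAMMA = 0.5772156649015329  # Euler's constant, gamma


def lemma1_risk(m, k):
    """R_mk(delta_0, theta_1) / Delta as the sum in Lemma 1 (needs m + k >= 2)."""
    N = m + k
    s = sum(Fraction(r - 1, N - r + 1) for r in range(1, N + 1))
    return Fraction(4 * m * k, N * N * (N - 1)) * s + Fraction((m - k) ** 2, N)


def lemma1_harmonic(m, k):
    """Same quantity written as 4mk / {N(N-1)} * sum_{r=2}^{N} 1/r + (m-k)^2 / N."""
    N = m + k
    h = sum(Fraction(1, r) for r in range(2, N + 1))
    return Fraction(4 * m * k, N * (N - 1)) * h + Fraction((m - k) ** 2, N)


def asymptotic(n):
    """Leading behaviour log n + log 2 + gamma - 1 of R_nn(delta_0, theta_1) / Delta."""
    return log(n) + log(2) + EULER_GAMMA - 1


if __name__ == "__main__":
    for n in (5, 40, 100, 400):
        print(n, round(float(lemma1_risk(n, n)), 4), round(asymptotic(n), 4))
```

The gap between the exact value and $\log n + \log 2 + \gamma - 1$ shrinks as $n$ grows, consistent with the $o(1)$ term; exact fractions are used only so that the two forms can be compared for equality.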
LEMMA 2. Let $s = \min(k, m)$. Then

$$R_{mk}(\delta_0, \theta_2) = \Delta\left\{|k-m| + \binom{m+k}{m}^{-1}\sum_{r=0}^{s-1}\binom{|k-m|+2r}{r}\binom{2s-2r}{s-r}\right\}.$$

Proof. Assume, by consideration of the symmetry of the problem, that $k \ge m$. If we consider a realization of the design $\delta_0$ to be a walk on the lattice points of the plane proceeding from $(0, 0)$ to $(m, k)$, moving up when B is chosen and right when A is chosen, then the bias of an experiment is clearly $\Delta$ times the number of steps towards the line $y = x + k - m$ minus $\Delta$ times the number of steps away from the line $y = x + k - m$ which do not start on this line. Thus $\Delta^{-1}$ times the bias equals $k - m$ plus the number of steps starting on that line, since the walk must move towards the line $k$ times and away from it $m$ times. Define $U_r = 1$ if the walk passes through $(r, k - m + r)$, and $U_r = 0$ otherwise. Then the number of steps starting on the line $y = x + k - m$ is just $U_0 + \dots + U_{m-1}$, and since

$$E(U_r) = \binom{k-m+2r}{r}\binom{2m-2r}{m-r}\bigg/\binom{m+k}{m},$$

the lemma follows.
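Lemma 2 can be checked by brute force for small $m$ and $k$: under $\delta_0$ all $\binom{m+k}{m}$ orderings of the treatments are equally likely, so averaging the bias of the convergence strategy over every ordering gives $R_{mk}(\delta_0, \theta_2)$ exactly. The Python sketch below (assuming $\Delta = 1$; the function names are hypothetical) does this and compares the result with the closed form $|k-m| + \binom{m+k}{m}^{-1}\sum_{r=0}^{s-1}\binom{|k-m|+2r}{r}\binom{2s-2r}{s-r}$.

```python
from itertools import combinations
from math import comb


def lemma2_risk(m, k):
    """R_mk(delta_0, theta_2) / Delta from the closed form of Lemma 2."""
    s, d = min(m, k), abs(k - m)
    visits = sum(comb(d + 2 * r, r) * comb(2 * s - 2 * r, s - r) for r in range(s))
    return d + visits / comb(m + k, m)


def brute_force_risk(m, k):
    """Average bias over all equally likely random-allocation orderings, with E
    choosing expected response sgn(i - j) when i A's and j B's remain."""
    total, count = 0, 0
    for a_positions in combinations(range(m + k), m):
        a_set = set(a_positions)
        i, j, bias = m, k, 0
        for t in range(m + k):
            guess = (i > j) - (i < j)
            if t in a_set:        # A: the chosen response enters the A sum
                bias += guess
                i -= 1
            else:                 # B: it enters the B sum, which is subtracted
                bias -= guess
                j -= 1
        total += bias
        count += 1
    return total / count


if __name__ == "__main__":
    for m, k in [(1, 2), (3, 3), (2, 5), (4, 4)]:
        print(m, k, lemma2_risk(m, k), brute_force_risk(m, k))
```

For $m = k = n$ the closed form reduces to $2^{2n}\binom{2n}{n}^{-1} - 1$, because $\sum_{r=0}^{n}\binom{2r}{r}\binom{2n-2r}{n-r} = 4^n$; the last term of that sum is $\binom{2n}{n}$, which is the one omitted in the lemma.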
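Proof of Theorem 1. We proceed by a double induction on $m$ and $k$. The theorem clearly holds for $D(m, 0)$ and $D(0, k)$. Assume it holds for both $D(m-1, k)$ and $D(m, k-1)$, and consider the problem $D(m, k)$. For any pair $(\delta, \theta)$, where on the first trial S picks A with probability $p$ and E chooses a subject with expected response $\Delta_1$, we have

$$R_{mk}(\delta, \theta) = pR_{m-1,k}(\delta, \theta) + (1-p)R_{m,k-1}(\delta, \theta) + \Delta_1(2p-1). \tag{1}$$

By the induction hypothesis and the fact that we are limited to Markov designs, we must have $\delta = \delta_0$ after the first assignment. Thus we need only show that, subject to

$$pR_{m-1,k}(\delta_0, \theta_2) + (1-p)R_{m,k-1}(\delta_0, \theta_2) + \Delta|2p-1| \le R_{mk}(\delta_0, \theta_2) \tag{2}$$

(the left side is the maximum risk of such a design, since against $\delta_0$ the convergence strategy $\theta_2$ is a best response at every later stage, and on the first trial E does best to take $\Delta_1 = \Delta\,\mathrm{sgn}(2p-1)$), the risk against $\theta_1$,

$$pR_{m-1,k}(\delta_0, \theta_1) + (1-p)R_{m,k-1}(\delta_0, \theta_1) + \Delta\,\frac{m-k}{m+k}\,(2p-1), \tag{3}$$

is minimized by taking $p = m/(m+k)$.

Without loss of generality, assume $\Delta = 1$ and $k \ge m$. If $k = m$, obviously $p = \frac12$ is the solution. Otherwise $k > m$, and (3) is minimized when we minimize

$$p\left\{R_{m-1,k}(\delta_0, \theta_1) - R_{m,k-1}(\delta_0, \theta_1) + \frac{2(m-k)}{m+k}\right\}. \tag{4}$$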
The quantity in brackets, from Lemma 1, is given by

$$\frac{4(m-k)}{(m+k-1)(m+k-2)}\sum_{r=2}^{m+k-1}\frac{1}{r} + \frac{4(k-m)}{m+k-1} + \frac{2(m-k)}{m+k} = \frac{2(k-m)}{m+k-1}\left\{\frac{m+k+1}{m+k} - \frac{2}{m+k-2}\sum_{r=2}^{m+k-1}\frac{1}{r}\right\} > 0$$

for $m + k > 2$, since

$$\sum_{r=2}^{m+k-1}\frac{1}{r} \le \tfrac12(m+k-2).$$

Then (4), and thus (3), is minimized by taking $p$ as small as possible. It remains to be seen how small a $p$ is allowed by (2). Clearly we can take $p \le m/(m+k)$. Applying (1) to the right side of (2), we find that (2) becomes

$$\left(p - \frac{m}{m+k}\right)\left\{R_{m-1,k}(\delta_0, \theta_2) - R_{m,k-1}(\delta_0, \theta_2)\right\} \le \frac{|k-m|}{m+k} - |2p-1|,$$

or, since we can take $p < \frac12$,

$$\left(p - \frac{m}{m+k}\right)\left\{R_{m-1,k}(\delta_0, \theta_2) - R_{m,k-1}(\delta_0, \theta_2) - 2\right\} \le 0. \tag{5}$$

We claim that $R_{m-1,k}(\delta_0, \theta_2) - R_{m,k-1}(\delta_0, \theta_2) < 2$. From Lemma 2, this is equivalent to

$$\binom{m+k-1}{m-1}^{-1}\sum_{r=0}^{m-2}\binom{k-m+1+2r}{r}\binom{2m-2-2r}{m-1-r} < \binom{m+k-1}{m}^{-1}\sum_{r=0}^{m-1}\binom{k-m-1+2r}{r}\binom{2m-2r}{m-r},$$

or, since $\binom{m+k-1}{m} = \frac{k}{m}\binom{m+k-1}{m-1}$, to

$$k\sum_{r=0}^{m-2}\binom{k-m+1+2r}{r}\binom{2m-2-2r}{m-1-r} < m\sum_{r=0}^{m-1}\binom{k-m-1+2r}{r}\binom{2m-2r}{m-r}.$$

But, writing $r - 1$ for $r$ on the left,

$$k\sum_{r=0}^{m-2}\binom{k-m+1+2r}{r}\binom{2m-2-2r}{m-1-r} = k\sum_{r=1}^{m-1}\binom{k-m-1+2r}{r-1}\binom{2m-2r}{m-r},$$

and

$$k\binom{k-m-1+2r}{r-1} = \frac{kr}{k-m+r}\binom{k-m-1+2r}{r} < m\binom{k-m-1+2r}{r} \quad \text{for } r < m,$$

while the right-hand sum also contains the positive term with $r = 0$. Hence $R_{m-1,k}(\delta_0, \theta_2) - R_{m,k-1}(\delta_0, \theta_2) < 2$, and (5) becomes $p \ge m/(m+k)$. Since we wish $p$ as small as possible, this completes the induction and proves the theorem.

3. REMARKS
Two objections might be raised to the above. First, why do we limit ourselves to Markov designs when we might be able to do better, and certainly could do no worse, without this restriction? Secondly, since the choice of an upper bound for the risk is arbitrary and different restricted Bayes designs would correspond to different upper bounds, why should we choose the maximum risk of random allocation as our upper bound? In answer to the first criticism, three points can be made. (1) If the number of treatments to be assigned is large, it is quite difficult to find the restricted Bayes design without constraining ourselves to Markov designs. (2) Markov designs are quite easy to use in practice; extensive tables are not needed, as they would be with more general designs. (3) While it is true that some improvement is possible through non-Markov designs, the amount of improvement is quite small, as will be evident from a later comment. That improvement is possible can even be
seen for the simple case $m = k = 3$. Here, if we have three of one treatment and one of the other left, we should choose the one we have the most of with probability one; if we have two of one treatment and one of the other left, we should choose the one we have the most of with probability $\frac{7}{10}$; otherwise we flip a fair coin whenever there is a choice. This design has the same maximum risk as $\delta_0$, $2.2\Delta$, but its risk against $\theta_1$ is only $1.68\Delta$, while $R_{33}(\delta_0, \theta_1) = 1.74\Delta$.

The second criticism, that the choice of an upper bound is arbitrary, also has some validity. However, it can be easily seen that the design we have arrived at, random allocation, has other quite nice properties. In the first place, its maximum risk is only slightly larger than the risk of the minimax design (for large $n$, $1.77\Delta n^{1/2}$ as compared to $1.13\Delta n^{1/2}$ for the minimax design), while against $\theta_1$ its risk is much smaller (for large $n$, $\Delta\log n$ as compared to $1.13\Delta n^{1/2}$ for the minimax design). Secondly, it is easy to see that for the prior $\lambda$, $\lambda(\theta_1) = 1$, the Bayes design is 'Always assign the treatment of which you have the most remaining; flip a fair coin when equal numbers remain', and the risk of the Bayes design is

$$\Delta\sum_{k=1}^{n}(2k-1)^{-1}$$

against $\theta_1$, and $\Delta n$ against $\theta_2$. Thus random allocation does slightly worse than the Bayes design against $\theta_1$ (for large $n$, $\Delta\log n$ as compared to $\frac12\Delta\log n$ for the Bayes design), but it has a much smaller maximum risk (for large $n$, $1.77\Delta n^{1/2}$ as compared to $\Delta n$ for the Bayes design). It can therefore be seen that random allocation strikes an excellent balance between guarding against too large a maximum risk and doing well against an experimenter who, through timidity or ignorance, uses a strategy similar to $\theta_1$. In addition, when the difference of treatment sums is standardized by dividing by $n^{1/2}$, the bias attributable to selection will be negligible if the number of treatments is large and the experimenter uses $\theta_1$.

It can be easily shown from Lemmas 1 and 2 that if the statistician uses design $\delta_0$ and the experimenter uses either $\theta_1$ or $\theta_2$ as his strategy, then for any given total number of treatments $m + k$, the selection bias is minimized by taking $m$ and $k$ as nearly equal as possible, equal if $m + k$ is even. In fact, this result, that selection bias is minimized by assigning equal numbers of treatments A and B, is true under much more general conditions. Denote by $\theta = \{\theta(i, j)\}$ the experimenter's strategy and by $\delta = \{\delta(i, j)\}$ the statistician's design, where by this notation we mean that if $i$ treatment A's and $j$ treatment B's remain to be assigned, the experimenter chooses a subject with expected response $\theta(i, j)$ and the statistician assigns A with probability $\delta(i, j)$.

THEOREM 2. Assume that for each fixed $m$ ($m = 1, \dots, 2n$)
(a) $\theta(i, m-i)$ and $\delta(i, m-i)$ are both monotone increasing as functions of $i$, for $0 \le i \le m$;
(b) $\theta(i, m-i) > 0$ for $i > \frac12 m$, and $\theta(i, m-i) < 0$ for $i < \frac12 m$;
(c) $\delta(i, i) = \frac12$ for $i = 1, 2, \dots, n$.
Then $R_{k,2n-k}(\delta, \theta)$ is minimized by taking $k = n$.

Proof. The theorem follows, using an easy induction on $m$, by proving the stronger result that $R_{k,m-k}(\delta, \theta)$ is a monotone decreasing function of $k$ for $k \le \frac12 m$, and a monotone increasing function of $k$ for $k \ge \frac12 m$ ($m = 1, \dots, 2n$).

Thus for 'reasonable' designs $\delta$ and strategies $\theta$, selection bias is smallest with equal numbers of treatments.
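A small numerical illustration of Theorem 2, not taken from the paper: for a fixed total $M$ of treatments, the Python sketch below computes $R_{k,M-k}(\delta_0, \theta)$ for every split $k$ by recursing over the remaining counts (with $\Delta = 1$), and checks the stronger monotonicity property used in the proof for both $\theta_1$ and $\theta_2$. The helper names and the range of totals tried are assumptions for illustration; a check over small totals is of course not a proof.

```python
from functools import lru_cache


def risk_table(design, strategy, total):
    """Return [R_{k, total-k}(design, strategy) / Delta for k = 0, ..., total]."""

    @lru_cache(maxsize=None)
    def r(i, j):
        if i == 0 and j == 0:
            return 0.0
        p, b = design(i, j), strategy(i, j)
        value = 0.0
        if i > 0:
            value += p * (b + r(i - 1, j))         # A assigned
        if j > 0:
            value += (1 - p) * (-b + r(i, j - 1))  # B assigned
        return value

    return [r(k, total - k) for k in range(total + 1)]


def random_allocation(i, j):        # delta_0
    return i / (i + j)


def proportional_convergence(i, j): # theta_1, equal to 2 * {i / (i + j) - 1/2}
    return (i - j) / (i + j)


def convergence(i, j):              # theta_2: sgn(i - j)
    return float((i > j) - (i < j))


def is_minimized_at_equal_split(values):
    """Non-increasing up to the middle split and non-decreasing after it."""
    mid = (len(values) - 1) / 2
    for k in range(len(values) - 1):
        if k + 1 <= mid and not values[k + 1] <= values[k]:
            return False
        if k >= mid and not values[k + 1] >= values[k]:
            return False
    return True


if __name__ == "__main__":
    for strategy in (proportional_convergence, convergence):
        values = risk_table(random_allocation, strategy, 10)
        print(strategy.__name__, [round(v, 3) for v in values])
```

Both lists printed for a total of 10 should be smallest at the middle entry, $k = 5$; the step across the middle is not checked for odd totals because the two central splits are mirror images with equal risk.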
4. THE BIASES OF DIFFERENT DESIGNS
Table 1 gives the biases for a number of combinations of designs and strategies. The following notation is used:
δ0: random allocation design;
δ1: minimax design (truncated binomial);
δ2: Bayes design, for prior λ(θ1) = 1;
θ1: proportional convergence strategy;
θ2: convergence strategy.

Table 1. Values of bias $\Delta^{-1}R_{nn}(\delta, \theta)$

n      (δ0, θ1)   (δ0, θ2)    (δ1, θ1), (δ1, θ2)   (δ2, θ1)      (δ2, θ2)
1      1.00       1.00        1.00                 1.00          1
5      2.14       3.06        2.46                 1.79          5
10     2.73       4.68        3.52                 2.13          10
15     3.10       5.92        4.33                 2.34          15
20     3.36       6.98        5.01                 2.48          20
25     3.57       7.91        5.61                 2.59          25
30     3.74       8.75        6.15                 2.68          30
35     3.89       9.52        6.65                 2.76          35
40     4.02       10.25       7.11                 2.83          40
45     4.13       10.92       7.55                 2.89          45
50     4.23       11.56       7.96                 2.94          50
60     4.41       12.76       8.72                 3.03          60
70     4.55       13.86       9.42                 3.11          70
80     4.68       14.88       10.08                3.17          80
90     4.80       15.84       10.69                3.23          90
100    4.90       16.75       11.27                3.28          100
∞      log_e n    1.772 n^½   1.128 n^½            0.5 log_e n   n
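The entries of Table 1 can be recomputed from closed forms stated in the text: Lemma 1 with $m = k = n$, which gives $\frac{2n}{2n-1}\sum_{r=2}^{2n} r^{-1}$, for $(\delta_0, \theta_1)$; $2^{2n}\binom{2n}{n}^{-1} - 1$ for $(\delta_0, \theta_2)$; $n\binom{2n}{n}/2^{2n-1}$ for the truncated binomial against either strategy; and $\sum_{k=1}^{n}(2k-1)^{-1}$ and $n$ for the Bayes design. The Python sketch below is an illustrative reconstruction assuming $\Delta = 1$; `table_row` and `harmonic` are hypothetical names.

```python
from math import comb, log, pi, sqrt


def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / r for r in range(1, n + 1))


def table_row(n):
    """Closed forms for Delta^{-1} R_nn(delta, theta), in the column order of Table 1."""
    d0_t1 = 2 * n / (2 * n - 1) * (harmonic(2 * n) - 1)     # Lemma 1 with m = k = n
    d0_t2 = 4 ** n / comb(2 * n, n) - 1                       # random allocation vs theta_2
    d1 = n * comb(2 * n, n) / 2 ** (2 * n - 1)               # truncated binomial, either theta
    d2_t1 = sum(1.0 / (2 * j - 1) for j in range(1, n + 1))  # Bayes design vs theta_1
    d2_t2 = float(n)                                          # Bayes design vs theta_2
    return d0_t1, d0_t2, d1, d2_t1, d2_t2


if __name__ == "__main__":
    for n in (1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100):
        print(f"{n:4d}", "  ".join(f"{v:6.2f}" for v in table_row(n)))
    n = 100
    print("leading terms at n = 100:", log(n), sqrt(pi * n), 2 * sqrt(n / pi), 0.5 * log(n), n)
```

Python's arbitrary-precision integers keep `4 ** n / comb(2 * n, n)` accurate for large $n$, so no logarithmic rescaling is needed at these sizes.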
This research was supported in part by the Wisconsin Alumni Research Foundation.
REFERENCES
BLACKWELL, D. & HODGES, J. L. (1957). Design for the control of selection bias. Ann. Math. Statist. 28, 449-60.
HODGES, J. L. & LEHMANN, E. L. (1952). The use of previous experience in reaching statistical decisions. Ann. Math. Statist. 23, 396-407.
[Received April 1968. Revised April 1969]
Downloaded from http://biomet.oxfordjournals.org/ at Penn State University (Paterno Lib) on May 17, 2016