Dowling, T. and Shachtman, R. (1972). "On the relative efficiency of randomized response models."

ON THE RELATIVE EFFICIENCY OF RANDOMIZED RESPONSE MODELS
T. A. Dowling¹ and Richard Shachtman
Department of Statistics
University of North Carolina at Chapel Hill
Institute of Statistics Mimeo Series No. 811
March, 1972

¹ Research partially supported by the Air Force Office of Scientific Research under Contract AFOSR-68-1415.
1. INTRODUCTION
Warner introduced in [5] a technique for estimating the proportion $\pi$ of a human population having an unobservable sensitive or stigmatizing characteristic $A$. The method, which he called "randomized response", is designed to eliminate untruthful responses in sample surveys which would result in a biased estimate of $\pi$. Each subject is asked to observe the outcome, concealed from the interviewer, of a randomization device producing one of two outcomes $A$, $\bar{A}$ with known probabilities $p$, $1-p$, respectively, and then to respond either "yes" or "no" according as he does or does not belong to the group indicated by the outcome of the randomization device. ($A$ refers to the group with the sensitive characteristic; $\bar{A}$ refers to the complementary group.) The randomization probability $p$ must be chosen sufficiently distant from 0 or 1 to encourage truthful responses, as extreme values will tend to arouse suspicion in the respondent.
A second technique, called the "alternate question randomized response model", is discussed by Greenberg et al. in [1]. The method involves an unobservable innocuous characteristic $Y$, possessed by a proportion $\mu$ of the population, which respondents would presumably have no reason to conceal, and a randomization device producing one of two outcomes $A$, $Y$ with respective known probabilities $p$, $1-p$. Each subject is asked to respond "yes" or "no" according as he does or does not belong to the group indicated by the outcome of the randomization device. Both the cases when $\mu$ is known and when $\mu$ is unknown are discussed in [1]. In the latter case, the total sample must be split into two samples with different randomization probabilities in order to estimate $\pi$. As with Warner's method, randomization probabilities near 1 must be excluded to encourage truthful responses, but in the alternate question models these need not be bounded away from 0.
The purpose of this note is to prove, as is suggested by numerical evidence in [1], that the variance of the alternate question estimator of $\pi$ is less than that of Warner's estimator, uniformly in $\pi$ and $\mu$, provided that $p$ (or $\max(p_1,p_2)$ in the two-sample case) is greater than one-third (roughly). Section 2 deals with the one-sample case, while in Section 3 we correct and extend a proof of this result due to Moors [3] for the two-sample estimator, optimized with respect to sample size allocation and the smaller randomization probability. Further work on randomized response models may be found in [2,6].
2. THE ONE-SAMPLE CASE
Let $\pi$ be the unknown proportion of people in the population having the sensitive characteristic $A$, and $\mu$ be the proportion with the innocuous characteristic $Y$. We assume in this section that $\mu$ is known, and write $q$ in place of $\mu$. This situation can always be achieved by incorporating $q$ in the randomization device (see [1], p. 532). We assume, for both the Warner model and the one-sample alternate question model,² a fixed sample size $n$, and a fixed randomization probability $p$ of selecting the sensitive question: "Do you belong to Group A?". We further assume that $p$ is within the allowable range required to obtain completely truthful responses, and denote by $X$ the number of "yes" responses obtained in the sample.

² The two models studied in this section were discovered independently by the authors, before learning of [1,3].
In the Warner model [5], $X$ is a binomial random variable with parameters $n$ and $\lambda_w = p\pi + (1-p)(1-\pi)$. The latter is independent of $\pi$ when $p = \tfrac12$, so $X$ carries no information about $\pi$ in this case. When $p \neq \tfrac12$, the maximum likelihood estimator of $\pi$ is
$$\hat{\pi}_w = \frac{X - n(1-p)}{n(2p-1)},$$
which is unbiased, with variance
$$V(\hat{\pi}_w) = \frac{\lambda_w(1-\lambda_w)}{n(2p-1)^2} = \frac{\pi(1-\pi)}{n} + \frac{p(1-p)}{n(2p-1)^2}. \tag{2.1}$$
Note that the term $\pi(1-\pi)/n$ in (2.1) is the variance of the direct, nonrandomized estimator of $\pi$ if truthful responses were obtained. The term $p(1-p)/n(2p-1)^2$ represents the additional variance introduced by the randomization procedure, necessary to obtain truthful responses. Clearly $V(\hat{\pi}_w)$ is symmetric about $p = \tfrac12$ and is minimized by taking $p$ as far from $\tfrac12$ as practicable.
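As an illustration of the Warner model just described, the following minimal sketch simulates the randomization device and evaluates the estimator together with both expressions for the variance in (2.1). The function names, the seed, and the parameter values are assumptions chosen for illustration only; they do not come from [5].

```python
import random

def warner_estimate(yes_count, n, p):
    """Warner's estimator (X - n(1-p)) / (n(2p-1)); undefined at p = 1/2."""
    if p == 0.5:
        raise ValueError("X carries no information about pi when p = 1/2")
    return (yes_count - n * (1 - p)) / (n * (2 * p - 1))

def warner_variance(pi, n, p):
    """Right-hand side of (2.1): direct-survey term plus randomization term."""
    return pi * (1 - pi) / n + p * (1 - p) / (n * (2 * p - 1) ** 2)

def warner_variance_binomial(pi, n, p):
    """Middle expression of (2.1), written with lambda_w."""
    lam = p * pi + (1 - p) * (1 - pi)
    return lam * (1 - lam) / (n * (2 * p - 1) ** 2)

def simulate_yes_count(pi, n, p, rng):
    """Count 'yes' answers from n truthful respondents (hypothetical setup)."""
    yes = 0
    for _ in range(n):
        in_group_a = rng.random() < pi      # respondent's true status
        asked_about_a = rng.random() < p    # device outcome: A or its complement
        if in_group_a == asked_about_a:     # belongs to the group indicated
            yes += 1
    return yes

rng = random.Random(0)  # fixed seed, used only for reproducibility
n, p, pi = 20000, 0.7, 0.3
x = simulate_yes_count(pi, n, p, rng)
print(warner_estimate(x, n, p), warner_variance(pi, n, p))
```

The two variance functions should agree up to rounding, which is a quick numerical check of the algebra behind (2.1).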
For the one-sample alternate question model [1], $X$ is a binomial random variable with parameters $n$ and $\lambda = p\pi + (1-p)q$. The maximum likelihood estimator of $\pi$ is
$$\hat{\pi}_1 = \frac{X - n(1-p)q}{np},$$
which is unbiased, with variance
$$V(\hat{\pi}_1) = \frac{\lambda(1-\lambda)}{np^2} = \frac{[p\pi + (1-p)q][1 - p\pi - (1-p)q]}{np^2}. \tag{2.2}$$
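The estimator and variance (2.2) can be sketched in the same way and compared with Warner's variance (2.1) at a single parameter point. The function names and the numerical values below are illustrative assumptions, not part of [1].

```python
def alt_question_estimate(yes_count, n, p, q):
    """Estimator (X - n(1-p)q) / (np) for a known proportion q."""
    return (yes_count - n * (1 - p) * q) / (n * p)

def alt_question_variance(pi, q, n, p):
    """Variance (2.2), with lambda = p*pi + (1-p)*q."""
    lam = p * pi + (1 - p) * q
    return lam * (1 - lam) / (n * p ** 2)

def warner_variance(pi, n, p):
    """Warner's variance (2.1), included only for comparison."""
    return pi * (1 - pi) / n + p * (1 - p) / (n * (2 * p - 1) ** 2)

n, p, pi, q = 100, 0.7, 0.3, 0.5   # illustrative values only
print(alt_question_variance(pi, q, n, p), warner_variance(pi, n, p))
```

At this particular point the alternate question variance is the smaller of the two, which is the behaviour the note goes on to establish uniformly in $(q,\pi)$ for suitable $p$.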
Theorem 1. Let $p \in (p_0, 1)$, where $p_0 = .339333\ldots$ is the unique solution in $[0,\tfrac12]$ of
$$\frac{1}{1+p^2} = 4p(1-p). \tag{2.3}$$
Then $V(\hat{\pi}_1) < V(\hat{\pi}_w)$ for all $(q,\pi) \in [0,1]^2$.

Proof. We first note that, in $[0,\tfrac12]$, $1/(1+p^2)$ is strictly decreasing from 1 to 4/5, and that $4p(1-p)$ is strictly increasing from 0 to 1. Hence (2.3) has a unique solution $p_0 \in [0,\tfrac12]$, the approximate value of which is given.
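The monotonicity argument above also suggests a simple way to approximate $p_0$ numerically. The sketch below uses plain bisection on $[0,\tfrac12]$; the function names and the tolerance are assumptions made for illustration.

```python
def gap(p):
    """4p(1-p) - 1/(1+p^2); its zero in [0, 1/2] solves (2.3)."""
    return 4 * p * (1 - p) - 1 / (1 + p * p)

def solve_p0(tol=1e-12):
    """Bisection on [0, 1/2], where gap(0) = -1 < 0 and gap(1/2) = 0.2 > 0."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

p0 = solve_p0()
print(round(p0, 6))
```

Bisection is enough here because the difference of the two sides of (2.3) changes sign exactly once on the interval.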
By expanding the numerator of (2.2), then adding and subtracting the term $p^2\pi$, we can write (2.2) in the form
$$V(\hat{\pi}_1) = \frac{\pi(1-\pi)}{n} + \frac{1-p}{np^2} K_p(q,\pi), \tag{2.4}$$
where (see Figure 1)
$$K_p(q,\pi) = p(1-2q)\pi + q[1 - (1-p)q]. \tag{2.5}$$
The term $(1-p)K_p(q,\pi)/np^2$ in (2.4) is the additional variance of the alternate question estimator over the direct estimator of $\pi$. Comparing (2.1) and (2.4), we see that $V(\hat{\pi}_1) < V(\hat{\pi}_w)$ if and only if
$$K_p(q,\pi) < \frac{p^3}{(2p-1)^2}. \tag{2.6}$$
We must show that (2.6) holds for all $(q,\pi) \in [0,1]^2$. Now $K_p$ is continuous in $(q,\pi)$ over the compact set $[0,1]^2$, so that a maximum exists and may be obtained by maximizing first with respect to $\pi$, then with respect to $q$.
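The rewriting of (2.2) as (2.4) can be checked numerically. The sketch below evaluates both forms on a small grid and reports the largest discrepancy; the function names and grid values are illustrative assumptions.

```python
def K(p, q, pi):
    """K_p(q, pi) as defined in (2.5)."""
    return p * (1 - 2 * q) * pi + q * (1 - (1 - p) * q)

def var_alt(pi, q, p, n=1):
    """Variance (2.2)."""
    lam = p * pi + (1 - p) * q
    return lam * (1 - lam) / (n * p ** 2)

def var_alt_decomposed(pi, q, p, n=1):
    """Form (2.4): direct-survey variance plus the additional term."""
    return pi * (1 - pi) / n + (1 - p) * K(p, q, pi) / (n * p ** 2)

grid = [0.0, 0.25, 0.5, 0.75, 1.0]
worst = max(abs(var_alt(pi, q, p) - var_alt_decomposed(pi, q, p))
            for p in (0.3, 0.4, 0.7, 0.9) for q in grid for pi in grid)
print(worst)  # expected to be at rounding-error level
```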
FIGURE 1. The surface $K_p(q,\pi)$.
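Further, since $K_p(q,\pi) = K_p(1-q, 1-\pi)$, we may restrict attention to $(q,\pi) \in [0,\tfrac12] \times [0,1]$. We may assume $p \neq \tfrac12$, since $V(\hat{\pi}_w)$ is not defined at $p = \tfrac12$. Fixing $q \in [0,\tfrac12]$, it is clear that
$$\max_{\pi \in [0,1]} K_p(q,\pi) = K_p(q,1) = -(1-p)q^2 + (1-2p)q + p,$$
a concave quadratic in $q$ with maximum at $q^* = \frac{1-2p}{2(1-p)}$. If $p \in (0,\tfrac12)$, then $q^* \in (0,\tfrac12)$ also, while if $p \in (\tfrac12,1)$, $q^* < 0$. Hence
$$\max_{(q,\pi) \in [0,1]^2} K_p(q,\pi) = \max_{q \in [0,\frac12]} K_p(q,1) = \begin{cases} K_p(q^*,1) = \dfrac{1}{4(1-p)}, & p \in (0,\tfrac12), \\[4pt] K_p(0,1) = p, & p \in (\tfrac12,1). \end{cases} \tag{2.7}$$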
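Substituting (2.7) into (2.6), we have $V(\hat{\pi}_1) < V(\hat{\pi}_w)$ for all $(q,\pi) \in [0,1]^2$ if either

(a) $p \in (0,\tfrac12)$ and $\dfrac{1}{4(1-p)} < \dfrac{p^3}{(2p-1)^2}$, or

(b) $p \in (\tfrac12,1)$ and $p < \dfrac{p^3}{(2p-1)^2}$.

Now the inequality in (b) holds for all $p \in (\tfrac12,1)$, while the inequality in (a) is equivalent to $1/(1+p^2) < 4p(1-p)$, hence holds if $p \in (p_0,\tfrac12)$. □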
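The two cases of the proof can be illustrated numerically by comparing the closed-form maximum (2.7) with a brute-force maximum of $K_p$ over a grid, and then testing it against the bound in (2.6). This is only a sketch: the function names, the grid resolution, and the chosen values of $p$ are assumptions.

```python
def K(p, q, pi):
    """K_p(q, pi) as defined in (2.5)."""
    return p * (1 - 2 * q) * pi + q * (1 - (1 - p) * q)

def max_K(p):
    """Closed-form maximum (2.7) of K_p over the unit square; p != 1/2."""
    return 1 / (4 * (1 - p)) if p < 0.5 else p

def bound(p):
    """Right-hand side of (2.6)."""
    return p ** 3 / (2 * p - 1) ** 2

def max_K_grid(p, steps=200):
    """Brute-force maximum of K_p over an evenly spaced grid."""
    grid = [i / steps for i in range(steps + 1)]
    return max(K(p, q, pi) for q in grid for pi in grid)

for p in (0.30, 0.35, 0.70, 0.90):   # illustrative values; 0.30 lies below p0
    print(p, max_K(p), max_K_grid(p), max_K(p) < bound(p))
```

For $p = 0.30$ the maximum of $K_p$ exceeds the bound, so the comparison with Warner's estimator fails somewhere in the unit square, consistent with the restriction $p > p_0$ in Theorem 1.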
The general shape of the surface $K_p(q,\pi)$ is shown in Figure 1, for a value of $p \in (\tfrac12,1)$. The $q$-sections of $K_p$ are linear in $\pi$, while the $\pi$-sections are quadratic in $q$. If $p \in (\tfrac12,1)$, the $\pi$-sections take their maximum value at values $q^* \notin [0,1]$ for extreme values of $\pi$, hence are monotone in $q \in [0,1]$.
It is clear from Figure 1 (and easily proved) that for $p$, $\pi$ fixed, $V(\hat{\pi}_1)$ is minimized over $q \in [0,1]$ by taking $q = 0$ if $\pi < \tfrac12$ or $q = 1$ if $\pi > \tfrac12$. More realistically, if $q$ is constrained to an interval $[q_1,q_2]$, then $V(\hat{\pi}_1)$ is minimized at $q = q_1$ if $\pi < \tfrac12$ or $q = q_2$ if $\pi > \tfrac12$, provided $q_2 = 1-q_1$. If $q_1$, $q_2$ are not symmetric about $\tfrac12$, then one may attain smaller variance at $q_2$ than at $q_1$, even when $\pi < \tfrac12$.
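The remark on the choice of $q$ can be illustrated by comparing the variance (2.2) at the two endpoints of an allowed interval. Since (2.2) is a concave quadratic in $q$, its minimum over an interval is attained at an endpoint. The helper names and the numerical values below are assumptions for illustration.

```python
def var_alt(pi, q, p, n=1):
    """Variance (2.2) of the one-sample alternate question estimator."""
    lam = p * pi + (1 - p) * q
    return lam * (1 - lam) / (n * p ** 2)

def best_endpoint(pi, p, q1, q2):
    """Endpoint of [q1, q2] with the smaller variance (ties go to q1)."""
    return q1 if var_alt(pi, q1, p) <= var_alt(pi, q2, p) else q2

p = 0.7
print(best_endpoint(0.3, p, 0.2, 0.8))   # symmetric interval, pi < 1/2
print(best_endpoint(0.7, p, 0.2, 0.8))   # symmetric interval, pi > 1/2
print(best_endpoint(0.4, p, 0.5, 1.0))   # asymmetric interval, pi < 1/2
```

In the third call the interval is not symmetric about $\tfrac12$, and the upper endpoint gives the smaller variance even though $\pi < \tfrac12$.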
3. THE TWO-SAMPLE CASE
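Assume now that $\mu$ is unknown, so that the total sample of $n$ is split into two samples of sizes $n_1$ and $n_2$ with respective randomization probabilities $p_1$ and $p_2$. Without loss of generality, we may assume $p_1 > p_2$, and write $p = p_1 = \max(p_1,p_2)$. Then if $X_i$ denotes the number of "yes" responses in the $i$th sample ($i = 1,2$), $X_1$ and $X_2$ are independent random variables and $X_i$ is binomially distributed with parameters $n_i$ and $\lambda_i = p_i\pi + (1-p_i)\mu$. The maximum likelihood estimator [1] of $\pi$ is
$$\hat{\pi}_2 = \frac{n_2(1-p_2)X_1 - n_1(1-p_1)X_2}{n_1 n_2 (p_1-p_2)},$$
which exists whenever $p_1 \neq p_2$, is unbiased, and has variance
$$V(\hat{\pi}_2) = \frac{1}{(p_1-p_2)^2}\left[\frac{(1-p_2)^2\lambda_1(1-\lambda_1)}{n_1} + \frac{(1-p_1)^2\lambda_2(1-\lambda_2)}{n_2}\right]. \tag{3.1}$$
It is shown in [1] that (3.1) is minimized over $n_1$, $n_2$, subject to $n_1 + n_2 = n$, by taking $n_1$, $n_2$ in the ratio
$$\frac{n_1}{n_2} = \frac{(1-p_2)\sqrt{\lambda_1(1-\lambda_1)}}{(1-p_1)\sqrt{\lambda_2(1-\lambda_2)}}, \tag{3.2}$$
in which case (3.1) becomes
$$V(\hat{\pi}_2) = \frac{1}{n(p_1-p_2)^2}\left[(1-p_2)\sqrt{\lambda_1(1-\lambda_1)} + (1-p_1)\sqrt{\lambda_2(1-\lambda_2)}\right]^2. \tag{3.3}$$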
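The two-sample estimator and its allocation problem can be sketched as follows. The variance is computed directly from the estimator under the independent binomial assumptions stated above, and the split of $n$ is treated as real-valued, ignoring integer rounding. All function names and numerical values are illustrative assumptions; the formulas in [1] should be consulted for the authoritative statements.

```python
import math

def lam(p_i, pi, mu):
    """Probability of a 'yes' in a sample with randomization probability p_i."""
    return p_i * pi + (1 - p_i) * mu

def two_sample_estimate(x1, x2, n1, n2, p1, p2):
    """Estimator of pi from the two 'yes' counts; requires p1 != p2."""
    return (n2 * (1 - p2) * x1 - n1 * (1 - p1) * x2) / (n1 * n2 * (p1 - p2))

def two_sample_variance(pi, mu, n1, n2, p1, p2):
    """Variance of the estimator for independent binomial counts."""
    l1, l2 = lam(p1, pi, mu), lam(p2, pi, mu)
    a = (1 - p2) ** 2 * l1 * (1 - l1)
    b = (1 - p1) ** 2 * l2 * (1 - l2)
    return (a / n1 + b / n2) / (p1 - p2) ** 2

def optimal_split(pi, mu, n, p1, p2):
    """Real-valued (n1, n2) minimizing the variance subject to n1 + n2 = n."""
    l1, l2 = lam(p1, pi, mu), lam(p2, pi, mu)
    w1 = (1 - p2) * math.sqrt(l1 * (1 - l1))
    w2 = (1 - p1) * math.sqrt(l2 * (1 - l2))
    return n * w1 / (w1 + w2), n * w2 / (w1 + w2)

pi, mu, n, p1, p2 = 0.3, 0.5, 1000, 0.7, 0.2   # illustrative values only
n1, n2 = optimal_split(pi, mu, n, p1, p2)
print(n1, n2, two_sample_variance(pi, mu, n1, n2, p1, p2))
```

Because the split depends on $\pi$ and $\mu$, it cannot be computed in practice without prior guesses for both; the sketch simply plugs in assumed values.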
Of course the optimal allocation ratio (3.2) depends on the unknown parameters $\mu$, $\pi$, so is unavailable to the statistician. We shall assume, however, as in [3], that $n_1$, $n_2$ satisfy (3.2) in order to compare the variance with Warner's variance independently of total sample size $n$. Differentiating (3.3) with respect to $p_i$, we observe (see [2]) that $\partial V(\hat{\pi}_2)/\partial p_1$ is negative and $\partial V(\hat{\pi}_2)/\partial p_2$ is positive, so that $V(\hat{\pi}_2)$ in (3.3) is minimized over $p_1$, $p_2$ by taking $p = p_1$ as large as practicable, and taking $p_2$ as small as possible, i.e., $p_2 = 0$. In other words, the second sample is a direct survey, involving no randomization, used solely to estimate $\mu$.

The estimator with optimal sample allocation (3.2) and with $p_2 = 0$ is called the "optimized" estimator in [3]. In this case, setting $p = p_1$, $\lambda = \lambda_1 = p\pi + (1-p)\mu$, (3.3) becomes
$$V(\hat{\pi}_2) = \frac{1}{np^2}\left[(1-p)\sqrt{\mu(1-\mu)} + \sqrt{\lambda(1-\lambda)}\right]^2. \tag{3.4}$$

Theorem 2. Let $p \in (p_{00}, 1)$, where $p_{00} = (3-\sqrt{5})/2 = .381966\ldots$ Then $V(\hat{\pi}_2) < V(\hat{\pi}_w)$ for all $(\mu,\pi) \in [0,1]^2$.
Proof. We write
$$V(\hat{\pi}_2) = \frac{1}{np^2} L_p^2(\mu,\pi), \qquad L_p(\mu,\pi) = (1-p)\sqrt{\mu(1-\mu)} + \sqrt{\lambda(1-\lambda)}.$$
$L_p$ is continuous in $(\mu,\pi)$ over the compact set $[0,1]^2$, so as in Section 2, we may iterate maxima again to obtain the maximum of $L_p$ over $[0,1]^2$, which will maximize $V(\hat{\pi}_2)$. We first maximize over $\mu \in [0,1]$. $L_p(\cdot,\pi)$ is a positive linear combination of square roots of unimodal functions. Since the square root of a unimodal function is unimodal, $L_p(\cdot,\pi)$ will be maximized at a point $\mu^*$ between the values $\mu_1 = \tfrac12$ maximizing $\mu(1-\mu)$ and $\mu_2 = \frac{1-2p\pi}{2(1-p)}$ maximizing $\lambda(1-\lambda)$. Computing $\partial L_p/\partial\mu$ and setting it equal to zero, we obtain
$$\frac{1-2\mu}{\sqrt{\mu(1-\mu)}} = -\frac{1-2\lambda}{\sqrt{\lambda(1-\lambda)}}. \tag{3.5}$$
Each side of (3.5) will have the same sign if and only if $\mu$ is between $\mu_1$ and $\mu_2$. In particular, this holds for $\mu = \mu^*$, thus we may square each side, and simplify, obtaining the quadratic in $\mu$,
$$(2-p)\mu^2 - [1+2\pi(1-p)]\mu + \pi(1-p\pi) = 0. \tag{3.6}$$
One root of (3.6) is $\pi$; the second is $(1-p\pi)/(2-p)$, which is in $[0,1]$ for all $p$, $\pi$. If $\pi < \tfrac12$, then $\mu_2 > \mu_1$, and if $\pi > \tfrac12$, then $\mu_2 < \mu_1$; hence $\pi$ is not the maximizing root. Consequently, $\mu^* = (1-p\pi)/(2-p)$. (Observe that for $\mu = \mu^*$, $\lambda(1-\lambda) = \mu(1-\mu)$.) Thus
$$\max_{\mu \in [0,1]} L_p(\mu,\pi) = L_p(\mu^*,\pi),$$
and at this value,
$$V(\hat{\pi}_2) = \frac{\pi(1-\pi)}{n} + \frac{1-p}{np^2}. \tag{3.7}$$
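The maximization over $\mu$ can be checked numerically. The sketch below compares a brute-force maximum of $L_p(\cdot,\pi)$ on a grid with its value at $\mu^* = (1-p\pi)/(2-p)$, and with the right-hand side of (3.7). Function names, grid size, and parameter values are assumptions chosen for illustration.

```python
import math

def L(p, mu, pi):
    """L_p(mu, pi); the optimized variance (3.4) equals L**2 / (n p**2)."""
    lam = p * pi + (1 - p) * mu
    # max(0.0, .) guards against tiny negative values from rounding at the boundary
    return ((1 - p) * math.sqrt(max(0.0, mu * (1 - mu)))
            + math.sqrt(max(0.0, lam * (1 - lam))))

def mu_star(p, pi):
    """Maximizing value of mu stated in the proof."""
    return (1 - p * pi) / (2 - p)

def worst_case_variance(p, pi, n=1):
    """Right-hand side of (3.7)."""
    return pi * (1 - pi) / n + (1 - p) / (n * p ** 2)

p, pi = 0.7, 0.3   # illustrative values only
grid_max = max(L(p, i / 1000, pi) for i in range(1001))
top = L(p, mu_star(p, pi), pi)
print(grid_max, top, top ** 2 / p ** 2, worst_case_variance(p, pi))
```

The last two printed numbers should agree up to rounding, since (3.7) is (3.4) evaluated at $\mu = \mu^*$ with $n = 1$.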
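Comparing (3.7) with (2.1), we see that $V(\hat{\pi}_2) < V(\hat{\pi}_w)$ for all $(\mu,\pi) \in [0,1]^2$ if and only if
$$\frac{1-p}{p^2} < \frac{p(1-p)}{(2p-1)^2},$$
or
$$p^3 - 4p^2 + 4p - 1 > 0. \tag{3.8}$$
The roots of the cubic in (3.8) are $p = 1$, $p_{00} = (3-\sqrt{5})/2$, and $(3+\sqrt{5})/2 > 1$. Hence (3.8) holds for $p \in (0,1)$ if $p \in (p_{00},1)$. □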
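The roots and the sign pattern of the cubic in (3.8) are easy to confirm numerically. In the sketch below the function name and the test points are assumptions for illustration.

```python
import math

def cubic(p):
    """Left-hand side of (3.8): p^3 - 4p^2 + 4p - 1 = (p - 1)(p^2 - 3p + 1)."""
    return p ** 3 - 4 * p ** 2 + 4 * p - 1

p00 = (3 - math.sqrt(5)) / 2
roots = (p00, 1.0, (3 + math.sqrt(5)) / 2)

print(round(p00, 6))
print([cubic(r) for r in roots])      # each value should be at rounding-error level
print(cubic(0.30), cubic(0.50))       # one test point below p00, one inside (p00, 1)
```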
4. REMARKS
As is expected, if we compare the "optimized" $V(\hat{\pi}_2)$ (with respect to $n_1$, $n_2$ and $p_2$) and $V(\hat{\pi}_1)$, we find that $V(\hat{\pi}_1) \le V(\hat{\pi}_2)$ for all $p$, $q = \mu$ and $\pi$. This can be seen if the variances are written as follows:
$$V(\hat{\pi}_1) = \frac{\pi(1-\pi)}{n} + \frac{1-p}{np^2}\left\{[1-(1-p)q]q + p\pi(1-2q)\right\}$$
and
$$V(\hat{\pi}_2) = \frac{\pi(1-\pi)}{n} + \frac{1-p}{np^2}\left\{[2-p-2(1-p)q]q + p\pi(1-2q) + 2\sqrt{q(1-q)\lambda(1-\lambda)}\right\}.$$
A sufficient condition is then
$$1 - (1-p)q \le 2 - p - 2(1-p)q,$$
which is equivalent to
$$(1-p)(1-q) \ge 0.$$
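The inequality $V(\hat{\pi}_1) \le V(\hat{\pi}_2)$ can also be spot-checked on a grid, taking $q = \mu$. The function names, the grid, and the small tolerance for rounding are assumptions made for this sketch.

```python
import math

def var_one_sample(pi, q, p, n=1):
    """Variance (2.2) of the one-sample estimator with known q."""
    lam = p * pi + (1 - p) * q
    return lam * (1 - lam) / (n * p ** 2)

def var_two_sample_optimized(pi, mu, p, n=1):
    """Variance (3.4) of the 'optimized' two-sample estimator (p2 = 0)."""
    lam = p * pi + (1 - p) * mu
    # max(0.0, .) guards against tiny negative values from rounding at the boundary
    root_sum = ((1 - p) * math.sqrt(max(0.0, mu * (1 - mu)))
                + math.sqrt(max(0.0, lam * (1 - lam))))
    return root_sum ** 2 / (n * p ** 2)

grid = [i / 20 for i in range(21)]
ok = all(var_one_sample(pi, q, p) <= var_two_sample_optimized(pi, q, p) + 1e-12
         for p in (0.2, 0.5, 0.8) for q in grid for pi in grid)
print(ok)
```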
$V(\hat{\pi}_1)$, as a function of $(q,\pi)$ in Theorem 1, and $L_p$, as a function of $(\mu,\pi)$ in Theorem 2, enjoy a stronger property than that of unimodality; in fact they are concave. Hence an alternate justification for the proofs may proceed along the lines of showing this concavity and, hence, the concavity of the iterated maxima. This may be seen by evaluating the Hessians (matrices of second order partial derivatives) of $V(\hat{\pi}_1)$ and $L_p$, which turn out to be non-positive definite; see [4, p. 27].
ACKNOWLEDGMENT
We would like to express our gratitude to Professor Bernard Greenberg for
a helpful discussion.
REFERENCES
[1] Greenberg, Bernard G. et al., "The Unrelated Question Randomized Response Model: Theoretical Framework," Journal of the American Statistical Association, 64 (June 1969), 520-539.
[2] Greenberg, Bernard G. and Roy R. Kuebler, Jr., "Applications of Randomized Response Technique in Obtaining Quantitative Data," Journal of the American Statistical Association, 66 (June 1971), 243-250.
[3] Moors, J. A., "Optimization of the Unrelated Question Randomized Response Model," Journal of the American Statistical Association, 66 (September 1971), 627-629.
[4] Rockafellar, R. Tyrrell, Convex Analysis, Princeton University Press, Princeton, New Jersey, 1970.
[5] Warner, Stanley, "Randomized Response: A Survey Technique for Eliminating Evasive Answer Bias," Journal of the American Statistical Association, 60 (March 1965), 63-69.
[6] Warner, Stanley, "Linear Randomized Response Model," Journal of the American Statistical Association, 66 (December 1971), 884-888.