Kammerman, Lisa A. (1984). "Selecting Optimal Values for π_Y in the Unrelated Question Randomized Response Model, π_Y Known."

SELECTING OPTIMAL VALUES FOR π_Y
IN THE UNRELATED
QUESTION RANDOMIZED RESPONSE MODEL,
π_Y KNOWN
by
Lisa A.
Kammerman
Department of Biostatistics
University of North Carolina at Chapel Hill
Institute of Statistics Mimeo Series No. 1474
December 1984
SELECTING OPTIMAL VALUES FOR π_Y IN THE UNRELATED
QUESTION RANDOMIZED RESPONSE MODEL, π_Y KNOWN
by
Lisa Anne Kammerman
A Dissertation submitted to the Faculty of the University
of North Carolina at Chapel Hill in partial
fulfillment of the requirements for the degree of
Doctor of Philosophy in the Department of Biostatistics.
Chapel Hill
1984
LISA ANNE KAMMERMAN. Selecting Optimal Values for π_Y in the Unrelated
Question Randomized Response Model, π_Y Known (Under the direction of
BERNARD G. GREENBERG.)
In surveys containing direct questions on sensitive issues,
individuals may refuse to participate or they may choose to give
incorrect answers, resulting in a biased estimate of the population
proportion of individuals who possess a sensitive characteristic (π_A).
The present research focuses on the unrelated question randomized
response technique, which offers respondents a choice between answering
the sensitive question and an innocuous question, thus eliciting higher
response rates and better cooperation.
The proportion of the population
with the nonsensitive characteristic (π_Y) is assumed known.
The perceived protection, which affects a respondent's level of
cooperation, is directly related to π_Y, the sensitivity of the topic,
and the probability of selecting the sensitive question (p).
As π_Y increases, the bias of π̂_A decreases but the variance increases; the
converse also holds.
The objective of this research is to recommend a value of π_Y
based upon the sensitivity of the stigmatizing question, the
perceived protection, a presumed value of π_A, and p.
We suggest categorizing social topics into eleven levels of
sensitivity.
For each category, the incomplete beta function is used to
generate a postulated relationship between the probability of giving a
truthful response and the degree of protection.
Using MSE criteria, the optimal values of π_Y are obtained for two models of
respondent behavior through the use of the postulated relationships.
The first takes into account the probability of non-truthful reporting
among individuals with the sensitive attribute and among those with the
innocuous attribute.
The second assumes that each individual has his or her
own level of tolerance, which is compared with the perceived
protection in order to determine whether or not to answer truthfully.
Finally, MSE efficiency ratios are used to explore the influence of p
and to make recommendations concerning the choice of p.
ACKNOWLEDGEMENTS
Many individuals contributed to the preparation of this
dissertation.
I am most indebted to my adviser, Dr. Bernard G.
Greenberg, for his tremendous faith in my abilities and his concern for
me as an individual.
His nurturing and caring were constant sources of
encouragement.
I greatly admire his courage, and the way in which he
interweaves the professional and personal aspects of his life.
I can
only hope that my own career will reflect much of what he has given me.
I am very appreciative of Dr. Dana Quade for his major
contributions to this study, both in his theoretical suggestions,
especially those for Model II, and in the amount of time he gave to this
project.
I have always admired his ability to reduce seemingly
complicated concepts to understandable and practical constructs, and
will most certainly use him as a role model in my future consulting and
teaching responsibilities.
I will miss our annual celebration of our
mutual birthdays.
I would also like to thank my other committee members,
Drs. James R. Abernathy, Daniel G. Horvitz, William D. Kalsbeek, and
Berton H. Kaplan, for the time, contributions, and support they gave to
this project.
This final product also reflects the contributions of
other individuals whom I would like to thank:
Joe Janis for introducing
me to microcomputers, Kathie and Jerry Edwards for teaching me how to
use them, Carl Yoshizawa for a helpful suggestion, and the Oral Cancer
Project for supporting my literature search and the use of the
microcomputers.
Many friends have provided me with a never-ending source of
encouragement and support throughout my years here at UNC; thus they
too have ultimately contributed to this dissertation which marks the end
of my graduate studies in Chapel Hill.
These folks include my adviser
and classmate Roger Grimson for his unwavering support and hours of
listening; Jim Knoke for many helpful suggestions regarding my graduate
career; Dick Shachtman for his concern for me; my many fellow graduate
students and especially Kerrie Boyle, Violette Kaslca, Rocky Feuer,
Bercedis Peterson, Dave Shore, Steve Snapinn, Sylvia Wallace and Kathie
Edwards for their very special contributions; Luccia Wolfe for her
empathy and encouragement; Bill and Michele Sollecito for their
energetic spirits and support, and for the best cafe in the Southern
Part of Heaven; Ken Lessler for his caring, nurturing, and special
friendship; Martha Hamilton for her patience, encouragement, and
support; and Trailer 32 for their support and acceptance of me during a
very difficult time.
These years have also given me the opportunity to
grow close to all my kin here in Chapel Hill--Carol and Kevin Stack, and
Ken Williamson--whom I will miss very dearly and who mean more to me
than just family.
And I am most grateful to my parents, Samuel and Sylvia Kammerman,
for laying the early foundations for the importance of education in my
life, and to them and my brother Bob for their support throughout my
educational career.
TABLE OF CONTENTS
ACKNOWLEDGEMENTS
LIST OF FIGURES
LIST OF TABLES
Chapter
I. A HISTORY OF RANDOMIZED RESPONSE MODELS AND STUDIES
   1.1 Introduction
   1.2 Direct question model
   1.3 Randomized response models
      1.3.1 Warner's model
      1.3.2 Unrelated question models
         1.3.2.a One unrelated question
         1.3.2.b Two unrelated questions
      1.3.3 Multi-proportion models
      1.3.4 Quantitative models
      1.3.5 Contamination models
      1.3.6 Multiple trials randomized response models
      1.3.7 Linear and general randomized response models
      1.3.8 Two-stage or conditional randomized response models
      1.3.9 Multiple sensitive characteristics
   1.4 Comparison studies
   1.5 Applications
   1.6 Validation studies
II. RANDOMIZED RESPONSE: A PRACTICAL TECHNIQUE?
   II.1 Introduction
   II.2 Proposed for highly sensitive issues
   II.3 Protection of the respondent and the interviewer
   II.4 Effect of respondent behavior on population estimates
   II.5 Relationship between respondent protection and the choice of parameters
   II.6 Summary
III. MODEL I: AN UNRELATED QUESTION RANDOMIZED RESPONSE MODEL THAT ALLOWS NON-TRUTHFUL RESPONSES AMONG A AND Y INDIVIDUALS
   III.1 Introduction
   III.2 Definitions of probabilities of truthful reporting
   III.3 An estimator for π_A and its properties
IV. THE RELATIONSHIP BETWEEN RESPONDENT HAZARDS AND T
   IV.1 Introduction
   IV.2 Definition of risk of suspicion, P(A|Yes)
   IV.3 Definition of hazard functions
      IV.3.1 Introduction
      IV.3.2 Limited hazards for AY individuals
      IV.3.3 Limited hazards for Ay individuals
      IV.3.4 Limited hazards for aY individuals
      IV.3.5 Limited hazards for ay individuals
      IV.3.6 Summary
   IV.4 Relationship between limited hazards and T
      IV.4.1 Description of the relationship
      IV.4.2 A family of relationships between T and limited hazards
V. AN INVESTIGATION OF THE CHOICE OF π_Y
   V.1 Introduction
   V.2 Specification of parameters
      V.2.1 Choosing p, π_A, and n
      V.2.2 Choosing P(A|Yes), π_Y, and T
      V.2.3 Summary
   V.3 Selecting π_Y,opt for minimum MSE
   V.4 Recommendation for p
VI. MODEL II: AN UNRELATED QUESTION RANDOMIZED RESPONSE MODEL THAT ALLOWS NON-TRUTHFUL RESPONSES AMONG A AND a INDIVIDUALS
   VI.1 Introduction
   VI.2 An estimator for π_A and its properties
   VI.3 Defining Q and F(Q)
   VI.4 Choosing π_Y,opt
   VI.5 Recommendation for p
VII. RESULTS FROM THE INVESTIGATIONS OF THE OPTIMAL π_Y
   VII.1 Introduction
   VII.2 Results for Model I
      VII.2.1 Investigation of π_Y,opt
      VII.2.2 MSE efficiency ratios for p
   VII.3 Results for Model II
      VII.3.1 Investigation of π_Y,opt
      VII.3.2 MSE efficiency ratios for p
VIII. CONCLUSIONS AND RECOMMENDATIONS FOR FUTURE RESEARCH
APPENDICES
BIBLIOGRAPHY
LIST OF FIGURES
IV.1 POSTULATED RELATIONSHIP BETWEEN PROBABILITY OF A TRUTHFUL RESPONSE (T) AND THE LIMITED HAZARD
IV.2 RELATIONSHIPS BETWEEN PROBABILITY OF A TRUTHFUL RESPONSE (T) AND THE LIMITED HAZARD
VIII.1 POSTULATED RELATIONSHIP BETWEEN PROBABILITY OF A TRUTHFUL RESPONSE (T) AND THE LIMITED HAZARD FOR ELEVEN CATEGORIES OF SENSITIVITY
B.1 MODEL I: PLOTS OF π_Y,opt VS. SENSITIVITY FOR p=1/2, 2/3, AND 3/4; π_A=.01
     b. n=1000
     c. n=10000
B.2 MODEL I: PLOTS OF π_Y,opt VS. SENSITIVITY FOR p=1/2, 2/3, AND 3/4; π_A=.02; n=1000
B.3 MODEL I: PLOTS OF π_Y,opt VS. SENSITIVITY FOR p=1/2, 2/3, AND 3/4; π_A=.05; n=1000
B.4 MODEL I: PLOTS OF π_Y,opt VS. SENSITIVITY FOR p=1/2, 2/3, AND 3/4; π_A=.10; n=1000
B.5 MODEL I: PLOTS OF π_Y,opt VS. SENSITIVITY FOR p=1/2, 2/3, AND 3/4; π_A=.25; n=10000
B.6 MODEL II: PLOTS OF π_Y,opt VS. SENSITIVITY FOR p=1/2, 2/3, AND 3/4; π_A=.01
     b. n=1000
     c. n=10000
B.7 MODEL II: PLOTS OF π_Y,opt VS. SENSITIVITY FOR p=1/2, 2/3, AND 3/4; π_A=.02; n=1000
B.8 MODEL II: PLOTS OF π_Y,opt VS. SENSITIVITY FOR p=1/2, 2/3, AND 3/4; π_A=.05; n=1000
B.9 MODEL II: PLOTS OF π_Y,opt VS. SENSITIVITY FOR p=1/2, 2/3, AND 3/4; π_A=.10; n=1000
B.10 MODEL II: PLOTS OF π_Y,opt VS. SENSITIVITY FOR p=1/2, 2/3, AND 3/4; π_A=.25; n=10000
LIST OF TABLES
VII.1 MODEL I: OPTIMAL VALUES FOR π_Y GIVEN π_A, p, AND SENSITIVITY OF STIGMATIZING QUESTION; n=100
VII.2 MODEL I: OPTIMAL VALUES FOR π_Y GIVEN π_A, p, AND SENSITIVITY OF STIGMATIZING QUESTION; n=500
VII.3 MODEL I: OPTIMAL VALUES FOR π_Y GIVEN π_A, p, AND SENSITIVITY OF STIGMATIZING QUESTION; n=1000
VII.4 MODEL I: OPTIMAL VALUES FOR π_Y GIVEN π_A, p, AND SENSITIVITY OF STIGMATIZING QUESTION; n=10000
VII.5 MODEL I: MSE EFFICIENCY RATIOS FOR p, GIVEN π_Y, π_A, AND DEGREE OF SENSITIVITY OF STIGMATIZING QUESTION; n=100
VII.6 MODEL I: MSE EFFICIENCY RATIOS FOR p, GIVEN π_Y, π_A, AND DEGREE OF SENSITIVITY OF STIGMATIZING QUESTION; n=500
VII.7 MODEL I: MSE EFFICIENCY RATIOS FOR p, GIVEN π_Y, π_A, AND DEGREE OF SENSITIVITY OF STIGMATIZING QUESTION; n=1000
VII.8 MODEL I: MSE EFFICIENCY RATIOS FOR p, GIVEN π_Y, π_A, AND DEGREE OF SENSITIVITY OF STIGMATIZING QUESTION; n=10000
VII.9 MODEL II: OPTIMAL VALUES FOR π_Y GIVEN π_A, p, AND SENSITIVITY OF STIGMATIZING QUESTION; n=100
VII.10 MODEL II: OPTIMAL VALUES FOR π_Y GIVEN π_A, p, AND SENSITIVITY OF STIGMATIZING QUESTION; n=100
VII.11 MODEL II: OPTIMAL VALUES FOR π_Y GIVEN π_A, p, AND SENSITIVITY OF STIGMATIZING QUESTION; n=500
VII.12 MODEL II: OPTIMAL VALUES FOR π_Y GIVEN π_A, p, AND SENSITIVITY OF STIGMATIZING QUESTION; n=1000
VII.13 MODEL II: MSE EFFICIENCY RATIOS FOR p, GIVEN π_Y, π_A, AND DEGREE OF SENSITIVITY OF STIGMATIZING QUESTION; n=100
VII.14 MODEL II: MSE EFFICIENCY RATIOS FOR p, GIVEN π_Y, π_A, AND DEGREE OF SENSITIVITY OF STIGMATIZING QUESTION; n=500
VII.15 MODEL II: MSE EFFICIENCY RATIOS FOR p, GIVEN π_Y, π_A, AND DEGREE OF SENSITIVITY OF STIGMATIZING QUESTION; n=1000
VII.16 MODEL II: MSE EFFICIENCY RATIOS FOR p, GIVEN π_Y, π_A, AND DEGREE OF SENSITIVITY OF STIGMATIZING QUESTION; n=10000
I. A HISTORY OF RANDOMIZED RESPONSE MODELS AND STUDIES
1.1 Introduction
This chapter gives a history of the development of randomized
response models beginning with a description of the direct question
estimator for population proportions.
Issues related to respondent
hazards and jeopardy are mentioned only briefly here; the next chapter
presents a more detailed discussion.
Also included is a review of
comparison and validation studies of randomized response models, and a
description of various applications.
1.2 Direct question model
In most surveys designed to estimate certain characteristics of a
population, respondents are asked direct questions about the attributes
under study.
Suppose we wish to estimate the proportion (π_A) of a
population of size N who have a certain characteristic.
If we take a
simple random sample (SRS) of size n, without replacement, then the
proportion (π̂_A)_R of the sample who claim to have the characteristic is
the regular estimator of π_A.
Ignoring for the moment the possibility of
nonsampling errors, we have

$$E[(\hat{\pi}_A)_R] = \pi_A, \tag{1.1}$$

with variance

$$\sigma^2[(\hat{\pi}_A)_R] = \frac{\pi_A(1-\pi_A)}{n} \cdot \frac{N-n}{N-1}, \tag{1.2}$$

where
π̂_A = observed proportion of the attribute in the sample,
N = population size, and
n = sample size,
and (N − n)/(N − 1) is the finite population correction (fpc).
In most applications, samples are selected from very large populations.
These situations are practically equivalent to selecting an SRS from an
infinite population.
Thus, the fpc is close to unity and (1.2) reduces
to

$$\sigma^2[(\hat{\pi}_A)_R] = \frac{\pi_A(1-\pi_A)}{n}. \tag{1.3}$$
For the remainder of this paper, we assume that populations are
infinitely large, unless otherwise specified.
1.3 Randomized response models
1.3.1 Warner's model
Researchers often design sample surveys to estimate the
population proportion of individuals who possess a sensitive
characteristic.
In surveys containing direct questions on sensitive
issues, individuals may refuse to participate or, if they do
participate, they may choose to give incorrect answers.
Thus, the
theory of the preceding section does not apply, and the regular
estimator could be very biased.
In order to elicit higher response
rates and better cooperation among participants, Warner (1965) proposed
the use of a randomized response design.
As he originally conceived the problem, Warner selected an SRS of n
people without replacement from a population of individuals who fell
into one of two mutually exclusive groups:
Group A: members with the sensitive characteristic, and
Group B: members without the characteristic.
He then proposed that participants use a spinner as a randomizing
device to select either a question concerning the sensitive attribute or
its negation.
Thus, the spinner would point either to the letter "A" or
to the letter "B".
Unobserved by the interviewer, each participant
would spin the device and, without revealing the letter selected by the
spinner, would respond with a "Yes" or "No" answer with respect to
membership in the group denoted by the selected letter.
Assuming that all respondents answer truthfully, the Warner
estimator of the true population proportion of individuals with the
sensitive characteristic, which is the maximum likelihood estimate, is
obtained as follows.
Let
π_A = proportion of the population with the sensitive characteristic (A),
p = the probability that the spinner points to A,
n = sample size, and
n_1 = the number of observed "Yes" responses in the sample.
Then the Warner estimator is

$$(\hat{\pi}_A)_W = \frac{p-1}{2p-1} + \frac{n_1}{(2p-1)n}, \quad \text{provided } p \neq 1/2. \tag{1.4}$$

The estimator is unbiased,

$$E[(\hat{\pi}_A)_W] = \pi_A, \tag{1.5}$$

with variance

$$\sigma^2[(\hat{\pi}_A)_W] = \frac{\pi_A(1-\pi_A)}{n} + \frac{p(1-p)}{n(2p-1)^2}. \tag{1.6}$$
Note that the first term is the variance associated with a direct
question situation and that the second term is the additional
variance associated with the randomized response model.
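A minimal Python sketch of the Warner estimator (1.4) and its variance (1.6) follows; the function names are hypothetical, and the sketch assumes full, truthful cooperation as the derivation does.

```python
def warner_estimate(n_yes: int, n: int, p: float) -> float:
    """Warner estimator (1.4); undefined when p = 1/2."""
    if p == 0.5:
        raise ValueError("Warner's estimator requires p != 1/2")
    return (p - 1) / (2 * p - 1) + n_yes / ((2 * p - 1) * n)


def warner_variance(pi_a: float, n: int, p: float) -> float:
    """Variance (1.6): the direct-question term plus the randomization penalty."""
    direct_term = pi_a * (1 - pi_a) / n
    randomization_term = p * (1 - p) / (n * (2 * p - 1) ** 2)
    return direct_term + randomization_term


# With pi_A = .2 and p = .7 the expected 'Yes' proportion is .7(.2) + .3(.8) = .38,
# so 380 'Yes' answers out of 1000 should give back an estimate of about .2.
print(warner_estimate(380, 1000, 0.7))
print(warner_variance(0.2, 1000, 0.7))
```

For these values the randomization term (about .0013) is roughly eight times the direct-question term (.00016), which illustrates the price paid for the protection the spinner offers.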
Warner illustrated the conditions under which his technique gives
better estimates than the direct question, based upon certain
assumptions and a mean squared error (MSE) criterion.
He assumed that a
low value of p induces full cooperation among participants, whereas the
regular (or direct question) estimates are affected by non-truthful
responses.
Letting T_A denote the probability of an A individual
responding truthfully, and T_B denote the probability of a B individual
responding truthfully, he computed the MSE of the regular estimate for
various combinations of T_A and T_B, and the MSE of (π̂_A)_W for four levels
of p.
He then computed ratios of the MSEs of (π̂_A)_W to the MSEs of the
regular estimates.
Ratios less than 1 suggest that, under the above assumptions, the Warner
technique is superior to the direct question.
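The comparison can be sketched numerically as follows. This is one plausible reading of the setup, not necessarily Warner's exact formulation: we assume that under direct questioning an A individual says "Yes" with probability T_A and a B individual falsely says "Yes" with probability 1 − T_B, while the randomized design obtains full cooperation, so its MSE equals the variance (1.6). All function names are hypothetical.

```python
def regular_mse(pi_a: float, n: int, t_a: float, t_b: float) -> float:
    """MSE of the direct-question estimate when answers are not always truthful.

    Assumed response model: P(Yes) = t_a * pi_a + (1 - t_b) * (1 - pi_a).
    """
    lam = t_a * pi_a + (1 - t_b) * (1 - pi_a)
    variance = lam * (1 - lam) / n
    bias = lam - pi_a
    return variance + bias ** 2


def warner_mse(pi_a: float, n: int, p: float) -> float:
    """Assuming full cooperation, the Warner estimator is unbiased, so MSE = (1.6)."""
    return pi_a * (1 - pi_a) / n + p * (1 - p) / (n * (2 * p - 1) ** 2)


def mse_ratio(pi_a: float, n: int, p: float, t_a: float, t_b: float) -> float:
    """MSE(Warner) / MSE(regular); values below 1 favour the randomized design."""
    return warner_mse(pi_a, n, p) / regular_mse(pi_a, n, t_a, t_b)


print(mse_ratio(0.2, 1000, 0.7, t_a=0.5, t_b=1.0))  # well below 1
print(mse_ratio(0.2, 1000, 0.7, t_a=1.0, t_b=1.0))  # above 1
```

When direct answers are fully truthful the ratio always exceeds 1, because (1.6) adds a positive term to the direct-question variance; the randomized design wins only when the squared bias of the regular estimate is large enough.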
In related work, Pitz (1980) and Winkler and Franklin (1979)
discuss a Bayesian approach to the estimation of parameters in Warner's
model. Raghavarao (1978) and Devore (1977) discussed and presented
solutions to the problem of obtaining estimates of π_A outside the
interval [0, 1].
Godambe (1980) investigated optimum estimation in the
sense of minimum expected variance in the class of all unbiased
estimates.
When testing hypotheses concerning the parameter π_A in Warner's
model, Levy (1978) points out that because the variance of the Warner
estimator is greater than that of the regular estimator, a test based
upon the distribution of the regular estimator of π_A will be more
powerful than a test based upon the distribution of (π̂_A)_W for the same
sample size n.
He presents a table of sample sizes required to achieve
power of at least .80 when testing H_0: π_A = π_0 against the alternative
H_1: π_A = π_0 + .10 at size α = .05, using either the regular estimator or
Warner's estimator, for different values of π_0 and p (the selection
probability).
Comparing the sample sizes required by Warner's
model with those required by the regular model, Levy noted that the Warner
technique requires impractical sample sizes.
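The size of the penalty can be illustrated with a rough calculation. The sketch below is our own normal-approximation, one-sided sample-size formula, using the per-observation variances implied by (1.3) and (1.6); it is not Levy's procedure, and his published table may use different approximations, so the numbers are only indicative. The function names are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Optional


def per_observation_variance(pi: float, p: Optional[float]) -> float:
    """n times the estimator variance: pi(1-pi) for the regular estimator (p=None),
    plus p(1-p)/(2p-1)^2 for Warner's estimator, following (1.3) and (1.6)."""
    v = pi * (1 - pi)
    if p is not None:
        v += p * (1 - p) / (2 * p - 1) ** 2
    return v


def approx_sample_size(pi0: float, delta: float, p: Optional[float],
                       alpha: float = 0.05, power: float = 0.80) -> int:
    """One-sided normal-approximation n for H0: pi = pi0 vs H1: pi = pi0 + delta."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    v0 = per_observation_variance(pi0, p)
    v1 = per_observation_variance(pi0 + delta, p)
    n = ((z_alpha * math.sqrt(v0) + z_beta * math.sqrt(v1)) / delta) ** 2
    return math.ceil(n)


n_regular = approx_sample_size(0.10, 0.10, p=None)
n_warner = approx_sample_size(0.10, 0.10, p=0.7)
print(n_regular, n_warner, n_warner / n_regular)
```

Under these assumptions, with π_0 = .10 and p = .7, the Warner design needs roughly thirteen times as many respondents as the direct question.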
1.3.2 Unrelated question models
1.3.2.a One unrelated question
The single unrelated question randomized response model, a
variation of Warner's model that was suggested by W.R. Simmons,
constituted the next major development of randomized response models.
Abul-Ela (1966) and Horvitz, Shah, and Simmons (1967) performed the
initial work on this model, while Greenberg et al. (1969) developed a
theoretical framework.
Like Warner's model, this model provides the respondent a choice of
two questions, one of which concerns the sensitive characteristic.
However, the other question concerns an innocuous or unrelated
characteristic.
This model assumes that respondents are more likely to
cooperate if they have a chance to select a question that relates to a
non-sensitive characteristic.
Assume that the population is divided into two groups, which may
intersect:
Group A: members with the sensitive characteristic (A), and
Group Y: members with the unrelated characteristic (Y).
Let
π_A = proportion of the population with A, and
π_Y = proportion of the population with Y.
If π_Y is unknown, then two SRSs are required to obtain the estimate of
π_A.
In the first sample, assuming full cooperation, let
p_1 = the probability of selecting the sensitive question,
1 − p_1 = the probability of selecting the unrelated question,
λ_1 = the population proportion of "Yes" responses,

$$\lambda_1 = p_1\pi_A + (1-p_1)\pi_Y, \tag{1.7}$$

λ̂_1 = the observed proportion of "Yes" responses, and
n_1 = the size of sample 1.
The second sample is defined similarly, with p_2, λ_2, λ̂_2, and n_2.
Assuming full cooperation among participants, the true proportion of
individuals with the sensitive characteristic can be expressed, in the
single unrelated question model, as

$$\pi_A = \frac{\lambda_1(1-p_2) - \lambda_2(1-p_1)}{p_1 - p_2}, \quad p_1 \neq p_2. \tag{1.8}$$

Then the single unrelated question (π_Y unknown) estimator for π_A is

$$(\hat{\pi}_A)_{U1} = \frac{\hat{\lambda}_1(1-p_2) - \hat{\lambda}_2(1-p_1)}{p_1 - p_2}, \tag{1.9}$$

with variance

$$\sigma^2[(\hat{\pi}_A)_{U1}] = \frac{1}{(p_1-p_2)^2}\left\{\frac{\lambda_1(1-\lambda_1)(1-p_2)^2}{n_1} + \frac{\lambda_2(1-\lambda_2)(1-p_1)^2}{n_2}\right\}. \tag{1.10}$$
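A minimal Python sketch of (1.7), (1.9) and (1.10) follows. The function names and the example population (π_A = .10, π_Y = .60, p_1 = .8, p_2 = .2) are hypothetical and chosen only for illustration.

```python
def yes_probability(pi_a: float, pi_y: float, p: float) -> float:
    """Population proportion of 'Yes' responses, (1.7), under full cooperation."""
    return p * pi_a + (1 - p) * pi_y


def u1_estimate(lam1_hat: float, lam2_hat: float, p1: float, p2: float) -> float:
    """Single unrelated question estimator with pi_Y unknown, (1.9); needs p1 != p2."""
    return (lam1_hat * (1 - p2) - lam2_hat * (1 - p1)) / (p1 - p2)


def u1_variance(lam1: float, lam2: float, p1: float, p2: float,
                n1: int, n2: int) -> float:
    """Variance (1.10) of the two-sample estimator."""
    term1 = lam1 * (1 - lam1) * (1 - p2) ** 2 / n1
    term2 = lam2 * (1 - lam2) * (1 - p1) ** 2 / n2
    return (term1 + term2) / (p1 - p2) ** 2


# Hypothetical population: pi_A = .10, pi_Y = .60, p1 = .8, p2 = .2.
lam1 = yes_probability(0.10, 0.60, 0.8)   # approximately 0.20
lam2 = yes_probability(0.10, 0.60, 0.2)   # approximately 0.50
print(u1_estimate(lam1, lam2, 0.8, 0.2))  # recovers approximately 0.10
print(u1_variance(lam1, lam2, 0.8, 0.2, 500, 500))
```

Plugging the population values λ_1 and λ_2 into (1.9) returns π_A exactly (up to rounding), which is a convenient check that (1.8) and (1.9) agree.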
If the population proportion of Y individuals is known, then
only one simple random sample is required to estimate π_A:

$$(\hat{\pi}_A)_{U1} = \frac{\hat{\lambda}_1 - \pi_Y(1-p_1)}{p_1} \tag{1.11}$$

and

$$\sigma^2[(\hat{\pi}_A)_{U1}] = \frac{\lambda_1(1-\lambda_1)}{n_1 p_1^2}, \tag{1.12}$$

where the true proportion of individuals with the sensitive
characteristic is

$$\pi_A = \frac{\lambda_1 - \pi_Y(1-p_1)}{p_1}. \tag{1.13}$$
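When π_Y is known, the estimator needs only the one sample. A minimal sketch of (1.11) and (1.12), with hypothetical function names; the variance is simply the binomial variance of λ̂_1 divided by p_1².

```python
def u1_estimate_known(lam_hat: float, p1: float, pi_y: float) -> float:
    """Single-sample estimator (1.11) when pi_Y is known."""
    return (lam_hat - pi_y * (1 - p1)) / p1


def u1_variance_known(lam: float, p1: float, n1: int) -> float:
    """Variance (1.12): binomial variance of lam_hat divided by p1**2."""
    return lam * (1 - lam) / (n1 * p1 ** 2)


# Hypothetical population: pi_A = .10, pi_Y = .60, p1 = .8, so lambda_1 = .20.
print(u1_estimate_known(0.20, 0.8, 0.60))   # approximately 0.10
print(u1_variance_known(0.20, 0.8, 1000))   # approximately 0.00025
```

For the same total of 1000 respondents, the two-sample design with p_2 = .2 and 500 respondents per sample has variance of about .00062 by (1.10), so knowing π_Y more than halves the variance in this example.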
Greenberg et al. (1969) gave recommendations for selecting p_1,
p_2, and the unrelated characteristic.
They suggest that p_1 and p_2
be selected in the neighborhood of .80 (or .20) to minimize the
variance of the estimate and yet maintain cooperation among the
participants.
They also propose that p_1 + p_2 = 1, and that the
unrelated characteristic be chosen such that π_Y is as far as possible from .5 and
on the same side as π_A.
They explored the effects of non-truthful reporting on the
estimator by deriving an expression for the bias, which assumes that
only a certain proportion of the individuals who have characteristic A
will respond truthfully when asked the sensitive question, while all
individuals who have characteristic Y will respond truthfully when
asked the unrelated question.
The proportion of "Yes" responses in
the ith sample is

$$\lambda_i = p_i T_A \pi_A + (1-p_i)\pi_Y, \tag{1.14}$$

where T_A = the probability of a truthful response among A individuals.
Then,

$$E[(\hat{\pi}_A)_{U1}] = \frac{\lambda_1(1-p_2) - \lambda_2(1-p_1)}{p_1 - p_2} = T_A\pi_A \tag{1.15}$$

and

$$\text{Bias}[(\hat{\pi}_A)_{U1}] = \pi_A(T_A - 1). \tag{1.16}$$
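The bias result can be verified numerically: plug the non-truthful "Yes" probabilities of (1.14) into the estimator (1.9) and subtract π_A. A minimal sketch with hypothetical function names:

```python
def u1_expected_value(pi_a: float, pi_y: float, p1: float, p2: float,
                      t_a: float) -> float:
    """Expected value of (1.9) when only a fraction t_a of A individuals answer truthfully."""
    lam1 = p1 * t_a * pi_a + (1 - p1) * pi_y   # (1.14), sample 1
    lam2 = p2 * t_a * pi_a + (1 - p2) * pi_y   # (1.14), sample 2
    return (lam1 * (1 - p2) - lam2 * (1 - p1)) / (p1 - p2)   # (1.15)


def u1_bias(pi_a: float, pi_y: float, p1: float, p2: float, t_a: float) -> float:
    """Bias of (1.9); by (1.16) this should equal pi_a * (t_a - 1)."""
    return u1_expected_value(pi_a, pi_y, p1, p2, t_a) - pi_a


print(u1_bias(0.10, 0.60, 0.8, 0.2, t_a=0.7))   # approximately -0.03
```

Note that the bias depends only on π_A and T_A: the choice of π_Y, p_1 and p_2 cannot remove it under this model of non-truthful reporting.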
Greenberg, Abernathy, and Horvitz (1970) extended the single
unrelated question model, π_Y known, by using three colors of beads in
their randomizing device.
If red was selected, the respondent was
instructed to answer the sensitive question; if either white or blue was
selected, the respondent answered "Yes" or "No" to the question "The
color of the ball in the window is blue."
Then π_Y was taken to be
the ratio of blue balls to non-red balls.
Fidler and Kleinknecht (1977)
used a slight variant: if color 1 was selected the respondent was
instructed to respond "Yes", if color 2 was selected the respondent was
instructed to respond "No", and if color 3 was selected the respondent
answered the sensitive question.
This model is essentially the same as
that proposed by Liu and Chow (1976a); see Section 1.3.3.
Moors (1971) argued that a more efficient estimator of π_A is
obtained when one of the two samples is used to estimate π_Y directly, and
the other sample uses the unrelated question randomized response model, π_Y
known, with π_Y replaced by its estimate from the first sample.
Moors showed that the
minimum variance of the estimator with respect to n_1, n_2 and p_2 is

$$\sigma^2_{\min} = \frac{\left[\sqrt{\lambda_1(1-\lambda_1)} + (1-p_1)\sqrt{\pi_Y(1-\pi_Y)}\right]^2}{n\,p_1^2}. \tag{1.17}$$

This can be obtained by minimizing (1.10) with respect to n_1 and n_2 for
constant n = n_1 + n_2, and then setting p_2 = 0.
Dowling and Shachtman (1975)
compared the unrelated question model with Warner's, and proved that the
variances of the unrelated question estimators, both with π_Y known and
unknown, are less than the variance of the Warner estimator when p in
the π_Y known model, or p = max(p_1, p_2) in the π_Y unknown model, is
greater than approximately 1/3.
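The allocation behind Moors' result can be sketched directly. Setting p_2 = 0 in (1.10) gives λ_2 = π_Y and a variance of the form [a/n_1 + b/n_2]/p_1², with a = λ_1(1 − λ_1) and b = (1 − p_1)²π_Y(1 − π_Y). For fixed n = n_1 + n_2, this is minimized by taking n_1 proportional to √a and n_2 proportional to √b, which yields the minimum [√a + √b]²/(n p_1²). The sketch below is our own, with hypothetical names, and treats n_1 and n_2 as continuous (no rounding to whole respondents).

```python
import math


def moors_allocation(lam1: float, pi_y: float, p1: float, n: int):
    """Split n between the randomized sample (n1) and the direct pi_Y sample (n2).

    Minimizes [lam1(1-lam1)/n1 + (1-p1)^2 pi_Y(1-pi_Y)/n2] / p1^2,
    i.e. (1.10) with p2 = 0, subject to n1 + n2 = n.
    """
    s1 = math.sqrt(lam1 * (1 - lam1))
    s2 = (1 - p1) * math.sqrt(pi_y * (1 - pi_y))
    n1 = n * s1 / (s1 + s2)
    n2 = n - n1
    min_variance = (s1 + s2) ** 2 / (n * p1 ** 2)
    return n1, n2, min_variance


# Hypothetical population: pi_A = .10, pi_Y = .60, p1 = .8 (so lambda_1 = .20), n = 1000.
n1, n2, vmin = moors_allocation(0.20, 0.60, 0.8, 1000)
print(round(n1, 1), round(n2, 1), vmin)
```

In this example about 80 percent of the respondents go to the randomized sample, and the minimum variance (about .00039) is smaller than the .00062 obtained from (1.10) with p_2 = .2 and an even 500/500 split.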
1.3.2.b Two unrelated questions
For the case in which π_Y is unknown, Folsom et al. (1973)
developed the two unrelated questions model.
Their objective was to use
the two samples more efficiently in arriving at the final estimator by
combining Moors' idea with the single unrelated question model.
Each
sample is used to estimate the proportion of individuals with one of the
innocuous characteristics through the use of two unrelated questions.
Thus, in sample one, a randomizing device selects either the sensitive
question or the first unrelated question, while the second unrelated
question is asked as a direct question.
In sample two, the second unrelated
question is used in the randomizing device while the first unrelated
question is asked as a direct question.
In both samples the sensitive question is selected with probability p.
Assuming 100% cooperation, let
λ_j(r) = the probability of a "Yes" response to the question selected by the randomizing device in sample j,
λ_j(d) = the probability of a "Yes" response to the direct question in sample j, and
λ_j(rd) = the probability of a "Yes" response to both questions in sample j,
for j = 1, 2.
For sample 1, these probabilities may be written as

$$\lambda_1(r) = p\pi_A + (1-p)\pi_{Y_1}, \quad \lambda_1(d) = \pi_{Y_2}, \quad \lambda_1(rd) = p\pi_{AY_2} + (1-p)\pi_{Y_1Y_2},$$

and for sample 2:

$$\lambda_2(r) = p\pi_A + (1-p)\pi_{Y_2}, \quad \lambda_2(d) = \pi_{Y_1}, \quad \lambda_2(rd) = p\pi_{AY_1} + (1-p)\pi_{Y_1Y_2},$$

where
π_A = the proportion of the population with the sensitive characteristic A,
π_{Y_j} = the proportion of the population with the nonsensitive characteristic Y_j,
π_{AY_j} = the proportion of the population with both the sensitive attribute A and the nonsensitive characteristic Y_j, and
π_{Y_1Y_2} = the proportion of the population with both nonsensitive characteristics Y_1 and Y_2.
Then two unbiased estimates of π_A can be obtained from the observed
frequencies of "Yes" responses in the two samples:

$$\hat{\pi}_{A(1)} = [\hat{\lambda}_1(r) - (1-p)\hat{\lambda}_2(d)]/p, \tag{1.18}$$

and

$$\hat{\pi}_{A(2)} = [\hat{\lambda}_2(r) - (1-p)\hat{\lambda}_1(d)]/p. \tag{1.19}$$
If these estimates were statistically independent, then the
optimum estimator of π_A would be a weighted average of the two, with
weights inversely proportional to their variances.
Using the symbol (π̂_A)_U2 to designate the proportion of the population
estimated to have the sensitive characteristic using two unrelated
questions, the weights are chosen to minimize the variance of

$$(\hat{\pi}_A)_{U2} = w\,\hat{\pi}_{A(1)} + (1-w)\,\hat{\pi}_{A(2)}, \tag{1.20}$$

given the sample sizes n_1 and n_2.
The authors describe the method for obtaining these weights.
In efficiency comparisons, Folsom et al. (1975) showed that (π̂_A)_U2
is never less efficient than Moors' estimator and is never more efficient
than the single unrelated question estimator with π_Y known, (π̂_A | π_Y)_U1.
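A minimal sketch of (1.18) to (1.20) follows. It uses the inverse-variance weight that would be optimal if the two estimates were independent; that is only the hypothetical case mentioned in the text, and the weights actually derived by Folsom et al. are not reproduced here. The function names and the example population (π_A = .10, π_{Y_1} = .60, π_{Y_2} = .30, p = .75) are hypothetical.

```python
def sample_probabilities(pi_a: float, pi_y1: float, pi_y2: float, p: float):
    """Population 'Yes' probabilities lambda_1(r), lambda_1(d), lambda_2(r), lambda_2(d)."""
    lam1_r = p * pi_a + (1 - p) * pi_y1
    lam1_d = pi_y2
    lam2_r = p * pi_a + (1 - p) * pi_y2
    lam2_d = pi_y1
    return lam1_r, lam1_d, lam2_r, lam2_d


def two_uq_estimates(lam1_r: float, lam1_d: float, lam2_r: float, lam2_d: float,
                     p: float):
    """The two estimates (1.18) and (1.19) of pi_A."""
    est1 = (lam1_r - (1 - p) * lam2_d) / p   # uses sample 2's direct estimate of pi_Y1
    est2 = (lam2_r - (1 - p) * lam1_d) / p   # uses sample 1's direct estimate of pi_Y2
    return est1, est2


def inverse_variance_combination(est1: float, var1: float, est2: float, var2: float):
    """Weighted average (1.20) with w proportional to 1/var1 (valid only if independent)."""
    w = var2 / (var1 + var2)
    return w * est1 + (1 - w) * est2, w


probs = sample_probabilities(0.10, 0.60, 0.30, 0.75)
print(two_uq_estimates(*probs, p=0.75))   # both approximately 0.10
```

Because λ̂_1(r) and λ̂_1(d) come from the same respondents, the two estimates are in general correlated, which is why inverse-variance weighting is only a starting point here.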
1.3.3 Multi-proportion models
Abul-Ela, Greenberg, and Horvitz (1967) extended Warner's model to
the estimation of multi-proportions in the case where the population can
be divided into k categories, of which at least one and not more
than k − 1 are stigmatizing.
Let the population proportion π_j correspond to the jth category, such
that Σ_{j=1}^{k} π_j = 1.
Then, this procedure requires
k − 1 non-overlapping SRSs with replacement to estimate the first k − 1
proportions, and

$$\pi_k = 1 - \sum_{j=1}^{k-1} \pi_j.$$

The authors derived a method for
obtaining the maximum likelihood estimators of the π_j, and thoroughly
investigated the case for k = 3.
They also explored the effect of less
than truthful reporting on the bias of the sample estimators.
Several studies developed models that require only one sample to
estimate multi-proportions.
Bourke and Dalenius (1973) developed a
model for k = 3; other authors considered the general case for any k
(Bourke (1974), Eriksson (1973), and Liu, Chow, and Mosley (1975)).
Liu, Chow, and Mosley (1975) developed a randomizing device for
obtaining discrete quantitative data on sensitive issues.
The device,
a flask with a long neck, contains k different colored balls.
When the
device is inverted, the balls stack singly in the neck. The respondent
gives the position number (j) of the ball whose color corresponds to the group
identifying the respondent.
Estimates are then obtained by using the
proportion of individuals responding "j" and the probabilities that a
certain color will appear in position j.
Liu and Chow (1976a) developed a randomization device to estimate
multi-proportions, which is a modification of the Greenberg et al.
(1971) continuous quantitative model discussed in the next section.
The device contains a predetermined combination of balls.
One color
corresponds to the sensitive question; the others contain preprinted
numbers.
If the respondent selects the color corresponding to the
sensitive question, which involves a discrete quantitative answer, then
the respondent is instructed to answer the sensitive question
truthfully.
If one of the other colors is selected, then the respondent
is instructed to state the number printed on the ball.
Levy (1980)
proposed a statistical procedure to test hypotheses concerning the
proportions estimated by this model.
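Under a simplified reading of the Liu and Chow (1976a) device, the estimator follows by inverting a mixture. Suppose, as an assumption for illustration, that the sensitive color is drawn with probability p, that a ball preprinted with the number j is drawn with probability q_j, and that the answers to the sensitive question are the same numbers j. Then P(report j) = p π_j + q_j, so π̂_j = (λ̂_j − q_j)/p. The sketch below implements only this simplified model, with a hypothetical function name; it is not necessarily the authors' published estimator.

```python
from typing import Dict


def multi_proportion_estimates(lam_hat: Dict[int, float], p: float,
                               q: Dict[int, float]) -> Dict[int, float]:
    """Estimate pi_j from observed report proportions lam_hat[j].

    Assumed model: P(report j) = p * pi_j + q_j, where p is the probability of the
    sensitive color and q_j the probability of a ball preprinted with j.
    """
    return {j: (lam_hat[j] - q.get(j, 0.0)) / p for j in lam_hat}


# Hypothetical device: p = .6, preprinted numbers 1, 2, 3 with probabilities .1, .2, .1.
estimates = multi_proportion_estimates({1: 0.40, 2: 0.38, 3: 0.22}, 0.6,
                                       {1: 0.1, 2: 0.2, 3: 0.1})
print(estimates)   # approximately {1: 0.5, 2: 0.3, 3: 0.2}
```

Since the report proportions sum to 1 and the q_j sum to 1 − p, the estimates automatically sum to 1.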
1.3.4 Quantitative models
As previously discussed, the early work in the development of
randomized response models focused on categorical (or qualitative)
values.
Many authors have extended this work to the estimation of
quantitative variables which are sensitive in nature.
Greenberg et al.
(1971) developed a model which uses two questions, one of which is
sensitive and the other innocuous.
In one of their applications, the
randomizing device included these two questions:
1. About how much money in dollars did the head of
this household earn last year?
2. About how much money in dollars do you think the
average head of a household earns each year?
If the distribution of the non-sensitive attribute is unknown, then two
SRSs with replacement are required to estimate the parameters of
interest.
The responses for each sample are a mixture of answers to
both the sensitive and non-sensitive questions, and each has a
probability density function.
Then, the use of sample means and
variances of the responses leads to an unbiased estimator of the
population mean and an estimator of the population variance for the
sensitive question.
When the distribution of the non-sensitive attribute is
known prior to the survey, only one sample is required.
The authors
present estimators for this situation.
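For the mean, the logic parallels (1.8) and (1.9). If sample i draws the sensitive question with probability p_i, the expected response is p_i μ_A + (1 − p_i) μ_Y, where μ_A and μ_Y are the means of the sensitive and innocuous variables. Solving the two equations gives μ_A from the two sample means, and one sample suffices when μ_Y is known. The sketch below covers only the mean (not the variance estimator), uses hypothetical function names, and the income figures are made up for illustration.

```python
def quantitative_mean_two_samples(zbar1: float, zbar2: float,
                                  p1: float, p2: float) -> float:
    """Mean of the sensitive variable, assuming E[z_i] = p_i mu_A + (1 - p_i) mu_Y."""
    return ((1 - p2) * zbar1 - (1 - p1) * zbar2) / (p1 - p2)


def quantitative_mean_known(zbar: float, p: float, mu_y: float) -> float:
    """One-sample version when the non-sensitive mean mu_Y is known."""
    return (zbar - (1 - p) * mu_y) / p


# Hypothetical: mu_A = 30000, mu_Y = 25000, p1 = .8, p2 = .2 give
# expected sample means of 29000 and 26000.
print(quantitative_mean_two_samples(29000, 26000, 0.8, 0.2))   # approximately 30000
print(quantitative_mean_known(29000, 0.8, 25000))               # approximately 30000
```

As in the proportion case, the two-sample version requires p_1 ≠ p_2, and its precision improves as p_1 and p_2 move apart.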
Eriksson (1973) proposed a new model which assumes that, in a
population of N persons with associated values of the sensitive
characteristic, it is possible to choose a non-sensitive question with
answers that contain all values of the sensitive attribute.
The
distribution of the non-sensitive responses must be chosen so as to
convince respondents that the interviewer cannot guess which question is
being answered.
At the interview with the ith person, who has the true
value X_i, the respondent is asked about that value and then is
instructed to select one of these two statements from a deck of cards:
1. Give a true answer [i.e., state your value of X_i],
2. Say that your value is V_j,
where V_j is a value of the innocuous statement which is preset by the
survey design.
The probability of selecting the first statement is p,
and the probability of selecting the statement with value V_j is P_j, where
p + Σ_j P_j = 1.
Poole (1974) presented a technique that instructs the respondent
to multiply his or her true response by a random number and to report to
the interviewer only the result of the multiplication, thus concealing
the true response.
By using these answers and the known properties of
the distribution of the random number, the entire distribution of the
quantitative variable can be estimated, including the mean and variance.
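The first two moments illustrate how Poole's idea works. If the random multiplier R is independent of the true value X, then E[Z] = E[X]E[R] and E[Z²] = E[X²]E[R²] for the reported product Z = XR, so the mean and variance of X follow from the sample moments of Z. Poole's technique recovers the whole distribution; this method-of-moments sketch, with a hypothetical function name, recovers only the mean and variance and assumes E[R] and E[R²] are known and nonzero.

```python
from statistics import fmean
from typing import Iterable, Tuple


def poole_moments(responses: Iterable[float], r_mean: float,
                  r_second_moment: float) -> Tuple[float, float]:
    """Estimate mean and variance of X from products Z = X * R (R independent of X)."""
    z = list(responses)
    z_mean = fmean(z)
    z_second = fmean(v * v for v in z)
    x_mean = z_mean / r_mean                 # E[X] = E[Z] / E[R]
    x_second = z_second / r_second_moment    # E[X^2] = E[Z^2] / E[R^2]
    return x_mean, x_second - x_mean ** 2


# Everyone has X = 10; R is 1 or 3 with equal probability (E[R] = 2, E[R^2] = 5).
print(poole_moments([10, 30, 10, 30], r_mean=2.0, r_second_moment=5.0))   # (10.0, 0.0)
```

In the example every respondent has the same true value, so the recovered variance is exactly zero even though the reported values vary.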
Himmelfarb and Edgell (1980) developed an additive constants
quantitative randomized response model which instructs the respondents
to add (or subtract) one or more known constants from the true answer a
certain proportion of the time.
Estimates of the mean and variance of
the sampling distribution of the mean are obtained from the observed
distribution of responses and from the known distribution of the
constants.
This model is a contamination model, which we discuss in the
next section.
This particular contamination model violates one tenet of
randomized response, viz. that no respondent's response should allow the
interviewer to know with certainty which question was answered.
Thus,
an individual who has the minimum possible value as a true answer (say,
zero) and who draws the minimum additive constant (say, zero) will
automatically be classifiable.
They also described the efficiencies of special cases of the additive
constants model relative to the Greenberg et al. (1971) quantitative model.
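A minimal sketch of the additive constants idea follows, assuming the constant K is drawn independently of the true value X from a known distribution. Since E[Z] = E[X] + E[K], the mean of X is estimated by the mean response minus E[K], and the sampling variance of that estimate is estimated by the sample variance of the responses divided by n. This is a method-of-moments reading of the description above, not necessarily Himmelfarb and Edgell's exact estimators, and the function name is hypothetical.

```python
from statistics import fmean, variance
from typing import Sequence, Tuple


def additive_constants_mean(responses: Sequence[float], constants: Sequence[float],
                            probs: Sequence[float]) -> Tuple[float, float]:
    """Estimate E[X] and the sampling variance of that estimate from Z = X + K."""
    k_mean = sum(k * p for k, p in zip(constants, probs))   # known E[K]
    mu_hat = fmean(responses) - k_mean
    var_of_mean = variance(responses) / len(responses)        # s_Z^2 / n
    return mu_hat, var_of_mean


# Hypothetical: every household earns 1000; K is 0 or 100 with probability 1/2 each.
print(additive_constants_mean([1000, 1100, 1000, 1100], [0, 100], [0.5, 0.5]))
```

Subtracting a known constant shifts the mean but cannot remove the extra spread contributed by K, which is why the additive design costs precision relative to direct questioning.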
Sen (1974) noted that the literature mostly deals with estimation
problems in which the distributions are assumed to be specified.
He
explores the case in which the basic distribution may be unknown, but in
which it can be assumed that the distributions belong to a broad class
of distributions.
In related work, Sen (1976) investigates the
selection of sample sizes that maximize the precision of the
estimators obtained in Sen (1974).
1.3.5
Contamination models
Contamination models were developed to give the respondent more
protection when answering a sensitive quantitative question.
In
general, the respondent is instructed to distort his or her answer
according to a rule with a known distribution.
Then, an estimate of the
population mean is obtained from the distribution of responses and from
the known distribution of the rule which alters the respondent's true
value.
The responses thus represent a mixture distribution of the true
values and of the contamination rule.
Boruch (1972) described the contamination model, first presented
in 1969, which alters the responses given by respondents.
The model
presents the respondent with a single question or statement, and
instructs the respondent to lie or tell the truth according to the
outcome of a randomizing device.
For example, in the special case of
estimating a proportion, if a die is the randomizing device, the
individual is instructed to respond truthfully if a 2, 3, 4, 5, or 6 shows,
and to respond falsely if a 1 shows.
Thus, if each respondent follows the instructions, 1/6 of the
respondents who lack the sensitive characteristic answer "Yes" (false
positives), and 1/6 of those who possess it answer "No" (false negatives).
By knowing these two rates
and using the proportion of "Yes" responses, researchers can estimate
$\pi_A$.
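Under the die rule, a respondent with the characteristic answers "Yes" with probability 5/6 and one without it answers "Yes" with probability 1/6, so the expected proportion of "Yes" answers is $\lambda = \pi_A(1-\theta) + (1-\pi_A)\theta$ with $\theta = 1/6$. The short Python sketch below simply inverts this relation. The function name and the symbol $\theta$ (the probability of a false answer) are our own labels, and the sketch assumes every respondent follows the instructions.

```python
def contamination_estimate(yes_proportion: float, lie_prob: float = 1 / 6) -> float:
    """Estimate pi_A from the observed proportion of "Yes" answers.

    Each respondent is assumed to answer falsely with probability lie_prob, so
    P(Yes) = pi_A * (1 - lie_prob) + (1 - pi_A) * lie_prob.
    Solving this for pi_A gives the moment estimator returned here.
    """
    if lie_prob == 0.5:
        raise ValueError("lie_prob = 1/2 makes every answer uninformative")
    return (yes_proportion - lie_prob) / (1 - 2 * lie_prob)


# Hypothetical survey: 30% of the answers are "Yes" under the die rule.
print(round(contamination_estimate(0.30), 4))  # prints 0.2
```

With sample proportions in place of $\lambda$, the estimate can fall outside [0, 1] and would then need to be truncated.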
The author discussed efficiency results of the contamination model
relative to Warner's model and to the unrelated question model, $\pi_Y$
unknown.
Warner (1971) discussed the potential applications of the linear
randomized response model (see I.3.7), which include estimating $\pi_A$ from
responses that have been multiplied by or added to a random number from
a known distribution.
In this way, a respondent may receive added
protection when answering a sensitive question.
The general case for the Himmelfarb and Edgell (1980) additive
constants quantitative model is the following.
Assume that the respondent is handed a deck of 100 cards.
Each card contains the statement:
"Take the amount in dollars that the head of this household
earned last year, and to it add an amount K, and tell me the amount of
your addition."
There are c different constants $K_i$, where $i = 1, \ldots, c$,
with corresponding probabilities of occurrence $P_i$ ($i = 1, \ldots, c$).
Let X represent the true value of the household earnings, and let Z
represent the respondent's answer.
Then,
$$Z = X + K_i \quad \text{with probability } P_i, \qquad i = 1, \ldots, c. \tag{1.21}$$
Using the $P_i$ and the density functions of Z and X, an
unbiased estimator of the population mean is obtained.
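Because the constant is drawn independently of the respondent's true value, $E(Z) = E(X) + \sum_i P_i K_i$, so subtracting the known mean of the constants from the sample mean of the answers gives an unbiased estimator of $E(X)$. The Python sketch below illustrates this moment argument. It is not necessarily the exact form Himmelfarb and Edgell derived from the density functions, and the function name, constants, probabilities, and simulated earnings are all hypothetical.

```python
import random
from statistics import fmean


def additive_constants_mean(responses, constants, probs):
    """Unbiased estimate of E(X) when each answer is Z = X + K_i with prob. P_i.

    Since E(Z) = E(X) + sum_i P_i K_i, the known mean of the constants is
    subtracted from the sample mean of the reported values.
    """
    if len(constants) != len(probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("need one probability per constant, summing to 1")
    expected_constant = sum(p * k for p, k in zip(probs, constants))
    return fmean(responses) - expected_constant


if __name__ == "__main__":
    rng = random.Random(1984)
    constants = [0, 1000, 2000, 5000]  # hypothetical K_i, in dollars
    probs = [0.4, 0.3, 0.2, 0.1]       # hypothetical P_i
    true_values = [rng.gauss(20000, 5000) for _ in range(5000)]  # simulated X
    # Each simulated respondent draws one constant and reports X + K_i.
    answers = [x + rng.choices(constants, weights=probs)[0] for x in true_values]
    print(round(fmean(true_values)), round(additive_constants_mean(answers, constants, probs)))
```

The two printed numbers (the simulated true mean and its estimate) should agree up to sampling error.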
The authors considered three special cases.
In each case, the respondents are
instructed to answer the sensitive question directly with selection
probability p.
If the randomizing device does not select the
sensitive question, in case 1, the respondents are directed either to
add K or 2K, or to subtract K or 2K, each with equal probability
(1-p)/4.
Again, a value of -2K identifies the person as having a
"zero" as a response to the sensitive question.
In case 2, they are
directed to either add or subtract K, each with probability (1-p)/2; and, in
case 3, they are directed to add (or subtract) K with probability (1-p).
Using two sets of assumptions concerning the population means and
variances, the authors compute the relative efficiencies of the three
cases of their model to the Greenberg et al. (1971) model.
They present
a table containing values of K for which the efficiency of their model
is superior.
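The three cases differ only in the distribution of the amount D added to the true value X when the sensitive question is not selected (D = 0 when it is). Assuming D is drawn independently of X, $\mathrm{Var}(Z) = \mathrm{Var}(X) + \mathrm{Var}(D)$, so the variance contributed by the randomizing device can be compared across cases directly. The Python sketch below computes the mean and variance of D exactly. Reading case 3 as "add K" is our choice (subtracting K gives the same variance and the opposite mean), and the function name is hypothetical.

```python
from fractions import Fraction


def perturbation_moments(case, p, k):
    """Exact (mean, variance) of the amount D added to the true value X.

    With probability p the sensitive question is selected and D = 0; otherwise
    D follows the rule of the given case. k must be nonzero.
    """
    q = 1 - p
    if case == 1:    # add K or 2K, or subtract K or 2K, each with prob. q/4
        dist = {0: p, k: q / 4, 2 * k: q / 4, -k: q / 4, -2 * k: q / 4}
    elif case == 2:  # add or subtract K, each with prob. q/2
        dist = {0: p, k: q / 2, -k: q / 2}
    elif case == 3:  # add K with prob. q
        dist = {0: p, k: q}
    else:
        raise ValueError("case must be 1, 2, or 3")
    mean = sum(d * w for d, w in dist.items())
    variance = sum((d - mean) ** 2 * w for d, w in dist.items())
    return mean, variance


p, k = Fraction(1, 2), Fraction(1)
for case in (1, 2, 3):
    print(case, *perturbation_moments(case, p, k))
```

In general the device variance is $5(1-p)K^2/2$ for case 1, $(1-p)K^2$ for case 2, and $p(1-p)K^2$ for case 3. Case 3 also shifts the mean by $(1-p)K$, which must be subtracted from $\bar z$.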
I.3.6 Multiple trials randomized response models
As Liu and Chow (1976b) discussed, the purpose of randomized
response models with multiple trials per respondent is to reduce the
variance introduced by the randomizing device without increasing the
sample size of a study.
Horvitz, Shah, and Simmons (1967) first
introduced the multiple trials concept by extending the single unrelated
randomized response technique to one that requires each respondent to
use a randomizing device to make two independent selections of the
questions.
Using the observed proportions of "Yes" responses, unbiased
estimates for the population proportions are obtained, provided that all
respondents respond truthfully.
Although the objective of reduced
variances of the estimators is implied, the authors did not present a
motivation for their study, nor did they present any variance formulas or
comparisons with other models.
Liu and Chow (1976b) introduced a general model that allows for
two or more independent trials per respondent.
A maximum likelihood
estimator for $\pi_A$ is derived through an iterative numerical method
utilizing efficiency score and information functions.
Relative
efficiencies of multiple trials to one trial estimates show an increase
in efficiency when the number of trials per respondent (m) is increased.
As m increases, the variation due to the randomizing device approaches
zero.
The authors note that for small m, cooperation may increase
because of an increase in perceived protection, but for large m,
participants may become suspicious and give non-truthful responses.
Assuming different probabilities of truthful responses in the multiple
trials model and in the direct question method, relative efficiencies
show the effects of sample sizes and illustrate when one technique is
preferred to the other.
Gould, Shah, and Abernathy (1969) extensively explored the effects
of non-truthful responses in models for two trials per respondent.
They considered many different rates of truthful responses among various
combinations of population subgroups, such as the group containing
individuals with the sensitive characteristic who select the sensitive
question in each trial.
They developed a general model that includes
parameters representing the proportions of the attributes in the
population, the proportion that does not understand the randomizing
device, the probability that an individual with the sensitive
characteristic will choose to answer the innocuous question, and
probabilities that individuals will tell the truth in various
situations.
Some behavioral models which can be obtained from the
general model are presented.
I.3.7 Linear and general randomized response models
Warner (1971) attempted to provide a unifying structure to many
types of linear randomized response models.
He reformulated the schemes
as generalized linear regression models, termed linear randomized
response models, which assumed SRSs of size n with replacement.
This
approach puts the randomized response models into a familiar framework,
which, in turn, suggests potential applications for these techniques and
possibilities for improving the estimates.
As an example, Warner states
that one potential application involves using computers to apply
randomization techniques to large data files from general sources of
sensitive data, such as the Bureau of the Census, in cases where
confidentiality must be ensured.
Bellhouse (1980) develops a linear model that assumes fixed-size
sampling designs with one sample and allows for unequal probability
sampling designs.
He discusses the specification for the Pollock and
Bek (1976) additive model and for the unrelated question design.
He shows that estimates of the mean are optimal in the sense of minimum
average mean square error within a certain class of models.
He also points out that this model is more general than the Warner linear
randomized response model.
Anderson (1977) defined a general randomized response model that
covers some of the linear models discussed above.
The model is the following.
Suppose there exists a population of individuals with a
sensitive characteristic $X \in \Omega_X$ distributed according to the unknown
distribution $F_X(x)$.
The randomizing device produces an answer Y with a
probability distribution dependent on the value X = x of the respondent.
Then, an individual with X = x answers according to the known
probability density function
$$h_Y(y \mid x), \qquad x \in \Omega_X. \tag{1.22}$$
The densities $h_Y(y \mid x)$, $x \in \Omega_X$, are called response densities which, in
general, can be selected by the statistician.
Using this model,
Anderson also develops revealing densities (see II.5) and a theorem that
shows the amount of Fisher information lost due to the randomized
response and establishes that an increase in protection to the respondent
results in a loss of information.
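Anderson's result holds for his general model, but the simplest special case already shows the trade-off. If X is a 0/1 indicator and the unrelated question model with $\pi_Y$ known is used, each answer is "Yes" with probability $\lambda = p\pi_A + (1-p)\pi_Y$. Since $d\lambda/d\pi_A = p$, the Fisher information about $\pi_A$ per respondent is $p^2/[\lambda(1-\lambda)]$, compared with $1/[\pi_A(1-\pi_A)]$ for a truthful direct question. The Python sketch below evaluates both. It is our illustration rather than Anderson's derivation, and it assumes truthful answers under both designs.

```python
def direct_information(pi_a: float) -> float:
    """Fisher information about pi_A from one truthful direct answer."""
    return 1.0 / (pi_a * (1.0 - pi_a))


def unrelated_question_information(pi_a: float, pi_y: float, p: float) -> float:
    """Fisher information about pi_A from one randomized answer (pi_Y known).

    The answer is Bernoulli(lam) with lam = p*pi_A + (1-p)*pi_Y and
    d lam / d pi_A = p, so the information is p**2 / (lam * (1 - lam)).
    """
    lam = p * pi_a + (1.0 - p) * pi_y
    return p ** 2 / (lam * (1.0 - lam))


# Share of the direct-question information retained as p (the chance of
# answering the sensitive question) is lowered, for pi_A = 0.1, pi_Y = 0.5.
for p in (1.0, 0.9, 0.7, 0.5, 0.3):
    ratio = unrelated_question_information(0.1, 0.5, p) / direct_information(0.1)
    print(f"p = {p:.1f}: {ratio:.3f}")
```

At p = 1 the two designs coincide. For p < 1 the randomized answer carries less information, which is the price paid for the added protection.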
I.3.8 Two-stage or conditional randomized response models
Reinmuth and Geurts (1975) combined qualitative and quantitative
models into a two-stage model.
The authors were interested in
estimating the frequency of shoplifting among those who shoplift.
At
the first stage, the respondents were asked a qualitative question using
the single unrelated question model.
Then the proportion who shoplift was estimated.
At the second stage, a quantitative randomized response
model was applied to a sample independent from the first, which yielded
an estimate of the mean number of times of shoplifting.
They then
proceeded to estimate the average number of times of shoplifting among
those who shoplift and among the entire population.
Carr, Marascuillo, and Busk (1982) used Loynes' model (Loynes
(1976a,b); see II.5 in the following chapter), which sets $\pi_Y = 1$ in the
unrelated question model, to screen those who answered "Yes" to a
qualitative question: "Have you smoked marijuana in the last year?"
A "No" answer denotes non-smokers.
Loynes' model was then applied to the
"Yes" respondents, i.e., the smokers, who were asked: "Have you been
loaded or high during the school day?"
The authors claimed that this
model reduces the standard error of the estimate of the proportion of
students who fall into the second category.
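With $\pi_Y = 1$, every respondent who has the sensitive attribute answers "Yes", whichever question is selected. A respondent without it answers "No" with probability p. Bayes' rule then gives $P(A \mid \text{Yes}) = \pi_A / [\pi_A + (1-\pi_A)(1-p)]$ and $P(A \mid \text{No}) = 0$, which shows why a "No" answer screens a respondent out with certainty while a "Yes" answer only raises the probability of membership. The Python sketch below illustrates this under the assumption of truthful answers. The function names are hypothetical.

```python
from typing import Tuple


def loynes_posteriors(pi_a: float, p: float) -> Tuple[float, float]:
    """Return (P(attribute | Yes), P(attribute | No)) when pi_Y = 1."""
    p_yes = pi_a + (1.0 - pi_a) * (1.0 - p)  # holders always answer "Yes"
    return pi_a / p_yes, 0.0                 # a "No" can only come from a non-holder


def loynes_estimate(n_yes: int, n: int, p: float) -> float:
    """Moment estimate of pi_A: P(Yes) = p*pi_A + (1 - p), solved for pi_A."""
    lam = n_yes / n
    return (lam - (1.0 - p)) / p


print(loynes_posteriors(0.2, 0.75))      # posterior after "Yes", after "No"
print(loynes_estimate(400, 1000, 0.75))  # hypothetical counts
```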
I.3.9 Multiple sensitive characteristics
In each of the models discussed above, the respondent answers, at
most, only one stigmatizing question.
Barksdale (1971), in an early
work, developed techniques to estimate two sensitive attributes.
These
included repeated applications of Warner's model and of the unrelated
question models.
He defined a technique that applies a variation of the
unrelated question model twice per respondent: once to estimate the
first sensitive attribute, and once to estimate the second
attribute.
The model at each trial is essentially that used by
Greenberg, Abernathy, and Horvitz (1970); see I.3.2.1.
At each trial, the randomizing device selects one of three options: the sensitive
question, a statement that instructs the respondent to reply "Yes", or a
statement that instructs the respondent to reply "No".
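Suppose the three options are selected with probabilities p (sensitive question), $q_1$ (reply "Yes"), and $q_2 = 1 - p - q_1$ (reply "No"). The probability of a "Yes" answer at a trial is then $\lambda = p\pi_A + q_1$, so each attribute's proportion can be estimated from its own trial as $(\hat\lambda - q_1)/p$. Under simple random sampling with replacement and truthful answers, this estimator has variance $\lambda(1-\lambda)/(np^2)$. The symbols $q_1$, $q_2$ and the function below are our notation for a sketch, not Barksdale's.

```python
import math


def forced_response_estimate(n_yes: int, n: int, p: float, q_yes: float):
    """Estimate pi_A and its standard error for one trial of the three-option design.

    p: probability the device selects the sensitive question.
    q_yes: probability it selects the statement "reply Yes".
    """
    if p <= 0 or q_yes < 0 or p + q_yes > 1:
        raise ValueError("need p > 0, q_yes >= 0 and p + q_yes <= 1")
    lam = n_yes / n
    estimate = (lam - q_yes) / p
    std_error = math.sqrt(lam * (1 - lam) / (n * p ** 2))  # plug-in estimate
    return estimate, std_error


# One call per attribute, e.g. two trials on the same 500 respondents (hypothetical counts).
print(forced_response_estimate(180, 500, p=0.7, q_yes=0.15))
print(forced_response_estimate(140, 500, p=0.7, q_yes=0.15))
```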
In an extension
of this work, Tamhane (1981) proposed a model that uses two or more
trials on respondents in a sample of size n.
Subsamples are
used to estimate the unknown population proportions of the sensitive
attributes.
Bourke (1981), in an extension of some earlier work, showed
that the multivariate design matrix for the multiple randomized response
model is simply the Kronecker product of the univariate design matrices
used for each trial.
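In matrix form, a single trial relates the vector of answer probabilities to the vector of true-category proportions through a design matrix P whose entry in row r and column s is the probability of answer r given true category s. When two trials are performed independently, the probability of the answer pair (r1, r2) given the true pair (s1, s2) is the product P1[r1, s1] P2[r2, s2], which is exactly an entry of the Kronecker product of P1 and P2. The NumPy sketch below uses hypothetical design values, with Warner's design on the first trial and the three-option design on the second. It builds the joint matrix and recovers the joint proportions from exact answer probabilities. The function names are our own.

```python
import numpy as np


def warner_design(p):
    """Rows: answers (Yes, No); columns: true status (has A, lacks A)."""
    return np.array([[p, 1 - p],
                     [1 - p, p]])


def forced_response_design(p, q_yes):
    """Three-option design: sensitive question (p), 'say Yes' (q_yes), 'say No'."""
    q_no = 1 - p - q_yes
    return np.array([[p + q_yes, q_yes],
                     [q_no, p + q_no]])


P1 = warner_design(0.7)
P2 = forced_response_design(0.7, 0.15)
P = np.kron(P1, P2)  # 4 x 4 design matrix for the pair of trials

# Hypothetical true joint proportions, ordered (A1&A2, A1&not A2, not A1&A2, neither).
pi_joint = np.array([0.10, 0.15, 0.20, 0.55])
answer_probs = P @ pi_joint                      # expected answer-pair proportions
pi_recovered = np.linalg.solve(P, answer_probs)  # moment estimator with exact inputs
print(np.round(pi_recovered, 6))
```

In a survey, `answer_probs` would be replaced by the observed proportions of the four answer pairs.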
Some authors have explored whether responses to two different
randomized questions are independent or not.
Kraemer (1980) proposed a
method to assess the correlation between two "Yes" responses.
She also
derived the kappa coefficient of agreement between, say, a husband and a
wife to the same question or between the same subject's responses to the
same question at different times.
Drane (1976) proved that if two
sensitive characteristics are independent then the randomized responses
to the two questions are also independent; the converse also holds.
Clickner and Iglewicz (1980) provided a test for independence when the
survey includes two sensitive questions; they also extended Warner's
model to the two sensitive question setting.
Levy (1976) proposed methods for making all possible pairwise
comparisons among parameters of k independent binomial populations based
upon information from randomized response models.
He used a studentized
range statistic with infinite degrees of freedom and showed how to apply
the Tukey and the Newman-Keuls methods for pairwise contrasts.
In the
case of unequal sample sizes, the author used a method of Spjotvoll and
Stoline, which produces more powerful tests than the Scheffé method when
sample sizes are not severely unbalanced.
Levy (1977) provided a set of
sample sizes for selecting the population with the largest value of
$\pi_A$ from among k binomial populations when $\pi_A$ is estimated by Warner's
model.
The results of a Monte Carlo study provided support for these
sample sizes.
I.4 Validation studies
Although many researchers have noted the need for validation
studies of the randomized response models, the literature contains only
a limited number of such reports.
Taken together they demonstrate that
randomized response models reduce the bias of the estimates relative to
direct question models, and that the estimates are reasonably close to
the true population parameter.
The first validation study, which tested
the unrelated question model by using a sample of households selected
from birth certificates which contained the marital status of the
mother, produced conflicting results (Horvitz, Shah, and Simmons
(1967)).
The estimated proportion of all households with a birth to an
unmarried woman was in relatively close agreement with the true
proportion, and also close when the estimates were computed separately
for white and non-white households.
However, when testing the model with two
trials per respondent, the estimated proportions differed greatly from
the true population proportions.
The authors suggested several
possibilities for these discrepancies, including errors in data coding
or processing, differences between the actual question selection
probabilities and the intended probabilities, and response errors due to
misunderstanding the instructions and non-truthful responses, among
others.
The authors postulated several alternate randomized response
models that adjust for these possible sources of error.
Locander, Sudman, and Bradburn (1974) selected four topics which
they believed, a priori, differed in their levels of threat to the
respondent and which contained validation information in public records.
They used four interviewing techniques which differed in the degree of
interviewer-respondent interaction: self-administered, telephone, single
unrelated question randomized response, and face-to-face interviews.
The telephone surveys achieved the highest response rates, while
randomized response methods produced completion rates similar to
personal interviewing.
The proportions of distortion, where distortion
is defined as the absolute value of the difference between the estimate
and the true value divided by the total sample size, increased as threat
increased, with the exception of the randomized response technique for
bankruptcy whose estimate was not distorted.
Compared with the other
methods, the randomized response model tended to produce higher
variances across the threats.
At the highest threat level, randomized
response models gave the lowest distortion rate, while in the next level
of threat, the model gave the highest distortion rate, and for the third
level there was no distortion.
The authors suggested that randomized
response methods are least effective in reducing over-reporting of
socially desirable acts and most effective in reducing under-reporting
of socially undesirable acts.
The randomized response models gave the lowest distortion on
threatening questions asking about socially undesirable acts, although
bias still exists.
However, the authors did not report the probability
of selecting the sensitive question or the proportion of the population
with the innocuous characteristics.
These would indicate the level of
protection afforded by the randomized response models and thus might
help explain the amounts of distortion given by the respondent.
The concept of investigating the usefulness of randomized response
according to the levels of threat to the respondent is an important one.
The present dissertation will later explore this aspect of threat more
fully and attempt to indicate how to utilize such jeopardy in the design
of the randomized response survey.
Frenette and Begin (1979) validated a randomized response model
by first asking a sample of adults to record their party affiliation and
deposit the response in a locked box.
Then, using the method of Fidler
and Kleinknecht (1977), three simulation studies were conducted by
applying the randomized response technique to the party preference
slips.
The use of this approach assured 100% truthful reporting.
The
authors reported estimates that agreed closely with the true values.
However, this is what one might expect, because the randomized response
technique was applied to the slips after they had been deposited; the
marked slips were considered to be "true" values.
Lamb and Stem (1978) used direct questions, and the unrelated
question and quantitative models to estimate the proportion and the mean
number of students who have failed college courses.
The estimates were
compared with the true values reported by the registrar's office.
The
authors measured the sensitivity of the questions by testing the
differences between the direct question estimate and the randomized
response model estimates.
They determined that only the quantitative question was sensitive.
None of the estimates was significantly
different from the true value, demonstrating the efficacy of randomized
response models in this setting.
Tracy and Fox (1981) compared the number of self-reported arrests
estimated by randomized response and direct question techniques with the
true number.
The randomized response model achieved a smaller response
bias relative to the direct question model.
They also demonstrated that
the randomized response models reduce systematic bias.
For a sample of college students, Shotland and Yankowski (1982)
compared the randomized response estimate with the true value of the
proportion of students who had received information about a test prior
to taking the test.
The authors found that the randomized response
estimate was much closer to the true value than was the direct question
estimate.
I.5 Comparison studies
Several studies compare the estimates obtained by randomized
response models with those obtained by other survey designs.
Among
studies that use theoretical techniques, Boruch (1972) compared the
variances of the Warner, the unrelated question and the contamination
models, and demonstrated the conditions under which the three models
yield identical results.
He also showed that, relative to the
contamination model, the unrelated question model is more efficient for
extreme values of the probability of selecting the sensitive question in
the presence of false negatives.
Pollock and Bek (1976) and Clickner
and Iglewicz (1980) showed that Warner's model is superior to the direct
question technique, in terms of MSE, when there is a considerable
increase in the rate of truthful responses for Warner's model.
Folsom
et al. (1973) demonstrated that the two unrelated questions model is
never more efficient than the single unrelated question model with $\pi_Y$
known, but is more efficient than Moors' optimized model.
Greenberg et
al. (1969) compared mean squared errors and showed the unrelated
question randomized response model is more efficient than Warner's model
if the rate of truthful responses is greater for the unrelated question
model.
Greenberg, Horvitz, and Abernathy (1974) compared six different
randomized response models by computing design effects and showed that
Warner's model was the least efficient, whereas the single and the two
unrelated question models with $\pi_Y$ known were generally efficient.
In certain instances the multiple trials model was most efficient.
In the presence of less than truthful reporting, the randomized response models
were always more efficient than the direct question setting.
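For orientation, the standard variance formulas under simple random sampling with replacement and completely truthful answers are $\mathrm{Var}(\hat\pi_W) = \pi_A(1-\pi_A)/n + p(1-p)/[n(2p-1)^2]$ for Warner's model and $\mathrm{Var}(\hat\pi_U) = \lambda(1-\lambda)/(np^2)$, with $\lambda = p\pi_A + (1-p)\pi_Y$, for the single unrelated question model with $\pi_Y$ known. The Python sketch below evaluates them at illustrative parameter values of our own choosing. It does not reproduce the design effects or the untruthful-reporting scenarios of the studies cited above.

```python
def warner_variance(pi_a: float, p: float, n: int) -> float:
    """Variance of Warner's estimator of pi_A (requires p != 1/2)."""
    return pi_a * (1 - pi_a) / n + p * (1 - p) / (n * (2 * p - 1) ** 2)


def unrelated_question_variance(pi_a: float, pi_y: float, p: float, n: int) -> float:
    """Variance of the unrelated question estimator of pi_A with pi_Y known."""
    lam = p * pi_a + (1 - p) * pi_y
    return lam * (1 - lam) / (n * p ** 2)


n, pi_a, pi_y = 1000, 0.1, 0.3
for p in (0.6, 0.7, 0.8):
    w = warner_variance(pi_a, p, n)
    u = unrelated_question_variance(pi_a, pi_y, p, n)
    print(f"p = {p}: Warner {w:.6f}, unrelated question {u:.6f}, efficiency {w / u:.2f}")
```

For these values the unrelated question model has the smaller variance at each p. The size of the gain depends on the choice of $\pi_Y$.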
Each of the empirical studies reviewed here compared estimates
obtained by randomized response models with those obtained by the direct
question method.
The studies comprise many applied research areas.
Among studies that compare the unrelated question model with $\pi_Y$ known,
Barth and Sandler (1976) showed that the randomized response estimate of
alcohol consumption among high school students was greater than the
direct question estimate, a statistically significant difference.
I-Cheng, Chow, and Rider (1972) and Rider et al. (1976) found the same
result when comparing the estimates of induced abortions among a
population in Taiwan.
Rider et al. (1976) also showed that the
randomized response estimate was significantly greater than that
obtained by a repeat interview method.
By contrast, Martin and Newman
(1982), who interviewed ninth grade students on health behaviors, found
that the differences in the estimates from these two techniques were
statistically non-significant.
A few studies compared the estimates obtained by variations of the
unrelated question model with those obtained by the direct question
technique.
Volicer and Volicer (1982) used Moors' model to obtain
estimates for variables with dichotomous and with quantitative outcomes
regarding alcohol use and non-compliance among hypertensives.
Differences between the two types of estimates were not statistically
significant for the dichotomous data, but were significant for the
quantitative data from the entire sample, and from women, from people 52
years old or older, and among people with any post-high school
education.
Reaser, Hartsock, and Hoehn (1975) found that the direct
question estimate was smaller than the Loynes' randomized response
estimate when p=.5, and larger than the randomized response estimate
when p=.83.
Zdep and Rhodes (1976) showed that the response rates and
estimates were higher for the randomized response model with two
unrelated questions than for the direct question.
This study, like most,
did not present the standard errors of the estimates needed to
determine the range of differences between the two estimates.
Two studies used a randomized response model containing two
alternate questions; one question instructed the respondent to respond
"Yes" and the other to respond "No".
Using social issues of different
degrees of sensitivity among college students, Begin, Boivin, and
Bellerose (1979) found statistically significant differences between the
two types of estimates for many items.
In some cases the sign of the
test statistic indicated that some sensitive attributes might in fact be
socially desirable, and thus would elicit a higher proportion of "Yes"
responses to direct questions.
In addition to these two types of
estimates, Begin and Boivin (1980) obtained estimates made by a
projective method.
Among three sensitive issues, one significant
difference existed between the direct question and randomized response
techniques, and significantly different estimates existed between the
projective method and the randomized response model.
Wiseman, Moriarty,
and Schaefer (1975-76) found that the randomized response estimate was
close to the personal interview estimate; on two items it was
significantly higher than a self-administered questionnaire estimate;
and on two other items it was significantly lower.
They attributed
these results to a small sample size, lack of confidence among the
respondents toward the randomizing device, misunderstanding of the
randomized response instructions, and suspicion of risk.
Using the two unrelated questions randomized response model,
Goodstadt and Gruson (1975) found that the estimates of drug use among
high school students were significantly larger for the randomized
response model.
Further, refusal rates were significantly lower
when using the randomized response technique.
Goodstadt, Cook, and
Gruson (1978) found similar results in a separate study of drug use.
Zdep et al. (1979) found that the randomized response estimates of the
proportion of marijuana users among certain demographic categories were
consistently greater than the direct question estimates.
In place of
the direct question interview, Zdep and Rhodes (1976) used two self-administered interviews to elicit information about child abuse among
parents.
The randomized response model yielded the highest response
rates and gave estimates which were four to five times larger than those
estimated by the self-administered interviews.
In general, these studies compare the estimates from a
randomized response model with those from a direct question setting.
These studies tend to show that the randomized response models yield
significantly higher estimates when the attribute is socially
undesirable and underreporting can be safely assumed to exist with a
direct question whether by personal interview or by a self-administered
questionnaire.
I.6 Applications
Researchers have implemented randomized response models in many
areas of applied research.
In addition to the studies mentioned in I.3,
others include estimating abortion rates (Abernathy, Greenberg, and
Horvitz (1970)), mean number of abortions in an urban population and
mean income of heads of households (Greenberg, Kuebler, Abernathy, and
Horvitz (1971)), abortion rates in a developing country (Chow, Gruhn,
and Chang (1979)), the proportion of abortions among various subclasses
of women in the United States (Shimizu and Bonham (1978)), evaluation of
faculty performance (Smith and Sosnowski (1972)), and the proportion of
consumers who shoplift and of those who deface retail property (Geurts,
Andrus, and Reinmuth (1975-76)).
Additional applications could be cited but such an exercise will
simply show that imaginative researchers in all fields of social
sciences have utilized randomized response techniques fairly
successfully.
The field of social science has been the chief
beneficiary of randomized response because the technique is designed to
overcome the stigma that might be associated with a social act or values
of certain personal attributes as perceived by society.
II. RANDOMIZED RESPONSE: A PRACTICAL TECHNIQUE?
II.1 Introduction
The first chapter presented a history of randomized response
models and described many of the different models that have evolved from
Warner's original conception.
The objective of providing protection to
the respondent and to the interviewer has underscored this evolution of
models.
However, in spite of all the work and development that has
focused on randomized response models, they have yet to fulfill the
expectation that they would be more widely used in practice.
Perhaps
the major obstacles to their adoption are the uncertainties about the
actual degree of protection provided by these different designs, and
about the effect of the choice of the parameters in the models on the
behavior of the respondents.
The central question is:
can the right
choice of parameters, which actually determines the degree of
protection, increase response rates and truthful responses while also
yielding reasonable estimates as measured by bias and variance?
This
chapter focuses on some of these aspects.
II.2 Proposed for highly sensitive issues
As discussed in Chapter I, randomized response models were first
designed in an attempt to gain better information regarding behaviors or
practices that are in some way considered sensitive in nature.
In such
situations, respondents may hesitate to give truthful answers, or may
even refuse to participate in a survey.
That such models are of
potential value is evidenced both by the theoretical development of
Warner's model into the many forms of randomized response models and by
their use in a diversity of applied research areas such as marketing,
psychology, sociology, public health, education, and others outlined
earlier.
And yet randomized response models are not utilized as often
as they might be.
For example, Pohl and Pohl
(197~)
state that the
theoretical issues have been resolved and advocate more work on issues
related to application and methodology.
These issues would include
providing guidelines for determining which techniques and types of
devices should be used in particular situations, correlation between the
understandability of a method and the degree of cooperation, and
selecting the model parameters for optimal results, which the authors
consider to be the critical issue.
Part of the reason for the current underutilization of randomized response
models may be attributable to the
degree of protection as perceived by the respondent, who then decides
either to refuse to participate or to give a nontruthful response.
This degree of perceived protection may very well vary with the
sensitivity of the topic under study.
A better understanding of the
protection given to the respondent and interviewer, and of the
relationship between the choice of model parameters and respondent
behavior, could lead to greater acceptability of randomized response
models.
II.3 Protection of the respondent and the interviewer
Social science surveys at times focus on extremely sensitive
issues.
Despite assurances that their answers to direct questions will
be handled with complete confidentiality, respondents may still remain
skeptical and either refuse to participate or choose to give nontruthful
responses.
This distrust is even more problematic when dealing with
issues that are criminal in nature.
Such instances involve ethical
concerns which might even include the possibility of providing self-incriminating information.
Further, the interviewer is also placed in
jeopardy when recording information regarding illegal behavior.
In
these types of settings, randomized response models would be a very
desirable alternative to direct questions or self-administered
questionnaires.
Fox and Tracy (1980) discuss some of these issues in
their review article which focuses on the potential of randomized
response models in criminological research where both the research
subject and the researcher are concerned about individual anonymity.
In her discussion of the research by Fox and Tracy (1980), Miller (1981)
argues that Fox and Tracy failed to stress that the randomized response
model reduced bias only among persons with multiple arrests and
increased bias among single arrestees, and that if the data were weighted to
represent the actual distribution, the reduction in response bias
would be minimal.
She proposes some alternatives to address the
problems of "response validity," one of which is to relate the innocuous
question to a socially desirable behavior, such as "How many times have
you stopped to help a stranger whose car had broken down?"
This
question would encourage "Yes" responses; however, it is not an
innocuous question and would probably provide an additional source of
bias.
Finally, Miller argues that randomized response models contain a
"fatal flaw" of too much variance in the estimates from designs
providing adequate protection; and thus advocates the development of
alternative methods of indirect survey-based estimation.
Fox and Tracy
(1981) reply that the poor results for the one-time arrestees were due to
their increased hazards, and that the final distribution of arrestees
was apparently different from that taken into account when the model was
designed.
They again argue for additional field tests to compare models
and to understand the respondent behaviors of cooperation, comfort, and
comprehension.
Selles (1978) expresses concerns similar to those of Fox and Tracy
(1980) in his review of research methods and of problems associated with
performing research on sensitive topics in the family, such as child
abuse, wife abuse, and family violence.
He also notes that federally
funded projects contain strict guidelines on the protection of human
subjects; thus, randomized response models are of potential importance
in studies requiring anonymity of participants.
In the discussion of
different sampling techniques and strategies, randomized response models
are identified as having the methodological advantage of providing
confidentiality for the respondent and reducing the bias of the estimate
due to evasion.
However, he also notes that the randomized response
technique contains a potential disadvantage of inducing skepticism
concerning the degree of protection provided by the technique, perhaps
resulting in noncooperation among the respondents.
Several studies, some of which were discussed in I.5, note increased
cooperation among participants in a randomized response model setting as
compared with direct question settings (Reaser, Hartsock, and Hoehn
(1975); Fidler and Kleinknecht (1977); Goodstadt, Cook, and Gruson
(1978); Zdep et al. (1979); Begin, Boivin, and Bellerose (1980); Begin
and Boivin (1980); and Volicer and Volicer (1982)).
Some studies report
that respondents found the randomizing device easy to understand, felt
that no trick was involved, or felt comfortable with the device
(Abernathy, Greenberg, and Horvitz (1970); I-Cheng, Chow, and Rider
(1972); Shotland and Yankowski (1972); Krotki and Fox (1973); Locander,
Sudman, and Bradburn (1974); and Chow, Gruhn, and Chang (1979)).
One study notes that a nonsensitive question dealing with a socially
desirable behavior resulted in better cooperation (Zdep and Rhodes
(1976)); this finding supports the Miller (1981) suggestion mentioned
earlier.
Only a limited number of studies have attempted to measure the
respondent's
perceived level of protection or to assess the sensitivity
of questions (e.g., Moriarty and Wiseman (1976); Shotland and Yankowski
(1982); and Carr (1983)).
Two of these articles, which we now
introduce, provide the basis for much of the research developed in the
present manuscript.
Soeken and Macready (1982) administered randomized
response models in a study designed to assess a respondent's perceived
level of protection and willingness to cooperate.
Using the unrelated
question randomized response model, $\pi_Y$ known, the authors selected $\pi_Y$ to
be either close to the initial estimate of $\pi_A$, which they called "low
discrepancy," or close to 0 or to 1 in order to
achieve a high level of discrepancy.
These selections followed the
recommendations of Greenberg et al. (1977) and Lanke (1975),
respectively.
Additionally, for each level of discrepancy, 4 values of p
(the question selection probability) were used, thus yielding a 2×4
factorial design.
Seven pairs of sensitive and unrelated questions were
administered; respondents were also instructed to respond to the
statements "I am sure the researcher knows which questions I
answered" and "I would be willing to answer any questions using this
technique."
Perceived protection seemed to be unrelated to the degree
of discrepancy but linearly related to the value of p, and was
significantly less when p = .91 than at other levels.
The authors recommend that p be restricted to values of at most .85.
No trends were noted for the
probability of responding to a sensitive question.
They point out that
the questionnaire form contained items of varying levels of sensitivity
and that further research should examine the relation of perceived
protection to item sensitivity.
Himmelfarb and Lickteig (1982) propose a social desirability scale
which measures the sensitivity of a direct question and possible
direction of bias by comparing responses to a direct question with those
to a randomized response model.
The underlying assumptions are that
responses to direct questions are biased both by social desirability and
by undesirability of the reported behavior and opinions, and that the
randomized response technique circumvents these tendencies to some
degree.
They point out that previous research simply compares responses
from direct questions with those of randomized response models, but does
not assess whether the responses are more valid; these studies are not
definitive and simply suggest that such responses are more valid.
The
authors argue that more meaningful tests of validity of both techniques
require checks against records.
They also argue that previous research
has not addressed the extent to which people, in direct questioning,
will overreport socially desirable behaviors.
In their study, the
authors gave a sample of college students a questionnaire and answer
sheet, and instructed them to omit their name from both sheets.
Further, the students were told that the study was an attempt to
ascertain the types of questions that would be answered truthfully or
untruthfully under direct questioning.
The students were instructed to
answer the questions by indicating whether they thought that the typical
college student of the same sex would answer the question truthfully or
not.
If they thought the typical student would not answer truthfully,
they were told to indicate whether the answer would be distorted by a
"Yes" or a "No" response.
The reported results included the estimates of $\pi_A$
by both the direct question and randomized response techniques for the
same set of questions.
The difference of the two estimates was
converted into standard normal differences or z-scores.
The social
desirability difference was determined by calculating the difference
between the proportions of students who said the typical college student
would distort his/her answer by answering "Yes" and by answering "No."
Pearson product-moment correlations showed that the z-scores and social
desirability differences were significantly correlated.
The authors
point out that the actual differences between the direct question and
randomized response estimates were quite small, and offer some possible
explanations.
However, they failed to notice that the probability of a
respondent being perceived as an A subject when giving a "Yes" response,
P(A|Yes), ranged from about .8 to 1.0, with only a few values falling
below the lower limit; see IV.2 for a further discussion of P(A|Yes).
A high value of P(A|Yes) is almost equivalent to a direct question
setting, which does not give any protection to the respondent.
Thus these high probabilities may account for the small
differences between the two estimates; smaller values of P(A|Yes) may
yield larger discrepancies.
It should be noted that reports of randomized response
studies do not typically include these jeopardy values, nor do they
usually provide enough information from which such calculations could be
made, as was possible here.
These investigators again emphasize the need
for studies that explore the usefulness and validity of randomized
response models.
II.4 Effect of respondent behavior on population estimates
As discussed earlier, respondent behavior can have a large impact
on population estimates through bias and variance.
Refusal to
participate and nontruthful reporting are two of the major
considerations when designing a randomized response model.
If the
choice of parameters places the respondents at too high a risk of being
perceived as A subjects, then the potential participants may refuse to
participate, or if they do participate, they may give a nontruthful
response.
This latter situation would be reflected in the bias of the
estimate.
The randomized response model introduces its own source of
variation; see I.3.1.
The higher the probability of selecting the
sensitive question, the smaller is this variance.
However, a higher
probability of selecting the sensitive question lowers the protection
given to the respondent, which could lead to nonresponse or nontruthful
reporting.
On the other hand, increasing the protection results in a
larger variance.
So a trade-off exists between increasing the
probability of selecting the sensitive question and increasing the
protection to the respondent.
Other sources of error in the estimate
include respondents misunderstanding how to use the randomizing
device, a randomizing device that does not present the
questions with the assumed probabilities, and an innocuous
question that itself carries some degree of sensitivity.
At least one study has attempted to deal with many combinations of
respondent behavior.
Gould, Shah, and Abernathy (1969) develop a
general behavioral model for the randomized response technique requiring
two trials per respondent; see I.3.6.
Their model takes into account
the probabilities of selecting the questions; the proportions of people
with the sensitive characteristic, with the non-sensitive
characteristic, who misunderstood the instructions and answered at
random, and who selected the sensitive question but answered the innocuous
question instead; and the probabilities of telling the truth in various
situations.
Specific behavioral models are then derived for specific
settings and applied to a set of data reported by Horvitz, Shah, and
Simmons (1967) that contained many discrepancies; see I.3.6 for a
discussion of the model used by Horvitz, Shah, and Simmons (1967).
A way of exploring the relationship between respondent behavior
and the estimates is to examine the relationship between respondent
protection and choice of model parameters.
Several authors have defined
various measures of protection, which we will now discuss.
II.5 Relationship between respondent protection and the choice of parameters
While several authors have recommended working rules (see I.3.2.a) for
selecting the value of p (e.g., Greenberg et al. (1969),
Greenberg et al. (1971)), and others have explored the effects of less
than truthful reporting on the bias of the estimates, only a few have
examined the relationship between respondent risks or jeopardies and the
parameters in the randomized response models.
Using the single unrelated question randomized response model,
Lanke (1975) explores the relation between p and $\pi_Y$ through P(A|Yes),
which is the conditional probability that the respondent belongs to the
A group given that his/her response to the question selected by the
randomizing device is "Yes."
These concepts are used in
constructing one of our models describing the relationship between the
probability of a truthful response and individual levels of protection.
He argued that this "risk of suspicion" should be bounded by some
constant c:
$$P(A\mid\text{Yes}) \le c. \tag{2.1}$$
The rate of cooperation decreases beyond c. In fact, Lanke postulates
that potential respondents may use P(A|Yes) as a basis for deciding
whether or not to participate or to give a truthful response.
From the
definitions of conditional probabilities,
$$P(A\mid\text{Yes}) = \frac{\pi_A[p + (1-p)\pi_Y]}{p\pi_A + (1-p)\pi_Y}. \tag{2.2}$$
The boundary on P(A|Yes) leads to
$$p \le \frac{\pi_Y(c - \pi_A)}{\pi_A(1-c) + \pi_Y(c - \pi_A)}. \tag{2.3}$$
Then $\pi_A < c$ is a necessary and sufficient condition for the
existence of a p that satisfies (2.3).
Further, the boundary on p decreases as c decreases.
Lanke (1975) and also Loynes (1976) conclude
that the optimal design occurs when $\pi_Y = 1$, so that 100% protection
would be given to the A individual.
However, one negative aspect of
this strategy is that a "No" answer would automatically identify the
respondent as having answered the sensitive question, which violates a
principle of randomized response models.
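A minimal Python sketch of (2.1) and (2.2) follows. It computes P(A|Yes) for the unrelated question model and the largest p that keeps P(A|Yes) at or below a chosen bound (called `c` in the code), obtained by solving $\pi_A[p + (1-p)\pi_Y] \le c[p\pi_A + (1-p)\pi_Y]$ for p. The function names are hypothetical, and the sketch assumes $0 < \pi_A < 1$ and $0 < \pi_Y \le 1$.

```python
from typing import Optional


def risk_of_suspicion(p: float, pi_a: float, pi_y: float) -> float:
    """P(A|Yes) for the unrelated question model with pi_Y known, as in (2.2)."""
    p_a_and_yes = p * pi_a + (1 - p) * pi_a * pi_y  # has A and answers "Yes"
    p_yes = p * pi_a + (1 - p) * pi_y               # overall P(Yes)
    return p_a_and_yes / p_yes


def max_p_under_bound(c: float, pi_a: float, pi_y: float) -> Optional[float]:
    """Largest p in (0, 1] with P(A|Yes) <= c, or None if no such p exists.

    Rearranging pi_A[p + (1-p)pi_Y] <= c[p*pi_A + (1-p)pi_Y] gives
    p <= pi_Y(c - pi_A) / [pi_A(1 - c) + pi_Y(c - pi_A)] when pi_A < c < 1.
    """
    if c <= pi_a:
        return None  # P(A|Yes) > pi_A for every p > 0, so no design qualifies
    if c >= 1:
        return 1.0   # a bound of 1 or more never binds
    bound = pi_y * (c - pi_a) / (pi_a * (1 - c) + pi_y * (c - pi_a))
    return min(1.0, bound)


if __name__ == "__main__":
    # Illustrative values only: pi_A = 0.2, pi_Y = 0.5, bound c = 0.5.
    p_max = max_p_under_bound(0.5, 0.2, 0.5)
    print(p_max, risk_of_suspicion(p_max, 0.2, 0.5))
```

With the illustrative values $\pi_A = 0.2$, $\pi_Y = 0.5$ and c = 0.5, the bound works out to p ≤ 0.6, and P(A|Yes) evaluated at that p equals the bound (up to floating-point rounding). Lowering c lowers the permitted p, consistent with Lanke's observation about the boundary on p.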
Extending Lanke's work, Anderson (1976) argues that a "No"
response also carries some risk; that is, a "No" response may carry a
suspicion that the individual belongs to group A, or, in symbols, P(A|No).
He further considers the risk
to potential respondents who do not have the sensitive characteristic by
introducing two response distributions and showing that the parameter
$\pi_A$ is a mixture parameter between the two distributions:
$$f(y) = \pi_A f_A(y) + (1-\pi_A) f_a(y), \tag{2.4}$$
where $f_A(y)$ and $f_a(y)$ are the response probability density functions
from A and a individuals, respectively, and $\pi_A$ is the proportion of the
population with the sensitive characteristic.
Lanke (1976) extends this
work through the use of information theory and discusses the loss of
information in various randomized response designs.
In this context, he
also presents some measures of respondent protection.
Anderson (1977) develops a general randomized response model (see
I.3.7) and defines a family of conditional densities, called revealing
densities, which are similar to P(A|Yes).
The sensitive characteristic $X \in \Omega_X$
is distributed according to the unknown distribution $F_X(x)$.
The
respondent gives an answer Y with a probability distribution dependent
on the x-values.
Then the family of conditional densities of X given
Y = y, $y \in \Omega_Y$, is
$$h_X(x\mid y) = \frac{h_Y(y\mid x)\, f_X(x)}{f_Y(y)}, \qquad y \in \Omega_Y.$$
The discrepancy between $f_X(x)$ and $h_X(x\mid y)$ reflects the invasion
of privacy caused by the answer Y = y.
The degree of protection is alternatively indicated by the
amount of Fisher information lost due to the randomizing device.
He also discusses measures of minimal protection.
Leysieffer and Warner (1976) discuss the optimal designs of
randomized response models based upon levels of jeopardy, which are
defined as follows.
Response R is said to be jeopardizing for an A
individual if the conditional probability of membership in group A,
given response R, is greater than $\pi_A$:
$$P(A\mid R) > \pi_A. \tag{2.5}$$
Similarly, for individuals without A, which we term "a" individuals,
R is jeopardizing if
$$P(a\mid R) > 1 - \pi_A. \tag{2.6}$$
They then say that natural measures for the different levels of
jeopardy are based on P(R|A) and P(R|a), and are defined by
$$g(R, A) = \frac{P(R\mid A)}{P(R\mid a)} \tag{2.7}$$
and
$$g(R, a) = \frac{P(R\mid a)}{P(R\mid A)}. \tag{2.8}$$
They point out that each g is a function of the parameters of the
design of the model, and not of $\pi_A$.
When g(Yes, A) > 1, a "Yes" response is jeopardizing with respect to A;
similarly, when g(No, a) > 1, a "No" response is jeopardizing with
respect to a.
Using these
definitions of levels of jeopardy, they compare various models and
discuss minimizing the variance of models at fixed levels of
jeopardy.
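The ratios (2.7) and (2.8) can be evaluated directly once P(R|A) and P(R|a) are known. The sketch below assumes the unrelated question model with $\pi_Y$ known and Y independent of A, so that P(Yes|A) = p + (1 - p)$\pi_Y$ and P(Yes|a) = (1 - p)$\pi_Y$. The function names are hypothetical, and an infinite ratio is returned when the denominator is zero.

```python
import math


def response_prob(response: str, group: str, p: float, pi_y: float) -> float:
    """P(R | group) in the unrelated question model with pi_Y known.

    group "A" has the sensitive trait, "a" does not; Y is assumed
    independent of A (an assumption of this sketch).
    """
    if group == "A":
        yes = p + (1 - p) * pi_y  # sensitive question, or unrelated question with Y
    elif group == "a":
        yes = (1 - p) * pi_y      # only the unrelated question can yield "Yes"
    else:
        raise ValueError("group must be 'A' or 'a'")
    if response == "Yes":
        return yes
    if response == "No":
        return 1 - yes
    raise ValueError("response must be 'Yes' or 'No'")


def jeopardy(response: str, wrt: str, p: float, pi_y: float) -> float:
    """Leysieffer-Warner ratio g(R, A) when wrt == "A", or g(R, a) when wrt == "a"."""
    other = "a" if wrt == "A" else "A"
    num = response_prob(response, wrt, p, pi_y)
    den = response_prob(response, other, p, pi_y)
    return math.inf if den == 0 else num / den


if __name__ == "__main__":
    # Illustrative design: p = 0.7, pi_Y = 0.5.
    print(jeopardy("Yes", "A", 0.7, 0.5), jeopardy("No", "a", 0.7, 0.5))
```

With p = .7 and $\pi_Y$ = .5, both g(Yes, A) and g(No, a) equal .85/.15, or about 5.7, so each response is jeopardizing for one group. With $\pi_Y$ = 1, g(No, a) is infinite because only a individuals can answer "No".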
Tamhane (1981) extends the definition of respondent
jeopardy to the case of 2 or more sensitive characteristics, and
then derives the jeopardy functions for various randomized response
techniques.
Greenberg et al. (1977) discuss many aspects of respondent jeopardy
and, in particular, examine the risk to A respondents as well as that to
the a respondents.
The hazard for A individuals is defined as
$$\begin{aligned}
H_A &= P(A \text{ is perceived as } A) \\
&= P(\text{Yes}\mid A)\,P(A\mid\text{Yes}) + P(\text{No}\mid A)\,P(A\mid\text{No}) \\
&= [p + (1-p)\pi_Y]\,P(A\mid\text{Yes}) + (1-p)(1-\pi_Y)\,P(A\mid\text{No}).
\end{aligned} \tag{2.9}$$
A similar definition is given for the a individuals, and definitions of
limited hazards are also presented; these are discussed in detail in
IV.3.
A reduction in hazard for the A respondents necessitates an
increase in hazard among the a respondents, which may lead to reduced
cooperation among the a respondents.
They define a benefit ratio, which
measures the reduction in hazard (i.e., a gain) among the A respondents
relative to the increase in hazard (i.e., a loss) among the a
respondents.
They recommend that the bias be related to the risk of
suspicion P(A|Yes) in order to show the influence of p, $\pi_Y$, and $\pi_A$ on
the bias, and that the best strategy for choosing p and $\pi_Y$ will involve
minimizing the mean squared error of $\hat\pi_A$.
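A sketch of (2.9) is given below. It computes P(A|Yes) and P(A|No) by Bayes' rule for the unrelated question model, with Y assumed independent of A. It returns $H_A$ together with the analogous hazard for a individuals, which we take, as an assumption, to be $H_a$ = P(Yes|a)P(A|Yes) + P(No|a)P(A|No). The function name is hypothetical.

```python
from typing import Tuple


def hazards(p: float, pi_a: float, pi_y: float) -> Tuple[float, float]:
    """Return (H_A, H_a): the probabilities that an A (resp. a) respondent is
    perceived as A, averaging P(A|response) over the responses he/she gives."""
    yes_given_A = p + (1 - p) * pi_y
    yes_given_a = (1 - p) * pi_y
    p_yes = pi_a * yes_given_A + (1 - pi_a) * yes_given_a
    p_no = 1 - p_yes
    # Posterior probabilities of A given each response (Bayes' rule);
    # a response that never occurs is given weight zero.
    a_given_yes = pi_a * yes_given_A / p_yes if p_yes > 0 else 0.0
    a_given_no = pi_a * (1 - yes_given_A) / p_no if p_no > 0 else 0.0
    h_A = yes_given_A * a_given_yes + (1 - yes_given_A) * a_given_no  # eq. (2.9)
    h_a = yes_given_a * a_given_yes + (1 - yes_given_a) * a_given_no  # assumed analogue
    return h_A, h_a


if __name__ == "__main__":
    print(hazards(0.7, 0.2, 0.5))  # illustrative values only
```

Two checks follow from the definitions. At p = 1 (a direct question) $H_A$ = 1 and $H_a$ = 0, and at p = 0 both hazards equal $\pi_A$. The weighted sum $\pi_A H_A + (1-\pi_A)H_a$ reduces to $\sum_R P(R)P(A\mid R) = \pi_A$ for every design. This identity is one way to see why a reduction in hazard for the A respondents must be offset by an increase among the a respondents.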
Many researchers have compared theoretical variances of models by
equating the values of p or Pi (e.g., Greenberg et al. (1969) and Moors
(1971)).
However, when comparing different randomized response models,
Lanke (1976) and Warner (1976) recommend that the comparisons should be
made for equal levels of protection, rather than for equal values of p
or of other parameters.
Fligner, Policello, and Singh (1977) use this
notion when they compare Warner's model with the unrelated question
model using data from Greenberg et al. (1969).
They define the primary
protection afforded the respondent by a scheme S as
(2.10 )
Then the models are compared for equal levels of primary protection.
II.6 Summary
This chapter has attempted to discuss the issues related to the
controversy of whether or not randomized response techniques are useful
in practice.
The approach has been to relate this issue to the notion
of respondent behavior as a function of the perceived level of
protection.
A major thrust of the present research concerns
the effects of perceived protection and of individual levels of
tolerance on the behavior of the respondents as measured by the
probability of giving a truthful response.
III. MODEL I: AN UNRELATED QUESTION RANDOMIZED RESPONSE MODEL THAT
ALLOWS NON-TRUTHFUL RESPONSES AMONG A AND Y INDIVIDUALS
III.1 Introduction
The theoretical development in this research focuses on the
unrelated question randomized response model with $\pi_Y$ known.
Our goal is to recommend to an investigator an optimal value of
$\pi_Y$ ($\pi_{Y,\mathrm{opt}}$) for this
model, based upon the sensitivity of the stigmatizing question, an
estimate of $\pi_A$, and the choice of p.
The theoretical development results in tables of $\pi_{Y,\mathrm{opt}}$
for several combinations of $\pi_A$, p, and degree of sensitivity.
In this chapter, we develop a general model that estimates $\pi_A$
in the presence of non-truthful reporting among both the A
and Y respondents.
We assume that there are sometimes untruthful
responses to the Y question because respondents may fear or perceive
some threat in answering "Yes".
We derive the bias, variance, and mean
squared error for the estimate of $\pi_A$.
Later chapters explore the
relationship between non-truthful reporting and the risk of suspicion
and the selection of $\pi_Y$ for fixed p and $\pi_A$, and discuss the
specifications of the fixed parameters.
Chapter VI discusses a somewhat different
approach to estimating $\pi_A$ in the presence of non-truthful reporting
among A and a individuals.
III.2 Definitions of probabilities of truthful reporting
A search of the literature for studies that considered non-truthful reporting found only studies that adjusted for non-truthful
reporting among the individuals with the sensitive characteristic, while
assuming full cooperation among individuals who possessed the innocuous
characteristic and answered the unrelated question; see Greenberg et al.
(1969) and equations (1.14) to (1.16).
Our model accounts for non-truthful reporting among four mutually
exclusive subgroups that are defined by the presence or absence of the
sensitive characteristic (A or a) and of the innocuous characteristic (Y
or y):
$$\begin{array}{ll|cc}
 & & \multicolumn{2}{c}{\text{Innocuous characteristic}} \\
 & & Y & y \\ \hline
\text{Sensitive} & A & \pi_A\pi_Y & \pi_A\pi_y \\
\text{characteristic} & a & \pi_a\pi_Y & \pi_a\pi_y
\end{array}$$
where
$\pi_A$ = proportion of population with A,
$\pi_a$ = proportion of population without A = $1-\pi_A$,
$\pi_Y$ = proportion of population with Y, and
$\pi_y$ = proportion of population without Y = $1-\pi_Y$.
We assume that only those individuals who must give a "Yes"
response might choose to give a non-truthful response, while those who
must give a "No" response will cooperate fully.
If the sensitive question is selected, the truthful response probabilities
may distribute as follows:
$$\begin{array}{ll|cc}
 & & \multicolumn{2}{c}{\text{Innocuous characteristic}} \\
 & & Y & y \\ \hline
\text{Sensitive} & A & T_{AY} & T_{Ay} \\
\text{characteristic} & a & 1.0 & 1.0
\end{array}$$
where
$T_{AY}$ = probability of "Yes" response among A individuals with Y, and
$T_{Ay}$ = probability of "Yes" response among A individuals without Y.
As indicated, this model assumes 100% truthful reporting among
individuals who are required to answer "No"; in other words, the a
individuals will always respond truthfully to the sensitive question,
because their response is non-jeopardizing.
Consider now the AY and Ay
subgroups of individuals.
If all individuals give honest responses,
then the AY individuals must always answer "Yes" to either the sensitive
or the innocuous question; however, although the Ay individuals would
answer "Yes" to the sensitive question, they would still have the
opportunity to answer "No" to the innocuous question.
Thus, because the
AY individuals must always answer "Yes", while the Ay individuals could
answer "No", we assume that $T_{AY} \le T_{Ay}$.
If the unrelated question is selected, the truthful response
probabilities may distribute as:
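$$\begin{array}{ll|cc}
 & & \multicolumn{2}{c}{\text{Innocuous characteristic}} \\
 & & Y & y \\ \hline
\text{Sensitive} & A & T_{AY} & 1.0 \\
\text{characteristic} & a & T_{aY} & 1.0
\end{array}$$
where
$T_{AY}$ = probability of "Yes" response among Y individuals with A, and
$T_{aY}$ = probability of "Yes" response among Y individuals without A.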
Note that we assume that y individuals who select the unrelated question
will always respond "No".
Further, because the AY individuals must
always answer "Yes" in a full cooperation setting while the aY
individuals could answer "No" if the sensitive question is selected, we
assume that $T_{AY} \le T_{aY}$.
By comparing the distribution of truthful responses for the
unrelated question with that for the sensitive question, notice that we
assume that the subgroup of AY individuals will demonstrate the same
probability of a truthful response, $T_{AY}$, when either the sensitive or
unrelated question is selected.
Further, because the subgroup of ay
individuals must give a "No" response when selecting either the
sensitive or unrelated question, we assume that they will always give a
truthful response, or $T_{ay} = 1$.
Weighted averages of the truthful response probabilities give
overall rates of truthful response among the A or Y individuals. The
probability of a truthful response for an A individual if asked the
sensitive question is
$$T_A = \pi_Y T_{AY} + \pi_y T_{Ay}. \tag{3.1}$$
For a Y individual if asked the unrelated question, the probability of a
truthful response is
$$T_Y = \pi_A T_{AY} + \pi_a T_{aY}. \tag{3.2}$$
With these definitions of truthful response probabilities we can now
derive an estimator for $\pi_A$.
III.3 An estimator for $\pi_A$ and its properties
Using the above notation, the unrelated question model with $\pi_Y$
known yields the following estimator of $\pi_A$.
Let
p = probability of selecting the sensitive question,
$\lambda'$ = probability of a "Yes" response,
$\hat\lambda'$ = observed proportion of "Yes" responses, and
n = sample size.
Then $n\hat\lambda'$ is distributed as Binomial($n$, $\lambda'$), and $E(\hat\lambda') = \lambda'$.
This notation is equivalent to that for (1.7) and (1.8).
Then,
$$\begin{aligned}
\lambda' &= p[(\text{proportion with } A \text{ and } Y)\Pr(\text{truth among } A \text{ and } Y) + (\text{proportion with } A \text{ and } y)\Pr(\text{truth among } A \text{ and } y)] \\
&\quad + (1-p)[(\text{proportion with } A \text{ and } Y)\Pr(\text{truth among } A \text{ and } Y) + (\text{proportion with } a \text{ and } Y)\Pr(\text{truth among } a \text{ and } Y)] \\
&= p(\pi_A\pi_Y T_{AY} + \pi_A\pi_y T_{Ay}) + (1-p)(\pi_A\pi_Y T_{AY} + \pi_a\pi_Y T_{aY}) \\
&= p\pi_A(\pi_Y T_{AY} + \pi_y T_{Ay}) + (1-p)\pi_Y(\pi_A T_{AY} + \pi_a T_{aY}) \\
&= p\pi_A T_A + (1-p)\pi_Y T_Y
\end{aligned} \tag{3.3}$$
by substitution of (3.1) and (3.2).
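As a numerical check on (3.1) through (3.3), the sketch below computes $\lambda'$ two ways. The first sums cell by cell over the subgroups that must answer "Yes"; the second uses the collapsed form $p\pi_A T_A + (1-p)\pi_Y T_Y$. The function names and the parameter values used in the check are illustrative assumptions, not values from this study.

```python
def truthful_rates(pi_a, pi_y, t_AY, t_Ay, t_aY):
    """Overall truthful-response rates T_A, eq. (3.1), and T_Y, eq. (3.2)."""
    T_A = pi_y * t_AY + (1 - pi_y) * t_Ay  # A individuals, sensitive question
    T_Y = pi_a * t_AY + (1 - pi_a) * t_aY  # Y individuals, unrelated question
    return T_A, T_Y


def prob_yes_by_cell(p, pi_a, pi_y, t_AY, t_Ay, t_aY):
    """lambda' summed over the subgroups required to answer "Yes" (first line of (3.3))."""
    sensitive = pi_a * pi_y * t_AY + pi_a * (1 - pi_y) * t_Ay  # AY and Ay cells
    unrelated = pi_a * pi_y * t_AY + (1 - pi_a) * pi_y * t_aY  # AY and aY cells
    return p * sensitive + (1 - p) * unrelated


def prob_yes_collapsed(p, pi_a, pi_y, T_A, T_Y):
    """lambda' in the final form p*pi_A*T_A + (1-p)*pi_Y*T_Y of (3.3)."""
    return p * pi_a * T_A + (1 - p) * pi_y * T_Y


if __name__ == "__main__":
    T_A, T_Y = truthful_rates(0.2, 0.5, 0.85, 0.95, 0.95)  # illustrative values
    print(prob_yes_by_cell(0.7, 0.2, 0.5, 0.85, 0.95, 0.95),
          prob_yes_collapsed(0.7, 0.2, 0.5, T_A, T_Y))
```

With all truthful-response probabilities set to 1, both forms reduce to $p\pi_A + (1-p)\pi_Y$, the probability of a "Yes" response under full cooperation.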
Define the following expression:
$$(\hat\pi_A\mid\pi_Y)_{U1} = \frac{\hat\lambda' - \pi_Y(1-p)}{p}. \tag{3.4}$$
We will use (3.4) as an estimator of $\pi_A$, because its expectation is the
following:
$$E[(\hat\pi_A\mid\pi_Y)_{U1}] = \frac{\lambda' - \pi_Y(1-p)}{p}, \tag{3.5}$$
which equals $\pi_A$ by (1.11) under 100% truthful reporting.
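The bias is then
$$\begin{aligned}
\text{Bias}(\hat\pi_A\mid\pi_Y)_{U1} &= E[(\hat\pi_A\mid\pi_Y)_{U1}] - \pi_A \\
&= \frac{\lambda' - \pi_Y(1-p)}{p} - \frac{\lambda - \pi_Y(1-p)}{p} \\
&= \frac{\lambda' - \lambda}{p}.
\end{aligned} \tag{3.6}$$
By substitution of (1.7) and (3.3), (3.6) becomes
$$\begin{aligned}
\text{Bias}(\hat\pi_A\mid\pi_Y)_{U1} &= \frac{p\pi_A T_A + (1-p)\pi_Y T_Y - [p\pi_A + (1-p)\pi_Y]}{p} \\
&= \frac{p\pi_A(T_A - 1) + (1-p)\pi_Y(T_Y - 1)}{p}.
\end{aligned} \tag{3.7}$$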
We note here that the bias is always negative (it is zero only under
full cooperation), because we are assuming that some individuals with A
or Y will not give a "Yes" response when confronted with the sensitive
question or the unrelated question, respectively.
Further, when comparing (3.7) with the bias expression
that assumes 100% cooperation among the Y individuals confronted with
the unrelated question, equation (1.16), we observe an additional
term, $\frac{1-p}{p}\pi_Y(T_Y - 1)$, which approaches 0 as $p \to 1$ or $T_Y \to 1$ and
approaches $-\infty$ as $p \to 0$ (for $T_Y < 1$).
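The sign and size of the bias in (3.7) can also be checked by simulation. The sketch below draws respondents with A and Y independently and applies the randomizing device with probability p. Respondents who must answer "Yes" do so with their subgroup's truthful-response probability, while those who must answer "No" always do. The estimator (3.4) is then applied to the observed proportion of "Yes" answers. This is a hypothetical illustration with made-up parameter values, not part of the original analysis, and the function names are our own.

```python
import random


def simulate_estimate(n, p, pi_a, pi_y, t_AY, t_Ay, t_aY, seed=0):
    """Simulate one Model I survey of size n and return estimator (3.4).

    Respondents who must answer "No" always do so; those who must answer
    "Yes" do so with their subgroup's truthful-response probability.
    """
    rng = random.Random(seed)
    yes = 0
    for _ in range(n):
        has_a = rng.random() < pi_a          # A and Y drawn independently
        has_y = rng.random() < pi_y
        if rng.random() < p:                 # sensitive question selected
            if has_a:
                yes += int(rng.random() < (t_AY if has_y else t_Ay))
        elif has_y:                          # unrelated question selected
            yes += int(rng.random() < (t_AY if has_a else t_aY))
    lam_hat = yes / n                        # observed proportion of "Yes"
    return (lam_hat - pi_y * (1 - p)) / p


def bias_from_37(p, pi_a, pi_y, t_AY, t_Ay, t_aY):
    """Bias (3.7), with T_A and T_Y formed as in (3.1) and (3.2)."""
    T_A = pi_y * t_AY + (1 - pi_y) * t_Ay
    T_Y = pi_a * t_AY + (1 - pi_a) * t_aY
    return (p * pi_a * (T_A - 1) + (1 - p) * pi_y * (T_Y - 1)) / p


if __name__ == "__main__":
    params = (0.7, 0.2, 0.5, 0.85, 0.95, 0.95)  # illustrative values only
    print(simulate_estimate(200_000, *params, seed=1), 0.2 + bias_from_37(*params))
```

With p = .7, $\pi_A$ = .2, $\pi_Y$ = .5, $T_{AY}$ = .85 and $T_{Ay}$ = $T_{aY}$ = .95, (3.7) gives a bias of $-.035$. With a large n, the simulated estimate therefore settles near .165 rather than the true value .2.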
Because $n\hat\lambda'$ is distributed as Binomial($n$, $\lambda'$),
the variance of the estimate is
$$\text{Var}(\hat\pi_A\mid\pi_Y)_{U1} = \frac{\lambda'(1-\lambda')}{np^2}. \tag{3.8}$$
By substitution of (3.3), (3.8) becomes
$$\text{Var}(\hat\pi_A\mid\pi_Y)_{U1} = \frac{\{p\pi_A T_A + (1-p)\pi_Y T_Y\}\{1 - p\pi_A T_A - (1-p)\pi_Y T_Y\}}{np^2}. \tag{3.9}$$
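The MSE of the estimate is
$$\begin{aligned}
\text{MSE}(\hat\pi_A\mid\pi_Y)_{U1} &= \{\text{Bias}(\hat\pi_A\mid\pi_Y)_{U1}\}^2 + \text{Var}(\hat\pi_A\mid\pi_Y)_{U1} \\
&= \left\{\frac{p\pi_A(T_A - 1) + (1-p)\pi_Y(T_Y - 1)}{p}\right\}^2 + \frac{\{p\pi_A T_A + (1-p)\pi_Y T_Y\}\{1 - p\pi_A T_A - (1-p)\pi_Y T_Y\}}{np^2}
\end{aligned} \tag{3.10}$$
by substitution of (3.7) and (3.9).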
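The closed forms (3.7), (3.9) and (3.10) translate directly into code. The sketch below takes the overall rates $T_A$ and $T_Y$ as inputs. The function name and the example values are illustrative assumptions.

```python
def model_one_moments(n, p, pi_a, pi_y, T_A, T_Y):
    """Bias (3.7), variance (3.9) and MSE (3.10) of the estimator (3.4).

    T_A and T_Y are the overall truthful-response rates of (3.1) and (3.2).
    """
    lam = p * pi_a * T_A + (1 - p) * pi_y * T_Y                    # lambda', (3.3)
    bias = (p * pi_a * (T_A - 1) + (1 - p) * pi_y * (T_Y - 1)) / p  # (3.7)
    var = lam * (1 - lam) / (n * p ** 2)                            # (3.8)/(3.9)
    return bias, var, bias ** 2 + var                               # (3.10)


if __name__ == "__main__":
    # Illustrative values only.
    bias, var, mse = model_one_moments(1000, 0.7, 0.2, 0.5, 0.9, 0.95)
    print(f"bias={bias:.4f} var={var:.6f} mse={mse:.6f}")
```

For n = 1000, p = .7, $\pi_A$ = .2, $\pi_Y$ = .5, $T_A$ = .9 and $T_Y$ = .95, the bias is about $-.031$ and the variance about .0004, so the squared bias (about .00094) already exceeds the variance. Setting $T_Y$ = 1 recovers the Greenberg et al. (1969) bias $\pi_A(T_A - 1)$.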
We note here several observations concerning the MSE in (3.10).
First, for samples of moderate or large size, the squared bias is the
dominant term; the variance equals $\lambda'(1-\lambda')/np^2$,
which cannot exceed $1/4np^2$.
Deriving the minimum of the squared bias is somewhat difficult.
The negative of the bias is
$$-\text{Bias}(\hat\pi_A\mid\pi_Y)_{U1} = \pi_A(1 - T_A) + \frac{1-p}{p}\pi_Y(1 - T_Y). \tag{3.11}$$
If $T_A$ does not depend on p, then clearly the squared bias is minimized by
choosing p = 1.
However, as we discuss in a subsequent chapter, $T_{AY}$,
$T_{Ay}$, and $T_{aY}$ themselves depend on p, $\pi_A$, and $\pi_Y$.
Thus finding the
minimum of the squared bias with respect to p and $\pi_Y$ is complicated.
If we set $T_Y = 1$ in (3.7), (3.9), and (3.10), then the expressions
reduce to the bias, variance, and MSE formulated by Greenberg et al.
(1969) when $\pi_Y$ is known without error; that is,
$\text{Bias}(\hat\pi_A\mid\pi_Y)_{U1} = \pi_A(T_A - 1)$, and
$$\text{Var}(\hat\pi_A\mid\pi_Y)_{U1} = \frac{\{p\pi_A T_A + (1-p)\pi_Y\}\{1 - p\pi_A T_A - (1-p)\pi_Y\}}{np^2}. \tag{3.12}$$
IV. THE RELATIONSHIP BETWEEN RESPONDENT HAZARDS AND T
IV.1 Introduction
A purpose of randomized response models in surveys of sensitive
issues is to elicit increased cooperation over the direct question
method or other survey techniques by providing protection to the
respondent and interviewer.
However, the degree of protection as
perceived by the respondent will affect the probabilities of truthful
responses, which we will represent by T without regard to whether the
respondent is an AY, aY, Ay, or ay individual.
For a particular
question, the degree of protection is determined by p, $\pi_A$, and $\pi_Y$, two
of which are selected by the survey designer.
For a particular
stigmatizing topic, we hypothesize that a certain relationship exists
between T and perceived protection.
Further, the shape of the
relationship may vary according to the sensitivity of the stigmatizing
topic.
This chapter explores the relationship between T and the perceived
protection by first proposing a general mathematical relationship.
We then generate a family of relationships, whose members correspond to
the degree of sensitivity of the stigmatizing subject.
We begin with certain definitions.
IV.2 Definition of risk of suspicion, P(A|Yes)
As discussed in Chapter II, Lanke (1975) defined the "risk of
suspicion", P(A|Yes), as the conditional probability that a respondent
belongs to the sensitive group given that the respondent's response is
"Yes".
By definition of conditional probabilities,
$$P(A\mid\text{Yes}) = \frac{P(A\cap\text{Yes})}{P(\text{Yes})}. \tag{4.1}$$
For the unrelated question model with $\pi_Y$ known,
$$\begin{aligned}
P(A\cap\text{Yes}) &= \Pr[\text{an individual has } A \text{ and selects the sensitive question}] \\
&\quad + \Pr[\text{an individual has } A \text{ and } Y \text{ and selects the unrelated question}] \\
&= p\pi_A + (1-p)\pi_A\pi_Y
\end{aligned} \tag{4.2}$$
and
$$P(\text{Yes}) = \lambda = p\pi_A + (1-p)\pi_Y \tag{4.3}$$
by substitution of (1.2).
Substituting (4.2) and (4.3) into (4.1) gives
$$P(A\mid\text{Yes}) = \frac{\pi_A[p + (1-p)\pi_Y]}{p\pi_A + (1-p)\pi_Y}. \tag{4.4}$$
Note that p = 1 is a direct question setting, or, equivalently,
P(A|Yes) = 1; and that when p = 0, P(A|Yes) = $\pi_A$.
IV.3 Definition of hazard functions
IV.3.1 Introduction
We are interested in modeling the concern felt by a respondent
when answering a randomized response question.
Greenberg et al. (1977)
suggest that this concern may center on whether or not the respondent
believes the interviewer perceives the respondent as A when the
respondent gives a response.
Further, we believe that a "No" response
carries little or no concern, whereas a "Yes" response could carry a
large amount of concern.
Greenberg et al. (1977) define the hazard for
a respondent to be the concern associated with any response, and the
limited hazard to be the concern associated with only the "Yes"
response.
In this paper, we concern ourselves with limited hazard functions
for the four population subgroups defined earlier.
The derivations
extend the Greenberg definitions of limited hazards:
$$H_A = \Pr(A \text{ is perceived as } A \text{ when answering Yes}) = P(\text{Yes}\mid A)\,P(A\mid\text{Yes})$$
and
$$H_a = \Pr(a \text{ is perceived as } A \text{ when answering Yes}) = P(\text{Yes}\mid a)\,P(A\mid\text{Yes}).$$
Using these definitions, we define the limited hazards in our study as
follows.
IV.3.2 Limited hazards for AY individuals
For individuals with both A and Y (AY), the limited hazard is
$$H_{AY} = \Pr(AY \text{ is perceived as } A \text{ when answering Yes}) = P(A\mid\text{Yes})\,P(\text{Yes}\mid AY). \tag{4.5}$$
An individual with both A and Y must always answer "Yes" when confronted
with either the sensitive or innocuous question.
Thus, P(Yes|AY) = 1 and (4.5) becomes
$$H_{AY} = P(A\mid\text{Yes}), \tag{4.6}$$
which is simply the risk of suspicion defined in (4.4).
IV.3.3 Limited hazard for Ay individuals
For individuals with both A and y (Ay), the limited hazard is
$$H_{Ay} = \Pr(Ay \text{ is perceived as } A \text{ when answering Yes}) = P(A\mid\text{Yes})\,P(\text{Yes}\mid Ay). \tag{4.7}$$
An Ay individual responds "Yes" only when the sensitive question is
selected.
Thus, P(Yes|Ay) = p and (4.7) becomes
$$H_{Ay} = p\,P(A\mid\text{Yes}). \tag{4.8}$$
Because 0 < p < 1, we note that the limited hazard for an Ay individual is
less than the limited hazard for an AY individual.
This, of course, is
what one would expect, since Ay individuals respond "Yes" less frequently,
according to the value of p.
IV.3.4 Limited hazard for aY individuals
For individuals with both a and Y (aY), the limited hazard is
$$H_{aY} = \Pr(aY \text{ is perceived as } A \text{ when answering Yes}) = P(A\mid\text{Yes})\,P(\text{Yes}\mid aY). \tag{4.9}$$
An aY individual responds "Yes" only when selecting the unrelated
question.
Thus, P(Yes|aY) = 1 - p, and (4.9) becomes
$$H_{aY} = (1-p)\,P(A\mid\text{Yes}). \tag{4.10}$$
If $p \ge .5$, we note that the limited hazard for individuals with only Y is
less than or equal to the limited hazard for individuals with only A.
IV.3.5 Limited hazard for ay individuals
For individuals with neither the sensitive nor the unrelated
characteristic (ay), the limited hazard is
$$H_{ay} = \Pr(ay \text{ is perceived as } A \text{ when answering Yes}) = P(A\mid\text{Yes})\,P(\text{Yes}\mid ay) = 0, \tag{4.11}$$
because ay individuals must always answer "No", or P(Yes|ay) = 0.
Thus, such individuals have no limited hazard; they would have some
hazard only if the full hazard were computed using P(A|No).
IV.3.6 Summary
When a potential participant is given instructions on the
randomized response technique, the respondent gains insight into the
values of p, $\pi_A$, and $\pi_Y$.
Although the exact values may be unknown, we
conjecture that the respondent possesses a concern associated with a
"Yes" response and that this quantity varies according to the four
subgroups of the population.
We have attempted to model this concern through the
use of limited hazards, which are summarized here for convenience:
$$\begin{aligned}
H_{AY} &= P(A\mid\text{Yes}), && (4.6) \\
H_{Ay} &= p\,P(A\mid\text{Yes}), && (4.8) \\
H_{aY} &= (1-p)\,P(A\mid\text{Yes}), && (4.10) \\
H_{ay} &= 0, && (4.11)
\end{aligned}$$
where P(A|Yes) is defined by (4.4).
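The four limited hazards summarized above differ only by the factor P(Yes | subgroup), so they can be computed together from (4.4). The sketch below is a minimal illustration. The function name and the dictionary keys are hypothetical.

```python
def limited_hazards(p: float, pi_a: float, pi_y: float) -> dict:
    """Limited hazards (4.6), (4.8), (4.10), (4.11) for AY, Ay, aY, ay."""
    # Risk of suspicion P(A|Yes), eq. (4.4)
    risk = pi_a * (p + (1 - p) * pi_y) / (p * pi_a + (1 - p) * pi_y)
    return {
        "AY": risk,            # P(Yes|AY) = 1
        "Ay": p * risk,        # P(Yes|Ay) = p
        "aY": (1 - p) * risk,  # P(Yes|aY) = 1 - p
        "ay": 0.0,             # P(Yes|ay) = 0
    }


if __name__ == "__main__":
    print(limited_hazards(0.7, 0.2, 0.5))  # illustrative values only
```

For p = .7, $\pi_A$ = .2 and $\pi_Y$ = .5, the hazards fall in the order $H_{AY} > H_{Ay} > H_{aY} > H_{ay} = 0$. At p = .5 the Ay and aY hazards coincide, in line with the remark following (4.10).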
We believe that individuals quantify, either approximately or
exactly, their limited hazard functions, and then decide
whether or not
to give a truthful response when confronted with a question requiring a
"Yes" response.
Because the respondent may reach a
decision for a truthful response after determining P(A|Yes), our
derivations of the risk of suspicion and limited hazard functions do not
include the probabilities of truthful responses.
Another way of
considering the hazard is that the individual respondent calculates
his/her hazard prior to selecting the question and may decide in advance
whether the jeopardy is too great a risk.
This concept of an
individual's threshold of tolerance will be discussed in a subsequent
chapter.
IV.4 Relationship between limited hazards and T
IV.4.1 Description of the relationship
Our study of the relationship between limited hazards and T builds
upon and extends several studies.
Greenberg et al. (1977) postulated a
mathematical relationship between the risk of suspicion, P(A|Yes), and
the probability of answering truthfully among the A individuals, $T_A$.
They suggest that $T_A$ remains close to unity at low values of P(A|Yes),
and then decreases rapidly to zero beyond a certain threshold of
P(A|Yes).
Soeken and Macready (1982), in their study of the
relationship of perceived respondent protection with the probability of
responding to a sensitive question, suggested that perceived protection
might be related to the sensitivity of the topic under study.
In the Greenberg et al. (1977) setting, the relationship of the
probability of telling the truth to the sensitivity of the topic would
correspond to different threshold values of P(A|Yes).
For example, a
low threshold would correspond to a highly sensitive topic, and thus $T_A$
would remain close to unity over only a small range of P(A|Yes).
Conversely, a mildly sensitive topic would have a high threshold, and $T_A$
would remain close to unity over a broad range of P(A|Yes).
Lastly, Himmelfarb and Lickteig (1982) proposed a measure of the
social desirability, i.e., sensitivity, of various topics.
Essentially,
they constructed the difference between an estimate produced by a direct
question and one produced by a randomized response technique.
The
magnitude of the difference indicates the degree of sensitivity of the
topic:
a large value denotes a highly sensitive topic, and a small
value denotes only a mildly sensitive topic.
In combining and extending these studies, we put forth two assumptions. First, we propose that the probability of a truthful response depends upon the respondent's subgroup and thus is related more accurately to the limited hazard than to the risk of suspicion. In this way, we can explore the probability of a truthful response separately for each subgroup in order to produce an estimate of $\pi_A$. Second, we assume that the relationship between T and the limited hazards is question-specific. We suggest rating a specific topic on an ordinal scale of 0 to 10, where 0 denotes little or no sensitivity and 10 denotes the most sensitive and threatening topic. A method such as Himmelfarb and Lickteig's could be adapted so that the differences calculated by their scale are categorized into the above eleven groups.
In order to illustrate these last several points, the following
example presents typical questions that might correspond to each of the
eleven categories of sensitivity.
These questions were used by
Himmelfarb and Lickteig (1982) when exploring the social desirability of
a "Yes" response among introductory psychology students to each of
fifty-six items.
Sensitivity (S)   Typical question
 0   Have you ever given help to a motorist in distress?
 1   Do you use lead-free gas in your car?
 2   Did you go to church or synagogue last weekend?
 3   Have you ever collected money for a charitable fund drive?
 4   Have you ever used some light drugs?
 5   Have you ever had complete sexual intercourse?
 6   Have you ever bought pornographic material?
 7   Have you ever taken something from a store without paying for it?
 8   Have you ever falsified your income tax report?
 9   Did you ever have an abortion?
10   Have you ever driven a motor vehicle while drunk?
While it is evident that social topics vary in their degree of sensitivity, it is less obvious that S also depends on place and time. For example, abortion was a highly sensitive issue in the United States in the 1960s; indeed, abortion was one of the earliest applications of randomized response models. However, in some other countries, abortion was never a significant issue. In addition to varying across geographic areas, society's values also change over time. Examples include abortion, marital infidelity, and psychiatric help, which are now more acceptable than they were decades ago.
For each of the eleven categories, we propose a unique curve that depicts the relationship between T and the limited hazards, using the ogive or sigmoidal shape postulated by Greenberg et al. (1977); see Figure IV.1 for an example. Each curve corresponds to a different threshold category described above: the group with a sensitivity of 10 has a low threshold, and the group with a sensitivity of 0 has an extremely high threshold. In the next step of the theoretical development, a family of curves, one for each of the eleven sensitivity categories, is generated. Then, for each of the eleven categories, a table of optimal values of $\pi_Y$ is produced for fixed p and $\pi_A$.
IV.4.2 A family of relationships between T and limited hazards
To generate a family of eleven curves, each corresponding to one of the levels of sensitivity described earlier and following the shape described by Greenberg et al. (1977), the relationships must have the following properties:
1. the x-values (limited hazards) range from 0 to 1;
2. the y-values (probabilities of a truthful response) range from 0 to 1;
3. the shape is that of an ogive.
[Figure IV.1: Postulated relationship between probability of a truthful response (T) and the limited hazard]
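One convenient function that allows us to achieve these properties is the beta survival function, which is described as follows (Hastings and Peacock (1975)). Let X denote a random variate such that $0 \le x \le 1$, and let $\alpha > 0$ and $\beta > 0$ denote shape parameters. X follows a beta distribution if its probability density function is

$$f(x; \alpha, \beta) = \frac{x^{\alpha - 1}(1 - x)^{\beta - 1}}{B(\alpha, \beta)}, \qquad 0 \le x \le 1, \tag{4.12}$$

where $B(\alpha, \beta)$ is the beta function, with parameters $\alpha$ and $\beta$, given by

$$B(\alpha, \beta) = \int_0^1 t^{\alpha - 1}(1 - t)^{\beta - 1}\,dt. \tag{4.13}$$

The beta cumulative distribution, also known as the incomplete beta function, is

$$F(x; \alpha, \beta) = \frac{1}{B(\alpha, \beta)} \int_0^x t^{\alpha - 1}(1 - t)^{\beta - 1}\,dt, \tag{4.14}$$

where $F(x; \alpha, \beta)$ is increasing in x over $0 \le x \le 1$. Then, the beta survival function is simply

$$S(x; \alpha, \beta) = 1 - F(x; \alpha, \beta), \tag{4.15}$$

where $S(x; \alpha, \beta)$ is decreasing in x over $0 \le x \le 1$, with $S(0; \alpha, \beta) = 1$ and $S(1; \alpha, \beta) = 0$ for all $\alpha$ and $\beta$.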
If we set $S(x; \alpha, \beta) = T$ for a given limited hazard, denoted by x, and for a given $\alpha$ and $\beta$, then we can generate a family of curves that relate T to the limited hazards by varying $\alpha$ and $\beta$ as follows. Define S = sensitivity of the threatening question, such that S = 0, 1, ..., 9, 10 as defined in IV.4.1. Then, letting

$$\beta = 2S + 1 \tag{4.16}$$

$$\alpha = 22 - \beta = 21 - 2S \tag{4.17}$$

gives $(\alpha, \beta) = (21, 1), (19, 3), \ldots, (3, 19), (1, 21)$ for S = 0, 1, ..., 9, 10. Expression (4.15) can then be rewritten as

$$S(x; 21 - 2S, 2S + 1) = 1 - F(x; 21 - 2S, 2S + 1). \tag{4.18}$$

Using this set of values of S, the family of curves depicted in Figure IV.2 was generated.
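Because the shape parameters defined above are integers with $\alpha + \beta = 22$, the curves can be evaluated without a special-function library. The sketch below relies on the standard identity that, for positive integer $\alpha$ and $\beta$, the incomplete beta function (4.14) equals $\Pr[\mathrm{Binomial}(\alpha + \beta - 1, x) \ge \alpha]$, so the survival function is a binomial lower tail with 21 trials. The function name `truth_probability` is ours; the dissertation does not say how its curves were computed.

```python
from math import comb


def truth_probability(hazard: float, sensitivity: int) -> float:
    """Probability of a truthful response, T = S(x; 21 - 2S, 2S + 1).

    For positive integer shape parameters, the beta CDF satisfies
    F(x; a, b) = Pr[Binomial(a + b - 1, x) >= a], so the survival
    function is Pr[Binomial(a + b - 1, x) <= a - 1]. Here a + b - 1 = 21.
    """
    if sensitivity not in range(11):
        raise ValueError("sensitivity must be an integer from 0 to 10")
    if not 0.0 <= hazard <= 1.0:
        raise ValueError("hazard must lie in [0, 1]")
    beta = 2 * sensitivity + 1    # beta = 2S + 1
    alpha = 22 - beta             # alpha = 21 - 2S
    trials = alpha + beta - 1     # always 21
    # Sum the binomial probabilities of 0, 1, ..., alpha - 1 successes.
    return sum(
        comb(trials, j) * hazard**j * (1.0 - hazard) ** (trials - j)
        for j in range(alpha)
    )


if __name__ == "__main__":
    # T at a few hazards for the least, a middle, and the most sensitive topic.
    for s in (0, 5, 10):
        print(s, [round(truth_probability(x, s), 3) for x in (0.1, 0.5, 0.9)])
```

For S = 10 the curve reduces to $(1 - x)^{21}$ and for S = 0 to $1 - x^{21}$, which gives a quick check of the lowest and highest thresholds.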
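From the definitions of the four limited hazards ((4.5), (4.8), (4.10), and (4.11)) and the general relationship between T and the limited hazards for a particular degree of sensitivity, the following relationships are obtained, where $H_{AY}$, $H_{Ay}$, and $H_{ay}$ denote the limited hazards of the AY, Ay, and ay subgroups:

$$T_{AY} = S(H_{AY}; \alpha, \beta) \tag{4.19}$$

$$T_{Ay} = S(H_{Ay}; \alpha, \beta) \tag{4.20}$$

$$T_{aY} = S\big((1 - p)P(A \mid \mathrm{Yes}); \alpha, \beta\big) \tag{4.21}$$

$$T_{ay} = S(H_{ay}; \alpha, \beta) \tag{4.22}$$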
[Figure IV.2: Relationship between probability of a truthful response (T) and limited hazard for eleven categories of sensitivity. Vertical axis: T, from 0 to 1; horizontal axis: hazard, from 0 to 1. S = sensitivity of stigmatizing question = $\tfrac{1}{2}(\beta - 1)$, where $\alpha + \beta = 22$, $\beta = 1, 3, \ldots, 19, 21$, and $\alpha$ and $\beta$ are parameters of the beta probability density function.]
With these definitions of $T_{AY}$, $T_{Ay}$, $T_{aY}$, and $T_{ay}$ for eleven categories of sensitivity, we are now prepared to select the optimal value of $\pi_Y$ for fixed values of p and $\pi_A$.
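The four relationships can be sketched as follows. Only $H_{aY} = (1 - p)P(A \mid \mathrm{Yes})$ is given explicitly in (4.21); the other three hazards used here, $H_{AY} = P(A \mid \mathrm{Yes})$, $H_{Ay} = pP(A \mid \mathrm{Yes})$, and $H_{ay} = 0$, are assumptions chosen to agree with the values quoted in VII.2.1 for $P(A \mid \mathrm{Yes}) = 1$ ($H_{AY} = 1$, $H_{Ay} = p$, $H_{aY} = 1 - p$, $H_{ay} = 0$). The definitions (4.5) to (4.11) themselves are not reproduced in this section, and the helper names are hypothetical.

```python
from math import comb


def beta_survival(x: float, sensitivity: int) -> float:
    """S(x; 21 - 2S, 2S + 1) via the binomial form of the incomplete beta."""
    alpha = 21 - 2 * sensitivity
    return sum(comb(21, j) * x**j * (1.0 - x) ** (21 - j) for j in range(alpha))


def subgroup_truth_probabilities(p_a_given_yes: float, p: float,
                                 sensitivity: int) -> dict:
    """Return T_AY, T_Ay, T_aY and T_ay for one value of P(A|Yes).

    Only H_aY = (1 - p) P(A|Yes) comes from (4.21). The other three hazards
    are ASSUMED forms that reproduce the values quoted in VII.2.1 at
    P(A|Yes) = 1 (H_AY = 1, H_Ay = p, H_aY = 1 - p, H_ay = 0).
    """
    hazards = {
        "AY": p_a_given_yes,            # assumption
        "Ay": p * p_a_given_yes,        # assumption
        "aY": (1 - p) * p_a_given_yes,  # (4.21)
        "ay": 0.0,                      # assumption: never required to say "Yes"
    }
    return {group: beta_survival(h, sensitivity) for group, h in hazards.items()}


if __name__ == "__main__":
    print(subgroup_truth_probabilities(0.6, 2 / 3, sensitivity=5))
```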
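V. AN INVESTIGATION OF THE CHOICE OF $\pi_Y$
V.1 Introduction
In this chapter, we select the $\pi_Y$ that minimizes $\mathrm{MSE}(\hat\pi_A \mid \pi_Y)'_{UT}$, which is equivalent to minimizing (3.10) with respect to $\pi_Y$:

$$\mathrm{MSE}(\hat\pi_A \mid \pi_Y)'_{UT} = \left[\pi_A(T_A - 1) + \frac{1 - p}{p}\,\pi_Y(T_Y - 1)\right]^2 + \frac{[p\pi_A T_A + (1 - p)\pi_Y T_Y]\,[1 - p\pi_A T_A - (1 - p)\pi_Y T_Y]}{np^2}. \tag{5.1}$$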
We would like to obtain a general result for the optimal $\pi_Y$. However, as shown in the previous chapters, the probabilities of telling the truth are themselves beta survival functions, which contain integrals involving $P(A \mid \mathrm{Yes})$, and thus involving $\pi_Y$. Thus, differentiating (5.1) with respect to $\pi_Y$ becomes very difficult, if not impossible. Therefore, we obtained the optimal $\pi_Y$ numerically. This chapter discusses the technique, which consists of specifying $\pi_A$, p, and n, and obtaining $\pi_Y$ by varying $P(A \mid \mathrm{Yes})$. In order to obtain the MSE, which requires the probabilities of telling the truth, we generate the probabilities ($T_{AY}$, $T_{aY}$, $T_{Ay}$, $T_{ay}$) for the four sub-populations using the limited hazards for eleven categories of sensitivity. We then select the $\pi_Y$ that gives the minimum MSE.
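Once the subgroup probabilities are known, the MSE in (5.1) is a closed-form expression. A minimal sketch, assuming the Chapter III identities $T_A = T_{AY}\pi_Y + T_{Ay}(1 - \pi_Y)$ and $T_Y = T_{AY}\pi_A + T_{aY}(1 - \pi_A)$ (quoted in VI.2) and a binomial variance for the observed proportion of "Yes" responses; `model1_moments` is a hypothetical helper name.

```python
def model1_moments(t_AY, t_Ay, t_aY, p, pi_a, pi_y, n):
    """Bias, variance and MSE of the Model I estimator for given T's."""
    t_a = t_AY * pi_y + t_Ay * (1 - pi_y)   # T_A: truth probability among A
    t_y = t_AY * pi_a + t_aY * (1 - pi_a)   # T_Y: truth probability among Y
    bias = pi_a * (t_a - 1) + (1 - p) / p * pi_y * (t_y - 1)
    lam = p * pi_a * t_a + (1 - p) * pi_y * t_y   # expected proportion of "Yes"
    var = lam * (1 - lam) / (n * p ** 2)
    return bias, var, bias ** 2 + var


if __name__ == "__main__":
    # Full truthfulness: no bias, variance only.
    print(model1_moments(1.0, 1.0, 1.0, p=0.5, pi_a=0.1, pi_y=0.3, n=100))
```

With all T's equal to one, the bias vanishes and the MSE reduces to the familiar unrelated-question variance.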
V.2 Specification of parameters
V.2.1 Choosing p, $\pi_A$, and n
The first step in the numerical technique is to specify the values of p and $\pi_A$ for which one would like to obtain $\pi_Y^{opt}$. We chose three levels of p, viz. 1/2, 2/3, and 3/4, which correspond to randomization devices such as flipping a coin, rolling a die, and flipping two coins, respectively.
Although the purpose of a survey is to estimate $\pi_A$, the researcher usually postulates a probable value of $\pi_A$ in advance, which may be obtained from a pilot study or from other research. Even though the goal of the unrelated question randomized response model is to estimate $\pi_A$, this presumed value must still be incorporated when recommending the optimal $\pi_Y$. We have chosen to recommend $\pi_Y^{opt}$ for five values of $\pi_A$: .01, .02, .05, .10, and .25. The variance calculation, (3.9), requires the sample size; we obtained $\pi_Y^{opt}$ for n = 100, 500, 1000, and 10,000.
V.2.2 Choosing $P(A \mid \mathrm{Yes})$, $\pi_Y$, and T
In order to model the decision making of the respondent, i.e., whether or not to give a truthful response, one needs to know the values of the limited hazards discussed in the previous chapter. Inasmuch as the limited hazards are functions of p and $P(A \mid \mathrm{Yes})$, where p is fixed, we need only specify many values of $P(A \mid \mathrm{Yes})$ to obtain the limited hazards and then, in turn, the corresponding probabilities of telling the truth, which are required for the MSE calculations. Further, a unique $\pi_Y$ is determined by each $P(A \mid \mathrm{Yes})$ when p and $\pi_A$ are known.
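From (4.4), we know that

$$P(A \mid \mathrm{Yes}) = \frac{\pi_A[p + (1 - p)\pi_Y]}{p\pi_A + (1 - p)\pi_Y}, \tag{5.2}$$

which gives

$$\pi_Y = \frac{p\pi_A[1 - P(A \mid \mathrm{Yes})]}{(1 - p)[P(A \mid \mathrm{Yes}) - \pi_A]}. \tag{5.3}$$

Thus, based upon the relationship in (5.3), for every value of $P(A \mid \mathrm{Yes})$ such that the denominator is positive (i.e., $P(A \mid \mathrm{Yes}) > \pi_A$) and the numerator does not exceed the denominator (so that $\pi_Y \le 1$), we can obtain a corresponding $\pi_Y$.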
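The mapping between $P(A \mid \mathrm{Yes})$ and $\pi_Y$ in (5.2) and (5.3), including the admissibility conditions just stated, can be sketched as follows; the function names are illustrative, and p is assumed to lie strictly between 0 and 1.

```python
def posterior_from_pi_y(pi_y, p, pi_a):
    """(5.2): P(A|Yes) for a given pi_Y under truthful reporting."""
    return pi_a * (p + (1 - p) * pi_y) / (p * pi_a + (1 - p) * pi_y)


def pi_y_from_posterior(p_a_yes, p, pi_a):
    """(5.3): the pi_Y implied by P(A|Yes), or None if it is not admissible."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    if not pi_a < p_a_yes <= 1:
        return None                      # denominator must be positive
    numerator = p * pi_a * (1 - p_a_yes)
    denominator = (1 - p) * (p_a_yes - pi_a)
    if numerator > denominator:
        return None                      # would give pi_Y > 1
    return numerator / denominator


if __name__ == "__main__":
    q = posterior_from_pi_y(0.3, p=0.5, pi_a=0.1)
    print(q, pi_y_from_posterior(q, p=0.5, pi_a=0.1))   # round trip
```

Note that $P(A \mid \mathrm{Yes}) = 1$ maps to $\pi_Y = 0$, the direct-question case discussed in Chapter VII.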
The next step is to obtain the probabilities of telling the truth. At each level of $P(A \mid \mathrm{Yes})$, the limited hazards were calculated using $P(A \mid \mathrm{Yes})$ and p. For a specific level of sensitivity, we then obtained $T_{AY}$, $T_{aY}$, $T_{Ay}$, and $T_{ay}$ through the application of (4.15), (4.16), (4.17), and (4.18), which relate the probability of a truthful response to the limited hazards through the survival function of the beta distribution.
V.2.3 Summary
Through these steps, we have now obtained all the values required for calculating the MSE, and thus for determining the optimal $\pi_Y$. First, p, $\pi_A$, and n are fixed. For a given $P(A \mid \mathrm{Yes})$, one can then obtain the corresponding $\pi_Y$ and limited hazards. At a given level of sensitivity, $T_{AY}$, $T_{Ay}$, $T_{aY}$, and $T_{ay}$ are obtained, which allows the MSE to be calculated.
V.3 Selecting $\pi_Y^{opt}$ for minimum MSE
At each combination of p, $\pi_A$, level of sensitivity, and n, we wish to recommend the optimal value of $\pi_Y$ for which $\mathrm{MSE}(\hat\pi_A \mid \pi_Y)'_{UT}$ is a minimum. This discussion is restricted to a particular combination of those variables; however, the technique is applicable to all combinations.
The approach to minimizing the MSE requires two steps: the first gives a rough approximation of the minimum MSE, and the second gives a more precise value. This approach is summarized here; for further details, see Appendix A. In the first step, the MSE, the limited hazards, the probabilities of a truthful response, and $\pi_Y$ were generated for 101 values of $P(A \mid \mathrm{Yes})$. We then determined the approximate minimum MSE by inspecting the MSE corresponding to each $P(A \mid \mathrm{Yes})$. The second step uses this value in an iterative process for obtaining the minimum MSE. Finally, $\pi_Y^{opt}$ is the $\pi_Y$ corresponding to the minimum MSE obtained by this iterative process; see Tables VII.1-VII.4 and the discussion of the results in Chapter VII.
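The two-step search can be sketched generically. The coarse step evaluates the objective at 101 equally spaced points, as in the text. Because the iterative refinement of Appendix A is not reproduced here, the second step substitutes a golden-section search on the two grid cells around the best grid point; this is a swapped-in technique, and, like any local refinement, it only finds the minimum inside that bracket (VII.2.1 notes that some cells have two local minima).

```python
import math


def two_step_minimize(f, lo, hi, grid_points=101, tol=1e-9):
    """Minimize f on [lo, hi]: coarse grid, then golden-section refinement."""
    # Step 1: evaluate f on an equally spaced grid and keep the best point.
    xs = [lo + (hi - lo) * i / (grid_points - 1) for i in range(grid_points)]
    values = [f(x) for x in xs]
    k = min(range(grid_points), key=values.__getitem__)
    a = xs[max(k - 1, 0)]
    b = xs[min(k + 1, grid_points - 1)]

    # Step 2: golden-section search on the bracket [a, b] around that point.
    inv_phi = (math.sqrt(5) - 1) / 2
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc <= fd:                  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:                         # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    x_best = (a + b) / 2
    # Never return something worse than the best grid point.
    if values[k] < f(x_best):
        return xs[k], values[k]
    return x_best, f(x_best)


if __name__ == "__main__":
    print(two_step_minimize(lambda t: (t - 0.3137) ** 2, 0.0, 1.0))
```

In the Model I setting, f would map a value of $P(A \mid \mathrm{Yes})$ to the MSE (5.1), returning `float("inf")` where (5.3) gives no admissible $\pi_Y$, and $\pi_Y^{opt}$ would follow from (5.3) at the minimizer.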
V.4 Recommendation for p
Although the primary purpose of this research is to select $\pi_Y^{opt}$, we would also like to address the uncertainty regarding the optimal value of p. To that end, for a given sensitivity and $\pi_A$, the MSEs for p = 1/2 and for p = 2/3 are each compared with the MSE for p = 3/4 through the efficiency ratio MSE(p = 3/4)/MSE(p). A ratio > 1 denotes an increase in efficiency relative to p = 3/4; a value < 1 denotes a decrease in efficiency relative to p = 3/4; and a value = 1 denotes no change relative to p = 3/4. These ratios produced Tables VII.5-VII.8, which are discussed in Chapter VII.
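The direction of these ratios can be pinned down with a small helper. A minimal sketch, assuming the efficiency ratio is MSE(p = 3/4)/MSE(p) as described above; the MSE values in the usage line are made-up numbers for illustration, not results from the tables.

```python
def efficiency_ratios(mse_by_p, reference=0.75):
    """Efficiency of each p relative to the reference p: MSE(reference) / MSE(p).

    A value above 1 means that p gives a smaller MSE than the reference.
    """
    reference_mse = mse_by_p[reference]
    return {p: reference_mse / mse for p, mse in mse_by_p.items()}


if __name__ == "__main__":
    # Hypothetical MSEs, for illustration only.
    print(efficiency_ratios({0.5: 0.020, 2 / 3: 0.0125, 0.75: 0.010}))
```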
VI. MODEL II: AN UNRELATED QUESTION RANDOMIZED RESPONSE MODEL THAT ALLOWS NON-TRUTHFUL RESPONSES AMONG A AND a INDIVIDUALS
VI.1 Introduction
The model presented in previous chapters considered non-truthful reporting among A and Y individuals. For each of the four mutually exclusive sub-populations and eleven categories of sensitivity, we hypothesized certain relationships between the probability of a truthful response and the probability of an individual being perceived as possessing the stigmatizing attribute when responding "Yes" to a question selected by the randomizing device. This approach assumes that an individual first quantifies his/her limited hazard, and then decides whether or not to give a truthful response when required to answer "Yes".
By contrast, a second model, which is developed in this chapter, does not depend on whether or not the individual possesses the innocuous characteristic and assumes that every individual possesses his/her own tolerance of suspicion, namely $\theta$. If the individual selects a question requiring a "No" response, then a truthful answer is given. However, if the question requires a "Yes" response, the individual calculates or estimates in some way the probability that a person who answers "Yes" will be perceived as having the stigmatizing characteristic. Suppose the respondent arrives at the value Q; then he/she answers "Yes" if Q does not exceed his/her tolerance for suspicion. The tolerance $\theta$ has a probability distribution (tolerance distribution) over the population. The shape of the tolerance distribution depends on how stigmatizing the characteristic is: the more stigmatizing, the lower the average tolerance. The distribution may also differ for people who have the sensitive characteristic from those who do not. However, we assume that an individual's tolerance is fixed before the question is selected, and is independent of whether the individual has the innocuous characteristic or not.
VI.2 An estimator for $\pi_A$ and its properties
The mathematical formulation for Model II, which was described in the preceding section, is the following. For any individual let

$\theta$ = tolerance of suspicion

and

Q = the calculated value that a person who answers "Yes" will be perceived as possessing the stigmatizing characteristic.

Note that $\theta$ is a random variable and Q is a value fixed in advance of answering "Yes". An individual who is confronted with a question requiring a "Yes" response uses the following rule to decide whether to give a truthful answer:
1. If $Q \le \theta$, give a true answer, i.e., "Yes".
2. If $Q > \theta$, give a false answer, i.e., "No".
Let $F_1(Q)$ be the tolerance distribution among persons who have the stigmatizing characteristic and $F_0(Q)$ among those who do not have it, where

$$F(Q) = \Pr(\theta \le Q) \tag{6.1}$$

and

$$1 - F(Q) = \Pr(\theta > Q). \tag{6.2}$$

Among the population of A individuals, the proportion $1 - F_1(Q)$ would give truthful answers to questions requiring a "Yes" response, and among the a population, the proportion $1 - F_0(Q)$ would give a truthful response.
Similar to the notation of Chapter III, let

p = probability of selecting the sensitive question,
$\lambda''$ = expected proportion of "Yes" responses,
$\hat\lambda''$ = observed proportion of "Yes" responses,
and
n = sample size.

Then $n\hat\lambda''$ is distributed as Binomial($n$, $\lambda''$) and $E(\hat\lambda'') = \lambda''$, where

$$\begin{aligned}
\lambda'' &= p\,[\text{proportion with } A]\Pr(\text{truth among } A) \\
&\quad + (1 - p)\,[(\text{proportion with } A \text{ and } Y)\Pr(\text{truth among } A) + (\text{proportion with } a \text{ and } Y)\Pr(\text{truth among } a)] \\
&= p\pi_A[1 - F_1(Q)] + (1 - p)\{\pi_A\pi_Y[1 - F_1(Q)] + (1 - \pi_A)\pi_Y[1 - F_0(Q)]\} \\
&= p\pi_A[1 - F_1(Q)] + (1 - p)\pi_Y\{\pi_A[1 - F_1(Q)] + (1 - \pi_A)[1 - F_0(Q)]\}.
\end{aligned} \tag{6.3}$$
As in Chapter III, define the following expression:

$$\hat\pi_A = \frac{\hat\lambda'' - \pi_Y(1 - p)}{p}, \tag{6.4}$$

which we will use as an estimator of $\pi_A$, because its expectation is

$$E(\hat\pi_A) = \frac{\lambda'' - \pi_Y(1 - p)}{p}, \tag{6.5}$$

which equals $\pi_A$ by (1.11) under 100% truthful reporting.
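The bias is then:

$$\mathrm{Bias}(\hat\pi_A \mid \pi_Y)''_{UT} = E[(\hat\pi_A \mid \pi_Y)''_{UT}] - \pi_A = \frac{\lambda'' - \pi_Y(1 - p)}{p} - \frac{\lambda - \pi_Y(1 - p)}{p} = \frac{\lambda'' - \lambda}{p}, \tag{6.6}$$

where $\lambda = p\pi_A + (1 - p)\pi_Y$ is the expected proportion of "Yes" responses under 100% truthful reporting. By substitution of (1.7) and (6.3), (6.6) becomes

$$\mathrm{Bias}(\hat\pi_A \mid \pi_Y)''_{UT} = -\frac{F_1(Q)\,\pi_A[p + (1 - p)\pi_Y] + F_0(Q)(1 - p)\pi_Y(1 - \pi_A)}{p}. \tag{6.7}$$

Because $n\hat\lambda''$ is distributed as Binomial($n$, $\lambda''$), the variance of the estimate is:

$$\mathrm{Var}(\hat\pi_A \mid \pi_Y)''_{UT} = \frac{\lambda''(1 - \lambda'')}{np^2}. \tag{6.8}$$

By substitution of (6.3), (6.8) becomes:

$$\mathrm{Var}(\hat\pi_A \mid \pi_Y)''_{UT} = \frac{\{[1 - F_1(Q)]\pi_A[p + (1 - p)\pi_Y] + [1 - F_0(Q)](1 - p)\pi_Y(1 - \pi_A)\}\,\{1 - [1 - F_1(Q)]\pi_A[p + (1 - p)\pi_Y] - [1 - F_0(Q)](1 - p)\pi_Y(1 - \pi_A)\}}{np^2}. \tag{6.9}$$

The MSE of the estimate is:

$$\mathrm{MSE}(\hat\pi_A \mid \pi_Y)''_{UT} = \left\{\frac{F_1(Q)\,\pi_A[p + (1 - p)\pi_Y] + F_0(Q)(1 - p)\pi_Y(1 - \pi_A)}{p}\right\}^2 + \frac{\{[1 - F_1(Q)]\pi_A[p + (1 - p)\pi_Y] + [1 - F_0(Q)](1 - p)\pi_Y(1 - \pi_A)\}\,\{1 - [1 - F_1(Q)]\pi_A[p + (1 - p)\pi_Y] - [1 - F_0(Q)](1 - p)\pi_Y(1 - \pi_A)\}}{np^2}. \tag{6.10}$$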
If we assume that the tolerance distribution for the A individuals is the same as that for the a individuals, then $F_1 = F_0 = F$ and (6.3) becomes:

$$\lambda'' = [1 - F(Q)][p\pi_A + (1 - p)\pi_Y]. \tag{6.11}$$

Then, it follows that the bias is:

$$\mathrm{Bias}(\hat\pi_A \mid \pi_Y)''_{UT} = -F(Q)\left[\pi_A + \frac{1 - p}{p}\,\pi_Y\right], \tag{6.12}$$

the variance is:

$$\mathrm{Var}(\hat\pi_A \mid \pi_Y)''_{UT} = \frac{[1 - F(Q)][p\pi_A + (1 - p)\pi_Y]\,\{1 - [1 - F(Q)][p\pi_A + (1 - p)\pi_Y]\}}{np^2}, \tag{6.13}$$

and the mean squared error is:

$$\mathrm{MSE}(\hat\pi_A \mid \pi_Y)''_{UT} = \left\{F(Q)\left[\pi_A + \frac{1 - p}{p}\,\pi_Y\right]\right\}^2 + \frac{[1 - F(Q)][p\pi_A + (1 - p)\pi_Y]\,\{1 - [1 - F(Q)][p\pi_A + (1 - p)\pi_Y]\}}{np^2}. \tag{6.14}$$

When $F_1 = F_0 = F$, the bias and variance for Model I are identical to those for Model II under certain conditions. In Chapter III, we showed

$$T_A = T_{AY}\pi_Y + T_{Ay}(1 - \pi_Y) \quad \text{and} \quad T_Y = T_{AY}\pi_A + T_{aY}(1 - \pi_A).$$

If $T_{AY} = T_{Ay}$, that is, if the probability of a truthful response from an A individual is the same no matter which question is selected, then it follows that $T_A = T_{AY} = T_{Ay}$. By a similar argument, if $T_{AY} = T_{aY}$, then $T_Y = T_{AY} = T_{aY}$. With the further restriction that $T_A = T_Y$, we have $1 - F(Q) = T_A = T_Y$. Thus, with these restrictions and appropriate substitutions, the bias and variance for Model I ((3.7) and (3.9), respectively) are equivalent to (6.12) and (6.13).
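Expressions (6.3) and (6.7) to (6.10) translate directly into code. The sketch below (hypothetical name `model2_moments`) takes $F_1(Q)$ and $F_0(Q)$ as plain numbers; setting them equal reproduces (6.11) to (6.13).

```python
def model2_moments(f1, f0, p, pi_a, pi_y, n):
    """Bias, variance and MSE of the Model II estimator, (6.3) and (6.7)-(6.10).

    f1 and f0 are F_1(Q) and F_0(Q): the proportions of A and of a individuals
    whose tolerance falls short of Q, who therefore answer "No" untruthfully.
    """
    a_part = pi_a * (p + (1 - p) * pi_y)       # A individuals required to say "Yes"
    a_bar_part = (1 - p) * pi_y * (1 - pi_a)   # a individuals required to say "Yes"
    lam = (1 - f1) * a_part + (1 - f0) * a_bar_part      # (6.3)
    bias = -(f1 * a_part + f0 * a_bar_part) / p          # (6.7)
    var = lam * (1 - lam) / (n * p ** 2)                 # (6.8) and (6.9)
    return bias, var, bias ** 2 + var                    # (6.10)


if __name__ == "__main__":
    print(model2_moments(0.5, 0.5, p=0.5, pi_a=0.1, pi_y=0.3, n=100))
```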
VI.3 Defining Q and F(Q)
In order to implement the model described in the previous section, an approximation for Q must be defined. Recall that when a respondent selects a question that requires a "Yes" response, the respondent calculates or estimates in some way the probability that a person who answers "Yes" will be perceived as having the stigmatizing characteristic. Two very different approaches to arriving at Q might exist. A sophisticated respondent might conceivably calculate $Q = P(A \mid \mathrm{Yes})$ by estimating $\pi_A$, $\pi_Y$, and p and then applying probability theory. However, most respondents will follow cruder methods, for which setting $Q = P(A \mid \mathrm{Yes})$ might be a reasonable approximation. This is the one we use for Model II. Further, $Q = P(A \mid \mathrm{Yes})$ will be calculated assuming that some respondents might give non-truthful responses, since some respondents will realize that other respondents will also have incentives to be untruthful. So,

$$Q = P(A \mid \mathrm{Yes}) = \frac{P(A \cap \mathrm{Yes})}{P(\mathrm{Yes})} = \frac{\pi_A p[1 - F_1(Q)] + \pi_A\pi_Y(1 - p)[1 - F_1(Q)]}{\pi_A p[1 - F_1(Q)] + \pi_Y(1 - p)\{\pi_A[1 - F_1(Q)] + (1 - \pi_A)[1 - F_0(Q)]\}}. \tag{6.15}$$

Under the assumption that $F_1 = F_0 = F$, (6.15) becomes

$$Q = \frac{\pi_A[p + (1 - p)\pi_Y]}{p\pi_A + (1 - p)\pi_Y}. \tag{6.16}$$
The eleven curves depicting T as a function of hazard (Figure IV.2) will be used as the tolerance distributions represented by F(Q); they are cumulated from the right instead of the left, so that each curve gives $1 - F(Q)$, the proportion of individuals whose tolerance exceeds Q.
VI.4 Choosing $\pi_Y^{opt}$ for combinations of p, $\pi_A$, and S
The method of arriving at $\pi_Y^{opt}$ parallels that used in Model I. We make the additional assumption that $F_1 = F_0 = F$. We will recommend the optimal $\pi_Y$ for p = 1/2, 2/3, and 3/4; $\pi_A$ = .01, .02, .05, .10, and .25; and n = 100, 500, 1000, and 10,000. As before, (6.16) gives

$$\pi_Y = \frac{p\pi_A(1 - Q)}{(1 - p)(Q - \pi_A)}. \tag{6.17}$$

Thus, for fixed p and $\pi_A$, we need only specify many values of Q to obtain the corresponding $\pi_Y$. To calculate the MSE, the values of F(Q) must be obtained. Thus, at each level of Q, F(Q) was generated using the eleven curves depicted in Figure IV.2. For each combination of p, $\pi_A$, S, and n, the optimal value of $\pi_Y$ for which $\mathrm{MSE}(\hat\pi_A \mid \pi_Y)''_{UT}$ is a minimum was obtained using the approach described in V.3. These iterations produced Tables VII.9-VII.12, which we discuss in the next chapter.
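The grid step of this procedure for Model II can be sketched as follows, under the assumption $F_1 = F_0 = F$, with F(Q) taken as one minus the Figure IV.2 curve for sensitivity S. Only a coarse 101-point scan over Q is shown; the iterative refinement of V.3 is omitted, p is assumed to lie strictly between 0 and 1, and the helper names are illustrative.

```python
from math import comb


def tolerance_cdf(q, sensitivity):
    """F(Q): one minus the Figure IV.2 curve S(Q; 21 - 2S, 2S + 1)."""
    alpha = 21 - 2 * sensitivity
    survival = sum(comb(21, j) * q ** j * (1.0 - q) ** (21 - j) for j in range(alpha))
    return 1.0 - survival


def model2_point(q, p, pi_a, n, sensitivity):
    """(pi_Y, MSE) at a given Q via (6.17) and (6.11)-(6.14); None if inadmissible."""
    if not pi_a < q <= 1.0:
        return None
    numerator = p * pi_a * (1 - q)
    denominator = (1 - p) * (q - pi_a)
    if numerator > denominator:           # would give pi_Y > 1
        return None
    pi_y = numerator / denominator                        # (6.17)
    f = tolerance_cdf(q, sensitivity)
    lam = (1 - f) * (p * pi_a + (1 - p) * pi_y)           # (6.11)
    bias = -f * (pi_a + (1 - p) / p * pi_y)               # (6.12)
    var = lam * (1 - lam) / (n * p ** 2)                  # (6.13)
    return pi_y, bias ** 2 + var                          # (6.14)


def grid_optimum(p, pi_a, n, sensitivity, points=101):
    """Coarse step only: scan Q over (pi_A, 1] and keep the smallest MSE."""
    best = None
    for i in range(1, points + 1):
        q = min(pi_a + (1 - pi_a) * i / points, 1.0)
        result = model2_point(q, p, pi_a, n, sensitivity)
        if result is not None and (best is None or result[1] < best[1]):
            best = result
    return best   # (pi_Y, MSE) at the best grid point


if __name__ == "__main__":
    print(grid_optimum(p=2 / 3, pi_a=0.05, n=1000, sensitivity=5))
```

At Q = 1 the sketch returns $\pi_Y = 0$ with every respondent untruthful, so the MSE there is exactly $\pi_A^2$; this is a convenient sanity check on the implementation.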
VI.5 Recommendation for p
As in the case of Model I, the MSEs for a given sensitivity and $\pi_A$ will be compared by forming MSE ratios; see V.4 for further details. These comparisons produced Tables VII.13-VII.16, which are discussed in the next chapter.
VII. RESULTS FROM THE INVESTIGATIONS OF THE OPTIMAL $\pi_Y$
VII.1 Introduction
Previous chapters discussed the methods used to obtain the optimal $\pi_Y$ ($\pi_Y^{opt}$) for two different models. This chapter presents and discusses the results generated by Models I and II, and then compares these two sets of results.
VII.2 Results for Model I
VII.2.1 Investigation of $\pi_Y^{opt}$
First, we note that topics of high sensitivity yield a minimum MSE with respect to $\pi_Y$ for which $|\mathrm{Bias}(\hat\pi_A \mid \pi_Y)'_{UT}| \ge \pi_A$ (Tables VII.1-VII.4). For these topics, more protection might be given to the respondent either by reducing p to a value less than one-half or by increasing $\pi_Y$ to one. The latter solution would probably provide more psychological incentive for respondents to answer truthfully, and thus would yield a corresponding reduction in bias. Indeed, inspection of the bias when $\pi_Y = 1$ showed that, in most instances, $|\mathrm{Bias}(\hat\pi_A \mid \pi_Y)'_{UT}| < \pi_A$ and that the variance is only slightly higher than the variance corresponding to the minimum MSE. However, in results not presented here, reducing p to 1/4 or 1/3 resulted in $\pi_Y^{opt} = 0$ for many of the highly sensitive topics; possible reasons for this result are discussed below.
TABLE VII.1: MODEL I: OPTIMAL VALUES¹ FOR $\pi_Y$ GIVEN $\pi_A$, p, AND DEGREE OF SENSITIVITY
OF STIGMATIZING QUESTION; n=100
DEGREE OF SENSITIVITY ($S = \tfrac{1}{2}(\beta - 1)$)
p
0
1
2
1/2
2/3
3/4
.000 2
.000
.000
.000
.000
.000
.000
.000
.000
3
.000
.000
*
4
.000
*
*
1/2
2/3
3/4
.000
.000
.000
.000
.000
.000
.000
.000
.000
.000
.000
.006
.000
.004
.016
1/2
2/3
3/4
.000
.000
.000
.000
.000
.000
.000
.000
.012
.000
.010
.040
.000
.030
.077
1/2
2/3
3/4
.000
.000
.000
.000
.000
.000
.000
.000
.050
.000
.039
.141
.000
.100
.273
1/2
2/3
3/4
.000
.000
.000
.000
.000
.000
.000
.000
.705
.000
1.000
1.000
.010
1.000
1.000
5
$\pi_A$ = .01
.000
2
*
_L_
---~---
9
10
.000
*
*
*
*
*
*
*
*
*
*f
*
*
*
*
f
*
*
$\pi_A$ = .02
*
*
.000
.010
.027
$\pi_A$ = .05
.005
.058
.128
$\pi_A$ = .10
.028
.202
.482
$\pi_A$ = .25
.070
.301
.424
*
*f
$\pi_A$
1
f
*
.023
.098
.200
.161
.308
.296
.471
.079
.399
.924
.212
1.000
1.000
*
*
*
.136
*
*
*
*
f
*f
*
*
*
*
*f
*f
*
f
f
f
f
*
*
*
*
I
f
f
f
*f
*
*
f
*
¹ Optimal value is based upon minimum MSE which, in turn, is derived from probability of answering truthfully according to the hazard presented by having to answer "Yes" to either question (or both). When $\pi_Y^{opt}$ = .000, a direct question design is recommended.
* |Bias| ≥ $\pi_A$ for this cell. $\pi_Y$ = 1.00 is the recommended value to be used in a survey; however, the |Bias| may not necessarily be reduced to < $\pi_A$.
TABLE VII.2: MODEL I: OPTIMAL VALUES¹ FOR $\pi_Y$ GIVEN $\pi_A$, p, AND DEGREE OF SENSITIVITY
OF STIGMATIZING QUESTION; n=500 (SEE TABLE VII.1 FOR FURTHER DETAILS).
DEGREE OF SENSITIVITY ($S = \tfrac{1}{2}(\beta - 1)$)
p
0
1
2
3
4
1/2
2/3
3/4
.000
.000
.000
.000
.000
.000
.000
.000
.002
.000
.002
.007
.000
.005
.014
1/2
2/3
3/4
.000
.000
.000
.000
.000
.000
.000
.000
.008
.000
.007
.019
.000
.015
.035
IVI TV (81 '-~JP'-l ).1-_____
_2..._
$\pi_A$ = .01
.001
.010
.022
.~--
7
--------_._-
_.L_
9
t
t
t
*
*
*
*
*
.004
.017
.033
.010
.027
.013
.045
.086
.029
.073
.132
.132
.217
*t
•
.054
.179
.342
.124
.318
.595
.741
1.000
•
•
•
•
.260
.894
1.000
1.000
1.000
1.000
t
•
$\pi_A$ = .02
.005
.028
.056
$\pi_A$ = .05
1/2
2/3
3/4
.000
.000
.000
.000
.000
.002
.000
.006
.035
.000
.028
.076
.005
.060
.131
1/2
2/3
3/4
.000
.000
.000
.000
.000
.012
.000
.018
.132
.000
.121
.256
.018
.236
.440
1/2
2/3
3/4
.000
.000
.000
.000
.000
.457
.000
.603
1.000
.000
1.000
1.000
1.000
1.000
1.000
.023
.106
.212
*
$\pi_A$ = .10
.062
.421
.775
•*
•
$\pi_A$ = .25
1.000
.314
.442
----~-
.151
*
•
*
•
•
t
•
•
•
*
•
•
*
•
•
•
*
•
*t
•
•
•
•
•
•
¹ Optimal value is based upon minimum MSE which, in turn, is derived from probability of answering truthfully according to the hazard presented by having to answer "Yes" to either question (or both).
* |Bias| ≥ $\pi_A$ for this cell.
TABLE VII.3: MODEL I: OPTIMAL VALUES¹ FOR $\pi_Y$ GIVEN $\pi_A$, p, AND DEGREE OF SENSITIVITY
OF STIGMATIZING QUESTION; n=1000 (SEE TABLE VII.1 FOR FURTHER DETAILS).
DEGREE OF SENSITIVITY ($S = \tfrac{1}{2}(\beta - 1)$)
p
0
1
2
3
4
1/2
2/3
3/4
.000
.000
.000
.000
.000
.000
.000
.000
.004
.000
.003
.010
.000
.009
.017
1/2
2/3
3/4
.000
.000
.000
.000
.000
.000
.000
.002
.011
.000
.009
.024
.002
.019
.042
1/2
2/3
3/4
.000
.000
.000
.000
.000
.007
.000
.010
.045
.000
.039
.092
.009
.076
.153
1/2
2/3
3/4
.000
.000
.000
.000
.000
.052
.000
.027
.164
.000
.161
.298
.024
.285
.506
1/2
2/3
3/4
.000
.000
.000
.000
.000
.528
.000
.710
1.000
.000
1.000
1.000
1.000
1.000
1.000
__.L_
I
A
TA
-_!_--
7
----------._._._-----_.---~--
9
t
$\pi_A$ = .01
.002
.013
.027
.006
.022
.041
.014
.035
.062
.031
.062
.099
$\pi_A$ = .02
.009
.034
.066
.018
.055
.102
.037
.090
.160
.094
.165
.271
.071
.219
.403
.157
.400
.731
.455
1.000
1.000
.346
1.000
1.000
1.000
1.000
1.000
10
•t
t
t
t
t
t
•t
•
t
t
•
•
t
t
t
t
$\pi_A$ = .05
.030
.131
.246
$\pi_A$ = .10
.172
.503
.903
$\pi_A$ = .25
1.000
.316
.444
.153
t
t
t
•t
••
•
•
•
•t
t
t
•t
t
t
t
t
t
t
t
¹ Optimal value is based upon minimum MSE which, in turn, is derived from probability of answering truthfully according to the hazard presented by having to answer "Yes" to either question (or both).
* |Bias| ≥ $\pi_A$ for this cell.
TABLE VII.4: MODEL I: OPTIMAL VALUES¹ FOR $\pi_Y$ GIVEN $\pi_A$, p, AND DEGREE OF SENSITIVITY
OF STIGMATIZING QUESTION; n=10000 (SEE TABLE VII.1 FOR FURTHER DETAILS).
DEGREE OF SENSITIVITY
p
0
1
2
3
1/2
2/3
3/4
.000
.000
.000
.000
.000
.002
.000
.000
.009
.000
.007
.016
.002
.013
.027
1/2
.000
.000
.000
.000
.000
.007
.000
.007
.022
.001
.018
.040
.007
.033
.064
3/4
.000
.000
.000
.000
.000
.0:51
.000
.041
.083
.004
.079
.141
.043
.132
.227
1/2
2/3
3/4
.000
.000
.000
.000
.000
.129
.000
.150
.251
.004
.253
.430
.175
.432
.729
1/2
2/3
.000
.000
.000
.000
.000
.770
.000
1.000
1.000
1.000
1.000
1.000
.028
1.000
1.000
2/3
3/4
1/2
2/3
e
4
._~-
$\pi_A$ = .01
.006
.022
.041
$\pi_A$ = .02
.016
.054
.099
($S = \tfrac{1}{2}(\beta - 1)$)
6
7
_8_
.013
.035
.063
.025
.057
.099
.057
.108
.178
.234
.344
.465
•
•
•
.033
.086
.156
.065
.147
.254
.155
.294
1.000
1.000
1.000
•
•
.143
.362
.631
.295
.747
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
$\pi_A$ = .05
.080
.214
.364
$\pi_A$ = .10
.309
.807
1.000
$\pi_A$ = .25
3/4
__._------
.
.487
1.000
.312
.446
.154
.345
•
•
•
•
f
•
9
---!JL.
t
•
•
•
•
•
•
•
•
•
f
•
•
•
•
•
•
•
•
•
•
•
•
¹ Optimal value is based upon minimum MSE which, in turn, is derived from probability of answering truthfully according to the hazard presented by having to answer "Yes" to either question (or both).
* |Bias| ≥ $\pi_A$ for this cell.
Second, Greenberg et al. (1969) recommended that $\pi_Y$ should be selected in the neighborhood of the presumed or postulated $\pi_A$. The results here for topics of modest sensitivity ($3 \le S \le 5$) and where $\pi_A$ ranges from .02 to .10 show that this approximation is a reasonable one. Topics of no or little sensitivity (S = 0, 1) and, in general, of mild sensitivity (S = 2) imply $\pi_Y^{opt} = 0$. When $\pi_Y^{opt} = 0$, a "No" response is always required when the unrelated question is selected. Thus, when $\pi_Y^{opt} = 0$, a person who responded "Yes" to a randomized response question on a sensitive topic would be identified as possessing the sensitive attribute; that is, the individual's limited hazard is one. An implication of these results is a recommendation that no protection should be given to the respondent with the sensitive characteristic who selects the sensitive question; this is equivalent to a direct question setting, where a "Yes" response identifies the participant as possessing the sensitive attribute.
An earlier discussion of the relationship between the probability of telling the truth and the limited hazard postulated that when the limited hazard is equal to one, the probability of a truthful response is zero; see Figure IV.2. For such a situation, $|\mathrm{Bias}(\hat\pi_A \mid \pi_Y)'_{UT}| = \pi_A$, and yet the results here show that, in fact, when $\pi_Y^{opt} = 0$, then $|\mathrm{Bias}(\hat\pi_A \mid \pi_Y)'_{UT}| < \pi_A$. What are the reasons for this apparent contradiction? The answer lies in the definitions of the limited hazards for the AY and Ay subgroups of individuals. The definitions lead to the result that when $\pi_Y = 0$, only the AY individuals possess a limited hazard of one; the other individuals possess hazards less than one. By applying the definitions of limited hazards presented in Chapter IV to this situation, where $P(A \mid \mathrm{Yes}) = 1$, we obtain $H_{AY} = 1$, $H_{Ay} = p$, $H_{aY} = 1 - p$, and $H_{ay} = 0$. These hazards are derived independently of $\pi_Y$. From Figure IV.2, the probability of a truthful response is zero when the limited hazard is one and is greater than zero when the limited hazard is less than one. Thus these definitions imply that only the AY individuals will always give a non-truthful response, while Ay individuals have some probability of giving a truthful "Yes" response. Because some Ay individuals will give truthful answers when selecting the sensitive question, it then follows that $|\mathrm{Bias}(\hat\pi_A \mid \pi_Y)'_{UT}| < \pi_A$, which is the result shown in these tables.
However, in reality, when $\pi_Y = 0$, then $H_{Ay} = 1$ and $H_{ay} = 0$, which would give $|\mathrm{Bias}(\hat\pi_A \mid \pi_Y)'_{UT}| = \pi_A$. A second reason is more intuitive than mathematical. For topics categorized as mildly sensitive, it may well be that certain individuals are willing to give truthful responses on a topic that they do not feel is sensitive for themselves; this is the rationale behind Model II. Because a randomized response model adds to the variance relative to a direct question survey, a direct question setting for mildly sensitive topics may more than offset an increase in the squared bias with a greater decrease in the variance. Thus, this may be the actual situation reflected in these results.
These results also show that the values of $\pi_Y^{opt}$ in adjacent columns tend to be very similar. This suggests that if the sensitivity of a topic is not known exactly but is known to fall into a certain range, say $3 \le S \le 5$, then a $\pi_Y^{opt}$ could still be selected. Also, the values of $\pi_Y^{opt}$ tend to increase as the sensitivity increases. This is again what one would expect: higher sensitivity requires greater protection for the respondent. When the sensitive attribute is fairly common ($\pi_A$ = .25), $\pi_Y^{opt} = 1$ unless p = 1/2.
When $\pi_A$ = .25, the values of $\pi_Y^{opt}$ within rows tend to increase to 1.000 and then decrease. Inspection of the MSE for various $\pi_Y$ (produced by the first part of the iterative procedure described in Chapter V) showed that two local minima exist for some of these cells. When the values of $\pi_Y$ corresponding to the local minima that are not the absolute minima are substituted for the $\pi_Y$ corresponding to the absolute minima, we again observe the pattern of increasing $\pi_Y$ across rows. Additionally, for each combination of $\pi_A$ and S, the smallest $\pi_Y^{opt}$ corresponds to p = 1/2, and the largest to p = 3/4. This is consistent with the notion that larger values of p give less protection to respondents, which can be compensated for by higher values of $\pi_Y$.
When the sample size increases, $\pi_Y^{opt}$ tends to increase. This is because the variance decreases as sample size increases, while the bias remains independent of sample size. Recall that, in general, the bias tends to decrease as more protection is given to the respondent through an increase in the choice of $\pi_Y$. Thus, for larger n, the contribution of the variance to the MSE becomes smaller, and we can afford to increase $\pi_Y$. Also, as n increases, the range of S for which a $\pi_Y^{opt}$ with $|\mathrm{Bias}| < \pi_A$ can be obtained also increases; when n = 10000, S can go as high as 9.
Although not shown here, we observed that in certain cases the bias does not always decrease with increasing π_Y.
Recall from equation (3.7) that

\mathrm{Bias}(\hat{\pi}_A \mid \pi_Y) = \pi_A\,(T_A - 1) + \frac{1-p}{p}\,\pi_Y\,(T_Y - 1)   (7.1)

where T_A and T_Y are given by (7.2) and (7.3), so that, at π_Y = 0, Bias = -π_A.
However, as π_Y decreases from one to zero, |Bias(π̂_A | π_Y)| does not necessarily monotonically decrease (or increase) to π_A.
This is due to two reasons.
First, T_A and T_Y are themselves functions of π_Y through the limited hazard functions.
Each contains its own rate of change; the contribution of each, actually of (T_A - 1) and (T_Y - 1), to the bias is weighted by π_A and by ((1 - p)/p)π_Y, respectively.
This leads to the second reason.
The contribution of (T_A - 1) is weighted by a constant, π_A.
However, (T_Y - 1) is weighted by a coefficient that varies with π_Y, namely ((1 - p)/p)π_Y.
Relative to T_Y, T_A decreases more rapidly.
Thus, for high values of π_Y the second term dominates, and for small values of π_Y the first term dominates.
And because each term decreases at a different rate, this leads to the observed pattern of the change in bias.
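The two-term structure of the bias can be sketched numerically. The sketch assumes that A individuals who draw the sensitive question and Y individuals who draw the unrelated question answer "Yes" truthfully with probabilities T_A and T_Y and otherwise answer "No". Then P("Yes") = pπ_A T_A + (1 - p)π_Y T_Y, and substituting this into π̂_A = [λ̂ - (1 - p)π_Y]/p gives Bias = π_A(T_A - 1) + ((1 - p)/p)π_Y(T_Y - 1). The T values below are hypothetical inputs, not outputs of the dissertation's limited hazard functions.

```python
from typing import Tuple


def bias_terms(pi_a: float, pi_y: float, p: float,
               t_a: float, t_y: float) -> Tuple[float, float]:
    """Return the two components of Bias(pi_A-hat | pi_Y)."""
    term_a = pi_a * (t_a - 1)                # weight pi_A is fixed
    term_y = (1 - p) / p * pi_y * (t_y - 1)  # weight grows with pi_Y
    return term_a, term_y


def bias(pi_a: float, pi_y: float, p: float, t_a: float, t_y: float) -> float:
    term_a, term_y = bias_terms(pi_a, pi_y, p, t_a, t_y)
    return term_a + term_y


if __name__ == "__main__":
    # Hypothetical truthfulness probabilities at a high and a low pi_Y.
    print(bias_terms(0.05, 0.80, 0.5, t_a=0.95, t_y=0.90))  # pi_Y term dominates
    print(bias_terms(0.05, 0.05, 0.5, t_a=0.40, t_y=0.99))  # pi_A term dominates
```

Because the two terms shrink at different rates as π_Y changes, their sum need not move monotonically. When π_Y = 0 and T_A = 0, the sketch returns -π_A.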
In order to graphically depict some of the findings discussed above, and as a potential aid to researchers who would like to obtain π_Y,opt, we have plotted π_Y,opt vs. sensitivity for p = 1/2, 2/3, and 3/4, π_A = .01, and n = 500, 1000, and 10000 (Figure B.1 in Appendix B).
When .02 ≤ π_A ≤ .25, only plots for n = 1000 are presented (Figures B.2-B.4).
VII.2.2 MSE efficiency ratios for p
The MSE comparisons reveal that when n = 100 and S = 0, 1, the preferred selection probability is p = 3/4 (Table VII.5).
As the sensitivity increases and .02 ≤ π_A ≤ .10, each of the three p's becomes an approximately equally good choice in terms of MSE.
However, when
TABLE VII.5: MODEL I: MSE EFFICIENCY RATIOS FOR p, GIVEN π_Y, π_A, AND DEGREE OF SENSITIVITY OF STIGMATIZING QUESTION; n=100
DEGREE OF SENSITIVITY (S)
0
__J___
2
1/2
2/3
3/4
.66
.89
1. 00
.62
.84
1. 00
.49
.74
1. 00
.53
1. 00
1/2
2/3
3/4
.66
.89
1. 00
.62
.84
1. 00
.56
.83
1. 00
.67
1. 06
1. 00
.91
1. 07
1. 00
1/2
2/3
3/4
.66
.88
1. 00
.63
.85
1. 00
.69
.99
1. 00
.85
1.04
1. 00
1. 11
1. 04
1/2
2/3
3/4
.65
.88
1. 00
.63
.87
1. 00
.78
1. 07
1. 00
1.01
1/2
2/3
3/4
.62
.87
1. 00
.66
.93
1. 00
.99
1. 16
1. 00
.-P--
• IBill'
~
3
it
4
1. 00
---~----
fA = .01
1. 00
----~1. 00
___1__
8
9
_l9_
it
it
it
it
*
it
*
it
it
it
it
it
it
it
*
*it
*
*
*it
*
*it
it
it
it
it
•
it
it
it
•
•
fA = .02
1. 23
1. 06
1. 00
it
it
1. 06
1. 04
•
1. 02
1. 00
fA • .05
1.14
1. 04
1. 00
1. 00
1. 00
1. 07
1. 00
1. 28
1. 15
1. 00
TA = .10
1. 17
1. 00
1. 00
1. 02
.97
1. 00
1. 13
1.44
1. 06
1. 00
3.96
2.17
1. 00
fA = .25
4.24
1. 55
1.00
* |Bias| ≥ π_A for this cell
1. 00
•
•
.98
•
*
•
•
.95
1. 00
it
•
*
it
•
•
*
it
•
it
•*
it
it
it
it
•
•
•
•
it
•
π_A = .25 and sensitivity increases, the preferred p is p = 1/2, which is equivalent to giving the respondent as much protection as is possible for very sensitive topics that are relatively common.
Similar patterns are noted for n = 500 and n = 1000, with exceptions occurring in the more sensitive categories (Tables VII.6-VII.7).
For the upper two or three categories when .01 ≤ π_A ≤ .05, the preferred values are p = 2/3 or p = 3/4.
Further, when π_A = .10 and S = 7, the preferred value is p = 1/2, again giving the respondent a high level of protection.
These general patterns hold for n = 10000, with the addition that the preferred value is p = 1/2 when π_A = .05 and S = 8 (Table VII.8).
We also note that, for all sample sizes, the largest MSE ratios strongly favor p = 1/2; the three maximum ratios equal 47.7, 21.2, and 15.7.
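The tables appear to report, for each p, the MSE at p = 3/4 divided by the MSE at p: the p = 3/4 entry is always 1.00, and a ratio above 1 marks a p that beats p = 3/4. A minimal sketch under that reading follows; the MSE values in the example are hypothetical, and the function names are illustrative.

```python
from typing import Dict


def mse_efficiency_ratios(mse_by_p: Dict[str, float],
                          reference: str = "3/4") -> Dict[str, float]:
    """Ratio MSE(reference) / MSE(p); values above 1 favour p over the reference."""
    ref = mse_by_p[reference]
    return {p: ref / m for p, m in mse_by_p.items()}


def preferred_p(mse_by_p: Dict[str, float]) -> str:
    """The selection probability with the smallest MSE."""
    return min(mse_by_p, key=mse_by_p.get)


if __name__ == "__main__":
    # Hypothetical MSEs for one table cell.
    cell = {"1/2": 0.0030, "2/3": 0.0024, "3/4": 0.0020}
    print({p: round(r, 2) for p, r in mse_efficiency_ratios(cell).items()})
    print(preferred_p(cell))
```

The preferred p is always the one with the largest ratio, since dividing a common numerator by the smallest MSE gives the largest value.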
VII.3 Results for Model II
VII.3.1 Investigation of π_Y,opt
The results for Model II follow the same patterns discussed in the previous section for Model I, with certain exceptions (Tables VII.9-VII.12).
First, when n = 100 and π_A = .01, it is always true that |Bias(π̂_A | π_Y)| ≥ π_A.
When π_A = .02, only the least sensitive categories contain any recommended values for π_Y,opt.
In general, relative to Model I, Model II contains fewer recommended values for π_Y,opt.
Secondly, the π_Y,opt in Model II are always higher than those for Model I.
The actual true model may very well lie between the two presented here.
Finally, there are no reversals in the trends of increasing π_Y,opt as sensitivity increases, as was the case for Model I.
TABLE VII.6: MODEL I: MSE EFFICIENCY RATIOS FOR p, GIVEN π_Y, π_A, AND DEGREE OF SENSITIVITY OF STIGMATIZING QUESTION; n=500
DEGREE OF SENSITIVITY (S)
..JL
0
_L__
--~.
3
4
1/2
2/3
3/4
.00
.89
1. 00
.03
.8S
1. 00
.08
.98
1. 00
.83
1. 03
1. 00
1. 07
1. 01
1. 00
1/2
2/3
3/4
.06
.89
1. 00
.64
.87
1. 00
.75
1. 02
1. 00
.92
1. 02
1. 00
1.08
1. 02
1. 00
1/2
2/3
3/4
.66
.88
1. 00
.68
.92
1.00
.84
1. 04
1. 00
1. 04
1. 02
1. 00
1. 09
1. 01
1. 00
1/2
2/3
3/4
.6S
.88
1. 00
.73
1. 00
1. 00
.90
1. 00
1. 00
1. 19
.95
1. 00
1. 10
.94
1. 00
1/2
2/3
3/4
.02
.87
1. 00
.84
1. 24
1.00
1. 05
.84
1.00
3.04
2. 14
1.00
9.13
3.120
1.00
5
0
7
._-~---
__.J___
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
it
it
it
it
it
*
TA ... 01
1.09
1. 02
1. 00
1. 00
1. 02
1. 00
.88
1. 00
= .02
1. 04
1. 02
1. 00
.96
1. 01
1. 00
.85
1. 00
1. 00
.95
1. 00
.93
.99
1. 00
.82
.97
1. 00
.97
1. 00
.88
1. 10
1. 00
2.16
1. 8a
1. 00
IA
*
'.'_.n.
10
*
fA ... 05
1. 02
1. 00
1. 00
fA
= .10
.85
.92
1. 00
it
it
it
it
it
it
it
*
*
*it
it
*
*
*
*
fA" • 25
5.79
1. 61
1.00
* |Bias| ≥ π_A for this cell
1. 00
it
it
it
it
*
it
it
TABLE VII.7: MODEL I: MSE EFFICIENCY RATIOS FOR p, GIVEN π_Y, π_A, AND DEGREE OF SENSITIVITY OF STIGMATIZING QUESTION; n=1000
DEGREE OF SENSITIVITY (S)
_Jl_
__9-___
___1_
__L
112
.66
.89
1. 00
.64
.B7
1.00
.75
1. 02
1. 00
.91
1. 02
1. 00
1. 07
1. 01
1. 00
I", - .01
1. 01
1. 01
1.00
.95
1. 01
1.00
.85
1. 00
1. 00
.69
.95
1. 00
•
•
•
•
•t
.66
.B9
1. 00
.67
.91
1. 00
.80
1. 02
1. 00
.98
1. 02
1. 00
1. OS
1. 01
1. 00
I", ... 02
1.00
1. 01
1. 00
.93
1. 00
1. 00
.83
.99
1. 00
.6B
.95
1. 00
•
•
•
•
•t
.66
.B8
1. 00
.72
.97
1. 00
.89
1.04
1. 00
1. 10
1. 00
1. 00
1. 07
.99
1. 00
f", - .OS
.97
.98
1.00
.87
.97
1. 00
.79
.96
1. 00
.80
1. 15
1. 00
•
•
.65
• BB
1. 00
•8 1
1. 10
1. 00
1. 02
.97
1. 00
1. 21
.93
1. 00
.96
.93
1. 00
1. 08
1.44
1. 00
3.59
2.19
1. 00
t
•
•
•
.62
.87
1. 00
.91
1. 25
1. 00
1. 10
.B6
1. 00
4.70
3.13
1. 00
15.69
3.45
1. 00
1.00
t
t
•
•
2/3
3/4
112
2/3
3/4
112
2/3
3/4
3
4
5
6
7
_8_
I", - • 10
112
2/3
3/4
IA
~
112
2/3
3/4
_.._---_._---------------• IBiul
~
.74
.91
1.00
.25
6.41
1. 61
1. 00
t
•
•t
•
•
•
•
•
-y-
t
t
t
•
f
_IL
t
f
f
t
f
* |Bias| ≥ π_A for this cell
TABLE VII.8: MODEL I: MSE EFFICIENCY RATIOS FOR p, GIVEN π_Y, π_A, AND DEGREE OF SENSITIVITY OF STIGMATIZING QUESTION; n=10000
DEGREE OF SENSITIVITY (S)
. P._.
._.9___
___ 1-_
___.f..
1/2
2/3
3/4
.66
.89
1. 00
.74
.99
1. 00
.89
.63
1. 00
1. 03
1. 00
1. 00
.98
1. 00
1. 00
1/2
2/3
3/4
.66
.89
1. 00
.77
1. 02
1. 00
.94
1. 00
1. 00
1. 05
.99
1. 00
.98
.99
1. 00
1/2
2/3
3/4
.66
.89
1. 00
.85
1. 10
1. 00
1. 05
1. 05
.95
1. 00
.94
1. 00
.80
.94
1. 00
1/2
2/3
3/4
.65
.88
1. 00
.93
1. 15
1. 00
1. 16
.91
1. 00
.B7
.91
1. 00
.70
.90
1. 00
1/2
2/3
3/4
.63
.88
1. 00
.98
1. 04
1. 00
2.02
1. 55
1. 00
21. 15
7.54
1. 00
47.74
3.71
1. 00
3
_i__
4
6
7
9
8
_LL
fA = .01
.95
•
.89
.99
1. 00
.80
.99
1. 00
.66
.94
1. 00
.43
.79
1. 00
.92
.98
1. 00
.86
.98
1. 00
.79
.97
1. 00
.66
.94
1. 00
.50
.94
1. 00
t A = .05
.79
.94
1. 00
.77
.93
1. 00
.77
1. 00
1. 00
3.05
2.25
1. 00
I
.79
1. 06
1. 00
5.25
4.27
1. 00
11.70
2.65
1. 00
= .25
7. 11
1. 62
1. 00
1. B8
1. 00
I
f
f
I
f
I
I
I
I
I
I
I
I
1. 00
1. 00
fA = .02
I
•
•
•
•
•
•
•
•
•
fA = .10
i
A
I
I
I
•
I
I
I
I
f
* |Bias| ≥ π_A for this cell
TABLE VII.9: MODEL II: OPTIMAL VALUES¹ FOR π_Y GIVEN π_A, p, AND DEGREE OF SENSITIVITY OF STIGMATIZING QUESTION; n=100 (SEE TABLE VII.1 FOR FURTHER DETAILS).
DEGREE OF SENSITIVITY (S)
..JL
1/2
213
3/4
~-
•
•
I
_ 1_
•
•
•
.__2_
•
•
I
•
__L.
•
•
•
4
•
•
•
5
TA
•
_
.01
•
•
I
••••
_~
__n
•
I
I
T", • •02
._...7..._-
•
•
•
•
•
8
•
•
•
•
•
I
I
f
I
.001
.002
.003
.007
.011
.013
.022
•
•
.035
1/2
2/3
3/4
.004
.009
.015
.012
.027
.043
.023
.049
.077
.037
.079
.123
.057
.122
.189
f
f
f
f
.190
.295
f
f
f
•
f
f
1/2
2/3
3/4
.012
•027
.043
.033
.072
• 112
.060
.129
.200
.098
.210
.324
.155
.336
.522
T", • .10
.255
.567
.900
•
•
•
•
•
1/2
2/3
3/4
.052
.116
.183
.139
.307
.484
.270
.624
1.000
.556
1.000
1.000
1.000
1.000
1.000
•
•
•
•
•
•
f
•
•
I
T", - .05
T", • • 25
1.000
•
•
.488
1.000
1.000
I
I
•
•
•
I
__JQ._
•
•
•
I
1/2
2/3
3/4
•
9
I
•
I
f
•
I
I
•
•
•
•
•
f
•
•
•
•
I
I
•
•
•
f
•
f
f
f
¹ Optimal value is based upon minimum MSE which, in turn, is derived from probability of answering truthfully according to the hazard presented by having to answer "Yes" to either question (or both).
* |Bias| ≥ π_A for this cell
TABLE VII.10: MODEL II: OPTIMAL VALUES¹ FOR π_Y GIVEN π_A, p, AND DEGREE OF SENSITIVITY OF STIGMATIZING QUESTION; n=500 (SEE TABLE VII.1 FOR FURTHER DETAILS).
DEGREE OF SENSITIVITY (S)
..1L
_ 0_
_1_
_-1-_
__
3_
1/2
2/3
3/4
.001
.002
.003
.002
.005
.008
.004
.009
.014
.007
.015
.022
.010
.022
.033
1/2
2/3
3/4
.002
.005
.007
.006
.013
.019
.010
.022
.034
.016
.034
.053
.025
.052
.079
1/2
2/3
3/4
.008
.016
.025
.019
.040
.061
.032
.068
• 104
.051
.107
.163
.078
.164
.250
1/2
2/3
3/4
.019
.041
.064
.046
.098
.149
.080
• 169
.258
.128
.271
.415
.205
.438
.676
1/2
2/3
3/4
.073
.159
.248
• 183
• 398
.622
.356
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
4
§ENSITIY~TY
___5_
__(:1/2(~-I))
---------_._-----~
_L.
_7__
~._-
9
t
t
.078
t
t
t
t
t
t
t
t
t
t
t
t
.057
• 122
.188
t
.210
.324
t
t
t
t
t
t
t
t
t
.200
.431
.668
.401
.961
1.000
t
t
t
t
t
t
t
t
t
t
t
t
f
f
it
it
t
t
t
t
__JJL
TA = .01
t
.033
.051
TA = .02
.037
.078
.120
TA • .05
• 121
.255
.392
TA • .10
.349
.782
1.000
1.000
1.000
1.000
TA • .25
1.000
•
•
it
f
•
f
t
t
•
•
t
f
•
•
•
•
it
•
¹ Optimal value is based upon minimum MSE which, in turn, is derived from probability of answering truthfully according to the hazard presented by having to answer "Yes" to either question (or both).
* |Bias| ≥ π_A for this cell
TABLE VII.11: MODEL II: OPTIMAL VALUES¹ FOR π_Y GIVEN π_A, p, AND DEGREE OF SENSITIVITY OF STIGMATIZING QUESTION; n=1000 (SEE TABLE VII.1 FOR FURTHER DETAILS).
DEGREE OF SENSITIVITY (S)
-L
_0_
_ 1_
_ 2_
_3_
1/2
2/3
3/4
.001
.002
.004
.003
.006
.010
.005
.011
.017
.008
.017
.026
.012
.025
.039
1/2
2/3
3/4
.003
.006
.009
.007
.014
.022
.012
.025
.039
.018
.039
.059
.027
.059
.088
1/2
2/3
3/4
.009
.019
.029
.021
.044
.068
.036
.075
.115
.056
• 117
.179
.086
.179
.275
1/2
2/3
3/4
.022
.047
.072
.051
.107
.164
.088
.184
.281
• 141
.297
.453
.226
.483
.743
1/2
2/3
3/4
.083
• 178
.275
.201
.437
.688
.397
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
4
T~
I'll
__5_.
--~--
- .01
.018
.037
.058
.027
.058
.089
:I:
.02
.041
.097
.133
_._~._-
9
*
.097
.149
*
*
*
*
*
*
*
*
*
.065
.139
.210
.113
.240
.370
•*
•
•*
•
•
.226
.484
.751
.476
1.000
1.000
•
•
•
*
1.000
1.000
1.000
1.000
•
•
I'll • .05
.134
.282
.432
I'll • .10
.393
.897
1.000
It
.25
1.000
7
*
*
I
_1JL
•
•
I
•
•
I
I
*
•*
I
I
I
I'll
•
•
I
I
I
I
I
I
I
I
I
I
I
I
*
*
t
¹ Optimal value is based upon minimum MSE which, in turn, is derived from probability of answering truthfully according to the hazard presented by having to answer "Yes" to either question (or both).
* |Bias| ≥ π_A for this cell
TABLE VII.12: MODEL II: OPTIMAL VALUES¹ FOR π_Y GIVEN π_A, p, AND DEGREE OF SENSITIVITY OF STIGMATIZING QUESTION; n=10,000 (SEE TABLE VII.1 FOR FURTHER DETAILS).
DEGREE OF SENSITIVITY (S)
_1_
_2_
._.J __
--~----
.002
.004
.006
.004
.009
.014
.007
.015
.011
.023
.023
.035
.017
.034
.051
3/4
.004
.009
.014
.010
.020
.031
.016
.033
.050
.025
.050
.078
.037
.075
.115
1/2
.013
2/3
.027
.043
.028
.058
.090
.047
.098
.147
.073
• 151
• 112
.233
.353
.065
.099
.067
.140
.214
• 114
.234
.361
• 182
.383
.585
.302
.637
1.000
• 114
.244
.375
.268
.S79
.936
.565
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
.JL
EI._i·lnJ.~_:ill
_.. ~ .. _.
_._-~-_..-
._..
7
._.
_-1J__.
.__.__.__...
_._~_._-
TA = .01
1/2
2/3
3/4
.025
.052
.080
TA
1/2
2/3
.02
.055
.116
.180
.039
.082
. 124
.068
· 141
· 217
.149
.321
.501
.090
.178
.288
· 167
.352
.535
1.000
.319
.692
1.000
1.000
1.000
1.000
1.000
1.000
.450
1.000
TA ... 05
3/4
.230
.179
.378
.581
TA
1/2
2/3
3/4
1/2
213
3/4
.031
...
1.000
10
.576
1.000
1.000
1.000
1.000
•
t
•
t
t
t
•
t
•
t
•
•
•
•
•
•
•
t
t
TA
.25
1.000
•
•
t
t
t
t
t
t
t
•
t
t
•
t
_ tQ_
•
*
•
•
•
•
•
•*
•
•
*
•
•
t
¹ Optimal value is based upon minimum MSE which, in turn, is derived from probability of answering truthfully according to the hazard presented by having to answer "Yes" to either question (or both).
* |Bias| ≥ π_A for this cell
As was done for Model I, plots were made of π_Y,opt vs. sensitivity for π_A = .01 and n = 500, 1000, or 10000, and for .02 ≤ π_A ≤ .25 and n = 1000 (Figures B.5-B.8).
VII.3.2 MSE efficiency ratios for p
The patterns of the MSE ratios for Model II differ from those for Model I (Tables VII.13-VII.16).
Here, with only a few exceptions, when .01 ≤ π_A ≤ .10, the most efficient value of the selection probability is p = 3/4.
The exceptions occur for π_A = .10: when n = 500 or 1000 and S = 6, p = 1/2 is preferred.
For π_A = .25, p = 1/2 is preferred for the higher sensitivity categories.
TABLE VII.13: MODEL II: MSE EFFICIENCY RATIOS FOR p, GIVEN π_Y, π_A, AND DEGREE OF SENSITIVITY OF STIGMATIZING QUESTION; n=100
.JL
_.__ Q.-
_1_
-_£-
1/2
2/3
3/4
f
*
f
f
f
f
f
1/2
2/3
3/4
.73
.91
1. 00
f
*
....1_
4
5
*
*
*
*
*
*
.92
1. 00
*
*
1. 00
1/2
2/3
3/4
.69
.90
1. 00
.70
.90
1. 00
.70
.90
1. 00
.70
.90
1. 00
1/2
2/3
3/4
.67
.89
1.00
.67
.89
1.00
.67
.89
1. 00
.66
1. 00
.65
.88
1. 00
1/2
2/3
3/4
.62
.87
1. 00
.60
.86
1. 00
.56
.83
1. 00
1. 03
1. 59
1. 00
5.43
2.40
1. 00
*
----~._---
9
•• _ _ _
_-!.Q._.
f
f
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*f
f
f
*
*
*
f
*
*
*
*
*
*
*f
*
*
f
f
f
f
f
.02
f
f
1A
.70
.90
1. 00
*
*
*
*
*
7
f
•
f
6
n
.01
II
f
1 111
.92
1. 00
AND DEGREE OF SENSITIVITY
DEGREE OF SENSITIVITY (S)
TA
f
lA,
*
*
.05
II
*
.90
1. 00
*
*
*
*
*
Till • .10
.BB
.63
.87
1. 00
1 111
.25
1.00
------_.
* |Bias| ≥ π_A for this cell
.94
1. 36
1. 00
*
t
f
*
*
*
t
f
*
*
*
*
*
*
*
*
*
*
*f
*
*
f
*
TABLE VII.14: MODEL II: MSE EFFICIENCY RATIOS FOR p, GIVEN π_Y, π_A, AND DEGREE OF SENSITIVITY OF STIGMATIZING QUESTION; n=500
_-.9- __
_1_._
__-1._
_3_
4
1/2
3/4
.bB
.90
1. 00
.b9
.90
1.00
.70
.90
1. 00
• 71
.90
1.00
• 71
.90
1.00
1/2
2/3
314
.69
.90
1. 00
.69
.90
1.00
.69
.90
1.00
.b9
.90
1. 00
.69
.90
1. 00
1/2
3/4
.b7
.89
1. 00
.67
.89
1. 00
.67
.89
1.00
.67
.89
1.00
.67
.89
1. 00
1/2
2/3
3/4
.66
• BB
1. 00
.66
.BB
1. 00
.65
.BB
1. 00
.64
.BB
1. 00
.63
.87
1. 00
1/2
•b1
.8b
1.00
.59
.85
1.002
.57
.B9
1. 00
3.bB
3.52
1.00
15.53
2.75
1.00
213
-~--I~
• .01
•
.90
1. 00
I" • .02
.69
.90
1.00
I" • .05
.67
.B9
1. 00
I~
314
---.1!___
_7__
•
t
•
1.00
AND DEGREE OF SENSITIVITY
.63
.91
1. 00
= .25
1.000
•
•
--_!!_--
9
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•t
.90
1. 00
.65
• BB
1.00
.75
1.06
1.00
•
2:BB
2.70
1.00
1. 00
•
•
•
•
•
•
•
•t
•
•
•
•
•
•
•
•
t
•
•
•
•
•
____LO___
•
• b9
.90
1. 00
- .10
I~
213
T~,
DEGREE OF SENSITIVITY (S)
.-1L
213
e
•
t
•
t
•
•
•
•
•
•
•
•
_.-~-----
* |Bias| ≥ π_A for this cell
TABLE VII.15: MODEL II: MSE EFFICIENCY RATIOS FOR p, GIVEN π_Y, π_A, AND DEGREE OF SENSITIVITY OF STIGMATIZING QUESTION; n=1000
DEGREE OF SENSITIVITY (S)
-L
__S!..___
_1__
--~--
__3__
1/2
2/3
3/4
.69
.89
1. 00
.69
.90
1. 00
.69
.90
1. 00
.70
.90
1. 00
.70
.90
1. 00
1/2
2/3
3/4
.68
.89
1. 00
.68
.89
1. 00
.68
.89
1. 00
.68
.B9
1. 00
.69
.90
1. 00
1/2
2/3
3/4
.67
.89
1. 00
.67
.89
1. 00
.67
.89
1. 00
.67
.89
1. 00
.67
.89
1. 00
1/2
2/3
3/4
.66
.88
1.00
.65
.88
1. 00
.65
.88
1. 00
.64
.87
1. 00
.62
.86
1. 00
1/2
2/3
3/4
.61
.86
1. 00
.58
.84
1. 00
.63
1.01
1. 00
6.B5
4.61
1.00
20.55
2.81
1. 00
4
--~----
---~---
u
_________
7
__
8 ___
9
*
t
*
t
t
t
t
*
*
*
_1.9___
= .01
fA
.70
.90
1. 00
TA
.90
1. 00
.69
.90
1. 00
.69
.89
1. 00
.64
.87
1. 00
.97
1. 39
1. 00
= .02
.69
.90
1. 00
TA
.70
.90
1. 00
*
t
t
t
t
*
*
*t
t
*
t
t
t
t
*
*
*
= .05
.66
.88
1. 00
TA • .10
.71
1. 04
1. 00
5.24
3.35
1. 00
1. 00
t
t
t
f
f
t
*
t
*
f
t
• .25
1.000
f
t
f
t
t
t
f
f
f
t
t
t
f
t
f
f
TA
* |Bias| ≥ π_A for this cell
TABLE VII.16: MODEL II: MSE EFFICIENCY RATIOS FOR p, GIVEN π_Y, π_A, AND DEGREE OF SENSITIVITY OF STIGMATIZING QUESTION; n=10,000
DEGREE OF SENSITIVITY (S)
-L
-~--
_ 1_
_--1__
3
4
1/2
2/3
3/4
.68
.89
1. 00
.67
.89
1.00
.68
.89
1.00
.68
.89
1.00
.68
.90
1. 00
lll..'lJJ Y__ i",l'-~J_P_::J~L __________"_____"__________
.68
.89
1. 00
.67
.89
1.00
.67
.89
1.00
.67
.89
1. 00
.68
.89
1.00
1/2
213
3/4
.66
.89
1.00
.66
.B9
1.00
.66
.89
1. 00
.66
.8B
1. 00
.66
.88
1.00
1/2
2/3
3/4
.65
.88
1. 00
.65
.B8
1.00
.64
• BB
1. 00
.63
.87
1. 00
.60
.B5
1. 00
1/2
2/3
3/4
.60
.86
1.00
.55
.83
1.00
1. 86
2.91
1. 00
54.48
6.74
1.00
2.91
2.86
1.00
~
8
_ 1_0_
9
-_..!_--
TA = .01
.68
.90
1.00
.69
.90
1. 00
.69
.90
1.00
.68
.89
1.00
t
t
t
.68
.89
1. 00
.67
.88
1.00
.67
.89
1.00
.87
1. 23
1.00
t
•
t
.05
.64
.88
1. 00
.62
.87
1.00
5.04
3.66
1.00
1.000
•
t
TA
TA
IBiis'
7
--~--
TA
1/2
2/3
3/4
I
e
e
e
...
TA . . . 25
1.000
I
•t
t
t
t
t
t
•t
10
2.39
3.31
1. 00
t
t
02
•
...
t
t
30.81
4.43
1. 00
t
it
*
t
t
*
t
it
t
t
it
I
I
it
t
t
t
it
•t
it
it
t
t
t
t
* |Bias| ≥ π_A for this cell
VIII. CONCLUSIONS AND RECOMMENDATIONS FOR FUTURE RESEARCH
This research has focused on the unrelated question randomized response model, π_Y known, in order to recommend values of π_Y based on the effect of the sensitivity of a stigmatizing question on the relationship between respondent jeopardy and respondent propensity for giving a truthful response.
The π_Y,opt were found using MSE criteria for two different models of respondent behavior.
Hopefully, these results will open the door to wider acceptance of randomized response models through new avenues of research or through more confidence in this survey technique.
This work has introduced the notion of categorizing social topics into eleven categories of sensitivity, and has identified a possible measure of social desirability.
While several authors have suggested that the sensitivity of a social issue affects the level of cooperation among respondents, no other study to our knowledge has explored or hypothesized the nature of the effect.
This research has assumed that the possible measure of social desirability is a feasible instrument for determining the sensitivity of topics.
However, more research is required to fully develop this measure or to formulate a different one.
Ideally, once a measure is proposed, it would be tested on a well-defined sample of individuals, such as college students, for which randomized response techniques are of potential usefulness.
Such a study would include sensitive topics for which it is desirable to obtain information, such as topics reported in the literature that were discussed in earlier chapters.
The assumption that such a measure can be formulated formed the basis for Models I and II.
Model I assumes that a respondent calculates or estimates his/her limited hazard and then uses that value when deciding whether or not to give a truthful response to a question requiring a "Yes" response.
For topics of high sensitivity, no π_Y,opt were recommended because the magnitude of the bias for the π_Y,opt obtained through the use of MSE criteria was equal to or greater than the estimated value of π_A.
In most of these cases a value of π_Y = 1.0 yielded an acceptable level of bias.
Topics of modest sensitivity (3 ≤ S ≤ 8) yield recommended values falling in the range from zero to one, with most values falling well within the two limits.
Topics of slight sensitivity (S = 0, 1) call for π_Y,opt = 0, which is equivalent to the direct question setting and which contradicts the hypothesized relationship that when the limited hazard is 1, there is no probability that an individual will give a truthful response.
This contradiction leads to the suggestion that future research should consider a more realistic hypothesized relationship, one which assumes that some respondents are willing to give a truthful response without any protection, that is, when their limited hazard is equal to one.
Such a relationship may be depicted as shown in Figure VIII.1.
Model II is based on the assumption that each individual possesses his/her own tolerance of suspicion, α, which we assume is fixed before the question is asked.
If a question requires a "No" response, then a truthful response is given.
If a question requires a "Yes" response, then the individual estimates his/her own probability that a person who
FIGURE VIII.1: THE POSTULATED RELATIONSHIPS BETWEEN THE PROBABILITY OF GIVING A TRUTHFUL RESPONSE (T) AND THE LIMITED HAZARD FOR ELEVEN CATEGORIES OF SENSITIVITY. [Plot: T (0 to 1) vs. HAZARD (0 to 1).]
answers "Yes" will be perceived as an A individual.
If that value is less than α, then the respondent gives a truthful response; otherwise a nontruthful response is given.
The results here reflect the assumption that A and a individuals possess the same tolerance distributions.
Future research should take into account that the tolerance distribution for an A individual may differ from that for an a individual.
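The Model II decision rule can be sketched as follows. The sketch takes the perceived probability that a "Yes" answerer is an A individual as given (for example, a value of the hazard P(A | "Yes")) and, following the equal-distribution assumption above, represents the common tolerance distribution by a hypothetical sample of α values. The function names are illustrative, not the dissertation's.

```python
from bisect import bisect_right
from typing import Sequence


def answers_truthfully(requires_yes: bool, perceived: float, alpha: float) -> bool:
    """Model II rule for one respondent with tolerance of suspicion alpha."""
    if not requires_yes:
        return True            # Model II: questions requiring "No" are answered truthfully
    return perceived < alpha   # "Yes" is given only if the suspicion is below alpha


def prob_truthful_yes(perceived: float, tolerances: Sequence[float]) -> float:
    """Share of respondents whose tolerance exceeds the perceived probability."""
    ordered = sorted(tolerances)
    n_above = len(ordered) - bisect_right(ordered, perceived)
    return n_above / len(ordered)


if __name__ == "__main__":
    # Hypothetical tolerance sample; perceived probability .25.
    print(prob_truthful_yes(0.25, [0.1, 0.2, 0.3, 0.4, 0.5]))
```

A respondent whose α equals the perceived probability is counted as untruthful, matching the strict "less than α" condition in the text.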
The results for Model II include no recommended π_Y,opt when π_A = .01, and include some for only the least sensitive categories when π_A = .02.
As in Model I, these findings are again due to the magnitude of the bias as compared to the estimated values of π_A.
In general, relative to Model I, Model II yielded fewer but larger π_Y,opt.
In fact, the "true" model may very well fall between Models I and II, and may even include one of the models recommended above for future research.
The results for Model I and Model II were similar in other respects.
Adjacent categories tended to contain similar π_Y,opt.
Thus, if a researcher was unsure of the exact sensitivity category of a topic, then he/she could classify the topic into a range of sensitivity categories instead of a specific one and be fairly confident both that the π_Y,opt selected for the categories would fall into a small range and that the π_Y,opt would be close to the ideal one.
Also, the π_Y,opt tended to increase both within rows as the sensitivity increased and within cells as the sample size increased.
Increasing the sample size increased the number of recommended π_Y,opt.
Both models require estimates of π_A and yield recommended values of π_Y,opt.
However, π_A is not known exactly; if it were, then the survey would be both redundant and unnecessary.
Also, the value actually used in the survey for π_Y,opt may not be the recommended value.
This situation could result from the researcher estimating a range of sensitivity categories for the sensitive topic, from the recommended value being unobtainable, or from the investigators mistaking the value used for the recommended value.
Thus, each of π_A and π_Y,opt is a potential source of non-sampling error; the effects of these errors should be a focus of future research.
This research explored the relationships among the MSEs for the three different values of p used in the models.
For Model I, in general, topics that are of high sensitivity and are relatively common yield the most efficient MSE when p = 1/2.
This is consistent with what one would expect, because the respondent in this setting would require a high level of protection in order to give a truthful response to a question requiring a "Yes" answer.
Further, when a topic is of mild sensitivity and is relatively uncommon, the most efficient MSEs occur when p = 3/4.
Again, this is a consistent finding, because respondents in low-threat situations would require less protection in order to achieve an acceptable level of cooperation.
By contrast, for Model II, p = 3/4 was the selection probability for which the most efficient MSE occurred in nearly all cases.
Future research might more thoroughly consider the effects of p on the results from a randomized response model.
Such an investigation could explore the changes in MSE when p takes on values not considered here in detail, such as p < 1/2.
Each of the models accounts differently for respondent behavior, thus resulting in two sets of tables containing π_Y,opt.
In order for these tables to be used in the design of a survey, it must be decided which model is closer to the "true" model.
When individuals are confronted with a question requiring a "Yes" response, Model I assumes that individuals will give a truthful answer according to whether they are AY, Ay, or aY individuals, whereas Model II assumes that individuals will give a truthful response based only on their individual tolerances of suspicion and without regard to the presence or absence of the A and Y characteristics, since we assumed the same tolerance distributions for all individuals.
The "true" model probably falls between the two, but closer to Model II.
It seems more reasonable to expect individuals to respond "Yes" according to their own α's, rather than to their combination of the two characteristics.
However, because AY individuals must always respond "Yes", for example, they may possess more concern than Ay individuals when responding to questions calling for a "Yes" response, and thus may behave according to their individual α's plus their combination of the A and Y characteristics.
This research has considered only the unrelated question randomized response model, π_Y known.
Future research might extend this work to other types of models.
It is also possible that other models may contain respondent behavior curves that differ from those presented here.
For example, a model offering the respondent a choice of one sensitive question and two unrelated questions may have curves that give higher probabilities of telling the truth because the respondent may feel more protected.
This research has focused only on the respondents and has not considered the effects of the model design on non-response, which future research should take into account.
Finally, empirical research documenting the behavior curves and providing validation of randomized response models in general would be an important contribution to this area.
APPENDICES
endpoints of an interval, and denote the corresponding MSEs by MSE_a and MSE_d.
For the first iteration, let

P(A|Yes)_mid = .5*(X_a + X_d),
X_b = .5*(X_a + P(A|Yes)_mid),
X_c = .5*(X_d + P(A|Yes)_mid),

and obtain the corresponding MSEs, say MSE_mid, MSE_b, and MSE_c.
Comparisons among MSE_a, MSE_b, MSE_c, MSE_d, and MSE_mid yield the following possible outcomes and accompanying actions.

Outcome 1. If MSE_b > MSE_mid and MSE_c > MSE_mid, then we perform the following. If |MSE_b - MSE_mid| < .0000001 and |MSE_c - MSE_mid| < .0000001, then we say that MSE_mid is the minimum MSE and the iterative process stops. Otherwise, we set X_a = X_b and X_d = X_c, and repeat the iteration.

Outcome 2. If MSE_b > MSE_mid and MSE_c < MSE_mid, then set X_a = X_b and repeat the iteration.

Outcome 3. If MSE_b < MSE_mid and MSE_c > MSE_mid, then set X_d = X_c and repeat the iteration.

Thus, the iterative process stops when the MSE corresponding to the P(A|Yes) that is the mid-point of the interval is within .0000001 units of the MSEs corresponding to the P(A|Yes)s halfway between the mid-point and endpoints.
In order to detect non-converging processes, this iterative procedure stops after 100 iterations.
Finally, the optimum value of π_Y is the π_Y corresponding to the minimum MSE obtained by this iterative process.
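The interval-shrinking search described above can be sketched in Python. The sketch is generic: mse is any callable returning the MSE at a trial value of the search variable (P(A|Yes) in the text), and the mapping from P(A|Yes) back to π_Y, as well as the MSE model itself, lie outside its scope. The three outcomes in the text do not cover every case (ties, or both neighbours lying below the mid-point), so the final branch, which discards the side with the larger neighbouring MSE, is our assumption.

```python
from typing import Callable, Tuple


def iterative_min(mse: Callable[[float], float], x_a: float, x_d: float,
                  tol: float = 1e-7, max_iter: int = 100) -> Tuple[float, float, int]:
    """Shrink [x_a, x_d] toward the minimiser of mse; return (x, mse(x), iterations)."""
    for it in range(1, max_iter + 1):
        x_mid = 0.5 * (x_a + x_d)
        x_b = 0.5 * (x_a + x_mid)   # halfway between left endpoint and mid-point
        x_c = 0.5 * (x_d + x_mid)   # halfway between right endpoint and mid-point
        m_mid, m_b, m_c = mse(x_mid), mse(x_b), mse(x_c)
        if m_b > m_mid and m_c > m_mid:        # Outcome 1: minimum within [x_b, x_c]
            if abs(m_b - m_mid) < tol and abs(m_c - m_mid) < tol:
                return x_mid, m_mid, it        # converged
            x_a, x_d = x_b, x_c
        elif m_b > m_mid and m_c < m_mid:      # Outcome 2: minimum lies to the right
            x_a = x_b
        elif m_b < m_mid and m_c > m_mid:      # Outcome 3: minimum lies to the left
            x_d = x_c
        elif m_b <= m_c:                       # assumption: drop the right quarter
            x_d = x_c
        else:                                  # assumption: drop the left quarter
            x_a = x_b
    # Non-convergence guard: stop after max_iter iterations, as in the text.
    x_mid = 0.5 * (x_a + x_d)
    return x_mid, mse(x_mid), max_iter


if __name__ == "__main__":
    # Hypothetical MSE curve with its minimum at 0.3.
    x, m, k = iterative_min(lambda x: (x - 0.3) ** 2, 0.0, 1.0)
    print(round(x, 3), k)
```

Because the stopping rule compares MSE values rather than locations, the returned point is only as precise as the flatness of the MSE curve allows; with a tolerance of .0000001 on a quadratic, it lands within a few ten-thousandths of the true minimiser.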
FIGURE B.1: MODEL I: PLOTS OF π_Y,opt VS. SENSITIVITY FOR p=1/2, 2/3, AND 3/4; π_A=.01 (a. n=500; b. n=1000; c. n=10000). [Each panel: π_Y,opt (0 to 1) vs. SENSITIVITY (0 to 10).]
FIGURES B.2-B.8: PLOTS OF π_Y,opt VS. SENSITIVITY FOR p=1/2, 2/3, AND 3/4 FOR MODELS I AND II, INCLUDING A MODEL II FIGURE FOR π_A=.01 WITH PANELS a. n=500, b. n=1000, c. n=10000.
FIGURE B.9: MODEL II: PLOT OF π_Y,opt VS. SENSITIVITY FOR p=1/2, 2/3, AND 3/4; n=1000; π_A=.10.
FIGURE B.10: MODEL II: PLOT OF π_Y,opt VS. SENSITIVITY FOR p=1/2, 2/3, AND 3/4; n=1000; π_A=.25.
BIBLIOGRAPHY
Abernathy, J.R., Greenberg, B.G., and Horvitz, D.G. (1970), "Estimates
of induced abortion in urban North Carolina," pe~ography, 7, 1929.
Abul-Ela, A.A. (1966), R",ndomi~l!dr~sPCJ.n~l!fllod.els
. Jor samplesurvE'Ys on
hUfllanpopul",tio/1s, Ph.D. dissertation, Department of
Biostatistics, University of North Carolina at Chapel Hill, Chapel
Hill, NC.
Abul-Ela, A.A., Greenberg, B.G., and Horvitz, D.G. (19671, "A multiproportions randomized response model, "JCJ\Jrnal of t.hE'..Ame.rican
Statis.;tjcal Associcl.t.iol1, 62, 990-1008.
Anderson, H. (1976), "Estimation of a proportion through randomized
response," InlE'[n~tio/1cl!.'tati~ti~cll Review, 44, 213-217.
_ (1977), "Efficiency versus protection in a general randomized
response model," Scandil1ayi",n.JlJ.l,lrl1al.of . . St.attsti.. c:.. s, 4, 11-19.
Barksdale, VI. B. (1971), New . randomLz.edresponsf!tec~.l1ique~. forc..ontrol
of non-:-samplingerrorsjn.sljrve.ys.;, Ph.D. dissertation, Department
of Biostatistics, University of North Carolina at Chapel Hill,
Chapel Hill, NC.
Barth, J. T., and Sandler, H.M. (1976), "Evaluation of the randomized
response t e c h n i que ina d r ink i n g survey," ~CJ.lj [nel! Cl f.~tlld.J~.~ Or:!
AJlJc:ohlJJ, 37, 690-693.
Begin, G. and Boivin, M. (1980), "Comparison of data gathered on
sensitive questions via direct questionnaire, randomized response
technique, and a projective method," PsychollJgica1 Reports.;, 47,
743-750.
Begin, G., Boivin, M., and Bellerose, J. (1979), "Sensitive data
collection through the random response technique: some
improvements," J.ourn aJ.9f .. f'~.yc:hCJJ.9.9Y, 101, 53-65.
Bellhouse, D.R. (1980), "Linear models for randomized response designs," Journal of the American Statistical Association, 75, 1001-1004.
Boruch, R.F. (1972), "Relations among statistical methods on assuring confidentiality of social research data," Social Science Research, 1, 403-414.
Bourke, P.D. (1981), "On the analysis of some multivariate randomized response designs for categorical data," Journal of Statistical Planning and Inference, 5, 165-170.
Bourke, P.D., and Dalenius, T. (1976), "Some new ideas in the realm of randomized inquiries," International Statistical Review, 44, 219-221.
Carr, J.W. (1983), "An empirical comparison of the randomized response and traditional self-report methods for eliciting truthfulness from adolescents about delinquent acts," Dissertation Abstracts International, 43, 2640A.
Carr, J.W., Marascuilo, L.A., and Busk, P. (1982), "Optimal randomized response models and methods for hypothesis testing," Journal of Educational Statistics, 7, 295-310.
Chow, L.P., Gruhn, W., and Chang, W.P. (1979), "Feasibility of the randomized response technique in rural Ethiopia," American Journal of Public Health, 69, 273-276.
Clickner, R.P., and Iglewicz, B. (1980), "Warner's randomized response technique: the two sensitive questions case," South African Statistical Journal, 14, 77-86.
Devore, J.L. (1977), "A note on the randomized response technique," Communications in Statistics: Theory and Methods, A6, 1525-1529.
Dowling, T.A., and Shachtman, R.H. (1975), "On the relative efficiency of randomized response models," Journal of the American Statistical Association, 70, 84-87.
Drane, W. (1976), "On the theory of randomized responses to two sensitive questions," Communications in Statistics: Theory and Methods, A5, 565-574.
Eriksson, S.A. (1973), "A new model for randomized response," International Statistical Review, 41, 101-113.
Fidler, D.S., and Kleinknecht, R.E., "Randomized response versus direct questioning: two data collection methods for sensitive information," Psychological Bulletin, 84, 1045-1049.
Fligner, M.A., Policello, G.E., II, and Singh, J. (1977), "A comparison of two randomized response survey methods with consideration for the level of respondent protection," Communications in Statistics: Theory and Methods, A6, 1511-1524.
Folsom, R.E., Greenberg, B.G., Horvitz, D.G., and Abernathy, J.R. (1973), "The two alternate questions randomized response model for human surveys," Journal of the American Statistical Association, 68, 525-530.
Fox, J.A., and Tracy, P.E. (1980), "The randomized response approach: applicability to criminal justice research and evaluation," Evaluation Review, 4, 601-622.
________ (1981), "Reaffirming the viability of the randomized response approach (reply to Miller)," American Sociological Review, 46, 930-933.
Frenette, C. and Begin, G. (1979), "Sensitive data gathering through the random response techniques: a validity study," Psychological Reports, 45, 1001-1002.
Gelles, R.J. (1978), "Methods for studying sensitive family topics," American Journal of Orthopsychiatry, 48, 408-424.
Geurts, M.D., Andrus, R.R., and Reinmuth, J. (1975-6), "Researching shoplifting and other deviant customer behavior, using the randomized response research design," Journal of Retailing, 51, 43-48.
Godambe, V.P. (1980), "Estimation in randomised response trials," International Statistical Review, 48, 29-32.
Goodstadt, M.S. and Gruson, V. (1975), "The randomized response technique: a test on drug use," Journal of the American Statistical Association, 70, 814-818.
Goodstadt, M.S., Cook, G., and Gruson, V. (1978), "The validity of reported drug use: the randomized response technique," The International Journal of the Addictions, 13, 359-367.
Gould, A.L., Shah, B.V., and Abernathy, J.R. (1969), "Unrelated question randomized response techniques with two trials per respondent," Proceedings of the Social Statistics Section, American Statistical Association, 351-359.
Greenberg, B.G., Abernathy, J.R., and Horvitz, D.G. (1970), "A new survey technique and its application in the field of public health," Milbank Memorial Fund Quarterly, 68, 39-55.
Greenberg, B.G., Abul-Ela, A.A., Simmons, W.R., and Horvitz, D.G. (1969), "The unrelated question randomized response model: theoretical framework," Journal of the American Statistical Association, 64, 520-539.
Greenberg, B.G., Horvitz, D.G., and Abernathy, J.R. (1974), "A comparison of randomized response designs," in Reliability and Biometry: Statistical Analysis of Lifelength, F. Proschan and R.J. Serfling, editors, Philadelphia: Society for Industrial and Applied Mathematics, 1974, 787-815.
Greenberg, B.G., Kuebler, R.R., Abernathy, J.R., and Horvitz, D.G. (1971), "Application of the randomized response technique in obtaining quantitative data," Journal of the American Statistical Association, 66, 243-250.
________ (1977), "Respondent hazards in the unrelated question randomized response model," Journal of Statistical Planning and Inference, 1, 53-60.
Hastings, N.A.J., and Peacock, J.B. (1975), Statistical Distributions: a handbook for students and practitioners, Butterworths, London, England.
Himmelfarb, S. and Edgell, S.E. (1980), "Additive constants model: a randomized response technique for eliminating evasiveness to quantitative response questions," Psychological Bulletin, 87, 525-533.
________ (1982), "Note on 'The randomized response approach': addendum to Fox and Tracy," Evaluation Review, 6, 279-284.
Himmelfarb, S. and Lickteig, C. (1982), "Social desirability and randomized response technique," Journal of Personality and Social Psychology, 43, 710-717.
Horvitz, D.G., Greenberg, B.G., and Abernathy, J.R. (1976), "Randomized response: a data-generating device for sensitive questions," International Statistical Review, 44, 181-196.
Horvitz, D.G., Shah, B.V., and Simmons, W.R. (1967), "The unrelated question randomized response model," Proceedings of the Social Statistics Section, American Statistical Association, 65-72.
I-Cheng, C., Chow, L.P., and Rider, R.V. (1972), "The randomized response technique as used in the Taiwan outcome of pregnancy study," Studies in Family Planning, 3, 265-269.
Kraemer, H.C. (1980), "Estimation and testing of bivariate association using data generated by the randomized response technique," Psychological Bulletin, 87, 304-308.
Krotki and Fox (1974), "The randomized response technique, the interview, and the self-administered questionnaire: an empirical comparison of fertility reports," Proceedings of the Social Statistics Section, American Statistical Association, 367-371.
Lamb, Jr., C.W. and Stem, Jr., D.E. (1978), "An empirical validation of the randomized response technique," Journal of Marketing Research, 15, 616-621.
Lanke, J. (1975), "On the choice of the unrelated question in Simmons' version of randomized response," Journal of the American Statistical Association, 70, 80-83.
________ (1976), "On the degree of protection in randomized interviews," International Statistical Review, 44, 197-203.
Levy, K.J. (1976), "The randomized response technique and large sample pairwise comparisons among the parameters of k independent binomial populations," British Journal of Mathematical and Statistical Psychology, 29, 257-262.
________ (1977), "The randomized response technique and appropriate sample sizes for selecting the population with the largest value of pi from among k binomial populations," British Journal of Mathematical and Statistical Psychology, 30, 234-236.
________ (1978), "Sample size comparisons involving the randomized response technique," Journal of Experimental Education, 47, 21-23.
________ (1980), "The randomized response technique and large sample tests concerning the parameters of a multinomial distribution," Educational and Psychological Measurement, 40, 701-707.
Leysieffer, F.W. and Warner, S.L. (1976), "Respondent jeopardy and optimal designs in randomized response models," Journal of the American Statistical Association, 71, 649-656.
Liu, P.T. and Chow, L.P. (1976a), "A new discrete quantitative randomized response model," Journal of the American Statistical Association, 71, 72-73.
________ (1976b), "The efficiency of the multiple trial randomized response technique," Biometrics, 32, 607-618.
Liu, P.T., Chow, L.P., and Mosley, W.H. (1975), "Use of the randomized response technique with a new randomizing device," Journal of the American Statistical Association, 70, 329-332.
Locander, W., Sudman, S., and Bradburn, N. (1974), "An investigation of interview method, threat and response distortion," Proceedings of the Social Statistics Section, American Statistical Association, 21-27.
Loynes, R.M. (1976a), "Asymptotically optimal randomized response procedures: an abstract," International Statistical Review, 44.
________ (1976b), "Asymptotically optimal randomized response procedures," Journal of the American Statistical Association, 71, 924-928.
Martin, G.L. and Newman, I.N. (1982), "Randomized response: a technique for improving the validity of self-reported health behaviors," The Journal of School Health, 52, 222-226.
Miller, J.D. (1981), "Complexities of the randomized response solution (comment on Tracy and Fox, ASR, April 1981)," American Sociological Review, 46, 928-930.
Moors, J.A. (1971), "Optimization of the unrelated question randomized response model," Journal of the American Statistical Association, 66, 627-629.
Moriarty, M., and Wiseman, F. (1976), "On the choice of a randomization technique with the randomized response model," Proceedings of the Social Statistics Section, American Statistical Association, 624-626.
Pitz, G.F. (1980), "Bayesian analysis of random response models," Psychological Bulletin, 87, 209-212.
Pohl, B.B. and Pohl, N.F. (1975), "Random response techniques for reducing non-sampling error in interview survey research," Journal of Experimental Education, 44, 48-53.
Pollock, K.H. and Bek, Y. (1976), "A comparison of three randomized response models for quantitative data," Journal of the American Statistical Association, 71, 884-886.
Poole, W.K. (1974), "Estimation of the distribution function of a continuous type random variable through randomized response," Journal of the American Statistical Association, 69, 1002-1005.
Raghavarao, D. (1978), "On an estimation problem in Warner's randomized response technique," Biometrics, 34, 87-90.
Raghavarao, D. and Federer, W.T. (1979), "Block total response as an alternative to the randomized response method in surveys," Journal of the Royal Statistical Society, Series B, 41, 40-45.
Reaser, J.M., Hartsock, S., and Hoehn, A.J. (1975), A test of the forced-alternative random response questionnaire technique, Technical Report, Human Research Organization, Alexandria, Virginia.
Reinmuth, J.E. and Geurts, M.D. (1975), "The collection of sensitive information using a two-stage, randomized response model," Journal of Marketing Research, 12, 402-407.
Rider, R.V., Harper, P.A., Chow, L.P., and I-Cheng, C. (1976), "A comparison of four methods for determining prevalence of induced abortion, Taiwan, 1970-71," American Journal of Epidemiology, 103, 37-50.
Sen, P.K. (1974), "On unbiased estimation for randomized response models," Journal of the American Statistical Association, 69, 997-1001.
________ (1976), "Asymptotically optimal estimators of general parameters in randomized response models," International Statistical Review, 44, 221-224.
Shimizu, I.M., and Bonham, G.S. (1978), "Randomized response technique in a national survey," Journal of the American Statistical Association, 73, 35-39.
Shotland, R.L. and Yankowski, L.D. (1982), "The random response method: a valid and ethical indicator of the 'truth' in reactive situations," Personality and Social Psychology Bulletin, 8, 174-179.
Smith, E.P. and Sosnowski, T.S. (1972), "Faculty evaluations by randomized response sampling," Journal of Experimental Education, 41, 70-72.
Soeken, K.L. and Macready, G.B. (1982), "Respondents' perceived protection when using randomized response," Psychological Bulletin, 92, 487-489.
Tamhane, A.C. (1981), "Randomized response techniques for multiple sensitive attributes," Journal of the American Statistical Association, 76, 916-923.
Tracy, P.E. and Fox, J.A. (1981), "The validity of randomized response for sensitive measurements," American Sociological Review, 46, 187-200.
Volicer, B.J. and Volicer, L. (1982), "Randomized response technique for estimating alcohol use and noncompliance in hypertensives," Journal of Studies on Alcohol, 43, 739-750.
Warner, S.L. (1965), "Randomized response: a survey technique for eliminating evasive answer bias," Journal of the American Statistical Association, 60, 63-69.
________ (1971), "The linear randomized response model," Journal of the American Statistical Association, 66, 884-888.
________ (1976), "Optimal randomized response models," International Statistical Review, 44, 205-212.
Winkler, R.L. and Franklin, L.A. (1979), "Warner's randomized response model: a Bayesian approach," Journal of the American Statistical Association, 74, 207-214.
Wiseman, F., Moriarty, M., and Schafer, M. (1975), "Estimating public opinion with the randomized response model," Public Opinion Quarterly, 39, 507-513.
Zdep, S.M. and Rhodes, I.N. (1976), "Making the randomized response technique work," Public Opinion Quarterly, 40, 531-537.
Zdep, S.M., Rhodes, I.N., Schwarz, R.M., and Kilkenny, M.J. (1979), "The validity of the randomized response technique," Public Opinion Quarterly, 43, 544-549.