Respondent-Driven Sampling - The University of Texas at Dallas

Respondent-Driven Sampling: A New Approach to the Study of Hidden Populations
Author(s): Douglas D. Heckathorn
Source: Social Problems, Vol. 44, No. 2 (May, 1997), pp. 174-199
Published by: University of California Press on behalf of the Society for the Study of Social
Problems
Stable URL: http://www.jstor.org/stable/3096941 .
Accessed: 20/12/2013 16:14
Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at .
http://www.jstor.org/page/info/about/policies/terms.jsp
.
JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of
content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms
of scholarship. For more information about JSTOR, please contact [email protected].
.
University of California Press and Society for the Study of Social Problems are collaborating with JSTOR to
digitize, preserve and extend access to Social Problems.
http://www.jstor.org
This content downloaded from 129.110.242.8 on Fri, 20 Dec 2013 16:14:08 PM
All use subject to JSTOR Terms and Conditions
Respondent-Driven Sampling: A New
Approach to the Study of Hidden
Populations*
DOUGLAS
D. HECKATHORN,
University
of Connecticut
" whennosampling
A population
is "hidden
exists
andpublicacknowledgment
in
frame
ofmembership
thepopulation
ispotentially
suchpopulations
is difficult
standard
because
threatening.
probability
Accessing
lowresponse
methods
ratesandresponses
thatlackcandor.Existing
sampling
produce
procedures
forsampling
thesepopulations,
snowballand otherchain-referral
thekey-informant
and
samples,
approach,
including
a newvariant
introduce
well-documented
biasesintotheir
targeted
sampling,
samples.Thispaperintroduces
a dualsystem
to
thatemploys
incentives
respondent-driven
sampling,
ofstructured
ofchain-referral
sampling,
overcome
someofthedeficiencies
onbothMarkov-chain
drawing
theory
ofsuchsamples.A theoretic
analysis,
and thetheory
showsthatthisprocedure
canreduce
thebiasesgenerally
associated
with
ofbiasednetworks,
a proof
methods.
Theanalysis
thateventhough
includes
sampling
showing
beginswithan
chain-referral
chosen
setofinitialsubjects,
as do mostchain-referral
thecomposition
samples,
oftheultimate
arbitrarily
a theoretic
initial
Theanalysis
alsoincludes
independent
sampleiswholly
ofthose
subjects.
specification
ofthe
underwhichtheprocedure
conditions
basedon surveys
results,
yieldsunbiased
samples.Empirical
of277
in Connecticut,
activedruginjectors
these
theconclusion
howresponconclusions.
discusses
support
Finally,
dent-driven
canimprove
bothnetwork
andethnographic
sampling
44investigation.
sampling
Introduction
"Hidden populations" have two characteristics:first,no sampling frameexists, so the
size and boundaries of the population are unknown; and second, thereexist strongprivacy
concerns,because membershipinvolvesstigmatizedor illegalbehavior,leading individualsto
refuseto cooperate,or give unreliableanswers to protecttheirprivacy. Traditionalmethods,
such as household surveys,cannot produce reliablesamples,and theyare inefficient,
because
most hidden populations are rare. Examples include many groups at riskof contractingHIV,
includingactive injectiondrug users (IDUs). Identifyingthese groups is crucial to the development of effectiveAIDS preventioninterventions,and findingvalid, reliable ways of sampling them is essential to evaluating interventions.
Three methods dominate studies of hidden populations: snowball sampling and other
formsof chain referralsamples, key informantsampling,and targetedsampling. The best
known approach is snowballsampling(Goodman 1961): ideally, a randomlychosen sample
serves as initial contacts,though in practiceease of access virtuallyalways determinesthe
initialsample; these subjectsprovide the names of a fixednumber of other individualswho
fulfillthe researchcriteria. The researcherapproaches these persons,and asks them to participate;and each subject who agrees is then asked to provide a fixed number of additional
names. The researchercontinues thisprocess foras many stages as desired.
* An earlierversionofthispaperwas presented
onAIDS,Vancouver,
at theXI International
B.C.,July
Conference
and Denise
to combatAIDS was invented,
intervention
1996. I thankRobertBroadhead,
withwhomthepeer-driven
Boris
Dean Behrens,RobertMills,BethJacobson,
who conducted
muchofthedataanalysis.In addition,
Anthony,
I
andadvice.Finally,
reviewers
comments
DavidWeakliem,JeroenWeesieandanonymous
provided
helpful
Sergeyev,
onDrugAbuse(Grant#R01DA08014)foritssupportofthisresearch.
Institute
thanktheNational
174
SOCIAL PROBLEMS, Vol.44, No. 2, May 1997
This content downloaded from 129.110.242.8 on Fri, 20 Dec 2013 16:14:08 PM
All use subject to JSTOR Terms and Conditions
Respondent-Driven
Sampling
Erickson(1979:299) describesa numberof problemsafflicting
snowballsamplingand
otherchain-referral
about individualsmustrelymainlyon the
samples. First,"inferences
initialsample,sinceadditionalindividuals
foundbytracing
chainsareneverfoundrandomly
or even withknownbiases." Thisissueis especiallygrave,because in the contextswhere
chain-referral
methodsare used,theinitialsampleusuallycannotbe drawnrandomly.Second, chain-referral
samplestendto be biasedtowardthe morecooperativesubjectswho
thisproblemis aggravated
whentheinitialsubjectsare volunteers,
beagreeto participate;
cause in termsofcooperation
are
outliers.
Third,thesesamplesmaybe biasedbecause
they
of"masking,"
thatis,protecting
friends
them,an important
bynotreferring
problemwhena
occurthroughnetworklinks,so
populationhas strongprivacyconcerns.Fourth,referrals
willbe oversampled,
and relativeisolateswillbe exsubjectswithlargerpersonalnetworks
cluded. Because of thesepotentialbiases,snowballsamplestypically
are seen as "convenience samples"thatlackanyvalidclaimto produceunbiasedand consistent
samples. Still,
Ericksonis optimistic
aboutthepotential
forresolving
theseproblems,
that: "the
concluding
problems in making inferencesabout individualsand about chains .
. .
are . . . in principle
solvable:One can buildin addedincentives"
she didnotdevelop
(1979:299). Unfortunately,
thispoint.
Twoadditionalmethodshave developedto overcomethedifficulties
snowball
afflicting
samples. Keyinformant
sampling(Deaux and Callaghan1985) is designedto overcomereand askingthem about
sponse biases by selectingespeciallyknowledgeable
respondents
others'behavior,ratherthantheirown. For example,one mightask socialworkers,drug
abuse counselors,
or naturalopinionleadersto reporton patternsof
publichealthofficials,
to exaggeratesociallyacdruguse and sexualbehavior.Thismethodreducesthetendency
behavior,however,itadds severalsourcesof
ceptablebehaviorand understate
disreputable
bias. First,when professionals
are keyinformants,
theirprofessional
orientation
maybias
theirresponses;
abusecounselors
theirclients'difficulties,
and
e.g.,substance
mayexaggerate
naturalopinionleadersmayfeelobligated
topresenteitheran idealizedora disparaging
view
ofthosewhomtheyinfluence.Second,keyinformants
detailedknowlmaylacksufficiently
ofcondomuse
edge;e.g.,withtheexceptionofsexualpartners,
knowledgeaboutfrequency
tendsto be strictly
do notinteract
witha randomgroupof
personal.Third,keyinformants
is a professional,
thebias is a formof "institutional
potentialclients.If the keyinformant
bias"characteristic
ofsamplesdrawnfrominstitutionalized
populations.Forexample,justas
selected,neitherare thosewho come intocontact
imprisoned
drugusersare notrandomly
withdrug-abusecounselorsor thosewho are partof the entourageof a naturalopinion
leader. Hencethekeyinformant
itintroduces
new sourcesofpoapproachhas limitations:
tentialresponsebias,it cannotbe used to accesshighlydetailedand personalinformation;
and thesamplingmayhave an institutional
bias.
(Wattersand Biernacki1989) is a widelyemployedresponseto the
Targeted
sampling
ofchain-referral
deficiencies
models.Itinvolvestwobasicsteps:first,
fieldresearchers
mapa
the
extent
thattheysucceedin penetrating
thelocal networks
targetpopulation(to
linking
thispreventsthe under-sampling
thattraditional
potentialrespondents,
approacheswould
recruita pre-specified
numberof subjectsat sites
produce);and second,fieldresearchers
identified
thatsubjectsfromdifferent
areasand subbytheethnographic
mapping,
ensuring
will
groups
appearin the finalsample. The adequacyof targeted
samplingdependson the
oftheethnographic
accuracyand comprehensiveness
mapping.Ifethnographic
mappingis
nearperfect,
targeted
samplingreducesto a formoflocationsampling.Forexample,based
on an exhaustiveethnographic
analysis,one couldweightdrugscenesbasedon theirintenofuserat eachtimeoftheday,and sampleaccordingly.
sityofuse byeachcategory
Unfortunately,thisis neverpossiblebecausedrugscenesare neitherdiscretenorpublic;whilesome
drugcopping(i.e.,selling)occursin concentrated
areas,muchis dispersed,
in prioccurring
vateapartments
and othernonpublicsettings.Evenan accomplished
ethnographer
requires
This content downloaded from 129.110.242.8 on Fri, 20 Dec 2013 16:14:08 PM
All use subject to JSTOR Terms and Conditions
175
176
HECKATHORN
a smallsubsetofsuchsettings
monthstopenetrate
withinone urbanneighborhood.
Ethnogof the timeof day
and
therefore
is
i.e.,by theeffects
limited,
targetsampling always
raphy
wheretheydo theirrecruiting,
and therecruitment
whenresearchers
recruit,
they
strategies
ifresearchers
recruit
and Biernacki1989:424-426).Forinstance,
use (Watters
duringnormal
businesshours,theywillhavedifficulty
recruiting
gainfully
employedsubjects.Ifresearchers
a subtlerversionofthe "instituin "obvious"locations,theyintroduce
focustheiractivities
tional"bias targeted
samplingseeksto avoid- theyoversamplethemostvisiblepotential
thosein less obviousniches. Researchers'
(or missaltogether)
subjects,and under-sample
limitthe
intoareas wheretheydo not feelsafefurther
aboutventuring
own trepidations
that
correintroduces
biases
Thus
of
the
targetedsampling
diversity
populationsampled.
which
it
is
based.
of
the
to
the
limits
upon
ethnography
spond
limitations
and theabsenceofanyfundaoftargeted
sampling's
recognition
Increasing
in snowballand other
renewed
interest
methods
have
new
produced
sampling
mentally
methods.Thesemethodshave greatpotentialpower,as studiesofsocialnetchain-referral
associas largeas theUnitedStates,everymemberis indirectly
worksreveal.In populations
six intermediaries
ated witheveryothermemberthroughapproximately
(Killworthand
Bernard1978/79). This means that even the most sociallyisolatedindividualscan be
chosenindividwithanyarbitrarily
chainbeginning
reachedbythesixthwave ofa referral
ual. Ofcourse,realizingeven a smallportionof thispotentialrequiresa highlyrobustrerecruit
forexample,thatparticipants
cruitment
everyonetheyknow,and
process,requiring,
in thecase of sociometric
stars,thismightinvolvemanythousandsofpersons.
ofchain-referral
Refinements
samplingincludesFrankand Snijders'(1994) methodfor
hidden
of
the
size
populationsusinga one-wavesnowballsample. Theyselecta
estimating
diverseset of initialsubjects,each ofwhomthenlistsall membersof thetargetpopulation
basedon theamountof
thattheyknow. Thesizeofthehiddenpopulationis thenestimated
Klovdahl(1989) proposesa "randomwalk"
overlapamongthe memberslisted.Similarly,
network
structure,
usinga snowballsamplein whicheach wave conapproachforanalyzing
memofthenetwork
features
sistsofonlyone subject;itsresultsrevealstructural
connecting
bers of the hidden population. In addition,Spreen and Zwaagstra(1994) propose a
combinationof snowballand targetedsampling,termed"targeted
personalnetworksambecomesthebasis
which
an
initial
to
locate
which
uses
sample,
mapping
ethnographic
pling,"
of cocaineusers'
fornetworksampling.Theyused thisapproachto analyzethe structure
personalnetworks.
sampling,some problems
Despite the sophisticationof these extensionsof chain-referral
discussionabout
remain unresolved. In particular: "The centralquestionin themethodological
samplingand analyzinghiddenpopulationsis, basically:How to draw a random(initial)sample"
(Spreen 1992:49). This question is crucial,because it generallyhas been assumed thathowever many waves the chain-referralsamplingmay contain, it necessarilymust reflectthe
biases in the initialsample.
This paper describesa new formof chain-referral
samplingtermed"Respondent-Driven
Sampling" (RDS). It was developed as part of an AIDS preventionintervention,the Eastern
HealthOutreach(ECHO) project,which targetsactive injectiondrug users forinConnecticut
terviews,AIDS preventioneducation,and HIV testingand counseling (Broadhead and Heckathorn 1994; Broadhead et al. 1995; Heckathornand Broadhead 1996). This processrequires
approximately1 1/2hours. RDS plays a dual role in the project. It is used both to recruit
subjectsinto the AIDS preventionintervention,and to sample the population of active injectors. Based on an analysis drawingon Markov chains and the theoryof biased networks,I
show that suitable incentivescan reduce the biases of chain-referralsamples. Specifically,
RDS produces samples thatare independent
fromwhichsamplingbegins.As a
oftheinitialsubjects
result,it does not matterwhether the initialsample is drawn randomly. In addition, RDS
This content downloaded from 129.110.242.8 on Fri, 20 Dec 2013 16:14:08 PM
All use subject to JSTOR Terms and Conditions
Respondent-Driven
Sampling
reducesthebiasesresulting
fromvoluntarism
and masking,
and providesmeansforcontrolthe
from
differences
in
the
sizesofpersonalnetworks.Thusit resolves,
biases
ling
resulting
or providesmeansforresolving,
theprincipal
chain-referral
problemsaffecting
samples.
Respondent-Driven Sampling
A principle
derivesfromstudiesofincentive
respondent-driven
underlying
sampling
systems(Heckathorn
1990, 1993, 1996). Behavioralcompliancecan arisefromtwo theoretisources.First,itcan arisefromindividual-sanction-based
control.Here
callydistinguishable
an agent,suchas a teacher,parent,neighbor,
or AIDS prevention
an indicounselor,targets
vidualforcontrol,
forexample,bypromising
a rewardfora respondent
an interundergoing
view. The resultis a dyadicrelationof the sortpresumedin mostanalysesof influence
a primary
incentive.
socialcontrol
relations,
Second,compliancecan arisefromgroup-mediated
can be rewardednotonlyfortheirown par(Heckathorn
1990). Forexample,respondents
in a study,butalso forparticipation
ticipation
theyelicitfroma peer. In thesecases,control
involvesa two-step
oftheactor'sgroupis promiseda reward
process:one ormoremembers
or threatened
witha punishment
basedon whethertheactorcomplies,and membersofthe
incentive
theactor.In thisway,theagent's
grouprespondto thatsecondary
bycontrolling
influence
is amplified
the
this
formofcontrolis "groupmediThus,
through target's
group.
thattrigger
it are "secondary
incentives."
ated,"and theincentives
A centralconclusionfromrecentresearchon incentivesystems(Heckathorn1990,
incentives
canbe moreefficient
and effective
thanprimary
incentives
1996) is thatsecondary
in contextswhereintragroup
controlis cheap and effective,
as when socialapprovalis an
sanctionin peergroups.A difference
betweenprimary
and secondaryincentives
important
liesin thedistribution
ofcompliance
costs.In thecase ofindividualized
rewards,
compliance
costsare internal;the targetedactoreithercompliesor refuses,therebybearingwhatever
costsare involved,and thechoiceis strictly
in thecase of secondary
personal.In contrast,
actorseekstoinduceothersto comply,
incentives,
compliancecostsareexternal;thetargeted
and therefore
othersbearthecompliancecosts. It is usuallyeasierto tellothersto comply
thanto do so oneself.Thisdifference
is crucialin thecase ofrecruitment
intoa study.When
incentives
motivateparticipation,
individuals
maketheirown autonomousdeonlyprimary
cisionsaboutwhetherto cooperate.However,ifsecondary
rewardsalso motivaterecruiting
othersintothestudy,peers'socialinfluence
is harnessedon behalfofthesamplingprocess.
A secondreasonforthepotentialeffectiveness
ofsecondaryincentives
concernsmoniincentives
as meansof controlmustbe able to monitor
toring.Agentsemploying
primary
can observeonlya
compliance.But police,teachers,and drugabuse counselorstypically
smallportionofbehavior,so monitoring
is difficult
whenactivities
cannotbe geographically
confined.In contrast,
and peerstend
secondaryincentives
operatethroughpeerinfluence,
to be farmoreeffective
monitorsof behavior(Heckathorn
1990). When samplinghidden
can be located.For
populations,
helpfrompeersis oftentheonlywaythatmanyindividuals
IDUs are thanotherIDUs.
example,no one knowsbetterwho a community's
A thirdreasonforthepotentialeffectiveness
ofsecondary
incentives
concernsdifferentialsin responsetomaterialincentives.Someindividuals
not
may respondtomaterialincentives,e.g.,affluent
subjectsmaynotneed themoneytheywouldgainbybeinginterviewed.
In samplingbasedon primary
iftheresearcher's
incentives,
is
abilityto rewardsymbolically
limited,he or she mustrelyon materialrewards.Generally,
socialresearchers
have little
choicebutto relyupon materialrewards,
because,exceptin thecase oflengthy
participant
observation
studies,theycannotdevelopmeaningful
withsubjects.Secondary
relationships
incentivesharnesspeer pressure,applyingnon-material
rewardssuch as peer approvalto
securecompliance.In essence,secondaryincentives
convertmaterialincentives
intopeer-
This content downloaded from 129.110.242.8 on Fri, 20 Dec 2013 16:14:08 PM
All use subject to JSTOR Terms and Conditions
177
178
HECKATHORN
basedsymbolic
incentives.Forexample,individuals
who aretooaffluent
tocareaboutmaterialrewardsmaydefertosocialpressurefromlessaffluent
or morematerially-oriented
peers,
such as mainconnections
or dealers.
Likeotherchain-referral
methods,RDS assumesthatthosebestable to accessmembers
snowballsamplingin
of hiddenpopulationsare theirown peers. It differs
fromtraditional
involvesan incentiveforparticipatworespects.First,whereassnowballsamplingtypically
tion,RDS involvesa dual incentivesystem- therewardforbeinginterviewed
(a primary
othersintothestudy(a secondaryreward).Thisstudy
reward)plusa rewardforrecruiting
to helpprotectoneself
and symbolic
also used a mixofmaterial(monetary)
(theopportunity
a robustrecruitand one's peersfroma deadlyepidemic)rewards.Theserewardsfostered
thatyieldeda large
ment,in whicha fewinitialsubjectseachproducedchain-referral
systems
overthecourseof successivewaves. Forexample,Figure1 depictsthe
numberof recruits
networkfromSite2, in whicha singlesubjectgeneratedmorethanone
largestrecruitment
hundredrecruits
ofdiverserace,ethnicity,
gender,and place of residence.
wm2
wm
w2
hm2
wm2---wf2
2
w
bm2
bbf2
W
m2w2
w/w2
wf2
-
/ wf2wf2
/Io
w2
w
?f2
wft2-wm2
2
wm2 w2
w2
wm2
wfo
wfo
M2
2w
2
wfo
3
f
-wm
&
Wf2
w
w2bm3
wm3
bf3
w23/
bm2
-
____
hm3
h 2hm2h 2
wf2
/
/?
hm2
wf2 wf2
h3
w3
3
m3wf3
wf3
hm3
2
hf2 ??m2
hf2
h2
w3 \
wf3
wf3
bm2wm3
Wf3
bm3
2
wm2
wm3
wf3
/
wm3
bf 3
wm
3
bm3 \
wm3
-h
bf2m
mhiw2
bm2
-m2 m2
3w3
bf3
Key:
=Black
wfo
hm3
hm3 hmN
hi3
-2=Town
omo
wfo
M=Male
2
3=Town 3
O=Other
?=Missing
in a respondent-driven
network
sample,beginning
froma single"seed."
Figure 1 * Recruitment
betweenRDS and typicalsnowballsamplingis thatsubjectsare not
A seconddifference
but to recruitthemintothe study.This
theirpeersto the investigator,
asked to identify
thataresubjectedto considerable
is crucialwhendealingwithhiddenpopulations
distinction
thatwhen Frank
not surprising
so
it
is
are
The
Netherlands
tolerant,
relatively
repression.
and Snijders(1994:62) askedheroinaddictsto listotherheroinaddicts,thetypicalresponin the
dentgave up more thannine names. This approachwould be moreproblematic
This content downloaded from 129.110.242.8 on Fri, 20 Dec 2013 16:14:08 PM
All use subject to JSTOR Terms and Conditions
Respondent-Driven
Sampling
UnitedStatesundercurrent"war-on-drugs"
whereaskingrespondents
to proconditions,
duce suchlistsis highlythreatening,
and violatestheinformal
normagainst"snitching."
The
RDS approachreducesmasking,
becauseitgivesrespondents
theopportunity
to allow peers
to decideforthemselves
whethertheywishtoparticipate.
Forexample,a subjectwho would
not feelcomfortable
the name of a peer,may nonethelesssucceedin
givinga researcher
thatpeer. Also,the problemof recruiting
recruiting
onlythe mostcooperativesubjectsis
reducedby combining
primaryand secondaryincentives.Individualswho resistfieldresearchers'appealsmaynonetheless
at leastin part,by
yieldto appealsfrompeersmotivated,
individuals
are notimmuneto socialpressecondaryincentives.Even theleastcooperative
sure. Finally,it typically
is assumedthatchain-referral
samplesare biasedtowardsubjects
withlargepersonalnetworks.However,an empiricaltestof thishypothesis
did not find
evidenceofthis(Welch1975). Similarly,
in thecurrent
study,theassociationbetweenpersonalnetworksize,as measuredbya questionthataskedhowmanyinjectors
therespondent
knewby name or face,lackedany statistically
association
with
the numberof
significant
intothestudy.Thismayresultfromtherecruitment
subjectsrecruited
quotasbuiltintothis
couldgenerally
findat leasta few
study.Even thosesubjectswithsmallpersonalnetworks
subjectsto recruit.Recruitment
quotasreducedthe abilityof subjectswithlargepersonal
networksto recruitmoreextensively
thansubjectswithsmallernetworks.
The potentialbias of oversampling
subjectswithlargerpersonalnetworkscan be addressedin severalways: (a) samplescan be weightedinversely
withsubjects'self-reported
networksize; (b) samplingcan focuson saturating
areasand therebycapturesubtargeted
can be employedtoincrease
sizes;or (c) specialincentives
jectswiththefullarrayofnetwork
recruitment
of subjectswithtraitsassociatedwithsmallpersonalnetworks.
in thefollowing
Respondent-driven
samplingwas implemented
way:
Research
staff
a
recruit
handful
of
who
serveas "seeds."
(1)
subjects
financial
incentives
to recruit
theirpeersintothesame interview
(2) Seedsare offered
they
havejustcompleted.Specifically,
seedsaregivenseveralrecruitment
and toldthat
coupons
iftheypass thecouponson to peerswho come to thestorefront
foran interview,
the
recruiter
willbe paid $10 foreach recruited
peer.
are offered
the same dual incentivesas were the seeds. Everyoneis
(3) All new recruits
rewardedbothforcompleting
theinterview,
and forrecruiting
theirpeersintothe research. Givenadequate incentives,
thismechanismcreatesan expandingsystemof
chain-referrals
in whichsubjectsrecruitmoresubjects,who recruitstillmoresubjects,
and so on fromwave to wave. To ensurethata broadarrayofsubjectshavean opportuto preventtheemergence
ofsemi-professional
and to preclude
nityto recruit,
recruiters,
turfbattlesover recruitment
each subjectwas limitedto threeinitialcoupons.
rights,
Setsof threeadditionalcouponsweregivenonlyundertwoconditions,
thatthreesuband thattheyhad been effectively
recruited,
educatedabout
jectshad been successfully
HIV prevention
issuesas measuredusinga structured
assessment
test. A by-product
of
thisrecruitment
quota was to increasethenumberof wavesofrecruitment
requiredto
saturatethepopulation.Giventhecombination
ofincentives
forrecruitment
and education,theaveragecostperrecruited
subjectwas about$14.
in thepopulationmustbe objectively
(4) The traitdefining
lestremembership
verifiable,
reacttotherecruitment
incentives
spondents
byenlisting
personswhoare notpartofthe
hiddenpopulation(e.g., noninjectors).In thisstudy,verification
was accomplished
througha seven-step
screening
protocolused to confirm
injectiondruguse. Forexample, the firststepwas to look forrecenttrackmarks.If markswerenot found,subsequent steps requiredthe subjectto demonstrate
detailedacquaintancewith drug
and injectiontechniques.
preparation
This content downloaded from 129.110.242.8 on Fri, 20 Dec 2013 16:14:08 PM
All use subject to JSTOR Terms and Conditions
179
180
HECKATHORN
occurswhena subjectseeksto participate
in a studyundermultiple
(5) Subjectduplication
and subjectimpersonation
occurswhena subjectpretendsto be another,peridentities,
Thesepotentialproblems
haps as a meansto collectthelatter'srewardforrecruitment.
were overcomeusinga subjectidentification
databaserecordingsubjects'identifying
height,weight,scars,tattoos,
physicalcharacteristics,
includinggender,age, ethnicity,
and somebiometric
measures.
theuse of"steering"
withina populationcan occurthrough
(6) Targeting
specific
subgroups
thatis, extrabonusespaid forrecruiting
incentives,
specificcategoriesof subjects.For
forrecruiting
femaleinjectors.
example,a modest($5) bonuswas offered
or when a
is saturated,
either
when
the
can
be
ended
targetedcommunity
(7) Sampling
has
has
reached
and
the
minimum
size
been
samplecomposition reacheda
targetsample
withrespectto thetraitsupon whichtheresearchfocuses.
stablecomposition
limitations.
Likeanychain-referral
Theseprocedures
haveinherent
procedure,
sampling
thatconwitha contactpattern;theactivities
RDS is suitableonlyforsamplingpopulations
in thepopulationmustcreateconnections
stitutemembership
amongpopulationmembers,
takeplace.
sexualactivities
as when druguserspurchaseor sharedrugs,or whenhigh-risk
thismethodis not suitablefordrawingnationalsamples.The size of the area
Therefore,
extenwithinwhichsamplingcan be effective
dependson thecontactpattern'sgeographic
to therespondents.Nor
oftransportation
siveness,whichin turndependon theavailability
whosemembersshareno ties. Second,the
is RDS suitableforsamplinghiddenpopulations
and
lestinterview
in thepopulationmustbe verified
traitdefining
objectively,
membership
to claimfalselythattheyare membersof
incentives
motivatesomerespondents
recruitment
thepopulation.Thisis nota problemuniqueto RDS; itoccurswheneverinterview
responincentivemayaggravatetheproblem,requiringa
dentsare rewarded,but therecruitment
well-tested
screening
protocol.
The Setting
of28,500
withpopulations
in twosmallcitiesin Connecticut
The RDS was implemented
and 42,800,and substantial
injectiondruguse and associatedAIDS cases. As ofJuly1996,
185casesofAIDS hadbeendiagnosedin thefirst
site,and 89 at thesecond,withanother116
cases in a nearbytownof 59,500thatalso fellwithinthesamplingarea. Aboutone halfof
- involveddruginjection.
the cases in thesetowns- 55%, 49% and 47% respectively
was high,i.e.,71.7% and 78.9% at sites1 and 2
Unemployment
amongtheIDU respondents
Samplingat the two siteswas sequential.It beganat the firstsitein March
respectively.
untilMarch1995. Samplingat thesecondsitebeganin March1995 and
1994,continuing
I reporton exactlyone year
March1996. Therefore,
hereextendthrough
thedata reported
locatedin activedrug
of operationat each site. Bothprojectsoperatedout of storefronts
forinterusingscenes. A programstaffof threehealtheducatorsscheduledappointments
viewsduringthreeweekdays.
A Formal Model
a RDS createsa stochastic
Whenviewedanalytically,
processin whicheach recruiter's
oftherecruits.In thecase ofraceand ethnicity,
affect
thecharacteristics
socialcharacteristics
ethnicmixofrecruits.For
ofeach ethnicgroupgeneratea distinct
thismeansthatrecruiters
whites,on average,reexample,fromTableIa it is apparentthatin Site 1, non-Hispanic
cruited70% whites,19% non-Hispanic
blacks,10% Hispanics,and 2% others.Similarly,
on average,28% whites,56% blacks,6% Hispanics,and 9%
blacksrecruit,
non-Hispanic
towardin-grouprecruitment
persists.Thispatterncontinuesamong
others,so thetendency
This content downloaded from 129.110.242.8 on Fri, 20 Dec 2013 16:14:08 PM
All use subject to JSTOR Terms and Conditions
Respondent-Driven
Sampling
Hispanics,whose in-grouprecruitmentrate is 59%. Hence, recruitingoccurs preponderantly
within the ethnic group, but cross-ethnicrecruitmentis also common.
Table I * Characteristics
Site 1
ofRecruits,
byCharacteristics
ofRecruiter,
Table Ia * Recruitment
byEthnicity
ofRecruit
Race/Ethnicity
Ethnicity
ofRecruiter
W
B
H
0
Total
White(W)
Non-Hispanic
69.8%
19%
9.5%
1.6%
Black(B)
Non-Hispanic
28.1%
56.3%
6.3%
9.4%
Hispanic (H)
23.5%
5.9%
58.8%
11.8%
50%
50%
0%
0%
Other (0)
ofRecruits
TotalDistribution
50.9%
28.4%
15.5%
5.2%
(59)
(33)
(18)
(6)
49%
29.7%
15.8%
5.4%
Equilibrium
Mean Discrepancy,
ofRecruits
Distribution
and Equilibrium
= .91% (r = .998)
100%
(63)
100%
(32)
100%
(17)
100%
(4)
100%
(116)
100%
Table Ib * Recruitment
byGender
GenderofRecruiter
Female
Female
Male
Total
29.7%
70.3%
Male
23.9%
76.1%
TotalDistribution
ofRecruits
25.6%
(32)
25.4%
100%
(37)
100%
(88)
74.4%
(93)
74.6%
Equilibrium
= .2%
Mean Discrepancy,
Distribution
ofRecruits
and Equilibrium
100%
(125)
100%
Table Ib reportsthe correspondingdata forgender. No in-grouprecruitmentpatternis
apparent in thiscase, because both femalesand males recruitbetween 70% and 76% males.
This may reflectthe male-dominatedcharacterof injectiondrug scenes. However, because
rewards for recruitingfemales were greaterthan those forrecruitingmales, their expected
effectwas to increase in-group selection among women, strengtheningtheir incentives to
recruitone another,while reducingin-groupselectionamong men, because theirincentive
was to recruitwomen. This shows how steeringincentivescan alter tendencies toward ingroup or out-groupselection.
An intermediatelevel of in-groupselectionis apparentin Table Ic's reportof recruitment
based on drug preference.Most subjectsin the studyreportthatheroin is theirdrug of first
choice, so drug preferencewas dichotomizedbetween heroin,and an "other"categorythat
includes cocaine and crack,methadone, cannabis, speedballs (a combinationof cocaine and
heroin), and alcohol. A tendencyforsubjects to recruitfromtheirown group is apparent,
but it is weaker than the effectsof ethnicity.Finally,Table Id reportsrecruitmentbased on
This content downloaded from 129.110.242.8 on Fri, 20 Dec 2013 16:14:08 PM
All use subject to JSTOR Terms and Conditions
181
182
HECKATHORN
Table Ic * Recruitment
byDrugPreference
ofRecruit
DrugPreference
Other
DrugPreference
ofRecruiter
Heroin
Heroin
87.7%
12.3%
Other
67.6%
32.4%
ofRecruits
TotalDistribution
80%
(94)
84.6%
20%
(21)
15.4%
Equilibrium
= 2.86%
ofRecruits
Mean Discrepancy,
Distribution
and Equilibrium
Total
100%
(81)
100%
(34)
100%
(105)
100%
Table Id * Recruitment
byLocation
LocationofRecruit
Location
ofRecruiter
In Town
(Town1)
In Area
In Town
(Town1)
In Area
OutofArea
Total
84.3%
8.4%
7.2%
60%
10%
30%
50%
8.8%
41.2%
100%
(83)
100%
(10)
100%
(34)
100%
(127)
100%
Out ofArea
18.1%
8.7%
73.2%
(23)
(93)
(11)
13.9%
8.6%
77.5%
Equilibrium
= 2.86% (r = .997)
ofRecruits
and Equilibrium
Distribution
Mean Discrepancy,
ofRecruits
TotalDistribution
area of residence. Here the tendencytowardin-grouprecruitmentis variable: whereas residents of Site l's town recruitinternally84% of the time,subjects fromthe area (i.e., from
contiguous towns) or out of the area (i.e., frommore distanttowns) recruitstronglyboth
fromwithinand outside of town. This reflectsthe town's positionas a regionaldrugdistribution center. IDUs travel considerabledistances to purchase drugs in the town, and in the
process they develop networkconnectionsboth in the town and in theirarea of origin.
Sampling as a Markov Process
RDS recruitmenthas two importantcharacteristics.First,thereare a limitednumber of
states (e.g., typesof ethnicity)thatsubjectscan assume. Second, any subject's recruitsare a
functionof his or her type,such as his or her ethnicity;and not of previous events, such as
who recruitedthe recruiter.This requirementis satisfiedin the case of Table Ia's data because thereis no significantassociationbetween the ethnicityof a recruiter'srecruitsand the
This content downloaded from 129.110.242.8 on Fri, 20 Dec 2013 16:14:08 PM
All use subject to JSTOR Terms and Conditions
Respondent-Driven
Sampling
of therecruiter's
is a memoryless
recruiter.'Hence,recruitment
ethnicity
process.For exwho werethemselves
recruited
aboutthesame
ample,whiterecruiters
bywhites,recruited
mixofsubjectsas didwhiterecruiters
whowererecruited
or
blacks.
Therefore,
byHispanics
RDS recruitment
in thiscase qualifiesas a first-order
Markov
and can be represented
process,2
in theformofthenetwork
depictedinFigure2a. Therecruitment
processcanbe conceptualized as a pointmovingfromnode to node in thisnetwork,
each representing
a distinct
state
of the system.At any instant,the currentlocationof the pointindicatesthe mostrecent
recruit's
The arrowsleavingthatpointto othernodesindicatetheethnicity
ofthe
ethnicity.
nextrecruit,
and thenumberassociatedwitheach arrowindicatestheprobability
thatthis
pathwillbe taken.
Markovprocessesareofseveralbasictypes.Somehaveabsorbing
states,suchthatwhen
one entersthatstate,no exitis possible.Thisoccurswhenan ethnicgroupis totallyisolated,
such thatitsmembersonlyrecruitone another.Figure2a, indicatesthisdoes not fitthis
thereis a substantial
selection
network;
bias,butitis lessthantotal.Indeed,followin-group
the
arrows
in
2a's
one
can
reachany pointin the networkfromany
network,
ing
Figure
otherpoint. Therefore,
thisnetworkis "ergodic"(Fararo1973:280). In contrast,
Figure2b
a nonergodic
illustrates
networkthatwillbe discussedbelow,in whichthe rightmost
two
nodes are absorbing
states.Returning
to Figure2a, thenetworkis non-cyclic,
in thatany
node can be reachedduringany timeperiod(e.g.,odd versuseven timeperiods).Because
theMarkovprocessis bothergodicand non-cyclic,
Figure2a's networkis termed"regular."
Two theoremsregarding
RDS.
regularMarkovprocessesare relevantto understanding
First,the "law of largenumbersforregularMarkovchains"(Kemenyand Snell 1960:73)
statesthattheprobability
thata systemwillbe in anygivenstateoverthecourseofa large
numberof stepsis independent
ofitsstarting
state.The implication
forRDS recruitment
is
that:
THEOREM ONE: As therecruitment
continues
from
waveto wave,an equilibrium
mixof
process
willeventually
recruits
beattained
thatisindependent
ofthecharacteristics
ofthesubject
orsetof
from
whichrecruitment
subjects
began.
is reached,and theresulting
Thus,ifrecruitment
recruitment
netoperatesuntilequilibrium
workqualifiesas a regularMarkovprocess,itavoidsthecentralproblemforsampling
hidden
theinitialsample. Instead,a
populations- thatthesample'scharacteristics
merelyreflect
of theinitialset ofsubjects.Thispointis
respondent-driven
sampleis whollyindependent
illustrated
in Figures3a and 3b, in whichTableIa is used as the basis fortwo
graphically
alternative
simulations.Figure3a drawson TableIa's data to mathematically
projectwhat
would have happenedhad recruitment
African-American
begunwithonly non-Hispanic
seeds. Afterthefirstwave,thosesubjectswouldhave on averagerecruited
28.1% whites,
56.3% blacks,6.3% Hispanics,and 9.4% others,so blackswould predominate.Afterthe
secondwave,thecomposition
ofrecruits
wouldchange,because28.1% ofrecruiters
would
be white,and theywouldrecruit,
on average,69.8% whites,19% blacks,9.5% Hispanics,
and 1.6% others(i.e, NativeAmericans
and AsianAmericans).Similarly,
6.3% of secondwave recruiters
wouldbe Hispanic,and theywouldrecruit
on average23.5% whites,5.9%
and 11.8% others.Combining
blacks,58.8% Hispanics,
thesewiththerecruits
oftheblacks
and others,theoverallcomposition
ofthesecond-wave
recruits
is 40% whites,43.7% blacks,
9.9% Hispanics,
and 9.5% others.AsFigure3a illustrates,
thistrendcontinues,
fromwave to
whetherthe ethnicity
1. A seriesof chi-squareanalyseswas conductedto determine
ofa recruiter's
recruiter
affected
theethniccomposition
oftherecruiter's
recruits.Forexample,whiterecruiters
weredifferentisignificantly
atedbasedon whether
black,white,oran "other"person,and theethniccompositheywererecruited
bya Hispanic,
tionoftheirrecruits
werecompared.No significant
differences
werefound,e.g.,in thecase ofwhiterecruiters,
the
levelwas 0.44.
significance
2. A Markovchainis formally
tablesthatdisplaychangesin statefromone periodof
equivalentto contingency
timeto another,and as suchtheycan be analyzedusinglog-linear
models(see Bishopet al. 1975:257-279).
This content downloaded from 129.110.242.8 on Fri, 20 Dec 2013 16:14:08 PM
All use subject to JSTOR Terms and Conditions
183
184
HECKATHORN
563
.69.19
.281
.5
A
.285
.0
16 .063
.059
.5
.095
.094
O
H
Key:
WTNon-Hispanic
Whites
B-Non-Hispanic Blacks
H-Hispanics
O-Other
.878
881
.012
T3-Town 3
area
2's Affiliated
A2-Town3's Affiliated
area
A3-Town
.
wherenodes
and arrows
represent
subjectcharacteristics,
Figure 2 * Markovchainnetworks,
recruitment
probabilities
represent
2
This content downloaded from 129.110.242.8 on Fri, 20 Dec 2013 16:14:08 PM
All use subject to JSTOR Terms and Conditions
Respondent-Driven
Sampling
P
rS100
c
e
n
t
a
-
90-
80-
70-
o 50g
*
f
p
o
p
u
a
t
Startingpoint* one black subject
60-
40
302010
0
n
p
2
1
0
3
p
p
0
0
0
0
0
4
5
6
7
8
9
10
9
10
RecruitmentWave
Non-Hispanic White
-*-
P
e
r
100-
t
80-
e
60-
ce
n
p
o
p
u
a
t
n
- Other
A
Hispanic
90-
Startingpoint * one white subject
70g a
o
f
Non-Hispanic Black
50-
4030-
20100
1
2
3
4
5
6
RecruitmentWave
- Non-Hispanic White
Hispanic
B
---
7
8
Non-Hispanic Black
Other
in a Respondent-Driven
Figure 3 * Race and Ethnicity
ofRecruits
Sample,Beginningwitha
SingleBlackor WhiteSubject
This content downloaded from 129.110.242.8 on Fri, 20 Dec 2013 16:14:08 PM
All use subject to JSTOR Terms and Conditions
185
186
HECKATHORN
their
so theproportion
ofblacksdeclinesto approximate
stabilizes,
wave, untilrecruitment
occurrencein the sample. Similarly,
what
have
had
re3b
shows
would
happened
Figure
ofwhiteswould
cruitment
begunwithone or morewhiteseeds. The initialover-sampling
recruitment
declinefromwave to wave, untilan equilibrium
patternwas attained.Note
ofthe
withTheorem
recruitment
is independent
One,theultimate
that,consistent
pattern
the
That
ultimate
recruitof
initial
with
which
recruitment
the
is,
subject(s)
began.
ethnicity
of
whether
mentpattern
is identical,
theinitial
seed(s)areblackor white.Theweakness
- thattheultimate
onthechoiceofinitial
chain-referral
subjects
samples
sampledepends
doesnot
thisbiasarisesonlywhensampling
ofsampling.
isnotinherent
inthisform
Instead,
to
be
reached.
continue
waves
for
through
enough
equilibrium
iftwoconditions
aremet.First,
Thisconclusion
tosnowball
samples,
appliesgenerally
Thisrequires
morestages
thesnowball
mustcontinue
tothepointofequilibrium.
process
mustcorrespond
toa
therecruitment
thanoccurinmostsnowball
samples.Second,
process
first
is
severe
than
be
at
Markov
This
constraint
less
may
apparent.For
process.
regular
a
to
a
order
Markov
a
not
to
first-order
but
process
higher
sample
corresponding
example,
toequilibtoslowtheapproach
severeproblems.
Itseffect
doesnotproduce
maybe merely
waves.
additional
rium,so sampling
mayrequire
a meansforcomputing
forregular
Markovchainsprovides
Thelaw oflargenumbers
and
Snell
of
the
1960:72).Theequi(seeKemeny
analytically equilibrium
sampling groups
ofequations:
thefollowing
state(E) ofTableIa's system
is foundbysolving
librium
system
1 =Ew+ Eb+ Eh+ Eo
Ew = .698 Ew + .281 Eb + .235 Eh + .5 Eo
Eb = .19 Ew + .563 Eb + .059 Eh + .5 E0
Eh = .095 Ea + .063 Eb + .588 Eh
(1)
(2)
(3)
(4)
fornon-Hispanic
whereEw,Eb,Eh,andEo aretheequilibrium
whites,
sampleproportions
the
andothergroups,
blacks,
Equationoneexpresses
respectively.
Hispanics,
non-Hispanic
ofpopulation
mustsumto one. Thesubsedistributions
factthatthesumofproportional
sizes
oftheequilibrium
sizeas a function
eachgroup's
equilibrium
quentequations
express
ofeachgroup.Berecruitment
andeachgroup's
ofthegroup,
ofallmembers
proportional
withfourunknowns,
thereis a uniquesolucausethisis a system
offourlinearequations
tion,thatis,Ew= .490,Eb= .297,Eh = .158,andE0= .054.3
is
oftheabovetheorem
thepractical
indetermining
ofgreat
A matter
import
significance
wouldoccur
is reached.Thatthisconvergence
therateatwhichtheequilibrium
ultimately
Markov
chainsarecharacwereslow.Fortunately,
ifconvergence
wouldnotmatter,
regular
ofconvergence."
"a
fast
kind
as
describe
and
Snell
what
terized
very
(1960:72)
Kemeny
by
occursgeometrically.
thatprovesthatconvergence
is basedon a theorem
Thisconclusion
is that:
Theimplication
Thesubject
THEOREM
equilibrium
sample
approaches
bya respondent-driven
poolgenerated
TWO:
rate.
ata rapid
(i.e.,geometric)
occurs
Theimplication,
below,isthatconvergence
exceptina specialclassofcasesdiscussed
withtheabovedisis consistent
waves.Thisconclusion
ofrecruitment
withina handful
theactualsubject
between
themeanabsolute
cusseddata.Forexample,
compodiscrepancy
islessthan1%,showing
andthecomputed
sitionbyrace/ethnicity
composition
equilibrium
recruitment
basedongender,
hasbeenapproximated.
thatequilibrium
drugprefSimilarly,
location(see TablesIb to Id) also closelyapproximates
erence,and recruitment
equilibrium,
Italso
forsystems
withuptotengroups.
these
thatcomputes
from
theauthor
3. Aprogram
isavailable
equilibria
ofinitial
from
wavetowave,basedonanyspecification
as itchanges
thecomposition
ofthesample,
subjects,
computes
senda blankdisk,
Fora copy,
tobereached.
ofwavesrequired
forequilibrium
thenumber
andindicates
alongwitha
an IBMcompatible
andrequires
ismenudriven
totheauthor.Theprogram
diskette
mailer
self-addressed
stamped,
withVGAgraphics.
This content downloaded from 129.110.242.8 on Fri, 20 Dec 2013 16:14:08 PM
All use subject to JSTOR Terms and Conditions
Respondent-Driven
Sampling
as reflected
in meandiscrepancies
betweenthesampleand equilibrium
oflessthan3%. This
closeapproximation
to equilibrium
is theoretically
oftheinitial
expected,in thatirrespective
distribution
of seeds,approximating
to
within
a
of
within
tolerance
2%
equilibrium
any of
TableI's matrices
neverrequiresmorethansix recruitment
waves. A further
of
implication
thetheoremis thatthegreater
theextentto whichtheinitialsampleapproximates
theequilibriumsample,thequickerwillbe theapproachto equilibrium.Hence,choosinga diverse
samplespeedstheapproachto equilibrium.
Table II * Recruitment
byArea,Site2
Table Ha * Recruitment
byLocation,Site2
Location
ofRecruit
Town
2
Location
ofRecruiter
Town
3
Other
Total
Town 2
87.8%
1.2%
11.0%
100%
Town 3
0%
88.1%
11.9%
Other
22.2%
55.6%
22.2%
100%
(59)
TotalDistribution
ofRecruits
49.3%
(74)
38.7%
(58)
12.0%
(18)
(Site2's location)
23.7%
Equilibrium
63.3%
13.0%
(82)
100%
(9)
100%
(150)
100%
= 17.09% (r = .432)
Mean Discrepancy,
Distribution
ofRecruits
and Equilibrium
Table IIb * Recruitment
byLocation,withPartitioning
ofthe"Other"Group,Site2
Location
ofRecruit
Location
ofRecruiter
Town
2
Town
3
Town2 s
Affiliated
area(A2)
Town3 s
Affiliated
area(A3)
Total
Town 2
87.8%
1.2%
11.0%
0%
100%
Town 3
0%
88.1%
0%
11.9%
Town2's Affiliated
area (A2)
50%
0%
50%
0%
100%
(59)
Town3's Affiliated
area (A3)
0%
80%
0%
20%
38.4%
100%
(4)
100%
5
7.3%
5.3%
100%
(Site2's location)
TotalDistribution
ofRecruits
49.0%
(74)
(58)
(11)
(8)
0%
87.1%
0%
12.9%
Equilibrium
Mean Discrepancy,
ofRecruits
Distribution
and Equilibrium
= 28.15% (r = .330)
(82)
(150)
100%
Considernow thecase ofSite2, in whicha morecomplextheoretic
analysisis required
to make sense of the data. TableIIa showsrecruitment
as a function
of the recruiter
and
recruit's
townsoforiginforSite2. HereTownTwois Site2's location.Theseedsweredrawn
fromthissite,becausean aim of thestudywas to determine
how geographically
extensive
recruitment
wouldbecome. Henceno effort
was made to specify
thesamplinguniversein
This content downloaded from 129.110.242.8 on Fri, 20 Dec 2013 16:14:08 PM
All use subject to JSTOR Terms and Conditions
187
188
HECKATHORN
Table IIc
*
Recruitment
byLocation,Town2 and Vicinity
Location
ofRecruiter
Town2
LocationofRecruit
area (A2)
Affiliated
Town2
(Site2's location)
Affiliated
area (A2)
88.9%
11.1%
50%
50%
TotalDistribution
ofRecruits
87.1%
12.9%
(74)
(11)
81.8%
18.2%
Equilibrium
= 5.26%
ofRecruits
Mean Discrepancy,
Distribution
and Equilibrium
Table IId
*
Total
100%
(81)
100%
(4)
100%
(85)
100%
Recruitment
byLocation,Town3 and Vicinity
LocationofRecruit
area (A3)
Affiliated
Total
Location
ofRecruiter
Town3
Town3
88.1%
11.9%
80%
20%
100%
(59)
100%
12.5%
(8)
12.5%
100%
(64)
100%
Affiliated
area (A3)
87.5%
(56)
87.5%
Equilibrium
= 0%
ofRecruits
and Equilibrium
Distribution
Mean Discrepancy,
ofRecruits
TotalDistribution
(5)
advance. Unlike most studies,thisservedas a outcome variable. TownThreeis a largertown
about 15 miles away, and Otherrefersto a mix of locations,eitherin between or (in a few
cases) more distant. In-group recruitmentwithinthe two towns is very strong(i.e., 88%),
is almost nonexistent(i.e., 1.2% and 0%). Recruitmentbegan from
and cross-recruitment
Town 2 and laterextended to Town 3, a processthatshows the abilityof RDS to break out of
one isolated group to penetrateanother similarlyisolated group. How this occurred is depicted in Figure 1, which shows the largestrecruitmentnetworkfromthis site. During the
thirdwave, 5 1/2monthsinto the 12 monthstudyperiod,a Hispanic male subject fromTown
2 recruitedanother Hispanic male fromTown 3. From this tinyfootholdgrew a substantial
branch of the recruitmenttree (see the bottom-rightquadrant of Figure 1). That subject
directlyor indirectlyrecruited40 othersubjectsfromTown 3 and the surroundingarea. This
robustchain-referralsystemcan exploit existingnetwork
shows vividlyhow a sufficiently
connections to expand into isolated groups.
The relative isolation of the two towns derives fromtransportationdifficulties.Many
subjects (53.1%) reportthat they do not have access to a car, so hourly bus service is a
primarymeans of transportation.Most subjects travelingfromTowns 3 to 2 began with 15
mile bus ride, and then walked an additional mile to the ECHO Project storefront.Though
theywere compensated fortheirbus fares,thisreflectsa remarkablecommitmentto participation in the study. That subjectscontinuedto make thistrekduringthe Connecticutwinter
is especially remarkable.
There is a sharp contrastbetween recruitmentby area at the second and the firstsites.
Whereas equilibrium was closely approximatedby the sample distributionin the firstsite
This content downloaded from 129.110.242.8 on Fri, 20 Dec 2013 16:14:08 PM
All use subject to JSTOR Terms and Conditions
Respondent-Driven
Sampling
is largeat thesecondsite(D=17.09%). This
(meandiscrepancy,
D=2.86%), thedivergence
reflects
an over-sampling
in Site2 ofTown2 (49.3% versus23.7%), and an under-sampling
in Town3 (38.7% versus63.3%). It mightseemthatthisdiscrepancy
resultsmerelybecause
the systemhas not as yetreachedequilibrium.Thismightappearto explainmuchof the
fromequilibrium,
becauseat presentin thison-goingstudy,thefirstsubjectfrom
departure
Town3 appearedin thesample5 1/2monthsintothe 12 monthstudyperiod,and thetenth
wave has onlyjust begun. Based on thisincomplete
movementtowardsequilibrium,
one
could thenweightthesample. Forexample,giventhatTown2 is over-sampled
25.6%
by
(i.e., 49.3%-23.7%), and Town 3 is under-sampled
by 24.6% (i.e., 63.3%-38.7%), these
townscan be assignedweightsof .481 (i.e., 23.7%/49.3%) and 1.64 (i.e., 63.3%/38.7%),
when analyzingethnicity,
The "Other"area had
respectively
gender,and drugpreference.
so it is assigneda virtually
neutralrateof 1.08
alreadycloselyapproximated
equilibrium,
but
a
closer
examination
ofthe
(i.e., 13%/12%). Suchan accountis theoretically
plausible,
data showsthatit does notfitthisparticular
case.
As represented
in TableHa, theOthergroupservesas a bridgebetweenTowns2 and 3,
networkdid
recruiting
equally(22%) frombothtowns.Whythenin Figure1's recruitment
none ofthe 10 subjectsfromthe"other"area serveas bridgesbetweenthetwo towns,i.e.,
Town3 recruitments?
Giventhemodestnumberofcases,
whywerethereno Town2/Other/
thiscould be a coincidence,
but the same occursin Site 2's otherrecruitment
networks.
theOthercategory
However,on closerinspection,
into
two
disaggregates
mutuallyexclusive
groups.First,thereare personswho liveoutsideofTown2 and have connections
onlyto
thattownand to theirown locality.Thismaybe termedthe "affiliated
area" of Town 2.
area. Becauseoftheconsiderable
Second,Town3 also has an affiliated
distancebetweenthe
areas withnetworkconnectionsto bothtowns.
towns,no subjectsresidein intermediate
the appearancethatthe Othercategorycan serveas a bridgebetweenthe two
Therefore,
townsis an illusioncreatedby inappropriate
As thiscase illustrates,
in RDS
categorization.
of subjectsto categories
is by no meansneutral.It musttakeinto
analysistheassignment
accountthestructure
ofnetworkconnections.Specifically,
ifa category
includesindividuals
withquitedisparatenetworkstructures,
differentiated
theyshouldbe further
by category.
TableIIb showsthe matrixproducedwhen the Othercategoryin TableIIa is divided
basedon whethertheyare affiliated
nonresidents
ofTowns2 or 3. As is apparentbyinspection,thisdoes notbringthesystemcloserto equilibrium.Indeed,thesystemis farther
from
thanpreviously
has become
equilibrium
(D = 28.15%), and thecontentof theequilibrium
becauserecruitment
is projectedto die out whollyin Town2 and itsaffiliated
strange,
area.
Thereasonforthisproblemis apparentfrominspection
ofFigure2b,whichdepictsTableIIb's
recruitment
datain networkform.Notethatthenetworkis nonergodic,
in thateverypoint
in thenetworkcannotbe reacheddirectly
or indirectly
fromeveryotherpoint. Thisoccurs
because nodesforTown 3 and itsaffiliated
area serveas absorbingstates.Thisis whythe
forthissystem(see TableIIb) showsrecruitment
as ceasingin Town2 and its
equilibrium
area,and continuing
onlyin Town3 and itsarea.
The solutionto thisproblemis straightforward.
Becausethenetwork
is nonergodic,
this
datafromthissitecan bestbe
systemdoesnotgeneratea regularMarkovprocess.Therefore,
thesampleintotwo subsamples,
one forTown2 and itsaffiliates,
analyzedby partitioning
and one forTown3 and affiliates.
TablesIIc and IId illustrate
thisprocessforTableIIb's data.
Bothsystems
reachsimilarequilibria,
withabout87% recruitment
withinthetown,and the
remainder
fromaffiliated
is closelyapproximated
areas;equilibrium
outlying
in bothsystems.
In sum,whereasSite2 initially
appearedto containa singlerespondent-driven
samplewith
anomalous properties,it turnsout on closer examination to consistof two distinctsamples.
This possibilityshould be suspected whenever, despite a substantialsample size, departures
fromequilibriumare great.
This content downloaded from 129.110.242.8 on Fri, 20 Dec 2013 16:14:08 PM
All use subject to JSTOR Terms and Conditions
189
190
HECKATHORN
Assessing Bias in Respondent-DrivenSamples
An idealsamplingprocedureyieldsnotonlya sampleindependent
ofitsstarting
point,
but also an unbiasedsampleof theunderlying
population,witha knowndegreeof consisintervals
canbe computed.However,becauseoftheabsenceof
tencyfromwhichconfidence
a moremodestgoal has been to devise
probability
samplesin studiesofhiddenpopulations,
meansfordrawingsamplesthatproduce"a goodcross-section
of thetargetpopulation,"or
in thetargetpopulation"(Spreenand Zwaagstra1994:478).
"thecoverageof heterogeneity
Such samplesare termed"representative."
An assessment
ofRDS requiresexaminingboth
theextentto whichitfulfills
thismodestgoal,and whetheritmightprovidea basisforfulfilidealofprobability
lingthemorestringent
againtothedatafromSite1,
sampling.Returning
membersof the majorethnicgroupstendto recruitdifferentially
fromwithintheirown
ranks(see TableIa). Thisreflects
thesocialstructure
in whichthesesubjectsare embedded;
intra-ethnic
connections
are morecommonthaninter-ethnic
connections.A formof networkanalysis,"biasednetworktheory,"
has been used to analyzea varietyofrelationships,
and friendships
(Fararoand Skvoretz1984; Rapoport1977). The theincludingmarriages
idea
a
essential
is
that
structured
socialsystem'ssociallinkageswillbe non-random:
ory's
willbe moreprobablethanothers,where"biases"referto anydepartures
somerelationships
froma fullyrandompatternofconnection.Biasesmayconsistofeithera tendencytoward
as typically
occursin friendships;
or a tendency
towardout-group
affiliain-groupaffiliation,
tion,as in exogamousmarriagesystems.
In an unstructured
the prevalenceof each group
group,recruitment
merelyreflects
ofgroupidentity,
withinthepopulation.Everyrecruiter,
recruits
thesame mix
irrespective
of subjects.Frominspectionof TableI, thatobviouslydoes not fitthe data. In a system
wheregroupaffiliation
affects
selection,the membersrecruited
by each individualreflect
boththerecruiter's
ofdifferent
withinthepopubiases,and theprevalence
typesofmembers
lation. Selectionin such a systemcan be modeledas a processwithtwo conditionalsteps
biasevent
fora memberofgroupX either
(Fararoand Skvoretz1984:233). First,an inbreeding
or
fails
to
occur
with
bias event
occurs,withprobability
1-Ix.
probability Iftheinbreeding
Ix,
withcertainty.
eventdoes
occurs,selectionoccursfromthein-group
Second,iftheinbreeding
not occur,any individualis selectedrandomly
fromthesystem'spopulationirrespective
of
theprobability
ofselecting
a memberofanygroupis equal to
Therefore,
groupmembership.
theproportion
of thatgroup'smembersin thesystem.In thisformulation,
selectionof an
eventdid notoccur,whereasselectionof an
out-groupmemberprovesthattheinbreeding
event,or byrandomselection
in-groupmembercan occureitherbecauseoftheinbreeding
fromthesystem's
themagnitude
ofinbreeding
a
reflects
population.As thusconceptualized,
mixofculturaland situational
fromculturalemphasison in-group
affiliation
factors,
ranging
to ease of transportation
betweengeographically
distinct
groups.It can also be affected
by
offemaleIDUs.
suchas thoseused to increaserecruitment
steeringincentives,
of
Thismodelinvolvesa hostofsimplifying
assumptions.Forexample,theoccurrence
is treatedas a dichotomous
event,whereasgradations
appearmoreempirically
inbreeding
are not necessarily
undesirable.Formal
plausible.However,such simplifying
assumptions
heuristicmodelsshouldneverbe made morecomplexthanrequiredto capturethefundamentalsof theprocesstheyare intendedto represent,
lesttheylose tractability
and failto
ofthemodelrequiresan assessment
ofitsrobustyieldclearconclusions.Ideally,validation
ness. First,simplifying
assumptionsare replacedby more realisticassumptions.Then,
whetherthemodel'sfundamental
conclusions
remainvalidis assessed.Theaimis to identify
the simplestrobustmodel. However,such an assessmentexceedsthe scope of thispaper.
Whatcan be said at presentis thatthe Fararo-Skvoretz
modelcapturesthe two essential
of inbreeding,
and thatthe
featuresof inbreeding,
i.e., thatgroupsvaryin theirstrength
This content downloaded from 129.110.242.8 on Fri, 20 Dec 2013 16:14:08 PM
All use subject to JSTOR Terms and Conditions
Respondent-Driven
Sampling
selectionsproducethe structure
of in-groupand out-groupaffiliations
withinthe
resulting
the
is
Hence
model
sufficient
for
heuristic
system.
purposes.
Considerthecase ofa systemcomposedofN groups,A, B, C,
N. Theprobability
ofa
....
memberofa groupX selecting
fromthein-group,
is
the
sum
of
the
of
selecprobability
Sx,
tionbeingcontrolled
an eventwithprobability
that
I,, and the probability
by inbreeding,
did notoccur(1 - Ix),weightedbytheproportion
ofmembersofgroupX in the
inbreeding
population,Px,i.e.,
Sxx
=
Ix
+ (1 Ix) Px
(5)
theprobability
ofa memberofgroupX selecting
a memberofanyoutBy similarprinciples,
of selectinga memberof group Y is the
group,Y, can be computed.The probability
that inbreeding
didnotoccur(1-Ix),weightedby theproportion
ofmembersof
probability
groupY in thepopulation,Py,i.e.,
Sxy= (1 - Ix) Py
(6)
1 = Ea + Eb
Ea = Saa Ea + SbaEb
(7)
(8)
Withthissetofequations,themodelspecifies
therelationship
betweentheunderlying
populationproportion
(P), and selectionprobabilities
(S) of thesortdepictedin TableIa.
Based on the above pair of equations,equilibrium
samplesize can be expressedas a
ofbothpopulationand inbreeding
function
terms.To simplify
theanalysis,considerfirst
the
case. Consistent
withthelaw oflargenumbersfortwo-subgroup
regularMarkovchains,the
ofmembersofeach groupis foundbysolvinga systemoftwolinear
equilibrium
proportion
equations:
whichyields,
E, =
Sba
(9)
1-Saa+Sba
Thisinturncanbe expanded
from
5 and6 toexpress
theequilibbysubstitution
equations
riumsamplein terms
oftheinbreeding
terms
andpopulation:
Ea =
Pa(1-Ib)
(10)
1-(Ia+Pa
(1-Ia))+Pa(1-Ib)
Thissimplifies
to:
Ea =
Pa(Ib-l)
(11)
Palb-Paa+Ia-1
Thisexpressionprovidesa basisforderiving
conclusions,
groundedin bothbiased-network
the conditionsunderwhicha RDS willproducean
theoryand Markovchains,regarding
unbiasedsample,thatis, a samplein whichthe proportion
of each groupin the sample
in thepopulation.
equals thatgroup'sproportion
Giventhatinbreeding
affects
theequilibrium
recruitment,
groupcomposition
(E) need
not correspondto the trueunderlying
populationdistribution
(P). The extentto which
membersofanygivengroupwillbe sampleddependson threefactors:
thesizeofthegroup,
itstendency
towardinbreeding,
and thestrength
ofinbreeding
in othergroups.Partialdifferin the sampleincreases,all else equal, as the
entiation4shows thata group'sproportion
4. Partialdifferentiation
a meansfordetermining
whether
therelationship
betweentwotermsis consistprovides
ormixed.Thefirst
entlypositive,
consistently
derivative
negative,
ofone termwithrespect
stepis to computethefirst
to theother.The direction
oftherelationship
is thenfoundbydetermining
whether,
on parameter
givenconstraints
is alwayspositive,
values,thederivative
or mixed.Forexample,to determine
the relationship
alwaysnegative,
betweenequilibrium
size
fora group,and thatgroup'sinbreeding
sample
bias,differentiate
to Ia, i.e.,
Ea withrespect
dEIdla =
a
P(Ib)
2
(PIb+PaI+I,- 1
This content downloaded from 129.110.242.8 on Fri, 20 Dec 2013 16:14:08 PM
All use subject to JSTOR Terms and Conditions
191
192
HECKATHORN
a group'sproportion
deincreases.Similarly,
group'ssize increasesand as its inbreeding
a
small
other
creaseswithincreasesin inbreeding
Thus,
potentially
might
by
group
groups.
appear large in the respondent-drivensample if it has stronginbreedingand if members of
other groups have weak inbreeding. Conversely,a large group might appear small in the
respondent-drivensample ifits inbreedingis weak, and ifothergroups have stronginbreed-
of the RDS as a means fordrawingunbiasedsamplesdependson
ing. The effectiveness
are plausible.
whetherthesetwopossibilities
The conditionsunderwhichRDS producesunbiasedsamplescan be deducedthrough
underwhichEa
theconditions
analysisofequation11 above. Thiscanbe donebyidentifying
11
substitute
for
in
above:
Ea equation
Pa
equals Pa. First,
Pa =
Pa(Ib-1)
PaIb-PaIa+Ia-1
(12)
then, solve forIa,
Ia =
Ib(Pa-1)
(13)
Pa-1
which can be simplifiedas tollows:
(14)
Ia = Ib
When describedin termsof Figure 4, this conclusion means that when inbreedingis equal,
effects
of
is exactlyoffset
effects
ofeachgroup'sinbreeding
the inflationary
bythedeflationary
termscancelone another,leavingsample
Thusthetwoinbreeding
othergroups'inbreeding.
size determined
exclusively
bypopulationsize. Thoughthisproofappliesto the two-group
in larger
case it extendsto the three-group
case, and has been confirmed
by simulations
statedas follows:
systems.Thistheoremmaybe formally
THREE:A respondent-driven
THEOREM
sampledrawsan unbiasedsampleifall groups'inbreeding
termsare equal, i.e.,foranygroupX, E. = P", iffIx = IyforanyothergroupY.
A limitationof the theoremis noteworthy.When inbreedingtermsare verylarge,reflecting
mutual isolation of the system'sgroups, the approach to equilibriumslows, e.g., when the
termsexceed .99, scores of recruitmentwaves may be required to approximateequilibrium.
Therefore,given the practicallimitationsthat constrainthe number of recruitmentwaves,
equilibrium will be reached only when inbreedingis not extreme. The implicationis that
when the boundaries separatinggroupsare virtuallyimpassible,RDS should be used to draw
termsproveto
suchgroups,and notacross
them,evenshouldinbreeding
samplesfromwithin
be equal.
condition.Hence,thequestionofhow
termsis a quitestringent
Equalityofinbreeding
termsare relatedto one anotheris crucialto an assessmentof RDS's abilityto
inbreeding
the relabiasednetworktheoryprovidesno guidanceregarding
avoidbias. Unfortunately,
terms.However,othersociologicaltheoriesprovidereasonsto
tionshipsamonginbreeding
fromSimmel
related.A venerableprinciple
willbe at leastpositively
expectthatinbreeding
has
that
Balkanization
enhance
common
enemies
is
that
implying
groupsolidarity,
(1955),
thatinduces
itsboundaries,
an epidemicquality,becausewhenanysinglegroupstrengthens
thatcreatesa sense
othergroupsto do thesame. Morerecentresearchshowsthatanything
and
Horowitz
boundaries
tostrengthen
ofcommonfateis sufficient
1969). An
(Rabbie
group
determined
are
for
association
occurs
when
this
of
group
identity.
by
opportunities
example
a groupwill
theirdegreeof inbreeding,
Thus,in a systemwhereothergroupsstrengthen
is positive,
ofthisexpression
becauseP, is positivebyassumption,
The numerator
(1 - Ib) and (1 - Pa) are positive
is positive
thedenominator
numbers
arepositive.Similarly,
ofthreepositive
becauseIa < 1,andPb< 1,andtheproduct
areboth
and denominator
becauseitsnumerator
becausethesquareofanynumberis positive.Thus,dE,/dIis positive
positive.
This content downloaded from 129.110.242.8 on Fri, 20 Dec 2013 16:14:08 PM
All use subject to JSTOR Terms and Conditions
Respondent-Driven
Sampling
itsown inbreeding.Conversely,
reactby increasing
in a systemwhereinbreeding
is weak,
thegroupwilltendto avoidinbreeding.The cumulativeeffect
of theseprocessescreatesa
terms.If thisline oftheorizing
is correct,
the equal-inpositiverelationamonginbreeding
in manysocial systems.Of course,thisspeculation
breedingassumptionis approximated
requiresempiricalconfirmation.
It mightseemthatthebestway ofempirically
assessingthebias of a RDS wouldbe to
it
with
the
from
which
it
was
drawn. Unfortunately,
in the current
compare
population
thisis notpossiblegiventhehiddennatureofthatpopulation.One approachmight
research,
be to applythemethodto a non-hidden
Such a test
populationwithknowncharacteristics.
wouldbe valuable,butlessthandefinitive.
Problems
ofmaskingdo notarisein non-hidden
of theresultsto hiddenpopulationswould remainunpopulations,so the generalizability
clear. A secondapproachapplicableto somehiddenpopulations
mightbe to use veryaggressive measuresto producea saturation
sample,so thetotalsamplecouldthenbe compared
withpartialsamplespreviously
derivedfroma standardRDS. Thismoreaggressive
recruitmentmechanism
couldinvolvean additional
in whichrecruiters
are rewardednot
incentive,
butalso forthoseoftheirrecruits.Thiswouldcreatean incenonlyfortheirown activities,
and theirrecruits
tiveforrecruiters
to cooperatebypoolingtheirnetworksand socialinfluthatsuchan approachmightworkis an innovation
ence. One indication
thatspontaneously
forsome
emergedat bothsites.Themaledominated
druginjectionscenescreateddifficulties
femalerecruiters,
and withoutanyprompting
fromstaff,
thesamesolutionemergedat both
sites- somewomenrecruited
in pairs.The resultswerehighlyeffective,
and suggestthat
the robustness
ofrecruitment
couldbe increasedifall subjectshave incentives
to recruitin
channeldruginjecpairs. Thisshowsthecapacityofan RDS to harnessand constructively
tors'creativity
and energy.
Table III
*
Recruitment
in Towns1, 2, and 3, and Comparisons
withPopulation
byEthnicity
Distributions*
ofRace/Ethnicity
in SampleofRecruits
Line(P),
Comparison
(S), Population
LivingBelowPoverty
and TotalPopulation(T) byTown
Race/Ethnicity
(n)
S
Non-Hispanic Black
Non-Hispanic White
Hispanic
Other
Total
27.6%
54.3%
14.7%
3.4%
100%
(93)
Mean Discrepancy
(D)
TownI
P**
T
23.7%
16%
45.2%
69%
26.7% 11.9%
4.4%
3.1%
100%
100%
(4,310) (28,540)
S
Town2
P**
T
S
Town3
P**
T
11.4% 27.6% 10.9%
55.7% 61.3% 84.2%
31.4% 9.6%
3%
1.4% 1.5%
1.9%
100% 100%
100%
(70) (2,993) (42,762)
12.1%
9%
3,6%
56.9%
52% 82.8%
27.6%
39% 12.9%
3.4%
0%
.7%
100% 100%
100%
(58) (4,342) (59,479)
Mean Discrepancy
Mean Discrepancy
Mean Discrepancy
Poor Population,D =
6.5% (r = .926)
Total Population D =
7.4% (r = .957)
Poor Population, D =
10.92% (r = .804)
Total Population, D =
14.5% (r = .946)
Poor Population, D =
5.7% (r = .950)
Total Population D =
13.0% (r = .943)
Between
Recruits
and: Between
Recruits
and: Between
Recruits
and:
Notes:
* Source,1990 Census
- family
ofpoverty
offourearningbelow$12,674for1989
**Definition
Availabledatapermitonlyroughcomparisons
betweenthesampledistributions
and the
hiddenpopulations
fromwhichtheyaredrawn.Thisinvolvescomparing
samplesofrecruits
This content downloaded from 129.110.242.8 on Fri, 20 Dec 2013 16:14:08 PM
All use subject to JSTOR Terms and Conditions
193
194
HECKATHORN
to the demographics
of theircommunities,
on thepresumption
thatthehiddenpopulation
ofracialand
willto someextentmirror
thelargercommunity.
TableIII showscomparisons
of
ethniccomposition
and theracialand ethnicdistribution
betweenthesampleofrecruits
of thepopulationliving
the totalpopulation,and betweenthesampleand thedistribution
is potentially
becauseinjectors
belowthepoverty
line. The lattercomparison
morerelevant,
at bothsites,
comedisproportionately
fromtheranksofthepoor. Forexample,in aggregate
morethanthreequarters
ofsubjectsreported
linelevelsofincome.Thisanalysis
sub-poverty
focusesonlyon subjectslivingwithinthesites'threeprincipaltowns. Subjectsscatteredin
and moredistantcommunities
wereexcluded,becausetoo fewsubjectscame
surrounding
fromany singlecommunity
to providea meaningful
samplesize.
of
several
it
is
to
make
too
much
thesetentative
not
important
comparisons,
Though
are notablefrominspection
ofTableIII. First,as measuredby mean discrepancies
findings
betweenthesampleand population,each samplecorresponds
morecloselyto thedistributionof personslivingbelow the povertyline thanto the totalpopulation.Thisno doubt
reflects
differential
recruitment
ofthepoorintodruginjection.Second,a majordiscrepancy
ofHispanicsin thesample(31.4%) and in boththe
existsin Town2 betweentheproportion
totalpopulation(3%), and thepoverty
a substantial
population(9.6%). Thismayconstitute
ofHispanicsoffrom10.5 to 3.2 times.Thisdiscrepancy
over-sampling
mayresultfromany
numberof sources:changesin the Town'sethniccomposition
since the 1990 census,in
whichcase sampling
is correct;
of
attributes
the
town's
special
Hispanicpopulationthatintroduce someunknownformofbias intothesample;or an artifact
ofincomplete
sampling.In
minorities.Respondentany case, thisformof samplingdoes not appearto undersample
drivensamplingcontinuesat thissite,and ethnographic
fieldinvestigation
has begun,so
thesefurther
resolve
the
issue.
investigations
may
betweenthesampleand poorpopulationsis
Finally,in Towns1 and 3 thecomparison
Even in
close,i.e., mean discrepancies
relatively
equal only6.5% and 5.7% respectively.
Town2, wheretheapparentover-sampling
ofHispanicsconstitutes
an anomaly,thecomparison becomesrelatively
close ifone excludesHispanics.Therefore,
in thesecases RDS apto
succeed
in
that
a
broad
cross
of the underlying
include
section
pears
producingsamples
in Spreen'sand Zwaagstra'ssense.
population,and henceis "representative"
SensitivityAnalysis
Giventheimportance
oftheequal inbreeding
conditionforRDS's abilityto drawunbiased samples,and giventhathoweverstrongly
termsmayproveto be,
relatedtheinbreeding
never
be
can
to
coincide
it
is
useful
to
a sensitivity
they
expected
perform
analysisto
exactly,
is violated.Figure4a redetermine
whathappenswhentheequal inbreeding
requirement
in systems
portstheresultsofsimulations
composedoffourgroups.5Theverticalaxis represents the mean absolute discrepancybetween population and equilibriumsample
are identical,to a
distributions.
Thistermcan varyfromzero,when the two distributions
5. The simulation
in Figure4a involveda sevenstepprocedure:
experiments
reported
issetrandomly.
distribution
Thiswasdonebychoosing
fourrandomnumbers
between0 and 1,and
(1) Thepopulation
eachbythesumofthefournumbers
toyieldfourpositive
numbers
thatsumtoone. Thesenumbers
represent
dividing
thesize ofeach ofthefourgroups,i.e, Pa, Pb,Pc,and Pd.
of
axiscorresponds
toa range
withinwhichinbreeding
mustfall.Themidpoint
(2) Eachpointon Figure6a's horizontal
thisrangeis chosenrandomly,
withtherequirement
withinthesetofvaluesconsistent
thatinbreeding
cannotexceed
oftheinterval
mustlie betweenpoints.3 and
1,orbe lessthan0. Forexample,iftherangechosenis .6,themidpoint
.7.
of
fromwithintheabove-chosen
terms(Ia, Ib,Ic,andId)areselectedrandomly
(3) Inbreeding
range,e.g.,ifthemidpoint
are chosenbetween.1 and .7.
therangeis .4, and thelengthis .6, fourrandomnumbers
are computedusingequations5 and 6.
(4) Selectionprobabilities
This content downloaded from 129.110.242.8 on Fri, 20 Dec 2013 16:14:08 PM
All use subject to JSTOR Terms and Conditions
Respondent-Driven
Sampling
maximumvalue of .5 in a systemwithfourgroups(i.e., fora systemof n = 4 groups,the
maximumis 2/n= 2/4= .5). Thehorizontal
axisrepresents
therangewithinwhichinbreedterms
fall.
This
lower
limit
is
that
termsare equal.
zero,
ing
range's
indicating all inbreeding
The range'supperlimitis 1,indicating
thatinbreeding
mayvarywithinthefulltheoretically
allowablerangeof0 to 1,and henceare maximally
diverse.To reflect
someoftheresource
limitations
thatnecessarily
attendactualresearch,
in thesimulations
recruitment
beganwith
a homogeneoussetofinitialsubjects(theworst-case
and continuedfor10 waves.
possibility),
Each simulationwas repeated1,000timesto determine
themean absolutediscrepancy
betweensampleand population(see the bold line in Figure4a). The area boundedby the
dottedlinesrepresents
theconfidence
forthediscrepancy.
interval
Whentherangeis zero,
and hence the inbreeding
termsare all equal, discrepancies
are minimized(i.e., 3.3% +
1.5%). Thereis not a perfectassociationbetweensampleand populationbecause of the
limitation
in thenumberofwaves. It is also apparentbyinspection
ofFigure4a thatas the
differences
increases.However,even when
amonginbreeding
grows,samplingdiscrepancy
is maximally
diverse,witha rangeequal to one, a positiveassociationbetween
inbreeding
and
in a samplingdiscrepancy
sample
populationcontinuesto exist,as reflected
(i.e., 12.8%
? 7.7%) thatis substantially
lessthanthetheoretically
maximum.
This
possible
corresponds
to a correlation
betweensampleand populationdistributions
of r = .57 ? .114. Thus,the
resultsshowthat,evenin theworst-case
scenariowhereinbreeding
is mostdisparate,
a RDS
nonethelessproducesa samplethatcontainsa cross-section
oftheunderlying
population.
A morecompletecharacterization
of the effects
of violationsof the equal inbreeding
concernstheeffect
ofthemagnitude
oftheterms.Fromtheoremthreewe know
assumption
thatifinbreeding
is equal and thenumberofwavesis sufficient
to reachequilibrium,
samplingis unbiased. However,a questionremains:when inbreedingis unequal, does the
ofinbreeding
affect
strength
sampling?Figure4b depictstheresultsofa seriesofsimulations
thatanswerthisquestion.Asbefore,
thevertical
axisrepresents
thediscrepancy
betweenthe
axis now represents
themagnitudeof
populationand thesample. However,thehorizontal
therangewithinwhichtheinbreeding
inbreeding.Thisis operationalized
byfirst
specifying
termsfall. Thiswas set at a rangeof .5, themean theoretically
possiblevalue. Inbreeding
termswithinthisrangewerethenallowedtovaryfroma minimum
value of.25 (a
midpoint
smallermidpoint
valuewouldnotbe possiblebecausethensomeinbreeding
wouldbe negative), and a maximummidpoint
valueof.75 (a largermidpoint
valuewouldnotbe possible
becausethensomeinbreeding
wouldexceed 1). As before,theboldlinerepresents
thedisand thedottedlinerepresents
theconfidence
increpancybetweensampleand population,
terval.RDS yieldsthemostbiasedsampleswheninbreeding
is greatest.However,even in
thisworstcase scenario,theassociation
betweensampleandpopulationremainspositiveand
as reflected
in a meandiscrepancy
of12.1% ? 8%. Thiscorresponds
to a correlasubstantial,
tionbetweensampleand populationdistributions
of r = .55. Wheninbreeding
is weaker
lessbiasedsamples(i.e.,meandiscrepancies
are
(e.g.,lessthan1/3),RDS yieldssubstantially
less than4%).
In sum,thesensitivity
fromtheequal inbreeding
analysisshowsthatdepartures
assumptionmustbe substantial
beforetheyintroduce
substantial
biasesintothesample.Theformof
occurswhen inbreeding
is verystrong,and yet
departurethathas the mostadverseeffect
even undertheseconditionsthe populationsand sampledistributions
remainassociated.
wereestimated
(5) Equilibrium
sampledistributions
described
forFigure3. To reflect
theresource
usingtheprocedure
limitations
thatexistin moststudies,recruitment
was assumedto continuethrough
onlytenwaves,and theinitial
seedswereassumedto comefromthesamegroup.
betweentheestimated
(6) Themeanabsolutedifference
distribution
ofthe10thwave
equilibrium
(i.e.,thecomposition
ofrecruits),
and thepopulation
distribution
was computed.
(7) Steps1 to 6 abovewererepeated1,000times,thenthemeanand standard
deviation
ofthe meandifferences
were
computed.
This content downloaded from 129.110.242.8 on Fri, 20 Dec 2013 16:14:08 PM
All use subject to JSTOR Terms and Conditions
195
196
.
HECKATHORN
M
e
0.5.
n
i 0.3
D
s
C
r 0.2e
p
y
ConfidenceInterval
0
0
0.2
0.4
0.6
InbreedingBias Range
0.8
1
A
M
e
0 .5 ...........
............
...................................................................
n
D
0.3
.
S
C
r
0.2
.
e
pMeann
ConfidenceInterval
0.1
yn...-----------
- --
0.25
0.35
Constraint Interval * 0.5
0.45
0.55
InbreedingBias Midpoint
0.65
0.75
B
as a FunctionofConstraints
on Inbreeding
Biases
Figure 4 * Population/Sample
Discrepancy
This content downloaded from 129.110.242.8 on Fri, 20 Dec 2013 16:14:08 PM
All use subject to JSTOR Terms and Conditions
Respondent-Driven
Sampling
is violated,RDS canbe expectedto proHence,evenwhentheequal-inbreeding
assumption
duce good cross-sections
ofthetargetpopulation.
Limitations
oftheaboveanalysisshouldbe noted. RecallSpreenand Zwaagstra'sargumentthatsamplesofhiddenpopulationscannotexpectto do morethanofferbroadcrosssectionsofthetargetpopulation.The aboveanalysissuggests
thatRDS fulfills
thisgoal,and
thatmoremaybe possible.As shownabove,biasesin the firststepofrecruitment
tendto
decreaseas thesuccessivewavesare recruited.
undera broadarrayofcircumFurthermore,
stances(i.e.,lessthanextremeand positively
associatedinbreeding
levels),RDS can be theoifmeans
retically
expectedtoproducesampleswithonlya modestdegreeofbias. Potentially,
couldbe foundforindependently
be
could
levels,
assessinginbreeding
samples
weightedto
eliminatethesebiases. In addition,steering
whichprovidea meansforaltering
incentives,
or to make
levels,couldbe used to eitherover-sample
inbreeding
groupsofspecialinterest,
levels morenearlyequal and therebyreducethe need to weightthe samples.
inbreeding
Anothermajorremaining
taskis theinvestigation
ofthesesamples'consistency
and thereofestimators.
then
will
it
be
to
standard
errorsand
sultingvariability
Only
possible compute
confidence
intervals
forpopulationestimates,
and thereby
makeRDS intoa fullydefensible
statistical
samplingprocedure.
Conclusion
Thispaperpresentedempiricalresultsfroma new approachto accessingand sampling
hiddenpopulations,designedto reduceseveraldeficiencies
traditional
formsof
afflicting
chain-referral
itis usuallyassumedthatinferences
aboutindividuals
mustrely
samples.First,
foundby tracingchainsare never
mainlyon theinitialsample,sinceadditionalindividuals
foundrandomly
or even withknownbiases. However,ifthesamplingprocessis allowedto
continuethrough
itscomposition
willbe independent
of
enoughwavesto reachequilibrium,
theinitialsubjects.A secondproblemis thatchain-referral
tend
to
be
biased
toward
samples
morecooperativesubjects.Thisproblemcan be reducedby RDS's dual incentivesystem.
Individuals
who resistresearchers'
appealsmaynonetheless
yieldto appealsfromtheirpeers.
A third,relatedproblemis thatchain-referral
samplesare biasedby "masking"(protecting
friends
incentives
bynotreferring
them).Thisproblemcan be reducedbecauserecruitment
weakenthe reluctanceto approachreclusivepeers. Finally,chain-referral
samplesmaybe
biasedtowardsubjectswithlargepersonalnetworks.Thereare severalwaysin whichthis
problemcan be addressed:weighting
samplesbased on networksizes,saturating
targeted
to increaserecruitment
areas,or usingsteeringincentives
of subjectswithtraitsassociated
withsmallpersonalnetworks.
RDS need not be employedalone. It also can be appliedin combination
withother
methods.Considerfirstnetworksampling(Sudmanet al 1988). Thisformof samplingincreasesthe amountof information
obtainedfromeach respondent
by includingquestions
abouttheirpersonalnetworks.Ideally,respondents
are selectedby a procedurethatresults
in eachpopulationmemberhavinga knownprobability
ofselection,
butthisis notpossibleif
thepopulationis hidden.Yettheimplication
oftheorems
one and twois thatthechoiceof
the initialsampleneed not introducebias if a suitableprocedureis followed.Thatis, the
samplingshouldfirst
ofrecruitproceedthrough
multiplewaves. Thispermits
computation
mentnetworks
as depictedin TableI. This,together
withknowledgeofthecomposition
of
the initialset of seeds,makesit possibleto computehow manywaves mustbe completed
beforethe sample approximatesthe equilibriumdistribution,
and therebybecomes independentofitsstarting
point.Usuallythiswillbe no morethanthreeto fivewaves. Thus,
theRDS analysisprovidesa principled
basisfordecidinghowmanywavesmustbe completed
beforewhateverbiaseswerepresentin theselectionofseedsis overcome.The subjectsfor
This content downloaded from 129.110.242.8 on Fri, 20 Dec 2013 16:14:08 PM
All use subject to JSTOR Terms and Conditions
197
198
HECKATHORN
thenetworksamplearethendrawnfromthatorsubsequentwaves. In essence,respondentdrivensamplingservesas the initialstageof thesamplingprocedure,
whichleads froman
initialset of subjectswithan unknownbias,to a subsequentset of subjectsthatare independentof thatbias.
WhendealRDS maybe able to playa similarinitialrolein ethnographic
investigation.
with
access
to
a
diverse
set
of
inforhidden
ing
populations,
gaining
ethnographic
suitably
a lengthyand uncertainprocess.RDS mayprovidea meansbothfor
mantsis frequently
thatdifferent
sectorsofthepopulationareadequately
speedingthisprocess,and forensuring
withnetwork
Thus,as in thecase ofRDS's combination
represented
amongtheinformants.
and then
the
waves
reach
should
be
number
of
to
equilibrium
computed,
sampling,
required
informants
are drawnfromthatand subsequentwaves, untilthe desired
ethnographic
is attained.
numberof informants
References
and Paul W. Holland
Bishop,YvonneM. M., StephenE. Fienberg,
Mass: MIT Press.
1975 DiscreteMultivariate
Analysis:
Theoryand Practice.Cambridge,
Broadhead,RobertS., and DouglasD. Heckathorn
and new
1994 "AIDSprevention
outreach
problems
amonginjection
drugusers:Agency
41:473-495.
approaches."SocialProblems
Jean-PaulC. Grund,L. SynnStern,and DeniseL.
Broadhead,RobertS., DouglasD. Heckathorn,
Anthony
AIDS: Preliminary
resultsof a peer1995 "Drugusersversusoutreachworkersin combating
JournalofDrugIssues25(3) 531-564.
drivenintervention."
Coser,LewisA
1956 The Functions
ofSocialConflict.Glencoe,Ill.: FreePress.
Deaux, Edward,and JohnW. Callaghan
ofhealthbehavior."EvaluationReview9:365versusself-report
estimates
1985 "Keyinformant
368.
BonnieH
Erickson,
10:276-302.
fromchaindata." Sociological
1979 "Someproblemsofinference
Methodology
Fararo,ThomasJ
New York:JohnWileyand
to Fundamentals.
1973 Mathematical
Sociology:An Introduction
Sons.
Fararo,ThomasJ.,and JohnSkvoretz
6:223-258.
and socialstructure
theorems:PartII." SocialNetworks
1984 "Biasednetworks
Frank,Ove, and TomSnijders
thesize of hiddenpopulations
1994 "Estimating
usingsnowballsampling."Journalof Official
10:53-67.
Statistics
Goodman,L. A.
32:148-170.
Statistics
1961 "SnowballSampling."AnnalsofMathematical
Heckathorn,
DouglasD.
social
norms:A formaltheoryofgroup-mediated
1990 "Collective
sanctionsand compliance
Review55:366-384.
control."AmericanSociological
versusselective
actionand groupheterogeneity:
1993 "Collective
Voluntary
provision
Review58:329-350.
incentives."AmericanSociological
1996 "Dynamicsand Dilemmasof CollectiveAction."AmericanSociologicalReview61:250278.
Heckathorn,
DouglasD., and RobertS. Broadhead.
and Society8:235-260.
1996 "Rationalchoice,publicpolicy,and AIDS." Rationality
Kemeny,JohnG.,and J.LaurieSnell
N.J.:VanNostrand.
1960 FiniteMarkovChains. Princeton,
This content downloaded from 129.110.242.8 on Fri, 20 Dec 2013 16:14:08 PM
All use subject to JSTOR Terms and Conditions
Respondent-Driven
Sampling
PeterD., and H. RussellBernard
Killworth,
1978/79"The
reversalsmallworldexperiment."
SocialNetworks
1:159-192.
Klovdahl,Alden.S.
1989 "Urbansocialnetworks:Some methodological
In The Small
problemsand possibilities."
World,M. Kochen(ed.), 176-210.Norwood,N.J.:Ablex.
Rabbie,JacobM., and MurrayHorwitz
1969 "Arousalofin-group-out-group
biasbya chancewin or loss." JournalofPersonality
and
SocialPsychology
13:269-277.
Anatol
Rapoport,
1979 "A probabilistic
2:1-18.
approachto networks."SocialNetworks
Simmel,Georg
1955 Conflict:The WebofGroupAffiliations.
and ReinhardBendix.New
Trans.KurtH. Wolff
York:FreePress.
Spreen,Marius
1992 "Rarepopulations,
hiddenpopulations,
and link-tracing
designs:Whatand why?"Bulletin
de Methodologie
Sociologique6:34-58.
Spreen,Marius,and RonaldZwaagstra
1994 "Personalnetworksampling,
the
outdegreeanalysisand multilevel
analysis:Introducing
networkconceptin studiesofhiddenpopulations."International
Sociology9:475-491.
and GrahamKalton
Sudman,Seymour,
1986 'New developments
in thesampling
ofspecialpopulations."AnnualReviewofSociology
12:401-429.
JohnK., and Patrick
Biernacki
Watters,
1989 "Targetedsampling:Optionsfor the studyof hidden populations."Social Problems
36(4):416-430.
Welch,Susan
1975 "Sampling
in a dispersed
39:237-245.
byreferral
population."PublicOpinionQuarterly
This content downloaded from 129.110.242.8 on Fri, 20 Dec 2013 16:14:08 PM
All use subject to JSTOR Terms and Conditions
199