Respondent-Driven Sampling: A New Approach to the Study of Hidden Populations*

DOUGLAS D. HECKATHORN, University of Connecticut

Author(s): Douglas D. Heckathorn. Source: Social Problems, Vol. 44, No. 2 (May, 1997), pp. 174-199. Published by: University of California Press on behalf of the Society for the Study of Social Problems. Stable URL: http://www.jstor.org/stable/3096941

A population is "hidden" when no sampling frame exists and public acknowledgment of membership in the population is potentially threatening. Accessing such populations is difficult because standard probability sampling methods produce low response rates and responses that lack candor. Existing procedures for sampling these populations, including snowball and other chain-referral samples, the key-informant approach, and targeted sampling, introduce well-documented biases into their samples. This paper introduces a new variant of chain-referral sampling, respondent-driven sampling, that employs a dual system of structured incentives to overcome some of the deficiencies of such samples. A theoretic analysis, drawing on both Markov-chain theory and the theory of biased networks, shows that this procedure can reduce the biases generally associated with chain-referral methods. The analysis includes a proof showing that even though sampling begins with an arbitrarily chosen set of initial subjects, as do most chain-referral samples, the composition of the ultimate sample is wholly independent of those initial subjects. The analysis also includes a theoretic specification of the conditions under which the procedure yields unbiased samples. Empirical results, based on surveys of 277 active drug injectors in Connecticut, support these conclusions. Finally, the conclusion discusses how respondent-driven sampling can improve both network sampling and ethnographic investigation.

Introduction

"Hidden populations" have two characteristics: first, no sampling frame exists, so the size and boundaries of the population are unknown; and second, there exist strong privacy concerns, because membership involves stigmatized or illegal behavior, leading individuals to refuse to cooperate, or give unreliable answers to protect their privacy. Traditional methods, such as household surveys, cannot produce reliable samples, and they are inefficient, because most hidden populations are rare. Examples include many groups at risk of contracting HIV, including active injection drug users (IDUs).
Identifying these groups is crucial to the development of effective AIDS prevention interventions, and finding valid, reliable ways of sampling them is essential to evaluating interventions.

Three methods dominate studies of hidden populations: snowball sampling and other forms of chain referral samples, key informant sampling, and targeted sampling. The best known approach is snowball sampling (Goodman 1961): ideally, a randomly chosen sample serves as initial contacts, though in practice ease of access virtually always determines the initial sample; these subjects provide the names of a fixed number of other individuals who fulfill the research criteria. The researcher approaches these persons, and asks them to participate; and each subject who agrees is then asked to provide a fixed number of additional names. The researcher continues this process for as many stages as desired.

* An earlier version of this paper was presented at the XI International Conference on AIDS, Vancouver, B.C., July 1996. I thank Robert Broadhead, with whom the peer-driven intervention to combat AIDS was invented, and Denise Anthony, who conducted much of the data analysis. In addition, Dean Behrens, Robert Mills, Beth Jacobson, Boris Sergeyev, David Weakliem, Jeroen Weesie and anonymous reviewers provided helpful comments and advice. Finally, I thank the National Institute on Drug Abuse (Grant #R01 DA08014) for its support of this research.

Erickson (1979:299) describes a number of problems afflicting snowball sampling and other chain-referral samples. First, "inferences about individuals must rely mainly on the initial sample, since additional individuals found by tracing chains are never found randomly or even with known biases."
This issue is especially grave, because in the contexts where chain-referral methods are used, the initial sample usually cannot be drawn randomly. Second, chain-referral samples tend to be biased toward the more cooperative subjects who agree to participate; this problem is aggravated when the initial subjects are volunteers, because they are outliers in terms of cooperation. Third, these samples may be biased because of "masking," that is, protecting friends by not referring them, an important problem when a population has strong privacy concerns. Fourth, referrals occur through network links, so subjects with larger personal networks will be oversampled, and relative isolates will be excluded. Because of these potential biases, snowball samples typically are seen as "convenience samples" that lack any valid claim to produce unbiased and consistent samples.

Still, Erickson is optimistic about the potential for resolving these problems, concluding that: "the problems in making inferences about individuals and about chains . . . are . . . in principle solvable: One can build in added incentives" (1979:299). Unfortunately, she did not develop this point.

Two additional methods have developed to overcome the difficulties afflicting snowball samples. Key informant sampling (Deaux and Callaghan 1985) is designed to overcome response biases by selecting especially knowledgeable respondents and asking them about others' behavior, rather than their own. For example, one might ask social workers, drug abuse counselors, public health officials, or natural opinion leaders to report on patterns of drug use and sexual behavior. This method reduces the tendency to exaggerate socially acceptable behavior and understate disreputable behavior; however, it adds several sources of bias.
First, when professionals are key informants, their professional orientation may bias their responses; e.g., substance abuse counselors may exaggerate their clients' difficulties, and natural opinion leaders may feel obligated to present either an idealized or a disparaging view of those whom they influence. Second, key informants may lack sufficiently detailed knowledge; e.g., with the exception of sexual partners, knowledge about frequency of condom use tends to be strictly personal. Third, key informants do not interact with a random group of potential clients. If the key informant is a professional, the bias is a form of "institutional bias" characteristic of samples drawn from institutionalized populations. For example, just as imprisoned drug users are not randomly selected, neither are those who come into contact with drug-abuse counselors or those who are part of the entourage of a natural opinion leader. Hence the key informant approach has limitations: it introduces new sources of potential response bias; it cannot be used to access highly detailed and personal information; and the sampling may have an institutional bias.

Targeted sampling (Watters and Biernacki 1989) is a widely employed response to the deficiencies of chain-referral models. It involves two basic steps: first, field researchers map a target population (to the extent that they succeed in penetrating the local networks linking potential respondents, this prevents the under-sampling that traditional approaches would produce); and second, field researchers recruit a pre-specified number of subjects at sites identified by the ethnographic mapping, ensuring that subjects from different areas and subgroups will appear in the final sample. The adequacy of targeted sampling depends on the accuracy and comprehensiveness of the ethnographic mapping. If ethnographic mapping is near perfect, targeted sampling reduces to a form of location sampling. For example, based on an exhaustive ethnographic analysis, one could weight drug scenes based on their intensity of use by each category of user at each time of the day, and sample accordingly. Unfortunately, this is never possible because drug scenes are neither discrete nor public; while some drug copping (i.e., selling) occurs in concentrated areas, much is dispersed, occurring in private apartments and other nonpublic settings. Even an accomplished ethnographer requires months to penetrate a small subset of such settings within one urban neighborhood. Ethnography is always limited, and therefore target sampling is limited, i.e., by the effects of the time of day when researchers recruit, where they do their recruiting, and the recruitment strategies they use (Watters and Biernacki 1989:424-426). For instance, if researchers recruit during normal business hours, they will have difficulty recruiting gainfully employed subjects. If researchers focus their activities in "obvious" locations, they introduce a subtler version of the "institutional" bias targeted sampling seeks to avoid: they oversample the most visible potential subjects, and under-sample (or miss altogether) those in less obvious niches. Researchers' own trepidations about venturing into areas where they do not feel safe further limit the diversity of the population sampled. Thus targeted sampling introduces biases that correspond to the limits of the ethnography upon which it is based.

Increasing recognition of targeted sampling's limitations and the absence of any fundamentally new sampling methods have produced renewed interest in snowball and other chain-referral methods. These methods have great potential power, as studies of social networks reveal. In populations as large as the United States, every member is indirectly associated with every other member through approximately six intermediaries (Killworth and Bernard 1978/79). This means that even the most socially isolated individuals can be reached by the sixth wave of a referral chain beginning with any arbitrarily chosen individual.
Of course, realizing even a small portion of this potential requires a highly robust recruitment process, requiring, for example, that participants recruit everyone they know, and in the case of sociometric stars, this might involve many thousands of persons.

Refinements of chain-referral sampling include Frank and Snijders' (1994) method for estimating the size of hidden populations using a one-wave snowball sample. They select a diverse set of initial subjects, each of whom then lists all members of the target population that they know. The size of the hidden population is then estimated based on the amount of overlap among the members listed. Similarly, Klovdahl (1989) proposes a "random walk" approach for analyzing network structure, using a snowball sample in which each wave consists of only one subject; its results reveal structural features of the network connecting members of the hidden population. In addition, Spreen and Zwaagstra (1994) propose a combination of snowball and targeted sampling, termed "targeted personal network sampling," which uses ethnographic mapping to locate an initial sample, which becomes the basis for network sampling. They used this approach to analyze the structure of cocaine users' personal networks.

Despite the sophistication of these extensions of chain-referral sampling, some problems remain unresolved. In particular: "The central question in the methodological discussion about sampling and analyzing hidden populations is, basically: How to draw a random (initial) sample" (Spreen 1992:49). This question is crucial, because it generally has been assumed that however many waves the chain-referral sampling may contain, it necessarily must reflect the biases in the initial sample.

This paper describes a new form of chain-referral sampling termed "Respondent-Driven Sampling" (RDS). It was developed as part of an AIDS prevention intervention, the Eastern Connecticut Health Outreach (ECHO) project, which targets active injection drug users for interviews, AIDS prevention education, and HIV testing and counseling (Broadhead and Heckathorn 1994; Broadhead et al. 1995; Heckathorn and Broadhead 1996). This process requires approximately 1 1/2 hours. RDS plays a dual role in the project. It is used both to recruit subjects into the AIDS prevention intervention, and to sample the population of active injectors.

Based on an analysis drawing on Markov chains and the theory of biased networks, I show that suitable incentives can reduce the biases of chain-referral samples. Specifically, RDS produces samples that are independent of the initial subjects from which sampling begins. As a result, it does not matter whether the initial sample is drawn randomly. In addition, RDS reduces the biases resulting from voluntarism and masking, and provides means for controlling the biases resulting from differences in the sizes of personal networks. Thus it resolves, or provides means for resolving, the principal problems affecting chain-referral samples.

Respondent-Driven Sampling

A principle underlying respondent-driven sampling derives from studies of incentive systems (Heckathorn 1990, 1993, 1996). Behavioral compliance can arise from two theoretically distinguishable sources. First, it can arise from individual-sanction-based control. Here an agent, such as a teacher, parent, neighbor, or AIDS prevention counselor, targets an individual for control, for example, by promising a reward for a respondent undergoing an interview. The result is a dyadic relation of the sort presumed in most analyses of influence relations, a primary incentive. Second, compliance can arise from group-mediated social control (Heckathorn 1990). For example, respondents can be rewarded not only for their own participation in a study, but also for participation they elicit from a peer.
In these cases, control involves a two-step process: one or more members of the actor's group is promised a reward or threatened with a punishment based on whether the actor complies, and members of the group respond to that secondary incentive by controlling the actor. In this way, the agent's influence is amplified through the target's group. Thus, this form of control is "group mediated," and the incentives that trigger it are "secondary incentives."

A central conclusion from recent research on incentive systems (Heckathorn 1990, 1996) is that secondary incentives can be more efficient and effective than primary incentives in contexts where intragroup control is cheap and effective, as when social approval is an important sanction in peer groups. A difference between primary and secondary incentives lies in the distribution of compliance costs. In the case of individualized rewards, compliance costs are internal; the targeted actor either complies or refuses, thereby bearing whatever costs are involved, and the choice is strictly personal. In contrast, in the case of secondary incentives, compliance costs are external; the targeted actor seeks to induce others to comply, and therefore others bear the compliance costs. It is usually easier to tell others to comply than to do so oneself. This difference is crucial in the case of recruitment into a study. When only primary incentives motivate participation, individuals make their own autonomous decisions about whether to cooperate. However, if secondary rewards also motivate recruiting others into the study, peers' social influence is harnessed on behalf of the sampling process.
A second reason for the potential effectiveness of secondary incentives concerns monitoring. Agents employing primary incentives as means of control must be able to monitor compliance. But police, teachers, and drug abuse counselors typically can observe only a small portion of behavior, so monitoring is difficult when activities cannot be geographically confined. In contrast, secondary incentives operate through peer influence, and peers tend to be far more effective monitors of behavior (Heckathorn 1990). When sampling hidden populations, help from peers is often the only way that many individuals can be located. For example, no one knows better who a community's IDUs are than other IDUs.

A third reason for the potential effectiveness of secondary incentives concerns differentials in response to material incentives. Some individuals may not respond to material incentives, e.g., affluent subjects may not need the money they would gain by being interviewed. In sampling based on primary incentives, if the researcher's ability to reward symbolically is limited, he or she must rely on material rewards. Generally, social researchers have little choice but to rely upon material rewards, because, except in the case of lengthy participant observation studies, they cannot develop meaningful relationships with subjects. Secondary incentives harness peer pressure, applying non-material rewards such as peer approval to secure compliance. In essence, secondary incentives convert material incentives into peer-based symbolic incentives. For example, individuals who are too affluent to care about material rewards may defer to social pressure from less affluent or more materially-oriented peers, such as main connections or dealers.

Like other chain-referral methods, RDS assumes that those best able to access members of hidden populations are their own peers.
It differs from traditional snowball sampling in two respects. First, whereas snowball sampling typically involves an incentive for participation, RDS involves a dual incentive system: the reward for being interviewed (a primary reward) plus a reward for recruiting others into the study (a secondary reward). This study also used a mix of material (monetary) and symbolic (the opportunity to help protect oneself and one's peers from a deadly epidemic) rewards. These rewards fostered a robust recruitment, in which a few initial subjects each produced chain-referral systems that yielded a large number of recruits over the course of successive waves. For example, Figure 1 depicts the largest recruitment network from Site 2, in which a single subject generated more than one hundred recruits of diverse race, ethnicity, gender, and place of residence.

[Figure 1 * Recruitment network in a respondent-driven sample, beginning from a single "seed." Each node is labeled with a code for the recruit's race/ethnicity, gender, and town of residence. Key entries recoverable from the source: B = Black; M = Male; 2 = Town 2; 3 = Town 3; O = Other; ? = Missing.]

A second difference between RDS and typical snowball sampling is that subjects are not asked to identify their peers to the investigator, but to recruit them into the study. This distinction is crucial when dealing with hidden populations that are subjected to considerable repression. The Netherlands is relatively tolerant, so it is not surprising that when Frank and Snijders (1994:62) asked heroin addicts to list other heroin addicts, the typical respondent gave up more than nine names.
This approach would be more problematic in the United States under current "war-on-drugs" conditions, where asking respondents to produce such lists is highly threatening, and violates the informal norm against "snitching."

The RDS approach reduces masking, because it gives respondents the opportunity to allow peers to decide for themselves whether they wish to participate. For example, a subject who would not feel comfortable giving a researcher the name of a peer may nonetheless succeed in recruiting that peer. Also, the problem of recruiting only the most cooperative subjects is reduced by combining primary and secondary incentives. Individuals who resist field researchers' appeals may nonetheless yield to appeals from peers motivated, at least in part, by secondary incentives. Even the least cooperative individuals are not immune to social pressure.

Finally, it typically is assumed that chain-referral samples are biased toward subjects with large personal networks. However, an empirical test of this hypothesis did not find evidence of this (Welch 1975). Similarly, in the current study, personal network size, as measured by a question that asked how many injectors the respondent knew by name or face, lacked any statistically significant association with the number of subjects recruited into the study. This may result from the recruitment quotas built into this study. Even those subjects with small personal networks could generally find at least a few subjects to recruit. Recruitment quotas reduced the ability of subjects with large personal networks to recruit more extensively than subjects with smaller networks.
The potential bias of oversampling subjects with larger personal networks can be addressed in several ways: (a) samples can be weighted inversely with subjects' self-reported network size; (b) sampling can focus on saturating targeted areas and thereby capture subjects with the full array of network sizes; or (c) special incentives can be employed to increase recruitment of subjects with traits associated with small personal networks.

Respondent-driven sampling was implemented in the following way:

(1) Research staff recruit a handful of subjects who serve as "seeds."

(2) Seeds are offered financial incentives to recruit their peers into the same interview they have just completed. Specifically, seeds are given several recruitment coupons and told that if they pass the coupons on to peers who come to the storefront for an interview, the recruiter will be paid $10 for each recruited peer.

(3) All new recruits are offered the same dual incentives as were the seeds. Everyone is rewarded both for completing the interview, and for recruiting their peers into the research. Given adequate incentives, this mechanism creates an expanding system of chain-referrals in which subjects recruit more subjects, who recruit still more subjects, and so on from wave to wave. To ensure that a broad array of subjects have an opportunity to recruit, to prevent the emergence of semi-professional recruiters, and to preclude turf battles over recruitment rights, each subject was limited to three initial coupons. Sets of three additional coupons were given only under two conditions, that three subjects had been successfully recruited, and that they had been effectively educated about HIV prevention issues as measured using a structured assessment test. A by-product of this recruitment quota was to increase the number of waves of recruitment required to saturate the population. Given the combination of incentives for recruitment and education, the average cost per recruited subject was about $14.
(4) The trait defining membership in the population must be objectively verifiable, lest respondents react to the recruitment incentives by enlisting persons who are not part of the hidden population (e.g., noninjectors). In this study, verification was accomplished through a seven-step screening protocol used to confirm injection drug use. For example, the first step was to look for recent track marks. If marks were not found, subsequent steps required the subject to demonstrate detailed acquaintance with drug preparation and injection techniques.

(5) Subject duplication occurs when a subject seeks to participate in a study under multiple identities, and subject impersonation occurs when a subject pretends to be another, perhaps as a means to collect the latter's reward for recruitment. These potential problems were overcome using a subject identification database recording subjects' identifying physical characteristics, including gender, age, ethnicity, height, weight, scars, tattoos, and some biometric measures.

(6) Targeting specific subgroups within a population can occur through the use of "steering" incentives, that is, extra bonuses paid for recruiting specific categories of subjects. For example, a modest ($5) bonus was offered for recruiting female injectors.

(7) Sampling can be ended either when the targeted community is saturated, or when a minimum target sample size has been reached and the sample composition has reached a stable composition with respect to the traits upon which the research focuses.

These procedures have inherent limitations. Like any chain-referral sampling procedure, RDS is suitable only for sampling populations with a contact pattern; the activities that constitute membership in the population must create connections among population members, as when drug users purchase or share drugs, or when high-risk sexual activities take place. Therefore, this method is not suitable for drawing national samples. The size of the area within which sampling can be effective depends on the contact pattern's geographic extensiveness, which in turn depends on the availability of transportation to the respondents. Nor is RDS suitable for sampling hidden populations whose members share no ties. Second, the trait defining membership in the population must be verified objectively, lest interview and recruitment incentives motivate some respondents to claim falsely that they are members of the population. This is not a problem unique to RDS; it occurs whenever interview respondents are rewarded, but the recruitment incentive may aggravate the problem, requiring a well-tested screening protocol.

The Setting

The RDS was implemented in two small cities in Connecticut with populations of 28,500 and 42,800, and substantial injection drug use and associated AIDS cases. As of July 1996, 185 cases of AIDS had been diagnosed in the first site, and 89 at the second, with another 116 cases in a nearby town of 59,500 that also fell within the sampling area. About one half of the cases in these towns (55%, 49% and 47% respectively) involved drug injection. Unemployment among the IDU respondents was high, i.e., 71.7% and 78.9% at sites 1 and 2 respectively. Sampling at the two sites was sequential. It began at the first site in March 1994, continuing until March 1995. Sampling at the second site began in March 1995 and the data reported here extend through March 1996. Therefore, I report on exactly one year of operation at each site. Both projects operated out of storefronts located in active drug using scenes. A program staff of three health educators scheduled appointments for interviews during three weekdays.
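The coupon mechanics in steps (2), (3), and (6) of the implementation list can be sketched as a small ledger. This is only an illustration, not the project's actual record-keeping. The names `Subject`, `RecruitmentLedger`, `redeem_coupon`, and `grant_additional_set` are hypothetical. The rule that a coupon is spent only when the recruit passes screening is an assumption made for the sketch. The $10 recruitment reward, the $5 bonus for recruiting female injectors, and the three-coupon quota are taken from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

COUPONS_PER_SET = 3       # from the text: coupons are issued in sets of three
RECRUITMENT_REWARD = 10   # from the text: $10 paid per recruited peer
FEMALE_BONUS = 5          # from the text: $5 "steering" bonus


@dataclass
class Subject:
    subject_id: str
    gender: str                      # "F" or "M"
    recruiter_id: Optional[str]      # None for seeds
    wave: int                        # 0 for seeds, recruiter's wave + 1 otherwise
    coupons_left: int = COUPONS_PER_SET
    recruitment_pay: int = 0         # secondary (recruitment) rewards earned
    recruits: List[str] = field(default_factory=list)


class RecruitmentLedger:
    """Hypothetical bookkeeping for coupon-based chain referral."""

    def __init__(self) -> None:
        self.subjects: Dict[str, Subject] = {}

    def add_seed(self, subject_id: str, gender: str) -> Subject:
        seed = Subject(subject_id, gender, recruiter_id=None, wave=0)
        self.subjects[subject_id] = seed
        return seed

    def redeem_coupon(self, recruiter_id: str, recruit_id: str,
                      gender: str, verified: bool) -> Optional[Subject]:
        recruiter = self.subjects[recruiter_id]
        if recruiter.coupons_left == 0:
            raise ValueError("recruiter has no coupons left")
        if recruit_id in self.subjects:
            raise ValueError("subject already enrolled (possible duplication)")
        if not verified:
            # Screening failed: nobody is enrolled or paid.
            # Assumption: the coupon is not counted as spent.
            return None
        recruiter.coupons_left -= 1
        recruiter.recruitment_pay += RECRUITMENT_REWARD
        if gender == "F":
            recruiter.recruitment_pay += FEMALE_BONUS
        recruit = Subject(recruit_id, gender, recruiter_id, recruiter.wave + 1)
        recruiter.recruits.append(recruit_id)
        self.subjects[recruit_id] = recruit
        return recruit

    def grant_additional_set(self, recruiter_id: str,
                             recruits_educated: bool) -> bool:
        """Issue three more coupons only if the current set produced three
        recruits and those recruits passed the education assessment."""
        recruiter = self.subjects[recruiter_id]
        if recruiter.coupons_left == 0 and recruits_educated:
            recruiter.coupons_left = COUPONS_PER_SET
            return True
        return False
```

Recording `recruiter_id` and `wave` for every subject is the design choice that matters here: it is what later allows recruits to be cross-tabulated by the characteristics of their recruiters.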
A Formal Model

When viewed analytically, RDS creates a stochastic process in which each recruiter's social characteristics affect the characteristics of the recruits. In the case of race and ethnicity, this means that recruiters of each ethnic group generate a distinct ethnic mix of recruits. For example, from Table Ia it is apparent that in Site 1, non-Hispanic whites, on average, recruited 70% whites, 19% non-Hispanic blacks, 10% Hispanics, and 2% others. Similarly, non-Hispanic blacks recruit, on average, 28% whites, 56% blacks, 6% Hispanics, and 9% others, so the tendency toward in-group recruitment persists. This pattern continues among Hispanics, whose in-group recruitment rate is 59%. Hence, recruiting occurs preponderantly within the ethnic group, but cross-ethnic recruitment is also common.

Table I * Characteristics of Recruits, by Characteristics of Recruiter, Site 1

Table Ia * Recruitment by Ethnicity

                                  Race/Ethnicity of Recruit
Ethnicity of Recruiter            W            B            H            O           Total
Non-Hispanic White (W)            69.8%        19%          9.5%         1.6%        100% (63)
Non-Hispanic Black (B)            28.1%        56.3%        6.3%         9.4%        100% (32)
Hispanic (H)                      23.5%        5.9%         58.8%        11.8%       100% (17)
Other (O)                         50%          50%          0%           0%          100% (4)
Total Distribution of Recruits    50.9% (59)   28.4% (33)   15.5% (18)   5.2% (6)    100% (116)
Equilibrium                       49%          29.7%        15.8%        5.4%        100%
Mean Discrepancy, Distribution of Recruits and Equilibrium = .91% (r = .998)

Table Ib * Recruitment by Gender

                                  Gender of Recruit
Gender of Recruiter               Female       Male         Total
Female                            29.7%        70.3%        100% (37)
Male                              23.9%        76.1%        100% (88)
Total Distribution of Recruits    25.6% (32)   74.4% (93)   100% (125)
Equilibrium                       25.4%        74.6%        100%
Mean Discrepancy, Distribution of Recruits and Equilibrium = .2%

Table Ib reports the corresponding data for gender. No in-group recruitment pattern is apparent in this case, because both females and males recruit between 70% and 76% males. This may reflect the male-dominated character of injection drug scenes.
However, because rewards for recruiting females were greater than those for recruiting males, their expected effect was to increase in-group selection among women, strengthening their incentives to recruit one another, while reducing in-group selection among men, because their incentive was to recruit women. This shows how steering incentives can alter tendencies toward in-group or out-group selection.

An intermediate level of in-group selection is apparent in Table Ic's report of recruitment based on drug preference. Most subjects in the study report that heroin is their drug of first choice, so drug preference was dichotomized between heroin, and an "other" category that includes cocaine and crack, methadone, cannabis, speedballs (a combination of cocaine and heroin), and alcohol. A tendency for subjects to recruit from their own group is apparent, but it is weaker than the effects of ethnicity. Finally, Table Id reports recruitment based on area of residence. Here the tendency toward in-group recruitment is variable: whereas residents of Site 1's town recruit internally 84% of the time, subjects from the area (i.e., from contiguous towns) or out of the area (i.e., from more distant towns) recruit strongly both from within and outside of town. This reflects the town's position as a regional drug distribution center. IDUs travel considerable distances to purchase drugs in the town, and in the process they develop network connections both in the town and in their area of origin.

Table Ic * Recruitment by Drug Preference

                                  Drug Preference of Recruit
Drug Preference of Recruiter      Heroin       Other        Total
Heroin                            87.7%        12.3%        100% (81)
Other                             67.6%        32.4%        100% (34)
Total Distribution of Recruits    80% (94)     20% (21)     100% (105)
Equilibrium                       84.6%        15.4%        100%
Mean Discrepancy, Distribution of Recruits and Equilibrium = 2.86%

Table Id * Recruitment by Location

                                  Location of Recruit
Location of Recruiter             In Town (Town 1)   In Area      Out of Area   Total
In Town (Town 1)                  84.3%              8.4%         7.2%          100% (83)
In Area                           60%                10%          30%           100% (10)
Out of Area                       50%                8.8%         41.2%         100% (34)
Total Distribution of Recruits    73.2% (93)         8.7% (11)    18.1% (23)    100% (127)
Equilibrium                       77.5%              8.6%         13.9%         100%
Mean Discrepancy, Distribution of Recruits and Equilibrium = 2.86% (r = .997)

Sampling as a Markov Process

RDS recruitment has two important characteristics. First, there are a limited number of states (e.g., types of ethnicity) that subjects can assume. Second, any subject's recruits are a function of his or her type, such as his or her ethnicity; and not of previous events, such as who recruited the recruiter. This requirement is satisfied in the case of Table Ia's data because there is no significant association between the ethnicity of a recruiter's recruits and the ethnicity of the recruiter's recruiter.1 Hence, recruitment is a memoryless process. For example, white recruiters who were themselves recruited by whites, recruited about the same mix of subjects as did white recruiters who were recruited by Hispanics or blacks. Therefore, RDS recruitment in this case qualifies as a first-order Markov process,2 and can be represented in the form of the network depicted in Figure 2a. The recruitment process can be conceptualized as a point moving from node to node in this network, each representing a distinct state of the system. At any instant, the current location of the point indicates the most recent recruit's ethnicity. The arrows leaving that point to other nodes indicate the ethnicity of the next recruit, and the number associated with each arrow indicates the probability that this path will be taken.
Markov processes are of several basic types. Some have absorbing states, such that when one enters that state, no exit is possible. This occurs when an ethnic group is totally isolated, such that its members only recruit one another. Figure 2a indicates this does not fit this network; there is a substantial in-group selection bias, but it is less than total. Indeed, following the arrows in Figure 2a's network, one can reach any point in the network from any other point. Therefore, this network is "ergodic" (Fararo 1973:280). In contrast, Figure 2b illustrates a nonergodic network that will be discussed below, in which the rightmost two nodes are absorbing states. Returning to Figure 2a, the network is non-cyclic, in that any node can be reached during any time period (e.g., odd versus even time periods). Because the Markov process is both ergodic and non-cyclic, Figure 2a's network is termed "regular."

Two theorems regarding regular Markov processes are relevant to understanding RDS. First, the "law of large numbers for regular Markov chains" (Kemeny and Snell 1960:73) states that the probability that a system will be in any given state over the course of a large number of steps is independent of its starting state. The implication for RDS recruitment is that:

THEOREM ONE: As the recruitment process continues from wave to wave, an equilibrium mix of recruits will eventually be attained that is independent of the characteristics of the subject or set of subjects from which recruitment began.

Thus, if recruitment operates until equilibrium is reached, and the resulting recruitment network qualifies as a regular Markov process, it avoids the central problem for sampling hidden populations: that the sample's characteristics merely reflect the initial sample. Instead, a respondent-driven sample is wholly independent of the initial set of subjects. This point is graphically illustrated in Figures 3a and 3b, in which Table Ia is used as the basis for two alternative simulations. Figure 3a draws on Table Ia's data to mathematically project what would have happened had recruitment begun with only non-Hispanic African-American seeds.
After the first wave, those subjects would have on average recruited 28.1% whites, 56.3% blacks, 6.3% Hispanics, and 9.4% others, so blacks would predominate. After the second wave, the composition of recruits would change, because 28.1% of recruiters would be white, and they would recruit, on average, 69.8% whites, 19% blacks, 9.5% Hispanics, and 1.6% others (i.e., Native Americans and Asian Americans). Similarly, 6.3% of second-wave recruiters would be Hispanic, and they would recruit on average 23.5% whites, 5.9% blacks, 58.8% Hispanics, and 11.8% others. Combining these with the recruits of the blacks and others, the overall composition of the second-wave recruits is 40% whites, 43.7% blacks, 9.9% Hispanics, and 9.5% others. As Figure 3a illustrates, this trend continues from wave to wave until recruitment stabilizes, so the proportion of blacks declines to approximate their equilibrium occurrence in the sample. Similarly, Figure 3b shows what would have happened had recruitment begun with one or more white seeds. The initial over-sampling of whites would decline from wave to wave, until an equilibrium recruitment pattern was attained. Note that, consistent with Theorem One, the ultimate recruitment pattern is independent of the ethnicity of the initial subject(s) with which recruitment began. That is, the ultimate recruitment pattern is identical, whether the initial seed(s) are black or white. The weakness of chain-referral samples, that the ultimate sample depends on the choice of initial subjects, is not inherent in this form of sampling. Instead, this bias arises only when sampling does not continue through enough waves for equilibrium to be reached.

1. A series of chi-square analyses was conducted to determine whether the ethnicity of a recruiter's recruiter affected the ethnic composition of the recruiter's recruits. For example, white recruiters were differentiated based on whether they were recruited by a Hispanic, black, white, or an "other" person, and the ethnic compositions of their recruits were compared. No significant differences were found, e.g., in the case of white recruiters, the significance level was 0.44.

2. A Markov chain is formally equivalent to contingency tables that display changes in state from one period of time to another, and as such they can be analyzed using log-linear models (see Bishop et al. 1975:257-279).

Figure 2 • Markov chain networks, where nodes represent subject characteristics, and arrows represent recruitment probabilities. Key: W = Non-Hispanic Whites; B = Non-Hispanic Blacks; H = Hispanics; O = Other; T3 = Town 3; A2 = Town 2's Affiliated area; A3 = Town 3's Affiliated area.

Figure 3 • Race and Ethnicity of Recruits in a Respondent-Driven Sample, Beginning with a Single Black or White Subject. Panel a: starting point, one black subject. Panel b: starting point, one white subject. Vertical axis: percentage of population; horizontal axis: recruitment wave (0 to 10); series: Non-Hispanic White, Non-Hispanic Black, Hispanic, Other.

This conclusion applies generally to snowball samples, if two conditions are met. First, the snowball process must continue to the point of equilibrium. This requires more stages than occur in most snowball samples. Second, the recruitment process must correspond to a regular Markov process. This constraint is less severe than may be at first apparent. For example, a sample corresponding not to a first-order but to a higher order Markov process does not produce severe problems. Its effect may be merely to slow the approach to equilibrium, so sampling may require additional waves.

The law of large numbers for regular Markov chains provides a means for computing analytically the equilibrium sampling of the groups (see Kemeny and Snell 1960:72). The equilibrium state (E) of Table Ia's system is found by solving the following system of equations:

$$1 = E_w + E_b + E_h + E_o \tag{1}$$
$$E_w = .698\,E_w + .281\,E_b + .235\,E_h + .5\,E_o \tag{2}$$
$$E_b = .19\,E_w + .563\,E_b + .059\,E_h + .5\,E_o \tag{3}$$
$$E_h = .095\,E_w + .063\,E_b + .588\,E_h \tag{4}$$

where $E_w$, $E_b$, $E_h$, and $E_o$ are the equilibrium sample proportions for non-Hispanic whites, non-Hispanic blacks, Hispanics, and other groups, respectively. Equation one expresses the fact that the proportional distributions must sum to one. The subsequent equations express each group's equilibrium size as a function of the equilibrium sizes of all groups and of each group's proportional recruitment of each group. Because this is a system of four linear equations with four unknowns, there is a unique solution, that is, $E_w = .490$, $E_b = .297$, $E_h = .158$, and $E_o = .054$.³

3. A program that computes these equilibria for systems with up to ten groups is available from the author. It also computes the composition of the sample, as it changes from wave to wave, based on any specification of initial subjects, and indicates the number of waves required for equilibrium to be reached. For a copy, send a blank disk, along with a stamped, self-addressed diskette mailer, to the author. The program is menu driven and requires an IBM compatible with VGA graphics.

A matter of great import in determining the practical significance of the above theorem is the rate at which the equilibrium is reached. That this convergence would ultimately occur would not matter if convergence were slow. Fortunately, regular Markov chains are characterized by what Kemeny and Snell (1960:72) describe as "a very fast kind of convergence." This conclusion is based on a theorem that proves that convergence occurs geometrically. The implication is that:

THEOREM TWO: The subject pool generated by a respondent-driven sample approaches equilibrium at a rapid (i.e., geometric) rate.

The implication, except in a special class of cases discussed below, is that convergence occurs within a handful of recruitment waves. This conclusion is consistent with the above discussed data. For example, the mean absolute discrepancy between the actual subject composition by race/ethnicity and the computed equilibrium composition is less than 1%, showing that equilibrium has been approximated. Similarly, recruitment based on gender, drug preference, and location (see Tables Ib to Id) also closely approximates equilibrium, as reflected in mean discrepancies between the sample and equilibrium of less than 3%. This close approximation to equilibrium is theoretically expected, in that irrespective of the initial distribution of seeds, approximating equilibrium to within a tolerance of 2% within any of Table I's matrices never requires more than six recruitment waves. A further implication of the theorem is that the greater the extent to which the initial sample approximates the equilibrium sample, the quicker will be the approach to equilibrium. Hence, choosing a diverse initial sample speeds the approach to equilibrium.
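The wave-by-wave projection and the equilibrium computation described above can be sketched in a few lines. This is a minimal illustration, not the author's program: the matrix `P` holds the Table Ia percentages quoted in the text, with the "other" row assumed from the coefficients of equations 2 to 4, and the function names and the iterative method (repeated multiplication rather than solving the linear system directly) are choices made here for brevity.

```python
# Rows are recruiters, columns are recruits, in the order
# white, black, Hispanic, other. The last row is an assumption inferred
# from the equilibrium equations, not a value quoted from Table Ia.
P = [
    [0.698, 0.190, 0.095, 0.016],
    [0.281, 0.563, 0.063, 0.094],
    [0.235, 0.059, 0.588, 0.118],
    [0.500, 0.500, 0.000, 0.000],
]


def normalise(dist):
    """Rescale a distribution to sum to one (the rows carry rounding error)."""
    total = sum(dist)
    return [x / total for x in dist]


def next_wave(dist, matrix):
    """Composition of the next wave's recruits, given this wave's recruiters."""
    n = len(matrix)
    return normalise(
        [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    )


def equilibrium(matrix, tol=1e-12, max_iter=10_000):
    """Approximate the equilibrium by iterating until the mix stops changing."""
    dist = [1.0 / len(matrix)] * len(matrix)
    for _ in range(max_iter):
        new = next_wave(dist, matrix)
        if max(abs(a - b) for a, b in zip(new, dist)) < tol:
            return new
        dist = new
    return dist


def waves_to_equilibrium(seed, matrix, tolerance=0.02, max_waves=1000):
    """Count waves until every group is within `tolerance` of equilibrium."""
    target = equilibrium(matrix)
    dist, waves = list(seed), 0
    while max(abs(a - b) for a, b in zip(dist, target)) > tolerance:
        if waves >= max_waves:
            raise RuntimeError("no convergence; the chain may not be regular")
        dist = next_wave(dist, matrix)
        waves += 1
    return waves


if __name__ == "__main__":
    print("equilibrium:", [round(x, 3) for x in equilibrium(P)])
    print("waves from an all-black seed:", waves_to_equilibrium([0, 1, 0, 0], P))
```

Starting `next_wave` from an all-black or an all-white seed and iterating leads to the same composition, which is the content of Theorem One; how quickly the gap closes illustrates Theorem Two.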
Table II • Recruitment by Area, Site 2

Table IIa • Recruitment by Location, Site 2

Location of Recruiter             Location of Recruit
                                  Town 2        Town 3        Other         Total
Town 2 (Site 2's location)        87.8%         1.2%          11.0%         100% (82)
Town 3                            0%            88.1%         11.9%         100% (59)
Other                             22.2%         55.6%         22.2%         100% (9)
Total Distribution of Recruits    49.3% (74)    38.7% (58)    12.0% (18)    100% (150)
Equilibrium                       23.7%         63.3%         13.0%         100%
Mean Discrepancy, Distribution of Recruits and Equilibrium = 17.09% (r = .432)

Table IIb • Recruitment by Location, with Partitioning of the "Other" Group, Site 2

Location of Recruiter             Location of Recruit
                                  Town 2        Town 3        A2            A3            Total
Town 2 (Site 2's location)        87.8%         1.2%          11.0%         0%            100% (82)
Town 3                            0%            88.1%         0%            11.9%         100% (59)
Town 2's Affiliated area (A2)     50%           0%            50%           0%            100% (4)
Town 3's Affiliated area (A3)     0%            80%           0%            20%           100% (5)
Total Distribution of Recruits    49.0% (74)    38.4% (58)    7.3% (11)     5.3% (8)      100% (150)
Equilibrium                       0%            87.1%         0%            12.9%         100%
Mean Discrepancy, Distribution of Recruits and Equilibrium = 28.15% (r = .330)

Consider now the case of Site 2, in which a more complex theoretic analysis is required to make sense of the data. Table IIa shows recruitment as a function of the recruiter's and recruit's towns of origin for Site 2. Here Town Two is Site 2's location. The seeds were drawn from this site, because an aim of the study was to determine how geographically extensive recruitment would become.
Hence no effort was made to specify the sampling universe in advance. Unlike most studies, this served as an outcome variable. Town Three is a larger town about 15 miles away, and Other refers to a mix of locations, either in between or (in a few cases) more distant. In-group recruitment within the two towns is very strong (i.e., 88%), and cross-recruitment is almost nonexistent (i.e., 1.2% and 0%). Recruitment began from Town 2 and later extended to Town 3, a process that shows the ability of RDS to break out of one isolated group to penetrate another similarly isolated group. How this occurred is depicted in Figure 1, which shows the largest recruitment network from this site. During the third wave, 5 1/2 months into the 12 month study period, a Hispanic male subject from Town 2 recruited another Hispanic male from Town 3. From this tiny foothold grew a substantial branch of the recruitment tree (see the bottom-right quadrant of Figure 1). That subject directly or indirectly recruited 40 other subjects from Town 3 and the surrounding area. This shows vividly how a sufficiently robust chain-referral system can exploit existing network connections to expand into isolated groups.

Table IIc • Recruitment by Location, Town 2 and Vicinity

Location of Recruiter             Location of Recruit
                                  Town 2        Affiliated area (A2)    Total
Town 2 (Site 2's location)        88.9%         11.1%                   100% (81)
Affiliated area (A2)              50%           50%                     100% (4)
Total Distribution of Recruits    87.1% (74)    12.9% (11)              100% (85)
Equilibrium                       81.8%         18.2%                   100%
Mean Discrepancy, Distribution of Recruits and Equilibrium = 5.26%

Table IId • Recruitment by Location, Town 3 and Vicinity

Location of Recruiter             Location of Recruit
                                  Town 3        Affiliated area (A3)    Total
Town 3                            88.1%         11.9%                   100% (59)
Affiliated area (A3)              80%           20%                     100% (5)
Total Distribution of Recruits    87.5% (56)    12.5% (8)               100% (64)
Equilibrium                       87.5%         12.5%                   100%
Mean Discrepancy, Distribution of Recruits and Equilibrium = 0%
The relative isolation of the two towns derives from transportation difficulties. Many subjects (53.1%) report that they do not have access to a car, so hourly bus service is a primary means of transportation. Most subjects traveling from Town 3 to Town 2 began with a 15 mile bus ride, and then walked an additional mile to the ECHO Project storefront. Though they were compensated for their bus fares, this reflects a remarkable commitment to participation in the study. That subjects continued to make this trek during the Connecticut winter is especially remarkable.

There is a sharp contrast between recruitment by area at the second and the first sites. Whereas equilibrium was closely approximated by the sample distribution in the first site (mean discrepancy, D = 2.86%), the divergence is large at the second site (D = 17.09%). This reflects an over-sampling in Site 2 of Town 2 (49.3% versus 23.7%), and an under-sampling in Town 3 (38.7% versus 63.3%). It might seem that this discrepancy results merely because the system has not as yet reached equilibrium. This might appear to explain much of the departure from equilibrium, because at present in this on-going study, the first subject from Town 3 appeared in the sample 5 1/2 months into the 12 month study period, and the tenth wave has only just begun. Based on this incomplete movement towards equilibrium, one could then weight the sample. For example, given that Town 2 is over-sampled by 25.6 percentage points (i.e., 49.3% - 23.7%), and Town 3 is under-sampled by 24.6 percentage points (i.e., 63.3% - 38.7%), these towns can be assigned weights of .481 (i.e., 23.7%/49.3%) and 1.64 (i.e., 63.3%/38.7%), respectively, when analyzing ethnicity, gender, and drug preference. The "Other" area had already closely approximated equilibrium, so it is assigned a virtually neutral weight of 1.08 (i.e., 13%/12%). Such an account is theoretically plausible, but a closer examination of the data shows that it does not fit this particular case.
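The arithmetic behind the mean discrepancy and the proposed weights is simple enough to check directly. The following sketch uses the rounded Table IIa percentages, so the discrepancy it computes differs slightly in the second decimal from the published 17.09%, which was presumably computed from unrounded values; the function names are illustrative.

```python
def mean_discrepancy(sample, equilibrium):
    """Mean absolute difference, in percentage points, between two distributions."""
    return sum(abs(s - e) for s, e in zip(sample, equilibrium)) / len(sample)


def equilibrium_weights(sample, equilibrium):
    """Weight for each group: its equilibrium share divided by its sample share."""
    return [e / s for s, e in zip(sample, equilibrium)]


# Site 2 figures from Table IIa, in the order Town 2, Town 3, Other (percent).
sample = [49.3, 38.7, 12.0]
equilibrium = [23.7, 63.3, 13.0]

d = mean_discrepancy(sample, equilibrium)
weights = equilibrium_weights(sample, equilibrium)

if __name__ == "__main__":
    print("mean discrepancy:", round(d, 2))
    print("weights:", [round(w, 3) for w in weights])
```

A weight below one shrinks the contribution of an over-sampled group, and a weight above one enlarges that of an under-sampled group.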
As represented in Table IIa, the Other group serves as a bridge between Towns 2 and 3, recruiting equally (22%) from both towns. Why then in Figure 1's recruitment network did none of the 10 subjects from the "other" area serve as bridges between the two towns, i.e., why were there no Town 2/Other/Town 3 recruitments? Given the modest number of cases, this could be a coincidence, but the same occurs in Site 2's other recruitment networks. However, on closer inspection, the Other category disaggregates into two mutually exclusive groups. First, there are persons who live outside of Town 2 and have connections only to that town and to their own locality. This may be termed the "affiliated area" of Town 2. Second, Town 3 also has an affiliated area. Because of the considerable distance between the towns, no subjects reside in intermediate areas with network connections to both towns. Therefore, the appearance that the Other category can serve as a bridge between the two towns is an illusion created by inappropriate categorization. As this case illustrates, in RDS analysis the assignment of subjects to categories is by no means neutral. It must take into account the structure of network connections. Specifically, if a category includes individuals with quite disparate network structures, they should be further differentiated by category.

Table IIb shows the matrix produced when the Other category in Table IIa is divided based on whether its members are affiliated nonresidents of Town 2 or of Town 3. As is apparent by inspection, this does not bring the system closer to equilibrium. Indeed, the system is farther from equilibrium than previously (D = 28.15%), and the content of the equilibrium has become strange, because recruitment is projected to die out wholly in Town 2 and its affiliated area. The reason for this problem is apparent from inspection of Figure 2b, which depicts Table IIb's recruitment data in network form. Note that the network is nonergodic, in that not every point in the network can be reached directly or indirectly from every other point.
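Whether a recruitment matrix is ergodic in this sense can be checked mechanically by following its nonzero arrows. The sketch below applies such a check to Table IIb's percentages; `MATRIX`, `reachable_from`, and `is_ergodic` are hypothetical names, and the check covers reachability only, not the separate non-cyclic condition required for a regular chain.

```python
# Table IIb recruitment probabilities. Rows are recruiters, columns are
# recruits, in the order Town 2, Town 3, A2, A3.
LABELS = ["Town 2", "Town 3", "A2", "A3"]
MATRIX = [
    [0.878, 0.012, 0.110, 0.000],
    [0.000, 0.881, 0.000, 0.119],
    [0.500, 0.000, 0.500, 0.000],
    [0.000, 0.800, 0.000, 0.200],
]


def reachable_from(start, matrix):
    """Indices of all states reachable from `start` along nonzero arrows."""
    seen, frontier = set(), [start]
    while frontier:
        i = frontier.pop()
        for j, p in enumerate(matrix[i]):
            if p > 0 and j not in seen:
                seen.add(j)
                frontier.append(j)
    return seen


def is_ergodic(matrix):
    """True if every state can be reached from every other state."""
    everything = set(range(len(matrix)))
    return all(reachable_from(i, matrix) == everything for i in everything)


if __name__ == "__main__":
    print("ergodic:", is_ergodic(MATRIX))
    for i, label in enumerate(LABELS):
        reached = sorted(LABELS[j] for j in reachable_from(i, MATRIX))
        print(label, "reaches", reached)
```

Here the states reachable from Town 3 never include Town 2 or A2, which is the formal counterpart of recruitment being unable to return once it has crossed into Town 3.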
The network is nonergodic because the nodes for Town 3 and its affiliated area serve as absorbing states. This is why the equilibrium for this system (see Table IIb) shows recruitment as ceasing in Town 2 and its area, and continuing only in Town 3 and its area.

The solution to this problem is straightforward. Because the network is nonergodic, this system does not generate a regular Markov process. Therefore, data from this site can best be analyzed by partitioning the sample into two subsamples, one for Town 2 and its affiliates, and one for Town 3 and its affiliates. Tables IIc and IId illustrate this process for Table IIb's data. Both systems reach similar equilibria, with about 87% recruitment within the town, and the remainder from outlying affiliated areas; equilibrium is closely approximated in both systems. In sum, whereas Site 2 initially appeared to contain a single respondent-driven sample with anomalous properties, it turns out on closer examination to consist of two distinct samples. This possibility should be suspected whenever, despite a substantial sample size, departures from equilibrium are great.

Assessing Bias in Respondent-Driven Samples

An ideal sampling procedure yields not only a sample independent of its starting point, but also an unbiased sample of the underlying population, with a known degree of consistency from which confidence intervals can be computed. However, because of the absence of probability samples in studies of hidden populations, a more modest goal has been to devise means for drawing samples that produce "a good cross-section of the target population," or "the coverage of heterogeneity in the target population" (Spreen and Zwaagstra 1994:478). Such samples are termed "representative."
An assessment of RDS requires examining both the extent to which it fulfills this modest goal, and whether it might provide a basis for fulfilling the more stringent ideal of probability sampling.

Returning again to the data from Site 1, members of the major ethnic groups tend to recruit differentially from within their own ranks (see Table Ia). This reflects the social structure in which these subjects are embedded; intra-ethnic connections are more common than inter-ethnic connections. A form of network analysis, "biased network theory," has been used to analyze a variety of relationships, including marriages and friendships (Fararo and Skvoretz 1984; Rapoport 1977). The theory's essential idea is that a structured social system's social linkages will be non-random: some relationships will be more probable than others, where "biases" refer to any departures from a fully random pattern of connection. Biases may consist of either a tendency toward in-group affiliation, as typically occurs in friendships, or a tendency toward out-group affiliation, as in exogamous marriage systems.

In an unstructured group, recruitment merely reflects the prevalence of each group within the population. Every recruiter, irrespective of group identity, recruits the same mix of subjects. From inspection of Table I, that obviously does not fit the data. In a system where group affiliation affects selection, the members recruited by each individual reflect both the recruiter's biases and the prevalence of different types of members within the population. Selection in such a system can be modeled as a process with two conditional steps (Fararo and Skvoretz 1984:233). First, an inbreeding bias event for a member of group X either occurs, with probability Ix, or fails to occur, with probability 1 - Ix. If the inbreeding bias event does occur, selection occurs from the in-group with certainty. Second, if the inbreeding event does not occur, any individual is selected randomly from the system's population irrespective of group membership. Therefore, the probability of selecting a member of any group is equal to
the proportion of that group's members in the system. In this formulation, selection of an out-group member proves that the inbreeding event did not occur, whereas selection of an in-group member can occur either because of the inbreeding event, or by random selection from the system's population. As thus conceptualized, the magnitude of inbreeding reflects a mix of cultural and situational factors, ranging from cultural emphasis on in-group affiliation to ease of transportation between geographically distinct groups. It can also be affected by steering incentives, such as those used to increase recruitment of female IDUs.

This model involves a host of simplifying assumptions. For example, the occurrence of inbreeding is treated as a dichotomous event, whereas gradations appear more empirically plausible. However, such simplifying assumptions are not necessarily undesirable. Formal heuristic models should never be made more complex than required to capture the fundamentals of the process they are intended to represent, lest they lose tractability and fail to yield clear conclusions. Ideally, validation of the model requires an assessment of its robustness. First, simplifying assumptions are replaced by more realistic assumptions. Then, whether the model's fundamental conclusions remain valid is assessed. The aim is to identify the simplest robust model. However, such an assessment exceeds the scope of this paper. What can be said at present is that the Fararo-Skvoretz model captures the two essential features of inbreeding, i.e., that groups vary in their strength of inbreeding, and that the resulting selections produce the structure of in-group and out-group affiliations within the system. Hence the model is sufficient for heuristic purposes.

Consider the case of a system composed of N groups, A, B, C, ..., N. The probability of a
member of a group X selecting from the in-group, $S_{xx}$, is the sum of the probability of selection being controlled by an inbreeding event, with probability $I_x$, and the probability that inbreeding did not occur, $(1 - I_x)$, weighted by the proportion of members of group X in the population, $P_x$, i.e.,

$$S_{xx} = I_x + (1 - I_x)\,P_x \tag{5}$$

By similar principles, the probability of a member of group X selecting a member of any out-group, Y, can be computed. The probability of selecting a member of group Y is the probability that inbreeding did not occur, $(1 - I_x)$, weighted by the proportion of members of group Y in the population, $P_y$, i.e.,

$$S_{xy} = (1 - I_x)\,P_y \tag{6}$$

With this set of equations, the model specifies the relationship between the underlying population proportion (P), and selection probabilities (S) of the sort depicted in Table Ia.

Based on the above pair of equations, equilibrium sample size can be expressed as a function of both population and inbreeding terms. To simplify the analysis, consider first the two-subgroup case. Consistent with the law of large numbers for regular Markov chains, the equilibrium proportion of members of each group is found by solving a system of two linear equations:

$$1 = E_a + E_b \tag{7}$$
$$E_a = S_{aa}\,E_a + S_{ba}\,E_b \tag{8}$$

which yields,

$$E_a = \frac{S_{ba}}{1 - S_{aa} + S_{ba}} \tag{9}$$

This in turn can be expanded by substitution from equations 5 and 6 to express the equilibrium sample in terms of the inbreeding terms and population:

$$E_a = \frac{P_a (1 - I_b)}{1 - \bigl(I_a + P_a (1 - I_a)\bigr) + P_a (1 - I_b)} \tag{10}$$

This simplifies to:

$$E_a = \frac{P_a (I_b - 1)}{P_a I_b - P_a I_a + I_a - 1} \tag{11}$$

This expression provides a basis for deriving conclusions, grounded in both biased-network theory and Markov chains, regarding the conditions under which a RDS will produce an unbiased sample, that is, a sample in which the proportion of each group in the sample equals that group's proportion in the population.

Given that inbreeding affects recruitment, the equilibrium group composition (E) need not correspond to the true underlying population distribution (P).
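Equations 5 through 11 can be checked numerically. The sketch below builds the selection probabilities from inbreeding terms and population proportions, computes the two-group equilibrium both from equation 9 and from the closed form in equation 11, and compares the result with the population share. The function names and the example parameter values are illustrative assumptions, not figures from the study.

```python
def selection_matrix(inbreeding, population):
    """Selection probabilities S[x][y] built from equations 5 and 6."""
    n = len(population)
    return [
        [
            inbreeding[x] + (1 - inbreeding[x]) * population[x]
            if x == y
            else (1 - inbreeding[x]) * population[y]
            for y in range(n)
        ]
        for x in range(n)
    ]


def two_group_equilibrium(i_a, i_b, p_a):
    """Equilibrium share of group A via equation 9.

    Undefined when both inbreeding terms equal 1, because the two groups
    are then completely isolated from one another.
    """
    s = selection_matrix([i_a, i_b], [p_a, 1 - p_a])
    s_aa, s_ba = s[0][0], s[1][0]
    return s_ba / (1 - s_aa + s_ba)


def two_group_equilibrium_closed_form(i_a, i_b, p_a):
    """Equilibrium share of group A via equation 11."""
    return p_a * (i_b - 1) / (p_a * i_b - p_a * i_a + i_a - 1)


if __name__ == "__main__":
    # Hypothetical values: group A is 30% of the population.
    for i_a, i_b in [(0.6, 0.2), (0.2, 0.6), (0.4, 0.4)]:
        e_a = two_group_equilibrium(i_a, i_b, 0.3)
        print(f"I_a={i_a}, I_b={i_b}: E_a={e_a:.3f} versus P_a=0.300")
```

With these hypothetical values, the group with the stronger inbreeding term is over-represented at equilibrium, and the equilibrium share coincides with the population share in the case where the two terms are the same.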
The extent to which members of any given group will be sampled depends on three factors: the size of the group, its tendency toward inbreeding, and the strength of inbreeding in other groups. Partial differentiation⁴ shows that a group's proportion in the sample increases, all else equal, as the group's size increases and as its inbreeding increases. Similarly, a group's proportion decreases with increases in inbreeding by other groups. Thus, a potentially small group might appear large in the respondent-driven sample if it has strong inbreeding and if members of other groups have weak inbreeding. Conversely, a large group might appear small in the respondent-driven sample if its inbreeding is weak, and if other groups have strong inbreeding. The effectiveness of the RDS as a means for drawing unbiased samples depends on whether these two possibilities are plausible.

4. Partial differentiation provides a means for determining whether the relationship between two terms is consistently positive, consistently negative, or mixed. The first step is to compute the first derivative of one term with respect to the other. The direction of the relationship is then found by determining whether, given constraints on parameter values, the derivative is always positive, always negative, or mixed. For example, to determine the relationship between equilibrium sample size for a group, and that group's inbreeding bias, differentiate $E_a$ with respect to $I_a$, i.e.,

$$\frac{\partial E_a}{\partial I_a} = \frac{P_a (1 - I_b)(1 - P_a)}{(P_a I_b - P_a I_a + I_a - 1)^2}$$

The numerator of this expression is positive, because $P_a$ is positive by assumption, $(1 - I_b)$ and $(1 - P_a)$ are positive because $I_b < 1$ and $P_a < 1$, and the product of three positive numbers is positive. Similarly, the denominator is positive because the square of any nonzero number is positive. Thus, $\partial E_a / \partial I_a$ is positive because its numerator and denominator are both positive.

The conditions under which RDS produces unbiased samples can be deduced through analysis of equation 11 above. This can be done by identifying the conditions under which $E_a$ equals $P_a$. First, substitute $P_a$ for $E_a$ in equation 11 above:

$$P_a = \frac{P_a (I_b - 1)}{P_a I_b - P_a I_a + I_a - 1} \tag{12}$$

then, solve for $I_a$,

$$I_a = \frac{I_b (P_a - 1)}{P_a - 1} \tag{13}$$

which can be simplified as follows:

$$I_a = I_b \tag{14}$$

When described in terms of Figure 4, this conclusion means that when inbreeding is equal, the inflationary effect of each group's inbreeding is exactly offset by the deflationary effects of other groups' inbreeding. Thus the two inbreeding terms cancel one another, leaving sample size determined exclusively by population size. Though this proof applies to the two-group case, it extends to the three-group case, and has been confirmed by simulations in larger systems. This theorem may be formally stated as follows:

THEOREM THREE: A respondent-driven sample draws an unbiased sample if all groups' inbreeding terms are equal, i.e., for any group X, $E_x = P_x$, iff $I_x = I_y$ for any other group Y.

A limitation of the theorem is noteworthy. When inbreeding terms are very large, reflecting mutual isolation of the system's groups, the approach to equilibrium slows, e.g., when the terms exceed .99, scores of recruitment waves may be required to approximate equilibrium. Therefore, given the practical limitations that constrain the number of recruitment waves, equilibrium will be reached only when inbreeding is not extreme. The implication is that when the boundaries separating groups are virtually impassable, RDS should be used to draw samples from within such groups, and not across them, even should inbreeding terms prove to be equal.

Equality of inbreeding terms is a quite stringent condition. Hence, the question of how inbreeding terms are related to one another is crucial to an assessment of RDS's ability to avoid bias. Unfortunately, biased network theory provides no guidance regarding the relationships among inbreeding terms. However, other sociological theories provide reasons to expect that inbreeding terms will be at least positively related. A venerable principle from Simmel (1955) is that common enemies enhance group solidarity, implying that Balkanization has an epidemic quality, because when any single group strengthens its boundaries, that induces other groups to do the same. More recent research shows that anything that creates a sense of common fate is sufficient to strengthen group boundaries (Rabbie and Horowitz 1969). An example of this occurs when opportunities for association are determined by group identity. Thus, in a system where other groups strengthen their degree of inbreeding, a group will react by increasing its own inbreeding. Conversely, in a system where inbreeding is weak, the group will tend to avoid inbreeding. The cumulative effect of these processes creates a positive relation among inbreeding terms. If this line of theorizing is correct, the equal-inbreeding assumption is approximated in many social systems. Of course, this speculation requires empirical confirmation.

It might seem that the best way of empirically assessing the bias of a RDS would be to compare it with the population from which it was drawn. Unfortunately, in the current research, this is not possible given the hidden nature of that population. One approach might be to apply the method to a non-hidden population with known characteristics. Such a test would be valuable, but less than definitive.
Problems of masking do not arise in non-hidden populations, so the generalizability of the results to hidden populations would remain unclear. A second approach applicable to some hidden populations might be to use very aggressive measures to produce a saturation sample, so the total sample could then be compared with partial samples previously derived from a standard RDS. This more aggressive recruitment mechanism could involve an additional incentive, in which recruiters are rewarded not only for their own activities, but also for those of their recruits. This would create an incentive for recruiters and their recruits to cooperate by pooling their networks and social influence. One indication that such an approach might work is an innovation that spontaneously emerged at both sites. The male-dominated drug injection scenes created difficulties for some female recruiters, and without any prompting from staff, the same solution emerged at both sites: some women recruited in pairs. The results were highly effective, and suggest that the robustness of recruitment could be increased if all subjects have incentives to recruit in pairs. This shows the capacity of an RDS to harness and constructively channel drug injectors' creativity and energy.
Table III • Recruitment by Ethnicity in Towns 1, 2, and 3, and Comparisons with Population Distributions*

Comparison of Race/Ethnicity in Sample of Recruits (S), Population Living Below Poverty Line (P), and Total Population (T) by Town

Race/Ethnicity        Town 1                         Town 2                         Town 3
                      S       P**      T             S       P**      T             S       P**      T
Non-Hispanic Black    27.6%   23.7%    16%           11.4%   27.6%    10.9%         12.1%   9%       3.6%
Non-Hispanic White    54.3%   45.2%    69%           55.7%   61.3%    84.2%         56.9%   52%      82.8%
Hispanic              14.7%   26.7%    11.9%         31.4%   9.6%     3%            27.6%   39%      12.9%
Other                 3.4%    4.4%     3.1%          1.4%    1.5%     1.9%          3.4%    0%       .7%
Total                 100%    100%     100%          100%    100%     100%          100%    100%     100%
(n)                   (93)    (4,310)  (28,540)      (70)    (2,993)  (42,762)      (58)    (4,342)  (59,479)

Mean Discrepancy (D) Between Recruits and:
Town 1: Poor Population, D = 6.5% (r = .926); Total Population, D = 7.4% (r = .957)
Town 2: Poor Population, D = 10.92% (r = .804); Total Population, D = 14.5% (r = .946)
Town 3: Poor Population, D = 5.7% (r = .950); Total Population, D = 13.0% (r = .943)

Notes:
* Source, 1990 Census
** Definition of poverty: family of four earning below $12,674 for 1989

Available data permit only rough comparisons between the sample distributions and the hidden populations from which they are drawn. This involves comparing samples of recruits to the demographics of their communities, on the presumption that the hidden population will to some extent mirror the larger community. Table III shows comparisons of racial and ethnic composition between the sample of recruits and the racial and ethnic distribution of the total population, and between the sample and the distribution of the population living below the poverty line. The latter comparison is potentially more relevant, because injectors at both sites come disproportionately from the ranks of the poor.
Forexample,in aggregate morethanthreequarters ofsubjectsreported linelevelsofincome.Thisanalysis sub-poverty focusesonlyon subjectslivingwithinthesites'threeprincipaltowns. Subjectsscatteredin and moredistantcommunities wereexcluded,becausetoo fewsubjectscame surrounding fromany singlecommunity to providea meaningful samplesize. of several it is to make too much thesetentative not important comparisons, Though are notablefrominspection ofTableIII. First,as measuredby mean discrepancies findings betweenthesampleand population,each samplecorresponds morecloselyto thedistributionof personslivingbelow the povertyline thanto the totalpopulation.Thisno doubt reflects differential recruitment ofthepoorintodruginjection.Second,a majordiscrepancy ofHispanicsin thesample(31.4%) and in boththe existsin Town2 betweentheproportion totalpopulation(3%), and thepoverty a substantial population(9.6%). Thismayconstitute ofHispanicsoffrom10.5 to 3.2 times.Thisdiscrepancy over-sampling mayresultfromany numberof sources:changesin the Town'sethniccomposition since the 1990 census,in whichcase sampling is correct; of attributes the town's special Hispanicpopulationthatintroduce someunknownformofbias intothesample;or an artifact ofincomplete sampling.In minorities.Respondentany case, thisformof samplingdoes not appearto undersample drivensamplingcontinuesat thissite,and ethnographic fieldinvestigation has begun,so thesefurther resolve the issue. investigations may betweenthesampleand poorpopulationsis Finally,in Towns1 and 3 thecomparison Even in close,i.e., mean discrepancies relatively equal only6.5% and 5.7% respectively. Town2, wheretheapparentover-sampling ofHispanicsconstitutes an anomaly,thecomparison becomesrelatively close ifone excludesHispanics.Therefore, in thesecases RDS apto succeed in that a broad cross of the underlying include section pears producingsamples in Spreen'sand Zwaagstra'ssense. 
population,and henceis "representative" SensitivityAnalysis Giventheimportance oftheequal inbreeding conditionforRDS's abilityto drawunbiased samples,and giventhathoweverstrongly termsmayproveto be, relatedtheinbreeding never be can to coincide it is useful to a sensitivity they expected perform analysisto exactly, is violated.Figure4a redetermine whathappenswhentheequal inbreeding requirement in systems portstheresultsofsimulations composedoffourgroups.5Theverticalaxis represents the mean absolute discrepancybetween population and equilibriumsample are identical,to a distributions. Thistermcan varyfromzero,when the two distributions 5. The simulation in Figure4a involveda sevenstepprocedure: experiments reported issetrandomly. distribution Thiswasdonebychoosing fourrandomnumbers between0 and 1,and (1) Thepopulation eachbythesumofthefournumbers toyieldfourpositive numbers thatsumtoone. Thesenumbers represent dividing thesize ofeach ofthefourgroups,i.e, Pa, Pb,Pc,and Pd. of axiscorresponds toa range withinwhichinbreeding mustfall.Themidpoint (2) Eachpointon Figure6a's horizontal thisrangeis chosenrandomly, withtherequirement withinthesetofvaluesconsistent thatinbreeding cannotexceed oftheinterval mustlie betweenpoints.3 and 1,orbe lessthan0. Forexample,iftherangechosenis .6,themidpoint .7. of fromwithintheabove-chosen terms(Ia, Ib,Ic,andId)areselectedrandomly (3) Inbreeding range,e.g.,ifthemidpoint are chosenbetween.1 and .7. therangeis .4, and thelengthis .6, fourrandomnumbers are computedusingequations5 and 6. (4) Selectionprobabilities This content downloaded from 129.110.242.8 on Fri, 20 Dec 2013 16:14:08 PM All use subject to JSTOR Terms and Conditions Respondent-Driven Sampling maximumvalue of .5 in a systemwithfourgroups(i.e., fora systemof n = 4 groups,the maximumis 2/n= 2/4= .5). Thehorizontal axisrepresents therangewithinwhichinbreedterms fall. This lower limit is that termsare equal. 
zero, ing range's indicating all inbreeding The range'supperlimitis 1,indicating thatinbreeding mayvarywithinthefulltheoretically allowablerangeof0 to 1,and henceare maximally diverse.To reflect someoftheresource limitations thatnecessarily attendactualresearch, in thesimulations recruitment beganwith a homogeneoussetofinitialsubjects(theworst-case and continuedfor10 waves. possibility), Each simulationwas repeated1,000timesto determine themean absolutediscrepancy betweensampleand population(see the bold line in Figure4a). The area boundedby the dottedlinesrepresents theconfidence forthediscrepancy. interval Whentherangeis zero, and hence the inbreeding termsare all equal, discrepancies are minimized(i.e., 3.3% + 1.5%). Thereis not a perfectassociationbetweensampleand populationbecause of the limitation in thenumberofwaves. It is also apparentbyinspection ofFigure4a thatas the differences increases.However,even when amonginbreeding grows,samplingdiscrepancy is maximally diverse,witha rangeequal to one, a positiveassociationbetween inbreeding and in a samplingdiscrepancy sample populationcontinuesto exist,as reflected (i.e., 12.8% ? 7.7%) thatis substantially lessthanthetheoretically maximum. This possible corresponds to a correlation betweensampleand populationdistributions of r = .57 ? .114. Thus,the resultsshowthat,evenin theworst-case scenariowhereinbreeding is mostdisparate, a RDS nonethelessproducesa samplethatcontainsa cross-section oftheunderlying population. A morecompletecharacterization of the effects of violationsof the equal inbreeding concernstheeffect ofthemagnitude oftheterms.Fromtheoremthreewe know assumption thatifinbreeding is equal and thenumberofwavesis sufficient to reachequilibrium, samplingis unbiased. 
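The seven-step procedure of note 5, and the mean discrepancy measure reported in Table III, can be illustrated with a short Python sketch. It is a sketch under stated assumptions, not the author's implementation. Equations 5 and 6 are not reproduced in this section, so the code assumes the usual biased-network form S_xx = I_x + (1 - I_x)P_x and S_xy = (1 - I_x)P_y. It also assumes that the composition of each wave is obtained by applying that matrix to the previous wave. All function names are hypothetical.

```python
import random
import statistics


def mean_abs_discrepancy(sample, population):
    """Mean absolute difference between two distributions over the same groups."""
    return sum(abs(s - p) for s, p in zip(sample, population)) / len(sample)


def selection_matrix(pop, inbreeding):
    """ASSUMED form of equations 5 and 6: row x gives the probabilities that a
    recruiter from group x recruits from each group y."""
    n = len(pop)
    return [[(inbreeding[x] if x == y else 0.0) + (1.0 - inbreeding[x]) * pop[y]
             for y in range(n)]
            for x in range(n)]


def recruit_waves(seed_dist, matrix, waves):
    """Advance the group composition of recruits through `waves` waves."""
    dist = list(seed_dist)
    n = len(dist)
    for _ in range(waves):
        dist = [sum(dist[x] * matrix[x][y] for x in range(n)) for y in range(n)]
    return dist


def simulate(range_width, groups=4, waves=10, trials=1000, rng=None):
    """Return (mean, standard deviation) of the wave-10 discrepancy."""
    rng = rng or random.Random(0)
    results = []
    half = range_width / 2.0
    for _ in range(trials):
        # Step 1: random population distribution that sums to one.
        raw = [rng.random() for _ in range(groups)]
        pop = [r / sum(raw) for r in raw]
        # Step 2: random midpoint that keeps every inbreeding term in [0, 1].
        mid = rng.uniform(half, 1.0 - half)
        # Step 3: inbreeding terms drawn from within the chosen range.
        inbreeding = [rng.uniform(mid - half, mid + half) for _ in range(groups)]
        # Step 4: selection probabilities (assumed form, see above).
        matrix = selection_matrix(pop, inbreeding)
        # Step 5: homogeneous seeds, all from the first group, then ten waves.
        seeds = [1.0] + [0.0] * (groups - 1)
        final_wave = recruit_waves(seeds, matrix, waves)
        # Step 6: discrepancy between the final wave and the population.
        results.append(mean_abs_discrepancy(final_wave, pop))
    # Step 7: summarize across trials.
    return statistics.mean(results), statistics.stdev(results)


if __name__ == "__main__":
    # Table III, Town 1: sample of recruits versus population below poverty line.
    town1_sample = [27.6, 54.3, 14.7, 3.4]
    town1_poor = [23.7, 45.2, 26.7, 4.4]
    print(round(mean_abs_discrepancy(town1_sample, town1_poor), 2))  # 6.5
    print(simulate(0.0))  # all inbreeding terms equal
    print(simulate(1.0))  # inbreeding terms maximally diverse
```

Under these assumptions, equal inbreeding leaves only the fading residue of the seed group after ten waves, while a range of 1 produces visibly larger discrepancies. Exact values depend on the random draws and need not reproduce the figures reported above.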
However, a question remains: when inbreeding is unequal, does the strength of inbreeding affect sampling? Figure 4b depicts the results of a series of simulations that answer this question. As before, the vertical axis represents the discrepancy between the population and the sample. However, the horizontal axis now represents the magnitude of inbreeding. This is operationalized by first specifying the range within which the inbreeding terms fall. This was set at a range of .5, the mean theoretically possible value. Inbreeding terms within this range were then allowed to vary from a minimum midpoint value of .25 (a smaller midpoint value would not be possible because then some inbreeding would be negative), and a maximum midpoint value of .75 (a larger midpoint value would not be possible because then some inbreeding would exceed 1). As before, the bold line represents the discrepancy between sample and population, and the dotted line represents the confidence interval. RDS yields the most biased samples when inbreeding is greatest. However, even in this worst case scenario, the association between sample and population remains positive and substantial, as reflected in a mean discrepancy of 12.1% ± 8%. This corresponds to a correlation between sample and population distributions of r = .55. When inbreeding is weaker (e.g., less than 1/3), RDS yields substantially less biased samples (i.e., mean discrepancies are less than 4%).

In sum, the sensitivity analysis shows that departures from the equal inbreeding assumption must be substantial before they introduce substantial biases into the sample. The form of departure that has the most adverse effect occurs when inbreeding is very strong, and yet even under these conditions the populations and sample distributions remain associated.

5 (continued).
(5) Equilibrium sample distributions were estimated using the procedure described for Figure 3. To reflect the resource limitations that exist in most studies, recruitment was assumed to continue through only ten waves, and the initial seeds were assumed to come from the same group.
(6) The mean absolute difference between the estimated equilibrium distribution (i.e., the composition of the 10th wave of recruits), and the population distribution was computed.
(7) Steps 1 to 6 above were repeated 1,000 times, then the mean and standard deviation of the mean differences were computed.

Figure 4 • Population/Sample Discrepancy as a Function of Constraints on Inbreeding Biases. Panel A: mean discrepancy, with confidence interval, plotted against inbreeding bias range (0 to 1). Panel B: mean discrepancy, with confidence interval, plotted against inbreeding bias midpoint (.25 to .75), with the constraint interval set at .5.

Hence, even when the equal-inbreeding assumption is violated, RDS can be expected to produce good cross-sections of the target population.

Limitations of the above analysis should be noted. Recall Spreen and Zwaagstra's argument that samples of hidden populations cannot expect to do more than offer broad cross-sections of the target population. The above analysis suggests that RDS fulfills this goal, and that more may be possible. As shown above, biases in the first step of recruitment tend to decrease as the successive waves are recruited. Furthermore, under a broad array of circumstances (i.e., less than extreme and positively associated inbreeding levels), RDS can be theoretically expected to produce samples with only a modest degree of bias. Potentially, if means could be found for independently assessing inbreeding levels, samples could be weighted to eliminate these biases.
In addition, steering incentives, which provide a means for altering inbreeding levels, could be used to either over-sample groups of special interest, or to make inbreeding levels more nearly equal and thereby reduce the need to weight the samples. Another major remaining task is the investigation of these samples' consistency and the resulting variability of estimators. Only then will it be possible to compute standard errors and confidence intervals for population estimates, and thereby make RDS into a fully defensible statistical sampling procedure.

Conclusion

This paper presented empirical results from a new approach to accessing and sampling hidden populations, designed to reduce several deficiencies afflicting traditional forms of chain-referral samples. First, it is usually assumed that inferences about individuals must rely mainly on the initial sample, since additional individuals found by tracing chains are never found randomly or even with known biases. However, if the sampling process is allowed to continue through enough waves to reach equilibrium, its composition will be independent of the initial subjects. A second problem is that chain-referral samples tend to be biased toward more cooperative subjects. This problem can be reduced by RDS's dual incentive system. Individuals who resist researchers' appeals may nonetheless yield to appeals from their peers. A third, related problem is that chain-referral samples are biased by "masking" (protecting friends by not referring them). This problem can be reduced because recruitment incentives weaken the reluctance to approach reclusive peers. Finally, chain-referral samples may be biased toward subjects with large personal networks. There are several ways in which this problem can be addressed: weighting samples based on network sizes, saturating targeted areas, or using steering incentives to increase recruitment of subjects with traits associated with small personal networks.

RDS need not be employed alone. It also can be applied in combination with other methods. Consider first network sampling (Sudman et al 1988).
This form of sampling increases the amount of information obtained from each respondent by including questions about their personal networks. Ideally, respondents are selected by a procedure that results in each population member having a known probability of selection, but this is not possible if the population is hidden. Yet the implication of theorems one and two is that the choice of the initial sample need not introduce bias if a suitable procedure is followed. That is, the sampling should first proceed through multiple waves. This permits computation of recruitment networks as depicted in Table I. This, together with knowledge of the composition of the initial set of seeds, makes it possible to compute how many waves must be completed before the sample approximates the equilibrium distribution, and thereby becomes independent of its starting point. Usually this will be no more than three to five waves. Thus, the RDS analysis provides a principled basis for deciding how many waves must be completed before whatever biases were present in the selection of seeds are overcome. The subjects for the network sample are then drawn from that or subsequent waves. In essence, respondent-driven sampling serves as the initial stage of the sampling procedure, which leads from an initial set of subjects with an unknown bias, to a subsequent set of subjects that are independent of that bias.

RDS may be able to play a similar initial role in ethnographic investigation. When dealing with hidden populations, gaining ethnographic access to a suitably diverse set of informants is frequently a lengthy and uncertain process. RDS may provide a means both for speeding this process, and for ensuring that different sectors of the population are adequately represented among the informants. Thus, as in the case of RDS's combination with network sampling, the number of waves required to reach equilibrium should be computed, and then ethnographic informants are drawn from that and subsequent waves, until the desired number of informants is attained.

References

Bishop, Yvonne M. M., Stephen E. Fienberg, and Paul W. Holland
1975 Discrete Multivariate Analysis: Theory and Practice. Cambridge, Mass: MIT Press.

Broadhead, Robert S., and Douglas D. Heckathorn
1994 "AIDS prevention outreach among injection drug users: Agency problems and new approaches." Social Problems 41:473-495.

Broadhead, Robert S., Douglas D. Heckathorn, Jean-Paul C. Grund, L. Synn Stern, and Denise L. Anthony
1995 "Drug users versus outreach workers in combating AIDS: Preliminary results of a peer-driven intervention." Journal of Drug Issues 25(3) 531-564.

Coser, Lewis A
1956 The Functions of Social Conflict. Glencoe, Ill.: Free Press.

Deaux, Edward, and John W. Callaghan
1985 "Key informant versus self-report estimates of health behavior." Evaluation Review 9:365-368.

Erickson, Bonnie H
1979 "Some problems of inference from chain data." Sociological Methodology 10:276-302.

Fararo, Thomas J
1973 Mathematical Sociology: An Introduction to Fundamentals. New York: John Wiley and Sons.

Fararo, Thomas J., and John Skvoretz
1984 "Biased networks and social structure theorems: Part II." Social Networks 6:223-258.

Frank, Ove, and Tom Snijders
1994 "Estimating the size of hidden populations using snowball sampling." Journal of Official Statistics 10:53-67.

Goodman, L. A.
1961 "Snowball Sampling." Annals of Mathematical Statistics 32:148-170.

Heckathorn, Douglas D.
1990 "Collective sanctions and compliance norms: A formal theory of group-mediated social control." American Sociological Review 55:366-384.
1993 "Collective action and group heterogeneity: Voluntary provision versus selective incentives." American Sociological Review 58:329-350.
1996 "Dynamics and Dilemmas of Collective Action." American Sociological Review 61:250-278.

Heckathorn, Douglas D., and Robert S. Broadhead
1996 "Rational choice, public policy, and AIDS." Rationality and Society 8:235-260.

Kemeny, John G., and J. Laurie Snell
1960 Finite Markov Chains. Princeton, N.J.: Van Nostrand.

Killworth, Peter D., and H. Russell Bernard
1978/79 "The reversal small world experiment." Social Networks 1:159-192.

Klovdahl, Alden S.
1989 "Urban social networks: Some methodological problems and possibilities." In The Small World, M. Kochen (ed.), 176-210. Norwood, N.J.: Ablex.

Rabbie, Jacob M., and Murray Horwitz
1969 "Arousal of in-group-out-group bias by a chance win or loss." Journal of Personality and Social Psychology 13:269-277.

Rapoport, Anatol
1979 "A probabilistic approach to networks." Social Networks 2:1-18.

Simmel, Georg
1955 Conflict: The Web of Group Affiliations. Trans. Kurt H. Wolff and Reinhard Bendix. New York: Free Press.

Spreen, Marius
1992 "Rare populations, hidden populations, and link-tracing designs: What and why?" Bulletin de Methodologie Sociologique 6:34-58.

Spreen, Marius, and Ronald Zwaagstra
1994 "Personal network sampling, outdegree analysis and multilevel analysis: Introducing the network concept in studies of hidden populations." International Sociology 9:475-491.

Sudman, Seymour, and Graham Kalton
1986 "New developments in the sampling of special populations." Annual Review of Sociology 12:401-429.

Watters, John K., and Patrick Biernacki
1989 "Targeted sampling: Options for the study of hidden populations." Social Problems 36(4):416-430.

Welch, Susan
1975 "Sampling by referral in a dispersed population." Public Opinion Quarterly 39:237-245.