Data Acquisition and Statistical Analysis

"Facts are stubborn, but statistics are more pliable." - Mark Twain

Laura Saba, PhD
Assistant Professor
Department of Pharmaceutical Sciences
Skaggs School of Pharmacy and Pharmaceutical Sciences
University of Colorado Denver

Reproducible Research
“the idea that the ultimate product of research is the paper along with the full computational environment used to produce the results in the paper such as the code, data, etc. necessary for reproduction of the results and building upon the research” - Wikipedia

NSF Definition of Reproducibility/Replicability
• “reproducibility refers to the ability of a researcher to duplicate the results of a prior study using the same materials as were used by the original investigator”
• “replicability refers to the ability of a researcher to duplicate the results of a prior study if the same procedures are followed but new data is collected”

Reproducible Research
Fig 1. Studies reporting the prevalence of irreproducibility.
Freedman LP, Cockburn IM, Simcoe TS (2015) The Economics of Reproducibility in Preclinical Research. PLoS Biol 13(6): e1002165.

Cost of Irreproducible Research
Fig 2. Estimated US preclinical research spend and categories of errors that contribute to irreproducibility.
Freedman LP, Cockburn IM, Simcoe TS (2015) The Economics of Reproducibility in Preclinical Research. PLoS Biol 13(6): e1002165.

Objectives
• Data Acquisition
– Learn general guidelines for unbiased data collection
– Know the top 3 things to consider when storing data
– Explain who owns research data, and with whom and how it should be shared
• Statistical Analysis
– Learn how to identify outliers and what to do with them
– Recognize the tradeoffs between suitable, better, and best methods for statistical analyses
– Learn to identify common mistakes and deceptions in the display and interpretation of results

Data Acquisition: Data Collection

Data Collection
1. Appropriate Methods
– Garbage in, garbage out (biased data collection, e.g., in sample selection, gives biased results)
2. Attention to detail
– Accuracy in recording, interpretation, publications
3. Authorized
– HIPAA, hazardous materials, copyrights, etc.
4. Recording
– Hard copy evidence should be entered into a numbered, bound notebook
– Electronic evidence should be validated in some way to assure that it was actually recorded on a particular date and not changed at some later date
– Not only should data derived from the research be accurately recorded, but also detailed information on procedures, including materials used, e.g., chemical agents
Taken from ORI Introduction to RCR (http://ori.hhs.gov/education/products/RCRintro/)
Pannucci CJ, Wilkins EG. Identifying and avoiding bias in research. Plast Reconstr Surg. 2010 Aug;126(2):619-25.

Reasons to keep accurate records
• Reproducibility
• Future analyses
• Investigations of misconduct
• Proving ownership of intellectual properties
• Others?

Case Study from Responsible Conduct of Research
• Dr. Z is mentoring a “promising” medical student over the summer in his research lab
• Student’s project:
– cancer cell line that requires 3 weeks to grow in order to test for a specific antibody
– the student has already written a short paper on his work
• Dr. Z’s dilemma:
– after going over the raw data, some data were on pieces of yellow pads without clear identification of which experiment the data came from
– some of the experiments were repeated several times without explanation as to why
– Dr. Z is not happy about the data, but doesn’t want to discourage the student from pursuing a career in research
• What is the primary responsibility of the mentor?
• Should the student write a short paper and send it for publication?
• Should the mentor write a short paper and send it for publication?
• If you were the mentor, what would you do?

Data Acquisition: Data Storage

Data Storage
“Over time, data, as the currency of research, become an investment in research. If the data are not properly protected, the investment, whether public or private, could become worthless” – ORI Introduction to RCR

Considerations When Storing Data/Research
• Catastrophe
– Lab notebooks are in a “safe” place
– Electronic data are backed up and stored in a separate location
– Samples are stored properly to avoid contamination
• Confidentiality
– Information on human subjects (see HIPAA guidelines)
– Information on intellectual property
• Period of retention
– NIH generally requires 3 years after project end
– Other agencies may require up to 7 years after project end
– Other unforeseen uses…
Taken from ORI Introduction to RCR (http://ori.hhs.gov/education/products/RCRintro/)

Data Acquisition: Data Ownership/Sharing

Ownership/Data Sharing
Who owns the data?
• Researchers
• Funders
– Grants vs. Contracts
• Research Institutions
– e.g., “for the most part, NIH makes awards to institutions and not individuals” – NIH Data Sharing Policy and Implementation Guidance
• Data Sources
– Subjects
– Countries
Illustration by David Zinn
Taken from ORI Introduction to RCR (http://ori.hhs.gov/education/products/RCRintro/)

In the headlines…
http://www.sandiegouniontribune.com/news/2015/jul/02/ucsd-sues-usc-aisen/

A few interesting quotes from the NIH Data Sharing Policy and Implementation Guidance
• “Final research data are recorded factual material commonly accepted in the scientific community as necessary to document, support, and validate research findings.”
• “NIH expects timely release and sharing of data to be no later than the acceptance for publication of the main findings from the final dataset.”
• “For the most part, it is not appropriate for the initial investigator to place limits on the research questions and methods other investigators might pursue with the data.”

Case Study from Responsible Conduct of Research
Drs. K and W are conducting an NIH-funded, long-term (25 years), observational study of the health of pesticide applicators.
– Initial health assessment (health history, physical exam, blood and urine tests, DNA sample, and dust samples)
– Yearly health surveys and full health assessment every 4 years
After the first 15 years, they have:
• Published more than a dozen papers from the database
• Required an elaborate data-sharing agreement before releasing the data
Drs. K and W’s dilemma is that they recently received requests for access to the database from:
• A pesticide company
• A competing research team
• A radical environmental group with an anti-pesticide agenda
QUESTIONS
• How should Drs. K and W handle these requests to access their database?
• Is it ethical to require people who request data to sign elaborate data-sharing agreements?

Statistical Analysis

Tips for Reproducible Statistical Analyses
1. ALWAYS keep a version of the “most raw” data
– Record when and where it was created, so you can easily tell if it has been changed since creation
2. Use a scripting language
– Programs like R and SAS allow you to follow your steps exactly if you (or someone else) had to redo your analysis
– Easily execute and document QC steps
– Avoid copy/paste errors
3. Add comments/notes directly to program
– Why are you doing this step?
– What is the goal of this step?
4. Export precise tables/figures from program
– Avoid transposition errors
– Save time/energy when changes are requested in initial steps

Statistical Analysis: Outliers

Outliers
[Figure: two scatter plots of Weight (lbs) against Height (inches). Without Outlier: correlation coefficient = 0.7, p-value < 0.0001. With Outlier: correlation coefficient = 0.06, p-value = 0.5636.]

Outlier Mitigation
1. Identify
– 2 or 3 standard deviations from the mean
– Unrealistic values
– Inconsistent values
2. Investigate
– Was there a technical issue? A typo? etc.
– Is it even a possible true value?
3. Remediate with DOCUMENTATION
– Make a rule and write it down
4. Sensitivity analysis
– What would have happened if you hadn’t eliminated values? Is your result robust?

Case Study from Responsible Conduct of Research
Anonymous survey of college students on their opinions about academic integrity
• 20 questions (Likert scale)
• 10 open-ended questions
• 480 surveys administered (320 responses)
Issues:
1. 8 surveys appear to be practical jokes (obscenities, additional numbers added to scale, etc.)
– Some questions appear usable but some are not
2. 35 respondents appear to be confused about the scale
– They answer “5” when “1” is more logical given their other answers
3. 29 surveys have names on them when respondents were instructed not to do so
QUESTIONS:
1. How should the researchers deal with these issues with their data?
2. Should they try to edit/fix surveys that have problems?
3. Should they throw away any surveys? Which ones?
4. How might their decisions concerning the disposition of these surveys affect their overall results?

Statistical Analysis: Suitable, better, and best methods for analysis

Methods for Statistical Analysis
• What is the norm in the field?
• A spectrum of alternative statistical methods, from least to most rigorous:
– Bias, inappropriate method
– General method with stated assumptions
– Most statistically rigorous method that evaluates most/all assumptions
• Moving along the spectrum: increasing scope, increasing monetary and time costs, increasing precision

Know the assumptions of any test
• Is the same subject/sample measured more than once?
• Are the data normally distributed?
• Is there equal variance in each group?
• Are subjects randomly assigned to a treatment? Are they matched?
• Is temporal order assumed?

Statistical Analysis: Display and interpretation of results

Displaying Results
[Figure: “Confident in obtaining first NIH R01 Grant?”]

Interpreting Results
• Association vs. Causation
– Causation can only be proven in a carefully designed and carefully controlled prospective study
– Eating more chocolate will not cause you to become a Nobel Laureate
• Potential Confounding Issues
– Confounding variable: “extraneous variable in a statistical model that correlates with both the dependent variable and the independent variable” – Wikipedia
– e.g., Coffee drinkers are more likely to get lung cancer
– Smokers are more likely to be coffee drinkers, and smokers are more likely to get cancer

Highlights of Ethical Guidelines for Reporting Statistical Analysis/Results in Publications
From the American Statistical Association’s Ethical Guidelines for Statistical Practice
1. Report statistical and substantive assumptions made in the study.
2. Account for all data considered in a study and explain the sample(s) actually used.
3. Report the sources and assessed adequacy of the data.
4. Report the data cleaning and screening procedures used.
5. Clearly and fully report the steps taken to guard validity. Address the suitability of the analytic methods and their inherent assumptions relative to the circumstances of the specific study.

Acknowledgements/References
• Dr. Paula Hoffman
• Dr. Brandie Wagner
• ORC and CCTSI

References
Responsible Conduct of Research by Adil E. Shamoo and David B. Resnik. Second Ed. Oxford University Press, 2009.
NIH Data Sharing Policy and Implementation Guidance (https://grants.nih.gov/grants/policy/data_sharing/data_sharing_guidance.htm), March 5, 2003.
Ethical Guidelines for Statistical Practice, American Statistical Association (http://www.amstat.org/ASA/Your-Career/Ethical-Guidelines-for-Statistical-Practice.aspx), April 2016.
Introduction to RCR – 6. Data Management Practices, Office of Research Integrity, US Department of Health and Human Services (http://ori.hhs.gov/education/products/RCRintro/c06/0c6.html), Revised Edition August 2007.
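
Worked example: outlier sensitivity analysis
The Outlier Mitigation steps from the Outliers section (identify, remediate with a documented rule, run a sensitivity analysis) can be sketched in a scripting language. The sketch below is illustrative only. The height/weight values are made up, the 2-standard-deviation cutoff is one of the two cutoffs the slides mention, and the function names `flag_outliers` and `pearson_r` are hypothetical helpers, not part of any package. The "investigate" step (was it a typo? is the value even possible?) still has to be done by a person.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def flag_outliers(values, n_sd=2.0):
    """Documented rule: flag values more than n_sd sample SDs from the mean."""
    m, s = mean(values), stdev(values)
    return [i for i, v in enumerate(values) if abs(v - m) > n_sd * s]


# Made-up data for illustration; the last weight mimics a recording error.
heights = [60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 65]               # inches
weights = [110, 115, 120, 126, 130, 137, 141, 148, 152, 158, 165, 1000]  # lbs

# Step 1 (identify): apply the written-down rule to the raw data.
flagged = flag_outliers(weights, n_sd=2.0)

# Step 4 (sensitivity analysis): compare the result with and without
# the flagged observations, and report both.
keep = [i for i in range(len(weights)) if i not in flagged]
r_with = pearson_r(heights, weights)
r_without = pearson_r([heights[i] for i in keep], [weights[i] for i in keep])

print(f"flagged rows: {flagged}")
print(f"r with all data: {r_with:.2f}")
print(f"r without flagged rows: {r_without:.2f}")
```

Because the rule lives in the script rather than in a hand-edited spreadsheet, the raw data stay untouched and anyone rerunning the analysis can see exactly which rows were excluded and how much the conclusion depends on them.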