Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing Author(s): Yoav Benjamini and Yosef Hochberg Reviewed work(s): Source: Journal of the Royal Statistical Society. Series B (Methodological), Vol. 57, No. 1 (1995), pp. 289-300 Published by: Blackwell Publishing for the Royal Statistical Society Stable URL: http://www.jstor.org/stable/2346101 . Accessed: 15/02/2012 12:16 Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at . http://www.jstor.org/page/info/about/policies/terms.jsp JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact [email protected]. Blackwell Publishing and Royal Statistical Society are collaborating with JSTOR to digitize, preserve and extend access to Journal of the Royal Statistical Society. Series B (Methodological). http://www.jstor.org J. R. Statist.Soc. B (1995) 57, No. 1, pp. 289-300 Controllingthe False DiscoveryRate: a Practical and Powerful Approach to Multiple Testing By YOAV BENJAMINIt and YOSEF HOCHBERG Tel Aviv University, Israel [Received January1993. Revised March 1994] SUMMARY The common approach to the multiplicity problemcalls for controllingthe familywise errorrate(FWER). This approach,though,has faults,and we pointout a few.A different approach to problemsof multiplesignificancetestingis presented.It calls forcontrolling theexpectedproportionof falselyrejectedhypotheses -the falsediscoveryrate. This error rateis equivalentto theFWER whenall hypothesesare truebutis smallerotherwise.Therefore, in problemswherethe controlof the false discoveryrate ratherthan that of the FWER is desired,thereis potentialfora gain in power. A simplesequentialBonferronitypeprocedureis provedto controlthe falsediscoveryrate forindependentteststatistics, and a simulationstudyshows that the gain in power is substantial.The use of the new procedureand the appropriatenessof the criterionare illustratedwithexamples. Keywords: BONFERRONI-TYPE PROCEDURES; FAMILYWISE ERROR RATE; MULTIPLECOMPARISON PROCEDURES; p-VALUES 1. INTRODUCTION Whenpursuing researchers tendtoselectthe(statistically) multiple signifinferences, and supportof conclusions. icantonesforemphasis,discussion An unguarded use resultsin a greatlyincreasedfalsepositive(signifof single-inference procedures icance) rate. To controlthis multiplicity (selection)effect,classical multipleof committing comparison procedures (MCPs) aim to controltheprobability any consideration. The typeI errorin familiesof comparisonsundersimultaneous errorrate(FWER) is usuallyrequiredin a strongsense, controlof thisfamilywise ofthetrueand falsehypotheses i.e. underall configurations tested(see forexample Hochbergand Tamhane(1987)). EventhoughMCPs havebeenin use sincetheearly1950s,and in spiteof the forsomejournals,as wellas in some advocacyfortheiruse (e.g. beingmandatory suchas theFood and DrugAdministration institutions of theUSA), researchers In medicalresearch,forexample, havenot yetwidelyadoptedtheseprocedures. Godfrey (1985),Pococket al. (1987)and Smithet al. (1987)examinedsamplesof reportsof comparativestudiesfrommajor medicaljournals.They foundthat overlookvariouskindsof multiplicity, researchers and as a resultreporting tends treatment differences to exaggerate (Pocock et al., 1987). withclassicalMCPs whichcause theirunderutilization Some of thedifficulties in appliedresearchare as follows. tAddressfor correspondence:Departmentof Statistics,School of MathematicalSciences, Sackler Faculty for Exact Sciences, Tel Aviv University,Tel Aviv 69978, Israel. E-mail: [email protected] ? 1995 RoyalStatisticalSociety 0035-9246/95/57289 290 BENJAMINI AND HOCHBERG [No. 1, MCPs concernscompariof FWER controlling (a) Muchof themethodology aremultivariate whoseteststatistics andfamilies treatments sonsofmultiple arenotofthe encountered manyoftheproblems normal(or t). In practice, normal.In are notmultivariate type,and teststatistics multiple-treatments types. of different with statistics often combined fact,familiesare thatcontroltheFWER in thestrongsense,at levels (b) Classicalprocedures less in single-comparison tendto havesubstantially problems, conventional powerthantheper comparisonprocedureof thesamelevels. (c) Oftenthe controlof the FWER is not quiteneeded.The controlof the FWER is important whena conclusionfromthevariousindividualinferencesis likelyto be erroneouswhenat leastone of themis. Thismaybe are competing against thecase, forexample,whenseveralnewtreatments which anda singletreatment is chosenfromthesetoftreatments a standard, are declaredsignificantly betterthanthestandard.However,a treatment groupand a controlgroupare oftencomparedby testingvariousaspects The overall endpointsin clinicaltrialsterminology). oftheeffect (different evenifsome is superior neednotbe erroneous thatthetreatment conclusion are falselyrejected. of thenullhypotheses has been partiallyaddressedby therecentlineof research The firstdifficulty whichusetheobservedindividual p-values, Bonferroni-type procedures, advancing faithful to FWER control:Simes(1986),Hommel(1988),Hochberg whileremaining stillpresenta seriousproblem. (1988)and Rom (1990).The othertwodifficulties errorrate(PCER) approach,whichamounts whya percomparison Thisis probably is stillrecommended bysome(e.g. themultiplicity to ignoring problemaltogether, Saville(1990)). In In thisworkwe suggesta newpointof viewon theproblemof multiplicity. shouldbe takeninto oferroneous thenumber rejections problems manymultiplicity anyerrorwas made.Yet, at thesame accountand notonlythequestionwhether is inversely related of loss incurred erroneous rejections by time,theseriousness the of a desirable error From this point view, to thenumberof hypotheses rejected. the rejected rate to controlmay be the expectedproportionof errorsamong integrates rate(FDR). Thiscriterion whichwe termthefalsediscovery hypotheses, in of committed multipleSpj0tvoll's(1972) concernabout the number errors of a false withSoric's(1989)concernabouttheprobability problems, comparison givena rejection.We use thetermFDR afterSoris(1989),whoidentified rejection witha 'statistical discovery'. a rejectedhypothesis in Section2.1 a formaldefinition of the we some present After preliminaries, thiserrorrateare butimportant of controlling FDR. Two immediate consequences procedures. given:it impliesweakcontrolof FWER and it admitsmorepowerful In Section2.2 wepresent someexampleswherethecontroloftheFDR is desirable. FDR controlling and procedure In Section3 we presenta simpleBonferroni-type anddemonstration ofitsproperties. therestofthesectionis devotedto a discussion a simulation studyof thepowerof theprocedure. Section4 presents 2. FALSE DISCOVERY RATE of which m (null)hypotheses, Considertheproblemof testingsimultaneously the rejected.Table 1 summarizes moare true.R is the numberof hypotheses 19951 291 CONTROLLING FALSE DISCOVERY RATE TABLE 1 m nullhypotheses Numberof errorscommitted whentesting Declared Declared non-significant significant True null hypotheses Non-truenull hypotheses Total U T V S m-mO m-R R m mO ina traditional form.Thespecific m hypotheses areassumedto be known situation in advance.R is an observable randomvariable;U, V, S and T are unobservable randomvariables.If eachindividual nullhypothesis is testedseparately at levela, in a. We use theequivalent thenR = R(a) is increasing lowercase letters fortheir realizedvalues. In termsof theserandomvariables,the PCER is E(V/m) and the FWER is eachhypothesis atlevela guarantees P(V > 1).Testing thatE(V/m)S a. individually Testingindividually each hypothesis at levela/lmguarantees thatP(V > 1) < a. 2.1. Definitionof False DiscoveryRate of errorscommitted The proportion nullhypotheses by falselyrejecting can be viewedthrough therandomvariableQ = V/(V + S)-the proportion oftherejected whichare erroneously nullhypotheses we defineQ = 0 when rejected.Naturally, V + S = 0, as no errorof falserejectioncan be committed. Q is an unobserved (unknown)randomvariable,as we do not knowv or s, and thusq = v/(v+ s), and data analysis.We definethe FDR Qe to be the evenafterexperimentation of Q, expectation Qe = E(Q) = E{V/(V + S)} = E(V/R). Two properties of thiserrorrateare easilyshown,yetare veryimportant. aretrue,theFDR is equivalent to theFWER: in this (a) If all nullhypotheses case s = 0 and v = r, so if v = 0 thenQ = 0, and if v> 0 thenQ = 1, leading to P(V > 1) = E(Q) = Qe. Thereforecontrol of the FDR implies controlof theFWER in theweaksense. (b) Whenmo< m, the FDR is smallerthanor equal to the FWER: in this case,if v > 0 thenvir < 1, leadingto X(v> l) > Q. Takingexpectations on bothsideswe obtainP(V > 1) > Qe, and thetwocan be quitedifferent. As a result,any procedurethatcontrolsthe FWER also controlsthe FDR. theFDR only,itcanbe lessstringent, controls and However,ifa procedure a gainin powermaybe expected.In particular, thelargerthenumberof null hypotheses the non-true is, the largerS tendsto be, and so is the betweentheerrorrates.As a result,thepotential difference forincreasein are non-true. poweris largerwhenmoreof thehypotheses 292 BENJAMINIAND HOCHBERG [No. 1, 2.2. Examples Thefollowing examplesshowtherelevance ofFDR controlin sometypicalsituaofa largenumberof rejections tions.In additiontheyindicatethedesirability (disBoth are addressed the coveries). aspects by proceduregivenlaterin Section3. One typeof multiple-comparison probleminvolvesan overalldecision(concluinferences. An exampleof sion,recommendation, etc.)whichis basedon multiple thistypeof problemsis the'multiple end pointsproblem',whichwas used earlier to show thatFWER controlis not alwaysneeded.In thisexamplethe overall a new treatment decisionproblemis whetherto recommend over a standard ofnullhypotheses Discoveries herearerejections treatment. thattreatment claiming on specified endpoints.Theseconclusions thanstandard is no better aboutdifferent are of interest aspectsof thebenefitof thenewtreatment per se, but theset of willbe used to reachan overalldecisionregarding discoveries thenewtreatment. to makeas manydiscoveries as possible(whichwillenhancea We wishtherefore decisionin favourof thenewtreatment), subjectto controlof theFDR. Control as a smallproportion of anyerroris unnecessarily of theprobability of stringent, errorswillnot changethe overallvalidityof theconclusion. involvesmultiple Another typeofproblems separatedecisionswithout an overall An exampleofthistypeis themultiple-subgroups decisionbeingrequired. problem, arecomparedinmultiple and separaterecommenwheretwotreatments subgroups, treatments mustbe made forall subgroups.As usual dationson the preferred we wishto discoveras manyas possiblesignificant differences, thereby reaching of operationaldecisions,butwouldbe willingto admita prespecified proportion misses,i.e. willingto use an FDR controlling procedure. wheremultiple The thirdtypeinvolvesscreening problems, potentialeffects are One exampleis screening to weedoutthenulleffects. screened ofvariouschemicals Anotherexampleis testingmultiplefactorsin forpotentialdrugdevelopment. an experimental (2k say) design.In suchexampleswe wantto obtainas manyas factorsthat affectthe possiblediscoveries(candidatesfor drugdevelopments, qualityofa product)butagainwishto controltheFDR, becausetoolargea fraction of falseleads wouldburdenthesecondphase of theconfirmatory analysis. 2.3. Alternative Formulations to capturetheerrorratevaguelydescribed We havesuggested as 'theproportion of falsediscoveries' to discuss usingtheFDR. At thispointitmightbe illuminating formulations of thisconcept,and thusto motivateour choiceof FDR alternative further. therandomvariableQ at eachrealization is mostdesirUndoubtedly, controlling forif mO = m and evenif a singlehypothesis able. Thisis impossible, is rejected v/r= 1 and Q cannotthenbe controlled. Controlling (V/RIR > 0) has thesame - it is identically 1 in theabove configuration. problem Therefore E(V/RIR > 0) The FDR, instead,is P(R > 0) E(V/RIR > 0), and, as will cannotbe controlled. be shownlater,thisis possibleto control. thatSoric(1989) gave to 'theproportion Second,considerthe formulation of as Q' = E(V)/r. This quotientis neither falsediscoveries amongthediscoveries' but is a mixtureof expectations therandomvariableQ nor its expectation and It is not eventheconditionalexpectation of Q, namelyE(Q IR = r) realizations. = E(V IR = r)/r,whichhas againtheproblemof controlformO = m. 1995] CONTROLLING FALSE DISCOVERY RATE 293 1, and aretrueit is identically Third,considerE(V)/E(R). Whenall hypotheses again impossibleto control.A remedymaybe givenby eitheradding1 to the to solution,or by changingthedenominator a somewhatartificial denominator, in the same way will and denominator bothnumerator E(R IR > 0). Modifying againrunintoproblemsof controlwhenmo= m. 3. FALSE DISCOVERY RATE CONTROLLING PROCEDURE 3.1. TheProcedure Consider testingH1, H2, . . ., Hm based on the correspondingp-values P1, P2, ..., Pm. Let P(l) < P(2) < ... < P(m) be the orderedp-values, and denoteby Bonferronito P(i). Definethe following corresponding H(i) the nullhypothesis procedure: typemultiple-testing let k be thelargesti forwhichP(i) < mq (1) of false and foranyconfiguration teststatistics Theorem1. For independent theabove procedurecontrolstheFDR at q*. nullhypotheses, lemma,whoseproofis given Proof. The theoremfollowsfromthefollowing in AppendixA. thenrejectall H(i) i = 1, 2, ..., k. Lemma. For any 0 < mO < m independentp-values correspondingto truenull to and foranyvaluesthattheml = m - mO p-valuescorresponding hypotheses, can take, the multiple-testing proceduredefinedby the false null hypotheses procedure(1) above satisfiestheinequality PPm= pml) mOq*. (2) m are false.Whatever thejoint mO of thehypotheses E(Q IPmo+1 = Pi, * * Now, supposethatml = m distributionof P1I', ..., Pm',which correspondsto these false hypothesesis, (2) above we obtain inequality integrating < q*' E(Q) < MOq* m and theFDR is controlled. of theteststatistics to the corresponding Remark. Notethattheindependence is not neededfortheproofof thetheorem. falsenullhypotheses extension to by Simes(1986)as an exploratory Thisprocedurewas mentioned thatall nullhypotheses hypothesis his procedureforrejectionof theintersection are trueif,forsomei, P(i) < ia/m.WhereasSimes(1986)showedthathisproceHommel(1988) nullhypothesis, durecontrolsthe FWER undertheintersection forinference on individual doesnot hypotheses showedthattheextended procedure of the falsenull controlthe FWER in the strongsense:forsomeconfiguration of an erroneousrejectionis greaterthana. theprobability hypotheses, a different so that wayto utilizeSimes'sprocedure Hochberg(1988)hassuggested thefollowing procedure: it doescontroltheFWER in thestrongsense,byoffering 294 [No. 1, BENJAMINIAND HOCHBERG let k be the largesti for whichP(i) < m + 1 - i ; then reject all H(i) i = 1, 2, . . ., k. betweenHochberg'sprocedureand theFDR controlling Note therelationship and theFDR whenq* is chosento equal a. BothHochberg's procedure procedure P(m) whichstartby comparing procedures, controlling procedureare step-down -as ifa PCER approachhadbeen arerejected withax,andifsmallerall hypotheses taken.If P(m)> a, proceedto smallerp-valuesuntilone satisfiesthecondition. as in P(1) witha/rm, earlier,by comparing end,if notterminated The procedures At thetwoendstheprocedures are similar,but,in a pureBonferroni comparison. between,the sequence of p(i)s is compared with {1 - (i - 1)/m}ain the current The series procedure. ratherthanwith{ 1/(m+ 1 - i)}a in Hochberg's procedure, oftheFDR controlling methodis alwayslargerthan constants oflinearly decreasing ratiois as constantsof Hochberg,and theextreme thehyperbolically decreasing procedure largeas 4m/(m+ 1)2 at i = (m + 1)/2. This showsthatthe suggested methodandtherefore at leastas manyhypotheses as Hochberg's rejectssamplewise methodssuch as Holm's has also greaterpowerthan otherFWER controlling (1979). Procedure 3.2. Exampleof False DiscoveryRate Controlling activator(rt-PA)and plasminogen withrecombinant tissue-type Thrombolysis infarction activator (APSAC) in myocardial streptokinase anisoylated plasminogen theeffects Neuhauset al. (1992)investigated has beenprovedto reducemortality. ofrt-PAversusthoseobtainedwitha standard administration ofa newfront-loaded trialin 421 patientswithacute of APSAC, in a randomizedmulticentre regimen in thestudy: can be identified Four familiesof hypotheses infarction. myocardial wherethe problemis of showing (a) base-linecomparisons(11 hypotheses), equivalence; artery(eighthypotheses); (b) patencyof infarct-related ratesof patentinfarct-related artery(six hypotheses); (c) reocclusion treatment (15 (d) cardiac and othereventsafterthe startof thrombolitic hypotheses). FDR controlmaybe desired:we do notwishto concludethatthe In thislastfamily to theprevioustreatment is betterifit is merely treatment equivalent front-loaded in all respects. to theproblemof multiplicity thereis no attention In thepaper,however, (the and secondary). beingthedivisionof theend pointsintoprimary onlyexception as theyare,withno wordofwarning regarding Theindividual p-valuesarereported The authorsconcludethat theirinterpretation. theclinicalcoursewith despitemoreearlyreocclusions, 'Comparedto APSAC treatment, and a substantially withfewerbleedingcomplications rt-PAtreatment is morefavorable oftheinfarctduetoimproved lowerin-hospital earlypatency mortality rate,presumably relatedartery'. 1995] CONTROLLING FALSE DISCOVERY RATE 295 The statement aboutthemortality is based on a p-valueof 0.0095. and whichcontainsthecomparison ofmortality Considernowthefourth family, 14 othercomparisons. The orderedp(i)s forthe 15 comparisons made are 0.0001, 0.0004, 0.0019, 0.0095, 0.0201, 0.0278, 0.0298, 0.0344, 0.0459, 0.3240, 0.4262, 0.5719, 0.6528, 0.7590, 1.000. theFWER at 0.05, theBonferroni Controlling approach,using0.05/15= 0.0033, to the smallestp-values.These hyrejectsthe threehypotheses corresponding to reducedallergicreaction,and to two different pothesescorrespond aspectsof bleeding;theydo not includethe comparisonof mortality. Using Hochberg's procedureleavesus withthesamethreehypotheses rejected.Thus thestatement is unjustified fromthe classicalpoint reductionin mortality about a significant of view. withq* = 0.05, we nowcomparesequenUsingtheFDR controlling procedure withP(15). The firstp-valueto satisfythe tiallyeachP(i) with0.05i/15,starting constraint is P(4) as 0.013. ~~15 Thus we rejectthe fourhypotheses havingp-valueswhichare less thanor equal confidence thestatements about to 0.013. We maysupportnowwithappropriate mortality decrease,of whichwe did not have sufficiently strongevidencebefore. P(4) = P(4) 0.0095< - 0.05 = 3.3. AnotherLook at False DiscoveryRate Controlling Procedure The above FDR controlling procedurecan be viewedas a post hoc maximizing procedure, as thefollowing theoremsuggests. Theorem2. The FDR controlling proceduregivenby expression(1) is the constrained maximization solutionof thefollowing problem: choosea thatmaximizesthenumberof rejectionsat thislevel,r(a), (3) subjectto theconstraint am/r(a) < q*. Proof. Observethat,foreach a, if P(i) < a <P(i+ 1), thenr(a) = i. Furtherina overtherange sideofconstraint more,as theratioon theleft-hand (3) increases it is enoughto investigate caswhichare equal to one of on whichr(a) is constant, thep(i)s. This a = P(k) satisfiesthe constraint becausea/r(a) = P(k)/k< q*/m. thelargestpotentialas (largest theprocedure Byconsidering yieldsthe p(i)s) first, a withthelargestr(a) satisfying theconstraint. has also theappearanceof a simultaneous Thustheprocedure maximization of R and FDR controlbeingattempted afterexperimentation. Wheneachhypothesis at levela, theexpectednumberof wrongrejections is testedindividually satisfies theoutcomeof theexperiment, an upperbound E(V) < am. So, afterobserving of Qeis am/r(a). In viewof theobservedp-values,thelevela can now estimate the observednumberof rejectionsr(a) subjectto the be chosen,by maximizing on theimpliedFDR-likebound.As notedin theexamplesof Section2.2 constraint thisaspectof theprocedureis desirable. 296 BENJAMINI AND HOCHBERG [No. 1, 4. SOME POWER COMPARISONS procedurewithsome other We comparethe powerof our FDR controlling procedureswhichcontrolthe FWER. Underthe overallnull Bonferroni-type theFWER at levelq*. We taketheFWER theproposedmethodcontrols hypothesis methodto controltheFWER weakly methodsandtheFDR controlling controlling at thesamelevel,usingq* = a, and comparethepowerof themethodsfromthe It is clear fromthe configurations. approachesunderdifferent two different in Section2.1, property (b), thata methodwhichcontrolstheFDR is comment whichcontrolsthe FWER (in the morepowerfulthanits counterpart generally remainsto be investigated. of thedifference strongsense).The magnitude 4.1. The Setting study,wherethefamilyof We studiedthisquestionbyusinga largesimulation random normallydistributed of m independent is the expectations hypotheses is testedbya z-test,and the hypothesis variablesbeingequal to 0. Each individual of the We use q* = a = 0.05. The configurations are independent. teststatistics and thenumberof truly involvem = 4, 8, 16, 32 and 64 hypotheses, hypotheses were being3m/4,m/2,m/4 and 0. The non-zeroexpectations null hypotheses L in three and the and at 3L/4 following dividedintofourgroups placed L/4,L/2, ways: awayfrom0 in each group; decreasing (D) numberof hypotheses (a) linearly in each group; (b) equal (E) numberof hypotheses away from0 in each group. (I) numberof hypotheses (c) linearlyincreasing theexperiment. throughout werefixed(perconfiguration) Theseexpectations Thevarianceof all variableswas setto 1, andL was chosenat twolevels-5 and ratio. thesignal-to-noise 10-therebyvarying Theestimated standarderrorsofthe involved20000repetitions. Each simulation powerareabout0.0008-0.0016.As thesamenormaldeviateswereusedin a single and the withthesamenumberof hypotheses, acrossall configurations repetition in different were monotonically related,a positive alternatives configurations reducesthe varianceof a comparison was induced.This correlation correlation to belowtwicethevarianceof a single twomethodsor twoconfigurations between method. 4.2. Results of thefalse of theaveragepower(theproportion theestimates Fig. 1 presents whichare correctly rejected)for threemethods.The two FWER hypotheses theBonferroni (1988)method (dottedcurves)and Hochberg's methods, controlling procedure(full (brokencurves),are comparedwiththe new FDR controlling can be made fromtheresultsdisplayed. observations curves).The following (a) The powerof all the methodsdecreaseswhenthenumberof hypotheses control. testedincreases-thisis thecost of multiplicity wherethenon-null hypoth(b) The poweris smallestfortheD-configuration, eses are closerto the null, and is largestfor I (whichis obviousbut forcompleteness). mentioned CONTROLLING 1995] FALSE DISCOVERY (a) RATE 297 (c) (b) 1.0 0.6 ~0.4 1.0 0.8 18 36 * O0.2 z _0.6 . 0.0 aI) > 0.8 75 0.6 z to C\J0.4 1.0 0.8 .. . 0.4 0.2 _ _ _ _ _ _ _ _ _ _ _ __ 4 8 16 32 84 _ _ _ _ _ _ _ _ _ _ 4 8 16 32 64 4 8 16 32 64 Number ofTestedHypotheses Fig. 1. Simulation-based estimates of the average power (the proportion of the false null hypotheses ) and which are correctly rejected) for two FWER controlling methods, the Bonferroni ( . methods, and the FDR controlling procedure ( Hochberg's (1988) (?) ): (a) decreasing; (b) equally spread; (c) increasing The power of the FDR controllingmethodis uniformlylargerthan thatof the othermethods. (d) The advantage increaseswiththe numberof non-nullhypotheses. (e) The advantageincreasesin m. Therefore,the loss of power as m increases is relativelysmall for the FDR controllingmethod in the E- and 1-configurations. (c) BENJAMINI AND HOCHBERG 298 [No. 1, large.For example,testing is extremely (f) The advantagein somesituations equallyspreadin fourclustersfrom1.25to 5 so thatnone 32 hypotheses, suggested methodis 0.42.Theprocedure is true,thepoweroftheBonferroni half ofwhich to as few as four hypotheses, 0.65; testing increases thepower and 0.70 are 0.62 respectively. are true,thevalues to alternative (g) It is knownthatHochberg'smethodoffersa morepowerful itis important to notethat thetraditional Bonferroni method.Nevertheless, thegainin powerdue to thecontrolof theFDR ratherthantheFWER is methodovertheBonfermuchlargerthanthegainoftheFWER controlling ronimethod. intoa different form,itmaybe seenthatin someconfiguraCastingtheseresults whichwerenotrejectedbytheBonferroni hypotheses tionsup to halfthenon-null method,whenat leasthalfof are nowrejectedbytheFDR controlling procedure Evenwhenonly25% ofthehypotheses arenonarenon-null. thetestedhypotheses null,thegaininpoweris suchthatabouta quarteroftheequallyspacedhypotheses whichwerenot rejectedbeforeare now rejected. Fig. 1 allows us also to answera questionraisedby a referee,about how method.This errorrateis 0 when by theFDR controlling E(V/mo)is controlled can be approximated by theaveragelevelcraveat whichthe mo= 0 and otherwise individual are tested.Obviouslyaave is alwayslessthanq*, butformo hypotheses and awayfromm morecan be said: letRavebe theaveragenumberof rejections in Fig. 1). It followsthatmaxave< Raveq* fave be theaveragepower(displayed (mOotave + m1fave)q*, so aave < favem + m( - Therefore E(V/mo)looksas in Fig. 1 butis smallerby a factorof q* or evenless. For m = mothe erroris much closer to q*/mo than to q*: 0.0132, 0.0063, 0.0033, testedrespectively. 0.0017and 0.0009forthefour,eight,16,32 and 64 hypotheses 5. CONCLUSION testingin thispaperis philosophically The approachto multiplesignificance fromtheclassicalapproaches.The classicalapproachrequires thecontrol different of theFWER in thestrongsense,a conservative typeI errorratecontrolagainst of thehypotheses tested.The newapproachcallsforthecontrol anyconfiguration also thecontrolof theFWER in theweaksense. of theFDR instead,and thereby from thisis the desirablecontrolagainsterrorsoriginating In manyapplications multiplicity. maybe developed,including Withintheframework suggested, otherprocedures whichutilizethestructure of specificproblemssuchas pairwisecomprocedures direction,whichwe have already parisonsin analysisof variance.A different theideasofHochberg isto developan adaptivemethodwhichincorporates pursued, we haveonlyfocusedon presenting and Benjamini(1990).In thispaper,however, theFDR, and we have thenewapproachthatcalls forcontrolling andmotivating Thus demonstrated thatitcan be developedintoa simpleand powerful procedure. neednotbe large.Thismightcontribute thecostpaidforthecontrolofmultiplicity 299 CONTROLLING FALSE DISCOVERY RATE 1995] considerablyto the proliferationof a greaterawareness of multiple-comparison problems,and of cases wheresomethingis done about it. ACKNOWLEDGEMENTS We would like to expressour thanksto J.P. Shafferand J.W. Tukey, whose commentsand questionsabout an earlierdrafthelpedus to crystallizetheapproach presentedhere. We thank YettyVaron for her participationin the programming of the simulationstudyand YechezkelKling forpointingout a need to tightenthe originalproof of theorem1. APPENDIX A: PROOF OF LEMMA weproceed on m.As thecasem = 1 is immediate, Theproofofthelemmais byinduction by assumingthatthelemmais trueforanym' < m, and showingit to hold form + 1. 0 and are false,Q is identically If mo= 0, all nullhypotheses E(QIPi =PI * Pm= Pm) =0( m + I q*. to thetruenull If mo> 0, denotebyP, = 1, 2,..., mo,thep-valuescorresponding randomvariand thelargestof thesebyP'mo).Theseare U(0, 1) independent hypotheses, take ables.For ease of notationassumethatthemlp-valuesthatthefalsenullhypotheses are orderedPi < P2 6. . .S PM,. Finally,definejo to be thelargest0 < j < ml satisfying pi < MO0+j + (4) q* (4) at jo byp ". side of inequality and denotetheright-hand Conditioningon P'mo) = p E(QIPmo+i =Pi .. . Pm Pm) =lE(Q Pmo+1=Pi IPIo) ., Pm = Pm)fA(mo)(P) dp +P E(Q IPI PM Pmo+1=Pigs. * Pm = Pml)f1(mO)(P)dp (5) 1). withfP(mO)(P) = MOP In thefirst partp S p". Thusall mo + jo hypothesesare rejected,and Q (4), we obtain Evaluatingtheintegralfirst,and thenusinginequality (MO - in0 MO_ + _ ____+__-(P_ 6)MO q*_p_ MO+ioiMO+j0imn+1 inO +IO*pMO~ Jo - )mOi mo/(mo + jo). -__ i+ nO 1 q*(p )l)mO04 (6) In the second part of equation (5), considerseparatelyeach pj0 < pj < P'mo) = P < Pj+ I, along with pj0 < p" < P'mo) = P <Pjo +l. It is importantto note that, because of the can be rejectedas a resultof the way by whichjo and p" are defined,no hypothesis whenall hypotheses-trueand false-are valuesof P, Pj+1i Pj+2, . . Pml Therefore, and theirp-valuesthusordered,a hypothesis considered together, H(i) can be rejected onlyifthereexistsk, i 6 k < mO + j - 1, forwhichP(k) < {k/(m + 1)}q*, orequivalently , [No. 1, BENJAMINIAND HOCHBERG 300 P(k) p k mO+j-1 mO +j - 1 (7) * (m+l)p When conditioningon Pmo) = p, the P /p for i = 1, 2, . . ., mo - 1 are distributedas j are mo - 1 independentU(O, 1) random variables, and the pi/p for i = 1, 2, ..., (7) to between0 and 1. Usinginequality to falsenullhypotheses numbers corresponding is equivalentto usingprocedure(1), withthe testthe mo+ j - 1 = m' < m hypotheses constant{(mo + j - 1)/(m+ I)p}q* takingtherole of q*. Applyingnow theinduction we have hypothesis, m-1 q* M M m+j-1I E(QIPImo)PP = P.Pmo+1 P1. = PM=Pml) MO mO+j m-1q (m + I)p (m + 1)p I (8). Theboundin inequality (8) dependsonp, butnoton thesegment pj < p < Pj+ I forwhich it was evaluated,so |E(Q I P'(mo) p, Pmo+I =m+ q* = P I. (MO (P)dP . PM= PMl)f(mO) l)p(mo-2)dp mO < J q*fl, (m +4 q*mop(m0l) pt(mO -1) dp (9) theproofof thelemma. (6) and (9) completes Addinginequalities REFERENCES themeansof severalgroups.New Engl. J. Med., 311, 1450-1456. K. (1985)Comparing Godfrey, Y. (1988)A sharperBonferroni formultiple Hochberg, procedure testsof significance. Biometrika, 75, 800-803. formultiple Y. and Benjamini, Y. (1990)Morepowerful Hochberg, procedures significance testing. Statist.Med., 9, 811-818. Hochberg,Y. and Tamhane,A. (1987)MultipleComparison Procedures.New York:Wiley. testprocedure. Holm,S. (1979)A simplesequentially rejective multiple Scand.J. Statist.,6, 65-70. testprocedure basedon a modified Hommel,G. (1988)A stagewise rejective multiple Bonferroni test. Biometrika, 75, 383-386. Neuhaus,K. L., Von Essen,R., Tebbe,U., Vogt,A., Roth,M., Riess,M., Niederer, W., Forycki, F., Wirtzfeld, A., Maeurer,W., Limbourg,P., Merx,W. and Haerten,K. (1992) Improved in acutemyocardial infarction front-loaded administration ofthe thrombolysis ofAlteplase:results rt-PA-APSACpatencystudy(TAPS). J. Am. Coll. Card., 19, 885-891. in reporting Pocock,S. J.,Hughes,M. D. and Lee, R. J.(1987)Statistical problems of clinicaltrials. J. Am. Statist.Ass., 84, 381-392. testprocedure basedon a modified Rom,D. M. (1990)A sequentially rejective Bonferroni inequality. Biometrika, 77, 663-665. Saville,D. J. (1990) Multiplecomparisonprocedures:the practicalsolution.Am. Statistn,44, 174-180. Bonferroni formultiple testsof significance. Simes,R. J.(1986)An improved procedure Biometrika, 73, 751-754. Smith,D. E., Clemens,J., Crede,W., Harvey,M. and Gracely,E. J. (1987) Impactof multiple in randomized clinicaltrials.Am. J. Med., 83, 545-550. comparisons and effect sizeestimation. J.Am. Statist.Ass., 84, 608-610. "discoveries" Soric,B. (1989)Statistical of somemultiple Ann.Math.Statist., Spj0tvoll,E. (1972)On theoptimality comparison procedure. 43, 398-411.
© Copyright 2026 Paperzz