Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing
Author(s): Yoav Benjamini and Yosef Hochberg
Source: Journal of the Royal Statistical Society. Series B (Methodological), Vol. 57, No. 1
(1995), pp. 289-300
Published by: Blackwell Publishing for the Royal Statistical Society
Stable URL: http://www.jstor.org/stable/2346101
By YOAV BENJAMINI† and YOSEF HOCHBERG
Tel Aviv University, Israel
[Received January 1993. Revised March 1994]
SUMMARY
The common approach to the multiplicity problem calls for controlling the familywise error rate (FWER). This approach, though, has faults, and we point out a few. A different approach to problems of multiple significance testing is presented. It calls for controlling the expected proportion of falsely rejected hypotheses: the false discovery rate. This error rate is equivalent to the FWER when all hypotheses are true but is smaller otherwise. Therefore, in problems where the control of the false discovery rate rather than that of the FWER is desired, there is potential for a gain in power. A simple sequential Bonferroni-type procedure is proved to control the false discovery rate for independent test statistics, and a simulation study shows that the gain in power is substantial. The use of the new procedure and the appropriateness of the criterion are illustrated with examples.
Keywords: BONFERRONI-TYPE PROCEDURES; FAMILYWISE ERROR RATE; MULTIPLE-COMPARISON PROCEDURES; p-VALUES
1. INTRODUCTION
When pursuing multiple inferences, researchers tend to select the (statistically) significant ones for emphasis, discussion and support of conclusions. An unguarded use of single-inference procedures results in a greatly increased false positive (significance) rate. To control this multiplicity (selection) effect, classical multiple-comparison procedures (MCPs) aim to control the probability of committing any type I error in families of comparisons under simultaneous consideration. The control of this familywise error rate (FWER) is usually required in a strong sense, i.e. under all configurations of the true and false hypotheses tested (see for example Hochberg and Tamhane (1987)).
Even though MCPs have been in use since the early 1950s, and in spite of the advocacy for their use (e.g. being mandatory for some journals, as well as in some institutions such as the Food and Drug Administration of the USA), researchers have not yet widely adopted these procedures. In medical research, for example, Godfrey (1985), Pocock et al. (1987) and Smith et al. (1987) examined samples of reports of comparative studies from major medical journals. They found that researchers overlook various kinds of multiplicity, and as a result reporting tends to exaggerate treatment differences (Pocock et al., 1987).
Some of the difficulties with classical MCPs which cause their underutilization in applied research are as follows.
†Address for correspondence: Department of Statistics, School of Mathematical Sciences, Sackler Faculty for Exact Sciences, Tel Aviv University, Tel Aviv 69978, Israel.
E-mail: [email protected]
© 1995 Royal Statistical Society
(a) Much of the methodology of FWER controlling MCPs concerns comparisons of multiple treatments and families whose test statistics are multivariate normal (or t). In practice, many of the problems encountered are not of the multiple-treatments type, and test statistics are not multivariate normal. In fact, families are often combined with statistics of different types.
(b) Classical procedures that control the FWER in the strong sense, at levels conventional in single-comparison problems, tend to have substantially less power than the per comparison procedure of the same levels.
(c) Often the control of the FWER is not quite needed. The control of the FWER is important when a conclusion from the various individual inferences is likely to be erroneous when at least one of them is. This may be the case, for example, when several new treatments are competing against a standard, and a single treatment is chosen from the set of treatments which are declared significantly better than the standard. However, a treatment group and a control group are often compared by testing various aspects of the effect (different end points in clinical trials terminology). The overall conclusion that the treatment is superior need not be erroneous even if some of the null hypotheses are falsely rejected.
The first difficulty has been partially addressed by the recent line of research advancing Bonferroni-type procedures, which use the observed individual p-values, while remaining faithful to FWER control: Simes (1986), Hommel (1988), Hochberg (1988) and Rom (1990). The other two difficulties still present a serious problem. This is probably why a per comparison error rate (PCER) approach, which amounts to ignoring the multiplicity problem altogether, is still recommended by some (e.g. Saville (1990)).
In this work we suggest a new point of view on the problem of multiplicity. In many multiplicity problems the number of erroneous rejections should be taken into account and not only the question whether any error was made. Yet, at the same time, the seriousness of the loss incurred by erroneous rejections is inversely related to the number of hypotheses rejected. From this point of view, a desirable error rate to control may be the expected proportion of errors among the rejected hypotheses, which we term the false discovery rate (FDR). This criterion integrates Spjøtvoll's (1972) concern about the number of errors committed in multiple-comparison problems, with Soric's (1989) concern about the probability of a false rejection given a rejection. We use the term FDR after Soric (1989), who identified a rejected hypothesis with a 'statistical discovery'.
After some preliminaries, we present in Section 2.1 a formal definition of the FDR. Two immediate but important consequences of controlling this error rate are given: it implies weak control of FWER and it admits more powerful procedures. In Section 2.2 we present some examples where the control of the FDR is desirable. In Section 3 we present a simple Bonferroni-type FDR controlling procedure and the rest of the section is devoted to a discussion and demonstration of its properties. Section 4 presents a simulation study of the power of the procedure.
2. FALSE DISCOVERY RATE
Consider the problem of testing simultaneously $m$ (null) hypotheses, of which $m_0$ are true. $R$ is the number of hypotheses rejected. Table 1 summarizes the situation in a traditional form. The specific $m$ hypotheses are assumed to be known in advance. $R$ is an observable random variable; $U$, $V$, $S$ and $T$ are unobservable random variables. If each individual null hypothesis is tested separately at level $\alpha$, then $R = R(\alpha)$ is increasing in $\alpha$. We use the equivalent lower case letters for their realized values.

TABLE 1
Number of errors committed when testing m null hypotheses

                            Declared           Declared       Total
                            non-significant    significant
True null hypotheses        U                  V              m_0
Non-true null hypotheses    T                  S              m - m_0
                            m - R              R              m

In terms of these random variables, the PCER is $E(V/m)$ and the FWER is $P(V \ge 1)$. Testing individually each hypothesis at level $\alpha$ guarantees that $E(V/m) \le \alpha$. Testing individually each hypothesis at level $\alpha/m$ guarantees that $P(V \ge 1) \le \alpha$.
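The two guarantees above follow from linearity of expectation and Boole's inequality. A short derivation is sketched here, writing $\mathcal{I}_0$ (a label introduced for this sketch, not used in the paper) for the index set of the $m_0$ true null hypotheses and assuming each true null hypothesis is rejected with probability at most its test level:

```latex
% Per comparison testing at level \alpha (linearity of expectation):
E(V/m) = \frac{1}{m} \sum_{i \in \mathcal{I}_0} P(\text{reject } H_i)
       \le \frac{m_0}{m}\,\alpha \le \alpha .

% Bonferroni testing at level \alpha/m (Boole's inequality):
P(V \ge 1) \le \sum_{i \in \mathcal{I}_0} P(\text{reject } H_i)
           \le m_0\,\frac{\alpha}{m} \le \alpha .
```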
2.1. Definition of False Discovery Rate
The proportion of errors committed by falsely rejecting null hypotheses can be viewed through the random variable $Q = V/(V + S)$, the proportion of the rejected null hypotheses which are erroneously rejected. Naturally, we define $Q = 0$ when $V + S = 0$, as no error of false rejection can be committed. $Q$ is an unobserved (unknown) random variable, as we do not know $v$ or $s$, and thus $q = v/(v + s)$, even after experimentation and data analysis. We define the FDR $Q_e$ to be the expectation of $Q$,
$$Q_e = E(Q) = E\{V/(V + S)\} = E(V/R).$$
Two properties of this error rate are easily shown, yet are very important.
(a) If all null hypotheses are true, the FDR is equivalent to the FWER: in this case $s = 0$ and $v = r$, so if $v = 0$ then $Q = 0$, and if $v > 0$ then $Q = 1$, leading to $P(V \ge 1) = E(Q) = Q_e$. Therefore control of the FDR implies control of the FWER in the weak sense.
(b) When $m_0 < m$, the FDR is smaller than or equal to the FWER: in this case, if $v > 0$ then $v/r \le 1$, leading to $\chi(V \ge 1) \ge Q$, where $\chi$ denotes the indicator function. Taking expectations on both sides we obtain $P(V \ge 1) \ge Q_e$, and the two can be quite different. As a result, any procedure that controls the FWER also controls the FDR. However, if a procedure controls the FDR only, it can be less stringent, and a gain in power may be expected. In particular, the larger the number of the non-true null hypotheses is, the larger $S$ tends to be, and so is the difference between the error rates. As a result, the potential for increase in power is larger when more of the hypotheses are non-true.
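Property (a) can be checked sample by sample. The following is a minimal sketch, not part of the paper: the function names, the choice of m = 10 and the per-hypothesis level are illustrative assumptions. It computes Q with the convention Q = 0 when nothing is rejected, and confirms that when every null hypothesis is true Q coincides with the indicator of V >= 1, so the two averages (estimates of the FDR and the FWER) agree.

```python
import random


def false_discovery_proportion(v, s):
    """Q = V / (V + S), defined as 0 when no hypothesis is rejected."""
    r = v + s
    return v / r if r > 0 else 0.0


def simulate_all_null(m=10, alpha=0.05, reps=2000, seed=0):
    """All m nulls are true; each is tested separately at level alpha.

    The setup (m, alpha, reps) is an illustrative assumption.
    Returns one (Q, indicator of V >= 1) pair per repetition.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(reps):
        # Under a true null hypothesis the p-value is U(0, 1).
        v = sum(rng.random() <= alpha for _ in range(m))
        s = 0  # there are no false nulls, so no correct rejections
        pairs.append((false_discovery_proportion(v, s), 1.0 if v >= 1 else 0.0))
    return pairs


pairs = simulate_all_null()
fdr_estimate = sum(q for q, _ in pairs) / len(pairs)
fwer_estimate = sum(ind for _, ind in pairs) / len(pairs)
print(fdr_estimate, fwer_estimate)  # the two estimates coincide
```

When some null hypotheses are false, S can be positive and Q falls below the indicator, which is the content of property (b).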
2.2. Examples
The following examples show the relevance of FDR control in some typical situations. In addition they indicate the desirability of a large number of rejections (discoveries). Both aspects are addressed by the procedure given later in Section 3.
One type of multiple-comparison problem involves an overall decision (conclusion, recommendation, etc.) which is based on multiple inferences. An example of this type of problems is the 'multiple end points problem', which was used earlier to show that FWER control is not always needed. In this example the overall decision problem is whether to recommend a new treatment over a standard treatment. Discoveries here are rejections of null hypotheses claiming that treatment is no better than standard on specified end points. These conclusions about different aspects of the benefit of the new treatment are of interest per se, but the set of discoveries will be used to reach an overall decision regarding the new treatment. We wish therefore to make as many discoveries as possible (which will enhance a decision in favour of the new treatment), subject to control of the FDR. Control of the probability of any error is unnecessarily stringent, as a small proportion of errors will not change the overall validity of the conclusion.
Another type of problems involves multiple separate decisions without an overall decision being required. An example of this type is the multiple-subgroups problem, where two treatments are compared in multiple subgroups, and separate recommendations on the preferred treatments must be made for all subgroups. As usual we wish to discover as many as possible significant differences, thereby reaching operational decisions, but would be willing to admit a prespecified proportion of misses, i.e. willing to use an FDR controlling procedure.
The third type involves screening problems, where multiple potential effects are screened to weed out the null effects. One example is screening of various chemicals for potential drug development. Another example is testing multiple factors in an experimental ($2^k$ say) design. In such examples we want to obtain as many as possible discoveries (candidates for drug developments, factors that affect the quality of a product) but again wish to control the FDR, because too large a fraction of false leads would burden the second phase of the confirmatory analysis.
2.3. Alternative Formulations
We have suggested to capture the error rate vaguely described as 'the proportion of false discoveries' using the FDR. At this point it might be illuminating to discuss alternative formulations of this concept, and thus to motivate our choice of FDR further.
Undoubtedly, controlling the random variable $Q$ at each realization is most desirable. This is impossible, for if $m_0 = m$ and even if a single hypothesis is rejected $v/r = 1$ and $Q$ cannot then be controlled. Controlling $(V/R \mid R > 0)$ has the same problem: it is identically 1 in the above configuration. Therefore $E(V/R \mid R > 0)$ cannot be controlled. The FDR, instead, is $P(R > 0)\,E(V/R \mid R > 0)$, and, as will be shown later, this is possible to control.
Second, consider the formulation that Soric (1989) gave to 'the proportion of false discoveries among the discoveries' as $Q' = E(V)/r$. This quotient is neither the random variable $Q$ nor its expectation but is a mixture of expectations and realizations. It is not even the conditional expectation of $Q$, namely $E(Q \mid R = r) = E(V \mid R = r)/r$, which has again the problem of control for $m_0 = m$.
Third, consider $E(V)/E(R)$. When all hypotheses are true it is identically 1, and again impossible to control. A remedy may be given by either adding 1 to the denominator, a somewhat artificial solution, or by changing the denominator to $E(R \mid R > 0)$. Modifying both numerator and denominator in the same way will again run into problems of control when $m_0 = m$.
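The identity between the FDR and the conditional formulation used in this section follows directly from the convention $Q = 0$ when $R = 0$. Spelled out by conditioning on whether any rejection occurs:

```latex
E(Q) = E(Q \mid R > 0)\,P(R > 0) + E(Q \mid R = 0)\,P(R = 0)
     = E(V/R \mid R > 0)\,P(R > 0) + 0 \cdot P(R = 0)
     = P(R > 0)\,E(V/R \mid R > 0).
```

When $m_0 = m$ the conditional expectation equals 1, so the FDR reduces to $P(R > 0)$, a probability that a procedure can still keep below a chosen level.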
3. FALSE DISCOVERY RATE CONTROLLING PROCEDURE
3.1. The Procedure
Consider testing $H_1, H_2, \ldots, H_m$ based on the corresponding p-values $P_1, P_2, \ldots, P_m$. Let $P_{(1)} \le P_{(2)} \le \ldots \le P_{(m)}$ be the ordered p-values, and denote by $H_{(i)}$ the null hypothesis corresponding to $P_{(i)}$. Define the following Bonferroni-type multiple-testing procedure:
$$\text{let } k \text{ be the largest } i \text{ for which } P_{(i)} \le \frac{i}{m} q^*; \text{ then reject all } H_{(i)},\ i = 1, 2, \ldots, k. \qquad (1)$$
Theorem 1. For independent test statistics and for any configuration of false null hypotheses, the above procedure controls the FDR at $q^*$.
Proof. The theorem follows from the following lemma, whose proof is given in Appendix A.
Lemma. For any $0 \le m_0 \le m$ independent p-values corresponding to true null hypotheses, and for any values that the $m_1 = m - m_0$ p-values corresponding to the false null hypotheses can take, the multiple-testing procedure defined by procedure (1) above satisfies the inequality
$$E(Q \mid P_{m_0+1} = p_1, \ldots, P_m = p_{m_1}) \le \frac{m_0}{m} q^*. \qquad (2)$$
Now, suppose that $m_1 = m - m_0$ of the hypotheses are false. Whatever the joint distribution of $P''_1, \ldots, P''_{m_1}$, the p-values which correspond to these false hypotheses, is, integrating inequality (2) above we obtain
$$E(Q) \le \frac{m_0}{m} q^* \le q^*,$$
and the FDR is controlled.
Remark. Note that the independence of the test statistics corresponding to the false null hypotheses is not needed for the proof of the theorem.
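Procedure (1) can be sketched in a few lines of code. The version below is an illustration under stated assumptions rather than the authors' implementation: the function name `fdr_procedure` and the example p-values are hypothetical, and the result is reported in the original order of the hypotheses.

```python
def fdr_procedure(pvalues, q):
    """Return a list of booleans, True where the hypothesis is rejected.

    Sketch of procedure (1): find the largest i with p_(i) <= (i / m) * q
    and reject the hypotheses with the k smallest p-values.
    """
    m = len(pvalues)
    # Indices of the hypotheses, sorted by increasing p-value.
    order = sorted(range(m), key=lambda idx: pvalues[idx])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k = rank  # keep the largest rank that satisfies the bound
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, chosen only for illustration.
example = [0.30, 0.001, 0.04, 0.012, 0.9]
print(fdr_procedure(example, q=0.05))  # [False, True, False, True, False]
```

Because k is the largest qualifying index, a hypothesis whose own p-value exceeds its critical value is still rejected when a larger ordered p-value meets its bound.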
This procedure was mentioned by Simes (1986) as an exploratory extension to his procedure for rejection of the intersection hypothesis that all null hypotheses are true if, for some $i$, $P_{(i)} \le i\alpha/m$. Whereas Simes (1986) showed that his procedure controls the FWER under the intersection null hypothesis, Hommel (1988) showed that the extended procedure for inference on individual hypotheses does not control the FWER in the strong sense: for some configuration of the false null hypotheses, the probability of an erroneous rejection is greater than $\alpha$.
Hochberg (1988) has suggested a different way to utilize Simes's procedure so that it does control the FWER in the strong sense, by offering the following procedure:
$$\text{let } k \text{ be the largest } i \text{ for which } P_{(i)} \le \frac{\alpha}{m + 1 - i}; \text{ then reject all } H_{(i)},\ i = 1, 2, \ldots, k.$$
Note the relationship between Hochberg's procedure and the FDR controlling procedure when $q^*$ is chosen to equal $\alpha$. Both Hochberg's procedure and the FDR controlling procedure are step-down procedures, which start by comparing $P_{(m)}$ with $\alpha$, and if smaller all hypotheses are rejected, as if a PCER approach had been taken. If $P_{(m)} > \alpha$, proceed to smaller p-values until one satisfies the condition. The procedures end, if not terminated earlier, by comparing $P_{(1)}$ with $\alpha/m$, as in a pure Bonferroni comparison. At the two ends the procedures are similar, but, in between, $p_{(i)}$ is compared with $(i/m)\alpha$ in the current procedure, i.e. with $\{1 - (j - 1)/m\}\alpha$ at the $j$th step from the top, $j = m + 1 - i$, rather than with $\{1/(m + 1 - i)\}\alpha$ in Hochberg's procedure. The series of linearly decreasing constants of the FDR controlling method is always at least as large as the hyperbolically decreasing constants of Hochberg, and the ratio of Hochberg's constant to the present one is as small as $4m/(m + 1)^2$ at $i = (m + 1)/2$. This shows that the suggested procedure rejects samplewise at least as many hypotheses as Hochberg's method and therefore has also greater power than other FWER controlling methods such as Holm's (1979).
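The comparison of the two series of critical constants can be checked numerically. The sketch below is illustrative only: the helper name `critical_constants` and the choice m = 15 are assumptions. It tabulates $(i/m)\alpha$ and $\alpha/(m + 1 - i)$ for the $i$th ordered p-value and locates the index where Hochberg's constant is smallest relative to the FDR constant.

```python
def critical_constants(m, alpha):
    """Critical constants for the i-th ordered p-value, i = 1..m.

    Returns two lists: (i / m) * alpha for the FDR controlling procedure
    and alpha / (m + 1 - i) for Hochberg's procedure.
    """
    fdr = [i * alpha / m for i in range(1, m + 1)]
    hochberg = [alpha / (m + 1 - i) for i in range(1, m + 1)]
    return fdr, hochberg


m, alpha = 15, 0.05
fdr, hochberg = critical_constants(m, alpha)
# Ratio of Hochberg's constant to the FDR constant at each index.
ratios = [h / f for f, h in zip(fdr, hochberg)]
i_min = min(range(m), key=lambda j: ratios[j]) + 1  # 1-based index
print(i_min, ratios[i_min - 1], 4 * m / (m + 1) ** 2)
```

For odd m the smallest ratio occurs at i = (m + 1)/2 and equals 4m/(m + 1)^2, while at i = 1 and i = m the two constants coincide.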
3.2. Example of False Discovery Rate Controlling Procedure
Thrombolysis with recombinant tissue-type plasminogen activator (rt-PA) and anisoylated plasminogen streptokinase activator (APSAC) in myocardial infarction has been proved to reduce mortality. Neuhaus et al. (1992) investigated the effects of a new front-loaded administration of rt-PA versus those obtained with a standard regimen of APSAC, in a randomized multicentre trial in 421 patients with acute myocardial infarction. Four families of hypotheses can be identified in the study:
(a) base-line comparisons (11 hypotheses), where the problem is of showing equivalence;
(b) patency of infarct-related artery (eight hypotheses);
(c) reocclusion rates of patent infarct-related artery (six hypotheses);
(d) cardiac and other events after the start of thrombolytic treatment (15 hypotheses).
In this last family FDR control may be desired: we do not wish to conclude that the front-loaded treatment is better if it is merely equivalent to the previous treatment in all respects.
In the paper, however, there is no attention to the problem of multiplicity (the only exception being the division of the end points into primary and secondary). The individual p-values are reported as they are, with no word of warning regarding their interpretation. The authors conclude that
'Compared to APSAC treatment, despite more early reocclusions, the clinical course with rt-PA treatment is more favorable with fewer bleeding complications and a substantially lower in-hospital mortality rate, presumably due to improved early patency of the infarct-related artery'.
The statement about the mortality is based on a p-value of 0.0095.
Consider now the fourth family, which contains the comparison of mortality and 14 other comparisons. The ordered $p_{(i)}$s for the 15 comparisons made are
0.0001, 0.0004, 0.0019, 0.0095, 0.0201, 0.0278, 0.0298, 0.0344,
0.0459, 0.3240, 0.4262, 0.5719, 0.6528, 0.7590, 1.000.
Controlling the FWER at 0.05, the Bonferroni approach, using 0.05/15 = 0.0033, rejects the three hypotheses corresponding to the smallest p-values. These hypotheses correspond to reduced allergic reaction, and to two different aspects of bleeding; they do not include the comparison of mortality. Using Hochberg's procedure leaves us with the same three hypotheses rejected. Thus the statement about a significant reduction in mortality is unjustified from the classical point of view.
Using the FDR controlling procedure with $q^* = 0.05$, we now compare sequentially each $p_{(i)}$ with $0.05i/15$, starting with $p_{(15)}$. The first p-value to satisfy the constraint is $p_{(4)}$ as
$$p_{(4)} = 0.0095 \le \frac{4}{15}\,0.05 = 0.013.$$
Thus we reject the four hypotheses having p-values which are less than or equal to 0.013. We may support now with appropriate confidence the statements about mortality decrease, of which we did not have sufficiently strong evidence before.
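The three counts quoted in this example can be reproduced from the 15 ordered p-values. The following sketch is illustrative: the helper names `largest_index` and `count_rejections` are hypothetical, and the p-values are those listed in the text.

```python
P_VALUES = [0.0001, 0.0004, 0.0019, 0.0095, 0.0201, 0.0278, 0.0298, 0.0344,
            0.0459, 0.3240, 0.4262, 0.5719, 0.6528, 0.7590, 1.000]


def largest_index(sorted_p, threshold):
    """Largest 1-based i with p_(i) <= threshold(i); 0 if none qualifies."""
    k = 0
    for i, p in enumerate(sorted_p, start=1):
        if p <= threshold(i):
            k = i
    return k


def count_rejections(pvalues, level):
    """Number of hypotheses rejected by each of the three procedures."""
    p = sorted(pvalues)
    m = len(p)
    return {
        "bonferroni": sum(x <= level / m for x in p),
        "hochberg": largest_index(p, lambda i: level / (m + 1 - i)),
        "fdr": largest_index(p, lambda i: i * level / m),
    }


print(count_rejections(P_VALUES, 0.05))
# {'bonferroni': 3, 'hochberg': 3, 'fdr': 4}
```

The fourth rejection under the FDR controlling procedure is the mortality comparison with p = 0.0095, which falls below its critical value of (4/15) × 0.05 ≈ 0.0133 but above both 0.05/15 and Hochberg's 0.05/12.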
3.3. Another Look at False Discovery Rate Controlling Procedure
The above FDR controlling procedure can be viewed as a post hoc maximizing procedure, as the following theorem suggests.
Theorem 2. The FDR controlling procedure given by expression (1) is the solution of the following constrained maximization problem:
$$\text{choose } \alpha \text{ that maximizes the number of rejections at this level, } r(\alpha), \text{ subject to the constraint } \alpha m / r(\alpha) \le q^*. \qquad (3)$$
Proof. Observe that, for each $\alpha$, if $p_{(i)} \le \alpha < p_{(i+1)}$, then $r(\alpha) = i$. Furthermore, as the ratio on the left-hand side of constraint (3) increases in $\alpha$ over the range on which $r(\alpha)$ is constant, it is enough to investigate $\alpha$s which are equal to one of the $p_{(i)}$s. This $\alpha = p_{(k)}$ satisfies the constraint because $\alpha / r(\alpha) = p_{(k)}/k \le q^*/m$. By considering the largest potential $\alpha$s (largest $p_{(i)}$s) first, the procedure yields the $\alpha$ with the largest $r(\alpha)$ satisfying the constraint.
Thus the procedure has also the appearance of a simultaneous maximization of $R$ and FDR control being attempted after experimentation. When each hypothesis is tested individually at level $\alpha$, the expected number of wrong rejections satisfies $E(V) \le \alpha m$. So, after observing the outcome of the experiment, an upper bound estimate of $Q_e$ is $\alpha m / r(\alpha)$. In view of the observed p-values, the level $\alpha$ can now be chosen, by maximizing the observed number of rejections $r(\alpha)$ subject to the constraint on the implied FDR-like bound. As noted in the examples of Section 2.2 this aspect of the procedure is desirable.
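The equivalence stated in Theorem 2 can be illustrated numerically. The sketch below is an assumption-laden illustration, not the authors' code: the function names are hypothetical, and the search in `max_rejections` is restricted to levels equal to one of the observed p-values, as the proof permits.

```python
P_VALUES = [0.0001, 0.0004, 0.0019, 0.0095, 0.0201, 0.0278, 0.0298, 0.0344,
            0.0459, 0.3240, 0.4262, 0.5719, 0.6528, 0.7590, 1.000]


def fdr_k(sorted_p, q):
    """k from procedure (1): the largest i with p_(i) <= (i / m) * q."""
    m = len(sorted_p)
    k = 0
    for i, p in enumerate(sorted_p, start=1):
        if p <= i * q / m:
            k = i
    return k


def max_rejections(sorted_p, q):
    """Largest r(a) over levels a in {p_(1), ..., p_(m)} with a * m / r(a) <= q."""
    m = len(sorted_p)
    best = 0
    for a in sorted_p:
        r = sum(p <= a for p in sorted_p)  # number of rejections at level a
        if a * m / r <= q:
            best = max(best, r)
    return best


print(fdr_k(P_VALUES, 0.05), max_rejections(P_VALUES, 0.05))  # 4 4
```

For the example of Section 3.2 both routes give four rejections: the level 0.0095 yields the bound 0.0095 × 15/4 ≈ 0.036, whereas the next candidate level 0.0201 yields 0.0201 × 15/5 ≈ 0.060, which exceeds 0.05.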
4. SOME POWER COMPARISONS
We compare the power of our FDR controlling procedure with some other Bonferroni-type procedures which control the FWER. Under the overall null hypothesis the proposed method controls the FWER at level $q^*$. We take the FWER controlling methods and the FDR controlling method to control the FWER weakly at the same level, using $q^* = \alpha$, and compare the power of the methods from the two different approaches under different configurations. It is clear from the comment in Section 2.1, property (b), that a method which controls the FDR is generally more powerful than its counterpart which controls the FWER (in the strong sense). The magnitude of the difference remains to be investigated.
4.1. The Setting
We studied this question by using a large simulation study, where the family of hypotheses is the expectations of $m$ independent normally distributed random variables being equal to 0. Each individual hypothesis is tested by a z-test, and the test statistics are independent. We use $q^* = \alpha = 0.05$. The configurations of the hypotheses involve $m$ = 4, 8, 16, 32 and 64 hypotheses, and the number of truly null hypotheses being $3m/4$, $m/2$, $m/4$ and 0. The non-zero expectations were divided into four groups and placed at $L/4$, $L/2$, $3L/4$ and $L$ in the following three ways:
(a) linearly decreasing (D) number of hypotheses away from 0 in each group;
(b) equal (E) number of hypotheses in each group;
(c) linearly increasing (I) number of hypotheses away from 0 in each group.
These expectations were fixed (per configuration) throughout the experiment. The variance of all variables was set to 1, and $L$ was chosen at two levels, 5 and 10, thereby varying the signal-to-noise ratio.
Each simulation involved 20000 repetitions. The estimated standard errors of the power are about 0.0008-0.0016. As the same normal deviates were used in a single repetition across all configurations with the same number of hypotheses, and the alternatives in different configurations were monotonically related, a positive correlation was induced. This correlation reduces the variance of a comparison between two methods or two configurations to below twice the variance of a single method.
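A scaled-down version of one such configuration can be sketched as follows. This is a rough illustration under stated assumptions, not the original simulation program: the z-test is taken to be two-sided (the text does not specify), only the equal (E) configuration with m = 16, eight true nulls and L = 5 is run, far fewer repetitions are used, and all function names are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def two_sided_p(z):
    """Two-sided p-value of a z-test (two-sidedness is an assumption here)."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


def critical_value(method, rank, m, level):
    """Critical value for the p-value of the given rank (1 = smallest)."""
    if method == "bonferroni":
        return level / m
    if method == "hochberg":
        return level / (m + 1 - rank)
    if method == "fdr":
        return rank * level / m
    raise ValueError(f"unknown method: {method}")


def rejected_indices(pvalues, level, method):
    """Reject the k smallest p-values, k being the largest qualifying rank."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= critical_value(method, rank, m, level):
            k = rank
    return set(order[:k])


def average_power(means, level=0.05, reps=2000, seed=1):
    """Estimated proportion of false null hypotheses rejected, per method."""
    rng = random.Random(seed)
    false_nulls = [i for i, mu in enumerate(means) if mu != 0]
    totals = {"bonferroni": 0, "hochberg": 0, "fdr": 0}
    for _ in range(reps):
        # The same deviates are used for all three methods in a repetition.
        p = [two_sided_p(rng.gauss(mu, 1.0)) for mu in means]
        for method in totals:
            rejected = rejected_indices(p, level, method)
            totals[method] += sum(i in rejected for i in false_nulls)
    return {name: total / (reps * len(false_nulls)) for name, total in totals.items()}


# Equal (E) configuration: m = 16, half the nulls true, L = 5.
L = 5.0
means = [0.0] * 8 + [L / 4] * 2 + [L / 2] * 2 + [3 * L / 4] * 2 + [L] * 2
print(average_power(means))
```

Since the three sets of critical values are ordered, the rejection sets are nested in every repetition, so the estimated power of the FDR controlling procedure can never fall below that of Hochberg's, nor Hochberg's below Bonferroni's.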
4.2. Results
Fig. 1 presents the estimates of the average power (the proportion of the false hypotheses which are correctly rejected) for three methods. The two FWER controlling methods, the Bonferroni (dotted curves) and Hochberg's (1988) method (broken curves), are compared with the new FDR controlling procedure (full curves). The following observations can be made from the results displayed.
(a) The power of all the methods decreases when the number of hypotheses tested increases: this is the cost of multiplicity control.
(b) The power is smallest for the D-configuration, where the non-null hypotheses are closer to the null, and is largest for I (which is obvious but mentioned for completeness).
[Fig. 1: plot not reproduced. Panels (a), (b) and (c); horizontal axis: number of tested hypotheses (4, 8, 16, 32, 64).]
Fig. 1. Simulation-based estimates of the average power (the proportion of the false null hypotheses which are correctly rejected) for two FWER controlling methods, the Bonferroni (dotted curves) and Hochberg's (1988) (broken curves) methods, and the FDR controlling procedure (full curves): (a) decreasing; (b) equally spread; (c) increasing
(c) The power of the FDR controlling method is uniformly larger than that of the other methods.
(d) The advantage increases with the number of non-null hypotheses.
(e) The advantage increases in $m$. Therefore, the loss of power as $m$ increases is relatively small for the FDR controlling method in the E- and I-configurations.
(f) The advantage in some situations is extremely large. For example, testing 32 hypotheses, equally spread in four clusters from 1.25 to 5 so that none is true, the power of the Bonferroni method is 0.42. The procedure suggested increases the power to 0.65; testing as few as four hypotheses, half of which are true, the values are 0.62 and 0.70 respectively.
(g) It is known that Hochberg's method offers a more powerful alternative to the traditional Bonferroni method. Nevertheless, it is important to note that the gain in power due to the control of the FDR rather than the FWER is much larger than the gain of the FWER controlling method over the Bonferroni method.
Casting these results into a different form, it may be seen that in some configurations up to half the non-null hypotheses which were not rejected by the Bonferroni procedure are now rejected by the FDR controlling method, when at least half of the tested hypotheses are non-null. Even when only 25% of the hypotheses are non-null, the gain in power is such that about a quarter of the equally spaced hypotheses which were not rejected before are now rejected.
Fig. 1 allows us also to answer a question raised by a referee, about how $E(V/m_0)$ is controlled by the FDR controlling method. This error rate is 0 when $m_0 = 0$ and otherwise can be approximated by the average level $\alpha_{ave}$ at which the individual hypotheses are tested. Obviously $\alpha_{ave}$ is always less than $q^*$, but for $m_0$ away from $m$ more can be said: let $R_{ave}$ be the average number of rejections and $\beta_{ave}$ be the average power (displayed in Fig. 1). It follows that $m\alpha_{ave} \le R_{ave} q^* \approx (m_0 \alpha_{ave} + m_1 \beta_{ave}) q^*$, so
$$\alpha_{ave} \le \beta_{ave}\, \frac{m_1 q^*}{m - m_0 q^*}.$$
Therefore $E(V/m_0)$ looks as in Fig. 1 but is smaller by a factor of $q^*$ or even less. For $m = m_0$ the error is much closer to $q^*/m_0$ than to $q^*$: 0.0132, 0.0063, 0.0033, 0.0017 and 0.0009 for the four, eight, 16, 32 and 64 hypotheses tested respectively.
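The closing remark can be checked with simple arithmetic. The snippet below is a small illustration: the constant names are hypothetical, and the reported values are copied from the sentence above. It compares each reported error rate with $q^*/m_0$ and with $q^*$ for $q^* = 0.05$.

```python
Q_STAR = 0.05
# Error rates reported in the text for m = m0 = 4, 8, 16, 32 and 64.
REPORTED = {4: 0.0132, 8: 0.0063, 16: 0.0033, 32: 0.0017, 64: 0.0009}

for m0, observed in REPORTED.items():
    per_hypothesis = Q_STAR / m0
    closer_to_ratio = abs(observed - per_hypothesis) < abs(observed - Q_STAR)
    print(f"m0={m0:2d}  reported={observed:.4f}  q*/m0={per_hypothesis:.4f}  "
          f"closer to q*/m0: {closer_to_ratio}")
```

For every m the reported value lies slightly above $q^*/m_0$ (for example 0.0132 against 0.0125 for four hypotheses) and far below 0.05.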
5. CONCLUSION
The approach to multiple significance testing in this paper is philosophically different from the classical approaches. The classical approach requires the control of the FWER in the strong sense, a conservative type I error rate control against any configuration of the hypotheses tested. The new approach calls for the control of the FDR instead, and thereby also the control of the FWER in the weak sense. In many applications this is the desirable control against errors originating from multiplicity.
Within the framework suggested, other procedures may be developed, including procedures which utilize the structure of specific problems such as pairwise comparisons in analysis of variance. A different direction, which we have already pursued, is to develop an adaptive method which incorporates the ideas of Hochberg and Benjamini (1990). In this paper, however, we have only focused on presenting and motivating the new approach that calls for controlling the FDR, and we have demonstrated that it can be developed into a simple and powerful procedure. Thus the cost paid for the control of multiplicity need not be large. This might contribute considerably to the proliferation of a greater awareness of multiple-comparison problems, and of cases where something is done about it.
ACKNOWLEDGEMENTS
We would like to express our thanks to J. P. Shaffer and J. W. Tukey, whose comments and questions about an earlier draft helped us to crystallize the approach presented here. We thank Yetty Varon for her participation in the programming of the simulation study and Yechezkel Kling for pointing out a need to tighten the original proof of theorem 1.
APPENDIX A: PROOF OF LEMMA
The proof of the lemma is by induction on $m$. As the case $m = 1$ is immediate, we proceed by assuming that the lemma is true for any $m' \le m$, and showing it to hold for $m + 1$.
If $m_0 = 0$, all null hypotheses are false, $Q$ is identically 0 and
$$E(Q \mid P_1 = p_1, \ldots, P_m = p_m) = 0 \le \frac{m_0}{m + 1} q^*.$$
If $m_0 > 0$, denote by $P'_i$, $i = 1, 2, \ldots, m_0$, the p-values corresponding to the true null hypotheses, and the largest of these by $P'_{(m_0)}$. These are $U(0, 1)$ independent random variables. For ease of notation assume that the $m_1$ p-values that the false null hypotheses take are ordered $p_1 \le p_2 \le \ldots \le p_{m_1}$. Finally, define $j_0$ to be the largest $0 \le j \le m_1$ satisfying
$$p_j \le \frac{m_0 + j}{m + 1} q^* \qquad (4)$$
and denote the right-hand side of inequality (4) at $j_0$ by $p''$.
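Conditioning on $P'_{(m_0)} = p$,
$$E(Q \mid P_{m_0+1} = p_1, \ldots, P_m = p_{m_1}) = \int_0^{p''} E(Q \mid P'_{(m_0)} = p, P_{m_0+1} = p_1, \ldots, P_m = p_{m_1})\, f_{P'_{(m_0)}}(p)\, dp + \int_{p''}^1 E(Q \mid P'_{(m_0)} = p, P_{m_0+1} = p_1, \ldots, P_m = p_{m_1})\, f_{P'_{(m_0)}}(p)\, dp \qquad (5)$$
with $f_{P'_{(m_0)}}(p) = m_0 p^{(m_0 - 1)}$.
In the first part $p \le p''$. Thus all $m_0 + j_0$ hypotheses are rejected, and $Q \equiv m_0/(m_0 + j_0)$. Evaluating the integral first, and then using inequality (4), we obtain
$$\frac{m_0}{m_0 + j_0} (p'')^{m_0} \le \frac{m_0}{m_0 + j_0} \cdot \frac{m_0 + j_0}{m + 1}\, q^* (p'')^{(m_0 - 1)} = \frac{m_0}{m + 1}\, q^* (p'')^{(m_0 - 1)}. \qquad (6)$$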
In the second part of equation (5), consider separately each $p_{j_0} < p_j \le P'_{(m_0)} = p < p_{j+1}$, along with $p_{j_0} < p'' < P'_{(m_0)} = p < p_{j_0+1}$. It is important to note that, because of the way by which $j_0$ and $p''$ are defined, no hypothesis can be rejected as a result of the values of $p, p_{j+1}, p_{j+2}, \ldots, p_{m_1}$. Therefore, when all hypotheses, true and false, are considered together, and their p-values thus ordered, a hypothesis $H_{(i)}$ can be rejected only if there exists $k$, $i \le k \le m_0 + j - 1$, for which $p_{(k)} \le \{k/(m + 1)\} q^*$, or equivalently
$$\frac{p_{(k)}}{p} \le \frac{k}{m_0 + j - 1} \cdot \frac{m_0 + j - 1}{(m + 1) p}\, q^*. \qquad (7)$$
When conditioning on $P'_{(m_0)} = p$, the $P'_i/p$ for $i = 1, 2, \ldots, m_0 - 1$ are distributed as $m_0 - 1$ independent $U(0, 1)$ random variables, and the $p_i/p$ for $i = 1, 2, \ldots, j$ are numbers between 0 and 1 corresponding to false null hypotheses. Using inequality (7) to test the $m_0 + j - 1 = m' \le m$ hypotheses is equivalent to using procedure (1), with the constant $\{(m_0 + j - 1)/(m + 1)p\} q^*$ taking the role of $q^*$. Applying now the induction hypothesis, we have
$$E(Q \mid P'_{(m_0)} = p, P_{m_0+1} = p_1, \ldots, P_m = p_{m_1}) \le \frac{m_0 - 1}{m_0 + j - 1} \cdot \frac{m_0 + j - 1}{(m + 1) p}\, q^* = \frac{m_0 - 1}{(m + 1) p}\, q^*. \qquad (8)$$
The bound in inequality (8) depends on $p$, but not on the segment $p_j < p < p_{j+1}$ for which it was evaluated, so
$$\int_{p''}^1 E(Q \mid P'_{(m_0)} = p, P_{m_0+1} = p_1, \ldots, P_m = p_{m_1})\, f_{P'_{(m_0)}}(p)\, dp \le \int_{p''}^1 \frac{m_0 - 1}{(m + 1) p}\, q^*\, m_0 p^{(m_0 - 1)}\, dp = \frac{m_0}{m + 1}\, q^* \int_{p''}^1 (m_0 - 1) p^{(m_0 - 2)}\, dp = \frac{m_0}{m + 1}\, q^* \{1 - (p'')^{(m_0 - 1)}\}. \qquad (9)$$
Adding inequalities (6) and (9) completes the proof of the lemma.
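The final step can be written out explicitly. Adding the bound (6) for the first integral in equation (5) to the bound (9) for the second, the terms in $p''$ cancel and the claimed bound for $m + 1$ hypotheses remains:

```latex
E(Q \mid P_{m_0+1} = p_1, \ldots, P_m = p_{m_1})
  \le \frac{m_0}{m+1}\, q^* (p'')^{m_0 - 1}
    + \frac{m_0}{m+1}\, q^* \{ 1 - (p'')^{m_0 - 1} \}
  = \frac{m_0}{m+1}\, q^* .
```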
REFERENCES
Godfrey, K. (1985) Comparing the means of several groups. New Engl. J. Med., 311, 1450-1456.
Hochberg, Y. (1988) A sharper Bonferroni procedure for multiple tests of significance. Biometrika, 75, 800-803.
Hochberg, Y. and Benjamini, Y. (1990) More powerful procedures for multiple significance testing. Statist. Med., 9, 811-818.
Hochberg, Y. and Tamhane, A. (1987) Multiple Comparison Procedures. New York: Wiley.
Holm, S. (1979) A simple sequentially rejective multiple test procedure. Scand. J. Statist., 6, 65-70.
Hommel, G. (1988) A stagewise rejective multiple test procedure based on a modified Bonferroni test. Biometrika, 75, 383-386.
Neuhaus, K. L., Von Essen, R., Tebbe, U., Vogt, A., Roth, M., Riess, M., Niederer, W., Forycki, F., Wirtzfeld, A., Maeurer, W., Limbourg, P., Merx, W. and Haerten, K. (1992) Improved thrombolysis in acute myocardial infarction with front-loaded administration of Alteplase: results of the rt-PA-APSAC patency study (TAPS). J. Am. Coll. Card., 19, 885-891.
Pocock, S. J., Hughes, M. D. and Lee, R. J. (1987) Statistical problems in reporting of clinical trials. J. Am. Statist. Ass., 84, 381-392.
Rom, D. M. (1990) A sequentially rejective test procedure based on a modified Bonferroni inequality. Biometrika, 77, 663-665.
Saville, D. J. (1990) Multiple comparison procedures: the practical solution. Am. Statistn, 44, 174-180.
Simes, R. J. (1986) An improved Bonferroni procedure for multiple tests of significance. Biometrika, 73, 751-754.
Smith, D. E., Clemens, J., Crede, W., Harvey, M. and Gracely, E. J. (1987) Impact of multiple comparisons in randomized clinical trials. Am. J. Med., 83, 545-550.
Soric, B. (1989) Statistical "discoveries" and effect size estimation. J. Am. Statist. Ass., 84, 608-610.
Spjøtvoll, E. (1972) On the optimality of some multiple comparison procedure. Ann. Math. Statist., 43, 398-411.