An Introduction on Multiple Testing: False Discovery Control

Multiple Testing and FDR

Florent Chatelain
Gipsa-Lab / DIS, Grenoble INP
BasMatI, Porquerolles, 2015
Introduction
Multiplicity problem and chance correlation

Lottery
- Winning probability for a given ticket is very low...
- But among the huge number of tickets, the probability that there is at least one winning ticket is quite high!

Paul the octopus
- Paul predicted eight of the 2010 FIFA World Cup matches with a perfect score!
- Does it really mean that Paul is an oracle?

+ Large-scale experiments: multiplying the comparisons dramatically increases the probability of obtaining a good match by pure chance
Multiplicity problem for statistical testing

- T is the test statistic
- Rα is the region of rejection at level α: if H0 is true, Pr(T ∈ Rα) = α

Multiple testing issue
- N independent statistics T1, ..., TN obtained under the null H0
- Probability to reject at least one of the N null hypotheses:

    Pr(∃ Ti ∈ Rα) = 1 − Pr(T1, ..., TN ∉ Rα) = 1 − ∏_{i=1}^N Pr(Ti ∉ Rα)
                  = 1 − ∏_{i=1}^N (1 − α) = 1 − (1 − α)^N

- for the usual significance level α = 0.05, performing N = 20 tests gives a probability of 0.64 of finding a 'significant' discovery by pure chance...

+ Pr(at least one false positive) ≫ Pr(the i-th is a false positive)
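The 1 − (1 − α)^N computation above is easy to check numerically. The short Python sketch below (illustrative, not part of the original slides) compares the closed form with a Monte Carlo simulation of N independent tests performed under the null:

```python
import random

alpha = 0.05   # per-test significance level
N = 20         # number of independent tests

# Closed form: Pr(at least one rejection under H0) = 1 - (1 - alpha)^N
p_any = 1 - (1 - alpha) ** N
print(f"analytic:  {p_any:.3f}")   # ~0.642

# Monte Carlo check: under H0, each test rejects with probability alpha
random.seed(0)
trials = 100_000
hits = sum(
    any(random.random() < alpha for _ in range(N))
    for _ in range(trials)
)
print(f"simulated: {hits / trials:.3f}")
```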
Multiplicity problem in science

The Economist, 2013, "Unreliable research"
- Many published research findings in top-ranked journals are not, or only poorly, reproducible [Ioannidis, 2005]
- if the test power is only 0.4, there are on average 40 true positives for 45 false positives. Is this significant?
Large-Scale Hypothesis Testing [Efron, 2010]

Era of Massive Data Production
- "omics" revolution, e.g. microarrays measure the expression levels of tens of thousands of genes for hundreds of subjects
- astrophysics, e.g. the MUSE spectro-imager delivers cubes of 300 × 300 images for 3600 wavelengths: detecting faint sources leads to N ≈ 3 × 10^8 tests in a pixelwise approach

Large-Scale methodology
- statistical inference and hypothesis testing theory was developed in the early 20th century (Pearson, Fisher, Neyman, ...) for small data sets collected by individual scientists
+ corrections are needed to assess significance in large-scale experiments
Outline

Multiple testing error control
  Basic statistical hypothesis testing concepts
  Family-Wise Error Rate FWER
  False Discovery Rate FDR
FDR control: Benjamini-Hochberg Procedure
  BH Procedure
  Bayesian interpretation of FDR
  Empirical Bayes interpretation of BH procedure
Variations on FDR control and BH Procedure
  Improving power
  Dependence
  Learning the null distribution
Multiple testing error control
Basic statistical hypothesis testing concepts

Type I and Type II Errors

For an individual statistical hypothesis test:

                          Decision
  Actual      H0 retained                  H0 rejected
  H0 true     True Negative (TN), 1 − α    False Positive (FP), Type I Error α
  H0 false    False Negative (FN), Type II Error β    True Positive (TP), 1 − β

- False Positive ← false alarm
- False Negative ← missed detection
- α = Pr(Type I Error) ← significance level
- β = Pr(Type II Error)
- power π = Pr(True Positive) = 1 − β
P-values: a universal language for hypothesis testing

Intuitive definition
p-value ≡ probability of obtaining a result as extreme as or "more extreme" than the observed statistic, under H0

One-sided test example
- T is the test statistic, t_obs an observed realization of T
- H0 is rejected when t_obs is too large: Rα = {t : t ≥ ηα}
- p(t_obs) = Pr_H0(T ≥ t_obs)

Mathematical definition
Smallest value of α such that t_obs ∈ Rα:

  p(t_obs) = inf_α {α : t_obs ∈ Rα}
Property of p-values

- Note that p(t_obs) ≤ u ⇔ t_obs ∈ Ru, for all u ∈ [0, 1]
- Let P = p(T) be the random variable. If H0 is true,
    Pr_H0(P ≤ u) = Pr_H0(T ∈ Ru) = u
+ p-value ≡ transformation of the test statistic that is uniformly distributed under the null (whatever the distribution of T)

Statistical hypothesis test based on the p-value
- H0: the p-value has a uniform distribution on [0, 1]: P ∼ U([0, 1])
- H1: the p-value is stochastically smaller than U([0, 1]):
    Pr_H1(P ≤ u) = Pr_H1(T ∈ Ru) > u
+ the smaller p ≡ p(t_obs), the more decisively H0 is rejected
+ for a given α, H0 is rejected at level α if p ≤ α
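The uniformity of p-values under the null is easy to verify by simulation. This illustrative Python sketch (the function name is ours, not from the slides) draws null N(0, 1) statistics, maps them to upper-tail p-values via 1 − Φ, and compares the empirical quantiles with the uniform quantiles k/N, a numerical version of the qq-plot idea used later for null diagnostics:

```python
import math
import random

def p_value_one_sided(t_obs):
    """p(t) = Pr_H0(T >= t) for T ~ N(0,1), i.e. 1 - Phi(t)."""
    return 0.5 * math.erfc(t_obs / math.sqrt(2))

random.seed(1)
# Simulate test statistics under the null, transform them to p-values
pvals = sorted(p_value_one_sided(random.gauss(0, 1)) for _ in range(10_000))

# Under H0 the p-values should be ~ U([0, 1]): compare the empirical
# quantiles to the uniform quantiles (k+1)/N
max_gap = max(abs(p - (k + 1) / len(pvals)) for k, p in enumerate(pvals))
print(f"max deviation from uniform quantiles: {max_gap:.3f}")
```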
Counting the errors in multiple testing

- N hypothesis tests with a common procedure:

                     Decision
  Actual      H0 retained   H0 rejected   Total
  H0 true         V              U          N0
  H0 false        S              T          N1
  Total          N − R           R          N

- N0 = # true nulls, N1 = # true alternatives
- U = # False Positives ← Type I Errors
- T = # True Positives
- R = # Rejections

How to define, and control, a global Type I Error rate/criterion?
Family-Wise Error Rate FWER

Multiple testing setting for N tests
- H0^1, H0^2, ..., H0^N ≡ family of null hypotheses
- p1, p2, ..., pN ≡ corresponding p-values

Definition
- The family-wise error rate is
    FWER ≡ Pr(Reject at least one true H0^i) = Pr(U > 0)
- A FWER control procedure inputs a family of p-values p1, p2, ..., pN and outputs the list of rejected null hypotheses under the constraint
    FWER ≤ α
  for any preselected α
Bonferroni's correction and FWER control

Bonferroni's correction
Reject the null hypotheses H0^i for which pi ≤ α/N (N is the number of tests)

FWER control
Let I0 be the set of indexes of the true null hypotheses, and N0 = #I0:

  FWER = Pr( ∪_{i∈I0} {pi ≤ α/N} ) ≤ Σ_{i∈I0} Pr(pi ≤ α/N) = N0 α/N ≤ α,

where the first inequality is Boole's inequality Pr(∪i Ai) ≤ Σi Pr(Ai).

- Bonferroni's correction does not require the tests to be independent (the pi can be dependent)
- The Šidák correction 'improves' Bonferroni for independent tests by rejecting the H0^i for which pi ≤ 1 − (1 − α)^{1/N} ← equivalent to Bonferroni for small α/N: no real improvement
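Both corrections reduce to a single per-test threshold, so a minimal sketch is a few lines of Python (illustrative; the function names are ours). Note how close the two thresholds are for small α/N:

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H0_i when p_i <= alpha / N; controls FWER for any dependence."""
    n = len(pvals)
    return [i for i, p in enumerate(pvals) if p <= alpha / n]

def sidak(pvals, alpha=0.05):
    """Reject when p_i <= 1 - (1 - alpha)^(1/N); exact FWER = alpha for
    independent tests, but numerically close to alpha/N for small alpha/N."""
    n = len(pvals)
    threshold = 1 - (1 - alpha) ** (1 / n)
    return [i for i, p in enumerate(pvals) if p <= threshold]

pvals = [0.001, 0.009, 0.04, 0.2, 0.6]
print(bonferroni(pvals, alpha=0.05))  # alpha/N = 0.01 -> rejects indices 0, 1
print(sidak(pvals, alpha=0.05))       # threshold ~0.0102 -> same rejections
```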
Stepwise FWER control procedures

Ordered p-values p(1) ≤ p(2) ≤ ... ≤ p(N) and associated null hypotheses H0^(1), ..., H0^(N)

Step-down procedures
- Reject H0^(k) when p(j) ≤ t_{α,j} for j = 1, ..., k    (1)
  + learning-from-the-other-experiments idea
- Reject H0^(1), ..., H0^(k̂max), where k̂max is the largest index satisfying (1)
  ⇔ global threshold t̂α ≡ t_{α,k̂max} ← "testimation" problem
- Holm's procedure: t_{α,j} = α/(N − j + 1) ensures FWER control at level α (not requiring independence) ← uniformly more powerful than Bonferroni

Step-up procedures
- Hochberg's procedure: t̂α = t_{α,k̂max}, where k̂max is the largest index satisfying p(k) ≤ α/(N − k + 1)
+ uniformly more powerful than Holm, but requires the tests to be independent
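The step-down/step-up distinction is clearest in code. A possible Python sketch of both procedures (illustrative implementations, not from the slides); on the example below, Holm stops at the first ordered p-value exceeding its threshold, while Hochberg scans for the largest passing index and rejects everything up to it:

```python
def holm(pvals, alpha=0.05):
    """Step-down Holm procedure with thresholds t_{alpha,j} = alpha/(N-j+1).

    Walks through the ordered p-values and stops at the FIRST one exceeding
    its threshold; controls FWER at alpha without independence assumptions.
    """
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    rejected = []
    for j, i in enumerate(order, start=1):
        if pvals[i] <= alpha / (n - j + 1):
            rejected.append(i)
        else:
            break
    return sorted(rejected)

def hochberg(pvals, alpha=0.05):
    """Step-up Hochberg procedure: rejects up to the LARGEST k with
    p_(k) <= alpha/(N-k+1); more powerful than Holm, needs independence."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    k_max = 0
    for j, i in enumerate(order, start=1):
        if pvals[i] <= alpha / (n - j + 1):
            k_max = j
    return sorted(order[:k_max])

print(holm([0.01, 0.04, 0.03, 0.005]))      # [0, 3]
print(hochberg([0.01, 0.04, 0.03, 0.005]))  # [0, 1, 2, 3]
```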
Practical limits of FWER

- FWER is appropriate to guard against any false positive
- In many applications this is too stringent: we can accept several false positives as long as their number remains much "lower" than the number of true positives...
- More liberal variants:
    k-FWER ≡ Pr(Reject at least k true H0^i) = Pr(U ≥ k),
  but how to preselect a relevant k for a given problem?
+ Need to define a less stringent global Type I Error rate criterion, more useful in many applications
False Discovery Rate FDR [Benjamini and Hochberg, 1995]

"Discovery" terminology
- R ≡ # Discoveries (Rejections)
- U ≡ # False Discoveries (False Positives) ← Type I errors
- T ≡ # True Discoveries (True Positives)

                     Decision
  Actual      H0 retained   H0 rejected   Total
  H0 true         V              U          N0
  H0 false        S              T          N1
  Total          N − R           R          N

Definition
  FDP ≡ U/(R ∨ 1), where R ∨ 1 ≡ max(R, 1) ← False Discovery Proportion
  FDR ≡ E[FDP] = E[U/(R ∨ 1)] ← False Discovery Rate

+ single-test errors, or power, are calculated horizontally in the table
+ the False Discovery Rate is calculated vertically (Bayesian flavor)
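The R ∨ 1 convention is just a guard against the 0/0 case when nothing is rejected, as a two-line Python sketch makes explicit (illustrative helper, not from the slides):

```python
def fdp(n_false_discoveries, n_discoveries):
    """False Discovery Proportion U / (R v 1); the FDR is its expectation."""
    return n_false_discoveries / max(n_discoveries, 1)

print(fdp(2, 10))  # 0.2: 2 false discoveries among 10 rejections
print(fdp(0, 0))   # 0.0: the "R v 1" convention avoids 0/0
```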
FDR control is more liberal than FWER

A FWER control procedure controls FDR

  FDR = E[U/(R ∨ 1)]
      = E[U/(R ∨ 1) | U = 0] Pr(U = 0) + E[U/(R ∨ 1) | U > 0] Pr(U > 0)
      = E[U/R | U > 0] Pr(U > 0), where 0 ≤ U/R ≤ 1,
      ≤ Pr(U > 0) = FWER

+ A procedure controlling FWER at level α controls FDR at level α

An FDR control procedure controls the FWER in the weak sense
If all the nulls H0^1, ..., H0^N are true, then U = R and

  FDR = E[U/R | U > 0] Pr(U > 0) = 1 × Pr(U > 0) = FWER

+ A procedure controlling FDR at level q controls FWER at level q only when all null hypotheses are true
FDR control: Benjamini-Hochberg Procedure
Canonical example

Source detection (oversimplified) problem
Statistical linear model (source + noise):

  [X1, X2, ..., XN]ᵀ = μ [r1, r2, ..., rN]ᵀ + [ε1, ε2, ..., εN]ᵀ,  i.e.  Xi = μ ri + εi

- μ > 0 ← source response
- ri ∈ {0, 1} ← absence (ri = 0) or presence (ri = 1) of a source at the i-th location
- εi, 1 ≤ i ≤ N, are iid with N(0, 1) distribution ← Gaussian noise
- Xi is the i-th observation
Canonical example (cont'd)

Multiple testing problem
For each i:
- H0: null hypothesis ≡ absence of signal, i.e. ri = 0
- H1: alternative hypothesis ≡ presence of signal, i.e. ri = 1

Test statistics
For each i:
- Xi is the test statistic
- pi = 1 − Φ(Xi), where Φ is the standard normal cdf, is the associated p-value

How to choose a good threshold t so as to reject the tests such that pi ≤ t?
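This canonical model takes a few lines to simulate. The Python sketch below (illustrative; sizes match the slides' later example of N = 100 with N0 = 80 true nulls and μ = 3) generates the observations and the p-values pi = 1 − Φ(Xi); the null p-values behave uniformly while the signal p-values pile up near 0:

```python
import math
import random

random.seed(3)
N, N0, mu = 100, 80, 3.0   # N0 true nulls, N - N0 source locations

# r_i = 0 at the null locations, 1 where a source is present
r = [0] * N0 + [1] * (N - N0)
x = [mu * ri + random.gauss(0, 1) for ri in r]   # X_i = mu * r_i + eps_i

def p_value(xi):
    """p_i = 1 - Phi(x_i): upper-tail p-value under the N(0, 1) null."""
    return 0.5 * math.erfc(xi / math.sqrt(2))

pvals = [p_value(xi) for xi in x]
# Under H0 the p-values are ~U([0,1]); under H1 they concentrate near 0
print(f"median null p-value:   {sorted(pvals[:N0])[N0 // 2]:.3f}")
print(f"median signal p-value: {sorted(pvals[N0:])[(N - N0) // 2]:.4f}")
```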
Ordered p-values plot for N = 50, N0 = 40, μ = 2, α = 0.1

[Figure: ordered p-values p(1) ≤ p(2) ≤ ... ≤ p(N) plotted against the theoretical quantiles 1/N, 2/N, ..., 1 under the null, with the single-test control and Bonferroni thresholds shown; each point is labeled 0 (true null) or 1 (true alternative)]
Ordered p-values plot for N = 100, N0 = 80, μ = 3, α = 0.1

Try something between Bonferroni and single-test control: choose ti = q i/N (here q = α = 0.1)

[Figure: ordered p-values p(1) ≤ p(2) ≤ ... ≤ p(N) against the theoretical quantiles 1/N, 2/N, ..., 1 under the null, with the single-test control and Bonferroni thresholds; the resulting selection achieves FDP = 0.133]
Benjamini-Hochberg (BH) procedure

Ordered p-values p(1) ≤ p(2) ≤ ... ≤ p(N) and associated null hypotheses H0^(1), ..., H0^(N); let p(0) = 0 by convention

Step-up BH procedure
For a preselected control level 0 ≤ q ≤ 1, the BHq procedure rejects H0^(1), ..., H0^(k̂), where

  k̂ = max{ 0 ≤ k ≤ N : p(k) ≤ q k/N }

⇔ region of rejection R_BH = {p ≤ t̂q} with t̂q = q k̂/N

+ learning-from-the-other-experiments idea
+ "testimation" problem: blurs the line between testing and estimation
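A possible Python sketch of the step-up BHq procedure (illustrative implementation of the definition above). The example illustrates the step-up character: the smallest p-value fails its own threshold q/N, yet is still rejected because a later index k satisfies p(k) ≤ q k/N:

```python
def benjamini_hochberg(pvals, q=0.1):
    """Step-up BH_q procedure.

    Rejects H0^(1), ..., H0^(k_hat), where k_hat is the largest k with
    p_(k) <= q * k / N; under independence this controls the FDR at level
    (N0/N) * q. Returns the indices of the rejected hypotheses.
    """
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    k_hat = 0
    for k, i in enumerate(order, start=1):
        if pvals[i] <= q * k / n:
            k_hat = k
    return sorted(order[:k_hat])

# Step-up behavior: p_(1) = 0.013 > q/N = 0.0125, but k = 3 passes
# (0.02 <= 0.05 * 3/4), so the three smallest p-values are all rejected
print(benjamini_hochberg([0.013, 0.014, 0.02, 0.5], q=0.05))  # [0, 1, 2]
print(benjamini_hochberg([0.5, 0.6], q=0.1))                  # []
```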
FDR control of the BH procedure

Theorem [Benjamini and Hochberg (1995)]
Under the independence assumption among the tests, the BHq procedure controls the FDR at level

  FDR ≤ (N0/N) q ≤ q,

where N0 is the number of true null hypotheses.

- in practice N0 is unknown and bounded by N (π0 ≡ N0/N ≤ 1)
- BH procedure control can be extended beyond independence for special cases of positive dependence [Benjamini and Yekutieli (2001)]
- Typical value of q: no real conventional choice in the literature, though q = 0.1 seems to be popular
Popularity of FDR and BH procedure

[Figure: historical timeline and citation counts of the seminal paper [Benjamini and Hochberg, 1995] (many thanks to Marine Roux for the picture): Bonferroni correction (1935); Holm's procedure (1979); multiplication of FWER control procedures; FDR introduction with control under independence (Benjamini and Hochberg, 1995); FDR control under the PRDS assumption and upper bound for any dependence (Benjamini and Yekutieli, 2001); around 1990, DNA microarrays and clinical trials become the main applications of multiple testing, driving the increasing popularity of FDR; citations reach tens of thousands by 2013]

FDR for Big Data
Large-scale hypothesis testing in many fields:
- DNA microarrays, genomics, fMRI data, ...
- Several works with astronomical imaging applications since the early 2000s
Bayesian interpretation of FDR
Mixture model

Denote by X one test statistic, and by Γ any subset of the real line.

Two-group mixture model
The N test statistics are either null or non-null with prior probabilities
- π0 ≡ Pr(H0)
- π1 ≡ Pr(H1) = 1 − π0 (in practice, π0 will often be close to 1),
and respective distributions
- F0(Γ) ≡ Pr(X ∈ Γ | H0) ← null case
- F1(Γ) ≡ Pr(X ∈ Γ | H1) ← non-null case

The distribution of any X is the mixture

  F(Γ) = π0 F0(Γ) + π1 F1(Γ)
Bayesian Fdr

Classification problem
- We observe x ∈ Γ; does it correspond to the null group?
+ Applying Bayes' rule yields the posterior probability of the null:
    Pr(H0 | X ∈ Γ) = π0 F0(Γ)/F(Γ)

Bayesian false discovery rate [Efron (2004, 2010)]
- Γ is now the region of rejection of the null
- The Bayesian false discovery rate is defined as
    Fdr(Γ) ≡ Pr(H0 | X ∈ Γ) = π0 F0(Γ)/F(Γ)
Bayesian Fdr and positive FDR [Storey (2003)]

Positive FDR:
  pFDR ≡ E[U/R | R > 0]
+ FDR = pFDR × Pr(R > 0)

Theorem [Storey (2003)]
- R ≡ R(Γ) = # discoveries for the region of rejection Γ
- U ≡ U(Γ) = # false discoveries for the region of rejection Γ
If the Xi are independent and distributed according to the mixture model,

  Fdr(Γ) ≡ Pr(H0 | X ∈ Γ) = E[U(Γ)/R(Γ) | R(Γ) > 0] ← positive FDR

- the proof relies on U(Γ) | R(Γ) = k ∼ binomial distribution B(k, Fdr(Γ))
+ interpretation of a frequentist concept as a Bayesian one
Empirical Bayes Fdr estimate [Efron (2004, 2010)]

- F, F0 and F1 now denote the cdfs of the mixture, null and non-null distributions
- the test can be assumed to be left-sided: Γ = (−∞, t] and pi = F0(xi)

Estimation of Fdr(t) = π0 F0(t)/F(t)
- F0 is assumed to be known,
- π0 is unknown but usually close to 1,
- F1 is unlikely to be known in large-scale inference.
However F = π0 F0 + π1 F1 can be estimated by the empirical distribution:

  F̂(t) = #{xi ≤ t}/N

+ does not require specifying H1: robust to misspecification of the alternative
+ empirical Bayes: the prior on F is estimated from the observations
- empirical Bayes Fdr estimate: Fdr̂(t) = π0 F0(t)/F̂(t)
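The estimate is one division once F̂ is computed. A minimal Python sketch (illustrative; stated here on the p-value scale, where the null cdf is simply F0(t) = t, so the formula matches the slide with p-values playing the role of the statistics):

```python
def fdr_hat(pvals, t, pi0=1.0):
    """Empirical Bayes Fdr estimate on the p-value scale.

    With p-values as the test statistics the null cdf is F0(t) = t, so
    Fdr_hat(t) = pi0 * t / F_hat(t), with F_hat(t) = #{p_i <= t} / N the
    empirical cdf of the observed p-values.
    """
    n = len(pvals)
    f_hat = sum(p <= t for p in pvals) / n
    return pi0 * t / f_hat if f_hat > 0 else 1.0

pvals = [0.001, 0.003, 0.012, 0.04, 0.3, 0.5, 0.62, 0.8, 0.9, 0.95]
# 4 of the 10 p-values fall below t = 0.05: F_hat = 0.4, Fdr_hat = 0.125
print(f"Fdr_hat(0.05) = {fdr_hat(pvals, 0.05):.3f}")
```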
Equivalence between empirical Bayes Fdr control and the BH procedure

Ordered observations x(1) ≤ x(2) ≤ ... ≤ x(N)
- F0(x(i)) = p(i) and F̂(x(i)) = i/N  ⇒  Fdr̂(x(i)) = π0 (N/i) p(i)

Fdr control at level π0 q
Given a preselected q, find t̂ = max{t : Fdr̂(t) ≤ π0 q}
⇔ t̂ = max_i x(i) s.t. p(i) ≤ q i/N
⇔ reject H0^(1), ..., H0^(k̂), where k̂ is the largest index s.t. p(k) ≤ q k/N
⇔ BHq procedure

Fdr control and dependence
- F̂(t) is an unbiased estimator of F(t) even under dependence,
- Fdr̂ is a rather slightly upward-biased estimate of FDR even under dependence [Efron (2010)],
- the price of dependence is the variance of the estimator Fdr̂(t)
Variations on FDR control and BH Procedure
Estimation of the proportion π0 of true H0

Adaptive BH procedures [Benjamini et al. (2006), Storey et al. (2004)]
- the BH procedure overcontrols FDR: FDR(BHq) = π0 q, where π0 = N0/N
- an upward-biased estimator π̂0 of π0 can be plugged in to improve power
+ Adaptive BH procedure: BH procedure at control level q/π̂0, to obtain FDR control at the nominal level q

Storey's π0 estimator [Storey et al. (2004)]
- Survival function G(t) = 1 − F(t) of the p-values:
    G(λ) = π0 G0(λ) + π1 G1(λ) ≥ π0 G0(λ) = π0 (1 − λ)
+ for large enough λ, G1(λ) ≈ 0, thus π0 ≈ G(λ)/(1 − λ)

Based on the empirical survival function, the modified Storey estimator is

  π̂0(λ) = (#{pi > λ} + 1) / (N (1 − λ)),  for a given λ ∈ (0, 1)
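A possible Python sketch of the modified Storey estimator (illustrative; the truncation at 1 is a practical convention we add, since π0 is a proportion):

```python
def storey_pi0(pvals, lam=0.5):
    """Modified Storey estimator pi0_hat = (#{p_i > lambda} + 1) / (N (1 - lambda)).

    Rationale: for lambda large enough, almost all p-values above lambda
    come from true nulls, which are uniform, so #{p_i > lambda} is close
    to N0 * (1 - lambda). The estimate is truncated at 1 in practice.
    """
    n = len(pvals)
    return min(1.0, (sum(p > lam for p in pvals) + 1) / (n * (1 - lam)))

pvals = [0.001, 0.004, 0.01, 0.03, 0.08, 0.21, 0.47, 0.55, 0.7, 0.92]
print(f"pi0_hat = {storey_pi0(pvals, lam=0.5):.2f}")  # (3 + 1)/(10 * 0.5) = 0.8
# Adaptive BH: run the BH procedure at level q / pi0_hat instead of q
```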
Adaptive BH procedures: Storey's π0 estimator

Properties of the adaptive BH procedure with the modified Storey estimator π̂0(λ)
- exact control of FDR at nominal level q for independent tests,
- asymptotic control of FDR in case of weak dependence

Typical values of λ
Based on various simulations [Blanchard et al. (2009)]:
- λ = 1/2 ← "uniformly" more powerful than other adaptive procedures, but not robust to strong dependence (e.g. equicorrelation of the test statistics)
- λ = q ← powerful and quite robust to long-memory dependence
Extension of the BH procedure to dependent tests

Positive Regression Dependence on a Subset (PRDS) [Benjamini et al. (2001)]
The BH procedure still controls FDR at the nominal value q when the test statistic vector obeys the PRDS property, e.g. for one-sided tests:
- Gaussian vector with positive correlations,
- Studentized Gaussian PRDS vector, for q ≤ 0.5

Universal bound [Benjamini and Yekutieli (2001)]
For any dependence structure, the BH procedure still controls FDR at level

  FDR(BHq) ≤ π0 q c,  where π0 = N0/N and c = Σ_{i=1}^N 1/i ≈ log(N)

- too conservative to be useful in practice (more conservative than Bonferroni when the number of rejected tests k̂ is lower than c)
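The universal bound translates into the Benjamini-Yekutieli variant: run BH at the deflated level q/c. A possible Python sketch (illustrative implementation following the bound above):

```python
def benjamini_yekutieli(pvals, q=0.1):
    """BY procedure: the BH procedure run at level q / c(N),
    with c(N) = sum_{i=1}^N 1/i ~ log(N).

    Controls the FDR at level (N0/N) * q under ARBITRARY dependence,
    at the price of a much more conservative threshold.
    """
    n = len(pvals)
    c = sum(1 / i for i in range(1, n + 1))
    order = sorted(range(n), key=lambda i: pvals[i])
    k_hat = 0
    for k, i in enumerate(order, start=1):
        if pvals[i] <= q * k / (n * c):
            k_hat = k
    return sorted(order[:k_hat])

pvals = [0.0001, 0.002, 0.03, 0.2, 0.5]
print(benjamini_yekutieli(pvals, q=0.1))  # [0, 1]: 0.03 fails q*3/(5*c)
```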
Knockoff filter for dependent tests [Barber and Candes (2015)]

Statistical linear model

  y = Xβ + ε

- y ∈ R^n is the response vector
- ε ∈ R^n is a white Gaussian noise vector
- X ∈ R^{n×p} is a deterministic matrix of the p column predictors
- β ∈ R^p is the weight vector

Multiple testing problem: which predictors are associated with the response?
- H0^i: "βi = 0", for 1 ≤ i ≤ p
+ Knockoff construction to control FDR based on any model selection procedure
+ Application to large-scale hypothesis testing, and/or strong local dependence?
Null hypothesis specification diagnosis

- the BH procedure requires so little: only the choice of the test statistic and its specification when the null hypothesis is true
- it is crucial to check beforehand that the null is correctly specified!

Graphical diagnosis
The qq-plot of the p-values must be linear for large enough values

[Figure: qq-plot of the ordered p-values against uniform quantiles for a correctly specified H0; the upper part of the plot is linear, with slope π0^{-1}]
[Figure: the same qq-plot diagnostic for a misspecified H0: the plot deviates from the expected linear behavior for large p-values]
Learning the null distribution

Deviation from the theoretical null
- the theoretical null hypothesis is usually derived in an idealized framework and does not account for sample correlations, ...
+ unlikely to be correctly specified in large-scale testing!
+ possibility to detect and correct possible misspecification of the null hypothesis

Empirical null distribution [Efron (2010)]
Parametric H0 estimation: based on the observations that are the most likely under the theoretical H0, estimate the null parameters (and π0) by
- central matching
- maximum likelihood

Non-parametric H0 estimation
- permutation null distribution
Concluding remarks

- FDR is a very useful global error criterion that allows one to control a trade-off between Type I error and power
- The BH procedure is a very simple and quite robust procedure to control FDR
- The main remaining problems and challenges concern dependence: how to explicitly account for it?
Selected references

- Barber, R. F. and Candes, E. (2015), "Controlling the False Discovery Rate via Knockoffs," to appear in Annals of Statistics, arXiv preprint arXiv:1404.5609
- Benjamini, Y. and Hochberg, Y. (1995), "Controlling the false discovery rate: a practical and powerful approach to multiple testing," Journal of the Royal Statistical Society, Series B (Methodological), 289-300
- Benjamini, Y., Krieger, A. M. and Yekutieli, D. (2006), "Adaptive linear step-up procedures that control the false discovery rate," Biometrika, 93(3), 491-507
- Benjamini, Y. and Yekutieli, D. (2001), "The control of the false discovery rate in multiple testing under dependency," Annals of Statistics, 1165-1188
- Blanchard, G. and Roquain, E. (2009), "Adaptive False Discovery Rate Control under Independence and Dependence," Journal of Machine Learning Research, 10, 2837-2871
- Efron, B. (2004), "Large-scale simultaneous hypothesis testing," Journal of the American Statistical Association, 99(465)
- Efron, B. (2010), "Large-scale inference: empirical Bayes methods for estimation, testing, and prediction" (Vol. 1), Cambridge University Press
- Ioannidis, J. P. (2005), "Why most published research findings are false," PLoS Medicine, 2(8)
- Storey, J. D. (2002), "A direct approach to false discovery rates," Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(3), 479-498
- Storey, J. D. (2003), "The positive false discovery rate: A Bayesian interpretation and the q-value," Annals of Statistics, 2013-2035
- Storey, J. D., Taylor, J. E. and Siegmund, D. (2004), "Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach," Journal of the Royal Statistical Society: Series B (Statistical Methodology), 66(1), 187-205
Selected references with astrophysics applications

- Clements, N., Sarkar, S. K. and Guo, W. (2012), "Astronomical transient detection controlling the false discovery rate," Statistical Challenges in Modern Astronomy, Vol. V (pp. 383-396), Springer New York
- Friedenberg, D. A. and Genovese, C. R. (2013), "Straight to the Source: Detecting Aggregate Objects in Astronomical Images With Proper Error Control," Journal of the American Statistical Association, 108(502), 456-468
- Hopkins, A. M. et al. (2002), "A new source detection algorithm using the false-discovery rate," The Astronomical Journal, 123(2), 1086
- Miller, C. J. et al. (2001), "Controlling the false-discovery rate in astrophysical data analysis," The Astronomical Journal, 122(6), 3492
- Pacifico, P. M. et al. (2004), "False discovery control for random fields," Journal of the American Statistical Association, 99(468), 1002-1014
- Serra, P. et al. (2015), "SoFiA: a flexible source finder for 3D spectral line data," arXiv preprint arXiv:1501.03906
- Meillier, C. et al. (2015), "Error control for the detection of rare and weak signatures in massive data," to appear in Proc. of EUSIPCO 2015
Great talks on multiple testing / FDR available on the web

- Genovese, C. R., "A Tutorial on False Discovery Control,"
  http://www.stat.cmu.edu/~genovese/talks/hannover1-04.pdf
- Roquain, E. (2014), "False Discovery Rate. Part I: Introduction et Enjeux,"
  http://www.proba.jussieu.fr/pageperso/picard/roquain-pdv-part1.pdf
- Benjamini, Y. (2005), "Discovering the False Discovery Rate,"
  http://www.researchgate.net/profile/Yoav_Benjamini/publication/227532102_Discovering_the_false_discovery_rate/links/0c960517bfb2c2d516000000.pdf
Supplementary materials

Some proofs of the FDR control of the BH procedure:
- FDR control under independence
- FDR control under the PRDS property
- universal FDR bound for an arbitrary dependence structure
Proof of the FDR control of the BH procedure

- R ≡ set of discoveries, H0 ≡ set of the N0 true null hypotheses
- indicator trick: for discrete random variables, E[A | B = b] Pr(B = b) = E[A × 1_{B=b}]

  FDR = Σ_{k=1}^N E[ |R ∩ H0|/k | |R| = k ] Pr(|R| = k) = Σ_{k=1}^N (1/k) E[ |R ∩ H0| × 1_{|R|=k} ],
      = Σ_{k=1}^N (1/k) E[ Σ_{i∈H0} 1_{pi ≤ q k/N} × 1_{|R|=k} ] = Σ_{i∈H0} Σ_{k=1}^N (1/k) Pr( k̂ = k, pi ≤ q k/N ),
      = (q/N) Σ_{i∈H0} Σ_{k=1}^N Pr( k̂ = k | pi ≤ q k/N ),

where the last equality comes from pi ∼ U([0, 1]) under the null, so that Pr(pi ≤ q k/N) = q k/N (this becomes an inequality ≤ if pi is assumed to be stochastically greater than U([0, 1]) under the null)
Proof of the FDR control of the BH procedure (cont'd)

Independent case
- k̂i ≡ number of discoveries excluding the i-th test (r.v. in {0, ..., N − 1})
+ Pr( k̂ = k | pi ≤ q k/N ) = Pr( k̂i = k − 1 | pi ≤ q k/N ) = Pr( k̂i = k − 1 )

  FDR = (q/N) Σ_{i∈H0} Σ_{k=1}^N Pr( k̂ = k | pi ≤ q k/N ) = (q/N) Σ_{i∈H0} Σ_{k=0}^{N−1} Pr( k̂i = k ) = (q/N) Σ_{i∈H0} 1 = (N0/N) q

PRDS case
- PRDS: u ↦ Pr( k̂ ≤ k | pi = u ) is increasing in u
+ hence Pr( k̂ ≤ k − 1 | pi ≤ q k/N ) ≥ Pr( k̂ ≤ k − 1 | pi ≤ q (k−1)/N )

  Σ_{k=1}^N Pr( k̂ = k | pi ≤ q k/N ) = Σ_{k=1}^N [ Pr( k̂ ≤ k | pi ≤ q k/N ) − Pr( k̂ ≤ k − 1 | pi ≤ q k/N ) ],
      ≤ Σ_{k=1}^N [ Pr( k̂ ≤ k | pi ≤ q k/N ) − Pr( k̂ ≤ k − 1 | pi ≤ q (k−1)/N ) ],
      = Pr( k̂ ≤ N | pi ≤ q ) − Pr( k̂ ≤ 1 | pi ≤ q/N ) + Pr( k̂ = 1 | pi ≤ q/N ) = 1,

where the last line telescopes the sum (the k = 1 term is kept as is, and Pr(k̂ = 0 | pi ≤ q/N) = 0 since pi ≤ q/N forces at least one rejection); thus FDR ≤ (N0/N) q
Proof of the FDR control bound for arbitrary dependence

Arbitrary dependence
- uses the identity 1/k = 1/(k − 1) − 1/(k(k − 1)) for k ≥ 2

  FDR = Σ_{i∈H0} Σ_{k=1}^N (1/k) Pr( k̂ = k, pi ≤ q k/N )
      = Σ_{i∈H0} Σ_{k=1}^N (1/k) [ Pr( k̂ ≤ k, pi ≤ q k/N ) − Pr( k̂ ≤ k − 1, pi ≤ q k/N ) ]
      ≤ Σ_{i∈H0} Σ_{k=1}^N (1/k) [ Pr( k̂ ≤ k, pi ≤ q k/N ) − Pr( k̂ ≤ k − 1, pi ≤ q (k−1)/N ) ],

since {pi ≤ q (k−1)/N} ⊂ {pi ≤ q k/N}. Writing A_k ≡ Pr( k̂ ≤ k, pi ≤ q k/N ) (with A_0 = 0), the identity above rearranges the telescoping sum as

  Σ_{k=1}^N (1/k)(A_k − A_{k−1}) = (1/N) A_N + Σ_{k=2}^N (1/(k(k − 1))) A_{k−1}.

Now A_N ≤ Pr(pi ≤ q) = q and A_{k−1} ≤ Pr( pi ≤ q (k−1)/N ) = q (k−1)/N, hence

  Σ_{k=1}^N (1/k)(A_k − A_{k−1}) ≤ q/N + (q/N) Σ_{k=2}^N 1/k = (q/N) Σ_{k=1}^N 1/k,

and summing over i ∈ H0:

  FDR ≤ (N0/N) q Σ_{j=1}^N 1/j