Jesse Hemerik

False discovery proportion control by permutations
False discovery proportion control by
permutations
Proving properties of SAM
Jesse Hemerik, Jelle Goeman
Radboud university medical center, The Netherlands
80 Years After Bonferroni
False discovery proportion control by permutations
Main message
• SAM (“Significance Analysis of Microarrays”) is a useful
method for FDP estimation
• First paper about SAM (2001) cited 10,000 times
• SAM is only heuristic
• We provide exact conf. statements about FDP
False discovery proportion control by permutations
FDP
We test hypotheses H1 , ..., Hm
R := {1 ≤ i ≤ m : Hi is rejected}
N := {1 ≤ i ≤ m : Hi is true}
V := #N ∩ R number of false positives
FDP :=
V
#R
False discovery proportion control by permutations
Setting of SAM
• Hypotheses H1 , ..., Hm
• Data X with any distribution
• Test statistics T1 (X ), ..., Tm (X )
• G a finite group of transformations from and to the range
of X
• Joint distr. of the Ti (gX ) with i ∈ N , g ∈ G , is invariant
under all transformations in G of the data X .
False discovery proportion control by permutations
Output of SAM
1
User chooses a rejection region D ⊂ R
2
[
SAM rejects the Hi with Ti ∈ D and provides FDP
False discovery proportion control by permutations
d
SAM’s calculation of FDP
1
R = R(X ) = {1 ≤ i ≤ m : Ti (X ) ∈ D}
2
For each permutation gj , calculate
#R(gj X ) = #{1 ≤ i ≤ m : Ti (gj X ) ∈ D}
3
Vb := median of the values #R(gj X ), 1 ≤ j ≤ w
4
[ :=
FDP
5
[ := FDP
[·π
FDP
b0
0
Vb
#R
(π0 =
#N
)
m
False discovery proportion control by permutations
Part 2: our results
False discovery proportion control by permutations
d
Results on FDP
[ is a median-controlling estimator of FDP, i.e:
Proven: FDP
[ ≥ 1.
P(FDP ≤ FDP)
2
0
[ = FDP
[·π
FDP
b0 is not
False discovery proportion control by permutations
Generalization
Choose:
• for each Ti any rejection region Di ⊂ R
• some α ∈ [0, 1]
We provide:
a (1 − α)100%-confidence upper bound FDP for the FDP:
P(FDP ≤ FDP) ≥ 1 − α
False discovery proportion control by permutations
Calculation of upper bound
The (1 − α)100%-confidence upper bound is
FDP :=
V
,
#R
where V is the (1 − α)-quantile of the values
#R(gj X ), 1 ≤ j ≤ w
False discovery proportion control by permutations
Recall permutation test:
• Consider:
• data X with any distribution
• a group G of transformations from and to the range of X
• a test statistic T (X )
d
• H0 : X = gX for all g ∈ G .
• Let
T (1) ≤ ... ≤ T (#G )
be the sorted values T (gX ), g ∈ G .
• Then P(T (X ) > T (d(1−α)·#G e) ) ≤ α.
False discovery proportion control by permutations
Proof upper bound
To show: P(V > V ) ≤ α.
Proof: Let V 1−α be the (1 − α)-quantile of the values
#N ∩ R(gj X ),
1 ≤ j ≤ w.
By permutation principle:
P(#N ∩ R(X ) > V 1−α ) ≤ α.
Finally note that V 1−α ≤ V .
False discovery proportion control by permutations
Conservativeness (1)
• By permutation principle the (1 − α)-quantile of the
values
#N ∩ R(gj X ),
1 ≤ j ≤ w,
is a (1 − α)-upper bound for V .
• But we don’t know N , so use the (1 − α)-quantile of the
values
#R(gj X ),
1 ≤ j ≤ w.
• So real error rate can be much smaller than α.
False discovery proportion control by permutations
Conservativeness (2)
[ is conservative
When there are many false hypotheses, FDP
0
[ := FDP
[·π
SAM software therefore uses FDP
b0
Unknown properties. It’s not median-unbiased
We want to decrease the bound without losing the property
P(FDP ≤ FDP) ≥ 1 − α
False discovery proportion control by permutations
Better upper bound
Let E be the event that V ≤ V 1−α .
Thus P(E ) ≥ 1 − α.
Suppose E holds. Thus V ≤ V 1−α ≤ V . So among R there
are no more than V true hypotheses
Use this information to find better bound V
1
2
3
Continue like this, finding V ≥ V ≥ V ...
Improved upper bound = mini V
i
1
False discovery proportion control by permutations
Part 3: Relation to closed testing
SAM bound ≥
i
Bound of iterative method min V ≥
i
Bound derived from closed testing procedure
False discovery proportion control by permutations
General definition closed testing
T
Want to test each intersection hypothesis HI = i∈I Hi ,
I ⊆ {1, ..., m} such that P(no false positives) ≥ 1 − α
For each HI , define a test of level α. (So 2m − 1 local tests)
C.t.procedure rejects all HI with property that all HJ with
J ⊇ I are rejected by their local tests
False discovery proportion control by permutations
Deriving upper bounds using c.t.p.
Write X = {I ⊆ {1, ..., m} :
HI rejected by c.t.p.}
Let K ⊆ {1, ..., m} be any set.
By Goeman and Solari (2011):
An upper bound to #N ∩ K is
max{#I : I ⊆ K , I 6∈ X } ∨ 0.
With probability ≥ 1 − α these bounds are valid
uniformly over all K ⊆ {1, ..., m}.
False discovery proportion control by permutations
Our c.t.p.
In the SAM context, recall
R(X ) = {1 ≤ i ≤ m : Ti (X ) ∈ Di }.
For each HI consider local test that rejects iff
(1−α)
#I ∩ R(X ) > RI
(1−α)
,
where RI
is the (1 − α)-quantile of the values
#I ∩ R(gj X ), 1 ≤ j ≤ w
False discovery proportion control by permutations
Connection to our iterative method
Consider the c.t.p. based on these local tests.
Write R := R(X )
Upper bound for V = R ∩ N is
max{#I : I ⊆ R and I 6∈ X }
1
= ... = ... ≤ ... = V .
1
2
Using V , by analogous argument V follows, etc.
False discovery proportion control by permutations
Uniform bounds
For every K ⊆ {1, ..., m} a (uniform) bound for #K ∩ N is
max{#I : I ⊆ K and I 6∈ X } = ... = ... ≤ ... = ... =
(1−α)
min{#K , #K ∩ R c + RK ∪R c } =: V (K )
False discovery proportion control by permutations
Relation to iterative method
• An upper bound to R ∩ N is
max{V (K ) : K ⊆ R, #K = V (R)} =
(1−α)
max{min{#K , RK ∪R c } : K ⊆ R, #K = V (R)}.
1
2
3
But this is exactly V . Analogously V , V , ... follow
• Likewise, for every I ⊆ {1, ..., m} we can improve V (I )
False discovery proportion control by permutations
Computational feasibility
• SAM bound ≥
Bound of iterative method mini V
Bound from c.t.p.
i
≥
• Iterative method faster than using c.t.p.
• But still computationally intensive
• → Shortcut
False discovery proportion control by permutations
Use of random permutations
Suppose we want to use only w permutations from G
Drawing with replacement: Take g1 := id. Draw g2 , ..., gw
with replacement from G
Drawing without replacement: Take g1 := id. Draw
g2 , ..., gw without replacement from G \ {id}
False discovery proportion control by permutations
Conclusion
• Until now SAM was only heuristic
• We have proven properties of SAM and extended it to
give confidence statements about the FDP
• We have improved SAM without losing coverage
False discovery proportion control by permutations
References
First SAM paper:
Tusher, V.G., Tibshirani, R. and Chu, G. (2001). Significance
analysis of microarrays applied to the ionizing radiation response.
Proceedings of the National Academy of Sciences 98 5116-5121.
Rationale behind π
b0 :
Storey, J.D. et al. (2004). Strong control, conservative point
estimation and simultaneous conservative consistency of false
discovery rates: a unified approach. JRSS: Series B (Statistical
Methodology) 66 187-205.
Details about (b
π0 as used in) SAM R package samr:
Chu, G. et al. Significance Analysis of Microarrays: users guide
and technical document.
Deriving FDP upper bounds using closed testing:
Goeman, J. J. and Solari, A. (2011). Multiple testing for
exploratory research. Statistical Science 26 584-597