False discovery proportion control by permutations False discovery proportion control by permutations Proving properties of SAM Jesse Hemerik, Jelle Goeman Radboud university medical center, The Netherlands 80 Years After Bonferroni False discovery proportion control by permutations Main message • SAM (“Significance Analysis of Microarrays”) is a useful method for FDP estimation • First paper about SAM (2001) cited 10,000 times • SAM is only heuristic • We provide exact conf. statements about FDP False discovery proportion control by permutations FDP We test hypotheses H1 , ..., Hm R := {1 ≤ i ≤ m : Hi is rejected} N := {1 ≤ i ≤ m : Hi is true} V := #N ∩ R number of false positives FDP := V #R False discovery proportion control by permutations Setting of SAM • Hypotheses H1 , ..., Hm • Data X with any distribution • Test statistics T1 (X ), ..., Tm (X ) • G a finite group of transformations from and to the range of X • Joint distr. of the Ti (gX ) with i ∈ N , g ∈ G , is invariant under all transformations in G of the data X . False discovery proportion control by permutations Output of SAM 1 User chooses a rejection region D ⊂ R 2 [ SAM rejects the Hi with Ti ∈ D and provides FDP False discovery proportion control by permutations d SAM’s calculation of FDP 1 R = R(X ) = {1 ≤ i ≤ m : Ti (X ) ∈ D} 2 For each permutation gj , calculate #R(gj X ) = #{1 ≤ i ≤ m : Ti (gj X ) ∈ D} 3 Vb := median of the values #R(gj X ), 1 ≤ j ≤ w 4 [ := FDP 5 [ := FDP [·π FDP b0 0 Vb #R (π0 = #N ) m False discovery proportion control by permutations Part 2: our results False discovery proportion control by permutations d Results on FDP [ is a median-controlling estimator of FDP, i.e: Proven: FDP [ ≥ 1. P(FDP ≤ FDP) 2 0 [ = FDP [·π FDP b0 is not False discovery proportion control by permutations Generalization Choose: • for each Ti any rejection region Di ⊂ R • some α ∈ [0, 1] We provide: a (1 − α)100%-confidence upper bound FDP for the FDP: P(FDP ≤ FDP) ≥ 1 − α False discovery proportion control by permutations Calculation of upper bound The (1 − α)100%-confidence upper bound is FDP := V , #R where V is the (1 − α)-quantile of the values #R(gj X ), 1 ≤ j ≤ w False discovery proportion control by permutations Recall permutation test: • Consider: • data X with any distribution • a group G of transformations from and to the range of X • a test statistic T (X ) d • H0 : X = gX for all g ∈ G . • Let T (1) ≤ ... ≤ T (#G ) be the sorted values T (gX ), g ∈ G . • Then P(T (X ) > T (d(1−α)·#G e) ) ≤ α. False discovery proportion control by permutations Proof upper bound To show: P(V > V ) ≤ α. Proof: Let V 1−α be the (1 − α)-quantile of the values #N ∩ R(gj X ), 1 ≤ j ≤ w. By permutation principle: P(#N ∩ R(X ) > V 1−α ) ≤ α. Finally note that V 1−α ≤ V . False discovery proportion control by permutations Conservativeness (1) • By permutation principle the (1 − α)-quantile of the values #N ∩ R(gj X ), 1 ≤ j ≤ w, is a (1 − α)-upper bound for V . • But we don’t know N , so use the (1 − α)-quantile of the values #R(gj X ), 1 ≤ j ≤ w. • So real error rate can be much smaller than α. False discovery proportion control by permutations Conservativeness (2) [ is conservative When there are many false hypotheses, FDP 0 [ := FDP [·π SAM software therefore uses FDP b0 Unknown properties. It’s not median-unbiased We want to decrease the bound without losing the property P(FDP ≤ FDP) ≥ 1 − α False discovery proportion control by permutations Better upper bound Let E be the event that V ≤ V 1−α . Thus P(E ) ≥ 1 − α. Suppose E holds. Thus V ≤ V 1−α ≤ V . So among R there are no more than V true hypotheses Use this information to find better bound V 1 2 3 Continue like this, finding V ≥ V ≥ V ... Improved upper bound = mini V i 1 False discovery proportion control by permutations Part 3: Relation to closed testing SAM bound ≥ i Bound of iterative method min V ≥ i Bound derived from closed testing procedure False discovery proportion control by permutations General definition closed testing T Want to test each intersection hypothesis HI = i∈I Hi , I ⊆ {1, ..., m} such that P(no false positives) ≥ 1 − α For each HI , define a test of level α. (So 2m − 1 local tests) C.t.procedure rejects all HI with property that all HJ with J ⊇ I are rejected by their local tests False discovery proportion control by permutations Deriving upper bounds using c.t.p. Write X = {I ⊆ {1, ..., m} : HI rejected by c.t.p.} Let K ⊆ {1, ..., m} be any set. By Goeman and Solari (2011): An upper bound to #N ∩ K is max{#I : I ⊆ K , I 6∈ X } ∨ 0. With probability ≥ 1 − α these bounds are valid uniformly over all K ⊆ {1, ..., m}. False discovery proportion control by permutations Our c.t.p. In the SAM context, recall R(X ) = {1 ≤ i ≤ m : Ti (X ) ∈ Di }. For each HI consider local test that rejects iff (1−α) #I ∩ R(X ) > RI (1−α) , where RI is the (1 − α)-quantile of the values #I ∩ R(gj X ), 1 ≤ j ≤ w False discovery proportion control by permutations Connection to our iterative method Consider the c.t.p. based on these local tests. Write R := R(X ) Upper bound for V = R ∩ N is max{#I : I ⊆ R and I 6∈ X } 1 = ... = ... ≤ ... = V . 1 2 Using V , by analogous argument V follows, etc. False discovery proportion control by permutations Uniform bounds For every K ⊆ {1, ..., m} a (uniform) bound for #K ∩ N is max{#I : I ⊆ K and I 6∈ X } = ... = ... ≤ ... = ... = (1−α) min{#K , #K ∩ R c + RK ∪R c } =: V (K ) False discovery proportion control by permutations Relation to iterative method • An upper bound to R ∩ N is max{V (K ) : K ⊆ R, #K = V (R)} = (1−α) max{min{#K , RK ∪R c } : K ⊆ R, #K = V (R)}. 1 2 3 But this is exactly V . Analogously V , V , ... follow • Likewise, for every I ⊆ {1, ..., m} we can improve V (I ) False discovery proportion control by permutations Computational feasibility • SAM bound ≥ Bound of iterative method mini V Bound from c.t.p. i ≥ • Iterative method faster than using c.t.p. • But still computationally intensive • → Shortcut False discovery proportion control by permutations Use of random permutations Suppose we want to use only w permutations from G Drawing with replacement: Take g1 := id. Draw g2 , ..., gw with replacement from G Drawing without replacement: Take g1 := id. Draw g2 , ..., gw without replacement from G \ {id} False discovery proportion control by permutations Conclusion • Until now SAM was only heuristic • We have proven properties of SAM and extended it to give confidence statements about the FDP • We have improved SAM without losing coverage False discovery proportion control by permutations References First SAM paper: Tusher, V.G., Tibshirani, R. and Chu, G. (2001). Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences 98 5116-5121. Rationale behind π b0 : Storey, J.D. et al. (2004). Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach. JRSS: Series B (Statistical Methodology) 66 187-205. Details about (b π0 as used in) SAM R package samr: Chu, G. et al. Significance Analysis of Microarrays: users guide and technical document. Deriving FDP upper bounds using closed testing: Goeman, J. J. and Solari, A. (2011). Multiple testing for exploratory research. Statistical Science 26 584-597
© Copyright 2025 Paperzz