Motivation
The p-Value Statistic
Results
Conclusion
Robust p-Values for Multiple Testing Procedures
Presentation by:
Joshua D. Habiger (Oklahoma State University)
Joint work with:
Edsel Peña (University of South Carolina)
June 24, 2011
J. D. Habiger
Robust p-Values
Overview
Motivation
⇓
The p-Value Statistic
⇓
Results
⇓
Conclusion
Microarray Experiment
Quick Analysis
Multiple Testing
Timmons et al. (2007)
Each microarray chip has 12,488 different gene
expression measurements
5 microarray chips from brown fat cells and 8 chips
from white fat cells
m        xm1     xm2     ...   xm5     ym1     ym2     ...   ym8
1        1.22    1.66    ...   2.33    5.64    1.79    ...   4.05
2        3.57    19.22   ...   11.89   5.17    29.49   ...   11.26
...      ...     ...     ...   ...     ...     ...     ...   ...
12488    2.52    10.91   ...   22.67   10.70   7.35    ...   12.81
Boxplots for 5 genes
[Figure: boxplots of expression level for brown (b) and white (w) fat cells, one panel each for Gene 1 through Gene 5]
Which genes are differentially expressed?
Hypothesis Testing
We can test each pair of hypotheses with a T-test
m            H0m                H1m          tm        pm
1            µx1 = µy1    vs.   µx1 ≠ µy1    -.463     .678
2            µx2 = µy2    vs.   µx2 ≠ µy2    -1.006    .336
...          ...                ...          ...       ...
M = 12488    µxM = µyM    vs.   µxM ≠ µyM    .010      .992
Small p-values are evidence against the null hypothesis
Initial Results
Four possible outcomes for each test
                          H0m true            H0m false
Accept H0m                correct decision    type 2 error
Reject H0m (DISCOVERY)    type 1 error        correct decision
Decision rule: Reject H0m if pm ≤ .05
2879 DISCOVERIES !!!
But we could expect up to 624 FALSE
DISCOVERIES!!
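The figure of 624 follows from a simple count: if all M = 12,488 null hypotheses were true and each test were run at level .05, the expected number of rejections would be .05 × M. A minimal sketch of that arithmetic, with illustrative variable names that are not part of the talk:

```python
M = 12488            # number of genes tested (from the slides)
alpha = 0.05         # per-test rejection threshold
discoveries = 2879   # uncorrected rejections reported on the slides

# Worst case: every null is true, so each test rejects with probability alpha.
expected_false = alpha * M
print(int(expected_false))                      # 624

# Share of the reported discoveries that could be false in that worst case.
print(round(expected_false / discoveries, 3))
```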
Global Error Rates
                 #H0m true    #H0m false    Total
#H0m accepted    U            T             M − R
#H0m rejected    V            S             R
Total            M0           M1            M
FWER = Pr (V ≥ 1)
FDR = E [V / max{R, 1}]
Other error rates: see Storey (2004), Benjamini and Hochberg (1995), Sarkar (2007)
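As a small illustration of the counts in the table, the sketch below tallies U, T, V, S for one made-up set of decisions and computes the realized ratio V / max{R, 1}. The FDR is the expectation of that ratio and the FWER is the probability of the event V ≥ 1, so a single data set only shows one realization of each. The function name and the toy inputs are assumptions for illustration.

```python
def outcome_counts(is_true_null, rejected):
    """Tally the four cells of the table from paired truth/decision flags."""
    pairs = list(zip(is_true_null, rejected))
    U = sum(1 for null, rej in pairs if null and not rej)      # true nulls accepted
    T = sum(1 for null, rej in pairs if not null and not rej)  # false nulls accepted
    V = sum(1 for null, rej in pairs if null and rej)          # true nulls rejected
    S = sum(1 for null, rej in pairs if not null and rej)      # false nulls rejected
    return U, T, V, S

# Toy example with M = 5 tests (made-up values).
is_true_null = [True, True, True, False, False]
rejected = [False, True, False, True, True]

U, T, V, S = outcome_counts(is_true_null, rejected)
R = V + S
fdp = V / max(R, 1)            # realized false discovery proportion
any_false_rejection = V >= 1   # the event whose probability is the FWER
print(U, T, V, S, R)           # 2 0 1 2 3
```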
Some Procedures
Popular FDR - type procedures
Benjamini and Hochberg (1995)
Storey (2002, 2003): q-value procedure
Efron (2001, 2004, 2007): local FDR procedure
Popular FWER procedures
Bonferroni (1933)
Holm (1979)
Sequential Sidak (1967)
Westfall and Young (1993)
Many procedures make use of p-values
Examples
For p(1) ≤ p(2) ≤ ... ≤ p(M)
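Holm procedure: Reject H0m if $p_m \le \alpha / (M - k + 1)$, where
$$k = \max\left\{ j : p_{(i)} \le \frac{\alpha}{M - i + 1}, \ \forall i \le j \right\}$$
BH procedure: Reject H0m if $p_m \le \alpha k / M$, where
$$k = \max\left\{ j : p_{(j)} \le \alpha \frac{j}{M} \right\}$$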
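The two rules above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code used for the talk: the function names are made up, and each rule is implemented by rejecting the hypotheses with the k smallest p-values, which matches the threshold form of the rule when there are no ties.

```python
def holm_reject(pvals, alpha):
    """Step-down rule: k is the largest j with p_(i) <= alpha/(M-i+1) for all i <= j."""
    M = len(pvals)
    order = sorted(range(M), key=lambda i: pvals[i])
    k = 0
    for j, idx in enumerate(order, start=1):
        if pvals[idx] <= alpha / (M - j + 1):
            k = j
        else:
            break  # the condition must hold for every i <= j
    reject = [False] * M
    for idx in order[:k]:
        reject[idx] = True
    return reject

def bh_reject(pvals, alpha):
    """Step-up rule: k is the largest j with p_(j) <= alpha * j / M."""
    M = len(pvals)
    order = sorted(range(M), key=lambda i: pvals[i])
    k = 0
    for j, idx in enumerate(order, start=1):
        if pvals[idx] <= alpha * j / M:
            k = j  # no break: a later j may still satisfy the bound
    reject = [False] * M
    for idx in order[:k]:
        reject[idx] = True
    return reject

pvals = [0.01, 0.04, 0.03, 0.005]   # made-up p-values
holm = holm_reject(pvals, 0.05)
bh = bh_reject(pvals, 0.05)
print(holm)  # [True, False, False, True]
print(bh)    # [True, True, True, True]
```

On this toy input Holm stops at the third ordered p-value (.03 > .05/2), while BH rejects everything because the largest p-value still meets its bound (.04 ≤ .05).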
Application: α = .05
[Figure: histogram of T-test p-values; x-axis: p-value (0.00 to 0.05), y-axis: frequency]
Discoveries
Uncorrected    BH     Holm
2879           812    49
Validity of MTPs
BH procedure controls FDR at α if p-values from nulls
are independent and uniformly distributed
Holm procedure controls FWER at α if p-values from
nulls are uniformly distributed
Many multiple testing procedures assume p-values
from nulls are uniformly distributed
Issues
T-test p-values are uniformly distributed under the nulls
only if the data are Normally distributed
Nonparametric rank based tests lead to discrete
p-values
Framework
Tools
Application
Basic Elements
Data Model: X ∼ F ∈ F
Hypotheses (null/alternative models): H0 : F ∈ F0 vs. H1 : F ∈ F1
Decision function: δ(x, u; η) ∈ {0, 1}
U is a uniform variate
η is the specified size, called a size index
Valid Decision Process
Definition: The stochastic process
∆ = {δ(X, U; η) : η ∈ [0, 1]}
is called a decision process
Definition: ∆ is F0-size-valid if for every η ∈ [0, 1],
$$\sup_{F \in \mathcal{F}_0} E_F[\delta(X, U; \eta)] = \eta$$
i.e., specified size = actual size
Uniform p-Value Statistic
Definition: The p-Value statistic for ∆ is
P∆ (X , U) = inf{η ∈ [0, 1] : δ(X , U; η) = 1}
The p-value P∆ (x, u) is the smallest size index
allowing for the rejection of H0 .
Definition: P∆(X, U) is F0-uniform if for every t ∈ [0, 1],
$$\sup_{F \in \mathcal{F}_0} P_F\left(P_\Delta(X, U) \le t\right) = t$$
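To make the infimum definition concrete, the sketch below recovers a p-value numerically as the smallest size index at which a decision function rejects. It assumes a one-sided z-test as the decision function (an illustration chosen here, not an example from the talk) and assumes the decision is nondecreasing in η, so bisection applies. The names `delta` and `p_value` are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def delta(x, u, eta):
    """One-sided z-test decision: 1 means reject H0 at size index eta.

    The statistic is continuous, so the uniform variate u is never used;
    it is kept only to mirror the signature delta(x, u; eta).
    """
    q = 1.0 - eta
    if q >= 1.0:   # eta = 0: never reject
        return 0
    if q <= 0.0:   # eta = 1: always reject
        return 1
    return 1 if x > STD_NORMAL.inv_cdf(q) else 0

def p_value(x, u, decide=delta, iterations=60):
    """Approximate inf{eta in [0, 1] : decide(x, u, eta) = 1} by bisection."""
    lo, hi = 0.0, 1.0  # decide is 0 at lo and 1 at hi
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if decide(x, u, mid):
            hi = mid
        else:
            lo = mid
    return hi

x_obs = 1.5
p = p_value(x_obs, u=0.3)
# The two printed numbers agree: the infimum equals the usual upper-tail p-value.
print(round(p, 4), round(1.0 - STD_NORMAL.cdf(x_obs), 4))
```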
Main Theorem
Theorem: P∆ (X , U) is F0 -uniform if and only if
∆ is F0 -size-valid.
Why do we care?
Valid decision process
⇓
Valid p-value
⇓
Valid multiple testing procedures
Randomized Wilcoxon Test
Data: X1, X2, ..., Xn i.i.d. ∼ F(·) and Y1, Y2, ..., Ym i.i.d. ∼ F(· − θ)
Model: F ∈ F = {continuous F}
Hypotheses: H0 : F ∈ F0 = {θ ≤ 0, F continuous}
vs. H1 : F ∈ F1 = {θ > 0, F continuous}
The usual randomized Wilcoxon test is
$$\phi(x, y; \eta) = \begin{cases} 1 & \text{if } W(x, y) > k(\eta) \\ \gamma(\eta) & \text{if } W(x, y) = k(\eta) \\ 0 & \text{if } W(x, y) < k(\eta) \end{cases}$$
k(η) and γ(η) can be chosen s.t. $\sup_{F \in \mathcal{F}_0} E_F[\phi(X, Y; \eta)] = \eta$
Randomized Wilcoxon p-Value
Decision Function:
δWR (x, y, u; η) = I[u ≤ φ(x, y; η)]
= I[W (x, y) > k (η)] + I[u ≤ γ(η)]I[W (x, y) = k (η)]
p-Value:
P∆WR (x, y) = inf{η ∈ [0, 1] : δWR (x, y, u; η) = 1}
= Pr(W (X , Y ) > W (x, y)) + u Pr(W (X , Y ) = W (x, y))
The regular Wilcoxon p-value is
P∆W (x, y) = Pr(W (X , Y ) ≥ W (x, y))
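A minimal sketch of both p-values for small samples follows. It assumes W is the sum of the ranks of the Y sample in the pooled data (the slides do not spell out W, so this is an assumption), that there are no ties, and that the null distribution of W is obtained by enumerating all equally likely rank subsets. The function names are illustrative.

```python
from itertools import combinations

def rank_sum(x, y):
    """W = sum of the ranks of the y's in the pooled sample (no ties assumed)."""
    pooled = sorted(list(x) + list(y))
    rank = {v: i + 1 for i, v in enumerate(pooled)}
    return sum(rank[v] for v in y)

def null_pmf(n, m):
    """Exact null pmf of W: every m-subset of the ranks 1..n+m is equally likely."""
    counts = {}
    for ranks in combinations(range(1, n + m + 1), m):
        w = sum(ranks)
        counts[w] = counts.get(w, 0) + 1
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def wilcoxon_pvalues(x, y, u):
    """Return (randomized, regular) one-sided p-values for large W."""
    pmf = null_pmf(len(x), len(y))
    w = rank_sum(x, y)
    p_greater = sum(p for v, p in pmf.items() if v > w)
    p_equal = pmf.get(w, 0.0)
    randomized = p_greater + u * p_equal   # Pr(W > w) + u Pr(W = w)
    regular = p_greater + p_equal          # Pr(W >= w)
    return randomized, regular

# Made-up data in which every y exceeds every x, so W takes its largest value.
x = [1.1, 2.3, 0.7, 1.9, 3.0]
y = [4.2, 5.1, 3.8, 6.0, 4.9]
randomized, regular = wilcoxon_pvalues(x, y, u=0.5)
print(randomized, regular)
```

With n = m = 5 there are 252 rank subsets, so the regular p-value cannot fall below 1/252, while the randomized version is spread over (0, 1/252] by u.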
p-value distribution
Multiple Testing Procedures
Distribution of p-Value Statistics
Since ∆WR is F0 -size valid by construction, the
randomized Wilcoxon p-value is F0 -uniform
Only need data to be continuous
Nonrandomized Wilcoxon p-value has a discrete
distribution
T -test p-value is only uniformly distributed under
Normal model
Example: F is Normal
X1, X2, ..., X5 i.i.d. ∼ F and Y1, Y2, ..., Y5 i.i.d. ∼ F
Get 10,000 T-test, Wilcoxon, and randomized Wilcoxon p-values
[Figure: density histograms of the p-values on [0, 1] under the normal model; panels: T-test, Wilcoxon, Randomized Wilcoxon]
Example: F is Cauchy
X1, X2, ..., X5 i.i.d. ∼ F and Y1, Y2, ..., Y5 i.i.d. ∼ F
Get 10,000 T-test, Wilcoxon, and randomized Wilcoxon p-values.
[Figure: density histograms of the p-values on [0, 1] under the Cauchy model; panels: T-test, Wilcoxon, Randomized Wilcoxon]
Example: F is Double Exp.
X1, X2, ..., X5 i.i.d. ∼ F and Y1, Y2, ..., Y5 i.i.d. ∼ F
Get 10,000 T-test, Wilcoxon, and randomized Wilcoxon p-values.
[Figure: density histograms of the p-values on [0, 1] under the double exponential (Laplace) model; panels: T-test, Wilcoxon, Randomized Wilcoxon]
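A null simulation of this kind can be sketched with the standard library alone. The version below is an assumption-laden illustration rather than the code behind the slides: it draws Cauchy data only, uses 4,000 replications instead of 10,000, fixes an arbitrary seed, takes W to be the rank sum of the Y sample, and covers only the randomized Wilcoxon p-value. It then checks how often the p-value falls at or below a threshold t.

```python
import math
import random
from itertools import combinations

def null_tables(n, m):
    """Exact null pmf of the rank sum W, and Pr(W > w) for each attainable w."""
    counts = {}
    for ranks in combinations(range(1, n + m + 1), m):
        w = sum(ranks)
        counts[w] = counts.get(w, 0) + 1
    total = sum(counts.values())
    pmf = {w: c / total for w, c in counts.items()}
    greater = {w: sum(p for v, p in pmf.items() if v > w) for w in pmf}
    return pmf, greater

def randomized_wilcoxon_p(x, y, u, pmf, greater):
    """Pr(W > w) + u Pr(W = w); ties are ignored since the data are continuous."""
    rank = {v: i + 1 for i, v in enumerate(sorted(x + y))}
    w = sum(rank[v] for v in y)
    return greater[w] + u * pmf[w]

def simulate_null(n_sims=4000, n=5, m=5, seed=2011):
    """Draw both samples from the same Cauchy law and collect the p-values."""
    rng = random.Random(seed)
    pmf, greater = null_tables(n, m)

    def cauchy():
        return math.tan(math.pi * (rng.random() - 0.5))

    pvals = []
    for _ in range(n_sims):
        x = [cauchy() for _ in range(n)]
        y = [cauchy() for _ in range(m)]
        pvals.append(randomized_wilcoxon_p(x, y, rng.random(), pmf, greater))
    return pvals

pvals = simulate_null()
for t in (0.05, 0.25, 0.50):
    share = sum(p <= t for p in pvals) / len(pvals)
    print(t, round(share, 3))   # each share should be close to t
```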
BH Simulation: F = Normal
Data: X1m, ..., X5m i.i.d. ∼ F(·) and Y1m, ..., Y5m i.i.d. ∼ F(· − θm)
Testing H0m : θm ≤ 0, F ∈ F vs. H1m : θm > 0, F ∈ F
θ1 = ... = θ800 = 0; θ801 = ... = θ1000 = 2
BH procedure applied at α using p-values from T, Wilcoxon, and
randomized Wilcoxon tests.
[Figure: FDR and number of discoveries plotted against alpha (0.0 to 0.5) under the normal model]
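One replication of such a simulation might look like the sketch below. It is an illustration under assumptions, not the authors' code: it uses standard normal data with unit variance, only the randomized Wilcoxon p-value, a single replication at α = .2 with an arbitrary seed, and W taken as the rank sum of the Y sample. Averaging the realized proportion over many replications would estimate the FDR.

```python
import random
from itertools import combinations

def null_tables(n, m):
    """Exact null pmf of the rank sum W, and Pr(W > w) for each attainable w."""
    counts = {}
    for ranks in combinations(range(1, n + m + 1), m):
        w = sum(ranks)
        counts[w] = counts.get(w, 0) + 1
    total = sum(counts.values())
    pmf = {w: c / total for w, c in counts.items()}
    greater = {w: sum(p for v, p in pmf.items() if v > w) for w in pmf}
    return pmf, greater

def randomized_wilcoxon_p(x, y, u, pmf, greater):
    """Pr(W > w) + u Pr(W = w), with W the rank sum of y (no ties assumed)."""
    rank = {v: i + 1 for i, v in enumerate(sorted(x + y))}
    w = sum(rank[v] for v in y)
    return greater[w] + u * pmf[w]

def bh_reject(pvals, alpha):
    """BH step-up rule: reject the k smallest, k = max{j : p_(j) <= alpha j / M}."""
    M = len(pvals)
    order = sorted(range(M), key=lambda i: pvals[i])
    k = 0
    for j, idx in enumerate(order, start=1):
        if pvals[idx] <= alpha * j / M:
            k = j
    reject = [False] * M
    for idx in order[:k]:
        reject[idx] = True
    return reject

def one_replication(alpha, rng, n_null=800, n_alt=200, shift=2.0, n=5):
    """Return (number of discoveries, realized false discovery proportion)."""
    pmf, greater = null_tables(n, n)
    thetas = [0.0] * n_null + [shift] * n_alt   # true nulls come first
    pvals = []
    for theta in thetas:
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [rng.gauss(theta, 1.0) for _ in range(n)]
        pvals.append(randomized_wilcoxon_p(x, y, rng.random(), pmf, greater))
    reject = bh_reject(pvals, alpha)
    R = sum(reject)
    V = sum(reject[:n_null])                    # rejections among true nulls
    return R, V / max(R, 1)

rng = random.Random(2011)
discoveries, fdp = one_replication(alpha=0.2, rng=rng)
print(discoveries, round(fdp, 3))
```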
BH Simulation: F ≠ Normal
[Figure: FDR and number of discoveries plotted against alpha (0.0 to 0.5); one pair of panels under the Cauchy model and one under the uniform model]
Application
BH procedure, α = .05, Timmons et al. (2007) data
Discoveries
T-test    Nonrandomized Wilcoxon    Randomized Wilcoxon
812       863                       928
Remarks
Overtime
Remark
Some multiple testing procedures allow for
nonuniform p-values by estimating the null p-value
distribution.
Ex.: books by Efron (2010) or Dudoit and van der Laan
(2008)
Estimation induces added variability and/or bias
(sometimes substantial)
Example
Efron’s local FDR procedure operates by estimating FDR
Can make uniform assumption OR estimate null density
[Figure: bias and standard deviation plotted against local fdr (0.0 to 0.5) under the Cauchy model]
Key: Assume w/ R.W. (o), Estimate w/ R.W. (△), Assume w/ T-test (+), Estimate w/ T-test (x)
What We Did
Provide a more sturdy bridge
Recap
Multiple testing procedures are valid if the p-values
satisfy the uniformity condition,
or they perform better when it holds
We provide a route for defining such p-values
Ex: Randomized Wilcoxon p-value uniform if data
continuous
Composite Testing
Let S be a test stat with S ∼ Q under H0 : F ∈ F0
If large or small values of S are evidence against H0
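$$\delta^{+}(S, U; \eta) = I\{S > Q^{-1}(1 - \eta)\} + I\{S = Q^{-1}(1 - \eta)\}\, I\{U \le \gamma^{+}(\eta)\}$$
$$\delta^{-}(S, U; \eta) = I\{S < Q^{-1}(\eta)\} + I\{S = Q^{-1}(\eta)\}\, I\{U \le \gamma^{-}(\eta)\}$$
$$\delta_c^{0}(S, U; \eta) = \delta^{-}(S, U; [1 - c]\eta) + \delta^{+}(S, U; c\eta)$$
$\Delta_c^{0}$ is F0-size-valid for any c ∈ [0, 1]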
P-value
The P-value statistic for $\Delta_c^{0}$ is
$$P_{\Delta_c^{0}}(S, U) = \min\left\{ \frac{P_{\Delta^{-}}(S, U)}{c}, \frac{1 - P_{\Delta^{-}}(S, U)}{1 - c} \right\}$$
where
$$P_{\Delta^{-}}(S, U) = Q(S-) + U q(S)$$
How to choose c?
Thanks
THANKS FOR LISTENING
Acknowledgements
Taylor and Francis and The Journal of Nonparametric
Statistics
Colorado State University
Edsel Peña
Wensong Wu
NSF DMS0805809, NIH RR17698, EPA RD-83241902-0