False Discovery Rates
for Discrete Data
Joseph F. Heyse
Merck Research Laboratories
Graybill Conference
June 13, 2008
Introduction
• Almost all multiplicity considerations in clinical trial applications are designed to control the Familywise Error Rate (FWER).
• Benjamini and Hochberg (1995) argued that, in certain settings, requiring control of the FWER is often too conservative.
• They suggested that controlling the “False Discovery Rate” (FDR) is a more powerful alternative.
• Accounting for discrete endpoints can further improve the power of FDR (and FWER) methods.
Graybill.ppt.
Outline
1. Definition and properties of FDR
2. FDR for discrete data
3. Application: Genetic variants of HIV
4. Summary of simulation results
5. Application: Rodent carcinogenicity study
6. Concluding remarks
Familywise Error Rate (FWER)
• Let $F = \{H_1, H_2, \ldots, H_K\}$ denote a family of $K$ hypotheses.
• $\mathrm{FWER} = \Pr(\text{any true } H_i \in F \text{ is rejected})$.
• The procedures currently used for clinical studies are intended to control the $\mathrm{FWER} \le \alpha$.
• Benjamini & Hochberg (1995) proposed controlling the “False Discovery Rate” (FDR) as a more powerful alternative to FWER.
False Discovery Rate (FDR)
(Benjamini & Hochberg, 1995)
                  Declared        Declared
                  Insignificant   Significant   Total
# of true H_i     U               V             K_0
# of false H_i    T               S             K - K_0
Total             K - R           R             K

\[ \mathrm{FDR} = E\left(\frac{V}{R}\right) \]
= expected proportion of rejected null hypotheses which are incorrectly rejected.
• When $R = 0$, the ratio $V/R$ is defined to be 0.
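A minimal Python sketch of the definition above, assuming V and R have been recorded for each of several simulated or repeated studies. The function names are illustrative and not from the talk.

```python
from statistics import mean


def false_discovery_proportion(v: int, r: int) -> float:
    """Return V/R, taken to be 0 when nothing is rejected (R = 0)."""
    if not 0 <= v <= r:
        raise ValueError("need 0 <= V <= R")
    return 0.0 if r == 0 else v / r


def estimate_fdr(outcomes):
    """Average V/R over (V, R) pairs, one pair per replicate."""
    return mean(false_discovery_proportion(v, r) for v, r in outcomes)
```

Averaging the per-replicate ratio, rather than dividing total V by total R, is what matches the expectation E(V/R).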
False Discovery Rate (FDR) (cont’d)
(Benjamini & Hochberg)
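• Reject $H_{(1)}, H_{(2)}, \ldots, H_{(j)}$ if $P_{(j)} \le \frac{j}{K}\alpha$, taking the largest such $j$, where $P_{(1)} \le \cdots \le P_{(K)}$ are the ordered P-values.

Adjusted P-values:
\[ \tilde{P}_{[K]} = P_{(K)}, \qquad \tilde{P}_{[j]} = \min\left(\tilde{P}_{[j+1]},\ \frac{K}{j} P_{(j)}\right), \quad j \le K - 1 \]

Example (K=4)
Unadjusted P-values       .0193   .0280   .2038   .4941
FDR-adjusted P-values     .0560   .0560   .2718   .4941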
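The step-up recursion for the adjusted P-values can be sketched in a few lines of Python. This is an illustrative implementation (the function name is an assumption); applied to the four unadjusted P-values of the example it agrees with the FDR-adjusted values shown to about three decimal places.

```python
def bh_adjusted(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    k = len(pvalues)
    order = sorted(range(k), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * k
    running_min = 1.0
    # Walk from the largest P-value down, carrying the running minimum.
    for rank in range(k, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * k / rank)
        adjusted[i] = running_min
    return adjusted


example = bh_adjusted([0.0193, 0.0280, 0.2038, 0.4941])
```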
Properties of FDR Control
• The B&H sequential procedure controls the FDR at $\frac{K_0}{K}\alpha \le \alpha$ for independent hypotheses.
• $\mathrm{FDR} \le \mathrm{FWER}$, and equality holds if $K = K_0$.
• The Hochberg (1988) stepwise procedure compares $P_{(j)}$ to $\alpha / (K + 1 - j)$, while the FDR procedure compares $P_{(j)}$ to $\alpha j / K$.
• FDR-controlling procedures are potentially more powerful than FWER-controlling procedures.
Comparing FDR and FWER
Example (K=4)
Unadjusted P-values        .0193   .0280   .2038   .4941
FDR-adjusted P-values      .0560   .0560   .2718   .4941
FWER-adjusted P-values     .0772   .0840   .4076   .4941
FDR-adjusted P-values ≤ FWER-adjusted P-values
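The FWER-adjusted row is consistent with Hochberg (1988) step-up adjusted P-values, which multiply $P_{(j)}$ by $K + 1 - j$ instead of $K/j$. A small Python sketch under that assumption (the function name is illustrative):

```python
def hochberg_adjusted(pvalues):
    """Hochberg (1988) step-up adjusted P-values, returned in the input order."""
    k = len(pvalues)
    order = sorted(range(k), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * k
    running_min = 1.0
    # Walk from the largest P-value down, carrying the running minimum.
    for rank in range(k, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * (k + 1 - rank))
        adjusted[i] = running_min
    return adjusted


fwer_example = hochberg_adjusted([0.0193, 0.0280, 0.2038, 0.4941])
```

Because $K + 1 - j \ge K/j$ for every $j$ between 1 and $K$, each Hochberg-adjusted value is at least as large as the corresponding FDR-adjusted value.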
Modified FDR for Discrete Data
• Adjusted P-values for FDR
\[ \tilde{P}_{[j]} = \frac{K}{j} P_{(j)} \]
• For discrete data
\[ \tilde{P}_{[j]} = \frac{1}{j} \sum_{i=1}^{K} P_i^{*} \]
where $P_i^{*}$ is the largest P-value achievable for hypothesis $i$ that is less than or equal to $P_{(j)}$.
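A hedged Python sketch of the discrete adjustment. It assumes that the set of attainable P-values for each hypothesis is supplied by the caller (`supports`, a hypothetical input) and that the same step-up minimum used for the B-H adjusted P-values is carried over; the slide states only the sum formula.

```python
from bisect import bisect_right


def discrete_fdr_adjusted(pvalues, supports):
    """Discrete-data FDR adjusted P-values, returned in the input order.

    pvalues[i]  : observed P-value for hypothesis i
    supports[i] : every P-value hypothesis i can attain (assumed input)
    """
    k = len(pvalues)
    sorted_supports = [sorted(s) for s in supports]
    order = sorted(range(k), key=lambda i: pvalues[i])
    adjusted = [0.0] * k
    running_min = 1.0
    for rank in range(k, 0, -1):
        threshold = pvalues[order[rank - 1]]  # P_(j)
        total = 0.0
        for s in sorted_supports:
            pos = bisect_right(s, threshold)
            if pos > 0:           # P_i* = largest attainable value <= P_(j)
                total += s[pos - 1]
            # otherwise P_i* = 0 and the hypothesis contributes nothing
        running_min = min(running_min, total / rank)  # assumed step-up minimum
        adjusted[order[rank - 1]] = running_min
    return adjusted


toy = discrete_fdr_adjusted(
    [0.01, 0.04, 0.3],
    [[0.01, 0.2, 1.0], [0.04, 0.5, 1.0], [0.3, 1.0]],
)
```

In this made-up three-hypothesis example the third hypothesis cannot attain any P-value at or below 0.04, so it drops out of the sum at ranks 1 and 2.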
FDR for Discrete Data
• Gain in power for the discrete data FDR comes from the difference $P_{(j)} - P_i^{*}$.
• If endpoint $i$ is not able to achieve a P-value $\le P_{(j)}$, then $P_i^{*} = 0$ and the dimensionality is reduced.
• If endpoint $i$ is able to achieve a P-value $\le P_{(j)}$, then $P_i^{*} \le P_{(j)}$ and a smaller quantity adds to $\tilde{P}_{[j]}$.
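To make $P_i^{*}$ concrete, the sketch below lists the attainable one-sided P-values of Fisher's exact test for a two-group endpoint with fixed margins and picks the largest one not exceeding a threshold. The two-group setup, the function names, and the group sizes in the usage lines are assumptions for illustration.

```python
from math import comb


def fisher_attainable_pvalues(n_control, n_treated, total_events):
    """Ascending list of attainable upper-tail Fisher exact P-values."""
    n = n_control + n_treated
    lo = max(0, total_events - n_control)   # fewest events possible in treated
    hi = min(n_treated, total_events)       # most events possible in treated
    denom = comb(n, total_events)
    pvals, tail = [], 0.0
    for x in range(hi, lo - 1, -1):         # accumulate the upper tail
        tail += comb(n_treated, x) * comb(n_control, total_events - x) / denom
        pvals.append(tail)
    return pvals


def largest_attainable_at_most(pvals, threshold):
    """P_i*: largest attainable P-value <= threshold, or 0 if none exists."""
    candidates = [p for p in pvals if p <= threshold]
    return max(candidates) if candidates else 0.0


sparse = fisher_attainable_pvalues(10, 10, 2)   # 2 events among 20 subjects
richer = fisher_attainable_pvalues(10, 10, 6)   # 6 events among 20 subjects
```

With only 2 events the smallest attainable P-value is 45/190, so at a threshold of 0.05 this endpoint has $P_i^{*} = 0$; with 6 events a value below 0.05 is attainable and the endpoint stays in the sum.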
Other Approaches for Discrete Data
• Tarone (1990) proposed a modified Bonferroni procedure for discrete data by removing those endpoints unable to reach that level of statistical significance.
• Gilbert (2005) proposed a two-step FDR method for discrete data.
  1. Apply Tarone’s method to identify endpoints suitable for adjustment.
  2. Apply B-H FDR to those endpoints.
• Calculating the FDR adjusted P-value $\tilde{P}_{[j]}$ is expected to improve upon these approaches by using the complete exact distribution.
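One reading of the two-step approach, sketched in Python. It assumes Tarone's rule is "find the smallest k such that at most k endpoints have a minimum attainable P-value below α/k" and that step 2 runs the B-H step-up rule at level α on those endpoints only. The function name and inputs are hypothetical, and the numbers in the usage lines are made up.

```python
def tarone_gilbert(pvalues, min_attainable, alpha=0.05):
    """Return (k, rejected indices) for a Tarone screen followed by B-H.

    pvalues[i]        : observed P-value for endpoint i
    min_attainable[i] : smallest P-value endpoint i could possibly attain
    """
    n = len(pvalues)
    if n == 0:
        return 0, []
    # Step 1 (Tarone): smallest k with at most k endpoints able to reach alpha/k.
    for k in range(1, n + 1):
        eligible = [i for i in range(n) if min_attainable[i] < alpha / k]
        if len(eligible) <= k:
            break
    # Step 2 (B-H step-up) restricted to the eligible endpoints.
    m = len(eligible)
    ranked = sorted(eligible, key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(ranked, start=1):
        if pvalues[i] <= rank * alpha / m:
            cutoff = rank
    return k, sorted(ranked[:cutoff])


k_used, rejected = tarone_gilbert(
    [0.004, 0.02, 0.30, 0.50, 0.9],
    [0.001, 0.005, 0.2, 0.3, 0.04],
)
```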
Example: Genetic Variants of HIV
• Gilbert (2005) compared the mutation rates at 118 positions in HIV amino-acid sequences of 73 patients with subtype C to 73 patients with subtype B.
• The B-H FDR procedure identified 12 significant positions.
• The Tarone modified FDR procedure reduced the dimensionality to 25 and identified 15 significant positions.
• The fully discrete FDR identified 20 significant positions.
Simulation Study
for Independent Hypotheses
• A simulation study was conducted to evaluate the statistical properties of the FDR controlling methods for discrete data using Fisher’s Exact Test.
• Simulation parameters
  – Number of hypotheses: K = 5, 10, 15, 20
  – Varying numbers of false hypotheses (K − K0)
  – Background rates chosen randomly from U(.01, .5)
  – Odds ratios for effect size: OR = 1.5, 2, 2.5, 3
  – Sample sizes: N = 10, 25, 50, 100
  – α = 0.05, 1-tailed
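A sketch of how one replicate of such a simulation might generate its data. It assumes that N is the sample size per group, that each hypothesis compares a control and a treated binomial count, and that false hypotheses shift the treated rate by the stated odds ratio. None of these details are spelled out on the slide, and the function names are illustrative.

```python
import random


def treated_rate(p0, odds_ratio):
    """Event rate in the treated group implied by background rate and OR."""
    odds = odds_ratio * p0 / (1 - p0)
    return odds / (1 + odds)


def simulate_counts(k, n_false, n_per_group, odds_ratio, rng):
    """Return k (control events, treated events) pairs for one replicate."""
    rows = []
    for i in range(k):
        p0 = rng.uniform(0.01, 0.5)   # background rate ~ U(.01, .5)
        # The first n_false hypotheses are false (a real effect is present).
        p1 = treated_rate(p0, odds_ratio) if i < n_false else p0
        x0 = sum(rng.random() < p0 for _ in range(n_per_group))
        x1 = sum(rng.random() < p1 for _ in range(n_per_group))
        rows.append((x0, x1))
    return rows


replicate = simulate_counts(10, 4, 25, 2.5, random.Random(0))
```

Each pair would then be tested with a one-tailed Fisher's exact test before applying the multiplicity adjustments being compared.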
Rate of Rejecting True Hypotheses
When All Hypotheses are True (K0=K)
[Figure: two panels, (N=100) and (N=25). x-axis: No. of Hypotheses (K) = 5, 10, 15, 20; y-axis ticks 0.01 to 0.05. Series: B-H FDR, T-G FDR, Discrete FDR.]
Rate of Rejecting True Hypotheses
When Some Hypotheses are False (K0<K)
[Figure: two panels, (K=10, N=25, OR=2.5) and (K=10, N=100, OR=2.5). x-axis: No. of True Hypotheses (K0) = 1 to 9; y-axis ticks 0.01 to 0.05. Series: B-H FDR, T-G FDR, Discrete FDR, Reference Level.]
Rate of Rejecting False Hypotheses
[Figure: two panels, (K=10, K-K0=4, N=100) and (K=10, K-K0=4, N=25). x-axis: Odds Ratio = 1.5, 2.0, 2.5, 3.0; y-axis ticks 0.2 to 1.0. Series: B-H FDR, T-G FDR, Discrete FDR.]
Other Applications
• Analysis of Tier II clinical trial adverse experiences
• Trend test analysis of rodent carcinogenicity data
• Similar modification applied to Bonferroni adjustment for discrete endpoints
Rodent Carcinogenicity Studies
• Long-term carcinogenicity studies typically test candidate drugs in several graded doses and use a vehicle control group.
• 50 male and 50 female rodents are randomly assigned to each drug-treated group, with 100 rodents of each sex assigned to control. Male and female studies are considered separately.
• Treatment is administered daily and a terminal necropsy is performed at the end of the study.
Rodent Carcinogenicity Studies (cont’d)
• Each individual tumor site encountered is described by a combination of organ or tissue with tumor type.
• A statistical analysis of trend is performed for all tumor sites encountered.
• An exact test uses the permutation distribution of the trend statistic.
• Exact tests can account for age at necropsy and lethality of tumor.
• For illustration purposes, this analysis only considered dose and presence of tumor.
Summary of Statistical Results from a Long-Term
Carcinogenicity Study in Male Mice

                                      Control    Test Agent Dose     Trend
Tumor Site                               0        2     4     8      P-value
Liver, Hepatocellular Carcinoma          1        0     1     3      .0342
P.S.U., Hemangiosarcoma                  0        1     0     1      .20
Adrenal Cortex, Adenoma                  0        0     0     1      .20
P.S.U., Sarcoma                          0        0     0     1      .20
P.S.U., Lymphoma                         7        4     1     6      .24
Lung, Adenoma                           16        8     6    11      .24
Liver, Hepatocellular Adenoma           12        6     7     6      .49
Liver, Hemangiosarcoma                   2        0     0     1      .50
Harderian Gland, Adenoma                 2        0     0     1      .50
Skin, Fibroma                            1        0     1     0      .50
Thyroid, Follicular Cell Carcinoma       1        0     1     0      .50
P.S.U., Leukemia                         5        2     2     2      .44N
Lung, Adenocarcinoma                     0        3     0     0      .41N
Testes, Interstitial Cell Tumor          1        1     1     0      .41N
Stomach, Papilloma                       2        0     0     0      .16N
Number of Mice on Study                100       50    50    50

Trend P-value is reported 1-tailed using exact permutational distribution.
N indicates 1-tailed P-value for negative trend.
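A sketch of an exact permutation trend test for a single tumor site, using only dose and tumor presence. It assumes the trend statistic is the dose-weighted tumor count $\sum_g d_g x_g$ with dose scores 0, 2, 4, 8 and that tumor-bearing animals are allocated to groups by a multivariate hypergeometric distribution; the statistic actually used in the study is not stated, and the function name is illustrative. Under these assumptions the liver hepatocellular carcinoma counts (1, 0, 1, 3 tumors among 100, 50, 50, 50 mice) give an upper-tail P-value of about 0.034, in line with the tabulated .0342.

```python
from itertools import product
from math import comb


def exact_trend_pvalue(counts, group_sizes, doses):
    """Upper-tail exact P-value for a dose-weighted trend statistic."""
    total = sum(counts)
    observed = sum(d * c for d, c in zip(doses, counts))
    denom = comb(sum(group_sizes), total)
    tail_weight = 0  # integer count of allocations at least as extreme
    ranges = [range(min(n, total) + 1) for n in group_sizes[1:]]
    for rest in product(*ranges):
        first = total - sum(rest)  # tumors left over for the first group
        if first < 0 or first > group_sizes[0]:
            continue
        alloc = (first, *rest)
        stat = sum(d * a for d, a in zip(doses, alloc))
        if stat >= observed:
            weight = 1
            for n, a in zip(group_sizes, alloc):
                weight *= comb(n, a)
            tail_weight += weight
    return tail_weight / denom


sizes, doses = (100, 50, 50, 50), (0, 2, 4, 8)
liver_carcinoma = exact_trend_pvalue((1, 0, 1, 3), sizes, doses)
adrenal_adenoma = exact_trend_pvalue((0, 0, 0, 1), sizes, doses)
```

A site with a single tumor in the top dose group can do no better than 50/250 = .20, which is why such sites cannot reach the significance level of the most extreme P-value.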
Available Methods
• Adjusting P-values to account for multiple tumor types (Mantel 1980, Brown and Fears 1981, Mantel et al. 1982)
• Adapting α for interpreting unadjusted P-values (Haseman 1983 and 1990, Lin and Rahman 1998)
• Resampling methods to adjust P-values (Heyse and Rom 1988, Westfall and Young 1989, Westfall and Soper 1998)
• Bayesian methods using historical control priors (Westfall and Soper 2001)
Multiplicity of Statistical Tests
• Liver, hepatocellular carcinoma was only 1 of K=15 tumor sites encountered.
• $P_{(1)} = 0.0342$ was the most extreme individual trend P-value.
• Interest is in the likelihood of observing $P_{(1)} = 0.0342$ as the most extreme P-value among the K=15 in this study.
• Need to consider the discrete nature of the data, since several tumor sites may not be able to achieve significance levels of $P_{(1)}$.
P-value Adjustment Methods
• Mantel (1980), attributed to J.W. Tukey:
\[ P_{[1]} = 1 - \left(1 - P_{(1)}\right)^{k} = 0.268 \]
where $k$ = number of tumor sites that could yield P-values as extreme as $P_{(1)}$.
• Mantel et al. (1982):
\[ P^{*}_{[1]} = 1 - \prod_{i=1}^{K} \left(1 - P_i^{*}\right) = 0.235 \]
where $P_i^{*}$ is the largest P-value achievable for tumor site $i$ that is less than or equal to $P_{(1)}$.
• Discrete FDR adjustment: $\tilde{P}_{[1]} = 0.264$
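The three adjustments of the most extreme P-value can be compared with a short Python sketch. The function names are illustrative, k = 9 is an assumption chosen because it roughly reproduces the Tukey-Mantel value reported on the slide, and the `p_star` list is made up since the individual $P_i^{*}$ values are not given. The last function is the discrete FDR sum at rank j = 1 only, before any step-up minimum.

```python
from math import prod


def tukey_mantel_adjusted(p_min, k):
    """1 - (1 - P_(1))^k for k sites able to reach P_(1)."""
    return 1 - (1 - p_min) ** k


def mantel_et_al_adjusted(p_star):
    """1 - prod(1 - P_i*) over all tumor sites."""
    return 1 - prod(1 - p for p in p_star)


def discrete_fdr_rank_one(p_star):
    """Sum of P_i* divided by j = 1 (assumed form at the first rank)."""
    return sum(p_star)


tukey = tukey_mantel_adjusted(0.0342, 9)   # k = 9 is an assumption
p_star = [0.03, 0.02, 0.0]                 # hypothetical P_i* values
```

For any set of $P_i^{*}$ values the product form is never larger than the plain sum, which is consistent with 0.235 falling below 0.264.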
Conclusions
• FDR control provides higher power than FWER control when some hypotheses are false.
• Proposed procedure based on exact analysis of binomial data controls FDR at α.
• Discrete nature of data results in slightly conservative FDR control. FDR is less conservative for increasing sample size and increasing numbers of hypotheses.
• Accounting for discrete endpoints increases the power of FDR.