Supplemental Material
S1. Derivation of meta-analyzed Levene’s test statistics using summary statistics
Let $n_i$ denote the count in the $i$th genotype group ($i = 0, 1, 2$). Levene's test statistic for assessing whether the genotype
groups share a common variance is:

$$L = \frac{(N-3)\sum_{i=0}^{2} n_i\,(\bar{Z}_i - \bar{Z})^2}{(3-1)\sum_{i=0}^{2}\sum_{j=1}^{n_i}(z_{ij} - \bar{Z}_i)^2}$$

where $z_{ij} = |y_{ij} - \bar{Y}_i|$, $y_{ij}$ is the $j$th observation in the $i$th genotype group and $\bar{Y}_i$ is the group mean of
$y_{ij}$. $\bar{Z}_i$ is the group mean of $z_{ij}$ and $\bar{Z}$ the overall mean of $z_{ij}$. Without loss of generality, assume the
quantitative trait conditional on a genotype is centered about its group mean (i.e. no main effect). $z_{ij} = |y_{ij} - \bar{Y}_i|$
then reduces to $z_{ij} = |y_{ij}|$.
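As an illustration of the definition above, the following minimal Python sketch computes Levene's statistic for trait values grouped by genotype. It assumes the absolute-deviation-from-the-mean form of $z_{ij}$; the function name `levene_statistic` and the toy data are hypothetical and not part of the original analysis.

```python
from statistics import fmean


def levene_statistic(groups):
    """Levene's statistic L for trait values grouped by genotype.

    groups: one list of trait values y_ij per genotype group i.
    """
    k = len(groups)  # number of genotype groups (3 in the text)
    group_means = [fmean(g) for g in groups]
    # z_ij = |y_ij - Ybar_i|: absolute deviation from the group's own mean
    z = [[abs(y - m) for y in g] for g, m in zip(groups, group_means)]
    n = [len(zi) for zi in z]
    N = sum(n)
    z_bar_i = [fmean(zi) for zi in z]        # group means of z
    z_bar = sum(sum(zi) for zi in z) / N     # overall mean of z
    between = sum(ni * (zb - z_bar) ** 2 for ni, zb in zip(n, z_bar_i))
    within = sum((v - zb) ** 2 for zi, zb in zip(z, z_bar_i) for v in zi)
    return (N - k) * between / ((k - 1) * within)


# Hypothetical toy data: three genotype groups with increasing spread
toy = [[1, 2, 3], [2, 4, 6], [1, 5, 9]]
L_toy = levene_statistic(toy)  # equals 4/3 (about 1.333) for this toy data
```

Groups with identical spread give a statistic of 0, since all group means of $z_{ij}$ then coincide with the overall mean.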
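Let $n_{is}$ be the count of genotype $i$ in study $s$, $n_i = \sum_s n_{is}$ ($i = 0, 1, 2$) the genotype counts summed over all
studies, and $N$ the overall sample size. Calculating Levene's test statistic by simply combining samples assumes the following
natural weights:

$$\omega_{is} = \frac{n_{is}}{\sum_s n_{is}} \quad \Big(\sum_s \omega_{is} = 1\Big) \qquad \text{and} \qquad \omega_i = \frac{n_i}{N} \quad \Big(\sum_i \omega_i = 1\Big).$$

Let superscript "+" denote the statistics calculated under a hypothetical situation where individual-level data from all studies are
used:
the overall within-genotype mean

$$\bar{Z}_i^{+} = \frac{\sum_{j=1}^{n_i} z_{ij}}{n_i} = \frac{\sum_s n_{is}\,\bar{Z}_{is}}{\sum_s n_{is}} = \sum_s \omega_{is}\,\bar{Z}_{is}$$

the grand mean

$$\bar{Z}^{+} = \frac{\sum_{i=0}^{2} \bar{Z}_i^{+}\, n_i}{N} = \frac{\sum_{i=0}^{2}\sum_s n_{is}\,\bar{Z}_{is}}{N} = \sum_{i=0}^{2}\sum_s \bar{Z}_{is}\,\omega_{is}\,\omega_i$$

Here $\bar{Z}_{is}$ and $\sigma^2_{Z_{is}}$ denote the mean and the sample variance of $z_{ij}$ in genotype group $i$ of study $s$, so that
the sum of $z_{ij}^2$ over that group equals $(n_{is}-1)\,\sigma^2_{Z_{is}} + n_{is}\,\bar{Z}_{is}^2$.
We then express the test statistic $L^{+}$ by mathematical equivalence using only the summary statistics and weights:

$$
\begin{aligned}
L^{+} &= \frac{(N-3)\sum_{i=0}^{2} n_i\,(\bar{Z}_i^{+} - \bar{Z}^{+})^2}{(3-1)\sum_{i=0}^{2}\sum_{j=1}^{n_i}(z_{ij} - \bar{Z}_i^{+})^2} \\
&= \frac{(N-3)}{(3-1)} \cdot \frac{\sum_{i=0}^{2} n_i\,(\bar{Z}_i^{+})^2 - N\,(\bar{Z}^{+})^2}{\sum_{i=0}^{2}\left(\sum_{j=1}^{n_i} z_{ij}^2 - n_i\,(\bar{Z}_i^{+})^2\right)} \\
&= \frac{(N-3)}{(3-1)} \cdot \frac{\sum_{i=0}^{2} n_i\left(\sum_s \omega_{is}\,\bar{Z}_{is}\right)^2 - N\left(\sum_{i=0}^{2}\sum_s \bar{Z}_{is}\,\omega_{is}\,\omega_i\right)^2}{\sum_{i=0}^{2}\left(\sum_s\left((n_{is}-1)\,\sigma^2_{Z_{is}} + \bar{Z}_{is}^2\, n_{is}\right) - n_i\left(\sum_s \omega_{is}\,\bar{Z}_{is}\right)^2\right)} \\
&= \frac{(N-3)\,N}{(3-1)\,N} \cdot \frac{\sum_{i=0}^{2} \omega_i\left(\sum_s \bar{Z}_{is}\,\omega_{is}\right)^2 - \left(\sum_{i=0}^{2}\sum_s \bar{Z}_{is}\,\omega_{is}\,\omega_i\right)^2}{\sum_{i=0}^{2}\left(\sum_s\left(\frac{n_{is}-1}{N}\,\sigma^2_{Z_{is}} + \bar{Z}_{is}^2\,\omega_{is}\,\omega_i\right) - \omega_i\left(\sum_s \bar{Z}_{is}\,\omega_{is}\right)^2\right)}
\end{aligned}
$$

The last step divides the numerator and the denominator by $N$ and uses $n_i / N = \omega_i$ and $n_{is} / N = \omega_{is}\,\omega_i$.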
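The equivalence derived in S1 can be checked numerically. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it takes the $z_{ij}$ values as already computed (i.e. the trait is centered within genotype, as assumed in S1), uses the sample variance with denominator $n_{is} - 1$, and all function names and data values are hypothetical.

```python
from statistics import fmean, variance


def pooled_levene(z_by_genotype):
    """Levene's statistic from individual-level z values pooled over studies."""
    k = len(z_by_genotype)
    n = [len(z) for z in z_by_genotype]
    N = sum(n)
    means = [fmean(z) for z in z_by_genotype]
    grand = sum(sum(z) for z in z_by_genotype) / N
    between = sum(ni * (m - grand) ** 2 for ni, m in zip(n, means))
    within = sum((v - m) ** 2 for z, m in zip(z_by_genotype, means) for v in z)
    return (N - k) * between / ((k - 1) * within)


def meta_levene(summary):
    """Levene's statistic from summary statistics only.

    summary[i][s] = (n_is, mean of z, sample variance of z) for genotype i, study s.
    """
    k = len(summary)
    n_i = [sum(n for n, _, _ in studies) for studies in summary]
    N = sum(n_i)
    w_i = [ni / N for ni in n_i]  # weights n_i / N
    # Pooled genotype means: sum over studies of (n_is / n_i) * mean_is
    zbar_i = [sum((n / ni) * m for n, m, _ in studies)
              for studies, ni in zip(summary, n_i)]
    grand = sum(w * zb for w, zb in zip(w_i, zbar_i))
    numerator = sum(w * zb ** 2 for w, zb in zip(w_i, zbar_i)) - grand ** 2
    denominator = sum(
        sum((n - 1) / N * v + (n / N) * m ** 2 for n, m, v in studies) - w * zb ** 2
        for studies, w, zb in zip(summary, w_i, zbar_i)
    )
    return (N - k) / (k - 1) * numerator / denominator


# Hypothetical z values: z_values[i][s] holds z_ij for genotype i in study s
z_values = [
    [[0.5, 1.2, 0.3, 0.8], [0.9, 0.4, 1.1]],
    [[1.5, 0.2, 0.7], [0.6, 1.9, 0.1, 1.0]],
    [[2.1, 0.3], [0.4, 1.6, 2.4]],
]
summary = [[(len(z), fmean(z), variance(z)) for z in studies] for studies in z_values]
pooled = [[v for z in studies for v in z] for studies in z_values]
L_pooled = pooled_levene(pooled)
L_meta = meta_levene(summary)  # agrees with L_pooled up to floating-point error
```

Only the per-study counts, means and variances of $z_{ij}$ enter `meta_levene`, which is what allows the statistic to be computed without sharing individual-level data.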
Table S1. Study-specific Quality Control Results
            Height                                   BMI
Study       Number of Samples   Number of SNPs       Number of Samples   Number of SNPs
MESA        2,358               695,368              2,168               692,326
NHS         3,307               687,532              2,449               671,098
HPFS        2,449               667,108              1,275               647,506
Combined    8,114               660,716              5,892               642,600
Table S2. SNPs associated with other traits/diseases reported in the published GWAS catalogue (at p-value < 5E-08) that
lie within 500 kb of the variance heterogeneity SNPs.
Distance (kb) is the distance between the variance heterogeneous SNP and the known SNP; PubMed ID gives the references.
Trait   Variance Heterogeneous SNP  Chr  Position (bp)  Nearest Gene  Known Associated Disease/Trait  Associated SNP  R2     D'     Distance (kb)  PubMed ID
BMI     rs12132044                  1    72306264       NEGR1         BMI                             rs2568958       0.325  0.888  231.440        20935630, 19079260
Height  rs11224301                  11   99959951       ARHGAP42      Blood Pressure                  rs633185        0.012  1.000  138.797        21909115
Height  rs12919408                  16   13831900       ERCC4         Menarche                        rs1659127       0.047  0.697  463.906        21102462
Height  rs7153476                   14   68102983       RAD51L1       Diabetes Mellitus, Type 1       rs1465788       0.000  0.016  230.369        19430480
Height  rs7153476                   14   68102983       RAD51L1       Multiple Sclerosis              rs4902647       0.007  0.092  220.961        21833088
Height  rs857179                    16   13838131       ERCC4         Menarche                        rs1659127       0.047  0.697  457.675        21102462
Table S3. Power to detect a gene-environment and a gene-gene interaction, respectively
Each condition in the corresponding cell was simulated 5,000 times with n = 1,000, 2,500, 5,000 and 7,500 individuals. The four
studies were individually analyzed to generate summary statistics for meta-analysis of Levene's test. SNPs with nominally
significant meta-analyzed Levene's test p-values were then selected by Variance Prioritization (VP) and tested for interaction
with either a continuous environmental covariate or a second SNP. The beta-coefficients represent the main effect of the
prioritized SNP (β1), the main effect of the interacting covariate or second SNP, which is not necessarily selected by Variance
Prioritization (β2), and the interaction effect (β3). Exhaustive search power represents the power to detect an interaction with
linear regression after correcting for M = 500,000 SNPs (p-value < 0.05/M). VP power represents the power of Variance
Prioritization at the optimal p-value threshold. The increase in power relative to exhaustive search is computed as a ratio and
approaches infinity when the exhaustive power approaches 0. Variance explained (VE) by the covariate and by the interaction was
calculated from the beta-coefficients and reflects effect sizes as a function of minor allele frequency (MAF).
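The two derived quantities described above can be sketched in a few lines of Python. The formulas here are assumptions inferred from the tabulated values rather than stated in the text: the relative increase is taken as (VP power - exhaustive power) / exhaustive power, and the variance explained assumes Hardy-Weinberg genotype variance 2 x MAF x (1 - MAF), a covariate and a residual each with unit variance, and an interaction variance of β3² times the genotype variance. Under these assumptions the sketch closely reproduces the VE rows of Table S3.1; the function names are hypothetical.

```python
import math


def relative_increase(vp_power, exhaustive_power):
    """Relative gain of VP over exhaustive search (assumed definition)."""
    if exhaustive_power == 0:
        # 0/0 is reported as 0 in the tables; otherwise the ratio is infinite
        return 0.0 if vp_power == 0 else math.inf
    return (vp_power - exhaustive_power) / exhaustive_power


def variance_explained_gxe(maf, beta1, beta2, beta3):
    """Percent variance explained in a gene-environment model (assumed form).

    Assumes trait = beta1*G + beta2*E + beta3*G*E + residual, with Var(E) = 1,
    Var(residual) = 1 and Var(G) = 2*maf*(1 - maf).
    """
    var_g = 2 * maf * (1 - maf)
    components = {
        "snp": beta1 ** 2 * var_g,
        "covariate": beta2 ** 2,
        "interaction": beta3 ** 2 * var_g,
    }
    total = 1.0 + sum(components.values())
    return {name: 100 * value / total for name, value in components.items()}


gain = relative_increase(0.0098, 0.0032)  # about 2.06
ve = variance_explained_gxe(maf=0.10, beta1=0.0, beta2=0.3, beta3=0.05)
# ve["covariate"] is about 8.25 (%), ve["interaction"] is about 0.041 (%)
```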
Table S3.1. Power to detect a gene-environment interaction
Beta-Coefficients              Statistic                        MAF = 10%  MAF = 20%  MAF = 30%  MAF = 40%  MAF = 50%
β1 = 0, β2 = 0.3, β3 = 0.05    Exhaustive Search Power          0.0032     0.0372     0.1066     0.1784     0.1942
                               Proportion of Prioritized SNPs   6%         6%         30%        21%        19%
                               VP Power                         0.0098     0.052      0.1352     0.2096     0.2276
                               Relative Increase                2.063      0.398      0.268      0.175      0.172
                               VE Covariate (%)                 8.253      8.251      8.249      8.2478     8.2474
                               VE Interaction (%)               0.0412     0.0733     0.0962     0.11       0.1145
β1 = 0, β2 = 0.3, β3 = 0.08    Exhaustive Search Power          0.1474     0.6558     0.8874     0.95       0.966
                               Proportion of Prioritized SNPs   21%        50%        69%        64%        73%
                               VP Power                         0.1764     0.6732     0.8928     0.9532     0.9684
                               Relative Increase                0.197      0.027      0.006      0.003      0.002
                               VE Covariate (%)                 8.248      8.241      8.2365     8.234      8.233
                               VE Interaction (%)               0.1056     0.1875     0.246      0.281      0.2927
β1 = 0.1, β2 = 0.3, β3 = 0.05  Exhaustive Search Power          0.0038     0.0412     0.1168     0.1724     0.1934
                               Proportion of Prioritized SNPs   19%        9%         21%        18%        24%
                               VP Power                         0.0086     0.063      0.14       0.2046     0.2322
                               Relative Increase                1.263      0.529      0.199      0.187      0.201
                               VE Covariate (%)                 8.24       8.227      8.2173     8.212      8.21
                               VE Interaction (%)               0.0412     0.0731     0.0959     0.1095     0.114
β1 = 0.1, β2 = 0.3, β3 = 0.08  Exhaustive Search Power          0.1536     0.6434     0.8898     0.9582     0.9614
                               Proportion of Prioritized SNPs   22%        63%        79%        78%        71%
                               VP Power                         0.183      0.661      0.8932     0.9604     0.9644
                               Relative Increase                0.191      0.027      0.004      0.002      0.003
                               VE Covariate (%)                 8.234      8.217      8.205      8.198      8.195
                               VE Interaction (%)               0.1054     0.187      0.2451     0.28       0.2914
Table S3.2. Power to detect a gene-gene interaction
Column headings give MAF1 / MAF2, the minor allele frequencies of the first and second SNP.
Beta-Coefficients                 Statistic                        10% / 10%   10% / 20%   20% / 20%   20% / 40%   30% / 30%   30% / 50%
β1 = 0.4, β2 = 0.4, β3 = 0.256    Exhaustive Search Power          0.0774      0.6972      0.999       1           1           1
                                  Proportion of Prioritized SNPs   56%         56%         87%         17%         10%         69%
                                  VP Power                         0.0912      0.7164      0.999       1           1           1
                                  Relative Increase                0.178       0.028       0           0           0           0
                                  VE SNP1 (%)                      2.7177      2.6573785   4.61631     4.4989      5.864076    5.7883
                                  VE SNP2 (%)                      2.7177      4.7242284   4.61631     6.7483      5.864076    6.8908
                                  VE Interaction (%)               0.2004      0.3483079   0.605069    0.8845      1.008809    1.1854
β1 = 0.4, β2 = 0.4, β3 = 0.1      Exhaustive Search Power          0           0           0.001       0.011       0.026       0.0682
                                  Proportion of Prioritized SNPs   1%          2%          10%         5%          8%          7%
                                  VP Power                         0           2.00E-04    0.0036      0.0278      0.0648      0.1408
                                  Relative Increase                0           Inf         2.6         1.527       1.492       1.065
                                  VE SNP1 (%)                      2.722       2.6652452   4.64010208  4.5328      5.914639    5.847
                                  VE SNP2 (%)                      2.722       4.7382137   4.64010208  6.7993      5.914639    6.9608
                                  VE Interaction (%)               0.0306      0.0533049   0.09280204  0.136       0.155259    0.1827
β1 = 0.1, β2 = 0.4, β3 = 0.1      Exhaustive Search Power          0           0           4.00E-04    0.0088      0.025       0.0708
                                  Proportion of Prioritized SNPs   1%          1%          33%         28%         36%         67%
                                  VP Power                         0           4.00E-04    0.0022      0.0144      3.00E-02    0.071
                                  Relative Increase                0           Inf         4.50        0.636       0.2         0.003
                                  VE SNP1 (%)                      0.1746      0.17084672  0.30319568  0.2959      0.391366    0.3866
                                  VE SNP2 (%)                      2.7936      4.85963993  4.85113092  7.101       6.261858    7.3644
                                  VE Interaction (%)               0.0314      0.05467095  0.09702262  0.142       0.164374    0.1933
Figure S1 – Quantile-quantile plot of Levene's test P-values using individual-level data and aggregated-level data (meta-analyzed Levene's test P-values)
Using simulated data under the null hypothesis of variance homogeneity, we showed that the p-values given by the meta-Levene test
statistic were identical whether computed from individual-level or summary-level data, as expected from their mathematical equivalence.
Figure S2. Quantile-quantile plots of Levene's test p-value distribution in individual studies and meta-analysis for
log(height) and log(BMI)
Illustrated in (A-D) are the quantile-quantile plots of meta-analyzed Levene's test P-values for log(height) in individual studies
(MESA, NHS, HPFS) and the combined analysis (8,114 individuals), respectively. Illustrated in (E-H) are the quantile-quantile
plots of meta-analyzed Levene's test P-values for log(BMI) in individual studies (MESA, NHS, HPFS) and the combined
analysis (5,892 individuals), respectively.
Figure S3. NEGR1 Gene Region
Regional plot of the Levene's test p-values obtained from meta-analysis of body mass index variance heterogeneity. The purple
diamond represents the rs12132044 variant, with a Levene's test p-value of 4.28e-6. The light blue circle to the far right
represents the rs2815752 variant, known to be associated with BMI and extreme obesity (r-squared of 0.325, estimated from the
HapMap CEU panel [The International HapMap 3 Consortium (2010). Integrating common and rare genetic variation in diverse
human populations. Nature: 10.1038/nature09298]). This plot was generated using LocusZoom (Pruim RJ, Welch RP, Sanna S,
Teslovich TM, Chines PS, Gliedt TP, Boehnke M, Abecasis GR, Willer CJ. (2010) LocusZoom: Regional visualization of
genome-wide association scan results. Bioinformatics 26(18): 2336-2337).
Figure S4.1 - Variance Prioritization power to detect a gene-environment interaction
Consider a hypothetical genetic consortium comprising 4 studies, with 1,000, 2,500, 5,000 and 7,500 participants and a common
set of 500,000 genotyped SNPs. The four studies were individually analyzed using Variance Prioritization, and their summary
statistics were then meta-analyzed (left panel). Each condition on the right panel was simulated 5,000 times. The minor
allele frequency of the SNP was set at 20%. For each condition, the environmental exposure was assumed to explain 13.8% of the
quantitative trait variance. The horizontal line represents the power to detect an interaction explaining 0.16% of the
quantitative trait variance with linear regression after correcting for M = 500,000 SNPs (p-value < 0.05/M). Black curves
represent the power of Variance Prioritization as the Levene's test p-value threshold ranges from 0.001 to 1 in increments of
0.001. The power of Variance Prioritization was maximized at the optimal p-value threshold.
Figure S4.2 - Variance Prioritization power to detect a gene-gene interaction
Each condition on the right panel was simulated 5,000 times with n = 1,000, 2,500, 5,000 and 7,500 individuals. The four studies
were individually analyzed using Variance Prioritization, and their summary statistics were then meta-analyzed
(left panel). The minor allele frequency of both interacting SNPs was set at 20%. For each condition, each individual SNP was
assumed to explain 4.6% of the quantitative trait variance. The horizontal line represents the power to detect a gene-gene
interaction explaining 0.2% of the quantitative trait variance with linear regression after correcting for M = 500,000 SNPs
(p-value < 0.05/M^2). Black curves represent the power of Variance Prioritization as the Levene's test p-value threshold ranges
from 0.001 to 1 in increments of 0.001. The power of Variance Prioritization was maximized at the optimal p-value threshold.
We compared the performance of VP using Meta-Levene to a conventional method (i.e. exhaustive search with correction for all
SNPs tested). For a gene-environment interaction explaining 0.16% of the quantitative trait variance (Figure S4.1), the power
to detect the interaction was estimated at 58.2% using either Meta-Levene or Levene's test on individual-level data, compared
with 52.2% using exhaustive search (left panel). Further, when Levene's test was not meta-analyzed, the VP powers to
detect interactions under the same condition in individual studies were 0.002% (n = 1,000), 0.4% (n = 2,500), 2.5% (n = 5,000)
and 9.54% (n = 7,500), corresponding to even lower conventional powers of 0%, 0.008%, 0.56% and 6.38%, respectively
(right panel). For the gene-gene interaction model, we considered the simplified situation where the two interacting loci had the
same minor allele frequency (p) and main effect (β1) on the quantitative trait. When the gene-gene interaction explained 0.2%
and each individual SNP explained 4.6% of the quantitative trait variance (Figure S4.2), the power to detect the gene-gene
interaction using exhaustive search was 8.90%, while Variance Prioritization led to an improved power of
14.04% (left panel). When Levene's test was not meta-analyzed, the powers of Variance Prioritization to detect
interactions under the same conditions in individual studies were 0% (n = 1,000), 0% (n = 2,500), 0.08% (n = 5,000) and
0.42% (n = 7,500), corresponding to conventional powers of 0%, 0%, 0% and 0.02%, respectively (right panel).
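For reference, the exhaustive-search significance levels and the grid of Levene's test thresholds used in Figures S4.1 and S4.2 can be written out as below. This is a small illustrative sketch: it assumes that "0.05/M2" in the Figure S4.2 caption denotes 0.05 divided by M squared, and the helper `optimal_threshold` is a hypothetical name for picking the threshold at which an estimated power curve peaks.

```python
M = 500_000   # number of genotyped SNPs, as stated in the text
ALPHA = 0.05  # family-wise significance level

gxe_threshold = ALPHA / M       # 1e-07, gene-environment exhaustive search
gxg_threshold = ALPHA / M ** 2  # 2e-13, gene-gene search (assumes M squared)

# Levene's test p-value thresholds: 0.001 to 1 in increments of 0.001
levene_thresholds = [k / 1000 for k in range(1, 1001)]


def optimal_threshold(thresholds, powers):
    """Return the (threshold, power) pair with the highest estimated power."""
    return max(zip(thresholds, powers), key=lambda pair: pair[1])
```

Given a list of simulated VP powers aligned with `levene_thresholds`, `optimal_threshold(levene_thresholds, powers)` would return the threshold at which the black curves in the figures reach their maximum.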