
Supplementary materials
Title: EigenGWAS: finding loci under selection through genome-wide association studies of eigenvectors in structured populations
Guo-Bo Chen1, Sang Hong Lee1,2, Zhi-Xiang Zhu3, Beben Benyamin1, Matthew R. Robinson1
1 Queensland Brain Institute, The University of Queensland, Brisbane, QLD 4072, Australia;
2 School of Environmental and Rural Science, The University of New England, Armidale, NSW 2351, Australia;
3 SPLUS Game, Guangzhou, Guangdong 510665, China
Figure S1 The distribution of the entries in the genetic relatedness matrix for HapMap3 and POPRES.
Figure S2 Correlation between the SNP effects estimated using EigenGWAS and BLUP for POPRES samples.
Figure S3 The correlation between EigenGWAS chi-square test statistics and $F_{st}$ for HapMap3 samples.
Figure S4 EigenGWAS for simulated data. The simulated data set has 2,000 samples and 500,000 biallelic markers.
Figure S5 The correlation between chi-square test statistics for EigenGWAS SNP effects and their $F_{st}$.
Figure S6 Distribution of p-values for EigenGWAS for simulated data (Scheme I).
Figure S7 Validation of the adjustment for the test statistic with the largest eigenvalue.
Figure S8 The statistical power of EigenGWAS.
Figure S9 Eigenvector 1 against eigenvector 2 for 88 TSI samples and 112 CEU samples.
Figure S10 QQ plot of EigenGWAS $\chi^2$ statistics against the $\chi^2$ distribution expected under the null for the CEU&TSI cohort.
Figure S11 Eigenvector 1 against eigenvector 2 for POPRES European samples.
Figure S12 QQ plot of EigenGWAS $\chi^2$ statistics against the $\chi^2$ distribution expected under the null for POPRES.
Table S1 The correlation between $F_{st}$ and the chi-square test statistics for SNP effects in each EigenGWAS
Table S2 Gene discovery using EigenGWAS for POPRES using 234,127 SNPs
Table S3 EigenGWAS results for POPRES based on 234,127 SNPs
Figure S1 The distribution of the entries in the genetic relatedness matrix for HapMap3 and POPRES. a) HapMap is a mix of various ethnicities, so many of its off-diagonal elements are also large. b) POPRES Europeans form a relatively homogeneous population, so their off-diagonal elements are very close to zero.
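The relatedness entries summarized in Figure S1 can be computed with a standard genetic relatedness matrix (GRM) construction. A minimal sketch, assuming the GCTA-style estimator $A = ZZ'/m$ on standardized genotypes (the supplement does not spell out its estimator) and using made-up simulated genotypes in place of HapMap3 and POPRES; `grm` and `off_diagonal_sd` are hypothetical helpers.

```python
import numpy as np


def grm(genotypes):
    """Hypothetical helper: GCTA-style GRM from an n x m matrix of 0/1/2 allele counts."""
    p = genotypes.mean(axis=0) / 2.0                    # sample allele frequency per SNP
    keep = (p > 0.0) & (p < 1.0)                        # drop monomorphic SNPs
    g, p = genotypes[:, keep], p[keep]
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))    # standardize each SNP
    return z @ z.T / z.shape[1]                         # average over the retained SNPs


def off_diagonal_sd(a):
    """Standard deviation of the off-diagonal GRM entries."""
    return a[~np.eye(a.shape[0], dtype=bool)].std()


rng = np.random.default_rng(1)
n, m = 200, 5000
freq = rng.uniform(0.05, 0.5, size=m)

# Toy homogeneous sample: everyone drawn from the same allele frequencies.
homogeneous = rng.binomial(2, freq, size=(n, m)).astype(float)

# Toy structured sample: two groups whose allele frequencies differ by random drift.
freq_b = np.clip(freq + rng.normal(0.0, 0.1, size=m), 0.01, 0.99)
structured = np.vstack([
    rng.binomial(2, freq, size=(n // 2, m)),
    rng.binomial(2, freq_b, size=(n // 2, m)),
]).astype(float)

a_homog = grm(homogeneous)
a_struct = grm(structured)
print("mean diagonal (homogeneous):", np.diag(a_homog).mean())
print("off-diagonal SD, homogeneous vs structured:",
      off_diagonal_sd(a_homog), off_diagonal_sd(a_struct))
```

In the toy structured sample the off-diagonal entries spread more widely than in the homogeneous one, which is the same qualitative contrast as between panels a) and b).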
Figure S2 Correlation between the SNP effects estimated using EigenGWAS and BLUP for POPRES samples. The x-axis shows the EigenGWAS estimates of SNP effects, and the y-axis shows the BLUP estimates. The left panel covers $E_1$ to $E_5$ and the right panel $E_6$ to $E_{10}$. As illustrated, the correlation is nearly 1.
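A linear-algebra identity suggests why this correlation is so close to 1, under assumptions. If each SNP column of $Z$ is standardized and $E$ is an eigenvector of $ZZ'/m$ with eigenvalue $\lambda$, then $ZZ'E = m\lambda E$, so a ridge-type BLUP of all SNP effects, $Z'(ZZ' + \delta I)^{-1}E$, equals $Z'E/(m\lambda + \delta)$. That is proportional to the EigenGWAS marginal estimates $Z'E/n$. The sketch below checks this numerically on toy genotypes; the ridge form of BLUP and the shrinkage value `delta` are illustrative assumptions, not the model fitted for the figure.

```python
import numpy as np

rng = np.random.default_rng(7)
n, m = 300, 2000
freq = rng.uniform(0.05, 0.5, size=m)
x = rng.binomial(2, freq, size=(n, m)).astype(float)
x = x[:, x.std(axis=0) > 0]                     # drop monomorphic SNPs
z = (x - x.mean(axis=0)) / x.std(axis=0)        # mean 0, variance 1 per SNP
m = z.shape[1]

# Leading eigenvector of the GRM, used as the EigenGWAS "phenotype".
eigval, eigvec = np.linalg.eigh(z @ z.T / m)    # eigenvalues in ascending order
e1 = eigvec[:, -1]

# EigenGWAS: marginal least-squares slope of e1 on each SNP (z_i'z_i = n here).
beta_eigengwas = z.T @ e1 / n

# Ridge-type BLUP fitting all SNPs jointly; delta is an arbitrary shrinkage value.
delta = 50.0
beta_blup = z.T @ np.linalg.solve(z @ z.T + delta * np.eye(n), e1)

r = np.corrcoef(beta_eigengwas, beta_blup)[0, 1]
print("correlation:", r)
```

Under these assumptions the two estimates differ only by the scale factor $n/(m\lambda + \delta)$. With real data and a fitted BLUP model, the match need not be exact.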
Figure S3 The correlation between EigenGWAS chi-square test statistics and $F_{st}$ for HapMap3 samples. The correlations between EigenGWAS chi-square test statistics and $F_{st}$ for $E_1$ to $E_{10}$ (left panel, top to bottom, $E_1$ to $E_5$; right panel, top to bottom, $E_6$ to $E_{10}$) were 0.965, 0.819, 0.094, 0.146, 0.155, 0.186, 0.244, 0.159, 0.142, and 0.156, respectively.
Figure S4 EigenGWAS for simulated data (Scheme I). The simulated data set has 2,000 samples and 500,000 biallelic markers. Markers are in linkage equilibrium, and minor allele frequencies follow a uniform distribution from 0.01 to 0.5. Eigenvalues associated with the first ten eigenvectors are 1.12935, 1.12895, 1.12842, 1.12763, 1.12725, 1.12622, 1.12605, 1.12595, 1.12546, and 1.1253, respectively.
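The eigenvalues listed above, all close to 1.13, are in line with what random-matrix theory predicts for an unstructured sample (our interpretation, not a statement from the supplement). For $n$ individuals and $m$ independent markers, the Marchenko-Pastur law places the largest GRM eigenvalue near $(1 + \sqrt{n/m})^2$, which for $n = 2{,}000$ and $m = 500{,}000$ is about 1.1305. A minimal sketch of a scaled-down Scheme I simulation, assuming genotypes are drawn under Hardy-Weinberg equilibrium; `simulate_scheme_one` is a hypothetical helper, and the sizes are reduced so that it runs quickly.

```python
import numpy as np


def simulate_scheme_one(n, m, rng):
    """Hypothetical helper: unstructured sample, SNPs in linkage equilibrium,
    MAF ~ Uniform(0.01, 0.5), genotypes drawn under Hardy-Weinberg equilibrium."""
    maf = rng.uniform(0.01, 0.5, size=m)
    return rng.binomial(2, maf, size=(n, m)).astype(float)


rng = np.random.default_rng(2016)
n, m = 200, 20000                                  # scaled down from 2,000 x 500,000
x = simulate_scheme_one(n, m, rng)
x = x[:, x.std(axis=0) > 0]                        # drop SNPs monomorphic in the sample
z = (x - x.mean(axis=0)) / x.std(axis=0)
m_used = z.shape[1]

# Ten largest GRM eigenvalues, in descending order.
top10 = np.linalg.eigvalsh(z @ z.T / m_used)[::-1][:10]
mp_edge = (1.0 + np.sqrt(n / m_used)) ** 2         # Marchenko-Pastur upper edge
print("top 10 eigenvalues:", np.round(top10, 4))
print("Marchenko-Pastur edge:", round(mp_edge, 4))
```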
Figure S5 Distribution of p-values for EigenGWAS for simulated data (Scheme I). The simulated data set has 2,000 samples and 500,000 biallelic markers. Markers are in linkage equilibrium, and minor allele frequencies follow a uniform distribution from 0.01 to 0.5. Eigenvalues associated with the first ten eigenvectors are 1.12935, 1.12895, 1.12842, 1.12763, 1.12725, 1.12622, 1.12605, 1.12595, 1.12546, and 1.1253, respectively. The p-values have been corrected for $\lambda_{GC}$.
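A correction for $\lambda_{GC}$ is usually done by genomic control. We assume that is the meaning here: $\lambda_{GC}$ is the median observed $\chi^2$ divided by the median of $\chi^2_1$ (about 0.4549), and every statistic is divided by $\lambda_{GC}$ before computing p-values. A minimal sketch using only the Python standard library; `genomic_control` is a hypothetical helper, and the toy statistics are made up.

```python
import math
import statistics

CHI2_1_MEDIAN = 0.4549364  # median of the chi-square distribution with 1 degree of freedom


def genomic_control(chi2_stats):
    """Hypothetical helper: estimate lambda_GC and return corrected statistics and p-values."""
    lambda_gc = statistics.median(chi2_stats) / CHI2_1_MEDIAN
    corrected = [s / lambda_gc for s in chi2_stats]
    # Upper-tail p-value of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2)).
    p_values = [math.erfc(math.sqrt(s / 2.0)) for s in corrected]
    return lambda_gc, corrected, p_values


# Toy inflated statistics (made-up numbers, not EigenGWAS output).
toy = [0.2, 0.9, 1.5, 2.3, 4.0, 30.0]
lam, corrected, p_values = genomic_control(toy)
print(round(lam, 3), [round(p, 4) for p in p_values])
```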
Figure S6 The correlation between chi-square test statistics for EigenGWAS SNP effects and their $F_{st}$. The samples were split into two groups according to whether $E_i > 0$ or $E_i \le 0$, and $F_{st}$ was calculated between these groups. The average correlation between the chi-square test statistics and $F_{st}$ was around 0.66.
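The split-and-compare procedure in this caption can be sketched on toy data. This is a minimal sketch under several assumptions. Two simulated groups stand in for real samples. The per-SNP EigenGWAS statistic is approximated as $n r^2$, with $r$ the correlation between the SNP and $E_1$. `nei_fst` is a hypothetical helper implementing one common form of Nei's estimator, $(H_T - H_S)/H_T$, with the groups weighted by sample size. Because the toy data contain a clean two-group split, the correlation printed here need not match the ~0.66 reported above.

```python
import numpy as np


def nei_fst(p1, p2, w):
    """Hypothetical helper: Nei-style Fst = (H_T - H_S) / H_T for two groups,
    with w the fraction of samples in group 1."""
    p_bar = w * p1 + (1.0 - w) * p2
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    h_s = w * 2.0 * p1 * (1.0 - p1) + (1.0 - w) * 2.0 * p2 * (1.0 - p2)
    return (h_t - h_s) / h_t


rng = np.random.default_rng(3)
m = 3000
base = rng.uniform(0.1, 0.5, size=m)
drifted = np.clip(base + rng.normal(0.0, 0.08, size=m), 0.01, 0.99)
x = np.vstack([
    rng.binomial(2, base, size=(150, m)),
    rng.binomial(2, drifted, size=(150, m)),
]).astype(float)
x = x[:, x.std(axis=0) > 0]
z = (x - x.mean(axis=0)) / x.std(axis=0)
n = z.shape[0]

# Leading GRM eigenvector E1 as the EigenGWAS "phenotype".
e1 = np.linalg.eigh(z @ z.T / z.shape[1])[1][:, -1]

# Per-SNP association statistic, approximated as n * r^2 (an assumption).
r = z.T @ ((e1 - e1.mean()) / e1.std()) / n
chi2 = n * r ** 2

# Split samples by the sign of E1 and compute Fst between the two groups.
in_group1 = e1 > 0
p1 = x[in_group1].mean(axis=0) / 2.0
p2 = x[~in_group1].mean(axis=0) / 2.0
fst = nei_fst(p1, p2, in_group1.mean())

corr = np.corrcoef(chi2, fst)[0, 1]
print("correlation(chi2, Fst):", corr)
```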
Figure S7 Validation of the adjustment for the test statistics. Two subdivisions were generated, with an average $F_{st}$ of 0.02 between them over 10,000 loci. In the top panel, subdivisions 1 and 2 had 1,000 and 200 individuals, respectively; $\hat{\lambda}_1 = 14.34$, and its expected value was 14.46. After the adjustment for $\lambda_1$, the distribution of the test statistic follows the null distribution very well. In the bottom panel, subdivisions 1 and 2 had 1,000 and 500 individuals, respectively; $\hat{\lambda}_1 = 26.96$, and its expected value was 27.50. The black squares are test statistics after the adjustment for $\hat{\lambda}_1$, whereas the red points are test statistics calculated from theory, $E(\chi^2_{1,\lambda_1}) = 4nw(1-w)F_{st}^{N}$, in which $n$ is the total sample size of the two subdivisions, $w = 0.83$ for the top panel and $0.67$ for the bottom panel, and $F_{st}^{N}$ is Nei's measure of $F_{st}$.
The agreement between the red and black points validates our theory. For more details please refer to the main text.
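The theoretical expression in this caption, $4nw(1-w)F_{st}^{N}$, can be evaluated directly. A minimal sketch, assuming $w$ is the fraction of the combined sample in subdivision 1. This matches the quoted 0.83 (1,000/1,200) and 0.67 (1,000/1,500), and because the formula depends on $w$ only through $w(1-w)$, using subdivision 2 instead gives the same result. The average $F_{st}$ of 0.02 is plugged in purely for illustration. `expected_chi2` is a hypothetical helper.

```python
def expected_chi2(n, w, fst_nei):
    """Hypothetical helper: the caption's theoretical value 4 n w (1 - w) Fst^N."""
    return 4.0 * n * w * (1.0 - w) * fst_nei


# Panel settings from the caption; Fst^N = 0.02 (the average) is used for illustration.
panels = {"top": (1000, 200), "bottom": (1000, 500)}
for name, (n1, n2) in panels.items():
    n = n1 + n2
    w = n1 / n                      # rounds to 0.83 (top) and 0.67 (bottom)
    print(name, n, round(w, 2), round(expected_chi2(n, w, 0.02), 2))
```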
[Figure S8 plot: two panels of Power (y-axis, 0 to 1) against Selection (Fst) (x-axis, 0 to 1); legend: Fst=0.01, Fst=0.02, Fst=0.05, Fst=0.1]
Figure S8 The statistical power of EigenGWAS. The x-axis represents the strength of selection at a locus, in terms of $F_{st}$; $\bar{F}_{st}$ indicates the level of population stratification. The y-axis represents the statistical power evaluated from $\chi^2_1$ with non-centrality parameter $F_{st}/\bar{F}_{st} - 1$. The p-value cutoffs for 0.05 significance in the left and right panels were 0.05/10,000 and 0.05/500,000, respectively.
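The power curves can be reproduced in outline from this caption. A minimal sketch, reading the non-centrality parameter as $F_{st}/\bar{F}_{st} - 1$ and clamping it at 0 when $F_{st} < \bar{F}_{st}$ (our assumption, since a non-centrality parameter cannot be negative). It relies on the identity that a non-central $\chi^2_1$ variable with parameter $\nu$ is distributed as $(Z + \sqrt{\nu})^2$ with $Z$ standard normal, so only the Python standard library is needed. `eigengwas_power` is a hypothetical helper.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def eigengwas_power(fst, fst_bar, alpha):
    """Hypothetical helper: power of a 1-df chi-square test whose non-centrality
    parameter is fst / fst_bar - 1 (clamped at 0 as an assumption)."""
    ncp = max(fst / fst_bar - 1.0, 0.0)
    c = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)   # sqrt of the chi-square(1) critical value
    shift = ncp ** 0.5
    # A non-central chi-square(1, ncp) variable is (Z + sqrt(ncp))^2 with Z ~ N(0, 1),
    # so the test rejects when |Z + shift| > c.
    return STD_NORMAL.cdf(-c - shift) + (1.0 - STD_NORMAL.cdf(c - shift))


# Bonferroni-style cutoffs from the caption: 0.05 / 10,000 and 0.05 / 500,000.
for alpha in (0.05 / 10_000, 0.05 / 500_000):
    for fst_bar in (0.01, 0.02, 0.05, 0.1):
        row = [round(eigengwas_power(f, fst_bar, alpha), 3) for f in (0.2, 0.4, 0.6, 0.8, 1.0)]
        print(f"alpha={alpha:.1e} Fst_bar={fst_bar}: {row}")
```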
Figure S9 Eigenvector 1 against eigenvector 2 for 88 TSI samples and 112 CEU samples. Eigenvectors were generated directly from 919,313 SNPs over the 88 TSI and 112 CEU samples from HapMap.
Figure S10 QQ plot of EigenGWAS $\chi^2$ statistics against the $\chi^2$ distribution expected under the null for the CEU&TSI cohort. The left panel shows the original EigenGWAS for the CEU&TSI cohort, which had $\lambda_{GC} = 1.725$; the right panel shows the statistics corrected for $\lambda_{GC}$.
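The expected quantiles on the null axis of such a QQ plot can be computed with only the Python standard library, because a 1-df chi-square variable is the square of a standard normal variable. A minimal sketch; `qq_points` is a hypothetical helper, and the plotting positions $(i - 0.5)/k$ are one common convention, assumed here.

```python
from statistics import NormalDist


def qq_points(chi2_stats):
    """Hypothetical helper: pair sorted observed 1-df chi-square statistics with
    the quantiles expected under the null."""
    observed = sorted(chi2_stats)
    k = len(observed)
    std_normal = NormalDist()
    points = []
    for i, obs in enumerate(observed, start=1):
        prob = (i - 0.5) / k                       # plotting position (assumed convention)
        # P(chi2_1 <= q) = 2 * Phi(sqrt(q)) - 1, so sqrt(q) is the normal quantile at (1 + prob) / 2.
        zq = std_normal.inv_cdf((1.0 + prob) / 2.0)
        points.append((zq * zq, obs))
    return points


# Toy statistics inflated by a common factor of 1.7 (made-up numbers).
toy = [1.7 * s for s in (0.02, 0.1, 0.45, 1.3, 3.8, 9.0)]
for expected, observed in qq_points(toy):
    print(f"{expected:.3f}\t{observed:.3f}")
```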
Figure S11 Eigenvector 1 against eigenvector 2 for POPRES European samples.
Figure S12 QQ plot of EigenGWAS $\chi^2$ statistics against the $\chi^2$ distribution expected under the null for POPRES. The left panel shows the original EigenGWAS for the POPRES cohort, which had $\lambda_{GC} = 5.00$; the right panel shows the statistics corrected for $\lambda_{GC}$.
Table S1 The correlation between $F_{st}$ and the chi-square test statistics for SNP effects in each EigenGWAS
Phenotype   E1      E2      E3      E4      E5      E6      E7      E8      E9      E10
POPRES      0.887   0.933   0.904   0.987   0.765   0.994   0.977   0.963   0.917   0.927
p-value     <1e-16  <1e-16  <1e-16  <1e-16  <1e-16  <1e-16  <1e-16  <1e-16  <1e-16  <1e-16
HapMap      0.965   0.819   0.094   0.146   0.155   0.186   0.244   0.159   0.142   0.156
p-value     <1e-16  <1e-16  <1e-16  <1e-16  <1e-16  <1e-16  <1e-16  <1e-16  <1e-16  <1e-16
Table S2 Gene discovery using EigenGWAS for POPRES using 234,127 SNPs
Gene   Phenotype  Lead SNP    Alleles  p-value    Allele frequency (samples)  Fst    Position (CHR:BP)
LCT    E1         rs2304371   C/T      4.19e-46   0.817 (1373):0.676 (1062)   0.110  2:135817629
MCM6   E1         rs309180    G/A      5.223e-93  0.706 (1386):0.483 (1080)   0.102  2:135856685
HERC2  E1         rs8039195   A/G      8.902e-36  0.694 (1385):0.765 (1080)   0.055  15:28268218
Table S3 EigenGWAS results for POPRES based on 234,127 SNPs
E_i   Eigenvalue  λ_GC    #GWAS hits
1     5.104       5.282   5373
2     2.207       2.118   544
3     2.157       1.703   713
4     2.077       1.467   481
5     1.971       1.726   447
6     1.871       1.303   464
7     1.843       1.469   513
8     1.818       1.660   295
9     1.807       1.637   309
10    1.798       1.584   240