Supplemental materials synaptic groups and SCZ
Supplemental Materials for:
Functional gene group analysis identifies synaptic gene groups as risk factor for schizophrenia
Esther S Lips1, L Niels Cornelisse1, Ruud F Toonen1, Josine L Min1, Christina M Hultman2,3, the
International Schizophrenia Consortium#, Peter A. Holmans4, Michael C. O'Donovan4, Shaun M.
Purcell5,6,7,8, August B Smit9, Matthijs Verhage1, Patrick F Sullivan10, Peter M Visscher11, Danielle
Posthuma1,12
1 Department of Functional Genomics, Center for Neurogenomics and Cognitive Research, Neuroscience Campus Amsterdam VU University, The Netherlands
2 Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
3 Department of Neuroscience, Psychiatry, Ulleråker, Uppsala University, Uppsala, Sweden
4 School of Medicine, Department of Psychological Medicine, School of Medicine, Cardiff University, Cardiff, United Kingdom
5 Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
6 Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
7 Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
8 Center for Human Genetic Research, Massachusetts General Hospital, Boston, Massachusetts, United States of America
9 Molecular & Cellular Neurobiology, Center for Neurogenomics and Cognitive Research, Neuroscience Campus Amsterdam VU University, The Netherlands
10 Department of Genetics, University of North Carolina, Chapel Hill, United States of America
11 Queensland Statistical Genetics Laboratory, Queensland Institute of Medical Research, Brisbane, Australia
12 Department of Medical Genomics, VU Medical Center, Neuroscience Campus Amsterdam, The Netherlands
# Please see Acknowledgements for consortium authorship
Correspondence to: Danielle Posthuma, Center for Neurogenomics and Cognitive Research
(CNCR), De Boelelaan 1085, 1081 HV Amsterdam, The Netherlands Tel: +31 20 598 2823, Fax:
+31 20 598 6926, E-mail: [email protected]
Supplemental Materials Contents
1. Genome-wide association analyses
2. Testing synaptic genes versus groups of randomly drawn genes
2.1 Competitive test matched for the number of genes – control method 1.
2.2 Competitive test matched for the effective number of SNPs: control method 2 - genic and non-genic SNPs.
2.3 Competitive test matched for the effective number of SNPs: control method 3 - genic SNPs only.
2.4 Competitive test matched for the effective number of SNPs: control method 4 - non-genic SNPs only.
2.5 Competitive test matched for the effective number of SNPs: control method 5 - SNPs annotated to brain-expressed genes.
2.6 Overlap across samples and across draws
3. Systematic differences between synaptic genes and other genes
4. Functional gene group analysis for synaptic subgroups
5. Enrichment analysis on genes previously implicated in schizophrenia
5.1 Signals from GWAS studies
5.2 Signals from large scaled CNV studies
5.3 Enrichment analysis
5.4 Gene-group analysis using previously implicated genes
6. Genetic heterogeneity and gene group robustness
7. Graphical representation of significant functional gene groups
8. Web resources
9. References
1. Genome-wide association analyses
We first conducted a single-SNP analysis on the ISC_AFFY5, ISC_AFFY6 and GAIN_AFFY6
datasets, using all SNPs that passed QC. For the ISC samples we clustered the analysis by
collection site to correct for possible confounding due to population stratification, following
Purcell et al. (2009). Figure S1 shows the three Manhattan plots, while Figure S2 shows the
corresponding QQ plots. Red dots indicate SNPs with a P-value < 1x10^-4. None of the SNPs
reached the threshold of genome-wide significance (P < 1x10^-8).
Figure S1: Manhattan plots of all SNPs that passed QC for the ISC affy5, affy6 and GAIN samples.
Figure S2: Quantile-quantile plots of all SNPs that passed QC for the ISC affy5, affy6 and GAIN
samples.
2. Testing synaptic genes versus groups of randomly drawn genes
We randomly drew control groups of genes/SNPs that were not necessarily functionally related,
to test whether the group of synaptic genes was significantly more related to the risk of
schizophrenia than any other randomly drawn group of genes/SNPs. Randomly drawn groups
were compiled following two strategies: matched for the number of genes and matched for the
effective number of SNPs. For both strategies we conducted 100 random draws.
When creating groups matched for the effective number of SNPs we first derived the effective
number of SNPs in the synaptic gene group from the empirical distribution of the Σ-log(P)
across the 10,000 permutations, i.e. under the null hypothesis of no association, following Purcell et al.
(2009). Briefly, under the null hypothesis of no association, -2ln(P) is distributed as a χ2 with 2
degrees of freedom, and hence –log10(P) is distributed as 1/(2ln(10)) = 0.217 times a χ2 with 2
degrees of freedom. If all M SNPs are independent then Σ-log10(P) has a mean of (0.217)(2M) and
a variance of (0.217)^2(4M) = 0.189M. We define the effective number of SNPs (Meff) as

$$
M_{\mathrm{eff}}
= \frac{M_{\mathrm{obs}}\left[\sigma^{2}_{\mathrm{exp},\,\Sigma -\log_{10}(p)} \mid \mathrm{SNPs\ independent}\right]}{\sigma^{2}_{\mathrm{emp},\,\Sigma -\log_{10}(p)}}
= \frac{M_{\mathrm{obs}}\left[(0.217)^{2}\,(4M_{\mathrm{obs}})\right]}{\sigma^{2}_{\mathrm{emp},\,\Sigma -\log_{10}(p)}}
= \frac{0.189\,M_{\mathrm{obs}}^{2}}{\sigma^{2}_{\mathrm{emp},\,\Sigma -\log_{10}(p)}}
$$

The expected mean and variance are calculated based on the number of SNPs that are summed
to obtain the Σ-log(P), and larger variance of the observed distribution than expected indicates
dependency (i.e. due to LD) between included SNPs.
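
As a minimal sketch of the Meff definition above, the calculation could be written in Python as follows. The function names and the input `permuted_sums` are illustrative assumptions, not part of the original analysis pipeline. The final lines plug in the observed SNP counts and permutation variances listed for the ISC AFFY6 and GAIN samples in Table S1.

```python
import math
from statistics import variance

# Scale factor linking -log10(P) to a chi-square with 2 df: 1/(2 ln 10) ~ 0.217
SCALE = 1.0 / (2.0 * math.log(10.0))


def meff_from_variance(m_obs, empirical_var):
    """Effective number of SNPs given the empirical variance of sum(-log10 P)."""
    # Expected variance of the sum if all m_obs SNPs were independent
    expected_var = (SCALE ** 2) * 4.0 * m_obs
    return m_obs * expected_var / empirical_var


def meff_from_permutations(m_obs, permuted_sums):
    """Same quantity, estimating the empirical variance from permuted sums.

    `permuted_sums` holds one sum(-log10 P) per permutation (hypothetical input).
    The sample variance is used here; that choice is an assumption.
    """
    return meff_from_variance(m_obs, variance(permuted_sums))


# Inputs taken from Table S1: nSNPs observed and variance of the permuted sums
meff_affy6 = meff_from_variance(34860, 59022.80)
meff_gain = meff_from_variance(35412, 62465.73)
print(round(meff_affy6), round(meff_gain))
```

A larger empirical variance (stronger LD between the included SNPs) gives a smaller Meff for the same observed number of SNPs.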
For each sample, we started with a list of all SNPs that passed QC. We then retrieved a list of
all independent SNPs (using the Plink option --indep-pairwise, with a window size of 200 SNPs, a
step size of 5 SNPs and an r2 threshold of 0.25), deleted all SNPs from synaptic genes, and created
four pools: one pool included all genic and non-genic SNPs (control method 2), one pool included
only SNPs in genes (control method 3), one pool included only SNPs outside genes (control
method 4), and one pool included SNPs in genes known to be expressed in brain (control
method 5). A SNP was assigned to a gene when located between the transcription start site and
transcription end site of the gene. Expression in brain was determined using the Unigene Homo
sapiens repository (Build #221).
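
The pool construction described above can be sketched as follows. This is an illustration only: the `Gene` record, the function names and the pool labels are hypothetical, gene boundaries are treated as inclusive, and LD pruning is assumed to have been done beforehand.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int  # transcription start site (lower coordinate)
    end: int    # transcription end site (higher coordinate)


def genes_for_snp(chrom, pos, genes):
    """Names of all genes whose transcribed region contains the SNP."""
    return [g.name for g in genes if g.chrom == chrom and g.start <= pos <= g.end]


def build_pools(snps, genes, synaptic, brain_expressed):
    """Split LD-pruned SNPs into the four control pools (methods 2-5)."""
    pools = {"method2_all": [], "method3_genic": [],
             "method4_nongenic": [], "method5_brain": []}
    for snp_id, chrom, pos in snps:
        hits = genes_for_snp(chrom, pos, genes)
        if any(h in synaptic for h in hits):
            continue  # SNPs in synaptic genes are removed from every pool
        pools["method2_all"].append(snp_id)
        if hits:
            pools["method3_genic"].append(snp_id)
            if any(h in brain_expressed for h in hits):
                pools["method5_brain"].append(snp_id)
        else:
            pools["method4_nongenic"].append(snp_id)
    return pools


# Toy example with made-up coordinates
genes = [Gene("SYN1", "X", 100, 200), Gene("G2", "1", 50, 150), Gene("G3", "1", 300, 400)]
snps = [("rs1", "X", 150), ("rs2", "1", 100), ("rs3", "1", 350), ("rs4", "1", 500)]
pools = build_pools(snps, genes, synaptic={"SYN1"}, brain_expressed={"G2"})
print(pools)
```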
For each draw we carried out an association analysis for all SNPs in the group, calculated the
Σ-log(P) and then carried out 10,000 permutations of the dataset to determine the empirical
P-value. We then calculated per dataset (i.e. ISC-AFFY5, ISC-AFFY6, GAIN-AFFY6) how often the
empirical P-value from the random draw was lower than the empirical P-value of all synaptic
genes and divided that by the total number of draws to obtain the ‘empirical P-value of the
empirical P-value’. Since there were 100 draws, the lowest empirical P-value of the empirical
P-value that we could obtain was <.01, which occurs when none of the random draws had a lower
P-value than the synaptic gene group. We then combined the empirical P-values across the
three different samples and within each draw to obtain 100 combined empirical P-values. We
finally determined how often the combined empirical P-value was lower than the combined
empirical P-value obtained from the synaptic genes, to calculate the ‘empirical P-value of the
combined P-value’.
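
The counting steps in this procedure can be illustrated with a short Python sketch. The function names and the example values are hypothetical, and the permutation P-value is assumed here to be the share of permuted sums at least as large as the observed sum. The step that combines P-values across samples is left out.

```python
def empirical_p(observed_sum, permuted_sums):
    """Share of permutations whose sum(-log10 P) is at least the observed sum."""
    n_extreme = sum(1 for s in permuted_sums if s >= observed_sum)
    return n_extreme / len(permuted_sums)


def empirical_p_of_empirical_p(draw_ps, synaptic_p):
    """Share of random draws with a lower empirical P-value than the synaptic group."""
    n_lower = sum(1 for p in draw_ps if p < synaptic_p)
    return n_lower / len(draw_ps)


def report(p, n):
    """Format a proportion, showing the resolution floor when the count is zero."""
    return f"<{1 / n:g}" if p == 0 else f"{p:g}"


# Made-up example: 11 of 100 draws fall below a synaptic-group P-value of 0.004
draw_ps = [0.001] * 11 + [0.05] * 89
p_of_p = empirical_p_of_empirical_p(draw_ps, synaptic_p=0.004)
print(report(p_of_p, len(draw_ps)))

# With 100 draws and no draw below the reference, only a bound can be reported
print(report(empirical_p_of_empirical_p([0.05] * 100, 0.004), 100))
```

With 100 draws the resolution of this second-level P-value is 1/100, which is why the text reports <.01 when no draw beats the synaptic gene group.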
2.1 Competitive test matched for the number of genes – control method 1.
For this method we started by randomly drawing 1026 genes (i.e. the total number of
synaptic genes) from the total pool of genes covered on the AFFY 6.0 platform minus the
synaptic genes (Ngenes in pool = 16351). This was done 100 times. The average number of
genes across the different draws was less than 1026 because some genes were not covered (AFFY5) or
not genotyped in a sample (AFFY5 or AFFY6) (see Table S1).
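
A minimal sketch of the gene-matched drawing step is given below. The function name, the seed and the toy gene identifiers are assumptions made for illustration. Genes are sampled without replacement within a draw, while different draws are allowed to share genes.

```python
import random


def draw_gene_sets(all_genes, synaptic_genes, n_genes, n_draws, seed=0):
    """Draw `n_draws` random gene sets of size `n_genes`, excluding synaptic genes."""
    rng = random.Random(seed)  # fixed seed so the draws can be reproduced
    pool = sorted(set(all_genes) - set(synaptic_genes))
    return [rng.sample(pool, n_genes) for _ in range(n_draws)]


# Toy example: 50 genes, 5 of them "synaptic", 100 draws of 10 genes each.
# In the analysis described above the pool holds 16351 genes and each draw 1026.
all_genes = [f"G{i}" for i in range(50)]
synaptic = set(all_genes[:5])
draws = draw_gene_sets(all_genes, synaptic, n_genes=10, n_draws=100)
print(len(draws), len(draws[0]))
```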
Table S1: Results from control method 1 to test whether synaptic genes are more significantly related to the
risk of schizophrenia than randomly drawn, matched groups of genes.

                                                  ISC AFFY5    ISC AFFY6    GAIN AFFY6   Combined
                                                  (N=3353)     (N=3556)     (N=2729)     (N=9638)
Synaptic gene group (N=1026 genes)
nSNPs observed                                    15105        34860        35412
nGenes observed                                   795          906          908
Average number exons/gene                         15.12        14.62        14.61
Σ-log(p)                                          7102.19      16070.97     16348.01
Number of permutations                            10000        10000        10000
Average Σ-log(p)_perm                             6564.67      15141.25     15380.92
Average variance Σ-log(p)_perm                    15380.92     59022.80     62465.73
nSNPs effective                                   2830         3883         3786
Empirical P-value (emp_p from synaptic genes)     <1.00E-04    <1.00E-04    <1.00E-04    7.61E-11

Matched for N_Genes
Number of independent draws                       100          100          100
Average nSNPs observed                            6703         15610        15899
Average nGenes observed                           775          959          967
Average nSNPs effective                           1493         2064         2076
Average number exons/gene                         12.34        11.50        11.46
Average Σ-log(p)                                  3118.94      7015.83      7334.84
Variance Σ-log(p)                                 58759.19     289574.80    346796.49
Number of permutations per draw                   10000        10000        10000
Average Σ-log(p) of perms across draws            2912.46      6777.58      6905.25      -
Average variance Σ-log(p) of perms across draws   48208.22     260652.96    265796.11    -
Average emp_p across draws                        0.0239       0.1364       0.0286       -
Variance of emp_p across draws                    0.0020       0.0319       0.0039       -
Emp_p of emp_p from synaptic genes                0.11         0.02         0.12         -
Average combined emp_p                            -            -            -            2.00E-03
Emp_p of combined emp_p from synaptic genes       -            -            -            <0.01
The average empirical P-value obtained from the 100 draws was 239 times as large as the
empirical P-value obtained in the original analysis for all synaptic genes in the ISC_AFFY5
sample. For ISC_AFFY6 and GAIN this was 1364 times and 286 times, respectively. In 11 out of
100 draws in the ISC_AFFY5 sample, the empirical P-value from a random draw was lower (more
significant) than the empirical P-value from the original analysis for all synaptic genes; for the
ISC_AFFY6 and GAIN samples this was 2 times and 12 times, respectively. The average
combined empirical P-value from the 100 draws across the three samples was .002, and in none of
the 100 draws was the combined P-value smaller than the combined P-value from the original
analysis for all synaptic genes (i.e. the empirical P-value of the combined P-value was < .01)
(see Table S1).
Although this method matches for the number of genes, it allows differences between the
effective number of independent SNPs between the randomly drawn groups and the original
synaptic gene group. For all draws, the number of genotyped SNPs was lower than in the original
analysis on all synaptic genes. On average, the effective number of SNPs was ~50% lower in the
draws than in the original analysis, which renders the draws matched for the number of genes
sub-optimal. That is, given the findings previously reported by the ISC, which suggest that the
predicted risk to schizophrenia increases with the effective number of SNPs in the model, a
larger number of effective SNPs in the original analysis as compared to the control groups may
lead to an inflation of type I errors. We therefore also conducted control methods 2-5, where we
created randomly drawn control groups of SNPs matched for the effective number of SNPs in the
original analysis.
2.2 Competitive test matched for the effective number of SNPs: control method 2 - genic and
non-genic SNPs.
Control method 2 is used to determine whether SNPs in synaptic genes are more significantly
related to the risk to schizophrenia than any other SNP.
The empirical P-value of the empirical P-value for each sample ranged between 0.01 and 0.10,
while the empirical P-value of the combined empirical P-value was < .01, i.e. not once was the
combined empirical P-value of the random draws lower than the combined empirical P-value
from the original analysis with all synaptic genes (see Table S2). We should note however, that
in contrast to control method 1, SNPs included in a draw may not completely overlap (in terms
of both the actual SNP but also in terms of genomic location) across the three samples, which
complicates interpretation of comparing P-values across samples.
From Table S2 we also see that the total number of independent SNPs available in the pool for
control method 2 varied between 77,702 and 109,551 for the three samples, while the effective
number of SNPs varied between 2,830 and 3,883. Drawing 100 sets of 2,830-3,883 SNPs from
pools of sizes 77,702-109,551 is not ideal, and results in some overlap in SNPs across different
draws, rendering the 100 draws non-independent. In addition, this method may not serve as the
most informative control method as we now compare synaptic genes with both genic and
non-genic SNPs, while it can be expected that genic SNPs have a larger contribution to the risk of
schizophrenia than non-genic SNPs. We therefore also applied control methods 3 (genic SNPs)
and 4 (non-genic SNPs).
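
To give a feel for the size of the overlap between draws, the short Python sketch below computes the expected number of SNPs shared by two draws. It assumes each draw is a uniform random sample without replacement from the same pool and that draws are made independently of each other; the function name is illustrative. Under those assumptions a SNP from one draw appears in another draw with probability k/N, so two draws of size k from a pool of size N share k^2/N SNPs on average.

```python
def expected_shared_snps(draw_size, pool_size):
    """Expected number of SNPs two independent draws have in common."""
    return draw_size ** 2 / pool_size


# Draw and pool sizes quoted above for control method 2
for draw_size, pool_size in [(2830, 77702), (3883, 109551)]:
    shared = expected_shared_snps(draw_size, pool_size)
    share_of_draw = shared / draw_size
    print(f"draw={draw_size} pool={pool_size}: "
          f"{shared:.1f} shared SNPs expected ({share_of_draw:.1%} of a draw)")
```

The overlap fraction equals the ratio of draw size to pool size, so it grows as the pool shrinks.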
2.3 Competitive test matched for the effective number of SNPs: control method 3 - genic SNPs
only.
This method was used to investigate whether SNPs in synaptic genes are more strongly related
to the risk for schizophrenia than SNPs in other genes. The pool available for 100 draws was
considerably smaller than the pool available for control method 2, as all non-genic SNPs were
excluded. The number of SNPs in the pool ranged from 28,558 to 39,866, which is ~10 times
the size of each draw. Although this allows comparison with genic SNPs, it is far from
optimal and creates non-independent sets of drawn SNPs. The empirical P-value of the combined
empirical P-value was <.01.
2.4 Competitive test matched for the effective number of SNPs: control method 4 - non-genic
SNPs only.
Control method 4 was used to test whether SNPs in synaptic genes are more strongly related to
the risk of schizophrenia than non-genic SNPs. This method has slightly larger pools than
method 3, and therefore suffers less from dependency between random draws. Results reported
previously by the ISC (Purcell et al., 2009) showed that genic SNPs do better in predicting risk of
schizophrenia than non-genic SNPs. In line with this, not once was the combined empirical
P-value of a random draw lower than that of the original analysis with all synaptic genes, indicating
that the association with schizophrenia is stronger for SNPs in synaptic genes than for non-genic
SNPs. The empirical P-value of the (combined) empirical P-value for this method was <.01.
2.5 Competitive test matched for the effective number of SNPs: control method 5 - SNPs
annotated to brain-expressed genes.
With this method we tested whether SNPs in synaptic genes are more strongly related to the risk
of schizophrenia than SNPs in genes expressed in brain. This method has the smallest pools, so
the 100 draws will not be independent. Although the average combined empirical P-value of
random draws composed of SNPs annotated to genes expressed in brain was the lowest
of the control methods, again, the empirical P-value of the (combined) empirical P-value for this
method was <.01. This suggests that SNPs in synaptic genes are more strongly related to the risk of
schizophrenia than any other set of SNPs in genes expressed in brain.
2.6 Overlap across samples and across draws
The overlap of SNPs included in the draws based on the ISC_AFFY5, ISC_AFFY6, and GAIN
samples is not 100%. This is due to differences between the Affymetrix 5.0 and Affymetrix 6.0
platforms and to the fact that LD structure may differ between the samples, which causes
different SNPs to be selected as independent. For control methods 2-5 we also conducted
analyses in which we forced the SNPs included in the draws to be maximally overlapping: the
draws for ISC_AFFY6 and GAIN included all SNPs included in the ISC_AFFY5 draw, the draws for
ISC_AFFY6 and GAIN were completely overlapping, and independent SNPs in ISC_AFFY5 also
qualified as independent SNPs in ISC_AFFY6 and GAIN. However, this resulted in very small pool
sizes for ISC_AFFY5 (a reduction of 35% on average across methods 2-5), leaving pool sizes for
control methods 3 and 4 that were only 6 times as large as the total number of SNPs to be
included in a draw. Since 100 draws need to be obtained, these pools are far too small.
In summary, results from the applied control methods suggest that SNPs in synaptic genes are
more strongly associated with the risk of schizophrenia than any other set of randomly drawn
genes or SNPs. Although all applied control methods have their own pros and cons, all indicated a
stronger association for synaptic genes than for other genes or SNPs.
Determining the correct control method is complicated: matching for genes (method 1) seems
preferable as it allows comparison across different platforms (i.e. considering the gene as the
unit of association signal, as is also done in the original analysis), but the effective number of
SNPs included in the draws fluctuates considerably for that method. This seems mainly due to
an artifact of the AFFY 5.0 and AFFY 6.0 platforms, probably in combination with brain genes
(including synaptic genes) in general being larger than other genes: on the AFFY 5.0 platform
there are on average 28 SNPs per synaptic gene and 12 SNPs per non-synaptic gene, whereas on
the AFFY 6.0 platform there are on average 48 SNPs per synaptic gene and 20 SNPs per
non-synaptic gene.
As a final check we generated three draws in which we controlled for both the gene size and the
genotyped number of SNPs by finding a matched-control gene for each synaptic gene. Matching
was based on gene size (+/- 10%) and the number of genotyped SNPs (+/- 10%). In those few
cases where a synaptic gene could not be matched with a control gene on these two matching
criteria we randomly selected a gene matched for gene size only. Only three complete draws
could be made on this basis. These three draws were analyzed following the same procedure as
the synaptic gene group. None of the combined empirical P-values for these 3 draws were lower
than or equal to the combined empirical P-value for the group of synaptic genes.
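
The matching rule for this final check can be sketched in Python as follows. The dictionary layout, the function name and the choice to use each control gene at most once per draw are assumptions made for illustration; the 10% tolerance is taken relative to the synaptic gene's own values.

```python
import random


def pick_matched_control(target, candidates, used, rng, tol=0.10):
    """Pick a control gene for one synaptic gene.

    Prefers genes within +/- tol on both gene size and number of genotyped
    SNPs; falls back to a gene matched on size only; returns None if neither
    kind of match is available.
    """
    def within(value, reference):
        return abs(value - reference) <= tol * reference

    available = [c for c in candidates if c["name"] not in used]
    size_matched = [c for c in available if within(c["size"], target["size"])]
    fully_matched = [c for c in size_matched if within(c["n_snps"], target["n_snps"])]

    pool = fully_matched or size_matched
    if not pool:
        return None
    choice = rng.choice(pool)
    used.add(choice["name"])  # assumption: a control gene is used once per draw
    return choice


# Toy example with made-up genes
target = {"name": "SYN", "size": 1000, "n_snps": 20}
candidates = [
    {"name": "A", "size": 1050, "n_snps": 21},  # matches on size and SNP count
    {"name": "B", "size": 950, "n_snps": 40},   # matches on size only
    {"name": "C", "size": 2000, "n_snps": 20},  # size too different
]
used, rng = set(), random.Random(1)
first = pick_matched_control(target, candidates, used, rng)
second = pick_matched_control(target, candidates, used, rng)
third = pick_matched_control(target, candidates, used, rng)
print(first["name"], second["name"], third)
```

Because each matched gene is consumed within a draw, the supply of suitable controls runs out quickly, which is consistent with only a few complete draws being possible.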
Table S2: Results from control methods 2-5

                                                  ISC AFFY5    ISC AFFY6    GAIN AFFY6   Combined
2. Matched for n effective SNPs, genic and non-genic
Number of independent SNPs in pool                77702        109551       109431
Number of independent draws                       100          100          100
Average nSNPs observed                            2830         3883         3786
Average nGenes observed                           n.a          n.a          n.a
Average nSNPs effective                           2789         3740         3711
Average Σ-log(p)                                  1292.82      1728.28      1713.52
Variance Σ-log(p)                                 569.05       791.07       602.35
Number of permutations per draw                   10000        10000        10000
Average Σ-log(p) of perms across draws            1229.51      1686.04      1711.82
Variance Σ-log(p) of perms across draws           541.77       760.64       728.74
Average emp_p across draws                        0.0269       0.1411       0.0288
Variance of emp_p across draws                    0.0048       0.0313       0.0028
Emp_p of emp_p from synaptic genes                0.10         0.01         0.07
Average combined emp_p                            -            -            -            2.79E-03
Emp_p of combined emp_p from synaptic genes       -            -            -            <0.01

3. Matched for n effective SNPs, genic
Number of independent SNPs in pool                28558        39866        39853
Number of independent draws                       100          100          100
Average nSNPs observed                            2830         3883         3786
Average nGenes observed                           1930         2407         2358
Average nSNPs effective                           2765         3702         3664
Average Σ-log(p)                                  1306.12      1719.78      1711.82
Variance Σ-log(p)                                 471.51       800.43       740.22
Number of permutations per draw                   10000        10000        10000
Average Σ-log(p) of perms across draws            1229.52      1685.92      1644.29
Variance Σ-log(p) of perms across draws           546.44       768.41       737.90
Average emp_p across draws                        0.0102       0.1992       0.0370
Variance of emp_p across draws                    0.0011       0.0446       0.0049
Emp_p of emp_p from synaptic genes                0.29         0.01         0.11
Average combined emp_p                            -            -            -            1.89E-03
Emp_p of combined emp_p from synaptic genes       -            -            -            <0.01

4. Matched for n effective SNPs, non-genic
Number of independent SNPs in pool                49221        69784        69677
Number of independent draws                       100          100          100
Average nSNPs observed                            2830         3883         3786
Average nGenes observed                           0            0            0
Average nSNPs effective                           2781         3742         3694
Average Σ-log(p)                                  1285.21      1732.59      1715.67
Variance Σ-log(p)                                 539.77       561.99       839.19
Number of permutations per draw                   10000        10000        10000
Average Σ-log(p) of perms across draws            1229.45      1686.05      1644.23
Variance Σ-log(p) of perms across draws           543.24       760.22       732.04
Average emp_p across draws                        0.0436       0.1032       0.0350
Variance of emp_p across draws                    0.0089       0.0193       0.0049
Emp_p of emp_p from synaptic genes                0.06         <0.01        0.12
Average combined emp_p                            -            -            -            1.85E-03
Emp_p of combined emp_p from synaptic genes       -            -            -            <0.01

5. Matched for n effective SNPs, brain genes
Number of independent SNPs in pool                25646        35713        35684
Number of independent draws                       100          100          100
Average nSNPs observed                            2830         3883         3786
Average nGenes observed                           1837         2267         2227
Average nSNPs effective                           2767         3681         3658
Average Σ-log(p)                                  1312.82      1715.00      1720.03
Variance Σ-log(p)                                 484.34       537.70       664.31
Number of permutations per draw                   10000        10000        10000
Average Σ-log(p) of perms across draws            1229.42      1685.90      1644.36
Variance Σ-log(p) of perms across draws           546.06       772.86       739.23
Average emp_p across draws                        0.0041       0.2104       0.0212
Variance of emp_p across draws                    0.0002       0.0351       0.0033
Emp_p of emp_p from synaptic genes                0.34         <0.01        0.12
Average combined emp_p                            -            -            -            1.47E-03
Emp_p of combined emp_p from synaptic genes       -            -            -            <0.01
3. Systematic differences between synaptic genes and other genes
The average number of SNPs observed across all 100 draws from control method 1 was
significantly lower than that in the original analysis of all synaptic genes: in the ISC_AFFY5
sample the mean number of SNPs per gene was 8.6 in the 100 random draws, while it was 19.0 in
the original analysis of all synaptic genes, with similar discrepancies for the other two samples.
This difference could not be ascribed to ‘unlucky’ draws, as new draws gave similar results. The
discrepancy is most likely due to the larger gene length known to be characteristic of genes expressed
in brain (e.g. Jia et al., 2010). To investigate whether other systematic differences between
synaptic genes and other genes may explain the obtained results for synaptic genes, we looked
at possible differences in minor allele frequency, gene size, and the number of exons in synaptic
genes and non-synaptic genes (see Table S3).
Table S3: Characteristics of synaptic genes and non-synaptic genes

                              ISC AFFY5    ISC AFFY6    GAIN AFFY6
Synaptic genes (N=1026)
Number of genes on platform   795*         906*         908*
Mean MAF                      0.239        0.237        0.236
Mean nSNPs/gene**             19.0         38.5         39.0
Mean gene size (bp)           151,821      136,495      136,180
Mean number of exons          15.12        14.62        14.61

Non-synaptic genes (N=22,655)
Number of genes on platform   12,355*      15,366*      15,430*
Mean MAF                      0.239        0.234        0.234
Mean nSNPs/gene**             8.6          16.2         16.4
Mean gene size (bp)           79,886       68,300       68,169
Mean number of exons          12.32        11.46        11.43

* max # genes genotyped on platform
** based on genotyped SNPs
The most notable difference between synaptic genes and other genes is the gene size (and,
relatedly, the number of exons and the number of SNPs per gene), which is a known difference
between genes expressed in brain and genes not expressed in brain. Although larger genes are
more likely to contain SNPs that show a significant P-value (simply because more tests are
conducted), we do not think this affected our results: the permutation procedure
we used corrects for any systematic effects of gene size or the number of SNPs, so the
combined empirical P-value is unlikely to be biased by these systematic differences.
4. Functional gene group analysis for synaptic subgroups
Seventeen functional synaptic subgroups were tested, as well as one subgroup that contained
synaptic genes for which the specific synaptic function is not yet known. The 17 groups are: Cell
adhesion and transsynaptic signaling molecules; Cell metabolism (synaptic metabolic enzymes
and their co-factors, excluding mitochondrial proteins); Endocytosis (proteins involved in
endocytosis); Excitability (voltage gated ion channels); Exocytosis (proteins involved in
regulated secretion); G-protein relay (G-protein subunits); GPCR signaling (G-protein coupled
receptors); Intracellular signal transduction (enzymes downstream of G-protein/TK signaling);
Intracellular trafficking (vesicle adaptors, sorting- and motor proteins); Ion balance/transport
(ion-/solute-carriers and exchangers); Ligand gated ion channel signaling; Neurotransmitter
metabolism (metabolizing enzymes); Peptide/neurotrophin signaling (neuropeptide, trophic
factors, hormones); Protein clustering (scaffolding proteins); RNA and protein synthesis, folding
and breakdown; Structural plasticity (cytoskeletal proteins and their regulators); and Tyrosine
kinase (TK) signaling (tyrosine receptor kinases). Table S4 lists the genes assigned to each
of the functional gene groups.
Table S4: Genes assigned to tested synaptic functional gene groups
Intracellular Signal Transduction
ADCY1, ADCY2, ADCY3, ADCY4, ADCY5, ADCY8, ADCY9, ATM, BAIAP2, BASP1, BEGAIN, BRSK1, CALM1,
CALM2, CALM3, CALML3, CALML4, CALR, CAMK1, CAMK1D, CAMK1G, CAMK2A, CAMK2B, CAMK2D,
CAMK2G, CAMK2N1, CAMK4, CAP1, CAP2, CHP, CIT, CKB, CNP, CSNK2A1, CSNK2B, CTNNA1, CTNNA2,
CTNNB1, CTNND1, CTNND2, DAPK1, DBC1, DCLK1, DIRAS2, DOCK4, ENSA, GDAP1, GRB2, GSK3B,
GUCY1A2, GUCY1A3, GUCY1B3, HINT1, HOMER1, HOMER3, HPCA, HPCAL1, HPCAL4, HRAS, INPP4A,
IQSEC1, IQSEC2, ITPKA, KIAA1688, KRAS, LANCL1, LANCL2, LINGO1, MAP2K1, MAPK1, MAPK3, MAPK8IP1,
MAPK8IP2, MAPK8IP3, MARCKS, MARCKSL1, MINK1, MRAS, NCALD, NCKIPSD, NDRG2, NPTX1, NPTXR,
NRGN, PAFAH1B1, PAFAH1B2, PCP4, PDE2A, PDXP, PEBP1, PGRMC1, PHB, PLCB1, PLCB3, PLCB4, PLCG1,
PLD1, PPAP2B, PPP1CB, PPP1R9B, PPP2CB, PPP2R1A, PPP2R4, PPP3CA, PPP3CB, PPP3CC, PPP3R1,
PRKACA, PRKACB, PRKAR1A, PRKAR1B, PRKAR2A, PRKAR2B, PRKCA, PRKCB1, PRKCD, PRKCE, PRKCG,
PRKCZ, PSD3, PTPN9, RAC1, RALA, RAP1B, RAP1GDS1, RAP2A, RAP2B, RAPGEF4, RASAL1, RHOA, RHOB,
ROCK1, ROCK2, RYR1, RYR2, RYR3, SIRT2, SKP1, SMPD3, SNCB, SPG3A, THY1, VSNL1, WASF1, YWHAB,
YWHAE, YWHAG, YWHAH, YWHAQ, YWHAZ
Excitability
CACNA1A, CACNA1B, CACNA1C, CACNA1D, CACNA1E, CACNA1F, CACNA2D1, CACNA2D2, CACNA2D3,
CACNA2D4, CACNB1, CACNB3, CACNB4, CACNG2, CACNG3, CACNG4, CACNG5, CACNG8, KCNA1, KCNA2,
KCNA4, KCNAB1, KCNAB2, KCNC1, KCNC2, KCNC4, KCND2, KCNE1, KCNE1L, KCNE2, KCNE3, KCNE4,
KCNJ10, KCNJ12, KCNJ15, KCNJ3, KCNJ4, KCNJ5, KCNJ6, KCNJ8, KCNJ9, KCNMA1, KCNMB1, KCNMB2,
KCNMB4, KCTD12, KCTD16, SCN1A, SCN1B, SCN2A, SCN2B, SCN3A, SCN3B, SCN4B, SCN5A, SCN8A,
SCN9A, VDAC1, VDAC3
CAT signaling
AGRN, ALCAM, BCAN, BSG, C1QBP, CADM1, CADM2, CADM4, CD200, CD47, CDH1, CDH10, CDH11, CDH12,
CDH13, CDH15, CDH16, CDH18, CDH19, CDH2, CDH20, CDH22, CDH23, CDH26, CDH3, CDH4, CDH5, CDH6,
CDH7, CDH8, CDH9, CNTN1, CNTN2, CNTN3, CNTN4, CNTN5, CNTN6, CNTNAP1, CNTNAP2, CRMP1, CSPG5,
CTTN, DCHS1, DCHS2, ERBB2, ERBB2IP, GPC1, GPM6A, GPM6B, HAPLN1, HAPLN4, HNT, ICAM5, IGSF8,
L1CAM, LSAMP, LY6H, MBP, MOG, NCAM1, NCAM2, NCAN, NEO1, NFASC, NLGN1, NLGN2, NLGN3, NLGN4X,
NPTN, NRXN1, NRXN2, NRXN3, OMG, OPCML, PCDH1, PKP4, PLP1, PVRL1, ROBO1, ROBO2, TNR
Endocytosis
AAK1, AP1B1, AP1M1, AP1M2, AP2A1, AP2A2, AP2B1, AP2M1, AP2S1, AP3D1, AP3M2, AP3S2, CLTC, DNM1,
DNM1L, DNM2, DNM3, GIT1, NECAP1, PICALM, SH3GL2, SH3GLB1, SH3GLB2, SNAP91, SYNJ1, SYNJ2
Structural plasticity
ABI1, ABI2, ABLIM1, ACTB, ACTG1, ACTN1, ACTR2, ACTR3, ADD1, ADD2, ADD3, ANK2, ANK3, ANXA1, ANXA2,
ARPC1A, ARPC3, ARPC4, ARPC5, ASTN1, CAPZA2, CAPZB, CDC42, CEND1, CFL1, CLASP2, CORO1A,
CORO2B, COTL1, CPNE4, CPNE6, CRYM, CSRP1, DBN1, DBNL, DPYSL2, DPYSL3, DPYSL4, DPYSL5, DSTN,
DYNC1I1, EPB41L1, EPB41L2, EPB41L3, EZR, FSCN1, GAS7, GPRIN1, INA, KIF2A, KIF5C, KPNB1, LASP1,
LGI1, LPPR4, MACF1, MAP1A, MAP1B, MAP2, MAP6, MAPRE3, MAPT, MYH10, MYH9, MYL6, MYL9, NEBL,
NEFH, NEFL, NEFM, NF1, NUMBL, PALM, PALM2, PFN1, PFN2, PLEC1, PRICKLE2, SPTAN1, SPTB, SPTBN1,
SPTBN2, STMN1, TAGLN3, TCP1, TLN2, TMOD2, TMSB4X, TUBA1A, TUBA4B, TUBB, TUBB2A, TUBB2B,
TUBB2C, TUBB3, TUBB4, WASL, WDR1
GPCR signaling
ADORA1, ADORA2A, ADRA1A, ADRA2A, ADRB2, ADRB3, BAI3, BZRAP1, CHRM1, CHRM2, CNR1, CNR2,
CNRIP1, CRHR1, CRHR2, DRD1, DRD2, GPR158, GRM1, GRM3, GRM4, GRM5, GRM7, GRM8, HTR1A, HTR1B,
HTR1D, HTR2A, HTR2B, HTR7, LPHN1, OPRD1, OPRK1, OPRM1, P2RX1, P2RX3, P2RX6, P2RX7, P2RY1,
P2RY2, RGS7
Protein cluster
BSN, CASK, CASKIN1, CASKIN2, CNKSR2, CTBP1, CTBP2, DLG1, DLG2, DLG3, DLG4, DLGAP1, DLGAP2,
DLGAP3, DLGAP4, ERC1, ERC2, KIAA1045, LIN7A, LIN7B, LRRC7, MPP2, MPP3, MPP4, MPP5, MPP6,
PACSIN1, PCLO, PICK1, PPFIA1, PPFIA2, PPFIA3, PPFIA4, PPFIBP1, PPFIBP2, SEPT11, SEPT2, SEPT3,
SEPT5, SEPT6, SEPT7, SEPT8, SEPT9, SHANK2, SHANK3, SORBS1, WDR37
Tyrosine kinase signaling
EPHA4, NTRK2, PTPRA, PTPRG, PTPRN2, PTPRS, SIRPA
Cell metabolism
ACSL1, ACSL3, AGK, APOE, CA2, DBI, ENO1, ENO2, FASN, GAPDH, GDA, GK, GLO1, GOT1, GPD1, GSTM3,
GSTM5, GSTP1, HK1, HSD17B4, IDH1, LDHA, LRP1, MDH1, MGLL, NME1, NME2, PFKL, PFKM, PFKP, PGAM1,
PGK1, PGM2, PHGDH, PI4KA, PIP4K2B, PIP4K2C, PIP5K1C, PITPNA, POR, PRDX1, PRDX2, PYGB, SCCPDH,
SLC25A4, SLC25A5, SLC2A1, SLC2A3, SLC35d3, SLC38A3, SLC3A2, SLC6A6, SLC6A7, SLC6A8, SLC7A5, TKT,
TMEM30A
Neurotransmitter metabolism
CHAT, DBH, DDAH1, GLS, MAOA, MAOB, MOXD1, SLC17A6, SLC17A7, SLC17A8, SLC18A1, SLC18A2,
SLC18A3, SLC1A1, SLC1A2, SLC1A3, SLC1A4, SLC1A6, SLC1A7, SLC32A1, SLC6A1, SLC6A17, SLC6A2,
SLC6A3, SLC6A4, SLC6A5, SLC6A9, TPH1, TPH2
Intracellular trafficking
AGAP2, ARFIP2, ARHGAP1, ARL8B, DCTN1, DCTN2, DYNC1H1, DYNC2H1, DYNLL1, DYNLL2, DYNLT1, EHD3,
GDI1, GDI2, GOLSYN, KIDINS220, KIF1A, KIF1B, KIF2B, KIF4A, KIF5B, KLC1, KLC2, KLC4, KTN1, MYO5A,
RAB10, RAB11B, RAB12, RAB14, RAB15, RAB18, RAB1A, RAB1B, RAB21, RAB24, RAB25, RAB26, RAB2A,
RAB30, RAB31, RAB33A, RAB35, RAB3A, RAB3B, RAB3C, RAB3D, RAB3GAP1, RAB4A, RAB4B, RAB5A,
RAB5B, RAB5C, RAB6A, RAB6B, RAB6IP1, RAB7A, RAB8A, RAB8B, RAB9B, RABAC1, RABEP1, RABEP2,
RABGEF1, RABIF, SEC22C, SNX5, STX12, STX16, STX5, STX6, STX7, STX8, VAMP7, VAPA, VPS33A, VPS33B,
VPS45, VTI1A, ZFYVE20
LGIC signaling
CHRNA4, CHRNA6, CHRNA7, CHRNB2, GABARAPL2, GABBR1, GABBR2, GABRA1, GABRA2, GABRA3,
GABRA4, GABRA5, GABRA6, GABRB1, GABRB2, GABRB3, GABRE, GABRG1, GABRG2, GRIA1, GRIA2, GRIA3,
GRIA4, GRIK1, GRIK2, GRIK3, GRIK4, GRIK5, GRIN1, GRIN2A, GRIN2B, GRIN2C, GRIN2D, HCN1, HTR3A,
HTR3B
Exocytosis
AMPH, ARF4, ARFGAP1, ARHGDIA, C1orf142, CADPS, CPLX1, CPLX2, DNAJC5, DOC2A, DOC2B, EXOC1,
EXOC2, EXOC3, EXOC4, EXOC5, EXOC6, EXOC7, EXOC8, MYRIP, NAPA, NAPG, NSF, NSFL1C, RAB27B,
RAB3D, RIMS1, RIMS2, RIMS3, RPH3A, SCAMP1, SCAMP2, SCAMP3, SCAMP5, SCRN1, SNAP23, SNAP25,
SNAP29, SNAPIN, SNIP, STX1A, STX1B, STX2, STX3, STX4, STXBP1, STXBP2, STXBP3, STXBP4, STXBP5,
STXBP5L, STXBP6, SV2A, SV2B, SV2C, SVOP, SYN1, SYN2, SYN3, SYNGR1, SYNGR2, SYNGR3, SYNPR, SYP,
SYPL1, SYT1, SYT12, SYT2, SYT3, SYT5, SYT6, SYT7, SYT9, SYTL1, SYTL2, SYTL3, SYTL4, SYTL5, UNC13A,
UNC13B, UNC13C, UNC13D, VAMP1, VAMP2, VAMP3, VAMP4
RPSFB
ADAM22, ADAM23, AHSA1, ALG2, BCAP31, CAND1, CANX, CAPN5, CCT2, CCT3, CCT4, CCT6A, CCT7, CCT8,
CST3, CYFIP1, DNAJA1, DNAJC11, DNAJC6, DPP6, EEF1A1, EEF1A2, EEF2, EIF4B, ENDOD1, FBXO41,
FKBP1A, GANAB, HSP90AA1, HSP90B1, HSPA12A, HSPA4, HSPA4L, HSPA5, HSPA8, HSPH1, MGST3, NEDD8,
NPEPPS, NT5DC3, OTUB1, P4HB, PACS1, PDIA3, PDIA6, PPIA, PPIB, PURA, QDPR, RPL10, RPL13, RPL29,
RPL4, RPL6, RPS16, RPS18, RPS27A, RPS3A, RPS9, RRBP1, SACS, SAMM50, STIP1, TMPRSS9, TXN, UBA1,
UBE2N, UBE2V2, UCHL1, USP5, VCP
Ion balance/transport
AQP4, ATP1A1, ATP1A2, ATP1A3, ATP1B1, ATP1B2, ATP2A2, ATP2B1, ATP2B2, ATP2B3, ATP2B4, ATP6AP1,
ATP6V0A1, ATP6V0C, ATP6V0D1, ATP6V1A, ATP6V1B1, ATP6V1B2, ATP6V1C1, ATP6V1D, ATP6V1E1,
ATP6V1F, ATP6V1G2, ATP6V1H, ATP8A1, GJA1, SLC12A5, SLC30A1, SLC30A3, SLC30A4, SLC30A5, SLC30A6,
SLC30A7, SLC30A9, SLC4A1, SLC4A10, SLC4A4, SLC8A1, SLC8A2, SLC8A3, SLC9A3R1, TTYH1, TTYH3
Peptide/Neurotrophin signals
APH1A, APH1B, AVP, BDNF, CCK, CHGA, CRH, CRHBP, FGF1, MIF, NCSTN, NPY, NRG1, NRG2, NRG3, NRG4,
NTS, OXT, PCSK1, PCSK2, PDYN, PENK, PSEN1, PSEN2, PSENEN, SCG2, SCG5, SST
G-protein relay
GNA11, GNA12, GNA13, GNA14, GNA15, GNAI1, GNAI2, GNAI3, GNAL, GNAO1, GNAQ, GNAS, GNAT1, GNAZ,
GNB1, GNB2, GNB3, GNB4, GNB5, GNG10, GNG11, GNG12, GNG2, GNG3, GNG4, GNG5, GNG7
Unknown
AADACL1, ABHD12, ANXA11, ANXA5, ANXA6, ANXA7, APBA1, APBA2, APBA3, APP, APPBP2, ARL6IP5,
ATCAY, BCAS1, C10orf58, C2orf55, C6orf174, C9orf126, DMXL2, DVL1, FLJ45455, FLOT1, FLOT2, FRMPD4,
GAP43, GBAS, GDAP1L1, HBA2, HBG1, HRB, KIAA0513, LAMP1, LAMP2, LOC389813, LRRTM1, MAL2, NBEA,
NCDN, NCKAP1, NIPSNAP1, OLFM1, PHYHIP, PPP1R1B, PRNP, PRRT1, PRRT3, RIMBP2, RPH3AL, RTN1,
RTN3, RTN4, SBF1, SNCA, TMED10, TMEM65, TRAPPC1, TRAPPC3, TRAPPC5, WDR7, WFS1, WNK1
Figures S3a, b and c provide Q-Q plots for all eighteen gene groups in the ISC_AFFY5,
ISC_AFFY6 and GAIN samples.
Figure S3a. Q-Q plots of P-values per synaptic subgroup - ISC_AFFY5
Figure S3b. Q-Q plots of P-values per synaptic subgroup - ISC_AFFY6
Figure S3c. Q-Q plots of P-values per synaptic subgroup – GAIN
Figures S4a, b and c show the distribution of the Σ-log(P) of all permuted datasets, with the red bar
denoting the Σ-log(P) of the actual analysis for each functional gene group.
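The comparison these histograms visualise, the actual Σ-log(P) against its permutation distribution, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function names, the use of base-10 logarithms and the toy P-values are all assumptions.

```python
import math

def sum_neg_log_p(p_values):
    """Group statistic: sum of -log10(P) over the SNPs in a gene group.
    The base-10 logarithm is an assumption made for this sketch."""
    return sum(-math.log10(p) for p in p_values)

def empirical_p(observed_stat, permuted_stats):
    """Fraction of permuted statistics that are at least as large as the observed one."""
    exceed = sum(1 for s in permuted_stats if s >= observed_stat)
    return exceed / len(permuted_stats)

# Toy data (made-up P-values): one observed dataset and four permuted datasets.
observed = sum_neg_log_p([0.001, 0.02, 0.3])
permuted = [
    sum_neg_log_p(ps)
    for ps in ([0.5, 0.4, 0.9], [0.01, 0.2, 0.7], [0.0005, 0.01, 0.2], [0.6, 0.3, 0.1])
]
print(empirical_p(observed, permuted))
```

With 10,000 permutations the smallest resolvable value is 1/10,000, so a statistic that is never exceeded by a permuted dataset is reported as P < 0.0001 rather than as 0.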
Figure S4a. Distribution of Σ-log(P) of permuted and actual datasets – ISC_AFFY5
Figure S4b. Distribution of Σ-log(P) of permuted and actual datasets - ISC_AFFY6
Figure S4c. Distribution of Σ-log(P) of permuted and actual datasets – GAIN
5. Enrichment analysis of genes previously implicated in schizophrenia
We retrieved a list of all previously implicated genes and tested for enrichment of these in the
synaptic groups. In addition, as a positive control, we tested whether the previously implicated
genes as a gene set were significantly associated with the risk for schizophrenia in the current
data. The list was compiled from whole-genome studies and thus did not include genes
implicated in previous candidate gene studies (except when these were replicated in the
whole-genome studies). We used the following criteria for compiling the list of previously
implicated genes:
5.1 Signals from GWAS studies
We used the online GWAS catalogue (Hindorff et al, 2009), accessed on the 14th of February 2011, to
obtain a list of published GWA studies for schizophrenia only (Lencz et al, 2007; Walsh et al,
2008; Sullivan et al, 2008; Shifman et al, 2008; Kirov et al, 2009; Shi et al, 2009; Stefansson et
al, 2009; Need et al, 2009; Purcell et al, 2009; O'Donovan et al, 2008; Athanasiu et al, 2010).
Subsequently, we identified all SNPs with P ≤ 1.0 × 10⁻⁵ reported in these GWA studies and mapped
these SNPs to protein-coding genes (NCBI build v36.3). SNPs that were not located between the
TSS and TES of a gene were mapped to genes located within a 20 kb window,
or excluded from the subsequent analyses if no genes were within this window. This procedure
resulted in the mapping of 97 GWAS signals to 78 unique protein-coding genes.
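The mapping rule described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually run: the tuple layout of `genes`, the gene names and the function name are hypothetical, and the 20 kb window is assumed to extend on either side of the gene.

```python
WINDOW_BP = 20_000  # flanking window taken from the text; applying it on both sides is an assumption

def map_snp_to_genes(snp_chrom, snp_pos, genes, window=WINDOW_BP):
    """Return the genes a SNP maps to; an empty list means the SNP is excluded.

    genes: iterable of (name, chrom, tss, tes) tuples (hypothetical layout).
    """
    inside, nearby = [], []
    for name, chrom, tss, tes in genes:
        if chrom != snp_chrom:
            continue
        # TSS may lie downstream of TES for genes on the minus strand.
        start, end = min(tss, tes), max(tss, tes)
        if start <= snp_pos <= end:
            inside.append(name)
        elif start - window <= snp_pos <= end + window:
            nearby.append(name)
    # Prefer genes that contain the SNP; fall back to genes within the window.
    return inside if inside else nearby

# Toy annotation with made-up genes and coordinates.
toy_genes = [
    ("GENE_A", "1", 1_000, 5_000),
    ("GENE_B", "1", 30_000, 26_000),  # minus-strand gene
    ("GENE_C", "2", 1_000, 2_000),
]
signals = [("1", 3_000), ("1", 10_000), ("1", 60_000)]
unique_genes = set()
for chrom, pos in signals:
    unique_genes.update(map_snp_to_genes(chrom, pos, toy_genes))
print(sorted(unique_genes))
```

Collecting the mapped genes in a set is what turns a larger number of signals into a smaller number of unique genes, as with the 97 signals and 78 genes reported above.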
5.2 Signals from large-scale CNV studies
We used genome-wide CNV studies for schizophrenia in which CNVs implicated a single gene.
Most CNV studies report CNVs that overlap or disrupt multiple genes, without providing clear
evidence for which gene is disease-causing. Including all of these genes would introduce substantial
error in our analysis. We thus selected only those CNV studies that implicated a single gene per
CNV. This resulted in the selection of five genes from two recent genome-wide CNV studies
(Levinson et al, 2011; Vacic et al, 2011).
Combining the lists of implicated genes from the GWA studies and the genome-wide CNV studies resulted in a total of 83
unique genes previously associated with schizophrenia. This list is given in Table S5.
Table S5 Complete list of 83 previously implicated genes for schizophrenia obtained from GWAS and CNV
studies.
Genes (N=78) implicated from GWAS studies for SCZ published until 14/02/2011
ACSM1; ADAMTSL3; ADIPOR2; AGBL1; AGTRAP; ANK3; ATXN1; C10orf59; C16orf5; C1orf187; C1orf51; C1orf54;
C1orf91; C9orf82; CBX2; CCDC60; CDC42; CDH13; CENTD2; CENTG2; CSMD1; DCDC2B; DDX31; DOCK3; EIF3I;
EML5; ERBB4; ESAM; FMO3; FXR1; GRID1; GTF3C4; HIST1H2AG; HIST1H2BJ; HIST1H2BK; HIST1H4I; HLA-DQA1;
IFT74; ITGB1; LCK; LOC100128797; LOC100131289; LOC440302; LOC646993; MAD1L1; MAST4; MRPS21; MXRA5;
MYO18B; NLGN4X; NOS1; NOTCH4; NRGN; NTRK3; OPCML; OR5M1; PCLO; PGBD1; PLAA; POM121L2; PROX2;
PTGS2; PTPN21; RAD54L2; RBM15B; RELN; RPGRIP1L; RPP21; TCF4; TMEM16D; TRIM39; TRPA1; TXNRD2;
VSIG2; WNT7A; YLPM1; ZNF452; ZNF804A.
Genes (N=5) implicated from CNV studies published until April 2011
NRXN1, C16orf72, VIPR2, BCR, OTUD7A
5.3 Enrichment analysis
Of the 83 previously implicated genes, 72 (86.7%) are expressed in brain
(total N of brain-expressed genes = 18,029) and 8 (9.6%) belong to the synaptic gene group. We used
a Fisher exact test to test for enrichment. We note that there is some overlap between
previously published GWA studies and the data currently used (the current study includes data
used in Purcell et al, 2009 and Shi et al, 2009), which introduces some bias into the enrichment
analyses.
There was a significant overrepresentation of previously implicated genes in brain expressed
genes (P=.04) and synaptic genes (P=.03). Subsequently, we tested whether previously
implicated genes were enriched in any of the synaptic subgroups. The Fisher exact test statistic
was calculated using three different a priori expectations (see Table S6):
1) Enrichment of previously implicated genes in a subgroup, given 8 observed implicated genes
in 1026 synaptic genes: Penr_SYN
2) Enrichment of previously implicated genes in a subgroup, given 72 observed implicated genes
in 18029 brain expressed genes: Penr_BRAIN
3) Enrichment of previously implicated genes in a subgroup, given 83 observed implicated genes
in 23681 total genes: Penr_ALL
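The three a priori expectations differ only in the background against which a subgroup is compared. A minimal sketch of such a test follows, assuming a one-sided (enrichment) Fisher exact test computed as a hypergeometric upper tail; the text does not state the sidedness, and the function name `fisher_enrichment_p` is hypothetical. The counts are those reported in this section and in Table S6.

```python
from math import comb

def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher exact P-value (assumed form of the test).

    Probability of observing at least k implicated genes in a group of n genes
    when K of the N background genes are implicated.
    """
    total = comb(N, n)
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total  # exact integer arithmetic, converted to float at the end

# CAT signaling subgroup: 81 genes, 4 of them previously implicated.
backgrounds = {
    "Penr_SYN": (8, 1026),      # implicated genes among synaptic genes
    "Penr_BRAIN": (72, 18029),  # implicated genes among brain-expressed genes
    "Penr_ALL": (83, 23681),    # implicated genes among all genes
}
for label, (K, N) in backgrounds.items():
    print(label, fisher_enrichment_p(4, 81, K, N))
```

A subgroup without any previously implicated gene has P = 1 under every background, since the upper tail then covers the whole distribution. If a nominal α of 0.05 is assumed, a Bonferroni correction over the 18 subgroups gives a threshold of 0.05/18 ≈ 0.0028.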
We do note that this enrichment analysis does not take into account the strength of the evidence
(i.e. a gene is either associated or not, actual P-values are not taken into account), but merely
provides an indication of whether the number of previously implicated genes is higher than
expected. Results indicate that there is enrichment of previously implicated genes in one
synaptic subgroup (CAT signaling), which was the third most significant gene-group in our
original analyses. However, since larger genes have a higher chance of being significantly related
to a trait in a GWAS, the selection of previously implicated genes may be biased towards larger
genes. Previously implicated genes are indeed much larger than genes not implicated previously
(Table S7; P=1.958e-07, Wilcoxon rank sum test). Given that brain-expressed genes, including
synaptic genes, are generally larger than non-brain-expressed genes, and that gene sizes of
synaptic genes are not significantly different from gene sizes of previously implicated genes
(P=0.5661, Wilcoxon rank sum test), the enrichment we see may be biased by gene size.
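For readers unfamiliar with the rank sum test used for these gene-size comparisons, a minimal sketch follows. It uses the normal approximation without tie or continuity correction, which may differ from the software actually used; the function name and the toy gene sizes are hypothetical.

```python
from math import erfc, sqrt

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum (Mann-Whitney) test via the normal approximation.

    Returns (U statistic for x, z score, two-sided P-value).
    Simplification: no tie or continuity correction is applied to the variance.
    """
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j, shared by tied values
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    p = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    return u, z, p

# Toy gene sizes in bp (made-up values, not taken from Table S7).
implicated = [250_000, 410_000, 90_000, 1_200_000, 60_000]
not_implicated = [15_000, 48_000, 22_000, 130_000, 9_000, 35_000]
print(rank_sum_test(implicated, not_implicated))
```

Because the test works on ranks, it is insensitive to the strong right skew of gene sizes that is visible in Table S7, where the means far exceed the medians.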
Three previously implicated and synaptic genes have large gene sizes. These three genes are all
part of the CAT-signaling group and we thus note that the significant enrichment in that group
needs to be interpreted with caution. Five previously implicated and synaptic genes have normal
to small gene sizes (see Figure S5). These are unlikely to have been implicated due to gene size.
Table S6 – Results of enrichment test per functional group. ‘Total N GENES’ denotes the number of genes
within the functional group, N_IMP the number of associated genes that is observed. Functional groups
indicated in bold are the functional groups that showed significant association to schizophrenia in the
functional gene group analysis. Significant enrichment (after Bonferroni correction) is indicated with an
asterisk.

Functional group                    Total N GENES*  N_IMP  Penr_SYN  Penr_BRAIN  Penr_ALL
Intracellular signal transduction   150             1      0.7189    0.4527      0.4104
Excitability                        59              0      1         1           1
CAT signaling                       81              4      0.0020*   0.0003*     0.0002*
Endocytosis                         26              0      1         1           1
Structural plasticity               98              2      0.1732    0.0585      0.0465
GPCR signaling                      41              0      1         1           1
'Unknown'                           61              0      1         1           1
Protein cluster                     47              1      0.3137    0.1716      0.1523
Tyrosine kinase signaling           7               0      1         1           1
Cell metabolism                     57              0      1         1           1
Neurotransmitter metabolism         29              0      1         1           1
Intracellular trafficking           80              0      1         1           1
LGIC signaling                      36              0      1         1           1
Exocytosis                          86              0      1         1           1
RPSFB                               71              0      1         1           1
Ion balance/transport               43              0      1         1           1
Peptide/Neurotrophin signals        28              0      1         1           1
G-protein relay                     27              0      1         1           1

*Please note that although the totals in the Table add up to 1027, the total number of synaptic genes is
1026, because RAB3D is included in both the exocytosis and the intracellular trafficking group.
Table S7 Characteristics of previously implicated and non-implicated genes

                                  Number of  Mean gene   Median gene  SD         Mean number  SD number
                                  genes*     size (bp)   size (bp)    gene size  of exons     of exons
Synaptic genes                    1,026      121,553     46,662       210,955    13.66        12.05
Non-synaptic genes                22,655     47,535      16,346       102,451    8.82         9.54
Previously implicated genes       83         213,965     58,048       356,585    15.02        14.16
Non-previously implicated genes   23,598     50,192      17,127       108,227    9.01         9.69

*Although the total Ngenes is 23681, we here used 23463 genes as for 218 genes gene size or exon
information was not complete. None of these encoded synaptic genes or were a previously implicated gene.
Figure S5: Gene sizes of synaptic genes and previously implicated synaptic genes.
5.4 Gene-group analysis using previously implicated genes
As a positive control we conducted a gene-group analysis using the total set of 83 previously
implicated genes as a group. As the current dataset overlaps with the GWA studies on which the
list was based, we expected to find a significant association with this gene group. This was
confirmed, with a combined P-value of 3.74E-5 for the self-contained test and combined P-values all < .05
for the competitive tests (Table S8). Figure S6 provides Q-Q plots for all SNPs in previously
implicated genes across the three samples.
Table S8: Results of testing for total set of previously implicated genes

Set of previously implicated genes (N=83)

                                   ISC AFFY5  ISC AFFY6  GAIN AFFY6  ALL (PCOMB)
N genes                            63         71         69
N SNPs                             2833       6766       6838
Σ-log(P)                           1340       3134       3716
PEMP                               0.0344     0.0489     <0.0001*    3.74E-05
Competitive Pemp Control Method1   0.12       0.09       <0.01**     0.0036
Competitive Pemp Control Method2   0.26       0.15       <0.01**     0.0147
Competitive Pemp Control Method3   0.26       0.13       <0.01**     0.0127
Competitive Pemp Control Method4   0.13       0.23       <0.01**     0.0114
Competitive Pemp Control Method5   0.37       0.16       0.01        0.0247

* limited at 0.0001 due to 10000 permutations; ** limited at 0.01 due to 100 draws
Figure S6: QQ plots for all SNPs in previously implicated genes across three samples.
6. Genetic heterogeneity and gene group robustness
Figure S7 Heterogeneity at the gene level underlies robust functional gene groups: a model
Schematic view of functional gene group configurations in eight individuals, four unaffected and four affected. The nodes represent
the gene products and the edges between the nodes indicate the physical interactions between them. Unaffected individuals
carry mutations (red nodes) in only a few genes. These do not lead to a notable difference in the functioning of the group, owing to the
robustness of the functional group. In the four affected individuals many genes carry a mutation, and the set of mutated genes
can differ between individuals. However, because the set of mutated genes is large, other genes can no longer act as a backup.
As a consequence the robustness of the functional group is compromised, and dysfunction of the gene group occurs.
7. Graphical representation of significant functional gene groups
We used the STRING database to graphically represent the known physical and functional
interactions between genes included in the subgroups ‘intracellular signal transduction’,
‘excitability’ and ‘cell adhesion and transsynaptic signaling molecules’. The lines in the graphs
(Figures S8-S10) represent the confidence of the interactions between two connecting genes,
where thicker lines represent a higher confidence.
Figure S8: Graphical view of known associations between the genes within the intracellular
signal transduction group.
Figure S9: Graphical view of known associations between the genes within the excitability
group.
Figure S10: Graphical view of known associations between the genes within the CAT signaling
group.
8. Web resources
The URLs for data presented herein are as follows:
Genetics Cluster Computer: http://www.geneticcluster.org
dbSNP: http://www.ncbi.nlm.nih.gov/SNP
R Software: http://www.r-project.org
Plink software: http://pngu.mgh.harvard.edu/~purcell/plink/
String interaction database: http://string-db.org/
GAIN QA/QC software package: http://www.sph.umich.edu/csg/abecasis/GainQC/
9. References
Athanasiu L, Mattingsdal M, Kahler AK, Brown A, Gustafsson O, Agartz I et al. Gene variants
associated with schizophrenia in a Norwegian genome-wide study are replicated in a
large European cohort. J Psychiatr Res 2010; 44: 748-753.
Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, Collins FS et al. Potential etiologic
and functional implications of genome-wide association loci for human diseases and
traits. Proc Natl Acad Sci U S A 2009; 106: 9362-9367.
Jia P, Wang L, Meltzer HY, Zhao Z. Common variants conferring risk of schizophrenia: a pathway
analysis of GWAS data. Schizophr Res 2010; 122: 38-42.
Kirov G, Zaharieva I, Georgieva L, Moskvina V, Nikolov I, Cichon S et al. A genome-wide
association study in 574 schizophrenia trios using DNA pooling. Mol Psychiatry 2009; 14:
796-803.
Lencz T, Morgan TV, Athanasiou M, Dain B, Reed CR, Kane JM et al. Converging evidence for a
pseudoautosomal cytokine receptor gene locus in schizophrenia. Mol Psychiatry 2007;
12: 572-580.
Levinson DF, Duan J, Oh S, Wang K, Sanders AR, Shi J et al. Copy number variants in
schizophrenia: confirmation of five previous findings and new evidence for 3q29
microdeletions and VIPR2 duplications. Am J Psychiatry 2011; 168: 302-316.
Need AC, Ge D, Weale ME, Maia J, Feng S, Heinzen EL et al. A genome-wide investigation of
SNPs and CNVs in schizophrenia. PLoS Genet 2009 Feb; 5(2): e1000373.
O'Donovan MC, Craddock N, Norton N, Williams H, Peirce T, Moskvina V et al. Identification of
loci associated with schizophrenia by genome-wide association and follow-up. Nat Genet
2008; 40: 1053-1055.
Purcell SM, Wray NR, Stone JL, Visscher PM, O'Donovan MC, Sullivan PF et al. Common
polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature
2009; 460: 748-752.
Raychaudhuri S, Korn JM, McCarroll SA, Altshuler D, Sklar P, Purcell S et al. Accurately assessing
the risk of schizophrenia conferred by rare copy-number variation affecting genes with
brain function. PLoS Genet 2010; 6.
Shi J, Levinson DF, Duan J, Sanders AR, Zheng Y, Pe'er I et al. Common variants on
chromosome 6p22.1 are associated with schizophrenia. Nature 2009; 460: 753-757.
Shifman S, Johannesson M, Bronstein M, Chen SX, Collier DA, Craddock NJ et al. Genome-wide
association identifies a common variant in the reelin gene that increases the risk of
schizophrenia only in women. PLoS Genet 2008; 4: e28.
Stefansson H, Ophoff RA, Steinberg S, Andreassen OA, Cichon S, Rujescu D et al. Common
variants conferring risk of schizophrenia. Nature 2009; 460: 744-747.
Sullivan PF, Lin D, Tzeng JY, van den Oord E, Perkins D, Stroup TS et al. Genomewide
association for schizophrenia in the CATIE study: results of stage 1. Mol Psychiatry 2008;
13: 570-584.
Vacic V, McCarthy S, Malhotra D, Murray F, Chou HH, Peoples A et al. Duplications of the
neuropeptide receptor gene VIPR2 confer significant risk for schizophrenia. Nature 2011;
471: 499-503.
Walsh T, McClellan JM, McCarthy SE, Addington AM, Pierce SB, Cooper GM et al. Rare structural
variants disrupt multiple genes in neurodevelopmental pathways in schizophrenia.
Science 2008; 320: 539-543.