
SUPPLEMENTAL INFORMATION
SI METHODS
Voxel-based morphometry cohort
For the comparative neuroimaging analysis, a cohort of individuals clinically diagnosed
and pathologically confirmed to have CBD, FTD, or PSP was compared against a cohort
of cognitively normal healthy controls. To be included in the study, cases were required
to have at least one MRI scan available, a clinical diagnosis of CBD, PSP, or FTD, and a
pathologically confirmed primary diagnosis concordant with their clinical presentation.
All participants were recruited through ongoing studies of healthy aging and
neurodegenerative disease administered by the University of California, San Francisco
(UCSF) Memory and Aging Center (MAC). All participants or their proxies provided
written informed consent prior to participation, and all procedures were approved by the
University of California, San Francisco Committee on Human Research. Each participant
underwent a multi-step screening process requiring at least one in-person visit to the
MAC. During the screening process, all participants underwent a neurologic exam and a
detailed cognitive assessment (Rankin et al.), and provided a medical history. Each
participant brought a study partner who was interviewed to help evaluate the participant’s
functional abilities. A multidisciplinary team composed of a neurologist,
neuropsychologist, and nurse then reviewed each participant’s data and determined their
clinical diagnosis.
Structural Image Acquisition and Processing
Study participants underwent MRI scanning within one year of Clinical Dementia Rating (CDR) administration and
had at least one T1-weighted MR image available for analysis. Individuals were scanned
at the UCSF Neuroscience Imaging Center (NIC) or at the UCSF Veterans Affairs
Medical Center (SFVA). Scans from the UCSF NIC were acquired using a 3.0 Tesla
Siemens (Siemens, Iselin, NJ) TIM Trio scanner equipped with a 12-channel head coil
using a magnetization prepared rapid gradient echo (MPRAGE) sequence (160 sagittal
slices; slice thickness, 1.0 mm; field of view (FOV), 256 × 230 mm²; matrix, 256 × 230;
voxel size, 1.0 × 1.0 × 1.0 mm³; repetition time (TR), 2,300 ms; echo time (TE), 2.98 ms;
flip angle, 9°). Scans from the SFVA were acquired either on a 1.5 Tesla Siemens Magnetom
VISION system (Siemens, Iselin, NJ) equipped with a quadrature head coil using an
MPRAGE sequence (164 coronal slices; slice thickness, 1.5 mm; FOV, 256 × 256 mm²;
matrix, 256 × 256; voxel size, 1.0 × 1.5 × 1.0 mm³; TR, 10 ms; TE, 4 ms; flip angle,
15°) or on a 4 Tesla Bruker MedSpec system with an 8-channel head coil controlled by a
Siemens Trio console, using an MPRAGE sequence (192 sagittal slices; slice thickness,
1 mm; FOV, 256 × 224 mm²; matrix, 256 × 224; voxel size, 1.0 × 1.0 × 1.0 mm³; TR,
2,840 ms; TE, 3 ms; flip angle, 7°).
In all cases, the most recent scan available was used. Statistical Parametric
Mapping 12 (SPM12) was used to segment each participant's image into grey matter, white
matter, and cerebrospinal fluid. The developers' suggested settings were used for all
processing steps, and an 8 mm full-width at half-maximum (FWHM) kernel was used to smooth the images. We used a
custom DARTEL template, which incorporated images from both healthy aging and
neurodegenerative disease cohorts.
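The smoothing kernel is specified by its full width at half maximum (FWHM). For readers who think in terms of a Gaussian's standard deviation, the sketch below applies the standard relationship FWHM = 2√(2 ln 2) σ. It is not SPM code, and the helper name `fwhm_to_sigma` is hypothetical.

```python
import math


def fwhm_to_sigma(fwhm_mm: float) -> float:
    """Convert a Gaussian kernel's FWHM (in mm) to its standard deviation sigma (in mm)."""
    # For a Gaussian, FWHM = 2 * sqrt(2 * ln 2) * sigma, i.e. roughly 2.3548 * sigma.
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


if __name__ == "__main__":
    sigma = fwhm_to_sigma(8.0)
    print(f"8 mm FWHM -> sigma = {sigma:.2f} mm")
```

An 8 mm FWHM kernel therefore corresponds to a Gaussian with σ ≈ 3.40 mm.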
For each diagnostic group, we ran a separate voxel-based morphometry analysis
comparing that group with the same group of cognitively normal healthy controls. Each analysis
controlled for the effects of age, sex, education, scan type, and total intracranial volume
(TIV). The resulting maps of statistical significance were thresholded at p < 0.001 and
overlaid onto a template brain. All voxel-based statistical analyses were conducted using
vslm2.55.
Conditional Q-Q plots
Q-Q plots compare a nominal probability distribution against an empirical distribution.
When all relationships are null, nominal p-values plotted against the empirical distribution
fall on a straight line in a Q-Q plot. For CBD, PSP, and FTD SNPs, and
for each categorical subset (stratum), -log10 nominal p-values were plotted against -log10
empirical p-values (conditional Q-Q plots, see Supplemental Figure 1). Deflections of the
observed distribution from the projected null line reflect increased tail probabilities in the
distribution of test statistics (z-scores) and, consequently, an over-abundance of low p-values compared with that expected by chance (enrichment).
Under large-scale testing paradigms, such as GWAS, quantitative estimates of
likely true associations can be obtained from the distributions of summary statistics [4,
6]. One common method for visualizing the enrichment of statistical association relative
to that expected under the global null hypothesis is through Q-Q plots of nominal p-values obtained from GWAS summary statistics. The usual Q-Q curve has as the y-coordinate the nominal p-value, denoted by “p”, and as the x-coordinate the corresponding
value of the empirical cdf, denoted by “q”. Under the global null hypothesis the
theoretical distribution is uniform on the interval [0,1]. As is common in GWAS, we
instead plot -log10 p against -log10 q to emphasize tail probabilities of the theoretical and
empirical distributions. Therefore, genetic enrichment results in a leftward shift in the Q-Q curve, corresponding to a larger fraction of SNPs with nominal -log10 p-value greater
than or equal to a given threshold. Conditional Q-Q plots are constructed by creating
subsets of SNPs based on levels of an auxiliary measure for each SNP, and computing Q-Q plots separately for each level. If SNP enrichment is captured by variation in the
auxiliary measure, this is expressed as successive leftward deflections in a conditional Q-Q plot as levels of the auxiliary measure increase.
We constructed conditional Q-Q plots of empirical quantiles of nominal –log10(p)
values for SNP association with CBD for all SNPs, and for subsets (strata) of SNPs
determined by the nominal p-values of their association with PSP and FTD. Specifically,
we computed the empirical cumulative distribution of nominal p-values for a given
phenotype for all SNPs and for SNPs with significance levels below the indicated cutoffs for the other phenotypes (–log10(p) ≥ 0, –log10(p) ≥ 1, –log10(p) ≥ 2, corresponding to
p ≤ 1, p ≤ 0.1, p ≤ 0.01, respectively). The nominal p-values (–log10(p)) are plotted on the
y-axis, and the empirical quantiles (–log10(q), where q=1-cdf(p)) are plotted on the x-axis
(Supplemental Figure 1). To assess polygenic effects below the standard GWAS
significance threshold, we focused the conditional Q-Q plots on SNPs with nominal
–log10(p) < 7.3 (corresponding to p > 5×10⁻⁸).
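The construction of the conditional Q-Q curves can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes p-values are floats in (0, 1], that the primary (e.g. CBD) and secondary (e.g. PSP or FTD) p-values are aligned per SNP, and that the –log10(p) < 7.3 restriction only trims the plotted range after q has been computed (the text does not say at which step it was applied). The function names are hypothetical.

```python
import math
from bisect import bisect_right
from typing import List, Sequence, Tuple


def qq_points(pvals: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (-log10 q, -log10 p) pairs for nominal p-values in (0, 1].

    q is the empirical cdf at p: the fraction of SNPs whose p-value is <= p.
    """
    ordered = sorted(pvals)
    n = len(ordered)
    points = []
    for p in ordered:
        q = bisect_right(ordered, p) / n  # tied p-values share the same q
        points.append((-math.log10(q), -math.log10(p)))
    return points


def conditional_qq(
    p_primary: Sequence[float],
    p_secondary: Sequence[float],
    cutoff: float,
    max_neglog10_p: float = 7.3,
) -> List[Tuple[float, float]]:
    """Q-Q points for the primary trait within one stratum.

    The stratum holds SNPs whose secondary-trait p-value is <= cutoff. q is computed
    over the whole stratum; points with -log10 p >= max_neglog10_p are then dropped.
    """
    stratum = [p1 for p1, p2 in zip(p_primary, p_secondary) if p2 <= cutoff]
    return [(x, y) for x, y in qq_points(stratum) if y < max_neglog10_p]
```

Calling `conditional_qq` with cutoffs 1, 0.1, and 0.01 yields one curve per stratum; with a cutoff of 1 every SNP is included, which reproduces the unconditional curve.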
Genomic Control
The empirical null distribution in GWAS is affected by global variance inflation
due to population stratification and cryptic relatedness [2] and deflation due to overcorrection of test statistics for polygenic traits by standard genomic control methods [8].
We applied a control method leveraging only intergenic SNPs, which are likely depleted
for true associations [5]. First, we annotated the SNPs to genic (5’UTR, exon, intron,
3’UTR) and intergenic regions using information from the 1000 Genomes Project (1KGP). We used intergenic
SNPs because their relative depletion of associations suggests that they provide a robust
estimate of true null effects and thus seem a better category for genomic control than all
SNPs. We converted all p-values to z-scores and, for all phenotypes, estimated the
genomic inflation factor λGC for intergenic SNPs. We computed λGC
as the median z-score squared divided by the expected median of a chi-square
distribution with one degree of freedom, and divided all test statistics by λGC.
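A minimal sketch of this genomic control step, using only the Python standard library. It assumes two-sided p-values and reads "divided all test statistics by λGC" as dividing the χ² statistics (z²) by λGC, which is equivalent to dividing |z| by √λGC; the intergenic annotation is assumed to have been done upstream, and the function names are hypothetical.

```python
import math
from statistics import NormalDist, median
from typing import List, Sequence

_STD_NORMAL = NormalDist()

# Median of a chi-square distribution with 1 degree of freedom: (Phi^-1(0.75))^2, about 0.4549.
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def p_to_abs_z(p: float) -> float:
    """Two-sided p-value in (0, 1] -> |z|. Using p/2 directly keeps precision for tiny p."""
    return abs(_STD_NORMAL.inv_cdf(p / 2.0))


def lambda_gc(intergenic_p: Sequence[float]) -> float:
    """Genomic inflation factor: median z^2 over intergenic SNPs / median of chi-square(1)."""
    return median([p_to_abs_z(p) ** 2 for p in intergenic_p]) / CHI2_1DF_MEDIAN


def apply_gc(p_values: Sequence[float], lam: float) -> List[float]:
    """Divide each chi-square statistic (z^2) by lam and return corrected two-sided p-values."""
    corrected = []
    for p in p_values:
        z_corr = p_to_abs_z(p) / math.sqrt(lam)  # chi2 / lam is equivalent to |z| / sqrt(lam)
        corrected.append(2.0 * _STD_NORMAL.cdf(-z_corr))
    return corrected
```

In use, λGC would be estimated from the intergenic subset of each phenotype and then applied to all SNPs of that phenotype; when λGC > 1 every corrected p-value becomes larger (less significant).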
Conditional True Discovery Rate (TDR)
Enrichment seen in the fold enrichment plots can be directly interpreted in terms
of TDR (equivalent to one minus the False Discovery Rate (FDR)) [1]. We applied the
conditional FDR method [7], previously used for enrichment of GWAS based on linkage
information [9]. Specifically, for a given p-value cutoff, the FDR is defined as

$$\mathrm{FDR}(p) = \frac{\pi_0 F_0(p)}{F(p)}, \qquad [1]$$

where $\pi_0$ is the proportion of null SNPs, $F_0$ is the null cdf, and $F$ is the cdf of all SNPs,
both null and non-null. Under the null hypothesis, $F_0$ is the cdf of the uniform distribution
on the unit interval [0,1], so that Eq. [1] reduces to

$$\mathrm{FDR}(p) = \frac{\pi_0 p}{F(p)}. \qquad [2]$$

The empirical cdf is $q = n_p/N$, where $n_p$ is the number of
SNPs with p-values less than or equal to $p$, and $N$ is the total number of SNPs. Replacing
$F$ by $q$ in Eq. [2], we get

$$\widehat{\mathrm{FDR}}(p) = \frac{\pi_0 p}{q}, \qquad [3]$$

which is biased upwards as an estimate of the FDR [3]. Replacing $\pi_0$ in Eq. [3] with
unity gives an estimated FDR that is further biased upward:

$$q^{*} = \frac{p}{q}. \qquad [4]$$

If $\pi_0$ is close to one, as is likely true for most GWAS, the increase in bias from Eq. [3] is
minimal. The quantity 1 – p/q is therefore biased downward, and hence is a conservative
estimate of the TDR.
Referring to the formulation of the Q-Q plots, we see that $q^*$ is equivalent to the
nominal p-value divided by the empirical quantile, as defined earlier. Taking -log10 of both sides, as on the axes of
the Q-Q plots, we obtain

$$-\log_{10}(q^{*}) = \log_{10}(q) - \log_{10}(p), \qquad [5]$$

demonstrating that the (conservatively) estimated FDR is directly related to the horizontal
shift of the curves in the conditional Q-Q plots from the expected line x = y, with a larger
shift corresponding to a smaller FDR, as illustrated in Supplemental Figure 1. As before,
the estimated TDR can be obtained as 1 – FDR.
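The conservative estimate in Eq. [4] and its conditional (stratified) version take only a few lines of code. The sketch below is a plain empirical estimate under stated assumptions (p-values aligned per SNP, π0 set to 1 as in Eq. [4]); it does not reproduce any interpolation or smoothing the original software may apply, and all names are hypothetical.

```python
import math
from typing import Sequence


def conservative_fdr(pvals: Sequence[float], p: float) -> float:
    """Eq. [4]: q* = p / q, with pi0 set to 1 and q = n_p / N (empirical cdf at p).

    q* can exceed 1 when a set is depleted of small p-values; it is not capped here.
    """
    n_p = sum(1 for x in pvals if x <= p)
    if n_p == 0:
        raise ValueError("no SNPs with p-value <= threshold")
    q = n_p / len(pvals)
    return p / q


def conditional_conservative_fdr(
    p_primary: Sequence[float], p_secondary: Sequence[float], p: float, cutoff: float
) -> float:
    """q* for the primary trait, computed only within the stratum p_secondary <= cutoff."""
    stratum = [p1 for p1, p2 in zip(p_primary, p_secondary) if p2 <= cutoff]
    return conservative_fdr(stratum, p)


def conservative_tdr(fdr: float) -> float:
    """Conservative TDR estimate, 1 - q*."""
    return 1.0 - fdr


if __name__ == "__main__":
    pv = [0.001, 0.01, 0.2, 0.5, 0.9]
    q_star = conservative_fdr(pv, 0.01)  # q = 2/5, so q* = 0.01 / 0.4
    # Eq. [5]: -log10(q*) equals log10(q) - log10(p)
    print(q_star, -math.log10(q_star), math.log10(0.4) - math.log10(0.01))
```

Because the stratum shares the same threshold p but has a larger q when it is enriched for small p-values, the conditional q* falls below the unconditional one, which is the horizontal shift described by Eq. [5].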
Conjunction statistics – test of association with both phenotypes
We defined the conjunction statistic (denoted $\mathrm{FDR}_{\mathrm{Trait1\,\&\,Trait2}}$) as the maximum of the
conditional FDR in both directions, i.e.

$$\mathrm{FDR}_{\mathrm{Trait1\,\&\,Trait2}} = \max\left(\mathrm{FDR}_{\mathrm{Trait1}\mid\mathrm{Trait2}},\ \mathrm{FDR}_{\mathrm{Trait2}\mid\mathrm{Trait1}}\right),$$

based on the combination of p-values for the SNP in CBD and the associated disease (e.g.,
PSP), by interpolation into a bidirectional 2-D look-up table [4, 6]. The conjunction
statistic allows for identification of SNPs that are associated with both phenotypes, which
minimizes the effect of a single phenotype driving the common association signal. Table
1 lists all SNPs with conjunction FDR < 0.05 (-log10(FDR) > 1.3) for CBD and either PSP or
FTD, after removing all SNPs with r² > 0.2 based on 1KGP linkage
disequilibrium (LD) structure (pruning).
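The max-of-both-directions idea behind the conjunction statistic can be illustrated with a brute-force sketch. Instead of interpolating into a 2-D look-up table as described above, it evaluates each conditional FDR directly at every SNP's own pair of p-values (π0 = 1), which is simpler but scales quadratically with the number of SNPs. It is an illustration only, not a replacement for the published procedure, and the names are hypothetical.

```python
from typing import List, Sequence


def cond_fdr_at(p_a: Sequence[float], p_b: Sequence[float], a: float, b: float) -> float:
    """Empirical FDR for trait A at p-value a, among SNPs whose trait-B p-value is <= b (pi0 = 1)."""
    stratum = [x for x, y in zip(p_a, p_b) if y <= b]
    n_le = sum(1 for x in stratum if x <= a)  # >= 1 when (a, b) belongs to a SNP in the data
    return a * len(stratum) / n_le  # a / q with q = n_le / len(stratum)


def conjunction_fdr(p1: Sequence[float], p2: Sequence[float]) -> List[float]:
    """Per-SNP conjunction FDR: max of FDR(trait1 | trait2) and FDR(trait2 | trait1)."""
    out = []
    for a, b in zip(p1, p2):
        fdr_1_given_2 = cond_fdr_at(p1, p2, a, b)
        fdr_2_given_1 = cond_fdr_at(p2, p1, b, a)
        out.append(max(fdr_1_given_2, fdr_2_given_1))
    return out
```

Taking the maximum means a SNP passes only if it is supported in both directions; SNPs with a conjunction value below 0.05 would then be the candidates passed on to LD pruning.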
Conjunction FDR Manhattan plots
To illustrate the localization of the genetic markers associated with CBD given PSP and
FTD, we used a ‘Conjunction FDR Manhattan plot’, plotting all SNPs within an LD block
in relation to their chromosomal location. As illustrated in Figure 2 within the main
manuscript, the large points represent the SNPs with FDR < 0.05, whereas the small
points represent the non-significant SNPs. All SNPs before ‘pruning’ (removing all SNPs
with r² > 0.2 based on 1KGP LD structure) are shown. The strongest signal in each LD
block is indicated by a black line around the circles. It was identified by ranking all
SNPs in increasing order of their conjunction FDR value for CBD, and then
removing SNPs in LD (r² > 0.2) with any higher-ranked SNP. Thus, the selected SNP was
the one most significantly associated with CBD in each LD block (Figure 2).
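The ranking-and-removal step above is a greedy pruning procedure, sketched below. This version assumes that "any higher-ranked SNP" means higher-ranked SNPs that were themselves retained (the usual clumping convention), and it takes pairwise r² from a caller-supplied function; in practice those values would come from a 1KGP reference panel. The function name and the example LD values are hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def select_lead_snps(
    snps: Sequence[str],
    fdr: Dict[str, float],
    r2: Callable[[str, str], float],
    r2_max: float = 0.2,
) -> List[str]:
    """Greedy LD pruning: visit SNPs from smallest to largest conjunction FDR and keep a SNP
    only if its r^2 with every SNP already kept is <= r2_max."""
    leads: List[str] = []
    for snp in sorted(snps, key=lambda s: fdr[s]):
        if all(r2(snp, kept) <= r2_max for kept in leads):
            leads.append(snp)
    return leads


# Hypothetical pairwise r^2 values for three SNPs (unlisted pairs treated as r^2 = 0).
ld = {frozenset(("a", "b")): 0.5, frozenset(("a", "c")): 0.1, frozenset(("b", "c")): 0.1}
print(select_lead_snps(["a", "b", "c"], {"a": 0.01, "b": 0.02, "c": 0.03},
                       lambda x, y: ld.get(frozenset((x, y)), 0.0)))
```

Here SNP "b" is dropped because it is in LD (r² = 0.5) with the better-ranked "a", while "c" is kept as the lead SNP of a separate block.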
SI RESULTS
Neuroimaging Analysis
CBD
The comparison of pathologically confirmed CBD cases versus controls revealed diffuse
cortical atrophy that was pronounced in the frontal lobes, parietal lobes, insula, and
caudate bilaterally. The maximum T-score in the CBD analysis was 9.43 and was
centered in the left supplementary motor cortex. The results of the CBD analysis are
illustrated in Figure 5 within the main manuscript.
FTD
The comparison of pathologically confirmed FTD cases versus controls revealed
pronounced cortical atrophy across the frontal and temporal lobes bilaterally. There was
also bilateral atrophy in the caudate and putamen. The maximum T-score in the FTD
analysis was 18.17 and was located in the right caudate. The results of the FTD analysis
are illustrated in Figure 5 within the main manuscript.
PSP
The comparison of pathologically confirmed PSP cases versus controls revealed diffuse
cortical atrophy that was pronounced in the caudate, insula, operculum, and cerebellum
bilaterally. The maximum T-score in the PSP analysis was 8.03 and it was located in the
left caudate. The results of the PSP analysis are illustrated in Figure 5 within the main
manuscript.
Supplemental References
1. Benjamini Y, Hochberg Y (1995) Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. J R Stat Soc Ser B 57:289–300. doi: 10.2307/2346101
2. Devlin B, Roeder K (1999) Genomic control for association studies. Biometrics 55:997–1004. doi: 10.1111/j.0006-341X.1999.00997.x
3. Efron B (2007) Size, power and false discovery rates. Ann Stat 35:1351–1377. doi: 10.1214/009053606000001460
4. Efron B (2010) Large-scale inference: empirical Bayes methods for estimation, testing, and prediction. Cambridge University Press, New York
5. Schork AJ, Thompson WK, Pham P, et al. (2013) All SNPs are not created equal: genome-wide association studies reveal a consistent pattern of enrichment among functionally annotated SNPs. PLoS Genet 9:e1003449. doi: 10.1371/journal.pgen.1003449
6. Schweder T, Spjøtvoll E (1982) Plots of p-values to evaluate many tests simultaneously. Biometrika 69:493–502. doi: 10.1093/biomet/69.3.493
7. Sun L, Craiu RV, Paterson AD, Bull SB (2006) Stratified false discovery control for large-scale hypothesis testing with application to genome-wide association studies. Genet Epidemiol 30:519–530. doi: 10.1002/gepi.20164
8. Yang J, Weedon MN, Purcell S, et al. (2011) Genomic inflation factors under polygenic inheritance. Eur J Hum Genet 19:807–812. doi: 10.1038/ejhg.2011.39
9. Yoo YJ, Pinnaduwage D, Waggott D, Bull SB, Sun L (2009) Genome-wide association analyses of North American Rheumatoid Arthritis Consortium and Framingham Heart Study data utilizing genome-wide linkage results. BMC Proc 3:S103. doi: 10.1186/1753-6561-3-s7-s103
SI FIGURE LEGENDS
Fig. S1. Conditional quantile-quantile (Q-Q) plots of empirical -log10 p versus nominal -log10 p (corrected for inflation) in corticobasal degeneration (CBD) below the standard
GWAS threshold of p < 5×10⁻⁸ as a function of significance of association with
progressive supranuclear palsy (PSP) (left panel) and frontotemporal dementia (FTD)
(right panel) at p ≤ 0.1, p ≤ 0.01, and p ≤ 0.001, respectively. Blue line indicates all SNPs.
Fig. S2. Regional association plots for (a) rs199533, (b) rs1768208, (c) rs2011946, (d)
rs759162, and (e) rs7035933.
Fig. S3. Limiting the frontotemporal dementia (FTD) cohort to only patients with a
diagnosis of behavioral variant FTD (bvFTD): (a) Fold enrichment plots of enrichment
versus nominal -log10 p-values (corrected for inflation) in corticobasal degeneration
(CBD) below the standard GWAS threshold of p < 5×10⁻⁸ as a function of significance of
association with bvFTD (left panel) and progressive supranuclear palsy (PSP, right panel)
at the level of -log10(p) ≥ 0, -log10(p) ≥ 1, -log10(p) ≥ 2 corresponding to p ≤ 1, p ≤ 0.1, p
≤ 0.01, respectively. Blue line indicates all SNPs. (b) ‘Conjunction’ Manhattan plot of
conjunction and conditional –log10 (FDR) values for corticobasal degeneration (CBD)
(black) given bvFTD (CBD|bvFTD, red) and CBD given PSP (CBD|PSP, orange). SNPs
with conditional and conjunction –log10 FDR > 1.3 (i.e. FDR < 0.05) are shown with
large points. A black line around the large points indicates the most significant SNP in
each LD block and this SNP was annotated with the closest gene, which is listed above
the symbols in each locus.
Fig. S4. After removing chromosome 17 MAPT-region associated signal (consisting of
all SNPs in r² > 0.2, based on 1000 Genomes Project LD structure, within 1 Mb of MAPT
variants) from the analysis: (a) Fold enrichment plots of enrichment versus nominal -log10
p-values (corrected for inflation) in corticobasal degeneration (CBD) below the standard
GWAS threshold of p < 5×10⁻⁸ as a function of significance of association with
frontotemporal dementia (FTD, left panel) and progressive supranuclear palsy (PSP, right
panel) at the level of -log10(p) ≥ 0, -log10(p) ≥ 1, -log10(p) ≥ 2 corresponding to p ≤ 1, p ≤
0.1, p ≤ 0.01, respectively. Blue line indicates all SNPs. (b) Conditional quantile-quantile
(Q-Q) plots of empirical -log10 p versus nominal -log10 p (corrected for inflation) in CBD
below the standard GWAS threshold of p < 5×10⁻⁸ as a function of significance of
association with FTD (left panel) and PSP (right panel) at p ≤ 0.1, p ≤ 0.01, and p ≤
0.001, respectively. Blue line indicates all SNPs.
Fig. S5. Bar plots demonstrating regional expression of MAPT across the brain using the
Braineac dataset.
Fig. S6. Bar plots demonstrating gene expression profiles of top risk genes across
different types of cells in the brain identified through RNA-seq
(http://web.stanford.edu/group/barres_lab/brain_rnaseq.html). Note, scales of gene
expression (as Fragments Per Kilobase of transcript per Million mapped reads) are
different for each transcript.
Fig S1.
Fig S2a.
Fig S2b.
Fig S2c.
Fig S2d.
Fig S2e.
Fig S3a.
Fig S3b.
Fig S4a.
Fig S4b.
Fig S5. [Image: MAPT expression (Affymetrix ID t3723687; expression level in log2 scale) across brain regions FCTX (N=127), TCTX (N=119), OCTX (N=129), HIPP (N=122), THAL (N=124), CRBL (N=130), SNIG (N=101), PUTM (N=129), MEDU (N=119), and WHMT (N=131). Fold change between FCTX and WHMT = 1.5 (p = 9.4e−49). Source: BRAINEAC.]
Fig S6. [Image: panels for EGFR, MAPT, GLDC, CXCR4, and MOBP.]
SI TABLES
Supplemental Table 1: eQTL for rs199533 in GTEx
Region          Gene            P-value
Cerebellum      KANSL1-AS1      1.98E-20
Cerebellum      LRRC37A         4.13E-13
Cerebellum      LRRC37A2        4.16E-19
Cerebellum      MAPT            6.31E-12
Cerebellum      MAPT-AS1        2.56E-08
Cerebellum      RP11-259G18.1   1.04E-12
Cerebellum      RP11-259G18.2   8.43E-18
Cerebellum      RP11-259G18.3   1.32E-25
Cerebellum      SPPL2C          1.00E-07
Frontal Cortex  KANSL1-AS1      4.64E-21
Frontal Cortex  LRRC37A2        3.30E-16
Frontal Cortex  MAPT            8.24E-08
Frontal Cortex  RP11-259G18.2   2.61E-16
Frontal Cortex  RP11-259G18.3   2.12E-15
Supplemental Table 2: Co-localization results from COLOC using the GTEx
cohort.
SNP          Nearest gene   eQTL gene    Tissue         Lowest p-value
rs2011946    CXCR4          MCM6         Heart          9.00E-09
rs2011946    CXCR4          MCM6         Nerve          6.00E-07
rs1768208    MOBP           MOBP         Nerve          1.00E-28
rs1768208    MOBP           RPSA         Lung           4.00E-17
rs759162     EGFR           N/A          N/A            N/A
rs7035933    GLDC           GLDC         Nerve          1.40E-11
rs199533     MAPT           KANSL1AS1    Whole blood    1.00E-68
Supplemental Table 3: Network weights associated with bioinformatic analyses for
MAPT, MOBP, CXCR4, GLDC and EGFR (for additional information see
www.genemania.org). File attached as a separate document