bioRxiv preprint first posted online Nov. 15, 2016; doi: http://dx.doi.org/10.1101/087718. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. 1 2 Title: Determining causal genes from GWAS signals using topologically associating domains 3 4 Authors: 5 Gregory P. Way1,2, Daniel W. Youngstrom3, Kurt D. Hankenson3, Casey S. Greene2*, and 6 Struan F.A. Grant4,5* 7 * These authors directed this work jointly 8 9 Affiliations: 10 1 11 Philadelphia, PA 19104, USA 12 2 13 Pennsylvania, Philadelphia, PA 19104, USA 14 3 15 State University, East Lansing, MI 48824, USA 16 4 17 Philadelphia, PA 19104, USA 18 5 19 Hospital of Philadelphia, Philadelphia, PA 19104, USA Genomics and Computational Biology Graduate Program, University of Pennsylvania, Department of Systems Pharmacology and Translational Therapeutics, University of Department of Small Animal Clinical Sciences, College of Veterinary Medicine, Michigan Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Division of Human Genetics, Division of Endocrinology and Diabetes, The Children's 20 21 Author Emails: 22 GPW: [email protected]; DWY: [email protected]; KDH: [email protected]; 23 CSG: [email protected]; *SFAG: [email protected] 1 bioRxiv preprint first posted online Nov. 15, 2016; doi: http://dx.doi.org/10.1101/087718. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. 24 Co-Corresponding Authors:* Struan F.A. Grant Casey S. Greene Rm 1102D, 3615 Civic Center Blvd 10-131 SCTR 34th and Civic Center Blvd, Philadelphia, PA 19104 Philadelphia, PA 19104 Office: 267-426-2795 Office: 215-573-2991 Fax: 215-590-1258 Fax: 215-573-9135 25 26 Abstract: 27 Background 28 Genome wide association studies (GWAS) have contributed significantly to the field of 29 complex disease genetics. However, GWAS only report signals associated with a given trait and 30 do not necessarily identify the precise location of culprit genes. As most association signals 31 occur in non-coding regions of the genome, it is often challenging to assign genomic variants to 32 the underlying causal mechanism(s). Topologically associating domains (TADs) are primarily 33 cell-type independent genomic regions that define interactome boundaries and can aid in the 34 designation of limits within which a GWAS locus most likely impacts gene function. 35 Results 36 We describe and validate a computational method that uses the genic content of TADs to 37 assign GWAS signals to likely causal genes. Our method, called “TAD_Pathways”, performs a 38 Gene Ontology (GO) analysis over all genes that reside within the boundaries of all TADs 39 corresponding to the GWAS signals for a given trait or disease. We applied our pipeline to the 40 GWAS catalog entries associated with bone mineral density (BMD), identifying ‘Skeletal 41 System Development’ (Benjamini-Hochberg adjusted p = 1.02x10-5) as the top ranked pathway. 42 Often, the causal gene identified at a given locus was well known and/or the nearest gene to the 43 sentinel SNP. In other cases, our method implicated a gene further away. Our molecular 2 bioRxiv preprint first posted online Nov. 15, 2016; doi: http://dx.doi.org/10.1101/087718. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. 44 experiments describe a novel example: ACP2, implicated at the canonical ‘ARHGAP1’ locus. We 45 found ACP2 to be an important regulator of osteoblast metabolism, whereas a causal role of 46 ARHGAP1 was not supported. 47 Conclusions 48 Our results demonstrate how basic principles of three-dimensional genome organization can 49 help define biologically informed windows of signal association. We anticipate that 50 incorporating TADs will aid in refining and improving the performance of a variety of 51 algorithms that linearly interpret genomic content. 52 53 Keywords: 54 Genome wide association study, gene prioritization, topologically associating domains, 55 pathway analysis, bone mineral density 56 57 Background: 58 Genome-wide association studies (GWAS) have been applied to over 300 different traits, 59 leading to the discovery and subsequent validation of several important disease associations [1]. 60 However, GWAS can only discover association signals in the data. Subsequent assignment of 61 signal to causal genes has proven difficult due to these signals falling principally within 62 noncoding genomic regions [2–4] and not necessarily implicating the nearest gene [5]. For 63 example, a signal found within an intron for FTO, a well-studied gene previously thought to be 64 important for obesity [6], has been shown to physically interact with and lead to the differential 65 expression of two genes (IRX3 and IRX5) directly next to this gene, and not FTO itself [7–9]. 66 Moreover, there is evidence suggesting a type 2 diabetes GWAS association previously 67 implicating TCF7L2 [10] also influences the nearby ACSL5 gene [11]. It remains unclear how 3 bioRxiv preprint first posted online Nov. 15, 2016; doi: http://dx.doi.org/10.1101/087718. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. 68 pervasive these kinds of associations are, but similar strategies are necessary in order for GWAS 69 to better guide research and precision medicine [12]. 70 Three-dimensional genomics has changed the way geneticists think about genome 71 organization and its functional implications [13,14]. Genome-wide chromatin interaction maps 72 have facilitated the development of several genome organization principles, including 73 topologically associating domains (TADs) [15–18]. TADs are sub-architectural units of the 74 overall genome organization that have consistent and functionally important genomic element 75 distributions including an enrichment of housekeeping genes, insulator elements, and early 76 replication timing regions at boundary regions [19–21]. TADs are largely consistent across 77 different cell types and demonstrate synteny [22,23]. These observations can therefore allow the 78 leveraging of TADs to set the bounds of where non-coding causal variants can most likely 79 impact promoters, enhancers and genes in a tissue independent fashion [24,25]. Therefore, we 80 sought to develop a method that integrates GWAS data with interactome boundaries to more 81 accurately map signals to the mostly likely candidate gene(s). 82 We developed a computational approach, called “TAD_Pathways”, which is agnostic to gene 83 locations relative to each GWAS signal within TADs. We scanned publically available GWAS 84 data for given traits and used TAD boundaries to output lists of genes likely to be causal. We 85 demonstrate this approach by assessing the influence of GWAS signals on bone mineral density 86 (BMD) [26–29]. This trait is clinically of great importance as low BMD is an important 87 precursor to osteoporosis, a disease condition affecting millions of patients annually [30]. We 88 also chose BMD as a trait for analysis because BMD GWAS primarily points to very well- 89 known genes involved in bone development (positive controls) but there remain a number of 90 established loci where no obvious gene resides, therefore offering the opportunity to uncover 4 bioRxiv preprint first posted online Nov. 15, 2016; doi: http://dx.doi.org/10.1101/087718. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. 91 novel biology. After applying our TAD_Pathways discovery approach, we investigated putative 92 causal genes using cell culture-based assays, identifying ACP2 as a novel regulator of osteoblast 93 metabolism. 94 95 Results: 96 The Genomic Landscape of SNPs across Topologically Associating Domains 97 We observed a consistent and non-random distribution of SNPs across TADs derived from 98 human embryonic stem cells (hESC), human fibroblasts (IMR90), mouse embryonic stem cells 99 (mESC), and mouse cortex cells (mcortex) cells. As expected, SNPs are tightly associated with 100 TAD length for each cell type, but there are substantial outlier TADs (Figure 1). For example, 101 the TAD harboring the largest number of common SNPs (minor allele frequency (MAF) greater 102 than 0.05) in hESC is located on chromosome 6 (UCSC hg19: chr6:31492021-32932022) and 103 has 19,431 SNPs. Not surprisingly, this TAD harbors an abundance of genes including HLA 104 genes, which are well known to have many polymorphic sites [31]. However, the other human 105 cell line (IMR90) outlier TAD is located on chromosome 8 (UCSC hg19: chr8:2132593- 106 6252592) and has 27,220 SNPs and could be potentially biologically meaningful. Indeed, 107 although this TAD harbors relatively few genes, it does include CSMD1, a gene implicated in 108 cancer and neurological disorders such as epilepsy and schizophrenia [32,33]. 109 Common SNPs were enriched near the center of TADs (Figure 1A). This is the opposite of 110 gene (Supplementary Figure S1) and repeat element (Supplementary Figure S2) distributions 111 (also see Dixon et al. 2012) [22]. The repeat element distribution was driven largely by the 112 SINE/Alu repeat distribution, which could not be explained by GC content and estimated 113 evolutionary divergence (Supplementary Figure S3). We also observed that common SNPs are 5 bioRxiv preprint first posted online Nov. 15, 2016; doi: http://dx.doi.org/10.1101/087718. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. 114 significantly enriched in the 3’ half of TADs in hESC and mcortex cells (Supplementary Table 115 S1). There was also a slight increase in GWAS implicated SNPs near hESC TAD boundaries 116 (Figure 1B). Given the non-random patterns observed across the TADs, we went on to explore 117 the gene content further in an attempt to imply causality at given GWAS loci. 118 119 120 121 122 123 124 125 126 127 128 129 Figure 1. Distributions of single nucleotide polymorphisms (SNPs) across topologically associating domains of four cell types. (A) Top – The length of TADs is associated with the number of SNPs found in each cell type. Bars on the top and right side of the plots represent histograms of each respective metric. hESC and IMR90 TADs are based on 1000 genomes phase 3 SNPs (hg19) and mESC and mcortex TADs represent 15 strains from the mouse genomes project version 2 (mm9). Bottom – Number of SNPs found in each cell type. (B) Number of GWAS SNPs from the NHGRI-EBI GWAS catalog that reached genome-wide significance in replication required journals. In each case, we independently discretized TADs into 50 bins where the distribution of elements is linear from 5’ to 3’ (bin 0 is the 5’ most end of all TADs). 130 6 bioRxiv preprint first posted online Nov. 15, 2016; doi: http://dx.doi.org/10.1101/087718. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. 131 TAD_Pathways reveals potentially causal genes within phenotype-associated TADs 132 Seeking to leverage TADs and disease associated SNPs, we integrated GWAS and TAD 133 domain boundaries in an effort to assign GWAS signals to causal genes. Alternative approaches 134 to understand the gene landscape of a locus that do not consider TAD boundaries typically either 135 assign genes to a GWAS signal based on nearest gene [34] or by an arbitrary or a linkage 136 disequilibrium-based window of several kilobases [35,36] (Figure 2A). Instead, we used TAD 137 boundaries and the full catalog of GWAS findings for a given trait or disease to assign genes to 138 GWAS variants based on overrepresentation in a gene set in an approach we termed 139 “TAD_Pathways”. For a given trait or disease, we collected all genes that are located in TADs 140 harboring significant GWAS signals. We then applied a statistical enrichment analysis for 141 biological pathways using this TAD gene set and assign candidate genes within a TAD based on 142 the pathways significantly associated with a phenotype (Figure 2B). In our implementation, we 143 used GO biological processes, GO cellular components, and GO molecular functions to provide 144 the pathway sets [37]. We included both experimentally confirmed and computationally inferred 145 GO gene annotations, which permit the inclusion of putative casual genes that do not necessarily 146 have literature support but are predicted by a variety of computational methods. 147 To validate our approach, we applied TAD_Pathways to bone mineral density (BMD) 148 GWAS results derived from replication-requiring journals [26–29]. Our method implicated 149 ‘Skeletal System Development’ (Benjamini-Hochberg adjusted p = 1.02x10-5) as the top ranked 150 pathway. We provide full TAD_Pathways results for BMD in Supplementary Table S2. 151 Despite a high content of presumably non-causal genes, which we expect would contribute noise 152 to the overrepresentation analysis [38], our method demonstrated enrichment of a skeletal system 153 related pathway and selected a subset of potentially causal genes belonging to the same pathway. 7 bioRxiv preprint first posted online Nov. 15, 2016; doi: http://dx.doi.org/10.1101/087718. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. 154 Many of these genes (24/38) were not the nearest gene to the GWAS signal and several also had 155 independent expression quantitative trail loci (eQTL) support (Supplementary Table S3, 156 Supplementary Figure S4). 157 158 159 160 161 162 163 164 165 166 167 168 169 Figure 2. Concepts motivating our approach. Topologically associating domains (TADS) are shown as orange triangles, genes are shown as black lines, and a genome wide significant GWAS signal is shown as a dotted red line. (A) Three hypothetical examples illustrated by a cartoon. The ground truth causal gene is shaded in red. The method-specific selected genes are shaded in blue. The top panel describes a nearest gene approach. The nearest gene in this scenario is not the gene actually impacted by the GWAS SNP. The middle panel describes a window approach. Based either on linkage disequilibrium or an arbitrarily sized window, the scenario does not capture the true gene. The bottom panel describes the TAD_Pathways approach. In this scenario, the causal gene is selected for downstream assessment. (B) The TAD_Pathways method. An example using Bone Mineral Density GWAS signals is shown. 170 siRNA Knockdown of TAD Pathway Gene Predictions in Osteoblast Cells 171 The loci rs7932354 (cytoband: 11p11.2) and rs11602954 (cytoband: 11p15.5) are currently 172 assigned to ARHGAP1 and BETL1 but our method implicated ACP2 and DEAF1, respectively. 173 The two genes implicated by TAD_Pathways, ACP2 and DEAF1, lacked eQTL support and were 174 not the nearest gene to the BMD GWAS signal. We tested the gene expression activity and 175 metabolic importance of these four genes, ARHGAP1, BETL1, DEAF1, and ACP2. Specifically, 176 our assays in a human fetal osteoblast cell line (hFOB) evaluate whether or not the 8 bioRxiv preprint first posted online Nov. 15, 2016; doi: http://dx.doi.org/10.1101/087718. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. 177 TAD_Pathways method identifies causal genes at GWAS signals beyond those captured by 178 closest and eQTL connected genes. Though these two genes were annotated to the identified GO 179 process, the annotation had been made computationally and their known biology did not provide 180 obvious links to bone biology. 181 We targeted the expression of all four of these genes in vitro using small interfering RNA 182 (siRNA), assessing knockdown efficiency at the mRNA level relative to untreated controls and 183 determined corresponding p values relative to scrambled siRNA controls. We used an siRNA 184 targeting tissue-nonspecific alkaline phosphatase (TNAP) as a positive control. Knockdown 185 efficiencies were: TNAP siRNA 48.7±9.9% (p=0.141), ARHGAP1 siRNA 68.7±14.3% 186 (p=0.015), ACP2 siRNA 48.9±6.4% (p=0.035), BET1L 56.4±1.0% (N.S.) and DEAF1 187 52.7±9.2% (p=0.021) (Figure 3). siRNA targeted against each gene of interest did not down- 188 regulate the expression of the other genes under investigation, indicating specificity of 189 knockdown, although we noted that TNAP siRNA did reduce DEAF1 gene expression, though 190 this did not reach the threshold for statistical significance (p= 0.077). 191 We noted significant variation across the three controls, with the scrambled siRNA control 192 altering expression of OCN (osteocalcin), IBSP (bone sialoprotein), TNAP and BET1L (p < 0.05). 193 Relative to the scrambled siRNA control, OCN was downregulated in all siRNA groups (p < 194 0.05) except for BET1L siRNA (p = 0.122). OSX, IBSP and TNAP were not significantly altered 195 by any siRNA treatment (Figure 3). 196 197 198 199 Metabolic Activity of TAD Pathway Gene Predictions Use of ACP2 siRNA led to a 66.0% reduction in MTT metabolic activity versus the scrambled siRNA control (p = 0.012). ARHGAP1 siRNA caused a 38.8% reduction, which fell 9 bioRxiv preprint first posted online Nov. 15, 2016; doi: http://dx.doi.org/10.1101/087718. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. 200 short of statistical significance (p = 0.088). siRNA targeted against TNAP, BET1L or DEAF1 did 201 not alter MTT metabolic activity (Figure 4A). 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 Figure 3: Real-time PCR of osteoblast differentiation genes and GWAS/TAD hits in hFOB cells. siRNA was used to knock down expression of TNAP (positive control), ARHGAP1, ACP2, BET1L and DEAF1. Relative expression of the osteoblast marker genes OSX, OCN and IBSP suggest that GWAS/TAD hits are not major regulators of bone differentiation in this model. Red bars highlight specificity of each siRNA knockdown. Values represent mean ± standard deviation. Statistical significance relative to the scrambled siRNA control is annotated as: *p ≤ 0.05 and #p ≤ 0.10 using a two-tailed Student’s t-test. Figure 4. Validating two ‘TAD Pathway’ predictions for Bone Mineral Density GWAS hits on hFOB cells. siRNA was used to knock down expression of TNAP, ARHGAP1, ACP2, BET1L and DEAF1. (A) Knockdown of ACP2 decreases cellular metabolic activity, demonstrated using an MTT assay. (B) ALP staining and quantitation indicates that knockdown of TNAP or ACP2 inhibits performance in an osteoblast differentiation assay. Values represent mean ± standard deviation. Statistical significance relative to the scrambled siRNA control is annotated as: *p ≤ 0.05 and #p ≤ 0.10 using a two-tailed Student’s t-test. 10 bioRxiv preprint first posted online Nov. 15, 2016; doi: http://dx.doi.org/10.1101/087718. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. 221 222 Influence of TAD_Pathways Gene Predictions on Alkaline Phosphatase Activity Alkaline phosphatase (ALP) is highly expressed in osteoblasts; disruption of proliferation or 223 osteoblast differentiation would result in downregulation of ALP. Treatment with siRNA 224 resulted in changes in ALP staining that we analyzed further by quantitation. TNAP siRNA 225 significantly reduced ALP by 5.98±1.77 versus the scrambled siRNA control (p = 0.006). ACP2 226 siRNA also significantly reduced ALP intensity by 8.74±2.11 versus the scrambled siRNA 227 control (p = 0.003). The scrambled siRNA group stained less intensely than untreated or 228 transfection reagent control wells, but this did not reach statistical significance (0.05 < p < 0.10) 229 (Figure 4B). 230 231 232 Discussion: We observed a nonrandom enrichment of SNPs in the center of TADs that was consistent 233 across different cell types, but was in the opposite direction of the gene and repeat elements 234 distributions. It is possible that the gene distribution is driving this phenomenon, since coding 235 regions are under higher evolutionary constraint and are thus more averse to SNPs [39]. 236 Nevertheless, GWAS SNPs also appeared to be distributed closely to boundary regions in hESC 237 cells. This may support GWAS causally implicating nearest genes more frequently since genes 238 are also distributed near boundaries. The observation may also suggest that polymorphism in 239 regions near TAD boundaries are more important drivers of disease risk associations than 240 polymorphism in the center of TADs. However, we do not observe this pattern in IMR90 cells. 241 The SNP distribution was also opposite of the SINE/Alu repeat distribution. Given that Alu 242 elements tend to insert into GC-rich regions [40], we tracked GC content across TADs and 243 observed only a slight increase in GC content near TAD boundaries. There was also a slightly 11 bioRxiv preprint first posted online Nov. 15, 2016; doi: http://dx.doi.org/10.1101/087718. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. 244 inverse distribution of Alu evolutionary divergence [40]. Our results suggest that the Alu 245 distribution is primarily driven by intronic clustering [41] rather than GC-biased insertion or 246 evolutionary divergence. Recently, retrotransposons have been shown to act as genomic 247 insulators [42], while Alu repeats have been shown to be correlated with functional elements 248 [43]. However, the relationships between polymorphic sites, repeat elements, and genes across 249 TADs and higher level genome organization have yet to be explored in detail and warrants 250 further investigation. 251 TAD boundaries offer a unique computational opportunity to use biologically informed 252 windows to predefine areas of the genome that are more likely to interact with themselves. We 253 showed, as a proof of concept, that TADs can reveal functional GWAS variant to gene 254 relationships using BMD. Several of the TAD_Pathways implicated genes, including LRP5 and 255 other Wnt signaling genes, are bona fide BMD genes already identified by nearest gene GWAS, 256 eQTL analyses and human clinical syndromes [44,45], thus providing positive controls for our 257 approach. However, several BMD GWAS signals do not have obvious nearest gene associations, 258 which allowed us to validate our approach with two candidate causal, non-nearest gene 259 predictions: ACP2 and DEAF1. Both genes also did not have eQTL associations, but this is 260 likely a result of the eQTL browser lacking bone tissue. 261 To assess the validity of our predictions, we experimentally knocked down ACP2 and 262 DEAF1 in hFOB cells. siRNA for ACP2 and DEAF1 did not significantly alter expression of the 263 osteoblast marker genes OSX, IBSP or TNAP. OCN was downregulated in each of the 264 experimental groups relative to the scrambled siRNA control, but comparison with the reagent 265 control indicated no significant difference in any group, suggesting an off-target effect on OCN 266 in the scrambled siRNA group. Because osteoblast differentiation genes were not downregulated 12 bioRxiv preprint first posted online Nov. 15, 2016; doi: http://dx.doi.org/10.1101/087718. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. 267 following knockdown of the genes of interest, we concluded that these genes do not directly 268 regulate the transcriptional processes of osteoblast differentiation in vitro. The decrease in 269 DEAF1 expression following TNAP siRNA treatment, though not statistically significant, 270 suggests that DEAF1 may function downstream of TNAP. 271 There was a pronounced and statistically significant reduction in metabolic activity in hFOB 272 cells treated with ACP2 siRNA. This result carried through to the ALP assay, in which staining 273 intensity and ALP+ area fraction were dramatically reduced in only the TNAP siRNA and ACP2 274 siRNA groups. The combination of these results with the gene expression data suggests that 275 ACP2 regulates early osteoblast proliferation/viability, but does not directly regulate osteoblast 276 differentiation. 277 We provide evidence that our approach can steer researchers from GWAS signals toward 278 genes relevant to the pathogenesis of the given trait. Furthermore, because our method treats all 279 genes in implicated TADs equally, functional classification extends to the identification of single 280 variant pleiotropic events; as was the case with an intronic FTO variant impacting both IRX3 and 281 IRX5 [8]. 282 Despite the advantages presented by TAD_Pathways, the method has a number of 283 limitations. Currently, our method will not overcome the possibility of a gene being 284 inappropriately included in a pathway that it does not actually contribute to, plus all other 285 propagated errors related to pathway curation and analyses [46]. Network based methods built on 286 gene-gene interaction data also suffer from similar biases [47], but potentially to a lesser extent 287 than curated pathways. We include both curated and computationally predicted GO annotations 288 to ameliorate this bias. The computational predictions provide additional support that these genes 289 may be important disease associated genes that we would have missed using only experimentally 13 bioRxiv preprint first posted online Nov. 15, 2016; doi: http://dx.doi.org/10.1101/087718. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. 290 validated pathway genes. We are also unable to implicate a gene to a trait if it is not assigned to a 291 curated or predicted pathway, or if it does not fall within a TAD corresponding to a GWAS 292 signal. It is also likely that our approach will not work well with every GWAS. Indeed we are 293 implicating causality to given genes - we are not making a direct connection between the gene 294 and the given variant. Furthermore, our method does not include the possibility of finding genes 295 associated with a disease that is impacted by alternative looping, which has been observed to 296 occur in cancer [48,49] and sickle cell anemia [50]. As research on 3D genome organization 297 increases, it is likely that more diseases will include chromosome looping deficiencies as part of 298 their etiology. Additionally, we used TAD boundaries defined by Dixon et al. 2012. A more 299 recent Hi-C analysis at increased resolution substantially reduced the estimated average size of 300 TADs [51]. Nevertheless, there remains disagreement about how TADs are defined [52]. Despite 301 our method using larger TAD boundaries, thus promoting the inclusion of more presumably false 302 positive genes, we retain the ability to identify biologically logical pathways. The larger 303 boundaries permit us to screen a larger number of candidate genes but makes the method 304 analytically conservative by increasing the pathway overrepresentation signal required to surpass 305 the adjusted significance threshold. 306 The validation screen is also limited: it was performed in a simplified in vitro cell culture 307 system lacking organismal complexity, and the cell line selected is largely tetraploid which may 308 partially compensate for gene knockdown. This is particularly true for the lack of reduction in 309 TNAP gene expression in the TNAP siRNA group, in light of the historical selection of the hFOB 310 cell line based on robust ALP staining in culture [53]. As well, while the TAD approach 311 identifies potentially several GWAS associated genes, herein we only examined two genes per 312 TAD – one immediately adjacent to the GWAS SNP and another that we postulated could play a 14 bioRxiv preprint first posted online Nov. 15, 2016; doi: http://dx.doi.org/10.1101/087718. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. 313 role in the skeleton. Further work would need to systematically examine the relative importance 314 of each gene in a TAD. 315 Other recent mechanisms and algorithms used to assign causality from association signals or 316 enhancers to genes typically leverage multiple data types including expression [54] or epigenetic 317 features [24]. For example, TargetFinder uses several high throughput genomic marks to identify 318 features predictive of a chromosome physically looping together enhancers and promoters [24]. 319 Looping occurs at sub-TAD level resolutions [55] and sub-TADs are variable across cell types. 320 Therefore, in order for a chromosome looping signature to generalize to GWAS signals, a user 321 must assay tissue-specific and high resolution Hi-C to identify more specific interactions. 322 Alternatively, one could also query variants that affect gene expression in high-throughput and 323 systematically match signals to gene expression [56]. A major limitation to these approaches is 324 that several diseases do not yet have a known tissue source or involve multiple tissues. This is 325 particularly true for osteoporosis whereby multiple cell types, as well as systemic factors, 326 influence bone mass [57]. Therefore, identifying these specific signals may require the 327 procedures to be repeated across competing tissue types. In contrast, a TAD_Pathways analysis 328 is computationally cheap and uses publicly available TAD boundaries and GO terms as a guide 329 for assigning genes to GWAS signals. The method is effective in a wide variety of settings and 330 across tissues because TADs are consistent across cell types [25]. In summary, TAD_Pathways 331 can be used to guide researchers toward the most likely causal gene implicated by a GWAS 332 signal. We have also identified ACP2 as a gene involved in BMD determination, which warrants 333 further investigation. 334 335 Conclusions: 15 bioRxiv preprint first posted online Nov. 15, 2016; doi: http://dx.doi.org/10.1101/087718. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. 336 TADs offer a novel tool in the investigation of genome function. We present an approach, 337 called TAD_Pathways, to leverage 3D genomics to prioritize and predict causal genes implicated 338 by GWAS signal. At the foundation of our method is the principle that genomic regions within 339 the same TAD more often interact with each other, and therefore, provide the genomic 340 scaffolding that can impact gene function and gene regulation within each TAD. We applied our 341 method to established BMD GWAS signals. By selecting two GWAS signals and two classes of 342 genes for each signal (nearest gene and predicted gene by TAD_Pathways), we demonstrated 343 that our approach can causally implicate genes kilobases away from their associated GWAS 344 signal. We validated ACP2 (TAD prediction), but not ARHGAP1 (nearest gene), and show that 345 ACP2 influences the proliferation and differentiation of osteoblast cells. We were unable to 346 validate either DEAF1 or BET1L and conclude that neither impacts osteoblast gene expression 347 nor metabolic activity. Whether these genes influence other aspects of skeletal biology cannot be 348 determined within the scope of the current study. Future studies focused on BMD GWAS would 349 explore both osteoblast and osteoclast associated changes. In conclusion, as more information 350 and data is collected regarding 3D genome principles, we propose that algorithms that leverage 351 dynamic 3D structure rather than static linear organization will more accurately predict and 352 discover the basic genomic biology of diseases. 353 354 Methods: 355 Data Integration 356 We used previously identified TAD boundaries for hESC, IMR90, mESC, and mcortex cells 357 for all TAD based analyses [22,58]. To describe the genomic content of TADs, we extracted 358 common SNPs (major allele frequency ≤ 0.95) from the 1000 Genomes Phase III data (2 May 16 bioRxiv preprint first posted online Nov. 15, 2016; doi: http://dx.doi.org/10.1101/087718. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. 359 2013 release) [59] and downloaded hg19 Gencode genes [60] and hg19 RepeatMasker repeat 360 elements [61]. We downloaded hg19 FASTA files for all chromosomes as provided by the 361 Genome Reference Consortium [62]. Furthermore, we downloaded the NHGRI-EBI GWAS 362 catalog on 25 February 2016, which holds the significant findings of several GWAS’ for over 363 300 traits [1]. Since the GWAS catalog reports hg38 coordinates, we used the hg38 to hg19 364 UCSC chain file [63] and PyLiftover [64] to convert genome build coordinates to hg19. We 365 assessed relevant expression quantitative trait loci (eQTLs) using all tissues in the NCBI GTEx 366 eQTL Browser [65]. 367 368 TAD_Pathways 369 Our TAD_Pathways method is a light-weight approach that uses TAD boundary regions, 370 rather than distance explicitly, to identify putative causal genes. We first build a comprehensive 371 TAD based gene list that consists of all genes that fall inside TADs that are implicated with a 372 GWAS signal (see Figure 2). This gene list assumes that all genes within each signal TAD have 373 an equal likelihood of functional impact on the trait or disease of interest. We then input the 374 TAD based gene list into a WebGestalt overrepresentation analysis [66]. WebGestalt is a 375 webapp that facilitates a pathway analysis interface allowing for quick and custom gene set 376 based analyses. We perform a pathway overrepresentation test for the input TAD based genes 377 against GO biological process, molecular function, and cellular component terms with a 378 background of the human genome. Specifically, this tests if the input gene set is associated with 379 any particular GO term at a higher probability than by chance compared to background genes. 380 We include both experimentally validated and computationally inferred genes in each GO term, 381 which allows the method to discover associations for genes that lack literature support. We 17 bioRxiv preprint first posted online Nov. 15, 2016; doi: http://dx.doi.org/10.1101/087718. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. 382 consider genes that are annotated to the most significantly enriched GO term to be the associated 383 set [65]. 384 385 386 Cell culture and siRNA transfection A human fetal osteoblast cell line (ATCC hFOB 1.19 CRL-11372) was obtained and 387 subcultured twice at 34°C, 5% CO2 and 95% relative humidity in 1:1 DMEM/F12 with 2.5mM 388 L-glutamine without phenol red (Gibco 21041025) supplemented with 10% FBS (Atlas USDA 389 F0500D) and 0.3mg/mL G418 sulfate (Gibco 10131035). All experiments were conducted in 390 three temporally separated independent technical replicates from cryopreserved P2 aliquots of 391 these cells. 48 hours prior to transfection, media was switched to a G418-free formulation. 392 Transfections were conducted in single-cell suspension using a commercial siRNA reagent 393 system (Santa Cruz sc-45064) according to the manufacturer’s instructions, with 6μL of siRNA 394 duplex and 4μL transfection reagent in 750μL of transfection media per 100,000 cells. Following 395 trypsinization, cells were counted and divided into one of 8 experimental groups: 1) untreated 396 control, 2) siRNA-negative transfection reagent control, 3) scrambled control siRNA (sc-37007), 397 4) TNAP siRNA (sc-38921), 5) ARHGAP1 siRNA (sc-96477), 6) ACP2 siRNA (sc-96327), 7) 398 BET1L siRNA (sc-97007) or 8) Suppressin (DEAF1) siRNA (sc-76613). Samples for RNA 399 isolation were generated by plating 200,000 cells per well in tissue culture treated 6-well plates 400 (Falcon 353046). Samples for MTT assay and alkaline phosphatase (ALP) staining were 401 generated by plating 50,000 cells per well in tissue culture treated 24-well plates (Falcon 402 353047). Cells were then switched to a 37°C incubator and transfected for 6 hours, at which 403 point transfection cocktails were diluted with 2x hFOB media concentrate. Media was 404 completely changed to 1x standard hFOB media 16 hours later. 18 bioRxiv preprint first posted online Nov. 15, 2016; doi: http://dx.doi.org/10.1101/087718. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. 405 406 407 Quantitative PCR RNA was harvested at Day 4 using TRIzol (Ambion) followed by acid guanidinium 408 thiocyanate-phenol-chloroform extraction and RNeasy (Qiagen) spin-column purification with 409 DNase (Qiagen) then reverse-transcribed using a high-capacity RNA-to-cDNA kit (Applied 410 Biosystems). Duplicate qPCR reactions were conducted on 20ng of whole-RNA template using 411 SYBR Select master mix (Applied Biosystems) in an Applied Biosystems 7500 Fast Real-Time 412 PCR System. Primers for GAPDH, OSX, OCN and IBSP were adopted from the literature and 413 synthesized by R&D Systems. New primer sets spanning exon-exon junctions were designed for 414 ARHGAP1, ACP2, BET1L and DEAF1 using NCBI Primer Blast, verified by melt curve analysis 415 and agarose gel electrophoresis. Sequences follow: ARHGAP1 F- 416 GCGGAAATGGTTGGGGATAG R-CCTTAAGAGAAACCGCGCTC (127bp), ACP2 F- 417 AGCGGGTTCCAGCTTGTTT R-TGGCGGTACAGCAAGGTAAC (165bp), BET1L F- 418 GGATGGCATGGACTCGGATT R-TCCTCTGGAGCCCAAAACAC (254bp), DEAF1 F- 419 GGAAGGAGCAGTCCTGCGTT R-TCACCTTCTCCATCACGCTTT (195bp). Results were 420 analyzed using the 2-ddCt method using GAPDH as a housekeeping gene and reported as (mean ± 421 standard deviation) fold-change versus untreated controls. Statistical significance was 422 determined using 2-way homoscedastic Student’s t-tests versus the scrambled siRNA control, 423 annotated using *p≤0.05 and #p≤0.10. The acronym N.S. stands for “not significant”. 424 425 426 427 MTT metabolic assay Cellular metabolism/proliferation was assessed using a commercial cell growth determination kit (Sigma CGD1). At 1 or 4 days post-transfection, media in assigned 24-well 19 bioRxiv preprint first posted online Nov. 15, 2016; doi: http://dx.doi.org/10.1101/087718. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. 428 plates was switched to 450μL hFOB media plus 50μL MTT solution. Cells were incubated at 429 37°C for 3.5 hours, after which the media was aspirated and the resulting formazan crystals were 430 solubilized in MTT solvent. Plates were shaken and read in a BioTek Synergy H1 microplate 431 reader. Results are reported as the difference in mean absorbance at 570nm-690nm from Day 1 432 to Day 4. Error bars represent root-mean-square standard deviation from the measurements at 433 both days. Statistical significance was determined using 2-way homoscedastic Student’s t-tests 434 versus the scrambled siRNA control, annotated using *p≤0.05 and #p≤0.10. 435 436 437 Alkaline phosphatase (ALP) staining Plates were stained at 4 days post-transfection using a commercial ALP kit (Sigma 86C). 438 Dried plates were then imaged at 600dpi using an Epson V370 photo scanner and staining was 439 quantified using mean whole-well intensity measurements in ImageJ using raw output files. ALP 440 area fraction was calculated using color thresholding. Values are reported as mean ± standard 441 deviation. Statistical significance was determined using 2-way homoscedastic Student’s t-tests 442 versus the scrambled siRNA control, annotated using *p≤0.05 and #p≤0.10. Images of plates 443 presented as figures were edited using a warming filter and for brightness/contrast using Adobe 444 Photoshop CS6. 445 446 447 Computational Reproducibility We provide all of our source code under a permissive open source license and encourage 448 others to modify and build upon our work [67]. Additionally, we provide an accompanying 449 docker image [68] to replicate our computational environment 450 (https://hub.docker.com/r/gregway/tad_pathways/) [69]. 20 bioRxiv preprint first posted online Nov. 15, 2016; doi: http://dx.doi.org/10.1101/087718. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. 451 452 Declarations: 453 Ethics approval and consent to participate 454 Not applicable. 455 Consent for publication 456 Not applicable. 457 Availability of data and material 458 All data used to construct the TAD_Pathways approach are publically available datasets. We 459 make all software used to develop this approach publically available in a GitHub repository 460 (http://github.com/greenelab/tad_pathways). We also provide a docker image 461 (https://hub.docker.com/r/gregway/tad_pathways/) and archive the GitHub software on 462 Zenodo (https://zenodo.org/record/163950). 463 464 465 Competing interests The authors declare no competing interests. Funding 466 This work was supported by the Genomics and Computational Biology Graduate program at 467 The University of Pennsylvania (to G.P.W.); the Gordon and Betty Moore Foundation’s Data 468 Driven Discovery Initiative (grant number GBMF 4552 to C.S.G); the National Institute of 469 Dental & Craniofacial Research (grant number NIH F32DE026346 to D.W.Y.); S.F.A.G is 470 supported by the Daniel B. Burke Endowed Chair for Diabetes Research. 471 Authors' contributions 472 GPW wrote the software, analyzed the data, developed the method and wrote the manuscript; 473 DWY performed the experimental validation and wrote the manuscript; KDH performed the 21 bioRxiv preprint first posted online Nov. 15, 2016; doi: http://dx.doi.org/10.1101/087718. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. 474 experimental validation and wrote the manuscript; CSG analyzed the data, developed the 475 method and wrote the manuscript; SFAG analyzed the data, developed the method and wrote 476 the manuscript. 477 Acknowledgements 478 Hannah E. Sexton and Troy L. Mitchell assisted in optimizing siRNA transfection 479 conditions. Daniel Himmelstein and Amy Campbell performed analytical code review. 480 Authors' information (optional) 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 22 bioRxiv preprint first posted online Nov. 15, 2016; doi: http://dx.doi.org/10.1101/087718. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. 497 Supplementary Figures: 498 499 500 501 502 503 Supplementary Figure S1: Distributions of genes across topologically associating domains of four cell types. Bars on the top and right side of the plots represent histograms of each respective metric. We independently discretized TADs into 50 bins where the distribution of elements is linear from 5’ to 3’ (bin 0 is the 5’ most end of all TADs). 504 505 506 507 508 509 Supplementary Figure S2: Distributions of all repeat elements across topologically associating domains of four cell types. Bars on the top and right side of the plots represent histograms of each respective metric. We independently discretized TADs into 50 bins where the distribution of elements is linear from 5’ to 3’ (bin 0 is the 5’ most end of all TADs). 23 bioRxiv preprint first posted online Nov. 15, 2016; doi: http://dx.doi.org/10.1101/087718. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. 510 511 512 513 514 515 516 517 518 519 520 521 522 Supplementary Figure S3: Distributions of SINE/Alu elements and potential confounding factors across topologically associating domains of four cell types. We independently discretized TADs into 50 bins where the distribution of elements is linear from 5’ to 3’ (bin 0 is the 5’ most end of all TADs). The distribution of Alu elements is strongly clustered toward TAD boundaries but neither the percentage of GC content across TADs nor the evolutionary divergence of Alu elements can explain the Alu distribution. Supplementary Figure S4: Bone Mineral Density gene counts associated with various classes of evidence. The gene LRP5 is implicated by all evidence codes. About 1/3 of all TAD_Pathways implicated genes are the nearest gene implicated by the GWAS signal. 24 bioRxiv preprint first posted online Nov. 15, 2016; doi: http://dx.doi.org/10.1101/087718. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. 523 Tables: 524 Supplementary Table S1: Chi square testing the enrichment of genes at TAD boundaries and 525 enrichment of SNPs towards the right half of TADs. 526 Enrichment of SNPs in the Right of TADs Left Right Chi-square P hESC 3268781 3296450 116.6 3.5E-27 IMR90 3211140 3210251 0.12 0.73 mESC* 25161934 25129279 21.2 4.1E-06 mcortex 24616532 24641931 13.1 3.0E-4 * Note- mouse embryonic stem cells have an opposite enrichment. There are more SNPs in the 527 left of TADs 528 529 We provide Supplementary Tables S2 and S3 as attached .xls files. 530 531 Abbreviations: 532 TAD – Topologically associating domain 533 GWAS – Genome wide association study 534 SNP – Single nucleotide polymorphism 535 BMD – Bone mineral density 536 hESC – Human embryonic stem cells 537 mESC – Mouse embryonic stem cells 538 IMR90 – Human fibroblast cells 539 mcortex – Mouse cortex cells 540 siRNA – Small interfering RNA 541 eQTL – Expression quantitative trail loci 542 hFOB – Human fetal osteoblast 25 bioRxiv preprint first posted online Nov. 15, 2016; doi: http://dx.doi.org/10.1101/087718. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. 543 TNAP - tissue-nonspecific alkaline phosphatase 544 OCN – osteocalcin 545 IBSP – bone sialoprotein 546 ALP – Alkaline phosphatase 547 GO – Gene ontology 548 eQTL – Expression quantitative trait loci 549 550 References: 551 552 1. Welter D, MacArthur J, Morales J, Burdett T, Hall P, Junkins H, et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 2014;42:D1001–6. 553 554 2. Zhang F, Lupski JR. Non-coding genetic variants in human disease. Hum. Mol. Genet. 2015;24:R102–10. 555 556 3. Ward LD, Kellis M. Interpreting noncoding genetic variation in complex traits and human disease. Nat. Biotechnol. 2012;30:1095–106. 557 558 559 4. McVean GA, Altshuler (Co-Chair) DM, Durbin (Co-Chair) RM, Abecasis GR, Bentley DR, Chakravarti A, et al. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491:56–65. 560 561 5. Brodie A, Azaria JR, Ofran Y. How far from the SNP may the causative genes be? Nucleic Acids Res. 2016;gkw500. 562 563 6. Tung YCL, Yeo GSH, O’Rahilly S, Coll AP. Obesity and FTO: Changing Focus at a Complex Locus. Cell Metab. 2014;20:710–8. 564 565 566 7. Ragvin A, Moro E, Fredman D, Navratilova P, Drivenes O, Engstrom PG, et al. Long-range gene regulation links genomic type 2 diabetes and obesity risk regions to HHEX, SOX4, and IRX3. Proc. Natl. Acad. Sci. 2010;107:775–80. 567 568 8. Claussnitzer M, Dankel SN, Kim K-H, Quon G, Meuleman W, Haugen C, et al. FTO Obesity Variant Circuitry and Adipocyte Browning in Humans. N. Engl. J. Med. 2015;373:895–907. 569 570 571 9. Smemo S, Tena JJ, Kim K-H, Gamazon ER, Sakabe NJ, Gómez-Marín C, et al. Obesityassociated variants within FTO form long-range functional connections with IRX3. Nature. 2014;507:371–5. 26 bioRxiv preprint first posted online Nov. 15, 2016; doi: http://dx.doi.org/10.1101/087718. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. 572 573 574 10. Grant SFA, Thorleifsson G, Reynisdottir I, Benediktsson R, Manolescu A, Sainz J, et al. Variant of transcription factor 7-like 2 (TCF7L2) gene confers risk of type 2 diabetes. Nat. Genet. 2006;38:320–3. 575 576 577 11. Xia Q, Chesi A, Manduchi E, Johnston BT, Lu S, Leonard ME, et al. The type 2 diabetes presumed causal variant within TCF7L2 resides in an element that controls the expression of ACSL5. Diabetologia. 2016;59:2360–8. 578 579 12. Herman MA, Rosen ED. Making Biological Sense of GWAS Data: Lessons from the FTO Locus. Cell Metab. 2015;22:538–9. 580 581 13. Dekker J, Rippe K, Dekker M, Kleckner N. Capturing chromosome conformation. Science. 2002;295:1306–11. 582 14. Dekker J. Gene Regulation in the Third Dimension. Science. 2008;319:1793–4. 583 584 585 15. Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, et al. Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome. Science. 2009;326:289–93. 586 587 16. Duan Z, Andronescu M, Schutz K, McIlwain S, Kim YJ, Lee C, et al. A three-dimensional model of the yeast genome. Nature. 2010;465:363–7. 588 589 590 17. Phillips-Cremins JE, Sauria MEG, Sanyal A, Gerasimova TI, Lajoie BR, Bell JSK, et al. Architectural Protein Subclasses Shape 3D Organization of Genomes during Lineage Commitment. Cell. 2013;153:1281–95. 591 592 593 18. Sexton T, Yaffe E, Kenigsberg E, Bantignies F, Leblanc B, Hoichman M, et al. ThreeDimensional Folding and Functional Organization Principles of the Drosophila Genome. Cell. 2012;148:458–72. 594 595 19. Symmons O, Uslu VV, Tsujimura T, Ruf S, Nassari S, Schwarzer W, et al. Functional and topological characteristics of mammalian regulatory domains. Genome Res. 2014;24:390–400. 596 597 20. Pope BD, Ryba T, Dileep V, Yue F, Wu W, Denas O, et al. Topologically associating domains are stable units of replication-timing regulation. Nature. 2014;515:402–5. 598 599 600 21. Gurudatta B, Yang J, Van Bortle K, Donlin-Asp P, Corces V. Dynamic changes in the genomic localization of DNA replication-related element binding factor during the cell cycle. Cell Cycle. 2013;12:1605–15. 601 602 22. Dixon JR, Selvaraj S, Yue F, Kim A, Li Y, Shen Y, et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature. 2012;485:376–80. 603 604 605 23. Nora EP, Dekker J, Heard E. Segmental folding of chromosomes: A basis for structural and regulatory chromosomal neighborhoods?: Prospects & Overviews. BioEssays. 2013;35:818–28. 27 bioRxiv preprint first posted online Nov. 15, 2016; doi: http://dx.doi.org/10.1101/087718. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. 606 607 24. Whalen S, Truty RM, Pollard KS. Enhancer–promoter interactions are encoded by complex genomic signatures on looping chromatin. Nat. Genet. 2016;48:488–96. 608 609 610 25. Smith EM, Lajoie BR, Jain G, Dekker J. Invariant TAD Boundaries Constrain Cell-TypeSpecific Looping Interactions between Promoters and Distal Elements around the CFTR Locus. Am. J. Hum. Genet. 2016;98:185–201. 611 612 613 26. Richards JB, Rivadeneira F, Inouye M, Pastinen TM, Soranzo N, Wilson SG, et al. Bone mineral density, osteoporosis, and osteoporotic fractures: a genome-wide association study. Lancet Lond. Engl. 2008;371:1505–12. 614 615 616 27. Rivadeneira F, Styrkársdottir U, Estrada K, Halldórsson BV, Hsu Y-H, Richards JB, et al. Twenty bone-mineral-density loci identified by large-scale meta-analysis of genome-wide association studies. Nat. Genet. 2009;41:1199–206. 617 618 619 28. Estrada K, Styrkarsdottir U, Evangelou E, Hsu Y-H, Duncan EL, Ntzani EE, et al. Genomewide meta-analysis identifies 56 bone mineral density loci and reveals 14 loci associated with risk of fracture. Nat. Genet. 2012;44:491–501. 620 621 622 29. Styrkarsdottir U, Thorleifsson G, Sulem P, Gudbjartsson DF, Sigurdsson A, Jonasdottir A, et al. Nonsense mutation in the LGR4 gene is associated with several human diseases and other traits. Nature. 2013;497:517–20. 623 624 30. Hendrickx G, Boudin E, Van Hul W. A look behind the scenes: the risk and pathogenesis of primary osteoporosis. Nat. Rev. Rheumatol. 2015;11:462–74. 625 626 31. Choo SY. The HLA System: Genetics, Immunology, Clinical Testing, and Clinical Implications. Yonsei Med. J. 2007;48:11. 627 628 629 32. Håvik B, Le Hellard S, Rietschel M, Lybæk H, Djurovic S, Mattheisen M, et al. The Complement Control-Related Genes CSMD1 and CSMD2 Associate to Schizophrenia. Biol. Psychiatry. 2011;70:35–42. 630 631 33. Kwon E, Wang W, Tsai L-H. Validation of schizophrenia-associated genes CSMD1, C10orf26, CACNA1C and TCF4 as miR-137 targets. Mol. Psychiatry. 2013;18:11–2. 632 633 34. Wang K, Li M, Bucan M. Pathway-Based Approaches for Analysis of Genomewide Association Studies. Am. J. Hum. Genet. 2007;81:1278–83. 634 635 35. Hao K, Di X, Cawley S. LdCompare: rapid computation of single- and multiple-marker r2 and genetic coverage. Bioinformatics. 2007;23:252–4. 636 637 638 36. Taşan M, Musso G, Hao T, Vidal M, MacRae CA, Roth FP. Selecting causal genes from genome-wide association studies via functionally coherent subnetworks. Nat. Methods. 2014;12:154–9. 639 640 37. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. Gene Ontology: tool for the unification of biology. Nat. Genet. 2000;25:25–9. 28 bioRxiv preprint first posted online Nov. 15, 2016; doi: http://dx.doi.org/10.1101/087718. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. 641 642 643 38. Petersen A, Alvarez C, DeClaire S, Tintle NL. Assessing Methods for Assigning SNPs to Genes in Gene-Based Tests of Association Using Common Variants. Chen L, editor. PLoS ONE. 2013;8:e62161. 644 645 646 39. Mu XJ, Lu ZJ, Kong Y, Lam HYK, Gerstein MB. Analysis of genomic variation in noncoding elements using population-scale sequencing data from the 1000 Genomes Project. Nucleic Acids Res. 2011;39:7058–76. 647 648 40. Jurka J, Kohany O, Pavlicek A, Kapitonov VV, Jurka MV. Duplication, coclustering, and selection of human Alu retrotransposons. Proc. Natl. Acad. Sci. 2004;101:1268–72. 649 41. Deininger P. Alu elements: know the SINEs. Genome Biol. 2011;12:236. 650 651 652 42. Wang J, Vicente-García C, Seruggia D, Moltó E, Fernandez-Miñán A, Neto A, et al. MIR retrotransposon sequences provide insulators to the human genome. Proc. Natl. Acad. Sci. 2015;112:E4428–37. 653 654 655 43. Gu Z, Jin K, Crabbe MJC, Zhang Y, Liu X, Huang Y, et al. Enrichment analysis of Alu elements with different spatial chromatin proximity in the human genome. Protein Cell. 2016;7:250–66. 656 657 44. Baron R, Kneissel M. WNT signaling in bone homeostasis and disease: from human mutations to treatments. Nat. Med. 2013;19:179–92. 658 659 45. Krishnan V, Bryant HU, Macdougald OA. Regulation of bone mass by Wnt signaling. J. Clin. Invest. 2006;116:1202–9. 660 661 46. Khatri P, Sirota M, Butte AJ. Ten Years of Pathway Analysis: Current Approaches and Outstanding Challenges. Ouzounis CA, editor. PLoS Comput. Biol. 2012;8:e1002375. 662 663 664 47. Reguly T, Breitkreutz A, Boucher L, Breitkreutz B-J, Hon GC, Myers CL, et al. Comprehensive curation and analysis of global interaction networks in Saccharomyces cerevisiae. J. Biol. 2006;5:11. 665 666 48. Corces MR, Corces VG. The three-dimensional cancer genome. Curr. Opin. Genet. Dev. 2016;36:1–7. 667 668 669 49. Flavahan WA, Drier Y, Liau BB, Gillespie SM, Venteicher AS, Stemmer-Rachamimov AO, et al. Insulator dysfunction and oncogene activation in IDH mutant gliomas. Nature. 2015;529:110–4. 670 671 672 50. Deng W, Lee J, Wang H, Miller J, Reik A, Gregory PD, et al. Controlling long-range genomic interactions at a native locus by targeted tethering of a looping factor. Cell. 2012;149:1233–44. 673 674 675 51. Rao SSP, Huntley MH, Durand NC, Stamenova EK, Bochkov ID, Robinson JT, et al. A 3D Map of the Human Genome at Kilobase Resolution Reveals Principles of Chromatin Looping. Cell. 2014;159:1665–80. 29 bioRxiv preprint first posted online Nov. 15, 2016; doi: http://dx.doi.org/10.1101/087718. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. 676 677 52. Bonev B, Cavalli G. Organization and function of the 3D genome. Nat. Rev. Genet. 2016;17:661–78. 678 679 680 53. Harris SA, Enger RJ, Riggs BL, Spelsberg TC. Development and characterization of a conditionally immortalized human fetal osteoblastic cell line. J. Bone Miner. Res. Off. J. Am. Soc. Bone Miner. Res. 1995;10:178–86. 681 682 683 54. Zhu Z, Zhang F, Hu H, Bakshi A, Robinson MR, Powell JE, et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat. Genet. 2016;48:481–7. 684 685 686 55. Dowen JM, Fan ZP, Hnisz D, Ren G, Abraham BJ, Zhang LN, et al. Control of Cell Identity Genes Occurs in Insulated Neighborhoods in Mammalian Chromosomes. Cell. 2014;159:374– 87. 687 688 689 56. Tewhey R, Kotliar D, Park DS, Liu B, Winnicki S, Reilly SK, et al. Direct Identification of Hundreds of Expression-Modulating Variants using a Multiplexed Reporter Assay. Cell. 2016;165:1519–29. 690 691 57. Lupsa BC, Insogna K. Bone Health and Osteoporosis. Endocrinol. Metab. Clin. North Am. 2015;44:517–30. 692 693 58. Ho JWK, Jung YL, Liu T, Alver BH, Lee S, Ikegami K, et al. Comparative analysis of metazoan chromatin organization. Nature. 2014;512:449–52. 694 695 59. 1000 Genomes Project Consortium, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, et al. A global reference for human genetic variation. Nature. 2015;526:68–74. 696 697 698 60. Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, et al. GENCODE: The reference human genome annotation for The ENCODE Project. Genome Res. 2012;22:1760–74. 699 700 61. Smit A, Hubley R, Green P. RepeatMasker Open-4.0 [Internet]. Available from: http://www.repeatmasker.org 701 702 62. Church DM, Schneider VA, Graves T, Auger K, Cunningham F, Bouk N, et al. Modernizing Reference Genome Assemblies. PLoS Biol. 2011;9:e1001091. 703 704 63. Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, et al. The Human Genome Browser at UCSC. Genome Res. 2002;12:996–1006. 705 706 64. Tretyakov K. pyliftover [Internet]. 2014. Available from: https://github.com/konstantint/pyliftover 707 708 65. GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 2013;45:580–5. 30 bioRxiv preprint first posted online Nov. 15, 2016; doi: http://dx.doi.org/10.1101/087718. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. 709 710 66. Wang J, Duncan D, Shi Z, Zhang B. WEB-based GEne SeT AnaLysis Toolkit (WebGestalt): update 2013. Nucleic Acids Res. 2013;41:W77–83. 711 712 67. Gregory Way, Casey Green. Greenelab/Tad_Pathways: Pre-Release. 2016 [cited 2016 Nov 14]; Available from: https://doi.org/10.5281/zenodo.163950 713 714 68. Boettiger C. An introduction to Docker for reproducible research. ACM SIGOPS Oper. Syst. Rev. 2015;49:71–9. 715 716 69. Gregory Way, Struan Grant, Casey Greene. TAD Pathways Archived Docker Image. 2016 [cited 2016 Nov 14]; Available from: https://doi.org/10.5281/zenodo.166556 717 31
© Copyright 2026 Paperzz