Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine 904

Genome-Wide Studies of Transcriptional Regulation in Human Liver Cells by High-throughput Sequencing

MADHUSUDHAN REDDY BYSANI

ACTA UNIVERSITATIS UPSALIENSIS
UPPSALA 2013
ISSN 1651-6206
ISBN 978-91-554-8671-6
urn:nbn:se:uu:diva-198579

Dissertation presented at Uppsala University to be publicly examined in Rudbeck hall, The Rudbeck Laboratory, Dag Hammarskjölds väg 20, Uppsala, Monday, June 10, 2013 at 09:15 for the degree of Doctor of Philosophy (Faculty of Medicine). The examination will be conducted in English.

Abstract

Bysani, M. S. R. 2013. Genome-Wide Studies of Transcriptional Regulation in Human Liver Cells by High-throughput Sequencing. Acta Universitatis Upsaliensis. Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine 904. 50 pp. Uppsala. ISBN 978-91-554-8671-6.

The human genome contains slightly more than 20 000 genes that are expressed in a tissue specific manner. Transcription factors play a key role in gene regulation. By mapping the transcription factor binding sites genome-wide we can understand their role in different biological processes. In this thesis we have mapped transcription factors and histone marks along with nucleosome positions and RNA levels. In papers I and II, we used ChIP-seq to map five liver specific transcription factors that are crucial for liver development and function. We showed that the mapped transcription factors are involved in metabolism and other cellular processes. We showed that ChIP-seq can also be used to detect protein-protein interactions and functional SNPs. Finally, we showed that the epigenetic histone mark studied in paper I is associated with transcriptional activity at promoters. In paper III, we mapped nucleosome positions before and after treatment with transforming growth factor β (TGFβ) and found that many nucleosomes changed positions when expression changed.
After treatment with TGFβ, the transcription factor HNF4α was replaced by a nucleosome in some regions. In paper IV, we mapped the USF1 transcription factor and three active chromatin marks in normal liver tissue and in liver tissue of patients diagnosed with alcoholic steatohepatitis. Using gene ontology, we identified, as expected, many metabolism related genes as active in normal samples, whereas genes in cancer pathways were active in steatohepatitis tissue. Cancer is a common complication of the disease and early signs of this were found. We also found many novel and GWAS catalogue SNPs that are candidates to be functional. In conclusion, our results have provided information on the location and structure of regulatory elements, which will lead to better knowledge of liver function and disease.

Keywords: ChIP-seq, Transcription factors, Alcoholic steatohepatitis, Genome-wide, GWAS, SNPs

Madhusudhan Reddy Bysani, Uppsala University, Department of Immunology, Genetics and Pathology, Rudbecklaboratoriet, SE-751 85 Uppsala, Sweden.

© Madhusudhan Reddy Bysani 2013
ISSN 1651-6206
ISBN 978-91-554-8671-6
urn:nbn:se:uu:diva-198579 (http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-198579)

To my parents and well wishers

List of Papers

This thesis is based on the following papers, which are referred to in the text by their Roman numerals.

I Motallebipour M*, Ameur A*, Bysani MSR, Patra K, Wallerman O, Mangion J, Barker MA, McKernan KJ, Komorowski J, Wadelius C. Differential binding and co-binding pattern of FOXA1 and FOXA3 and their relation to H3K4me3 in HepG2 cells revealed by ChIP-seq. Genome Biology 2009, 10:R129 (17 November 2009).

II Wallerman O*, Motallebipour M*, Enroth S, Patra K, Bysani MS, Komorowski J, Wadelius C. Molecular interactions between HNF4a, FOXA2 and GABP identified at regulatory DNA elements through ChIP-sequencing. (2009) Nucleic Acids Research, 37(22):7498-508.

III Enroth S*, Andersson R*, Bysani MSR, Wallerman O, Tuch B B, De la Vega F, Heldin C-H, Moustakas A, Komorowski J, Wadelius C. Nucleosome regulatory dynamics in response to TGF-beta treatment in HepG2 cells (Manuscript).

IV Bysani MSR, Wallerman O, Bornelöv S, Zatloukal K, Komorowski J, Wadelius C. ChIP-seq in steatohepatitis and normal liver tissue identifies candidate disease mechanisms related to progression to cancer (Manuscript).

*These authors contributed equally to this work.

Reprints were made with permission from the respective publishers.

Contents

Introduction
  Genetic variation
  The ENCODE Project
  Sequencing technology
  Transcriptional regulation and RNA polymerases
    The transcriptome
    Transcription factors
    Regulatory networks
    Transcription factors in disease
    Transcription factors in reprogramming
  Epigenetics
    Chromatin structure and organization
    Nucleosome positioning
    Post-translational histone modifications
    DNA methylation
    Imprinting and X-chromosome inactivation
  TGFβ signaling
  Alcoholic steatohepatitis
Materials and Methods
  Cells and Tissues
  Methods
  Chromatin Immunoprecipitation (ChIP)
  Verification of ChIP DNA by PCR
  Library preparation for ChIP-sequencing
  Next generation Sequencing
  ChIP-reChIP
  Co-Immunoprecipitation
  TGFβ treatment, nucleosome and RNA preparation
Present Investigations
  Aims of the present studies
  Paper I
  Paper II
  Paper III
  Paper IV
Concluding remarks and future perspectives
Acknowledgements
References

Abbreviations

3C          Chromosome Conformation Capture
ASH         Alcoholic Steatohepatitis
Bp          Base pair
ChIA-PET    Chromatin Interaction Analysis by Paired End Tags
ChIP        Chromatin Immunoprecipitation
ChIP-seq    Chromatin Immunoprecipitation with Sequencing
CNV         Copy Number Variation
Co-IP       Co-Immunoprecipitation
CTCF        CCCTC binding factor
DHS         DNase 1 Hyper Sensitivity
DNA         Deoxyribonucleic acid
DNMT        DNA methyl transferase
ENCODE      ENCyclopedia of DNA Elements
FCHL        Familial Combined Hyperlipidemia
GABP        GA Binding Protein
GTF         General Transcription Factor
GWAS        Genome Wide Association Study
H3K27ac     Histone 3 lysine 27 acetylation
H3K4me1     Histone 3 lysine 4 mono-methylation
H3K4me3     Histone 3 lysine 4 tri-methylation
HCC         Hepatocellular carcinoma
HCV         Hepatitis C Virus
HepG2       Cell line from Hepatocellular Carcinoma
HGP         Human Genome Project
HNF4α       Hepatocyte Nuclear Factor 4α
ICR         Imprinted Control Region
LCR         Locus Control Region
lincRNA     long intervening noncoding RNA
lncRNA      long noncoding RNA
miRNA       micro-RNA
MNase1      Micrococcal Nuclease 1
MODY1       Maturity Onset Diabetes of the Young 1
mRNA        messenger RNA
NASH        Non-alcoholic steatohepatitis
NGS         Next Generation Sequencing
NRF         Nuclear Respiratory Factor
PCR         Polymerase Chain Reaction
RNA         Ribonucleic acid
rRNA        ribosomal RNA
SNP         Single Nucleotide Polymorphism
TF          Transcription Factor
TGF         Transforming Growth Factor
tRNA        transfer RNA
TSS         Transcription Start Site
USF1        Upstream Stimulatory Factor 1
UTR         Untranslated Region

Introduction

The decoding of a near complete human genome in 2001 is one of the largest achievements in science since the
discovery of the structure of deoxyribonucleic acid (DNA) by Watson and Crick in 1953 [1-3]. The human genome project (HGP) revealed that there are approximately three billion nucleotides organized in the 22 pairs of chromosomes and the sex chromosomes X and Y (XX in women and XY in men). Only a small proportion, 1.5%, of the genome codes for mRNA. There are slightly more than 20 000 human protein coding genes. In addition, an unknown number of DNA loci are transcribed into non-coding RNAs such as ribosomal RNA (rRNA), transfer RNA (tRNA), micro-RNA (miRNA), long non-coding RNA (lncRNA) or transcripts of unknown function [4]. The successful completion of the HGP has provided a road map for many large international collaborative projects such as the HapMap project, the ENCyclopedia of DNA Elements (ENCODE) project and the 1000 genomes project.

Genetic variation

The current estimate is that the human genome contains approximately 38 million single nucleotide polymorphisms (SNPs). Around 10 thousand common SNPs are associated with common diseases, whereas other variants are rare and their possible association with disease is not known. The HapMap project aimed to identify common patterns of genetic variation in diverse human populations [5]. The purpose of the 1000 genomes project was to characterize the genetic variants with at least 1% frequency in the population using high-throughput sequencing [6]. The second phase of this project used 1092 individuals to provide the current map of 38 million SNPs, 1.4 million short indels and 14 000 large deletions [7] that have occurred at different times during human evolution. Common variants are old and rare variants are younger. A recent large scale exome-seq study of ~15 000 genes in 6 515 individuals revealed that 86% of the SNPs that were predicted to be harmful had arisen in the last 5 000 years [8]. Selection may not have had time to act on these variants and remove them from the population.
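
As a rough back-of-envelope check on the figures quoted above, the following sketch converts them into absolute and per-base quantities. It is illustrative only: the constant and function names are hypothetical, the inputs are the rounded values given in the text, and the variants are catalogued across 1092 individuals, so the spacing is a population-wide average rather than a per-genome value.

```python
# Back-of-envelope arithmetic on the rounded figures quoted in the text.
# All names are illustrative; no new data are introduced.

GENOME_SIZE_BP = 3_000_000_000   # approximately three billion nucleotides
CODING_FRACTION = 0.015          # about 1.5% of the genome codes for mRNA
CATALOGUED_SNPS = 38_000_000     # SNPs in the 1000 genomes project map
SHORT_INDELS = 1_400_000         # short indels in the same map


def coding_bases(genome_size_bp: int, coding_fraction: float) -> int:
    """Number of bases implied by a given coding fraction of the genome."""
    return round(genome_size_bp * coding_fraction)


def mean_spacing_bp(genome_size_bp: int, n_variants: int) -> float:
    """Average distance in bp between catalogued variants of one type."""
    return genome_size_bp / n_variants


if __name__ == "__main__":
    print(f"coding bases: {coding_bases(GENOME_SIZE_BP, CODING_FRACTION):,}")
    print(f"one SNP per ~{mean_spacing_bp(GENOME_SIZE_BP, CATALOGUED_SNPS):.0f} bp")
    print(f"one short indel per ~{mean_spacing_bp(GENOME_SIZE_BP, SHORT_INDELS):.0f} bp")
```

Under these assumptions the coding portion corresponds to about 45 million bases, and the catalogued SNPs average roughly one per 79 bp across the population sample.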
There are a large number of copy number variations (CNVs) in the human genome, varying in size from ~1 kb to several megabases. CNVs have been associated with many diseases, e.g. autism [9]. Genetic variants like SNPs and indels may also affect the binding of transcription factors (TFs) [10]. Others have later shown that genetic variation changes the heritable chromatin status and TF binding [11, 12].

The ENCODE Project

The ENCODE project was started in 2003 to map the functional elements in the human genome. The pilot phase of the project mapped TF binding sites, histone marks, DNase 1 hypersensitive sites (DHS) etc. in 1% of the human genome [13]. The next phase of the project focused on continuing the annotation of protein coding genes, non-protein coding genes, their transcripts and transcription regulatory regions in the whole genome using massively parallel next generation sequencing technology [14]. In September 2012, the ENCODE consortium published a series of papers in Nature, Science, Genome Research and Genome Biology describing the results from 1 640 genome-wide datasets (ChIP-seq, RNA-seq, DNase-seq, FAIRE-seq, 5C etc.) (Figure 1) from 147 cell types. The ENCODE data summary paper [15] claimed that 80.4% of the human genome is biochemically active, with approximately 400 000 regions with enhancer-like features and 70 000 regions with promoter-like features. Their findings will be further discussed in the coming sections.

The modENCODE project was started to identify the functional elements in Caenorhabditis elegans and in Drosophila melanogaster. The modENCODE project has generated more than 700 datasets and has identified protein coding and non-coding genes as well as regulatory, replication and chromatin elements in Drosophila melanogaster. It has also identified new functions of annotated genes and general and tissue-specific regulatory elements.
The complex modENCODE data generated today covers 82% of the D. melanogaster genome, which is four times larger than the previously annotated protein coding exons [16]. The modENCODE project for C. elegans was aimed at transcriptome profiling during a developmental time course, identification of TF binding sites and generation of chromatin organization maps. It has also focused on TF binding networks and miRNA interactions [17].

Figure 1. The figure illustrates various methods used in the ENCODE project to map the functional elements in the genome. The figure was obtained from the UCSC genome browser webpage and the credit goes to Ian Dunham and Daryl Leja.

Sequencing technology

The launch of Next Generation Sequencing (NGS) technology played a key role in the ENCODE project, the 1000 genomes project and many other projects. Rapid development in NGS technology has reduced the sequencing cost and today the whole human genome can be sequenced in a couple of hours. For this thesis, we have used the Illumina and Life Technologies SOLiD sequencing platforms to map regulatory regions, histone marks, nucleosome positioning etc. We have mapped liver specific TFs using Chromatin Immunoprecipitation in combination with NGS (ChIP-seq).

Transcriptional regulation and RNA polymerases

Transcription is the process in which an equivalent RNA copy is created from a DNA sequence. Transcription is performed by three different multisubunit RNA polymerases. RNA polymerase I synthesizes 45S pre-rRNA, which then matures into 28S, 18S and 5.8S rRNA [18]. RNA polymerase II synthesizes pre-mRNAs, miRNAs and snRNAs [19]. RNA polymerase II coordinates with RNA polymerase III to maintain gene expression globally [20]. RNA polymerase II forms a holoenzyme along with several enzymes and general TFs (GTFs). Transcription begins once the TATA binding protein (TBP), a subunit of TFIID, binds to the TATA box, which allows the assembly of GTFs and RNA polymerase II.
A recent study showed that many promoters lack a classical TATA box, but instead a more degenerate TATA-like sequence has been found in Saccharomyces which might also exist in humans [21]. RNA polymerase III is involved in the synthesis of tRNAs, 5S rRNAs and other small RNAs. RNA polymerase III transcription is needed for the generation of the adapter tRNA molecules that are used for translation of the genetic information in mRNA into protein. RNA polymerase III binds to only a few hundred tRNA genes, and a study in six mammalian species showed that RNA polymerase III is evolutionarily conserved but that its binding varies between species in terms of strength and location [22].

Promoters are necessary for transcription and are located upstream of the transcription start site (TSS). Promoters have been divided into core promoters and proximal promoters. Core promoters are located within 50 bp of the TSS and are important for the formation of the pre-initiation complex (PIC) and the binding of GTFs. The TATA box is located 20-30 bp upstream of the TSS and is found in many promoters. Core promoters are usually silent and need a proximal promoter to initiate transcription. Proximal promoters start where the core promoters end and are located within ±250 bp of the TSS (Figure 2). The ENCODE project and others have studied GC-rich and AT-rich promoters at the TSS of protein coding transcripts and found that these promoters have distinct histone modification and TF patterns [15, 23]. GC-rich promoters are most common and have the highest activity in all tissues. Most of the housekeeping genes tend to be associated with GC-rich promoters [24]. In addition to promoters, gene activity is also regulated by the binding of sequence specific activator and repressor proteins to DNA elements termed enhancers, operators and silencers.
Enhancers/silencers can be located many kilobase pairs from the TSS and are involved in the activation or repression of transcription [25, 26]. P300 is a well-studied enhancer protein whose effect on enhancer activity in different cells has been measured by ChIP-seq [27] and by other methods. In the ENCODE project the chromosome conformation capture carbon copy (5C) method was used to map more than a thousand long range interactions between promoters and distal elements in each cell type and to correlate these interactions with gene expression [28]. Using one of the recently developed methods, STARR-seq, many functional enhancers have been identified in Drosophila [29]. In this method tentative enhancers are cloned in a vector downstream of the TSS so that they are transcribed. Their abundance reflects the enhancer activity and can be measured by RNA-seq. Silencers are sequence specific DNA binding elements located close to or far from genes and involved in the silencing of a target gene [30]. Locus control regions (LCRs) have functions similar to those of enhancers/silencers but have the ability to enhance at a greater distance and to regulate many genes in a tissue specific manner [31].

RNA polymerase II and GTFs are sufficient for promoter-specific transcription, but additional factors such as mediators are important in order to respond to the activators. The mediator complex plays an important role in transcription and acts as a co-activator, co-repressor and sometimes as a GTF during transcription (Figure 2) [32, 33].

Open chromatin sites hypersensitive to the enzyme deoxyribonuclease 1 (DNase 1) are found in all classes of regulatory elements. The ENCODE project has identified 2.9 million DHS sites from 129 cell and tissue types by using DNase-seq [15, 34]. Most of the DHS sites identified showed cell specific regulation. Many novel TF binding motifs (n=683) were also identified using DNase 1 footprints [35].
The majority of the disease associated GWAS SNPs are located at DHS sites and some of these affect TF binding and chromatin states and are involved in the formation of regulatory networks [36].

Figure 2. Schematic illustration of transcriptional regulation. The figure illustrates part of the transcription machinery at the core promoter, proximal promoter and TSS. Figure adapted with permission from Ong C.T & Corces V.G, Nature Reviews Genetics, 2011 [37].

The transcriptome

RNA transcripts are involved in many cellular functions either directly or indirectly. Technological advances in RNA sequencing have enabled the prediction of many RNA types, their location and function [38]. Only a small portion of the transcripts code for protein and the remainder are non-coding. There are many types of non-coding RNAs in the genome such as tRNA, rRNA, miRNA and lncRNA. The main function of tRNA is to translate the triplet codons of mRNA into the amino acid sequence of proteins. rRNAs are present in the ribosomes and play an important role in translation. miRNAs are small non-coding RNA molecules, ~22 nucleotides long, that control gene expression by degrading mRNA molecules or by inhibiting translation [39-42]. miRNAs bind to the 3’ untranslated region (3’UTR) of the target gene [43]. According to the miRNA database (miRBase 19, August 2012; www.mirbase.org), there are about 21 264 miRNAs in 193 species. Among them 2 019 were discovered in humans but the majority are not validated yet. Some of these miRNAs play an important role in organ development, e.g. miR-127 in lung development, and some of them are involved in disease processes, e.g. miR-122 in HCV replication, while miR-124 acts as a tumor suppressor [44, 45].

Long non-coding RNAs (lncRNAs) are at least 200 nucleotides long and show lower expression than protein coding genes. lncRNAs are expressed in a cell specific manner [46] and can form networks to control gene expression and function.
They are also involved in the regulation of the three dimensional (3D) structure of chromosomes [47]. Most of the disease associated SNPs are found in non-protein coding regions [48] and some of them may affect the function of lncRNAs and thereby predispose to disease. Some lncRNAs are called long intervening noncoding RNAs (lincRNAs) and were discovered by Guttman and co-workers [49] by searching outside of protein coding genes in regions marked by histone 3 lysine 4 tri-methylation (H3K4me3) and histone 3 lysine 36 tri-methylation (H3K36me3). These lincRNAs are involved in many biological processes such as dosage compensation and imprinting [50], and many are associated with chromatin remodeling complexes and regulate histone modification patterns and gene expression [51, 52].

Transcription factors

TFs are a group of proteins that regulate gene expression by binding to DNA in a sequence specific manner. TFs affect transcription alone or in a group by activating or repressing the recruitment of RNA polymerase II. There are ~1 850 TFs in the human genome [30]. Based on their DNA binding motif they have been grouped into helix-loop-helix, leucine-zipper, homeodomain, zinc finger and helix-turn-helix [53]. In a recent study, 830 TF binding motifs were identified by using high throughput SELEX and ChIP-seq [54]. However, the DNA-binding motifs remain to be defined for many TFs. TFs can bind various regulatory regions, i.e. promoters, enhancers/silencers, LCRs, insulators etc. TFs are modular proteins that contain activator/repressor domains along with the DNA-binding domain [30]. TFs also contain domains for protein-protein interactions. Some proteins called co-activators or co-repressors regulate transcription by binding to TFs instead of DNA [55]. Transcription begins with the binding of one or several TFs to their respective TF binding sites and the recruitment of co-factors.
The co-factors help to remove nucleosomes to allow binding of the transcription machinery to the promoters. Co-factors also modify the nucleosomes at promoters and at distal regions.

Regulatory networks

The genomic information is almost the same for all multi-cellular organisms but there is a lot of phenotypic variation. Recent technological advances in the mapping of regulatory regions, i.e. DHS sites and TF binding sites, have identified regulatory networks in humans and other species. The regulatory networks are involved in development, metabolism, hormonal stimulation etc. in a tissue specific manner [56]. To know the function of a specific TF, it is important to know which genes it regulates. Mapping of hepatocyte nuclear factors (HNF) in mouse embryonic stem cells showed that correct regulation of the HNF family network is required for differentiation and metabolism. Promoter array analysis of three TFs in liver and in endocrine pancreas showed that there is a transcriptional hierarchy between HNF1 and HNF4α [57], suggesting that regulatory networks play a role in maintaining different biological processes. The ENCODE project has identified the regulatory networks between 119 TFs and found that these factors co-bind in a combinatorial fashion. These networks are highly cell specific. There is also a significant difference between the networks for proximal and distal regulatory elements. Strongly connected network elements are more evolutionarily conserved than the weaker ones [15, 58].

Transcription factors in disease

TFs are involved in the control of gene expression and any abnormality of their binding to regulatory regions may change the expression status and thus lead to disease. This may be caused by different types of mutations in TFs such as missense, nonsense, duplications or deletions. To date mutations in more than 400 TFs and other nuclear proteins are known to cause human disease (Wadelius C, personal communication).
One example is mutations in HNF4α, described further below. Mutations in this protein change the affinity for its motif [59]. Mutations or variants at regulatory regions may affect the binding of TFs, which may lead to diseases such as cancer, diabetes, obesity, autoimmune and cardiovascular diseases etc. [60]. Many SNPs have been associated with diseases or common traits and this information is collected in a catalogue at NIH (http://www.genome.gov/26525384). As of 19 April 2013 the catalogue includes 1 571 publications and 9 906 SNPs. Most of these SNPs are thought to affect gene regulation by changing the interaction with TFs. For example, studies have shown that variants in the TF gene TCF7L2 are associated with the risk of type 2 diabetes [61, 62]. c-MYC is one of the most well-studied TFs in many tumors and it regulates many genes involved in tumorigenesis.

Transcription factors in reprogramming

TFs also play a key role in the development and reprogramming of cells. Induced pluripotent stem cells (iPS) were generated from adult human dermal fibroblasts using the TFs Oct3/4, Sox2, Klf4 and c-Myc (Figure 3). These factors could reprogram mouse embryonic or adult fibroblasts to iPS cells [63, 64]. For this work, Professor Yamanaka was awarded the Nobel Prize in 2012. In a study by P. Huang et al., adult fibroblasts were directly converted into iHep cells by transduction of the TFs Gata4, Hnf1α and FoxA3 (Figure 3) [65]. In a parallel study, mouse embryonic and adult fibroblasts were directly converted to cells that closely resemble hepatocytes. This study used combinations of Hnf4α plus any one of the FoxA proteins (FoxA1/FoxA2/FoxA3) in vitro (Figure 3) [66]. Thus, some defined factors could soon be used to generate hepatocyte-like cells for therapeutic purposes. Further studies on these factors and processes could contribute to a breakthrough in tissue transplantation.

Figure 3. Illustration of the reprogramming of cells from one state to another by selected transcription factors. Figure adapted with permission from Lee Tong I and Richard A. Young, Cell, 2013 [60].

We have studied the TFs FOXA1, FOXA2, FOXA3, HNF4α, GABP and USF1 for this thesis. These factors are briefly described below.

FOXAs

FOXA or hepatocyte nuclear factor 3 (HNF3) TFs belong to the forkhead box/winged helix family. They are called pioneer factors since they can bind to DNA that is packed in nucleosomes and remove the nucleosomes to make room for other TFs [67]. There are three FOXAs, namely FOXA1 (HNF3α), FOXA2 (HNF3β) and FOXA3 (HNF3γ). FOXA proteins play a major role in early development, organogenesis and carbohydrate metabolism [68]. The forkhead box sequences of the three FOXAs are highly conserved and are 95% identical. The FOXA proteins share weaker homology outside the forkhead box, although there is a higher degree of homology at the N- and C-terminal ends. FOXA proteins interact with other TFs like USF1, HNF1α, HNF4α and HNF6 for the regulation of target genes [69]. The expression of the FOXA family genes during embryogenesis, combined with the liver specific FOXA target genes, has been interpreted as evidence that the FOXA TFs regulate hepatogenesis. There are ChIP-chip, ChIP-seq and EMSA studies on FOXA TFs in mouse, human tissues and in HepG2 cells, but the three factors have not been studied together in HepG2 cells previously.

HNF4α

Hepatocyte nuclear factor 4 alpha (HNF4α) is a TF that belongs to the zinc finger protein family. HNF4α plays a key role in the formation of TF networks that regulate the development of vital organs such as the kidney, pancreas and liver [70, 71]. There are three members in the HNF4 family: HNF4α, HNF4β and HNF4γ. HNF4α regulates genes involved in lipid and carbohydrate metabolism [72]. HNF4α is also directly involved in the regulation of genes in glucose transport and in glycolysis.
Any abnormality in HNF4α regulation may lead to diabetes and cancer [73]. Mutations in HNF4α lead to maturity onset diabetes of the young 1 (MODY1), which is an autosomal dominant form of diabetes, or to hyperinsulinaemic hypoglycemia [74, 75]. A recent study found that mutations in the DNA binding domain of the HNF4α protein as well as in the nearby hinge region cause MODY1 by affecting the binding of HNF4α to DNA [59]. HNF4α is also associated with the common form of diabetes [76].

GABP

GA binding protein (GABP) is a TF that belongs to the ETS family. It is abundantly expressed in liver and muscle tissue. GABP binds to DNA as a heterotetrameric complex formed of two GABPα and two GABPβ subunits. GABP mostly binds to GAA enriched sequences and its preferred motif is CCGGAA. GABP regulates genes involved in the cell cycle, protein synthesis and metabolism. An alternative name for GABP is NRF2 since it regulates nuclear respiratory factors [77].

USF1

Upstream stimulatory factor 1 (USF1) belongs to the helix-loop-helix leucine zipper family and binds to the E-box sequence CA[C/T]GTG. USF1 can interact with its targets as a homo-dimer or hetero-dimer. USF1, along with USF2, is involved in the recruitment of histone modifying enzymes and they are enriched at the sites of H3K4me2 and H3ac [78]. USF1 is ubiquitously expressed and involved in disorders of lipid and carbohydrate metabolism. SNPs in the USF1 gene are strongly associated with the disease familial combined hyperlipidemia (FCHL), which results from defects in the liver and other organs. FCHL is phenotypically associated with type 2 diabetes and the metabolic syndrome [79].

Epigenetics

After several decades of genetics research, some focus is now redirected towards epigenetics. The definition of epigenetics is “the study of changes in the gene function that are mitotically and/or meiotically heritable and that do not entail a change in DNA sequence” [80].
This means that epigenetic modifications are heritable from cell to cell just like genetic modifications. There are mainly two types of epigenetic mechanisms, DNA methylation and modification of histone tails.

Chromatin structure and organization

The eukaryotic nucleus harbors genetic information consisting of several billion nucleotides. To maintain all this genetic information in the nucleus, the DNA is packed tightly to form nucleosomes. A nucleosome consists of approximately 147 bp of DNA wrapped twice around an octamer of core histones, namely two each of histones H2A, H2B, H3 and H4. Due to this packing the DNA is compacted 10 fold [81, 82]. Nucleosomes are connected to each other by a ~20 bp long linker bound by histone H1, which seems to regulate the distance between the nucleosomes (Figure 4). Nucleosomes are further subjected to compaction to form higher order chromatin fibers. However, this tightly packed chromatin structure is virtually inaccessible to the transcriptional machinery. A prerequisite for the transcription of a certain gene segment is the structural reorganization of the gene locus in question. Changes in nucleosome positioning, post-translational modification of histone tails and methylation of CpG dinucleotides constitute various levels of chromatin modification that affect DNA accessibility [83]. These alterations occur in a highly ordered fashion, where selective modifications allow for tissue-specific expression and heritability at cell division, two hallmarks of epigenetic regulation.

Chromatin organization in the cell nucleus is not random. It is organized in a spatial manner and plays an important role in gene regulation. Chromosome conformation capture (3C) based methods have been widely used to study spatial chromatin organization and chromatin interactions [84]. The new Hi-C method can be used to view the 3D interactions of the chromatin genome-wide [85].

Figure 4.
Compaction of DNA into chromosomes: A) Anti-parallel DNA strands base pair with each other and form a double helical structure. The DNA double helix is tightly wrapped around the histone core octamer to form a nucleosome. Nucleosomes are compacted and folded into 30 nm fibers for higher order compaction and chromosome formation. B) Nucleosome core particle: 3D image of the nucleosome core particle showing that DNA is wrapped around the histone core octamer. Figure adapted with permission from Felsenfeld & Groudine, Nature, 2003 [86].

Nucleosome positioning
Rapid advances in sequencing have allowed mapping of nucleosome positions in different species and in different cell types. At least in some genes, nucleosomes are positioned in an ordered manner at promoters while the positioning is more random in the gene body (Figure 5) [87]. We have seen well-positioned nucleosomes downstream and upstream of a TSS and a nucleosome free region in between them. However, this average nucleosome positioning signal was rarely observed at the TSS of individual genes. Some CTCF binding sites contain well-positioned nucleosomes, but nucleosomes are not necessarily well positioned on both sides of a TF bound region. This has been observed in our data at JUND binding sites (Paper III). In addition, some nucleosome positioning studies have shown that exons contain positioned nucleosomes marked with specific histone modifications reflecting the expression level [88, 89]. We have studied the nucleosome positioning pattern in HepG2 cells after treatment with transforming growth factor β (TGFβ) (Paper III).

Figure 5. Nucleosome positioning pattern. The figure illustrates the positioning pattern at the TSS and at nucleosome free regions. Figure adapted with permission from Jiang & Pugh, Nature Reviews Genetics, 2009 [87].

Post-translational histone modifications
Histone tails undergo a wide range of post-translational modifications at their N-terminal ends.
Modified histones are involved in biological processes such as transcription, replication and maintenance of chromatin states. There are at least eight known histone modifications. They include acetylation and methylation of lysines (K) and arginines (R), ubiquitinylation of lysines (K) and phosphorylation of serines (S) and threonines (T) (Figure 6) [82]. The pattern of modifications on histone tails is complex and, to further increase the variation, many of the modifications can occur at different residues of the histone tail. However, some histone modifications are frequently observed in either active chromatin (euchromatin) or silent chromatin (heterochromatin) [90]. The complex language of histone modifications is the subject of continuous research and great progress has been made in recent years.

Figure 6. Histone octamer and modification of the histone tails at their N-terminal ends. Figure adapted with permission from Paul A. Marks et al., Nature Reviews Cancer, 2001 [91].

Acetylation and methylation of histones are the most commonly studied chromatin marks. Acetylation is usually localized at the active regions of chromatin; methylation, on the other hand, plays a dual role and is associated with both transcriptional activation and silencing. As shown in Table 1, histone 3 lysine 27 tri-methylation (H3K27me3) is associated with transcriptional silencing. By contrast, previous studies have shown that nucleosomes with H3K4me3 are linked to transcriptional activity. High levels of H3K4me3 are associated with the 5’ regions of all active genes. There is a strong positive correlation between this modification and RNA polymerase II activity, transcription rates and histone acetylation [92, 93].

Table 1. Different histone marks and their functional association in the genome. Table obtained and modified from Mehdi Motallebipour PhD thesis (2008).
Histone  Residue                      Modification                  Function
H2A      Ser1                         Phosphorylation               Mitosis
H2A      Lys5                         Acetylation                   Activation
H2A      Lys119                       Ubiquitylation                Spermatogenesis
H2A      Thr120                       Phosphorylation               Mitosis
H2A      Ser139                       Phosphorylation               Repair
H2A      Tyr142                       Phosphorylation
H2B      Lys5, 12, 15, 20             Acetylation                   Activation
H2B      Ser14                        Phosphorylation               Apoptosis
H2B      Lys120                       Ubiquitylation                Elongation?
H3       Arg2, 8                      Methylation                   Activation
H3       Thr3                         Phosphorylation               Mitosis
H3       Lys4                         Methylation (me1, me2, me3)   Activation
H3       Thr6                         Phosphorylation               Activation
H3       Arg8                         Methylation                   Activation
H3       Lys4, 9, 14, 18, 27, 36      Acetylation                   Activation
H3       Lys9                         Methylation                   Silencing
H3       Ser10, 28                    Phosphorylation               Mitosis
H3       Thr11                        Phosphorylation               Mitosis
H3       Lys27                        Methylation (me1)             Activation
H3       Lys27                        Methylation (me2)             More active than repression
H3       Lys27                        Methylation (me3)             Repression
H3       Lys36                        Methylation (me1, me3)        Activation
H3       Thr45                        Phosphorylation               Replication
H3       Lys56                        Acetylation                   Repair
H3       Lys79                        Methylation                   Activation
H4       Ser1                         Phosphorylation               Activation
H4       Arg3                         Methylation                   Activation
H4       Lys5, 8, 12, 16              Acetylation                   Activation
H4       Lys20                        Methylation                   Silencing
H4       Lys91                        Acetylation                   Repair

ChIP-seq studies suggest that histone modifications can predict the activity state of chromatin. In two large scale ChIP-seq studies of 19 methylated histone marks and 18 acetylated histone marks, the localization of these marks was determined in CD4+ T cells and correlated to promoters and other gene features (Table 1) [94, 95]. Histone modifications can be used to predict gene expression [96]. It is also possible to detect enhancer elements using histone modifications. For example, by using acetylated histone 3 lysine 27 (H3K27ac) and mono-methylated histone 3 lysine 4 (H3K4me1), active enhancers can be distinguished from inactive/poised enhancers [97].

DNA methylation
DNA methylation is a biological process in which a methyl group is added to the carbon-5 position of cytosine residues within CpG dinucleotides by DNA methyltransferases (DNMTs).
CpGs occur at lower frequencies than expected in the human genome and 70% of the CpGs are present at promoters. DNA methylation is one of the important and widely studied epigenetic modifications and is involved in controlling cellular processes such as X-chromosome inactivation, transcription, embryonic development, cell differentiation, chromosome stability and genomic imprinting [98]. DNA methylation is mainly associated with gene silencing and heterochromatin formation (Figure 7). DNA methylation, along with histone modification, plays a role in gene activation, repression and nuclear structure. Two studies headed by Dirk Schübeler showed that local DNA sequences such as TF binding sequences (motifs) and the respective TFs are the primary determinants of target specification and regulation of DNA methylation in mammals [99, 100].

A study in colon cancer identified methylation alterations at sites of lower CpG density called CpG island shores. CpG island shores are located neither at promoters nor at CpG islands but lie within about 2 kb of CpG islands. Just like at CpG islands, methylation at CpG island shores is strongly correlated with gene expression [101, 102]. Hypermethylation at CpG island shores has been observed during normal tissue differentiation [101, 103]. This is in line with data showing that methylation by DNMTs plays a role in normal mammalian development [104].

DNA methylation has been associated with many diseases in addition to cancer, e.g. systemic lupus erythematosus (SLE), mental retardation, Prader-Willi syndrome [98] and many other disorders. Prader-Willi syndrome is due to a defect in imprinting, which is discussed further below. DNA methylation suppresses the expression of many genes, and changes in methylation status, i.e. hypo- or hypermethylation, have been observed in many cancers. Mutations in DNMT3A that reduce methylation have been observed in acute myeloid leukemia (AML).
Differential methylation patterns have been observed in the pancreatic islet cells of type 2 diabetic individuals, suggesting that methylation also plays a role in type 2 diabetes [105]. A human methylation study in rheumatoid arthritis patients revealed differential methylation in the major histocompatibility complex region, which was suggested to mediate the genetic risk of rheumatoid arthritis [106].

Figure 7. Different methylation mechanisms in the genome. Figure adapted with permission from Smith Z.D. & A. Meissner, Nature Reviews Genetics, 2013 [104].

Imprinting and X-chromosome inactivation
Imprinting is an epigenetic phenomenon in which genes are expressed on the basis of their parental origin. About 150 imprinted genes have been observed in the mouse. Imprinted genes can form clusters of 20-3 700 kb of DNA. The expression of these genes is regulated by an imprinted control region (ICR). An ICR is usually associated with parent-of-origin specific methylation and histone modifications. Notable imprinted clusters are H19/Igf2 and Gtl2/Dlk1, in which the paternal alleles are methylated, and the non-coding RNAs Kcnq1ot1 and Snrpn, where the ICRs of the maternal allele are methylated [107]. One well-known locus in tumorigenesis is IGF2-H19, in which loss of imprinting is associated with cancer [108]; in this region CTCF regulates the methylation pattern (Figure 8).

In mammals, females have two X-chromosomes and males have one X and one Y-chromosome. One of the X-chromosomes is inactivated in females by a 170 000 bp long non-coding RNA, X-inactivation specific transcript (XIST) [109, 110]. XIST plays an important role since it mediates the epigenetic silencing of the genes on the X-chromosome. XIST is also involved in cancer and in reprogramming [107].

Figure 8. The mechanism of methylation and imprinting at the H19/IGF2 loci in the maternal and paternal allele. CTCF binds to the unmethylated maternal allele of the ICR and prevents activation of IGF2.
Modified figure adapted with permission from Plass C. and P.D. Soloway, European Journal of Human Genetics, 2002 [108].

TGFβ signaling
TGFβ is involved in many cellular processes such as differentiation, proliferation, apoptosis and development. There are three highly homologous isoforms in the TGFβ family which are expressed in a tissue specific manner [111]. TGFβ is involved in immune cell maturation and regulation of gene expression. TGFβ signaling is complex and is mediated e.g. by the SMAD TFs. Abnormality in TGFβ signaling may contribute to cancer, diabetes, heart failure etc. TGFβ can act as both tumor suppressor and tumor promoter and is associated with many cancers [112]. TGFβ is one of the main regulators of liver disease and liver cancer and participates in liver inflammation, fibrosis and cirrhosis in hepatocellular carcinoma (HCC) (Figure 9). Because the TGFβ signaling pathway affects many diseases, it has been identified as a promising drug target [113].

Figure 9. Roles of TGFβ in different cells and its positive and negative effects. Figure adapted with permission from Dooley S. & P. ten Dijke, Cell and Tissue Research, 2012 [114].

Alcoholic steatohepatitis
Excess alcohol consumption may cause fatty liver disease or alcoholic steatohepatitis (ASH), which may lead to ballooned hepatocytes, Mallory bodies, inflammation, fibrosis, cirrhosis and malignant transformation to HCC. There are similarities between ASH and the disease non-alcoholic steatohepatitis (NASH). One of the main signs of ASH is jaundice with fever and proximal muscle loss [115]. Both genetic and environmental factors contribute to this disease. Aldehyde dehydrogenase (ALDH) genes, alcohol dehydrogenase (ADH) genes and catalase genes participate in alcohol metabolism and changes in their function may contribute to the disease (Paper IV and [116]). In hepatocytes alcohol is converted to acetaldehyde through oxidation and then to acetate.
Downstream in the pathway, ethanol affects lipid metabolism through the regulators PPAR-α, AMPK and SREBP1, leading to an increase in fatty acids (FA) and steatosis. Lipopolysaccharides (LPS) from Gram-negative bacteria may also contribute to inflammation in the liver (Figure 10). ASH can be treated with proper diet control, lactulose and gut cleaning antibiotics. Corticosteroids can be used to inhibit the TFs AP-1 and NF-κB. Other drugs, such as pentoxifylline and infliximab antibodies, that inhibit the TFs contributing to this disease can also be used to treat ASH. In the worst scenario, liver transplantation may be a final option.

Figure 10. An illustration of alcoholic steatohepatitis. Adapted with permission from Lucey M.R. et al., New England Journal of Medicine, 2009 [115].

Materials and Methods

Cells and Tissues
In this study we used the HepG2 cell line, which is derived from a hepatocellular carcinoma in a Caucasian male. HepG2 cells were cultured in RPMI 1640 medium containing 10% fetal bovine serum (FBS) and 1% L-glutamine at 37°C with 5% CO2. Human liver tissue from people diagnosed with alcoholic steatohepatitis and normal liver tissue were obtained from the Graz biobank, Medical University of Graz, Austria.

Methods
ChIP-seq is the main method used in this thesis. Apart from this, we performed Nucleosome-seq and RNA-seq for paper III. For the follow-up studies, we used semi-quantitative PCR, qPCR, ChIP-reChIP, co-immunoprecipitation (Co-IP) and Western blotting.

Chromatin Immunoprecipitation (ChIP)
Cells or finely chopped tissue were crosslinked with formaldehyde to keep the protein-DNA interactions stable. Crosslinking was stopped by adding glycine and the chromatin was sonicated to 100-300 bp fragments using a Bioruptor. Sonication can be replaced by digestion with an enzyme called micrococcal nuclease 1 (MNase 1). MNase 1 cleaves the nucleosome free regions and leaves the nucleosomes intact.
The main advantage of using MNase 1 for shearing is that chromatin marks can be mapped at the highest possible resolution and assigned to individual nucleosomes. MNase 1 was used to map the histone mark H3K4me3 in paper I and to map nucleosome positions in paper III. Immunoprecipitation was performed using antibodies against the TFs or the histone marks of interest. The immunoprecipitated complex was attached to a solid phase of Protein-G agarose/sepharose beads or magnetic beads. After extensive washing to remove all unbound protein-DNA complexes, the bound protein-DNA complexes were eluted. Crosslinks were reversed, and the DNA was treated with RNase A and proteinase K and extracted with phenol-chloroform followed by ethanol precipitation. A fraction of the chromatin, called input, was not subjected to immunoprecipitation but was saved and purified after reversal of the crosslinks. Input DNA was later used to measure the background of the ChIP experiments. ChIP enriched DNA can be used to verify known regions by PCR or can be used in large scale sequencing.

Figure 11. Chromatin Immunoprecipitation. Chopped tissues or cells are crosslinked, the cells are lysed, the chromatin is sheared into small fragments and immunoprecipitated, and the DNA is extracted. This DNA can be used in PCR, qPCR, microarray analysis and sequencing. Figure adapted with permission of the publishers, Philippe Collas, 2008 [117].

Verification of ChIP DNA by PCR
Enrichment in immunoprecipitated DNA can be verified by semi-quantitative PCR or, preferably, real time quantitative PCR (qPCR) in known target regions of the specific TF or histone modification. ChIP enrichment by semi-quantitative PCR is usually evaluated by separating the PCR products on an agarose gel and visualizing them under UV light. qPCR with SYBR green is a more sensitive and accurate way of measuring ChIP enrichment as it enables measurements to be made in the exponential phase of amplification.
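To make the qPCR quantification concrete, the following Python sketch fits a standard curve to a dilution series of input DNA and reads ChIP quantities off that curve. It is an illustration under stated assumptions rather than the analysis used in the papers: the function names and all Ct values are hypothetical, and the fit is an ordinary least-squares line of Ct against log10 of the relative input amount.

```python
import math


def fit_standard_curve(quantities, ct_values):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept.

    quantities: relative amounts of input DNA in the dilution series
    ct_values:  threshold cycles measured for those dilutions
    """
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_ct(ct, slope, intercept):
    """Interpolate the relative quantity that corresponds to a measured Ct."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """Efficiency implied by the slope; 1.0 means perfect doubling per cycle."""
    return 10 ** (-1.0 / slope) - 1.0


def fold_enrichment(ct_target, ct_control, slope, intercept):
    """Ratio of the quantity at a target region to that at a control region."""
    target = quantity_from_ct(ct_target, slope, intercept)
    control = quantity_from_ct(ct_control, slope, intercept)
    return target / control


if __name__ == "__main__":
    # Hypothetical ten-fold dilution series of input DNA (percent of input).
    dilutions = [10, 1, 0.1, 0.01]
    cts = [24.0, 27.32, 30.64, 33.96]
    slope, intercept = fit_standard_curve(dilutions, cts)
    print("slope:", round(slope, 3))
    print("efficiency:", round(amplification_efficiency(slope), 3))
    # Hypothetical ChIP measurements: a bound region versus an unbound region.
    print("fold enrichment:", round(fold_enrichment(25.0, 30.0, slope, intercept), 1))
```

A slope close to -3.32 corresponds to a doubling of product per cycle, which is why measurements are only comparable while the reaction is still in the exponential phase.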
Serial dilutions of input DNA can be used to obtain a standard curve, from which the ChIP enrichment can be quantified. Enrichment can be compared with that in a negative control ChIP using IgG as antiserum or with the enrichment obtained from regions not bound by the TF or histone mark.

Library preparation for ChIP-sequencing
ChIP combined with high-throughput sequencing is a powerful method to map regulatory elements across the genome. For ChIP-seq, ChIP-DNA is end-polished with the enzymes T4 DNA polymerase, Klenow DNA polymerase and T4 polynucleotide kinase to create blunt ends. The end-repaired DNA fragments are A-tailed for the Illumina pipeline and ligated with adapters. The ligated fragments are amplified using adapter specific primers, and the number of cycles should be kept low to reduce duplicates. Since the chemistry of the sequencing instruments only allows reading of smaller fragments, the amplified PCR product is separated on an agarose gel and the fragments are size selected and purified. It is also possible to use AMPure XP beads to size select the fragments. The sizes of the purified DNA fragments should be checked on a BioAnalyzer, after which the library can be used for large scale sequencing. The above protocol is an outline that applies to different instruments, but each NGS platform has its own adapters and library protocols. There may be differences between the steps; for example, SOLiD adapters need to be nick-translated before the amplification.

Next generation Sequencing
Sanger sequencing was the main method for any type of sequencing until the launch of the 454 sequencing instrument in 2005. The 454 technique is based on pyrosequencing and uses sequencing by synthesis [118]. The initial read length for 454 was 100-150 bp, but it is now up to 1000 bp. In 2006, Illumina-Solexa released their Genome Analyzer, which is also based on sequencing by synthesis. After that, Life Technologies' SOLiD (Sequencing by Oligo Ligation Detection) was launched into the market.
Today there are many other NGS instruments on the market, for example PGM and Proton (Life Technologies/Ion Torrent), RS (Pacific Biosciences) and ARGUS (OpGen) (Table 2). For all platforms the sequence reads come from the ends of the ChIP-enriched fragments. The reads are aligned to the genome and peaks are called using resources like MACS that are available on the internet or using in-house tools.

Table 2. Different NGS platforms, their sequencing type and chemistry.

Platform     Sequencing type           Amplification                Comments
Illumina     Synthesis                 Bridge PCR                   Short reads
SOLiD        Ligation                  Emulsion PCR                 High accuracy
Ion Torrent  Synthesis, H+ detection   Emulsion PCR                 Short time
454          Synthesis                 Emulsion PCR                 Long reads
Pac Bio      Synthesis                 Single molecule sequencing
Ion Proton   Synthesis                 Emulsion PCR                 Short time

For this thesis, we have used the Illumina and SOLiD platforms and I will briefly discuss their chemistry and current status.

The Illumina sequencing system is one of the leaders in NGS. It is based on sequencing by synthesis technology. In this system, denatured single stranded libraries with adapters are hybridized to the flow cell. Formamide is used for denaturation and Bst polymerase is used for the extension and amplification that create clusters of individual molecules. The double stranded library is made single stranded. Each base is labeled with a unique fluorophore, and one base at a time is incorporated and detected by a CCD camera. The HiSeq 2000 machine can generate up to 200 Gb of data per run with a read length of 2x100 bp.

SOLiD sequencing is based on ligation. Quantified SOLiD libraries are added to paramagnetic beads and emulsion PCR is performed. Beads with template are enriched and the selected beads undergo 3’ modification so that they attach covalently to the slide or flow cell. During sequencing, a universal sequencing primer hybridizes to the P1 adapter on the template beads and an eight base labeled probe ligates to the sequencing primer.
The probe has a ligation site at its first base, a cleavage site at the fifth base and one of four fluorescent dyes attached to the last base. The images of the fluorescent signal are recorded, the probe is cut at the cleavage site and the next probe is ligated. Repeating these cycles generates data in which each base is interrogated twice, which gives an accurate sequence. SOLiD has recently launched the 5500 W instrument, in which a flow chip is used in place of emulsion PCR for the amplification. With this system, data can be obtained with 99.99% accuracy.

ChIP-reChIP
ChIP-reChIP is a method to detect two different proteins or histone marks present at the same location in the genome. In this method, ChIP is performed with an antibody against the first protein. After elution of the protein-antibody-DNA complex from the solid phase, the complex is again immunoprecipitated with a second antibody against the second protein. This complex is eluted from the solid phase and the DNA is extracted as described in the ChIP method. Enrichment of ChIP-reChIP DNA can be verified by PCR at known genomic loci to confirm that the studied proteins are located at the same region in the genome. ChIP-reChIP enriched DNA could also be used in sequencing to map all common target regions.

Co-Immunoprecipitation
Co-IP is a method used to find protein-protein interactions in vivo using Western blotting. In this method, the extracted cell lysate is incubated with an antibody on a solid phase and washed vigorously to remove the unbound proteins. The purified protein lysate is then boiled and separated on a polyacrylamide gel, and the separated proteins are transferred onto a PVDF membrane. The membrane is probed with the antibody against the interacting protein of interest.

TGFβ treatment, nucleosome and RNA preparation
HepG2 cells were cultured as described above, serum starved overnight and then treated with 2.5 ng/mL of TGFβ for one hour.
The cells were used either for nucleosome preparation or RNA extraction. Nuclei were prepared by homogenization with a Dounce homogenizer and the nuclear suspension was slowly layered onto a buffer containing 30% sucrose. The nuclei were collected and treated with 300 U of MNase 1 at 37°C for 5 minutes. Mono-nucleosome size DNA was prepared and gel purified before the library preparation (Paper III). Total RNA was extracted using the TRIzol-chloroform method according to the manufacturer’s protocol. The quality of the RNA was measured using a BioAnalyzer. The SOLiD transcriptome whole library preparation kit (Ambion) was used to prepare the strand-specific library.

Present Investigations

Aims of the present studies
The main aim of this thesis was to map regulatory elements in liver cells and disease tissue by high-throughput sequencing. The specific aims are as follows:

Paper I and II
To investigate the TF binding sites for FOXA1, FOXA2, FOXA3, HNF4α and GABP and their interactions in the human genome. The TFs were mapped genome-wide in the liver cell line HepG2 using ChIP-seq.

Paper III
To evaluate the effect of TGFβ on nucleosome positioning and RNA expression by nucleosome and RNA sequencing.

Paper IV
To identify disease mechanisms of alcoholic steatohepatitis by mapping the active chromatin marks H3K4me1, H3K4me3 and H3K27ac and the TF USF1 in diseased and normal liver tissue.

Paper I
Differential binding and co-binding pattern of FOXA1 and FOXA3 and their relation to H3K4me3 in HepG2 cells revealed by ChIP-seq

We performed ChIP-seq for whole genome analysis of the TFs FOXA1 and FOXA3 and the histone mark H3K4me3 on the SOLiD platform. We obtained FOXA2 data from paper II. From the ChIP-seq data, we identified 8 175 FOXA1 peaks, 4 598 FOXA3 peaks and 41 480 H3K4me3 regions that correspond to 160 000 nucleosomes across the genome.
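Peak sets such as these are commonly summarised by how far each peak lies from the nearest TSS. A minimal Python sketch of that calculation follows. It is not the pipeline used in paper I: the function names, the coordinates and the simplification of ignoring strand and using a single summit position per peak are all assumptions made for illustration.

```python
from bisect import bisect_left


def distance_to_nearest_tss(position, sorted_tss):
    """Absolute distance from a position to the closest TSS on one chromosome.

    sorted_tss must be a non-empty list of TSS coordinates in ascending order.
    """
    i = bisect_left(sorted_tss, position)
    candidates = []
    if i < len(sorted_tss):
        candidates.append(sorted_tss[i] - position)      # nearest TSS at or after
    if i > 0:
        candidates.append(position - sorted_tss[i - 1])  # nearest TSS before
    return min(candidates)


def fraction_within(peaks, tss_by_chrom, window):
    """Fraction of peaks whose summit lies within `window` bp of any TSS.

    peaks:        list of (chromosome, summit_position) tuples
    tss_by_chrom: dict mapping chromosome name to a sorted list of TSS positions
    """
    if not peaks:
        return 0.0
    hits = 0
    for chrom, summit in peaks:
        tss = tss_by_chrom.get(chrom)
        # Peaks on chromosomes without any annotated TSS count as distal.
        if tss and distance_to_nearest_tss(summit, tss) <= window:
            hits += 1
    return hits / len(peaks)


if __name__ == "__main__":
    # Hypothetical annotation and peak summits, for illustration only.
    tss_by_chrom = {"chr1": [1000, 50000], "chr2": [20000]}
    peaks = [("chr1", 1500), ("chr1", 30000), ("chr2", 20900), ("chr3", 100)]
    print(fraction_within(peaks, tss_by_chrom, window=1000))
```

Sorting the TSS coordinates once and using a binary search keeps the lookup fast even for tens of thousands of peaks; the same function can be called with different windows, for example 1 kb or 5 kb.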
Only a fraction of the FOXA1 (5.2%) and FOXA3 (12.2%) peaks were located within 1 kb of a TSS; the majority were located more than 1 kb away from a TSS. H3K4me3 signals were mainly localized downstream of the TSS. Of the H3K4me3 regions, 42% were located within 1 kb of a TSS. To identify promoter patterns, we compared the H3K4me3 data with CAGE (Cap-analysis of gene expression) tag data from HepG2 and found that 28% of the H3K4me3 regions were not associated with any known transcript.

To study the relationship between the FOXA proteins, we compared the ChIP-seq data of the TFs FOXA1, FOXA2 and FOXA3 and found that these three datasets had 2 304 regions in common. This suggested that the TFs not only interact with DNA but also with each other. When we tested these interactions by Co-IP and ChIP-reChIP, we found that FOXA1 and FOXA3 bind to FOXA2, but there was no evidence of binding between FOXA1 and FOXA3. Based on these results, we classified the binding sites of these factors pair-wise and found that 51% of the sites bound by both FOXA2 and FOXA3 lie within 5 kb of a TSS, whereas only 10% of the sites bound by both FOXA2 and FOXA1 lie within the same distance.

We grouped the H3K4me3 signals into seven clusters by K-means clustering, and each of these clusters has an individual H3K4me3 signature. All seven clusters showed different levels of signal and expression. We found that the genes with the highest expression had high levels of H3K4me3 enrichment in HepG2 cells. Clusters I-III had the highest H3K4me3 enrichment upstream of the TSS and we found that this could be due to bidirectional promoters. When we looked at the CpG density at bidirectional promoters, we found that there was high CpG density in all clusters except cluster V. Cluster V contains genes with TATA and CAAT boxes. Comparison of the clusters with CAGE tag data at the TSSs of genes revealed that 30% of the genes in each of these clusters had bidirectional promoters.
Confirming this bidirectionality, we also found an H3K4me3 double peak at FOXA1-2-3 binding sites. Another finding was that 11% of the genes in cluster I had a FOXA3 binding site within 1 kb of the TSS.

In summary, we showed that the TFs FOXA1 and FOXA3 had differential binding patterns in HepG2 cells and no verified interaction in vivo. Most of the FOXA1, FOXA2 and FOXA3 binding sites coincide with H3K4me3, and the data suggested a directionality of this histone mark even where there is no known TSS in close vicinity. We also showed that ChIP-seq can be used to find mono-allelic binding of TFs or histone marks at SNPs. SNPs present inside regulatory regions have the capacity to differentially control the activity of many genes, and some of these SNPs could be associated with disease. This tells us that ChIP-seq can also be used to identify disease-associated SNPs.

Paper II
Molecular interactions between HNF4a, FOXA2 and GABP identified at regulatory DNA elements through ChIP-sequencing

In paper II we performed ChIP-seq of the three TFs FOXA2, HNF4α and GABP (NRF2) in HepG2 cells using the Solexa/Illumina platform. Before this work started, the group had published a paper on USF1 and USF2 which found that most of the USF2 binding was at distal positions and that its binding coincided with motifs for FOXA2 and HNF4α. That study also found GABP binding motifs in close vicinity of TSSs bound by USF. We found 18 693 HNF4α binding sites, 7 253 FOXA2 binding sites and 3 060 sites for GABP in HepG2 cells. Among these, 10% of the HNF4α, 6.3% of the FOXA2 and 85.2% of the GABP binding sites were located within 500 bp of a TSS, so most of the HNF4α and FOXA2 binding sites are far from TSSs. We also found a large overlap between HNF4α and GABP peaks at promoters. We identified many variants of the expected motifs for FOXA2 and HNF4α and these motifs were highly similar to the motifs already present in the TRANSFAC database.
For GABP, we were able to identify two motifs that were enriched equally, one for GABP/NRF2 and one for NRF1. Although many GABP peaks contain motifs for both NRF1 and NRF2, NRF1 motifs have a tight distribution in the peaks and some of the GABP peaks contain an NRF1 motif only. Correlation of HepG2 expression data with the ChIP-seq data showed that the genes bound by HNF4α and FOXA2 had higher than average expression in HepG2 cells, but GABP did not show such an association in HepG2. We also applied Co-IP as in paper I to look for possible interactions between HNF4α, FOXA2 and GABP and found that these factors were interacting in vivo, but we did not see any interaction with USF2 apart from the ChIP-seq overlap between the data sets.

In conclusion, we showed that ChIP-seq is a powerful method to map the regulatory regions in the genome. Apart from this, ChIP-seq can be used to identify SNPs associated with common diseases in TF binding regions.

Paper III
Nucleosome regulatory dynamics in response to TGF-beta treatment in HepG2 cells

In this paper, we performed high-throughput sequencing of nucleosomal DNA and total RNA using the SOLiD platform before and after 1 hour of TGFβ treatment. When we looked at the nucleosome positioning pattern around the TSS, we found a well-positioned nucleosome downstream of the TSS, another well-positioned nucleosome upstream of the TSS and a nucleosome free region between them. However, we did not see any flanking positioned nucleosomes at 33% of TSSs. We also observed a well-positioned nucleosome at exons, as seen before in CD4+ T cells. There was a higher density of nucleosomes in exons than in introns and intergenic regions. When we separated the nucleosomes based on their fuzziness, exonic regions contained more fuzzy nucleosomes than phased nucleosomes. This is compatible with the fact that there are AT-rich sequences flanking exons which disfavor positioned nucleosomes.
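The distinction between fuzzy and phased nucleosomes can be illustrated with a simple score. The Python sketch below is an assumption-laden example and not the procedure of paper III: it takes fuzziness to be the standard deviation of the read midpoints assigned to one nucleosome, and the function names, the 20 bp cut-off and the coordinates are hypothetical.

```python
from statistics import pstdev


def fuzziness(midpoints):
    """Spread of read midpoints around one nucleosome (population std. dev.)."""
    return pstdev(midpoints)


def classify_nucleosome(midpoints, max_sd=20.0):
    """Label a nucleosome 'phased' if its reads cluster tightly, else 'fuzzy'.

    max_sd is a hypothetical cut-off in bp chosen for illustration.
    """
    return "phased" if fuzziness(midpoints) <= max_sd else "fuzzy"


if __name__ == "__main__":
    # Hypothetical midpoints of reads assigned to two nucleosomes.
    tight = [1000, 1002, 998, 1001, 999]
    spread = [960, 1040, 1000, 980, 1020]
    print(classify_nucleosome(tight))
    print(classify_nucleosome(spread))
```

A small spread means that the nucleosome occupies nearly the same position in most cells of the population, whereas a large spread indicates a position that varies from cell to cell.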
When we looked at the regulatory potential of TGFβ1 in HepG2 cells, 81% of the TGFβ+ nucleosome positions matched those in TGFβ- cells and a surprising 11% did not match. After strict filtering of the data, 24 318 nucleosome depleted loci were identified in TGFβ+ cells. When we looked at the expression of genes with depleted nucleosomes in their vicinity, 20% of those genes changed their expression after TGFβ1 treatment. To understand the role of TGFβ in regulation, we searched for TF motifs from the JASPAR database at nucleosome depleted regions and found that these regions were highly enriched with TF motifs. Some of these TFs, for example SPI1, SMADs, FEV, ELF5 and HNF4α, are regulated by TGFβ1. By coupling these results with the RNA-seq expression changes after TGFβ1 treatment, we found that many nucleosomes with TF motifs that changed positions were related to gene expression changes. Of the nucleosomes in introns that had TF motifs and that were depleted after treatment, 61% were associated with a change in exon expression. Out of 2 469 intronic distal loci, 1 173 were associated with expression change in 306 genes. When we compared the expression before and after TGFβ1 treatment, 591 genes were up-regulated and 196 were down-regulated. Genes with neuronal function, ion channels and ion transporters, neurotransmitters and G-protein coupled receptors were up-regulated, and the TFs that were known to be regulated by TGFβ1 were among the most up-regulated genes. Using HNF4α ChIP-seq data from paper II, we identified 37 candidate HNF4α binding sites at regions that were nucleosome depleted in TGFβ- cells compared to TGFβ+ cells, and the ChIP-qPCR validations supported our findings.

In summary, our data showed that 1 hour of TGFβ1 treatment depleted many nucleosomes and that some of these depletions were associated with changes in the expression of genes. In addition, chromatin is dynamic and plays an important role in transcriptional regulation in general.
TGFβ1 therefore has an important impact on chromatin organization and function.

Paper IV
ChIP-seq in steatohepatitis and normal liver tissue identifies candidate disease mechanism related to progression to cancer

To learn more about disease mechanisms, we mapped the active chromatin marks H3K4me1, H3K4me3 and H3K27ac and the TF USF1 in ASH patients and in controls using ChIP-seq. We identified 2 054 USF1 peaks in the control and 1 766 in ASH; of these, 44% of the peaks in the control and 54% in ASH contained a USF1 motif. Peak calling for chromatin marks gave 10 256 H3K4me1, 14 668 H3K4me3 and 11 656 H3K27ac peaks in the control sample and 10 674 H3K4me1, 18 358 H3K4me3 and 12 385 H3K27ac peaks in the ASH sample. For each histone modification, the top 1 000 differentially enriched peaks were identified. When we looked at the genomic localization of differentially enriched histone modification peaks within +/-2 kb of a TSS, a majority of the H3K4me3 peaks were located at TSSs as expected, whereas the peaks for H3K4me1 and H3K27ac were distributed over both TSS and non-TSS regions. Surprisingly, there was an increased number of peaks in the ASH patient compared to the control. There were 1 600 genes that contained histone modification peaks only in the control and 3 000 genes that contained peaks only in the patient. Comparison of differentially enriched USF1 peaks with histone modification peaks gave 7 overlaps, which suggests that USF1 is just one among many factors that may contribute to differential gene regulation.

Gene ontology analysis of genes associated with differentially enriched histone modification peaks revealed that the genes from the control sample were enriched for different metabolic processes whereas the genes from the ASH patient sample were enriched for cancer associated genes. This suggests that the disease in this patient is progressing towards cancer. We observed different genes associated with both ASH and NASH, which indicates that common pathophysiological mechanisms exist.
We looked at the pattern of histone modifications at the genes involved in alcohol metabolism and disease (CD14, TLR4, TNFα and TGFβ) and found evidence that they contribute to disease in this case. Using SNP calling in differentially enriched regions, we identified 178 potentially functional SNPs that are present in the GWAS catalogue; in addition, 1 237 GWAS SNPs with high LD were identified in the patient and 456 in the control. The majority of these SNPs were located at binding motifs of different TFs and at DHS sites, which suggests a regulatory potential for these SNPs. Some of these SNPs are within the binding motifs of TFs associated with ASH and NASH. We also identified 383 novel SNPs in the ASH sample that might be involved in the etiology of the disease. RNA expression measured by qPCR at differentially enriched USF1 sites did not follow the USF1 ChIP-seq pattern. We compared the histone modification ChIP-seq data with RNA expression measured by qPCR and found that RNA expression at the majority of the sites followed the ChIP-seq pattern. In summary, our study identified many genes associated with liver disorders and we also found many genes associated with cancer pathways. This illustrates that analysis of chromatin marks in disease tissue may provide useful insights into disease mechanisms.

Concluding remarks and future perspectives

For this thesis, we used ChIP with next generation sequencing to map TFs, histone marks and nucleosome positions and to determine gene expression. Rapid advances in NGS technology have allowed the mapping of regions in the genome in a short time span. Due to high competition, the price of sequencing and reagents has dropped rapidly. At the time of our ChIP-seq papers I and II [10, 119], very few genome-wide ChIP-seq studies had been performed. Today there are thousands of ChIP-seq datasets, mainly from the ENCODE project, and many additional ones have been published.
We have mapped FOXA TFs, HNF4α, GABP, USF1 and a few histone marks by ChIP-seq. In papers I and II, we showed that ChIP-seq can be used not only to map TF binding sites, but also to identify protein-protein interactions genome-wide. We also developed a method to identify allele-specific binding of a TF or histone modification and identified many functional SNPs. During the time of this thesis work, it was found that the TFs studied in papers I and II are sufficient to reprogram fibroblasts to liver cells, which illustrates the importance of these factors. In paper III, we mapped the nucleosome positions across the HepG2 genome before and after TGFβ1 treatment and measured the nucleosome dynamics in response to TGFβ1. We saw a change in RNA expression of genes at nucleosome-depleted regions after treatment. Nucleosome positioning data are valuable for delineating gene regulatory regions. After the extensive studies in the liver cell line HepG2, we focused on liver tissue from ASH patients and controls. We identified differences in histone marks which provide insights into ASH disease mechanisms and, in this case, its progression towards cancer. There was a correlation between ChIP-seq of histone marks and RNA expression, which suggests that chromatin marks could be used to gain further insights into disease processes. Advances in NGS technology have facilitated the introduction of new ChIP-based methods. ChIA-PET was developed to map binding sites for nuclear proteins and long-range chromatin interactions globally and has been applied to TFs and RNA Pol II [120, 121]. ChIP-exo was developed to map TFs at single base-pair resolution [122] and will be useful for identifying TF binding sites more precisely. There are ~1 850 TFs and it is not an easy task to map all TFs in one cell, and even more demanding to map them in all cell types and tissues. Consequently there is a need to develop high-throughput methods.
Recently an HT-ChIP method was developed for the preparation of sequencing library samples using a pipetting robot [123]. Our group also developed a new HT-ChIP library preparation method with which we can prepare many ChIP libraries in a short time span and perhaps more efficiently than with any other method (manuscript in preparation). A lot has happened in genetics and genomics since the discovery of the structure of DNA exactly 60 years ago, but we still need answers to many questions. Each and every regulatory element in the genome probably has an important function, and identification of their locations and the gene(s) they act on will provide useful information. We still need to understand the epigenetic behavior of cancer cells and normal cells, and high-throughput mapping of several TFs in tumors by ChIP-seq may be a good strategy for understanding disease mechanisms. Mapping of regulatory regions, non-coding RNAs and mRNA in dynamic conditions, i.e. after treatment with different biological agents, may provide clues for many diseases. NGS technology will soon play a great role in disease diagnostics in patients, since one can sequence the entire genome in one or a few days using novel bench-top instruments. Today interpretation of genomic sequences is limited to the protein coding regions, but soon we will also be able to deduce the functional consequences of variation in gene regulatory regions.

Acknowledgements

This thesis work would not have been possible without the following people. I would like to express my deepest gratitude towards the people who helped me directly or indirectly. First of all, I would like to wholeheartedly thank my supervisor, Professor Claes Wadelius, who gave me the opportunity to pursue my PhD in his group. I am very thankful for your immense support, valuable discussions and suggestions. I would also like to thank my co-supervisor Professor Jan Komorowski for great collaboration and group meetings.
I am very thankful to the former members of the group. Mehdi Motallebipour, for all your help during my early days in the lab. Ola Wallerman, for taking care of large datasets and for all the help whenever needed. The present members of the lab, Gang Pan, for all the discussions we have during lunch and work. Marco Cavalli and Helena Nord, for all your help and fun in the lab. Thanks to the present and former members of Komorowski’s lab, especially Susanne, Stefan and Robin, for all the data analysis and the group meetings we have together. Professor Marta Alarcon, who gave me the opportunity to work in her lab. Sergey Kozyrev, for all your suggestions and help. Special thanks to Hong Yin, who introduced me to Claes. Lena Åslund, for all the help during the teaching period and for the discussions in the journal club. Marcela Ferella, for being a good friend in the office. I am grateful to the members of Uppsala Genome Center (Ulf, Inger, Katherina and Cecelia) and the Seq & SNP platform for the sequencing service. Thanks to the people at Life Technologies, California, for the great collaboration. Thanks to all other Rudbeck friends, Angelica, Ammar, Nikki, Hamid, Sara E, Tanmoy, Jitendra and Kali. Thanks to all Indian friends in Uppsala who made my stay easier and more fun. Chandu-Geeta and Kiran K-Madhuri, for always being there to help our family in all aspects. Ravi-Sushma, good luck with your thesis and baby. Ramesh-Laura, for your kindness. Srisailam-Swathi, for being good friends. Kiran J-Bhavya, for your help to Akshara. Raghuveer, wish you good luck with the new innings. Nallan-Gayatri, good luck with your thesis, Nallan. Sudhakar-Saritha, all the best in Canada. Rambabu-Lakshmi, thank you. Thanks to all my friends in India, Vidya, Renuka, Aruna, C.V, Raja.B, Raja.O and Ramana. Thank you so much, Siva, for always taking care of my Tirumala trip. Thanks to my sisters and brothers-in-law, Aruna-Nageswar Reddy and Prameela-Prathap, for all your help.
Eswar Reddy mama, for always being there for anything. Peddnanna, Jayarami Reddy, for all your support, and Sesha Reddy anna, for being more a good friend than a brother. My wife’s parents, Balarami Reddy mama and Ratna atha, for all your help and support. My Padmavathi pinni, for all your care during my childhood and school days. Thanks to my parents, Nagi Reddy and Lakshmidevi, for your unconditional support even when my interests were beyond your limits. Simple thanks may not be enough. Many thanks to my wife, Vijayalakshmi, for all the support and love. Finally, my little daughter Akshara, for all the joy you brought into our lives.

References

1. Watson, J.D. and F.H. Crick, Molecular structure of nucleic acids; a structure for deoxyribose nucleic acid. Nature, 1953. 171: p. 737-738. 2. Lander, E.S., et al., Initial sequencing and analysis of the human genome. Nature, 2001. 409(6822): p. 860-921. 3. Venter, J.C., et al., The sequence of the human genome. Science (New York, N.Y.), 2001. 291(5507): p. 1304-51. 4. International Human Genome Sequencing Consortium, Finishing the euchromatic sequence of the human genome. Nature, 2004. 431: p. 931-945. 5. HapMap, [International HapMap project]. Nature, 2005. 426: p. 789-796. 6. Abecasis, G.R., et al., A map of human genome variation from population-scale sequencing. Nature, 2010. 467(7319): p. 1061-73. 7. Abecasis, G.R., et al., An integrated map of genetic variation from 1,092 human genomes. Nature, 2012. 491(7422): p. 56-65. 8. Fu, W., et al., Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants. Nature, 2013. 493(7431): p. 216-20. 9. Girirajan, S., C.D. Campbell, and E.E. Eichler, Human Copy Number Variation and Complex Genetic Disease, in Annual Review Genetics, Vol 45, B.L. Bassler, M. Lichten, and G. Schupbach, Editors. 2011, Annual Reviews: Palo Alto. p. 203-226.
10. Motallebipour, M., et al., Differential binding and co-binding pattern of FOXA1 and FOXA3 and their relation to H3K4me3 in HepG2 cells revealed by ChIP-seq. Genome biology, 2009. 10(11): p. R129. 11. Kasowski, M., et al., Variation in Transcription Factor Binding Among Humans. Science, 2010. 328(5975): p. 232-235. 12. McDaniell, R., et al., Heritable Individual-Specific and Allele-Specific Chromatin Signatures in Humans. Science, 2010. 328(5975): p. 235-239. 13. Birney, E., et al., Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature, 2007. 447(7146): p. 799-816. 14. Myers, R.M., et al., A user's guide to the encyclopedia of DNA elements (ENCODE). PLoS biology, 2011. 9(4): p. e1001046. 15. Dunham, I., et al., An integrated encyclopedia of DNA elements in the human genome. Nature, 2012. 489(7414): p. 57-74. 16. Roy, S., et al., Identification of functional elements and regulatory circuits by Drosophila modENCODE. Science (New York, N.Y.), 2010. 330(6012): p. 1787-97. 17. Gerstein, M.B., et al., Integrative analysis of the Caenorhabditis elegans genome by the modENCODE project. Science (New York, N.Y.), 2010. 330(6012): p. 1775-87. 18. Drygin, D., W.G. Rice, and I. Grummt, The RNA polymerase I transcription machinery: an emerging target for the treatment of cancer. Annual review of pharmacology and toxicology, 2010. 50: p. 131-56. 19. Hahn, S., Structure and mechanism of the RNA polymerase II transcription machinery. Nature structural & molecular biology, 2004. 11(5): p. 394-403. 20. Raha, D., et al., Close association of RNA polymerase II and many transcription factors with Pol III genes. Proceedings of the National Academy of Sciences of the United States of America, 2010. 107(8): p. 3639-44. 21. Rhee, H.S. and B.F. Pugh, Genome-wide structure and organization of eukaryotic pre-initiation complexes. Nature, 2012. 483(7389): p. 295-301. 22.
Kutter, C., et al., Pol III binding in six mammals shows conservation among amino acid isotypes despite divergence among tRNA genes. Nature genetics, 2011. 43(10). 23. Sandelin, A., et al., Mammalian RNA polymerase II core promoters: insights from genome-wide studies. Nature reviews. Genetics, 2007. 8(6): p. 424-36. 24. Gagniuc, P. and C. Ionescu-Tirgoviste, Eukaryotic genomes may exhibit up to 10 generic classes of gene promoters. BMC Genomics (ISSN 1471-2164, electronic). 25. Butler, J.E.F. and J.T. Kadonaga, The RNA polymerase II core promoter: a key component in the regulation of gene expression. Genes & development, 2002. 16(20): p. 2583-92. 26. Cooper, S.J., et al., Comprehensive analysis of transcriptional promoter structure and function in 1% of the human genome. Genome research, 2006. 16(1): p. 1-10. 27. Visel, A., et al., ChIP-seq accurately predicts tissue-specific activity of enhancers. Nature, 2009. 457(7231): p. 854-8. 28. Sanyal, A., et al., The long-range interaction landscape of gene promoters. Nature, 2012. 489(7414): p. 109-113. 29. Arnold, C.D., et al., Genome-Wide Quantitative Enhancer Activity Maps Identified by STARR-seq. Science (New York, N.Y.), 2013. 1074. 30. Maston, G.A., S.K. Evans, and M.R. Green, Transcriptional regulatory elements in the human genome. Annual review of genomics and human genetics, 2006. 7: p. 29-59. 31. Li, Q., et al., Locus control regions. Blood, 2002. 100(9): p. 3077-86. 32. Woychik, N.A. and M. Hampsey, The RNA polymerase II machinery: structure illuminates function. Cell, 2002. 108(4): p. 453-63. 33. Tóth-Petróczy, A., et al., Malleable machines in transcription regulation: the mediator complex. PLoS Computational Biology, 2008. 4(12): p. e1000243. 34. Thurman, R.E., et al., The accessible chromatin landscape of the human genome. Nature, 2012. 489(7414): p. 75-82. 35. Neph, S., et al., An expansive human regulatory lexicon encoded in transcription factor footprints. Nature, 2012. 489(7414): p.
83-90. 36. Maurano, M.T., et al., Systematic localization of common disease-associated variation in regulatory DNA. Science (New York, N.Y.), 2012. 337(6099): p. 1190-5. 37. Ong, C.T. and V.G. Corces, Enhancer function: new insights into the regulation of tissue-specific gene expression. Nature Reviews Genetics, 2011. 12(4): p. 284-293. 38. Wang, Z., M. Gerstein, and M. Snyder, RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet, 2009. 10(1): p. 57-63. 39. Ambros, V., MicroRNA pathways in flies and worms: growth, death, fat, stress, and timing. Cell, 2003. 113(6): p. 673-6. 40. Bartel, D.P., R. Lee, and R. Feinbaum, MicroRNAs: Genomics, Biogenesis, Mechanism, and Function. Cell, 2004. 116: p. 281-297. 41. Yekta, S., I.H. Shih, and D.P. Bartel, MicroRNA-directed cleavage of HOXB8 mRNA. Science (New York, N.Y.), 2004. 304(5670): p. 594-6. 42. Banales, J.M., et al., Up-regulation of microRNA 506 leads to decreased Cl−/HCO3− anion exchanger 2 expression in biliary epithelium of patients with primary biliary cirrhosis. Hepatology, 2012. 56(2): p. 687-697. 43. Pasquinelli, A.E., S. Hunter, and J. Bracht, MicroRNAs: a developing story. Current opinion in genetics & development, 2005. 15(2): p. 200-5. 44. Sayed, D. and M. Abdellatif, MicroRNAs in Development and Disease. Physiological reviews, 2011. 91(3): p. 827-87. 45. Zheng, F., et al., The putative tumour suppressor microRNA-124 modulates hepatocellular carcinoma cell aggressiveness by repressing ROCK2 and EZH2. Gut, 2012. 61(2): p. 278-89. 46. Djebali, S., et al., Landscape of transcription in human cells. Nature, 2012. 489(7414): p. 101-8. 47. Batista, Pedro J. and Howard Y. Chang, Long Noncoding RNAs: Cellular Address Codes in Development and Disease. Cell, 2013. 152(6): p. 1298-1307. 48. Hindorff, L.A., et al., Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proceedings of the National Academy of Sciences, 2009.
106(23): p. 9362-9367. 49. Guttman, M., et al., Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature, 2009. 458(7235): p. 223-7. 50. Schorderet, P. and D. Duboule, Structural and functional differences in the long non-coding RNA hotair in mouse and human. PLoS genetics, 2011. 7(5): p. e1002071. 51. Khalil, A.M., et al., Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proceedings of the National Academy of Sciences of the United States of America, 2009. 106(28): p. 11667-72. 52. Tsai, M.-C., et al., Long noncoding RNA as modular scaffold of histone modification complexes. Science (New York, N.Y.), 2010. 329(5992): p. 689-93. 53. Kadonaga, J.T., Regulation of RNA polymerase II transcription by sequence-specific DNA binding factors. Cell, 2004. 116(2): p. 247-57. 54. Jolma, A., et al., DNA-Binding Specificities of Human Transcription Factors. Cell, 2013. 152(1-2): p. 327-39. 55. Garvie, C.W. and C. Wolberger, Recognition of Specific DNA Sequences. Molecular Cell, 2001. 8(5): p. 937-946. 56. Kaestner, K.H., Transcriptional Networks in Mammalian Gene Regulation. 2000, Landes Bioscience. 57. Odom, D.T., et al., Control of pancreas and liver gene expression by HNF transcription factors. Science (New York, N.Y.), 2004. 303(5662): p. 1378-81. 58. Neph, S., et al., Circuitry and dynamics of human transcription factor regulatory networks. Cell, 2012. 150(6): p. 1274-86. 59. Chandra, V., et al., Multidomain integration in the structure of the HNF-4α nuclear receptor complex. Nature, 2013. 495(7441): p. 394-398. 60. Lee, Tong I. and Richard A. Young, Transcriptional Regulation and Its Misregulation in Disease. Cell, 2013. 152(6): p. 1237-1251. 61. Grant, S.F.A., et al., Variant of transcription factor 7-like 2 (TCF7L2) gene confers risk of type 2 diabetes. Nature genetics, 2006. 38(3): p. 320-3. 62.
Gaulton, K.J., et al., A map of open chromatin in human pancreatic islets. Nature genetics, 2010. 42(3): p. 255-9. 63. Takahashi, K. and S. Yamanaka, Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell, 2006. 126(4): p. 663-76. 64. Takahashi, K., et al., Induction of pluripotent stem cells from adult human fibroblasts by defined factors. Cell, 2007. 131(5): p. 861-72. 65. Huang, P., et al., Induction of functional hepatocyte-like cells from mouse fibroblasts by defined factors. Nature, 2011. 475(7356): p. 386-389. 66. Sekiya, S. and A. Suzuki, Direct conversion of mouse fibroblasts to hepatocyte-like cells by defined factors. Nature, 2011. 475(7356): p. 390-393. 67. Zaret, K.S. and J.S. Carroll, Pioneer transcription factors: establishing competence for gene expression. Genes & Development, 2011. 25(21): p. 2227-2241. 68. Lupien, M., et al., FoxA1 translates epigenetic signatures into enhancer-driven lineage-specific transcription. Cell, 2008. 132(6): p. 958-70. 69. Rada-Iglesias, A., et al., Binding sites for metabolic disease related transcription factors inferred at base pair resolution by chromatin immunoprecipitation and genomic microarrays. Human molecular genetics, 2005. 14(22): p. 3435-47. 70. Schrem, H., J. Klempnauer, and J. Borlak, Liver-enriched transcription factors in liver function and development. Part I: the hepatocyte nuclear factor network and liver-specific gene expression. Pharmacological reviews, 2002. 54(1): p. 129-58. 71. Rha, G.B., et al., Multiple binding modes between HNF4alpha and the LXXLL motifs of PGC-1alpha lead to full activation. The Journal of biological chemistry, 2009. 284(50): p. 35165-76. 72. Petrescu, A.D., et al., Ligand specificity and conformational dependence of the hepatic nuclear factor-4alpha (HNF-4alpha ). The Journal of biological chemistry, 2002. 277(27): p. 23988-99. 73. 
Walesky, C., et al., Hepatocyte-specific deletion of hepatocyte nuclear factor 4alpha in adult mice results in increased hepatocyte proliferation. Am J Physiol Gastrointest Liver Physiol, 2013. 304(1): p. 25. 74. Flanagan, S.E., et al., Diazoxide-responsive hyperinsulinemic hypoglycemia caused by HNF4A gene mutations. Eur J Endocrinol, 2010. 162(5): p. 987-92. 75. Kapoor, R.R., et al., Hyperinsulinaemic hypoglycaemia. Arch Dis Child, 2009. 94(6): p. 450-7. 76. Cho, Y.S., et al., Meta-analysis of genome-wide association studies identifies eight new loci for type 2 diabetes in east Asians. Nat Genet, 2011. 44(1): p. 67-72. 77. Rosmarin, A., GA-binding protein transcription factor: a review of GABP as an integrator of intracellular signaling and protein–protein interactions. Blood Cells, Molecules, and Diseases, 2004. 32(1): p. 143-154. 78. West, A.G., Chromatin insulator elements: establishing barriers to set heterochromatin boundaries. 2012. 4: p. 67-80. 79. Pajukanta, P., et al., Familial combined hyperlipidemia is associated with upstream transcription factor 1 (USF1). Nature genetics, 2004. 36(4): p. 371-6. 80. Bohacek, J. and I.M. Mansuy, Epigenetic Inheritance of Disease and Disease Risk. Neuropsychopharmacology, 2013. 38(1): p. 220-236. 81. Kornberg, R.D., Chromatin Structure: A Repeating Unit of Histones and DNA. Science (New York, N.Y.), 1974. 184: p. 3-6. 82. Peterson, C.L. and M.-A. Laniel, Histones and histone modifications. Current biology : CB, 2004. 14(14): p. R546-51. 83. Berger, S.L., The complex language of chromatin regulation during transcription. Nature, 2007. 447(7143): p. 407-12. 84. Dekker, J., et al., Capturing Chromosome Conformation. Science, 2002. 295(5558): p. 1306-1311. 85. Lieberman-Aiden, E., et al., Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome. Science, 2009.
326(5950): p. 289-293. 86. Felsenfeld, G. and M. Groudine, Controlling the double helix. Nature, 2003. 421(6921): p. 448-453. 87. Jiang, C. and B.F. Pugh, Nucleosome positioning and gene regulation: advances through genomics. Nat Rev Genet, 2009. 10(3): p. 161-172. 88. Andersson, R., et al., Nucleosomes are well positioned in exons and carry characteristic histone modifications. Genome research, 2009. 19(10): p. 1732-41. 89. Kolasinska-Zwierz, P., et al., Differential chromatin marking of introns and expressed exons by H3K36me3. Nature genetics, 2009. 41(3): p. 376-81. 90. Fischle, W., Y. Wang, and C.D. Allis, Histone and chromatin cross-talk. Current Opinion in Cell Biology, 2003. 15(2): p. 172-183. 91. Marks, P., et al., Histone deacetylases and cancer: causes and therapies. Nature reviews. Cancer, 2001. 1(3): p. 194-202. 92. Santos-Rosa, H., et al., Active genes are tri-methylated at K4 of histone H3. Nature, 2002. 419(September): p. 407-411. 93. Martin, D.G.E., et al., The Yng1p plant homeodomain finger is a methylhistone binding module that recognizes lysine 4-methylated histone H3. Molecular and cellular biology, 2006. 26(21): p. 7871-9. 94. Barski, A., et al., High-resolution profiling of histone methylations in the human genome. Cell, 2007. 129(4): p. 823-37. 95. Wang, Z., et al., Combinatorial patterns of histone acetylations and methylations in the human genome. Nature genetics, 2008. 40(7): p. 897-903. 96. Karlić, R., et al., Histone modification levels are predictive for gene expression. Proceedings of the National Academy of Sciences of the United States of America, 2010. 107(7): p. 2926-31. 97. Creyghton, M.P., et al., Histone H3K27ac separates active from poised enhancers and predicts developmental state. Proceedings of the National Academy of Sciences of the United States of America, 2010. 107(50): p. 21931-6. 98. Robertson, K.D., DNA methylation and human disease. Nature reviews. Genetics, 2005. 6(8): p. 597-610. 99.
Lienert, F., et al., Identification of genetic elements that autonomously determine DNA methylation states. Nature Genetics, 2011. 43(11): p. 1091-1097. 100. Stadler, M.B., et al., DNA-binding factors shape the mouse methylome at distal regulatory regions. Nature, 2011. 480(7378): p. 490-5. 101. Irizarry, R.A., et al., The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores. Nat Genet, 2009. 41(2): p. 178-186. 102. Pollard, S.M., S.H. Stricker, and S. Beck, A Shore Sign of Reprogramming. Cell Stem Cell, 2009. 5(6): p. 571-572. 103. Portela, A. and M. Esteller, Epigenetic modifications and human disease. Nat Biotech, 2010. 28(10): p. 1057-1068. 104. Smith, Z.D. and A. Meissner, DNA methylation: roles in mammalian development. Nature Reviews Genetics, 2013. 14(3): p. 204-220. 105. Volkmar, M., et al., DNA methylation profiling identifies epigenetic dysregulation in pancreatic islets from type 2 diabetic patients. EMBO J, 2012. 31(6): p. 1405-1426. 106. Liu, Y., et al., Epigenome-wide association data implicate DNA methylation as an intermediary of genetic risk in rheumatoid arthritis. Nat Biotech, 2013. 31(2): p. 142-147. 107. Lee, Jeannie T. and Marisa S. Bartolomei, X-Inactivation, Imprinting, and Long Noncoding RNAs in Health and Disease. Cell, 2013. 152(6): p. 1308-1323. 108. Plass, C. and P.D. Soloway, DNA methylation, imprinting and cancer. European Journal of Human Genetics, 2002: p. 6-16. 109. Gartler, S.M., X-Chromosome Inactivation. Life Sciences, 2001: p. 1-6. 110. Okamoto, I. and E. Heard, Lessons from comparative analysis of X-chromosome inactivation in mammals. Chromosome research : an international journal on the molecular, supramolecular and evolutionary aspects of chromosome biology, 2009. 17(5): p. 659-69. 111. Massagué, J., TGFβ signalling in context. Nature reviews. Molecular cell biology, 2012. 13(10): p. 616-30. 112.
Kubiczkova, L., et al., TGF-β - an excellent servant but a bad master. Journal of translational medicine, 2012. 10: p. 183. 113. Akhurst, R.J. and A. Hata, Targeting the TGFβ signalling pathway in disease. Nature reviews. Drug discovery, 2012. 11(10): p. 790-811. 114. Dooley, S. and P. ten Dijke, TGF-β in progression of liver disease. Cell and tissue research, 2012. 347(1): p. 245-56. 115. Lucey, M.R., P. Mathurin, and T.R. Morgan, Alcoholic hepatitis. The New England journal of medicine, 2009. 360(26): p. 2758-69. 116. Zakhari, S., Overview: how is alcohol metabolized by the body? Alcohol research & health : the journal of the National Institute on Alcohol Abuse and Alcoholism, 2006. 29(4): p. 245-54. 117. Collas, P. and J.A. Dahl, Chop it, ChIP it, check it: the current status of chromatin immunoprecipitation. Front Biosci., 2008. 1;13:929-4. 118. Margulies, M., et al., Genome sequencing in microfabricated high-density picolitre reactors. Nature, 2005. 437(7057): p. 376-80. 119. Wallerman, O., et al., Molecular interactions between HNF4a, FOXA2 and GABP identified at regulatory DNA elements through ChIP-sequencing. Nucleic acids research, 2009. 37(22): p. 7498-508. 120. Fullwood, M.J., et al., An oestrogen-receptor-α-bound human chromatin interactome. Nature, 2009. 462(7269): p. 58-64. 121. Li, G., et al., Extensive promoter-centered chromatin interactions provide a topological basis for transcription regulation. Cell, 2012. 148(1-2): p. 84-98. 122. Rhee, Ho S. and B.F. Pugh, Comprehensive Genome-wide Protein-DNA Interactions Detected at Single-Nucleotide Resolution. Cell, 2011. 147(6): p. 1408-1419. 123. Garber, M., et al., A high-throughput chromatin immunoprecipitation approach reveals principles of dynamic gene regulation in mammals. Molecular cell, 2012. 47(5): p. 810-22.
Acta Universitatis Upsaliensis Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine 904 Editor: The Dean of the Faculty of Medicine A doctoral dissertation from the Faculty of Medicine, Uppsala University, is usually a summary of a number of papers. A few copies of the complete dissertation are kept at major Swedish research libraries, while the summary alone is distributed internationally through the series Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine. Distribution: publications.uu.se urn:nbn:se:uu:diva-198579 ACTA UNIVERSITATIS UPSALIENSIS UPPSALA 2013