University of Iowa
Iowa Research Online
Theses and Dissertations
Fall 2012

Global analysis of alternative polyadenylation regulation using high-throughput sequencing
Ji Wan, University of Iowa

Copyright 2012 Ji Wan

This dissertation is available at Iowa Research Online: http://ir.uiowa.edu/etd/3548

Recommended Citation
Wan, Ji. "Global analysis of alternative polyadenylation regulation using high-throughput sequencing." PhD (Doctor of Philosophy) thesis, University of Iowa, 2012. http://ir.uiowa.edu/etd/3548.

Part of the Genetics Commons

GLOBAL ANALYSIS OF ALTERNATIVE POLYADENYLATION REGULATION USING HIGH-THROUGHPUT SEQUENCING
by
Ji Wan

An Abstract of a thesis submitted in partial fulfillment of the requirements for the Doctor of Philosophy degree in Genetics in the Graduate College of The University of Iowa
December 2012
Thesis Supervisor: Associate Professor Yi Xing

ABSTRACT

Messenger RNAs (mRNAs) must undergo a series of post-transcriptional processing steps before translation. One of these steps, 3’ end processing, which consists of cleavage and polyadenylation, is critical for delimiting the 3’ end of the mRNA and for determining which regulatory elements are available for downstream post-transcriptional and translational regulation. Like splicing, another well-characterized mRNA processing step, 3’ end processing is highly flexible owing to the diversity of trans-acting factors and cis-acting elements at the 3’ end of mRNAs. In recent years, both experimental and computational studies have increasingly revealed the differential usage of alternative polyA sites (APA) within the same gene, which produces mRNA isoforms with different 3’ UTRs. More significantly, global changes in 3’ UTR length have been observed in multiple clinical settings, particularly in cancer cells. However, efforts to describe the APA phenomenon have not been matched by efforts to study the mechanisms underlying APA biogenesis.
In this thesis, we first describe the general principles and a pipeline for identifying APA in different biological or clinical conditions using various high-throughput sequencing techniques. We then present work on the global impact on APA regulation of two RNA binding proteins (ESRP and αCP) and one core 3’ end processing factor (CstF64 and its paralog CstF64τ). APA identification and motif analyses suggest that changes in the expression of these proteins are associated with a wide range of APA events in different cell lines. In addition, for each protein, we have collected substantial evidence about the mechanism underlying APA induction. These findings could provide significant insights into the mechanisms of APA regulation.

We also studied the induction of APA in JEG-3 cells in response to changes in oxygen supply (hypoxia and normoxia). Using a robust protocol for specifically sequencing the 3’ end of mRNA, we identified more than 500 APA events and revealed a global shortening of 3’ UTR length in response to hypoxia. The work in this thesis substantially increases the understanding of APA regulation by various proteins and provides new evidence for APA in a clinical condition.

Graduate College
The University of Iowa
Iowa City, Iowa

CERTIFICATE OF APPROVAL

PH.D. THESIS

This is to certify that the Ph.D. thesis of Ji Wan has been approved by the Examining Committee for the thesis requirement for the Doctor of Philosophy degree in Genetics at the December 2012 graduation.

Thesis Committee:
Yi Xing, Thesis Supervisor
Beverly Davidson
Charles Brenner
Jian Huang
Dana Levasseur

ACKNOWLEDGMENTS

I gratefully acknowledge my mentor Yi Xing for his patience, guidance and support. I also acknowledge my committee members Beverly Davidson, Charles Brenner, Jian Huang and Dana Levasseur for their encouragement and advice. I acknowledge my past and current colleagues in the Xing lab (Peng Jiang, Lan Lin, Zhixiang Lu, Juw Won Park, Jinkai Wang, Keyan Zhao, Seth Brown, Hongchao Lu, Shihao Shen, Collin Tokheim) for their intellectual suggestions, support and friendship, which I truly enjoyed. I acknowledge my collaborators Russ Carstens, Kimberly Dittmar, Stephen Liebhaber, Xinjun Ji, Yongsheng Shi and Hayley McLoughlin for their working attitude and for the knowledge I learned from them. Finally, I acknowledge Dan Eberl, Isabelle Hardy, Linda Hurst and Kafer Anita of the genetics program for their timely and generous help in the past four years.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
INTRODUCTION
CHAPTER I INTRODUCTION TO ALTERNATIVE POLYADENYLATION
    3’ end processing pathways
    Alternative 3’ end processing
    Consequences of APA
    Mechanism of APA regulation
    Alternative polyadenylation in disease
CHAPTER II IDENTIFICATION OF APA EVENTS USING HIGH-THROUGHPUT SEQUENCING TECHNIQUES
    Introduction
    Detect APA using generic RNA-Seq
        Filtering artifact terminal exon
        Predefine APA events
        Calling statistically significant APA (RNA-Seq)
    Detect APA using PAS-Seq and DRS
        Read mapping and polyA site calling
        Filtering artificial polyA site due to internal priming
        Clustering heterogeneous polyA sites
        Calling statistically significant APA (DRS/PAS-Seq)
    Summary
CHAPTER III THE POLY-C BINDING PROTEINS ACT AS GLOBAL REGULATORS OF ALTERNATIVE POLYADENYLATION
    Introduction
    Results
        Direct RNA 3’ sequencing of the transcriptome in cells acutely depleted of αCP
        Identification of mRNAs impacted by αCP depletion
        Motif analysis reveals C-rich determinants in the 3’ UTRs of mRNAs impacted by αCP depletion
        αCP impacts on patterns of alternative polyA selection
        Motif analysis of APA events
        αCP2 controls the 3’ processing of its own transcript
        APA pattern changes impacted by αCPs
    Discussion
    Materials and Methods
        Cell culture and siRNA transfection
        Direct RNA sequencing
        Mapping and APA analysis of DRS data
        Detection of differential gene expression
        Motif enrichment analysis
        Gene ontology analysis
        QPCR
        3’RACE
        RNA UV-crosslinking and EMSA
CHAPTER IV CONTEXT-DEPENDENT REGULATION OF APA BY EPITHELIAL SPLICING REGULATORY PROTEINS
    Introduction
    Results
        Identification of ESRP-regulated changes in alternative 3’ end formation by coupling
    Discussion
    Materials and Methods
        Cell Culture, transfection, and transduction
        Library preparation and sequencing
        Identification of ESRP regulated changes in polyadenylation
CHAPTER V GLOBAL REGULATION OF ALTERNATIVE POLYADENYLATION BY CLEAVAGE STIMULATION FACTOR 64 (CSTF64)
    Background
    Global analyses of CstF64-mediated APA regulation
    Discussion
    Materials and Methods
        Cell culture and transfections
        Gel shift assay
        Sequencing and reads mapping
        APA analysis
CHAPTER VI ALTERNATIVE POLYADENYLATION DURING HYPOXIA INDUCTION
    Background
    Results
        Quality control analysis of PAS-Seq data
        Reproducibility of PAS-Seq
        Gene expression correlation between PAS-Seq and RNA-Seq
        Alternative polyadenylation induced by hypoxia
        GO analysis of APA genes
    Discussion
    Materials and Methods
        Sample preparation
        RT-qPCR validation of APA
        Detect APA for multiple samples
        Classification of APA
        Gene enrichment analysis
CHAPTER VII FUTURE DIRECTION
REFERENCES

LIST OF TABLES

Table 1 Characteristic features of different high-throughput sequencing techniques in 3’ end processing study.
Table 2 Example of contingency table used in One vs. Others method.
Table 3 Summary of DRS data of αCP knockdown experiments.
Table 4 Number of differentially expressed genes impacted by αCP depletion.
Table 5 Gene Ontology analysis on DEG genes impacted by αCPs knockdown.
Table 6 List of validated genes having significant changes in overall expression after αCP depletion.
Table 7 Number of APA events impacted by αCPs depletion.
Table 8 Summary of direct RNA sequencing data of ESRPs knockdown.
Table 9 Summary of DRS data of CstF64/τ knockdown experiments.
Table 10 Summary of Hypoxia and MAQC PAS-Seq data.

LIST OF FIGURES

Figure 1 Cis-acting regulatory elements and trans-acting proteins in eukaryotic 3’ end processing.
Figure 2 Classification of APA events by mechanism.
Figure 3 Functional consequence of SE-APA.
Figure 4 Impact of DE-APA on the difference of coding sequence.
Figure 5 Pipeline for detecting significant APA switch for generic RNA-Seq, DRS and PAS-Seq.
Figure 6 Flowchart of precompiling APA events only from gene structure annotation.
Figure 7 siRNA-mediated co-depletion of αCP1 and αCP2 from K562 cells.
Figure 8 Histogram of DRS read lengths of all the four samples (pooled).
Figure 9 Reproducibility of DRS polyA reads.
Figure 10 Heatmap of differentially expressed genes after αCP knockdown.
Figure 11 GO analysis of mRNAs altered in overall expression (DGE levels) by αCP depletion.
Figure 12 Confirmation of differentially expressed genes impacted by αCPs by targeted real time RT-PCR analysis.
Figure 13 Motif analysis within the 3’ UTRs of mRNAs impacted by αCP depletion.
Figure 14 Motif analysis of transcripts undergoing APA in response to αCP depletion.
Figure 15 siRNA-mediated depletion of αCPs alters 3’ processing of the αCP2 transcript.
Figure 16 QPCR validations of APA.
Figure 17 In vitro RNA-protein interaction assay.
Figure 18 Impact of αCPs on expression and alternative polyadenylation of Pol II transcripts.
Figure 19 Outline of the experimental systems and RNA-Seq/DRS protocol used to identify ESRP regulated APA.
Figure 20 Examples of three types of alternative 3’ end formation regulated by the ESRPs.
Figure 21 Example of APA events correlating with host gene expression change.
Figure 22 A functional map for ESRP position-dependent regulation of alternative polyadenylation.
Figure 23 CstF64-mediated global APA regulation.
Figure 24 Comparison of the RNA-binding specificities of CstF64 and CstF64τ.
Figure 25 Comparison of APA changes in CstF64- and CstF64&τ-RNAi cells.
Figure 26 Reproducibility of polyA sites by PAS-Seq.
Figure 27 Clustering of 18 PAS-Seq samples.
Figure 28 Scatterplots of gene expression measured by PAS-Seq and RNA-Seq.
Figure 29 RT-qPCR validation of APA events in hypoxia.
Figure 30 Distribution of 3’ UTR lengths in Hypoxia and Normoxia.
Figure 31 Distribution of expression levels of SE-APA genes.
INTRODUCTION

As the bridge between hard-coded DNA and functional protein during gene expression, messenger RNA (mRNA) processing in eukaryotic cells provides a layer of flexibility that increases regulatory and functional diversity. Understanding the alternative signals and pathways of mRNA processing is therefore one of the central issues in modern biology. The major mRNA processing events include 5’ capping, 3’ cleavage and polyadenylation, splicing and RNA editing [1]. After processing, the mature mRNA consists of three structural segments: the 5’ untranslated region (5’ UTR), the coding sequence (CDS) and the 3’ untranslated region (3’ UTR). The 5’ UTR is involved in translational initiation, the CDS determines the amino acid sequence of the protein, and the 3’ UTR regulates mRNA stability and translational efficiency. The sequences expressed in these three segments are not fixed: the great complexity of pre-mRNA processing pathways in eukaryotic cells generates a plethora of functionally specific mRNA and protein isoforms. This flexibility is a double-edged sword, however, because temporally and spatially abnormal expression of inappropriate isoforms can cause defects in cell function and ultimately lead to disease [2].

This dissertation mainly focuses on issues related to the 3’ UTR isoforms of mRNA. Chapter I introduces the background on the generation, regulation and consequences of 3’ UTR isoforms. Chapter II discusses how high-throughput sequencing techniques can be used to identify differential expression of 3’ UTR isoforms. Chapters III, IV and V discuss to what extent and in what manner RNA binding proteins (αCP and ESRP) and a core processing factor (CstF64/CstF64τ) regulate different 3’ UTR isoforms. Finally, Chapter VI discusses the differential expression of 3’ UTR isoforms between hypoxia and normoxia.
CHAPTER I INTRODUCTION TO ALTERNATIVE POLYADENYLATION

3’ end processing pathways

Formation of the 3’ end is critical to the maturation and post-transcriptional regulation of mRNA. Except for histone mRNAs, most precursor mRNAs (pre-mRNAs) undergo two steps of processing at the 3’ end: cleavage and polyadenylation [3]. Cleavage delimits the 3’ end of the mRNA by cutting the pre-mRNA at the cleavage site (CS), and polyadenylation attaches a stretch of adenosines at the CS, which facilitates nuclear export and confers mRNA stability and translatability. Two major groups of processing factors (core and accessory factors) are recruited to regulate 3’ end processing by interacting with different cis-acting elements at the 3’ end of the pre-mRNA. The endonucleolytic cleavage is catalyzed by the CPSF complex (CPSF30, CPSF73, CPSF100, CPSF160 and hFip1) and the CF complex (CF I and CF II) through recognition of the polyA signal (AAUAAA and its variants), which lies about 10 to 30 nt upstream of the cleavage site. On the other side of the CS, the CstF complex is recruited for cleavage by recognizing a U/GU-rich element, usually located within about 30 nt downstream of the CS (Figure 1). The cleaved mRNA is subsequently extended by a stretch of adenosines with the involvement of poly(A) polymerase (PAP). The length of the polyA tail, typically about 250 nt, is proposed to be determined by the combined action of PABPN1, CPSF and PAP [4]. PolyA tail length is critical for mRNA stability, cellular localization and translational regulation.

In addition to the core processing machinery described above, the footprints of multiple RNA binding proteins (RBPs) or splicing regulators (SRs) have been found in the 3’ UTR, possibly through interactions with other cis-acting elements in the 3’ UTR [5-7]. These proteins are thought to be accessory to the core 3’ end processing machinery.
For example, a considerable portion of the crosslinking and immunoprecipitation sequencing (CLIP-Seq) tags of Nova, a brain-specific SR, mapped to the regions flanking polyA sites in hundreds of mRNAs. In addition, the Nova-specific YCAY-rich motifs were enriched in those flanking regions, supporting a physical interaction between an RBP and a cis-acting element in the 3’ UTR [5].

Figure 1 Cis-acting regulatory elements and trans-acting proteins in eukaryotic 3’ end processing. The mRNA is cleaved at the cleavage site (black triangle) and a polyA tail is then added. There are three functional units in this process (distinguished by color). The first unit consists of the cleavage factor (CF) and cleavage and polyadenylation specificity factor (CPSF) families, which interact with the polyA signal AATAAA about 30 nt upstream of the cleavage site. The second unit is composed of the cleavage stimulation factor (CstF) family and downstream sequence elements (DSE). The last unit represents the interaction between RNA binding proteins (RBP) and upstream sequence elements (USE).

Alternative 3’ end processing

Multiple distinct polyA sites can be used alternatively in the same gene. Analyses of expressed sequence tags (ESTs) indicated that about 40% to 50% of human or mouse genes have multiple polyA sites [8, 9]. A recent study identified nearly 100 proteins in the purified human 3’ processing complex [10]. Such a large machinery suggests extensive alternative processing at the 3’ end of mRNA, which can lead to the alternative usage of different polyA sites in the same gene (APA). Sources of APA include: 1) variable polyA regulatory signals [11, 12]; 2) cell-state- or tissue-specific expression of core 3’ end processing factors; and 3) tissue specificity of accessory factors [13]. Based on the biogenesis pathway, APA events can be classified into two major categories, the second of which is further divided into two sub-categories.
The first major category, Same terminal Exon APA (SE-APA), denotes multiple polyA sites located in the same continuous terminal exon. In contrast, the second category, Different terminal Exon APA (DE-APA), denotes alternative polyA sites located in two non-overlapping terminal exons. DE-APA usually results from the interplay of the polyadenylation and splicing pathways, so it can be further divided into two sub-categories according to how the upstream splice site is used: DE-APA3 is DE-APA coupled with alternative 3’ splice site choice, and DE-APA5 is DE-APA coupled with alternative 5’ splice site choice, as illustrated in Figure 2.

Figure 2 Classification of APA events by mechanism. SE-APA is generated by two sets of polyadenylation regulatory elements within the same 3’ UTR. DE-APA3 is generated when an upstream exon is either spliced to the 3’ splice site of an exon (orange) bearing a functional polyA site, or skips that entire exon and is spliced to the 3’ splice site of a downstream exon (red). DE-APA5 is created when a site in an exon (orange) either serves as a functional polyA site at the end of the same exon or is used as a 5’ splice site and spliced to a downstream exon.

Consequences of APA

Genome-wide analyses have recently revealed a wide range of tissue-specific, cell-state-specific and disease-related APA events in both human and mouse [14-16]. Understanding the biological consequences of the different types of APA events (SE-APA, DE-APA3 and DE-APA5) is important for many aspects of post-transcriptional and translational regulation of mRNA.

SE-APA generates two mRNA isoforms that differ only in the 3’ UTR. Because the 3’ UTR contains many regulatory elements for post-transcriptional and translational control, SE-APA isoforms can be subject to different post-transcriptional and translational regulation (Figure 3) [17]. First, a representative example of the effect of SE-APA on post-transcriptional regulation is brain-derived neurotrophic factor (BDNF), a key regulatory protein for neuronal structure and function. BDNF has two mRNA isoforms that encode the same protein and differ only in the 3’ UTR. The isoform with the longer 3’ UTR is preferentially localized to dendrites, while the short isoform is restricted to the cell soma [18]. This differential localization is functionally meaningful: it was further shown that truncating the long 3’ UTR impairs the dendritic targeting of BDNF mRNA [18]. Second, SE-APA isoforms can differ in translational efficiency, as demonstrated in a study of the gene Polo, in which transgenic flies expressing only the less efficiently translated isoform did not survive [19]. Finally, SE-APA isoforms are subject to differential regulation by microRNAs, which results in differential protein expression [16].

Figure 3 Functional consequence of SE-APA. Two mRNA isoforms that differ only in the 3’ UTR are shown. The isoform with the extended UTR (bottom) bears additional microRNA target sites and binding sequences for regulatory proteins, which could affect mRNA stability, translation efficiency, mRNA export and cellular localization.

DE-APA, on the other hand, can affect not only the 3’ UTR but also the CDS, leading to two different protein isoforms (Figure 4). For example, the IgM heavy chain gene produces two isoforms through DE-APA in B cells. In resting B cells, a membrane-bound isoform corresponding to the distal polyA site is preferentially expressed, whereas in activated B cells, a secreted isoform using the proximal polyA site is more highly expressed [20]. Recently, a truncated form of glutamyl-prolyl tRNA synthetase (EPRS) was found to be generated by a “Tyrosine codon to stop codon conversion” mechanism (PAY*).
The C-terminally truncated isoform can protect its targets from translational repression by GAIT (gamma-interferon-activated inhibitor of translation), which has been suggested to maintain basal levels of pro-inflammatory proteins for tissue health and organismal advantage [21].

Figure 4 Impact of DE-APA on the difference of coding sequence. The common coding sequence is shown in green, while the isoform-specific coding sequences of the two DE-APA isoforms are shown in orange and red, respectively. The blue segments are either 5’ UTR or 3’ UTR.

Mechanism of APA regulation

Transcription and pre-mRNA processing are tightly coupled [22]. While 3’ end processing of mRNA is predominantly regulated by a multitude of core and accessory factors, it can also be affected by upstream events such as splicing and transcription. In this section, we review current findings on the regulatory mechanisms underlying APA.

First, the abundance of 3’ end processing factors can act as a determinant of polyA site choice. More specifically, higher expression of 3’ end processing factors can compensate for the weak affinity between these factors and weak polyA cis-acting elements. For example, the DE-APA event of the IgM heavy chain gene responds to the differential expression of CstF64 between resting and activated B cells. As CstF64 expression increases dramatically from the resting to the activated state, CstF64 engages a weaker proximal polyA site (as measured by the affinity between CstF64 and the polyA site) instead of a strong distal polyA site. Conversely, depletion of CstF64 switches 3’ end processing from the proximal to the distal polyA site [23]. In proliferating cells, the transcription factor E2F has been found to regulate APA indirectly by changing the expression of 3’ end processing factors [24].
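The abundance model described above can be illustrated with a deliberately simple equilibrium calculation. The Python sketch below is an illustration only and is not taken from this thesis or the cited studies: it assumes 1:1 binding with fractional occupancy [F]/([F] + Kd), and a hypothetical "first come, first served" rule in which a transcript is cleaved at the proximal site whenever that site is bound by the processing factor F, and at the distal site otherwise. The function names, Kd and concentrations are arbitrary assumptions.

```python
"""Toy model (illustrative assumption): processing-factor abundance vs. proximal polyA site usage."""


def occupancy(factor_conc: float, kd: float) -> float:
    """Equilibrium fraction of a site bound by factor F, assuming simple 1:1 binding."""
    return factor_conc / (factor_conc + kd)


def proximal_usage(factor_conc: float, kd_proximal: float) -> float:
    """Fraction of transcripts cleaved at the proximal site.

    Hypothetical rule: the proximal site is transcribed first, so it is used
    whenever it is bound; otherwise processing falls through to the distal site.
    """
    return occupancy(factor_conc, kd_proximal)


if __name__ == "__main__":
    KD_WEAK_PROXIMAL = 50.0  # arbitrary units; a large Kd means weak affinity
    for conc in (5.0, 50.0, 500.0):  # low, intermediate and high factor levels
        prox = proximal_usage(conc, KD_WEAK_PROXIMAL)
        print(f"[F] = {conc:g}: proximal {prox:.2f}, distal {1 - prox:.2f}")
```

Raising [F] from 5 to 50 to 500 moves usage of the weak proximal site from about 0.09 to 0.50 to 0.91, which qualitatively mirrors the CstF64/IgM example. Real APA regulation need not follow such a simple rule.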
However, a study of CFIm68 presents a seemingly more complicated counter-example to this model. In that study, a proximal polyA site binds CFIm68 (after knockdown) with higher affinity (as measured by the number of A-Seq reads) than the binding between CFIm68 (before knockdown) and a distal polyA site [25]. This may imply a more sophisticated role for 3’ end processing factors in the APA regulation pathway.

Secondly, canonical RNA binding proteins (RBPs) or splicing factors (SFs), which are thought to bind to the CDS or to intronic regions, can also bind to the 3’ UTR and regulate the efficiency of 3’ end processing. One possible mechanism is that RBPs/SFs compete with core 3’ end processing factors for certain cis-acting elements. For instance, PTB can outcompete CstF64 for the U-rich DSE and thereby inhibit the 3’ end processing of -globin [26]. Another mechanism is that RBPs/SFs bind to other cis-acting elements flanking the cleavage sites to enhance 3’ end processing [7, 26]. In this scenario, αCPs bind to an upstream C-rich motif to enhance the 3’ end processing of human α-globin mRNA. Similarly, hnRNP H can bind an upstream G-rich motif by forming a cooperative assembly with PTB [6]. In addition, recent high-throughput sequencing studies have revealed global regulation of alternative polyadenylation by hnRNP H and U1 snRNP, suggesting a pervasive role for RBPs/SFs in the regulation of APA [27, 28].

Finally, alternative polyadenylation is also coupled with transcription. First, polyA site choice can be modulated by the gene promoter, as demonstrated by reporter assays [29]. Second, RNA polymerase II (RNAP II), which is central to transcription, was found to affect polyA site choice through multiple possible mechanisms. One mechanism is that variation in the transcription elongation rate of RNAP II can favor different polyA sites.
This was demonstrated for Polo mRNAs, whose proximal polyA site was favored by an RNAP II mutant with a lower elongation rate [19]. Another mechanism is associated with pausing of RNAP II during transcriptional elongation [30]. For example, RNAP II pausing signals are enriched differently near proximal polyA sites in highly expressed genes and near distal polyA sites in lowly expressed genes. This difference may reflect distinct patterns of RNAP II pausing that affect both gene expression and polyA site choice [29].

Alternative polyadenylation in disease

Widespread APA has been increasingly discovered in disease, especially in cancer, using either conventional techniques or state-of-the-art high-throughput sequencing. More interestingly, systematic APA switch patterns have been associated with different disease conditions, which suggests a significant role for APA in pathogenesis and oncogenesis. Using northern blotting, Mayr and Bartel observed shortening of oncogene 3’ UTRs in cancer cell lines compared with non-transformed cell lines. The shortened 3’ UTRs increase protein output from the oncogenes by avoiding repressive elements (presumably microRNA binding sites) in the extended region of the 3’ UTR [16]. In addition, two global studies of cancer cells found this shortening pattern and also observed a lengthening pattern in distinct cancer cells/subtypes [31, 32]. Beyond global analyses, individual studies have uncovered specific links between APA and disease. A well-studied example is the APA caused by a mutation in the polyA signal of FOXP3. Specifically, a rare AAUAAA to AAUGAA mutation inactivates the proximal polyA site, leading to an unstable FOXP3 mRNA that uses a distal polyA site. As a result, FOXP3 protein levels decrease, which leads to the autoimmune disease IPEX (Immune dysregulation, Polyendocrinopathy, Enteropathy, X-linked) [33].
CHAPTER II IDENTIFICATION OF APA EVENTS USING HIGH-THROUGHPUT SEQUENCING TECHNIQUES

Introduction

Initial efforts to study large-scale APA events were based on ESTs obtained by Sanger sequencing [34, 35]. However, strictly speaking, these studies were not quantitative enough to call APA switches, because EST data are limited in quantification and coverage. The first success in identifying global patterns of APA switching came from microarray techniques with high-density probes targeting mRNAs. In these studies, APA switches were detected by comparing the expression intensities of two alternative regions, delimited by two cleavage sites, between different clinical or biological conditions [14, 15, 36]. However, three limitations have restricted the use of microarrays to detect APA: 1) coverage: microarrays require pre-defined probes against known mRNA or gene regions and cannot address unannotated transcripts or unannotated 3’ UTRs; 2) accuracy: precise polyA coordinates cannot be derived from microarray data; and 3) sensitivity: microarrays are not sensitive enough to handle the low expression of alternative 3’ UTR regions.

In recent years, high-throughput sequencing techniques, which are revolutionary compared with Sanger sequencing, have become a powerful alternative for detecting APA, following their success in detecting alternative splicing [37, 38]. Unlike predefined microarray probes, the RNA-Seq signal consists of short RNA fragments ranging from 25 nt to 400 nt (depending on the sequencing platform). Ideally, RNA-Seq reads cover the full body of the expressed RNAs (poly(A)+ or total RNA, depending on the protocol). In addition, RNA-Seq can detect the expression of novel transcripts without any prior knowledge. Furthermore, RNA-Seq is sensitive enough to detect lowly expressed genes [39].
These merits have led to the identification of APA switches in multiple biological settings, including APA between human tissues and the impact of RNA binding proteins on APA [27, 38, 40]. Despite these advantages, generic RNA-Seq is not cost-effective for detecting APA, because 3’ UTRs account for only a small portion of the whole transcriptome. Moreover, as with microarrays, RNA-Seq alone cannot precisely determine polyA site coordinates, which can reduce the accuracy of APA detection.

Building on generic RNA-Seq, several protocols that specifically target the 3’ end of mRNA, referred to here as PolyA Site Sequencing (PAS-Seq), have recently been developed [32, 41-44]. All of these protocols share a common core step: oligo(dT)-primed reverse transcription-PCR (RT-PCR) captures RNA fragments with a polyA stretch at the 3’ end. This type of method overcomes most of the disadvantages of generic RNA-Seq in detecting APA. However, it introduces a considerable number of reads from internal A-rich sequences (through oligo(dT) internal priming), and these require extra effort to remove [32, 44]. In addition, because these methods are derived from the Illumina RNA-Seq protocol, they allow multiplexed samples to be sequenced in one lane while still yielding many reads per sample.

Another high-throughput method for detecting APA is direct RNA sequencing (DRS) on the Helicos single-molecule sequencing platform [45]. This method is specifically designed to capture the mRNA 3’ end accurately without RT-PCR during library construction. Therefore, it can measure RNA expression and polyA site abundance simultaneously, and it needs less RNA than RNA-Seq. Although DRS is more quantitative, it suffers from a higher sequencing error rate, higher sequencing cost and a lack of multiplexing capability. In practice, this technique is used less often than the Illumina-based techniques (Table 1).
Overall, these high-throughput sequencing techniques have made APA detection effective. Next, we describe how to conduct global identification of APA events using generic RNA-Seq and using DRS or PAS-Seq, as outlined in Figure 5.

Table 1 Characteristic features of different high-throughput sequencing techniques in 3’ end processing studies.
Technique | Specific to 3’ end | Internal priming | Sequencing error | Multiplexing | cDNA amplification
RNA-Seq | No | No | Low | Yes | Yes
PAS-Seq | Yes | High | Low | Yes | Yes
DRS | Yes | Low | High | No | No

Detect APA using generic RNA-Seq

The critical issues in APA detection using generic RNA-Seq include: 1) a set of reliable terminal exons; 2) accurate coordinates of polyA sites; 3) a predefined set of APA events of different types (SE-APA, DE-APA3 and DE-APA5); and 4) robust algorithms to infer significant APA switches for each type of APA event. In this section, we discuss each of these issues in turn.

Filtering artificial terminal exons

Transcript annotations of human genes were downloaded from the UCSC Known Genes database (hg19) and the Ensembl gene database (release 61). To minimize the effect of artificial transcript annotations on APA detection, we empirically set two rules to eliminate potential artifacts in the UCSC and Ensembl transcript annotations. Rule 1 removes 3’ terminal exons that share their 5’ ends with internal exons of other transcripts but whose 3’ ends fall within those internal exons (incomplete transcript annotation). Rule 2 removes 3’ terminal exons that contain multiple internal exons of other transcripts (artificial intron retention events) (Figure 6).

Figure 5 Pipeline for detecting significant APA switches for generic RNA-Seq, DRS and PAS-Seq.

Predefine APA events

Next, based on the filtered transcript annotation, we constructed three sets of alternative 3’ end events (SE-APA, DE-APA3, and DE-APA5).
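The two terminal-exon filters described above (rules 1 and 2) can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in this work. It assumes plus-strand exons given as 1-based inclusive (start, end) tuples. It also reads rule 1 as "the terminal exon starts exactly where another transcript's internal exon starts but ends inside that exon". The function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: (start, end), 1-based inclusive, plus strand,
# so the 5' end is `start` and the 3' end is `end`.
Exon = Tuple[int, int]


def is_incomplete_terminal(term: Exon, internal: List[Exon]) -> bool:
    """Rule 1 (our reading): the terminal exon shares its 5' end with an
    internal exon of another transcript, but its 3' end falls inside it."""
    t_start, t_end = term
    return any(t_start == s and t_end < e for s, e in internal)


def is_retained_intron(term: Exon, internal: List[Exon]) -> bool:
    """Rule 2: the terminal exon fully contains two or more internal exons."""
    t_start, t_end = term
    contained = sum(1 for s, e in internal if t_start <= s and e <= t_end)
    return contained >= 2


def filter_terminal_exons(terminals: List[Exon], internal: List[Exon]) -> List[Exon]:
    """Keep terminal exons that pass both artifact filters.
    `internal` should hold internal exons of *other* transcripts of the gene."""
    return [t for t in terminals
            if not is_incomplete_terminal(t, internal)
            and not is_retained_intron(t, internal)]


internal_exons = [(100, 200), (300, 400), (500, 600)]
terminal_exons = [(100, 150), (250, 650), (300, 450), (700, 900)]
print(filter_terminal_exons(terminal_exons, internal_exons))  # [(300, 450), (700, 900)]
```

Note that (300, 450) survives. It shares its 5’ end with an internal exon but extends beyond that exon's 3’ end, which is the DE-APA5 configuration rather than an annotation artifact.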
If the 3’ terminal exons of two different transcripts of the same gene overlap neither each other nor any internal exon, they are defined as a DE-APA3 event. Two 3’ terminal exons of different transcripts of the same gene are defined as a DE-APA5 event when three conditions hold: they do not overlap each other, the proximal 3’ terminal exon shares its 5’ end with an internal exon of the other transcript, and its 3’ end extends beyond the 3’ end of that internal exon. For SE-APA events, we collected all EST-supported polyA sites within 3’ terminal exons from the PolyA-DB2 database [46], and treated the 3’ end of each 3’ terminal exon as an additional putative polyA site. Then, for any 3’ terminal exon with more than one polyA site, we considered all pairs of adjacent polyA sites as SE-APA events. The common region of each SE-APA event runs from the 5’ end of the 3’ terminal exon to the proximal polyA site. The extended region runs from the proximal polyA site to the distal polyA site. SE-APA events whose common or extended regions were shorter than 100 nt were removed from further analysis to eliminate closely clustered polyA sites. As a result, we obtained 30,044, 20,337 and 17,117 possible SE-APA, DE-APA3 and DE-APA5 events, respectively, from the known gene annotation.

Figure 6 Flowchart of precompiling APA events from gene structure annotation only.

Calling statistically significant APA (RNA-Seq)

For each APA event, RNA-Seq reads uniquely mapped to the human genome (hg19) or to exon-exon junctions were used to calculate the transcript levels of the different 3’ ends. For SE-APA events, RNA-Seq reads uniquely mapped to the common and extended regions were counted separately. We calculated the Bayes factor to test whether the ratio of read densities in the common and extended regions differed significantly between two biological conditions [27].
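The SE-APA construction rules described above can be sketched as follows. This is an illustrative sketch, not the code used in this work. It assumes a plus-strand terminal exon with 1-based inclusive coordinates. It also assumes polyA sites are given as single cleavage coordinates, and that sites outside the exon are ignored. The function name `se_apa_events` is hypothetical.

```python
from typing import Dict, Iterable, List


def se_apa_events(exon_start: int, exon_end: int, polya_sites: Iterable[int],
                  min_len: int = 100) -> List[Dict[str, object]]:
    """Build SE-APA events for one 3' terminal exon (plus strand, 1-based).

    The exon's own 3' end is added as a putative polyA site, each pair of
    adjacent sites becomes a (proximal, distal) event, and events whose
    common or extended region is shorter than `min_len` nt are dropped.
    """
    sites = sorted({p for p in polya_sites if exon_start <= p <= exon_end} | {exon_end})
    events = []
    for proximal, distal in zip(sites, sites[1:]):
        common = (exon_start, proximal)      # 5' end of exon -> proximal site
        extended = (proximal + 1, distal)    # proximal site -> distal site
        common_len = common[1] - common[0] + 1
        extended_len = extended[1] - extended[0] + 1
        if common_len < min_len or extended_len < min_len:
            continue  # closely clustered polyA sites
        events.append({"proximal": proximal, "distal": distal,
                       "common": common, "extended": extended})
    return events


# Hypothetical exon at 1,000-2,000; the site at 2,500 lies outside the exon.
for ev in se_apa_events(1000, 2000, [1300, 1350, 1800, 2500]):
    print(ev)
```

In this example the pair (1300, 1350) is dropped because its extended region is only 50 nt long. The pairs (1350, 1800) and (1800, 2000) are kept.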
In addition, we calculated the relative expression fold change of the common and extended regions. This is the ratio of read counts in the common region to those in the extended region in one condition, divided by the same ratio in the other condition. Significant SE-APA events were defined as those with a fold change of at least 2 and a Bayes factor of at least 100 (following the suggestion in [47]).

For DE-APA3 and DE-APA5 events, RNA-Seq reads uniquely mapped to the proximal and distal 3’ terminal exons were counted separately. Raw RNA-Seq counts were also transformed to Reads Per Kilobase per Million mapped reads (RPKM) values as the normalized expression levels of the two alternative 3’ terminal exons [37]. We eliminated APA events in which at least one of the alternative 3’ terminal exons had an RPKM below 1 in both conditions, so that only reasonably highly expressed events were tested. For the remaining events, we conducted Fisher's exact test on the isoform-specific RNA-Seq read counts of the proximal and distal 3’ terminal exons in both conditions. The P-values from Fisher's exact test were adjusted for multiple testing with the Benjamini-Hochberg procedure [48]. We also calculated the relative expression fold change of the proximal and distal 3’ terminal exons between two conditions. This is the ratio of read counts for the two alternative terminal exons in one condition divided by the same ratio in the other condition. Significant DE-APA3 and DE-APA5 events were defined as those with a fold change of at least 2 between two conditions and a false discovery rate (FDR) below 0.01.

Detect APA using PAS-Seq and DRS

Read mapping and polyA site calling

The basic principles for detecting APA using PAS-Seq and DRS are similar to each other except for the mapping and polyA site calling steps. We used Bowtie to map PAS-Seq reads [49] and Helisphere to map DRS reads, requiring a mapping score of at least 4.0 (http://open.helicosbio.com).
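For the RNA-Seq-based calls described above, the RPKM normalization, the RPKM ≥ 1 expression filter and the ratio-of-ratios fold change can be sketched as below. This is a minimal sketch under stated assumptions, and the function names are hypothetical. Two choices go beyond the text. A "fold change of at least 2" is read as a 2-fold change in either direction. The optional pseudocount is included only to avoid division by zero; the text does not specify one.

```python
import math
from typing import Sequence


def rpkm(reads: float, length_nt: int, total_mapped: int) -> float:
    """Reads Per Kilobase of exon per Million mapped reads."""
    return reads * 1e9 / (length_nt * total_mapped)


def ratio_fold_change(a1: float, b1: float, a2: float, b2: float,
                      pseudocount: float = 0.0) -> float:
    """(a1/b1) / (a2/b2): change of region a relative to region b between
    condition 1 and condition 2 (common vs extended region for SE-APA, or
    proximal vs distal terminal exon for DE-APA). The pseudocount is an
    assumption, not part of the described method."""
    a1, b1, a2, b2 = (x + pseudocount for x in (a1, b1, a2, b2))
    return (a1 / b1) / (a2 / b2)


def passes_expression_filter(rpkm_cond1: Sequence[float],
                             rpkm_cond2: Sequence[float],
                             min_rpkm: float = 1.0) -> bool:
    """Drop the event if any alternative terminal exon has RPKM < min_rpkm
    in both conditions."""
    return all(max(r1, r2) >= min_rpkm for r1, r2 in zip(rpkm_cond1, rpkm_cond2))


def fold_change_ok(fc: float, min_fold: float = 2.0) -> bool:
    """A >= min_fold change in either direction (our reading of the cutoff)."""
    return abs(math.log2(fc)) >= math.log2(min_fold)


print(rpkm(100, 1000, 10_000_000))           # 10.0
print(ratio_fold_change(200, 100, 50, 100))  # 4.0
```

The fold change alone is not a significance call. In the described pipeline it is combined with the Bayes factor (SE-APA) or with the Benjamini-Hochberg-adjusted Fisher's exact test (DE-APA).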
The Helisphere mapping score is specifically designed for Helicos DRS data because of the high rates of sequencing errors, insertions and deletions. Empirically, a mapping score of 4.0 gives a good tradeoff between mapping error and the number of uniquely mapped reads. After mapping, only uniquely mapped reads were retained for further analysis. Putative polyA cleavage sites were defined according to the sequencing protocols. For PAS-Seq, this is the most 3’ position of the mapped read. For DRS, it is the most 5’ position of the mapped read, taken on the opposite strand.

Filtering artificial polyA sites due to internal priming

One common side effect of the oligo(dT) priming used in both PAS-Seq and DRS is that reads from internal A-rich sequences, rather than from polyA tails, can also be captured. There are two major strategies for identifying internal-priming reads: the “position-specific matrix” method and the “fixed regular expression pattern” method [32, 44]. We compared the two methods. The “position-specific matrix” method was too aggressive: it discarded some well-annotated known polyA sites. We therefore chose the “fixed regular expression pattern” method. It removes polyA sites whose downstream 20 nt sequences meet any of three criteria: 1) a total of 12 “A”s; 2) a substring of at least 8 consecutive “A”s; 3) a match to the pattern “GAAAA+GAAA+G”, where ‘+’ means that the preceding character is repeated one or more times.

Clustering heterogeneous polyA sites

During 3’ end processing, the exact cleavage sites directed by the same set of regulatory elements in the same transcript usually vary by several nucleotides. This phenomenon is called the “heterogeneity of cleavage” [9, 50]. Therefore, another issue is to obtain a consensus polyA coordinate so that polyA site expression can be compared easily across samples.
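The cleavage-site definition and the three internal-priming patterns described above can be sketched with Python's standard `re` module. This sketch makes four assumptions. First, "a total of 12 A's" is read as "at least 12 A's" in the 20 nt window. Second, the window is taken from the genome on the transcript's strand. Third, PAS-Seq reads are taken to be sense-strand reads that end at the cleavage site. Fourth, DRS reads are antisense to the transcript. The function names are hypothetical.

```python
import re
from typing import Tuple

# Patterns from the text, applied to the 20 nt downstream of a putative
# cleavage site (transcript strand).
CONSECUTIVE_A = re.compile(r"A{8,}")          # >= 8 consecutive A's
GA_PATTERN = re.compile(r"GAAAA+GAAA+G")      # '+' = one or more of the preceding base


def is_internal_priming(downstream20: str, min_total_a: int = 12) -> bool:
    s = downstream20.upper()
    return (s.count("A") >= min_total_a       # "12 A's in total", read as >= 12 (assumption)
            or CONSECUTIVE_A.search(s) is not None
            or GA_PATTERN.search(s) is not None)


def cleavage_site(read_strand: str, start: int, end: int,
                  protocol: str) -> Tuple[str, int]:
    """Map an aligned read (1-based inclusive start/end) to the
    (strand, position) of its putative cleavage site."""
    if protocol == "PAS-Seq":
        # Assumed sense-strand read: take its 3'-most aligned base.
        return read_strand, (end if read_strand == "+" else start)
    if protocol == "DRS":
        # Assumed antisense read: take its 5'-most aligned base and
        # report it on the opposite strand.
        opposite = "-" if read_strand == "+" else "+"
        return opposite, (start if read_strand == "+" else end)
    raise ValueError(f"unknown protocol: {protocol}")


print(is_internal_priming("GAAAAGAAAGCTCTCTCTCT"))  # True (pattern 3)
print(cleavage_site("+", 100, 130, "DRS"))          # ('-', 100)
```

Keeping the three rules as separate, readable checks makes it easy to report which rule removed a site. A single combined regular expression would not show that.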
To this end, we pooled all the sequencing data and identified cleavage sites. We then iteratively clustered each individual cleavage site with its nearest cleavage site within 40 nt on the same chromosome strand. The representative coordinate of each polyA cluster was a weighted coordinate. It was calculated by multiplying the coordinate of each individual polyA site by that site's fraction of usage within the cluster and summing the products. Finally, the abundance of the clustered polyA site (or, equivalently, the polyA site) was defined as the sum of DRS/PAS-Seq reads at all individual cleavage sites in the cluster.

Calling statistically significant APA (DRS/PAS-Seq)

One intuitive way of calling APA with PAS-Seq or DRS is to compare each individual polyA site with every other polyA site of the same gene. This method is called “One vs. One”. Fisher's exact test is conducted on all possible pairs of polyA sites of a gene in two different experiments, to test whether the relative usage of the two sites changes. The Benjamini-Hochberg method can then be used to calculate the false discovery rate (FDR). Pairs of polyA sites with an FDR below 0.05 (or any given cutoff) can be defined as statistically significant events. However, the “One vs. One” method is not robust enough, for three reasons: possible artificial polyA sites, the low abundance of individual polyA sites, and the large number of pairwise comparisons that must be corrected in the FDR calculation. Therefore, an improved method called “One vs. Others” was designed to address these issues. The basic principle of “One vs. Others” is to test one polyA site at a time and compare it with the sum of all the other polyA sites of the same gene (Table 2).
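The clustering step and the One vs. Others test described above can be sketched in plain Python. The sketch implements Fisher's exact test and the Benjamini-Hochberg procedure directly, so that no third-party packages are needed. It is an illustration under assumptions, not the code used in this work. Clustering is read as simple single linkage on sorted positions: a site joins the current cluster if it lies within 40 nt of the previous site. The weighted coordinate is rounded to an integer. All function names are hypothetical. For very large read counts, a library implementation of Fisher's exact test would be faster.

```python
from math import comb
from typing import Dict, List, Tuple


def cluster_sites(site_counts: Dict[int, int], max_gap: int = 40) -> List[Tuple[int, int]]:
    """Cluster cleavage sites of one chromosome strand (position -> reads).
    Returns (usage-weighted coordinate, total reads) for each cluster."""
    clusters: List[List[int]] = []
    for pos in sorted(site_counts):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    result = []
    for members in clusters:
        total = sum(site_counts[p] for p in members)
        weighted = sum(p * site_counts[p] for p in members) / total
        result.append((round(weighted), total))
    return result


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:  # hypergeometric P(top-left cell = x)
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p = sum(q for q in map(prob, range(lo, hi + 1)) if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values (FDR), in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def one_vs_others(counts1: List[int], counts2: List[int]) -> List[float]:
    """P-value per polyA cluster of one gene: that cluster vs the sum of
    all other clusters, condition 1 vs condition 2."""
    tot1, tot2 = sum(counts1), sum(counts2)
    return [fisher_exact_two_sided(x1, tot1 - x1, x2, tot2 - x2)
            for x1, x2 in zip(counts1, counts2)]


print(cluster_sites({1000: 30, 1020: 10, 1100: 5}))  # [(1005, 40), (1100, 5)]
# Table 2 numbers for polyA 1 (10/100 vs 50/100); the split of the other
# reads between polyA 2 and 3 is made up for illustration.
print(one_vs_others([10, 60, 30], [50, 30, 20]))
```

In a genome-wide analysis, `benjamini_hochberg` would be applied to the P-values pooled across all tested polyA sites, not to one gene at a time.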
Specifically, to test whether the usage of any single polyA cluster of a gene has changed, Fisher's exact test was conducted between two biological conditions. It compared the read count of that polyA cluster with the summed read count of all the other polyA clusters. The P-values can also be adjusted with the Benjamini-Hochberg method. Finally, polyA sites can be defined as significantly changed if their FDR is below a given cutoff and their percentage change in DRS count exceeds another cutoff.

Table 2 Example of a contingency table used in the One vs. Others method.
Condition | One (polyA 1) | Others (polyA 2+3) | Total
Condition 1 | 10 | 90 | 100
Condition 2 | 50 | 50 | 100
Note: The example gene has three polyA sites (1, 2 and 3); the numbers in the cells are DRS/PAS-Seq read counts.

Summary

In this chapter, we briefly reviewed popular techniques for detecting significant APA events. These techniques have now reached a stage of high capacity and high accuracy. Over the past several years, generic RNA-Seq, DRS and PAS-Seq have been used successfully in a wide range of APA studies. Computationally, almost every study differs slightly from the others in certain steps of the pipelines described above. The methods described in this chapter are the outcome of fine-tuning during our APA-related projects. The remaining parts of the thesis demonstrate the effective application of these pipelines to different biological and clinical problems.

CHAPTER III THE POLY-C BINDING PROTEINS ACT AS GLOBAL REGULATORS OF ALTERNATIVE POLYADENYLATION

Introduction

Prior studies have identified a novel RNA-protein (RNP) complex that assembles on the 3’ UTR of the human α-globin mRNA.
This complex was initially identified by its ability to enhance the stability of hα-globin mRNA in the cytoplasm of erythroid cells [51-55]. It consists of the KH-domain RNA binding protein αCP (also known as polyC-binding protein (PCBP) and hnRNP E; reviewed in [56]), bound to a repeated C-rich motif within the 3’ UTR [57, 58]. Subsequent studies have revealed that this αCP/polyC RNP complex plays a role in the stability control of multiple mRNAs, in both erythroid and non-erythroid cells. It is therefore likely to constitute a widely distributed cytoplasmic determinant of gene regulation [58-61]. The sequences and structures of these native C-rich elements parallel the single-stranded C-rich motifs identified by in vitro SELEX as the optimal binding sites for αCP2 [62]. In addition to their stabilizing role in the cytoplasm, αCP/polyC complexes also function in the nucleus during transcript processing [63]. For example, αCP has been shown to bind initially to the nascent human α-globin transcript in the nucleus, where it acts in vivo as a splicing regulator [63]. Our recent study indicated that αCP also enhances mRNA 3’ processing [7]. These studies demonstrated that αCP bound to C-rich upstream sequence elements (USEs) enhances both steps of 3’ end processing, cleavage and polyadenylation [7]. The ability of the αCP complex to enhance 3’ end processing is further supported by the in vivo interaction of αCP with core components of the 3’ end processing complex [7]. These observations support a model in which αCP assembles co-transcriptionally on the 3’ UTR, setting the stage for a coordinated set of nuclear and cytoplasmic controls.

In the current study, we extend these observations by exploring a wider role for the αCP/polyC complex in controlling the mammalian transcriptome. The results demonstrate that αCPs, in conjunction with their cognate C-rich binding sites, control the utilization of polyA processing sites in a defined subset of mRNAs.
Thus, the αCP mRNP complex has the capacity to play a pivotal role in determining the structure and expression of specific transcripts through its impact on the 3’ processing pathway.

Results

Direct RNA 3’ sequencing of the transcriptome in cells acutely depleted of αCP

We have previously demonstrated that αCPs, which are RNA binding proteins, markedly enhance 3’ processing of the hα-globin transcript through a sequence-specific association with a C-rich motif within the 3’ UTR [7]. These studies led us to ask whether αCPs play a global role in 3’ processing in erythroid cells. To address this question, we assessed the impact of αCP depletion on the K562 transcriptome. K562 cells are a human Tier I ENCODE cell line with hematopoietic properties. We separately transfected K562 cells with two distinct siRNAs, each of which co-targets the two major αCP transcripts, αCP1 and αCP2 [56]. Parallel control transfections were carried out with siRNAs against an unrelated protein (GLD-2). Effective and specific co-depletion of αCP1 and αCP2 from the siRNA-treated cells was demonstrated by mRNA and protein analyses at three days post-transfection (Figure 7). Total RNA isolated from each set of siRNA-transfected cells was subjected to direct RNA sequencing (DRS, Helicos BioSciences Corporation, Cambridge, MA). DRS captures individual tethered poly(A) RNAs and sequences their 3’ termini in a massively parallel manner. This direct approach eliminates the need for cDNA intermediates, amplification steps or ligation reactions, any of which can introduce bias into the final quantification of mRNA species [45, 64].

Figure 7 siRNA-mediated co-depletion of αCP1 and αCP2 from K562 cells. (A) Experimental procedure. K562 cells were separately transfected with two distinct siRNAs, each co-targeting αCP1 and αCP2 mRNAs (CP1-1 and CP1-4). Parallel transfections were carried out with two distinct control siRNAs targeting an unrelated mRNA (GLD-2 mRNA; CTRL-1 and CTRL-2).
At 24 hours post-transfection, cells were re-transfected with the same siRNAs and cultured for an additional two days (3 days of culture in total). They were then assessed for effective siRNA-mediated knockdown by protein and RNA analyses. RNA isolated from each culture was subjected to DRS analysis for mapping and quantitation of 3’ processing sites. (B) Assessment of αCP depletion by real-time RT-PCR. Levels of mRNAs encoding the two αCP isoforms, αCP1 and αCP2, are displayed. The values on the Y-axis represent the αCP mRNA level normalized to the level of GAPDH mRNA in the respective sample. The αCP:GAPDH ratio for CTRL-1 is defined as 1.0. The standard deviation for each sample is shown (n = 2). (C) Assessment of αCP depletion by Western blot. Affinity-purified antibodies specific for either αCP1 or αCP2 [57] were used for detection in the first and second panels. Detection of the large ribosomal subunit protein L7a [51] controlled for sample loading (bottom panel). (data courtesy of Dr. Xinjun Ji, University of Pennsylvania)

Total cellular RNAs from cells treated individually with each of the two αCP siRNAs and each of the two control siRNAs were sequenced on four separate channels. The sequenced DRS reads had a mean length of 32 nt (range 24 nt to 70 nt; Figure 8).

Figure 8 Histogram of DRS read lengths of all four samples (pooled).

Three of the channels generated 16-18 million reads, while the yield of the fourth was somewhat lower (9.9 million) (Table 3). The raw reads were mapped to the hg19 genome assembly and filtered for internal priming, giving a final data set of the positions and numbers of polyA termini (Table 3). Approximately one third (28.4% to 35.5%) of the sequenced reads were retained for polyA site quantification. Of the retained DRS reads, 55.7% to 61.4% fall within 40 nt of the ends of UCSC and Ensembl genes or of polyA sites in polyA_DB2 [46].
The DRS data were highly reproducible, with Pearson correlation coefficients above 0.92 for the two control siRNA samples and above 0.94 for the two αCP siRNA samples (Figure 9). Given this level of reproducibility, we pooled the control siRNA data and the αCP siRNA data separately for subsequent computational analysis.

Figure 9 Reproducibility of DRS polyA reads. DRS read counts are normalized by the smaller of the two samples' total numbers of non-internal-priming reads and plotted on a log2 scale.

Identification of mRNAs impacted by αCP depletion

The DRS data were evaluated for the impact of αCP depletion on overall gene expression levels and on the relative abundance of competing polyA sites (APA). The steady-state expression from each locus was determined by summing the total number of polyA site reads overlapping Ensembl genes. This sum was referred to as the Digital Gene Expression (DGE) value. We applied DEGseq to identify differentially expressed genes [65]. We used an FDR below 0.05 and a minimal normalized fold change of 1.5. With these cutoffs, acute depletion of αCPs significantly altered the expression of 586 genes relative to cells transfected with either of the two control siRNAs: 231 were increased and 355 were decreased (Table 4). Raising the cutoff to a 2-fold change in transcript abundance revealed a significant impact on the expression of 117 genes relative to the two controls: 42 were increased and 75 were decreased (Table 4). A heat map of the DGE values for the 117 most significantly impacted genes (>2-fold change) showed excellent concordance between the two distinct αCP siRNA samples, and between the two distinct control siRNA samples (Figure 10).

Table 3 Summary of DRS data of αCP knockdown experiments.
RNA sample | Control siRNA-1 | Control siRNA-2 | αCP siRNA-1 | αCP siRNA-2
Sequenced reads | 9.9 M | 18.5 M | 17.4 M | 16.4 M
Uniquely mapped reads | 3,056,826 | 7,069,003 | 6,399,451 | 5,863,787
Non-internal priming reads | 2,811,092 | 6,557,478 | 5,919,346 | 5,420,574
Percentage of non-internal priming reads | 28.4% | 35.5% | 34.0% | 33.0%
Non-internal priming reads overlapping known annotation | 1,564,685 | 4,027,412 | 3,480,473 | 3,185,739
Percentage of known polyA sites | 55.7% | 61.4% | 58.8% | 58.8%

Table 4 Number of differentially expressed genes impacted by αCP depletion
Cutoff | Up-regulated | Down-regulated | Total
2-fold change, FDR < 0.05 | 42 | 75 | 117
1.5-fold change, FDR < 0.05 | 231 | 355 | 586

Gene Ontology (GO) analysis was applied to the 586 genes with a 1.5-fold or greater change in expression after αCP depletion. These genes were enriched in genes related to amino acid metabolism, amino acid biosynthesis, oxidation-reduction reactions, cholesterol metabolism, lyase reactions, and immunity and defense (Figure 11 and Table 5). The impact of αCP depletion on these gene categories is consistent with a role in modulating pathways that control basic metabolism and cellular stress responses [66]. A subset of 21 transcripts with more than 1.5-fold changes in the DGE analysis was subjected to verification by real-time PCR. Each amplimer set corresponded to an internal region of the target mRNA so as to detect all mRNA isoforms, irrespective of their 3’ end processing patterns. These analyses were carried out on the same RNA samples assessed in the original DRS study. They confirmed the DRS data (an increase or decrease of more than 1.5-fold in steady-state mRNA representation) for 15 of 21 genes (Figure 12 and Table 6).

Motif analysis reveals C-rich determinants in the 3’ UTRs of mRNAs impacted by αCP depletion

We applied the MEME software to infer motifs in the differentially expressed genes (DEG) associated with αCP knockdown [67].
The MEME search was initiated on the full set of transcripts that underwent a >1.5-fold change in expression after αCP depletion. This analysis was limited to the 200 nt segment immediately 5’ to the functional polyA cleavage site. Using a rigorous P-value cutoff of 1.0E-10, we found 3 motifs significantly enriched in the 200 nt segments upstream of the major polyA sites of the significantly changed genes. As expected, the most strongly conserved element was the canonical polyA signal, AAUAAA, and its variants, peaking at 15-20 nt 5’ to the polyA site. These data corroborate the quality of DRS in recovering functional polyA sites. This polyA signal was observed in mRNAs irrespective of whether they were impacted by αCP depletion. The next two most prominent motifs both contained several C's (Figure 13A). An RNA map and the Wilcoxon rank-sum test were used to determine the positions of these motifs relative to the respective utilized polyA sites. The two C-rich motifs were significantly enriched in the αCP-impacted transcripts at three positions relative to the polyA site (-150 nt, -100 nt and -50 nt; FDR < 0.05). MEME analysis was next applied separately to the 355 transcripts that were down-regulated and the 231 transcripts that were up-regulated in response to acute αCP depletion (Figure 13B and C). The analysis of the down-regulated genes revealed C-rich motifs 5’ to the polyA sites in approximately 80% of these transcripts (185 and 105 occurrences of the two C-rich motifs, respectively) (Figure 13B). These C-rich motifs in the down-regulated genes were enriched at three peak locations relative to the polyA site. In contrast, only 38 of the 231 (16%) up-regulated genes harbor a C-rich motif 5’ to the respective polyA sites. These motifs were located at a mean distance of 125 nt 5’ to the polyA sites (Figure 13C).
In summary, the DGE analysis points to a significant impact of αCP on the overall expression level of a defined subset of mRNAs. Markedly more mRNAs were down-regulated than up-regulated following αCP depletion. This is consistent with an overall enhancing action of the αCP complex on steady-state mRNA levels. In addition, the motif analysis revealed a clear enrichment of C-rich motifs in the transcripts negatively impacted by αCP depletion. These data therefore support a direct role for polyC-binding proteins in one or more post-transcriptional control pathways that affect steady-state mRNA representation.

Figure 10 Heatmap of differentially expressed genes after αCP knockdown. Direct RNA sequencing (DRS) analyses were carried out on polyA RNAs isolated from cell cultures treated with the αCP siRNAs or the control siRNAs (as in Figure 7). The heat map represents all 117 mRNA species that showed a >2-fold change in expression (increased or decreased) after αCP depletion. The color gradient (log scale) represents the change in the overall representation of each mRNA, normalized to the corresponding level in RNA isolated from cells treated in parallel with the control siRNAs. Arrows to the left of the heat map indicate the positions of the direct siRNA targets, αCP1 and αCP2 mRNAs.

Figure 11 GO analysis of mRNAs altered in overall expression (DGE levels) by αCP depletion. mRNAs whose steady-state levels changed 1.5-fold or more after αCP depletion were included in the GO analysis (DAVID algorithm). The data were assessed with Fisher's exact test with FDR adjustment. Asterisks indicate the level of significance: * 0.01 < FDR < 0.05; ** 0.001 < FDR < 0.01; *** FDR < 0.001.

αCP impacts patterns of alternative polyA selection

The αCP/polyC complex within the human α-globin 3’ UTR enhances 3’ cleavage and polyadenylation [7].
Based on these studies, we proposed that C-rich motifs might act as USEs in a subset of cellular transcripts. This activity could alter overall production of mature mRNAs by enhancing the use of a unique polyA site and/or act via modulation of alternative polyA utilization (APA). The preceding DGE analysis is consistent with a positive impact of the 3’UTR CP/polyC complex on steady state levels of a subset of mRNAs.

Table 5 Gene Ontology analysis of DEG genes impacted by αCP knockdown
GO Term                   Count   Fold enrichment   FDR
Amino acid metabolism     24      3.377             1.19E-06
Immunity and defense      51      1.871             8.57E-05
Lyase                     14      2.864             0.008
Cholesterol metabolism    10      3.732             0.008
Oxidoreductase            33      1.668             0.039

To assess the impact of this complex on APA, we screened the DRS dataset for shifts in polyA site utilization. For each individual polyA site, we applied Fisher's exact test to compare its DRS count with the summed DRS counts of all other polyA sites within the same gene between the two cell conditions (cells transfected with CP siRNAs and with control siRNAs). This comparison revealed a total of 357 significant changes in polyA site utilization (198 down-regulated and 159 up-regulated polyA sites) subsequent to CP depletion, corresponding to a total of 264 gene transcripts (FDR < 0.05) (Table 7). Of these APA events, 102 occurred between competing alternative polyA sites within the Same terminal Exon (‘SE-APA’). This SE-APA subset should be particularly informative for identifying 3’UTR motifs functional in APA, as these events should be independent of alterations in transcript splicing. Another 122 APA events were linked to alterations in splicing patterns and occurred in Different terminal Exons (‘DE-APA’).
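The per-site test described above, and in Materials and Methods, uses a 2×2 table. It compares one polyA site's DRS count against the summed counts of the same gene's other sites, in control versus CP-knockdown cells. A minimal Python sketch follows, under these assumptions:

- `scipy.stats.fisher_exact` performs the test, and a hand-written Benjamini-Hochberg step does the adjustment.
- The thresholds are the ones stated in the Methods (FDR < 0.05, |Δ| > 0.1). Δ is taken here, as an assumption, to be the knockdown usage fraction minus the control usage fraction.
- The input dictionary of per-gene counts and the function name `call_apa` are hypothetical.

```python
import numpy as np
from scipy.stats import fisher_exact


def bh_fdr(pvals):
    """Benjamini-Hochberg adjusted p-values."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    adj = p[order] * m / np.arange(1, m + 1)
    adj = np.minimum.accumulate(adj[::-1])[::-1]
    fdr = np.empty(m)
    fdr[order] = np.minimum(adj, 1.0)
    return fdr


def call_apa(genes, fdr_cut=0.05, delta_cut=0.1):
    """genes: {gene: (control_counts, knockdown_counts)}, one count per polyA cluster.

    Returns (gene, site_index, fdr, delta, label) tuples, label in {"up", "down", "unchanged"}.
    """
    rows, pvals = [], []
    for gene, (ctrl, kd) in genes.items():
        ctrl, kd = np.asarray(ctrl), np.asarray(kd)
        if len(ctrl) < 2 or ctrl.sum() == 0 or kd.sum() == 0:
            continue  # APA needs at least two sites with reads in both conditions
        for i in range(len(ctrl)):
            # this site vs. all other sites of the same gene, control vs. knockdown
            table = [[int(ctrl[i]), int(ctrl.sum() - ctrl[i])],
                     [int(kd[i]), int(kd.sum() - kd[i])]]
            _, p = fisher_exact(table)
            delta = kd[i] / kd.sum() - ctrl[i] / ctrl.sum()
            rows.append((gene, i, delta))
            pvals.append(p)
    fdrs = bh_fdr(pvals) if pvals else []
    calls = []
    for (gene, i, delta), fdr in zip(rows, fdrs):
        if fdr < fdr_cut and delta > delta_cut:
            label = "up"
        elif fdr < fdr_cut and delta < -delta_cut:
            label = "down"
        else:
            label = "unchanged"
        calls.append((gene, i, float(fdr), float(delta), label))
    return calls
```

Here the BH adjustment runs over every tested site in the dataset at once. Whether the original pipeline pooled p-values across genes in exactly this way is an assumption.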
The remaining 133 APA events could not be simply assigned to either the SE-APA or DE-APA category and are termed “ambiguous-APA” events. The pathways controlling these last two sets of APA events are likely to be more complex and difficult to attribute directly to defined 3’UTR motifs.

Figure 12 Confirmation of differentially expressed genes impacted by CPs by targeted real-time RT-PCR analysis. Shown are real-time analyses of three mRNAs that increased and three mRNAs that decreased in overall abundance subsequent to CP depletion (as shown in Table 6). These studies were carried out on the same RNA preparations used in the original DRS analysis. To further validate these results, mRNA levels in K562 cells were additionally analyzed in cells treated with a third distinct control siRNA against an unrelated mRNA (CTRL-3; Cyclophilin siRNA). All values shown were normalized to the corresponding levels of GAPDH mRNA. The data are represented as ratios, with the ratio for the CTRL-3 siRNA sample defined as 1.0. Standard deviation for each sample is shown (n=3). (data courtesy of Dr. Xinjun Ji, University of Pennsylvania)

Table 6 List of validated genes having significant changes in overall expression after CP depletion.
Gene symbol   Fold-change (KD/Control by DRS)   P-value     FDR
HBZ           3.43                              0           0
CP1           0.13                              0           0
CP2           0.38                              0           0
DDIT4         6.24                              6.71E-287   1.96E-283
ACADVL        0.21                              6.43E-78    2.77E-75
PHGDH         4.37                              1.39E-62    4.86E-60
ACSM3         0.45                              1.47E-58    4.20E-56
PRG2          0.26                              3.35E-42    5.33E-40
ALAS2         4.13                              2.27E-28    2.28E-26
CFLAR         0.36                              4.25E-27    4.07E-25
SLC7A11       3                                 4.36E-27    4.15E-25
CHAC1         8.21                              3.08E-26    2.77E-24
COMTD1        0.42                              7.89E-19    4.51E-17
SCO2          0.46                              8.19E-13    2.77E-11
FAHD2B        0.31                              9.77E-12    2.99E-10

Motif analysis of APA events

We searched for sequence motifs that could establish direct mechanistic link(s) between CP depletion and APA events. Similar to the DGE analysis, we examined the 200 nt regions upstream of polyA sites that underwent a significant alteration in utilization for the structure and positioning of enriched motifs. A set of unchanged polyA sites (FDR > 0.8), with a DRS count distribution similar to that of the significantly changed polyA sites, was randomly selected to serve as a background set for the analysis.

Figure 13 Motif analysis within the 3’UTRs of mRNAs impacted by CP depletion. A. MEME analyses of the sequences 5’ to the dominant polyA sites of all mRNAs that underwent a 1.5-fold or greater change (up or down) in their representation subsequent to CP depletion (‘DEG’ mRNAs). The RNA-Map encompassed the 200 nt segments immediately 5’ to the sites of polyA addition. The top 3 motifs detected by MEME are shown. For each motif, we list the E-value and the number of mRNAs containing the motif among the total number of mRNAs studied. The distance distributions (with the polyA cleavage site defined as base 0) are shown below each motif (X-axis). The Y-axis indicates the percentage of nucleotides at each indicated site. An asterisk marks a significant peak detected by the Wilcoxon rank sum test (FDR < 0.05). The P-value measures the significance of a motif, and the ratio gives the fraction of mRNAs harboring the motif in the whole set of mRNAs. B. Summary of MEME analyses of all mRNAs down-regulated by greater than 1.5-fold subsequent to CP depletion, displayed as in (A). C. Summary of MEME analysis of all mRNAs up-regulated by greater than 1.5-fold subsequent to CP depletion, displayed as in (A).

Table 7 Number of APA events impacted by αCP depletion.
                 SE-APA   DE-APA   Ambiguous-APA   Total
Up-regulated     44       58       57              159
Down-regulated   58       64       76              198
Total            102      122      133             357

The initial analysis was carried out on the entire set of 198 polyA sites that were down-regulated upon CP depletion. As expected, the canonical polyA signal AATAAA and its variants were consistently identified approximately 15-20 nt 5’ to most of the utilized polyA sites (168/198). They were equally represented in the CP-impacted APA events and in the control group (Figure 14A). A motif markedly enriched for C’s was identified in 56 of the 198 APA sites. This motif was pyrimidine-pure, with C the predominant base at 9 of its 10 positions, and was not observed in the control group. This C-rich motif was highly represented 5’ of the polyA sites that were down-regulated subsequent to CP depletion, with its peak located 35-45 nt 5’ of the site of polyA addition. The complementary analysis of the set of polyA sites that were up-regulated following CP depletion revealed a complex motif in 41 of 159 polyA sites. This motif contained central purines, lacked a significant polyC tract, and had no specific or predominant localization relative to the affected polyA site (Figure 14B). To further explore the basis for the APA events, we limited the motif search to the 102 competing APA events within the same terminal exon (SE-APA events). This eliminated the complicating influence of co-existing alterations in splicing (Figure 14C). To link the C-rich motifs more directly with the proposed USE function, we configured the discriminative MEME motif approach to directly compare the sequence environment of the 58 down-regulated SE-APA sites (positive set) with that of the 44 up-regulated SE-APA sites (negative set). In this manner, the analysis was specifically configured to identify motifs associated with the down-regulated polyA sites that were underrepresented in the environment of the up-regulated polyA sites.
The top-ranking motif in this discriminative analysis was a pyrimidine-pure, C-rich motif (Figure 14C). This motif was present at 34 of the 58 down-regulated polyA sites and was positioned approximately 50 bp 5’ to the down-regulated polyA processing site. When the same motif search was extended to the DE-APA events (122 polyA sites) (Figure 14D), we again identified a C-rich motif (41/122; 21 enhanced and 20 repressed APA events). In this case, however, the positioning was less focused, with a mean distance of 80 bp upstream of the polyA site. These studies thus reveal a strong correlation between the repression of polyA site utilization subsequent to CP depletion and the presence of a pyrimidine-pure, C-rich motif in close proximity to the site of 3’ processing.

CP2 controls the 3’ processing of its own transcript

An unexpected observation from the APA analyses was that CPs appear to autoregulate polyA selection of the CP2 transcript. The co-depletion of the two major CP transcripts activated a pair of adjacent cryptic polyA sites within the last intron (intron 13) of the CP2 RNA (Figure 15A, sites within the dotted oval in the gene browser diagram). Both of these polyA sites are located immediately 3’ to a cryptic polyA signal, AATAAA (Figure 15B). The use of these two sites was linked to the activation of a cryptic splice acceptor site upstream of them, thus generating an mRNA with a unique 3’ terminal exon. Targeted RT-PCR analysis and 3’ RACE both confirmed the positioning of the novel 3’ processing sites within intron 13 and the generation of exon 13a (Figure 15B and data not shown). The generation of the novel ‘exon 13a’ subsequent to CP depletion was accompanied by a decrease in the use of the polyA site in exon 14. This reciprocal relationship was validated by targeted real-time RT-PCR (Figure 15C).
The presence of a C-rich sequence approximately 40 nt upstream of the splice acceptor site, overlapping the likely lariat branch site for this new intronic exon (13a), may play a role in this alternative processing event. No C-rich motif lies near these new polyA sites or near the exon 14 polyA site. This supports the likelihood that the control is mediated by a direct effect on alternative splicing rather than on polyA utilization [63]. A primary splicing mechanism is further supported by the interaction between this upstream C-rich element and CP proteins, as evidenced by RNA EMSA and UV-crosslinking assays (Figure 15D). Thus, under normal conditions, use of the branch site encompassed by the C-rich motif may be repressed by CPs; when CP levels or activity are depleted from the cell, a new set of mRNA isoforms is generated.

PA pattern changes impacted by CPs

The preceding analysis of DRS data identified enrichment for C-rich motifs 5’ to polyA sites that were down-regulated subsequent to CP depletion. These data were next confirmed by a set of targeted real-time RT-PCR analyses. Direct confirmation of APA was first carried out on two examples of DE-APA. The real-time RT-PCR analysis confirmed the DE-APA events for the Ssu72 and NPM1 genes following CP depletion (Figure 16A-B). Of note, both of these genes have themselves been directly or indirectly implicated in the regulation of mRNA 3’ end processing [68] [69] (also see Discussion). Six examples of mRNAs with an SE-APA pattern were next assessed by the same strategy (Figure 16C-H). In four of these transcripts a C-rich motif preceded the proximal site of the competing polyA sites, and in the remaining two it preceded the distal site. In both situations, the polyA sites located directly 3’ to the C-rich motif were repressed subsequent to CP depletion.
UV-crosslinking assays (and RNA EMSAs for two of them) demonstrated that each of these C-rich motifs binds CPs (Figure 17). Taken together, these studies demonstrate that CP proteins, via interaction with C-rich RNA elements, impact alternative polyA site choice, as summarized in the model in Figure 18.

The sequences encompassing and 5’ to each APA site are shown, with the C-rich sequence highlighted. The real-time RT-PCR assays quantify total polyA site usage (use of both proximal and distal polyA sites) and distal polyA site usage. The histogram indicates the ratio of the long 3’UTR isoform (i.e., use of the distal polyA site) relative to total polyA site usage (as in [5]). These studies were done on the same RNA samples used for the original DRS studies. The real-time RT-PCR quantifications were normalized to GAPDH mRNA and presented as a ratio versus CTRL-3 (cyclophilin siRNA), defined as 1. Standard deviation for each sample is shown (n=3).

Discussion

We previously demonstrated that CPs enhance 3’ processing of the hα-globin transcript via binding to a C-rich motif in the 3’UTR. These findings led us to conclude that the polyC motif in the hα-globin 3’UTR acts as a USE enhancer of hα-globin 3’ processing [7]. The current findings support and extend the role of CPs in the control of 3’ end processing. They document a broad impact of CPs on the steady state levels and polyA site utilization of mRNAs within the human transcriptome. These data specifically identify a subset of mRNA transcripts in which the enhancement of processing is tightly linked to the presence of cognate C-rich binding sites in close proximity to a polyA signal. A global relationship of 3’ processing to gene regulation has been highlighted by a number of recent studies [14-16]. These processing pathways are complex in their biochemistry and reflect the input of multiple proteins and protein complexes [70].
These factors impact 3’ processing via direct as well as indirect interactions with their targets.

Figure 14 Motif analysis of transcripts undergoing APA in response to CP depletion.

(A) Motif analyses 5’ to polyA sites that are involved in APA (DE-APA and SE-APA categories combined; 198 polyA sites) and are repressed in their representation by CP depletion. The distance distribution plot is shown below each corresponding motif. The Y-axis indicates the percentage of each nucleotide at each indicated distance to the polyA cleavage site (defined as base 0). The analyses from cells treated with the control or the CP siRNAs are directly compared in each setting. Asterisks mark positions with FDR (Benjamini-Hochberg) less than 0.05 by the Wilcoxon rank sum test.

(B) Motif analyses 5’ to polyA sites that are involved in APA (DE-APA and SE-APA categories combined; 159 polyA sites) and are enhanced in their representation by CP depletion (organized as in A).

(C) Motif analysis 5’ to polyA sites involved specifically in APA between sites in the same terminal exon (SE-APA category). A discriminative motif discovery analysis by MEME was executed to identify motifs over-represented in the down-regulated SE-APA set (58 polyAs) and under-represented in the up-regulated SE-APA set (44 polyAs). The repressed SE-APA set was defined in this comparison as the positive set, and the enhanced SE-APA set as the negative/background set. The position-specific prior probabilities were first estimated from the background set. A standard motif search was then performed on the down-regulated SE-APA set using these position-specific priors. Note that position 7 can be any of the four nucleotides.

(D) Motifs around DE-APA polyA sites (122 polyAs). The analysis was carried out as in A.
Alterations in the levels of general factors and complexes involved in 3’ processing, such as CPSF, CSTF, and the nuclear polyA binding protein PABPN1, can impact polyA site selection and the efficiency of polyA addition [71]. In addition, particular RNA binding proteins can affect the 3’ processing of specific transcripts or groups of transcripts. For example, the KH-domain RNA-binding protein Nova2, a protein closely related to CP, controls polyA site choice in a position-dependent manner [5]. The polypyrimidine tract binding protein (PTB) has been implicated in the enhancement of 3’ end processing of several genes. This enhancement appears to be mediated by stimulating hnRNP H binding to G-rich binding sites [6]. This pathway appears to have a global role in alternative polyA site selection [27]. Likewise, recent genome-wide surveys have revealed that alterations in the levels of the epithelial splicing regulatory proteins (ESRPs) can trigger widespread shifts in polyadenylation patterns [72]. In the current study we demonstrate that CP proteins are actively involved in the determination of mRNA expression levels and alternative polyadenylation. We observe that CP depletion decreases the steady state levels of substantially more mRNAs than it increases. This result is consistent with the known enhancing action of the CP complex on steady state mRNA levels [51, 54, 61]. The impact of CPs on nuclear functions, in particular on splicing and polyA activity, is likely to also play a role in how much of each mRNA is generated and exported to the cytoplasm. Importantly, these control pathways are likely to be mechanistically interrelated. In the case of hα-globin gene expression, we have shown that the CP complex assembles on the nascent hα-globin transcript in the nucleus. It then appears to travel with the mRNA to the cytoplasm, where it stabilizes the mRNA.
Thus the nuclear and cytoplasmic pathways are linked and may coordinate overall levels of gene expression and protein production. Future studies will determine whether CPs also regulate mRNA steady state levels through other mechanisms. For example, APA may affect mRNA steady state levels by including or excluding miRNA target sites, or binding sites for RNA-binding proteins (such as CPs), in the final mRNA products. The data directly demonstrate that alterations in CP protein availability can regulate polyA site utilization (APA) choices in a subset of RNA polymerase II transcripts through interactions with C-rich sequences. This involvement of CP in the global control of 3’ processing is supported by a recent general screen for proteins involved in 3’ processing [71]. We observe that the formation of a CP RNP complex near a polyA site (either proximal or distal) enhances use of that polyA site ([7] and current data). Following CP depletion, the AES, GET4, CDK16, and SHMT2 transcripts show decreased usage of their proximal polyA sites and a shift to the distal polyA sites (Figure 16). All four genes have C-rich motifs closely upstream of their proximal polyA site. Reciprocally, CP depletion results in decreased usage of the distal polyA sites and a shift to the proximal polyA site in the CSTF1 and PPP2r2d transcripts. In agreement with the model (Figure 18), in which the C-rich USE enhances polyA site activity, we find C-rich motifs 5’ to the distal polyA sites in both of these transcripts. MEME analysis identified the C-rich motif 5’ of the repressed sites. This motif is in remarkable accord with the CP binding site defined by prior analyses of mRNAs targeted by CP [7, 52, 59, 63, 73, 74] and with CP binding features determined by in vitro SELEX [62]. Consistent with this role, we show that these C-rich motifs interact with CP proteins (Figure 17).
From these complementary lines of evidence, we conclude that the binding of CP to a C-rich motif 5’ to a polyA site acts as a potent USE enhancer of 3’ processing. It should be noted that a significant number of mRNAs have altered PA utilization in the absence of the C-rich motif. This may represent secondary effects of CP depletion. As revealed in the current study, CP depletion altered the steady state levels and APA patterns of transcripts encoding a number of RNA binding proteins and components of the RNA processing machinery. These changes may in turn indirectly alter the APA patterns of a subset of genes, although the exact mechanism requires further study.

Figure 15 siRNA-mediated depletion of CPs alters 3’ processing of the CP2 transcript.

(A) Genome browser view of the DRS reads at the CP2 locus, comparing 3’ processing site utilization in cells treated with CP siRNAs (pooled, upper panels) and control siRNAs (pooled, lower panels). The y-axis represents the number of read counts corresponding to each of the polyA sites. The two novel polyA sites observed in the CP-depleted cells are encompassed in the dotted oval.

(B) Generation of the two novel polyA sites in CP mRNA subsequent to CP depletion reflects linked alterations in splicing and 3’ processing. Exons 13 and 14 (terminal exon) of the CP gene are shown. The regular splicing patterns and the positions of the corresponding 3’ polyA termini are indicated by solid lines and solid vertical arrows, respectively. The alternative splicing/3’ processing event that occurs in the CP-depleted cells is indicated by the corresponding set of dotted lines and dotted vertical arrows (upper panel). RT-PCR amplification was performed between primers #1 and #2, and the corresponding fragment was excised and sequenced (middle panel). The sequencing confirmed the use of the novel splice acceptor site of exon 13a, with the two polyA signals highlighted (lower panel: the partial exon 13 sequence is shown in italics, and the arrow indicates the start of the novel exon 13a).

(C) Real-time PCR analysis confirms the switch in terminal intron splicing and 3’ polyA site selection within the CP2 transcript subsequent to CP depletion. Primers F and R assess canonical polyA site usage. Primers c and d assess use of the distal novel polyA site within exon 13a. Primers a and b assess overall levels of alternative splicing and 3’ processing involving exon 13a. The assays were carried out on RNAs isolated from cells treated with each of the 3 distinct control siRNAs and with each of the two distinct CP2-targeting siRNAs. Each real-time assay was normalized to the GAPDH amplicon. The ratio in the CTRL-3 sample is defined as 1.0. Standard deviation for each sample is shown (n=3).

(D) In vitro RNA-protein interaction assay. A 24-nt RNA oligo (shown below the diagram), encompassing the region immediately 5’ to exon 13a (dashed rectangle), was synthesized and 32P-labeled. The DNA sequence 5’ to the splice site is also shown. Left panel: the RNA oligo was incubated with HeLa cell nuclear extract, UV-crosslinked, immunoprecipitated with anti-CP2/KL, and resolved on an SDS-PAGE gel [7]. The position of the ‘CP complex’ is defined by IP using the anti-CP2/KL antibody. Right panel: the same 32P-labeled probe was subjected to RNA EMSA using K562 cell S100 extract as described [63]. (data courtesy of Dr. Xinjun Ji, University of Pennsylvania)

Figure 16 QPCR validation of APA. A subset of the APA events identified in the DRS analysis was independently assessed by targeted RT-PCR. The DRS data are shown in the context of the genome browser diagram of the respective locus. The red arrow indicates the position of the site of alternative polyadenylation triggered by CP depletion.

A-B: Examples of APA involving PA sites located in different exons (DE-APA). (A) SSU72 RNA polymerase II CTD phosphatase homolog (Ssu72) gene, an APA5 event [72]; (B) Nucleophosmin (NPM1) gene, an APA3 event [72]. In each case, targeted PCRs were performed to determine the change in the representation of the short isoforms. All QPCR values were normalized to GAPDH mRNA levels, using the same RNAs as used for the DRS study. Standard deviation for each sample is shown (n=3). Blue and red bars represent the 3’ portions of the long and short mRNA isoforms, respectively.

C-H: Examples of APA involving competing PA sites within the same terminal exon (SE-APA). (C) Amino-terminal enhancer of split (AES) gene; (D) Cyclin-dependent kinase 16 (CDK16) gene; (E) Golgi to ER traffic protein 4 homolog (GET4) gene; (F) Serine hydroxymethyltransferase 2 (SHMT2) gene; (G) Cleavage stimulation factor (CSTF1) gene (also known as CSTF50); (H) Protein phosphatase 2, subunit B, isoform delta (PPP2r2d) gene. (data courtesy of Dr. Xinjun Ji, University of Pennsylvania)

Figure 17 In vitro RNA-protein interaction assay. A 32P-labeled C-rich RNA probe corresponding to each tested gene (shown at the top of the figure) was incubated with HeLa cell nuclear extract and subjected to UV-crosslinking [7] or RNA EMSA, as in Figure 15D. (data courtesy of Dr. Xinjun Ji, University of Pennsylvania)

It is interesting to note that the current study revealed only a small overlap between two sets of mRNAs: those that changed significantly in overall steady state level (DGE values) and those with significant alterations in polyA site utilization (APA patterns). When the DGE genes with a greater than 2-fold change were compared with the APA data set, only 7 of 117 genes (PCBP2, ACOT2, ACSM3, SLC6A6, PRG2, C3orf75 and C1orf86) were present in both categories. When the more inclusive 1.5-fold change cutoff for DGE was used, only 29 of 586 genes were present in both categories.
This is not unexpected, since a reciprocal switch between two sets of competing polyA sites in those transcripts would not necessarily result in a significant change in overall mRNA abundance. Thus the impact of CP depletion on steady state levels for many of the genes studied is likely to reflect substantial changes in the efficiency of 3’ processing at a unique polyA site (i.e., in the absence of APA).

Figure 18 Impact of CPs on expression and alternative polyadenylation of Pol II transcripts. Based on our analysis of 3’ processing of the hα-globin transcript [7] and on the present genome-wide analysis, we propose that CPs can act as general regulators of 3’ processing. The CP RNP complex recruits core components of the 3’ end processing machinery to a defined subset of human transcripts containing cognate C-rich binding site motifs [7]. In this way, when situated in proximity to polyA signals, it enhances the efficiency of 3’ cleavage and polyadenylation. This enhancement of 3’ processing can increase the levels of steady state mRNA and/or alter the pattern of polyA site selection. Thus physiologic [75] or pathologic [76] shifts in the levels or biologic activity of CP can result in major alterations in the transcriptome, through changes in the steady state levels of subsets of mRNAs and/or shifts in the relative utilization of competing polyA sites (APA). (data courtesy of Dr. Xinjun Ji, University of Pennsylvania)

How might the role of the CP/polyC complex as a USE enhancer of 3’ processing relate to known physiologic and pathophysiologic processes? The current study revealed that CPs can autoregulate polyA site selection on the CP2 transcript. This finding is in general agreement with multiple observations of auto-regulatory control over the expression of RNA binding proteins [73]. These new CP2 polyA sites were generated secondary to an alternative splicing event mediated by the shift in CP levels [63].
The CP2 isoforms generated from these novel polyA sites are structurally similar to the CP4 protein, which has been implicated in apoptosis regulation [74]. Future work will determine whether this novel CP2 isoform plays a similar role under some circumstances. CPs are considered to be ubiquitously expressed and are linked to a variety of activities [56]. However, CP activity and levels can change significantly under specific and abnormal circumstances. These include environmental stress [66], cancer [77], differentiation [75], chronic myeloid leukemia [76] and epithelial-mesenchymal transdifferentiation (EMT) during the development and metastatic progression of tumors [78]. It is therefore formally possible that changes in the levels and activity of CP proteins may impact the pattern of alternative polyadenylation of some master genes, or of subsets of genes, in these and other pathways. In summary, the present report reveals that the RNA binding CP proteins play an important role in the 3’ end processing of a subset of genes, and that this effect is mediated by the USE function of the CP RNP complex. Combining our recent study [7] with the current work, we propose the following model. CP complexes assemble on target RNAs co-transcriptionally. These nuclear-assembled complexes are retained on the mRNAs after processing in the nucleus and are co-exported with the mature mRNAs to the cytoplasm. In this way, CPs are able to link nuclear transcript processing and cytoplasmic mRNA metabolism.

Materials and Methods

Cell culture and siRNA transfection

K562 cells were cultured in RPMI 1640 medium supplemented with 10% fetal bovine serum (HyClone) and antibiotic/antimycotic at 37°C in a 5% CO2 incubator. Cells were transfected with a total of 2.0 μg of siRNA using Nucleofector V (Amaxa) according to the manufacturer’s instructions. All siRNAs were ordered from Dharmacon.
The siRNAs used are: CP1-1: GUG AAA GGC UAU UGG GCA A; CP1-4: UGU AAG AGU GGA AUG UUA A; GLD2-1: GUG AUU AAG AAG UGG GCA A; GLD2-2: CCA AAG AUA AGU UGA GUC A; and an siRNA to Cyclophilin (Dharmacon). K562 cells were transfected twice with siRNAs: 24 hours after the initial transfection, the cells were transfected once more with the same siRNAs. Cells were harvested 72 hours after the initial transfection, and RNAs were purified using the Absolutely RNA miniprep Kit (Stratagene) according to the manufacturer’s instructions. Western blot analysis was performed as described [51, 57].

Direct RNA sequencing

The 3’ ends of RNA samples were sequenced by Helicos BioSciences Corporation (Cambridge, MA, USA) according to their protocols [45, 64].

Mapping and APA analysis of DRS data

DRS reads were aligned to the human genome assembly 19 (hg19) using the indexDPgenomic tool in Helisphere (http://open.helicosbio.com/mwiki/index.php/Releases). Uniquely mapped reads with a minimum mapped length of 25 and an alignment score of 4.0 were kept for further analysis. The replicate samples for the control and CP knockdown experiments were pooled for the differential expression and APA analyses. We first filtered out mapped reads arising from internal poly(A) priming using a previously described approach [32]. We next identified individual poly(A) sites by reversing the 5’ ends of the non-internal-priming reads. To construct a consensus poly(A) annotation for downstream analysis, we pooled the control and αCP data. Each individual poly(A) site was then iteratively clustered with its nearest poly(A) site within 40 nt on the same chromosome strand. Each cluster's representative coordinate was its weighted coordinate: the sum, over the cluster, of each individual poly(A) coordinate multiplied by that site's fraction of the cluster's total usage.
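The consensus-cluster step can be illustrated with a short Python sketch. The weighted coordinate follows the definition above: x_cluster = Σ_i x_i · n_i / Σ_j n_j, where x_i is a site's coordinate and n_i its read count. The merging rule itself is an assumption, since the text does not spell it out. On a single chromosome strand, the sketch repeatedly merges the two adjacent clusters whose weighted coordinates are closest, stopping once no pair lies within 40 nt. The function names are hypothetical.

```python
def weighted_coordinate(sites):
    """Sum of each site's coordinate times its fraction of the cluster's reads."""
    total = sum(count for _, count in sites)
    return sum(pos * count / total for pos, count in sites)


def cluster_polya_sites(sites, max_dist=40):
    """Greedily merge polyA sites on one strand; sites is a list of (position, read_count).

    Returns (weighted_coordinate, total_reads) for each resulting cluster.
    """
    clusters = [[site] for site in sorted(sites)]
    while len(clusters) > 1:
        centres = [weighted_coordinate(c) for c in clusters]
        gaps = [centres[k + 1] - centres[k] for k in range(len(centres) - 1)]
        k = min(range(len(gaps)), key=gaps.__getitem__)  # closest adjacent pair
        if gaps[k] > max_dist:
            break
        # merging two neighbours keeps the centres sorted along the strand
        clusters[k:k + 2] = [clusters[k] + clusters[k + 1]]
    return [(weighted_coordinate(c), sum(n for _, n in c)) for c in clusters]
```

For example, `cluster_polya_sites([(1000, 10), (1010, 30), (1100, 5)])` returns `[(1007.5, 40), (1100.0, 5)]`. The first two sites merge, and their weighted coordinate is pulled toward the more heavily used site at 1010.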
The frequencies of poly(A) clusters in the different samples were calculated according to the above consensus coordinates of poly(A) clusters in the pooled data. Next, poly(A) clusters residing anywhere in a gene region were collected as candidate poly(A)s of that gene (UCSC genes (hg19) and Ensembl genes (release 61)). The gene region included exons, introns, and the 100-nt region downstream of the terminal exon. We then tested whether the usage of any single poly(A) cluster of a given gene had changed. Fisher's exact test compared the ratio of the DRS counts of that cluster to the summed counts of all the gene's other poly(A) clusters, between the pooled control and αCP knockdown samples. The p-values were adjusted by the Benjamini-Hochberg method to obtain FDRs. Finally, significantly changed poly(A)s were defined as those meeting two criteria: an FDR less than 0.05, and a change greater than 10% in their share of the gene's total polyA usage (|Δ| > 0.1).

Detection of differential gene expression

The expression level of a gene is represented by the sum of the DRS read counts of all its overlapping poly(A)s. We then ran DEGseq to detect differential gene expression between the control and CP knockdown samples. Genes with an FDR less than 0.05 and a fold change greater than 1.5 (normalized by the number of mapped DRS reads) were defined as significantly differentially expressed genes.

Motif enrichment analysis

We divided the significantly changed polyAs in the APA study into up-regulated (FDR < 0.05 and Δ > 0.1) and down-regulated (FDR < 0.05 and Δ < -0.1) sets. Motif enrichment and co-occurrence analyses were conducted on these two subsets separately. The 200-nt sequences upstream of the polyAs were first scanned by MEME. To control for polyA abundance, we grouped significantly changed polyAs and unchanged polyAs (FDR > 0.5) into bins according to DRS read counts (the borders of the bins were defined by … , where n is determined by the most highly abundant polyA in the dataset).
Background poly(A) sites were then randomly sampled from the unchanged sites (FDR > 0.5), with each bin 10 times larger than the corresponding bin of the significant set. To draw the RNA map, a motif score was calculated at each position in the 200-nt region upstream of the poly(A) site. The score is the percentage of nucleotides overlapping the motif within a 31-nt window centered on that position (15 nt upstream and 15 nt downstream), averaged over the significant or the background poly(A) sites. A Wilcoxon rank-sum test was used to assess the significance of the difference in average motif score at each position, and the p-values were adjusted with the Benjamini-Hochberg procedure to obtain FDRs. The dominant poly(A) sites of the significantly up-regulated and down-regulated genes were retrieved for the same motif analysis as in the APA study. The background dataset was generated by controlling for the expression levels of the significantly differentially expressed genes, using the same binning and random sampling method as in the APA analysis. All subsequent analyses were also the same as in the APA analysis.

Gene ontology analysis
DAVID (using the PANTHER classification system) was used to analyze Gene Ontology enrichment separately for the significantly differentially expressed genes and for the genes with APA events [79]. The background gene sets were matched to the distribution of expression levels in the foreground gene sets using the same “binning and random sampling” method described in “Motif enrichment analysis”.

qPCR
RNAs were treated with DNase I (Invitrogen, PCR grade) and then reverse transcribed using the First-Strand cDNA Synthesis Kit (GE). qPCR was performed with the Fast SYBR Green Master Mix kit (Applied Biosystems) on a 7900HT Fast qPCR machine (Applied Biosystems) according to the manufacturer’s instructions. Primers used in the DGE and APA studies are listed in Table 6.

3’ RACE
3’ RACE was performed according to an established protocol (3’ RACE System for Rapid Amplification of cDNA Ends, Invitrogen).
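The per-cluster usage test described in “Mapping and APA analysis of DRS data” is sketched below. The test compares one cluster against all other clusters of the gene, then applies Benjamini-Hochberg correction and the 10% usage-change cutoff. It is written in pure Python so it runs without extra dependencies; in practice `scipy.stats.fisher_exact` computes the same two-sided test. The text does not say whether the test is one- or two-sided, so two-sided is an assumption, as are all function names and the input layout. Usage change is taken as the knockdown fraction minus the control fraction of the gene’s poly(A) reads at that cluster.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Hypergeometric probability of every table with the observed margins.
    probs = {x: comb(row1, x) * comb(n - row1, col1 - x) / denom for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Sum over tables no more probable than the observed one (small tolerance for ties).
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7)))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def changed_polya_sites(genes, fdr_cut=0.05, min_change=0.1):
    """genes: dict gene -> (control_counts, knockdown_counts), one count per poly(A) cluster.

    Returns (gene, cluster_index, usage_change, fdr) for significantly changed clusters.
    """
    records, pvalues = [], []
    for gene, (ctrl, kd) in genes.items():
        n_ctrl, n_kd = sum(ctrl), sum(kd)
        if len(ctrl) < 2 or n_ctrl == 0 or n_kd == 0:
            continue  # need two or more clusters and reads in both samples
        for i, (x, y) in enumerate(zip(ctrl, kd)):
            # Rows: control / knockdown; columns: this cluster / all other clusters of the gene.
            pvalues.append(fisher_exact_two_sided(x, n_ctrl - x, y, n_kd - y))
            records.append((gene, i, y / n_kd - x / n_ctrl))
    fdrs = benjamini_hochberg(pvalues)
    return [(g, i, change, q) for (g, i, change), q in zip(records, fdrs)
            if q < fdr_cut and abs(change) > min_change]
```

Note that the FDR correction runs over all tested clusters genome-wide, not gene by gene.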
RNA UV-crosslinking and EMSA
UV-crosslinking assays were performed as described [7, 63].

CHAPTER IV CONTEXT-DEPENDENT REGULATION OF APA BY EPITHELIAL SPLICING REGULATORY PROTEINS

Introduction
ESRP1 and ESRP2 are two epithelial cell-specific splicing regulators that play an important role in the epithelial-to-mesenchymal transition (EMT). Previous studies using high-density microarrays uncovered their regulation of a broad range of alternative splicing events (alternative cassette exons and alternative 5’/3’ splice sites) and a limited set of APA events [80, 81]. However, as reviewed in Chapter II, microarray platforms are severely limited in their ability to identify APA. In the current study, we therefore combined RNA-Seq and DRS to identify significant APA events regulated by the ESRPs. Based on motif analysis of ESRP SELEX-Seq data, we also scanned the sequences flanking ESRP-regulated polyA sites for these motifs. This gave insights into the context-dependent pattern by which the ESRPs regulate APA.

Results
Identification of ESRP-regulated changes in alternative 3’ end formation by coupling
We prepared RNA-Seq and direct RNA sequencing (DRS) libraries from mesenchymal MDA-MB-231 cells ectopically expressing ESRP1, as well as from control cells over-expressing GFP (Figure 19). The RNA-Seq and DRS libraries were sequenced on the Illumina and Helicos DRS platforms, respectively [64]. The DRS reads were filtered for a minimum length of 25 and for internal priming, yielding 3,338,956 and 3,537,072 uniquely mapped reads for the control and ESRP-expressing samples, respectively. Clustered DRS reads were then used to identify 335 candidate polyA sites with significant differential use between control and ESRP-expressing cells.

Figure 19 Outline of the experimental systems and RNA-Seq/DRS protocol used to identify ESRP-regulated APA. (data courtesy of Dr. Kimberly Dittmar, University of Pennsylvania)

To validate these events, we collected RNA-Seq reads within the 300-nt region upstream of the polyA sites. Overall, 71.8% of the DRS-predicted changes in polyA site use were supported by RNA-Seq, and many of the non-validated cases lacked sufficient RNA-Seq coverage at the relevant genomic location. These data suggest that the DRS pipeline is robust and accurate. However, to obtain a more confident set of ESRP-regulated polyA sites, we further filtered the DRS-predicted polyA sites to include only those with statistical RNA-Seq validation. This resulted in a total of 160 high-confidence changes in polyA site use in response to ESRP1 expression, of which 32 were designated SE-APA, 76 DE-APA3, and 52 DE-APA5.

Table 8 Summary of direct RNA sequencing data of ESRP1 ectopic expression.
RNA samples | Control | ESRP
Sequenced reads | 11,494,266 | 11,624,031
Uniquely mapped reads | 3,660,857 | 3,881,752
Non-internal priming reads | 3,338,956 | 3,537,072
Percentage of non-internal priming reads | 29.0% | 30.4%
Non-internal priming reads overlapping known annotation | 1,698,664 | 1,924,890
Percentage of known polyA sites | 50.9% | 64.4%

In six cases we also used competitive RT-PCR with a common forward primer and specific reverse primers recognizing each alternative form. These competitive PCRs are less quantitative than those using common primer sets. Even so, they supported each of the events tested, adding evidence that the DRS/RNA-Seq approach is robust and reliable (Figure 20). One example of alternative polyA use in the same UTR (SE-APA) was BAG1, where ESRP1 promoted expression of the isoform with an extended 3’ UTR (Figure 20). We also validated a DE-APA3 event in the CHID1 transcript, an example in which ESRP promotes use of a proximal terminal exon (Figure 20). In the EPHA2 transcript, ESRP promotes the use of a 5’ splice site in the proximal DE-APA5-type terminal exon.
We were intrigued by several examples in which ESRP expression induced proximal DE-APA3 or DE-APA5 type 3’ terminal exons located very close to the 5’ end of transcripts, associated with a significant decrease in expression of the downstream exons (Figure 21). For example, in COL5A1 our analysis identified a novel DE-APA3 event in the fourth intron and a DE-APA5 event in the first exon (Figure 21A). ESRP expression promoted both events, leading to short truncated products and a nearly 35-fold reduction in expression of full-length transcripts. Similar reductions in total mRNA expression were observed when ESRP activated a DE-APA5 in exon one of HSPG2 and in the second non-coding exon of EIF4G3 (Figure 21). It is not known whether the resulting truncated transcripts encode polypeptides. Nonetheless, these examples illustrate how early polyadenylation can downregulate gene expression, although this mechanism may also involve microRNA-based regulation of the alternative UTRs. These examples are reminiscent of a previous observation in transcripts of the cleavage stimulation factor CstF-77. There, use of a conserved alternative polyA site in the third intron was proposed as a way for alternative polyadenylation to directly modulate expression of the full-length functional isoform [82]. Similar to our observations for the ESRPs, a limited number of other examples have been described in which splicing factors also regulate polyadenylation [70]. SE-APA type events suggest that the ESRPs can directly interact with components of the polyadenylation machinery. In contrast, DE-APA3 and DE-APA5 type events represent a more complex type of regulation involving interplay between splice site and polyA site selection. Previous studies of the Nova proteins (and their D.
melanogaster ortholog Pasilla) have shown that their binding sites or known motifs support position-dependent regulation of these more complex events [5, 83]. We therefore sought to explore two questions: whether the ESRP binding motif was enriched in the set of alternatively polyadenylated transcripts identified here, and whether the position of these putative binding sites might determine whether they positively or negatively affect polyA site use. Using a more refined set of 108 such events regulated by ESRP (see Materials and Methods), we evaluated the positions of these motifs relative to a background set of alternative polyA sites. Well-supported examples of each APA subtype were limited in number. Our primary analysis therefore pooled all alternative polyA sites and investigated whether motif enrichment relative to the site of polyA addition might determine whether use of a given site increased or decreased upon ESRP expression. As shown in Figure 22, ESRP binding sites were highly significantly enriched both upstream and downstream of the ESRP-regulated polyA sites, suggesting that the ESRPs can directly affect polyadenylation.

Figure 20 Examples of three types of alternative 3’ end formation regulated by the ESRPs. (A) Example of SE-APA, (B) example of DE-APA3, and (C) example of DE-APA5, shown as UCSC browser views of RNA-Seq and direct RNA sequencing (DRS) read counts from MDA-MB-231 control (EV, green) vs. ESRP-overexpressing (ESRP, red) cells, together with RT-PCR validations. (data courtesy of Dr. Kimberly Dittmar, University of Pennsylvania)

Figure 21 Examples of APA events correlating with changes in host gene expression. APA5 and APA3 events near the 5’ end that result in decreased overall expression of these genes in ESRP1-expressing cells (visualized in the UCSC genome browser).
In some regions relative to the polyA site, ESRP binding motifs were enriched in both ESRP-enhanced and ESRP-repressed sites. In other regions, binding sites were more statistically associated with either enhanced or repressed sites. For example, in the region from about -220 to -160 nt upstream of ESRP-regulated polyA sites, enrichment was greater in enhanced than in repressed polyA sites (Figure 22). Conversely, ESRP-repressed sites showed greater motif enrichment from +7 to +86 nt and from +200 to +250 nt downstream of the polyA site. These observations agree with previous studies showing enrichment of binding motifs for other splicing factors, such as Nova and hnRNP H, that also regulate polyadenylation [5, 27]. These observations implicate the ESRPs in the regulation of polyadenylation. However, DE-APA3 and DE-APA5 events may also be regulated at the level of splicing, through ESRP binding to intronic regions near the regulated splice sites. Such events may further involve coupled recruitment or inhibition of the splicing and polyadenylation machineries. Because each subtype (DE-APA3 and DE-APA5) contained only a limited number of events, we were unable to derive a separate confident map of ESRP binding sites for these events. Although this motif analysis supports a role for the ESRPs in the direct regulation of polyadenylation, we cannot be sure that all of the events we identified are direct targets. Nonetheless, the current set of over 100 high-confidence changes in polyadenylation provides a useful dataset for downstream analysis. It also illustrates the potential of integrating high-throughput sequencing of mRNA 3’ ends with transcriptome-wide sequencing to uncover larger sets of these regulatory networks.

Discussion
We analyzed DRS and RNA-Seq data to identify APA events potentially regulated by the ESRPs.
This work highlights the emerging potential of high-throughput sequencing methodologies to comprehensively identify differential patterns of alternative polyadenylation. Our pipeline combines RNA-Seq and DRS to investigate the regulation of polyadenylation. It demonstrates how coupling different high-throughput technologies can increase detection power while also providing intrinsic cross-validation. Our studies also add the ESRPs to the list of known splicing regulators that can regulate polyadenylation, such as Nova, hnRNP F, hnRNP H, PTB, and U1A [70, 84, 85]. The proposed map of ESRP binding sites suggests that the ESRPs can promote or inhibit polyadenylation in a position-dependent manner, but further experiments are needed to validate this. In addition, future investigations are needed to understand the mechanisms by which the ESRPs regulate polyadenylation. hnRNP F, a homolog of the ESRPs, was previously suggested to regulate alternative polyadenylation of the IgM heavy chain by inhibiting CstF64 binding to the downstream polyA site [86]. PTB was also shown to inhibit polyadenylation by reducing CstF64 binding, whereas binding of PTB upstream of the polyA site can promote polyadenylation [26]. Based on the pattern of enrichment of ESRP binding motifs relative to regulated polyA sites, we envision that the ESRPs might operate through similar mechanisms to regulate polyadenylation.

Figure 22 A functional map for ESRP position-dependent regulation of alternative polyadenylation. The top twelve 6-mer ESRP binding motifs from SELEX-Seq were used to derive an ESRP binding score. The score is mapped across the set of 108 DRS-identified, RNA-Seq cross-validated ESRP-regulated polyA sites and the 250 nt upstream and downstream, with promoted sites in red and silenced sites in blue. The score was also mapped across a background set of annotated polyA sites (black). (data courtesy of Dr. Peng Jiang, University of Iowa)
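The position-wise scores behind an RNA map such as Figure 22 can be sketched using the definition given in the Chapter III “Motif enrichment analysis” section: the fraction of motif-covered nucleotides in a 31-nt window, averaged over sites. Two points are assumptions. The first is whether the ESRP binding score uses exactly this definition. The second is the motif set itself: the twelve SELEX-Seq 6-mers are not listed here, so any motif set passed in (for example a GU-rich placeholder) is hypothetical. Truncating windows at sequence ends is also a choice made only for this sketch.

```python
def motif_coverage(seq, motifs):
    """Boolean mask marking nucleotides covered by at least one motif occurrence."""
    covered = [False] * len(seq)
    for motif in motifs:
        k = len(motif)
        for start in range(len(seq) - k + 1):
            if seq[start:start + k] == motif:
                covered[start:start + k] = [True] * k
    return covered


def window_scores(seq, motifs, half_window=15):
    """Fraction of motif-covered nucleotides in a (2 * half_window + 1)-nt window
    centred on each position; windows are truncated at the sequence ends."""
    cov = motif_coverage(seq, motifs)
    scores = []
    for pos in range(len(seq)):
        lo, hi = max(0, pos - half_window), min(len(seq), pos + half_window + 1)
        scores.append(sum(cov[lo:hi]) / (hi - lo))
    return scores


def average_profile(seqs, motifs, half_window=15):
    """Average per-position score over equal-length sequences aligned on the poly(A) site."""
    per_seq = [window_scores(s, motifs, half_window) for s in seqs]
    return [sum(col) / len(per_seq) for col in zip(*per_seq)]
```

Keeping the per-sequence scores from `window_scores`, rather than only the average, gives the per-position samples that a rank-sum test such as `scipy.stats.ranksums` would compare between regulated and background sites.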
We also noted numerous examples of high-confidence ESRP-regulated polyA sites that represent novel, previously unannotated sites of polyadenylation. In fact, 72 of the 271 (26.6%) regulated polyA sites supported by both DRS and RNA-Seq were novel. This is similar to other recent 3’ end sequencing studies that discovered large numbers of novel polyA sites [32, 43, 64]. As similar technologies are applied to additional regulators and different cell contexts, the percentage of human genes known to undergo regulated alternative polyadenylation will almost surely continue to rise.

Materials and Methods
Cell culture, transfection, and transduction
MDA-MB-231 cells were maintained, transfected, and transduced as described [87].

Library preparation and sequencing
Sequencing libraries were prepared using the mRNA-Seq Sample Prep Kit (Illumina) according to the manufacturer's instructions. PolyA RNA was prepared from 10 µg of total RNA and fragmented, followed by cDNA synthesis with random hexamers and ligation to Illumina adaptor sequences. The samples were quantified using an Agilent 2100 Bioanalyzer, loaded onto flow cells for cluster generation, and sequenced on an Illumina Genome Analyzer IIx using a single-read protocol to generate 76-bp reads.

Identification of ESRP-regulated changes in polyadenylation
We used single-end RNA-Seq reads from the EV and ESRP experiments to infer the exon-exon junctions used to classify APA events. Junctions were predicted with TopHat [88]. The predicted junctions and known gene annotations were then combined to classify APA types (SE-APA, DE-APA3, and DE-APA5). To assess the agreement between DRS and RNA-Seq data and to validate the DRS predictions, we counted RNA-Seq reads within the 300-nt regions upstream of the polyA sites and applied a one-sided Fisher's exact test to the RNA-Seq read counts.
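The RNA-Seq check above and the consensus criteria in the following paragraph can be sketched together as below. Several details are assumptions: reads are counted by alignment start rather than by overlap, and the library is treated as unstranded. The exact 2x2 table behind the one-sided RNA-Seq Fisher test is not given, so its p-value is taken as an input. The step that keeps only the most significant comparison per gene is omitted. All names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import Optional

APA_TYPES = {"SE-APA", "DE-APA3", "DE-APA5"}


def upstream_window(pos, strand, width=300):
    """Inclusive genomic interval covering `width` nt upstream of a polyA site."""
    if strand == "+":
        return pos - width, pos - 1
    return pos + 1, pos + width  # on the minus strand, upstream means higher coordinates


def count_upstream_reads(read_starts, chrom, pos, strand, width=300):
    """read_starts: dict chrom -> sorted read start coordinates (unstranded library assumed)."""
    starts = read_starts.get(chrom, [])
    lo, hi = upstream_window(pos, strand, width)
    return bisect_right(starts, hi) - bisect_left(starts, lo)


@dataclass
class ApaEvent:
    gene: str
    drs_fdr: float
    drs_change: float        # change in fraction of DRS reads at this site (ESRP minus EV)
    rnaseq_p: float          # one-sided Fisher p-value from RNA-Seq counts
    rnaseq_change: float     # only the sign (direction) is used
    apa_type: Optional[str]  # None when the event could not be classified


def consensus_events(events, fdr_cut=0.05, p_cut=0.01, min_change=0.1):
    """Keep events significant in both datasets, concordant, large enough, and classified."""
    return [e for e in events
            if e.drs_fdr < fdr_cut and e.rnaseq_p < p_cut
            and e.drs_change * e.rnaseq_change > 0  # same direction in DRS and RNA-Seq
            and abs(e.drs_change) >= min_change
            and e.apa_type in APA_TYPES]
```

Binary search over sorted read starts keeps each window query logarithmic in the number of reads on the chromosome.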
We defined consensus validated events as those with FDR less than 0.05 from DRS and p-value less than 0.01 from RNA-Seq, with the same direction of change in both datasets. We also required at least a 10% change in polyA site use from DRS and discarded events that could not be classified as SE-APA, DE-APA3, or DE-APA5. To draw an RNA map of ESRP binding motifs within these events, we further removed significant APA genes with more than two polyA sites. A number of the DE-APA3 and DE-APA5 events corresponded to comparisons of two or more closely spaced polyA sites with a single alternative polyA site. In these cases, we retained only the most representative comparison within the gene, namely the one with the most significant p-value.

CHAPTER V GLOBAL REGULATION OF ALTERNATIVE POLYADENYLATION BY CLEAVAGE STIMULATION FACTOR 64 (CSTF64)

Background
The cleavage stimulation factor (CstF) complex is one of the core factors involved in 3’ end processing and consists of CstF77, CstF50, and CstF64 [89]. CstF64 can directly bind to U/GU-rich RNA elements via its RNA recognition motif (RRM) [90]. The AAUAAA hexamer is highly conserved, but the downstream U/GU-rich elements are much more heterogeneous, and it is not well understood how CstF64 recognizes such divergent sequences [91, 92]. As reviewed in Chapter I, CstF64 is an important regulator of APA; however, it remains unknown how CstF64 regulates APA globally. In addition, CstF64τ is a paralog of CstF64, and the two proteins share a similar domain structure [93]. CstF64τ was recently isolated as part of the CstF complex [10], but its functions in mRNA 3’ processing remain poorly understood. To comprehensively characterize the functions of CstF64/τ in vivo, we profiled RNA polyadenylation quantitatively in CstF64-expressing and CstF64-depleted cells to characterize CstF64-mediated global APA regulation.
The results of this study provide significant new insights into the mechanisms of PAS recognition and APA regulation by CstF64/τ.

Global analyses of CstF64-mediated APA regulation
To characterize the role of CstF64 in global APA regulation, we generated HeLa cell lines (CstF64-RNAi cells) that stably express small hairpin RNAs specific to CstF64 mRNA. As shown in Figure 23A, CstF64 was efficiently depleted in these cells, whereas CstF77 and CstF50 levels were not significantly affected. Interestingly, these cells showed no apparent growth defects. We then isolated total RNA from control HeLa cells and CstF64-RNAi cells and carried out direct RNA sequencing (DRS) on the Helicos platform to quantitatively map RNA polyadenylation profiles (Table 9).

Table 9 Summary of DRS data of CstF64/τ knockdown experiments.
RNA samples | HeLa | CstF64-siRNA | CstF64/τ-siRNA
Sequenced reads | 8,787,115 | 22,699,065 | 2,769,295
Uniquely mapped reads | 3,978,877 | 9,957,878 | 1,115,885
Non-internal priming reads | 3,795,224 | 9,538,799 | 1,060,090
Percentage of non-internal priming reads | 43.2% | 42.0% | 38.3%
Non-internal priming reads overlapping known annotation | 2,760,228 | 7,160,775 | 553,018
Percentage of known polyA sites | 72.7% | 75.1% | 52.2%

Comparing the APA profiles of control HeLa cells and CstF64-RNAi cells, we identified 327 PASs with significantly different usage. Eighty-five genes were identified as high-confidence targets because they contained two alternative PASs with significantly different usage. Among them, 52 genes showed an increase in the relative usage of the distal PAS in CstF64-RNAi cells (proximal-to-distal shift), while 33 genes showed changes in the opposite direction (distal-to-proximal shift) (Figure 23B, left panel). Given the known function of CstF64 as an essential mRNA 3’ processing factor, it was surprising that depletion of CstF64 had a relatively small effect on the global APA profile.
Interestingly, we observed that the protein levels of CstF64τ were significantly higher in CstF64-RNAi cells (Figure 23A). We next compared the RNA-binding specificities of CstF64 and CstF64τ using gel shift assays. These assays used purified GST-CstF64-RRM or GST-CstF64τ-RRM and the polyA sites of SVL, BASP1, and RPS11. For all tested RNAs, the affinities of CstF64 and CstF64τ were almost indistinguishable (Figure 24). These results suggest that CstF64τ and CstF64 have overlapping RNA-binding specificities and may play redundant roles in mRNA 3’ processing. The elevated levels of CstF64τ in CstF64-RNAi cells may therefore at least partially compensate for the loss of CstF64. To assess the specific role of CstF64 in global APA regulation, we knocked down CstF64τ in CstF64-RNAi cells to a level similar to that in control HeLa cells by transient transfection of siRNAs against CstF64τ (Figure 23A, right lane). The DRS analysis of these CstF64 and CstF64τ double knockdown (CstF64&τ-RNAi) cells yielded two interesting observations. First, we identified 873 PASs with significantly different usage between CstF64&τ-RNAi cells and control HeLa cells, significantly more than the number detected for CstF64-RNAi cells. In total, 201 genes were identified as high-confidence targets with two PASs that displayed significantly different usage (Figure 23B, right panel). The genes with significant APA changes in CstF64-RNAi cells overlapped significantly with those in CstF64&τ-RNAi cells (Figure 25A). The regulated PASs in the two datasets also shared some sequence features (Figure 25B). Second, among the genes with APA changes, the majority (171 genes, or 85%) showed a proximal-to-distal shift, while only 30 genes (15%) displayed changes in the opposite direction (Figure 23B).
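The proximal-to-distal versus distal-to-proximal classification can be sketched by comparing log10(proximal/distal) between control and RNAi cells, the quantity plotted on the axes of Figure 23B. The pseudocount of 1 and the function names are assumptions. Significance is assumed to have been established beforehand, for example by the per-PAS Fisher test with FDR < 0.05.

```python
from collections import Counter
from math import log10


def log_ratio(proximal, distal, pseudocount=1):
    """log10(proximal/distal) with a pseudocount to avoid division by zero."""
    return log10((proximal + pseudocount) / (distal + pseudocount))


def shift_direction(ctrl, rnai, pseudocount=1):
    """ctrl, rnai: (proximal_reads, distal_reads) for one PAS pair."""
    delta = log_ratio(*rnai, pseudocount) - log_ratio(*ctrl, pseudocount)
    if delta < 0:
        return "proximal-to-distal"  # relatively more distal usage after knockdown
    if delta > 0:
        return "distal-to-proximal"
    return "unchanged"


def summarize_shifts(pairs):
    """pairs: iterable of (ctrl, rnai) tuples for significantly changed PAS pairs.

    Returns {direction: (number_of_pairs, percentage)}.
    """
    counts = Counter(shift_direction(c, r) for c, r in pairs)
    total = sum(counts.values())
    return {k: (n, 100 * n / total) for k, n in counts.items()}
```

As a check on the reported figures, 171 of 201 genes is about 85.1% and 30 of 201 about 14.9%, consistent with the rounded 85% and 15%.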
We validated our APA analysis results for six selected target genes by quantitative RT-PCR (qRT-PCR). The primer sets targeted either the common region shared by both APA isoforms or the extended region found only in the longer isoform. For all six genes tested, qRT-PCR confirmed the directionality of the APA changes detected by the DRS analysis (Figure 23C), suggesting that our DRS analyses of APA were highly reliable. In most cases, the APA changes were larger in CstF64&τ-RNAi cells than in CstF64-RNAi cells. To understand the role of CstF64-RNA interactions in APA regulation, we compared CstF64 individual-nucleotide resolution UV cross-linking and immunoprecipitation (iCLIP) signals at the polyA sites regulated by CstF64. We divided the genes with significant APA changes in CstF64&τ-RNAi cells into “proximal-to-distal shift” and “distal-to-proximal shift” groups. For all genes within each APA group, we then plotted the total normalized iCLIP signals at the proximal and distal sites. As shown in Figure 23D, proximal and distal PASs had similar iCLIP signal levels for genes in the “proximal-to-distal” group (top panel). By contrast, for genes in the “distal-to-proximal” group, the distal PASs had significantly higher CstF64 iCLIP signals than the proximal PASs (Figure 23D, lower panel).

Discussion
Our study provides important new insights into the role of CstF64 in global APA regulation. First, our data revealed that CstF64 and its paralog CstF64τ have overlapping RNA-binding specificities and play redundant roles in APA regulation. This functional redundancy explains two observations: depletion of CstF64 alone had little effect on either cell growth or the global APA profile, and co-depletion of CstF64 and CstF64τ led to greater APA changes.
As CstF64τ was still present in our CstF64&τ-RNAi cells (Figure 23A), the actual number of APA events regulated by CstF64 and CstF64τ may exceed the number identified in this study. Second, our data suggest that CstF64 is an important global regulator of APA and, in most cases, promotes the usage of proximal PASs (Figure 23B). We propose the following model for CstF64-mediated APA regulation. When CstF64 is abundant, it promotes efficient recognition of the proximal, weaker PASs through direct protein-RNA interactions. 3’ processing at proximal PASs then prevents the transcription and usage of the distal PASs. When CstF64 is limited, however, recognition of the proximal PASs becomes less efficient. This allows the distal, stronger PASs to be transcribed and recognized by the 3’ processing machinery. Our results are consistent with previous studies showing that higher levels of CstF64 led to increased usage of the proximal PASs in the IgM and NF-ATc mRNAs [20, 94].

Figure 23 CstF64-mediated global APA regulation. (A) Western blot analysis of control HeLa, CstF64-RNAi, and CstF64&τ-RNAi cells. (B) Pairwise comparison of PAS usage in HeLa, CstF64-RNAi, and CstF64&τ-RNAi cells. Y axis: log10(proximal/distal) in HeLa. X axis: log10(proximal/distal) in CstF64-RNAi (left) or CstF64&τ-RNAi (right) cells. PAS pairs with statistically significant differences in usage are highlighted in blue (higher usage of the proximal PAS in RNAi cells) or red (higher usage of the distal PAS in RNAi cells). (C) qRT-PCR verification of the APA changes in six genes. Y axis: log2 ratio of RNAi/HeLa (extended/common). (D) Total normalized iCLIP signals for proximal-to-distal shift (red) and distal-to-proximal shift (blue) PAS pairs (the same highlighted PAS pairs as in (B)). (data courtesy of Dr. Chengguo Yao, University of California, Irvine)
Third, although CstF64 is believed to be a general 3’ processing factor, our results suggest that CstF64 depletion affects the APA of a specific subset of genes. Interestingly, a similar phenomenon has been reported in splicing, where changes in the concentration of core spliceosomal components regulate specific alternative splicing events [95]. This may be a common theme in the regulation of mRNA processing. Finally, a number of recent studies have reported widespread and systematic APA changes under a variety of physiological and pathological conditions [70]. Interestingly, systematic APA shifts to the distal PASs during stem cell differentiation and development are accompanied by decreased mRNA levels of many core 3’ processing factors, including CstF64 and CstF64τ. Our study provides the first direct evidence that a decrease in the protein levels of a general 3’ processing factor leads to higher relative usage of the distal PASs in many mRNAs. Future studies should determine whether and how the protein levels of CstF64/τ and other core 3’ processing factors are regulated under different physiological conditions, and how these changes contribute to global APA regulation.

Figure 24 Comparison of the RNA-binding specificities of CstF64 and CstF64τ. Gel mobility shift assays using recombinant GST-CstF64-RRM or GST-CstF64τ (0, 25, and 50 μM) and the 60-nt sequences downstream of the cleavage sites of the listed genes. SVL RNA was used as a positive control. (data courtesy of Dr. Chengguo Yao, University of California, Irvine)

Materials and Methods
Cell culture and transfections
HeLa cells were grown in DMEM plus 10% fetal bovine serum. For CstF64 RNAi, a pSuperior.puro plasmid was constructed to express small hairpin RNAs targeting CstF64 mRNA (target sequence: GTTAGATGCCAGAGGATTA). Transfections were carried out using Lipofectamine 2000 (Invitrogen) according to the manufacturer's instructions.
Stable CstF64 RNAi cell lines were obtained by selection with puromycin and expansion of single colonies. To knock down CstF64τ in CstF64 RNAi cell lines, pre-designed siRNAs (Ambion s23471) were transfected into a stable CstF64 RNAi cell line using Lipofectamine 2000. Knockdown efficiencies were determined by western blotting using antibodies against CstF64 (mAb 6A9) and CstF64τ (Bethyl A301-487A).

Gel shift assay
RNA substrates were synthesized by T7 transcription in the presence of [α-32P]UTP. RNAs (~1.5 nM) were incubated with 0 to 60 µM GST-CstF64-RRM fusion protein at 30°C for 10 min in 10.6 µl binding buffer. The buffer contained 10 mM HEPES (pH 7.9), 50 mM NaCl, 0.5 mM MgCl2, 0.1 mM EDTA, 5% glycerol, 1 mM ATP, 10 mM creatine phosphate, 5 mM β-mercaptoethanol, 0.25 mM PMSF, 0.7 µg E. coli tRNA, and 1.4 µg BSA. Reaction mixtures were resolved on 5% nondenaturing PAGE gels.

Sequencing and reads mapping
Direct RNA sequencing (DRS) was performed by Helicos Biosciences, and DRS reads were aligned to human genome assembly 19 (hg19) using the indexDPgenomic tool in Helisphere. Uniquely mapped reads with a minimal mapped length of 25 and a minimal alignment score of 4.0 were kept for further analysis. We first filtered out mapped reads arising from internal poly(A) priming using a previously described approach (9). We next identified individual poly(A) sites by reversing the 5’ ends of the non-internal-priming reads.

Figure 25 Comparison of APA changes in CstF64- and CstF64&τ-RNAi cells. (A) Venn diagram comparing the genes with two PASs showing significantly different usage in CstF64- and CstF64&τ-RNAi cells. (B) MEME analysis of the proximal and distal PASs (200-nt sequences centered on the cleavage sites) of genes with proximal-to-distal shifts in CstF64&τ- (top panel) and CstF64-RNAi cells.
To construct a consensus poly(A) annotation for downstream analysis, we used pooled data from HeLa-Mock and CstF64-RNAi cells to iteratively cluster each individual poly(A) site with its nearest poly(A) site on the same chromosome strand when the two were within 40 nt of each other. The representative coordinate of each poly(A) cluster was its weighted coordinate. This was calculated as the sum, over all individual poly(A) sites in the cluster, of each site’s coordinate multiplied by its fractional usage within the cluster. The frequencies of poly(A) clusters in the different samples were then calculated using these consensus cluster coordinates from the pooled data. Next, the poly(A) clusters located anywhere in a gene region (exons, introns, and the 100-nt region downstream of the terminal exon) were collected as candidate poly(A) sites of that gene (UCSC genes (hg19) and Ensembl genes (release 61)).

APA analysis
To compare APA profiles between HeLa and CstF64-RNAi or CstF64&τ-RNAi cells using DRS data, we first removed two kinds of poly(A) sites: those overlapping snoRNA/scaRNA/snRNA regions and those with zero reads in two of the three samples. For each remaining poly(A) site, Fisher's exact test compared the ratio between the DRS read count of that PAS and the summed read counts of all other PASs within the same gene. The p-values were adjusted with the Benjamini-Hochberg method to calculate the FDR. Poly(A) sites with FDR less than 0.05 were defined as significantly changed APA.

CHAPTER VI ALTERNATIVE POLYADENYLATION DURING HYPOXIA INDUCTION

Background
Hypoxia is a condition of inadequate oxygen supply to tissues or cells. The cellular response to hypoxia can play an important role in human disease. For example, hypoxia is pervasive in cancer tissues and cells, where it can promote tumor progression and make hypoxic cancer tissues and cells resistant to therapy [97].
The hypoxia pathway is regulated by a family of transcription factors, the hypoxia-inducible factors (HIFs). Under normal oxygen conditions, HIF-1α is degraded to a level at which it cannot bind HIF-1β to form a transcription complex. Under hypoxia, by contrast, the HIF-1α:HIF-1β complex accumulates and induces transcription of a set of hypoxia-responsive genes [98]. In addition, numerous microarray studies have shown that hypoxic signaling pathways differ among cell types, demonstrating the physiological complexity of hypoxia at the transcriptional level [99]. At the post-transcriptional level, several studies have revealed the involvement of microRNAs and RNA-binding proteins in the cellular response to hypoxia [100, 101]. In addition, a limited set of significant alternative splicing events was detected in hypoxia-induced endothelial cells using exon arrays [102]. Given these findings, we hypothesized that hypoxia can also induce alternative polyadenylation, which is important for creating protein and post-transcriptional diversity in human cells. To test this hypothesis, we first improved a protocol for specifically sequencing the 3’ ends of mRNA, based on the prototype described in [103]. Our quality control analysis supported this improvement in terms of reproducibility, robustness, and correctness. Next, we applied a modified version of the pipeline described in Chapter II to identify APA events in three sets of normoxic and hypoxic JEG-3 cells.

Results
Quality control analysis of PAS-Seq data
We prepared PAS-Seq libraries for 14 samples (6 samples from the hypoxia experiment, namely 3 hypoxic and 3 control; 4 MAQC brain samples; and 4 MAQC UHR samples) and sequenced them on the Illumina platform. We first mapped the reads from these 14 samples to human genome assembly 19 (hg19). Each sample yielded 12.6M to 33M sequenced reads, generally much more than the output of DRS described in previous chapters.
However, because PAS-Seq yields a higher fraction of internal priming reads (30% to 55%), the final numbers of polyA site reads were 4M to 12M, comparable to DRS (Table 9). We next compared PAS-Seq polyA sites to known gene and polyA site annotations: the polyA sites derived from PAS-Seq were compared to polyA_DB2 and to the 3’ ends of known genes (UCSC and Ensembl). A PAS-Seq polyA site is defined as “known” if it is within 40 nt of a polyA_DB2 polyA site or of the 3’ end of a known transcript. As shown in Table 9, 85% to 92% of polyA sites were consistent with known polyA annotation. In addition, within each group of technical replicates (MAQC samples), both the percentage of non-internal priming reads and the percentage of known polyA sites were very similar, indicating the high robustness of PAS-Seq.
Reproducibility of PAS-Seq
To further assess the reproducibility of the PAS-Seq data, we carried out a series of quality control analyses. First, within each PAS-Seq replicate group, we calculated the reproducibility score of each individual polyA site, defined as the number of replicate samples in which that polyA site occurs. We then calculated the percentage of PAS-Seq polyA sites with reproducibility scores of ≥2, ≥3 and 4 at different PAS-Seq read-count thresholds (Figure 26). The percentages increased with the read-count threshold, indicating that polyA sites with higher PAS-Seq read counts are more likely to be detected in more replicates.
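The two quality-control metrics above, the 40-nt rule for calling a polyA site “known” and the per-site reproducibility score, can be sketched in Python as follows. This is a minimal illustration, not the pipeline used in this work: the function names, the toy counts, and the rule that a site passes a read-count threshold when its count in any single replicate reaches that threshold are all assumptions made for the example.

```python
from bisect import bisect_left


def is_known(site: int, annotated: list[int], window: int = 40) -> bool:
    """True if `site` lies within `window` nt of an annotated polyA site or transcript 3' end.

    `annotated` must be a sorted list of coordinates on the same chromosome strand.
    """
    i = bisect_left(annotated, site - window)  # first annotation >= site - window
    return i < len(annotated) and annotated[i] <= site + window


def reproducibility_score(counts: list[int]) -> int:
    """Number of replicate samples in which the polyA site has at least one read."""
    return sum(1 for c in counts if c > 0)


def percent_reproduced(sites: dict[str, list[int]], threshold: int, min_score: int) -> float:
    """Percentage of sites passing `threshold` that are detected in >= `min_score` replicates.

    ASSUMPTION (hypothetical rule): a site passes the threshold when its read count
    in at least one replicate is >= `threshold`.
    """
    passing = [counts for counts in sites.values() if max(counts) >= threshold]
    if not passing:
        return float("nan")
    hits = sum(1 for counts in passing if reproducibility_score(counts) >= min_score)
    return 100.0 * hits / len(passing)


# Toy example: hypothetical read counts of four polyA sites in four replicates.
toy = {"a": [10, 8, 0, 5], "b": [6, 7, 9, 4], "c": [1, 0, 0, 0], "d": [5, 0, 0, 0]}
for k in (2, 3, 4):
    print(k, round(percent_reproduced(toy, threshold=5, min_score=k), 1))
```

With real data, `annotated` would hold the polyA_DB2 sites and annotated transcript 3’ ends for one chromosome strand, and `sites` the per-replicate counts within one replicate group; sweeping `threshold` reproduces the kind of curve summarized in Figure 26.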
When the PAS-Seq read-count threshold was set at 5 or more, 95% of polyA sites were detected in at least 2 samples, 90% in at least 3 samples and 85% in all four samples (for both the hypoxia and MAQC samples).
Table 10 Summary of Hypoxia and MAQC PAS-Seq data
Samples | Sequenced reads | Uniquely mapped reads | Non-internal priming reads | Percentage of non-internal priming reads | Non-internal priming reads overlapping known annotation | Percentage of known polyA sites
Brain1 | 20.2M | 13.8M | 6.3M | 45.7% | 5.4M | 85.7%
Brain2 | 17.9M | 12.4M | 5.6M | 45.2% | 4.8M | 85.7%
Brain3 | 12.6M | 8.7M | 4M | 46.0% | 3.5M | 87.5%
Brain4 | 19.4M | 13.3M | 6.1M | 45.9% | 5.3M | 86.9%
UHR1 | 16.6M | 11.2M | 6.1M | 54.5% | 5.6M | 91.8%
UHR2 | 15.4M | 10.3M | 5.6M | 54.4% | 5.0M | 89.3%
UHR3 | 20.2M | 13.7M | 7.3M | 53.3% | 6.5M | 89.0%
UHR4 | 14.6M | 9.8M | 5.3M | 54.1% | 4.7M | 88.7%
HYP1 | 23.5M | 16.2M | 11.3M | 69.8% | 10.4M | 92%
Ctrl1 | 27.2M | 18.7M | 11.4M | 61.0% | 10.2M | 89.5%
Hyp2 | 33.0M | 22.0M | 12.0M | 54.5% | 10.5M | 87.5%
Ctrl2 | 23.0M | 15.9M | 8.0M | 50.3% | 7.0M | 87.5%
Hyp3 | 25.6M | 17.1M | 9.4M | 55.0% | 8.3M | 88.3%
Ctrl3 | 25M | 16.8M | 8.7M | 51.8% | 7.6M | 87.4%
Figure 26 Reproducibility of polyA sites by PAS-Seq. The X-axis is the minimal PAS-Seq read count across replicate samples and the Y-axis is the percentage of polyA sites reproduced in at least 2 (yellow), at least 3 (red) and all 4 replicates (blue). Panel A shows the MAQC brain replicates and panel B the MAQC UHR replicates.
Next, we plotted a heatmap of the pairwise Pearson correlations of read counts at common individual polyA sites across 18 PAS-Seq samples: 14 in-house datasets (4 UHR, 4 brain and 6 hypoxia-experiment samples) and 4 published datasets (2 UHR and 2 brain [44]) (Figure 27). The dendrogram groups replicates of the same experiment together and separates different experiments, further supporting the reproducibility and robustness of our PAS-Seq data.
Gene expression correlation between PAS-Seq and RNA-Seq
Ideally, each individual PAS-Seq polyA site represents an mRNA transcript.
To demonstrate that PAS-Seq quantifies gene expression levels and that PAS-Seq reads are not over-amplified during sample preparation, we compared the gene expression levels measured by RNA-Seq with those measured by PAS-Seq for the 6 hypoxia-experiment samples. The Pearson correlation coefficients ranged from 0.57 to 0.71 (Figure 28), which supports the ability of PAS-Seq to measure gene expression levels quantitatively. In conclusion, our quality control analyses demonstrate the correctness, robustness, reproducibility and quantitative ability of PAS-Seq, so this improved protocol is suitable for APA detection and related analyses.
Figure 27 Clustering of 18 PAS-Seq samples. The correlation coefficients of PAS-Seq read counts at polyA sites were calculated pairwise for all 18 PAS-Seq samples as described in the text, and the resulting correlation coefficient matrix was used to plot the clustering heatmap.
Figure 28 Scatterplots of gene expression measured by PAS-Seq and RNA-Seq. The 6 samples in the hypoxia experiment are plotted independently. The X-axis and Y-axis represent gene expression (in log2 scale) by PAS-Seq and RNA-Seq, respectively. Pearson correlation coefficients shown in the six panels: R = 0.621, 0.712, 0.596, 0.578, 0.572 and 0.608.
Alternative polyadenylation induced by hypoxia
Using a pipeline modified from the method in Chapter II (see Materials and Methods), we detected 579 significant APA events (312 downregulated and 267 upregulated) between hypoxia and normoxia. Of these, 241 are SE-APA events, 21 are DE-APA3 events and 8 are DE-APA5 events; the remaining 308 cannot be explicitly classified into any of these three types and are defined as “ambiguous-APA”. We also used RNA-Seq to verify the PAS-Seq results: 104 of the 579 significant APA events detected by PAS-Seq are also supported by RNA-Seq.
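The RNA-Seq support criterion (detailed under Materials and Methods: reads in the 300 nt upstream of each polyA site, a two-sided Fisher exact test per replicate set, significance in all three sets, and agreement with the PAS-Seq direction) can be sketched as follows. This is a hypothetical, standard-library-only illustration: each RNA-Seq read is reduced to a single coordinate, per-set counts are passed in as tuples, and the Fisher exact test is written out explicitly (in practice `scipy.stats.fisher_exact` could be used instead). None of these names come from the original pipeline.

```python
from math import comb


def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    denom = comb(r1 + r2, c1)

    def pmf(x: int) -> float:  # hypergeometric probability that the top-left cell = x
        return comb(r1, x) * comb(r2, c1 - x) / denom

    p_obs = pmf(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in map(pmf, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def supporting_count(read_pos: list[int], site: int, strand: str = "+", window: int = 300) -> int:
    """RNA-Seq reads (one assumed coordinate per read) in the `window` nt upstream of `site`."""
    if strand == "+":
        return sum(1 for p in read_pos if site - window <= p < site)
    return sum(1 for p in read_pos if site < p <= site + window)


def rnaseq_supports(sets, pas_direction: int, alpha: float = 0.05) -> bool:
    """`sets`: one (hyp_site, hyp_others, nor_site, nor_others) tuple of supporting
    counts per replicate set; `pas_direction`: +1 if PAS-Seq shows increased usage
    of the site in hypoxia, -1 if decreased."""
    for hyp_site, hyp_other, nor_site, nor_other in sets:
        if fisher_two_sided(hyp_site, hyp_other, nor_site, nor_other) >= alpha:
            return False  # must be significant in every replicate set
        hyp_frac = hyp_site / (hyp_site + hyp_other)
        nor_frac = nor_site / (nor_site + nor_other)
        direction = (hyp_frac > nor_frac) - (hyp_frac < nor_frac)
        if direction != pas_direction:
            return False  # must agree with the PAS-Seq direction of change
    return True
```

A site with zero supporting reads in one condition yields a p-value of 1 and is rejected before any division, so the usage fractions are only computed for tables with reads in both rows.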
Using qRT-PCR, we validated 21 of 23 tested polyA sites, which showed the same direction of change in polyA site usage as PAS-Seq (Figure 29).
Many studies have revealed systematic shortening or lengthening of mRNA 3’ UTRs in response to clinical conditions [16, 31, 104]. To investigate the direction of 3’ UTR length change between hypoxia and normoxia, we calculated weighted 3’ UTR lengths and observed that the overall distribution of 3’ UTR lengths in each hypoxia replicate is shifted toward shorter lengths compared with the corresponding normoxia replicate (Figure 30). One function of APA is to regulate steady-state mRNA levels by including or excluding microRNA target sites in the alternative 3’ UTR region. In this regard, we further separated SE-APA events into two subgroups: SE-APA with the usage preference changed from the proximal to the distal polyA site (SE-APA-P2D), and SE-APA with the usage preference changed from the distal to the proximal polyA site (SE-APA-D2P). Although we observed no difference in the mRNA expression distribution between SE-APA-P2D and SE-APA-D2P genes, SE-APA-D2P events were about twice as numerous as SE-APA-P2D events (Figure 31). Together with the results in Figure 30, we conclude that mRNAs tend to express shorter 3’ UTR isoforms in hypoxia than in normoxia.
GO analysis of APA genes
To gain insight into the biological pathways of APA genes, we conducted gene ontology (GO) analysis on genes with significant APA switches. Using a background set of expressed genes (FPKM >= 1), we found no significant GO term. We then conducted GO analysis on the SE-APA-P2D and SE-APA-D2P gene sets separately. In the SE-APA-D2P gene set, we detected a single significant GO term, “Homeostasis” (6.8-fold enrichment compared to the background gene set). This suggests that SE-APA-D2P is biologically relevant to the hypoxic response and may play a role in maintaining homeostasis.
Figure 29 RT-qPCR validation of APA events in hypoxia. One example is shown for each APA type (SE-APA, DE-APA3 and DE-APA5). The left panel shows the gene structure diagram and the RNA-Seq and PAS-Seq profiles from the UCSC genome browser. The top picture in the middle panel illustrates the difference in polyA site usage from normoxia to hypoxia, while the bottom picture shows the fold change in the ratio of expression levels (by qPCR) between isoforms in hypoxia versus normoxia. The right panel shows the gene expression level (FPKM by RNA-Seq) of the corresponding APA gene. (Data courtesy of Dr. Lan Lin, University of Iowa.)
Figure 30 Distribution of 3’ UTR lengths in hypoxia and normoxia. For each Ensembl gene with a single stop codon annotation, the 3’ UTR length was calculated by summing all of its 3’ UTR lengths, each weighted by the PAS-Seq read counts of its corresponding polyA site. The 25th, 50th and 75th percentiles of the overall distribution are marked in the box, and the 5th and 95th percentiles are the boundaries of the whiskers.
Discussion
Using an improved protocol to specifically sequence the 3’ end of mRNA, we discovered a large set of genes with significant APA usage switches between hypoxia and normoxia. Two immediate questions arise from this finding: 1) whether the APA usage switch in hypoxia is a consequence of upstream hypoxia-induced pathways, and 2) whether it can influence pathways leading to tumorigenesis. From the gene expression analysis, we found significant expression changes in a set of RNA binding proteins and 3’ end processing factors, such as RBFOX2 and MBNL3. It was previously shown that an RNA binding protein is regulated by changes in oxygen supply through a HIF-1-independent pathway [105]. It would be interesting to investigate similar pathways involving the RBPs or 3’ end processing factors that are significantly differentially expressed in hypoxia, and to associate these factors with the induction of APA usage switches. The functional enrichment analysis revealed an overrepresentation of homeostasis genes among APA genes in hypoxia, which may suggest a direct influence of hypoxia on APA. Moreover, it would be worthwhile to examine the list of homeostasis-related APA genes for biomarker identification.
Figure 31 Distribution of expression levels of SE-APA genes. The red and blue dots represent expression levels (FPKM in log2 scale) of APA-D2P and APA-P2D genes in normoxia (x-axis) and hypoxia (y-axis). The red and blue bars show the numbers of significant APA-P2D and APA-D2P genes, respectively.
Materials and Methods
Sample preparation
JEG-3 cells were incubated in humidified hypoxia chambers (Billups-Rothenburg, Del Mar, CA, USA) in 2% O2 (hypoxia) or 20% O2 (normoxia) at 37 °C for 48 hours. Total RNA from six sets of biological replicate treatments was extracted using TRIzol reagent (Invitrogen). For each sample, 1.5 µg of total RNA was used in reverse transcription reactions with random hexamers to generate single-pass cDNA. qRT-PCR analysis of GAPDH and ERRF1 was used to evaluate the hypoxic response of each treatment, and the 3 sets of treatments (Sets 1, 2 and 3) with the largest hypoxic response were selected for RNA-Seq and PAS-Seq analysis. For quality control, we also prepared 8 MAQC samples (4 brain technical replicates and 4 UHR technical replicates) using the same protocol [106].
RT-qPCR validation of APA
To validate SE-APA events, two gene-specific primer sets were designed for each APA event: one targets the region common to all APA isoforms, and the other targets the extended region present only in the specific APA isoform. For DE-APA3 and DE-APA5 events, each specific exon has its own set of specific primers.
Detection of APA in multiple samples
We modified the “One vs.
Others method” described in Chapter II to detect significant APA events across multiple sets of hypoxia samples. First, instead of conducting a two-sided Fisher exact test as in the single-sample setting, we conducted two independent one-sided Fisher exact tests (“greater” and “less”) for each polyA site in each set (Set 1, Set 2 and Set 3). Next, we combined the three one-sided Fisher exact tests sharing the same null hypothesis into a combined statistic $\chi^2 = -2\sum_{i=1}^{3}\ln p_i$, where $p_i$ is the p-value of the ith one-sided Fisher exact test (Fisher's method; under the null hypothesis this statistic follows a $\chi^2$ distribution with 6 degrees of freedom). The $\chi^2$ values were then converted to p-values, which were further adjusted by the Benjamini-Hochberg procedure. Finally, we applied the following filtering criteria to call significant APA events: (1) FDR less than 0.05; (2) individual p-value less than 0.001; and (3) change in the percentage of polyA usage greater than 0.1 in each sample.
To validate significant APA events using RNA-Seq, we first counted the total reads in the 300-nt region upstream of each polyA site and called this number the “supporting RNA-Seq count”. Next, for each polyA site, we constructed a contingency table consisting of its supporting RNA-Seq count and the sum of the supporting RNA-Seq counts of the other polyA sites in the same gene, in both hypoxia and normoxia. A two-sided Fisher exact test was conducted on the contingency table of each polyA site. If the Fisher exact test p-values on RNA-Seq data for a polyA site were less than 0.05 in all three replicates and the direction of polyA usage change was the same as in the PAS-Seq data, the site was defined as an RNA-Seq-validated APA.
Classification of APA
To classify APA events into different subtypes, we first removed minor polyA sites whose average percentage of usage was less than 10% in both hypoxia and normoxia. Next, each significant APA was paired with each of the remaining polyA sites of the same gene, and each polyA site pair was compared to the known gene structure annotation.
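The multi-replicate test described above can be sketched as below: per-set one-sided Fisher exact tests, Fisher's method to combine the three p-values of each direction, Benjamini-Hochberg adjustment, and the three filters. This is a minimal standard-library sketch under stated assumptions, not the thesis implementation: the source does not say how the two directional combined p-values are reduced to one, so the sketch keeps the smaller; criterion (2) is read as the raw combined p-value; and criterion (3) is read as the absolute change in relative usage in every set. `scipy.stats.fisher_exact` and `scipy.stats.combine_pvalues(..., method="fisher")` could replace the hand-written helpers.

```python
import math
from math import comb


def fisher_one_sided(a: int, b: int, c: int, d: int, alternative: str) -> float:
    """One-sided Fisher exact p-value for [[a, b], [c, d]]; 'greater' tests odds ratio > 1."""
    r1, r2, c1 = a + b, c + d, a + c
    denom = comb(r1 + r2, c1)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    xs = range(a, hi + 1) if alternative == "greater" else range(lo, a + 1)
    return min(1.0, sum(comb(r1, x) * comb(r2, c1 - x) / denom for x in xs))


def fisher_combined(pvalues: list[float]) -> float:
    """Fisher's method: chi2 = -2 * sum(ln p_i) with 2k degrees of freedom.

    For even degrees of freedom the chi-square survival function has the closed
    form exp(-x/2) * sum_{i<k} (x/2)**i / i!.
    """
    half = -sum(math.log(max(p, 1e-300)) for p in pvalues)  # chi2 / 2; guards log(0)
    terms = sum(half ** i / math.factorial(i) for i in range(len(pvalues)))
    return min(1.0, math.exp(-half) * terms)


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):  # step up from the largest p-value
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def usage(site: int, others: int) -> float:
    """Fraction of the gene's reads that fall on this polyA site."""
    return site / (site + others) if site + others else 0.0


def call_significant(sites, fdr_cut=0.05, p_cut=0.001, delta_cut=0.1):
    """`sites` maps a site id to one (hyp_site, hyp_others, nor_site, nor_others)
    tuple per replicate set; returns the ids that pass all three filters."""
    stats = {}
    for sid, sets in sites.items():
        up = fisher_combined([fisher_one_sided(*s, alternative="greater") for s in sets])
        down = fisher_combined([fisher_one_sided(*s, alternative="less") for s in sets])
        delta = min(abs(usage(hs, ho) - usage(ns, no)) for hs, ho, ns, no in sets)
        stats[sid] = (min(up, down), delta)  # ASSUMPTION: keep the smaller directional p
    ids = list(stats)
    fdr = benjamini_hochberg([stats[s][0] for s in ids])
    return [s for s, q in zip(ids, fdr)
            if q < fdr_cut and stats[s][0] < p_cut and stats[s][1] > delta_cut]
```

For example, a hypothetical site with counts `(90, 10, 30, 70)` in all three sets passes every filter, whereas a site with equal usage in both conditions fails on both the FDR and the usage-change criteria.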
If all the polyA site pairs were in the same terminal exon, three scenarios were possible: 1) if the significant APA is the most distal polyA site and its usage increased from normoxia to hypoxia, or if it is the most proximal polyA site and its usage decreased, the APA is called SE-APA-P2D (proximal to distal); 2) if the significant APA is the most distal polyA site and its usage decreased from normoxia to hypoxia, or if it is the most proximal polyA site and its usage increased, the APA is called SE-APA-D2P (distal to proximal); 3) if the significant APA is a middle polyA site, it is simply called SE-APA. If all the polyA site pairs were in different terminal exons, the significant APA was classified as DE-APA3 or DE-APA5 accordingly. If multiple pairs were classified into different categories (SE-APA, DE-APA3 and DE-APA5), the event is called a “multiple type APA”. Finally, all remaining cases were called “ambiguous APA”. To corroborate the SE-APA-D2P and SE-APA-P2D classifications, we also calculated the ratio of the read count of the most distal polyA site to the sum of the read counts of the other polyA sites in both hypoxia and normoxia. If a polyA site was previously classified as SE-APA-D2P and its ratio in hypoxia was lower than in normoxia, it was corroborated as “SE-APA-D2P” in the final list; conversely, a polyA site previously classified as SE-APA-P2D whose ratio in hypoxia was higher than in normoxia was corroborated as “SE-APA-P2D”.
Gene enrichment analysis
DAVID (using the PANTHER classification system) was used to analyze the gene ontology enrichment of genes with significant APA events. Expressed genes (FPKM > 1) in hypoxia, as measured by RNA-Seq, were used as the background gene set.
CHAPTER VII FUTURE DIRECTION
In this thesis, we have presented multiple studies on the identification of APA affected by expression changes of different proteins (overexpression of ESRP and knockdown of αCPs and CstF64/τ).
We first uncovered that αCPs interact with C-rich motifs upstream of cleavage sites and act as enhancers of 3’ end processing for a set of genes. Next, we revealed a context-dependent role of ESRP in APA regulation. Furthermore, we identified a wide range of APA events regulated by CstF64 alone and by CstF64/τ together, and showed that higher levels of CstF64 promote the usage of proximal polyA sites. These studies expand the current repertoire of proteins known to regulate APA and provide new insights into the mechanisms of APA regulation. The boom in high-throughput sequencing technology has made it easier to detect large numbers of APA events between different samples. However, few studies have examined the function of individual APA events, which may be partly due to limited knowledge of the functional elements in the 3’ UTR. Therefore, efforts should be made toward decoding the functional elements associated with mRNA stability, translation efficiency and subcellular localization. To date, fewer studies have addressed DE-APA, which affects the coding region and the 3’ UTR simultaneously, than SE-APA. This type of APA is functionally more important, although it is mechanistically more complicated. Future studies are needed to address this problem, especially the interplay between alternative polyadenylation and alternative splicing. In addition, most current global analyses of APA have been conducted in transformed cells or cell lines. It would be more informative if similar analyses could be performed on clinical samples or on primary cells that are not transformed. Moreover, the analysis of clinical samples could help reveal whether APA is a key factor in certain diseases or only a byproduct of other key biological pathways. Finally, as revealed by [10], there is a complex cooperative network of proteins involved in 3’ end processing.
Although knockdown or overexpression experiments of single genes have already provided valuable clues to APA regulation, further analysis of the synergy among different proteins in regulating APA will be needed to decode APA regulation more precisely.
REFERENCES
1. Watson JD: Molecular biology of the gene, 6th edn. San Francisco; Cold Spring Harbor, N.Y.: Pearson/Benjamin Cummings; Cold Spring Harbor Laboratory Press; 2008.
2. Cooper TA, Wan L, Dreyfuss G: RNA and disease. Cell 2009, 136(4):777-793.
3. Marzluff WF, Wagner EJ, Duronio RJ: Metabolism and regulation of canonical histone mRNAs: life without a poly(A) tail. Nat Rev Genet 2008, 9(11):843-854.
4. Kuhn U, Wahle E: Structure and function of poly(A) binding proteins. Biochim Biophys Acta 2004, 1678(2-3):67-84.
5. Licatalosi DD, Mele A, Fak JJ, Ule J, Kayikci M, Chi SW, Clark TA, Schweitzer AC, Blume JE, Wang X et al: HITS-CLIP yields genome-wide insights into brain alternative RNA processing. Nature 2008, 456(7221):464-469.
6. Millevoi S, Decorsiere A, Loulergue C, Iacovoni J, Bernat S, Antoniou M, Vagner S: A physical and functional link between splicing factors promotes pre-mRNA 3' end processing. Nucleic Acids Res 2009, 37(14):4672-4683.
7. Ji X, Kong J, Liebhaber SA: An RNA-protein complex links enhanced nuclear 3' processing with cytoplasmic mRNA stabilization. EMBO J, 30(13):2622-2633.
8. Yan J, Marr TG: Computational analysis of 3'-ends of ESTs shows four classes of alternative polyadenylation in human, mouse, and rat. Genome Res 2005, 15(3):369-375.
9. Tian B, Hu J, Zhang H, Lutz CS: A large-scale analysis of mRNA polyadenylation of human and mouse genes. Nucleic Acids Res 2005, 33(1):201-212.
10. Shi Y, Di Giammartino DC, Taylor D, Sarkeshik A, Rice WJ, Yates JR, 3rd, Frank J, Manley JL: Molecular architecture of the human pre-mRNA 3' processing complex. Mol Cell 2009, 33(3):365-376.
11. Beaudoing E, Freier S, Wyatt JR, Claverie JM, Gautheret D: Patterns of variant polyadenylation signal usage in human genes. Genome Res 2000, 10(7):1001-1010.
12. Hu J, Lutz CS, Wilusz J, Tian B: Bioinformatic identification of candidate cis-regulatory elements involved in human mRNA polyadenylation. RNA 2005, 11(10):1485-1493.
13. Grosso AR, Gomes AQ, Barbosa-Morais NL, Caldeira S, Thorne NP, Grech G, von Lindern M, Carmo-Fonseca M: Tissue-specific splicing factor gene expression signatures. Nucleic Acids Res 2008, 36(15):4823-4832.
14. Ji Z, Lee JY, Pan Z, Jiang B, Tian B: Progressive lengthening of 3' untranslated regions of mRNAs by alternative polyadenylation during mouse embryonic development. Proc Natl Acad Sci U S A 2009, 106(17):7028-7033.
15. Sandberg R, Neilson JR, Sarma A, Sharp PA, Burge CB: Proliferating cells express mRNAs with shortened 3' untranslated regions and fewer microRNA target sites. Science 2008, 320(5883):1643-1647.
16. Mayr C, Bartel DP: Widespread shortening of 3'UTRs by alternative cleavage and polyadenylation activates oncogenes in cancer cells. Cell 2009, 138(4):673-684.
17. Andreassi C, Riccio A: To localize or not to localize: mRNA fate is in 3'UTR ends. Trends Cell Biol 2009, 19(9):465-474.
18. An JJ, Gharami K, Liao GY, Woo NH, Lau AG, Vanevski F, Torre ER, Jones KR, Feng Y, Lu B et al: Distinct role of long 3' UTR BDNF mRNA in spine morphology and synaptic plasticity in hippocampal neurons. Cell 2008, 134(1):175-187.
19. Pinto PA, Henriques T, Freitas MO, Martins T, Domingues RG, Wyrzykowska PS, Coelho PA, Carmo AM, Sunkel CE, Proudfoot NJ et al: RNA polymerase II kinetics in polo polyadenylation signal selection. EMBO J 2011, 30(12):2431-2444.
20. Takagaki Y, Seipelt RL, Peterson ML, Manley JL: The polyadenylation factor CstF-64 regulates alternative processing of IgM heavy chain pre-mRNA during B cell differentiation. Cell 1996, 87(5):941-952.
21. Yao P, Potdar AA, Arif A, Ray PS, Mukhopadhyay R, Willard B, Xu Y, Yan J, Saidel GM, Fox PL: Coding region polyadenylation generates a truncated tRNA synthetase that counters translation repression. Cell, 149(1):88-100.
22. Moore MJ, Proudfoot NJ: Pre-mRNA processing reaches back to transcription and ahead to translation. Cell 2009, 136(4):688-700.
23. Takagaki Y, Manley JL: Levels of polyadenylation factor CstF-64 control IgM heavy chain mRNA accumulation and other events associated with B cell differentiation. Mol Cell 1998, 2(6):761-771.
24. Elkon R, Drost J, van Haaften G, Jenal M, Schrier M, Vrielink JA, Agami R: E2F mediates enhanced alternative polyadenylation in proliferation. Genome Biol, 13(7):R59.
25. Martin G, Gruber AR, Keller W, Zavolan M: Genome-wide Analysis of Pre-mRNA 3' End Processing Reveals a Decisive Role of Human Cleavage Factor I in the Regulation of 3' UTR Length. Cell Rep, 1(6):753-763.
26. Castelo-Branco P, Furger A, Wollerton M, Smith C, Moreira A, Proudfoot N: Polypyrimidine tract binding protein modulates efficiency of polyadenylation. Mol Cell Biol 2004, 24(10):4174-4183.
27. Katz Y, Wang ET, Airoldi EM, Burge CB: Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat Methods, 7(12):1009-1015.
28. Berg MG, Singh LN, Younis I, Liu Q, Pinto AM, Kaida D, Zhang Z, Cho S, Sherrill-Mix S, Wan L et al: U1 snRNP determines mRNA length and regulates isoform expression. Cell, 150(1):53-64.
29. Ji Z, Luo W, Li W, Hoque M, Pan Z, Zhao Y, Tian B: Transcriptional activity regulates alternative cleavage and polyadenylation. Mol Syst Biol, 7:534.
30. Glover-Cutter K, Kim S, Espinosa J, Bentley DL: RNA polymerase II pauses and associates with pre-mRNA processing factors at both ends of genes. Nat Struct Mol Biol 2008, 15(1):71-78.
31. Singh P, Alley TL, Wright SM, Kamdar S, Schott W, Wilpan RY, Mills KD, Graber JH: Global changes in processing of mRNA 3' untranslated regions characterize clinically distinct cancer subtypes. Cancer Res 2009, 69(24):9422-9430.
32. Fu Y, Sun Y, Li Y, Li J, Rao X, Chen C, Xu A: Differential genome-wide profiling of tandem 3' UTRs among human breast cancer and normal cells by high-throughput sequencing. Genome Res, 21(5):741-747.
33. Bennett CL, Brunkow ME, Ramsdell F, O'Briant KC, Zhu Q, Fuleihan RL, Shigeoka AO, Ochs HD, Chance PF: A rare polyadenylation signal mutation of the FOXP3 gene (AAUAAA-->AAUGAA) leads to the IPEX syndrome. Immunogenetics 2001, 53(6):435-439.
34. Beaudoing E, Gautheret D: Identification of alternate polyadenylation sites and analysis of their tissue distribution using EST data. Genome Res 2001, 11(9):1520-1526.
35. Zhang H, Lee JY, Tian B: Biased alternative polyadenylation in human tissues. Genome Biol 2005, 6(12):R100.
36. Flavell SW, Kim TK, Gray JM, Harmin DA, Hemberg M, Hong EJ, Markenscoff-Papadimitriou E, Bear DM, Greenberg ME: Genome-wide analysis of MEF2 transcriptional program reveals synaptic target genes and neuronal activity-dependent polyadenylation site selection. Neuron 2008, 60(6):1022-1038.
37. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 2008, 5(7):621-628.
38. Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, Kingsmore SF, Schroth GP, Burge CB: Alternative isoform regulation in human tissue transcriptomes. Nature 2008, 456(7221):470-476.
39. Wang Z, Gerstein M, Snyder M: RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 2009, 10(1):57-63.
40. Jan CH, Friedman RC, Ruby JG, Bartel DP: Formation, regulation and evolution of Caenorhabditis elegans 3'UTRs. Nature, 469(7328):97-101.
41. Mangone M, Manoharan AP, Thierry-Mieg D, Thierry-Mieg J, Han T, Mackowiak SD, Mis E, Zegar C, Gutwein MR, Khivansara V et al: The landscape of C. elegans 3'UTRs. Science, 329(5990):432-435.
42. Shepard PJ, Choi EA, Lu J, Flanagan LA, Hertel KJ, Shi Y: Complex and dynamic landscape of RNA polyadenylation revealed by PAS-Seq. RNA, 17(4):761-772.
43. Fox-Walsh K, Davis-Turak J, Zhou Y, Li H, Fu XD: A multiplex RNA-seq strategy to profile poly(A+) RNA: application to analysis of transcription response and 3' end formation. Genomics, 98(4):266-271.
44. Derti A, Garrett-Engele P, Macisaac KD, Stevens RC, Sriram S, Chen R, Rohl CA, Johnson JM, Babak T: A quantitative atlas of polyadenylation in five mammals. Genome Res, 22(6):1173-1183.
45. Ozsolak F, Platt AR, Jones DR, Reifenberger JG, Sass LE, McInerney P, Thompson JF, Bowers J, Jarosz M, Milos PM: Direct RNA sequencing. Nature 2009, 461(7265):814-818.
46. Lee JY, Yeh I, Park JY, Tian B: PolyA_DB 2: mRNA polyadenylation sites in vertebrate genes. Nucleic Acids Res 2007, 35(Database issue):D165-168.
47. Kass RE, Raftery AE: Bayes Factors. Journal of the American Statistical Association 1995, 90(430):773-795.
48. Benjamini Y, Hochberg Y: Controlling the False Discovery Rate - a Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society Series B-Methodological 1995, 57(1):289-300.
49. Langmead B, Trapnell C, Pop M, Salzberg SL: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 2009, 10(3):R25.
50. Pauws E, van Kampen AH, van de Graaf SA, de Vijlder JJ, Ris-Stalpers C: Heterogeneity in polyadenylation cleavage sites in mammalian mRNA sequences: implications for SAGE analysis. Nucleic Acids Res 2001, 29(8):1690-1694.
51. Ji X, Kong J, Liebhaber SA: In vivo association of the stability control protein alphaCP with actively translating mRNAs. Mol Cell Biol 2003, 23(3):899-907.
52. Kiledjian M, Wang X, Liebhaber SA: Identification of two KH domain proteins in the alpha-globin mRNP stability complex. EMBO J 1995, 14(17):4357-4364.
53. Kong J, Ji X, Liebhaber SA: The KH-domain protein alpha CP has a direct role in mRNA stabilization independent of its cognate binding site. Mol Cell Biol 2003, 23(4):1125-1134.
54. Kong J, Liebhaber SA: A cell type-restricted mRNA surveillance pathway triggered by ribosome extension into the 3' untranslated region. Nat Struct Mol Biol 2007, 14(7):670-676.
55. Weiss IM, Liebhaber SA: Erythroid cell-specific determinants of alpha-globin mRNA stability. Mol Cell Biol 1994, 14(12):8123-8132.
56. Makeyev AV, Liebhaber SA: The poly(C)-binding proteins: a multiplicity of functions and a search for mechanisms. RNA 2002, 8(3):265-278.
57. Chkheidze AN, Lyakhov DL, Makeyev AV, Morales J, Kong J, Liebhaber SA: Assembly of the alpha-globin mRNA stability complex reflects binary interaction between the pyrimidine-rich 3' untranslated region determinant and poly(C) binding protein alphaCP. Mol Cell Biol 1999, 19(7):4572-4581.
58. Chaudhury A, Chander P, Howe PH: Heterogeneous nuclear ribonucleoproteins (hnRNPs) in cellular processes: Focus on hnRNP E1's multifunctional regulatory roles. RNA, 16(8):1449-1462.
59. Holcik M, Liebhaber SA: Four highly stable eukaryotic mRNAs assemble 3' untranslated region RNA-protein complexes sharing cis and trans components. Proc Natl Acad Sci U S A 1997, 94(6):2410-2414.
60. Waggoner SA, Liebhaber SA: Regulation of alpha-globin mRNA stability. Exp Biol Med (Maywood) 2003, 228(4):387-395.
61. Waggoner SA, Liebhaber SA: Identification of mRNAs associated with alphaCP2-containing RNP complexes. Mol Cell Biol 2003, 23(19):7055-7067.
62. Thisted T, Lyakhov DL, Liebhaber SA: Optimized RNA targets of two closely related triple KH domain proteins, heterogeneous nuclear ribonucleoprotein K and alphaCP-2KL, suggest Distinct modes of RNA recognition. J Biol Chem 2001, 276(20):17484-17496.
63. Ji X, Kong J, Carstens RP, Liebhaber SA: The 3' untranslated region complex involved in stabilization of human alpha-globin mRNA assembles in the nucleus and serves an independent role as a splice enhancer. Mol Cell Biol 2007, 27(9):3290-3302.
64. Ozsolak F, Kapranov P, Foissac S, Kim SW, Fishilevich E, Monaghan AP, John B, Milos PM: Comprehensive polyadenylation site maps in yeast and human reveal pervasive alternative polyadenylation. Cell, 143(6):1018-1029.
65. Wang L, Feng Z, Wang X, Zhang X: DEGseq: an R package for identifying differentially expressed genes from RNA-seq data. Bioinformatics, 26(1):136-138.
66. Ghosh D, Srivastava GP, Xu D, Schulz LC, Roberts RM: A link between SIN1 (MAPKAP1) and poly(rC) binding protein 2 (PCBP2) in counteracting environmental stress. Proc Natl Acad Sci U S A 2008, 105(33):11673-11678.
67. Bailey TL, Boden M, Buske FA, Frith M, Grant CE, Clementi L, Ren J, Li WW, Noble WS: MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res 2009, 37(Web Server issue):W202-208.
68. Sagawa F, Ibrahim H, Morrison AL, Wilusz CJ, Wilusz J: Nucleophosmin deposition during mRNA 3' end processing influences poly(A) tail length. EMBO J, 30(19):3994-4005.
69. Richard P, Manley JL: Transcription termination by nuclear RNA polymerases. Genes Dev 2009, 23(11):1247-1269.
70. Di Giammartino DC, Nishida K, Manley JL: Mechanisms and consequences of alternative polyadenylation. Mol Cell, 43(6):853-866.
71. Jenal M, Elkon R, Loayza-Puch F, van Haaften G, Kuhn U, Menzies FM, Oude Vrielink JA, Bos AJ, Drost J, Rooijers K et al: The poly(A)-binding protein nuclear 1 suppresses alternative cleavage and polyadenylation sites. Cell, 149(3):538-553.
72. Dittmar KA, Jiang P, Park JW, Amirikian K, Wan J, Shen S, Xing Y, Carstens RP: Genome-wide determination of a broad ESRP-regulated posttranscriptional network by high-throughput sequencing. Mol Cell Biol, 32(8):1468-1482.
73. Buratti E, Baralle FE: TDP-43: new aspects of autoregulation mechanisms in RNA binding proteins and their connection with human disease. FEBS J, 278(19):3530-3538.
74. Zhu J, Chen X: MCG10, a novel p53 target gene that encodes a KH domain RNA-binding protein, is capable of inducing apoptosis and cell cycle arrest in G(2)-M. Mol Cell Biol 2000, 20(15):5602-5618.
75. Naarmann IS, Harnisch C, Flach N, Kremmer E, Kuhn H, Ostareck DH, Ostareck-Lederer A: mRNA silencing in human erythroid cell maturation: heterogeneous nuclear ribonucleoprotein K controls the expression of its regulator c-Src. J Biol Chem 2008, 283(26):18461-18472.
76. Perrotti D, Cesi V, Trotta R, Guerzoni C, Santilli G, Campbell K, Iervolino A, Condorelli F, Gambacorti-Passerini C, Caligiuri MA et al: BCR-ABL suppresses C/EBPalpha expression through inhibitory action of hnRNP E2. Nat Genet 2002, 30(1):48-58.
77. Molinaro RJ, Jha BK, Malathi K, Varambally S, Chinnaiyan AM, Silverman RH: Selection and cloning of poly(rC)-binding protein 2 and Raf kinase inhibitor protein RNA activators of 2',5'-oligoadenylate synthetase from prostate cancer cells. Nucleic Acids Res 2006, 34(22):6684-6695.
78. Chaudhury A, Hussey GS, Ray PS, Jin G, Fox PL, Howe PH: TGF-beta-mediated phosphorylation of hnRNP E1 induces EMT via transcript-selective translational induction of Dab2 and ILEI. Nat Cell Biol, 12(3):286-293.
79. Huang da W, Sherman BT, Lempicki RA: Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res 2009, 37(1):1-13.
80. Warzecha CC, Jiang P, Amirikian K, Dittmar KA, Lu H, Shen S, Guo W, Xing Y, Carstens RP: An ESRP-regulated splicing programme is abrogated during the epithelial-mesenchymal transition. EMBO J, 29(19):3286-3300.
81. Warzecha CC, Shen S, Xing Y, Carstens RP: The epithelial splicing factors ESRP1 and ESRP2 positively and negatively regulate diverse types of alternative splicing events. RNA Biol 2009, 6(5):546-562.
82. Pan Z, Zhang H, Hague LK, Lee JY, Lutz CS, Tian B: An intronic polyadenylation site in human and mouse CstF-77 genes suggests an evolutionarily conserved regulatory mechanism. Gene 2006, 366(2):325-334.
83. Brooks AN, Yang L, Duff MO, Hansen KD, Park JW, Dudoit S, Brenner SE, Graveley BR: Conservation of an RNA regulatory map between Drosophila and mammals. Genome Res, 21(2):193-202.
84. Danckwardt S, Hentze MW, Kulozik AE: 3' end mRNA processing: molecular mechanisms and implications for health and disease. EMBO J 2008, 27(3):482-498.
85. Millevoi S, Vagner S: Molecular mechanisms of eukaryotic pre-mRNA 3' end processing regulation. Nucleic Acids Res, 38(9):2757-2774.
86. Veraldi KL, Arhin GK, Martincic K, Chung-Ganster LH, Wilusz J, Milcarek C: hnRNP F influences binding of a 64-kilodalton subunit of cleavage stimulation factor to mRNA precursors in mouse B cells. Mol Cell Biol 2001, 21(4):1228-1238.
87. Warzecha CC, Sato TK, Nabet B, Hogenesch JB, Carstens RP: ESRP1 and ESRP2 are epithelial cell-type-specific regulators of FGFR2 splicing. Mol Cell 2009, 33(5):591-601.
88. Trapnell C, Pachter L, Salzberg SL: TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 2009, 25(9):1105-1111.
89. Takagaki Y, Manley JL, MacDonald CC, Wilusz J, Shenk T: A multisubunit factor, CstF, is required for polyadenylation of mammalian pre-mRNAs. Genes Dev 1990, 4(12A):2112-2120.
90. MacDonald CC, Wilusz J, Shenk T: The 64-kilodalton subunit of the CstF polyadenylation factor binds to pre-mRNAs downstream of the cleavage site and influences cleavage site location. Mol Cell Biol 1994, 14(10):6647-6654.
91. Zhao J, Hyman L, Moore C: Formation of mRNA 3' ends in eukaryotes: mechanism, regulation, and interrelationships with other steps in mRNA synthesis. Microbiol Mol Biol Rev 1999, 63(2):405-445.
92. Colgan DF, Manley JL: Mechanism and regulation of mRNA polyadenylation. Genes Dev 1997, 11(21):2755-2766.
93. Wallace AM, Dass B, Ravnik SE, Tonk V, Jenkins NA, Gilbert DJ, Copeland NG, MacDonald CC: Two distinct forms of the 64,000 Mr protein of the cleavage stimulation factor are expressed in mouse male germ cells. Proc Natl Acad Sci U S A 1999, 96(12):6763-6768.
94. Chuvpilo S, Zimmer M, Kerstan A, Glockner J, Avots A, Escher C, Fischer C, Inashkina I, Jankevics E, Berberich-Siebelt F et al: Alternative polyadenylation events contribute to the induction of NF-ATc in effector T cells. Immunity 1999, 10(2):261-269.
95. Park JW, Parisky K, Celotto AM, Reenan RA, Graveley BR: Identification of alternative splicing regulators by RNA interference in Drosophila. Proc Natl Acad Sci U S A 2004, 101(45):15974-15979.
96. Ji Z, Tian B: Reprogramming of 3' untranslated regions of mRNAs by alternative polyadenylation in generation of pluripotent stem cells from different cell types. PLoS One 2009, 4(12):e8419.
97. Vaupel P, Mayer A: Hypoxia in cancer: significance and impact on clinical outcome. Cancer Metastasis Rev 2007, 26(2):225-239.
98. Chi JT, Wang Z, Nuyten DS, Rodriguez EH, Schaner ME, Salim A, Wang Y, Kristensen GB, Helland A, Borresen-Dale AL et al: Gene expression programs in response to hypoxia: cell type specificity and prognostic significance in human cancers. PLoS Med 2006, 3(3):e47.
99. Lendahl U, Lee KL, Yang H, Poellinger L: Generating specificity and diversity in the transcriptional response to hypoxia. Nat Rev Genet 2009, 10(12):821-832.
100. Masuda K, Abdelmohsen K, Gorospe M: RNA-binding proteins implicated in the hypoxic response. J Cell Mol Med 2009, 13(9A):2759-2769.
101. Gorospe M, Tominaga K, Wu X, Fahling M, Ivan M: Post-Transcriptional Control of the Hypoxic Response by RNA-Binding Proteins and MicroRNAs. Front Mol Neurosci, 4:7.
102. Weigand JE, Boeckel JN, Gellert P, Dimmeler S: Hypoxia-induced alternative splicing in endothelial cells. PLoS One, 7(8):e42697.
103.
Shepard PJ, Choi EA, Lu J, Flanagan LA, Hertel KJ, Shi Y: Complex and dynamic landscape of RNA polyadenylation revealed by PAS-Seq. Rna 2011, 17(4):761-772. 104. Morris AR, Bos A, Diosdado B, Rooijers K, Elkon R, Bolijn AS, Carvalho B, Meijer GA, Agami R: Alternative Cleavage and Polyadenylation during Colorectal Cancer Development. Clin Cancer Res, 18(19):5256-5266. 105. Wellmann S, Buhrer C, Moderegger E, Zelmer A, Kirschner R, Koehne P, Fujita J, Seeger K: Oxygen-regulated expression of the RNA-binding proteins RBM3 and CIRP by a HIF-1-independent mechanism. J Cell Sci 2004, 117(Pt 9):17851794. 106. Shi L, Campbell G, Jones WD, Campagne F, Wen Z, Walker SJ, Su Z, Chu TM, Goodsaid FM, Pusztai L et al: The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models. Nat Biotechnol, 28(8):827-838.