University of Iowa
Iowa Research Online
Theses and Dissertations
Fall 2012
Global analysis of alternative polyadenylation
regulation using high-throughput sequencing
Ji Wan
University of Iowa
Copyright 2012 Ji Wan
This dissertation is available at Iowa Research Online: http://ir.uiowa.edu/etd/3548
Recommended Citation
Wan, Ji. "Global analysis of alternative polyadenylation regulation using high-throughput sequencing." PhD (Doctor of Philosophy)
thesis, University of Iowa, 2012.
http://ir.uiowa.edu/etd/3548.
GLOBAL ANALYSIS OF ALTERNATIVE POLYADENYLATION
REGULATION USING HIGH-THROUGHPUT SEQUENCING
by
Ji Wan
An Abstract
Of a thesis submitted in partial fulfillment
of the requirements for the Doctor of
Philosophy degree in Genetics
in the Graduate College of
The University of Iowa
December 2012
Thesis Supervisor: Associate Professor Yi Xing
ABSTRACT
Messenger RNAs (mRNAs) must undergo a series of post-transcriptional processing steps before translation. One of these steps, 3’ end processing, which consists of cleavage and polyadenylation, is critical for delimiting the 3’ end of the mRNA and for determining the regulatory elements available for downstream post-transcriptional and translational regulation. Like splicing, another well-characterized mRNA processing step, 3’ end processing is highly flexible owing to the diversity of trans-acting factors and cis-acting elements at the 3’ end of mRNA. In recent years, both experimental and computational studies have increasingly revealed alternative polyadenylation (APA), the differential usage of alternative polyA sites in the same gene, which leads to mRNA isoforms with different 3’ UTRs. More significantly, global changes in 3’ UTR length have been observed in multiple clinical settings, particularly in cancer cells. However, studies of the mechanisms underlying APA have not kept pace with the description of the phenomenon.
In this thesis, we first describe the general principles and a pipeline for identifying APA in different biological or clinical conditions using various high-throughput sequencing techniques. We then present work on the global impact of two RNA binding proteins (ESRP and αCP) and one core 3’ end processing factor (CstF64 and its paralog CstF64τ) on the regulation of APA. APA identification and motif analyses reveal widespread APA associated with changes in the expression of these proteins in different cell lines. In addition, for each protein, we have collected substantial evidence about the mechanism underlying APA induction. Our findings provide new insights into the mechanisms of APA regulation.
We also studied the induction of APA in JEG-3 cells in response to changes in oxygen supply (hypoxia versus normoxia). Using a robust protocol that specifically sequences the 3’ ends of mRNAs, we identified more than 500 APA events and revealed a global shortening of 3’ UTRs under hypoxia.
The work on APA in this thesis substantially increases our understanding of APA regulation by various proteins and provides new evidence of APA in a clinically relevant condition.
GLOBAL ANALYSIS OF ALTERNATIVE POLYADENYLATION
REGULATION USING HIGH-THROUGHPUT SEQUENCING
by
Ji Wan
A thesis submitted in partial fulfillment
of the requirements for the Doctor of
Philosophy degree in Genetics
in the Graduate College of
The University of Iowa
December 2012
Thesis Supervisor: Associate Professor Yi Xing
Graduate College
The University of Iowa
Iowa City, Iowa
CERTIFICATE OF APPROVAL
PH.D. THESIS
This is to certify that the Ph.D. thesis of
Ji Wan
has been approved by the Examining Committee
for the thesis requirement for the Doctor of Philosophy
degree in Genetics at the December 2012 graduation.
Thesis Committee:
Yi Xing, Thesis Supervisor
Beverly Davidson
Charles Brenner
Jian Huang
Dana Levasseur
ACKNOWLEDGMENTS
I gratefully acknowledge my mentor Yi Xing for his patience, guidance and
support. I also acknowledge my committee members Beverly Davidson, Charles Brenner,
Jian Huang and Dana Levasseur for their encouragement and advice.
I acknowledge my past and current colleagues in the Xing lab (Peng Jiang, Lan
Lin, Zhixiang Lu, Juw Won Park, Jinkai Wang, Keyan Zhao, Seth Brown, Hongchao Lu,
Shihao Shen, Collin Tokheim) for their intellectual suggestions, support and friendship I
truly enjoyed.
I acknowledge my collaborators Russ Carstens, Kimberly Dittmar, Stephen
Liebhaber, Xinjun Ji, Yongsheng Shi, and Hayley McLoughlin for their work ethic
and for the knowledge I learned from them.
Finally, I acknowledge Dan Eberl, Isabelle Hardy, Linda Hurst and Kafer Anita of
the genetics program for their timely and generous help in the past four years.
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
INTRODUCTION
CHAPTER I
INTRODUCTION TO ALTERNATIVE POLYADENYLATION
3’ end processing pathways
Alternative 3’ end processing
Consequences of APA
Mechanism of APA regulation
Alternative polyadenylation in disease
CHAPTER II
IDENTIFICATION OF APA EVENTS USING HIGH-THROUGHPUT SEQUENCING TECHNIQUES
Introduction
Detect APA using generic RNA-Seq
Filtering artifact terminal exon
Predefine APA events
Calling statistically significant APA (RNA-Seq)
Detect APA using PAS-Seq and DRS
Read mapping and polyA site calling
Filtering artificial polyA site due to internal priming
Clustering heterogeneous polyA sites
Calling statistically significant APA (DRS/PAS-Seq)
Summary
CHAPTER III
THE POLY-C BINDING PROTEINS ACT AS GLOBAL REGULATORS OF ALTERNATIVE POLYADENYLATION
Introduction
Results
Direct RNA 3’ sequencing of the transcriptome in cells acutely depleted of αCP
Identification of mRNAs impacted by αCP depletion
Motif analysis reveals C-rich determinants in the 3’ UTRs of mRNAs impacted by αCP depletion
αCP impacts on patterns of alternative polyA selection
Motif analysis of APA events
αCP2 controls the 3’ processing of its own transcript
APA pattern changes impacted by αCPs
Discussion
Materials and Methods
Cell culture and siRNA transfection
Direct RNA sequencing
Mapping and APA analysis of DRS data
Detection of differential gene expression
Motif enrichment analysis
Gene ontology analysis
QPCR
3’ RACE
RNA UV-crosslinking and EMSA
CHAPTER IV
CONTEXT-DEPENDENT REGULATION OF APA BY EPITHELIAL SPLICING REGULATORY PROTEINS
Introduction
Results
Identification of ESRP-regulated changes in alternative 3’ end formation by coupling
Discussion
Materials and Methods
Cell Culture, transfection, and transduction
Library preparation and sequencing
Identification of ESRP regulated changes in polyadenylation
CHAPTER V
GLOBAL REGULATION OF ALTERNATIVE POLYADENYLATION BY CLEAVAGE STIMULATION FACTOR 64 (CSTF64)
Background
Global analyses of CstF64-mediated APA regulation
Discussion
Materials and Methods
Cell culture and transfections
Gel shift assay
Sequencing and reads mapping
APA analysis
CHAPTER VI
ALTERNATIVE POLYADENYLATION DURING HYPOXIA INDUCTION
Background
Results
Quality control analysis of PAS-Seq data
Reproducibility of PAS-Seq
Gene expression correlation between PAS-Seq and RNA-Seq
Alternative polyadenylation induced by hypoxia
GO analysis of APA genes
Discussion
Materials and Methods
Sample preparation
RT-qPCR validation of APA
Detect APA for multiple samples
Classification of APA
Gene enrichment analysis
CHAPTER VII
FUTURE DIRECTION
REFERENCES
LIST OF TABLES
Table 1 Characteristic features of different high-throughput sequencing techniques in 3’ end processing study.
Table 2 Example of contingency table used in One vs. Others method.
Table 3 Summary of DRS data of αCP knockdown experiments.
Table 4 Number of differentially expressed genes impacted by αCP depletion.
Table 5 Gene Ontology analysis of differentially expressed genes impacted by αCP knockdown.
Table 6 List of validated genes having significant changes in overall expression after αCP depletion.
Table 7 Number of APA events impacted by αCP depletion.
Table 8 Summary of direct RNA sequencing data of ESRP knockdown.
Table 9 Summary of DRS data of CstF64/τ knockdown experiments.
Table 10 Summary of Hypoxia and MAQC PAS-Seq data.
LIST OF FIGURES
Figure 1 Cis-acting regulatory elements and trans-acting proteins in eukaryotic 3’ end processing.
Figure 2 Classification of APA events by mechanism.
Figure 3 Functional consequence of SE-APA.
Figure 4 Impact of DE-APA on the difference of coding sequence.
Figure 5 Pipeline for detecting significant APA switch for generic RNA-Seq, DRS and PAS-Seq.
Figure 6 Flowchart of precompiling APA events only from gene structure annotation.
Figure 7 siRNA-mediated co-depletion of αCP1 and αCP2 from K562 cells.
Figure 8 Histogram of DRS read lengths of all the four samples (pooled).
Figure 9 Reproducibility of DRS polyA reads.
Figure 10 Heatmap of differentially expressed genes after αCP knockdown.
Figure 11 GO analysis of mRNAs altered in overall expression (DGE levels) by αCP depletion.
Figure 12 Confirmation of differentially expressed genes impacted by αCPs by targeted real time RT-PCR analysis.
Figure 13 Motif analysis within the 3’ UTRs of mRNAs impacted by αCP depletion.
Figure 14 Motif analysis of transcripts undergoing APA in response to αCP depletion.
Figure 15 siRNA-mediated depletion of αCPs alters 3’ processing of the αCP2 transcript.
Figure 16 QPCR validations of APA.
Figure 17 In vitro RNA-protein interaction assay.
Figure 18 Impact of αCPs on expression and alternative polyadenylation of Pol II transcripts.
Figure 19 Outline of the experimental systems and RNA-Seq/DRS protocol used to identify ESRP regulated APA.
Figure 20 Examples of three types of alternative 3’ end formation regulated by the ESRPs.
Figure 21 Example of APA events correlating with host gene expression change.
Figure 22 A functional map for ESRP position-dependent regulation of alternative polyadenylation.
Figure 23 CstF64-mediated global APA regulation.
Figure 24 Comparison of the RNA-binding specificities of CstF64 and CstF64τ.
Figure 25 Comparison of APA changes in CstF64- and CstF64&τ-RNAi cells.
Figure 26 Reproducibility of polyA sites by PAS-Seq.
Figure 27 Clustering of 18 PAS-Seq samples.
Figure 28 Scatterplots of gene expression measured by PAS-Seq and RNA-Seq.
Figure 29 RT-qPCR validation of APA events in hypoxia.
Figure 30 Distribution of 3’ UTR lengths in Hypoxia and Normoxia.
Figure 31 Distribution of expression levels of SE-APA genes.
INTRODUCTION
Messenger RNA (mRNA) is the bridge between the genetic information encoded in DNA and functional proteins, and mRNA processing in eukaryotic cells provides a layer of flexibility that increases regulatory and functional diversity. Understanding the alternative signals and pathways of mRNA processing is therefore one of the central issues in modern biology.
The major mRNA processing events include 5’ capping, 3’ cleavage and polyadenylation, splicing and RNA editing [1]. After processing, the mature mRNA consists of three structural segments: the 5’ untranslated region (5’ UTR), the coding sequence (CDS) and the 3’ untranslated region (3’ UTR). The 5’ UTR is involved in translational initiation, the CDS determines the amino acid sequence of the protein, and the 3’ UTR regulates mRNA stability and translational efficiency. The sequences that make up these three segments are not invariant: the complexity of pre-mRNA processing pathways in eukaryotic cells gives rise to a plethora of functionally specific mRNA and protein isoforms. This flexibility is a double-edged sword, because expression of inappropriate isoforms at the wrong time or place can impair cell function and ultimately lead to disease [2].
This dissertation mainly focuses on 3’ UTR isoforms of mRNA. Chapter I introduces the background on the generation, regulation and consequences of alternative 3’ UTR isoforms. Chapter II discusses how high-throughput sequencing techniques can be used to identify differential expression of 3’ UTR isoforms. Chapters III, IV and V examine to what extent and in what manner RNA binding proteins (αCP and ESRP) and a core processing factor (CstF64/CstF64τ) regulate different 3’ UTR isoforms. Finally, Chapter VI discusses the differential expression of 3’ UTR isoforms between hypoxia and normoxia.
CHAPTER I INTRODUCTION TO ALTERNATIVE
POLYADENYLATION
3’ end processing pathways
Formation of the 3’ end is critical to the maturation and post-transcriptional regulation of mRNA. With the exception of histone genes, most precursor mRNAs (pre-mRNAs) undergo two steps of processing at the 3’ end: cleavage and polyadenylation [3]. Cleavage delimits the 3’ end of the mRNA by cutting the pre-mRNA at the cleavage site (CS), and polyadenylation adds a stretch of adenosines at the CS, which facilitates nuclear export and confers mRNA stability and translatability. Two major groups of processing factors (core and accessory factors) are recruited to regulate 3’ end processing by interacting with different cis-acting elements at the 3’ end of the pre-mRNA. The endonucleolytic cleavage is catalyzed by the CPSF complex (CPSF30, CPSF73, CPSF100, CPSF160 and hFip1) and the CF complex (CF I and CF II) through recognition of the polyA signal (AAUAAA and its variants), located about 10 to 30 nt upstream of the cleavage site. On the other side of the CS, the CstF complex is recruited for cleavage by recognizing a U/GU-rich element usually about 30 nt downstream of the CS (Figure 1). A stretch of adenosines is then added to the cleaved mRNA by poly(A) polymerase (PAP). The length of the polyA tail, typically about 250 nt, is proposed to be determined by the combined action of PABPN1, CPSF and PAP [4]. The length of the polyA tail is critical for mRNA stability, cellular localization and translational regulation.
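The positional rule above (a polyA signal about 10 to 30 nt upstream of the CS) lends itself to a simple sequence scan. The Python sketch below is illustrative only: the function name `find_pas`, the restriction to AATAAA and its most common variant ATTAAA, and the window convention (the hexamer must lie entirely between 30 and 10 nt upstream of the CS) are assumptions made for this example, not the procedure used elsewhere in this thesis.

```python
# Canonical polyA signal plus its most common variant. Real analyses usually
# scan a longer list of hexamer variants (assumption for this sketch).
PAS_HEXAMERS = ("AATAAA", "ATTAAA")
HEX_LEN = 6


def find_pas(seq: str, cleavage_site: int, min_up: int = 10, max_up: int = 30):
    """Return (start, hexamer) pairs for PAS hexamers lying entirely within
    the window from max_up to min_up nt upstream of the cleavage site.

    seq: pre-mRNA sequence (U is converted to T).
    cleavage_site: 0-based index of the first nucleotide removed by
        cleavage, so seq[:cleavage_site] is the retained 3' end.
    """
    seq = seq.upper().replace("U", "T")
    win_start = max(0, cleavage_site - max_up)
    win_end = cleavage_site - min_up  # a hexamer must end at or before this index
    hits = []
    for start in range(win_start, win_end - HEX_LEN + 1):
        hexamer = seq[start:start + HEX_LEN]
        if hexamer in PAS_HEXAMERS:
            hits.append((start, hexamer))
    return hits


# Toy pre-mRNA: AATAAA starts at index 20, 21 nt upstream of the CS at index 41.
toy = "G" * 20 + "AATAAA" + "C" * 15 + "G" * 20
print(find_pas(toy, cleavage_site=41))
```

Moving the cleavage site closer to the hexamer (for example to index 30) pushes the hexamer out of the window, so no signal is reported.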
In addition to the core processing machinery described above, footprints of multiple RNA binding proteins (RBPs) and splicing regulators (SRs) have been found in the 3’ UTR, possibly through interactions with other cis-acting elements in the 3’ UTR [5-7]. These proteins are thought to act as accessory factors to the core 3’ end processing machinery. For example, a considerable portion of the cross-linking immunoprecipitation sequencing (CLIP-Seq) tags of Nova, a brain-specific SR, mapped to regions flanking the polyA sites of hundreds of mRNAs. Moreover, Nova-specific YCAY-rich motifs were enriched in those flanking regions, supporting a physical interaction between this RBP and cis-acting elements in the 3’ UTR [5].
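Motif enrichment of the kind reported for Nova can be approximated by counting motif occurrences in windows around polyA sites. The hypothetical helpers below count YCAY matches (Y = C or U, written as C/T in the DNA alphabet) using a lookahead regular expression so that overlapping matches are all counted. The window size is an arbitrary assumption, and a real enrichment test would compare these densities with a background set (for example, regions far from polyA sites) rather than report raw values.

```python
import re

# YCAY with Y = C or T (U is written as T). The zero-width lookahead lets
# overlapping matches such as TCACCAT (TCAC and CCAT) both be counted.
YCAY = re.compile(r"(?=[CT]CA[CT])")


def count_ycay(seq: str) -> int:
    """Count (possibly overlapping) YCAY matches in a sequence."""
    return len(YCAY.findall(seq.upper().replace("U", "T")))


def flank_density(seq: str, pa_site: int, flank: int = 100) -> float:
    """YCAY matches per nucleotide in the window pa_site +/- flank."""
    window = seq[max(0, pa_site - flank):pa_site + flank]
    return count_ycay(window) / len(window) if window else 0.0


print(count_ycay("UCACCAU"))
```

A non-overlapping search with the plain pattern `[CT]CA[CT]` would report only one match in `TCACCAT`, which is why the lookahead form is used here.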
Figure 1 Cis-acting regulatory elements and trans-acting proteins in eukaryotic 3’ end processing. The mRNA is cleaved at the cleavage site (black triangle), and a polyA tail is then added. The process involves three functional units (distinguished by different colors). The first unit consists of the cleavage factor (CF) and cleavage and polyadenylation specificity factor (CPSF) families, which interact with the polyA signal AATAAA about 30 nt upstream of the cleavage site. The second unit is composed of the cleavage stimulation factor (CstF) family and downstream sequence elements (DSE). The last unit represents the interaction between RNA binding proteins (RBP) and upstream sequence elements (USE).
Alternative 3’ end processing
Multiple distinct polyA sites can be used alternatively in the same gene. Analyses of expressed sequence tags (ESTs) indicated that about 40% to 50% of human and mouse genes have multiple polyA sites [8, 9]. A recent study identified nearly 100 proteins in the purified human 3’ processing complex [10]. Such a large machinery implies extensive room for alternative processing at the 3’ end of mRNA, leading to the alternative usage of different polyA sites in the same gene, termed alternative polyadenylation (APA). APA can arise from 1) variable polyA regulatory signals [11, 12]; 2) cell state- or tissue-specific expression of core 3’ end processing factors; and 3) tissue specificity of accessory factors [13].
According to their biogenesis pathways, APA events can be classified into two major categories, the second of which is further divided into two subcategories. The first major category, same terminal exon APA (SE-APA), denotes multiple polyA sites located in the same continuous terminal exon. In contrast, in the second category, different terminal exon APA (DE-APA), the alternative polyA sites lie in two non-overlapping terminal exons. Because DE-APA usually results from the interplay between the polyadenylation and splicing pathways, it can be further divided into two subcategories according to how the upstream splice site is used: DE-APA3 and DE-APA5. DE-APA3 is DE-APA coupled with alternative 3’ splice site choice, and DE-APA5 is DE-APA coupled with alternative 5’ splice site choice, as illustrated in Figure 2.
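The classification in Figure 2 can be expressed as a few coordinate comparisons between a proximal and a distal isoform. The sketch below is a simplified, hypothetical implementation (`classify_apa` is not a function from this thesis) that assumes plus-strand transcripts given as sorted, 0-based, half-open exon intervals. It treats overlapping terminal exons as SE-APA and separates DE-APA5 from DE-APA3 by asking whether the proximal terminal exon starts at the same 3’ splice site as an internal exon of the distal isoform; it ignores strand, annotation noise and more complex isoform structures.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end): 0-based, half-open, plus strand


def classify_apa(proximal: List[Exon], distal: List[Exon]) -> str:
    """Classify a proximal/distal isoform pair as SE-APA, DE-APA3 or DE-APA5.

    Both isoforms are lists of exons sorted by start; the last exon of each
    carries its polyA site. Hypothetical helper for illustration only.
    """
    p_last, d_last = proximal[-1], distal[-1]
    # SE-APA: both polyA sites fall in the same (overlapping) terminal exon.
    if p_last[0] < d_last[1] and d_last[0] < p_last[1]:
        return "SE-APA"
    # DE-APA5: the proximal terminal exon begins at the same 3' splice site as
    # an internal exon of the distal isoform, so the choice is between using
    # that exon's 5' splice site and continuing to the proximal polyA site.
    internal_starts = {start for start, _ in distal[:-1]}
    if p_last[0] in internal_starts:
        return "DE-APA5"
    # DE-APA3: the proximal terminal exon has its own 3' splice site, which
    # the distal isoform skips.
    return "DE-APA3"


print(classify_apa([(0, 50), (600, 700)], [(0, 50), (600, 900)]))  # SE-APA
print(classify_apa([(0, 50), (100, 400)], [(0, 50), (100, 200), (600, 900)]))  # DE-APA5
print(classify_apa([(0, 50), (100, 300)], [(0, 50), (600, 900)]))  # DE-APA3
```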
Consequences of APA
Genome-wide analyses have recently revealed a wide range of tissue-specific, cell-state-specific and disease-related APA events in both human and mouse [14-16]. Understanding the biological consequences of the different types of APA events (SE-APA, DE-APA3 and DE-APA5) is therefore important for many aspects of the post-transcriptional regulation and translation of mRNA.
SE-APA generates two mRNA isoforms that differ only in the 3’ UTR. Because the 3’ UTR contains many regulatory elements for post-transcriptional and translational control, SE-APA isoforms can be subject to different post-transcriptional and translational regulation (Figure 3) [17]. First, a representative example of the effect of SE-APA on post-transcriptional regulation is brain-derived neurotrophic factor (BDNF), a key regulatory protein for neuronal structure and function. BDNF has two mRNA isoforms that encode the same protein and differ only in the 3’ UTR. The isoform with the longer 3’ UTR is preferentially localized in dendrites, whereas the short isoform is restricted to the cell soma [18]. This differential localization is functionally meaningful: truncation of the longer isoform was further shown to cause a defect in the dendritic targeting of BDNF mRNA [18]. Second, SE-APA isoforms can differ in translational efficiency, as demonstrated in a study of the gene Polo, in which reliance on the less efficiently translated isoform was lethal to transgenic flies [19]. Finally, SE-APA isoforms can be differentially regulated by microRNAs, resulting in differences in protein expression [16].
Figure 2 Classification of APA events by mechanism. SE-APA is generated by two sets of regulatory elements within the same 3’ UTR. DE-APA3 is generated when an upstream exon is either spliced to the 3’ splice site of an exon (orange) bearing a functional polyA site or skips that entire exon and is spliced to the 3’ splice site of a downstream exon (red). DE-APA5 is created when a site within an exon (orange) is either not used as a splice site, so that processing occurs at a functional polyA site at the end of the same exon, or is used as a 5’ splice site for splicing to a downstream exon.
Figure 3 Functional consequence of SE-APA. Two mRNA isoforms that differ only in the 3’ UTR are shown. The isoform with the extended UTR (bottom) bears additional microRNA target sites and binding sequences for regulatory proteins, which could affect mRNA stability, translation efficiency, mRNA export and cellular localization.
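As a rough illustration of why the extended isoform can carry extra microRNA target sites, the sketch below counts canonical 7mer-m8 seed matches (the reverse complement of miRNA nucleotides 2 to 8) in the UTR region shared by both isoforms versus the region present only in the long isoform. Seed matching alone is a simplification of real target prediction; the function names, the toy UTR and the choice of human let-7a are assumptions for illustration only.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna: str) -> str:
    """7mer-m8 target site: reverse complement of miRNA nucleotides 2-8."""
    seed = mirna.upper().replace("T", "U")[1:8]
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def sites_by_region(utr: str, proximal_end: int, mirna: str):
    """Count seed matches in the common UTR (ending at proximal_end) and in
    the extension present only in the long isoform. A site that straddles
    the proximal polyA site exists only in the long isoform, so it is
    counted as extended."""
    site = seed_site(mirna)
    utr = utr.upper().replace("T", "U")
    starts = [i for i in range(len(utr) - len(site) + 1) if utr.startswith(site, i)]
    common = sum(1 for i in starts if i + len(site) <= proximal_end)
    return common, len(starts) - common


LET7A = "UGAGGUAGUAGGUUGUAUAGUU"  # human let-7a-5p
toy_utr = "AA" + "CUACCUC" + "AA" + "CUACCUC" + "AA"
print(seed_site(LET7A))  # CUACCUC
print(sites_by_region(toy_utr, proximal_end=10, mirna=LET7A))  # (1, 1)
```

In this toy UTR the proximal polyA site at position 10 keeps one let-7 site in the short isoform, while the long isoform carries a second site.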
On the other hand, DE-APA can affect not only the 3’ UTR but also the CDS, leading to two different protein isoforms (Figure 4). For example, the IgM heavy chain gene produces two isoforms through DE-APA in B cells. In resting B cells, a membrane-bound isoform corresponding to the distal polyA site is preferentially expressed, whereas in activated B cells, a secreted isoform using the proximal polyA site is more highly expressed [20]. Recently, a truncated form of glutamyl-prolyl tRNA synthetase (EPRS) was found to be generated by a polyadenylation-directed “tyrosine codon to stop codon conversion” mechanism (PAY*). The C-terminally truncated isoform protects its targets from translational repression by GAIT (gamma-interferon-activated inhibitor of translation), which was suggested to maintain basal levels of pro-inflammatory proteins for tissue health and organismal advantage [21].
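The codon arithmetic behind a tyrosine-to-stop conversion can be shown with a toy sequence: if cleavage occurs immediately after the first two nucleotides of a tyrosine codon (UAU or UAC), the added adenosines complete a UAA stop codon and truncate the open reading frame. The sketch below uses a made-up coding sequence written in the DNA alphabet, not the EPRS sequence, and its helper names are hypothetical.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_length(mrna: str) -> Optional[int]:
    """Codons read from position 0 before the first in-frame stop codon,
    or None if no stop codon is reached."""
    for i in range(0, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def cleave_and_polyadenylate(pre_mrna: str, cleavage_index: int, tail: int = 20) -> str:
    """Keep pre_mrna[:cleavage_index] and append a polyA tail."""
    return pre_mrna[:cleavage_index] + "A" * tail


# Toy CDS: ATG GCT TAC GGC GGC GGC TAA, where TAC encodes tyrosine.
cds = "ATGGCTTACGGCGGCGGCTAA"
full = orf_length(cds)
# Cleaving right after "TA" of the TAC codon (index 6 + 2 = 8) lets the added
# adenosines complete a TAA stop codon.
truncated = orf_length(cleave_and_polyadenylate(cds, 8))
print(full, truncated)  # 6 2
```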
Figure 4 Impact of DE-APA on the coding sequence. The common coding sequence is shown in green, while the coding sequences that differ between the two DE-APA isoforms are colored orange and red, respectively. The blue segments are either 5’ UTR or 3’ UTR.
Mechanism of APA regulation
The steps of gene expression, from transcription to mRNA processing, are tightly coupled [22]. While 3’ end processing of mRNA is predominantly regulated by a multitude of core and accessory factors, it can also be affected by upstream events such as splicing and transcription. In this section, we summarize current findings on the regulatory mechanisms underlying APA.
First, the abundance of 3’ end processing factors can determine alternative polyA site choice. More specifically, higher expression of 3’ end processing factors can compensate for their weak affinity for weak polyA cis-acting elements. For example, the DE-APA event of the IgM heavy chain gene responds to the differential expression of CstF64 between resting and activated B cells. As CstF64 expression increases dramatically from the resting to the activated state, CstF64 engages the weaker proximal polyA site (weak in terms of its affinity for CstF64) instead of the strong distal polyA site. Conversely, depletion of CstF64 switches 3’ end processing from the proximal to the distal polyA site [23]. In an indirect mode of regulation, the transcription factor E2F has been found to alter the expression of 3’ end processing factors during cell proliferation [24]. However, a study of CFIm68 provides a seemingly more complicated counter-example to this model: after CFIm68 knockdown, proximal polyA sites were used more frequently relative to distal polyA sites (as measured by the number of A-Seq reads) [25]. This suggests that a more sophisticated model is needed to explain how 3’ end processing factors regulate APA.
Second, canonical RNA binding proteins (RBPs) or splicing factors (SFs), which are thought
to bind to the CDS or intronic regions, can also bind to the 3’ UTR and regulate the
efficiency of 3’ end processing. One possible mechanism is that RBPs/SFs compete
with core 3’ end processing factors for certain cis-acting elements. For instance, PTB can
outcompete CstF64 for the U-rich DSE, thereby inhibiting 3’ end processing
of -globin [26]. Another mechanism is that RBPs/SFs bind to other cis-acting
elements flanking the cleavage sites to enhance 3’ end processing [7, 26]. For example,
αCPs bind to an upstream C-rich motif to enhance 3’ end processing of
human α-globin mRNA. Similarly, hnRNP H can bind an upstream G-rich motif by forming
a cooperative assembly with PTB [6]. In addition, recent high-throughput sequencing
studies have revealed global regulation of alternative polyadenylation by hnRNP H and
U1 snRNP, suggesting a pervasive role of RBPs/SFs in the regulation of APA [27, 28].
Finally, alternative polyadenylation is also coupled with transcriptional processes.
First, polyA site choice can be modulated by the gene promoter as demonstrated by
reporter assays in [29]. Second, RNA polymerase II (RNAP II), which is critical in RNA
transcription, was found to affect polyA site choice through multiple possible
mechanisms. One mechanism is that variation in the transcription elongation rate
of RNAP II could favor different polyA sites. This was demonstrated for
Polo mRNAs, whose proximal polyA site was promoted by an RNAP II mutant with a
lower elongation rate [19]. Another mechanism involves transcriptional pausing
of RNAP II during elongation [30]. For example, the enrichment of RNAP II pausing
signals differs between the vicinity of proximal polyA sites in highly expressed genes
and that of distal polyA sites in lowly expressed genes; this difference may reflect
distinct patterns of RNAP II pausing that affect both gene expression and polyA site
choice [29].
Alternative polyadenylation in disease
Widespread APA has increasingly been observed in disease, especially in
cancer, using either conventional techniques or state-of-the-art high-throughput
sequencing. More interestingly, systematic APA switch patterns have been found to be
associated with different disease conditions, suggesting a significant role for APA in
pathogenesis and oncogenesis. Using northern blotting, Mayr and Bartel observed
shortening of oncogene 3’ UTRs in cancer cell lines compared to non-transformed
cell lines. The shortened 3’ UTRs result in higher protein output from oncogenes by
avoiding repressive elements (presumably microRNA binding sites) in the extended
region of the 3’ UTR [16]. In addition, two global studies of cancer cells found not only this
shortening pattern but also a lengthening pattern in distinct cancer
cells/subtypes [31, 32]. Beyond these global analyses, studies of individual genes have uncovered
specific routes by which APA contributes to disease. A well-studied example is APA caused by a
mutation in the polyA signal of FOXP3. Specifically, a rare AAUAAA to AAUGAA
mutation inactivates the proximal polyA site, leading to an unstable FOXP3 mRNA that uses
a distal polyA site. As a result, FOXP3 protein levels decrease, causing
the autoimmune disease IPEX (Immune dysfunction, Polyendocrinopathy, Enteropathy,
X-linked) [33].
CHAPTER II
IDENTIFICATION OF APA EVENTS USING
HIGH-THROUGHPUT SEQUENCING TECHNIQUES
Introduction
Initial efforts to study large-scale APA events were based on ESTs obtained by
Sanger sequencing [34, 35]. However, strictly speaking, these studies were not
quantitative enough to call APA switches, owing to the limited quantification and
coverage of EST data. The early success in identifying global patterns of APA
switching was made possible by various microarray techniques with
high-density probes targeting mRNAs. In these studies, APA switches were
detected by comparing the expression intensities of two alternative regions delimited by
two cleavage sites between different clinical or biological conditions [14, 15, 36].
However, three limitations of microarray techniques have restricted their application in
detecting APA: 1) coverage limitation: microarrays require pre-defined probes against
known mRNA or gene regions and cannot address unannotated transcripts or
unannotated 3’ UTRs; 2) accuracy limitation: no accurate polyA coordinate information
can be derived from microarray data; 3) sensitivity limitation: microarrays are not
sensitive enough to detect the low expression of alternative 3’ UTR regions.
In recent years, revolutionary high-throughput sequencing techniques (as
compared to Sanger sequencing) have become a powerful alternative method to detect
APA following their success in detecting alternative splicing [37, 38]. Unlike
predefined microarray probes, the RNA-Seq signal consists of short reads
ranging from 25 nt to 400 nt (varying by sequencing platform). Ideally, RNA-Seq reads
cover the full body of the expressed RNAs (poly(A)+ or total RNA, depending on the
protocol). In addition, RNA-Seq can detect the expression of novel transcripts
without any prior knowledge. Furthermore, RNA-Seq is sensitive enough to detect lowly
expressed genes [39]. These merits have led to the identification of APA switches in multiple
biological settings, including APA between human tissues and the impact of RNA
binding proteins on APA [27, 38, 40]. In spite of these advantages, because 3’
UTRs account for only a small portion of the whole transcriptome, using generic RNA-Seq is
cost-ineffective for detecting APA. Like microarrays, RNA-Seq alone cannot
precisely determine polyA site coordinates, which could reduce the
accuracy of APA detection. Building on generic RNA-Seq, several protocols that
specifically target the 3’ end of mRNA, such as PolyA Site Sequencing (PAS-Seq), have
recently been developed [32, 41-44]. All of these protocols share a common core step:
oligo(dT)-primed reverse transcription-PCR (RT-PCR) to capture RNA
fragments with a polyA stretch at their 3’ end. This type of method overcomes most of the
disadvantages of generic RNA-Seq in detecting APA but introduces a considerable
amount of internal A-rich sequences (through oligo(dT) internal priming), which require
extra effort to remove [32, 44]. More importantly, since these methods are derivatives
of the Illumina RNA-Seq protocol, they allow multiplexed samples to be sequenced in one lane
while still yielding a high number of reads. Another high-throughput sequencing method to
detect APA is direct RNA sequencing (DRS) on the Helicos single-molecule sequencing
platform [45]. This method is specifically designed to accurately capture the
mRNA 3’ end without RT-PCR during library construction. Therefore, it can measure RNA
expression and polyA site abundance simultaneously with a smaller amount of RNA sample
than RNA-Seq. Although DRS is more quantitative, it also suffers from a
higher sequencing error rate, higher sequencing cost and a lack of multiplexing capability.
In practice, this technique is less widely used than the Illumina-based techniques (Table 1).
Overall, however, all of these high-throughput sequencing techniques have made APA detection
effective. Next, we describe how to conduct global identification of
APA events using generic RNA-Seq, and using DRS or PAS-Seq, as outlined in
Figure 5.
Table 1
Characteristic features of different high-throughput sequencing techniques
in 3’ end processing studies.

Technique   Specific to 3’ end   Internal priming   Sequencing error   Multiplex   cDNA amplification
RNA-Seq     No                   No                 Low                Yes         Yes
PAS-Seq     Yes                  High               Low                Yes         Yes
DRS         Yes                  Low                High               No          No
Detect APA using generic RNA-Seq
The critical issues in APA detection using generic RNA-Seq include: 1) a set of
reliable terminal exons; 2) accurate coordinates of polyA sites; 3) a predefined set of
APA events of different types (SE-APA, DE-APA3 and DE-APA5); and 4) robust
algorithms to infer significant APA switches for different types of APA events. In this
section, we will discuss the aforementioned issues respectively.
Figure 5 Pipeline for detecting significant APA switches using generic RNA-Seq, DRS
and PAS-Seq.
Filtering artificial terminal exons
Transcript annotations of human genes were downloaded from the UCSC Known
Gene database (hg19) and the Ensembl gene database (release 61). To minimize the
effect of artificial transcript annotations on APA detection, we empirically set two rules to
eliminate potential artifacts in the UCSC and Ensembl transcript annotations: 1) remove 3’
terminal exons whose 5’ ends coincide with those of internal exons of other transcripts
but whose 3’ ends fall within those internal exons (incomplete transcript annotation); and
2) remove 3’ terminal exons containing multiple internal exons of other transcripts
(artificial intron retention events) (Figure 6).
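The two filtering rules amount to simple interval checks. The sketch below is a minimal illustration, not the pipeline used in this thesis: it assumes exons are (start, end) pairs in 1-based inclusive coordinates on the plus strand (so the 5’ end is `start`), that `internal_exons` already holds internal exons of the other transcripts of the same gene, that “fall within” includes the internal exon’s own 3’ end, and that “multiple” means two or more. All function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: (start, end), 1-based inclusive, plus strand only.
Exon = Tuple[int, int]


def is_incomplete_annotation(terminal: Exon, internal_exons: List[Exon]) -> bool:
    """Rule 1: the terminal exon starts where an internal exon of another
    transcript starts, but its 3' end stops inside that internal exon."""
    t_start, t_end = terminal
    return any(t_start == i_start and i_start <= t_end <= i_end
               for i_start, i_end in internal_exons)


def is_artificial_retention(terminal: Exon, internal_exons: List[Exon]) -> bool:
    """Rule 2: the terminal exon fully contains two or more internal exons."""
    t_start, t_end = terminal
    contained = sum(1 for i_start, i_end in internal_exons
                    if t_start <= i_start and i_end <= t_end)
    return contained >= 2


def filter_terminal_exons(terminals: List[Exon],
                          internal_exons: List[Exon]) -> List[Exon]:
    """Keep only terminal exons that violate neither rule."""
    return [t for t in terminals
            if not is_incomplete_annotation(t, internal_exons)
            and not is_artificial_retention(t, internal_exons)]


internal = [(100, 200), (300, 400), (500, 600)]
terminals = [(100, 150), (250, 700), (650, 900), (300, 450)]
print(filter_terminal_exons(terminals, internal))  # → [(650, 900), (300, 450)]
```

Here (100, 150) is dropped by rule 1 and (250, 700) by rule 2, while (300, 450), which extends past the internal exon it starts with, survives as a candidate DE-APA5 terminal exon.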
Predefining APA events
Next, based on the filtered transcript annotation, we constructed three sets of
alternative 3’ end events (SE-APA, DE-APA3, and DE-APA5). If the 3’ terminal exons
of two different transcripts of the same gene overlap neither each other nor any other
internal exon, they are defined as a DE-APA3 event. If the 3’ terminal exons of two
different transcripts of the same gene do not overlap each other, and the proximal 3’
terminal exon shares its 5’ end with an internal exon of the other transcript while its 3’
end extends beyond the 3’ end of that internal exon, these two 3’ terminal exons
are defined as a DE-APA5 event. For SE-APA events, we collected all EST-supported
polyA sites within the 3’ terminal exons from the PolyA-DB2 database [46], and treated
the 3’ end of each 3’ terminal exon as an additional putative polyA site. Then, for any 3’
terminal exon with more than one polyA site, we considered all pairs of adjacent polyA
sites as SE-APA events. The common and extended regions of each SE-APA event were
defined as the region between the 5’ end of the 3’ terminal exon and the proximal
polyA site, and the region between the proximal polyA site and the distal polyA site,
respectively. SE-APA events whose common or extended regions were shorter than 100
nt were removed from further analysis in order to eliminate closely clustered polyA sites.
As a result, we obtained 30,044, 20,337, and 17,117 possible SE-APA, DE-APA3 and DE-APA5 events, respectively, from the known gene annotation.
Figure 6 Flowchart of precompiling APA events only from gene structure
annotation.
Calling statistically significant APA (RNA-Seq)
For each APA event, RNA-Seq reads uniquely mapped to the human
genome (hg19) or to exon-exon junctions were used to calculate the transcript levels of
the different 3’ ends. For each SE-APA event, RNA-Seq reads uniquely mapped to the common and
extended regions were counted separately. We calculated the Bayes factor for testing
whether the ratios of read densities in the common and extended regions were
significantly different between two biological conditions [27]. In addition, we also
calculated the relative expression fold change of the common and extended regions by
dividing the ratio of read counts in the common region to those in the extended region in one
condition by that in the other condition. Significant SE-APA events are defined as those
with a fold change of at least 2 and a Bayes factor of at least 100 (following the
suggestion in [47]).
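The fold-change definition and the two thresholds can be written down directly. In this sketch the Bayes factor is taken as an input, because the underlying model from [27] is not reproduced here. It also assumes that a 2-fold shift in either direction counts (so a fold change of 0.5 qualifies like 2.0) and that all counts are non-zero; zero counts would need a pseudocount. Function names are hypothetical.

```python
def relative_fold_change(common_a: int, extended_a: int,
                         common_b: int, extended_b: int) -> float:
    """(common/extended in condition A) divided by (common/extended in B).

    Assumes all four counts are non-zero.
    """
    return (common_a / extended_a) / (common_b / extended_b)


def is_significant_se_apa(fold_change: float, bayes_factor: float,
                          min_fold: float = 2.0, min_bayes: float = 100.0) -> bool:
    """Thresholds from the text; the fold change is treated symmetrically."""
    return (max(fold_change, 1.0 / fold_change) >= min_fold
            and bayes_factor >= min_bayes)


fc = relative_fold_change(100, 50, 100, 200)  # 2.0 / 0.5
print(fc, is_significant_se_apa(fc, bayes_factor=150.0))  # → 4.0 True
```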
For DE-APA3 and DE-APA5 events, RNA-Seq reads uniquely mapped to the
proximal and distal 3’ terminal exons were counted separately. Raw RNA-Seq counts
were also transformed to Reads Per Kilobase per Million mapped reads (RPKM)
values as the normalized expression levels of the two alternative 3’ terminal exons [37].
APA events in which at least one of the alternative 3’ terminal exons was expressed with
RPKM less than 1 in both conditions were eliminated to ensure that the tested events
were reasonably highly expressed. For the remaining events, we conducted the Fisher
exact test on the isoform-specific RNA-Seq read counts of the proximal and distal 3’
terminal exons in both conditions. P-values from the Fisher exact test were adjusted
for multiple testing using the Benjamini-Hochberg procedure [48]. We also
calculated the relative expression fold change of the proximal and distal 3’ terminal
exons between two conditions by dividing the ratio of read counts for the two alternative
terminal exons in one condition by that in the other condition. Significant DE-APA3 and
DE-APA5 events were defined as those with a fold change of at least 2 between the two
conditions and a false discovery rate (FDR) < 0.01.
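A minimal sketch of the DE-APA3/DE-APA5 test follows, assuming read counts are arranged as a 2x2 table with conditions as rows and the proximal/distal terminal exons as columns. The Fisher exact test and the Benjamini-Hochberg procedure are written out with the Python standard library instead of a statistics package; the thesis does not say which implementation was used, and the helper names are hypothetical.

```python
from math import comb
from typing import List, Sequence


def rpkm(count: int, length_nt: int, total_mapped: int) -> float:
    """Reads Per Kilobase of exon per Million mapped reads."""
    return count * 1e9 / (length_nt * total_mapped)


def passes_expression_filter(rpkm_a: Sequence[float], rpkm_b: Sequence[float],
                             min_rpkm: float = 1.0) -> bool:
    """Reject the event if any alternative terminal exon has RPKM < min_rpkm
    in both conditions."""
    return not any(a < min_rpkm and b < min_rpkm for a, b in zip(rpkm_a, rpkm_b))


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]: sums the
    probabilities of all tables with the same margins that are no more
    probable than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# proximal/distal counts in condition A (row 1) and condition B (row 2)
p = fisher_exact_two_sided(80, 20, 30, 70)
print(benjamini_hochberg([p, 0.2, 0.04]))
```

In a full run, the p-values of all events passing the expression filter would be collected first and adjusted together; an event is then kept if its FDR is below 0.01 and its fold change is at least 2.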
Detect APA using PAS-Seq and DRS
Read mapping and polyA site calling
The basic principles for detecting APA using PAS-Seq and DRS are similar
except for the mapping and polyA site calling steps. We used Bowtie [49] to map PAS-Seq reads
and Helisphere (http://open.helicosbio.com) to map DRS reads, requiring a mapping score
of at least 4.0. The Helisphere mapping score is specifically designed for
Helicos DRS data because of their high sequencing error and insertion/deletion rates.
Empirically, a mapping score of 4.0 yields a good tradeoff between mapping error and
the number of uniquely mapped reads. After mapping, only uniquely mapped reads
were retained for further analysis. According to the sequencing protocols, the 3’-most
position of a mapped PAS-Seq read, and the 5’-most position of a mapped DRS read taken
on the opposite strand, were defined as putative polyA cleavage sites.
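The strand logic for turning an alignment into a cleavage site can be sketched as follows. This assumes PAS-Seq reads lie on the transcript’s strand and end at the cleavage site, DRS reads lie on the opposite strand and begin at it, and alignments are reduced to 1-based inclusive leftmost/rightmost coordinates. The `Alignment` type and both functions are hypothetical and do not reflect the actual Bowtie or Helisphere output formats.

```python
from typing import NamedTuple, Tuple


class Alignment(NamedTuple):
    chrom: str
    strand: str  # genomic strand the read aligned to: '+' or '-'
    start: int   # leftmost aligned coordinate (1-based, inclusive)
    end: int     # rightmost aligned coordinate (1-based, inclusive)


Site = Tuple[str, str, int]  # (chrom, transcript strand, cleavage position)


def _flip(strand: str) -> str:
    return "-" if strand == "+" else "+"


def pas_seq_site(aln: Alignment) -> Site:
    """PAS-Seq: the read's 3'-most position, on the read's own strand."""
    pos = aln.end if aln.strand == "+" else aln.start
    return (aln.chrom, aln.strand, pos)


def drs_site(aln: Alignment) -> Site:
    """DRS: the read's 5'-most position, reported on the opposite strand."""
    pos = aln.start if aln.strand == "+" else aln.end
    return (aln.chrom, _flip(aln.strand), pos)


print(pas_seq_site(Alignment("chr1", "+", 100, 135)))  # → ('chr1', '+', 135)
print(drs_site(Alignment("chr2", "-", 500, 532)))      # → ('chr2', '+', 532)
```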
Filtering artificial polyA sites due to internal priming
One common side effect of oligo(dT) priming, used in both PAS-Seq and DRS,
is that reads primed from internal A-rich sequences, rather than from polyA tails, can also
be captured. There are two major categories of strategies for identifying internal-priming
reads: one is the “position-specific matrix” method and the other is the “fixed regular
expression pattern” method [32, 44]. We compared these two methods and found that the
“position-specific matrix” method failed to retain some well-annotated known polyA sites.
We therefore chose the “fixed regular expression pattern” method, removing polyA sites whose
downstream 20 nt sequences match any of the following three patterns: 1) at least 12
“A”s in total; 2) a substring of at least 8 consecutive “A”s; 3) a pattern like
“GAAAA+GAAA+G”, where ‘+’ means that the preceding character occurs one or more times.
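The three patterns translate directly into Python regular expressions. This sketch assumes the 20 nt genomic sequence downstream of each putative site has already been extracted (strand-corrected). It reads rule 1 as “12 or more A’s” and searches for patterns anywhere in the window; both are interpretations rather than details stated in the text, and the function name is hypothetical.

```python
import re

# Patterns from the text, applied to the 20 nt downstream of a putative site.
_RUN_OF_A = re.compile(r"A{8,}")           # at least 8 consecutive A's
_GA_PATTERN = re.compile(r"GAAAA+GAAA+G")  # '+' = one or more of the preceding A


def is_internal_priming(downstream_20nt: str, min_total_a: int = 12) -> bool:
    """True if the sequence just downstream of a putative polyA site is
    A-rich enough that oligo(dT) may have primed internally."""
    seq = downstream_20nt.upper()
    return (seq.count("A") >= min_total_a
            or _RUN_OF_A.search(seq) is not None
            or _GA_PATTERN.search(seq) is not None)


print(is_internal_priming("AAAAAAAAACCCCCCCCCCC"))  # → True (run of 9 A's)
print(is_internal_priming("ACGTACGTACGTACGTACGT"))  # → False
```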
Clustering heterogeneous polyA sites
During 3’ end processing, the exact cleavage sites regulated by the same set of
regulatory elements in the same transcript usually offset for several nucleotides. This
phenomenon is called the “heterogeneity of cleavage” [9, 50]. Therefore, another issue is
to get a consensus polyA coordinate for the ease of comparing polyA expression across
different samples. In this regard, we sequentially pooled all the sequencing data,
identified cleavage sites and iteratively clustered all individual cleavage sites within 40 nt
to its nearest cleavage site on the same chromosome strand. The weighted coordinate,
which was calculated as the sum of the product of the coordinate of an individual polyA
site and its percentage of usage in the whole cluster, was taken as the representative
coordinate of the corresponding polyA cluster. Finally, the sum of DRS/PAS-Seq reads
corresponding to each individual cleavage site of one polyA cluster is defined as the
abundance of clustered polyA (or equivalently polyA site).
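The clustering step can be sketched as single-linkage clustering along each chromosome strand: a site joins the current cluster if it lies within 40 nt of the previous site. The text does not specify the exact iteration order, so this is one plausible reading rather than the method itself. Single linkage can chain sites into clusters wider than 40 nt. The input format and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

SiteKey = Tuple[str, str, int]  # (chrom, strand, cleavage position)


def _summarize(chrom: str, strand: str, members: List[Tuple[int, int]]) -> dict:
    total = sum(n for _, n in members)
    # usage-weighted coordinate: sum of position * (count / cluster total)
    weighted = sum(pos * n for pos, n in members) / total
    return {"chrom": chrom, "strand": strand, "position": weighted,
            "abundance": total, "members": [pos for pos, _ in members]}


def cluster_polya_sites(site_counts: Dict[SiteKey, int],
                        max_gap: int = 40) -> List[dict]:
    """Single-linkage clustering of cleavage sites on each chromosome strand."""
    by_strand = defaultdict(list)
    for (chrom, strand, pos), n in site_counts.items():
        by_strand[(chrom, strand)].append((pos, n))
    clusters = []
    for (chrom, strand), sites in sorted(by_strand.items()):
        sites.sort()
        current = [sites[0]]
        for pos, n in sites[1:]:
            if pos - current[-1][0] <= max_gap:  # within 40 nt of nearest neighbour
                current.append((pos, n))
            else:
                clusters.append(_summarize(chrom, strand, current))
                current = [(pos, n)]
        clusters.append(_summarize(chrom, strand, current))
    return clusters


counts = {("chr1", "+", 1000): 30, ("chr1", "+", 1010): 10,
          ("chr1", "+", 1100): 5, ("chr1", "-", 1005): 7}
for c in cluster_polya_sites(counts):
    print(c["strand"], c["position"], c["abundance"])
# → + 1002.5 40
# → + 1100.0 5
# → - 1005.0 7
```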
Calling statistically significant APA (DRS/PAS-Seq)
One intuitive way of calling APA using PAS-Seq or DRS is to compare each
individual polyA to another polyA of the same gene. This method is called “One v.s.
One”, in which the Fisher exact test was conducted on all possible pairs of poly(A)s of
one gene in two different experiments to test whether there is a change in relative usage
of two poly(A)s, and the Benjamini-Hochberg method can be used to calculate false
21
discovery rate (FDR). The pairs of poly(A)s with FDRs less than 0.05 (or any given
cutoff) could be defined as statistically significant events.
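The “One vs. One” scheme enumerates every site pair of a gene and tests a 2x2 table for each pair. The sketch below uses a standard-library Fisher exact test and hypothetical function names. Benjamini-Hochberg adjustment would then be applied to the p-values pooled across all pairs of all genes; that step is omitted here.

```python
from itertools import combinations
from math import comb
from typing import List, Sequence, Tuple


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return min(1.0, sum(prob(x) for x in range(lo, hi + 1)
                        if prob(x) <= p_obs * (1 + 1e-7)))


def one_vs_one(counts_a: Sequence[int],
               counts_b: Sequence[int]) -> List[Tuple[int, int, float]]:
    """Test every pair (i, j) of polyA sites of one gene.

    counts_a / counts_b hold read counts per polyA site in conditions A and B
    (same site order); each pair becomes the table [[A_i, A_j], [B_i, B_j]].
    """
    return [(i, j, fisher_exact_two_sided(counts_a[i], counts_a[j],
                                          counts_b[i], counts_b[j]))
            for i, j in combinations(range(len(counts_a)), 2)]


pairs = one_vs_one([10, 60, 30], [50, 30, 20])
print(len(pairs))  # → 3
```

The number of tests grows quadratically with the number of sites per gene, which is one of the robustness problems discussed next.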
However, this “One vs. One” method is not robust enough because of possible
artificial polyA sites, the low abundance of individual polyA sites, and the large number of
pairwise comparisons that must be corrected in the FDR calculation. Therefore, an improved
method called “One vs. Others” was designed to address these issues.
The basic principle of “One vs. Others” is to test one polyA site at a time and
compare it to the sum of all the other polyA sites of the same gene (Table 2).
Specifically, to test whether there is a change in usage of any single polyA cluster of a
given gene, the Fisher exact test is conducted to compare the ratio of the read count of that
single polyA cluster to the sum of all the other polyA clusters between two biological
conditions. The p-values can also be adjusted by the Benjamini-Hochberg method. Finally,
polyA sites with FDRs less than a given cutoff value and a percentage change in DRS
count greater than another cutoff value can be defined as significantly changed polyA
sites.
Table 2
Example of a contingency table used in the One vs. Others method.

              One (polyA 1)   Others (polyA 2+3)   Total
Condition 1   10              90                   100
Condition 2   50              50                   100

Note: The example gene has three polyA sites (1, 2 and 3), and the numbers in the cells
represent the DRS/PAS-Seq read counts.
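The “One vs. Others” test can be sketched by collapsing every other site of the gene into a single column. Site 1 in the example reproduces Table 2. The text does not define “percentage change of DRS count” exactly, so the sketch assumes it means the change in a site’s relative usage, in percentage points. Function names are hypothetical, and FDR adjustment across all tested sites is left out.

```python
from math import comb
from typing import List, Sequence, Tuple


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return min(1.0, sum(prob(x) for x in range(lo, hi + 1)
                        if prob(x) <= p_obs * (1 + 1e-7)))


def one_vs_others(counts_a: Sequence[int],
                  counts_b: Sequence[int]) -> List[Tuple[int, float, float]]:
    """For each site k: (k, p-value, change in relative usage in points)."""
    total_a, total_b = sum(counts_a), sum(counts_b)
    results = []
    for k, (x_a, x_b) in enumerate(zip(counts_a, counts_b)):
        # row = condition; columns = this site vs. all other sites of the gene
        p = fisher_exact_two_sided(x_a, total_a - x_a, x_b, total_b - x_b)
        usage_change = 100.0 * (x_b / total_b - x_a / total_a)
        results.append((k, p, usage_change))
    return results


# Table 2: polyA 1 has 10 of 100 reads in condition 1 and 50 of 100 in condition 2.
for k, p, change in one_vs_others([10, 60, 30], [50, 30, 20]):
    print(f"polyA {k + 1}: p = {p:.2e}, usage change = {change:+.1f} points")
```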
Summary
In this chapter, we briefly reviewed popular techniques for detecting significant
APA events. These techniques have now matured into high-capacity, high-accuracy methods.
During the past several years, generic RNA-Seq, DRS and PAS-Seq have been
used successfully in a wide range of APA studies. Computationally, almost every
study differs slightly from the others in certain steps of the aforementioned pipelines.
The methods described in this chapter are the outcome of fine-tuning during our APA-related
projects. The remaining parts of the thesis demonstrate the effective
application of these pipelines to different biological and clinical problems.
CHAPTER III
THE POLY-C BINDING PROTEINS ACT AS
GLOBAL REGULATORS OF ALTERNATIVE
POLYADENYLATION
Introduction
Prior studies have identified a novel RNA-protein (RNP) complex that assembles
on the 3’ UTR of the human α-globin mRNA. This complex, initially identified based on
its ability to enhance the stability of hα-globin mRNA in the cytoplasm of erythroid cells
[51-55], comprises the KH-domain RNA binding protein CP (also known as
polyC-binding protein (PCBP) and hnRNP E; reviewed in [56]) bound to a repeated C-rich
motif within the 3’ UTR [57, 58]. Subsequent studies have revealed that this
CP/polyC RNP complex plays a role in stability control of multiple mRNAs, in both
erythroid and nonerythroid cells, and is likely to constitute a widely distributed
cytoplasmic determinant of gene regulation [58-61]. The sequences and structures of
these native C-rich elements parallel the single-stranded C-rich motifs
identified by in vitro SELEX as the optimal binding site for CP2 [62].
In addition to their stabilizing role in the cytoplasm, CP/polyC complexes also
function in the nucleus during transcript processing [63]. For example, CP has been
shown to bind initially to the nascent human α-globin transcript in the nucleus,
where it acts in vivo as a splicing regulator [63]. Our recent study indicated that CP
also enhances mRNA 3’ processing [7]. These studies demonstrate that CP bound to the
C-rich upstream sequence elements (USEs) enhances both steps of 3’ end processing, cleavage
and polyadenylation [7]. The ability of the CP complex to enhance 3’ end processing is
further supported by the in vivo interaction of CP with core components of the 3’ end
processing complex [7]. These observations support a model in which CP assembles
co-transcriptionally on the 3’ UTR, setting the stage for a coordinated set of nuclear and
cytoplasmic controls.
In the current study we extend these observations by exploring a wider role for
the CP/polyC complex in control of the mammalian transcriptome. The results
demonstrate that CPs, in conjunction with their cognate C-rich binding sites, control the
utilization of polyA processing sites in a defined subset of mRNAs. Thus the CP
mRNP complex has the capacity to play a pivotal role in determining the structure and
expression of specific transcripts via its impact on the 3’ processing pathway.
Results
Direct RNA 3’ sequencing of the transcriptome in cells
acutely depleted of CP
We have previously demonstrated that the RNA binding proteins CPs
markedly enhance 3’ processing of the hα-globin transcript via a sequence-specific
association of the CP proteins with a C-rich motif within the 3’ UTR [7]. These studies
led us to ask whether CPs play a global role in 3’ processing in erythroid cells. To
address this question, we assessed the impact of CP depletion on the K562
transcriptome. K562 cells are a human Tier I ENCODE cell line with hematopoietic
properties. We separately transfected K562 cells with two distinct siRNAs, each of
which co-targets the two major CP transcripts, CP1 and CP2 [56]. Parallel control
transfections were carried out with siRNAs against an unrelated protein (GLD-2).
Effective and specific co-depletion of CP1 and CP2 from the siRNA-treated cells was
demonstrated by mRNA and protein analyses at three days post-transfection (Figure 7).
Total RNA isolated from each set of siRNA-transfected cells was subjected to direct
RNA sequencing (DRS, Helicos BioSciences Corporation, Cambridge, MA). DRS
isolates individual tethered poly(A) RNAs for massively parallel sequencing of 3’
termini. This direct approach eliminates the need for cDNA intermediates,
amplification steps, or ligation reactions, any of which could introduce
bias into the final quantification of mRNA species [45, 64].
Figure 7 siRNA-mediated co-depletion of CP1 and CP2 from K562 cells. (A)
Experimental procedure. K562 cells were separately transfected with two
distinct siRNAs, each co-targeting CP1 and CP2 mRNAs (CP1-1 and
CP1-4). Parallel transfections were carried out with two distinct control
siRNAs targeting an unrelated protein (GLD-2 mRNA; CTRL-1 and CTRL-2). At 24 hours
post-transfection, cells were re-transfected with the same siRNAs,
cultured for an additional two days (3 days of culture in total), and assessed for
effective siRNA-mediated knockdown by protein and RNA analyses. RNA
isolated from each culture was subjected to DRS analysis for mapping and
quantitation of 3’ processing sites. (B) Assessment of CP depletion by real-time
RT-PCR. Levels of mRNAs encoding the two CP isoforms, CP1 and
CP2, are displayed. The values on the Y-axis represent the CP mRNA level
normalized to the level of GAPDH mRNA in the respective sample. The
CP:GAPDH ratio for CTRL-1 is defined as 1.0. The standard deviation for each
sample is shown (n = 2). (C) Assessment of CP depletion by Western blot.
Affinity-purified antibodies specific for either CP1 or CP2 [57] were used
for detection in the first and second panels. Detection of the large ribosomal
subunit protein L7a [51] controlled for sample loading (bottom panel). (Data
courtesy of Dr. Xinjun Ji, University of Pennsylvania.)
Total cellular RNAs from cells individually treated with each of the two CP
siRNAs and with each of the two control siRNAs were sequenced on four separate
channels. The sequenced DRS reads had a mean read length of 32 nt (range 24 nt to 70 nt;
Figure 8).
Figure 8 Histogram of DRS read lengths of all four samples (pooled).
Three of the channels generated 16-18.5 million reads, while the yield of the fourth was
somewhat lower (9.9 million) (Table 3). The raw reads were mapped back to the hg19 genome
assembly and were filtered for internal priming to generate a final data set of positions
and numbers of polyA termini (Table 3). Approximately one third (28% - 35%) of the
sequenced reads were retained for polyA site quantification. Between 55.7% and 61.4% of the
retained DRS reads were within 40 nt of the ends of UCSC and Ensembl genes or of polyA sites
in PolyA-DB2 [46]. The DRS data were highly reproducible, with Pearson correlation
coefficients higher than 0.92 and 0.94 for the two control siRNA samples and the two CP
siRNA samples, respectively (Figure 9). Based on this level of reproducibility, we
pooled the control siRNA data and the CP siRNA data, respectively, for the subsequent
computational analysis.
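A replicate comparison in the spirit of Figure 9 can be sketched as follows: scale both samples down to the smaller library size, take log2, and compute the Pearson coefficient. The text does not say whether the reported coefficients were computed on raw or log-scaled counts, so the log2 transform and the pseudocount of 1 are assumptions. The function names are hypothetical. Pearson correlation is written out by hand (Python 3.10+ also offers `statistics.correlation`).

```python
import math
from typing import List, Sequence, Tuple


def normalize_to_min_depth(counts_a: Sequence[float],
                           counts_b: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Scale both samples so each sums to the smaller of the two totals."""
    total_a, total_b = sum(counts_a), sum(counts_b)
    target = min(total_a, total_b)
    return ([c * target / total_a for c in counts_a],
            [c * target / total_b for c in counts_b])


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def replicate_correlation(counts_a: Sequence[float], counts_b: Sequence[float],
                          pseudocount: float = 1.0) -> float:
    """Pearson correlation of log2(normalized count + pseudocount) per polyA site."""
    norm_a, norm_b = normalize_to_min_depth(counts_a, counts_b)
    log_a = [math.log2(c + pseudocount) for c in norm_a]
    log_b = [math.log2(c + pseudocount) for c in norm_b]
    return pearson(log_a, log_b)


print(replicate_correlation([10, 100, 1000, 5], [22, 190, 2100, 9]))
```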
Figure 9 Reproducibility of DRS polyA reads. DRS read counts are normalized to
the smaller of the two samples’ total non-internal priming read counts and
plotted on a log2 scale.
Identification of mRNAs impacted by CP depletion
The DRS data were evaluated for the impact of CP depletion on overall gene
expression levels and on the relative abundances of competing polyA sites (APA). The
steady state expression from each locus was determined by summing the total number of
poly(A) site reads overlapping Ensembl genes. This sum is referred to as the Digital
Gene Expression (DGE) value. We applied DEGseq to identify differentially expressed
genes [65]. Using a False Discovery Rate (FDR) of less than 0.05 and a minimal
normalized fold change of 1.5, the data revealed that acute depletion of CPs
significantly altered the expression of 586 genes; 231 were increased and 355 were
decreased relative to cells transfected with either of the two control siRNAs (Table 4).
Increasing the cutoff to a 2-fold change in transcript abundance revealed a significant
impact on the expression of 117 genes; 42 were increased and 75 were decreased relative
to the two controls (Table 4). A heat map profiling the comparative DGE values for
the 117 most significantly impacted genes (>2-fold change) revealed excellent
concordance between the analyses of RNAs isolated from cells treated with the two
distinct CP siRNAs and between those treated with the two distinct control siRNAs (Figure 10).
Table 3 Summary of DRS data of CP knockdown experiments.

RNA samples                                    Control siRNA-1   Control siRNA-2   CP siRNA-1   CP siRNA-2
Sequenced reads                                9.9 M             18.5 M            17.4 M       16.4 M
Uniquely mapped reads                          3,056,826         7,069,003         6,399,451    5,863,787
Non-internal priming reads                     2,811,092         6,557,478         5,919,346    5,420,574
Percentage of non-internal priming reads       28.4%             35.5%             34.0%        33.0%
Non-internal priming reads overlapping
  known annotation                             1,564,685         4,027,412         3,480,473    3,185,739
Percentage of known polyA sites                55.7%             61.4%             58.8%        58.8%
Table 4
Number of differentially expressed genes impacted by αCP depletion

Cutoff                          Up-regulated   Down-regulated   Total
2-fold change, FDR < 0.05       42             75               117
1.5-fold change, FDR < 0.05     231            355              586
Gene Ontology (GO) analysis revealed that the 586 genes with a 1.5-fold or
greater change in expression subsequent to CP depletion were enriched in genes related
to amino acid metabolism, amino acid biosynthesis, oxidation-reduction reactions,
cholesterol metabolism, lyase reactions, and immunity and defense (Figure 11 and
Table 5). The impact of CP depletion on these gene categories is consistent with a role
in the modulation of pathways controlling basic metabolism and cell stress responses
[66].
A subset of 21 transcripts with a more than 1.5-fold change in the DGE analysis
was subjected to verification by real-time PCR. Each amplimer set corresponded to an
internal region of the target mRNA so as to detect all mRNA isoforms, irrespective of
their 3’ end processing patterns. These analyses, carried out on the same RNA samples
that were assessed in the original DRS study, confirmed the DRS data (an increase or
decrease of more than 1.5-fold in steady state mRNA representation) in 15 of 21 genes
(Figure 12 and Table 6).
Motif analysis reveals C-rich determinants in the 3’
UTRs of mRNAs impacted by CP depletion
We applied MEME software to infer motifs in the differentially expressed genes
(DEG) associated with CP knockdown [67]. The search (MEME) was initiated on the
full set of transcripts that underwent >1.5 fold change in expression subsequent to CP
depletion. This MEME analysis was limited to the 200 nt segment immediately 5’ to the
functional polyA cleavage site. By setting a rigorous p-value cutoff of 1.0E-10, we found
3 motifs significantly enriched in the 200 nt segments upstream of the major polyA sites
of the significantly changed genes. As expected, the most strongly conserved element
was the canonical polyA signal, AAUAAA, and its variants, peaking at 15-20 nts 5’ to
the polyA site. These data corroborate the quality of DRS in recovering functional polyA
sites. As expected, this polyA signal was observed in the mRNAs irrespective of whether
or not they were impacted by the CP depletion. Both of the next two most prominent
motifs contained several prominent C’s (Figure 13A). An RNA-Map and Wilcoxon rank
sum test were employed to identify the positioning of motifs relative to the respectively
utilized polyA sites. The two C-rich motifs were significantly enriched in the CPimpacted transcripts at three positions relative to the polyA site (-150nt, -100nt and -50nt;
FDR < 0.05). The MEME analysis was next separately applied to 355 and 231
transcripts that were either down-regulated or up-regulated, respectively, in response to
the acute CP depletion (Figure 13B and C). The analysis of the down-regulated genes
revealed C-rich motifs 5’ to sequences of polyA sites in approximately 80% of these
transcripts (185 and 105 occurences for the two C-rich motifs, respectively) (Figure
13B). These C-rich motifs in the down-regulated genes were enriched at three peak
locations relative to the polyA site. In contrast, only 38 out of 231 (16%) of the upregulated genes harbor a C-rich motif 5’ to the respective polyA sites (Figure 13C) that
were located at a mean distance of 125nt 5’ to the polyA sites (Figure 13C). In summary,
the DGE analysis points to a significant impact of CP on the overall expression level of
31
Figure 10 Heatmap of differentially expressed genes after CP knockdown. Direct
RNA sequencing (DRS) analyses were carried out on polyA RNAs isolated
from cell cultures treated with the CP siRNAs or the control siRNAs (as in
Figure 7). The heat map represents all 117 mRNA species that showed a >2
fold change in expression (increased or decreased) subsequent to the CP
depletion. The color gradient (log scale) for the heat map represents the
change in the overall representation of each mRNA normalized to the
corresponding level in the RNA isolated from the cells treated in parallel with
the control siRNAs. The positions of the direct siRNA targets, CP1 and
CP2 mRNAs, are indicated by the arrows to the left of the heat map.
a defined subset of mRNAs. The markedly greater number of mRNAs that were down-regulated following CP depletion was consistent with an overall enhancing action of
this CP complex on steady state mRNA levels. In addition, the motif analysis revealed a clear
enrichment for C-rich motifs in the mRNAs negatively affected by CP depletion.
These data thus supported a direct role for polyC-binding proteins in one
or more post-transcriptional control pathways that affect steady state mRNA
representation.
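The positional profiles (RNA maps) behind these peak calls can be illustrated with a minimal Python sketch. It assumes that motif occurrences have already been located in each 200-nt segment upstream of a polyA site and are given as half-open (start, end) intervals, and it applies the 31-nt window rule stated in Materials and Methods. The function names are hypothetical, and this is not the pipeline used for the analyses reported here.

```python
from typing import List, Sequence, Tuple

REGION = 200      # nt analysed upstream of each polyA site
HALF_WINDOW = 15  # 15 nt on each side of a position -> 31-nt window


def coverage_mask(hits: Sequence[Tuple[int, int]], length: int = REGION) -> List[int]:
    """0/1 mask of nucleotides covered by motif hits.

    hits are half-open (start, end) intervals in segment coordinates, where
    index 0 lies `length` nt upstream of the polyA site and index length-1 is
    the nucleotide immediately 5' of it.
    """
    mask = [0] * length
    for start, end in hits:
        for i in range(max(start, 0), min(end, length)):
            mask[i] = 1
    return mask


def window_scores(mask: Sequence[int], half: int = HALF_WINDOW) -> List[float]:
    """Percentage of motif-covered nucleotides in the window centred on each
    position (the window is truncated at the ends of the segment)."""
    n = len(mask)
    scores = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        scores.append(100.0 * sum(mask[lo:hi]) / (hi - lo))
    return scores


def rna_map(masks: Sequence[Sequence[int]]) -> List[float]:
    """Average the per-segment window scores position by position."""
    per_segment = [window_scores(m) for m in masks]
    return [sum(column) / len(column) for column in zip(*per_segment)]


# Hypothetical example: one segment with a 10-nt motif hit covering positions
# -50 to -41 relative to the polyA site, and one segment without any hit.
hit = coverage_mask([(150, 160)])
profile = rna_map([hit, coverage_mask([])])
print(round(profile[155], 2))  # index 155 = 45 nt upstream of the polyA site
```

The per-segment window scores (rather than only their average) are what a position-by-position rank sum test between significant and background sets would compare.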
Figure 11 GO analysis of mRNAs altered in overall expression (DGE levels) by CP
depletion. mRNAs that underwent an alteration in steady state levels (1.5-fold or
greater change) subsequent to CP depletion were subjected to GO analysis (DAVID
algorithm). The data were assessed with Fisher's exact test with FDR adjustment.
Asterisks indicate the level of significance of the effect: * 0.01 < FDR < 0.05;
** 0.001 < FDR < 0.01; *** FDR < 0.001.
CP impacts on patterns of alternative polyA selection
The CP/polyC complex within the human -globin 3’UTR enhances 3’ cleavage
and polyadenylation [7]. Based on these studies, we proposed that C-rich motifs might
act as USEs in a subset of cellular transcripts. This activity could alter overall production
of mature mRNAs by enhancing the use of a unique polyA site and/or have its impact via
modulation of alternative polyA utilization (APA). The preceding DGE analysis is
consistent with a positive impact of the 3’UTR CP/polyC complex on steady state levels
Table 5
Gene Ontology analysis of differentially expressed genes (DEGs) impacted by αCP knockdown
GO term                   Count   Fold enrichment   FDR
Amino acid metabolism     24      3.377             1.19E-06
Immunity and defense      51      1.871             8.57E-05
Lyase                     14      2.864             0.008
Cholesterol metabolism    10      3.732             0.008
Oxidoreductase            33      1.668             0.039
of a subset of mRNAs. To assess the impact of this complex on APA, we screened the
DRS dataset for shifts in polyA site utilization. For each individual polyA site, we
applied Fisher's exact test to compare its DRS count with the sum of DRS counts of all the
other polyA sites within the same gene between the two cell conditions (cells transfected with
CP siRNAs and with control siRNAs). This comparison revealed a total of 357
significant changes in polyA site utilization (198 down-regulated polyA sites and 159 up-regulated polyA sites) subsequent to CP depletion, corresponding to a total of 264 gene
transcripts (FDR < 0.05) (Table 7). Of these APA events, 102 occurred between
competing alternative polyA sites within the Same terminal Exon (‘SE-APA’). This SE-APA subset of APA events should be particularly informative for identifying
3’ UTR motifs functional in APA, as these events should be independent of alterations in
transcript splicing. Another 122 APA events were linked to alterations in splicing
patterns and occurred in Different terminal Exons (‘DE-APA’). The remaining 133 APA
events could not be simply assigned to either the SE-APA or the DE-APA category and are
Figure 12 Confirmation of differentially expressed genes impacted by CPs by
targeted real-time RT-PCR analysis. Shown are real-time analyses of three
mRNAs that increased and three mRNAs that decreased in overall abundance
subsequent to CP depletion (as shown in Table 6). These studies were
carried out on the same RNA preparations as were used in the original DRS
analysis. To further validate these results, the analyses of mRNA levels in
K562 cells were additionally carried out with cells treated with a third distinct
control siRNA to an unrelated mRNA (CTRL-3; Cyclophilin siRNA). All
values shown were normalized to the corresponding levels of GAPDH
mRNA. The data are represented as ratios, with the ratio for the CTRL-3 siRNA
sample defined as 1.0. Standard deviation for each sample is shown (n=3).
(data courtesy of Dr. Xinjun Ji, University of Pennsylvania)
termed “ambiguous-APA” events. The pathways controlling these last two sets of APA
events are likely to be more complex and difficult to directly attribute to defined 3’UTR
motifs.
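The per-site comparison described above amounts to one 2x2 table per polyA site: reads at the site versus reads at all other sites of the same gene, in control versus CP siRNA samples. The sketch below is a minimal standard-library Python version that assumes pooled per-site read counts are already available for each gene; it computes a two-sided Fisher's exact p-value for each site. `compare_site_usage` and the example counts are hypothetical, and multiple-testing correction (Benjamini-Hochberg) would be applied afterwards across all tested sites.

```python
from math import comb
from typing import Dict


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given the fixed margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    probs = (prob(x) for x in range(lo, hi + 1))
    # Small relative tolerance so tables as likely as the observed one count.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


def compare_site_usage(ctrl: Dict[str, int], kd: Dict[str, int]) -> Dict[str, float]:
    """Per-site test for one gene: site count vs. summed count of the gene's
    other sites, in control vs. CP-knockdown samples."""
    total_ctrl, total_kd = sum(ctrl.values()), sum(kd.values())
    pvalues = {}
    for site in sorted(set(ctrl) | set(kd)):
        a, c = ctrl.get(site, 0), kd.get(site, 0)
        pvalues[site] = fisher_exact_two_sided(a, total_ctrl - a, c, total_kd - c)
    return pvalues


# Hypothetical gene with a proximal and a distal polyA site.
print(compare_site_usage({"proximal": 80, "distal": 20}, {"proximal": 40, "distal": 60}))
```

For a gene with only two sites the two tables are mirror images, so both sites receive the same p-value; the direction of the shift is read from the counts.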
Motif analysis of APA events
We searched for sequence motifs that could establish direct mechanistic link(s)
between CP depletion and APA events. As in the DGE analysis, we examined
the 200 nt regions upstream of polyA sites that underwent significant alteration in
Table 6
List of validated genes having significant changes in overall expression
after CP depletion.
Gene symbol   Fold-change (KD/Control by DRS)   P-value     FDR
HBZ           3.43                              0           0
CP1           0.13                              0           0
CP2           0.38                              0           0
DDIT4         6.24                              6.71E-287   1.96E-283
ACADVL        0.21                              6.43E-78    2.77E-75
PHGDH         4.37                              1.39E-62    4.86E-60
ACSM3         0.45                              1.47E-58    4.20E-56
PRG2          0.26                              3.35E-42    5.33E-40
ALAS2         4.13                              2.27E-28    2.28E-26
CFLAR         0.36                              4.25E-27    4.07E-25
SLC7A11       3                                 4.36E-27    4.15E-25
CHAC1         8.21                              3.08E-26    2.77E-24
COMTD1        0.42                              7.89E-19    4.51E-17
SCO2          0.46                              8.19E-13    2.77E-11
FAHD2B        0.31                              9.77E-12    2.99E-10
utilization for the structure and positioning of enriched motifs. A set of unchanged
polyA sites (FDR > 0.8), with a DRS count distribution similar to that of the group with
significantly changed polyA sites, was randomly selected to serve as a background set for the analysis.
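Abundance matching of this kind can be sketched by binning sites on read count and drawing background sites bin by bin, here at the 10:1 background-to-foreground ratio given in Materials and Methods. The bin borders used in the thesis are not recoverable from this text, so the log2 bins below are an assumption, as are the function names and the fixed random seed.

```python
import random
from collections import defaultdict
from typing import Dict, List


def abundance_bin(count: int) -> int:
    """Hypothetical log2 binning: bin k holds read counts in [2**k, 2**(k + 1)).
    (The thesis' exact bin borders are not reproduced here.)"""
    return count.bit_length() - 1  # exact floor(log2(count)) for count >= 1


def sample_background(significant: Dict[str, int], unchanged: Dict[str, int],
                      fold: int = 10, seed: int = 0) -> List[str]:
    """Draw `fold` unchanged polyA sites per significant site from the same
    read-count bin, so both sets have similar abundance distributions.

    `unchanged` should already be filtered by the chosen FDR cut-off.
    """
    rng = random.Random(seed)
    needed: Dict[int, int] = defaultdict(int)
    for count in significant.values():
        needed[abundance_bin(count)] += fold
    pool: Dict[int, List[str]] = defaultdict(list)
    for site, count in sorted(unchanged.items()):
        pool[abundance_bin(count)].append(site)
    background: List[str] = []
    for b in sorted(needed):
        candidates = pool.get(b, [])
        # Sample without replacement; a sparse bin contributes all it has.
        background.extend(rng.sample(candidates, min(needed[b], len(candidates))))
    return background
```

Matching on abundance matters because highly expressed sites are both easier to call as significant and may differ in sequence context, which would otherwise confound the motif comparison.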
Figure 13 Motif analysis within the 3’UTRs of mRNAs impacted by CP
depletion. A. MEME analyses of the sequences 5’ to the dominant polyA sites
of all mRNAs that underwent a 1.5-fold or greater change (up or down) in
their representation subsequent to CP depletion (‘DEG’ mRNAs). The RNA map encompassed the 200 nt segments immediately 5’ to the sites of polyA
addition. The top 3 motifs detected by MEME are shown. For each motif,
the E-value and the number of mRNAs containing the motif among the
total number of mRNAs studied are listed. The distance distributions
(polyA cleavage site defined as base 0) are shown below each motif
(X-axis). The Y-axis indicates the percentage of nucleotides at each indicated
site. An asterisk denotes a significant peak detected by the Wilcoxon rank
sum test (FDR < 0.05). The P-value measures the significance of a motif, and the
ratio measures the fraction of mRNAs harboring the corresponding motif in the
whole set of mRNAs. B. Summary of MEME analyses of all mRNAs down-regulated by greater than 1.5-fold subsequent to CP depletion, displayed as
in (A). C. Summary of MEME analysis of all mRNAs up-regulated by greater
than 1.5-fold subsequent to CP depletion, displayed as in (A).
Table 7
Number of APA events impacted by αCP depletion.
                 SE-APA   DE-APA   Ambiguous-APA   Total
Up-regulated     44       58       57              159
Down-regulated   58       64       76              198
Total            102      122      133             357
The initial analysis was carried out on the entire set of 198 polyA sites that were
down-regulated upon CP depletion. As expected, the canonical polyA signal
AATAAA and its variants were consistently identified approximately 15-20 bp 5’ to
most of the utilized polyA sites (168/198 mRNAs) and were equally represented in CP-impacted APA events and in the control group (Figure 14A). A motif markedly enriched
for C’s was identified in 56 of the 198 APA sites. This motif was pyrimidine-pure, with
C as the predominant base at 9 of the 10 positions, and it was not observed in the
control group. This C-rich motif was highly represented 5’ of the polyA sites that were
down-regulated subsequent to CP depletion, and its peak was
located 35-45bp 5’ of the site of polyA addition.
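The motif description above (pyrimidine-only, with C at 9 of 10 positions) can be turned into a simple exact-match scan that reports where such elements sit relative to the cleavage site. This is a simplified stand-in for the MEME position weight matrix, assuming the input is the upstream segment ending immediately 5’ of the polyA site; `find_c_rich` is a hypothetical helper.

```python
from typing import List, Tuple


def find_c_rich(upstream: str, width: int = 10, min_c: int = 9) -> List[Tuple[int, str]]:
    """Find pyrimidine-only windows with at least `min_c` Cs.

    `upstream` is the sequence immediately 5' of a polyA site, so its last base
    is position -1 (the cleavage site is base 0). Each hit is reported as
    (offset of the window's 5' end relative to the polyA site, window sequence).
    """
    seq = upstream.upper().replace("T", "U")  # accept DNA or RNA input
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if set(window) <= {"C", "U"} and window.count("C") >= min_c:
            hits.append((i - len(seq), window))
    return hits


# Hypothetical 200-nt segment with one C-rich element 40 nt upstream.
segment = "A" * 160 + "CCCCTCCCCC" + "A" * 30
print(find_c_rich(segment))  # [(-40, 'CCCCUCCCCC')]
```

A weight-matrix scan would also accept near-matches that this strict rule rejects, so counts from such a sketch would be expected to be lower than MEME-based counts.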
The complementary analysis of the set of polyA sites that were up-regulated
following CP depletion revealed a complex motif in 41 of 159 polyA sites. This motif
contained central purines, lacked a significant polyC tract, and lacked a specific or
predominant localization relative to the affected polyA site (Figure 14B).
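Whether a motif shows a defined positional peak is assessed in these analyses with a Wilcoxon rank sum test at each position of the RNA map. A minimal standard-library sketch of that test (normal approximation with tie correction, no continuity correction) is shown below; it assumes per-segment window scores have already been computed for the significant and background sets, and the function names are hypothetical. Library implementations may differ in their choice of exact versus asymptotic p-values and in continuity correction.

```python
from statistics import NormalDist
from typing import List, Sequence


def rank_sum_pvalue(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation
    with tie correction and no continuity correction."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        return 1.0
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks: List[float] = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank of the tie group
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        return 1.0  # all values tied: no evidence of a shift
    z = (u1 - n1 * n2 / 2) / var ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


def positional_pvalues(sig: Sequence[Sequence[float]],
                       bg: Sequence[Sequence[float]]) -> List[float]:
    """Per-position p-values comparing window scores of significant vs.
    background segments (each inner sequence is one segment's profile)."""
    return [rank_sum_pvalue(a, b) for a, b in zip(zip(*sig), zip(*bg))]
```

The per-position p-values would then be adjusted with the Benjamini-Hochberg procedure before calling peaks at FDR < 0.05.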
To further explore the basis for the APA events, we restricted the motif search to the
102 competing APA events that occurred within the same terminal exon (SE-APA
events). This was done to eliminate the complicating influence of co-existing alterations in
splicing events (Figure 14C). To more directly link the C-rich motifs with the proposed
USE function, we configured the discriminative MEME motif approach to directly
compare the sequence environment of the 58 down-regulated SE-APA sites (positive set) with that of the 44
up-regulated SE-APA sites (negative set). In this manner, the analysis was specifically
configured to identify motifs associated with the down-regulated polyA sites that were
underrepresented in the environment of the up-regulated polyA sites. The top-ranking
motif in this discriminative analysis was a pyrimidine-pure, C-rich motif (Figure 14C).
This motif was present in 34 of the 58 down-regulated polyA sites and was
positioned approximately 50bp 5’ to the down-regulated polyA processing site. When
this same motif search was extended to the DE-APA events (122 polyA sites) (Figure
14D), we again identified a C-rich motif (41/122; 21 enhanced and 20 repressed
APA events), although in this case the positioning was less focused, with a mean
distance of 80 bp upstream of the polyA site. These studies thus reveal a strong
correlation between the repression of polyA site utilization subsequent to CP depletion
and the presence of a pyrimidine-pure, C-rich motif in close proximity to the site of
3’ processing.
CP2 controls the 3’ processing of its own transcript
An unexpected observation from the APA analyses was that CPs appear to
autoregulate polyA site selection on the CP2 transcript. The co-depletion of the two
major CP transcripts activated a set of two adjacent cryptic polyA sites within the last
intron (intron 13) of the CP2 RNA (Figure 15A, sites within the dotted oval in the genome
browser diagram). Both of these polyA sites are located immediately 3’ to a cryptic
polyA signal, AATAAA (Figure 15B). The use of these two sites was linked to the
activation of a cryptic splice acceptor site upstream of these polyA sites, thus generating
an mRNA with a unique 3’ terminal exon. Targeted RT-PCR analysis and 3’ RACE both
confirmed the positioning of the novel 3’ processing sites within intron 13 and the
generation of exon 13a (Figure 15B and data not shown). The generation of the novel
‘exon 13a’ subsequent to CP depletion was accompanied by a decrease in the use of the
polyA site in exon 14. This reciprocal relationship was validated by targeted real-time
RT/PCR (Figure 15C). The presence of a C-rich sequence approximately 40 nt upstream
of the splicing acceptor site and overlapping the likely lariat branch site for this new
intronic exon (13a) may play a role in this alternative processing event. The absence of a
C-rich motif near these new polyA sites and exon 14 polyA site supports the likelihood
that the control is mediated by direct effect on alternative splicing, rather than a direct
effect on polyA utilization [63]. A primary splicing mechanism is further supported by
the interaction between this upstream C-rich element and CP proteins, as evidenced by
the RNA EMSA and UV-crosslinking assay (Figure 15D). Thus, under normal
conditions, the usage of the branch site encompassed by the C-rich motif may be
repressed by CPs. When CP levels or activity are depleted from the cell, a new set of
mRNA isoforms is generated.
PA pattern changes impacted by CPs
The preceding analysis of DRS data identified enrichment for C-rich motifs 5’ to
polyA sites that are down-regulated subsequent to CP depletion. These data were next
confirmed by a set of targeted real-time RT/PCR analyses. Direct confirmation of APA
was first carried out on two examples of DE-APA. The real-time RT-PCR analysis
confirmed the DE-APA events for the Ssu72 and NPM1 genes following CP depletion
(Figure 16A-B). Of note, both of these genes have themselves been directly or indirectly
implicated in mRNA 3’ end processing regulation [68] [69] (also see Discussion).
Six examples of mRNAs with an SE-APA pattern were next assessed by the same
strategy (Figure 16C-H). In four of these transcripts a C-rich motif preceded the
proximal site of the competing polyA sites, and in the remaining two transcripts it
preceded the distal site. In both situations, the polyA
sites located directly 3’ to the C-rich motif were repressed subsequent to CP depletion.
UV-crosslinking assays (and RNA EMSAs for two of them) demonstrated that each of
these C-rich motifs binds CPs (Figure 17). Taken together, these studies demonstrate
that CP proteins, through their interaction with C-rich RNA elements, affect alternative
polyA site choice, as summarized in the model in Figure 18.
The sequences encompassing and 5’ to each APA site are shown, with the C-rich
sequence highlighted. The real-time RT-PCR assays quantify total polyA site usage (use
of both proximal and distal polyA sites) and distal polyA site usage. The
histogram indicates the ratio of the long 3’UTR isoform (i.e., use of the distal polyA site)
relative to total polyA site usage (as in [5]). These studies were done on the same RNA
samples used for the original DRS studies. The real-time RT-PCR quantifications were
normalized to GAPDH mRNA and presented as a ratio versus the CTRL-3 (cyclophilin
siRNA) sample, defined as 1. Standard deviation for each sample is shown (n=3).
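The histogram quantity described above can be computed from qPCR threshold cycles as sketched here. The sketch assumes the common 2^-ΔCt model with roughly 100% amplification efficiency, which the text does not state; GAPDH normalization cancels in the distal/total ratio and is kept only to mirror the description. Sample names such as `CP-KD` and all Ct values are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple


def level(ct_target: float, ct_gapdh: float) -> float:
    """GAPDH-normalised level, assuming ~100% PCR efficiency (2 ** -dCt)."""
    return 2.0 ** -(ct_target - ct_gapdh)


def distal_ratio(ct_distal: float, ct_total: float, ct_gapdh: float) -> float:
    """Distal (long 3'UTR) signal relative to total polyA site usage."""
    return level(ct_distal, ct_gapdh) / level(ct_total, ct_gapdh)


def summarize(replicates: Dict[str, Sequence[Tuple[float, float, float]]],
              control: str = "CTRL-3") -> Dict[str, Tuple[float, float]]:
    """Mean and SD of distal/total ratios per sample, scaled so that the
    control sample's mean ratio equals 1.0. Each replicate is a tuple of
    (Ct distal, Ct total, Ct GAPDH)."""
    ratios: Dict[str, List[float]] = {
        name: [distal_ratio(*rep) for rep in reps] for name, reps in replicates.items()
    }
    ref = mean(ratios[control])
    return {name: (mean(r) / ref, stdev(r) / ref) for name, r in ratios.items()}


# Hypothetical triplicate Ct values (distal, total, GAPDH) for two samples.
print(summarize({
    "CTRL-3": [(25.0, 24.0, 18.0), (25.1, 24.0, 18.1), (24.9, 24.1, 18.0)],
    "CP-KD": [(26.0, 24.0, 18.0), (26.2, 24.1, 18.0), (25.9, 24.0, 18.1)],
}))
```

If primer efficiencies were measured, the base 2.0 would be replaced by each assay's efficiency; that detail is not given here.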
Discussion
We previously demonstrated that CPs enhance 3’ processing of the h-globin transcript
via binding to a C-rich motif in the 3’UTR. These findings led us to conclude that the
polyC motif in the h-globin 3’UTR acted as an USE enhancer of h-globin transcript 3’
processing [7]. The current findings support and extend role of CPs in the control of 3’
end processing by documenting their broad impact on steady state levels and polyA site
utilization of mRNAs within the human transcriptome. These data specifically identify a
subset of mRNA transcripts in which the enhancement of processing is tightly linked to
the presence of the cognate C-rich binding sites in close proximity to a polyA signal.
A global relationship of 3’ processing to gene regulation has been highlighted by
a number of recent studies [14-16]. These processing pathways are complex in their
biochemistry and reflect the input of multiple proteins and protein complexes [70]. These
factors affect 3’ processing via direct as well as indirect interactions with target transcripts.
Figure 14 Motif analysis of transcripts undergoing APA in response to CP
depletion. (A). Motif analyses 5’ to polyA sites that are involved in APA
(SE-APA, DE-APA and ambiguous-APA categories combined; 198 polyA sites) and are
repressed in their representation by CP depletion. The distance distribution
plot is shown below each corresponding motif. The Y-axis indicates the
percentage of each nucleotide at each indicated site at the indicated distance to the
polyA cleavage site (defined as base 0). The analyses from the cells
treated with the control or the CP siRNAs are directly compared in each
setting. Asterisks indicate positions with FDR (Benjamini-Hochberg algorithm)
less than 0.05 by Wilcoxon rank sum test. (B). Motif analyses 5’ to polyA sites
that are involved in APA (SE-APA, DE-APA and ambiguous-APA categories combined; 159
polyA sites) and are enhanced in their representation by CP depletion
(figure organized as in A). (C). Motif analysis 5’ to polyA sites that are
involved specifically in APA between sites in the same terminal exon (SE-APA category). A discriminative motif discovery analysis by MEME was
executed to specifically identify motifs over-represented in the down-regulated SE-APA (58 polyAs) and under-represented in the up-regulated SE-APA (44 polyAs). The repressed SE-APA was defined in this comparison as
the positive set and the enhanced SE-APA was defined as the
negative/background set. The position-specific prior probabilities were first
estimated from the background set. Next, a normal motif search was done in the down-regulated SE-APA based on the position-specific prior probabilities. Note that
position 7 can be any of the four nucleotides. (D). Motifs around DE-APA
polyA sites (122 polyAs). The analysis was carried out as in A., above.
Alterations in levels of general factors and complexes involved in 3’ processing,
such as CPSF, CSTF, and the nuclear polyA binding protein PABPN1 can impact on
polyA site selection and the efficiency of polyA addition [71]. In addition, particular
RNA binding proteins have the capacity to impact on the 3’ processing of specific
transcripts or groups of transcripts. For example, the KH-domain RNA-binding protein Nova2,
a protein closely related to CP, exerts control over polyA site choice in a position-dependent
manner [5]. The polypyrimidine tract-binding protein (PTB) has been implicated in
the enhancement of 3’ end processing of several genes. This enhancement appears to be
mediated by stimulating hnRNP H binding to G-rich binding sites [6]. This pathway
appears to have a global role in alternative polyA site selection [27].
Likewise, recent genome-wide surveys have revealed that alterations in the levels
of the epithelial splicing regulatory proteins (ESRPs) can trigger widespread shifts
in polyadenylation patterns [72]. In the current study we demonstrate that CP proteins
are actively involved in the determination of mRNA expression levels and alternative
polyadenylation. We observe that CP depletion from the cell represses the steady state
levels of substantially more mRNAs than it increases. This result is consistent with the
known enhancing action of the CP complex on steady state mRNA levels [51, 54, 61].
The impact of CPs on nuclear functions, in particular on splicing and polyA activity,
is likely also to play a role in how much of the mRNA is generated and exported to the
cytoplasm. Importantly, these control pathways are likely to be interrelated
mechanistically. We have demonstrated in the case of hα-globin gene expression that
the CP complex assembles on the nascent hα-globin transcript in the nucleus and
appears to travel with the mRNA to the cytoplasm, where it stabilizes the mRNA. Thus
the nuclear and cytoplasmic pathways are linked and may coordinate overall levels of
gene expression and protein production. Future studies will determine whether CPs
regulate mRNA steady-state levels through other mechanisms; for example, APA may affect
mRNA steady state levels by including or excluding miRNA target sites or binding sites for RNA-binding proteins (such as CPs) in the final mRNA products.
These data directly demonstrate that alterations in CP protein availability can regulate
polyA site utilization (APA) choices in a subset of RNAP II transcripts through interactions
with C-rich sequences. This involvement of CP in the global control of 3’ processing is
supported by a recent general screen for proteins involved in 3’ processing [71]. We
observe that the formation of a CP RNP complex near a polyA site (either proximal or
distal) enhances use of the corresponding polyA site ([7] and current data). Following
CP depletion, the AES, GET4, CDK16, and SHMT2 transcripts undergo a decrease in
their usage of proximal polyA sites and shift to the distal polyA sites (Figure 16). These
four genes all have C-rich motifs located closely upstream of their proximal polyA sites.
In a reciprocal fashion, CP depletion results in decreased usage of the distal
polyA sites and a shift to the proximal polyA site in the CSTF1 and PPP2r2d transcripts. In
agreement with the model (Figure 18) in which the C-rich USE enhances polyA site activity,
we find C-rich motifs 5’ to the distal polyA sites in both of these transcripts. The
identification of the C-rich motif 5’ of the repressed sites by MEME analysis is in
remarkable accord with the definition of the CP binding site by prior analyses of
mRNAs targeted by CP [7, 52, 59, 63, 73, 74] and with CP binding features as
determined by in vitro SELEX [62]. Consistent with this role, we show that these C-rich
motifs interact with CP proteins (Figure 17). From these complementary lines of
evidence, we conclude that a C-rich motif bound by CP acts as a potent USE
enhancer of 3’ processing when located 5’ to a polyA site.
It should be noted that a significant number of mRNAs have altered polyA site utilization
in the absence of the C-rich motif. This may represent secondary effects of CP
depletion. As revealed in the current study, CP depletion caused changes in the steady-state levels
and APA patterns of a number of RNA binding proteins and components of the RNA
processing machinery. One can imagine that these changes will indirectly alter APA
Figure 15 siRNA-mediated depletion of CPs alters 3’ processing of the CP2
transcript.(A). Genome browser view of the DRS reads at the CP2 locus.
Comparison of 3’ processing site utilization in cells treated with CP siRNAs
(pooled, upper panels) and control siRNAs (pooled, lower panels). The y-axis
represents the number of read counts corresponding to each of the polyA sites.
The two novel polyA sites observed in the CP-depleted cells are
encompassed in the dotted oval. (B). Generation of the two novel polyA sites
in CP mRNA subsequent to CP-depletion reflects linked alterations in
splicing and 3’ processing. Exons 13 and 14 (terminal exon) of the CP gene
are shown. The regular splicing patterns and the positions of the
corresponding 3’ polyA termini are indicated by the solid lines and solid
vertical arrows, respectively. The alternative splicing/3’ processing event that
occurs in the CP-depleted cells is indicated by the corresponding set of
dotted lines and dotted vertical arrows, respectively (upper panel). RT-PCR
amplification was between primers #1 and #2 and the corresponding fragment
was excised and sequenced (middle panel). The sequencing confirmed the use
of the novel splice-acceptor site in exon 13a, with the two polyA signals
highlighted (lower panel: partial exon 13 sequence was shown in italic and
arrow indicated the starting point of new novel exon 13).(C). Real-time PCR
analysis confirms the switch in terminal intron splicing and 3’ polyA site
selection within the CP2 transcript subsequent to CP depletion. Primers F
and R assess the use of the canonical polyA usage, primers c and d assess use
of the distal novel polyA site within exon 13a, while the primers a and b asses
overall levels of alternative splicing and 3’ processing with exon 13a. The
assays were carried out on RNAs isolated from cells treated with each of the 3
distinct control siRNAs and with each of the two distinct CP2-targeting
siRNAs. Each real time assay was normalized to GAPDH amplicon. The
ratio in the CTRL-3 sample is defined as 1.0. Standard deviation for each
sample is shown (n=3).(D). In vitro RNA-protein interaction assay. An RNA
oligo of 24 bp (shown below diagram), encompassing the region immediately
5’ to exon 13a (dash rectangle), was synthesized and 32P - labeled. The DNA
sequence 5’ the splice site is also shown. Left panel: the RNA oligo was
incubated with HeLa cell nuclear extract, UV-crosslinked, IP’ed with antiCP2/KL, and resolved on a SDS-PAGE gel [7]. The position of the ‘CP
complex’ is defined by IP using anti-CP2/KL antibody. Right panel: the
same 32P - labeled probe was subjected to RNA EMSA assay using K562 cell
S100 extract as described [63]. (data courtesy of Dr. Xinjun Ji, University of
Pennsylvania)
Figure 16 QPCR validation of APA. A subset of APA events identified in the DRS
analysis was independently assessed by targeted RT-PCR. The DRS data are
shown in the context of the genome browser diagram of the respective locus.
The red arrow indicates the position of the site of alternative polyadenylation
triggered by CP depletion. A-B: Examples of APA involving PA sites
located in different exons (DE-APA). (A), SSU72 RNA polymerase II CTD
phosphatase homolog (Ssu72) gene, an APA5 event [72]; (B), Nucleophosmin
(NPM1) gene, an APA3 event [72]. In each case, targeted PCRs were
performed to determine the change in the representation of the short isoforms.
All QPCR values were normalized to GAPDH mRNA levels, using the same
RNAs as used for the DRS study. Standard deviation for each sample is shown
(n=3). Blue and red bars represent the 3’ portions of the long and short mRNA
isoforms, respectively. C-H: Examples of APA involving competing PA sites
within the same terminal exon (SE-APA). (C), Amino-terminal enhancer of
split (AES) gene; (D), Cyclin-dependent kinase 16 (CDK16) gene; (E), Golgi
to ER traffic protein 4 homolog (GET4) gene; (F), Serine
hydroxymethyltransferase 2 (SHMT2) gene; (G), Cleavage stimulation factor
(CSTF1) gene (also known as CSTF50); (H), Protein phosphatase 2, subunit
B, isoform delta (PPP2r2d) gene. (data courtesy of Dr. Xinjun Ji, University
of Pennsylvania)
Figure 17 In vitro RNA-protein interaction assay. A 32P-labeled C-rich RNA probe
corresponding to each tested gene (shown at the top of the figure) was
incubated with HeLa cell nuclear extract and subjected to UV-cross-linking
[7] or RNA EMSA assay, as in Figure 15D. (data courtesy of Dr. Xinjun Ji,
University of Pennsylvania)
patterns of a subset of genes, although the exact mechanism awaits detailed future
studies.
Notably, the current study revealed only a small
overlap between the mRNAs that changed significantly in their overall steady state level
(DGE values) and those that were impacted by significant alterations in their polyA site
utilization (APA patterns). When the DGE genes with greater than 2-fold change and the
APA data set were compared, only 7 (PCBP2, ACOT2, ACSM3, SLC6A6, PRG2,
C3orf75 and C1orf86) of 117 genes were present in both categories. When the more
inclusive 1.5-fold change threshold for DGE was used, only 29 of 586 genes were present in
both categories. This limited overlap is not unexpected, since a reciprocal switch between two sets of competing polyA sites in
Figure 18 Impact of CPs on expression and alternative polyadenylation of Pol II
transcripts. Based on our analysis of 3’ processing of the h-globin transcript
[7] and on the present genome-wide analysis, we propose that CPs can act as
general regulators of 3’ processing. The CP RNP complex recruits corecomponents of the 3’end processing machinery to a defined subset of human
transcripts containing cognate C-rich binding site motifs [7] and in this way
enhances the efficiency of 3’ cleavage and polyadenylation when situated in
proximity to polyA signals. This enhancement of 3’ processing can increase
the levels of steady state mRNA and/or alter the pattern of polyA site
selection. Thus physiologic [75] or pathologic [76] shifts in the levels or
biologic activity of CP can result in major alterations in the transcriptome by
changes in steady state levels of subsets of mRNAs and/or a shifts in the
relative utilization of competing polyA sites (APA). (data courtesy of Dr.
Xinjun Ji, University of Pennsylvania)
those transcripts would not necessarily result in a significant change in overall mRNA
abundance. Thus the impact of CP depletion on steady state levels for many of the
genes studied is likely to reflect substantial changes in the efficiency of 3’ processing at a
unique polyA site (i.e., in the absence of APA).
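A small worked example with hypothetical read counts shows why a reciprocal switch can register as APA without registering as differential expression: the gene total is unchanged while the distal-site share drops. The counts and the helper name are illustrative only.

```python
from typing import Tuple


def gene_summary(proximal: int, distal: int) -> Tuple[int, float]:
    """Total reads for a two-site gene and the fraction using the distal site."""
    total = proximal + distal
    return total, distal / total


# Hypothetical read counts before and after a reciprocal APA switch.
total_ctrl, distal_ctrl = gene_summary(proximal=100, distal=100)
total_kd, distal_kd = gene_summary(proximal=150, distal=50)
print("overall fold change:", total_kd / total_ctrl)       # 1.0: no DGE call
print("change in distal usage:", distal_kd - distal_ctrl)  # -0.25: an APA shift
```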
How might the role of CP/polyC complex as an USE enhancer of 3’ processing
relate to known physiologic and pathophysiologic processes? The current study revealed
51
that CPs can autoregulate polyA site selection on the CP2 transcript. This finding is in
general agreement with multiple observations of auto-regulatory control over expression
of RNA binding proteins [73]. These new CP2 polyA sites were generated secondary to
an alternative splicing event mediated by the shift in CP levels [63]. These isoforms of
the full-length CP2 mRNAs utilizing this novel polyA sites are structurally similar to
CP4 protein, which has been implicated in apoptosis regulation [74]. Future work will
determine whether this novel CP2 isoform plays similar role under some circumstances.
CPs are considered ubiquitously expressed and are linked to a variety of activity [56].
However, CP activity and level can change significantly under special and abnormal
circumstances, such as environmental stress [66], cancer [77], differentiation [75],
chronic myeloid leukemia [76] and epithelial-mesenchymal transdifferentiation (EMT)
during the development and metastatic progression of tumours [78]. So, it is formally
possible that the change of level and activity of CP proteins may impact the pattern of
alternative polyadenylation of some master genes or subsets of genes in these and other
pathways.
In summary, the present report reveals that the RNA-binding CP proteins play an
important role in the 3’ end processing of a subset of genes, and that this effect is mediated
by the USE function of the CP RNP complex. Combining our recent study [7] with the
current work, we propose that the CP complex assembles on target RNAs co-transcriptionally; the nuclear-assembled CP complexes are then retained on the mRNAs
after processing in the nucleus and are co-exported with the mature mRNAs to the
cytoplasm. In this way, CPs can link nuclear transcript processing and
cytoplasmic mRNA metabolism.
Materials and Methods
Cell culture and siRNA transfection
K562 cells were cultured in RPMI 1640 medium supplemented with 10% fetal
bovine serum (HyClone) and antibiotic/antimycotic at 37°C in a 5% CO2 incubator. Cells
were transfected with a total of 2.0 μg of siRNA using Nucleofector V (Amaxa)
according to the manufacturer’s instructions. All siRNAs were ordered from
Dharmacon. The siRNAs used were: CP1-1: GUG AAA GGC UAU UGG GCA A;
CP1-4: UGU AAG AGU GGA AUG UUA A; GLD2-1: GUG AUU AAG AAG UGG
GCA A; GLD2-2: CCA AAG AUA AGU UGA GUC A; and an siRNA to Cyclophilin.
K562 cells were transfected twice with siRNAs: 24 hours after the initial transfection,
the cells were transfected once more with the same siRNAs. Cells were harvested
72 hours after the initial transfection, and RNAs were purified
using the Absolutely RNA Miniprep Kit (Stratagene) according to the manufacturer’s instructions.
Western blot analysis was performed as described [51, 57].
Direct RNA sequencing
The 3’ ends of RNA samples were sequenced by Helicos BioSciences Corporation
(Cambridge, MA, USA) according to their protocols [45, 64].
Mapping and APA analysis of DRS data
Direct RNA sequencing (DRS) was performed by Helicos BioSciences, and we
aligned DRS reads to the human genome assembly hg19 using the indexDPgenomic tool
in Helisphere (http://open.helicosbio.com/mwiki/index.php/Releases); uniquely
mapped reads with a minimum mapped length of 25 nt and an alignment score of 4.0 were kept
for further analysis. The replicate samples for control and CP knockdown experiments
were pooled for the differential expression and APA analyses. We first filtered out
mapped reads arising from internal poly(A) priming using a previously
described approach [32]. We next identified individual poly(A) sites by reversing the 5’ ends
of the non-internal-priming reads. To construct a consensus poly(A) annotation for
downstream analysis, we used the pooled data from the control and αCP knockdown
experiments to iteratively cluster all individual poly(A) sites lying within 40 nt of their nearest
poly(A) site on the same chromosome strand. The weighted coordinate, calculated as the
sum over the sites in a cluster of each site's coordinate multiplied by its fractional usage
within the cluster, was taken as the representative coordinate of the
corresponding poly(A) cluster. The frequencies of poly(A) clusters in the different
samples were calculated according to these consensus coordinates of poly(A)
clusters in the pooled data. Next, the poly(A)s residing in the whole gene region,
including exons, introns, and the 100-nt region downstream of the terminal exon, were
collected as candidate poly(A)s of a given gene (UCSC genes (hg19) and Ensembl genes
(release 61)).
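The clustering and weighted-coordinate steps can be sketched as follows. The exact iterative procedure is not spelled out here, so this sketch assumes simple single-linkage chaining of sorted sites on one strand (a site within 40 nt of the previous site joins its cluster); the representative coordinate is the usage-weighted mean position described above. The function name and example coordinates are hypothetical.

```python
from typing import Dict, List, Tuple


def cluster_sites(counts: Dict[int, int], max_gap: int = 40) -> List[Tuple[int, int]]:
    """Group polyA sites on one chromosome strand into clusters.

    `counts` maps genomic coordinate -> pooled read count (counts > 0). A site
    within `max_gap` nt of the previous site joins its cluster (single-linkage
    chaining, an assumption standing in for the iterative procedure). Each
    cluster is reported as (usage-weighted coordinate, total reads).
    """
    clusters: List[List[int]] = []
    for pos in sorted(counts):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    result = []
    for members in clusters:
        total = sum(counts[p] for p in members)
        # Sum of coordinate x fractional usage = weighted mean coordinate.
        centre = sum(p * counts[p] / total for p in members)
        result.append((round(centre), total))
    return result


# Hypothetical pooled counts: two nearby sites and one isolated site.
print(cluster_sites({1000: 30, 1012: 10, 1100: 5}))  # [(1003, 40), (1100, 5)]
```

Single-linkage chaining can merge long runs of sites spaced just under 40 nt apart; an iterative scheme anchored on the most abundant site would avoid that, at the cost of more bookkeeping.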
To test whether the usage of any single poly(A) cluster of a gene changed, Fisher’s exact test was used to compare, between the pooled control and αCP knockdown samples, the ratio of the DRS counts of that cluster to the summed counts of all other poly(A) clusters of the gene. P-values were adjusted with the Benjamini-Hochberg method to obtain FDRs. Finally, poly(A)s with FDR < 0.05 and a change of more than 10% in their share of the gene’s total poly(A) usage ($|\Delta| > 0.1$, where $\Delta$ denotes this change) were defined as significantly changed poly(A)s.
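A minimal Python sketch of the per-cluster test follows. It assumes a two-sided test (the sidedness is not stated above) and takes Δ as the change in a cluster's share of the gene's total DRS counts from control to knockdown. Fisher's exact test and the Benjamini-Hochberg adjustment are written out with the standard library so the example is self-contained; in practice a library routine such as scipy.stats.fisher_exact would normally be used. All function names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(x):  # hypergeometric probability of the table whose top-left cell is x
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    p_obs = prob(a)
    probs = (prob(x) for x in range(max(0, row1 + col1 - n), min(row1, col1) + 1))
    # Sum the probabilities of all tables no more likely than the observed one
    # (a small relative tolerance guards against floating-point ties).
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


def benjamini_hochberg(pvals):
    """Return BH-adjusted p-values (FDRs) in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    fdr, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        fdr[i] = running_min
    return fdr


def compare_clusters(control, knockdown):
    """Per-cluster (p-value, delta) for one gene.

    control, knockdown: pooled DRS counts per poly(A) cluster, in the same order.
    delta: change in the cluster's fraction of the gene's total poly(A) usage.
    """
    n_ctrl, n_kd = sum(control), sum(knockdown)
    out = []
    for a, c in zip(control, knockdown):
        # This cluster vs. all other clusters of the gene, control vs. knockdown.
        p = fisher_exact_two_sided(a, n_ctrl - a, c, n_kd - c)
        out.append((p, c / n_kd - a / n_ctrl))
    return out


results = compare_clusters([80, 20], [20, 80])
fdrs = benjamini_hochberg([p for p, _ in results])
significant = [f < 0.05 and abs(delta) > 0.1 for f, (_, delta) in zip(fdrs, results)]
print(significant)  # [True, True]
```

In a genome-wide run, the p-values of all clusters from all genes would be pooled before the Benjamini-Hochberg step rather than adjusted gene by gene.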
Detection of differential gene expression
The expression level of a gene was represented by the sum of the DRS read counts of all overlapping poly(A)s. We then ran DEGSeq to detect differential gene expression between the control and αCP knockdown samples. Genes with FDR < 0.05 and a fold change greater than 1.5 (after normalization by the number of mapped DRS reads) were defined as significantly differentially expressed genes.
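DEGSeq (an R/Bioconductor package) performs the statistical test itself; the hypothetical Python helpers below only illustrate the two simpler steps described above: summing poly(A) counts into a gene-level count, and applying the fold-change cutoff after normalizing each sample by its number of mapped DRS reads. Treating the cutoff symmetrically (up or down by more than 1.5-fold) and adding a small pseudocount are assumptions of this sketch.

```python
def gene_expression(polya_counts_by_gene):
    """Gene-level count = sum of DRS counts over all poly(A)s overlapping the gene."""
    return {gene: sum(counts) for gene, counts in polya_counts_by_gene.items()}


def passes_fold_change(ctrl_count, kd_count, ctrl_mapped, kd_mapped,
                       min_fc=1.5, pseudocount=0.5):
    """True if the library-size-normalized change exceeds min_fc in either direction.

    ctrl_mapped / kd_mapped: numbers of mapped DRS reads in each sample.
    The pseudocount (an assumption) avoids division by zero for silent genes.
    """
    ctrl_rpm = (ctrl_count + pseudocount) / ctrl_mapped * 1e6  # reads per million mapped
    kd_rpm = (kd_count + pseudocount) / kd_mapped * 1e6
    fold = kd_rpm / ctrl_rpm
    return max(fold, 1 / fold) > min_fc


expr = gene_expression({"G1": [10, 30], "G2": [5]})
print(expr)
print(passes_fold_change(100, 300, 1_000_000, 1_000_000))
```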
Motif enrichment analysis
We divided the significantly changed polyAs from the APA analysis into up-regulated (FDR < 0.05 and a usage change $\Delta > 0.1$) and down-regulated (FDR < 0.05 and $\Delta < -0.1$) sets, and motif enrichment and co-occurrence analyses were conducted on these two subsets separately. The 200-nt sequences upstream of the polyAs were first scanned with MEME. To control for polyA abundance, we grouped the significantly changed polyAs and the unchanged polyAs (FDR > 0.5) into bins according to their DRS read counts (the bin borders were defined by …, where n is determined by the most abundant polyA in the dataset). Background polyAs were then randomly sampled from the unchanged polyAs (FDR > 0.5), drawing ten times as many polyAs per bin as in the significant polyA sets. To draw the RNA map, the motif score of each position in the 200-nt region upstream of the polyA site was calculated as the average percentage of motif-overlapping nucleotides in a 31-nt window (15 nt upstream and 15 nt downstream of the position), separately for the significant and background polyAs. A Wilcoxon rank-sum test was performed at each position to assess the significance of the difference in motif score, and the p-values were adjusted with the Benjamini-Hochberg procedure to obtain FDRs. For the differential expression analysis, the dominant polyAs of the significantly up-regulated and down-regulated genes were retrieved for the same motif analysis. The background dataset was generated by controlling for the expression levels of the significantly differentially expressed genes, using the same binning and random sampling method as in the APA analysis; all subsequent steps were also identical to those in the APA analysis.
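The position-wise motif score used for the RNA map can be sketched as follows. This is an illustration under stated assumptions rather than the original code: motifs are matched as exact strings (MEME motifs are usually position weight matrices), windows are truncated at the sequence ends, scores are reported as fractions rather than percentages, and the example motif UUUU and all function names are hypothetical. Per-sequence scores from the significant and background sets could then be compared position by position with a rank-sum test (for example scipy.stats.ranksums) before Benjamini-Hochberg correction.

```python
def motif_coverage(seq, motifs):
    """Boolean mask of nucleotides covered by any exact occurrence of a motif."""
    covered = [False] * len(seq)
    for motif in motifs:
        k = len(motif)
        for i in range(len(seq) - k + 1):
            if seq[i:i + k] == motif:
                covered[i:i + k] = [True] * k
    return covered


def window_scores(covered, half=15):
    """Fraction of covered nucleotides in a (2*half+1)-nt window centred on each position.

    Windows are truncated at the sequence ends (an assumption of this sketch).
    """
    n = len(covered)
    scores = []
    for pos in range(n):
        lo, hi = max(0, pos - half), min(n, pos + half + 1)
        scores.append(sum(covered[lo:hi]) / (hi - lo))
    return scores


def rna_map(seqs, motifs, half=15):
    """Average window score across equal-length sequences, position by position."""
    per_seq = [window_scores(motif_coverage(s, motifs), half) for s in seqs]
    return [sum(col) / len(col) for col in zip(*per_seq)]


# Toy example with a 1-nt half-window so the numbers are easy to follow.
print(window_scores(motif_coverage("AAUUUUAA", ["UUUU"]), half=1))
print(rna_map(["AAUUUUAA", "AAAAAAAA"], ["UUUU"], half=1))
```

With the default half=15, each score covers the 31-nt window described above, and the 200-nt upstream sequences would all have the same length, as rna_map requires.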
Gene ontology analysis
DAVID (with the PANTHER classification system) was used to analyze Gene Ontology enrichment separately for the significantly differentially expressed genes and for genes harboring APA events [79]. The background gene sets were matched to the distribution of gene expression levels in the foreground gene sets using the same “binning and random sampling” method as in the “Motif enrichment analysis”.
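The “binning and random sampling” used to build abundance-matched backgrounds can be sketched in Python as below. Because the exact bin borders are not given here, this sketch assumes borders at powers of two; it draws ten background items per foreground item from the same bin, as described for the motif analysis, and simply takes every available candidate when a bin has too few. The function names are hypothetical.

```python
import math
import random
from collections import defaultdict


def abundance_bin(count):
    """Hypothetical bin index with borders at powers of two (1, 2, 4, 8, ...)."""
    return math.floor(math.log2(count)) if count > 0 else -1


def sample_background(foreground_counts, candidates, fold=10, seed=0):
    """Draw `fold` background items per foreground item from the same abundance bin.

    foreground_counts: read counts of the foreground (significant) items.
    candidates: {item_id: read_count} for unchanged items (e.g. FDR > 0.5).
    """
    needed = defaultdict(int)
    for cnt in foreground_counts:
        needed[abundance_bin(cnt)] += fold
    pools = defaultdict(list)
    for item, cnt in sorted(candidates.items()):  # sorted for reproducibility
        pools[abundance_bin(cnt)].append(item)
    rng = random.Random(seed)
    background = []
    for b, k in needed.items():
        pool = pools[b]
        # Sample without replacement; take the whole pool if it is too small.
        background.extend(rng.sample(pool, min(k, len(pool))))
    return background


fg = [5, 6, 100]
cands = {f"low{i}": 4 + i % 4 for i in range(30)}
cands.update({f"high{i}": 64 + i for i in range(5)})
bg = sample_background(fg, cands)
print(len(bg))
```

Here the two foreground items with 5 and 6 reads fall into the same bin and request 20 background items, while the item with 100 reads requests 10 but only 5 candidates of similar abundance exist.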
qPCR
RNAs were treated with DNase I (Invitrogen, PCR grade) and then reverse transcribed using the First-strand cDNA synthesis kit (GE). qPCR was performed using the Fast SYBR Green Master Mix kit (Applied Biosystems) on a 7900HT Fast qPCR machine (Applied Biosystems) according to the manufacturer’s instructions. Primers used in the DGE and APA studies are listed in Table 6.
3’ RACE
3’ RACE was performed according to an established protocol (3’ RACE System for Rapid Amplification of cDNA Ends, Invitrogen).
RNA UV-crosslinking and EMSA
UV-crosslinking assays were performed as described [7, 63].
CHAPTER IV
CONTEXT-DEPENDENT REGULATION OF
APA BY EPITHELIAL SPLICING REGULATORY PROTEINS
Introduction
ESRP1 and ESRP2 are two epithelial cell-specific splicing regulators that play an important role in the epithelial-to-mesenchymal transition (EMT). Previous studies using high-density microarrays have uncovered their regulation of a broad range of alternative splicing events (alternative cassette exons and alternative 5’/3’ splice sites) and of a limited set of APA events [80, 81]. However, as reviewed in Chapter II, microarray platforms are severely limited in their ability to identify APA. In the current study, we therefore combined RNA-Seq and DRS to identify significant APA events regulated by the ESRPs. Based on motif analysis of SELEX-Seq data for the ESRPs, we also scanned for motifs in the flanking sequences of ESRP-regulated polyA sites and gained some insight into the context-dependent pattern by which the ESRPs regulate APA.
Results
Identification of ESRP-regulated changes in alternative
3’ end formation by coupling
We prepared RNA-Seq and direct RNA sequencing (DRS) libraries from mesenchymal MDA-MB-231 cells ectopically expressing ESRP1, as well as from control cells over-expressing GFP (Figure 19). The RNA-Seq and DRS libraries were sequenced on the Illumina and Helicos DRS platforms, respectively [64]. DRS reads obtained through the pipeline were filtered for a minimum length of 25 nt and for internal priming, which yielded 3,338,956 and 3,537,072 uniquely mapped reads for the control and ESRP-expressing samples, respectively.
Clustered reads from DRS were then used to identify 335 candidate polyA sites with significant differential use between control and ESRP-expressing cells. To validate these events, we collected RNA-Seq reads within the 300-nt region upstream of each polyA site. Overall, 71.8% of the DRS-predicted changes in polyA site use were supported by RNA-Seq, and many of the non-validated cases lacked sufficient RNA-Seq coverage at the relevant genomic location. These data suggest that the DRS pipeline is robust and accurate.
Figure 19 Outline of the experimental systems and RNA-Seq/DRS protocol used to identify ESRP regulated APA. (data courtesy of Dr. Kimberly Dittmar, University of Pennsylvania)
However, to obtain a more confident set of ESRP-regulated polyA sites, we further filtered the DRS-predicted polyA sites to include only those with statistical RNA-Seq validation. This resulted in a total of 160 high-confidence changes in polyA site use in response to ESRP1 expression, of which 32 were designated SE-APA, 76 DE-APA3, and 52 DE-APA5. In six cases we also performed competitive RT-PCR using a common forward primer and specific reverse primers that recognize each alternative form. Although these competitive PCRs are less quantitative than assays using common primer sets, they supported each of the events tested, providing additional evidence that the DRS/RNA-Seq approach is robust and reliable (Figure 20). One example of alternative polyA use within the same UTR (SE-APA) was BAG1, where ESRP1 promoted expression of the isoform with an extended 3’ UTR (Figure 20). We also validated a DE-APA3 type event in the CHID1 transcript, an example in which ESRP promotes use of a proximal terminal exon (Figure 20). In the EPHA2 transcript, ESRP promotes the use of a 5’ splice site in the proximal DE-APA5 type terminal exon.
Table 8
Summary of direct RNA sequencing data for control and ESRP1-expressing cells.
RNA samples                                                    Control         ESRP
Sequenced reads                                             11,494,266   11,624,031
Uniquely mapped reads                                        3,660,857    3,881,752
Non-internal priming reads                                   3,338,956    3,537,072
Percentage of non-internal priming reads                         29.0%        30.4%
Non-internal priming reads overlapping known annotation      1,698,664    1,924,890
Percentage of known polyA sites                                  50.9%        64.4%
We were intrigued by several examples in which ESRP expression induced proximal DE-APA3 or DE-APA5 type 3’ terminal exons that were very close to the 5’ end of transcripts and associated with a significant decrease in expression of the downstream exons (Figure 21). For example, in COL5A1 our analysis identified a novel DE-APA3 event in the fourth intron and a DE-APA5 event in the first exon (Figure 21A). ESRP expression promoted both of these events, leading to short truncated products and a nearly 35-fold reduction in expression of full-length transcripts. Similar reductions in total mRNA expression were observed owing to ESRP activation of a DE-APA5 event in exon one of HSPG2 and in the second non-coding exon of EIF4G3 (Figure 21). Although it is not known whether the resulting truncated transcripts encode polypeptides, these examples illustrate how early polyadenylation can downregulate gene expression, though this mechanism may also involve microRNA-based regulation of the alternative UTRs. These examples are reminiscent of a previous observation in transcripts of the cleavage stimulation factor CstF-77, where use of a conserved alternative polyA site in the third intron was proposed as a means of directly modulating expression of the full-length functional isoform through alternative polyadenylation [82]. Similar to our observations for the ESRPs, a limited number of other examples have been described in which splicing factors also regulate polyadenylation [70]. Whereas SE-APA type events suggest that these factors can directly interact with components of the polyadenylation machinery, the DE-APA3 and DE-APA5 type events represent a more complex type of regulation involving interplay between splice site and polyA site selection.
Figure 20 Examples of three types of alternative 3’ end formation regulated by the ESRPs. (A) Example of SE-APA, (B) example of DE-APA3, and (C) example of DE-APA5, shown with UCSC browser views of RNA-Seq and direct RNA sequencing (DRS) read counts from MDA-MB-231 control (EV, green) vs. ESRP-overexpressing cells (ESRP, red) and RT-PCR validations. (data courtesy of Dr. Kimberly Dittmar, University of Pennsylvania)
Figure 21 Examples of APA events correlating with host gene expression changes. Examples of APA5 and APA3 events near the 5’ end that result in decreased overall expression of these genes in ESRP1-expressing cells (visualized in the UCSC genome browser).
Previous studies of the Nova proteins (and their D. melanogaster ortholog Pasilla) have shown that the locations of their binding sites or known motifs support a position-dependent function in these more complex types of regulation [5, 83]. We therefore sought to explore whether the ESRP binding motif was enriched in the set of alternatively polyadenylated transcripts identified here, and whether the position of these putative binding sites might determine whether they positively or negatively affect polyA site use. Using a more refined set of 108 such events regulated by ESRP (see Materials and Methods), we evaluated the positions of these motifs relative to a background set of alternative polyA sites. Given the limited number of well-supported examples of each APA subtype, our primary analysis pooled all alternative polyA sites and investigated whether the enrichment of these motifs relative to the site of polyA addition might determine whether use of a given site increased or decreased upon ESRP expression. As shown in Figure 22, ESRP binding sites were highly significantly enriched both upstream and downstream of the ESRP-regulated polyA sites, suggesting that the ESRPs can directly affect polyadenylation. Although there were regions relative to the polyA site in which ESRP binding motifs were enriched in both ESRP-enhanced and ESRP-repressed sites, there were also regions in which binding sites were more significantly associated with either enhanced or repressed sites. For example, in the region from approximately -220 to -160 nt upstream of ESRP-regulated polyA sites, enrichment was greater in enhanced than in repressed polyA sites (Figure 22). In contrast, ESRP-repressed sites showed greater enrichment of the motifs from +7 to +86 nt and from +200 to +250 nt downstream of the polyA site. These observations agree with previous studies showing enrichment of binding motifs for other splicing factors, such as Nova and hnRNP H, that also regulate polyadenylation [5, 27]. While these observations implicate the ESRPs in the regulation of polyadenylation, it should be noted that the regulation of DE-APA3 and DE-APA5 events may also occur through the regulation of splicing, via binding to intronic regions near the regulated splice sites. Such events may further involve coupled recruitment or inhibition of the splicing and polyadenylation machineries. However, given the limited number of events of each subtype (DE-APA3 and DE-APA5), we were unable to derive a separate confident map of ESRP binding sites for these events. While this motif analysis supports a role for the ESRPs in the direct regulation of polyadenylation, we cannot be sure that all of the events we identified are direct targets. Nonetheless, the current set of over 100 high-confidence changes in polyadenylation provides a useful dataset for downstream analysis and illustrates the potential of integrating high-throughput sequencing of mRNA 3’ ends with transcriptome-wide sequencing to uncover larger sets of these regulatory networks.
Discussion
We analyzed DRS and RNA-Seq data to identify APA events potentially affected by the ESRPs. This work highlights the emerging potential of high-throughput sequencing methodologies to comprehensively identify differential patterns of alternative polyadenylation. The pipeline combining RNA-Seq and DRS to investigate the regulation of polyadenylation demonstrates how coupling these different high-throughput technologies can increase the power of detection while also providing intrinsic cross-validation. Our study also adds the ESRPs to the list of known splicing regulators, such as Nova, hnRNP F, hnRNP H, PTB, and U1A, that can regulate polyadenylation [70, 84, 85]. While the proposed map of ESRP binding sites suggests that the ESRPs can promote or inhibit polyadenylation in a position-dependent manner, further experiments are needed to validate this. In addition, future investigations are needed to understand the mechanisms by which the ESRPs regulate polyadenylation. hnRNP F, a homolog of ESRP, was previously suggested to regulate alternative polyadenylation of the IgM heavy chain by inhibiting CstF-64 binding to the downstream polyA site [86]. A similar role for PTB-mediated inhibition via reduced CstF-64 binding has also been shown, whereas binding of PTB upstream of the polyA site can promote polyadenylation [26]. Based on the pattern of enrichment of ESRP binding motifs relative to regulated polyA sites, we envision that the ESRPs might operate through similar mechanisms to regulate polyadenylation. We also noted numerous examples of high-confidence ESRP-regulated polyA sites that represent novel, previously unannotated sites of polyadenylation. In fact, 72 of the 271 (26.6%) regulated polyA sites supported by both DRS and RNA-Seq were novel, similar to findings in other recent 3’ end sequencing studies that discovered large numbers of novel polyA sites [32, 43, 64]. Thus, as similar technologies are applied to additional regulators and different cell contexts, the percentage of human genes known to undergo regulated alternative polyadenylation will very likely continue to rise.
Figure 22 A functional map for ESRP position-dependent regulation of alternative polyadenylation. The top twelve 6-mer ESRP binding motifs from SELEX-Seq were used to derive an ESRP binding score, which is shown mapped across the set of 108 DRS-identified and RNA-Seq cross-validated ESRP-regulated polyA sites and the 250 nt upstream and downstream, with promoted sites in red and silenced sites in blue. The same score was also mapped across a background set of annotated polyA sites (black). (data courtesy of Dr. Peng Jiang, University of Iowa)
Materials and Methods
Cell Culture, transfection, and transduction
MDA-MB-231 cells were maintained, transfected, and transduced as described
[87].
Library preparation and sequencing
Sequencing libraries were prepared using the mRNA-Seq Sample Prep Kits (Illumina) according to the manufacturer's instructions. Total RNA (10 µg) was used to prepare polyA RNA for fragmentation, followed by cDNA synthesis with random hexamers and ligation to Illumina adaptor sequences. The samples were quantified using an Agilent 2100 Bioanalyzer, loaded onto flow cells for cluster generation, and sequenced on an Illumina Genome Analyzer IIx using a single-read protocol to generate 76-bp reads.
Identification of ESRP regulated changes in
polyadenylation
We used single-end RNA-Seq reads from the EV and ESRP experiments to infer the exon-exon junctions used to classify APA events. Junctions were predicted with TopHat [88], and the predicted junctions were combined with the known gene annotation to classify APA types (SE-APA, DE-APA3 and DE-APA5). To assess the agreement between the DRS and RNA-Seq data and to validate the DRS predictions, we counted RNA-Seq reads within the 300-nt region upstream of each polyA site and conducted a one-sided Fisher’s exact test on the RNA-Seq read counts. We defined consensus validated events as those with FDR < 0.05 from DRS and p < 0.01 from RNA-Seq, with the same direction of change in both datasets. We further required at least a 10% change in polyA site use in the DRS data and discarded events that could not be classified as SE-APA, DE-APA3, or DE-APA5. To investigate ESRP binding motifs within these events and draw an RNA map, we also removed significant APA genes with more than two polyAs. In addition, a number of the DE-APA3 and DE-APA5 type events corresponded to comparisons of two or more closely spaced polyA sites with a single alternative polyA; for these genes we retained only the most representative comparison, namely the one with the most significant p-value.
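The RNA-Seq window count, the consensus filter, and the per-gene selection described above can be sketched as follows. Assumptions of this sketch: reads are represented by their genomic start coordinates, “upstream” is taken relative to the transcript strand, the DRS and RNA-Seq usage changes are signed with the same convention in both datasets, and all function names are hypothetical.

```python
def upstream_read_count(read_starts, pas, strand, window=300):
    """Count RNA-Seq reads starting within `window` nt upstream of a poly(A) site.

    Upstream is taken relative to the transcript: lower coordinates on the
    + strand, higher coordinates on the - strand.
    """
    if strand == "+":
        return sum(1 for x in read_starts if pas - window <= x < pas)
    return sum(1 for x in read_starts if pas < x <= pas + window)


def is_consensus_validated(drs_fdr, drs_delta, rnaseq_p, rnaseq_delta, apa_type):
    """Apply the consensus criteria to one candidate event.

    drs_delta / rnaseq_delta: signed changes in polyA site usage, computed with the
    same sign convention (ESRP minus control) in both datasets.
    """
    return (drs_fdr < 0.05
            and rnaseq_p < 0.01
            and drs_delta * rnaseq_delta > 0   # same direction of change
            and abs(drs_delta) >= 0.1          # at least a 10% change in DRS usage
            and apa_type in {"SE-APA", "DE-APA3", "DE-APA5"})


def most_significant_per_gene(events):
    """Keep, for each gene, the comparison with the smallest p-value.

    events: iterable of (gene, p_value, comparison) tuples.
    """
    best = {}
    for gene, p, comparison in events:
        if gene not in best or p < best[gene][0]:
            best[gene] = (p, comparison)
    return {gene: comparison for gene, (_, comparison) in best.items()}


print(upstream_read_count([50, 150, 399, 400, 450], 400, "+"))
print(is_consensus_validated(0.01, 0.2, 0.005, 0.15, "SE-APA"))
```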
CHAPTER V
GLOBAL REGULATION OF ALTERNATIVE
POLYADENYLATION BY CLEAVAGE STIMULATION FACTOR
64 (CSTF64)
Background
Cleavage stimulation factor (CstF) is a core 3’ end processing complex composed of CstF77, CstF50, and CstF64 [89]. CstF64 binds directly to U/GU-rich elements of RNA via its RNA recognition motif (RRM) [90]. Although the AAUAAA hexamer is highly conserved, the downstream U/GU-rich elements are much more heterogeneous, and it is not well understood how CstF64 recognizes such divergent sequences [91, 92]. As reviewed in Chapter I, CstF64 is an important regulator of APA. However, it remains unknown how CstF64 regulates APA globally. In addition, CstF64τ is a paralog of CstF64, and the two proteins share a similar domain structure [93]. CstF64τ was recently isolated as part of the CstF complex [10], but its functions in mRNA 3’ processing remain poorly understood. To comprehensively characterize the functions of CstF64/τ in vivo, we examined CstF64-mediated global APA regulation by quantitative RNA polyadenylation profiling of CstF64-expressing and CstF64-depleted cells. The results of this study provide significant new insights into the mechanisms of PAS recognition and APA regulation by CstF64/τ.
Global analyses of CstF64-mediated APA regulation
To characterize the role of CstF64 in global APA regulation, we generated HeLa cell lines (CstF64-RNAi cells) that stably express small hairpin RNAs specifically targeting CstF64 mRNA. As shown in Figure 23A, CstF64 was efficiently depleted in these cells, while CstF77 and CstF50 levels were not significantly affected. Interestingly, no apparent growth defects were observed for these cells. We then isolated total RNA from control HeLa cells and CstF64-RNAi cells and carried out direct RNA sequencing (DRS) on the Helicos platform to quantitatively map RNA polyadenylation profiles (Table 9).
Table 9
Summary of DRS data from the CstF64/τ knockdown experiments.
RNA samples                                                          HeLa    CstF64-siRNA  CstF64/τ-siRNA
Sequenced reads                                                 8,787,115      22,699,065       2,769,295
Uniquely mapped reads                                           3,978,877       9,957,878       1,115,885
Non-internal priming reads                                      3,795,224       9,538,799       1,060,090
Percentage of non-internal priming reads                            43.2%           42.0%           38.3%
Non-internal priming reads overlapping known annotation         2,760,228       7,160,775         553,018
Percentage of known polyA sites                                     72.7%           75.1%           52.2%
When the APA profiles of control HeLa cells and CstF64-RNAi cells were compared, we identified 327 PASs with significantly different usage. Eighty-five genes were identified as high-confidence targets because they contained two alternative PASs with significantly different usage. Among them, 52 genes showed an increase in the relative usage of the distal PAS in CstF64-RNAi cells (proximal-to-distal shift), while changes in the opposite direction (distal-to-proximal shift) were observed for 33 genes (Figure 23B, left panel). Given the known function of CstF64 as an essential mRNA 3’ processing factor, it was surprising that depletion of CstF64 had a relatively small effect on the global APA profile. Interestingly, we observed that the protein levels of CstF64τ were significantly higher in CstF64-RNAi cells (Figure 23A). We next compared the RNA-binding specificities of CstF64 and CstF64τ using gel shift assays with purified GST-CstF64-RRM or GST-CstF64τ-RRM and the polyA sites of SVL, BASP1 and RPS11. For all tested RNAs, the affinities of CstF64 and CstF64τ were almost indistinguishable (Figure 24). These results suggest that CstF64τ and CstF64 have overlapping RNA-binding specificities and may play redundant roles in mRNA 3’ processing. Therefore, the elevated levels of CstF64τ in CstF64-RNAi cells may at least partially compensate for the loss of CstF64.
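The direction of a significant PAS-pair change, as displayed in Figure 23B, amounts to comparing log10(proximal/distal) ratios between HeLa and RNAi cells. The short Python sketch below is illustrative only; the pseudocount and the function names are assumptions.

```python
import math


def log10_ratio(proximal, distal, pseudocount=1):
    """log10(proximal/distal) usage ratio; the pseudocount is an assumption."""
    return math.log10((proximal + pseudocount) / (distal + pseudocount))


def shift_direction(ctrl_prox, ctrl_dist, rnai_prox, rnai_dist):
    """Classify a significant proximal/distal PAS pair by its change in usage ratio."""
    change = log10_ratio(rnai_prox, rnai_dist) - log10_ratio(ctrl_prox, ctrl_dist)
    if change < 0:
        return "proximal-to-distal"  # relatively more distal usage after RNAi
    if change > 0:
        return "distal-to-proximal"  # relatively more proximal usage after RNAi
    return "no change"


print(shift_direction(90, 10, 40, 60))  # proximal-to-distal
```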
To assess the specific role of CstF64 in global APA regulation, we knocked down CstF64τ in CstF64-RNAi cells to a level similar to that in control HeLa cells by transient transfection of siRNAs against CstF64τ (Figure 23A, right lane). Two interesting observations emerged from the DRS analysis of the CstF64 and CstF64τ double knockdown (CstF64&τ-RNAi) cells. First, we identified 873 PASs with significantly different usage between CstF64&τ-RNAi cells and control HeLa cells, considerably more than the number detected for CstF64-RNAi cells. In total, 201 genes were identified as high-confidence targets with two PASs that displayed significantly different usage (Figure 23B, right panel). There was significant overlap between the genes with significant APA changes in CstF64-RNAi cells and those in CstF64&τ-RNAi cells (Figure 25A), and the regulated PASs in the two datasets also shared some sequence features (Figure 25B). Second, among the genes with APA changes, the majority (171 genes, or 85%) showed a proximal-to-distal shift, while only 30 genes (15%) displayed changes in the opposite direction (Figure 23B). We validated our APA analysis results for 6 selected target genes by quantitative RT-PCR (qRT-PCR) using primer sets targeting either the common region shared by both APA isoforms or the extended region found only in the longer isoform. For all 6 genes tested, the direction of the APA changes detected by DRS was confirmed by qRT-PCR (Figure 23C), suggesting that our DRS analyses of APA were reliable. In most cases, the magnitude of the APA changes was greater in CstF64&τ-RNAi cells than in CstF64-RNAi cells. To understand the role of CstF64-RNA interactions in APA regulation, we compared CstF64 individual-nucleotide resolution UV cross-linking and immunoprecipitation (iCLIP) signals at the polyA sites regulated by CstF64. We divided the genes with significant APA changes in CstF64&τ-RNAi cells into “proximal-to-distal shift” and “distal-to-proximal shift” groups. For all genes within each APA group, we then plotted the total normalized iCLIP signals at the proximal and distal sites. As shown in Figure 23D, similar levels of iCLIP signal were detected at proximal and distal PASs for genes in the “proximal-to-distal” group (top panel). By contrast, for genes in the “distal-to-proximal” group, the distal PASs had significantly higher CstF64 iCLIP signals than the proximal PASs (Figure 23D, lower panel).
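Summing iCLIP signal over the proximal and distal PASs of each group could look like the sketch below. How the iCLIP signal was normalized is not stated above, so scaling to counts per million iCLIP reads is an assumption, as are the per-PAS count input and the function name.

```python
def total_normalized_iclip(pas_pairs, iclip_counts, iclip_library_size):
    """Total iCLIP signal at proximal and distal PASs across one group of genes.

    pas_pairs: list of (proximal_id, distal_id) for the genes in the group.
    iclip_counts: {pas_id: iCLIP read count assigned to that PAS}.
    Signals are scaled to counts per million iCLIP reads (an assumption).
    """
    scale = 1e6 / iclip_library_size
    proximal = sum(iclip_counts.get(prox, 0) for prox, _ in pas_pairs) * scale
    distal = sum(iclip_counts.get(dist, 0) for _, dist in pas_pairs) * scale
    return proximal, distal


group = [("p1", "d1"), ("p2", "d2")]
print(total_normalized_iclip(group, {"p1": 2, "d1": 5, "p2": 1}, 1_000_000))
```

Running the function once per shift group gives the two proximal/distal totals that are compared in Figure 23D.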
Discussion
Our study provides important new insights into the role of CstF64 in global APA regulation. First, our data reveal that CstF64 and its paralog CstF64τ have overlapping RNA-binding specificities and play redundant roles in APA regulation. This functional redundancy explains our observations that depletion of CstF64 had little effect on either cell growth or the global APA profile, and that co-depletion of CstF64 and CstF64τ led to greater APA changes. As CstF64τ was still present in our CstF64&τ-RNAi cells (Figure 23A), the actual number of APA events regulated by CstF64 and CstF64τ may exceed the number identified in this study. Second, our data suggest that CstF64 is an important global regulator of APA and, in most cases, promotes the usage of proximal PASs (Figure 23B). We propose the following model for CstF64-mediated APA regulation. When CstF64 is abundant, it promotes efficient recognition of the proximal, weaker PASs through direct protein-RNA interactions, and 3’ processing at proximal PASs prevents the transcription and usage of the distal PASs. When CstF64 is limited, however, recognition of the proximal PASs becomes less efficient, which allows the distal, stronger PASs to be transcribed and recognized by the 3’ processing machinery. Our results are consistent with previous studies showing that higher levels of CstF64 led to increased usage of the proximal PASs in the IgM and NF-ATc mRNAs [20, 94]. Third, although CstF64 is believed to be a general 3’ processing factor, our results suggest that CstF64 depletion affects APA in a specific subset of genes. Interestingly, a similar phenomenon has been reported in splicing, where changes in the concentration of core spliceosomal components regulate specific alternative splicing events [95]. This may be a common theme in the regulation of mRNA processing. Finally, a number of recent studies have reported widespread and systematic APA changes under a variety of physiological and pathological conditions [70]. Interestingly, systematic APA shifts to the distal PASs during stem cell differentiation and development are accompanied by decreased mRNA levels of many core 3’ processing factors, including CstF64 and CstF64τ. Our study provides the first direct evidence that a decrease in the protein level of a general 3’ processing factor leads to APA changes characterized by higher relative usage of the distal PASs in many mRNAs. It will be important for future studies to determine whether and how the protein levels of CstF64/τ and other core 3’ processing factors are regulated under different physiological conditions, and how these changes contribute to global APA regulation.
Figure 23 CstF64-mediated global APA regulation. (A) Western blot analysis of control HeLa, CstF64-RNAi and CstF64&τ-RNAi cells. (B) Pair-wise comparison of PAS usage in HeLa, CstF64-RNAi and CstF64&τ-RNAi cells. Y axis: log10(proximal/distal) in HeLa. X axis: log10(proximal/distal) in CstF64-RNAi (left) or CstF64&τ-RNAi (right) cells. PAS pairs with statistically significant differences in usage are highlighted in blue (higher usage of the proximal PAS in RNAi cells) or red (higher usage of the distal PAS in RNAi cells). (C) qRT-PCR verification of the APA changes in six genes. Y axis: log2 ratio of RNAi/HeLa (extended/common). (D) Total normalized iCLIP signals for proximal-to-distal shift (red) and distal-to-proximal shift (blue) PAS pairs (the same highlighted PAS pairs as in (B)). (data courtesy of Dr. Chengguo Yao, University of California, Irvine)
Figure 24 Comparison of the RNA-binding specificities of CstF64 and CstF64τ. Gel mobility shift assays using recombinant GST-CstF64-RRM or GST-CstF64τ (0, 25, and 50 μM) and the 60-nt sequences downstream of the cleavage sites of the listed genes. SVL RNA was used as a positive control. (data courtesy of Dr. Chengguo Yao, University of California, Irvine)
Materials and Methods
Cell culture and transfections
HeLa cells were grown in DMEM plus 10% fetal bovine serum. For CstF64 RNAi, a pSuperior.puro plasmid was constructed to express small hairpin RNAs targeting CstF64 mRNA (target sequence: GTTAGATGCCAGAGGATTA). Transfections were carried out using Lipofectamine 2000 (Invitrogen) according to the manufacturer’s instructions. Stable CstF64-RNAi cell lines were obtained by selection with puromycin and expansion of single colonies. To knock down CstF64τ in CstF64-RNAi cell lines, pre-designed siRNAs (Ambion s23471) were transfected into a stable CstF64-RNAi cell line using Lipofectamine 2000. Knockdown efficiencies were determined by western blotting using antibodies against CstF64 (mAb 6A9) and CstF64τ (Bethyl A301-487A).
Gel shift assay
RNA substrates were synthesized by T7 transcription in the presence of [α-32P]UTP. RNAs (~1.5 nM) were incubated with 0 to 60 µM GST-CstF64-RRM fusion protein in 10.6 µl of binding buffer (10 mM HEPES, pH 7.9, 50 mM NaCl, 0.5 mM MgCl2, 0.1 mM EDTA, 5% glycerol, 1 mM ATP, 10 mM creatine phosphate, 5 mM β-mercaptoethanol, 0.25 mM PMSF, 0.7 µg E. coli tRNA, and 1.4 µg BSA) at 30°C for 10 min. Reaction mixtures were resolved on 5% nondenaturing PAGE gels.
Sequencing and reads mapping
Direct RNA sequencing (DRS) was performed by Helicos Biosciences, and DRS reads were aligned to human genome assembly 19 (hg19) using the indexDPgenomic tool in Helisphere. Uniquely mapped reads with a minimal mapped length of 25 and an alignment score of 4.0 were kept for further analysis. We first filtered out mapped reads arising from internal poly(A) priming using a previously described approach [32]. We next identified individual poly(A) sites by reversing the 5’ ends of the non-internal-priming reads. To construct a consensus poly(A) annotation for downstream analysis, we used the pooled data from both HeLa-Mock and CstF64-RNAi cells to iteratively cluster each individual poly(A) site with its nearest poly(A) site on the same chromosome strand when the two were within 40 nt of each other. The weighted coordinate of a cluster, calculated as the sum over its individual poly(A) sites of each site’s coordinate multiplied by that site’s fractional usage within the cluster, was taken as the representative coordinate of the poly(A) cluster. The frequency of each poly(A) cluster in the different samples was then calculated using these consensus cluster coordinates. Finally, poly(A)s residing anywhere in a gene region, including exons, introns, and the 100-nt region downstream of the terminal exon, were collected as candidate poly(A)s of that gene, based on UCSC genes (hg19) and Ensembl genes (release 61).
Figure 25 Comparison of APA changes in CstF64- and CstF64&τ-RNAi cells. (A) Venn diagram comparing the genes with two PASs showing significantly different usage in CstF64- and CstF64&τ-RNAi cells. (B) MEME analysis of the proximal and distal PASs (200-nt sequences centered on the cleavage sites) of genes with proximal-to-distal shifts in CstF64&τ-RNAi (top panel) and CstF64-RNAi cells.
APA analysis
To compare the APA profiles of HeLa cells with those of CstF64-RNAi or CstF64&τ-RNAi cells using DRS data, we first removed poly(A) sites that overlap snoRNA/scaRNA/snRNA regions and those with zero reads in two of the three samples. For the remaining poly(A) sites, Fisher’s exact test was used to compare the ratio between the DRS read count of one PAS and the summed read counts of all other PASs within the same gene. P-values were adjusted with the Benjamini-Hochberg method to obtain FDRs, and poly(A) sites with FDR < 0.05 were defined as significantly changed APA sites.
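The two pre-test filters can be written as a small predicate. This hypothetical helper assumes that a site is removed when it has zero reads in at least two of the three samples (HeLa, CstF64-RNAi and CstF64&τ-RNAi); retained sites would then go through the same Fisher’s exact test and Benjamini-Hochberg adjustment used in the earlier DRS analyses.

```python
def keep_site(counts, overlaps_small_rna):
    """Return True if a poly(A) site passes both pre-test filters.

    counts: DRS read counts for (HeLa, CstF64-RNAi, CstF64&tau-RNAi).
    overlaps_small_rna: True if the site overlaps a snoRNA/scaRNA/snRNA region.
    """
    zero_samples = sum(1 for c in counts if c == 0)
    # Drop small-RNA overlaps and sites silent in two or more samples.
    return not overlaps_small_rna and zero_samples < 2


print(keep_site((5, 0, 3), False), keep_site((5, 0, 0), False))
```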
CHAPTER VI
ALTERNATIVE POLYADENYLATION
DURING HYPOXIA INDUCTION
Background
Hypoxia is a condition of inadequate oxygen supply to tissues or cells, and the cellular response to hypoxia can play an important role in human disease. For example, hypoxia is pervasive in cancer tissues and cells, where it can promote tumor progression, and hypoxic cancer tissues and cells are resistant to therapy [97]. The hypoxia response pathway is regulated by a family of transcription factors, the hypoxia-inducible factors (HIFs). Under normal oxygen conditions, HIF-1α is degraded to a level at which it cannot bind HIF-1β to form a transcription complex, whereas under hypoxia the HIF-1α:HIF-1β complex accumulates and induces transcription of a set of hypoxia-responsive genes [98]. In addition, numerous microarray studies have shown that hypoxic signaling pathways vary across cell types, demonstrating the physiological complexity of hypoxia at the transcriptional level [99]. At the post-transcriptional level, several studies have found that microRNAs and RNA-binding proteins are involved in the cellular response to hypoxia [100, 101], and a limited set of significant alternative splicing events was detected in hypoxia-induced endothelial cells using exon arrays [102]. Given these findings, we hypothesized that hypoxia can also induce alternative polyadenylation, which is important in creating protein and post-transcriptional diversity in human cells. To test this hypothesis, we first improved a protocol for specifically sequencing the 3’ ends of mRNA, based on the prototype described in [103]. Our quality control analysis supported this improvement in terms of reproducibility, robustness and correctness. Next, we applied a modified pipeline based on the method in Chapter II to identify APA events in three sets of normal and hypoxic JEG-3 cells.
Results
Quality control analysis of PAS-Seq data
We prepared PAS-Seq libraries for 14 samples (6 hypoxia samples, 4 MAQC brain samples and 4 MAQC UHR samples) and sequenced them on the Illumina platform. We first mapped the reads from all 14 samples to the human genome assembly hg19. Sequencing output ranged from 12.6M to 33M reads per sample, generally much higher than the output of DRS described in previous chapters. However, because PAS-Seq has a higher proportion of internal priming reads (30% to 55%), the final numbers of polyA site reads (4M to 12M) were at the same level as those from DRS (Table 9). We next compared PAS-Seq polyA sites to known annotations, namely polyA_DB2 and the 3’ ends of known genes (UCSC and Ensembl). A PAS-Seq polyA site was defined as “known” if it was within 40 nt of a polyA_DB2 polyA site or the 3’ end of a known transcript. As shown in Table 9, 85% to 92% of polyA sites were consistent with known polyA annotation. In addition, within each group of technical replicates (MAQC samples), both the percentage of non-internal priming reads and the percentage of known polyA sites were very similar, indicating high robustness of PAS-Seq.
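The “known site” criterion above amounts to a nearest-neighbour lookup against sorted annotation coordinates. A minimal sketch, assuming per-strand sorted lists of known positions (polyA_DB2 sites merged with UCSC/Ensembl transcript 3’ ends), an inclusive 40-nt window, and a read-weighted percentage, as the table column “non-internal priming reads overlapping known annotation” suggests. The helper names are hypothetical.

```python
from bisect import bisect_left

KNOWN_WINDOW_NT = 40  # "within 40 nt" from the text, treated as inclusive


def is_known(site, known_sorted, window=KNOWN_WINDOW_NT):
    """True if a known polyA site or transcript 3' end lies within `window` nt.

    known_sorted: sorted positions of known sites on the same chromosome strand.
    """
    i = bisect_left(known_sorted, site)
    # Only the nearest neighbours on either side need to be checked.
    return any(0 <= j < len(known_sorted) and abs(known_sorted[j] - site) <= window
               for j in (i - 1, i))


def percent_known(sites_with_reads, known_sorted, window=KNOWN_WINDOW_NT):
    """Percentage of non-internal-priming reads falling at known polyA sites."""
    total = sum(reads for _, reads in sites_with_reads)
    known = sum(reads for site, reads in sites_with_reads
                if is_known(site, known_sorted, window))
    return 100.0 * known / total


print(percent_known([(130, 6), (450, 2), (520, 2)], known_sorted=[100, 500]))  # 80.0
```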
Reproducibility of PAS-Seq
To further assess the reproducibility of the PAS-Seq data, we carried out a series of quality control analyses. First, within each PAS-Seq replicate group, we calculated a reproducibility score for each individual polyA site, defined as the number of replicate samples in which that polyA site occurs. We then calculated the percentage of PAS-Seq polyA sites in each reproducibility score range (>=2, >=3, and ==4) at different PAS-Seq read count thresholds (Figure 26). This analysis showed an increasing trend: polyA sites with higher PAS-Seq read counts were more likely to occur in more replicates. When the threshold of PAS-Seq read count was set at 5 or more, 95% of polyA sites were detected
Table 10 Summary of Hypoxia and MAQC PAS-Seq data
Samples | Sequenced reads | Uniquely mapped reads | Non-internal priming reads | Percentage of non-internal priming reads | Non-internal priming reads overlapping known annotation | Percentage of known polyA sites
Brain1 | 20.2M | 13.8M | 6.3M | 45.7% | 5.4M | 85.7%
Brain2 | 17.9M | 12.4M | 5.6M | 45.2% | 4.8M | 85.7%
Brain3 | 12.6M | 8.7M | 4M | 46.0% | 3.5M | 87.5%
Brain4 | 19.4M | 13.3M | 6.1M | 45.9% | 5.3M | 86.9%
UHR1 | 16.6M | 11.2M | 6.1M | 54.5% | 5.6M | 91.8%
UHR2 | 15.4M | 10.3M | 5.6M | 54.4% | 5.0M | 89.3%
UHR3 | 20.2M | 13.7M | 7.3M | 53.3% | 6.5M | 89.0%
UHR4 | 14.6M | 9.8M | 5.3M | 54.1% | 4.7M | 88.7%
Hyp1 | 23.5M | 16.2M | 11.3M | 69.8% | 10.4M | 92%
Ctrl1 | 27.2M | 18.7M | 11.4M | 61.0% | 10.2M | 89.5%
Hyp2 | 33.0M | 22.0M | 12.0M | 54.5% | 10.5M | 87.5%
Ctrl2 | 23.0M | 15.9M | 8.0M | 50.3% | 7.0M | 87.5%
Hyp3 | 25.6M | 17.1M | 9.4M | 55.0% | 8.3M | 88.3%
Ctrl3 | 25M | 16.8M | 8.7M | 51.8% | 7.6M | 87.4%
Figure 26 Reproducibility of polyA sites by PAS-Seq. The X-axis is the minimal PAS-Seq read count across replicate samples and the Y-axis is the percentage of polyA sites reproduced in at least 2 (yellow), at least 3 (red), and all 4 replicates (blue). Panel A shows MAQC brain replicates and panel B shows MAQC UHR replicates.
in at least 2 samples, 90% in at least 3 samples, and 85% in all four samples (in both hypoxia and MAQC samples).
Next, we plotted a heatmap of pairwise Pearson correlations, computed on common individual polyA sites, for 18 PAS-Seq samples: 14 in-house datasets (4 UHR, 4 brain, and 6 hypoxia) and 4 published datasets (2 UHR and 2 brain) [44] (Figure 27). The dendrogram clearly clusters replicates of the same experiment together and separates different experiments, further supporting the high reproducibility and robustness of our PAS-Seq data.
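The correlation matrix behind such a heatmap can be sketched as follows. This is a minimal sketch under stated assumptions: each sample is a mapping from a (hypothetical) polyA site identifier to its read count, and each pairwise correlation uses only the sites present in both samples, on raw counts as the Figure 27 caption states. Turning the matrix into a dendrogram (for example, using 1 − r as a distance with `scipy.cluster.hierarchy.linkage`) is left out. Function names and the toy data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(samples):
    """Pairwise correlations computed on polyA sites common to each pair.

    samples: dict of sample name -> dict of polyA site id -> read count.
    Returns (sorted sample names, dict mapping (name_a, name_b) -> r).
    """
    names = sorted(samples)
    matrix = {}
    for a in names:
        for b in names:
            common = sorted(samples[a].keys() & samples[b].keys())
            matrix[(a, b)] = pearson([samples[a][s] for s in common],
                                     [samples[b][s] for s in common])
    return names, matrix


names, r = correlation_matrix({
    "UHR1": {"s1": 10, "s2": 50, "s3": 200},
    "UHR2": {"s1": 12, "s2": 45, "s3": 210, "s4": 3},
    "Brain1": {"s1": 90, "s2": 20, "s3": 15},
})
```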
Gene expression correlation between PAS-Seq and RNA-Seq
Ideally, each individual PAS-Seq polyA site represents an mRNA transcript. To demonstrate that PAS-Seq quantitatively measures gene expression and that PAS-Seq
Figure 27 Clustering of 18 PAS-Seq samples. Correlation coefficients of PAS-Seq read counts at polyA sites were calculated pairwise for all 18 PAS-Seq samples as described in the text, and the resulting correlation coefficient matrix was used to plot the clustering heatmap.
reads are not over-amplified during sample preparation, we compared gene expression levels measured by RNA-Seq with those measured by PAS-Seq for the 6 hypoxia samples. The Pearson correlation coefficients ranged from 0.57 to 0.71 (Figure 28), supporting the ability of PAS-Seq to quantify gene expression levels.
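The gene-level comparison can be sketched as follows. This is a minimal sketch with assumptions the text does not state: PAS-Seq gene expression is taken as the sum of reads over all polyA sites assigned to a gene, and both measures are log2-transformed with a pseudocount of 1 (Figure 28 specifies only the log2 scale). Helper names and the toy data are hypothetical.

```python
from collections import defaultdict
from math import log2, sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def gene_level_counts(site_counts, site_to_gene):
    """Sum PAS-Seq reads over all polyA sites assigned to each gene."""
    totals = defaultdict(int)
    for site, count in site_counts.items():
        gene = site_to_gene.get(site)
        if gene is not None:  # skip sites not assigned to any gene
            totals[gene] += count
    return dict(totals)


def expression_correlation(pas_counts, rnaseq_fpkm, pseudocount=1.0):
    """Pearson r between log2 PAS-Seq gene counts and log2 RNA-Seq FPKM."""
    genes = sorted(pas_counts.keys() & rnaseq_fpkm.keys())
    x = [log2(pas_counts[g] + pseudocount) for g in genes]
    y = [log2(rnaseq_fpkm[g] + pseudocount) for g in genes]
    return pearson(x, y)


pas = gene_level_counts({"p1": 3, "p2": 4, "p3": 15, "p4": 31, "p5": 9},
                        {"p1": "G1", "p2": "G1", "p3": "G2", "p4": "G3"})
print(pas)  # {'G1': 7, 'G2': 15, 'G3': 31}
print(expression_correlation(pas, {"G1": 1.0, "G2": 3.0, "G3": 7.0}))  # 1.0
```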
In conclusion, these quality control analyses support the correctness, robustness, reproducibility, and quantitative ability of PAS-Seq. This improved protocol is therefore suited for APA detection and other related analyses.
Figure 28 Scatterplots of gene expression measured by PAS-Seq and RNA-Seq. The 6 samples in the hypoxia experiment are plotted independently (per-panel Pearson R: 0.621, 0.712, 0.596, 0.578, 0.572, 0.608). The X-axis and Y-axis represent gene expression (log2 scale) measured by PAS-Seq and RNA-Seq, respectively.
Alternative polyadenylation induced by hypoxia
Applying a pipeline modified from the method in Chapter II (see Materials and Methods), we detected 579 significant APA events (312 downregulated and 267 upregulated) between hypoxia and normoxia. Among these APA events, 241 are SE-APA events, 21 are DE-APA3 events, 8 are DE-APA5 events, and the remaining 308, which cannot be explicitly classified into any of these three types, are defined as “ambiguous APA”. We also used RNA-Seq data to verify the PAS-Seq results: 104 of the 579 significant APA events detected by PAS-Seq are also supported by RNA-Seq. Using qRT-PCR, we confirmed the same direction of change in polyA site usage for 21 of 23 tested polyA sites (Figure 29).
Many studies have revealed systematic shortening or lengthening of mRNA 3’ UTRs in response to clinical conditions [16, 31, 104]. To determine the direction of 3’ UTR length change between hypoxia and normoxia, we calculated weighted 3’ UTR lengths and observed that the overall distribution of 3’ UTR lengths in each hypoxia replicate is shifted toward shorter lengths compared with the corresponding normoxia replicate (Figure 30).
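The weighted 3’ UTR length can be illustrated for a single gene. This is a minimal sketch, assuming that “weighted by the PAS-Seq read counts” means a read-count-weighted mean (weights normalized to sum to one) and using the median only as a simple summary of the distribution shift. Helper names and all numbers are hypothetical.

```python
from statistics import median


def weighted_utr_length(utr_lengths, read_counts):
    """Read-count-weighted mean 3' UTR length of one gene (None if no reads)."""
    total = sum(read_counts)
    if total == 0:
        return None
    return sum(length * count for length, count in zip(utr_lengths, read_counts)) / total


def median_shift(normoxia_lengths, hypoxia_lengths):
    """Median weighted 3' UTR length in hypoxia minus that in normoxia."""
    return median(hypoxia_lengths) - median(normoxia_lengths)


# Hypothetical gene with a proximal (500 nt) and a distal (2,000 nt) 3' UTR isoform.
normoxia = weighted_utr_length([500, 2000], [20, 80])  # 1700.0
hypoxia = weighted_utr_length([500, 2000], [60, 40])   # 1100.0
```

A negative `median_shift` across genes would correspond to the shorter hypoxia distribution described above.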
One function of APA is to regulate steady-state mRNA levels by determining whether microRNA target sites in the alternative 3’ UTR region are retained. We therefore further separated SE-APA events into two subgroups: SE-APA in which the usage preference changed from the proximal to the distal polyA site (SE-APA-P2D), and SE-APA in which it changed from the distal to the proximal polyA site (SE-APA-D2P). Although we observed no difference in the mRNA expression distributions of SE-APA-P2D and SE-APA-D2P genes, SE-APA-D2P events were about twice as numerous as SE-APA-P2D events (Figure 31). Together with the results in Figure 30, we concluded that genes tend to express shorter 3’ UTR isoforms in hypoxia than in normoxia.
GO analysis of APA genes
To gain insights into the biological pathways of APA genes, we conducted gene ontology (GO) analysis on genes with significant APA switches. Using a background set of expressed genes (FPKM >= 1), we found no significantly enriched GO term. We then conducted GO analysis on the SE-APA-P2D and SE-APA-D2P gene sets separately. In the SE-APA-D2P gene set, we detected a single significant GO term, “Homeostasis” (6.8-fold enrichment relative to the background gene set). This suggests that
Figure 29 RT-qPCR validation of APA events in hypoxia. One example is shown for each APA type (SE-APA, DE-APA3 and DE-APA5). The left panel shows the gene structure and the RNA-Seq and PAS-Seq profiles from the UCSC Genome Browser. In the middle panel, the top plot illustrates the difference in polyA site usage between normoxia and hypoxia, and the bottom plot shows the fold change, between hypoxia and normoxia, in the qPCR-measured expression ratio of the different isoforms. The right panel shows the expression level (FPKM by RNA-Seq) of the corresponding APA gene. (Data courtesy of Dr. Lan Lin, University of Iowa.)
SE-APA-D2P events may be biologically relevant to the hypoxic response, possibly playing a role in maintaining homeostasis.
Figure 30 Distribution of 3’ UTR lengths in hypoxia and normoxia. For each Ensembl gene with a single stop codon annotation, the 3’ UTR length was calculated by summing all its 3’ UTR lengths, each weighted by the relative PAS-Seq read count of its corresponding polyA site. The 25th, 50th and 75th percentiles of the overall distribution are marked in the box, and the 5th and 95th percentiles are the boundaries of the whiskers.
Discussion
Using an improved protocol to specifically sequence the 3’ end of mRNA, we discovered a large set of genes with significant APA usage switches between hypoxia and normoxia. This finding raises two immediate questions: 1) whether the APA usage switches in hypoxia are a consequence of upstream hypoxia-induced pathways, and 2) whether they can influence pathways leading to tumorigenesis. In the gene expression analysis, we found significant expression changes in a set of RNA-binding proteins and 3’ end processing factors, such as RBFOX2 and MBNL3. It was
Figure 31 Distribution of expression levels of SE-APA genes. The red and blue dots represent expression levels (log2 FPKM) of SE-APA-D2P and SE-APA-P2D genes, respectively, in normoxia (x-axis) and hypoxia (y-axis). The red and blue bars show the numbers of significant SE-APA-P2D and SE-APA-D2P genes, respectively.
previously shown that an RNA-binding protein is regulated by changes in oxygen supply through a HIF-1-independent pathway [105]. It would be interesting to investigate similar pathways involving the RBPs or 3’ end processing factors that are significantly differentially expressed in hypoxia, and to link these factors to the induced APA usage switches.
The functional enrichment analysis of APA genes revealed an overrepresentation of homeostasis genes among SE-APA-D2P genes in hypoxia, which may suggest a direct influence of hypoxia on APA. Moreover, the homeostasis-related APA genes may be worth examining further as candidate biomarkers.
Materials and Methods
Sample preparation
JEG-3 cells were incubated in humidified hypoxia chambers (Billups-Rothenberg, Del Mar, CA, USA) with 2% (hypoxia) or 20% (normoxia) O2 at 37°C for 48 hours. Total RNA from six sets of biological replicate treatments was extracted using TRIzol reagent (Invitrogen). For each sample, 1.5 µg of total RNA was used in reverse transcription reactions with random hexamers to generate single-pass cDNA. qRT-PCR analysis of GAPDH and ERRF1 was used to evaluate the hypoxic response of each treatment, and the 3 sets of treatments (Sets 1, 2 and 3) with the largest hypoxic response were selected for RNA-Seq and PAS-Seq analysis. For quality control, we also prepared libraries with the same protocol for 8 MAQC samples (4 brain technical replicates and 4 UHR technical replicates) [106].
RT-qPCR validation of APA
To validate SE-APA events, two gene-specific primer sets were designed for each APA event: one targets the region common to all APA isoforms, and the other targets the extended region present only in the specific APA isoform. For DE-APA3 and DE-APA5 events, one set of specific primers was designed for each isoform-specific exon.
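Figure 29 reports fold changes of isoform expression ratios obtained from these primer pairs. The text does not state how the ratios were computed, so the following is only a minimal sketch of one common approach, the comparative Ct (2^−ΔΔCt) method. It assumes roughly 100% amplification efficiency for both primer sets, and the helper names are hypothetical.

```python
def isoform_fraction(ct_specific, ct_common):
    """Relative level of the isoform-specific amplicon vs. the common amplicon.

    Assumes ~100% amplification efficiency for both primer sets, so each cycle
    of difference corresponds to a two-fold difference in template.
    """
    return 2.0 ** (ct_common - ct_specific)


def apa_fold_change(hypoxia_cts, normoxia_cts):
    """Hypoxia / normoxia fold change of the isoform ratio.

    Each argument is a (ct_specific, ct_common) pair for one condition.
    """
    return isoform_fraction(*hypoxia_cts) / isoform_fraction(*normoxia_cts)


print(apa_fold_change(hypoxia_cts=(26.0, 22.0), normoxia_cts=(24.0, 22.0)))  # 0.25
```

A fold change below 1 in this toy example means the extended (distal) isoform makes up a smaller share of the transcripts in hypoxia.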
Detection of APA across multiple samples
We modified the “One vs. Others” method described in Chapter II to detect significant APA events across multiple sets of hypoxia samples. First, instead of conducting a two-sided Fisher’s exact test as in the single-sample setting, we conducted two independent one-sided Fisher’s exact tests (greater and less) for each polyA site in each set (Set 1, Set 2 and Set 3). Next, for each null hypothesis, we combined the three one-sided p-values into a combined χ2 statistic, $\chi^2 = -2\sum_{i=1}^{3} \ln p_i$, where $p_i$ is the p-value of the one-sided Fisher’s exact test in the ith set. The χ2 statistic was then converted to a p-value (χ2 distribution with 2 × 3 = 6 degrees of freedom), and the p-values were adjusted with the Benjamini-Hochberg algorithm. Finally, we applied the following filtering criteria to call significant APA events: (1) FDR less than 0.05; (2) individual p-value less than 0.001; and (3) change in the relative usage of the polyA site greater than 0.1 in each sample.
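The one-sided tests and their combination described above can be sketched as follows. This sketch assumes the combined statistic is Fisher’s method, χ² = −2 Σ ln p_i, referred to a χ² distribution with 2k degrees of freedom (k = 3 sets). For even degrees of freedom the upper tail has a closed form, so no statistics library is needed. `scipy.stats.combine_pvalues(..., method='fisher')` provides the same combination in a library setting. Helper names and the toy tables are hypothetical, and the Benjamini-Hochberg step and the additional filters are omitted.

```python
from math import comb, exp, log


def _hypergeom_weights(a, b, c, d):
    """Unnormalized hypergeometric weights for the top-left cell of [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    weights = {x: comb(col1, x) * comb(n - col1, row1 - x) for x in range(lo, hi + 1)}
    return weights, comb(n, row1)


def fisher_one_sided(a, b, c, d, alternative):
    """One-sided Fisher's exact p-value ('greater' or 'less') for [[a, b], [c, d]]."""
    weights, denom = _hypergeom_weights(a, b, c, d)
    if alternative == "greater":
        tail = sum(w for x, w in weights.items() if x >= a)
    elif alternative == "less":
        tail = sum(w for x, w in weights.items() if x <= a)
    else:
        raise ValueError("alternative must be 'greater' or 'less'")
    return tail / denom


def chi2_sf_even_df(stat, df):
    """Upper-tail probability of a chi-square variable with even df (closed form)."""
    half, term, total = stat / 2.0, 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return exp(-half) * total


def fishers_method(pvalues):
    """Combine independent p-values: chi2 = -2 * sum(ln p_i), df = 2k."""
    stat = -2.0 * sum(log(p) for p in pvalues)  # assumes every p > 0
    return stat, chi2_sf_even_df(stat, 2 * len(pvalues))


# Hypothetical tables for Sets 1-3: (site reads, other reads) in hypoxia, then normoxia.
tables = [(60, 40, 30, 70), (55, 45, 35, 65), (70, 30, 40, 60)]
p_greater = [fisher_one_sided(*t, alternative="greater") for t in tables]
stat, p_combined = fishers_method(p_greater)
```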
To validate significant APA events using RNA-Seq, we first counted the reads residing in the 300-nt region upstream of each polyA site and called this number the “supporting RNA-Seq count”. Next, for each polyA site, we constructed a contingency table consisting of its supporting RNA-Seq count and the sum of the supporting RNA-Seq counts of the other polyA sites in the same gene, in both hypoxia and normoxia. A two-sided Fisher’s exact test was conducted on the contingency table for each polyA site. If the Fisher’s exact test p-values on RNA-Seq data for a polyA site were less than 0.05 in all three replicates and the direction of the polyA usage change was the same as in the PAS-Seq data, the site was defined as an RNA-Seq-validated APA.
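The “supporting RNA-Seq count” and the validation rule can be sketched as follows. The sketch rests on three assumptions: each read is represented by one genomic coordinate in a sorted list, “upstream” is strand-aware and excludes the polyA site itself, and the per-replicate p-values and change directions (+1 or −1) come from the contingency-table test described above, which is not re-implemented here. Helper names are hypothetical.

```python
from bisect import bisect_left, bisect_right

UPSTREAM_NT = 300  # window size stated in the text


def supporting_count(read_positions, site, strand, window=UPSTREAM_NT):
    """Number of RNA-Seq reads in the `window` nt upstream of a polyA site.

    read_positions: sorted genomic coordinates, one per read, on the gene's strand.
    """
    if strand == "+":
        start, end = site - window, site - 1  # upstream = lower coordinates
    else:
        start, end = site + 1, site + window  # upstream = higher coordinates
    return bisect_right(read_positions, end) - bisect_left(read_positions, start)


def rnaseq_validated(replicate_pvalues, replicate_directions, passeq_direction,
                     alpha=0.05):
    """All replicates significant and changing in the same direction as PAS-Seq."""
    return (all(p < alpha for p in replicate_pvalues)
            and all(d == passeq_direction for d in replicate_directions))


reads = [95, 100, 150, 250, 399, 400, 401, 650, 700]
plus_count = supporting_count(reads, 400, "+")   # reads at 100..399
minus_count = supporting_count(reads, 400, "-")  # reads at 401..700
```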
Classification of APA
To classify APA events into different subtypes, we first removed minor polyA sites whose average percentage of usage was less than 10% in both hypoxia and normoxia. Next, each significant APA was paired with each of the remaining polyA sites of the same gene, and each polyA site pair was compared to the known gene structure annotation. If all polyA site pairs were in the same terminal exon, there were several scenarios: (1) if the significant APA is the most distal polyA site and its usage increased from normoxia to hypoxia, or if it is the most proximal polyA site and its usage decreased, the APA is called SE-APA-P2D (proximal to distal); (2) if the significant APA is the most distal polyA site and its usage decreased from normoxia to hypoxia, or if it is the most proximal polyA site and its usage increased, the APA is called SE-APA-D2P (distal to proximal); (3) if the significant APA is a middle polyA site, it is simply called SE-APA. If all polyA site pairs were in different terminal exons, the significant APA was classified as DE-APA3 or DE-APA5 accordingly. If multiple pairs were classified into different categories (SE-APA, DE-APA3 and DE-APA5), the APA is called a “multiple type APA”. Finally, all remaining cases were called “ambiguous APA”. To corroborate the SE-APA-D2P and SE-APA-P2D classifications, we also calculated the ratio of the most distal polyA read count to the sum of the other polyA read counts in both hypoxia and normoxia. A polyA site previously classified as SE-APA-D2P was retained as “SE-APA-D2P” in the final list if this ratio was lower in hypoxia than in normoxia; conversely, a polyA site previously classified as SE-APA-P2D was retained as “SE-APA-P2D” if the ratio was higher in hypoxia than in normoxia.
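The same-terminal-exon rules and the ratio check can be sketched as follows. This sketch makes several assumptions: the retained polyA sites of the terminal exon are indexed from most proximal (0) to most distal, `usage_delta` is the significant site’s usage in hypoxia minus its usage in normoxia, and read counts are listed in the same proximal-to-distal order. The DE-APA, multiple-type, and ambiguous branches are not modeled, and all names are hypothetical.

```python
def classify_same_exon_apa(sig_index, n_sites, usage_delta):
    """Label a significant polyA site whose pairs all lie in one terminal exon.

    Sites are indexed proximal (0) -> distal (n_sites - 1).
    usage_delta: usage in hypoxia minus usage in normoxia (nonzero for a
    significant site; zero falls through to plain "SE-APA").
    """
    if n_sites < 2:
        raise ValueError("need at least two polyA sites")
    most_proximal = sig_index == 0
    most_distal = sig_index == n_sites - 1
    if (most_distal and usage_delta > 0) or (most_proximal and usage_delta < 0):
        return "SE-APA-P2D"
    if (most_distal and usage_delta < 0) or (most_proximal and usage_delta > 0):
        return "SE-APA-D2P"
    return "SE-APA"  # a middle polyA site


def distal_ratio(counts):
    """Most distal read count divided by the sum of the other read counts."""
    others = sum(counts[:-1])
    return counts[-1] / others if others else float("inf")


def corroborated(label, normoxia_counts, hypoxia_counts):
    """Ratio check used to keep a P2D/D2P label in the final list."""
    r_norm, r_hyp = distal_ratio(normoxia_counts), distal_ratio(hypoxia_counts)
    if label == "SE-APA-D2P":
        return r_hyp < r_norm
    if label == "SE-APA-P2D":
        return r_hyp > r_norm
    return False


label = classify_same_exon_apa(sig_index=0, n_sites=2, usage_delta=0.3)  # "SE-APA-D2P"
keep = corroborated(label, normoxia_counts=[10, 30], hypoxia_counts=[30, 10])
```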
Gene enrichment analysis
We used DAVID (with the PANTHER classification system) to analyze gene ontology enrichment among genes with significant APA events. Genes expressed in hypoxia (FPKM > 1), as measured by RNA-Seq, were used as the background gene set.
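Fold enrichment, as reported above (for example, 6.8-fold for “Homeostasis”), compares a term’s frequency in the APA gene list with its frequency in the background. Below is a minimal sketch of that ratio together with a plain one-sided hypergeometric p-value. DAVID itself reports a modified Fisher exact p-value (the EASE score), so this is a simplification. The helper name and the numbers in the usage line are hypothetical.

```python
from math import comb


def go_term_enrichment(k, n, K, N):
    """Fold enrichment and one-sided hypergeometric p-value for one GO term.

    k: genes in the APA gene list annotated with the term
    n: genes in the APA gene list
    K: background genes annotated with the term
    N: background genes (e.g. genes expressed in hypoxia, FPKM > 1)
    """
    fold = (k / n) / (K / N)
    # P(X >= k) when drawing n genes without replacement from the background.
    tail = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, min(K, n) + 1))
    return fold, tail / comb(N, n)


fold, p = go_term_enrichment(k=5, n=50, K=100, N=10000)
```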
CHAPTER VII
FUTURE DIRECTION
In this thesis, we have presented multiple studies identifying APA events affected by changes in the expression of different proteins (overexpression of ESRP and knockdown of αCPs and CstF64/τ). We first found that αCPs interact with C-rich motifs upstream of cleavage sites and act as enhancers of 3’ end processing for a set of genes. Next, we revealed that ESRP regulates APA in a context-dependent manner. Furthermore, we identified a wide range of APA events regulated by CstF64 alone and by CstF64/τ together, and showed that higher levels of CstF64 promote the usage of proximal polyA sites. These studies expand the current repertoire of APA regulatory proteins and provide new insights into the mechanisms of APA regulation.
The rapid growth of high-throughput sequencing technology has made it easier to detect large numbers of APA events between samples. However, few studies have examined the function of individual APA events, which may be partly due to limited knowledge of functional elements in the 3’ UTR. Efforts should therefore be made to decode the functional elements associated with mRNA stability, translation efficiency, and subcellular localization. To date, DE-APA, which affects the coding region and the 3’ UTR simultaneously, has been studied less than SE-APA. This type of APA may be functionally more important, although it is mechanistically more complicated. Future studies are needed to address this problem, especially the interplay between alternative polyadenylation and alternative splicing.
In addition, most current global analyses of APA have been conducted in transformed cells or cell lines. Similar analyses of clinical samples, or of untransformed primary cells, would be more informative. Moreover, analysis of clinical samples can help reveal whether APA is a key factor in certain diseases or only a byproduct of other key biological pathways.
Finally, as revealed by [10], a complex cooperative network of proteins is involved in 3’ end processing. Although knockdown or overexpression experiments on single genes have already provided valuable clues to APA regulation, further analysis of how different proteins act together to regulate APA will be needed to decode APA regulation more precisely.
REFERENCES
1. Watson JD: Molecular biology of the gene, 6th edn. San Francisco; Cold Spring Harbor, N.Y.: Pearson/Benjamin Cummings; Cold Spring Harbor Laboratory Press; 2008.
2. Cooper TA, Wan L, Dreyfuss G: RNA and disease. Cell 2009, 136(4):777-793.
3. Marzluff WF, Wagner EJ, Duronio RJ: Metabolism and regulation of canonical histone mRNAs: life without a poly(A) tail. Nat Rev Genet 2008, 9(11):843-854.
4. Kuhn U, Wahle E: Structure and function of poly(A) binding proteins. Biochim Biophys Acta 2004, 1678(2-3):67-84.
5. Licatalosi DD, Mele A, Fak JJ, Ule J, Kayikci M, Chi SW, Clark TA, Schweitzer AC, Blume JE, Wang X et al: HITS-CLIP yields genome-wide insights into brain alternative RNA processing. Nature 2008, 456(7221):464-469.
6. Millevoi S, Decorsiere A, Loulergue C, Iacovoni J, Bernat S, Antoniou M, Vagner S: A physical and functional link between splicing factors promotes pre-mRNA 3' end processing. Nucleic Acids Res 2009, 37(14):4672-4683.
7. Ji X, Kong J, Liebhaber SA: An RNA-protein complex links enhanced nuclear 3' processing with cytoplasmic mRNA stabilization. Embo J, 30(13):2622-2633.
8. Yan J, Marr TG: Computational analysis of 3'-ends of ESTs shows four classes of alternative polyadenylation in human, mouse, and rat. Genome Res 2005, 15(3):369-375.
9. Tian B, Hu J, Zhang H, Lutz CS: A large-scale analysis of mRNA polyadenylation of human and mouse genes. Nucleic Acids Res 2005, 33(1):201-212.
10. Shi Y, Di Giammartino DC, Taylor D, Sarkeshik A, Rice WJ, Yates JR, 3rd, Frank J, Manley JL: Molecular architecture of the human pre-mRNA 3' processing complex. Mol Cell 2009, 33(3):365-376.
11. Beaudoing E, Freier S, Wyatt JR, Claverie JM, Gautheret D: Patterns of variant polyadenylation signal usage in human genes. Genome Res 2000, 10(7):1001-1010.
12. Hu J, Lutz CS, Wilusz J, Tian B: Bioinformatic identification of candidate cis-regulatory elements involved in human mRNA polyadenylation. RNA 2005, 11(10):1485-1493.
13. Grosso AR, Gomes AQ, Barbosa-Morais NL, Caldeira S, Thorne NP, Grech G, von Lindern M, Carmo-Fonseca M: Tissue-specific splicing factor gene expression signatures. Nucleic Acids Res 2008, 36(15):4823-4832.
14. Ji Z, Lee JY, Pan Z, Jiang B, Tian B: Progressive lengthening of 3' untranslated regions of mRNAs by alternative polyadenylation during mouse embryonic development. Proc Natl Acad Sci U S A 2009, 106(17):7028-7033.
15. Sandberg R, Neilson JR, Sarma A, Sharp PA, Burge CB: Proliferating cells express mRNAs with shortened 3' untranslated regions and fewer microRNA target sites. Science 2008, 320(5883):1643-1647.
16. Mayr C, Bartel DP: Widespread shortening of 3'UTRs by alternative cleavage and polyadenylation activates oncogenes in cancer cells. Cell 2009, 138(4):673-684.
17. Andreassi C, Riccio A: To localize or not to localize: mRNA fate is in 3'UTR ends. Trends Cell Biol 2009, 19(9):465-474.
18. An JJ, Gharami K, Liao GY, Woo NH, Lau AG, Vanevski F, Torre ER, Jones KR, Feng Y, Lu B et al: Distinct role of long 3' UTR BDNF mRNA in spine morphology and synaptic plasticity in hippocampal neurons. Cell 2008, 134(1):175-187.
19. Pinto PA, Henriques T, Freitas MO, Martins T, Domingues RG, Wyrzykowska PS, Coelho PA, Carmo AM, Sunkel CE, Proudfoot NJ et al: RNA polymerase II kinetics in polo polyadenylation signal selection. EMBO J 2011, 30(12):2431-2444.
20. Takagaki Y, Seipelt RL, Peterson ML, Manley JL: The polyadenylation factor CstF-64 regulates alternative processing of IgM heavy chain pre-mRNA during B cell differentiation. Cell 1996, 87(5):941-952.
21. Yao P, Potdar AA, Arif A, Ray PS, Mukhopadhyay R, Willard B, Xu Y, Yan J, Saidel GM, Fox PL: Coding region polyadenylation generates a truncated tRNA synthetase that counters translation repression. Cell, 149(1):88-100.
22. Moore MJ, Proudfoot NJ: Pre-mRNA processing reaches back to transcription and ahead to translation. Cell 2009, 136(4):688-700.
23. Takagaki Y, Manley JL: Levels of polyadenylation factor CstF-64 control IgM heavy chain mRNA accumulation and other events associated with B cell differentiation. Mol Cell 1998, 2(6):761-771.
24. Elkon R, Drost J, van Haaften G, Jenal M, Schrier M, Vrielink JA, Agami R: E2F mediates enhanced alternative polyadenylation in proliferation. Genome Biol, 13(7):R59.
25. Martin G, Gruber AR, Keller W, Zavolan M: Genome-wide Analysis of Pre-mRNA 3' End Processing Reveals a Decisive Role of Human Cleavage Factor I in the Regulation of 3' UTR Length. Cell Rep, 1(6):753-763.
26. Castelo-Branco P, Furger A, Wollerton M, Smith C, Moreira A, Proudfoot N: Polypyrimidine tract binding protein modulates efficiency of polyadenylation. Mol Cell Biol 2004, 24(10):4174-4183.
27. Katz Y, Wang ET, Airoldi EM, Burge CB: Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat Methods, 7(12):1009-1015.
28. Berg MG, Singh LN, Younis I, Liu Q, Pinto AM, Kaida D, Zhang Z, Cho S, Sherrill-Mix S, Wan L et al: U1 snRNP determines mRNA length and regulates isoform expression. Cell, 150(1):53-64.
29. Ji Z, Luo W, Li W, Hoque M, Pan Z, Zhao Y, Tian B: Transcriptional activity regulates alternative cleavage and polyadenylation. Mol Syst Biol, 7:534.
30. Glover-Cutter K, Kim S, Espinosa J, Bentley DL: RNA polymerase II pauses and associates with pre-mRNA processing factors at both ends of genes. Nat Struct Mol Biol 2008, 15(1):71-78.
31. Singh P, Alley TL, Wright SM, Kamdar S, Schott W, Wilpan RY, Mills KD, Graber JH: Global changes in processing of mRNA 3' untranslated regions characterize clinically distinct cancer subtypes. Cancer Res 2009, 69(24):9422-9430.
32. Fu Y, Sun Y, Li Y, Li J, Rao X, Chen C, Xu A: Differential genome-wide profiling of tandem 3' UTRs among human breast cancer and normal cells by high-throughput sequencing. Genome Res, 21(5):741-747.
33. Bennett CL, Brunkow ME, Ramsdell F, O'Briant KC, Zhu Q, Fuleihan RL, Shigeoka AO, Ochs HD, Chance PF: A rare polyadenylation signal mutation of the FOXP3 gene (AAUAAA-->AAUGAA) leads to the IPEX syndrome. Immunogenetics 2001, 53(6):435-439.
34. Beaudoing E, Gautheret D: Identification of alternate polyadenylation sites and analysis of their tissue distribution using EST data. Genome Res 2001, 11(9):1520-1526.
35. Zhang H, Lee JY, Tian B: Biased alternative polyadenylation in human tissues. Genome Biol 2005, 6(12):R100.
36. Flavell SW, Kim TK, Gray JM, Harmin DA, Hemberg M, Hong EJ, Markenscoff-Papadimitriou E, Bear DM, Greenberg ME: Genome-wide analysis of MEF2 transcriptional program reveals synaptic target genes and neuronal activity-dependent polyadenylation site selection. Neuron 2008, 60(6):1022-1038.
37. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 2008, 5(7):621-628.
38. Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, Kingsmore SF, Schroth GP, Burge CB: Alternative isoform regulation in human tissue transcriptomes. Nature 2008, 456(7221):470-476.
39. Wang Z, Gerstein M, Snyder M: RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 2009, 10(1):57-63.
40. Jan CH, Friedman RC, Ruby JG, Bartel DP: Formation, regulation and evolution of Caenorhabditis elegans 3'UTRs. Nature, 469(7328):97-101.
41. Mangone M, Manoharan AP, Thierry-Mieg D, Thierry-Mieg J, Han T, Mackowiak SD, Mis E, Zegar C, Gutwein MR, Khivansara V et al: The landscape of C. elegans 3'UTRs. Science, 329(5990):432-435.
42. Shepard PJ, Choi EA, Lu J, Flanagan LA, Hertel KJ, Shi Y: Complex and dynamic landscape of RNA polyadenylation revealed by PAS-Seq. RNA, 17(4):761-772.
43. Fox-Walsh K, Davis-Turak J, Zhou Y, Li H, Fu XD: A multiplex RNA-seq strategy to profile poly(A+) RNA: application to analysis of transcription response and 3' end formation. Genomics, 98(4):266-271.
44. Derti A, Garrett-Engele P, Macisaac KD, Stevens RC, Sriram S, Chen R, Rohl CA, Johnson JM, Babak T: A quantitative atlas of polyadenylation in five mammals. Genome Res, 22(6):1173-1183.
45. Ozsolak F, Platt AR, Jones DR, Reifenberger JG, Sass LE, McInerney P, Thompson JF, Bowers J, Jarosz M, Milos PM: Direct RNA sequencing. Nature 2009, 461(7265):814-818.
46. Lee JY, Yeh I, Park JY, Tian B: PolyA_DB 2: mRNA polyadenylation sites in vertebrate genes. Nucleic Acids Res 2007, 35(Database issue):D165-168.
47. Kass RE, Raftery AE: Bayes Factors. Journal of the American Statistical Association 1995, 90(430):773-795.
48. Benjamini Y, Hochberg Y: Controlling the False Discovery Rate - a Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society Series B-Methodological 1995, 57(1):289-300.
49. Langmead B, Trapnell C, Pop M, Salzberg SL: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 2009, 10(3):R25.
50. Pauws E, van Kampen AH, van de Graaf SA, de Vijlder JJ, Ris-Stalpers C: Heterogeneity in polyadenylation cleavage sites in mammalian mRNA sequences: implications for SAGE analysis. Nucleic Acids Res 2001, 29(8):1690-1694.
51. Ji X, Kong J, Liebhaber SA: In vivo association of the stability control protein alphaCP with actively translating mRNAs. Mol Cell Biol 2003, 23(3):899-907.
52. Kiledjian M, Wang X, Liebhaber SA: Identification of two KH domain proteins in the alpha-globin mRNP stability complex. EMBO J 1995, 14(17):4357-4364.
53. Kong J, Ji X, Liebhaber SA: The KH-domain protein alpha CP has a direct role in mRNA stabilization independent of its cognate binding site. Mol Cell Biol 2003, 23(4):1125-1134.
54. Kong J, Liebhaber SA: A cell type-restricted mRNA surveillance pathway triggered by ribosome extension into the 3' untranslated region. Nat Struct Mol Biol 2007, 14(7):670-676.
55. Weiss IM, Liebhaber SA: Erythroid cell-specific determinants of alpha-globin mRNA stability. Mol Cell Biol 1994, 14(12):8123-8132.
56. Makeyev AV, Liebhaber SA: The poly(C)-binding proteins: a multiplicity of functions and a search for mechanisms. Rna 2002, 8(3):265-278.
57. Chkheidze AN, Lyakhov DL, Makeyev AV, Morales J, Kong J, Liebhaber SA: Assembly of the alpha-globin mRNA stability complex reflects binary interaction between the pyrimidine-rich 3' untranslated region determinant and poly(C) binding protein alphaCP. Mol Cell Biol 1999, 19(7):4572-4581.
58. Chaudhury A, Chander P, Howe PH: Heterogeneous nuclear ribonucleoproteins (hnRNPs) in cellular processes: Focus on hnRNP E1's multifunctional regulatory roles. RNA, 16(8):1449-1462.
59. Holcik M, Liebhaber SA: Four highly stable eukaryotic mRNAs assemble 3' untranslated region RNA-protein complexes sharing cis and trans components. Proc Natl Acad Sci U S A 1997, 94(6):2410-2414.
60. Waggoner SA, Liebhaber SA: Regulation of alpha-globin mRNA stability. Exp Biol Med (Maywood) 2003, 228(4):387-395.
61. Waggoner SA, Liebhaber SA: Identification of mRNAs associated with alphaCP2-containing RNP complexes. Mol Cell Biol 2003, 23(19):7055-7067.
62. Thisted T, Lyakhov DL, Liebhaber SA: Optimized RNA targets of two closely related triple KH domain proteins, heterogeneous nuclear ribonucleoprotein K and alphaCP-2KL, suggest Distinct modes of RNA recognition. J Biol Chem 2001, 276(20):17484-17496.
63. Ji X, Kong J, Carstens RP, Liebhaber SA: The 3' untranslated region complex involved in stabilization of human alpha-globin mRNA assembles in the nucleus and serves an independent role as a splice enhancer. Mol Cell Biol 2007, 27(9):3290-3302.
64. Ozsolak F, Kapranov P, Foissac S, Kim SW, Fishilevich E, Monaghan AP, John B, Milos PM: Comprehensive polyadenylation site maps in yeast and human reveal pervasive alternative polyadenylation. Cell, 143(6):1018-1029.
65. Wang L, Feng Z, Wang X, Zhang X: DEGseq: an R package for identifying differentially expressed genes from RNA-seq data. Bioinformatics, 26(1):136-138.
66. Ghosh D, Srivastava GP, Xu D, Schulz LC, Roberts RM: A link between SIN1 (MAPKAP1) and poly(rC) binding protein 2 (PCBP2) in counteracting environmental stress. Proc Natl Acad Sci U S A 2008, 105(33):11673-11678.
67. Bailey TL, Boden M, Buske FA, Frith M, Grant CE, Clementi L, Ren J, Li WW, Noble WS: MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res 2009, 37(Web Server issue):W202-208.
68. Sagawa F, Ibrahim H, Morrison AL, Wilusz CJ, Wilusz J: Nucleophosmin deposition during mRNA 3' end processing influences poly(A) tail length. Embo J, 30(19):3994-4005.
69. Richard P, Manley JL: Transcription termination by nuclear RNA polymerases. Genes Dev 2009, 23(11):1247-1269.
70. Di Giammartino DC, Nishida K, Manley JL: Mechanisms and consequences of alternative polyadenylation. Mol Cell, 43(6):853-866.
95
71.
Jenal M, Elkon R, Loayza-Puch F, van Haaften G, Kuhn U, Menzies FM, Oude
Vrielink JA, Bos AJ, Drost J, Rooijers K et al: The poly(A)-binding protein
nuclear 1 suppresses alternative cleavage and polyadenylation sites. Cell,
149(3):538-553.
72.
Dittmar KA, Jiang P, Park JW, Amirikian K, Wan J, Shen S, Xing Y, Carstens
RP: Genome-wide determination of a broad ESRP-regulated posttranscriptional
network by high-throughput sequencing. Mol Cell Biol, 32(8):1468-1482.
73.
Buratti E, Baralle FE: TDP-43: new aspects of autoregulation mechanisms in
RNA binding proteins and their connection with human disease. FEBS J 2011,
278(19):3530-3538.
74.
Zhu J, Chen X: MCG10, a novel p53 target gene that encodes a KH domain
RNA-binding protein, is capable of inducing apoptosis and cell cycle arrest in
G(2)-M. Mol Cell Biol 2000, 20(15):5602-5618.
75.
Naarmann IS, Harnisch C, Flach N, Kremmer E, Kuhn H, Ostareck DH,
Ostareck-Lederer A: mRNA silencing in human erythroid cell maturation:
heterogeneous nuclear ribonucleoprotein K controls the expression of its regulator
c-Src. J Biol Chem 2008, 283(26):18461-18472.
76.
Perrotti D, Cesi V, Trotta R, Guerzoni C, Santilli G, Campbell K, Iervolino A,
Condorelli F, Gambacorti-Passerini C, Caligiuri MA et al: BCR-ABL suppresses
C/EBPalpha expression through inhibitory action of hnRNP E2. Nat Genet 2002,
30(1):48-58.
77.
Molinaro RJ, Jha BK, Malathi K, Varambally S, Chinnaiyan AM, Silverman RH:
Selection and cloning of poly(rC)-binding protein 2 and Raf kinase inhibitor
protein RNA activators of 2',5'-oligoadenylate synthetase from prostate cancer
cells. Nucleic Acids Res 2006, 34(22):6684-6695.
78.
Chaudhury A, Hussey GS, Ray PS, Jin G, Fox PL, Howe PH: TGF-beta-mediated
phosphorylation of hnRNP E1 induces EMT via transcript-selective translational
induction of Dab2 and ILEI. Nat Cell Biol 2010, 12(3):286-293.
79.
Huang da W, Sherman BT, Lempicki RA: Bioinformatics enrichment tools: paths
toward the comprehensive functional analysis of large gene lists. Nucleic Acids
Res 2009, 37(1):1-13.
80.
Warzecha CC, Jiang P, Amirikian K, Dittmar KA, Lu H, Shen S, Guo W, Xing Y,
Carstens RP: An ESRP-regulated splicing programme is abrogated during the
epithelial-mesenchymal transition. EMBO J 2010, 29(19):3286-3300.
81.
Warzecha CC, Shen S, Xing Y, Carstens RP: The epithelial splicing factors
ESRP1 and ESRP2 positively and negatively regulate diverse types of alternative
splicing events. RNA Biol 2009, 6(5):546-562.
82.
Pan Z, Zhang H, Hague LK, Lee JY, Lutz CS, Tian B: An intronic
polyadenylation site in human and mouse CstF-77 genes suggests an
evolutionarily conserved regulatory mechanism. Gene 2006, 366(2):325-334.
83.
Brooks AN, Yang L, Duff MO, Hansen KD, Park JW, Dudoit S, Brenner SE,
Graveley BR: Conservation of an RNA regulatory map between Drosophila and
mammals. Genome Res 2011, 21(2):193-202.
84.
Danckwardt S, Hentze MW, Kulozik AE: 3' end mRNA processing: molecular
mechanisms and implications for health and disease. EMBO J 2008, 27(3):482-498.
85.
Millevoi S, Vagner S: Molecular mechanisms of eukaryotic pre-mRNA 3' end
processing regulation. Nucleic Acids Res 2010, 38(9):2757-2774.
86.
Veraldi KL, Arhin GK, Martincic K, Chung-Ganster LH, Wilusz J, Milcarek C:
hnRNP F influences binding of a 64-kilodalton subunit of cleavage stimulation
factor to mRNA precursors in mouse B cells. Mol Cell Biol 2001, 21(4):1228-1238.
87.
Warzecha CC, Sato TK, Nabet B, Hogenesch JB, Carstens RP: ESRP1 and
ESRP2 are epithelial cell-type-specific regulators of FGFR2 splicing. Mol Cell
2009, 33(5):591-601.
88.
Trapnell C, Pachter L, Salzberg SL: TopHat: discovering splice junctions with
RNA-Seq. Bioinformatics 2009, 25(9):1105-1111.
89.
Takagaki Y, Manley JL, MacDonald CC, Wilusz J, Shenk T: A multisubunit
factor, CstF, is required for polyadenylation of mammalian pre-mRNAs. Genes
Dev 1990, 4(12A):2112-2120.
90.
MacDonald CC, Wilusz J, Shenk T: The 64-kilodalton subunit of the CstF
polyadenylation factor binds to pre-mRNAs downstream of the cleavage site and
influences cleavage site location. Mol Cell Biol 1994, 14(10):6647-6654.
91.
Zhao J, Hyman L, Moore C: Formation of mRNA 3' ends in eukaryotes:
mechanism, regulation, and interrelationships with other steps in mRNA
synthesis. Microbiol Mol Biol Rev 1999, 63(2):405-445.
92.
Colgan DF, Manley JL: Mechanism and regulation of mRNA polyadenylation.
Genes Dev 1997, 11(21):2755-2766.
93.
Wallace AM, Dass B, Ravnik SE, Tonk V, Jenkins NA, Gilbert DJ, Copeland
NG, MacDonald CC: Two distinct forms of the 64,000 Mr protein of the cleavage
stimulation factor are expressed in mouse male germ cells. Proc Natl Acad Sci U
S A 1999, 96(12):6763-6768.
94.
Chuvpilo S, Zimmer M, Kerstan A, Glockner J, Avots A, Escher C, Fischer C,
Inashkina I, Jankevics E, Berberich-Siebelt F et al: Alternative polyadenylation
events contribute to the induction of NF-ATc in effector T cells. Immunity 1999,
10(2):261-269.
95.
Park JW, Parisky K, Celotto AM, Reenan RA, Graveley BR: Identification of
alternative splicing regulators by RNA interference in Drosophila. Proc Natl Acad
Sci U S A 2004, 101(45):15974-15979.
96.
Ji Z, Tian B: Reprogramming of 3' untranslated regions of mRNAs by alternative
polyadenylation in generation of pluripotent stem cells from different cell types.
PLoS One 2009, 4(12):e8419.
97.
Vaupel P, Mayer A: Hypoxia in cancer: significance and impact on clinical
outcome. Cancer Metastasis Rev 2007, 26(2):225-239.
98.
Chi JT, Wang Z, Nuyten DS, Rodriguez EH, Schaner ME, Salim A, Wang Y,
Kristensen GB, Helland A, Borresen-Dale AL et al: Gene expression programs in
response to hypoxia: cell type specificity and prognostic significance in human
cancers. PLoS Med 2006, 3(3):e47.
99.
Lendahl U, Lee KL, Yang H, Poellinger L: Generating specificity and diversity in
the transcriptional response to hypoxia. Nat Rev Genet 2009, 10(12):821-832.
100.
Masuda K, Abdelmohsen K, Gorospe M: RNA-binding proteins implicated in the
hypoxic response. J Cell Mol Med 2009, 13(9A):2759-2769.
101.
Gorospe M, Tominaga K, Wu X, Fahling M, Ivan M: Post-Transcriptional
Control of the Hypoxic Response by RNA-Binding Proteins and MicroRNAs.
Front Mol Neurosci 2011, 4:7.
102.
Weigand JE, Boeckel JN, Gellert P, Dimmeler S: Hypoxia-induced alternative
splicing in endothelial cells. PLoS One 2012, 7(8):e42697.
103.
Shepard PJ, Choi EA, Lu J, Flanagan LA, Hertel KJ, Shi Y: Complex and
dynamic landscape of RNA polyadenylation revealed by PAS-Seq. RNA 2011,
17(4):761-772.
104.
Morris AR, Bos A, Diosdado B, Rooijers K, Elkon R, Bolijn AS, Carvalho B,
Meijer GA, Agami R: Alternative Cleavage and Polyadenylation during
Colorectal Cancer Development. Clin Cancer Res 2012, 18(19):5256-5266.
105.
Wellmann S, Buhrer C, Moderegger E, Zelmer A, Kirschner R, Koehne P, Fujita
J, Seeger K: Oxygen-regulated expression of the RNA-binding proteins RBM3
and CIRP by a HIF-1-independent mechanism. J Cell Sci 2004, 117(Pt 9):1785-1794.
106.
Shi L, Campbell G, Jones WD, Campagne F, Wen Z, Walker SJ, Su Z, Chu TM,
Goodsaid FM, Pusztai L et al: The MicroArray Quality Control (MAQC)-II study
of common practices for the development and validation of microarray-based
predictive models. Nat Biotechnol 2010, 28(8):827-838.