Genetic Organization, Length Conservation, and

Genetic Organization, Length Conservation, and Evolution of
RNA Polymerase II Carboxyl-Terminal Domain
Pengda Liu, ,1 John M. Kenney,2 John W. Stiller,*à,1 and Arno L. Greenleaf*à,3
1
Department of Biology, East Carolina University
Department of Physics, East Carolina University
3
Department of Biochemistry, Duke University Medical Center
Present address: Department of Biochemistry, Duke University Medical Center.
àThese authors contributed equally to this article.
*Corresponding author: E-mail: [email protected]; [email protected].
Associate editor: Martin Embley
2
Research article
Abstract
With a simple tandem iterated sequence, the carboxyl terminal domain (CTD) of eukaryotic RNA polymerase II (RNAP II)
serves as the central coordinator of mRNA synthesis by harmonizing a diversity of sequential interactions with
transcription and processing factors. Despite intense research interest, many key questions regarding functional and
evolutionary constraints on the CTD remain unanswered; for example, what selects for the canonical heptad sequence, its
tandem array across organismal diversity, and constant CTD length within given species and finally and how a sequenceidentical, repetitive structure can orchestrate a diversity of simultaneous and sequential, stage-dependent interactions with
both modifying enzymes and binding partners? Here we examine comparative sequence evolution of 58 RNAP II CTDs
from diverse taxa representing all six major eukaryotic supergroups and employ integrated evolutionary genetic,
biochemical, and biophysical analyses of the yeast CTD to further clarify how this repetitive sequence must be organized
for optimal RNAP II function. We find that the CTD is composed of indivisible and independent functional units that span
diheptapeptides and not only a flexible conformation around each unit but also an elastic overall structure is required.
More remarkably, optimal CTD function always is achieved at approximately wild-type CTD length rather than number of
functional units, regardless of the characteristics of the sequence present. Our combined observations lead us to advance
an updated CTD working model, in which functional, and therefore, evolutionary constraints require a flexible CTD
conformation determined by the CTD sequence and tandem register to accommodate the diversity of CTD–protein
interactions and a specific CTD length rather than number of functional units to correctly order and organize global CTD–
protein interactions. Patterns of conservation of these features across evolutionary diversity have important implications
for comparative RNAP II function in eukaryotes and can more clearly direct specific research on CTD function in currently
understudied organisms.
Key words: RNAPII CTD, transcription, functional organization, evolutionary conservation.
Introduction
Proper gene regulation underlies cellular differentiation,
morphogenesis, and the versatility and adaptability of
any organism. Alterations in regulation are important
in evolutionary change because control of the timing,
location, and level of gene expression can have profound
effects on protein function and, therefore, the phenotype
and fitness of the organism. Studies over decades have
revealed that many steps in gene expression initially
viewed as independent are intricately coordinated, interconnected, and regulated in a sophisticated network (for
reviews, see Maniatis and Reed [2002] and Proudfoot
et al. [2002]). The central coordinator that couples this
regulatory network together, along with many other
functions carried out in the cell’s nucleus, is RNA polymerase II (RNAP II); of particular importance is the
stage-specific differentially modified sequence-repetitive
carboxyl terminal domain (CTD) of its largest subunit
(Rpb1).
Studies in yeast and animals have demonstrated that the
CTD serves as a scaffold for the assembly of protein complexes that help coordinate not only mRNA biogenesis, including transcription initiation (Kim et al. 1994), promoter
clearance (Corden 1993), histone methylation (Hampsey
and Reinberg 2003), capping (Cho et al. 1997; McCracken,
Fong, Rosonina, et al. 1997), elongation (Corden and
Patturajan 1997), RNA processing (Proudfoot et al. 2002)
(editing [Ryman et al. 2007], splicing [de la Mata and
Kornblihtt 2006], polyadenylation and 3# cleavage
[McCracken, Fong, Yankulov, et al. 1997]), and termination
(Gudipati et al. 2008; Vasiljeva et al. 2008) (and see review
[Phatnani and Greenleaf 2006]), but also snRNA gene expression (Jacobs et al. 2004; Egloff et al. 2007; Egloff and
Murphy 2008a). These widely varied functions of the
CTD are accomplished through diversity in recognition
of the step-dependent modification patterns on the
CTD repeats, sometimes referred to as the ‘‘CTD code’’
(Buratowski 2003; Corden 2007; Egloff and Murphy
© The Author 2010. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please
e-mail: [email protected]
2628
Mol. Biol. Evol. 27(11):2628–2641. 2010 doi:10.1093/molbev/msq151 Advance Access publication June 17, 2010
RNAP II CTD Functional and Evolutionary Conservation · doi:10.1093/molbev/msq151
2008b). Although crystal structures of RNAP II have been
solved at high resolution, the CTD is an exception; it appears to maintain an unordered structure during the whole
transcription process, permitting interactions with different sets of CTD modifying enzymes, and binding factors
(Meinhart et al. 2005; Phatnani and Greenleaf 2006).
The CTD comprises tandemly repetitive heptapeptides
with the consensus sequence Tyr1-Ser2-Pro3-Thr4-Ser5Pro6-Ser7 (YSPTSPS) and occurs in all green plants, yeasts,
and animals tested to date, as well as in many protists including Ameobozoa, Microsporidia, chytrids, and choanoflagellates (Stiller et al. 2000; Stiller and Hall 2002; Stiller and
Cook 2004). Its tandem structure is proposed to have arisen
through amplifications of a repetitive DNA sequence
(Chapman et al. 2008). The CTD is essential for viability
in animals and yeasts, where it has been studied most extensively. The number of repeats present is somewhat proportional to genomic complexity, with multicellular
animals and plants typically containing longer CTD regions
than yeasts and other simpler forms (fig. 1 and supplementary fig. 1, Supplementary Material online).
The consensus sequence and tandem structure are evolutionarily conserved from yeast to human but, interestingly, not among all eukaryotes. Initial phylogenetic
analyses of widely diverse organisms indicated that the
CTD’s canonical sequence and repetitive structure was
conserved strongly only in a set of taxonomic groups that
was named the ‘‘CTD clade’’ (Stiller and Hall 2002), which
clustered together in trees constructed from RPB1 sequences. Phylogenomic studies generally indicate that this ‘‘CTD
clade’’ is not a natural evolutionary group, and broader
sampling of diverse eukaryotes show that the pattern
of CTD conservation is more complicated than it first
appeared (fig. 1).
In addition to the complete or nearly complete degeneration of the CTD in eukaryotic groups outside the ‘‘CTD
clade,’’ certain groups of fungi have lost the well-ordered
CTD structure found in yeasts (fig. 1). Moreover, organisms
with tandem repeats of various alternative sequences are
found, for example, in Mastigamoeba invertens (YSPASPA),
Plasmodium falciparum (YSPTSPK), and the red algae
Glaucosphaera vacuolata (YSPTSPT/H/Q) and Cyanidioschyzon merolae (YSPSSPVNA).
Although CTDs from different species vary in length and
amount of degenerative sequence, some consistent patterns have emerged. The consensus heptapeptide and
the tandem structure are conserved across a wide range
of organisms and, even when not, the overall pattern of
CTD-like motifs remains strongly conserved (fig. 1). Moreover, both length and relative pattern of canonical and degenerate regions are relatively conserved among related
taxa (fig. 1). This raises a number of questions that are best
addressed through integrated evolutionary and functional
analyses. What selects for the specificity of the canonical
heptad across broad organismal diversity? Why is this heptad maintained in a global tandem register? Why are both
these features conserved strongly in some eukaryotic lineages, whereas one or both are altered dramatically in
MBE
others? What selection pressures are responsible for
maintaining a constant CTD length? Finally, how can
a sequence-identical, repetitive structure orchestrate
such a diversity of simultaneous and sequential, stagedependent interactions with both modifying enzymes
and binding partners?
In this article, we report novel results from genetics, biochemical, and biophysical analyses in budding yeast, to develop a model of the evolution of sequence identical but
functionally segregated CTD essential units and the selective constraints that keep them arranged in a tandemly repeated, overlapping global structure. We propose that the
globally unordered protein structure conferred by tandem
repeats of the canonical heptapeptide is critical for CTD
function, particularly when large numbers of protein interactions must be accommodated and that differences in the
CTD’s workload through the transcription cycle likely
underlie variation across eukaryotic lineages.
Materials and Methods
Construction of Artificial CTD Sequence and Yeast
Transformation
Artificial oligonucleotides with various substitutions or insertions were used to construct mutated CTDs, subcloned
into a yeast RNAP II shuttle vector and transformed into
yeast cells via the plasmid shuffle as described in detail previously (Liu et al. 2008).
Construction and purification of GST–CTD fusion proteins, in vitro phosphorylation assays for GST-tagged CTD
mutants and quantification of relative levels of phosphorylation were as described previously (Liu et al. 2008).
Construction and Purification of Tag-Free CTD
Mutant Proteins via Intein Tag System
Mutated CTD sequences were polymerase chain reaction
amplified from pSBO-CTD constructs and cloned into
the Escherichia coli pTXB1 expression vector (New
England Biolab: http://www.neb.com/nebecomm/products
/productN6707.asp) polylinker region, upstream of the intein/chitin-binding domain from the Mycobacterium
xenopi GyrA gene, to generate a series of C-terminal
intein-tagged CTD mutants. Two sets of primers were
used for sequences from wild type (WT) and mutants,
in order to reduce non-CTD vector sequence in the protein products. pTXB1 F WT: 5#-GGT GGT CAT ATG TTT
TCT CCA ACT TCC CCA-3# and pTXB1 R WT: 5#-GGT
GGT TGC TCT TCC GCA TGC AGG AGA TCC TGG
GCT-3# were designed for WT, whereas pTXB1 F6: 5#GTG GAT CAT ATG GGG ATC CTT GGA GTC TCC-3#
and pTXB1 R3: 5#-GTA GCT TGC TCT TCC GGT CAT
AGC ATA GGG GTA GCT-3# were designed for all other
CTD mutants. Those two sets of primers introduced SapI
and NdeI restriction sites into flanking sequences of WT
and artificial CTDs. Transformants were sequenced to verify integrity. Vectors encoding various CTD fusion proteins were transformed into the E. coli strain PR2655
for intein-tagged fusion protein expression.
2629
Liu et al. · doi:10.1093/molbev/msq151
MBE
FIG. 1. Conservation and degeneration of the CTD across eukaryotic diversity. CTDs from individual taxa are aligned by recognizable
heptapeptide motifs or by aligning tyrosine residues where not even degenerate heptapeptides can be identified. Green regions correspond to
sequences that conform to the essential CTD functional units indentified in yeast, yellow to conserved heptapeptides that do not have proper
adjacent sequences to form a functional unit, and red to regions with substitutions that lethally disrupt CTD function in yeast. Full sequences
with color coding are provided in supplementary figure 1 (Supplementary Material online). Taxonomic designations and phylogenetic
associations are based on the Tree of Life Web Project (http://www.tolweb.org/tree/), with tentative relationships that remain controversial
shown by dashed lines. Within larger taxonomic groups, and between related groups, the pattern of conservation or degeneration tends to be
consistent. For example, in both animals and green plants, proximal heptads are arrayed consistently in viable functional units, whereas distal
CTD sequences tend toward more degeneracy. In contrast, conserved heptads occur more distally in basidiomycete fungi. In other groups, such
as ciliates and parasitic excavate taxa, there appears to be no purifying selection on CTD structure whatsoever. The functional significance this
variation, with respect to how RNAP II transcription is organized remains unclear. Although not shown in this figure, within Plasmodium
falciparum and related species, there is extensive variation in the number of tandem functional units present, which could reflect coevolution
with mammalian host transcription systems (Kishore et al. 2009).
A fresh single colony carrying the correct mutated CTD
sequence in pTXB1 vector was inoculated directly into 1L
LBþAMP liquid medium and incubated in an orbital shaker
at 37 °C until the optical density (OD)600 reached 0.5, then
0.4 mM isopropyl b-D-1-thiogalactopyranoside was added
for induction of protein expression. The culture was incubated for another 4 h, and the cells were spun down at
5,000 revolutions per minute (rpm) at 4 °C; 100 ml of column buffer (20 mM Na-4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid (HEPES), pH 8.5; 500 mM NaCl, and 1
2630
mM ethylenediaminetetraacetic acid [EDTA]) was added
and the pellet was sonicated three times at 50% duty cycle
for 45 s each. Cell debris was removed by centrifugation at 25
K relative centrifugal force (rcf) for 30 min at 4 °C. The clarified cell extract was loaded on pre-equilibrated Chitin columns packed with 10 ml chitin beads with column buffer
and washed by at least 20 bed volumes of column buffer.
On column, cleavage of the intein tag from the intein–
CTD fusion protein was conducted by quickly flushing
the column with three bed volumes of the cleavage buffer
RNAP II CTD Functional and Evolutionary Conservation · doi:10.1093/molbev/msq151
(20 mM Na-HEPES, pH 8.5; 50 mM NaCl; 50 mM
Dithiothreitol, and 1 mM EDTA) to evenly distribute thiols
through the column. Then the column flow was stopped
and the column was left at 4 °C for 12 h for total intein tag
cleavage. Target protein without the intein tag was eluted
out using the column buffer, and elution efficiency was monitored by OD280 reading. Elution fractions were resolved on
4–20% sodium dodecyl sulfate polyacrylamide gel electrophoresis gels with silver staining. Concentrations of tag-free CTD
fusion proteins were determined using an UV/vis photometer
at manufacturer’s standard settings (Biophotometer,
Eppendorf, Hamburg).
Growth Curve Experiments
Three different single colonies for ‘‘WT’’ controls (pY1transformed yeast) and each CTD mutants were picked
from 5-Fluoroorotic acid plates, inoculated into 10 ml liquid YPD medium, and incubated on a shaking incubator at
250 rpm overnight. OD600 readings were monitored for
each sample starting around 0.1. For each reading, 50 ll
of culture was pipetted out into a 1-mm cuvette, and
the reading was repeated twice and averaged. The cell
number was calculated based on the assumption that
an OD600 of 0.1 equates to 1 106 cells. The cell number
calculation was confirmed by hemocytometer counting.
Measurements were taken until concentrations exceeded
and OD600 of 1.0, and all data were input into excel and
analyzed by the following formula: doubling time 5 ln2/
((ln (A/Ao))/t), where A 5 cell density (e.g., OD) at time
t and A0 5 initial cell density.
Protein Secondary Structure Prediction
Protein secondary structure predictions were performed
on Imperial College Phyre Server (http://www.sbg.bio.ic
.ac.uk/phyre/) using five different methods. Different
CTD mutant protein sequences in FASTA format were input into the prediction window and the results were obtained from the server Web page.
Circular Dichroism Measurements
All circular dichroism (CD) measurements were carried out
on a Jasco J-720 CD machine equipped with a temperature
controller. CD spectra were the average of eight scans. All
CTD spectra were recorded on protein samples in pure water at room temperature. CD parameter settings are as follows: sensitivity, standard (100 mdeg); start, 270 nm; end,
180 nm; data pitch, 0.1 nm; scanning model, continuous;
scanning speed, 50 nm/min; response, 4 s; band width,
2 nm; and accumulation: 8.
Results
The CTD Is Composed of Indivisible and
Independent Functional Units that Span
Diheptapeptides
Because of its tandemly iterated structure, a clear explanation of how the CTD sequence determines CTD function
has been elusive. Nevertheless, genetic investigations in
MBE
both yeast and animals have shown that in a given heptad
repeat, Y1, S2P3, and S5P6 are essential for CTD function
(West and Corden 1995); in contrast, neither T4 nor S7
is required for viability in yeast (Stiller et al. 2000). S7 phosphorylation has been shown to be critical for small nuclear
RNA transcription (Chapman et al. 2007; Egloff et al. 2007)
in mammalian cells and recently also was observed in yeast
(Akhtar et al. 2009); however, given the viability of complete S7 / A7 Saccharomyces cerevisiae CTD variants,
the significance of S7 phosphorylation for RNAP II transcription in yeast is unclear. In addition, a variety of substitution mutants are viable, including those with partial
changes at essential positions (e.g., in half of all S2 residues)
(West and Corden 1995), demonstrating that a great deal
of flexibility is permitted in primary CTD structure. With
respect to a globally conserved tandem structure, Ala insertions between adjacent repeats are lethal in yeast in
any register, whereas Ala residues inserted between pairs
of heptapeptides are well tolerated (Stiller and Cook
2004); this demonstrates that the irreducible CTD functional
unit lies within paired heptads. Our genetic analyses of a series of yeast CTD substitution and insertion mutants within
or between diheptads have further narrowed this indivisible
unit down to two loosely linked sequence elements maintained in paired heptapeptides. Specifically, ‘‘S2P-S5P-S9P’’
must be present and associated with two tyrosines spaced
seven amino acids apart in either a ‘‘Y1–Y8’’ (YSPTSPSYSP)
or ‘‘Y8–Y15’’ (SPTSPSYSPTSPSY) orientation (Stiller and
Cook 2004; Liu et al. 2008). For example, yeast mutants
are viable with repeats containing only a minimal sequence
of these two essential elements (Y1S2P3T4S5P6S7Y1S2P3T or
‘‘252,’’ named after the positions of essential phosphoserines), with alanines replacing the right-hand S5P6S7 residues
(see construct ‘‘AR’’ in table 1).
We constructed a series of CTD variants, containing 6, 9,
and 12 repeats, respectively, of the ‘‘252’’ minimum required sequence (YSPTSPSYSPS). Although these CTDs
are missing the last three residues (SPS) of every other heptad completely, strains with 9 or more tandem copies of
this sequence grew almost like WT (table 1, construct
252). These results show that the minimal (252) sequence,
as defined by substitution mutants within a global heptad
register, is sufficient to confer virtually WT CTD function,
even outside the normal positional spacing of tandemly repeated heptapeptides (in these variants, the repeating unit
is an 11-mer). In ‘‘252’’ CTDs, the absence of the last ‘‘SP’’
pair in all second repeats of diheptads eliminates the potential for them to serve as proximal heptads in overlapping units. This constrains the mutated CTD to a string
of nonoverlapping individual minimal functional units
and eliminates almost half the essential units potentially
present for a given CTD length. These features also apply
to many other viable CTD mutants we analyzed (table 1,
rows AR, APS, and A7). Thus, the overall sequence required
for most or all CTD functions is not the tandemly repeated
heptads present but, rather, repeated units of three consecutive Ser-Pro pairs interspersed with Tyr residues that
are spaced at a heptad interval.
2631
MBE
Liu et al. · doi:10.1093/molbev/msq151
Table 1. Sequences and Phenotypes for CTD Mutants of Rpb1.
CTD Mutant
AL
AR
YAA
YATA
YAS
YAP
APS
252
AA
5A
7A
A7
AP
Repeated Sequence
ASPTSPS YSPTSPS
YSPTSPS YSPTAAA
YAATSPS YSPTSPS
YAPTAPS YSPTSPS
YASPTSPS YSPTSPS
YAPTSPS YSPTSPS
YSPTAPS YSPTSPS
YSPTSPS YSPS
AA YSPTSPS YSPTSPS
AAAAA YSPTSPS YSPTSPS
AAAAAAA YSPTSPS YSPTSPS
AAAAAAA YSPTSPS YSPTSPS YSPTSPS
AAPAAPA YSPTSPS YSPTSPS
Cell Phenotype at
around WT CTD Length
Lethal
Viable 1111
Lethal
Lethal
Viable 111
Lethal
Viable 1111
Viable 11111
Viable 1111
Slow growth 1
Lethal
Viable 111
Viable 1111
The ‘‘þ’’ marks for each viable CTD mutant indicate the relative vigor of the yeast cells bearing the mutants, compared with WT cells (WT is labeled as five pluses).
If the minimum essential yeast CTD unit is, indeed, the
sequence identified genetically, how is this reconciled with
observations, for example, that Candida albicans capping
enzyme (Cgt1) requires at least three repeats to bind
the CTD (Fabrega et al. 2003)? A close examination of
the Cgt1-binding motifs on three CTD repeats provides
the answer: the two major Cgt1-binding motifs, CDS1
and CDS2, are associated with two distinct CTD essential
units, with the intervening CTD segment looped out and
not in contact with the capping enzyme (Meinhart et al.
2005). Thus, effective CTD–Cgt1 interaction requires contact with more than one minimum CTD unit and enough
flexibility between them to permit cooperative binding.
Flexible Protein Structure around Each Essential
Unit Is Fundamental for CTD Function
If CTD functional units do not need to overlap, then what
relationships do they have to each other and why are they
conserved in a tandemly overlapping structure? The nearly
WT growth of variant AR (YSPTSPSYSPTAAA)12 (Liu et al.
2008) (table 1) demonstrates that core functional units can
be separated without major disruptions of CTD function.
This case, perhaps, is not surprising given that AR mutants
retain the normal heptad spacing between core CTD units.
In contrast, additional distance between essential elements
results in a progressive decline in CTD efficiency (table 1,
rows AA, 5A); function is compromised fully when units are
separated by seven Ala residues (table 1, row 7A) (Stiller
and Cook 2004). Initially, this suggested that at least some
essential functions require core units to be in relatively
close proximity, for example, to accommodate direct protein steric interactions with multiple, adjacent CTD binding
sites (Stiller and Cook 2004). However, we have found (see
later results) that polyalanine insertions (e.g., AAAAAAA)
engender the formation of stable secondary structures that
could be detrimental to CTD functions. Alanine residues
show the highest propensity to form helices (Chakrabartty
et al. 1994); short poly-Ala peptides have been observed to
adopt a-helical (Chakrabartty et al. 1994) and partial
a-helical (Kentsis et al. 2004) structures and also have been
observed in b-sheet or random coil (Toniolo et al. 1979),
2632
polyproline II (Shi et al. 2002), amyloid (Gasset et al. 1992),
and helical dimer (Kohtani et al. 2004) conformations. The
specific conformation adopted appears to depend on the
length of the poly-Ala stretch (Blondelle et al. 1997), the
broader protein context (Scheuermann et al. 2003), and
properties of the aqueous solution (Peng et al. 2003).
To test this interpretation, we constructed CTDs with
same number of intervening residues (seven) between diheptads but disrupted the 2° structure by replacing Ala
with Pro as in the same locations in the canonical CTD sequence (e.g., AAPAAPA); this change restores relatively vigorous growth (table 1, row AP). Further, we inserted seven
alanines between canonical triheptapeptides (A7 in table 1),
thereby increasing the amount of normal, structurally unordered sequence around each given functional unit, and
these mutants also grew vigorously. Thus, it is the quality of
the inserted sequence, rather than the physical distance
between functional units, that is critically important.
The respective behaviors of 7A and A7 mutants (table 1)
are consistent with presence of stable poly-Ala structures
that interfere with functions if they are too close to adjacent essential functional units. In turn, because prolines interfere with helix formation, the viability of AP mutants
(with seven residue-long spacers containing prolines) (table
1, row AP) suggests that secondary structures adopted by
7A inserts may be helices that are absent when Pro residues
interrupt the Ala stretches. Thus, our cumulative genetic
results suggest that the essential repeating units in the
CTD can function when separated by seven residues, as
long as the separating sequences do not adopt stable secondary structures that impinge on protein interactions
with adjacent heptapeptides.
We further investigated these genetics-based conclusions using physical–chemical approaches. First, as has
been noted previously, we found that protein secondary
structure predictions indicate a random coil predominates
in WT repeats (supplementary fig. 2, Supplementary Material online); in contrast, the same prediction programs
suggest that some helical structure is introduced by insertion of 7A spacers. To test these computational predictions, we analyzed WT and 7-alanine–mutated CTDs by
RNAP II CTD Functional and Evolutionary Conservation · doi:10.1093/molbev/msq151
MBE
FIG. 2. CD spectra of WT CTD peptide and CTD mutant peptides under different conditions. (A) CD spectra of WT, AP, 7A, and A7 proteins in
pure water. The spectra are the average of eight accumulations for each protein. WT and AP are in the upper group with similar predominant
random coil structure, whereas 7A and A7 are in the lower group with a majority of helical structure. The WT and 7A curves were adjusted to
the same scale as AP and A7 by magnifying the signal. (B). CD temperature scan spectra for WT, AP, 7A, and A7 proteins. All the four protein
samples were measured under different temperatures ranging from 10 °C to 60 °C at 10 °C intervals; ‘‘back to 10’’ indicates that temperature
was directly dropped from 60 °C to 10 °C. With increasing temperatures, both WT and AP proteins became more unordered, and when the
temperature dropped from 60 °C back to 10 °C, the spectra were not exactly the same as when beginning at 10 °C. These results suggest that
changes in conformation from more ordered to less ordered as temperatures increased were not easily reversible and that the WT protein was
much harder to bring back to its original conformation than was the AP mutant. As no solid material was observed in the sample cuvette at
the end of the temperature scan measurements, the nonreversibility of the protein structures was not due to protein precipitation. In contrast,
the 7A protein demonstrated a greater ability to recover its original structure when the temperature returned to 10 °C.
CD spectroscopy. The CTDs analyzed were tag-free proteins purified using an intein self-cleavage system (see
Methods). As expected, the WT CTD produced spectra
consistent with predominantly random coil (fig. 2A). In
contrast, both 7A and A7 CTDs produced spectra indicating significant presence of helical structure (fig. 2A). Finally,
the AP CTD spectrum was extremely similar to that of WT,
again indicating predominantly random coil (fig. 2A).
The secondary structures formed in the 7A and A7 CTDs
appear to be quite stable. A isodichroic point was observed
in the A7 CD melting spectrum around 200 nm (Fig. 2B),
suggesting a two-state helical-coil ellipticity model (Gans
et al. 1991); however, more rigorous testing is needed to
support this the two-state model because an isodichronic
point at the shorter wave length does not rule out the possibility that local dichroic chromophores could significantly perturb ellipticity only at longer wavelengths
(Wallimann et al. 2003). In any case, its closer match to
a standard a-helical CD spectrum, along with the presence
of a isodichronic point in its melting curve, do suggest
that stronger helical structures are formed in the A7
CTD than in 7A.
2633
Liu et al. · doi:10.1093/molbev/msq151
It is noteworthy that polyalanine helical structures are
more stable with three intervening CTD heptads (A7:
AAAAAAA YSPTSPS YSPTSPS YSPTSPS AAAAAAA), yet
do not have dramatically negative impacts on CTD function. In contrast weaker helical structures directly next to
essential CTD motifs (7A: AAAAAAA YSPTSPS YSPTSPS
AAAAAAA) are lethal. These combined genetic and structural results indicate that placing ordered structures directly next to essential heptad motifs (underlined
sequences shown above) impinges on the interactions between CTD functional units and binding partners, In contrast, unordered sequence on either side of each functional
unit (A7) allows those proper interactions, even if more
stable structural features are present nearby. This supports
our model that strong secondary structures too close to
CTD functional units are responsible for the deleterious
and lethal effects of polyalanine insertions.
To test whether secondary structures in A7 or 7A CTDs
could affect CTD kinase interactions with these proteins,
we compared phosphorylation of WT and mutant CTDs
by human CDK7 and CDK9 kinases, as well as yeast CTDKI in vitro (Fig. 3). Consistent with their lethality, 7A CTDs, with
only two heptads between each 7-alanine stretch (table 1), did
not function detectably as substrate for any of the CTD
kinases. In contrast, the A7 CTDs, with three heptads between
each7-alanine stretch, functionedas effective substrates for all
three kinases. Both viability and phosphorylatability of 7-alanine CTDs with longer canonical sequences between Ala
stretches (compare 7A and A7, table 1) are consistent with
the hypothesis that Ala-dependent secondary structures interfere with accessibility of critical residues in adjacent functional units. Thus, CTD units of sufficient length (e.g., three
heptads) can achieve significant levels of function, even when
separated by incompatible physical structures. Our results
support the following model: if canonical segments between
interfering structural features are long enough to allow access
of enzymes and binding proteins to core CTD units, essential
CTD functions are accommodated, although not necessarily
with WT efficiency.
Taken together, genetic, biochemical, and biophysical
analyses indicate that introducing strong structural features into the CTD sterically hinders access to nearby essential functional units. This would appear to put serious
constraints on the kinds of evolution in primary sequence
that are permitted. Specifically, selection should favor
maintenance of a globally unstructured domain in which
each functional unit remains fully available for requisite
protein interactions.
Although introduction of stable structure elements can
be lethal, we also should note that mutants with short (2alanine) insertions exhibit slower growth and lower phosphorylation efficiencies than cells with a WT CTD (Liu et al.
2008) (table 1 and fig. 3). As there is no suggestion that two
alanines should form stable secondary structures, this indicates that pushing apart the core elements or, more precisely, displacing them from their normal physical locations
with respect to the transcription/processing apparatus,
also is somewhat deleterious to CTD function.
2634
MBE
CTD Length rather than a Number of Essential
Units Determines the CTD Function
Like the positioning of essential units and canonical heptads, overall length also is a conserved feature of the CTD;
however, yeast cells growing in favorable laboratory conditions do not have obvious functional deficiencies when only 13 of the normal 26 CTD repeats are present (West and
Corden 1995). Combined with the likelihood that random
length variation will be introduced frequently in repetitive
sequences (Nonet et al. 1987), this begs the question, why is
the number of CTD repeats so strongly conserved within
and among closely related taxa? Without measurable phenotypic variation between strains with slightly different
CTD length, assessing potential selective advantage of
one length over another is a difficult proposition. On
the other hand, given the identification of CTD core essential elements in yeast, along with the knowledge that strains
with CTDs of about normal overall length but with nonconsensus repeat sequences frequently show significant
growth defects (table 1: rows YAS, A7, and AP), it became
possible to address several questions about constraints on
CTD length. For example, what is most important for dictating optimal function: the total overall length of the CTD,
the absolute number of essential sequence elements present, or the spacing of the elements along the CTD? To explore these issues, we performed growth experiments on
a series of yeast strain sets containing varying lengths of
mutated CTDs (fig. 4). For each strain set, the CTD contained different numbers of a particular repeating unit
(e.g., ‘‘AR’’ in fig. 4 and table 1). Each strain set thus contained CTDs that were shorter than, equal to, or longer
than the normal WT length (26 repeats 7 aa/
repeat 5 182 aa). The repeating units used to construct
the collection of CTDs included a substitution mutant that
maintained the tandemly repeated heptad register but
with an altered sequence (AR), another with repeats truncated to the essential functional unit (252), and four that
contained insertions up to seven residues long between
sets of tandemly paired heptads (YAS, 5A, A7, and AP).
Surprisingly, regardless of the sequence repeated, the
length variant in each strain set showing the highest
growth rate always contained the CTD with nearest to normal length (182 aa) (fig. 4B), rather than the CTD with
a WT-equivalent number of essential functional units
(fig. 4C). For example, the variant AR(84) (the length of each
mutated CTD is presented by the total number of amino
acids contained that is described as a subscript next to the
name of the CTD mutant) is viable but contains only six
essential functional units, fewer than the seven units contained within the eight WT repeats necessary for viability.
This result suggests that it is CTD length rather than number of essential units that governs functionality of the CTD
(as reflected in the growth rate). Our other results are consistent with this suggestion. For example, variant CTDs with
additional residues inserted between heptad pairs or triplets (5A, A7, AP) further reduce the number of essential
functional units per length of CTD (see fig. 4A), yet they
RNAP II CTD Functional and Evolutionary Conservation · doi:10.1093/molbev/msq151
MBE
FIG. 3. In vitro phosphorylation assay for different polyalanine insertion mutant proteins. Relatively similar amounts of GST–CTD fusion
proteins (except for WT which was usually present at somewhat lower amounts) were incubated with different CDKs for 2 h at 30 °C (details
in Materials and Methods). Half of each reaction mixture was resolved by SDS-PAGE; the gels were Coomassie stained and then exposed to an
Amersham screen for 30 min and scanned on a Typhoon phosphorimager. The amount of phosphates and proteins on the gel was both
quantified by imagequant software, and the relative phosphorylation (phosphate/amount of protein) of each mutant protein was normalized
to WT value (defined as 1.00 for each kinase), as shown in the charts in the figure.
support viability. Variant AP(168), for example, is equivalent
in length to a WT CTD with 24 heptad repeats and 23þ
essential units (fig. 4A), but the variant CTD contains only 8
essential functional units. The fact that the AP(168) strain
grows extremely well (fig. 4B,4C) strongly supports the idea
that CTD length influences CTD function much more than
does the number of essential minimal units.
Although overall length is clearly a very important feature determining optimal CTD function, it is also clear that
none of the nonconsensus CTD variant strains grows as
well as WT cells with a normal length CTD. This is not surprising because both primary sequence and spacing of repeat units have been known for some time to affect CTD
function (West and Corden 1995; Stiller and Cook 2004; Liu
2635
Liu et al. · doi:10.1093/molbev/msq151
2636
FIG. 4. Growth rates of several viable CTD mutants over a range of CTD lengths. (A) The sequences of the viable mutants tested around WT length in this study were demonstrated. The overall
length (number of amino acids) is indicated on the top of sequence, and the essential functional units in each constructs are underlined in green. The number of repeats for each CTD mutant is
labeled behind the name of the mutant. (B) Substitution mutant (AR) and insertion mutants (YAS, 5A, and A7) with increasing number of repeats, from fewer to far more than natural CTD length,
were cultured under the same conditions and the cell growth was monitored by measuring OD600 at 30 °C. Three different colonies for each mutant were grown, and overall doubling times were
averaged. For all insertion and substitution mutants, peak growth rates were consistently achieved nearest the natural CTD length. Both shortening and lengthening the CTD resulted in reduced cell
growth. Average growth rate for control cells with 26 WT repeats is shown by a red dot. (C) The same data plotted against the number of functional units instead of the total CTD length as shown in
panel (B). No consistent result indicating a certain number of essential units were critical for CTD function emerges. (D) Two CTD constructs with nearest WT growth rate at near WT length were
tested for growth rate under cold (15 °C) or heat shock (37 °C). Consistent with our model, the mutants grew much slower under stressful conditions (15 °C and 37 °C) than normal (30 °C), whereas
almost no difference is observed in WT cells. This is consistent with the hypothesis that denser packing of functional CTD units in the WT sequence better accommodate conditions requiring greater
CTD–PCAP interactions, and that pre-established PCAP binding on the CTD affects the efficiency of interactions with additional stress response PCAPs.
MBE
RNAP II CTD Functional and Evolutionary Conservation · doi:10.1093/molbev/msq151
et al. 2008). Indeed, among the mutants we studied, all
grew much more slowly than WT, except for two: 252
and AR variants of about normal length. In fig. 4A we
see that variant 252 retains the highest density of essential
units in the CTDs tested, other than WT (fig. 4). The figure
also reveals that the AR CTD contains both a reasonably
high density of essential units and the normal heptad register (e.g., fig. 4A). These observations are congruent with
what is seen in the evolutionary history of the CTD in major
lineages like animals and green plants, where tandem structure is more conserved than primary sequence in degenerate heptads (fig. 1). Thus, it appears that both the
combination of overall CTD length and its tandem heptapeptide register are under strong purifying selection in
these groups.
One additional novel result emerged from the comparative growth analyses. When preparing the constructs, we
noticed that one AR mutant contained an inadvertent
frameshift after S7 residues, which resulted in deletion of
the last 13 residues present in the yeast WT CTD control
(YPYDVPDYNENSR). We measured the growth rate of this
mutant and found that it grew much more slowly than
a variant with a similar length but a normally positioned
stop codon (data not shown). This indicates that, even
though the last 13 residues of S. cerevisiae Rpb1 are not
essential for cell viability, as shown by this particular AR
mutant, they play some important role in CTD function.
To date, there have been no reports concerning such a function in yeast; in contrast, in humans, the last 10 CTD residues have been linked to CTD stability (Chapman et al.
2004, 2005).
Discussion
CTD Length and Organization of CTD-Associating
Proteins
Extant results clearly support the idea that CTD repeats are
functionally redundant. If this is so, then why are extra repeats
retained and overall length strongly conserved within species?
One model is that a certain length is required to maintain an
optimal ‘‘loading platform’’ for CTD- and phosphoCTD-associating proteins (PCAPs) that will accommodate different
configurations of these factors needed to survive all the conditions to which the cells are exposed. We propose three additional aspects of this model. The first is that binding of
protein complexes responsible for ‘‘essential functions’’
(e.g., those required under permissive laboratory growth conditions) determines the minimum number of repeats (the essential scaffold) and confers strong purifying selection on CTD
regions that retain overlapping tandem essential units (e.g.,
proximal CTD regions in plants and animals). Second, length
beyond the minimum provides room to bind proteins involved in additional or accessory functions not usually required under laboratory growth conditions. Third,
noncanonical repeat regions provide landing pads (or space)
for proteins involved in more taxon-specific functions.
That the requirements of an ‘‘essential scaffold’’ determine the minimal CTD length is consistent with observa-
MBE
tions from CTD truncation mutants. For example,
truncation of the yeast CTD to 10 repeats or substitution
with 7 WT and 8 S2A repeats showed conditional growth
(West and Corden 1995) but had little if any effect on basal
transcription factor recruitment to promoters or on production of mature transcripts tested from constitutively
expressed genes (Ahn et al. 2009). In contrast, constructs
that retained less than the minimal viable CTD length
showed severe defects in most or all essential RNA processing events (Custodio et al. 2007). A cautionary note for interpreting results from truncation studies in mammalian
cells is that the last 10 residues of the WT CTD are responsible for CTD stability (Chapman et al. 2004, 2005). Thus,
mammalian cell studies in which CTD truncations do not
contain these last 10 amino acids may generate artifacts
due to Rpb1 degradation. In yeast, there is no clear evidence for a role of the last nonconsensus part of the
CTD, apart from the effects on growth rate in the AR
mutant reported here.
Support for the idea that additional repeats are needed
for events that do not occur under optimal conditions (e.g.,
stress responses) comes from yeast truncation mutants
with about the minimal number of repeats needed for viability; these are cold and temperature sensitive and inositol auxotrophs (Nonet et al. 1987), as are shorter repeats
with Ser2 or Ser5 substitutions at C- or N-terminus (West
and Corden 1995). Apparently, the auxiliary functions required for survival under stress need additional CTD repeats. CTD length also seems to be important in
mammals for a differentiation-specific alternative splicing
event, repression of weak fibronectin extra domain I exon
inclusion (de la Mata and Kornblihtt 2006). In other experiments, it was observed that mammalian cells with 31 CTD
repeats are proficient in transcription, splicing, cleavage,
and polyadenylation but are deficient in mature mRNA
transportation, suggesting that the missing CTD repeats
are responsible for binding proteins needed for that function (Custodio et al. 2007). Finally, the CTD length required
for viability of a mouse cell in culture 25 repeats
(Bartolomei et al. 1988) is less than that required for differentiation and development of a whole mouse (39 repeats) (Meininghaus et al. 2000). Although, these results
do not rule out the binding of ‘‘auxiliary’’ protein complexes to proximal positions along the CTD, they do indicate that additional repeats are required as more and more
functions are incorporated into the transcriptional/
processing factory.
An idea that emerges from these observations is that
overall CTD length is determined by events that require
the largest physical load of PCAPs. These events might include responses to signals from an external stress or to developmental cues during differentiation. At its natural
length, the CTD accommodates all requisite binding
factors, and these factors are arranged in (presumably)
an optimal manner. From the earlier arguments, we might
posit that a group of ‘‘core’’ PCAPs occupies CTD binding
sites mainly proximal to the body of RNAP II. During stress
or differentiation, auxiliary factors would then bind largely
2637
Liu et al. · doi:10.1093/molbev/msq151
MBE
FIG. 5. Putative models of effects of CTD length alternations on PCAPs organization along the CTD. Nucleosomes (DNA and histone core) are
marked in orange. RNAP II and its CTD are labeled in purple. Various PCAPs are shown in different colors. Variation in CTD phosphorylation
patterns are not shown or considered here. (A) On the WT CTD, the PCAPs are organized compactly and tightly in temporal and spacial order
along the CTD. The dynamic CTD structure permits the formation of large, compound functional protein complexes. As shown, proteins H and
L, which do not contact the phosphoCTD directly, are required for the organization of the complex by binding to other direct CTD-interacting
PCAPs (C, D, G and I, J, K). In addition, proteins M, N, and O form another complex that collectively interacts directly with the CTD over
multiple adjacent functional units. (B) The shortened CTD does provide enough binding sites for all essential or conditionally required PCAPs.
As illustrated in (B), around 2/3 of the CTD is missing, leading to the inability to load proteins J, K, L, M, N, O, and P on the CTD at the
appropriate time and place. (C) Extended CTD increases the potential number of binding sites, which results in competitive binding sites for
the same PCAPs in different locations, thereby reducing the efficiency by which PCAPs localize to their optimal binding sites. Note how the
association of proteins K and I with an extra binding site in the extended CTD region disrupts their normal organizational complex with J. (D)
Effects of CTD insertions on PCAPs binding. The inserted sequences between diheptads or essential units (YSPTSPS YSPT) are marked as purple
helices. Insertions lead to increased distance between binding sites and accordingly changes the spacial order or position of individual proteins.
As shown in panel (D), protein H no longer can make contact with D and C, thus disabling the formation of a larger mega-complex. Similarly,
although proteins J and I still bind to the CTD, no preferred conformation exists for the further binding of L. In some cases, protein associations
are not fully disrupted by insertions, such as proteins M, N, and O because M binds to both N and O, thereby establishing a functional complex.
Nevertheless, disruption of normal contact between N and O could reduce functional efficiency of the collective protein interactions.
to distal sites (the ‘‘extra’’ repeats). In addition, our demonstration that decreased function occurs with all CTD variants extended beyond WT length supports the idea that
specific and/or relative positions of CTD binding sites is key
to CTD functional efficiency and suggests that binding by
many CTD-associating proteins is highly orchestrated. This
view leads to the model in figure 5 and supplementary
2638
figure 3 (Supplementary Material online) for how CTD
primary sequence and length change influence single PCAP
binding as well as overall organization of PCAPs. One
prediction of this model is that a shortened CTD will
not properly accommodate the auxiliary PCAPs when required (fig. 5B). Conversely, an overly long CTD, containing
distal repeats that could compete for binding factors, could
RNAP II CTD Functional and Evolutionary Conservation · doi:10.1093/molbev/msq151
deplete properly constituted binding complexes, thereby
compromising CTD function (fig. 5C).
We performed one additional experiment as an initial
test of this model. Strains that contain mutant CTDs of
near WT length and exhibit close to WT growth rates
at permissive temperatures were grown at both low
(15 °C) and high (37 °C) temperature. Under these more
stressful conditions, which likely require the CTD to coordinate additional functions, all the mutants performed proportionally more poorly than WT controls (fig. 4D). These
results are consistent with our proposed model and also
with the prediction that the strongest selection on global
CTD length should be evident under more complicated
transcriptional regimens.
An additional kind of observation, combined with those
already discussed, induces us to add one more feature to the
model. There are some results in the literature to suggest
that the distribution of bound proteins along the CTD
may not be stochastic; instead, PCAPs may bind at specific
sites or positions. In other words, different factors, carrying
out distinct functions, may preferentially target different regions of CTD. One result consistent with this idea is that
mammalian Spt6 selectively binds to the N-terminal consensus repeats in vitro (Yoh et al. 2008); another is the previously mentioned selective loss of one RNA processing event,
proper mRNA transport to the cytoplasm, when certain
CTD repeats are deleted from mammalian Rpb1 (Custodio
et al. 2007). Spatial constraints on where direct-binding
PCAPs can associate with the CTD, if confirmed, presumably
exist to position these proteins for proper interactions with
other components of the transcription elongation megacomplex. Depending on how spatially reproducible these
interactions are, it may be that the many components associated with the CTD are organized into a smaller number
of distinct multisubunit structures (e.g., see fig. 5).
A Panoramic View of the CTD
Key features of the RNAP II CTD conserved in evolution are
the canonical heptapeptide sequence and a tandemly repeated structure, along with conservation of overall length
and of patterns of sequence degeneracy among related
taxa. The canonical repeat motif that selection has favored
is remarkably adaptable. In each heptad, Y1, S2, T4, S5, and
S7 can be phosphorylated and P3 and P6 can shift structure
between cis- and trans conformations. Through all these
changes, this particular configuration of amino acids remains relatively flexible, providing a vast array of potential
binding sites for transcription, processing, and other factors
(Sudol et al. 2001).
Thus, the specific sequence of the canonical heptad
likely evolved because it optimized efficient modifiable
regulation, whereas remaining largely unordered when
unphosphorylated and, at the same time, permitting a
wide variety of inducible structures when phosphorylated
(Zhang and Corden 1991; Meinhart et al. 2005). With various proteins loaded on, however, the CTD is under greater
regional and global constraints. For example, when factors
bind near a given pair of heptads along the tandem array,
MBE
the N- and C-terminal repeats can become functionally
nonequivalent (Custodio et al. 2007). Thus, this simple
set of identical repeats can be adjusted for scores of different and highly specific interactions, through time- and
position-dependent modifications.
The fact that two canonical tandem heptads are required to form individual ‘‘essential’’ units could favor conservation of a global tandem structure as well; however,
structurally flexible sequence inserted between the essential units also is fairly well tolerated. Nevertheless, a global
tandem structure appears to optimize CTD function in
a number of ways, at least in strongly conserved canonical
regions where numerous proteins related to core functions
must be accommodated, possibly in varying patterns during expression of different genes.
A tandem structure ensures that any given YSPTSPS motif is part of a core CTD functional unit. It also maximizes
the flexibility of positional binding of those motifs by various protein factors. Because functional units overlap,
a global, tandemly repeated structure provides the largest
number and densest packing of binding positions along the
length of CTD. This allows the most efficient positioning of
multiple proteins with respect to each other. In other
words, for those core factors that have evolved to interact
with canonical essential units, context-dependent binding
can be exploited most efficiently through maximization of
binding sites over the fixed length of CTD. Therefore, even
though most mutated CTDs have enough core units to accommodate essential CTD–protein interactions, they require proteins to bind at suboptimal positions (see
fig. 5). Moreover, because each individual heptad is inherently unstructured, a tandem array results in a disordered
structure along the entire length of the CTD. Thus, the
globally repetitive CTD structure is likely to be under strong
selection also because it maintains an inherently unfolded
state within and around each essential repeat unit.
Finally, overall CTD length is likely conserved to avoid
both the loss of intermittently required binding sites (usually at distal locations) and the addition of ‘‘excess’’ potentially competitive binding sites that could interfere with
optimal, spacially specific protein arrangements.
The view of CTD evolution that emerges from these considerations takes into account both the CTD and its many
associating components. Together these have been coevolving over much of eukaryotic evolution, with changes
in the CTD constrained to avoid disrupting the highly orchestrated temporal and spacial interactions that occur in
CTD-based multicomponent entities such as the transcription elongation mega-complex. This view agrees with comparative genomic studies that show an overall conservation
of CTD-related proteins rather than conservation of their
specific interactions with the CTD (Guo and Stiller 2005). In
conclusion, requirements for maintenance of a dynamic
microenvironment around each CTD unit, balanced by
retention of an optimized macroenvironment for overall
binding across the full CTD length, can explain the patterns
of CTD evolution across the broad scale of eukaryotic
diversity.
2639
Liu et al. · doi:10.1093/molbev/msq151
Supplementary Material
Supplementary figures 1–3 are available at Molecular
Biology and Evolution online (http://www.mbe
.oxfordjournals.org/).
Acknowledgments
This work was supported by grant MCB 0133295 from the
National Science Foundation to Dr J.W.S. and by grant
R01GM040505 from the National Institutes of Health to
Dr A.L.G.
References
Ahn SH, Keogh MC, Buratowski S. 2009. Ctk1 promotes dissociation
of basal transcription factors from elongating RNA polymerase
II. EMBO J. 28:205–212.
Akhtar MS, Heidemann M, Tietjen JR, Zhang DW, Chapman RD,
Eick D, Ansari AZ. 2009. TFIIH kinase places bivalent marks on
the carboxy-terminal domain of RNA polymerase II. Mol Cell.
34:387–393.
Bartolomei MS, Halden NF, Cullen CR, Corden JL. 1988. Genetic
analysis of the repetitive carboxyl-terminal domain of the largest
subunit of mouse RNA polymerase II. Mol Cell Biol. 8:330–339.
Blondelle SE, Forood B, Houghten RA, Perez-Paya E. 1997.
Polyalanine-based peptides as models for self-associated betapleated-sheet complexes. Biochemistry 36:8393–8400.
Buratowski S. 2003. The CTD code. Nat Struct Biol. 10:679–680.
Chakrabartty A, Kortemme T, Baldwin RL. 1994. Helix propensities
of the amino acids measured in alanine-based peptides without
helix-stabilizing side-chain interactions. Protein Sci. 3:843–852.
Chapman RD, Conrad M, Eick D. 2005. Role of the mammalian RNA
polymerase II C-terminal domain (CTD) nonconsensus repeats
in CTD stability and cell proliferation. Mol Cell Biol. 25:
7665–7674.
Chapman RD, Heidemann M, Albert TK, Meilhammer R, Flatley A,
Meisterernst M, Kremmer E, Eick D. 2007. Transcribing RNA
polymerase II is phosphorylated at CTD residue serine-7. Science
318:1780–1782.
Chapman RD, Heidemann M, Hintermair C, Eick D. 2008. Molecular
evolution of the RNA polymerase II CTD. Trends Genet. 24:
289–296.
Chapman RD, Palancade B, Lang A, Bensaude O, Eick D. 2004. The
last CTD repeat of the mammalian RNA polymerase II large
subunit is important for its stability. Nucleic Acids Res. 32:35–44.
Cho EJ, Takagi T, Moore CR, Buratowski S. 1997. mRNA capping
enzyme is recruited to the transcription complex by phosphorylation of the RNA polymerase II carboxy-terminal domain. Gen
Dev. 11:3319–3326.
Corden JL. 1993. RNA polymerase II transcription cycles. Curr Opin
Genet Dev. 3:213–218.
Corden JL. 2007. Transcription. Seven ups the code. Science
318:1735–1736.
Corden JL, Patturajan M. 1997. A CTD function linking transcription
to splicing. Trends Biochem Sci. 22:413–416.
Custodio N, Vivo M, Antoniou M, Carmo-Fonseca M. 2007. Splicingand cleavage-independent requirement of RNA polymerase II
CTD for mRNA release from the transcription site. J Cell Biol.
179:199–207.
de la Mata M, Kornblihtt AR. 2006. RNA polymerase II C-terminal
domain mediates regulation of alternative splicing by SRp20. Nat
Struct Mol Biol. 13:973–980.
Egloff S, Murphy S. 2008a. Role of the C-terminal domain of RNA
polymerase II in expression of small nuclear RNA genes. Biochem
Soc Trans. 36:537–539.
2640
MBE
Egloff S, Murphy S. 2008b. Cracking the RNA polymerase II CTD
code. Trends Genet. 24:280–288.
Egloff S, O’Reilly D, Chapman RD, Taylor A, Tanzhaus K, Pitts L,
Eick D, Murphy S. 2007. Serine-7 of the RNA polymerase II CTD
is specifically required for snRNA gene expression. Science 318:
1777–1779.
Fabrega C, Shen V, Shuman S, Lima CD. 2003. Structure of an mRNA
capping enzyme bound to the phosphorylated carboxy-terminal
domain of RNA polymerase II. Mol Cell. 11:1549–1561.
Gans PJ, Lyu PC, Manning MC, Woody RW, Kallenbach NR. 1991.
The helix-coil transition in heterogeneous peptides with specific
side-chain interactions: theory and comparison with CD spectral
data. Biopolymers 31:1605–1614.
Gasset M, Baldwin MA, Lloyd DH, Gabriel JM, Holtzman DM,
Cohen F, Fletterick R, Prusiner SB. 1992. Predicted alpha-helical
regions of the prion protein when synthesized as peptides form
amyloid. Proc Natl Acad Sci U S A. 89:10940–10944.
Gudipati RK, Villa T, Boulay J, Libri D. 2008. Phosphorylation of the
RNA polymerase II C-terminal domain dictates transcription
termination choice. Nat Struct Mol Biol. 15:786–794.
Guo Z, Stiller JW. 2005. Comparative genomics and evolution of
proteins associated with RNA polymerase II C-terminal domain.
Mol Biol Evol. 22:2166–2178.
Hampsey M, Reinberg D. 2003. Tails of intrigue: phosphorylation
of RNA polymerase II mediates histone methylation. Cell 113:
429–432.
Jacobs EY, Ogiwara I, Weiner AM. 2004. Role of the C-terminal
domain of RNA polymerase II in U2 snRNA transcription and 3’
processing. Mol Cell Biol. 24:846–855.
Juban MM, Javadpour MM, Barkley MD. 1997. Circular dichroism studies
of secondary structure of peptides. Methods Mol Biol. 78:73–78.
Kentsis A, Mezei M, Gindin T, Osman R. 2004. Unfolded state of
polyalanine is a segmented polyproline II helix. Proteins 55:493–501.
Kim YJ, Bjorklund S, Li Y, Sayre MH, Kornberg RD. 1994. A
multiprotein mediator of transcriptional activation and its
interaction with the C-terminal repeat domain of RNA
polymerase II. Cell 77:599–608.
Kishore SP, Perkins SL, Templeton TJ, Deitsch KW. 2009. An unusual
recent expansion of the C-terminal domain of RNA polymerase II
in primate malaria parasites features a motif otherwise found only
in mammalian polymerases. J Mol Evol. 68:706–714.
Kohtani M, Schneider JE, Jones TC, Jarrold MF. 2004. The mobile
proton in polyalanine peptides. J Am Chem Soc. 126:16981–16987.
Liu P, Greenleaf AL, Stiller JW. 2008. The essential sequence
elements required for RNAP II carboxyl-terminal domain
function in yeast and their evolutionary conservation. Mol Biol
Evol. 25:719–727.
Maniatis T, Reed R. 2002. An extensive network of coupling among
gene expression machines. Nature 416:499–506.
McCracken S, Fong N, Rosonina E, Yankulov K, Brothers G,
Siderovski D, Hessel A, Foster S, Shuman S, Bentley DL. 1997. 5#Capping enzymes are targeted to pre-mRNA by binding to the
phosphorylated carboxy-terminal domain of RNA polymerase II.
Gene Dev. 11:3306–3318.
McCracken S, Fong N, Yankulov K, Ballantyne S, Pan G, Greenblatt J,
Patterson SD, Wickens M, Bentley DL. 1997. The C-terminal
domain of RNA polymerase II couples mRNA processing to
transcription. Nature 385:357–361.
Meinhart A, Kamenski T, Hoeppner S, Baumli S, Cramer P. 2005. A
structural perspective of CTD function. Gene Dev. 19:1401–1415.
Meininghaus M, Chapman RD, Horndasch M, Eick D. 2000.
Conditional expression of RNA polymerase II in mammalian
cells. Deletion of the carboxyl-terminal domain of the large
subunit affects early steps in transcription. J Biol Chem.
275:24375–24382.
RNAP II CTD Functional and Evolutionary Conservation · doi:10.1093/molbev/msq151
Nonet M, Sweetser D, Young RA. 1987. Functional redundancy and
structural polymorphism in the large subunit of RNA polymerase II. Cell 50:909–915.
Peng Y, Hansmann UHE, Nelson Alves A. 2003. Solution effects and
the order of the helix-coil transition in polyalanine. J Chem Phys.
118:2374–2380.
Phatnani HP, Greenleaf AL. 2006. Phosphorylation and functions of
the RNA polymerase II CTD. Gene Dev. 20:2922–2936.
Proudfoot NJ, Furger A, Dye MJ. 2002. Integrating mRNA processing
with transcription. Cell 108:501–512.
Ryman K, Fong N, Bratt E, Bentley DL, Ohman M. 2007. The
C-terminal domain of RNA Pol II helps ensure that editing
precedes splicing of the GluR-B transcript. RNA 13:1071–1078.
Scheuermann T, Schulz B, Blume A, Wahle E, Rudolph R, Schwarz E.
2003. Trinucleotide expansions leading to an extended poly-Lalanine segment in the poly (A) binding protein PABPN1 cause
fibril formation. Protein Sci. 12:2685–2692.
Shi Z, Olson CA, Rose GD, Baldwin RL, Kallenbach NR. 2002.
Polyproline II structure in a sequence of seven alanine residues.
Proc Natl Acad Sci U S A. 99:9190–9195.
Stiller JW, Cook MS. 2004. Functional unit of the RNA polymerase II
C-terminal domain lies within heptapeptide pairs. Euk Cell.
3:735–740.
Stiller JW, Hall BD. 2002. Evolution of the RNA polymerase II Cterminal domain. Proc Natl Acad Sci U S A. 99:6091–6096.
MBE
Stiller JW, McConaughy BL, Hall BD. 2000. Evolutionary complementation for polymerase II CTD function. Yeast. 16:57–64.
Sudol M, Sliwa K, Russo T. 2001. Functions of WW domains in the
nucleus. FEBS Lett. 490:190–195.
Toniolo CB, Maria Gian, Mutter Manfred. 1979. Conformations of
poly(ethylene glycol) bound homooligo-L-alanines and -Lvalines in aqueous solution. J Am Chem Soc. 101:450–454.
Vasiljeva L, Kim M, Mutschler H, Buratowski S, Meinhart A. 2008.
The Nrd1-Nab3-Sen1 termination complex interacts with the
Ser5-phosphorylated RNA polymerase II C-terminal domain. Nat
Struct Mol Biol. 15:795–804.
Wallimann P, Kennedy RJ, Miller JS, Shalongo W, Kemp DS. 2003.
Dual wavelength parametric test of two-state models for circular
dichroism spectra of helical polypeptides: anomalous dichroic
properties of alanine-rich peptides. J Am Chem Soc.
125:1203–1220.
West ML, Corden JL. 1995. Construction and analysis of yeast
RNA polymerase II CTD deletion and substitution mutations.
Genetics 140:1223–1233.
Yoh SM, Lucas JS, Jones KA. 2008. The Iws1:Spt6:CTD complex
controls cotranscriptional mRNA biosynthesis and HYPB/Setd2mediated histone H3K36 methylation. Gene Dev. 22:3422–3434.
Zhang J, Corden JL. 1991. Phosphorylation causes a conformational
change in the carboxyl-terminal domain of the mouse RNA
polymerase II largest subunit. J Biol Chem. 266:2297–2302.
2641