review - Physiological Genomics

Physiol Genomics
6: 57–80, 2001.
review
Full-length cDNAs: more than just reaching the ends
Received 1 May 2001; accepted in final form 23 May 2001
Das, Manjula, Isabelle Harvey, Lee Lee Chu, Manisha Sinha, and Jerry Pelletier. Full-length cDNAs: more
than just reaching the ends. Physiol Genomics 6: 57–80,
2001.—The development of functional genomic resources is
essential to understand and utilize information generated
from genome sequencing projects. Central to the development of this technology is the creation of high-quality cDNA
resources and improved technologies for analyzing coding
and noncoding mRNA sequences. The isolation and mapping
of cDNAs is an entrée to characterizing the information that
is of significant biological relevance in the genome of an
organism. However, a bottleneck is often encountered when
attempting to bring to full-length (or at least full-coding) a
number of incomplete cDNAs in parallel, since this involves
the nonsystematic, time consuming, and labor-intensive iterative screening of a number of cDNA libraries of variable
quality and/or directed strategies to process individual clones
(e.g., 5⬘ rapid amplification of cDNA ends). Here, we review
the current state of the art in cDNA library generation, as
well as present an analysis of the different steps involved in
cDNA library generation.
gene hunting; complementary DNA cloning; expressed sequence tags; expression validated gene
THE TWO MOST POPULAR WAYS of cataloging human genes
are sequencing expressed sequence tags (ESTs) and
utilization of computational approaches. Several geneprediction programs, like GeneWise (17), ExoFish
(127), and GenScan (30) are available. Although all of
these have the advantage of being “genome based” (rely
on the properties of the genome sequence only, and
thus are not affected by relative abundance of the
encoded transcript), the evidence used is indirect and
often statistical in nature. Thus bioinformatic approaches must be supported by experimental validation. EST sequencing is a direct approach to human
Article published online before print. See web site for date of
publication (http://physiolgenomics.physiology.org).
Address for reprint requests and other correspondence: J. Pelletier, Dept. of Biochemistry, McIntyre Medical Sciences Bldg., McGill
Univ., Rm 810, 3655 Drummond St., Montreal, Quebec, Canada H3G
1Y6 (E-mail: [email protected]).
gene cataloging and has been responsible for the identification of thousands of genes in humans and other
species. But, the rate of novel EST discovery has fallen
dramatically in recent years, from an estimated 10.6%
in 1996 to 2.7% in 1998 (160) and is limited by the
relative abundance of the encoded transcript. Recently,
microarray-based strategies for gene hunting (“exonscanning” and “tiling”) (136) have been performed and,
although notable for their simplicity and ability to be
scaled up, they lack absolute confirmation. In some
cases a single expression validated gene (EVG) may
correspond to multiple coregulated genes or vice versa.
Thus full-length (FL) cDNA identification and characterization still remains the most reliable way to determine how the exons of a gene are woven together.
A set of FL cDNA clones have “value-added” features
that ESTs do not carry. FL cDNAs define the boundary
of the transcriptional unit (and thus identify the immediate upstream basal promoter), provide a record of
isoform diversity when posttranscriptional modifications alter the primary pre-mRNA transcript in various ways (such as alternate promoter usage, alternative splicing, alternate polyadenylation, and RNA
editing), provide the complete coding region of the
encoded protein for functional studies, and provide
sequence characterization of two mRNA segments important for posttranscriptional gene regulation, i.e.,
the 5⬘ and 3⬘ untranslated regions (UTRs).
COMPLEMENTARY DNA CLONING
The conversion of mRNA into cDNA for the purposes
of functional studies is a complex, interrelated series of
enzyme-catalyzed reactions involving the in vitro synthesis of a cDNA copy from mRNA, its subsequent
conversion to duplex cDNA, and insertion into an appropriate cloning vector. Although a number of different approaches can be used to generate cDNA libraries,
most suffer from common limitations producing libraries with a number of inherent problems. Thus there
remains considerable room for improving the approach
used to generate cDNA libraries. One strategy we have
explored is outlined in Fig. 1.
1094-8341/01 $5.00 Copyright © 2001 the American Physiological Society
57
Downloaded from http://physiolgenomics.physiology.org/ by 10.220.33.1 on June 14, 2017
MANJULA DAS,1 ISABELLE HARVEY,1 LEE LEE CHU,1 MANISHA SINHA,1
AND JERRY PELLETIER1,2
1
Department of Biochemistry and 2McGill Cancer Center, McGill University,
Montreal, Quebec, Canada H3G 1Y6
58
REVIEW
RNA Fractionation
Cytoplasmic RNA isolation. There are two basic approaches to the isolation of RNA: those that entail
subcellular fractionation, and those that dissociate
macromolecular complexes in denaturing solutions (10,
11). By far, the most reproducible method found to
yield the highest quality RNA involves lysis of cells/
tissues in the presence of guanidinium thiocyanate.
[Guanidinium salts are more efficient denaturants
than phenol, and their relative efficiencies are guanidinium thiocyanate ⬎ guanidine hydrochloride ⬎
urea.]
Given that metabolic labeling experiments in mammalian cells estimate that steady-state levels of heterogeneous nuclear RNA (hnRNA) represent about 7%
of total RNA (24), it should be little surprise that a
significant fraction of cDNA clones within any given
library will contain intron sequences. Indeed, there are
Physiol Genomics • VOL
6•
many published reports documenting the presence of
introns within cDNA clones (e.g., 5, 8, 39, 55, 70, 122,
165; see also Ref. 89 for additional examples). To complicate matters, there is anecdotal data suggesting that
5⬘ introns are excised more slowly than internal introns (89), possibly a consequence of differences in the
mechanism by which these introns are recognized (13,
79). The presence of such cDNA clones in libraries can
significantly delay the progress of gene characterization due to misleading interpretations on predicted
protein structure, incorrect assignment of translation
initiation codons, incorrect assignment of 5⬘-UTRs, and
false information on promoter location. The frequency
with which this problem is encountered is difficult to
estimate, since reports documenting intron-containing
cDNAs are generally not published.
Given these issues, it follows that mRNAs for cDNA
library generation should, if possible, be prepared from
www.physiolgenomics.org
Downloaded from http://physiolgenomics.physiology.org/ by 10.220.33.1 on June 14, 2017
Fig. 1. Schematic diagram of a strategy utilized to generate plasmid-based
cDNA libraries. The approach is divided into three sections: I) mRNA isolation and fractionation, II) cDNA synthesis, and III) cDNA cloning. mRNA
is purified from cytoplasmic total RNA
by affinity selection on oligo (dT) columns. The mRNA is then size fractionated on sucrose gradients containing
methylmercury hydroxide. cDNA is then
prepared on the individual size cuts
utilizing Superscript II, NC-p7 (an
RNA chaperone), and oligo pd(T 䡠 Z) [an
oligo (dT) primer designed to minimize
mispriming to internal A-rich sequences
on mRNA templates]. Double-stranded
cDNA is generated utilizing RNase H,
E. coli ligase, and E. coli polymerase.
The ends of the cDNA are polished
with T4 DNA polymerase, and BstXI
adaptors are ligated to the cDNA ends.
The adaptors have noncomplementary
overhangs and thus cannot concatenate. Following ligation, the cDNA is
fractionated on sucrose gradients to
eliminate unligated adaptors, dimerized adaptors, and truncated cDNAs.
Individual size fractions are ligated
into an appropriate vector and transformed, by electroporation, into a bacterial host. Amplification of libraries is
avoided; rather, individual clones are
picked and processed.
REVIEW
Physiol Genomics • VOL
6•
mRNAs may also occur as a mechanism of posttranscriptional regulation in some situations. Libraries
generated from cytoplasmic mRNA will make it easier
to identify these events above the background generally provided by total cellular mRNA preparations. The
best studied example of this is the trans-regulation of
the nucleocytoplasmic distribution of unspliced mRNA
by the human immunodeficiency virus (HIV) rev product (48, 60, 101). This phenomenon requires a cisacting Rev-responsive element (RRE) in the env region
of HIV-1, within the common intron of Tat and Rev
coding sequences. In the presence of rev, the RRE
mediates the cytoplasmic appearance of HIV mRNAs
containing introns. It appears that rev affects a general
process that retains precursor mRNAs in the nucleus
(36). Another example of this sort of regulation is the
Mason-Pfizer monkey virus where efficient accumulation of unspliced mRNAs appears to depend on the
interaction of a cis-acting RNA element with the ATPdependent RNA helicase A (25, 150) or with other
cellular factors (57, 118). Thus some representatives of
specific genes within cDNA libraries generated from
cytoplasmic mRNA might contain introns. The identification of such clones would be a first step toward
cataloging cellular genes potentially utilizing this
mechanism of posttranscription regulation. Recent descriptions of mRNAs harboring retained introns suggests that in some cases this phenomenon may be at
work to regulate gene expression (84) and increase
protein isoform diversity (117, 123, 125, 155). This
form of regulation of gene expression may function
similarly to a recently identified sequence element
within the intronless H2a gene which functions to
promote mRNA transport, inhibit splicing, and stimulate polyadenylation (75). Similar elements have been
identified within HSV-TK (98) and HBV (74, 76).
Hence, the presence of related elements within introns
have the potential to inhibit splicing and increase accumulation of unspliced transcripts in the cytoplasm.
Oligo (dT) chromatography. One fractionation procedure that most individuals utilize when synthesizing
cDNA libraries is to generate poly(A)⫹ mRNA by oligo
(dT) selection of total RNA. This makes good sense
given that RNA PolII transcripts represent only ⬃5%
of the total RNA population within a given cell. However, what is generally not appreciated is that some
mRNA transcripts are not polyadenylated (e.g., histones) or may have shortened poly(A) tails that do not
permit selection by oligo (dT) chromatography (160). It
is interesting to note that although to date, the only
class of poly(A)⫺ mRNA transcripts which have been
defined are some of the histone mRNAs, estimates
have indicated that there could be a large fraction of
genes expressed in postnatal brain uniquely as
poly(A)⫺ transcripts (156). These reports indicate that
a large proportion of the complexity in poly(A)⫺ mRNA
comprises sequences distinct from those in poly(A)⫹
mRNA. Hybridization with 3H-labeled poly(U) indicated that the poly(A)⫺ mRNA fraction was relatively
free of poly(A)⫹ mRNA since the equivalent of only one
A tract of 20 residues per 100 average-sized molecules
www.physiolgenomics.org
Downloaded from http://physiolgenomics.physiology.org/ by 10.220.33.1 on June 14, 2017
cytoplasmic fractions obtained from cells or tissues.
Alternatively, cDNAs obtained from libraries based on
total mRNA (which is the case for the majority of
current libraries) should be carefully examined to ensure they lack retained introns. By removing the bulk
of nuclear pre-mRNA, this modification significantly
decreases the amount of intron-containing clones that
contaminate the final cDNA library. Protocols for isolating cytoplasmic mRNA have been extensively used
with cultured cells, not very much with tissues, and
produce RNA preparations of variable quality (12). It is
reasonable to expect that the procedure may have to be
tailored for different cell lines or tissues. Also, it is
likely that some interesting nuclear transcripts, such
as XIST, a nuclear transcript involved in X-chromosome inactivation, will be selected against by this
strategy.
We have modified conventional cytoplasmic RNA
purification schemes (12) to isolate cytoplasmic RNA of
sufficient quality and purity for cDNA library construction from cell lines and tissues. The approach we utilize
is described in the legend to Fig. 2A and has been
successfully applied to isolation of RNA from a number
of cell lines and tissues. When our isolation method
was piloted on MDAH cells, an ovarian cancer cell line,
the quality of the cytoplasmic RNA was as good as the
quality of total RNA obtained in parallel (Fig. 2B,
compare lane 2 with 1). We have successfully applied
this procedure to a number of cell lines, as well as fresh
and frozen tissues. In the cases of isolation from tissue,
the material is first disrupted using a Teflon homogenizer. One tissue source from which we have failed to
isolate good quality cytoplasmic RNA is kidney, presumably due to the presence of nucleases in this tissue.
To obtain an estimate as to how much nuclear material is contaminating our cytoplasmic RNA preparations, we perform a series of analyses. First, utilizing a
primer pair directed to exon 8 of the WT1 tumor suppressor gene, amplification reactions are performed to
assess the amount of contaminating nuclear DNA in
our cytoplasmic preparations (Fig. 2C, compare lane 1
with 3). Controls are performed to ensure the absence
of trans-inhibitors of the PCR in the RNA preparation
under analysis by adding genomic DNA to the cDNA
preparations to be used in the PCRs (Fig. 2C, compare
lane 2 with 1). Second, we probe for the presence of a
15-kb inactive X-specific transcript (XIST) located
solely in the nucleus (26, 28), since this serves as a good
marker by which to determine the amount of contaminating nuclear RNA in our cytoplasmic RNA preparations. The MIC2 transcript serves as a control, since it
is also X-encoded, but located in the cytoplasm. As
determined by RT-PCR and slot-blot analysis on cytoplasmic and nuclear RNA from equivalent cell numbers (Fig. 2, D and E), our fractionation procedure
significantly enriches for cytoplasmic RNA, with little
contamination from high molecular weight nuclear
RNA (Fig. 2, D and E).
There is potential valuable information to be gained
from properly characterizing cDNAs prepared from
cytoplasmic mRNA. Intron retention by cytoplasmic
59
60
REVIEW
(1,500 nt) was observed (156). The complexities of the
poly(A)⫹ and poly(A)⫺ mRNA classes were also found
to be additive, since the sum of their complexities
equaled that of polysomal RNA (37, 156). Another
interpretation of these data is that the poly(A)⫺ mRNA
species are derived from the poly(A)⫹ mRNA transcripts, and indeed in a number of situations, specific
mRNAs have been found in both poly(A)⫹ and poly(A)⫺
forms, such is the case for some histone species, casein
mRNA, and protamine mRNA (52, 72, 130). It is also
known that the length of the poly(A) tail can be regulated (41). Indeed, there is some suggestion that this
phenomenon may be tissue restricted, since analysis of
RNA from kidney indicates extensive homology be-
tween poly(A)⫹ mRNA and purified nominal poly(A)⫺
mRNA (116). For poly(A)⫹ mRNA preparations to be
used for cDNA library construction, it is imperative to
assess the quality of the preparation and efficiency of
fractionation. We assess the quality of our cytoplasmic
RNA preparations by Northern blotting with probes to
three ubiquitously expressed transcripts; histone H4 [a
poly(A)⫺ transcript], ␤-actin (⬃1.8 kb), and protein
tyrosine phosphatase 1E (mRNA is ⬃8 kb) (Fig. 3A).
Size fractionation. Another problem that affects
clone representation in a cDNA library is the issue of
competition between more abundant, shorter mRNA
species and less abundant, longer mRNA species at
various steps during library construction. For example,
Downloaded from http://physiolgenomics.physiology.org/ by 10.220.33.1 on June 14, 2017
Physiol Genomics • VOL
6•
www.physiolgenomics.org
REVIEW
itate, and analyze 10% of the material by Northern
blotting, probing for ␤-actin transcripts and for the
Huntington disease (HD) transcript (⬃11 kb). The HD
gene is ubiquitous but of very low abundance, making
it difficult to detect (Fig. 3C).
Correction of Mispriming by the Oligo (dT) Primer
During First-Strand Synthesis
Priming from the mRNA poly(A) tract with oligo (dT)
is necessary to obtain a copy of the entire 3⬘-UTR.
However, it has been estimated that ⬃10–15% of
clones in cDNA libraries have 3⬘ truncations due to
internal misannealing of the oligo (dT) primer to internal A-rich sites (23). Such clones are identified by the
absence of a polyadenylation signal sequence (generally AATAAA) ⬃30 nucleotides upstream of the oligo
(dA) tail of the cDNA. If enhanced discrimination could
be achieved between annealing of the oligo (dT) primer
to the bona fide poly(A) tail vs. internal A-rich sequences, then the frequency of this mispriming artifact
should be significantly reduced. Guo et al. (59) have
shown that increased discrimination of single nucleotide polymorphisms by oligonucleotides can be
achieved by introducing artificial mismatches into an
oligonucleotide probe using the base analog 3-nitropyrrole (Fig. 4A). This base analog acts as a universal
nucleoside that hydrogen bonds minimally with all
Fig. 2. Fractionation of cytoplasmic and nuclear RNA from MDAH ovarian cancer cells. A: MDAH cells (American
Type Culture Collection) were grown to 80% confluence in DMEM supplemented with 10% fetal bovine serum. For
harvesting, cells were washed twice with prechilled PBS, scraped with a rubber policeman, and harvested by
centrifugation at 800 g for 5 min. After removal of the PBS, 2 ml of lysis buffer (50 mM Tris-Cl, pH 8.0, 5 mM
MgCl2, 0.5% Nonidet P-40, 0.5% sodium deoxycholate, and 10 mM vanadyl-ribonucleoside complex) was added, and
the tube was vortexed for 15 s. Nuclei were removed by centrifugation at 14,000 g for 2 min, and the supernatant
was transferred to 15 ml Trizol (LTI, Life Technologies). Isolation of RNA was then performed as recommended by
LTI, with the exception that following precipitation from the Trizol/chloroform mixture, the pellet was resuspended
in GTC [4 M guanidinium thiocyanate (GITC), 25 mM sodium citrate, pH 7.0, 0.5% N-lauroylsarcosine, and 0.1 M
␤-mercaptoethanol], transferred to another tube, and precipitated with an equal volume of isopropanol. Eighty
150-cm2 dishes of MDAH cells yielded 10 mg of cytoplasmic RNA. B: formaldehyde/agarose gel analysis of total
RNA, cytoplasmic RNA, and nuclear RNA isolated from MDAH cells. Ten micrograms of RNA was fractionated on
a 1% agarose/formaldehyde gel and visualized by staining with SYBR gold (Molecular Probes). The positions of
migration of the 28S and 18S rRNA are indicated to the left. C: PCR analysis for the presence of contaminating
genomic DNA in the RNA preparations. PCR primers directed to intron 7 (5⬘-ATGAGATCCCCTTTTCCAG-3⬘) and
intron 8 (5⬘-GGAACACAGCTGCCAGC-3⬘) of the WT1 tumor suppressor gene and designed to amplify WT1 exon
8 (176 bp) were used in a PCR with 0.2 ␮g of either cytoplasmic or nuclear RNA (lanes 1 and 3) or 0.2 ␮g of
cytoplasmic or nuclear RNA supplemented with 100 ng of genomic DNA (lanes 2 and 4). Amplifications were
performed as previously described (157) using 26 cycles, and the products were analyzed on a 1.2% agarose gel. The
solid square to the right indicates the position of migration of the expected product. PCR product is obtained in the
nuclear RNA fraction (lane 3) but not in the cytoplasmic RNA fraction (lane 1), indicating a minimal amount of
genomic DNA is contaminating the cytoplasmic RNA preparation. A PCR product is also obtained when cytoplasmic RNA was supplemented with genomic DNA (compare lane 2 to 1), indicating the absence of a trans-inhibitor
for the PCR in the cytoplasmic RNA preparation. D: RT-PCR analysis of the MIC2 and XIST genes in cytoplasmic
and nuclear RNA fractions. Cytoplasmic and nuclear RNA was converted to cDNA using random hexamers as
primers and Superscript II as recommended by LTI. RNA corresponding to equivalent cell numbers was used in
each reaction. PCR amplifications were performed with primers directed against the nuclear XIST transcript
(HXIST-8, 5⬘-AGCTCCTCGGACAGCTGTAA-3⬘; and HXIST-11r, 5⬘-TGTTGATCTTCAGGTGGGAA-3⬘) (27) or the
X-encoded cytoplasmic MIC2 transcript (MIC2A, 5⬘-ACCCAGTGCTGGGGATGACT-3⬘; and MIC2B, 5⬘-TCTCCATGTCCACCTCCCCT-3⬘) (27) for 26 cycles. Products were analyzed on a 1% agarose, and the position of migration
of the expected products is indicated by a solid square. E: slot-blot analysis of XIST and MIC2 RNA present in
cytoplasmic and nuclear RNA fractions. Two or 10 ␮g of RNA was denatured in 1.3 M formaldehyde/32%
formamide/10⫻ SSC at 65°C for 15 min and loaded onto Hybond N⫹ paper using a slot-blot manifold apparatus.
The blot was baked under vacuum at 80°C for 2 h and prehybridized in Church buffer. Hybridizations were
performed in Church buffer with either a XIST or MIC2 probe (108 cpm/␮g; 106 cpm/ml). Washes were performed
twice with 2⫻ SSC/0.1% SDS at 65°C for 30 min and once in 0.2⫻ SSC/0.1% SDS at 65°C for 30 min. Blots were
exposed to X-OMAT X-ray film (Kodak) at ⫺70°C overnight. Cytoplasmic RNA preparations contain MIC2
transcripts, but not any contaminating nuclear XIST transcripts.
Physiol Genomics • VOL
6•
www.physiolgenomics.org
Downloaded from http://physiolgenomics.physiology.org/ by 10.220.33.1 on June 14, 2017
size competition during transformation will bias clone
representation in the final library product. One manner to deal with this is to size fractionate the mRNA
before the start of library construction and treat each
size cut as a distinct subcomponent of the library. This
also allows one to implement an additional size fraction step at the end of library construction to select
against smaller cDNA products produced as a result of
failure to reach the vicinity of the 5⬘ end of the mRNA
or inefficient conversion to second-strand product.
There are two procedures that are amenable to preparative mRNA size fractionation-agarose gel electrophoresis and sucrose gradient centrifugation. Preparative agarose gel electrophoresis is relatively simple
and provides sharp size cuts and good resolving power
between 500 and 10,000 bases, but the RNA recovery is
lower than with sucrose gradients, and some lots of
commercial agarose contain contaminants (that inhibit
subsequent downstream enzymatic steps) that copurify with the RNA (49). On the other hand, sucrose
gradient centrifugation allows the fractionation of
large amounts of RNA and allows recovery in high
yields, although the size cuts are not as sharp as with
agarose gels (61, 90). An example of total RNA fractionated on methylmercury hydroxide sucrose gradients is shown in Fig. 3B. We generally load 100 ␮g of
poly(A)⫹ mRNA on 10–30% sucrose gradients. Following centrifugation, we collect 30-drop fractions, precip-
61
62
REVIEW
RNA-Dependent DNA Polymerases
Fig. 3. A: Northern blot analysis of poly(A)⫹ and poly(A)⫺ RNA. One
microgram poly(A)⫹ and 20 ␮g of poly(A)⫺ RNA were fractionated on
a 1.2% formaldehyde/agarose gel, transferred to Hybond N⫹, and
hybridized with probes to the genes indicated at bottom. Prehybridizations and hybridizations were performed in Church buffer. Two
washes were performed with 2⫻ SSC/0.1% SDS at 65°C for 30 min and
one wash in 0.2⫻ SSC/0.1% SDS at 65°C for 30 min. Arrows indicate
the position of migration of the expected mRNA species. B: formaldehyde/agarose gel analysis of size fractions obtained after fractionating
100 ␮g of total RNA on a 10–30% methylmercury hydroxide sucrose
gradient. The first left lane contains an aliquot of the sample loaded
onto the sucrose gradient. The agarose gel was stained with SYBR gold
(Molecular Probes) to visualize the RNA. C: Northern blotting analysis of size fractions obtained after fractionating 100 ␮g of poly(A)⫹
RNA isolated from MDAH cells and probed with the Huntington disease (HD) and ␤-actin genes. Fraction numbers are indicated at top.
Physiol Genomics • VOL
6•
Inhibition of reverse transcriptase processivity by secondary structure. The central enzyme to cDNA library
construction has always been RT, since all current
cDNA library generation procedures employ an RNAdependent DNA polymerase to convert mRNA into a
cDNA copy. This enzyme performs multiple functions
during the life cycle of a retrovirus, including the
copying of RNA in DNA, the hydrolysis of RNA from
the RNA-DNA hybrid, and the copying of singlestranded DNA into double-stranded DNA. The two
most commonly used RTs for cDNA library construction have been those derived from avian myeloblastosis
virus (AMV) and Moloney murine leukemia virus
(MMLV). The AMV RT is a heterodimeric molecule
containing two related subunits, the ␣-subunit (⬃63
kDa) and the ␤ subunit (⬃95 kDa) (71). The ␣-subunit
of AMV RT carries both polymerase and RNase H
activity. The MMLV RT differs from the avian form in
that it is a single 84-kDa polypeptide (159) with the
polymerase activity mapping to the amino terminus
www.physiolgenomics.org
Downloaded from http://physiolgenomics.physiology.org/ by 10.220.33.1 on June 14, 2017
four bases without steric disruption of the DNA duplex.
Differences in thermal stability (⌬Tm) between hybrids
formed with normal and single-nucleotide variant
DNA targets are increased by as much as 200% over
conventional hybridization (59). We tested whether an
oligo (dT) primer in which some of the internal thymidine residues were replaced with 3-nitropyrrole [called
oligo (dT) 䡠 Z1; 5⬘ d(T)7 䡠 Z 䡠 d(T)9 䡠 Z 䡠 d(T)5 3⬘] significantly
decreased the frequency of 3⬘ end mispriming (Fig. 4).
When eukaryotic initiation factor-4GII (eIF-4GII)
cDNAs were initially isolated, only one of five clones
had the correct 3⬘ end; all other truncated clones were
the result of internal priming by oligo (dT) at four
different sites (Fig. 4B) (A. Gradi and N. Sonenberg,
unpublished data). In vitro transcribed RNA generated
from this clone thus served as an excellent test reagent
by which to determine the ability of oligo (dT) 䡠 Z1 to
decrease the frequency of internal mispriming events
(Fig. 4C). This RNA was annealed to either oligo (dT)
or oligo (dT) 䡠 Z1, and reverse transcription was performed with Superscript II. Use of oligo (dT) on this
template resulted in shorter than FL products (⬎95%)
generated as a result of two internal mispriming
events (Fig. 4D, lane 1). However, utilizing oligo
(dT) 䡠 Z1 as primer on the same template dramatically
improves the yield of FL product (Fig. 4D, lane 2). We
have mapped the 3⬘ ends of the various reverse transcriptase (RT) products by hybridization of the cDNA
with oligonucleotides which spanned the 3⬘-UTR (Fig.
4E). cDNA was generated with oligo (dT)23 or oligo
(dT) 䡠 Z2 [5⬘ d(T)5 䡠 Z 䡠 d(T)5 䡠 Z 䡠 d(T)5 䡠 Z 䡠 d(T)5 3⬘] on eIF4GII RNA, fractionated on an alkaline agarose gel, and
transferred to Hybond N⫹ for analysis. These products
were analyzed by probing with radiolabeled oligonucleotides directed to different regions of the eIF-4GII
3⬘-UTR (Fig. 4B). These results confirm that priming
with oligo dT 䡠 Z2 dramatically reduces 3⬘ end mispriming.
REVIEW
Physiol Genomics • VOL
6•
source which can function at 55°C, thus allowing one to
perform RT reactions under conditions that should
otherwise destabilize mRNA secondary structure.
When we compared the efficiency of this enzyme to
Superscript II, we also found it to be inferior since it
produced a lower proportion of FL product than Superscript II (Fig. 5C, compare lanes 5–8 to lanes 1–4). Use
of wild-type MMLV RT produced only truncated product when attempting to copy flWT1/(GNRA)2, as assessed by alkaline agarose gel electrophoresis (Fig. 5D,
lane 1). We noticed that a small fraction of cDNA
produced with Superscript II was larger than full
length (Fig. 5D, lane 2) and attribute this to hairpin
priming of second-strand from first-strand product.
The thermostable polymerase, tTH, has been previously reported to harbor significant RT activity. In our
hands, the yield and quality of cDNA product obtained
with this enzyme was very disappointing (data not
shown).
Our results indicate that MMLV RNase H⫺ enzyme
appears to be best at negotiating regions of secondary
structure. Recently a number of such products have
appeared on the market, and although we have not
tested all of them, our preliminary results indicate that
many of the RNase H⫺ MMLV RTs show similar activities. In our hands, Expand RT (Roche) also produces
high-quality first-strand product (data not shown). We
have also tested a number of RT RNase H inhibitors
such as suramine (147), heparin (113), novobiocin (2),
and illimaquinone (99) for their ability to suppress
RNase H activity present in the native MMLV and
AMV enzymes. This approach was to attempt to reduce
the amount of pausing on flWT1/(GNRA)2 templates.
However, in our hands, none of these improved the
activity of the native enzymes on templates containing
structural features inhibitory to RT processivity. Parenthetically, we have noticed variation in quality of
some lots of buffers that accompanied Superscript II
and that resulted in the synthesis of a major truncated
product when the enzyme was assayed on poliovirus
mRNA [an ⬃7.5-kb RNA template with a poly(A)⫹
tail], compared with the same buffer made in our lab.
One should thus be wary about utilizing any reagents
from manufacturers unless they are first quality controlled in the investigator’s lab.
A number of conditions have been reported in the
literature to improve the processivity of RT. These
include denaturing the RNA template at 65°C for 5 min
before starting the RT reaction, pretreatment of the
RNA with methylmercury hydroxide before the RT
reaction, addition of DMSO to the RT reaction, and
performing the RT reaction at a higher temperature
(55°C) in the presence of the thermostabilizer, trehalose (34). However, the usefulness of these conditions
has never been tested on controlled test transcripts
harboring defined structural features capable of blocking RT activity. RT reactions performed with Superscript II and either WT1 or flWT1 result in exclusive
production of FL products as assessed by denaturing
alkaline agarose gels (Fig. 5E, lanes 1 and 2). RT
reactions on flWT1/(GNRA)2 template show FL prodwww.physiolgenomics.org
Downloaded from http://physiolgenomics.physiology.org/ by 10.220.33.1 on June 14, 2017
and the RNase H activity to the carboxy terminus
(149). The error rates of MMLV and AMV RT on DNA
template has been estimated to be ⬃1/30,000 and
1/17,000, respectively (81, 126). On RNA, the error rate
of MMLV RT is 1/37,000 (81). The error rate of HIV RT
has been found to be significantly higher ⬃1/4,600 to
1/7,000 (6, 7, 78, 80, 81), precluding the use of the
native HIV enzyme in cDNA library construction. One
drawback with current RTs is that they have a very
poor rate of processivity, in the range of 5–15 nucleotides per second (77, 100). Reducing the RNase H
activity of MMLV RT improves the efficiency of RT
(53), suggesting that generation of truncated products
by RNase H⫹ RTs may be the result of pausing by the
RT on the RNA template, followed by degradation of
the RNA moiety by the RNase H activity. Recombinant
MMLV RT engineered to be devoid of RNase H activity
has resulted in the generation of several commercial
products.
To assess pausing by RTs on mRNA templates during first-strand synthesis, we have developed a series
of test vectors that have stable stem-loop structures in
an NcoI site positioned 918 bp upstream of the WT1 3⬘
end (4) (Fig. 5A). It is well documented that stable
hairpin loops can inhibit RT processivity (154). Also,
plasmid SP/flWT1 contains 433 bp of the 5⬘-UTR of
WT1 and is ⬃70% GC rich. Indeed, when cDNA clones
for the murine WT1 gene were first isolated, none of
the clones were full length, and 5 of 9 clones terminated within 21 nucleotides of each other, 182 bases
upstream of the ATG codon, suggesting the presence of
a strong RT stop signal in this region. The murine WT1
5⬘ end could only be obtained by 5⬘ RACE and genomic
DNA sequencing (J. Pelletier, data not published). We
have used in vitro generated WT1 transcripts (ranging
in size from ⬃1.4 to 2.0 kb) to elucidate and optimize
conditions most effective in allowing RTs of various
sources to proceed through these blocks.
We have attempted to define which currently available RT is capable of best “negotiating” regions of
secondary structure. In all, we have compared the
quality of first-strand product obtained on these test
templates with Superscript II (Fig. 5B), AMV RT (Fig.
5B), tTH (data not shown), MMLV RT (Fig. 5D), Expand RT (Roche) (data not shown), and Thermoscript
(thermostable avian RT obtained from LTI) (Fig. 5C).
Representative data is shown in Fig. 5, B–D. Superscript II is capable of efficiently copying WT1, flWT1,
and flWT1/(GNRA)2 into cDNA, with a minimal of
truncated products (Fig. 5, B and C, compare lanes
2–4). The indicated products are known to be full
length, because we have recloned them and by sequencing showed that they contain the expect 5⬘ end
(data not shown). By comparison, AMV is a poor choice
for production of cDNA, since the majority of cDNA
products from all three RNA templates are truncated
(Fig. 5B, compare lanes 5–7). This is not a peculiarity
of this particular AMV enzyme preparation, since we
have tested AMV RTs from a variety of major suppliers
(data not shown) and obtained similar results. Thermoscript is a RNase H⫺ RT derived from an avian
63
64
REVIEW
Physiol Genomics • VOL
6•
induced hydrolysis of RNA at high temperatures (see
below), the utility of this modification remains to be
established (108). It is evident that current treatments
for enabling RT enzymes to proceed through regions of
high secondary structure within RNAs simply are not
effective.
Effect of temperature on mRNA template stability.
Many current cDNA protocols aim to perform reactions
under conditions in which the temperature of the reaction is maintained as high as possible (without interfering with the activity of the enzyme). The underlying premise is that reduced hydrogen bonding at
elevated temperatures will decrease mRNA secondary
structure, leading to reduced pausing of the RT on the
mRNA template and resulting in a higher proportion of
FL product. Examples of this include 1) utilization of
www.physiolgenomics.org
Downloaded from http://physiolgenomics.physiology.org/ by 10.220.33.1 on June 14, 2017
uct, as well as a truncated product due to a block in
processivity at 918 bp by the GNRA stem-loop (denoted
by an asterisk) (Fig. 5E, lane 3). None of the methods in
common use today to denature RNA templates before
the commencement of an RT reaction improves the
processivity of Superscript II on the flWT1/(GNRA)2
template (Fig. 5E, lanes 5–11). We interpret these
results to indicate that denaturation of local stem-loop
structures by any of these conditions is transient, and
once the treatment is terminated, structured regions
rapidly reform. We do not know why trehalose did not
improve the processivity of RT, but it is possible that
even at 55°C, the GNRA stem-loop structure is very
stable. Additionally, it is not clear how cDNA synthesis
at 55°C, in the presence of trehalose, affects the error
rate of Superscript II. Also, given the problem of metal-
REVIEW
65
Thermoscript (LTI) at 55°C, 2) addition of trehalose to
stabilize Superscript II at 60°C (34), and 3) use of
Display Thermo-RT (Display Systems Biotech). However, these approaches disregard the fact that at elevated temperatures, mRNA is susceptible to metal
catalyzed degradation (108). We have assessed the
effect of incubating mRNA with a range of heavy metals at room temperature for 60 min (Fig. 6A). The
results demonstrate that under these conditions La3⫹,
Lu3⫹, and Zn2⫹ are very effective at hydrolyzing
mRNA (Fig. 6A, compare lanes 2, 3, and 6 with lane 1),
whereas Mg2⫹, Mn2⫹, and Cu2⫹ did not appreciably
degrade the test template (Fig. 6A, compare lanes 4, 5,
and 7 with lane 1). Incubation of test mRNA template
with 3 mM MgCl2 at a series of temperatures demonstrated that at 55°C, degradation by Mg2⫹ is significant (Fig. 6B). Given the absolute requirement of RTs
for divalent metal cations, our results would suggest
that RT reactions performed at or above 55°C will
result in significant damage to mRNA templates. As a
matter of precaution, we do not perform RTs at temperatures greater than 45°C.
Physiol Genomics • VOL
6•
Use of an RNA chaperone to improve the quality of
first-strand synthesis. In an attempt to improve the
quality of first-strand products, without having to resort to increasing the temperature of the RT reaction,
we have taken advantage of the unique biological properties of nucleic acid chaperones and have modified the
conditions for first-strand synthesis. Nucleocapsid
(NC) proteins are very small, highly basic proteins. NC
proteins bind both to single-strand/double-strand DNA
and to single-strand RNA in vitro and are associated
with the genomic RNA in the interior of the mature
retrovirus particle (92). The NC proteins of most retroviruses contain one or two zinc fingers of the form
CX2CX4HX4C, which according to nuclear magnetic
resonance analysis form very tight, rigid loops within
the protein. NC proteins catalyze the folding of nucleic
acids into conformations that have the maximal number of base pairs and appear to do this by lowering the
energy barrier for breakage and re-formation of base
pairs. Thus interaction between a nucleic acid molecule
and an NC protein results in the transient unpairing of
bases within the chain, which makes the bases availwww.physiolgenomics.org
Downloaded from http://physiolgenomics.physiology.org/ by 10.220.33.1 on June 14, 2017
Fig. 4. A: structure of the universal nucleoside 3-nitropyrrole. B: structure of eukaryotic initiation factor-4GII
(eIF-4GII) construct used to analyze mispriming at the 3⬘ end. The locations of four internal A-rich sequences are
shown, all of which generated 3⬘ truncated clones when eIF-4GII was isolated from a cDNA library. The plasmid
was linearized with Asp718, and T7 RNA polymerase was used to generate an ⬃2,400-nt 3H-test transcript. The
locations of five oligos (a, b, c, d, e) used in the hybridization assay to map the sites of 3⬘ mispriming by oligo (dT)
are shown. The nucleotide targets of the oligonucleotides on eIF-4GII are as follows: oligo a, (5567)GAAATTGACTCAGTACTATT(5584); oligo b, (5416)GAAGGAAATGCTGTGGACC(5535); oligo c, (5194)TGTATAATAGAAAAGCAGAG(5213); oligo d, (5068)TTTTAAACAAGGACTCATAC(5087); and oligo e, (4781)AAGAGGAGTCTGAGGATAAC(4800). C: the integrity of the in vitro generated transcript is shown following fractionation on a
formaldehyde/1.2% agarose gel, treatment with EN3HANCE, and autoradiography of the dried gel. D: alkaline
agarose (1.2%) analysis of RT products generated by priming synthesis with oligo (dT)15 (lane 1) or oligo (dT) 䡠 Z1
primer [5⬘-d(T)7 䡠 Z 䡠 d(T)9 䡠 Z 䡠 d(T)5-3⬘] (lane 2) using Superscript II. cDNA was labeled with [␣-32P]dCTP. The gel was
neutralized in 5% trichloroacetic acid, dried, and exposed to X-OMAT film (Kodak). The positions of migration of
truncated products are indicated by solid squares, and full-length (FL) product is indicated by an arrow. E:
Southern blot of alkaline agarose gel of RT products generated by priming synthesis with either oligo (dT) or oligo
(dT) 䡠 Z. Marker lane refers to the 1-kb size ladder from GIBCO-BRL, and sizes (in nt) are indicated to the left in
E. eIF-4GII DNA refers to a DNA fragment of eIF-4GII used as a positive control for DNA hybridizations. Oligos
used as probes on each blot are indicated at bottom (a–e). On left in E: ⴱ, cDNA product obtained by priming at the
correct poly(A) site; F, cDNA product obtained by priming from the internal A-rich stretch between nucleotides
5550–5575; arrow, cDNA obtained by internal priming from nucleotides 5085–5120.
66
REVIEW
Physiol Genomics • VOL
6•
ability of the NC protein to transiently eliminate secondary structures in the template RNA that obstruct
the polymerization process. Dose-response studies
have shown that a threshold concentration of the protein is required to demonstrate nucleic acid chaperone
effects (83, 162, 163). This minimum concentration is
generally in the range of 1 protein molecule per 7
nucleotides, approximately the ratio at which a nucleic
acid becomes saturated with NC protein.
To determine whether NC can function with MMLV
RT, recombinant NCp7 (Fig. 7A) was added to native
MMLV RT and Superscript II (Fig. 5D). Addition of
NCp7 to MMLV RT did not improve the quality of
first-strand product produced with this enzyme (Fig.
5D, compare lane 3 with 1). However, addition of NCp7
to Superscript II dramatically improved the quality of
www.physiolgenomics.org
Downloaded from http://physiolgenomics.physiology.org/ by 10.220.33.1 on June 14, 2017
able for re-pairing in alternative combinations. This
activity allows nucleic acid molecules to escape from
suboptimal conformations, which would otherwise represent kinetic traps.
The ability of NC proteins to refold RNA has been
demonstrated in a wide variety of assay systems: NC
proteins accelerate annealing of complementary
strands (40, 43, 93, 153, 163), facilitate transfer of a
nucleic acid strand from one hybrid to a more stable
hybrid (93, 153), cause unwinding of tRNA (83), and
stimulate release of the products of hammerhead ribozyme-mediated RNA cleavage (16, 68, 110). In addition, supplementing HIV RT with recombinant HIV
NC protein has been shown to reduce pausing and
increase the efficiency of synthesis of FL DNA products
(81, 148, 162). This effect is almost certainly due to the
REVIEW
67
Physiol Genomics • VOL
6•
www.physiolgenomics.org
Downloaded from http://physiolgenomics.physiology.org/ by 10.220.33.1 on June 14, 2017
Fig. 5. Effects of mRNA secondary structure on RT activity. A: a schematic diagram illustrating test constructs generated in which stable
stem-loop structures were inserted into the NcoI site of the WT1 gene. A Sau3A I fragment of the WT1 gene was inserted into pSP65(A)18,
positioning a tract of 18 adenosine residues downstream of the WT1 gene (pSP/WT1), thus allowing first-strand synthesis to be primed with
an oligo (dT) primer. Clones containing either one or two copies of the GNRA stem-loop structure were constructed in pSP/flWT1, a derivative
of pSP/WT1 in which 433 bp of the GC-rich 5⬘-UTR of WT1 was present. Termination of RT by the stem-loop structures is expected to generate
a truncated product of 918 bases, whereas generation of FL cDNA is expected to generate a product of ⬃1.4–1.5 kb in the case of pSP/WT1
and a product of ⬃1.9–2.0 kb in the case of pSP/flWT1 and pSP/flWT1/(GNRA). B: alkaline agarose gel analysis of RT products obtained by
priming WT1, flWT1, and flWT1/(GNRA)2 mRNA with oligo (dT)15 and Superscript II (lanes 2–4) or AMV (lanes 5 and 6). cDNA reactions
were performed with [␣-32P]dCTP, and products were visualized by autoradiography. The positions of migration of the predicted FL cDNA
products are indicated by arrows. Radioactive DNA markers (Life Technologies) are shown in lane 1. C: alkaline agarose gel analysis of RT
products obtained by priming WT1, flWT1, flWT1/(GNRA), and flWT1/(GNRA)2 mRNA with oligo (dT)15 and Superscript II (lanes 1–4) or
Thermoscript (lanes 5–8). cDNA reactions were performed with [␣-32P]dCTP, and products were visualized by autoradiography. The positions
of migration of the predicted FL cDNA products are indicated by arrows. D: alkaline agarose gel analysis of RT products obtained by priming
flWT1/(GNRA)2 mRNA with oligo (dT)15 and MMLV RT (lanes 1 and 3) or Superscript II (lanes 2 and 4). cDNA reactions were performed with
[␣-32P]dCTP, and products were visualized by autoradiography. The effect of the RNA chaperone, NCp7 (1 ␮g), on RT activity was assessed
(lanes 3 and 4). The positions of migration of the predicted FL cDNA products are indicated by arrows. E: assessment of denaturation
conditions currently used to improve RT processivity during first-strand cDNA synthesis. First-strand RT reactions were performed with
oligo (dT)15 and Superscript II, utilizing either RNA made from in vitro transcription of pSP/WT1 (lane 1), pSP/flWT1 (lane 2), or
pSP/flWT1/(GNRA)2 (lanes 3–11). Inhibition of RT processivity by the GNRA stem-loop structure is expected to yield a product of ⬃918 bases
(asterisk). Treatments performed on the RNA sample before the RT reaction are indicated and included as follows: lane 3, addition of oligo
(dT)15 primer before the RT enzyme; lane 4, addition of oligo (dT)15 primer after the RT enzyme; lane 5, incubation of the RNA at 65°C for
5 min; lane 6, denaturation at 65°C for 5 min, followed by snap freezing on dry ice and slow thawing on ice; lane 7, preincubation with
methylmercury hydroxide followed by neutralization with ␤-mercaptoethanol; lane 8, incubation in 5% DMSO followed by dilution to 1%
DMSO (since 5% DMSO inhibits the RT reaction); lane 9, incubation in 5% DMSO/35% glycerol, followed by dilution to 1% DMSO/7% glycerol;
lane 10, addition of trehalose to an RT reaction performed at 45°C; lane 11, addition of trehalose to an RT reaction performed at 55°C. cDNA
reactions were performed with [␣-32P]dCTP, and products were visualized by autoradiography. Radioactive DNA markers are from LTI.
68
REVIEW
tions of NCp7 are equally effective at providing the
effect we have described herein. The reasons for this
are currently not well understood.
Second-Strand Synthesis
first-strand product obtained with this enzyme (compare lane 4 with lane 2). We noticed two effects of
NCp7: 1) a reduction in the amount of truncated firststrand product and 2) a reduction in the amount of
undesired hairpin primed second-strand product (Fig.
5D, compare lane 4 with 2). Titrations of NCp7 in RT
reactions primed from WT1 or flWT1/(GNRA)2 mRNA
templates were performed (Fig. 7B). Addition of NCp7
to Superscript II reactions containing WT1 RNA did
not affect the quality of the products, producing only a
slight reduction in yield (compare lanes 1–5). Addition
of increasing amounts of NCp7 to flWT1/(GNRA)2
showed a significant improvement in the quality of the
RT products (compare lanes 6–10). At the highest concentration of NC (1.2 ␮g), the majority of the RT
products are full length (lane 10). These results demonstrate that NCp7 is capable of improving the processivity of Superscript II, and we have incorporated its
use to generate better quality first-strand product.
Parenthetically, we have found that not all RNase H⫺
RTs are stimulated by NCp7 and that not all preparaPhysiol Genomics • VOL
6•
www.physiolgenomics.org
Downloaded from http://physiolgenomics.physiology.org/ by 10.220.33.1 on June 14, 2017
Fig. 6. Degradation of RNA by metal ions. A: agarose/formaldehyde
gel analysis of 3H-labeled WT1 mRNA produced by in vitro transcription and incubated with various metal salts and hydroxides; 0.4 ␮g of
RNA in 50 mM Tris 䡠 HCl, pH 8.0, was incubated with no salt (lane 1),
2.5 mM LaCl3 䡠 7H2O (lane 2), 2.5 mM LuCl3 䡠 6H2O (lane 3), 2.5 mM
MnCl2 (lane 4), 2.5 mM MgCl2 (lane 5), 2.5 mM ZnCl2 (lane 6), or 2.5
mM CuCl2 (lane 7) at 22°C for 1 h. The quality of the RNA transcripts was analyzed by electrophoresis into a 1.2% agarose gel,
treating the gel with EN3HANCE (NEN), drying, and exposing to
X-OMAT film (Kodak). B: agarose/formaldehyde gel analysis of 3HWT1 mRNA produced by in vitro transcription and incubated with 3
mM MgCl2 at 22°C (lanes 1 and 2), 37°C (lanes 3–5), 45°C (lanes
6–8), and 55°C (lanes 9–11). Transcripts were incubated for 5 min
(lanes 3, 6, and 9) or for 60 min (lanes 1, 2, 4, 5, 7, 8, 10, 11) and
analyzed by agarose gel electrophoresis as described in A.
There are several approaches that can be used for
second-strand synthesis. In one set of methods, initiation of second-strand synthesis takes place within the
sequence of the first strand. One of these methods
involves synthesis of the second-strand with Escherichia coli DNA polymerase I or AMV RT by selfpriming of the first strand, followed by digestion of the
resulting loop with S1 nuclease (46, 69, 102, 129).
Another approach, the second-strand synthesis method
of Gubler and Hoffman (58), is the most widely utilized
procedure and relies on the combined activities of E.
coli RNase H and DNA polymerase I to catalyze second-strand synthesis from mRNA-cDNA hybrids by
RNA-primed nick translation (115). Unfortunately,
some sequence information at the mRNA 5⬘ terminus
is lost during subsequent cloning (8–50 nt), since RNA
priming can never accommodate the 5⬘ most sequences
(38, 44).
In a second group of methods, initiation of secondstrand synthesis takes place outside the sequence of
the first strand. For that purpose, a homopolymeric
tract is synthesized at the 3⬘ end of the first-strand
product with terminal deoxynucleotidyl transferase
(TdT) (9, 51, 128, 129). Ligation of an oligonucleotide at
the 3⬘ end was also reported, but the efficiency of this
reaction is low (153). In both cases, a complementary
oligonucleotide annealed to the extension allows initiation. Another method involves addition of a homopolymeric tract to the first strand and of a complementary
homopolymeric tract to the linearized plasmid vector,
followed by annealing (115). Depending on the method,
the second strand is synthesized with E. coli DNA
polymerase I, AMV RT, or thermostable polymerases.
This group of methods does not require digestion of the
3⬘ end of the first strand, thereby allowing conservation of the sequences corresponding to the 5⬘ end of the
mRNA.
With respect to homopolymeric tailing, generally a
stretch of guanosine residues are added to the 3⬘ end of
the first strand, since this reaction is self-limiting
(⬃10–15 nt) due to secondary structure formed by
poly(dG). Second-strand synthesis is then primed by
oligo (dC). Recently a series of thermostable enzymes
[Taq DNA polymerase, Ampligase (thermostable ligase), and Hybridase (thermostable RNase H)] have
been utilized to prime the second strand with oligo (dC)
(34). A lesser known approach has been to utilize TdT
to tail the first-strand cDNA product with dTTP, followed by second-strand synthesis utilizing oligo (dA)
and T7 DNA polymerase (20, 21), an enzyme which has
a higher 3⬘-exonuclease activity than polymerases previously used for second-strand synthesis. This results
in a higher level of fidelity (error rate is ⬃15 ⫻ 10⫺6)
and allows trimming of the poly d(T) tract during the
second-strand synthesis reaction. The size of the tract
69
REVIEW
synthesized with terminal deoxynucleotidyl transferase is therefore not required to be within a given
size range, and the resulting clones contain a tract of
limited size (⬃40 nt; see Refs. 20 and 21). Furthermore,
T7 DNA polymerase has a much higher processivity
(⬃300 nt/s) than the polymerases previously used for
second-strand synthesis, thus making it the enzyme of
choice for synthesis of long molecules (120, 146). However, in our hands the efficiency of this approach can be
quite low (probably due to inefficient tailing of the
cDNA by TdT), resulting in low conversion of firststrand product to second-strand product.
Alternatively, an “oligo capping” approach, also
known as reverse ligation-mediated PCR (RLPCR) (15)
has been reported and used with some success (104,
145). In this approach, before generation of first-strand
product, the 5⬘ end m7G mRNA cap structure is removed using tobacco acid pyrophosphatase. An oligoribonucleotide is then ligated onto the mRNA 5⬘ end
using T4 RNA ligase. Following first-strand synthesis,
the oligoribonucleotide at the mRNA 5⬘ end is copied
Physiol Genomics • VOL
6•
into DNA, if the polymerase makes it that far. This
unique sequence at the 5⬘ end then serves as an anchor
for second-strand reaction, which incorporates an amplification step using thermostable enzymes (145). Unfortunately, T4 RNA ligase is a very inefficient enzyme,
although the use of macromolecular crowding agents,
such as polyethylene glycol 8000, can improve the
efficiency of the reaction (64). The use of an amplification step in the generation of cDNA libraries based on
oligo capping is likely the result of the low-efficiency
ligation step, and is a cause for concern, given the
changes in representation associated with such manipulations. A variation on this approach involves oligodeoxynucleotide ligation to first-strand product with T4
RNA ligase (45). Another method of anchoring defined
sequences at the 3⬘ end of first-strand product is based
on the observation that Superscript II often adds three
to four non-template-derived cytidine residues to the 3⬘
end of cDNAs in the presence of manganese or high
magnesium. The presence of these additional residues
can be used to either ligate a specific DNA adaptor
www.physiolgenomics.org
Downloaded from http://physiolgenomics.physiology.org/ by 10.220.33.1 on June 14, 2017
Fig. 7. A: primary amino acid structure of HIV NCp7, a
well-characterized nucleic acid chaperone. The two zinc
fingers are underlined. B: titration of NCp7 into RT
reactions performed with Superscript II. The nature of
the input RNA is shown below the panel. Asterisk
denotes the major truncated RT product obtained due
to termination of DNA synthesis by the RT enzyme
when regions of secondary structure are encountered
by the enzyme. The amount of exogenously supplied
NCp7 is indicated above each well. First-strand synthesis was performed with 0.2 ␮g of template RNA, 50 ng
of oligo (dT)15 primer, and 100 U of Superscript II in a
final volume of 25 ␮l. Following first-strand synthesis
in the presence of [␣-32P]dCTP, products were extracted
with phenol/chloroform, passed over a G50 spun column, precipitated with ammonium acetate, and fractionated on a 1.2% agarose/formaldehyde gel. Products
were visualized by drying the gel and exposing to XOMAT film (Kodak).
70
REVIEW
achieve these ends, we 1) perform first-strand synthesis in the presence of 5-methyl-dCTP with a primer
d(T) 䡠 Z2, containing an XhoI site and 2) ligate BstXI
adaptors onto the products of our second-strand synthesis (Fig. 1). The nature of the BstXI recognition site
2
5⬘ CCANNNNNNTGG 3⬘
3⬘ GGTNNNNNNACC 5⬘
1
allows us to design adaptors with noncomplementary
ends, thus avoiding concatenation of the adaptors
(143). Subsequent cleavage of this ligation product
with XhoI produces a cDNA fragment with an XhoI site
at the 3⬘ end of the gene and a BstXI site at the 5⬘ end
of the gene. 3) We then fractionate the products of
second-strand synthesis on sucrose gradients (Fig. 8B)
to remove unligated BstXI adaptors and BstXI dimers
from the cDNA mixture. In addition, ligating different
cDNA size fractions to our propagation vector minimizes the amount of competition between inserts of
different sizes which may occur during transformation.
Insert Fractionation and Cloning
Propagation Vector, Library Maintenance, and
Clone Manipulation
The method one will use to insert cDNA fragments
into the propagation vector will depend on both the
vector design and downstream applications. Since our
main interest is in being able to rapidly evaluate clones
for their full coding nature, our needs require that we
directionally clone the inserts and place appropriate
sequencing primer sites immediately upstream and
downstream of the cloning sites. Additionally, the presence of rare restriction sites flanking the insert could
be useful in helping excise interesting inserts. To
Many methods have been reported in the literature
for inserting cDNA fragments into appropriate propagation vectors. We will not attempt to review these
methods, since they often are tailored to individual
needs or functional applications. In our case, our application calls for a vector that can accommodate large
inserts and that can easily be manipulated in a fashion
compatible with current high-throughput sequencing
technologies (at least with respect to identifying the
nature of the ends of the insert).
Fig. 8. A: Southern blot analysis of the ␤-actin and PTP-1E genes following second-strand synthesis. Following second-strand synthesis, 10%
of the reaction was fractionated on a 0.8% agarose gel and processed for Southern blotting. Note that the blot was first hybridized with
␤-actin, then stripped and rehybridized with a PTP-1E probe, to reduce the signal arising from the ␤-actin cDNA. B: sucrose gradient
fractionation of second-strand product. Following BstXI adaptor ligation and XhoI digestion, 33P-labeled second-strand product was loaded
on a 10–30% sucrose gradient and centrifuged in an SW40 rotor at 20,500 rpm for 23 h. Fractions were collected by puncturing the bottom
of the centrifuge tube and collecting 30 drops per fraction. These were precipitated, and an aliquot (10% of the sample) was analyzed by
electrophoresis on a 0.8% agarose gel. The gel was treated with EN3HANCE, dried, and exposed to X-OMAT film (Kodak).
Physiol Genomics • VOL
6•
www.physiolgenomics.org
Downloaded from http://physiolgenomics.physiology.org/ by 10.220.33.1 on June 14, 2017
(132) or to add 5⬘ end sequences during the RT process
based on CapFinder technology (Clontech). In test assays utilizing flWT1/(GNRA)2, we found that nontemplate cytidine residues appeared to also be added to
incomplete first-strand product, suggesting to us that
minimal discrimination between FL and incomplete
products is achieved with approaches relying on the
addition of 3⬘ end nontemplate residues by RTs during
first strand.
We evaluated two criteria in choosing a method for
second-strand synthesis. First, we avoided any procedures that required an amplification step, due to high
mutation rate and alteration of transcript representation. Second, we were interested in a method that
provided the highest conversion yield of first-strand
into second-strand product. In our hands, the classic
method of Okayama and Berg (115) and Gubler and
Hoffman (58) provided the highest yield, with minimal
amount of sample manipulation. As shown in Fig. 8A,
Southern blotting of the second-strand product with a
probe against ␤-actin and PTP-1E demonstrates that
the majority of second-strand product is still full length
and that size fractionation of the mRNA template effectively produces fractions that are enriched for the
different cDNA size classes.
REVIEW
imposing an upper size limit on the inserts that can be
propagated in this system. 4) Excision of inserts from
the vectors is a labor-intensive process, even for ␭ZAPbased libraries.
There are a number of requirements that a vector
system designed for library maintenance should fulfill
to satisfy the needs of the genomics community. The
vector must be easy to manipulate, allow propagation
of large inserts, allow excision or manipulation of the
intact cDNA, and be compatible with current sequencing technology. A plasmid-based cDNA library has the
advantage that clones can be individually manipulated
utilizing picking robots and pipetting stations. Clones
from a given library can be pooled, and schemes for
PCR-based screening can be implemented (111). Although the initial costs and labor in setting up these
arrayed libraries is significant, the library pools can be
stored indefinitely at ⫺80°C and represent enough
material for ⬎10,000 individual screens; these libraries represent an essentially permanent and maintenance-free resource. In addition, PCR screening of arrayed libraries is adaptable to high throughput, and
can be completed in a few days (111). For hybridization-based screening, the clones of the libraries can be
gridded onto nitrocellulose filters and microarrays.
We have modified a pUC-based vector for library
propagation (Fig. 9A). pUC plasmids are high copy due
to a point mutation (G to A) immediately preceding the
RNA I (antisense RNA) gene in the origin of replication. This mutation alters the conformation of RNA II
(RNA primer), preventing the hybridization of RNA I
Fig. 9. A: schematic diagram of pMD1,
a pUC18-derived vector. The binding
sites for the M13 reverse and T7 sequencing primers are shown. The stuffer
is an ⬃650-bp BglII fragment derived
from ␭. B: temperature-sensitive replication of pUC18 and pMD1. E. coli
DH10␤ cells were grown at the indicated temperatures to the same OD660,
at which point DNA mini-preps were performed to isolate the plasmids. DNA was
analyzed on a 1% agarose gel and visualized by staining with ethidium bromide.
DNA from equal cell numbers are
loaded in each lane. C: stable propagation of a 16.8-kbp BamHI fragment isolated from ␭ and cloned into pMD1. E.
coli DH10␤ cells were grown at 30°C,
and supercoiled DNA (lane 1) or
BamHI-digested DNA (lane 2) was analyzed on a 1% agarose gel.
Physiol Genomics • VOL
6•
www.physiolgenomics.org
Downloaded from http://physiolgenomics.physiology.org/ by 10.220.33.1 on June 14, 2017
Until 1983, almost all cDNA libraries were propagated in plasmid vectors, and were usually maintained
as a collection of more than 105 independently transformed bacterial colonies. Sometimes these colonies
were pooled, amplified in liquid culture, and stored at
⫺70°C; more frequently they were maintained on the
surfaces of nitrocellulose filters (62). However, these
libraries were difficult to maintain without loss of clone
viability, and screening by hybridization to multiple
radioactive probes required labor-intensive replication
of colonies from one nitrocellulose filter to another.
With the advent of bacteriophage cloning vectors, it
became possible to take advantage of the high efficiency and reproducibility of packaging bacteriophage
␭ DNA in vitro into infectious virus particles. The
resulting libraries could be amplified and stored indefinitely without loss of clone viability and could be
screened with both nucleic acid and antibody probes.
Testimony to the usefulness of the ␭ cloning vectors is
that the majority of current cDNA libraries utilize
these vectors. However, even the bacteriophage propagation system suffers from serious drawbacks. 1) Amplification of bacteriophage cDNA libraries leads to loss
of clone representation due to differential growth rates
of different clones. 2) Plaque hybridization methods are
cumbersome, labor intensive, time consuming, and
thus not adaptable for larger-scale screening efforts.
Filter replicas of cDNA libraries have finite lifespans
and must be periodically regenerated, resulting in the
rapid consumption of unique and valuable resources. 3)
Up to 10 kbp in length may be propagated in ␭ZAP,
71
72
REVIEW
Normalization and Subtraction Strategies
Reassociation-kinetic analysis indicates that the
mRNAs of a typical cell are distributed into three
frequency classes: a highly abundant class consisting
of 10–15 mRNAs that together represent 10–20% of
the total mRNA mass, a middle abundance class consisting of 1,000–2,000 mRNAs making up 30–40% of
the total mRNA, and a low-abundance class consisting
of 15,000–20,000 mRNAs covering ⬃50% of the total
mRNA (18). Hence, sequencing cDNAs from standard
libraries is not an efficient approach for novel gene
discovery or for identifying rare cDNAs. Thus for gene
identification endeavors, normalization of cDNA libraries has been an important tool (140). Reassociation
kinetics has been successfully used toward this end
(23, 85, 119, 139). Indeed, an evaluation of the extent of
normalization has indicated that, from an extreme
range of abundance of four orders of magnitude in an
original library, the frequency of occurrence of any
clone examined in the normalized library can be
brought within the narrow range of only one order of
magnitude (139). However, many normalization procedures were not developed with the aim of selecting or
maintaining large cDNA coding regions and thus involve library amplification, complex in vitro manipulations, and retransformation of normalized clones. Indeed, the length of some cDNAs are noticeably
shortened in some normalized libraries (23). Potentially severe problems are associated with the need to
use both amplified (rather than primary) libraries and
protocols involving exponential amplification of complex pools of inserts.
Recently, Carninci et al. (35) have described a different approach to gene normalization and subtraction. In
Physiol Genomics • VOL
6•
their approach, they have incorporated a methodology
that appears to be compatible with maintaining selection for large cDNA clones. They normalize their cDNA
preparations using an aliquot of biotinylated cellular
mRNA that they prepare (utilizing a Rot of 10) by
removing RNA/cDNA duplexes with streptavidincoated magnetic beads. Additionally, they have developed a subtraction approach where nonredundant
clones from the RIKEN libraries are used to generate
biotinylated mRNA using in vitro transcription reactions. This RNA is then hybridized to first-strand product at a Rot of 5, and duplexes are removed utilizing
streptavidin-coated magnetic beads (35). They clearly
demonstrated the efficacy of their normalization and
subtraction approach. One concern that needs to be
monitored is the nonspecific removal of rare transcripts during the normalization steps.
AFFINITY SELECTION OF FULL-LENGTH
COMPLEMENTARY DNAS
A common structural feature of all cellular eukaryotic mRNAs (except for organelle mRNAs) analyzed to
date is the presence of a 5⬘-terminal m7GpppN (where
N is any nucleotide) structure, termed the cap (140).
This structure is added early during RNA polymerase
II-driven transcription and is required for several steps
of mRNA biogenesis. Consistent with the diverse roles
of the cap during gene expression, a number of cap
binding proteins have been identified in the cytoplasm
and nucleus (54, 140). The first-described, and bestcharacterized of these, is eukaryotic initiation factor-4E (eIF-4E), a 24-kDa polypeptide that has been
cloned and characterized from a number of species.
This protein shows strong binding specificity for methylated, cap structures of eukaryotic mRNAs (54).
We have developed an affinity chromatography procedure, called CAPture, that allows for the purification
of mRNA via the cap structure (44). Previous to this,
several laboratories had used antibodies directed
against the cap structure to select and purify eukaryotic mRNAs via their 5⬘ end. Schwer et al. (134) and de
Magistris and Stunnenberg (42) used a polyclonal antim7G antibody to demonstrate that vaccinia virus late
mRNAs are discontinuously synthesized. Muhlrad et
al. (109) used this antibody to define an mRNA decay
pathway in which polyadenylation leads to decapping.
However, this antibody resource is limiting, and the
efficiency of cap selection is not known. In related
studies, a monoclonal antibody directed against 2,2,7trimethylguanosine was generated and used to purify
snRNAs (19) and to demonstrate the presence of a
caplike structure at the 5⬘ end of mutant ␤-globin
transcripts (96). Although the antibody is not limiting,
its use in purifying capped mRNAs is somewhat restricted due to its 15- to 20-fold lower affinity for m7G
cap structures relative to trimethylated caps (19, 96).
Nonetheless, these experiments demonstrated the feasibility of selecting mRNA molecules via their cap
structure. The use of a bifunctional cap binding protein, such as protein A/meIF-4E, provides a highly
www.physiolgenomics.org
Downloaded from http://physiolgenomics.physiology.org/ by 10.220.33.1 on June 14, 2017
to RNA II and resulting in an increased rate of DNA
replication from the ColEI ori. The phenotypic effects
of this mutation are suppressed by lowering the growth
temperature to 30°C (97). Thus the ColEI origin of
replication in pUC is temperature sensitive, and its
copy number can be controlled by different growth
temperatures for safe propagation of large inserts. We
have deleted all nonessential regions from pUC to
construct the pMD1 vector. We have confirmed that the
new vector is temperature sensitive (Fig. 9B) and allows propagation of inserts as large as 16.8 kbp (Fig.
9C). This vector allows the generation of directionally
cloned cDNA libraries utilizing the XhoI and BstXI
restriction enzyme sites.
It is well documented that bacterial strains with
different inserts do not always have the same growth
rates or lag phase (138). Thus one issue with library
maintenance deals with amplification and the potential for loss of clone representation during this process.
We feel it is best to pick individual colonies to avoid
this potential problem, and in our experience, following
an initial library plating, colony size can vary by as
much as 10-fold. Methods for handling large number of
cDNA clones and sorting them by oligonucleotide fingerprinting have been described (32, 47, 65, 134).
REVIEW
Physiol Genomics • VOL
6•
to note that the oxidation reaction with NaIO4 is difficult to control, and the molar ratio of periodate to
substrate is important, otherwise one gets destruction
of base rings (142, 151).
Many libraries have been generated with the cap
trapper procedure, indicating that this technique has
been well adopted to a cDNA cloning program currently in place at RIKEN (35). It is difficult for us to
assess whether this approach is truly enriching for FL
sequences. A perusal of the average cDNA insert size of
cap trapper libraries suggests that many of the clones
in these libraries are of the same size as clones from
commercially available libraries. An additional concern with these 5⬘ end affinity approaches is that they
may actually select against certain genes. Thus genes
with exceptionally difficult 5⬘ ends (high GC content,
abundant secondary structure) that will block RT processivity should not be present in cap trapper libraries.
Hence, our experience with procedures aimed at enriching for FL cDNAs indicate that these shortcomings
first need to be addressed and resolved before they are
integrated into cDNA library synthesis protocols.
AUTOMATION AND MINIATURIZATION
OF THE PROCESS
Considering the number of complex steps involved in
cDNA synthesis and the number of libraries one would
like to generate from different tissues and cell lines,
the integration of this technology into a microfluidics
platform where some of the reactions could be performed in a closed, miniaturized, and integrated system would greatly accelerate the generation of cDNA
libraries. In addition, a closed platform would minimize the introduction of nucleases by the experimenter. We feel that size fractionation of RNA, oligo
(dT) selection of mRNA, synthesis of first strand, synthesis of second strand, polishing of cDNA ends, fractionation of double-stranded cDNA, and ligation into
the appropriate cloning vector are all amenable to
miniaturization and microfluidics technology. Indeed,
several years ago we approached a microfluidics group
and presented this concept to them (82). They have
utilized poly-T-coated dynal beads and demonstrated
that selection of mRNA from total RNA is feasible
utilizing microfluidics, suggesting that it would be possible to perform cDNA library construction in microfluidic devices. There is still room for improvement here,
however, since the efficiency of poly(A)⫹ mRNA selection appeared to be poor and the authors did not
perform an extensive analysis of their selected material to demonstrate that there was no change in RNA
representation or quality (across different size groups)
during the selection process (82). With more sophisticated approaches and better chip designs (31), one
should be able to achieve respectable yields and
throughput. Indeed, it should also be possible to link
up mRNA isolation, cDNA synthesis, and amplification
by PCR utilizing microfluidics technology (161).
www.physiolgenomics.org
Downloaded from http://physiolgenomics.physiology.org/ by 10.220.33.1 on June 14, 2017
efficient and specific method for cap selection (44).
Physical immobilization of the eIF-4E cap binding protein produces an affinity column that is capable of
binding to mRNA cap structures with high affinity (44)
and which has been used in several analytical procedures (50, 105). In addition, following first-strand synthesis, one can treat the cDNA/mRNA hybrids with
single strand-specific nucleases (such as RNase A) to
remove the cap structure from mRNA in duplexes
where the cDNA has not progressed all the way to the
5⬘ end. In FL cDNA/mRNA hybrids, the mRNA is
protected, and consequently only these hybrids will
retain a cap structure. Selection by affinity chromatography utilizing immobilized eIF-4E leads to enrichment of FL cDNAs (44).
One shortfall of this approach that we have noticed is
that secondary structure immediately adjacent to the
cap structure lowers the efficiency of selection (J. Pelletier, data not shown). This is consistent with studies
on eukaryotic translation indicating that the affinity of
eIF-4E is rate limiting for mRNAs with inaccessible
cap structures, compared with those where the cap
structure is accessible (94). Hence, the lowered efficiency of CAPture for cDNA/mRNA duplexes can
partly be due to a decreased affinity of eIF-4E for the
m7G group due to steric hindrance from the 3⬘ terminal
nucleotide from the DNA strand of the duplex. This
possibility is based on data from the cocrystal structure
of eIF-4E bound to m7GDP (103). eIF-4E is shaped like
a cupped hand with the base of the m7GDP ligand
buried deep in the cup and the phosphate residues
oriented toward the outside of the cup (103). It is
predicted from the crystal structure of eIF-4E that the
last template-derived base of the cDNA would abut
against the first ␤-sheet of the palm of the eIF-4E
structure and is likely to interfere with binding. The
situation is worsened by the fact that RTs, such as
Superscript II, add additional non-template-derived
nucleotides to the 3⬘ ends of cDNAs. The use of other
cap binding proteins, as well as mutants of eIF-4E, in
the CAPture assay is currently under evaluation.
Recently, a variation on CAPture, called cap trapper,
has been described in which a biotin group is introduced into the diol residue of the cap structure (after
oxidation of the ribose moiety with NaIO4). Following
treatment of the cDNA/mRNA duplex with RNase
ONE (a single-strand-specific nuclease), selection of FL
cDNAs is undertaken utilizing streptavidin-coated
magnetic beads (33). Although this method overcomes
the limitations described above for CAPture, in our
hands, this method has not proven to be specific. That
is, if we compare the ability of cap trapper to discriminate between cDNA duplex with capped mRNA (generated in vitro) or duplexed with uncapped mRNA
(generated in vitro), then we are unable to obtain
specific selection of capped over uncapped transcripts
(J. Pelletier, data not shown). This is likely due to the
fact that biotin-hydrazide can also react with unoxidized RNA due to incipient reaction of cytosine residues (66, 142). Hence, addition of biotin is not solely
directed toward the cap structure. Also, it is important
73
74
REVIEW
INFORMATION CONTENT OF COMPLEMENTARY DNAS
Problems Associated with ORF Prediction
in Eukaryotic mRNAs
Physiol Genomics • VOL
6•
The Untranslated Regions
Posttranscriptional regulation of gene expression is
an important determinant of the final protein concentration produced from a particular transcript. The significance of such controls is evident from studies in
Xenopus laevis, Drosophila melanogaster, and Caenorhabditis elegans, where early patterning in development is largely determined by regulating the distribution, stability, and translation of inherited maternal
transcripts (135). In mammals, posttranscriptional
regulation has been well described under conditions of
heat shock (137) in response to iron availability (67) or
oxygen deprivation (106) or in response to growth factors (3). Two well-established posttranscriptional control mechanisms are alterations in mRNA stability and
mRNA translation efficiency. The cis-acting mRNA sequences that are involved in these processes generally
reside within the mRNA 5⬘-UTR and/or 3⬘-UTR. Thus
a thorough characterization of mRNA UTRs will likely
provide tremendous insight into these regulatory mechanisms. An example is the description of conserved RNA
www.physiolgenomics.org
Downloaded from http://physiolgenomics.physiology.org/ by 10.220.33.1 on June 14, 2017
Most interest in cDNAs is aimed at being able to
identify the major open reading frame (ORF) of eukaryotic genes to pursue functional studies. Unfortunately,
to date, there are no absolute sets of rules by which this
can be done with certainty, and one can only be guided
by generalities learnt from studies on translation. In
many cases, one can predict in silico the largest ORF of
an mRNA from cDNA sequence, based on the presence
of an AUG codon and one of three termination codons.
However, this oversimplified approach does not always
yield all the polypeptides that can be potentially encoded by a eukaryotic mRNA.
It is generally accepted that eukaryotic mRNAs engage small ribosomal units (actually a 43S preinitiation complex consisting of a 40S ribosome and associated initiation factors) for translation by one of two
ways: 1) via recruitment to the cap structure
(m7GpppN) present at the 5⬘ end of eukaryotic cellular
mRNAs, followed by threading to the appropriate initiation codon, or 2) by binding to internal ribosome
entry sites (121). However, until mRNA primary/secondary structural features for internal initiation are
known and defined, one cannot predict ORFs that
utilize this mechanism of initiation on cellular mRNAs.
These ORFs can be identified only by functional assays
(for example, see Ref. 121) and hence will not be predicted by in silico approaches (Fig. 10A).
The presence of upstream AUG codons that are in a
different reading frames than the AUG of the major
ORF can act to shunt ribosomes past the AUG codon of
the major ORF (Fig. 10B). In the event that there is a
second AUG codon in frame and downstream of the
first AUG of the major ORF, this event would produce
a polypeptide that is shorter than predicted by in silico
approaches (Fig. 10B).
It has been shown that the presence of upstream
ORFs (uORFs) can decrease, or eliminate, the number
of ribosomes that initiate at a downstream AUG (88),
especially if the downstream AUG is in close proximity
to the termination codon of a translated uORF (88)
(Fig. 10B). It appears that a threading ribosome, after
having terminated translation of an uORF, requires
some time (and hence intercistronic distance) before it
can become competent again for reinitiation (this probably reflects a requirement on the part of the ribosome
to recruit an initiator Met-tRNA). Thus this mRNA
configuration is predicted to produce a polypeptide that
is shorter than predicted by in silico approaches, since
the ribosomes would “miss” the first AUG of the major
ORF and initiate at a downstream AUG.
Sequences flanking the AUG can contribute to the
efficiency with which that codon is recognized as an
initiation codon (87). An optimal context for an AUG is
(A/G)ccAUGg, with the greatest contribution being
from the purine at the ⫺3 position relative to the A of
the AUG codon (87). Thus mRNAs where the first AUG
of the major ORF is in a “suboptimal” context, would be
predicted to produce more that one polypeptide.
The presence of secondary structure surrounding an
AUG codon can influence the efficiency with which that
initiation codon is recognized. Secondary structure immediately downstream of an AUG codon can “pause”
scanning ribosomes sufficiently long to allow recognition of the initiation codon (86). Hence, a predicted
AUG codon may not be efficiently used because of lack
of local secondary structure, thus allowing a certain
percentage of ribosomes to bypass the first AUG codon
and initiate at downstream AUGs.
Although AUG codons are generally used as initiation codons for eukaryotic mRNAs, there are examples
of cellular mRNAs that utilize other codons (GUG,
ACG, and CUG) for this purpose. Examples include
some protooncogenes (myc, int-2, pim-1, and lyl-1) (1,
63, 107, 131), the basic fibroblast growth factor gene
(124), retinoic acid receptor ␤4 (112), Krox-24 (also a
member of the EGR family) (95), the ltk receptor (14),
and the WT1 tumor suppressor gene (29). The nature
of the signals that dictate the use of a CUG codon as an
initiation codon is not well understood, although immediate downstream sequences can influence the efficiency with which CUG codons are selected (22, 56).
These non-AUG codons are difficult to accurately predict by in silico approaches, since little is understood
regarding the criteria that need to be met for their
selection as initiation codons.
From these examples, it should be clear that bioinformatic approaches that simply rely on defining the
longest ORF, or defining the initiation codon based on
context sequences, are oversimplifying a complex biological process (i.e., the selection of the initiator codon).
Such predictions should be supplemented with careful
functional studies to provide true guidance as to the
identity of the protein(s) potentially encoded by a given
mRNA transcript.
75
REVIEW
Physiol Genomics • VOL
6•
www.physiolgenomics.org
Downloaded from http://physiolgenomics.physiology.org/ by 10.220.33.1 on June 14, 2017
Fig. 10. Schematic representation of ways in which protein
diversity that can be generated by eukaryotic mRNAs. A:
there are two mechanisms by which eukaryotic ribosomes
can initiate translation on mRNA templates, either a scanning mechanism or by internal initiation of translation (see
text for details). B: situations in which several polypeptides
can be produced from a single eukaryotic mRNA species.
Although most eukaryotic mRNAs are structurally monocistronic, there are many situations in which they can be functionally polycistronic. See text for details.
76
REVIEW
TOWARD A HUMAN GENE SET
If current estimates of the total gene number are
accurate, based on recent published analysis of the
human draft sequence (91, 158), then the biological
community will require ⬃30,000–40,000 cDNAs to represent one form of every gene encoded by the genome.
This is certainly an attainable goal. However, since
gene diversity can be increased by a number of posttranscriptional processes (alternate promoter usage,
alternative splicing, alternate polyadenylation, and
RNA editing), in reality a much larger set will be
required to obtain functional cDNA copies of all mRNA
isoforms.
Recently, a number of cDNA libraries have been
characterized to isolate cDNAs with complete ORFs
(35, 73, 144, 145, 165, 166). It is clear from these
studies that a large effort will be necessary to achieve
this task and that different approaches currently underway are likely to complement each other. The approaches described herein should be useful in providing a framework by which good quality cDNA libraries
can be generated and subsequently used to identify
important cDNA reagents.
J. Pelletier is a Canadian Institutes of Health Research Senior
Investigator. This work was supported by the Merck Genome Research Initiative and by National Institutes of Health Grant CA85174 to J. Pelletier.
REFERENCES
1. Acland P, Dixon M, Peters G, and Dickson C. Subcellular
fate of the int-2 oncoprotein is determined by choice of initiation
codon. Nature 343: 662–665, 1990.
2. Althaus IW, Franks KM, Langley KB, Kezdy FJ, Peterson
T, Buxser SE, Decker DE, Dolak LA, Ulrich RG, and
Reusser F. The amphiphilic properties of novenamines determine their activity as inhibitors of HIV-1 RNase H. Experientia
52: 329–335, 1996.
3. Amara FM, Chen FY, and Wright JA. A novel transforming
growth factor-beta 1 responsive cytoplasmic trans-acting factor
binds selectively to the 3⬘-untranslated region of mammalian
ribonucleotide reductase R2 mRNA: role in message stability.
Nucleic Acids Res 21: 4803–4809, 1993.
Physiol Genomics • VOL
6•
4. Antao VP, Lai SY, and Tinoco I. A thermodynamic study of
unusually stable RNA and DNA hairpins. Nucleic Acids Res 19:
5901–5905, 1991.
5. Baird NR, Orlowski J, Szabo EZ, Zaun HC, Schultheis PJ,
Menon AG, and Shull GE. Molecular cloning, genomic organization, and functional expression of Na⫹/H⫹ exchanger isoform 5 (NHE5) from human brain. J Biol Chem 274: 4377–
4382, 1999.
6. Bakhanashvili M and Hizi A. Fidelity of the reverse transcriptase of human immunodeficiency virus type 2. FEBS Lett
306: 151–156, 1992.
7. Bakhanashvili M and Hizi A. Fidelity of the RNA-dependent
DNA synthesis exhibited by the reverse transcriptases of human immunodeficiency virus types 1 and 2 and of murine
leukemia virus: mispair extension frequencies. Biochemistry
31: 9393–8398, 1992.
8. Bassett JH, Rashbass P, Harding B, Forbes SA, Pannett
AA, and Thakker RV. Studies of the murine homolog of the
multiple endocrine neoplasia type 1 (MEN1) gene, men1.
J Bone Miner Res 14: 3–10, 1999.
9. Belyavsky A, Vinogradova T, and Rajewsky K. PCR-based
cDNA library construction: general cDNA libraries at the level
of a few cells. Nucleic Acids Res 17: 2919–2932, 1989.
10. Berger SL. Isolation of cytoplasmic RNA: ribonucleoside-vanadyl complexes. Methods Enzymol 152: 227–234, 1987.
11. Berger SL. Preparation and characterization of RNA: overview. Methods Enzymol 152: 215–219, 1987.
12. Berger SL and Chirgwin JM. Isolation of RNA. Methods
Enzymol 180: 3–13, 1989.
13. Berget SM. Exon recognition in vertebrate splicing. J Biol
Chem 270: 2411–2414, 1995.
14. Bernards A and de la Monte SM. The ltk receptor tyrosine
kinase is expressed in pre-B lymphocytes and cerebral neurons
and uses a non-AUG translational initiator. EMBO J 9: 2279–
2287, 1990.
15. Bertrand E, Fromont-Racine M, Pictet R, and Grange T.
Visualization of the interaction of a regulatory protein with
RNA in vivo. Proc Natl Acad Sci USA 90: 3496–500, 1993.
16. Bertrand EL and Rossi JJ. Facilitation of hammerhead
ribozyme catalysis by the nucleocapsid protein of HIV-1 and the
heterogeneous nuclear ribonucleoprotein A1. EMBO J 13:
2904–2912, 1994.
17. Birney E and Durbin R. Using GeneWise in the Drosophila
annotation experiment. Genome Res 10: 547–548, 2000.
18. Bishop JO, Morton JG, Rosbash M, and Richardson M.
Three abundance classes in HeLa cell messenger RNA. Nature
250: 199–204, 1974.
19. Bochnig P, Reuter R, Bringmann P, and Luhrmann R. A
monoclonal antibody against 2,2,7-trimethylguanosine that reacts with intact, class U, small nuclear ribonucleoproteins as
well as with 7-methylguanosine-capped RNAs. Eur J Biochem
168: 461–467, 1987.
20. Bodescot M and Brison O. Efficient second-strand cDNA
synthesis using T7 DNA polymerase. DNA Cell Biol 13: 977–
985, 1994.
21. Bodescot M and Brison O. T7 DNA polymerase requires
unusual reaction conditions for blunt-ending activity. Anal Biochem 216: 234–235, 1994.
22. Boeck R and Kolakofsky D. Positions ⫹5 and ⫹6 can be
major determinants of the efficiency of non- AUG initiation
codons for protein synthesis. EMBO J 13: 3608–3617, 1994.
23. Bonaldo MF, Lennon G, and Soares MB. Normalization and
subtraction: two approaches to facilitate gene discovery. Genome Res 6: 791–806, 1996.
24. Brandhorst BP and McConkey EH. Relationship between
nuclear and cytoplasmic poly(adenylic acid). Proc Natl Acad Sci
USA 72: 3580–3584, 1975.
25. Bray M, Prasad S, Dubay JW, Hunter E, Jeang KT, Rekosh D, and Hammarskjold ML. A small element from the
Mason-Pfizer monkey virus genome makes human immunodeficiency virus type 1 expression and replication Rev-independent. Proc Natl Acad Sci USA 91: 1256–1260, 1994.
26. Brockdorff N, Ashworth A, Kay GF, McCabe VM, Norris
DP, Cooper PJ, Swift S, and Rastan S. The product of the
www.physiolgenomics.org
Downloaded from http://physiolgenomics.physiology.org/ by 10.220.33.1 on June 14, 2017
sequences within 3⬘-UTRs that act as sensors to environmental stress (141).
Since specific RNA-binding proteins are generally
the mediators of such posttranscriptional effects,
knowledge about the primary sequence of the UTRs is
necessary to identify the mediators of these responses.
This is most evident in the case of insulin-like growth
factor II (IGF-II), a fetal growth factor with autocrine
and paracrine modes of action, where several mRNA
isoforms are produced; one is the 4.8-kb leader 4 mRNA
with a 5⬘-UTR of 100 nucleotides, and another is the
abundant 6.0-kb leader 3 mRNA comprising a 1,170-nt
cytidine-rich (48%) 5⬘-UTR. The IGF-II leader 3 mRNA
switches from a translated to a repressed state between embryonic days 11.5 and 12.5 in the mouse
(114). Implicated in this regulation are specific cytoplasmic 5⬘-UTR binding proteins that recognize the
leader 3 isoform, but not the leader 4 IGF-II mRNA
isoform (114).
77
REVIEW
27.
28.
29.
30.
31.
33.
34.
35.
36.
37.
38.
39.
40.
41.
42.
43.
44.
45.
Physiol Genomics • VOL
6•
46.
47.
48.
49.
50.
51.
52.
53.
54.
55.
56.
57.
58.
59.
60.
61.
62.
63.
64.
65.
66.
5⬘ ends of mRNAs and for constructing cDNA libraries by in
vitro amplification. Nucleic Acids Res 19: 5227–5232, 1991.
Efstratiadis A, Kafatos FC, Maxam AM, and Maniatis T.
Enzymatic in vitro synthesis of globin genes. Cell 7: 279–288,
1976.
Eickhoff H, Schuchhardt J, Ivanov I, Meier-Ewert S,
O’Brien J, Malik A, Tandon N, Wolski EW, Rohlfs E,
Nyarsik L, Reinhardt R, Nietfeld W, and Lehrach H.
Tissue gene expression analysis using arrayed normalized
cDNA libraries. Genome Res 10: 1230–1240, 2000.
Emerman M, Vazeux R, and Penden K. The rev gene product of the human immunodeficiency virus affects envelopespecific RNA localization. Cell 57: 1155–1165, 1989.
Fremont P, Dionne FT, and Rogers PA. Recovery of biologically functional messenger RNA from agarose gels by passive
elution. Anal Biochem 156: 508–514, 1986.
Fresco LD and Buratowski S. Conditional mutants of the
yeast mRNA capping enzyme show that the cap enhances, but
is not required for, mRNA splicing. RNA 2: 584–596, 1996.
Fukuoka S and Scheele GA. Novel strategy for synthesis of
full-length double-stranded cDNA transcripts without dC-dG
tails. Nucleic Acids Res 19: 6961–6962, 1991.
Gedamu L, Iatrou K, and Dixon GH. Isolation and characterization of trout testis protamine mRNAs lacking poly (A).
Cell 10: 443–451, 1977.
Gerard GF, Fox DK, Nathan M, and D’Alessio JM. Reverse
transcriptase. The use of cloned Moloney murine leukemia
virus reverse transcriptase to synthesize DNA from RNA. Mol
Biotechnol 8: 61–77, 1997.
Gingras AC, Raught B, and Sonenberg N. eIF4 initiation
factors: effectors of mRNA recruitment to ribosomes and regulators of translation. Annu Rev Biochem 68: 913–963, 1999.
Glos M, Kreienkamp HJ, Hausmann H, and Richter D.
Characterization of the 5⬘-flanking promoter region of the rat
somatostatin receptor subtype 3 gene. FEBS Lett 440: 33–37,
1998.
Grunert S and Jackson RJ. The immediate downstream
codon strongly influences the efficiency of utilization of eukaryotic translation initiation codons. EMBO J 13: 3618–3630,
1994.
Gruter P, Tabernero C, von Kobbe C, Schmitt C, Saavedra C, Bachi A, Wilm M, Felber BK, and Izaurralde E.
TAP, the human homolog of Mex67p, mediates CTE-dependent
RNA export from the nucleus. Mol Cell 1: 649–659, 1998.
Gubler U and Hoffman BJ. A simple and very efficient
method for generating cDNA libraries. Gene 25: 263–269, 1983.
Guo Z, Liu Q, and Smith LM. Enhanced discrimination of
single nucleotide polymorphisms by artificial mismatch hybridization. Nat Biotechnol 15: 331–335, 1997.
Hadzopoulou-Cladaras M, Felber BK, Cladaras C, Athanassopoulos A, Tse A, and Pavlakis GN. The rev (trs/art)
protein of human immunodeficiency virus type 1 affects viral
mRNA and protein expression via a cis-acting sequence in the
env region. J Virol 63: 1265–1274, 1989.
Hagenbuch B, Lubbert H, Stieger B, and Meier PJ. Expression of the hepatocyte Na⫹/bile acid cotransporter in Xenopus laevis oocytes. J Biol Chem 265: 5357–5360, 1990.
Hanahan D and Meselson M. Plasmid screening at high
colony density. Methods Enzymol 100: 333–342, 1983.
Hann SR, King MW, Bentley DL, Anderson CW, and
Eisenman RN. A non-AUG translational initiation in c-myc
exon 1 generates an N-terminally distinct protein whose synthesis is disrupted in Burkitt’s lymphomas. Cell 52: 185–195,
1988.
Harrison B and Zimmerman SB. Polymer-stimulated ligation: enhanced ligation of oligo- and polynucleotides by T4 RNA
ligase in polymer solutions. Nucleic Acids Res 12: 8235–8251,
1984.
Hartuv E, Schmitt AO, Lange J, Meier-Ewert S, Lehrach
H, and Shamir R. An algorithm for clustering cDNA fingerprints. Genomics 66: 249–256, 2000.
Hayatsu H and Ukita T. Selective modification of cytidine
residue in ribonucleic acid by semicarbazide. Biochem Biophys
Res Commun 14: 198–203, 1964.
www.physiolgenomics.org
Downloaded from http://physiolgenomics.physiology.org/ by 10.220.33.1 on June 14, 2017
32.
mouse Xist gene is a 15 kb inactive X-specific transcript containing no conserved ORF and located in the nucleus. Cell 71:
515–526, 1992.
Brown CJ, Flenniken AM, Williams BR, and Willard HF.
X chromosome inactivation of the human TIMP gene. Nucleic
Acids Res 18: 4191–4195, 1990.
Brown CJ, Hendrich BD, Rupert JL, Lafreniere RG, Xing
Y, Lawrence J, and Willard HF. The human XIST gene:
analysis of a 17 kb inactive X-specific RNA that contains conserved repeats and is highly localized within the nucleus. Cell
71: 527–542, 1992.
Bruening W and Pelletier J. A non-AUG translational initiation event generates novel WT1 isoforms. J Biol Chem 271:
8646–8654, 1996.
Burge C and Karlin S. Prediction of complete gene structures
in human genomic DNA. J Mol Biol 268: 78–94, 1997.
Burns MA, Johnson BN, Brahmasandra SN, Handique K,
Webster JR, Krishnan M, Sammarco TS, Man PM, Jones
D, Heldsinger D, Mastrangelo CH, and Burke DT. An
integrated nanoliter DNA analysis device. Science 282: 484–
487, 1998.
Bussow K, Nordhoff E, Lubbert C, Lehrach H, and Walter
G. A human cDNA library for high-throughput protein expression screening. Genomics 65: 1–8, 2000.
Carninci P, Kvam C, Kitamura A, Ohsumi T, Okazaki Y,
Itoh M, Kamiya M, Shibata K, Sasaki N, Izawa M, Muramatsu M, Hayashizaki Y, and Schneider C. High-efficiency
full-length cDNA cloning by biotinylated CAP trapper. Genomics 37: 327–336, 1996.
Carninci P, Nishiyama Y, Westover A, Itoh M, Nagaoka S,
Sasaki N, Okazaki Y, Muramatsu M, and Hayashizaki Y.
Thermostabilization and thermoactivation of thermolabile enzymes by trehalose and its application for the synthesis of full
length cDNA. Proc Natl Acad Sci USA 95: 520–524, 1998.
Carninci P, Shibata Y, Hayatsu N, Sugahara Y, Shibata
K, Itoh M, Konno H, Okazaki Y, Muramatsu M, and Hayashizaki Y. Normalization and subtraction of cap-trapperselected cDNAs to prepare full-length cDNA libraries for rapid
discovery of new genes. Genome Res 10: 1617–1630, 2000.
Chang DD and Sharp PA. Regulation by HIV Rev depends
upon recognition of splice sites. Cell 59: 789–795, 1989.
Chikaraishi DM. Complexity of cytoplasmic polyadenylated
and nonpolyadenylated rat brain ribonucleic acids. Biochemistry 18: 3249–3256, 1979.
D’Alessio JM and Gerard GF. Second-strand cDNA synthesis with E. coli DNA polymerase I and RNase H: the fate of
information at the mRNA 5⬘ terminus and the effect of E. coli
DNA ligase. Nucleic Acids Res 16: 1999–2014, 1988.
Danielson KG, Pillarisetti J, Cohen IR, Sholehvar B,
Huebner K, Ng LJ, Nicholls JM, Cheah KS, and Iozzo RV.
Characterization of the complete genomic structure of the human WNT-5A gene, functional analysis of its promoter, chromosomal mapping, and expression in early human embryogenesis. J Biol Chem 270: 31225–31234, 1995.
Darlix JL, Lapadat-Tapolsky M, de Rocquigny H, and
Roques BP. First glimpses at structure-function relationships
of the nucleocapsid protein of retroviruses. J Mol Biol 254:
523–537, 1995.
Das Gupta J, Gu H, Chernokalskaya E, Gao X, and
Schoenberg DR. Identification of two cis-acting elements that
independently regulate the length of poly(A) on Xenopus albumin pre-mRNA. RNA 4: 766–776, 1998.
de Magistris L and Stunnenberg HG. Cis-acting sequences
affecting the length of the poly(A) head of vaccinia virus late
transcripts. Nucleic Acids Res 16: 3141–3156, 1988.
Dib-Hajj F, Khan R, and Giedroc DP. Retroviral nucleocapsid proteins possess potent nucleic acid strand renaturation
activity. Protein Sci 2: 231–243, 1993.
Edery I, Chu LL, Sonenberg N, and Pelletier J. An efficient
strategy to isolate full-length cDNAs based on an mRNA cap
retention procedure (CAPture). Mol Cell Biol 15: 3363–331,
1995.
Edwards JB, Delort J, and Mallet J. Oligodeoxyribonucleotide ligation to single-stranded cDNAs: a new tool for cloning
78
REVIEW
Physiol Genomics • VOL
6•
88. Kozak M. Effects of intercistronic length on the efficiency of
reinitiation by eucaryotic ribosomes. Mol Cell Biol 7: 3438–
3445, 1987.
89. Kozak M. Interpreting cDNA sequences: some insights from
studies on translation. Mamm Genome 7: 563–574, 1996.
90. Kullak-Ublick GA, Gerloff T, Hagenbuch B, Berr F, Meier
PJ, and Stieger B. Expression of a rat liver phosphatidylcholine translocator in Xenopus laevis oocytes. Hepatology 23:
1254–1259, 1996.
91. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC,
Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W,
Funke R, Gage D, Harris K, Heaford A, Howland J, Kann
L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J,
Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C,
Stange-Thomann N, Stojanovic N, Subramanian A,
Wyman D, Rogers J, Sulston J, Ainscough R, Beck S,
Bentley D, Burton J, Clee C, Carter N, Coulson A, Deadman R, Deloukas P, Dunham A, Dunham I, Durbin R,
French L, Grafham D, Gregory S, Hubbard T, Humphray
S, Hunt A, Jones M, Lloyd C, McMurray A, Matthews L,
Mercer S, Milne S, Mullikin JC, Mungall A, Plumb R,
Ross M, Shownkeen R, Sims S, Waterston RH, Wilson RK,
Hillier LW, McPherson JD, Marra MA, Mardis ER, Fulton
LA, Chinwalla AT, Pepin KH, Gish WR, Chissoe SL,
Wendl MC, Delehaunty KD, Miner TL, Delehaunty A,
Kramer JB, Cook LL, Fulton RS, Johnson DL, Minx PJ,
Clifton SW, Hawkins T, Branscomb E, Predki P, Richardson P, Wenning S, Slezak T, Doggett N, Cheng JF, Olsen
A, Lucas S, Elkin C, Uberbacher E, Frazier M, Gibbs RA,
Muzny DM, Scherer SE, Bouck JB, Sodergren EJ, Worley
KC, Rives CM, Gorrell JH, Metzker ML, Naylor SL,
Kucherlapati RS, Nelson DL, Weinstock GM, Sakaki Y,
Fujiyama A, Hattori M, Yada T, Toyoda A, Itoh T, Kawagoe C, Watanabe H, Totoki Y, Taylor T, Weissenbach J,
Heilig R, Saurin W, Artiguenave F, Brottier P, Bruls T,
Pelletier E, Robert C, Wincker P, Smith DR, DoucetteStamm L, Rubenfield M, Weinstock K, Lee HM, Dubois J,
Rosenthal A, Platzer M, Nyakatura G, Taudien S, Rump
A, Yang H, Yu J, Wang J, Huang G, Gu J, Hood L, Rowen
L, Madan A, Qin S, Davis RW, Federspiel NA, Abola AP,
Proctor MJ, Myers RM, Schmutz J, Dickson M, Grimwood J, Cox DR, Olson MV, Kaul R, Raymond C, Shimizu
N, Kawasaki K, Minoshima S, Evans GA, Athanasiou M,
Schultz R, Roe BA, Chen F, Pan H, Ramser J, Lehrach H,
Reinhardt R, McCombie WR, de la Bastide M, Dedhia N,
Blocker H, Hornischer K, Nordsiek G, Agarwala R, Aravind L, Bailey JA, Bateman A, Batzoglou S, Birney E,
Bork P, Brown DG, Burge CB, Cerutti L, Chen HC,
Church D, Clamp M, Copley RR, Doerks T, Eddy SR,
Eichler EE, Furey TS, Galagan J, Gilbert JG, Harmon C,
Hayashizaki Y, Haussler D, Hermjakob H, Hokamp K,
Jang W, Johnson LS, Jones TA, Kasif S, Kaspryzk A,
Kennedy S, Kent WJ, Kitts P, Koonin EV, Korf I, Kulp D,
Lancet D, Lowe TM, McLysaght A, Mikkelsen T, Moran
JV, Mulder N, Pollara VJ, Ponting CP, Schuler G, Schultz
J, Slater G, Smit AF, Stupka E, Szustakowki J, ThierryMieg D, Thierry-Mieg J, Wagner L, Wallis J, Wheeler R,
Williams A, Wolf YI, Wolfe KH, Yang SP, Yeh RF, Collins
F, Guyer MS, Peterson J, Felsenfeld A, Wetterstrand KA,
Patrinos A, and Morgan MJ. Initial sequencing and analysis
of the human genome. Nature 409: 860–921, 2001.
92. Lapadat-Tapolsky M, De Rocquigny H, Van Gent D,
Roques B, Plasterk R, and Darlix JL. Interactions between
HIV-1 nucleocapsid protein and viral DNA may have important
functions in the viral life cycle. Nucleic Acids Res 21: 831–839,
1993.
93. Lapadat-Tapolsky M, Pernelle C, Borie C, and Darlix JL.
Analysis of the nucleic acid annealing activities of nucleocapsid
protein from HIV-1. Nucleic Acids Res 23: 2434–2441, 1995.
94. Lawson TG, Cladaras MH, Ray BK, Lee KA, Abramson
RD, Merrick WC, and Thach RE. Discriminatory interaction
of purified eukaryotic initiation factors 4F plus 4A with the 5⬘
www.physiolgenomics.org
Downloaded from http://physiolgenomics.physiology.org/ by 10.220.33.1 on June 14, 2017
67. Hentze MW and Kuhn LC. Molecular control of vertebrate
iron metabolism: mRNA-based regulatory circuits operated by
iron, nitric oxide, and oxidative stress. Proc Natl Acad Sci USA
93: 8175–8182, 1996.
68. Herschlag D, Khosla M, Tsuchihashi Z, and Karpel R. An
RNA chaperone activity of non-specific RNA binding proteins in
hammerhead ribozyme catalysis. EMBO J 13: 2913–2924,
1994.
69. Higuchi R, Paddock GV, Wall R, and Salser W. A general
method for cloning eukaryotic structural gene sequences. Proc
Natl Acad Sci USA 73: 3146–3150, 1976.
70. Hilton AA, Slavin AJ, Hilton DJ, and Bernard CC. Characterization of cDNA and genomic clones encoding human myelin oligodendrocyte glycoprotein. J Neurochem 65: 309–318,
1995.
71. Hizi A, Leis JP, and Joklik WK. The RNA-dependent DNA
polymerase of avian sarcoma virus B77. Binding of viral and
nonviral ribonucleic acids to the ␣, ␤2, and ␣␤ forms of the
enzyme. J Biol Chem 252: 6878–6884, 1977.
72. Houdebine LM. Absence of poly (A) in a large part of newly
synthesized casein mRNAs. FEBS Lett 66: 110–113, 1976.
73. Hu RM, Han ZG, Song HD, Peng YD, Huang QH, Ren SX,
Gu YJ, Huang CH, Li YB, Jiang CL, Fu G, Zhang QH, Gu
BW, Dai M, Mao YF, Gao GF, Rong R, Ye M, Zhou J, Xu
SH, Gu J, Shi JX, Jin WR, Zhang CK, Wu TM, Huang GY,
Chen Z, Chen MD, and Chen JL. Gene expression profiling
in the human hypothalamus-pituitary-adrenal axis and fulllength cDNA cloning. Proc Natl Acad Sci USA 97: 9543–9548,
2000.
74. Huang J and Liang TJ. A novel hepatitis B virus (HBV)
genetic element with Rev response element-like properties that
is essential for expression of HBV gene products. Mol Cell Biol
13: 7476–7486, 1993.
75. Huang Y, Wimler KM, and Carmichael GG. Intronless
mRNA transport elements may affect multiple steps of premRNA processing. EMBO J 18: 1642–1652, 1999.
76. Huang ZM and Yen TS. Role of the hepatitis B virus posttranscriptional regulatory element in export of intronless transcripts. Mol Cell Biol 15: 3864–3869, 1995.
77. Huber HE, McCoy JM, Seehra JS, and Richardson CC.
Human immunodeficiency virus 1 reverse transcriptase. Template binding, processivity, strand displacement synthesis, and
template switching. J Biol Chem 264: 4669–4678, 1989.
78. Hubner A, Kruhoffer M, Grosse F, and Krauss G. Fidelity
of human immunodeficiency virus type I reverse transcriptase
in copying natural RNA. J Mol Biol 223: 595–600, 1992.
79. Inoue K, Ohno M, Sakamoto H, and Shimura Y. Effect of
the cap structure on pre-mRNA splicing in Xenopus oocyte
nuclei. Genes Dev 3: 1472–1479, 1989.
80. Ji J and Loeb LA. Fidelity of HIV-1 reverse transcriptase
copying a hypervariable region of the HIV-1 env gene. Virology
199: 323–330, 1994.
81. Ji JP and Loeb LA. Fidelity of HIV-1 reverse transcriptase
copying RNA in vitro. Biochemistry 31: 954–958, 1992.
82. Jiang G and Harrison DJ. mRNA isolation in a microfluidic
device for eventual integration of cDNA library construction.
Analyst 125: 2176–2179, 2000.
83. Khan R and Giedroc DP. Recombinant human immunodeficiency virus type 1 nucleocapsid (NCp7) protein unwinds tRNA.
J Biol Chem 267: 6689–6695, 1992.
84. Kienzle N, Young DB, Liaskou D, Buck M, Greco S, and
Sculley TB. Intron retention may regulate expression of Epstein-Barr virus nuclear antigen 3 family genes. J Virol 73:
1195–1204, 1999.
85. Ko MS. An “equalized cDNA library” by the reassociation of
short double-stranded cDNAs. Nucleic Acids Res 18: 5705–
5711, 1990.
86. Kozak M. Influences of mRNA secondary structure on initiation by eukaryotic ribosomes. Proc Natl Acad Sci USA 83:
2850–2854, 1986.
87. Kozak M. An analysis of 5⬘-noncoding sequences from 699
vertebrate messenger RNAs. Nucleic Acids Res 15: 8125–8148,
1987.
79
REVIEW
95.
96.
97.
98.
100.
101.
102.
103.
104.
105.
106.
107.
108.
109.
110.
111.
112.
113.
Physiol Genomics • VOL
6•
114.
115.
116.
117.
118.
119.
120.
121.
122.
123.
124.
125.
126.
127.
128.
129.
130.
131.
132.
tracted from sea algae. Antimicrob Agents Chemother 31: 1524–
1528, 1987.
Nielsen J, Christiansen J, Lykke-Andersen J, Johnsen
AH, Wewer UM, and Nielsen FC. A family of insulin-like
growth factor II mRNA-binding proteins represses translation
in late development. Mol Cell Biol 19: 1262–1270, 1999.
Okayama H and Berg P. High-efficiency cloning of full-length
cDNA. Mol Cell Biol 2: 161–170, 1982.
Ouellette AJ and Ordahl CP. Extensive homology between
poly(A)-containing mRNA and purified nominal poly(A)-lacking
mRNA in mouse kidney. J Biol Chem 256: 5104–5108, 1981.
Parker MS, Larroque ML, Campbell JM, Bobbin RP, and
Deininger PL. Novel variant of the P2X2 ATP receptor from
the guinea pig organ of Corti. Hear Res 121: 62–70, 1998.
Pasquinelli AE, Ernst RK, Lund E, Grimm C, Zapp ML,
Rekosh D, Hammarskjold ML, and Dahlberg JE. The
constitutive transport element (CTE) of Mason-Pfizer monkey
virus (MPMV) accesses a cellular mRNA export pathway.
EMBO J 16: 7500–7510, 1997.
Patanjali SR, Parimoo S, and Weissman SM. Construction
of a uniform-abundance (normalized) cDNA library. Proc Natl
Acad Sci USA 88: 1943–1947, 1991.
Patel SS, Wong I, and Johnson KA. Pre-steady-state kinetic
analysis of processive DNA replication including complete characterization of an exonuclease-deficient mutant. Biochemistry
30: 511–525, 1991.
Pelletier J and Sonenberg N. Internal initiation of translation of eukaryotic mRNA directed by a sequence derived from
poliovirus RNA. Nature 334: 320–325, 1988.
Peytavi R, Hong SS, Gay B, d’Angeac AC, Selig L, Benichou S, Benarous R, and Boulanger P. HEED, the product
of the human homolog of the murine eed gene, binds to the
matrix protein of HIV-1. J Biol Chem 274: 1635–1645, 1999.
Pollard AJ, Flanagan BF, Newton DJ, and Johnson PM.
A novel isoform of human membrane cofactor protein (CD46)
mRNA generated by intron retention. Gene 212: 39–47, 1998.
Prats H, Kaghad M, Prats AC, Klagsbrun M, Lelias JM,
Liauzun P, Chalon P, Tauber JP, Amalric F, Smith JA,
and Caput, D. High molecular mass forms of basic fibroblast
growth factor are initiated by alternative CUG codons. Proc
Natl Acad Sci USA 86: 1836–1840, 1989.
Robbins PF, El-Gamil M, Li YF, Fitzgerald EB,
Kawakami Y, and Rosenberg SA. The intronic region of an
incompletely spliced gp100 gene transcript encodes an epitope
recognized by melanoma-reactive tumor-infiltrating lymphocytes. J Immunol 159: 303–308, 1997.
Roberts JD, Preston BD, Johnston LA, Soni A, Loeb LA,
and Kunkel TA. Fidelity of two retroviral reverse transcriptases during DNA-dependent DNA synthesis in vitro. Mol Cell
Biol 9: 469–476, 1989.
Roest Crollius H, Jaillon O, Bernot A, Dasilva C, Bouneau L, Fischer C, Fizames C, Wincker P, Brottier P,
Quetier F, Saurin W, and Weissenbach J. Estimate of
human gene number provided by genome-wide analysis using
Tetraodon nigroviridis DNA sequence. Nat Genet 25: 235–238,
2000.
Rougeon F, Kourilsky P, and Mach B. Insertion of a rabbit
beta-globin gene sequence into an E. coli plasmid. Nucleic Acids
Res 2: 2365–2378, 1975.
Rougeon F and Mach B. Stepwise biosynthesis in vitro of
globin genes from globin mRNA by DNA polymerase of avian
myeloblastosis virus. Proc Natl Acad Sci USA 73: 3418–3422,
1976.
Ruderman JV and Pardue ML. Cell-free translation analysis of messenger RNA in echinoderm and amphibian early
development. Dev Biol 60: 48–68, 1977.
Saris CJ, Domen J, and Berns A. The pim-1 oncogene encodes two related protein-serine/threonine kinases by alternative initiation at AUG and CUG. EMBO J 10: 655–664, 1991.
Schmidt WM and Mueller MW. CapSelect: a highly sensitive
method for 5⬘ CAP-dependent enrichment of full-length cDNA
in PCR-mediated analysis of mRNAs. Nucleic Acids Res 27: e31,
1999.
www.physiolgenomics.org
Downloaded from http://physiolgenomics.physiology.org/ by 10.220.33.1 on June 14, 2017
99.
ends of reovirus messenger RNAs. J Biol Chem 263: 7266–
7276, 1988.
Lemaire P, Vesque C, Schmitt J, Stunnenberg H, Frank
R, and Charnay P. The serum-inducible mouse gene Krox-24
encodes a sequence-specific transcriptional activator. Mol Cell
Biol 10: 3456–3467, 1990.
Lim S, Mullins JJ, Chen CM, Gross KW, and Maquat LE.
Novel metabolism of several beta zero-thalassemic beta-globin
mRNAs in the erythroid tissues of transgenic mice. EMBO J 8:
2613–2619, 1989.
Lin-Chao S, Chen WT, and Wong TT. High copy number of
the pUC plasmid results from a Rom/Rop-suppressible point
mutation in RNA II. Mol Microbiol 6: 3385–3393, 1992.
Liu X and Mertz JE. HnRNP L binds a cis-acting RNA
sequence element that enables intron-dependent gene expression. Genes Dev 9: 1766–1780, 1995.
Loya S, Tal R, Kashman Y, and Hizi A. Illimaquinone, a
selective inhibitor of the RNase H activity of human immunodeficiency virus type 1 reverse transcriptase. Antimicrob Agents
Chemother 34: 2009–2012, 1990.
Majumdar C, Abbotts J, Broder S, and Wilson SH. Studies
on the mechanism of human immunodeficiency virus reverse
transcriptase. Steady-state kinetics, processivity, and polynucleotide inhibition. J Biol Chem 263: 15657–65, 1988.
Malim MH, Hauber J, Le SY, Maizel JV, and Cullen BR.
The HIV-1 rev trans-activator acts through a structured target
sequence to activate nuclear export of unspliced viral mRNA.
Nature 338: 254–257, 1989.
Maniatis T, Kee SG, Efstratiadis A, and Kafatos FC. Amplification and characterization of a beta-globin gene synthesized in vitro. Cell 8: 163–182, 1976.
Marcotrigiano J, Gingras AC, Sonenberg N, and Burley
SK. Cocrystal structure of the messenger RNA 5⬘ cap-binding
protein (eIF4E) bound to 7-methyl-GDP. Cell 89: 951–961,
1997.
Maruyama K and Sugano S. Oligo-capping: a simple method
to replace the cap structure of eukaryotic mRNAs with oligoribonucleotides. Gene 138: 171–174, 1994.
McCracken S, Fong N, Rosonina E, Yankulov K, Brothers
G, Siderovski D, Hessel A, Foster S, Shuman S, and
Bentley DL. 5⬘-Capping enzymes are targeted to pre-mRNA by
binding to the phosphorylated carboxy-terminal domain of RNA
polymerase II. Genes Dev 11: 3306–3318, 1997.
McGary EC, Rondon IJ, and Beckman BS. Post-transcriptional regulation of erythropoietin mRNA stability by erythropoietin mRNA-binding protein. J Biol Chem 272: 8628–8634,
1997.
Mellentin JD, Smith SD, and Cleary ML. lyl-1, a novel gene
altered by chromosomal translocation in T cell leukemia, codes
for a protein with a helix-loop-helix DNA binding motif. Cell 58:
77–83, 1989.
Morrow JR. Hydrolytic cleavage of RNA catalyzed by metal
ion complexes. Met Ions Biol Syst 33: 561–592, 1996.
Muhlrad D, Decker CJ, and Parker R. Deadenylation of the
unstable mRNA encoded by the yeast MFA2 gene leads to
decapping followed by 5⬘33⬘ digestion of the transcript. Genes
Dev 8: 855–866, 1994.
Muller G, Strack B, Dannull J, Sproat BS, Surovoy A,
Jung G, and Moelling K. Amino acid requirements of the
nucleocapsid protein of HIV-1 for increasing catalytic activity of
a Ki-ras ribozyme in vitro. J Mol Biol 242: 422–429, 1994.
Munroe DJ, Loebbert R, Bric E, Whitton T, Prawitt D, Vu
D, Buckler A, Winterpacht A, Zabel B, and Housman DE.
Systematic screening of an arrayed cDNA library by PCR. Proc
Natl Acad Sci USA 92: 2209–2213, 1995.
Nagpal S, Zelent A, and Chambon P. RAR-beta 4, a retinoic
acid receptor isoform is generated from RAR-␤2 by alternative
splicing and usage of a CUG initiator codon. Proc Natl Acad Sci
USA 89: 2718–2722, 1992.
Nakashima H, Kido Y, Kobayashi N, Motoki Y, Neushul
M, and Yamamoto N. Purification and characterization of an
avian myeloblastosis and human immunodeficiency virus reverse transcriptase inhibitor, sulfated polysaccharides ex-
80
REVIEW
Physiol Genomics • VOL
6•
150. Tang H, Xu Y, and Wong-Staal F. Identification and purification of cellular proteins that specifically interact with the
RNA constitutive transport elements from retrovirus D. Virology 228: 333–339, 1997.
151. Tomasz M and Chambers RW. Periodate oxidation of
pseudouridine. Biochim Biophys Acta 108: 510–512, 1965.
152. Troutt AB, McHeyzer-Williams MG, Pulendran B, and
Nossal GJ. Ligation-anchored PCR: a simple amplification
technique with single- sided specificity. Proc Natl Acad Sci
USA 89: 9823–9825, 1992.
153. Tsuchihashi Z and Brown PO. DNA strand exchange and
selective DNA annealing promoted by the human immunodeficiency virus type 1 nucleocapsid protein. J Virol 68: 5863–5870,
1994.
154. Tuerk C, Gauss P, Thermes C, Groebe DR, Gayle M,
Guild N, Stormo G, d’Aubenton-Carafa Y, Uhlenbeck OC,
Tinoco I Jr, Brody EN, and Gold L. CUUCGG hairpins:
extraordinarily stable RNA secondary structures associated
with various biochemical processes. Proc Natl Acad Sci USA 85:
1364–1368, 1988.
155. van Leusden MR, Kuikman I, and Sonnenberg A. The
unique cytoplasmic domain of the human integrin variant ␤4E
is produced by partial retention of intronic sequences. Biochem
Biophys Res Commun 235: 826–830, 1997.
156. Van Ness J, Maxwell IH, and Hahn WE. Complex population of nonpolyadenylated messenger RNA in mouse brain. Cell
18: 1341–1349, 1979.
157. Varanasi R, Bardeesy N, Ghahremani M, Petruzzi MJ,
Nowak N, Adam MA, Grundy P, Shows TB, and Pelletier
J. Fine structure analysis of the WT1 gene in sporadic Wilms
tumors. Proc Natl Acad Sci USA 91: 3554–3558, 1994.
158. Venter JC, et al. The sequence of the human genome. Science
291: 1304–1351, 2001.
159. Verma IM. Studies on reverse transcriptase of RNA tumor
viruses. I. Localization of thermolabile DNA polymerase and
RNase H activities on one polypeptide. J Virol 15: 121–126,
1975.
160. Wang SM, Fears SC, Zhang L, Chen JJ, and Rowley JD.
Screening poly(dA/dT)-cDNAs for gene identification. Proc Natl
Acad Sci USA 97: 4162–4167, 2000.
161. Wilding P, Kricka LJ, Cheng J, Hvichia G, Shoffner MA,
and Fortina P. Integrated cell isolation and polymerase chain
reaction analysis using silicon microfilter chambers. Anal Biochem 257: 95–100, 1998.
162. Wu W, Henderson LE, Copeland TD, Gorelick RJ, Bosche
WJ, Rein A, and Levin JG. Human immunodeficiency virus
type 1 nucleocapsid protein reduces reverse transcriptase pausing at a secondary structure near the murine leukemia virus
polypurine tract. J Virol 70: 7132–7142, 1996.
163. You JC and McHenry CS. Human immunodeficiency virus
nucleocapsid protein accelerates strand transfer of the terminally redundant sequences involved in reverse transcription.
J Biol Chem 269: 31491–31495, 1994.
164. Zhang JS and Longo FM. LAR tyrosine phosphatase receptor: alternative splicing is preferential to the nervous system,
coordinated with cell growth and generates novel isoforms
containing extensive CAG repeats. J Cell Biol 128: 415–431,
1995.
165. Zhang QH, Ye M, Wu XY, Ren SX, Zhao M, Zhao CJ, Fu G,
Shen Y, Fan HY, Lu G, Zhong M, Xu XR, Han ZG, Zhang
JW, Tao J, Huang QH, Zhou J, Hu GX, Gu J, Chen SJ, and
Chen Z. Cloning and functional analysis of cDNAs with open
reading frames for 300 previously undefined genes expressed in
CD34⫹ hematopoietic stem/progenitor cells. Genome Res 10:
1546–1560, 2000.
166. Zhumabayeva B, Chang C, McKinley J, Diatchenko L,
and Siebert PD. Generation of full-length cDNA libraries
enriched for differentially expressed genes for functional
genomics. Biotechniques 30: 512–516, 518–520, 2001.
www.physiolgenomics.org
Downloaded from http://physiolgenomics.physiology.org/ by 10.220.33.1 on June 14, 2017
133. Schuchhardt J, Beule D, Malik A, Wolski E, Eickhoff H,
Lehrach H, and Herzel H. Normalization strategies for
cDNA microarrays. Nucleic Acids Res 28: E47, 2000.
134. Schwer B, Visca P, Vos JC, and Stunnenberg HG. Discontinuous transcription or RNA processing of vaccinia virus late
messengers results in a 5⬘ poly(A) leader. Cell 50: 163–169,
1987.
135. Seydoux G. Mechanisms of translational control in early development. Curr Opin Genet Dev 6: 555–561, 1996.
136. Shoemaker DD, Schadt EE, Armour CD, He YD, GarrettEngele P, McDonagh PD, Loerch PM, Leonardson A,
Lum PY, Cavet G, Wu LF, Altschuler SJ, Edwards S, King
J, Tsang JS, Schimmack G, Schelter JM, Koch J, Ziman
M, Marton MJ, Li B, Cundiff P, Ward T, Castle J,
Krolewski M, Meyer MR, Mao M, Burchard J, Kidd MJ,
Dai H, Phillips JW, Linsley PS, Stoughton R, Scherer S,
and Boguski MS. Experimental annotation of the human
genome using microarray technology. Nature 409: 922–927,
2001.
137. Sierra JM and Zapata JM. Translational regulation of the
heat shock response. Mol Biol Rep 19: 211–220, 1994.
138. Smith MA and Bidochka MJ. Bacterial fitness and plasmid
loss: the importance of culture conditions and plasmid size. Can
J Microbiol 44: 351–355, 1998.
139. Soares MB, Bonaldo MF, Jelene P, Su L, Lawton L, and
Efstratiadis A. Construction and characterization of a normalized cDNA library. Proc Natl Acad Sci USA 91: 9228–9232,
1994.
140. Sonenberg N. Cap-binding proteins of eukaryotic messenger
RNA: functions in initiation and control of translation. Prog
Nucleic Acid Res Mol Biol 35: 173–207, 1988.
141. Spicher A, Guicherit OM, Duret L, Aslanian A, Sanjines
EM, Denko NC, Giaccia AJ, and Blau HM. Highly conserved RNA sequences that are sensors of environmental
stress. Mol Cell Biol 18: 7371–7382, 1998.
142. Steinschneider A and Fraenkel-Conrat H. Studies of nucleotide sequences in tobacco mosaic virus ribonucleic acid. 3.
Periodate oxidation and semicarbazone formation. Biochemistry 5: 2729–2734, 1966.
143. Stover CK, Vodkin MH, and Oaks EV. Use of conversion
adaptors to clone antigen genes in lambda gt11. Anal Biochem
163: 398–407, 1987.
144. Sugahara Y, Carninci P, Itoh M, Shibata K, Konno H,
Endo T, Muramatsu M, and Hayashizaki Y. Comparative
evaluation of 5⬘-end-sequence quality of clones in CAP trapper
and other full-length-cDNA libraries. Gene 263: 93–102, 2001.
145. Suzuki Y, Ishihara D, Sasaki M, Nakagawa H, Hata H,
Tsunoda T, Watanabe M, Komatsu T, Ota T, Isogai T,
Suyama A, and Sugano S. Statistical analysis of the 5⬘
untranslated region of human mRNA using “Oligo-Capped”
cDNA libraries. Genomics 64: 286–297, 2000.
146. Tabor S and Richardson CC. DNA sequence analysis with a
modified bacteriophage T7 DNA polymerase. Proc Natl Acad
Sci USA 84: 4767–4771, 1987.
147. Take Y, Inouye Y, Nakamura S, Allaudeen HS, and Kubo
A. Comparative studies of the inhibitory properties of antibiotics on human immunodeficiency virus and avian myeloblastosis
virus reverse transcriptases and cellular DNA polymerases. J
Antibiot (Tokyo) 42: 107–115, 1989.
148. Tanchou V, Delaunay T, De Rocquigny H, Bodeus M,
Darlix JL, Roques B, and Benarous R. Monoclonal antibody-mediated inhibition of RNA binding and annealing activities of HIV type 1 nucleocapsid protein. AIDS Res Hum Retroviruses 10: 983–993, 1994.
149. Tanese N and Goff SP. Domain structure of the Moloney murine leukemia virus reverse transcriptase: mutational analysis
and separate expression of the DNA polymerase and RNase H
activities. Proc Natl Acad Sci USA 85: 1777–1781, 1988.