Volume 17 Number 2 1989
Nucleic Acids Research
Identification of a conserved sequence in the non-coding regions of many human genes
Lawrence A.Donehower*, Betty L.Slagle, Margaret Wilde1, Gretchen Darlington1 and Janet S.Butel
Department of Virology and 1Department of Pathology, Baylor College of Medicine, Houston,
TX 77030, USA
Received September 9, 1988; Revised and Accepted December 19, 1988
Accession no. X13001
ABSTRACT
We have analyzed a sequence of approximately 70 base pairs (bp) that shows a high
degree of similarity to sequences present in the non-coding regions of a number of human
and other mammalian genes. The sequence was discovered in a fragment of human
genomic DNA adjacent to an integrated hepatitis B virus genome in cells derived from
human hepatocellular carcinoma tissue. When one of the viral flanking sequences was
compared to nucleotide sequences in GenBank, more than thirty human genes were
identified that contained a similar sequence in their non-coding regions. The sequence
element was usually found once or twice in a gene, either in an intron or in the 5' or 3'
flanking regions. It did not share any similarities with known short interspersed nucleotide
elements (SINEs) or presently known gene regulatory elements. This element was highly
conserved at the same position within the corresponding human and mouse genes for
myoglobin and N-myc, indicating evolutionary conservation and possible functional
importance. Preliminary DNase I footprinting data suggested that the element or its
adjacent sequences may bind nuclear factors to generate specific DNase I hypersensitive
sites. The size, structure, and evolutionary conservation of this sequence indicate that it is
distinct from other types of short interspersed repetitive elements. It is possible that the
element may have a cis-acting functional role in the genome.
INTRODUCTION
Virtually all mammalian genes contain various types of repetitive elements in their
non-coding regions. The two most abundant classes of interspersed repetitive sequences
associated with genes are the long interspersed nucleotide elements (LINEs) and short
interspersed nucleotide elements (SINEs) (1-3). Both types of elements have structural
properties (flanking direct repeats and 3' A-rich regions) which indicate that they are
inserted into different sites in the genome via an RNA intermediate form. Consequently,
these repetitive elements have been classified as retrotransposons (2). Typically, these
elements show greater interspecies divergence than intraspecies divergence. For example,
the dominant SINE families of humans (Alu) and mice (B1) are ancestrally related, but
show significant differences both in consensus sequence and overall structure (1-3). No
clear functional role has been assigned to these types of repetitive elements.
In addition to repetitive elements and non-functional DNA, the non-coding regions of
genes possess regulatory elements that allow the appropriate level of expression of the
encoded protein. A number of cis-acting gene regulatory elements have been
characterized, and some have been shown to bind nuclear factors in a specific manner
(4,5). While the majority of these regulatory elements appear to be located 5' of the coding
sequences, a number of examples indicate that cis-acting sequences that regulate
expression can be present in introns or in the 3' flanking region of a gene (6-10).
In this paper, we report the characterization of a sequence element which appears to
be highly conserved among mammals. The sequence is about 70 bp in length and is usually
present once or twice in the non-coding regions of at least thirty human genes and a
number of other mammalian genes. It has none of the features typical of SINEs, such as
flanking direct repeats or 3' A-rich tracts. The evolutionary conservation of this sequence
suggests a cis-acting functional role, and we propose that this element is part of a class of
interspersed repetitive elements distinct from other SINEs.
MATERIALS AND METHODS
Molecular Cloning and Sequencing
Standard molecular cloning and nucleotide sequencing methods were used and are
fully described in Zhou et al. (11).
Sequence Comparisons
All computer sequence analysis work was performed on the Baylor College of
Medicine Molecular Biology Information Resource. The streamlined user interface
EuGene, developed by Thomas Shalom, was used to efficiently search GenBank and make
sequence comparisons. The GenBank search programs were developed by Charles
Thomas and Dan Goldman (12,13). The default parameters (unit cost matrix used, 15
nucleotide minimum length of similarity for acceptance, and only matches with an SD value
above 3.5 reported) were used for the search function (12). Programs used for optimal
alignments of similar sequences were according to Altschul and Erickson (14) and
Lawrence and Goldman (13). The parameters of the Altschul and Erickson (14)
alignments were the default parameters (Dayhoff matrix used, a cost penalty of 25 for
opening a gap, and a cost penalty of 0.5 for each space in a gap).
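The gap parameters quoted above describe what is usually called an affine gap cost: a fixed charge for opening a gap plus a smaller charge for each space within it. The following is a minimal sketch of that arithmetic only, assuming the two penalties simply add; the function name `gap_cost` is hypothetical and is not taken from the cited alignment programs.

```python
def gap_cost(length, open_penalty=25.0, space_penalty=0.5):
    """Cost of one gap spanning `length` spaces under an assumed affine model.

    open_penalty  : charged once when the gap is opened (25 in the text)
    space_penalty : charged for every space in the gap (0.5 in the text)
    """
    if length < 0:
        raise ValueError("gap length must be non-negative")
    if length == 0:
        return 0.0  # no gap, no cost
    return open_penalty + space_penalty * length


if __name__ == "__main__":
    for n in (1, 5, 10):
        print(f"gap of {n} spaces costs {gap_cost(n)}")
```

Under these assumed semantics, a single long gap is much cheaper than several short ones, which is consistent with the ungapped alignments reported in the Results.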
Preparation of Hepatoma Extracts
Nuclear extracts were prepared from the Hep3B and HepG2 human hepatocellular
carcinoma cell lines (15). Cells were trypsinized from eight roller bottles, pelleted by low-speed centrifugation, and suspended in solution A2 (10 mM HEPES, pH 7.2, 0.15 mM
spermine, 0.5 mM spermidine, 2 mM EGTA, and 2 mM DTT). The suspended cells were
lysed by Dounce homogenization, and the crude nuclei were pelleted twice by low-speed
centrifugation in solution A2 and resuspended in 76 ml of solution C1 (10 mM HEPES, pH
7.6, 25 mM KCl, 0.5 mM spermidine, 0.15 mM spermine, 2 mM EDTA, 0.5 mM EGTA, 2
mM DTT, 1.45 M sucrose, and 10% glycerol). The suspension was pelleted at 25,000 rpm
for 30 minutes in an SW28 rotor. The pellet was suspended in solution D (10 mM HEPES,
pH 7.6, 100 mM KCl, 3 mM MgCl2, 0.1 mM EDTA, 1 mM DTT, 0.1 mM PMSF, and 10%
glycerol) and then diluted to 10 A260 units/ml. One-tenth volume of 4 M ammonium
sulfate was added, and the mixture was placed on ice with occasional gentle mixing. The
lysate was then centrifuged at 45,000 rpm in a Beckman 50Ti rotor for 1 hour, and 0.3 g of
ammonium sulfate was added to each ml of supernatant. The precipitated proteins were
pelleted in the Sorvall SS34 rotor at 10,000 rpm for 20 minutes and then redissolved in 25
ml of solution E (50 mM Tris, pH 8.0, 0.1 mM EDTA, 1 mM DTT, 125 mM MgCl2, and
20% glycerol). The solution was desalted on a Pharmacia PD10 disposable G25 column
and frozen as aliquots in liquid nitrogen.
DNase I Footprinting
Footprints were performed as described by Carthew et al. (16), based on the method of
Galas and Schmitz (17). Briefly, 10,000 cpm of 32P-end-labelled DNA (250-bp fragment)
were incubated with 10 ug of bovine serum albumin or 10 ug of the Hep3B nuclear extract
in solution E for 60 minutes at 30°C. Five ng of DNase I was added to each reaction, and
the mixture was incubated on ice for 3 minutes before addition of 1 ul of 0.5 M EDTA
followed by 100 ul of stop buffer (1% SDS, 100 mM Tris-HCl, pH 8.0, 0.5 mg/ml
proteinase K, and 50 ug/ml yeast RNA). This mixture was phenol-extracted, chloroform-extracted, and ethanol-precipitated prior to denaturation in formamide-dye buffer and
loading on a 6% denaturing polyacrylamide gel for electrophoresis followed by
autoradiography. Nucleotide sequence determination of the end-labelled fragments for
use as markers was performed according to Maxam and Gilbert (18).
RESULTS
Identification of a Sequence Similar to Elements in Other Human Genes
The probe DNA sequence described here (Figure 1C) was isolated as part of a study
on the role of hepatitis B virus (HBV) infection in the development of hepatocellular
carcinoma (HCC) in humans (11). Genomic DNA derived from a liver tumor tissue of an
individual with chronic HBV infection was analyzed by Southern blot hybridization and
found to contain two integrated copies of HBV DNA. One of the integrated HBV DNAs
and its flanking cellular DNA were molecularly cloned and analyzed extensively by
restriction endonuclease mapping and selective DNA sequencing (Figure 1B). A 1.0-kb
human flanking DNA fragment (Figure 1B) contained no repetitive DNA sequences after
hybridization to human DNA at high stringency and was mapped to the p11.2-p12 region of
chromosome 17 (Figure 1A) (11).
The sequenced segments of the human flanking DNA were further analyzed by
computer. None of the flanking sequences contained discernible open reading frames. A
search of the GenBank data base for possible nucleotide sequence similarities was negative
for all of the sequence fragments except one. This one sequence (Figure 1C) showed
significant similarity to a number of GenBank entries, and its reverse complement also
showed similarity to GenBank sequences, indicating that the element was present in genes
in both orientations. Optimal alignment of the HBV flanking sequence (Figure 2A) with
17 GenBank human DNA entries (Figure 2B) revealed a startling degree of similarity
extending about 70 nucleotides among most of the entries. No gaps had to be introduced
into any of the sequences to provide a better fit. A consensus sequence was derived from
these alignments (Figure 2C), in which a given nucleotide position has an identical base for
9 or more of the 18 DNAs listed. Secondary nucleotides present in at least 6 of 18
sequences are indicated below the primary consensus nucleotides. Similarity of greater
than 75% (at least 14 of the 18 DNAs contained the consensus nucleotide) was noted at 28
positions (Figure 2C, asterisks). In general, the highest amount of sequence conservation is
found near the center of the 70-bp region, with decreasing conservation moving away from
this central core sequence. The calculated probability values for obtaining the consensus
nucleotides at a given frequency at each position (Figure 2D) are usually very low,
indicating a high degree of significance of these similarities throughout most of the aligned
sequences.
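The consensus rule described above (a primary nucleotide when at least 9 of 18 sequences agree, a secondary nucleotide when at least 6 of 18 share another base, and an asterisk when at least 14 of 18 agree) can be sketched as follows. This is an illustrative reading of the rule, not the program used in the study; the function name, the use of "N" for positions without a primary nucleotide, and the toy alignment in the usage example are all assumptions.

```python
from collections import Counter


def derive_consensus(aligned, primary_min=9, secondary_min=6, strong_min=14):
    """Return (primary, secondary, strong_flags) for ungapped, equal-length sequences.

    primary      : most frequent base per column if seen >= primary_min times, else "N"
    secondary    : second most frequent base if seen >= secondary_min times, else " "
    strong_flags : True where the primary base is seen >= strong_min times
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must have equal length (no gaps are introduced)")

    primary, secondary, strong = [], [], []
    for column in zip(*aligned):
        counts = Counter(base.upper() for base in column).most_common()
        top_base, top_n = counts[0]
        if top_n >= primary_min:
            primary.append(top_base)
            strong.append(top_n >= strong_min)
            # Report a secondary base only below an accepted primary base.
            if len(counts) > 1 and counts[1][1] >= secondary_min:
                secondary.append(counts[1][0])
            else:
                secondary.append(" ")
        else:
            primary.append("N")
            secondary.append(" ")
            strong.append(False)
    return "".join(primary), "".join(secondary), strong


if __name__ == "__main__":
    # Toy alignment of 18 three-base sequences (hypothetical data).
    toy = ["ACG"] * 10 + ["ATG"] * 6 + ["CTT"] * 2
    print(derive_consensus(toy))
```

The default thresholds match the 18-sequence comparison; for a different number of sequences they would need to be rescaled.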
The most conserved part of the consensus sequence derived in Figure 2C (the central
45-bp portion) was used to further probe the GenBank library for entries with sequence
similarities. Additional sequences with significant similarity were obtained, and the
sequence alignments for 35 of the most similar entries are presented (Figure 3). The 45-bp
consensus (+) probe is displayed at the top of Figure 3, and the region of highest similarity
(core similarity) is indicated by double underlining. In the 15-bp core similarity region, 14
of the positions have at least 75% nucleotide identity among the entries. In four positions
at least 34 of the 35 entries have the same nucleotide (Figure 3, bottom). When the data
from 20 consensus-like sequences in the opposite orientation (Table 2) are included in
these comparisons, 55 of 55 sequences have an A at position 33, 54 of 55 sequences have an
A at positions 35 and 39, and 53 of 55 sequences have a G at position 38.
Thus far, we have identified over thirty different human genes and eight genes from
other mammalian species that contain a consensus-like sequence (Table 1 and Table 2). In
addition, three human sequences are represented that are not associated with a particular
gene. Two of these sequences are in potential origin of replication regions: the human
ARS1 sequence (19) and the African green monkey SV40 origin-like sequence (20). The
standard deviation (SD) value in Table 1 is a measure of the degree of similarity (12).
Values of 3.0 or greater are considered to reflect possible similarity, and values above 6.0
are considered to have probable similarity (13). The position of the consensus-like
sequence within each gene is indicated (Tables 1 and 2, third column). Many of the genes
contain the sequence within an intron, although some genes have the element either in
their 5' or 3' flanking regions. In only one case (human interleukin 1) is the consensus-like sequence present within a gene exon. However, in this instance, it is located within the
3' non-coding region of the gene. Interestingly, the human acetylcholine receptor (alpha
[Figure 1 graphic. A. Human chromosome 17. B. 9.0 kb cloned DNA, with sequenced regions marked. C. HBV flanking DNA sequence (positions 10-80).]
Figure 1. Genetic and physical maps showing the location of the original consensus-like
sequence in a human hepatocellular carcinoma (HCC). See Zhou et al. (11) for details on
molecular cloning, chromosomal localization, and sequencing. (A) Pictorial representation
of human chromosome 17. The hepatitis B virus (HBV) flanking sequences shown in B
map to the 17p10-17p12 region of chromosome 17. (B) Restriction endonuclease map of
a cloned 9.0-kb EcoRI fragment derived from HCC genomic DNA containing integrated
HBV sequences and human flanking DNA. HBV DNA (middle boxed area) contains two
genes (labelled "S" and "C"). Open boxes represent the pre-S gene and hatched boxes
represent the gap between pre-S and C. Enzymes used in the map include EcoRI (E),
BglII (B), HindIII (H), XhoI (X), and XbaI (Xb). The 1.0-kb BglII-EcoRI fragment, used
for chromosome mapping, is represented by a closed box at the right of the map. Regions
that have been sequenced are indicated by lines below the map. (C) The sequence
containing the consensus-like element is shown.
subunit) gene contains the consensus-like sequence as part of a 49-bp tandem direct repeat.
Finally, the number of nucleotides in the central 15-bp core similarity region (double
underlined nucleotides in Figure 3 consensus sequence) that are identical to the consensus
are indicated for each entry (Tables 1 and 2, fourth column). Fourteen entries have
identity with the consensus sequence in at least 14 of 15 nucleotide positions in Table 1.
Similar results are shown in Table 2 (5 of 20 entries have at least 14 of 15 nucleotides
which match the core consensus), which lists consensus-like sequences found in the reverse
orientation (to those in Table 1) in a number of human and mammalian genes.
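The two similarity measures reported in Tables 1 and 2 can be illustrated with a short sketch: the core similarity is a count of identical positions against the 15-nucleotide core, and the SD value is interpreted against the thresholds stated in the text (3.0 or greater for possible similarity, above 6.0 for probable similarity). The function names and the 15-nucleotide strings in the usage example are hypothetical; in particular, the example core is not the double-underlined core of Figure 3.

```python
def core_similarity(candidate, consensus_core):
    """Return (matches, length): identical positions between two equal-length strings."""
    if len(candidate) != len(consensus_core):
        raise ValueError("candidate and core must have the same length")
    matches = sum(a.upper() == b.upper() for a, b in zip(candidate, consensus_core))
    return matches, len(consensus_core)


def classify_sd(sd):
    """Map an SD value onto the qualitative categories given in the text."""
    if sd > 6.0:
        return "probable"
    if sd >= 3.0:
        return "possible"
    return "not significant"


if __name__ == "__main__":
    core = "GAGGAAACTGAGGCT"       # illustrative 15-mer, assumed for this sketch
    candidate = "GAGGAAACTGAGGCA"  # differs from the core at the last position
    matches, length = core_similarity(candidate, core)
    print(f"{matches}/{length}", classify_sd(9.7))
```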
Two of the human genes listed in Table 1, human myoglobin and human N-myc, have
mouse counterpart genes that are also reported. To test for relative evolutionary
conservation of the consensus-like sequence within the human and mouse myoglobin genes,
we performed a homology matrix comparison between the second intron (and first and
second exons) of each gene (Figure 4). Only matches over 10 nucleotides with a standard
deviation value above 3.0 are shown. As expected, the mouse and human exons 2 and 3 are
highly homologous, while intron 2 contains only a few scattered regions of homology. Two
of these homologous stretches correspond exactly to the match (SD = 5.4, 3.3) between the
consensus-like sequences in the human and mouse intron 2. The homology between the
consensus-like elements does not extend on either side. There are only two other regions
[Figure 2 graphic. A. HBV flanking DNA sequence (positions 10-80). B. DNAs compared: HBV flanking DNA, serum prealbumin, carbonic anhydrase, myoglobin, apolipoprotein CIII, haptoglobin, alpha-1-antitrypsin, beta tubulin, factor IX, alpha fetoprotein, fibrinogen, adenosine deaminase, opsin, acetylcholine receptor, protein C, enkephalin. C. Consensus sequence. D. Distribution of nucleotides at each position.]
Figure 2. Identification of a consensus sequence by comparison of the HBV flanking
sequence with similar sequences located in other human genes. (A) The HBV flanking
DNA sequence (80 bp) shown in Figure 1C is presented here. (B) Human genes or DNA
elements that have similarity with the HBV flanking sequence shown in A, determined by
searching the GenBank data base. The DNAs are optimally aligned under the criteria of
not allowing gaps or deletions. A dash (-) indicates a nucleotide identical to that of the
primary consensus sequence shown in C. (C) Consensus sequence. Comparison of the 18
DNAs listed in part B generated a consensus nucleotide if at least 9 of 18 sequences had an
identical nucleotide at a given position. Positions with 14 or more identical nucleotides (of the possible
18) are noted by an asterisk above the nucleotide. Secondary nucleotides are
shown below the primary consensus nucleotides at positions which had 6 identical
secondary nucleotides. N = any nucleotide. (D) Distribution of nucleotides at each
position for the 18 compared sequences. Numbers representing the consensus nucleotides
are underlined. Consensus nucleotides are indicated at each position under CON. The
probability of obtaining the primary consensus nucleotide is shown under the column
designated P(1). The probability of obtaining the primary and secondary nucleotides at a
position is indicated under the P(1 + 2) column. Probabilities were calculated using a
binomial probability distribution, with the following values for the human genomic
nucleotide frequencies: A = 0.3, C = 0.2, G = 0.2, T = 0.3. The most likely probability value (4
or 5 occurrences of a given nucleotide at a position) is about 2 × 10^-1.
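The binomial calculation described in the legend can be sketched as follows, using the stated genomic base frequencies and 18 compared sequences. Whether the published P values are exact-count or at-least-count probabilities is not stated, so both are shown; the function names are hypothetical.

```python
from math import comb

# Genomic nucleotide frequencies as given in the figure legend.
BASE_FREQ = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}


def binomial_pmf(k, n, p):
    """Probability of exactly k occurrences in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_least(k, n, p):
    """Probability of k or more occurrences in n independent trials."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))


if __name__ == "__main__":
    n = 18  # number of compared sequences
    # Chance expectation: about 5 A's out of 18 at p = 0.3.
    print(f"P(exactly 5 A of 18)   = {binomial_pmf(5, n, BASE_FREQ['A']):.3f}")
    # A strongly conserved position: 14 or more A's out of 18.
    print(f"P(at least 14 A of 18) = {prob_at_least(14, n, BASE_FREQ['A']):.2e}")
```

Under this model, the chance-level counts (4 or 5 of 18) have a probability near 0.2, while 14 or more identical nucleotides at one position is expected far less than once in ten thousand positions.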
[Figure 3 graphic. The 45-bp consensus (+) probe aligned with the 35 entries listed in Table 1; the number of entries matching the consensus at each position, and frequent secondary nucleotides, are shown below the alignment.]
Figure 3. Sequences from GenBank with significant similarity to the consensus sequence.
The central 45 nucleotides (nucleotides 14-58) of the consensus sequence derived in Figure
2C are shown at the top. This sequence was used to search GenBank for other similar
sequences. Thirty-five DNAs that revealed a significant degree of similarity are aligned
with the consensus sequence. (Each DNA is 60 bp in length with the 45-bp region of
similarity in the middle). The gene or DNA element from which each listed sequence was
obtained is identified at the left. The number of entries that shared identity with the
consensus nucleotide at each position is indicated at the bottom. Positions that show a
frequent secondary nucleotide are indicated below the consensus nucleotide. The 15
central nucleotides that exhibit the highest degree of similarity (core similarity) are
emphasized in the consensus sequence by double underlining.
of homology in intron 2 near the 3' portion. Human N-myc and mouse N-myc also
displayed homology between their consensus-like sequences (data not shown).
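The idea behind a homology matrix comparison can be sketched as a search for diagonal runs of matching nucleotides between two sequences. The sketch below is a simplification: it reports exact-match runs longer than 10 nucleotides and does not reproduce the SD statistic used for Figure 4. The function name and the test sequences are hypothetical.

```python
def diagonal_matches(seq_a, seq_b, min_len=11):
    """Return (start_a, start_b, length) for exact-match diagonal runs >= min_len."""
    a, b = seq_a.upper(), seq_b.upper()
    hits = []
    # Each diagonal of the comparison matrix is identified by offset = j - i.
    for offset in range(-(len(a) - 1), len(b)):
        i = max(0, -offset)
        j = i + offset
        run = 0
        while i < len(a) and j < len(b):
            if a[i] == b[j]:
                run += 1
            else:
                if run >= min_len:
                    hits.append((i - run, j - run, run))
                run = 0
            i += 1
            j += 1
        # A run may extend to the end of the diagonal.
        if run >= min_len:
            hits.append((i - run, j - run, run))
    return hits


if __name__ == "__main__":
    shared = "ACGATCGGCTAAGTC"        # hypothetical conserved 15-mer
    seq_a = "TTTTT" + shared + "TTTTT"
    seq_b = "GGG" + shared + "GGG"
    print(diagonal_matches(seq_a, seq_b))
```

Each reported triple corresponds to one diagonal line segment in a matrix display such as Figure 4. A conserved element embedded in otherwise diverged sequence appears as an isolated short diagonal.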
Binding of Nuclear Factors to the Consensus-Like Sequence
If the consensus-like sequence plays a functional role in the cell, then it might be
expected to specifically bind trans-acting nuclear factors as do a number of other cis-acting
regulatory elements. To explore this possibility, we incubated a small DNA fragment (a
250-bp HindIII-EcoRI fragment from the 1.0-kb probe DNA) containing the consensus-like
sequence with a nuclear extract prepared from Hep3B hepatoma cells and subjected the
reaction to a DNase I footprinting assay. Evidence of specific protein binding may be more
Table 1. Characteristics of 35 (+) consensus-like sequences

No.  Gene or DNA Sequence                   SD(a)  Location              Core Similarity(b)
 1)  human adenosine deaminase              9.7    intron 1              14/15
 2)  human protein C                        9.6    5' flanking           15/15
 3)  human beta-tubulin                     9.3    intron 3              12/15
 4)  human alpha-1-antitrypsin              9.1    5' flanking           12/15
 5)  human haptoglobin (alpha-2)            8.8    intron 5              13/15
 6)  human myoglobin                        8.7    intron 2              14/15
 7)  X chromosome sequences (near DMD)      8.4    unknown               13/15
 8)  human carbonic anhydrase II            8.3    intron 1              14/15
 9)  African green monkey SV40 ori-like     8.1    unknown               13/15
10)  human c-sis                            7.5    5' flanking           13/15
11)  human haptoglobin-related              7.1    intron 5              13/15
12)  human N-myc                            7.0    intron 2              15/15
13)  human factor IX                        6.7    intron 3              11/15
14)  human prothrombin                      6.7    intron 1              12/15
15)  human acetylcholine receptor (alpha)   6.6    3' flanking (rpt)(c)  11/15
16)  human apolipoprotein A-I               6.6    3' flanking           14/15
17)  human alpha-fetoprotein                6.5    intron 3              13/15
18)  human serum prealbumin                 6.5    intron 3              13/15
19)  mouse myoglobin                        6.5    intron 2              14/15
20)  human apolipoprotein C-III             6.3    intron 2              13/15
21)  rabbit poly Ig receptor                6.3    3' flanking           13/15
22)  human fibrinogen                       6.3    intron 6              13/15
23)  human interferon beta-3                6.0    unknown               12/15
24)  human immune interferon gamma          5.9    intron 3              11/15
25)  human opsin                            5.8    intron 1              11/15
26)  bovine acetylcholine receptor (alpha)  5.8    3' flanking           11/15
27)  human ARS1 sequence                    5.6    rep enhancer          13/15
28)  human enkephalin B                     5.4    intron 3              11/15
29)  rat thyrotropin beta subunit           5.0    intron 1              12/15
30)  human protein C gene                   4.8    intron 3              14/15
31)  human interleukin 1                    4.6    exon 7 (3' UT)(d)     11/15
32)  bovine pancreatic trypsin inhibitor    4.2    3' flanking           11/15
33)  mouse N-myc                            4.2    intron 2              12/15
34)  human T-cell receptor beta chain       4.2    D-J region            14/15
35)  mouse glial fibrillary acid protein    3.6    intron/               14/15

(a) SD measures the degree of similarity to the consensus sequence. See text for additional
details.
(b) Core similarity represents the number of nucleotides in each consensus-like sequence (first
number) which are identical to the most highly conserved central 15 nucleotides of the
consensus sequence (Figure 3, double underlined nucleotides).
(c) The consensus-like sequences of the alpha subunit gene are within a 49-bp tandem direct
repeat.
(d) The consensus-like sequence is within the 3' untranslated region.
readily observable at the higher resolution afforded by this particular procedure. The
DNA-binding reactions were performed in an excess of the simple alternating copolymer
duplex poly(dI-dC) with a constant amount of 32P-labelled 250-bp fragment and 10 ug of
Hep3B extract.
Table 2. Characteristics of 20 (-) consensus-like sequences(a)

No.  Gene                              SD     Location     Core Similarity  Total Similarity
 1   human interleukin 2               10.4   5' flanking  13/15            39/45
 2   human c-sis                       9.9    5' flanking  14/15            37/45
 3   bovine acetylcholine receptor(b)  9.1    3' flanking  14/15            37/45
 4   human beta crystallin             8.8    intron 4     13/15            37/45
 5   human myoglobin                   8.7    intron 1     13/15            36/45
 6   human adenosine deaminase         8.4    5' flanking  14/15            34/45
 7   human protein C                   8.2    5' flanking  12/15            36/45
 8   human prothrombin                 8.0    intron 12    12/15            36/45
 9   human alpha-fetoprotein           7.6    intron 3     12/15            34/45
10   human dystrophin                  7.4    intron 7     13/15            34/45
11   human alkaline phosphatase        7.3    intron 9     13/15            29/45
12   human apolipoprotein C-III        7.2    intron 3     13/15            35/45
13   human epsilon globin              6.9    5' flanking  13/15            32/45
14   human aldolase B gene             6.7    5' flanking  14/15            35/45
15   human prolactin gene              6.7    5' flanking  12/15            33/45
16   human tissue plasminogen act      6.3    intron 4     15/15            29/45
17   human interleukin 1               6.2    intron 4     11/15            34/45
18   human myoglobin                   5.9    intron 2     13/15            33/45
19   human opsin                       5.9    3' flanking  11/15            31/45
20   mouse myoglobin                   4.2    intron 2     12/15            30/45

(a) These consensus-like elements are in the reverse orientation of those shown in Table 1 and are
homologous to the opposite strand of the 45 nucleotide consensus sequence at the top of Figure 3.
The (-) strand consensus probe is:
5'-TACTTAACCTCTCTGAGCCTCAGTTTCCTCATCTGTAAAATGGGG-3'.
(b) Beta subunit.
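The relationship between the (+) and (-) strand probes is a reverse complement, which can be sketched as below. The 45-nucleotide (+) probe string used here is an assumption: it was reconstructed from the printed (-) strand probe and the Figure 3 consensus, both of which are imperfectly legible in this copy, and should be checked against the original figure before any reuse.

```python
# Translation table for Watson-Crick complements (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


# Assumed reading of the 45-nucleotide (+) strand consensus probe.
PLUS_PROBE = "CCCCATTTTACAGATGAGGAAACTGAGGCTCAGAGAGGTTAAGTA"

if __name__ == "__main__":
    print("(+) 5'-" + PLUS_PROBE + "-3'")
    print("(-) 5'-" + reverse_complement(PLUS_PROBE) + "-3'")
```

Searching a database with both strings is equivalent to searching both strands of each entry with one probe, which is how elements in either orientation (Tables 1 and 2) can be recovered.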
Both strands of the 250-bp fragment exhibited DNase I hypersensitive cleavage sites in
the presence of the Hep3B extract (Figure 5A). These sites were not apparent when the
fragment was incubated with DNase I in the presence of bovine serum albumin.
Interestingly, the strongest hypersensitive sites on each strand appear to be at the same
position and were within the consensus-like sequence. These strong hypersensitive bands
were consistently observed in three separate DNase I footprinting experiments as well as
with an extract derived from the HepG2 hepatoma cell line (data not shown). Other less
hypersensitive sites were present on one strand at the other end of the consensus-like
sequence. No obvious protected regions were evident (Figure 5A), although three possible
regions of decreased DNase I cleavage appeared, all outside of the consensus-like
sequence. The relationship of the DNase I hypersensitive sites to the DNA sequence is
illustrated (Figure 5B).
DISCUSSION
As a result of sequence analysis of human DNA associated with an integrated HBV
genome in a hepatoma, we have discovered a sequence element that appears to be highly
conserved in a number of human and other mammalian genes. The sequence itself displays
some interesting features, including purine-rich tracts alternating with pyrimidine-rich
regions. The 5' region of some of the consensus-like sequences has the potential to form
stem-loop structures, but the significance of these putative secondary structures is unclear.
The consensus-like sequences do not appear to have similarity to any previously
described human repetitive DNA element and do not share the properties of a short
interspersed nucleotide element (SINE), typified by the human Alu repeats. The
consensus-like sequence observed in human genes does not contain an A-rich tract at its 3'
[Figure 4 graphic. Homology matrix with the mouse myoglobin sequence along the top and the human myoglobin sequence along the left.]
Figure 4. Homology matrix comparison between human and mouse myoglobin in the
intron 2 (and adjacent exons) sequences. Mouse myoglobin sequences (top) are matched
with human myoglobin sequences (left). Sequences greater than 10 nucleotides in length
which show homology at an SD value of 3.0 will generate a diagonal line in this matrix
display. The distance between each vertical and horizontal line in the matrix is equal to
100 nucleotides. The consensus-like sequences in each intron are represented by a closed
box with an arrow. Note that the consensus-like sequences are conserved between the
human and mouse genes.
end nor is it bordered by direct repeats, features characteristic of SINEs (1-3). In most
cases, the consensus-like sequence is present once or twice per gene, whereas dozens of
SINEs can be located within a single gene (1-3), indicating significantly lower copy
numbers of the consensus-like sequence. However, it is possible that our screen
underestimated the number of consensus-like sequences, since the more divergent
members of this family might not be picked up by our computer searches.
The consensus-like sequences are conserved among a variety of mammalian species,
whereas rodent and human SINEs, although ancestrally related, have significant
differences in overall structure and primary sequence (3). Whereas most of the genes
containing the consensus-like sequences are human (Tables 1 and 2), a significant number
of rodent and other mammalian species contain this repeat. The predominance of human
genes in Tables 1 and 2 may be due to the greater number of human genes sequenced
(including intron and flanking sequences). Alternatively, the non-human consensus-like
sequences may be somewhat diverged from their human counterparts. To test this
possibility, we derived a consensus sequence from six consensus-like elements in rat and
mouse genes. This rodent consensus sequence was identical to the 15-bp core region and
nearly identical to the entire 45 nucleotides of the consensus sequence in Figure 3. This
test is consistent with the results of the homology matrix comparison of Figure 4, which
shows that the consensus-like sequences in intron 2 of the mouse and human myoglobin
gene are conserved, whereas the adjacent non-coding sequences share no homology. These
data suggest that the human and mouse consensus-like elements are conserved over some
60-85 million years of evolution. Non-mammalian vertebrate genes in GenBank contained
no significant sequence similarities to the consensus sequence.
The differences in structure and evolutionary conservation between the consensus-like
[Figure 5 graphic. (A) DNase I footprinting gel, lanes 1-11. (B) Sequence of the region containing the consensus-like element:
ATTAACATCCCCTCT-TTACAGAAGAGAAAACTGAGGCACAGAGAGATTAAGTCCTGTTAC]
Figure 5. DNase I footprint analysis of a 250-bp HBV flanking DNA fragment containing
the consensus-like sequence. The fragment tested was from the 3' end of the 1.0-kb
fragment indicated in Figure 1B. DNase I footprinting was performed as described in the
Materials and Methods section. (A) Lanes 1-5 contain the negative strand end-labelled at
the EcoRI site. Lanes 6-11 contain the positive strand end-labelled at the HindIII site.
Maxam and Gilbert sequencing reactions for C+T (lanes 1 and 6), G+A (lanes 2 and 7),
and A+C (lane 8) are used as markers. Untreated control DNA fragments are in lanes 3
and 9. Control reactions in which the DNA fragments were incubated with bovine serum
albumin before DNase I treatment are shown in lanes 4 and 10. Lanes 5 and 11 contain
fragments incubated with Hep3B nuclear extract prior to DNase I treatment.
Hypersensitive cleavage sites by DNase I in the presence of Hep3B extracts are indicated
by arrowheads. The dominant hypersensitive site on each strand is shown by a large
arrowhead. The regions of the fragments containing the consensus-like sequence are
designated by vertical lines next to the marker lanes. (B) Summary of footprinting assay in
the vicinity of the consensus-like sequence on both strands. The same DNase I
hypersensitive cleavage sites are indicated by arrowheads, as in panel A. Underlined
nucleotides represent the 45-bp consensus region. Double underlined nucleotides
represent the 15 bp of core similarity.
repeats and the typical SINEs indicate that the former should be placed in a separate class
of repetitive elements. Other investigators have reported non-Alu types of short
interspersed repeats (21-23). Some of these are apparently retrotransposons and are
inserted elements (22). Another group of interspersed repeats is composed of tandem repetitions of
short oligonucleotides, which are highly conserved (3,24,25). The consensus-like sequences
reported here show no evidence of being inserted elements and are not repeated
oligonucleotides but are highly conserved. We therefore propose that the consensus-like
sequences form a different class of interspersed repetitive elements.
One possible explanation for the evolutionary conservation of the consensus-like
repeat is that these sequences play a functional role in the cell. The presence of a highly
conserved core sequence is consistent with the idea that the sequence binds a nuclear
factor or factors. Comparison of the consensus sequence to a list of sequences associated
with nuclear factor binding (5) did not uncover any similarities except possibly in one case.
A portion of the HBV genome to which liver-specific factors bind (26) showed identity at
11 of 13 positions with nucleotides 15-27 (Figure 2C) of the consensus sequence. This
particular sequence is directly adjacent to the dominant DNase I hypersensitive sites in the
consensus-like sequence of the HBV flanking DNA (Figure 5B).
In two instances, the consensus-like element is associated with sequences that display
potential origin of replication function. In one example, it is adjacent to a sequence in the
African green monkey genome which is similar to the SV40 origin of replication (20). In
the other case, the consensus-like sequence is part of the human ARS1 DNA (19). ARS1 is
a sequence of human DNA that allows replication of yeast integrative plasmids as
autonomously replicating elements in yeast cells (19). Within ARS1 is a 325-bp segment
that is necessary for maximal expression of the autonomous replication phenotype,
analogous to an enhancer. Interestingly, the consensus-like sequence is positioned in the
center of this 325-bp segment having replication enhancer activity. It is not known,
however, whether the consensus-like sequence is part of the active portion of the
replicative enhancer. It should also be noted that another cloned human sequence, ARS2,
which also allows autonomous replication in yeast, does not appear to contain the
consensus-like sequence.
The DNase I footprinting experiments suggested that nuclear factors in extracts
derived from two different liver cell lines bound the DNA fragment containing the
consensus-like element and altered its susceptibility to DNase I. Liver cell nuclear extracts
were employed because the original HBV flanking sequence was isolated from hepatoma
tissue and a number of the genes containing the consensus-like sequence are liver-specific.
The footprinting results suggest at least a specific DNA conformational change, since
DNase I hypersensitive sites were consistently noted in the same position of the 250-bp
fragment. Whether specific binding of discrete factors to the DNA occurred is somewhat
unclear, although preliminary gel retardation binding assays suggested specific factor
binding (data not shown). Interestingly, the observed DNase I hypersensitive sites were
localized to both ends of the consensus-like sequence. This phenomenon of DNase I
hypersensitivity has frequently been observed at nucleotide positions adjacent to sites of
binding by transcriptional factors (27,28). There was some evidence of DNase I protection
of specific domains of the fragment, but the putative protected regions were outside of the
consensus-like sequences. Further experiments must be performed to clarify whether there
is a specific interaction of nuclear factors with the consensus-like sequence. The primary
question is whether this well conserved sequence has a function in gene expression, DNA
replication, or some other cellular event.
ACKNOWLEDGEMENTS
We thank Charlie Lawrence, Sandy Honda, Debbie Wilson and Jim Kelly for helpful
discussions and Joyce Evans for help with the manuscript.
This work was supported in part by Public Health Service grant CA37257 from the
National Cancer Institute and by National Research Service Award CA09197.
strand. This turned out to be the case. As a result, one is able to incorporate mutagenic
oligonucleotides into the coding strand and transcribe the partially mismatched template directly.
We have used this technique to produce RNAs containing defined insertions, deletions, or
substitutions of virtually any length. The mutagenesis procedure is performed in a single reaction
vessel, beginning with 10-20 µg (~3-6 pmol) of plasmid DNA and ending with 100-200 pmol of a
purified mutant RNA that is typically 300-500 nucleotides in length. The transcription products
are loaded directly onto a polyacrylamide gel and are purified by electrophoresis and subsequent
column chromatography. Since the mutant RNAs are distinguished from wild type by their
electrophoretic mobility, the technique is best suited for mutations that result in a discernable size
difference between mutant and wild type or involve the use of a mutagenic oligonucleotide that
hybridizes very efficiently to the target DNA. The technique does not depend on the presence of a
convenient restriction site within the target gene, and, except for the T7 promoter and a restriction
site located somewhere downstream from the gene, does not place any limitations on the design of
the plasmid DNA.
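The correspondence between micrograms and picomoles of plasmid quoted above can be checked with a short calculation. The Python sketch below is illustrative only: the average mass of 650 g/mol per base pair is a common approximation, and the 5,000-bp plasmid length is an assumption, since the size of the plasmid is not stated here.

```python
# Assumed average molar mass of one base pair of double-stranded DNA (g/mol).
DA_PER_BP = 650.0


def dsdna_pmol(mass_ug, length_bp, da_per_bp=DA_PER_BP):
    """Convert a mass of double-stranded DNA (micrograms) to picomoles."""
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    grams = mass_ug * 1e-6
    moles = grams / (length_bp * da_per_bp)
    return moles * 1e12


def implied_length_bp(mass_ug, pmol, da_per_bp=DA_PER_BP):
    """Plasmid length implied by a stated mass and molar amount."""
    if pmol <= 0:
        raise ValueError("pmol must be positive")
    return mass_ug * 1e6 / (pmol * da_per_bp)


if __name__ == "__main__":
    # Hypothetical 5,000-bp plasmid: 10 ug corresponds to about 3 pmol.
    print(round(dsdna_pmol(10, 5000), 2))
    # Length implied by "10 ug is about 3 pmol": roughly 5.1 kb.
    print(round(implied_length_bp(10, 3)))
```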
MATERIALS AND METHODS
Nucleotides and Enzymes
Unlabeled nucleoside triphosphates, deoxynucleoside triphosphates, and dideoxynucleoside
triphosphates were purchased from Sigma; [α-32P]GTP and [γ-32P]ATP were from ICN
Radiochemicals and [3H]UTP was from New England Nuclear. Synthetic oligodeoxynucleotides
were obtained from Operon Technologies and were purified by polyacrylamide gel electrophoresis
and chromatography on Sephadex G-10. Restriction enzymes were from New England Biolabs; T4
polynucleotide kinase, T7 gene 6 exonuclease, T4 DNA polymerase, and T4 DNA ligase from U.S.
Biochemical; and AMV reverse transcriptase from Life Sciences. T7 RNA polymerase was
prepared as previously described (5), and purified according to a procedure originally developed for
SP6 RNA polymerase (6). Plasmid pTTlA3, which contains a 533 base-pair fragment of
Tetrahymena rDNA (7), and pTL-45, which contains a 5'-truncated 399 base-pair fragment of the
same gene (8), were provided by T.R. Cech.
Preparation of Mutant RNAs
In a typical preparation, 10-20 µg of pTL-45 DNA was cleaved at a HindIII restriction site that
lies immediately downstream from the gene for Tetrahymena rRNA. The cleaved DNA was
added to a 100 µl volume containing 50 mM Tris (pH 8.1), 20 mM KCl, 5 mM MgCl2, 1 mM dithiothreitol, and 50 U T7 gene 6 exonuclease, which was incubated at 37° C for 30 min. The exonuclease
was removed by three phenol extractions and the DNA was purified by ethanol precipitation. Two
oligodeoxynucleotides were then hybridized to the single-stranded (minus strand) DNA; one
oligonucleotide forming a perfect duplex at the 3' end of the target gene and the other forming a
partial duplex that introduces the desired mutation. Annealing was performed in a 300 µl volume
containing 20 mM Tris (pH 7.5), 50 mM NaCl, 2 mM MgCl2, and a 5-fold molar excess of the two
oligonucleotides, which was incubated at 70° C for 5 min and then steadily cooled to 30° C over 40 min.
Synthesis of the mutant strand was completed by adding 40 U of T4 DNA ligase and 15 U of T4 DNA
polymerase, and incubating at 37° C for 60 min in the presence of 20 mM Tris (pH 7.5), 50 mM NaCl,
5 mM MgCl2, 2 mM dithiothreitol, 1 mM ATP, and 0.3 mM (each) dNTPs. The resulting DNA was
purified by ethanol precipitation, and then used to direct the transcription of mutant RNA.
Transcription took place either in a 10 µl volume containing 1 µg of mutant DNA, 2 µCi
[α-32P]GTP and 50 U T7 RNA polymerase or in a 400 µl volume containing 10 µg of mutant DNA,
40 µCi [3H]UTP and 2,400 U T7 RNA polymerase. In either case, the transcription mixture also
contained 40 mM Tris (pH 7.5), 15 mM MgCl2, 10 mM dithiothreitol, 2 mM spermidine, and 1 mM
(each) NTPs, and was incubated at 37° C for 90 min. T7 RNA polymerase was extracted with phenol
and the transcription products were purified by ethanol precipitation. The mutant RNA was
isolated by electrophoresis in a 5% polyacrylamide / 8 M urea gel, eluted from the gel, and purified
by ethanol precipitation and chromatography on Sephadex G-50.
Sequencing of Mutant RNAs
The mutant RNAs were sequenced by primer extension analysis using reverse transcriptase in
the presence of dideoxynucleotides (9). 1.0 pmol of [5'-32P]-labeled synthetic DNA primer was
annealed with 0.3 pmol of mutant RNA by incubating at 65° C for 5 min and then cooling to 30° C
over 5 min. The primer-extended cDNA products were analyzed on a 10% polyacrylamide / 8 M
urea sequencing gel.
RESULTS
Development of the Mutagenesis Procedure
The most widely used technique for site-directed mutagenesis involves hybridization of an
oligodeoxynucleotide to single-stranded DNA, forming a partial duplex structure that contains a
region of base mismatch.
The oligomer strand is extended using a DNA-dependent DNA
polymerase, and the resulting double-stranded DNA is used to transform bacterial cells (10,11).
This technique is useful for producing a specific mutation at a defined location. However, it is
awkward when one wishes to perform wholesale mutagenesis without taking the time to
construct clones and harvest DNA from bacterial cells.
Introduction of the mutant DNA into a bacterial host serves two useful purposes. First, the
mutation becomes fixed as a result of bacterial repair processes that resolve the region of base
mismatch. Second, the mutant DNA becomes amplified as a consequence of bacterial growth, so
that one can obtain an essentially unlimited supply of pure mutant DNA. Oftentimes, however,
one only needs enough material to sequence the mutant and to conduct a simple assay to examine
[Figure 1 diagram. Labels: P, T7 promoter; M, mutator oligo; T, terminator oligo. Elements shown: plasmid sequence and restriction site; hybridization of mutator and terminator oligos; T4 DNA polymerase and DNA ligase; T7 RNA polymerase; RNA product (5' end).]
Figure 1: Outline of the mutagenesis procedure, beginning with plasmid DNA and ending with
mutant RNA. The mutator oligo (M) directs an insertion, deletion, or substitution, as indicated by
hatched lines within its central portion.
its functional consequences. In such instances, the time required to prepare the mutant becomes a
critical factor.
We have found that the two useful aspects of bacterial transformation, fixation of the desired
mutation and amplification of the mutant DNA within the bacterial host, can be met in an entirely
in vitro reaction system that makes use of T7 RNA polymerase. This enzyme is able to transcribe
partially mismatched DNA, reading the template strand while ignoring the non-coding strand, and
in doing so generates several hundred copies of RNA transcript per copy of DNA template. We
have exploited these properties in order to develop a "mini-prep" method for the rapid production
of mutant RNA. The method involves excising the coding strand of wild-type DNA and replacing
it with a new strand that contains the desired mutation. The resulting partial duplex structure is
then used to direct the transcription of mutant RNA.
In the most general form of the technique, plasmid DNA, containing a T7 promoter and the
gene of interest, is cleaved at a site that lies downstream from the target gene (Fig. 1). The
restriction site need not lie immediately downstream from the target gene; one can choose any
unique restriction site that lies within a few hundred base pairs of the end of the gene. The cleaved
plasmid is partially digested using a 5'->3' exonuclease to produce a stretch of single-stranded
(minus strand) DNA. We prefer to use gene 6 exonuclease of T7 phage because of its distributive
properties and because of its marked preference for duplex DNA (12). One can easily control the
extent of the digestion to ensure complete removal of the coding strand of the gene as well as the
plus strand of the adjacent T7 promoter. Disruption of the promoter region provides an internal
selection mechanism since incomplete reconstructs will not obtain a functional promoter and will
be inert in the subsequent transcription reaction.
T7 gene 6 exonuclease operates inefficiently at termini that have a 5' overhang. When using
a restriction enzyme that leaves a 5' overhang, we found it necessary to increase the amount of
exonuclease from 50 U to 100 U in order to ensure adequate digestion of the coding strand.
Removal of a 5' overhang may, to some extent, be dependent on the sequence of the overhanging
bases, so that somewhat different amounts of exonuclease may be required in certain cases.
After digestion of the coding strand, the exonuclease is removed by phenol extraction, and the
DNA is purified by ethanol precipitation. Two oligodeoxynucleotides are then hybridized to the
segment of single-stranded (minus strand) DNA. One, which we refer to as the "terminator oligo",
forms a perfect duplex at a chosen location near the 3' end of the target gene. The other, which we
refer to as the "mutator oligo", forms a partial duplex at a site of interest within the gene. The
mutator oligo is designed such that it contains a central region of base mismatch flanked by two
regions that form a perfect duplex. The mismatched region may be shorter or longer than the
original complementary DNA, and may consist of a defined sequence or a mixture of random
sequences. As in all oligonucleotide-directed mutagenesis techniques, the mutator oligo should be
designed such that it can form a stable partial duplex structure at the desired location. The mutator
oligo must be phosphorylated at its 5' end so that it can serve as a donor substrate for DNA ligase.
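The design rule for a deletion-type mutator oligo (two perfectly matched flanks joined across the deleted region) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' design procedure: the function name, the default flank length of 12, and the toy sequence are hypothetical. Because the oligo hybridizes to the minus strand, it is written here in the sense of the coding (plus) strand. Stability of the duplex and 5' phosphorylation are not modelled.

```python
def deletion_mutator_oligo(plus_strand, del_first, del_last, flank=12):
    """Return a mutator oligo (5'->3', coding-strand sense) for a deletion.

    plus_strand : coding-strand sequence, 5'->3'
    del_first, del_last : 1-based, inclusive limits of the deleted region
    flank : number of matched residues on each side (assumed default)
    """
    if not 1 <= del_first <= del_last <= len(plus_strand):
        raise ValueError("deletion limits lie outside the sequence")
    left_start = del_first - 1 - flank
    if left_start < 0 or del_last + flank > len(plus_strand):
        raise ValueError("not enough sequence for the requested flanks")
    left = plus_strand[left_start:del_first - 1]   # upstream matched flank
    right = plus_strand[del_last:del_last + flank]  # downstream matched flank
    return left + right


if __name__ == "__main__":
    # Hypothetical toy sequence; positions 5-12 are deleted.
    toy = "AAAACCCCGGGGTTTT"
    print(deletion_mutator_oligo(toy, 5, 12, flank=4))
```

An insertion or substitution would be handled in the same way by placing the new residues between the two flanks instead of joining them directly.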
The two oligos are extended using T4 DNA polymerase and are ligated to form a template for
transcription of the mutant RNA. T4 DNA polymerase is used because, unlike most other DNA-dependent DNA polymerases, it does not have strand displacement activity (13). We tested the
Klenow fragment of E. coli DNA polymerase I in this reaction and obtained very unsatisfactory
results. We usually begin the reaction by incubating at 25° C for 5 min to give the polymerase a
chance to extend the two oligos under conditions that enhance duplex stability. The reaction is
completed by incubation at 37° C for 60 min, and the DNA is purified by ethanol precipitation. The
precipitation step is not absolutely necessary, but tends to increase the yield in the subsequent
transcription reaction.
Transcription is performed under conditions similar to those described by Milligan et al. (4),
using a large amount of T7 RNA polymerase and high concentrations of MgCl2 and the four NTPs.
After phenol extraction and ethanol precipitation, the transcription products are loaded onto a
polyacrylamide gel and the mutant RNA is isolated electrophoretically.
Depending on how
efficiently the mutator oligo hybridizes to the target DNA, there may be an appreciable amount of
wild-type RNA included among the transcription products (for example, see below). For this
reason, the mutagenesis technique is best suited for insertion or deletion mutations that result in a
discernable size difference on the gel. In some cases, hybridization of the mutator oligo is very
efficient and the amount of "revertant" wild-type RNA is negligible (again, see below). This is
more likely to occur when the 5' portion of the mutator oligo forms a long stretch of stable duplex
structure with the minus strand DNA. However, hybridization of the mutator oligo may also
depend on features of secondary structure that are not possible to predict.
Application of the Mutagenesis Technique
We have applied the above-described mutagenesis technique to the study of a self-splicing
group I intron. Working with the intervening sequence (IVS) of Tetrahymena pre-rRNA, we
wished to produce sizeable internal deletions within the non-conserved portions of the molecule.
In the present paper we focus on the mutagenesis technique itself, and present our data as an example of how the technique can be applied. In a subsequent paper (14) we will detail the effect that
these and other internal deletions have on the catalytic activity of the Tetrahymena ribozyme.
We made two internal deletions within the Tetrahymena IVS (Fig. 2). M1 is a 38-nucleotide
deletion from position 56 through 93 that removes structural elements P2.1 and L2.1. M2 is a 69-nucleotide deletion from position 127 through 195 that removes structural elements P5a, P5b, L5b,
P5c, and L5c. The location of these deletions was chosen based on known features of group I
secondary structure (15-17). The mutator oligos were hybridized to the minus strand DNA by
flanking regions consisting of 11-14 complementary residues. Two terminator oligos were used,
each containing 15 nucleotides. T1 hybridizes at positions 305 through 319 of the IVS and T2
hybridizes at positions +8 through +22 of the 3' exon.
The two terminator oligos were used either alone or in combination with one or both of the
mutator oligos. All eight combinations were tested using pTL-45 DNA, which contains a portion
of the Tetrahymena IVS (beginning at position 45) and 29 nucleotides of the 3' exon, inserted 6
nucleotides downstream from a T7 promoter (8). Figure 3 shows the direct transcription products
Figure 2: Sequence and secondary structure of the Tetrahymena IVS, showing the location of the
two terminator and two mutator oligos. The RNA is truncated at its 5' end, corresponding to the
direct transcription product of pTL-45 DNA. Structural elements within the IVS are labeled
according to the standard nomenclature for group I introns (23). A portion of the 3' exon is shown
in lower case letters. The location of the 3' terminus produced by T1 and T2 and the site of
internal deletion produced by M1 and M2 are indicated using a heavy diagonal line. The extent of
hybridization by the terminator and mutator oligos is indicated by a heavy bracketed line.
that were obtained. Comparable results were achieved using a different plasmid that contains the
entire Tetrahymena IVS, although in that case the autoradiogram was much more complicated
due to the catalytic activity of the precursor rRNA in the transcription buffer (data not shown).
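The expected length of each transcript follows from the positions given above, and can be tabulated with a short Python sketch. The deletion limits, terminator positions, IVS start position, and 3' exon length are taken from the text. Two points are assumptions made for this illustration: the 6 nucleotides between the promoter and the insert are treated as a transcribed leader, and the IVS is taken to be 414 nucleotides long, a value chosen because it reproduces the sizes quoted for the T2-terminated products.

```python
# Values stated in the text.
IVS_FIRST_POSITION = 45      # pTL-45 begins at IVS position 45
DELETIONS = {"M1": (56, 93), "M2": (127, 195)}

# Assumptions made for this sketch (see lead-in).
LEADER_NT = 6                # treated as transcribed leader sequence
IVS_LENGTH = 414             # not stated in the text


def transcript_length(end_region, end_position, mutators=()):
    """Expected transcript length in nucleotides.

    end_region   : "ivs" or "exon", where the terminator fixes the 3' end
    end_position : last templated position within that region
    mutators     : names of deletion oligos incorporated, e.g. ("M1", "M2")
    """
    if end_region == "ivs":
        ivs_nt = end_position - IVS_FIRST_POSITION + 1
        exon_nt = 0
    elif end_region == "exon":
        ivs_nt = IVS_LENGTH - IVS_FIRST_POSITION + 1
        exon_nt = end_position
    else:
        raise ValueError("end_region must be 'ivs' or 'exon'")
    length = LEADER_NT + ivs_nt + exon_nt
    for name in mutators:
        first, last = DELETIONS[name]
        length -= last - first + 1
    return length


if __name__ == "__main__":
    for label, region, pos in (("T2", "exon", 22), ("T1", "ivs", 319)):
        for muts in ((), ("M1",), ("M2",), ("M1", "M2")):
            print(label, "+".join(muts) or "-", transcript_length(region, pos, muts))
```

Under these assumptions the run-off product ending at position 29 of the 3' exon is 405 nucleotides, and the eight terminator and mutator combinations give 398, 360, 329, 291, 281, 243, 212, and 174 nucleotides.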
Digestion of the coding strand with gene 6 exonuclease is essentially complete. Only a trace
[Figure 3 lane labels. T2 (in 3' exon at +22): -, M1, M2, M1M2. T1 (in IVS at 319): -, M1, M2, M1M2. Sizes indicated at right (nucleotides): 405, 398, 360, 329, 291, 281, 243, 212, 174.]
Figure 3: Autoradiogram of wild-type and mutant RNAs obtained by transcription in the presence
of [α-32P]GTP. The bands marked by an arrow correspond to the expected transcription product, the
size of which is indicated at the right. Bands marked by a dot correspond to materials derived from
the expected transcription product as a result of RNA-catalyzed cleavage at the 3' splice site that
occurs during the transcription reaction (24). Unmarked bands correspond to wild-type RNA and
its cleavage products that appear as a result of inefficient hybridization of the mutator oligo. wt is
the transcription product obtained from intact pTL-45 DNA that has been cut with HindIII. g6 is
the transcription product obtained after digestion of the wild-type DNA with gene 6 exonuclease.
The products were separated by electrophoresis in a 5% polyacrylamide / 8 M urea gel run in 90 mM
Tris/borate buffer.
(< 1%) of the wild type is detected when DNA that has been treated with gene 6 exonuclease is
transcribed directly. Hybridizing either T1 or T2 to the minus strand DNA and then extending
with T4 DNA polymerase allows one to produce 3'-truncated RNAs with a defined end.
Hybridizing one or two mutator oligos in addition to the terminator oligo allows one to control
internal positions as well as the 3' terminus. The data presented in Figure 3 indicate that M2 does
not hybridize as efficiently to the minus strand DNA as does M1. This is evidenced by material in
the M2 lanes corresponding to transcripts whose 3' end is defined by the terminator oligo but
[Figure 4 lane labels. M2 mutant: -, C, U, G, A. M1 M2 mutant: -, C, U, G, A. The site of the M2 deletion is marked by an arrow.]
Figure 4: Sequence analysis of the M2 and M1 M2 mutants by the primer extension method. A
deoxynucleotide complementary to positions 274-288 of the IVS was hybridized to the mutant
RNA and extended using reverse transcriptase. Lane -, primer extension in the absence of
dideoxynucleotides. Lanes C, U, G, or A, primer extension reactions in the presence of ddGTP,
ddATP, ddCTP, or ddTTP, respectively. Angle brackets indicate the extent of hybridization by the
mutator oligos. Arrows indicate the site of internal deletions.
whose internal positions are unchanged from the wild type. The M1 M2 double deletion mutant is
accompanied by a smaller amount of the M1 single deletion mutant. For the most part, however,
the desired single or double deletion mutant dominates the family of transcription products.
The identity of the mutants was confirmed by eluting the transcription products from the gel
and determining their nucleotide sequence. Figure 4 shows the nucleotide sequence of the M2 and
M1 M2 mutants as determined by primer extension analysis using reverse transcriptase in the
presence of dideoxynucleotides (9). It is important to note that the transition from a double- to a
single-stranded template and from a single-stranded template back to a double strand takes place
without appreciable slippage of the polymerase enzyme. The transcription products obtained using
a partially mismatched template do not appear to be any less accurate than one would obtain using
a complete double strand.
The data presented in Figure 3 were prepared quantitatively; that is, differences in the amount
of radioactivity reflect either a loss of material during the workup or differences in the efficiency of
transcription. The lane corresponding to the T1 oligo alone demonstrates that in some cases the
reconstruction of the template strand is nearly complete. The comparatively lower efficiency of
template reconstruction with the T2 oligo alone is likely to be due to decreased hybridization
efficiency of the T2 oligo. When a mutator oligo is used, the efficiency of template reconstruction
is lowered even further. This is partly because the mutator oligo presents a more difficult
hybridization task and partly because the extended terminator oligo must be ligated to the 5' end of
the mutator oligo. Despite the loss of viable templates due to inefficient strand reconstruction, one
can obtain an adequate amount of mutant RNA as a result of the high turnover of T7 RNA
polymerase. In a large-scale preparation, we used 10 µg (~3 pmol) of plasmid DNA as starting
material. The yield of mutant RNA, after elution from the gel, ethanol precipitation, and
chromatography on Sephadex G-50, was 183 pmol for the 329-nucleotide M2 mutant and 106 pmol
for the 291-nucleotide M1 M2 mutant. The mutant RNA was found to exhibit catalytic activity in
vitro (14), attesting to its purity and reasonable sequence homogeneity.
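The molar amplification achieved in the large-scale preparation can be read directly from these figures. The short Python sketch below simply divides the purified RNA yield by the input of plasmid DNA; the function name is an invention for the example, and the result is an overall figure that folds together template reconstruction, transcription, and recovery.

```python
def fold_amplification(rna_pmol, dna_pmol):
    """Purified RNA recovered per input DNA template (mol/mol)."""
    if dna_pmol <= 0:
        raise ValueError("dna_pmol must be positive")
    return rna_pmol / dna_pmol


if __name__ == "__main__":
    input_dna_pmol = 3.0  # approximately 3 pmol of plasmid, as stated in the text
    for name, yield_pmol in (("M2", 183.0), ("M1 M2", 106.0)):
        print(name, round(fold_amplification(yield_pmol, input_dna_pmol), 1))
```

With the approximate 3 pmol input, this gives about 61 copies of purified M2 RNA and about 35 copies of M1 M2 RNA per plasmid molecule.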
DISCUSSION
We hope that others will find our mutagenesis technique useful for the rapid preparation of
mutant RNAs.
We have been using the technique routinely for the past several months to
produce a number of mutations within the Tetrahymena IVS. In addition to deletions, we have
produced single base insertions, multiple base substitutions, and various combined insertions and
deletions (data not shown). Because the mutant RNA is usually accompanied by a significant
amount of "revertant" wild-type RNA, we prefer to include an insertion or deletion along with
any substitution to produce a discernable size difference, allowing the mutant RNA to be separated
from the wild type on a polyacrylamide gel. This would not be necessary if one used a mutator
oligo that hybridizes very efficiently to the minus strand DNA or if one is willing to tolerate a
small amount of wild type included among the mutant RNAs.
The major advantage that the mutagenesis technique has to offer is its speed and simplicity.
Producing a 3'-truncated RNA with a defined end is especially straightforward since it does not
require the use of a mutator oligo and thus is not subject to contamination by wild-type RNA. The
3' end may be fixed at any point along the gene or may extend from any point into an extraneous
sequence as determined by a terminator oligo with a dangling 5' end (14). There are established
methods for cloning a defined region of DNA (18) and for the in vitro synthesis of RNAs with
defined ends (19). The latter technique is similar to our own, except that it uses cloned single-stranded DNA (e.g. M13 phage DNA) and a "portable promoter" to define the transcription start site.
One could combine the M13 technique with our own to produce RNAs that have internal mutations.
The site of an internal mutation may lie at any point along the gene, and need not be in
proximity to a restriction site. The design of the mutator oligo must take into account three factors:
the desired mutation, the need for efficient hybridization, and the cost. If one wishes to produce a
radical alteration of the wild type, it is probably wise to design the mutator oligo with long flanking
regions so that it will be able to bind tightly to the minus-strand DNA. This, of course, will
increase the cost, but is likely to be economical in the long run. Similarly, if one plans to use two
or more mutator oligos simultaneously, each should contain long flanking regions so as to
maintain the combined efficiency of hybridization at an adequate level.
Our primary interest has been the construction of recombinant RNAs.
However, the
mutagenesis technique that we describe could also be used to generate recombinant DNAs. The
mutant RNA could be reverse transcribed to cDNA using the terminator oligo as a primer for
reverse transcriptase. Typically, the yield of full-length cDNAs is only about 20-30% relative to the
input of RNA template (20), so that the net yield of mutant DNA would be 10-15 times the
input of wild-type DNA.
Alternatively, after reconstruction of the template strand, one could
excise the minus-strand DNA using exonuclease III, and then run the polymerase chain reaction
(21,22) to amplify the mutant DNA.
The terminator oligo and the minus strand of the T7
promoter could serve as the two primers for this reaction.
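The estimate of a 10- to 15-fold net yield of mutant DNA is the product of two factors: the number of RNA copies obtained per DNA template and the fraction of RNA converted to full-length cDNA. The Python sketch below makes the arithmetic explicit. The figure of 50 RNA copies per template is an assumption, chosen because it is the value implied by combining a 20-30% cDNA yield with a 10- to 15-fold net yield; it is not stated directly in the text.

```python
def net_dna_yield(rna_per_template, cdna_fraction):
    """Mutant cDNA obtained per input DNA template (mol/mol)."""
    if not 0.0 <= cdna_fraction <= 1.0:
        raise ValueError("cdna_fraction must lie between 0 and 1")
    return rna_per_template * cdna_fraction


if __name__ == "__main__":
    assumed_rna_per_template = 50.0  # assumption implied by the 10-15x estimate
    for fraction in (0.20, 0.30):
        print(fraction, round(net_dna_yield(assumed_rna_per_template, fraction), 1))
```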
ACKNOWLEDGEMENTS
We thank K. Umesono for helpful comments and J. M. Burke for providing a diagram of
group I secondary structure. This work was supported by grants from the National Institutes of
Health (GM35755) and the Alfried Krupp von Bohlen und Halbach-Stiftung.
G.F.J. is a Merck
Fellow of the Life Sciences Research Foundation.
REFERENCES
1. Kunkel, T.A. (1985) Proc. Natl. Acad. Sci. USA 82:488-492.
2. Chamberlin, M. and Ryan, T. (1982) In Boyer, P. (ed), The Enzymes, 3rd edition, Academic Press, New York, pp.87-108.
3. Lowary, P., Sampson, J., Milligan, J., Groebe, D., and Uhlenbeck, O.C. (1986) In van Knippenberg, P.H. and Hilbers, C.W. (eds), Structure and Dynamics of RNA, Plenum Press, New York, pp.69-76.
4. Milligan, J.F., Groebe, D.R., Witherell, G.W., and Uhlenbeck, O.C. (1987) Nucl. Acids Res. 15:8783-8798.
5. Davanloo, P., Rosenberg, A.H., Dunn, J.J., and Studier, F.W. (1984) Proc. Natl. Acad. Sci. USA 81:2035-2039.
6. Butler, E.T. and Chamberlin, M.J. (1982) J. Biol. Chem. 257:5772-5778.
7. Zaug, A.J., Been, M.D., and Cech, T.R. (1986) Nature 324:429-433.
8. Young, B. and Cech, T.R., personal communication.
9. Sanger, F., Nicklen, S., and Coulson, A.R. (1977) Proc. Natl. Acad. Sci. USA 74:5463-5467.
10. Gillam, S. and Smith, M. (1979) Gene 8:81-97.
11. Kunkel, T.A. (1987) In Ausubel, F.M. et al. (eds), Current Protocols in Molecular Biology, John Wiley & Sons, New York, unit 8.1.
12. Kerr, C. and Sadowski, P.D. (1972) J. Biol. Chem. 247:305-310.
13. Nossal, N.C. (1974) J. Biol. Chem. 249:5668-5676.
14. Joyce, G.F., Van der Horst, G., and Inoue, T., manuscript in preparation.
15. Davies, R.W., Waring, R.B., Ray, J.A., Brown, T.A., and Scazzocchio, C. (1982) Nature 300:719-724.
16. Michel, F., Jacquier, A., and Dujon, B. (1982) Biochimie 64:867-881.
17. Michel, F. and Dujon, B. (1983) EMBO J. 2:33-38.
18. Chisaka, O., Iwai, S., Ohtsuka, E., and Matsubara, K. (1986) Gene 45:19-25.
19. Krupp, G. and Söll, D. (1987) FEBS Lett. 212:271-275.
20. Berger, S.L., Wallace, D.M., Puskas, R.S., and Eschenfeldt, W.H. (1983) Biochemistry 22:2365-2372.
21. Scharf, S.J., Horn, G.T., and Erlich, H.A. (1986) Science 233:1076-1078.
22. Saiki, R.K., Gelfand, D.H., Stoffel, S., Scharf, S.J., Higuchi, R., Horn, G.T., Mullis, K.B., and Erlich, H.A. (1988) Science 239:487-491.
23. Burke, J.M., Belfort, M., Cech, T.R., Davies, R.W., Schweyen, R.J., Shub, D.A., Szostak, J.W., and Tabak, H.F. (1987) Nucl. Acids Res. 15:7217-7221.
24. Inoue, T., Sullivan, F.X., and Cech, T.R. (1986) J. Mol. Biol. 189:143-165.