Volume 17 Number 2 1989 Nucleic Acids Research

Identification of a conserved sequence in the non-coding regions of many human genes

Lawrence A.Donehower*, Betty L.Slagle, Margaret Wilde¹, Gretchen Darlington¹ and Janet S.Butel

Department of Virology and ¹Department of Pathology, Baylor College of Medicine, Houston, TX 77030, USA

Received September 9, 1988; Revised and Accepted December 19, 1988. Accession no. X13001

ABSTRACT

We have analyzed a sequence of approximately 70 base pairs (bp) that shows a high degree of similarity to sequences present in the non-coding regions of a number of human and other mammalian genes. The sequence was discovered in a fragment of human genomic DNA adjacent to an integrated hepatitis B virus genome in cells derived from human hepatocellular carcinoma tissue. When one of the viral flanking sequences was compared to nucleotide sequences in GenBank, more than thirty human genes were identified that contained a similar sequence in their non-coding regions. The sequence element was usually found once or twice in a gene, either in an intron or in the 5' or 3' flanking regions. It did not share any similarities with known short interspersed nucleotide elements (SINEs) or presently known gene regulatory elements. This element was highly conserved at the same position within the corresponding human and mouse genes for myoglobin and N-myc, indicating evolutionary conservation and possible functional importance. Preliminary DNase I footprinting data suggested that the element or its adjacent sequences may bind nuclear factors to generate specific DNase I hypersensitive sites. The size, structure, and evolutionary conservation of this sequence indicate that it is distinct from other types of short interspersed repetitive elements. It is possible that the element has a cis-acting functional role in the genome.

INTRODUCTION

Virtually all mammalian genes contain various types of repetitive elements in their non-coding regions.
The two most abundant classes of interspersed repetitive sequences associated with genes are the long interspersed nucleotide elements (LINEs) and short interspersed nucleotide elements (SINEs) (1-3). Both types of elements have structural properties (flanking direct repeats and 3' A-rich regions) which indicate that they are inserted into different sites in the genome via an RNA intermediate. Consequently, these repetitive elements have been classified as retrotransposons (2). Typically, these elements show greater interspecies divergence than intraspecies divergence. For example, the dominant SINE families of humans (Alu) and mice (B1) are ancestrally related, but show significant differences both in consensus sequence and overall structure (1-3). No clear functional role has been assigned to these types of repetitive elements.

In addition to repetitive elements and non-functional DNA, the non-coding regions of genes possess regulatory elements that allow the appropriate level of expression of the encoded protein. A number of cis-acting gene regulatory elements have been characterized, and some have been shown to bind nuclear factors in a specific manner (4,5). While the majority of these regulatory elements appear to be located 5' of the coding sequences, a number of examples indicate that cis-acting sequences that regulate expression can be present in introns or in the 3' flanking region of a gene (6-10).

In this paper, we report the characterization of a sequence element which appears to be highly conserved among mammals. The sequence is about 70 bp in length and is usually present once or twice in the non-coding regions of at least thirty human genes and a number of other mammalian genes. It has none of the features typical of SINEs, such as flanking direct repeats or 3' A-rich tracts.
The evolutionary conservation of this sequence suggests a cis-acting functional role, and we propose that this element is part of a class of interspersed repetitive elements distinct from other SINEs.

MATERIALS AND METHODS

Molecular Cloning and Sequencing

Standard molecular cloning and nucleotide sequencing methods were used and are fully described in Zhou et al. (11).

Sequence Comparisons

All computer sequence analysis work was performed on the Baylor College of Medicine Molecular Biology Information Resource. The streamlined user interface EuGene, developed by Thomas Shalom, was used to search GenBank and make sequence comparisons. The GenBank search programs were developed by Charles Thomas and Dan Goldman (12,13). The default parameters (unit cost matrix, a 15-nucleotide minimum length of similarity for acceptance, and only matches with an SD value above 3.5 reported) were used for the search function (12). Optimal alignments of similar sequences were made with the programs of Altschul and Erickson (14) and Lawrence and Goldman (13). The Altschul and Erickson (14) alignments used the default parameters (Dayhoff matrix, a cost penalty of 25 for opening a gap, and a cost penalty of 0.5 for each space in a gap).

Preparation of Hepatoma Extracts

Nuclear extracts were prepared from the Hep3B and HepG2 human hepatocellular carcinoma cell lines (15). Cells were trypsinized from eight roller bottles, pelleted by low-speed centrifugation, and suspended in solution A2 (10 mM HEPES, pH 7.2, 0.15 mM spermine, 0.5 mM spermidine, 2 mM EGTA, and 2 mM DTT). The suspended cells were lysed by Dounce homogenization, and the crude nuclei were pelleted twice by low-speed centrifugation in solution A2 and resuspended in 76 ml of solution C1 (10 mM HEPES, pH 7.6, 25 mM KCl, 0.5 mM spermidine, 0.15 mM spermine, 2 mM EDTA, 0.5 mM EGTA, 2 mM DTT, 1.45 M sucrose, and 10% glycerol).
The suspension was pelleted at 25,000 rpm for 30 minutes in an SW28 rotor. The pellet was suspended in solution D (10 mM HEPES, pH 7.6, 100 mM KCl, 3 mM MgCl2, 0.1 mM EDTA, 1 mM DTT, 0.1 mM PMSF, and 10% glycerol) and then diluted to 10 A260 units/ml. One-tenth volume of 4 M ammonium sulfate was added, and the mixture was placed on ice with occasional gentle mixing. The lysate was then centrifuged at 45,000 rpm in a Beckman 50Ti rotor for 1 hour, and 0.3 g of ammonium sulfate was added to each ml of supernatant. The precipitated proteins were pelleted in the Sorvall SS34 rotor at 10,000 rpm for 20 minutes and then redissolved in 25 ml of solution E (50 mM Tris, pH 8.0, 0.1 mM EDTA, 1 mM DTT, 125 mM MgCl2, and 20% glycerol). The solution was desalted on a Pharmacia PD10 disposable G25 column and frozen as aliquots in liquid nitrogen.

DNase I Footprinting

Footprints were performed as described by Carthew et al. (16), based on the method of Galas and Schmitz (17). Briefly, 10,000 cpm of 32P-end-labelled DNA (250-bp fragment) were incubated with 10 µg of bovine serum albumin or 10 µg of the Hep3B nuclear extract in solution E for 60 minutes at 30°C. Five ng of DNase I was added to each reaction, and the mixture was incubated on ice for 3 minutes before addition of 1 µl of 0.5 M EDTA followed by 100 µl of stop buffer (1% SDS, 100 mM Tris-HCl, pH 8.0, 0.5 mg/ml proteinase K, and 50 µg/ml yeast RNA). This mixture was phenol-extracted, chloroform-extracted, and ethanol-precipitated prior to denaturation in formamide-dye buffer and loading on a 6% denaturing polyacrylamide gel for electrophoresis followed by autoradiography. Nucleotide sequence determination of the end-labelled fragments for use as markers was performed according to Maxam and Gilbert (18).
RESULTS

Identification of a Sequence Similar to Elements in Other Human Genes

The probe DNA sequence described here (Figure 1C) was isolated as part of a study on the role of hepatitis B virus (HBV) infection in the development of hepatocellular carcinoma (HCC) in humans (11). Genomic DNA derived from liver tumor tissue of an individual with chronic HBV infection was analyzed by Southern blot hybridization and found to contain two integrated copies of HBV DNA. One of the integrated HBV DNAs and its flanking cellular DNA were molecularly cloned and analyzed extensively by restriction endonuclease mapping and selective DNA sequencing (Figure 1B). A 1.0-kb human flanking DNA fragment (Figure 1B) showed no repetitive DNA sequences when hybridized to human DNA at high stringency and was mapped to the p11.2-p12 region of chromosome 17 (Figure 1A) (11).

The sequenced segments of the human flanking DNA were further analyzed by computer. None of the flanking sequences contained discernible open reading frames. A search of the GenBank data base for possible nucleotide sequence similarities was negative for all of the sequence fragments except one. This one sequence (Figure 1C) showed significant similarity to a number of GenBank entries, and its reverse complement also showed similarity to GenBank sequences, indicating that the element was present in genes in both orientations. Optimal alignment of the HBV flanking sequence (Figure 2A) with 17 GenBank human DNA entries (Figure 2B) revealed a startling degree of similarity extending about 70 nucleotides among most of the entries. No gaps had to be introduced into any of the sequences to provide a better fit. A consensus sequence was derived from these alignments (Figure 2C), in which a given nucleotide position has an identical base for 9 or more of the 18 DNAs listed. Secondary nucleotides present in at least 6 of 18 sequences are indicated below the primary consensus nucleotides.
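As an illustration only (these are not the programs used in this study), the consensus rule just described can be sketched in Python. The function name `call_consensus`, its parameter names, and the toy sequences are assumptions introduced here; the thresholds of 9 and 6 out of 18 are the ones quoted in the text.

```python
from collections import Counter


def call_consensus(seqs, primary_min=9, secondary_min=6):
    """Call primary and secondary consensus bases for ungapped, equal-length sequences.

    primary_min and secondary_min default to the thresholds quoted in the text
    for 18 compared sequences (9 of 18 and 6 of 18).
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must already be aligned to one length")

    primary, secondary = [], []
    for column in zip(*seqs):
        # Bases ranked by count, highest first; ties keep first-seen order,
        # which is an arbitrary choice made for this sketch.
        ranked = Counter(column).most_common()
        top_base, top_count = ranked[0]
        # Primary consensus base, or N when no base reaches the threshold.
        primary.append(top_base if top_count >= primary_min else "N")
        # Secondary base: the runner-up, reported only if it is frequent enough.
        if len(ranked) > 1 and ranked[1][1] >= secondary_min:
            secondary.append(ranked[1][0])
        else:
            secondary.append(" ")
    return "".join(primary), "".join(secondary)


# Toy data (hypothetical, not the sequences of Figure 2): 18 aligned 3-mers.
toy = ["ACG"] * 10 + ["ATG"] * 6 + ["GGT"] * 2
print(call_consensus(toy))
```

Because the alignments were made without gaps, each column can be tallied independently, which is why a simple per-position count is enough here.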
Similarity of greater than 75% (at least 14 of the 18 DNAs contained the consensus nucleotide) was noted at 28 positions (Figure 2C, asterisks). In general, the highest amount of sequence conservation is found near the center of the 70-bp region, with decreasing conservation moving away from this central core sequence. The calculated probability values for obtaining the consensus nucleotides at a given frequency at each position (Figure 2D) are usually very low, indicating a high degree of significance of these similarities throughout most of the aligned sequences.

The most conserved part of the consensus sequence derived in Figure 2C (the central 45-bp portion) was used to further probe the GenBank library for entries with sequence similarities. Additional sequences with significant similarity were obtained, and the sequence alignments for 35 of the most similar entries are presented (Figure 3). The 45-bp consensus (+) probe is displayed at the top of Figure 3, and the region of highest similarity (core similarity) is indicated by double underlining. In the 15-bp core similarity region, 14 of the positions have at least 75% nucleotide identity among the entries. In four positions at least 34 of the 35 entries have the same nucleotide (Figure 3, bottom). When the data from 20 consensus-like sequences in the opposite orientation (Table 2) are included in these comparisons, 55 of 55 sequences have an A at position 33, 54 of 55 sequences have an A at positions 35 and 39, and 53 of 55 sequences have a G at position 38.

Thus far, we have identified over thirty different human genes and eight genes from other mammalian species that contain a consensus-like sequence (Table 1 and Table 2). In addition, three human sequences are represented that are not associated with a particular gene. Two of these sequences are in potential origin of replication regions: the human ARS1 sequence (19) and the African green monkey SV40 origin-like sequence (20).
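The per-position probability values referred to above (Figure 2D) were calculated from a binomial distribution with fixed base frequencies, as stated in the Figure 2 legend. The following Python sketch shows one way such values could be computed. It is an illustration under assumptions: the legend does not say whether the probability of exactly k occurrences or of k or more occurrences was reported, so both are shown, and the function names are hypothetical.

```python
from math import comb

# Base frequencies quoted in the Figure 2 legend.
BASE_FREQ = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_tail(k, n, p):
    """Probability of k or more successes in n independent trials."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


n = 18  # number of sequences compared in Figure 2
for base, k in (("A", 5), ("G", 4), ("A", 14), ("G", 14)):
    p = BASE_FREQ[base]
    print(f"{base}: k={k:2d}  P(X=k)={binom_pmf(k, n, p):.2e}  "
          f"P(X>=k)={binom_tail(k, n, p):.2e}")
```

With these frequencies, the probability of exactly 5 occurrences of A or T, or exactly 4 occurrences of C or G, among 18 sequences comes out near 0.2, whereas 14 or more occurrences of the same base is several orders of magnitude less likely.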
The standard deviation (SD) value in Table 1 is a measure of the degree of similarity (12). Values of 3.0 or greater are considered to reflect possible similarity, and values above 6.0 are considered to reflect probable similarity (13). The position of the consensus-like sequence within each gene is indicated (Tables 1 and 2, third column). Many of the genes contain the sequence within an intron, although some genes have the element in either their 5' or 3' flanking regions. In only one case (human interleukin 1) is the consensus-like sequence present within a gene exon. However, in this instance, it is located within the 3' non-coding region of the gene. Interestingly, the human acetylcholine receptor (alpha subunit) gene contains the consensus-like sequence as part of a 49-bp tandem direct repeat. Finally, the number of nucleotides in the central 15-bp core similarity region (double underlined nucleotides in the Figure 3 consensus sequence) that are identical to the consensus is indicated for each entry (Tables 1 and 2, fourth column). Fourteen entries have identity with the consensus sequence in at least 14 of 15 nucleotide positions in Table 1. Similar results are shown in Table 2 (5 of 20 entries have at least 14 of 15 nucleotides which match the core consensus), which lists consensus-like sequences found in the reverse orientation (to those in Table 1) in a number of human and mammalian genes.

[Figure 1: A, human chromosome 17; B, 9.0-kb cloned DNA with the 1.0-kb probe and the sequenced regions marked; C, HBV flanking DNA sequence. Graphics not reproduced.]

Figure 1. Genetic and physical maps showing the location of the original consensus-like sequence in a human hepatocellular carcinoma (HCC). See Zhou et al. (11) for details on molecular cloning, chromosomal localization, and sequencing. (A) Pictorial representation of human chromosome 17. The hepatitis B virus (HBV) flanking sequences shown in B map to the 17p10-17p12 region of chromosome 17. (B) Restriction endonuclease map of a cloned 9.0-kb EcoRI fragment derived from HCC genomic DNA containing integrated HBV sequences and human flanking DNA. HBV DNA (middle boxed area) contains two genes (labelled "S" and "C"). Open boxes represent the pre-S gene and hatched boxes represent the gap between pre-S and C. Enzymes used in the map include EcoRI (E), BglII (B), HindIII (H), XhoI (X), and XbaI (Xb). The 1.0-kb BglII-EcoRI fragment, used for chromosome mapping, is represented by a closed box at the right of the map. Regions that have been sequenced are indicated by lines below the map. (C) The sequence containing the consensus-like element is shown.

Two of the human genes listed in Table 1, human myoglobin and human N-myc, have mouse counterpart genes that are also reported. To test for relative evolutionary conservation of the consensus-like sequence within the human and mouse myoglobin genes, we performed a homology matrix comparison between the second intron (and the adjacent exons) of each gene (Figure 4). Only matches over 10 nucleotides with a standard deviation value above 3.0 are shown. As expected, the mouse and human exons 2 and 3 are highly homologous, while intron 2 contains only a few scattered regions of homology. Two of these homologous stretches correspond exactly to the match (SD = 5.4, 3.3) between the consensus-like sequences in the human and mouse intron 2. The homology between the consensus-like elements does not extend on either side. There are only two other regions of homology in intron 2, near the 3' portion. Human N-myc and mouse N-myc also displayed homology between their consensus-like sequences (data not shown).

[Figure 2: A, HBV flanking DNA sequence; B, DNAs compared; C, consensus sequence; D, distribution of nucleotides at each position. Aligned sequences not reproduced.]

Figure 2. Identification of a consensus sequence by comparison of the HBV flanking sequence with similar sequences located in other human genes. (A) The HBV flanking DNA sequence (80 bp) shown in Figure 1C is presented here. (B) Human genes or DNA elements that have similarity with the HBV flanking sequence shown in A, determined by searching the GenBank data base. The DNAs are optimally aligned under the criteria of not allowing gaps or deletions. A dash (-) indicates a nucleotide identical to that of the primary consensus sequence shown in C. (C) Consensus sequence. Comparison of the 18 DNAs listed in part B generated a consensus nucleotide if at least 9 of 18 sequences had an identical nucleotide at a given position. A position with 14 or more identical nucleotides (of the possible 18) is noted by an asterisk above the nucleotide. Secondary nucleotides are shown below the primary consensus nucleotides at positions which had 6 identical secondary nucleotides. N = any nucleotide. (D) Distribution of nucleotides at each position for the 18 compared sequences. Numbers representing the consensus nucleotides are underlined. Consensus nucleotides are indicated at each position under CON. The probability of obtaining the primary consensus nucleotide is shown under the column designated P(1). The probability of obtaining the primary and secondary nucleotides at a position is indicated under the P(1 + 2) column. Probabilities were calculated using a binomial probability distribution, with the following values for the human genomic nucleotide frequencies: A = 0.3, C = 0.2, G = 0.2, T = 0.3. The most likely probability value (4 or 5 occurrences of a given nucleotide at a position) is about 2 × 10⁻¹.

[Figure 3: alignment of the 45-nucleotide consensus with 35 GenBank sequences, listed in the same order as in Table 1, with the number of matching entries at each position shown at the bottom. Aligned sequences not reproduced.]

Figure 3. Sequences from GenBank with significant similarity to the consensus sequence. The central 45 nucleotides (nucleotides 14-58) of the consensus sequence derived in Figure 2C are shown at the top. This sequence was used to search GenBank for other similar sequences. Thirty-five DNAs that revealed a significant degree of similarity are aligned with the consensus sequence. (Each DNA is 60 bp in length with the 45-bp region of similarity in the middle.) The gene or DNA element from which each listed sequence was obtained is identified at the left. The number of entries that shared identity with the consensus nucleotide at each position is indicated at the bottom. Positions that show a frequent secondary nucleotide are indicated below the consensus nucleotide. The 15 central nucleotides that exhibit the highest degree of similarity (core similarity) are emphasized in the consensus sequence by double underlining.

Binding of Nuclear Factors to the Consensus-Like Sequence

If the consensus-like sequence plays a functional role in the cell, then it might be expected to specifically bind trans-acting nuclear factors as do a number of other cis-acting regulatory elements.
To explore this possibility, we incubated a small DNA fragment (a 250-bp HindIII-EcoRI fragment from the 1.0-kb probe DNA) containing the consensus-like sequence with a nuclear extract prepared from Hep3B hepatoma cells and subjected the reaction to a DNase I footprinting assay. Evidence of specific protein binding may be more readily observable at the higher resolution afforded by this particular procedure. The DNA-binding reactions were performed in an excess of the simple alternating copolymer duplex poly(dI-dC) with a constant amount of 32P-labelled 250-bp fragment and 10 µg of Hep3B extract. Both strands of the 250-bp fragment exhibited DNase I hypersensitive cleavage sites in the presence of the Hep3B extract (Figure 5A). These sites were not apparent when the fragment was incubated with DNase I in the presence of bovine serum albumin. Interestingly, the strongest hypersensitive sites on each strand appear to be at the same position and were within the consensus-like sequence. These strong hypersensitive bands were consistently observed in three separate DNase I footprinting experiments as well as with an extract derived from the HepG2 hepatoma cell line (data not shown). Other less hypersensitive sites were present on one strand at the other end of the consensus-like sequence. No obvious protected regions were evident (Figure 5A), although three possible regions of decreased DNase I cleavage appeared, all outside of the consensus-like sequence. The relationship of the DNase I hypersensitive sites to the DNA sequence is illustrated (Figure 5B).

Table 1. Characteristics of 35 (+) consensus-like sequences

No. Gene or DNA sequence                  SD (a) Location              Core similarity (b)
 1  human adenosine deaminase             9.7    intron 1              14/15
 2  human protein C                       9.6    5' flanking           15/15
 3  human beta-tubulin                    9.3    intron 3              12/15
 4  human alpha-1-antitrypsin             9.1    5' flanking           12/15
 5  human haptoglobin (alpha-2)           8.8    intron 5              13/15
 6  human myoglobin                       8.7    intron 2              14/15
 7  X chromosome sequences (near DMD)     8.4    unknown               13/15
 8  human carbonic anhydrase II           8.3    intron 1              14/15
 9  African green monkey SV40 ori-like    8.1    unknown               13/15
10  human c-sis                           7.5    5' flanking           13/15
11  human haptoglobin-related             7.1    intron 5              13/15
12  human N-myc                           7.0    intron 2              15/15
13  human factor IX                       6.7    intron 3              11/15
14  human prothrombin                     6.7    intron 1              12/15
15  human acetylcholine receptor (alpha)  6.6    3' flanking (rpt) (c) 11/15
16  human apolipoprotein A-1              6.6    3' flanking           14/15
17  human alpha-fetoprotein               6.5    intron 3              13/15
18  human serum prealbumin                6.5    intron 3              13/15
19  mouse myoglobin                       6.5    intron 2              14/15
20  human apolipoprotein C-III            6.3    intron 2              13/15
21  rabbit poly Ig receptor               6.3    3' flanking           13/15
22  human fibrinogen                      6.3    intron 6              13/15
23  human interferon beta-3               6.0    unknown               12/15
24  human immune interferon gamma         5.9    intron 3              11/15
25  human opsin                           5.8    intron 1              11/15
26  bovine acetylcholine receptor (alpha) 5.8    3' flanking           11/15
27  human ARS1 sequence                   5.6    rep enhancer          13/15
28  human enkephalin B                    5.4    intron 3              11/15
29  rat thyrotropin beta subunit          5.0    intron 1              12/15
30  human protein C gene                  4.8    intron 3              14/15
31  human interleukin 1                   4.6    exon 7 (3' UT) (d)    11/15
32  bovine pancreatic trypsin inhibitor   4.2    3' flanking           11/15
33  mouse N-myc                           4.2    intron 2              12/15
34  human T-cell receptor beta chain      4.2    D-J region            14/15
35  mouse glial fibrillary acid protein   3.6    intron                14/15

(a) SD measures the degree of similarity to the consensus sequence. See text for additional details.
(b) Core similarity represents the number of nucleotides in each consensus-like sequence (first number) which are identical to the most highly conserved central 15 nucleotides of the consensus sequence (Figure 3, double underlined nucleotides).
(c) The consensus-like sequences of the alpha subunit gene are within a 49-bp tandem direct repeat.
(d) The consensus-like sequence is within the 3' untranslated region.

Table 2. Characteristics of 20 (-) consensus-like sequences (a)

                                                               Similarity
No. Gene                                SD     Location      Core   Total
 1  human interleukin 2                 10.4   5' flanking   13/15  39/45
 2  human c-sis                         9.9    5' flanking   14/15  37/45
 3  bovine acetylcholine receptor (b)   9.1    3' flanking   14/15  37/45
 4  human beta crystallin               8.8    intron 4      13/15  37/45
 5  human myoglobin                     8.7    intron 1      13/15  36/45
 6  human adenosine deaminase           8.4    5' flanking   14/15  34/45
 7  human protein C                     8.2    5' flanking   12/15  36/45
 8  human prothrombin                   8.0    intron 12     12/15  36/45
 9  human alpha-fetoprotein             7.6    intron 3      12/15  34/45
10  human dystrophin                    7.4    intron 7      13/15  34/45
11  human alkaline phosphatase          7.3    intron 9      13/15  29/45
12  human apolipoprotein C-III          7.2    intron 3      13/15  35/45
13  human epsilon globin                6.9    5' flanking   13/15  32/45
14  human aldolase B gene               6.7    5' flanking   14/15  35/45
15  human prolactin gene                6.7    5' flanking   12/15  33/45
16  human tissue plasminogen act.       6.3    intron 4      15/15  29/45
17  human interleukin 1                 6.2    intron 4      11/15  34/45
18  human myoglobin                     5.9    intron 2      13/15  33/45
19  human opsin                         5.9    3' flanking   11/15  31/45
20  mouse myoglobin                     4.2    intron 2      12/15  30/45

(a) These consensus-like elements are in the reverse orientation of those shown in Table 1 and are homologous to the opposite strand of the 45-nucleotide consensus sequence at the top of Figure 3. The (-) strand consensus probe is: 5'-TACTTAACCTCTCTGAGCCTCAGTTTCCTCATCTGTAAAATGGGG-3'.
(b) Beta subunit.

DISCUSSION

As a result of sequence analysis of human DNA associated with an integrated HBV genome in a hepatoma, we have discovered a sequence element that appears to be highly conserved in a number of human and other mammalian genes. The sequence itself displays some interesting features, including purine-rich tracts alternating with pyrimidine-rich regions. The 5' region of some of the consensus-like sequences has the potential to form stem-loop structures, but the significance of these putative secondary structures is unclear.
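Because the element occurs in both orientations (Tables 1 and 2), a search has to consider a probe and its reverse complement. The Python sketch below illustrates this with a simple ungapped scan that counts identities; it is not the GenBank search program used in this study and does not compute SD values. The 45-nucleotide (+) sequence is a reading of the consensus at the top of Figure 3 and should be treated as an assumption, as should the helper names and the toy target sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Assumed reading of the 45-nucleotide (+) consensus probe (Figure 3, top).
PLUS_PROBE = "CCCCATTTTACAGATGAGGAAACTGAGGCTCAGAGAGGTTAAGTA"


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def identities(a, b):
    """Count positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b))


def best_ungapped_hit(probe, target):
    """Slide the probe and its reverse complement along the target.

    Returns (matches, offset, strand) for the best ungapped placement;
    offset is -1 if the target is shorter than the probe.
    """
    best = (0, -1, "+")
    for strand, query in (("+", probe), ("-", reverse_complement(probe))):
        for offset in range(len(target) - len(query) + 1):
            score = identities(query, target[offset:offset + len(query)])
            if score > best[0]:
                best = (score, offset, strand)
    return best


MINUS_PROBE = reverse_complement(PLUS_PROBE)
print(MINUS_PROBE)

# Toy target (hypothetical): the (-) orientation embedded in arbitrary flanks.
target = "GATTACA" + MINUS_PROBE + "TTGACC"
print(best_ungapped_hit(PLUS_PROBE, target))
```

Scanning both strands with one probe avoids maintaining two separate probe sequences and reports the orientation of each hit directly.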
The consensus-like sequences do not appear to have similarity to any previously described human repetitive DNA element and do not share the properties of a short interspersed nucleotide element (SINE), typified by the human Alu repeats. The consensus-like sequence observed in human genes does not contain an A-rich tract at its 3' end nor is it bordered by direct repeats, features characteristic of SINEs (1-3). In most cases, the consensus-like sequence is present once or twice per gene, whereas dozens of SINEs can be located within a single gene (1-3), indicating significantly lower copy numbers of the consensus-like sequence. However, it is possible that our screen underestimated the number of consensus-like sequences, since the more divergent members of this family might not be picked up by our computer searches. The consensus-like sequences are conserved among a variety of mammalian species, whereas rodent and human SINEs, although ancestrally related, have significant differences in overall structure and primary sequence (3).

[Figure 4: homology matrix with mouse myoglobin on the top axis and human myoglobin on the left axis. Graphic not reproduced.]

Figure 4. Homology matrix comparison between human and mouse myoglobin in the intron 2 (and adjacent exons) sequences. Mouse myoglobin sequences (top) are matched with human myoglobin sequences (left). Sequences greater than 10 nucleotides in length which show homology at an SD value of 3.0 will generate a diagonal line in this matrix display. The distance between each vertical and horizontal line in the matrix is equal to 100 nucleotides. The consensus-like sequences in each intron are represented by a closed box with an arrow. Note that the consensus-like sequences are conserved between the human and mouse genes.
Whereas most of the genes containing the consensus-like sequences are human (Tables 1 and 2), a significant number of genes from rodent and other mammalian species contain this repeat. The predominance of human genes in Tables 1 and 2 may be due to the greater number of human genes sequenced (including intron and flanking sequences). Alternatively, the non-human consensus-like sequences may be somewhat diverged from their human counterparts. To test this possibility, we derived a consensus sequence from six consensus-like elements in rat and mouse genes. This rodent consensus sequence was identical to the 15-bp core region and nearly identical to the entire 45 nucleotides of the consensus sequence in Figure 3. This test is consistent with the results of the homology matrix comparison of Figure 4, which shows that the consensus-like sequences in intron 2 of the mouse and human myoglobin genes are conserved, whereas the adjacent non-coding sequences share no homology. These data suggest that the human and mouse consensus-like elements have been conserved over some 60-85 million years of evolution. Non-mammalian vertebrate genes in GenBank contained no significant sequence similarities to the consensus sequence.

[Figure 5: A, autoradiograph of the DNase I footprinting gel, lanes 1-11; B, sequence of both strands around the consensus-like element with the hypersensitive sites marked. Graphics not reproduced.]

Figure 5. DNase I footprint analysis of a 250-bp HBV flanking DNA fragment containing the consensus-like sequence. The fragment tested was from the 3' end of the 1.0-kb fragment indicated in Figure 1B. DNase I footprinting was performed as described in the Materials and Methods section. (A) Lanes 1-5 contain the negative strand end-labelled at the EcoRI site. Lanes 6-11 contain the positive strand end-labelled at the HindIII site. Maxam and Gilbert sequencing reactions for C+T (lanes 1 and 6), G+A (lanes 2 and 7), and A+C (lane 8) are used as markers. Untreated control DNA fragments are in lanes 3 and 9. Control reactions in which the DNA fragments were incubated with bovine serum albumin before DNase I treatment are shown in lanes 4 and 10. Lanes 5 and 11 contain fragments incubated with Hep3B nuclear extract prior to DNase I treatment. Hypersensitive cleavage sites by DNase I in the presence of Hep3B extracts are indicated by arrowheads. The dominant hypersensitive site on each strand is shown by a large arrowhead. The regions of the fragments containing the consensus-like sequence are designated by vertical lines next to the marker lanes. (B) Summary of footprinting assay in the vicinity of the consensus-like sequence on both strands. The same DNase I hypersensitive cleavage sites are indicated by arrowheads, as in panel A. Underlined nucleotides represent the 45-bp consensus region. Double underlined nucleotides represent the 15 bp of core similarity.

The differences in structure and evolutionary conservation between the consensus-like repeats and the typical SINEs indicate that the former should be placed in a separate class of repetitive elements. Other investigators have reported non-Alu types of short interspersed repeats (21-23). Some of these are apparently retrotransposons and are inserted (22). Another group of interspersed repeats is composed of tandem repetitions of short oligonucleotides, which are highly conserved (3,24,25). The consensus-like sequences reported here show no evidence of being inserted elements and are not repeated oligonucleotides but are highly conserved. We therefore propose that the consensus-like sequences form a different class of interspersed repetitive elements.

One possible explanation for the evolutionary conservation of the consensus-like repeat is that these sequences play a functional role in the cell. The presence of a highly conserved core sequence is consistent with the idea that the sequence binds a nuclear factor or factors.
Comparison of the consensus sequence to a list of sequences associated with nuclear factor binding (5) did not uncover any similarities except possibly in one case. A portion of the HBV genome to which liver-specific factors bind (26) showed identity at 11 of 13 positions with nucleotides 15-27 (Figure 2C) of the consensus sequence. This particular sequence is directly adjacent to the dominant DNase I hypersensitive sites in the consensus-like sequence of the HBV flanking DNA (Figure 5B).

In two instances, the consensus-like element is associated with sequences that display potential origin of replication function. In one example, it is adjacent to a sequence in the African green monkey genome which is similar to the SV40 origin of replication (20). In the other case, the consensus-like sequence is part of the human ARS1 DNA (19). ARS1 is a sequence of human DNA that allows replication of yeast integrative plasmids as autonomously replicating elements in yeast cells (19). Within ARS1 is a 325-bp segment that is necessary for maximal expression of the autonomous replication phenotype, analogous to an enhancer. Interestingly, the consensus-like sequence is positioned in the center of this 325-bp segment having replication enhancer activity. It is not known, however, whether the consensus-like sequence is part of the active portion of the replicative enhancer. It should also be noted that another cloned human sequence, ARS2, which also allows autonomous replication in yeast, does not appear to contain the consensus-like sequence.

The DNase I footprinting experiments suggested that nuclear factors in extracts derived from two different liver cell lines bound the DNA fragment containing the consensus-like element and altered its susceptibility to DNase I. Liver cell nuclear extracts were employed because the original HBV flanking sequence was isolated from hepatoma tissue and a number of the genes containing the consensus-like sequence are liver-specific.
The footprinting results suggest at least a specific DNA conformational change, since DNase I hypersensitive sites were consistently noted in the same position of the 250-bp fragment. Whether specific binding of discrete factors to the DNA occurred is somewhat unclear, although preliminary gel retardation binding assays suggested specific factor binding (data not shown). Interestingly, the observed DNase I hypersensitive sites were localized to both ends of the consensus-like sequence. This phenomenon of DNase I hypersensitivity has frequently been observed at nucleotide positions adjacent to sites of binding by transcriptional factors (27,28). There was some evidence of DNase I protection of specific domains of the fragment, but the putative protected regions were outside of the consensus-like sequences. Further experiments must be performed to clarify whether there is a specific interaction of nuclear factors with the consensus-like sequence. The primary question is whether this well conserved sequence has a function in gene expression, DNA replication, or some other cellular event.

ACKNOWLEDGEMENTS

We thank Charlie Lawrence, Sandy Honda, Debbie Wilson and Jim Kelly for helpful discussions and Joyce Evans for help with the manuscript. This work was supported in part by Public Health Service grant CA37257 from the National Cancer Institute and by National Research Service Award CA09197.

strand. This turned out to be the case. As a result, one is able to incorporate mutagenic oligonucleotides into the coding strand and transcribe the partially mismatched template directly. We have used this technique to produce RNAs containing defined insertions, deletions, or substitutions of virtually any length. The mutagenesis procedure is performed in a single reaction vessel, beginning with 10-20 µg (~3-6 pmol) of plasmid DNA and ending with 100-200 pmol of a purified mutant RNA that is typically 300-500 nucleotides in length.
The transcription products are loaded directly onto a polyacrylamide gel and are purified by electrophoresis and subsequent column chromatography. Since the mutant RNAs are distinguished from wild type by their electrophoretic mobility, the technique is best suited for mutations that result in a discernible size difference between mutant and wild type or that involve the use of a mutagenic oligonucleotide that hybridizes very efficiently to the target DNA. The technique does not depend on the presence of a convenient restriction site within the target gene, and, except for the T7 promoter and a restriction site located somewhere downstream from the gene, does not place any limitations on the design of the plasmid DNA.

MATERIALS AND METHODS

Nucleotides and Enzymes

Unlabeled nucleoside triphosphates, deoxynucleoside triphosphates, and dideoxynucleoside triphosphates were purchased from Sigma. [α-32P]GTP and [γ-32P]ATP were from ICN Radiochemicals and [3H]UTP was from New England Nuclear. Synthetic oligodeoxynucleotides were obtained from Operon Technologies and were purified by polyacrylamide gel electrophoresis and chromatography on Sephadex G-10. Restriction enzymes were from New England Biolabs; T4 polynucleotide kinase, T7 gene 6 exonuclease, T4 DNA polymerase, and T4 DNA ligase were from U.S. Biochemical; and AMV reverse transcriptase was from Life Sciences. T7 RNA polymerase was prepared as previously described (5), and purified according to a procedure originally developed for SP6 RNA polymerase (6). Plasmid pTTlA3, which contains a 533 base-pair fragment of Tetrahymena rDNA (7), and pTL-45, which contains a 5'-truncated 399 base-pair fragment of the same gene (8), were provided by T.R. Cech.

Preparation of Mutant RNAs

In a typical preparation, 10-20 µg of pTL-45 DNA was cleaved at a HindIII restriction site that lies immediately downstream from the gene for Tetrahymena rRNA.
The cleaved DNA was added to a 100 µl volume containing 50 mM Tris (pH 8.1), 20 mM KCl, 5 mM MgCl2, 1 mM dithiothreitol, and 50 U T7 gene 6 exonuclease, which was incubated at 37° C for 30 min. The exonuclease was removed by three phenol extractions and the DNA was purified by ethanol precipitation. Two oligodeoxynucleotides were then hybridized to the single-stranded (minus strand) DNA, one oligonucleotide forming a perfect duplex at the 3' end of the target gene and the other forming a partial duplex that introduces the desired mutation. Annealing was performed in a 300 µl volume containing 20 mM Tris (pH 7.5), 50 mM NaCl, 2 mM MgCl2, and a 5-fold molar excess of the two oligonucleotides, which was incubated at 70° C for 5 min and then steadily cooled to 30° C over 40 min. Synthesis of the mutant strand was completed by adding 40 U of T4 DNA ligase and 15 U of T4 DNA polymerase, and incubating at 37° C for 60 min in the presence of 20 mM Tris (pH 7.5), 50 mM NaCl, 5 mM MgCl2, 2 mM dithiothreitol, 1 mM ATP, and 0.3 mM (each) dNTPs. The resulting DNA was purified by ethanol precipitation, and then used to direct the transcription of mutant RNA.

Transcription took place either in a 10 µl volume containing 1 µg of mutant DNA, 2 µCi [α-32P]GTP, and 50 U T7 RNA polymerase or in a 400 µl volume containing 10 µg of mutant DNA, 40 µCi [3H]UTP, and 2,400 U T7 RNA polymerase. In either case, the transcription mixture also contained 40 mM Tris (pH 7.5), 15 mM MgCl2, 10 mM dithiothreitol, 2 mM spermidine, and 1 mM (each) NTPs, and was incubated at 37° C for 90 min. T7 RNA polymerase was extracted with phenol and the transcription products were purified by ethanol precipitation. The mutant RNA was isolated by electrophoresis in a 5% polyacrylamide / 8 M urea gel, eluted from the gel, and purified by ethanol precipitation and chromatography on Sephadex G-50.
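The quantities in this protocol are linked by simple arithmetic, and the following sketch makes that explicit. It rests on assumptions that are not in the text. The average mass of 660 g/mol per base pair is a common approximation. The plasmid length of 5,050 bp is hypothetical, chosen only because it reproduces the stated equivalence of 10 µg with about 3 pmol. All function names are illustrative.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair; common approximation, not from the paper
PLASMID_BP = 5050    # hypothetical plasmid length; the paper does not state it


def dsdna_pmol(mass_ug, length_bp):
    """Convert a mass of double-stranded DNA (micrograms) to picomoles of molecules."""
    return mass_ug * 1e6 / (length_bp * AVG_BP_MASS)


def oligo_pmol(template_pmol, fold_excess=5.0):
    """Picomoles of each oligonucleotide needed for a given molar excess over template."""
    return template_pmol * fold_excess


def cooling_rate(start_c, end_c, minutes):
    """Average cooling rate in degrees C per minute for a linear ramp."""
    return (start_c - end_c) / minutes


template = dsdna_pmol(10, PLASMID_BP)
print(f"10 ug of a {PLASMID_BP}-bp plasmid is about {template:.1f} pmol")
print(f"5-fold excess: about {oligo_pmol(template):.0f} pmol of each oligo")
print(f"annealing ramp: {cooling_rate(70, 30, 40):.1f} C per min")
```

Under these assumptions, a 5-fold molar excess over 3-6 pmol of template corresponds to roughly 15-30 pmol of each oligonucleotide, and cooling from 70° C to 30° C over 40 min averages 1° C per minute.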
Sequencing of Mutant RNAs

The mutant RNAs were sequenced by primer extension analysis using reverse transcriptase in the presence of dideoxynucleotides (9). 1.0 pmol of [5'-32P]-labeled synthetic DNA primer was annealed with 0.3 pmol of mutant RNA by incubating at 65° C for 5 min and then cooling to 30° C over 5 min. The primer-extended cDNA products were analyzed on a 10% polyacrylamide / 8 M urea sequencing gel.

RESULTS

Development of the Mutagenesis Procedure

The most widely used technique for site-directed mutagenesis involves hybridization of an oligodeoxynucleotide to single-stranded DNA, forming a partial duplex structure that contains a region of base mismatch. The oligomer strand is extended using a DNA-dependent DNA polymerase, and the resulting double-stranded DNA is used to transform bacterial cells (10,11). This technique is useful for producing a specific mutation at a defined location. However, it is awkward when one wishes to perform wholesale mutagenesis without taking the time to construct clones and harvest DNA from bacterial cells.

Figure 1: Outline of the mutagenesis procedure, beginning with plasmid DNA and ending with mutant RNA. The mutator oligo (M) directs an insertion, deletion, or substitution, as indicated by hatched lines within its central portion. Figure labels: P, T7 promoter; M, mutator oligo; T, terminator oligo; restriction site; plasmid sequence. Steps shown: hybridize mutator and terminator oligos; T4 DNA polymerase and DNA ligase; T7 RNA polymerase; RNA.

Introduction of the mutant DNA into a bacterial host serves two useful purposes. First, the mutation becomes fixed as a result of bacterial repair processes that resolve the region of base mismatch. Second, the mutant DNA becomes amplified as a consequence of bacterial growth, so that one can obtain an essentially unlimited supply of pure mutant DNA. Often, however, one only needs enough material to sequence the mutant and to conduct a simple assay to examine
its functional consequences. In such instances, the time required to prepare the mutant becomes a critical factor.

We have found that the two useful aspects of bacterial transformation, fixation of the desired mutation and amplification of the mutant DNA within the bacterial host, can be met in an entirely in vitro reaction system that makes use of T7 RNA polymerase. This enzyme is able to transcribe partially mismatched DNA, reading the template strand while ignoring the non-coding strand, and in doing so generates several hundred copies of RNA transcript per copy of DNA template. We have exploited these properties in order to develop a "mini-prep" method for the rapid production of mutant RNA. The method involves excising the coding strand of wild-type DNA and replacing it with a new strand that contains the desired mutation. The resulting partial duplex structure is then used to direct the transcription of mutant RNA.

In the most general form of the technique, plasmid DNA, containing a T7 promoter and the gene of interest, is cleaved at a site that lies downstream from the target gene (Fig. 1). The restriction site need not lie immediately downstream from the target gene; one can choose any unique restriction site that lies within a few hundred base pairs of the end of the gene. The cleaved plasmid is partially digested using a 5'->3' exonuclease to produce a stretch of single-stranded (minus strand) DNA. We prefer to use gene 6 exonuclease of T7 phage because of its distributive properties and because of its marked preference for duplex DNA (12). One can easily control the extent of the digestion to ensure complete removal of the coding strand of the gene as well as the plus strand of the adjacent T7 promoter. Disruption of the promoter region provides an internal selection mechanism since incomplete reconstructs will not obtain a functional promoter and will be inert in the subsequent transcription reaction.
T7 gene 6 exonuclease operates inefficiently at termini that have a 5' overhang. When using a restriction enzyme that leaves a 5' overhang, we found it necessary to increase the amount of exonuclease from 50 U to 100 U in order to ensure adequate digestion of the coding strand. Removal of a 5' overhang may, to some extent, be dependent on the sequence of the overhanging bases, so that somewhat different amounts of exonuclease may be required in certain cases.

After digestion of the coding strand, the exonuclease is removed by phenol extraction, and the DNA is purified by ethanol precipitation. Two oligodeoxynucleotides are then hybridized to the segment of single-stranded (minus strand) DNA. One, which we refer to as the "terminator oligo", forms a perfect duplex at a chosen location near the 3' end of the target gene. The other, which we refer to as the "mutator oligo", forms a partial duplex at a site of interest within the gene. The mutator oligo is designed such that it contains a central region of base mismatch flanked by two regions that form a perfect duplex. The mismatched region may be shorter or longer than the original complementary DNA, and may consist of a defined sequence or a mixture of random sequences. As in all oligonucleotide-directed mutagenesis techniques, the mutator oligo should be designed such that it can form a stable partial duplex structure at the desired location. The mutator oligo must be phosphorylated at its 5' end so that it can serve as a donor substrate for DNA ligase.

The two oligos are extended using T4 DNA polymerase and are ligated to form a template for transcription of the mutant RNA. T4 DNA polymerase is used because, unlike most other DNA-dependent DNA polymerases, it does not have strand displacement activity (13). We tested the Klenow fragment of E. coli DNA polymerase I in this reaction and obtained very unsatisfactory results.
We usually begin the reaction by incubating at 25° C for 5 min to give the polymerase a chance to extend the two oligos under conditions that enhance duplex stability. The reaction is completed by incubation at 37° C for 60 min, and the DNA is purified by ethanol precipitation. The precipitation step is not absolutely necessary, but tends to increase the yield in the subsequent transcription reaction.

Transcription is performed under conditions similar to those described by Milligan et al. (4), using a large amount of T7 RNA polymerase and high concentrations of MgCl2 and the four NTPs. After phenol extraction and ethanol precipitation, the transcription products are loaded onto a polyacrylamide gel and the mutant RNA is isolated electrophoretically. Depending on how efficiently the mutator oligo hybridizes to the target DNA, there may be an appreciable amount of wild-type RNA included among the transcription products (for example, see below). For this reason, the mutagenesis technique is best suited for insertion or deletion mutations that result in a discernible size difference on the gel. In some cases, hybridization of the mutator oligo is very efficient and the amount of "revertant" wild-type RNA is negligible (again, see below). This is more likely to occur when the 5' portion of the mutator oligo forms a long stretch of stable duplex structure with the minus strand DNA. However, hybridization of the mutator oligo may also depend on features of secondary structure that are not possible to predict.

Application of the Mutagenesis Technique

We have applied the above-described mutagenesis technique to the study of a self-splicing group I intron. Working with the intervening sequence (IVS) of Tetrahymena pre-rRNA, we wished to produce sizeable internal deletions within the non-conserved portions of the molecule. In the present paper we focus on the mutagenesis technique itself, and present our data as an example of how the technique can be applied.
In a subsequent paper (14) we will detail the effect that these and other internal deletions have on the catalytic activity of the Tetrahymena ribozyme.

We made two internal deletions within the Tetrahymena IVS (Fig. 2). M1 is a 38-nucleotide deletion from position 56 through 93 that removes structural elements P2.1 and L2.1. M2 is a 69-nucleotide deletion from position 127 through 195 that removes structural elements P5a, P5b, L5b, P5c, and L5c. The location of these deletions was chosen based on known features of group I secondary structure (15-17). The mutator oligos were hybridized to the minus strand DNA by flanking regions consisting of 11-14 complementary residues. Two terminator oligos were used, each containing 15 nucleotides. T1 hybridizes at positions 305 through 319 of the IVS and T2 hybridizes at positions +8 through +22 of the 3' exon. The two terminator oligos were used either alone or in combination with one or both of the mutator oligos. All eight combinations were tested using pTL-45 DNA, which contains a portion of the Tetrahymena IVS (beginning at position 45) and 29 nucleotides of the 3' exon, inserted 6 nucleotides downstream from a T7 promoter (8). Figure 3 shows the direct transcription products that were obtained.

Figure 2. Sequence and secondary structure of the Tetrahymena IVS, showing the location of the two terminator and two mutator oligos. The RNA is truncated at its 5' end, corresponding to the direct transcription product of pTL-45 DNA. Structural elements within the IVS are labeled according to the standard nomenclature for group I introns (23). A portion of the 3' exon is shown in lower case letters. The location of the 3' terminus produced by T1 and T2 and the site of internal deletion produced by M1 and M2 are indicated using a heavy diagonal line. The extent of hybridization by the terminator and mutator oligos is indicated by a heavy bracketed line.
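The transcript sizes expected for the eight oligo combinations follow directly from the deletion coordinates given above. The sketch below is a bookkeeping aid only. It assumes that the full-length sizes for the two terminator oligos are the 398 and 281 nucleotide values among the sizes listed with Figure 3. The dictionary and function names are illustrative.

```python
# Deletion coordinates (first and last deleted IVS position, inclusive), from the text.
DELETIONS = {"M1": (56, 93), "M2": (127, 195)}

# Full-length transcript sizes for each terminator oligo, read from the Figure 3
# size labels; assigning 398 to T2 and 281 to T1 is an interpretation.
FULL_LENGTH = {"T2": 398, "T1": 281}


def deletion_length(first, last):
    """Number of nucleotides removed by an inclusive deletion."""
    return last - first + 1


def expected_size(terminator, mutators=()):
    """Expected transcript length for one terminator plus zero or more mutator oligos."""
    removed = sum(deletion_length(*DELETIONS[m]) for m in mutators)
    return FULL_LENGTH[terminator] - removed


for terminator in ("T2", "T1"):
    for combo in ((), ("M1",), ("M2",), ("M1", "M2")):
        label = " ".join(combo) if combo else "none"
        print(terminator, label, expected_size(terminator, combo))
```

With these inputs the sketch gives 398, 360, 329, and 291 nucleotides for T2 and 281, 243, 212, and 174 nucleotides for T1, which correspond to the sizes listed with Figure 3.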
Comparable results were achieved using a different plasmid that contains the entire Tetrahymena IVS, although in that case the autoradiogram was much more complicated due to the catalytic activity of the precursor rRNA in the transcription buffer (data not shown).

Figure 3: Autoradiogram of wild-type and mutant RNAs obtained by transcription in the presence of [α-32P]GTP. Lane groups: T2 (in 3' exon at +22) and T1 (in IVS at 319), each with no mutator oligo, M1, M2, or M1 M2. Sizes indicated at the right: 405, 398, 360, 329, 291, 281, 243, 212, and 174 nucleotides. The bands marked by an arrow correspond to the expected transcription product, the size of which is indicated at the right. Bands marked by a dot correspond to materials derived from the expected transcription product as a result of RNA-catalyzed cleavage at the 3' splice site that occurs during the transcription reaction (24). Unmarked bands correspond to wild-type RNA and its cleavage products that appear as a result of inefficient hybridization of the mutator oligo. wt is the transcription product obtained from intact pTL-45 DNA that has been cut with HindIII. g6 is the transcription product obtained after digestion of the wild-type DNA with gene 6 exonuclease. The products were separated by electrophoresis in a 5% polyacrylamide / 8 M urea gel run in 90 mM Tris/borate buffer.

Digestion of the coding strand with gene 6 exonuclease is essentially complete. Only a trace (< 1%) of the wild type is detected when DNA that has been treated with gene 6 exonuclease is transcribed directly. Hybridizing either T1 or T2 to the minus strand DNA and then extending with T4 DNA polymerase allows one to produce 3'-truncated RNAs with a defined end. Hybridizing one or two mutator oligos in addition to the terminator oligo allows one to control internal positions as well as the 3' terminus. The data presented in Figure 3 indicate that M2 does not hybridize as efficiently to the minus strand DNA as does M1.
This is evidenced by material in the M2 lanes corresponding to transcripts whose 3' end is defined by the terminator oligo but whose internal positions are unchanged from the wild type. The M1 M2 double deletion mutant is accompanied by a smaller amount of the M1 single deletion mutant. For the most part, however, the desired single or double deletion mutant dominates the family of transcription products.

Figure 4: Sequence analysis of the M2 and M1 M2 mutants by the primer extension method. A deoxynucleotide complementary to positions 274-288 of the IVS was hybridized to the mutant RNA and extended using reverse transcriptase. Lane -, primer extension in the absence of dideoxynucleotides. Lanes C, U, G, or A, primer extension reactions in the presence of ddGTP, ddATP, ddCTP, or ddTTP, respectively. Angle brackets indicate the extent of hybridization by the mutator oligos. Arrows indicate the site of internal deletions.

The identity of the mutants was confirmed by eluting the transcription products from the gel and determining their nucleotide sequence. Figure 4 shows the nucleotide sequence of the M2 and M1 M2 mutants as determined by primer extension analysis using reverse transcriptase in the presence of dideoxynucleotides (9). It is important to note that the transition from a double- to a single-stranded template and from a single-stranded template back to a double strand takes place without appreciable slippage of the polymerase enzyme. The transcription products obtained using a partially mismatched template do not appear to be any less accurate than one would obtain using a complete double strand.

The data presented in Figure 3 were prepared quantitatively; that is, differences in the amount of radioactivity reflect either a loss of material during the workup or differences in the efficiency of transcription.
The lane corresponding to the T1 oligo alone demonstrates that in some cases the reconstruction of the template strand is nearly complete. The comparatively lower efficiency of template reconstruction with the T2 oligo alone is likely to be due to decreased hybridization efficiency of the T2 oligo. When a mutator oligo is used, the efficiency of template reconstruction is lowered even further. This is partly because the mutator oligo presents a more difficult hybridization task and partly because the extended terminator oligo must be ligated to the 5' end of the mutator oligo.

Despite the loss of viable templates due to inefficient strand reconstruction, one can obtain an adequate amount of mutant RNA as a result of the high turnover of T7 RNA polymerase. In a large-scale preparation, we used 10 µg (~3 pmol) of plasmid DNA as starting material. The yield of mutant RNA, after elution from the gel, ethanol precipitation, and chromatography on Sephadex G-50, was 183 pmol for the 329-nucleotide M2 mutant and 106 pmol for the 291-nucleotide M1 M2 mutant. The mutant RNA was found to exhibit catalytic activity in vitro (14), attesting to its purity and reasonable sequence homogeneity.

DISCUSSION

We hope that others will find our mutagenesis technique useful for the rapid preparation of mutant RNAs. We have been using the technique routinely for the past several months to produce a number of mutations within the Tetrahymena IVS. In addition to deletions, we have produced single base insertions, multiple base substitutions, and various combined insertions and deletions (data not shown). Because the mutant RNA is usually accompanied by a significant amount of "revertant" wild-type RNA, we prefer to include an insertion or deletion along with any substitution to produce a discernible size difference, allowing the mutant RNA to be separated from the wild type on a polyacrylamide gel.
This would not be necessary if one used a mutator oligo that hybridizes very efficiently to the minus strand DNA or if one is willing to tolerate a small amount of wild type included among the mutant RNAs.

The major advantages that the mutagenesis technique has to offer are its speed and simplicity. Producing a 3'-truncated RNA with a defined end is especially straightforward since it does not require the use of a mutator oligo and thus is not subject to contamination by wild-type RNA. The 3' end may be fixed at any point along the gene or may extend from any point into an extraneous sequence as determined by a terminator oligo with a dangling 5' end (14). There are established methods for cloning a defined region of DNA (18) and for the in vitro synthesis of RNAs with defined ends (19). The latter technique is similar to our own, except that it uses cloned single-stranded DNA (e.g. M13 phage DNA) and a "portable promoter" to define the transcription start site. One could combine the M13 technique with our own to produce RNAs that have internal mutations.

The site of an internal mutation may lie at any point along the gene, and need not be in proximity to a restriction site. The design of the mutator oligo must take into account three factors: the desired mutation, the need for efficient hybridization, and the cost. If one wishes to produce a radical alteration of the wild type, it is probably wise to design the mutator oligo with long flanking regions so that it will be able to bind tightly to the minus-strand DNA. This, of course, will increase the cost, but is likely to be economical in the long run. Similarly, if one plans to use two or more mutator oligos simultaneously, each should contain long flanking regions so as to maintain the combined efficiency of hybridization at an adequate level.

Our primary interest has been the construction of recombinant RNAs.
However, the mutagenesis technique that we describe could also be used to generate recombinant DNAs. The mutant RNA could be reverse transcribed to cDNA using the terminator oligo as a primer for reverse transcriptase. Typically, the yield of full-length cDNAs is only about 20-30% relative to the input of RNA template (20), so that the net yield of mutant DNA would be 10-15 times the input of wild-type DNA. Alternatively, after reconstruction of the template strand, one could excise the minus-strand DNA using exonuclease III, and then run the polymerase chain reaction (21,22) to amplify the mutant DNA. The terminator oligo and the minus strand of the T7 promoter could serve as the two primers for this reaction.

ACKNOWLEDGEMENTS

We thank K. Umesono for helpful comments and J. M. Burke for providing a diagram of group I secondary structure. This work was supported by grants from the National Institutes of Health (GM35755) and the Alfried Krupp von Bohlen und Halbach-Stiftung. G.F.J. is a Merck Fellow of the Life Sciences Research Foundation.

REFERENCES

1. Kunkel, T.A. (1985) Proc. Natl. Acad. Sci. USA 82, 488-492.
2. Chamberlin, M. and Ryan, T. (1982) In Boyer, P. (ed), The Enzymes, 3rd edition, Academic Press, New York, pp. 87-108.
3. Lowary, P., Sampson, J., Milligan, J., Groebe, D., and Uhlenbeck, O.C. (1986) In van Knippenberg, P.H. and Hilbers, C.W. (eds), Structure and Dynamics of RNA, Plenum Press, New York, pp. 69-76.
4. Milligan, J.F., Groebe, D.R., Witherell, G.W., and Uhlenbeck, O.C. (1987) Nucl. Acids Res. 15, 8783-8798.
5. Davanloo, P., Rosenberg, A.H., Dunn, J.J., and Studier, F.W. (1984) Proc. Natl. Acad. Sci. USA 81, 2035-2039.
6. Butler, E.T. and Chamberlin, M.J. (1982) J. Biol. Chem. 257, 5772-5778.
7. Zaug, A.J., Been, M.D., and Cech, T.R. (1986) Nature 324, 429-433.
8. Young, B. and Cech, T.R., personal communication.
9. Sanger, F., Nicklen, S., and Coulson, A.R. (1977) Proc. Natl. Acad. Sci. USA 74, 5463-5467.
10. Gillam, S. and Smith, M. (1979) Gene 8, 81-97.
11. Kunkel, T.A. (1987) In Ausubel, F.M. et al. (eds), Current Protocols in Molecular Biology, John Wiley & Sons, New York, unit 8.1.
12. Kerr, C. and Sadowski, P.D. (1972) J. Biol. Chem. 247, 305-310.
13. Nossal, N.C. (1974) J. Biol. Chem. 249, 5668-5676.
14. Joyce, G.F., Van der Horst, G., and Inoue, T., manuscript in preparation.
15. Davies, R.W., Waring, R.B., Ray, J.A., Brown, T.A., and Scazzocchio, C. (1982) Nature 300, 719-724.
16. Michel, F., Jacquier, A., and Dujon, B. (1982) Biochimie 64, 867-881.
17. Michel, F. and Dujon, B. (1983) EMBO J. 2, 33-38.
18. Chisaka, O., Iwai, S., Ohtsuka, E., and Matsubara, K. (1986) Gene 45, 19-25.
19. Krupp, G. and Söll, D. (1987) FEBS Lett. 212, 271-275.
20. Berger, S.L., Wallace, D.M., Puskas, R.S., and Eschenfeldt, W.H. (1983) Biochemistry 22, 2365-2372.
21. Scharf, S.J., Horn, G.T., and Erlich, H.A. (1986) Science 233, 1076-1078.
22. Saiki, R.K., Gelfand, D.H., Stoffel, S., Scharf, S.J., Higuchi, R., Horn, G.T., Mullis, K.B., and Erlich, H.A. (1988) Science 239, 487-491.
23. Burke, J.M., Belfort, M., Cech, T.R., Davies, R.W., Schweyen, R.J., Shub, D.A., Szostak, J.W., and Tabak, H.F. (1987) Nucl. Acids Res. 15, 7217-7221.
24. Inoue, T., Sullivan, F.X., and Cech, T.R. (1986) J. Mol. Biol. 189, 143-165.