volume 10 Number 10 1982 Nucleic Acids Research Nucleotide sequence definition of a major human repeated DNA, the Hind HI 1.9 kb family Laura Manuelidis Yale University School of Medicine, 333 Cedar Street, New Haven, CT 06510, USA Received 3 February 1982; Revised and Accepted 30 April 1982 ABSTRACT A human Hind III 1.9 kb repeated DNA fragment was isolated and cloned in pBR322. A cloned member that hybridized predominantly to the 1.9 kb Hind III band in a digest of whole human DNA was chosen for sequencing. It is an 1894 bp fragment that shows no significant internal repeats. Few pCG residues are observed in the sequence and there are numerous stop codons. Detailed sequence comparisons confirm this is a novel class of repeats that is not related to previously characterized human satellite DNAs or Alu sequences. At least a portion of the sequence described is conserved in evolution. INTRODUCTION A 1.9 kb band is visible in Hind III digests of human male and female DNA and by densitometry, accounts for >_ 0-25% of the genome; by hybridization experiments its sequence has been shown to be distinct from previously identified human repeated DNAs including the simple DNA satellites, the Eco RI 340 bp dimer repeats and the Alu sequence family (1). The complete sequence of a cloned member of this 1.9 kb family is here reported. An accompanying paper describes minor sequence variations in the whole genomic array that are detectable in hybridization studies using this cloned DNA as a probe (2). Such studies clearly link the sequence described here to several primate Kpn I bands observed by other investigators (3, 4 ) . The 1.9 kb sequence also is likely to be related to repeated sequences found in cloning studies of the 3' side of the Bglobin gene (5) and to those identified in X chromosome cell hybrid experiments (6). The 1.9 kb sequence shows no internal repeats of statistical significance, and in accord with previous hybridization studies, it is not homologous to any of the previously described human repeated DNA families. Re- lated Kpn I repeats have not yet been sequenced and thus cannot be compared directly for homologies. At least part of the 1.9 kb © IRL Press Limited, 1 Falconberg Court, London W1V 5FG, U.K. 0305-1048/82/1010-32118 2.00/0 sequence is conserved 3211 Nucleic Acids Research in evolution (1, 2, 3 ) . The "function" of this sequence is unknown. MATERIALS AND METHODS The Hind III band visualized in a digest of whole human DNA was excised and eluted from a preparative 0.8% agarose gel, cloned in A Charon 27 and subcloned in pBR322 (2). One recombinant clone that in Southern blot experiments hybridized predominantly to the 1.9 kb Hind III band in digests of human nuclear DNA (1) was chosen for further sequence studies. Plasmid DNA isolated from E_^ coli as described ethidlum bromide centrifugation. (7) was purified by equilibrium CsCl- The DNA insert was mapped using restric- tion enzymes and standard agarose gel electrophoresls (8, 9 ) . Appropriate restriction fragments were 5' or 3' end labelled as described in detail (9, 10, 11) using a series of 20%, 8% and 4% gels. analyzed at least twice Most of the sequence was (Fig. 1) and complementary strand analysis or sequence analysis in a second experiment showed unambiguous confirming data. Restriction enzyme map sites consistent with fragments obtained prior to sequencing were confirmed in the sequence. The sequence was analyzed extensively using the sequence analysis system of the NIH MOLGEN SUMEX-AIM computer facility at Stanford University. The homology, symmetry and dyad-symmetry searches in this system are based on an updated version of the Korn-Queen algorithm; details of the parameters used and their statistical significance were generously provided and can be obtained from the MOLGEN group at Stanford (J. Clayton, P. Friedland, L. Kedes, and D. Brutlag). The system has numerous sequence files available for comparative searches. RESULTS AND DISCUSSION Mapping and Sequencing Strategy for the Hind III 1.9 kb Insert The enzymes used for sequencing of end labelled insert fragments are depicted in Fig. 1. Bgl II/Hind III digests were used to sequence positions 1-700 since fragments of 400 bp and 1500 bp were well resolved from 4362 bp plasmid DNA. positions Similarly, Xba I/Hind III digests yielded the sequence from 1200-1900. Rsa I/Eco RII digests were found most useful for sequencing central fragments of the Insert. Complementary strand analyses were also done using Rsa I/Bgl II digests or Rsa I/Sau IIIA (positions 14241667) digests. Hinf I, Hinc II, Kpn I and Taq I cleavage studies on whole plasmid or end labelled fragments were also used to determine the orientation of the insert in pBR322 and to confirm restriction sites obtained in 3212 Nucleic Acids Research 7 8 v Hind III 0 Eco R II T Bgl II 9 10 II 12 13 14 15 16 17 IS 19 P t\ Rsa I S Kpn | t Xba I Figure 1. Diagrammatic summary of restriction gel fragments analyzed with major enzymes of use in DNA sequencing. 3' represents 3' end labelled fragments used for sequencing, all other fragments were 5' labelled. Arrows show direction of sequencing from each labelled end. Double line represents identically labelled fragments sequenced twice. Most of the sequence analysis contains complementary strand confirmation. Ep represents the end of the vector arm to position 29 (the Hind III insert site) and p represents plasmid from position 29 onwards; insert scale is 1900 bp. the sequence. Restriction Sites of the Hind III 1.9 kb Insert A complete restriction map is shown with the obtained DNA sequence of the 1.9kb insert (Fig. 2 ) . The sequence determinations confirmed our initial restriction site studies indicating no Pst I, Ava I, Bgl I, Bam HI, Xho I, Ava II, Bst E II or Eco RI sites in the cloned insert. The Rsa I, Eco RII, Bgl II and Xba I sites were confirmed by sequencing. Restriction studies using Hinf I and Hinc II digestion on the cloned plasmid DNA, and Alu I, Hae III and Taq I on end labelled fragments, were compatible with the sequence obtained. Studies with whole genomic DNA have also confirmed the absence of Pst I, Bam HI, and Bst EII sites within the majority of uncloned members of this family. were observed Hind III, Xba I, Hae III, Rsa I, Eco RII and Hinc II sites at the same positions in the genomic family with minor variations; a single base change in one of the Eco Rl* sites (at position 1642 or 1652) is likely to be represented as Eco RI sites in a fair proportion of uncloned members of this family (2). Nucleotide Composition There were 42.6% Adenine (A), 21.9% Cytosine (C), 16.5% Guanine (G), and 19.1% Thymlne (T) residues (reading 5' to 3'). Taken together this yielded a base composition of 38.4% GC and 61.7% AT for the double stranded molecule. Whole human DNA bands at a density of 1.697 in neutral CsCl (un- published data) which is equivalent to average GC content of 39-40%. this DNA is somewhat AT rich. Thus We have never been able to purify or signifi- cantly resolve fragments containing the Hind III 1.9 kb band in high mole- 3213 Ttblllll Alul UiiDIII 1 Ddol EcoiU* 1 liphl 1 EcoRII I 1 1 Mall Tiql I HboII Sf.NI I Ilinl'I II DJel II Lcol'l Ttblllll 11 Hull Dd«l 1 Alol EcoPl Ddol Usll 11 1 •30 860 870 880 890 900 910 CTTOGAACTAAAGCrCTCCrCAGCAAATGTAAAAGAATAGAAATTATAACAAACTGTCCCTCAGACCACA 11 Alul Odol I LcoUII 710 720 730 740 750 760 770 ATTAACAAGGATACCCAGGAATTUAACTCAGCTCTGCACCAAGCAGAQCr/JlAliACATCTACACAACTCr 1 Udcl Hu.KI EcoUI* 111 U|iAI 570 580 590 600 610 620 630 TATATOCACCCAATACAGGAGCACCCAGATllCATAAACCAAGTCCrGAGTGACCTACAAAGAGACTTAGA Upol 1 430 440 450 460 470 480 4'J<) AAAACAAAAAAAGGTACGGCTTGCAATCCTAG7XrCTGATAAAACAGACrrTAAACCAACAAAGATCAAA EcolU 1 290 300 310 320 330 340 3S0 GCTCCAATTAJUUUlACACGGACrCGCAAATTGGATAAAAACTCAAUACCCATCAGTCTCCTGTATTCAGG EcoPl 1 ISO 160 170 180 190 200 210 AAAATCATGCCAAATTCTAAAGACCATCGAGGCTAGGAAUAAACTGaTCAACTAACCAGCAAAATAACC 1 11 10 20 30 40 50 60 70 AACCnCATACUTGAAOOAGAAATAAAATACrTCACACACAACCAAATGCTGACAGATnTGTCACCACC lUelll 1 11 Upall Kill Kpnl UaiCI 1 1 llbvl Fnu4III Fuu4HI Dpal DpnI HboII II Alul llbotl Upnl 1 licoRI* 920 930 940 950 960 970 980 GTUCAATCAAACTAGAACTCAGGGTTAAGAAACTCACTCAAAACCACTCAACTACATGCAAACTAAACAA I 1 K«»I TM 790 BOO «10 820 830 U40 CCAGCCCAAATCAACAGAAAUTACAj\Ai\.VAri'CA(iCACCACACCACACCTATTCCAAAACri;ACCACATA 1 1 UioDll 64U 650 660 670 bilO 69U 700 CTCCCACACAATAATAATOGUAGACrrCAACACCCTACTCiTCAACAnAUAAAGATCAACAAGACAGAAA 1 Upol II Haolll Hiol 500 51u 520 530 540 550 560 AGAGACAAAGAAGGCCATTACATAATGGTAAAAOGATCAAlTCAACAAliAAGAUCTAACTATCCTAAATA XlioII 11 1 1 Ac y1 360 370 360 390 400 410 420 AMCCCATCrCATCTCCAGAGACACACATAGGCTCAAAATAAAAflGACGTCAAGATCTTCCAACAAATGO 1 1 Alul 220 230 240 250 260 270 280 AGCTAACATCATAATGATAGGATCAAACTCACACATAACAATATTAACTTTACATGTAAATGGACTAAAT ll»cl StuI II Alul llgiAJ SicI 11 BO 90 100 110 120 130 MO AGGCCTGCCCTAAAAGAGCTCCTGAAGCAAGCACTAAACATGCAAAGUAACAACCGGTACCAGCCGCTGC 1 1 Dpnl 1 1 Alul 1 ninFI F.cotll* 1 P.coPl Alul 1 H,,I JpnI ,I,,CI 1 Hnll ninFI IlcollI' 1 1 1 1 .'Inll r.coR|. 1 1 I'bvl Fnu41ll Fxcil'15 Kcol'.I* 1 MboII 1 1 HinFI MboII ninPt 1 Xbil EcoKI* 1 1 1 MboII Mnll Tiql CCORI* 1 1 1 Hnll 1 1 F.coRI* Moll Rill 1 1 1 EcoRII 1 StnNI F.colll" 1 EcoPl 1 It is 1760 1770 1780 1790 1800 1810 1B20 GCCAGCATCATCATCCTGATACCAAAGCCrGGCAGAnACACAACAAAAAAAGAGAATnTAOACCMTAT 11 IlinFI BcoRI* 1 1 1 licoI'.II Ilinl'l Avail Alul 1620 1630 1640 1650 1660 1670 1680 TACCAACCAAAAAGAGTCCAGGACCAGATGGAITCACAGCCAAATTCTACCAGAGGTACAAGGTDGAACT Alul IlinlHII II 1 1480 1490 1500 1510 1S20 1530 1540 CTATGCAAATAAACTAGAAAATCTAfiAAGAAATGGATAAATTCCTCnACACATACACCCTCCCAAGACTA Dpnl 1 1340 1350 1360 1370 1380 1390 1400 AAAATTGATAGATCACTAGCAAGACTAATAAAGAAr.AAAACAGAGAAGACTCTGATAOACTCAATAAAAA Figure 2. Complete nucleotide sequence and restriction enzyme positions of the Hind III 1.9 kb insert. shown as oriented in pBR322 (see Fig. 1 ) . Eco RI* = PPATQQ.Dpn I = Mbol. Sf«NI 1 Alul 1 2 0 0 1210 1220 1230 1240 1230 1260 AAAAGCAAGAGCAAATACATTCAAAAGCTAGCAGAAGGCAAnAAATAATTAAAATCAGAGCAOAACrGAA 1 1 Unll EcoRI* 1 Alul 1060 1070 1080 1090 1100 1110 1120 GAOAACAAAOACACAACATACCAOTATCTCTOOOACATTCATAOCTGTGTGTACACOCAAATn'ATAeC* Mnll "«HI »«1 Ml 1830 1840 1850 1860 1870 1880 1810 CCTTOATnAACATTnATCCAAAAATCCTrAATAAAATAnT.nCAAACiCAAATCCAfilWnCACATCAAAAA 1 Mull 11 1690 1700 1710 1720 1730 1740 1750 GGTACCATTCCrrCTGAAACTATTCCAATCAATAGAAAAAGAGGGAATCCTCCCTAACTCATTTTATClAG HboII 1 1550 1560 1570 1580 1590 1600 1610 AACAAGGAAGAAOCTOAATCTCTGAACAGACCAJITAACAGGCTCTGAAATTATCGCAACAATCAATACCT Moll Tthlllll HphI Dpnl II II 1410 1420 1430 1440 14S0 1460 1470 ATGATAAAGOAGATATCACCACCGATCCCACAGAAATACAAACTACCATCAGAGAATACTACAAACACCT HlnFI Alul [icoRII EcoRI* 1 1270 1280 1290 1300 1310 1320 1330 COAAATACAGACACAAAAAACCCTTCAAAAAATTAATCAATCCACGAGCTGOII'IrnHAAACCATCAAC till 1 1130 1140 1150 1160 1110 1180 1150 CTTAATGTCCACCAGTCAAAGCTOaAAAfiTTTCAAMTTCACACCCTAACATCACAATTCAAAnCACTAC 990 1000 1010 1020 1030 1040 10S0 CCTGCTCCTOAATGACTACTGGGTCCaTAACOAMTGAAGGCAGAAATAAAGATGTTCTTTGAAACCAGT 3 > o a. <•>• Nucleic Acids Research cular weight human DNA using CsCl-Hoechst or Ag-CsSO4 gradients (12) possibly because the repeat is integrated with other sequences in the genome. Analysis of the dinucleotide sequence distribution was generally in keeping with the percent expected in a random sequence, with one striking exception. The pCG ratio (which is the same in both strands) was quite low and showed only 9 pCG pairs or 0.5%, as compared to an expected percent of 3.6% in total DNA. (position 127). One of these pCG sites is at the Msp I (Hpa II) site In preliminary studies using the restriction enzymes Msp I and Hpa II to differentiate methylated from non-methylated cytosines, there appears to be a small degree of methylation at this site in whole human DNA (data not shown). Only one other pCG site of the 9 in the total sequence has a methyl C residue that potentially can be studied with a restriction enzyme; at position 134 a simple inversion of CG could yield a Hha I site. Generally, the degree of potential methylation sites in the 1.9 insert ,is considerably lower than that seen in mouse satellite DNA (11, 13). Although pCG methylation has been considered an important feature in repeated DNAs and the inactivation of coding sequences (14, 15, 16), the low percent of pCG residues here suggest C methylation may not be an important modifying feature in the 1.9 kb sequence. The low representation of pCG sites also could coincide with special protein binding affinities of this sequence, especially if it is adjacent to active gene coding regions. Lack of Internal Repeats Since the 1.9 kb Hind III sequence is repeated, one might suspect that the sequence is built upon a simpler sequence, as for example that seen in the 340 bp Eco RI human dimer, or the 232 bp mouse satellite sequences (9, 11, 13). We therefore searched the sequence for internal homologies that were statistically significant (see Methods). The best homology was the match between 175-212 and 1545-1580 in the 1.9 kb insert (30 of 40 bases matched). The expected value (E) for this match as compared to a random DNA is 0.118, and showed a one base loop out (insertion-deletion) at 6 positions. A statistically significant homology would have zero base loop outs, and an expected value E of 0.05 or less. We further reviewed all the matches of 75% homology and found no repeated sequence that was obvious in these matches; there was also no consistent spacing pattern discernable in the homologies obtained. As a control of the computer program we compared its expected value for the two repeats of the human Eco RI 340 bp dimer, which using a different analysis were considered to be highly significant statistically 3216 (9). The computer also read this as a highly significant Nucleic Acids Research repeat with an expect value E = 0.000. Regions of Symmetry and Dyad Symmetry in the 1.9 kb Fragment Several symmetries were noted in the sequence, for example at position 430. However, computer analysis of all symmetries showed there were none of statistical significance. Furthermore, there were no dyad symmetries where the total free energy of base pairing was -10 K cal; -15 K cal or less has been considered to be the limit for the stable formation of RNA hairpins under physiological conditions (17). Translation Analysis of the 1.9 kb Sequence The translation products in all three reading frames were examined. There were numerous stop codons. The longest reading frame (in any frame, without stop codons) was 251 amino acids long, starting at position 1139. More stop codons were observed in a search of the inverse complement of the sequence (the largest translation array was 82 amino acids in length). More detailed translation analyses (and homology and dyad symmetry analyses) are available on request. Comparison with other Sequences The Hind III 1.9 kb sequence was compared with the human 340 bp Eco RI dimer, in both orientations, using the sequence homology program. No statistically significant homologies were obtained. It is curious that 5 of the 12 matches (41.7%) with >^ 75% homology to the 1.9 kb insert oriented as shown in Fig. 2, matched a region of the dimer sequence (ATAGAAACTAGACAGAATAAT) at positions 683, 1192, 1338, 1477 and 1485 in the 1.9 kb sequence. It is not known if this sequence region may be of importance in cross-over events or evolution of repeated DNAs. ly related No evidence that the 1.9 kb fragment was close- to the Eco RI dimer family was obtained from these sequence analyses, and this is in accord with our hybridization studies showing no homology between these two repeated DNAs (1). These results contrast to hybridization studies of related "alpha" satellites by other investigators (3). The best homology between these two different repeats within the same species was 76.9% (20 of 26 bases) with an unimpressive E value of 0.995 (1.9 kb positions 683-708 with Eco RI dimer positions 327-351). Analysis of the inverse complement for homologies also showed no significant similarities between these two repeats. Analysis of dyad symmetries between the human Hind III 1.9 kb fragment and the Eco RI 340 bp dimer did however show 7 matches where the pairing yielded < -15.0 K cal, possibly suggesting that under less stringent hybridization conditions, some positive hybridization between the two different sequences might be obtained. Alternatively, high 3217 Nucleic Acids Research molecular weight "alphold" DNA, isolated as a satellite from cesium gradients, could also contain other repeated DNA sequences in addition to the 340 dimer family that contributed to the positive hybridization results obtained by others (3). In summary, the comparative data here does not lend any positive evidence to support the notion that the shorter Eco RI 340 bp human dimer sequence is related to the longer 1.9 kb human repeat. We previously found that the 1.9 kb clone hybridized to an "interspersed" mouse repeated DNA band (1,18). this sequence satellite. bore any resemblance Again no statistically to Thus it was of interest to find if the common significant repeated mouse DNA sequence homologies were noted between the human 1.9 kb repeat and mouse satellite, and this was in accord with previous hybridization studies (1). The best homology between these two sequences matched positions 126-15 7 of the mouse sequence with 14 72-1504 of the 1.9 kb sequence (75%, 2 7 of 36 bp, E = 0.154) as shown: * * * * * * * * * 1.9 kb 1472: TATG CAA ATAAACTAGAAAATC TAGAAGAAATG 1504 Mouse satellite 126: TATGGCAAGAAAA CT GAAAATCATGGAA AA TG 157 This homology between two widely separated species was statistically better than any homology between the human Eco RI 340 bp dimer and the human 1.9 kb repeat. "Alu family" sequences, including a polymerase III in vitro trans- cription unit (19) were also compared since they are "interspersed" in the human genome. Alu sequences are notably more GC rich than the 1.9 kb se- quence depicted here; Alu sequences showed less significant homologies with the 1.9 kb repeat than mouse satellite DNA (sequences compared as shown in Fig. 2 or as inverse complement). Tandemly repeated human simple satellite sequences (12, 20) did not show any obvious relation to the 1.9 kb repeat. In summary, the 1.9 kb repeated sequence is here defined as a novel human repeated DNA. It lacks any internal repeated sequence subsets, and thus cannot be classified as a tandem repeat based on its internal sequence. In the following paper we demonstrate that larger fragments containing this sequence do not show strict tandem repetition of the 1.9 kb sequence, i.e. the end of the molecule is not followed directly by the beginning of the sequence or vice versa. The "function" of this conserved repeat is unknown. It is of interest that high resolution chromosome hybridization studies directly show at least a proportion of this sequence is localized on the chromosome arms in a discrete or banded configuration (1 and in preparation). It is possible that this sequence, or clusters of this sequence, may have a role in the definition of discrete chromosomal and nuclear domains that 3218 are distinct from those occupied by the predominant centromeric Nucleic Acids Research repeats. ACKNOWLEDGEMENTS J. Steitz generously supplied very high specific activity Y 32P ATP used in these studies. This work was supported by NIH Grant CA15O44. REFERENCES 1. M a n u e l i d i s , L. ( 1 9 8 2 ) . I n : Genome E v o l u t i o n and P h e n o t y p i o c V a r i a t i o n G.A. Dover and R.B. F l a v e l l , e d s . p p . 263-285 Academic P r e s s , New York 2. M a n u e l i d i s , L. and B i r o , A. ( 1 9 8 2 ) . N u c l e i c Acids Res ( i n p r e s s ) 3 . Maio, J . J . , Brown, F . L . , and Musich, P . R . ( 1 9 8 1 ) . Chromosoma ( B e r l . ) J33_, 103-125 4 . Singer, M.F. (1981). I n t e r n a t i o n a l Rev. Cytology ( i n press) 5. Kaufman, R.E., Kretschmer, P . J . , Adams, .W., Coon, H.C., Anderson, W.F. and Nienhuis, A.W. (1980). Proc. N a t l . Acad. Sci. (U.S.A.) !]_, 4229-4233 6. Schmeckpeper, B . J . , Willard, H.F. and Smith, K.D. (1981). Nucleic Acids Res. \ 1853-1872 7. Clewell, D.B. and Helinski, D.R. (1970). Biochem. 9_, 4428-4440 8. Manuelidis, L. (1976). Nucleic Acids Res. _3» 3063-3076 9. Wu J . C . and Manuelidis, L. (1980). J . Mol. Biol. _142_> 363-386 10. Maxam, A. and G i l b e r t , W. (1979). Methods Enzymol. ^ 5 , 504-561 11. Manuelidis, L. (1981). FEBS L e t t . \V^, 25-28 12. Manuelidis, L. (1978). Chromosoma ( B e r l . ) ^ 6 , 1-21 13. Horz, W. and Altenburger, W. (1981). Nucleic Acids Res. _9» 683-695 14. Spencer, J . H . , Harbers, K., and Duhamel-Maestracci, N. (1981). Advances in Enzyme Regulation J_9_, 453-470 15. Razin, A. and Riggs, A.D. (198^1. Science 2H), 604-610 16. Tantravahi, U. , Guntaka, R. , Erlanger, B. and M i l l e r , O.J. (1981). Proc. Natl. Acad. Sci. U.S.A. _78» 489-493 17. Tinoco, I . , Borer, P.N., Dengler, B . , Levine, M.D., Uhlenbec, O.C., Crothers, D.M. and Gralla, J . (1973). Nature New Biol. 246_, 40-41 18. Manuelidis, L. (1980). Nucleic Acids Res. 8^, 3247-3258 19. Pan, J . , Elder, J . T . , Duncan, C.H., and Weissman, S.M. (1981). Nucleic Acids Res. 9_, 1151-1170 20. Frommer, M., Prosser, J . , Tkachuk, D., Reisner, A.H. and Vincent, P.C. (1982). Nucleic Acids Res. 10, 547-563 3219 Nucleic Acids Research
© Copyright 2025 Paperzz