Volume 8 Number 20 1980 Nucleic Acids Research

Recognition of specific DNA sequences in eukaryotic chromosomes

Harold Weintraub

Hutchinson Cancer Center, Seattle, WA 98104, USA

Received 11 August 1980

ABSTRACT

The packaging of DNA into chromatin probably places certain restrictions on how specific DNA sequences can be recognized by DNA sequence specific recognition proteins (SRP). Several unique features of this type of interaction are discussed. Specifically, as a consequence of the coiling of the DNA about a histone core, it is proposed that DNA recognition sites will be compound and that each element of the compound recognition site will be about 10 - 20 b.p. in length and distributed at approximately 80 b.p. intervals, the periodicity of the DNA wrapping around the nucleosome.

INTRODUCTION

It is now very clear that both transcribed and non-transcribed DNA in eukaryotic cells is packaged into nucleosomes. These nucleosomes are themselves packaged into so-called 100 Å fibers and these, in turn, into 250-300 Å fibers. It is likely that the packaging of DNA into these chromosomal structures places certain limitations on how specific DNA sequences can be recognized by those (protein) factors that are presumably required for transcription, differential gene expression, genetic recombination, etc. I would like to suggest that, as a result of the folding of DNA into chromosomes, a recognition site for any given protein-DNA interaction will have two properties: (1) the recognition site will be compound and (2) each element of the compound recognition site will be about 10 - 20 b.p. in length and distributed at approximately 80 b.p. intervals.

DNA Sequence Recognition in Chromatin

Figure 1A illustrates one particular model for a 100 Å fiber. The DNA is coiled into a reasonably regular 80 b.p. supercoil by the wedge-shaped histone octamer core. As illustrated in the model of Finch et al.
(1), each nucleosome core particle (55 Å high × 110 Å in length) is composed of 2 turns of 80 b.p. each, and the assumption is made that the so-called spacer or linker DNA between nucleosomes is also packaged into a similar coil, as proposed by a number of workers (see, for example, Worcel and Benyajati (2)). (This assumption is made for simplicity only, since the general argument being put forth is really independent of it. Additional proposals for the 100 Å fiber will also be considered below.)

© IRL Press Limited, 1 Falconberg Court, London W1V 6FG, U.K.

[Figure 1: two-panel diagram (A, B) of the DNA coil about the histone core; surviving labels read "55 Å", "per turn" and "110 Å".]

Figure 1. Binding of DNA Sequence Recognizing Proteins (SRP) to Chromatin. A: The histone core (wedge-shaped discs) folds the DNA into a uniform coil with 80 b.p. per turn. The dimensions of the nucleosome and the shape of the core are given by Finch et al. (1); the nucleosome is 55 Å × 110 Å × 110 Å; the repeat is 200 b.p. This particular view is that of a 100 Å fiber as discussed by Worcel and Benyajati (2). In other models (14,15), which seem to contradict the face-to-face model proposed by Dubochet and Noll (22), the nucleosomes may be tilted with respect to each other in the 100 Å fiber; however, when packaged into the 250-300 Å fiber, their faces become aligned (14) and approach the view shown in the simplified diagram presented here. The sequence "xyz" represents a palindrome separated by 80 b.p. but brought to within 30 Å as a result of the nucleosome folding. The symmetry of the resulting compound binding site would match that of a symmetrical protein. B: The coil is the same as in (A); however, the DNA is turned 180° so that the vector associated with "abc" now points inward towards the center of the coil. As discussed in the text, a "singularity" (open triangle) placed near DEF or near ABC will determine whether the abc vector 80 b.p. downstream points inward or outward.
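The legend's point that two sites 80 b.p. apart along the DNA end up within about 30 Å of each other can be checked with a minimal ideal-superhelix sketch. It assumes a perfectly uniform coil with 80 b.p. per turn and a pitch of 27.5 Å (the 55 Å core height divided between the 2 turns described in the text). The 45 Å radius is an illustrative assumption that is not taken from the paper, the 3.4 Å rise per base pair is the standard value for straight B-DNA, and the function names are hypothetical.

```python
import math

BP_PER_TURN = 80        # base pairs per superhelical turn (from the text)
PITCH_A = 55.0 / 2      # assumption: 55 A core height shared by 2 turns
RADIUS_A = 45.0         # assumption: illustrative superhelix radius, not from the paper
RISE_A = 3.4            # rise per base pair of straight B-DNA, in angstroms

def coil_position(bp):
    """(x, y, z) in angstroms of base pair `bp` on an ideal, uniform superhelix."""
    theta = 2 * math.pi * bp / BP_PER_TURN
    return (RADIUS_A * math.cos(theta),
            RADIUS_A * math.sin(theta),
            PITCH_A * bp / BP_PER_TURN)

def coil_distance(bp1, bp2):
    """Straight-line distance between two base pairs on the coil."""
    return math.dist(coil_position(bp1), coil_position(bp2))

def linear_distance(bp1, bp2):
    """Distance between the same two base pairs if the DNA were straight."""
    return RISE_A * abs(bp2 - bp1)

print(f"straight DNA, 80 b.p. apart: {linear_distance(0, 80):.1f} A")  # 272.0 A
print(f"coiled DNA, 80 b.p. apart:   {coil_distance(0, 80):.1f} A")    # 27.5 A
```

Because the two sites are exactly one superhelical turn apart, their separation equals the pitch regardless of the radius chosen. This is why the roughly tenfold compaction does not depend on the radius assumption.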
The coiling of the DNA in the chromosome fiber means that it is likely that a given DNA sequence recognizing protein (SRP) does not "read" a specific contiguous sequence of DNA. Instead, as the protein approaches the chromatin fiber from a particular angle, it would likely "see" a series of roughly parallel segments of DNA. Each of these segments would be about 10 - 20 b.p. long and, for the most part, spaced at an average center-to-center distance of about 50 Å with a slight tilt with respect to the main axis of the fiber. (The exact spacing between the roughly parallel segments of DNA may vary along the periphery of the nucleosome due to the wedge-shaped structure of the core (1).) Clearly, the exact details of what the protein might see are best predicted from the exact structure of DNA in the nucleosome core and in the nucleosome spacer. These parameters have not been fully established by direct experimental measurements, so it would be premature at present to go into great detail as to what exactly the protein might be seeing. It is clear, however, that as the detailed structure of the nucleosome becomes known, the exact requirements for SRP binding should also become understood to a similar level of resolution. With respect to Fig. 1, it is also possible to imagine that the approaching protein may not be oriented exactly parallel to the axis of the fiber. Thus, the average phasing of 80 b.p. between multiple sites is only an approximation. Depending on the spatial distribution of the multiple DNA binding sites in a given SRP, it is likely that some interactions might involve sites spaced at 90 b.p. or 70 b.p., etc. As will be clear from the discussion below, whatever the exact spacing, it should be very near a multiple of 10 b.p. Figure 1 also emphasizes the fact that in this structure almost all of the DNA is on the outside of the fiber and is probably accessible to DNA binding proteins, at least over 10 b.p. intervals (see below).
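To make the spacing rule concrete, a hypothetical scanner can search for pairs of copies of a short element whose separation along linear DNA is close to 70, 80 or 90 b.p., the spacings mentioned above. This is only a sketch under stated assumptions: the element is matched as an exact string, the allowed spacings and the 1 b.p. tolerance are illustrative parameters, and the orientation of each copy on the coil is ignored. The function names and the toy sequence are invented for the example.

```python
def element_starts(seq, element):
    """Start positions of every exact (possibly overlapping) copy of `element` in `seq`."""
    starts, i = [], seq.find(element)
    while i != -1:
        starts.append(i)
        i = seq.find(element, i + 1)
    return starts

def compound_site_candidates(seq, element, spacings=(70, 80, 90), tolerance=1):
    """Pairs (start1, start2, spacing) whose spacing is within `tolerance` b.p. of an allowed value.

    Spacing is measured start-to-start, which equals center-to-center for equal-length copies.
    """
    starts = element_starts(seq, element)
    pairs = []
    for k, a in enumerate(starts):
        for b in starts[k + 1:]:
            d = b - a
            if any(abs(d - s) <= tolerance for s in spacings):
                pairs.append((a, b, d))
    return pairs

# Toy sequence: a hypothetical element "GCGC" placed at 10, 90 and 175 in an A-only background.
seq = ["A"] * 200
for pos in (10, 90, 175):
    seq[pos:pos + 4] = "GCGC"
seq = "".join(seq)

print(compound_site_candidates(seq, "GCGC"))   # [(10, 90, 80)]
```

Only the 80 b.p. pair is reported. The 85 b.p. pair (90 to 175) is rejected because it is half a helical turn away from a multiple of 10 b.p.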
The common notion that only linker DNA is readily accessible to proteins is best supported by the finding that micrococcal nuclease cleaves these spacer regions preferentially. This notion is nevertheless probably incorrect. It is also well known that DNase I and DNase II (and, at high concentrations, even micrococcal nuclease) readily cut the DNA associated with the nucleosome core. The core DNA is also readily accessible to alkylating agents, and even to lac repressor when the latter is challenged to bind nucleosomes reconstituted with lac DNA (see review by McGhee and Felsenfeld (3)).

There are several very general features of the proposed SRP-chromatin interaction that are interesting. The first deals with the length of DNA over which a typical interaction can occur. First consider the 100 Å fiber (Fig. 1). Imagine a DNA sequence recognizing protein (SRP) of, say, 100 Å in diameter. Such a protein might interact with at most 30 b.p. of contiguous, linear DNA. In contrast, in the context of the proposed model, the same protein could interact with the DNA associated with almost 2 nucleosomes. This is about 5 coils of DNA. If we guess that about 20 b.p. per coil is positioned in the proper direction, then a 100 Å protein has the potential to bind as many as 100 b.p. (20 b.p. per coil times 5 coils). Consequently, for a given sized protein, concentrating more DNA into a given area provides more surface for proper recognition. The free energy of interaction can thereby be decreased (made more favorable), and the specificity correspondingly increased. In this view, each coil would represent a separate binding sub-domain. The overall interaction between SRP and DNA would therefore be the product of each successful interaction between an SRP binding site and a sub-domain.
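The arithmetic of this paragraph, and the idea that independent sub-domain interactions multiply, can be written out as a short sketch. It assumes that each coil contributes an independent binding constant. Under that assumption the constants multiply and the free energies (ΔG = -RT ln K) add. The per-sub-domain constant of 10^3, the temperature of 310 K and the function names are hypothetical, illustrative values rather than measurements from the paper.

```python
import math

R_KCAL = 1.987e-3   # gas constant, kcal/(mol*K)
T_K = 310.0         # assumption: 37 degrees C

def contacted_bp(coils, bp_per_coil=20):
    """Base pairs presented to the protein: correctly oriented b.p. per coil times coils."""
    return coils * bp_per_coil

def combined_constant(subdomain_constants):
    """Overall constant if each sub-domain interaction is independent (constants multiply)."""
    return math.prod(subdomain_constants)

def free_energy_kcal(k):
    """Standard free energy, -RT ln K, in kcal/mol; more negative means tighter binding."""
    return -R_KCAL * T_K * math.log(k)

linear_bp = 30                  # contiguous DNA reachable by a 100 A protein (text)
coiled_bp = contacted_bp(5)     # 5 coils x 20 b.p. = 100 b.p. (text)

k_one = 1e3                     # hypothetical constant for a single sub-domain
k_all = combined_constant([k_one] * 5)
print(linear_bp, coiled_bp)     # 30 100
print(f"{free_energy_kcal(k_one):.2f} vs {free_energy_kcal(k_all):.2f} kcal/mol")
```

With five independent sub-domains, the combined free energy is five times that of one sub-domain. This additivity is how a modest per-coil interaction can add up to high overall specificity.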
Thus, by compacting DNA binding sites it is possible to achieve a high degree of specificity without forcing DNA binding proteins to evolve into the long cylindrical structures required to interact with long stretches of linear, contiguous DNA. Nor is it necessary to alter the basic design of DNA-protein interactions initially established in simpler organisms. (Actually, since bacterial DNA is also coiled, this same type of interaction might also occur for some bacterial SRPs.)

In contrast to the previous calculation, it should be emphasized that most of the chromatin in higher cells is present in the nucleus as a 250-300 Å fiber. One basic view of how nucleosomes are packaged in such a fiber has been proposed by a number of laboratories. For simplicity, and as an example, consider a solenoidal packing of nucleosomes with 6 nucleosomes per turn of the solenoid. In that case it is not unreasonable that a 100 Å protein could span a compound binding site extending as far as 24 nucleosomes, or almost 5 Kb. Possibly many of the inverted repeats separated by long intervals in eukaryotic DNA (the sequence "XYZ" in Fig. 1A) could be brought adjacent to each other by this type of folding. They would then form a compound recognition site for a symmetrical protein, as has also been suggested by the elegant work of Nasmyth et al. (6)

The use of phased, compound recognition sites has additional evolutionary advantages. The argument here is analogous to that proposed by Gilbert (4) for the exons and introns comprising eukaryotic transcription units. Thus, since each coil of DNA represents a separate domain containing a sub-domain of 10 - 20 b.p. of DNA that interacts with SRPs (the "exonic" region) and 60 - 70 b.p.
that do not (the "intronic" region), it is possible to imagine, as in the case of true exons and introns, that when a particularly effective sub-domain evolves, rapid recombination within the intronic regions can occur, allowing for reassortment of the functional domain for SRP binding.

Orientation of Binding Sites

The previous estimate illustrates a very important limitation that a recognition mechanism must encounter. In a solenoid (or in any higher order structure) not all of the DNA will be available; some of the DNA in each nucleosome faces the inside of the solenoid and some faces the outside. Consequently, two criteria must be met for proper recognition to occur. First, compound recognition sites in the DNA must be placed at specific distances along linear DNA, in a way that is compatible with the underlying chromosomal structure. Second, these sites must be properly oriented (all pointed inside or all pointed outside). Interestingly, because of the helical nature of DNA, the same problem occurs even when only a single nucleosome is considered. I shall deal with this case specifically.

Figure 1A shows the DNA wrapped about a histone core. A specific sequence on the DNA (for example "abc" in 1A) is maximally accessible only from one orientation (e.g., the large groove). Thus, if such a sequence is to be read with maximal efficiency, it must be pointing upward, away from the center of the cylinder (arrow), and not downward (Fig. 1B). What will determine the direction of the "abc" vector? Clearly, specific features of the structure itself are important: the exact path of the DNA around the core, histone-induced variations of the DNA twist, the pitch of the supercoiled DNA, the exact path of the spacer DNA, etc. However, one special feature will be the exact point where the coil begins. For convenience we call such a point a "singularity" (a term suggested by A. Varshavsky)
and depict such a singularity by the triangles in Fig. 1A, B. Comparing these two diagrams, a singularity at ABC (Fig. 1A) results in the abc vector pointing out, while a singularity at DEF (5 b.p. away from ABC) causes the abc vector to point inward (Fig. 1B). This follows because the DNA is a double helix with about 10 b.p. per turn. Thus, rotations of 5 b.p. (or deletions or insertions of 5 b.p.) will have marked effects on how SRPs interact with chromatin. Clearly, without a singularity there will be a continuum of orientations of the abc vector. The interaction with the histone core might, however, place some restraint on this, perhaps digitizing the possibilities to unit base pair intervals.

Singularities and Hypersensitive Sites

What is the nature of the proposed singularities? For a number of genes, regions of DNase I hypersensitivity have been described (7-12). The genes include the SV40 minichromosome, Drosophila heat shock genes, actin, α- and β-globin, RAV-0, integrated adenovirus, integrated polyoma, and conalbumin. These regions of hypersensitivity are rather precise points in the chromosome that are cleaved by DNase I. They tend to be localized near the 5' side of these genes, but they are also detected in other regions. They are visualized experimentally by digesting nuclei with DNase I so that the average size of the DNA is about 10 - 15 Kb. The DNA is purified, redigested with a restriction enzyme, separated on agarose gels, blotted, and hybridized to a particular probe. As a result of the DNase I treatment, specific, discrete sub-bands are produced in addition to the original parental fragments produced by restriction alone. This type of point-site hypersensitivity is clearly distinguished from the preferential sensitivity of transcribed genes induced by HMG 14 and 17 (13). The latter occurs in each nucleosome all along the length of a transcription unit.
We think it is a likely possibility that the regions of DNase I hypersensitivity, which are also tissue specific, are regions that are used to define a singularity as discussed above. The biochemical basis for the hypersensitivity is for the most part not known. In the case of the SV40 minichromosome, however, one suspects that it is determined in some way by T-Ag, since it occurs very near the T-Ag binding site. By establishing a specific point, these singularities define the DNA coil both upstream and downstream. As a result, important DNA sequences come into register (and proper orientation) every 80 b.p., and they can now be recognized (or not recognized, depending on their orientation) by SRPs. These proposed recognition events, which would occur downstream and upstream from the singularities, may be used to initiate transcription and replication (and recombination). They may also be used to control genes that are differentially regulated in different tissues. By orienting adjacent sites inward, singularities might also be used to prevent proper recognition of adjacent multiple binding sites. Clearly, a distinguishing feature of this type of control is that two events are required: the establishment of a singularity and the subsequent orientation of the 80 b.p. coil. Both of these events would occur at nearby, but different, chromosomal sites. In a sense, the proposed singularities could be thought of as producing "position effects." A crucial question is how the singularities are established in the first place, and what their biochemical basis is. Perhaps a more accessible question is the following: if these proposed singularities prove to be regions of DNase I hypersensitivity, are these hypersensitive regions present before a gene is activated during development, or not?
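The orientation argument of Fig. 1 and the 80 b.p. register described above both reduce to helical phase measured from a singularity. In the minimal sketch below, a site's rotation about the helix axis is proportional to its distance from the singularity. The 10 b.p. helical repeat comes from the text. The rest are hypothetical simplifications: the site at the singularity is assumed to face outward (as "abc" does in Fig. 1A), "outward" is defined by a 90° cut-off, histone-induced changes in twist are ignored, and the function names are invented for illustration.

```python
HELICAL_REPEAT_BP = 10.0   # b.p. per turn of the double helix, as assumed in the text

def rotation_deg(offset_bp, repeat_bp=HELICAL_REPEAT_BP):
    """Rotation of a site about the helix axis when it lies offset_bp from the singularity."""
    return (360.0 * offset_bp / repeat_bp) % 360.0

def faces_outward(offset_bp, repeat_bp=HELICAL_REPEAT_BP):
    """Assumes the site at the singularity (offset 0) faces outward.

    A site counts as outward if it lies within 90 degrees of that reference direction.
    """
    angle = rotation_deg(offset_bp, repeat_bp)
    return angle < 90.0 or angle > 270.0

for offset in (0, 5, 80, 85, 160):
    print(offset, faces_outward(offset))
```

Sites at 0, 80 and 160 b.p. from the singularity share one orientation, while sites shifted by 5 b.p. are flipped. The same function describes insertions and deletions between compound sites: a 5 b.p. change reverses the orientation of downstream elements, whereas a 10 b.p. change preserves it.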
We have recently investigated this point by comparing hypersensitive regions in the chicken β-globin domain in precursor hematocytoblasts and in progeny nucleated erythrocytes. Our results show that the hypersensitive sites are present in erythrocytes, but not in precursor cells. Thus, the appearance of these hypersensitive sites coincides with the chromosomal activation of the globin genes. This result suggests that, if the hypersensitive sites are the proposed singularities, then they are created by the developmental process and are not, for example, premarked and stable areas of the chromosome.

How is a given singularity created during development? These structures clearly occur at specific DNA sequences. The problem therefore arises of how these sequences are recognized if the 80 b.p. coil is not already phased by some additional, independent singularity, ad infinitum. It is possible that during early development the DNA coils associated with particular chromosomal regions become phased randomly. The chromosomal differences that eventually evolve between cell types would then be generated by the selection (possibly influenced by embryonic gradients) of a particular phasing scheme. This selection would be manifest in the establishment of a singularity. As a consequence, a particular orientation of the 80 b.p. coil would be assembled and locked in over rather long distances, and neighboring compound recognition sites would become properly oriented for subsequent interaction with SRPs. Clearly, some mechanism must exist for propagating these singularities to daughter cells during cell division.

CONCLUSIONS AND PREDICTIONS

In conclusion, a consideration of the way in which DNA is folded in the eukaryotic chromosome has led to the prediction that SRPs will recognize multiple DNA sequences spaced at an average of 80 b.p. intervals along linear DNA.
By concentrating these multiple binding sites into a smaller area, a higher degree of specificity can be achieved. As the DNA coils about the nucleosome, it is known to have an inside and an outside face. Thus, any particular sequence may be facing inward (where it would be inaccessible) or outward (where it could be read). For proper binding (or lack of binding) to occur, it is imperative that these binding sites be phased and oriented so that they are always facing in or out. We propose that singularities exist to perform these functions. At a biochemical level, these singularities may prove to be the hypersensitive chromosomal regions defined by DNase I.

For the purpose of the model presented here, I have assumed the simplest structure for the pathway of the DNA in a higher order chromatin fiber. The exact pathway DNA follows as it traverses the nucleosome core particle is fairly well understood. When the chromatin is in a higher order structure, however, the pathway of the DNA is thought to be understood only for the nucleosome component of this structure. What is not known in detail is the exact relationship of one nucleosome to the next, the angle of the nucleosome with respect to the axis of the fiber and, as a related factor, the exact pathway of the DNA in the spacer region between nucleosome core particles. Many of the specific details of the suggestions presented here depend on several parameters of higher order structure that have yet to be experimentally defined. The general features of the model, however, rely heavily on the reasonably well-established pathway of the DNA in the nucleosome itself. The simplest form of the proposed model fails to account for theories of higher order structure that may not accommodate a regular coiling of the spacer DNA (14,15,16) or a regular positioning of one nucleosome vis-à-vis the next.
As a result, it would be difficult to see how a given singularity could organize the adjacent nucleosomes. However, several laboratories have presented convincing data showing that there are, in fact, definite non-random relationships between one nucleosome and the next (17,18,19). Moreover, it had originally been thought that nucleosomes were not positioned on particular DNA sequences. There is now a growing body of compelling evidence showing that this is not true for tRNA genes (20), for 5S genes (J. Gottesfeld, A. Worcel, personal communications), satellite DNA (21; and A. Varshavsky, personal communication), histone DNA (A. Worcel, personal communication), β-globin DNA (H. Weintraub, manuscript in preparation), and heat shock DNA (C. Wu, personal communication). Thus, whether through the uniform coiling of the spacer DNA or through a regular positioning of one nucleosome with respect to the next, there are presumably mechanisms that can determine how particular nucleosomes are positioned. These mechanisms would provide a given singularity with a means of determining the orientation (facing inward or outward) of particular DNA sequences associated with adjacent nucleosomes.

In its most general form, the model makes a number of specific predictions: that important DNA binding sites will be compound and phased at an average of 80 b.p.; that DNA insertions and deletions of odd multiples of 5 b.p.
(between compound binding sites) will be much more deleterious than multiples of 10 b.p.; that singularities (hypersensitive regions) will be used to orient important binding sites upstream and downstream and would therefore be acting at a distance, perhaps as far as 1 - 5 Kb; that important controlling elements need not necessarily be localized to the 5' side of genes; and that gene activation will require (at least) two dependent events, the establishment of a singularity and the subsequent phasing of adjacent DNA so that important signals can be recognized by SRPs.

ACKNOWLEDGEMENTS

This work was supported by grants from the National Institutes of Health.

REFERENCES

1. Finch, J. T., Lutter, L. C., Rhodes, D., Brown, R. S., Rushton, B., Levitt, M., and Klug, A. (1977) Nature 269, 19-36.
2. Worcel, A. and Benyajati, C. (1977) Cell 12, 83-100.
3. McGhee, J. and Felsenfeld, G. (1980) Ann. Rev. Biochem. 49, 1115-1156.
4. Gilbert, W. (1978) Nature 271, 501.
5. Finch, J. T. and Klug, A. (1976) Proc. Nat. Acad. Sci. USA 73, 1897-1901.
6. Nasmyth, K., Tatchell, K., Hall, B., and Smith, M. (1980) manuscript submitted.
7. Wu, C., Bingham, P., Livak, K., Holmgren, R. and Elgin, S. C. R. (1979) Cell 16, 797-808.
8. Stalder, J., Larsen, A., Engel, J. D., Dolan, M., Groudine, M., and Weintraub, H. (1980) Cell 20, 451-460.
9. Kuo, T., Mandel, J., and Chambon, P. (1979) Nuc. Acids Res. 7, 2105-2113.
10. Varshavsky, A., Sundin, O., and Bohn, M. (1979) Cell 16, 457-466.
11. Waldeck, W., Fohring, B., Chowdhury, K., Grass, D. and Sauer, G. (1978) Proc. Nat. Acad. Sci. USA 75, 5964-5968.
12. Scott, W. A. and Wigmore, D. J. (1978) Cell 15, 1511-1518.
13. Weisbrod, S. and Weintraub, H. (1978) Proc. Nat. Acad. Sci. USA 76, 328-332.
14. Thoma, F., Koller, T., and Klug, A. (1979) J. Cell Biol. 83, 403-427.
15. Suau, P., Bradbury, M., and Bradbury, J. (1979) Europ. J. Biochem. 97, 593-602.
16. Worcel, A., Strogatz, S., and Riley, D. (1980) manuscript in preparation.
17. Lohr, D. and Van Holde, K. E. (1979) Proc. Nat. Acad. Sci. USA 76, 6326-6330.
18. Lohr, D., Tatchell, K. and Van Holde, K. E. (1977) Cell 12, 829-836.
19. Riley, D. and Weintraub, H. (1978) Cell 13, 281-293.
20. Wittig, B. and Wittig, S. (1979) Cell 18, 1173-1183.
21. Musich, P. R., Maio, J. J. and Brown, F. (1977) J. Mol. Biol. 117, 657-677.
22. Dubochet, J. and Noll, M. (1978) Science 202, 280-286.