Volume 8 Number 20 1980
Nucleic Acids Research
Recognition of specific DNA sequences in eukaryotic chromosomes
Harold Weintraub
Hutchinson Cancer Center, Seattle, WA 98104, USA
Received 11 August 1980
ABSTRACT
The packaging of DNA into chromatin probably places certain restrictions
on how specific DNA sequences can be recognized by DNA sequence-specific
recognition proteins (SRP). Several unique features of this type of
interaction are discussed. Specifically, as a consequence of the coiling of
the DNA about a histone core, it is proposed that DNA recognition sites will
be compound and that each element of the compound recognition site will be
about 10 - 20 b.p. in length and distributed at approximately 80 b.p.
intervals, the periodicity of the DNA wrapping around the nucleosome.
INTRODUCTION
It is now very clear that both transcribed and non-transcribed DNA in
eukaryotic cells is packaged into nucleosomes. These nucleosomes are
themselves packaged into so-called 100 Å fibers and these, in turn, into
250 - 300 Å fibers. It is likely that the packaging of DNA into these
chromosomal structures places certain limitations on how specific DNA
sequences can be recognized by those (protein) factors that are presumably
required for transcription, differential gene expression, genetic
recombination, etc. I would like to suggest that, as a result of the folding
of DNA into chromosomes, a recognition site for any given protein-DNA
interaction will have two properties: (1) the recognition site will be
compound and (2) each element of the compound recognition site will be about
10 - 20 b.p. in length and distributed at approximately 80 b.p. intervals.
DNA Sequence Recognition in Chromatin
Figure 1A illustrates one particular model for a 100 Å fiber. The DNA
is coiled into a reasonably regular 80 b.p. supercoil by the wedge-shaped
histone octamer core. As illustrated in the model of Finch et al. (1), each
nucleosome core particle (55 Å high x 110 Å in length) is composed of 2
turns of 80 b.p. each, and the assumption is made that the so-called spacer
or linker DNA between nucleosomes is also packaged into a similar coil as
© IRL Press Limited, 1 Falconberg Court, London W1V 6FG, U.K.
[Figure 1, panels (A) and (B). Labeled dimensions: 55 Å per turn; 110 Å.]
Figure 1. Binding of DNA Sequence Recognizing Proteins (SRP) to
Chromatin. A: The histone core (wedge-shaped discs) folds the DNA into
a uniform coil with 80 b.p. per turn. The dimensions of the nucleosome
and the shape of the core are given by Finch et al. (1); the nucleosome is
55 Å x 110 Å x 110 Å; the repeat is 200 b.p. This particular view is that of
a 100 Å fiber as discussed by Worcel and Benyajati (2). In other models
(14,15), which seem to contradict the face-to-face model proposed by
Dubochet and Noll (22), the nucleosomes may be tilted with respect to each
other in the 100 Å fiber; however, when packaged into the 250 - 300 Å fiber,
their faces become aligned (14) and approach the view shown in the simplified
diagram presented here. The sequence "xyz" represents a palindrome separated
by 80 b.p. but brought to within 30 Å as a result of the nucleosome folding.
The symmetry of the resulting compound binding site would match that of a
symmetrical protein. B: The coil is the same as in (A); however, the DNA
is turned 180° so that the vector associated with "abc" is now pointed inward
towards the center of the coil. As discussed in the text, a "singularity"
(open triangle) placed near DEF or near ABC will determine whether the abc
vector 80 b.p. downstream points inward or outward.
proposed by a number of workers (see, for example, Worcel and Benyajati (2)).
(This assumption is made for simplicity only, since the general argument
being put forth is really independent of this. Additional proposals for the
100 Å fiber will also be considered below.)
The coiling of the DNA in the chromosome fiber means that it is likely
that a given DNA sequence recognizing protein (SRP) does not "read" a
specific contiguous sequence of DNA; on the contrary, as the protein
approaches the chromatin fiber from a particular angle, it would likely "see"
a series of roughly parallel segments of DNA, each about 10 - 20 b.p. long
and, for the most part, spaced at an average center-to-center distance of
about 50 Å with a slight tilt with respect to the main axis of the fiber.
(The exact spacing between the roughly parallel segments of DNA may vary
along the periphery of the nucleosome due to the wedge-shaped structure of
the core (1).) Clearly, the exact details of what the protein might see are
best predicted from the exact structure of DNA in the nucleosome core and in
the nucleosome spacer. Since these parameters have not been fully
established by direct experimental measurements, it would be premature at
present to go into great detail as to what exactly the protein might be
seeing, but it is clear that as the detailed structure of the nucleosome
becomes known, the exact requirements for SRP binding should also become
understood to a similar level of resolution. With respect to Fig. 1, it is
also possible to imagine that the approaching protein may not be oriented
exactly parallel to the axis of the fiber. Thus, the average phasing of
80 b.p. between multiple sites is only an approximation and it is likely
that, depending on the spatial distribution of the multiple DNA binding sites
in a given SRP, some interactions might involve sites spaced at 90 b.p. or
70 b.p., etc. As will be clear from the discussion below, whatever the exact
spacing, it should be very near a multiple of 10 b.p.
Figure 1 also emphasizes the fact that in this structure, almost all of
the DNA is on the outside of the fiber and is probably accessible to DNA
binding proteins, at least over 10 b.p. intervals (see below). The common
notion that only linker DNA is readily accessible to proteins is best
supported by the finding that micrococcal nuclease cleaves these spacer
regions preferentially. This notion is probably incorrect, however, since it
is also well known that DNase I and DNase II (and, at high concentrations,
even micrococcal nuclease) readily cut the DNA associated with the nucleosome
core, and also that the core DNA is readily accessible to alkylating agents
and even to lac repressor when the latter is challenged to bind nucleosomes
reconstituted with lac DNA (see review by McGhee and Felsenfeld (3)).
There are several very general features of the proposed SRP-chromatin
interaction that are interesting. The first deals with the length of DNA
over which a typical interaction can occur. First consider the 100 Å fiber
(Fig. 1). If one imagines a DNA sequence recognizing protein (SRP) of, say,
100 Å in diameter, it is possible that such a protein might interact with at
most 30 b.p. of contiguous, linear DNA (100 Å at roughly 3.4 Å per b.p.). In
contrast, in the context of the proposed model, the same protein could
interact with the DNA associated with almost 2 nucleosomes. This is about
5 coils of DNA. If we guess that about 20 b.p. per coil is positioned in the
proper direction, then a 100 Å protein has the potential to bind as many as
100 b.p. (20 b.p. per coil times 5 coils). Consequently, for a given sized
protein, the free energy of interaction can be made more favorable (and hence
the specificity increased) by concentrating more DNA into a given area,
providing more surface for proper recognition. In this view, each coil would
represent a separate binding sub-domain and therefore the overall interaction
between SRP and DNA would be the product of each successful interaction
between an SRP binding site and a sub-domain. Thus, by compacting DNA binding
sites it is possible to achieve a high degree of specificity without forcing
DNA binding proteins to evolve into the long cylindrical structures required
to interact with long stretches of linear, contiguous DNA, or else altering
the basic design of DNA-protein interactions initially established in simpler
organisms. (Actually, since bacterial DNA is also coiled, this same type of
interaction might also occur for some bacterial SRPs.)
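The arithmetic in the preceding paragraph can be retraced with a short sketch. The figures used (80 b.p. per coil, a 200 b.p. repeat, about 20 usable b.p. per coil, at most 30 b.p. of linear DNA) are the ones quoted in the text and in the legend to Fig. 1; the constant and function names are illustrative assumptions and are not part of the original model.

```python
# Round numbers quoted in the text; all names here are hypothetical.
BP_PER_COIL = 80          # b.p. per turn of the DNA supercoil
REPEAT_BP = 200           # nucleosome repeat length (legend to Fig. 1)
USABLE_BP_PER_COIL = 20   # the text's guess for properly oriented b.p. per coil
LINEAR_CONTACT_BP = 30    # upper bound for a 100 Å protein on linear DNA


def coils_spanned(nucleosomes: float) -> float:
    """Number of 80 b.p. coils contained in the given number of repeats."""
    return nucleosomes * REPEAT_BP / BP_PER_COIL


def contactable_bp(coils: float) -> float:
    """Base pairs presented to the protein if each coil offers ~20 b.p."""
    return coils * USABLE_BP_PER_COIL


coils = coils_spanned(2)                 # "almost 2 nucleosomes"
compound_bp = contactable_bp(coils)
print(coils)                             # 5.0
print(compound_bp)                       # 100.0
print(compound_bp / LINEAR_CONTACT_BP)   # ratio of compound to linear contact
```

Under these assumed round numbers the compound site offers roughly three times the contact length available on contiguous linear DNA.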
In contrast to the previous calculation, it should be emphasized that
most of the chromatin in higher cells is present in the nucleus as a
250 - 300 Å fiber. One basic view of how nucleosomes are packaged in such a
fiber has been proposed by a number of laboratories. For simplicity and as an
example, if one considers a solenoidal packing of nucleosomes with
6 nucleosomes per turn of the solenoid, then it is not unreasonable that a
100 Å protein could span a compound binding site extending as far as
24 nucleosomes or almost 5 Kb. Possibly many of the inverted repeats
separated by long intervals in eukaryotic DNA (the sequence "XYZ" in Fig. 1A)
could be brought adjacent to each other by this type of folding and hence
form a compound recognition site for a symmetrical protein, as has also been
suggested by the elegant work of Nasmyth et al. (6).
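As a rough check on the figure of almost 5 Kb, the following sketch converts a span of nucleosomes into linear DNA length, solenoid turns, and 80 b.p. coils. It assumes the 200 b.p. repeat given in the legend to Fig. 1 and the 6 nucleosomes per turn used as an example above; the helper name is hypothetical.

```python
# Values taken from the text and the legend to Fig. 1; names are illustrative.
REPEAT_BP = 200             # b.p. per nucleosome repeat
NUCLEOSOMES_PER_TURN = 6    # example solenoid packing
BP_PER_COIL = 80            # b.p. per turn of the DNA supercoil


def solenoid_span(nucleosomes: int) -> dict:
    """Express a run of nucleosomes as linear b.p., solenoid turns and coils."""
    linear_bp = nucleosomes * REPEAT_BP
    return {
        "linear_bp": linear_bp,
        "solenoid_turns": nucleosomes / NUCLEOSOMES_PER_TURN,
        "dna_coils": linear_bp / BP_PER_COIL,
    }


span = solenoid_span(24)
print(span)  # {'linear_bp': 4800, 'solenoid_turns': 4.0, 'dna_coils': 60.0}
```

With these assumptions, 24 nucleosomes correspond to 4800 b.p., which is the "almost 5 Kb" quoted above.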
The use of phased, compound recognition sites has additional evolutionary
advantages. The argument here is analogous to that proposed by Gilbert (4)
for the exons and introns comprising eukaryotic transcription units. Thus,
since each coil of DNA represents a separate domain containing a sub-domain
of 10 - 20 b.p. of DNA that interacts with SRPs (the "exonic" region) and
60 - 70 b.p. that do not (the "intronic" region), it is possible to imagine,
as in the case of true exons and introns, that when a particularly effective
sub-domain evolves, rapid recombination within the intronic regions can occur,
allowing for reassortment of the functional domain for SRP binding.
Orientation of Binding Sites
The previous estimate illustrates a very important limitation that a
recognition mechanism must encounter. In a solenoid (or in any higher order
structure) not all of the DNA will be available; thus, some of the DNA in
each nucleosome faces to the inside and some to the outside of the solenoid.
Consequently, for proper recognition to occur, two criteria must be met.
First, compound recognition sites in the DNA must be placed at specific
distances along linear DNA in a way that is compatible with the underlying
chromosomal structure and, second, these sites must be properly oriented (all
pointed inside or all pointed outside). Interestingly, because of the
helical nature of DNA, the same problem occurs even when only a single
nucleosome is considered. I shall deal with this specifically.
Figure 1A shows the DNA wrapped about a histone core. A specific
sequence (for example "abc" in 1A) on the DNA is maximally accessible only
from one orientation (e.g., the large groove). Thus, if such a sequence is to
be read with maximal efficiency it must be pointing upward, away from the
center of the cylinder (arrow), and not downward (Fig. 1B). What will
determine the direction of the "abc" vector? Clearly, specific features of the
structure itself are important (the exact path of the DNA around the core,
histone-induced variations of the DNA twist, the pitch of the supercoiled DNA,
the exact path of the spacer DNA, etc.). However, one special feature will be
the exact point where the coil begins. For convenience we call such a point
a "singularity" (this term was suggested by A. Varshavsky) and depict such
a singularity by the triangles in Fig. 1A, B.
In comparing these two diagrams, a singularity at ABC (Fig. 1A) results
in the abc vector pointing out, while a singularity at DEF (5 b.p. away from
ABC) causes the abc vector to point inward (Fig. 1B). This follows because
the DNA is a double helix with about 10 b.p. per turn, so that a shift of
5 b.p. corresponds to half a turn, or 180°. Thus, rotations of 5 b.p. (or
deletions or insertions of 5 b.p.) will have marked effects on how SRPs
interact with chromatin. Clearly, without a singularity there will be
a continuum of orientations of the abc vector, though the interaction with
the histone core might place some restraint on this, perhaps digitalizing
the possibilities to unit base pair intervals.
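The dependence of orientation on the position of the singularity can be made concrete with a minimal sketch. It assumes exactly 10 b.p. per helical turn (the text says only "about 10"), a rigid helix with no histone-induced change in twist, and the convention that a site in phase with the singularity points outward. All names are hypothetical.

```python
import math

HELICAL_REPEAT_BP = 10.0  # assumed exact; the text says "about 10 b.p. per turn"


def rotation_deg(site_bp: float, singularity_bp: float,
                 helical_repeat: float = HELICAL_REPEAT_BP) -> float:
    """Rotation of a site about the helix axis, relative to the singularity."""
    offset = (site_bp - singularity_bp) % helical_repeat
    return 360.0 * offset / helical_repeat


def faces_outward(site_bp: float, singularity_bp: float) -> bool:
    """Assumed convention: zero rotation relative to the singularity = outward."""
    return math.cos(math.radians(rotation_deg(site_bp, singularity_bp))) > 0


# A site 80 b.p. downstream of a singularity placed at position 0 ("ABC").
print(rotation_deg(80, 0), faces_outward(80, 0))   # 0.0 True
# Moving the singularity by 5 b.p. ("DEF") turns the same site by half a turn.
print(rotation_deg(80, 5), faces_outward(80, 5))   # 180.0 False
```

In this simplified picture, sites spaced at multiples of 10 b.p. from the singularity (80 b.p., 160 b.p., and so on) share its orientation, which is why a 5 b.p. displacement of the singularity inverts all of them together.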
Singularities and Hypersensitive Sites
What is the nature of the proposed singularities? For a number of
genes, regions of DNase I hypersensitivity have been described
(7,8,9,10,11,12). The genes include the SV 40 minichromosome, Drosophila
heat shock genes, actin, α- and β-globin, RAV-0, integrated adenovirus,
integrated polyoma, and conalbumin. These regions of hypersensitivity are
rather precise points in the chromosome that are cleaved by DNase I. They
tend to be localized near the 5' side of these genes but they are also
detected in other regions as well. They are visualized experimentally by
digesting nuclei with DNase I so that the average size of the DNA is about
10 - 15 Kb. The DNA is purified, redigested with a restriction enzyme,
separated on agarose gels, blotted, and hybridized to a particular probe. As
a result of the DNase I treatment, specific, discrete sub-bands are produced
in addition to the original parental fragments produced by restriction alone.
This type of point-site hypersensitivity is clearly distinguished from the
preferential sensitivity of transcribed genes induced by HMG 14 and 17 (13)
in that the latter occurs in each nucleosome all along the length of a
transcription unit.
We think it likely that the regions of DNase I hypersensitivity, which
are also tissue specific, are regions that are used to define a singularity
as discussed above. The biochemical basis for the hypersensitivity is for
the most part not known; however, in the case of the SV 40 minichromosome,
one suspects that it is determined in some way by T-Ag since it occurs very
near the T-Ag binding site. By establishing a specific point, these
singularities define the DNA coil (both upstream and downstream). As a
result, important DNA sequences come into register (and proper orientation)
every 80 b.p. and they can now be recognized (or not recognized, depending on
their orientation) by SRPs. These proposed recognition events, which would
occur downstream and upstream from the singularities, may be used to initiate
transcription and replication (and recombination) and also to control genes
that are differentially regulated in different tissues. By orienting
adjacent sites inward, singularities might also be used to prevent proper
recognition of adjacent multiple binding sites. Clearly, a distinguishing
feature of this type of control is that two events are required: the
establishment of a singularity and the subsequent orientation of the 80 b.p.
coil. Both of these events would occur at nearby, but different, chromosomal
sites. In a sense, the proposed singularities could be thought of as
producing "position-effects."
A crucial question is how the singularities are established in the first
place and what their biochemical basis is. Perhaps a more accessible
question, if these proposed singularities prove to be regions of DNase I
hypersensitivity, is whether these hypersensitive regions are present before
a gene is activated during development or not. We have recently investigated
this point, comparing hypersensitive regions in the chicken β-globin domain in
precursor hematocytoblasts and in progeny nucleated erythrocytes. Our results
show that the hypersensitive sites are present in erythrocytes, but not in
precursor cells. Thus, the appearance of these hypersensitive sites
coincides with the chromosomal activation of the globin genes. This result
suggests that if the hypersensitive sites are the proposed singularities,
then they are created by the developmental process and are not, for example,
premarked and stable areas of the chromosome.
How is a given singularity created during development? Since these
structures clearly occur at specific DNA sequences, the problem arises of how
these sequences are recognized if the 80 b.p. coil is not already phased by
some additional, independent singularity, ad infinitum. It is possible that
during early development the DNA coils associated with particular chromosomal
regions become phased randomly and the chromosomal differences that eventually
evolve between cell types are generated by the selection (possibly influenced
by embryonic gradients) of a particular phasing scheme. This selection
would be manifest in the establishment of a singularity. As a consequence, a
particular orientation of the 80 b.p. coil would be assembled and locked in
over rather long distances and neighboring compound recognition sites would
become properly oriented for subsequent interaction with SRPs. Clearly, some
mechanism must exist for propagating these singularities to daughter cells
during cell division.
CONCLUSIONS AND PREDICTIONS
In conclusion, a consideration of the way in which DNA is folded in the
eukaryotic chromosome has led to the prediction that SRPs will recognize
multiple DNA sequences spaced at an average of 80 b.p. intervals along linear
DNA. By concentrating these multiple binding sites into a smaller area a
higher degree of specificity can be achieved. As the DNA coils about the
nucleosome, it is known to have an inside and an outside face. Thus, any
particular sequence may be facing inward (where it would be inaccessible)
or outward (where it could be read). For proper binding (or lack of binding)
to occur it is imperative that these binding sites be phased and oriented so
that they are always facing in or out. We propose that singularities exist
to perform these functions and, at a biochemical level, these singularities
may prove to be the hypersensitive chromosomal regions defined by DNase I.
For the purpose of the model presented here, I have assumed the simplest
structure for the pathway of the DNA in a higher order chromatin fiber. While
the exact pathway DNA follows as it traverses the nucleosome core particle is
fairly well understood, the pathway of the DNA when the chromatin is in a
higher order structure is thought to be understood only for the nucleosome
component of this structure. What is not known in detail is the exact
relationship of one nucleosome to the next, the angle of the nucleosome with
respect to the axis of the fiber, and, as a related factor, the exact pathway
of the DNA in the spacer region between nucleosome core particles. Many
of the specific details of the suggestions presented here depend on several
parameters of higher order structure that have yet to be experimentally
defined; however, the general features of the model rely heavily on the
reasonably well-established pathway of the DNA in the nucleosome itself.
The simplest form of the proposed model fails to account for theories of
higher order structure that may not accommodate a regular coiling of the
spacer DNA (14,15,16) or a regular positioning of one nucleosome vis-à-vis
the next. As a result it would be difficult to see how a given singularity
could organize the adjacent nucleosomes. However, several laboratories have
presented convincing data showing that there are, in fact, definite
non-random relationships between one nucleosome and the next (17,18,19).
Moreover, while it had originally been thought that nucleosomes were not
positioned on particular DNA sequences, there is now a growing body of
compelling evidence showing that this is not true for tRNA genes (20), for
5S genes (J. Gottesfeld, A. Worcel, personal communications), satellite
DNA (21) (and A. Varshavsky, personal communication), histone DNA
(A. Worcel, personal communication), beta-globin DNA (H. Weintraub,
manuscript in preparation), and heat shock DNA (C. Wu, personal
communication). Thus, whether it be the uniform coiling of the spacer DNA
or a regular positioning of one nucleosome with respect to the next, there
are presumably mechanisms that can determine how particular nucleosomes are
positioned and hence provide a given singularity a mechanism for determining
the orientation (facing inward or outward) of particular DNA sequences
associated with adjacent nucleosomes.
In its most general form, the model makes a number of specific
predictions: that important DNA binding sites will be compound and phased at
an average of 80 b.p.; that DNA insertions and deletions of multiples of
5 b.p. (between compound binding sites) will be much more deleterious than
multiples of 10 b.p.; that singularities (hypersensitive regions) will be
used to orient important binding sites upstream and downstream and would
therefore be acting at a distance, perhaps as far as 1 - 5 Kb; that important
controlling elements need not necessarily be localized to the 5' side of
genes; and that gene activation will require (at least) two dependent events,
the establishment of a singularity and the subsequent phasing of adjacent DNA
so that important signals can be recognized by SRPs.
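The prediction concerning insertions and deletions can be illustrated in the same hedged spirit. The sketch below assumes exactly 10 b.p. per helical turn and two sub-sites that start out on the same face of the helix; it reports how far an insertion (positive) or deletion (negative) rotates one sub-site relative to the other. Since 10 is itself a multiple of 5, the contrast drawn in the text is read here as one between odd multiples of 5 b.p. and multiples of 10 b.p. The function name is an assumption made for illustration.

```python
HELICAL_REPEAT_BP = 10  # assumed exact; the text says "about 10 b.p. per turn"


def misalignment_deg(indel_bp: int,
                     helical_repeat: int = HELICAL_REPEAT_BP) -> float:
    """Angular offset (0-180 degrees) introduced between two sub-sites
    by inserting (positive) or deleting (negative) indel_bp base pairs."""
    rotation = 360.0 * (indel_bp % helical_repeat) / helical_repeat
    # A rotation of r degrees is equivalent to 360 - r in the other sense.
    return min(rotation, 360.0 - rotation)


for indel in (5, 10, 15, 20, -5, 3):
    print(indel, misalignment_deg(indel))
```

Under these assumptions, changes of 10 or 20 b.p. leave the two sub-sites on the same face, whereas changes of 5 or 15 b.p. place them on opposite faces, the least favorable arrangement for a protein that must contact both.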
ACKNOWLEDGEMENTS
This work was supported by grants from the National Institutes of
Health.
REFERENCES
1. Finch, J. T., Lutter, L. C., Rhodes, D., Brown, R. S., Rushton, B.,
Levitt, M., and Klug, A. (1977) Nature 269, 29-36.
2. Worcel, A. and Benyajati, C. (1977) Cell 12, 83-100.
3. McGhee, J. and Felsenfeld, G. (1980) Ann. Rev. Biochem. 49, 1115-1156.
4. Gilbert, W. (1978) Nature 271, 501.
5. Finch, J. T. and Klug, A. (1976) Proc. Nat. Acad. Sci. USA 73, 1897-1901.
6. Nasmyth, K., Tatchell, K., Hall, B., and Smith, M. (1980) manuscript
submitted.
7. Wu, C., Bingham, P., Livak, K., Holmgreen, R. and Elgin, S. C. R. (1979)
Cell 16, 797-808.
8. Stalder, J., Larsen, A., Engel, J. D., Dolan, M., Groudine, M., and
Weintraub, H. (1980) Cell 20, 451-460.
9. Kuo, T., Mandel, J., and Chambon, P. (1979) Nuc. Acid Res. 7, 2105-2113.
10. Varshavsky, A., Sundin, O., and Bohn, M. (1979) Cell 16, 457-466.
11. Waldeck, W., Fohring, B., Chowdhury, K., Grass, D. and Sauer, G. (1978)
Proc. Nat. Acad. Sci. USA 75, 5964-5968.
12. Scott, W. A. and Wigmore, D. J. (1978) Cell 15, 1511-1518.
13. Weisbrod, S. and Weintraub, H. (1978) Proc. Nat. Acad. Sci. USA 76,
328-332.
14. Thoma, F., Koller, T., and Klug, A. (1979) J. Cell Biol. 83, 403-427.
15. Suau, P., Bradbury, M., and Bradbury, J. (1979) Europ. J. Biochem. 97,
593-602.
16. Worcel, A., Strogatz, S., and Riley, D. (1980) manuscript in preparation.
17. Lohr, D. and Van Holde, K. E. (1979) Proc. Nat. Acad. Sci. USA 76,
6326-6330.
18. Lohr, D., Tatchell, K. and Van Holde, K. E. (1977) Cell 12, 829-836.
19. Riley, D. and Weintraub, H. (1978) Cell 13, 281-293.
20. Wittig, B. and Wittig, S. (1979) Cell 18, 1173-1183.
21. Musich, P. R., Maio, J. J. and Brown, F. (1977) J. Mol. Biol. 117,
657-677.
22. Dubochet, J. and Noll, M. (1978) Science 202, 280-286.