The YYRR box: a conserved dipyrimidine

Volume 16 Number 8 1988
Nucleic Acids Research
The YYRR box: a conserved dipyrimidine—djpurine sequence element in Drosophila and other
eukaryotes
Douglas Cavener, Yue Feng, Brian Foster, Philip Krasney, Michael Murtha, Christopher Schonbaum
and Xiao Xiao
Department of Molecular Biology, Vanderbilt University, Nashville, TN 37235, USA
Received December 2, 1987; Revised and Accepted March 9, 1988
ABSTRACT
We have discovered a novel DNA sequence element In Dro3ophlla which I s
based upon a CTGA tandem repeat. This element has been named the YYRR box to
emphasize i t s dlpyrimldine-dipurine nature which i s predicted to nave unusual
structural features. Southern hybridization analysis of genomic DNA indicates
the presence of 25-30 copies of the YYRR box in each of three Drosophila
species (melanogaster, pseudoobscura. and v i r l l l s ) and conservation of genomic
location within species. Similar analysis of human and rat DNA indicates the
presence of YYRR related sequences in mammals as well. YYRR boxes have been
localized to two genetic loci in Drosophila: Gld and a gene tentative
identified as ted. These two genes exhibit correlated patterns of
developmental expression and an identical mutant phenotype. Sequence analysis
of the Gld YYRR box in three Drosophila species revealed a high degree of
conservation despite i t s intronlc location.
INTRODUCTION
Recent refinements in models of DNA structure and results of X-ray
diffraction studies have stimulated a number of investigations into the
structural and functional
conformation of DNA (1-9).
consequences of sequence dependent variation in the
Calladine (1) proposed that large distortions In
the archetypical 3-DNA conformation are generated as a consequence of avoiding
s t e r l c hindrance of nearest^nelghbor purlnes on complementary DNA strands.
Furthermore, he suggested four alterations in the DNA duplex structure which
could potentially alleviate the purlne-purlne clash.
Based upon the
assumption that these alterations would be compensated by nearest-neighbor
nudeotldes, a simple quantitative model (denoted Calladine's rules) was
proposed to describe the relative changes in these helix parameters (2).
Dickerson and coworkers have shown that t h i s model is able to predict such
alterations in short ollgonucleotlde crystals (2).
Tung and Harvey (8) have
recently developed a more elaborate theoretical model to estimate absolute
values for sequence dependent conformatlonal parameters.
Whether these two
theoretical models are directly applicable to DNA in vivo is certainly not
3375
Nucleic Acids Research
clear.
However, considerable evidence e x i s t s that a l t e r a t i o n s In the
conformation of B-DNA strongly Influence the recognition s i t e s of various
endonucleases ( e . g . DNAase I)
(6,10-13)
He have discovered a conserved multicopy DNA sequence element In
Droaophlla which has the highest possible variance In the helix twist angle
allowed by C a l l a d i n e ' s r u l e s . This sequence I s 50-72 bp In length and i s
based upon a tetranucleotlde dlpyrlmidine-dlpurlne motif. We have named t h i s
sequence the YYRR (pronounced wire) box t o emphasize I t s dipyrlmidine-dlpurine
nature. Two of the YYRR boxes have been localized within or near the
developmentally r e l a t e d Droaophlla genes, glucose dehydrogenase (Gld) and a
t r a n s c r i p t i o n unit t e n t a t i v e l y Identified as trapped ( t e d ) . Gld and ted
exhibit an i d e n t i c a l mutant phenotype and are t i g h t l y linked (14,15).
Furthermore, the expression of the mRNAs from Gld and the putative ted gene
exhibit highly correlated developmental p a t t e r n s . Their correlated pattern of
expression may be r e l a t e d to the presence of the YYRR boxes.
MATERIALS AND METHODS
D r o s o p h l l a DNA c l o n e s
Genomic DNA c l o n e s c o n t a i n i n g t h e Gld and t e d genes for p_;_ m e l a n o g a a t e r
were p r e v i o u s l y d e s c r i b e d ( 1 4 ) .
These c l o n e s were I s o l a t e d from an EMBL-4
lambda l i b r a r y of t h e Oregon R. s t r a i n .
An a d d i t i o n a l
ted c l o n e was i s o l a t e d
i n t h i s s t u d y from a Charon 4 lambda l i b r a r y of t h e Canton S s t r a i n .
The D^_
pseudoob3cura and p_;_ v l r l l l a Gld c l o n e s were i s o l a t e d from an EMBL lambda
l i b r a r y and a Charon 30 l i b r a r y , r e s p e c t i v e l y , via standard plaque
hybridization techniques using D melanogaster Gld clones as the hybridization
probes. The YYRR r e l a t e d sequences found in the atranded-at-second (sas) gene
and the Antennapedla Complex (ANT-C) were Identified from a s e r i e s of genomic
clones from two chromosome walks i n the 84BC region (14,16) using the 16-mer
GACT ollgonucleotlde probe desrlbed below.
DNA hybridization
Two s y n t h e t i c ollgonucleotldes were used t o screen Southern blots of
various genomic clones and genomic DNAa for the YYRR sequences: a 16 mer,
d(GACT)i| and a 32-mer, d(GACT)g. These will be referred t o as CACT-16 and
CACT'*32, r e s p e c t i v e l y . Hybridization experiments with 32P end-labelled
GACT-16 used the following conditions: 5X SSC, 0.1J SDS, 0.01J heparln, and
20J fonnamlde at 25° C. Blots were washed in 2X SCC-0.2J SDS at 25° C.
Hybridization experiments with GACT-32 were conducted at 42° C using one of
3376
Nucleic Acids Research
the following h y b r i d i z a t i o n buffers:
heparln and 50* forraamlde:
(Buffer A) 5X SSC, 0 . 1 * SDS, 0.01J
(Buffer B) 0.25 M 3odiura phosphate, pH 7, 0.3 M
NaCl, 0.01 M EDTA, 7* SDS, and 50J formamide.
Blots were washed in 2X SSCH). 2*
SSC-0.2JSDS f i r s t at 25° C and then a t 60° C.
DNA cloning and sequencing
Nested d e l e t i o n s of several DNA clones described below were constructed
using t h e Henikoff ExoIII method (17).
D melanogaster Gld YYRR box:
A series
of ExoIII d e l e t i o n s were constructed In pEG25, a pEMBLi8 3ubclone of a 7 kb
Sal-Xba fragment from i n t r o n - I . D. melanogaster ted YYRR box:
fragments
(pCTtO) from genomlc clones of Oregon-R and Canton-S were subcloned
I n t o the Bluescript vector (Stratagene, I n c . )
sequence:
S'lObp AccI
D. melanogaster sas YYRR
A s e r i e s of ExoIII deletions were constructed in pCS58, a
B l u e s c r i p t subclone of a 2.5 kb EcoRI fragment from l n t r o n - I of t h e sas gene.
D. melanogaster ANT-C YYRR sequence: a s e r i e s of ExoIII deletions were
constructed in pCX62, a B l u e s c r i p t subclone of a 2.4 kb EcoRI fragment located
between the divergent promoters of the ANT-C genes Sex combs reduced (Scr) and
fushl tarazu ( f t z )
(Dr. T. C. Kaufman, personal communication).
pseudoobscura Gld YYRR box:
D.
A s e r i e s of ExoIII d e l e t i o n s were constructed in
pBy2.1, a Bluescrlbe ( S t r a t a g e n e , I n c . ) subclone of a 2.1 kb S s t I fragment
from i n t r o n I .
D v l r i l l s Gld YYRR box:
subcloned i n t o B l u e s c r i p t (pCG79fl).
a 350 bp PvuII-SstI fragment was
After verifying t h e l o c a t i o n of YYRR
sequences within the subclones and deletion mutants using Southern a n a l y s i s ,
t h e i r DNA sequences were determined using the Sanger dldeoxy sequencing method
(18).
Analysis of the DNA sequences was performed on a Compaq 286 computer
using the IBI P u s t e l l DHA sequence a n a l y s i s programs.
RESULTS
D. melanogaster Gld YYRR box
DNA sequence a n a l y s i s of the 8 kb intron s e p a r a t i n g the s t a r t s i t e of
t r a n s c r i p t i o n from t h e s t a r t codon of Gld revealed an unusual 72 bp
t e t r a n u c l e o t l d e tandem repeat element ( F l g . 1 ) .
sequence CTGA.
This repeat 13 based upon the
Single n u d e o t l d e differences in the CTGA motif e x i s t In seven
of the eighteen r e p e a t s .
I n t e r e s t i n g l y , only two of these differences
the dipyrlmidine-dipurine (YYRR) nature of the r e p e a t .
disrupt
Although t h e s e two
repeats (YYYR and RRRY) a l s o destroy the 3lmple dyad symmetry (YYRR/RRYY),
they give r i s e t o a higher order palindromlc p a t t e r n (YY R YYY RR / YY RRR Y
RR).
Because YYRR sequences have a high d e n s i t y of YR and RY base s t e p s , high
l e v e l s of v a r i a t i o n were found for a l l four h e l i x parameters ( h e l i x twist
3377
Nucleic Acids Research
D. melanooaster Gld YYRR BOX
CTGA CTGG CTGA CTGG CTGA CTGG CTOA CTOA CTOC CTGA CCOA QTQA CTGA CTGA CTQA CTOA CTAA CTOA
(YYRR)s YY R YYY RR | YY RFIR Y RR (YYRR)6
2 1 3
2
2 3
12
D. pseudoobscura Gld YYRR BOX
CTGA CTGA CTGA CTOA CTGG CTGA CTGA CTOA CTCA CTAA CTQA CTGA TTOA CTQA CTGG CAGA CTGA
(YYRR)8 YYY R (YY | RR)e Y RRR YYRR
3 1 2
2
1 3
D. virilis Gld YYRR BOX
AA CTGA CTCA CTGA CTCA CTQA CTGA CTGA CTGA CGGA CTGA CAGA CTGA
RR YYRR YYY R YYRR YYY R (YY | RR)4 Y RRR YYRR Y RRR YYRR
3 1 2 2
3 1 2
2 1 3 2 2 1 3
D. melanogaster ted YYRR BOX
AA CTGA CTGA CTGT CTCA CTGC CTCA CTGA CTQA CTGA CTGA CTQC CTGA ATGA CTGA CTGA CTAA CTGA
(RRYY)3 R Y4 R YY R Y4 R (YYRR)4 Y2 R Y3 | R3 Y (RRYY)4 RR
2
14 1 2 14 1 2
2 13
3 1 2
Figure 1. Primary DNA sequences and derived symmetries of Gld and ted YYRR
boxes. All of the sequence symmetries observed for the Gld gene are
pallndromlo. The axis of the dyad 3ummetry I s indicated by "
"In the second
l i n e for each sequence. A direct repeat Is observed In the ted YSSR box. The
ted YYRR box was sequenced In both the Canton-S and Oregon-R s t r a i n s of £.
melanogaster and found t o be I d e n t i c a l .
angle, base-plane r o l l angle, main-chain torsion angle, and propeller twist
angle) as calculated by the sum rules of Dlckerson ( 2 ) . According t o these
r u l e s , a l t e r a t i o n s In the helix twist angle and base-plane r o l l angle are
compensated for by opposite changes in the same angles In adjacent base p a l r 3 .
The YYRR sequence i s unique i n t h i s regard because i t contains the highest
possible packing of YR and RY steps which generate the clash and YY and RR
steps which compensate for the clash. As a consequence, YYRR tandem repeats
generate the highest possible variance In the helix t w i s t angle as computed by
C a l l a d l n e ' s r u l e s . The expected deviations and variance in the helix twi3t
angle for the Gld YYRR box and flanking sequence are depicted in Figure 2.
The central symmetrical pattern In Figure 2a is due t o the higher order
3378
Nucleic Acids Research
GTATCTACGGTATTGTTCACACTGACTGGCTGACTG6CTeACTG6CTGACTQACT0CCT6ATTGAGTGACTGACT0ACTa*CTGACTAACTGACAGGCAQCTCA
(a)
O
30
30 TO 90
IK) 130 150 170 190 210 230 250 270 290 310 330 350 370 390
(b)
Figure 2. Helix twist plot for the D. melanogaster Gld YYRR box.
(a) The expected deviations in the helix twi3t angle were calculated using
Caladlne's r u l e s . Zero i s approximately equal to an angle of 36° ( i . e . the
mean angle for B*DNA). Each unit i s approximately equal to 2° (Dickerson,
1983). Thus, the predicted range in the helix twi3t angle i s 28° t o 12".
(b) The variances of the twist angle deviations, V(TAD), are plotted for the
Gld YYRR box and flanking sequences. Each data point represents the variance
calculated from ten adjacent angles. The two highest peaks in the variance
represents the YYRR box (basesi60-230). The intervening valley between these
two peaks is caused by the pallndromic sequence of the YYRR box which has a
lower density of purine-purine clash than the simple dipyrlmidlne-dlpurlne
motif.
pallndromic sequence described above. The t i g h t packing of helix t w i s t
deviations In the YYRR box leads to a two fold higher variance in t h i s
parameter than that observed in flanking sequences (Figure 2b). Using the
methods of Tung and Harvey (8) the following s e r i e s of helix twist angle i s
predicted for the basic CTGA tandem motif CT-34.9 0 , TG-27.7", GA-36.6 0 , and
AC-36.4 0 . Although both models predict a similar sharp reduction for the TG
step twist angle, the Tung-Harvey model doe3 not predict the large positive
3379
Nucleic Acids Research
D. melanogaster
a
tsi
:»
:»
HI
HI
i
i
i
t
t
««
i
«•
i
sio
i
»•
i
\ .
^
\
>••:
YYRR BOX...
i •;
^
401
v.
5Qbp
v
D. melanogaster
100
250
100
Wi
YYRR BOX
I -
\
\:
50 bp
3380
Nucleic Acids Research
D. pseudoobscuro
SI
10
h
:
ISt
to
rRR BO:<
!0t
\
»
! \
1(1
50 bf>
:
ISO
Figure 3. Homology matrices for Gld YYRR boxes In three Prosophlla s p e c i e s .
The Pustell - IBI DNA Sequence Analysis program was used to compare the
nucleotide sequence of D. melanogaster. D. pseudoobscura. and D. v l r l l l 3 . (a)
melanogaster & pseudoobscura. (b) melanog a s t e r & v l r l l l 3 , (c) pseudoobscura &
v l r l l l s . Homology between sequences i s indicated by l e t t e r s which represent
the degree of sequence s i m i l a r i t y within a centrally-weighted region of 31
nucleotidea. Sequence s i m i l a r i t y ranges from 100J ( l e t t e r A) t o a lower
cutoff of 651 ( l e t t e r S). The t h r e e sequences containing the YYRR boxes are
a l l located within the large 5' lntron of the Gld gene. The o r i e n t a t i o n (top
to bottom and l e f t to r i g h t ) corresponds t o the 5' to 3' o r i e n t a t i o n of the
Gld transcription unit.
deviations i n the CT and 3tep angles. Consequently the helix twist variance
within the YYRR box i s not greater than variance in the sequences flanking the
YYRR box as calculated by the Tung-Harvey method (data not shown) in contrast
to Figure 2b (Calladine helix t w i s t variance).
Gld YYRR box in P. pseudoobscura and P. v l r l l l s
Genomlc PNA clones of the D^_ melanogaster Gld gene were used as
hybridization probes to i s o l a t e the homologous clones l n D ^ v l r l l l 3 and P_^
pseudoobscura from genomic PNA l i b r a r i e s (Krasney and Cavener, unpublished
d a t a ) . YYRR boxes were found t o e x i s t in the homologous regions of the Gld
3381
Nucleic Acids Research
genes of t h e s e two species as well. The sizes of the aid YYRR boxes are
similar in the three species (£. melanogaster - 72 bp. D. pseudoobscura - 68
bp. and D. v l r l l l 3 - 50 bp). A t o t a l of eight mismatches t o the dipyrlmldinedlpurlne motif Is observed among the t h r e e Gld YYRR boxe3, and a l l of these
are part of higher order pallndromlc sequences. However, each one of the YYRR
boxes contains a different pallndroralc p a t t e r n . One common feature of the
palindromes 13 that trlpyrlmidines are found t o the l e f t of the axis of
symmetry whereas t r l p u r l n e s are found t o the r i g h t . Also, a trlpyrimldine i s
present in the same position of a l l t h r e e r e l a t i v e t o the 3' boundary of the
YYRR box.
The Pustell-IBI DNA Sequence Analysis program was used t o examine the
conservation of the sequences flanking t h e Gld YYRR box in the three
Drosophlla s p e c i e s . The sequences Immediately downstream of the Gld YYRR
boxes are highly conserved among a l l t h r e e species (Fig. 3). I n t e r e s t i n g l y ,
t h i s region I s responsible for the low variance In the helix twist angle
immediately t o the l e f t of the YYRR box in Figure 2b. A more extensive region
of horaology i s observed in the 300 bp region surrounding the YYRR box between
D. melanogaster and £. pseudoobscura. I t i s Important to note that we have
examined t h i s i t r o n i c region in £. melanogaster and find no evidence for
protein coding regions.
YYRR sequences in other regions
Two ollgonucleotlde probes, GACT-16 and GACT-32 were used t o detect YYRR
sequences among several previously cloned D. melanogaster genes. GACT-16 and
GACT-32 are four and eight tandem repeats of GACT, r e s p e c t i v e l y . We chose
hybridization conditions which would allow for approximately two mismatches
using GACT-16 and five mismatches using GACT-32. All DNA sequences examined
were probed in p a r a l l e l with t h e two o l i g o n u d e o t i d e s . We have chosen to use
the general term YYRR sequence to denote sequences which hybridize with at
l e a s t the GACT-16 probe and t h e s p e c i f i c term YYRR box to denote those
sequences which a l s o hybridize with the GACT-32 probe.
The following regions of the genome were examined for YYRR sequencs by
probing Southern blots containing genomlc DNA clones: (a) fWC region
containing nine genes including Gld, 160 kb; (b) 8tAB region which contains
the Antennapedla Complex, 320 kb; (16), (c) 89E region which contains the
Blthorax Complex, 100 kb (19.20), (d) 25F-26A region which contains the
q-glycerophosphate dehydrogenase and B-galactosldase genes as well as several
other genes, 195 kb (D. Knlpple & R. Maclntyre, personal communication): and
(e) Ikt region which contains the E7t gene, 100 kb (C. Thummel & D. Hogness,
3382
Nucleic Acids Research
a
12
3 4 5 6
7 8 9 10 II 12 13 14
7.0-
b
sas
YYRR Sequence
I
ted
Gld
YYRR Box
„ \
YYRR Box
„
\
Figure t. YYRR sequences in the IWC region of D. melanogaster. (a) Overlapping
genomlc DNA clones (152 kb) containing the e n t i r e 84c region of the right arm
of the t h i r d chromosome (Cavener, et a l , 1986a) were hybridized with the
GACT^I 6 probe. Clone 6 contains the YYRR sequence within an intron of the sas
gene. Clones 12 and 14 contain the ted and Gld YYRR boxes, respectively, (b)
The location of the YYRR sequences/boxes r e l a t i v e t o the t r a n s c r i p t maps of
sas, ted and Gld. The thick l i n e s represent the approximate location of
exonlc sequences. The precise boundaries for the exonic sequences have only
been determined for exons I , I I , and I I I for the Gld gene.
personal communication). Using the GACT-1 6 probe, a t o t a l of 15 YYRR
sequences were detected among the 1.2 megabases of DNA examined. The r e s u l t s
for the 84C region are shown in Figure 1. Only two of the 15 YYRR sequences
were also detected by the GACT-32 probe. Besides the YYRR box at the Gld
locus the other YYRR box detected by GACT-32 maps e i t h e r within or near the
trapped (ted) locus, ted and Gld are t i g h t l y linked and reside within a 100
kb region. (11,15). The location of the _ted YYRR box r e l a t i v e t o the ted
t r a n s c r i p t i o n unit has not been precisely determined. Preliminary evidence
Indicates that t h i s YYRR box i s most l i k e l y within a 5' intron of the ted
gene.
The sequence of the ted YYRR box i s shown in Figure 1.
I t s length i s
e s s e n t i a l l y the 3ame as that of the D. melanogaster Gld YYRR box.
The ted
3383
Nucleic Acids Research
Hind H I
EcoRI
PvuIL
O C P V O C P V O C P V
Figure 5. YYRR boxes in the Drosophlla genome. Genoraic Southern a n a l y s i s was
performed on t o t a l genomic DNAs of D. melanogaster, D. pseudoob3cura, and D. _
v l r l l l s using the GACT-32 probe. 0 - D. melanogaster Oregon-R s t r a i n , C - D.
melanogaster Canton-S s t r a i n . P - D. pseudoob3cura and V - D. v l r l l l s . High
stringency hybridization and washing conditions were used t o eliminate the
detection of YYRR sequences with l e s s than a 25 of 32 match with GACT«32. The
f i l t e r s were exposed t o XAR-5 film for 15 hours at -70 C°.
YYRR box contains s i x mismatches t o the YYRR motif. Four of these mismatches
are within a tandem repeat (YY R YYYY R, YY R YYYY R) while the other two
mismatches impose a higher order palindromlc sequence much l i k e the
palindromes observed for the Old YYRR boxes of D^ melanogaster, D.
pseudoobscura, and D. v l r l l l s . Another s t r i k i n g feature of a l l four YYRR
boxes which have been sequenced i s that t h e i r four base repeat element i s
3384
Nucleic Acids Research
uninterrupted by any Insertions or deletions which would d i s t u r b the four base
frame.
Given the heterogeneity of the symmetries observed among the four
3equenced YYRR boxes, we decided to determine if the ted YYRR box sequence
differed In another D. melangoaster s t r a i n . He sequenced the ted YYRR box
from the Canton-S s t r a i n and found t h a t I t was Identical t o t h a t of the
Oregon-R s t r a i n .
In order t o compare the YYRR sequences which only are detected by the
GACT-16 probe, we sequenced the YYRR sequences detected in the sas gene
(formerly denoted l(3)84Cd) (14,15) and In the ANT-C between the Scr and ffcz
genes. These YYRR sequences contain four or five tandem repeats of the GACT
motif with matches of 15 (j3as) and It (ANT-C) bases to the GACT-16 probe (data
not shown). The significance of these short YYRR sequences Is questionable,
inasmuch as t h e i r probable occurence in the Drosophlla genome approaches
random expectation. Nonetheless, the clones containing them nave proven to be
useful negative controls for experiments conducted with GACT-32. The best
match for each of these two YYRR sequences t o the GACT-^ ollgomer Is 21 of 32
(ANT-C) and 20 of 32 (3as) .
Copy number and conservation of location
Whole genomlc Southern hybridization indicated that the YYRR box copy
number ranges from 25-30 for D. melanogaster, I), pseudoobscura, and D. v l r l l l s
(Figure 5). Furthermore, the locations of YYRR boxes within D. melanogaster
appear t o be highly conserved between the Oregon-R and Canton-3 s t r a i n s (which
were isolated from different populations more than f i f t y years ago). This
r e s u l t I s In contrast to the lack of l n t r a s p e c i f i c conservation in location of
several other middle r e p e t i t i v e sequences in D. melanogaster (21).
Southern hybridization experiments were also conducted with genomic DNA
from r a t s and humans (Figure 6). Humans have four d i s t i n c t PvuII fragments
(8.7 kb, 2.8 kb, 1,2 kb, and 0.7 kb) which hybridize with GACT-32. This
r e s t r i c t i o n pattern la conserved among eight unrelated hunan DNA samples. The
YYRR copy number in r a t s i s d i f f i c u l t to estimate because one of the
r e s t r i c t i o n fragments containing a YYRR sequence e x h i b i t s a very high signal
(thus obscuring the signal from other fragments). Analysis of five diverse
r a t tissues uterus, I n t e s t i n e , l i v e r , heart, and spleen) indicates that the
6.5 kb PvuII fragment has very high signal in a l l t i s s u e s . I t I s important t o
note that under these hybridization conditions, high concentrations of
r e s t r i c t i o n fragments containing the short s a s and ANT-C YYRR sequences are
not detected at a l l with GACT:J32. Thus the genomic r e s t r i c t i o n fragments
3385
Nucleic Acids Research
1234
6.5
4.9
-
2.8
-
56
78
910
?f
ft
1.7 —
1.5 —
1.2 —
I
2
•* a s
2 3 4 5 6 7 8 9
8.7
-
6.5 —
2.8 -
4
1.2 -
-
0.7
•
-
Figure 6. Human and r a t YYRR box related elements. Total genoraic DNAs were
subjected t o Southern analysis using the GACT«32 probe and high stringency
hybridization conditions, (a) Lane 1, cloned ted YYRR box; Lane 2, cloned Gld
YYRR box; Lane 3, cloned sa3 YYRR sequence; Lane 1, cloned ANTHC YYRR
3386
Nucleic Acids Research
sequence; Lanes 5, 7, & 9 r a t u t e r u s 0 DNA (20 iig) digested with PvuII,
H i n d l l l , & P s t I , respectively: and Lanes 6, 8, 4 10 r a t uterus 214 DNA (20 ug)
digested with PvuII, Hindlll, and P3tl, r e s p e c t i v e l y . Rat DNA was Isolated
from Sprague Dawley r a t u t e r i either untreated, u t e r u s 0 , or t r e a t e d for 24
hours with e s t r a d i o l , u t e r u s 2 4 . Lanes 1 & 2 contain the two positive controls
(ted and Gld YYRR boxes) with the predicted 2.8 kb and 1.2 kb fragments,
r e s p e c t i v e l y , hybridizing with the probe. Lanes 3 S 1 contain the two
negative controls ( s a s ) and ANT-C YYRR sequences) which f a l l to hybridize to
GACT-32 as predicted under these hybridization conditions. The r e s t r i c t i o n
fragments containing the control YYRR sequences (Lanes 1*-4) contain between
20-60 ng of DNA. Exposure time was for 6 hours, (b) Rat genomic DNA from
various tissues digested with PvuII. Lane 1, spleen (15 ug); lanes 2 & 3,
l i v e r (7 ug); lanes H i 5 heart (15 ug)j lanes 6 & 7 I n t e s t i n e (15 ug); and
lanes 8 & 9 uterus (15 ug) . (c) Lanes 1 & 2 contain DNA samples (20 ug)
isolated from peripheral lymphocytes from two different human individuals both
digested with PvuII. Exposure time was for HO hours. The four PvuII
fragments detected have also been seen In s i x other individual samples (data
not shown).
which a r e detected in these experiments must have a match of greater than 21
of 32 with GACTU32. Given the high stringency conditions used, we estimate
that the percent match i s greater than 80$.
DISCUSSION
Tandem repeats of short oligonucleotlde3 are common to a variety of
eukaryotlc taxa and have been 3tudled extensively in D. melanogaster and D.
v l r l l l s (22). Although the simple sequence cooplexltiy of the YYRR box i s
similar to these previously reported tandem repeats, there are some Important
differences germane to their size, genomic location, and evolutionary
conservation. A large majority of the previously characterized tandem repeats
occur in large blocks (105-10? copies per clusters), and are typically
located In heterochromatin. As a consequence of their length and GC content
bias, they are often isolated as s a t e l l i t e DNA. Phylogentic studies Indicate
that these sequence elements evolve rapidly such that s a t e l l i t e s in closely
related species are quite dissimilar and In some cases may have independent
origins. In contrast, the YYRR box repeats sequenced to date are relatively
small (50-72 bp), are found within euchromatlc genes, and exhibit a much
greater degree of evolutionary conservation of size, location, and copy number
both within and between species.
Three other tandem repeats are more similar to the YYRR box than the
s a t e l l i t e DNAs mentioned above. The Opa sequence is a trlnucleotlde repeat
found in the coding region of a number of Drosphlla genes (23). Given i t s
location In protein coding regions, i t s functional consequence Is more obvious
than that of the YYRR box which apparently does not occur in any mRNA (Cavener
and Cox, unpublished data). The other two tandem repeats, found in
3387
Nucleic Acids Research
imnunoglobulin heavy chain genes and chorion genes, are found near or within
their transcription units but not within coding regions.
Imraunoglobulln heavy
chain genes contain pentanucleotlde tandem repeats of GAGCT upstream of the
alternative C exons (24).
This repeat is thought to be essential for the DNA
rearrangements which give rise to lmmunoglubulln class switching.
The
Drosophlla chorion gene cluster also contains a pentamerlc DNA tandem repeat.
This repeat has been Implicated in the tissue specific amplicification of the
chorion gene clusters on both the X and third chromosomes (25).
Presently, we
have no evidence that a DNA rearrangement or amplification occurs in the
region of the Drosophlla YYRR boxes.
makes i t technically difficult
However, the small size of Drosophlla
to rule out tissue-specific DNA rearrangements.
The conservation of the Gld YYRR box in D. melanogaster, £.
P3eudoob3cura, and D. v l r l l l 3 strongly suggests that the YYRR box has a
function.
Given the estimated time of divergence between the Sophophora
subgenus (containing D. melanogaster and D. pseudoobscura) and the Drosophlla
subgenus (containing D. v l r l l l s ) (26, 27) unconstrained DNA sequences in
lntronlo and flanking regions surrounding genes are expected to be completely
diverged between species from the two main subgenera.
Indeed, comparisons of
intronic sequences between D. melanogaster, D. pseudoobscura, and D v l r l l l s
for the Hsp^2 gene reveal complete divergence using the identical program and
parameters values we have used herein (28).
Thus, the conservation of the Gld
YYRR box and the f i r s t 50 bp immediately downstream of i t (Figure 3) implies a
conserved function.
This hypothesis i s further supported by the presence of a
YYRR box associated with the ted gene which exhibits an identical mutant
phenotype and has a very similar pattern of mRNA expression to Gld (Cox &
Cavener, unpublished data).
The Gld gene in D. pseudoobscura and JX_ v l r l l l a
are not abundantly expressed In adult males as i s the case for the Gld gene in
D. melanogaster, so i t is unlikely that the YYRR box Is involved in adult
expression.
The most compelling hypothesis at t h i s time i s that the YYRR box
serves as an enhancer or DNA amplification element which affects the
developmental expression of Gld, ted, and other related genes during preadult
development.
If t h i s hypothesis proves to be correct, then i t i s likely that
the 25-30 copies of the YYRR box in Drosophlla are contained within a large
family of developmentally related genes.
We have begun in vitro mutagenesls -
gene transformation experiments to t e s t the hypothesis that the YYRR serves a
regulatory function in the Gld gene.
Although i t i s tempting to speculate about the functional relationship of
the YYRR box related sequences found in f r u i t f l i e s , humans and r a t s , there
3388
Nucleic Acids Research
are 3ane important differences observed.
F i r s t l y , the copy number of the YYRR
boxes in htmans is an order of magnitude smaller than that of Drosphlla.
However, like Drosophlla, intraspecific variation in the r e s t r i c t i o n fragments
detected by GACT-32 is virtually absent suggesting that the YYRR box is a
non-mobile element and evolutionarily s t a b l e .
amplified YYRR sequence.
Secondly, rats contain a highly
But this amplification is not tissue specific, and
therefore, i t i s unlikely that the amplification i s involved in developmental
regulation.
In addition to functional and evolutionary questions, the potential
structure of the YYRR box is of considerable interest because i t represents an
extreme case of perturbation in the helix twist angle as estimated by
Calladine's rules.
Although the estimates of the helix twist angles in the
YYRR box by the method of Tung and Harvey (8) predicts less extreme average
deviations, i t i s interesting to note that the predict four angle series for
the YYRR box (3t.9°, 27.7°, 36.6°, and 36.t 0 ) i s very similar to the consensus
five angle series (36.2°, 37.8°, 26.1°, 37.8° and 36.2°) found in E coll
promoters (29).
Both the YYRR box and the E c o l l consensus structure contain
a YR step leading to a large decrease in the twist angle.
The notion that conformatlonal effects of the dlpyrimldlne - dlpurlne
nature of the YYRR box are more important than the primary sequence i s
supported by the pattern of mismatches to the CTGA motif.
None of the
mismatches to the CTGA motif give r i s e to symmetries observable in the primary
sequence, whereas among the four sequenced YYRR boxes all fourteen of the
mismatches to the YYRR motif are organized into pallndroraic symmetries (10
mismatches) or direct repeats (1 mismatches in the ted YYRR box).
Although
this striking observation implies that the pallndromlc sequences serve a
function, i t is curious that the palindromes are not precisely conserved for
the Gld YYRR box among the three Drosophlla species.
One feature which i s
conserved among the three is the presence of t r i p y r l ml dines and tripurlnes to
the l e f t and r i g h t , respectively, of the axis of symmetry.
This i s also a
characteristic of the ted YYRR box palindrome.
A number of investigators have suggested that helix structure variants may
3erve as recognition 3ltes for specific regulatory proteins or proteins
involved in DNA packaging (2,5-9,30).
He have obtained preliminary results
which Indicate that Drosophlla embryo extracts contain a protease sensitive
factor(s) which strongly binds to a 217 bp r e s t r i c t i o n fragment containing the
Gld YYRR box. Experiments are underway in our laboratory to characterize t h i s
factor and determine whether the basis of i t s recognition is dependent upon
the primary sequence and/or the conformation of the YYRR box.
3389
Nucleic Acids Research
ACKNOWLEDGEMENTS
We thank Thomas Kaufman, Welcome Bender, Carl Thummel, and Ross Maclntyre
for providing us with genomlc clones or Southern blots of clones for four
region or the D melanogaster genome. We are grateful to Charles Singleton for
pointing out the possible s t r u c t u r a l significance of the YYRR box and t o F r i t z
Parl for providing us the rat and human DNAs.
REFERENCES
C a l l a d l n e , C.R. ( 1 9 8 2 ) . J . Hoi. B l o l . 161 , 3 4 3 - 3 5 2 . .
1.
D l c k e r s o n , R. E. ( 1 9 8 3 ) . J . Mol. B l o l . 166, 11 9-441.
2.
Kabsch, W. S a n d e r , C. i T r i f o n o v , E.N. ( 1 9 8 2 ) . Nucl. Acids Res.
3.
1O.1O97-110H.
F r a t l n l , A. V. , Kopka, H . L . , Drew, H.R. i. D l c k e r s o n , R. E. ( 1 9 8 2 ) . J . B l o l .
Chem. 257, 11686-14707.
Nusslnov, R., S h a p i r o , B . , L l p k i n , L. E. & M a i z a l , J. B. ( 1 9 8 4 a ) .
Bloch.
Bloph. Acts 783, 246*257.
N i e s l n o v , R., S h a p i r o , B. , L l p k i n , L. E. & M a i z a l , J . B . ( 1 9 8 4 b ) , J . Mol.
B l o l . 177, 59U-607.
Tung C . S . , & Harvey, S.C. ( 1 9 8 4 ) . Nucl. Acids Re3. 12, 3343-3356.
Tung, C . S . , & Harvey, S.C. ( 1 9 8 6 ) . J. B l o l . Chem. 261, 3700-3709.
9.
Tung, C . S . , & Harvey, S.C. ( 1 9 8 6 ) . Nucl. Acids Re3. 14, 381-388.
10. Drew, H.R., & T r a v e r s , A. A. ( 1 9 8 4 ) . C e l l 37, 491-502.
11. Drew, H.R. ( 1 9 8 4 ) . J. Hoi. B l o l . 176, 535*557.
12. Lomonosoff, G . P . , B u t l e r , P . J . G . , Klug, A. ( 1 9 8 1 ) , J . Mol. B l o l . 149,
745-760.
Keene, M.A. & E l g i n , S.C.R. ( 1 9 8 4 ) , C e l l 36, 121-129.
13.
1 4 . C a v e n e r , D. C o r b e t t , G. , Cox, D. & Whetten, R. ( 1 9 8 6 ) , EMBO J . 5,
2939-2948.
15. C a v e n e r , D.R., O t t e a o n , D . , & Kaufman, T.C. ( 1 9 8 6 b ) . G e n e t i c s , 114,
111*123.
16. S c o t t , M.P. Weiner, A. J . , H a z e l r i g g , T . , P o l l s k y , B. A., P l r r o t a , V.,
S c a l e n g h e . , F. , & Kaufman, T.C. ( 1 9 8 3 ) . C e l l 35, 763-776.
17. Henikoff, S. ( 1 9 8 4 ) . Gene 2 8 , 351-359.
18. S a n g e r , F . , N i c k l e n , S . , & C o u l s o n , A.R. ( 1 9 7 7 ) . P r o c . N a t l . Ac ad. S c l .
USA 7 4 , 5643-5467.
19. Bender, W. Akam. M. Karen, F . , Beachy, P . , P e i f e r , M. S o l e r e r , P . , Lewi3,
E. B. , & Hogness, D.S. ( 1 9 8 3 ) . S c i e n c e 2 2 1 , 2 3 - 2 9 .
20. Karch, F . , Weirfenbach, B . , P e i f e r , M., Bender, W., Duncan, I . , C e l n i k e r ,
S . , C r o s b y , M.p 4 Lewis, E. B. ( 1 9 8 5 ) . C e l l 4 3 , 81-96.
21. Young, M.W. ( 1 9 7 9 ) . Proc. N a t l . Acad. S c l . USA 76, 6274-6278.
22.
Lewin, B. (1980) . Gene E x p r e s s i o n 2 : E u k a r y o t l c Chromosomes, 2nd e d .
John Wiley & Sons, NY, pp. 5 4 0 - 5 6 9 .
23. Wharton, K. A., Yedvobnlck, B. , F l n n e r t y , V. G., & A r t a v a n i s - T s a k o n a s , S.
( 1 9 8 5 ) . C e l l 40, 55-62.
21.
Shlmlzu, A. & Honjo, T. ( 1 9 8 4 ) . C e l l 36, 801-803.
25. S p r a d l i n g , A. C , d e C i c c o , D. V . , Waklmoto, B.T., L e v l n e , J . F . ,
K a l f a y a n , L. J . , & Cooley, L. (1987) EMBO J . 6, 1045-105326. B e v e r l y , S. M., & W H w n , A.C. ( 1 9 8 4 ) . J . Mol. Evol. 2 1 , 1-13.
27. Throckmorton, L.H. ( 1 9 7 5 ) . I n Handbook of G e n t l e s (King, R. C , e d . ) ,
v o l . 3, PP. 421-469, Plenum P r e s s , New York.
28. Blackman, R.K. & Meselson, M. ( 1 9 8 6 ) . J . Mol. B l o l . 188, 499-515.
29. Tung, C . S . & Harvey, S.C. ( 1 9 8 7 ) . Nucl. Acids Res. 15, 4973-4984.
30. Nuasinov, R. & Lennon, G.G. ( 1 9 8 4 ) . J . Mol. Evol. 20, 106-110.
3390