Volume 16 Number 8 1988 Nucleic Acids Research

The YYRR box: a conserved dipyrimidine-dipurine sequence element in Drosophila and other eukaryotes

Douglas Cavener, Yue Feng, Brian Foster, Philip Krasney, Michael Murtha, Christopher Schonbaum and Xiao Xiao

Department of Molecular Biology, Vanderbilt University, Nashville, TN 37235, USA

Received December 2, 1987; Revised and Accepted March 9, 1988

ABSTRACT

We have discovered a novel DNA sequence element in Drosophila which is based upon a CTGA tandem repeat. This element has been named the YYRR box to emphasize its dipyrimidine-dipurine nature, which is predicted to have unusual structural features. Southern hybridization analysis of genomic DNA indicates the presence of 25-30 copies of the YYRR box in each of three Drosophila species (melanogaster, pseudoobscura, and virilis) and conservation of genomic location within species. Similar analysis of human and rat DNA indicates the presence of YYRR related sequences in mammals as well. YYRR boxes have been localized to two genetic loci in Drosophila: Gld and a gene tentatively identified as ted. These two genes exhibit correlated patterns of developmental expression and an identical mutant phenotype. Sequence analysis of the Gld YYRR box in three Drosophila species revealed a high degree of conservation despite its intronic location.

INTRODUCTION

Recent refinements in models of DNA structure and results of X-ray diffraction studies have stimulated a number of investigations into the structural and functional consequences of sequence dependent variation in the conformation of DNA (1-9). Calladine (1) proposed that large distortions in the archetypical B-DNA conformation are generated as a consequence of avoiding steric hindrance of nearest-neighbor purines on complementary DNA strands. Furthermore, he suggested four alterations in the DNA duplex structure which could potentially alleviate the purine-purine clash.
Based upon the assumption that these alterations would be compensated by nearest-neighbor nucleotides, a simple quantitative model (denoted Calladine's rules) was proposed to describe the relative changes in these helix parameters (2). Dickerson and coworkers have shown that this model is able to predict such alterations in short oligonucleotide crystals (2). Tung and Harvey (8) have recently developed a more elaborate theoretical model to estimate absolute values for sequence dependent conformational parameters. Whether these two theoretical models are directly applicable to DNA in vivo is certainly not clear. However, considerable evidence exists that alterations in the conformation of B-DNA strongly influence the recognition sites of various endonucleases (e.g. DNAase I) (6,10-13).

We have discovered a conserved multicopy DNA sequence element in Drosophila which has the highest possible variance in the helix twist angle allowed by Calladine's rules. This sequence is 50-72 bp in length and is based upon a tetranucleotide dipyrimidine-dipurine motif. We have named this sequence the YYRR (pronounced wire) box to emphasize its dipyrimidine-dipurine nature. Two of the YYRR boxes have been localized within or near the developmentally related Drosophila genes, glucose dehydrogenase (Gld) and a transcription unit tentatively identified as trapped (ted). Gld and ted exhibit an identical mutant phenotype and are tightly linked (14,15). Furthermore, the mRNAs from Gld and the putative ted gene exhibit highly correlated developmental patterns of expression. Their correlated pattern of expression may be related to the presence of the YYRR boxes.

MATERIALS AND METHODS

Drosophila DNA clones

Genomic DNA clones containing the Gld and ted genes for D. melanogaster were previously described (14). These clones were isolated from an EMBL-4 lambda library of the Oregon R strain. An additional ted clone was isolated in this study from a Charon 4 lambda library of the Canton S strain. The D. pseudoobscura and D. virilis Gld clones were isolated from an EMBL lambda library and a Charon 30 library, respectively, via standard plaque hybridization techniques using D. melanogaster Gld clones as the hybridization probes. The YYRR related sequences found in the stranded-at-second (sas) gene and the Antennapedia Complex (ANT-C) were identified from a series of genomic clones from two chromosome walks in the 84BC region (14,16) using the 16-mer GACT oligonucleotide probe described below.

DNA hybridization

Two synthetic oligonucleotides were used to screen Southern blots of various genomic clones and genomic DNAs for the YYRR sequences: a 16-mer, d(GACT)4, and a 32-mer, d(GACT)8. These will be referred to as GACT-16 and GACT-32, respectively. Hybridization experiments with 32P end-labelled GACT-16 used the following conditions: 5X SSC, 0.1% SDS, 0.01% heparin, and 20% formamide at 25° C. Blots were washed in 2X SSC-0.2% SDS at 25° C. Hybridization experiments with GACT-32 were conducted at 42° C using one of the following hybridization buffers: (Buffer A) 5X SSC, 0.1% SDS, 0.01% heparin and 50% formamide; (Buffer B) 0.25 M sodium phosphate, pH 7, 0.3 M NaCl, 0.01 M EDTA, 7% SDS, and 50% formamide. Blots were washed in 2X SSC-0.2% SDS first at 25° C and then at 60° C.

DNA cloning and sequencing

Nested deletions of several DNA clones described below were constructed using the Henikoff ExoIII method (17).

D. melanogaster Gld YYRR box: A series of ExoIII deletions were constructed in pEG25, a pEMBL18 subclone of a 7 kb Sal-Xba fragment from intron I.
D. melanogaster ted YYRR box: S'lObp AccI fragments (pCTtO) from genomic clones of Oregon-R and Canton-S were subcloned into the Bluescript vector (Stratagene, Inc.).
D. melanogaster sas YYRR sequence: A series of ExoIII deletions were constructed in pCS58, a Bluescript subclone of a 2.5 kb EcoRI fragment from intron I of the sas gene.
D. melanogaster ANT-C YYRR sequence: A series of ExoIII deletions were constructed in pCX62, a Bluescript subclone of a 2.4 kb EcoRI fragment located between the divergent promoters of the ANT-C genes Sex combs reduced (Scr) and fushi tarazu (ftz) (Dr. T. C. Kaufman, personal communication).
D. pseudoobscura Gld YYRR box: A series of ExoIII deletions were constructed in pBy2.1, a Bluescribe (Stratagene, Inc.) subclone of a 2.1 kb SstI fragment from intron I.
D. virilis Gld YYRR box: A 350 bp PvuII-SstI fragment was subcloned into Bluescript (pCG79fl).

After verifying the location of YYRR sequences within the subclones and deletion mutants using Southern analysis, their DNA sequences were determined using the Sanger dideoxy sequencing method (18). Analysis of the DNA sequences was performed on a Compaq 286 computer using the IBI Pustell DNA sequence analysis programs.

RESULTS

D. melanogaster Gld YYRR box

DNA sequence analysis of the 8 kb intron separating the start site of transcription from the start codon of Gld revealed an unusual 72 bp tetranucleotide tandem repeat element (Fig. 1). This repeat is based upon the sequence CTGA. Single nucleotide differences in the CTGA motif exist in seven of the eighteen repeats. Interestingly, only two of these differences disrupt the dipyrimidine-dipurine (YYRR) nature of the repeat. Although these two repeats (YYYR and RRRY) also destroy the simple dyad symmetry (YYRR/RRYY), they give rise to a higher order palindromic pattern (YY R YYY RR / YY RRR Y RR).

D. melanogaster Gld YYRR box
CTGA CTGG CTGA CTGG CTGA CTGG CTGA CTGA CTGC CTGA CCGA GTGA CTGA CTGA CTGA CTGA CTAA CTGA
(YYRR)8 YY R YYY RR | YY RRR Y RR (YYRR)6

D. pseudoobscura Gld YYRR box
CTGA CTGA CTGA CTGA CTGG CTGA CTGA CTGA CTCA CTAA CTGA CTGA TTGA CTGA CTGG CAGA CTGA
(YYRR)8 YYY R (YY | RR)6 Y RRR YYRR

D. virilis Gld YYRR box
AA CTGA CTCA CTGA CTCA CTGA CTGA CTGA CTGA CGGA CTGA CAGA CTGA
RR YYRR YYY R YYRR YYY R (YY | RR)4 Y RRR YYRR Y RRR YYRR

D. melanogaster ted YYRR box
AA CTGA CTGA CTGT CTCA CTGC CTCA CTGA CTGA CTGA CTGA CTGC CTGA ATGA CTGA CTGA CTAA CTGA
(RRYY)3 R Y4 R YY R Y4 R (YYRR)4 Y2 R Y3 | R3 Y (RRYY)4 RR

Figure 1. Primary DNA sequences and derived symmetries of Gld and ted YYRR boxes. All of the sequence symmetries observed for the Gld gene are palindromic. The axis of the dyad symmetry is indicated by "|" in the second line for each sequence. A direct repeat is observed in the ted YYRR box. The ted YYRR box was sequenced in both the Canton-S and Oregon-R strains of D. melanogaster and found to be identical.

Because YYRR sequences have a high density of YR and RY base steps, high levels of variation were found for all four helix parameters (helix twist angle, base-plane roll angle, main-chain torsion angle, and propeller twist angle) as calculated by the sum rules of Dickerson (2).
According to these rules, alterations in the helix twist angle and base-plane roll angle are compensated for by opposite changes in the same angles in adjacent base pairs. The YYRR sequence is unique in this regard because it contains the highest possible packing of YR and RY steps which generate the clash and YY and RR steps which compensate for the clash. As a consequence, YYRR tandem repeats generate the highest possible variance in the helix twist angle as computed by Calladine's rules. The expected deviations and variance in the helix twist angle for the Gld YYRR box and flanking sequence are depicted in Figure 2. The central symmetrical pattern in Figure 2a is due to the higher order palindromic sequence described above.

Figure 2. Helix twist plot for the D. melanogaster Gld YYRR box. (a) The expected deviations in the helix twist angle were calculated using Calladine's rules. Zero is approximately equal to an angle of 36° (i.e. the mean angle for B-DNA). Each unit is approximately equal to 2° (Dickerson, 1983). Thus, the predicted range in the helix twist angle is 28° to 42°. (b) The variances of the twist angle deviations, V(TAD), are plotted for the Gld YYRR box and flanking sequences. Each data point represents the variance calculated from ten adjacent angles. The two highest peaks in the variance represent the YYRR box (bases 160-230). The intervening valley between these two peaks is caused by the palindromic sequence of the YYRR box, which has a lower density of purine-purine clash than the simple dipyrimidine-dipurine motif.
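
The twist-angle calculation summarized in Figure 2 can be sketched in a few lines of Python. The sketch assumes that the helix-twist sum function of Dickerson (2) has the form in which each purine-pyrimidine (R-Y) step contributes +1, -2, +1 to the steps before, at, and after the clash, and each pyrimidine-purine (Y-R) step contributes twice that. These weights, the function names, the use of a population variance, and the conversion of 36° plus 2° per unit (taken from the Figure 2 legend) are assumptions for illustration; this is not the program used in this study.

```python
from statistics import pvariance

PURINES = frozenset("AG")


def twist_sum(seq):
    """Return the helix-twist sum-function value for each base step of seq."""
    seq = seq.upper()
    n_steps = len(seq) - 1
    sigma = [0] * n_steps
    for i in range(n_steps):
        first_is_purine = seq[i] in PURINES
        second_is_purine = seq[i + 1] in PURINES
        if first_is_purine == second_is_purine:
            continue  # YY and RR steps generate no clash of their own
        # Assumed weights: Y-R steps count twice as strongly as R-Y steps.
        weight = 1 if first_is_purine else 2
        sigma[i] -= 2 * weight          # twist is reduced at the clash step
        if i > 0:
            sigma[i - 1] += weight      # and increased at the step before
        if i + 1 < n_steps:
            sigma[i + 1] += weight      # and at the step after
    return sigma


def twist_angles(seq, mean_twist=36.0, degrees_per_unit=2.0):
    """Convert sum-function units to approximate twist angles in degrees."""
    return [mean_twist + degrees_per_unit * s for s in twist_sum(seq)]


def windowed_variance(values, window=10):
    """Variance of each run of `window` adjacent values (cf. Figure 2b)."""
    return [pvariance(values[i:i + window])
            for i in range(len(values) - window + 1)]


repeat = "CTGA" * 6
print(twist_sum(repeat)[1:5])            # TG, GA, AC, CT steps: [-4, 3, -2, 3]
interior = twist_angles(repeat)[1:-1]    # drop the end steps, which lack a neighbor
print(min(interior), max(interior))      # 28.0 42.0
print(windowed_variance(twist_sum(repeat))[:3])
```

Under these assumed weights a perfect CTGA repeat cycles through +3, -4, +3, -2 units for the CT, TG, GA, and AC steps, which reproduces the 28° to 42° range given in the Figure 2 legend.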
The tight packing of helix twist deviations in the YYRR box leads to a two fold higher variance in this parameter than that observed in flanking sequences (Figure 2b). Using the methods of Tung and Harvey (8), the following series of helix twist angles is predicted for the basic CTGA tandem motif: CT, 34.9°; TG, 27.7°; GA, 36.6°; and AC, 36.4°. Although both models predict a similar sharp reduction for the TG step twist angle, the Tung-Harvey model does not predict the large positive deviations in the CT and GA step angles. Consequently the helix twist variance within the YYRR box is not greater than the variance in the sequences flanking the YYRR box as calculated by the Tung-Harvey method (data not shown), in contrast to Figure 2b (Calladine helix twist variance).

Figure 3. Homology matrices for Gld YYRR boxes in three Drosophila species. The Pustell-IBI DNA Sequence Analysis program was used to compare the nucleotide sequences of D. melanogaster, D. pseudoobscura, and D. virilis. (a) melanogaster & pseudoobscura, (b) melanogaster & virilis, (c) pseudoobscura & virilis. Homology between sequences is indicated by letters which represent the degree of sequence similarity within a centrally-weighted region of 31 nucleotides. Sequence similarity ranges from 100% (letter A) to a lower cutoff of 65% (letter S). The three sequences containing the YYRR boxes are all located within the large 5' intron of the Gld gene. The orientation (top to bottom and left to right) corresponds to the 5' to 3' orientation of the Gld transcription unit.

Gld YYRR box in D. pseudoobscura and D. virilis

Genomic DNA clones of the D. melanogaster Gld gene were used as hybridization probes to isolate the homologous clones in D. virilis and D. pseudoobscura from genomic DNA libraries (Krasney and Cavener, unpublished data). YYRR boxes were found to exist in the homologous regions of the Gld genes of these two species as well. The sizes of the Gld YYRR boxes are similar in the three species (D. melanogaster, 72 bp; D. pseudoobscura, 68 bp; and D. virilis, 50 bp). A total of eight mismatches to the dipyrimidine-dipurine motif is observed among the three Gld YYRR boxes, and all of these are part of higher order palindromic sequences. However, each one of the YYRR boxes contains a different palindromic pattern. One common feature of the palindromes is that tripyrimidines are found to the left of the axis of symmetry whereas tripurines are found to the right. Also, a tripyrimidine is present in the same position of all three relative to the 3' boundary of the YYRR box.

The Pustell-IBI DNA Sequence Analysis program was used to examine the conservation of the sequences flanking the Gld YYRR box in the three Drosophila species. The sequences immediately downstream of the Gld YYRR boxes are highly conserved among all three species (Fig. 3). Interestingly, this region is responsible for the low variance in the helix twist angle immediately to the left of the YYRR box in Figure 2b. A more extensive region of homology is observed in the 300 bp region surrounding the YYRR box between D. melanogaster and D. pseudoobscura. It is important to note that we have examined this intronic region in D. melanogaster and find no evidence for protein coding regions.
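
The mismatch counts quoted above can be checked directly against the sequences in Figure 1. The following sketch splits each Gld YYRR box into tetranucleotides, translates them into the purine (R) / pyrimidine (Y) alphabet, and lists the repeats that depart from YYRR. The sequence strings are transcribed from Figure 1, with the two leading AA bases of the D. virilis box left out so that the repeats stay in frame; the function names are illustrative assumptions.

```python
PURINES = frozenset("AG")

# Sequences transcribed from Figure 1; spaces only mark the four-base frame.
# The D. virilis entry omits the two leading AA bases shown in the figure.
GLD_BOXES = {
    "D. melanogaster": "CTGA CTGG CTGA CTGG CTGA CTGG CTGA CTGA CTGC "
                       "CTGA CCGA GTGA CTGA CTGA CTGA CTGA CTAA CTGA",
    "D. pseudoobscura": "CTGA CTGA CTGA CTGA CTGG CTGA CTGA CTGA CTCA "
                        "CTAA CTGA CTGA TTGA CTGA CTGG CAGA CTGA",
    "D. virilis": "CTGA CTCA CTGA CTCA CTGA CTGA CTGA CTGA CGGA CTGA CAGA CTGA",
}


def to_yr(seq):
    """Translate a DNA string into R (purine) / Y (pyrimidine) symbols."""
    return "".join("R" if base in PURINES else "Y" for base in seq.upper())


def tetramers(box):
    """Split a space-delimited box into its four-base repeats."""
    return box.upper().split()


def non_yyrr(box):
    """Repeats whose purine/pyrimidine pattern is not YYRR."""
    return [t for t in tetramers(box) if to_yr(t) != "YYRR"]


total = 0
for species, box in GLD_BOXES.items():
    odd = non_yyrr(box)
    total += len(odd)
    print(species, len(tetramers(box)), "repeats; non-YYRR:", odd)
print("total mismatches to YYRR:", total)  # 8

mel = tetramers(GLD_BOXES["D. melanogaster"])
print("repeats differing from CTGA:", sum(t != "CTGA" for t in mel))  # 7
```

With these transcriptions the three boxes contain 18, 17, and 12 in-frame repeats, and the non-YYRR repeats number two, two, and four, respectively, in agreement with the total of eight mismatches reported above.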

YYRR sequences in other regions

Two oligonucleotide probes, GACT-16 and GACT-32, were used to detect YYRR sequences among several previously cloned D. melanogaster genes. GACT-16 and GACT-32 are four and eight tandem repeats of GACT, respectively. We chose hybridization conditions which would allow for approximately two mismatches using GACT-16 and five mismatches using GACT-32. All DNA sequences examined were probed in parallel with the two oligonucleotides. We have chosen to use the general term YYRR sequence to denote sequences which hybridize with at least the GACT-16 probe and the specific term YYRR box to denote those sequences which also hybridize with the GACT-32 probe. The following regions of the genome were examined for YYRR sequences by probing Southern blots containing genomic DNA clones: (a) 84C region containing nine genes including Gld, 160 kb; (b) 84AB region which contains the Antennapedia Complex, 320 kb (16); (c) 89E region which contains the Bithorax Complex, 100 kb (19,20); (d) 25F-26A region which contains the α-glycerophosphate dehydrogenase and β-galactosidase genes as well as several other genes, 195 kb (D. Knipple & R. MacIntyre, personal communication); and (e) 74E region which contains the E74 gene, 100 kb (C. Thummel & D. Hogness, personal communication). Using the GACT-16 probe, a total of 15 YYRR sequences were detected among the 1.2 megabases of DNA examined. The results for the 84C region are shown in Figure 4. Only two of the 15 YYRR sequences were also detected by the GACT-32 probe. Besides the YYRR box at the Gld locus, the other YYRR box detected by GACT-32 maps either within or near the trapped (ted) locus. ted and Gld are tightly linked and reside within a 100 kb region (14,15).

Figure 4. YYRR sequences in the 84C region of D. melanogaster. (a) Overlapping genomic DNA clones (152 kb) containing the entire 84C region of the right arm of the third chromosome (Cavener et al., 1986a) were hybridized with the GACT-16 probe. Clone 6 contains the YYRR sequence within an intron of the sas gene. Clones 12 and 14 contain the ted and Gld YYRR boxes, respectively. (b) The location of the YYRR sequences/boxes relative to the transcript maps of sas, ted and Gld. The thick lines represent the approximate location of exonic sequences. The precise boundaries for the exonic sequences have only been determined for exons I, II, and III for the Gld gene.

Figure 5. YYRR boxes in the Drosophila genome. Genomic Southern analysis was performed on total genomic DNAs of D. melanogaster, D. pseudoobscura, and D. virilis using the GACT-32 probe. Digests: HindIII, EcoRI, and PvuII. O - D. melanogaster Oregon-R strain, C - D. melanogaster Canton-S strain, P - D. pseudoobscura, and V - D. virilis. High stringency hybridization and washing conditions were used to eliminate the detection of YYRR sequences with less than a 25 of 32 match with GACT-32. The filters were exposed to XAR-5 film for 15 hours at -70° C.

The location of the ted YYRR box relative to the ted transcription unit has not been precisely determined. Preliminary evidence indicates that this YYRR box is most likely within a 5' intron of the ted gene. The sequence of the ted YYRR box is shown in Figure 1. Its length is essentially the same as that of the D. melanogaster Gld YYRR box. The ted YYRR box contains six mismatches to the YYRR motif.
Four of these mismatches are within a tandem repeat (YY R YYYY R, YY R YYYY R) while the other two mismatches impose a higher order palindromic sequence much like the palindromes observed for the Gld YYRR boxes of D. melanogaster, D. pseudoobscura, and D. virilis. Another striking feature of all four YYRR boxes which have been sequenced is that their four base repeat element is uninterrupted by any insertions or deletions which would disturb the four base frame.

Given the heterogeneity of the symmetries observed among the four sequenced YYRR boxes, we decided to determine if the ted YYRR box sequence differed in another D. melanogaster strain. We sequenced the ted YYRR box from the Canton-S strain and found that it was identical to that of the Oregon-R strain.

In order to compare the YYRR sequences which are detected only by the GACT-16 probe, we sequenced the YYRR sequences detected in the sas gene (formerly denoted l(3)84Cd) (14,15) and in the ANT-C between the Scr and ftz genes. These YYRR sequences contain four or five tandem repeats of the GACT motif with matches of 15 (sas) and 14 (ANT-C) bases to the GACT-16 probe (data not shown). The significance of these short YYRR sequences is questionable, inasmuch as their probable occurrence in the Drosophila genome approaches random expectation. Nonetheless, the clones containing them have proven to be useful negative controls for experiments conducted with GACT-32. The best match for each of these two YYRR sequences to the GACT-32 oligomer is 21 of 32 (ANT-C) and 20 of 32 (sas).

Copy number and conservation of location

Whole genomic Southern hybridization indicated that the YYRR box copy number ranges from 25-30 for D. melanogaster, D. pseudoobscura, and D. virilis (Figure 5). Furthermore, the locations of YYRR boxes within D.
melanogaster appear to be highly conserved between the Oregon-R and Canton-S strains (which were isolated from different populations more than fifty years ago). This result is in contrast to the lack of intraspecific conservation in location of several other middle repetitive sequences in D. melanogaster (21).

Figure 6. Human and rat YYRR box related elements. Total genomic DNAs were subjected to Southern analysis using the GACT-32 probe and high stringency hybridization conditions. (a) Lane 1, cloned ted YYRR box; Lane 2, cloned Gld YYRR box; Lane 3, cloned sas YYRR sequence; Lane 4, cloned ANT-C YYRR sequence; Lanes 5, 7, & 9, rat uterus 0 DNA (20 µg) digested with PvuII, HindIII, & PstI, respectively; and Lanes 6, 8, & 10, rat uterus 24 DNA (20 µg) digested with PvuII, HindIII, and PstI, respectively. Rat DNA was isolated from Sprague Dawley rat uteri either untreated (uterus 0) or treated for 24 hours with estradiol (uterus 24). Lanes 1 & 2 contain the two positive controls (ted and Gld YYRR boxes) with the predicted 2.8 kb and 1.2 kb fragments, respectively, hybridizing with the probe. Lanes 3 & 4 contain the two negative controls (sas and ANT-C YYRR sequences) which fail to hybridize to GACT-32, as predicted under these hybridization conditions. The restriction fragments containing the control YYRR sequences (Lanes 1-4) contain between 20-60 ng of DNA. Exposure time was 6 hours. (b) Rat genomic DNA from various tissues digested with PvuII. Lane 1, spleen (15 µg); lanes 2 & 3, liver (7 µg); lanes 4 & 5, heart (15 µg); lanes 6 & 7, intestine (15 µg); and lanes 8 & 9, uterus (15 µg). (c) Lanes 1 & 2 contain DNA samples (20 µg) isolated from peripheral lymphocytes from two different human individuals, both digested with PvuII. Exposure time was 40 hours. The four PvuII fragments detected have also been seen in six other individual samples (data not shown).

Southern hybridization experiments were also conducted with genomic DNA from rats and humans (Figure 6). Humans have four distinct PvuII fragments (8.7 kb, 2.8 kb, 1.2 kb, and 0.7 kb) which hybridize with GACT-32. This restriction pattern is conserved among eight unrelated human DNA samples. The YYRR copy number in rats is difficult to estimate because one of the restriction fragments containing a YYRR sequence exhibits a very high signal (thus obscuring the signal from other fragments). Analysis of five diverse rat tissues (uterus, intestine, liver, heart, and spleen) indicates that the 6.5 kb PvuII fragment has very high signal in all tissues. It is important to note that under these hybridization conditions, high concentrations of restriction fragments containing the short sas and ANT-C YYRR sequences are not detected at all with GACT-32. Thus the genomic restriction fragments which are detected in these experiments must have a match of greater than 21 of 32 with GACT-32. Given the high stringency conditions used, we estimate that the percent match is greater than 80%.
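
The match thresholds discussed above (21 of 32, or greater than 80%) refer to the number of identical bases between a probe and the best-aligned stretch of a target. A minimal sketch of such a count is given below. It slides the probe along one strand of the target without gaps, which is an illustrative simplification and not a model of hybridization; the function names and the example targets are hypothetical.

```python
import math


def best_match(probe, target):
    """Largest number of identical positions over all ungapped placements."""
    probe, target = probe.upper(), target.upper()
    if len(target) < len(probe):
        raise ValueError("target must be at least as long as the probe")
    best = 0
    for start in range(len(target) - len(probe) + 1):
        window = target[start:start + len(probe)]
        matches = sum(p == t for p, t in zip(probe, window))
        best = max(best, matches)
    return best


def min_matches_above(fraction, probe_length):
    """Smallest match count whose share of probe_length exceeds `fraction`."""
    return math.floor(fraction * probe_length) + 1


GACT_16 = "GACT" * 4   # d(GACT)4
GACT_32 = "GACT" * 8   # d(GACT)8

perfect_repeat = "CTGA" * 10   # hypothetical 40 bp perfect CTGA repeat
print(best_match(GACT_32, perfect_repeat))   # 32: GACT is CTGA read two bases later
print(best_match(GACT_16, perfect_repeat))   # 16

print(21 / 32)                      # 0.65625, the best match of the negative controls
print(min_matches_above(0.80, 32))  # 26 matches are needed to exceed 80% of 32
```

On this simple count, a match of 21 of 32 corresponds to about 66% identity, and a match above 80% requires at least 26 of the 32 probe positions.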

DISCUSSION

Tandem repeats of short oligonucleotides are common to a variety of eukaryotic taxa and have been studied extensively in D. melanogaster and D. virilis (22). Although the simple sequence complexity of the YYRR box is similar to these previously reported tandem repeats, there are some important differences germane to their size, genomic location, and evolutionary conservation. A large majority of the previously characterized tandem repeats occur in large blocks (10^5-10^7 copies per cluster), and are typically located in heterochromatin. As a consequence of their length and GC content bias, they are often isolated as satellite DNA. Phylogenetic studies indicate that these sequence elements evolve rapidly such that satellites in closely related species are quite dissimilar and in some cases may have independent origins. In contrast, the YYRR box repeats sequenced to date are relatively small (50-72 bp), are found within euchromatic genes, and exhibit a much greater degree of evolutionary conservation of size, location, and copy number both within and between species.

Three other tandem repeats are more similar to the YYRR box than the satellite DNAs mentioned above. The Opa sequence is a trinucleotide repeat found in the coding region of a number of Drosophila genes (23). Given its location in protein coding regions, its functional consequence is more obvious than that of the YYRR box, which apparently does not occur in any mRNA (Cavener and Cox, unpublished data). The other two tandem repeats, found in immunoglobulin heavy chain genes and chorion genes, are found near or within their transcription units but not within coding regions. Immunoglobulin heavy chain genes contain pentanucleotide tandem repeats of GAGCT upstream of the alternative C exons (24). This repeat is thought to be essential for the DNA rearrangements which give rise to immunoglobulin class switching.
The Drosophila chorion gene cluster also contains a pentameric DNA tandem repeat. This repeat has been implicated in the tissue specific amplification of the chorion gene clusters on both the X and third chromosomes (25). Presently, we have no evidence that a DNA rearrangement or amplification occurs in the region of the Drosophila YYRR boxes. However, the small size of Drosophila makes it technically difficult to rule out tissue-specific DNA rearrangements.

The conservation of the Gld YYRR box in D. melanogaster, D. pseudoobscura, and D. virilis strongly suggests that the YYRR box has a function. Given the estimated time of divergence between the Sophophora subgenus (containing D. melanogaster and D. pseudoobscura) and the Drosophila subgenus (containing D. virilis) (26,27), unconstrained DNA sequences in intronic and flanking regions surrounding genes are expected to be completely diverged between species from the two main subgenera. Indeed, comparisons of intronic sequences between D. melanogaster, D. pseudoobscura, and D. virilis for the Hsp82 gene reveal complete divergence using the identical program and parameter values we have used herein (28). Thus, the conservation of the Gld YYRR box and the first 50 bp immediately downstream of it (Figure 3) implies a conserved function. This hypothesis is further supported by the presence of a YYRR box associated with the ted gene, which exhibits an identical mutant phenotype and has a very similar pattern of mRNA expression to Gld (Cox & Cavener, unpublished data). The Gld genes in D. pseudoobscura and D. virilis are not abundantly expressed in adult males as is the case for the Gld gene in D. melanogaster, so it is unlikely that the YYRR box is involved in adult expression.
The most compelling hypothesis at this time is that the YYRR box serves as an enhancer or DNA amplification element which affects the developmental expression of Gld, ted, and other related genes during preadult development. If this hypothesis proves to be correct, then it is likely that the 25-30 copies of the YYRR box in Drosophila are contained within a large family of developmentally related genes. We have begun in vitro mutagenesis and gene transformation experiments to test the hypothesis that the YYRR box serves a regulatory function in the Gld gene.

Although it is tempting to speculate about the functional relationship of the YYRR box related sequences found in fruit flies, humans and rats, there are some important differences observed. Firstly, the copy number of the YYRR boxes in humans is an order of magnitude smaller than that of Drosophila. However, like Drosophila, intraspecific variation in the restriction fragments detected by GACT-32 is virtually absent, suggesting that the YYRR box is a non-mobile element and evolutionarily stable. Secondly, rats contain a highly amplified YYRR sequence. But this amplification is not tissue specific, and therefore it is unlikely that the amplification is involved in developmental regulation.

In addition to functional and evolutionary questions, the potential structure of the YYRR box is of considerable interest because it represents an extreme case of perturbation in the helix twist angle as estimated by Calladine's rules. Although the estimates of the helix twist angles in the YYRR box by the method of Tung and Harvey (8) predict less extreme average deviations, it is interesting to note that the predicted four angle series for the YYRR box (34.9°, 27.7°, 36.6°, and 36.4°) is very similar to the consensus five angle series (36.2°, 37.8°, 26.1°, 37.8° and 36.2°) found in E. coli promoters (29).
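
The resemblance between the two twist-angle series can be made concrete with a short calculation. The sketch below uses only the angles quoted above and reports, for each series, the position and size of the largest drop below the series mean. The choice of this summary statistic and the variable names are assumptions made for illustration.

```python
from statistics import mean

# Tung-Harvey twist angles (degrees) quoted in the text for the CTGA motif.
YYRR_STEPS = ["CT", "TG", "GA", "AC"]
YYRR_TWIST = [34.9, 27.7, 36.6, 36.4]

# Consensus five-angle series quoted in the text for E. coli promoters.
ECOLI_TWIST = [36.2, 37.8, 26.1, 37.8, 36.2]


def largest_drop(angles):
    """Index of the smallest angle and its distance below the series mean."""
    avg = mean(angles)
    index = min(range(len(angles)), key=angles.__getitem__)
    return index, avg - angles[index]


idx, drop = largest_drop(YYRR_TWIST)
print(YYRR_STEPS[idx], round(drop, 2))   # TG 6.2

idx, drop = largest_drop(ECOLI_TWIST)
print(idx, round(drop, 2))               # 2 8.72
```

In both series a single step falls well below the mean (about 6° for the TG step of the YYRR motif and about 9° for the central step of the promoter consensus), while the remaining angles lie within a few degrees of one another.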
Both the YYRR box and the E. coli consensus structure contain a YR step leading to a large decrease in the twist angle.

The notion that conformational effects of the dipyrimidine-dipurine nature of the YYRR box are more important than the primary sequence is supported by the pattern of mismatches to the CTGA motif. None of the mismatches to the CTGA motif give rise to symmetries observable in the primary sequence, whereas among the four sequenced YYRR boxes all fourteen of the mismatches to the YYRR motif are organized into palindromic symmetries (10 mismatches) or direct repeats (4 mismatches in the ted YYRR box). Although this striking observation implies that the palindromic sequences serve a function, it is curious that the palindromes are not precisely conserved for the Gld YYRR box among the three Drosophila species. One feature which is conserved among the three is the presence of tripyrimidines and tripurines to the left and right, respectively, of the axis of symmetry. This is also a characteristic of the ted YYRR box palindrome.

A number of investigators have suggested that helix structure variants may serve as recognition sites for specific regulatory proteins or proteins involved in DNA packaging (2,5-9,30). We have obtained preliminary results which indicate that Drosophila embryo extracts contain a protease sensitive factor(s) which strongly binds to a 217 bp restriction fragment containing the Gld YYRR box. Experiments are underway in our laboratory to characterize this factor and determine whether the basis of its recognition is dependent upon the primary sequence and/or the conformation of the YYRR box.

ACKNOWLEDGEMENTS

We thank Thomas Kaufman, Welcome Bender, Carl Thummel, and Ross MacIntyre for providing us with genomic clones or Southern blots of clones for four regions of the D. melanogaster genome.
We are grateful to Charles Singleton for pointing out the possible structural significance of the YYRR box and to Fritz Parl for providing us the rat and human DNAs.

REFERENCES

1. Calladine, C.R. (1982). J. Mol. Biol. 161, 343-352.
2. Dickerson, R.E. (1983). J. Mol. Biol. 166, 419-441.
3. Kabsch, W., Sander, C. & Trifonov, E.N. (1982). Nucl. Acids Res. 10, 1097-1104.
4. Fratini, A.V., Kopka, M.L., Drew, H.R. & Dickerson, R.E. (1982). J. Biol. Chem. 257, 14686-14707.
5. Nussinov, R., Shapiro, B., Lipkin, L.E. & Maizal, J.B. (1984a). Bioch. Bioph. Acta 783, 246-257.
6. Nussinov, R., Shapiro, B., Lipkin, L.E. & Maizal, J.B. (1984b). J. Mol. Biol. 177, 591-607.
7. Tung, C.S. & Harvey, S.C. (1984). Nucl. Acids Res. 12, 3343-3356.
8. Tung, C.S. & Harvey, S.C. (1986). J. Biol. Chem. 261, 3700-3709.
9. Tung, C.S. & Harvey, S.C. (1986). Nucl. Acids Res. 14, 381-388.
10. Drew, H.R. & Travers, A.A. (1984). Cell 37, 491-502.
11. Drew, H.R. (1984). J. Mol. Biol. 176, 535-557.
12. Lomonosoff, G.P., Butler, P.J.G. & Klug, A. (1981). J. Mol. Biol. 149, 745-760.
13. Keene, M.A. & Elgin, S.C.R. (1984). Cell 36, 121-129.
14. Cavener, D., Corbett, G., Cox, D. & Whetten, R. (1986a). EMBO J. 5, 2939-2948.
15. Cavener, D.R., Otteson, D. & Kaufman, T.C. (1986b). Genetics 114, 111-123.
16. Scott, M.P., Weiner, A.J., Hazelrigg, T., Polisky, B.A., Pirrotta, V., Scalenghe, F. & Kaufman, T.C. (1983). Cell 35, 763-776.
17. Henikoff, S. (1984). Gene 28, 351-359.
18. Sanger, F., Nicklen, S. & Coulson, A.R. (1977). Proc. Natl. Acad. Sci. USA 74, 5463-5467.
19. Bender, W., Akam, M., Karch, F., Beachy, P., Peifer, M., Spierer, P., Lewis, E.B. & Hogness, D.S. (1983). Science 221, 23-29.
20. Karch, F., Weiffenbach, B., Peifer, M., Bender, W., Duncan, I., Celniker, S., Crosby, M. & Lewis, E.B. (1985). Cell 43, 81-96.
21. Young, M.W. (1979). Proc. Natl. Acad. Sci. USA 76, 6274-6278.
22. Lewin, B. (1980). Gene Expression 2: Eukaryotic Chromosomes, 2nd ed. John Wiley & Sons, NY, pp. 540-569.
23. Wharton, K.A., Yedvobnick, B., Finnerty, V.G. & Artavanis-Tsakonas, S. (1985). Cell 40, 55-62.
24. Shimizu, A. & Honjo, T. (1984). Cell 36, 801-803.
25. Spradling, A.C., de Cicco, D.V., Wakimoto, B.T., Levine, J.F., Kalfayan, L.J. & Cooley, L. (1987). EMBO J. 6, 1045-1053.
26. Beverly, S.M. & Wilson, A.C. (1984). J. Mol. Evol. 21, 1-13.
27. Throckmorton, L.H. (1975). In Handbook of Genetics (King, R.C., ed.), vol. 3, pp. 421-469, Plenum Press, New York.
28. Blackman, R.K. & Meselson, M. (1986). J. Mol. Biol. 188, 499-515.
29. Tung, C.S. & Harvey, S.C. (1987). Nucl. Acids Res. 15, 4973-4984.
30. Nussinov, R. & Lennon, G.G. (1984). J. Mol. Evol. 20, 106-110.