Blackwell Science, LtdOxford, UKMMIMolecular Microbiology0950-382XBlackwell Publishing Ltd, 2003482429441Original ArticlePredicting mutations in stem–loop structuresB. E. Wright et al. Molecular Microbiology (2003) 48(2), 429–441 Predicting mutation frequencies in stem–loop structures of derepressed genes: implications for evolution Barbara E. Wright,* Dennis K. Reschke, Karen H. Schmidt, Jacqueline M. Reimers and William Knight Division of Biological Sciences, University of Montana, Missoula, MT 59812, USA. the background mutable bases in lacI. (iv) The data suggest that environmental stressors may cause as well as select mutations in derepressed genes. The implications of these results for evolution are discussed. Summary Introduction This work provides evidence that, during transcription, the mutability (propensity to mutate) of a base in a DNA secondary structure depends both on the stability of the structure and on the extent to which the base is unpaired. Zuker's DNA folding computer program reveals the most probable stem–loop structures ∆G) for any (SLSs) and negative energies of folding (–∆ given nucleotide sequence. We developed an interfacing program that calculates (i) the percentage of folds in which each base is unpaired during transcription; and (ii) the mutability index (MI) for each base, expressed as an absolute value and defined as follows: MI = (% total folds in which the base is ∆G of all folds in which it is unpaired) × (highest –∆ unpaired). Thus, MIs predict the relative mutation or reversion frequencies of unpaired bases in SLSs. MIs for 16 mutable bases in auxotrophs, selected during starvation in derepressed genes, are compared with 70 background mutations in lacI and ebgR that were not derepressed during mutant selection. All the results are consistent with the location of known mutable bases in SLSs. Specific conclusions are: (i) Of 16 mutable bases in transcribing genes, 87% have higher MIs than the average base of the sequence analysed, compared with 50% for the 70 background mutations. (ii) In 15 of the mutable bases of transcribing genes, the correlation between MIs and relative mutation frequencies determined experimentally is good. There is no correlation for 35 mutable bases in the lacI gene. (iii) In derepressed auxotrophs, 100% of the codons containing the mutable bases are within one codon's length of a stem, compared with 53% for Gene-specific transcription-induced mutations have been described in growing cells of prokaryotes (Brock, 1971; Herman and Dworkin, 1971; Beletskii and Bhagwat, 1996) and eukaryotes (Datta and Jinks-Robertson, 1995), as well as in starving cells of Escherichia coli (Lipschutz et al., 1965; Wright, 1996; 1997; Wright and Minnick, 1997; Rudner et al., 1999; Wright et al., 1999). Recent reviews have discussed the mechanisms by which starvation and gene derepression can enhance non-random mutations (Foster, 1999; Wright, 2000; Bridges, 2001; Rosenberg, 2001). Previous studies have demonstrated a correlation between mutation rates and transcription as determined by measuring levels of specific mRNA (Wright et al., 1999). However, accurate rates of mRNA synthesis can only be determined when both mRNA concentration and half-life are known. Current studies on mRNA turnover (J. M. Reimers et al., submitted) have confirmed the conclusions of the earlier investigations (Wright et al., 1999). There are at least two mechanisms by which transcription can increase mutation rates in localized areas of the genome: (i) separation of the two DNA strands exposes the non-transcribed strand, which is then vulnerable to mutations (Lindahl and Nyberg, 1972; 1974; Coulondre et al., 1978; Singer and Sterns, 1982; Fix and Glickman, 1987; Frederico et al., 1990; 1993; Skandalis et al., 1994); and (ii) the advancing RNA polymerase complex drives localized supercoiling (Balke and Gralla, 1987; Liu and Wang, 1987; Dayn et al., 1991; Dayn et al., 1992; Rahmouni and Wells, 1992; Krasilnikov et al., 1999), which creates and stabilizes stem–loop structures (SLSs) containing vulnerable unpaired and mispaired bases (Ripley, 1982; Todd and Glickman, 1982; Pananicolaou and Ripley, 1991). Thus, these two mechanisms, in concert, are ideal for obtaining the variants most likely to survive each kind of stress while minimizing random genomic damage (Wright, 2000). Accepted 13 December, 2002. *For correspondence. E-mail [email protected]; Tel. (+1) 406 243 6676; Fax (+1) 406 243 4184. © 2003 Blackwell Publishing Ltd 430 B. E. Wright et al. Evidence from many laboratories has demonstrated that transcription drives localized supercoiling and that stem– loop conformations contain mutable bases. However, no direct evidence has yet been presented to document a correlation in specific genes between increased rates of transcription and enhanced mutation localized in unpaired bases of SLSs. The highest concentration of supercoiling and SLSs occurs in the wake (5′) of the transcription bubble, and SLSs can form in either or both strands (cruciforms). DNA secondary structures not only expose mutable bases, but also increase the time of their exposure by periodically blocking the progress of transcribing RNA polymerase complexes during transcript elongation. Transcriptionenhanced mutations should therefore occur preferentially at pause sites, and mutation rates should be increased by activators such as guanosine tetraphosphate (ppGpp), which has been shown to lengthen the duration of these pauses (Kingston et al., 1981). Specific effects of stress on transcription are essential to the mechanisms of nonrandom mutations in evolution (see Discussion; Wright, 2000). To explore the possible correlation of secondary structure formation and sites of mutation, Zuker's DNA folding computer program (Benham, 1992; SantaLucia, 1998; Zuker et al., 1999) was used. This program folds a segment of ssDNA, predicts the most thermodynamically stable secondary structure that could form and calculates the negative energy of folding (–∆G) for each structure (http://www.bioinfo.rpi.edu/applications/mfold/old/dna). For example, a 200 nucleotide (nt) sequence containing a known mutable base is copied as input to the program. The program is then instructed to fold a series of 30 nt segments that include the mutable base, and then to show the structure with the highest –∆G for that base. The theoretical methods used in screening DNA sequences for sites susceptible to superhelical strand separation and transition to alternative structures have been verified experimentally in other systems (Benham, 1992). A computer program that interfaces with Zuker's program has been developed in this laboratory in order to obtain additional information regarding the mutability of individual bases. This new program calculates the extent to which each base is unpaired during transcription (see Experimental procedures). In four Escherichia coli auxotrophs (leuB, argH, malT and lacZ) studied in our laboratory, mutant sequences and number of mutations have been determined. Sequences and mutation frequencies of seven trpA auxotrophs were found in the literature (Isbell and Fowler, 1989; Bhamre et al., 2001). As controls, data were also obtained from the literature for background (‘spontaneous’) mutations not isolated in derepressed genes, namely lacI and ebgR (Schaaper et al., 1986; Hall, 1999). The presence of loss- of-function mutations in these repressor genes is revealed indirectly by allowing the constitutive expression of βgalactosidase. Results The location of mutable bases in high and low –∆G areas of a gene In order to correlate the location of mutable bases in SLSs with the –∆G of their structures, about 130 nt of the nontranscribed strand, including the mutation(s) in leuB, argH, malT or lacZ, were folded in successive, overlapping 30 nt segments, beginning at each fifth nt (Fig. 1). Thus, each nucleotide was included in six successive folds. The magnitudes of the –∆G values indicate the relative stability of the structures formed by each segment: the greater the – ∆G value, the more likely a stable secondary structure will form and create a pause site. Figure 1 shows that mutations generally occur in structures with high –∆Gs. That is, in lacZ, leuB, argH, malT and codon 49 of trpA, mutations occur in the highest –∆G peaks present. In the other three codons of interest in trpA, mutations occur in three of the four highest peaks. It should be pointed out that a high –∆G results primarily from the stability of the stem(s), and does not necessarily indicate the presence of an unpaired base vulnerable to a chemical change (see Discussion). Presumably, mutations usually occur when an unpaired base exists in its highest –∆G structure. The SLSs that form at peak –∆G values in selected genes are described below. The DNA folding program appears to describe a biochemical process that is robust. That is, although the structures formed in the 30 nt window are not as large and stable as those in 40 nt and 50 nt windows (not shown), all three analyses show a peak of high negative energy values and structures with a similar conformation at the site of the mutable base. This suggests a reliable and buffered system for the dynamic interconversion of secondary structures. Thus, despite probable continuous variations in the length of segments folded during transcription, mutation and (presumably) pause sites generally coincide with peak –∆G values. The most stringent condition used, folding 30 nt (compared with 40 and 50 nt) shows the most specific correlation of maximum peak – ∆G values with the presence of each mutant nucleotide, reflecting, perhaps, the average number of nucleotides folded in vivo (Fig. 1). Although the 50 nt foldings had peaks at the mutation sites, they also had higher peaks elsewhere, and were therefore not as specifically correlated with the mutation sites. Very large windows generate very complex structures but, in choosing a smaller window, we attempted to simulate transcription conditions in silico. Therefore, a 30 nt window was chosen for analysing © 2003 Blackwell Publishing Ltd, Molecular Microbiology, 48, 429–441 Predicting mutations in stem–loop structures 431 Fig. 1. The location and –∆G values of 16 known mutable bases in E. coli auxotrophs. Beginning at each fifth nucleotide, 130 nt (lacZ, including codon 63; leuB, including codon 286; argH, including codon 4; malT, including codon 545; and trpA, including codon 49) or 260 nt (trpA, including codons 183, 211 and 234) of the non-transcribed strands were folded in successive, overlapping 30 nt windows. Each bar indicates the beginning of a window, and the number of mutable bases in each fold appears above each bar. The entire trpA gene was included in the analysis, but a 340 nt segment without mutable bases was omitted from the following sequence and is indicated by dotted lines. Codon numbers are in parenthesis, and mutable bases are capitalized. …gct aat tga agc cgg tgc tga gcg ctg GAg (49) tta ggc atc ccc ttc tcc gac cca ctg gca gat ggc ccg acg att caa … gca ggc gtg aCc (183) ggc gca gaa aac cgc gcc gcg tta ccc ctc aat cat ctg gtt gcg aag ctg aaa gag tac aac gct gca cct cca ttg cag GGa (211) ttt ggt att tcc gcc ccg gat cag gta aaa gca gcg att gat gca gga gct gcg ggc gcg att tct GGt (234) tcg ccc… the mutant sequences. However, in the analysis of lacZ, it was also found necessary to use a 40 nt window in order to understand the mechanism of the triple mutation that occurred (see below). © 2003 Blackwell Publishing Ltd, Molecular Microbiology, 48, 429–441 In the trpA gene (Yanofsky et al., 1981; Isbell and Fowler, 1989; Bhamre et al., 2001), the locations of seven mutable bases in four codons are known, and the sequences containing these bases are shown (Fig. 1). 432 B. E. Wright et al. Fig. 2. The location and –∆G values of mutable bases in the E. coli repressors, lacI and ebgR. Beginning every fifth nt, 30 nt windows were folded. Each bar indicates the nucleotide at the beginning of the window, and the number of mutable bases in each fold appears above each bar. Wild type lacI, 57 mutations (mutable bases are capitalized) have been found: …gtg aaa cca gta ACg tta tac gat gTc GCa gag tat Gcc ggt gTc tCt Tat CAG ACc GTt TCc CGc GTg gTg Aac CAg gcc agc cac GTt tCt gcg aaa Acg cgg gaa aaa GTg Gaa gcg GCg atg Gcg gag ctg aat TAC att cCc aAC cgc gTg GCa CAA CAa cTg gCg Ggc Aaa Cag tCg ttg ctg att ggc gtt gcc acc tcc agt cTg gcc… Wild-type ebgR sequence, 13 mutation sites (mutable bases are capitalized) have been found: …gac atc gca atc gaa gCt ggc gta Tcc ctg gcg aca Gta tcc AGg gtc tta aat gac gAT ccg aca ttg Aat Gtg aaa gaa gag aCg aaa cat cgc att ctc Gag atc gcc Gaa aag ctg gag taC aag acc… These seven mutations are also located in high –∆G peaks. In contrast, Fig. 2 shows an apparently random distribution of mutable bases in lacI and ebgR in which transcription was not induced. Note, especially in the large lacI data set (Fig. 2 legend), that most of the 414 mutations that were sequenced are in the first and second positions of the codons (see Discussion). Calculation of a mutability index for mutable bases DNA folding analyses have suggested that mutable bases in selected, derepressed genes are preferentially located in high –∆G SLSs, in contrast to mutable bases in noninduced genes. The mutability of a base should depend both on the stability (–∆G) of its SLS and on whether the base is unpaired in these structures (Lindahl and Nyberg, 1972; 1974; Coulondre et al., 1978; Singer and Sterns, 1982; Fix and Glickman, 1987; Frederico et al., 1990; 1993; Skandalis et al., 1994). Therefore, a program has been developed that will provide this information for each base (see Experimental procedures). To describe and evaluate these assumed causes of base mutability, we have defined a new term, called the mutability index (MI), and express it as an absolute value: MI =(% total folds in which the base is unpaired) × (highest –∆G of all folds in which it is unpaired). Thus, MIs indicate the relative mutation frequencies of bases in predicted SLSs. The highest –∆G term gives the most weight to the most stable fold containing the unpaired base. That is, primary importance is given to the structure in which the base is exposed for the longest period of time and therefore has the highest probability of mutation. In addition, the average –∆G and MI of all the bases included in the sequence analysed are calculated, for comparison with individual base –∆Gs and MIs. This information is necessary, as mutable bases exist in different –∆G regions. For example, the average –∆G value in the lacZ analysis is 4.5, whereas this value for leuB is 2.4 (Fig. 1). The percentage of each wild-type mutable base MI above or below the average MI in the –∆G region of each base was calculated for the 16 mutable bases in auxotrophs and for the 70 mutable bases in control genes. The percentage above average MI values for the auxotroph bases is 87%, and this value for lacI and ebgR is 50%. Comparing MIs with mutation frequencies If mutable bases are located in unpaired bases of SLSs predicted by the Zuker DNA folding program, MI values may predict relative mutation frequencies. Sequences and reversion frequencies have been determined for bases in the same codons of seven trpA mutant alleles (Isbell and Fowler, 1989; Bhamre et al., 2001). In Fig. 3, the SLSs with the highest –∆G for wild type and three mutant alleles in codon 49 are shown. In the wild type, a strong stem with five C*G basepairs is formed. As the G in the first position of the codon is in the stem, its MI is very low (0.3) in this 30 nt SLS, as well as in smaller windows (not © 2003 Blackwell Publishing Ltd, Molecular Microbiology, 48, 429–441 Predicting mutations in stem–loop structures 433 Fig. 3. Predicted SLSs at peak –∆G values are shown for wild type and three mutant alleles of codon 49 of trpA. The predicted MI values for the mutations (arrows) are compared in the table inset with mutation frequencies determined experimentally in two data sets. See Table 1 and text for details . shown). However, in higher –∆G folds of 35 or 40 nt, more stable stems occur, and the MI is zero, i.e. this G is paired in all folds. Therefore, the mutation of this G to an A or T, giving rise to alleles 11 and 88, respectively, is predicted to occur in relatively small SLSs. These latter SLSs have only three C*G pairs in their stems and thus lower –∆G values (see bar graph in Fig. 3). However, as they are unpaired much of the time, their MIs are much higher (2.3 and 2.5) than that of the wild-type G (0.3). The relative MI values calculated for the three mutable bases in these mutants compare favourably with their relative reversion frequencies (Isbell and Fowler, 1989; Bhamre et al., 2001; see Fig. 3, table inset). Mutation of the middle base in codon 49 (from an A to a T) retains the strong stem (allele 3), and the MI of that T reverting to wild-type A is the highest of all (3.5). These relative MIs predict that the reversion frequency of allele 3 should be higher than that of 11 and 88, and that the latter alleles should have comparable reversion frequencies. The table inset compares MIs with reversion frequencies. Figure 4 summarizes similar data for four more mutant alleles in codons 211 and 234 of trpA. Again, the relative MIs compare well with their relative reversion frequencies, and MIs of the mutant bases are higher than those of the wild-type base to which they revert. © 2003 Blackwell Publishing Ltd, Molecular Microbiology, 48, 429–441 Figures 3 and 4 also show that the highest –∆G SLSs of wild-type sequences are often the same as that of their mutant alleles. For example, in codon 49, the most stable SLS of allele A 3 is the same as that of wild type. Similarly, A 23 and A 26 in codon 211 and A 78 and A 58 in codon 234 have the same SLSs as their respective wild types. Transcription should enhance the mutation frequency of all mutable bases, including those of wild type, by localized increases in the number of SLSs. The most probable SLSs formed by the non-transcribed strands (NTSs) at peak –∆G values were compared with those formed by the transcribed strands (TSs) and usually found to mirror one another. In such genes, therefore, mutations are equally probable in the NTS and TS. However, in the argH gene (and, to a lesser extent, leuB), different structures form in 30 nt folds of the two strands, as shown in Fig. 5A. This occurs because of nearest neighbour interactions, the asymmetry of the base stacking rules and because a stem structure containing a G*T bond forms in the TS (SantaLucia, 1998; Zuker et al., 1999). This stem cannot form in the NTS, which has a C and an A in a comparable region of the SLS. The longer stem present in the TS bestows a higher –∆G and higher MIs to that SLS compared with the NTS, and predicts that mutations are more likely in the TS. In both the TS and 434 B. E. Wright et al. Fig. 4. Predicted SLSs at peak negative energy values for wild type and two mutant alleles for codons 211 and 234 of trpA. Predicted MIs and mutation frequencies determined experimentally are compared in the table inset. See Table 1 and text for details. the NTS, the MIs of the mutable bases are somewhat higher than the MIs of the wild-type bases (not shown). In the NTS, folding 40 nt gives similar MIs for the first two codon positions but increases the MI in the third position from 1.5 to 2.8. In all folds above 20 nt, the third codon position has the highest MI value, consistent with the number of mutations determined by sequencing 23 revertants (Fig. 5A, table inset). In Fig. 5B, a qualitative comparison of MIs with the number of mutation events is shown for the leuB mutant. In most cases, 30 nt SLSs have higher –∆G values than 20 nt SLSs but, in this case, the stems are the same, and the long sequences at the ends of the 30 nt SLSs may well be paired with their complements. As shown in the table inset of Fig. 5B, the MIs of the mutable bases are comparable and consistent with the Fig. 5. Predicted SLSs of argH (A) and leuB (B) genes. The predicted MI of each mutable base is indicated by an arrow. Inset table shows the MI of each mutable base compared with the number of mutations (No.) determined by sequencing 23 revertants in the case of argH and 36 in the case of leuB. See text for discussion. © 2003 Blackwell Publishing Ltd, Molecular Microbiology, 48, 429–441 Predicting mutations in stem–loop structures 435 results of sequencing 36 revertants (Wright and Minnick, 1997). Table 1 summarizes the comparisons in the same codon between calculated MIs and reversion frequencies for the seven trpA auxotrophs, and Table 2 summarizes sequencing data for the argH, leuB and lacZ auxotrophs. Data are also available for a comparable analysis of 414 background mutable bases in lacI (Schaaper and Dunn, 1991; Fig. 6). In this sequence, all mutable bases within a codon in which at least one base had three or more mutational events were analysed, giving a total of 35. The number of mutants shown above each base was counted, and the results are summarized in Table 3. Mutation frequencies (trpA) and number of mutations (lacI) are plotted against MIs to obtain linear regression analyses of the data in Tables 1 and 3 (Fig. 7). The R2 value for the trpA alleles is 0.57 and for the lacI data set is 0.002. Although the qualitative comparisons of MIs and mutation events are good for the mutable bases in Table 2, there are too few data points for this type of analysis. The triple mutant The lacZ stop codon mutant and its triple revertant are shown in Fig. 8. As mentioned previously, a 30 nt window was chosen for the analysis of SLSs. However, when the very long and stable stem was observed, it was necessary to increase the window to 32 nt in order to include the TAG codon (Fig. 8). In the course of sequencing 47 of the revertants, we observed many types of single base revertants, a four codon deletion and, on nine occasions, a triple mutation was isolated (Table 2). As seen in Table 2, a number of amino acid substitutions can result in an active enzyme, although different growth rates are observed. This triple mutation must have arisen by a templating mechanism. The reversion frequency of this stop codon is ≈ 10−9, and three Table 1. Comparisons in the same codons between MIs and reversion frequencies in seven mutable alleles of the derepressed trpA gene. Revertants/108 cells Codon Mutant allele MI Expt Aa Expt Bb 49 49 49 211 211 234 234 trpA trpA trpA trpA trpA trpA trpA 3.5 2.3 2.5 3.6 3.1 2.8 3.4 0.4 0.06 NDc 1.1 0.8 0.3 ND 1.3 0.2 0.3 ND ND 0.3 1.8 3 11 88 23 46 58 78 a. Taken from Table 3 in Bhamre et al. (2001). For allele 58, the frequencies of A to G and to C were summed. b. Taken from Table 4 in Isbell and Fowler (1989). Only class I full revertant frequencies were used. c. No data available. © 2003 Blackwell Publishing Ltd, Molecular Microbiology, 48, 429–441 Table 2. Comparisons between MIs and relative numbers of mutations determined by sequencing in eight mutable alleles of three derepressed genes. Mutant Codon Base change MI No. of mutations argH 4 4 4 286 286 286 63 63 63 63 T to G, C, or A G to T or C A to C, G, or T T to A or G T to C G to A, C or T T to C, G or A A to G or C G to C or T TAG to AGC 0.8 0.8 1.5 2.6 2.6 – 13.9 13.9 13.9 Triple 8 2a 13 20 16 Not observed 7b 19 12 9 leuB lacZ a. This number of revertants is lower than expected, but mutation of G to A results in a stop codon that would not be observed. b. Theoretically, mutations in the first position should occur as frequently as those in the second and third positions. This value may be low due to (observed) lower growth rates of these revertants, just as the A to G revertants (wild type) in the second position (wild type) may be high because these revertants have the highest growth rate. simultaneous independent mutations would have the unlikely frequency of (10−9)3. Inspection of the sequences at the end of the 32 nt SLS revealed that, were the structure extended to 40 nt, a bubble would appear in which templating from the complementary strand could change the TAG stop codon to the triple mutant shown in Fig. 8. There is precedence for this mechanism in the literature (Ripley, 1982), and the existence of this templated triple mutation provides compelling evidence for the role of SLSs in vivo. Discussion Evidence in the literature indicates that DNA sequence can determine the location of a background mutation, but the frequency with which it occurs depends primarily upon various kinds of DNA-destabilizing metabolic events. When growth is inhibited, DNA replication is minimized and transcription of specific derepressed genes in response to an inhibitor or condition of stress may then be a major cause of mutations (Wright, 2000). In 11 strains of leuB and argH auxotrophs, isogenic except for relA, a correlation was found between mutation rates, ppGpp and mRNA levels (Wright, 1996; Wright and Minnick, 1997; Longacre et al., 1999; Wright et al., 1999). Similar results have been obtained in B. subtilis, in which reversion rates of three different amino acid auxotrophs were higher in strains able to produce the transcriptional activator guanosine tetraphosphate (Rudner et al., 1999). Current studies with malT and lacZ auxotrophs also indicate that those strains able to activate transcription (cya+, crp+) have higher reversion rates than the strains (cya and crp mutants) unable to activate transcription (unpublished data). The presence of these mutations in unpaired bases 436 B. E. Wright et al. Fig. 6. The number of background mutations in lacI (Schaaper and Dunn, 1991). of predicted SLSs implicates supercoiling and is consistent with the literature. Background point mutations in unpaired bases occur by known chemical mechanisms having finite, significant activation energies under physiological conditions, and such lesions can subsequently be immortalized by replication. The thermodynamic properties of hydrolytic reactions that occur in nucleic acids under these conditions are such that the deamination of C is much greater than that of A, and the depurination of G or A is much greater than the depyrimidation of C or T (reviewed by Singer and Sterns, 1982). Also, the oxidation potential of a G located 5′ to another G is greater than it is next to a C or T (reviewed by Sugden and Stearns, 2000). Because of its size, an A is more likely than a C or a T to replace a G. The lacI data set (Fig. 6) is sufficiently large to document these statements, i.e. 75% of the G mutations are to A, and 88% of the C mutations are to T. A recent analysis (Wright et al., 2002) of 14 000 mutations in the human p53 gene shows a similar pattern, with values of 74% and 91% respectively. A significant source of unpaired and mispaired bases subject to the kinds of mutations described above are the loops of SLSs that can often elude detection and repair (Moore et al., 1999). These secondary DNA structures are naturally present because of the frequent occurrence of inverted complements in DNA sequences (Lilley, 1980; Papyotatos and Wells, 1981). It should be pointed out that, in general, specific positions in these SLSs (e.g. a base near a stem) are uniquely vulnerable to mutation, regardless of which base is present at that position. During transcript elongation and the probable continuous variations in the length of the nucleotide segments folded in vivo, the stems, which create the SLSs, are the invariant part. Thus, the most reliable location for an unpaired base that evolved to be mutable is near a stem. In prokaryotes, the most mutable bases are in codons that are within a codon's length of a stem (Figs 3–5 and 8). In the eukaryotic p53 tumour suppressor gene, the most mutable bases are located immediately next to stems in stable SLSs. The predicted MIs correlate well (R 2 = 0.76) with those observed for 14 000 human cancers, whereas no such © 2003 Blackwell Publishing Ltd, Molecular Microbiology, 48, 429–441 Predicting mutations in stem–loop structures 437 Table 3. Comparisons in the same codon between MIs and the number of background mutations in 35 mutable bases of lacI. Allele MI No. of mutations 41 42 56 57 80 81 82 83 84 86 87 89 90 92 93 95 96 104 105 116 117 140 141 149 150 167 168 169 177 178 185 186 188 189 190 3.6 0.4 0.6 0.6 3.9 3.2 3.2 0.8 0.8 1.4 4.3 4.7 4.7 4.7 3.4 2.7 0.3 3.1 3.4 1.5 0.1 4.2 3.5 0.6 2.2 3.5 3.5 3.3 3.8 1.3 2.0 3.5 2.9 2.8 3.1 2 3 6 10 3 5 1 17 5 7 3 1 4 7 16 5 10 29 5 4 1 1 12 3 2 15 6 1 6 1 9 15 6 3 3 correlation (R 2 = 0.0005) is seen for nearby control bases (Wright et al., 2002). Do measured mutation rates really represent mutation rates in vivo? An appropriate gene to consider in discussing the true frequency of background mutations (whether or not they are silent) is lacI, in which 25% of the bases are known to be mutable (Fig. 6). However, 90% of these mutations occur in the first two codon positions, as changes in the third position are usually silent, i.e. do not alter the encoded amino acid. As mutations are also undoubtedly occurring in the third position, these can be added to the 57 observed mutated bases. Thus, about 40% of the bases in lacI are vulnerable to mutation. This does not include all the other silent mutations that do not inactivate this repressor. Thus, the majority of all bases appear to be mutable, but many of these are not detectable. Although SLSs are similar for mutable bases in the auxotrophs and in lacI, a number of differences have been noted. In lacI, mutable bases occur more frequently at a greater distance from a stem, or in an unlooped, single strand 8–11 bases from a stem. The existence of such a structure may be unlikely, as that length of ssDNA would probably be bonded to its complement as dsDNA. The same mechanisms that give rise to the lacI mutations must of course occur at some frequency in derepressed auxotroph genes. The analysis of background mutable bases in lacI indicates that as many mutations occur in high as in low –∆G SLSs. In this system, all mutations have equal probabilities of being detected and sequenced. However, this is not the case with revertants of auxotrophs selected in derepressed genes during starvation. Transcription enhances localized supercoiling and increases the concentration of SLSs that harbour unpaired bases susceptible to mutation. Therefore, these vulnerable bases located in high –∆G SLSs are selected during starvation. Such bases will mutate the most frequently, enrich the mutant population with the most progeny and have the highest probability of being isolated. Many kinds of environmental challenge can target specific genes for supercoiling and enhanced mutagenesis. Data summarized in this article indicate that nutritional stress can localize enhanced mutability in SLSs resulting from selective transcription. These circumstances can Fig. 7. Linear regression analyses for the data in Tables 1 and 3. Mutation frequencies or number of mutations are plotted against MIs for seven mutable alleles in the derepressed trpA gene (Isbell and Fowler, 1989; Bhamre et al., 2001), and for 35 background mutable alleles in lacI (Schaaper and Dunn, 1991). © 2003 Blackwell Publishing Ltd, Molecular Microbiology, 48, 429–441 438 B. E. Wright et al. Fig. 8. The 32 nt and 40 nt SLSs of lacZ. When 40 nt are folded, it can be seen that the triple mutation, AGC, could arise by templating to the complementary TCG sequence in the opposite strand. result in ‘forward’ mutations beneficial to evolution. For example, during starvation for ribitol in the presence of xylitol, two existing genes are modified to regulate the transport and use of xylose as a new carbon source (Lerner et al., 1964; Wright, 2000). Other stressors, such as extreme osmotic conditions, heat shock, host–pathogen interactions or metal toxicity, could direct mutations to related genes potentially able to alleviate these problems. (Borowiec and Gralla, 1987; Higgins et al., 1988; Pruss and Drlica, 1989; Ansari et al., 1992; Dorman, 1995; Jordi et al., 1995; Lefstin and Yamamoto, 1998; Massey et al., 1999). Specific areas of the genome are targeted for localized transcription and supercoiling, increasing the concentration of SLSs and unpaired bases vulnerable to mutation. This mechanism provides a continuous feedback loop between the ever-changing environment, productive targets for localized enhanced mutation frequencies and selection of the fittest. As most mutations are detrimental, directing mutations to those genes regulating the cell's response to each type of stress minimizes genome-wide genetic damage while creating the most appropriate variants for accelerating evolution. Just as DNA secondary structures may be considered an additional dimension of gene expression (Wells et al., 1980), so may the mutations that occur because of their location in these structures. These structures are present many thousands of times more frequently than expected by chance (Lilley, 1980) and, in viral DNAs, they are overrepresented in regions that appear to have regulatory significance (Müller and Fitch, 1982). In higher organisms, these hypermutable sequences have apparently become localized to increase variants in genes of critical importance to the survival of the organism (Forsdyke, 1995). However, the same mechanisms by which stress can cause mutations and accelerate evolution in microorganisms may cause cancer in multicellular organisms (Rady et al., 1992; Ionov et al., 1993; Skandalis et al., 1994; Panagopoulos et al., 1997; Colman et al., 2000), where survival of the fittest mutant can result in abnormal growth. There is reason to believe that the evolution of the DNA world was paralleled by the development of systems such as supercoiling to help maintain the integrity, geometry and functionality of DNA (López-García, 1999). The essential role of negative supercoiling in the regulation of all aspects of DNA behaviour, especially in response to environmental stress, is becoming increasingly apparent. If specific conditions of stress target particular areas of the genome for negative supercoiling, the formation of secondary structures will be facilitated; if stress first activates transcription, this will in turn drive supercoiling. By either scenario, the resulting increase in concentration of SLSs will result in localized mutations, and a number of circumstances can apparently converge to make responses to particular adverse conditions quite specific. In bases within the same codon, differential effects of other variables affecting mutability should be minimal. These contiguous bases obviously have a very similar microenvironment with respect to supercoiling domains (Sinden and Pettijohn, 1981; Krasilnikov et al., 1999), ssDNA binding proteins, accessory proteins involved in transcription, repair systems and so on. Therefore, comparing their relative mutation frequencies or events is ideal for evaluating the relevance of the factors used in calculating MI, and for testing the validity of our proposed model for predicting mutability. In fact, all our results are consistent with the location of mutable bases in SLSs. Based on this assumption, MIs for mutable bases in selected genes have, with unanticipated success, predicted 15 relative mutation frequencies (trpA) or events (argH, leuB and lacZ). This correlation occurred in spite of the fact that an MI depends upon only two variables: (i) the highest –∆G of all the SLSs in which it is unpaired; and (ii) the extent to which the base is unpaired in all its foldings during transcription. © 2003 Blackwell Publishing Ltd, Molecular Microbiology, 48, 429–441 Predicting mutations in stem–loop structures 439 Experimental procedures The analysis of DNA secondary structures A new software tool called ‘mfg’ was developed for use in this paper for predicting mutation frequencies. Given an input sequence, the set of all subsequences containing that base is generated. From this set, mfg calculates the singlestranded percentage of the base and the energy of the most stable structure containing the base. These are multiplied to produce the MI value for that base. A full description of this program and instructions for its use can be found at http://biology.dbs.umt.edu/wright/upload/mfg.html. The program currently runs on the Windows platform only, but source code is available for porting to other platforms. Mutation rate conditions used to obtain revertants for sequencing Mutation rates for argH and leuB were determined as described previously (Wright and Minnick, 1997). The reversion rate of codon 63 of lacZ was determined as follows: E. coli CSH2 cells from a 12- to 24-h-old rich media plate were diluted in 75 ml of minimal medium, consisting of 50 mM sodium phosphate buffer, pH 6.5, 1.0 g l−1 (NH4)2SO4, 1.0 g l−1 MgSO4, 0.3 mM glucose and 0.3 mM tryptophan to a cell density of 5 × 10−4 ml−1. This inoculum was divided into 45 2cm-diameter test tubes at 1.5 ml per tube and grown for 18– 28 h with shaking, at a 45° angle at 37°C. Cell growth was followed by hourly sampling companion tubes at a 1:10 dilution in buffered saline. When the cell concentration reached the end of log growth, serial dilutions were made from two companion cultures, and 10−6 and 10−7 dilutions were plated on rich nutrient agar plates and incubated for 24 h at 37°C to determine viable counts. The contents of 40 culture tubes were distributed on selective [5 g l−1 lactose, salts (as above), 0.3 mM tryptophan, 0.1 mM NaCl] agar plates. Plates were incubated for 40–48 h at 37°C and evaluated for revertants (Luria and Delbrück, 1943). Although mutation rates per se are not reported in this paper, the above procedures were used to obtain the revertants sequenced (Tables 2 and 3). Sequencing of revertants Revertants were isolated from mutation rate plates containing a single colony at 72 h, restruck on selective plates and grown for 24 h at 37°C. An inoculum was transferred to 2 ml of Luria–Bertani (LB) broth and grown for 8 h with shaking at 37°C. The culture was transferred to a microcentrifuge tube and pelleted at 10 000 g for 10 min. A Qiagen DNeasy tissue kit was used to extract chromosomal DNA. The lacZ DNA was amplified by polymerase chain reaction (PCR) in a 50 µl volume using 500 ng of template and the primer pair: LacZ5′ (5′-ATGACCATGATTACGGATTC-3′) and LacZ3′ (5′TTATTTTTGACACCAGACCAAC-3′) used at a concentration of 50 pmol per reaction. Other components of the PCR, all from New England BioLabs, were 10× polymerase buffer, VentR DNA polymerase, MgSO4 and dNTPs. After 30 cycles, 2 µl of the product was visualized on a 1% agarose gel containing 0.5 µg ml−1 of ethidium bromide, and the remainder was cleaned with a Qiaquick PCR purification kit © 2003 Blackwell Publishing Ltd, Molecular Microbiology, 48, 429–441 (Qiagen). Sequencing was performed at the Murdock Molecular Biology Facility using a BigDye Terminator ready reaction kit with an Applied Biosystems Model 373A stretch DNA sequencer. Acknowledgements We are indebted to Drs Robert G. Fowler, Bryn Bridges, Roel M. Schaaper and Scott Samuels for valuable criticisms of and suggestions for the manuscript. We thank Dr George Card for the CSH2 E. coli strain. This work was supported by National Institutes of Health grants R15CA88893 and R55CA99242 and the Stella Duncan Memorial Research Institute. References Ansari, A.Z., Chael, M.L., and O'Halloran, T.V. (1992) Allosteric underwinding of DNA is a critical step in positive control of transcription by Hg-MerR. Nature 355: 87–89. Balke, V.L., and Gralla, J.D. (1987) Changes in the linking number of supercoiled DNA accompany growth transitions in Escherichia coli. J Bacteriol 169: 4499–4506. Beletskii, A., and Bhagwat, A.S. (1996) Transcription-induced mutations: increase in C-to-T mutations in the nontranscribed strand during transcription in Escherichia coli. Proc Natl Acad Sci USA 93: 13919–13924. Benham, C.J. (1992) Energetics of the strand separation transition in superhelical DNA. J Mol Biol 225: 835–847. Bhamre, S., Bedrick, B.G., Koyama, C.A., White, S.J., and Fowler, R.G. (2001) An aerobic recA-, umuC-dependent pathway of spontaneous base-pair substitution mutagenesis in Escherichia coli. Mutat Res 473: 229–247. Borowiec, J.A., and Gralla, J.D. (1987) All three elements of the lac ps promoter mediate its transcriptional response to DNA supercoiling. J Mol Biol 195: 89–97. Bridges, B.A. (2001) Hypermutation in bacteria and other cellular systems. Phil Trans R Soc London B 356: 29–39. Brock, R.D. (1971) Differential mutation of the β-galactosidase gene of Escherichia coli. Mutat Res 11: 181–186. Colman, M.S., Afshari, C.A., and Barrett, J.C. (2000) Regulation of p53 stability and activity in response to genotoxic stress. Mutat Res 462: 179–188. Coulondre, C., Miller, J.H., Farabaugh, P.J., and Gilbert, W. (1978) Molecular basis of base substitution hotspots in Escherichia coli. Nature 274: 775–780. Datta, A., and Jinks-Robertson, S. (1995) Association of increased spontaneous mutation rates with high levels of transcription in yeast. Science 268: 1616–1619. Dayn, A., Malkhosyan, S., Duzhy, D., Lyamichev, V., Panchenko, Y., and Mirkin, S. (1991) Formation of (dA-dT)n cruciforms in Escherichia coli cells under different environmental conditions. J Bacteriol 173: 2658–2664. Dayn, A., Malkhosyan, S., and Mirkin, S.M. (1992) Transcriptionally driven cruciform formation in vivo. Nucleic Acids Res 20: 5991–5997. Dorman, C.J. (1995) DNA topology and the global control of bacterial gene expression: implications for the regulation of virulence gene expression. Microbiology 141: 1271– 1280. Fix, D.F., and Glickman, B.W. (1987) Asymmetric cytosine 440 B. E. Wright et al. deamination revealed by spontaneous mutational specificity in an UngG– strain of Escherichia coli. Mol Gen Genet 209: 78–82. Forsdyke, D.R. (1995) Conservation of stem-loop potential in introns of snake venom phospholipase A2 genes: an application of FORS-D analysis. Mol Evol Biol 12: 1157–1165. Foster, P.L. (1999) Mechanisms of stationary phase mutation: a decade of adaptive mutation. Annu Rev Genet 33: 57–88. Frederico, L.A., Kunkel, T.A., and Shaw, B.R. (1990) A sensitive genetic assay for the detection of cytosine deamination: determination of rate constants and the activation energy. Biochemistry 29: 2532–2537. Frederico, L.A., Kunkel, T.A., and Shaw, B.R. (1993) Cytosine deamination in mismatched base pairs. Biochemistry 32: 6523–6530. Hall, B.G. (1999) Spectra of spontaneous growth-dependent and adaptive mutations at ebgR. J Bacteriol 181: 1149– 1155. Herman, R.K., and Dworkin, N.B. (1971) Effect of gene induction on the rate of mutagenesis by ICR-191 in Escherichia coli. J Bacteriol 106: 543–550. Higgins, C.F., Dorman, C.J., Stirling, D.A., Waddell, L., Booth, I.R., May, G., and Bremer, E. (1988) A physiological role for DNA supercoiling in the osmotic regulation of gene expression in S. typhimurium and E. coli. Cell 52: 569–584. Ionov, Y., Peinado, M.A., Malkhosyan, S., Shibata, D., and Perucho, M. (1993) Ubiquitous somatic mutations in simple repeated sequences reveal a new mechanism for colonic carcinogenesis. Nature 363: 558–561. Isbell, R.J., and Fowler, R.G. (1989) Temperature-dependent mutational specificity of an Escherichia coli mutator, dnaQ49, defective in 3′→5′ exonuclease (proofreading) activity. Mutat Res 213: 149–156. Jordi, B.J.A.M., Owen-Hughes, T.A., Hulton, C.S.J., and Higgins, C.F. (1995) DNA twist, flexibility and transcription of the osmoregulated proU promoter of Salmonella typhimurium. EMBO J 14: 5690–5700. Kingston, R.E., Nierman, W.C., and Chamberlin, M.J. (1981) A direct effect of guanosine tetraphosphate on pausing of Escherichia coli RNA polymerase during RNA chain elongation. J Biol Chem 256: 2787–2797. Krasilnikov, A.S., Podtelezhnikov, A., Vologodskii, A., and Mirkin, S.M. (1999) Large-scale effects of transcriptional DNA supercoiling in vivo. J Mol Biol 292: 1149–1160. Lefstin, J.A., and Yamamoto, K.R. (1998) Allosteric effects of DNA on transcriptional regulators. Nature 392: 885–888. Lerner, S.A., Wu, T.T., and Lin, E.C.C. (1964) Evolution of a catabolic pathway in bacteria. Science 146: 1313–1315. Lilley, D.M.J. (1980) The inverted repeat as a recognizable structural feature in supercoiled molecules. Proc Natl Acad Sci USA 77: 6468–6472. Lindahl, T., and Nyberg, B. (1972) Rate of depurination of native deoxyribonucleic acid. Biochemistry 11: 3610–3618. Lindahl, T., and Nyberg, B. (1974) Heat-induced deamination of cytosine residues in deoxyribonucleic acid. Biochemistry 13: 3405–3410. Lipschutz, R., Falk, R., and Avigad, G. (1965) Mutation rates of the induced and repressed locus of alkaline phosphatase in Escherichia coli. Isr J Med Sci 1: 23–24. Liu, L.F., and Wang, J.C. (1987) Supercoiling and the DNA template during transcription. Proc Natl Acad Sci USA 84: 7024–7027. Longacre, A., Reimers, J.M., and Wright, B.E. (1999) Specificity of transcription-enhanced mutations. Ann NY Acad Sci 870: 383–385. López-García, P. (1999) DNA supercoiling and temperature adaptation: a clue to early diversification of life? J Mol Evol 49: 439–452. Luria, S., and Delbrück, M. (1943) Mutations of bacteria from virus sensitivity to virus resistance. Genetics 28: 491–504. Massey, R.C., Rainey, P.B., Sheehan, B.J., Keane, O.M., and Dorman, C.J. (1999) Environmentally constrained mutation and adaptive evolution in Salmonella. Curr Biol 9: 1477–1480. Moore, H., Greenwell, P.W., Liu, C.-P., Arnheim, N., and Petes, T.D. (1999) Triplet repeats form secondary structures that escape DNA repair in yeast. Proc Natl Acad Sci USA 96: 1504–1509. Müller, U.R., and Fitch, W.M. (1982) Evolutionary selection for perfect hairpin structures in viral DNAs. Nature 928: 582–585. Panagopoulos, I., Lasen, C., Isaksson, M., Mitelman, F., Mandahl, N., and Aman, P. (1997) Characteristic sequence motifs at the breakpoints of the hybrid genes FUS/CHOP, EWS/CHOP and FUS/ERG in myxoid liposarcoma and acute myeloid leukemia. Oncogene 15: 1357– 1362. Pananicolaou, C., and Ripley, L.S. (1991) An in vitro approach to identifying specificity determinants of mutagenesis mediated by DNA misalignments. J Mol Biol 221: 805–821. Papyotatos, N., and Wells, R.D. (1981) Cruciform structures in superhelical DNA. Nature 289: 466–470. Pruss, G.J., and Drlica, K. (1989) DNA supercoiling and prokaryotic transcription. Cell 56: 521–523. Rady, P., Scinicariello, F., Wagner, F., and King, S. (1992) p53 mutations in basal cell carcinoma. Cancer Res 52: 3804–3806. Rahmouni, A.R., and Wells, R.D. (1992) Direct evidence for the effect of transcription on local DNA supercoiling in vivo. J Mol Biol 223: 131–144. Ripley, L.S. (1982) Model for the participation of quasipalindromic DNA sequences in frameshift mutation. Proc Natl Acad Sci USA 79: 4128–4132. Rosenberg, S.M. (2001) Evolving responsively: adaptive mutation. Nature Rev Genet 2: 504–515. Rudner, R., Murray, A., and Huda, N. (1999) Is there a link between mutation rates and the stringent response in Bacillus subtilis? Ann NY Acad Sci 870: 418–422. SantaLucia, J., Jr (1998) A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. Proc Natl Acad Sci USA 95: 1460–1465. Schaaper, R.M., and Dunn, R.L. (1991) Spontaneous mutation in the Escherichia coli lacI gene. Genetics 129: 317– 326. Schaaper, R.M., Danforth, B.N., and Glickman, B.L. (1986) Mechanisms of spontaneous mutagenesis: an analysis of the spectrum of spontaneous mutation in the Escherichia coli lacI gene. J Mol Biol 189: 273–284. Sinden, R.R., and Pettijohn, D.E. (1981) Chromosomes in living Escherichia coli are segregated into domains of supercoiling. Proc Natl Acad Sci USA 78: 224–228. © 2003 Blackwell Publishing Ltd, Molecular Microbiology, 48, 429–441 Predicting mutations in stem–loop structures 441 Singer, K.D., and Sterns, D.M. (1982) Chemical Mutagenesis. Annu Rev Biochem 52: 655–693. Skandalis, A., Ford, B.N., and Glickman, B.W. (1994) Strand bias in mutation involving 5-methylcytosine deamination in the human hprt gene. Mutat Res 314: 21–26. Sugden, K.D., and Stearns, D.M. (2000) The role of chromium (V) in the mechanism of chromate-induced oxidative DNA damage and cancer. J Environ Pathol Toxicol Oncol 19: 215–230. Todd, P.A., and Glickman, B.W. (1982) Mutational specificity of UV light in Escherichia coli: indications for a role of secondary structure. Proc Natl Acad Sci USA 79: 4123– 4127. Wells, R.D., Goodman, T.C., Hillen, W., Horn, G.T., Klein, R.D., Larson, J.E., et al. (1980) DNA structure and gene regulation. Prog Nucleic Acid Res Mol Biol 24: 167–267. Wright, B.E. (1996) The effect of the stringent response on mutations in Escherichia coli K12. Mol Microbiol 19: 213– 219. Wright, B.E. (1997) Does selective gene activation direct evolution? FEBS Lett 402: 4–8. © 2003 Blackwell Publishing Ltd, Molecular Microbiology, 48, 429–441 Wright, B.E. (2000) A biochemical mechanism for nonrandom mutations and evolution. J Bacteriol 182: 2993–3001. Wright, B.E., and Minnick, M.F. (1997) Reversion rates in a leuB auxotroph of Escherichia coli K12 correlate with ppGpp levels during exponential growth. Microbiology 143: 847–854. Wright, B.E., Longacre, A., and Reimers, J.M. (1999) Hypermutation in derepressed operons of Escherichia coli K12. Proc Natl Acad Sci USA 96: 5089–5094. Wright, B.E., Reimers, J.M., Schmidt, K.H., and Reschke, D.K. (2002) Hypermutable bases in the p53 cancer gene are at vulnerable positions in DNA secondary structures. Cancer Res 62: 5641–5644. Yanofsky, C.T., Platt, T., Crawford, I.P., Nichols, B.P., Christie, G.E., Horowitz, H., et al. (1981) The complete nucleotide sequence of the tryptophan operon of Escherichia coli. Nucleic Acids Res 9: 6647–6668. Zuker, M., Mathews, D.H., and Turner, D.H. (1999) A Practical Guide. In RNA Biochemistry and Biotechnology. Barciszewski, J., and Clark, B.F.C. (eds). Dordrecht: Kluwer Academic Publishers, pp. 11–43.
© Copyright 2026 Paperzz