Transcription-Induced Mutational Strand Bias and Its Effect on Substitution Rates in Human Genes Carina F. Mugal,1 Hans-Hennig von Grünberg,1 and Martin Peifer2 Institute of Chemistry, Karl-Franzens University Graz, Graz, Austria If substitution rates are not the same on the two complementary DNA strands, a substitution is considered strand asymmetric. Such substitutional strand asymmetries are determined here for the three most frequent types of substitution on the human genome (C / T, A / G, and G / T). Substitution rate differences between both strands are estimated for 4,590 human genes by aligning all repeats occurring within the introns with their ancestral consensus sequences. For 1,630 of these genes, both coding strand and noncoding strand rates could be compared with rates in gene-flanking regions. All three rates considered are found to be on average higher on the coding strand and lower on the transcribed strand in comparison to their values in the gene-flanking regions. This finding points to the simultaneous action of rateincreasing effects on the coding strand—such as increased adenine and cytosine deamination—and transcription-coupled repair as a rate-reducing effect on the transcribed strand. The common behavior of the three rates leads to strong correlations of the rate asymmetries: Whenever one rate is strand biased, the other two rates are likely to show the same bias. Furthermore, we determine all three rate asymmetries as a function of time: the A / G and G / T rate asymmetries are both found to be constant in time, whereas the C / T rate asymmetry shows a pronounced time dependence, an observation that explains the difference between our results and those of an earlier work by Green et al. (2003. Transcription-associated mutational asymmetry in mammalian evolution. Nat Genet. 33:514–517.). Finally, we show that in addition to transcription also the replication process biases the substitution rates in genes. Introduction Substitutional strand asymmetry in DNA evolution and its causes is a topic of long-standing interest (Francino and Ochman 1997; Frank and Lobry 1999). Such a strand asymmetry is given if substitution rates do not have identical values on the two complementary strands of the DNA. Substitutional strand asymmetry results from the processes of transcription and replication, which both distinguish between the two DNA strands. Replication-associated substitutional asymmetry has been found in bacteria (Lobry 1996b) and human mitochondrial DNA (Tanaka and Ozawa 1994), whereas transcription-induced substitutional asymmetry has been observed in enterobacterial genes (Francino et al. 1996; Beletskii and Bhagwat 1998; Francino and Ochman 2001) and mammalian genes (Green et al. 2003). In the following, we will focus mainly on transcriptioninduced rate asymmetries and will only briefly discuss the effect of replication on the observed rate asymmetries (Huvet et al. 2007). We study such asymmetries on the human genome for the three most frequent types of substitution, the two transitions C / T (with its complementary rate G / A) and A / G (connected to T / C) and the transversion G / T (complementary to C / A). The C / T transition is the dominant mutation for many genes (Coulondre et al. 1978; Lindahl 1993; Hutchinson and Donnellan 1998) and is particularly important on the human genome where in a genome-wide average it has a rate of a relative strength 0.93 (Karro et al. 2008). It is followed in strength by the A / G rate (strength 0.47) and the rate G / T (strength 1 Present address: Institute of Chemistry, Karl-Franzens University Graz, Graz, Austria. 2 Present address: Max-Planck Institute for Neurological Research, Cologne, Germany. Key words: substitution rate asymmetries, transcription-coupled repair, cytosine deamination. E-mail: [email protected]. Mol. Biol. Evol. 26(1):131–142. 2009 doi:10.1093/molbev/msn245 Advance Access publication October 29, 2008 Ó The Author 2008. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected] 0.2). These rates can be associated with certain mutagenic lesions on the genome. Hydrolytic deamination of cytosine, 5-methylcytosine (5-mC), and adenine; oxidation of guanine to 8-hydroxyguanine (8-oxoG); and strand breaks and depurination are the major types of spontaneous directly mutagenic events in the living cell (Lindahl 1993). Cytosine and 5-mC are the main targets for hydrolytic deamination (Lindahl 1993). Deamination of C leads to uracil U; uracil simulates T, pairs with A, and after two rounds of replication the substitution from C / T becomes persistent (Francino and Ochman 2001) (see fig. 1). The resulting substitution rate, however, is naturally affected also by the efficiency of the corresponding repair enzymes, in the C / T case the uracil–DNA glycosylase (Lindahl and Nyberg 1974; Lindahl et al. 1977; Duncan and Miller 1980). Hydrolytic deamination of adenine to hypoxanthine (H) is one of the reactions that can be associated with the second rate considered here, A / G. This reaction is known to be much slower than cytosine deamination (Shapiro and Klein 1966; Karran and Lindahl 1980; Zhang et al. 2007). Because H forms a more stable base pair with C than with T, it is mutagenic; two rounds of replication are again required to produce an A:T to G:C transition (see fig. 1). As for uracil, there exists a hypoxanthine–DNA glycosylase in mammalian cells to release hypoxanthine from double-stranded DNA (Karran and Lindahl 1980; Dianov and Lindahl 1991). The third mutagenic base lesion important in the context of this work is 8-oxoG, a common product of oxidative damage to DNA. It can mispair with adenine and thus induces G / T (and C / A) transversions after replication (Shibutani et al. 1991). This lesion is repaired through base excision repair by 8-oxoG DNA glycosylases (David et al. 2007), an enzyme that works quite efficiently such that the substitution frequency of 8-oxoG in both Escherichia coli and mammalian cells is comparatively low (David et al. 2007). G / T transversions can also be caused by depurination if after the release of a guanine the DNA polymerase inserts an adenine opposite the abasic (AP) site (see again fig. 1). In this paper, these three types of substitutions will be shown to occur with different frequencies on the two DNA 132 Mugal et al. FIG. 1.—Deamination of cytosine to uracil and of adenine to hypoxanthine causes C / T and A / G transitions, respectively. The presence of 8oxo-G can lead to G / T transversions, whereas loss of a purine residue may cause A / T (Y 5 A) or G / T (Y 5 G) transversions. strands in transcribed regions of the human genome. During transcription, two processes are currently thought to be responsible for the generation of strand asymmetric substitutions: transcription-coupled repair (TCR) and hydrolytic cytosine deamination (Francino and Ochman 2001). The transcription-induced deamination event occurs primarily on the coding strand, which while transiently exposed during transcription is particularly vulnerable to deamination (Beletskii and Bhagwat 1996, 1998; Beletskii et al. 2000). In contrast, the noncoding strand is not exposed to the environment but protected by the RNA polymerase and the nascent RNA; therefore, hydrolytic deamination is less frequent on this strand. In fact, Francino and Ochman (2001) has shown that the C / T rate on noncoding strands in enterobacteria is even lower than the average value of this rate in the untranscribed region—an effect that the authors attribute to TCR. TCR corrects any kind of bulky lesions (Takahasi et al. 2007) that block transcription by causing the RNA polymerase to stall, which then leads to preferential repair of the lesion on the template strand of a gene. Among other things, we here try to quantify the relative contribution of TCR and other competing mechanisms to the observed substitutional rate asymmetries in human transcribed regions. Green et al. (2003) have recently discovered transcription-induced asymmetries also in mammalian germ-line genes. Based on comparative analyses of orthologous sequences of eight different mammals containing nine genes (in total 1.5 Mb), these authors have determined substitution rates and, subsequently, substitutional strand asymmetries. They find that A / G and G / A (purine transitions) are significantly higher on the coding strand than T / C and C / T (pyrimidine transitions) and argue that this bias may be caused by the mammalian TCR. Because this finding is not compatible with the transcriptioninduced substitutional asymmetry found in enterobacteria which were characterized by an increased rate C / T on the coding strand (Francino and Ochman 2001), Green et al. claimed to have found a transcription-induced strand asymmetry that is based on a qualitatively different mech- anism. Moreover, six of nine genes showed a significant G þ T excess over A þ C on the coding strand that was again coupled to transcription. This finding suggests the interpretation that this type of compositional asymmetry in mammalian transcribed regions is caused by transcriptioninduced substitutional asymmetries—a hypothesis that was further supported by correlating expression levels of germ cell genes with the compositional asymmetry (Majewski 2003; Comeron 2004; Webster et al. 2004). Green et al. (2003) observed that the same asymmetries were also seen when only substitutions in interspersed repeats in the transcribed regions were considered. This result motivated the present work. To further test the hypothesis of Green et al., we estimated the rates and rate strand asymmetries for C / T, A / G, and G / T for 4,590 human genes by aligning all repeats occurring within the introns of each gene with their ancestral consensus sequences. Our main result is that all three rates C / T, A / G, and G / T are significantly higher on the coding strand of transcribed regions when compared with their values in the genomic region adjacent to the genes (flanking region) and considerably lower on the template strand when measured relative to their values in the flanking region. This common behavior of all three rates leads to interesting correlations of the rates. In particular, we find that A / G and C / T are both considerably larger on the coding strand than on the noncoding strand. Whereas the first coding strand asymmetry ([A / G]cod . [A / G]noncod) agrees with the findings of Green et al., the second one ([C / T]cod . [C / T]noncod) does not, an apparent discrepancy that we will explain with a time dependence of the C / T rate. In a recent study, Huvet et al. (2007) were able to identify a large number of ‘‘putative’’ human replication origins. Genes in the neighborhood of these origins should show rate asymmetries that result from the superposed effect of both transcription and replication. Replication-associated strand asymmetries can either increase or decrease transcription-associated asymmetries, depending on the transcription direction and the position of the gene relative to the origin of replication. Resorting to the results of Huvet Mutational Strand Bias in Human Genes 133 et al., we will demonstrate in the last part of this paper that our three rates are indeed biased also by replication. Methods We briefly present our method to estimate substitution rates. It is the same method used in the work of Karro et al. (2008), where the method and its validation is explained in more detail. Our underlying model is a nonhomogeneous Markov chain, similar to that used in other standard approaches. The four-dimensional time-dependent state vector (pA(t), pC(t), pG(t), pT(t)), defining the probability of being in each state at time t, evolves according to a substitution rate matrix q. The transition probability matrix propagating a state vector a period t forward in time reads ð1Þ Pc t 5 exp mqc t : where m is the absolute substitution rate while the 4 4 rate matrix qc reflects the relative contributions of the different types of substitutions. One should remark that the product mqc is in principle a quantity averaged over the period t so that the rates within qc must be considered as time-averaged rates, with the time t in Pc(t) determining the period over which the rates are averaged. We estimated rates either gene by gene (the index c refers to the individual gene) or, using the repeat data of the entire genome, in a genome-wide analysis (index c is then suppressed). For qc, we used a 4 4 rate model with 12 independent substitution rates for all possible mutual replacements within the group of four base nucleotides. All estimates of rates are based on the transitional probability matrix Pc(t) which we fit to RepeatMasker-generated repeat data. Human repeat information was extracted by RepeatMasker version 3.1.2 and RM database version 20051025. Genome builds used were downloaded from the University of California–San Cruz (UCSC) browser for hg18 (human). Low-complexity repeats were removed. We were then left with M 5 857 repeat families. The RepeatMasker data provide the reconstructed ancestor of each repeat family a (a 5 1, . . ., M) as well as a pairwise alignment between each modern instance and the family’s reconstructed ancestor that was inserted into the genome a time ta ago. Alignment positions involving gaps are discarded. For each repeat family a and each gene c, we determine from repeat alignments the matrix components of Pc(ta) through a maximum likelihood fit. From this maximum likelihood fit, the substitution rates are obtained (which are quantities time averaged over the period ta). We considered just one strand, rate differences between two strands were obtained from rate differences between a rate and its complementary rate. We first performed a genome-wide maximum likelihood fit of the M numbers mta and the 12 rates in q on the right-hand side of equation (1) to the matrices P(ta) on the left-hand side of equation (1). Then the procedure was repeated for each gene c but keeping mta fixed to the corresponding genomewide values, resulting in the 12 rates for every gene. The resulting pool of rates were then averaged over all M repeat families in cases where all repeat families were included but only over the corresponding fraction of families in cases where the analysis was based on just a subset of families (such as in fig. 6). One can finally compute the relative rate asymmetries for gene c by dividing the difference between a rate and its complementary rate by the corresponding genome-wide rate. The LogDet time distance used in figure 6 is a measure for the time distance proportional to the time ta at which repeat family a was inserted into the genome. It can be obtained directly from the genome-wide transitional probability matrix P(ta) without any further fitting step (see Karro et al. 2008). A frequently occurring context-dependent substitution process in the mammalian genomes is CpG to TpG/CpA—a dinucleotide rate that as such is not explicitly included in our neighbor-independent evolutionary model. For this mutation to occur, the cytosine base needs to be located next to a G, and it must be methylated. To cope with the CpG effect, we have systematically blocked all CpG sites, as Green et al. (2003) have done. But even if we included all CpG sites in our analysis, the results hardly changed. A statistical test was developed to distinguish genes with high rate asymmetries from those that rather follow a strand symmetric substitution pattern. To this end, we fitted an asymmetric and a symmetric model to the genomic data. The asymmetric model has 12 independent rates, whereas in the symmetric model, every rate is set equal to its complementary rate, resulting in just six independent rates within qc in equation (1). Clearly, the more complex model having more fitting parameter will fit the data better. We therefore used the Bayesian information criterion (Schwarz 1978) to decide whether or not the asymmetric model describes the underlying substitution process better. Finally, presented correlations are quantified with Pearson’s moment correlation coefficients (r) having a P value of P , 104 to be different from zero if not otherwise stated. Results We downloaded all transcribed regions from chromosomes 1–22 of the human genome from Known Gene at UCSC (March 2006). As outlined in the Methods, we determined the six rates (A / C, A / T, C / G, C / A, A / G, and C / T) together with their complementary rates by aligning the ancestor sequence of each repeat family (857 families) with the individual repeat copies occurring within the intronic regions of each gene. CpG sites were systematically ignored. To ensure proper statistics, we considered only those 4,590 genes (i.e., roughly 20% of all genes) that have more than 40 repeat copies included in their introns (on average 78 copies per gene). In total, our analysis is based on repetitive elements of an accumulated length of about 350 Mb (360,000 repeat copies). Based on the 12 estimated rates, one can compute rate strand asymmetries for each gene by computing the difference between a rate and its complementary rate which is the same as the difference between a rate on one strand and the same rate on the other strand. From the resulting data, we selected 1,070 genes that show particularly high rate asymmetries. The statistical test that we developed to select these genes is described in the Methods. 134 Mugal et al. FIG. 2.—Correlations between rate strand asymmetries for the three rates A / G, C / T, and G / T. A rate asymmetry is defined to be the difference in substitution rates on the two complementary DNA strands. Data are given in percent by dividing these rate differences by the genomewide averaged rate. Every point corresponds to one human gene. From the initially 4,590 genes considered (gray data points), 1,070 genes (black data points) are selected showing a particularly high rate asymmetry. Correlation coefficients: r 5 0.67 for A / G versus G / T, r 5 0.50 for C / T versus G / T, and r 5 0.61 for C / T versus A / G. Figure 2 shows the correlations between rate asymmetries for the three main rates A / G, C / T, and G / T. The rate asymmetries are given in percent by relating the rate differences between the two strands to the genomewide averaged rate. For the genome-wide averaged rates, no strand asymmetries were found (i.e., the rate differences between both strands were below the statistical error of our method). The plot documents that all three rates are highly correlated: If one rate is significantly higher (lower) on one strand, then also the rates of the two other substitutions are consistently higher (lower) on that strand. Further below, this result will be explained with similarities in the molecular mechanisms causing the asymmetries. The 1,070 genes selected for their high rate asymmetries are highlighted by black crosses in figure 2. Our test identifies this subgroup of genes using a substitution-typeaveraged criterion for rate asymmetry. To visualize the extent to which the individual type of substitution is affected by rate asymmetries, figure 3 shows histograms over rate asymmetries for all six types of substitutions. The differences for the two transversions C / G and A / C show unimodal distributions, whereas the distributions of all other rates are clearly bimodal. A unimodal distribution can either mean that there is no rate asymmetry at all for that substitution type or that it is too small in magnitude for our resolution. It is evident that relative to its genome-wide value the A / G transitional substitution shows the strongest asymmetry, with values that can become as high as 50% of the genome-wide rate. C / T and G / T show rate splittings of roughly the same order of magnitude; a weak rate asymmetry can also be observed for the A / T transversion. Asymmetries have been claimed to correlate with the direction of transcription (Beletskii and Bhagwat 1996; Francino and Ochman 2001; Green et al. 2003; Touchon et al. 2003). To test whether the rate differences [X / Y]strand 1 [X / Y]strand 2 in figure 2 support this idea, we determined the direction of transcription of all genes and assign this information to the rate asymmetries. Figure 4 shows again the correlation diagram of figure 2 but now with a key that indicates whether strand 1 in [X / Y]strand 1 [X / Y]strand 2 is the coding or the non- coding strand. For all three types of substitution, the differences [X / Y]strand 1 [X / Y]strand 2 tend to be positive if strand 1 is the coding strand and rather negative otherwise. Both findings are summarized by the statement that [X / Y]cod [X / Y]noncod tends to be positive. In words, all three rates tend to be larger on the coding strand compared with their values on the transcribed strand. So far, we have compared rates on the two strands with respect to each other. To decide whether these rates decrease or increase relative to their environment, we next compare the rates in the transcribed regions with rates determined in the immediate untranscribed neighborhood (gene-flanking region). Processes such as replication can produce rate asymmetries both in transcribed and untranscribed regions (Huvet et al. 2007). Thus, taking the difference between rates in genic and gene-flanking regions is also a convenient way to filter out the effect of these underlying processes. The flanking regions were located on either side of the transcribed region, with a sequence length ranging between 100 kbp and 1 Mbp. Genes without a flanking region (of a length of at least 100 kbp) were not further considered, resulting in a subset of 1,733 genes from the initial pool of 4,590 genes. It was found that rate asymmetries within the untranscribed flanking regions were only weakly developed, showing for all rates rather unimodal distributions similar to those for C / G and A / C in figure 3. Only 103 flanking regions (6% of all flanking regions) showed particularly high rate asymmetries according to our statistical test. The corresponding genes were also excluded from the subsequent analysis, such that the comparison was performed on the basis of 1,630 genes having flanking regions that 1) are of sufficient length and 2) show only rather weak rate asymmetries. In figure 5, we show—for all three rates and for coding and noncoding strands separately—the percent difference between the rates in the transcribed region and those in the flanking region. These are values obtained after averaging over all 1,630 genes considered. To document that also in this comparison the rate asymmetries from figures 2–4 have an effect, we next applied our statistical test to identify among the 1,630 genes those 356 genes with particularly high asymmetries (see Methods) Mutational Strand Bias in Human Genes 135 FIG. 3.—Normalized distributions of substitution rate strand asymmetries on 1,070 human genes with exceptionally high rate asymmetries. The upper row of graphs shows histograms of the three types of substitutions considered in this work. and computed averaged rates also on the basis of this subset (column in fig. 5 labeled asymmetric regions). The whole procedure was then repeated, computing rates only on the basis of a subgroup of the 50 youngest repeat families (see below). All three rates considered (C / T, A / G, and G / T) are found to be consistently higher on the coding strand of genes in comparison to their values in the flanking region. The difference becomes considerably larger if only genes showing high rate strand asymmetries are included; now rate differences between genic and flanking regions can become as high as 40% of the mean genome-wide rate (for the C / T and A / G transitions in fig. 5). If only the class of youngest repeats are used in our analysis, the result remains essentially unaltered for the A / G and G / T rates but changes for the C / T transitions—a finding that further below will be explained with a time dependence of C / T transition rates. A similar result is obtained for the noncoding strand. Figure 5 shows that on the noncoding strand, all rates are lower when compared with their values in the flanking region. For A / G, this reduction is more pronounced if only genes showing high rate asymmetries are considered, whereas for C / T and G / T, the particular choice of the subgroup of genes hardly affects the results. Taken together, figure 5 confirms that the rates C / T, A / G, and G / T are increased on the coding strand (and decreased on the noncoding strand) not only with respect to the rates on the complementary strand (as has been shown in fig. 4) but also with respect to the rates in the flanking region. FIG. 4.—Same data as in figure 2 but with a different key to identify direction of transcription: If strand 1 is the coding strand, points are plotted with black crosses, else if strand 1 is the noncoding strand, gray squares are chosen. 136 Mugal et al. FIG. 5.—Relative rate differences between rates in the transcribed genic regions and in the intergenic regions adjacent to these transcribed regions (flanking regions), averaged over all 1,630 genes considered (column labeled ‘‘all transcribed regions’’) and over a subset of 356 genes showing high rate asymmetries (column labeled asymmetric regions). Data are based on an analysis including either all repeat families or just the subgroup of youngest repeat families (columns labeled ‘‘young repeats’’). The error bars represent the 95% confidence intervals. The large pool of available repeat families and repeat copies permits us to use just a fraction of these families for estimating rates. This opens up interesting perspectives to derive information on the time dependence of rates. We therefore determined for each repeat family its average LogDet time distance (Karro et al. 2008) (the ‘‘age’’ of the family) and sorted all families according to their age. We then took a package of 50 families from this list, determined the mean age of these families, and estimated rates on the basis of just this group. Shifting the package of 50 families stepwise downward in the list, repeating each time the procedure to estimate rates, one obtains rates as a function of the mean age of the respective package of families. The result of this procedure is shown in figure 6. The rates represent averages over all 4,590 genes. The x axis can be understood as time axis, where a point in time must be considered as a representative of the group of repeat families used for the analysis. The set of 50 youngest (oldest) repeat families has a mean time distance of about 0.07 (0.38), which is where the curves in figure 6 are terminated. Rates based on a group of repeat families with a mean age T are to be understood as quantities averaged over the time period T. Thus, rates based on the group of oldest repeat families are averaged over the period T 5 0.38, whereas those obtained with the group of youngest repeat are averaged only over the time period T 5 0.07. The figure demonstrates that for both the A / G transition and the G / T transversion, there is hardly any dependence of the rate asymmetries on the age of the repeat group used in the analysis. For the C / T transition, how- ever, the analysis based on older repeats leads to the asymmetry [C / T]cod . [C / T]noncod, whereas, consistent also with the comparison in figure 5, an analysis based on younger repeat families suggests that there is no rate strand asymmetry for this transition. The good statistics we have is demonstrated by the fact that even though the rates for coding and noncoding strands are based on completely disjunct sets of repeat data, the two resulting curves are almost mirror inverted to each other with respect to the zero baseline. To study the interdependence between substitutional rate asymmetry and compositional asymmetry, we show in figure 7 the total compositional skew S 5 (T A)/(T þ A) þ (G C)/(G þ A) as well as the G þ T content of each gene versus the total substitutional rate asymmetry. To ensure that the calculations on each axis are independent, the GT content has been determined from the introns after excluding all repeats. For the values on the x axis, the data from figure 4 are taken, multiplied with 1 if strand 1 is the noncoding strand and with þ1 otherwise, such that the rate differences in figure 7 now refer to a rate at the coding strand minus the same rate at the noncoding strand, in that order. We find that the compositional asymmetry is high whenever the substitutional asymmetry is increased, a result that we will compare to the findings of Green et al. (2003) in the next section. We finally wanted to examine the influence of replication onto our results. Replication is generally known as an additional process leading to substitutional rate asymmetries. Touchon et al. (2005) have recently shown that like Mutational Strand Bias in Human Genes 137 FIG. 6.—Rate asymmetries between the two complementary DNA strands averaged over all genes considered and plotted as a function of the mean age of the respective group of repeats which the rate estimation procedure is based on. Color code as defined in figure 4. The error bars (dashed lines) represent the 95% confidence intervals. The arrow is explained in the text. in prokaryotes also in mammalian cells genomic regions situated 5# of an origin of replication (ORI) tend to have a negative skew, whereas regions situated 3# of an ORI show on average a positive skew. Thus, when comparing the leading with the lagging strand of replication they have found compositional strand asymmetries that are quite similar to those on the noncoding and coding strands of a gene. They concluded that for genes located in the neighborhood of ORIs, the strand asymmetries associated with replication and transcription superpose each other. The compositional skew is amplified if the coding strand happens to be the lagging strand or attenuated otherwise. This effect should be dependent on the distance between gene loci and the nearest ORI. Following Touchon et al. (2005), we may assume that averaged over many replication processes, the counteracting effect of replication starting from two neighboring ORIs cancel each other exactly at the midpoint between these two ORIs. We furthermore can assume that the replicationassociated bias decreases linearly from its largest value 0.4 55 S 0.2 0 50 -0.2 -0.4 -0.4 r = 0.80 -0.2 0 0.2 0.4 substitutional asymmetry r = 0.83 0.6 45 -0.4 -0.2 0 0.2 0.4 0.6 substitutional asymmetry FIG. 7.—Compositional skew S (left panel) and current G þ T content (right panel) of the introns of each gene versus the total substitutional rate asymmetry, defined as the sum of the three differences [C / T]cod [C / T]noncod, [A / G]cod [A / G]noncod, and [G / T]cod [G / T]noncod. Correlation coefficients are r 5 0.80 and r 5 0.83 for the correlations in the left and right plots, respectively. 138 Mugal et al. rel. difference [%] [X→Y]cod - [X→Y]non-cod difference between [G→T] 0.6 r = 0.22 a) difference between [A→G] r = 0.34 b) difference between [C→T] r = 0.49 c) 0.4 0.2 0 -0.2 -0.5 0 0.5 putative bias of replication -0.5 0 0.5 putative bias of replication -0.5 0 0.5 putative bias of replication FIG. 8.—Substitutional asymmetries are also affected by replication: Rate asymmetries correlated with the weighted distance of a gene’s center to the midpoint between its nearest left and right origin of replication. One data point for each of the 1,088 genes that are positioned within one of the Ndomains determined by Huvet et al. (2007). Correlation coefficients are given in the plots. (directly at the ORI) to zero (half way to the next ORI). Under these assumptions, we identified those 1,088 genes (from our pool of 4,590 genes) having positions within one of the 678 ‘‘N-domains’’ identified by Huvet et al. (2007), where the word N-domain refers to the region between two neighboring ORIs. For each of these genes, we then determined the relative distance b between the center of a gene and the center of its respective N-domain (ranging from 1 to þ1) and multiplied this distance by þ1 if lagging and coding strand were the same and by 1 otherwise. Figure 8 correlates the rate asymmetries for the 1,088 genes with this weighted distance b. For all three substitution types, we find significant correlations, with correlation coefficients ranging between 0.22 and 0.49 as indicated in the figure. This result supports the idea that—depending on the distance to the next ORI and the relative direction of transcription and replication—the rate asymmetries of genes are to some extent also produced by the process of replication. Discussion Asymmetry 1: Increased Rates on the Coding Strand of Transcribed Genes All three rates considered (C / T, A / G, and G / T) are found to be on average higher on the coding strand in comparison to their values both in the gene-flanking regions and on the noncoding strand. Following Beletskii and Bhagwat (1996, 1998) and Beletskii et al. (2000), this observation could be a consequence of the fact that during transcription the coding strand is transiently exposed to the environment and—being single stranded—is then particularly vulnerable to mutagenic reactions such as deamination or depurination. Indeed, a number of experiments confirm that single strandedness can significantly enhance mutational rates and that enhanced rates are therefore to be expected in regions of single-stranded DNA generated during replication and transcription (Karran and Lindahl 1980; Lindahl 1993). For example, the cytosine deamination rate at 37 °C for human double-stranded DNA is 7 1013 1/s but as high as 1010 1/s if the DNA is single stranded, that is, a factor 140 times larger (Frederico et al. 1990). Although enhanced cytosine deamination on single and exposed strands can be made responsible for our finding of increased C / T coding strand transitions, the increased A / G transition rate is more difficult to explain because the situation with adenine deamination in singlestranded DNA is less clear. Under comparable conditions, the deamination of adenine residues on single strands was 40 times slower than that of cytosine (Karran and Lindahl 1980). If adenine deamination was the only mutagenic effect related to the A / G transition, then this factor 40 would be in conflict with our results because figure 5 clearly shows that the A / G rate has even a higher relative increase on the coding strand than the C / T transition rate, which becomes even worse if absolute rate differences were considered. Therefore, mutagenic effects other than adenine deamination must have an impact on the A / G rate asymmetry. The increased G / T transversion rate on the coding strand could possibly mean that just like the deamination rate also the rate of creation of mutagenic 8-oxo-G is increased on single strands. In addition, mutagenesis induced by AP sites could also play a role. It is known that considerable amounts of purine bases are continuously released from double-stranded DNA (Lindahl and Nyberg 1972; Lindahl 1993). Depurination of a guanine with a preferred insertion of adenine opposite such an AP site would then lead to a G / T transversion (see fig. 1). Such a substitution mechanism would indeed be strongly affected by the single strandedness of the coding strand during transcription: It is known that the depurination rate jumps from 3 1011 1/s on double-stranded DNA to 1010 1/s on single-stranded DNA (Lindahl and Nyberg 1972)—thus being the only mutagenic reaction with rates as high as the single-stranded cytosine deamination rate. This mechanism requires a DNA polymerase that preferably inserts an A opposite an AP site. In mammalian cells, controversial reports suggest either no specificity in the insertion of bases opposite the AP site (Gentil et al. 1992) or the preferential incorporation of an A opposite an AP site (Shibutani et al. 1997; Avkindagger et al. 2002; Simonelli et al. 2005), which would produce either an A / T transversion (if A was released) or a G / T transversion (if G was released). If from time to time also a C was inserted opposite an AP site, then Mutational Strand Bias in Human Genes 139 this AP-induced mutation mechanism would contribute also to the A / G transition rate. Asymmetry 2: Reduced Rates on the Noncoding Strand of Transcribed Genes All three rates tend to be lower on the noncoding strand when compared with their values in the flanking regions. The fact that substitution rates on the transcribed strand of genes are considerably below their values in untranscribed regions can only mean that a mechanism is effective which removes mutagenic lesions from the transcribed strand of active genes. Such TCR mechanisms are long known and have been shown to actively repair DNA damage, for example, the products of guanine oxidation (Mellon et al. 1987; Svejstrup 2002; Kuraoka et al. 2003; Saxowsky and Doetsch 2006). Our results in figure 5 suggest that TCR must also operate on the damage causing the A / G and C / T mutations. Candidate enzymes for TCR include the recently described NEIL enzymes that exhibit a preference for excising 8-oxoG from singlestranded DNA and from bubble structures rather than from intact double-stranded DNA, prompting the suggestion that this enzyme may play a role in the excision of damage from a transcription bubble (Dou et al. 2003; Hazra et al. 2007). Reported substrates of this enzymes also include 5-hydroxyuracil, an oxidative product of cytosine (Thiviyanathan et al. 2005), which like U pairs preferentially with adenine, thus leading to the same mutagenic effect. The explanation that TCR reduces the rates on the transcribed strand is not unproblematic. It is thought that a lesion must block the RNA polymerase progression for TCR to occur (Scicchitano et al. 2004). However, all the mutagenic lesions causing our three rates do not usually block RNA polymerase and should therefore not be able to trigger the TCR mechanism. Specifically, high transcriptional bypass of uracil has been reported in vivo (Viswanathan et al. 1999); high lesion bypass efficiency has also been observed of AP sites, both during transcription (Kuraoka et al. 2003) and replication (Avkindagger et al. 2002); and even oxidatively damaged bases, such as 8-oxoG, do not usually stall the RNA polymerase (Saxowsky and Doetsch 2006). And still, TCR of 8-oxoG has clearly been observed in mammalian cells (Page, Klungland, et al. 2000; Page, Kwoh, et al. 2000; Kuraoka et al. 2003). How TCR is activated in the absence of a significant block to transcription is far from being clear (Scicchitano et al. 2004). Different possibilities are currently discussed (Scicchitano et al. 2004). For example, it has been suggested that such lesions might merely pause the transcription machinery and that this is already sufficient to induce TCR (Zhou and Doetsch 1993; Bregeon et al. 2003). Also, lesions that usually do not block may lead to some sort of blocking at some minimal rate under certain circumstances, which—no matter how small this rate is—would statistically favor eventual TCR, that is, even if one polymerase bypasses the lesion, the chance of every subsequent bypass is statistically reduced, and once the lesion stalls a transcription complex, it will be subjected to TCR. However, the detailed molecular mecha- nisms look like, our results clearly demonstrate that such a mechanism must exist and that it actively works toward reducing rates on the template strand of transcribed genes. Asymmetry 3: The Correlation between Rate Asymmetries In enterobacterial genes, it is long known that cytosine deamination is coupled to transcription and that it occurs primarily on the coding strand (Beletskii and Bhagwat 1996, 1998; Beletskii et al. 2000). Simultaneously, there is a reduction of the C / T rate on the noncoding strand caused by TCR. As first pointed out by Francino and Ochman (2001), these two processes are likely to occur together, both working toward an increase of the transcription-induced substitutional asymmetry of C / T transitions in bacterial genes. The simultaneous occurrence of rate-reducing effects on one strand and rate-increasing effects on the other during transcription is in essence what we have found here also for human genes and with regard not only to the C / T transition but also to the A / G and G / T type of substitution. Interestingly, for C / T and G / T, the rate reduction on the noncoding strand in figure 5 was the same, irrespective of whether we averaged over all transcribed regions or just over the subset of highly asymmetric genes. At the same time, we observe on the coding strand a dramatic increase of the rate difference when going from an average over all transcribed regions to one over just the asymmetric genes. These two observations together suggest that for C / T and G / T particularly strong rate asymmetries are caused primarily by the rate-increasing effects on the coding strand. The fact that all three rates show a common behavior—being all three reduced on the transcribed and increased on the coding strand—expresses itself in the correlations observed in figure 4: If there is a strand bias for one rate, then the rates of the two other substitutions are likely to show the same bias. Comparison to Work of Green et al. Green et al. (2003) have recently examined transcription-induced asymmetries in nine different genes of eight different mammals and have found that A / G tend to have higher rates on the coding strand than on the noncoding strand, whereas C / T transitions are found to have slightly higher rates on the noncoding strand than on the coding strand. Although the first asymmetry is compatible with our results, the C / T asymmetry seems to be in contradiction to our findings. However, the method chosen by Green et al. differs from ours. Their procedure is based on a three-species alignment between human, chimpanzee, and baboon. With their method, these authors reconstructed ancestral sequences of introns at some specific point in time, the human–chimp speciation time (relative time t 5 0.0016 marked by an arrow in fig. 6) and then used these sequences to identify substitutions that have occurred between this time point and today. We, on the other hand, estimate rates on the basis of repetitive elements occurring within introns—a method that allows to determine rates that are averaged over different periods of times, depending on the 140 Mugal et al. composition of the group of repeat families used in the rate estimation procedure(see fig. 6). For the C / T transition, the older repeats are found to predict the asymmetry [C / T]cod . [C / T]noncod, whereas an analysis based on the group of youngest repeat families (with mean age t 5 0.07) suggests that [C / T]cod [C / T]noncod. Unfortunately, we cannot perform our analysis exactly at the time point t 5 0.0016 investigated by Green et al. but extrapolating the trend of our curves in figure 6 down to t 5 0.0016 an agreement with the rate asymmetry found by Green et al. ([C / T]cod [C / T]noncod) does not seem unlikely. We are not able to give a biological explanation for the observed time dependence of the C / T rate asymmetry. The important point, however, is that the asymmetry [C / T]cod . [C / T]noncod found for all times t . 0.15 has been effective over much longer periods of time than the asymmetry with reversed rates and thus will be the decisive factor determining the resulting base compositional asymmetry discussed below. Relation to the Compositional Skew GC and AT skew diagrams were introduced in the analysis of bacterial chromosomes for short sequences (Lobry 1996a), complete genomes of bacteria (Lobry 1996c; Blattner et al. 1997; Grigoriev 1998), and viruses (Grigoriev 1999). The correlation found in figure 2 has an immediate impact on to the compositional skew of human genes. We focus in the following discussion on the two strongest types of substitutions, the two transitions A / G and C / T. The correlation in figure 2c can be reinterpreted as the correlation between (A / G) (T / C) and (C / T) (G / A) when measured on the same strand. This correlation implies that the difference of nucleotide frequencies of G and C correlates with the difference of the frequencies of T and A. In other words, the correlation found for the rate asymmetries dictates that the GC skew SGC 5 (G C)/(G þ C) must correlate with the TA skew STA 5 (T A)/(T þ A). Following this line of reasoning, the total skew S 5 SGC þ STA must correlate with the sum of rate asymmetries, as we have indeed found in figure 7. This shows that compositional and substitutional strand asymmetries have a causal relation. Moreover, figure 4c implies that on the coding strand, both the GC and the TA skew are significantly higher than on the noncoding strand. Exactly, this correlation and also its association with transcription has been found previously by Touchon et al. (2003) who examined compositional asymmetries on intron sequences from human genes. Our rate asymmetries, C/Tcod .½C/Tnoncod 5 ½G/Acod ; A/Gcod .½A/Gnoncod 5 ½T/Ccod ; G/Tcod .½G/Tnoncod 5 ½C/Acod ; ð2Þ when acting over evolutionary long periods of time, must have the effect that on the coding strand an excess of G and T is produced because more guanine residues and thymine residues are gained by substitutions than lost by substitutions back to A and C. Accordingly, the G þ T content on the coding strand must correlate with the sum of [C / T]cod [G / A]cod, [A / G]cod [T / C]cod, and [G / T]cod [C / A]cod, as we have indeed found in figure 7. A correlation coefficient as high as 0.83 has been found, confirming that the deviations of G þ T from 50% are indeed produced by the strand asymmetries of the transitions. Moreover, a linear fit through the data showed that when C / T, A / G, and G / T are the same on both strands (genes with strand symmetric substitution patterns), the regression line predicts a G þ T content of roughly 50%, which is in agreement with Chargaff’s second parity rule (Sueoka 1995). Green et al. (2003) explained an excess of G þ T pairs over A þ C pairs by coding strand rate asymmetries of the form [A / G]cod [T / C]cod . 0 and [G / A]cod [C / T]cod . 0 that are different from ours. They argued that G þ T is in excess on the coding strand because [A / G]cod [T / C]cod . 0 has a much stronger effect toward increasing G þ T than the weak [G / A]cod [C / T]cod . 0 asymmetry working toward reducing the G þ T content. With our rate asymmetries in equation (2), all three rates work in the same direction, that is, toward building up the G þ T content. A compositional skew can only result from a substitutional asymmetry if the latter has been effective over a sufficiently long, that is, evolutionary, time distance. However, Green et al. (2003) could study rate asymmetries only at one specific point in time and had to assume a time independence of their strand bias for their argument to hold. Figure 6 indicates that at least for the C to T rate this assumption cannot be made. If rates are indeed time dependent, then it is impossible to clarify the exact relationship between compositional and substitutional asymmetry on the basis of rate asymmetries determined at just one point in time. Relation to Replication-Induced Bias Both replication and transcription lead to a mutational strand bias. The idea behind studying rate differences between genic and flanking regions (fig. 5) has been to distinguish between these two asymmetry-producing effects: Because on average the strength of the replication-induced bias should be very similar in the genic and gene-flanking region, its effect should be eliminated if rates in a transcribed region are subtracted from their values in the flanking region. Therefore, the asymmetries found in figure 5 should reflect primarily the effect brought about by transcription. Figure 8, on the other hand, demonstrates that substitutional asymmetries of genes are also influenced by replication. Replication enhances the transcription-induced asymmetry when the coding strand of a gene is identical with the lagging strand and reduces this asymmetry if not. Furthermore, the correlation found shows that the effect of replication tends to be largest near to an ORI and can be assumed to linearly decrease down to zero half way to the next ORI. Interestingly, whereas transcription has the most pronounced effect on the A to G rate, replication seems to be particularly important for the C to T rate asymmetry, showing by far the strongest correlation in figure 8. We should also remark that the origins of replication that we used in our analysis were determined by Huvet et al. Mutational Strand Bias in Human Genes 141 (2007) through a guided search for certain patterns of the compositional skew. As such they are of course only putative origins. The strength of the correlations shown in figure 8 seems to confirm that these positions are indeed origins of replication. It seems that there are two important factors causing the strong gene-to-gene variations of rate asymmetries observable in figures 2–4: First, in case, the rate asymmetries are induced by transcription, repeated transcription increases the extent of substitutional asymmetry. Hence, genes having a high level of germ-line expression should have high rate asymmetries. This assumption is supported by a previously found correlation between gene expression and compositional skew (Majewski 2003; Comeron 2004). Second, figure 8 shows that the extent of substitutional asymmetry is also affected by the gene’s transcription direction and position relative to the next ORI. For genes located close to an ORI, the two processes, replication and transcription, superpose each other in either a positive or a negative direction, thus leading to a variation in rate asymmetry. Further work is needed to explain these variations in a more quantitative fashion. Acknowledgments We are grateful to Dr John Karro for his constant support, his helpfulness, and advice and to Dr Phil Green for reading and commenting on the first draft of this manuscript. M.P. received financial support from the Austrian Science Foundation (FWF) under project title P18762. C.F.M. has been supported by the ‘‘Heinrich Jörg Stiftung’’ of the Karl-Franzens University Graz. Literature Cited Avkindagger S, Adar S, Blander G, Livneh Z. 2002. Quantitative measurement of translesion replication in human cells: evidence for bypass of abasic sites by a replicative DNA polymerase. Proc Natl Acad Sci USA. 99:3764–3769. Beletskii A, Bhagwat A. 1996. Transcription-induced mutations: increase in C to T mutations in the nontranscribed strand during transcription in Escherichia coli. Proc Natl Acad Sci USA. 93:13919–13924. Beletskii A, Bhagwat A. 1998. Correlation between transcription and C to T mutations in the non-transcribed DNA strand. Biol Chem. 379:549–551. Beletskii A, Grigoriev A, Joyce S, Bhagwat A. 2000. Mutations induced by bacteriophage T7 RNA polymerase and their effects on the composition of the T7 genome. J Mol Biol. 300:1057–1065. Blattner F, Plunkett G, Bloch C, et al. (17 co-authors). 1997. The complete genome sequence of Escherichia coli K-12. Science. 277:1453–1462. Bregeon D, Doddridge Z, You H, Weiss B, Doetsch P. 2003. Transcriptional mutagenesis induced by uracil and 8-oxoguanine in Escherichia coli. Mol Cell. 12:959–970. Comeron J. 2004. Selective and mutational patterns associated with gene expression in humans: influences on synonymous composition and intron presence. Genetics. 167:1293–1304. Coulondre C, Miller J, Farabaugh P, Gilbert W. 1978. Molecularbasis of base substitution hotspots in Escherichia-coli. Nature. 274:775–780. David S, O’Shea V, Kundu S. 2007. Base-excision repair of oxidative DNA damage. Nature. 447:941–950. Dianov G, Lindahl T. 1991. Preferential recognition of IT basepairs in the initiation of excision-repair by hypoxanthineDNA glycosylase. Nucleic Acids Res. 19:3829–3833. Dou H, Mitra S, Hazra T. 2003. Repair of oxidized bases in DNA bubble structures by human DNA glycosylases NEIL1 and NEIL2. J Biol Chem. 278:49679. Duncan B, Miller J. 1980. Mutagenic deamination of cytosine residues in DNA. Nature. 287:560–561. Francino M, Chao L, Riley M, Ochman H. 1996. Asymmetries generated by transcription-coupled repair in enterobacterial genes. Science. 272:107–109. Francino M, Ochman H. 1997. Strand asymmetries in DNA evolution. Trends Genet. 13:240–245. Francino M, Ochman H. 2001. Deamination as the basis of strand-asymmetric evolution in transcribed Escherichia coli sequences. Mol Biol Evol. 18:1147–1150. Frank A, Lobry J. 1999. Asymmetric substitution patterns: a review of possible underlying mutational or selective mechanisms. Gene. 238:65–77. Frederico L, Kunkel T, Shaw B. 1990. A sensitive genetic assay for the detection of cytosine deamination-determination of rate constants and the activation-energy. Biochemistry. 29:2532–2537. Gentil A, Cabral-Neto J, Mariage-Samson R, Margot A, Imbach J, Rayner B, Sarasin A. 1992. Mutagenicity of a unique apurinic/apyrimidinic site in mammalian cells. J Mol Biol. 227:981–984. Green P, Ewing B, Miller W, Thomas P, Green E. 2003. Transcription-associated mutational asymmetry in mammalian evolution. Nat Genet. 33:514–517. Grigoriev A. 1998. Analyzing genomes with cumulative skew diagrams. Nucleic Acids Res. 26:2286–2290. Grigoriev A. 1999. Strand-specific compositional asymmetries in double-stranded DNA viruses. Virus Res. 60:1–19. Hazra T, Das A, Das S, Choudhury S, Kow Y, Roy R. 2007. Oxidative DNA damage repair in mammalian cells: a new perspective. DNA Repair. 6:470–480. Hutchinson F, Donnellan J. 1998. A mutation spectra database for bacterial and mammalian genes: 1998. Nucleic Acids Res. 26:290–291. Huvet M, Nicolay S, Touchon M, Audit B, d’Aubenton Carafa Y, Arneodo A, Thermes C. 2007. Human gene organization driven by the coordination of replication and transcription. Genome Res. 17:1278–1285. Karran P, Lindahl T. 1980. Hypoxanthine in deoxyribonucleic acid. Biochemistry. 19:6005. Karro J, Peifer M, Hardison R, Kollmann M, von Grünberg H. 2008. Exponential decay of GC content detected by strandsymmetric substitution rates influences the evolution of isochore structure. Mol Biol Evol. 25:362–374. Kuraoka I, Endou M, Yamaguchi Y, Wada T, Handa H, Tanaka K. 2003. Effects of endogenous DNA base lesions on transcription elongation by mammalian RNA polymerase II. J Biol Chem. 278:7294. Lindahl T. 1993. Instability and decay of the primary structure of DNA. Nature. 362:709–715. Lindahl T, Ljungquist S, Siegert W, Nyberg B, Sperens B. 1977. DNA N-glycosidases: properties of uracil-DNA glycosidase from Escherichia coli. J Biol Chem. 252:3286–3294. Lindahl T, Nyberg B. 1972. Rate of depurination of native deoxyribonucleic acid. Biochemistry. 11:3610. Lindahl T, Nyberg B. 1974. Heat-induced deamination of cytosine residues in deoxyribonucleic acid. Biochemistry. 13:3405–3410. Lobry J. 1996a. A simple vectorial representation of DNA sequences for the detection of replication origins in bacteria. Biochimie. 78:323–326. 142 Mugal et al. Lobry J. 1996b. Asymmetric substitution patterns in the two DNA strands of bacteria. Mol Biol Evol. 13:660–665. Lobry J. 1996c. Origin of replication of Mycoplasma genitalium. Science. 272:745–746. Majewski J. 2003. Dependence of mutational asymmetry on gene-expression levels in the human genome. Am J Hum Genet. 73:688–692. Mellon I, Spivak G, Hanawalt P. 1987. Selective removal of transcription-blocking DNA damage from the transcribed strand of the mammalian DHFR gene. Cell. 51: 241–249. Page FL, Klungland A, Barnes D, Sarasin A, Boiteux S. 2000. Transcription coupled repair of 8-oxoguanine in murine cells. Proc Natl Acad Sci USA. 97:8397. Page FL, Kwoh E, Avrutskaya A, Gentil A, Leadon S, Sarasin A, Cooper P. 2000. Transcription-coupled repair of 8-oxoguanine: requirement for XPG, TFIIH, and CSB and implications for Cockayne syndrome. Cell. 101:159–171. Saxowsky T, Doetsch P. 2006. RNA polymerase encounters with DNA damage: transcription-coupled repair or transcriptional mutagenesis? Chem Rev. 106:474. Schwarz G. 1978. Estimating the dimension of a model. Ann Stat. 6:461–464. Scicchitano D, Olesnicky E, Dimitri A. 2004. Transcription and DNA adducts: what happens when the message gets cut off? DNA Repair. 3:1537–1548. Shapiro R, Klein R. 1966. The deamination of cytidine and cytosine by acidic buffer solutions. Mutagenic implications. Biochem. 5:2358–2362. Shibutani S, Takeshita M, Grollman A. 1991. Insertion of specific bases during DNA synthesis past the oxidationdamaged base 8-oxodG. Nature. 349:431. Shibutani S, Takeshita M, Grollman A. 1997. Translesional synthesis on DNA templates containing a single abasic site. J Biol Chem. 272:13916–13922. Simonelli V, Narciso L, Dogliotti E, Fortini P. 2005. Base excision repair intermediates are mutagenic in mammalian cells. Nucleic Acids Res. 33:4405. Sueoka N. 1995. Intrastrand parity rules of DNA base composition and usages biases of synonymous codons. J Mol Evol. 40:318–3–25 Erratum 42: 323. Svejstrup J. 2002. Mechanisms of transcription-coupled DNA repair. Mol Cell Biol. 3:21–29. Takahasi K, Sakuraba Y, Gondo Y. 2007. Mutational pattern and frequency of induced nucleotide changes in mouse ENU mutagenesis. BMC Mol Biol. 8:1–10. Tanaka M, Ozawa T. 1994. Strand asymmetry in human mitochondrial-DNA mutations. Genomics. 22:327–335. Thiviyanathan V, Somasunderam A, Volk D, Gorenstein D. 2005. 5-Hydroxyuracil can form stable base pairs with all four bases in a DNA duplex. Chem Commun. 400–402. Touchon M, Nicolay S, Arneodo A, d’Aubenton Carafa Y, Thermes C. 2003. Transcription-coupled TA and GC strand asymmetries in the human genome. FEBS Lett. 555: 579–582. Touchon M, Nicolay S, Audit B, Brodie EB, d’Aubenton Carafa Y, Arneodo A, Thermes C. 2005. Replicationassociated strand asymmetries in mammalian genomes: toward detection of replication origins. Proc Natl Acad Sci USA. 102:9836–9841. Viswanathan A, You H, Doetsch P. 1999. Phenotypic Change Caused by Transcriptional Bypass of Uracil in Nondividing Cells. Science. 284:159. Webster M, Smith N, Lercher M, Ellegren H. 2004. Gene expression, synteny, and local similarity in human noncoding mutation rates. Mol Biol Evol. 21:1820–1830. Zhang A, Yang B, Li Z. 2007. Theoretical study on the hydrolytic deamination reaction mechanism of adenine– (H2O)n (n 5 1–4). J Mol Struct Theochem. 819:95–101. Zhou W, Doetsch P. 1993. Effects of abasic sites and DNA single-strand breaks on prokaryotic RNA polymerases. Proc Natl Acad Sci USA. 90:6601–6605. Manolo Gouy, Associate Editor Accepted October 7, 2008
© Copyright 2026 Paperzz