The Targeting of Somatic Hypermutation Closely Resembles That of Meiotic Mutation¹

Mihaela Oprea,* Lindsay G. Cowell,† and Thomas B. Kepler²‡

J Immunol 2001; 166:892-899; doi: 10.4049/jimmunol.166.2.892
http://www.jimmunol.org/content/166/2/892

The Journal of Immunology is published twice each month by The American Association of Immunologists, Inc., 1451 Rockville Pike, Suite 650, Rockville, MD 20852. Copyright © 2001 by The American Association of Immunologists. All rights reserved. Print ISSN: 0022-1767. Online ISSN: 1550-6606.

During the first 2 wk of infection, the primary Ig repertoire is diversified by a hypermutation mechanism that introduces mutations at a rate ~6 orders of magnitude above background (1). Although some properties of somatic hypermutation (SH)³ have been well characterized, the mechanism by which Ig DNA is modified remains unknown and the molecules involved unidentified. Many different models have been proposed, including those involving gene conversion (2), reverse transcription (3), asymmetric error-prone replication (4, 5), error-prone repair (6), transcription-coupled repair (7), and strand-break repair (8), but none has yet proven convincing.
Recent attempts to implicate specific gene products known to be involved in DNA metabolism using knockout mice have produced largely negative results (9–13) or have shown small effects (14–16). Similarly, studies involving human patients with identified DNA metabolism deficiencies (17–19) had negative results (for review, see Ref. 20). Examination of the mutations introduced during SH has led to the formulation of complicated models involving multiple targeting mechanisms, including different mutators for A-T and G-C bp and multiple stages of processing (8, 16, 21–26).

It has been recognized that SH exhibits microsequence dependence in both its targeting (27) and spectra (25). Similar microsequence dependence of mutation frequency and spectra has been shown to occur during neutral evolution (28, 29). The purpose of the present study was to investigate the relationship between the mechanisms underlying the accumulation of mutations during germline evolution and during SH by comparing the characteristics of mutation targeting and spectra under meiotic mutation and under SH. A previous study (30) found differences in the T:A to C:G transition frequency and in the mutability⁴ of G between SH and meiotic processes and thus concluded that the mechanism introducing somatic mutations is different from that responsible for germline evolution. We have shown previously that the spectra of SH and meiotic mutation are different (25). We are here undertaking a more comprehensive study that might reveal similarities undetectable in previous studies and further characterize the differences.

*Theoretical Biology and Biophysics, Los Alamos National Laboratory, Los Alamos, NM 87545; †Department of Immunology, Duke University Medical Center, Durham, NC 27710; and ‡The Santa Fe Institute, Santa Fe, NM 87501

Received December 8, 1999. Accepted October 20, 2000.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

¹ Partially supported by National Science Foundation Award MCB 9357637 (to T.B.K.) and by National Institutes of Health Grant AI28433 (to A.P. and M.O.). Portions of the work were done under the auspices of the U.S. Department of Energy. During much of this work T.B.K. and L.G.C. were in the Biomathematics Program, Department of Statistics, North Carolina State University, Raleigh, NC.

² Address correspondence and reprint requests to Dr. Thomas B. Kepler, The Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501. E-mail address: [email protected]

³ Abbreviation used in this paper: SH, somatic hypermutation.

Materials and Methods

DNA sequence data: SH

We collected a data set comprising 1721 mutations accumulated in nonfunctionally rearranged human Ig genes, murine 3′ Ig V-flanking region DNA, and murine J–C intron DNA (31–38). In all cases, the germline sequence is known; mutations were identified by comparison of each sequence with its corresponding germline sequence. Insertions and deletions were not treated in our analysis. Further details regarding this sequence collection can be found elsewhere (25, 31).

DNA sequence data: meiotic mutations

We collected a set of processed human pseudogenes by searching GenBank, release 111.0. Processed pseudogenes result from reverse transcription of mRNA from functional genes and the integration of the reverse-transcribed DNA into new chromosomal positions. These pseudogenes are usually integrated far from the parent gene and are therefore not transcribed and do not participate in gene conversion events (28, 39, 40).
We then used a locally built version of the BLASTALL algorithm from the National Center for Biotechnology Information to search the primate DNA database for sequences with homology to the processed pseudogenes. Only the pseudogenes for which the functional ortholog was unambiguously identified were kept for further analysis. When multiple pseudogenes of the same

⁴ We use the term "mutability" rather than "mutation rate" to emphasize its role as a property of the DNA sequence itself.

We have compared the microsequence specificity of mutations introduced during somatic hypermutation (SH) and those introduced meiotically during neutral evolution. We have minimized the effects of selection by studying nonproductive (hence unselected) Ig V region genes for somatic mutations and processed pseudogenes for meiotic mutations. We find that the two sets of patterns are very similar: the mutabilities of nucleotide triplets are positively correlated between the somatic and meiotic sets. The major differences that do exist fall into three distinct categories: 1) The mutability is sharply higher at CG dinucleotides under meiotic but not somatic mutation. 2) The complementary triplets AGC and GCT are much more mutable under somatic than under meiotic mutation. 3) Triplets of the form WAN (W = T or A) are uniformly more mutable under somatic than under meiotic mutation. Nevertheless, the relative mutabilities both within this set and within the SAN (S = G or C) triplets are highly correlated with those under meiotic mutation. We also find that the somatic triplet specificity is strongly symmetric under strand exchange for A/T triplets as well as for G/C triplets in spite of the strong predominance of A over T mutations.
Thus, we suggest that somatic mutation has at least two distinct components: one that specifically targets AGC/GCT triplets and another that acts as true catalysis of meiotic mutation. The Journal of Immunology, 2001, 166: 892–899.

The pooled mutation and adjusted total counts were used in the study of strand symmetry of the mutational mechanism and of the potential relation between triplet targeting in somatic and meiotic mutation. There were 2,261 mutations in 53,479 triplets.

Statistical models and methods

Our analyses are based on models for the acquisition of mutations in which the mutability of a given nucleotide depends on the microsequence motif that contains it. We consider two motif sizes: singlets and triplets. Models based on singlets account only for the identity of the target nucleotide itself, i.e., whether it is A, G, C, or T. Models based on triplets account for the identity of the target nucleotide and its immediate neighbors. In other words, we consider the mutability of XYZ where the target nucleotide Y is flanked by nucleotide X (5′) and nucleotide Z (3′).

Every nucleotide in the database is characterized by three factors: the type of mutation to which it has been exposed (somatic or meiotic), the sequence in which it is located, and the motif in which it is found. Each nucleotide, therefore, has probability $p_{ijk}$ of being mutated, where the indices i, j, and k identify the mutational set, sequence number within the set, and motif, respectively. This probability is modeled as:

$$p_{ijk} = \tau_{ij}\,\mu_{ik} \qquad (1)$$

where $\tau_{ij}$ is the effective time of exposure to mutation, or age, of the jth sequence in the ith set and $\mu_{ik}$ is the mutability of the kth motif under the mutational process i. Although the times are not of interest to us, it is necessary to include them in the model for consistent comparison among

Table I. Pseudogenes and related orthologs used in this study^a

Gene name | Human ortholog | Pseudogene | Ortholog in another species
β-Actin | 5016088 | 28248 | 2182268
γ-Actin 1 | 4501886 | 2094760 | 57573
S-adenosylmethionine decarboxylase | 178517 | 457942 | 162622
Argininosuccinate synthetase | 4557336 | 179063 | 162696
Na,K-ATPase -1 | 806753 | 189062 | 1198
Calmodulin | 665587 | 29627 | 6680831
Ceruloplasmin | 180248 | 180258 | 5281318
α catenin | 415305 | 556810 | 6753293
Cytochrome b-5 | 181226 | 181230 | 2257956
T cell cyclophilin | 30308 | 30164 | 2565300
Dihydrolipoamide succinyltransferase | 632883 | 537350 | 220658
ER-60 protein | 1208426 | 4098206 | 927669
Estrogen-related receptor α | 4758305 | 1916859 | 6679692
Ubiquitin-like protein FAU | 31302 | 458007 | 1628627
Ferritin H | 182504 | 806340 | 2879899
FKBP-12 protein | 182632 | 182629 | 165022
Dihydrofolate reductase | 4503322 | 182732 | 191041
Gap junction protein α1 | 4755136 | 181210 | 7012755
Glutamine synthetase | 4504026 | 551473 | 452436
Glutathione peroxidase | 183260 | 183654 | 1564
G protein-coupled receptor kinase 6 | 3005017 | 1477557 | 3005004
Hexokinase II | 4809268 | 881950 | 204612
High-mobility group 1-like protein L3 | 4504424 | 4884557 | 164489
Heat shock protein 70 | 32466 | 29571 | 313283
Keratin 19 | 4504916 | 186694 | 623167
Lactate dehydrogenase B | 34328 | 187061 | 473574
Poly(ADP-ribose) synthetase | 337423 | 265480 | 450596
Phosphoglycerate kinase | 4505762 | 189924 | 206112
Prohibitin | 4505772 | 405232 | 206383
Nuclear inhibitor of protein phosphatase-1 | 4883484 | 4581597 | 1082085
cAMP-dependent protein kinase type Iα | 1526989 | 307336 | 7012769
Small nuclear RNA protein E | 338266 | 338248 | 312004
Jk-recombination signal binding protein | 190949 | 871825 | 52756
Src homology collagen protein p66 isoform | 1658387 | 1834516 | 7012770
Serine hydroxymethyltransferase 1 | 4759105 | 1204117 | 2407961
Activating transcription factor 4 | 4502264 | 434667 | 7012771
Tubulin B | 4929137 | 338692 | 4929434
Tom 20 | 285986 | 3860160 | 717145
Ubiquitin protein ligase E3A | 4507798 | 2853322 | 1843534
Topoisomerase I | 339805 | 339811 | 6678398
Thiopurine methyltransferase | 805083 | 805081 | 2895910
X-box binding protein | 4827057 | 292164 | 6224965

^a The number given is the GenBank gene identification number.

gene were available, we only used one in the analysis. We searched GenBank (using the BLAST program) for an ortholog of each gene in a species other than Homo sapiens. The accession numbers of the genes in the final data set are given in Table I. Each group of two functional genes and a processed pseudogene was subjected to sequence alignment using the ClustalW program (http://www2.ebi.ac.uk/clustalw).

From the obtained alignments, we inferred the state in the ancestor of the human gene and processed pseudogene at each nucleotide position according to the following rules (41): wherever the two human genes agreed, we assumed that they carry the ancestral state; where they did not agree, we turned to the second ortholog. If this ortholog agreed with either of the human genes, the ancestral state was assumed to be the one carried by two of the three genes. If the nucleotide was different in all three genes, we declared the ancestral state ambiguous and excluded that nucleotide position from the analysis. We also discarded positions where an insertion or deletion was identified in any of the three genes.

Having identified the ancestral state, we then traversed the alignment and counted the number of occurrences of each of the 64 nucleotide triplets in the ancestral gene, as well as the number of instances in which the pseudogene carried a mutation at the central nucleotide of a triplet. A given number of mutations in a triplet in a given pseudogene is the result of its intrinsic propensity to mutate as well as the divergence time between the gene and the pseudogene. A pseudogene may have a high mutation count because it contains highly mutable triplets or because it is very old.
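The majority-rule inference of the ancestral state and the triplet tally described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names `ancestral_states` and `count_triplets`, the use of `-` as the alignment gap character, and the decision to skip any triplet that touches an unresolved position are all assumptions made here for concreteness.

```python
from collections import Counter

GAP = "-"  # assumed gap symbol in the three-way alignment


def ancestral_states(gene, pseudogene, outgroup):
    """Infer the ancestral base at each aligned column (None if unresolved).

    gene       : functional human gene
    pseudogene : human processed pseudogene
    outgroup   : ortholog from a second species
    """
    states = []
    for g, p, o in zip(gene, pseudogene, outgroup):
        if GAP in (g, p, o):
            states.append(None)   # indel in any of the three sequences
        elif g == p:
            states.append(g)      # the two human sequences agree
        elif o == g or o == p:
            states.append(o)      # two of the three sequences agree
        else:
            states.append(None)   # all three differ: ambiguous
    return states


def count_triplets(gene, pseudogene, outgroup):
    """Count ancestral triplets and those mutated at the center in the pseudogene."""
    anc = ancestral_states(gene, pseudogene, outgroup)
    totals, mutated = Counter(), Counter()
    for i in range(1, len(anc) - 1):
        window = anc[i - 1:i + 2]
        if None in window:        # assumption: drop triplets with unresolved sites
            continue
        triplet = "".join(window)
        totals[triplet] += 1
        if pseudogene[i] != anc[i]:
            mutated[triplet] += 1
    return totals, mutated
```

Note that a human difference resolved in favor of the pseudogene (the second ortholog agrees with the pseudogene) is attributed to the functional gene and is therefore not counted as a pseudogene mutation.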
To account for these factors, we determined the relative age of the genes and adjusted the total triplet count in each pseudogene by the relative age of the pseudogene (see below).

total counts provide more reliable estimates of the underlying binomial parameter and must be weighted more heavily than those with few total counts. See Appendix for the formula defining the estimators. We carried out the hypothesis testing on these estimators by randomly permuting the triplet labels on one of the sets in the paired data and reporting (as p) the quantile of the real estimated correlation coefficient among the estimators obtained using the permuted data.

Table II. Mutability ratio of complementary nucleotides

Data set | Nucleotides | X/X̄ (p)
Somatic | A/T | 1.86 (<10⁻⁴)
Somatic | G/C | 1.22 (0.01)
Meiotic | A/T | 1.02 (0.80)
Meiotic | G/C | 1.02 (0.78)

Results

Mutation targeting: complementation symmetry

sequences from different sources and for consistent pooling of data from diverse sources. We denote the total nucleotide count in class (i, j, k) by $n_{ijk}$ and the number of mutations among those by $m_{ijk}$. Our analyses are based on the likelihood model given by

$$\log \mathcal{L}(\tau, \mu \mid m, n) = \sum_{ijk} \left( m_{ijk} \log \tau_{ij}\mu_{ik} - n_{ijk}\,\tau_{ij}\mu_{ik} \right). \qquad (2)$$

FIGURE 1. Mutability scatter plots: comparison of the estimated mutability under somatic hypermutation between nucleotide triplets (XYZ) and their complements (X̄ȲZ̄). The dashed line is the principal line. A, Triplets of the form XAZ, with A mutating, compared to their complements Z̄TX̄ with T mutating. For visual clarity, points are labeled with the XAZ member of the pair only. B, Triplets of the form XGZ, with G mutating, compared to their complements Z̄CX̄ with C mutating. The correlation apparent here was tested using the method described in Materials and Methods (see Table III).
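The triplet-versus-complement comparison of Fig. 1 and the permutation test on triplet labels can be sketched as follows. This is a simplified illustration under stated assumptions: it uses an ordinary unweighted Pearson coefficient in place of the weighted estimators defined in the Appendix, it reports a two-sided permutation p value, and the names `complement_triplet`, `permutation_test`, and `symmetry_test` are hypothetical.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_triplet(triplet):
    """Reverse complement: the same site read 5' to 3' on the opposite strand."""
    return "".join(COMPLEMENT[b] for b in reversed(triplet))


def pearson(x, y):
    """Plain Pearson correlation (assumes neither vector is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_test(x, y, n_perm=9999, seed=0):
    """Shuffle the labels of y and locate the observed r among the permuted values."""
    rng = random.Random(seed)
    r_obs = pearson(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(pearson(x, y_perm)) >= abs(r_obs):
            hits += 1
    # add-one correction so the reported p is never exactly zero
    return r_obs, (hits + 1) / (n_perm + 1)


def symmetry_test(mutability, center="A"):
    """Correlate each triplet with the given center base against its complement."""
    triplets = sorted(t for t in mutability if t[1] == center)
    x = [mutability[t] for t in triplets]
    y = [mutability[complement_triplet(t)] for t in triplets]
    return permutation_test(x, y)
```

Here `mutability` is assumed to be a dictionary mapping each of the 64 triplets to its estimated mutability. For example, `complement_triplet("AGC")` gives `"GCT"`, the pairing used for the AGC/GCT hot spot in the text.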
Specific hypotheses within the context of this model are expressed as constraints on the mutabilities $\mu$. For example, the hypothesis that the mutabilities under meiotic and somatic processes are the same is expressed as $\mu_{1k} = \mu_{2k}$. The parameters $\tau$ and $\mu$ were estimated by maximizing the log likelihood, Eq. 2, subject to the constraints for the hypothesis under consideration and to the identifiability constraint on $\tau$: $\sum_{jk} \tau_{ij} n_{ijk} = \sum_{jk} n_{ijk}$, for both i. This constraint ensures that the mean "time of exposure" is normalized between sets.

Analyses using contingency tables or correlation tests (where counts over all sequences in a set are needed) were performed using pooled counts derived from the likelihood model and adjusted as follows. The total counts (mutated plus unmutated) for each motif, denoted $\tilde{n}_{i\cdot k}$, are adjusted for consistent estimation: $\tilde{n}_{i\cdot k} = \sum_j \hat{\tau}_{ij} n_{ijk}$, where $\hat{\tau}_{ij}$ is the maximum likelihood estimate for the effective time of exposure, $\tau_{ij}$.

We applied correlation tests designed to infer the correlation coefficient among the binomial parameters (proportions or probabilities) that underlie our count data. The data themselves also have binomial sampling variability, which is not correlated. Therefore, the task is somewhat more complicated than an ordinary (Pearson) correlation test, which, in addition, assumes normality and equality of variances. We have used two types of estimators: those that are designed to diminish the bias induced by the presence of binomial sampling by accounting for the excess variance and those that do not make this correction. The results of hypothesis testing, where the null hypothesis is that the correlation coefficient is zero, do not depend on this choice, but the numerical value of the estimated correlation coefficient does.
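One way to carry out the constrained maximization for a single mutational set is sketched below. The text does not say which optimizer was used; the alternating update here is an assumption, obtained by setting the derivatives of the log likelihood to zero, which gives $\mu_k = \sum_j m_{jk} / \sum_j n_{jk}\tau_j$ and $\tau_j = \sum_k m_{jk} / \sum_k n_{jk}\mu_k$. Rescaling $\tau$ by a constant and $\mu$ by its inverse leaves every product $\tau_j\mu_k$ unchanged, so the identifiability constraint can be imposed after each sweep. The function names are hypothetical.

```python
def fit_exposure_and_mutability(m, n, n_iter=500):
    """Alternating maximum-likelihood updates for one mutational set.

    m[j][k], n[j][k]: mutation count and total count for sequence j, motif k.
    Assumes every motif occurs in at least one sequence and every sequence
    carries at least one mutation, so no denominator is zero.
    Returns (tau, mu) with sum_j tau[j] * sum_k n[j][k] == sum_jk n[j][k].
    """
    J, K = len(n), len(n[0])
    tau = [1.0] * J
    mu = [0.0] * K
    total_n = sum(sum(row) for row in n)
    for _ in range(n_iter):
        for k in range(K):
            mu[k] = (sum(m[j][k] for j in range(J))
                     / sum(n[j][k] * tau[j] for j in range(J)))
        for j in range(J):
            tau[j] = sum(m[j]) / sum(n[j][k] * mu[k] for k in range(K))
        # impose the identifiability constraint without changing tau * mu
        c = total_n / sum(tau[j] * sum(n[j]) for j in range(J))
        tau = [t * c for t in tau]
        mu = [u / c for u in mu]
    return tau, mu


def adjusted_totals(n, tau):
    """Pooled, age-adjusted total count per motif: sum_j tau_j * n_jk."""
    return [sum(tau[j] * n[j][k] for j in range(len(n)))
            for k in range(len(n[0]))]
```

With the adjusted totals in hand, the pooled mutability estimate for motif k is simply the pooled mutation count divided by `adjusted_totals(n, tau)[k]`.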
All estimators use the fact that the triplets with greater

To investigate the presence of strand bias in the mechanisms responsible for introducing mutations, we compared the mutabilities of motifs with those of their complements. The first-order model, in which mutability depends on the identity of the base itself but not on its neighbors, shows that the somatic set is highly asymmetric, with mutability at A almost twice that at T (Table II). The G:C ratio is not nearly as high as that for A:T but is also significantly different from 1. The meiotic set does not show any evidence of complementation asymmetry. This result holds even when we exclude from the computation the sites that span CG dinucleotides.

For the model in which both neighbors influence the mutability, we performed correlation tests comparing the mutability of a given triplet with that of its complement. Triplets were classified according to their central bases and analyzed separately to remove the clear asymmetry of the single-nucleotide rates. We find that the correlations between triplets and their complements are extremely high under SH (Fig. 1) but not meiotic mutation. Tests of the correlation coefficient bear this out (Table III). Note, however, that if we include triplets spanning CG dinucleotides in the calculation of correlation coefficients for the germline set, we obtain a significant correlation for this set as well. We obtain similar results when we account for the binomial variance, although the values of the correlation coefficients are (as expected) higher: r = 0.83 (p < 10⁻⁴) for the somatic set with AGC/GCT excluded, and r = 0.74 (p = 0.12) for the meiotic set with CG dinucleotide-containing triplets excluded. The correlation becomes significant for the meiotic set as well if we include these motifs.

Table III. Symmetry of microsequence specificity of mutation targeting: linear correlation coefficients for triplet-complement mutability pairs

Data set | Nucleotides | r_S (p)
Somatic | A/T | 0.87 (<10⁻⁴)
Somatic | G/C^a | 0.87 (0.01)
Somatic | G/C^b | 0.60 (0.03)
Meiotic | A/T | 0.39 (0.15)
Meiotic | G/C^a | 0.99 (10⁻⁴)
Meiotic | G/C^c | 0.34 (0.29)

^a Complete set.
^b AGC/GCT and CGN/NCG excluded.
^c CGN/NCG excluded.

Mutation targeting: meiotic/somatic comparison

Mutation spectrum: complementation symmetry

We tested the complementation symmetry of the mutation spectrum conditioned only on the identity of the mutating base. For both the somatic and meiotic data, we constructed 2 × 2 × 3 contingency tables with mutating base classified as purine/pyrimidine and weak/strong, and resulting nucleotide as the transition partner, complement, or transition partner's complement (31), and tested for independence of the purine/pyrimidine classification and the resulting nucleotide (complementation symmetry). Both χ² tests failed to provide any evidence for departures from complementation symmetry (meiotic: χ² = 7.53 if we do not include mutations at CG dinucleotides and 8.20 if we do; somatic: χ² = 6.14; none of these values is significant at the 0.05 level). The microsequence dependence of the spectrum under somatic hypermutation is symmetric: the estimated common correlation

FIGURE 2. Strength of deviation from equality of triplet mutability between somatic and meiotic sets. Triplets with A/T mutating are shown in A; triplets with G/C mutating are shown in B. Contributions to the log-likelihood difference from individual triplet motifs are drawn upward when the mutability is higher in the somatically mutated set and downward when the mutability is higher in the meiotically mutated set.
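The 2 × 2 × 3 classification used for the spectrum symmetry test above can be made concrete with a short sketch. The category labels and the function names `classify` and `contingency_table` are illustrative assumptions; only the groupings themselves (purine/pyrimidine, weak/strong, and the three possible outcomes relative to the mutating base) come from the text.

```python
from collections import Counter

PURINES = {"A", "G"}                 # pyrimidines are C and T
WEAK = {"A", "T"}                    # strong bases are G and C
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def classify(original, mutated):
    """Return (ring class, bond class, outcome class) for one substitution."""
    if original == mutated:
        raise ValueError("not a substitution")
    ring = "purine" if original in PURINES else "pyrimidine"
    bond = "weak" if original in WEAK else "strong"
    if mutated == TRANSITION[original]:
        outcome = "transition"
    elif mutated == COMPLEMENT[original]:
        outcome = "complement"
    else:
        # the only remaining base is the complement of the transition partner
        outcome = "transition_complement"
    return ring, bond, outcome


def contingency_table(mutations):
    """Tally (original, mutated) pairs into the 2 x 2 x 3 cells."""
    return Counter(classify(o, m) for o, m in mutations)
```

Under complementation symmetry, a purine and its complementary pyrimidine (A and T, or G and C) are expected to show the same distribution over the three outcome classes, which is what the independence test examines.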
To compare the microsequence mutability patterns in meiotic and somatic processes, we computed the log-likelihood differences between two models. The first is the fully parameterized model in which the mutability for each triplet in each of the two sets is separately estimated, for a total of 128 mutability parameters (plus age parameters; see Materials and Methods). In the second model, all triplet mutabilities are assumed to be identical between the somatic and meiotic sets. The age parameters are still assumed independent and take up any differences in overall mutation rate. Each nucleotide triplet contributes a term to the log-likelihood difference; the larger the term, the more poorly the assumption of equality between somatic and meiotic data sets accommodates that triplet (Fig. 2).

We find that almost three-fourths of the log-likelihood difference is due to the following triplets (or motifs): triplets containing CG dinucleotides, AGC and its complement GCT, and triplets of the form WAN, where W is T or A and N is any nucleotide. We estimated the contributions of each of these classes by amending the model to recognize the appropriate number of triplet classes. For example, to estimate the contribution of CG dinucleotides, the amended model recognizes two classes of triplets: those containing CG dinucleotides and those that do not. All of the triplets within a class are constrained to have the same ratio of somatic mutability to meiotic mutability. Each of the above classes therefore uses 1 df. The increase in log likelihood produced by the serial inclusion of each of these classes is: NCG/CGN, 115.5; AGC/GCT, 40.8; WAN, 49.7, out of a total likelihood difference (largest minus smallest model) of 291.3 (63 df). In sum, these 3 df (of 63) account for 206 of the total 291.3 log-likelihood difference, or about 71%.
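The per-triplet contribution to the log-likelihood difference, as plotted in Fig. 2, can be approximated with the sketch below. It is a simplification and not the procedure used in the paper: the age-adjusted total counts are held fixed, whereas the full analysis re-estimates the age parameters under each hypothesis. The function names and the sign convention in code (positive when the somatic mutability is higher) are assumptions chosen to mirror the figure.

```python
import math


def loglik_term(m, n, mu):
    """One term of the log likelihood, m*log(mu) - n*mu, with 0*log(0) = 0."""
    return (m * math.log(mu) if m > 0 else 0.0) - n * mu


def triplet_contribution(m_som, n_som, m_mei, n_mei):
    """Signed log-likelihood gain from giving one triplet separate mutabilities.

    m_*: mutation counts; n_*: age-adjusted total counts (treated as fixed).
    Positive when the somatic estimate is higher, negative when the meiotic is.
    """
    mu_som = m_som / n_som
    mu_mei = m_mei / n_mei
    mu_common = (m_som + m_mei) / (n_som + n_mei)
    full = loglik_term(m_som, n_som, mu_som) + loglik_term(m_mei, n_mei, mu_mei)
    constrained = (loglik_term(m_som, n_som, mu_common)
                   + loglik_term(m_mei, n_mei, mu_common))
    gain = full - constrained
    return gain if mu_som >= mu_mei else -gain
```

As a consistency check on the reported class contributions, 115.5 + 40.8 + 49.7 = 206.0, and 206.0 / 291.3 ≈ 0.71, in line with the "almost three-fourths" stated in the text.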
Scatterplot comparisons of triplet mutabilities between somatic and meiotic data sets largely corroborate these results and provide additional insights; the correlation between somatic and meiotic mutabilities stands out quite clearly (Fig. 3). For central nucleotide A, when triplets are grouped as above into WAN and SAN (S = G or C), the within-groups correlation stands out strongly.

The observed patterns are further confirmed by computation of the linear correlation coefficients (Table IV). These were computed both for the complete data sets and as modified by the above considerations to remove the effects of those triplets that are clearly involved in processes unique to one set or the other, and without taking into account binomial sampling variance. If we account for the binomial sampling, the estimated correlation coefficients become higher, but the p values are similar: r = 0.73 (p = 0.0004) and r = 0.55 (p = 0.01), depending on whether we do or do not divide the triplets with A as the central nucleotide into WAN and SAN classes. Inspection of Fig. 3 also reveals that, consistent with our findings of complementation symmetry, the triplets NTW, complementary to WAN, also have mutability higher than the triplets NTS. The effect is not as marked for T as it is for A, but this may be due to the smaller number of mutations at T.

coefficient for the rate of transitions and of transversions to the complement of the mutating base between a triplet and its complement is r = 0.43 (p = 0.001). This result also holds if we do not include the triplets that span CG dinucleotides; these triplets are extremely rare and their mutation counts are also very low. For the meiotic set, the estimated correlation coefficient with CG dinucleotides excluded is r = 0.23 (p = 0.12).
Similar to what we observed in mutational targeting, if we include CG dinucleotides, the spectrum becomes symmetric in the meiotic case as well (r = 0.36, p = 0.003).

Mutation spectrum: meiotic/somatic comparison

When represented in terms relative to the mutating base, the mutation spectrum is strikingly consistent regardless of which base is mutating, for both meiotic and somatic processes (Fig. 4). The spectra are not the same between somatic and meiotic processes, however (Fig. 4). A direct test of the spectrum conditional on the mutating base only shows very strong differences between meiotic and somatic mutation (χ² = 14.42 (A), 35.68 (G), 22.02 (T), and 7.82 (C); all values except that for C are significant at the 0.01 level). The correlation coefficient between somatic and meiotic sets (computed as the combined triplet correlations as above for the symmetry tests) is not significantly different from zero (r = −0.03, p = 0.78).

Discussion

We compared the characteristics of mutations introduced by SH to those introduced meiotically. To ensure that observed characteristics are due to the mutational process itself, we have minimized the effects of selection by choosing, where possible, DNA sequences that are not subject to selection. The SH data are from nonproductively rearranged Ig V genes and from introns flanking rearranged V genes. For the meiotic mutations, we have used processed pseudogenes. For these, we do not completely eliminate selection, since there is uncertainty in assignment of observed nucleotide differences to the pseudogene or its ortholog. We attempt to minimize this uncertainty by considering the state of each nucleotide site in a second ortholog, from a species other than human.

A marked asymmetry between the mutability under SH of thymidine and that of adenine has been noted previously and taken as evidence for strand bias of the hypermutation mechanism (42).
We also find a higher mutability at A than at T and that this asymmetry is much greater than any singlet asymmetry under meiotic mutation. But we also find that when this overall mutability difference is factored out, the microsequence specificity at A is very similar to that at T (Fig. 1 and Table III). Similar findings have been reported (23, 24) and used to justify the conclusion that both strands are targeted by SH and that two mechanisms, one strand-unbiased mutating G and C and the other strand-biased acting on A and T, operate. We find, however, that the triplet mutabilities are surprisingly complementation symmetric for both A/T and G/C mutations. In fact, once the single-nucleotide mutabilities have been taken into account, the triplet symmetry is evident for SH. Whether triplet symmetry appears in meiotic mutation depends strongly on whether the triplets that span CG dinucleotides are included in the calculation of the correlation coefficient. Thus, although we also conclude that there are two distinct components of SH targeting, we find that they share similar strand symmetry.

FIGURE 3. Mutability scatter plots: comparison of the estimated mutability between triplets in the somatically mutated data set and those in the meiotically mutated data set. A, Triplets with A mutating; triplets with G, T, and C mutating are shown in B, C, and D, respectively. The solid line in each panel is the principal line. In A, the dashed lines correspond to principal lines constructed independently for two groups of triplets: those of the form WAN and those of the form SAN (see text). Correlation coefficients are shown in Table IV. The error bars give the SE due to binomial sampling.

Table IV. Similarity between microsequence dependence of mutation targeting under meiotic and somatic processes: linear correlation coefficients for triplet mutabilities^a

Target | r (p)
Common^b,c | 0.46 (<10⁻³)
Common^c | 0.36 (0.01)
A | 0.23 (0.41)
A^b | 0.49 (0.06)
G^c | 0.52 (0.10)
G | 0.03 (0.91)
T | 0.36 (0.16)
C^c | 0.32 (0.33)
C | −0.13 (0.68)

^a Common indicates that r is the common correlation coefficient for all four nucleotides.
^b A divided into two classes: WAN and SAN. r is the common correlation coefficient for the two groups.
^c AGC/GCT and CGN/NCG removed.

FIGURE 4. Comparison of the mutation spectra under somatic and meiotic mutation. Pie charts show the proportion of mutations to each of the corresponding bases. Colors indicate the biochemical relationship of the mutating nucleotide and the nucleotide resulting from the mutation: dark gray, transitions; medium gray, complements; light gray, complement of the transition.

being advantageous whereas mutations under meiotic mutation presumably are merely unavoidable.

We have previously shown that the mutation spectrum under SH is microsequence dependent: what a nucleotide mutates to is influenced by what its neighbors are (25). We compared this spectrum to that previously inferred from a set of meiotic mutations and found no correlations. That meiotic data set, however, combined information from triplets and their complements; furthermore, the mutations were inferred by a somewhat different process than the one we use here. The more comprehensive comparison here confirms the previous result: although there are significant effects of neighboring nucleotides on the mutation spectrum in both meiotic and somatic processes, the triplet dependencies are uncorrelated.

The following model is consistent with the findings thus far, though it is certainly not uniquely so. An initial lesion is created in the dsDNA. The targeting at this point is symmetric: sense strand XAZ is affected just as frequently as sense strand Z̄TX̄. This occurs naturally if the lesion is a double-strand break, consistent with the findings of Sale and Neuberger (8). In fact, the complementation symmetry of targeting even suggests a staggered cut.
In a blunt cut, the complementary nucleotides are not in equivalent states: one is 3′ of the break and the other is 5′ of it. A staggered cut that also breaks the base pairing leaves the two nucleotides both 5′ or both 3′ of the break, though now on opposite sides of it. Furthermore, both are unpaired and overhanging. Note that the apparent strand asymmetry can now be viewed as the asymmetry between the DNA 5′ and 3′ of the break. The probability that religation is mutagenic now depends on which side of the break the purine is on, with the probability of mutagenic repair higher if the purine is on the plus strand. This would result if, for example, purines are more susceptible to excision when overhanging and gaps in the plus strand (or 5′ of the double-stranded break) are less likely to be repaired correctly.

Several studies have found reduced mutation rates in mismatch repair-deficient mice (11, 14, 16) and relative enhancement of mutations at the AGC/GCT hot spots (16) or at G and C bases (13, 15). Rada et al. (16) inferred from this observation that the mutator has two components, one that is dependent on the mismatch repair protein MSH-2 and another that is MSH-2 independent. We concur and suggest that MSH-2 is responsible for introducing lesions as described above and leaves the signature of catalytically enhanced meiotic mutation. A second component, as yet unidentified, is targeted specifically at AGC/GCT triplets or at the palindromic quadruplet AGCT (L. G. Cowell and T. B. Kepler, manuscript in

With certain well-defined exceptions, the sequence specificities of mutational targeting underlying meiotic and somatic mutations are significantly correlated. This is quite remarkable since the time scales over which these changes have accrued differ by about 7 orders of magnitude (about 1 mo for SH and on the order of a million years for meiotic mutation).
This would be expected if mutations under SH are introduced by catalytic enhancement of the processes responsible for meiotic mutations. Thus, if a major proportion of mutations introduced during evolution occur at strand breaks, then SH hastens the introduction of these breaks, but they are introduced in the same places. In this sense, the reaction resembles true catalysis. The differences in the triplet mutabilities between somatic and meiotic mutation are largely attributable to three effects: 1) The mutability of triplets containing CG dinucleotides is much higher under meiotic mutation than under SH. The mutability of CG dinucleotides is a well-understood consequence of the methylation of such dinucleotides (43). This excess mutability has been seen in studies of pseudogene-ortholog pairs (29) and in surveys of genetic lesions associated with human genetic disease (44). 2) The mutability of the triplet AGC and its complement GCT is considerably higher under SH than under meiotic mutation. This is the wellknown serine hot spot (27, 36). 3) The mutability of triplets of the form WAN is higher under SH. The mutabilities of the triplets within each of the two subsets (WAN, SAN) are correlated with those in the meiotic data set. Although the pattern is weaker for T mutating, the complementary triplets NTW also segregate at higher mutability from the triplets NTS and both sets are correlated with the meiotic mutabilities. The overarching similarities between somatic and meiotic mutation targeting, punctuated sharply by specific differences suggests that two components are involved in the targeting: a “background” mechanism that has recruited and modified components of the DNA repair machinery, and a mechanism, perhaps novel, specific to AGC/GCT triplets (see below). We also investigated the relationships between the mutation spectra under somatic and meiotic mutation. 
It was previously suggested that the two processes may be related because both result in an excess of transitions over transversions (22). We find, however, that the proportion of transitions is significantly smaller under SH. The effect of this is that the rate of replacement mutations is higher under SH and, consequently, so is the net rate of diversification. Both of these effects are consistent with diversification under SH Target TARGETING OF SOMATIC AND MEIOTIC MUTATIONS preparation), which contains both triplet motifs, and introduces lesions preferentially at these sites. One candidate for the unknown molecule is a modified site-specific methylase. Other groups have hypothesized the presence of a two-component mutator (21–24), consistent with the observation that G and C are mutated more frequently in the murine cell line 18-81 (26) and the Burkitt lymphoma line Ramos (8). Furthermore, the G 䡠 C-targeting component is argued to have arisen first (or been co-opted first by SH) (22), consistent with the observations that AGC/GCT or G and C are preferentially targeted in shark (45) and Xenopus (46). The identity of the molecules involved in somatic hypermutation will surely be revealed soon, but even after their names are known, it will remain to learn how they do what they do. For this task, careful analysis of the mutation patterns will be essential. hypermutation in which targeting of the mutator is linked to the direction of DNA replication. EMBO J. 10:4331. Gearhart, P. J., and D. F. Bogenhagen. 1983. Clusters of point mutations are found exclusively around rearranged antibody variable genes. Proc. Natl. Acad. Sci. USA 80:3439. Peters, A., and U. Storb. 1996. Somatic hypermutation of immunoglobulin genes is linked to transcription initiation. Immunity 4:57. Sale, J. E., and M. Neuberger. 1998. TdT-accessible breaks are scattered over the immunoglobulin V domain in a constitutively hypermutating B cell line. Immunity 9:859. Shen, H., D. Cheo, E. 
Friedberg, and U. Storb. 1997. The inactivation of the xpc gene does not affect somatic hypermutation or class switch recombination of immunoglobulin genes. Mol. Immunol. 34:527. Zheng, B., S. Han, E. Spanopoulou, and G. Kelsoe. 1998. Immunoglobulin gene hypermutation in germinal centers is independent of the RAG-1 V(D)J recombinase. Immunol. Rev. 162:133. Winter, D., Q. Phung, A. Umar, S. Baker, R. Tarone, K. Tanaka, R. Liskay, T. Kunkel, V. Bohr, and P. Gearhart. 1998. Altered spectra of hypermutation in antibodies from mice deficient for the DNA mismatch repair protein PMS2. Proc. Natl. Acad. Sci. USA 95:6953. Frey, S., B. Bertocci, F. Delbos, L. Quint, J.-C. Weill, and C.-A. Raynaud. 1998. Mismatch repair deficiency interferes with the accumulation of mutations in chronically stimulated B cells and not with the hypermutation process. Immunity 9:127. Jacobs, H., Y. Fukita, G. van der Horst, J. de Boer, G. Weeda, J. Essers, N. de Wind, B. Engelward, L. Samson, S. Verbeek, et al. 1998. Hypermutation of immunoglobulin genes in memory B cells of DNA repair-deficient mice. J. Exp. Med. 187:1735. Cascalho, M., J. Wong, C. Steinberg, and M. Wabl. 1998. Mismatch repair coopted by hypermutation. Science 279:1207. Phung, Q., D. Winter, A. Cranston, R. Tarone, V. Bohr, R. Fishel, and P. Gearhart. 1998. Increased hypermutation at G and C nucleotides in immunoglobulin variable genes from mice deficient in the MSH2 mismatch repair protein. J. Exp. Med. 187:1745. Rada, C., M. R. Ehrenstein, M. Neuberger, and C. Milstein. 1998. Hot spot focusing of somatic hypermutation in MSH2-deficient mice suggests two stages of mutational targeting. Immunity 9:135. Sack, S., Y. Liu, J. Germain, and N. Green. 1998. Somatic hypermutation of immunoglobulin genes is independent of the bloom’s syndrome DNA helicase. Clin. Exp. Immunol. 112:248. Wagner, S., and M. Neuberger. 1996. Somatic hypermutation of immunoglobulin genes. Annu. Rev. Immunol. 14:441. Kim, N., K. Kage, F. Matsuda, M. 
Lefranc, and U. Storb. 1997. B lymphocytes of xeroderma pigmentosum or Cockayne syndrome patients with inherited defects in nucleotide excision repair are fully capable of somatic hypermutation of immunoglobulin genes. J. Exp. Med. 186:413. Harris, R. S., Q. Kong, and N. Maizels. 1999. Somatic hypermutation and the three R’s: repair, replication and recombination. Mutat. Res. 436:157. Spencer, J., M. Dunn, and D. Dunn-Walters. 1999. Characteristics of sequences around individual nucleotide substitutions in Igvh genes suggest different GC and AT mutators. J. Immunol. 162:6596. Diaz, M., J. Velez, M. Singh, J. Cerny, and M. Flajnik. 1999. Mutational pattern in the nurse shark antigen receptor gene (NAR) is similar to mammalian Ig and to spontaneous mutations in evolution: the translesion synthesis model of somatic hypermutation. Int. Immunol. 11:825. Dörner, T., S. Foster, N. Farner, and P. Lipsky. 1998. Somatic hypermutation of human immunoglobulin heavy chain genes: targeting of RGYW motifs on both DNA strands. Eur. J. Immunol. 28:3384. Milstein, C., M. Neuberger, and R. Staden. 1998. Both DNA strands of antibody genes are hypermutation targets. Proc. Natl. Acad. Sci. USA 95:8791. Cowell, L., and T. Kepler. 1999. The nucleotide-replacement spectrum under somatic hypermutation exhibits microsequence dependence that is strandsymmetric and distinct from that under germline mutation. J. Immunol. 164:1971. Bachl, J., and M. Wabl. 1996. An immunoglobulin mutator that targets G.C base pairs. Proc. Natl. Acad. Sci. USA 93:851. Rogozin, I., and N. Kolchanov. 1992. Somatic hypermutagenesis in immunoglobulin genes. II. Influence of neighboring base sequences on mutagenesis. Biochim. Biophys. Acta 1171:11. Bains, W. 1992. Local sequence dependence of rate of base replacement in mammals. Mutat. Res. 267:43. Hess, S., J. Blake, and R. Blake. 1994. Wide variations in neighbor-dependent substitution rates. J. Mol. Biol. 236:1022. Golding, G., P. Gearhart, and B. Glickman. 1987. 
Patterns of somatic mutations in immunoglobulin variable genes. Genetics 115:169. Cowell, L., H. Kim, T. Humaljoki, C. Berek, and T. Kepler. 1999. Enhanced evolvability in immunoglobulin V genes under somatic hypermutation. J. Mol. Evol. 49:23. Brezinschek, H. P., R. I. Brezinschek, and P. Lipsky. 1995. Analysis of the heavy chain repertoire of human peripheral B cells using single-cell polymerase chain reaction. J. Immunol. 155:190. Weber, J. S., J. Berry, T. Manser, and J. L. Claflin. 1994. Mutations in Ig V(D)J genes are distributed asymmetrically and independently of the position of V(D)J. J. Immunol. 153:3594. Weber, J. S., J. Berry, S. Litwin, and J. L. Claflin. 1991. Somatic hypermutation of the JC intron is markedly reduced in unrearranged and H alleles and is unevenly distributed in rearranged alleles. J. Immunol. 146:3218. 6. 7. 8. 9. 10. 11. 12. Acknowledgments We thank Claudia Berek and Latham Claflin for sharing unpublished data. 13. Appendix Estimator for the correlation coefficient among binomial proportions 14. The model underlying the data analysis is that of two sets of mutabilities which are linearly correlated and which give rise to binomial (count) data. The task is to estimate the linear correlation coefficient. The difficulty is that the binomial sampling variability is independent (i.e., uncorrelated); it is only the indirectly observed mutabilities that are correlated. The estimation is as follows. The adjusted counts for each motif k are designated by nik where i ⫽ 1, 2 is the group index (somatic or meiotic; triplet or complement), and k designates the motif. Similarly, mik denotes the number of mutated occurrences of motif k in group i. For each of the four nucleotides, the number of triplets is denoted by K. The dot denotes summation over the respective coefficient. The estimators for the correlation coefficients are computed as: ˆ ⫽ Q s1s2 (3) 16. 17. 18. 19. 20. 21. 22. 
where Q⫽ 冘冑 冉 n1kn2k k s2i ⫽ m1k m1䡠 ⫺ n1k n1䡠 冘 冉 knik mik m1䡠 ⫺ nik n1䡠 冊 冊冉 and 冉冘 冑 k 冊冋 n1kn2k 1 ⫹ m2k m2䡠 ⫺ n2k n2䡠 2 ⫺ 共K ⫺ 1 兲 ni䡠 ⫺ K ⫹ 1 ⫺ ⫽ 15. 冘 (4) 冊 mi䡠 mi䡠 1⫺ ni䡠 ni䡠 2 knik 23. 24. 25. , (5) ni䡠 26. 27. 册 kn1kn2k n1 䡠 n2䡠 冘 冉 冊 ⫺ 冘冑 k 冉 冊 n1kn2k n1k n2k ⫹ . n1䡠 n2䡠 (6) References 1. McKean, D. M., K. Huppi, M. Bell, L. Staudt, W. Gerhard, and M. Weigert. 1984. Generation of antibody diversity in the immune response of BALB/c mice to influenza virus hemagglutinin. Proc. Natl. Acad. Sci. USA 81:3180. 2. Maizels, N. 1989. Might gene conversion be the mechanism of somatic hypermutation of mammalian Ig genes? Trends Genet. 5:4. 3. Steele, E., and J. Pollard. 1987. Hypothesis: somatic mutation by gene conversion via the error prone DNA 3 RNA 3 DNA information loop. Mol. Immunol. 24:667. 4. Manser, T. 1990. The efficiency of antibody maturation; can the rate of B cell division be limiting? Immunol. Today 11:305. 5. Rogerson, B., J. Hackett, A. Peters, D. Haasch, and U. Storb. 1991. Mutation pattern of immunoglobulin transgenes is compatible with a model of somatic 28. 29. 30. 31. 32. 33. 34. Downloaded from http://www.jimmunol.org/ by guest on June 18, 2017 898 The Journal of Immunology 35. Wu, P., and L. Claflin. 1999. Promoter-associated displacement of hypermutations. Int. Immunol. 10:1131. 36. Smith, D., G. Creadon, P. Jena, J. Portanova, B. Kotzin, and L. Wysocki. 1996. Di- and trinucleotide target preferences in somatic mutagenesis in normal and autoreactive B cells. J. Immunol. 156:2642. 37. Rickert, R., and S. Clarke. 1993. Low frequencies of somatic mutation in two expressed V genes: unequal distribution of mutation in 5⬘ and 3⬘ flanking regions. Int. Immunol. 5:255. 38. Weber, J. S., J. Berry, T. Manser, and J. L. Claflin. 1991. Position of the rearranged V and its 5⬘ flanking sequences determines the location of somatic mutations in the J locus. J. Immunol. 146:3652. 39. Ophir, R., T. Itoh, D. Graur, and T. Gojobori. 1999. 
A simple method for estimating the intensity of purifying selection in protein-coding genes. Mol. Biol. Evol. 16:49. 40. Li, W. 1997. Molecular Evolution. Sinauer Associates, Inc., Sunderland, MA. 899 41. Li, W., C. Wu, and C. Luo. 1984. Nonrandomness of point mutation as reflected in nucleotide substitutions in pseudogenes and its evolutionary implications. J. Mol. Evol. 21:58. 42. Lebecque, S., and P. Gearhart. 1990. Boundaries of somatic mutation in rearranged immunoglobulin genes: 5⬘ boundary is near the promoter, and 3⬘ boundary is approximately 1 kb from V(D)J gene. J. Exp. Med. 172:1717. 43. Bird, A. 1980. DNA methylation and the frequency of CpG in animal DNA. Nucleic Acids Res. 8:1499. 44. Cooper, D., and H. Youssoufian. 1988. The CpG dinucleotide and human genetic disease. Hum. Genet. 78:151. 45. Wilson, M., E. Hsu, A. Marcuz, L. Courtet, L. Du Pasquier, and C. Steinberg. 1992. What limits affinity maturation of antibodies in Xenopus: the rate of somatic mutation or the ability to select mutants? EMBO J. 11:4337. 46. Hinds-Frey, K., H. Nishikata, R. Litman, and G. Litman. 1993. Somatic variation precedes extensive diversification of germline sequences and combinatorial joining in the evolution of immunoglobulin heavy chain diversity. J. Exp. Med. 178:815. Downloaded from http://www.jimmunol.org/ by guest on June 18, 2017
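The estimator described in the Appendix (Eqs. 3 to 6) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `binomial_correlation`, the variable `nu` for the normalization of Eq. 6, and the choice to raise an error when a variance estimate is non-positive are all introduced here for illustration. The inputs are per-motif counts n_ik and mutated counts m_ik for the two groups.

```python
import math


def binomial_correlation(n1, m1, n2, m2):
    """Correlation between two sets of latent binomial proportions.

    n1, n2: occurrences of each motif k in groups 1 and 2.
    m1, m2: mutated occurrences of each motif k in groups 1 and 2.
    (Hypothetical helper; names are not taken from the paper.)
    """
    K = len(n1)
    if not (len(m1) == len(n2) == len(m2) == K):
        raise ValueError("all four count lists must have the same length")

    N1, N2 = sum(n1), sum(n2)              # n_1. and n_2.
    p1, p2 = sum(m1) / N1, sum(m2) / N2    # pooled proportions m_i. / n_i.
    w = [math.sqrt(a * b) for a, b in zip(n1, n2)]

    # Normalization (Eq. 6); the name "nu" is an assumption.
    nu = sum(w) * (1 + sum(a * b for a, b in zip(n1, n2)) / (N1 * N2))
    nu -= sum(wk * (a / N1 + b / N2) for wk, a, b in zip(w, n1, n2))

    # Weighted cross-product of deviations (Eq. 4).
    Q = sum(
        wk * (x / a - p1) * (y / b - p2)
        for wk, a, x, b, y in zip(w, n1, m1, n2, m2)
    ) / nu

    def s2(n, m, N, p):
        # Variance of the latent proportions with the binomial
        # sampling contribution subtracted (Eq. 5).
        num = sum(nk * (mk / nk - p) ** 2 for nk, mk in zip(n, m))
        num -= (K - 1) * p * (1 - p)
        den = N - K + 1 - sum(nk * nk for nk in n) / N
        return num / den

    v1, v2 = s2(n1, m1, N1, p1), s2(n2, m2, N2, p2)
    if v1 <= 0 or v2 <= 0:
        raise ValueError("non-positive variance estimate; correlation undefined")
    return Q / math.sqrt(v1 * v2)
```

Because the binomial sampling variance is subtracted from the denominators, the estimate is not clipped and can fall slightly outside [-1, 1] for small or noisy samples.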
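The triplet classes named in the Discussion (CG-containing triplets, the AGC/GCT hot spot, and the WAN/SAN and NTW/NTS subsets) can be made concrete with a short sketch. It assumes the standard IUPAC codes W = A or T, S = C or G, N = any base, and treats the middle base of each triplet as the mutating nucleotide; the function names and label strings are hypothetical and chosen only for illustration.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
WEAK = set("AT")  # IUPAC W; every other base is S (C or G)


def reverse_complement(seq):
    """Return the sequence read 5' to 3' on the opposite strand."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def classify_triplet(triplet):
    """Label a triplet whose middle base is the mutating nucleotide."""
    left, mid, right = triplet
    labels = []
    if "CG" in triplet:                 # CGN or NCG
        labels.append("CpG")
    if triplet in ("AGC", "GCT"):       # hot spot and its complement
        labels.append("AGC/GCT hot spot")
    if mid == "A":
        labels.append("WAN" if left in WEAK else "SAN")
    if mid == "T":
        labels.append("NTW" if right in WEAK else "NTS")
    return labels
```

For example, `reverse_complement("AGC")` gives `"GCT"`, and the reverse complement of any WAN triplet is an NTW triplet, which is the complementation symmetry the Discussion relies on.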