The Targeting of Somatic Hypermutation
Closely Resembles That of Meiotic Mutation
Mihaela Oprea, Lindsay G. Cowell and Thomas B. Kepler
This article cites 45 articles, 24 of which you can access for free at:
http://www.jimmunol.org/content/166/2/892.full#ref-list-1
The Journal of Immunology is published twice each month by
The American Association of Immunologists, Inc.,
1451 Rockville Pike, Suite 650, Rockville, MD 20852
Copyright © 2001 by The American Association of
Immunologists All rights reserved.
Print ISSN: 0022-1767 Online ISSN: 1550-6606.
Downloaded from http://www.jimmunol.org/ by guest on June 18, 2017
References
J Immunol 2001; 166:892-899; ;
doi: 10.4049/jimmunol.166.2.892
http://www.jimmunol.org/content/166/2/892
The Targeting of Somatic Hypermutation Closely Resembles
That of Meiotic Mutation1
Mihaela Oprea,* Lindsay G. Cowell,† and Thomas B. Kepler2‡
During the first 2 wk of infection, the primary Ig repertoire is diversified by a hypermutation mechanism that introduces mutations at a rate ~6 orders of magnitude above background (1). Although some properties of somatic hypermutation (SH)3 have been well characterized, the mechanism by which Ig DNA is modified remains unknown and the molecules involved unidentified. Many different models have been proposed, including those involving gene conversion (2), reverse transcription (3), asymmetric error-prone replication (4, 5), error-prone repair (6), transcription-coupled repair (7), and strand-break repair (8), but none has yet proven convincing. Recent attempts to implicate specific gene products known to be involved in DNA metabolism using knockout mice have produced largely negative results (9–13) or have shown small effects (14–16). Similarly, studies involving human patients with identified DNA metabolism deficiencies (17–19) had negative results (for review, see Ref. 20).
Examination of the mutations introduced during SH has led to
the formulation of complicated models involving multiple targeting mechanisms, including different mutators for A-T and G-C bp
and multiple stages of processing (8, 16, 21–26).
It has been recognized that SH exhibits microsequence dependence in both its targeting (27) and spectra (25). Similar microsequence dependence of mutation frequency and spectra has been
shown to occur during neutral evolution (28, 29). The purpose of
the present study was to investigate the relationships between the
mechanisms underlying the accumulation of mutations during
germline evolution and those accumulated during SH by comparing the characteristics of mutation targeting and spectra under meiotic mutation and under SH. A previous study (30) found differences in the T:A to C:G transition frequency and in the mutability4
of G between SH and meiotic processes and thus concluded that
the mechanism introducing somatic mutations is different from that
responsible for germline evolution. We have shown previously
that the spectra of SH and meiotic mutation are different (25). We
are here undertaking a more comprehensive study that might reveal similarities undetectable in previous studies and further characterize the differences.
*Theoretical Biology and Biophysics, Los Alamos National Laboratory, Los Alamos, NM 87545; †Department of Immunology, Duke University Medical Center, Durham, NC 27710; and ‡The Santa Fe Institute, Santa Fe, NM 87501
Received December 8, 1999. Accepted October 20, 2000.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
1 Partially supported by National Science Foundation Award MCB 9357637 (to T.B.K.) and by National Institutes of Health Grant AI28433 (to A.P. and M.O.). Portions of the work were done under the auspices of the U.S. Department of Energy. During much of this work T.B.K. and L.G.C. were in the Biomathematics Program, Department of Statistics, North Carolina State University, Raleigh, NC.
2 Address correspondence and reprint requests to Dr. Thomas B. Kepler, The Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501. E-mail address: [email protected]
3 Abbreviation used in this paper: SH, somatic hypermutation.
Materials and Methods
DNA sequence data: SH
We collected a data set comprised of 1721 mutations accumulated in nonfunctionally rearranged human Ig genes, murine 3′ Ig V-flanking region DNA, and murine J–C intron DNA (31–38). In all cases, the germline sequence is known; mutations were identified by comparison of each sequence with its corresponding germline sequence. Insertions and deletions were not treated in our analysis. Further details regarding this sequence collection can be found elsewhere (25, 31).
DNA sequence data: meiotic mutations
We collected a set of processed human pseudogenes by searching GenBank, release 111.0. Processed pseudogenes result from reverse transcription of mRNA from functional genes and the integration of the reverse-transcribed DNA into new chromosomal positions. These pseudogenes are usually integrated far from the parent gene and are therefore not transcribed and do not participate in gene conversion events (28, 39, 40). We then used a locally built version of the BLASTALL algorithm from National Center for Biotechnology Information to search the primate DNA database for sequences with homology to the processed pseudogenes. Only the pseudogenes for which the functional ortholog was unambiguously identified were kept for further analysis. When multiple pseudogenes of the same
4 We use the term “mutability” rather than “mutation rate” to emphasize its role as a property of the DNA sequence itself.
We have compared the microsequence specificity of mutations introduced during somatic hypermutation (SH) and those introduced meiotically during neutral evolution. We have minimized the effects of selection by studying nonproductive (hence unselected) Ig V region genes for somatic mutations and processed pseudogenes for meiotic mutations. We find that the two sets of
patterns are very similar: the mutabilities of nucleotide triplets are positively correlated between the somatic and meiotic sets. The
major differences that do exist fall into three distinct categories: 1) The mutability is sharply higher at CG dinucleotides under
meiotic but not somatic mutation. 2) The complementary triplets AGC and GCT are much more mutable under somatic than
under meiotic mutation. 3) Triplets of the form WAN (W = T or A) are uniformly more mutable under somatic than under meiotic
mutation. Nevertheless, the relative mutabilities both within this set and within the SAN (S = G or C) triplets are highly correlated
with those under meiotic mutation. We also find that the somatic triplet specificity is strongly symmetric under strand exchange
for A/T triplets as well as for G/C triplets in spite of the strong predominance of A over T mutations. Thus, we suggest that somatic
mutation has at least two distinct components: one that specifically targets AGC/GCT triplets and another that acts as true
catalysis of meiotic mutation. The Journal of Immunology, 2001, 166: 892–899.
The pooled mutation and adjusted total counts were used in the study of
strand symmetry of the mutational mechanism and of the potential relation
between triplet targeting in somatic and meiotic mutation. There were
2,261 mutations in 53,479 triplets.
Statistical models and methods
Our analyses are based on models for the acquisition of mutations in which
the mutability of a given nucleotide depends on the microsequence motif
that contains it. We consider two motif sizes: singlets and triplets. Models
based on singlets account only for the identity of the target nucleotide
itself, i.e., whether it is A, G, C, or T. Models based on triplets account for
the identity of the target nucleotide and its immediate neighbors. In other
words, we consider the mutability of XYZ where the target nucleotide Y is
flanked by nucleotide X (5′) and nucleotide Z (3′).
Every nucleotide in the database is characterized by three factors: the
type of mutation to which it has been exposed (somatic or meiotic), the
sequence in which it is located, and the motif in which it is found. Each
nucleotide, therefore, has probability $p_{ijk}$ of being mutated, where the indices i, j, and k identify the mutational set, sequence number within the set,
and motif, respectively. This probability is modeled as:
\[ p_{ijk} = \theta_{ij}\,\mu_{ik} \qquad (1) \]
where $\theta_{ij}$ is the effective time of exposure to mutation, or age, of the jth
sequence in the ith set and $\mu_{ik}$ is the mutability of the kth motif under the
mutational process i. Although the times $\theta$ are not of interest to us, it is
necessary to include them in the model for consistent comparison among
Table I. Pseudogenes and related orthologs used in this study^a
Gene name | Human Ortholog | Pseudogene | Ortholog in Another Species
β-Actin | 5016088 | 28248 | 2182268
γ-Actin 1 | 4501886 | 2094760 | 57573
S-adenosylmethionine decarboxylase | 178517 | 457942 | 162622
Argininosuccinate synthetase | 4557336 | 179063 | 162696
Na,K-ATPase β-1 | 806753 | 189062 | 1198
Calmodulin | 665587 | 29627 | 6680831
Ceruloplasmin | 180248 | 180258 | 5281318
α catenin | 415305 | 556810 | 6753293
Cytochrome b-5 | 181226 | 181230 | 2257956
T cell cyclophilin | 30308 | 30164 | 2565300
Dihydrolipoamide succinyltransferase | 632883 | 537350 | 220658
ER-60 protein | 1208426 | 4098206 | 927669
Estrogen-related receptor α | 4758305 | 1916859 | 6679692
Ubiquitin-like protein FAU | 31302 | 458007 | 1628627
Ferritin H | 182504 | 806340 | 2879899
FKBP-12 protein | 182632 | 182629 | 165022
Dihydrofolate reductase | 4503322 | 182732 | 191041
Gap junction protein α1 | 4755136 | 181210 | 7012755
Glutamine synthetase | 4504026 | 551473 | 452436
Glutathione peroxidase | 183260 | 183654 | 1564
G protein-coupled receptor kinase 6 | 3005017 | 1477557 | 3005004
Hexokinase II | 4809268 | 881950 | 204612
High-mobility group 1-like protein L3 | 4504424 | 4884557 | 164489
Heat shock protein 70 | 32466 | 29571 | 313283
Keratin 19 | 4504916 | 186694 | 623167
Lactate dehydrogenase B | 34328 | 187061 | 473574
Poly(ADP-ribose) synthetase | 337423 | 265480 | 450596
Phosphoglycerate kinase | 4505762 | 189924 | 206112
Prohibitin | 4505772 | 405232 | 206383
Nuclear inhibitor of protein phosphatase-1 | 4883484 | 4581597 | 1082085
cAMP-dependent protein kinase type Iα | 1526989 | 307336 | 7012769
Small nuclear RNA protein E | 338266 | 338248 | 312004
Jk-recombination signal binding protein | 190949 | 871825 | 52756
Src homology collagen protein p66 isoform | 1658387 | 1834516 | 7012770
Serine hydroxymethyltransferase 1 | 4759105 | 1204117 | 2407961
Activating transcription factor 4 | 4502264 | 434667 | 7012771
Tubulin B | 4929137 | 338692 | 4929434
Tom 20 | 285986 | 3860160 | 717145
Ubiquitin protein ligase E3A | 4507798 | 2853322 | 1843534
Topoisomerase I | 339805 | 339811 | 6678398
Thiopurine methyltransferase | 805083 | 805081 | 2895910
X-box binding protein | 4827057 | 292164 | 6224965
^a The number given is the GenBank gene identification number.
gene were available, we only used one in the analysis. We searched GenBank (using the BLAST program) for an ortholog of each gene in a species
other than Homo sapiens. The accession numbers of the genes in the
final data set are given in Table I. Each group of two functional genes and
a processed pseudogene was subjected to sequence alignment using the
ClustalW program (http://www2.ebi.ac.uk/clustalw). From the obtained
alignments, we inferred the state in the ancestor of the human gene and
processed pseudogene at each nucleotide position according to the following rules (41): wherever the two human genes agreed, we assumed that they
carry the ancestral state; where they did not agree, we turned to the second
ortholog. If this ortholog agreed with any of the human genes, the ancestral
state was assumed to be the one carried by two of the three genes. If the
nucleotide was different in all three genes, we declared the ancestral state
ambiguous and excluded that nucleotide position from the analysis. We
also discarded positions where an insertion or deletion was identified in any
of the three genes.
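The ancestral-state rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three sequences are taken to be already aligned and of equal length, gaps are assumed to be written as "-", and the function names are hypothetical rather than anything used in the original analysis.

```python
GAP = "-"  # assumed gap symbol in the aligned sequences


def ancestral_state(human_gene, pseudogene, second_ortholog):
    """Return the inferred ancestral base for one alignment column,
    or None when the position is excluded from the analysis."""
    column = (human_gene, pseudogene, second_ortholog)
    # Positions with an insertion or deletion in any sequence are discarded.
    if GAP in column:
        return None
    # Where the human gene and the pseudogene agree, take that state.
    if human_gene == pseudogene:
        return human_gene
    # Otherwise the second ortholog decides, if it matches either one.
    if second_ortholog in (human_gene, pseudogene):
        return second_ortholog
    # All three differ: the ancestral state is ambiguous.
    return None


def infer_ancestor(human_gene, pseudogene, second_ortholog):
    """Apply the column rule along three aligned sequences."""
    if not (len(human_gene) == len(pseudogene) == len(second_ortholog)):
        raise ValueError("aligned sequences must have equal length")
    return [
        ancestral_state(h, p, o)
        for h, p, o in zip(human_gene, pseudogene, second_ortholog)
    ]
```

Excluded positions are kept as `None` rather than dropped so that the coordinates of the alignment are preserved for any later step that looks at neighboring bases.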
Having identified the ancestral state, we then traversed the alignment
and counted the number of occurrences of each of the 64 nucleotide triplets
in the ancestral gene, as well as the number of instances in which the
pseudogene carried a mutation at the central nucleotide of a triplet.
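A minimal sketch of this counting step follows. It assumes the ancestral sequence is given as a list whose entries are a base or `None` for excluded positions, and that a triplet is skipped whenever any of its three ancestral positions is excluded; the text does not spell out how such boundary cases were handled, so that rule and the function name are assumptions.

```python
from collections import Counter


def count_triplets(ancestor, pseudogene):
    """Count ancestral triplets and mutations at their central base.

    ancestor   -- list of bases ("A", "C", "G", "T") or None for excluded sites
    pseudogene -- aligned pseudogene sequence of the same length
    Returns (totals, mutated), two Counters keyed by triplet string.
    """
    totals, mutated = Counter(), Counter()
    for pos in range(1, len(ancestor) - 1):
        window = ancestor[pos - 1 : pos + 2]
        # Assumed rule: skip triplets touching an excluded position.
        if any(base is None for base in window):
            continue
        triplet = "".join(window)
        totals[triplet] += 1
        # A mutation is a pseudogene base differing from the ancestral one.
        if pseudogene[pos] != ancestor[pos]:
            mutated[triplet] += 1
    return totals, mutated
```

For example, with ancestor `ACGTA` and pseudogene `ACTTA`, the triplets ACG, CGT, and GTA are each counted once and a single mutation is recorded for CGT.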
A given number of mutations in a triplet in a given pseudogene is the
result of its intrinsic propensity to mutate as well as the divergence time
between the gene and the pseudogene. A pseudogene may have a high
mutation count because it contains highly mutable triplets or because it is
very old. To account for these factors, we determined the relative age of the
genes and adjusted the total triplet count in each pseudogene by the relative
age of the pseudogene (see below).
total counts provide more reliable estimates of the underlying binomial
parameter and must be weighted more heavily than those with few total
counts. See Appendix for the formula defining the estimators.
We carried out the hypothesis testing on these estimators by randomly
permuting the triplet labels on one of the sets in the paired data and reporting (as p) the quantile of the real estimated correlation coefficient
among the estimators obtained using the permuted data.
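The permutation procedure can be illustrated as follows. This sketch substitutes an ordinary unweighted Pearson coefficient for the weighted estimators defined in the Appendix, and it reads "quantile" as the one-sided fraction of permuted coefficients at least as large as the observed one; both choices, along with the function names and the fixed seed, are assumptions made for illustration.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation (stand-in for the paper's estimators)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(x, y, n_permutations=10000, seed=0):
    """Permute the labels of y and locate the observed r among the results.

    Returns (observed_r, p), where p is the fraction of permuted
    coefficients that are at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = pearson(x, y)
    shuffled = list(y)
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks the pairing between x and y
        if pearson(x, shuffled) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / n_permutations
```

Here `x` and `y` would hold the paired mutability estimates, one entry per triplet, so shuffling `y` corresponds to permuting the triplet labels on one of the two sets.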
Table II. Mutability ratio of complementary nucleotides
Data Set | Nucleotides | $\mu_X/\mu_{\bar{X}}$ (p)
Somatic | A/T | 1.86 ($<10^{-4}$)
Somatic | G/C | 1.22 (0.01)
Meiotic | A/T | 1.02 (0.80)
Meiotic | G/C | 1.02 (0.78)
Results
Mutation targeting: complementation symmetry
sequences from different sources and for consistent pooling of data from
diverse sources. We denote the total nucleotide count in class (i, j, k) by $n_{ijk}$
and the number of mutations among those by $m_{ijk}$. Our analyses are based
on the likelihood model given by
\[ \log \mathcal{L}(\theta, \mu \mid m, n) = \sum_{ijk} \left( m_{ijk} \log \theta_{ij}\mu_{ik} - n_{ijk}\,\theta_{ij}\mu_{ik} \right). \qquad (2) \]
FIGURE 1. Mutability scatter plots: comparison of the estimated mutability under somatic hypermutation between nucleotide triplets (XYZ) and their
complements ($\bar{X}\bar{Y}\bar{Z}$). The dashed line is the principal line. A, Triplets of the form XAZ, with A mutating compared to their complements $\bar{Z}T\bar{X}$ with T mutating.
For visual clarity, points are labeled with the XAZ member of the pair only. B, Triplets of the form XGZ, with G mutating compared to their complements
$\bar{Z}C\bar{X}$ with C mutating. The correlation apparent here was tested using the method described in Materials and Methods (see Table III).
Specific hypotheses within the context of this model are expressed as constraints on the mutabilities $\mu$. For example, the hypothesis that the mutabilities under meiotic and somatic processes are the same is expressed as
$\mu_{1k} = \mu_{2k}$. The parameters $\theta$ and $\mu$ were estimated by maximizing the log
likelihood, Eq. 2, subject to the constraints for the hypothesis under consideration and to the identifiability constraint on $\theta$: $\sum_{jk}\theta_{ij}n_{ijk} = \sum_{jk}n_{ijk}$, for
both i. This constraint ensures that the mean “time of exposure” is normalized between sets.
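For a single mutational set with no constraints linking it to the other set, setting the derivatives of the log likelihood to zero gives $\mu_k = \sum_j m_{jk} / \sum_j \theta_j n_{jk}$ and $\theta_j = \sum_k m_{jk} / \sum_k n_{jk}\mu_k$. The sketch below alternates these two updates and rescales to satisfy the identifiability constraint. The text does not say which optimizer was used, so this alternating scheme is an assumption, as are the function name and the requirement that every motif occurs and every sequence has a positive expected mutation count.

```python
def fit_mutabilities(m, n, n_iterations=200):
    """Maximize sum_jk (m_jk log(theta_j mu_k) - n_jk theta_j mu_k).

    m[j][k] -- mutation count for sequence j and motif k
    n[j][k] -- total nucleotide count for sequence j and motif k
    Returns (theta, mu) with sum_jk theta_j n_jk == sum_jk n_jk.
    """
    n_seq, n_motif = len(n), len(n[0])
    total_sites = sum(sum(row) for row in n)
    theta = [1.0] * n_seq
    mu = [0.0] * n_motif
    for _ in range(n_iterations):
        # Update mutabilities given the current exposure times.
        for k in range(n_motif):
            exposure = sum(theta[j] * n[j][k] for j in range(n_seq))
            mu[k] = sum(m[j][k] for j in range(n_seq)) / exposure
        # Update exposure times given the current mutabilities.
        for j in range(n_seq):
            expected = sum(n[j][k] * mu[k] for k in range(n_motif))
            theta[j] = sum(m[j]) / expected
        # Rescale so the identifiability constraint holds; dividing mu by
        # the same factor leaves every product theta_j * mu_k unchanged.
        scale = total_sites / sum(theta[j] * sum(n[j]) for j in range(n_seq))
        theta = [t * scale for t in theta]
        mu = [u / scale for u in mu]
    return theta, mu
```

A hypothesis such as $\mu_{1k} = \mu_{2k}$ would be handled by pooling the numerator and denominator of the mutability update across both sets, while keeping a separate normalization of $\theta$ within each set.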
Analyses using contingency tables or correlation tests (where counts
over all sequences in a set are needed) were performed using pooled
counts derived from the likelihood model and adjusted as follows. The
total counts (mutated plus unmutated) for each motif, denoted $\tilde{n}_{i\cdot k}$, are
adjusted for consistent estimation: $\tilde{n}_{i\cdot k} = \sum_j \hat{\theta}_{ij} n_{ijk}$, where $\hat{\theta}_{ij}$ is the maximum likelihood estimate for the effective time of exposure, $\theta_{ij}$.
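As a small illustration of the adjustment, the following sketch computes the adjusted totals for one mutational set from per-sequence counts and exposure estimates. The function name and the nested-list layout of the inputs are assumptions.

```python
def adjusted_pooled_counts(theta_hat, n):
    """Return the adjusted total count per motif for one mutational set.

    theta_hat[j] -- estimated effective exposure time of sequence j
    n[j][k]      -- total nucleotide count for sequence j and motif k
    """
    n_motif = len(n[0])
    return [
        sum(theta_hat[j] * n[j][k] for j in range(len(n)))
        for k in range(n_motif)
    ]
```

With exposure estimates of 0.5 and 1.5 for two sequences that each contain 100 copies of a motif, the adjusted total is 0.5 × 100 + 1.5 × 100 = 200, the same as the raw total because the exposures average to one; the adjustment matters when older sequences carry a different motif composition than younger ones.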
We applied correlation tests designed to infer the correlation coefficient
among the binomial parameters (proportions or probabilities) that underlie
our count data. The data themselves also have binomial sampling variability, which is not correlated. Therefore, the task is somewhat more complicated than an ordinary (Pearson) correlation test, which, in addition,
assumes normality and equality of variances. We have used two types of
estimators: those that are designed to diminish the bias induced by the
presence of binomial sampling by accounting for the excess variance and
those that do not make this correction. The results of hypothesis testing,
where the null hypothesis is that the correlation coefficient is zero, do not
depend on this choice, but the numerical value of the estimated correlation
coefficient does. All estimators use the fact that the triplets with greater
To investigate the presence of strand bias in the mechanisms responsible for introducing mutations, we compared the mutabilities
of motifs with those of their complements. The first-order model,
in which mutability depends on the identity of the base itself but
not on its neighbors, shows that the somatic set is highly asymmetric, with mutability at A almost twice that at T (Table II). The
G:C ratio is not nearly as high as that for A:T but is also significantly different from 1. The meiotic set does not show any evidence of complementation asymmetry. This result holds even
when we exclude from the computation the sites that span CG
dinucleotides.
For the model in which both neighbors influence the mutability,
we performed correlation tests comparing the mutability of a given
triplet with that of its complement. Triplets were classified according to their central bases and analyzed separately to remove the
clear asymmetry of the single-nucleotide rates.
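The pairing of a triplet with its complement, as used in these tests, is the reverse complement: XAZ on one strand reads $\bar{Z}T\bar{X}$ on the other. A short sketch, with hypothetical function names, that enumerates the 16 pairs for a given central base:

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(triplet):
    """Return the triplet read 5' to 3' on the opposite strand."""
    return "".join(COMPLEMENT[base] for base in reversed(triplet))


def complement_pairs(central):
    """List (triplet, complement) pairs for all triplets with this central base."""
    pairs = []
    for five_prime in "ACGT":
        for three_prime in "ACGT":
            triplet = five_prime + central + three_prime
            pairs.append((triplet, reverse_complement(triplet)))
    return pairs
```

For instance, `reverse_complement("AGC")` gives `"GCT"`, which is the hot-spot pair discussed in the text.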
We find that the correlations between triplets and their complements are extremely high under SH (Fig. 1) but not meiotic mutation. Tests of the correlation coefficient bear this out (Table III).
Note, however, that if we include triplets spanning CG dinucleotides in the calculation of correlation coefficients for the germline
set, we obtain a significant correlation for this set as well. We
obtain similar results when we account for the binomial variance,
although the values of the correlation coefficients are (as expected)
higher: r = 0.83 (p < $10^{-4}$) for the somatic set with AGC/GCT
excluded, and r = 0.74 (p = 0.12) for the meiotic set with CG
dinucleotide-containing triplets excluded. The correlation becomes
significant for the meiotic set as well if we include these motifs.
Table III. Symmetry of microsequence specificity of mutation targeting: linear correlation coefficients for triplet-complement mutability pairs
Data Set | Nucleotides | $r_S$ (p)
Somatic | A/T | 0.87 ($<10^{-4}$)
Somatic | G/C^a | 0.87 (0.01)
Somatic | G/C^b | 0.60 (0.03)
Meiotic | A/T | 0.39 (0.15)
Meiotic | G/C^a | 0.99 ($10^{-4}$)
Meiotic | G/C^c | 0.34 (0.29)
^a Complete set.
^b AGC/GCT and CGN/NCG excluded.
^c CGN/NCG excluded.
Mutation targeting: meiotic/somatic comparison
Mutation spectrum: complementation symmetry
We tested the complementation symmetry of the mutation spectrum conditioned only on the identity of the mutating base. For
both the somatic and meiotic data, we constructed 2 × 2 × 3
contingency tables with mutating base classified as purine/pyrimidine and weak/strong, and resulting nucleotide as the transition
partner, complement or transition partner’s complement (31), and
tested for independence of the purine/pyrimidine classification and
the resulting nucleotide (complementation symmetry). Both χ²
tests failed to provide any evidence for departures from complementation symmetry (meiotic: χ² = 7.53, if we do not include
mutations at CG dinucleotides and 8.20 if we do; somatic: χ² =
6.14; none of these values is significant at the 0.05 level).
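The three-way classification behind these contingency tables can be made concrete with a short sketch. It tabulates mutations by purine/pyrimidine status and weak/strong status of the mutating base and by the class of the resulting nucleotide; the function names and the input format (a list of original/mutated base pairs) are assumptions, and the sketch stops at the table of counts rather than reproducing the test itself.

```python
from collections import Counter

TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def base_classes(base):
    """Classify a base as purine/pyrimidine and weak/strong."""
    ring = "purine" if base in "AG" else "pyrimidine"
    strength = "weak" if base in "AT" else "strong"
    return ring, strength


def classify_mutation(original, mutated):
    """Name the resulting nucleotide relative to the mutating base."""
    if mutated == TRANSITION[original]:
        return "transition"
    if mutated == COMPLEMENT[original]:
        return "complement"
    if mutated == COMPLEMENT[TRANSITION[original]]:
        return "complement of transition"
    raise ValueError("not a substitution: %s -> %s" % (original, mutated))


def tabulate_spectrum(mutations):
    """Build the 2 x 2 x 3 table of counts from (original, mutated) pairs."""
    table = Counter()
    for original, mutated in mutations:
        ring, strength = base_classes(original)
        table[(ring, strength, classify_mutation(original, mutated))] += 1
    return table
```

Expressing the outcome relative to the mutating base is what allows mutations at A to be compared directly with those at T, and mutations at G with those at C.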
The microsequence dependence of the spectrum under somatic
hypermutation is symmetric: the estimated common correlation
FIGURE 2. Strength of deviation from equality of triplet mutability between somatic and meiotic sets. Triplets with A/T mutating are shown in A; triplets
with G/C mutating are shown in B. Contributions to the log-likelihood difference from individual triplet motifs are drawn upward when the mutability is
higher in the somatically mutated set and downward when the mutability is higher in the meiotically mutated set.
To compare the microsequence mutability patterns in meiotic and
somatic processes, we computed the log-likelihood differences between two models: one is the fully parameterized model in which
the mutability for each triplet in each of the two sets is separately
estimated, for a total of 128 mutability parameters (plus age parameters; see Materials and Methods). In the second model, all
triplet mutabilities are assumed to be identical between the somatic
and meiotic sets. The age parameters are still assumed independent
and take up any differences in overall mutation rate.
Each nucleotide triplet contributes a term to the log-likelihood
difference; the larger the term, the more poorly the assumption of
equality between somatic and meiotic data sets accommodates that
triplet (Fig. 2). We find that almost three-fourths of the log-likelihood difference is due to the following triplets (or motifs): triplets
containing CG dinucleotides, AGC, and its complement GCT, and
triplets of the form WAN, where W is T or A, N is any nucleotide.
We estimated the contributions of each of these classes by amending the model to recognize the appropriate number of triplet
classes. For example, to estimate the contribution of CG dinucleotides, the amended model recognizes two classes of triplets: those
containing CG dinucleotides and those that do not. All of the triplets within a class are constrained to have the same ratio of somatic
mutability to meiotic mutability. Each of the above classes therefore uses 1 df. The increase in log likelihood produced by the serial
inclusion of each of these classes is: NCG/CGN, 115.5; AGC/
GCT, 40.8; WAN, 49.7, out of a total likelihood difference (largest
minus smallest model) of 291.3 (63 df). In sum, these 3 df (of 63)
account for 206 of the total 291.3 log-likelihood difference.
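The totals quoted above can be checked directly from the three reported increments; the only quantity added here is the resulting fraction, rounded to two decimals:

```latex
\[
115.5 + 40.8 + 49.7 = 206.0,
\qquad
\frac{206.0}{291.3} \approx 0.71 .
\]
```

Three constrained ratios therefore capture roughly 71% of the log-likelihood difference that the full model achieves with 63 df, which is the basis for the "almost three-fourths" statement.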
Scatterplot comparisons of triplet mutabilities between somatic
and meiotic data sets largely corroborate these results and provide
additional insights; the correlation between somatic and meiotic
mutabilities stands out quite clearly (Fig. 3). For central nucleotide
A, when triplets are grouped as above with WAN and SAN (S =
G or C), the within-groups correlation stands out strongly. The
observed patterns are further confirmed by computation of the linear correlation coefficients (Table IV). These were performed both
for the complete data sets and as modified by the above considerations to remove the effects of those triplets that are clearly involved in processes unique to one set or the other and without
taking into account binomial sampling variance. If we account for
the binomial sampling, the estimated correlation coefficients become higher, but the p values are similar: r = 0.73 (0.0004) and
r = 0.55 (0.01), depending on whether we do or we do not divide
the triplets with A as the central nucleotide into WAN and SAN
classes. Inspection of Fig. 3 also reveals that, consistent with our
findings of complementation symmetry, the triplets NTW, complementary to WAN, also have mutability higher than the triplets
NTS. The effect is not as marked for T as it is for A, but this may
be due to the smaller number of mutations at T.
coefficient for the rate of transitions and of transversions to the
complement of the mutating base between a triplet and its complement is r = 0.43 (p = 0.001). This result also holds if we do
not include the triplets that span CG dinucleotides; these triplets
are extremely rare and their mutation counts are also very low. For
the meiotic set, the estimated correlation coefficient with CG
dinucleotides excluded is r = 0.23 (p = 0.12). Similar to what we
observed in mutational targeting, if we include CG dinucleotides,
the spectrum becomes symmetric in the meiotic case as well (r =
0.36, p = 0.003).
Mutation spectrum: meiotic/somatic comparison
When represented in terms relative to the mutating base, the mutation spectrum is strikingly consistent regardless of which base is
mutating, for both meiotic and somatic processes (Fig. 4). The
spectra are not the same between somatic and meiotic processes,
however (Fig. 4). A direct test of the spectrum conditional on the
mutating base only shows very strong differences between meiotic
and somatic mutation (χ² = 14.42 (A), 35.68 (G), 22.02 (T), and
7.82 (C); with the exception of C, all values are significant at
the 0.01 level).
The correlation coefficient between somatic and meiotic sets
(computed as the combined triplet correlations as above for the
symmetry tests) is not significantly different from zero (r = −0.03,
p = 0.78).
Discussion
We compared the characteristics of mutations introduced by SH to
those introduced meiotically. To ensure that observed characteristics are due to the mutational process itself, we have minimized the
effects of selection by choosing, where possible, DNA sequences
that are not subject to selection. The SH data are from nonproductively rearranged Ig V genes and from introns flanking rearranged
V genes. For the meiotic mutations, we have used processed pseudogenes. For these, we do not completely eliminate selection, since
there is uncertainty in assignment of observed nucleotide differences to the pseudogene or its ortholog. We attempt to minimize
this uncertainty by considering the state of each nucleotide site in
a second ortholog, from a species other than human.
A marked asymmetry between the mutability under SH of thymidine and that of adenine has been noted previously and taken as
evidence for strand bias of the hypermutation mechanism (42). We
also find a higher mutability at A than at T and that this asymmetry
is much greater than any singlet asymmetry under meiotic mutation. But we also find that when this overall mutability difference
is factored out, the microsequence specificity at A is very similar
to that at T (Fig. 1 and Table III). Similar findings have been
reported (23, 24) and used to justify the conclusion that both
strands are targeted by SH and that two mechanisms, one strand-unbiased mutating G and C and the other strand-biased acting on
A and T, operate. We find, however, that the triplet mutabilities are
surprisingly complementation symmetric for both A/T and G/C
mutations. In fact, once the single-nucleotide mutabilities have
been taken into account, the triplet symmetry is evident for SH.
Whether the triplet symmetry appears in meiotic mutation depends strongly
on whether the triplets that span CG dinucleotides are included in
the calculation of the correlation coefficient. Thus, although we
also conclude that there are two distinct components of SH targeting, we find that they share similar strand symmetry.
FIGURE 3. Mutability scatter plots: comparison
of the estimated mutability between triplets in the
somatically mutated data set and those in the meiotically mutated data set. A, Triplets with A mutating; triplets with G, T, and C mutating are shown in
B, C, and D, respectively. The solid line in each
panel is the principal line. In A, the dashed lines
correspond to principal lines constructed independently for two groups of triplets: those of the form
WAN and those of the form SAN (see text). Correlation coefficients are shown in Table IV. The error bars give the SE due to binomial sampling.
Table IV. Similarity between microsequence dependence of mutation targeting under meiotic and somatic processes: linear correlation coefficients for triplet mutabilities^a
 | r (p)
Common^{b,c} | 0.46 ($<10^{-3}$)
Common^c | 0.36 (0.01)
A | 0.23 (0.41)
A^b | 0.49 (0.06)
G^c | 0.52 (0.10)
G | 0.03 (0.91)
T | 0.36 (0.16)
C^c | 0.32 (0.33)
C | −0.13 (0.68)
^a Common indicates that r is the common correlation coefficient for all four nucleotides.
^b A divided into two classes: WAN and SAN. r is the common correlation coefficient for the two groups.
^c AGC/GCT and CGN/NCG removed.
FIGURE 4. Comparison of the mutation spectra under somatic and meiotic mutation. Pie charts show the proportion of mutations to each of the
corresponding bases. Colors indicate the biochemical relationship of the
mutating nucleotide and the nucleotide resulting from the mutation: dark
gray, transitions; medium gray, complements; light gray, complement of
the transition.
being advantageous whereas mutations under meiotic mutation
presumably are merely unavoidable.
We have previously shown that the mutation spectrum under SH
is microsequence dependent: what a nucleotide mutates to is influenced by what its neighbors are (25). We compared this spectrum to that previously inferred from a set of meiotic mutations
and found no correlations. That meiotic data set, however, combined information from triplets and their complements; furthermore, the mutations were inferred by a somewhat different process
than the one we use here. The more comprehensive comparison
here confirms the previous result: although there are significant
effects of neighboring nucleotides on the mutation spectrum in
both meiotic and somatic processes, the triplet dependencies are
uncorrelated.
The following model is consistent with the findings thus far,
though it is certainly not uniquely so. An initial lesion is created in
the dsDNA. The targeting at this point is symmetric: sense strand
XAZ is affected just as frequently as sense strand $\bar{Z}T\bar{X}$. This occurs
naturally if the lesion is a double-strand break, consistent with the
findings of Sale and Neuberger (8). In fact, the complementation
symmetry of targeting even suggests a staggered cut. In a blunt cut,
the complementary nucleotides are not in equivalent states: one is
3′ of the break and the other is 5′ of it. A staggered cut that also
breaks the base pairing leaves the two nucleotides both 5′ or both
3′ of the break, though now on opposite sides of it. Furthermore,
both are unpaired and overhanging. Note that the apparent
strand asymmetry can now be viewed as the asymmetry between
the DNA 5′ and 3′ of the break. The probability that religation
is mutagenic now depends on which side of the break the purine is
on, with the probability of mutagenic repair higher if the purine is
on the plus strand. This would result if, for example, purines are
more susceptible to excision when overhanging and gaps in the
plus strand (or 5′ of the double-stranded break) are less likely to be
repaired correctly.
Several studies have found reduced mutation rates in mismatch
repair-deficient mice (11, 14, 16) and relative enhancement of mutations at the AGC/GCT hot spots (16) or at G and C bases (13,
15). Rada et al. (16) inferred from this observation that the mutator
has two components, one that is dependent on the mismatch repair
protein MSH-2 and another that is MSH-2 independent. We concur
and suggest that MSH-2 is responsible for introducing lesions as
described above and leaves the signature of catalytically enhanced
meiotic mutation. A second component, as yet unidentified, is targeted specifically at AGC/GCT triplets or at the palindromic quadruplet AGCT (L. G. Cowell and T. B. Kepler, manuscript in
With certain well-defined exceptions, the sequence specificities of
mutational targeting underlying meiotic and somatic mutations are
significantly correlated. This is quite remarkable since the time
scales over which these changes have accrued differ by about 7
orders of magnitude (about 1 mo for SH and on the order of a
million years for meiotic mutation). This would be expected if
mutations under SH are introduced by catalytic enhancement of
the processes responsible for meiotic mutations. Thus, if a major
proportion of mutations introduced during evolution occur at
strand breaks, then SH hastens the introduction of these breaks, but
they are introduced in the same places. In this sense, the reaction
resembles true catalysis.
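The difference in time scales quoted above can be checked with a short calculation. This is a sketch that takes the approximate figures from the text (about 1 month and about a million years) at face value; the variable names are ours.

```python
import math

# Approximate time scales quoted in the text.
somatic_months = 1.0          # about 1 month for SH
meiotic_months = 1e6 * 12     # on the order of a million years, in months

# Ratio of the two time scales expressed in orders of magnitude.
orders = math.log10(meiotic_months / somatic_months)
print(f"about {orders:.1f} orders of magnitude")
```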
The differences in the triplet mutabilities between somatic and
meiotic mutation are largely attributable to three effects: 1) The
mutability of triplets containing CG dinucleotides is much higher
under meiotic mutation than under SH. The mutability of CG
dinucleotides is a well-understood consequence of the methylation
of such dinucleotides (43). This excess mutability has been seen in
studies of pseudogene-ortholog pairs (29) and in surveys of genetic
lesions associated with human genetic disease (44). 2) The mutability of the triplet AGC and its complement GCT is considerably
higher under SH than under meiotic mutation. This is the well-known serine hot spot (27, 36). 3) The mutability of triplets of the
form WAN is higher under SH. The mutabilities of the triplets
within each of the two subsets (WAN, SAN) are correlated with
those in the meiotic data set. Although the pattern is weaker for T
mutating, the complementary triplets NTW also segregate at
higher mutability from the triplets NTS and both sets are correlated
with the meiotic mutabilities. The overarching similarities between
somatic and meiotic mutation targeting, punctuated sharply by specific differences, suggest that two components are involved in the
targeting: a “background” mechanism that has recruited and modified components of the DNA repair machinery, and a mechanism,
perhaps novel, specific to AGC/GCT triplets (see below).
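The triplet classes named in the three effects above can be enumerated with a small sketch. It assumes the standard IUPAC ambiguity codes (W = A or T, S = G or C, N = any base); the function names `matches` and `classify` and the label strings are hypothetical and serve only to make the groupings explicit.

```python
from itertools import product

# Standard IUPAC ambiguity codes used in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "AT", "S": "CG", "N": "ACGT"}

def matches(triplet: str, pattern: str) -> bool:
    """True if every base of the triplet is allowed by the pattern code."""
    return all(base in IUPAC[code] for base, code in zip(triplet, pattern))

def classify(triplet: str) -> list:
    """Return the labels (as used in the text) that apply to a triplet."""
    labels = []
    if "CG" in triplet:
        labels.append("CG-containing")
    if triplet in ("AGC", "GCT"):
        labels.append("AGC/GCT hot spot")
    for pattern in ("WAN", "SAN", "NTW", "NTS"):
        if matches(triplet, pattern):
            labels.append(pattern)
    return labels

# Enumerate all 64 triplets and pull out one class as an example.
triplets = ["".join(p) for p in product("ACGT", repeat=3)]
wan = [t for t in triplets if matches(t, "WAN")]
print(len(wan), wan)
```

Each of the patterns WAN, SAN, NTW, and NTS covers 8 of the 64 triplets, so the comparisons within each subset rest on a small number of motifs.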
We also investigated the relationships between the mutation
spectra under somatic and meiotic mutation. It was previously suggested that the two processes may be related because both result in
an excess of transitions over transversions (22). We find, however,
that the proportion of transitions is significantly smaller under SH.
The effect of this is that the rate of replacement mutations is higher
under SH and, consequently, so is the net rate of diversification.
Both of these effects are consistent with diversification under SH
preparation), which contains both triplet motifs, and introduces
lesions preferentially at these sites. One candidate for the unknown
molecule is a modified site-specific methylase. Other groups have
hypothesized the presence of a two-component mutator (21–24),
consistent with the observation that G and C are mutated more
frequently in the murine cell line 18-81 (26) and the Burkitt lymphoma line Ramos (8). Furthermore, the G·C-targeting component is argued to have arisen first (or been co-opted first by SH)
(22), consistent with the observations that AGC/GCT or G and C
are preferentially targeted in shark (46) and Xenopus (45).
The identity of the molecules involved in somatic hypermutation will surely be revealed soon, but even after their names are
known, it will remain to learn how they do what they do. For this
task, careful analysis of the mutation patterns will be essential.
hypermutation in which targeting of the mutator is linked to the direction of DNA
replication. EMBO J. 10:4331.
6. Gearhart, P. J., and D. F. Bogenhagen. 1983. Clusters of point mutations are found exclusively around rearranged antibody variable genes. Proc. Natl. Acad. Sci. USA 80:3439.
7. Peters, A., and U. Storb. 1996. Somatic hypermutation of immunoglobulin genes is linked to transcription initiation. Immunity 4:57.
8. Sale, J. E., and M. Neuberger. 1998. TdT-accessible breaks are scattered over the immunoglobulin V domain in a constitutively hypermutating B cell line. Immunity 9:859.
9. Shen, H., D. Cheo, E. Friedberg, and U. Storb. 1997. The inactivation of the xpc gene does not affect somatic hypermutation or class switch recombination of immunoglobulin genes. Mol. Immunol. 34:527.
10. Zheng, B., S. Han, E. Spanopoulou, and G. Kelsoe. 1998. Immunoglobulin gene hypermutation in germinal centers is independent of the RAG-1 V(D)J recombinase. Immunol. Rev. 162:133.
11. Winter, D., Q. Phung, A. Umar, S. Baker, R. Tarone, K. Tanaka, R. Liskay, T. Kunkel, V. Bohr, and P. Gearhart. 1998. Altered spectra of hypermutation in antibodies from mice deficient for the DNA mismatch repair protein PMS2. Proc. Natl. Acad. Sci. USA 95:6953.
12. Frey, S., B. Bertocci, F. Delbos, L. Quint, J.-C. Weill, and C.-A. Raynaud. 1998. Mismatch repair deficiency interferes with the accumulation of mutations in chronically stimulated B cells and not with the hypermutation process. Immunity 9:127.
13. Jacobs, H., Y. Fukita, G. van der Horst, J. de Boer, G. Weeda, J. Essers, N. de Wind, B. Engelward, L. Samson, S. Verbeek, et al. 1998. Hypermutation of immunoglobulin genes in memory B cells of DNA repair-deficient mice. J. Exp. Med. 187:1735.
14. Cascalho, M., J. Wong, C. Steinberg, and M. Wabl. 1998. Mismatch repair co-opted by hypermutation. Science 279:1207.
15. Phung, Q., D. Winter, A. Cranston, R. Tarone, V. Bohr, R. Fishel, and P. Gearhart. 1998. Increased hypermutation at G and C nucleotides in immunoglobulin variable genes from mice deficient in the MSH2 mismatch repair protein. J. Exp. Med. 187:1745.
16. Rada, C., M. R. Ehrenstein, M. Neuberger, and C. Milstein. 1998. Hot spot focusing of somatic hypermutation in MSH2-deficient mice suggests two stages of mutational targeting. Immunity 9:135.
17. Sack, S., Y. Liu, J. Germain, and N. Green. 1998. Somatic hypermutation of immunoglobulin genes is independent of the Bloom’s syndrome DNA helicase. Clin. Exp. Immunol. 112:248.
18. Wagner, S., and M. Neuberger. 1996. Somatic hypermutation of immunoglobulin genes. Annu. Rev. Immunol. 14:441.
19. Kim, N., K. Kage, F. Matsuda, M. Lefranc, and U. Storb. 1997. B lymphocytes of xeroderma pigmentosum or Cockayne syndrome patients with inherited defects in nucleotide excision repair are fully capable of somatic hypermutation of immunoglobulin genes. J. Exp. Med. 186:413.
20. Harris, R. S., Q. Kong, and N. Maizels. 1999. Somatic hypermutation and the three R’s: repair, replication and recombination. Mutat. Res. 436:157.
21. Spencer, J., M. Dunn, and D. Dunn-Walters. 1999. Characteristics of sequences around individual nucleotide substitutions in Igvh genes suggest different GC and AT mutators. J. Immunol. 162:6596.
22. Diaz, M., J. Velez, M. Singh, J. Cerny, and M. Flajnik. 1999. Mutational pattern in the nurse shark antigen receptor gene (NAR) is similar to mammalian Ig and to spontaneous mutations in evolution: the translesion synthesis model of somatic hypermutation. Int. Immunol. 11:825.
23. Dörner, T., S. Foster, N. Farner, and P. Lipsky. 1998. Somatic hypermutation of human immunoglobulin heavy chain genes: targeting of RGYW motifs on both DNA strands. Eur. J. Immunol. 28:3384.
24. Milstein, C., M. Neuberger, and R. Staden. 1998. Both DNA strands of antibody genes are hypermutation targets. Proc. Natl. Acad. Sci. USA 95:8791.
25. Cowell, L., and T. Kepler. 1999. The nucleotide-replacement spectrum under somatic hypermutation exhibits microsequence dependence that is strand-symmetric and distinct from that under germline mutation. J. Immunol. 164:1971.
26. Bachl, J., and M. Wabl. 1996. An immunoglobulin mutator that targets G.C base pairs. Proc. Natl. Acad. Sci. USA 93:851.
27. Rogozin, I., and N. Kolchanov. 1992. Somatic hypermutagenesis in immunoglobulin genes. II. Influence of neighboring base sequences on mutagenesis. Biochim. Biophys. Acta 1171:11.
28. Bains, W. 1992. Local sequence dependence of rate of base replacement in mammals. Mutat. Res. 267:43.
29. Hess, S., J. Blake, and R. Blake. 1994. Wide variations in neighbor-dependent substitution rates. J. Mol. Biol. 236:1022.
30. Golding, G., P. Gearhart, and B. Glickman. 1987. Patterns of somatic mutations in immunoglobulin variable genes. Genetics 115:169.
31. Cowell, L., H. Kim, T. Humaljoki, C. Berek, and T. Kepler. 1999. Enhanced evolvability in immunoglobulin V genes under somatic hypermutation. J. Mol. Evol. 49:23.
32. Brezinschek, H. P., R. I. Brezinschek, and P. Lipsky. 1995. Analysis of the heavy chain repertoire of human peripheral B cells using single-cell polymerase chain reaction. J. Immunol. 155:190.
33. Weber, J. S., J. Berry, T. Manser, and J. L. Claflin. 1994. Mutations in Ig V(D)J genes are distributed asymmetrically and independently of the position of V(D)J. J. Immunol. 153:3594.
34. Weber, J. S., J. Berry, S. Litwin, and J. L. Claflin. 1991. Somatic hypermutation of the JC intron is markedly reduced in unrearranged κ and H alleles and is unevenly distributed in rearranged alleles. J. Immunol. 146:3218.
Acknowledgments
We thank Claudia Berek and Latham Claflin for sharing unpublished data.
Appendix
Estimator for the correlation coefficient among binomial
proportions
The model underlying the data analysis is that of two sets of mutabilities which are linearly correlated and which give rise to binomial (count) data. The task is to estimate the linear correlation
coefficient. The difficulty is that the binomial sampling variability
is independent (i.e., uncorrelated); it is only the indirectly observed
mutabilities that are correlated. The estimation is as follows.
The adjusted counts for each motif $k$ are designated by $n_{ik}$, where
$i = 1, 2$ is the group index (somatic or meiotic; triplet or complement) and $k$ designates the motif. Similarly, $m_{ik}$ denotes the number of mutated occurrences of motif $k$ in group $i$. For each of the
four nucleotides, the number of triplets is denoted by $K$. A dot in place of
a subscript denotes summation over that index (e.g., $n_{i\cdot} = \sum_k n_{ik}$).
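The estimators for the correlation coefficients are computed as:

$$\hat{\rho} = \frac{Q}{\nu\, s_1 s_2}, \tag{3}$$

where

$$Q = \sum_k \sqrt{n_{1k} n_{2k}} \left( \frac{m_{1k}}{n_{1k}} - \frac{m_{1\cdot}}{n_{1\cdot}} \right) \left( \frac{m_{2k}}{n_{2k}} - \frac{m_{2\cdot}}{n_{2\cdot}} \right), \tag{4}$$

$$s_i^2 = \frac{\displaystyle \sum_k n_{ik} \left( \frac{m_{ik}}{n_{ik}} - \frac{m_{i\cdot}}{n_{i\cdot}} \right)^2 - (K - 1)\, \frac{m_{i\cdot}}{n_{i\cdot}} \left( 1 - \frac{m_{i\cdot}}{n_{i\cdot}} \right)}{\displaystyle n_{i\cdot} - K + 1 - \frac{\sum_k n_{ik}^2}{n_{i\cdot}}}, \tag{5}$$

and

$$\nu = \left( \sum_k \sqrt{n_{1k} n_{2k}} \right) \left[ 1 + \frac{\sum_k n_{1k} n_{2k}}{n_{1\cdot}\, n_{2\cdot}} \right] - \sum_k \sqrt{n_{1k} n_{2k}} \left( \frac{n_{1k}}{n_{1\cdot}} + \frac{n_{2k}}{n_{2\cdot}} \right). \tag{6}$$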
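The estimator in Eqs. 3 to 6 can be written out as a short program. The following is a minimal sketch, not the authors' implementation: the function name `correlation_estimate`, its argument layout (one list of counts per group), and the example counts are all assumptions made for illustration, and every motif is assumed to have a nonzero count.

```python
import math

def correlation_estimate(n1, m1, n2, m2):
    """Estimate rho from motif counts (n) and mutated counts (m) in two groups.

    Hypothetical helper following Eqs. 3-6; assumes all n values are > 0.
    """
    K = len(n1)
    N1, N2 = sum(n1), sum(n2)                 # n_{1.} and n_{2.}
    p1, p2 = sum(m1) / N1, sum(m2) / N2       # pooled proportions m_{i.}/n_{i.}
    w = [math.sqrt(a * b) for a, b in zip(n1, n2)]  # sqrt(n_1k * n_2k)

    # Eq. 4: weighted cross-product of deviations from the pooled proportions.
    Q = sum(wk * (m1k / n1k - p1) * (m2k / n2k - p2)
            for wk, n1k, m1k, n2k, m2k in zip(w, n1, m1, n2, m2))

    # Eq. 5: between-motif variance with the binomial sampling part subtracted.
    def s_squared(n, m, N, p):
        num = (sum(nk * (mk / nk - p) ** 2 for nk, mk in zip(n, m))
               - (K - 1) * p * (1 - p))
        den = N - K + 1 - sum(nk ** 2 for nk in n) / N
        return num / den

    # Eq. 6: normalizing constant.
    nu = (sum(w) * (1 + sum(a * b for a, b in zip(n1, n2)) / (N1 * N2))
          - sum(wk * (a / N1 + b / N2) for wk, a, b in zip(w, n1, n2)))

    s1_sq = s_squared(n1, m1, N1, p1)
    s2_sq = s_squared(n2, m2, N2, p2)
    if s1_sq <= 0 or s2_sq <= 0:
        raise ValueError("variance estimate is not positive; rho is undefined")
    # Eq. 3
    return Q / (nu * math.sqrt(s1_sq * s2_sq))

# Made-up counts for illustration only (not data from the paper).
n = [1000, 1000, 1000, 1000]
rho_same = correlation_estimate(n, [100, 200, 300, 400], n, [100, 200, 300, 400])
rho_opposite = correlation_estimate(n, [100, 200, 300, 400], n, [400, 300, 200, 100])
print(rho_same, rho_opposite)
```

Because the binomial sampling variance is subtracted in Eq. 5, the estimate is not constrained to lie in [-1, 1]; in this sketch it is returned unclipped, and a non-positive variance estimate is reported as an error rather than handled silently.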
References
1. McKean, D. M., K. Huppi, M. Bell, L. Staudt, W. Gerhard, and M. Weigert.
1984. Generation of antibody diversity in the immune response of BALB/c mice
to influenza virus hemagglutinin. Proc. Natl. Acad. Sci. USA 81:3180.
2. Maizels, N. 1989. Might gene conversion be the mechanism of somatic hypermutation of mammalian Ig genes? Trends Genet. 5:4.
3. Steele, E., and J. Pollard. 1987. Hypothesis: somatic mutation by gene conversion
via the error prone DNA → RNA → DNA information loop. Mol. Immunol.
24:667.
4. Manser, T. 1990. The efficiency of antibody maturation; can the rate of B cell
division be limiting? Immunol. Today 11:305.
5. Rogerson, B., J. Hackett, A. Peters, D. Haasch, and U. Storb. 1991. Mutation
pattern of immunoglobulin transgenes is compatible with a model of somatic
35. Wu, P., and L. Claflin. 1999. Promoter-associated displacement of hypermutations. Int. Immunol. 10:1131.
36. Smith, D., G. Creadon, P. Jena, J. Portanova, B. Kotzin, and L. Wysocki. 1996.
Di- and trinucleotide target preferences in somatic mutagenesis in normal and
autoreactive B cells. J. Immunol. 156:2642.
37. Rickert, R., and S. Clarke. 1993. Low frequencies of somatic mutation in two
expressed Vκ genes: unequal distribution of mutation in 5′ and 3′ flanking regions. Int. Immunol. 5:255.
38. Weber, J. S., J. Berry, T. Manser, and J. L. Claflin. 1991. Position of the rearranged Vκ and its 5′ flanking sequences determines the location of somatic mutations in the Jκ locus. J. Immunol. 146:3652.
39. Ophir, R., T. Itoh, D. Graur, and T. Gojobori. 1999. A simple method for estimating the intensity of purifying selection in protein-coding genes. Mol. Biol.
Evol. 16:49.
40. Li, W. 1997. Molecular Evolution. Sinauer Associates, Inc., Sunderland, MA.
41. Li, W., C. Wu, and C. Luo. 1984. Nonrandomness of point mutation as reflected
in nucleotide substitutions in pseudogenes and its evolutionary implications.
J. Mol. Evol. 21:58.
42. Lebecque, S., and P. Gearhart. 1990. Boundaries of somatic mutation in rearranged immunoglobulin genes: 5′ boundary is near the promoter, and 3′ boundary is approximately 1 kb from V(D)J gene. J. Exp. Med. 172:1717.
43. Bird, A. 1980. DNA methylation and the frequency of CpG in animal DNA.
Nucleic Acids Res. 8:1499.
44. Cooper, D., and H. Youssoufian. 1988. The CpG dinucleotide and human genetic
disease. Hum. Genet. 78:151.
45. Wilson, M., E. Hsu, A. Marcuz, L. Courtet, L. Du Pasquier, and C. Steinberg.
1992. What limits affinity maturation of antibodies in Xenopus: the rate of somatic mutation or the ability to select mutants? EMBO J. 11:4337.
46. Hinds-Frey, K., H. Nishikata, R. Litman, and G. Litman. 1993. Somatic variation
precedes extensive diversification of germline sequences and combinatorial joining in the evolution of immunoglobulin heavy chain diversity. J. Exp. Med.
178:815.