Analysis of 6912 Unselected Somatic
Hypermutations in Human VDJ
Rearrangements Reveals Lack of Strand
Specificity and Correlation between Phase II
Substitution Rates and Distance to the
Nearest 3′ Activation-Induced Cytidine
Deaminase Target
Line Ohm-Laursen and Torben Barington
The Journal of Immunology is published twice each month by
The American Association of Immunologists, Inc.,
1451 Rockville Pike, Suite 650, Rockville, MD 20852
Copyright © 2007 by The American Association of
Immunologists All rights reserved.
Print ISSN: 0022-1767 Online ISSN: 1550-6606.
J Immunol 2007; 178:4322-4334; ;
doi: 10.4049/jimmunol.178.7.4322
http://www.jimmunol.org/content/178/7/4322
The Journal of Immunology
Analysis of 6912 Unselected Somatic Hypermutations in
Human VDJ Rearrangements Reveals Lack of Strand
Specificity and Correlation between Phase II Substitution
Rates and Distance to the Nearest 3′ Activation-Induced
Cytidine Deaminase Target1
Line Ohm-Laursen2 and Torben Barington3
Somatic hypermutations (SHMs)4 in the form of nucleotide
substitutions, insertions, and deletions are found throughout the variable regions of Ig rearrangements in postgerminal center B cells. SHM is followed by selection against unfavorable mutations and for high affinity binding to Ag, and the
process is necessary for the generation of high affinity Abs and B
cell memory.
SHM is dependent on several cis-acting elements, including the
promoter and the elements of the Ig enhancer regions (1–6). The
promoter and enhancer requirement is likely to be due to a requirement for transcription, although other properties of the enhancers, e.g., protein binding, are likely to be involved as well (6).
The transcription dependence is furthermore supported by the fact
that the mutation frequency decays exponentially from a starting
point ~150–200 bp from the promoter (7, 8). The 3′ boundary is
Department of Clinical Immunology, Odense University Hospital, Denmark
Received for publication July 13, 2006. Accepted for publication January 8, 2007.
The costs of publication of this article were defrayed in part by the payment of page
charges. This article must therefore be hereby marked advertisement in accordance
with 18 U.S.C. Section 1734 solely to indicate this fact.
1
This study was supported by Danish Medical Research Council Grant 22-01-0156.
2
Current address: University of Oxford, The Peter Medawar Building for Pathogen
Research, South Parks Road, Oxford, U.K.
3
Address correspondence and reprint requests to Prof. Torben Barington, Department
of Clinical Immunology, Odense University Hospital, 5000 Odense C, Denmark.
E-mail address: [email protected]
4
Abbreviations used in this paper: SHM, somatic hypermutation; AID, activation-induced cytidine deaminase; CSR, class switch recombination; EXO1, exonuclease I;
FR, framework region; MSH, MutS homolog; pol, DNA polymerase; UNG, uracil
DNA glycosylase.
Copyright © 2007 by The American Association of Immunologists, Inc. 0022-1767/07/$2.00
in the intron region downstream of the J genes and no mutations
are normally found in the constant domain. In contrast, the entire
variable domain is targeted by SHM. The CDRs have been described
as being more prone to mutation than the framework regions (FRs),
and this has been attributed to the presence of more RGYW hot spot
motifs (9, 10).
The only trans-acting factor described to be absolutely mandatory for SHM is activation-induced cytidine deaminase (AID).
AID is not only required for SHM but also for class switch recombination (CSR) and gene conversion (11–13). Ectopic expression of AID can turn on mutation and CSR in human hybridomas,
Escherichia coli, and murine fibroblasts (14–17), proving that it is
the only B cell-specific factor necessary for SHM. AID is a cytidine deaminase shown to be able to deaminate cytidine residues in
ssDNA, in particular in WRC motifs (18–20). It has also been
suggested that AID could be involved in SHM by modulation of
the mRNA of an involved protein (11).
According to the current model for SHM (21, 22), the process is
initiated by cytidine deamination by AID. The targeted sequence is
thought to be single stranded because of the ongoing transcription
from the Ig promoter. The generated uracil can either be replicated
over, generating a C to T transition in the sister cell (or G to A if
the transcribed strand is targeted) (phase I), or it can be removed.
Uracil DNA-glycosylase (UNG) and a complex of MutS homologs
(MSH) 2 and 6 (MSH2 and MSH6) have both been described as
being capable of uracil removal.
Deletion of or mutation in the murine and human UNG gene
changes the mutation pattern of G and C residues almost exclusively to transitions (21, 23, 24). MSH2/MSH6 deficiency leads to
impaired mutation of A and T residues (21, 25, 26), and UNG-MSH6 double knock-out mice have almost exclusively C to T and
The initial event of somatic hypermutation (SHM) is the deamination of cytidine residues by activation-induced cytidine deaminase
(AID). Deamination is followed by the replication over uracil and/or different error-prone repair events. We sequenced 659
nonproductive human IgH rearrangements (IGHV3-23*01) from blood B lymphocytes enriched for CD27-positive memory cells.
Analyses of 6,912 unique, unselected substitutions showed that in vivo hot and cold spots for the SHM of C and G residues
corresponded closely to the target preferences reported for AID in vitro. A detailed analysis of all possible four-nucleotide motifs
present on both strands of the VH gene showed significant correlations between the substitution frequencies in reverse complementary motifs, suggesting that the SHM machinery targets both strands equally well. An analysis of individual JH and D gene
segments showed that the substitution frequencies in the individual motifs were comparable to the frequencies found in the VH
gene. Interestingly, JH6-carrying sequences were less likely to undergo SHM (average 15.2 substitutions per VH region) than
sequences using JH4 (18.1 substitutions, p = 0.03). We also found that the substitution rates in G and T residues correlated
inversely with the distance to the nearest 3′ WRC AID hot spot motif on both the nontranscribed and transcribed strands. This
suggests that phase II SHM takes place 5′ of the initial AID deamination target and primarily targets T and G residues or,
alternatively, the corresponding A and C residues on the opposite strand. The Journal of Immunology, 2007, 178: 4322–4334.
motifs but, as for phase I C/G mutations, we find no sign of strand
specificity. Interestingly, the phase II substitution rates in T and G
showed an inverse correlation to the distance to the nearest 3′
WRC, indicating that phase II mutations predominantly occur in T
and G residues 5′ of the initial AID-deaminated cytidine residue
or, alternatively, in the corresponding A and C residues on the
opposite strand.
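The distance measure referred to above can be illustrated with a minimal sketch. It assumes that the distance is counted in nucleotides from a given position to the C residue of the closest WRC motif (W is A or T, R is A or G) lying 3′ of that position on the same strand; the function name and the toy sequence are hypothetical and are not taken from the study.

```python
def distance_to_nearest_3prime_wrc(seq):
    """For each position, distance to the C of the nearest downstream WRC.

    Returns None for positions with no WRC motif 3' of them.
    Illustrative definition only; the exact convention is an assumption.
    """
    # Index of the C residue in every WRC motif (W = A/T, R = A/G).
    targets = [
        i + 2
        for i in range(len(seq) - 2)
        if seq[i] in "AT" and seq[i + 1] in "AG" and seq[i + 2] == "C"
    ]
    distances = []
    for pos in range(len(seq)):
        downstream = [t - pos for t in targets if t > pos]
        distances.append(min(downstream) if downstream else None)
    return distances


toy_sequence = "TTAGCTT"  # contains a single WRC motif (AGC)
toy_distances = distance_to_nearest_3prime_wrc(toy_sequence)
```

Substitution rates per position could then be compared with these distances, separately for each strand, by applying the same function to the reverse complement.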
Materials and Methods
FIGURE 1. Number of substitutions in the VH region of 659 nonproductive rearrangements. Only the 386 sequences with more than three substitutions were used in the mutation analysis to reduce the influence of Taq errors.
FIGURE 2. Substitution rates in the different positions of the IGHV3-23*01 VH gene segment in 386 somatically hypermutated, nonproductive rearrangements. A total of 6,912 substitutions were analyzed. FR/CDR boundaries are indicated according to ImMunoGeneTics (IMGT) nomenclature.
G to A transitions, i.e., phase I mutations (27). A and T residues
are targeted during phase II of SHM. Phase II involves resolution
of the abasic site generated by uracil removal and has been suggested to involve repair by error-prone DNA polymerases such as
polymerase (pol) η and pol ζ and the involvement of exonuclease
I (EXO1) (28–35). Pol θ has been shown to be involved in phase
II C/G mutation (36). An alternative hypothesis suggests that incorporation of dUTP may be the cause of A/T mutations (22).
In this study we have analyzed 6,912 substitutions in 386 mutated, nonproductive human H chain rearrangements using IGHV3-23*01 and 542 substitutions in 56 mutated rearrangements using
the IGHV3-h pseudogene. This high number of nonproductive sequences enables us to study nonselected mutations with a good
statistical power. We show that mutations in C and G residues can
be assigned to AID deamination equally targeted to both strands.
The substitution rate depends on the motifs in which the nucleotide
is found and the recognized motifs are at least four nucleotides
long. The substitution rates in A and T residues also depend on the
A data set of 659 nonproductive and 5,670 productive rearrangements of IGHV3-23, predominantly rearranged to IGHJ4 and IGHJ6,
was collected and validated as described (37) (European Molecular Biology Laboratory (EMBL) accession nos. AM076988–AM083316). In brief,
100 ml of blood was collected from 28 healthy, adult volunteers after
informed consent (the study was approved by the Ethics Committee for
Funen and Vejle Counties, Denmark). Donors had been selected to be
homozygous for IGHV3-23*01 and IGHJ6*02, which are the most frequent genotypes in the Danish Caucasian population (genotype frequency
0.64 and 0.51, respectively) (see Ref. 38). DNA was purified from magnetic bead-isolated memory B cells (MACS B cell isolation kit II followed
by CD27-isolation (Miltenyi Biotec via Biotech Line)) using the QIAmp
blood DNA mini kit (Qiagen via VWR International). IgH rearrangements
were PCR amplified with 3-23cn9.F (5′-CTGAGCTGGCTTTTTTTCTTGTG-3′)
as the forward primer and either JH4.R (5′-GCCGCTGTTGCCTCAGG-3′)
or JH6.R1 (5′-CCCACAGGCAGTAGCAGAA-3′) as the
reverse primer. PCR products were cloned using the TOPO TA cloning kit
(Invitrogen Life Technologies), and plasmid DNA was purified using the
Wizard SV 9600 plasmid purification system (Promega via Ramcon). The
rearrangements were sequenced with the BigDye Terminator kit (Applied
Biosystems) with an ABI Prism 3100 genetic analyzer (Applied Biosystems). A set of 103 rearrangements using the IGHV3-h pseudogene was
generated in a similar way (37) (EMBL accession nos. AM282702–
AM282804). The CDR3 regions were analyzed for D genes, P and N nucleotides, and trimming using the JointMLc algorithm as described elsewhere (37). An online version of the algorithm can be found at
www.cbs.dtu.dk/services/VDJsolver. Statistical analyses were performed
using the Analyze-it addition to Microsoft Excel.
SHM OCCUR IN AND 5′ OF AID HOT SPOTS ON BOTH STRANDS
Table I. VH substitutions in 386 nonproductive IGHV3-23*01 sequences with more than three substitutions in the VH region (a)

From    To A           To T              To C           To G           Transitions     Transversions   Percentage mutated (%) (b)
A                      0.061 (424) (c)   0.052 (359)    0.118 (816)    0.118 (816)     0.113 (783)     6.8
T       0.046 (316)                      0.089 (615)    0.054 (375)    0.089 (615)     0.100 (691)     5.6
C       0.040 (273)    0.130 (895)                      0.093 (645)    0.130 (895)     0.133 (918)     6.7
G       0.166 (1148)   0.042 (290)       0.109 (756)                   0.166 (1148)    0.151 (1046)    6.1
Total                                                                  0.503 (3474)    0.497 (3438)    6.3

(a) The fraction of each type of substitution among all substitutions is given as well as the absolute numbers (parentheses). The overall transition/transversion ratio was 1.01. The rightmost column shows the substitution rates for each of the four nucleotides and the overall substitution rate.
(b) Substitution rates varied from 5.6% for T to 6.8% for A, which is statistically significant (p < 0.0001, χ2 test).
(c) Fraction of the given substitution of all substitutions.
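The totals in Table I can be rechecked with a few lines of code. The sketch below is only an illustration: the counts are keyed by (from, to) nucleotide as read from the table, and the variable and function names are hypothetical.

```python
# Substitution counts from Table I, keyed by (from, to) nucleotide.
counts = {
    ("A", "T"): 424, ("A", "C"): 359, ("A", "G"): 816,
    ("T", "A"): 316, ("T", "C"): 615, ("T", "G"): 375,
    ("C", "A"): 273, ("C", "T"): 895, ("C", "G"): 645,
    ("G", "A"): 1148, ("G", "T"): 290, ("G", "C"): 756,
}
PURINES = {"A", "G"}


def is_transition(src, dst):
    """A transition stays within purines (A, G) or within pyrimidines (C, T)."""
    return (src in PURINES) == (dst in PURINES)


transitions = sum(n for (s, d), n in counts.items() if is_transition(s, d))
transversions = sum(n for (s, d), n in counts.items() if not is_transition(s, d))
total = transitions + transversions
ratio = transitions / transversions
```

With these counts the transitions and transversions add up to the 6,912 substitutions analyzed, and the ratio rounds to the 1.01 stated in the table footnote.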
Results
FIGURE 3. Mutation rates given as the log transformed fraction of 386 nonproductive IGHV3-23*01 VH sequences (codons 1–107) carrying a specific substitution type (of three possible) in individual C (a), G (b), A (c), and T (d) residues. For clarity, the individual nucleotides have been placed in the order of declining total substitution rates along the x-axis. For all four nucleotides, transitions (light gray marks) predominate. However, for transversions two distinct patterns are seen. For C and G residues, transversions to the complementary nucleotides (black marks) are more common than transversions to the noncomplementary nucleotides (dark gray marks), while both types of transversions are equally frequent in A and T residues. *, The T residue in position three in codon 15 with a remarkably high level of T to G transversions.
0.0015 substitutions per nucleotide (37). Furthermore, 153 deletions and 70 insertions were found in the VH regions, accounting
for 2 and 1% of the total mutations, respectively. Insertions and
deletions will be described in detail elsewhere (T. Barington and L.
Ohm-Laursen, manuscript in preparation). A further 855 substitutions were found in the JH-regions and 213 in the D regions of
IGHV3-23-using sequences. The VH region of the IGHV3-h-using
sequences contained 542 substitutions in total.
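The upper bound on Taq-derived substitutions can be approximated as follows. This is a rough sketch under two assumptions that are not stated in this form in the text: every sequenced nucleotide carries the same polymerase error probability, and the overall observed substitution frequency of 6.3% per nucleotide is the appropriate denominator. Under these assumptions the expected fraction of observed substitutions that are Taq errors is simply the error rate divided by the substitution rate; the function name is hypothetical.

```python
def taq_error_fraction(error_rate, substitution_rate):
    """Expected fraction of observed substitutions attributable to Taq errors.

    Both arguments are per-nucleotide frequencies.
    """
    return error_rate / substitution_rate


OBSERVED_SUBSTITUTION_RATE = 0.063  # overall frequency in the VH region

low = taq_error_fraction(0.00048, OBSERVED_SUBSTITUTION_RATE)
high = taq_error_fraction(0.0015, OBSERVED_SUBSTITUTION_RATE)
```

The upper estimate stays just below the 2.4% quoted for the 6,912 substitutions.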
To exclude the possibility that some of the rearrangements used
other VH-genes, all sequences were thoroughly compared with all
known IGHV alleles in the ImMunoGeneTics (IMGT) database
(http://imgt.cines.fr) (39). We always found maximal identity with
IGHV3-23*01 or IGHV3-h*01, respectively. Some donors frequently had a specific mutation in IGHD3-3, suggesting the existence of a new D gene allele. However, this could not be confirmed
by sequencing of the IGHD3-3 germline gene (data not shown).
No other mutations in the IGHD genes or IGHJ genes indicated the
presence of new alleles.
Fig. 2 shows the substitution frequencies in the different positions in the VH region in the 386 nonproductive IGHV3-23*01
rearrangements. The overall substitution frequency per nucleotide
in the VH-region was 6.3%, varying from 5.6% for T residues to
6.8% for A residues (see Table I). The transition/transversion ratio
was 1.01. In the mutated productive sequences the transition/transversion ratio was significantly higher (1.30, p < 0.0001) and the
mutation rate lower (5.0%, p = 0.0026). This was largely due to a
lower rate of replacement mutation yielding a lower replacement-to-silent mutation ratio (2.53 vs 4.02, p < 0.0001) that is indicative of
The data set consisted of 659 nonproductive and 5,670 productive
IGHV3-23*01 rearrangements. Sequences were considered nonproductive when the V-D-JH joint changed the normal reading
frame of the JH segment or when the D segment contained one or
more stop codons not resulting from SHM (i.e., germline encoded). Because the mutations in nonproductive sequences have
not been selected for Ag binding, the mutations were expected to
represent the intrinsic preferences of the mutation mechanism.
Great care was taken to remove clonally related sequences from
the material (37) and hence the mutations are expected to have
arisen independently. All sequences contained exon 2 of the VH
gene starting in codon 1 or further upstream and continued through
the 3⬘ end of the JH gene. The precise germline sequences of the
VH gene (IGHV3-23*01) and the JH6 allele (IGHJ6*02) were
known, because all donors had been typed and found homozygous
in the respective loci (see Ref. 38). The VDJ joints were analyzed
using the JointML algorithm that we have shown to be currently the best algorithm for identifying D genes (37). Of the 659
nonproductive rearrangements, 185 were unmutated in the VH region (codons 1–107) and therefore excluded. To minimize interference from Taq errors in the analyses, those sequences with one
to three mutations were also excluded (88 sequences). The remaining 386 sequences contained from four to 70 mutations (median
14, average 16.5) in the VH region, excluding insertions and deletions (see Fig. 1). Altogether, these sequences contained 6,912
substitutions. Less than 2.4% of these substitutions were expected
to be Taq errors based on an estimated error rate of 0.00048 –
Table II. Substitution rates of the boldfaced C/G (top portion) and A/T (bottom portion) residue in the given four-nucleotide motifs in 386 nonproductive, mutated rearrangements (codons 1 through 107) (a)

C/G Substitutions in the Nontranscribed Strand

Substitutions in C (targeted C in position 3 of the motif)

Motif    No. of          No. of Motifs   Substitution Rates (%)
         Substitutions   Tested          Total   C to T   C to G   C to A
AGCT*    301             871             34.6    14.5     17.2     2.9
AGCG     —               —               —       —        —        —
AGCA     110             470             23.4    13.0     5.1      5.3
TGCC*    73              334             21.9    11.7     4.8      5.4
TGCT*    —               —               —       —        —        —
TACT*    76              477             15.9    7.6      5.0      3.4
CGCT     40              268             14.9    7.5      6.0      1.1
TACC*    —               —               —       —        —        —
AACC*    —               —               —       —        —        —
TACG     26              255             10.2    6.3      2.0      2.0
ACCT     30              308             9.7     4.9      3.3      1.6
ATCT     51              580             8.8     3.5      4.1      1.2
AGCC*    108             1367            7.9     5.1      1.8      1.0
CACC     51              648             7.9     3.2      3.7      0.9
ATCC     —               —               —       —        —        —
AACA     42              618             6.8     3.4      2.3      1.1
TTCT     —               —               —       —        —        —
TGCA     60              1007            6.0     3.8      1.1      1.1
CACT     —               —               —       —        —        —
GACT     33              696             4.7     1.3      2.9      0.6
TTCC     13              284             4.6     2.5      1.8      0.4
ACCC     —               —               —       —        —        —
CGCA     11              288             3.8     2.1      1.0      0.7
TGCG     11              299             3.7     2.0      1.3      0.3
GGCT     32              996             3.2     1.8      1.3      0.1
TCCT     11              345             3.2     1.5      0.3      1.5
TTCA     20              644             3.1     1.6      0.8      0.8
TTCG     —               —               —       —        —        —
CACG     17              594             2.8     1.9      0.8      0.2
CGCC     10              362             2.8     2.2      0.3      0.3
CACA     6               240             2.5     0.8      0.8      0.8
GGCA     —               —               —       —        —        —
CTCA     7               325             2.2     1.5      0.3      0.3
CTCC     28              1363            2.1     0.8      0.8      0.4
TCCC     6               368             1.6     0.8      0.3      0.5
TCCA     16              999             1.6     0.7      0.7      0.2
GACA     10              674             1.5     0.9      0.3      0.3
GGCC     10              723             1.4     0.6      0.4      0.4
ACCA     4               295             1.4     1.0      0.0      0.3
TACA     4               298             1.3     1.0      0.3      0.0
ACCG     —               —               —       —        —        —
GTCT     8               661             1.2     0.6      0.3      0.3
TCCG     8               695             1.2     0.9      0.0      0.3
CTCT     6               669             0.9     0.5      0.3      0.2
CTCG     —               —               —       —        —        —
GACC     —               —               —       —        —        —
GCCC     —               —               —       —        —        —
CCCC     —               —               —       —        —        —
CCCA     —               —               —       —        —        —
GCCT     4               906             0.4     0.2      0.2      0.0
GCCA     2               606             0.3     0.2      0.0      0.2
GCCG     3               947             0.3     0.2      0.0      0.1
CCCT     1               356             0.3     0.0      0.0      0.3
GTCC     2               717             0.3     0.1      0.1      0.0
GGCG     —               —               —       —        —        —

Substitutions in G (targeted G in position 2 of the reverse complementary motif; rows in the same order as above)

Motif    No. of          No. of Motifs   Substitution Rates (%)
         Substitutions   Tested          Total   G to A   G to C   G to T
AGCT*    309             879             35.2    17.4     15.5     2.3
CGCT     71              299             23.8    13.0     6.0      5.0
TGCT     —               —               —       —        —        —
GGCA*    —               —               —       —        —        —
AGCA*    92              452             20.4    8.9      10.4     1.1
AGTA*    —               —               —       —        —        —
AGCG     —               —               —       —        —        —
GGTA*    87              731             11.9    8.2      1.2      2.5
GGTT*    34              328             10.4    4.0      5.5      0.9
CGTA     59              279             21.2    11.5     6.1      3.6
AGGT     8               355             2.3     1.1      0.9      0.3
AGAT     —               —               —       —        —        —
GGCT*    76              1040            7.3     4.4      1.9      1.0
GGTG     27              600             4.5     2.7      1.3      0.5
GGAT     21              305             6.9     3.6      2.6      0.7
TGTT     22              350             6.3     3.7      1.4      1.1
AGAA     19              304             6.3     2.3      4.0      0.0
TGCA     32              979             3.3     1.8      1.0      0.4
AGTG     52              879             5.9     3.3      1.7      0.9
AGTC     35              354             9.9     4.2      5.7      0.0
GGAA     8               327             2.5     1.8      0.3      0.3
GGGT     47              1076            4.4     2.4      1.6      0.4
TGCG     7               295             2.4     1.7      0.7      0.0
CGCA     21              298             7.1     4.7      1.7      0.7
AGCC     39              1298            3.0     1.1      1.5      0.5
AGGA     47              340             13.8    3.5      9.7      0.6
TGAA     17              636             2.7     2.0      0.6      0.0
CGAA     7               244             2.9     2.1      0.8      0.0
CGTG     10              349             2.9     0.9      1.2      0.9
GGCG     —               —               —       —        —        —
TGTG     7               657             1.1     0.5      0.3      0.3
TGCC     6               267             2.3     1.1      0.8      0.4
TGAG     17              907             1.9     0.7      0.9      0.3
GGAG     22              1007            2.2     0.3      1.9      0.0
GGGA     8               710             1.1     1.0      0.0      0.1
TGGA     26              1005            2.6     1.9      0.3      0.4
TGTC     —               —               —       —        —        —
GGCC     8               721             1.1     0.1      0.4      0.6
TGGT     57              1013            5.6     3.5      1.1      1.1
TGTA     24              242             9.9     4.6      2.9      2.5
CGGT     4               303             1.3     0.0      1.0      0.3
AGAC     17              1000            1.7     0.7      0.7      0.3
CGGA     —               —               —       —        —        —
AGAG     10              672             1.5     0.3      1.0      0.2
CGAG     2               305             0.7     0.7      0.0      0.0
GGTC     6               1035            0.6     0.3      0.2      0.1
GGGC     4               712             0.6     0.3      0.3      0.0
GGGG     12              2176            0.6     0.3      0.1      0.1
TGGG     7               1313            0.5     0.1      0.4      0.1
AGGC     3               627             0.5     0.2      0.2      0.2
TGGC     —               —               —       —        —        —
CGGC     3               350             0.9     0.6      0.0      0.3
AGGG     5               1034            0.5     0.4      0.1      0.0
GGAC     0               297             0.0     0.0      0.0      0.0
CGCC     1               353             0.3     0.3      0.0      0.0

A/T Substitutions in the Nontranscribed Strand

Substitutions in A (targeted A in position 3 of the motif)

Motif    No. of          No. of Nucleotides   Substitution Rates (%)
         Substitutions   Tested               Total   A to G   A to T   A to C
GTAT     106             520                  20.4    6.7      7.8      5.8
CTAT     65              367                  17.7    7.1      6.0      4.6
GTAG     60              388                  15.5    10.1     3.4      2.1
TTAC     31              238                  13.0    5.0      3.8      4.2
CTAC     29              238                  12.2    4.2      3.8      4.2
GAAT     —               —                    —       —        —        —
GTAA     —               —                    —       —        —        —
CAAT     31              291                  10.7    4.8      2.4      3.4
ATAC     22              214                  10.3    4.2      3.7      2.3
GTAC     32              319                  10.0    6.3      1.9      1.9
ATAT     25              250                  10.0    4.8      3.6      1.6
CTAA     —               —                    —       —        —        —
AAAT     29              314                  9.2     5.7      1.9      1.6
CAAG     28              318                  8.8     7.2      0.9      0.6
CAAA     25              304                  8.2     6.3      1.0      1.0
GAAC     50              619                  8.1     5.5      0.7      1.9
ATAG     —               —                    —       —        —        —
CCAT     44              608                  7.2     3.3      2.6      1.3
GAAA     15              212                  7.1     4.3      0.5      2.4
TTAG     35              549                  6.4     2.9      2.4      1.1
AAAG     10              160                  6.3     5.0      0.6      0.6
GAAG     40              696                  5.8     4.5      0.7      0.6
CCAA     17              312                  5.5     3.9      1.0      0.6
TAAT     —               —                    —       —        —        —
GCAT     —               —                    —       —        —        —
TCAC     32              651                  4.9     2.2      1.2      1.5
ACAT     11              250                  4.4     2.0      1.2      1.2
ACAA     14              321                  4.4     3.1      0.3      0.9
CCAC     —               —                    —       —        —        —
GGAC     13              310                  4.2     2.6      0.3      1.3
GCAA     12              296                  4.1     2.7      0.7      7.0
CGAA     10              247                  4.1     2.0      0.8      1.2
TCAG     7               176                  4.0     1.7      0.0      2.3
AGAG     26              688                  3.8     2.0      0.0      1.7
TGAA     24              643                  3.7     2.5      0.0      1.2
ACAC     22              643                  3.4     2.3      0.8      0.3
AGAA     10              295                  3.4     2.4      0.7      0.3
GCAC     6               183                  3.3     2.2      0.6      0.6
TCAT     —               —                    —       —        —        —
CCAG     32              1057                 3.0     2.0      0.5      0.6
GCAG     33              1096                 3.0     2.0      0.6      0.4
GGAG     30              1015                 3.0     1.6      0.6      0.8
CAAC     —               —                    —       —        —        —
CGAG     9               312                  2.9     1.3      1.0      0.6
TGAG     25              915                  2.7     1.2      0.6      1.0
TAAA     —               —                    —       —        —        —
GGAA     8               327                  2.5     1.8      0.3      0.3
GGAT     7               291                  2.1     1.7      0.0      0.7
ACAG     12              616                  2.0     1.0      0.7      0.3
AGAC     19              1002                 1.9     1.5      0.2      0.2
AGAT     —               —                    —       —        —        —

Substitutions in T (targeted T in position 2 of the reverse complementary motif; rows in the same order as above)

Motif    No. of          No. of Nucleotides   Substitution Rates (%)
         Substitutions   Tested               Total   T to C   T to A   T to G
ATAC     47              239                  19.4    7.5      5.4      6.7
ATAG     —               —                    —       —        —        —
CTAC     26              235                  11.1    6.8      2.6      1.7
GTAA     —               —                    —       —        —        —
GTAG     52              380                  13.7    6.3      3.2      4.2
ATTC     75              641                  11.7    5.3      3.7      2.7
TTAC     26              233                  11.2    4.7      3.0      3.4
ATTG     —               —                    —       —        —        —
GTAT     38              452                  8.4     4.4      1.6      2.4
GTAC     31              318                  9.8     2.8      3.5      3.5
ATAT     20              245                  8.2     4.5      2.9      0.8
TTAG     53              567                  9.3     6.7      1.1      1.6
ATTT     —               —                    —       —        —        —
CTTG     19              342                  5.6     4.7      0.3      0.6
TTTG     —               —                    —       —        —        —
GTTC     26              318                  8.2     4.7      1.3      2.2
CTAT     26              328                  7.9     4.6      2.1      1.2
ATGG     —               —                    —       —        —        —
TTTC     —               —                    —       —        —        —
CTAA     —               —                    —       —        —        —
CTTT     18              296                  6.1     4.4      1.0      0.7
CTTC     —               —                    —       —        —        —
TTGG     22              685                  3.2     1.3      0.3      1.6
ATTA     26              482                  5.4     1.9      1.9      1.7
ATGC     12              230                  5.2     4.4      0.9      0.0
GTGA     10              343                  2.9     1.5      0.3      1.2
ATGT     —               —                    —       —        —        —
TTGT     —               —                    —       —        —        —
GTGG     44              1038                 4.2     1.8      1.5      1.0
GTCC     10              725                  1.4     0.4      0.4      0.6
TTGC     —               —                    —       —        —        —
TTCG     —               —                    —       —        —        —
CTGA     22              704                  3.1     1.9      0.9      0.4
CTCT     12              675                  1.8     0.7      0.2      0.9
TTCA     11              635                  1.7     0.5      0.3      0.9
GTGT     —               —                    —       —        —        —
TTCT     —               —                    —       —        —        —
GTGC     27              982                  2.8     1.5      0.7      0.5
ATGA     20              622                  3.2     1.6      0.5      1.1
CTGG     100             1634                 6.1     0.8      0.4      5.0
CTGC     6               271                  2.2     1.9      0.0      0.4
CTCC     36              1371                 2.6     2.0      0.2      0.4
GTTG     10              339                  3.0     1.5      0.9      0.6
CTCG     —               —                    —       —        —        —
CTCA     11              329                  3.3     1.8      0.3      1.2
TTTA     8               300                  2.7     1.7      0.3      0.7
TTCC     13              284                  4.6     3.5      0.0      1.1
ATCC     —               —                    —       —        —        —
CTGT     12              1134                 1.1     0.7      0.4      0.0
GTCT     10              663                  1.5     0.9      0.5      0.2
ATCT     10              539                  1.9     0.9      0.0      0.9

(a) All possible four-nucleotide motifs found in the IGHV3-23*01 germline gene are included in the table. Substitutions in C or A are given in the left half of the table and substitutions in the corresponding G or T in the reverse complementary motif are given in the right half of the table. The total substitution rate as well as the substitution rates to each of the three possible nucleotides are given. A dash (—) means that the motif is not found in the IGHV3-23*01 germline gene. Motifs were not considered if nucleotides other than the boldfaced nucleotide were mutated in the given motif in the given sequence. The motifs are listed according to decreasing total substitution frequency in C or A or, in case the motif containing C or A was not present, in G or T, respectively. An asterisk (*) indicates that the motif is contained within the WRCY/RGYW motif.
selection and highlights the importance of using nonselected sequences to study the mechanism of SHM. To confirm that the mutations in the nonproductive rearrangements were indeed unselected, we
studied mutations (including insertions and deletions) abrogating the
open reading frame of the VH segment. Only 46 (1%) of 3701 mutated, productive rearrangements contained one or more stop codons.
These were likely to result from Taq errors because B cells lacking a
functional Ag receptor are rapidly lost from the circulation and should
therefore not appear in our material (40). In contrast, as many as 171
(44%, p < 0.0001 when compared with productive rearrangements)
of the 386 nonproductive sequences contained stop codon(s) in the
VH segment. This significantly high proportion was not different from
that found in rearrangements of the pseudogene IGHV3-h (45%, p =
1.00), supporting the notion that substitutions in the nonproductive
IGHV3-23*01-derived sequences are indeed unselected.
Substitution in different nucleotides
Fig. 3 shows the distribution of substitution rates for all C (Fig.
3a), G (Fig. 3b), A (Fig. 3c), and T (Fig. 3d) positions, respectively, divided into the three different substitution types. The
curves suggest that the substitutions in C and G residues were
caused by a mechanism with a very high preference for some positions (substitution rates >30%) and a low preference for other
positions (substitution rates <1%). In general, transitions predominated followed by transversions to the complementary nucleotide.
In contrast, the substitution rates in the A and T positions were less
variable and the rates for the two types of transversions were comparable. The graphs for C and G have similar courses and so have
the graphs for A and T, suggesting that the mutation mechanism is
strand symmetric.
C and G substitution rates in different motifs correlate closely to
the reported hot spot and cold spot motifs for cytidine
deamination by AID
To further investigate strand specificity, we compared the substitution rates in different motifs with those of the reverse complementary motifs when both were present in the nontranscribed
strand of the germline gene. In case the mutations are targeted to
both strands by similar mechanisms, this method should show a
correlation between the mutation rates because the mutation of a
given residue in a motif on the nontranscribed strand should
A and T substitution rates in different motifs
For A/T substitutions, the 17 most mutated four-nucleotide motifs
(see Table II, bottom portion) contained WA/TW mutations previously described as A/T SHM hot spots (30, 45). No motifs had
an A/T substitution rate of <1%, suggesting that there are no A/T
mutational cold spots.
FIGURE 4. Correlation analysis for substitution rates (log transformed) in reverse complementary motifs on the transcribed and nontranscribed strand for C/G (a) and A/T (b) motifs, respectively. Only positions with more than five substitutions are shown in the figure. Strong correlations between substitution rates in reverse complementary motifs were found for both C/G motifs (p < 0.0001, R = 0.86, Pearson’s correlation analysis) and A/T (p < 0.0001, R = 0.83).
In IGHV3-h sequences the A/T mutation frequency tended to be
lower than in IGHV3-23. This was also the case for the C/G mutation frequency, indicating that IGHV3-h is less targeted by SHM
than IGHV3-23. However, the three most mutable A/T motifs were
the same in the two VH genes.
Correlation between substitution rates in reverse complementary
motifs
The apparent correlation between the substitution rates in reverse
complementary motifs prompted a closer analysis. Fig. 4a shows a
significant correlation between the total substitution rates in C residues in position 3 in four-nucleotide motifs and in G residues in
the corresponding reverse complementary motifs (p < 0.0001,
Pearson’s correlation analysis), and Fig. 4b shows that the case is
the same for A/T substitutions (p < 0.0001). These data show that
reverse complementary motifs were targeted equally well, indicating that the SHM machinery targeted individual motifs similarly
on the two strands.
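The type of comparison described here can be sketched with a plain Pearson correlation on log transformed rates. The example is illustrative only: it uses a small, hand-picked subset of the C/G motif pairs from Table II rather than the per-position data behind Fig. 4, so it is not expected to reproduce the reported R values, and the helper name `pearson` is hypothetical.

```python
import math

# (C rate %, G rate %) for reverse complementary motif pairs from Table II.
pairs = [
    (34.6, 35.2),  # AGCT / AGCT
    (10.2, 21.2),  # TACG / CGTA
    (9.7, 2.3),    # ACCT / AGGT
    (7.9, 7.3),    # AGCC / GGCT
    (7.9, 4.5),    # CACC / GGTG
    (6.8, 6.3),    # AACA / TGTT
    (6.0, 3.3),    # TGCA / TGCA
    (4.7, 9.9),    # GACT / AGTC
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


log_c = [math.log10(c) for c, _ in pairs]
log_g = [math.log10(g) for _, g in pairs]
r = pearson(log_c, log_g)
```

A positive coefficient for such pairs is what would be expected if both strands are targeted by similar mechanisms.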
Substitutions in and around runs of G residues
Seven areas in the FRs contain 3–6 G residues in a row and
showed a particularly low degree of substitution that correlates
well with CCC/GGG being a cold spot. An interesting observation,
however, was the substitution pattern of the T residue in position
3 of codon 15 immediately adjacent to a run of six G residues. This
residue had a very high substitution frequency of 22.5%, the highest for any T residue (marked by an asterisk (*) in Fig. 3d). Ninety-three percent of the substitutions were transversions to G, a substitution type that only accounted for 23% of substitutions in other
T residues (p < 0.0001, Fisher’s exact test). Only two of 262
(0.8%) sequences with less than three mutations in the VH region
correspond to the mutation of the same nucleotide in the same
motif on the transcribed strand.
We analyzed all possible three-, four-, and five-nucleotide motifs and substitutions in all positions within these motifs. The strongest correlations between substitutions in the same position of the
same motif on the two strands were found for C and A residues in
position 3 in four-nucleotide motifs on the nontranscribed strand
(compared with G and T, respectively, in position 2 in the reverse
complementary motifs of the same strand) (data not shown). Table
II (top portion) shows the substitution rates in all pairs of reverse
complementary C/G motifs. Motifs are ranked by declining total
substitution rate in the C residue or the corresponding G in case the
C motif was not found on the nontranscribed strand. For a given
motif, only sequences in which all other motif positions than the
nucleotide in question were unmutated were included to minimize
the risk of looking at an influence from neighboring substitutions.
With this restriction, ~70% of the 6,912 detected substitutions
were included in the analysis. The 10 most mutable motifs (that all
have a substitution rate of >10% for C and/or G) included six of
the seven RGYW/WRCY (where R is A or G, Y is C or T, and W
is T or A) motifs present in the sequence. RGYW/WRCY has
earlier been defined as hot spot motifs for SHM in vivo (41–43).
The last RGYW/WRCY motif pair (AGCC/GGCT) had mutation
rates of 7.8% and 7.3%, respectively, indicating that RGYW/
WRCY is indeed good at predicting high mutability (the targeted
nucleotides of the four nucleotide motifs are set in boldface type).
WRCY includes WRC that has been found to be a hot spot for AID
deamination of the C residue (19, 20, 44), and hence we find that
there is a good correlation between deamination hot spots and C/G
SHM hot spots. The other four highly mutable motifs (CGCT,
AGCA, TACG, and CGCT) only deviate from RGYW/WRCY in
position 1 or 4. Notably, the first three of these motifs are WRC/
GYW motifs, suggesting that the three first nucleotides within the
motifs are most important for the mutability.
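The restriction described above, counting a motif only when all positions other than the targeted nucleotide are unmutated, can be sketched as follows. This is a minimal illustration, assuming that every mutated sequence is already aligned to the germline gene, has the same length, and contains substitutions only; the function and variable names are hypothetical, and the toy sequences are not data from the study.

```python
from collections import defaultdict


def motif_substitution_rates(germline, sequences, target_index=2, k=4):
    """Count substitutions at one position of every germline k-mer.

    A sequence contributes to a motif only if its other k-1 positions
    match the germline (the restriction described in the text).
    Returns {motif: (substitutions, tested, rate)}.
    """
    tested = defaultdict(int)
    substituted = defaultdict(int)
    for start in range(len(germline) - k + 1):
        motif = germline[start:start + k]
        for seq in sequences:
            window = seq[start:start + k]
            flanks_intact = all(
                window[i] == motif[i] for i in range(k) if i != target_index
            )
            if not flanks_intact:
                continue  # neighboring substitution: motif not considered
            tested[motif] += 1
            if window[target_index] != motif[target_index]:
                substituted[motif] += 1
    return {m: (substituted[m], tested[m], substituted[m] / tested[m])
            for m in tested}


toy_germline = "AAGCTA"
toy_sequences = ["AAGCTA", "AAGTTA", "ATGCTA", "AAGCTA"]
toy_rates = motif_substitution_rates(toy_germline, toy_sequences)
```

With `target_index=2` the third nucleotide of each four-nucleotide motif is scored, as for the C and A residues in Table II; `target_index=1` would score the second nucleotide, as for G and T.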
SYC (where S is G or C) and SSC have been described as
deamination cold spots (19, 20, 44) and, with one exception
(GGTC), the 12 motifs with a substitution rate of <1% were all
compatible with these or the reverse complementary motifs. The
substitution rates in these motifs were only marginally above the
Taq error rate (0.00048–0.0015 substitutions per nucleotide). Furthermore, those of the 24 possible SYCx and SSCx motifs (or the
24 reverse complementary motifs) that were present in the sequence had substitution rates of <3.9%. One exception was CGCT
with a substitution rate of 14.9%; however, this motif only deviates
from the hot spot WRCY in position 1.
A set of rearrangements using IGHV3-h was also analyzed. Fifty-six of 103 sequences had more than three mutations in the VH region and were thus included in the substitution analysis.
IGHV3-h is a pseudogene due to a disturbed translation initiation
codon and thus these sequences have not undergone selection following SHM. Generally the mutation frequencies in the different
motifs were lower than in the similar motifs in the IGHV3-23-using
rearrangements. ATCC, TACT, AGCT, and AACT were the only
motifs where the C mutation frequency was >10%. The last three
are included in the consensus WRCY hot spot motif. ATCC and
AGCT are both present only once in the sequence, leading to considerable uncertainty in the relatively small sample. All possible
combinations of SYC and SSC present in the germline sequence
were also found to have a mutation frequency of <1% in
IGHV3-h.
Table III. Substitution rates in the C and G nucleotides (boldfaced) of the four germline encoded AGCT
motifs present in the 386 nonproductive IGHV3-23*01 rearrangements studiedᵃ

                        AGCT                              AGCT
Location                Mutated  Unmutated  Rate (%)      Mutated  Unmutated  Rate (%)
Codon 4/3 (FR1)         85       290        22.7          64       312        17.0
Codon 32 (CDR1)         80       286        21.9          144      219        39.7
Codon 40 (FR2)          86       279        23.6          130      233        35.8
Codon 55 (FR2/CDR2)     227      134        62.9          174      191        47.7

ᵃ AGCT motifs affected by deletions or insertions were excluded. The columns show the number of motifs that were either
mutated or unmutated in the given position as well as the substitution rates. For both AGCT and AGCT the substitution rates
varied significantly between the four locations (p < 0.0001, χ² test). Also, the C and the G substitution rates within the same
motif were found to differ significantly for the last three motifs (p < 0.0005, pairwise t tests).
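The rates in Table III follow directly from the counts, and the reported heterogeneity between locations can be illustrated with a Pearson χ² statistic. The sketch below is an illustration under assumptions: it treats the four locations as a 4 x 2 contingency table (the source states only that a χ² test was used), uses the counts of the first column group of the table, and all function names are hypothetical.

```python
import math

# (mutated, unmutated) counts per AGCT location, copied from the first
# column group of Table III.
counts = {
    "codon 4/3 (FR1)": (85, 290),
    "codon 32 (CDR1)": (80, 286),
    "codon 40 (FR2)": (86, 279),
    "codon 55 (FR2/CDR2)": (227, 134),
}


def rate_percent(mutated, unmutated):
    """Substitution rate as a percentage of all motifs scored."""
    return 100.0 * mutated / (mutated + unmutated)


def chi_square(table):
    """Pearson chi-square statistic for a list of (mutated, unmutated) rows."""
    row_totals = [m + u for m, u in table]
    col_totals = [sum(m for m, _ in table), sum(u for _, u in table)]
    grand = sum(row_totals)
    stat = 0.0
    for (m, u), row in zip(table, row_totals):
        for observed, col in zip((m, u), col_totals):
            expected = row * col / grand
            stat += (observed - expected) ** 2 / expected
    return stat


def chi2_sf_df3(x):
    """Upper tail probability of a chi-square variable with 3 degrees of
    freedom (closed form; 4 locations x 2 outcomes gives df = 3)."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))


rates = {loc: round(rate_percent(m, u), 1) for loc, (m, u) in counts.items()}
stat = chi_square(list(counts.values()))
p_value = chi2_sf_df3(stat)
print(rates)
print(stat, p_value)
```

The recomputed rates reproduce the Rate (%) column, and the tail probability falls well below the p < 0.0001 threshold quoted in the footnote.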
Substitution rates when the motif is present more than once
Many of the motifs are present more than once in the IGHV3-23
gene. The highly mutable AGCT motif, for example, is present
four times, but many other four-base motifs are present up to six
times. As seen in Table III, the substitution rates of C and G in the
individual AGCT motifs are clearly different without any obvious
relation to the location in the gene, i.e., whether the position is in
a FR or a CDR. Nor is there any correlation between the substitution rates of C on the nontranscribed and transcribed strands (G
on the nontranscribed strand) within the same motif.
When comparing individual substitution rates for most of the
other motifs that exist more than once, we find that they also vary
without any obvious relation to location (data not shown). This
indicates that a four-nucleotide motif only partially explains the
mutability in a given position. The substitution rate might still be
influenced by the base composition in the neighboring region or
the location in the gene and, hence, by the distance from the promoter or perhaps the distance to an AID target.
Correlation between the substitution rate and the distance to the
nearest 3′ AID target
Replication over an AID-generated uracil can only account for C
to T transitions and G to A transitions (when occurring on the
transcribed strand), and the different processes involved in the repair of a uracil are thought to generate other mutations. These may
involve the generation of an abasic site by uracil removal or the
Table IV. Correlations between the substitution rate of a given nucleotide and the distance (in base pairs) to the C in the nearest 5′ or 3′ WRC AID
deamination hot spot motif, respectivelyᵃ

Nontranscribed Strand
                 Distance to Nearest 3′ WRC        Distance to Nearest 5′ WRC
                 Correlation                       Correlation
Substitution     Coefficient     p Valueᵇ          Coefficient     p Value
All A            −0.28           0.0309            0.03            0.8267
A to G           −0.19           0.1446            0.04            0.7626
A to T           −0.33           0.0088            −0.14           0.2910
A to C           −0.33           0.0106            0.08            0.5532
All T            −0.36           0.0040            −0.05           0.6843
T to C           −0.40           0.0013            0.05            0.6748
T to A           −0.35           0.0044            −0.07           0.5823
T to G           −0.17           0.1794            −0.12           0.3382
All Cᵈ           −0.19           0.1848            0.01            0.9234
C to T           −0.17           0.2423            0.04            0.7772
C to G           −0.20           0.1641            −0.03           0.8084
C to A           −0.25           0.0723            −0.03           0.8415
All G            −0.37           0.0003            0.04            0.6909
G to A           −0.35           0.0006            −0.01           0.9197
G to C           −0.36           0.0004            0.07            0.5310
G to T           −0.21           0.0433            0.03            0.7936

Transcribed Strand
                 Distance to Nearest 3′ WRC        Distance to Nearest 5′ WRC
                 Correlation                       Correlation
Substitution     Coefficient     p Valueᵇ          Coefficient     p Value
All Aᶜ           −0.19           0.1399            −0.19           0.1675
A to G           −0.13           0.3187            −0.24           0.0680
A to T           −0.19           0.1392            −0.20           0.1287
A to C           −0.31           0.0145            −0.04           0.7950
All T            −0.43           0.0004            −0.09           0.5097
T to C           −0.38           0.0020            −0.09           0.5173
T to A           −0.51           <0.0001           −0.15           0.2484
T to G           −0.26           0.0430            −0.04           0.7861
All Cᵈ           −0.21           0.0819            0.01            0.9338
C to T           −0.25           0.0358            −0.04           0.7303
C to G           −0.14           0.2360            0.03            0.7951
C to A           −0.19           0.1077            −0.06           0.6301
All G            −0.41           0.0003            −0.02           0.9006
G to A           −0.41           0.0004            −0.06           0.5952
G to C           −0.35           0.0026            −0.03           0.8238
G to T           −0.24           0.0459            −0.19           0.1087

ᵃ Both the correlation coefficients and the p values are given. In the transcribed strand part of the table the substitutions are indicated as if they occurred on the transcribed strand (the strand
containing the WRC motif) although technically they were detected as the reverse complementary substitutions on the nontranscribed strand. It is seen that distances to the nearest
3′ AID hot spot correlate significantly with T and G substitutions on both strands, while the same trend was only borderline significant for A substitutions. No correlation between
substitution rates and the distance to the nearest 5′ WRC was found for any of the substitutions on either strand.
ᵇ The p value for correlation between substitution frequency and distance to the given motif was obtained by Spearman Correlation analysis. Boldfaced values are significant
(<0.005). Underlined values (0.005 < p < 0.05) are considered borderline significant due to the many comparisons.
ᶜ WRC on the transcribed strand is seen as GYW on the nontranscribed strand, A on the transcribed strand is seen as T on the nontranscribed strand, etc.
ᵈ Distance 0 (equal to C in WRC) was not included in any of the calculations.
had a substitution in this T residue (one to G and one to C). This
is significantly lower than in mutated sequences (p < 0.0001),
indicating that Taq errors cannot account for the high
substitution rate.
In the IGHV3-h-using sequences the substitution rate in the corresponding residue was also remarkably high (14.6%) and again
consisted of mostly T to G transversions. Also, the T residue in the
last position of codon 7 preceding a run of five G residues showed
predominately T to G transversions (75%), but at a somewhat
lower substitution rate (5.3%).
removal of a stretch of nucleotides by endonucleases and/or exonucleases followed by gap filling by error-prone polymerases that
introduce substitutions in the flanking nucleotides. If this is the
case, one would expect to find a correlation between the substitution rate and the distance to the nearest AID target motif. We tested
such correlations in both directions on both strands, that is, the
distance to the nearest 5′ WRC and the distance to the nearest 3′
WRC. Naturally, one cannot determine whether a mutation originally occurred on the nontranscribed or the transcribed strand, so
correlations were calculated for two extreme scenarios: namely
that all substitutions had occurred on the nontranscribed or the transcribed strand, respectively. Substitutions on the transcribed strand
were counted as the complementary substitution on the nontranscribed strand and correlations were calculated to G in GYW on
the nontranscribed strand. Table IV shows that there is a statistically significant inverse correlation between substitution rates in T
and G residues on both strands and the distance to the nearest 3′
WRC motif on the same strand. The only exception is T to G
transversions on the nontranscribed strand. However, when the
substitutions of the T residues in position 3 of codons 15 and 7 (the
ones preceding the runs of G residues) are omitted, the inverse
correlation between T to G substitutions and the distance to WRC
on the nontranscribed strands increases (correlation coefficient of
−0.24, p = 0.06). For substitutions of A there was a trend toward
an inverse correlation between the substitution rate and the distances to the nearest 3′ AID hot spots that was borderline significant for transversion to C only. There was no correlation between
substitution rates and the distance to the nearest 5′ WRC for any
substitutions, suggesting that the error-inducing repair process following AID deamination only works 5′ of the deaminated C.
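The distance-based analysis described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' pipeline: the sequence is read 5′ to 3′ on the strand carrying the WRC motif, the function names are hypothetical, the toy sequence and rates are invented, and Spearman's coefficient is computed as the Pearson correlation of average ranks.

```python
def wrc_c_positions(seq):
    """0-based indices of the C in every WRC motif (W = A/T, R = A/G)."""
    return [i for i in range(2, len(seq))
            if seq[i] == "C" and seq[i - 1] in "AG" and seq[i - 2] in "AT"]


def distance_to_nearest_3prime_wrc(seq):
    """For each position, distance in bases to the closest WRC C lying 3'
    (downstream) of it. None marks positions without such a motif and
    positions at distance 0, which Table IV footnote d excludes."""
    targets = wrc_c_positions(seq)
    distances = []
    for i in range(len(seq)):
        downstream = [t - i for t in targets if t >= i]
        nearest = min(downstream) if downstream else None
        # Distance 0 means position i is itself the C of a WRC motif.
        distances.append(None if nearest in (None, 0) else nearest)
    return distances


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared = (start + end) / 2 + 1
        for k in order[start:end + 1]:
            ranks[k] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy data: one WRC motif (AGC) and invented per-position rates.
toy_seq = "TTTTAGCTTT"
toy_rates = [1.0, 2.0, 2.5, 4.0, 6.0, 9.0, 30.0, 3.0, 2.0, 1.0]
dist = distance_to_nearest_3prime_wrc(toy_seq)
pairs = [(d, r) for d, r in zip(dist, toy_rates) if d is not None]
rho = spearman([d for d, _ in pairs], [r for _, r in pairs])
print(dist, rho)
```

For the transcribed-strand scenario, the same functions would be applied to the reverse complement of the sequence, which corresponds to measuring distances to G in GYW on the nontranscribed strand.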
Influence of substitutions in the neighboring nucleotide
We also tested whether substitutions in C or G in a given AGCT
target would influence the mutation rate in the neighboring G/C by
changing the hot spot motif to a less mutable one. When comparing sequences with such substitutions to those without, we found
that the substitution rate in the neighboring G/C was less than half
(average 43%). This was true for all four AGCT motifs present in
the germline sequence, indicating that substitutions in a given position can indeed influence the mutability of the neighboring nucleotide in subsequent rounds of SHM. We saw, however, no significant differences in the substitution pattern further away than the
closest nucleotide (data not shown).
Substitution rates in JH gene and D gene motifs resemble those
of the VH region
Because of the size of our material, we had an opportunity to study
mutations in individual JH genes and in several D genes. Fig. 5
shows the substitution rates for individual residues in IGHJ6 (average 4.7% per nucleotide) and IGHJ4 (average 3.9%) that are not
significantly different (p = 0.19). The highest substitution rates
were seen in the 5′ end of the JH genes, which falls within the
CDR3 and contains most of the motifs found to have a high substitution rate in the VH region (e.g., TACT, GGTA, CTAT, and
CTAT; see Table II). Substitution rates in these motifs in the JH
genes correspond well to the rates found in the IGHV3-23, indicating that it is the same regulatory mechanism that controls VH
and JH mutations. The 3′ ends of the genes that encode FR4 have
fewer mutations, consistent with a high content of cold spots (e.g.,
GCC, GGC, GTC, GGC, and GGG).
The overall mutation rate in D segments was rather high (average 7.8% per nucleotide). An analysis of individual motifs was
performed and showed that the high mutation rate could be explained by a high content of hot spot motifs in the D genes. The hot
spots targeted had substitution frequencies comparable to those of
the same motif when placed in the VH region (see Table V).
Mutation frequencies depend on the JH gene
We noticed that the fraction of unmutated sequences, defined as
sequences with less than three mutations in the VH region, varied
depending on the JH gene. Twenty-one percent of the sequences
using JH4 (30 of 143 sequences) were unmutated, whereas 46.5%
of the JH6-carrying sequences were unmutated (223 of 473 sequences). This was statistically significant (p < 0.0001). When
FIGURE 5. Substitution rates in IGHJ6 (n = 250) (a) and IGHJ4 (n = 113) (b). The first two positions in each sequence were not included in the analysis to reduce bias from uncertain definition of the 5′ end of the JH gene when the first or second base was mutated. It is seen that most of the substitutions are in the 5′ end of the genes lying within the CDR3, whereas the part lying in FR4 has a low intrinsic mutability. The high mutability of the CDR3 parts could be explained by a high content of motifs found to have a high substitution rate in the VH region (e.g., TACT, GGTA, CTAC, and CTAC; the targeted nucleotide is set in boldface type), while the FR4 parts are rich in cold spot motifs (e.g., stretches of multiple G or C residues).
Table V. Substitution rates in different motifs in D genes in mutated, nonproductive rearrangements compared with those of the same motifs in
IGHV3-23ᵃ

C Substitutions
          IGHV3-23         D Gene           D Gene       D Gene
          Substitution     No. of           Positions    Substitution
Motif     Rate (%)         Substitutions    Tested       Rate (%)
AGCT      34.6             15               91           16.5
AGCA      23.4             2                36           5.6
TACT      15.9             8                31           25.8
TACC      —                11               47           23.4
TACG      10.2             1                25           4.0
AACC      —                —                —            —
CACT      —                —                —            —
ACCA      —                8                51           15.7
GACT      4.7              0                27           0.0

G Substitutions
          IGHV3-23         D Gene           D Gene       D Gene
          Substitution     No. of           Positions    Substitution
Motif     Rate (%)         Substitutions    Tested       Rate (%)
AGCT      35.2             19               92           20.7
TGCT      —                4                52           7.7
AGTA      —                11               102          10.8
GGTA      11.9             1                24           4.2
CGTA      21.2             —                —            —
GGTT      10.4             11               54           20.4
AGTG      5.9              2                49           4.1
TGGT      5.6              6                122          4.9
AGTC      9.9              —                —            —

A Substitutions
          IGHV3-23         D Gene           D Gene       D Gene
          Substitution     No. of           Positions    Substitution
Motif     Rate (%)         Substitutions    Tested       Rate (%)
GTAG      20.4             8                114          7.0
CTAT      17.7             4                25           16.0
TTAC      13.0             2                34           5.9
CTAC      12.2             5                20           25.0
ATAT      10.0             4                43           9.3
GTAC      10.0             2                14           14.3
ATAG      —                3                22           13.6
AAAT      9.2              —                —            —
CCAC      —                —                —            —
GCAG      3.0              0                46           0.0
CCAG      3.0              3                47           6.4
GGAG      3.0              0                46           0.0

T Substitutions
          IGHV3-23         D Gene           D Gene       D Gene
          Substitution     No. of           Positions    Substitution
Motif     Rate (%)         Substitutions    Tested       Rate (%)
CTAC      11.1             2                17           11.8
ATAG      —                4                31           12.9
GTAA      —                —                —            —
GTAG      13.7             15               125          12.0
ATAT      8.2              1                12           8.3
GTAC      9.8              6                63           9.5
CTAT      7.9              2                31           6.5
ATTT      —                1                32           3.1
GTGG      4.2              4                104          3.9
CTGC      2.2              3                80           3.8
CTGG      6.1              0                32           0.0
CTCC      2.6              —                —            —

ᵃ The substitutions in residues at the ends of D segments tend to be underestimated because substitutions in these residues may change the way the joint region is interpreted.
To compensate for this problem, the two 5′ and the two 3′ nucleotides of each D segment were excluded from the analysis. The boldfaced residue is the one being analyzed
and the motifs are listed after a decreasing substitution rate in C or A residues on the nontranscribed strand, respectively. A dash (—) means that the motif is not found in the
gene(s).
looking at the mutated sequences only (more than three mutations
in the VH region) we also found that the substitution frequency
varied between the two subsets of sequences. JH4-carrying sequences had an average of 18.1 substitutions in the VH region
compared with 15.2 for JH6 (p = 0.03). Substitution rates in the
different hot and cold spot motifs were comparable in sequences
using JH6 and JH4. Also, there was no difference in the ratio between C/G and A/T substitutions in JH6- and JH4-carrying sequences (p = 0.29), suggesting that it was the overall substitution
rate that was decreased.
Discussion
C and G substitution motifs and enzyme specificities
Using the hitherto largest published set of nonfunctionally rearranged, somatically mutated human IgH sequences, we found that
the most mutable motifs for the SHM of C and G residues corresponded well to the previously described WRCY/RGYW four-nucleotide motifs (41–43). It has been claimed that WRCH/DGYW
is an even better predictor for C/G mutability (46);
however, this cannot be supported by our data (targeted nucleotides are set in boldface type). TACA and TGCA, for example,
have mutation rates as low as 1.3 and 3.3%, respectively, which is
lower than average. The discrepancies can be due to the differences
in sample sizes and methods.
The WRCY motifs include the reported WRC deamination preference of AID (19, 20, 44). This is in line with a previous report
based on 25 nonproductive human λ IgL rearrangements and 17
IgH rearrangements (47). Similarly, the reported deamination cold
spots (SYC and SSC) (19, 20, 44) were found to be cold spots for
SHMs in C residues on both strands.
It is interesting to note that the many highly mutable four-nucleotide motifs include a hot spot for C deamination on both
strands, e.g., AGCT, as this provides a simple explanation for the
double-strand breaks shown to appear during the course of SHM
(48, 49). Others, however, have not been able to find a clear correlation between SHM and double-strand breaks in the BL2 cell
line (50).
Replication over an AID-generated uracil can only account for
C/G transitions and, thus, the preferences of other enzymes involved in the mutation process may influence the mutability of the
nucleotides in and around a given motif. UNG or a complex of
MSH2 and MSH6 are proposed to be able to remove the created
uracil, and the sequence specificities of these enzymes may therefore influence the resolution of the U:G mismatch and hence the
mutability of different motifs. Bovine and E. coli UNG have been
shown to have high activity in ATU (51, 52), which corresponds
well with the finding of ATC being a hot spot. However, there are
also some discrepancies because AGU, corresponding to the AGC
mutational hot spot, displays intermediate to low uracil removal
efficiency (51, 52). It is possible that human UNG has a different
nucleotide preference or that MSH2/MSH6 provides the necessary
backup. Substitutions in C and G residues during phase II could
also influence the substitution rates. In support of this, it has been
shown in mice that the inactivation of pol θ reduces the number of
C/G substitutions, particularly in hot spot motifs (36). Another
possible phase II C/G mutator is Rev1. Rev1-deficient mice have
been shown to have fewer C to G and G to C mutations while the
relative frequencies of C to A, T to C, and A to T substitutions
were increased (53), suggesting that — at least in mice — Rev1 is
involved in the generation of several types of phase II
substitutions.
A and T substitutions
A and T substitutions occur only during phase II. Several enzymes
have been shown to be involved, among them some error-prone
DNA polymerases likely to be involved in DNA repair following
the removal of the AID-generated uracil. One such error-prone
polymerase is the translesion pol η. Pol η-deficient mice and patients with variant xeroderma pigmentosum who have a mutation
in the gene encoding pol η display a reduced level of A/T substitution despite a normal overall mutation rate (29, 31, 54). Mouse
pol η is expressed in germinal center B cells (29) and has been
found to interact with MSH2/6 (31), suggesting a possible way of
recruitment. The substitution pattern of mouse and human pol η in
vitro shows a preference for mutations in WA/TW motifs (30),
which corresponds well with our findings that the most mutable A
and T four-nucleotide motifs include the WA and TW motif,
respectively.
Pol ζ and pol ι have also been suggested as being involved in
phase II mutations (32, 33, 35, 55), and their error preferences may
also influence the substitution patterns. Pol ι is, for example,
known for a preference for creating A to G transitions and for
incorporating G and T opposite dUTP and A opposite an abasic
site (56).
Strand symmetry indicates that SHM happens on both strands
Substitution rates in complementary A and T residues showed
strand symmetry, indicating that both strands are targeted by phase
II mutations. Likewise, the correlation between the C substitution
frequency in a particular motif and the G substitution frequency in
the motif corresponding to the same motif on the transcribed strand
strongly suggests that AID can deaminate both strands equally
well during the initial phase of SHM. This is in agreement with
data from Foster et al. and Boursier et al. who found that SHM
could target both strands in the human κ locus (42) and λ locus
(47), respectively. Studies of AID deamination in vitro are, however, contradictory at this point, as some find only deamination of
the nontranscribed strand (19, 57) while others have shown that the
transcribed strand can also be targeted (58–60), although in some
cases to a lesser extent than the nontranscribed strand. This discrepancy could be due to the different experimental ways of detecting deamination in vitro, and the presence of cofactors in vivo
may help the targeting of AID to both strands. One such cofactor,
which has very recently been shown to be involved in targeting
AID in engineered mice, is MSH6 (61). MSH6 thus seems to be
involved in not only phase II but also phase I SHM.
Models to explain correlations between substitutions in T and G
residues and the distance to the nearest 3′ AID target
Interestingly, we found significant inverse correlations between
phase II substitution rates on T and G residues and the distance to
the nearest 3′ WRC AID hot spot. Only nonsignificant or borderline significant trends were found for A and C substitutions. In
contrast, no correlations were found between substitution rates and
the distance to the nearest 5′ WRC. It could be argued that the
inverse correlation to the distance to WRC is an artifact caused by
the location of most WRC sequences in the CDRs. However, the
fact that not only the distance but also the direction is important
shows that this is not the case. These findings suggest that phase II
SHM predominantly targets T and G residues in the AID-targeted
strand 5′ of the deaminated C. Alternatively, phase II substitutions
could target the corresponding A and/or C on the opposite strand.
Because both strands are targeted, both models account for phase
II substitutions in all four nucleotides. These two models are discussed further below.
G and T substitutions 5′ of the AID target suggests involvement
of a 3′-5′ nuclease followed by gap filling
The inverse correlation between the substitution rate in G and T
residues and the distance to the nearest 3′ WRC motif suggests a
molecular mechanism involving a 3′-5′ exonuclease and/or an endonuclease. Such enzyme(s) could be recruited to the abasic site
created at the site of the initial deamination event where it/they
could remove a stretch of DNA 5′ of the abasic site. DNA removal
in turn could be followed by error-prone gap filling.
Several human 3′-5′ exonucleases are known. These include
polymerases δ and ε, WRN, APE1, and MRE11 (62). MRE11
forms the MRN complex along with RAD50 and NBS1. Ectopic
expression of NBS1 increases SHM in a hypermutating Ramos cell
line (63), suggesting that MRN is involved. This is further supported by the finding that MRE11 binds to a rearranged VH region
only in mutating cells and that recombinant MRE11/RAD50 can
cleave abasic sites in ssDNA (64). The ability to cleave DNA is
separable from the 3′-5′ activity (64) and it is possible that both
functions are important for SHM. APE1 is also capable of DNA
cleavage at abasic sites; however, APE1 does not bind to the VH region (64), arguing against an involvement in SHM.
EXO1-deficient mice have normal mutation frequencies but
their mutations are C/G biased and hot spot focused (34), suggesting a possible involvement of EXO1 in phase II SHM. EXO1 binds
to the VH region, but not the C region in hypermutating BL-2 cells
(34). However, EXO1 is a 5′-3′ exonuclease and therefore does
not fit into this model unless it also has 3′-5′ exonuclease or endonuclease activity as previously suggested (65). Alternatively, its
involvement in SHM may not be as a nuclease.
As mentioned earlier, several error-prone DNA polymerases
have been suggested as being involved in phase II mutations including polymerases θ, η, ι, and ζ (29, 31, 54, 32, 33, 55). These
could be involved in gap filling following DNA removal. However, to account for the finding of a correlation only between T and
G substitution rates and distances to WRC, the involved polymerase(s) would have to make mistakes mainly opposite A and C
residues.
An alternative explanation that easily accounts for the strong
correlation between substitutions in T residues but a less strong
correlation for A substitutions is that a large fraction of the T/A
substitutions could be caused by the occasional incorporation of
dUTP (instead of dTTP) opposite A during phase II repair. The
occasional incorporation of dUTP as a means of generating SHMs
has been suggested by Neuberger et al. (22). According to their
model, the incorporated dUTP would subsequently be excised
and substitutions would be generated during replication over
the abasic site.
Phase II substitution on the opposite strand
As mentioned above, the finding of inverse correlation between T and
G substitution rates and the distance to the nearest 3′ WRC can also
be explained if the main targets for phase II mutations are C and A
residues on the strand opposite the AID target. This could, for example, be the case if the generated uracil is either removed to
generate an abasic site or is left untouched until replication. During
replication, the abasic site/uracil could cause the DNA polymerase
to stall and recruit an error-prone translesion polymerase. In fact,
all of the error-prone polymerases implicated in SHM (θ, η, ι, and
ζ) are known translesion polymerases. The translesion polymerase
would be predicted to be engaged opposite the targeted C and
introduce errors while synthesizing a short DNA segment. If errors
are preferentially introduced opposite T and G residues, this model
would explain our findings. This is, for example, the case for polymerase ι, which has been shown to preferentially incorporate G
opposite T, leading to a T to C transition on the AID-targeted
strand 5′ of the targeted C (56).
Influence by substitutions on the neighboring nucleotides
Substitution pattern of T residue preceding a run of G residues
The substitution pattern of the T residue in the last position of
codon 15 is remarkable because the frequency is very high, it has
almost exclusively T to G transversions, and it is outside any
known SHM hot spot motifs. The adjacent run of six G residues
contains almost no substitutions but a high frequency of 1–3 nucleotide insertions likely to be caused by Taq polymerase slippage.
However, comparison of the numbers of substitutions in the T
residue in the mutated (62 in 383 sequences) and unmutated (2 in
263 sequences), nonproductive sequences clearly shows that Taq
errors are not the cause of the unusual substitution pattern. Substitutions in T or A residues preceding the other runs of at least
four G residues are also predominantly to G. The JH genes also
have a run of four G residues, but this region is found to contain
very few mutations in all sequences.
Runs of three or four G residues in G-rich motifs are known to
be able to form G quartets when single stranded (66). G quartets,
are for example, formed in the G-rich nontranscribed strand of the
switch region during CSR, and AID is found to bind to them (67).
It can be hypothesized that the runs of G residues in the variable
region also fold into G quartets during the transcription-dependent,
single-stranded phase of SHM. Although AID may bind to the
quartets, the activity of the enzyme may be inhibited, which would
account for the low substitution rate. How this can lead to a T to
G transversion in the flanking base is unknown. One possibility is
that the quartets attract other proteins. GQN1 is a human endonuclease highly expressed in B cells and has been shown to cleave
D and JH gene substitutions
We found that substitution rates in motifs in the VH region were
comparable to the substitution rates found in the same motifs in the
D and VH genes. It is noteworthy that the two JH genes analyzed
contain very few C/G mutational hot spot motifs in the region
encoding FR4 while the regions encoding CDR3 have several hot
spots, for example four overlapping TACT motifs in JH6 creating
hot spots for T, A, and C mutations. Hot spots are also common in
the D genes contributing to the CDR3. In contrast, the FR4 regions
encoded by the JH genes contain many cold spot motifs and
showed very low substitution rates. This suggests that there has
been an evolutionary selection against mutational hot spots in the
FR4 region. That would be in line with earlier studies suggesting
that the codon usage in CDR regions of IgV has been optimized for
SHM (9, 10, 68).
The substitution rate depends on the JH gene
Although substitution patterns in the JH genes are the same as in
the VH region, we find that the mutation status of a rearrangement
partly depends on the JH gene. JH6-carrying sequences are less
likely to be mutated than JH4-carrying sequences and, when mutated, they contain fewer substitutions on average. The mutation
pattern does not seem to change but the overall frequency is decreased. The fact that the difference was found in nonproductive
rearrangements makes the simplest explanation, namely that the
cells with a JH6-containing rearrangement constituted a special
cell subset containing fewer mutations, very unlikely, because to
account for the observed findings rearrangements on both alleles
would then have to use the same JH gene. This is not thought to be
the case.
Rearrangements using JH6 tend to have longer CDR3 loops than
rearrangements with, for example, JH4 (69) (L. Ohm-Laursen
et al., manuscript in preparation) and CDR3s are longer among
unmutated sequences compared with mutated (69 –71 and S.
Petersen and T. Barington, manuscript in preparation). This corresponds well with the finding of more JH6 sequences in the unmutated subset. However, even when the mutation analysis is restricted to rearrangements within a narrow range of CDR3 lengths
(44 –52 bp), we still find that the JH6-carrying sequences are less
likely to be mutated and, when mutated, are significantly less mutated than JH4-carrying sequences (data not shown). Thus, the
length of the CDR3 does not seem to account for the changed
mutation frequency.
Therefore, we suggest that rearrangements using JH6 have special properties influencing the mutation rate. Perhaps a binding site
for a cofactor is located within the intronic region upstream of JH6
and is therefore deleted when JH6 is used. Also, it is possible that
JH6 is simply too close to the regulatory elements in the 3′ intronic
enhancer (Eμ) (72, 73) for optimal effect.
Another possible regulator could be the E box motif 5′-CAGGTG-3′, which is
known to bind the regulatory E47 protein (74). This motif is found in the 3′ end
of JH1, JH2, JH4, and JH5 but not in JH3 and JH6, where the last nucleotide of
the motif is exchanged to an A. When inserted into the κ locus, the
5′-CAGGTG-3′ motif has been shown to enhance SHM in transgenic mice without
changing the mutation pattern (74). Mutation of the E box motif to
5′-AAGGTG-3′ decreases this effect. Furthermore, inactivation of the E2A gene
in the DT40 chicken B cell line reduces SHM. Mutations can be restored by the
expression of either of the E2A splice variants, E47 or E12, showing the
importance of these proteins for the level of SHM (75). Because the
5′-CAGGTA-3′ motifs found in JH3 and JH6 deviate from the consensus
5′-CANNTG-3′ E-box motif, we hypothesize that this one-nucleotide difference
may be involved in reducing the mutational load of JH6-carrying sequences
compared with JH4-carrying sequences. Regardless of the cause, the finding that
the mutation frequency depends on the type of JH gene has implications for
affinity maturation and fine tuning of the repertoire, as JH6 and JH4 are the
two most commonly used JH genes in the repertoire (76).
We found that substitutions in C and G residues in hot spot motifs decreased
the substitution frequency of the neighboring G/C residue. This can be explained
if the SHM machinery does not normally make substitutions in neighboring
nucleotides during the same cell cycle, because substitutions in the hot spot
motifs most often lead to a less mutable motif; for example, AGCA (mutated in
23.8% of the motifs) becomes AACA (8% mutation). It is also possible that
substitutions do occur on both strands during the same round of SHM but that
the two strands subsequently end up in sister cells before the substitutions
are fixed.
Despite the previously shown inverse correlation between the mutation rate of a
given nucleotide and the distance to the nearest 3′ WRC, we find that mutations
in a given AGCT position (included in the WRC/GYW motif) do not influence the
overall mutation distribution in the sequence. This suggests that the C/G
nucleotide that undergoes the initial deamination step (index nucleotide)
leading to phase II SHM is sometimes repaired, while on other occasions the
mutation is fixed during phase II. If the index nucleotide mutation were always
fixed during phase II, we would expect to find more mutations 5′ of the index
nucleotide in the sequences mutated in the index; conversely, if the index were
always repaired, we would expect to find fewer mutations.
DNA 2–5 nucleotides upstream of G quartets (66). This endonuclease may be
involved in the cleavage of the DNA leading to SHM, although the observed
substitution pattern is not readily explained.
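
The E-box comparison made in the discussion (the consensus 5′-CANNTG-3′ against the 5′-CAGGTG-3′ motif of JH1, JH2, JH4, and JH5, the 5′-CAGGTA-3′ motif of JH3 and JH6, and the mutated 5′-AAGGTG-3′ motif) can be illustrated with a minimal sketch. The function and variable names below are hypothetical and are not part of the analysis reported in this paper. Only the hexamers quoted in the text are used, and N is assumed to mean any of A, C, G, or T.

```python
import re

# Consensus E-box motif as quoted in the text: 5'-CANNTG-3'.
E_BOX_CONSENSUS = "CANNTG"


def consensus_to_regex(consensus):
    """Turn a consensus string into a regex; 'N' is assumed to mean any nucleotide."""
    pattern = "".join("[ACGT]" if base == "N" else base for base in consensus)
    return re.compile(pattern)


E_BOX = consensus_to_regex(E_BOX_CONSENSUS)


def matches_e_box(hexamer):
    """Return True if the hexamer fits the CANNTG consensus exactly."""
    return E_BOX.fullmatch(hexamer.upper()) is not None


def mismatch_positions(a, b):
    """Return 0-based positions at which two equal-length motifs differ."""
    if len(a) != len(b):
        raise ValueError("motifs must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Hexamers taken from the text; the labels are illustrative only.
motifs = {
    "JH1/JH2/JH4/JH5": "CAGGTG",
    "JH3/JH6": "CAGGTA",
    "mutated E box": "AAGGTG",
}

for label, motif in motifs.items():
    print(label, motif, "fits consensus:", matches_e_box(motif))
```

In this sketch only the CAGGTG hexamer fits the consensus; CAGGTA differs from it at the last position alone, which is the one-nucleotide difference referred to in the text. The sketch says nothing about E47 or E12 binding, which has to be established experimentally.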
Concluding remarks
The Ig repertoire is known to be shaped by the SHM of the variable regions
during an immune response. In this study we report that the mutation machinery
operates equally well on both strands. The substitution frequency of a given
residue is dependent on the motif in which it resides and the distance to the
nearest 3′ AID deamination hot spot, suggesting that phase II substitutions
occur 5′ of the site of the initial deamination. Alternatively, phase II
substitutions occur on the opposite strand 3′ of the G residue facing the
AID-targeted C residue. Substitutions in the neighboring nucleotide also
influence the substitution frequency of C and G in AGCT double hot spots. Motifs
are the same in VH, D, and JH genes; however, the JH gene of the rearrangement
influences the overall mutation frequency, because JH6-using rearrangements are
found to contain fewer mutations than JH4-using rearrangements.
The sequences in this study use the IGHV3-23*01 VH gene, and it can therefore be
argued that the findings may be special to this VH gene. However, when possible
we have confirmed the results by analysis of a set of sequences using the
IGHV3-h pseudogene. Also, previous work from many groups suggests that the
mutation process is similar irrespective of which VH genes have been studied. We
therefore suggest that the results presented in this paper are also applicable
to other human VH genes.
Acknowledgment
We are grateful to Nina Eggers for excellent technical assistance.
Disclosures
The authors have no financial conflict of interest.
References
1. Peters, A., and U. Storb. 1996. Somatic hypermutation of immunoglobulin genes
is linked to transcription initiation. Immunity 4: 57–65.
2. Fukita, Y., H. Jacobs, and K. Rajewsky. 1998. Somatic hypermutation in the
heavy chain locus correlates with transcription. Immunity 9: 105–114.
3. Delpy, L., C. Sirac, C. Le Morvan, and M. Cogne. 2004. Transcription-dependent
somatic hypermutation occurs at similar levels on functional and nonfunctional
rearranged IgH alleles. J. Immunol. 173: 1842–1848.
4. Betz, A. G., C. Milstein, A. Gonzalez-Fernandez, R. Pannell, T. Larson, and
M. S. Neuberger. 1994. Elements regulating somatic hypermutation of an
immunoglobulin κ gene: critical role for the intron enhancer/matrix attachment
region. Cell 77: 239–248.
5. Terauchi, A., K. Hayashi, D. Kitamura, Y. Kozono, N. Motoyama, and
T. Azuma. 2001. A pivotal role for DNase I-sensitive regions 3b and/or 4 in the
induction of somatic hypermutation of IgH genes. J. Immunol. 167: 811–820.
6. Kodama, M., R. Hayashi, H. Nishizumi, F. Nagawa, T. Takemori, and H. Sakano.
2001. The PU.1 and NF-EM5 binding motifs in the Igkappa 3′ enhancer are
responsible for directing somatic hypermutations to the intrinsic hot spots in the
transgenic V(kappa) gene. Int. Immunol. 13: 1415–1422.
7. Rada, C., and C. Milstein. 2001. The intrinsic hypermutability of antibody heavy
and light chain genes decays exponentially. EMBO J. 20: 4570–4576.
8. Rada, C., J. Yelamos, W. Dean, and C. Milstein. 1997. The 5′ hypermutation
boundary of κ chains is independent of local and neighbouring sequences and
related to the distance from the initiation of transcription. Eur. J. Immunol. 27:
3115–3120.
9. Wagner, S. D., C. Milstein, and M. S. Neuberger. 1995. Codon bias targets
mutation. Nature 376: 732.
10. Cowell, L. G., H. J. Kim, T. Humaljoki, C. Berek, and T. B. Kepler. 1999.
Enhanced evolvability in immunoglobulin V genes under somatic hypermutation.
J. Mol. Evol. 49: 23–26.
11. Muramatsu, M., K. Kinoshita, S. Fagarasan, S. Yamada, Y. Shinkai, and
T. Honjo. 2000. Class switch recombination and hypermutation require
activation-induced cytidine deaminase (AID), a potential RNA editing enzyme. Cell
102: 553–563.
12. Revy, P., T. Muto, Y. Levy, F. Geissmann, A. Plebani, O. Sanal, N. Catalan,
M. Forveille, R. Dufourcq-Labelouse, A. Gennery, et al. 2000. Activation-induced
cytidine deaminase (AID) deficiency causes the autosomal recessive form
of the Hyper-IgM syndrome (HIGM2). Cell 102: 565–575.
13. Arakawa, H., J. Hauschild, and J. M. Buerstedde. 2002. Requirement of the
activation-induced deaminase (AID) gene for immunoglobulin gene conversion.
Science 295: 1301–1306.
14. Martin, A., P. D. Bardwell, C. J. Woo, M. Fan, M. J. Shulman, and M. D. Scharff.
2002. Activation-induced cytidine deaminase turns on somatic hypermutation in
hybridomas. Nature 415: 802–806.
15. Petersen-Mahrt, S. K., R. S. Harris, and M. S. Neuberger. 2002. AID mutates E.
coli suggesting a DNA deamination mechanism for antibody diversification.
Nature 418: 99–104.
16. Yoshikawa, K., I. M. Okazaki, T. Eto, K. Kinoshita, M. Muramatsu, H. Nagaoka,
and T. Honjo. 2002. AID enzyme-induced hypermutation in an actively
transcribed gene in fibroblasts. Science 296: 2033–2036.
17. Okazaki, I. M., K. Kinoshita, M. Muramatsu, K. Yoshikawa, and T. Honjo. 2002.
The AID enzyme induces class switch recombination in fibroblasts. Nature 416:
340–345.
18. Bransteitter, R., P. Pham, M. D. Scharff, and M. F. Goodman. 2003.
Activation-induced cytidine deaminase deaminates deoxycytidine on single-stranded DNA
but requires the action of RNase. Proc. Natl. Acad. Sci. USA 100: 4102–4107.
19. Pham, P., R. Bransteitter, J. Petruska, and M. F. Goodman. 2003. Processive
AID-catalysed cytosine deamination on single-stranded DNA simulates somatic
hypermutation. Nature 424: 103–107.
20. Larijani, M., D. Frieder, W. Basit, and A. Martin. 2005. The mutation spectrum
of purified AID is similar to the mutability index in Ramos cells and in ung(−/−)
msh2(−/−) mice. Immunogenetics 56: 840–845.
21. Rada, C., J. M. Di Noia, and M. S. Neuberger. 2004. Mismatch recognition and
uracil excision provide complementary paths to both Ig switching and the
A/T-focused phase of somatic mutation. Mol. Cell 16: 163–171.
22. Neuberger, M. S., J. M. Di Noia, R. C. Beale, G. T. Williams, Z. Yang, and
C. Rada. 2005. Somatic hypermutation at A.T pairs: polymerase error versus
dUTP incorporation. Nat. Rev. Immunol. 5: 171–178.
23. Rada, C., G. T. Williams, H. Nilsen, D. E. Barnes, T. Lindahl, and
M. S. Neuberger. 2002. Immunoglobulin isotype switching is inhibited and
somatic hypermutation perturbed in UNG-deficient mice. Curr. Biol. 12:
1748–1755.
24. Imai, K., G. Slupphaug, W. I. Lee, P. Revy, S. Nonoyama, N. Catalan, L. Yel,
M. Forveille, B. Kavli, H. E. Krokan, H. D. Ochs, et al. 2003. Human uracil-DNA
glycosylase deficiency associated with profoundly impaired immunoglobulin
class-switch recombination. Nat. Immunol. 4: 1023–1028.
25. Rada, C., M. R. Ehrenstein, M. S. Neuberger, and C. Milstein. 1998. Hot spot
focusing of somatic hypermutation in MSH2-deficient mice suggests two stages
of mutational targeting. Immunity 9: 135–141.
26. Martomo, S. A., W. W. Yang, and P. J. Gearhart. 2004. A role for Msh6 but not
Msh3 in somatic hypermutation and class switch recombination. J. Exp. Med.
200: 61–68.
27. Shen, H. M., A. Tanaka, G. Bozek, D. Nicolae, and U. Storb. 2006. Somatic
hypermutation and class switch recombination in Msh6−/−Ung−/−
double-knockout mice. J. Immunol. 177: 5386–5392.
28. Matsuda, T., K. Bebenek, C. Masutani, I. B. Rogozin, F. Hanaoka, and
T. A. Kunkel. 2001. Error rate and specificity of human and murine DNA
polymerase eta. J. Mol. Biol. 312: 335–346.
29. Zeng, X., D. B. Winter, C. Kasmer, K. H. Kraemer, A. R. Lehmann, and
P. J. Gearhart. 2001. DNA polymerase eta is an A-T mutator in somatic
hypermutation of immunoglobulin variable genes. Nat. Immunol. 26: 537–541.
30. Rogozin, I. B., Y. I. Pavlov, K. Bebenek, T. Matsuda, and T. A. Kunkel. 2001.
Somatic mutation hot spots correlate with DNA polymerase eta error spectrum.
Nat. Immunol. 2: 530–536.
31. Martomo, S. A., W. W. Yang, R. P. Wersto, T. Ohkumo, Y. Kondo, M. Yokoi,
C. Masutani, F. Hanaoka, and P. J. Gearhart. 2005. Different mutation signatures
in DNA polymerase eta- and MSH6-deficient mice suggest separate roles in
antibody diversification. Proc. Natl. Acad. Sci. USA 102: 8656–8661.
32. Diaz, M., L. K. Verkoczy, M. F. Flajnik, and N. R. Klinman. 2001. Decreased
frequency of somatic hypermutation and impaired affinity maturation but intact
germinal center formation in mice expressing antisense RNA to DNA polymerase
ζ. J. Immunol. 167: 327–335.
33. Zan, H., A. Komori, Z. Li, A. Cerutti, A. Schaffer, M. F. Flajnik, M. Diaz, and
P. Casali. 2001. The translesion DNA polymerase ζ plays a major role in Ig and
bcl-6 somatic hypermutation. Immunity 14: 643–653.
34. Bardwell, P. D., C. J. Woo, K. Wei, Z. Li, A. Martin, S. Z. Sack, T. Parris,
W. Edelmann, and M. D. Scharff. 2004. Altered somatic hypermutation and
reduced class-switch recombination in exonuclease 1-mutant mice. Nat. Immunol.
5: 224–229.
35. Faili, A., S. Aoufouchi, E. Flatter, Q. Gueranger, C. A. Reynaud, and J. C. Weill.
2002. Induction of somatic hypermutation in immunoglobulin genes is dependent
on DNA polymerase iota. Nature 419: 944–947.
36. Masuda, K., R. Ouchida, A. Takeuchi, T. Saito, H. Koseki, K. Kawamura,
M. Tagawa, T. Tokuhisa, T. Azuma, and J. Wang. 2005. DNA polymerase theta
contributes to the generation of C/G mutations during somatic hypermutation of
Ig genes. Proc. Natl. Acad. Sci. USA 102: 13986–13991.
37. Ohm-Laursen, L., M. Nielsen, S. R. Larsen, and T. Barington. 2006. No evidence
for the use of DIR, D-D fusions, chromosome 15 open reading frames or VH
replacement in the peripheral repertoire was found when applying an improved
algorithm, JointML, to 6329 human IgH rearrangements. Immunology 119:
265–277.
38. Ohm-Laursen, L., S. R. Larsen, and T. Barington. 2005. Identification of two new
alleles, IGHV3-23*04 and IGHJ6*04, and the complete sequence of the
IGHV3-h pseudogene in the human immunoglobulin locus and their prevalences
in Danish caucasians. Immunogenetics 57: 621–627.
39. Lefranc, M. P., V. Giudicelli, C. Ginestoux, J. Bodmer, W. Muller, R. Bontrop,
M. Lemaitre, A. Malik, V. Barbie, and D. Chaume. 1999. IMGT, the international
immunogenetics database. Nucleic Acids Res. 27: 209–212.
40. Lam, K. P., R. Kühn, and K. Rajewsky. 1997. In vivo ablation of surface
immunoglobulin on mature B cells by inducible gene targeting results in rapid cell
death. Cell 90: 1073–1083.
41. Rogozin, I. B., and N. A. Kolchanov. 1992. Somatic hypermutagenesis in
immunoglobulin genes, II: influence of neighbouring base sequences on
mutagenesis. Biochim. Biophys. Acta 1171: 11–18.
42. Foster, S. J., T. Dorner, and P. E. Lipsky. 1999. Somatic hypermutation of VκJκ
rearrangements: targeting of RGYW motifs on both DNA strands and preferential
selection of mutated codons within RGYW motifs. Eur. J. Immunol. 29:
4011–4021.
43. Dorner, T., H. P. Brezinschek, R. I. Brezinschek, S. J. Foster, R. Domiati-Saad,
and P. E. Lipsky. 1997. Analysis of the frequency and pattern of somatic
mutations within nonproductively rearranged human variable heavy chain genes.
J. Immunol. 158: 2779–2789.
44. Yu, K., F. T. Huang, and M. R. Lieber. 2004. DNA substrate length and
surrounding sequence affect the activation-induced deaminase activity at cytidine.
J. Biol. Chem. 279: 6496–6500.
45. Pavlov, Y. I., I. B. Rogozin, A. P. Galkin, A. Y. Aksenova, F. Hanaoka, C. Rada,
and T. A. Kunkel. 2002. Correlation of somatic hypermutation specificity and
A-T base pair substitution errors by DNA polymerase eta during copying of a
mouse immunoglobulin κ light chain transgene. Proc. Natl. Acad. Sci. USA 99:
9954–9959.
46. Rogozin, I. B., and M. Diaz. 2004. Cutting edge: DGYW/WRCH is a better
predictor of mutability at G:C bases in Ig hypermutation than the widely accepted
RGYW/WRCY motif and probably reflects a two-step activation-induced
cytidine deaminase-triggered process. J. Immunol. 172: 3382–3384.
47. Boursier, L., W. Su, and J. Spencer. 2004. Analysis of strand biased ‘G’.C
hypermutation in human immunoglobulin V(λ) gene segments suggests that both
DNA strands are targets for deamination by activation-induced cytidine
deaminase. Mol. Immunol. 40: 1273–1278.
48. Sale, J. E., and M. S. Neuberger. 1998. TdT-accessible breaks are scattered over
the immunoglobulin V domain in a constitutively hypermutating B cell line.
Immunity 9: 859–869.
49. Bross, L., Y. Fukita, F. McBlane, C. Demolliere, K. Rajewsky, and H. Jacobs.
2001. DNA double-strand breaks in immunoglobulin genes undergoing somatic
hypermutation. Immunity 13: 589–597.
50. Faili, A., S. Aoufouchi, Q. Gueranger, C. Zober, A. Leon, B. Bertocci,
J. C. Weill, and C. A. Reynaud. 2006. AID-dependent somatic hypermutation
occurs as a DNA single-strand event in the BL2 cell line. Nat. Immunol. 39:
815–821.
51. Eftedal, I., P. H. Guddal, G. Slupphaug, G. Volden, and H. E. Krokan. 1993.
Consensus sequences for good and poor removal of uracil from double stranded
DNA by uracil-DNA glycosylase. Nucleic Acids Res. 21: 2095–2101.
52. Eftedal, I., G. Volden, and H. E. Krokan. 1994. Excision of uracil from
double-stranded DNA by uracil-DNA glycosylase is sequence specific. Ann. NY Acad.
Sci. 726: 312–314.
53. Jansen, J. G., P. Langerak, A. Tsaalbi-Shtylik, P. van den Berk, H. Jacobs, and
N. de Wind. 2006. Strand-biased defect in G/C transversions in hypermutating
immunoglobulin genes in Rev1-deficient mice. J. Exp. Med. 203: 319–323.
54. Zeng, X., G. A. Negrete, C. Kasmer, W. W. Yang, and P. J. Gearhart. 2004.
Absence of DNA polymerase η reveals targeting of C mutations on the
nontranscribed strand in immunoglobulin switch regions. J. Exp. Med. 199: 917–924.
55. Delbos, F., A. De Smet, A. Faili, S. Aoufouchi, J. C. Weill, and C. A. Reynaud.
2005. Contribution of DNA polymerase η to immunoglobulin gene
hypermutation in the mouse. J. Exp. Med. 201: 1191–1196.
56. Zhang, Y., X. Yuan, X. Wu, and Z. Wang. 2000. Preferential incorporation of G
opposite template T by the low-fidelity human DNA polymerase ι. Mol. Cell.
Biol. 20: 7099–7108.
57. Martomo, S. A., D. Fu, W. W. Yang, N. S. Joshi, and P. J. Gearhart. 2005.
Deoxyuridine is generated preferentially in the nontranscribed strand of DNA
from cells expressing activation-induced cytidine deaminase. J. Immunol. 174:
7787–7791.
58. Chaudhuri, J., M. Tian, C. Khuong, K. Chua, E. Pinaud, and F. W. Alt. 2003.
Transcription-targeted DNA deamination by the AID antibody diversification
enzyme. Nature 422: 726–730.
59. Besmer, E., E. Market, and F. N. Papavasiliou. 2006. The transcription elongation
complex directs activation-induced cytidine deaminase-mediated DNA
deamination. Mol. Cell. Biol. 26: 4378–4385.
60. Shen, H. M., S. Ratnam, and U. Storb. 2005. Targeting of the activation-induced
cytosine deaminase is strongly influenced by the sequence and structure of the
targeted DNA. Mol. Cell. Biol. 25: 10815–10821.
61. Li, Z., C. Zhao, M. D. Iglesias-Ussel, Z. Polonskaya, M. Zhuang, G. Yang,
Z. Luo, W. Edelmann, and M. D. Scharff. 2006. The mismatch repair protein
Msh6 influences the in vivo AID targeting to the Ig locus. Immunity 24: 393–403.
62. Shevelev, I. V., and U. Hübscher. 2002. The 3′-5′ exonucleases. Nat. Rev. Mol.
Cell. Biol. 3: 1–12.
63. Yabuki, M., M. M. Fujii, and N. Maizels. 2005. The MRE11-RAD50-NBS1
complex accelerates somatic hypermutation and gene conversion of
immunoglobulin variable regions. Nat. Immunol. 6: 730–736.
64. Larson, E. D., W. J. Cummings, D. W. Bednarski, and N. Maizels. 2005. MRE11/
RAD50 cleaves DNA in the AID/UNG-dependent pathway of immunoglobulin
gene diversification. Mol. Cell 20: 367–375.
65. Genschel, J., L. R. Bazemore, and P. Modrich. 2002. Human Exonuclease I Is
Required for 5′ and 3′ Mismatch Repair. J. Biol. Chem. 277: 13302–13311.
66. Sun, H., A. Yabuki, and N. Maizels. 2001. A human nuclease specific for G4
DNA. Proc. Natl. Acad. Sci. USA 98: 12444–12449.
67. Duquette, M. L., P. Pham, M. F. Goodman, and N. Maizels. 2005. AID binds to
transcription-induced structures in c-MYC that map to regions associated with
translocation and hypermutation. Oncogene 24: 5791–5798.
68. Monson, N. L., T. Dorner, and P. E. Lipsky. 2000. Targeting and selection of
mutations in human Vλ rearrangements. Eur. J. Immunol. 30: 1597–1605.
69. Rosner, K., D. B. Winter, R. E. Tarone, G. L. Skovgaard, V. A. Bohr, and
P. J. Gearhart. 2001. Third complementarity-determining region of mutated VH
immunoglobulin genes contains shorter V, D, J, P, and N components than
nonmutated genes. Immunology 103: 179–187.
70. Luger, E., M. Lamers, G. Achatz-Straussberger, R. Geisberger, D. Infuhr,
M. Breitenbach, R. Crameri, and G. Achatz. 2001. Somatic diversity of the
immunoglobulin repertoire is controlled in an isotype-specific manner. Eur. J.
Immunol. 31: 2319–2330.
71. Brezinschek, H. P., S. J. Foster, T. Dorner, R. I. Brezinschek, and P. E. Lipsky.
1998. Pairing of variable heavy and variable kappa chains in individual naive and
memory B cells. J. Immunol. 160: 4762–4767.
72. Morvan, C. L., E. Pinaud, C. Decourt, A. Cuvillier, and M. Cogne. 2003. The
immunoglobulin heavy-chain locus hs3b and hs4 3′ enhancers are dispensable for
VDJ assembly and somatic hypermutation. Blood 102: 1421–1427.
73. Bottaro, A., F. Young, J. Chen, M. Serwe, F. Sablitzky, and F. W. Alt. 1998.
Deletion of the IgH intronic enhancer and associated matrix-attachment regions
decreases, but does not abolish, class switching at the μ locus. Int. Immunol. 10:
799–806.
74. Michael, N., H. M. Shen, S. Longerich, N. Kim, A. Longacre, and U. Storb. 2003.
The E box motif CAGGTG enhances somatic hypermutation without enhancing
transcription. Immunity 19: 235–242.
75. Schoetz, U., M. Cervelli, Y. D. Wang, P. Fiedler, and J. M. Buerstedde. 2006.
E2A expression stimulates Ig hypermutation. J. Immunol. 177: 395–400.
76. Wasserman, R., Y. Ito, N. Galili, M. Yamada, B. A. Reichard, S. Shane,
B. Lange, and G. Rovera. 1992. The pattern of joining (JH) gene usage in the
human IgH chain is established predominantly at the B precursor cell stage.
J. Immunol. 149: 511–516.