The Journal of Immunology

Analysis of 6912 Unselected Somatic Hypermutations in Human VDJ Rearrangements Reveals Lack of Strand Specificity and Correlation between Phase II Substitution Rates and Distance to the Nearest 3′ Activation-Induced Cytidine Deaminase Target1

Line Ohm-Laursen2 and Torben Barington3

J Immunol 2007; 178:4322-4334; doi: 10.4049/jimmunol.178.7.4322 http://www.jimmunol.org/content/178/7/4322 Copyright © 2007 by The American Association of Immunologists. All rights reserved. Print ISSN: 0022-1767 Online ISSN: 1550-6606. Downloaded from http://www.jimmunol.org/ by guest on July 12, 2017.

Somatic hypermutations (SHMs)4 in the form of nucleotide substitutions, insertions, and deletions are found throughout the variable regions of Ig rearrangements in postgerminal center B cells. SHM is followed by selection against unfavorable mutations and for high affinity binding to Ag, and the process is necessary for the generation of high affinity Abs and B cell memory.
SHM is dependent on several cis-acting elements, including the promoter and the elements of the Ig enhancer regions (1–6). The promoter and enhancer requirement is likely to be due to a requirement for transcription, although other properties of the enhancers, e.g., protein binding, are likely to be involved as well (6). The transcription dependence is furthermore supported by the fact that the mutation frequency decays exponentially from a starting point ~150–200 bp from the promoter (7, 8). The 3′ boundary is in the intron region downstream of the J genes and no mutations are normally found in the constant domain. In contrast, the entire variable domain is targeted by SHM. The CDRs have been described as being more prone to mutation than the framework regions (FRs), and this has been attributed to the presence of more RGYW hot spot motifs (9, 10).

Department of Clinical Immunology, Odense University Hospital, Denmark. Received for publication July 13, 2006. Accepted for publication January 8, 2007. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
1 This study was supported by Danish Medical Research Council Grant 22-01-0156.
2 Current address: University of Oxford, The Peter Medawar Building for Pathogen Research, South Parks Road, Oxford, U.K.
3 Address correspondence and reprint requests to Prof. Torben Barington, Department of Clinical Immunology, Odense University Hospital, 5000 Odense C, Denmark. E-mail address: [email protected]
4 Abbreviations used in this paper: SHM, somatic hypermutation; AID, activation-induced cytidine deaminase; CSR, class switch recombination; EXO1, exonuclease I; FR, framework region; MSH, MutS homolog; pol, DNA polymerase; UNG, uracil DNA glycosylase.
The only trans-acting factor described to be absolutely mandatory for SHM is activation-induced cytidine deaminase (AID). AID is required not only for SHM but also for class switch recombination (CSR) and gene conversion (11–13). Ectopic expression of AID can turn on mutation and CSR in human hybridomas, Escherichia coli, and murine fibroblasts (14–17), proving that it is the only B cell-specific factor necessary for SHM. AID is a cytidine deaminase shown to be able to deaminate cytidine residues in ssDNA, in particular in WRC motifs (18–20). It has also been suggested that AID could be involved in SHM by modulation of the mRNA of an involved protein (11). According to the current model for SHM (21, 22), the process is initiated by cytidine deamination by AID. The targeted sequence is thought to be single stranded because of the ongoing transcription from the Ig promoter. The generated uracil can either be replicated over, generating a C to T transition in the sister cell (or G to A if the transcribed strand is targeted) (phase I), or it can be removed. Uracil DNA-glycosylase (UNG) and a complex of MutS homologs (MSH) 2 and 6 (MSH2 and MSH6) have both been described as being capable of uracil removal. Deletion of or mutation in the murine and human UNG gene changes the mutation pattern of G and C residues almost exclusively to transitions (21, 23, 24). MSH2/MSH6 deficiency leads to impaired mutation of A and T residues (21, 25, 26), and UNGMSH6 double knock-out mice have almost exclusively C to T and

The initial event of somatic hypermutation (SHM) is the deamination of cytidine residues by activation-induced cytidine deaminase (AID). Deamination is followed by the replication over uracil and/or different error-prone repair events. We sequenced 659 nonproductive human IgH rearrangements (IGHV3-23*01) from blood B lymphocytes enriched for CD27-positive memory cells.
Analyses of 6,912 unique, unselected substitutions showed that in vivo hot and cold spots for the SHM of C and G residues corresponded closely to the target preferences reported for AID in vitro. A detailed analysis of all possible four-nucleotide motifs present on both strands of the VH gene showed significant correlations between the substitution frequencies in reverse complementary motifs, suggesting that the SHM machinery targets both strands equally well. An analysis of individual JH and D gene segments showed that the substitution frequencies in the individual motifs were comparable to the frequencies found in the VH gene. Interestingly, JH6-carrying sequences were less likely to undergo SHM (average 15.2 substitutions per VH region) than sequences using JH4 (18.1 substitutions, p = 0.03). We also found that the substitution rates in G and T residues correlated inversely with the distance to the nearest 3′ WRC AID hot spot motif on both the nontranscribed and transcribed strands. This suggests that phase II SHM takes place 5′ of the initial AID deamination target and primarily targets T and G residues or, alternatively, the corresponding A and C residues on the opposite strand. The Journal of Immunology, 2007, 178: 4322–4334.

motifs but, as for phase I C/G mutations, we find no sign of strand specificity. Interestingly, the phase II substitution rates in T and G showed an inverse correlation to the distance to the nearest 3′ WRC, indicating that phase II mutations predominately occur in T and G residues 5′ of the initial AID-deaminated cytidine residue or, alternatively, in the corresponding A and C residues on the opposite strand.

Materials and Methods

FIGURE 1. Number of substitutions in the VH region of 659 nonproductive rearrangements. Only the 386 sequences with more than three substitutions were used in the mutation analysis to reduce the influence of Taq errors.

FIGURE 2.
Substitution rates in the different positions of the IGHV3-23*01 VH gene segment in 386 somatically hypermutated, nonproductive rearrangements. A total of 6,912 substitutions were analyzed. FR/CDR boundaries are indicated according to ImMunoGeneTics (IMGT) nomenclature.

G to A transitions, i.e., phase I mutations (27). A and T residues are targeted during phase II of SHM. Phase II involves resolution of the abasic site generated by uracil removal and has been suggested to involve repair by error-prone DNA polymerases such as polymerase (pol) and pol and the involvement of exonuclease I (EXO1) (28–35). Pol has been shown to be involved in phase II C/G mutation (36). An alternative hypothesis suggests that incorporation of dUTP may be the cause of A/T mutations (22).

In this study we have analyzed 6,912 substitutions in 386 mutated, nonproductive human H chain rearrangements using IGHV3-23*01 and 542 substitutions in 56 mutated rearrangements using the IGHV3-h pseudogene. This high number of nonproductive sequences enables us to study nonselected mutations with good statistical power. We show that mutations in C and G residues can be assigned to AID deamination equally targeted to both strands. The substitution rate depends on the motifs in which the nucleotide is found and the recognized motifs are at least four nucleotides long. The substitution rates in A and T residues also depend on the

A material consisting of 659 nonproductive and 5,670 productive rearrangements of IGHV3-23 rearranged to predominately IGHJ4 and IGHJ6 was collected and validated as described (37) (European Molecular Biology Laboratory (EMBL) accession nos. AM076988–AM083316). In brief, 100 ml of blood was collected from 28 healthy, adult volunteers after informed consent (the study was approved by the Ethics Committee for Funen and Vejle Counties, Denmark).
Donors had been selected to be homozygous for IGHV3-23*01 and IGHJ6*02, which are the most frequent genotypes in the Danish Caucasian population (genotype frequency 0.64 and 0.51, respectively) (see Ref. 38). DNA was purified from magnetic bead-isolated memory B cells (MACS B cell isolation kit II followed by CD27 isolation (Miltenyi Biotec via Biotech Line)) using the QIAamp blood DNA mini kit (Qiagen via VWR International). IgH rearrangements were PCR amplified with 3-23cn9.F (5′-CTGAGCTGGCTTTTTTTCTTGTG-3′) as the forward primer and either JH4.R (5′-GCCGCTGTTGCCTCAGG-3′) or JH6.R1 (5′-CCCACAGGCAGTAGCAGAA-3′) as the reverse primer. PCR products were cloned using the TOPO TA cloning kit (Invitrogen Life Technologies), and plasmid DNA was purified using the Wizard SV 9600 plasmid purification system (Promega via Ramcon). The rearrangements were sequenced with the BigDye Terminator kit (Applied Biosystems) on an ABI Prism 3100 genetic analyzer (Applied Biosystems). A set of 103 rearrangements using the IGHV3-h pseudogene was generated in a similar way (37) (EMBL accession nos. AM282702–AM282804). The CDR3 regions were analyzed for D genes, P and N nucleotides, and trimming using the JointML algorithm as described elsewhere (37). An online version of the algorithm can be found at www.cbs.dtu.dk/services/VDJsolver. Statistical analyses were performed using the Analyze-it addition to Microsoft Excel.

SHM OCCUR IN AND 5′ OF AID HOT SPOTS ON BOTH STRANDS

Table I.
VH substitutions in 386 nonproductive IGHV3-23*01 sequences with more than three substitutions in the VH regiona

From     To A            To T            To C            To G            Transitions     Transversions   Percentage mutated (%)b
A                        0.061c (424)    0.052 (359)     0.118 (816)     0.118 (816)     0.113 (783)     6.8
T        0.046 (316)                     0.089 (615)     0.054 (375)     0.089 (615)     0.100 (691)     5.6
C        0.040 (273)     0.130 (895)                     0.093 (645)     0.130 (895)     0.133 (918)     6.7
G        0.166 (1148)    0.042 (290)     0.109 (756)                     0.166 (1148)    0.151 (1046)    6.1
Total                                                                    0.503 (3474)    0.497 (3438)    6.3

a The fraction of each type of substitution among all substitutions is given as well as the absolute numbers (parentheses). The overall transition/transversion ratio was 1.01. The rightmost column shows the substitution rates for each of the four nucleotides and the overall substitution rate.
b Substitution rates varied from 5.6% for T to 6.8% for A, which is statistically significant (p < 0.0001, χ² test).
c Fraction of the given substitution of all substitutions.

Results

FIGURE 3. Mutation rates given as the log transformed fraction of 386 nonproductive IGHV3-23*01 VH sequences (codons 1–107) carrying a specific substitution type (of three possible) in individual C (a), G (b), A (c), and T (d) residues. For clarity, the individual nucleotides have been placed in the order of declining total substitution rates along the x-axis. For all four nucleotides, transitions (light gray marks) predominate. However, for transversions two distinct patterns are seen. For C and G residues, transversions to the complementary nucleotides (black marks) are more common than transversions to the noncomplementary nucleotides (dark gray marks), while both types of transversions are equally frequent in A and T residues. *, The T residue in position three in codon 15 with a remarkably high level of T to G transversions.

0.0015 substitutions per nucleotide (37). Furthermore, 153 deletions and 70 insertions were found in the VH regions, accounting for 2 and 1% of the total mutations, respectively.
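The summary figures of Table I can be rechecked from its twelve substitution counts. The following is a minimal sketch and not part of the original analysis; the variable and function names are illustrative, and the indel percentages assume that "total mutations" means substitutions plus insertions plus deletions.

```python
# Substitution counts taken from Table I: (from, to) -> number of substitutions.
COUNTS = {
    ("A", "T"): 424, ("A", "C"): 359, ("A", "G"): 816,
    ("T", "A"): 316, ("T", "C"): 615, ("T", "G"): 375,
    ("C", "A"): 273, ("C", "T"): 895, ("C", "G"): 645,
    ("G", "A"): 1148, ("G", "T"): 290, ("G", "C"): 756,
}
PURINES = {"A", "G"}


def is_transition(src, dst):
    """A transition stays within purines (A, G) or within pyrimidines (C, T)."""
    return (src in PURINES) == (dst in PURINES)


def summarize(counts):
    """Return total substitutions, transitions, transversions, and their ratio."""
    total = sum(counts.values())
    transitions = sum(n for (s, d), n in counts.items() if is_transition(s, d))
    transversions = total - transitions
    return total, transitions, transversions, transitions / transversions


total, ts, tv, ratio = summarize(COUNTS)
print(total, ts, tv, round(ratio, 2))  # 6912 3474 3438 1.01

# Assumption: indel percentages are relative to substitutions + deletions + insertions.
deletions, insertions = 153, 70
all_mutations = total + deletions + insertions
print(round(100 * deletions / all_mutations, 1), round(100 * insertions / all_mutations, 1))
```

The recomputed ratio agrees with the transition/transversion ratio of 1.01 stated in the table footnote.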
Insertions and deletions will be described in detail elsewhere (T. Barington and L. Ohm-Laursen, manuscript in preparation). A further 855 substitutions were found in the JH regions and 213 in the D regions of IGHV3-23-using sequences. The VH region of the IGHV3-h-using sequences contained 542 substitutions in total.

To exclude the possibility that some of the rearrangements used other VH genes, all sequences were thoroughly compared with all known IGHV alleles in the ImMunoGeneTics (IMGT) database (http://imgt.cines.fr) (39). We always found maximal identity with IGHV3-23*01 or IGHV3-h*01, respectively. Some donors frequently had a specific mutation in IGHD3-3, suggesting the existence of a new D gene allele. However, this could not be confirmed by sequencing of the IGHD3-3 germline gene (data not shown). No other mutations in the IGHD genes or IGHJ genes indicated the presence of new alleles.

Fig. 2 shows the substitution frequencies in the different positions in the VH region in the 386 nonproductive IGHV3-23*01 rearrangements. The overall substitution frequency per nucleotide in the VH region was 6.3%, varying from 5.6% for T residues to 6.8% for A residues (see Table I). The transition/transversion ratio was 1.01. In the mutated productive sequences the transition/transversion ratio was significantly higher (1.30, p < 0.0001) and the mutation rate lower (5.0%, p = 0.0026). This was largely due to a lower rate of replacement mutation yielding a lower replacement-to-silent mutation ratio (2.53 vs 4.02, p < 0.0001) that is indicative of

The data set consisted of 659 nonproductive and 5,670 productive IGHV3-23*01 rearrangements. Sequences were considered nonproductive when the V-D-JH joint changed the normal reading frame of the JH segment or when the D segment contained one or more stop codons not resulting from SHM (i.e., germline encoded).
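The two criteria for calling a rearrangement nonproductive can be expressed as a small predicate. This is only a sketch under stated assumptions: the function names and inputs are hypothetical, the joint analysis itself (done with JointML in the paper) is taken as given, and the germline D nucleotides are assumed to be supplied already aligned to the reading frame of the rearrangement.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_stop_codon(seq, frame=0):
    """True if any complete codon of seq, read from offset `frame`, is a stop codon."""
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return any(codon in STOP_CODONS for codon in codons)


def is_nonproductive(jh_in_frame, germline_d_in_frame):
    """Hypothetical classifier for the two criteria described in the text.

    jh_in_frame: True if the V-D-JH joint preserves the JH reading frame.
    germline_d_in_frame: germline-encoded D nucleotides, starting at a codon
        boundary of the rearrangement (so stops found here are not due to SHM).
    """
    return (not jh_in_frame) or has_stop_codon(germline_d_in_frame)


print(is_nonproductive(False, "GGG"))     # out-of-frame joint
print(is_nonproductive(True, "GGGTAG"))   # in frame, germline stop codon in D
print(is_nonproductive(True, "GGTAGG"))   # in frame, TAG is not in frame
```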
Because the mutations in nonproductive sequences have not been selected for Ag binding, the mutations were expected to represent the intrinsic preferences of the mutation mechanism. Great care was taken to remove clonally related sequences from the material (37) and hence the mutations are expected to have arisen independently. All sequences contained exon 2 of the VH gene starting in codon 1 or further upstream and continued through the 3′ end of the JH gene. The precise germline sequences of the VH gene (IGHV3-23*01) and the JH6 allele (IGHJ6*02) were known, because all donors had been typed and found homozygous in the respective loci (see Ref. 38). The VDJ joints were analyzed using the JointML algorithm, which we have shown to be currently the best algorithm for identifying D genes (37).

Of the 659 nonproductive rearrangements, 185 were unmutated in the VH region (codons 1–107) and therefore excluded. To minimize interference from Taq errors in the analyses, those sequences with one to three mutations were also excluded (88 sequences). The remaining 386 sequences contained from four to 70 mutations (median 14, average 16.5) in the VH region, excluding insertions and deletions (see Fig. 1). Altogether, these sequences contained 6,912 substitutions. Less than 2.4% of these substitutions were expected to be Taq errors based on an estimated error rate of 0.00048–

Table II. Substitution rates of the boldfaced C/G (top portion) and A/T (bottom portion) residue in the given four-nucleotide motifs in 386 nonproductive, mutated rearrangements (codons 1 through 107)a C/G Substitutions in the Nontranscribed Strand Substitutions in C Substitutions in G Substitution Rates (%) No.
of Motifs Tested Total C to T C to G C to A AGCT* AGCG AGCA TGCC* TGCT* TACT* CGCT TACC* AACC* TACG ACCT ATCT AGCC* CACC ATCC AACA TTCT TGCA CACT GACT TTCC ACCC CGCA TGCG GGCT TCCT TTCA TTCG CACG CGCC CACA GGCA CTCA CTCC TCCC TCCA GACA GGCC ACCA TACA ACCG GTCT TCCG CTCT CTCG GACC GCCC CCCC CCCA GCCT GCCA GCCG CCCT GTCC GGCG 301 — 110 73 — 76 40 — — 26 30 51 108 51 — 42 — 60 — 33 13 — 11 11 32 11 20 — 17 10 6 — 7 28 6 16 10 10 4 4 — 8 8 6 — — — — — 4 2 3 1 2 — 871 — 470 334 — 477 268 — — 255 308 580 1367 648 — 618 — 1007 — 696 284 — 288 299 996 345 644 — 594 362 240 — 325 1363 368 999 674 723 295 298 — 661 695 669 — — — — — 906 606 947 356 717 — 34.6 — 23.4 21.9 — 15.9 14.9 — — 10.2 9.7 8.8 7.9 7.9 — 6.8 — 6.0 — 4.7 4.6 — 3.8 3.7 3.2 3.2 3.1 — 2.8 2.8 2.5 — 2.2 2.1 1.6 1.6 1.5 1.4 1.4 1.3 — 1.2 1.2 0.9 — — — — — 0.4 0.3 0.3 0.3 0.3 — 14.5 — 13.0 11.7 — 7.6 7.5 — — 6.3 4.9 3.5 5.1 3.2 — 3.4 — 3.8 — 1.3 2.5 — 2.1 2.0 1.8 1.5 1.6 — 1.9 2.2 0.8 — 1.5 0.8 0.8 0.7 0.9 0.6 1.0 1.0 — 0.6 0.9 0.5 — — — — — 0.2 0.2 0.2 0.0 0.1 — 17.2 — 5.1 4.8 — 5.0 6.0 — — 2.0 3.3 4.1 1.8 3.7 — 2.3 — 1.1 — 2.9 1.8 — 1.0 1.3 1.3 0.3 0.8 — 0.8 0.3 0.8 — 0.3 0.8 0.3 0.7 0.3 0.4 0.0 0.3 — 0.3 0.0 0.3 — — — — — 0.2 0.0 0.0 0.0 0.1 — 2.9 — 5.3 5.4 — 3.4 1.1 — — 2.0 1.6 1.2 1.0 0.9 — 1.1 — 1.1 — 0.6 0.4 — 0.7 0.3 0.1 1.5 0.8 — 0.2 0.3 0.8 — 0.3 0.4 0.5 0.2 0.3 0.4 0.3 0.0 — 0.3 0.3 0.2 — — — — — 0.0 0.2 0.1 0.3 0.0 — Substitution Rates (%) Motif No. of Substitutions No. 
of Motifs Tested Total G to A G to C G to T AGCT* CGCT TGCT GGCA* AGCA* AGTA* AGCG GGTA* GGTT* CGTA AGGT AGAT GGCT* GGTG GGAT TGTT AGAA TGCA AGTG AGTC GGAA GGGT TGCG CGCA AGCC AGGA TGAA CGAA CGTG GGCG TGTG TGCC TGAG GGAG GGGA TGGA TGTC GGCC TGGT TGTA CGGT AGAC CGGA AGAG CGAG GGTC GGGC GGGG TGGG AGGC TGGC CGGC AGGG GGAC CGCC 309 71 — — 92 — — 87 34 59 8 — 76 27 21 22 19 32 52 35 8 47 7 21 39 47 17 7 10 — 7 6 17 22 8 26 — 8 57 24 4 17 — 10 2 6 4 12 7 3 — 3 5 0 1 879 299 — — 452 — — 731 328 279 355 — 1040 600 305 350 304 979 879 354 327 1076 295 298 1298 340 636 244 349 — 657 267 907 1007 710 1005 — 721 1013 242 303 1000 — 672 305 1035 712 2176 1313 627 — 350 1034 297 353 35.2 23.8 — — 20.4 — — 11.9 10.4 21.2 2.3 — 7.3 4.5 6.9 6.3 6.3 3.3 5.9 9.9 2.5 4.4 2.4 7.1 3.0 13.8 2.7 2.9 2.9 — 1.1 2.3 1.9 2.2 1.1 2.6 — 1.1 5.6 9.9 1.3 1.7 — 1.5 0.7 0.6 0.6 0.6 0.5 0.5 — 0.9 0.5 0.0 0.3 17.4 13.0 — — 8.9 — — 8.2 4.0 11.5 1.1 — 4.4 2.7 3.6 3.7 2.3 1.8 3.3 4.2 1.8 2.4 1.7 4.7 1.1 3.5 2.0 2.1 0.9 — 0.5 1.1 0.7 0.3 1.0 1.9 — 0.1 3.5 4.6 0.0 0.7 — 0.3 0.7 0.3 0.3 0.3 0.1 0.2 — 0.6 0.4 0.0 0.3 15.5 6.0 — — 10.4 — — 1.2 5.5 6.1 0.9 — 1.9 1.3 2.6 1.4 4.0 1.0 1.7 5.7 0.3 1.6 0.7 1.7 1.5 9.7 0.6 0.8 1.2 — 0.3 0.8 0.9 1.9 0.0 0.3 — 0.4 1.1 2.9 1.0 0.7 — 1.0 0.0 0.2 0.3 0.1 0.4 0.2 — 0.0 0.1 0.0 0.0 2.3 5.0 — — 1.1 — — 2.5 0.9 3.6 0.3 — 1.0 0.5 0.7 1.1 0.0 0.4 0.9 0.0 0.3 0.4 0.0 0.7 0.5 0.6 0.0 0.0 0.9 — 0.3 0.4 0.3 0.0 0.1 0.4 — 0.6 1.1 2.5 0.3 0.3 — 0.2 0.0 0.1 0.0 0.1 0.1 0.2 — 0.3 0.0 0.0 0.0 A/T Substitutions in the Nontranscribed Strand Substitutions in A Substitutions in T Substitution Rates (%) Motif No. of Substitutions No. 
of Nucleotide Tested Total A to G A to T A to C GTAT CTAT GTAG TTAC CTAC GAAT GTAA CAAT ATAC GTAC ATAT 106 65 60 31 29 — — 31 22 32 25 520 367 388 238 238 — — 291 214 319 250 20.4 17.7 15.5 13.0 12.2 — — 10.7 10.3 10.0 10.0 6.7 7.1 10.1 5.0 4.2 — — 4.8 4.2 6.3 4.8 7.8 6.0 3.4 3.8 3.8 — — 2.4 3.7 1.9 3.6 5.8 4.6 2.1 4.2 4.2 — — 3.4 2.3 1.9 1.6 Substitution Rates (%) Motif No. of Substitutions No. of Nucleotides Tested Total T to C ATAC ATAG CTAC GTAA GTAG ATTC TTAC ATTG GTAT GTAC ATAT 47 — 26 — 52 75 26 — 38 31 20 239 — 235 — 380 641 233 — 452 318 245 19.4 — 11.1 — 13.7 11.7 11.2 — 8.4 9.8 8.2 7.5 — 6.8 — 6.3 5.3 4.7 — 4.4 2.8 4.5 T to A T to G 5.4 6.7 — — 2.6 1.7 — — 3.2 4.2 3.7 2.7 3.0 3.4 — — 1.6 2.4 3.5 3.5 2.9 0.8 (Table continues) Downloaded from http://www.jimmunol.org/ by guest on July 12, 2017 Motif No. of Substitutions 4326 SHM OCCUR IN AND 5⬘ OF AID HOT SPOTS ON BOTH STRANDS Table II. (Continued) A/T Substitutions in the Nontranscribed Strand Substitutions in A Substitutions in T Substitution Rates (%) No. 
of Nucleotide Tested Total A to G A to T A to C CTAA AAAT CAAG CAAA GAAC ATAG CCAT GAAA TTAG AAAG GAAG CCAA TAAT GCAT TCAC ACAT ACAA CCAC GGAC GCAA CGAA TCAG AGAG TGAA ACAC AGAA GCAC TCAT CCAG GCAG GGAG CAAC CGAG TGAG TAAA GGAA GGAT ACAG AGAC AGAT — 29 28 25 50 — 44 15 35 10 40 17 — — 32 11 14 — 13 12 10 7 26 24 22 10 6 — 32 33 30 — 9 25 — 8 7 12 19 — — 314 318 304 619 — 608 212 549 160 696 312 — — 651 250 321 — 310 296 247 176 688 643 643 295 183 — 1057 1096 1015 — 312 915 — 327 291 616 1002 — — 9.2 8.8 8.2 8.1 — 7.2 7.1 6.4 6.3 5.8 5.5 — — 4.9 4.4 4.4 — 4.2 4.1 4.1 4.0 3.8 3.7 3.4 3.4 3.3 — 3.0 3.0 3.0 — 2.9 2.7 — 2.5 2.1 2.0 1.9 — — 5.7 7.2 6.3 5.5 — 3.3 4.3 2.9 5.0 4.5 3.9 — — 2.2 2.0 3.1 — 2.6 2.7 2.0 1.7 2.0 2.5 2.3 2.4 2.2 — 2.0 2.0 1.6 — 1.3 1.2 — 1.8 1.7 1.0 1.5 — — 1.9 0.9 1.0 0.7 — 2.6 0.5 2.4 0.6 0.7 1.0 — — 1.2 1.2 0.3 — 0.3 0.7 0.8 0.0 0.0 0.0 0.8 0.7 0.6 — 0.5 0.6 0.6 — 1.0 0.6 — 0.3 0.0 0.7 0.2 — — 1.6 0.6 1.0 1.9 — 1.3 2.4 1.1 0.6 0.6 0.6 — — 1.5 1.2 0.9 — 1.3 7.0 1.2 2.3 1.7 1.2 0.3 0.3 0.6 — 0.6 0.4 0.8 — 0.6 1.0 — 0.3 0.7 0.3 0.2 — Substitution Rates (%) Motif No. of Substitutions No. 
of Nucleotides Tested Total T to C T to A T to G TTAG ATTT CTTG TTTG GTTC CTAT ATGG TTTC CTAA CTTT CTTC TTGG ATTA ATGC GTGA ATGT TTGT GTGG GTCC TTGC TTCG CTGA CTCT TTCA GTGT TTCT GTGC ATGA CTGG CTGC CTCC GTTG CTCG CTCA TTTA TTCC ATCC CTGT GTCT ATCT 53 — 19 — 26 26 — — — 18 — 22 26 12 10 — — 44 10 — — 22 12 11 — — 27 20 100 6 36 10 — 11 8 13 — 12 10 10 567 — 342 — 318 328 — — — 296 — 685 482 230 343 — — 1038 725 — — 704 675 635 — — 982 622 1634 271 1371 339 — 329 300 284 — 1134 663 539 9.3 — 5.6 — 8.2 7.9 — — — 6.1 — 3.2 5.4 5.2 2.9 — — 4.2 1.4 — — 3.1 1.8 1.7 — — 2.8 3.2 6.1 2.2 2.6 3.0 — 3.3 2.7 4.6 — 1.1 1.5 1.9 6.7 — 4.7 — 4.7 4.6 — — — 4.4 — 1.3 1.9 4.4 1.5 — — 1.8 0.4 — — 1.9 0.7 0.5 — — 1.5 1.6 0.8 1.9 2.0 1.5 — 1.8 1.7 3.5 — 0.7 0.9 0.9 1.1 — 0.3 — 1.3 2.1 — — — 1.0 — 0.3 1.9 0.9 0.3 — — 1.5 0.4 — — 0.9 0.2 0.3 — — 0.7 0.5 0.4 0.0 0.2 0.9 — 0.3 0.3 0.0 — 0.4 0.5 0.0 1.6 — 0.6 — 2.2 1.2 — — — 0.7 — 1.6 1.7 0.0 1.2 — — 1.0 0.6 — — 0.4 0.9 0.9 — — 0.5 1.1 5.0 0.4 0.4 0.6 — 1.2 0.7 1.1 — 0.0 0.2 0.9 a All possible four-nucleotide motifs found in the IGHV3–23*01 germline gene are included in the table. Substitutions in C or A are given in the left half of the table and substitutions in the corresponding G or T in the reverse complementary motif are given in the right half of the table. The total substitution rate as well as the substitution rates to each of the three possible nucleotides are given. A dash (—) means that the motif is not found in the IGHV3–23*01 germline gene. Motifs were not considered if nucleotides other than the boldfaced nucleotide were mutated in the given motif in the given sequence. The motifs are listed according to decreasing total substitution frequency in C or A or, in case the motif containing C or A was not present, in G or T, respectively. An asterisk (ⴱ) indicates that the motif is contained within the WRCY/RGYW motif. selection and highlights the importance of using nonselected sequences to study the mechanism of SHM. 
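The counting rule stated in the footnote to Table II (a motif instance is skipped when any nucleotide other than the targeted one is mutated) can be sketched as follows. This is an illustrative reimplementation, not the authors' code; it assumes each sequence has the same length as the germline and contains substitutions only, and the toy germline string is invented for the example.

```python
def motif_substitution_rate(germline, sequences, motif, target):
    """Count substitutions in the targeted nucleotide of a germline motif.

    target is the 0-based index of the targeted nucleotide within the motif.
    A motif instance in a given sequence is only counted as tested when all
    other motif positions are unmutated in that sequence.
    Returns (substitutions, motifs_tested, rate).
    """
    k = len(motif)
    starts = [i for i in range(len(germline) - k + 1) if germline[i:i + k] == motif]
    tested = mutated = 0
    for seq in sequences:
        for i in starts:
            window = seq[i:i + k]
            flanks_intact = all(window[j] == motif[j] for j in range(k) if j != target)
            if not flanks_intact:
                continue  # a neighboring position is mutated: instance not considered
            tested += 1
            if window[target] != motif[target]:
                mutated += 1
    rate = mutated / tested if tested else float("nan")
    return mutated, tested, rate


# Toy data (hypothetical): the germline carries AGCT twice.
germline = "AAGCTAAGCTA"
sequences = [
    "AAGCTAAGCTA",  # unmutated
    "AAGTTAAGCTA",  # C to T in the first AGCT
    "AATCTAAGCTA",  # G mutated in the first AGCT, so that instance is skipped
]
print(motif_substitution_rate(germline, sequences, "AGCT", target=2))
```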
To confirm that the mutations in the nonproductive rearrangements were indeed unselected, we studied mutations (including insertions and deletions) abrogating the open reading frame of the VH segment. Only 46 (1%) of 3701 mutated, productive rearrangements contained one or more stop codons. These were likely to result from Taq errors because B cells lacking a functional Ag receptor are rapidly lost from the circulation and should therefore not appear in our material (40). In contrast, as many as 171 (44%, p < 0.0001 when compared with productive rearrangements) of the 386 nonproductive sequences contained stop codon(s) in the VH segment. This significantly higher proportion was not different from that found in rearrangements of the pseudogene IGHV3-h (45%, p = 1.00), supporting the notion that substitutions in the nonproductive IGHV3-23*01-derived sequences are indeed unselected.

Substitution in different nucleotides

Fig. 3 shows the distribution of substitution rates for all C (Fig. 3a), G (Fig. 3b), A (Fig. 3c), and T (Fig. 3d) positions, respectively, divided into the three different substitution types. The curves suggest that the substitutions in C and G residues were caused by a mechanism with a very high preference for some positions (substitution rates >30%) and a low preference for other positions (substitution rates <1%). In general, transitions predominated followed by transversions to the complementary nucleotide. In contrast, the substitution rates in the A and T positions were less variable and the rates for the two types of transversions were comparable. The graphs for C and G have similar courses and so have the graphs for A and T, suggesting that the mutation mechanism is strand symmetric.
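The comparison of stop-codon frequencies between productive and nonproductive rearrangements reported above can be illustrated with a 2x2 test. The text does not say which test produced this p value, so the Pearson chi-square used below (without continuity correction) is an assumption, and the function names are illustrative.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_1df(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Counts from the text: sequences with and without stop codon(s) in the VH segment.
productive_stop, productive_total = 46, 3701
nonproductive_stop, nonproductive_total = 171, 386

chi2 = chi_square_2x2(
    productive_stop, productive_total - productive_stop,
    nonproductive_stop, nonproductive_total - nonproductive_stop,
)
print(round(100 * productive_stop / productive_total),
      round(100 * nonproductive_stop / nonproductive_total))  # 1 44
print(chi2, p_value_1df(chi2))
```

With these counts the p value is far below 0.0001, in line with the significance level given in the text.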
C and G substitution rates in different motifs correlate closely to the reported hot spot and cold spot motifs for cytidine deamination by AID

To further investigate strand specificity, we compared the substitution rates in different motifs with those of the reverse complementary motifs when both were present in the nontranscribed strand of the germline gene. If the mutations are targeted to both strands by similar mechanisms, this method should show a correlation between the mutation rates because the mutation of a given residue in a motif on the nontranscribed strand should

A and T substitution rates in different motifs

For A/T substitutions, the 17 most mutated four-nucleotide motifs (see Table II, bottom portion) contained WA/TW mutations previously described as A/T SHM hot spots (30, 45). No motifs had an A/T substitution rate of <1%, suggesting that there are no A/T mutational cold spots. In IGHV3-h sequences the A/T mutation frequency tended to be lower than in IGHV3-23. This was also the case for the C/G mutation frequency, indicating that IGHV3-h is less targeted by SHM than IGHV3-23. However, the three most mutable A/T motifs were the same in the two VH genes.

FIGURE 4. Correlation analysis for substitution rates (log transformed) in reverse complementary motifs on the transcribed and nontranscribed strand for C/G (a) and A/T (b) motifs, respectively. Only positions with more than five substitutions are shown in the figure. Strong correlations between substitution rates in reverse complementary motifs were found for both C/G motifs (p < 0.0001, R = 0.86, Pearson’s correlation analysis) and A/T (p < 0.0001, R = 0.83).

Correlation between substitution rates in reverse complementary motifs

The apparent correlation between the substitution rates in reverse complementary motifs prompted a closer analysis. Fig.
4a shows a significant correlation between the total substitution rates in C residues in position 3 in four-nucleotide motifs and in G residues in the corresponding reverse complementary motifs (p < 0.0001, Pearson’s correlation analysis), and Fig. 4b shows that the same is true for A/T substitutions (p < 0.0001). These data show that reverse complementary motifs were targeted equally well, indicating that the SHM machinery targeted individual motifs similarly on the two strands.

Substitutions in and around runs of G residues

Seven areas in the FRs contain 3–6 G residues in a row and showed a particularly low degree of substitution, which correlates well with CCC/GGG being a cold spot. An interesting observation, however, was the substitution pattern of the T residue in position 3 of codon 15 immediately adjacent to a run of six G residues. This residue had a very high substitution frequency of 22.5%, the highest for any T residue (marked by an asterisk (*) in Fig. 3d). Ninety-three percent of the substitutions were transversions to G, a substitution type that only accounted for 23% of substitutions in other T residues (p < 0.0001, Fisher’s exact test). Only two of 262 (0.8%) sequences with fewer than three mutations in the VH region

correspond to the mutation of the same nucleotide in the same motif on the transcribed strand. We analyzed all possible three-, four-, and five-nucleotide motifs and substitutions in all positions within these motifs. The strongest correlations between substitutions in the same position of the same motif on the two strands were found for C and A residues in position 3 in four-nucleotide motifs on the nontranscribed strand (compared with G and T, respectively, in position 2 in the reverse complementary motifs of the same strand) (data not shown). Table II (top portion) shows the substitution rates in all pairs of reverse complementary C/G motifs.
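The pairing used in Table II, in which a C (or A) in position 3 of a four-nucleotide motif is compared with the G (or T) in position 2 of the reverse complementary motif, follows directly from reverse complementation. A minimal sketch with illustrative function names:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(motif):
    """Reverse complement of a DNA motif written 5' to 3'."""
    return motif.translate(COMPLEMENT)[::-1]


def partner(motif, position):
    """Return the reverse complementary motif and the 1-based position in it
    that base-pairs with the given 1-based position of `motif`."""
    return reverse_complement(motif), len(motif) - position + 1


for c_motif in ("AGCT", "AGCG", "TGCC", "TACG"):
    g_motif, g_pos = partner(c_motif, 3)
    print(c_motif, "C at position 3 pairs with", g_motif[g_pos - 1], "at position", g_pos, "of", g_motif)
```

For example, the C motif TGCC is paired with the G motif GGCA, and AGCT is its own reverse complement, which matches the motif pairs listed side by side in Table II.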
Motifs are ranked by declining total substitution rate in the C residue or the corresponding G in case the C motif was not found on the nontranscribed strand. For a given motif, only sequences in which all motif positions other than the nucleotide in question were unmutated were included, to minimize the risk of looking at an influence from neighboring substitutions. With this restriction, ~70% of the 6,912 detected substitutions were included in the analysis. The 10 most mutable motifs (that all have a substitution rate of >10% for C and/or G) included six of the seven RGYW/WRCY (where R is A or G, Y is C or T, and W is T or A) motifs present in the sequence. RGYW/WRCY has earlier been defined as hot spot motifs for SHM in vivo (41–43). The last RGYW/WRCY motifs (AGCC/GGCT) had mutation rates of 7.8% and 7.3%, respectively, indicating that RGYW/WRCY is indeed good at predicting high mutability (the targeted nucleotides of the four-nucleotide motifs are set in boldface type). WRCY includes WRC, which has been found to be a hot spot for AID deamination of the C residue (19, 20, 44), and hence we find that there is a good correlation between deamination hot spots and C/G SHM hot spots. The other four highly mutable motifs (CGCT with the G targeted, AGCA, TACG, and CGCT with the C targeted) only deviate from RGYW/WRCY in position 1 or 4. Noticeably, the first three of these motifs are WRC/GYW motifs, suggesting that the first three nucleotides within the motifs are most important for the mutability. SYC (where S is G or C) and SSC have been described as deamination cold spots (19, 20, 44) and, with one exception (GGTC), the 12 motifs with a substitution rate of <1% were all compatible with these or the reverse complementary motifs. The substitution rates in these motifs were only marginally above the Taq error rate (0.00048–0.0015 substitutions per nucleotide).
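The hot spot and cold spot definitions used above are written with IUPAC ambiguity codes, so assigning a four-nucleotide motif to a class is a pattern-matching exercise. The sketch below is an illustration under stated assumptions: the class labels and function names are invented here, the targeted C is taken to be in position 3, and G motifs (target in position 2) are classified through their reverse complement.

```python
import re

# IUPAC codes used in the text: W = A/T, R = A/G, Y = C/T, S = G/C.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "[AT]", "R": "[AG]", "Y": "[CT]", "S": "[CG]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def iupac(pattern):
    """Compile an IUPAC motif such as 'WRCY' into a regular expression."""
    return re.compile("".join(IUPAC[ch] for ch in pattern))


WRCY, WRC, SYC, SSC = (iupac(p) for p in ("WRCY", "WRC", "SYC", "SSC"))


def classify_c_motif(motif):
    """Classify a four-nucleotide motif whose targeted C is in position 3."""
    if WRCY.fullmatch(motif):
        return "WRCY hot spot"
    core = motif[:3]  # the three nucleotides ending in the targeted C
    if WRC.fullmatch(core):
        return "WRC hot spot"
    if SYC.fullmatch(core) or SSC.fullmatch(core):
        return "SYC/SSC cold spot"
    return "other"


def classify_g_motif(motif):
    """Classify a motif whose targeted G is in position 2, via its reverse complement."""
    return classify_c_motif(motif.translate(COMPLEMENT)[::-1])


for m in ("AGCT", "AGCA", "TACG", "CGCT", "GCCT"):
    print(m, classify_c_motif(m))
print("GGTC", classify_g_motif("GGTC"))
```

Under this scheme CGCT with the C targeted falls in the SSC class despite its high substitution rate, and GGTC with the G targeted falls outside both cold spot classes, which corresponds to the two exceptions noted in the text.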
Furthermore, those of the 24 possible SYCx and SSCx motifs (or the 24 reverse complementary motifs) that were present in the sequence had substitution rates of <3.9%. One exception was CGCT with a substitution rate of 14.9%; however, this motif only deviates from the hot spot WRCY in position 1.

A set of rearrangements using IGHV3-h was also analyzed. Fifty-six of 103 sequences had more than three mutations in the VH region and were thus included in the substitution analysis. IGHV3-h is a pseudogene due to a disturbed translation initiation codon and thus these sequences have not undergone selection following SHM. Generally the mutation frequencies in the different motifs were lower than in the similar motifs in the IGHV3-23-using rearrangements. ATCC, TACT, AGCT, and AACT were the only motifs where the C mutation frequency was >10%. The last three are included in the consensus WRCY hot spot motif. ATCC and AGCT are both present only once in the sequence, leading to considerable uncertainty in the relatively small sample. All possible combinations of SYC and SSC present in the germline sequence were also found to have a mutation frequency of <1% in IGHV3-h.

Table III. Substitution rates in the C and G nucleotides (boldfaced) of the four germline encoded AGCT motifs present in the 386 nonproductive IGHV3-23*01 rearrangements studieda

                         AGCT (C targeted)                 AGCT (G targeted)
Location                 Mutated   Unmutated   Rate (%)    Mutated   Unmutated   Rate (%)
Codon 4/3 (FR1)          85        290         22.7        64        312         17.0
Codon 32 (CDR1)          80        286         21.9        144       219         39.7
Codon 40 (FR2)           86        279         23.6        130       233         35.8
Codon 55 (FR2/CDR2)      227       134         62.9        174       191         47.7

a AGCT motifs affected by deletions or insertions were excluded. The columns show the number of motifs that were either mutated or unmutated in the given position as well as the substitution rates. For both the C and the G of AGCT the substitution rates varied significantly between the four locations (p < 0.0001, χ² test).
Also, the C and the G substitution rates within the same motif were found to differ significantly for the last three motifs (p < 0.0005, pairwise t tests).

had a substitution in this T residue (one to G and one to C). This is significantly lower than in mutated sequences (p < 0.0001), indicating that Taq errors cannot account for the high substitution rate. In the IGHV3-h-using sequences the substitution rate in the corresponding residue was also remarkably high (14.6%) and again consisted of mostly T to G transversions. Also, the T residue in the last position of codon 7 preceding a run of five G residues showed predominantly T to G transversions (75%), but at a somewhat lower substitution rate (5.3%).

Substitution rates when the motif is present more than once

Many of the motifs are present more than once in the IGHV3-23 gene. The highly mutable AGCT motif, for example, is present four times, but many other four-base motifs are present up to six times. As seen in Table III, the substitution rates of C and G in the individual AGCT motifs are clearly different without any obvious relation to the location in the gene, i.e., whether the position is in a FR or a CDR. Nor is there any correlation between the substitution rates of C on the nontranscribed and transcribed strands (G on the nontranscribed strand) within the same motif. When comparing individual substitution rates for most of the other motifs that exist more than once, we find that they also vary without any obvious relation to location (data not shown). This indicates that a four-nucleotide motif only partially explains the mutability in a given position. The substitution rate might still be influenced by the base composition in the neighboring region or the location in the gene and, hence, by the distance from the promoter or perhaps the distance to an AID target.

Table IV. Correlations between the substitution rate of a given nucleotide and the distance (in base pairs) to the C in the nearest 5′ or 3′ WRC AID deamination hot spot motif, respectively (a)

Nontranscribed strand
                 Distance to nearest 3′ WRC      Distance to nearest 5′ WRC
Substitution     Corr. coeff.   p value (b)      Corr. coeff.   p value
All A            −0.28          0.0309 †         0.03           0.8267
A to G           −0.19          0.1446           0.04           0.7626
A to T           −0.33          0.0088 †         −0.14          0.2910
A to C           −0.33          0.0106 †         0.08           0.5532
All T            −0.36          0.0040 *         −0.05          0.6843
T to C           −0.40          0.0013 *         0.05           0.6748
T to A           −0.35          0.0044 *         −0.07          0.5823
T to G           −0.17          0.1794           −0.12          0.3382
All C (d)        −0.19          0.1848           0.01           0.9234
C to T           −0.17          0.2423           0.04           0.7772
C to G           −0.20          0.1641           −0.03          0.8084
C to A           −0.25          0.0723           −0.03          0.8415
All G            −0.37          0.0003 *         0.04           0.6909
G to A           −0.35          0.0006 *         −0.01          0.9197
G to C           −0.36          0.0004 *         0.07           0.5310
G to T           −0.21          0.0433 †         0.03           0.7936

Transcribed strand
                 Distance to nearest 3′ WRC      Distance to nearest 5′ WRC
Substitution     Corr. coeff.   p value (b)      Corr. coeff.   p value
All A (c)        −0.19          0.1399           −0.19          0.1675
A to G           −0.13          0.3187           −0.24          0.0680
A to T           −0.19          0.1392           −0.20          0.1287
A to C           −0.31          0.0145 †         −0.04          0.7950
All T            −0.43          0.0004 *         −0.09          0.5097
T to C           −0.38          0.0020 *         −0.09          0.5173
T to A           −0.51          <0.0001 *        −0.15          0.2484
T to G           −0.26          0.0430 †         −0.04          0.7861
All C (d)        −0.21          0.0819           0.01           0.9338
C to T           −0.25          0.0358 †         −0.04          0.7303
C to G           −0.14          0.2360           0.03           0.7951
C to A           −0.19          0.1077           −0.06          0.6301
All G            −0.41          0.0003 *         −0.02          0.9006
G to A           −0.41          0.0004 *         −0.06          0.5952
G to C           −0.35          0.0026 *         −0.03          0.8238
G to T           −0.24          0.0459 †         −0.19          0.1087

(a) Both the correlation coefficients and the p values are given. For the transcribed strand the substitutions are indicated as if they occurred on the transcribed strand (the strand containing the WRC motif), although technically they were detected as the reverse complementary substitutions on the nontranscribed strand. It is seen that distances to the nearest 3′ AID hot spot correlate significantly with T and G substitutions on both strands, while the same trend was only borderline significant for A substitutions. No correlation between substitution rates and the distance to the nearest 5′ WRC was found for any of the substitutions on either strand.
(b) The p value for correlation between substitution frequency and distance to the given motif was obtained by Spearman correlation analysis. Values marked * are significant (p < 0.005). Values marked † (0.005 < p < 0.05) are considered borderline significant due to the many comparisons.
(c) WRC on the transcribed strand is seen as GYW on the nontranscribed strand, A on the transcribed strand is seen as T on the nontranscribed strand, etc.
(d) Distance 0 (equal to C in WRC) was not included in any of the calculations.

Correlation between the substitution rate and the distance to the nearest 3′ AID target

Replication over an AID-generated uracil can only account for C to T transitions and G to A transitions (when occurring on the transcribed strand), and the different processes involved in the repair of a uracil are thought to generate other mutations. These may involve the generation of an abasic site by uracil removal or the removal of a stretch of nucleotides by endonucleases and/or exonucleases followed by gap filling by error-prone polymerases that introduce substitutions in the flanking nucleotides. If this is the case, one would expect to find a correlation between the substitution rate and the distance to the nearest AID target motif. We tested such correlations in both directions on both strands, that is, the distance to the nearest 5′ WRC and the distance to the nearest 3′ WRC.
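The distance measure and the rank correlation described here can be sketched as follows. This is a minimal illustration, not the analysis pipeline of the study: the function names, the toy sequence, and the per-position rates are all invented for the example, only the 3′ direction on a single strand is covered, and the Spearman coefficient is computed by hand (Pearson correlation of average ranks) so that no external library is assumed.

```python
def wrc_c_positions(seq):
    """Indices of the C in every WRC motif (W = A/T, R = A/G) on this strand."""
    return [i for i in range(2, len(seq))
            if seq[i] == "C" and seq[i - 1] in "AG" and seq[i - 2] in "AT"]


def distance_to_nearest_3prime_wrc(seq):
    """For each position, distance in bp to the closest WRC C lying 3' of it.

    Distance 0 (the C itself) is never reported; positions with no WRC
    downstream get None.
    """
    targets = wrc_c_positions(seq)
    distances = []
    for i in range(len(seq)):
        downstream = [t - i for t in targets if t > i]
        distances.append(min(downstream) if downstream else None)
    return distances


def ranks(values):
    """Average ranks (1-based), with ties sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation coefficient of two equally long lists."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    toy_seq = "TTAGCTT"                       # invented sequence, one AGC motif
    toy_rates = [0.01, 0.02, 0.05, 0.09, 0.30, 0.04, 0.03]  # invented rates
    dist = distance_to_nearest_3prime_wrc(toy_seq)
    pairs = [(d, r) for d, r in zip(dist, toy_rates) if d is not None]
    print(spearman([d for d, _ in pairs], [r for _, r in pairs]))
```

A negative coefficient in such an analysis corresponds to substitution rates that fall as the distance to the nearest 3′ WRC grows.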
Naturally, one cannot determine whether a mutation originally occurred on the nontranscribed or the transcribed strand, so correlations were calculated for two extreme scenarios, namely that all substitutions had occurred on the nontranscribed or on the transcribed strand, respectively. Substitutions on the transcribed strand were counted as the complementary substitution on the nontranscribed strand, and correlations were calculated to G in GYW on the nontranscribed strand. Table IV shows that there is a statistically significant inverse correlation between substitution rates in T and G residues on both strands and the distance to the nearest 3′ WRC motif on the same strand. The only exception is T to G transversions on the nontranscribed strand. However, when the substitutions of the T residues in position 3 of codons 15 and 7 (the ones preceding the runs of G residues) are omitted, the inverse correlation between T to G substitutions and the distance to WRC on the nontranscribed strand becomes stronger (correlation coefficient of −0.24, p = 0.06). For substitutions of A there was a trend toward an inverse correlation between the substitution rate and the distances to the nearest 3′ AID hot spots that was borderline significant for transversion to C only. There was no correlation between substitution rates and the distance to the nearest 5′ WRC for any substitutions, suggesting that the error-inducing repair process following AID deamination only works 5′ of the deaminated C.

Influence of substitutions in the neighboring nucleotide

We also tested whether substitutions in C or G in a given AGCT target would influence the mutation rate in the neighboring G/C by changing the hot spot motif to a less mutable one. When comparing sequences with such substitutions to those without, we found that the substitution rate in the neighboring G/C was less than half (average 43%).
This was true for all four AGCT motifs present in the germline sequence, indicating that substitutions in a given position can indeed influence the mutability of the neighboring nucleotide in subsequent rounds of SHM. We saw, however, no significant differences in the substitution pattern further away than the closest nucleotide (data not shown).

Substitution rates in JH gene and D gene motifs resemble those of the VH region

Because of the size of our material, we had an opportunity to study mutations in individual JH genes and in several D genes. Fig. 5 shows the substitution rates for individual residues in IGHJ6 (average 4.7% per nucleotide) and IGHJ4 (average 3.9%), which are not significantly different (p = 0.19). The highest substitution rates were seen in the 5′ end of the JH genes, which falls within the CDR3 and contains most of the motifs found to have a high substitution rate in the VH region (e.g., TACT, GGTA, CTAT, and CTAT; see Table II). Substitution rates in these motifs in the JH genes correspond well to the rates found in IGHV3-23, indicating that the same regulatory mechanism controls VH and JH mutations. The 3′ ends of the genes, which encode FR4, have fewer mutations, consistent with a high content of cold spots (e.g., GCC, GGC, GTC, GGC, and GGG).

The overall mutation rate in D segments was rather high (average 7.8% per nucleotide). An analysis of individual motifs was performed and showed that the high mutation rate could be explained by a high content of hot spot motifs in the D genes. The hot spots targeted had substitution frequencies comparable to those of the same motif when placed in the VH region (see Table V).

Mutation frequencies depend on the JH gene

We noticed that the fraction of unmutated sequences, defined as sequences with fewer than three mutations in the VH region, varied depending on the JH gene.
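The JH4 versus JH6 comparison reported in this section (30 of 143 JH4-using and 223 of 473 JH6-using sequences unmutated) can be checked with a simple 2×2 test. The text does not state which test was used for this comparison, so the Pearson chi-square without continuity correction shown here is an assumption made purely for illustration, and `chi_square_2x2` is a hypothetical helper name.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table

        [[a, b],
         [c, d]]

    Returns (statistic, p_value) for one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = (a, b, c, d)
    expected = (row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n)
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With one degree of freedom the chi-square tail probability equals
    # the two-sided normal tail, which is erfc(sqrt(statistic / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


if __name__ == "__main__":
    # Rows: JH4, JH6. Columns: unmutated, mutated (counts from the text).
    statistic, p_value = chi_square_2x2(30, 143 - 30, 223, 473 - 223)
    print(f"chi-square = {statistic:.2f}, p = {p_value:.2g}")
```

Under these assumptions the p value falls well below 0.0001, in line with the significance level reported for this comparison.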
Twenty-one percent of the sequences using JH4 (30 of 143 sequences) were unmutated, whereas 46.5% of the JH6-carrying sequences were unmutated (223 of 473 sequences). This was statistically significant (p < 0.0001). When looking at the mutated sequences only (more than three mutations in the VH region) we also found that the substitution frequency varied between the two subsets of sequences. JH4-carrying sequences had an average of 18.1 substitutions in the VH region compared with 15.2 for JH6 (p = 0.03). Substitution rates in the different hot and cold spot motifs were comparable in sequences using JH6 and JH4. Also, there was no difference in the ratio between C/G and A/T substitutions in JH6- and JH4-carrying sequences (p = 0.29), suggesting that it was the overall substitution rate that was decreased.

FIGURE 5. Substitution rates in IGHJ6 (n = 250) (a) and IGHJ4 (n = 113) (b). The first two positions in each sequence were not included in the analysis to reduce bias from uncertain definition of the 5′ end of the JH gene when the first or second base was mutated. It is seen that most of the substitutions are in the 5′ end of the genes lying within the CDR3, whereas the part lying in FR4 has a low intrinsic mutability. The high mutability of the CDR3 parts could be explained by a high content of motifs found to have a high substitution rate in the VH region (e.g., TACT, GGTA, CTAC, and CTAC; the targeted nucleotide is set in boldface type), while the FR4 parts are rich in cold spot motifs (e.g., stretches of multiple G or C residues).

Table V. Substitution rates in different motifs in D genes in mutated, nonproductive rearrangements compared with those of the same motifs in IGHV3-23 (a)

C substitutions
         IGHV3-23           D gene
Motif    Subst. rate (%)    No. of substitutions   Positions tested   Subst. rate (%)
AGCT     34.6               15                     91                 16.5
AGCA     23.4               2                      36                 5.6
TACT     15.9               8                      31                 25.8
TACC     -                  11                     47                 23.4
TACG     10.2               1                      25                 4.0
AACC     -                  -                      -                  -
CACT     -                  -                      -                  -
ACCA     -                  8                      51                 15.7
GACT     4.7                0                      27                 0.0

G substitutions
         IGHV3-23           D gene
Motif    Subst. rate (%)    No. of substitutions   Positions tested   Subst. rate (%)
AGCT     35.2               19                     92                 20.7
TGCT     -                  4                      52                 7.7
AGTA     -                  11                     102                10.8
GGTA     11.9               1                      24                 4.2
CGTA     21.2               -                      -                  -
GGTT     10.4               11                     54                 20.4
AGTG     5.9                2                      49                 4.1
TGGT     5.6                6                      122                4.9
AGTC     9.9                -                      -                  -

A substitutions
         IGHV3-23           D gene
Motif    Subst. rate (%)    No. of substitutions   Positions tested   Subst. rate (%)
GTAG     20.4               8                      114                7.0
CTAT     17.7               4                      25                 16.0
TTAC     13.0               2                      34                 5.9
CTAC     12.2               5                      20                 25.0
ATAT     10.0               4                      43                 9.3
GTAC     10.0               2                      14                 14.3
ATAG     -                  3                      22                 13.6
AAAT     9.2                -                      -                  -
CCAC     -                  -                      -                  -
GCAG     3.0                0                      46                 0.0
CCAG     3.0                3                      47                 6.4
GGAG     3.0                0                      46                 0.0

T substitutions
         IGHV3-23           D gene
Motif    Subst. rate (%)    No. of substitutions   Positions tested   Subst. rate (%)
CTAC     11.1               2                      17                 11.8
ATAG     -                  4                      31                 12.9
GTAA     -                  -                      -                  -
GTAG     13.7               15                     125                12.0
ATAT     8.2                1                      12                 8.3
GTAC     9.8                6                      63                 9.5
CTAT     7.9                2                      31                 6.5
ATTT     -                  1                      32                 3.1
GTGG     4.2                4                      104                3.9
CTGC     2.2                3                      80                 3.8
CTGG     6.1                0                      32                 0.0
CTCC     2.6                -                      -                  -

(a) The substitutions in residues at the ends of D segments tend to be underestimated because substitutions in these residues may change the way the joint region is interpreted. To compensate for this problem, the two 5′ and the two 3′ nucleotides of each D segment were excluded from the analysis. The boldfaced residue is the one being analyzed, and the motifs are listed by decreasing substitution rate in C or A residues on the nontranscribed strand, respectively. A dash (-) means that the motif is not found in the gene(s).

Discussion

C and G substitution motifs and enzyme specificities

Using the hitherto largest published set of nonfunctionally rearranged, somatically mutated human IgH sequences, we found that the most mutable motifs for the SHM of C and G residues corresponded well to the previously described WRCY/RGYW four-nucleotide motifs (41–43). It has been claimed that WRCH/DGYW is an even better predictor for C/G mutability (46); however, this cannot be supported by our data (targeted nucleotides are set in boldface type). TACA and TGCA, for example, have mutation rates as low as 1.3 and 3.3%, respectively, which is lower than average. The discrepancies can be due to the differences in sample sizes and methods. The WRCY motifs include the reported WRC deamination preference of AID (19, 20, 44). This is in line with a previous report based on 25 nonproductive human IgL rearrangements and 17 IgH rearrangements (47). Similarly, the reported deamination cold spots (SYC and SSC) (19, 20, 44) were found to be cold spots for SHMs in C residues on both strands. It is interesting to note that many of the highly mutable four-nucleotide motifs include a hot spot for C deamination on both strands, e.g., AGCT, as this provides a simple explanation for the double-strand breaks shown to appear during the course of SHM (48, 49).
Others, however, have not been able to find a clear correlation between SHM and double-strand breaks in the BL2 cell line (50). Replication over an AID-generated uracil can only account for C/G transitions and, thus, the preferences of other enzymes involved in the mutation process may influence the mutability of the nucleotides in and around a given motif. UNG or a complex of MSH2 and MSH6 are proposed to be able to remove the created uracil, and the sequence specificities of these enzymes may therefore influence the resolution of the U:G mismatch and hence the mutability of different motifs. Bovine and E. coli UNG have been shown to have high activity in ATU (51, 52), which corresponds well with the finding of ATC being a hot spot. However, there are also some discrepancies because AGU, corresponding to the AGC mutational hot spot, displays intermediate to low uracil removal efficiency (51, 52). It is possible that human UNG has a different nucleotide preference or that MSH2/MSH6 provides the necessary backup. Substitutions in C and G residues during phase II could also influence the substitution rates. In support of this, it has been shown in mice that the inactivation of pol θ reduces the number of C/G substitutions, particularly in hot spot motifs (36). Another possible phase II C/G mutator is Rev1. Rev1-deficient mice have been shown to have fewer C to G and G to C mutations while the relative frequencies of C to A, T to C, and A to T substitutions were increased (53), suggesting that, at least in mice, Rev1 is involved in the generation of several types of phase II substitutions.

A and T substitutions

A and T substitutions occur only during phase II. Several enzymes have been shown to be involved, among them some error-prone DNA polymerases likely to be involved in DNA repair following the removal of the AID-generated uracil. One such error-prone polymerase is the translesion pol η. Pol η-deficient mice and patients with variant xeroderma pigmentosum, who have a mutation in the gene encoding pol η, display a reduced level of A/T substitution despite a normal overall mutation rate (29, 31, 54). Mouse pol η is expressed in germinal center B cells (29) and has been found to interact with MSH2/6 (31), suggesting a possible way of recruitment. The substitution pattern of mouse and human pol η in vitro shows a preference for mutations in WA/TW motifs (30), which corresponds well with our findings that the most mutable A and T four-nucleotide motifs include the WA and TW motif, respectively. Two further polymerases have also been suggested as being involved in phase II mutations (32, 33, 35, 55), and their error preferences may also influence the substitution patterns. Pol ι is, for example, known for a preference for creating A to G transitions and for incorporating G and T opposite dUTP and A opposite an abasic site (56).

Strand symmetry indicates that SHM happens on both strands

Substitution rates in complementary A and T residues showed strand symmetry, indicating that both strands are targeted by phase II mutations. Likewise, the correlation between the C substitution frequency in a particular motif and the G substitution frequency in the motif corresponding to the same motif on the transcribed strand strongly suggests that AID can deaminate both strands equally well during the initial phase of SHM. This is in agreement with data from Foster et al. and Boursier et al., who found that SHM could target both strands in the human κ locus (42) and λ locus (47), respectively. Studies of AID deamination in vitro are, however, contradictory at this point, as some find only deamination of the nontranscribed strand (19, 57) while others have shown that the transcribed strand can also be targeted (58–60), although in some cases to a lesser extent than the nontranscribed strand. This discrepancy could be due to the different experimental ways of detecting deamination in vitro, and the presence of cofactors in vivo may help the targeting of AID to both strands. One such cofactor, which has very recently been shown to be involved in targeting AID in engineered mice, is MSH6 (61). MSH6 thus seems to be involved in not only phase II but also phase I SHM.

Models to explain correlations between substitutions in T and G residues and the distance to the nearest 3′ AID target

Interestingly, we found significant inverse correlations between phase II substitution rates on T and G residues and the distance to the nearest 3′ WRC AID hot spot. Only nonsignificant or borderline significant trends were found for A and C substitutions. In contrast, no correlations were found between substitution rates and the distance to the nearest 5′ WRC. It could be argued that the inverse correlation to the distance to WRC is an artifact caused by the location of most WRC sequences in the CDRs. However, the fact that not only the distance but also the direction is important shows that this is not the case. These findings suggest that phase II SHM predominantly targets T and G residues in the AID-targeted strand 5′ of the deaminated C. Alternatively, phase II substitutions could target the corresponding A and/or C on the opposite strand. Because both strands are targeted, both models account for phase II substitutions in all four nucleotides. These two models are discussed further below.

G and T substitutions 5′ of the AID target suggest involvement of a 3′-5′ nuclease followed by gap filling

The inverse correlation between the substitution rate in G and T residues and the distance to the nearest 3′ WRC motif suggests a molecular mechanism involving a 3′-5′ exonuclease and/or an endonuclease. Such enzyme(s) could be recruited to the abasic site created at the site of the initial deamination event, where it/they could remove a stretch of DNA 5′ of the abasic site. DNA removal in turn could be followed by error-prone gap filling. Several human 3′-5′ exonucleases are known. These include polymerases δ and ε, WRN, APE1, and MRE11 (62). MRE11 forms the MRN complex along with RAD50 and NBS1. Ectopic expression of NBS1 increases SHM in a hypermutating Ramos cell line (63), suggesting that MRN is involved. This is further supported by the finding that MRE11 binds to a rearranged VH region only in mutating cells and that recombinant MRE11/RAD50 can cleave abasic sites in ssDNA (64). The ability to cleave DNA is separable from the 3′-5′ activity (64), and it is possible that both functions are important for SHM. APE1 is also capable of DNA cleavage at abasic sites; however, APE1 does not bind to the VH region (64), speaking against an involvement in SHM. EXO1-deficient mice have normal mutation frequencies, but their mutations are C/G biased and hot spot focused (34), suggesting a possible involvement of EXO1 in phase II SHM. EXO1 binds to the VH region, but not the C region, in hypermutating BL-2 cells (34). However, EXO1 is a 5′-3′ exonuclease and therefore does not fit into this model unless it also has 3′-5′ exonuclease or endonuclease activity as previously suggested (65). Alternatively, its involvement in SHM may not be as a nuclease.

As mentioned earlier, several error-prone DNA polymerases have been suggested as being involved in phase II mutations (29, 31, 54, 32, 33, 55). These could be involved in gap filling following DNA removal. However, to account for the finding of a correlation only between T and G substitution rates and distances to WRC, the involved polymerase(s) would have to make mistakes mainly opposite A and C residues. An alternative explanation that easily accounts for the strong correlation between substitutions in T residues but a less strong correlation for A substitutions is that a large fraction of the T/A substitutions could be caused by the occasional incorporation of dUTP (instead of dTTP) opposite A during phase II repair. The occasional incorporation of dUTP as a means of generating SHMs has been suggested by Neuberger et al. (22). According to their model, the incorporated dUTP would subsequently be excised and substitutions would be generated during replication over the abasic site.

Phase II substitution on the opposite strand

As mentioned above, the finding of inverse correlation between T and G substitution rates and the distance to the nearest 3′ WRC can also be explained if the main targets for phase II mutations are C and A residues on the strand opposite the AID target. This could, for example, be the case if the generated uracil is either removed to generate an abasic site or is left untouched until replication. During replication, the abasic site/uracil could cause the DNA polymerase to stall and recruit an error-prone translesion polymerase. In fact, all four of the error-prone polymerases implicated in SHM are known translesion polymerases. The translesion polymerase would be predicted to be engaged opposite the targeted C and introduce errors while synthesizing a short DNA segment. If errors are preferentially introduced opposite T and G residues, this model would explain our findings. This is, for example, the case for polymerase ι, which has been shown to preferentially incorporate G opposite T, leading to a T to C transition on the AID-targeted strand 5′ of the targeted C (56).
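Both models rest on simple strand bookkeeping: a WRC motif on one strand reads as GYW on the other, and a substitution on one strand is detected as the complementary substitution on the opposite strand. The following sketch makes that mapping explicit; the function names are hypothetical and chosen only for this illustration.

```python
# Complement table covering the four bases and the degenerate codes used
# in the text (R = A/G, Y = C/T, W = A/T, S = G/C).
COMPLEMENT = str.maketrans("ACGTRYWS", "TGCAYRWS")


def reverse_complement(motif):
    """Return the motif as it reads 5' to 3' on the opposite strand."""
    return motif.translate(COMPLEMENT)[::-1]


def complement_substitution(original, replacement):
    """Map a substitution on one strand to the one seen on the other strand."""
    return original.translate(COMPLEMENT), replacement.translate(COMPLEMENT)


if __name__ == "__main__":
    print(reverse_complement("WRC"))          # GYW
    print(reverse_complement("WRCY"))         # RGYW
    print(reverse_complement("AGCT"))         # AGCT (palindromic)
    print(complement_substitution("T", "C"))  # ('A', 'G')
```

The palindromic AGCT motif maps onto itself, which is why it presents a deamination hot spot on both strands.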
Influence by substitutions on the neighboring nucleotides

We found that substitutions in C and G residues in hot spot motifs decreased the substitution frequency of the neighboring G/C residue. This can be explained if the SHM machinery does not normally make substitutions in neighboring nucleotides during the same cell cycle, because substitutions in the hot spot motifs most often lead to a less mutable motif such as AGCA (mutated in 23.8% of the motifs) becoming AACA (8% mutation), for example. It is also possible that substitutions do occur on both strands during the same round of SHM but that the two strands subsequently end up in sister cells before the substitutions are fixed. Despite the previously shown inverse correlation between the mutation rate of a given nucleotide and the distance to the nearest 3′ WRC, we find that mutations in a given AGCT position (included in the WRC/GYW motif) do not influence the overall mutation distribution in the sequence. This suggests that the C/G nucleotide that undergoes the initial deamination step (index nucleotide) leading to phase II SHM is sometimes repaired, while on other occasions the mutation is fixed during phase II. If the index nucleotide mutation is always fixed during phase II, we would expect to find more mutations 5′ of the index nucleotide in the sequences mutated in the index and, on the contrary, if the index is always repaired we would expect to find fewer mutations.

Substitution pattern of T residue preceding a run of G residues

The substitution pattern of the T residue in the last position of codon 15 is remarkable because the frequency is very high, it has almost exclusively T to G transversions, and it is outside any known SHM hot spot motifs. The adjacent run of six G residues contains almost no substitutions but a high frequency of 1–3 nucleotide insertions likely to be caused by Taq polymerase slippage. However, comparison of the numbers of substitutions in the T residue in the mutated (62 in 383 sequences) and unmutated (2 in 263 sequences), nonproductive sequences clearly shows that Taq errors are not the cause of the unusual substitution pattern. Substitutions in T or A residues preceding the other runs of at least four G residues are also predominantly to G. The JH genes also have a run of four G residues, but this region is found to contain very few mutations in all sequences. Runs of three or four G residues in G-rich motifs are known to be able to form G quartets when single stranded (66). G quartets are, for example, formed in the G-rich nontranscribed strand of the switch region during CSR, and AID is found to bind to them (67). It can be hypothesized that the runs of G residues in the variable region also fold into G quartets during the transcription-dependent, single-stranded phase of SHM. Although AID may bind to the quartets, the activity of the enzyme may be inhibited, which would account for the low substitution rate. How this can lead to a T to G transversion in the flanking base is unknown. One possibility is that the quartets attract other proteins. GQN1 is a human endonuclease highly expressed in B cells and has been shown to cleave DNA 2–5 nucleotides upstream of G quartets (66). This endonuclease may possibly be involved in the cleavage of the DNA leading to SHM, although the observed substitution pattern is not readily explained.

D and JH gene substitutions

We found that substitution rates in motifs in the VH region were comparable to the substitution rates found in the same motifs in the D and JH genes. It is noteworthy that the two JH genes analyzed contain very few C/G mutational hot spot motifs in the region encoding FR4, while the regions encoding CDR3 have several hot spots, for example four overlapping TACT motifs in JH6 creating hot spots for T, A, and C mutations. Hot spots are also common in the D genes contributing to the CDR3. In contrast, the FR4 regions encoded by the JH genes contain many cold spot motifs and showed very low substitution rates. This suggests that there has been an evolutionary selection against mutational hot spots in the FR4 region. That would be in line with earlier studies suggesting that the codon usage in CDR regions of IgV has been optimized for SHM (9, 10, 68).

The substitution rate depends on the JH gene

Although substitution patterns in the JH genes are the same as in the VH region, we find that the mutation status of a rearrangement partly depends on the JH gene. JH6-carrying sequences are less likely to be mutated than JH4-carrying sequences and, when mutated, they contain fewer substitutions on average. The mutation pattern does not seem to change, but the overall frequency is decreased. The fact that the difference was found in nonproductive rearrangements makes the simplest explanation, namely that the cells with a JH6-containing rearrangement constituted a special cell subset containing fewer mutations, very unlikely, because to account for the observed findings rearrangements on both alleles would then have to use the same JH gene. This is not thought to be the case. Rearrangements using JH6 tend to have longer CDR3 loops than rearrangements with, for example, JH4 (69) (L. Ohm-Laursen et al., manuscript in preparation), and CDR3s are longer among unmutated sequences compared with mutated (69–71 and S. Petersen and T. Barington, manuscript in preparation). This corresponds well with the finding of more JH6 sequences in the unmutated subset. However, even when the mutation analysis is restricted to rearrangements within a narrow range of CDR3 lengths (44–52 bp), we still find that the JH6-carrying sequences are less likely to be mutated and, when mutated, are significantly less mutated than JH4-carrying sequences (data not shown). Thus, the length of the CDR3 does not seem to account for the changed mutation frequency. Therefore, we suggest that rearrangements using JH6 have special properties influencing the mutation rate. Perhaps a binding site for a cofactor is located within the intronic region upstream of JH6 and is therefore deleted when JH6 is used. Also, it is possible that JH6 is simply too close to the regulatory elements in the 3′ intronic enhancer (Eμ) (72, 73) for optimal effect. Another possible regulator could be the E box motif 5′-CAGGTG-3′, which is known to bind the regulatory E47 protein (74). This motif is found in the 3′ end of JH1, JH2, JH4, and JH5 but not in JH3 and JH6, where the last nucleotide of the motif is exchanged to an A. When inserted into the κ locus, the 5′-CAGGTG-3′ motif has been shown to enhance SHM in transgenic mice without changing the mutation pattern (74). Mutation of the E box motif to 5′-AAGGTG-3′ decreases this effect. Furthermore, inactivation of the E2A gene in the DT40 chicken B cell line reduces SHM. Mutations can be restored by the expression of either of the E2A splice variants, E47 or E12, showing the importance of these proteins for the level of SHM (75). Because the 5′-CAGGTA-3′ motifs found in JH3 and JH6 deviate from the consensus 5′-CANNTG-3′ E-box motif, we hypothesize that this one-nucleotide difference may be involved in reducing the mutational load of JH6-carrying sequences compared with JH4-carrying sequences. Regardless of the cause, the finding of a variable mutation frequency being dependent on the type of JH gene has implications for the affinity maturation and fine tuning of the repertoire, as JH6 and JH4 are the two most commonly used JH genes in the repertoire (76).

Concluding remarks

Acknowledgment

We are grateful to Nina Eggers for excellent technical assistance.
Disclosures

The authors have no financial conflict of interest.

References

1. Peters, A., and U. Storb. 1996. Somatic hypermutation of immunoglobulin genes is linked to transcription initiation. Immunity 4: 57–65.
2. Fukita, Y., H. Jacobs, and K. Rajewsky. 1998. Somatic hypermutation in the heavy chain locus correlates with transcription. Immunity 9: 105–114.
3. Delpy, L., C. Sirac, C. Le Morvan, and M. Cogne. 2004. Transcription-dependent somatic hypermutation occurs at similar levels on functional and nonfunctional rearranged IgH alleles. J. Immunol. 173: 1842–1848.
4. Betz, A. G., C. Milstein, A. Gonzalez-Fernandez, R. Pannell, T. Larson, and M. S. Neuberger. 1994. Elements regulating somatic hypermutation of an immunoglobulin gene: critical role for the intron enhancer/matrix attachment region. Cell 77: 239–248.
5. Terauchi, A., K. Hayashi, D. Kitamura, Y. Kozono, N. Motoyama, and T. Azuma. 2001. A pivotal role for DNase I-sensitive regions 3b and/or 4 in the induction of somatic hypermutation of IgH genes. J. Immunol. 167: 811–820.
6. Kodama, M., R. Hayashi, H. Nishizumi, F. Nagawa, T. Takemori, and H. Sakano. 2001. The PU.1 and NF-EM5 binding motifs in the Igkappa 3′ enhancer are responsible for directing somatic hypermutations to the intrinsic hot spots in the transgenic V(kappa) gene. Int. Immunol. 13: 1415–1422.
7. Rada, C., and C. Milstein. 2001. The intrinsic hypermutability of antibody heavy and light chain genes decays exponentially. EMBO J. 20: 4570–4576.
8. Rada, C., J. Yelamos, W. Dean, and C. Milstein. 1997. The 5′ hypermutation boundary of κ chains is independent of local and neighbouring sequences and related to the distance from the initiation of transcription. Eur. J. Immunol. 27: 3115–3120.
9. Wagner, S. D., C. Milstein, and M. S. Neuberger. 1995. Codon bias targets mutation. Nature 376: 732.
10. Cowell, L. G., H. J. Kim, T. Humaljoki, C. Berek, and T. B. Kepler. 1999. Enhanced evolvability in immunoglobulin V genes under somatic hypermutation. J. Mol. Evol. 49: 23–26.
11. Muramatsu, M., K. Kinoshita, S. Fagarasan, S. Yamada, Y. Shinkai, and T. Honjo. 2000. Class switch recombination and hypermutation require activation-induced cytidine deaminase (AID), a potential RNA editing enzyme. Cell 102: 553–563.
12. Revy, P., T. Muto, Y. Levy, F. Geissmann, A. Plebani, O. Sanal, N. Catalan, M. Forveille, R. Dufourcq-Labelouse, A. Gennery, et al. 2000. Activation-induced cytidine deaminase (AID) deficiency causes the autosomal recessive form of the Hyper-IgM syndrome (HIGM2). Cell 102: 565–575.
13. Arakawa, H., J. Hauschild, and J. M. Buerstedde. 2002. Requirement of the activation-induced deaminase (AID) gene for immunoglobulin gene conversion. Science 295: 1301–1306.
14. Martin, A., P. D. Bardwell, C. J. Woo, M. Fan, M. J. Shulman, and M. D. Scharff. 2002. Activation-induced cytidine deaminase turns on somatic hypermutation in hybridomas. Nature 415: 802–806.
15. Petersen-Mahrt, S. K., R. S. Harris, and M. S. Neuberger. 2002. AID mutates E. coli suggesting a DNA deamination mechanism for antibody diversification. Nature 418: 99–104.
16. Yoshikawa, K., I. M. Okazaki, T. Eto, K. Kinoshita, M. Muramatsu, H. Nagaoka, and T. Honjo. 2002. AID enzyme-induced hypermutation in an actively transcribed gene in fibroblasts. Science 296: 2033–2036.
17. Okazaki, I. M., K. Kinoshita, M. Muramatsu, K. Yoshikawa, and T. Honjo. 2002. The AID enzyme induces class switch recombination in fibroblasts. Nature 416: 340–345.
18. Bransteitter, R., P. Pham, M. D. Scharff, and M. F. Goodman. 2003. Activation-induced cytidine deaminase deaminates deoxycytidine on single-stranded DNA but requires the action of RNase. Proc. Natl. Acad. Sci. USA 100: 4102–4107.
19. Pham, P., R. Bransteitter, J. Petruska, and M. F. Goodman. 2003. Processive AID-catalysed cytosine deamination on single-stranded DNA simulates somatic hypermutation. Nature 424: 103–107.
20. Larijani, M., D. Frieder, W. Basit, and A. Martin. 2005. The mutation spectrum of purified AID is similar to the mutability index in Ramos cells and in ung(−/−) msh2(−/−) mice. Immunogenetics 56: 840–845.
21. Rada, C., J. M. Di Noia, and M. S. Neuberger. 2004. Mismatch recognition and uracil excision provide complementary paths to both Ig switching and the A/T-focused phase of somatic mutation. Mol. Cell 16: 163–171.
22. Neuberger, M. S., J. M. Di Noia, R. C. Beale, G. T. Williams, Z. Yang, and C. Rada. 2005. Somatic hypermutation at A.T pairs: polymerase error versus dUTP incorporation. Nat. Rev. Immunol. 5: 171–178.
23. Rada, C., G. T. Williams, H. Nilsen, D. E. Barnes, T. Lindahl, and M. S. Neuberger. 2002. Immunoglobulin isotype switching is inhibited and somatic hypermutation perturbed in UNG-deficient mice. Curr. Biol. 12: 1748–1755.
24. Imai, K., G. Slupphaug, W. I. Lee, P. Revy, S. Nonoyama, N. Catalan, L. Yel, M. Forveille, B. Kavli, H. E. Krokan, H. D. Ochs, et al. 2003. Human uracil-DNA glycosylase deficiency associated with profoundly impaired immunoglobulin class-switch recombination. Nat. Immunol. 4: 1023–1028.
25. Rada, C., M. R. Ehrenstein, M. S. Neuberger, and C. Milstein. 1998. Hot spot focusing of somatic hypermutation in MSH2-deficient mice suggests two stages of mutational targeting. Immunity 9: 135–141.
26. Martomo, S. A., W. W. Yang, and P. J. Gearhart. 2004. A role for Msh6 but not Msh3 in somatic hypermutation and class switch recombination. J. Exp. Med. 200: 61–68.
27. Shen, H. M., A. Tanaka, G. Bozek, D. Nicolae, and U. Storb. 2006. Somatic hypermutation and class switch recombination in Msh6−/−Ung−/− double-knockout mice. J. Immunol. 177: 5386–5392.
28. Matsuda, T., K. Bebenek, C. Masutani, I. B. Rogozin, F. Hanaoka, and T. A. Kunkel. 2001. Error rate and specificity of human and murine DNA polymerase eta. J. Mol. Biol. 312: 335–346.
29. Zeng, X., D. B. Winter, C. Kasmer, K. H. Kraemer, A. R. Lehmann, and P. J. Gearhart. 2001.
DNA polymerase eta is an A-T mutator in somatic hypermutation of immunoglobulin variable genes. Nat. Immunol. 26: 537–541. Rogozin, I. B., Y. I. Pavlov, K. Bebenek, T. Matsuda, and T. A. Kunkel. 2001. Somatic mutation hot spots correlate with DNA polymerase eta error spectrum. Nat. Immunol. 2: 530 –536. Martomo, S. A., W. W. Yang, R. P. Wersto, T. Ohkumo, Y. Kondo, M. Yokoi, C. Masutani, F. Hanaoka, and P. J. Gearhart. 2005. Different mutation signatures in DNA polymerase eta- and MSH6-deficient mice suggest separate roles in antibody diversification. Proc. Natl. Acad. Sci. USA 102: 8656 – 8661. Diaz, M., L. K. Verkoczy, M. F. Flajnik, and N. R. Klinman. 2001. Decreased frequency of somatic hypermutation and impaired affinity maturation but intact germinal center formation in mice expressing antisense RNA to DNA polymerase . J. Immunol. 167: 327–335. Zan, H., A. Komori, Z. Li, A. Cerutti, A. Schaffer, M. F. Flajnik, M. Diaz, and P. Casali. 2001. The translesion DNA polymerase plays a major role in Ig and bcl-6 somatic hypermutation. Immunity 14: 643– 653. Bardwell, P. D., C. J. Woo, K. Wei, Z. Li, A. Martin, S. Z. Sack, T. Parris, W. Edelmann, and M. D. Scharff. 2004. Altered somatic hypermutation and reduced class-switch recombination in exonuclease 1-mutant mice. Nat. Immunol. 5: 224 –229. Faili, A., S. Aoufouchi, E. Flatter, Q. Gueranger, C. A. Reynaud, and J. C. Weill. 2002. Induction of somatic hypermutation in immunoglobulin genes is dependent on DNA polymerase iota. Nature 419: 944 –947. Masuda, K., R. Ouchida, A. Takeuchi, T. Saito, H. Koseki, K. Kawamura, M. Tagawa, T. Tokuhisa, T. Azuma, and J. Wang. 2005. DNA polymerase theta contributes to the generation of C/G mutations during somatic hypermutation of Ig genes. Proc. Natl. Acad. Sci. USA 102: 13986 –13991. Ohm-Laursen, L., M. Nielsen, S. R. Larsen, and T. Barington. 2006. 
No evidence for the use of DIR, D-D fusions, chromosome 15 open reading frames or VH replacement in the peripheral repertoire was found when applying an improved Downloaded from http://www.jimmunol.org/ by guest on July 12, 2017 The Ig repertoire is known to be shaped by the SHM of the variable regions during an immune response. In this study we report that the mutation machinery operates equally well on both strands. The substitution frequency of a given residue is dependent on the motif in which it resides and the distance to the nearest 3⬘ AID deamination hot spot, suggesting that phase II substitutions occur 5⬘ of the site of the initial deamination. Alternatively, phase II substitutions occur on the opposite strand 3⬘ of the G residue facing the AID-targeted C residue. Substitutions in the neighboring nucleotide also influence the substitution frequency of C and G in AGCT double hot spots. Motifs are the same in VH, D, and JH genes; however, the JH gene of the rearrangement influences the overall mutation frequency, because JH6-using rearrangements are found to contain fewer mutations than JH4-using rearrangements. The sequences in this study use the IGHV3-23*01 VH gene, and it can therefore be argued that the findings may be special to this VH gene. However, when possible we have confirmed the results by analysis of a set of sequences using the IGHV3-h pseudogene. Also, previous work from many groups suggests that the mutation process is similar irrespective of which VH genes have been studied. We therefore suggest that the results presented in this paper are also applicable to other human VH genes. 4333 4334 38. 39. 40. 41. 42. 43. 44. 46. 47. 48. 49. 50. 51. 52. 53. 54. 55. algorithm, JointML, to 6329 human IgH rearrangements. Immunology 119: 265–277. Ohm-Laursen, L., S. R. Larsen, and T. Barington. 2005. 
Identification of two new alleles, IGHV3-23*04 and IGHJ6*04, and the complete sequence of the IGHV3-h pseudogene in the human immunoglobulin locus and their prevalences in Danish caucasians. Immunogenetics 57: 621– 627. Lefranc, M. P., V. Giudicelli, C. Ginestoux, J. Bodmer, W. Muller, R. Bontrop, M. Lemaitre, A. Malik, V. Barbie, and D. Chaume. 1999. IMGT, the international immunogenetics database. Nucleic Acids Res. 27: 209 –212. Lam, K. P., R. Kühn, and K. Rajewsky. 1997. In vivo ablation of surface immunoglobulin on mature B cells by inducible gene targeting results in rapid cell death. Cell 90: 1073–1083. Rogozin, I. B., and N. A. Kolchanov. 1992. Somatic hypermutagenesis in immunoglobulin genes, II: influence of neighbouring base sequences on mutagenesis. Biochim. Biophys. Acta 1171: 11–18. Foster, S. J., T. Dorner, and P. E. Lipsky. 1999. Somatic hypermutation of VJ rearrangements: targeting of RGYW motifs on both DNA strands and preferential selection of mutated codons within RGYW motifs. Eur. J. Immunol. 29: 4011– 4021. Dorner, T., H. P. Brezinschek, R. I. Brezinschek, S. J. Foster, R. Domiati-Saad, and P. E. Lipsky. 1997. Analysis of the frequency and pattern of somatic mutations within nonproductively rearranged human variable heavy chain genes. J. Immunol. 158: 2779 –2789. Yu, K., F. T. Huang, and M. R. Lieber. 2004. DNA substrate length and surrounding sequence affect the activation-induced deaminase activity at cytidine. J. Biol. Chem. 279: 6496 – 6500. Pavlov, Y. I., I. B. Rogozin, A. P. Galkin, A. Y. Aksenova, F. Hanaoka, C. Rada, and T. A. Kunkel. 2002. Correlation of somatic hypermutation specificity and A-T base pair substitution errors by DNA polymerase eta during copying of a mouse immunoglobulin light chain transgene. Proc. Natl. Acad. Sci. USA 99: 9954 –9959. Rogozin, I. B., and M. Diaz. 2004. 
Cutting edge: DGYW/WRCH is a better predictor of mutability at G:C bases in Ig hypermutation than the widely accepted RGYW/WRCY motif and probably reflects a two-step activation-induced cytidine deaminase-triggered process. J. Immunol. 172: 3382–3384. Boursier, L., W. Su, and J. Spencer. 2004. Analysis of strand biased ‘G’.C hypermutation in human immunoglobulin V() gene segments suggests that both DNA strands are targets for deamination by activation-induced cytidine deaminase. Mol. Immunol. 40: 1273–1278. Sale, J. E., and M. S. Neuberger. 1998. TdT-accessible breaks are scattered over the immunoglobulin V domain in a constitutively hypermutating B cell line. Immunity 9: 859 – 869. Bross, L., Y. Fukita, F. McBlane, C. Demolliere, K. Rajewsky, and H. Jacobs. 2001. DNA double-strand breaks in immunoglobulin genes undergoing somatic hypermutation. Immunity 13: 589 –597. Faili, A., S. Aoufouchi, Q. Gueranger, C. Zober, A. Leon, B. Bertocci, J. C. Weill, and C. A. Reynaud. 2006. AID-dependent somatic hypermutation occurs as a DNA single-strand event in the BL2 cell line. Nat. Immunol. 39: 815– 821. Eftedal, I., P. H. Guddal, G. Slupphaug, G. Volden, and H. E. Krokan. 1993. Consensus sequences for good and poor removal of uracil from double stranded DNA by uracil-DNA glycosylase. Nucleic Acids Res. 21: 2095–2101. Eftedal, I., G. Volden, and H. E. Krokan. 1994. Excision of uracil from doublestranded DNA by uracil-DNA glycosylase is sequence specific. Ann. NY Acad. Sci. 726: 312–314. Jansen, J. G., P. Langerak, A. Tsaalbi-Shtylik, P. ven den Berk, H. Jacobs, and N. de Wind. 2006. Strand-biased defect in G/C transversions in hypermutating immunoglobulin genes in Rev1-deficient mice. J. Exp. Med. 203: 319 –323. Zeng, X., G. A. Negrete, C. Kasmer, W. W. Yang, and P. J. Gearhart. 2004. Absence of DNA polymerase reveals targeting of C mutations on the nontranscribed strand in immunoglobulin switch regions. J. Exp. Med. 199: 917–924. Delbos, F., A. De Smet, A. Faili, S. 
Aoufouchi, J. C. Weill, and C. A. Reynaud. 2005. Contribution of DNA polymerase to immunoglobulin gene hypermutation in the mouse. J. Exp. Med. 201: 1191–1196. 56. Zhang, Y., X. Yuan, X. Wu, and Z. Wang. 2000. Preferential incorporation of G opposite template T by the low-fidelity human DNA polymerase . Mol. Cell. Biol. 20: 7099 –7108. 57. Martomo, S. A., D. Fu, W. W. Yang, N. S. Joshi, and P. J. Gearhart. 2005. Deoxyuridine is generated preferentially in the nontranscribed strand of DNA from cells expressing activation-induced cytidine deaminase. J. Immunol. 174: 7787–7791. 58. Chaudhuri, J., M. Tian, C. Khuong, K. Chua, E. Pinaud, and F. W. Alt. 2003. Transcription-targeted DNA deamination by the AID antibody diversification enzyme. Nature 422: 726 –730. 59. Besmer, E., E. Market, and F. N. Papavasiliou. 2006. The transcription elongation complex directs activation-induced cytidine deaminase-mediated DNA deamination. Mol. Cell. Biol. 26: 4378 – 4385. 60. Shen, H. M., S. Ratnam, and U. Storb. 2005. Targeting of the activation-induced cytosine deaminase is strongly influenced by the sequence and structure of the targeted DNA. Mol. Cell. Biol. 25: 10815–10821. 61. Li, Z., C. Zhao, M. D. Iglesias-Ussel, Z. Polonskaya, M. Zhuang, G. Yang, Z. Luo, W. Edelmann, and M. D. Scharff. 2006. The mismatch repair protein Msh6 influences the in vivo AID targeting to the Ig locus. Immunity 24: 393– 403. 62. Shevelev, I. V., and U. Hübscher. 2002. The 3⬘-5⬘ exonucleases. Nat. Rev. Mol. Cell. Biol. 3: 1–12. 63. Yabuki, M., M. M. Fujii, and N. Maizels. 2005. The MRE11-RAD50-NBS1 complex accelerates somatic hypermutation and gene conversion of immunoglobulin variable regions. Nat. Immunol. 6: 730 –736. 64. Larson, E. D., W. J. Cummings, D. W. Bednarski, and N. Maizels. 2005. MRE11/ RAD50 cleaves DNA in the AID/UNG-dependent pathway of immunoglobulin gene diversification. Mol. Cell 20: 367–375. 65. Genschel, J., L. R. Bazemore, and P. Modrich. 2002. 
Human Exonuclease I Is Required for 5⬘ and 3⬘ Mismatch Repair. J. Biol. Chem. 277: 13302–13311. 66. Sun, H., A. Yabuki, and N. Maizels. 2001. A human nuclease specific for G4 DNA. Proc. Natl. Acad. Sci. USA 98: 12444 –12449. 67. Duquette, M. L., P. Pham, M. F. Goodman, and N. Maizels. 2005. AID binds to transcription-induced structures in c-MYC that map to regions associated with translocation and hypermutation. Oncogene 24: 5791–5798. 68. Monson, N. L., T. Dorner, and P. E. Lipsky. 2000. Targeting and selection of mutations in human V rearrangements. Eur. J. Immunol. 30: 1597–1605. 69. Rosner, K., D. B. Winter, R. E. Tarone, G. L. Skovgaard, V. A. Bohr, and P. J. Gearhart. 2001. Third complementarity-determining region of mutated VH immunoglobulin genes contains shorter V, D, J, P, and N components than nonmutated genes. Immunology 103: 179 –187. 70. Luger, E., M. Lamers, G. Achatz-Straussberger, R. Geisberger, D. Infuhr, M. Breitenbach, R. Crameri, and G. Achatz. 2001. Somatic diversity of the immunoglobulin repertoire is controlled in an isotype-specific manner. Eur. J. Immunol. 31: 2319 –2330. 71. Brezinschek, H. P., S. J. Foster, T. Dorner, R. I. Brezinschek, and P. E. Lipsky. 1998. Pairing of variable heavy and variable kappa chains in individual naive and memory B cells. J. Immunol. 160: 4762– 4767. 72. Morvan, C. L., E. Pinaud, C. Decourt, A. Cuvillier, and M. Cogne. 2003. The immunoglobulin heavy-chain locus hs3b and hs4 3⬘ enhancers are dispensable for VDJ assembly and somatic hypermutation. Blood 102: 1421–1427. 73. Bottaro, A., F. Young, J. Chen, M. Serwe, F. Sablitzky, and F. W. Alt. 1998. Deletion of the IgH intronic enhancer and associated matrix-attachment regions decreases, but does not abolish, class switching at the locus. Int. Immunol. 10: 799 – 806. 74. Michael, N., H. M. Shen, S. Longerich, N. Kim, A. Longacre, and U. Storb. 2003. The E box motif CAGGTG enhances somatic hypermutation without enhancing transcription. Immunity 19: 235–242. 75. 
Schoetz, U., M. Cervelli, Y. D. Wang, P. Fiedler, and J. M. Buerstedde. 2006. E2A expression stimulates Ig hypermutation. J. Immunol. 177: 395– 400. 76. Wasserman, R., Y. Ito, N. Galili, M. Yamada, B. A. Reichard, S. Shane, B. Lange, and G. Rovera. 1992. The pattern of joining (JH) gene usage in the human IgH chain is established predominantly at the B precursor cell stage. J. Immunol. 149: 511–516. Downloaded from http://www.jimmunol.org/ by guest on July 12, 2017 45. SHM OCCUR IN AND 5⬘ OF AID HOT SPOTS ON BOTH STRANDS