REVIEWS Hypermutation in human cancer genomes: footprints and mechanisms Steven A. Roberts1,2 and Dmitry A. Gordenin1 Abstract | A role for somatic mutations in carcinogenesis is well accepted, but the degree to which mutation rates influence cancer initiation and development is under continuous debate. Recently accumulated genomic data have revealed that thousands of tumour samples are riddled by hypermutation, broadening support for the idea that many cancers acquire a mutator phenotype. This major expansion of cancer mutation data sets has provided unprecedented statistical power for the analysis of mutation spectra, which has confirmed several classical sources of mutation in cancer, highlighted new prominent mutation sources (such as apolipoprotein B mRNA editing enzyme catalytic polypeptide-like (APOBEC) enzymes) and empowered the search for cancer drivers. The confluence of cancer mutation genomics and mechanistic insight provides great promise for understanding the basic development of cancer through mutations. Mutation load The number of mutations in the genome or part of the genome of a cell, a group of cells (tumour, tissue and so on) or an organism. Mutation rates (Also known as mutation frequencies). The numbers of mutations per unit of length (for example, per Megabase) or biological time (for example, per cell generation). Drivers Mutations that increase the probability of tumour incidence. Laboratory of Molecular Genetics, National Institute of Environmental Health Sciences (NIH, DHHS), Durham, North Carolina 27709, USA. 2 School of Molecular Biosciences, Washington State University, Pullman, Washington 99164, USA. Correspondence to D.A.G. e-mail: [email protected] doi:10.1038/nrc3816 1 Mutations are among the usual suspects for causing cancer, with mutations being found in oncogenes and tumour suppressor genes in malignant tumours. Moreover, there are several classical cases in which increased spontaneous mutagenesis or environmentally enhanced mutagenesis correlates with increased mutation load and cancer risk1–7. Such instances of high mutation load have served as evidence supporting the hypothesis that cancer frequently involves the establishment of a mutator phenotype8, in which pre-cancerous cells begin to accumulate more mutations per cell generation than their normal counterparts. Despite the general observation that tumours often contain a large number of mutations (that is, >10,000 single nucleotide changes and tens to hundreds of structural changes), which we shall refer to as hypermutation, neither how these mutations accumulate (that is, through higher mutation rates or increased number of replications in highly proliferative cancer cells)8–10 nor whether they accelerate cancer or are merely a by‑product of immortalization has yet to be established. To try to understand the relationship between mutations and tumorigenesis more clearly, recent resequencing of cancer genomes has been undertaken. These efforts have been mostly designed to identify frequently mutated genes, the alteration of which likely drive cancer, but have also revealed important insights into the plasticity of the cancer genome. For example, mutation loads can differ by several orders of magnitude11,12, with a wide variety of tumour types, such as melanoma, lung, stomach, colorectal, endometrial and cervical cancers, showing high mutation loads that are consistent with hypermutation, which may generate drivers of malignancy. Evaluating this contribution by cataloguing cancer genes that are frequently affected by hypermutation and determining the mechanisms of hypermutation may further our understanding of cancer biology, through which new therapeutic targets may be identified. This Review will discuss the current understanding of hypermutation in cancer and speculate on future advances in this field that will be facilitated by the rapidly evolving area of cancer genomics, in which the analysis of vast whole-genome and whole-exome mutation data sets merges with detailed knowledge about DNA trans actions to identify new mutagenic mechanisms and find new cancer drivers. Hypermutation in cancer Scientists have long understood that the root causes of cancer lie in the dysregulation of cell survival and proliferation, often as the result of multiple genetic alterations (that is, both single nucleotide changes and structural rearrangements) that accumulate within a cell, despite a normally low mutation rate. However, 40 years after the initial suggestion of the cancer mutator phenotype13, this hypothesis remains supported primarily by the increased predisposition to cancer of individuals who are deficient in various DNA replication and repair processes, as 786 | DECEMBER 2014 | VOLUME 14 www.nature.com/reviews/cancer © 2014 Macmillan Publishers Limited. All rights reserved REVIEWS Structural rearrangements Changes in the location of genomic regions relative to each other. Mutation reporters DNA sequences that, when mutated in a cell, give the cell a phenotype that can be selected. In most cases the mutation either restores the function of an inactive protein or provides resistance to a drug. Mutation spectrum The list of mutation types and coordinates within a mutation load. well as by limited experimental observation of usually large numbers of mutations in various tumour samples. The number of cancer genomes and exomes (currently exceeding 10,000 and growing fast) sequenced by the collective efforts of individual groups, as well as The Cancer Genome Atlas (TCGA) and International Cancer Genome Consortium (ICGC), has provided the ability to have a much broader assessment of the sources and consequences of hypermutation in cancer development, mostly through statistical analysis of patterns within the mutation data. In these studies11,12,14–19, the sequence of tumour DNA is compared to the DNA sequence of either the matched normal tissue or blood of the patient to identify tumour-specific mutations. All current mutation identification algorithms require evidence of a genetic change to be supported by multiple DNA sequences that would originate from multiple cells within the tumour. As a consequence, these mutation lists represent a composite image of the mutagenesis occurring in the major sub-clones of the tumour and limit the contribution of mutations in neighbouring stromal cells. The philosophy and statistical approaches for extracting useful information from catalogues of mutations in cancer genomes are, overall, analogous to the analysis of mutation spectra obtained in experiments with mutation reporters — the classical approach in molecular genetics20,21. Apparent ‘irregularities’ in the distribution of mutation types and position, as compared to the null hypothesis of a random mutation spectrum, are matched against mechanistic knowledge about the chemistry of a mutagen and genetic systems expected to repair the resulting DNA lesions. For example, mutation spectra of ultraviolet (UV) radiation are in good agreement with its capability to cause bulky lesions (cyclobutane pyrimidine dimers (CPDs) and 6–4 photoproducts (6–4PPs)) in adjacent pyrimidine nucleotides22,23. However, where the analysis of mutation spectra from reporters in model systems is greatly aided by defined experimental conditions and genotypes, the background information for cancer genome mutation catalogues is much less defined. In part, this is compensated for by the large number of mutations within individual cancer genomes – sometimes greater than 105 – allowing for statistical analysis of cancer mutation spectra that is unrestricted by mechanistic hypotheses. De novo genome-wide or exome-wide patterns. The first mutation patterns within whole-genome-sequenced cancers were detected in relation to the distribution of mutations in genomic space (TABLE 1). Universally, mutation rate was observed to be increased near breakpoints of structural rearrangements24,25, which are present in large numbers in many cancer genomes and are unique in each sample. Studies in model microbial systems established that this relationship probably results from either error-prone DNA synthesis that is associated with DNA double-strand-break (DSB) repair 26–29 or an increased number of unrepaired lesions in long regions of singlestranded DNA (ssDNA) created around the break site30–34 (also discussed further below). Collectively, the experimental evidence showing that regions of DNA that are associated with DSB repair are prone to mutation and the bioinformatics analysis describing similar effects within clinical tumour samples suggested that intrinsic chromosomal features could themselves alter the rate of mutation and serve as one source of hypermutation in cancer. Other genomic features, such as replication timing, transcription levels, and chromatin organization, also affect mutagenesis and are considered to occur universally among normal somatic cells and various cancer types. They have non-uniform profiles across the genome; however, each profile is relatively constant between cells of the same type and even, in part, between different cancer cell lines. The stratification of these features across genomic space, initially developed by individual groups and subsequently expanded by the ENCODE consortium35–37, has generated a large database of profiles that Table 1 | Genomic features affecting the mutation rate in cancer genomes Genomic feature Type of feature Association with increased mutation density Potential mechanism Examples of affected cancer types Refs Rearrangement breakpoints Sample-specific Vicinity to a breakpoint Hypermutation in ssDNA and error-prone DNA synthesis Breast, head and neck, colorectal, prostate, melanoma, multiple myeloma and chronic lymphocytic leukaemia 24,25,32, 80 Replication timing Universal Late replication Unknown Melanoma; pan-cancer studies of up to 27 cancer types 12,38,39 Transcription Static Low transcription and/or non-transcribed strand TCR-mediated removal of mutagenic lesions Urothelial upper tract, lung and melanoma; pan-cancer study of 30 cancer types 11,42–44, 168 Heterochromatin Universal or cancer marks type-specific H3K9me3, H3K9me2 and Absence of TCR and poor H4K20me3 access of GGR Leukaemia, melanoma, lung and prostate 46 Nucleosome density Universal or cancer type-specific DNAse I-resistant regions with higher nucleosome content Poor access of GGR complexes Melanoma 39 Local sequence mutation Cancer type-specific Trinucleotide context DNA repair, replication and endogenous lesions Multiple types See the main text GGR, global genome repair; me2, dimethylation; me3, trimethylation; ssDNA, single-stranded DNA; TCR, transcription-coupled repair. NATURE REVIEWS | CANCER VOLUME 14 | DECEMBER 2014 | 787 © 2014 Macmillan Publishers Limited. All rights reserved REVIEWS Late replicating regions Regions of the genome where DNA is synthesized near the end of S‑phase during replication. Transcription-coupled repair (TCR). A form of nucleotide excision repair that utilizes a stalled RNA polymerase as a means to recognize a DNA lesion in the transcribed strand. Nucleotide excision repair (NER). A DNA repair pathway that removes bulky DNA lesions by removing a stretch of the DNA strand containing the lesion and then by subsequently using the remaining undamaged DNA strand as a template to synthesize a new stretch of DNA to replace the excised one. Mutation signatures Characteristics of a mutation, such as the mutated base, the resulting base (or bases) and nucleotides in the immediate vicinity that occur more frequently than expected with random mutation of genomic DNA. DNA lesion A specific change in DNA structure that results from DNA damage; for example, cytidine deamination, base alkylation, base oxidation, strand crosslinks, ultraviolet‑induced dimers and DNA breakage. Mutation clusters Groups of mutations that are spaced more closely than expected by random distribution of mutations in a genome. can be correlated with densities of somatic mutations in cancer genomes. The most robust and reproducible correlation was documented for late replicating regions accumulating higher mutation densities than in the rest of the cancer genomes12,38,39. Interestingly, late replicating regions seem to be also more mutable over an evolutionary timescale40,41. A greater chance of replication fork uncoupling leading to formation of hypermutable ssDNA in late replicating regions was suggested as a mechanism underlying this hypermutability 41. A second, relatively invariant, feature associated with increased mutation density is non-transcribed or low transcribed regions as compared to highly transcribed regions of the genome12. High levels of transcription may prevent mutations by enabling transcription-coupled repair (TCR), a form of nucleotide excision repair (NER) that is initiated by lesion-stalled RNA polymerase II (RNA Pol II), which, together with global genome repair (GGR), would remove more mutation-generating DNA lesions than GGR alone would do in non-transcribed or lowtranscribed regions. In support of a role for TCR in suppressing mutation rates, mutation frequencies were often lower on the transcribed strand versus the nontranscribed strand, where TCR does not operate11,42–45. Such analyses typically compare the relative number of specific mutation types occurring on the transcribed and non-transcribed DNA strands and therefore require assumptions about which base in a pair is mutated as well as about equal distribution of the target sequences between DNA strands. In cases in which a tumour is linked to a well-characterized DNAdamaging agent (for example, UV for melanoma), these assumptions are probably valid. However, other explanations for transcriptional strand bias are possible. Transcription levels may also partially explain the increase in mutation density that is associated with heterochromatin46. Regions with condensed chromatin, as determined by resistance to S1 nuclease indicating a higher density of nucleosomes, had increased mutation density in melanoma genomes39. This could also be due to more active TCR in more highly transcribed open chromatin. However, even non-transcribed regulatory regions with low nucleosome density had low mutation density. Supporting an alternative explanation, melanoma samples with mutations in NER genes show higher mutation densities in S1‑sensitive nucleosomedeprived regions than in samples with wild-type NER genes, suggesting that the increase in mutation density in nucleosome-rich regions can be, in part, due to lower accessibility to NER complexes. In addition to regional mutation patterns in cancer, patterns of sequence specificity can also be observed. Initial model studies, which involve sequencing of mutation reporters, established that mutation rates can vary substantially for different ‘mutation signatures’; that is, the choices of mutated nucleotide, the kind of nucleotide resulting from mutation and the immediate sequence context. These choices may be defined by one or several factors, such as the DNA lesion spectrum induced by a mutagen, replication fidelity, and the relative contribution of DNA repair pathways in an individual or cell type21 (see also below). Therefore, mutation signatures in a cancer sample may carry useful information about the history of the development of a tumour, which can be likened to an archaeological record47. A powerful analytical method, non-negative matrix factorization (NMF)48,49, which proved to be useful for a variety of bioinformatics analyses of large data sets, has likewise been productive in decrypting mutation signatures in cancer genomes. When applied to cancer mutation databases, this approach uses iteration to identify two matrices that, together, can accurately reconstruct the distribution of mutations across 96 mutation types (that is, three possible substitutions occurring at the central base of 32 trinucleotide sequences; complementary mutation types are combined). The first matrix is a group of specific mutation signatures that is defined by relative abundances of each mutation type, and the second matrix describes the strength at which the mutation signature occurs in each tumour in the analysed database. Lawrence, Getz and colleagues12 highlighted six prevailing mononucleotide or dinucleotide mutation signatures in more than 3,000 samples of 27 different cancer types. Some signatures were highly concordant with exposures to specific mutagens, such as tobacco or UV radiation. Stratton and colleagues11,50,51 applied the NMF algorithm to all possible combinations of trinucleotides and base substitutions of the central base. Their analysis identified 21 mutation signatures, each consisting of several simple trinucleotide mutation motifs. A somewhat different version of NMF led Zhao and colleagues52 to identify 20 mutation signatures, most of which overlapped with signatures found by other groups. They also explored the heterogeneity of mutation signatures within cancer types. They concluded that the mutation spectrum in cancers is generated by a complex mix of mechanisms resulting in a signature profile that is unique for each individual cancer. The advantage of the NMF-based approach is that it is unbiased and hypothesis-free, so that, in principle, it can extract any (even previously unsuspected) mutational signatures. However, as indicated by Alexandrov et al.50, the number of extracted signatures depends on the number of samples being analysed and the diversity and frequency of mutation signatures. Hence, low-frequency signatures will be difficult to identify. Still, with a large number of cancer samples, many signatures can be detected and then compared to experimental and mechanistic knowledge, with the goal of deciphering multiple mutagenic mechanisms that occur during cancer development. NMF-generated signatures are usually complex, and they may result from a mix of different mechanisms. Further analysis can be greatly facilitated by identifying a single-mechanism component (or components) within a signature. Patterns in mutation clusters. As an alternative to NMF, single-mechanism components of a mutation signature can be pinpointed by statistical analysis of spectra in groups of mutations that are located too close to each other to be attributed to random independent events — that is, representing mutation clusters. The phenomenon of unusually close positioning of several mutations was 788 | DECEMBER 2014 | VOLUME 14 www.nature.com/reviews/cancer © 2014 Macmillan Publishers Limited. All rights reserved REVIEWS a DSB repair b Break-induced replication c Telomere uncapping TLS and repair TLS and repair TLS and repair d Replication e Transcription Fork regression TLS and repair Repair TLS and repair Repair Mutation cluster Mutated cluster Figure 1 | Lesions in single-stranded DNA (ssDNA) can result in clusters of strand-coordinated mutations. Nature ReviewsLesions | Cancer are shown as hexagons. In the case of base-specific damage — for example, cytidine deamination by an apolipoprotein B mRNA editing enzyme catalytic polypeptide-like (APOBEC) enzyme (or enzymes) — lesions would be in the same type of DNA base (that is, C) of the same strand. Translesion synthesis (TLS) will introduce mutations in the complementary strand, which can then be fixed in DNA by a subsequent round of synthesis (labelled “TLS and repair” in each part of the figure). This will result in strand-coordination of mutations changing only Cs (red) of the initially damaged strand and only Gs (dark blue) that are mutated in the complementary strand. a | ssDNA that is formed by 5′→3′ resection at DNA double-strand-breaks (DSBs); one of several DSB repair mechanisms170 restores double-stranded DNA (dsDNA) at the position of the break. b | ssDNA is formed by the migrating loop and uncoupled strand copying during break-induced replication58,63. c | Telomere uncapping creates regions of ssDNA (REFS 31,55 and therein). d | ssDNA that is formed during DNA replication (ssDNA in the lagging strand is shown; a similar sequence of events may be associated with the leading strand). Lesions may result in mutations or be repaired via fork regression, which displaces a short stretch of complementary strand and pairs damaged ssDNA of the gap with the intact region of the complementary nascent strand62. The intact complementary strand provides a template for excision repair of lesions that are generated in the ssDNA gap. e | Lesions in transient ssDNA that is formed by mRNA–DNA pairing (R-loops69) can lead to mutation clusters. Mutations will be prevented if re‑annealing of DNA strands followed by excision repair occurs before replication. Break-induced replication A DNA double-strand-break repair mechanism involving the invasion of one DNA end into a homologous locus on a sister chromatid or homologous chromosome. Once invaded, the broken DNA is used to prime replication to the end of the unbroken sister chromatid or homologous chromosome to replace the DNA sequence that is lost owing to the DNA double-strand-break. initially detected by Sommer and colleagues53 using the Big Blue mouse mutation reporter system. These unusually spaced mutations were called ‘mutation showers’, by analogy with a meteor shower in the dark sky of the night 53. Later, indications of mutation showers were also found within the epidermal growth factor receptor (EGFR) gene in lung cancers54. The mechanisms involved in mutation cluster formation are of current interest. Using yeast model systems, we found that accumulation of base damage in ssDNA is one mechanism that is capable of generating closely spaced mutations31–34,55. Clusters of up to 30 mutations spanning 200 kilobases (kb) resulted from a variety of lesions (alkylation, deamination and photoproducts). Transient long regions of ssDNA can be formed at DSBs56, at uncapped telomeres57, during break-induced replication58 and at dysfunctional or uncoupled replication forks 30,32,34,59–63 (FIG. 1) : all of which are common events that occur in evolving cancer cells64–68. The specificity of mutation clusters for ssDNA results in some cases either from the robust removal of lesions in dsDNA by various excision repair systems, leaving only ssDNA lesions remaining, or from lesions caused by agents that specifically affect ssDNA31. An important condition for clustered hypermutation is that the lesions in transient regions of ssDNA are not NATURE REVIEWS | CANCER VOLUME 14 | DECEMBER 2014 | 789 © 2014 Macmillan Publishers Limited. All rights reserved REVIEWS Translesion synthesis (TLS). A form of lesion tolerance that involves the insertion of a new nucleotide opposite a DNA lesion, usually by a specialized DNA polymerase. R‑loops Stable hybrids of RNA and DNA that are formed during transcription. Strand-coordination A phenomenon in which clustered mutations involve changes of only one kind of base within the same DNA strand (for example, C‑coordination — only cytosines are mutated in the top DNA strand) Non-canonical excision repair Excision of a DNA lesion or modification that does not completely follow the pathway mechanisms of base excision repair, nucleotide excision repair or mismatch repair. Kataegis (or mutation shower). A group of clustered mutations carrying additional similarity features that are unlikely to be random (for example, changes of the same nucleotide in the given strand — strand-coordination). repaired until this strand is copied during synthesis of a nascent complementary strand with the help of errorprone translesion synthesis (TLS) DNA polymerases. Such repair may happen if an ssDNA region is annealed with an undamaged complementary strand; for example, by re‑annealing of R-loops69 or replication fork regression62 (FIG. 1). Whether ssDNA mutagenesis is the predominant mechanism for cluster formation in human cells, as it seems to be in yeast, remains to be established. The probability of mutation in transient ssDNA can be as high as several thousand-fold greater than in the rest of the genome32,34, leading to the incidence of mutation clusters containing multiple closely spaced mutations. As all mutations of a cluster occurred simultaneously, presumably from lesions in the single DNA strand, they exhibit strand-coordination30,32,34 (FIG. 1). Moreover, DNA-damaging agents often exhibit preference for a certain nucleotide (or nucleotides) or even for a short nucleotide motif (or motifs). Therefore, simultaneous mutations in a cluster occur primarily at the same kind of nucleotide or motif. The combined result of the stand-coordination and homogeneous motif specificity of clustered mutations is a pure mutation spectrum stemming from a single mutagenic process. The original proof‑of‑principle for this concept was for lesions in ssDNA caused by UVC radiation16,20. Strand-coordination was also demonstrated for other lesions (induced by methyl methanesulphonate (MMS), sulphites and the cytidine deaminase apolipoprotein B mRNA editing enzyme catalytic polypeptidelike 3G (APOBEC3G)) in artificially created ssDNA31,33. However, it is important to note that although strandcoordinated clusters have so far been associated with lesions in ssDNA, they can also stem from lesions in dsDNA, as long as the lesions occur simultaneously (see also figure 2 in REF 70)). On the basis of these and other findings, Roberts et al.32 reasoned that if strand-coordinated clusters could be found in cancer genomes, the mutations in each cluster would probably have occurred simultaneously and resulted from one mechanism. Consequently, analysis of mutations in these clusters would enable evaluation of a single mutagenic mechanism that was singled out from a complex mix of mutation causes in cancer. Indeed, many mutation clusters (containing up to 34 mutations and spanning several kb) were found in whole-genome sequencing (WGS) of multiple myelomas, head and neck, prostate and colorectal cancers, and around 30% of clusters were completely strandcoordinated32,71. Completely strand-coordinated clusters of cytosines or guanines (C- or G‑coordinated) prevailed over A- or T‑coordinated clusters. A- or T‑coordinated clusters were found mostly in multiple myelomas and were clearly associated with the mutation motif [T(A|T)] (mutated base underlined; ambiguous nucleotides shown in parentheses separated by “|”). This mutation signature has been reported for the error-prone DNA Pol η, which participates in gap-filling DNA synthesis that is associated with non-canonical excision repair of U∙G mismatches during somatic hypermutation (SHM) of immunoglobulin genes72,73. Abnormal targeting of activation-induced cytidine deaminase (AID; which induces U∙G mismatches during SHM) to random chromosomal locations could be the source of A- or T‑coordinated clusters74. However, if AID-induced U∙G mismatches were not repaired, they would result in a C‑to‑T or C‑to‑G substitution at the preferred DNA motif of AID for cytidine deamination: [(A|T)(A|G)C→(T|G)]75,76 (short), or [(A|T)(A|G) C(A|T|C)→(T|G)]77 (extended). Consequently, AIDgenerated clusters normally carry a mix of mutations in C and A75. Indeed, such mixed clusters were found in immunoglobulin loci and a small number of secondary targets (including potential cancer genes74,78,79) that are known to be targeted by AID mutagenesis with lower efficiency 74 in multiple myelomas32,80 and chronic leukaemias81. Further supporting SHM as the origin of these clusters, both of these cancer types originate from immune cells that have experienced SHM, and the mutations were enriched with the AID mutation signature32,80. AID mutagenesis, however, seems to occur primarily within the context of SHM, as no enrichment for the AID signature was found in randomly located clusters or in WGS data sets within the same multiple myeloma genomes32, and the AID deamination signature was absent from pan-cancer signature decomposition by NMF11. Similarly, C- or G‑coordinated clusters, as well as genome-wide mutagenesis across all cancer types analysed, were depleted for the AID signature32. C- or G‑coordinated clusters were more frequent than A- or T‑coordinated clusters in WGS, as well as in whole-exome sequencing (WES) data sets32,71, and were frequently colocalized with breakpoints of rearrangements found within the same cancer sample (FIG. 2a), whereas no colocalization of A- or T‑coordinated clusters were detected32. Inspection of the nucleotides flanking the strand-coordinated C or G mutations revealed high enrichment for a mutation signature [TC(A|T)→(T|G)], characteristic of a subclass of APOBEC cytidine deaminases (FIG. 2b). Similar clusters enriched with mutations in C and G were also found using NMF by Nik-Zainal et al.51 in WGS of breast cancer samples. Sizes and mutation densities of individual clusters (called micro-clusters in that study) were comparable to those found by Roberts et al.32 by mutated motif and colocalization with rearrangements. It was also noted that some larger areas of the cancer genome (up to 1 megabase) have a high density of mutations. Nik-Zainal et al. named this phenomenon kataegis (the Greek word for thunderstorm, by analogy with the previously suggested term — mutation shower). In a later work11, they defined kataegis (or kataegic foci) as six or more consecutive mutations with an average intermutation distance of less than or equal to 1 kb. This study found several kataegic foci not only in breast cancer but also in cancers of the pancreas, lung and liver, as well as in medulloblastomas, lymphomas and leukaemias11. The high prevalence of the [TC(A|T)→(T|G)] mutation signature in C- or G‑kataegic foci suggested that APOBEC mutagenesis could be the major component of some signatures that are revealed by NMF, in which each signature is composed of several trinucleotide motifs. Indeed 790 | DECEMBER 2014 | VOLUME 14 www.nature.com/reviews/cancer © 2014 Macmillan Publishers Limited. All rights reserved REVIEWS 21 Y 22 X 1 2 4 18 5′-TACCTCAGCGA...AGATTCTGACT...TCTATCATAGA-3′ 3′-ATGGAGTCGCT...TCTAAGACTGA...AGATAGTATCT-5′ Tumour 5′-TACCTgAGCGA...AGATTtTGACT...TCTATtATAGA-3′ 3′-ATGGAcTCGCT...TCTAAaACTGA...AGATAaTATCT-5′ Motif base frequency 5 17 6 7 13 Germline 3 20 19 16 15 14 b Strand-coordination, preferences in motif and substitution 12 11 9 10 8 1.0 1.0 Base frequency a Clustering of mutations with genomic features 0.8 0.6 0.4 0.2 0.0 –1 0 c Processing and mutation of deoxyuridine in ssDNA 0.4 0.2 G A Substitution C T d Refined signature analysis N N N T C A N N N N N N N 0.6 0.0 +1 Nucleotide position within a DNA motif 0.8 APOBEC mutation motif N N N TCW/WGA (W=A or T) Deamination N N N T U A N N N N N N N N N N Abasic site N N N T A N N N N N N N N N N N N N Pol δ and Pol ζ 1 N N N T APOBEC mutation signature UDG A N N N N N N N N A A T N N N N N TCW→TTW or →TGW/WGA→WAA or →WCA 2 Pol δ, Pol ζ and REV1 N N N T A N N N N N N N N A C T N N N N N APOBEC mutation pattern in a sample E= N N N T T A N N N N N N N N T G A N N N N N N N N A A T N N N N N N N N A C T N N N N N C-to-T mutation C-to-G mutation MutationsTCW→TTW or TCW→TGW x ContextC MutationsC→T or C→G x ContextTCW Figure 2 | Mutation patterns and mechanistic knowledge used to define an APOBEC mutation signature and produce sample-specific statistics evaluating mutagenesis. Robust mutation signatures can be developed by identifying groups of spatially clustered mutations that are likely to have been induced by a single mutagenic Nature Reviews | Cancer mechanism. Additional features that are used to implicate the causative factor are described in parts a–d. a | Proximity to sites of chromosomal rearrangement (purple connector line). b | Strand-coordination (example with sequence context of a C‑coordinated cluster from multiple myeloma32; see also FIG. 1 and the main text), motif preference (grey) and substitution specificity. For example, the colocalization of strand-coordinated clustered cytosine substitutions with rearrangement breakpoints implicates the involvement of DNA double-strand-break (DSB) repair in the formation of the mutations. The frequent involvement of single-stranded DNA (ssDNA) intermediates during such DSB repair events combined with an over-representation of TCA and TCT sequences among the mutations corresponds to the biochemical characteristics of a subset of apolipoprotein B mRNA editing enzyme catalytic polypeptide-like (APOBEC) cytidine deaminases within ssDNA. Both C-to-T and C-to-G substitutions are also over-represented. c | The mechanism of downstream processing of deoxyuridine (the deamination product of deoxycytidine), explaining mutation specificity in clusters55: glycolytic conversion of deoxyuridine to an abasic site creates a block to DNA synthesis during gap filling. The concerted action of DNA polymerase δ (Pol δ) and DNA Pol ζ (1) or DNA Pol δ, DNA Pol ζ and REV1 (2) makes mutagenic insertion of either adenine or cytosine opposite the abasic site, respectively, resulting in C‑to‑T and C‑to‑G mutations. d | Combining the favoured sequence motifs and base substitution preferences of APOBEC enzymes creates a refined mutation signature and allows calculation of sample-specific statistics evaluating mutagenesis — enrichment (E) by APOBEC signature mutations over the presence of an APOBEC mutation motif in the surrounding nucleotide context. TCW and WGA indicate the APOBEC-targeted sequences of TCT and TCA, as well as their complements, as described in REF. 71. Nucleotides that are involved in mutation events are shown in red. UDG, uracil DNA glycosylase. NATURE REVIEWS | CANCER VOLUME 14 | DECEMBER 2014 | 791 © 2014 Macmillan Publishers Limited. All rights reserved REVIEWS Rearrangement breakpoints A pair of distant genomic coordinates that are brought into the immediate vicinity of each other by a rearrangement. [TC(A|T)→(T|G)] mutations prevailed in two of the signatures (signatures 2 and 13) that occurred in several cancer types that were defined by the NMF analysis11. Pairing mutation signatures with their source: the APOBEC example. Several key experimentally determined characteristics of APOBEC cytidine deaminases enabled attribution of the mutation signature [TC(A|T)→(T|G)] to these enzymes (FIG. 2b). The human genome encodes eight enzymatically active APOBEC polypeptides (seven are located in the APOBEC3 cluster of highly homologous genes), which normally serve to restrict viral infection and retrotransposon mobility by deaminating cytosines during the ssDNA stage of their replication cycle82,83. These enzymes have exquisite preference for ssDNA, where they deaminate the C that resides in TCA and TCT trinucleotides to U. This U can be directly copied to produce C-to-T substitutions. However, uracil bases in ssDNA are often excised by uracil DNA glycosylase enzymes (UDG enzymes, which are also known as UNG enzymes in humans), leaving abasic sites which, after copying by TLS polymerases, results in both C-to-T and C-to-G mutations31,55,84–86 (FIG. 2c). Moreover, some APOBECs are highly processive, allowing direct formation of mutation clusters87,88. These biochemical properties correlated well with the sequence motif and base substitution spectrum of mutations in C‑ and G‑coordinated clusters, as well as the colocalization with rearrangement breakpoints where ssDNA is expected to have formed, either because ssDNA is more fragile or because regions of ssDNA would be created around DSBs by end resection56. The primary benefit of associating a mutagenic process with a stringent signature, such as [TC(A|T)→(T|G)], is that this signature can be used to develop a measure of mutagenesis in the entire genome of individual samples — enrichment over the expected presence of the signature if caused by random mutagenesis32,71 (FIG. 2d). This allowed computation of sample-specific p‑values as well as q‑values corrected for multiple testing errors. In the case of the [TC(A|T)→(T|G)] signature, ana lysis of about a million mutations in 14 types of cancer reported by TCGA revealed that bladder, cervical, breast, head and neck and lung cancers have a high prevalence of APOBEC-mutated samples. The same cancer types showed the presence of signatures 2 and/or 13 in the NMF-based analysis11. However, only the capacity to compute sample-specific q‑values and enrichments enabled the finding that tumours in the HER2‑enriched subtype of breast cancer are much more likely to be mutated by APOBEC compared with three other breast cancer subtypes71. APOBEC1, as well as six of the APOBEC3 proteins, have the ability to generate the [TC(A|T)→(T|G)] mutation signature in vitro and in model in vivo studies70,82. A high level of APOBEC3B expression in several cancer types, and weak but statistically significant correlation of its mRNA with the load of APOBEC signature mutations, prompted Burns et al.89,90 to suggest that this enzyme is the likely cause of mutagenesis. However, another study implicated APOBEC3A, as well as APOBEC3B, as possible mutagenic enzymes in breast cancer 91. Interestingly, APOBEC3B homozygous deletion is polymorphic within the human population92 and individuals who are homozygous-null and even hetero zygous have a detectable increase in frequencies of breast or liver cancers93–95 that can have a high frequency of mutations fitting the APOBEC signature96. Identification of the real culprit through analysis of cancer genomics data sets is complicated by the level of mutagenesis, which probably depends on additional factors besides mRNA levels of APOBECs, such as the active protein level, the accessibility of chromosomal DNA, and the availability of ssDNA substrate (discussed in REF. 71). It is also worth noting that many mutations detected in cancers may have occurred long before tumour mRNA was extracted for RNA sequencing. Although several questions about APOBEC mutagenesis in cancers remain unresolved, it is a useful example of a strategy leading to the construction of a rigid mutation signature. This in turn allows statistical power that is sufficient for computing sample-specific p‑values, which are not available if complex signatures derived from NMF are used for sample-by‑sample analysis directly. In addition to APOBEC mutagenesis, cancer genome resequencing identified other mutation signatures occurring both in mutation clusters and scattered across the genome. Although many of these signatures cannot yet be linked to a specific source, correlations with mechanism-based hypotheses have enabled the likely causative agents for the following signatures, which can be broadly separated according to the source — exposure to a mutagen, or resulting from defective DNA repair (TABLE 2 and below). Increased lesions or decreased error-free repair This group of mutagenic mechanisms relies on exo genous or endogenous (such as the AID and APOBEC enzymes discussed above) DNA lesions, which are turned into mutations by downstream DNA repair and replication (FIG. 3; TABLE 2). Mutations in CpG motifs. Cytosines in CpG sequences are often methylated at the C5 carbon, producing 5‑methylcytosines (5‑meCs). Deamination of 5‑meC produces T and often C-to-T mutations. Such mutations can be prevented if a T∙G mispair is recognized by the T∙G‑specific thymine DNA glycosylase (TDG) and the resulting abasic site is repaired by base excision repair (BER) 97,98. Unlike mutations that are caused by C‑deamination in the APOBEC motif, CpG mutations are depleted within C- or G‑coordinated mutation clusters, which are expected from simultaneous mutation events32,71. Frequent conversion of 5‑meC to T leads to depletion of CpG dinucleotides from genomes, except within regulatory CpG islands or rare coding positions, where methylation is limited and they are maintained by positive selection99,100. When mutation rates in CpG motifs are normalized for their presence in the genome or in the exome, this mutagenesis becomes detectable in many WES and WGS cancer mutation data sets11,12. Unlike other mutation signatures, CpG mutagenesis correlates with the patient’s age at cancer detection11. 792 | DECEMBER 2014 | VOLUME 14 www.nature.com/reviews/cancer © 2014 Macmillan Publishers Limited. All rights reserved REVIEWS Table 2 | Signatures associated with different causes of hypermutation in human cancer genomes Source of mutations Mutated motif*‡ Resulting base(s) Base change Hypermutation load (fraction)§ Additional features UV radiation (–|T|C|–)C(A|T|C|G) T C→T 30,000–50,000 (40–50%) Exposure history; somatic mutations, transcription 105 (–|T|C|–)C(G) T C→T 3,000–5,000 (5–10%) Exposure history; somatic mutations, transcription 105 (A|T|C|G)CC(A|T|C|G) TT C→T 500–1,000 (1%) Exposure history; somatic mutations, transcription 105 5‑meCpG deamination (A|T|C|G)C(–|–|–|G) T C→T 1,000–10,000 (5–10%) Age dependence 11 APOBEC (–|T|–|–)C(A|T|–|–) T|G C→T 50,000–100,000 (30–70%) C- or G‑strand-coordinated clusters 71 AID (A|T)(A|G)C(A|T|C|G) T|G C→T Region specific Clusters in AID genomic targets DNA polymerase ε (–|T|–|–)C(–|–|–|G) T C→T 10,000–20,000 (40–60%) || Refs Somatic mutations, MSS 138 Somatic mutations, MSS 138 (–|T|–|–)C(–|T|–|–) A C→A ROS (A|T|C|G)C(A|T|C|G) A C→A TBD TBD Tobacco (A|T|C|G)C(A|T|C|G) A C→A 30,000–100,000 (20–90%) Exposure history 11 Temozolomide (A|T|C|G)C(–|T|C|–) A C→A 30,000–200,000 (30–90%) Treatment record 126 Aristolochic acid (–|–|C|–)T(A|–|–|G) A T→A 50,000–200,000 (30–70%) Exposure history 42,45 DNA polymerase η (A|T|C|G)T(A|T|–|–) A|T|C|G T→A TBD TBD MMR defects MSI NA NA NA Somatic mutations, MLH1 hypermethylation *Each motif produced by the same source of mutation is shown on a separate line. ‡The mutated nucleotide (underlined) as well as the 5ʹ and 3ʹ flanking nucleotides are shown for each motif: (5′ nucleotide) mutant (3′ nucleotide), with the exception of the mutation motif of AID, for which two 5ʹ nucleotides are shown. Possible nucleotides for a position are shown in parentheses divided by “|” in the order A|T|C|G; if a certain nucleotide does not occur in this position, it is shown by “–” to align similar nucleotides; “–” is not used in representation of signatures in the text and in BOX 1; AID and the third UV signature are not aligned with the others. § Representative examples are provided; whole-exome sequencing data are prorated on a whole-genome scale with the assumption of 100‑times more mutations in whole-genome sequencing. ||Values indicate the combined contribution of both signatures. AID, activation-induced cytidine deaminase; APOBEC, apolipoprotein B mRNA editing enzyme catalytic polypeptide-like; me, methyl; MMR, mismatch repair; MSI, microsatellite instability; MSS, microsatellite-stable; NA, not applicable; ROS, reactive oxygen species; TBD, to be determined; UV, ultraviolet. UV radiation. UVB (280–320 nm) and UVA (320– 400 nm), which are components of sunlight, are established risk factors for melanomas and head and neck squamous cell cancers101. The spectrum and signature of UV‑induced mutagenesis is defined by the spectrum of lesions that arise and the interplay of repair pathways and TLS across unrepaired lesions22,23. CPDs and 6–4PPs in dsDNA are normally repaired by NER using the undamaged DNA strand as a template. Unrepaired photoproducts are converted into mutations by error-prone TLS during replication102,103. The specific UV mutagenesis signature relies on a highly increased rate of deamination of cytosines in CPDs22,104. The resulting pyrimidine‑U CPD is copied by the error-free TLS polymerase DNA Pol η, resulting in C-to-T mutation in [(T|C)C(A|T|G|C)] sequences. When CC dinucleotides form a CPD, deamination of both cytosines results in CC→TT dinucleotide substitution. Indeed, both of these mutation signatures are prevalent in melanomas and to a lesser extent in squamous cell carcinomas of the head and neck11,12,43,105. Oxidative DNA damage. Free radical (hydroxyl and superoxide) and non-radical (hydrogen peroxide) reactive oxygen species (ROS) that are constantly produced by cell metabolism and enter the cell from the environment can cause various kinds of oxidative damage to all DNA bases, which can become mutations106–108. Studies with mutation reporters have indicated that oxidative damage can lead to a higher prevalence of mutations in G∙C than in A∙T base pairs15,109–111, which could contribute to the bias towards G∙C mutations observed in many cancers11,12. Highly mutagenic 8‑oxo‑7,8 dihydroguanine (8‑oxoG) could be the reason for such a bias; however, other products of reactions of guanine and cytosine could also contribute108. Vasquez and colleagues112,113 found that the mutation motif [(A|T|G|C)(T|C)C(A|T|C)] is enriched in several cancers. Experimentally, they found that guanines in the complementary sequence context have a higher potential to trap electrons and thus have an increased chance of chemical modification. Partial overlap of this motif with the APOBEC mutation motif [(A|T) GA] in the G‑containing strand suggests that additional steps should be taken at statistical analysis of either motif. Tobacco. Tobacco contains multiple ingredients that are capable of forming mutagenic DNA adducts114,115. Among these, benzo[a]pyrene (BaP) is the most experimentally studied. Its adduct with guanine predominantly induces G∙C→T∙A mutations. However, the same class of mutations is expected from 8‑oxoG, which can result from ROS that are generated by smoking. Regardless, G∙C→T∙A mutations are the major class of base substitutions observed in TP53 from lung cancers and in WGS and WES of lung cancers, and the occurrence of this class of mutation correlates with smoking history 44,116–118. Interestingly, in bladder cancer, for which smoking is NATURE REVIEWS | CANCER VOLUME 14 | DECEMBER 2014 | 793 © 2014 Macmillan Publishers Limited. All rights reserved REVIEWS Low mutation frequency Error-free lesion repair MMR Hypermutation Figure 3 | Sources of hypermutation in cancer. Hypermutation can occur through Nature Reviews | Cancer alterations in the equilibrium between the formation of lesions (hexagons) and error-free lesion repair, or changes in the equilibrium between the rate of replication errors and DNA polymerase proofreading or mismatch repair (MMR) (shown as removal of bulged mispairing behind the replication fork). Low levels of lesions and replication errors and/or high efficiency of error-free lesion repair, proofreading and MMR results in low mutation frequency; however, reduction in repair and/or increase in lesions or in replication error levels may lead to hypermutation (mutant positions are shown in red and do not indicate nucleotide content). known to be a strong risk factor, the WES of 130 samples did not contain an increased number of G∙C→T∙A mutations15. Instead, bladder cancers showed enrichment for the APOBEC mutation signature. SN1-type alkylating agents Chemicals that form a reactive intermediate in the body that can attack DNA bases to covalently link an organic group (for example, a methyl group). Mismatch repair (MMR). A DNA repair pathway that removes mismatched nucleotides resulting from DNA replication errors (for example, C∙T base pairs). MMR removes a stretch of the DNA strand containing the mismatched nucleotide and then subsequently uses the remaining intact DNA strand as a template to synthesize a new stretch of DNA to replace the excised one. The mutation load in upper tract urothelial cancers induced by aristolochic acid is one of the highest known so far. A∙T→T∙A mutations in these cancers were enriched with [(C|T)AG]42 (note that TABLE 2 shows this motif as in the complementary (T) strand) or even with a more stringent [A(C|T)AGG]45 motif. Temozolomide. The major mutagenic product of the SN1-type alkylating agents, including the anticancer drug temozolomide (TMZ), is O6-methylguanine (O6-meG). This DNA lesion can be copied by DNA polymerase, inserting either the correct C‑nucleotide or the incorrect T. The latter would cause a C∙G→T∙A mutation. O6-meG∙C as well as O6-meG∙T are recognised by the mismatch repair (MMR) pathway, which functions in conjunction with DNA replication. Upon recognition of a lesion, MMR triggers removal of a nascent DNA stretch from the O6-meG lesion and reinitiation of DNA synthesis. Subsequent synthesis past the same O6-meG frequently regenerates the mismatch, which is in turn recognized by MMR and establishes a repetitive repair cycle119–121. This ‘futile repair cycle’ is mostly toxic to proliferating cells, which underlies the mechanism of action of TMZ as a cancer drug. However, resistance to TMZ as well as to other SN1 alkylating agents can occur as long as cancer cells acquire additional defects in MMR or overexpress O6-meG-DNA methyltransferase (MGMT)122–125. This results in a high frequency of G∙C→A∙T mutations, which occurs in the genomes of TMZ-treated gliomas and melanomas11,19,126. Importantly, these G∙C→A∙T mutations were predominantly in CC or (to a lesser extent) in CT motifs, which allows them to be distinguished from C-to-T mutations that are associated with 5‑meCG. Aristolochic acid. This carcinogenic substance is contained in the East-Asian traditional medicinal plant Aristolochicia sp.127,128. The metabolized derivative of aristolochic acid forms mutagenic aristolactam–DNA adducts (AL‑dAs). AL‑dAs are refractory to the GGR branch of NER and can be repaired only by TCR in the transcribed strand129, which could explain a transcriptional strand bias and high frequency of A∙T→T∙A mutations in cancer-associated genes such as TP53 (REFS 130,131). Increased DNA synthesis errors The chances for mutations to arise from DNA synthesis copying a normal template during genome replication are limited by replicase accuracy in base selection, nearly instant proofreading of replication errors and postreplicative MMR132,133. Inactivation of any of these safeguards can lead to hypermutation. Similarly, mutations can also be introduced during replication when errorprone TLS polymerases copy small stretches of DNA that do not contain lesions134. MMR defects. The inherited predisposition to hereditary non-polyposis colorectal cancer (HNPCC; also known as Lynch syndrome) as well as to several other types of cancers is the consequence of germline genetic defects in one of the MMR genes: MSH2, MSH6, PMS2 and MLH1 (REF. 133). The most easily detected result of MMR defects, even in WGS and WES data sets, is an increase in microsatellite instability (MSI) — frequent deletions and insertions in arrays of very short direct repeat sequences (microsatellites)135. In sporadic colorectal, gastric and endometrial cancers14,16,136, genome-wide MSI often coincides with MLH1 hypermethylation or with somatic mutations in MMR genes. Cancer samples with MSI also have an increased frequency of various base substitutions. They form a category of hypermutated cancers, which is distinct from the hypermutation that occurs in cancer cells with somatic mutations in replicative DNA polymerases. DNA Pol ε and DNA Pol δ mutations. Somatic mutations in the respective catalytic subunits of replicative DNA Pol ε (POLE) and DNA Pol δ (POLD1) have been recently associated with familial predisposition to colorectal, endometrial and ovarian endometrioid cancers137–140. On the basis of the budding yeast model, DNA Pol δ synthesises the lagging strand and DNA Pol ε is responsible for the synthesis of the leading strand in a bi‑directional replication fork141. In WES analysis, samples with somatic mutations in POLE are ‘ultramutated’ and carry even higher mutation loads than MSI-positive MMR-deficient tumours. It was noted that sporadic mutations in POLD1 were less frequent among ultramutated tumours, which mostly carried POLE mutations in the proofreading exonuclease domain (EDM)14,137,142,143. Samples with EDM mutations are particularly enriched with two separate signatures [TCT→A] and [TCG→T]11. TLS polymerases. DNA Pol η can perform error-free copying of CPDs; however, it has a high error rate when copying an undamaged template in vitro144. The DNA Pol η [T(A|T)] mutation motif was significantly enriched in A- or T‑coordinated clusters found in multiple myelomas32 (discussed above). Enrichment with this motif was 794 | DECEMBER 2014 | VOLUME 14 www.nature.com/reviews/cancer © 2014 Macmillan Publishers Limited. All rights reserved REVIEWS also documented for chronic leukaemias and in B cell lymphomas11,81. Another mutation signature, [AT→C], detected in hepatitis C virus (HCV)-positive hepatocellular carcinomas was suggested to originate from errorprone synthesis by one of the TLS polymerases based on overexpression of DNA Pol ζ and DNA Pol ι in this type of cancer 145,146. Structural and copy number changes In addition to the large numbers of base substitutions that are seen in various cancer types, many cancers show marked structural rearrangement of the genome, resulting in chromosome translocations and changes in allelic copy number. Copy number changes that are driven by aneuploidy are themselves sufficient to promote carcinogenesis in mice147. A role for similar changes driving cancer in humans has been indicated computationally 148, and by recurrence in WGS analyses149. Observations in many lung squamous cell carcinomas, as well as ovarian, breast and head and neck cancers, suggested that either mutation or rearrangement can drive cancer progression150. Discoveries of phenomena termed ‘chromothripsis’ (REF. 151) and ‘chromoanasynthesis’ (REF. 152) indicate that structural alterations, as well as base substitutions, can occur at high density in localized regions, resulting in large-scale rearrangement of one or a few chromosomes. Evidence of frequent microhomology at the junction of these mass chromosomal rearrangements151,152 and in instances of solitary structural changes25 suggests that mechanisms such as non-homologous end joining and microhomologymediated break-induced replication at DSBs or fork stalling and template switching can be involved in the generation of chromosome alteration; however, the initiating causes of such events remain unclear. Cancers with ‘simple’ genomes Contrasting the multiple instances of hypermutation and genome instability observed in various cancer types, several recently characterized paediatric and some adult lymphoid cancers show relatively small numbers of genetic alterations12. Such ‘simple’ paediatric cancers are exemplified by rhabdoid tumours in which the initial sequencing of 35 patients revealed mutation loads on the order of 1–10 mutations per exome and only a single recurrent event involving alteration of the gene encoding the chromatin remodelling enzyme SMARCB1 (REF. 153). Similarly, paediatric medulloblastomas and Ewing’s sarcomas have low mutation numbers17,154,155, and adult lymphoid cancers, such as acute myeloid leukaemia, often have similar genetic complexity to corresponding normal cells18. Thus, although increased mutation rates are one potential means to dysregulate cell growth and proliferation leading to cancer development, others clearly exist. The frequently recurrent disruption of genes encoding epigenetic modifying enzymes such as SMARCB1, DNA methyltransferase 3α (DNMT3A), TET2, enhancer of zeste 2 (EZH2), and mixed lineage leukaemia 2 (MLL2; also known as KMT2D) in paediatric and lymphoid tumours156 may suggest that global changes in gene expression through a single genetic change can be sufficient to drive cancer progression. Moreover, programmed genetic changes such as V(D)J recombination, SHM and class switch recombination that occur in lymphoid cells can inadvertently target cancer-associated genes (for example, recombination-activating gene (RAG)-mediated deletion of ETS variant 6 (ETV6) and cyclin-dependent kinase inhibitor 2A (CDKN2A)–CDKN2B in acute lymphoblastic leukaemia (ALL)157, and MYC translocations in Burkitt’s lymphoma158,159) and can thereby promote cancer progression without the establishment of a general mutator phenotype. The carcinogenic potential of hypermutation Despite the few examples of cancers with ‘simple’ genomes, a high level of genome and epigenome instability in most adult cancers is a well-accepted fact 150,160, but the relative roles and specific contribution of each type of instability is still a question for the cancer research of today. Not surprisingly, the relative carcinogenic role of hypermutation is still under debate8–10. Intrinsic features of cancers, such as an increased number of cell divisions owing to unrestrained proliferation, could produce high total numbers of mutations in a tumour without the requirement of an increased mutation rate. Moreover, increased genome instability that is initiated during malignant transformation may obscure less prominent mutagenic processes operating early during tumorigenesis. The importance of these minor pathways in driving cancer could be greatly amplified relative to their contribution to the total mutation load by selection for specific oncogenic changes. However, it is clear that hypermutated cancer genomes can contain prevailing mutation signatures in genes that are important for cancer initiation and development12,42,45,71,90,156. Recent bioinformatics and experimental analyses indicate that in at least some of these cases, hypermutation is inducing mutations that actively drive carcinogenesis. For example, mutations occurring in PIK3CA are more common in APOBEC-hypermutated tumours, regardless of tumour type161. Moreover, the same study demonstrated that the PIK3CA mutations in APOBEC-hypermutated tumours are skewed towards two ‘hotspot’ locations in the regulatory helical domain over the third canonical mutation hotspot that is located directly in the kinase domain. The over-representation of the tumour-selected helical domain mutations, which exist in APOBEC target motifs, over an equally selectable, non-APOBEC mutation in the APOBEC-hypermutated cancers strongly suggests that APOBEC enzymes are probably causative in these cancers. In addition, Marais and colleagues162 have shown that UV‑induced mutations accelerate carcinogenesis in BRAF-deficient mice by inducing selected mutations in Trp53. Identifying similar examples of hypermutation-induced cancer will probably become easier as our knowledge of the key genes that are responsible for tumorigenesis becomes more complete. Lawrence et al.12 found that accounting for mutational heterogeneity that is associated with various functional and structural features across the genome (TABLE 1) increased the statistical power for detecting significantly mutated genes (SMGs) that are potentially important for cancer, and it eliminated apparent false positives, such as olfactory receptor genes. These genes reside NATURE REVIEWS | CANCER VOLUME 14 | DECEMBER 2014 | 795 © 2014 Macmillan Publishers Limited. All rights reserved REVIEWS in late replicating, low transcribed regions of the genome that show an elevated mutation rate. After correcting for regional differences in mutation rate, these genes fell out from the SMG category. Alternative methods of identifying cancer drivers, such as comparing ratios of benign to deleterious mutations in specific genes149 or computational algorithms such as CRAVAT163, are also sensitive to background mutation frequencies to some extent. High numbers of passenger mutations and the potential for mutational hotspots in hypermutated cancers can considerably complicate these analyses, potentially resulting in both false-negative and false-positive results. We anticipate that the accumulation of mechanistic knowledge, as well as WES and WGS cancer mutation data, will increase the accuracy of correction factors by accounting for specific features of mutagenic mechanisms that are operating in individual hypermutated cancer samples. For example, calculating enrichments for a simple mutation signature allows selection of those samples for analysis, in which a significant part of the mutations carrying the signature really have risen from a mutagen. Groups of samples with high statistically-evaluated presence of specific mutation signatures can be used as a discovery set for defining genomic profiles of hypermutation caused by a signature-generating mutagen, leading to the development of mutagen-specific correction factors in searching for SMGs. Perspectives — actions and outcomes Despite the current successes of resequencing genomes in understanding the impact of mutation on cancer and genome dynamics in general, a number of the mutation signatures that have been highlighted by NMF50 cannot, Box 1 | Merging statistical pattern analysis with mechanistic information Although de novo pattern recognition approaches are useful for highlighting mutation signatures that prevail in large groups of cancers, they can only identify samples that are enriched with a given signature after the principal component of a signature is extracted and used for repeated analysis. Such an extraction can be done on the basis of the prevalence of a signature derived by non-negative matrix factorization (NMF); however, an independent statistical analysis, such as the presence of a principal component of a signature in clusters and/or correlation with mechanistic information on a suggested source of mutagenesis may result in a more rigid definition of a signature, thereby increasing the power of the statistical analysis. Key benefits of using experimentally derived mechanistic knowledge to generate rigid mutation signatures include: •The information about mutagenic specificity of environmental and endogenous causes of mutations can be used for reducing each multicomponent signature derived from NMF to a less complex one, which is better suited for sample-to‑sample analysis. •Mechanistic information can help avoid over-simplification of mutation signatures, which would result in excessive overlap. •As long as two simple signatures are known to come from the same mechanism, they can be used within a single, more focused and powerful statistical hypothesis. The examples listed in TABLE 2 and below (see also REFS 112,168) illustrate these lines of analysis: Example 1 The ultraviolet (UV) radiation signature, [(T|C)C(A|T|C|G)→T] (similar to TABLE 2, mutated bases are underlined and ambiguous nucleotides are shown in parentheses separated by “|”; unlike in TABLE 2, only the nucleotides that can be found in a given position of a signature are shown) overlaps with the [TC(A|T)→T] part of the apolipoprotein B mRNA editing enzyme catalytic polypeptide-like (APOBEC) signature. However, the other part of the APOBEC signature [TC(A|T)→G] is absent in the UV mutagenesis signature. Further supporting the lack of [TC(A|T)→G] mutations stemming from UV radiation, in melanoma genomes, the density of [TC(A|T)→T] mutations showed reverse dependence on nucleosome density, whereas [TC(A|T)→G] did not25. Another feature distinguishing UV from APOBEC mutagenesis is the lack of preference in the choice of the 3′‑nucleotide setting the mutation motif as [(T|C)C(A|T|C|G)→T]. Moreover, the third nucleotide position for a UV‑induced signature should be enriched with G, because methylation of cytosine increases the chance of cyclobutane pyrimidine dimer (CPD) formation22,104. Indeed, high prevalence of [(T|C)CG→T] mutations is a common feature of many UV‑associated cancers11,12,43,105. Together with CC→TT enrichment in a sample, these attributes provide a good tool for highlighting cancer samples with a strong component of UV‑induced mutagenesis. In addition, the partial overlap between the part of the UV mutation signature [(T|C)C→T] and the APOBEC signature can also be resolved by UV‑mutagenesis occurring more frequently in the non-transcribed strand (TABLE 1), whereas APOBEC signature mutations do not show any transcriptional bias11. Example 2 The partial overlap between the APOBEC mutation signature [(T)C(A|T)→T|G] and the DNA polymerase ε mutation signature [TCT→A] can be resolved by accounting for the difference in the base resulting from a mutation (T or G versus A), which is in agreement with mechanistic information about translesion synthesis (TLS) across abasic sites — the primary lesion generated in place of uracils created by APOBEC cytidine deamination in single-stranded DNA (ssDNA)55. Example 3 The overlap between the tobacco signature, [(A|T|C|G)C(A|T|C|G)→A], and the temozolomide (TMZ) signature, [(A|T|C|G) C(T|C)→A], can be resolved using the clinical history of exposure. As mechanism-based mutation signatures are confined to simple nucleotide motifs, it is possible to calculate enrichment for the motif in mutations over its presence in a sequenced part of the genome (see FIG. 2 for an example involving APOBEC). These calculations are useful for discerning between mutagenic processes that generate overlapping signatures (see, for example, Figure 1 in REF 71), as the more likely process would generate a higher enrichment. 796 | DECEMBER 2014 | VOLUME 14 www.nature.com/reviews/cancer © 2014 Macmillan Publishers Limited. All rights reserved REVIEWS Box 2 | Establishing a uniform organization among mutation databases Unifying the organization of the large tables containing mutation lists (called Mutation Annotation Files (MAFs) in The Cancer Genome Atlas (TCGA) data depository) and supplementing them with additional information could greatly facilitate the use of genomics data by a large number of mechanistic experts who do not have sufficient bioinformatics expertise. Below are several suggestions in this regard: •Unifying the column titles and data value designations is a simple first step to facilitate data analysis by multiple groups outside the data generation consortia. •Unifying sample identification (ID) so that it is the same for a given sample between all platforms (clinical records, expression, methylation, mutation, rearrangement, copy number variation and so on) and data sets within a study. •Providing the values for allele fractions and coverage for each mutation call, which is a natural requirement for the accurate representation of mutation data in a cancer sample that usually consists of more than one clone. Analysis of allelic fractions provides an understanding of cancer development history in the sense of mutation incidence, selection and fixation148,169. •So far, statistical analyses of hypermutation have been based on tumour-specific mutation calls. Adding the data about germline genotype for each patient with a sporadic tumour analysed by genomics platforms would allow genetic analysis of mutation spectra and mutation signatures as quantitative genetic traits. Although these data are available from the same sequencing effort that produces tumour-specific mutation calls, they are not provided in sample depositories or by individual studies. so far, be associated with any known mutagenic pathway or agent. Likewise, the mechanism underlying other forms of hypermutation, such as the mutation signature [(A|T|C|G)TT→G] in oesophageal cancer 164, hypermutation of the female inactivated X chromosome165 and hypermutated prostate cancer 166,167, remain unknown. Combining statistical signature analysis with mechanistic research could bring clarity to these issues (BOX 1). Moreover, statistical evaluation of specific mutagenesis 1.Aaltonen, L. A. et al. Clues to the pathogenesis of familial colorectal cancer. Science 260, 812–816 (1993). 2. Cleaver, J. E. & Crowley, E. UV damage, DNA repair and skin carcinogenesis. Frontiers Biosci. 7, D1024–D1043 (2002). 3. Hart, R. W., Setlow, R. B. & Woodhead, A. D. Evidence That Pyrimidine Dimers in DNA Can Give Rise to Tumors. Proc. Natl Acad. Sci. USA 74, 5574–5578 (1977). 4. Hecht, S. S. Progress and challenges in selected areas of tobacco carcinogenesis. Chem. Res. Toxicol. 21, 160–171 (2008). 5. Ionov, Y., Peinado, M. A., Malkhosyan, S., Shibata, D. & Perucho, M. Ubiquitous somatic mutations in simple repeated sequences reveal a new mechanism for colonic carcinogenesis. Nature 363, 558–561 (1993). 6.Parsons, R. et al. Hypermutability and mismatch repair deficiency in RER+ tumor cells. Cell 75, 1227–1236 (1993). 7.Cunningham, J. M. et al. Hypermethylation of the hMLH1 promoter in colon cancer with microsatellite instability. Cancer Res. 58, 3455–3460 (1998). 8. Fox, E. J., Prindle, M. J. & Loeb, L. A. Do mutator mutations fuel tumorigenesis? Cancer Metastasis Rev. 32, 353–361 (2013). 9. Fox, E. J., Beckman, R. A. & Loeb, L. A. Reply: Is There Any Genetic Instability in Human Cancer? DNA Repair 9, 859–860 (2010). 10. Shibata, D. & Lieber, M. R. Is there any genetic instability in human cancer? DNA Repair 9, 858; discussion 859–860 (2010). 11.Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013). signatures among an expanded sequencing effort that includes more preneoplastic lesions and relapsed tumour samples may eventually enable analyses that correlate these mutations to clinical phenotypes, provide a better understanding of the role of hypermutation in cancer development, and potentially lead to new cancer prevention, early diagnostics and therapeutic strategies. These perspectives can become closer with concerted efforts of the bioinformatics and mechanistic fields. Both the bioinformatics and mechanistic fields will benefit from a growing number of analysed samples, especially if the output data are expanded and presented in uniform format between different project and data depositories. A common organization would greatly facilitate analyses that are carried out on different cancer data sets (BOX 2). Unifying and supplementing information in data sets could be matched by mechanistic research that commonly addresses questions of mutation signatures and that develops clear mutation motifs, as long as such motifs can be derived from a study. In other words, an effort should be made, where possible, to organize conclusions of mechanistic research in a form that is as convenient as possible for immediate statistical exploration in cancer mutation data sets. These simple steps may result in drafting many researchers with all kinds of expertise, who would otherwise be deterred by the perspective of an indefinite time that was needed for mining and organizing the data before carrying out pilot analyses, which only infrequently lead to a finding. Altogether, merging mechanistic and bioinformatics approaches suggests realistic modifications to study formats and data organization with the payoff in more discoveries that could be important for the fight against cancer. This paper applied NMF to a mutation catalogue from 7,042 tumours to identify 21 mutation signatures occurring in these cancers. Additional analyses were carried out to suggest the causes of some of these signatures. 12.Lawrence, M. S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013). This paper identifies genes that are recurrently mutated in sequenced whole genomes and exomes of various cancers, thereby identifying genes the alteration of which probably contributes to cancer progression. This method standardizes the rate of gene mutation by the overall regional mutation rates. 13. Loeb, L. A., Springgate, C. F. & Battula, N. Errors in DNA replication as a basis of malignant changes. Cancer Res. 34, 2311–2321 (1974). 14. Cancer Genome Atlas Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature 487, 330–337 (2012). 15. Cancer Genome Atlas Research Network. Comprehensive molecular characterization of urothelial bladder carcinoma. Nature 507, 315–322 (2014). 16.Nagarajan, N. et al. Whole-genome reconstruction and mutational signatures in gastric cancer. Genome Biol. 13, R115 (2012). 17.Pugh, T. J. et al. Medulloblastoma exome sequencing uncovers subtype-specific somatic mutations. Nature 488, 106–110 (2012). 18.Welch, J. S. et al. The origin and evolution of mutations in acute myeloid leukemia. Cell 150, 264–278 (2012). 19. Cancer Genome Atlas Research Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 455, 1061–1068 (2008). NATURE REVIEWS | CANCER 20. Benzer, S. & Freese, E. Induction of Specific Mutations with 5‑Bromouracil. Proc. Natl Acad. Sci. USA 44, 112–119 (1958). 21. Rogozin, I. B. & Pavlov, Y. I. Theoretical analysis of mutation hotspots and their DNA sequence context specificity. Mutat. Res. 544, 65–85 (2003). 22. Pfeifer, G. P., You, Y. H. & Besaratinia, A. Mutations induced by ultraviolet light. Mutat. Res. 571, 19–31 (2005). 23. Sage, E. Distribution and repair of photolesions in DNA: genetic consequences and the role of sequence context. Photochem. Photobiol. 57, 163–174 (1993). 24. De, S. & Babu, M. M. A time-invariant principle of genome evolution. Proc. Natl Acad. Sci. USA 107, 13004–13009 (2010). 25.Drier, Y. et al. Somatic rearrangements across cancer reveal classes of samples with distinct patterns of DNA breakage and rearrangement-induced hypermutability. Genome Res. 23, 228–235 (2013). 26. Hicks, W. M., Kim, M. & Haber, J. E. Increased mutagenesis and unique mutation signature associated with mitotic gene conversion. Science 329, 82–85 (2010). 27. Malkova, A. & Haber, J. E. Mutations arising during repair of chromosome breaks. Annu. Rev. Genet. 46, 455–473 (2012). This review discusses the experimental evidence that DSB repair processes are prone to increased rates of mutation. 28.Deem, A. et al. Break-induced replication is highly inaccurate. PLoS Biol. 9, e1000594 (2011). 29. Shee, C., Gibson, J. L. & Rosenberg, S. M. Two mechanisms produce mutation hotspots at DNA breaks in Escherichia coli. Cell Rep. 2, 714–721 (2012). 30.Burch, L. H. et al. Damage-induced localized hypermutability. Cell Cycle 10, 1073–1085 (2011). VOLUME 14 | DECEMBER 2014 | 797 © 2014 Macmillan Publishers Limited. All rights reserved REVIEWS 31.Chan, K. et al. Base damage within single-strand DNA underlies in vivo hypermutability induced by a ubiquitous environmental agent. PLoS Genet. 8, e1003149 (2012). 32.Roberts, S. A. et al. Clustered mutations in yeast and in human cancers can arise from damaged long singlestrand DNA regions. Mol. Cell 46, 424–435 (2012). This paper identified mutation clusters that are induced by APOBEC cytidine deaminase activity in human multiple myeloma, prostate cancer and head and neck cancer. It also provided experimental evidence from a yeast model system that DNA damage-induced mutation clusters can be formed by targeting regions of ssDNA. 33. Yang, Y., Gordenin, D. A. & Resnick, M. A. A singlestrand specific lesion drives MMS-induced hypermutability at a double-strand break in yeast. DNA Repair 9, 914–921 (2010). 34. Yang, Y., Sterling, J., Storici, F., Resnick, M. A. & Gordenin, D. A. Hypermutability of damaged singlestrand DNA formed at double-strand breaks and uncapped telomeres in yeast Saccharomyces cerevisiae. PLoS Genet. 4, e1000264 (2008). 35. Rhind, N. & Gilbert, D. M. DNA replication timing. Cold Spring Harb. Perspect. Biol. 5, a010132 (2013). 36. Stamatoyannopoulos, J. A. What does our genome encode? Genome Res. 22, 1602–1611 (2012). 37. Thurman, R. E., Day, N., Noble, W. S. & Stamatoyannopoulos, J. A. Identification of higherorder functional domains in the human ENCODE regions. Genome Res. 17, 917–927 (2007). 38. Liu, L., De, S. & Michor, F. DNA replication timing and higher-order nuclear organization determine singlenucleotide substitution patterns in cancer genomes. Nature Commun. 4, 1502 (2013). 39.Polak, P. et al. Reduced local mutation density in regulatory DNA of cancer genomes is linked to DNA repair. Nature Biotech. 32, 71–75 (2014). 40.Koren, A. et al. Differential relationship of DNA replication timing to different forms of human mutation and variation. Am. J. Hum. Genet. 91, 1033–1040 (2012). 41.Stamatoyannopoulos, J. A. et al. Human mutation rate associated with DNA replication timing. Nature Genet. 41, 393–395 (2009). This paper reports the discovery that regions of the genome that replicate late during S-phase have a higher rate of substitutions during human evolution. 42.Hoang, M. L. et al. Mutational signature of aristolochic acid exposure as revealed by wholeexome sequencing. Sci. Transl. Med. 5, 197ra102 (2013). 43.Pleasance, E. D. et al. A comprehensive catalogue of somatic mutations from a human cancer genome. Nature 463, 191–196 (2010). 44.Pleasance, E. D. et al. A small-cell lung cancer genome with complex signatures of tobacco exposure. Nature 463, 184–190 (2010). 45.Poon, S. L. et al. Genome-wide mutational signatures of aristolochic acid and its application as a screening tool. Sci. Transl. Med. 5,197ra101 (2013). References 42 and 45 describe extremely high mutation loads and specific mutation signatures in the WGS and WES of upper urinary tract urothelial cell carcinomas that are associated with exposure to aristolochic acid. 46. Schuster-Bockler, B. & Lehner, B. Chromatin organization is a major influence on regional mutation rates in human cancer cells. Nature 488, 504–507 (2012). 47. Stratton, M. R. Exploring the genomes of cancer cells: progress and promise. Science 331, 1553–1558 (2011). 48. Devarajan, K. Nonnegative matrix factorization: an analytical and interpretive tool in computational biology. PLoS Comput. Biol. 4, e1000029 (2008). 49. Lee, D. D. & Seung, H. S. Learning the parts of objects by non-negative matrix factorization. Nature 401, 788–791 (1999). 50. Alexandrov, L. B., Nik-Zainal, S., Wedge, D. C., Campbell, P. J. & Stratton, M. R. Deciphering signatures of mutational processes operative in human cancer. Cell Rep. 3, 246–259 (2013). This paper describes the mathematical framework for applying NMF to identify mutation signatures that occur in human cancer databases. 51.Nik-Zainal, S. et al. Mutational processes molding the genomes of 21 breast cancers. Cell 149, 979–993 (2012). 52. Jia, P., Pao, W. & Zhao, Z. Patterns and processes of somatic mutations in nine major cancers. BMC Med. Genom. 7, 11 (2014). 53.Wang, J. et al. Evidence for mutation showers. Proc. Natl Acad. Sci. USA 104, 8403–8408 (2007). 54. Chen, Z., Feng, J., Buzin, C. H. & Sommer, S. S. Epidemiology of doublet/multiplet mutations in lung cancers: evidence that a subset arises by chronocoordinate events. PLoS ONE 3, e3714 (2008). 55. Chan, K., Resnick, M. A. & Gordenin, D. A. The choice of nucleotide inserted opposite abasic sites formed within chromosomal DNA reveals the polymerase activities participating in translesion DNA synthesis. DNA Repair 12, 878–889 (2013). 56. Mimitou, E. P. & Symington, L. S. DNA end resection — unraveling the tail. DNA Repair 10, 344–348 (2011). 57. Dewar, J. M. & Lydall, D. Similarities and differences between “uncapped” telomeres and DNA doublestrand breaks. Chromosoma 121, 117–130 (2012). 58.Saini, N. et al. Migrating bubble during break-induced replication drives conservative DNA synthesis. Nature 502, 389–392 (2013). This paper reports the discovery that the specialized form of DSB repair called break-induced replication occurs through conservative DNA synthesis in which the leading and lagging strand synthesis is uncoupled, leading to the formation of a ssDNA intermediate. 59. Lopes, M., Foiani, M. & Sogo, J. M. Multiple mechanisms control chromosome integrity after replication fork uncoupling and restart at irreparable UV lesions. Mol. Cell 21, 15–27 (2006). 60. Sogo, J. M., Lopes, M. & Foiani, M. Fork reversal and ssDNA accumulation at stalled replication forks owing to checkpoint defects. Science 297, 599–602 (2002). 61. McInerney, P. & O’Donnell, M. Functional uncoupling of twin polymerases: mechanism of polymerase dissociation from a lagging-strand block. J. Biol. Chem. 279, 21543–21551 (2004). 62. Yeeles, J. T., Poli, J., Marians, K. J. & Pasero, P. Rescuing stalled or damaged replication forks. Cold Spring Harb. Perspect. Biol. 5, a012815 (2013). 63. Sakofsky, C. J. et al. Break-induced replication is a source of mutation clusters underlying kataegis. Cell Rep. 7, 1640–1648 (2014). 64. Bartek, J., Lukas, J. & Bartkova, J. DNA damage response as an anti-cancer barrier: damage threshold and the concept of ‘conditional haploinsufficiency’. Cell Cycle 6, 2344–2347 (2007). 65.Costantino, L. et al. Break-induced replication repair of damaged forks induces genomic duplications in human cells. Science 343, 88–91 (2014). 66.Chin, K. et al. In situ analyses of genome instability in breast cancer. Nature Genet. 36, 984–988 (2004). 67.Vega, F. et al. Splenic marginal zone lymphomas are characterized by loss of interstitial regions of chromosome 7q, 7q31.32 and 7q36.2 that include the protection of telomere 1 (POT1) and sonic hedgehog (SHH) genes. Br. J. Haematol. 142, 216–226 (2008). 68.Poncet, D. et al. Changes in the expression of telomere maintenance genes suggest global telomere dysfunction in B‑chronic lymphocytic leukemia. Blood 111, 2388–2391 (2008). 69. Aguilera, A. & Garcia-Muse, T. R loops: from transcription byproducts to threats to genome stability. Mol. Cell 46, 115–124 (2012). 70. Roberts, S. A. & Gordenin, D. A. Clustered and genome-wide transient mutagenesis in human cancers: Hypermutation without permanent mutators or loss of fitness. Bioessays 36, 382–393 (2014). 71.Roberts, S. A. et al. An APOBEC cytidine deaminase mutagenesis pattern is widespread in human cancers. Nature Genet. 45, 970–976 (2013). This paper used a hypothesis-driven approach to calculate the over-representation of APOBEC signature mutations across 2,680 tumours. This work identified APOBEC signature mutagenesis as a prominent source of mutation in six cancer types and correlated the total number of these mutations with APOBEC mRNA levels. 798 | DECEMBER 2014 | VOLUME 14 72. Rogozin, I. B., Pavlov, Y. I., Bebenek, K., Matsuda, T. & Kunkel, T. A. Somatic mutation hotspots correlate with DNA polymerase eta error spectrum. Nature Immunol. 2, 530–536 (2001). 73. Neuberger, M. S. & Rada, C. Somatic hypermutation: activation-induced deaminase for C/G followed by polymerase eta for A/T. J. Exp. Med. 204, 7–10 (2007). 74. Liu, M. & Schatz, D. G. Balancing AID and DNA repair during somatic hypermutation. Trends Immunol. 30, 173–181 (2009). 75. Maul, R. W. & Gearhart, P. J. AID and somatic hypermutation. Adv. Immunol. 105, 159–191 (2010). 76.Peled, J. U. et al. The biochemistry of somatic hypermutation. Annu. Rev. Immunol. 26, 481–511 (2008). 77. Rogozin, I. B. & Diaz, M. Cutting edge: DGYW/WRCH is a better predictor of mutability at G:C bases in Ig hypermutation than the widely accepted RGYW/WRCY motif and probably reflects a two-step activationinduced cytidine deaminase-triggered process. J. Immunol. 172, 3382–3384 (2004). 78.Migliazza, A. et al. Frequent somatic hypermutation of the 5′ noncoding region of the BCL6 gene in B‑cell lymphoma. Proc. Natl Acad. Sci. USA 92, 12520–12524 (1995). 79.Pasqualucci, L. et al. Hypermutation of multiple protooncogenes in B‑cell diffuse large-cell lymphomas. Nature 412, 341–346 (2001). 80.Bolli, N. et al. Heterogeneity of genomic evolution and mutational profiles in multiple myeloma. Nature Commun. 5, 2997 (2014). 81.Puente, X. S. et al. Whole-genome sequencing identifies recurrent mutations in chronic lymphocytic leukaemia. Nature 475, 101–105 (2011). 82. Refsland, E. W. & Harris, R. S. The APOBEC3 family of retroelement restriction factors. Curr. Top. Microbiol. Immunol. 371, 1–27 (2013). 83. Smith, H. C., Bennett, R. P., Kizilyer, A., McDougall, W. M. & Prohaska, K. M. Functions and regulation of the APOBEC family of proteins. Semin. Cell Dev. Biol. 23, 258–268 (2012). 84. Gibbs, P. E. & Lawrence, C. W. Novel mutagenic properties of abasic sites in Saccharomyces cerevisiae. J. Mol. Biol. 251, 229–236 (1995). 85. Gibbs, P. E., McDonald, J., Woodgate, R. & Lawrence, C. W. The relative roles in vivo of Saccharomyces cerevisiae Pol ε, Pol ζ, Rev1 protein and Pol32 in the bypass and mutation induction of an abasic site, T‑T (6‑4) photoadduct and T‑T cis-syn cyclobutane dimer. Genetics 169, 575–582 (2005). 86.Krokan, H. E. et al. Error-free versus mutagenic processing of genomic uracil-Relevance to cancer. DNA Repair 19, 38–47(2014). 87. Pham, P., Chelico, L. & Goodman, M. F. DNA deaminases AID and APOBEC3G act processively on single-stranded DNA. DNA Repair 6, 689–692; author reply 693–694 (2007). 88. Chelico, L., Pham, P., Calabrese, P. & Goodman, M. F. APOBEC3G DNA deaminase acts processively 3′ → 5′ on single-stranded DNA. Nature Struct. Mol. Biol. 13, 392–399 (2006). 89.Burns, M. B. et al. APOBEC3B is an enzymatic source of mutation in breast cancer. Nature 494, 366–370 (2013). 90. Burns, M. B., Temiz, N. A. & Harris, R. S. Evidence for APOBEC3B mutagenesis in multiple human cancers. Nature Genet. 45, 977–983 (2013). This paper investigated the expression of APOBEC3B across tumours of multiple cancer types, finding that the mRNA levels of this gene correlated with the total mutation load of the tumour. Mutations from tumours with the highest APOBEC3B expression levels occurred primarily within APOBEC-targeted motifs. 91.Taylor, B. J. et al. DNA deaminases induce breakassociated mutation showers with implication of APOBEC3B and 3A in breast cancer kataegis. Elife 2, e00534 (2013). 92. Kidd, J. M., Newman, T. L., Tuzun, E., Kaul, R. & Eichler, E. E. Population stratification of a common APOBEC gene deletion polymorphism. PLoS Genet. 3, e63 (2007). 93.Long, J. et al. A common deletion in the APOBEC3 genes and breast cancer risk. J. Natl Cancer Inst. 105, 573–579 (2013). 94.Xuan, D. et al. APOBEC3 deletion polymorphism is associated with breast cancer risk among women of European ancestry. Carcinogenesis 34, 2240–2243 (2013). www.nature.com/reviews/cancer © 2014 Macmillan Publishers Limited. All rights reserved REVIEWS 95.Zhang, T. et al. Evidence of associations of APOBEC3B gene deletion with susceptibility to persistent HBV infection and hepatocellular carcinoma. Hum. Mol. Genet. 22, 1262–1269 (2013). 96.Nik-Zainal, S. et al. Association of a germline copy number polymorphism of APOBEC3A and APOBEC3B with burden of putative APOBEC-dependent mutations in breast cancer. Nature Genet. 46, 487–491 (2014). 97. Pfeifer, G. P. Mutagenesis at methylated CpG sequences. Curr. Top. Microbiol. Immunol. 301, 259–281 (2006). 98.Visnes, T. et al. Uracil in DNA and its processing by different DNA glycosylases. Phil. Trans. R. Soc. B 364, 563–568 (2009). 99. Antequera, F. Structure, function and evolution of CpG island promoters. Cell. Mol. Life Sci. 60, 1647–1658 (2003). 100.Schmidt, S. et al. Hypermutable non-synonymous sites are under stronger negative selection. PLoS Genet. 4, e1000281 (2008). 101. World Health Organisation. A Review of Human Carcinogens: IARC Monographs on the Evaluation of Carcinogenic Risks to Humans. 100D, 7–303 (World Health Organisation, 2012). 102.Plosky, B. S. & Woodgate, R. Switching from highfidelity replicases to low-fidelity lesion-bypass polymerases. Curr. Opin. Genet. Dev. 14, 113–119 (2004). 103.Sale, J. E. Translesion DNA synthesis and mutagenesis in eukaryotes. Cold Spring Harb. Perspect. Biol. 5, a012708 (2013). 104.Ikehata, H. & Ono, T. The mechanisms of UV mutagenesis. J. Radiat. Res. 52, 115–125 (2011). 105.Berger, M. F. et al. Melanoma genome sequencing reveals frequent PREX2 mutations. Nature 485, 502–506 (2012). 106.Breen, A. P. & Murphy, J. A. Reactions of oxyl radicals with DNA. Free Radic. Biol. Med. 18, 1033–1077 (1995). 107. Cadet, J. & Wagner, J. R. DNA base damage by reactive oxygen species, oxidizing agents, and UV radiation.Cold Spring Harb. Perspect. Biol. 5, a012559 (2013). 108.Evans, M. D., Dizdaroglu, M. & Cooke, M. S. Oxidative DNA damage and disease: induction, repair and significance. Mutat. Res. 567, 1–61 (2004). 109.Degtyareva, N. P. et al. Oxidative stress-induced mutagenesis in single-strand DNA occurs primarily at cytosines and is DNA polymerase ζ-dependent only for adenines and guanines. Nucleic Acids Res. 41, 8995–9005 (2013). 110. Moraes, E. C., Keyse, S. M. & Tyrrell, R. M. Mutagenesis by hydrogen peroxide treatment of mammalian cells: a molecular analysis. Carcinogenesis 11, 283–293 (1990). 111. Tkeshelashvili, L. K., McBride, T., Spence, K. & Loeb, L. A. Mutation spectrum of copper-induced DNA damage. J. Biol. Chem. 266, 6401–6406 (1991). 112. Bacolla, A., Cooper, D. N. & Vasquez, K. M. Mechanisms of base substitution mutagenesis in cancer genomes. Genes 5, 108–146 (2014). 113.Bacolla, A. et al. Guanine holes are prominent targets for mutation in cancer and inherited disease. PLoS Genet. 9, e1003816 (2013). This paper describes a positive correlation between sequence-dependent oxidation potentials of guanines and the number of mutations that occur within specific sequence motifs in various human cancers. 114. DeMarini, D. M. Genotoxicity of tobacco smoke and tobacco smoke condensate: a review. Mutat. Res. 567, 447–474 (2004). 115.Pfeifer, G. P. et al. Tobacco smoke carcinogens, DNA damage and p53 mutations in smokingassociated cancers. Oncogene 21, 7435–7451 (2002). 116.Govindan, R. et al. Genomic landscape of non-small cell lung cancer in smokers and never-smokers. Cell 150, 1121–1134 (2012). 117.Imielinski, M. et al. Mapping the hallmarks of lung adenocarcinoma with massively parallel sequencing. Cell 150, 1107–1120 (2012). 118.Lee, W. et al. The mutation spectrum revealed by paired genome sequences from a lung cancer patient. Nature 465, 473–477 (2010). 119. Bodell, W. J., Gaikwad, N. W., Miller, D. & Berger, M. S. Formation of DNA adducts and induction of lacI mutations in Big Blue Rat‑2 cells treated with temozolomide: implications for the treatment of lowgrade adult and pediatric brain tumors. Cancer Epidemiol. Biomarkers Prev. 12, 545–551 (2003). 120.Stojic, L., Brun, R. & Jiricny, J. Mismatch repair and DNA damage signalling. DNA Repair 3, 1091–1101 (2004). 121.Tomita-Mitchell, A. et al. Mismatch repair deficient human cells: spontaneous and MNNG-induced mutational spectra in the HPRT gene. Mutat. Res. 450, 125–138 (2000). 122.Cahill, D. P. et al. Loss of the mismatch repair protein MSH6 in human glioblastomas is associated with tumor progression during temozolomide treatment. Clin. Cancer Res. 13, 2038–2045 (2007). This paper identified mutation or loss of expression of MSH6 as a recurrent mediator of glioblastoma resistance to the alkylating chemotherapeutic drug temozolomide. 123.Hunter, C. et al. A hypermutation phenotype and somatic MSH6 mutations in recurrent human malignant gliomas after alkylator chemotherapy. Cancer Res. 66, 3987–3991 (2006). 124.Moen, E. L., Stark, A. L., Zhang, W., Dolan, M. E. & Godley, L. A. The role of gene body cytosine modifications in MGMT expression and sensitivity to temozolomide. Mol. Cancer Ther. (2014). 125.Zhang, J. et al. Certain imidazotetrazines escape O6‑methylguanine-DNA methyltransferase and mismatch repair. Oncology 80, 195–207 (2011). 126.Johnson, B. E. et al. Mutational analysis reveals the origin and therapy-driven evolution of recurrent glioma. Science 343, 189–193 (2014). This paper used exome sequencing to compare the mutations occurring in primary gliomas to those in relapsed tumours, finding that relapsed tumours frequently stem from cancer cells that are established early during tumour progression. Moreover, tumours that recurred following treatment with temozolomide more frequently resulted in high-grade gliomas with temozolomide-induced mutations in the RB and mTOR pathways. 127.Arlt, V. M., Stiborova, M. & Schmeiser, H. H. Aristolochic acid as a probable human cancer hazard in herbal remedies: a review. Mutagenesis 17, 265–277 (2002). 128.National Toxicology Program. Aristolochic acids. Rep. Carcinog. 12, 45–49 (2011). 129.Sidorenko, V. S. et al. Lack of recognition by globalgenome nucleotide excision repair accounts for the high mutagenicity and persistence of aristolactamDNA adducts. Nucleic Acids Res. 40, 2494–2505 (2012). 130.Chen, C. H. et al. Aristolochic acid-associated urothelial cancer in Taiwan. Proc. Natl Acad. Sci. USA 109, 8241–8246 (2012). 131.Hollstein, M., Moriya, M., Grollman, A. P. & Olivier, M. Analysis of TP53 mutation spectra reveals the fingerprint of the potent environmental carcinogen, aristolochic acid. Mutat. Res. 753, 41–49 (2013). 132.Kunkel, T. A. & Bebenek, K. DNA replication fidelity. Annu. Rev. Biochem. 69, 497–529 (2000). 133.Kunkel, T. A. & Erie, D. A. DNA mismatch repair. Annu. Rev. Biochem. 74, 681–710 (2005). 134.Shcherbakova, P. V. & Fijalkowska, I. J. Translesion synthesis DNA polymerases and control of genome stability. Front. Biosci. 11, 2496–2517 (2006). 135.Vilar, E. & Gruber, S. B. Microsatellite instability in colorectal cancer-the stable evidence. Nature Rev. Clin. Oncol. 7, 153–162 (2010). 136.Kim, T. M., Laird, P. W. & Park, P. J. The landscape of microsatellite instability in colorectal and endometrial cancer genomes. Cell 155, 858–868 (2013). 137.Briggs, S. & Tomlinson, I. Germline and somatic polymerase epsilon and delta mutations define a new class of hypermutated colorectal and endometrial cancers. J. Pathol. 230, 148–153 (2013). 138.Palles, C. et al. Germline mutations affecting the proofreading domains of POLE and POLD1 predispose to colorectal adenomas and carcinomas. Nature Genet. 45, 136–144 (2013). This paper describes germline mutations in the exonuclease domains of POLE and POLD1 as predisposing to familial colorectal and endometrial cancers, probably causing a class of microsatellitestable hypermutated tumours. NATURE REVIEWS | CANCER 139.Valle, L. et al. New insights into POLE and POLD1 germline mutations in familial colorectal cancer and polyposis. Hum. Mol. Genet. 23, 3506–3512 (2014). 140.Zou, Y. et al. Frequent POLE1 p. S297F mutation in Chinese patients with ovarian endometrioid carcinoma. Mutat. Res. 761, 49–52 (2014). 141.Nick McElhinny, S. A., Gordenin, D. A., Stith, C. M., Burgers, P. M. & Kunkel, T. A. Division of labor at the eukaryotic replication fork. Mol. Cell 30, 137–144 (2008). 142.Cancer Genome Atlas Research Network. Integrated genomic characterization of endometrial carcinoma. Nature 497, 67–73 (2013). 143.Donehower, L. A. et al. MLH1‑silenced and nonsilenced subgroups of hypermutated colorectal carcinomas have distinct mutational landscapes. J. Pathol. 229, 99–110 (2013). 144.Matsuda, T., Bebenek, K., Masutani, C., Hanaoka, F. & Kunkel, T. A. Low fidelity DNA synthesis by human DNA polymerase-ε. Nature 404, 1011–1013 (2000). 145.Machida, K. et al. Hepatitis C virus induces a mutator phenotype: enhanced mutations of immunoglobulin and protooncogenes. Proc. Natl Acad. Sci. USA 101, 4262–4267 (2004). 146.Totoki, Y. et al. High-resolution characterization of a hepatocellular carcinoma genome. Nature Genet. 43, 464–469 (2011). 147.Sotillo, R. et al. Mad2 overexpression promotes aneuploidy and tumorigenesis in mice. Cancer Cell 11, 9–23 (2007). 148.Carter, S. L. et al. Absolute quantification of somatic DNA alterations in human cancer. Nature Biotech. 30, 413–421 (2012). This paper describes an algorithm for estimating whether given cancer mutations are clonal or subclonal based on allelic fractions, tumour purity and regional ploidy. 149.Davoli, T. et al. Cumulative haploinsufficiency and triplosensitivity drive aneuploidy patterns and shape the cancer genome. Cell 155, 948–962 (2013). 150.Ciriello, G. et al. Emerging landscape of oncogenic signatures across human cancers. Nature Genet. 45, 1127–1133 (2013). 151.Stephens, P. J. et al. Massive genomic rearrangement acquired in a single catastrophic event during cancer development. Cell 144, 27–40 (2011). This paper describes mass breakage of individual chromosomes and subsequent repair that results in multiple, regionally isolated chromosome rearrangements, known as chromothripsis. 152.Liu, P. et al. Chromosome catastrophes involve replication mechanisms generating complex genomic rearrangements. Cell 146, 889–903 (2011). 153.Lee, R. S. et al. A remarkably simple genome underlies highly malignant pediatric rhabdoid cancers. J. Clin. Invest. 122, 2983–2988 (2012). This paper describes the exome sequencing of paediatric rhabdoid cancers, showing that they have a genome with remarkably few genetic alterations and only one recurrent mutation occurring in SMARCB1. 154.Parsons, D. W. et al. The genetic landscape of the childhood cancer medulloblastoma. Science 331, 435–439 (2011). This paper describes the sequencing of the protein-coding regions of medulloblastomas, determining that these tumours have a low number of mutations, with the primary recurrent mutations occurring in MLL2 and MLL3. This suggests that epigenetic alteration has a role in pathogenesis. 155.Crompton, B. D. et al. The genomic landscape of pediatric ewing sarcoma. Cancer Discov. http://dx.doi.org/10.1158/2159-8290.CD-13-1037 (2014). 156.Lawrence, M. S. et al. Discovery and saturation analysis of cancer genes across 21 tumour types. Nature 505, 495–501 (2014). This paper analysed mutations occurring in 4,072 human tumours to find likely cancer drivers as recurrently mutated genes. This analysis identified 33 new cancer genes and estimated that between 600 to 5,000 samples need to be sequenced to identify all of the cancer genes in a given cancer type. VOLUME 14 | DECEMBER 2014 | 799 © 2014 Macmillan Publishers Limited. All rights reserved REVIEWS 157.Papaemmanuil, E. et al. RAG-mediated recombination is the predominant driver of oncogenic rearrangement in ETV6‑RUNX1 acute lymphoblastic leukemia. Nature Genet. 46, 116–125 (2014). 158.Robbiani, D. F. et al. AID is required for the chromosomal breaks in c‑myc that lead to c‑myc/IgH translocations. Cell 135, 1028–1038 (2008). 159.Dorsett, Y. et al. A role for AID in chromosome translocations between c‑myc and the IgH variable region. J. Exp. Med. 204, 2225–2232 (2007). 160.Burrell, R. A. & Swanton, C. The evolution of the unstable cancer genome. Curr. Opin. Genet. Dev. 24, 61–67 (2013). 161.Henderson, S., Chakravarthy, A., Su, X., Boshoff, C. & Fenton, T. R. APOBEC-Mediated Cytosine Deamination Links PIK3CA Helical Domain Mutations to Human Papillomavirus-Driven Tumor Development. Cell Rep. 7, 1833–1841 (2014). This paper determined that human papillomavirus (HPV)-positive head and neck cancers have a higher number of mutations occurring in DNA sequences that are targeted by APOBEC cytidine deaminases, resulting in higher incidence of PIK3CA helical domain mutations in these tumours. The favouring of the helical domain mutations, which reside in APOBEC-target motifs, over kinase domain mutation suggests that the APOBEC enzymes are actively contributing to cancer progression in these tumours. 162.Viros, A. et al. Ultraviolet radiation accelerates BRAFdriven melanomagenesis by targeting TP53. Nature 511, 478–482 (2014). 163.Douville, C. et al. CRAVAT: cancer-related analysis of variants toolkit. Bioinformatics 29, 647–648 (2013). 164.Dulak, A. M. et al. Exome and whole-genome sequencing of esophageal adenocarcinoma identifies recurrent driver events and mutational complexity. Nature Genet. 45, 478–486 (2013). 165.Jager, N. et al. Hypermutation of the inactive X chromosome is a frequent event in cancer. Cell 155, 567–581 (2013). 166.Kumar, A. et al. Exome sequencing identifies a spectrum of mutation frequencies in advanced and lethal prostate cancers. Proc. Natl Acad. Sci. USA 108, 17087–17092 (2011). 800 | DECEMBER 2014 | VOLUME 14 167.Baca, S. C. et al. Punctuated evolution of prostate cancer genomes. Cell 153, 666–677 (2013). 168.Poon, S., McPherson, J., Tan, P., Teh, B. & Rozen, S. Mutation signatures of carcinogen exposure: genome-wide detection and new opportunities for cancer prevention. Genome Med. 6, 24 (2014). 169.Nik-Zainal, S. et al. The life history of 21 breast cancers. Cell 149, 994–1007 (2012). 170.Ciccia, A. & Elledge, S. J. The DNA damage response: making it safe to play with knives. Mol. Cell 40, 179–204 (2010). Acknowledgements The authors are grateful to D. Kwiatkowski, D. Zaykin and K. Chan for advice on the manuscript. This research was supported by the Intramural Research Program of the National Institutes of Health (NIH), National Institute of Environmental Health Sciences (D.A.G. and S.A.R.) and by an NIH Pathway to Independence Award K99ES022633‑01 (S.A.R.) Competing interests statement The authors declare no competing interests. www.nature.com/reviews/cancer © 2014 Macmillan Publishers Limited. All rights reserved
© Copyright 2026 Paperzz