Hypermutation in human cancer genomes: footprints and mechanisms

REVIEWS
Hypermutation in human cancer
genomes: footprints and mechanisms
Steven A. Roberts1,2 and Dmitry A. Gordenin1
Abstract | A role for somatic mutations in carcinogenesis is well accepted, but the degree to
which mutation rates influence cancer initiation and development is under continuous
debate. Recently accumulated genomic data have revealed that thousands of tumour
samples are riddled by hypermutation, broadening support for the idea that many cancers
acquire a mutator phenotype. This major expansion of cancer mutation data sets has
provided unprecedented statistical power for the analysis of mutation spectra, which has
confirmed several classical sources of mutation in cancer, highlighted new prominent
mutation sources (such as apolipoprotein B mRNA editing enzyme catalytic polypeptide-like
(APOBEC) enzymes) and empowered the search for cancer drivers. The confluence of cancer
mutation genomics and mechanistic insight provides great promise for understanding the
basic development of cancer through mutations.
Mutation load
The number of mutations in
the genome or part of the
genome of a cell, a group of
cells (tumour, tissue and so on)
or an organism.
Mutation rates
(Also known as mutation
frequencies). The numbers of
mutations per unit of length
(for example, per Megabase) or
biological time (for example,
per cell generation).
Drivers
Mutations that increase the
probability of tumour incidence.
Laboratory of Molecular
Genetics, National Institute of
Environmental Health
Sciences (NIH, DHHS),
Durham, North Carolina
27709, USA.
2
School of Molecular
Biosciences, Washington
State University, Pullman,
Washington 99164, USA.
Correspondence to D.A.G. e-mail:
[email protected]
doi:10.1038/nrc3816
1
Mutations are among the usual suspects for causing cancer, with mutations being found in oncogenes
and tumour suppressor genes in malignant tumours.
Moreover, there are several classical cases in which
increased spontaneous mutagenesis or environmentally
enhanced mutagenesis correlates with increased mutation
load and cancer risk1–7. Such instances of high mutation
load have served as evidence supporting the hypothesis
that cancer frequently involves the establishment of a
mutator phenotype8, in which pre-cancerous cells begin
to accumulate more mutations per cell generation than
their normal counterparts. Despite the general observation that tumours often contain a large number of
mutations (that is, >10,000 single nucleotide changes
and tens to hundreds of structural changes), which we
shall refer to as hypermutation, neither how these mutations accumulate (that is, through higher mutation rates
or increased number of replications in highly proliferative cancer cells)8–10 nor whether they accelerate cancer
or are merely a by‑product of immortalization has yet
to be established.
To try to understand the relationship between mutations and tumorigenesis more clearly, recent resequencing of cancer genomes has been undertaken. These
efforts have been mostly designed to identify frequently
mutated genes, the alteration of which likely drive cancer, but have also revealed important insights into the
plasticity of the cancer genome. For example, mutation
loads can differ by several orders of magnitude11,12, with
a wide variety of tumour types, such as melanoma, lung,
stomach, colorectal, endometrial and cervical cancers,
showing high mutation loads that are consistent with
hypermutation, which may generate drivers of malignancy. Evaluating this contribution by cataloguing cancer genes that are frequently affected by hypermutation
and determining the mechanisms of hypermutation may
further our understanding of cancer biology, through
which new therapeutic targets may be identified. This
Review will discuss the current understanding of hypermutation in cancer and speculate on future advances in
this field that will be facilitated by the rapidly evolving
area of cancer genomics, in which the analysis of vast
whole-genome and whole-exome mutation data sets
merges with detailed knowledge about DNA trans­
actions to identify new mutagenic mechanisms and find
new cancer drivers.
Hypermutation in cancer
Scientists have long understood that the root causes of
cancer lie in the dysregulation of cell survival and proliferation, often as the result of multiple genetic alterations
(that is, both single nucleotide changes and structural
rearrangements) that accumulate within a cell, despite a
normally low mutation rate. However, 40 years after the
initial suggestion of the cancer mutator phenotype13, this
hypothesis remains supported primarily by the increased
predisposition to cancer of individuals who are deficient
in various DNA replication and repair processes, as
786 | DECEMBER 2014 | VOLUME 14
www.nature.com/reviews/cancer
© 2014 Macmillan Publishers Limited. All rights reserved
REVIEWS
Structural rearrangements
Changes in the location of
genomic regions relative to
each other.
Mutation reporters
DNA sequences that, when
mutated in a cell, give the cell a
phenotype that can be
selected. In most cases the
mutation either restores the
function of an inactive protein
or provides resistance to a
drug.
Mutation spectrum
The list of mutation types and
coordinates within a mutation
load.
well as by limited experimental observation of usually
large numbers of mutations in various tumour samples. The number of cancer genomes and exomes (currently exceeding 10,000 and growing fast) sequenced
by the collective efforts of individual groups, as well as
The Cancer Genome Atlas (TCGA) and International
Cancer Genome Consortium (ICGC), has provided
the ability to have a much broader assessment of the
sources and consequences of hypermutation in cancer
development, mostly through statistical analysis of patterns within the mutation data. In these studies11,12,14–19,
the sequence of tumour DNA is compared to the DNA
sequence of either the matched normal tissue or blood
of the patient to identify tumour-specific mutations. All
current mutation identification algorithms require evidence of a genetic change to be supported by multiple
DNA sequences that would originate from multiple cells
within the tumour. As a consequence, these mutation
lists represent a composite image of the mutagenesis
occurring in the major sub-clones of the tumour and
limit the contribution of mutations in neighbouring
stromal cells.
The philosophy and statistical approaches for
extracting useful information from catalogues of
mutations in cancer genomes are, overall, analogous
to the analysis of mutation spectra obtained in experiments with mutation reporters — the classical approach
in molecular genetics20,21. Apparent ‘irregularities’ in
the distribution of mutation types and position, as
compared to the null hypothesis of a random mutation
spectrum, are matched against mechanistic knowledge
about the chemistry of a mutagen and genetic systems
expected to repair the resulting DNA lesions. For example, mutation spectra of ultraviolet (UV) radiation are
in good agreement with its capability to cause bulky
lesions (cyclobutane pyrimidine dimers (CPDs) and
6–4 photoproducts (6–4PPs)) in adjacent pyrimidine
nucleotides22,23. However, where the analysis of mutation
spectra from reporters in model systems is greatly aided
by defined experimental conditions and genotypes, the
background information for cancer genome mutation
catalogues is much less defined. In part, this is compensated for by the large number of mutations within
individual cancer genomes – sometimes greater than
105 – allowing for statistical analysis of cancer mutation
spectra that is unrestricted by mechanistic hypotheses.
De novo genome-wide or exome-wide patterns. The first
mutation patterns within whole-genome-sequenced
cancers were detected in relation to the distribution of
mutations in genomic space (TABLE 1). Universally, mutation rate was observed to be increased near breakpoints
of structural rearrangements24,25, which are present in
large numbers in many cancer genomes and are unique
in each sample. Studies in model microbial systems established that this relationship probably results from either
error-prone DNA synthesis that is associated with DNA
double-strand-break (DSB) repair 26–29 or an increased
number of unrepaired lesions in long regions of singlestranded DNA (ssDNA) created around the break site30–34
(also discussed further below). Collectively, the experimental evidence showing that regions of DNA that are
associated with DSB repair are prone to mutation and the
bioinformatics analysis describing similar effects within
clinical tumour samples suggested that intrinsic chromosomal features could themselves alter the rate of mutation
and serve as one source of hypermutation in cancer.
Other genomic features, such as replication timing,
transcription levels, and chromatin organization, also
affect mutagenesis and are considered to occur universally among normal somatic cells and various cancer
types. They have non-uniform profiles across the genome;
however, each profile is relatively constant between cells
of the same type and even, in part, between different cancer cell lines. The stratification of these features across
genomic space, initially developed by individual groups
and subsequently expanded by the ENCODE consortium35–37, has generated a large database of profiles that
Table 1 | Genomic features affecting the mutation rate in cancer genomes
Genomic
feature
Type of feature
Association with
increased mutation
density
Potential mechanism
Examples of affected cancer types
Refs
Rearrangement
breakpoints
Sample-specific
Vicinity to a breakpoint
Hypermutation in ssDNA
and error-prone DNA
synthesis
Breast, head and neck, colorectal,
prostate, melanoma, multiple myeloma
and chronic lymphocytic leukaemia
24,25,32,
80
Replication
timing
Universal
Late replication
Unknown
Melanoma; pan-cancer studies of up to
27 cancer types
12,38,39
Transcription
Static
Low transcription and/or
non-transcribed strand
TCR-mediated removal
of mutagenic lesions
Urothelial upper tract, lung and
melanoma; pan-cancer study of 30
cancer types
11,42–44,
168
Heterochromatin Universal or cancer
marks
type-specific
H3K9me3, H3K9me2 and Absence of TCR and poor
H4K20me3
access of GGR
Leukaemia, melanoma, lung and
prostate
46
Nucleosome
density
Universal or cancer
type-specific
DNAse I-resistant
regions with higher
nucleosome content
Poor access of GGR
complexes
Melanoma
39
Local sequence
mutation
Cancer
type-specific
Trinucleotide context
DNA repair, replication
and endogenous lesions
Multiple types
See the
main text
GGR, global genome repair; me2, dimethylation; me3, trimethylation; ssDNA, single-stranded DNA; TCR, transcription-coupled repair.
NATURE REVIEWS | CANCER
VOLUME 14 | DECEMBER 2014 | 787
© 2014 Macmillan Publishers Limited. All rights reserved
REVIEWS
Late replicating regions
Regions of the genome where
DNA is synthesized near the
end of S‑phase during
replication.
Transcription-coupled repair
(TCR). A form of nucleotide
excision repair that utilizes a
stalled RNA polymerase as a
means to recognize a DNA
lesion in the transcribed
strand.
Nucleotide excision repair
(NER). A DNA repair pathway
that removes bulky DNA
lesions by removing a stretch
of the DNA strand containing
the lesion and then by
subsequently using the
remaining undamaged DNA
strand as a template to
synthesize a new stretch of
DNA to replace the excised
one.
Mutation signatures
Characteristics of a mutation,
such as the mutated base, the
resulting base (or bases) and
nucleotides in the immediate
vicinity that occur more
frequently than expected with
random mutation of genomic
DNA.
DNA lesion
A specific change in DNA
structure that results from DNA
damage; for example, cytidine
deamination, base alkylation,
base oxidation, strand
crosslinks, ultraviolet‑induced
dimers and DNA breakage.
Mutation clusters
Groups of mutations that are
spaced more closely than
expected by random
distribution of mutations in a
genome.
can be correlated with densities of somatic mutations in
cancer genomes. The most robust and reproducible correlation was documented for late replicating regions accumulating higher mutation densities than in the rest of the
cancer genomes12,38,39. Interestingly, late replicating regions
seem to be also more mutable over an evolutionary timescale40,41. A greater chance of replication fork uncoupling
leading to formation of hypermutable ssDNA in late replicating regions was suggested as a mechanism underlying
this hypermutability 41.
A second, relatively invariant, feature associated with
increased mutation density is non-transcribed or low
transcribed regions as compared to highly transcribed
regions of the genome12. High levels of transcription may
prevent mutations by enabling transcription-coupled repair
(TCR), a form of nucleotide excision repair (NER) that is initiated by lesion-stalled RNA polymerase II (RNA Pol II),
which, together with global genome repair (GGR),
would remove more mutation-generating DNA lesions
than GGR alone would do in non-transcribed or lowtranscribed regions. In support of a role for TCR in
suppressing mutation rates, mutation frequencies were
often lower on the transcribed strand versus the nontranscribed strand, where TCR does not operate11,42–45. Such analyses typically compare the relative
number of specific mutation types occurring on the
transcribed and non-transcribed DNA strands and
therefore require assumptions about which base in a
pair is mutated as well as about equal distribution of
the target sequences between DNA strands. In cases in
which a tumour is linked to a well-characterized DNAdamaging agent (for example, UV for melanoma),
these assumptions are probably valid. However, other
explanations for trans­criptional strand bias are possible. Transcription levels may also partially explain
the increase in mutation density that is associated with
heterochromatin46. Regions with condensed chromatin,
as determined by resistance to S1 nuclease indicating
a higher density of nucleosomes, had increased mutation density in melanoma genomes39. This could also
be due to more active TCR in more highly transcribed
open chromatin. However, even non-transcribed
regulatory regions with low nucleosome density had low
mutation density. Supporting an alternative explanation,
melanoma samples with mutations in NER genes show
higher mutation densities in S1‑sensitive nucleosomedeprived regions than in samples with wild-type NER
genes, suggesting that the increase in mutation density
in nucleosome-rich regions can be, in part, due to lower
accessibility to NER complexes.
In addition to regional mutation patterns in cancer,
patterns of sequence specificity can also be observed.
Initial model studies, which involve sequencing of mutation reporters, established that mutation rates can vary
substantially for different ‘mutation signatures’; that is, the
choices of mutated nucleotide, the kind of nucleotide
resulting from mutation and the immediate sequence
context. These choices may be defined by one or several factors, such as the DNA lesion spectrum induced
by a mutagen, replication fidelity, and the relative contribution of DNA repair pathways in an individual or
cell type21 (see also below). Therefore, mutation signatures in a cancer sample may carry useful information
about the history of the development of a tumour, which
can be likened to an archaeological record47.
A powerful analytical method, non-negative matrix
factorization (NMF)48,49, which proved to be useful for a
variety of bioinformatics analyses of large data sets, has
likewise been productive in decrypting mutation signatures in cancer genomes. When applied to cancer mutation databases, this approach uses iteration to identify
two matrices that, together, can accurately reconstruct
the distribution of mutations across 96 mutation types
(that is, three possible substitutions occurring at the central base of 32 trinucleotide sequences; complementary
mutation types are combined). The first matrix is a group
of specific mutation signatures that is defined by relative abundances of each mutation type, and the second
matrix describes the strength at which the mutation signature occurs in each tumour in the analysed database.
Lawrence, Getz and colleagues12 highlighted six prevailing mononucleotide or dinucleotide mutation signatures
in more than 3,000 samples of 27 different cancer types.
Some signatures were highly concordant with exposures
to specific mutagens, such as tobacco or UV radiation.
Stratton and colleagues11,50,51 applied the NMF algorithm
to all possible combinations of trinucleotides and base
substitutions of the central base. Their analysis identified
21 mutation signatures, each consisting of several simple trinucleotide mutation motifs. A somewhat different
version of NMF led Zhao and colleagues52 to identify
20 mutation signatures, most of which overlapped with
signatures found by other groups. They also explored
the heterogeneity of mutation signatures within cancer
types. They concluded that the mutation spectrum in
cancers is generated by a complex mix of mechanisms
resulting in a signature profile that is unique for each
individual cancer. The advantage of the NMF-based
approach is that it is unbiased and hypothesis-free, so
that, in principle, it can extract any (even previously
unsuspected) mutational signatures. However, as indicated by Alexandrov et al.50, the number of extracted
signatures depends on the number of samples being
analysed and the diversity and frequency of mutation
signatures. Hence, low-frequency signatures will be
difficult to identify. Still, with a large number of cancer
samples, many signatures can be detected and then compared to experimental and mechanistic knowledge, with
the goal of deciphering multiple mutagenic mechanisms
that occur during cancer development. NMF-generated
signatures are usually complex, and they may result from
a mix of different mechanisms. Further analysis can be
greatly facilitated by identifying a single-mechanism
component (or components) within a signature.
Patterns in mutation clusters. As an alternative to NMF,
single-mechanism components of a mutation signature
can be pinpointed by statistical analysis of spectra in
groups of mutations that are located too close to each
other to be attributed to random independent events —
that is, representing mutation clusters. The phenomenon
of unusually close positioning of several mutations was
788 | DECEMBER 2014 | VOLUME 14
www.nature.com/reviews/cancer
© 2014 Macmillan Publishers Limited. All rights reserved
REVIEWS
a DSB repair
b Break-induced replication
c Telomere uncapping
TLS and repair
TLS and repair
TLS and repair
d Replication
e Transcription
Fork regression
TLS and repair
Repair
TLS and repair
Repair
Mutation cluster
Mutated cluster
Figure 1 | Lesions in single-stranded DNA (ssDNA) can result in clusters of strand-coordinated
mutations. Nature
ReviewsLesions
| Cancer
are shown as hexagons. In the case of base-specific damage — for example, cytidine deamination by an apolipoprotein B
mRNA editing enzyme catalytic polypeptide-like (APOBEC) enzyme (or enzymes) — lesions would be in the same type of
DNA base (that is, C) of the same strand. Translesion synthesis (TLS) will introduce mutations in the complementary strand,
which can then be fixed in DNA by a subsequent round of synthesis (labelled “TLS and repair” in each part of the figure).
This will result in strand-coordination of mutations changing only Cs (red) of the initially damaged strand and
only Gs (dark blue) that are mutated in the complementary strand. a | ssDNA that is formed by 5′→3′ resection at DNA
double-strand-breaks (DSBs); one of several DSB repair mechanisms170 restores double-stranded DNA (dsDNA) at the
position of the break. b | ssDNA is formed by the migrating loop and uncoupled strand copying during break-induced
replication58,63. c | Telomere uncapping creates regions of ssDNA (REFS 31,55 and therein). d | ssDNA that is formed during
DNA replication (ssDNA in the lagging strand is shown; a similar sequence of events may be associated with the leading
strand). Lesions may result in mutations or be repaired via fork regression, which displaces a short stretch of
complementary strand and pairs damaged ssDNA of the gap with the intact region of the complementary nascent
strand62. The intact complementary strand provides a template for excision repair of lesions that are generated in the
ssDNA gap. e | Lesions in transient ssDNA that is formed by mRNA–DNA pairing (R-loops69) can lead to mutation clusters.
Mutations will be prevented if re‑annealing of DNA strands followed by excision repair occurs before replication.
Break-induced replication
A DNA double-strand-break
repair mechanism involving the
invasion of one DNA end into a
homologous locus on a sister
chromatid or homologous
chromosome. Once invaded,
the broken DNA is used to
prime replication to the end of
the unbroken sister chromatid
or homologous chromosome to
replace the DNA sequence that
is lost owing to the DNA
double-strand-break.
initially detected by Sommer and colleagues53 using the
Big Blue mouse mutation reporter system. These unusually spaced mutations were called ‘mutation showers’, by analogy with a meteor shower in the dark sky of
the night 53. Later, indications of mutation showers were also found within the epidermal growth
factor receptor (EGFR) gene in lung cancers54. The
mechanisms involved in mutation cluster formation are of current interest. Using yeast model systems, we found that accumulation of base damage
in ssDNA is one mechanism that is capable of generating closely spaced mutations31–34,55. Clusters of up
to 30 mutations spanning 200 kilobases (kb) resulted
from a variety of lesions (alky­lation, deamination and
photoproducts). Transient long regions of ssDNA can
be formed at DSBs56, at uncapped telo­meres57, during break-induced replication58 and at dysfunctional
or uncoupled replication forks 30,32,34,59–63 (FIG. 1) :
all of which are common events that occur in evolving cancer cells64–68. The specificity of mutation clusters
for ssDNA results in some cases either from the robust
removal of lesions in dsDNA by various excision repair
systems, leaving only ssDNA lesions remaining, or from
lesions caused by agents that specifically affect ssDNA31.
An important condition for clustered hypermutation is
that the lesions in transient regions of ssDNA are not
NATURE REVIEWS | CANCER
VOLUME 14 | DECEMBER 2014 | 789
© 2014 Macmillan Publishers Limited. All rights reserved
REVIEWS
Translesion synthesis
(TLS). A form of lesion tolerance
that involves the insertion of a
new nucleotide opposite a DNA
lesion, usually by a specialized
DNA polymerase.
R‑loops
Stable hybrids of RNA and
DNA that are formed during
transcription.
Strand-coordination
A phenomenon in which
clustered mutations involve
changes of only one kind of
base within the same DNA
strand (for example,
C‑coordination — only
cytosines are mutated in the
top DNA strand)
Non-canonical excision
repair
Excision of a DNA lesion or
modification that does not
completely follow the pathway
mechanisms of base excision
repair, nucleotide excision
repair or mismatch repair.
Kataegis
(or mutation shower). A group
of clustered mutations carrying
additional similarity features
that are unlikely to be random
(for example, changes of the
same nucleotide in the given
strand — strand-coordination).
repaired until this strand is copied during synthesis of
a nascent complementary strand with the help of errorprone translesion synthesis (TLS) DNA polymerases. Such
repair may happen if an ssDNA region is annealed with
an undamaged complementary strand; for example, by
re‑annealing of R-loops69 or replication fork regression62
(FIG. 1). Whether ssDNA mutagenesis is the predominant
mechanism for cluster formation in human cells, as it
seems to be in yeast, remains to be established.
The probability of mutation in transient ssDNA
can be as high as several thousand-fold greater than
in the rest of the genome32,34, leading to the incidence
of mutation clusters containing multiple closely spaced
mutations. As all mutations of a cluster occurred simultaneously, presumably from lesions in the single DNA
strand, they exhibit strand-coordination30,32,34 (FIG. 1).
Moreover, DNA-damaging agents often exhibit preference for a certain nucleotide (or nucleotides) or even
for a short nucleotide motif (or motifs). Therefore,
simultaneous mutations in a cluster occur primarily at
the same kind of nucleotide or motif. The combined
result of the stand-coordination and homogeneous
motif specificity of clustered mutations is a pure mutation spectrum stemming from a single mutagenic
process. The original proof‑of‑principle for this concept was for lesions in ssDNA caused by UVC radiation16,20. Strand-coordination was also demonstrated for
other lesions (induced by methyl methanesulphonate
(MMS), sulphites and the cytidine deaminase apolipoprotein B mRNA editing enzyme catalytic polypeptidelike 3G (APOBEC3G)) in artificially created ssDNA31,33.
However, it is important to note that although strandcoordinated clusters have so far been associated with
lesions in ssDNA, they can also stem from lesions in
dsDNA, as long as the lesions occur simultaneously
(see also figure 2 in REF 70)).
On the basis of these and other findings, Roberts
et al.32 reasoned that if strand-coordinated clusters
could be found in cancer genomes, the mutations in
each cluster would probably have occurred simultaneously and resulted from one mechanism. Consequently,
analysis of mutations in these clusters would enable
evaluation of a single mutagenic mechanism that was
singled out from a complex mix of mutation causes in
cancer. Indeed, many mutation clusters (containing up
to 34 mutations and spanning several kb) were found
in whole-genome sequencing (WGS) of multiple myelomas, head and neck, prostate and colorectal cancers,
and around 30% of clusters were completely strandcoordinated32,71. Completely strand-coordinated clusters
of cytosines or guanines (C- or G‑coordinated) prevailed
over A- or T‑coordinated clusters. A- or T‑coordinated
clusters were found mostly in multiple myelomas and
were clearly associated with the mutation motif [T(A|T)]
(mutated base underlined; ambiguous nucleotides
shown in parentheses separated by “|”). This mutation
signature has been reported for the error-prone DNA
Pol η, which participates in gap-filling DNA synthesis
that is associated with non-canonical excision repair of U∙G
mismatches during somatic hypermutation (SHM) of
immunoglobulin genes72,73.
Abnormal targeting of activation-induced cytidine
deaminase (AID; which induces U∙G mismatches during SHM) to random chromosomal locations could be
the source of A- or T‑coordinated clusters74. However,
if AID-induced U∙G mismatches were not repaired,
they would result in a C‑to‑T or C‑to‑G substitution at
the preferred DNA motif of AID for cytidine deamination: [(A|T)(A|G)C→(T|G)]75,76 (short), or [(A|T)(A|G)
C(A|T|C)→(T|G)]77 (extended). Consequently, AIDgenerated clusters normally carry a mix of mutations
in C and A75. Indeed, such mixed clusters were found
in immunoglobulin loci and a small number of secondary targets (including potential cancer genes74,78,79)
that are known to be targeted by AID mutagenesis with
lower efficiency 74 in multiple myelomas32,80 and chronic
leukaemias81. Further supporting SHM as the origin of
these clusters, both of these cancer types originate from
immune cells that have experienced SHM, and the
mutations were enriched with the AID mutation signature32,80. AID mutagenesis, however, seems to occur
primarily within the context of SHM, as no enrichment
for the AID signature was found in randomly located
clusters or in WGS data sets within the same multiple
myeloma genomes32, and the AID deamination signature was absent from pan-cancer signature decomposition by NMF11. Similarly, C- or G‑coordinated clusters,
as well as genome-wide mutagenesis across all cancer
types analysed, were depleted for the AID signature32.
C- or G‑coordinated clusters were more frequent
than A- or T‑coordinated clusters in WGS, as well
as in whole-exome sequencing (WES) data sets32,71,
and were frequently colocalized with breakpoints
of rearrangements found within the same cancer
sample (FIG. 2a), whereas no colocalization of A- or
T‑coordinated clusters were detected32. Inspection of
the nucleotides flanking the strand-coordinated C or G
mutations revealed high enrichment for a mutation signature [TC(A|T)→(T|G)], characteristic of a subclass of
APOBEC cytidine deaminases (FIG. 2b). Similar clusters
enriched with mutations in C and G were also found
using NMF by Nik-Zainal et al.51 in WGS of breast cancer samples. Sizes and mutation densities of individual
clusters (called micro-clusters in that study) were comparable to those found by Roberts et al.32 by mutated
motif and colocalization with rearrangements. It was
also noted that some larger areas of the cancer genome
(up to 1 megabase) have a high density of mutations.
Nik-Zainal et al. named this phenomenon kataegis (the
Greek word for thunderstorm, by analogy with the previously suggested term — mutation shower). In a later
work11, they defined kataegis (or kataegic foci) as six or
more consecutive mutations with an average intermutation distance of less than or equal to 1 kb. This study
found several kataegic foci not only in breast cancer but
also in cancers of the pancreas, lung and liver, as well as
in medulloblastomas, lymphomas and leukaemias11. The
high prevalence of the [TC(A|T)→(T|G)] mutation signature in C- or G‑kataegic foci suggested that APOBEC
mutagenesis could be the major component of some signatures that are revealed by NMF, in which each signature is composed of several trinucleotide motifs. Indeed
790 | DECEMBER 2014 | VOLUME 14
www.nature.com/reviews/cancer
© 2014 Macmillan Publishers Limited. All rights reserved
REVIEWS
21
Y
22 X
1
2
4
18
5′-TACCTCAGCGA...AGATTCTGACT...TCTATCATAGA-3′
3′-ATGGAGTCGCT...TCTAAGACTGA...AGATAGTATCT-5′
Tumour
5′-TACCTgAGCGA...AGATTtTGACT...TCTATtATAGA-3′
3′-ATGGAcTCGCT...TCTAAaACTGA...AGATAaTATCT-5′
Motif base frequency
5
17
6
7
13
Germline
3
20
19
16
15
14
b Strand-coordination, preferences in motif and substitution
12
11
9
10
8
1.0
1.0
Base frequency
a Clustering of mutations with genomic features
0.8
0.6
0.4
0.2
0.0
–1
0
c Processing and mutation of deoxyuridine in ssDNA
0.4
0.2
G
A
Substitution
C
T
d Refined signature analysis
N N N T C A N N N N N
N N
0.6
0.0
+1
Nucleotide position
within a DNA motif
0.8
APOBEC mutation motif
N N N
TCW/WGA (W=A or T)
Deamination
N N N T U A N N N N N
N N
N N N
Abasic site
N N N T
A N N N N N
N N
N N N N N N
Pol δ and Pol ζ 1
N N N T
APOBEC mutation signature
UDG
A N N N N N
N N N A A T N N N N N
TCW→TTW or →TGW/WGA→WAA or →WCA
2 Pol δ, Pol ζ and REV1
N N N T
A N N N N N
N N N A C T N N N N N
APOBEC mutation pattern in a sample
E=
N N N T T A N N N N N
N N N T G A N N N N N
N N N A A T N N N N N
N N N A C T N N N N N
C-to-T mutation
C-to-G mutation
MutationsTCW→TTW or TCW→TGW x ContextC
MutationsC→T or C→G x ContextTCW
Figure 2 | Mutation patterns and mechanistic knowledge used to define an APOBEC mutation signature and
produce sample-specific statistics evaluating mutagenesis. Robust mutation signatures can be developed by
identifying groups of spatially clustered mutations that are likely to have been induced by a single
mutagenic
Nature
Reviews | Cancer
mechanism. Additional features that are used to implicate the causative factor are described in parts a–d. a | Proximity
to sites of chromosomal rearrangement (purple connector line). b | Strand-coordination (example with sequence
context of a C‑coordinated cluster from multiple myeloma32; see also FIG. 1 and the main text), motif preference (grey)
and substitution specificity. For example, the colocalization of strand-coordinated clustered cytosine substitutions
with rearrangement breakpoints implicates the involvement of DNA double-strand-break (DSB) repair in the
formation of the mutations. The frequent involvement of single-stranded DNA (ssDNA) intermediates during such
DSB repair events combined with an over-representation of TCA and TCT sequences among the mutations
corresponds to the biochemical characteristics of a subset of apolipoprotein B mRNA editing enzyme catalytic
polypeptide-like (APOBEC) cytidine deaminases within ssDNA. Both C-to-T and C-to-G substitutions are also
over-represented. c | The mechanism of downstream processing of deoxyuridine (the deamination product of
deoxycytidine), explaining mutation specificity in clusters55: glycolytic conversion of deoxyuridine to an abasic site
creates a block to DNA synthesis during gap filling. The concerted action of DNA polymerase δ (Pol δ) and DNA Pol ζ (1)
or DNA Pol δ, DNA Pol ζ and REV1 (2) makes mutagenic insertion of either adenine or cytosine opposite the abasic
site, respectively, resulting in C‑to‑T and C‑to‑G mutations. d | Combining the favoured sequence motifs and base
substitution preferences of APOBEC enzymes creates a refined mutation signature and allows calculation of
sample-specific statistics evaluating mutagenesis — enrichment (E) by APOBEC signature mutations over
the presence of an APOBEC mutation motif in the surrounding nucleotide context. TCW and WGA indicate the
APOBEC-targeted sequences of TCT and TCA, as well as their complements, as described in REF. 71. Nucleotides that
are involved in mutation events are shown in red. UDG, uracil DNA glycosylase.
NATURE REVIEWS | CANCER
VOLUME 14 | DECEMBER 2014 | 791
© 2014 Macmillan Publishers Limited. All rights reserved
REVIEWS
Rearrangement breakpoints
A pair of distant genomic
coordinates that are brought
into the immediate vicinity of
each other by a
rearrangement.
[TC(A|T)→(T|G)] mutations prevailed in two of the
signatures (signatures 2 and 13) that occurred in several
cancer types that were defined by the NMF analysis11.
Pairing mutation signatures with their source: the
APOBEC example. Several key experimentally determined
characteristics of APOBEC cytidine deaminases enabled
attribution of the mutation signature [TC(A|T)→(T|G)]
to these enzymes (FIG. 2b). The human genome encodes
eight enzymatically active APOBEC polypeptides (seven
are located in the APOBEC3 cluster of highly homologous
genes), which normally serve to restrict viral infection and
retrotransposon mobility by deaminating cytosines during the ssDNA stage of their replication cycle82,83. These
enzymes have exquisite preference for ssDNA, where
they deaminate the C that resides in TCA and TCT trinucleotides to U. This U can be directly copied to produce
C-to-T substitutions. However, uracil bases in ssDNA
are often excised by uracil DNA glycosylase enzymes
(UDG enzymes, which are also known as UNG enzymes
in humans), leaving abasic sites which, after copying by
TLS polymerases, results in both C-to-T and C-to-G
mutations31,55,84–86 (FIG. 2c). Moreover, some APOBECs are
highly processive, allowing direct formation of mutation
clusters87,88. These biochemical properties correlated well
with the sequence motif and base substitution spectrum
of mutations in C‑ and G‑coordinated clusters, as well as
the colocalization with rearrangement breakpoints where
ssDNA is expected to have formed, either because ssDNA
is more fragile or because regions of ssDNA would be
created around DSBs by end resection56.
The primary benefit of associating a mutagenic process with a stringent signature, such as [TC(A|T)→(T|G)],
is that this signature can be used to develop a measure
of mutagenesis in the entire genome of individual samples — enrichment over the expected presence of the
signature if caused by random mutagenesis32,71 (FIG. 2d).
This allowed computation of sample-specific p‑values
as well as q‑values corrected for multiple testing errors.
In the case of the [TC(A|T)→(T|G)] signature, ana­
lysis of about a million mutations in 14 types of cancer
reported by TCGA revealed that bladder, cervical, breast,
head and neck and lung cancers have a high prevalence
of APOBEC-mutated samples. The same cancer types
showed the presence of signatures 2 and/or 13 in the
NMF-based analysis11. However, only the capacity to
compute sample-specific q‑values and enrichments
enabled the finding that tumours in the HER2‑enriched
subtype of breast cancer are much more likely to be
mutated by APOBEC compared with three other breast
cancer subtypes71.
APOBEC1, as well as six of the APOBEC3 proteins, have the ability to generate the [TC(A|T)→(T|G)]
mutation signature in vitro and in model in vivo studies70,82. A high level of APOBEC3B expression in several cancer types, and weak but statistically significant
correlation of its mRNA with the load of APOBEC
signature mutations, prompted Burns et al.89,90 to suggest that this enzyme is the likely cause of mutagenesis.
However, another study implicated APOBEC3A, as well
as APOBEC3B, as possible mutagenic enzymes in breast
cancer 91. Interestingly, APOBEC3B homozygous deletion is polymorphic within the human population92 and
individuals who are homozygous-null and even hetero­
zygous have a detectable increase in frequencies of breast
or liver cancers93–95 that can have a high frequency of
mutations fitting the APOBEC signature96. Identification
of the real culprit through analysis of cancer genomics
data sets is complicated by the level of mutagenesis,
which probably depends on additional factors besides
mRNA levels of APOBECs, such as the active protein
level, the accessibility of chromosomal DNA, and the
availability of ssDNA substrate (discussed in REF. 71).
It is also worth noting that many mutations detected in
cancers may have occurred long before tumour mRNA
was extracted for RNA sequencing.
Although several questions about APOBEC mutagenesis in cancers remain unresolved, it is a useful example of
a strategy leading to the construction of a rigid mutation
signature. This in turn allows statistical power that is sufficient for computing sample-specific p‑values, which are
not available if complex signatures derived from NMF are
used for sample-by‑sample analysis directly. In addition
to APOBEC mutagenesis, cancer genome resequencing identified other mutation signatures occurring both
in mutation clusters and scattered across the genome.
Although many of these signatures cannot yet be linked
to a specific source, correlations with mechanism-based
hypotheses have enabled the likely causative agents for
the following signatures, which can be broadly separated
according to the source — exposure to a mutagen, or
resulting from defective DNA repair (TABLE 2 and below).
Increased lesions or decreased error-free repair
This group of mutagenic mechanisms relies on exo­
genous or endogenous (such as the AID and APOBEC
enzymes discussed above) DNA lesions, which are
turned into mutations by downstream DNA repair and
replication (FIG. 3; TABLE 2).
Mutations in CpG motifs. Cytosines in CpG sequences
are often methylated at the C5 carbon, producing
5‑methylcytosines (5‑meCs). Deamination of 5‑meC
produces T and often C-to-T mutations. Such mutations can be prevented if a T∙G mispair is recognized
by the T∙G‑specific thymine DNA glycosylase (TDG)
and the resulting abasic site is repaired by base excision repair (BER) 97,98. Unlike mutations that are
caused by C‑deamination in the APOBEC motif, CpG
mutations are depleted within C- or G‑coordinated
mutation clusters, which are expected from simultaneous mutation events32,71. Frequent conversion of
5‑meC to T leads to depletion of CpG dinucleotides
from genomes, except within regulatory CpG islands
or rare coding positions, where methylation is limited and they are maintained by positive selection99,100.
When mutation rates in CpG motifs are normalized
for their presence in the genome or in the exome,
this mutagenesis becomes detectable in many WES
and WGS cancer mutation data sets11,12. Unlike other
mutation signatures, CpG mutagenesis correlates with
the patient’s age at cancer detection11.
792 | DECEMBER 2014 | VOLUME 14
www.nature.com/reviews/cancer
© 2014 Macmillan Publishers Limited. All rights reserved
REVIEWS
Table 2 | Signatures associated with different causes of hypermutation in human cancer genomes
Source of mutations
Mutated motif*‡
Resulting
base(s)
Base
change
Hypermutation load
(fraction)§
Additional features
UV radiation
(–|T|C|–)C(A|T|C|G)
T
C→T
30,000–50,000 (40–50%)
Exposure history; somatic
mutations, transcription
105
(–|T|C|–)C(G)
T
C→T
3,000–5,000 (5–10%)
Exposure history; somatic
mutations, transcription
105
(A|T|C|G)CC(A|T|C|G)
TT
C→T
500–1,000 (1%)
Exposure history; somatic
mutations, transcription
105
5‑meCpG deamination
(A|T|C|G)C(–|–|–|G)
T
C→T
1,000–10,000 (5–10%)
Age dependence
11
APOBEC
(–|T|–|–)C(A|T|–|–)
T|G
C→T
50,000–100,000 (30–70%)
C- or G‑strand-coordinated
clusters
71
AID
(A|T)(A|G)C(A|T|C|G)
T|G
C→T
Region specific
Clusters in AID genomic targets
DNA polymerase ε
(–|T|–|–)C(–|–|–|G)
T
C→T
10,000–20,000 (40–60%)
||
Refs
Somatic mutations, MSS
138
Somatic mutations, MSS
138
(–|T|–|–)C(–|T|–|–)
A
C→A
ROS
(A|T|C|G)C(A|T|C|G)
A
C→A
TBD
TBD
Tobacco
(A|T|C|G)C(A|T|C|G)
A
C→A
30,000–100,000 (20–90%)
Exposure history
11
Temozolomide
(A|T|C|G)C(–|T|C|–)
A
C→A
30,000–200,000 (30–90%)
Treatment record
126
Aristolochic acid
(–|–|C|–)T(A|–|–|G)
A
T→A
50,000–200,000 (30–70%)
Exposure history
42,45
DNA polymerase η
(A|T|C|G)T(A|T|–|–)
A|T|C|G
T→A
TBD
TBD
MMR defects
MSI
NA
NA
NA
Somatic mutations, MLH1
hypermethylation
*Each motif produced by the same source of mutation is shown on a separate line. ‡The mutated nucleotide (underlined) as well as the 5ʹ and 3ʹ flanking nucleotides
are shown for each motif: (5′ nucleotide) mutant (3′ nucleotide), with the exception of the mutation motif of AID, for which two 5ʹ nucleotides are shown. Possible
nucleotides for a position are shown in parentheses divided by “|” in the order A|T|C|G; if a certain nucleotide does not occur in this position, it is shown by “–” to
align similar nucleotides; “–” is not used in representation of signatures in the text and in BOX 1; AID and the third UV signature are not aligned with the others.
§
Representative examples are provided; whole-exome sequencing data are prorated on a whole-genome scale with the assumption of 100‑times more mutations in
whole-genome sequencing. ||Values indicate the combined contribution of both signatures. AID, activation-induced cytidine deaminase; APOBEC, apolipoprotein
B mRNA editing enzyme catalytic polypeptide-like; me, methyl; MMR, mismatch repair; MSI, microsatellite instability; MSS, microsatellite-stable; NA, not
applicable; ROS, reactive oxygen species; TBD, to be determined; UV, ultraviolet.
UV radiation. UVB (280–320 nm) and UVA (320–
400 nm), which are components of sunlight, are established risk factors for melanomas and head and neck
squamous cell cancers101. The spectrum and signature
of UV‑induced mutagenesis is defined by the spectrum of
lesions that arise and the interplay of repair pathways and
TLS across unrepaired lesions22,23. CPDs and 6–4PPs in
dsDNA are normally repaired by NER using the undamaged DNA strand as a template. Unrepaired photoproducts are converted into mutations by error-prone TLS
during replication102,103. The specific UV mutagenesis
signature relies on a highly increased rate of deamination
of cytosines in CPDs22,104. The resulting pyrimidine‑U
CPD is copied by the error-free TLS polymerase DNA
Pol η, resulting in C-to-T mutation in [(T|C)C(A|T|G|C)]
sequences. When CC dinucleotides form a CPD, deamination of both cytosines results in CC→TT dinucleotide
substitution. Indeed, both of these mutation signatures
are prevalent in melanomas and to a lesser extent in
squamous cell carcinomas of the head and neck11,12,43,105.
Oxidative DNA damage. Free radical (hydroxyl and
superoxide) and non-radical (hydrogen peroxide) reactive oxygen species (ROS) that are constantly produced
by cell metabolism and enter the cell from the environment can cause various kinds of oxidative damage to all
DNA bases, which can become mutations106–108. Studies
with mutation reporters have indicated that oxidative
damage can lead to a higher prevalence of mutations in
G∙C than in A∙T base pairs15,109–111, which could contribute to the bias towards G∙C mutations observed in many
cancers11,12. Highly mutagenic 8‑oxo‑7,8 dihydroguanine
(8‑oxoG) could be the reason for such a bias; however,
other products of reactions of guanine and cytosine could
also contribute108. Vasquez and colleagues112,113 found
that the mutation motif [(A|T|G|C)(T|C)C(A|T|C)] is
enriched in several cancers. Experimentally, they found
that guanines in the complementary sequence context
have a higher potential to trap electrons and thus have an
increased chance of chemical modification. Partial overlap of this motif with the APOBEC mutation motif [(A|T)
GA] in the G‑containing strand suggests that additional
steps should be taken at statistical analysis of either motif.
Tobacco. Tobacco contains multiple ingredients that
are capable of forming mutagenic DNA adducts114,115.
Among these, benzo[a]pyrene (BaP) is the most experimentally studied. Its adduct with guanine predominantly
induces G∙C→T∙A mutations. However, the same class of
mutations is expected from 8‑oxoG, which can result
from ROS that are generated by smoking. Regardless,
G∙C→T∙A mutations are the major class of base substitutions observed in TP53 from lung cancers and in WGS
and WES of lung cancers, and the occurrence of this
class of mutation correlates with smoking history 44,116–118.
Interestingly, in bladder cancer, for which smoking is
NATURE REVIEWS | CANCER
VOLUME 14 | DECEMBER 2014 | 793
© 2014 Macmillan Publishers Limited. All rights reserved
REVIEWS
Low mutation frequency
Error-free
lesion repair
MMR
Hypermutation
Figure 3 | Sources of hypermutation in cancer. Hypermutation can occur through
Nature Reviews
| Cancer
alterations in the equilibrium between the formation of lesions (hexagons)
and error-free
lesion repair, or changes in the equilibrium between the rate of replication errors and
DNA polymerase proofreading or mismatch repair (MMR) (shown as removal of bulged
mispairing behind the replication fork). Low levels of lesions and replication errors and/or
high efficiency of error-free lesion repair, proofreading and MMR results in low mutation
frequency; however, reduction in repair and/or increase in lesions or in replication error
levels may lead to hypermutation (mutant positions are shown in red and do not indicate
nucleotide content).
known to be a strong risk factor, the WES of 130 samples
did not contain an increased number of G∙C→T∙A mutations15. Instead, bladder cancers showed enrichment for
the APOBEC mutation signature.
SN1-type alkylating agents
Chemicals that form a reactive
intermediate in the body that
can attack DNA bases to
covalently link an organic
group (for example, a methyl
group).
Mismatch repair
(MMR). A DNA repair pathway
that removes mismatched
nucleotides resulting from DNA
replication errors (for example,
C∙T base pairs). MMR removes
a stretch of the DNA strand
containing the mismatched
nucleotide and then
subsequently uses the
remaining intact DNA strand as
a template to synthesize a new
stretch of DNA to replace the
excised one.
The mutation load in upper tract urothelial cancers
induced by aristolochic acid is one of the highest known
so far. A∙T→T∙A mutations in these cancers were enriched
with [(C|T)AG]42 (note that TABLE 2 shows this motif as
in the complementary (T) strand) or even with a more
stringent [A(C|T)AGG]45 motif.
Temozolomide. The major mutagenic product of the
SN1-type alkylating agents, including the anticancer drug
temozolomide (TMZ), is O6-methylguanine (O6-meG).
This DNA lesion can be copied by DNA polymerase,
inserting either the correct C‑nucleotide or the incorrect T. The latter would cause a C∙G→T∙A mutation.
O6-meG∙C as well as O6-meG∙T are recognised by the
mismatch repair (MMR) pathway, which functions in conjunction with DNA replication. Upon recognition of a
lesion, MMR triggers removal of a nascent DNA stretch
from the O6-meG lesion and reinitiation of DNA synthesis. Subsequent synthesis past the same O6-meG
frequently regenerates the mismatch, which is in turn
recognized by MMR and establishes a repetitive repair
cycle119–121. This ‘futile repair cycle’ is mostly toxic to proliferating cells, which underlies the mechanism of action
of TMZ as a cancer drug. However, resistance to TMZ as
well as to other SN1 alkylating agents can occur as long as
cancer cells acquire additional defects in MMR or overexpress O6-meG-DNA methyltransferase (MGMT)122–125.
This results in a high frequency of G∙C→A∙T mutations,
which occurs in the genomes of TMZ-treated gliomas
and melanomas11,19,126. Importantly, these G∙C→A∙T mutations were predominantly in CC or (to a lesser extent) in
CT motifs, which allows them to be distinguished from
C-to-T mutations that are associated with 5‑meCG.
Aristolochic acid. This carcinogenic substance is contained in the East-Asian traditional medicinal plant
Aristolochicia sp.127,128. The metabolized derivative of aristolochic acid forms mutagenic aristolactam–DNA adducts
(AL‑dAs). AL‑dAs are refractory to the GGR branch
of NER and can be repaired only by TCR in the transcribed strand129, which could explain a transcriptional
strand bias and high frequency of A∙T→T∙A mutations
in cancer-associated genes such as TP53 (REFS 130,131).
Increased DNA synthesis errors
The chances for mutations to arise from DNA synthesis
copying a normal template during genome replication
are limited by replicase accuracy in base selection, nearly
instant proofreading of replication errors and postreplicative MMR132,133. Inactivation of any of these safeguards can lead to hypermutation. Similarly, mutations
can also be introduced during replication when errorprone TLS polymerases copy small stretches of DNA that
do not contain lesions134.
MMR defects. The inherited predisposition to hereditary
non-polyposis colorectal cancer (HNPCC; also known
as Lynch syndrome) as well as to several other types of
cancers is the consequence of germline genetic defects
in one of the MMR genes: MSH2, MSH6, PMS2 and
MLH1 (REF. 133). The most easily detected result of MMR
defects, even in WGS and WES data sets, is an increase
in microsatellite instability (MSI) — frequent deletions and insertions in arrays of very short direct repeat
sequences (microsatellites)135. In sporadic colorectal, gastric and endometrial cancers14,16,136, genome-wide MSI
often coincides with MLH1 hypermethylation or with
somatic mutations in MMR genes. Cancer samples
with MSI also have an increased frequency of various
base substitutions. They form a category of hypermutated cancers, which is distinct from the hypermutation
that occurs in cancer cells with somatic mutations in
replicative DNA polymerases.
DNA Pol ε and DNA Pol δ mutations. Somatic mutations in the respective catalytic subunits of replicative
DNA Pol ε (POLE) and DNA Pol δ (POLD1) have
been recently associated with familial predisposition to
colorectal, endometrial and ovarian endometrioid cancers137–140. On the basis of the budding yeast model, DNA
Pol δ synthesises the lagging strand and DNA Pol ε is
responsible for the synthesis of the leading strand in a
bi‑directional replication fork141. In WES analysis, samples with somatic mutations in POLE are ‘ultramutated’
and carry even higher mutation loads than MSI-positive
MMR-deficient tumours. It was noted that sporadic
mutations in POLD1 were less frequent among ultramutated tumours, which mostly carried POLE mutations in
the proofreading exonuclease domain (EDM)14,137,142,143.
Samples with EDM mutations are particularly enriched
with two separate signatures [TCT→A] and [TCG→T]11.
TLS polymerases. DNA Pol η can perform error-free
copying of CPDs; however, it has a high error rate when
copying an undamaged template in vitro144. The DNA
Pol η [T(A|T)] mutation motif was significantly enriched
in A- or T‑coordinated clusters found in multiple myelomas32 (discussed above). Enrichment with this motif was
794 | DECEMBER 2014 | VOLUME 14
www.nature.com/reviews/cancer
© 2014 Macmillan Publishers Limited. All rights reserved
REVIEWS
also documented for chronic leukaemias and in B cell
lymphomas11,81. Another mutation signature, [AT→C],
detected in hepatitis C virus (HCV)-positive hepatocellular carcinomas was suggested to originate from errorprone synthesis by one of the TLS polymerases based on
overexpression of DNA Pol ζ and DNA Pol ι in this type
of cancer 145,146.
Structural and copy number changes
In addition to the large numbers of base substitutions
that are seen in various cancer types, many cancers show
marked structural rearrangement of the genome, resulting in chromosome translocations and changes in allelic
copy number. Copy number changes that are driven by
aneuploidy are themselves sufficient to promote carcinogenesis in mice147. A role for similar changes driving cancer
in humans has been indicated computationally 148, and by
recurrence in WGS analyses149. Observations in many lung
squamous cell carcinomas, as well as ovarian, breast and
head and neck cancers, suggested that either mutation or
rearrangement can drive cancer progression150. Discoveries
of phenomena termed ‘chromothripsis’ (REF. 151) and
‘chromoanasynthesis’ (REF. 152) indicate that structural
alterations, as well as base substitutions, can occur at
high density in localized regions, resulting in large-scale
rearrangement of one or a few chromosomes. Evidence
of frequent microhomology at the junction of these mass
chromosomal rearrangements151,152 and in instances of solitary structural changes25 suggests that mechanisms such
as non-homologous end joining and microhomologymediated break-induced replication at DSBs or fork
stalling and template switching can be involved in the generation of chromosome alteration; however, the initiating
causes of such events remain unclear.
Cancers with ‘simple’ genomes
Contrasting the multiple instances of hypermutation
and genome instability observed in various cancer types,
several recently characterized paediatric and some
adult lymphoid cancers show relatively small numbers
of genetic alterations12. Such ‘simple’ paediatric cancers
are exemplified by rhabdoid tumours in which the initial sequencing of 35 patients revealed mutation loads on
the order of 1–10 mutations per exome and only a single
recurrent event involving alteration of the gene encoding
the chromatin remodelling enzyme SMARCB1 (REF. 153).
Similarly, paediatric medulloblastomas and Ewing’s sarcomas have low mutation numbers17,154,155, and adult lymphoid cancers, such as acute myeloid leukaemia, often
have similar genetic complexity to corresponding normal
cells18. Thus, although increased mutation rates are one
potential means to dysregulate cell growth and proliferation leading to cancer development, others clearly exist.
The frequently recurrent disruption of genes encoding
epigenetic modifying enzymes such as SMARCB1, DNA
methyltransferase 3α (DNMT3A), TET2, enhancer of
zeste 2 (EZH2), and mixed lineage leukaemia 2 (MLL2;
also known as KMT2D) in paediatric and lymphoid
tumours156 may suggest that global changes in gene
expression through a single genetic change can be sufficient to drive cancer progression. Moreover, programmed
genetic changes such as V(D)J recombination, SHM and
class switch recombination that occur in lymphoid cells
can inadvertently target cancer-associated genes (for
example, recombination-activating gene (RAG)-mediated
deletion of ETS variant 6 (ETV6) and cyclin-dependent
kinase inhibitor 2A (CDKN2A)–CDKN2B in acute
lymphoblastic leukaemia (ALL)157, and MYC translocations in Burkitt’s lymphoma158,159) and can thereby promote cancer progression without the establishment of a
general mutator phenotype.
The carcinogenic potential of hypermutation
Despite the few examples of cancers with ‘simple’ genomes,
a high level of genome and epigenome instability in most
adult cancers is a well-accepted fact 150,160, but the relative
roles and specific contribution of each type of instability is still a question for the cancer research of today. Not
surprisingly, the relative carcinogenic role of hypermutation is still under debate8–10. Intrinsic features of cancers,
such as an increased number of cell divisions owing to
unrestrained proliferation, could produce high total numbers of mutations in a tumour without the requirement of
an increased mutation rate. Moreover, increased genome
instability that is initiated during malignant transformation may obscure less prominent mutagenic processes
operating early during tumorigenesis. The importance
of these minor pathways in driving cancer could be
greatly amplified relative to their contribution to the total
mutation load by selection for specific oncogenic changes.
However, it is clear that hypermutated cancer genomes
can contain prevailing mutation signatures in genes
that are important for cancer initiation and development12,42,45,71,90,156. Recent bioinformatics and experimental analyses indicate that in at least some of these cases,
hypermutation is inducing mutations that actively drive
carcinogenesis. For example, mutations occurring in
PIK3CA are more common in APOBEC-hypermutated
tumours, regardless of tumour type161. Moreover, the
same study demonstrated that the PIK3CA mutations in
APOBEC-hypermutated tumours are skewed towards
two ‘hotspot’ locations in the regulatory helical domain
over the third canonical mutation hotspot that is located
directly in the kinase domain. The over-representation
of the tumour-selected helical domain mutations, which
exist in APOBEC target motifs, over an equally selectable,
non-APOBEC mutation in the APOBEC-hypermutated
cancers strongly suggests that APOBEC enzymes are
probably causative in these cancers. In addition, Marais
and colleagues162 have shown that UV‑induced mutations
accelerate carcinogenesis in BRAF-deficient mice by
inducing selected mutations in Trp53. Identifying similar
examples of hypermutation-induced cancer will probably
become easier as our knowledge of the key genes that are
responsible for tumorigenesis becomes more complete.
Lawrence et al.12 found that accounting for mutational heterogeneity that is associated with various
functional and structural features across the genome
(TABLE 1) increased the statistical power for detecting
significantly mutated genes (SMGs) that are potentially
important for cancer, and it eliminated apparent false positives, such as olfactory receptor genes. These genes reside
NATURE REVIEWS | CANCER
VOLUME 14 | DECEMBER 2014 | 795
© 2014 Macmillan Publishers Limited. All rights reserved
REVIEWS
in late replicating, low transcribed regions of the genome
that show an elevated mutation rate. After correcting for
regional differences in mutation rate, these genes fell out
from the SMG category. Alternative methods of identifying cancer drivers, such as comparing ratios of benign
to deleterious mutations in specific genes149 or computational algorithms such as CRAVAT163, are also sensitive to
background mutation frequencies to some extent. High
numbers of passenger mutations and the potential for
mutational hotspots in hypermutated cancers can considerably complicate these analyses, potentially resulting
in both false-negative and false-positive results. We anticipate that the accumulation of mechanistic knowledge, as
well as WES and WGS cancer mutation data, will increase
the accuracy of correction factors by accounting for specific features of mutagenic mechanisms that are operating
in individual hypermutated cancer samples. For example,
calculating enrichments for a simple mutation signature
allows selection of those samples for analysis, in which a
significant part of the mutations carrying the signature
really have risen from a mutagen. Groups of samples with
high statistically-evaluated presence of specific mutation
signatures can be used as a discovery set for defining
genomic profiles of hypermutation caused by a signature-generating mutagen, leading to the development of
mutagen-specific correction factors in searching for SMGs.
Perspectives — actions and outcomes
Despite the current successes of resequencing genomes
in understanding the impact of mutation on cancer and
genome dynamics in general, a number of the mutation
signatures that have been highlighted by NMF50 cannot,
Box 1 | Merging statistical pattern analysis with mechanistic information
Although de novo pattern recognition approaches are useful for highlighting mutation signatures that prevail in large
groups of cancers, they can only identify samples that are enriched with a given signature after the principal component
of a signature is extracted and used for repeated analysis. Such an extraction can be done on the basis of the prevalence
of a signature derived by non-negative matrix factorization (NMF); however, an independent statistical analysis, such as
the presence of a principal component of a signature in clusters and/or correlation with mechanistic information on a
suggested source of mutagenesis may result in a more rigid definition of a signature, thereby increasing the power of the
statistical analysis.
Key benefits of using experimentally derived mechanistic knowledge to generate rigid mutation signatures include:
•The information about mutagenic specificity of environmental and endogenous causes of mutations can be used for
reducing each multicomponent signature derived from NMF to a less complex one, which is better suited for
sample-to‑sample analysis.
•Mechanistic information can help avoid over-simplification of mutation signatures, which would result in excessive
overlap.
•As long as two simple signatures are known to come from the same mechanism, they can be used within a single, more
focused and powerful statistical hypothesis.
The examples listed in TABLE 2 and below (see also REFS 112,168) illustrate these lines of analysis:
Example 1
The ultraviolet (UV) radiation signature, [(T|C)C(A|T|C|G)→T] (similar to TABLE 2, mutated bases are underlined and
ambiguous nucleotides are shown in parentheses separated by “|”; unlike in TABLE 2, only the nucleotides that can be
found in a given position of a signature are shown) overlaps with the [TC(A|T)→T] part of the apolipoprotein B mRNA
editing enzyme catalytic polypeptide-like (APOBEC) signature. However, the other part of the APOBEC signature
[TC(A|T)→G] is absent in the UV mutagenesis signature. Further supporting the lack of [TC(A|T)→G] mutations stemming
from UV radiation, in melanoma genomes, the density of [TC(A|T)→T] mutations showed reverse dependence on
nucleosome density, whereas [TC(A|T)→G] did not25. Another feature distinguishing UV from APOBEC mutagenesis is the
lack of preference in the choice of the 3′‑nucleotide setting the mutation motif as [(T|C)C(A|T|C|G)→T]. Moreover, the third
nucleotide position for a UV‑induced signature should be enriched with G, because methylation of cytosine increases the
chance of cyclobutane pyrimidine dimer (CPD) formation22,104. Indeed, high prevalence of [(T|C)CG→T] mutations is a
common feature of many UV‑associated cancers11,12,43,105. Together with CC→TT enrichment in a sample, these attributes
provide a good tool for highlighting cancer samples with a strong component of UV‑induced mutagenesis. In addition,
the partial overlap between the part of the UV mutation signature [(T|C)C→T] and the APOBEC signature can also be
resolved by UV‑mutagenesis occurring more frequently in the non-transcribed strand (TABLE 1), whereas APOBEC
signature mutations do not show any transcriptional bias11.
Example 2
The partial overlap between the APOBEC mutation signature [(T)C(A|T)→T|G] and the DNA polymerase ε mutation
signature [TCT→A] can be resolved by accounting for the difference in the base resulting from a mutation (T or G versus
A), which is in agreement with mechanistic information about translesion synthesis (TLS) across abasic sites — the
primary lesion generated in place of uracils created by APOBEC cytidine deamination in single-stranded DNA (ssDNA)55.
Example 3
The overlap between the tobacco signature, [(A|T|C|G)C(A|T|C|G)→A], and the temozolomide (TMZ) signature, [(A|T|C|G)
C(T|C)→A], can be resolved using the clinical history of exposure.
As mechanism-based mutation signatures are confined to simple nucleotide motifs, it is possible to calculate
enrichment for the motif in mutations over its presence in a sequenced part of the genome (see FIG. 2 for an example
involving APOBEC). These calculations are useful for discerning between mutagenic processes that generate overlapping
signatures (see, for example, Figure 1 in REF 71), as the more likely process would generate a higher enrichment.
796 | DECEMBER 2014 | VOLUME 14
www.nature.com/reviews/cancer
© 2014 Macmillan Publishers Limited. All rights reserved
REVIEWS
Box 2 | Establishing a uniform organization among mutation databases
Unifying the organization of the large tables containing mutation lists (called Mutation
Annotation Files (MAFs) in The Cancer Genome Atlas (TCGA) data depository) and
supplementing them with additional information could greatly facilitate the use of
genomics data by a large number of mechanistic experts who do not have sufficient
bioinformatics expertise. Below are several suggestions in this regard:
•Unifying the column titles and data value designations is a simple first step to
facilitate data analysis by multiple groups outside the data generation consortia.
•Unifying sample identification (ID) so that it is the same for a given sample between all
platforms (clinical records, expression, methylation, mutation, rearrangement, copy
number variation and so on) and data sets within a study.
•Providing the values for allele fractions and coverage for each mutation call, which is
a natural requirement for the accurate representation of mutation data in a cancer
sample that usually consists of more than one clone. Analysis of allelic fractions
provides an understanding of cancer development history in the sense of mutation
incidence, selection and fixation148,169.
•So far, statistical analyses of hypermutation have been based on tumour-specific
mutation calls. Adding the data about germline genotype for each patient with a
sporadic tumour analysed by genomics platforms would allow genetic analysis of
mutation spectra and mutation signatures as quantitative genetic traits. Although
these data are available from the same sequencing effort that produces
tumour-specific mutation calls, they are not provided in sample depositories or by
individual studies.
so far, be associated with any known mutagenic pathway
or agent. Likewise, the mechanism underlying other
forms of hypermutation, such as the mutation signature
[(A|T|C|G)TT→G] in oesophageal cancer 164, hypermutation of the female inactivated X chromosome165 and
hypermutated prostate cancer 166,167, remain unknown.
Combining statistical signature analysis with mechanistic research could bring clarity to these issues (BOX 1).
Moreover, statistical evaluation of specific mutagenesis
1.Aaltonen, L. A. et al. Clues to the pathogenesis of
familial colorectal cancer. Science 260, 812–816
(1993).
2. Cleaver, J. E. & Crowley, E. UV damage, DNA repair
and skin carcinogenesis. Frontiers Biosci. 7,
D1024–D1043 (2002).
3. Hart, R. W., Setlow, R. B. & Woodhead, A. D. Evidence
That Pyrimidine Dimers in DNA Can Give Rise to
Tumors. Proc. Natl Acad. Sci. USA 74, 5574–5578
(1977).
4. Hecht, S. S. Progress and challenges in selected areas
of tobacco carcinogenesis. Chem. Res. Toxicol. 21,
160–171 (2008).
5. Ionov, Y., Peinado, M. A., Malkhosyan, S., Shibata, D.
& Perucho, M. Ubiquitous somatic mutations in simple
repeated sequences reveal a new mechanism for
colonic carcinogenesis. Nature 363, 558–561 (1993).
6.Parsons, R. et al. Hypermutability and mismatch
repair deficiency in RER+ tumor cells. Cell 75,
1227–1236 (1993).
7.Cunningham, J. M. et al. Hypermethylation of
the hMLH1 promoter in colon cancer with
microsatellite instability. Cancer Res. 58,
3455–3460 (1998).
8. Fox, E. J., Prindle, M. J. & Loeb, L. A. Do mutator
mutations fuel tumorigenesis? Cancer Metastasis Rev.
32, 353–361 (2013).
9. Fox, E. J., Beckman, R. A. & Loeb, L. A. Reply: Is There
Any Genetic Instability in Human Cancer? DNA Repair
9, 859–860 (2010).
10. Shibata, D. & Lieber, M. R. Is there any genetic
instability in human cancer? DNA Repair 9, 858;
discussion 859–860 (2010).
11.Alexandrov, L. B. et al. Signatures of mutational
processes in human cancer. Nature 500, 415–421
(2013).
signatures among an expanded sequencing effort that
includes more preneoplastic lesions and relapsed tumour
samples may eventually enable ana­lyses that correlate
these mutations to clinical phenotypes, provide a better
understanding of the role of hypermutation in cancer
development, and potentially lead to new cancer prevention, early diagnostics and therapeutic strategies. These
perspectives can become closer with concerted efforts of
the bioinformatics and mechanistic fields.
Both the bioinformatics and mechanistic fields will
benefit from a growing number of analysed samples,
especially if the output data are expanded and presented
in uniform format between different project and data
depositories. A common organization would greatly facilitate analyses that are carried out on different cancer data
sets (BOX 2). Unifying and supplementing information
in data sets could be matched by mechanistic research
that commonly addresses questions of mutation signatures and that develops clear mutation motifs, as long as
such motifs can be derived from a study. In other words,
an effort should be made, where possible, to organize
conclusions of mechanistic research in a form that is as
convenient as possible for immediate statistical exploration in cancer mutation data sets. These simple steps
may result in drafting many researchers with all kinds
of expertise, who would otherwise be deterred by the
perspective of an indefinite time that was needed for
mining and organizing the data before carrying out
pilot ana­lyses, which only infrequently lead to a finding. Altogether, merging mechanistic and bioinformatics approaches suggests realistic modifications to
study formats and data organization with the payoff in
more discoveries that could be important for the fight
against cancer.
This paper applied NMF to a mutation catalogue
from 7,042 tumours to identify 21 mutation
signatures occurring in these cancers. Additional
analyses were carried out to suggest the causes of
some of these signatures.
12.Lawrence, M. S. et al. Mutational heterogeneity in
cancer and the search for new cancer-associated
genes. Nature 499, 214–218 (2013).
This paper identifies genes that are recurrently
mutated in sequenced whole genomes and exomes
of various cancers, thereby identifying genes the
alteration of which probably contributes to cancer
progression. This method standardizes the rate of
gene mutation by the overall regional mutation rates.
13. Loeb, L. A., Springgate, C. F. & Battula, N. Errors in
DNA replication as a basis of malignant changes.
Cancer Res. 34, 2311–2321 (1974).
14. Cancer Genome Atlas Network. Comprehensive
molecular characterization of human colon and rectal
cancer. Nature 487, 330–337 (2012).
15. Cancer Genome Atlas Research Network.
Comprehensive molecular characterization of urothelial
bladder carcinoma. Nature 507, 315–322 (2014).
16.Nagarajan, N. et al. Whole-genome reconstruction
and mutational signatures in gastric cancer. Genome
Biol. 13, R115 (2012).
17.Pugh, T. J. et al. Medulloblastoma exome sequencing
uncovers subtype-specific somatic mutations. Nature
488, 106–110 (2012).
18.Welch, J. S. et al. The origin and evolution of
mutations in acute myeloid leukemia. Cell 150,
264–278 (2012).
19. Cancer Genome Atlas Research Network.
Comprehensive genomic characterization defines
human glioblastoma genes and core pathways. Nature
455, 1061–1068 (2008).
NATURE REVIEWS | CANCER
20. Benzer, S. & Freese, E. Induction of Specific Mutations
with 5‑Bromouracil. Proc. Natl Acad. Sci. USA 44,
112–119 (1958).
21. Rogozin, I. B. & Pavlov, Y. I. Theoretical analysis of
mutation hotspots and their DNA sequence context
specificity. Mutat. Res. 544, 65–85 (2003).
22. Pfeifer, G. P., You, Y. H. & Besaratinia, A. Mutations
induced by ultraviolet light. Mutat. Res. 571, 19–31
(2005).
23. Sage, E. Distribution and repair of photolesions in
DNA: genetic consequences and the role of sequence
context. Photochem. Photobiol. 57, 163–174 (1993).
24. De, S. & Babu, M. M. A time-invariant principle of
genome evolution. Proc. Natl Acad. Sci. USA 107,
13004–13009 (2010).
25.Drier, Y. et al. Somatic rearrangements across cancer
reveal classes of samples with distinct patterns of DNA
breakage and rearrangement-induced hypermutability.
Genome Res. 23, 228–235 (2013).
26. Hicks, W. M., Kim, M. & Haber, J. E. Increased
mutagenesis and unique mutation signature
associated with mitotic gene conversion. Science 329,
82–85 (2010).
27. Malkova, A. & Haber, J. E. Mutations arising during
repair of chromosome breaks. Annu. Rev. Genet. 46,
455–473 (2012).
This review discusses the experimental evidence
that DSB repair processes are prone to increased
rates of mutation.
28.Deem, A. et al. Break-induced replication is highly
inaccurate. PLoS Biol. 9, e1000594 (2011).
29. Shee, C., Gibson, J. L. & Rosenberg, S. M. Two
mechanisms produce mutation hotspots at DNA breaks
in Escherichia coli. Cell Rep. 2, 714–721 (2012).
30.Burch, L. H. et al. Damage-induced localized
hypermutability. Cell Cycle 10, 1073–1085 (2011).
VOLUME 14 | DECEMBER 2014 | 797
© 2014 Macmillan Publishers Limited. All rights reserved
REVIEWS
31.Chan, K. et al. Base damage within single-strand DNA
underlies in vivo hypermutability induced by a
ubiquitous environmental agent. PLoS Genet. 8,
e1003149 (2012).
32.Roberts, S. A. et al. Clustered mutations in yeast and
in human cancers can arise from damaged long singlestrand DNA regions. Mol. Cell 46, 424–435 (2012).
This paper identified mutation clusters that are
induced by APOBEC cytidine deaminase activity in
human multiple myeloma, prostate cancer and
head and neck cancer. It also provided
experimental evidence from a yeast model system
that DNA damage-induced mutation clusters can
be formed by targeting regions of ssDNA.
33. Yang, Y., Gordenin, D. A. & Resnick, M. A. A singlestrand specific lesion drives MMS-induced hypermutability at a double-strand break in yeast. DNA
Repair 9, 914–921 (2010).
34. Yang, Y., Sterling, J., Storici, F., Resnick, M. A. &
Gordenin, D. A. Hypermutability of damaged singlestrand DNA formed at double-strand breaks and
uncapped telomeres in yeast Saccharomyces
cerevisiae. PLoS Genet. 4, e1000264 (2008).
35. Rhind, N. & Gilbert, D. M. DNA replication timing.
Cold Spring Harb. Perspect. Biol. 5, a010132
(2013).
36. Stamatoyannopoulos, J. A. What does our
genome encode? Genome Res. 22, 1602–1611
(2012).
37. Thurman, R. E., Day, N., Noble, W. S. &
Stamatoyannopoulos, J. A. Identification of higherorder functional domains in the human ENCODE
regions. Genome Res. 17, 917–927 (2007).
38. Liu, L., De, S. & Michor, F. DNA replication timing and
higher-order nuclear organization determine singlenucleotide substitution patterns in cancer genomes.
Nature Commun. 4, 1502 (2013).
39.Polak, P. et al. Reduced local mutation density in
regulatory DNA of cancer genomes is linked to DNA
repair. Nature Biotech. 32, 71–75 (2014).
40.Koren, A. et al. Differential relationship of DNA
replication timing to different forms of human
mutation and variation. Am. J. Hum. Genet. 91,
1033–1040 (2012).
41.Stamatoyannopoulos, J. A. et al. Human mutation rate
associated with DNA replication timing. Nature Genet.
41, 393–395 (2009).
This paper reports the discovery that regions of the
genome that replicate late during S-phase have a
higher rate of substitutions during human
evolution.
42.Hoang, M. L. et al. Mutational signature of
aristolochic acid exposure as revealed by wholeexome sequencing. Sci. Transl. Med. 5, 197ra102
(2013).
43.Pleasance, E. D. et al. A comprehensive catalogue of
somatic mutations from a human cancer genome.
Nature 463, 191–196 (2010).
44.Pleasance, E. D. et al. A small-cell lung cancer genome
with complex signatures of tobacco exposure. Nature
463, 184–190 (2010).
45.Poon, S. L. et al. Genome-wide mutational
signatures of aristolochic acid and its application as
a screening tool. Sci. Transl. Med. 5,197ra101
(2013).
References 42 and 45 describe extremely high
mutation loads and specific mutation signatures in
the WGS and WES of upper urinary tract urothelial
cell carcinomas that are associated with exposure
to aristolochic acid.
46. Schuster-Bockler, B. & Lehner, B. Chromatin
organization is a major influence on regional mutation
rates in human cancer cells. Nature 488, 504–507
(2012).
47. Stratton, M. R. Exploring the genomes of cancer cells:
progress and promise. Science 331, 1553–1558
(2011).
48. Devarajan, K. Nonnegative matrix factorization: an
analytical and interpretive tool in computational
biology. PLoS Comput. Biol. 4, e1000029 (2008).
49. Lee, D. D. & Seung, H. S. Learning the parts of objects
by non-negative matrix factorization. Nature 401,
788–791 (1999).
50. Alexandrov, L. B., Nik-Zainal, S., Wedge, D. C.,
Campbell, P. J. & Stratton, M. R. Deciphering
signatures of mutational processes operative in human
cancer. Cell Rep. 3, 246–259 (2013).
This paper describes the mathematical
framework for applying NMF to identify
mutation signatures that occur in human cancer
databases.
51.Nik-Zainal, S. et al. Mutational processes molding the
genomes of 21 breast cancers. Cell 149, 979–993
(2012).
52. Jia, P., Pao, W. & Zhao, Z. Patterns and processes of
somatic mutations in nine major cancers. BMC Med.
Genom. 7, 11 (2014).
53.Wang, J. et al. Evidence for mutation showers. Proc.
Natl Acad. Sci. USA 104, 8403–8408 (2007).
54. Chen, Z., Feng, J., Buzin, C. H. & Sommer, S. S.
Epidemiology of doublet/multiplet mutations in lung
cancers: evidence that a subset arises by
chronocoordinate events. PLoS ONE 3, e3714
(2008).
55. Chan, K., Resnick, M. A. & Gordenin, D. A. The choice
of nucleotide inserted opposite abasic sites formed
within chromosomal DNA reveals the polymerase
activities participating in translesion DNA synthesis.
DNA Repair 12, 878–889 (2013).
56. Mimitou, E. P. & Symington, L. S. DNA end resection
— unraveling the tail. DNA Repair 10, 344–348
(2011).
57. Dewar, J. M. & Lydall, D. Similarities and differences
between “uncapped” telomeres and DNA doublestrand breaks. Chromosoma 121, 117–130
(2012).
58.Saini, N. et al. Migrating bubble during break-induced
replication drives conservative DNA synthesis. Nature
502, 389–392 (2013).
This paper reports the discovery that the
specialized form of DSB repair called
break-induced replication occurs through
conservative DNA synthesis in which the leading
and lagging strand synthesis is uncoupled, leading
to the formation of a ssDNA intermediate.
59. Lopes, M., Foiani, M. & Sogo, J. M. Multiple
mechanisms control chromosome integrity after
replication fork uncoupling and restart at
irreparable UV lesions. Mol. Cell 21, 15–27
(2006).
60. Sogo, J. M., Lopes, M. & Foiani, M. Fork reversal and
ssDNA accumulation at stalled replication forks
owing to checkpoint defects. Science 297, 599–602
(2002).
61. McInerney, P. & O’Donnell, M. Functional uncoupling
of twin polymerases: mechanism of polymerase
dissociation from a lagging-strand block. J. Biol. Chem.
279, 21543–21551 (2004).
62. Yeeles, J. T., Poli, J., Marians, K. J. & Pasero, P.
Rescuing stalled or damaged replication forks.
Cold Spring Harb. Perspect. Biol. 5, a012815
(2013).
63. Sakofsky, C. J. et al. Break-induced replication is a
source of mutation clusters underlying kataegis. Cell
Rep. 7, 1640–1648 (2014).
64. Bartek, J., Lukas, J. & Bartkova, J. DNA damage
response as an anti-cancer barrier: damage threshold
and the concept of ‘conditional haploinsufficiency’. Cell
Cycle 6, 2344–2347 (2007).
65.Costantino, L. et al. Break-induced replication repair
of damaged forks induces genomic duplications in
human cells. Science 343, 88–91 (2014).
66.Chin, K. et al. In situ analyses of genome instability
in breast cancer. Nature Genet. 36, 984–988
(2004).
67.Vega, F. et al. Splenic marginal zone lymphomas are
characterized by loss of interstitial regions of
chromosome 7q, 7q31.32 and 7q36.2 that include
the protection of telomere 1 (POT1) and sonic
hedgehog (SHH) genes. Br. J. Haematol. 142,
216–226 (2008).
68.Poncet, D. et al. Changes in the expression of
telomere maintenance genes suggest global telomere
dysfunction in B‑chronic lymphocytic leukemia. Blood
111, 2388–2391 (2008).
69. Aguilera, A. & Garcia-Muse, T. R loops: from
transcription byproducts to threats to genome
stability. Mol. Cell 46, 115–124 (2012).
70. Roberts, S. A. & Gordenin, D. A. Clustered and
genome-wide transient mutagenesis in human cancers:
Hypermutation without permanent mutators or loss of
fitness. Bioessays 36, 382–393 (2014).
71.Roberts, S. A. et al. An APOBEC cytidine deaminase
mutagenesis pattern is widespread in human cancers.
Nature Genet. 45, 970–976 (2013).
This paper used a hypothesis-driven approach to
calculate the over-representation of APOBEC
signature mutations across 2,680 tumours. This
work identified APOBEC signature mutagenesis as
a prominent source of mutation in six cancer types
and correlated the total number of these mutations
with APOBEC mRNA levels.
798 | DECEMBER 2014 | VOLUME 14
72. Rogozin, I. B., Pavlov, Y. I., Bebenek, K., Matsuda, T. &
Kunkel, T. A. Somatic mutation hotspots correlate with
DNA polymerase eta error spectrum. Nature Immunol.
2, 530–536 (2001).
73. Neuberger, M. S. & Rada, C. Somatic hypermutation:
activation-induced deaminase for C/G followed by
polymerase eta for A/T. J. Exp. Med. 204, 7–10
(2007).
74. Liu, M. & Schatz, D. G. Balancing AID and DNA repair
during somatic hypermutation. Trends Immunol. 30,
173–181 (2009).
75. Maul, R. W. & Gearhart, P. J. AID and somatic
hypermutation. Adv. Immunol. 105, 159–191
(2010).
76.Peled, J. U. et al. The biochemistry of somatic
hypermutation. Annu. Rev. Immunol. 26, 481–511
(2008).
77. Rogozin, I. B. & Diaz, M. Cutting edge: DGYW/WRCH
is a better predictor of mutability at G:C bases in Ig
hypermutation than the widely accepted RGYW/WRCY
motif and probably reflects a two-step activationinduced cytidine deaminase-triggered process.
J. Immunol. 172, 3382–3384 (2004).
78.Migliazza, A. et al. Frequent somatic hypermutation of
the 5′ noncoding region of the BCL6 gene in B‑cell
lymphoma. Proc. Natl Acad. Sci. USA 92,
12520–12524 (1995).
79.Pasqualucci, L. et al. Hypermutation of multiple protooncogenes in B‑cell diffuse large-cell lymphomas.
Nature 412, 341–346 (2001).
80.Bolli, N. et al. Heterogeneity of genomic evolution and
mutational profiles in multiple myeloma. Nature
Commun. 5, 2997 (2014).
81.Puente, X. S. et al. Whole-genome sequencing
identifies recurrent mutations in chronic lymphocytic
leukaemia. Nature 475, 101–105 (2011).
82. Refsland, E. W. & Harris, R. S. The APOBEC3 family of
retroelement restriction factors. Curr. Top. Microbiol.
Immunol. 371, 1–27 (2013).
83. Smith, H. C., Bennett, R. P., Kizilyer, A.,
McDougall, W. M. & Prohaska, K. M. Functions and
regulation of the APOBEC family of proteins. Semin.
Cell Dev. Biol. 23, 258–268 (2012).
84. Gibbs, P. E. & Lawrence, C. W. Novel mutagenic
properties of abasic sites in Saccharomyces
cerevisiae. J. Mol. Biol. 251, 229–236 (1995).
85. Gibbs, P. E., McDonald, J., Woodgate, R. &
Lawrence, C. W. The relative roles in vivo of
Saccharomyces cerevisiae Pol ε, Pol ζ, Rev1 protein
and Pol32 in the bypass and mutation induction of an
abasic site, T‑T (6‑4) photoadduct and T‑T cis-syn
cyclobutane dimer. Genetics 169, 575–582 (2005).
86.Krokan, H. E. et al. Error-free versus mutagenic
processing of genomic uracil-Relevance to cancer. DNA
Repair 19, 38–47(2014).
87. Pham, P., Chelico, L. & Goodman, M. F. DNA
deaminases AID and APOBEC3G act processively on
single-stranded DNA. DNA Repair 6, 689–692;
author reply 693–694 (2007).
88. Chelico, L., Pham, P., Calabrese, P. & Goodman, M. F.
APOBEC3G DNA deaminase acts processively 3′ → 5′
on single-stranded DNA. Nature Struct. Mol. Biol. 13,
392–399 (2006).
89.Burns, M. B. et al. APOBEC3B is an enzymatic source
of mutation in breast cancer. Nature 494, 366–370
(2013).
90. Burns, M. B., Temiz, N. A. & Harris, R. S. Evidence for
APOBEC3B mutagenesis in multiple human cancers.
Nature Genet. 45, 977–983 (2013).
This paper investigated the expression of
APOBEC3B across tumours of multiple cancer
types, finding that the mRNA levels of this gene
correlated with the total mutation load of the
tumour. Mutations from tumours with the highest
APOBEC3B expression levels occurred primarily
within APOBEC-targeted motifs.
91.Taylor, B. J. et al. DNA deaminases induce breakassociated mutation showers with implication of
APOBEC3B and 3A in breast cancer kataegis. Elife 2,
e00534 (2013).
92. Kidd, J. M., Newman, T. L., Tuzun, E., Kaul, R. &
Eichler, E. E. Population stratification of a common
APOBEC gene deletion polymorphism. PLoS Genet. 3,
e63 (2007).
93.Long, J. et al. A common deletion in the APOBEC3
genes and breast cancer risk. J. Natl Cancer Inst. 105,
573–579 (2013).
94.Xuan, D. et al. APOBEC3 deletion polymorphism is
associated with breast cancer risk among women of
European ancestry. Carcinogenesis 34, 2240–2243
(2013).
www.nature.com/reviews/cancer
© 2014 Macmillan Publishers Limited. All rights reserved
REVIEWS
95.Zhang, T. et al. Evidence of associations of APOBEC3B
gene deletion with susceptibility to persistent HBV
infection and hepatocellular carcinoma. Hum. Mol.
Genet. 22, 1262–1269 (2013).
96.Nik-Zainal, S. et al. Association of a germline copy
number polymorphism of APOBEC3A and APOBEC3B
with burden of putative APOBEC-dependent
mutations in breast cancer. Nature Genet. 46,
487–491 (2014).
97. Pfeifer, G. P. Mutagenesis at methylated CpG
sequences. Curr. Top. Microbiol. Immunol. 301,
259–281 (2006).
98.Visnes, T. et al. Uracil in DNA and its processing by
different DNA glycosylases. Phil. Trans. R. Soc. B 364,
563–568 (2009).
99. Antequera, F. Structure, function and evolution of CpG
island promoters. Cell. Mol. Life Sci. 60, 1647–1658
(2003).
100.Schmidt, S. et al. Hypermutable non-synonymous sites
are under stronger negative selection. PLoS Genet. 4,
e1000281 (2008).
101. World Health Organisation. A Review of Human
Carcinogens: IARC Monographs on the Evaluation of
Carcinogenic Risks to Humans. 100D, 7–303 (World
Health Organisation, 2012).
102.Plosky, B. S. & Woodgate, R. Switching from highfidelity replicases to low-fidelity lesion-bypass
polymerases. Curr. Opin. Genet. Dev. 14, 113–119
(2004).
103.Sale, J. E. Translesion DNA synthesis and mutagenesis
in eukaryotes. Cold Spring Harb. Perspect. Biol. 5,
a012708 (2013).
104.Ikehata, H. & Ono, T. The mechanisms of UV
mutagenesis. J. Radiat. Res. 52, 115–125
(2011).
105.Berger, M. F. et al. Melanoma genome sequencing
reveals frequent PREX2 mutations. Nature 485,
502–506 (2012).
106.Breen, A. P. & Murphy, J. A. Reactions of oxyl radicals
with DNA. Free Radic. Biol. Med. 18, 1033–1077
(1995).
107. Cadet, J. & Wagner, J. R. DNA base damage by
reactive oxygen species, oxidizing agents, and UV
radiation.Cold Spring Harb. Perspect. Biol. 5,
a012559 (2013).
108.Evans, M. D., Dizdaroglu, M. & Cooke, M. S.
Oxidative DNA damage and disease: induction,
repair and significance. Mutat. Res. 567, 1–61
(2004).
109.Degtyareva, N. P. et al. Oxidative stress-induced
mutagenesis in single-strand DNA occurs primarily at
cytosines and is DNA polymerase ζ-dependent only for
adenines and guanines. Nucleic Acids Res. 41,
8995–9005 (2013).
110. Moraes, E. C., Keyse, S. M. & Tyrrell, R. M.
Mutagenesis by hydrogen peroxide treatment of
mammalian cells: a molecular analysis. Carcinogenesis
11, 283–293 (1990).
111. Tkeshelashvili, L. K., McBride, T., Spence, K. &
Loeb, L. A. Mutation spectrum of copper-induced
DNA damage. J. Biol. Chem. 266, 6401–6406
(1991).
112. Bacolla, A., Cooper, D. N. & Vasquez, K. M.
Mechanisms of base substitution mutagenesis in
cancer genomes. Genes 5, 108–146 (2014).
113.Bacolla, A. et al. Guanine holes are prominent targets
for mutation in cancer and inherited disease. PLoS
Genet. 9, e1003816 (2013).
This paper describes a positive correlation between
sequence-dependent oxidation potentials of
guanines and the number of mutations that occur
within specific sequence motifs in various human
cancers.
114. DeMarini, D. M. Genotoxicity of tobacco smoke and
tobacco smoke condensate: a review. Mutat. Res.
567, 447–474 (2004).
115.Pfeifer, G. P. et al. Tobacco smoke carcinogens,
DNA damage and p53 mutations in smokingassociated cancers. Oncogene 21, 7435–7451
(2002).
116.Govindan, R. et al. Genomic landscape of non-small
cell lung cancer in smokers and never-smokers. Cell
150, 1121–1134 (2012).
117.Imielinski, M. et al. Mapping the hallmarks of lung
adenocarcinoma with massively parallel sequencing.
Cell 150, 1107–1120 (2012).
118.Lee, W. et al. The mutation spectrum revealed by
paired genome sequences from a lung cancer patient.
Nature 465, 473–477 (2010).
119. Bodell, W. J., Gaikwad, N. W., Miller, D. & Berger, M. S.
Formation of DNA adducts and induction of lacI
mutations in Big Blue Rat‑2 cells treated with
temozolomide: implications for the treatment of lowgrade adult and pediatric brain tumors. Cancer
Epidemiol. Biomarkers Prev. 12, 545–551 (2003).
120.Stojic, L., Brun, R. & Jiricny, J. Mismatch repair and
DNA damage signalling. DNA Repair 3, 1091–1101
(2004).
121.Tomita-Mitchell, A. et al. Mismatch repair deficient
human cells: spontaneous and MNNG-induced
mutational spectra in the HPRT gene. Mutat. Res.
450, 125–138 (2000).
122.Cahill, D. P. et al. Loss of the mismatch repair
protein MSH6 in human glioblastomas is associated
with tumor progression during temozolomide
treatment. Clin. Cancer Res. 13, 2038–2045
(2007).
This paper identified mutation or loss of expression
of MSH6 as a recurrent mediator of glioblastoma
resistance to the alkylating chemotherapeutic drug
temozolomide.
123.Hunter, C. et al. A hypermutation phenotype and
somatic MSH6 mutations in recurrent human
malignant gliomas after alkylator chemotherapy.
Cancer Res. 66, 3987–3991 (2006).
124.Moen, E. L., Stark, A. L., Zhang, W., Dolan, M. E. &
Godley, L. A. The role of gene body cytosine
modifications in MGMT expression and sensitivity to
temozolomide. Mol. Cancer Ther. (2014).
125.Zhang, J. et al. Certain imidazotetrazines escape
O6‑methylguanine-DNA methyltransferase and
mismatch repair. Oncology 80, 195–207 (2011).
126.Johnson, B. E. et al. Mutational analysis reveals the
origin and therapy-driven evolution of recurrent
glioma. Science 343, 189–193 (2014).
This paper used exome sequencing to compare
the mutations occurring in primary gliomas to
those in relapsed tumours, finding that relapsed
tumours frequently stem from cancer cells that
are established early during tumour progression.
Moreover, tumours that recurred following
treatment with temozolomide more frequently
resulted in high-grade gliomas with temozolomide-induced mutations in the RB and mTOR
pathways.
127.Arlt, V. M., Stiborova, M. & Schmeiser, H. H.
Aristolochic acid as a probable human cancer hazard
in herbal remedies: a review. Mutagenesis 17,
265–277 (2002).
128.National Toxicology Program. Aristolochic acids. Rep.
Carcinog. 12, 45–49 (2011).
129.Sidorenko, V. S. et al. Lack of recognition by globalgenome nucleotide excision repair accounts for the
high mutagenicity and persistence of aristolactamDNA adducts. Nucleic Acids Res. 40, 2494–2505
(2012).
130.Chen, C. H. et al. Aristolochic acid-associated
urothelial cancer in Taiwan. Proc. Natl Acad. Sci. USA
109, 8241–8246 (2012).
131.Hollstein, M., Moriya, M., Grollman, A. P. &
Olivier, M. Analysis of TP53 mutation spectra
reveals the fingerprint of the potent environmental
carcinogen, aristolochic acid. Mutat. Res. 753,
41–49 (2013).
132.Kunkel, T. A. & Bebenek, K. DNA replication fidelity.
Annu. Rev. Biochem. 69, 497–529 (2000).
133.Kunkel, T. A. & Erie, D. A. DNA mismatch repair. Annu.
Rev. Biochem. 74, 681–710 (2005).
134.Shcherbakova, P. V. & Fijalkowska, I. J. Translesion
synthesis DNA polymerases and control of genome
stability. Front. Biosci. 11, 2496–2517 (2006).
135.Vilar, E. & Gruber, S. B. Microsatellite instability in
colorectal cancer-the stable evidence. Nature Rev. Clin.
Oncol. 7, 153–162 (2010).
136.Kim, T. M., Laird, P. W. & Park, P. J. The landscape of
microsatellite instability in colorectal and
endometrial cancer genomes. Cell 155, 858–868
(2013).
137.Briggs, S. & Tomlinson, I. Germline and somatic
polymerase epsilon and delta mutations define a
new class of hypermutated colorectal and
endometrial cancers. J. Pathol. 230, 148–153
(2013).
138.Palles, C. et al. Germline mutations affecting the
proofreading domains of POLE and POLD1
predispose to colorectal adenomas and carcinomas.
Nature Genet. 45, 136–144 (2013).
This paper describes germline mutations in the
exonuclease domains of POLE and POLD1 as
predisposing to familial colorectal and endometrial
cancers, probably causing a class of microsatellitestable hypermutated tumours.
NATURE REVIEWS | CANCER
139.Valle, L. et al. New insights into POLE and POLD1
germline mutations in familial colorectal cancer and
polyposis. Hum. Mol. Genet. 23, 3506–3512
(2014).
140.Zou, Y. et al. Frequent POLE1 p. S297F mutation in
Chinese patients with ovarian endometrioid
carcinoma. Mutat. Res. 761, 49–52 (2014).
141.Nick McElhinny, S. A., Gordenin, D. A., Stith, C. M.,
Burgers, P. M. & Kunkel, T. A. Division of labor at the
eukaryotic replication fork. Mol. Cell 30, 137–144
(2008).
142.Cancer Genome Atlas Research Network. Integrated
genomic characterization of endometrial carcinoma.
Nature 497, 67–73 (2013).
143.Donehower, L. A. et al. MLH1‑silenced and nonsilenced subgroups of hypermutated colorectal
carcinomas have distinct mutational landscapes.
J. Pathol. 229, 99–110 (2013).
144.Matsuda, T., Bebenek, K., Masutani, C., Hanaoka, F.
& Kunkel, T. A. Low fidelity DNA synthesis by human
DNA polymerase-ε. Nature 404, 1011–1013
(2000).
145.Machida, K. et al. Hepatitis C virus induces a mutator
phenotype: enhanced mutations of immunoglobulin
and protooncogenes. Proc. Natl Acad. Sci. USA 101,
4262–4267 (2004).
146.Totoki, Y. et al. High-resolution characterization of a
hepatocellular carcinoma genome. Nature Genet. 43,
464–469 (2011).
147.Sotillo, R. et al. Mad2 overexpression promotes
aneuploidy and tumorigenesis in mice. Cancer Cell 11,
9–23 (2007).
148.Carter, S. L. et al. Absolute quantification of somatic
DNA alterations in human cancer. Nature Biotech. 30,
413–421 (2012).
This paper describes an algorithm for estimating
whether given cancer mutations are clonal or
subclonal based on allelic fractions, tumour purity
and regional ploidy.
149.Davoli, T. et al. Cumulative haploinsufficiency and
triplosensitivity drive aneuploidy patterns and
shape the cancer genome. Cell 155, 948–962
(2013).
150.Ciriello, G. et al. Emerging landscape of oncogenic
signatures across human cancers. Nature Genet. 45,
1127–1133 (2013).
151.Stephens, P. J. et al. Massive genomic
rearrangement acquired in a single catastrophic
event during cancer development. Cell 144, 27–40
(2011).
This paper describes mass breakage of
individual chromosomes and subsequent repair
that results in multiple, regionally isolated
chromosome rearrangements, known as
chromothripsis.
152.Liu, P. et al. Chromosome catastrophes involve
replication mechanisms generating complex
genomic rearrangements. Cell 146, 889–903
(2011).
153.Lee, R. S. et al. A remarkably simple genome underlies
highly malignant pediatric rhabdoid cancers. J. Clin.
Invest. 122, 2983–2988 (2012).
This paper describes the exome sequencing of
paediatric rhabdoid cancers, showing that they
have a genome with remarkably few genetic
alterations and only one recurrent mutation
occurring in SMARCB1.
154.Parsons, D. W. et al. The genetic landscape of the
childhood cancer medulloblastoma. Science 331,
435–439 (2011).
This paper describes the sequencing of the
protein-coding regions of medulloblastomas,
determining that these tumours have a low
number of mutations, with the primary recurrent
mutations occurring in MLL2 and MLL3. This
suggests that epigenetic alteration has a role in
pathogenesis.
155.Crompton, B. D. et al. The genomic landscape of
pediatric ewing sarcoma. Cancer Discov.
http://dx.doi.org/10.1158/2159-8290.CD-13-1037
(2014).
156.Lawrence, M. S. et al. Discovery and saturation
analysis of cancer genes across 21 tumour types.
Nature 505, 495–501 (2014).
This paper analysed mutations occurring in 4,072
human tumours to find likely cancer drivers as
recurrently mutated genes. This analysis identified
33 new cancer genes and estimated that between
600 to 5,000 samples need to be sequenced to
identify all of the cancer genes in a given cancer
type.
VOLUME 14 | DECEMBER 2014 | 799
© 2014 Macmillan Publishers Limited. All rights reserved
REVIEWS
157.Papaemmanuil, E. et al. RAG-mediated recombination
is the predominant driver of oncogenic rearrangement
in ETV6‑RUNX1 acute lymphoblastic leukemia. Nature
Genet. 46, 116–125 (2014).
158.Robbiani, D. F. et al. AID is required for the
chromosomal breaks in c‑myc that lead to
c‑myc/IgH translocations. Cell 135, 1028–1038
(2008).
159.Dorsett, Y. et al. A role for AID in chromosome
translocations between c‑myc and the IgH variable
region. J. Exp. Med. 204, 2225–2232 (2007).
160.Burrell, R. A. & Swanton, C. The evolution of the
unstable cancer genome. Curr. Opin. Genet. Dev. 24,
61–67 (2013).
161.Henderson, S., Chakravarthy, A., Su, X.,
Boshoff, C. & Fenton, T. R. APOBEC-Mediated
Cytosine Deamination Links PIK3CA Helical
Domain Mutations to Human Papillomavirus-Driven
Tumor Development. Cell Rep. 7, 1833–1841
(2014).
This paper determined that human papillomavirus
(HPV)-positive head and neck cancers have a higher
number of mutations occurring in DNA sequences
that are targeted by APOBEC cytidine deaminases,
resulting in higher incidence of PIK3CA helical
domain mutations in these tumours. The favouring
of the helical domain mutations, which reside in
APOBEC-target motifs, over kinase domain
mutation suggests that the APOBEC enzymes are
actively contributing to cancer progression in these
tumours.
162.Viros, A. et al. Ultraviolet radiation accelerates BRAFdriven melanomagenesis by targeting TP53. Nature
511, 478–482 (2014).
163.Douville, C. et al. CRAVAT: cancer-related analysis of
variants toolkit. Bioinformatics 29, 647–648
(2013).
164.Dulak, A. M. et al. Exome and whole-genome
sequencing of esophageal adenocarcinoma identifies
recurrent driver events and mutational complexity.
Nature Genet. 45, 478–486 (2013).
165.Jager, N. et al. Hypermutation of the inactive
X chromosome is a frequent event in cancer. Cell 155,
567–581 (2013).
166.Kumar, A. et al. Exome sequencing identifies a
spectrum of mutation frequencies in advanced and
lethal prostate cancers. Proc. Natl Acad. Sci. USA
108, 17087–17092 (2011).
800 | DECEMBER 2014 | VOLUME 14
167.Baca, S. C. et al. Punctuated evolution of
prostate cancer genomes. Cell 153, 666–677
(2013).
168.Poon, S., McPherson, J., Tan, P., Teh, B. &
Rozen, S. Mutation signatures of carcinogen
exposure: genome-wide detection and new
opportunities for cancer prevention. Genome Med.
6, 24 (2014).
169.Nik-Zainal, S. et al. The life history of 21 breast
cancers. Cell 149, 994–1007 (2012).
170.Ciccia, A. & Elledge, S. J. The DNA damage response:
making it safe to play with knives. Mol. Cell 40,
179–204 (2010).
Acknowledgements
The authors are grateful to D. Kwiatkowski, D. Zaykin and K.
Chan for advice on the manuscript. This research was supported by the Intramural Research Program of the National
Institutes of Health (NIH), National Institute of Environmental
Health Sciences (D.A.G. and S.A.R.) and by an NIH Pathway
to Independence Award K99ES022633‑01 (S.A.R.)
Competing interests statement
The authors declare no competing interests.
www.nature.com/reviews/cancer
© 2014 Macmillan Publishers Limited. All rights reserved