Carcinogenesis vol.19 no.4 pp.557–566, 1998
Spectrum of point mutations in the coding region of the
hypoxanthine-guanine phosphoribosyltransferase (hprt) gene in
human T-lymphocytes in vivo
Andrej Podlutsky¹, Anne-May Österholm¹, Sai-Mei Hou¹,
Andreas Hofmaier² and Bo Lambert¹,³
¹The Karolinska Institute, Department of Biosciences, CNT/Novum,
141 57 Huddinge, Sweden and ²BIBRA International, Woodmansterne
Road, Carshalton, Surrey, SM5 4DS, UK
³To whom correspondence should be addressed
Email: [email protected]
The hypoxanthine-guanine phosphoribosyl transferase (hprt) locus in 6-thioguanine (TG)-resistant T-lymphocytes is a useful target for the study of somatic in vivo mutagenesis, since it provides information about a broad spectrum of mutations. Mutations in the hprt coding region were studied in 124 TG-resistant T-cell clones from 38 healthy, non-smoking male donors from a previously studied population of bus maintenance workers, fine-mechanics and laboratory personnel. Their mean age was 43 years (range 23–64) and their hprt mutant frequency was 9.3 ± 5.2 × 10⁻⁶ (mean ± SD, range 1.4–22.6 × 10⁻⁶). Sequence analysis of hprt cDNA identified 115 unique mutations; 76% were simple base substitutions, 10% were ±1 bp frameshifts, and 10% were small deletions within exons (3–52 bp). In addition, two tandem base substitutions and one complex mutation were observed. Simple base substitutions were observed at 55 (20%) of 281 sites known to be mutable in the hprt coding sequence. The distribution of these mutations differed significantly from that expected under a Poisson distribution (P < 0.0001), suggesting the existence of 'hotspots'. All 87 simple base substitutions occurred at known mutable sites, but eight were substitutions of a kind that has not previously been reported at these sites. The most frequently mutated sites were cDNA positions 197 and 146, with six and five independent mutations, respectively. Four mutations were observed at position 131, and three each at positions 143, 208, 508 and 617. Transitions (52%) were slightly more frequent than transversions (48%), and mutations at GC base pairs (56%) were more common than mutations at AT base pairs (44%). GC→AT was the most common type of base pair substitution (37%). The majority of the mutations at GC base pairs (78%) occurred at sites with G in the non-transcribed strand. All but one of the eight mutations at CpG sites were of the kind expected from deamination of methylated cytosine. Deletion of a single base pair (–1 frameshift) was three times more frequent than insertion of a single base pair (+1 frameshift). Almost half (6/13) of the small (3–52 bp) deletions within the coding sequence clustered in the 5′ end of exon 2. Short repeats and other sequence motifs that have been associated with replication error were found in the flanking regions of most of the frameshifts and small deletions. However, several differences in the local sequence context between ±1 frameshift and deletion mutations were also noticed. The present results identify positions 197, 146 and possibly 131 as hotspots for base substitution mutations, and confirm previously reported hotspots at positions 197, 508 and 617. In addition, the earlier notion of a deletion hotspot in the 5′ end of exon 2 was confirmed. The observation of these mutational cluster regions in different human populations suggests that they are due to endogenous mechanisms of mutagenesis or to ubiquitous environmental influences. The emerging background spectrum of somatic in vivo mutation in the human hprt gene provides a useful basis for comparisons with radiation- or chemically induced mutational spectra, as well as with gene mutations in human tumors.

*Abbreviations: hprt, hypoxanthine-guanine phosphoribosyl transferase; LNS, Lesch-Nyhan syndrome; TG, 6-thioguanine; GSTM1, glutathione transferase; NAT2, N-acetyltransferase; TCR, T-cell receptor; RFLP, restriction fragment length polymorphism; RT-PCR, reverse transcription-polymerase chain reaction.
© Oxford University Press
Introduction
Knowledge about the mechanisms of mutagenesis in human
somatic cells in vivo is important for the understanding of
cancer and other diseases. In vivo mutations are also of
potential use as biomarkers of genetic effects of occupational
and other environmental exposures (1,2). The study of gene
specific mutational spectra has provided valuable information
about mechanisms of mutagenesis in human germ cells
(reviewed in 3) and tumors (4,5). However, there is still limited
information on the mechanisms of mutagenesis in human
somatic cells in vivo, and the relationship between mutations
arising in normal cells and mutations detected in tumors. The
background spectrum of somatic mutation in healthy people
in vivo is likely to be the result of many different causes,
including spontaneous events as well as long-term exposure
to environmental mutagens, and is probably modified by non-mutagenic lifestyle factors and endogenous host factors
affecting metabolism, DNA damage formation and repair. Few
endogenous and environmental causes of human in vivo
mutation have been identified, and it is still largely unknown
to what extent individuals differ in susceptibility toward
environmental mutagens. In addition to the elucidation of
molecular mechanisms of mutagenesis, the analysis of background mutation in people may reveal important in vivo
mutagens as well as constitutional susceptibility factors, and
give clues to the early steps of genetic change in cancer
development.
The usefulness of the X-linked hypoxanthine-guanine phosphoribosyl transferase (hprt*) locus in T-lymphocytes for
the analysis of gene-specific mutations in vivo has been
demonstrated in several studies of healthy people and patients
receiving chemotherapy and other treatments (reviewed in 6).
The individual frequency of mutant T-lymphocytes can be
determined by cloning in medium containing 6-thioguanine
(TG) (7,8). The molecular nature of the hprt mutations in T-cells can be studied using standard PCR and DNA-sequencing
methods. The crystal structure of the human hprt protein has
been determined (9), and the complete 55 kb nucleotide
sequence of the human hprt gene is known (10). Thus, almost
all different kinds of hprt mutation can be detected, and the
frequency distribution of different types of alterations in hprt
DNA (mutation spectrum) can be studied. Chemical-specific
mutation spectra at the hprt locus can be studied in T-cells
in vitro (11–13) and compared with mutations arising in T-cells in vivo (14), as well as with germ line mutations in
patients with Lesch-Nyhan syndrome (LNS) and hprt-related
gout (reviewed in 15). However, since hprt is an X-linked
gene with only one functional copy in somatic cells, this locus
is not suitable for studies of mechanisms involving both alleles,
e.g. gene conversion and recombination.
Human hprt mutations have been compiled in a comprehensive database (16–18), which, however, contains relatively
few in vivo mutations representing true somatic background
mutations. Preferably, such mutations should be studied in
well-characterized populations of healthy, non-smoking individuals of both sexes. The first extensive study of this kind,
including 217 hprt mutations in T-lymphocyte clones from
172 male and female smokers and non-smokers, was recently
published by Burkhart-Schultz et al. (14). However, considering the relatively large target and diverse nature of genetic
alterations at the hprt locus (19), many more mutations have
to be identified to establish a complete somatic in vivo spectrum,
and to disclose any characteristic features of background and
induced mutations in healthy people.
We have identified 115 point mutations and small deletions
in the hprt coding region of T-lymphocytes from 38 healthy,
non-smoking males employed as bus maintenance workers,
fine-mechanics or laboratory personnel. This study population
has been characterized previously with regard to hprt mutant
frequency and aromatic DNA adduct levels in peripheral blood
lymphocytes (20,21). The spectrum of mutation, including several tentative base substitution hotspots, shows striking similarities with the spectrum published by Burkhart-Schultz et al. (14). Furthermore, our present results support the previous observation of a cluster of small deletion breakpoints in hprt exon 2 (22). The recurrence of mutations at specific positions in cells from individuals of widely separated study populations suggests that these are true, independent background mutations, probably caused by endogenous mechanisms of mutagenesis or ubiquitous environmental influences. This identification of hotspots for somatic in vivo mutation provides a basis for interesting comparisons with mutational spectra in tumor cells and in germ line genes causing heritable diseases.
Materials and methods
Study population and isolation of mutant T-cell clones
Hprt mutant T-cell clones were obtained by direct TG selection of peripheral
blood lymphocytes from healthy, non-smoking males who were either bus
maintenance workers exposed to diesel exhausts, or non-exposed fine-mechanics and laboratory personnel. This study population has been characterized
earlier with regard to genotypes for glutathione transferase (GSTM1) and N-acetyltransferase (NAT2), as well as hprt mutant frequency and aromatic DNA
adduct levels in peripheral lymphocytes, and the results have been published
(20,21). The exposed workers showed increased levels of aromatic DNA
adducts, but the hprt mutant frequency was not different from that of the
unexposed group. A weak but significant correlation between adduct levels
and hprt mutant frequency was observed when data from both groups were
combined (20).
Initially, a total of 462 directly selected TG-resistant T-cell clones were
available from 43 individuals (29 exposed and 14 non-exposed) (23). In the
preliminary screening for mutations by multiplex and RT-PCR, cDNA was
obtained from 323 clones. The reasons for the failure to obtain RT-PCR
products from one third of the clones have been discussed and ascribed
to missing or unstable transcripts, poor quality of the cell sample, or
methodological difficulties (23). The RT-PCR positive mutants were classified
by PAGE-analysis as splice mutants (74 mutants with cDNA of abnormal
length), coding errors (241 mutants with cDNA of normal length) or genomic
deletions (8 mutants with abnormal multiplex PCR product). Four mutants
were identified as genomic deletions by multiplex PCR and yielded no PCR product (23).
After 2 years of freeze-storage, repeated RT-PCR and successful sequence
analysis of cDNA were carried out on 183 of the 315 clones with splicing
and coding errors. Thus, mutations were identified in 58% (183/315) of the
mutants under these circumstances.
Sequencing of the cDNA revealed 59 splicing mutations, which will be
reported separately (A.-M.Österholm et al., in preparation). The remaining
base substitutions and small deletions within the exon sequences in a total of
124 T-cell clones from 38 individuals are reported here.
Reverse transcription-polymerase chain reaction (RT-PCR)
Pellets of 6000 cells were thawed and hprt cDNA was synthesised essentially
according to Yang et al. (24), with primers and modifications as described
in Österholm et al. (23). Cells were incubated for 1 h at 37°C in 20 µl of cDNA cocktail
(50 mM Tris–HCl, pH 8.5; 75 mM KCl; 3 mM MgCl2; 2.5% NP-40; 10 mM
DTT) containing 500 µM of each dNTP (Pharmacia, Uppsala, Sweden), 1.6 µM of
reverse primer 2, 1 U/µl RNasin (Pharmacia) and 2.5 U/µl M-MLV reverse
transcriptase (Promega, Madison, WI). Five µl of the cDNA
reaction mixture was used in PCR with 0.16 µM of forward primer 1 and
0.08 µM of reverse primer 2, 200 µM of each dNTP, PCR buffer (15 mM
Tris–HCl, pH 8.5; 60 mM KCl; 1.5 mM MgCl2) and 1 U of Taq polymerase
(Promega). After initial denaturation for 4 min at 94°C, 30 cycles were run
at 94°C for 30 s, 50°C for 30 s, 72°C for 1 min, followed by 7 min
polymerisation at 72°C. Nested PCR was performed with 2–10% of the first
PCR product and 0.2 µM each of forward primer 3 and reverse primer 4
(biotinylated), using the same reaction conditions as for the first PCR. The
PCR was performed in DNA-Engine™ (MJ Research, Watertown, MA, PTC200) or GeneAmp 2400® (Perkin Elmer, Foster City, CA) thermal cyclers.
Ten percent (5 µl) of the reaction product was visualised on a 3.75%
polyacrylamide gel.
Direct sequencing
Biotinylated PCR product (45 µl) was immobilised on streptavidin-coated
magnetic beads (Dynal, Oslo, Norway). DNA strands were separated in alkali
(1 M NaOH, 0.075% Tween 20) with the help of a magnetic concentrator
(Dynal). The non-biotinylated DNA strand was precipitated with 1/10 volume of
3 M sodium acetate, pH 5.2 and 2 vol of ice-cold 95% ethanol, washed twice
with 70% ethanol, vacuum dried and dissolved in water. Sequencing reactions
used the PRISM Sequenase® terminator single-stranded DNA sequencing kit
(Applied Biosystems, Foster City, CA), and 0.2 µM of one of the following
primers:
Forward primer 5: (124) 5′-ATTATGGACAGGACTGAA-3′ (141)
Forward primer 6: (166) 5′-GAGATGGGAGGCCATCACAT-3′ (185)
Reverse primer 7: (302) 5′-CTGATAAAATCTACAGTCAT-3′ (283)
Reverse primer 9: (373) 5′-AAGTTGAGAGATCTTCTCCAC-3′ (353)
The reactions were run on a 373A or 377A Automated Sequencer (Applied
Biosystems). Most of the clones were sequenced in a two-step procedure.
First, one of the two forward primers was used, giving a readable sequence
from cDNA position 200 to 670, i.e. about two thirds of the target sequence.
If no mutation was found in this region of the cDNA, the sequencing was
completed with one of the reverse primers. In 41 mutants (indicated in Table
III), the whole coding region was sequenced. With one exception (discussed
in the text) all clones showed one mutation only.
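The primer list gives each oligonucleotide as (5′ cDNA coordinate) 5′-sequence-3′ (3′ cDNA coordinate), so forward primers run up the cDNA numbering and reverse primers run down it. As an illustrative sketch only (the dictionary layout and function names are hypothetical, not part of the original protocol), the following checks that each primer's length matches its coordinate span and recovers the coding-strand base a primer covers at a given position, complementing the reverse primers.

```python
# Sketch: sanity-check the sequencing primers listed in the Methods.
# cDNA coordinates: the A of the ATG start codon is position 1.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# name -> (5' coordinate, sequence written 5'->3', 3' coordinate); names are illustrative
PRIMERS = {
    "forward 5": (124, "ATTATGGACAGGACTGAA", 141),
    "forward 6": (166, "GAGATGGGAGGCCATCACAT", 185),
    "reverse 7": (302, "CTGATAAAATCTACAGTCAT", 283),
    "reverse 9": (373, "AAGTTGAGAGATCTTCTCCAC", 353),
}


def span_matches(name):
    """True if the primer length equals the number of cDNA positions it spans."""
    start, seq, end = PRIMERS[name]
    return len(seq) == abs(end - start) + 1


def coding_base_at(name, pos):
    """Coding-strand base implied by the primer at cDNA position `pos`."""
    start, seq, end = PRIMERS[name]
    if not min(start, end) <= pos <= max(start, end):
        raise ValueError(f"position {pos} is not covered by primer {name}")
    if end > start:
        # Forward primer: same sequence as the coding (non-transcribed) strand.
        return seq[pos - start]
    # Reverse primer: complementary strand, so count from its 5' end and complement.
    return seq[start - pos].translate(COMPLEMENT)


if __name__ == "__main__":
    for name in PRIMERS:
        print(name, "length matches span:", span_matches(name))
    print("cDNA 131 =", coding_base_at("forward 5", 131))
```

Positions 131 and 170 (forward primers), 299 (reverse primer 7) and 368 (reverse primer 9) give A, T, T and C, which agree with the target sequences shown for those positions in Tables III and VIII.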
Analysis of TCR-pattern for clonal identity
Clones from the same donor that were found to have identical mutations
were studied with regard to T-cell receptor (TCR) γ-gene rearrangement
essentially as described by de Boer et al. (25). Clonal cell lysates were used
in a two step, nested PCR reaction with primers originally described by
Bourguin et al. (26), and the nested PCR-product was subjected to restriction
fragment length polymorphism (RFLP) analysis as described by Bastlova and
Podlutsky (13).
Table I. Summary of the study population, mutant clones and mutations
Group (no. of individuals) | Mean age (range) | Mutant frequency (range)ᵃ | No. of clones | No. of mutationsᵇ
Bus maintenance workers (26) | 45 (28–64) | 9.6 ± 5.3 (3.2–22.6) | 70 | 67
Laboratory personnel & fine mechanics (12) | 37 (23–54) | 8.3 ± 4.7 (1.4–18.0) | 54 | 48
All workers (38) | 43 (23–64) | 9.3 ± 5.2 (1.4–22.6) | 124 | 115
ᵃMean ± SD × 10⁻⁶. Data from (21).
ᵇUnique mutations within the coding region.

Statistical analysis
The Goodness of Fit Test (27) was used to study the probability that the observed simple base substitutions (listed in Table III) were randomly distributed in the hprt coding sequence (the null hypothesis). The underlying assumptions were that (i) all observed mutations are independent and (ii) there are 300 mutable sites in the hprt coding sequence (see text for explanation and references). The expected frequency of sites with 1, 2, 3, etc. mutations was calculated using the Poisson distribution and compared with the observed frequency distribution. Differences between expected and observed frequencies were tested for statistical significance (P < 0.05) using the chi-square test.
The probability that two or more base substitutions in a set of random mutations would occur at the same site was calculated using the Poisson distribution. A Bonferroni correction (28) was applied to take into account the 300 assumed mutable sites; this provides a conservative method of identifying probable hotspots. For a set of 87 simple base substitutions, as in the present work, the probability, after Bonferroni correction, of observing four or more, five or more and six or more mutations at any single site is 0.07, 0.004 and 0.0002, respectively. These probabilities were used to define clusters of five or six mutations as indicating hotspots in the present data set.
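The expected site counts and the Bonferroni-corrected cluster probabilities described above can be sketched in standard-library Python. This is a minimal sketch under the stated assumptions (300 mutable sites, 87 independent substitutions, so μ = 87/300 = 0.29); the Bonferroni bound is taken here as 300 × P(X ≥ k), capped at 1, and the function names are hypothetical. It reproduces the 0.07, 0.004 and 0.0002 thresholds and the 224.5 and 65.1 expected site counts quoted in the Results; the text does not state how the multi-hit classes were pooled for the chi-square test, so the sketch stops at expected counts.

```python
import math
from typing import Dict

N_SITES = 300                    # assumed number of mutable sites in the coding region
N_SUBSTITUTIONS = 87             # simple base substitutions in this data set
MU = N_SUBSTITUTIONS / N_SITES   # mean number of substitutions per site (0.29)


def poisson_pmf(k: int, mu: float = MU) -> float:
    """P(X = k) for a Poisson-distributed count with mean mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)


def poisson_tail(k: int, mu: float = MU, kmax: int = 60) -> float:
    """P(X >= k), summed upward term by term (avoids cancellation in 1 - CDF)."""
    return sum(poisson_pmf(j, mu) for j in range(k, kmax + 1))


def expected_site_counts() -> Dict[str, float]:
    """Expected numbers of sites carrying 0, 1 and >= 2 substitutions under H0."""
    return {
        "0": N_SITES * poisson_pmf(0),
        "1": N_SITES * poisson_pmf(1),
        ">=2": N_SITES * poisson_tail(2),
    }


def bonferroni_cluster_p(k: int) -> float:
    """Bonferroni bound on the chance that any of N_SITES sites shows >= k hits."""
    return min(1.0, N_SITES * poisson_tail(k))


if __name__ == "__main__":
    print({label: round(v, 1) for label, v in expected_site_counts().items()})
    for k in (4, 5, 6):
        print(f">= {k} mutations at one site: P <= {bonferroni_cluster_p(k):.4f}")
```

Summing the tail upward rather than computing 1 − CDF keeps the small probabilities for k = 5 and 6 accurate.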
Results
Point mutations and small deletions in the hprt coding region
were identified in mutant T-cell clones from 38 individuals
belonging to a previously studied group of workers exposed
to diesel exhaust and a non-exposed control group (20,23).
All subjects were non-smoking, healthy males. Our previous
results showed increased levels of aromatic DNA adducts in
the exposed group, but no significant difference between the
groups with regard to hprt mutant frequency. The distributions
of age and mutant frequency in the present study group
(Table I) were similar to the results for the entire study
population (20,23).
A total of 124 mutant clones were studied. The number of
clones per individual ranged from one to eight, with a mean
and median of three. Sibling clones (i.e. clones with identical
mutations and TCR-rearrangements) were observed in three
subjects (three doublets), and non-sibling clones (i.e. clones
with the same mutation but different TCR-rearrangements) were
observed in five subjects (one triplet and four doublets)
(indicated in Table III). Considering the uncertainty with regard
to the origin of the mutation in these clones (i.e. a single event
in a cell prior to TCR-rearrangement or separate events in
clones with different TCR-rearrangements) and to avoid bias
in the frequency distribution of mutations, only one mutation
of each kind from each individual was regarded as a unique
mutation.
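The rule for counting unique mutations can be made explicit as a grouping step. In this sketch the `Clone` record and its `tcr` field (a label for the TCR-γ RFLP pattern) are hypothetical encodings; the example clones and mutations are taken from Table III, but the TCR labels are invented for illustration. Groups in which the clones do not all share one TCR pattern are labelled non-sibling, which is an assumption about how mixed groups would be treated.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Clone:
    donor: str      # donor code, e.g. "K-34"
    clone: str      # clone number within that donor
    mutation: str   # e.g. "47 G→A"
    tcr: str        # TCR-gamma RFLP pattern label (hypothetical encoding)


def group_identical(clones):
    """Group clones from the same donor that carry the same mutation."""
    groups = defaultdict(list)
    for c in clones:
        groups[(c.donor, c.mutation)].append(c)
    return dict(groups)


def relation(group):
    """'single', 'sibling' (one shared TCR pattern) or 'non-sibling' (patterns differ)."""
    if len(group) == 1:
        return "single"
    return "sibling" if len({c.tcr for c in group}) == 1 else "non-sibling"


# Mutations from Table III; the TCR labels p1..p4 are invented for the example.
EXAMPLE = [
    Clone("K-34", "8", "47 G→A", "p1"),
    Clone("K-34", "10", "47 G→A", "p1"),
    Clone("K-26", "14", "197 G→A", "p2"),
    Clone("K-26", "18", "197 G→A", "p3"),
    Clone("K-26", "16", "146 T→C", "p4"),
]

if __name__ == "__main__":
    groups = group_identical(EXAMPLE)
    for (donor, mutation), members in groups.items():
        print(donor, mutation, relation(members))
    print("unique mutations:", len(groups))  # one per (donor, mutation) pair
```

Each (donor, mutation) key counts once whatever the TCR relationship, which mirrors the rule stated above of keeping one mutation of each kind per individual.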
In one clone, K-36-3 (Table III), two mutations were identified. One of these occurred at position 135, the first base of exon 3, where G→C as well as G→T substitutions have been observed previously in the hprt mutation database (16; here and below in the text and tables, this reference relates to database release six, August 1996). The other mutation in this clone, a G→C transversion at position 176, is the first one to be reported at this site. The predicted amino acid substitution caused by this mutation is Gly58→Ala, a change from a polar to a non-polar residue. Nevertheless, until other mutations are reported to occur at this site, this base substitution is regarded as a silent mutation and is therefore not included among the unique mutations. Thus, a total of 115 unique mutations were identified: 89 (77%) base substitutions and 26 (23%) frameshifts and small deletions (Table II). The nature and sequence context of each mutation are shown in Tables III and VIII (V-clones derive from bus maintenance workers, and K-clones from fine-mechanics and laboratory personnel).

Table II. Types and frequencies of mutations in the coding region of the hprt gene
Type of mutation | No. | (%)
Base substitution | |
  Simple | 87 | (76)
  Tandem | 2 | (2)
Frameshift | |
  +1 bp | 3 | (3)
  –1 bp | 9 | (8)
  –2 bp | 1 | (1)
Deletion, 3–52 bp | 12 | (10)
Complex | 1 | (1)
Total | 115 | (100)
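The residue numbers in Table III (for example Gly58 for position 176) appear to count the initiator methionine as residue 0; this convention is inferred from the table rather than stated in the text. Under that assumption, a cDNA position maps to a residue number and a base within the codon as in this small sketch (the helper name is hypothetical).

```python
from typing import Tuple


def residue_of(pos: int) -> Tuple[int, int]:
    """Map a cDNA position (A of ATG = 1) to (residue number, base within codon 1-3).

    Assumes the numbering inferred from Table III, in which the initiator
    methionine is residue 0 (so position 29 falls in Ile9 and 617 in Cys205).
    """
    if pos < 1:
        raise ValueError("cDNA positions start at 1")
    return (pos - 1) // 3, (pos - 1) % 3 + 1


if __name__ == "__main__":
    for pos in (3, 29, 176, 508, 617):
        residue, codon_base = residue_of(pos)
        print(f"cDNA {pos}: residue {residue}, codon position {codon_base}")
```

Position 176 comes out as the second base of codon 58. With the GGC codon implied by the Table III context around position 173, a G→C change there gives GCC (Ala), consistent with the Gly58→Ala assignment.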
Among the base substitutions, 87 were simple and two were tandem mutations (Table III). Both tandem mutations were of a kind (CC→TT or GG→AA) that has been observed previously at a low frequency (1–2% of base substitutions) at other positions in the hprt gene. Of the 89 base substitutions, one altered the translation initiation codon, 78 (88%) were missense and 10 (11%) were nonsense mutations. Two mutations occurred at the extreme ends of exons: a G→C transversion of the first base of exon 3 (K-36-3) and a T→G transversion of the last base of exon 7 (K-37-1). In both cases, the mutation would be expected to cause little if any attenuation of the corresponding splice site sequence, and no exon skipping or intron inclusion was observed in the cDNA of these mutants.
All of the base substitutions occurred at sites where mutations have been reported previously in the hprt mutation database (16). Eight new mutations were observed at sites where other mutations have been reported earlier (Table IV). One of these is a nonsense mutation (A→T) at position 307. With the addition of these new mutations to those already in the hprt mutation database, a total of 439 different simple base substitutions have been recorded at 281 sites in the hprt coding region.
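Whether a base substitution is silent, missense or nonsense follows from translating the reference and mutant codons with the standard genetic code. This sketch builds the codon table from the usual TCAG ordering; the function name is hypothetical, and the example codons are read from the Table III target sequences (TAT→TAA for Tyr71→TERM at position 216, and the tandem GG→AA at positions 399–400, which changes codon 132 from GTG to GTA and codon 133 from GAA to AAA).

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_change(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change as 'silent', 'missense' or 'nonsense'."""
    ref = CODON_TABLE[ref_codon.upper()]
    alt = CODON_TABLE[alt_codon.upper()]
    if ref == alt:
        return "silent"
    if alt == "*":
        return "nonsense"
    return "missense"


if __name__ == "__main__":
    examples = [("TAT", "TAA"), ("GGC", "GCC"), ("GTG", "GTA"), ("GAA", "AAA")]
    for ref, alt in examples:
        print(ref, "->", alt, classify_change(ref, alt))
```

Changes at the ATG start codon are reported separately in the text and are not treated specially by this sketch.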
The distribution of base substitutions among the nine exons
of the hprt coding region was roughly proportional to the
distribution of mutable sites (Table V). In agreement with
earlier observations (19), mutations were relatively common
in the 5′ half of exon 3 (positions 143–220), and there was a
relative paucity of mutations in exon 1, the 3′ half of exon 3
(positions 221–318), exon 4 and exon 5 (Table V and Figure
1). These results indicate an overall consistency with regard
to the frequency distributions of mutations in the present data
set and the hprt mutation database.
There was a non-uniform distribution of the 87 simple base
substitutions among the 55 positions in the coding region
(Figure 1). Different mutations at the same site were observed at seven positions (Table VI).

Table III. Simple and tandem base substitutions in the hprt coding sequence
Mutant ID | cDNA positionᵃ | Exon | Base changeᵇ | Target sequenceᵇ | Amino acid change
V-143-12ᶜ | 3 | 1 | G→C | ttAT G GCGA | Initiation codon
K-29-13ᶜ | 29 | 2 | T→G | GTGA T TAGT | Ile9→Ser
K-32-3ᶜ | 47 | 2 | G→A | CCAG G TTAT | Gly15→Asp
K-34-8/10ᶜ,ᵉ | 47 | 2 | G→A | CCAG G TTAT | Gly15→Asp
V-136-7ᶜ | 64 | 2 | T→G | TTTA T TTTG | Phe21→Val
V-140-9 | 95 | 2 | T→C | GATT T GGAA | Leu31→Ser
V-131-6ᶜ | 109 | 2 | A→T | GTTT A TTCC | Ile36→Phe
K-33-9ᶜ | 110 | 2 | T→G | TTTA T TCCT | Ile36→Ser
V-143-8ᶜ | 119 | 2 | G→A | CATG G ACTA | Gly39→Glu
V-150-13ᶜ | 119 | 2 | G→A | CATG G ACTA | Gly39→Glu
V-131-5ᶜ | 122 | 2 | T→C | GGAC T AATT | Leu40→Pro
K-25-5 | 131 | 2 | A→G | ATGG A CAGG | Asp43→Gly
V-119-8 | 131 | 2 | A→G | ATGG A CAGG | Asp43→Gly
V-127-6ᶜ | 131 | 2 | A→T | ATGG A CAGG | Asp43→Val
V-141-7ᶜ | 131 | 2 | A→T | ATGG A CAGG | Asp43→Val
K-36-3ᶜ,ᶠ | 135 | 3 | G→C | gtag G ACTG | Arg44→Ser
V-127-8ᶜ | 143 | 3 | G→A | GAAC G TCTT | Arg47→His
V-129-2ᶜ | 143 | 3 | G→A | GAAC G TCTT | Arg47→His
V-144-4ᶜ | 143 | 3 | G→T | GAAC G TCTT | Arg47→Leu
K-25-13ᶜ | 146 | 3 | T→C | CGTC T TGCT | Leu48→Pro
K-26-16ᶜ | 146 | 3 | T→C | CGTC T TGCT | Leu48→Pro
V-123-6ᶜ | 146 | 3 | T→C | CGTC T TGCT | Leu48→Pro
V-131-3ᶜ | 146 | 3 | T→C | CGTC T TGCT | Leu48→Pro
V-140-5/15ᶜ,ᵉ | 146 | 3 | T→C | CGTC T TGCT | Leu48→Pro
V-133-14 | 151 | 3 | C→T | TGCT C GAGA | Arg50→TERM
V-141-15ᶜ | 151 | 3 | C→T | TGCT C GAGA | Arg50→TERM
K-25-2ᶜ | 158 | 3 | T→G | GATG T GATG | Val52→Gly
K-34-2 | 170 | 3 | T→A | GAGA T GGGA | Met56→Lys
V-148-5ᶜ | 173 | 3 | G→A | ATGG G AGGC | Gly57→Glu
K-33-6 | 194 | 3 | T→A | GCCC T CTGT | Leu64→His
K-26-14/18ᶜ,ᵈ | 197 | 3 | G→A | CTCT G TGTG | Cys65→Tyr
K-28-6ᶜ | 197 | 3 | G→A | CTCT G TGTG | Cys65→Tyr
K-29-2ᶜ | 197 | 3 | G→T | CTCT G TGTG | Cys65→Phe
K-37-8ᶜ | 197 | 3 | G→T | CTCT G TGTG | Cys65→Phe
V-136-6ᶜ | 197 | 3 | G→A | CTCT G TGTG | Cys65→Tyr
V-150-4ᶜ | 197 | 3 | G→A | CTCT G TGTG | Cys65→Tyr
V-124-10ᶜ | 203 | 3 | T→G | GTGC T CAAG | Leu67→Arg
K-29-10 | 208 | 3 | G→A | CAAG G GGGG | Gly69→Arg
V-144-1ᶜ | 208 | 3 | G→C | CAAG G GGGG | Gly69→Arg
V-150-8ᶜ | 208 | 3 | G→A | CAAG G GGGG | Gly69→Trp
K-25-14ᶜ | 209 | 3 | G→A | AAGG G GGGC | Gly69→Glu
V-123-8ᶜ | 209 | 3 | G→A | AAGG G GGGC | Gly69→Glu
V-145-11 | 216 | 3 | T→A | GCTA T AAAT | Tyr71→TERM
V-130-4 | 220 | 3 | T→G | TAAA T TCTT | Phe73→Val
K-27-10 | 236 | 3 | T→C | CTGC T GGAT | Leu77→Pro
V-145-1 | 236 | 3 | T→C | CTGC T GGAT | Leu77→Pro
K-26-3 | 299 | 3 | T→G | TTTA T CAGA | Ile99→Ser
V-141-11 | 307 | 3 | A→T | ACTG A AGAG | Lys102→TERM
V-133-2 | 344 | 4 | A→T | ATAA A AGTA | Lys114→Ile
V-146-3 | 389 | 5 | T→A | AATG T CTTG | Val129→Asp
K-25-12ᶜ | 418 | 6 | G→C | CACT G GCAA | Gly139→Arg
K-27-8 | 418 | 6 | G→C | CACT G GCAA | Gly139→Arg
K-27-5ᶜ | 419 | 6 | G→T | ACTG G CAAA | Gly139→Val
V-141-13 | 424 | 6 | A→C | CAAA A CAAT | Thr141→Pro
V-139-4 | 428 | 6 | T→G | ACAA T GCAG | Met142→Arg
V-141-9 | 454 | 6 | C→T | CAGG C AGTA | Gln151→TERM
K-33-3/8ᵈ | 463 | 6 | C→T | TAAT C CAAA | Pro154→Ser
V-137-10/13ᵉ | 463 | 6 | C→T | TAAT C CAAA | Pro154→Ser
V-133-8 | 464 | 6 | C→T | AATC C AAAG | Pro154→Leu
V-129-6 | 475 | 6 | A→G | GGTC A AGGT | Lys158→Glu
V-124-6 | 508 | 7 | C→T | CCCA C GAAG | Arg169→TERM
V-130-2 | 508 | 7 | C→T | CCCA C GAAG | Arg169→TERM
V-137-3/11ᵈ | 508 | 7 | C→T | CCCA C GAAG | Arg169→TERM
K-29-12 | 527 | 7 | C→G | AAGC C AGAC | Pro175→Arg
K-36-1/6/8ᵈ | 529 | 7 | G→T | GCCA G ACTg | Asp176→Tyr
K-36-5 | 530 | 7 | A→T | CCAG A CTgt | Asp176→Val
V-132-7 | 530 | 7 | A→T | CCAG A CTgt | Asp176→Val
K-37-1ᶜ | 532 | 7 | T→G | AGAC T gtaa | Phe177→Ala
K-29-11 | 539 | 8 | G→A | GTTG G ATTT | Gly179→Glu
K-37-4ᶜ | 539 | 8 | G→A | GTTG G ATTT | Gly179→Glu
K-32-6 | 541 | 8 | T→G | TGGA T TTGA | Phe180→Val
K-34-4ᶜ | 568 | 8 | G→A | TGTA G GATA | Gly189→Arg
K-34-9 | 568 | 8 | G→T | TGTA G GATA | Gly189→TERM
V-145-7 | 569 | 8 | G→C | GTAG G ATAT | Gly189→Ala
K-33-10 | 581 | 8 | A→T | CTTG A CTAT | Asp193→Val
K-27-9 | 599 | 8 | G→A | TTCA G GGAT | Arg199→Lys
V-131-7 | 599 | 8 | G→C | TTCA G GGAT | Arg199→Thr
V-150-1 | 602 | 8 | A→T | AGGG A TTTG | Asp200→Gly
K-28-3 | 606 | 8 | G→C | ATTT G AATg | Leu201→Phe
V-119-6 | 606 | 8 | G→T | ATTT G AATg | Leu201→Phe
V-135-1 | 611 | 9 | A→G | tagC A TGTT | His203→Arg
V-137-1 | 612 | 9 | T→A | agCA T GTTT | His203→Gln
K-39-4 | 614 | 9 | T→G | CATG T TTGT | Val204→Gly
V-127-3 | 617 | 9 | G→A | GTTT G TGTC | Cys205→Tyr
V-132-4 | 617 | 9 | G→A | GTTT G TGTC | Cys205→Tyr
V-143-6 | 617 | 9 | G→A | GTTT G TGTC | Cys205→Tyr
V-141-3 | 648 | 9 | C→G | AATA C AAAG | Tyr215→TERM
K-29-3/6ᶜ,ᵈ | 112–113 | 2 | CC→TT | TATT CC TCAT | Pro37→Phe
V-143-4 | 399–400 | 5 | GG→AA | TTGT GG AAgt | Val132→Val, Glu133→Lys
ᵃThe A of the ATG start codon is number 1.
ᵇThe non-transcribed strand sequence is shown.
ᶜIndicates mutants in which the whole cDNA was sequenced.
ᵈNon-sibling clone(s) with same mutation but different TCR-pattern was observed in this donor.
ᵉSibling clone(s) with same mutation and TCR pattern was observed in this donor.
ᶠA second mutation, G→C at position 176, was observed in this clone, but regarded as a silent mutation (see text).

Table IV. New mutable sites and new mutations at previously mutated sites
Site | New mutationsᵃ: type (no.) | aa substitution | Previously observed mutations (no.)ᵇ
64 | T→G (1) | phe→val | T→A (1)
110 | T→G (1) | ile→ser | T→A (4), T→C (2)
158 | T→G (1) | val→gly | T→A (1), T→C (1)
194 | T→A (1) | leu→his | T→C (4), T→G (1)
307 | A→T (1) | lys→stop | A→G (1)
418 | G→C (2) | gly→arg | G→A (1)
532 | T→G (1) | phe→ala | T→C (2)
599 | G→C (1) | arg→thr | G→A (10), G→T (3)
ᵃThis work.
ᵇData from the human hprt mutation database (16).

The goodness of fit test was
calculated to compare the distribution of 87 observed mutations among the estimated 300 mutable sites (281 identified so far, see Discussion) within the hprt gene with an expected Poisson distribution with parameter μ = 0.29 (87/300). The underlying null hypothesis (H0) was that the observed frequencies of occurrence of base pair substitutions at a single site do not differ significantly from those expected from a random Poisson distribution. A significant test result indicates a lack of fit and may suggest the existence of hotspots within the observed mutable sites. The test statistic shows an overrepresentation of sites with two or more base pair substitutions (18 observed, 9.9 expected), an overrepresentation of sites showing no mutation (244 observed, 224.5 expected) and an under-representation of sites showing only a single base pair substitution (38 observed, 65.1 expected). The chi-square test statistic was highly significant (χ² = 105.8, P < 0.0001). Based on the observed frequencies of base pair substitutions at a single site and the lack of fit of this distribution to a random Poisson distribution, there is strong indication for the existence of mutational hotspots.

Fig. 1. Distribution of single base substitutions at the hprt locus in T-cells of healthy, non-smoking males. Data from Table III.

Table V. Distribution of mutable sites and mutations in the hprt coding sequence
Hprt exon (no. of bp) | A: Mutable sitesᵃ | B: Mutated sitesᵇ | C: No. of bp substitutionsᵇ | B/A | C/B
1 (27) | 9 | 1 | 1 | 0.11 | 1
2 (107) | 49 | 9 | 14 | 0.18 | 1.6
3 (184) | 76 | 17 | 33 | 0.22 | 1.9
4 (66) | 16 | 1 | 1 | 0.06 | 1
5 (18) | 10 | 1 | 1 | 0.10 | 1
6 (83) | 37 | 8 | 10 | 0.21 | 1.3
7 (47) | 23 | 5 | 8 | 0.21 | 1.6
8 (77) | 42 | 8 | 12 | 0.19 | 1.5
9 (45) | 19 | 5 | 7 | 0.26 | 1.4
Total (654) | 281 | 55 | 87 | 0.20 | 1.55
ᵃNo. of known mutable sites in the human hprt mutation database (16).
ᵇThis work, data from Table III.

Table VI. Sites with three or more mutations of the same kind, or two different mutations
Site | This work | The database | Rankᵃ
131 | A→G (2), A→T (2) | A→G (4) | 99
143 | G→A (2), G→T | G→A (4), G→C, G→T | 60
146 | T→C (5) | T→G (5), T→C | 59
197 | G→T (2), G→A (4 + 1ᵇ) | G→A (23), G→C (5), G→T (9) | 1
208 | G→A (2), G→C | G→A (24), G→C (4), G→T (6) | 2
508 | C→T (3 + 1ᵇ) | C→T (29) | 3
568 | G→A, G→T | G→A (13), G→C (4), G→T (4) | 7
599 | G→A, G→C | G→A (10), G→T (3) | 17
606 | G→T, G→C | G→C (3), G→T (7) | 25
617 | G→A (3) | G→A (9), G→T (7) | 13
ᵃIndicates rank order among the most frequently mutated sites in the human hprt mutation database (16).
ᵇNon-sibling mutant in the same individual.

Table VII. Simple base substitutions in the coding region
Type of mutationᵃ | This work No. (%) | Databaseᵇ No. (%) | All No. (%)
All mutations | 87 (100) | 87 (100) | 174 (100)
Transition (all) | 45 (52) | 45 (52) | 90 (52)
  G→A | 23 (26) | 22 (25) | 45 (26)
  C→T | 9 (10) | 13 (15) | 22 (13)
  A→G | 4 (5) | 6 (7) | 10 (6)
  T→C | 9 (10) | 4 (5) | 13 (7)
Transversion (all) | 42 (48) | 42 (48) | 84 (48)
  G→T | 7 (8) | 6 (7) | 13 (7)
  C→A | 0 | 2 (2) | 2 (1)
  G→C | 8 (9) | 10 (11) | 18 (10)
  C→G | 2 (2) | 5 (6) | 7 (4)
  A→T | 8 (9) | 3 (4) | 11 (6)
  T→A | 5 (6) | 8 (9) | 13 (7)
  A→C | 1 (1) | 0 | 1 (1)
  T→G | 11 (13) | 8 (9) | 19 (11)
Mutations at AT | 38 (44) | 29 (33) | 67 (39)
Mutations at GC | 49 (56) | 58 (67) | 107 (61)
ᵃBases listed are on the non-transcribed strand.
ᵇData from the human hprt mutation database (16), including non-smokers only and excluding mutations at splice sites.
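The groupings in Table VII (transition versus transversion, and GC versus AT base pair, with bases read from the non-transcribed strand) follow from one rule: a transition exchanges purine for purine or pyrimidine for pyrimidine. The sketch below, with hypothetical function names, applies that rule to the 'This work' counts copied from Table VII and recovers that column's subtotals.

```python
from collections import Counter
from typing import Dict, Tuple

PURINES = frozenset("AG")


def classify_substitution(change: str) -> Tuple[str, str]:
    """Return (transition|transversion, GC|AT) for a change written like 'G→A'.

    Bases are taken from the non-transcribed strand, as in Table VII.
    """
    ref, alt = change.split("→")
    kind = "transition" if (ref in PURINES) == (alt in PURINES) else "transversion"
    pair = "GC" if ref in "GC" else "AT"
    return kind, pair


# 'This work' column of Table VII (simple base substitutions).
THIS_WORK: Dict[str, int] = {
    "G→A": 23, "C→T": 9, "A→G": 4, "T→C": 9,
    "G→T": 7, "C→A": 0, "G→C": 8, "C→G": 2,
    "A→T": 8, "T→A": 5, "A→C": 1, "T→G": 11,
}


def tally(counts: Dict[str, int]) -> Counter:
    """Sum counts by substitution class and by mutated base pair."""
    totals = Counter()
    for change, n in counts.items():
        kind, pair = classify_substitution(change)
        totals[kind] += n
        totals[pair] += n
    return totals


if __name__ == "__main__":
    print(tally(THIS_WORK))
```

The same counts also give the share of GC-pair mutations with G on the non-transcribed strand: (23 + 7 + 8)/49, which rounds to the 78% quoted in the abstract.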
The probability that four or more, five or more and six or more mutations would occur by chance at the same site in the present set of 87 mutations was calculated, using a Poisson distribution followed by Bonferroni correction, to be 0.07, 0.004 and 0.0002, respectively (see Materials and methods). This suggests that position 146, with five mutations, and position 197, with six mutations, are hotspots, while position 131, with four mutations, is close to the criterion of P < 0.05 after Bonferroni correction (Figure 1). Positions 197, 508 and 617 were previously identified as in vivo mutation hotspots in a different study population (14). In the present data set, positions 508 and 617 showed three mutations each. Thus, in addition to identifying positions 197 and 146 as hotspots, our present results confirm the hotspots at positions 508 and 617, and indicate position 131 as a tentative hotspot for base substitution mutations in the hprt coding region of T-cells in vivo. Almost a quarter (21/87) of the base substitutions were found to occur at these five positions.
The hotspot position 508 is at a CpG site. In addition to the three mutations at position 508, five other mutations were found to occur at CpG sites: three at position 143 and two at position 151. All but one of the CpG mutations were of the kind expected from spontaneous deamination of methylated cytosine; seven were GC→AT and one was GC→TA (Table III). Together, these eight mutations represent 9% (8/87) of the simple base substitutions. There are seven positions in four CpG sites where mutations have been shown to occur in the hprt coding region (16). If the 87 simple base substitutions were randomly distributed among the 281 mutable sites, two would be expected to occur at a CpG site, which is significantly fewer than the eight mutations that were observed. This result, showing a relative excess of hprt mutations at CpG sites in T-cells in vivo, is in agreement with previous observations (19).
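Two pieces of the CpG argument can be sketched directly: deciding from a Table III target sequence whether the mutated base lies in a CpG dinucleotide, and the expected number of CpG hits when 87 substitutions are spread uniformly over 281 sites, seven of which are CpG positions. The text does not say which significance test was used; the binomial tail here is one reasonable choice, shown as an assumption, and the function names are hypothetical.

```python
from math import comb


def in_cpg(context: str, i: int) -> bool:
    """True if base i of a non-transcribed-strand context lies in a CpG dinucleotide."""
    s = context.upper()
    if s[i] == "C":
        return i + 1 < len(s) and s[i + 1] == "G"
    if s[i] == "G":
        return i > 0 and s[i - 1] == "C"
    return False


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


N_SUBS, N_SITES, N_CPG_POSITIONS = 87, 281, 7
expected_cpg_hits = N_SUBS * N_CPG_POSITIONS / N_SITES   # roughly two, as in the text
p_at_least_8 = binomial_tail(8, N_SUBS, N_CPG_POSITIONS / N_SITES)

if __name__ == "__main__":
    # Table III target sequences with spaces removed; the mutated base is index 4.
    for pos, ctx in ((508, "CCCACGAAG"), (143, "GAACGTCTT"), (151, "TGCTCGAGA"), (197, "CTCTGTGTG")):
        print(pos, in_cpg(ctx, 4))
    print(round(expected_cpg_hits, 2), p_at_least_8)
```

Positions 508, 143 and 151 come out as CpG sites and the hotspot at 197 does not, matching the text.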
Table VIII. Frameshifts and deletions in the hprt coding sequence

Mutant ID     cDNA position(a)  Exon  Base changes  Target sequence(b)        Remarks(c)
K-34-5        144               3     +A            GAACG (A) TCTTGCT         IR, CTY
V-143-2       196-197           3     +T            CCCTC (T) TGTGTGC         (CT)2, (TG)3, CTY
K-26-20       223-225           3     +T            AATTC (T) TTTGCTG         (T)4, CTY
V-141-2       294-297           3     –T            GTAGA T TTTAT             (T)4
K-26-17       318               3     –T            TATTG T gtgagta           (TG)3
V-135-11      368               4     –C            ATCTCT C AACTT            (TC)3, CTY
K-28-5        503-506           7     –C            AAGGA C CCCAC             (C)4
V-126-6       536-537           8     –T            tttagTTG T TGGAT          (TG)2
V-127-12      552               8     –A            ATTCC A GACAA             Palindrome
V-147-5       589               8     –G            ATAAT G AATAC             (AAT)2, DCS
K-33-2        595-596           8     –T            AATAC T TCAGG             (T)2, CTY
V-127-10(e)   610               9     –C            tttatag C ATGTTT          AT-repeat
K-36-9        617-619           9     –2 bp         GTTT GT GTCATT            (TG/GT)2
K-37-3(d)     29-32             2     –19 bp        cagA TTA...AGG TTAT       TTA-repeat
V-151-1(d)    45                2     –52 bp        AACC AGGTT...TTG GAA      IR, Palindrome
K-34-7        45-46             2     –5 bp         AACC AGGTT ATGAC          Palindrome, TGA
V-141-12(d)   44-46             2     –35 bp        GAAC CAG...AAT CATT       CA-repeat
V-145-6       48-50             2     –8 bp         CAGG TTATGACC TTGA        TT-repeat, CTY, TGA
K-28-7        49-51             2     –22 bp        AGGT TAT...GCA TACC       TA-repeat
V-145-14      80-82             2     –3 bp         AATC ATT ATGC             AT-repeat, Palindrome
V-132-2       299-300           3     –6 bp         TTTA TCAGAC TGAAGA        Palindrome, DCS
V-136-5(e)    400               5     –3 bp         TGTG GAA gtaa             (G)2, (TG)2
V-131-1       405, 421          6     –16 bp, +G    aaGA TAT...GGC (G)AAA     GA-repeat
K-29-1        496-497           7     –3 bp         GGTG AAA AGGA             (A)4, TGAA
V-138-2       546               8     –4 bp         TTGA AATT CCAG            Palindrome, TGAA
K-30-10       561               8     –26 bp        GTT TGT...ATA ATGAATA     TG-repeat, DCS

(a) The position of the first altered base and any ambiguity with regard to this position are indicated. Base 1 is the A of the initiation codon.
(b) The altered base and sequence of the non-transcribed strand are spaced according to the most 5′ position. In deletions longer than 8 bp, the first and last few bases are separated by dots. Coding bases are in upper case, intron bases in lower case letters.
(c) IR refers to inverted repeat. CTY refers to the vertebrate topoisomerase I consensus cleavage site CTT or CTC. DCS refers to the deletion consensus sequence TG A/G A/G G/T A/C, as described in (3). In addition, TGA(A) motifs within two positions of the altered base are indicated.
(d) Sequenced in genomic DNA and reported in (22). K-28-7 contains a double deletion; only the intra-exonic one is shown here.
(e) Verified by sequencing of genomic DNA.

Among the 87 simple base substitutions, 52% were transitions and 48% transversions (Table VII). More mutations
occurred at GC base pairs (56%), despite the fact that there
are fewer mutable GC-base pairs (48%) than AT-base pairs.
The frequencies of transitions and transversions differed significantly between GC and AT base pairs; transitions were more
frequent than transversions at GC base pairs, whereas the
opposite was observed at AT base pairs. The predominant base
substitutions were GC→AT transitions (36%) and AT→TA
transversions (15%). In 78% (38/49) of the substitutions at
GC base pairs, guanine was in the non-transcribed strand,
which is more than expected, considering that only 63% (85/134)
of guanines at mutable sites are in the non-transcribed
strand. Thus, this limited data set indicates a possible strand bias
for in vivo mutation at GC base pairs, which may reflect a higher
damage frequency and/or less efficient repair of guanine bases in the
non-transcribed strand.
Substitutions at thymines in the non-transcribed strand (25/38)
were twice as common as substitutions at thymines in the
transcribed strand (13/38). However, this is predicted in a
random distribution, because mutable thymines are approximately twice as common in the non-transcribed strand as in
the transcribed strand. Overall, the frequencies of the different
base substitutions in the present data set were very similar to
previous results from non-smoking donors (Table VII).
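Both strand comparisons in this paragraph can be phrased as one-sided binomial tests against the strand share among mutable sites. This is a sketch under stated assumptions, not necessarily the authors' procedure: substitutions are treated as independent draws, the guanine null share of 85/134 is taken from the text, and the thymine null share of 2/3 is an assumption standing in for "approximately twice as common".

```python
from math import comb

def upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Guanine on the non-transcribed strand: 38 of 49 GC substitutions;
# null share 85/134 = share of mutable guanines on that strand (from the text).
p_g = upper_tail(38, 49, 85 / 134)

# Thymine on the non-transcribed strand: 25 of 38 AT substitutions;
# null share 2/3 is an ASSUMPTION for "approximately twice as common".
p_t = upper_tail(25, 38, 2 / 3)

print(f"G: expected {49 * 85 / 134:.1f} of 49, observed 38, one-sided P = {p_g:.3f}")
print(f"T: expected {38 * 2 / 3:.1f} of 38, observed 25, one-sided P = {p_t:.2f}")
```

Under these assumptions the guanine excess gives a one-sided P of a few per cent, in keeping with the cautious "possible strand bias", while the thymine split is close to what a random distribution predicts.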
Frameshifts and small deletions were identified in 26 mutants
(Table VIII). Deletions of one or several bp at the 5′ or 3′ end
of exons were observed in three mutants (K-26-17, V-127-10
and V-136-5), with no evidence of associated splicing errors.
All of these alterations tend to increase the consensus value
of the corresponding splice site by changing the last or first
exon base into a G or an A. Among the ±1 frameshift
mutations, deletion of a single base pair was three times as
common as insertion of a single base pair. Several of the
frameshift mutations were identical to mutations in the hprt
mutation database (16), e.g. at positions 368, 503 and 617.
The latter mutation, as well as the insertion at position 196/
197 where a –1 frameshift and a 3 bp-deletion have been
reported earlier, coincide with base substitution hotspots. The
1 bp-insertion at position 144 follows base no. 143, where
three independent base substitutions were recorded in the
present data set (Figure 1). At positions 294 and 503, where
–1 frameshifts were observed in the present work (Table VIII),
+1 frameshifts have also been reported. Position 536, which
is the site of a –1 frameshift in the present data set, is included
in several small deletions in the database (16). In total, eight
of the 13 frameshifts, including the 2 bp-deletion, were identical
to, or occurred at, positions where frameshifts or small deletions
have been reported previously. In contrast, no frameshift
mutation was observed at position 207, the first in a run of
GC base pairs where 7 in vivo and 15 in vitro ±1 frameshift
mutations are recorded in the database (16).
Almost 50% (6/13) of the intra-exonic small deletions
(≥3 bp) had at least one breakpoint in the previously defined
deletion hotspot region at the 5′ end of exon 2 (positions
40–51). This work adds two new mutations to this region
(K-34-7 and V-145-6); four have been reported previously
(22, see Table VIII). The 3 bp-deletion at position 80 in exon
2 (clone V-145-14) is identical to a mutation reported in the
hprt mutation database (16), and its 5′ breakpoint coincides
with that of another 15 bp deletion in the database. The breakpoints
of the –16 bp/+G compound mutation (V-131-1) coincide
with three previously described ±1 frameshift mutations, and
the small deletions at positions 496, 546 and 561 occurred at
positions where –1 frameshifts have been reported in the database (16). These observations suggest that frameshift and small
deletion mutations are non-randomly distributed and may tend
to cluster in the same regions of hprt DNA.
Most of the frameshift mutations and deletions occurred in
a sequence context of short nucleotide repeats, or such repeats
were created by the mutation. Some of these structural features
are indicated in Table VIII, and further discussed below. The
base composition of the sequences in which deletions and
frameshifts occurred showed an excess of AT base pairs in
comparison with the whole hprt coding region. The AT content
of the hprt coding sequence is 59%. The mean AT content of
the three bases flanking the frameshifts and deletions (the first
3 bp deleted plus 3 bp downstream of the deletion, according to
Table VIII; total 3 × 2 × 26 = 156 bp) is 66%. Nine of the 13
frameshifts (69%) and 77% (20/26) of the deletion ends
occurred at an AT bp. In the 3 bp-sequences flanking the
deletions, 32% (25/78) of the bases in the non-transcribed
strand were thymines and 40% (31/78) were adenines. In
contrast, the corresponding sequences flanking the frameshift
mutations were significantly different: only 21% (16/78) were
adenines and 40% (31/78) were thymines.
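The flanking-base counts above can be arranged as a 2 × 3 contingency table (A, T and G+C on the non-transcribed strand, 78 bases per group). In this minimal sketch the G+C count is inferred by subtraction, which is an assumption; with it, the pooled A+T fraction comes out at 103/156, matching the 66% in the text. A Pearson chi-square test is one possible way to check the stated difference; the paper does not say which test was applied. For 2 degrees of freedom the chi-square survival function is exactly exp(−x/2), so no statistics library is needed.

```python
from math import exp

# Bases on the non-transcribed strand in the 3-bp windows flanking each event.
# A and T counts are from the text; G+C is INFERRED as 78 - A - T.
counts = {
    "deletions":   {"A": 31, "T": 25},
    "frameshifts": {"A": 16, "T": 31},
}
TOTAL_PER_GROUP = 78
for row in counts.values():
    row["GC"] = TOTAL_PER_GROUP - row["A"] - row["T"]

categories = ["A", "T", "GC"]
grand_total = TOTAL_PER_GROUP * len(counts)

# Pooled A+T fraction across both groups (the text reports 66% of 156 bp).
at_fraction = sum(r["A"] + r["T"] for r in counts.values()) / grand_total

# Pearson chi-square statistic for the 2 x 3 table.
chi2 = 0.0
for row in counts.values():
    row_total = sum(row[c] for c in categories)
    for c in categories:
        col_total = sum(r[c] for r in counts.values())
        expected = row_total * col_total / grand_total
        chi2 += (row[c] - expected) ** 2 / expected

df = (len(counts) - 1) * (len(categories) - 1)  # 2 degrees of freedom
p_value = exp(-chi2 / 2)  # exact chi-square survival function for df = 2 only

print(f"A+T = {at_fraction:.2f}, chi2 = {chi2:.2f} (df = {df}), P = {p_value:.3f}")
```

The closed-form P value is valid only because df = 2 here; a table with more categories would need a general chi-square survival function.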
The sequence context of the frameshift mutations seemed
to differ from that of the small deletions in some other respects
as well. Many (8/12) frameshifts occurred in a run of 4–7 pyrimidines
or thymines in the non-transcribed strand, and thymine was
the deleted or inserted base in half of these mutations. The
only G-deletion (V-147-5) was not in a polypyrimidine run,
but between AAT-repeats. These features were not prominent
among the small deletions; only two of them occurred in a
run of four pyrimidines, neither of which contained more than
two thymines (Table VIII). The CTT/C trinucleotide motif,
which is a consensus cleavage sequence for vertebrate topoisomerase I, was found within 2 bp of the altered base in
five of the 13 frameshift mutations, but in only one of the
small deletion mutations. Moreover, TGTG motifs occurred
more often at or close to frameshift mutations (4/13) than
deletions (1/13). In contrast, TGA(A) motifs were more common at or near deletion breakpoints (6/13) than at frameshift
positions (1/13). Thus, it seems that the local sequence context
of frameshift mutations may be different from that of small
intra-exonic deletions, in spite of their tendency to occur in
the same regions of the hprt coding sequence.
Discussion
The analysis of mutational spectra in mammalian cells has the
potential to reveal the mechanisms of mutagenesis and to
provide clues to the identification of human somatic and germ
line mutagens (29,30). One important step in this research is
the study of in vivo mutation in well characterized human
populations. The observable background spectrum of in vivo
mutation in healthy people is influenced by a number of
factors, such as (i) the selection system used to isolate the
mutant cells, (ii) the rates of different kinds of spontaneous
and induced mutations and (iii) host factors modifying the
effect of environmental mutagen exposures. The selection
system restricts the genetic target for mutagenesis, i.e. the
types of mutation that will be observed and the distribution of
mutable sites. The vast majority of TG-selected T-cells have
a severely reduced or undetectable HPRT-enzyme activity (31).
Few mutants with residual HPRT-enzyme activity are likely
to survive TG-selection and contribute to the mutational
spectrum, which is therefore biased toward null mutations.
Thus, hprt mutations which cause deletions, defective RNA
splicing, frameshifts and stop codons are expected to give rise
to a selectable phenotype, regardless of their position in the
gene (nonsense mutations have been reported at position 648,
close to the end of the coding sequence). However, not all
missense mutations will do so.

Fig. 2. Alignment of functional regions and positions for missense
mutations in the hprt protein. The small bars at the bottom of the top graph
show all positions where amino acid substitutions are known to occur, as
predicted from all sites where single base substitutions have been recorded
in the human hprt mutation database (16). The tall bars represent those sites
where two or more missense mutations were recorded in the present data set
(data from Table III). Asp43 corresponds to hprt-cDNA position 131, Leu48
to hotspot position 146, Cys65 to hotspot position 197, Gly69 to cDNA
positions 208–209, Gly139 to positions 418–419, Pro154 to positions 463–464,
Asp176 to positions 529–530 and Cys205 to hotspot position 617. The
lower graph represents the functional domains as described in (9).
Since the observed mutation spectrum is filtered by phenotypic selection, the distribution of amino acid substitutions (as
predicted from the distribution of missense mutations) is
expected to reflect functionally important regions of the protein.
The crystal structure of the human HPRT-protein indicates
several functional domains, including amino acid residues in
the catalytic and substrate binding sites (9). Amino acid
substitutions that give rise to a TG-selectable or LNS-phenotype have been observed in as many as 70% (152/217) of the
residues in the HPRT-protein monomer (Figure 2). They seem
to distribute in the functional domains, such as the GMP-binding sites (GBS), but also in regions where no particular
function has been allocated. The amino acid substitutions
corresponding to the most frequent missense mutations in the
present data set are indicated in Figure 2. None of these amino
acids have been associated with a specific protein function,
except for Gly139, which is part of the phosphoribosyl pyrophosphate (PRPP)-binding motif (9). The central part of
the protein (amino acid residues 82–128, corresponding to the
3′ half of exon 3 and exon 4) shows a relatively low density
of mutations. This part includes the so called flexible loop
domain (9), and it is likely that many amino acid substitutions
in this region do not give rise to a selectable phenotype. With
the exception of this region, substitutions of most of the amino
acids in the highly conserved HPRT monomer seem to produce
a non-functional protein, as defined by TG-selection. This
suggests that the spectrum of missense mutations observed
at the hprt locus is not much restricted by phenotypic selection,
and has the potential of truly reflecting a broad range of
molecular events involved in spontaneous as well as induced
mutations.
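The residue numbers in the Fig. 2 caption can be related to cDNA positions by codon arithmetic. Judging from the pairs listed there (e.g. position 131 to Asp43, 617 to Cys205), the protein numbering appears to leave out the initiator methionine, so that residue = ceil(position/3) − 1, with base 1 being the A of the start codon (Table VIII, footnote a). This convention is inferred from the caption rather than stated in the text, and `cdna_to_residue` is a hypothetical helper.

```python
from __future__ import annotations

def cdna_to_residue(position: int) -> tuple[int, int]:
    """Map an hprt cDNA position (base 1 = A of the ATG start codon) to
    (residue number, base position within the codon).

    ASSUMPTION: residues are numbered without the initiator methionine,
    so the second codon of the reading frame is residue 1.
    """
    if position < 1:
        raise ValueError("cDNA positions start at 1")
    codon = (position + 2) // 3          # 1-based codon index in the reading frame
    codon_base = (position - 1) % 3 + 1  # 1, 2 or 3 within that codon
    return codon - 1, codon_base

# Pairs quoted in the Fig. 2 caption; the assertion checks the assumed convention.
caption_pairs = {131: 43, 146: 48, 197: 65, 208: 69, 418: 139,
                 463: 154, 529: 176, 617: 205}
for pos, residue in caption_pairs.items():
    assert cdna_to_residue(pos)[0] == residue, pos

print(cdna_to_residue(508))  # CpG hotspot position
```

All eight caption pairs agree with this convention. Position 508 maps to the first base of its codon, which fits the later statement that C→T at this site creates a TGA stop codon (a CGA codon becoming TGA).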
The human hprt mutation database (16, release 6, August
1996) contains information on 1166 simple base substitutions
causing 910 missense and 256 nonsense mutations at 281
different bp positions in the coding region. The mutations
derive from a variety of TG-selected human cell lines and
from patients with LNS and gout. The phenotype in LNS is
characterized by severely reduced HPRT-enzyme activity like
the TG-selected cells, whereas gout patients often show residual
enzyme activity (reviewed in 15). Consequently, there is a
considerable overlap between the mutations in LNS and TG-selected cells, whereas some mutations are unique to gout
(32). Specifically, there are 9 bp positions (46, 155, 157, 160,
232, 239, 329, 396 and 472), for which only gout mutations
have been reported in the database (16), and where mutations
may not lead to a TG-selectable phenotype. Excluding these,
there are 272 bp in the hprt coding region that are known to
be able to mutate to a TG-selectable or LNS-phenotype by a
simple base substitution, which corresponds to almost 50%
(272/570) of all the base pairs in the coding region of the hprt
gene that theoretically could produce an amino acid exchange
by a single base substitution. However, the spectrum with
regard to base substitutions is not yet saturated. This work
adds seven new missense mutations and one new nonsense
mutation. The estimate that >300 mutable sites will eventually
be identified in the hprt coding region (19) seems reasonable,
and we therefore used this number in our statistical calculations
(see Materials and methods).
In the present work, mutations were identified in all 124
mutants from which RT-PCR product could be obtained for
sequence analysis. This is approximately half the number of
mutant clones that were classified as having point mutations
in the first screening analysis shortly after collection (23). The
difficulty in obtaining high-quality RT-PCR products two years later
is probably related to the long storage time and the shortage
of material in some of the clones. It cannot be excluded that
this has caused some bias in the spectrum of mutations, but
there is no evidence that this is the case. On the contrary, the
similarity between the present and previous results strongly
indicates that these spectra are representative of the true
in vivo spectrum of hprt mutation in T-cells of non-smoking
adults. Approximately half of the mutations in the human hprt
database (16) are derived from a coherent study of a large and
well-characterized study population (14), similar to the one in
this work, whereas the other half is composed of results from
many smaller studies, in which the donor status and mutant
collection were not always well characterized. Nevertheless,
the hotspot positions (14) and the frequency distribution of
mutations in these data sets are very similar (Table VII).
The overall distribution of mutations per mutable site
was found to deviate significantly from a random Poisson
distribution (Figure 1). The main contribution to the significant
chi-square value was the over-representation of sites showing
three or more base pair substitutions. Burkhart-Schultz et al.
(14) identified three hotspots at positions 197, 508 and 617 in
a mixed population of smokers and non-smokers, but did not
find statistically significant differences between the mutation
spectra of the two subpopulations. Our present data confirm
the existence of these hotspots in a quite distant and unrelated
non-smoking study population. Moreover, our results identify
another independent hotspot at position 146, and a possible
hotspot at position 131.
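The departure from a Poisson distribution can be illustrated with summary numbers alone. This sketch assumes, as the text says was done in the statistical calculations, that about 300 sites are mutable, and treats each site as receiving a Poisson-distributed number of the 87 substitutions with mean λ = 87/300. It compares the expected number of distinct mutated sites, and of sites with three or more hits, against the observed 55 mutated sites and the at least five sites with three or more mutations (197, 146, 143, 508 and 617). The full per-site distribution of Figure 1 is not reproduced here, so this is not the authors' chi-square test.

```python
from math import exp, factorial

N_SUBSTITUTIONS = 87
N_MUTABLE_SITES = 300          # estimate of mutable sites used in the paper's statistics
lam = N_SUBSTITUTIONS / N_MUTABLE_SITES

def poisson_pmf(k: int, mean: float) -> float:
    """Probability that a site receives exactly k substitutions."""
    return exp(-mean) * mean**k / factorial(k)

# Expected number of sites hit at least once, and at least three times.
expected_mutated_sites = N_MUTABLE_SITES * (1 - poisson_pmf(0, lam))
expected_sites_3plus = N_MUTABLE_SITES * (1 - sum(poisson_pmf(k, lam) for k in range(3)))

observed_mutated_sites = 55
observed_sites_3plus_min = 5   # positions 197, 146, 143, 508 and 617 (at least)

print(f"mutated sites: expected {expected_mutated_sites:.1f}, observed {observed_mutated_sites}")
print(f"sites with >=3 hits: expected {expected_sites_3plus:.2f}, "
      f"observed >= {observed_sites_3plus_min}")
```

Fewer distinct mutated sites than expected (55 versus roughly 75), together with several sites hit three or more times where about one would be expected, is the pattern produced when substitutions cluster at hotspots.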
The mutated guanine base at position 197 is the first G in
a (TG)3-repeat, which is preceded by a CTC-trinucleotide
conforming to the vertebrate topoisomerase I consensus cleavage
site (CTC or CTT), which has been implicated in frameshift
and small deletion mutations (see below). Both G→A and
G→T substitutions were found to occur at this site. The
mutated thymine at position 146 is the second base of a CTT trinucleotide. This hotspot is flanked by two other frequently
mutated sites: 143, which is a CpG-site spaced by one base
from the CTT trinucleotide, and 151, which is part of a CTC trinucleotide. Altogether, seven of the 55 mutated sites occurred
in or close to a CTT or CTC trinucleotide. Position 131 is in
a sequence resembling the ‘consensus deletion sequence’ described
by Cooper and Krawczak (3), which was also observed close
to eight other mutated sites. Position 617 involves a (TG)2 repeat, and position 508 is a CpG-site. Twelve of the 55
mutated sites were located within 2 bp of a frameshift
mutation or deletion breakpoint, which may indicate that some
structural features are common to these types of mutations,
as suggested by Rodriguez and Loechler (33). Cariello and
Skopek (19) identified five tentative hairpin structures in the
hprt DNA; three in exon 3 and two in exon 8, which together
contain 83 base pairs. Although only 24% (68/281) of the
mutable sites are located in these structures, they contained
38% (33/87) of the base substitutions in the present data set,
including 11 hotspot mutations at positions 146 and 197. This
may indicate that these palindromes promote mutagenesis, or
it may simply reflect the functional importance of this part of
exon 3. There is a need for more mutations and proper
statistical methods for the analysis of the possible contribution
of these and other structural features to the observed spectrum
of in vivo mutation.
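One simple way to quantify the hairpin enrichment is a one-sided binomial test, assuming that substitutions land independently and uniformly on the 281 known mutable sites, of which 68 lie in the putative hairpins. Because 11 of the 33 hairpin substitutions are at the hotspots 146 and 197, the sketch also repeats the test with those two positions and their mutations removed, to show how much of the excess depends on them. This is an illustration only and not the "proper statistical methods" the text calls for; `upper_tail` is a hypothetical helper.

```python
from math import comb

def upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# All substitutions: 33 of 87 fall in hairpins holding 68 of 281 mutable sites.
p_all = upper_tail(33, 87, 68 / 281)

# Hotspot positions 146 and 197 (11 mutations, both inside hairpins) removed:
# 22 of 76 substitutions at the remaining 66 of 279 mutable sites.
p_rest = upper_tail(33 - 11, 87 - 11, (68 - 2) / (281 - 2))

print(f"all sites: one-sided P = {p_all:.4f}")
print(f"146 and 197 excluded: one-sided P = {p_rest:.3f}")
```

Under this uniform-placement assumption the excess is clear with all sites included but much weaker once the two hotspots are set aside, which is consistent with the text's caution about interpreting these structures.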
With the possible exception of the two tandem mutations,
the present spectrum of base substitutions does not highlight
any particular kind of exogenous factor involved in the
causation of these mutations. Both tandem mutations (CC→TT
at positions 112–113 and GG→AA at positions 399–400)
were of the kind that has been associated with UV-induced
mutagenesis in skin tumors (34), and with mutations induced
by reactive oxygen species in vitro (35). On the other hand,
positions 399–400 overlap a 3 bp deletion mutation adjacent
to a (TG)2 repeat (Table VIII), and positions 112–113 coincide
with a CTC-trinucleotide (see above). This suggests that other
factors may contribute to sequence instability at these sites as
well. Two different mutations at the same site were observed
in seven positions, all but one at GC-base pairs (Table VI).
This indicates that different mechanisms operate at these sites,
either as a result of different types of DNA damage or sequence
instability.
In general, transitions were slightly more frequent than
transversions (Table VII), as is the case for human germ line
mutations causing genetic diseases, where transitions also
dominate (3). Mutations at GC bp, especially with G in the
non-transcribed strand, were more common than mutations at
AT bp, in spite of the fact that more mutable sites exist at AT
bp in the hprt coding sequence. This preference for mutagenesis
at GC bp is mainly caused by the predominance of GC mutations among the few hotspots and frequently mutated
sites; the base composition among all the 55 mutated sites in
the present data set (46% GC) is not much different from that
of the entire hprt coding region. Therefore, it is not clear whether
the preference for mutagenesis at GC bp is due to an increased
rate of (spontaneous or induced) modifications at G-bases
(possibly combined with slower repair of G’s in the non-transcribed strand), or simply related to the local sequence
context around the hotspot positions. It is interesting to note,
however, that while base substitution mutations showed a
preference for GC-bp, the frameshift and deletion mutations
showed a preference for AT-bp.
Among human germ line mutations, the predominance of
CG→TA mutations has been attributed to the deamination
of 5-methylcytosine at CpG-sites (3). It is not known which
of the eight CpG-sites are methylated in the hprt coding region,
but mutations have been reported at all seven mutable positions
in four CpG-sites, 142/143, 151/152, 481 and 508/509. In the
present data set, three C→T transitions were found at site 508.
This in vivo mutation hotspot (see above) is the third most
frequently mutated position in the human hprt mutation
database (16). All mutations there are C→T, creating a TGA stop
codon (Table VI). Two C→T transitions were found at position
151, the sixth most frequently mutated site in the database.
At this position, too, the mutation gives rise to a stop codon,
and all known mutations are C→T transitions. Taken together,
these results strongly suggest that the cytosines at positions
508 and 151 are methylated, and that spontaneous deamination
of 5-methylcytosine is a likely mechanism for the frequent
occurrence of mutation at these sites. Interestingly, G→A
transitions are much less often observed at these sites compared
with C→T transitions, suggesting a pronounced strand bias for
5-methylcytosine-mediated mutagenesis in the hprt gene (36).
In contrast, it seems unlikely that 5-methylcytosine deamination is involved in mutagenesis at the CpG site at position
142/143. Three mutations occurred at this site, two G→A and
one G→T (Table III). The human mutation database (16)
contains seven mutations at position 142/143, four of them
G→A, but no C→T mutation has been reported to occur
at this CpG-site.
We have recently characterized a deletion hotspot at the
5′ end of hprt exon 2 (22). Almost half (6/13) of the deletion
mutations in the present data set had one or both breakpoints
within a 9-base pair palindromic structure (positions 41–49),
flanked by a number of TGA-repeats. In addition to the many
small deletion breakpoints in this region, there are also four
independent ±1 frameshift mutations in the human hprt
mutation database (16), suggesting that it may be a hotspot
region for frameshift mutation as well. Previous analyses of
somatic mutation in mammalian cells (37) and of human germ
line mutations (3) have suggested that the mononucleotide
composition and DNA sequence context surrounding deletions
differ from those of bulk DNA. The present deletion
mutations seemed to occur in sequences with a possible excess
of AT-base pairs and a bias in the distribution of A and T bases
between the transcribed and non-transcribed DNA strands.
Several more or less specific sequences have been associated
with frameshifts and small deletions in human genes (discussed
in 3). The most frequently recurring sequence features appear
to be short di- or trinucleotide repeats, inverted repeats, runs of
4–5 pyrimidines, the so-called deletion consensus sequence,
which resembles certain polymerase arrest sites, symmetric
elements, and the consensus cleavage sequence for vertebrate
topoisomerase I (CTT/C). The latter sequence motif was found
to be associated with the breakpoints of large somatic hprt
deletions in human cells (38). As shown in Table VIII and
indicated in the Results section, several of the present frameshift
and deletion mutations seem to be associated with one or more
of these characteristic sequence features. But there are also
important differences between the frameshift and deletion
mutations with regard to these sequence characteristics, suggesting that the mechanisms involved are different. Overall,
the structural features of these mutations conform with earlier
studies of spontaneous gene deletions in mammalian cell lines
in vitro (reviewed in 39) and human germ line mutations
(reviewed in 3). The further evaluation of the biological
significance of these sequence motifs and their relation to
mechanisms of base substitution, frameshift and deletion
mutagenesis requires larger data sets and improved statistical
methods allowing the analysis and comparison of variations
in the local sequence context in regions where mutations occur
and in other parts of the gene.
The consistency of the present mutational spectrum with
that of Burkhart-Schultz et al. (14) and the human hprt
mutation database (16), provides strong evidence that these
mutations truly reflect the background spectrum of hprt mutation in T-cells of healthy, non-smoking adults. However, it is
also obvious that any mutation spectrum, being composed of
one or a few mutations from each of a large number of
individuals, conceals a considerable heterogeneity with regard
to individual differences in mutation rates, metabolism and
life-style related exposures. So far, analyses of mutation
spectra in smokers and non-smokers have not provided evidence of significant differences (14,40), which may be due to
the limited size and variability within the study populations.
In addition to smoking, age and some occupational and
therapeutic exposures (reviewed in 6), and possibly genetic
variations in the metabolism of xenobiotics (41), have been
associated with increased hprt mutation frequencies. Further
studies are needed to elucidate the possible influence of these
and other host and life style related factors on the background
spectrum of hprt mutation in T-cells of healthy people. The
presently compiled database will be a useful basis for such
studies and for future comparisons with hprt mutations in
people exposed to environmental mutagens and carcinogens,
as well as with mutation spectra in tumors and inherited
genetic diseases.
Acknowledgements
Andrej Podlutsky and Anne-May Österholm contributed equally to this work.
Andrej Podlutsky was supported by stipends from the European Science
Foundation (ESF), and the Royal Academy of Sciences (KVA). Andreas
Hofmaier is a recipient of a EUCAHM-fellowship on leave from the School
for Medical Documentation at the University of Ulm. We are grateful to
Dr William Thilly and Dr Aoy Tomita, MIT, Cambridge, USA, and
Dr David Lovell, BIBRA, UK, for statistical advice. Financial contributions
were received from the Swedish Cancer Society (1179-B96–1XAB), The
Swedish Work Environmental Fund, The Swedish Environment Protection
Board and Swedish Match AB. Part of this work was conducted within the
framework of the EU-BioMed 2 Project on Occupational and Environmental
Mutagenesis: Validation and Application of the HPRT in vivo Mutation Assay
for Risk Assessment in Humans (EUCAHM, BMH4 CT96 0120,
http://www.ulst.ac.uk./faculty/science/EUCAHM).
References
1. Albertini,R.J., Nicklas,J.A., O'Neill,J.P. and Robison,S.H. (1990) In vivo
somatic mutations in humans: Measurement and analysis. Annu. Rev.
Genet., 24, 305–326.
2. Albertini,R.J. (1994) Why use somatic mutations for human biomonitoring.
Env. Molec. Mutag., 23/S24, 18–22.
3. Cooper,D.N. and Krawczak,M. (1993) Human Gene Mutation. Bios
Scientific Publishers, Oxford, UK.
4. Greenblatt,M.S., Bennett,W.P., Hollstein,M. and Harris,C.C. (1994)
Mutations in the p53 tumor suppressor gene: clues to cancer etiology and
molecular pathogenesis. Cancer Res., 54, 4855–4878.
5. Krawczak,M., Smith-Sørensen,B., Schmidtke,J., Kakkar,V.V., Cooper,D.N.
and Hovig,E. (1995) Somatic spectrum of cancer-associated single basepair
substitutions in the TP53 gene is determined mainly by endogenous
mechanisms of mutation and by selection. Human Mutation, 5, 48–57.
6. Cole,J. and Skopek,T. (1994) Somatic mutant frequency, mutation rates
and mutational spectra in the human population in vivo. Mutat. Res., 304,
33–106.
7. Albertini,R.J., Castle,K.L. and Borcherding,W.R. (1982) T-cell cloning to
detect the mutant 6-thioguanine resistant lymphocytes present in human
peripheral blood. Proc. Natl Acad. Sci. USA, 79, 6617–6621.
8. Morley,A.A., Trainor,K.J., Seshadri,R. and Ryall,R.B. (1983) Measurement
of in vivo mutations in human lymphocytes. Nature, 302, 155–156.
9. Eads,J.C., Scapin,G., Xu,Y., Grubmeyer,C. and Sacchettini,J.C. (1994) The
crystal structure of human hypoxanthine-guanine phosphoribosyltransferase
with bound GMP. Cell, 78, 325–334.
10. Edwards,A., Voss,H., Rice,P., Civitello,A., Stegemann,J., Schwager,C.,
Zimmermann,J., Erfle,H. and Caskey,C.T. (1990) Automated DNA
sequencing of the human hprt-locus. Genomics, 6, 593–608.
11. Andersson,B., Fält,S. and Lambert,B. (1992) Strand specificity for
mutations induced by (+)-anti BPDE in the hprt gene in human T-lymphocytes. Mutat. Res., 269, 129–140.
12. McGregor,W.G., Maher,V.M. and McCormick,J.J. (1994) Kinds and
location of mutations induced in the hypoxanthine-guanine
phosphoribosyltransferase gene of human T-lymphocytes by 1-nitrosopyrene, including those caused by V(D)J recombinase. Cancer Res.,
54, 4207–4213.
13. Bastlova,T. and Podlutsky,A. (1996) Molecular analysis of styrene oxide-induced hprt mutation in human T-lymphocytes. Mutagenesis, 11, 581–591.
14. Burkhart-Schultz,K., Thompson,C.L. and Jones,I.M. (1996) Spectrum of
somatic mutation at the hypoxanthine phosphoribosyltransferase (hprt)
gene of healthy people. Carcinogenesis, 17, 1871–1883.
15. Sculley,D.G., Dawson,P.A., Emmerson,B.T. and Gordon,R.B. (1992) A
review of the molecular basis of hypoxanthine-guanine phosphoribosyltransferase (HPRT) deficiency. Hum. Genet., 90, 195–207.
16. Cariello,N.F. (1994) Software for the analysis of mutations at the human
hprt gene. Mutat. Res., 312, 173–185. (Data base release no. six, August
1996.)
17. Cariello,N. (1996) Relational data-base model for DNA mutations and
software program for implementation of the model. Mutat. Res., 359,
103–117.
18. Cariello,N.F., Douglas,G.R., Dycaico,M.J., Gorelick,N.J., Provost,G.S. and
Soussi,T. (1997) Data-bases and software for the analysis of mutations in
the human p53 gene, the human hprt gene and both the lacI and lacZ
gene in transgenic rodents. Nucl. Acids Res., 25, 136–137.
19. Cariello,N.F. and Skopek,T.R. (1993) Analysis of mutations occurring at
the human hprt locus. J. Mol. Biol., 231, 41–57.
20. Hou,S.-M., Lambert,B. and Hemminki,K. (1995a) Relationship between
hprt mutant frequency, aromatic DNA adducts and genotypes for GSTM1
and NAT2 in bus maintenance workers. Carcinogenesis, 16, 1913–1917.
21. Hou,S.-M., Fält,S. and Steen,A.-M. (1995b) HPRT mutant frequency and
GSTM1 genotype in non-smoking healthy individuals. Env. Molec. Mutag.,
25, 97–105.
22. Österholm,A.-M., Bastlova,T., Meijer,A., Podlutsky,A., Zanesi,N. and
Hou,S.-M. (1996) Sequence analysis of deletion mutations at the HPRT
locus of human T-lymphocytes: association of a palindromic structure
with a breakpoint cluster in exon 2. Mutagenesis, 11, 511–517.
23. Österholm,A.-M., Fält,S., Lambert,B. and Hou,S.-M. (1995) Classification
of mutations at the human hprt-locus in T-lymphocytes of bus maintenance
workers by multiplex-PCR and reverse transcriptase-PCR analysis.
Carcinogenesis, 16, 1909–1912.
24. Yang,J.L., Maher,V.M. and McCormick,J.J. (1989) Amplification and direct
nucleotide sequencing of cDNA from the lysate of low numbers of diploid
human cells. Gene, 83, 347–354.
25. de Boer,J.G., Curry,J.D. and Glickman,B.W. (1993) A fast and simple
method to determine the clonal relationship among human T-cell
lymphocytes. Mutat. Res., 288, 173–180.
26. Bourguin,A., Tung,R., Galili,N. and Sklar,J. (1990) Rapid, nonradioactive
detection of clonal T-cell receptor gene rearrangements in lymphoid
neoplasms. Proc. Natl Acad. Sci. USA, 87, 8536–8540.
27. Snedecor,G.W. and Cochran,W.G. (1967) Sampling from the binomial
distribution. In Snedecor,G.W. and Cochran,W.G. Statistical Methods, 6th
Edn. The Iowa State University Press, IA, pp. 223–227.
28. Adams,W.T. and Skopek,T.R. (1987) Statistical tests for the comparison
of samples from mutational spectra. J. Mol. Biol., 194, 391–396.
29. Keohavong,P. and Thilly,W.G. (1992) Mutational spectrometry: A general
approach for hot-spot mutations in selectable genes. Proc. Natl Acad. Sci.
USA, 89, 4623–4627.
30. Kat,A.G. and Thilly,W.G. (1994) Mutational spectra of endogenous genes
in mammalian cells. In Hemminki,K. et al. DNA Adducts: Identification and
Biological Significance, IARC Scientific publications No. 125. International
Agency for Research on Cancer, Lyon, France, pp. 371–383.
31. Steen,A.-M., Sahlén,S., Hou,S.-M. and Lambert,B. (1993) Hprt-activities
and RNA phenotypes in 6-thioguanine resistant human T-lymphocytes.
Mutat. Res., 286, 209–215.
32. Lambert,B., Marcus,S., Andersson,B., Hou,S.-M., Steen,A.-M. and
Hellgren,D. (1992) Missense mutations and evolutionary conserved amino
acids at the human hypoxanthine phosphoribosyl-transferase locus.
Pharmacogenetics, 2, 329–336.
33. Rodriguez,H. and Loechler,E.L. (1995) Are base substitution and frameshift
mutagenesis pathways interrelated? Mutat. Res., 326, 29–37.
34. Brash,D.E., Rudolph,J.A., Simon,J.A., Lin,A., McKenna,G.J., Baden,H.P.,
Halperin,A.J. and Pontén,J. (1991) A role for sunlight in skin cancer: UV-induced
p53 mutations in squamous cell carcinoma. Proc. Natl Acad. Sci. USA, 88,
10124–10128.
35. Reid,T.M. and Loeb,L.A. (1993) Tandem double CC→TT mutations are
produced by reactive oxygen species. Proc. Natl Acad. Sci. USA, 90,
3904–3907.
36. Skandalis,A., Ford,B.N. and Glickman,B.W. (1994) Strand bias in mutation
involving 5-methylcytosine deamination in the human hprt gene. Mutat.
Res., 314, 21–26.
37. Nalbantoglu,J., Hartley,D., Phear,G., Tear,G. and Meuth,M. (1986)
Spontaneous deletion formation at the aprt locus of hamster cells: the
presence of short sequence homologies and dyad symmetries at deletion
termini. EMBO J., 5, 1199–1204.
38. Monnat,R.,Jr, Hackmann,A.F. and Chiaverotti,T.A. (1992) Nucleotide
sequence analysis of human hypoxanthine phosphoribosyltransferase
(HPRT) gene deletions. Genomics, 13, 777–787.
39. Meuth,M. (1990) The structure of mutation in mammalian cells. Biochim.
Biophys. Acta, 1032, 1–17.
40. Vrieling,H., Thijssen,J.C.P., Rossi,A.M., van Dam,F.J., Natarajan,A.T.,
Tates,A.D. and van Zeeland,A.A. (1992) Enhanced hprt mutant frequency
but no significant difference in mutation spectrum between a smoking and
a non-smoking human population. Carcinogenesis, 13, 1625–1631.
41. Lambert,B., Bastlova,T., Österholm,A.-M. and Hou,S.-M. (1995) Analysis
of mutation at the hprt locus in human T-lymphocytes. Toxicol. Lett.,
82/83, 323–333.
Received on July 16, 1997; revised on November 7, 1997; accepted on
December 10, 1997