
FEMS Immunology and Medical Microbiology 15 (1996) 73-79
Elsevier
Use of polymorphic short and clustered coding-region microsatellites to distinguish strains of Candida albicans
Dawn Field a, Lori Eggert a, Dave Metzgar a, Randall Rose b, Christopher Wills c,*
a Department of Biology, University of California at San Diego, La Jolla, CA 92093-0116, USA
b Department of Linguistics, University of California at San Diego, La Jolla, CA 92093-0116, USA
c Center for Molecular Genetics, University of California at San Diego, La Jolla, CA 92093-0116, USA
Received 8 February 1996; revised 1 April 1996; accepted 18 April 1996
Abstract
We describe the identification of polymorphic microsatellite loci in the pathogenic yeast Candida albicans. We searched all Candida coding-region sequences in GenBank for microsatellites with more than four repeats. Nine such microsatellite sequences, all with trinucleotide motifs, were found. Three of these were perfect microsatellites, while the remaining six occurred in one imperfect microsatellite and two compound microsatellites. Because some of these repeats lie close together, all nine could be assayed with six PCR primer pairs. All of these microsatellite sequences were found in five nuclear genes: ZNF1, CCN1, CPH1, EFG1 and MNT2. Except for a single (CTT)n serine tract, all coded for polyglutamine tracts. Another locus with seven alleles, a region of the ERK1 protein kinase gene, was also examined and may be representative of a new class of highly polymorphic 'clustered' microsatellites. Such loci, in which several non-contiguous but closely linked microsatellites are clustered together, may be a useful source of DNA polymorphisms in microorganisms in which long microsatellite sequences are unavailable. All seven regions amplified were polymorphic, having between two and seven variable-length alleles in the 11 strains of Candida albicans examined. The results of this and similar searches will facilitate epidemiological and evolutionary studies of Candida and other microorganisms.
Keywords: Candida albicans; Polymorphic microsatellite locus; Coding-region microsatellites; Clustered polymorphisms
1. Introduction
The ability to distinguish morphologically identical strains of pathogenic microorganisms at the genetic level, and to use these markers to trace the strains' sensitivity to drugs, levels of virulence, and transmission and colonization histories, is critical to the proper diagnosis, treatment, and epidemiological study of infectious agents.
Candida albicans is a pathogenic, apparently asexual [1] yeast that can cause massive oropharyngeal, esophageal and vaginal infections and, less commonly, a wide variety of life-threatening conditions triggered by bloodstream infection [2]. The individuals most at risk are people with AIDS, other patients with a disease-compromised immune system, and those who are on immunosuppressive drugs. As the prevalence of AIDS increases, and the use of chemotherapy and organ transplantation grows, the need to understand the evolution of drug resistance, transmission, and virulence of accompanying Candida infections will grow as well.
* Corresponding author. Tel: +1 (619) 534-4113; Fax: +1 (619) 534-7108.
0928-8244/96/$15.00 Copyright © 1996 Federation of European Microbiological Societies. Published by Elsevier Science B.V.
PII S0928-8244(96)00034-X
The ability to examine the level of diversity of this pathogen and the stability of its clonal types is essential in understanding its epidemiology. Another longstanding question about Candida is whether it is truly a collection of asexual pathogenic lineages or whether it retains a cryptic sexual phase. In recent years, spurred by the connection between Candida infections and AIDS, many methods have been proposed to differentiate Candida strains. Several electrophoretic methods, morphotyping, serotyping, antibiogram and resistogram typing, biotyping, sensitivity to yeast killer toxins, and typing based on protein variability have all been used [3,4]. In general, molecular markers such as RFLPs, RAPDs, and DNA and PCR fingerprinting provide the most discriminatory power [5,6]. Such techniques assay a wide range of nuclear polymorphisms using relatively simple procedures [7-9].
Since RFLPs, RAPDs, and DNA and PCR fingerprinting all assay anonymous loci, they can be applied to organisms with poorly known genomes; unfortunately, these techniques rarely assay codominantly inherited polymorphisms. Ideal molecular markers should be numerous and highly polymorphic, give reproducible results, and be simple to assay. Such markers should preferably be codominantly inherited (making it possible to distinguish heterozygotes at a cloned locus), so that the Mendelian inheritance and evolution of the target locus can be followed through time.
Microsatellite markers fulfill these requirements and have revolutionized the genetic analysis of higher eukaryotes. These abundant, hypervariable DNA sequences, made up of tandemly repeated motifs of 1-6 nucleotides [10,11], are subject to high rates of polymerase slippage during replication [12]. The resulting accumulation of length mutations makes microsatellites among the most variable classes of repetitive DNA. Microsatellite sequences are dispersed throughout the length of eukaryotic chromosomes, making them ideal for genome mapping and linkage studies [13].
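To illustrate how slippage-driven length mutations accumulate into many alleles, the sketch below simulates repeat counts under a simple stepwise mutation model, in which each slippage event adds or removes one repeat unit. The model, the function name `simulate_repeat_lengths` and the mutation rate `mu` are illustrative assumptions, not parameters estimated in this study.

```python
import random

def simulate_repeat_lengths(n_lineages, generations, start_repeats, mu, seed=0):
    """Repeat counts after independent stepwise mutation in each lineage.

    Hypothetical helper: each generation, each lineage gains or loses one
    repeat unit with probability mu (a stand-in for polymerase slippage).
    """
    rng = random.Random(seed)
    counts = [start_repeats] * n_lineages
    for _ in range(generations):
        for i in range(n_lineages):
            if rng.random() < mu:
                step = rng.choice((-1, 1))
                counts[i] = max(1, counts[i] + step)  # keep at least one unit
    return counts

# Eleven lineages descended from one 8-repeat ancestor (illustrative values only).
lengths = simulate_repeat_lengths(11, 10_000, start_repeats=8, mu=1e-3)
print(sorted(set(lengths)))  # distinct allele lengths that have arisen
```

Raising `mu` or `generations` increases the number of distinct lengths, which is the sense in which high slippage rates make microsatellites hypervariable.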
Before microsatellites can be assayed, informative loci must be found in the genome of the target organism. This entails cloning the microsatellite and its flanking sequence so that PCR primers can be designed. Microsatellites are usually cloned by screening species-specific genomic libraries. An equally successful strategy is to screen existing sequences deposited in databases such as GenBank. Here we describe an application of this latter approach to a microorganism, which has enabled us to identify seven polymorphic microsatellite loci in the pathogenic yeast Candida albicans.
2. Materials and methods
2.1. Candida isolates
Three strains of Candida albicans from the American Type Culture Collection were typed: ATCC 60193, 11006, and 36232. Nine additional clinical isolates from a variety of sources were kindly supplied by Sharon Reed; one of these was C. krusei. C. albicans was distinguished by its ability to form pseudohyphae under anaerobic conditions and by appropriate assimilation tests.
2.2. Isolation of genomic DNA for PCR
This DNA extraction method is based on a protocol used to obtain PCR-grade DNA from forensic samples [14]. Our modified version allows PCR-grade DNA to be extracted from single yeast colonies in less than 20 min in a single tube. A sterile toothpick was used to transfer approximately one cubic millimeter of cells from single colonies grown on YEPD plates to 300 µl of 5% (w/v) Chelex suspended in ddH2O in a 0.6 ml Eppendorf tube. Each tube was boiled for 10 min, vortexed for 10 s, and centrifuged for 3 min at 14 000 rpm to pellet all cellular debris at the bottom of the tube. All DNA prepared in this manner was stored at -20°C; before each use as PCR template it was thawed, vortexed, and centrifuged for 3 min at 14 000 rpm [14].
2.3. Identification of microsatellites and selection of PCR primers
All Candida albicans sequences deposited in GenBank (Version 90.0) were downloaded onto a Sun microcomputer using Entrez from the CD version of GenBank (Version 90.0), and searched for possible microsatellites using a program under development by Randall Rose. A total of 89 non-duplicated protein-coding sequences, comprising 160 109 nucleotides, were searched. PCR primers were designed to amplify products of 100-400 nucleotides in length. All PCR primers were purchased from Research Genetics (Huntsville, AL).

Table 1
GenBank survey of Candida microsatellites
Measure | Coding | Non-coding
No. of entries | 89 | 138
Total bp searched | 160 109 | 276 320
Max di found | 4 | 13
Max tri found | 11 | 7
Max tetra found | 3 | 8
Max penta found | 2 | 5
Max hexa found | 4 | 3
No. di > 4 repeats | 0 | 37
No. tri > 4 repeats | 9 | 18
No. tetra > 4 repeats | 0 | 3
No. penta > 4 repeats | 0 | 1
No. hexa > 4 repeats | 0 | 0
The results of searching all Candida albicans sequences, both coding and non-coding, available in GenBank for microsatellites with motifs of 2-6 bases are summarized. For each repeat type, the longest repeat found and the total number of microsatellites longer than four repeats are given. In GenBank (version 90.0) there are 138 entries from Candida, totaling 276 320 nucleotides. Of the 138 total entries, 103 are coding regions representing 160 109 nucleotides. Of these 103, 14 were found to be allelic redundancies, reducing the total number of unique published Candida ORFs to 89.
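The repeat search in Section 2.3 was done with a program under development by Randall Rose, which is not described here. As a rough, hypothetical stand-in, the sketch below scans a sequence for perfect tandem repeats of 2-6 bp motifs with more than four copies and tallies the two quantities reported in Table 1 (longest run and number of runs per motif length). The function names are our own, and the scan ignores imperfect repeats.

```python
def is_reducible(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. 'CACA')."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))

def find_microsatellites(seq, min_repeats=5, motif_sizes=range(2, 7)):
    """Return sorted (start, motif, copies) for perfect runs with >= min_repeats copies."""
    seq = seq.upper()
    hits = []
    for k in motif_sizes:
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == motif:
                copies += 1
            if copies >= min_repeats and not is_reducible(motif):
                hits.append((i, motif, copies))
                i += copies * k  # resume scanning after this run
            else:
                i += 1
    return sorted(hits)

def summarize(hits):
    """Map motif length -> (longest run in copies, number of runs), as in Table 1."""
    summary = {}
    for _, motif, copies in hits:
        longest, n = summary.get(len(motif), (0, 0))
        summary[len(motif)] = (max(longest, copies), n + 1)
    return summary

example = "GG" + "CAA" * 7 + "TT"  # toy sequence, not a Candida entry
print(find_microsatellites(example))  # [(2, 'CAA', 7)]
```

Applying `summarize` to the hits from every coding entry would give the "Max" and "No. > 4 repeats" rows of Table 1; because this scan is not the authors' program, counts obtained this way need not match the published ones.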
2.4. PCR
PCR products were visualized by end-labeling each forward primer with [γ-32P]ATP using T4 polynucleotide kinase (Gibco BRL) according to the manufacturer's specifications. The PCR reaction mix included 2 mM each dNTP, 1 unit of cloned Pfu DNA polymerase (Stratagene), 0.5 µM forward and reverse unlabeled primers, 0.05 µM end-labeled forward primer, and 1 µl of DNA prepared in Chelex, in a total volume of 10 µl of 1× PCR buffer. PCR was performed on a Perkin Elmer 2400 thermocycler, with 40 cycles of 1 min denaturation at 94°C, 1 min annealing at 50°C and 1 min extension at 72°C. Four µl of stop solution was added to each 10 µl reaction, and the samples were loaded on 8% polyacrylamide gels. The gels were dried for 1 h and exposed to film for 6-24 h.

Table 2
PCR primers for seven polymorphic microsatellites in Candida albicans
Primer name | GenBank locus | Gene | Size of GenBank clone | Forward (F) and reverse (R) primers
'ZNF1' | YSAZNF1 | ZNF1 gene: zinc finger protein | 240 bp | F 5'-CCATTACAGCTGAACCAGCGAGGG-3'; R 5'-CGCTAGGTAACCTACAGA’ZTGTGGC-3'
'CCN2' | YSACLN1 | G1 cyclin (CCN1; CLN1) | 221 bp | F 5'-CC’ITCCCATCCTCATACC-3'; R 5'-CCAATGA-ITCAAGTA’I-TGGATGG-3'
'CPH1' | CAU15152 | CPH1 gene: Ste12-like transcription factor | 216 bp | F 5'-GCCATGGGATATCAAAGC-3'; R 5'-C’I-IGGTAATGCCACCGCC-3'
'MNT2' | CAMNT2GEN | MNT2 gene: mannosyltransferase | 329 bp | F 5'-GCCAATACTGGAAACTGTGCC-3'; R 5'-CGGGCTAAAGTGACAAATGTGGC-3'
'EFG1' | CAEFGTF | EFG1 gene: putative transcription factor | 214 bp | F 5'-GGTCAACAGACTGGACAGACAGC-3'; R 5'-GGTATGGGGGCACCACTAGGAGC-3'
'EFG2' | CAEFGTF | EFG1 gene (see above) | 338 bp | F 5'-CACCTGCATCAGAACCAGG-3'; R 5'-GATGTTG’ITGGGGTGAAGGG-3'
'ERK1' | YSAERK1 | ERK1 gene: protein kinase | 170 bp | F 5'-CGACCACGTCATCAATACAAATCG-3'; R 5'-CG’ITGAATGAAACITGACGAGGGG-3'
For each primer pair, the GenBank locus and a brief description of the gene for which primers were designed are given. The size of the GenBank clone refers to the length of the DNA region in GenBank flanked by the forward and reverse primers.
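The "Size of GenBank clone" column in Table 2 is the length of the GenBank region spanned by each primer pair. A minimal sketch of that calculation, assuming exact primer matches on the given strand: the forward primer is located directly, and the reverse primer is located as its reverse complement further downstream. Whether the published sizes include the primer sequences is not stated; this sketch includes them. The template and primers are toy sequences, not the Table 2 primers or their GenBank entries, and `amplicon_length` is a hypothetical helper.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def amplicon_length(template, forward, reverse):
    """Length from the 5' end of the forward primer to the 5' end of the
    reverse primer (both included), or None if either site is missing."""
    template = template.upper()
    start = template.find(forward.upper())
    if start < 0:
        return None
    site = template.find(reverse_complement(reverse), start + len(forward))
    if site < 0:
        return None
    return site + len(reverse) - start

template = "TTTT" + "ACGTAC" + "GGGCCCAAA" + "GTTCCA" + "TTT"
print(amplicon_length(template, "ACGTAC", "TGGAAC"))  # 21
```

Adding the number of bases gained or lost in a repeat to this length gives the band size expected for a longer or shorter allele.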
2.5. Cloning and sequencing of ZNF1 and ERK1 alleles
Genomic DNA was amplified using the PCR protocol above, with 1 U of Taq polymerase (Perkin Elmer) substituted for Pfu, and cloned into E. coli using a TA cloning kit (Invitrogen). Individual alleles were sequenced using the Sequenase Version 2.0 DNA sequencing kit (USB) and -40 primers according to the manufacturer's specifications.
3. Results
Results of the GenBank search for all microsatellite sequences with di- to hexanucleotide motifs are summarized in Table 1 for both coding and non-coding regions of the genome. From these microsatellite sequences, the nine longest coding-region trinucleotide repeats were selected for amplification by PCR. The genes involved and the PCR primers designed are listed in Table 2. Overall length of the microsatellite sequence was the only criterion used in the initial GenBank search. These nine repeats were found to be grouped into three perfect, one interrupted, and two compound microsatellite sequences [15]. In addition to these nine repeats, which were selected for length, a region of the ERK1 gene was analyzed because the 5' region of this protein kinase gene shows a localized clustering of several very short repeats within a single 100 bp stretch of DNA.
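The ERK1 region was chosen because several very short repeats are packed into about 100 bp. The sketch below is one hypothetical way to flag such "clustered" regions: it finds perfect runs of 2-6 bp motifs with at least three copies and reports 100 bp windows that contain at least four such runs. Both thresholds and all names are our assumptions; the paper does not state a formal criterion.

```python
def short_runs(seq, min_copies=3, motif_sizes=range(2, 7)):
    """Sorted (start, motif, copies) for perfect runs with >= min_copies copies."""
    seq = seq.upper()
    runs = []
    for k in motif_sizes:
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            # skip motifs that are repeats of a shorter unit, e.g. 'CACA'
            reducible = any(k % d == 0 and motif == motif[:d] * (k // d)
                            for d in range(1, k))
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == motif:
                copies += 1
            if copies >= min_copies and not reducible:
                runs.append((i, motif, copies))
                i += copies * k
            else:
                i += 1
    return sorted(runs)

def clustered_windows(seq, window=100, min_runs=4, **kwargs):
    """(start, runs) for each run that begins a window holding >= min_runs runs."""
    runs = short_runs(seq, **kwargs)
    clusters = []
    for j, (start, _, _) in enumerate(runs):
        members = [r for r in runs[j:] if r[0] < start + window]
        if len(members) >= min_runs:
            clusters.append((start, members))
    return clusters

# Toy sequence built from short repeats and an arbitrary spacer; not the ERK1 sequence.
toy = "GATCG".join(["CA" * 4, "CAA" * 3, "GCTCAA" * 3, "CTT" * 4])
for start, members in clustered_windows(toy):
    print(start, [motif for _, motif, _ in members])  # 0 ['CA', 'CAA', 'GCTCAA', 'CTT']
```

None of these runs would pass the more-than-four-repeats filter used for Table 1, which is why a clustered locus like ERK1 is easy to miss in a length-only search.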
Eleven strains of C. albicans were typed using these seven pairs of PCR primers. All seven coding regions tested were polymorphic, displaying between two and seven variable-length alleles among the strains tested (Table 3). Mean heterozygosities across strains varied from 25 to 75% among these loci, and ten of the eleven strains could be uniquely genotyped. The two strains that remained indistinguishable are clinical isolates obtained at two different hospitals within the San Diego area. All strains but one were heterozygous for at least one locus, suggesting diploidy; strain 9201, however, was apparently homozygous at all loci, suggesting that it may be haploid.

Table 3
The longest coding trinucleotide repeats found in Candida albicans sequences deposited in GenBank (version 90.0), with the addition of a clustered set of short polymorphic repeats found in the gene ERK1
Gene | No. of unique genotypes found among 11 strains* | Total no. of alleles | Category of repeat | Average heterozygosity | Repeated sequences → polyamino acid tract
ZNF1 | 4 | 4 | perfect | 55% | (CAA)n → (Q)n
CCN2 | 5 | 6 | perfect | 75% | glutamine codons (CAA/CAG) → (Q)n
MNT2 | 5 | 4 | perfect | 25% | glutamine codons (CAA/CAG) → (Q)n
CPH1 | 3 | 2 | interrupted | 33% | glutamine codons (CAA/CAG) → (Q)n
EFG1 | 4 | 3 | compound | 41% | glutamine codons (CAA/CAG) → (Q)n
EFG2 | 2 | 4 | compound | 58% | glutamine codons (CAA/CAG) → (Q)n
ERK1 | 8 | 7 | clustered | 66% | (GCTCAA)n(CAA)n ... (GCAGCC)n ... (CTT)n → (QA)n(Q)n ... (A)n ... (S)n
For each gene, the type of repeat, the repeat, and the amino acid tract it encodes are given.
* The eleven strains tested were C. albicans. All loci failed to amplify specific products from a sample of C. krusei. The ZNF1 (CAA) repeat was independently sequenced from ATCC strains 14053 and 36232, and the three additional sequences at the ERK1 locus came from ATCC strains 14053 (196 bp), 36232 (175 bp) and 60193 (187 bp).

[Fig. 1: autoradiograph of ERK1 amplification products; lane labels 8244, ATCC 14053, ATCC 36232, ATCC 60193, C. krusei, 9692, 7253, 9201 and 9648]
Fig. 1. Genotypes obtained for eight C. albicans strains and one C. krusei strain at the ERK1 locus. Strain 9201, an apparent homozygote at this locus, was also apparently homozygous at the other loci tested, suggesting that it may be haploid.
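The text does not specify whether the average heterozygosities in Table 3 are observed proportions of heterozygous strains or expected heterozygosities computed from allele frequencies, so the sketch below computes both, together with the set of strains whose multilocus genotype is unique (the basis for "ten of the eleven strains could be uniquely genotyped"). The strain names, loci and allele sizes are hypothetical placeholders, not the study's data; an apparently haploid strain such as 9201 would simply appear homozygous here.

```python
from collections import Counter

# Hypothetical allele sizes (bp) for illustration; not data from this study.
GENOTYPES = {
    "S1": {"L1": (100, 103), "L2": (150, 156)},
    "S2": {"L1": (100, 100), "L2": (150, 162)},
    "S3": {"L1": (100, 103), "L2": (150, 156)},
    "S4": {"L1": (97, 100), "L2": (156, 156)},
}

def observed_heterozygosity(genotypes, locus):
    """Fraction of strains carrying two different allele sizes at the locus."""
    calls = [g[locus] for g in genotypes.values()]
    return sum(a != b for a, b in calls) / len(calls)

def expected_heterozygosity(genotypes, locus):
    """Gene diversity 1 - sum(p_i^2) from the pooled allele frequencies."""
    alleles = Counter(a for g in genotypes.values() for a in g[locus])
    total = sum(alleles.values())
    return 1 - sum((n / total) ** 2 for n in alleles.values())

def unique_strains(genotypes):
    """Strains whose multilocus genotype is shared with no other strain."""
    def key(g):
        return tuple(sorted((locus, tuple(sorted(pair))) for locus, pair in g.items()))
    counts = Counter(key(g) for g in genotypes.values())
    return sorted(s for s, g in genotypes.items() if counts[key(g)] == 1)

for locus in ("L1", "L2"):
    print(locus, observed_heterozygosity(GENOTYPES, locus),
          round(expected_heterozygosity(GENOTYPES, locus), 3))
print(unique_strains(GENOTYPES))  # ['S2', 'S4']: S1 and S3 are indistinguishable
```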
An autoradiograph displaying amplification products from the ERK1 locus for eight strains of C. albicans and one strain of C. krusei is shown in Fig. 1. DNA samples from C. krusei consistently yielded no specific amplification products under the PCR conditions detailed above for any of the seven loci tested, indicating considerable evolutionary divergence from C. albicans. Sequencing of four alleles at the ERK1 locus confirmed that four of the short microsatellites contributed to the length polymorphisms (Table 3); at this locus, even very short repeats can be polymorphic among strains. Differences in the length of the (CAA) repeat between the GenBank sequence and ATCC strains 14053 and 36232 accounted for two of the band sizes seen at ZNF1 (Table 3). To control for PCR-induced replication errors, all plasmids containing PCR products were 'typed' to verify that the length of the cloned allele exactly matched the length of the allele amplified from the target genomic DNA.
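The three ERK1 allele lengths given in the Table 3 footnote (196, 175 and 187 bp) allow a quick consistency check: in a translated region, a length change that is a whole number of codons keeps the reading frame intact, as expected if the differences come from tri- or hexanucleotide repeat units. The sketch below computes the pairwise differences; it shows only that the differences are multiples of 3, not which repeats changed. The helper name `size_differences` is our own.

```python
from itertools import combinations

# ERK1 allele lengths reported in the Table 3 footnote.
ERK1_ALLELES = {"ATCC 14053": 196, "ATCC 36232": 175, "ATCC 60193": 187}

def size_differences(alleles):
    """Pairwise (strain_a, strain_b, difference_bp, whole_codons) tuples."""
    result = []
    for (a, len_a), (b, len_b) in combinations(sorted(alleles.items()), 2):
        diff = abs(len_a - len_b)
        result.append((a, b, diff, diff % 3 == 0))
    return result

for a, b, diff, whole_codons in size_differences(ERK1_ALLELES):
    print(f"{a} vs {b}: {diff} bp, multiple of 3: {whole_codons}")
# ATCC 14053 vs ATCC 36232: 21 bp, multiple of 3: True
# ATCC 14053 vs ATCC 60193: 9 bp, multiple of 3: True
# ATCC 36232 vs ATCC 60193: 12 bp, multiple of 3: True
```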
4. Discussion
Microsatellites fall into three categories as defined by Weber [15]: perfect, compound, and interrupted. Weber found that perfect CA repeats were more polymorphic than imperfect repeats of the other two categories [15]. Further, for all categories of repeats, length appeared to be the best predictor of the degree of polymorphism [15], an observation that agreed with earlier findings (e.g. [16,17]).
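Weber's three categories can be made concrete with a small classifier over an ordered list of (motif, copies) blocks. This is a hypothetical heuristic reading of the definitions, not Weber's procedure: a single block is perfect; the same motif resumed after a short interruption (here at most one copy of something else) is interrupted; any other combination of adjacent repeat blocks is compound. The repeat counts in the examples are invented.

```python
def classify_repeat(blocks, max_interruption=1):
    """Classify ordered (motif, copies) blocks as perfect, interrupted or compound."""
    if len(blocks) == 1:
        return "perfect"
    for i in range(1, len(blocks) - 1):
        before, (_, copies), after = blocks[i - 1][0], blocks[i], blocks[i + 1][0]
        if before == after and copies <= max_interruption:
            return "interrupted"  # e.g. (CAA)n CAG (CAA)m
    return "compound"  # e.g. (CAA)n (CAG)m

print(classify_repeat([("CAA", 11)]))                           # perfect
print(classify_repeat([("CAA", 5), ("CAG", 1), ("CAA", 6)]))  # interrupted
print(classify_repeat([("CAA", 6), ("CAG", 5)]))              # compound
```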
We examined three perfect, two compound, and one interrupted microsatellite in Candida and found results consistent with Weber's observations. However, the seven polymorphic Candida regions typed here differ from the microsatellite-containing regions commonly employed in higher eukaryotes [13,15]. The microsatellite sequences are all extremely short, are found in translated regions, and consist of trinucleotide or hexanucleotide repeats, all but one of which code for polyglutamine tracts.
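The link between these repeats and the amino acid tracts in Table 3 follows from the genetic code: CAA and CAG both encode glutamine, GCT, GCA and GCC encode alanine, and TCT encodes serine. The sketch below translates a repeat in a chosen frame using only a small, hard-coded subset of the standard code (an assumption made to keep it short). It also shows that a (CTT)n run read two bases out of phase gives TCT codons, which is one way a run written as (CTT)n can encode a serine tract.

```python
# Small hard-coded subset of the standard genetic code (DNA codons).
CODONS = {
    "CAA": "Q", "CAG": "Q",              # glutamine
    "GCT": "A", "GCA": "A", "GCC": "A",  # alanine
    "TCT": "S",                          # serine
    "CTT": "L", "TTC": "F",              # leucine, phenylalanine
}

def translate(seq, frame=0):
    """Translate seq starting at offset `frame`; '?' marks codons not in CODONS."""
    seq = seq.upper()
    return "".join(CODONS.get(seq[i:i + 3], "?") for i in range(frame, len(seq) - 2, 3))

print(translate("CAA" * 3 + "CAG" * 2))  # QQQQQ
print(translate("GCTCAA" * 2))           # AQAQ
print(translate("CTT" * 4))              # LLLL
print(translate("CTT" * 4, frame=2))     # SSS
```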
A region of the ERK1 gene, which may belong to a fourth category of repeat, the 'clustered' microsatellite, was also selected for examination. 'Clustered' loci contain numerous non-contiguous but closely linked microsatellites. The clustered microsatellite found in ERK1 resembles a string of very short perfect microsatellites as traditionally defined. It remains to be determined whether the rapid accumulation of polymorphism in this region reflects an unusually elevated rate of polymerase slippage.
The relatively low heterozygosities found at these loci may be explained by the fact that they are translated into proteins, so that purifying selection acting upon them might be significant. Microsatellites appear to be relatively abundant in Candida, so the small number of alleles at each polymorphic locus can be compensated for by surveying a larger number of loci. This can be facilitated by multiplexing PCR primers: we have successfully multiplexed three to four pairs of PCR primers in a single PCR reaction using the same conditions as those for a single primer pair.
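Why several low-diversity loci can substitute for one highly variable locus can be seen from the probability of identity: the chance that two unrelated individuals share a genotype at a locus, multiplied across independent loci. The sketch uses the standard formula for diploids under Hardy-Weinberg proportions; both that assumption and the independence of loci are questionable for a largely clonal organism such as Candida, so the numbers are only illustrative, and the allele frequencies are hypothetical.

```python
from itertools import combinations
from math import prod

def locus_pi(freqs):
    """P(two random diploids share a genotype) at one locus, assuming HWE."""
    homozygotes = sum(p ** 4 for p in freqs)
    heterozygotes = sum((2 * p * q) ** 2 for p, q in combinations(freqs, 2))
    return homozygotes + heterozygotes

def combined_pi(loci):
    """Multiply single-locus values, assuming the loci are independent."""
    return prod(locus_pi(freqs) for freqs in loci)

two_equal_alleles = [0.5, 0.5]  # hypothetical frequencies
for n_loci in (1, 2, 4, 7):
    print(n_loci, round(combined_pi([two_equal_alleles] * n_loci), 4))
```

With two equally common alleles, a single locus leaves a 0.375 chance that two strains match, but four such loci already bring it below 0.02.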
Although only coding microsatellites were assayed in this study, non-coding microsatellites should be at least as polymorphic. They may, however, prove difficult to study in microorganisms because of high rates of evolutionary divergence in primer target sequences. Coding regions were amplified in this study because we were interested in examining the variability associated with polyglutamine tracts.
In Candida, as well as in a variety of other microorganisms [18], polyglutamine tracts, predominantly encoded by the CAA codon, appear to be abundant. In addition, glutamine repeats are found in a growing number of human triplet-repeat diseases [19], as well as in large numbers in evolutionarily diverse transcription factors [20], where they have been shown to be capable of modulating levels of transcription in vivo [20]. The fact that these glutamine repeats retain high levels of variability in natural populations suggests that such loci will be useful in epidemiological studies, in cloning microsatellites de novo from microorganisms using (CAA)n probes, and perhaps even in gaining an understanding of the role of glutamine tract variability using a simple eukaryotic model.
The approach taken here should prove useful in determining the detailed epidemiology of Candida infections. One example is the study of drug resistance. Since Candida is an asexual organism, drug resistance can evolve either through replacement of one clone by another or through de novo mutation within a clonal lineage. We have begun a study to determine which alternative is more likely during the course of infection in AIDS patients undergoing fluconazole prophylaxis. Large differences in Candida genotypes at different points during the course of an infection seem to be correlated with large changes in fluconazole resistance, suggesting clonal replacement (Field et al., manuscript in preparation). Microsatellite typing should be especially useful in a clinical setting, since PCR/Chelex-based methods should allow the rapid extraction of Candida DNA from sources such as blood and urine [21].
References
[1] Sarachek, A., Rhoads, D.D. and Schwarzhoff, R.H. (1981) Hybridization of Candida albicans through fusion of protoplasm. Arch. Microbiol. 129, 1-8.
[2] DuPont, P. (1995) Candida albicans, the opportunist. A cellular and molecular perspective. J. Am. Pediatric Med. Assoc. 85, 104-115.
[3] Merz, W. (1990) Candida albicans strain delineation. Clin. Microbiol. Rev. 3, 321-334.
[4] Hunter, P. (1991) A critical review of typing methods for Candida albicans and their applications. Crit. Rev. Microbiol. 17, 417-434.
[5] Ernst, J. (1990) Molecular genetics of pathogenic fungi: some recent developments and perspectives. Mycoses 33, 225-229.
[6] Pfaller, M. (1992) The use of molecular techniques for epidemiologic typing of Candida species. Curr. Topics Med. Mycol. 4, 43-63.
[7] Lieckfeldt, E., Meyer, W. and Borner, T. (1993) Rapid identification and differentiation of yeasts by DNA and PCR fingerprinting. J. Basic Microbiol. 33, 413-425.
[8] Meyer, W., Lieckfeldt, E., Kuhls, K., Freedman, E., Borner, T. and Mitchell, T. (1993) DNA- and PCR-fingerprinting in fungi. EXS 67, 311-320.
[9] Sullivan, D., Bennett, D., Henman, M., Harwood, P., Flint, S., Mulcahy, F., Shanley, D. and Coleman, D. (1993) Oligonucleotide fingerprinting of isolates of Candida species other than C. albicans and of atypical Candida species from human immunodeficiency virus-positive and AIDS patients. J. Clin. Microbiol. 31, 2124-2133.
[10] Tautz, D. (1989) Hypervariability of simple sequences as a general source for polymorphic DNA markers. Nucleic Acids Res. 17, 6463-6471.
[11] Weber, J. and May, P. (1989) Abundant class of human DNA polymorphisms which can be typed using the polymerase chain reaction. Am. J. Human Genetics 44, 388-396.
[12] Strand, M., Prolla, T., Liskay, R. and Petes, T. (1994) Destabilization of tracts of simple repetitive DNA in yeast by mutations affecting DNA mismatch repair. Nature 365, 274-276.
[13] Murray, J., Buetow, K., Weber, J., Ludwigsen, S., Scherpbier-Heddema, T., Manion, F., Quillen, J., Sheffield, V., Sunden, S., Duyk, G., et al. (1994) A comprehensive human linkage map with centimorgan density. Cooperative Human Linkage Center (CHLC). Science 265, 2049-2054.
[14] Walsh, P., Metzger, D. and Higuchi, R. (1991) Chelex 100 as a medium for simple extraction of DNA for PCR-based typing from forensic material. BioTechniques 10, 506-513.
[15] Weber, J.L. (1990) Informativeness of human (dC-dA)n·(dG-dT)n polymorphisms. Genomics 524, 524-530.
[16] Levinson, G. and Gutman, G. (1987) High frequencies of short frameshifts in poly-CA/TG tandem repeats borne by bacteriophage M13 in Escherichia coli K-12. Nucleic Acids Res. 15, 5323-5338.
[17] Chung, M., Ranum, L., Duvick, L., Servadio, A., Zoghbi, H. and Orr, H. (1992) Evidence for a mechanism predisposing to intergenerational CAG repeat instability in spinocerebellar ataxia type 1. Nature Genetics 5, 254-258.
[18] Field, D. and Wills, C. (1996) Long, polymorphic microsatellites in simple organisms. Proc. Royal Soc. Lond. B 263, 209-215.
[19] Sutherland, G. and Richards, R. (1995) Simple tandem DNA repeats and human genetic disease. Proc. Natl. Acad. Sci. USA 92, 3636-3641.
[20] Gerber, H., Seipel, K., Georgiev, O., Hofferer, M., Hug, M., Rusconi, S. and Schaffner, W. (1994) Transcriptional activation modulated by homopolymeric glutamine and proline stretches. Science 263, 808-811.
[21] Buchman, T., Rossier, M., Merz, W. and Charache, P. (1990) Detection of surgical pathogens by in vitro DNA amplification. Part I. Rapid identification of Candida albicans by in vitro amplification of a fungus-specific gene. Surgery 108, 338-346.