Microsatellite Variation, Repeat Array Length, and Population History of
Plasmodium vivax
M. Imwong,* D. Sudimack,† S. Pukrittayakamee,* L. Osorio,‡ J. M. Carlton,§ N. P. J. Day,*∥ N. J. White,*∥ and T. J. C. Anderson†
*Department of Clinical Tropical Medicine, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand; †Department of Genetics, Southwest Foundation for Biomedical Research, San Antonio, Texas; ‡Malaria Research Group, International Centre for Medical Research and Training, Cali, Colombia; §The Institute for Genomic Research, Rockville, Maryland; and ∥Centre for Tropical Medicine and Vaccinology, Churchill Hospital, Oxford, United Kingdom
A recent paper (Leclerc et al. 2004) described limited
variation in dinucleotide microsatellites from Plasmodium
vivax, suggesting very recent bottlenecks or genome-wide
selective events. We describe patterns of variation in 11 dinucleotide microsatellites in P. vivax populations from Colombia, India, and Thailand. We find abundant variation
with heterozygosity of 0.64, 0.76, and 0.77, respectively,
in the three countries. The discrepancy between the results of these
two studies is simply explained by differences
in the size of repeat arrays. The microsatellites studied
by Leclerc et al. (2004) have very few repeats (median
5.5, range 4–13) and so would not be expected to be variable. Plasmodium vivax microsatellites show comparable
levels of variation to those in Plasmodium falciparum when
repeat array length is taken into account and provide no support for recent bottlenecks or widespread selective purging
of variation from the genome of P. vivax.
The unusual patterns of variation in the P. falciparum
genome have generated a lively debate about parasite origins and evolutionary history (Su, Mu, and Joy 2003; Hartl
2004). Recent studies have also revealed conflicting views
on the ancestry of the related parasite Plasmodium vivax.
Sequencing studies of both mitochondrial DNA and nuclear
genes suggest a most recent common ancestor between
200,000 and 314,000 years ago in P. vivax (Feng et al.
2003; Escalante et al. 2005; Jongwutiwes et al. 2005). However, patterns of microsatellite variation muddy the picture.
Leclerc et al. (2004) isolated 13 microsatellite sequences
and found that 9/13 were monomorphic in eight populations examined, while of the remaining four loci only one
showed extensive polymorphism. Because microsatellite
repeats characteristically show high mutation rates relative
to single nucleotide polymorphisms (Ellegren 2004), these
data might suggest either expansion from a recent bottleneck
(<10,000 years ago) and/or the recent removal of variation
as a consequence of multiple selective events. However,
such recent events are also expected to remove sequence
variation, which clearly has not occurred. It therefore
seems likely that there is an alternative explanation for
the meager variation observed in the microsatellite data
of Leclerc et al. (2004).
Key words: microsatellite, array length, heterozygosity, selection, bottleneck.
Mol. Biol. Evol. 23(5):1016–1018. 2006
doi:10.1093/molbev/msj116
Advance Access publication March 1, 2006
© The Author 2006. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved.
To further evaluate microsatellite variation in P. vivax,
we screened the unpublished genome sequence data generated by The Institute for Genomic Research (TIGR) (http://www.tigr.org) for repeats using TANDEM REPEAT
FINDER (Benson 1999) and designed oligos to amplify
16 dinucleotide microsatellite sequences. Five microsatellites amplified poorly or were not interpretable and were
discarded. The remaining markers were assigned to chromosomes by comparison with the draft genome sequence
for P. vivax. Nine of the markers were each found on different chromosomes, while two were situated on short contigs that have not yet been assigned to chromosomes (table
1). We measured length variation in these 11 markers in P.
vivax populations from Thailand (n = 28), India (n = 27),
and Colombia (n = 27). Genotyping was performed on an
ABI 3100 capillary sequencer using GENESCAN and
GENOTYPER software, and products were sized by comparison to LIZ-500 size standards (table 1). The samples
from Thailand were collected from patients visiting the Hospital for Tropical Diseases in Bangkok, Indian samples
were collected from symptomatic patients at Calcutta
School of Tropical Medicine, while Colombian parasites
were collected from five different locations (Quibdo,
Buenaventura, Guapi, and Tumaco on the coast west of
the Andes and Amazonas state to the east of the Andes).
All samples were collected with ethical permission from
review boards in Thailand, India, and Colombia and from
the Institutional Review Board of the University of Texas at
San Antonio. We measured expected heterozygosity (He) at
each locus using the formula $H_e = [n/(n-1)]\,(1 - \sum_i p_i^2)$,
where $p_i$ is the frequency of the $i$th allele and $n$ is the number
of alleles sampled. Where multiple alleles were observed
within an infection, suggesting that >1 clone is present,
we used only the predominant allele for calculation of He.
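The heterozygosity calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' own code: the function name `expected_heterozygosity` and the allele sizes in the example are hypothetical, and the input is assumed to be one predominant allele call per infection.

```python
from collections import Counter


def expected_heterozygosity(alleles):
    """Return He = [n/(n-1)] * (1 - sum(p_i^2)) for a list of allele calls.

    `alleles` holds one allele (e.g., fragment size in bp) per sample;
    n is the number of alleles sampled, p_i the frequency of allele i.
    """
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two sampled alleles")
    counts = Counter(alleles)
    sum_sq_freq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq_freq)


# Hypothetical allele sizes (bp) at one locus, for illustration only.
example_calls = [262, 262, 264, 266, 266, 266, 270, 274]
print(round(expected_heterozygosity(example_calls), 3))  # prints 0.857
```

A monomorphic locus gives He = 0 under this formula, and the n/(n − 1) term corrects for the small number of alleles sampled per population.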
All markers examined were polymorphic with 7–18 alleles
per locus and mean expected heterozygosity (He) ± standard
deviation (SD) of 0.64 ± 0.25, 0.76 ± 0.15, and 0.77 ± 0.18
in Colombia, India, and Thailand, respectively. These data
and the high diversity observed at a single microsatellite
sequence by Gomez et al. (2003) demonstrate that many
microsatellites show high levels of variation in P. vivax.
What might explain the difference between these studies and the meager variation observed in the data of Leclerc
et al. (2004)? Microsatellite variation is strongly dependent
on the length of repeat arrays. Evidence for this comes
from experimental studies in microorganisms (Wierdl,
Dominska, and Petes 1997) and from observing mutation
in pedigree studies (Brohede, Moller, and Ellegren 2004).
Furthermore, descriptive studies of numerous organisms
including Plasmodium (Anderson et al. 2000) invariably
show higher levels of variation in loci with long repeat arrays than those with short repeat arrays (Ellegren 2004) and
highlight the importance of standardizing measures of genetic variation by repeat array length (Petit et al. 2005).
The relationship between array length and genetic variability is not linear. There is a lower threshold length below which slippage mutations are rare and an exponential increase in slippage with increasing repeat number (Lai and Sun 2003). There is a simple explanation for the minimal diversity observed in the data of Leclerc et al. (2004): the microsatellite sequences isolated by these authors have very short repeat arrays (median = 5.5, range 4–13) and so would not be expected to show high levels of variation. In contrast, we examined microsatellites with 12–18 repeats (median = 16) in the genome sequence strain (Salvador 1). Interestingly, the single locus showing elevated variation in the data of Leclerc et al. (2004) had 13 repeats in the sequenced clone (fig. 1).

Table 1
Dinucleotide Loci Amplified from Plasmodium vivax

Name      Array Length  Motif  Min (bp)  Max (bp)  Label
14.185    13.5          AT     262       290       VIC
12.335    16            AT     155       197       VIC
7.67      15            AT     100       134       VIC
NA.1276   13.5          AT     125       169       PET
NA.2208   13.5          AT     155       173       6FAM
4.2771    14            AT     82        100       6FAM
8.332     17.5          AT     222       260       NED
6.34      16.5          AC     136       166       PET
2.21      17.5          AC     91        129       VIC
10.29     16            AT     102       130       NED
3.35      18            AT     111       135       6FAM

Primer sequences (External Forward, Forward, and Reverse columns), listed in source order:
GCAGATATGCTGTCGAATTT
AAAAATGGAGACATGGAAGA
AACAAATTGTGGGTAGATGC
AAAATTTTAACAAGCCTGAAA
TCTCCTTGAAAATGTAAATTGAT
AATTGGTTTTTAATTGGGAAT
GAGAAGGTAACCCCAAAGAG
TTAAGCTTCTGCATGCTCTT
GTGCCATCTGCTCAAATC
TGAAGCGGCATATATGTAAA
CCAAGTAGAGAAAGGGAAAA
TTGTATTAACAATGGGCAGA
TTATAACCTTCGGGGTTTTT
AAAATGCACCTCTTTCATTC
TGGTAAAACAGGAATACGAAA
GAATTCATGCAAAAGAACTGT
CGGAACTTTTATATCGCATC
CGAATTTTATAGGGGGAGAC
GCTATGCATGTGTGGATGT
TAACCCTCTATCGCTCTCAC
TATCATGATCCTGCGCTAA
GCCACAGGATGTACATAAGA
ATGGTTTCTGTTGCCAGTT
TTAGTTCCAGCAAAACCTTC
ATGAGGTTTTCACGTTGTTC
ACGATTTATTAAAAAGACTATGA
TTGTATCAGTTAAACAAATATGACA
AAAAATAGGGAATTTTCGTT
CTTCTAAGCGTGAGCAGTTT
CAAATCATGGTAGCCTCCTA
GTGGGGTTGTTTAGCTTGT
GTACCCATTTTGTGTACGAG
TGAGAGGAGCCTACTGTGAT

NOTE: Loci were named after the chromosome on which they are located in the genome sequence data of the Salvador 1 strain of P. vivax (http://www.tigr.org/tdb/e2k1/pva1/intro.shtml), followed by a numeric identifier. The two markers labeled "NA" could not be assigned to chromosomes. Repeat array length was measured as the longest string of uninterrupted dinucleotide repeats in the genome sequence strain. The minimum and maximum values show the range of allele sizes observed in the field samples. We amplified each locus from DNA prepared from finger-prick blood samples using seminested amplification. External forward and reverse primers were used in the first round of polymerase chain reaction (PCR), while fluorescent end-labeled forward oligos and reverse oligos were used for the second round PCR. All amplification reactions (10 μl) contained 2.5 mM MgCl2, 125 μM of each of the four deoxynucleoside triphosphates, PCR buffer, and 0.4 units of TaKaRa polymerase (Takara Bio Inc., Otsu, Shiga, Japan). Primary amplification reactions contained 2 μl of the template genomic DNA, and 1 μl of the product of these reactions was used to initiate the secondary amplification reaction. The cycling parameters for all PCR reactions were 95°C, 5 min; (94°C, 30 s; 52°C, 30 s; 72°C, 30 s) × 25; and 72°C, 2 min.

FIG. 1.—Relationship between repeat array length and microsatellite variation in Plasmodium vivax. Unfilled circles show He estimated from the complete data set in Leclerc et al. (2004). Only four loci were polymorphic, and plotting data from individual locations results in similar patterns and conclusions. The numbers of perfect repeats for each locus were counted from the submitted sequences in GenBank (AY391730, AY391732–40, and AY391742–44). Points have been offset slightly to show all data. The solid triangles, diamonds, and circles show He for the 11 loci examined in this study in Colombia (COL), India (IND), and Thailand (THAI). The numbers of perfect repeats were counted from unpublished genome sequence data for parasite isolate Salvador 1 (http://www.tigr.org/tdb/e2k1/pva1/) from which the oligos were designed.
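Because the argument rests on repeat array length, it may help to show how that quantity can be counted. The note to Table 1 defines it as the longest string of uninterrupted dinucleotide repeats in the genome sequence strain. The sketch below is one way to count it, not the authors' procedure: the function name `longest_dinucleotide_array` and the test sequence are hypothetical, and counting a trailing half unit as 0.5 is an assumption made to accommodate fractional values such as 13.5 in Table 1.

```python
def longest_dinucleotide_array(seq):
    """Return (repeat_count, motif) for the longest uninterrupted
    dinucleotide run in `seq`.

    A run is a maximal stretch in which every base equals the base two
    positions earlier and the two alternating bases differ (so
    homopolymers are ignored). repeat_count is run length in bases / 2,
    which lets a trailing half unit count as 0.5 (an assumption).
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        return 0.0, ""
    best_len, best_start = 0, 0
    start = 0
    for i in range(2, n + 1):
        # The current run ends at the sequence end or where period 2 breaks.
        if i == n or seq[i] != seq[i - 2]:
            run_len = i - start
            if seq[start] != seq[start + 1] and run_len > best_len:
                best_len, best_start = run_len, start
            start = i - 1  # a new run may begin at the previous base
    if best_len == 0:
        return 0.0, ""
    return best_len / 2, seq[best_start:best_start + 2]


# Hypothetical sequence: flanking bases around ATATATATA (4.5 AT units).
print(longest_dinucleotide_array("GGCATATATATAGGC"))  # prints (4.5, 'AT')
```

The motif is reported in the phase in which the run starts, so an array read as "TA" here is the same array as "AT" on a shifted frame.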
Our data from Colombia, India, and Thailand reveal
levels of variation comparable to those previously reported for P. falciparum. For example, Nair et al.
(2003) sampled 58 P. falciparum dinucleotide microsatellite markers from Chr 1, 2, 3, and 12 in parasites from the
Thailand-Burma border. Of these, 24 had between 12 and
18 pure AT repeats (median = 16) in the genome sequence
and a mean He ± SD of 0.82 ± 0.08. Similarly, 20 dinucleotide microsatellites with pure repeat arrays of 12–18 repeats
(median = 16) sampled from across the genome in 12 parasite isolates from worldwide locations had mean He ± SD
of 0.81 ± 0.08 (Anderson et al. 2000). Hence, dinucleotide
microsatellites from P. vivax show comparable variation
to P. falciparum microsatellites with similar repeat array
length and structure.
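The comparison above restricts each data set to loci within the same repeat-length window before summarizing He. A minimal sketch of that step follows. The function name `summarize_he`, the use of the sample standard deviation, and the example values are all assumptions for illustration; the values are not data from any of the studies cited.

```python
from statistics import mean, stdev


def summarize_he(loci, min_repeats=12, max_repeats=18):
    """Return (count, mean He, SD of He) for loci whose repeat array
    length falls within [min_repeats, max_repeats].

    `loci` is an iterable of (repeat_array_length, He) pairs.
    Uses the sample SD (n - 1 denominator), which is an assumption.
    """
    selected = [he for repeats, he in loci
                if min_repeats <= repeats <= max_repeats]
    if len(selected) < 2:
        raise ValueError("need at least two loci in the length window")
    return len(selected), mean(selected), stdev(selected)


# Hypothetical (repeat length, He) pairs, for illustration only.
toy_loci = [(5.5, 0.00), (13.0, 0.60), (16.0, 0.80), (18.0, 0.90), (22.0, 0.95)]
count, mean_he, sd_he = summarize_he(toy_loci)
print(count, round(mean_he, 2), round(sd_he, 2))  # prints 3 0.77 0.15
```

Applying the same window to both species keeps short, near-invariant arrays from pulling down one mean and not the other.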
These data demonstrate the importance of accounting
for repeat array length when interpreting microsatellite data.
Plasmodium vivax microsatellite sequences show comparable levels of variation to those seen in P. falciparum when
repeat array length is taken into account and provide little
support for recent origins or multiple selective events in this
species. Rather, microsatellite sequences with short repeat
arrays such as those isolated by Leclerc et al. (2004) would
be expected to have very low mutation rates. While microsatellites are considerably less common in the P. vivax genome than in the AT-rich P. falciparum genome and also
tend to be shorter in length, these markers can still provide
useful tools for assessing population structure and for
searching for evidence of recent selection events associated
with drug resistance.
Acknowledgments
Preliminary sequence data from which microsatellite
primers were designed were obtained from TIGR (http://www.tigr.org). Funding for the P. vivax sequencing project
came from the National Institute of Allergy and Infectious
Diseases, the U.S. Department of Defense, and the Burroughs Wellcome Fund. Financial support was provided
by a Wellcome Trust fellowship to M.I. and National Institutes of Health (NIH) grant R01 AI48071 to T.J.C.A.
N.J.W. and N.P.J.D. were supported by the Wellcome Trust
of Great Britain. This investigation was conducted in facilities constructed with support from Research Facilities Improvement Program Grant Number C06 RR013556 from
the National Center for Research Resources, NIH.
Literature Cited
Anderson, T. J., X. Z. Su, A. Roddam, and K. P. Day. 2000. Complex mutations in a high proportion of microsatellite loci from
the protozoan parasite Plasmodium falciparum. Mol. Ecol.
9:1599–1608.
Benson, G. 1999. Tandem repeats finder: a program to analyze
DNA sequences. Nucleic Acids Res. 27:573–580.
Brohede, J., A. P. Moller, and H. Ellegren. 2004. Individual variation in microsatellite mutation rate in barn swallows. Mutat.
Res. 545:73–80.
Ellegren, H. 2004. Microsatellites: simple sequences with complex evolution. Nat. Rev. Genet. 5:435–445.
Escalante, A. A., O. E. Cornejo, D. E. Freeland, A. C. Poe,
E. Durrego, W. E. Collins, and A. A. Lal. 2005. A monkey’s
tale: the origin of Plasmodium vivax as a human malaria
parasite. Proc. Natl. Acad. Sci. USA 102:1980–1985.
Feng, X., J. M. Carlton, D. A. Joy, J. Mu, T. Furuya, B. B. Suh,
Y. Wang, J. W. Barnwell, and X. Z. Su. 2003. Single-nucleotide
polymorphisms and genome diversity in Plasmodium vivax.
Proc. Natl. Acad. Sci. USA 100:8502–8507.
Gomez, J. C., D. T. McNamara, M. J. Bockarie, J. K. Baird,
J. M. Carlton, and P. A. Zimmerman. 2003. Identification
of a polymorphic Plasmodium vivax microsatellite marker.
Am. J. Trop. Med. Hyg. 69:377–379.
Hartl, D. L. 2004. The origin of malaria: mixed messages from
genetic diversity. Nat. Rev. Microbiol. 2:15–22.
Jongwutiwes, S., C. Putaporntip, T. Iwasaki, M. U. Ferreira, H.
Kanbara, and A. L. Hughes. 2005. Mitochondrial genome
sequences support ancient population expansion in Plasmodium vivax. Mol. Biol. Evol. 22:1733–1739.
Lai, Y., and F. Sun. 2003. The relationship between microsatellite
slippage mutation rate and the number of repeat units. Mol.
Biol. Evol. 20:2123–2131.
Leclerc, M. C., P. Durand, C. Gauthier, S. Patot, N. Billotte,
M. Menegon, C. Severini, F. J. Ayala, and F. Renaud. 2004.
Meager genetic variability of the human malaria agent Plasmodium vivax. Proc. Natl. Acad. Sci. USA 101:14455–14460.
Nair, S., J. T. Williams, A. Brockman et al. (12 co-authors). 2003.
A selective sweep driven by pyrimethamine treatment in SE
Asian malaria parasites. Mol. Biol. Evol. 20:1526–1536.
Petit, R. J., M. F. Deguilloux, J. Chat, D. Grivet, P. Garnier-Gere,
and G. G. Vendramin. 2005. Standardizing for microsatellite
length in comparisons of genetic diversity. Mol. Ecol.
14:885–890.
Su, X. Z., J. Mu, and D. A. Joy. 2003. The ‘‘Malaria’s Eve’’ hypothesis and the debate concerning the origin of the human
malaria parasite Plasmodium falciparum. Microbes Infect.
5:891–896.
Wierdl, M., M. Dominska, and T. D. Petes. 1997. Microsatellite
instability in yeast: dependence on the length of the microsatellite. Genetics 146:769–779.
Laura Katz, Associate Editor
Accepted February 24, 2006