International Journal of Systematic and Evolutionary Microbiology (2015), 65, 2748 – 2760
DOI 10.1099/ijs.0.000273
Occurrence, distribution and possible functional
roles of simple sequence repeats in phytoplasma
genomes
Wei Wei, Robert E. Davis, Xiaobing Suo and Yan Zhao
Correspondence
Yan Zhao
Molecular Plant Pathology Laboratory, USDA-Agricultural Research Service, Beltsville, MD 20705,
USA
Phytoplasmas are unculturable, cell-wall-less bacteria that parasitize plants and insects. This
transkingdom life cycle requires rapid responses to vastly different environments, including
transitions from plant phloem sieve elements to various insect tissues and alternations among
diverse plant hosts. Features that enable such flexibility in other microbes include simple
sequence repeats (SSRs) — mutation-prone, phase-variable short DNA tracts that function as
‘evolutionary rheostats’ and enhance rapid adaptations. To gain insights into the occurrence,
distribution and potentially functional roles of SSRs in phytoplasmas, we performed
computational analysis on the genomes of five completely sequenced phytoplasma strains,
‘Candidatus Phytoplasma asteris’-related strains OYM and AYWB, ‘Candidatus Phytoplasma
australiense’-related strains CBWB and SLY and ‘Candidatus Phytoplasma mali’-related strain
AP-AT. The overall density of SSRs in phytoplasma genomes was higher than in representative
strains of other prokaryotes. While mono- and trinucleotide SSRs were significantly
overrepresented in the phytoplasma genomes, dinucleotide SSRs and other higher-order SSRs
were underrepresented. The occurrence and distribution of long SSRs in the prophage islands
and phytoplasma-unique genetic loci indicated that SSRs played a role in compounding the
complexity of sequence mosaics in individual genomes and in increasing allelic diversity among
genomes. Findings from computational analyses were further complemented by an examination
of SSRs in varied additional phytoplasma strains, with a focus on potential contingency genes.
Some SSRs were located in regions that could profoundly alter the regulation of transcription
and translation of affected genes and/or the composition of protein products.
INTRODUCTION
Simple sequence repeats (SSRs), also termed microsatellites,
are hypermutable DNA tracts that consist of tandemly
repeated nucleotide motifs of one to several bases (van
Belkum et al., 1998). SSRs are ubiquitous in the genomes
of all studied eukaryotes and prokaryotes, lying across various regions of resident loci including protein-coding
regions, 5′- and 3′-untranslated regions (UTRs), introns
and non-transcribed genomic regions (Field & Wills, 1998;
van Belkum et al., 1998; Gur-Arie et al., 2000; Coenye &
Vandamme, 2005; Trivedi, 2006). The distribution of SSRs
among different loci within a genome is non-random, and
the abundances (or rarities) of individual SSR motifs
are independent of the nucleotide compositions of the
Abbreviations: AMP, antigenic membrane protein; IMP, immunodominant
membrane protein; SSR, simple sequence repeat; SVM, sequence-variable mosaic
Four supplementary tables are available with the online Supplementary
Material.
corresponding genomes (Katti et al., 2001; Li et al., 2002,
2004). For example, in Escherichia coli K-12, mononucleotide and trinucleotide SSRs are significantly overrepresented,
whereas dinucleotide and tetranucleotide repeats are underrepresented in genes related to stress responses (Rocha et al.,
2002). In the human genome, while SSRs of every possible
motif of mono-, di-, tri- and tetranucleotide are enormously
overrepresented (Ellegren, 2004), mono- and dinucleotide
repeats are particularly dense in genes encoding transforming acidic coiled-coil proteins that are implicated in many
types of cancer (Trivedi, 2013).
During DNA replication, unequal crossing over and related
processes, SSRs are susceptible to mutations due to DNA
polymerase slippage (Chistiakov et al., 2005). Such mutations
add repeat units to or subtract them from existing SSRs, causing SSR expansion and contraction. While the slippage rate is
to some extent correlated with the length of the SSR motif, slippage can occur without a minimal threshold length
(Leclercq et al., 2010). SSR expansions and contractions
play significant roles in genome instability, evolution and
recombination (Ellegren, 2004; Loire et al., 2013). Because
of their abundance and high polymorphism, SSRs provide
informative molecular markers in genome mapping, population genetics and biological diversity studies (Ellegren,
2004).
In prokaryotes, SSR expansions and contractions are not
only frequent but also readily reversible (Kashi & King,
2006). Since alterations in SSR repeat number within regulatory or coding regions may directly affect the expression
of the associated genes or the functions of the encoded proteins, SSRs often function as contingency loci in bacterial
genomes, providing an evolutionary ‘tuning knob’ for
rapid adaptation to a changing environment (King, 1994;
Kashi et al., 1997; Young et al., 2000; Bayliss et al., 2001;
Karlin et al., 2002). SSRs acting as contingency loci are particularly common among mycoplasmas (Rocha & Blanchard, 2002; Kashi & King, 2006; Mrázek, 2006). In
pathogenic bacteria, certain contingency loci are associated
with genes that encode cell-surface antigens and may play
a role in evading host immune responses (Moxon et al.,
2006). For example, in Mycoplasma gallisepticum, the
number of trinucleotide GAA repeats upstream of a haemagglutinin (adhesin) gene, and hence the spacing between the
repeats and the core promoter, influences the expression
levels of the adhesin gene, accounting for phenotypic variations (Liu et al., 2002). Likewise, in Mycoplasma pulmonis,
the length of a tandem repeat within the gene segment
encoding the C terminus variable region of a surface antigen
correlates with the virulence level of the pathogen (Simmons
et al., 2004).
Phytoplasmas are insect-transmitted, plant-phloem-inhabiting bacteria responsible for numerous diseases in
diverse plant species worldwide (Doi et al., 1967; Tsai,
1979; Lee et al., 2000). Along with mycoplasmas and
other cell-wall-less bacteria, phytoplasmas are classified in
the class Mollicutes. So far, diverse phytoplasmas belonging
to 37 candidate species of ‘Candidatus Phytoplasma’ and 32
major groups have been delineated based on 16S rRNA
gene sequences (Wei et al., 2007; Zhao et al., 2009; Harrison et al., 2014). The nature of the phytoplasma life cycle
requires rapid adaptation to exceptionally variable host
environments, including sequential transitions from plant
phloem sieve elements to the insect intestinal tract, haemolymph and salivary glands, and shuttling back to the
phloem elements of a new plant host. While it would be
interesting to learn whether SSRs play a role in the rapid
adaptation of phytoplasmas to their vastly different host
environments during transkingdom host switching, tools
for functional assessment of SSRs are currently lacking
due to our inability to establish phytoplasma cultures in
cell-free medium and the consequent inaccessibility of
measurable phenotypic characters. However, the availability of complete genome sequence data from five phytoplasma strains (Oshima et al., 2004; Bai et al., 2006; Kube
et al., 2008; Tran-Nguyen et al., 2008; Andersen et al.,
2013) provides an opportunity to investigate the occurrence, distribution and potential functional roles of
SSRs in phytoplasma genomes using bioinformatics tools.
We found that, in phytoplasma genomes, SSRs are abundant and diverse in motif, repeat number and pattern of
chromosomal distribution. The SSRs that occur in prophage islands and other phytoplasma-unique genetic loci
undoubtedly compound the complexity of sequence
mosaics within individual genomes and increase allelic
diversity among genomes.
METHODS
Identification of SSRs in phytoplasma genomes. The assembled,
complete genome sequences of five phytoplasma strains were
retrieved from the National Center for Biotechnology Information
(NCBI) nucleotide sequence database. The strains were ‘Candidatus
Phytoplasma asteris’-related onion yellows mild strain (OY-M;
GenBank accession no. NC_005303) and aster yellows witches’-broom strain (AY-WB; NC_007716), ‘Ca. Phytoplasma australiense’-related cottonbush witches’-broom strain (CBWB; NC_010544) and
strawberry lethal yellows strain (SLY; NC_021236) and ‘Ca. Phytoplasma mali’-related apple proliferation strain AT (AP-AT;
NC_011047). The motifs, repeat numbers and locations of SSRs in the
phytoplasma genomes were identified with a computer program
developed by Gur-Arie et al. (2000). The range of motif length was set
to 1–10 nt, and the minimum number of repeats was set at 3. In the
final SSR tally and SSR density calculation, only SSRs that were equal
to or longer than 6 nt were counted (i.e. mononucleotide repeats
shorter than 6 nt were excluded). The mean density of SSRs in a given
genome was calculated by dividing the final count of SSRs by the
length (kb) of the respective genome.
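The identification criteria above can be illustrated with a minimal sketch. This is not the program of Gur-Arie et al. (2000); it is a hypothetical re-implementation that assumes a plain scan for perfect tandem repeats, reports each tract under its shortest repeating unit (so AAAAAA is tallied as a mononucleotide repeat rather than as AA or AAA repeats) and resumes scanning after each reported tract. The names `SSR`, `find_ssrs` and `ssr_density` are illustrative only.

```python
from typing import List, NamedTuple


class SSR(NamedTuple):
    start: int    # 0-based position of the first base of the tract
    motif: str    # repeat unit, e.g. "AT"
    repeats: int  # number of tandem copies of the motif


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a tandem repeat of a shorter unit."""
    k = len(motif)
    return not any(k % d == 0 and motif[:d] * (k // d) == motif
                   for d in range(1, k))


def find_ssrs(seq: str, max_motif: int = 10, min_repeats: int = 3,
              min_length: int = 6) -> List[SSR]:
    """Scan for perfect tandem repeats with motifs of 1..max_motif nt."""
    seq = seq.upper()
    hits = []
    for k in range(1, max_motif + 1):
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            if set(motif) <= set("ACGT") and is_primitive(motif):
                j = i + k
                while seq[j:j + k] == motif:   # extend the tract copy by copy
                    j += k
                repeats = (j - i) // k
                # keep tracts with >= 3 copies that are also >= 6 nt long
                if repeats >= min_repeats and j - i >= min_length:
                    hits.append(SSR(i, motif, repeats))
                    i = j                      # resume after the tract
                    continue
            i += 1
    return hits


def ssr_density(seq: str) -> float:
    """Mean SSR density: number of SSRs per kb of sequence."""
    return len(find_ssrs(seq)) / (len(seq) / 1000)
```

On the toy sequence GGAAAAAAGGATATATCC, this sketch reports one A tract of six copies and one AT tract of three copies; a run of five A residues would be ignored because it is shorter than 6 nt.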
SSR frequency assessment. To assess whether the observed SSRs
in the phytoplasma genomes were merely chance associations
of nucleotides, the observed frequency of each SSR motif in a given
phytoplasma genome was compared with the occurrence of the
same SSR motif in a corresponding randomized virtual genome (a
computer-generated random genome with the same base composition
as the actual genome of a given phytoplasma). A computer script was
developed in this study based on the random function (rand) of the
Perl programming language (version 5.10; http://www.perl.org/). One
hundred virtual genomes were generated for each of the five phytoplasma strains using the Perl script. The virtual genomes were then
analysed using the same SSR identification program with the same
parameters described above. The mean frequency and standard error
for each SSR motif in the 100 virtual genomes were calculated.
Statistical significance between the frequency of a given SSR motif
observed from a given actual phytoplasma genome and the mean
frequency of the same SSR motif observed from the corresponding
100 virtual genomes was assessed with two-tailed t-tests. Statistical
analysis was conducted using the Microsoft Excel program (Microsoft
Office Suite 2007).
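The randomization test can be sketched as follows. The original script was written in Perl and is not reproduced in the article, so this Python version is only an approximation under stated assumptions: each virtual genome is produced by shuffling the real sequence (which preserves base composition exactly), tracts are counted for a single motif with a regular expression, and the comparison is expressed as a one-sample t statistic. The names `count_tracts`, `virtual_genome` and `assess_motif` are hypothetical; converting the statistic to a P value would need a t distribution from a statistics package and is left out.

```python
import math
import random
import re
import statistics
from typing import NamedTuple


class MotifTest(NamedTuple):
    observed: int  # tract count in the actual genome
    mean: float    # mean tract count over the virtual genomes
    se: float      # standard error of that mean
    t: float       # (observed - mean) / se; positive means overrepresented


def count_tracts(seq: str, motif: str, min_repeats: int = 3,
                 min_length: int = 6) -> int:
    """Count tandem tracts of one motif with >= 3 copies and >= 6 nt."""
    copies = max(min_repeats, math.ceil(min_length / len(motif)))
    pattern = re.compile("(?:%s){%d,}" % (re.escape(motif), copies))
    return sum(1 for _ in pattern.finditer(seq))


def virtual_genome(seq: str, rng: random.Random) -> str:
    """Shuffle the bases: same length and base composition, random order."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def assess_motif(seq: str, motif: str, n_virtual: int = 100,
                 seed: int = 0) -> MotifTest:
    rng = random.Random(seed)
    observed = count_tracts(seq, motif)
    counts = [count_tracts(virtual_genome(seq, rng), motif)
              for _ in range(n_virtual)]
    mean = statistics.mean(counts)
    se = statistics.stdev(counts) / math.sqrt(n_virtual)
    if se == 0:
        # every virtual genome gave the same count; t is undefined
        t = 0.0 if observed == mean else math.copysign(math.inf, observed - mean)
    else:
        t = (observed - mean) / se
    return MotifTest(observed, mean, se, t)
```

Shuffling was chosen here because it keeps the base composition identical to that of the real genome; drawing each base independently with the observed frequencies would be an equally plausible reading of the method.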
DNA extraction from phytoplasma strains. To determine experimentally the repeat number variation of a nonanucleotide SSR motif
(AAAATAAGG) in/around the gene encoding haemolysin III (hlyIII),
six phytoplasma strains were used. These strains were aster yellows
phytoplasma (AY1a), Oklahoma aster yellows phytoplasma (OKAY),
New Jersey aster yellows phytoplasma (NJAY), clover phyllody phytoplasma (CPh), paulownia witches’-broom phytoplasma (PaWB)
and Chinese wingnut witches’-broom phytoplasma (CWWB). Total
phytoplasma DNAs were extracted from diseased plants according to
the procedure described by Ahrens & Seemüller (1992).
PCR amplification of the haemolysin III gene. The sequences
of hlyIII and the neighbouring loci were identified from the
genomes of OYM and AYWB. The sequences were aligned using the MEGALIGN
program (CLUSTAL_X option) of the sequence analysis
software suite Lasergene (DNASTAR). Primers for amplification of
the full-length hlyIII were designed based on the alignment:
HaemoF1, 5′-GGGTTTGATTTCAGGAAGGG-3′, and HaemoR1, 5′-TGTCTTTGGTCTTCATTG-3′. PCR amplifications were carried out
using DNA templates extracted from six ‘Ca. P. asteris’-related
strains. Each reaction mixture, in a total volume of 50 µl, contained
1 µl of the above prepared DNA extract, 25 pmol each of a forward
and a reverse primer, 200 µM each dNTP, 1× GeneAmp PCR buffer
I and 2.5 U AmpliTaq DNA polymerase (Applied Biosystems). The
cycling program consisted of 35 cycles of denaturation at 94 °C for
30 s, annealing at 56 °C for 45 s and extension at 72 °C for 45 s.
Amplicons were analysed by electrophoresis through 1 % agarose gel.
TA cloning and DNA sequencing. PCR products were cloned into
plasmid vector pCRII-TOPO (Invitrogen), and propagated in
Escherichia coli as described previously (Shuman, 1994). DNA
sequencing was performed with an ABI PRISM 377 automated DNA
sequencer (Applied Biosystems). DNA sequence data were assembled
using the SeqMan program of the sequence analysis software suite
Lasergene (DNASTAR).
RESULTS AND DISCUSSION
Occurrence and composition of SSRs in
phytoplasma genomes
Totals of 9338, 8873, 11654, 12416 and 9261 SSRs were
observed in the genomes of OYM, AYWB, CBWB, SLY
and AP-AT, respectively (Tables 1 and 2). The mean density
of SSRs in individual phytoplasma genomes ranged from
10.95 per kb (OYM) to 15.39 per kb (AP-AT), much
higher than those in the genomes of other prokaryotes,
including a representative Gram-negative bacterium (Escherichia coli; 3.33 per kb), a low-G+C-content Gram-positive
bacterium (Bacillus subtilis; 4.74 per kb) and a human-pathogenic, cell-wall-less bacterium (Mycoplasma genitalium; 6.30 per kb) (Table 2). The SSR densities in phytoplasma genomes were also significantly higher than those
of phylogenetically related, plant-associated species of the
genus Acholeplasma, Acholeplasma brassicae (4.73 per kb)
and A. palmae (7.17 per kb), and higher than those of
plant-phloem-inhabiting candidate species of ‘Candidatus
Liberibacter’, ‘Ca. Liberibacter americanus’ (7.90 per kb)
and ‘Ca. L. asiaticus’ (6.39 per kb) (Table 2). The high SSR
density in phytoplasmas was mainly the result of mononucleotide repeats (MNRs) (Table 2).
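The densities quoted above follow directly from the SSR counts and genome lengths listed in Table 2. A small sketch of the arithmetic, using a subset of those values (the dictionary and function names are illustrative, not part of the original analysis):

```python
# (total SSR count, genome length in kb), taken from Table 2
GENOMES = {
    "OYM": (9338, 853.09),
    "AP-AT": (9261, 601.94),
    "M. genitalium G-37T": (3655, 580.08),
    "E. coli K-12": (15449, 4641.65),
    "B. subtilis 168": (19984, 4215.61),
}


def density_per_kb(n_ssrs: int, length_kb: float) -> float:
    """Mean SSR density = number of SSRs / genome length in kb."""
    return n_ssrs / length_kb


densities = {name: round(density_per_kb(n, kb), 2)
             for name, (n, kb) in GENOMES.items()}

for name, value in densities.items():
    print(f"{name}: {value:.2f} SSRs per kb")
```

Rounded to two decimals, these reproduce the tabulated densities for the listed genomes.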
SSRs with mononucleotide repeats (MNRs ≥6) were the
most abundant, constituting 79.38 % (in AP-AT) to
84.68 % (in CBWB) of the total SSRs in the phytoplasma
genomes (Table 1 and Table S1, available in the online Supplementary Material). MNRs with A or T motifs were far
more frequent than MNRs with C or G motifs; in fact,
the counts of A or T tracts were 148, 144, 129, 93 and
524 times greater than the counts of C or G tracts in the
genomes of OYM, AYWB, CBWB, SLY and AP-AT,
respectively (Table S2). Generally, low-G+C-content
genomes tend to have more A or T runs, often leading to
a larger number of MNRs. However, our results showed
that the mean density of MNRs in a given phytoplasma
genome is not always proportional to its G+C content.
For example, whereas the genomes of OYM, AYWB and
CBWB have comparable G+C contents (27.76, 26.89 and
27.42 %, respectively), the MNR densities of the three phytoplasmas were 8.71, 10.23 and 11.22 per kb, respectively
(Table 2). Our results also indicated that the MNR densities in the five phytoplasma genomes are much higher
than that of A. palmae (4.53 per kb), a low-G+C-content,
cell-wall-less bacterium closely related to phytoplasmas
(Table 2). While the maximum repeat number of MNRs
reached 15, the counts decreased significantly when the
repeat number exceeded eight for A or T tracts, and six
for C or G tracts (Tables 1 and S2). Compared with virtual
phytoplasma genomes (computer-generated random genomes with the same base compositions as actual phytoplasmas), MNRs were overrepresented in each of the five
phytoplasma genomes (P<0.01), especially MNRs in the
range of 6 to 8 nt in length (Tables 1 and S1). Whereas
MNRs occurred in both coding and non-coding regions
of the phytoplasma genomes, it appeared that a majority
of MNRs ranging from 6 to 10 nt in length were located
in protein-coding regions. For example, 75.34, 68.75,
66.06, 69.93 and 76.02 % of 9-nt MNRs were observed in
coding regions of the OYM, AYWB, CBWB, SLY and
AP-AT genomes, respectively (Table 1).
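The coding-region percentages are simply the CR count divided by the GW count for a given repeat number. As a worked check, the following sketch recomputes them from the 9-nt MNR counts reported in Table 1 (variable names are illustrative):

```python
# 9-nt MNRs from Table 1: (genome-wide count, count within coding regions)
MNR9 = {
    "OYM": (73, 55),
    "AYWB": (160, 110),
    "CBWB": (165, 109),
    "SLY": (153, 107),
    "AP-AT": (367, 279),
}

# percentage of 9-nt MNRs that fall inside protein-coding regions
coding_pct = {strain: round(100 * cr / gw, 2) for strain, (gw, cr) in MNR9.items()}

for strain, pct in coding_pct.items():
    print(f"{strain}: {pct:.2f} % of 9-nt MNRs in coding regions")
```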
SSRs with dinucleotide repeat motifs (DiNRs) accounted for
8.62–11.79 % of the total SSRs found in the five genomes
(Tables 1 and S1). Among the 12 DiNR motifs, AT and TA
were predominant (Table S2). The maximum repeat
number of DiNR motifs was 7, 14, 8, 9 and 14 in the genomes
of OYM, AYWB, CBWB, SLY and AP-AT, respectively
(Tables 1 and S1). The distribution of DiNRs between
coding and non-coding regions appeared to be length
(repeat number) -dependent: while the frequencies of 6-nt
DiNRs were slightly higher in coding regions, DiNRs longer
than 10 nt were found mainly in the non-coding regions
(Table 1). Overall, DiNRs were underrepresented (P<0.01)
in the five phytoplasma genomes when compared with randomized virtual phytoplasma genomes (Table S1).
SSRs with trinucleotide repeat motifs (TrNRs) were
overrepresented (P<0.01) in each of the five phytoplasma
genomes when compared with randomized virtual phytoplasma genomes (Tables 1 and S1). A total of 58 triplet
motif types were identified. Of the 64 possible triplet motif
types, six (AAA, CCC, CCG, GCC, GGG and TTT) were
absent from the five phytoplasma genomes (Table S2).
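The motif tally can be checked by enumeration. In the sketch below, the six absent motifs are taken from the text; the remark that four of them are homopolymer triplets, which a scanner reporting tracts by their shortest repeating unit would count as MNRs, is an assumption about the identification program rather than something stated in the article.

```python
from itertools import product

# all 4^3 possible trinucleotide motifs
ALL_TRIPLETS = {"".join(p) for p in product("ACGT", repeat=3)}

# motifs reported as absent from the five genomes
ABSENT = {"AAA", "CCC", "CCG", "GCC", "GGG", "TTT"}

observed = ALL_TRIPLETS - ABSENT

# absent motifs that are runs of a single base
homopolymers = {m for m in ABSENT if len(set(m)) == 1}

print(len(ALL_TRIPLETS), len(observed), sorted(homopolymers))
```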
An overwhelming majority of the TrNRs were 9 nt in
length, and they occurred more frequently in coding regions
than in non-coding regions (Table 1). TrNRs in coding
regions apparently encode amino acid residues. It is well
known that tandem repeats of amino acid residues can
form secondary structures that lead to conformations such
as alpha helices, beta pleated sheets or loops (Heringa &
Taylor, 1997). Expansion or contraction of TrNRs can
change such conformations and therefore change the corresponding protein surface structures and their abilities to
interact with other macromolecules, including other
Table 1. Occurrence of SSRs in the genomes of five phytoplasma strains
MNR, Mononucleotide repeat; DiNR, dinucleotide repeat; TrNR, trinucleotide repeat; TeNR, tetranucleotide repeat; PNR, pentanucleotide repeat; HxNR, hexanucleotide repeat; HeNR, heptanucleotide repeat; ONR, octanucleotide repeat; NNR, nonanucleotide repeat; DeNR, decanucleotide repeat; GW, genome-wide; CR,
coding region; ~, MNRs shorter than six nucleotides (which were not counted in the total SSR tally); –, no SSRs in either GW or CR; %, percentage of SSRs located
within protein-coding regions. The percentages of total protein-encoding sequences in the five phytoplasma genomes are as follows: OYM, 72.95 %; AYWB, 73.73 %;
CBWB, 64.15 %; SLY, 77.35 %; AP-AT, 76.12 %.
Repeats (n)
MNR
GW/CR
~
~
~
4280/2895
2363/1360
704/442
73/55
6/4
2/0
–
3/0
1/0
1/0
~
~
~
3865/2539
2366/1332
775/461
160/110
36/27
7/5
1/1
2/0
11/2
3/2
~
~
~
5096/2904
3631/1567
%
TrNR
TeNR
PNR
HxNR
HeNR
ONR
NNR
DeNR
GW/CR
%
GW/CR
%
GW/CR
%
GW/CR
%
GW/CR
%
GW/CR
%
GW/CR
%
GW/CR
%
GW/CR
%
60.26
49.37
20
0
0
689/471
31/19
2/1
–
–
–
–
–
–
–
–
–
–
68.36
61.29
50
58/25
–
–
–
–
–
–
–
–
–
–
–
–
43.10
10/2
–
–
–
–
–
–
–
–
–
–
–
–
20
9/6
–
–
–
–
–
–
–
–
–
–
–
–
66.67
2/0
–
–
–
–
–
–
–
–
–
–
–
–
0
1/0
–
–
–
–
–
–
–
–
–
–
–
–
0
1/1
–
–
–
–
–
–
–
–
–
–
–
–
100
1/0
–
–
–
–
–
–
–
–
–
–
–
–
0
0
0
0
1014/611
79/39
5/1
1/0
2/0
–
–
–
–
–
–
–
–
57.34
53.95
20
593/406
34/19
7/4
1/1
–
–
–
–
–
–
–
–
–
68.47
55.88
57.14
100
52/27
–
–
–
–
–
–
–
–
–
–
–
–
51.92
8/5
–
–
–
–
–
–
–
–
–
–
–
–
62.5
8/6
–
–
–
–
–
–
–
–
–
–
–
–
75
1/0
–
–
–
–
–
–
–
–
–
–
–
–
0
65.69
56.30
59.48
68.75
75
71.43
100
0
18.18
66.67
858/492
76/41
5/1
–
–
–
–
–
1/0
–
1/0
1/0
–
–
–
–
–
–
–
–
–
–
–
–
–
–
53.08
33.82
37.5
100
0
680/425
19/11
2/1
1/1
–
62.5
57.89
50
100
52/23
2/1
–
–
–
44.23
50
4/0
–
–
–
–
0
15/5
–
–
–
–
33.33
1/0
–
–
–
–
0
56.99
43.16
925/491
68/23
8/3
1/1
1/0
67.64
57.55
62.78
75.34
66.67
0
0
0
0
1/0
–
–
–
–
–
–
1/1
–
–
–
–
–
–
–
–
–
–
0
–
–
–
–
–
100
–
–
–
–
–
–
–
–
–
–
–
–
–
–
–
–
1/0
–
0
OYM
3
4
5
6
7
8
9
10
11
12
13
14
15
AYWB
3
4
5
6
7
8
9
10
11
12
13
14
15
CBWB
3
4
5
6
7
DiNR
Repeats (n)
8
9
10
11
12
13
14
15
SLY
3
4
5
6
7
8
9
10
11
12
13
14
15
AP-AT
3
4
5
6
7
8
9
10
11
12
13
14
MNR
DiNR
TrNR
TeNR
%
GW/CR
%
GW/CR
934/476
165/109
14/11
4/0
1/1
–
8/0
17/4
50.96
66.06
78.57
0
100
0
0
23.53
2/0
–
–
–
–
–
–
–
–
–
–
–
–
–
–
–
62.89
46.99
8.33
33.33
0
731/564
29/23
–
–
–
–
–
–
–
–
–
–
–
77.15
79.31
52/25
2/1
–
–
–
–
–
–
–
–
–
–
–
48.08
50
4/0
–
–
–
–
–
–
–
–
–
–
–
–
0
68.48
54.51
64.05
69.93
50
25
0
0
33.33
25
1032/649
83/39
24/2
3/1
1/0
–
1/0
–
–
–
–
–
–
8/5
–
–
–
–
–
–
–
–
–
–
–
–
57.87
42.72
16.67
0
0
705/508
48/35
2/0
–
–
–
–
–
–
72.06
72.92
0
–
–
14/6
–
–
–
–
–
–
–
–
–
–
–
42.86
0
0
72/32
–
–
–
–
–
–
–
–
–
–
–
44.44
69.76
63.77
64.58
76.02
78.38
84.21
83.33
80
28.57
921/533
103/44
24/4
2/0
3/0
–
–
–
–
–
2/0
1/0
7/7
1/1
1/1
–
–
–
–
–
–
–
–
–
~
~
~
3615/2522
2219/1415
1039/671
367/279
74/58
19/16
6/5
5/4
7/2
0
GW/CR
%
–
–
–
–
–
–
–
–
GW/CR
HxNR
GW/CR
~
~
~
5453/3734
3746/2042
1043/668
153/107
22/11
4/1
2/0
2/0
15/5
4/1
%
PNR
%
–
–
–
–
–
–
–
–
GW/CR
HeNR
%
–
–
–
–
–
–
–
–
GW/CR
ONR
%
GW/CR
NNR
%
GW/CR
GW/CR
%
–
–
–
–
–
–
–
–
1/0
–
–
–
–
–
–
–
0
1/0
–
–
–
–
–
–
–
–
–
–
–
–
0
–
–
–
–
–
–
–
–
–
–
–
–
–
–
–
–
62.5
–
–
–
–
–
–
–
–
–
–
–
–
–
1/0
–
–
–
–
–
–
–
–
–
–
–
–
0
–
–
–
–
–
–
–
–
–
–
–
–
–
100
100
100
–
1/0
–
–
–
–
–
–
–
–
–
–
1/0
–
–
–
–
–
–
–
–
–
–
–
0
2/2
–
–
–
–
–
–
–
–
–
–
–
0
DeNR
%
100
–
–
–
–
–
–
–
–
–
–
–
–
Table 2. Abundance and mean density of SSRs in each of the five phytoplasma genomes
The mean density of SSRs of a given phytoplasma genome was calculated by dividing the total count of SSRs by the length
(kb) of the genome. The mean density of SSRs in phytoplasma prophage islands was calculated by dividing the total count of
SSRs within the phage islands by the length of concatenated genomic segments that comprise the prophage islands. For comparative purposes, the mean densities of SSRs in the genomes of Escherichia coli K-12 (GenBank accession no. NC_000913),
Bacillus subtilis subsp. subtilis 168 (NC_000964), Mycoplasma genitalium G-37T (NC_000908), Acholeplasma brassicae 0502T
(NC_022549), Acholeplasma palmae J233T (NC_022538), ‘Candidatus Liberibacter americanus’ Sao Paulo (NC_022793),
‘Candidatus Liberibacter asiaticus’ Psy62 (NC_012985) and Campylobacter jejuni subsp. jejuni NCTC 11168 (NC_002163)
were also calculated with the same criteria. SSRs shorter than six nucleotides were not counted.
Strain | G+C content (mol%) | DNA length (kb) | MNR to DeNR: SSRs (n) | MNR to DeNR: Density | DiNR to DeNR: SSRs (n) | DiNR to DeNR: Density | MNRs only: SSRs (n) | MNRs only: Density
Whole genome
OYM | 27.76 | 853.09 | 9338 | 10.95 | 1905 | 2.23 | 7433 | 8.71
AYWB | 26.89 | 706.57 | 8873 | 12.56 | 1647 | 2.33 | 7226 | 10.23
CBWB | 27.42 | 879.96 | 11654 | 13.24 | 1784 | 2.03 | 9870 | 11.22
SLY | 27.19 | 959.78 | 12416 | 12.94 | 1972 | 2.05 | 10444 | 10.88
AP-AT | 21.39 | 601.94 | 9261 | 15.39 | 1910 | 3.17 | 7351 | 12.21
M. genitalium G-37T | 31.69 | 580.08 | 3655 | 6.30 | 819 | 1.41 | 2836 | 4.89
E. coli K-12 | 50.79 | 4641.65 | 15449 | 3.33 | 10083 | 2.17 | 5366 | 1.16
B. subtilis 168 | 43.51 | 4215.61 | 19984 | 4.74 | 11302 | 2.68 | 8682 | 2.06
A. brassicae 0502T | 35.77 | 1877.79 | 8889 | 4.73 | 3453 | 1.84 | 5436 | 2.89
A. palmae J233T | 28.98 | 1554.23 | 11151 | 7.17 | 4117 | 2.65 | 7034 | 4.53
‘Ca. L. americanus’ Sao Paulo | 31.11 | 1195.20 | 9445 | 7.90 | 4397 | 3.68 | 5048 | 4.22
‘Ca. L. asiaticus’ Psy62 | 36.47 | 1227.33 | 7843 | 6.39 | 3153 | 2.47 | 4690 | 3.67
C. jejuni NCTC 11168 | 30.55 | 1641.48 | 13297 | 8.10 | 3102 | 1.89 | 10195 | 6.21
Prophage islands (SVMs)
OYM | – | 264.22 | 2997 | 11.34 | 701 | 2.65 | 2296 | 8.69
AYWB | – | 160.20 | 2031 | 12.68 | 486 | 3.03 | 1545 | 9.64
CBWB | – | 318.26 | 3520 | 11.06 | 680 | 2.14 | 2840 | 8.92
SLY | – | 327.27 | 3318 | 10.14 | 704 | 2.15 | 2614 | 7.99
Outside SVMs
OYM | – | 588.87 | 6343 | 10.77 | 1208 | 2.05 | 5135 | 8.72
AYWB | – | 546.37 | 6842 | 12.52 | 1161 | 2.12 | 5681 | 10.40
CBWB | – | 561.70 | 8134 | 14.48 | 1106 | 1.97 | 7028 | 12.51
SLY | – | 632.51 | 9096 | 14.38 | 1266 | 2.00 | 7830 | 12.38
proteins. Interestingly, in the coding regions, the predominant TrNR motif differed among phytoplasma lineages,
even among closely related strains. For example, while ATT
was the most abundant TrNR motif in the genomes of
both AYWB and OYM as a whole, ATT was the predominant
TrNR motif in the AYWB coding regions, whereas CAA was
the predominant TrNR motif in the OYM coding regions
(data not shown).
The counts of SSRs with tetranucleotide repeat motifs
(TeNRs) were far lower compared with SSRs with TrNRs
(Table 1), and the composition of motif types differed significantly among the five genomes. While a total of 39
TeNR motif types were identified, only eight motif types,
AAAG, AAAT, AATA, AATT, ATTA, ATTT, TTAT and
TTTA, were present in all five completely sequenced genomes. Twelve combinations of TeNR motif types were
present in no more than one genome. For example, the
combination of GTTT and TTCT repeats was found only
in the genome of OYM (Table S2). Such strain-specific
SSR motif combinations could be exploited as molecular
markers for phytoplasma strain-typing.
SSRs with motif lengths equal to or greater than pentanucleotides were rare. Furthermore, the repeat number of
these long motifs rarely exceeded four (Table 1), with the following exceptions: a nonanucleotide motif (ATAAGGAAA)
repeated five times at position 581651 in the genome of
AYWB; a decanucleotide motif (TAAATAATAA) repeated
six times at position 878250 and eight times at position
878313 in the genome of CBWB; and a hexanucleotide
motif (TATTTT) repeated five times at position 307958 in
the genome of AP-AT (Table S2).
Since phytoplasmas constitute a unique group of cell-wallless bacteria that have distinctive ecological, nutritional,
biochemical, genomic and phylogenetic properties (Zhao
Downloaded from www.microbiologyresearch.org by
IP: 88.99.165.207
On: Mon, 31 Jul 2017 16:34:04
2753
High-order SSRs refer to SSRs with repeat motif-length from 6 to 10 nt. Gene annotations are based on the original genome sequencing reports.
Motif
Repeats
(n)
‘Ca. P. asteris’-related strains OYM
and AYWB
CTTTGT
GCTTTT
TTTATT
CATTTT
ATTCAG
GATAAT
OYM/
AYWB
3/1
3/1
3/1
3/1
3/0
3/2
International Journal of Systematic and Evolutionary Microbiology 65
TTTGTG
GTTATTT
TTATTAC
AAAACTAA
TTGATTTGA
TTTATTTTTG
3/1
3/0
3/0
3/1
3/0
3/0
CAAGAA
GATAAT
1/3
2/3
TTTTTA
TTCTAA
TCTTTT
TTGTAA
TTATTT
1/3
0/3
3/3
3/3
1/3
TTTTTC
GATATTT
2/3
0/3
AAAATAAGG
‘Ca. P. australiense’-related strains SLY
and CBWB
AAAGAA
ATTATA
ATTATC
ATTATC
ATTATC
ATTATC
1/5
SLY/
CBWB
3/3
3/3
3/3
0/3
0/3
0/3
Location and associated gene in:
OYM genome
AYWB genome
36285–36302 (coding), PAM027 hypothetical protein
162853–162870 (coding), PAM137 hypothetical protein
196083–196100 (non-coding), upstream of PAM165 (rpsB)
471290–471307 (coding), PAM763 (uvrB)
540802–540819 (coding), PAM481 hypothetical protein
591978–591995 (non-coding), upstream of PAM526 (tra5
fragment)
850098–850115 (non-coding), upstream of PAM752 (malK)
771597–771617 (non-coding), upstream of PAM678 ( ffh)
798029–798049 (non-coding), upstream of PAM705 (grpE)
506868–506891 (non-coding), upstream of PAM456 (artM)
797751–797777 (coding), PAM705 (grpE)
239461–239490 (non-coding), upstream of PAM195
hypothetical protein
772340–772335 (coding), PAM679 ( ftsY)
774677–774688 (non-coding), upstream of PAM681
(pseudo-tra5)
477689–477694 (coding), PAM436 hypothetical protein
–
314199–314216 (coding), PAM272 hypothetical protein
168109–168126 (coding), PAM142 (udk)
842877–842882 (coding tail), PAM747 (thdF)
20303–20298 (coding), AYWB016 hypothetical protein
606171–606166 (coding), AYWB584 (thiJ)
574307–574302 (non-coding), upstream of AYWB554 (rpsB)
411901–411906 (coding), AYWB399 (uvrB)
–
228987–228976 (non-coding), upstream of AYWB215 (tra5)
286028–286045 (coding), SLY_0335 ( polC)
279226–279243 (non-coding), upstream of SLY_0331(lplA)
482953–482970 (non-coding), after SLY_0554 (tra5)
–
–
543289–543306 (coding), PAa_0541 ( polC)
536488–536505 (non-coding), upstream of PAa_0537 (lplA)
737620–737637 (non-coding), downstream of PAa_0720 (tra5)
812232–812249 (non-coding), upstream of PAa_0790 (fliA)
321316–321333 (non-coding), upstream of PAa_0291 putative
methyltransferase
368643–368660 (non-coding), upstream of PAa_0354 hypothetical
protein
703978–703983 (non-coding), upstream of AYWB670 (malK)
–
–
335245–335252 (non-coding), upstream of AYWB315 (artM)
–
–
75157–75174 (coding), AYWB064 ( ftsY)
194922–194939 (non-coding), upstream of AYWB174 (pseudo-tra5)
341419–341436 (coding), AYWB320 hypothetical protein
372064–372081 (coding), AYWB353 hypothetical protein
463349–463366 (coding), AYWB448 ‘HAD hydrolase’
600806–600823 (coding), AYWB578 (udk)
696816–696833 (coding+non-coding), AYWB665 (trmE) and its
downstream region
846873–846884 (coding), PAM749 (ugpB)
700757–700774 (coding), AYWB667 (malE)
–
671414–671434 (non-coding), upstream of AYWB643 hypothetical
protein
188423–188415 (non-coding), upstream of PAM158 (hlyIII) 581648–581692 (coding), PAM561 (hlyIII)
SLY genome
CBWB genome
–
Downloaded from www.microbiologyresearch.org by
IP: 88.99.165.207
On: Mon, 31 Jul 2017 16:34:04
W. Wei and others
2754
Table 3. Representative high-order SSRs in the genomes of closely related strains: allelic diversity and potential contingency loci
http://ijs.sgmjournals.org
262076–262081 (coding), SLY_0313 ( pan)
3/3
3/3
3/2
0/3
1/3
In the present study, we examined four genomes that possess characteristic SVM structures. A total of 2997
(32.07 %), 2031 (22.89 %), 3520 (30.20 %) and 3318
(35.83 %) SSRs were identified from prophage islands in
the genomes of OYM, AYWB, CBWB and SLY, respectively
(Table 2). The numbers in parentheses indicate the percentages of prophage island SSRs over the total SSRs in the
corresponding genome. Apparently, the SSR density profiles differed between the phage islands and the rest of
the resident genomes: while the density of mononucleotide
SSRs was higher in the resident genomes than in the prophage islands, the opposite was true for the density of dinucleotide and higher-order motif SSRs (Table 2). This
differential SSR distribution pattern was consistent among
the four phytoplasma genomes examined. We reported previously that DNA sequences in the phytoplasma prophage
islands possess distinct physical properties, including a
G+C content lower than that of the host phytoplasma
genes and a relative dinucleotide abundance that is drastically different from that of their respective host DNA (Wei
et al., 2008). Differential SSR density profiles revealed in
the present study add another distinctive physical property
to the phytoplasma prophage islands.
TTTCTT
TTTTGG
TTAAATAA
GATAAT
GTTGTA
Despite their small size, each of the five completely
sequenced phytoplasma genomes, as well as other partially
sequenced phytoplasma genomes, contains large numbers
of multiple-copy genes of unknown function. These multiple-copy genes are clustered in non-randomly distributed
segments termed sequence-variable mosaics (SVMs), a distinctive architecture of phytoplasma genomes (Jomantiene
& Davis, 2006; Jomantiene et al., 2007). Sequence stretches
referred to as potential mobile units (PMUs) in the genomes of AYWB, CBWB and SLY phytoplasmas (Bai
et al., 2006; Tran-Nguyen et al., 2008; Andersen et al.,
2013) are mostly located in SVM regions. Several lines of
evidence indicated that the SVMs were genomic islands
formed through recurrent phage attacks and subsequent
recombination events (Wei et al., 2008). Phage-derived
genomic islands often occupy a significant portion of the
resident genome. For example, phage-derived islands
encompass 374 clustered, multiple-copy genes, and
account for over 36 % of the total length of the circular
CBWB chromosome (Zhao et al., 2014).
433504–433521 (coding), SLY_0500 hypothetical protein
84828–84845 (coding), SLY_0098 (mgtA)
169685–169702 (non-coding), upstream of SLY_0199
(engC)
746651–746668 (coding), SLY_0869 (potA)
781538–781555 (coding), SLY_0907 hypothetical protein
345172–345195 (coding), SLY_0413 hypothetical protein
–
156041–156058 (coding), PAa_0135 (potA)
191613–191630 (coding), PAa_0160 hypothetical protein
602438–602453 (coding), PAa_0607 hypothetical protein
654853–654870 (non-coding), downstream of PAa_0655 putative
methylase
521453–521470 (coding), PAa_0523 putative peptidase M41 cell
division protein
SSR distribution in phage-derived genomic
islands
3/3
3/3
3/3
693005–693022 (coding), PAa_0685 hypothetical protein
96888–96905 (coding), PAa_0089 (mgtA)
364253–364270 (non-coding), upstream of PAa_0348 (engC)
et al., 2014), we devoted our attention to SSRs that
occurred in phytoplasma-unique genomic loci in the subsequent analyses. We also narrowed our focus to SSRs that
were longer than eight nucleotides, as longer SSRs are
more prone to expansion/contraction because of a higher
probability of polymerase slippage (Leclercq et al., 2010).
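A length filter of this kind can be sketched in a few lines. The scanner below is an assumption-laden illustration, not the software used in this study: the function names, the maximum motif length of six and the default threshold of nine nucleotides (that is, longer than eight) are choices made here for the example, and only perfect tandem repeats are reported.

```python
def is_primitive(motif):
    """True if the motif is not itself a tandem repeat of a shorter unit."""
    k = len(motif)
    return not any(
        motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0
    )


def find_ssrs(seq, max_motif_len=6, min_tract_len=9):
    """Return sorted (start, motif, copies) for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for k in range(1, max_motif_len + 1):
        i = 0
        while i + 2 * k <= len(seq):
            motif = seq[i:i + k]
            j = i + k
            # Extend the tract while the next k bases repeat the motif.
            while seq[j:j + k] == motif:
                j += k
            tract_len = j - i
            if (tract_len >= 2 * k and tract_len >= min_tract_len
                    and set(motif) <= set("ACGT") and is_primitive(motif)):
                hits.append((i, motif, tract_len // k))
                i = j  # resume scanning after the reported tract
            else:
                i += 1
    return sorted(hits)


# Hypothetical toy sequence: a 9-nt A run, four TTG units and three AT units.
toy = "GC" + "A" * 9 + "CC" + "TTG" * 4 + "C" + "AT" * 3
long_ssrs = find_ssrs(toy)
all_ssrs = find_ssrs(toy, min_tract_len=6)
print(long_ssrs)
```

Lowering `min_tract_len` to 6 also reports the short AT tract, which shows how the threshold controls which loci enter the downstream analysis.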
CATAAA
CCAAAA
TCTATT
Motif
Table 3. cont.
Repeats
(n)
Location and associated gene in:
Although extant phytoplasmas probably shared a common
ancestor, emerged as a single clade (Wei et al., 2008)
and still comprise a phylogenetically coherent group
(Gundersen et al., 1994; Zhao et al., 2010), diverse
phytoplasma lineages have evolved in adaptation to a broad
range of bio- and geo-ecological niches. The genetic diversity of phytoplasmas is also reflected in the occurrence and
distribution of SSRs in the coding regions of phytoplasmal
prophage islands: the occurrence of long SSRs (≥10 nt)
and their motifs in these islands differed significantly
among the five phytoplasma genomes.
For example, OYM and AYWB are two closely related strains
affiliated with the same Candidatus species. The two strains
also share homologous prophage sequences (Wei et al.,
2008). In the OYM prophage islands, only 13 long SSRs
were identified; while most of them had a tri- or tetranucleotide motif, none had a mononucleotide motif. On the other
hand, in the AYWB genome, 30 long SSRs were identified,
with 16 of them having a mononucleotide motif (Table
S3). This phenomenon is particularly noteworthy, considering that the AYWB genome had smaller SVM regions (Wei
et al., 2008) and fewer total SSRs in the prophage islands
compared with the OYM genome (Table 2).
Not surprisingly, most of the long SSRs that occurred in the
OYM and AYWB prophage islands were associated with
‘SVM genes’ (Jomantiene & Davis, 2006; Jomantiene et al.,
2007) or ‘mobile unit genes (MUG)’ (Arashida et al.,
2008a) that encode putative phage structural or functional
proteins (Wei et al., 2008). It is reasonable to predict that
the presence of long SSRs in such genes will probably further
increase the sequence variability of the SVMs within the
genome and further increase the allelic diversity of these
loci among closely related strains. In addition, long SSRs
were also found in strain-specific genes including PAM_761,
AYWB_202, AYWB_205 and AYWB_274 (Table S3). Prophage islands contain numerous phytoplasma-unique and/or
lineage-specific genes including morons, transduced genes
(Wei et al., 2008) and genes acquired through an integron
mobile gene cassette-like system in the hypervariable regions
of prophage islands (Jomantiene et al., 2007), some of which
encode putative virulence factors (Gedvilaite et al., 2014).
Future studies will be needed to determine the functions of
these SSR-associated, strain-specific genes and the role of
the SSRs in modulating the functions of the genes.
CBWB and SLY are another pair of closely related strains
affiliated with the same Candidatus species. The genomes
of both strains have extensive prophage islands. Thirty
long SSRs (≥10 nt) were identified within 21 protein-encoding genes located in the prophage islands of
the CBWB genome (Table S3). Nine of the 21 genes had
two long SSRs (PAa_0049, PAa_0067, PAa_0189,
PAa_0204, PAa_0236, PAa_0382, PAa_0416, PAa_0745
and PAa_0798). Furthermore, all nine of these genes,
plus another two long-SSR-bearing genes (PAa_0280 and
PAa_0285), shared mutually high sequence similarity and
were members of the same mosaic gene family; the lengths
of these genes varied, indicating that some had become
truncated or decayed. Similarly, in the SLY genome, 27
long SSRs (≥10 nt) were identified within 20 protein-encoding genes in the prophage islands; some of the
genes contained two or more SSRs. A majority of these
20 genes fell into three mosaic gene families: (i)
SLY_0152, SLY_0157, SLY_0182, SLY_0593, SLY_0930,
SLY_1000 and SLY_1001; (ii) SLY_0604, SLY_0715,
SLY_0768, SLY_0979 and SLY_1103; and (iii) SLY_0696,
SLY_0942 and SLY_0962. Members of each mosaic gene
family had different lengths, indicating evolutionary
decay or truncation. The results from analysis of long
SSRs in CBWB and SLY prophage islands further support
our hypothesis set out in the previous paragraph proposing
a role of SSRs in compounding the complexity of the
mosaics (SVMs) of clustered reparative genes within individual phytoplasma genomes and in increasing the allelic
diversity among closely related strains. As in the case of
the OYM and AYWB genomes, long SSRs in the CBWB
and SLY prophage islands also occurred within species- and/or strain-specific genes such as PAa_0293, PAa_0329,
PAa_0343, PAa_0651, PAa_0727, PAa_0729 and SLY_1095.
Several of these lineage-specific genes were apparently
fragmented, raising a question, and a topic of future study,
as to whether SSRs played a role in lineage-specific decay
of these genes. Lineage-specific gene decay in the genomes
of diverse phytoplasmas has been described previously
(Davis et al., 2003, 2005; Oshima et al., 2007).
SSRs in phytoplasma-unique genes outside of
prophage islands
Outside the prophage islands, phytoplasmas possess
additional unique genes that are absent from all other
cell-wall-less bacteria (Zhao et al., 2014). While most of
these phytoplasma-unique, non-phage genes encode
hypothetical proteins of unknown function, a subset of
about 20 genes can be functionally annotated. Our analysis
revealed that a significant portion of these phytoplasma-unique genes had SSRs in their coding regions (Table S4).
It is worth noting that multiple SSRs were identified within
genes encoding phytoplasma-unique transporters (Table
S4). While most of these genes had multiple copies of
MNRs of eight to nine nucleotides or TrNRs of three to
four repeat units, the AP-AT malate/citrate symporter
gene had a DiNR of 28 nt. Since phytoplasmas have limited
metabolic capacities, they must be sustained by the constant exchange of metabolites with host cells, securing
steady import of nutrients and timely efflux of toxins
(Oshima et al., 2004; Kube et al., 2012). In addition,
cross-membrane transport may also be required for
mediating secretion of potential virulence factors and
maintaining intracellular redox potentials. It would be
interesting to learn whether the observed abundance of
SSRs in genes encoding phytoplasma-unique transporters,
especially the substrate-binding subunit of the transport
systems, plays a role in tuning or broadening the substrate
specificity of the respective transporters.
SSRs were also found within genes encoding immunodominant or antigenic membrane proteins (IMPs or AMPs)
(Table S4). In diverse phytoplasmas, IMP/AMP genes are
Fig. 1. PCR amplification of the full-length haemolysin III gene
(hlyIII) and flanking sequences from six ‘Ca. Phytoplasma’-related
strains using primer pair HaemoF1/HaemoR1. M, 1 kb Plus DNA
ladder; 1, aster yellows phytoplasma (AY1a); 2, Oklahoma aster
yellows phytoplasma (OKAY); 3, New Jersey aster yellows phytoplasma (NJAY); 4, clover phyllody phytoplasma (CPh); 5, paulownia witches’-broom phytoplasma (PaWB); 6, Chinese wingnut
witches’-broom phytoplasma (CWWB).
highly expressed, and therefore the AMPs are abundant at
the surface of the phytoplasma cells (Morton et al., 2003;
Kakizawa et al., 2004; Arashida et al., 2008b). As inferred
from their coding sequences, AMPs and IMPs are rich in
positively charged amino acids, and therefore have an isoelectric point greater than 8.0. At physiological pH, such
membrane proteins would tend to present, at the cell surface, positively charged sites or pockets that are critical for
ligand binding, signal perception and other biochemical
functions during pathogen–host interactions (Suzuki
et al., 2006; Boonrod et al., 2012; Zhao et al., 2014). Conceivably, SSRs (especially TrNRs) in IMP/AMP genes may
reversibly alter the amino acid sequence, length and conformation (Heringa & Taylor, 1997) of the AMPs and
IMPs, thus modulating phytoplasma–host interactions
and even escaping host immune surveillance.
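Why trinucleotide repeats in particular can change protein length without destroying the downstream sequence may be illustrated with a short sketch. The toy ORFs and helper names below are hypothetical and are not phytoplasma sequences; the example only shows that adding one 3-nt unit keeps the reading frame, whereas adding one base to a mononucleotide tract shifts it.

```python
def codons(seq):
    """Split a sequence into complete codons, dropping any trailing bases."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def add_repeat_unit(seq, tract_start, motif):
    """Insert one extra copy of the motif at the start of its tract."""
    if seq[tract_start:tract_start + len(motif)] != motif:
        raise ValueError("motif not found at tract_start")
    return seq[:tract_start] + motif + seq[tract_start:]


DOWNSTREAM = "CCTGGTTAA"  # hypothetical codons following the repeat tract

tri_orf = "ATG" + "GAA" * 3 + DOWNSTREAM   # trinucleotide tract
mono_orf = "ATG" + "A" * 9 + DOWNSTREAM    # mononucleotide tract

tri_plus = add_repeat_unit(tri_orf, 3, "GAA")
mono_plus = add_repeat_unit(mono_orf, 3, "A")

print(codons(tri_plus)[-3:])   # downstream codons unchanged
print(codons(mono_plus)[-3:])  # downstream codons shifted
```

In the trinucleotide case the protein gains one residue and the expansion can be reversed by a later contraction; in the mononucleotide case every downstream codon changes.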
The presence of multiple SSRs in genes that encode phosphatidylserine decarboxylase (Psd) and phosphatidylserine
synthase (PssA) is also intriguing (Table S4). All five completely sequenced phytoplasma genomes possess a complex
set of phospholipid biosynthesis pathway genes and, notably, among mollicutes, phytoplasmas are the only group of
organisms whose genomes encode Psd and PssA (Kube
et al., 2012). Since the phospholipid biosynthesis pathway
plays an important role in the virulence of diverse pathogens including fungi (Chen et al., 2010) and bacteria
(Conde-Alvarez et al., 2006; Bukata et al., 2008), the presence of multiple SSRs in the psd and pssA genes further
stimulates our interest in exploring the functions of phospholipids in phytoplasma pathogenesis.
Allelic diversity and potential contingency loci
It has been reported that long SSRs, especially SSRs with
long motif length, are underrepresented in most prokaryotic genomes, and that they often function as contingency
loci, affecting gene expression through altering the motif
repeat numbers (Moxon et al., 2006; Mrázek et al., 2007).
In this study, we identified 22 SSRs with motif lengths
longer than hexanucleotide in the genomes of two ‘Ca. P.
asteris’-related strains, OYM and AYWB (Table 3). Comparative analysis of the genetic loci that bear such long
SSRs revealed that motif repeat number variations (SSR
[Fig. 2 panel labels from the alignment image: NJAY, repeats 1–4, full-length hlyIII ORF 831 bp (276 aa); OKAY, repeats 1–5, 840 bp (279 aa); AYWB, repeats 1–5, 840 bp (279 aa); PaWB, 615 bp (204 aa); OYM, repeat 1, 615 bp (204 aa).]
Fig. 2. Allelic polymorphism of a nonanucleotide-motif SSR associated with the hlyIII gene of five ‘Ca. P. asteris’-related
strains. Partial coding nucleotide sequences and deduced amino acid sequences of the hlyIII loci were aligned. The translational initiation codon in each coding sequence is indicated by an asterisk (*), and the N-terminal methionine is shown in
bold. The nonanucleotide repeat motif (CCTTATTTT) is delineated by a box. A possible partial or decayed repeat unit
(CCTTAT) upstream of the PaWB hlyIII translation initiation codon is underlined. The annotations of the OYM and AYWB
hlyIII translational initiation codon are based on the respective original genome sequencing reports (Oshima et al., 2004; Bai
et al., 2006) and the corresponding GenBank record (accession numbers NC_005303 and NC_007716).
polymorphisms) occurred in 20 of the 22 SSR motifs, and
only two SSR motifs (TCTTTT and TTGTAA) did not
appear to be polymorphic between the two phytoplasma
genomes. Among the 20 polymorphic SSR loci, 11 were
located within coding regions and nine were in 5′-UTR
regions. Pairwise sequence alignment of the allelic polymorphic SSRs revealed that sequence mismatches (indels)
were essentially confined within the SSRs due to motif
repeat number variations, and the sequences flanking the
SSR tracts were identical or nearly identical (data not
shown). Likewise, 16 SSRs with motif lengths longer than
hexanucleotide were identified in the genomes of two ‘Ca.
P. australiense’-related strains, CBWB and SLY. SSR polymorphisms occurred in eight of the 16 SSR motifs (Table
3). These data suggest that polymorphisms in high-order
(long-motif) SSRs contribute significantly to allelic diversity
among closely related lineages, and such potentially reversible polymorphic loci may serve as contingency loci.
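The locus-by-locus comparison of repeat numbers can be sketched as below. This is an illustrative outline under stated assumptions: the helper names, the heptanucleotide motif and the flanking sequences are invented for the example, and alleles are assumed to have been extracted from the two genomes beforehand.

```python
import re


def max_tandem_copies(seq, motif):
    """Largest number of consecutive copies of the motif in seq (0 if absent)."""
    runs = re.findall("(?:%s)+" % re.escape(motif.upper()), seq.upper())
    return max((len(run) // len(motif) for run in runs), default=0)


def compare_locus(allele_a, allele_b, motif):
    """Report repeat copy numbers in two alleles and whether they differ."""
    a = max_tandem_copies(allele_a, motif)
    b = max_tandem_copies(allele_b, motif)
    return {"copies_a": a, "copies_b": b, "polymorphic": a != b}


# Hypothetical alleles: identical flanks, different repeat numbers.
FLANK_5, FLANK_3, MOTIF = "ACGTTGCA", "TGCATCGA", "GATTACA"
allele_1 = FLANK_5 + MOTIF * 4 + FLANK_3
allele_2 = FLANK_5 + MOTIF * 2 + FLANK_3

result = compare_locus(allele_1, allele_2, MOTIF)
same = compare_locus(allele_1, allele_1, MOTIF)
print(result)
```

Because the flanks are identical, the only difference between the two toy alleles is the indel created by the change in repeat number, which mirrors the pattern described for the polymorphic loci.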
Interestingly, a nonanucleotide SSR motif (AAAATAAGG,
reverse complement CCTTATTTT) was repeated five times
in the coding region of the haemolysin III gene (hlyIII) in
the AYWB genome, while only one copy of this SSR motif
was observed in the 5′-UTR region of the OYM hlyIII. Since
hlyIII encodes an AMP, a suspected virulence factor, we
investigated the hlyIII-associated SSR tracts further in six
additional ‘Ca. P. asteris’-related strains (see Methods).
PCRs were conducted using primers annealing to conserved
nucleotide sequence blocks flanking hlyIII. PCRs with DNA
templates derived from three strains, OKAY, NJAY and
PaWB, yielded amplicons (Fig. 1). Results from DNA
sequencing analysis of the cloned amplicons revealed
additional polymorphic alleles of this SSR motif. Since
the allelic polymorphism (i.e. variations in SSR motif
repeat number) occurred within either the coding or 5′
regulatory regions of hlyIII, depending on individual strains
(Fig. 2), this polymorphic SSR could conceivably affect
both the expression of the gene and the composition of
the encoded haemolysins. Functional characterizations of
these allelic SSR loci, including assays of haemolytic activities of varied protein products, are being undertaken to
advance our understanding of haemolysins and SSRs in
phytoplasma pathogenesis.
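The ORF lengths shown in Fig. 2 are consistent with in-frame gain or loss of whole repeat units, as the following arithmetic check suggests. It assumes, as is conventional but not stated in the figure, that each ORF length includes the stop codon; the function name is hypothetical.

```python
MOTIF = "CCTTATTTT"  # nonanucleotide repeat unit boxed in Fig. 2


def protein_length(orf_bp):
    """Residues encoded by an ORF whose length includes the stop codon."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_bp // 3 - 1


# ORF lengths (bp) and protein lengths (aa) as labelled in Fig. 2.
fig2_pairs = [(840, 279), (831, 276), (615, 204)]
for bp, aa in fig2_pairs:
    print(bp, protein_length(bp), aa)

# One repeat unit is 9 nt, i.e. three codons, so a five-copy and a four-copy
# allele differ by 9 bp and 3 aa without any frameshift.
unit_bp = len(MOTIF)
unit_aa = unit_bp // 3
```

Since the motif length is a multiple of three, repeat-number changes within the coding region lengthen or shorten the haemolysin by three residues per unit while leaving the downstream reading frame intact.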
REFERENCES
Ahrens, U. & Seemüller, E. (1992). Detection of DNA of plant pathogenic mycoplasma like organisms by a polymerase chain reaction that amplifies a sequence of the 16S rRNA gene. Phytopathology 82, 828–832.
Andersen, M. T., Liefting, L. W., Havukkala, I. & Beever, R. E. (2013). Comparison of the complete genome sequence of two closely related isolates of ‘Candidatus Phytoplasma australiense’ reveals genome plasticity. BMC Genomics 14, 529.
Arashida, R., Kakizawa, S., Hoshi, A., Ishii, Y., Jung, H. Y., Kagiwada, S., Yamaji, Y., Oshima, K. & Namba, S. (2008a). Heterogeneic dynamics of the structures of multiple gene clusters in two pathogenetically different lines originating from the same phytoplasma. DNA Cell Biol 27, 209–217.
Arashida, R., Kakizawa, S., Ishii, Y., Hoshi, A., Jung, H. Y., Kagiwada, S., Yamaji, Y., Oshima, K. & Namba, S. (2008b). Cloning and characterization of the antigenic membrane protein (Amp) gene and in situ detection of Amp from malformed flowers infected with Japanese hydrangea phyllody phytoplasma. Phytopathology 98, 769–775.
Bai, X., Zhang, J., Ewing, A., Miller, S. A., Jancso Radek, A., Shevchenko, D. V., Tsukerman, K., Walunas, T., Lapidus, A. & other authors (2006). Living with genome instability: the adaptation of phytoplasmas to diverse environments of their insect and plant hosts. J Bacteriol 188, 3682–3696.
Bayliss, C. D., Field, D. & Moxon, E. R. (2001). The simple sequence contingency loci of Haemophilus influenzae and Neisseria meningitidis. J Clin Invest 107, 657–666.
Boonrod, K., Munteanu, B., Jarausch, B., Jarausch, W. & Krczal, G. (2012). An immunodominant membrane protein (Imp) of ‘Candidatus Phytoplasma mali’ binds to plant actin. Mol Plant Microbe Interact 25, 889–895.
Bukata, L., Altabe, S., de Mendoza, D., Ugalde, R. A. & Comerci, D. J. (2008). Phosphatidylethanolamine synthesis is required for optimal virulence of Brucella abortus. J Bacteriol 190, 8197–8203.
Chen, Y. L., Montedonico, A. E., Kauffman, S., Dunlap, J. R., Menn, F. M. & Reynolds, T. B. (2010). Phosphatidylserine synthase and phosphatidylserine decarboxylase are essential for cell wall integrity and virulence in Candida albicans. Mol Microbiol 75, 1112–1132.
Chistiakov, D. A., Hellemans, B., Haley, C. S., Law, A. S., Tsigenopoulos, C. S., Kotoulas, G., Bertotto, D., Libertini, A. & Volckaert, F. A. (2005). A microsatellite linkage map of the European sea bass Dicentrarchus labrax L. Genetics 170, 1821–1826.
Coenye, T. & Vandamme, P. (2005). Characterization of mononucleotide repeats in sequenced prokaryotic genomes. DNA Res 12, 221–233.
Conde-Alvarez, R., Grilló, M. J., Salcedo, S. P., de Miguel, M. J., Fugier, E., Gorvel, J. P., Moriyón, I. & Iriarte, M. (2006). Synthesis of phosphatidylcholine, a typical eukaryotic phospholipid, is necessary for full virulence of the intracellular bacterial parasite Brucella abortus. Cell Microbiol 8, 1322–1335.
Davis, R. E., Jomantiene, R., Zhao, Y. & Dally, E. L. (2003). Folate biosynthesis pseudogenes, ΨfolP and ΨfolK, and an O-sialoglycoprotein endopeptidase gene homolog in the phytoplasma genome. DNA Cell Biol 22, 697–706.
Davis, R. E., Jomantiene, R. & Zhao, Y. (2005). Lineage-specific decay of folate biosynthesis genes suggests ongoing host adaptation in phytoplasmas. DNA Cell Biol 24, 832–840.
Doi, Y. M., Teranaka, M., Yora, K. & Asuyama, H. (1967). Mycoplasma or PLT group-like microorganisms found in the phloem elements of plants infected with mulberry dwarf, potato witches’-broom, aster yellows, or paulownia witches’-broom. Ann Phytopathol Soc Jpn 33, 259–266.
Ellegren, H. (2004). Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5, 435–445.
Field, D. & Wills, C. (1998). Abundant microsatellite polymorphism in Saccharomyces cerevisiae, and the different distributions of microsatellites in eight prokaryotes and S. cerevisiae, result from strong mutation pressures and a variety of selective forces. Proc Natl Acad Sci U S A 95, 1647–1652.
Gedvilaite, A., Jomantiene, R., Dabrisius, J., Norkiene, M. & Davis, R. E. (2014). Functional analysis of a lipolytic protein encoded in phytoplasma phage based genomic island. Microbiol Res 169, 388–394.
Gundersen, D. E., Lee, I.-M., Rehner, S. A., Davis, R. E. & Kingsbury, D. T. (1994). Phylogeny of mycoplasmalike organisms (phytoplasmas): a basis for their classification. J Bacteriol 176, 5244–5254.
Gur-Arie, R., Cohen, C. J., Eitan, Y., Shelef, L., Hallerman, E. M. & Kashi, Y. (2000). Simple sequence repeats in Escherichia coli: abundance, distribution, composition, and polymorphism. Genome Res 10, 62–71.
Harrison, N., Davis, R. E., Oropeza, C., Helmick, E., Narvaez, M., Eden-Green, S., Dollet, M. & Dickinson, M. (2014). ‘Candidatus Phytoplasma palmicola’, associated with a lethal yellowing-type disease of coconut (Cocos nucifera L.) in Mozambique. Int J Syst Evol Microbiol 64, 1890–1899.
Heringa, J. & Taylor, W. R. (1997). Three-dimensional domain duplication, swapping and stealing. Curr Opin Struct Biol 7, 416–421.
Jomantiene, R. & Davis, R. E. (2006). Clusters of diverse genes existing as multiple, sequence-variable mosaics in a phytoplasma genome. FEMS Microbiol Lett 255, 59–65.
Jomantiene, R., Zhao, Y. & Davis, R. E. (2007). Sequence-variable mosaics: composites of recurrent transposition characterizing the genomes of phylogenetically diverse phytoplasmas. DNA Cell Biol 26, 557–564.
Kakizawa, S., Oshima, K., Nishigawa, H., Jung, H. Y., Wei, W., Suzuki, S., Tanaka, M., Miyata, S., Ugaki, M. & Namba, S. (2004). Secretion of immunodominant membrane protein from onion yellows phytoplasma through the Sec protein-translocation system in Escherichia coli. Microbiology 150, 135–142.
Karlin, S., Brocchieri, L., Bergman, A., Mrazek, J. & Gentles, A. J. (2002). Amino acid runs in eukaryotic proteomes and disease associations. Proc Natl Acad Sci U S A 99, 333–338.
Kashi, Y. & King, D. G. (2006). Simple sequence repeats as advantageous mutators in evolution. Trends Genet 22, 253–259.
Kashi, Y., King, D. & Soller, M. (1997). Simple sequence repeats as a source of quantitative genetic variation. Trends Genet 13, 74–78.
Katti, M. V., Ranjekar, P. K. & Gupta, V. S. (2001). Differential distribution of simple sequence repeats in eukaryotic genome sequences. Mol Biol Evol 18, 1161–1167.
King, D. G. (1994). Triple repeat DNA as a highly mutable regulatory mechanism. Science 263, 595–596.
Kube, M., Schneider, B., Kuhl, H., Dandekar, T., Heitmann, K., Migdoll, A. M., Reinhardt, R. & Seemüller, E. (2008). The linear chromosome of the plant-pathogenic mycoplasma ‘Candidatus Phytoplasma mali’. BMC Genomics 9, 306.
Kube, M., Mitrovic, J., Duduk, B., Rabus, R. & Seemüller, E. (2012). Current view on phytoplasma genomes and encoded metabolism. ScientificWorldJournal 2012, 185942.
Leclercq, S., Rivals, E. & Jarne, P. (2010). DNA slippage occurs at microsatellite loci without minimal threshold length in humans: a comparative genomic approach. Genome Biol Evol 2, 325–335.
Lee, I.-M., Davis, R. E. & Gundersen-Rindal, D. E. (2000). Phytoplasma: phytopathogenic mollicutes. Annu Rev Microbiol 54, 221–255.
Li, Y. C., Korol, A. B., Fahima, T., Beiles, A. & Nevo, E. (2002). Microsatellites: genomic distribution, putative functions and mutational mechanisms: a review. Mol Ecol 11, 2453–2465.
Li, Y. C., Korol, A. B., Fahima, T. & Nevo, E. (2004). Microsatellites within genes: structure, function, and evolution. Mol Biol Evol 21, 991–1007.
Liu, L., Panangala, V. S. & Dybvig, K. (2002). Trinucleotide GAA repeats dictate pMGA gene expression in Mycoplasma gallisepticum by affecting spacing between flanking regions. J Bacteriol 184, 1335–1339.
Loire, E., Higuet, D., Netter, P. & Achaz, G. (2013). Evolution of coding microsatellites in primate genomes. Genome Biol Evol 5, 283–295.
Morton, A., Davies, D. L., Blomquist, C. L. & Barbara, D. J. (2003). Characterization of homologues of the apple proliferation immunodominant membrane protein gene from three related phytoplasmas. Mol Plant Pathol 4, 109–114.
Moxon, R., Bayliss, C. & Hood, D. (2006). Bacterial contingency loci: the role of simple sequence DNA repeats in bacterial adaptation. Annu Rev Genet 40, 307–333.
Mrázek, J. (2006). Analysis of distribution indicates diverse functions of simple sequence repeats in Mycoplasma genomes. Mol Biol Evol 23, 1370–1385.
Mrázek, J., Guo, X. & Shah, A. (2007). Simple sequence repeats in prokaryotic genomes. Proc Natl Acad Sci U S A 104, 8472–8477.
Oshima, K., Kakizawa, S., Nishigawa, H., Jung, H. Y., Wei, W., Suzuki, S., Arashida, R., Nakata, D., Miyata, S. & other authors (2004). Reductive evolution suggested from the complete genome sequence of a plant-pathogenic phytoplasma. Nat Genet 36, 27–29.
Oshima, K., Kakizawa, S., Arashida, R., Ishii, Y., Hoshi, A., Hayashi, Y., Kagiwada, S. & Namba, S. (2007). Presence of two glycolytic gene clusters in a severe pathogenic line of Candidatus Phytoplasma asteris. Mol Plant Pathol 8, 481–489.
Rocha, E. P. C. & Blanchard, A. (2002). Genomic repeats, genome plasticity and the dynamics of Mycoplasma evolution. Nucleic Acids Res 30, 2031–2042.
Rocha, E. P. C., Matic, I. & Taddei, F. (2002). Over-representation of repeats in stress response genes: a strategy to increase versatility under stressful conditions? Nucleic Acids Res 30, 1886–1894.
Shuman, S. (1994). Novel approach to molecular cloning and polynucleotide synthesis using vaccinia DNA topoisomerase. J Biol Chem 269, 32678–32684.
Simmons, W. L., Denison, A. M. & Dybvig, K. (2004). Resistance of Mycoplasma pulmonis to complement lysis is dependent on the number of Vsa tandem repeats: shield hypothesis. Infect Immun 72, 6846–6851.
Suzuki, S., Oshima, K., Kakizawa, S., Arashida, R., Jung, H. Y., Yamaji, Y., Nishigawa, H., Ugaki, M. & Namba, S. (2006). Interaction between the membrane protein of a pathogen and insect microfilament complex determines insect-vector specificity. Proc Natl Acad Sci U S A 103, 4252–4257.
Tran-Nguyen, L. T. T., Kube, M., Schneider, B., Reinhardt, R. & Gibb, K. S. (2008). Comparative genome analysis of ‘Candidatus Phytoplasma australiense’ (subgroup tuf-Australia I; rp-A) and ‘Ca. Phytoplasma asteris’ strains OY-M and AY-WB. J Bacteriol 190, 3979–3991.
Trivedi, S. (2006). Comparison of simple sequence repeats in 19 archaea. Genet Mol Res 5, 741–772.
Trivedi, S. (2013). Repeats in transforming acidic coiled-coil (TACC) genes. Biochem Genet 51, 458–473.
Tsai, J. H. (1979). Vector transmission of mycoplasmal agents of plant diseases. In The Mycoplasmas, pp. 265–307. Edited by R. F. Whitcomb & J. G. Tully. San Diego: Academic Press.
van Belkum, A., Scherer, S., van Alphen, L. & Verbrugh, H. (1998). Short-sequence DNA repeats in prokaryotic genomes. Microbiol Mol Biol Rev 62, 275–293.
Wei, W., Davis, R. E., Lee, I.-M. & Zhao, Y. (2007). Computer-simulated RFLP analysis of 16S rRNA genes: identification of ten new phytoplasma groups. Int J Syst Evol Microbiol 57, 1855–1867.
Wei, W., Davis, R. E., Jomantiene, R. & Zhao, Y. (2008). Ancient, recurrent phage attacks and recombination shaped dynamic sequence-variable mosaics at the root of phytoplasma genome evolution. Proc Natl Acad Sci U S A 105, 11827–11832.
Young, E. T., Sloan, J. S. & Van Riper, K. (2000). Trinucleotide repeats are clustered in regulatory genes in Saccharomyces cerevisiae. Genetics 154, 1053–1068.
Zhao, Y., Wei, W., Lee, I. M., Shao, J., Suo, X. & Davis, R. E. (2009). Construction of an interactive online phytoplasma classification tool, iPhyClassifier, and its application in analysis of the peach X-disease phytoplasma group (16SrIII). Int J Syst Evol Microbiol 59, 2582–2593.
Zhao, Y., Wei, W., Davis, R. E. & Lee, I.-M. (2010). Recent advances in 16S rRNA gene-based phytoplasma differentiation, classification and taxonomy. In Phytoplasmas: Genomes, Plant Hosts and Vector, pp. 64–92. Edited by P. Weintraub & P. Jones. Wallingford, UK: CABI Publishing.
Zhao, Y., Davis, R. E., Wei, W., Shao, J. & Jomantiene, R. (2014). Phytoplasma genomes: evolution through mutually complementary mechanisms, gene loss and horizontal acquisition. In Genomics of Plant-Associated Bacteria, pp. 235–271. Edited by D. Gross, A. Lichens-Park & C. Kole. Heidelberg: Springer.