International Journal of Systematic and Evolutionary Microbiology (2015), 65, 2748–2760
DOI 10.1099/ijs.0.000273

Occurrence, distribution and possible functional roles of simple sequence repeats in phytoplasma genomes

Wei Wei, Robert E. Davis, Xiaobing Suo and Yan Zhao

Molecular Plant Pathology Laboratory, USDA-Agricultural Research Service, Beltsville, MD 20705, USA

Correspondence: Yan Zhao

Phytoplasmas are unculturable, cell-wall-less bacteria that parasitize plants and insects. This transkingdom life cycle requires rapid responses to vastly different environments, including transitions from plant phloem sieve elements to various insect tissues and alternations among diverse plant hosts. Features that enable such flexibility in other microbes include simple sequence repeats (SSRs), mutation-prone, phase-variable short DNA tracts that function as ‘evolutionary rheostats’ and enhance rapid adaptations. To gain insights into the occurrence, distribution and potential functional roles of SSRs in phytoplasmas, we performed computational analysis on the genomes of five completely sequenced phytoplasma strains, ‘Candidatus Phytoplasma asteris’-related strains OYM and AYWB, ‘Candidatus Phytoplasma australiense’-related strains CBWB and SLY and ‘Candidatus Phytoplasma mali’-related strain AP-AT. The overall density of SSRs in phytoplasma genomes was higher than in representative strains of other prokaryotes. While mono- and trinucleotide SSRs were significantly overrepresented in the phytoplasma genomes, dinucleotide SSRs and other higher-order SSRs were underrepresented. The occurrence and distribution of long SSRs in the prophage islands and phytoplasma-unique genetic loci indicated that SSRs played a role in compounding the complexity of sequence mosaics in individual genomes and in increasing allelic diversity among genomes.
Findings from computational analyses were further complemented by an examination of SSRs in various additional phytoplasma strains, with a focus on potential contingency genes. Some SSRs were located in regions that could profoundly alter the regulation of transcription and translation of affected genes and/or the composition of protein products.

Abbreviations: AMP, antigenic membrane protein; IMP, immunodominant membrane protein; SSR, simple sequence repeat; SVM, sequence-variable mosaic.

Four supplementary tables are available with the online Supplementary Material.

INTRODUCTION

Simple sequence repeats (SSRs), also termed microsatellites, are hypermutable DNA tracts that consist of tandemly repeated nucleotide motifs of one to several bases (van Belkum et al., 1998). SSRs are ubiquitous in the genomes of all studied eukaryotes and prokaryotes, lying across various regions of resident loci including protein-coding regions, 5′- and 3′-untranslated regions (UTRs), introns and non-transcribed genomic regions (Field & Wills, 1998; van Belkum et al., 1998; Gur-Arie et al., 2000; Coenye & Vandamme, 2005; Trivedi, 2006). The distribution of SSRs among different loci within a genome is non-random, and the abundances (or rarities) of individual SSR motifs are independent of the nucleotide compositions of the corresponding genomes (Katti et al., 2001; Li et al., 2002, 2004). For example, in Escherichia coli K-12, mononucleotide and trinucleotide SSRs are significantly overrepresented, whereas dinucleotide and tetranucleotide repeats are underrepresented in genes related to stress responses (Rocha et al., 2002). In the human genome, while SSRs of every possible motif of mono-, di-, tri- and tetranucleotide are enormously overrepresented (Ellegren, 2004), mono- and dinucleotide repeats are particularly dense in genes encoding transforming acidic coiled-coil proteins that are implicated in many types of cancer (Trivedi, 2013).
During DNA replication, unequal crossing over and related processes, SSRs are susceptible to mutations due to DNA polymerase slippage (Chistiakov et al., 2005). Such mutations add repeat units to or subtract them from existing SSRs, causing SSR expansion and contraction. While the slippage rate is somewhat correlated with the length of the SSR motif, slippage could occur without a minimal threshold length (Leclercq et al., 2010). SSR expansions and contractions play significant roles in genome instability, evolution and recombination (Ellegren, 2004; Loire et al., 2013). Because of their abundance and high polymorphism, SSRs provide informative molecular markers in genome mapping, population genetics and biological diversity studies (Ellegren, 2004).

In prokaryotes, SSR expansions and contractions are not only frequent but also readily reversible (Kashi & King, 2006). Since alterations in SSR repeat number within regulatory or coding regions may directly affect the expression of the associated genes or the functions of the encoded proteins, SSRs often function as contingency loci in bacterial genomes, providing an evolutionary ‘tuning knob’ for rapid adaptation to a changing environment (King, 1994; Kashi et al., 1997; Young et al., 2000; Bayliss et al., 2001; Karlin et al., 2002). SSRs acting as contingency loci are particularly common among mycoplasmas (Rocha & Blanchard, 2002; Kashi & King, 2006; Mrázek, 2006). In pathogenic bacteria, certain contingency loci are associated with genes that encode cell-surface antigens and may play a role in evading host immune responses (Moxon et al., 2006).
For example, in Mycoplasma gallisepticum, the number of trinucleotide GAA repeats upstream of a haemagglutinin (adhesin) gene, and hence the spacing between the repeats and the core promoter, influences the expression levels of the adhesin gene, accounting for phenotypic variations (Liu et al., 2002). Likewise, in Mycoplasma pulmonis, the length of a tandem repeat within the gene segment encoding the C-terminal variable region of a surface antigen correlates with the virulence level of the pathogen (Simmons et al., 2004).

Phytoplasmas are insect-transmitted, plant-phloem-inhabiting bacteria responsible for numerous diseases in diverse plant species worldwide (Doi et al., 1967; Tsai, 1979; Lee et al., 2000). Along with mycoplasmas and other cell-wall-less bacteria, phytoplasmas are classified in the class Mollicutes. So far, diverse phytoplasmas belonging to 37 candidate species of ‘Candidatus Phytoplasma’ and 32 major groups have been delineated based on 16S rRNA gene sequences (Wei et al., 2007; Zhao et al., 2009; Harrison et al., 2014). The nature of the phytoplasma life cycle requires rapid adaptation to exceptionally variable host environments, including sequential transitions from plant phloem sieve elements to the insect intestinal tract, haemolymph and salivary glands, and shuttling back to the phloem elements of a new plant host. While it would be interesting to learn whether SSRs play a role in the rapid adaptation of phytoplasmas to their vastly different host environments during transkingdom host switching, tools for functional assessment of SSRs are currently lacking due to our inability to establish phytoplasma cultures in cell-free medium and the consequent inaccessibility of measurable phenotypic characters.
However, the availability of complete genome sequence data from five phytoplasma strains (Oshima et al., 2004; Bai et al., 2006; Kube et al., 2008; Tran-Nguyen et al., 2008; Andersen et al., 2013) provides an opportunity to investigate the occurrence, distribution and potential functional roles of SSRs in phytoplasma genomes using bioinformatics tools. We found that, in phytoplasma genomes, SSRs are abundant and diverse in motif, repeat number and pattern of chromosomal distribution. The SSRs that occur in prophage islands and other phytoplasma-unique genetic loci undoubtedly compound the complexity of sequence mosaics within individual genomes and increase allelic diversity among genomes.

METHODS

Identification of SSRs in phytoplasma genomes. The assembled, complete genome sequences of five phytoplasma strains were retrieved from the National Center for Biotechnology Information (NCBI) nucleotide sequence database. The strains were ‘Candidatus Phytoplasma asteris’-related onion yellows mild strain (OY-M; GenBank accession no. NC_005303) and aster yellows witches’-broom strain (AY-WB; NC_007716), ‘Ca. Phytoplasma australiense’-related cottonbush witches’-broom strain (CBWB; NC_010544) and strawberry lethal yellows strain (SLY; NC_021236) and ‘Ca. Phytoplasma mali’-related apple proliferation strain AT (AP-AT; NC_011047). The motifs, repeat numbers and locations of SSRs in the phytoplasma genomes were identified with a computer program developed by Gur-Arie et al. (2000). The range of motif length was set to 1–10 nt, and the minimum number of repeats was set at 3. In the final SSR tally and SSR density calculation, only SSRs that were equal to or longer than 6 nt were counted (i.e. mononucleotide repeats shorter than 6 nt were excluded). The mean density of SSRs in a given genome was calculated by dividing the final count of SSRs by the length (kb) of the respective genome.

SSR frequency assessment.
To assess whether the observed SSRs in the phytoplasma genomes were merely chance associations of nucleotides, the observed frequency of each SSR motif in a given phytoplasma genome was compared with the occurrence of the same SSR motif in a corresponding randomized virtual genome (a computer-generated random genome with the same base composition as the actual genome of a given phytoplasma). A computer script was developed in this study based on the random function (rand) of the Perl programming language (version 5.10; http://www.perl.org/). One hundred virtual genomes were generated for each of the five phytoplasma strains using the Perl script. The virtual genomes were then analysed using the same SSR identification program with the same parameters described above. The mean frequency and standard error for each SSR motif in the 100 virtual genomes were calculated. Statistical significance between the frequency of a given SSR motif observed from a given actual phytoplasma genome and the mean frequency of the same SSR motif observed from the corresponding 100 virtual genomes was assessed with two-tailed t-tests. Statistical analysis was conducted using the Microsoft Excel program (Microsoft Office Suite 2007). DNA extraction from phytoplasma strains. To determine experimentally the repeat number variation of a nonanucleotide SSR motif (AAAATAAGG) in/around the gene encoding haemolysin III (hlyIII), six phytoplasma strains were used. These strains were aster yellows phytoplasma (AY1a), Oklahoma aster yellows phytoplasma (OKAY), New Jersey aster yellows phytoplasma (NJAY), clover phyllody phytoplasma (CPh), paulownia witches’-broom phytoplasma (PaWB) and Chinese wingnut witches’-broom phytoplasma (CWWB). Total phytoplasma DNAs were extracted from diseased plants according to the procedure described by Ahrens & Seemüller (1992). PCR amplification of the haemolysin III gene. 
The sequences of hlyIII and the neighbouring loci were identified from the genomes of OYM and AYWB. The sequences were aligned using the MEGALIGN program (CLUSTAL_X option) of the sequence analysis software suite Lasergene (DNASTAR). Primers for amplification of the full-length hlyIII were designed based on the alignment: HaemoF1, 5′-GGGTTTGATTTCAGGAAGGG-3′, and HaemoR1, 5′-TGTCTTTGGTCTTCATTG-3′. PCR amplifications were carried out using DNA templates extracted from six ‘Ca. P. asteris’-related strains. Each reaction mixture, in a total volume of 50 µl, contained 1 µl of the DNA extract prepared as described above, 25 pmol each of the forward and reverse primers, 200 µM each dNTP, 1× GeneAmp PCR buffer I and 2.5 U AmpliTaq DNA polymerase (Applied Biosystems). The cycling program consisted of 35 cycles of denaturation at 94 °C for 30 s, annealing at 56 °C for 45 s and extension at 72 °C for 45 s. Amplicons were analysed by electrophoresis through 1 % agarose gel.

TA cloning and DNA sequencing. PCR products were cloned into plasmid vector pCRII-TOPO (Invitrogen), and propagated in Escherichia coli as described previously (Shuman, 1994). DNA sequencing was performed with an ABI PRISM 377 automated DNA sequencer (Applied Biosystems). DNA sequence data were assembled using the SeqMan program of the sequence analysis software suite Lasergene (DNASTAR).

RESULTS AND DISCUSSION

Occurrence and composition of SSRs in phytoplasma genomes

Totals of 9338, 8873, 11654, 12416 and 9261 SSRs were observed in the genomes of OYM, AYWB, CBWB, SLY and AP-AT, respectively (Tables 1 and 2).
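The SSR search criteria given in Methods (motif length 1–10 nt, at least three repeats, total tract length of at least 6 nt) and the density calculation can be sketched as follows. This is an illustrative Python sketch only, not the program of Gur-Arie et al. (2000) used in the study. The function names, the left-to-right scan, the rule that a motif must not itself be a repeat of a shorter unit, and the way overlapping tracts are skipped are all assumptions made here for illustration.

```python
VALID_BASES = set("ACGT")


def is_primitive(motif):
    """True if the motif is not itself a tandem repeat of a shorter unit."""
    n = len(motif)
    return not any(n % k == 0 and motif[:k] * (n // k) == motif
                   for k in range(1, n))


def find_ssrs(seq, max_motif=10, min_repeats=3, min_length=6):
    """Return sorted (start, motif, repeats) tuples; start is 0-based.

    Assumed rules (hypothetical, for illustration): motifs of 1..max_motif nt,
    at least min_repeats tandem copies, and a tract of at least min_length nt,
    so mononucleotide runs shorter than 6 nt are not reported.
    """
    seq = seq.upper()
    hits = []
    for k in range(1, max_motif + 1):
        i = 0
        while i + k * min_repeats <= len(seq):
            motif = seq[i:i + k]
            if not (set(motif) <= VALID_BASES and is_primitive(motif)):
                i += 1
                continue
            j = i + k
            while seq[j:j + k] == motif:   # extend over further tandem copies
                j += k
            repeats = (j - i) // k
            if repeats >= min_repeats and j - i >= min_length:
                hits.append((i, motif, repeats))
                i = j                      # report each tract once
            else:
                i += 1
    return sorted(hits)


def ssr_density_per_kb(ssr_count, genome_length_kb):
    """Mean SSR density: number of SSRs divided by genome length in kb."""
    return ssr_count / genome_length_kb
```

With the OYM totals reported above (9338 SSRs in 853.09 kb), `ssr_density_per_kb` gives about 10.95 SSRs per kb, the value listed in Table 2.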
The mean density of SSRs in individual phytoplasma genomes ranged from 10.95 per kb (OYM) to 15.39 per kb (AP-AT), much higher than those in the genomes of other prokaryotes, including a representative Gram-negative bacterium (Escherichia coli; 3.33 per kb), a low-G+C-content Gram-positive bacterium (Bacillus subtilis; 4.74 per kb) and a human-pathogenic, cell-wall-less bacterium (Mycoplasma genitalium; 6.30 per kb) (Table 2). The SSR densities in phytoplasma genomes were also significantly higher than those of phylogenetically related, plant-associated species of the genus Acholeplasma, Acholeplasma brassicae (4.73 per kb) and A. palmae (7.17 per kb), and higher than those of plant-phloem-inhabiting candidate species of ‘Candidatus Liberibacter’, ‘Ca. Liberibacter americanus’ (7.90 per kb) and ‘Ca. L. asiaticus’ (6.39 per kb) (Table 2). The high SSR density in phytoplasmas was mainly the result of mononucleotide repeats (MNRs) (Table 2). MNRs of at least 6 nt (MNRs ≥6) were the most abundant SSRs, constituting 79.38 % (in AP-AT) to 84.68 % (in CBWB) of the total SSRs in the phytoplasma genomes (Table 1 and Table S1, available in the online Supplementary Material). MNRs with A or T motifs were far more frequent than MNRs with C or G motifs; in fact, the counts of A or T tracts were 148, 144, 129, 93 and 524 times greater than the counts of C or G tracts in the genomes of OYM, AYWB, CBWB, SLY and AP-AT, respectively (Table S2). Generally, low-G+C-content genomes tend to have more A or T runs, often leading to a larger number of MNRs. However, our results showed that the mean density of MNRs in a given phytoplasma genome is not always proportional to its G+C content. For example, whereas the genomes of OYM, AYWB and CBWB have comparable G+C contents (27.76, 26.89 and 27.42 %, respectively), the MNR densities of the three phytoplasmas were 8.71, 10.23 and 11.22 per kb, respectively (Table 2).
Our results also indicated that the MNR densities in the five phytoplasma genomes are much higher than that of A. palmae (4.53 per kb), a low-G+C-content, cell-wall-less bacterium closely related to phytoplasmas (Table 2). While the maximum repeat number of MNRs reached 15, the counts decreased significantly when the repeat number exceeded eight for A or T tracts, and six for C or G tracts (Tables 1 and S2). Compared with virtual phytoplasma genomes (computer-generated random genomes with the same base compositions as actual phytoplasmas), MNRs were overrepresented in each of the five phytoplasma genomes (P<0.01), especially MNRs in the range of 6 to 8 nt in length (Tables 1 and S1). Whereas MNRs occurred in both coding and non-coding regions of the phytoplasma genomes, it appeared that a majority of MNRs ranging from 6 to 10 nt in length were located in protein-coding regions. For example, 75.34, 68.75, 66.06, 69.93 and 76.02 % of 9-nt MNRs were observed in coding regions of the OYM, AYWB, CBWB, SLY and AP-AT genomes, respectively (Table 1).

SSRs with dinucleotide repeat motifs (DiNRs) accounted for 8.62–11.79 % of the total SSRs found in the five genomes (Tables 1 and S1). Among the 12 DiNR motifs, AT and TA were predominant (Table S2). The maximum repeat number of DiNR motifs was 7, 14, 8, 9 and 14 in the genomes of OYM, AYWB, CBWB, SLY and AP-AT, respectively (Tables 1 and S1). The distribution of DiNRs between coding and non-coding regions appeared to be length (repeat number)-dependent: while the frequencies of 6-nt DiNRs were slightly higher in coding regions, DiNRs longer than 10 nt were found mainly in the non-coding regions (Table 1). Overall, DiNRs were underrepresented (P<0.01) in the five phytoplasma genomes when compared with randomized virtual phytoplasma genomes (Table S1).
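The comparison with randomized virtual genomes that underlies the over- and underrepresentation statements above can be illustrated with a small sketch. The study used a Perl script based on rand; the Python version below is a stand-in written under stated assumptions. Shuffling the real sequence is used here as one simple way to keep the base composition identical, the counter covers only mononucleotide tracts, and the returned statistic is a one-sample t statistic (observed count against the mean of the virtual genomes, with n − 1 degrees of freedom) from which a two-tailed P value would still have to be looked up. All function names are hypothetical.

```python
import random
import statistics


def count_mnr_tracts(seq, min_len=6):
    """Count maximal single-base runs (MNRs) of at least min_len nt."""
    count = 0
    run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        prev = base
        if run == min_len:      # count each tract once, when it reaches min_len
            count += 1
    return count


def virtual_genome(seq, rng):
    """Shuffle the bases: same length and base composition, random order."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def compare_with_virtual(seq, counter=count_mnr_tracts, n_virtual=100, seed=1):
    """Compare an observed SSR count with counts from shuffled genomes."""
    rng = random.Random(seed)
    observed = counter(seq)
    simulated = [counter(virtual_genome(seq, rng)) for _ in range(n_virtual)]
    mean = statistics.mean(simulated)
    sem = statistics.stdev(simulated) / n_virtual ** 0.5   # standard error
    t_stat = (observed - mean) / sem if sem > 0 else float("nan")
    return {"observed": observed, "virtual_mean": mean,
            "virtual_sem": sem, "t": t_stat}
```

A positive t value indicates that the motif class is more frequent in the real sequence than expected from base composition alone (overrepresented); a negative value indicates underrepresentation.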
SSRs with trinucleotide repeat motifs (TrNRs) were overrepresented (P<0.01) in each of the five phytoplasma genomes when compared with randomized virtual phytoplasma genomes (Tables 1 and S1). A total of 58 triplet motif types were identified. Of the 64 possible triplet motif types, six (AAA, CCC, CCG, GCC, GGG and TTT) were absent from the five phytoplasma genomes (Table S2). An overwhelming majority of the TrNRs were 9 nt in length, and they occurred more frequently in coding regions than in non-coding regions (Table 1). TrNRs in coding regions apparently encode amino acid residues. It is well known that tandem repeats of amino acid residues can form secondary structures that lead to conformations such as alpha helices, beta pleated sheets or loops (Heringa & Taylor, 1997). Expansion or contraction of TrNRs can change such conformations and therefore change the corresponding protein surface structures and their abilities to interact with other macromolecules, including other

Table 1. Occurrence of SSRs in the genomes of five phytoplasma strains

MNR, mononucleotide repeat; DiNR, dinucleotide repeat; TrNR, trinucleotide repeat; TeNR, tetranucleotide repeat; PNR, pentanucleotide repeat; HxNR, hexanucleotide repeat; HeNR, heptanucleotide repeat; ONR, octanucleotide repeat; NNR, nonanucleotide repeat; DeNR, decanucleotide repeat; GW, genome-wide; CR, coding region; ~, MNRs shorter than six nucleotides (which were not counted in the total SSR tally); –, no SSRs in either GW or CR; %, percentage of SSRs located within protein-coding regions. The percentages of total protein-encoding sequences in the five phytoplasma genomes are as follows: OYM, 72.95 %; AYWB, 73.73 %; CBWB, 64.15 %; SLY, 77.35 %; AP-AT, 76.12 %.
Repeats (n) MNR GW/CR ~ ~ ~ 4280/2895 2363/1360 704/442 73/55 6/4 2/0 – 3/0 1/0 1/0 ~ ~ ~ 3865/2539 2366/1332 775/461 160/110 36/27 7/5 1/1 2/0 11/2 3/2 ~ ~ ~ 5096/2904 3631/1567 % TrNR TeNR PNR HxNR HeNR ONR NNR DeNR GW/CR % GW/CR % GW/CR % GW/CR % GW/CR % GW/CR % GW/CR % GW/CR % GW/CR % 60.26 49.37 20 0 0 689/471 31/19 2/1 – – – – – – – – – – 68.36 61.29 50 58/25 – – – – – – – – – – – – 43.10 10/2 – – – – – – – – – – – – 20 9/6 – – – – – – – – – – – – 66.67 2/0 – – – – – – – – – – – – 0 1/0 – – – – – – – – – – – – 0 1/1 – – – – – – – – – – – – 100 1/0 – – – – – – – – – – – – 0 0 0 0 1014/611 79/39 5/1 1/0 2/0 – – – – – – – – 57.34 53.95 20 593/406 34/19 7/4 1/1 – – – – – – – – – 68.47 55.88 57.14 100 52/27 – – – – – – – – – – – – 51.92 8/5 – – – – – – – – – – – – 62.5 8/6 – – – – – – – – – – – – 75 1/0 – – – – – – – – – – – – 0 65.69 56.30 59.48 68.75 75 71.43 100 0 18.18 66.67 858/492 76/41 5/1 – – – – – 1/0 – 1/0 1/0 – – – – – – – – – – – – – – 53.08 33.82 37.5 100 0 680/425 19/11 2/1 1/1 – 62.5 57.89 50 100 52/23 2/1 – – – 44.23 50 4/0 – – – – 0 15/5 – – – – 33.33 1/0 – – – – 0 56.99 43.16 925/491 68/23 8/3 1/1 1/0 67.64 57.55 62.78 75.34 66.67 0 0 0 0 Downloaded from www.microbiologyresearch.org by IP: 88.99.165.207 On: Mon, 31 Jul 2017 16:34:04 1/0 – – – – – – 1/1 – – – – – – – – – – 0 – – – – – 100 – – – – – – – – – – – – – – – – 1/0 – 0 Simple sequence repeats in phytoplasma genomes 2751 OYM 3 4 5 6 7 8 9 10 11 12 13 14 15 AYWB 3 4 5 6 7 8 9 10 11 12 13 14 15 CBWB 3 4 5 6 7 DiNR Repeats (n) International Journal of Systematic and Evolutionary Microbiology 65 8 9 10 11 12 13 14 15 SLY 3 4 5 6 7 8 9 10 11 12 13 14 15 AP-AT 3 4 5 6 7 8 9 10 11 12 13 14 MNR DiNR TrNR TeNR % GW/CR % GW/CR 934/476 165/109 14/11 4/0 1/1 – 8/0 17/4 50.96 66.06 78.57 0 100 0 0 23.53 2/0 – – – – – – – – – – – – – – – 62.89 46.99 8.33 33.33 0 731/564 29/23 – – – – – – – – – – – 77.15 79.31 52/25 2/1 – – – – – – – – – – – 48.08 50 4/0 – – – – – – – – – – – – 0 68.48 54.51 64.05 69.93 
50 25 0 0 33.33 25 1032/649 83/39 24/2 3/1 1/0 – 1/0 – – – – – – 8/5 – – – – – – – – – – – – 57.87 42.72 16.67 0 0 705/508 48/35 2/0 – – – – – – 72.06 72.92 0 – – 14/6 – – – – – – – – – – – 42.86 0 0 72/32 – – – – – – – – – – – 44.44 69.76 63.77 64.58 76.02 78.38 84.21 83.33 80 28.57 921/533 103/44 24/4 2/0 3/0 – – – – – 2/0 1/0 7/7 1/1 1/1 – – – – – – – – – ~ ~ ~ 3615/2522 2219/1415 1039/671 367/279 74/58 19/16 6/5 5/4 7/2 0 GW/CR % – – – – – – – – GW/CR HxNR GW/CR ~ ~ ~ 5453/3734 3746/2042 1043/668 153/107 22/11 4/1 2/0 2/0 15/5 4/1 % PNR % – – – – – – – – GW/CR HeNR % – – – – – – – – GW/CR ONR % GW/CR NNR % GW/CR GW/CR % – – – – – – – – 1/0 – – – – – – – 0 1/0 – – – – – – – – – – – – 0 – – – – – – – – – – – – – – – – 62.5 – – – – – – – – – – – – – 1/0 – – – – – – – – – – – – 0 – – – – – – – – – – – – – 100 100 100 – 1/0 – – – – – – – – – – 1/0 – – – – – – – – – – – 0 2/2 – – – – – – – – – – – Downloaded from www.microbiologyresearch.org by IP: 88.99.165.207 On: Mon, 31 Jul 2017 16:34:04 0 DeNR % 100 – – – – – – – – – – – – W. Wei and others 2752 Table 1. cont. Simple sequence repeats in phytoplasma genomes Table 2. Abundance and mean density of SSRs in each of the five phytoplasma genomes The mean density of SSRs of a given phytoplasma genome was calculated by dividing the total count of SSRs by the length (kb) of the genome. The mean density of SSRs in phytoplasma prophage islands was calculated by dividing the total count of SSRs within the phage islands by the length of concatenated genomic segments that comprise the prophage islands. For comparative purposes, the mean densities of SSRs in the genomes of Escherichia coli K-12 (GenBank accession no. NC_000913), Bacillus subtilis subsp. 
subtilis 168 (NC_000964), Mycoplasma genitalium G-37T (NC_000908), Acholeplasma brassicae 0502T (NC_022549), Acholeplasma palmae J233T (NC_022538), ‘Candidatus Liberibacter americanus’ Sao Paulo (NC_022793), ‘Candidatus Liberibacter asiaticus’ Psy62 (NC_012985) and Campylobacter jejuni subsp. jejuni NCTC 11168 (NC_002163) were also calculated with the same criteria. SSRs shorter than six nucleotides were not counted.

Strain | G+C content (mol%) | DNA length (kb) | MNR to DeNR: SSRs (n) | MNR to DeNR: density (per kb) | DiNR to DeNR: SSRs (n) | DiNR to DeNR: density (per kb) | MNRs only: SSRs (n) | MNRs only: density (per kb)
Whole genome
OYM | 27.76 | 853.09 | 9338 | 10.95 | 1905 | 2.23 | 7433 | 8.71
AYWB | 26.89 | 706.57 | 8873 | 12.56 | 1647 | 2.33 | 7226 | 10.23
CBWB | 27.42 | 879.96 | 11654 | 13.24 | 1784 | 2.03 | 9870 | 11.22
SLY | 27.19 | 959.78 | 12416 | 12.94 | 1972 | 2.05 | 10444 | 10.88
AP-AT | 21.39 | 601.94 | 9261 | 15.39 | 1910 | 3.17 | 7351 | 12.21
M. genitalium G-37T | 31.69 | 580.08 | 3655 | 6.30 | 819 | 1.41 | 2836 | 4.89
E. coli K-12 | 50.79 | 4641.65 | 15449 | 3.33 | 10083 | 2.17 | 5366 | 1.16
B. subtilis 168 | 43.51 | 4215.61 | 19984 | 4.74 | 11302 | 2.68 | 8682 | 2.06
A. brassicae 0502T | 35.77 | 1877.79 | 8889 | 4.73 | 3453 | 1.84 | 5436 | 2.89
A. palmae J233T | 28.98 | 1554.23 | 11151 | 7.17 | 4117 | 2.65 | 7034 | 4.53
‘Ca. L. americanus’ Sao Paulo | 31.11 | 1195.20 | 9445 | 7.90 | 4397 | 3.68 | 5048 | 4.22
‘Ca. L. asiaticus’ Psy62 | 36.47 | 1227.33 | 7843 | 6.39 | 3153 | 2.47 | 4690 | 3.67
C. jejuni NCTC 11168 | 30.55 | 1641.48 | 13297 | 8.10 | 3102 | 1.89 | 10195 | 6.21
Prophage islands (SVMs)
OYM | | 264.22 | 2997 | 11.34 | 701 | 2.65 | 2296 | 8.69
AYWB | | 160.20 | 2031 | 12.68 | 486 | 3.03 | 1545 | 9.64
CBWB | | 318.26 | 3520 | 11.06 | 680 | 2.14 | 2840 | 8.92
SLY | | 327.27 | 3318 | 10.14 | 704 | 2.15 | 2614 | 7.99
Outside SVMs
OYM | | 588.87 | 6343 | 10.77 | 1208 | 2.05 | 5135 | 8.72
AYWB | | 546.37 | 6842 | 12.52 | 1161 | 2.12 | 5681 | 10.40
CBWB | | 561.70 | 8134 | 14.48 | 1106 | 1.97 | 7028 | 12.51
SLY | | 632.51 | 9096 | 14.38 | 1266 | 2.00 | 7830 | 12.38

proteins. Interestingly, in the coding regions, the predominant TrNR motif differed among phytoplasma lineages, even among closely related strains.
For example, while ATT was the most abundant TrNR motif in the genomes of both AYWB and OYM as a whole, ATT was the predominant TrNR motif in the AYWB coding regions, whereas CAA was the predominant TrNR motif in the OYM coding regions (data not shown).

The counts of SSRs with tetranucleotide repeat motifs (TeNRs) were far lower than those of TrNRs (Table 1), and the composition of motif types differed significantly among the five genomes. While a total of 39 TeNR motif types were identified, only eight motif types, AAAG, AAAT, AATA, AATT, ATTA, ATTT, TTAT and TTTA, were present in all five completely sequenced genomes. Twelve combinations of TeNR motif types were present in no more than one genome. For example, the combination of GTTT and TTCT repeats was found only in the genome of OYM (Table S2). Such strain-specific SSR motif combinations could be exploited as molecular markers for phytoplasma strain-typing.

SSRs with motif lengths equal to or greater than pentanucleotides were rare. Furthermore, the repeat number of these long motifs rarely exceeded four (Table 1), with the following exceptions: a nonanucleotide motif (ATAAGGAAA) repeated five times at position 581651 in the genome of AYWB; a decanucleotide motif (TAAATAATAA) repeated six times at position 878250 and eight times at position 878313 in the genome of CBWB; and a hexanucleotide motif (TATTTT) repeated five times at position 307958 in the genome of AP-AT (Table S2).

Since phytoplasmas constitute a unique group of cell-wall-less bacteria that have distinctive ecological, nutritional, biochemical, genomic and phylogenetic properties (Zhao et al., 2014), we devoted our attention to SSRs that occurred in phytoplasma-unique genomic loci in the subsequent analyses. We also narrowed our focus to SSRs that were longer than eight nucleotides, as longer SSRs are more prone to expansion/contraction because of a higher probability of polymerase slippage (Leclercq et al., 2010).

Table 3. Representative high-order SSRs in the genomes of closely related strains: allelic diversity and potential contingency loci

High-order SSRs refer to SSRs with repeat motif-length from 6 to 10 nt. Gene annotations are based on the original genome sequencing reports. –, motif not present at the corresponding locus.

‘Ca. P. asteris’-related strains OYM and AYWB
Motif | Repeats (n), OYM/AYWB | Location and associated gene in OYM genome | Location and associated gene in AYWB genome
CTTTGT | 3/1 | 36285–36302 (coding), PAM027 hypothetical protein | 20303–20298 (coding), AYWB016 hypothetical protein
GCTTTT | 3/1 | 162853–162870 (coding), PAM137 hypothetical protein | 606171–606166 (coding), AYWB584 (thiJ)
TTTATT | 3/1 | 196083–196100 (non-coding), upstream of PAM165 (rpsB) | 574307–574302 (non-coding), upstream of AYWB554 (rpsB)
CATTTT | 3/1 | 471290–471307 (coding), PAM763 (uvrB) | 411901–411906 (coding), AYWB399 (uvrB)
ATTCAG | 3/0 | 540802–540819 (coding), PAM481 hypothetical protein | –
GATAAT | 3/2 | 591978–591995 (non-coding), upstream of PAM526 (tra5 fragment) | 228987–228976 (non-coding), upstream of AYWB215 (tra5)
TTTGTG | 3/1 | 850098–850115 (non-coding), upstream of PAM752 (malK) | 703978–703983 (non-coding), upstream of AYWB670 (malK)
GTTATTT | 3/0 | 771597–771617 (non-coding), upstream of PAM678 (ffh) | –
TTATTAC | 3/0 | 798029–798049 (non-coding), upstream of PAM705 (grpE) | –
AAAACTAA | 3/1 | 506868–506891 (non-coding), upstream of PAM456 (artM) | 335245–335252 (non-coding), upstream of AYWB315 (artM)
TTGATTTGA | 3/0 | 797751–797777 (coding), PAM705 (grpE) | –
TTTATTTTTG | 3/0 | 239461–239490 (non-coding), upstream of PAM195 hypothetical protein | –
CAAGAA | 1/3 | 772340–772335 (coding), PAM679 (ftsY) | 75157–75174 (coding), AYWB064 (ftsY)
GATAAT | 2/3 | 774677–774688 (non-coding), upstream of PAM681 (pseudo-tra5) | 194922–194939 (non-coding), upstream of AYWB174 (pseudo-tra5)
TTTTTA | 1/3 | 477689–477694 (coding), PAM436 hypothetical protein | 341419–341436 (coding), AYWB320 hypothetical protein
TTCTAA | 0/3 | – | 372064–372081 (coding), AYWB353 hypothetical protein
TCTTTT | 3/3 | 314199–314216 (coding), PAM272 hypothetical protein | 463349–463366 (coding), AYWB448 ‘HAD hydrolase’
TTGTAA | 3/3 | 168109–168126 (coding), PAM142 (udk) | 600806–600823 (coding), AYWB578 (udk)
TTATTT | 1/3 | 842877–842882 (coding tail), PAM747 (thdF) | 696816–696833 (coding+non-coding), AYWB665 (trmE) and its downstream region
TTTTTC | 2/3 | 846873–846884 (coding), PAM749 (ugpB) | 700757–700774 (coding), AYWB667 (malE)
GATATTT | 0/3 | – | 671414–671434 (non-coding), upstream of AYWB643 hypothetical protein
AAAATAAGG | 1/5 | 188423–188415 (non-coding), upstream of PAM158 (hlyIII) | 581648–581692 (coding), PAM561 (hlyIII)

‘Ca. P. australiense’-related strains SLY and CBWB
Motif | Repeats (n), SLY/CBWB | Location and associated gene in SLY genome | Location and associated gene in CBWB genome
AAAGAA | 3/3 | 286028–286045 (coding), SLY_0335 (polC) | 543289–543306 (coding), PAa_0541 (polC)
ATTATA | 3/3 | 279226–279243 (non-coding), upstream of SLY_0331 (lplA) | 536488–536505 (non-coding), upstream of PAa_0537 (lplA)
ATTATC | 3/3 | 482953–482970 (non-coding), after SLY_0554 (tra5) | 737620–737637 (non-coding), downstream of PAa_0720 (tra5)
ATTATC | 0/3 | – | 812232–812249 (non-coding), upstream of PAa_0790 (fliA)
ATTATC | 0/3 | – | 321316–321333 (non-coding), upstream of PAa_0291 putative methyltransferase
ATTATC | 0/3 | – | 368643–368660 (non-coding), upstream of PAa_0354 hypothetical protein
CATAAA | 3/3 | 433504–433521 (coding), SLY_0500 hypothetical protein | 693005–693022 (coding), PAa_0685 hypothetical protein
CCAAAA | 3/3 | 84828–84845 (coding), SLY_0098 (mgtA) | 96888–96905 (coding), PAa_0089 (mgtA)
TCTATT | 3/3 | 169685–169702 (non-coding), upstream of SLY_0199 (engC) | 364253–364270 (non-coding), upstream of PAa_0348 (engC)
TTTCTT | 3/3 | 746651–746668 (coding), SLY_0869 (potA) | 156041–156058 (coding), PAa_0135 (potA)
TTTTGG | 3/3 | 781538–781555 (coding), SLY_0907 hypothetical protein | 191613–191630 (coding), PAa_0160 hypothetical protein
TTAAATAA | 3/2 | 345172–345195 (coding), SLY_0413 hypothetical protein | 602438–602453 (coding), PAa_0607 hypothetical protein
GATAAT | 0/3 | – | 654853–654870 (non-coding), downstream of PAa_0655 putative methylase
GTTGTA | 1/3 | 262076–262081 (coding), SLY_0313 (pan) | 521453–521470 (coding), PAa_0523 putative peptidase M41 cell division protein

SSR distribution in phage-derived genomic islands

Despite their small size, each of the five completely sequenced phytoplasma genomes, as well as other partially sequenced phytoplasma genomes, contains large numbers of multiple-copy genes of unknown function. These multiple-copy genes are clustered in non-randomly distributed segments termed sequence-variable mosaics (SVMs), a distinctive architecture of phytoplasma genomes (Jomantiene & Davis, 2006; Jomantiene et al., 2007). Sequence stretches referred to as potential mobile units (PMUs) in the genomes of AYWB, CBWB and SLY phytoplasmas (Bai et al., 2006; Tran-Nguyen et al., 2008; Andersen et al., 2013) are mostly located in SVM regions. Several lines of evidence indicated that the SVMs were genomic islands formed through recurrent phage attacks and subsequent recombination events (Wei et al., 2008). Phage-derived genomic islands often occupy a significant portion of the resident genome. For example, phage-derived islands encompass 374 clustered, multiple-copy genes, and account for over 36 % of the total length of the circular CBWB chromosome (Zhao et al., 2014).

In the present study, we examined four genomes that possess characteristic SVM structures. A total of 2997 (32.07 %), 2031 (22.89 %), 3520 (30.20 %) and 3318 (35.83 %) SSRs were identified from prophage islands in the genomes of OYM, AYWB, CBWB and SLY, respectively (Table 2). The numbers in parentheses indicate the percentages of prophage island SSRs over the total SSRs in the corresponding genome. Apparently, the SSR density profiles differed between the phage islands and the rest of the resident genomes: while the density of mononucleotide SSRs was higher in the resident genomes than in the prophage islands, the opposite was true for the density of dinucleotide and higher-order motif SSRs (Table 2). This differential SSR distribution pattern was consistent among the four phytoplasma genomes examined. We reported previously that DNA sequences in the phytoplasma prophage islands possess distinct physical properties, including a G+C content lower than that of the host phytoplasma genes and a relative dinucleotide abundance that is drastically different from that of their respective host DNA (Wei et al., 2008). Differential SSR density profiles revealed in the present study add another distinctive physical property to the phytoplasma prophage islands.

Although extant phytoplasmas probably shared a common ancestor, emerged as a single clade (Wei et al., 2008) and still comprise a phylogenetically coherent group (Gundersen et al., 1994; Zhao et al., 2010), diverse phytoplasma lineages have evolved in adaptation to a broad range of bio- and geo-ecological niches. The genetic diversity of phytoplasmas is also reflected in the occurrence and distribution of SSRs in the coding regions of phytoplasmal prophage islands: the occurrence of long SSRs (≥10 nt) and their motifs in these islands differed significantly among the five phytoplasma genomes.
For example, OYM and AYWB are two closely related strains affiliated with the same Candidatus species. The two strains also share homologous prophage sequences (Wei et al., 2008). In the OYM prophage islands, only 13 long SSRs were identified; most of them had a tri- or tetranucleotide motif, and none had a mononucleotide motif. In the AYWB genome, on the other hand, 30 long SSRs were identified, 16 of which had a mononucleotide motif (Table S3). This difference is particularly noteworthy, considering that the AYWB genome had smaller SVM regions (Wei et al., 2008) and fewer total SSRs in the prophage islands than the OYM genome (Table 2). Not surprisingly, most of the long SSRs that occurred in the OYM and AYWB prophage islands were associated with ‘SVM genes’ (Jomantiene & Davis, 2006; Jomantiene et al., 2007) or ‘mobile unit genes (MUG)’ (Arashida et al., 2008a) that encode putative phage structural or functional proteins (Wei et al., 2008). It is reasonable to predict that the presence of long SSRs in such genes further increases the sequence variability of the SVMs within a genome and the allelic diversity of these loci among closely related strains. In addition, long SSRs were found in strain-specific genes including PAM761, AYWB_202, AYWB_205 and AYWB_274 (Table S3). Prophage islands contain numerous phytoplasma-unique and/or lineage-specific genes including morons, transduced genes (Wei et al., 2008) and genes acquired through an integron mobile gene cassette-like system in the hypervariable regions of prophage islands (Jomantiene et al., 2007), some of which encode putative virulence factors (Gedvilaite et al., 2014). Future studies will be needed to determine the functions of these SSR-associated, strain-specific genes and the role of the SSRs in modulating the functions of the genes.

CBWB and SLY are another pair of closely related strains affiliated with the same Candidatus species.
The genomes of both strains have extensive prophage islands. Thirty long SSRs (≥10 nt) were identified within 21 protein-encoding genes located in the prophage islands of the CBWB genome (Table S3). Nine of the 21 genes had two long SSRs (PAa_0049, PAa_0067, PAa_0189, PAa_0204, PAa_0236, PAa_0382, PAa_0416, PAa_0745 and PAa_0798). Furthermore, all nine of these genes, plus another two long-SSR-bearing genes (PAa_0280 and PAa_0285), shared mutually high sequence similarity and were members of the same mosaic gene family; the lengths of these genes varied, indicating that some had become truncated or decayed. Similarly, in the SLY genome, 27 long SSRs (≥10 nt) were identified within 20 protein-encoding genes in the prophage islands; some of the genes contained two or more SSRs. A majority of these 20 genes fell into three mosaic gene families: (i) SLY_0152, SLY_0157, SLY_0182, SLY_0593, SLY_0930, SLY_1000 and SLY_1001; (ii) SLY_0604, SLY_0715, SLY_0768, SLY_0979 and SLY_1103; and (iii) SLY_0696, SLY_0942 and SLY_0962. Members of each mosaic gene family had different lengths, indicating evolutionary decay or truncation. The results from analysis of long SSRs in CBWB and SLY prophage islands further support our hypothesis, set out in the previous paragraph, that SSRs play a role in compounding the complexity of the mosaics (SVMs) of clustered reparative genes within individual phytoplasma genomes and in increasing the allelic diversity among closely related strains. As in the case of the OYM and AYWB genomes, long SSRs in the CBWB and SLY prophage islands also occurred within species- and/or strain-specific genes such as PAa_0293, PAa_0329, PAa_0343, PAa_0651, PAa_0727, PAa_0729 and SLY_1095. Several of these lineage-specific genes were apparently fragmented, raising the question, and a topic for future study, of whether SSRs played a role in lineage-specific decay of these genes.
Lineage-specific gene decay in the genomes of diverse phytoplasmas has been described previously (Davis et al., 2003, 2005; Oshima et al., 2007).

SSRs in phytoplasma-unique genes outside of prophage islands

Outside the prophage islands, phytoplasmas possess additional unique genes that are absent from all other cell-wall-less bacteria (Zhao et al., 2014). While most of these phytoplasma-unique, non-phage genes encode hypothetical proteins of unknown function, a subset of about 20 genes can be functionally annotated. Our analysis revealed that a significant portion of these phytoplasma-unique genes had SSRs in their coding regions (Table S4).

It is worth noting that multiple SSRs were identified within genes encoding phytoplasma-unique transporters (Table S4). While most of these genes had multiple copies of MNRs of eight to nine nucleotides or TrNRs of three to four repeat units, the AP-AT malate/citrate symporter gene had a DiNR of 28 nt. Since phytoplasmas have limited metabolic capacities, they must be sustained by the constant exchange of metabolites with host cells, securing steady import of nutrients and timely efflux of toxins (Oshima et al., 2004; Kube et al., 2012). In addition, cross-membrane transport may also be required for mediating secretion of potential virulence factors and maintaining intracellular redox potentials. It would be interesting to learn whether the observed abundance of SSRs in genes encoding phytoplasma-unique transporters, especially the substrate-binding subunits of the transport systems, plays a role in tuning or broadening the substrate specificity of the respective transporters.

SSRs were also found within genes encoding immunodominant or antigenic membrane proteins (IMPs or AMPs) (Table S4).
In diverse phytoplasmas, IMP/AMP genes are highly expressed, and therefore the AMPs are abundant at the surface of the phytoplasma cells (Morton et al., 2003; Kakizawa et al., 2004; Arashida et al., 2008b). As inferred from their coding sequences, AMPs and IMPs are rich in positively charged amino acids, and therefore have an isoelectric point greater than 8.0. At physiological pH, such membrane proteins would tend to present, at the cell surface, positively charged sites or pockets that are critical for ligand binding, signal perception and other biochemical functions during pathogen–host interactions (Suzuki et al., 2006; Boonrod et al., 2012; Zhao et al., 2014). Conceivably, SSRs (especially TrNRs) in IMP/AMP genes may reversibly alter the amino acid sequence, length and conformation (Heringa & Taylor, 1997) of the AMPs and IMPs, thus modulating phytoplasma–host interactions and even escaping host immune surveillance.

Fig. 1. PCR amplification of the full-length haemolysin III gene (hlyIII) and flanking sequences from six ‘Ca. Phytoplasma’-related strains using primer pair HaemoF1/HaemoR1. M, 1 kb Plus DNA ladder; 1, aster yellows phytoplasma (AY1a); 2, Oklahoma aster yellows phytoplasma (OKAY); 3, New Jersey aster yellows phytoplasma (NJAY); 4, clover phyllody phytoplasma (CPh); 5, paulownia witches’-broom phytoplasma (PaWB); 6, Chinese wingnut witches’-broom phytoplasma (CWWB).

The presence of multiple SSRs in genes that encode phosphatidylserine decarboxylase (Psd) and phosphatidylserine synthase (PssA) is also intriguing (Table S4).
All five completely sequenced phytoplasma genomes possess a complex set of phospholipid biosynthesis pathway genes and, notably, among mollicutes, phytoplasmas are the only group of organisms whose genomes encode Psd and PssA (Kube et al., 2012). Since the phospholipid biosynthesis pathway plays an important role in the virulence of diverse pathogens including fungi (Chen et al., 2010) and bacteria (Conde-Alvarez et al., 2006; Bukata et al., 2008), the presence of multiple SSRs in the psd and pssA genes further stimulates our interest in exploring the functions of phospholipids in phytoplasma pathogenesis.

Allelic diversity and potential contingency loci

It has been reported that long SSRs, especially SSRs with long motif length, are underrepresented in most prokaryotic genomes, and that they often function as contingency loci, affecting gene expression through altering the motif repeat numbers (Moxon et al., 2006; Mrázek et al., 2007). In this study, we identified 22 SSRs with motif lengths longer than hexanucleotide in the genomes of two ‘Ca. P. asteris’-related strains, OYM and AYWB (Table 3). Comparative analysis of the genetic loci that bear such long SSRs revealed that motif repeat number variations (SSR polymorphisms) occurred in 20 of the 22 SSR motifs, and only two SSR motifs (TCTTTT and TTGTAA) did not appear to be polymorphic between the two phytoplasma genomes. Among the 20 polymorphic SSR loci, 11 were located within coding regions and nine were in 5′-UTR regions. Pairwise sequence alignment of the allelic polymorphic SSRs revealed that sequence mismatches (indels) were essentially confined within the SSRs due to motif repeat number variations, and the sequences flanking the SSR tracts were identical or nearly identical (data not shown). Likewise, 16 SSRs with motif lengths longer than hexanucleotide were identified in the genomes of two ‘Ca. P. australiense’-related strains, CBWB and SLY. SSR polymorphisms occurred in eight of the 16 SSR motifs (Table 3). These data suggest that polymorphisms in high-order (long-motif) SSRs contribute significantly to allelic diversity among closely related lineages, and such potentially reversible polymorphic loci may serve as contingency loci.

[Fig. 2 schematic. Full length of hlyIII ORF: NJAY, 831 bp (276 aa), repeats 1–4; OKAY, 840 bp (279 aa), repeats 1–5; AYWB, 840 bp (279 aa), repeats 1–5; PaWB, 615 bp (204 aa); OYM, 615 bp (204 aa), repeat 1.]

Fig. 2. Allelic polymorphism of a nonanucleotide-motif SSR associated with the hlyIII gene of five ‘Ca. P. asteris’-related strains. Partial coding nucleotide sequences and deduced amino acid sequences of the hlyIII loci were aligned. The translational initiation codon in each coding sequence is indicated by an asterisk (*), and the N-terminal methionine is shown in bold. The nonanucleotide repeat motif (CCTTATTTT) is delineated by a box. A possible partial or decayed repeat unit (CCTTAT) upstream of the PaWB hlyIII translation initiation codon is underlined. The annotations of the OYM and AYWB hlyIII translational initiation codons are based on the respective original genome sequencing reports (Oshima et al., 2004; Bai et al., 2006) and the corresponding GenBank records (accession numbers NC_005303 and NC_007716).

Interestingly, a nonanucleotide SSR motif (AAAATAAGG, reverse complement CCTTATTTT) was repeated five times in the coding region of the haemolysin III gene (hlyIII) in the AYWB genome, while only one copy of this SSR motif was observed in the 5′-UTR region of the OYM hlyIII. Since hlyIII encodes an AMP, a suspected virulence factor, we investigated the hlyIII-associated SSR tracts further in six additional ‘Ca. P. asteris’-related strains (see Methods).
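The repeat-number comparison for the nonanucleotide motif can be illustrated with a short Python sketch. It is offered only as an illustration under stated assumptions: the helper names are hypothetical, only perfect tandem copies are counted, and the two allele strings are invented placeholders rather than hlyIII sequences from any strain.

```python
def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def max_tandem_repeats(seq, motif):
    """Largest number of perfect tandem copies of the motif, searched on both strands."""
    seq = seq.upper()
    best = 0
    for m in {motif.upper(), reverse_complement(motif)}:
        k = len(m)
        start = seq.find(m)
        while start != -1:
            reps = 1
            while seq[start + reps * k:start + (reps + 1) * k] == m:
                reps += 1
            best = max(best, reps)
            start = seq.find(m, start + 1)   # also try later occurrences
    return best


# Invented placeholder alleles (hypothetical; not real hlyIII sequences)
allele_long = "ATG" + "CCTTATTTT" * 5 + "GGA"
allele_short = "CCTTATTTT" + "ATGGGA"

print(reverse_complement("AAAATAAGG"))
print(max_tandem_repeats(allele_long, "AAAATAAGG"))
print(max_tandem_repeats(allele_short, "AAAATAAGG"))
```

Because the motif is searched together with its reverse complement, the count does not depend on which strand a locus happens to be annotated on, which is the reason the text gives both AAAATAAGG and CCTTATTTT.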
PCRs were conducted using primers annealing to conserved nucleotide sequence blocks flanking hlyIII. PCRs with DNA templates derived from three strains, OKAY, NJAY and PaWB, yielded amplicons (Fig. 1). Results from DNA sequencing analysis of the cloned amplicons revealed additional polymorphic alleles of this SSR motif. Since the allelic polymorphism (i.e. variations in SSR motif repeat number) occurred within either the coding or 5′ regulatory regions of hlyIII, depending on individual strains (Fig. 2), this polymorphic SSR could conceivably affect both the expression of the gene and the composition of the encoded haemolysins. Functional characterizations of these allelic SSR loci, including assays of haemolytic activities of varied protein products, are being undertaken to advance our understanding of haemolysins and SSRs in phytoplasma pathogenesis.

REFERENCES

Ahrens, U. & Seemüller, E. (1992). Detection of DNA of plant pathogenic mycoplasmalike organisms by a polymerase chain reaction that amplifies a sequence of the 16S rRNA gene. Phytopathology 82, 828–832.
Andersen, M. T., Liefting, L. W., Havukkala, I. & Beever, R. E. (2013). Comparison of the complete genome sequence of two closely related isolates of ‘Candidatus Phytoplasma australiense’ reveals genome plasticity. BMC Genomics 14, 529.
Arashida, R., Kakizawa, S., Hoshi, A., Ishii, Y., Jung, H. Y., Kagiwada, S., Yamaji, Y., Oshima, K. & Namba, S. (2008a). Heterogeneic dynamics of the structures of multiple gene clusters in two pathogenetically different lines originating from the same phytoplasma. DNA Cell Biol 27, 209–217.
Arashida, R., Kakizawa, S., Ishii, Y., Hoshi, A., Jung, H. Y., Kagiwada, S., Yamaji, Y., Oshima, K. & Namba, S. (2008b). Cloning and characterization of the antigenic membrane protein (Amp) gene and in situ detection of Amp from malformed flowers infected with Japanese hydrangea phyllody phytoplasma. Phytopathology 98, 769–775.
Bai, X., Zhang, J., Ewing, A., Miller, S. A., Jancso Radek, A., Shevchenko, D. V., Tsukerman, K., Walunas, T., Lapidus, A. & other authors (2006). Living with genome instability: the adaptation of phytoplasmas to diverse environments of their insect and plant hosts. J Bacteriol 188, 3682–3696.
Bayliss, C. D., Field, D. & Moxon, E. R. (2001). The simple sequence contingency loci of Haemophilus influenzae and Neisseria meningitidis. J Clin Invest 107, 657–666.
Boonrod, K., Munteanu, B., Jarausch, B., Jarausch, W. & Krczal, G. (2012). An immunodominant membrane protein (Imp) of ‘Candidatus Phytoplasma mali’ binds to plant actin. Mol Plant Microbe Interact 25, 889–895.
Bukata, L., Altabe, S., de Mendoza, D., Ugalde, R. A. & Comerci, D. J. (2008). Phosphatidylethanolamine synthesis is required for optimal virulence of Brucella abortus. J Bacteriol 190, 8197–8203.
Chen, Y. L., Montedonico, A. E., Kauffman, S., Dunlap, J. R., Menn, F. M. & Reynolds, T. B. (2010). Phosphatidylserine synthase and phosphatidylserine decarboxylase are essential for cell wall integrity and virulence in Candida albicans. Mol Microbiol 75, 1112–1132.
Chistiakov, D. A., Hellemans, B., Haley, C. S., Law, A. S., Tsigenopoulos, C. S., Kotoulas, G., Bertotto, D., Libertini, A. & Volckaert, F. A. (2005). A microsatellite linkage map of the European sea bass Dicentrarchus labrax L. Genetics 170, 1821–1826.
Coenye, T. & Vandamme, P. (2005). Characterization of mononucleotide repeats in sequenced prokaryotic genomes. DNA Res 12, 221–233.
Conde-Alvarez, R., Grilló, M. J., Salcedo, S. P., de Miguel, M. J., Fugier, E., Gorvel, J. P., Moriyón, I. & Iriarte, M. (2006). Synthesis of phosphatidylcholine, a typical eukaryotic phospholipid, is necessary for full virulence of the intracellular bacterial parasite Brucella abortus. Cell Microbiol 8, 1322–1335.
Davis, R. E., Jomantiene, R., Zhao, Y. & Dally, E. L. (2003). Folate biosynthesis pseudogenes, ΨfolP and ΨfolK, and an O-sialoglycoprotein endopeptidase gene homolog in the phytoplasma genome. DNA Cell Biol 22, 697–706.
Davis, R. E., Jomantiene, R. & Zhao, Y. (2005). Lineage-specific decay of folate biosynthesis genes suggests ongoing host adaptation in phytoplasmas. DNA Cell Biol 24, 832–840.
Doi, Y. M., Teranaka, M., Yora, K. & Asuyama, H. (1967). Mycoplasma or PLT group-like microorganisms found in the phloem elements of plants infected with mulberry dwarf, potato witches’-broom, aster yellows, or paulownia witches’-broom. Ann Phytopathol Soc Jpn 33, 259–266.
Ellegren, H. (2004). Microsatellites: simple sequences with complex evolution. Nat Rev Genet 5, 435–445.
Field, D. & Wills, C. (1998). Abundant microsatellite polymorphism in Saccharomyces cerevisiae, and the different distributions of microsatellites in eight prokaryotes and S. cerevisiae, result from strong mutation pressures and a variety of selective forces. Proc Natl Acad Sci U S A 95, 1647–1652.
Gedvilaite, A., Jomantiene, R., Dabrisius, J., Norkiene, M. & Davis, R. E. (2014). Functional analysis of a lipolytic protein encoded in phytoplasma phage based genomic island. Microbiol Res 169, 388–394.
Gundersen, D. E., Lee, I.-M., Rehner, S. A., Davis, R. E. & Kingsbury, D. T. (1994). Phylogeny of mycoplasmalike organisms (phytoplasmas): a basis for their classification. J Bacteriol 176, 5244–5254.
Gur-Arie, R., Cohen, C. J., Eitan, Y., Shelef, L., Hallerman, E. M. & Kashi, Y. (2000). Simple sequence repeats in Escherichia coli: abundance, distribution, composition, and polymorphism. Genome Res 10, 62–71.
Harrison, N., Davis, R. E., Oropeza, C., Helmick, E., Narvaez, M., Eden-Green, S., Dollet, M. & Dickinson, M. (2014). ‘Candidatus Phytoplasma palmicola’, associated with a lethal yellowing-type disease of coconut (Cocos nucifera L.) in Mozambique. Int J Syst Evol Microbiol 64, 1890–1899.
Heringa, J. & Taylor, W. R. (1997). Three-dimensional domain duplication, swapping and stealing. Curr Opin Struct Biol 7, 416–421.
Jomantiene, R. & Davis, R. E. (2006). Clusters of diverse genes existing as multiple, sequence-variable mosaics in a phytoplasma genome. FEMS Microbiol Lett 255, 59–65.
Jomantiene, R., Zhao, Y. & Davis, R. E. (2007). Sequence-variable mosaics: composites of recurrent transposition characterizing the genomes of phylogenetically diverse phytoplasmas. DNA Cell Biol 26, 557–564.
Kakizawa, S., Oshima, K., Nishigawa, H., Jung, H. Y., Wei, W., Suzuki, S., Tanaka, M., Miyata, S., Ugaki, M. & Namba, S. (2004). Secretion of immunodominant membrane protein from onion yellows phytoplasma through the Sec protein-translocation system in Escherichia coli. Microbiology 150, 135–142.
Karlin, S., Brocchieri, L., Bergman, A., Mrazek, J. & Gentles, A. J. (2002). Amino acid runs in eukaryotic proteomes and disease associations. Proc Natl Acad Sci U S A 99, 333–338.
Kashi, Y. & King, D. G. (2006). Simple sequence repeats as advantageous mutators in evolution. Trends Genet 22, 253–259.
Kashi, Y., King, D. & Soller, M. (1997). Simple sequence repeats as a source of quantitative genetic variation. Trends Genet 13, 74–78.
Katti, M. V., Ranjekar, P. K. & Gupta, V. S. (2001). Differential distribution of simple sequence repeats in eukaryotic genome sequences. Mol Biol Evol 18, 1161–1167.
King, D. G. (1994). Triple repeat DNA as a highly mutable regulatory mechanism. Science 263, 595–596.
Kube, M., Schneider, B., Kuhl, H., Dandekar, T., Heitmann, K., Migdoll, A. M., Reinhardt, R. & Seemüller, E. (2008). The linear chromosome of the plant-pathogenic mycoplasma ‘Candidatus Phytoplasma mali’. BMC Genomics 9, 306.
Kube, M., Mitrovic, J., Duduk, B., Rabus, R. & Seemüller, E. (2012). Current view on phytoplasma genomes and encoded metabolism. ScientificWorldJournal 2012, 185942.
Leclercq, S., Rivals, E. & Jarne, P. (2010). DNA slippage occurs at microsatellite loci without minimal threshold length in humans: a comparative genomic approach. Genome Biol Evol 2, 325–335.
Lee, I.-M., Davis, R. E. & Gundersen-Rindal, D. E. (2000). Phytoplasma: phytopathogenic mollicutes. Annu Rev Microbiol 54, 221–255.
Li, Y. C., Korol, A. B., Fahima, T., Beiles, A. & Nevo, E. (2002). Microsatellites: genomic distribution, putative functions and mutational mechanisms: a review. Mol Ecol 11, 2453–2465.
Li, Y. C., Korol, A. B., Fahima, T. & Nevo, E. (2004). Microsatellites within genes: structure, function, and evolution. Mol Biol Evol 21, 991–1007.
Liu, L., Panangala, V. S. & Dybvig, K. (2002). Trinucleotide GAA repeats dictate pMGA gene expression in Mycoplasma gallisepticum by affecting spacing between flanking regions. J Bacteriol 184, 1335–1339.
Loire, E., Higuet, D., Netter, P. & Achaz, G. (2013). Evolution of coding microsatellites in primate genomes. Genome Biol Evol 5, 283–295.
Morton, A., Davies, D. L., Blomquist, C. L. & Barbara, D. J. (2003). Characterization of homologues of the apple proliferation immunodominant membrane protein gene from three related phytoplasmas. Mol Plant Pathol 4, 109–114.
Moxon, R., Bayliss, C. & Hood, D. (2006). Bacterial contingency loci: the role of simple sequence DNA repeats in bacterial adaptation. Annu Rev Genet 40, 307–333.
Mrázek, J. (2006). Analysis of distribution indicates diverse functions of simple sequence repeats in Mycoplasma genomes. Mol Biol Evol 23, 1370–1385.
Mrázek, J., Guo, X. & Shah, A. (2007). Simple sequence repeats in prokaryotic genomes. Proc Natl Acad Sci U S A 104, 8472–8477.
Oshima, K., Kakizawa, S., Nishigawa, H., Jung, H. Y., Wei, W., Suzuki, S., Arashida, R., Nakata, D., Miyata, S. & other authors (2004). Reductive evolution suggested from the complete genome sequence of a plant-pathogenic phytoplasma. Nat Genet 36, 27–29.
Oshima, K., Kakizawa, S., Arashida, R., Ishii, Y., Hoshi, A., Hayashi, Y., Kagiwada, S. & Namba, S. (2007). Presence of two glycolytic gene clusters in a severe pathogenic line of Candidatus Phytoplasma asteris. Mol Plant Pathol 8, 481–489.
Rocha, E. P. C. & Blanchard, A. (2002). Genomic repeats, genome plasticity and the dynamics of Mycoplasma evolution. Nucleic Acids Res 30, 2031–2042.
Rocha, E. P. C., Matic, I. & Taddei, F. (2002). Over-representation of repeats in stress response genes: a strategy to increase versatility under stressful conditions? Nucleic Acids Res 30, 1886–1894.
Shuman, S. (1994). Novel approach to molecular cloning and polynucleotide synthesis using vaccinia DNA topoisomerase. J Biol Chem 269, 32678–32684.
Simmons, W. L., Denison, A. M. & Dybvig, K. (2004). Resistance of Mycoplasma pulmonis to complement lysis is dependent on the number of Vsa tandem repeats: shield hypothesis. Infect Immun 72, 6846–6851.
Suzuki, S., Oshima, K., Kakizawa, S., Arashida, R., Jung, H. Y., Yamaji, Y., Nishigawa, H., Ugaki, M. & Namba, S. (2006). Interaction between the membrane protein of a pathogen and insect microfilament complex determines insect-vector specificity. Proc Natl Acad Sci U S A 103, 4252–4257.
Tran-Nguyen, L. T. T., Kube, M., Schneider, B., Reinhardt, R. & Gibb, K. S. (2008). Comparative genome analysis of ‘Candidatus Phytoplasma australiense’ (subgroup tuf-Australia I; rp-A) and ‘Ca. Phytoplasma asteris’ strains OY-M and AY-WB. J Bacteriol 190, 3979–3991.
Trivedi, S. (2006). Comparison of simple sequence repeats in 19 archaea. Genet Mol Res 5, 741–772.
Trivedi, S. (2013). Repeats in transforming acidic coiled-coil (TACC) genes. Biochem Genet 51, 458–473.
Tsai, J. H. (1979). Vector transmission of mycoplasmal agents of plant diseases. In The Mycoplasmas, pp. 265–307. Edited by R. F. Whitcomb & J. G. Tully. San Diego: Academic Press.
van Belkum, A., Scherer, S., van Alphen, L. & Verbrugh, H. (1998). Short-sequence DNA repeats in prokaryotic genomes. Microbiol Mol Biol Rev 62, 275–293.
Wei, W., Davis, R. E., Lee, I.-M. & Zhao, Y. (2007). Computer-simulated RFLP analysis of 16S rRNA genes: identification of ten new phytoplasma groups. Int J Syst Evol Microbiol 57, 1855–1867.
Wei, W., Davis, R. E., Jomantiene, R. & Zhao, Y. (2008). Ancient, recurrent phage attacks and recombination shaped dynamic sequence-variable mosaics at the root of phytoplasma genome evolution. Proc Natl Acad Sci U S A 105, 11827–11832.
Young, E. T., Sloan, J. S. & Van Riper, K. (2000). Trinucleotide repeats are clustered in regulatory genes in Saccharomyces cerevisiae. Genetics 154, 1053–1068.
Zhao, Y., Wei, W., Lee, I.-M., Shao, J., Suo, X. & Davis, R. E. (2009). Construction of an interactive online phytoplasma classification tool, iPhyClassifier, and its application in analysis of the peach X-disease phytoplasma group (16SrIII). Int J Syst Evol Microbiol 59, 2582–2593.
Zhao, Y., Wei, W., Davis, R. E. & Lee, I.-M. (2010). Recent advances in 16S rRNA gene-based phytoplasma differentiation, classification and taxonomy. In Phytoplasmas: Genomes, Plant Hosts and Vectors, pp. 64–92. Edited by P. Weintraub & P. Jones. Wallingford, UK: CABI Publishing.
Zhao, Y., Davis, R. E., Wei, W., Shao, J. & Jomantiene, R. (2014). Phytoplasma genomes: evolution through mutually complementary mechanisms, gene loss and horizontal acquisition. In Genomics of Plant-Associated Bacteria, pp. 235–271. Edited by D. Gross, A. Lichens-Park & C. Kole. Heidelberg: Springer.