GBE Phylogenomic and MALDI-TOF MS Analysis of Streptococcus sinensis HKU4T Reveals a Distinct Phylogenetic Clade in the Genus Streptococcus Jade L.L. Teng1,2,3,4,y, Yi Huang1,y, Herman Tse1,2,3,4, Jonathan H.K. Chen1, Ying Tang1, Susanna K.P. Lau1,2,3,4,*, and Patrick C.Y. Woo1,2,3,4,* 1 Department of Microbiology, The University of Hong Kong, Hong Kong, China 2 Research Centre of Infection and Immunology, The University of Hong Kong, Hong Kong, China 3 State Key Laboratory of Emerging Infectious Diseases, The University of Hong Kong, Hong Kong, China 4 Carol Yu Centre for Infection, The University of Hong Kong, Hong Kong, China *Corresponding author: E-mail: [email protected]; [email protected]. yThese authors contributed equally to this work. Accepted: October 13, 2014 Data deposition: This Whole Genome Shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession JPEN00000000. The version described in this paper is version JPEN01000000. Abstract Streptococcus sinensis is a recently discovered human pathogen isolated from blood cultures of patients with infective endocarditis. Its phylogenetic position, as well as those of its closely related species, remains inconclusive when single genes were used for phylogenetic analysis. For example, S. sinensis branched out from members of the anginosus, mitis, and sanguinis groups in the 16S ribosomal RNA gene phylogenetic tree, but it was clustered with members of the anginosus and sanguinis groups when groEL gene sequences used for analysis. In this study, we sequenced the draft genome of S. sinensis and used a polyphasic approach, including concatenated genes, whole genomes, and matrix-assisted laser desorption ionization-time of flight mass spectrometry to analyze the phylogeny of S. sinensis. The size of the S. sinensis draft genome is 2.06 Mb, with GC content of 42.2%. Phylogenetic analysis using 50 concatenated genes or whole genomes revealed that S. sinensis formed a distinct cluster with Streptococcus oligofermentans and Streptococcus cristatus, and these three streptococci were clustered with the “sanguinis group.” As for phylogenetic analysis using hierarchical cluster analysis of the mass spectra of streptococci, S. sinensis also formed a distinct cluster with S. oligofermentans and S. cristatus, but these three streptococci were clustered with the “mitis group.” On the basis of the findings, we propose a novel group, named “sinensis group,” to include S. sinensis, S. oligofermentans, and S. cristatus, in the Streptococcus genus. Our study also illustrates the power of phylogenomic analyses for resolving ambiguities in bacterial taxonomy. Key words: Streptococcus sinensis, genome, phylogenomic, “sinensis group”, Illumina. Background The genus Streptococcus currently comprises more than 90 species with some of them being important human pathogens causing significant morbidity and mortality globally. Traditional phenotypic classification of Streptococcus relies on their Lancefield group antigens and hemolytic properties, which divides the genus into two major groups, pyogenic and viridans groups (Sherman 1937). As a result of the widespread use of polymerase chain reaction and DNA sequencing in the last two decades, genotypic methods such as amplification and sequencing of universal gene targets represent an advanced method for bacterial classification and identification. Among the various studied gene targets, 16S ribosomal RNA (rRNA) gene has been the most widely used. Classification of Streptococcus based on the 16S rRNA gene sequences divided the genus into six major groups, namely anginosus, mitis, salivarius, mutans, bovis, and pyogenic groups (Kawamura et al. 1995). ß The Author(s) 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected] 2930 Genome Biol. Evol. 6(10):2930–2943. doi:10.1093/gbe/evu232 Advance Access publication October 20, 2014 GBE Streptococcus sinensis HKU4T Genome Sequence Subsequently, Facklam (2002) suggested the addition of sanguinis group to the existing groups based on the phenotypic properties, resulting in seven major groups. However, some studies still showed that 16S rRNA gene or other housekeeping genes failed to provide sufficient resolution and to delineate Streptococcus species into the same taxonomic groupings (Tapp et al. 2003; Glazunova et al. 2010; Park et al. 2010; Zbinden et al. 2011). Apart from these phenotypic and genotypic identification methods, a recently emerged technique, matrix-assisted laser desorption ionization timeof-flight mass spectrometry (MALDI-TOF MS), has been used for identification and cluster analysis of many bacterial species. The technique examines the profile of proteins detected directly from intact bacteria and involves pattern analysis of the mass spectra obtained within a mass range of 2,000–20,000 Da from whole cell protein fingerprinting using mathematical tools (Hsieh et al. 2008). Several studies have shown that MALDI-TOF MS is useful for identification and cluster analysis of different groups of medically important Streptococcus species (Dubois et al. 2013; Karpanoja et al. 2014). In 2002, we reported the discovery of a novel species, Streptococcus sinensis, isolated from multiple blood cultures of a 42-year-old Chinese woman with infective endocarditis complicating her chronic rheumatic heart disease in Hong Kong (Woo et al. 2002). Two years later, we reported two other cases of infective endocarditis caused by S. sinensis with Lancefield F (Woo et al. 2004). Subsequently, additional cases reported from France and Italy suggested that the bacterium is an emerging pathogen of global importance (Uckay et al. 2007; Faibis et al. 2008). The oral cavity is the natural reservoir of S. sinensis (Woo et al. 2008). Phenotypically, the bacterium possessed characteristics from members of anginosus, sanguinis, and mitis groups, as different commercial kits gave different identities to different strains of S. sinensis (Woo et al. 2006). Genotypically, phylogenetic analysis using 16S rRNA gene sequences showed that S. sinensis was equally close to both the anginosus and mitis groups but was more closely related to sanguinis group when using GroEL sequences. Because of these inconclusive results, the phylogenetic position of S. sinensis remains uncertain. In this study, we attempted to determine the exact phylogenetic position of S. sinensis using phylogenomic approach. We sequenced the first genome of S. sinensis type strain HKU4T and performed comparative genomic studies utilizing publicly available Streptococcus genomes. Phylogenomic analysis using the whole-genome sequences revealed a new phylogenetic clade, including S. sinensis, Streptococcus oligofermentans, and Streptococcus cristatus, which we propose it as “sinensis group.” Intergenomic distance analysis, phylogenetic analysis using concatenated sequences, and cluster analysis of MALDI-TOF MS were also performed. Genome Sequencing of S. sinensis Type Strain HKU4T The de novo assembly using 26.81 million paired-end reads generated 116 contigs with lengths ranging from 212 to 125,948 bases, giving a total genome size of 2.06 Mb with the average GC content of 42.2%. All contigs generated were submitted to the RAST version 4.0 (Rapid Annotation using Subsystem Technology) annotation server, resulting in 1,992 protein-coding sequences (CDSs), 4 rRNA operons, and 47 transfer RNA-encoding genes (supplementary table S1, Supplementary Material online). To facilitate the subsequent genomic analysis, 86 genome sequences of other Streptococcus species and that of Enterococcus faecalis were also submitted to RAST for annotation. Among the 86 genome sequences, 25 complete genome sequences and 5 draft genome sequences (Streptococcus australis, Streptococcus criceti, S. cristatus, Streptococcus dentisani, and Streptococcus tigurinus because only one partial genome sequence was available for each species) were used for subsystem classification (table 1). Each CDS in annotated genomes was grouped into different RAST subsystems based on the predicted functional role. Among the 1,992 CDSs in the S. sinensis HKU4T genome, 854 CDSs can be categorized into RAST subsystems (fig. 1A), in which 133 CDSs were classified into more than one category. Overall, majority of CDSs were classified into subsystems of carbohydrates (148 CDSs, 7.4%), protein metabolism (118 CDSs, 5.9%), DNA metabolism (111 CDSs, 5.6%), and amino acid and derivatives (110 CDSs, 5.5%) (fig. 1A). The remaining 1,138 (57.1%) CDSs could not be classified into any subsystems, in which 488 (24.5%) CDSs were only annotated as hypothetical proteins. Consistent to the results of previous studies (Olson et al. 2013), when we compared the distribution of CDSs in each subsystem of the S. sinensis genome with those of other Streptococcus genomes, all of them have a similar percentage of their genome dedicated to each subsystem (fig. 1B). Phylogenomic Analyses of S. sinensis and Other Streptococcus Species Reveals a Distinct Clade Comprising S. sinensis, S. oligofermentans, and S. cristatus Phylogenetic analyses using sequences of single gene loci, 16S rRNA and groEL, extracted from 87 Streptococcus genomes showed that S. sinensis closely clustered with S. oligofermentans and S. cristatus, respectively (fig. 2A–C). However, the topology of these trees was not concordant, in which S. sinensis branched out from members of anginosus, mitis, and sanguinis groups when using 16S rRNA gene sequences (fig. 2A) but clustered with members of anginosus and sanguinis groups when using groEL sequences for analyses (fig. 2B and C). This inconclusive result suggested that Genome Biol. Evol. 6(10):2930–2943. doi:10.1093/gbe/evu232 Advance Access publication October 20, 2014 2931 GBE Teng et al. Table 1 Genome Sequence Details and the Corresponding DDH Value with Streptococcus sinensis HKU4T Species “Sinensis” group Streptococcus sinensis Streptococcus oligofermentans Streptococcus cristatus Sanguinis group Streptococcus sanguinis Streptococcus sanguinis Streptococcus gordonii Anginosus group Streptococcus intermedius Streptococcus intermedius Streptococcus intermedius Streptococcus intermedius Streptococcus constellatus subsp. pharyngis Streptococcus constellatus subsp. pharyngis Streptococcus constellatus subsp. pharyngis Streptococcus constellatus subsp. pharyngis Streptococcus anginosus Streptococcus anginosus Streptococcus anginosus Streptococcus anginosus subsp. whileyi Mitis group Streptococcus oralis Streptococcus oralis Streptococcus mitis Streptococcus pneumoniae Streptococcus pneumoniae Streptococcus pneumoniae Streptococcus pneumoniae Streptococcus pneumoniae Streptococcus pneumoniae Streptococcus pneumoniae Streptococcus pneumoniae Streptococcus pneumoniae Streptococcus pneumoniae Streptococcus pneumoniae Streptococcus parasanguinis Streptococcus parasanguinis Streptococcus pseudopneumoniae Streptococcus infantis Streptococcus infantis Streptococcus peroris Streptococcus tigurinus Streptococcus dentisani Streptococcus australis Pyogenic group Streptococcus agalactiae Streptococcus agalactiae Streptococcus agalactiae Streptococcus agalactiae Streptococcus dysgalactiae subsp. equisimilis Streptococcus pyogenes Streptococcus pyogenes Streptococcus pyogenes Strain Status GenBank Accession No. DDH (Formula 1) HKU4T AS 1.3089 ATCC 51100 Draft Finished Draft JPEN00000000a,b NC_021175.1a,b AEVC01000001–AEVC01000031a,b — 45.6 44.3 SK36 SK678 CH1 Finished Draft Finished NC_009009a,b AEXA01000001–AEXA01000010 NC_009785a,b 28.9 — 23.2 C270 JTH08 B196 F0413 C232 C818 C1050 SK1060 C1051 C238 SK1138 CCUG 39159 Finished Finished Finished Draft Finished Finished Finished Draft Finished Finished Draft Draft NC_022237.1b NC_018073.1b NC_022246.1a,b AFXO01000001–AFXO01000013 NC_022236.1a,b NC_022245.1b NC_022238.1b BASX01000001–BASX01000066 NC_022244.1a,b NC_022239.1b ALJO01000001–ALJO01000013 AICP01000001–AICP01000083 17.1 16.7 16.7 — 16.8 16.8 16.7 — 16.2 15.6 — — Uo5 SK610 B6 ATCC 700669 D39 JJA Taiwan19F-14 TIGR4 G54 P1031 R6 70585 CGSP14 Hungary19A-6 FW213 ATCC 15912 IS7493 ATCC 700779 SK970 ATCC 700780 1366 7746 ATCC 700641 Finished Draft Finished Finished Finished Finished Finished Finished Finished Finished Finished Finished Finished Finished Finished Finished Finished Draft Draft Draft Draft Draft Draft NC_015291.1a,b AJKQ01000001–AJKQ01000031 NC_013853a,b NC_011900a,b NC_008533b NC_012466b NC_012469b NC_003028b NC_011072b NC_012467b NC_003098b NC_012468b NC_010582b NC_010380b NC_017905.1b NC_015678.1a,b NC_015875.1a,b AEVD01000001–AEVD01000031 AFUT01000001–AFUT01000009 AEVF01000001–AEVF01000017 AORX01000001–AORX01000014a CAUJ01000001–CAUJ01000008a AEQR01000001–AEQR01000027a 16.1 — 15.4 15.4 15.4 15.4 15.4 15.4 15.3 15.3 15.3 15.2 15.2 15.0 15.2 15.1 14.8 — — — — — — NEM316 09mas018883 2603 V/R A909 GGS_124 M1 GAS Manfredo MGAS10270 Finished Finished Finished Finished Finished Finished Finished Finished NC_004368b NC_021485.1a,b NC_004116b NC_007432b NC_012891a,b NC_002737a,b NC_009332b NC_008022b 13.5 13.2 13.2 13.2 13.3 13.3 13.3 13.3 (continued) 2932 Genome Biol. Evol. 6(10):2930–2943. doi:10.1093/gbe/evu232 Advance Access publication October 20, 2014 GBE Streptococcus sinensis HKU4T Genome Sequence Table 1 Continued Species Streptococcus pyogenes Streptococcus pyogenes Streptococcus pyogenes Streptococcus pyogenes Streptococcus pyogenes Streptococcus pyogenes Streptococcus pyogenes Streptococcus pyogenes Streptococcus pyogenes Streptococcus pyogenes Streptococcus uberis Streptococcus parauberis Streptococcus iniae Streptococcus equi subsp. zooepidemicus Streptococcus equi subsp. equi Streptococcus equi subsp. zooepidemicus Streptococcus criceti Bovis group Streptococcus gallolyticus Streptococcus lutetiensis Streptococcus pasteurianus Streptococcus equinus Streptococcus infantarius Mutans group Streptococcus mutans Streptococcus mutans Streptococcus mutans Streptococcus mutans Salivarius group Streptococcus thermophilus Streptococcus thermophilus Streptococcus thermophilus Streptococcus thermophilus Streptococcus salivarius Streptococcus salivarius Streptococcus salivarius No group Streptococcus suis Streptococcus suis Streptococcus suis Streptococcus suis Streptococcus suis Outgroup Enterococcus faecalis Strain Status GenBank Accession No. DDH (Formula 1) MGAS10394 MGAS10750 MGAS2096 MGAS315 MGAS5005 MGAS6180 MGAS8232 MGAS9429 NZ131 SSI-1 0140J KCTC 11537 SF1 MGCS10565 4047 H70 HS-6 Finished Finished Finished Finished Finished Finished Finished Finished Finished Finished Finished Finished Finished Finished Finished Finished Draft NC_006086b NC_008024b NC_008023b NC_004070b NC_007297b NC_007296b NC_003485b NC_008021b NC_011375b NC_004606b NC_012004a,b NC_015558.1a,b NC_021314.1a,b NC_011134a,b NC_012471b NC_012470b AEUV02000001–AEUV02000002a 13.3 13.3 13.3 13.3 13.3 13.3 13.3 13.3 13.3 13.3 13.2 13.2 13.1 13.1 13.0 13.0 — UCN34 33 ATCC 43144 ATCC 9812 ATCC BAA-102 Finished Finished Finished Draft Draft NC_013798a,b NC_021900.1a,b NC_015600.1a,b AEVB01000001–AEVB01000057 ABJK02000001–ABJK02000022 13.3 13.3 13.3 — — NN2025 GS-5 LJ23 UA159 Finished Finished Finished Finished NC_013928a,b NC_018089b NC_017768.1b NC_004350b 13.3 13.2 13.2 13.2 CNRZ1066 LMD-9 LMG 18311 JIM 8232 JIM8777 CCHSS3 SK126 Finished Finished Finished Finished Finished Finished Draft NC_006449b NC_008532b NC_006448b NC_017581.1a,b NC_017595.1b FR873481a,b ACLO01000001–ACLO01000101 13.8 13.8 13.8 13.7 13.7 13.6 — BM407 P1/7 98HAH33 05ZYH33 SC84 Finished Finished Finished Finished Finished NC_012926b NC_012925b NC_009443b NC_009442a,b NC_012924b 13.7 13.7 13.6 13.6 13.6 V583 Finished NC_004668.1b 12.6 a Genome sequence submitted for subsystem classification using RAST 4.0. Genome sequence used to construct the whole genome tree. b phylogenetic analysis using single gene target has not added much to our understanding on the phylogeny of S. sinensis, as they only represent a very small portion of the bacterial genome. In view of this problem, we attempted to use a phylogenomic approach, which based on the whole bacterial genome, to resolve the taxonomic ambiguity of S. sinensis. In bacterial taxonomy, whole genomic DNA–DNA hybridization (DDH) has been used to determine the genetic distance between two strains and 70% DDH was proposed as a criterion for delineating species (Wayne 1988). However, this method is not widely used because it is tedious and results depend on experimental conditions. Therefore, it is often difficult to compare results between different laboratories objectively. Recent advancement of genome sequencing calls for bioinformatics methods to replace the wet-lab DDH by in silico Genome Biol. Evol. 6(10):2930–2943. doi:10.1093/gbe/evu232 Advance Access publication October 20, 2014 2933 Teng et al. GBE FIG. 1.—Distributions of predicted coding sequence function in the annotated genomes according to RAST subsystems. In (A), the number of CDSs of S. sinensis HKU4T in different subsystems is indicated in bracket. In (B), a total of 31 genomes of Streptococcus species, including S. sinensis and representatives from all major groups, were analyzed. Each column indicates the number of CDSs of each Streptococcus species in different subsystems showing in different color. 2934 Genome Biol. Evol. 6(10):2930–2943. doi:10.1093/gbe/evu232 Advance Access publication October 20, 2014 Streptococcus sinensis HKU4T Genome Sequence GBE FIG. 2.—Phylogenetic relationship among Streptococcus strains. Three phylogenetic trees were constructed, each using a different genetic locus for analysis. (A) 16S rRNA. (B) groEL. (C) GroEL. The trees were constructed by maximum-likelihood method using RAxML (version 7.3) and E. faecalis V583 as the root. A total of 1,583 nucleotide positions of the 16S rRNA gene, 1,709 nucleotide positions and 565 deduced amino acid positions of groEL from 88 genomes were included for analyses. Bootstrap values were calculated from 1,000 replicates. The scale bar corresponds to the mean number of nucleotide/ amino acid substitutions per site on the respective branch. Names and accession numbers are given as cited in GenBank in table 1. Genome Biol. Evol. 6(10):2930–2943. doi:10.1093/gbe/evu232 Advance Access publication October 20, 2014 2935 GBE Teng et al. FIG. 2.—Continued. genome-to-genome comparison. Among various sophisticated methods studied, a digital DDH method, GGDC 2.0, was shown to yield very good correlation with wet-lab DDH (Auch et al. 2010). In this study, with the availability of Streptococcus genome sequences, we were able to use GGDC 2.0 for intergenomic distance estimation, which allowed genome sequence comparison between two Streptococcus strains. The results showed that S. sinensis HKU4T shared 45.6% and 44.3% nucleotide identities to the genome sequence of S. oligofermentans AS 1.3089 (GenBank accession number NC_021175) and S. cristatus ATCC 51100 (GenBank accession number AEVC01000001– AEVC01000031) but only 23.2–28.9%, 15.6–17.1%, and 14.8–16.1% nucleotide identities to those from members of sanguinis (3 genomes), anginosus (12 genomes), and mitis (23 genomes) groups, respectively (table 1). The present DDH value between S. sinensis and S. oligofermentans was quite different from the results obtained in the previous study, in which the wet-lab DDH value between S. sinensis and S. oligofermentans was only 15%. We speculate that the 2936 Genome Biol. Evol. 6(10):2930–2943. doi:10.1093/gbe/evu232 Advance Access publication October 20, 2014 GBE Streptococcus sinensis HKU4T Genome Sequence FIG. 2.—Continued. discrepancy may due to the experimental error as DDH method is well-known not easily be made reproducible. Nevertheless, the present results using the entire genome sequences showed that S. sinensis was more closely related to S. oligofermentans and S. cristatus than members of sanguinis, anginosus, and mitis groups. To further verify the phylogenetic position of S. sinensis among the genus Streptococcus, phylogenomic analysis utilizing the draft genome sequences of S. sinensis HKU4T and S. cristatus ATCC51100 as well as the available complete genome sequences of 69 Streptococcus species was performed and the results also supported that S. sinensis closely clustered with S. oligofermentans and S. cristatus, forming a unique clade (fig. 3A). Although this clade, comprising S. sinensis, S. oligofermentans, and S. cristatus, also clustered with members of sanguinis group, inclusion of this clade into Genome Biol. Evol. 6(10):2930–2943. doi:10.1093/gbe/evu232 Advance Access publication October 20, 2014 2937 GBE Teng et al. FIG. 3.—Phylogenetic tree constructed using draft genome sequences and concatenated nucleotide sequences of 50 ribosomal protein genes of S. sinensis HKU4T. In (A), the tree based on draft genome sequences was constructed by neighbor-joining method using GGDC distance (formula 1) and E. faecalis V583 as the root. In (B), the tree based on 50 ribosomal protein genes, showing the relationship of S. sinensis HKU4T to other Streptococcus species, was constructed by maximum-likelihood method using RAxML (version 7.3) and E. faecalis V583 as the root. Bootstrap values were calculated from 1,000 replicates. The scale bar corresponds to the mean number of nucleotide substitutions per site on the respective branch. The gene names and accession numbers are given as cited in GenBank in table 1. the sanguinis group does not appear justified as they were connected by a relatively long branch and shared relatively low DDH value with S. sinensis. To confirm this result, we performed phylogenetic analysis using concatenated sequences of 50 ribosomal protein genes retrieved from 87 Streptococcus genomes including S. sinensis, representing 35 different species, and from one E. faecalis genome. These 50 ribosomal protein genes were chosen because they have been shown to be useful for phylogenetic delineation of bacterial species in previous studies (Jolley et al. 2012; Maiden et al. 2013) and their names were listed in supplementary table S2, Supplementary Material online. Results showed that the topologies of both trees were concordant and able to recover members of the seven taxonomic groups as described in previous studies (fig. 3A and B) (Kawamura et al. 1995; Facklam 2002). Consistently, the tree based on the concatenated sequences also revealed that S. sinensis, together with S. oligofermentans and S. cristatus, forming a distinct phylogenetic clade with a bootstrap value of 100%. It is worth noting that this unique clade also fell within the sanguinis group but the grouping was weakly supported by a low bootstrap value of 42%, meaning that it was not often clustered with members of sanguinis group (fig. 3B). The phylogenetic position of S. sinensis has been considered controversial due to its simultaneous possession of 2938 Genome Biol. Evol. 6(10):2930–2943. doi:10.1093/gbe/evu232 Advance Access publication October 20, 2014 GBE Streptococcus sinensis HKU4T Genome Sequence FIG. 3.—Continued. phenotypic characteristics from members of mitis, sanguinis, and anginosus groups (Woo et al. 2006). Genotypic methods such as 16S rRNA or GroEL sequence analyses also failed to reveal its exact taxonomic group. In this study, the genome data, offering superior discriminatory power than any phenotypic and genotypic methods, resolve the taxonomic ambiguity of S. sinensis. Based on these genomic evidences, we suggest the three species, S. sinensis, S. oligofermentans, and S. cristatus, forming a well-supported clade, might best be considered as a separate group that we tentatively propose it as the sinensis group. MALDI-TOF MS Analysis Supports the Newly Proposed Sinensis Group Dendrogram generated from hierarchical cluster analysis of MALDI-TOF MS showed that the MALDI-TOF MS spectrum of S. sinensis HKU4T and 28 nonduplicated Streptococcus species correctly recovered members of salivarius, mutans, bovis and pyogenic groups, and Streptococcus suis (fig. 4). Streptococcus sinensis consistently formed a distinct cluster with S. oligofermentans and S. cristatus, but these three streptococci were clustered with members belonging to “mitis Genome Biol. Evol. 6(10):2930–2943. doi:10.1093/gbe/evu232 Advance Access publication October 20, 2014 2939 GBE Teng et al. FIG. 4.—Dendrogram generated from hierarchical cluster analysis of MALDI-TOF MS spectra of S. sinensis HKU4T and 28 isolates of other Streptococcus species. Distances are displayed in relative units. group” (fig. 4). Notably, in contrast to the previous phylogenomic analysis using 50 concatenated genes or whole genomes, these three streptococci were clustered with “sanguinis group” (fig. 3A and B). On the basis of all these findings, these three streptococci should not be included into either sanguinis group or mitis group, but they should exist as a novel group, sinensis group in the genus Streptococcus. Comparative Genomic Analyses of the Three Genomes of Sinensis Group Based on BLAST (Basic Local Alignment Search Tool) analysis using 50% nucleotide identity as a threshold, we compared the genome sequence of S. sinensis HKU4T with those of S. oligofermentans and S. cristatus and found that 1,241 CDSs were shared among the three genomes (fig. 5). There were 222 CDSs only shared with S. oligofermentans and 63 CDSs only shared with S. cristatus (fig. 5). A total of 466 CDSs were uniquely found in S. sinensis HKU4T genome (fig. 5). Among these 466 unique CDSs, only 90 CDSs could be classified into RAST subsystems according to their predicted functional roles. For the remaining 376 CDSs, which did not belong to any subsystem, 256 CDSs were only annotated as hypothetical proteins. More in-depth analyses on these unique CDSs, especially those annotated as hypothetical proteins, may provide insights about the differences in metabolic capacity between S. sinensis and two other members of sinensis group. Potential Virulence Factors in S. sinensis HKU4T Previous studies showed that S. sinensis has been recovered from patients with infective endocarditis globally (Woo et al. 2002, 2004; Uckay et al. 2007; Faibis et al. 2008), suggesting that the bacteria may possess virulence factors to adhere and to colonize heart valves and to cause endocardial damage. The draft genome of S. sinensis HKU4T contains homologs of several virulence genes that are important for the pathogenesis of infective endocarditis. These include platelet activating factor, clumping factor B, collagen-binding protein, laminin-binding protein, and fibronectin/fibrinogen-binding protein which have been implicated in the pathogenesis of infective endocarditis by promoting bacterial adherence to endothelial tissues and triggering inflammatory responses (supplementary table S3, Supplementary Material online) (Abranches et al. 2011; Que and Moreillon 2011). Notably, two other members of the sinensis group, S. oligofermentans and S. cristatus, have also been isolated from patients with infective endocarditis (Matthys et al. 2006; Matta et al. 2009). Detailed comparative genomic studies on the S. sinensis genome and genomes of members of the sinensis group and those of the mitis group, including Streptococcus mitis, S. tigurinus, and Streptococcus oralis, may shed light on the ecology and biology of S. sinensis 2940 Genome Biol. Evol. 6(10):2930–2943. doi:10.1093/gbe/evu232 Advance Access publication October 20, 2014 GBE Streptococcus sinensis HKU4T Genome Sequence FIG. 5.—Comparative genomic analysis among three members of sinensis group, including S. sinensis, S. oligofermentans, and S. cristatus. The total number of CDSs per genome is given as indicated. The overlapping sections indicate shared numbers of CDSs among genomes. as well as pathogenesis of infective endocarditis caused by members belonging to this new phylogenetic clade. Collection of Institut Pasteur or Culture Collection, University of Göteborg (CCUG) (Woo et al. 2002). Conclusions Genome Sequencing, Assembly, and Annotation of S. sinensis Type Strain HKU4T In this study, we sequenced the first draft genome of S. sinensis type strain HKU4T and unambiguously determined the phylogenetic position of S. sinensis using phylogenomic approach. Phylogenomic and MALDI-TOF MS analysis revealed a distinct phylogenetic clade in the genus Streptococcus, which we proposed it as sinensis group, currently comprising three species, S. sinensis, S. oligofermentans, and S. cristatus. The present draft genome sequence has also allowed rapid exploration of potential virulence genes in S. sinensis. Our findings also illustrate the power of phylogenomic analyses for resolving ambiguities in bacterial taxonomy. Materials and Methods Bacterial Strains A total of 29 nonduplicated Streptococcus strains, including S. sinensis HKU4T, were included in the MALDI-TOF MS analysis (supplementary table S4, Supplementary Material online). Streptococcus sinensis HKU4T was isolated from blood culture of a Chinese patient with infective endocarditis in Hong Kong, whereas the remaining strains were either purchased from the The draft genome sequence of S. sinensis HKU4T was determined by high-throughput sequencing with Illumina Hi-Seq 2500. Genomic DNA was extracted from overnight cultures (37 C) grown on blood agars by genomic DNA purification kit (QIAgen, Hilden, Germany) as described previously (Woo et al. 2009; Tse et al. 2010). It was sequenced by 151-bp paired-end reads with mean library size of 350 bp. Sequencing errors were corrected by k-mer frequency spectrum analysis using SOAPec (http://soap.genomics.org.cn/about.html, last accessed October 22, 2014). De novo assembly was performed using SOAPdenovo2 (http:// soap.genomics.org.cn/soapdenovo.html, last accessed October 22, 2014). Prediction of protein-coding regions and automatic functional annotation was performed using Glimmer3 and RAST server version 4.0 (Delcher et al. 2007; Aziz et al. 2008). Intergenomic distance between S. sinensis HKU4T and other Streptococcus species, including representatives from seven major groups, was calculated using GGDC 2.0 (http://ggdc.dsmz.de/distcalc2.php, last accessed October 22, 2014) (Auch et al. 2010). Streptococcus species included for the distance calculation was shown in table 1. Genome Biol. Evol. 6(10):2930–2943. doi:10.1093/gbe/evu232 Advance Access publication October 20, 2014 2941 GBE Teng et al. Genome Sequence Data Details of the 88 genome sequences used in this study are shown in table 1. Despite the 18 draft genome sequences, including one from S. sinensis HKU4T which was sequenced to near completion as part of this study, the remaining 70 genome sequences were complete and downloaded from National Center for Biotechnology Information. Nucleotide sequences of 50 ribosomal protein genes (supplementary table S2, Supplementary Material online) and one copy of 16S rRNA and groEL genes were retrieved, respectively, from all 88 genomes (table 1). Phylogenetic Characterization The tree based on entire genome sequences was constructed by neighbor-joining method using GGDC distance (formula 1: length of all high-scoring segment pairs divided by total genome length) and E. faecalis as the root. The trees based on single gene loci, 16S rRNA and groEL gene (nucleotide and amino acid), were aligned by Geneious 7.1.5 (Biomatters Limited) and constructed, respectively, by maximum-likelihood method using RAxML (version 7.3). The tree based on the concatenated sequences of 50 ribosomal protein genes from 88 Streptococcus species, including 35 nonduplicated species, was constructed by maximum-likelihood method using RAxML (version 7.3). A total of 20,921 nucleotide positions were included in the analysis. Nucleotide sequences of corresponding homologs in E. faecalis were used as the outgroups where appropriate. Matrix-Assisted Laser Desorption Ionization Time-of-Flight Mass Spectrometry MALDI-TOF MS was performed as previously described with slight modifications (Tang et al. 2013; Lau et al. 2014). Twenty-nine nonduplicated Streptococcus strains, including S. sinensis HKU4T, were grown on sheep blood agar at 37 C with 5% CO2 for 48 h. All isolates were analyzed by the ethanol formic acid extraction method using the same experimental conditions suggested by the Bruker system. Briefly, 1–3 colonies were suspended into 100 ml of HPLC grade water (Fluka, St Louis, MO). Then, 300 ml of absolute ethanol was added and incubated for 5 min at room temperature. Cell pellet was then air dried after centrifugation. The pellet was resuspended in 30 ml each of formic acid (Fluka) and acetonitrile (Sigma Aldrich, St Louis, MO). Each bacterial extract was spotted onto six spots of the MSP96 target plate after centrifugation. Samples were processed in the Bruker MicroFlex LT mass spectrometer (Bruker Daltonics, Bremen, Germany) with 1 ml of a-cyano 4-hydroxycinnimic acid matrix solution (Sigma Aldrich). Spectra were obtained with an accelerating voltage of 20 kV in linear mode and analyzed within m/z range 3,000–15,000 Da. Spectra were analyzed with MALDI Biotyper 3.1 and Reference Library V.3.1.2.0 (Bruker Daltonics). A mass spectrum profile (MSP) based on 24 separate determinations was created. The representative MSPs were then selected for hierarchical cluster analysis using MALDI Biotyper 3.1 (Bruker Daltonics) with default parameters (Ketterlinus et al. 2005), where distance values are relative and are always normalized to a maximum value of 1,000. Supplementary Material Supplementary tables S1–S4 are available at Genome Biology and Evolution online (http://www.gbe.oxfordjournals.org/). Acknowledgments This work was partly supported by the Strategic Research Theme Fund and the Small Project Funding Scheme, The University of Hong Kong. The authors thank members of the Centre for Genomic Sciences, The University of Hong Kong, for their technical support in genome sequencing. Literature Cited Abranches J, et al. 2011. The collagen-binding protein Cnm is required for Streptococcus mutans adherence to and intracellular invasion of human coronary artery endothelial cells. Infect Immun. 79: 2277–2284. Auch AF, Klenk HP, Goker M. 2010. Standard operating procedure for calculating genome-to-genome distances based on high-scoring segment pairs. Stand Genomic Sci. 2:142–148. Aziz RK, et al. 2008. The RAST Server: rapid annotations using subsystems technology. BMC Genomics 9:75. Delcher AL, Bratke KA, Powers EC, Salzberg SL. 2007. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics 23: 673–679. Dubois D, Segonds C, Prere MF, Marty N, Oswald E. 2013. Identification of clinical Streptococcus pneumoniae isolates among other alpha and nonhemolytic streptococci by use of the Vitek MS matrix-assisted laser desorption ionization-time of flight mass spectrometry system. J Clin Microbiol. 51:1861–1867. Facklam R. 2002. What happened to the streptococci: overview of taxonomic and nomenclature changes. Clin Microbiol Rev. 15:613–630. Faibis F, et al. 2008. Streptococcus sinensis: an emerging agent of infective endocarditis. J Med Microbiol. 57:528–531. Glazunova OO, Raoult D, Roux V. 2010. Partial recN gene sequencing: a new tool for identification and phylogeny within the genus Streptococcus. Int J Syst Evol Microbiol. 60:2140–2148. Hsieh SY, et al. 2008. Highly efficient classification and identification of human pathogenic bacteria by MALDI-TOF MS. Mol Cell Proteomics. 7:448–456. Jolley KA, et al. 2012. Ribosomal multilocus sequence typing: universal characterization of bacteria from domain to strain. Microbiology 158:1005–1015. Karpanoja P, Harju I, Rantakokko-Jalava K, Haanpera M, Sarkkinen H. 2014. Evaluation of two matrix-assisted laser desorption ionizationtime of flight mass spectrometry systems for identification of viridans group streptococci. Eur J Clin Microbiol Infect Dis. 33:779–788. Kawamura Y, Hou XG, Sultana F, Miura H, Ezaki T. 1995. Determination of 16S rRNA sequences of Streptococcus mitis and Streptococcus gordonii and phylogenetic relationships among members of the genus Streptococcus. Int J Syst Bacteriol. 45:406–408. 2942 Genome Biol. Evol. 6(10):2930–2943. doi:10.1093/gbe/evu232 Advance Access publication October 20, 2014 GBE Streptococcus sinensis HKU4T Genome Sequence Ketterlinus R, Hsieh SY, Teng SH, Lee H, Pusch W. 2005. Fishing for biomarkers: analyzing mass spectrometry data with the new ClinProTools software. Biotechniques Suppl:37–40. Lau SK, et al. 2014. Matrix-assisted laser desorption ionisation time-of-flight mass spectrometry for identification of clinically significant bacteria that are difficult to identify in clinical laboratories. J Clin Pathol. 67:361–366. Maiden MC, et al. 2013. MLST revisited: the gene-by-gene approach to bacterial genomics. Nat Rev Microbiol. 11:728–736. Matta M, et al. 2009. First case of Streptococcus oligofermentans endocarditis determined based on sodA gene sequences after amplification directly from valvular samples. J Clin Microbiol. 47:855–856. Matthys C, et al. 2006. Streptococcus cristatus isolated from a resected heart valve and blood cultures: case reports and application of phenotypic and genotypic techniques for identification. Acta Clin Belg. 61: 196–200. Olson AB, et al. 2013. Phylogenetic relationship and virulence inference of Streptococcus anginosus group: curated annotation and wholegenome comparative analysis support distinct species designation. BMC Genomics 14:895. Park HK, Yoon JW, Shin JW, Kim JY, Kim W. 2010. rpoA is a useful gene for identification and classification of Streptococcus pneumoniae from the closely related viridans group streptococci. FEMS Microbiol Lett. 305:58–64. Que YA, Moreillon P. 2011. Infective endocarditis. Nat Rev Cardiol. 8: 322–336. Sherman JM. 1937. The streptococci. Bacteriol Rev. 1:3–97. Tang BS, et al. 2013. Matrix-assisted laser desorption ionisation-time of flight mass spectrometry for rapid identification of Laribacter hongkongensis. J Clin Pathol. 66:1081–1083. Tapp J, Thollesson M, Herrmann B. 2003. Phylogenetic relationships and genotyping of the genus Streptococcus by sequence determination of the RNase P RNA gene, rnpB. Int J Syst Evol Microbiol. 53: 1861–1871. Tse H, et al. 2010. Complete genome sequence of Staphylococcus lugdunensis strain HKU09-01. J Bacteriol. 192:1471–1472. Uckay I, et al. 2007. Streptococcus sinensis endocarditis outside Hong Kong. Emerg Infect Dis. 13:1250–1252. Wayne LG. 1988. International Committee on Systematic Bacteriology: announcement of the report of the ad hoc Committee on Reconciliation of Approaches to Bacterial Systematics. Zentralbl Bakteriol Mikrobiol Hyg A. 268:433–434. Woo PC, et al. 2002. Streptococcus sinensis sp. nov., a novel species isolated from a patient with infective endocarditis. J Clin Microbiol. 40: 805–810. Woo PC, et al. 2004. Streptococcus sinensis may react with Lancefield group F antiserum. J Med Microbiol. 53:1083–1088. Woo PC, et al. 2008. The oral cavity as a natural reservoir for Streptococcus sinensis. Clin Microbiol Infect. 14:1075–1079. Woo PC, et al. 2009. The complete genome and proteome of Laribacter hongkongensis reveal potential mechanisms for adaptations to different temperatures and habitats. PLoS Genet. 5: e1000416. Woo PC, Teng JL, Lau SK, Yuen KY. 2006. Clinical, phenotypic, and genotypic evidence for Streptococcus sinensis as the common ancestor of anginosus and mitis groups of streptococci. Med Hypotheses. 66: 345–351. Zbinden A, Kohler N, Bloemberg GV. 2011. recA-based PCR assay for accurate differentiation of Streptococcus pneumoniae from other viridans streptococci. J Clin Microbiol. 49: 523–527. Associate editor: B. Venkatesh Genome Biol. Evol. 6(10):2930–2943. doi:10.1093/gbe/evu232 Advance Access publication October 20, 2014 2943
© Copyright 2026 Paperzz