REVIEW ARTICLE

Structural diversity in Salmonella O antigens and its genetic basis

Bin Liu1,2, Yuriy A. Knirel3, Lu Feng1,2,4, Andrei V. Perepelov3, Sof’ya N. Senchenkova3, Peter R. Reeves5 & Lei Wang1,2,4,6

1TEDA School of Biological Sciences and Biotechnology, Nankai University, TEDA, Tianjin, China; 2The Key Laboratory of Molecular Microbiology and Technology, Ministry of Education, Tianjin, China; 3N.D. Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Moscow, Russian Federation; 4Tianjin Key Laboratory of Microbial Functional Genomics, Tianjin, China; 5School of Molecular and Microbial Bioscience (G08), University of Sydney, Sydney, Australia; and 6Tianjin Research Center for Functional Genomics and Biochip, Tianjin, China

Correspondence: Lei Wang, TEDA School of Biological Sciences and Biotechnology, Nankai University, 23 Hongda Street, TEDA, Tianjin 300457, China. Tel.: 86 22 66229588; fax: 86 22 66229596; e-mail: [email protected]

Received 30 November 2012; revised 15 May 2013; accepted 5 July 2013. Final version published online 2 August 2013.

DOI: 10.1111/1574-6976.12034

MICROBIOLOGY REVIEWS

Editor: Wilbert Bitter

Keywords
polysaccharide; pathogen; polymorphism; serotyping; evolution; glycosyltransferase.

Abstract

This review covers the structures and genetics of the 46 O antigens of Salmonella, a major pathogen of humans and domestic animals. The variation in structures underpins the serological specificity of the 46 recognized serogroups. The O antigen is important for the full function and virulence of many bacteria, and the considerable diversity of O antigens can confer selective advantage. Salmonella O antigens can be divided into two major groups: those which have N-acetylglucosamine (GlcNAc) or N-acetylgalactosamine (GalNAc) and those which have galactose (Gal) as the first sugar in the O unit.
In recent years, we have determined 21 chemical structures and sequenced 28 gene clusters for GlcNAc-/GalNAc-initiated O antigens, thus completing the structure and DNA sequence data for the 46 Salmonella O antigens. The structures and gene clusters of the GlcNAc-/GalNAc-initiated O antigens were found to be highly diverse, and 24 of them were found to be identical or closely related to Escherichia coli O antigens. Sequence comparisons indicate that all or most of the shared gene clusters were probably present in the common ancestor, although alternative explanations are also possible. In contrast, the better-known eight Gal-initiated O antigens are closely related both in structures and gene cluster sequences.

© 2013 Federation of European Microbiological Societies. Published by John Wiley & Sons Ltd. All rights reserved

Introduction

O antigen (O polysaccharide) is a part of the lipopolysaccharide (LPS) component of the outer membrane of Gram-negative bacteria and is one of the most variable cell constituents. It consists of oligosaccharide repeats (O units), normally containing two to eight sugar residues. The variation is mostly in the types of sugars present, their order in the structure, and the linkages between them. The O antigen is subject to intense selection by the host immune system, bacteriophages, and other environmental factors (Reeves & Wang, 2002), which may account for the maintenance of diverse O-antigen forms within a species. O-antigen diversity is a common basis for bacterial serotyping and also important for the bacteria, as it allows each of the various clones to present a surface that offers selective advantage in its specific niche (Reeves, 1992). The presence of O antigen is also essential for survival of bacteria in their natural environment and plays a role in bacterial virulence.
There is direct evidence that the loss of O antigen makes many pathogens, such as Escherichia coli, Shigella flexneri, Francisella tularensis, and Yersinia enterocolitica, serum sensitive or otherwise seriously impaired in virulence (Pluschke et al., 1983; Bengoechea et al., 2004; West et al., 2005; Plainvert et al., 2007; Raynaud et al., 2007).

FEMS Microbiol Rev 38 (2014) 56–89

Salmonella is recognized as a major pathogen of both animals and humans and is the cause of typhoid fever, paratyphoid fever, and the foodborne illness salmonellosis. Salmonella infections arise from contamination of poultry, eggs, beef, and other foods, sometimes including unwashed fruits and vegetables. In many countries, Salmonella is the leading cause of foodborne outbreaks and infections. It is estimated that there are 1.3 million cases of salmonellosis, 15 000 hospitalizations, and 400 deaths annually in the United States (Hardnett et al., 2004). The genus Salmonella includes two species, S. enterica and S. bongori. S. enterica is divided into the following six subspecies: S. enterica enterica, S. enterica salamae, S. enterica arizonae, S. enterica diarizonae, S. enterica houtenae, and S. enterica indica (subspecies I, II, IIIa, IIIb, IV, and VI, respectively). S. bongori was originally designated S. enterica subspecies V, but it has since been determined to be a separate species. This classification has been confirmed by multilocus enzyme electrophoresis and sequence analysis of housekeeping genes (Nelson et al., 1991; Nelson & Selander, 1992; Boyd et al., 1994; McQuiston et al., 2008). Serotyping is highly useful for identifying strains that vary in host range and disease spectrum, including pathogens such as Salmonella, and is invaluable for epidemiological investigations.
The Kauffmann–White–Le Minor serotyping scheme for designation of Salmonella serotypes, maintained by the WHO Collaborating Centre for Reference and Research on Salmonella, is used by most laboratories for the characterization of Salmonella isolates. A serotype of Salmonella is determined on the basis of O and flagellar (H) antigens. The O antigen determines the serogroup, while the H antigen completes the definition of the serovar or serotype of a Salmonella isolate. There are 46 O serogroups described in the Kauffmann–White–Le Minor scheme. These were originally designated by letters of the alphabet, but later, it was necessary to continue with numbers 51–67. The genes specific for O-antigen synthesis are normally present as a gene cluster in the chromosome, which maps between galF and gnd in Salmonella, E. coli, and Shigella, but sometimes, one or more such genes map outside the gene cluster. There are 114 H antigens in Salmonella (McQuiston et al., 2004), and 2557 serovars in total have been recognized (Grimont & Weill, 2007). Approximately 60% of the serovars belong to subspecies I, while subspecies VI and S. bongori are rare. O-antigen gene clusters appear to have been transferred among subspecies, as the majority of Salmonella O antigens are found in at least two subspecies with a mean of 3.5 subspecies per O antigen (Reeves, 1995; Popoff & Le Minor, 1997).

Genetic variation in the O-antigen gene cluster is the major determinant of differences among the diverse O-antigen forms. O-antigen synthesis genes fall into three main classes: (1) nucleotide sugar precursor synthesis genes for sugars specific to the O antigen (the common sugars in the O antigen that are also found in other polysaccharide structures or are involved in metabolism, such as glucose (Glc), galactose (Gal), and N-acetylglucosamine (GlcNAc), are usually synthesized by genes outside the O-antigen gene cluster);
(2) sugar transferase genes associated with the O-unit assembly that are specific for the donor and acceptor sugars and generate a specific linkage between them; and (3) genes for O-unit processing and the conversion of the O unit to O antigen (wzx and wzy in the Wzx/Wzy pathway and wzm and wzt in the ABC transporter pathway). However, genes on bacteriophages or other chromosomally encoded genes, which are not located in the O-antigen gene cluster, are often involved in modification of the structure and particularly in the addition of side-chain residues to the O units.

The synthesis and translocation of the O antigen can occur through three distinct pathways: the Wzx/Wzy pathway, the ATP-binding cassette (ABC) transporter pathway, and the synthase pathway (Bronner et al., 1994; Keenleyside & Whitfield, 1996; Daniels et al., 1998; Linton & Higgins, 1998; Samuel & Reeves, 2003). In the Wzx/Wzy pathway, the O unit is synthesized by sequential transfer of a sugar phosphate and one or more sugars from the respective nucleotide sugars to the carrier lipid, namely undecaprenyl phosphate (UndP). O units are flipped across the cytoplasmic membrane and then polymerized to form polysaccharide chains, which are transferred to the independently synthesized core-lipid A component to form LPS (Mulford & Osborn, 1983; McGrath & Osborn, 1991; Reeves & Wang, 2002). In the ABC transporter pathway, the glycosyltransferases mediate the sequential addition of sugar residues to the nonreducing end of the growing polymer to form the complete O-antigen polymer that is attached to UndPP. The polysaccharide is then translocated across the cytoplasmic membrane by an ABC transporter and ligated to the core-lipid A to form the complete LPS (Bronner et al., 1994; Linton & Higgins, 1998).
In the synthase pathway used for synthesis of the Salmonella O54 antigen, a synthase catalyzes the extension of the polysaccharide chain with simultaneous extrusion of the nascent polymer across the cytoplasmic membrane (Keenleyside & Whitfield, 1996).

Most Salmonella O antigens (39 in total, including Salmonella O54 and O67 and taking into account that Salmonella O28 is divided into O28ab and O28ac) have either GlcNAc or N-acetylgalactosamine (GalNAc) as the first sugar of the O unit. As in most E. coli and Shigella strains, WecA, which is encoded by a gene in the enterobacterial common antigen (ECA) gene cluster, is responsible for initiating the synthesis of GlcNAc- and GalNAc-initiated O antigens by transferring GlcNAc-1-phosphate to the UndP carrier. When GalNAc is the initiating sugar, UndPP–GlcNAc is then converted to UndPP–GalNAc by an epimerase, which is encoded by a gene that has been called gne (Rush et al., 2010). However, we suggest that this gene be renamed gnu, as its product is specific for UndPP–GlcNAc, whereas epimerases encoded by gne are specific for UDP–GlcNAc (Cunneen et al., 2013).

Salmonella O67 occurs rarely and has been suggested to be a variant of serogroup O4 (B) O antigen (Li & Reeves, 2000). However, in this study, we found that the O-antigen structure of Salmonella O67 is similar to that of D-galactan I O antigen in Klebsiella pneumoniae and that its gene cluster is not located between galF and gnd. Salmonella O54 has a disaccharide O unit composed of two ManNAc residues. The O54 antigen gene cluster is on a plasmid, and the O antigen expressed from the main O-antigen gene cluster is present together with the O54 antigen (Keenleyside & Whitfield, 1996). The O54 serogroup is currently retained, but if the plasmid is lost, factor O54 is no longer expressed.
The Salmonella O antigens belonging to serogroups O2 (A), O4 (B), O8 (C2–C3), O9 (D1), O9,46 (D2), O9,46,27 (D3), O3,10 (E1–E3), and O1,3,19 (E4) form a distinct set that is characterized by having a Gal residue as the first sugar of the O unit and a wbaP gene in the O-antigen gene cluster, which encodes the glycosyltransferase that catalyzes the addition of the Gal-1-phosphate residue to UndP to initiate O-unit synthesis. These serogroups have related O-antigen structures and gene clusters (Reeves et al., 2013). Details of their relationships show that they have a complex evolutionary history that will be reviewed separately (Reeves et al., 2013).

Although GlcNAc-/GalNAc-initiated O antigens outnumber Gal-initiated O antigens in Salmonella (39 vs. 8), the latter were found to be more prevalent in Salmonella isolates. Among Salmonella isolates from human sources reported between 1999 and 2009 by the Centers for Disease Control and Prevention in the United States, 84.23% of isolates belonged to serogroups with a Gal-initiated O antigen, and only 5.35% of isolates belonged to serogroups with a GlcNAc-/GalNAc-initiated O antigen (other isolates could not be serotyped) (CDC, 2009).

Systematically analyzing the chemical structures and gene clusters of different O-antigen forms in a genus or species will improve our understanding of the generation of the O-antigen diversity. It will also open the way for experimental studies on the relationship between this diversity and pathogenicity. Many laboratories worldwide have worked on the structure, genetics, and function of O antigens. However, most of these studies have focused on relatively few O-antigen forms. In a previous review, we summarized the structures and gene clusters of all Shigella O antigens (Liu et al., 2008) and found many genetic anomalies in the gene clusters.
It was suggested that the Shigella set of O antigens has been assembled relatively recently or undergone adaptive modifications in a newly occupied niche. Salmonella, Shigella, and E. coli are known to be evolutionarily related (Ochman & Wilson, 1987), and we also presented evidence in support of the close relationship between Shigella and E. coli, as 21 of 34 Shigella O antigens are either identical or closely related to an E. coli O antigen. Homologous recombination was shown to be an essential mechanism in the diversification of Shigella O antigens. Shigella is a pathogenic form that was estimated to have developed within E. coli several times over the last 35 000–270 000 years (Pupo et al., 2000), but these events were probably more recent, as mutation rates in bacterial clones observed in recent studies are much higher than earlier estimates, and this affects the date estimates (Feng et al., 2008; Ho et al., 2011; Morelli et al., 2011; Reeves et al., 2011). In contrast, Salmonella is a distinct genus with a much longer history (Ochman & Wilson, 1987; Doolittle et al., 1996); it is thought that E. coli and Salmonella diverged from a common ancestor about 140 million years ago. The evolutionary mechanisms for generation of the O-antigen diversity in Salmonella are expected to be different from those in Shigella. When we started a systematic study of Salmonella and E. coli O antigens, there were only four cases in which the O-antigen structure had been shown to be identical in the two species (Rundlof et al., 1998; Samuel et al., 2004), although more serological cross-reactions had been observed (Orskov et al., 1977). It was suggested that there had been extensive replacement of O antigens, presumably by lateral gene transfer, since divergence of the two species.
In the past 5 years, we have sequenced 28 Salmonella O-antigen gene clusters, 14 of which are reported here for the first time, and determined 21 Salmonella O-antigen chemical structures, five of which are reported here for the first time. We have also revised the chemical structures of another three Salmonella O antigens. In this study, we present a compilation of the published and new chemical structures and DNA sequence data for the 46 known Salmonella O antigens. Together with the summary of Shigella O antigens, it gives an improved insight into the evolution of O-antigen diversity in bacteria. The structures and gene clusters of GlcNAc-/GalNAc-initiated O antigens were found to be highly diverse. However, the proportion of genetic anomalies in these gene clusters is clearly lower than that in Shigella, indicating that these O antigens are more stable. We also sequenced 18 E. coli O-antigen gene clusters, determined 9 E. coli O-antigen chemical structures, and revised 2 others to obtain sufficient data for a comparison of all O antigens shared by Salmonella and E. coli (the others were retrieved from databases). We found that 24 Salmonella O-antigen forms are either identical or closely related to E. coli O antigens, as indicated by both genetic and structural data. Therefore, the relationship between E. coli and Salmonella O antigens is much closer than previously thought. The genetic data imply that almost all O antigens shared by Salmonella and E. coli originated from an O antigen in their common ancestor, although alternative explanations (such as a recent lateral transfer of a gene cluster from one species to the other) are also possible. In contrast to Salmonella GlcNAc-/GalNAc-initiated O antigens, Salmonella Gal-initiated O antigens exhibit a high level of relatedness in both structure and genetics, implying a distinct evolutionary history.
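The counts quoted in the Introduction can be cross-checked with a little bookkeeping. The sketch below is illustrative only: the dictionary transcribes the relationship column of Table 2 on the assumption that the flattened columns of that table pair up in row order, and all variable names are hypothetical.

```python
from collections import Counter

# Relationship column of Table 2, keyed by Salmonella O antigen.
# "=" means identical to an E. coli O antigen; "Cr" means closely related.
# Assumption: the extracted table columns pair up row by row.
TABLE2_RELATIONSHIP = {
    "O11": "=", "O13": "Cr", "O6,14": "Cr", "O16": "Cr", "O17": "Cr", "O18": "Cr",
    "O28ac": "Cr", "O28ab": "Cr", "O30": "Cr", "O35": "=", "O38": "=", "O42": "=",
    "O43": "Cr", "O47": "Cr", "O48": "Cr", "O50": "=", "O51": "Cr", "O52": "=",
    "O55": "Cr", "O57": "=", "O58": "Cr", "O62": "=", "O65": "=", "O66": "Cr",
}

tally = Counter(TABLE2_RELATIONSHIP.values())
shared_forms = len(TABLE2_RELATIONSHIP)

# Forms versus serogroups: the text counts 39 GlcNAc-/GalNAc-initiated forms
# (O28ab and O28ac separately) and 8 Gal-initiated forms.
GLCNAC_GALNAC_FORMS = 39
GAL_FORMS = 8
# Assumption: the O28 split is the only case where two forms share one serogroup.
serogroups = GLCNAC_GALNAC_FORMS + GAL_FORMS - 1

# Share of GlcNAc-/GalNAc-initiated forms with an E. coli counterpart, in percent.
shared_pct = round(100 * shared_forms / GLCNAC_GALNAC_FORMS, 1)

print(shared_forms, tally["="], tally["Cr"])  # 24 9 15
print(serogroups)                             # 46
print(shared_pct)                             # 61.5
```

Under these assumptions, the 24 shared forms split into 9 identical and 15 closely related structures, and the 39 + 8 O-antigen forms reduce to the 46 recognized serogroups once O28ab and O28ac are counted as one serogroup.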
Chemical composition and structures for Salmonella GlcNAc-/GalNAc-initiated O antigens

The structures of all GlcNAc-/GalNAc-initiated Salmonella O antigens are now known (Table 1). Some structures elucidated by us only recently have not been reported earlier and are presented here for the first time. They were established using one- and two-dimensional 1H- and 13C-NMR spectroscopy essentially as described (Duus et al., 2000). Three new Salmonella structures, those of O42, O52, and O65 antigens, are identical to the known structures of E. coli O1B (Gupta et al., 1992), O153 (Ratnayake et al., 1994), and O78 (Jansson et al., 1987), respectively. The O-antigen structure of Salmonella O67 was found to be highly similar to that of D-galactan I, →3)-D-Galf-(β1→3)-D-Galp-(α1→, in K. pneumoniae (Whitfield et al., 1991). The only difference between the two O antigens is the presence of an O-acetyl group in Salmonella O67. Its position was determined by a comparison of the NMR spectra of the initial and O-deacetylated polysaccharides, which revealed characteristic displacements of 1H- and 13C-NMR signals caused by a deshielding effect of the O-acetyl group. Using 13C-NMR spectroscopy and the ‘fingerprint’ method, it was found that the O antigen of Salmonella O21 has the same structure as that reported erroneously for S. enterica arizonae O64 and Citrobacter freundii O32 (Kocharova et al., 1988). Previously, an incorrect structure was assigned to the S. enterica arizonae O21 O antigen (Vinogradov et al., 1994), which, in fact, may belong to Citrobacter braakii O37 (A. Gamian, pers. commun.).

In addition, structures of two Salmonella O antigens were revised in this work. Using known regularities in the 13C-NMR chemical shifts of the Quip3NAc-(α1→3)-D-Manp disaccharide (Shashkov et al., 1988), the absolute configuration of Qui3NAc in the O39 antigen was revised from L (Gajdus et al., 2009) to D.
In the O62 antigen, D-GalNAcA is present in the amide form (D-GalNAcAN) rather than as the free acid (Vinogradov et al., 1992). This was demonstrated by the 1H-NMR spectrum of a polysaccharide sample measured in a 9:1 H2O/D2O mixture, which showed two signals for NH2 protons at 7.40 and 7.65 ppm [compare published data (Rundlof et al., 1998)]. We have also revised the N-acyl group on L-FucN in the O48 antigen from N-acetyl to N-acetimidoyl (Feng et al., 2005b).

Except for the O54 and O67 antigens, all Salmonella GlcNAc-/GalNAc-initiated O antigens are heteropolysaccharides. Some are linear and have tri- to pentasaccharide O units. Others are branched with tetra- to hexasaccharide O units usually including one or two monosaccharide side chains or, less often, a disaccharide side chain. Most of the sugars are in the pyranose form, whereas D-Gal occurs in the furanose form in two O antigens, and D-Rib exists in the furanose form in all cases. In addition to D-GlcNAc and D-GalNAc, hexoses D-Glc, D-Man, D-Gal, L-Rha, and L-Fuc occur in six or more O antigens each (Supporting Information, Table S1). When D-Glc is present as a side chain, its content is often less than stoichiometric, and there is no putative glycosyltransferase for its transfer in the gene clusters, both indicating that this sugar is incorporated into the O antigen after assembly and processing of the O unit. Exceptionally, a side-chain Glc was proposed to be transferred by a glycosyltransferase encoded in the Salmonella O66 gene cluster (Liu et al., 2010a). Other monosaccharides are components in 1–4 O antigens each. These include neutral sugars (D-Rib, Col) and various uncommon amino sugars (D-ManN, L-QuiN, L-FucN, D-Qui3N, D-Fuc3N, D-Qui4N, D-Rha4N).
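The linear notation used for O-unit structures in Table 1 can be read mechanically. The following is a minimal sketch, not a general glycan parser: it assumes an unbranched O unit written as in Table 1 with ordinary glycosidic linkages (it does not handle side chains, O-acetyl groups, or phosphodiester links such as the one in O47), and the function and type names are hypothetical. The example string is the Salmonella O48 backbone as printed in Table 1.

```python
import re
from typing import List, NamedTuple


class Residue(NamedTuple):
    name: str      # residue as written in the table, e.g. "D-GlcpNAc"
    anomer: str    # anomeric configuration, "α" or "β"
    donor: int     # anomeric carbon of this residue
    acceptor: int  # substituted position on the following residue


def parse_linear_o_unit(structure: str) -> List[Residue]:
    """Split a linear O-unit string into residues and their linkages."""
    match = re.fullmatch(r"→(\d)\)-(.+)→", structure)
    if match is None:
        raise ValueError(f"not a linear O-unit string: {structure!r}")
    # The leading "→n)" is the position at which the previous O unit is attached,
    # so it is also the acceptor position for the last residue of this unit.
    entry_position, body = int(match.group(1)), match.group(2)
    tokens = re.split(r"→(\d)\)-", body)  # alternates: "Name-(αN", "M", "Name-(αN", ...
    pieces = tokens[0::2]
    acceptors = [int(t) for t in tokens[1::2]] + [entry_position]
    residues = []
    for piece, acceptor in zip(pieces, acceptors):
        name, link = piece.rsplit("-(", 1)
        residues.append(Residue(name, link[0], int(link[1:]), acceptor))
    return residues


O48_BACKBONE = "→4)-Neup5Ac-(α2→3)-L-FucpNAm-(α1→3)-D-GlcpNAc-(β1→"
o48_residues = parse_linear_o_unit(O48_BACKBONE)
for residue in o48_residues:
    print(residue)
# By the convention apparent in Table 1, the initiating sugar is written last.
print("initiating sugar:", o48_residues[-1].name)
```

Read this way, the O48 unit has three residues, and the right-most one (D-GlcpNAc) is the initiating sugar, whose β1→4 linkage to the next unit is the polymerization linkage.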
In most cases, the amino sugars are N-acetylated, but in some O antigens, they carry rarely occurring N-acyl groups, such as N-acetimidoyl on L-FucN, N-formyl on D-Fuc3N, (R)-3-hydroxybutanoyl on 8-epilegionaminic acid (8eLeg), and N-acetyl-L-seryl or N-[(S)-3-hydroxybutanoyl]-D-alanyl on Qui4N (Table S1). O-Acetylation is not uncommon in Salmonella O antigens, and one or two O-acetyl groups are present in nonstoichiometric quantities in the O units of 7 O serogroups (Table 1).

In contrast to the O antigens of Shigella (Liu et al., 2008), the O antigens of Salmonella are typically neutral polysaccharides. Only a few of them contain acidic components, such as hexuronic acids (D-GlcA and D-GalNAcA in the O45 and O62 antigens, respectively), nonulosonic acids (derivatives of Neu and 8eLeg in the O48 and O61 antigens, respectively), and ribitol phosphate (in the O47 antigen). However, GalNAcA exists in the neutral amide form, and the negative charge of both nonulosonic acids and the phosphate group is neutralized by a basic N-acetimidoyl group on an L-FucN residue. 8eLeg5RHb7Ac (7-acetamido-3,5,7,9-tetradeoxy-5-[(R)-3-hydroxybutanoylamino]-L-glycero-D-galacto-non-2-ulosonic acid, a derivative of 8-epilegionaminic acid) is a higher sugar rarely occurring in nature and is similar to isomeric nonulosonic acids found in some other bacterial carbohydrates, di-N-acyl derivatives of Pse (5,7-diamino-3,5,7,

Table 1. Structures of Salmonella GlcNAc-/GalNAc-initiated O antigens, including O54 and O67, and related Escherichia coli O antigens
Columns: Bacterium*, serogroup, Salmonella serovar or subspecies; O-antigen structure†; References
SO6,7 (C1) Thompson →2)-D-Manp-(β1→2)-D-Manp-(α1→2)-D-Manp-(α1→2)-D-Manp-(β1→3)-D-GlcpNAc-(β1→ Lindberg et al.
(1988) D-Glcp SO6,7 (C1) Thompson, Livingstone α1 ↓ 3 →2)-D-Manp-(β1→2)-D-Manp-(α1→2)-D-Manp-(α1→2)-D-Manp-(β1→3)-D-GlcpNAc-(β1→ Lindberg et al. (1988) Di Fabio et al. (1989b) D-Glcp SO6,7 (C1) Ohio α1 ↓ 3 →2)-D-Manp-(β1→2)-D-Manp-(α1→2)-D-Manp-(α1→2)-D-Manp-(β1→3)-D-GlcpNAc-(β1→ Di Fabio et al. (1989c) D-Glcp SO6,7 (C4) Livingstone var. 14+ α1 ↓ 3 →2)-D-Manp-(β1→2)-D-Manp-(α1→2)-D-Manp-(α1→2)-D-Manp-(β1→3)-D-GlcpNAc-(β1→ Di Fabio et al. (1988b) D-Manp β1 ↓ 4 SO11 (F) Aberdeen EO75 →3)-D-Galp-(α1→4)-L-Rhap-(α1→3)-D-GlcpNAc-(β1→ SO13 (G) →2)-L-Fucp-(α1→2)-D-Galp-(β1→3)-D-GalpNAc-(α1→3)-D-GlcpNAc-(α1→ EO127 Ac (~70%) | 3 →2)-L-Fucp-(α1→2)-D-Galp-(β1→3)-D-GalpNAc-(α1→3)-D-GalpNAc-(α1→ 4 | Ac (~40%) SO6,14 (H) Boecker, Carrau, Madelia EO77 Szafranek et al. (2003) Erbing et al. (1978) →6)-D-Manp-(α1→2)-D-Manp-(α1→2)-D-Manp-(β1→3)-D-GlcpNAc-(α1→ Perepelov et al. (2010e) Widmalm & Leontein (1993) Perepelov et al. (2010e) Brisson & Perry (1988) Di Fabio et al. (1988a) Di Fabio et al. (1989a) Yildirim et al. (2001) D-Glcp SO6,14 (H) Carrau, Madelia EO44 α1 ↓ 3 →6)-D-Manp-(α1→2)-D-Manp-(α1→2)-D-Manp-(β1→3)-D-GlcpNAc-(α1→ Di Fabio et al. (1988a) Di Fabio et al. (1989a) Staaf et al. (1995) D-Glcp SO6,14 (H) Madelia α1 ↓ 4 →6)-D-Manp-(α1→2)-D-Manp-(α1→2)-D-Manp-(β1→3)-D-GlcpNAc-(α1→ Di Fabio et al. (1989a) EO17 Masoud & Perry (1996) D-Glcp α1 ↓ 6 →6)-D-Manp-(α1→2)-D-Manp-(α1→2)-D-Manp-(β1→3)-D-GlcpNAc-(α1→ D-Glcp EO73 D-Glcp α1 α1 ↓ ↓ 4 3 →6)-D-Manp-(α1→2)-D-Manp-(α1→2)-D-Manp-(β1→3)-D-GlcpNAc-(α1→ Wang et al. (2007) L-Fucp SO16 (I) D-Glcp (~50%) α1 Ac (~20/40/20%) β1 ↓ | ↓ 3 2/3/4 4 →4)-D-GalpNAc-(α1→6)-D-Manp-(α1→3)-L-Fucp-(α1→3)-D-GalpNAc-(β1→ Li et al.
(2010b) L-Fucp EO11 α1 ↓ 3 →4)-D-GalpNAc-(α1→6)-D-Manp-(α1→3)-L-Fucp-(α1→3)-D-GalpNAc-(β1→ Li et al. (2010b) D-Galf SO17 (J) α1 Ac (~80%) ↓ | 4 2 →2)-D-Galp-(α1→3)-D-ManpNAc-(β1→6)-D-Galf-(β1→3)-D-GlcpNAc-(β1→ Perepelov et al. (2011d) D-Galf EO85 SO18 (K) Cerro α1 ↓ 4 →2)-D-Galp-(α1→3)-D-ManpNAc-(β1→6)-D-Galf-(β1→3)-D-GlcpNAc-(β1→ →4)-D-Manp-(α1→2)-D-Manp-(α1→2)-D-Manp-(β1→3)-D-GalpNAc-(α1→ Perepelov et al. (2011d) Vinogradov et al. (2004) D-Glcp E. coli 73-1 α1 ↓ 3 →4)-D-Manp-(α1→2)-D-Manp-(α1→2)-D-Manp-(β1→3)-D-GalpNAc-(α1→ Weintraub et al. (1993) D-GlcpNAc SO21 (L)‡ α1 ↓ 3 →4)-D-GalpNAc-(β1→3)-D-Galp-(α1→4)-D-Galp-(β1→3)-D-GalpNAc-(β1→ This review (~22%) D-Glcp (~55%) α1 α1 ↓ ↓ 3 4 →4)-D-Quip3NAc-(β1→3)-D-Ribf-(β1→4)-D-Galp-(β1→3)-D-GalpNAc-(α1→ D-Galp-(α1→3)-D-Galp SO28ab (M) Telaviv Kumirska et al. (2011) EO5ab →4)-D-Quip3NAc-(β1→3)-D-Ribf-(β1→4)-D-Galp-(β1→3)-D-GalpNAc-(α1→ MacLean & Perry (1997) D-Glcp SO28ac (M) Dakar EO71 SO30 (N) Landau β1 ↓ 4 →4)-D-Quip3NAc-(α1→3)-L-Rhap-(α1→4)-D-Galp-(β1→3)-D-GalpNAc-(α1→ Ac (~10%) Ac (~30%) | | 4 2 →4)-D-Quip3NAc-(α1→3)-L-Rhap-(α1→4)-D-Galp-(β1→3)-D-GalpNAc-(α1→ 3 | Ac (~55%) Ac (~50%) | 6 →2)-D-Rhap4NAc-(α1→3)-L-Fucp-(α1→4)-D-Glcp-(β1→3)-D-GalpNAc-(α1→ Kumirska et al. (2007) Hu et al. (2010) Bundle et al. (1986) D-Glcp SO30 (N) Urbana, Godesberg β1 ↓ 4 →2)-D-Rhap4NAc-(α1→3)-L-Fucp-(α1→4)-D-Glcp-(β1→3)-D-GalpNAc-(α1→ EO157 →2)-D-Rhap4NAc-(α1→3)-L-Fucp-(α1→4)-D-Glcp-(β1→3)-D-GalpNAc-(α1→ Perry et al. (1986a) Colp α1 ↓ 3 →4)-D-Glcp-(α1→4)-D-Galp-(α1→3)-D-GlcpNAc-(β1→ 6 ↑ α1 Colp Kenne et al. (1983) SO35 (O) Adelaide EO111 D-Galp SO38 (P) EO21 SO39 (Q) Mara D-GlcpNAc β1 β1 ↓ ↓ 4 2 →3)-D-Galp-(β1→4)-D-Glcp-(β1→3)-D-GalpNAc-(β1→ Perry et al.
(1986b) →2)-D-Quip3NAc-(α1→3)-D-Manp-(α1→3)-L-Fucp-(α1→3)-D-GalpNAc-(α1→ Li et al. (2010b) Staaf et al. (1999) Gajdus et al. (2009) this review D-GlcpNAc SO40 (R) Riogrande β1 ↓ 2 →4)-D-GalpNAc-(α1→3)-D-Manp-(β1→4)-D-Glcp-(β1→3)-D-GalpNAc-(α1→ SO41 (S) →2)-D-Manp-(β1→4)-D-Glcp-(α1→3)-L-QuipNAc-(α1→3)-D-GlcpNAc-(α1→ Perry & MacLean (1992b) Perepelov et al. (2010b) SO42 (T) EO1B This review Gupta et al. (1992) D-ManpNAc β1 ↓ 2 →3)-L-Rhap-(α1→2)-L-Rhap-(α1→2)-D-Galp-(α1→3)-D-GlcpNAc-(β1→ D-Galp SO43 (U) Milwaukee α1 ↓ 3 →4)-L-Fucp-(α1→2)-D-Galp-(β1→3)-D-GalpNAc-(α1→3)-D-GlcpNAc-(β1→ Perry & MacLean (1992b) D-Galp EO86:K2:H2 α1 ↓ 3 →4)-L-Fucp-(α1→2)-D-Galp-(β1→3)-D-GalpNAc-(α1→3)-D-GalpNAc-(β1→ Andersson et al. (1989) D-GlcpNAc SO44 (V) SO45 (W) ssp. arizonae β1 ↓ 3 →2)-D-Glcp-(α1→6)-D-Glcp-(α1→4)-D-Galp-(α1→3)-D-GlcpNAc-(β1→ Perepelov et al. (2010d) L-Fucp Ac (~80%) α1 | ↓ 3 2 →4)-D-GlcpA-(β1→4)-L-Fucp-(α1→3)-D-Ribf-(β1→4)-D-Galp-(β1→3)-D-GlcpNAc-(β1→ SO47 (X) Ac (~10%) | 4 →2)-D-Rib-ol-5-P-(O→6)-D-Galp-(α1→3)-L-FucpNAm-(α1→3)-D-GlcpNAc-(α1→ EO118 →3)-D-Rib-ol-5-P-(O→6)-D-Galp-(α1→3)-L-FucpNAm-(α1→3)-D-GlcpNAc-(β1→ Shashkov et al. (1993) Perepelov et al. (2009) Liu et al. (2010b) D-GlcpNAc EO151 SO48 (Y) Toucra EO145 β1 ↓ 4 →2)-D-Rib-ol-5-P-(O→6)-D-Galp-(α1→3)-L-FucpNAm-(α1→3)-D-GlcpNAc-(β1→ →4)-Neup5Ac-(α2→3)-L-FucpNAm-(α1→3)-D-GlcpNAc-(β1→ 7,9 | Ac (~30%, ~70%) →4)-Neup5Ac-(α2→3)-L-FucpNAm-(α1→3)-D-GlcpNAc-(β1→ Liu et al. (2010b) Gamian et al. (2000) Feng et al. (2005b) Feng et al. (2005b) SO50 (Z) Greenside EO55 SO50 ssp. arizonae Colp-(α1→2)-D-Galp β1 ↓ 3 →6)-D-GlcpNAc-(β1→3)-D-Galp-(α1→3)-D-GalpNAc-(β1→ Kenne et al. (1983) Lindberg et al. (1981) Colp-(α1→2)-D-Galp β1 ↓ 3 →6)-D-GlcpNAc-(β1→3)-D-Galp-(α1→3)-D-GlcpNAc-(β1→ Senchenkova et al. (1997) D-GlcpNAc SO51 β1 ↓ 3 →6)-D-Glcp-(α1→4)-D-Galp-(β1→3)-D-GalpNAc-(α1→3)-D-GlcpNAc-(β1→ Perepelov et al. (2011c) D-GlcpNAc EO23 SO52 EO153 SO53 SO54 Borreze D-Glcp β1 α1 ↓ ↓ 3 6 →6)-D-Glcp-(α1→4)-D-Galp-(β1→3)-D-GalpNAc-(α1→3)-D-GlcpNAc-(β1→ Bartelt et al. (1993) →2)-D-Ribf-(β1→4)-D-Galp-(β1→4)-D-GlcpNAc-(α1→4)-D-Galp-(β1→3)-D-GlcpNAc-(α1→ This review Ratnayake et al. (1994) Ac (~60%, ~25%) | 2,3 →2)-Galf-(α1→4)-D-GalpNAc-(β1→4)-L-Rhap-(α1→3)-D-GlcpNAc-(β1→ Perepelov et al. (2011a) →4)-D-ManpNAc-(β1→3)-D-ManpNAc-(β1→ Keenleyside et al. (1994) SO55 →2)-D-Glcp-(β1→2)-D-Fucp3NAc-(β1→6)-D-Glcp-(α1→4)-D-GalpNAc-(α1→3)-D-GlcpNAc-(β1→ Liu et al. (2010c) EO103 →2)-D-Glcp-(β1→2)-D-Fucp3NRHb-(β1→6)-D-GlcpNAc-(α1→4)-D-GalpNAc-(α1→3)-D-GlcpNAc-(β1→ Liu et al. (2010c) SO56 →3)-D-Quip4N(L-SerAc)-(β1→3)-D-Ribf-(β1→4)-D-GalpNAc-(α1→3)-D-GlcpNAc-(α1→ Perepelov et al. (2010g) D-GlcpNAc SO57 EO51 β1 ↓ 2 →3)-L-Rhap-(α1→2)-L-Rhap-(α1→4)-D-Glcp-(α1→3)-D-GalpNAc-(β1→ SO58 →3)-D-Quip4N(D-Ala-SHb)-(β1→6)-D-GlcpNAc-(α1→3)-L-QuipNAc-(α1→3)-D-GlcpNAc-(α1→ EO123 Ac (~30%) | 6 →3)-D-Quip4N(D-Ala-SHb)-(β1→6)-D-GlcpNAc-(α1→3)-L-QuipNAc-(α1→3)-D-GlcpNAc-(α1→ Perepelov et al. (2011e) Perepelov et al. (2011e) Perepelov et al. (2010f) Clark et al. (2009) Perepelov et al. (2010f) SO59§ →2)-D-Galp-(β1→3)-D-GlcpNAc-(α1→4)-L-Rhap-(α1→3)-D-GlcpNAc-(β1→
Perepelov et al. (2011b) D-Fucp3NFo SO60 SO61 ssp. arizonae α1 ↓ 3 →2)-D-Manp-(β1→3)-D-Glcp-(β1→3)-D-GlcpNAc-(β1→ Perepelov et al. (2010a) ?8)-8eLegp5RHb7Ac-(a2?3)-L-FucpNAmp-(a1?3)-D-GlcpNAc-(a1? Vinogradov et al. (1992) D-GalpNAcAN SO62 ssp. arizonae EO35 α1 ↓ 2 →3)-L-Rhap-(α1→2)-L-Rhap-(α1→3)-L-Rhap-(α1→2)-L-Rhap-(α1→3)-D-GlcpNAc-(β1→ Vinogradov et al. (1994) this review Rundlof et al. (1998) D-Fucp3NAc SO63 ssp. arizonae SO65 EO78 α1 ↓ 4 →3)-D-Galp-(β1→4)-D-Glcp-(α1→4)-D-GalpNAc-(α1→3)-D-GalpNAc-(β1→ Vinogradov et al. (1987a) ?4)-D-Manp-(b1?4)-D-Manp-(a1?3)-D-GlcpNAc-(b1?4)-D-GlcpNAc-(b1? This review Jansson et al. (1987) D-Glcp SO66 β1 Ac (~90%) ↓ | 3 6 →2)-D-Galp-(α1→6)-D-Galp-(α1→4)-D-GalpNAc-(α1→3)-D-GalpNAc-(β1→ Liu et al. (2010a) D-Glcp EO166 SO67 β1 ↓ 3 →3)-D-Galp-(α1→6)-D-Galp-(α1→4)-D-GalpNAc-(α1→3)-D-GalpNAc-(β1→ Ac (∼30%) | 2 →3)-D-Galf-(β1→3)-D-Galp-(α1→ Ali et al. (2007) This review *S, Salmonella; E, E. coli. For abbreviations of sugar residues and nonsugar groups, see Table S1. ‡ Earlier, this structure has been reported erroneously as that of S. enterica arizonae O64 and Citrobacter freundii O32 (Kocharova et al., 1988), whereas another structure has been reported for S. enterica arizonae O21 (Vinogradov et al., 1994), which, in fact, may belong to Citrobacter braakii O37 (A. Gamian, pers. commun.). § Earlier, another structure has been reported for S. enterica arizonae O59 (Vinogradov et al., 1987b), which, in fact, belongs to Citrobacter braakii O35 (Kocharova et al., 1996) and E. coli O15 (Perepelov et al., 2011b). † 9-tetradeoxy-L-glycero-L-manno-non-2-ulosonic or pseudaminic acid), and Leg (5,7-diamino-3,5,7,9-tetradeoxyD-glycero-D-galacto-non-2-ulosonic or legionaminic acid) (Knirel et al., 2003, 2012). There are only two pairs of Salmonella O serogroups with closely related GlcNAc-/GalNAc-initiated O antigens. The O13 and O43 antigens differ only in (1) the FEMS Microbiol Rev 38 (2014) 56–89 configuration (a vs. 
β) and the position (1→2 vs. 1→4) of the polymerization linkage between the O units; and (2) the presence of a Gal side chain in the O43 antigen. The O6,14 and O18 antigens, which share O factors 6 and 14, differ only in (1) the initiating amino sugar (GlcNAc vs. GalNAc); and (2) the polymerization linkage (1→6 vs. 1→4). Within serogroups, nonglucosylated and glucosylated structural variants are known, for example, compare the Salmonella O6,7, O6,14, and O30 antigens (Table 1). Finally, two structures have been reported for the Salmonella O50 antigen, which differ only in the initiating amino sugar (GlcNAc vs. GalNAc). In contrast, the similarity of the O antigens of Salmonella O28ab and O28ac is limited to the presence of the common D-Galp-(β1→3)-D-GalpNAc-(α1→4)-D-Quip3NAc trisaccharide fragment in the main chain, and classification of the two bacteria to the same serogroup requires reconsideration.

Remarkably, many Salmonella GlcNAc-/GalNAc-initiated O antigens are closely related or even identical to E. coli O antigens (Tables 1 and 2). Most O-antigen structures shared by these bacteria have been reported by us or others earlier, and some of them are discussed below.

Table 2. Summary of Salmonella and Escherichia coli sharing the identical or closely related O-antigen structures and gene clusters*

Salmonella     E. coli             Structure      Reference for sequences
O antigen      O antigen           relationship   (Salmonella/E. coli)
SO11 (F)       EO75                =              This review/Li et al. (2010a)
SO13 (G)       EO127               Cr             Fitzgerald et al. (2007)/Iguchi et al. (2009)
SO6,14 (H)     EO77/O73/O17/O44    Cr             Fitzgerald et al. (2003)/Wang et al. (2007)
SO16 (I)       EO11                Cr             Li et al. (2010a, b)/Li et al. (2010b)
SO17 (J)       EO85                Cr             Fitzgerald et al. (2006)/Perepelov et al. (2011d)
SO18 (K)       73-1†               Cr             Fitzgerald et al. (2006)
SO28ac (M)     EO71                Cr             Hu et al. (2010)/Hu et al. (2010)
SO28ab (M)     EO5                 Cr             Clark et al. (2010)/this review
SO30 (N)       EO157               Cr             Samuel et al. (2004)/Wang & Reeves (1998)
SO35 (O)       EO111               =              Wang & Reeves (2000)/Wang & Reeves (2000)
SO38 (P)       EO21                =              Li et al. (2010b)/Ren et al. (2008)
SO42 (T)       EO1B                =              This review/Li et al. (2010a)
SO43 (U)       EO86                Cr             This review/Feng et al. (2005a)
SO47 (X)       EcO118/O151         Cr             Liu et al. (2010b)/Liu et al. (2010b)
SO48 (Y)       EO145               Cr             This review/Feng et al. (2005b)
SO50 (Z)       EO55                =              Samuel et al. (2004)/Wang et al. (2002b)
SO51           EO23                Cr             Perepelov et al. (2011c)/Perepelov et al. (2011c)
SO52‡          EO153               =              This review/this review
SO55           EO103               Cr             Liu et al. (2010c)/Fratamico et al. (2005)
SO57           EO51                =              Perepelov et al. (2011e)/Perepelov et al. (2011e)
SO58           EO123               Cr             Clark et al. (2009)/Beutin et al. (2007)
SO62           EO35                =              This review/Liu et al. (2009)
SO65           EO78                =              This review/Liu et al. (2009)
SO66           EO166               Cr             Liu et al. (2010a)/Liu et al. (2010a)

*S: Salmonella; E: E. coli; =: identical; Cr: closely related.
† The backbone of Salmonella O18 antigen is found to be identical to that of strain E. coli 73-1 (Weintraub et al., 1993). However, the O-serogroup and O-antigen gene cluster of E. coli 73-1 are unknown.
‡ Salmonella O52 shares the same O-antigen structure with E. coli O153 (Ratnayake et al., 1994). However, their gene clusters are unrelated.

General features of gene clusters for Salmonella GlcNAc-/GalNAc-initiated O antigens

The gene clusters of all GlcNAc-/GalNAc-initiated Salmonella O antigens have been sequenced (Fig. 1, Table S2). Except for Salmonella O54 and O67, these gene clusters are localized in the genomes between the galF and gnd genes. Their general characteristics, such as having low GC content (about 30%), using wzx/wzy as O-unit processing genes, exhibiting great diversity, etc., are similar to those of E. coli and Shigella.
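The low GC content mentioned above is a simple base-composition statistic. The following is a minimal sketch of how it could be computed, assuming a cluster sequence is available as a plain string; the function name `gc_content` and the 20-bp toy fragment are illustrative assumptions and are not taken from any of the deposited gene clusters.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C among unambiguous bases (A, C, G, T)."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


# Hypothetical 20-bp fragment, used only to exercise the function.
toy_fragment = "ATGAAATTTGCATTAACGTA"
print(f"GC content: {gc_content(toy_fragment):.1f}%")
```

Ambiguous symbols such as N are excluded from the denominator in this sketch; a different convention would shift the value slightly for sequences that contain them.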
Almost all Salmonella O antigens use the Wzx-/Wzy-dependent process for the synthesis and translocation of O antigens, the only exceptions being O54 and O67, which use the synthase-dependent pathway and the ABC transporter pathway, respectively. The wzx and wzy genes are usually located within the O-antigen gene cluster, but for O66, there is no wzy gene in the gene cluster, and it must be located elsewhere in the genome. This resembles the situation in serogroups A, B, and D1, which have the wzy gene at a locus far from the main gene cluster (Naide et al., 1965; Curd et al., 1998). In E. coli, the ABC transporter pathway has been reported for 10 of the 148 O antigens with sequenced gene clusters (O8, O9, O20, O52, O89, O95, O97, O99, O101, O162) (Liu et al., 2008), but is not found to mediate the synthesis of any Salmonella or Shigella O antigens except for Salmonella O67.

Fig. 1. The O-antigen gene clusters of Salmonella GlcNAc-/GalNAc-initiated O antigens. Open arrows represent the location and orientation of putative genes. The O-antigen gene clusters that are first reported in this review have been deposited in GenBank under accession numbers from JX975328 to JX975348. Color key: O unit processing gene; glycosyltransferase gene; GDP-sugar pathway gene; UDP-sugar pathway gene; dTDP-sugar pathway gene; CDP-sugar pathway gene; Neu5Ac synthesis gene; acetyltransferase gene; CDP-ribitol synthesis gene; 8eLeg5RHb7Ac synthesis gene; function unknown gene; H-repeat element; gene remnant; IS remnant. Scale bar, 1 kb.

Several studies have shown that most gene clusters for Salmonella Gal-initiated O antigens have a cassette structure with a central set of variable serogroup-specific genes flanked by highly homologous sugar pathway genes or other shared genes.
A similar situation has been found in several groups of Streptococcus pneumoniae gene clusters for capsules with related structures (Bentley et al., 2006; Mavroidi et al., 2007) and in Yersinia pseudotuberculosis O-antigen gene clusters (Cunneen et al., 2009; De Castro et al., 2009, 2010). In contrast, the gene clusters for Salmonella GlcNAc-/GalNAc-initiated O antigens are highly diverse and possess no cassette structure. There are only three sets of related O-antigen gene clusters.

Fig. 2. Comparisons of related Salmonella O-antigen gene clusters: (a) O11 and C1; (b) O43 and O13; (c) O47, O48, and O61. For color coding key, see Fig. 1. The proposed functions of glycosyltransferases are shown.

(1) Salmonella O11 and C1. The last three genes (manC, manB, and wzx) at the 3′ end of their gene clusters are in the same order and share obvious DNA identity (63% for manC, 93% for manB, and 97% for wzx; Fig. 2a). The O-antigen structures of Salmonella O11 and C1 are not related except for having mannose as a constituent sugar, and the other genes of their gene clusters are quite different. It is likely that a recombination event has occurred between the O-antigen gene clusters of Salmonella O11 and C1. The DNA identity level of manC is much lower than that of the manB and wzx genes, and we propose that one of the recombination sites is located in the manB gene. It is surprising that an almost identical Wzx protein is responsible for translocation of O antigens with such different structures.

(2) Salmonella O13 and O43. The first seven genes and last two genes of the Salmonella O13 and O43 antigen gene clusters have the same order and significant DNA identity, and the structures are also related (Fig. 2b). Both structures are also found in E. coli, and the relationships between the four structures and gene clusters are discussed below.

(3) Salmonella O47, O48, and O61. The last eight genes in the O-antigen gene clusters of the three O serogroups have the same order and share 63–100% DNA identity (Fig. 2c). All three structures contain the L-FucpNAm-(α1→3)-D-GlcpNAc disaccharide fragment.
Four of the eight genes, fnlA, fnlB, fnlC, and wbuX, are involved in the synthesis of L-FucpNAm, and wbuB is proposed to be the L-FucNAm transferase gene. The role of the other three genes is not clear as there are no other shared structural elements.

Although the O-antigen structures of Salmonella O6,14 and O18 are identical apart from the Wzy polymerization linkage (Table 1), the genes in their gene clusters share no similarity, except for manC (59% identity) and wzy (49% identity). It is interesting that the wzy genes are among those with higher levels of identity given the different polymerization linkage.

The sugar synthesis genes in O-antigen gene clusters, such as those for L-Rha and D-Man, are often highly conserved and easily identified. Among the Salmonella GlcNAc-/GalNAc-initiated O antigens discussed in this section, L-Rha is present in 7 O antigens. RmlB, RmlD, RmlA, and RmlC catalyze the four-step synthesis of dTDP-L-Rha, and the genes are usually located at the 5′ end of the O-antigen gene clusters of Salmonella and E. coli in the conserved order given above. The sequence comparisons show that in Salmonella, the 5′ end of the rml gene set, comprising rmlB, rmlD, and most of rmlA, has many characteristics of housekeeping genes and is in general subspecies specific (data not shown). In contrast, the 3′ end, including part of rmlA and all of rmlC, is much more variable, and the variation at this end is clearly O antigen- and not subspecies-related. This is consistent with a previous report (Li & Reeves, 2000) based on a much smaller number of serotypes. It was suggested in that study that this was because rmlC and the 3′ end of rmlA are commonly transferred between subspecies with the glycosyltransferase and O-antigen processing genes that determine O-antigen specificity and are generally in the central region of the gene cluster.
The 5′ end of the rml gene set was proposed to gain its subspecies-specific sequence in this process, as these genes remain in the species as new gene clusters arrive and others die out. The additional data fully support those conclusions. Where the rmlB and rmlA genes are involved in the synthesis of sugars other than Rha, the full gene set may also be located at the 5′ end with rmlB and rmlA as the first two genes, as in the O-antigen gene clusters of Salmonella O28ab, O39, O55, O58, and O60. Only in Salmonella O56 and O63 are rmlB and rmlA found elsewhere in the gene cluster (Fig. 1).

D-Man is present in 10 GlcNAc-/GalNAc-initiated Salmonella O antigens. GDP-D-Man is synthesized from fructose-6-phosphate by ManA, ManB, and ManC, but only the manB and manC genes are generally present in the gene cluster, as ManA is also involved in the use of exogenous mannose as a carbon source, and the manA gene is not associated with the O-antigen gene clusters (Neidhardt et al., 1987). ManB and ManC are also involved in the synthesis of GDP-Col, GDP-L-Fuc, and GDP-D-Rha4NAc (GDP-PerNAc), so a total of 16 gene clusters for GlcNAc-/GalNAc-initiated Salmonella O antigens contain manB and manC genes. Colanic acid (CA), which is widely present in Salmonella, contains L-Fuc, and the manB and manC genes required for production of GDP-Fuc are located within the CA gene cluster (Aoyama et al., 1994). The CA gene cluster is unusual in having generally a high GC content, and remarkably, most manB genes for GlcNAc-/GalNAc-initiated Salmonella O antigens (including those in O60 and O65 that are reported in this review) share a high level of identity (93–99%) with the CA manB gene (Jensen & Reeves, 2001). The only exceptions are manB genes of O6,14 and O35.
Furthermore, those CA-like manB genes display obvious subspecies specificity, and the CA manB genes and the CA-like manB genes in each strain appear to be evolving in concert via gene conversion events (Jensen & Reeves, 2001). These events appear to be unidirectional, as no manB gene with low GC content has been found in a CA gene cluster. It should be noted that the manB genes from the O-antigen gene clusters of the 8 Gal-initiated serogroups are closely related and not CA-like (Jensen & Reeves, 2001). In contrast to manB, with the exception of O11 and O41, the Salmonella manC genes are not CA-like even in gene clusters with the whole of the L-Fuc pathway.

To assess the diversity of Wzx, Wzy, and the glycosyltransferases involved in the synthesis of the 37 Wzx/Wzy pathway GlcNAc-/GalNAc-initiated O antigens, we used the TribeMCL program (Enright et al., 2002) with a cutoff of 1e-50 to assemble each group of proteins into homology groups (HG). The 36 Wzy proteins (the Salmonella O66 gene cluster contains no wzy gene) and 37 Wzx proteins were assembled into 35 and 23 HG, respectively. There is enormous diversity, as the average amino acid identity levels between the Wzy or Wzx HG are under 15%. In contrast, Wzy and Wzx proteins for the 8 Gal-initiated Salmonella O antigens were assembled into 4 and 3 HG, respectively, with mostly similar low levels of identity between HG as found for Salmonella GlcNAc-/GalNAc-initiated O antigens. However, the higher proportion of gene clusters with a shared HG for Wzx or Wzy reflects a higher level of relatedness among gene clusters for Gal-initiated O antigens. The data also further demonstrate the different patterns of diversity in the gene clusters for Salmonella GlcNAc-/GalNAc-initiated and Gal-initiated O antigens. The 127 glycosyltransferases from the 37 Wzx/Wzy pathway O antigens were assembled into 91 HG (Table S3), of which 20 contain 2–6 members.
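The idea of assembling proteins into homology groups can be illustrated with a much simpler stand-in for TribeMCL. The sketch below is not Markov clustering: it treats the cutoff as a BLAST E-value threshold (an assumption), links any two proteins whose pairwise E-value is at or below it, and reports the connected components (single-linkage grouping). The function name `homology_groups`, the protein labels, and the E-values are all hypothetical.

```python
from collections import defaultdict


def homology_groups(proteins, hits, cutoff=1e-50):
    """Group proteins by single linkage: two proteins share a group if a chain
    of pairwise hits with E-value <= cutoff connects them.
    NOTE: a simplified stand-in, not the Markov clustering used by TribeMCL."""
    parent = {p: p for p in proteins}

    def find(p):
        # Follow parent links to the representative, compressing the path.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b, evalue in hits:
        if evalue <= cutoff:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for p in proteins:
        groups[find(p)].append(p)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical labels and E-values, for illustration only.
proteins = ["Wzx_A", "Wzx_B", "Wzx_C", "Wzx_D"]
hits = [
    ("Wzx_A", "Wzx_B", 1e-80),
    ("Wzx_B", "Wzx_C", 1e-12),
    ("Wzx_C", "Wzx_D", 1e-60),
]
for group in homology_groups(proteins, hits):
    print(group)
```

With a stringent cutoff, the weak B/C hit does not join the two pairs, so the toy data yield two groups; relaxing the cutoff merges them. Markov clustering is generally less prone than single linkage to chaining unrelated families through one intermediate hit, which is one reason this sketch should not be read as equivalent to the published analysis.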
The functions of 64 of these glycosyltransferases can be predicted based on correlations between the presence of a glycosyltransferase with a specific protein sequence and a shared or similar structural element in the corresponding O antigens (Fig. S1). In some cases, glycosyltransferases belonging to the same HG were proposed to have the same function. For instance, the 6 glycosyltransferases in HG-GT-1 share 41–99% identity in pairwise comparisons. Among these, WfbG in Salmonella O43 was proposed to be responsible for the synthesis of a D-GalNAc-(α1→3)-D-GlcNAc linkage. When structural data were taken into consideration, 5 of the 6 HG-GT-1 glycosyltransferases were proposed to have the same function and named WfbG accordingly. The only exception is WbdH in Salmonella O35, which is proposed to be responsible for the formation of a D-Gal-(α1→3)-D-GlcNAc linkage.

Low proportion of anomalies in gene clusters for Salmonella GlcNAc-/GalNAc-initiated O antigens

Anomalies in the O-antigen gene clusters usually indicate a recent genetic event that may have been involved in the formation of the O-antigen form, perhaps related to adaptive modifications of bacteria in newly occupied niches (Liu et al., 2008). Twelve such anomalies belonging to five categories (mobile elements, noncoding regions, genes in the reverse orientation, genes in an unusual location, and gene remnants) are found in the 37 Salmonella GlcNAc-/GalNAc-initiated O-antigen gene clusters. Previous studies found 17 such anomalies in the 33 Shigella O-antigen gene clusters and 49 anomalies in 148 E. coli O-antigen gene clusters. The proportion of anomalies in Salmonella O-antigen gene clusters is very similar to that in E. coli O-antigen gene clusters and much lower than that in Shigella. This suggests that it is Shigella that is atypical, which is consistent with it having diverged relatively recently and adopting a new niche.
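The comparison above rests on simple proportions, which can be worked through directly from the counts given in the text (12 anomalies in 37 Salmonella gene clusters, 17 in 33 Shigella gene clusters, and 49 in 148 E. coli gene clusters). The sketch below expresses each as anomalies per gene cluster; the variable and function names are illustrative.

```python
# Counts of anomalies and of O-antigen gene clusters, as stated in the text.
counts = {
    "Salmonella (GlcNAc-/GalNAc-initiated)": (12, 37),
    "Shigella": (17, 33),
    "E. coli": (49, 148),
}


def anomalies_per_cluster(anomalies: int, clusters: int) -> float:
    """Average number of anomalies per sequenced gene cluster."""
    return anomalies / clusters


for taxon, (anomalies, clusters) in counts.items():
    ratio = anomalies_per_cluster(anomalies, clusters)
    print(f"{taxon}: {anomalies}/{clusters} = {ratio:.2f}")
```

The Salmonella and E. coli values come out at about 0.32 and 0.33 anomalies per gene cluster, against about 0.52 for Shigella.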
Mobile elements

Several insertion sequences and H-repeat elements were found in Shigella strains and were often associated with inferred gene cluster rearrangements. However, for the gene clusters of Salmonella GlcNAc-/GalNAc-initiated O antigens, only one mobile element is found, an H-repeat insertion that is 78% identical to the RhsB H-repeat of E. coli K12 (Zhao et al., 1993), and is located between gne and gnd in the O-antigen gene cluster of Salmonella O51. This is the only major difference between the O-antigen gene clusters of Salmonella O51 and E. coli O23 that encode the same O antigen (Perepelov et al., 2011c), indicating that this H-repeat unit inserted into the O-antigen gene cluster of Salmonella O51 after the divergence of Salmonella and E. coli. Because the H-repeat unit in Salmonella O51 is intact, it is likely that the insertion occurred recently.

Noncoding regions

The gaps between genes in O-antigen gene clusters are often very short, suggesting that translational coupling is occurring, but larger gaps can arise during restructuring of a gene cluster (for instance, the incorporation or deletion of genes). In Salmonella serogroups A, B, and D1, for example, the functional wzy genes responsible for the polymerization of O units are found outside the O-antigen gene cluster (Naide et al., 1965; Curd et al., 1998), and a remnant wzy gene is present in the large gap upstream of the wbaO gene where the wzy gene is found in groups E and D2. Noncoding regions also are found in gene clusters for four of the Salmonella GlcNAc-/GalNAc-initiated O antigens.

(1) In the O-antigen gene cluster of Salmonella O66, there is no wzy gene (Fig. 1), and there is also an 874-bp noncoding region between weiA and weiB in the gene cluster (Liu et al., 2010a). However, no remnant of a wzy gene can be found in this region by sequence homology search.
A wzy remnant can be difficult to find by BLAST search because of the high divergence levels in wzy genes and the degradation of remnant sequences by deletions, which can fragment an open reading frame and/or change the reading frame. In Salmonella serogroups A, B, and D1 discussed above, the wzy remnants were not found until the ancestral wzy gene of group D3 was sequenced, which provided a closely related homologue. The Salmonella wzy genes are highly divergent, and if none are in the same HG as the lost O66 wzy gene, then a remnant may well not be detectable by BLAST but have to await sequencing of a near relative. Because the Salmonella O66 type strain can produce normal LPS, it is highly likely that it also has a functional wzy gene for its β1→2 linkage outside the O-antigen gene cluster.

(2) In the O-antigen gene cluster of Salmonella O40, there is a remnant gnu gene between gne and wzx (Fig. 1). Gnu is responsible for the formation of UndPP–GalNAc from UndPP–GlcNAc for GalNAc-initiated O antigens (Rush et al., 2010), and the remnant suggests that the ancestral gene cluster coded for a GalNAc-initiated O antigen, although in E. coli and Salmonella, there is often a gnu gene upstream of galF rather than in the gene cluster. There is a 671-bp noncoding region between the gnu remnant and wzx, with no good hits in a BLAST search. Salmonella O40 has two GalNAc residues and no main-chain GlcNAc residue, indicating that both gne and gnu are required for the O-antigen synthesis. We suggest that there is a gnu gene upstream of galF in Salmonella O40, which is responsible for the synthesis of UndPP–GalNAc and replaces the function of the now degraded gnu gene in the gene cluster.

(3) A 570-bp noncoding region with no good hits in a BLAST search is located between wekM and wzy in the Salmonella O42 antigen gene cluster.
Salmonella O42 has the same O-antigen structure as E. coli O1 (Table 1), and their gene clusters also have the same organization. However, the 570-bp noncoding region is not found in the O-antigen gene cluster of E. coli O1. This noncoding region marks a boundary between different levels of DNA identity between the two gene clusters (Fig. 3a): DNA identity to the corresponding E. coli O1 genes is 55–81% for the seven upstream genes (rmlB to wekM) but only about 40% for the three downstream genes (wzy, wekN, and wbdH), whose products show no obvious protein identity. The first seven genes in the two gene clusters presumably have a common ancestor, while the other three may have different origins. It is likely that the presence of the 570-bp noncoding region is related to the incorporation of wzy, wekN, and wbdH in Salmonella O42, suggesting that the ancestral gene cluster was like that in E. coli.

(4) Two noncoding regions are found in the O-antigen gene cluster of Salmonella O53. One is located upstream of gne (positions 9867–10961), and the other downstream of gne and upstream of gnd (positions 11991–13054). It is likely that these two noncoding regions are related to the incorporation of the gne gene into the O-antigen gene cluster, implying an ancestor without the GalNAc residue currently present.

Genes in the reverse orientation

All but two of the genes in the Salmonella O-antigen gene clusters are transcribed from galF to gnd, the exceptions being qdtC in Salmonella O28ac and gne in Salmonella O21, which are transcribed in the opposite direction.

(1) qdtC is located at the 3′ end of the Salmonella O28ac antigen gene cluster. QdtC is involved in the biosynthesis of dTDP-Qui3NAc, together with RmlA, RmlB, QdtA, and QdtB (Pfostl et al., 2008).
QdtC is an acetyltransferase for the final step in synthesis of dTDP-Qui3NAc, and it is likely that qdtC was added to the O28ac gene cluster relatively recently and that the ancestor had Qui3N in place of Qui3NAc. The E. coli O71 antigen gene cluster has the same organization as that of Salmonella O28ac, including the orientation of the qdtC gene, and the main chains of the two polysaccharides are the same (Hu et al., 2010). Thus, unlike other anomalies discussed in this section, it is likely that the qdtC gene in Salmonella O28ac and E. coli O71 was present in the common ancestor and is not an indication of recent change by our definition.

Fig. 3. Examples of O antigens with gene clusters and structures that are the same or are related in Salmonella and Escherichia coli: (a) SO42 and EO1; (b) SO55 and EO103; (c) SO66 and EO166; (d) SO43 and EO86; (e) SO13 and EO127; (f) EO127 and EO86; (g) SO30 and EO157. For color coding key, see Fig. 1. S, Salmonella. E, E. coli. The proposed functions of glycosyltransferases are shown.
(2) The gne gene in the O-antigen gene cluster of Salmonella O21 is located between wdaK and wdaL and transcribed in the opposite direction. This is an unusual location for a gene with this orientation, as all previously described genes transcribed in the opposite direction in the O-antigen gene clusters of Salmonella and Shigella are located after the 3′ end of the normally transcribed genes. The O21 gne gene is needed to synthesize the GalNAc residue, which is the last sugar in the structure; in its absence, this residue would perhaps be replaced by a GlcNAc residue. The orientation of this gne gene creates a need for two additional promoters (one for wdaL), but there is no evidence to indicate that the gne gene is a recent addition, especially because a promoter upstream of gne could not be identified based on an in silico search.

rml genes in unusual location

As discussed above, the rmlB, rmlD, rmlA, and rmlC genes for the four-step synthesis of dTDP-L-Rha are usually located at the 5′ end of the E. coli and Salmonella O-antigen gene clusters, in that conserved order. In some cases, the rmlC gene is separated from other rml genes. Seven Salmonella GlcNAc-/GalNAc-initiated O antigens contain L-Rha, and two of the rmlC genes are found in unusual locations.

(1) and (2) In Salmonella O28ac and O53, rmlC is located 1 gene and 5 genes, respectively, downstream of the rmlBDA genes.

(3) In Salmonella O56, the rmlA and rmlB genes are involved in the synthesis of dTDP-Qui4N, and as there is no L-Rha moiety, the rmlC and rmlD genes are not required. However, while the O56 rmlB gene is at the 5′ end of the gene cluster, the rmlA gene is located at the other end of the gene cluster, 10 genes downstream of rmlB.

(4) In the O-antigen gene cluster of Salmonella O63, the rmlB and rmlA genes, which are involved in the synthesis of dTDP-D-Fuc3NAc, are not located at the 5′ end of the O-antigen gene cluster, but downstream of weiD.
Remnant genes

The O antigen of Salmonella O50 contains D-Gal, D-GalNAc, D-GlcNAc, and Col. The synthesis of GDP-Col requires the products of 5 nonhousekeeping sugar synthesis genes: manB, manC, gmd, colA, and colB (Fig. 4). The manB, manC, and gmd genes are in the O-antigen gene cluster, while colA and colB are downstream of gnd, suggesting that they are a recent addition not fully incorporated into the gene cluster. There is a remnant of an fcl gene between gmd and gmm in the Salmonella O50 gene cluster, indicating that before the acquisition of colA and colB, the ancestral gene cluster coded for synthesis of GDP-L-Fuc. The colA and colB genes in E. coli O55, which has the same O antigen as Salmonella O50, are also located downstream of gnd. However, there is no fcl gene remnant in the O-antigen gene cluster of E. coli O55, presumably due to more extensive deletions than in Salmonella O50, which have occurred since the acquisition of the colA and colB genes. The presence of the colA and colB genes downstream of gnd and a remnant fcl gene in the gene cluster could suggest that colA and colB are a recent addition not fully incorporated into the gene cluster. However, the presence of the genes in the same location in E. coli suggests that in this case, the arrangement has survived for a long time and that consolidation of genes into the gene cluster can be a very slow process.

Biosynthetic pathways of monosaccharides

Twenty-one different sugars were found in the Salmonella GlcNAc-/GalNAc-initiated O antigens (Table S1). Fourteen of them are also present in Shigella O antigens, and their proposed or characterized biosynthetic pathways have been reviewed (Liu et al., 2008). The pathways for the other 7 sugars (Col, L-FucNAm, D-Qui3NAc, D-Fuc3NAc/D-FucNFo, D-Rha4NAc (D-PerNAc), Neu5Ac, and 8eLeg5RHb7Ac) and ribitol are shown in Fig. 4.
The biosynthetic pathway for 8eLeg5RHb7Ac, a component of the Salmonella O61 O antigen, is first proposed in this review (Fig. 4) and requires biochemical confirmation. A similar derivative of 8-epilegionaminic acid, 8eLeg5Ac7Ac, has been found in the O antigens of E. coli O61 and O108, and a biosynthetic pathway including 7 enzymes (Elg1–Elg7) has also been proposed (Perepelov et al., 2010c). Orf1–Orf6 and Orf8 in the Salmonella O61 antigen gene cluster share 51–84% identity with Elg1–Elg7, respectively, and may have the corresponding functions. Therefore, it is likely that the pathway for 8eLeg5RHb7Ac is similar to that proposed for 8eLeg5Ac7Ac, and that orf1–orf6 and orf8 are responsible for the synthesis of 8eLeg5RHb7Ac in Salmonella O61 (Fig. 4). Based on the structural difference, we propose that the substrates of the corresponding enzymes in the two pathways have different acyl groups at N5. The biosynthesis of 8eLeg5Ac7Ac is initiated from UDP-GlcNAc, and we propose that 8eLeg5RHb7Ac is initiated from UDP-GlcNRHb, which is probably synthesized from UDP-GlcNAc. However, the expected genes for UDP-GlcNRHb are not found in the O-antigen gene cluster of Salmonella O61 and may be located elsewhere in the genome. orf1–orf6 and orf8 in Salmonella O61 were named elb1–elb7, respectively.

[Fig. 4: biosynthetic pathway diagram]

Fig. 4. Biosynthetic pathways for the sugars in Salmonella O antigens. The pathways for the sugars that are also present in Shigella O antigens (Liu et al., 2008) are not included. Putative pathways are denoted by a broken line. In the CDP-3,6-dideoxyhexose pathway, [I] indicates a 4-pyridoxamine 6-deoxy-Δ3,4-glucoseen intermediate (Johnson & Liu, 1998). ManA, phosphomannose isomerase; ManB, phosphomannomutase; ManC, mannose-1-phosphate guanylyltransferase (Samuel & Reeves, 2003); Gmd, GDP-mannose 4,6-dehydratase (Somoza et al., 2000; Kneidinger et al., 2001); ColA, GDP-4-keto-6-deoxy-D-mannose 3-dehydrase (Alam et al., 2004); ColB, GDP-colitose synthase (Alam et al., 2004); PerA, GDP-perosamine synthetase (Zhao et al., 2007; Albermann & Beuttler, 2008); PerB, GDP-perosamine N-acetyltransferase (Albermann & Beuttler, 2008); Rib, ribulose 5-phosphate reductase/CDP-ribitol pyrophosphorylase (Follens et al., 1999); RmlA, glucose-1-phosphate thymidylyltransferase (Zuccotti et al., 2001); RmlB, dTDP-D-glucose 4,6-dehydratase (Allard et al., 2001); FdtA, dTDP-6-deoxy-hex-4-ulose isomerase; FdtB, dTDP-6-deoxy-D-xylo-hex-3-ulose aminase; FdtC, dTDP-D-Fuc3N acetylase (Pfoestl et al., 2003); QdtA, dTDP-4-oxo-6-deoxy-D-glucose 3,4-oxoisomerase; QdtB, dTDP-3-oxo-6-deoxy-D-glucose aminase; QdtC, dTDP-D-Qui3N acetylase (Pfostl et al., 2008); DdhA, glucose-1-phosphate cytidylyltransferase; DdhB, CDP-glucose 4,6-dehydratase; DdhC, CDP-4-keto-6-deoxy-D-glucose 3-dehydrase; DdhD, CDP-6-deoxy-Δ3,4-glucoseen reductase (Johnson & Liu, 1998; Samuel & Reeves, 2003); Abe, CDP-abequose synthase (Hallis et al., 1998); Prt, CDP-paratose synthase (Hallis et al., 1998); Tyv, CDP-Par 2-epimerase (Koropatkin et al., 2003); FnlA, 4,6-dehydratase/5-epimerase; FnlB, 3-epimerase/reductase; FnlC, C2 epimerase (Mulrooney et al., 2005); WbuX, aminotransferase (King et al., 2008); NnaA, GlcNAc-2-epimerase; NnaB, Neu5Ac condensing enzyme; NnaC, CMP-Neu5Ac synthetase (Annunziato et al., 1995); Esb1, C6 dehydratase/C5 epimerase; Esb2, aminotransferase; Esb3, acetyltransferase; Esb4, nucleotidase; Esb5, condensase; Esb6, cytidylyltransferase. a The enzyme is encoded by a gene that is not located in the O-antigen gene cluster.

A close relationship between the O antigens of Salmonella and E. coli

Until recently, there were only four confirmed cases in which the O antigens are identical in the two species: Salmonella O35 and E. coli O111, Salmonella O50 and E. coli O55, Salmonella O30 and E. coli O157, and Salmonella O62 and E. coli O35 (Rundlof et al., 1998; Samuel et al., 2004) (Table 1). We now find that 24 O antigens are present in both Salmonella and E. coli, being either identical or nearly identical between the two species, which is a much higher number than previously thought. All of the shared O antigens are GlcNAc-/GalNAc-initiated. The data are summarized in Table 2, and some interesting examples are described in detail below.

It is worth noting that in addition to Salmonella O30, O35, O50, and O62, there are 11 Salmonella O antigens that cross-react serologically with one or more E. coli O antigens (Orskov et al., 1977), and 7 of them (Salmonella O6,14, O11, O17, O38, O42, O43, and O51) were shown in this study to have structures and gene clusters that are identical or closely related to an E. coli O antigen. However, the remaining four Salmonella O antigens are not obviously related structurally or genetically to the respective E. coli O antigen.

Salmonella O52 was found to have the same O-antigen structure as that of E. coli O153 (Ratnayake et al., 1994). However, there are no genes shared by the two gene clusters, which is obviously different from other pairs of Salmonella and E. coli O antigens that are identical or closely related. The sources for the sequences of the 15 Salmonella GlcNAc-/GalNAc-initiated O antigens that are not related to E. coli O antigens are summarized in Table S4.

Salmonella O6,14 and E. coli O77 group

It is well known that most S. flexneri serotypes share a common O-antigen backbone and differ only in the distribution of four possible Glc side-branch residues and an O-acetyl moiety, which are all attached by enzymes encoded by prophage genes (Allison & Verma, 2000), or differ in the presence of a plasmid-encoded phosphoethanolamine modification (Sun et al., 2012; Knirel et al., 2013). There is a similar group of O-antigen structures in E. coli, comprising E. coli O77, O17, O44, O73, and O106, which have been given serogroup status, and Salmonella O6,14 is a single representative in Salmonella with a related structure (Wang et al., 2007). These strains also share a common four-sugar backbone O-unit structure and differ by the addition of one or two Glc side branches at various positions of the backbone (the only exception is the E. coli O77 O antigen that does not have any side-chain modification). Their O-antigen gene clusters contain the same genes in the same order and express proteins required for the biosynthesis of the common four-sugar backbone. The O-antigen gene clusters of the E.
coli O77 group share >99% identity to each other and 70–76% identity to that of Salmonella O6,14, suggesting that this O-antigen backbone was in the common ancestor. In S. flexneri, the side-branch Glc residues are added from UDP-Glc in a three-step process involving GtrA and GtrB, which are common to all such residues, and a side-branch-specific transferase. The three genes are always present as a set on a prophage genome in the chromosome, and most probably the E. coli O77-related strains and Salmonella O6,14 gained their specific side-branch modifications by acquiring similar prophages carrying different gtr gene sets.

Salmonella O55 and E. coli O103

The O antigens of Salmonella O55 and E. coli O103 have similar pentasaccharide O units that differ in only one sugar (Glc vs. GlcNAc) and in the acyl group on Fuc3N (Ac group vs. Hb group; Table 1). The DNA sequence identity in corresponding genes ranges from 53% to 76% (Fig. 3b), the only exception being the two acyltransferase genes (fdtC encoding an acetyltransferase in Salmonella O55 and fdhC encoding a 3-hydroxybutanoyltransferase in E. coli O103), which share no similarity and are responsible for the structural difference between dTDP-D-Fuc3NHb and dTDP-D-Fuc3NAc. We suggest that one of the two gene clusters acquired a new gene (acetyltransferase or 3-hydroxybutanoyltransferase gene) after species divergence (Liu et al., 2010c), but there is no indication as to which was the original gene in the ancestor. There must also be a difference in the specificity of the second glycosyltransferase to ensure the difference in the third sugar, as precursors for Glc and GlcNAc are generally available.

Salmonella O66 and E. coli O166

The O-antigen structures of Salmonella O66 and E. coli O166 differ only in the linkage between O units and the presence of an O-acetyl moiety in the former.
The O-antigen gene clusters of Salmonella O66 and E. coli O166 have nearly identical organizations, the only exception being that the wzy gene in E. coli O166 is replaced by a noncoding region in Salmonella O66 (Fig. 3c) (Liu et al., 2010a). It is proposed that a functional wzy gene outside the O-antigen gene cluster is involved in the synthesis of the O antigen of Salmonella O66, similar to what is found in Salmonella serogroups A, B, and D1 (Naide et al., 1965; Curd et al., 1998). The ancestral gene cluster of Salmonella O66 presumably had the wzy gene found in E. coli O166 between weiA and weiB, which would no longer be required after the bacteria gained the new wzy gene. The noncoding region in Salmonella O66 could be a remnant of a gene, but we found no region of similarity with the E. coli O166 wzy gene, probably owing to the substantial degradation observed between weiA and weiB.

Salmonella O43-E. coli O86 and Salmonella O13-E. coli O127

The four O antigens have similar four-sugar main chains varying mainly in the first sugar, which is GlcNAc in Salmonella and GalNAc in E. coli. Also, Salmonella O43 and E. coli O86 have a side-branch Gal that is lacking in the others. So for our purposes, there is a pair of related Salmonella O antigens, and both Salmonella O antigens have a related E. coli O antigen, all of which are treated together here.

Four of the genes (gmd-manC) of the Salmonella O13 and O43 antigen gene clusters have the same order and are 93–99% identical, as are the same genes in E. coli O127 and O86 (Figs 2b and 3d–f). In comparisons between the species, these genes are 74–84% identical. This is as expected for genes that were present in the common ancestor and diverged as the species diverged, with the genes in the two gene clusters undergoing frequent recombination within each species so that they evolved in concert (Samuel et al., 2004).
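The within-species and between-species figures quoted above are percent identities between corresponding genes. As a minimal sketch of how such a figure can be computed from a pairwise alignment, the helper below assumes that identity is counted over ungapped alignment columns only; the review does not state the exact method used, `percent_identity` is a hypothetical name, and the sequences are toy strings rather than real gene sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over ungapped columns of a pairwise alignment.

    Assumption (not from the review): columns where either sequence
    has a gap character '-' are excluded from the comparison.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for base_a, base_b in zip(aligned_a.upper(), aligned_b.upper()):
        if base_a == "-" or base_b == "-":
            continue  # skip gapped columns
        compared += 1
        if base_a == base_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment (hypothetical): 4 matches in 5 ungapped columns.
print(percent_identity("ATGC-A", "ATGGTA"))  # 80.0
```

Other conventions (for example, counting gapped columns as mismatches) would give lower values for the same alignment, so identities from different studies are comparable only when the convention is the same.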
The manB gene immediately downstream of manC is similar, but has rather more divergence than the gmd-manC genes due to having a CA gene cluster form of manB in the Salmonella and E. coli strains. The other genes show quite complex patterns including high levels of divergence as discussed below. The choice of first sugar is determined when the second sugar, a GalNAc residue, is added to either UndPP– GalNAc or UndPP–GlcNAc by glycosyltransferases WcmA or WfbG, respectively (Yi et al., 2005). The wcmA and wfbG genes are second genes in the gene cluster, after the gne gene that is required for synthesis of the UDP-GalNAc substrate. The E. coli strains will also need a gnu gene for synthesis of the UndPP–GalNAc. The two genes in Salmonella are again highly similar (98–99% identity) as are the two in E. coli (99–100% identity). However, the gne genes are only 60–63% identical in comparisons between the species, and the two glycosyltransferase genes, wcmA and wfbG, are not related at all (no more than 30% identity). It appears that this end of the gene cluster was replaced in one of the species causing the first sugar to be replaced. At the 3′ end are genes related to the addition of the side-branch Gal in Salmonella O43 and E. coli O86, and the corresponding glycosyltransferase wcmB gene is found only in those strains, where it is located between the wzx and wzy genes. The wdbR gene in the same location in E. coli O127 only is proposed to be an acetyltransferase gene based on sequence homology and may be responsible for addition of one of the O-acetyl groups to the Fuc residue in E. coli O127. The genes for addition of the main-chain Gal residue and addition of the Fuc residue to it are very different in the 2 structural forms. The main-chain Gal residue carries the Gal side branch in Salmonella O43 and E. 
coli O86, so this may account for the difference between wfbI and wcmD, because if the side-branch Gal is added first, it would affect the target sugar for the Fuc transferases. However, the explanation for the difference between the wfbI and wcmD genes is not so simple, as they are responsible for the same linkage, although the first sugars of the molecule at this stage are different. All these genes, including wzx and wzy, are highly divergent, and only for wcmB does it seem likely that the various forms have diverged from the gene cluster in the common ancestor of the two species. Perhaps there have been gene replacements since species divergence, or perhaps the situation in the common ancestor was more complex than just having the two forms seen today and included the sequence diversity now observed.

Salmonella O30 and E. coli O157

Salmonella O30 and E. coli O157 have the same O-antigen structure that contains one residue each of D-Rha4NAc (N-acetyl-D-perosamine, D-PerNAc), D-Glc, L-Fuc, and D-GalNAc. The O-antigen gene cluster of Salmonella O30 is nearly identical to that of E. coli O157, the only difference being that Salmonella O30 lacks the acetyltransferase gene perB, which is located at the 3′ end of the E. coli O157 antigen gene cluster and is involved in the synthesis of D-Rha4NAc (Albermann & Beuttler, 2008) (Fig. 3g). An H-repeat remnant is located upstream of the perB gene in E. coli O157. It is likely that the acquisition of the E. coli O157 perB gene was mediated by the H-repeat element and occurred more recently. An acetyltransferase gene that converts GDP-D-Rha4N to GDP-D-Rha4NAc may be located elsewhere in the Salmonella O30 genome.

Two special Salmonella GlcNAc-/GalNAc-initiated O-antigen forms (O54 and O67)

The O antigen of Salmonella O54 is different from all other reported bacterial O antigens in being synthesized by the synthase pathway.
Salmonella O54 has a disaccharide O unit: →4)-D-ManpNAc-(β1→3)-D-ManpNAc-(β1→ and is thus a homopolymer. The gene cluster responsible for the synthesis of the Salmonella O54 O antigen resides on a small mobilizable plasmid (Keenleyside & Whitfield, 1996), and mobilization of this plasmid into strains with a functional chromosomal O-antigen gene cluster can lead to the simultaneous expression of two distinct O antigens. However, in the Salmonella O54 type strain, this is not the case due to inactivation of the chromosomal O-antigen locus, but the strains for O54 serovars include some with group B, C1, C2, E, or 21(L) epitopes (Fitzgerald et al., 2007), suggesting that the plasmid is quite mobile in nature. The Salmonella O54 antigen gene cluster contains mnaA, wbbE, and wbbF. MnaA is a C2 epimerase that converts UDP-GlcNAc to UDP-ManNAc (Campbell et al., 2000). WbbE transfers the first ManNAc from UDP-ManNAc to UndPP–GlcNAc, which is synthesized by WecA, to complete an adaptor. WbbF, an integral membrane protein, is responsible for both sequential addition of ManNAc and the concurrent extrusion of the nascent polymer across the cytoplasmic membrane (Keenleyside & Whitfield, 1996).

Salmonella O67 has previously been suggested to be a variant of serogroup B (O4) (Li & Reeves, 2000). Indeed, a molecular typing study based on O-antigen gene cluster probes found that the serogroup O4 antigen-specific gene does not distinguish strains of serogroups O4 and O67 (Fitzgerald et al., 2007). In this study, we sequenced the region between galF and gnd in Salmonella O67 and found it to be the same as for the serogroup O4 antigen gene cluster (with identity ranging from 99% to 100% for corresponding genes). However, the structural analysis revealed that the O67 antigen structure is similar to that of D-galactan I O antigen of K.
pneumoniae (Table 1), the only difference being the presence of an O-acetyl group in Salmonella O67, which is consistent with the fact that there is no cross-reaction between serogroup O4 and O67 antigens and their respective antisera. The data show that the O67 gene cluster is not located between galF and gnd, but elsewhere in the genome. The gene cluster responsible for the synthesis of D-galactan I in K. pneumoniae O1 has been identified downstream of gnd and consists of six genes, comprising wzm, wzt, wbbM, glf, wbbN, and wbbO (Clarke & Whitfield, 1992). wzm and wzt encode components of an ABC transporter for export of the O polysaccharide, and glf encodes a UDP-galactopyranose mutase, for conversion of UDP-Galp to UDP-Galf. In the ABC transporter pathway, O-antigen synthesis begins with the formation in the cytoplasm of a chain of O units on an acceptor UndPP–GlcNAc, which is synthesized by WecA. Galactan I synthesis has been studied by the Whitfield group and summarized in a recent review (Greenfield & Whitfield, 2012). WbbO was shown to be a bifunctional glycosyltransferase adding the first two-sugar repeat unit (Galp and Galf) to the UndPP–GlcNAc acceptor, forming the adaptor region of the O polysaccharide. Further extension of galactan I requires WbbM, which encodes a Galp transferase (Guan et al., 2001). WbbN is thought to be the Galf transferase for galactan I extension, although WbbO can replace WbbN as the Galf transferase in vitro. However, no genes with the potential for the synthesis of D-galactan I can be found downstream of gnd in Salmonella O67. To identify the O-antigen gene cluster, we obtained a draft genome of the Salmonella O67 type strain using Solexa sequencing. A contig was found containing eight genes related to the synthesis of the O67 antigen. orf1-orf6 of that contig are identified as wzm, wzt, wbbM, glf, wbbN, and wbbO by homology with the genes of K. 
pneumoniae O1 (92%, 95%, 76%, 85%, 64%, and 79% identity, respectively) and account for the synthesis of D-galactan I. orf7 and orf8 were named wejU and wejV, respectively. wejU appears to be a glycosyltransferase gene, but its exact function is unclear. WejV shares similarity with many acyltransferases, and we propose that it is responsible for transfer of the O-acetyl group to the Salmonella O67 antigen.

We found that K. oxytoca 10–5246 has the same gene cluster (downstream of gnd) as that of Salmonella O67 including wejU and wejV. Some of the Klebsiella strains also have galactan O antigens, and we suggest that the O67 gene cluster was derived from a Klebsiella strain with this gene cluster. Currently, the genomic locus of the Salmonella O67 antigen gene cluster is unclear. It should be noted that the wzm-wejU set of genes is also present in the genome of E. coli SMS-3-5, with many insertion elements present upstream and downstream of that region.

It is likely that the Salmonella O67 strain under study arose from a serogroup B strain by gaining a new gene cluster for D-galactan I that originally came from Klebsiella and has been incorporated into the chromosome at an unidentified locus, and that this was followed by repression of the function of its original O-antigen gene cluster by means also not identified. It remains to be seen whether all O67 isolates have similar genetics, but serogroup O67 has only one serovar, named ‘Cresswell’, and isolates are extremely rare.

Structure and genetics of Salmonella Gal-initiated O antigens with close relatedness

There is a set of O antigens in Salmonella that have Gal as the first sugar of the O unit, comprising serogroups O2 (A), O4 (B), O8 (C2-C3), O9 (D1), O9,46 (D2), O9,46,27 (D3), O3,10 (E1-E3), and O1,3,19 (E4). These O antigens also have many other similarities (Table 3).
Except for serogroup C2-C3, they possess a main chain having a D-Manp-(1→4)-L-Rhap-(α1→3)-D-Galp trisaccharide repeat unit and may differ in (1) the configuration (α vs. β) and the position of the polymerization linkage (α1→2 vs. α1→6); and (2) the configuration (α vs. β) of the D-Manp-(1→4)-L-Rhap linkage. In serogroup C2-C3, the main chain is built up of L-Rhap-(β1→2)-D-Manp-(α1→2)-D-Manp-(α1→3)-D-Galp tetrasaccharide repeats. The major differences between serogroups are defined by the presence or absence and the identity (Abe, Tyv, or Par) of the side-branch 3,6-dideoxyhexose residue, and additional structural diversity is achieved by lateral glucosylation and/or O-acetylation, which in most cases are nonstoichiometric.

The other defining feature of Gal-initiated Salmonella O antigens is that they have the wbaP gene in the gene cluster for the initial transferase that transfers Gal-P from UDP–Gal to UndP to generate UndPP–Gal. It is near universal in the Enterobacteriaceae for the first sugar to be GlcNAc or GalNAc with WecA as the initial transferase. The Gal-initiated O antigens are a major exception, and it seems that the use of Gal as the initial sugar arose in Salmonella since its divergence from E. coli. However, although there are only 8 Gal-initiated O antigens and

Table 3.
Structures of Salmonella Gal-initiated O antigens [adopted from the recent review (Knirel, 2011)] Salmonella (S) serogroup, serovar SO2 (A) Paratyphi SO4 (B) Typhimurium, Agona,† Abortus equi* SO4 (B) Bredeney, Typhimurium SL3622† O-antigen structure* Parp D-Glcp α1 Ac α1 ↓ | ↓ 3 2 4 →2)-D-Manp-(α1→4)-L-Rhap-(α1→3)-D-Galp-(α1→ Abep-(2------Ac D-Glcp α1 α1 ↓ ↓ 3 4 →2)-D-Manp-(α1→4)-L-Rhap-(α1→3)-D-Galp-(α1→ D-Glcp Abep-(2------Ac α1 α1 ↓ ↓ 3 6 →2)-D-Manp-(α1→4)-L-Rhap-(α1→3)-D-Galp-(α1→ SO8 (C2) Newport Abep D-Glcp-(2------Ac α1 α1 ↓ ↓ 3 3 →4)-L-Rhap-(β1→2)-D-Manp-(α1→2)-D-Manp-(α1→3)-D-Galp-(β1→ 2 | Ac SO8 (C3) Kentucky I.S. 98 D-Glcp-(2------Ac Abep α1 α1 ↓ ↓ 3 4 →4)-L-Rhap-(β1→2)-D-Manp-(α1→2)-D-Manp-(α1→3)-D-Galp-(β1→ SO8 (C3) Kentucky 98/39 Abep D-Glcp α1 α1 ↓ ↓ 3 2 →4)-L-Rhap-(β1→2)-D-Manp-(α1→2)-D-Manp-(α1→3)-D-Galp-(β1→ SO9 (D1) Typhi, Enteritidis SE6,† Gallinarum bv. Pullorum 77† SO9 (D1) Enteritidis I.S. 64, Gallinarum bv. Pullorum 11 SO9,46 (D2) Strasbourg FEMS Microbiol Rev 38 (2014) 56–89 D-Glcp-(2------Ac Tyvp α1 α1 ↓ ↓ 3 4 →2)-D-Manp-(α1→4)-L-Rhap-(α1→3)-D-Galp-(α1→ Tyvp α1 ↓ 3 →2)-D-Manp-(α1→4)-L-Rhap-(α1→3)-D-Galp-(α1→ Tyvp D-Glcp α1 α1 ↓ ↓ 3 4 →6)-D-Manp-(β1→4)-L-Rhap-(α1→3)-D-Galp-(α1→ ª 2013 Federation of European Microbiological Societies. Published by John Wiley & Sons Ltd. All rights reserved 78 B. Liu et al. Table 3. Continued Salmonella (S) serogroup, serovar SO9,46 (D2) II O-antigen structure* Tyvp α1 ↓ 3 →6)-D-Manp-(β1→4)-L-Rhap-(α1→3)-D-Galp-(α1→ SO9,46,27 (D3) II Tyvp D-Glcp α1 α1 ↓ ↓ 3 6 →6)-D-Manp-(α/β1→4)-L-Rhap-(α1→3)-D-Galp-(α1→ SO3,10 (E1) Anatum Ac | 6 →6)-D-Manp-(β1→4)-L-Rhap-(α1→3)-D-Galp-(α1→ D-Glcp SO3,10 (E1) Muenster SO3,10 (E2) Anatum var. 15+ α1 ↓ 4 →6)-D-Manp-(β1→4)-L-Rhap-(α1→3)-D-Galp-(α1→ ?6)-D-Manp-(b1?4)-L-Rhap-(a1?3)-D-Galp-(b1? D-Glcp SO3,10 (E3) Lexington var. 
15+,34+ α1 ↓ 4 →6)-D-Manp-(β1→4)-L-Rhap-(α1→3)-D-Galp-(β1→ D-Glcp SO1,3,19 (E4) Senftenberg α1 ↓ 6 →6)-D-Manp-(β1→4)-L-Rhap-(α1→3)-D-Galp-(α1→
*Abe, abequose (3,6-dideoxy-D-xylo-hexose); Par, paratose (3,6-dideoxy-D-ribo-hexose); Tyv, tyvelose (3,6-dideoxy-D-arabino-hexose). Nonstoichiometric substituents are italicized.
†The O antigen lacks O-acetylation.

they are found almost exclusively in subspecies I and II, they have nonetheless been very successful and dominate the isolation lists. The only exception to the presence in subspecies I and II for the serovar type strains is the C2-C3 O antigen strain, which is in subspecies IIIb.

There is a modular structure for this set of O-antigen gene clusters, with several genes, especially sugar synthesis genes, being shared by different gene clusters in conserved locations as shown in Fig. 5. The rml genes are at the 5′ end of each gene cluster as for many GlcNAc-/GalNAc-initiated O antigens, followed by four ddh genes (ddhD, ddhA, ddhB, and ddhC), which are responsible for the synthesis of CDP-4-keto-3,6-dideoxy-D-glucose (Samuel & Reeves, 2003), the precursor of CDP-Abe, CDP-Par, and CDP-Tyv (Fig. 4). The abe gene or prt plus tyv genes for completing the synthesis of CDP-Abe and CDP-Tyv, respectively (Fig. 4), are located just downstream of the ddh genes. However, the serogroup E O antigen does not contain a dideoxyhexose residue, and the gene cluster does not have the relevant genes. The manBC and wbaP genes are located at the 3′ end of all of these Gal-initiated O-antigen gene clusters. The major differences between the gene clusters are found in their central regions, which contain the diverse glycosyltransferase genes and the O-unit processing genes (wzx and wzy). In addition, the gene orf17.4 with unclear function was found downstream of wbaP in some group E strains.

[Fig. 5: gene cluster maps for groups A, B, D1, D2, D3, E, and C2; scale bar 1 kb]

Fig. 5. The O-antigen gene clusters of Salmonella Gal-initiated O antigens. For color coding key, see Fig. 1. Group B was the first O-antigen gene cluster to be described (Nikaido et al., 1967; Jiang et al., 1991). The O-antigen gene cluster of group A was extracted from the published genome CP00026. Other gene clusters were studied by restriction enzyme mapping to locate regions shared with the O-antigen gene cluster of group B, and only the unique region was sequenced (Verma & Reeves, 1989; Liu et al., 1991; Brown et al., 1992; Wang et al., 1992; Xiang et al., 1994; Curd et al., 1998). *The tyv gene in the O-antigen gene cluster of group A is nonfunctional due to a frameshift mutation.

Group B was the first O-antigen gene cluster to be described (Nikaido et al., 1967; Jiang et al., 1991) as it is present in the strain LT2 that was used in many early studies in bacterial genetics. It has Abe as its dideoxyhexose and the four ddh genes plus the abe gene. The difference between the O-antigen structures of the D1 and B O antigens is the presence of a Tyv side-branch sugar in D1 in place of Abe.
The D1 gene cluster has prt and tyv genes in place of abe, which accounts for the structural difference. The only difference between the O-antigen structures of serogroups A and D1 is the presence of a Par or Tyv side branch, respectively, and the gene clusters are near identical, with prt and tyv genes both present. However, the tyv gene is not functional in group A, and in serovar Paratyphi A at least, this has been shown to be due to a frameshift mutation near the start of the gene, which would prevent conversion of CDP-Par to CDP-Tyv (Verma et al., 1988). As mentioned above, the wzy genes of groups A, B, and D1 are not located within the O-antigen gene cluster, but at a locus named rfc. However, there are wzy remnants in their gene clusters.

The group D2 structure differs from the group D1 structure in the polymerization linkage and the configuration (α vs. β) of the D-Manp-(1→4)-L-Rhap linkage. It has been suggested that the O-antigen gene cluster of serogroup D2 has arisen by reassortment of the serogroup D1 and E gene clusters by recombination mediated by an H-repeat element (Xiang et al., 1994).

For serogroup D3, there are two forms of the O unit that differ only in the configuration of the linkage between Man and Rha (α1→4 and β1→4). wbaU is responsible for the formation of the D-Man-(α1→4)-L-Rha linkage (Curd et al., 1998), but there is no glycosyltransferase gene in the O-antigen gene cluster for the D-Man-(β1→4)-L-Rha linkage, which may be located elsewhere in the genome. The O-antigen gene cluster of serogroup D1 also is thought to have arisen from that of D3 by the loss of the original wzy gene (Curd et al., 1998).

Group E was initially subdivided into groups E1, E2, E3, and E4, based on serology. Serogroups E1, E2, and E3 have been amalgamated as serogroup E1 (the 2007 Weill summary) on the basis that they have almost the same O-antigen gene clusters between galF and gnd.
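The relationship between the dideoxyhexose genes and the side-branch sugar described for groups A, B, D1, and E can be summarized as a small lookup. The sketch below is illustrative only: `predict_dideoxyhexose` and the gene sets are hypothetical simplifications of the text, in which a gene is listed only if it is both present and functional. It is not an annotation tool.

```python
from typing import Optional, Set

# The four ddh genes that make the common CDP-linked precursor.
DDH_GENES = {"ddhD", "ddhA", "ddhB", "ddhC"}


def predict_dideoxyhexose(functional_genes: Set[str]) -> Optional[str]:
    """Return 'Abe', 'Tyv', 'Par', or None for a set of functional genes.

    Simplified rules taken from the text:
      - without the four ddh genes there is no dideoxyhexose (group E);
      - abe completes CDP-Abe synthesis (group B);
      - prt gives CDP-Par, which a functional tyv converts to CDP-Tyv
        (group D1); with tyv inactivated the sugar stays Par (group A).
    """
    if not DDH_GENES <= functional_genes:
        return None
    if "abe" in functional_genes:
        return "Abe"
    if "prt" in functional_genes:
        return "Tyv" if "tyv" in functional_genes else "Par"
    return None


# Hypothetical gene sets; only the genes relevant to the rule are listed.
group_b = DDH_GENES | {"abe"}
group_d1 = DDH_GENES | {"prt", "tyv"}
group_a = DDH_GENES | {"prt"}  # tyv omitted: frameshift, nonfunctional
group_e = set()                # no ddh, abe, prt, or tyv genes

for name, genes in [("B", group_b), ("D1", group_d1),
                    ("A", group_a), ("E", group_e)]:
    print(name, predict_dideoxyhexose(genes))
```

Treating the frameshifted tyv of group A as absent from the functional set is a modeling choice made here for brevity; the gene itself is still present in the cluster.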
Recently, we found that serogroup E4 also has the same O-antigen gene cluster as E1 and so is not really a separate serogroup. The variation in the Glc side chain among serogroup E O antigens is presumably due to the presence of different bacteriophages with side-chain modification genes (Table 3).

Serogroup C2-C3 has a different order for the sugars and one more Man residue in the O unit. All of the genes in the central region of the gene cluster are unique to serogroup C2-C3. The C3 form was originally put in a separate serogroup C3, but differs only in having a Glc side branch that is due to a bacteriophage-encoded set of genes, as is common in Salmonella.

Conclusions

This review covers the chemical structure and DNA sequence data for all Salmonella O antigens, including more recent work on the GlcNAc-/GalNAc-initiated Salmonella O antigens that are more directly comparable with those of E. coli and Shigella. Together with the previously published survey of Shigella O antigens, it provides insights into the evolution of O-antigen diversity in bacteria. It also documents the relationships between the O antigens of Salmonella and E. coli, which were underestimated before.

In our previous review (Liu et al., 2008), we had observed that Shigella has a higher than usual proportion of anomalies in its O-antigen gene clusters (17 anomalies in 33 O-antigen gene clusters), many of which are thought to be indicators of events that mediated the formation of new O-antigen forms, such as remnants of genes no longer required or elements that mediated gain of new genes. However, only 12 anomalies are found in the 37 gene clusters of the Salmonella GlcNAc-/GalNAc-initiated O antigens with the Wzx/Wzy pathway (excluding Salmonella O54 and O67), much lower than that in Shigella.
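As a quick check on this comparison, the two counts quoted above can be expressed as anomalies per gene cluster. This is only a ratio of the reported counts and assumes nothing about how the anomalies are distributed among individual gene clusters.

```latex
\[
\frac{17}{33} \approx 0.52 \quad \text{(Shigella)}, \qquad
\frac{12}{37} \approx 0.32 \quad \text{(Salmonella, Wzx/Wzy pathway)}
\]
```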
The smaller number of anomalies indicates that for this major group of Salmonella O antigens, the structure of the gene clusters has generally been stable, suggesting that the set of O antigens is well adapted to the Salmonella niche. In the previous review (Liu et al., 2008), we also found that 21 of 34 Shigella O antigens are either identical or closely related to an O antigen in E. coli, which is easily explained, as all Shigella serotypes except for S. boydii type 13 are in fact part of the species E. coli. Homologous recombination occurs readily and was shown to be an essential mechanism in the diversification of Shigella O antigens.

Previous structural analysis of Salmonella and E. coli O antigens had revealed only a few shared structures, although early serological data had shown extensive cross-reactions (Orskov et al., 1977). However, in our recent studies, we found many more cases, giving a total of 24 O antigens that are identical or closely related in Salmonella and E. coli. The most likely explanation for the observed similarities in the two species is that each pair of gene clusters originated from a gene cluster that was present in the most recent common ancestor. In that case, the two gene clusters would have a similar organization. E. coli and Salmonella diverged about 140 million years ago, and 93% of E. coli and Salmonella housekeeping genes have levels of identity between 76.3% and 100% (Sharp, 1991). For 23 of the 24 O-antigen gene clusters that encode identical or closely related O antigens in E. coli and Salmonella, the average identity for corresponding genes is 73.5%, and the average identity for corresponding proteins is 73.7%. This is close to the lower end of the range for housekeeping genes, but the pattern is generally similar for all of the gene clusters.
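The identity figures quoted here are averages over pairwise gene comparisons. As a minimal sketch of how such an average can be computed, assuming the two sequences in each pair have already been aligned (gaps written as `-`), the hypothetical helpers `percent_identity` and `mean_identity` below are illustrative only and are not the procedure used in this review; in particular, excluding gap columns from the count is an assumption.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gap columns are excluded from the comparison
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def mean_identity(pairs):
    """Average percent identity over a list of (aligned_a, aligned_b) pairs."""
    values = [percent_identity(a, b) for a, b in pairs]
    return sum(values) / len(values)


# Toy alignments for illustration, not real O-antigen gene sequences.
toy_pairs = [("ATGGCT-AA", "ATGACTTAA"), ("ATGCCC", "ATGCCA")]
print(round(mean_identity(toy_pairs), 1))
```

Grouping the pairs by gene class before calling `mean_identity` would give per-class averages of the kind compared in the text.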
However, genes in these O-antigen gene clusters do diverge at a higher rate than housekeeping genes, suggesting that the O-antigen genes are under consistent selection pressure from the environment or hosts for better adaptation. This is not unexpected, as they have an atypical GC content, suggesting an origin in another species, and they may still be adjusting to the new intracellular environment. In these pairs of O-antigen gene clusters in Salmonella and E. coli, the identity levels for sugar synthesis genes, glycosyltransferase genes, and O-unit processing genes are different. The average level of identity is 77.1%, 70.3%, and 69.4%, respectively, for the three classes of genes and 81.4%, 67.1%, and 65.6%, respectively, for the proteins encoded by these genes. The divergence levels for glycosyltransferase genes and O-unit processing genes are consistently higher than those for sugar synthesis genes, a pattern observed in almost all pairs of gene clusters. It appears that each pair has indeed diverged from a gene cluster that was in the common ancestor, but that the three classes of genes were subject to different selection pressures, although there is no experimental evidence for that.

Alternative explanations for the shared gene clusters are the following: (1) The gene clusters were recently transferred from one species to the other after species divergence. In this case, the two gene clusters should have a higher level of sequence identity, not related to the level for housekeeping genes. (2) They have a common origin but were acquired independently. In this case, we expect similar gene order, as is observed, but can make no predictions on the level of divergence, as it will depend on the time since the divergence of the donor species, which could be earlier or later than the divergence of the E. coli and Salmonella species. (3) The two gene clusters were assembled independently either after species divergence or before being acquired by the E.
coli and Salmonella lineages. In this case, the gene organization of the individual gene clusters should be different, so this explanation is not supported at all for the 23 pairs being discussed. None of these alternative explanations fits the data as a sole explanation, but options (1) and (2) are also possible, although if they were at all common, one would expect some pairs with a much higher or lower level of divergence. This conclusion is in agreement with one proposed earlier with data for just three structures (Samuel et al., 2004), but now has much stronger support. It is of course possible that some of the 23 gene clusters did arrive independently but happen to be close to the others in divergence, but if so, we suggest that this was a minority of them.

The exception is the case of Salmonella O52 and E. coli O153, in which gene clusters that are not related generate the same O-antigen structure. Each gene cluster has the expected number of glycosyltransferase genes and a wzx and wzy gene, but the order is different, and none of the genes has a significant level of identity. This is presumably a case of two gene clusters for a given structure that were assembled independently.

Some of the gene clusters shared by Salmonella and E. coli did evolve to generate new O-antigen forms by acquisition of new genes after species divergence, as described above. The Salmonella O66 gene cluster is thought to have obtained a new wzy gene outside the O-antigen gene cluster that is responsible for the β-1→2 linkage between the O units. The original wzy gene for the β-1→3 linkage in the O-antigen gene cluster must have degraded over time, as proposed for Salmonella serogroup B (Wang et al., 2002a), as it was no longer required in O-antigen synthesis. For the Salmonella O55–E.
coli O103 pair, one of the two gene clusters must have acquired a new gene (an acetyltransferase gene or a 3-hydroxybutanoyltransferase gene) after species divergence to synthesize a different sugar derivative. For the related Salmonella O43–E. coli O86 and Salmonella O13–E. coli O127 pairs, there were significant evolutionary changes, but it is not yet possible to unravel what happened. The Salmonella O6,14 and the E. coli O77 groups are interesting as, like the group of related S. flexneri serogroups, diversity arises by acquisition (or loss) of different prophage genes for side-chain modification: only one form has been observed in Salmonella, but five in E. coli.

It should be noted that within some serogroups, there are also variant strains with O-antigen structures and gene clusters different from those of the type strains. For example, the O-antigen structure of one Salmonella O50 strain was reported to differ from that of the type strain in having a GlcNAc in place of a GalNAc residue (Senchenkova et al., 1997). Also, Fitzgerald et al. (2007) found that the O-antigen-based molecular typing method they devised for Salmonella O13 cannot detect O13 strains belonging to subspecies IIIb or S. bongori. The genetic basis for this difference remains to be determined. In addition, there is more than one O-antigen structure for some other Salmonella O serogroups, usually obtained to determine the basis of serological variation, and most variations are in the side branches, as in serogroups O6,7, O6,14, and O30 (Table 1). These variations are probably due to the presence in the chromosome of different bacteriophage genomes that include O-antigen side-branch modification genes.
Salmonella is a genus with a long evolutionary history, and the mechanism for the generation and maintenance of O-antigen diversity in Salmonella is obviously different from that in Shigella (Liu et al., 2008), which is essentially a relatively small group of strains in another species (E. coli) that are distinguished by a capacity for host cell invasion that may have only recently been adopted in the species (Maurelli et al., 1998; Pallen & Wren, 2007). One of the major characteristics of Salmonella is that the O antigens can be divided into two different classes (Gal-initiated class and GlcNAc-/GalNAc-initiated class). The GlcNAc-/GalNAc-initiated Salmonella O antigens that we have just been discussing are similar to those in other members of Enterobacteriaceae in using the WecA initial sugar transferase encoded in the ECA gene cluster that is widely distributed in the family. Over half of the GlcNAc-/GalNAc-initiated O antigens are also found in the closest relative, E. coli, and all but one of these are thought to have been present in their common ancestor. The Gal-initiated O antigens have a quite different evolutionary history and are thought to have entered the species quite recently but, although only eight in number, are now dominant. We do not know the reasons for this enormous difference between E. coli and Salmonella, with Gal-initiated O antigens dominant in Salmonella but, to our knowledge, not reported in E. coli.

Most Salmonella strains that cause serious infection in humans and animals have Gal-initiated O antigens. However, it is worth noting that the E. coli members of several Salmonella–E. coli serogroup pairs with identical or related O antigens, including E. coli O157, O55, O111, O145, O103, O118, and O78, are associated with important pathogenic E. coli strains. The long history of these O antigens in E.
coli and Salmonella indicates that they are possibly adaptive in both species, but most Salmonella members are not recognized to be particularly pathogenic.

O-antigen diversity has been thought to be important in offering the various clones selective advantages in their specific niches. It has been estimated that a selective advantage of only 0.1% for one O antigen over another in a given niche is sufficient to maintain different alleles in different clones (Reeves, 1992), although it is difficult to demonstrate this in a laboratory assay. The O antigen is a target of the host innate immune system. It is recognized by the Toll-like receptor 4 (Royle et al., 2003), and it has been suggested that the pressures from the immune system may contribute to O-antigen diversity. Novel O antigens, especially those containing rare sugars, would not be recognized by the immune system. An example of the effect of a change in O antigen is given by Vibrio cholerae O139, a variant of the 7th pandemic O1 clone with a new O unit containing two Col residues and a QuiNAc residue (Knirel, 2011). It was first identified in 1992 in southern India and quickly spread in India and Bengal and some other Asian countries, totally displacing the O1 serogroup (Ramamurthy et al., 2003). It had the capacity to infect persons previously immune to the ancestral V. cholerae O1 form of the pandemic strain (Blokesch & Schoolnik, 2007), and this was thought to be the cause of its success (Ramamurthy et al., 2003). After a few years, the O139 form virtually disappeared, but there has been periodic switching between V. cholerae O1 and O139 strains as agents of cholera in some areas (Faruque et al., 2003; Chatterjee et al., 2007).
The strains also diversified in other factors that affect the balance of the two antigenic forms; however, the original rise of the O139 form showed how powerful the selective pressure of O-antigen variation can be.

It should be noted that a relationship between O-antigen form and host has been observed in several bacteria, including Salmonella, for which a host is commonly most easily infected by strains bearing a specific O antigen (Makela et al., 1973; Rabsch et al., 2002; Butela & Lawrence, 2010). In addition, most bacteria cannot evade an immune response by switching their O antigens within the timescale of an infection, as is possible with H-antigen phase variation. These data raise the possibility that the different O antigens expressed by different strains may confer advantages in different ecological niches, such as different host intestinal environments, which may be a major selection pressure for the generation and maintenance of O-antigen diversity (Butela & Lawrence, 2010). It has, for instance, been shown that diversifying selection mediated by predation from intestinal amoebae can contribute to O-antigen variation in Salmonella (Wildschutte et al., 2004; Wildschutte & Lawrence, 2007). Intestinal amoebae recognize antigenically diverse Salmonella strains with different efficiency, giving the various serotypes different abilities to escape predators in particular environments. O-antigen variation is also helpful for bacteria in avoiding bacteriophage predation (Blokesch & Schoolnik, 2007). In addition, O-antigen diversity may provide selective advantage in other respects; for example, it may mediate more effective adhesion to different intestinal mucins.

Serotyping has been very important for our understanding of diversity in Salmonella and is used to define the serovars that are referred to in most discussions of the genus. However, in recent years, several aspects of traditional serotyping methods have limited the utility of serotyping, especially in large-scale epidemiology studies. The techniques can be laborious and time-consuming, and the full range of sera needed is kept only in major typing centers. In addition, based on the O-antigen structure data obtained in this study, considerable serological cross-reaction is expected between E. coli and Salmonella. There has often been discussion of developing a molecular typing system for Salmonella based on the current serology scheme, and the relevant concepts have also been discussed and applied in other bacteria (Raymond et al., 2002; Li et al., 2009). The completion of the sequencing of the Salmonella O-antigen gene clusters provides the data for a comprehensive typing scheme for Salmonella using sequence diversity, but based on the serotyping scheme, to give in effect molecular serotyping. To facilitate this, we have included data on the specific genes that could be useful, together with a comprehensive set of primers that we have developed for a microarray targeting the O-antigen-specific genes that can differentiate most Salmonella serogroups (Table S5) (Guo et al., 2013). The only exceptions are groups A and D1, which need to be further distinguished from each other using conventional serotyping methods because they have near-identical O-antigen gene clusters. The mutations in the tyv gene of group A strains are not enough to easily distinguish groups A and D1, but the specific frameshift in serovar Paratyphi A could probably be developed into a specific test for this serovar. For most serogroups, the O-unit processing genes (wzx and wzy) were selected as target genes, the exceptions being serogroups A/D1, O54, and O67, for which the sugar synthesis gene prt, the glycosyltransferase gene wbbE, and acetyltransferase gene wejV, respectively, were selected.
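The target-gene choices just described amount to a lookup from detected marker genes to a serogroup call. The sketch below is illustrative only: the gene names prt, wbbE, and wejV come from the text, whereas the labels `wzy_D2` and `wzy_D3` and the function `call_serogroup` are hypothetical, and the actual primer set is the one given in Table S5. A prt signal is refined by a D2- or D3-specific wzy signal where one is present, reflecting the use of prt for groups A, D1, D2, and D3 discussed in the text that follows; otherwise the call is left as A or D1, which the gene cluster cannot separate.

```python
# Marker genes named in the text: prt (A/D1), wbbE (O54), wejV (O67).
# The labels "wzy_D2" and "wzy_D3" are hypothetical stand-ins for the
# serogroup-specific wzy targets; the real primer set is in Table S5.
SINGLE_GENE_CALLS = {"wbbE": "O54", "wejV": "O67"}


def call_serogroup(detected):
    """Return a serogroup call from a collection of detected marker genes."""
    detected = set(detected)
    calls = {SINGLE_GENE_CALLS[g] for g in detected if g in SINGLE_GENE_CALLS}
    if "prt" in detected:
        if "wzy_D2" in detected:
            calls.add("D2")
        elif "wzy_D3" in detected:
            calls.add("D3")
        else:
            # A and D1 share near-identical gene clusters, so serology must decide.
            calls.add("A or D1 (resolve by conventional serotyping)")
    if not calls:
        return "no call"
    if len(calls) > 1:
        return "conflicting signals: " + ", ".join(sorted(calls))
    return calls.pop()


print(call_serogroup({"prt"}))
print(call_serogroup({"prt", "wzy_D3"}))
```

Returning an explicit unresolved or conflicting result, rather than guessing, mirrors the need to fall back on conventional serotyping for groups A and D1.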
For most serogroups, primer pairs based on a serogroup's own specific genes are sufficient to generate the specific PCR products. However, due to the close relationship among the Gal-initiated Salmonella O-antigen gene clusters, combinations of primer pairs targeting more than one gene were necessary for detecting some of these serogroups (Table S5). For instance, as the prt genes of groups A, D1, D2, and D3 are highly similar, but not found in other O-antigen gene clusters, prt was used in the identification of all these groups, with D2 and D3, for example, further distinguished by their specific wzy genes. Our molecular typing system can also accurately differentiate Salmonella and E. coli strains with related O-antigen structures.

Acknowledgements

This work was supported by the National Key Programs for Infectious Diseases of China (2013ZX10004-216-001); the National 973 Program of China Grant (2012CB721001, 2009CB522603); the National Natural Science Foundation of China (NSFC) Key Program Grant 31030002; the NSFC General Program Grant (81171524, 31270003); and the Russian Foundation for Basic Research (projects 11-0491173_NNSF-a and 11-04-01020-a). The authors have no conflict of interest to declare.

References

Alam J, Beyer N & Liu HW (2004) Biosynthesis of colitose: expression, purification, and mechanistic characterization of GDP-4-keto-6-deoxy-D-mannose-3-dehydrase (ColD) and GDP-L-colitose synthase (ColC). Biochemistry 43: 16450–16460.
Albermann C & Beuttler H (2008) Identification of the GDP-N-acetyl-D-perosamine producing enzymes from Escherichia coli O157:H7. FEBS Lett 582: 479–484.
Ali T, Weintraub A & Widmalm G (2007) Structural determination of the O-antigenic polysaccharide from Escherichia coli O166. Carbohydr Res 342: 274–278.
Allard ST, Giraud MF, Whitfield C et al.
(2001) The crystal structure of dTDP-D-glucose 4,6-dehydratase (RmlB) from Salmonella enterica serovar Typhimurium, the second enzyme in the dTDP-L-rhamnose pathway. J Mol Biol 307: 283–295.
Allison GE & Verma NK (2000) Serotype-converting bacteriophages and O-antigen modification in Shigella flexneri. Trends Microbiol 8: 17–23.
Andersson M, Carlin N, Leontein K et al. (1989) Structural studies of the O-antigenic polysaccharide of Escherichia coli O86, which possesses blood-group B activity. Carbohydr Res 185: 211–223.
Annunziato PW, Wright LF, Vann WF et al. (1995) Nucleotide sequence and genetic analysis of the neuD and neuB genes in region 2 of the polysialic acid gene cluster of Escherichia coli K1. J Bacteriol 177: 312–319.
Aoyama KM, Haase AM & Reeves PR (1994) Evidence for effect of random genetic drift on G+C content after lateral transfer of fucose pathway genes to Escherichia coli K-12. Mol Biol Evol 11: 829–838.
Bartelt M, Shashkov AS, Kochanowski H et al. (1993) Structure of the O-specific polysaccharide of the O23 antigen (LPS) from Escherichia coli O23:K?:H16. Carbohydr Res 248: 233–240.
Bengoechea JA, Najdenski H & Skurnik M (2004) Lipopolysaccharide O antigen status of Yersinia enterocolitica O:8 is essential for virulence and absence of O antigen affects the expression of other Yersinia virulence factors. Mol Microbiol 52: 451–469.
Bentley SD, Aanensen DM, Mavroidi A et al. (2006) Genetic analysis of the capsular biosynthetic locus from all 90 pneumococcal serotypes. PLoS Genet 2: e31.
Beutin L, Wang Q, Naumann D et al. (2007) Relationship between O-antigen subtypes, bacterial surface structures and O-antigen gene clusters in Escherichia coli O123 strains carrying genes for Shiga toxins and intimin. J Med Microbiol 56: 177–184.
Blokesch M & Schoolnik GK (2007) Serogroup conversion of Vibrio cholerae in aquatic reservoirs. PLoS Pathog 3: e81.
Boyd EF, Nelson K, Wang F-S et al.
(1994) Molecular genetic basis of allelic polymorphism in malate dehydrogenase (mdh) in natural populations of Escherichia coli and Salmonella enterica. P Natl Acad Sci USA 91: 1280–1284.
Brisson JR & Perry MB (1988) The structures of the two lipopolysaccharide O-chains produced by Salmonella boecker. Biochem Cell Biol 66: 1066–1077.
Bronner D, Clarke BR & Whitfield C (1994) Identification of an ATP-binding cassette transport system required for translocation of lipopolysaccharide O-antigen side-chains across the cytoplasmic membrane of Klebsiella pneumoniae serotype O1. Mol Microbiol 14: 505–519.
Brown PK, Romana LK & Reeves PR (1992) Molecular analysis of the rfb gene cluster of Salmonella serovar Muenchen (strain M67): genetic basis of the polymorphism between groups C2 and B. Mol Microbiol 6: 1385–1394.
Bundle D, Gerken M & Perry M (1986) Two-dimensional nuclear magnetic resonance at 500 MHz: the structural elucidation of a Salmonella serogroup N polysaccharide antigen. Can J Chem 64: 255–264.
Butela K & Lawrence J (2010) Population genetics of Salmonella: selection for antigenic diversity. Bacterial Population Genetics in Infectious Disease Vol. (Ashley Robinson D, Falush D & Feil EJ, eds), A John Wiley & Sons, Inc., Hoboken, NJ.
Campbell RE, Mosimann SC, Tanner ME et al. (2000) The structure of UDP-N-acetylglucosamine 2-epimerase reveals homology to phosphoglycosyl transferases. Biochemistry 39: 14993–15001.
CDC (2009) Salmonella Surveillance: Annual Summary, 2009. US Department of Health and Human Services, Atlanta, GA.
Chatterjee S, Ghosh K, Raychoudhuri A et al. (2007) Phenotypic and genotypic traits and epidemiological implication of Vibrio cholerae O1 and O139 strains in India during 2003. J Med Microbiol 56: 824–832.
Clark CG, Kropinski AM, Parolis H et al. (2009) Escherichia coli O123 O-antigen genes and polysaccharide structure are conserved in some Salmonella enterica serogroups. J Med Microbiol 58: 884–894.
Clark CG, Grant CC, Trout-Yakel KM et al.
(2010) The O28 antigen gene clusters of Salmonella enterica subsp. enterica serovar Dakar and serovar Pomona are different. Int J Microbiol 2010: 209291.
Clarke BR & Whitfield C (1992) Molecular cloning of the rfb region of Klebsiella pneumoniae serotype O1:K20: the rfb gene cluster is responsible for synthesis of the D-galactan I O polysaccharide. J Bacteriol 174: 4614–4621.
Cunneen MM, De Castro C, Kenyon J et al. (2009) The O-specific polysaccharide structure and biosynthetic gene cluster of Yersinia pseudotuberculosis serotype O:11. Carbohydr Res 344: 1533–1540.
Cunneen MM, Liu B, Wang L & Reeves PR (2013) Biosynthesis of UDP-GlcNAc, UndPP-GlcNAc and UDP-GlcNAcA involves three easily distinguished 4-epimerase enzymes, Gne, Gnu and GnaB. PLoS One 8: e67646.
Curd H, Liu D & Reeves PR (1998) Relationships among the O-antigen gene clusters of Salmonella enterica groups B, D1, D2, and D3. J Bacteriol 180: 1002–1007.
Daniels C, Vindurampulle C & Morona R (1998) Overexpression and topology of the Shigella flexneri O-antigen polymerase (Rfc/Wzy). Mol Microbiol 28: 1211–1222.
De Castro C, Skurnik M, Molinaro A et al. (2009) Characterization of the specific O-polysaccharide structure and biosynthetic gene cluster of Yersinia pseudotuberculosis serotype O:15. Innate Immun 15: 351–359.
De Castro C, Kenyon JJ, Cunneen MM et al. (2010) Genetic characterisation and structural analysis of the O-specific polysaccharide of Yersinia pseudotuberculosis serotype O:1c. Innate Immun 17: 183–190.
Di Fabio JL, Brisson JR & Perry MB (1988a) Structure of the major lipopolysaccharide antigenic O-chain produced by Salmonella carrau (O:6, 14, 24). Carbohydr Res 179: 233–244.
Di Fabio JL, Perry MB & Brisson JR (1988b) Structure of the antigenic O-polysaccharide of the lipopolysaccharide produced by Salmonella eimsbuttel. Biochem Cell Biol 66: 107–115.
Di Fabio JL, Brisson JR & Perry MB (1989a) Structural analysis of the three lipopolysaccharides produced by Salmonella madelia (1,6,14,25). Biochem Cell Biol 67: 78–85.
Di Fabio JL, Brisson JR & Perry MB (1989b) Structure of the lipopolysaccharide antigenic O-chain produced by Salmonella livingstone (O:6,7). Biochem Cell Biol 67: 278–280.
Di Fabio JL, Brisson JR & Perry MB (1989c) Structure of the lipopolysaccharide antigenic O-chain produced by Salmonella ohio (O:6,7). Carbohydr Res 189: 161–168.
Doolittle RF, Feng DF, Tsang S et al. (1996) Determining divergence times of the major kingdoms of living organisms with a protein clock. Science 271: 470–477.
Duus J, Gotfredsen CH & Bock K (2000) Carbohydrate structural determination by NMR spectroscopy: modern methods and limitations. Chem Rev 100: 4589–4614.
Enright AJ, Van Dongen S & Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30: 1575–1584.
Erbing C, Kenne L, Lindberg B et al. (1978) Structure of the O-specific side-chains of the Escherichia coli O 75 lipopolysaccharide: a revision. Carbohydr Res 60: 259–265.
Faruque SM, Chowdhury N, Kamruzzaman M et al. (2003) Reemergence of epidemic Vibrio cholerae O139, Bangladesh. Emerg Infect Dis 9: 1116–1122.
Feng L, Han W, Wang Q et al. (2005a) Characterization of Escherichia coli O86 O-antigen gene cluster and identification of O86-specific genes. Vet Microbiol 106: 241–248.
Feng L, Senchenkova SN, Tao J et al. (2005b) Structural and genetic characterization of enterohemorrhagic Escherichia coli O145 O antigen and development of an O145 serogroup-specific PCR assay. J Bacteriol 187: 758–764.
Feng L, Reeves PR, Lan R et al. (2008) A recalibrated molecular clock and independent origins for the cholera pandemic clones. PLoS ONE 3: e4053.
Fitzgerald C, Sherwood R, Gheesling LL et al.
(2003) Molecular analysis of the rfb O-antigen gene cluster of Salmonella enterica serogroup O:6,14 and development of a serogroup-specific PCR assay. Appl Environ Microbiol 69: 6099–6105.
Fitzgerald C, Gheesling L, Collins M et al. (2006) Sequence analysis of the rfb loci, encoding proteins involved in the biosynthesis of the Salmonella enterica O17 and O18 antigens: serogroup-specific identification by PCR. Appl Environ Microbiol 72: 7949–7953.
Fitzgerald C, Collins M, van Duyne S et al. (2007) Multiplex, bead-based suspension array for molecular determination of common Salmonella serogroups. J Clin Microbiol 45: 3323–3334.
Follens A, Veiga-da-Cunha M, Merckx R et al. (1999) acs1 of Haemophilus influenzae type a capsulation locus region II encodes a bifunctional ribulose 5-phosphate reductase-CDP-ribitol pyrophosphorylase. J Bacteriol 181: 2001–2007.
Fratamico PM, DebRoy C, Strobaugh TP Jr et al. (2005) DNA sequence of the Escherichia coli O103 O-antigen gene cluster and detection of enterohemorrhagic E. coli O103 by PCR amplification of the wzx and wzy genes. Can J Microbiol 51: 515–522.
Gajdus J, Kaczynski Z, Smietana J et al. (2009) Structural determination of the O-antigenic polysaccharide from Salmonella Mara (O:39). Carbohydr Res 344: 1054–1057.
Gamian A, Jones C, Lipinski T et al. (2000) Structure of the sialic acid-containing O-specific polysaccharide from Salmonella enterica serovar Toucra O48 lipopolysaccharide. Eur J Biochem 267: 3160–3166.
Greenfield LK & Whitfield C (2012) Synthesis of lipopolysaccharide O-antigens by ABC transporter-dependent pathways. Carbohydr Res 356: 12–24.
Grimont PAD & Weill FX (2007) Antigenic Formulae of the Salmonella Serovars, 9th edn. WHO Collaborating Centre for Reference and Research on Salmonella. Institut Pasteur, Paris, France.
Guan S, Clarke AJ & Whitfield C (2001) Functional analysis of the galactosyltransferases required for biosynthesis of D-galactan I, a component of the lipopolysaccharide O1 antigen of Klebsiella pneumoniae. J Bacteriol 183: 3318–3327.
Guo D, Liu B, Liu F et al. (2013) Development of a DNA microarray for molecular identification of all 46 Salmonella O serogroups. Appl Environ Microbiol 79: 3392–3399.
Gupta DS, Shashkov AS, Jann B et al. (1992) Structures of the O1B and O1C lipopolysaccharide antigens of Escherichia coli. J Bacteriol 174: 7963–7970.
Hallis TM, Lei Y, Que NL et al. (1998) Mechanistic studies of the biosynthesis of paratose: purification and characterization of CDP-paratose synthase. Biochemistry 37: 4935–4945.
Hardnett FP, Hoekstra RM, Kennedy M et al. (2004) Epidemiologic issues in study design and data analysis related to FoodNet activities. Clin Infect Dis 38(suppl 3): S121–S126.
Ho SY, Lanfear R, Bromham L et al. (2011) Time-dependent rates of molecular evolution. Mol Ecol 20: 3087–3101.
Hu B, Perepelov AV, Liu B et al. (2010) Structural and genetic evidence for the close relationship between Escherichia coli O71 and Salmonella enterica O28 O-antigens. FEMS Immunol Med Microbiol 59: 161–169.
Iguchi A, Thomson NR, Ogura Y et al. (2009) Complete genome sequence and comparative genome analysis of enteropathogenic Escherichia coli O127:H6 strain E2348/69. J Bacteriol 191: 347–354.
Jansson PE, Lindberg B, Widmalm G et al. (1987) Structural studies of the Escherichia coli O78 O-antigen polysaccharide. Carbohydr Res 165: 87–92.
Jensen SO & Reeves PR (2001) Molecular evolution of the GDP-mannose pathway genes (manB and manC) in Salmonella enterica. Microbiology 147: 599–610.
Jiang XM, Neal B, Santiago F et al. (1991) Structure and sequence of the rfb (O antigen) gene cluster of Salmonella serovar typhimurium (strain LT2). Mol Microbiol 5: 695–713.
Johnson DA & Liu H (1998) Mechanisms and pathways from recent deoxysugar biosynthesis research. Curr Opin Chem Biol 2: 642–649.
Keenleyside WJ & Whitfield C (1996) A novel pathway for O-polysaccharide biosynthesis in Salmonella enterica serovar Borreze. J Biol Chem 271: 28581–28592.
Keenleyside WJ, Perry M, Maclean L et al. (1994) A plasmid-encoded rfbO:54 gene cluster is required for biosynthesis of the O:54 antigen in Salmonella enterica serovar Borreze. Mol Microbiol 11: 437–448.
Kenne L, Lindberg B, Soderholm E et al. (1983) Structural studies of the O-antigens from Salmonella greenside and Salmonella adelaide. Carbohydr Res 111: 289–296.
King JD, Mulrooney EF, Vinogradov E et al. (2008) lfnA from Pseudomonas aeruginosa O12 and wbuX from Escherichia coli O145 encode membrane-associated proteins and are required for expression of 2,6-dideoxy-2-acetamidino-L-galactose in lipopolysaccharide O antigen. J Bacteriol 190: 1671–1679.
Kneidinger B, Graninger M, Adam G et al. (2001) Identification of two GDP-6-deoxy-D-lyxo-4-hexulose reductases synthesizing GDP-D-rhamnose in Aneurinibacillus thermoaerophilus L420-91T. J Biol Chem 276: 5577–5583.
Knirel YA (2011) Structure of O-antigens. Bacterial Lipopolysaccharides: Structure, Chemical Synthesis, Biogenesis and Interaction with Host Cells. (Knirel YA & Valvano MA, eds), Springer Wien, New York, NY.
Knirel YA, Shashkov AS, Tsvetkov YE et al. (2003) 5,7-Diamino-3,5,7,9-tetradeoxynon-2-ulosonic acids in bacterial glycopolymers: chemistry and biochemistry. Adv Carbohydr Chem Biochem 58: 371–417.
Knirel YA, Shevelev SD & Perepelov AV (2012) Higher aldulosonic acids: components of bacterial glycans. Mendeleev Commun 21: 173–182.
Knirel YA, Lan R, Senchenkova SN et al. (2013) O-antigen structure of Shigella flexneri serotype Yv and effect of the lpt-O gene variation on phosphoethanolamine modification of S. flexneri O-antigens. Glycobiology 23: 475–485.
Kocharova NA, Vinogradov EV, Knirel’ IuA et al. (1988) The structure of O-specific polysaccharide chains of lipopolysaccharides from Citrobacter O32 and Salmonella arizonae O64 (Arizona 29). Bioorg Khim 14: 697–700.
Kocharova NA, Knirel YA, Stanislavsky ES et al. (1996) Structural and serological studies of lipopolysaccharides of Citrobacter O35 and O38 antigenically related to Salmonella. FEMS Immunol Med Microbiol 13: 1–8.
Koropatkin NM, Liu HW & Holden HM (2003) High resolution x-ray structure of tyvelose epimerase from Salmonella typhi. J Biol Chem 278: 20874–20881.
Kumirska J, Szafranek J, Czerwicka M et al. (2007) The structure of the O-polysaccharide isolated from the lipopolysaccharide of Salmonella Dakar (serogroup O:28). Carbohydr Res 342: 2138–2143.
Kumirska J, Dziadziuszko H, Czerwicka M et al. (2011) Heterogeneous structure of O-Antigenic part of lipopolysaccharide of Salmonella Telaviv (serogroup O:28) containing 3-Acetamido-3,6-dideoxy-D-glucopyranose. Biochemistry (Mosc) 76: 780–790.
Li Q & Reeves PR (2000) Genetic variation of dTDP-L-rhamnose pathway genes in Salmonella enterica. Microbiology 146: 2291–2307.
Li Y, Cao B, Liu B et al. (2009) Molecular detection of all 34 distinct O-antigen forms of Shigella. J Med Microbiol 58: 69–81.
Li D, Liu B, Chen M et al. (2010a) A multiplex PCR method to detect 14 Escherichia coli serogroups associated with urinary tract infections. J Microbiol Methods 82: 71–77.
Li Y, Perepelov AV, Guo D et al. (2010b) Structural and genetic relationships of two pairs of closely related O-antigens of Escherichia coli and Salmonella enterica: E. coli O11/S. enterica O16 and E. coli O21/S. enterica O38. FEMS Immunol Med Microbiol 61: 258–268.
Lindberg B, Lindh F, Longren J et al. (1981) Structural studies of the O-specific side-chain of the lipopolysaccharide from Escherichia coli O55. Carbohydr Res 97: 105–112.
Lindberg B, Leontein K, Lindquist U et al.
(1988) Structural studies of the O-antigen polysaccharide of Salmonella thompson, serogroup C1 (6,7). Carbohydr Res 174: 313–322. Linton KJ & Higgins CF (1998) The Escherichia coli ATP-binding cassette (ABC) proteins. Mol Microbiol 28: 5– 13. Liu D, Verma NK, Romana LK & Reeves PR (1991) Relationships among the rfb regions of Salmonella serovars A, B, and D. J Bacteriol 173: 4814–4819. Liu B, Knirel YA, Feng L et al. (2008) Structure and genetics of Shigella O antigens. FEMS Microbiol Rev 32: 627–653. ª 2013 Federation of European Microbiological Societies. Published by John Wiley & Sons Ltd. All rights reserved 86 Liu B, Wu F, Li D et al. (2009) Development of a serogroup-specific DNA microarray for identification of Escherichia coli strains associated with bovine septicemia and diarrhea. Vet Microbiol 142: 373–378. Liu B, Perepelov AV, Li D et al. (2010a) Structure of the O-antigen of Salmonella O66 and the genetic basis for similarity and differences between the closely related O-antigens of Escherichia coli O166 and Salmonella O66. Microbiology 156: 1642–1649. Liu B, Perepelov AV, Guo D et al. (2010b) Structural and genetic relationships between the O-antigens of Escherichia coli O118 and O151. FEMS Immunol Med Microbiol 60: 199–207. Liu B, Perepelov AV, Svensson MV et al. (2010c) Genetic and structural relationships of Salmonella O55 and Escherichia coli O103 O-antigens and identification of a 3-hydroxybutanoyltransferase gene involved in the synthesis of a Fuc3N derivative. Glycobiology 20: 679–688. MacLean LL & Perry MB (1997) Structural characterization of the serotype O:5 O-polysaccharide antigen of the lipopolysaccharide of Escherichia coli O:5. Biochem Cell Biol 75: 199–205. Makela PH, Valtonen VV & Valtonen M (1973) Role of O-antigen (lipopolysaccharide) factors in the virulence of Salmonella. J Infect Dis 128 (Suppl): 81–85. 
Masoud H & Perry MB (1996) Structural characterization of the O-antigenic polysaccharide of Escherichia coli serotype O17 lipopolysaccharide. Biochem Cell Biol 74: 241–248.
Maurelli AT, Fernandez RE, Bloch CA et al. (1998) “Black holes” and bacterial pathogenicity: a large genomic deletion that enhances the virulence of Shigella spp. and enteroinvasive Escherichia coli. P Natl Acad Sci USA 95: 3943–3948.
Mavroidi A, Aanensen DM, Godoy D et al. (2007) Genetic relatedness of the Streptococcus pneumoniae capsular biosynthetic loci. J Bacteriol 189: 7841–7855.
McGrath BC & Osborn MJ (1991) Localisation of the terminal steps of O-antigen synthesis in Salmonella typhimurium. J Bacteriol 173: 649–654.
McQuiston JR, Parrenas R, Ortiz-Rivera M et al. (2004) Sequencing and comparative analysis of flagellin genes fliC, fljB, and flpA from Salmonella. J Clin Microbiol 42: 1923–1932.
McQuiston JR, Herrera-Leon S, Wertheim BC et al. (2008) Molecular phylogeny of the salmonellae: relationships among Salmonella species and subspecies determined from four housekeeping genes and evidence of lateral gene transfer events. J Bacteriol 190: 7060–7067.
Morelli G, Song Y, Mazzoni CJ et al. (2011) Yersinia pestis genome sequencing identifies patterns of global phylogenetic diversity. Nat Genet 42: 1140–1143.
Mulford CA & Osborn MJ (1983) An intermediate step in translocation of lipopolysaccharide to outer membrane of Salmonella typhimurium. P Natl Acad Sci USA 80: 1159–1163.
Mulrooney EF, Poon KK, McNally DJ et al. (2005) Biosynthesis of UDP-N-acetyl-L-fucosamine, a precursor to the biosynthesis of lipopolysaccharide in Pseudomonas aeruginosa serotype O11. J Biol Chem 280: 19535–19542.
Naide Y, Nikaido H, Mäkelä PH et al. (1965) Semirough strains of Salmonella. P Natl Acad Sci USA 53: 147–153.
Neidhardt FC, Ingraham JL, Magasanik B et al. (1987) Escherichia and Salmonella typhimurium: Cellular and Molecular Biology. American Society for Microbiology, Washington, DC.
Nelson K & Selander RK (1992) Evolutionary genetics of the proline permease gene (putP) and the control region of the proline utilization operon in populations of Salmonella and Escherichia coli. J Bacteriol 174: 6886–6895.
Nelson K, Whittam TS & Selander RK (1991) Nucleotide polymorphism and evolution in the glyceraldehyde-3-phosphate dehydrogenase gene (gapA) in natural populations of Salmonella and Escherichia coli. P Natl Acad Sci USA 88: 6667–6671.
Nikaido H, Levinthal M, Nikaido K et al. (1967) Extended deletions in the histidine-rough-B region of the Salmonella chromosome. P Natl Acad Sci USA 57: 1825–1832.
Ochman H & Wilson AC (1987) Evolution in bacteria: evidence for a universal substitution rate in cellular genomes. J Mol Evol 26: 74–86.
Orskov I, Orskov F, Jann B et al. (1977) Serology, chemistry, and genetics of O and K antigens of Escherichia coli. Bacteriol Rev 41: 667–710.
Pallen MJ & Wren BW (2007) Bacterial pathogenomics. Nature 449: 835–842.
Perepelov AV, Liu B, Senchenkova SN et al. (2009) Structure of O-antigen and functional characterization of O-antigen gene cluster of Salmonella enterica O47 containing ribitol phosphate and 2-acetimidoylamino-2,6-dideoxy-L-galactose. Biochemistry (Mosc) 74: 416–420.
Perepelov AV, Liu B, Senchenkova SN et al. (2010a) Structure and gene cluster of the O-antigen of Salmonella enterica O60 containing 3-formamido-3,6-dideoxy-D-galactose. Carbohydr Res 345: 1632–1634.
Perepelov AV, Liu B, Senchenkova SN et al. (2010b) Structure of the O-polysaccharide of Salmonella enterica O41. Carbohydr Res 345: 971–973.
Perepelov AV, Liu B, Senchenkova SN et al. (2010c) Structure of the O-antigen and characterization of the O-antigen gene cluster of Escherichia coli O108 containing 5,7-diacetamido-3,5,7,9-tetradeoxy-L-glycero-D-galacto-non-2-ulosonic (8-epilegionaminic) acid. Biochemistry (Mosc) 75: 19–24.
Perepelov AV, Liu B, Senchenkova SN et al. (2010d) Structure and gene cluster of the O-antigen of Salmonella enterica O44. Carbohydr Res 345: 2099–2101.
Perepelov AV, Liu B, Senchenkova SN et al. (2010e) The O-antigen of Salmonella enterica O13 and its relation to the O-antigen of Escherichia coli O127. Carbohydr Res 345: 1808–1811.
Perepelov AV, Liu B, Shevelev SD et al. (2010f) Relatedness of the O-polysaccharide structures of Escherichia coli O123 and Salmonella enterica O58, both containing 4,6-dideoxy-4-{N-[(S)-3-hydroxybutanoyl]-D-alanyl}amino-D-glucose; revision of the E. coli O123 O-polysaccharide structure. Carbohydr Res 345: 825–829.
Perepelov AV, Liu B, Shevelev SD et al. (2010g) Structural and genetic characterization of the O-antigen of Salmonella enterica O56 containing a novel derivative of 4-amino-4,6-dideoxy-D-glucose. Carbohydr Res 345: 1891–1895.
Perepelov AV, Liu B, Senchenkova SN et al. (2011a) Structure of the O-polysaccharide and characterization of the O-antigen gene cluster of Salmonella enterica O53. Carbohydr Res 346: 373–376.
Perepelov AV, Liu B, Senchenkova SN et al. (2011b) Structures of the O-polysaccharides of Salmonella enterica O59 and Escherichia coli O15. Carbohydr Res 346: 381–383.
Perepelov AV, Liu B, Guo D et al. (2011c) Structure elucidation of the O-antigen of Salmonella enterica O51 and its structural and genetic relation to the O-antigen of Escherichia coli O23. Biochemistry (Mosc) 76: 774–779.
Perepelov AV, Li D, Liu B et al. (2011d) Structural and genetic characterization of the closely related O-antigens of Escherichia coli O85 and Salmonella enterica O17. Innate Immun 17: 164–173.
Perepelov AV, Liu B, Senchenkova SN et al. (2011e) O-antigen structure and gene clusters of Escherichia coli O51 and Salmonella enterica O57; another instance of identical O-antigens in the two species. Carbohydr Res 346: 828–832.
Perry MB & MacLean LL (1992a) Structure of the polysaccharide O-antigen of Salmonella riogrande O:40 (group R) related to blood group A activity. Carbohydr Res 232: 143–150.
Perry MB & MacLean LL (1992b) Structural characterization of the O-polysaccharide of the lipopolysaccharide produced by Salmonella milwaukee O:43 (group U) which possesses human blood group B activity. Biochem Cell Biol 70: 49–55.
Perry MB, MacLean L & Griffith DW (1986a) Structure of the O-chain polysaccharide of the phenol-phase soluble lipopolysaccharide of Escherichia coli O157:H7. Biochem Cell Biol 64: 21–28.
Perry MB, Bundle DR, MacLean L et al. (1986b) The structure of the antigenic lipopolysaccharide O-chains produced by Salmonella urbana and Salmonella godesberg. Carbohydr Res 156: 107–122.
Pfoestl A, Hofinger A, Kosma P et al. (2003) Biosynthesis of dTDP-3-acetamido-3,6-dideoxy-α-D-galactose in Aneurinibacillus thermoaerophilus L420-91T. J Biol Chem 278: 26410–26417.
Pfostl A, Zayni S, Hofinger A et al. (2008) Biosynthesis of dTDP-3-acetamido-3,6-dideoxy-α-D-glucose. Biochem J 410: 187–194.
Plainvert C, Bidet P, Peigne C et al. (2007) A new O-antigen gene cluster has a key role in the virulence of the Escherichia coli meningitis clone O45:K1:H7. J Bacteriol 189: 8528–8536.
Pluschke G, Mayden J, Achtman M et al. (1983) Role of the capsule and the O-antigen in resistance of O18:K1 Escherichia coli to complement-mediated killing. J Bacteriol 42: 907–913.
Popoff MY & Le Minor L (1997) Antigenic Formulas of the Salmonella Serovars, 7th Revision. WHO Collaborating Centre for Reference and Research on Salmonella. Institut Pasteur, Paris, France.
Pupo GM, Lan R & Reeves PR (2000) Multiple independent origins of Shigella clones of Escherichia coli and convergent evolution of many of their characteristics. P Natl Acad Sci USA 97: 10567–10572.
Rabsch W, Andrews HL, Kingsley RA et al. (2002) Salmonella enterica serotype Typhimurium and its host-adapted variants. Infect Immun 70: 2249–2255.
Ramamurthy T, Yamasaki S, Takeda Y et al. (2003) Vibrio cholerae O139 Bengal: odyssey of a fortuitous variant. Microbes Infect 5: 329–344.
Ratnayake S, Weintraub A & Widmalm G (1994) Structural studies of the enterotoxigenic Escherichia coli (ETEC) O153 O-antigenic polysaccharide. Carbohydr Res 265: 113–120.
Raymond CK, Sims EH, Kas A et al. (2002) Genetic variation at the O-antigen biosynthetic locus in Pseudomonas aeruginosa. J Bacteriol 184: 3614–3622.
Raynaud C, Meibom KL, Lety MA et al. (2007) Role of the wbt locus of Francisella tularensis in lipopolysaccharide O-antigen biogenesis and pathogenicity. Infect Immun 75: 536–541.
Reeves PR (1992) Variation in O antigens, niche specific selection and bacterial populations. FEMS Microbiol Lett 100: 509–516.
Reeves PR (1995) Role of O-antigen variation in the immune response. Trends Microbiol 3: 381–386.
Reeves PR & Wang L (2002) Genomic organization of LPS-specific loci. Curr Top Microbiol Immunol 264: 109–135.
Reeves PR, Liu B, Zhou Z et al. (2011) Rates of mutation and host transmission for an Escherichia coli clone over 3 years. PLoS ONE 6: e26907.
Reeves PR, Cunneen MM, Liu B & Wang L (2013) Genetics and evolution of the Salmonella galactose-initiated set of O antigens. PLoS ONE 8: e69306.
Ren Y, Liu B, Cheng J et al. (2008) Characterization of Escherichia coli O3 and O21 O-antigen gene clusters and development of serogroup-specific PCR assays. J Microbiol Methods 75: 329–334.
Royle MC, Totemeyer S, Alldridge LC et al. (2003) Stimulation of Toll-like receptor 4 by lipopolysaccharide during cellular invasion by live Salmonella typhimurium is a critical but not exclusive event leading to macrophage responses. J Immunol 170: 5445–5454.
Rundlof T, Weintraub A & Widmalm G (1998) Structural determination of the O-antigenic polysaccharide from Escherichia coli O35 and cross-reactivity to Salmonella arizonae O62. Eur J Biochem 258: 139–143.
Rush JS, Alaimo C, Robbiani R et al. (2010) A novel epimerase that converts GlcNAc-P-P-undecaprenol to GalNAc-P-P-undecaprenol in Escherichia coli O157. J Biol Chem 285: 1671–1680.
Samuel G & Reeves P (2003) Biosynthesis of O-antigens: genes and pathways involved in nucleotide sugar precursor synthesis and O-antigen assembly. Carbohydr Res 338: 2503–2519.
Samuel G, Hogbin JP, Wang L et al. (2004) Relationships of the Escherichia coli O157, O111, and O55 O-antigen gene clusters with those of Salmonella enterica and Citrobacter freundii, which express identical O antigens. J Bacteriol 186: 6536–6543.
Senchenkova SN, Shashkov AS, Knirel YA et al. (1997) Structure of the O-specific polysaccharide of Salmonella enterica ssp. arizonae O50 (Arizona 9a, 9b). Carbohydr Res 301: 61–67.
Sharp PM (1991) Determinants of DNA sequence divergence between Escherichia coli and Salmonella typhimurium: codon usage, map position, and concerted evolution. J Mol Evol 33: 23–33.
Shashkov AS, Lipkind GM, Knirel NK et al. (1988) Stereochemical factors determining the effects of glycosylation on the 13C chemical shifts in carbohydrates. Magn Reson Chem 26: 735–747.
Shashkov AS, Vinogradov EV, Knirel YA et al. (1993) Structure of the O-specific polysaccharide of Salmonella arizonae O45. Carbohydr Res 241: 177–188.
Somoza JR, Menon S, Schmidt H et al. (2000) Structural and kinetic analysis of Escherichia coli GDP-mannose 4,6 dehydratase provides insights into the enzyme’s catalytic mechanism and regulation by GDP-fucose. Structure 8: 123–135.
Staaf M, Widmalm G, Weintraub A et al. (1995) Structural elucidation of the O-antigenic polysaccharide from Escherichia coli O44:H18. Eur J Biochem 233: 473–477.
Staaf M, Urbina F, Weintraub A et al. (1999) Structural elucidation of the O-antigenic polysaccharides from Escherichia coli O21 and the enteroaggregative Escherichia coli strain 105. Eur J Biochem 266: 241–245.
Sun Q, Knirel YA, Lan R et al. (2012) A novel plasmid-encoded serotype conversion mechanism through addition of phosphoethanolamine to the O-antigen of Shigella flexneri. PLoS ONE 7: e46095.
Szafranek J, Kaczynska M, Kaczynski Z et al. (2003) Structure of the polysaccharide O-antigen of Salmonella Aberdeen (O:11). Pol J Chem 77: 1135–1140.
Verma V & Reeves PR (1989) Identification and sequence of rfbS and rfbE, which determine antigenic specificity of group A and group D Salmonella. J Bacteriol 171: 5694–5701.
Verma NK, Quigley NB & Reeves PR (1988) O-antigen variation in Salmonella spp.: rfb gene clusters of three strains. J Bacteriol 170: 103–107.
Vinogradov EV, Knirel’ IuA, Lipkind GM et al. (1987a) [Antigenic bacterial polysaccharides. 24. The structure of the O-specific polysaccharide chain of Salmonella arizonae O63 (Arizona 08) lipopolysaccharide]. Bioorg Khim 13: 1399–1404.
Vinogradov EV, Knirel YA, Lipkind GM et al. (1987b) Antigenic polysaccharides of bacteria. 23. The structure of the O-specific polysaccharide chain of the lipopolysaccharide of Salmonella arizonae O59. Bioorg Khim 13: 1275–1281.
Vinogradov EV, Shashkov AS, Knirel YA et al. (1992) The structure of the O-specific polysaccharide chain of the lipopolysaccharide of Salmonella arizonae O61. Carbohydr Res 231: 1–11.
Vinogradov EV, Knirel YA, Kochetkov NK et al. (1994) The structure of the O-specific polysaccharide of Salmonella arizonae O62. Carbohydr Res 253: 101–110.
Vinogradov E, Nossova L & Radziejewska-Lebrecht J (2004) The structure of the O-specific polysaccharide from Salmonella cerro (serogroup K, O:6,14,18). Carbohydr Res 339: 2441–2443.
Wang L & Reeves PR (1998) Organization of Escherichia coli O157 O-antigen gene cluster and identification of its specific genes. Infect Immun 66: 3545–3551.
Wang L & Reeves PR (2000) The Escherichia coli O111 and Salmonella enterica O35 gene clusters: gene clusters encoding the same colitose-containing O antigen are highly conserved. J Bacteriol 182: 5256–5261.
Wang L, Romana LK & Reeves PR (1992) Molecular analysis of a Salmonella enterica group E1 rfb gene cluster: O antigen and the genetic basis of the major polymorphism. Genetics 130: 429–443.
Wang L, Andrianopoulos K, Liu D et al. (2002a) Extensive variation in the O-antigen gene cluster within one Salmonella enterica serogroup reveals an unexpected complex history. J Bacteriol 184: 1669–1677.
Wang L, Huskic S, Cisterne A et al. (2002b) The O-antigen gene cluster of Escherichia coli O55:H7 and identification of a new UDP-GlcNAc C4 epimerase gene. J Bacteriol 184: 2620–2625.
Wang W, Perepelov AV, Feng L et al. (2007) A group of Escherichia coli and Salmonella enterica O antigens sharing a common backbone structure. Microbiology 153: 2159–2167.
Weintraub A, Leontein K, Widmalm G et al. (1993) Structural studies of the O-antigenic polysaccharide of an enteroaggregative Escherichia coli strain. Eur J Biochem 213: 859–864.
West NP, Sansonetti P, Mounier J et al. (2005) Optimization of virulence functions through glucosylation of Shigella LPS. Science 307: 1313–1317.
Whitfield C, Richards JC, Perry MB et al. (1991) Expression of two structurally distinct D-galactan O antigens in the lipopolysaccharide of Klebsiella pneumoniae serotype O1. J Bacteriol 173: 1420–1431.
Widmalm G & Leontein K (1993) Structural studies of the Escherichia coli O127 O-antigen polysaccharide. Carbohydr Res 247: 255–262.
Wildschutte H & Lawrence JG (2007) Differential Salmonella survival against communities of intestinal amoebae. Microbiology 153: 1781–1789.
Wildschutte H, Wolfe DM, Tamewitz A et al. (2004) Protozoan predation, diversifying selection, and the evolution of antigenic diversity in Salmonella. P Natl Acad Sci USA 101: 10644–10649.
Xiang SH, Hobbs M & Reeves PR (1994) Molecular analysis of the rfb gene cluster of a group D2 Salmonella enterica strain: evidence for its origin from an insertion sequence-mediated recombination event between group E and D1 strains. J Bacteriol 176: 4357–4365.
Yi W, Shao J, Zhu L et al. (2005) Escherichia coli O86 O-antigen biosynthetic gene cluster and stepwise enzymatic synthesis of human blood group B antigen tetrasaccharide. J Am Chem Soc 127: 2040–2041.
Yildirim H, Weintraub A & Widmalm G (2001) Structural studies of the O-polysaccharide from the Escherichia coli O77 lipopolysaccharide. Carbohydr Res 333: 179–183.
Zhao S, Sandt CH, Feulner G et al. (1993) Rhs elements of Escherichia coli K-12: complex composites of shared and unique components that have different evolutionary histories. J Bacteriol 175: 2799–2808.
Zhao G, Liu J, Liu X et al. (2007) Cloning and characterization of GDP-perosamine synthetase (Per) from Escherichia coli O157:H7 and synthesis of GDP-perosamine in vitro. Biochem Biophys Res Commun 363: 525–530.
Zuccotti S, Zanardi D, Rosano C et al. (2001) Kinetic and crystallographic analyses support a sequential-ordered bi bi catalytic mechanism for Escherichia coli glucose-1-phosphate thymidylyltransferase. J Mol Biol 313: 831–843.
Supporting Information
Additional Supporting Information may be found in the online version of this article:
Fig. S1. The proposed functions of glycosyltransferases involved in the synthesis of Salmonella GlcNAc/GalNAc-initiated O antigens with the Wzx/Wzy pathway.
Table S1. Composition of Salmonella GlcNAc/GalNAc-initiated O antigens.
Table S2. Characteristics of the ORFs in Salmonella O-antigen gene clusters which are first reported in this review.
Table S3. Homology groups of glycosyltransferases in Salmonella GlcNAc/GalNAc-initiated O-antigen gene clusters with wzx/wzy.
Table S4. Summary of unique Salmonella GlcNAc/GalNAc-initiated O antigens.
Table S5. Primers used for Salmonella molecular typing.