Diversifying Selection and Concerted Evolution of a Type IV Secretion System in Bartonella Björn Nystedt,*1 A. Carolin Frank,*1,2 Mikael Thollesson,* and Siv G. E. Andersson* *Department of Molecular Evolution, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden We have studied the evolution of a type IV secretion system (T4SS), in Bartonella, which is thought to have changed function from conjugation to erythrocyte adherence following a recent horizontal gene transfer event. The system, called Trw, is unique among T4SSs in that genes encoding both exo- and intracellular components are located within the same duplicated fragment. This provides an opportunity to study the influence of selection on proteins involved in host– pathogen interactions. We sequenced the trw locus from several strains of Bartonella henselae and investigated its evolutionary history by comparisons to other Bartonella species. Several instances of recombination and gene conversion events where detected in the 2- to 5-fold duplicated gene fragments encompassing trwJIH, explaining the homogenization of the anchoring protein TrwI and the divergence of the minor pilus protein TrwJ. A phylogenetic analysis of the 7- to 8-fold duplicated gene coding for the major pilus protein TrwL displayed 2 distinct clades, likely representing a subfunctionalization event. The analyses of the B. henselae strains also identified a recent horizontal transfer event of almost the complete trwL region. We suggest that the switch in function of the T4SS was mediated by the duplication of the genes encoding pilus components and their diversification by combinatorial sequence shuffling within and among genomes. We suggest that the pilus proteins have evolved by diversifying selection to match a divergent set of erythrocyte surface structures, consistent with the trench warfare coevolutionary model. Introduction A common theme for bacteria that colonize eukaryotes is the presence of variable surface-exposed proteins that coevolve with the host proteins. These bacteria experience selection for variability on 2 timescales; within individual hosts to enable binding to a diverse set of host cell surface proteins and to avoid immune system recognition, and between hosts to increase survival in genetically different hosts. Two coevolutionary models of host–pathogen interactions have been proposed. The trench warfare model suggests that diversifying selection on bacterial surface proteins maintains genetic polymorphisms in the population. The coevolutionary arms race model suggests back and forth evolution that oscillates between the offense imposed by the pathogen on the host and the subsequent evolution of defense responses by the host. In this model, positive selection on favorable alleles yields genetic variation over time, yet produces little variation within the population. Bacterial secretion systems are surface-exposed pili that mediate the transfer of effector molecules into recipient host cells. Studies of the pilus protein HrpA of the type III secretion system (T3SS) in the plant pathogen Pseudomonas syringae provide evidence for diversifying selection according to the trench warfare model (Guttman et al. 2006), whereas novel variants of the effector protein alleles are selected during host switching in line with the coevolutionary arms race model (Ma et al. 2006). Mechanistically, a recombinatorial process called terminal reassortment has been proposed to explain the extremely rapid evolution of the T3SS effectors when compared between a wide range of bacterial species (Stavrinides et al. 2006). 1 These authors contributed equally to this work. Present address: Smurfit Institute of Genetics, University of Dublin, Trinity College Dublin, Dublin, Ireland. Key words: type IV secretion system, Bartonella, duplication, recombination, gene conversion, host–pathogen interaction. 2 E-mail: [email protected]. Mol. Biol. Evol. 25(2):287–300. 2008 doi:10.1093/molbev/msm252 Advance Access publication December 7, 2007 Ó The Author 2007. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected] The focus of this study is on the evolution of type IV secretion systems (T4SSs). Prototype systems are found in the plant pathogen Agrobacterium tumefaciens that uses a T4SS called VirB to transfer oncogenic DNA to its host, the whooping cough agent Bordetella pertussis that translocates a toxin into the host using a T4SS, and the humanassociated gastric bacterium Helicobacter pylori that transfers the effector protein CagA with the aid of its T4SS. A phylogenetic analysis of T4SSs showed that they have originated from plasmid conjugation systems multiple times independently (Frank et al. 2005). The system we have studied is the trw T4SS in Bartonella, which is thought to have switched function recently during its horizontal transfer into the Bartonella lineage from a role in plasmid conjugation to its current role in host interaction (Seubert et al. 2003; Frank et al. 2005; Fernandez-Lopez et al. 2006). Bartonella species are vector-borne bacteria that invade and replicate within the host erythrocytes without normally shortening the erythrocyte life span (Schulein et al. 2001). The Trw system is essential for erythrocyte colonization and is thought to mediate adherence to these host cells (Seubert et al. 2003). The Bartonella Trw system is unique among T4SSs in that several genes in the cluster are duplicated (Seubert et al. 2003; Alsmark et al. 2004). For example, the gene encoding the major component of the T-pilus, trwL is present in 7 or 8 copies. In addition, a segment encompassing trwJ, trwH, and trwI is present in variable copy numbers in the different species. By homology to other T4SS, TrwJ is thought to mediate host cell attachment (Yeo et al. 2003), whereas TrwH and TrwI are associated with the anchoring of the pilus to the outer and inner membranes (Krall et al. 2002; Cascales and Christie 2004). Studies of duplicated genes for other outer membrane proteins suggest that gene conversion is one of the most important mechanisms for generating diversity within a host (Deitsch et al. 1997). This can be manifested as the expression of alternative copies of surface proteins by the transfer of short donor sequence fragments, called functional pseudogenes, into a recipient expression cassette (Santoyo and Romero 2005). The power of combinatorial gene conversion 288 Nystedt et al. N Strain 0.01 100 korB L M korA K J I D E Host species Bt 7 5† Bg 8 4 Wood mouse Bh-IC11 (ST1) 7 2 Cat Bh-H1 (ST1) 8 2 Cat‡ Bh-MAR (ST8) 8 3 Cat‡ Bh-CH* (ST4) 7 Bq 8 100 97 H G F Brown rat Cheetah 2 Human FIG. 1.—Phylogeny and gene order structure of the trw locus in Bartonella. Maximum likelihood phylogeny as inferred from the concatenated alignments of the single-copy genes trwFED (left). Schematic representation of the Trw region with tandem repeat segments within brackets (top right). The trwD–trwN genes (D–N) and the regulatory genes korA and korB are indicated. The copy numbers and host species for each strain are outlined (right). ST: sequence type. * Only the trwL region has been sequenced for Bh-CH, and thus this sequence is not included in the trwFED phylogeny. The trwH gene is lacking from the third tandem repeat of this segment in Bt. à The strain was isolated from an incidentally infected human patient. to generate functional diversity can be illustrated with the major antigen (encoded by the tprK gene) in Treponema pallidum (Gray et al. 2006): at least 420,000 different sequence variants of this protein can be generated from 47 variable pseudogene sequences and 7 variable (V) regions in the expressed tprK gene (Santoyo and Romero 2005). The contribution of combinatorial gene conversion to antigenic variation of surface structures is by now well established in many pathogens, including Anaplasma marginale (Brayton et al. 2002, 2005), Borrelia hermsii (Plasterk et al. 1985), Borrelia burgdorferi (Zhang and Norris 1998), Neisseria gonorrhoeae (Hagblom et al. 1985; Haas and Meyer 1986), and Mycoplasma synoviae (Noormohammadi et al. 2000). Gene conversion can also cause members of a multigene family to evolve in concert, as has been reported for essential cell components such as the tuf gene encoding elongation factor Tu and the ribosomal RNA (rRNA) genes. Studies of the duplicated tuf genes indicate transfer rates in the order of 108 to 109 per cell division (Abdulkarim and Hughes 1996), which is higher than the estimated nucleotide substitution rate. However, mutations in recombination– repair genes, such as recAB and mutSHL have been demonstrated to affect these frequencies, suggesting that the intrinsic rate of gene conversion may be species specific. Until now, the influence of gene conversion on homogenization and diversification has been studied separately for cytoplasmic and outer membrane proteins, making it difficult to assess the relative contribution of mutation and selection to the observed patterns. Here, the trw cluster comes in handy: 1 of the duplicated regions contains genes for both exocellular (TrwJ) and intracellular pilus components (TrwI), providing the opportunity to start decoding the evolutionary forces that promotes shifts in the balance between homogenization and diversification. In this study, we have examined the molecular evolution of the duplicated genes in the trw cluster using sequence data from 4 Bartonella species from various natural hosts, including humans, felines, and rodents. We show that genes with rapid evolution and extreme intragenomic sequence divergence are found on the same coduplicated segment as genes homogenized by gene conversion. This discrepancy is directly related to the intra- versus exocellular location of the encoded proteins. We also provide evidence for horizontal gene transfer, and finally we discuss the possible selective forces and dynamics shaping the evolution of duplicated genes for host-interaction proteins. Materials and Methods Sequences and Accession Numbers This study includes sequence data from 4 species of Bartonella and 4 strains of Bartonella henselae (fig. 1). Data were taken from the genome sequences of B. henselae strains Houston-1 (Bh-H1) (NC_005956) and Bartonella quintana (Bq) (NC_005955) and the trw gene sequences from Bartonella tribocorum (Bt) (AJ496288) and Bartonella grahamii (Bg) (AM905248). We performed de novo sequencing by fosmid library construction of the complete trw region from the 2 B. henselae strains Marseille (BhMAR) (AM905245) and IndoCat-11 (Bh-IC11) (AM905246) and of the trwL region (including korA) from a long-range polymerase chain reaction (PCR) product of B. henselae strain Cheetah (Bh-CH) (AM905247). Fosmid Library Construction and Sequencing Whole genome fosmid libraries were constructed using the Epicentre Fosmid Library Construction Kit. Genomic DNA was prepared from 2 week’s B. henselae cultures on horse blood agar plates. The DNA was physically sheared with a 50-ll Hamilton syringe, end repaired, and size separated by pulse-field gel electrophoresis in 1.2% agarose (SeaKem Gold; Cambrex Bio Science, East Rutherford, NJ) in 0.5 Tris-Borate-EDTA buffer in a Evolution of Type IV Secretion Systems 289 GenNavigator System apparatus (Amersham Biosciences, Uppsala, Sweden) at 14 °C and 5.7 V/cm for a total of 35 h split into 4 phases: 1 s for 3 h, 5 s for 10 h, 10 s for 15 h, and 15 s for 7 h. Fragments of 35–45 kb were cut from the gel. The gel was digested with GELase, and the DNA was extracted by a single phenol step, followed by NaCl precipitation, and washed with 70% ethanol. The sample was desalted by microdialysis (0.025 lm Millipore membrane filters, 30 min), ligated to the CopyControl pCC1FOS fosmid vector, and packaged into phages that were used to infect T1 resistant Escherichia coli (EPI300-T1). The bacteria were screened on chloramphenicol plates (12 lg/ml), grown overnight in liquid selective media, and frozen in 15% glycerol in 80 °C for long-time storage. Screening for trw-positive clones was performed with the Gen Images AlkPhos direct labeling and detection system with chemiluminescence detection with CDP-Star (GE Healthcare, Uppsala, Sweden). Samples of 2-ll fosmidcontaining E. coli cultures were added directly onto a nylon filter, and positive clones were identified by DNA–DNA overnight hybridization at 65 °C with a labeled PCR product from the Bh-H1 trwK gene, followed by detection by exposure on Hyperfilm ECL. Fosmids were induced to high copy numbers (CopyControl Induction Solution) and prepared using standard plasmid preparation protocols (QIAGEN, Valencia, CA). Shotgun sequencing was performed on the purified fosmid with the insert DNA, using a MegaBace1000 capillary sequenator. Published fosmids were sequenced to a coverage of .8, complemented by PCR sequencing if needed to an average consensus sequence quality value of .85, and at least one read with a quality value .20 in both directions at each site. peptide (SP) regions and transmembrane regions from protein sequence data. We used stability of aligned positions (Loytynoja and Milinkovitch 2001) to analyze protein alignment ambiguities for each of the possible parameter combinations in ClustalW, with the gap extension penalty ranging from 0.1 to 0.4 (step 5 0.1) and the gap opening penalty ranging from 8 to 15 (step 5 1). Segments of the gene supported by less than 90% of the protein alignments were consider ambiguous. Sequence Alignments and Distance Measures Detection of Recombination and Gene Conversion For all coding genes, protein alignments were made with Kalign (Lassmann and Sonnhammer 2005) and backtranslated to nucleotide sequences, with the trailing gap parameter set to t 5 0. Nucleotide spacers were aligned with Kalign (t 5 0). All alignments were manually inspected and edited if needed. Distance measures must be carefully interpreted in the light of recombining sequences. Hence, we used as distance measures both corrected and uncorrected substitution frequencies, as well as nucleotide identity and diversity index calculations (see below), depending on the particular sequences involved and the question at hand. Nonsynonymous (dN) and synonymous (dS) substitution frequencies were calculated using the method of (Yang and Nielsen 2000) as implemented in yn00 in the PAML package (Yang 2000), also giving the pairwise percent nucleotide identity. Uncorrected nonsynonymous (pN) and synonymous (pS) substitution frequencies were obtained by synonymous nonsynonymous analysis program (Korber 2000). Potentially recombined trwJ sequences were identified using the 4-sequence method implemented in 4SIS (Maynard Smith and Smith 1998). For each analysis, 1 gene copy (the ‘‘child’’) was analyzed for similarity on polymorphic sites to 2 other gene copies (the ‘‘parents’’), using a fourth gene copy as outgroup. The program optimizes a chi-square value to identify potential recombination breakpoints in the analyzed gene, based on biased similarity on informative sites, with the P value being equal to the fraction of 1,000 permutations with a chi-square value larger than for the original data. We also detected recombination using the 4 methods RDP (Martin and Rybicki 2000), GENECONV (Padidam et al. 1999), MaxChi (Maynard Smith 1992), and Chimaera (Posada and Crandall 2001) as implemented in the program RDP3 (Martin and Rybicki 2000), with default settings. Potential gene conversion events in the concatenated trwJIH region (including spacers) and among the trwL genes were detected using test of (Sawyer 1989) in GENECONV (Sawyer 1999). Parameters were set to equal weights for all nucleotides (seqtype 5 all) and no mismatches allowed (gscale 5 0). Clusters of repeated gene families (more than 60% amino acid similarity over at least 50% of the protein length) in B. henselae and B. quintana were detected using Blastclust (Altschul et al. 1990). Each cluster with at least 3 Sequence Feature Predictions and Gene Segments The transmembrane regions of aligned genes were predicted using PolyPhobius (http://phobius.cgb.ki.se/) (Käll et al. 2005), which simultaneously predicts both signal Tree Reconstructions The phylogenies of the alignments were inferred by maximum likelihood as implemented in Treefinder (Jobb et al. 2004), with 100 bootstrap replicates. The Akaike information criterion was used to select among 56 nt substitution models, using Modeltest 3.7 (Posada and Crandall 1998). The models obtained were general time reversible (GTR) þ I (trwFED), GTR þ I þ G (trwJ), GTR þ G (trwI), and transversion model þ I þ G (trwL). Alignment Visualizations Sequence alignments with up to ;10 sequences were directly visualized by a novel method: Each sequence was schematically plotted as a line/arrow (spacer/gene) in a spread-out fashion. At each position in the alignment, lines were drawn to indicate identical nucleotides. Unique nucleotides and gaps were represented above (or below) each sequence as stars and lines, respectively. For visibility, nonsegregating sites were removed from the alignments in some plots, and lines were not drawn for nucleotides shared across more than 3 sequences. 290 Nystedt et al. Table 1 Synonymous and Nonsynonymous Substitution Frequencies for trwI (upper rows) and trwFED (lower rows) withina,b and between Genomes Bh-H1 0.01:0.02 NA:NAc Bh-MAR 0.04:0.15 0.00:0.02 Bh-IC11 0.04:0.17 0.00:0.02 Bq 0.15:0.55 0.07:0.45 Bg 0.25:0.62 0.10:0.75 Bt 0.24:0.69 0.10:0.85 Bh-MAR 0.29 0.10 0.02:0.06 NA:NAc 0.04:0.16 0.00:0.13 0.15:0.58 0.07:0.46 0.25:0.61 0.10:0.76 0.23:0.66 0.10:0.87 Bh-IC11 0.22 0.08 0.28 0.12 0.00:0.00 NA:NAc 0.15:0.62 0.07:0.44 0.24:0.71 0.10:0.75 0.24:0.70 0.10:0.85 Bq 0.28 0.14 0.27 0.15 0.23 0.15 0.00:0.00 NA:NAc 0.24:0.61 0.10:0.80 0.25:0.60 0.10:0.85 Bg 0.40 0.14 0.41 0.14 0.34 0.14 0.39 0.13 0.01:0.01 NA:NAc 0.08:0.26 0.05:0.37 Bt 0.35 0.11 0.36 0.11 0.34 0.12 0.42 0.12 0.31 0.13 0.01:0.01 NA:NAc Bh-H1 NOTE.—The upper triangle shows dNdS, and the lower triangle shows the dN/dS ratio. a Intragenomic comparisons are not defined for the single-copy genes trwFED. b The dN/dS rations for the intragenomic comparisons of trwI are unreliable due to the small and imprecise dN and dS estimates and were excluded. c NA, Not Applicable. repeated genes in the 2 genomes combined was aligned with Kalign (t 5 0) and analyzed for gene conversion events with GENECONV (same parameters as above). Identity Plots Segments of differential sequence conservation were detected by calculating the Shannon–Wiener diversity index (H) in sliding windows along the alignments of the trwJIH segment (window size 5 40 nt, step size 5 20 nt) and the trwL genes (window size 5 20 nt, step size 5 10 nt). The Shannon–Wiener index at each position i was here defined as: Hi 5ðfi A logðfi AÞþfi T logðfi TÞ þ fi C logðfi CÞ þfi G logðfi GÞ þ fi Gap logðfi GapÞ=logðnÞ; where fiA is the frequency of A:s at site i and n equals the number of possible states (n 5 5 if there are 5 or more sequences in the alignment, otherwise n equals the number of sequences). A high similarity at a given site will give a low Shannon– Wiener index and a low similarity will give a high value. Results Evolution of the trw Cluster in Bartonella We have studied the evolution of the trw cluster that codes for a conjugative T4SS using Bartonella as the model system. Included in the analysis were 4 different Bartonella species: B. henselae (Bh), B. quintana (Bq), B. tribocorum (Bt), and B. grahamii (Bg) (fig. 1). To increase the resolution of the analyses, we compared the sequences of the trw gene cluster in 3 different strains of B. henselae: Houston-1 (Bh-H1, type strain), Marseille (Bh-MAR) and IndoCat-11 (Bh-IC11), with some sequence data also collected from the strain Cheetah (Bh-CH). The B. henselae strains represent 3 of the 8 sequence types (ST1, ST4, and ST8) identified in a previous multi locus sequence typing analysis of polymorphisms in 9 core genes (Lindroos et al. 2006). We inferred the evolutionary relationships of the trw region based on the last 3 genes in the trw cluster, trwFED (the positional homologs to A. tumefaciens virB 9, 10, 11). This gene set supported a clade containing the human pathogens B. quintana and the feline-associated species B. henselae to the exclusion of the rodent-associated species B. grahamii and B. tribocorum (fig. 1) in accordance with previously estimated species trees (Zeaiter et al. 2002; Frank et al. 2006), providing evidence for evolution by vertical inheritance of trw in the Bartonella lineages. The nonsynonymous (dN) and the synonymous (dS) substitution frequencies of the trwFED genes between Bh-H1 and Bq were dN 5 0.07 and dS 5 0.45, which is close to the average of all orthologs in these 2 completely sequenced genomes (dN 5 0.06, dS 5 0.49) (table 1). The analyses further suggested a tight relationship of all the 3 B. henselae strains, with the ST1-strain Bh-H1 forming a clade together with the ST8-strain Bh-MAR (97% bootstrap support), to the exclusion of the ST1-strain Bh-IC11 (fig. 1). This can be compared with a previous MLST-based analysis, where no differences were found between Bh-H1 and Bh-IC11, whereas Bh-MAR differed from these 2 in 6 of the 9 sequenced genes (Lindroos et al. 2006). Thus, different segments of the genome seem to have different evolutionary histories. Gene-Specific Phylogenetic Signals in the Tandemly Repeated trwJIH Segment The Bartonella Trw locus differs from previously described T4SS in that several genes within the locus are present in multiple copies (fig. 1). One segment encompassing the genes trwJ, trwI, and trwH (the positional homologs of virB5, virB6, and virB7) is present in 2 (Bq, Bh-H1, BhIC11), 3 (Bh-MAR), 4 (Bg), or 5 (Bt) copies, implying lineage-specific duplication–deletion processes. Because the order of genes in all 4 species is trwJ-I-H-J-I-H rather than trwJ-J-I-I-H-H or variants thereof, we assume that all 3 genes in the segment encompassing trwJ-I-H were duplicated simultaneously. Evolution of Type IV Secretion Systems 291 A Bt trwI5 89 Bt trwI2 B 0.1 0.02 Bt trwJ4 100 Bt trwI3 Bg trwJ2 100 Bt trwI4 Bg trwJ4 100 Bt trwI1 Bg trwJ3 Bg trwI4 Bt trwJ3 95 100 Bg trwI3 100 100 Bt trwJ2 100 Bt trwJ5 Bg trwI2 Bh-IC11 trwJ2 Bg trwI1 100 99 100 100 Bh-IC11 trwI1 98 Bh-MAR trwJ3 Bh-MAR trwJ2 Bh-IC11 trwI2 Bh-H1 trwJ2 90 Bh-H1 trwI1 Bq trwJ2 Bh-H1 trwI2 100 Bh-IC11 trwJ1 100 Bh-MAR trwI1 99 Bh-H1 trwJ1 Bh-MAR trwI2 Bh-MAR trwJ1 Bh-MAR trwI3 Bq trwJ1 Bq trwI1 100 100 Bq trwI2 50 nt Bg trwJ1 Bt trwJ1 Bh-H1 trwJ2 100 nt Bh-H1 trwI2 Bh-H1 trwJ1 Bh-H1 trwI1 Bh-IC11 trwI2 Bh-IC11 trwJ1 Bh-IC11 trwJ2 Bh-IC11 trwI1 Bh-MAR trwJ2 Bh-MAR trwI2 Bh-MAR trwJ1 Bh-MAR trwI1 Bh-MAR trwI3 Bh-MAR trwJ3 FIG. 2.—Upper panels: Maximum likelihood phylogenies for the nucleotide alignments of (A) trwI and (B) trwJ. The trees were inferred without an outgroup and the maximum likelihood bootstrap values above 80% are shown. Lower panels: alignment visualization of (A) the trwI and (B) the trwJ gene copies in the B. henselae strains. Each arrow represents a gene copy in the alignment. At each position in the alignment, groups of homologs with shared nucleotides are connected by lines. For visibility, only groups with 2 or 3 members are connected. Red stars above the genes indicate unique sites. All nonsegregating sites were omitted from the alignments, as well as the HV region of trwJ. There is a striking difference in sequence conservation between genes that lie in the same coamplified fragment, yielding gene-specific tree topologies for the trwJ, trwI, and trwH genes. A phylogenetic analysis of the trwI and trwH genes revealed species-specific clades with bootstrap support of 100%, with trwI showing this pattern even down to the strain-specific level in B. henselae (fig. 2A). In contrast, the trwJ sequences tended to form clades consisting of positional homologs (fig. 2B). For example, the trwJ copies from the 3 B. henselae strains formed 2 separate clades with at least 98% bootstrap support, one clade containing all trwJ1 copies andthe other clade containing all trwJ2/3 copies. By and large, the topology of the trwJ tree is congruent with the estimated trw tree (fig. 1), given some duplication and deletion events. 292 Nystedt et al. Because 1 part of the trwJ gene displayed extreme sequence divergence, we examined if the pattern of vertical descent in trwJ holds true for the entire gene. The trwJ alignment was divided into a SP region (nucleotide position 1–72), a hypervariable (HV) region (nucleotide position 73–780), and a V region (nucleotide position 780–1128). Phylogenetic trees based both on the HV region (supplementary fig. 1A, Supplementary Material online) and a concatenation of the SP þ V regions (supplementary fig. 1B, Supplementary Material online) were congruent with the tree inferred from the complete trwJ alignment (fig. 2B) for all clades with strong bootstrap support (.85%). Hence, the signal of vertical descent is evident both in the hypervariable and the more conserved parts of the trwJ gene. To visualize the completely different phylogenetic signals in the trwJ and trwI genes directly from the alignments, we plotted the shared nucleotides between the 7 copies from the 3 B. henselae strains for trwJ and trwI separately. The result highlighted the similarity between copies in the same genome for trwI (fig. 2A) and between positional homologs for trwJ (fig. 2B). We also noted an access of unique sites in BhH1 trwJ2 compared with the other trwJ copies in the 3 strains. Composite Genes and Gene Conversion Visual inspection of the trwJ copies indicated heterogeneous pairwise similarities between close homologs along the alignment. One explanation for this could be composite genes, that is, genes created by recombination of 2 parental sequences. We tested this hypothesis using 4SIS (Maynard Smith and Smith 1998), looking for recombined genes by comparing 1 gene with 2 potential parental sequences. Each trwJ copy in the B. henselae strains was tested, using the closest phylogenetic homolog (always being a positional homolog from another strain) as 1 parental sequence, whereas the other parental sequence was assumed to be the second trwJ copy from the same genome as the analyzed gene (for simplicity excluding Bh-MAR trwJ3 from this analysis). Bartonella grahamii trwJ1 was used as an outgroup in all analyses. Three of the 6 analyzed trwJ genes (Bh-H1 trwJ2, Bh-IC11 trwJ1, and Bh-IC11 trwJ2) showed statistically significant evidence of recombination (P , 0.01) (supplementary fig. 2, Supplementary Material online). Similar results were obtained with RDP3 (Martin and Rybicki 2000), with significant recombination signals (P , 0.05) found by at least 3 of the 4 recombination detection methods used (see Materials and Methods) (fig. 3). Even though a much more extensive sampling of strains would be needed in order to fully explain the evolutionary history of the trwJ genes, these examples provide evidence that recombined copies are common and contribute to the divergence of the trwJ genes. In striking contrast to trwJ, the trwI copies formed strong intragenomic clades in the phylogenetic analysis (fig. 2A). Calculating the Shannon–Wiener diversity index in sliding windows along the concatenated alignments of trwJ (excluding the HV region), trwI, trwH, and their preceding spacers showed extreme differences in intragenomic diversity along the segment, with the lowest diversity focused around trwI in all genomes (fig. 4A–F). Given the signal of vertical descent in trwJ and that the trwJIH genes seem to have been copied together, the low intragenomic diversity in trwI is unlikely to be due to recent independent trwJIH duplications in each genome. We also tested whether strong negative selection on trwI could explain the low intragenomic diversity. We calculated the average dN and dS values for each pair of trwI genes between any genome pair and compared with the corresponding values for the single-copy trwFED genes (table 1). Between species, the dN/dS values were somewhat increased for trwI (dN/dS 5 0.22–0.42) compared with trwFED (dN/dS 5 0.08–0.15), indicating that negative selection is not particularly strong on trwI compared with the trwFED genes. Negative selection is also unlikely to operate on synonymous sites, but both dN and dS values were much lower within than between genomes (table 1). Gene conversion between gene copies may explain strong intragenomic homogenization on both synonymous and nonsynonymous sites, without a decreased substitution frequency between genomes. Using runs test of (Sawyer 1989) on the concatenated trwJIH segments above, gene conversion fragments larger than 200 nt were detected in all genomes, covering various parts of the trwI and trwH genes and their preceding spacers (fig. 4A–F). One intergenomic fragment was also detected between the second copy in Bh-H1 and the 2 copies in Bh-IC11, indicating either recombination between strains or representing a segment where gene conversion has not yet distorted the signal of vertical inheritance between the strains. Such partial gene conversion, or recombinations between strains, would introduce artifacts in the distance measures, perhaps explaining the elevated average of the calculated dS values between the B. henselae strains for trwI (dS 5 0.15–0.17) compared with trwFED (dS 5 0.02–0.03) (table 1). Rapid Evolution of trwJ by Short Indels and Repeats Whereas the conserved SP and V regions in trwJ showed stable alignments and contained virtually no indels, the HV region ranged in size from 87 amino acids (Bh-H1 trwJ2) to 191 amino acids (Bg trwJ4). The most notable aspect of the trwJ (HV) alignments between close homologs from the B. henselae strains was the high frequency of indels, even in otherwise almost identical sequence segments, typically without distorting the open reading frame. For example, between Bh-IC11 trwJ2, Bh-MAR trwJ2, and Bh-MAR trwJ3 we found 5 indels of 2, 13, 24, 3 (fig. 5), and 12 nt in size. The indel of 24 nt was associated with a short imperfect repeat of 18 nt, with the first copy present in all 3 strains (fig. 5). Similar patterns of indels were seen in the alignments of the 3 trwJ1 copies of the B. henselae strains, with 6 indels of 18, 36, 30, 9, 3, and 3 nt in size. An imperfect triple repeat of 18 nt was also found in a homologous stretch of the B. tribocorum copies trwJ2 and trwJ5. Some of the indels were flanked by short stretches containing more nucleotide differences than seen elsewhere in the alignment, as observed for example around the 2 and 13 nt gaps in Bh-MAR trwJ3 compared with the Bh-MAR trwJ2 and Bh-ICII trwJ2 (fig. 5). We also used the corrected (dN and dS) and uncorrected (pN and pS) substitution frequencies as a measure of the sequence divergence of the conserved SP þ V regions Evolution of Type IV Secretion Systems 293 Bt trwJ1 Bt trwJ2 Unknown Bt trwJ3 Unknown Bt trwJ4 Bt trwJ5 Bh-MAR trwJ2 Bg trwJ1 Bg trwJ2 Bg trwJ3 Bg trwJ4 Bh-H1 trwJ1 Bh-H1 trwJ1 Bh-H1 trwJ2 Bh-MAR trwJ1 Bh-MAR trwJ2 Unknown Bh-IC11 trwJ1 Bh-MAR trwJ3 Bh-IC11 trwJ1 Bh-IC11 trwJ1 Bh-IC11 trwJ1 Bh-IC11 trwJ2 Bh-IC11 trwJ1 Bh-IC11 trwJ1 Bq trwJ1 Unknown Bq trwJ2 FIG. 3.—Detection of recombinant trwJ copies by RDP3 (Martin and Rybicki 2000). Each dark bar represents a trwJ gene, with the gene name indicated to the upper left corner. White bars below a gene indicate a recombination event, with the presumed parental sequence indicated to the right. Recombination events supported (P , 0.05) by at least 3 of the 4 methods tested (see Materials and Methods) are shown. for all trwJ pairs within and across genomes. In contrast to trwI, the divergence of trwJ (SP þ V) within each genome was as high as the divergence between genomes (supplementary fig. 3, Supplementary Material online), with the maximum pairwise intragenomic divergence of trwJ (SP þ V) for each genome ranging from dN 5 0.26, dS 5 0.44 in B. quintenz to dN 5 0.60, dS 5 2.85 for B. tribocorum To compare, no copy pair of trwFED between any genomes exceeded dN 5 0.10 and dS 5 0.87. The dN/dS (and pN/ pS) ratios were generally higher for the trwJ homologs compared with the single-copy trwFED genes, but always below 1 (the only exception being Bg trw3 with a dN/dS . 1 when compared with Bg trwJ4, Bt trwJ2, and Bt trwJ5) (supplementary figs. 3 and 4, Supplementary Material online). Diverged Clades of the Tandemly Repeated trwL Genes The main pilus component protein TrwL is encoded by 7 to 8 gene copies located in a single tandem array in the investigated Bartonella genomes (fig. 1). The trwL copies displayed segmental conservation patterns as shown by the Shannon–Wiener diversity index (supplementary fig. 6, Supplementary Material online), where the most conserved parts corresponded to 2 transmembrane segments linked by a cytoplasmic region in the second half of the gene. Phylogenetic inference strongly supported a division of the trwL copies into 2 major clades, 1 clade (trwL:B) containing the last gene copy from each genome and 1 clade (trwL:A) containing all the remaining gene copies 294 Nystedt et al. A 1 B 0.5 0.5 0 C 0 500 1000 1500 0 2000 D 1 0.5 0 500 1000 1500 500 1000 1500 2000 0 500 1000 1500 2000 0 500 1000 1500 2000 1 0 2000 1 F 1 0.5 0.5 0 0 0.5 0 E 1 0 500 1000 1500 2000 0 FIG. 4.—Intragenomic diversity index plotted against nucleotide position in the alignments of the repeated segments including trwJ, trwI, and trwH and their upstream spacers. Panels shown for the sequences from (A) B. tribocorum, (B) B. grahamii, (C) B. henselae (Bh-H1), (D) B. henselae (BhMAR), (E) B. henselae (Bh-IC11), and (F) B. quintenz. The bars above the plots indicate the locations and copy numbers of the genes. Putative gene conversion fragments larger than 200 nt as inferred by Sawyer’s test are shown as lines, with the stars indicating fragments between strains. (supplementary fig. 5, Supplementary Material online). The same division into 2 major clades was seen in a splits decomposition network of the trwL genes, also revealing some conflicting signals in the link between the 2 clades (fig. 6). The low bootstrap support values for most of the internal nodes of the trwL:A clade (supplementary fig. 5, Supplementary Material online) were also explained by the network analysis, with many trwL:A genes display- ing long terminal branches radiating out from a shared core network structure (fig. 6). Whereas the trwL:B copies appear to follow the trwFED phylogeny (supplementary fig. 5, Supplementary Material online), the trwL:A copies of B. quintenz, B. grahamii, and B. tribocorum tended to form intragenomic clades, whereas the trwL:A copies from the B. henselae strains seemed to be distributed over the entire tree FIG. 5.—Examples of indels in a region of otherwise high similarity of the HV region of trwJ between positional homologs from the B. henselae strains. The black rectangles indicate an imperfect repeat. The given position refers to the complete trwJ alignment. Evolution of Type IV Secretion Systems 295 trwL:A 0.01 Bh-CH trwL2 Bq trwL2 Bq trwL7 Bh-CH trwL1 Bh-IC11 trwL2 Bq trwL1 Bh-IC11 trwL1 Bq trwL3 Bq trwL6 Bq trwL4 Bq trwL5 Bh-H1 trwL4 Bh-MAR trwL4 Bg trwL4 Bg trwL7 Bg trwL6 Bg trwL5 Bh-H1 trwL3 Bh-MAR trwL3 Bh-CH trwL4 Bh-IC11 trwL4 Bh-IC11 trwL5 Bh-CH trwL5 Bh-MAR trwL1 Bh-H1 trwL1 Bh-H1 trwL2 Bh-MAR trwL2 Bh-CHtrwL3 Bh-IC11 trwL3 Bh-CH trwL6 Bh-IC11 trwL6 Bh-MAR trwL7 Bh-H1 trwL7 Bh-H1 trwL5 Bh-MAR trwL5 Bg trwL3 Bg trwL2 Bg trwL1 Bt trwL4 Bt trwL2 Bt trwL3 Bt trwL6 Bt trwL5 Bt trwL1 Bh-H1trwL6 Bh-MAR trwL6 trwL:B Bg trwL8 Bt trwL7 Bq trwL8 Bh-H1 trwL8 Bh-MAR trwL8 Bh-CH trwL7 Bh-IC11 trwL7 FIG. 6.—A splits decomposition network of the trwL nucleotide alignments. The trwL:A and the trwL:B clades are indicated with dashed lines. (fig. 6, supplementary fig. 5 [Supplementary Material online]). Unlike the homogenized trwI genes, the speciesspecific clades of trwL seemed to be completely independent of the trwFED phylogeny, with every species clade being roughly equidistant to every other species clade. This was quantified for B. tribocorum, using the average synonymous substitution frequencies for all pairwise comparisons of the trwL:A copies in 2 genomes as a distance measure. Comparing B. tribocorum with the other species gave high and virtually identical dS averages (Bt:Bg 1.4, Bt:Bq 1.5, Bt:Bh 1.3–1.5), which can be compared with the singlecopy trwFED genes, where the dS values are considerably lower and directly reflect the trwFED phylogeny (Bt:Bg 0.37, Bt:Bq 0.84, Bt:Bh 0.85–0.86). The average intragenomic dS value of the trwL:A genes in B. tribocorum is 0.25, thus approaching the magnitude of the trwFED genes in the 2 distinct species B. tribocorum and B. grahamii (dS 5 0.37), and 25 times larger than the average dS values of the intragenomic clade of the B. tribocorum trwI genes (dS 5 0.01). Horizontal Transfer of a Large trwL Segment The 4 B. henselae strains represent 2 completely different versions of the trwL region. Bh-H1 and Bh-MAR are very similar over most of the trwL region, and so are BhIC11 and Bh-CH (fig. 7A). From the alignments, we identified the exact boundaries of this pairwise similarity, stretching from the end region of korA to the first half of the trwL7 (Bh-H1:Bh-MAR) and trwL6 (Bh-IC11:BhCH) copies, respectively (fig. 7B and C), thus including all the genes in the trwL:A clade identified above. We inferred the closest homolog for each of the trwL:A copies between each pair of strains based on the lowest synonymous substitution frequency (excluding the last copy in the array). Within each group, the synonymous substitution frequencies for the closest homologs range from dS 5 0 to 0.16 (average 0.04) for Bh-H1:Bh-MAR and dS 5 0 to 0.001 (average 0.002) for Bh-IC11:Bh-CH, showing complete gene order conservation and .99% nt sequence identity over the complete region, including spacers. This is 296 Nystedt et al. korA L1 L2 L3 L4 A L5 L6 L7 L8 Bh-H1 90–100% 80–90% 70–80% Bh-MAR Bh-IC11 Bh-CH korA L1 L2 L3 L4 L5 L6 L7 B C 100nt Bh-H1 100nt Bh-H1 Bh-IC11 Bh-IC11 Bh-MAR Bh-MAR Bh-CH Bh-CH FIG. 7.—Analysis of the trwL region, showing distinctly different variants among the Bh strains. (A) Each gene is connected to its closest homolog in the genome outlined above/below, with the percent nucleotide identity indicated according to the grayscale (upper right). The korA gene and the trwL genes (L1–L8) are indicated. (B, C) Alignment visualization of the 5# (B) and 3# (C) ends of a segment with strong pairwise similarity among the 4 strains. Arrows represent the aligned genes of the left (B) and right (C) framed areas in (A). At each position in the alignment, sequences with identical nucleotides are connected. For visibility, nonsegregating sites are not connected. Lines and stars above (or below) the genes indicate gaps and unique sites, respectively. consistent with the estimated distances between strains as inferred from the trwFED genes (table 1). In striking contrast, dS values for the closest trwL pairs between representatives for the 2 groups range from dS 5 0.07 to 1.18 (average 0.59) for Bh-MAR:Bh-ICII. These values overlap with the values for the closest homologs between the B. henselae strains and B. quintenz, for example, dS 5 0.81– 1.43 (average 1.05) for Bh-MAR:Bq. Because the dS values for single-copy genes located within and around the trw region are similar between all the B. henselae strains (sequence data are lacking for these genes in Bh-CH), the differences between the trwL arrays cannot be explained by a gradual process of nucleotide substitutions and gene duplications/deletions. Instead, the data suggest a replacement of this large trwL segment by a recent horizontal gene transfer into one of the lineages from a rather distantly related strain. Upstream of the horizontally transferred trwL segment, the 4 B. henselae strains display between all high similarity (fig. 7B). However, at the downstream end of the segment, the Bh-H1 strain clearly differed from the 3 other strains (fig. 7C) throughout the 2 last genes in the trwL array including the intervening spacer and to some degree also in trwM and at the beginning of trwK. Segmental differences in sequence similarity were observed for trwK between the B. henselae strains (supplementary fig. 7A, Supplementary Material online), and significant (P , 0.05) recombination signals in this gene were detected among the B. henselae strains by at least 3 of the 4 methods tested with RDP3 (see Materials and Methods) (supplementary fig. 7B, Supplementary Material online). Gene Conversion on a Genome Scale In order to determine whether gene conversion is specifically targeting the trw locus, we roughly estimated the overall gene conversion pattern for repeated genes in the B. henselae and B. quintenz genomes. On the complete gene set of B. henselae and B. quintenz genes, we used Blastclust to detect repeat gene families with at least 3 members (both genomes combined) with more than 60% sequence similarity over more than 50% of their length. This resulted in 35 clusters, with the number of members ranging from 3 to 20 (average 5.3 members per cluster), excluding the clusters formed by the repeated trw genes and 2 clusters with too few polymorphisms for analysis. In 12 of the 35 clusters (34%), intragenomic gene conversion fragments larger than 100 nt were detected, suggesting that gene conversion occurs frequently among duplicated genes in Bartonella. Discussion In this paper, we present the first detailed evolutionary analysis of the Bartonella T4SS, Trw, which is required for invasion of the erythrocytes in the mammalian host (Schulein et al. 2001). This system was recently acquired to one of the Bartonella lineages by horizontal gene Evolution of Type IV Secretion Systems 297 transfer, followed by a functional switch from plasmid conjugation to host association (Seubert et al. 2003; Frank et al. 2005; Fernandez-Lopez et al. 2006). The Bartonella trw cluster contains duplications unique among T4SSs, providing an excellent model system for dissecting the molecular evolutionary forces of host-interaction proteins versus intracellular proteins, and the evolutionary dynamics of gene duplications. Gene duplication is one of the most important sources of gene innovation (Ohno 1970; Gevers et al. 2004). An important feature of tandem duplications is that the initial duplication event is the most unlikely, whereas subsequent duplication–deletions easily occur by homologous recombination during replication. As a consequence, natural selection is needed in order to maintain tandem duplications within a population (Pettersson et al. 2005). The advantages of multiple gene copies may be 1) increased expression (e.g., dosage compensation of defective enzymes), 2) variability within the genome (e.g., antigenic variation), and 3) increased potential for new functions (e.g., neo- or subfunctionalization). The Trw system in Bartonella appears to display examples of all the above dynamics, likely providing the variation and innovation required both for the initial functional switch and for successful host adaptation and chronic infection. A notable feature of the duplicated pilus component TrwJ is the extreme intragenomic divergence and the rapid evolution with frequent indels scattered over the first half of the trwJ homologs, resulting in a 2-fold difference in size of this region (from 87 amino acids in Bh-H1 trwJ2 to 191 amino acids in Bg trwJ4). A similar, but less pronounced, divergence is seen in the surface-exposed regions of the duplicated major pilus protein TrwL. One possible explanation for the sequence variability among the gene duplicates is that one of the copies is expressed, whereas the others serve as sequence reservoirs that feed divergences into the expressed site, as demonstrated for many exocellular proteins (Hagblom et al. 1985; Plasterk et al. 1985; Haas and Meyer 1986; Brayton et al. 2002, 2005). However, none of the duplicated trw genes in Bartonella appear to be silent, because all copies are complete open reading frames preceded by ribosomal binding sites (Schroder and Dehio 2005), and evolve under purifying selection (dN/dS , 1). Likewise, expression of the duplicated trwJIH variants is probably coregulated, with conserved putative KorA/KorB-regulated promoters preceding all copies (Schroder and Dehio 2005). Reoccurring rapid gene amplification, followed by divergent loss of duplicated copies has been suggested as a common molecular mechanism to select for mutations in demand (Andersson et al. 1998). In this model, selection for dosage compensation may rapidly increase the gene copy number to the order of hundreds and then quickly revert to a single copy when a favorable allele has emerged (Andersson et al. 1998; Pettersson et al. 2005). Although duplication–deletions clearly represent important mutational processes in the evolution of the trw cluster, this specific model would be difficult to reconcile with the patterns of vertical descent of the trwJ copies. Yet another possibility for the escalation of sequence divergence would be that this region is replicated separately from the rest of the genome during later growth stages (Lindroos et al. 2006), which could induce a different spectrum of mutations in this part of the genome. However, the trwFED genes as well as the genes flanking the trw cluster evolve at similar substitution rate as the Bartonella core genes, suggesting that an altered pattern or rate of nucleotide changes in the region covering the trw cluster cannot explain the data. We favor divergence mechanisms that involve combinatorial sequence shuffling between copies through recombination events within and between relatively distantly related lineages. With multiple tandem copies such as in the trwL array, the pool of genetic variants can easily be increased by new chimerical sequences created by intragenomic shuffling and recurrent duplication–deletions of genes. If Bartonella lineages can also recombine with each other, the potential for novel genetic diversity increases dramatically. Indeed, the extensive divergence of (Bh-H1:BhMAR) compared with (Bh-IC11:Bh-CH) in the trwL array is difficult to reconcile without invoking horizontal gene transfer from a rather distantly related lineage. Likewise, the many unique sequence stretches between otherwise closely related trwJ homologs in the B. henselae strains might also be explained by frequent recombination events from a large pan-genomic pool of divergent trwJ alleles. Importantly, the dynamic of this system is so high that it may well explain the ‘‘apparent absences’’ in some species of virB2 and virB7 (homologs of trwL and trwJ, respectively) as indicated by ordinary Blast searches (Medini et al. 2006). Thus, our results refute models suggesting that T4SSs have been built up in a stepwise manner by independent recruitment of units to the complex (Medini et al. 2006). We infer duplication–deletions, gene conversion between duplicated genes and sequence exchange between genomes to be part of the mutational engine that underlies the dynamic structure of the trw cluster, with the outcome being dependent on the selective constraints driving the evolution of the individual genes. Selection for Erythrocyte Binding on trwJ The trw pili could be targets for the immune response, as demonstrated for the YscF protein at the base of the T3SS in Yersinia (Matson et al. 2005), possibly explaining the rapid evolution and the sequence divergence between the Bartonella isolates. However, in lack of evidence for differential expression of the pilus protein gene copies, it appears for now as if all structural variants are exposed simultaneously on the cell surface, making it difficult to explain the strong intragenomic divergence solely in terms of an immune evasion mechanism. An increased binding potential of the erythrocytes has been suggested as an alternative hypothesis to explain the variability in the Trw pilus proteins in Bartonella (Seubert et al. 2003). Mammalian erythrocytes are known to have highly variable glycoprotein surface structures that have been suggested to evolve under 2 opposing selection pressures (Gagneux and Varki 1999). Because the nonnucleated erythrocytes cannot serve as hosts for viruses, a broad erythrocyte-binding potential may act as a decoy defense 298 Nystedt et al. mechanism by distracting viruses from their target tissues (Gagneux and Varki 1999), whereas pathogens as for example the malarian parasite Plasmodium falciparum use the same glycoprotein structures for adherence and invasion to the erythrocytes themselves (Orlandi et al. 1992; Sim et al. 1994). The erythrocyte surface protein glycophorin A has been shown to be subject to both positive and diversifying selection in humans and other primates (Baum et al. 2002; Altheide et al. 2006). Likewise, and possibly as a consequence, the erythrocyte-binding antigen-175 in P. falciparum and duffy binding protein in Plasmodium vivax have also been shown to be subject to diversifying selection (Baum et al. 2003). Given the extreme variation in the erythrocyte surface structures both within and between individual host species, selection for a diverse arsenal of binding structures appears likely and may explain the extreme variation of the trwJ gene copies within and between isolates, whereas evasion of the host immune system may explain the observed rapid evolution of this pilus protein. It is well known that B. henselae is able to invade the erythrocytes in their natural feline hosts but not in incidentally infected humans. The molecular basis for this host specificity is not known, but the Trw pilus is a prime suspect in that it is required for the erythrocyte invasion in vivo (Seubert et al. 2003). If so, host specificity may also explain the need for a larger number of trwJ copies in the rodentassociated B. tribocorum and B. grahamii that are believed to have a wider host range compared with the feline- and human-associated B. henselae and B. quintana. Selection for Functional Conservation of trwI In contrast to the highly diverged pilus proteins, the trwI (and trwH) copies are nearly identical within genomes and show a normal rate of protein evolution between lineages, despite being located on the same duplicated fragment as trwJ. This pattern resembles multiple rRNA genes and their repeated sequences (Liao 2000; Ganley and Kobayashi 2007), and as for the rRNA genes, we suggest that concerted evolution through frequent fixation of gene conversion events between gene copies best explains the observed patterns for trwI. The observation that intragenomic trwI copies appear less diverged than their upstream spacer regions suggests that the homogenization is a selected feature in trwI. If so, what would be the selective advantages behind homogenization of the anchoring proteins? As for the VirB T4SS complex in A. tumefaciens (Dang et al. 1999; Sagulenko et al. 2001), the tight interactions of the Trw anchoring proteins will likely impose strong restrictions on amino acid substitutions, where a substitution in one protein may require a compensatory substitution in another protein (Moyle et al. 1994; Goh et al. 2000; Mintseris and Weng 2005) (even though this view has been recently questioned; Hakes et al. 2007). Multiple copies of an anchoring gene may thus pose a serious problem because all copies must fit with the rest of the structure. Frequent gene conversion offers an opportunity for continuous homogenization, efficiently purging deleterious mutations that arise in one copy, while also allowing for beneficial compensatory substitutions in one gene to quickly spread to the other copies. The homogenized trwI copies are likely to be hot spots for homologous recombination during replication, thereby preserving the duplication–deletion dynamics of the region, despite the high divergence of the trwJ copies. Selection for Subfunctionalization and Pilus Elongation in trwL The 2 distinct phylogenetic clades of the genes coding for the major pilus protein TrwL is evidence that 2 trwL gene variants have been retained side-by-side in the Bartonella lineages over a considerable evolutionary time. This pattern is consistent with a functional separation between the 2 clades dating back to, or shortly after, the horizontal transfer of Trw into the Bartonella lineage. In other T4SSs, the TrwL homolog constitutes the main part of the pilus and also attaches the pilus to the outer membrane (Cascales and Christie 2003). We thus speculate that the 2 clades represent a subfunctionalization event, where one clade is responsible for the attachment to the outer membrane and the other clade provides the building blocks for the actual pilus. If so, the increased copy number of 1 clade could result in a relative increase of pilus proteins compared with anchoring proteins and an increase in pilus length that might be advantageous for rapid adhesion to the erythrocyte in the blood stream of the mammalian host. Expression leveldependent pilus lengths have previously been demonstrated for the main pilus protein of the Type III Pilus of Corynebacterium diphteriae (Swierczynski and Ton-That 2006) and is consistent with the idea that retained gene duplications are often immediately beneficial through dosage effects (Kondrashov et al. 2002). Evolutionary Tests and Duplicated Genes Even though the strikingly different evolutionary patterns in trwJ, trwI, and trwL provide strong evidence per se that selective forces are operating on some or all these genes, proper quantitative evolutionary analyses of the trw genes are difficult. Many assumptions underlying standard selective tests, such as D of (Tajima 1989), D* and F* of (Fu and Li 1993), and elevated Ka/Ks ratios are violated by frequent recombination, rapid evolution, and nonindependent substitutions between sites. Because duplications, recombination, and rapid evolution appear to be recurrent themes for host-interaction proteins, evaluating the influence of selection versus neutrality remains a major challenge for these important systems. In order to better understand how and why new infectious disease agents emerge, we need better evolutionary models, broadening our horizon beyond point mutations. This will require massive pan-genome sampling as well as the development of new statistical tools. Supplementary Material Supplementary figures 1–7 are available at Molecular Biology and Evolution online (http://www.mbe. oxfordjournals.org/). Evolution of Type IV Secretion Systems 299 Acknowledgments We thank Petter Johansson and Anna Åsman for assistance with the experimental work. This work was supported by the Linnaeus Centre for Bioinformatics in Uppsala (M. Thollesson) and the Swedish Research Council, the Swedish Foundation for Strategic Research, the Knut and Alice Wallenberg Foundation, the Göran Gustzfsson Foundation, and the European Union (S.G.E. Andersson). Literature Cited Abdulkarim F, Hughes D. 1996. Homologous recombination between the tuf genes of Salmonella typhimurium. J Mol Biol. 260:506–522. Alsmark CM, Frank AC, Karlberg EO, et al. (13 co-authors). 2004. The louse-borne human pathogen Bartonella quintana is a genomic derivative of the zoonotic agent Bartonella henselae. Proc Natl Acad Sci USA. 101:9716–9721. Altheide TK, Hayakawa T, Mikkelsen TS, Diaz S, Varki N, Varki A. 2006. System-wide genomic and biochemical comparisons of sialic acid biology among primates and rodents: evidence for two modes of rapid evolution. J Biol Chem. 281:25689–25702. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol. 215:403–410. Andersson DI, Slechta ES, Roth JR. 1998. Evidence that gene amplification underlies adaptive mutability of the bacterial lac operon. Science. 282:1133–1135. Baum J, Thomas AW, Conway DJ. 2003. Evidence for diversifying selection on erythrocyte-binding antigens of Plasmodium falciparum and P. vivax. Genetics. 163:1327–1336. Baum J, Ward RH, Conway DJ. 2002. Natural selection on the erythrocyte surface. Mol Biol Evol. 19:223–229. Brayton KA, Kappmeyer LS, Herndon DR, Dark MJ, Tibbals DL, Palmer GH, McGuire TC, Knowles DP Jr. 2005. Complete genome sequencing of Anaplasma marginale reveals that the surface is skewed to two superfamilies of outer membrane proteins. Proc Natl Acad Sci USA. 102:844–849. Brayton KA, Palmer GH, Lundgren A, Yi J, Barbet AF. 2002. Antigenic variation of Anaplasma marginale msp2 occurs by combinatorial gene conversion. Mol Microbiol. 43: 1151–1159. Cascales E, Christie PJ. 2003. The versatile bacterial type IV secretion systems. Nat Rev Microbiol. 1:137–149. Cascales E, Christie PJ. 2004. Definition of a bacterial type IV secretion pathway for a DNA substrate. Science. 304: 1170–1173. Dang TA, Zhou XR, Graf B, Christie PJ. 1999. Dimerization of the Agrobacterium tumefaciens VirB4 ATPase and the effect of ATP-binding cassette mutations on the assembly and function of the T-DNA transporter. Mol Microbiol. 32:1239– 1253. Deitsch KW, Moxon ER, Wellems TE. 1997. Shared themes of antigenic variation and virulence in bacterial, protozoal, and fungal infections. Microbiol Mol Biol Rev. 61:281–293. Fernandez-Lopez R, Garcillan-Barcia MP, Revilla C, Lazaro M, Vielva L, de la Cruz F. 2006. Dynamics of the IncW genetic backbone imply general trends in conjugative plasmid evolution. FEMS Microbiol Rev. 30:942–966. Frank AC, Alsmark CM, Thollesson M, Andersson SG. 2005. Functional divergence and horizontal transfer of type IV secretion systems. Mol Biol Evol. 22:1325–1336. Frank AC, Berglund E, Andersson SGE. 2006. The genomes of pathogenics, pathogenic Bartonella species. In: Hacker J, Dobrindt U, editors. Pathogenomics. Weinheim (Germany): Wiley-VCH Verlag GmbH & Co. p. 281–300. Fu YX, Li WH. 1993. Statistical tests of neutrality of mutations. Genetics. 133:693–709. Gagneux P, Varki A. 1999. Evolutionary considerations in relating oligosaccharide diversity to biological function. Glycobiology. 9:747–755. Ganley AR, Kobayashi T. 2007. Highly efficient concerted evolution in the ribosomal DNA repeats: total rDNA repeat variation revealed by whole-genome shotgun sequence data. Genome Res. 17:184–191. Gevers D, Vandepoele K, Simillon C, Van de Peer Y. 2004. Gene duplication and biased functional retention of paralogs in bacterial genomes. Trends Microbiol. 12:148–154. Goh CS, Bogan AA, Joachimiak M, Walther D, Cohen FE. 2000. Co-evolution of proteins with their interaction partners. J Mol Biol. 299:283–293. Gray RR, Mulligan CJ, Molini BJ, Sun ES, Giacani L, Godornes C, Kitchen A, Lukehart SA, Centurion-Lara A. 2006. Molecular evolution of the tprC, D, I, K, G, and J genes in the pathogenic genus Treponema. Mol Biol Evol. 23:2220–2233. Guttman DS, Gropp SJ, Morgan RL, Wang PW. 2006. Diversifying selection drives the evolution of the type III secretion system pilus of Pseudomonas syringe. Mol Biol Evol. 23:2342–2354. Haas R, Meyer TF. 1986. The repertoire of silent pilus genes in Neisseria gonorrhoeae: evidence for gene conversion. Cell. 44:107–115. Hagblom P, Segal E, Billyard E, So M. 1985. Intragenic recombination leads to pilus antigenic variation in Neisseria gonorrhoeae. Nature. 315:156–158. Hakes L, Lovell SC, Oliver SG, Robertson DL. 2007. Specificity in protein interactions and its relationship with sequence diversity and coevolution. Proc Natl Acad Sci USA. 104:7999–8004. Jobb G, von Haeseler A, Strimmer K. 2004. TREEFINDER: a powerful graphical analysis environment for molecular phylogenetics. BMC Evol Biol. 4:18. Käll L, Krogh A, Sonnhammer EL. 2005. A HMM posterior decoder for sequence feature prediction that includes homology information. Bioinformatics. 21:i251–i257. Kondrashov FA, Rogozin IB, Wolf YI, Koonin EV. 2002. Selection in the evolution of gene duplications. Genome Biol. 3:R8:1–9. Korber B. 2000. HIV signature and sequence variation analysis. In: Rodrigo AG, Learn GH, editors. Computational analysis of HIV molecular sequences. Dordrecht (The Netherlands): Kluwer Academic Publishers. p. 55–72. Krall L, Wiedemann U, Unsin G, Weiss S, Domke N, Baron C. 2002. Detergent extraction identifies different VirB protein subassemblies of the type IV secretion machinery in the membranes of Agrobacterium tumefaciens. Proc Natl Acad Sci USA. 99:11405–11410. Lassmann T, Sonnhammer EL. 2005. Kalign–an accurate and fast multiple sequence alignment algorithm. BMC Bioinformatics. 6:298. Liao D. 2000. Gene conversion drives within genic sequences: concerted evolution of ribosomal RNA genes in bacteria and archaea. J Mol Evol. 51:305–317. Lindroos H, Vinnere O, Mira A, Repsilber D, Naslund K, Andersson SG. 2006. Genome rearrangements, deletions, and amplifications in the natural population of Bartonella henselae. J Bacteriol. 188:7426–7439. Loytynoja A, Milinkovitch MC. 2001. SOAP, cleaning multiple alignments from unstable blocks. Bioinformatics. 17:573– 574. 300 Nystedt et al. Ma W, Dong FF, Stavrinides J, Guttman DS. 2006. Type III effector diversification via both pathoadaptation and horizontal transfer in response to a coevolutionary arms race. PLoS Genet. 2:e209. Martin D, Rybicki E. 2000. RDP: detection of recombination amongst aligned sequences. Bioinformatics. 16:562–563. Matson JS, Durick KA, Bradley DS, Nilles ML. 2005. Immunization of mice with YscF provides protection from Yersinia pestis infections. BMC Microbiol. 5:38. Maynard Smith J. 1992. Analyzing the mosaic structure of genes. J Mol Evol. 34:126–129. Maynard Smith J, Smith NH. 1998. Detecting recombination from gene trees. Mol Biol Evol. 15:590–599. Medini D, Covacci A, Donati C. 2006. Protein homology network families reveal step-wise diversification of type III and type IV secretion systems. PLoS Comp Biol. 12:e173. Mintseris J, Weng Z. 2005. Structure, function, and evolution of transient and obligate protein-protein interactions. Proc Natl Acad Sci USA. 102:10930–10935. Moyle WR, Campbell RK, Myers RV, Bernard MP, Han Y, Wang X. 1994. Co-evolution of ligand-receptor pairs. Nature. 368:251–255. Noormohammadi AH, Markham PF, Kanci A, Whithear KG, Browning GF. 2000. A novel mechanism for control of antigenic variation in the haemagglutinin gene family of Mycoplasma synoviae. Mol Microbiol. 35:911–923. Ohno S. 1970. Evolution by gene duplication. New York: Springer. Orlandi PA, Klotz FW, Haynes JD. 1992. A malaria invasion receptor, the 175-kilodalton erythrocyte binding antigen of Plasmodium falciparum recognizes the terminal Neu5Ac (alpha 2-3) Gal- sequences of glycophorin. A J Cell Biol. 116:901–909. Padidam M, Sawyer S, Fauquet CM. 1999. Possible emergence of new geminiviruses by frequent recombination. Virology. 265:218–225. Pettersson ME, Andersson DI, Roth JR, Berg OG. 2005. The amplification model for adaptive mutation: simulations and analysis. Genetics. 169:1105–1115. Plasterk RH, Simon MI, Barbour AG. 1985. Transposition of structural genes to an expression sequence on a linear plasmid causes antigenic variation in the bacterium Borrelia hermsii. Nature. 318:257–263. Posada D, Crandall KA. 1998. Modeltest: testing the model of DNA substitution. Bioinformatics. 14:817–818. Posada D, Crandall KA. 2001. Evaluation of methods for detecting recombination from DNA sequences: computer simulations. Proc Natl Acad Sci USA. 98:13757–13762. Sagulenko E, Sagulenko V, Chen J, Christie PJ. 2001. Role of Agrobacterium VirB11 ATPase in T-pilus assembly and substrate selection. J Bacteriol. 183:5813–5825. Santoyo G, Romero D. 2005. Gene conversion and concerted evolution in bacterial genomes. FEMS Microbiol Rev. 29:165–183. Sawyer S. 1989. Statistical tests for detecting gene conversion. Mol Biol Evol. 6:526–538. Sawyer S. 1999. GENECONV: a computer package for the statistical detection of gene conversion. Distributed by the author. St Louis (MO): Department of Mathematics, Washington University. Schroder G, Dehio C. 2005. Virulence-associated type IV secretion systems of Bartonella. Trends Microbiol. 13:336–342. Schulein R, Seubert A, Gille C, Lanz C, Hansmann Y, Piemont Y, Dehio C. 2001. Invasion and persistent intracellular colonization of erythrocytes. A unique parasitic strategy of the emerging pathogen Bartonella. J Exp Med. 193:1077–1086. Seubert A, Hiestand R, de la Cruz F, Dehio C. 2003. A bacterial conjugation machinery recruited for pathogenesis. Mol Microbiol. 49:1253–1266. Sim BK, Chitnis CE, Wasniowska K, Hadley TJ, Miller LH. 1994. Receptor and ligand domains for invasion of erythrocytes by Plasmodium falciparum. Science. 264:1941–1944. Stavrinides J, Ma W, Guttman DS. 2006. Terminal reassortment drives the quantum evolution of type III effectors in bacterial pathogens. PLoS Pathog. 2:e104. Swierczynski A, Ton-That H. 2006. Type III pilus of corynebacteria: pilus length is determined by the level of its major pilin subunit. J Bacteriol. 188:6318–6325. Tajima F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 123:585–595. Yang Z. 2000. Phylogenetic analysis by maximum likelihood. PAML. Version 3.0. London: University College. Yang Z, Nielsen R. 2000. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol Biol Evol. 17:32–43. Yeo HJ, Yuan Q, Beck MR, Baron C, Waksman G. 2003. Structural and functional characterization of the VirB5 protein from the type IV secretion system encoded by the conjugative plasmid pKM101. Proc Natl Acad Sci USA. 100: 15947–15952. Zeaiter Z, Fournier PE, Ogata H, Raoult D. 2002. Phylogenetic classification of Bartonella species by comparing groEL sequences. Int J Syst Evol Microbiol. 52:165–171. Zhang JR, Norris SJ. 1998. Genetic variation of the Borrelia burgdorferi gene vlsE involves cassette-specific, segmental gene conversion. Infect Immun. 66:3698–3704. Edward Holmes, Associate Editor Accepted November 11, 2007
© Copyright 2026 Paperzz