Diversifying Selection and Concerted Evolution

Diversifying Selection and Concerted Evolution of a Type IV Secretion System
in Bartonella
Björn Nystedt,*1 A. Carolin Frank,*1,2 Mikael Thollesson,* and Siv G. E. Andersson*
*Department of Molecular Evolution, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
We have studied the evolution of a type IV secretion system (T4SS), in Bartonella, which is thought to have changed
function from conjugation to erythrocyte adherence following a recent horizontal gene transfer event. The system, called
Trw, is unique among T4SSs in that genes encoding both exo- and intracellular components are located within the same
duplicated fragment. This provides an opportunity to study the influence of selection on proteins involved in host–
pathogen interactions. We sequenced the trw locus from several strains of Bartonella henselae and investigated its
evolutionary history by comparisons to other Bartonella species. Several instances of recombination and gene conversion
events where detected in the 2- to 5-fold duplicated gene fragments encompassing trwJIH, explaining the
homogenization of the anchoring protein TrwI and the divergence of the minor pilus protein TrwJ. A phylogenetic
analysis of the 7- to 8-fold duplicated gene coding for the major pilus protein TrwL displayed 2 distinct clades, likely
representing a subfunctionalization event. The analyses of the B. henselae strains also identified a recent horizontal
transfer event of almost the complete trwL region. We suggest that the switch in function of the T4SS was mediated by
the duplication of the genes encoding pilus components and their diversification by combinatorial sequence shuffling
within and among genomes. We suggest that the pilus proteins have evolved by diversifying selection to match
a divergent set of erythrocyte surface structures, consistent with the trench warfare coevolutionary model.
Introduction
A common theme for bacteria that colonize eukaryotes
is the presence of variable surface-exposed proteins that coevolve with the host proteins. These bacteria experience selection for variability on 2 timescales; within individual
hosts to enable binding to a diverse set of host cell surface
proteins and to avoid immune system recognition, and between hosts to increase survival in genetically different
hosts. Two coevolutionary models of host–pathogen interactions have been proposed. The trench warfare model suggests that diversifying selection on bacterial surface
proteins maintains genetic polymorphisms in the population. The coevolutionary arms race model suggests back
and forth evolution that oscillates between the offense imposed by the pathogen on the host and the subsequent evolution of defense responses by the host. In this model,
positive selection on favorable alleles yields genetic variation over time, yet produces little variation within the
population.
Bacterial secretion systems are surface-exposed pili
that mediate the transfer of effector molecules into recipient
host cells. Studies of the pilus protein HrpA of the type III
secretion system (T3SS) in the plant pathogen Pseudomonas syringae provide evidence for diversifying selection according to the trench warfare model (Guttman et al. 2006),
whereas novel variants of the effector protein alleles are selected during host switching in line with the coevolutionary
arms race model (Ma et al. 2006). Mechanistically, a recombinatorial process called terminal reassortment has been
proposed to explain the extremely rapid evolution of the
T3SS effectors when compared between a wide range of
bacterial species (Stavrinides et al. 2006).
1
These authors contributed equally to this work.
Present address: Smurfit Institute of Genetics, University of
Dublin, Trinity College Dublin, Dublin, Ireland.
Key words: type IV secretion system, Bartonella, duplication,
recombination, gene conversion, host–pathogen interaction.
2
E-mail: [email protected].
Mol. Biol. Evol. 25(2):287–300. 2008
doi:10.1093/molbev/msm252
Advance Access publication December 7, 2007
Ó The Author 2007. Published by Oxford University Press on behalf of
the Society for Molecular Biology and Evolution. All rights reserved.
For permissions, please e-mail: [email protected]
The focus of this study is on the evolution of type IV
secretion systems (T4SSs). Prototype systems are found in
the plant pathogen Agrobacterium tumefaciens that uses
a T4SS called VirB to transfer oncogenic DNA to its host,
the whooping cough agent Bordetella pertussis that translocates a toxin into the host using a T4SS, and the humanassociated gastric bacterium Helicobacter pylori that
transfers the effector protein CagA with the aid of its
T4SS. A phylogenetic analysis of T4SSs showed that they
have originated from plasmid conjugation systems multiple
times independently (Frank et al. 2005). The system we
have studied is the trw T4SS in Bartonella, which is thought
to have switched function recently during its horizontal
transfer into the Bartonella lineage from a role in plasmid
conjugation to its current role in host interaction (Seubert
et al. 2003; Frank et al. 2005; Fernandez-Lopez et al. 2006).
Bartonella species are vector-borne bacteria that invade and
replicate within the host erythrocytes without normally
shortening the erythrocyte life span (Schulein et al.
2001). The Trw system is essential for erythrocyte colonization and is thought to mediate adherence to these host
cells (Seubert et al. 2003).
The Bartonella Trw system is unique among T4SSs in
that several genes in the cluster are duplicated (Seubert et al.
2003; Alsmark et al. 2004). For example, the gene encoding
the major component of the T-pilus, trwL is present in 7 or 8
copies. In addition, a segment encompassing trwJ, trwH,
and trwI is present in variable copy numbers in the different
species. By homology to other T4SS, TrwJ is thought to
mediate host cell attachment (Yeo et al. 2003), whereas
TrwH and TrwI are associated with the anchoring of the
pilus to the outer and inner membranes (Krall et al.
2002; Cascales and Christie 2004).
Studies of duplicated genes for other outer membrane
proteins suggest that gene conversion is one of the most important mechanisms for generating diversity within a host
(Deitsch et al. 1997). This can be manifested as the expression of alternative copies of surface proteins by the transfer of
short donor sequence fragments, called functional pseudogenes, into a recipient expression cassette (Santoyo and
Romero 2005). The power of combinatorial gene conversion
288 Nystedt et al.
N
Strain
0.01
100
korB
L M
korA
K
J
I
D
E
Host species
Bt
7
5†
Bg
8
4
Wood mouse
Bh-IC11
(ST1)
7
2
Cat
Bh-H1
(ST1)
8
2
Cat‡
Bh-MAR
(ST8)
8
3
Cat‡
Bh-CH*
(ST4)
7
Bq
8
100
97
H G F
Brown rat
Cheetah
2
Human
FIG. 1.—Phylogeny and gene order structure of the trw locus in Bartonella. Maximum likelihood phylogeny as inferred from the concatenated
alignments of the single-copy genes trwFED (left). Schematic representation of the Trw region with tandem repeat segments within brackets (top right).
The trwD–trwN genes (D–N) and the regulatory genes korA and korB are indicated. The copy numbers and host species for each strain are outlined
(right). ST: sequence type. * Only the trwL region has been sequenced for Bh-CH, and thus this sequence is not included in the trwFED phylogeny.
The trwH gene is lacking from the third tandem repeat of this segment in Bt. à The strain was isolated from an incidentally infected human patient.
to generate functional diversity can be illustrated with the major antigen (encoded by the tprK gene) in Treponema pallidum (Gray et al. 2006): at least 420,000 different sequence
variants of this protein can be generated from 47 variable
pseudogene sequences and 7 variable (V) regions in the
expressed tprK gene (Santoyo and Romero 2005). The contribution of combinatorial gene conversion to antigenic variation of surface structures is by now well established in
many pathogens, including Anaplasma marginale (Brayton
et al. 2002, 2005), Borrelia hermsii (Plasterk et al. 1985), Borrelia burgdorferi (Zhang and Norris 1998), Neisseria gonorrhoeae (Hagblom et al. 1985; Haas and Meyer 1986), and
Mycoplasma synoviae (Noormohammadi et al. 2000).
Gene conversion can also cause members of a multigene family to evolve in concert, as has been reported for
essential cell components such as the tuf gene encoding
elongation factor Tu and the ribosomal RNA (rRNA) genes.
Studies of the duplicated tuf genes indicate transfer rates in
the order of 108 to 109 per cell division (Abdulkarim and
Hughes 1996), which is higher than the estimated nucleotide
substitution rate. However, mutations in recombination–
repair genes, such as recAB and mutSHL have been demonstrated to affect these frequencies, suggesting that the
intrinsic rate of gene conversion may be species specific.
Until now, the influence of gene conversion on homogenization and diversification has been studied separately for cytoplasmic and outer membrane proteins,
making it difficult to assess the relative contribution of mutation and selection to the observed patterns. Here, the trw
cluster comes in handy: 1 of the duplicated regions contains
genes for both exocellular (TrwJ) and intracellular pilus
components (TrwI), providing the opportunity to start decoding the evolutionary forces that promotes shifts in the
balance between homogenization and diversification.
In this study, we have examined the molecular evolution of the duplicated genes in the trw cluster using sequence data from 4 Bartonella species from various
natural hosts, including humans, felines, and rodents. We
show that genes with rapid evolution and extreme intragenomic sequence divergence are found on the same coduplicated segment as genes homogenized by gene conversion.
This discrepancy is directly related to the intra- versus exocellular location of the encoded proteins. We also provide
evidence for horizontal gene transfer, and finally we discuss
the possible selective forces and dynamics shaping the evolution of duplicated genes for host-interaction proteins.
Materials and Methods
Sequences and Accession Numbers
This study includes sequence data from 4 species of
Bartonella and 4 strains of Bartonella henselae (fig. 1).
Data were taken from the genome sequences of B. henselae
strains Houston-1 (Bh-H1) (NC_005956) and Bartonella
quintana (Bq) (NC_005955) and the trw gene sequences
from Bartonella tribocorum (Bt) (AJ496288) and Bartonella grahamii (Bg) (AM905248). We performed de novo sequencing by fosmid library construction of the complete
trw region from the 2 B. henselae strains Marseille (BhMAR) (AM905245) and IndoCat-11 (Bh-IC11)
(AM905246) and of the trwL region (including korA) from
a long-range polymerase chain reaction (PCR) product of B.
henselae strain Cheetah (Bh-CH) (AM905247).
Fosmid Library Construction and Sequencing
Whole genome fosmid libraries were constructed using the Epicentre Fosmid Library Construction Kit. Genomic DNA was prepared from 2 week’s B. henselae cultures
on horse blood agar plates. The DNA was physically
sheared with a 50-ll Hamilton syringe, end repaired, and
size separated by pulse-field gel electrophoresis in 1.2%
agarose (SeaKem Gold; Cambrex Bio Science, East Rutherford, NJ) in 0.5 Tris-Borate-EDTA buffer in a
Evolution of Type IV Secretion Systems 289
GenNavigator System apparatus (Amersham Biosciences,
Uppsala, Sweden) at 14 °C and 5.7 V/cm for a total of
35 h split into 4 phases: 1 s for 3 h, 5 s for 10 h, 10 s for
15 h, and 15 s for 7 h. Fragments of 35–45 kb were cut from
the gel. The gel was digested with GELase, and the DNA
was extracted by a single phenol step, followed by NaCl
precipitation, and washed with 70% ethanol. The sample
was desalted by microdialysis (0.025 lm Millipore membrane filters, 30 min), ligated to the CopyControl pCC1FOS
fosmid vector, and packaged into phages that were used
to infect T1 resistant Escherichia coli (EPI300-T1). The
bacteria were screened on chloramphenicol plates
(12 lg/ml), grown overnight in liquid selective media,
and frozen in 15% glycerol in 80 °C for long-time storage.
Screening for trw-positive clones was performed with
the Gen Images AlkPhos direct labeling and detection system with chemiluminescence detection with CDP-Star (GE
Healthcare, Uppsala, Sweden). Samples of 2-ll fosmidcontaining E. coli cultures were added directly onto a nylon
filter, and positive clones were identified by DNA–DNA
overnight hybridization at 65 °C with a labeled PCR product from the Bh-H1 trwK gene, followed by detection by
exposure on Hyperfilm ECL.
Fosmids were induced to high copy numbers (CopyControl Induction Solution) and prepared using standard
plasmid preparation protocols (QIAGEN, Valencia, CA).
Shotgun sequencing was performed on the purified fosmid
with the insert DNA, using a MegaBace1000 capillary sequenator. Published fosmids were sequenced to a coverage
of .8, complemented by PCR sequencing if needed to an
average consensus sequence quality value of .85, and at
least one read with a quality value .20 in both directions
at each site.
peptide (SP) regions and transmembrane regions from
protein sequence data. We used stability of aligned positions (Loytynoja and Milinkovitch 2001) to analyze protein
alignment ambiguities for each of the possible parameter
combinations in ClustalW, with the gap extension penalty
ranging from 0.1 to 0.4 (step 5 0.1) and the gap opening
penalty ranging from 8 to 15 (step 5 1). Segments of the
gene supported by less than 90% of the protein alignments
were consider ambiguous.
Sequence Alignments and Distance Measures
Detection of Recombination and Gene Conversion
For all coding genes, protein alignments were made
with Kalign (Lassmann and Sonnhammer 2005) and backtranslated to nucleotide sequences, with the trailing gap parameter set to t 5 0. Nucleotide spacers were aligned with
Kalign (t 5 0). All alignments were manually inspected
and edited if needed.
Distance measures must be carefully interpreted in the
light of recombining sequences. Hence, we used as distance
measures both corrected and uncorrected substitution frequencies, as well as nucleotide identity and diversity index
calculations (see below), depending on the particular
sequences involved and the question at hand. Nonsynonymous (dN) and synonymous (dS) substitution frequencies
were calculated using the method of (Yang and Nielsen
2000) as implemented in yn00 in the PAML package (Yang
2000), also giving the pairwise percent nucleotide identity.
Uncorrected nonsynonymous (pN) and synonymous (pS)
substitution frequencies were obtained by synonymous
nonsynonymous analysis program (Korber 2000).
Potentially recombined trwJ sequences were identified
using the 4-sequence method implemented in 4SIS
(Maynard Smith and Smith 1998). For each analysis, 1 gene
copy (the ‘‘child’’) was analyzed for similarity on polymorphic sites to 2 other gene copies (the ‘‘parents’’), using
a fourth gene copy as outgroup. The program optimizes
a chi-square value to identify potential recombination
breakpoints in the analyzed gene, based on biased similarity
on informative sites, with the P value being equal to the
fraction of 1,000 permutations with a chi-square value
larger than for the original data.
We also detected recombination using the 4 methods
RDP (Martin and Rybicki 2000), GENECONV (Padidam
et al. 1999), MaxChi (Maynard Smith 1992), and Chimaera
(Posada and Crandall 2001) as implemented in the program
RDP3 (Martin and Rybicki 2000), with default settings.
Potential gene conversion events in the concatenated
trwJIH region (including spacers) and among the trwL genes
were detected using test of (Sawyer 1989) in GENECONV
(Sawyer 1999). Parameters were set to equal weights for all
nucleotides (seqtype 5 all) and no mismatches allowed
(gscale 5 0). Clusters of repeated gene families (more than
60% amino acid similarity over at least 50% of the protein
length) in B. henselae and B. quintana were detected using
Blastclust (Altschul et al. 1990). Each cluster with at least 3
Sequence Feature Predictions and Gene Segments
The transmembrane regions of aligned genes were predicted using PolyPhobius (http://phobius.cgb.ki.se/) (Käll
et al. 2005), which simultaneously predicts both signal
Tree Reconstructions
The phylogenies of the alignments were inferred by
maximum likelihood as implemented in Treefinder (Jobb
et al. 2004), with 100 bootstrap replicates. The Akaike information criterion was used to select among 56 nt substitution models, using Modeltest 3.7 (Posada and Crandall
1998). The models obtained were general time reversible
(GTR) þ I (trwFED), GTR þ I þ G (trwJ), GTR þ G
(trwI), and transversion model þ I þ G (trwL).
Alignment Visualizations
Sequence alignments with up to ;10 sequences were
directly visualized by a novel method: Each sequence was
schematically plotted as a line/arrow (spacer/gene) in
a spread-out fashion. At each position in the alignment,
lines were drawn to indicate identical nucleotides. Unique
nucleotides and gaps were represented above (or below)
each sequence as stars and lines, respectively. For visibility,
nonsegregating sites were removed from the alignments in
some plots, and lines were not drawn for nucleotides shared
across more than 3 sequences.
290 Nystedt et al.
Table 1
Synonymous and Nonsynonymous Substitution Frequencies for trwI (upper rows) and trwFED (lower rows) withina,b and
between Genomes
Bh-H1
0.01:0.02
NA:NAc
Bh-MAR
0.04:0.15
0.00:0.02
Bh-IC11
0.04:0.17
0.00:0.02
Bq
0.15:0.55
0.07:0.45
Bg
0.25:0.62
0.10:0.75
Bt
0.24:0.69
0.10:0.85
Bh-MAR
0.29
0.10
0.02:0.06
NA:NAc
0.04:0.16
0.00:0.13
0.15:0.58
0.07:0.46
0.25:0.61
0.10:0.76
0.23:0.66
0.10:0.87
Bh-IC11
0.22
0.08
0.28
0.12
0.00:0.00
NA:NAc
0.15:0.62
0.07:0.44
0.24:0.71
0.10:0.75
0.24:0.70
0.10:0.85
Bq
0.28
0.14
0.27
0.15
0.23
0.15
0.00:0.00
NA:NAc
0.24:0.61
0.10:0.80
0.25:0.60
0.10:0.85
Bg
0.40
0.14
0.41
0.14
0.34
0.14
0.39
0.13
0.01:0.01
NA:NAc
0.08:0.26
0.05:0.37
Bt
0.35
0.11
0.36
0.11
0.34
0.12
0.42
0.12
0.31
0.13
0.01:0.01
NA:NAc
Bh-H1
NOTE.—The upper triangle shows dNdS, and the lower triangle shows the dN/dS ratio.
a
Intragenomic comparisons are not defined for the single-copy genes trwFED.
b
The dN/dS rations for the intragenomic comparisons of trwI are unreliable due to the small and imprecise dN and dS estimates and were excluded.
c
NA, Not Applicable.
repeated genes in the 2 genomes combined was aligned with
Kalign (t 5 0) and analyzed for gene conversion events
with GENECONV (same parameters as above).
Identity Plots
Segments of differential sequence conservation were
detected by calculating the Shannon–Wiener diversity
index (H) in sliding windows along the alignments of
the trwJIH segment (window size 5 40 nt, step size 5 20
nt) and the trwL genes (window size 5 20 nt, step size 5
10 nt). The Shannon–Wiener index at each position i was
here defined as:
Hi 5ðfi A logðfi AÞþfi T logðfi TÞ þ fi C logðfi CÞ
þfi G logðfi GÞ þ fi Gap logðfi GapÞ=logðnÞ;
where fiA is the frequency of A:s at site i and n equals the number of possible states (n 5 5 if there are 5 or more sequences
in the alignment, otherwise n equals the number of sequences). A high similarity at a given site will give a low Shannon–
Wiener index and a low similarity will give a high value.
Results
Evolution of the trw Cluster in Bartonella
We have studied the evolution of the trw cluster that
codes for a conjugative T4SS using Bartonella as the model
system. Included in the analysis were 4 different Bartonella
species: B. henselae (Bh), B. quintana (Bq), B. tribocorum
(Bt), and B. grahamii (Bg) (fig. 1). To increase the resolution of the analyses, we compared the sequences of the trw
gene cluster in 3 different strains of B. henselae: Houston-1
(Bh-H1, type strain), Marseille (Bh-MAR) and IndoCat-11
(Bh-IC11), with some sequence data also collected from the
strain Cheetah (Bh-CH). The B. henselae strains represent 3
of the 8 sequence types (ST1, ST4, and ST8) identified in
a previous multi locus sequence typing analysis of polymorphisms in 9 core genes (Lindroos et al. 2006).
We inferred the evolutionary relationships of the trw
region based on the last 3 genes in the trw cluster, trwFED
(the positional homologs to A. tumefaciens virB 9, 10, 11).
This gene set supported a clade containing the human
pathogens B. quintana and the feline-associated species
B. henselae to the exclusion of the rodent-associated species B. grahamii and B. tribocorum (fig. 1) in accordance
with previously estimated species trees (Zeaiter et al. 2002;
Frank et al. 2006), providing evidence for evolution by vertical inheritance of trw in the Bartonella lineages. The nonsynonymous (dN) and the synonymous (dS) substitution
frequencies of the trwFED genes between Bh-H1 and
Bq were dN 5 0.07 and dS 5 0.45, which is close to
the average of all orthologs in these 2 completely sequenced
genomes (dN 5 0.06, dS 5 0.49) (table 1).
The analyses further suggested a tight relationship of
all the 3 B. henselae strains, with the ST1-strain Bh-H1
forming a clade together with the ST8-strain Bh-MAR
(97% bootstrap support), to the exclusion of the ST1-strain
Bh-IC11 (fig. 1). This can be compared with a previous
MLST-based analysis, where no differences were found between Bh-H1 and Bh-IC11, whereas Bh-MAR differed
from these 2 in 6 of the 9 sequenced genes (Lindroos
et al. 2006). Thus, different segments of the genome seem
to have different evolutionary histories.
Gene-Specific Phylogenetic Signals in the Tandemly
Repeated trwJIH Segment
The Bartonella Trw locus differs from previously described T4SS in that several genes within the locus are present in multiple copies (fig. 1). One segment encompassing
the genes trwJ, trwI, and trwH (the positional homologs of
virB5, virB6, and virB7) is present in 2 (Bq, Bh-H1, BhIC11), 3 (Bh-MAR), 4 (Bg), or 5 (Bt) copies, implying
lineage-specific duplication–deletion processes. Because
the order of genes in all 4 species is trwJ-I-H-J-I-H rather
than trwJ-J-I-I-H-H or variants thereof, we assume that all 3
genes in the segment encompassing trwJ-I-H were duplicated simultaneously.
Evolution of Type IV Secretion Systems 291
A
Bt trwI5
89
Bt trwI2
B
0.1
0.02
Bt trwJ4
100
Bt trwI3
Bg trwJ2
100
Bt trwI4
Bg trwJ4
100
Bt trwI1
Bg trwJ3
Bg trwI4
Bt trwJ3
95
100
Bg trwI3
100
100
Bt trwJ2
100
Bt trwJ5
Bg trwI2
Bh-IC11 trwJ2
Bg trwI1
100
99
100
100
Bh-IC11 trwI1
98
Bh-MAR trwJ3
Bh-MAR trwJ2
Bh-IC11 trwI2
Bh-H1 trwJ2
90
Bh-H1 trwI1
Bq trwJ2
Bh-H1 trwI2
100
Bh-IC11 trwJ1
100
Bh-MAR trwI1
99
Bh-H1 trwJ1
Bh-MAR trwI2
Bh-MAR trwJ1
Bh-MAR trwI3
Bq trwJ1
Bq trwI1
100
100
Bq trwI2
50 nt
Bg trwJ1
Bt trwJ1
Bh-H1 trwJ2
100 nt
Bh-H1 trwI2
Bh-H1 trwJ1
Bh-H1 trwI1
Bh-IC11 trwI2
Bh-IC11 trwJ1
Bh-IC11 trwJ2
Bh-IC11 trwI1
Bh-MAR trwJ2
Bh-MAR trwI2
Bh-MAR trwJ1
Bh-MAR trwI1
Bh-MAR trwI3
Bh-MAR trwJ3
FIG. 2.—Upper panels: Maximum likelihood phylogenies for the nucleotide alignments of (A) trwI and (B) trwJ. The trees were inferred without an
outgroup and the maximum likelihood bootstrap values above 80% are shown. Lower panels: alignment visualization of (A) the trwI and (B) the trwJ
gene copies in the B. henselae strains. Each arrow represents a gene copy in the alignment. At each position in the alignment, groups of homologs with
shared nucleotides are connected by lines. For visibility, only groups with 2 or 3 members are connected. Red stars above the genes indicate unique
sites. All nonsegregating sites were omitted from the alignments, as well as the HV region of trwJ.
There is a striking difference in sequence conservation
between genes that lie in the same coamplified fragment,
yielding gene-specific tree topologies for the trwJ, trwI,
and trwH genes. A phylogenetic analysis of the trwI and trwH
genes revealed species-specific clades with bootstrap support
of 100%, with trwI showing this pattern even down to the
strain-specific level in B. henselae (fig. 2A). In contrast, the
trwJ sequences tended to form clades consisting of positional
homologs (fig. 2B). For example, the trwJ copies from the 3
B. henselae strains formed 2 separate clades with at least
98% bootstrap support, one clade containing all trwJ1 copies
andthe other clade containing all trwJ2/3 copies. By and large,
the topology of the trwJ tree is congruent with the estimated
trw tree (fig. 1), given some duplication and deletion events.
292 Nystedt et al.
Because 1 part of the trwJ gene displayed extreme sequence divergence, we examined if the pattern of vertical
descent in trwJ holds true for the entire gene. The trwJ
alignment was divided into a SP region (nucleotide position
1–72), a hypervariable (HV) region (nucleotide position
73–780), and a V region (nucleotide position 780–1128).
Phylogenetic trees based both on the HV region (supplementary fig. 1A, Supplementary Material online) and a concatenation of the SP þ V regions (supplementary fig. 1B,
Supplementary Material online) were congruent with the
tree inferred from the complete trwJ alignment (fig. 2B)
for all clades with strong bootstrap support (.85%). Hence,
the signal of vertical descent is evident both in the hypervariable and the more conserved parts of the trwJ gene.
To visualize the completely different phylogenetic signals in the trwJ and trwI genes directly from the alignments,
we plotted the shared nucleotides between the 7 copies from
the 3 B. henselae strains for trwJ and trwI separately. The result highlighted the similarity between copies in the same genome for trwI (fig. 2A) and between positional homologs for
trwJ (fig. 2B). We also noted an access of unique sites in BhH1 trwJ2 compared with the other trwJ copies in the 3 strains.
Composite Genes and Gene Conversion
Visual inspection of the trwJ copies indicated heterogeneous pairwise similarities between close homologs
along the alignment. One explanation for this could be composite genes, that is, genes created by recombination of 2
parental sequences. We tested this hypothesis using 4SIS
(Maynard Smith and Smith 1998), looking for recombined
genes by comparing 1 gene with 2 potential parental sequences. Each trwJ copy in the B. henselae strains was
tested, using the closest phylogenetic homolog (always being a positional homolog from another strain) as 1 parental
sequence, whereas the other parental sequence was assumed to be the second trwJ copy from the same genome
as the analyzed gene (for simplicity excluding Bh-MAR
trwJ3 from this analysis). Bartonella grahamii trwJ1 was
used as an outgroup in all analyses. Three of the 6 analyzed
trwJ genes (Bh-H1 trwJ2, Bh-IC11 trwJ1, and Bh-IC11
trwJ2) showed statistically significant evidence of recombination (P , 0.01) (supplementary fig. 2, Supplementary
Material online). Similar results were obtained with
RDP3 (Martin and Rybicki 2000), with significant recombination signals (P , 0.05) found by at least 3 of the 4 recombination detection methods used (see Materials and
Methods) (fig. 3). Even though a much more extensive sampling of strains would be needed in order to fully explain the
evolutionary history of the trwJ genes, these examples provide evidence that recombined copies are common and contribute to the divergence of the trwJ genes.
In striking contrast to trwJ, the trwI copies formed
strong intragenomic clades in the phylogenetic analysis
(fig. 2A). Calculating the Shannon–Wiener diversity index
in sliding windows along the concatenated alignments of
trwJ (excluding the HV region), trwI, trwH, and their preceding spacers showed extreme differences in intragenomic
diversity along the segment, with the lowest diversity focused around trwI in all genomes (fig. 4A–F). Given the
signal of vertical descent in trwJ and that the trwJIH genes
seem to have been copied together, the low intragenomic
diversity in trwI is unlikely to be due to recent independent
trwJIH duplications in each genome. We also tested
whether strong negative selection on trwI could explain
the low intragenomic diversity. We calculated the average
dN and dS values for each pair of trwI genes between any
genome pair and compared with the corresponding values
for the single-copy trwFED genes (table 1). Between species, the dN/dS values were somewhat increased for trwI
(dN/dS 5 0.22–0.42) compared with trwFED (dN/dS 5
0.08–0.15), indicating that negative selection is not particularly strong on trwI compared with the trwFED genes.
Negative selection is also unlikely to operate on synonymous sites, but both dN and dS values were much lower
within than between genomes (table 1).
Gene conversion between gene copies may explain
strong intragenomic homogenization on both synonymous
and nonsynonymous sites, without a decreased substitution
frequency between genomes. Using runs test of (Sawyer
1989) on the concatenated trwJIH segments above, gene
conversion fragments larger than 200 nt were detected in
all genomes, covering various parts of the trwI and trwH
genes and their preceding spacers (fig. 4A–F). One intergenomic fragment was also detected between the second
copy in Bh-H1 and the 2 copies in Bh-IC11, indicating either recombination between strains or representing a segment where gene conversion has not yet distorted the
signal of vertical inheritance between the strains. Such partial gene conversion, or recombinations between strains,
would introduce artifacts in the distance measures, perhaps
explaining the elevated average of the calculated dS values
between the B. henselae strains for trwI (dS 5 0.15–0.17)
compared with trwFED (dS 5 0.02–0.03) (table 1).
Rapid Evolution of trwJ by Short Indels and Repeats
Whereas the conserved SP and V regions in trwJ
showed stable alignments and contained virtually no indels,
the HV region ranged in size from 87 amino acids (Bh-H1
trwJ2) to 191 amino acids (Bg trwJ4). The most notable
aspect of the trwJ (HV) alignments between close homologs from the B. henselae strains was the high frequency
of indels, even in otherwise almost identical sequence segments, typically without distorting the open reading frame.
For example, between Bh-IC11 trwJ2, Bh-MAR trwJ2, and
Bh-MAR trwJ3 we found 5 indels of 2, 13, 24, 3 (fig. 5),
and 12 nt in size. The indel of 24 nt was associated with
a short imperfect repeat of 18 nt, with the first copy present
in all 3 strains (fig. 5). Similar patterns of indels were seen
in the alignments of the 3 trwJ1 copies of the B. henselae
strains, with 6 indels of 18, 36, 30, 9, 3, and 3 nt in size. An
imperfect triple repeat of 18 nt was also found in a homologous stretch of the B. tribocorum copies trwJ2 and trwJ5.
Some of the indels were flanked by short stretches containing more nucleotide differences than seen elsewhere in the
alignment, as observed for example around the 2 and 13 nt
gaps in Bh-MAR trwJ3 compared with the Bh-MAR trwJ2
and Bh-ICII trwJ2 (fig. 5).
We also used the corrected (dN and dS) and uncorrected (pN and pS) substitution frequencies as a measure
of the sequence divergence of the conserved SP þ V regions
Evolution of Type IV Secretion Systems 293
Bt trwJ1
Bt trwJ2
Unknown
Bt trwJ3
Unknown
Bt trwJ4
Bt trwJ5
Bh-MAR trwJ2
Bg trwJ1
Bg trwJ2
Bg trwJ3
Bg trwJ4
Bh-H1 trwJ1
Bh-H1 trwJ1
Bh-H1 trwJ2
Bh-MAR trwJ1
Bh-MAR trwJ2
Unknown
Bh-IC11 trwJ1
Bh-MAR trwJ3
Bh-IC11 trwJ1
Bh-IC11 trwJ1
Bh-IC11 trwJ1
Bh-IC11 trwJ2
Bh-IC11 trwJ1
Bh-IC11 trwJ1
Bq trwJ1
Unknown
Bq trwJ2
FIG. 3.—Detection of recombinant trwJ copies by RDP3 (Martin and Rybicki 2000). Each dark bar represents a trwJ gene, with the gene name
indicated to the upper left corner. White bars below a gene indicate a recombination event, with the presumed parental sequence indicated to the right.
Recombination events supported (P , 0.05) by at least 3 of the 4 methods tested (see Materials and Methods) are shown.
for all trwJ pairs within and across genomes. In contrast to
trwI, the divergence of trwJ (SP þ V) within each genome
was as high as the divergence between genomes (supplementary fig. 3, Supplementary Material online), with the
maximum pairwise intragenomic divergence of trwJ (SP
þ V) for each genome ranging from dN 5 0.26, dS 5 0.44
in B. quintenz to dN 5 0.60, dS 5 2.85 for B. tribocorum
To compare, no copy pair of trwFED between any genomes
exceeded dN 5 0.10 and dS 5 0.87. The dN/dS (and pN/
pS) ratios were generally higher for the trwJ homologs compared with the single-copy trwFED genes, but always below
1 (the only exception being Bg trw3 with a dN/dS . 1 when
compared with Bg trwJ4, Bt trwJ2, and Bt trwJ5) (supplementary figs. 3 and 4, Supplementary Material online).
Diverged Clades of the Tandemly Repeated trwL Genes
The main pilus component protein TrwL is encoded by
7 to 8 gene copies located in a single tandem array in the
investigated Bartonella genomes (fig. 1). The trwL copies
displayed segmental conservation patterns as shown by the
Shannon–Wiener diversity index (supplementary fig. 6,
Supplementary Material online), where the most conserved
parts corresponded to 2 transmembrane segments linked by
a cytoplasmic region in the second half of the gene.
Phylogenetic inference strongly supported a division
of the trwL copies into 2 major clades, 1 clade (trwL:B)
containing the last gene copy from each genome and 1
clade (trwL:A) containing all the remaining gene copies
294 Nystedt et al.
A
1
B
0.5
0.5
0
C
0
500
1000
1500
0
2000
D
1
0.5
0
500
1000
1500
500
1000
1500
2000
0
500
1000
1500
2000
0
500
1000
1500
2000
1
0
2000
1
F
1
0.5
0.5
0
0
0.5
0
E
1
0
500
1000
1500
2000
0
FIG. 4.—Intragenomic diversity index plotted against nucleotide position in the alignments of the repeated segments including trwJ, trwI, and trwH
and their upstream spacers. Panels shown for the sequences from (A) B. tribocorum, (B) B. grahamii, (C) B. henselae (Bh-H1), (D) B. henselae (BhMAR), (E) B. henselae (Bh-IC11), and (F) B. quintenz. The bars above the plots indicate the locations and copy numbers of the genes. Putative gene
conversion fragments larger than 200 nt as inferred by Sawyer’s test are shown as lines, with the stars indicating fragments between strains.
(supplementary fig. 5, Supplementary Material online). The
same division into 2 major clades was seen in a splits decomposition network of the trwL genes, also revealing
some conflicting signals in the link between the 2 clades
(fig. 6). The low bootstrap support values for most of
the internal nodes of the trwL:A clade (supplementary
fig. 5, Supplementary Material online) were also explained
by the network analysis, with many trwL:A genes display-
ing long terminal branches radiating out from a shared core
network structure (fig. 6).
Whereas the trwL:B copies appear to follow the
trwFED phylogeny (supplementary fig. 5, Supplementary
Material online), the trwL:A copies of B. quintenz, B. grahamii, and B. tribocorum tended to form intragenomic
clades, whereas the trwL:A copies from the B. henselae
strains seemed to be distributed over the entire tree
FIG. 5.—Examples of indels in a region of otherwise high similarity of the HV region of trwJ between positional homologs from the B. henselae
strains. The black rectangles indicate an imperfect repeat. The given position refers to the complete trwJ alignment.
Evolution of Type IV Secretion Systems 295
trwL:A
0.01
Bh-CH trwL2
Bq trwL2
Bq trwL7 Bh-CH trwL1 Bh-IC11 trwL2
Bq trwL1
Bh-IC11 trwL1
Bq trwL3 Bq trwL6
Bq trwL4 Bq trwL5
Bh-H1 trwL4
Bh-MAR trwL4
Bg trwL4 Bg trwL7
Bg trwL6
Bg trwL5
Bh-H1 trwL3
Bh-MAR trwL3
Bh-CH trwL4
Bh-IC11 trwL4
Bh-IC11 trwL5
Bh-CH trwL5
Bh-MAR trwL1
Bh-H1 trwL1
Bh-H1 trwL2
Bh-MAR trwL2
Bh-CHtrwL3
Bh-IC11 trwL3
Bh-CH trwL6
Bh-IC11 trwL6
Bh-MAR trwL7
Bh-H1 trwL7
Bh-H1 trwL5
Bh-MAR trwL5
Bg trwL3
Bg trwL2
Bg trwL1
Bt trwL4
Bt trwL2
Bt trwL3
Bt trwL6
Bt trwL5
Bt trwL1
Bh-H1trwL6
Bh-MAR trwL6
trwL:B
Bg trwL8
Bt trwL7
Bq trwL8
Bh-H1 trwL8
Bh-MAR trwL8
Bh-CH trwL7
Bh-IC11 trwL7
FIG. 6.—A splits decomposition network of the trwL nucleotide alignments. The trwL:A and the trwL:B clades are indicated with dashed lines.
(fig. 6, supplementary fig. 5 [Supplementary Material online]). Unlike the homogenized trwI genes, the speciesspecific clades of trwL seemed to be completely independent
of the trwFED phylogeny, with every species clade being
roughly equidistant to every other species clade. This was
quantified for B. tribocorum, using the average synonymous substitution frequencies for all pairwise comparisons
of the trwL:A copies in 2 genomes as a distance measure.
Comparing B. tribocorum with the other species gave high
and virtually identical dS averages (Bt:Bg 1.4, Bt:Bq 1.5,
Bt:Bh 1.3–1.5), which can be compared with the singlecopy trwFED genes, where the dS values are
considerably lower and directly reflect the trwFED phylogeny (Bt:Bg 0.37, Bt:Bq 0.84, Bt:Bh 0.85–0.86). The average intragenomic dS value of the trwL:A genes in
B. tribocorum is 0.25, thus approaching the magnitude
of the trwFED genes in the 2 distinct species B. tribocorum
and B. grahamii (dS 5 0.37), and 25 times larger than the
average dS values of the intragenomic clade of the B.
tribocorum trwI genes (dS 5 0.01).
Horizontal Transfer of a Large trwL Segment
The 4 B. henselae strains represent 2 completely different versions of the trwL region. Bh-H1 and Bh-MAR are
very similar over most of the trwL region, and so are BhIC11 and Bh-CH (fig. 7A). From the alignments, we identified the exact boundaries of this pairwise similarity,
stretching from the end region of korA to the first half of
the trwL7 (Bh-H1:Bh-MAR) and trwL6 (Bh-IC11:BhCH) copies, respectively (fig. 7B and C), thus including
all the genes in the trwL:A clade identified above.
We inferred the closest homolog for each of the trwL:A
copies between each pair of strains based on the lowest synonymous substitution frequency (excluding the last copy in
the array). Within each group, the synonymous substitution
frequencies for the closest homologs range from dS 5 0 to
0.16 (average 0.04) for Bh-H1:Bh-MAR and dS 5 0 to
0.001 (average 0.002) for Bh-IC11:Bh-CH, showing complete gene order conservation and .99% nt sequence identity over the complete region, including spacers. This is
296 Nystedt et al.
korA L1 L2 L3 L4
A
L5
L6
L7
L8
Bh-H1
90–100%
80–90%
70–80%
Bh-MAR
Bh-IC11
Bh-CH
korA L1
L2
L3 L4 L5 L6
L7
B
C
100nt
Bh-H1
100nt
Bh-H1
Bh-IC11
Bh-IC11
Bh-MAR
Bh-MAR
Bh-CH
Bh-CH
FIG. 7.—Analysis of the trwL region, showing distinctly different variants among the Bh strains. (A) Each gene is connected to its closest homolog
in the genome outlined above/below, with the percent nucleotide identity indicated according to the grayscale (upper right). The korA gene and the trwL
genes (L1–L8) are indicated. (B, C) Alignment visualization of the 5# (B) and 3# (C) ends of a segment with strong pairwise similarity among the 4
strains. Arrows represent the aligned genes of the left (B) and right (C) framed areas in (A). At each position in the alignment, sequences with identical
nucleotides are connected. For visibility, nonsegregating sites are not connected. Lines and stars above (or below) the genes indicate gaps and unique
sites, respectively.
consistent with the estimated distances between strains as
inferred from the trwFED genes (table 1). In striking contrast, dS values for the closest trwL pairs between representatives for the 2 groups range from dS 5 0.07 to 1.18
(average 0.59) for Bh-MAR:Bh-ICII. These values overlap
with the values for the closest homologs between the B.
henselae strains and B. quintenz, for example, dS 5 0.81–
1.43 (average 1.05) for Bh-MAR:Bq. Because the dS values for single-copy genes located within and around the trw
region are similar between all the B. henselae strains (sequence data are lacking for these genes in Bh-CH), the differences between the trwL arrays cannot be explained by
a gradual process of nucleotide substitutions and gene duplications/deletions. Instead, the data suggest a replacement
of this large trwL segment by a recent horizontal gene transfer into one of the lineages from a rather distantly related
strain.
Upstream of the horizontally transferred trwL segment, the 4 B. henselae strains display between all high similarity (fig. 7B). However, at the downstream end of the
segment, the Bh-H1 strain clearly differed from the 3 other
strains (fig. 7C) throughout the 2 last genes in the trwL array
including the intervening spacer and to some degree also in
trwM and at the beginning of trwK. Segmental differences
in sequence similarity were observed for trwK between the
B. henselae strains (supplementary fig. 7A, Supplementary
Material online), and significant (P , 0.05) recombination
signals in this gene were detected among the B. henselae
strains by at least 3 of the 4 methods tested with RDP3
(see Materials and Methods) (supplementary fig. 7B, Supplementary Material online).
Gene Conversion on a Genome Scale
In order to determine whether gene conversion is specifically targeting the trw locus, we roughly estimated the
overall gene conversion pattern for repeated genes in the B.
henselae and B. quintenz genomes. On the complete gene
set of B. henselae and B. quintenz genes, we used Blastclust
to detect repeat gene families with at least 3 members (both
genomes combined) with more than 60% sequence similarity over more than 50% of their length. This resulted in 35
clusters, with the number of members ranging from 3 to 20
(average 5.3 members per cluster), excluding the clusters
formed by the repeated trw genes and 2 clusters with too
few polymorphisms for analysis. In 12 of the 35 clusters
(34%), intragenomic gene conversion fragments larger than
100 nt were detected, suggesting that gene conversion occurs frequently among duplicated genes in Bartonella.
Discussion
In this paper, we present the first detailed evolutionary
analysis of the Bartonella T4SS, Trw, which is required
for invasion of the erythrocytes in the mammalian host
(Schulein et al. 2001). This system was recently acquired
to one of the Bartonella lineages by horizontal gene
Evolution of Type IV Secretion Systems 297
transfer, followed by a functional switch from plasmid conjugation to host association (Seubert et al. 2003; Frank et al.
2005; Fernandez-Lopez et al. 2006). The Bartonella trw
cluster contains duplications unique among T4SSs, providing an excellent model system for dissecting the molecular
evolutionary forces of host-interaction proteins versus intracellular proteins, and the evolutionary dynamics of gene
duplications.
Gene duplication is one of the most important sources
of gene innovation (Ohno 1970; Gevers et al. 2004). An
important feature of tandem duplications is that the initial
duplication event is the most unlikely, whereas subsequent
duplication–deletions easily occur by homologous recombination during replication. As a consequence, natural selection is needed in order to maintain tandem duplications
within a population (Pettersson et al. 2005). The advantages
of multiple gene copies may be 1) increased expression
(e.g., dosage compensation of defective enzymes), 2) variability within the genome (e.g., antigenic variation), and 3)
increased potential for new functions (e.g., neo- or subfunctionalization). The Trw system in Bartonella appears to display examples of all the above dynamics, likely providing
the variation and innovation required both for the initial
functional switch and for successful host adaptation and
chronic infection.
A notable feature of the duplicated pilus component
TrwJ is the extreme intragenomic divergence and the rapid
evolution with frequent indels scattered over the first half of
the trwJ homologs, resulting in a 2-fold difference in size of
this region (from 87 amino acids in Bh-H1 trwJ2 to 191
amino acids in Bg trwJ4). A similar, but less pronounced,
divergence is seen in the surface-exposed regions of the duplicated major pilus protein TrwL.
One possible explanation for the sequence variability
among the gene duplicates is that one of the copies is expressed, whereas the others serve as sequence reservoirs
that feed divergences into the expressed site, as demonstrated for many exocellular proteins (Hagblom et al.
1985; Plasterk et al. 1985; Haas and Meyer 1986; Brayton
et al. 2002, 2005). However, none of the duplicated trw
genes in Bartonella appear to be silent, because all copies
are complete open reading frames preceded by ribosomal
binding sites (Schroder and Dehio 2005), and evolve under
purifying selection (dN/dS , 1). Likewise, expression of
the duplicated trwJIH variants is probably coregulated, with
conserved putative KorA/KorB-regulated promoters preceding all copies (Schroder and Dehio 2005).
Reoccurring rapid gene amplification, followed by divergent loss of duplicated copies has been suggested as
a common molecular mechanism to select for mutations
in demand (Andersson et al. 1998). In this model, selection
for dosage compensation may rapidly increase the gene
copy number to the order of hundreds and then quickly revert to a single copy when a favorable allele has emerged
(Andersson et al. 1998; Pettersson et al. 2005). Although
duplication–deletions clearly represent important mutational processes in the evolution of the trw cluster, this specific model would be difficult to reconcile with the patterns
of vertical descent of the trwJ copies.
Yet another possibility for the escalation of sequence
divergence would be that this region is replicated separately
from the rest of the genome during later growth stages
(Lindroos et al. 2006), which could induce a different spectrum of mutations in this part of the genome. However, the
trwFED genes as well as the genes flanking the trw cluster
evolve at similar substitution rate as the Bartonella core
genes, suggesting that an altered pattern or rate of nucleotide changes in the region covering the trw cluster cannot
explain the data.
We favor divergence mechanisms that involve combinatorial sequence shuffling between copies through recombination events within and between relatively distantly
related lineages. With multiple tandem copies such as in
the trwL array, the pool of genetic variants can easily be
increased by new chimerical sequences created by intragenomic shuffling and recurrent duplication–deletions of
genes. If Bartonella lineages can also recombine with each
other, the potential for novel genetic diversity increases dramatically. Indeed, the extensive divergence of (Bh-H1:BhMAR) compared with (Bh-IC11:Bh-CH) in the trwL array
is difficult to reconcile without invoking horizontal gene
transfer from a rather distantly related lineage. Likewise,
the many unique sequence stretches between otherwise
closely related trwJ homologs in the B. henselae strains
might also be explained by frequent recombination events
from a large pan-genomic pool of divergent trwJ alleles.
Importantly, the dynamic of this system is so high that it
may well explain the ‘‘apparent absences’’ in some species
of virB2 and virB7 (homologs of trwL and trwJ, respectively) as indicated by ordinary Blast searches (Medini
et al. 2006). Thus, our results refute models suggesting that
T4SSs have been built up in a stepwise manner by independent recruitment of units to the complex (Medini et al.
2006).
We infer duplication–deletions, gene conversion between duplicated genes and sequence exchange between
genomes to be part of the mutational engine that underlies
the dynamic structure of the trw cluster, with the outcome
being dependent on the selective constraints driving the
evolution of the individual genes.
Selection for Erythrocyte Binding on trwJ
The trw pili could be targets for the immune response,
as demonstrated for the YscF protein at the base of the T3SS
in Yersinia (Matson et al. 2005), possibly explaining the
rapid evolution and the sequence divergence between the
Bartonella isolates. However, in lack of evidence for differential expression of the pilus protein gene copies, it appears for now as if all structural variants are exposed
simultaneously on the cell surface, making it difficult to explain the strong intragenomic divergence solely in terms of
an immune evasion mechanism.
An increased binding potential of the erythrocytes has
been suggested as an alternative hypothesis to explain the
variability in the Trw pilus proteins in Bartonella (Seubert
et al. 2003). Mammalian erythrocytes are known to have
highly variable glycoprotein surface structures that have
been suggested to evolve under 2 opposing selection pressures (Gagneux and Varki 1999). Because the nonnucleated
erythrocytes cannot serve as hosts for viruses, a broad
erythrocyte-binding potential may act as a decoy defense
298 Nystedt et al.
mechanism by distracting viruses from their target tissues
(Gagneux and Varki 1999), whereas pathogens as for example the malarian parasite Plasmodium falciparum use
the same glycoprotein structures for adherence and invasion
to the erythrocytes themselves (Orlandi et al. 1992; Sim
et al. 1994). The erythrocyte surface protein glycophorin
A has been shown to be subject to both positive and diversifying selection in humans and other primates (Baum et al.
2002; Altheide et al. 2006). Likewise, and possibly as a consequence, the erythrocyte-binding antigen-175 in P. falciparum and duffy binding protein in Plasmodium vivax have
also been shown to be subject to diversifying selection
(Baum et al. 2003).
Given the extreme variation in the erythrocyte surface
structures both within and between individual host species,
selection for a diverse arsenal of binding structures appears
likely and may explain the extreme variation of the trwJ
gene copies within and between isolates, whereas evasion
of the host immune system may explain the observed rapid
evolution of this pilus protein.
It is well known that B. henselae is able to invade the
erythrocytes in their natural feline hosts but not in incidentally infected humans. The molecular basis for this host
specificity is not known, but the Trw pilus is a prime suspect
in that it is required for the erythrocyte invasion in vivo
(Seubert et al. 2003). If so, host specificity may also explain
the need for a larger number of trwJ copies in the rodentassociated B. tribocorum and B. grahamii that are believed
to have a wider host range compared with the feline- and
human-associated B. henselae and B. quintana.
Selection for Functional Conservation of trwI
In contrast to the highly diverged pilus proteins, the
trwI (and trwH) copies are nearly identical within genomes
and show a normal rate of protein evolution between lineages, despite being located on the same duplicated fragment
as trwJ. This pattern resembles multiple rRNA genes and
their repeated sequences (Liao 2000; Ganley and Kobayashi
2007), and as for the rRNA genes, we suggest that concerted evolution through frequent fixation of gene conversion events between gene copies best explains the observed
patterns for trwI. The observation that intragenomic trwI
copies appear less diverged than their upstream spacer regions suggests that the homogenization is a selected feature
in trwI. If so, what would be the selective advantages
behind homogenization of the anchoring proteins?
As for the VirB T4SS complex in A. tumefaciens
(Dang et al. 1999; Sagulenko et al. 2001), the tight interactions of the Trw anchoring proteins will likely impose
strong restrictions on amino acid substitutions, where a substitution in one protein may require a compensatory substitution in another protein (Moyle et al. 1994; Goh et al.
2000; Mintseris and Weng 2005) (even though this view
has been recently questioned; Hakes et al. 2007). Multiple
copies of an anchoring gene may thus pose a serious problem because all copies must fit with the rest of the structure.
Frequent gene conversion offers an opportunity for continuous homogenization, efficiently purging deleterious mutations that arise in one copy, while also allowing for
beneficial compensatory substitutions in one gene to
quickly spread to the other copies. The homogenized trwI
copies are likely to be hot spots for homologous recombination during replication, thereby preserving the duplication–deletion dynamics of the region, despite the high
divergence of the trwJ copies.
Selection for Subfunctionalization and Pilus Elongation
in trwL
The 2 distinct phylogenetic clades of the genes coding
for the major pilus protein TrwL is evidence that 2 trwL
gene variants have been retained side-by-side in the Bartonella lineages over a considerable evolutionary time. This
pattern is consistent with a functional separation between
the 2 clades dating back to, or shortly after, the horizontal
transfer of Trw into the Bartonella lineage. In other T4SSs,
the TrwL homolog constitutes the main part of the pilus and
also attaches the pilus to the outer membrane (Cascales and
Christie 2003). We thus speculate that the 2 clades represent
a subfunctionalization event, where one clade is responsible
for the attachment to the outer membrane and the other
clade provides the building blocks for the actual pilus. If
so, the increased copy number of 1 clade could result
in a relative increase of pilus proteins compared with anchoring proteins and an increase in pilus length that might
be advantageous for rapid adhesion to the erythrocyte in the
blood stream of the mammalian host. Expression leveldependent pilus lengths have previously been demonstrated
for the main pilus protein of the Type III Pilus of Corynebacterium diphteriae (Swierczynski and Ton-That 2006)
and is consistent with the idea that retained gene duplications are often immediately beneficial through dosage
effects (Kondrashov et al. 2002).
Evolutionary Tests and Duplicated Genes
Even though the strikingly different evolutionary patterns in trwJ, trwI, and trwL provide strong evidence per se
that selective forces are operating on some or all these
genes, proper quantitative evolutionary analyses of the
trw genes are difficult. Many assumptions underlying standard selective tests, such as D of (Tajima 1989), D* and F*
of (Fu and Li 1993), and elevated Ka/Ks ratios are violated
by frequent recombination, rapid evolution, and nonindependent substitutions between sites. Because duplications,
recombination, and rapid evolution appear to be recurrent
themes for host-interaction proteins, evaluating the influence of selection versus neutrality remains a major challenge for these important systems. In order to better
understand how and why new infectious disease agents
emerge, we need better evolutionary models, broadening
our horizon beyond point mutations. This will require massive pan-genome sampling as well as the development of
new statistical tools.
Supplementary Material
Supplementary figures 1–7 are available at Molecular Biology and Evolution online (http://www.mbe.
oxfordjournals.org/).
Evolution of Type IV Secretion Systems 299
Acknowledgments
We thank Petter Johansson and Anna Åsman for assistance with the experimental work. This work was supported by the Linnaeus Centre for Bioinformatics in
Uppsala (M. Thollesson) and the Swedish Research Council, the Swedish Foundation for Strategic Research, the
Knut and Alice Wallenberg Foundation, the Göran
Gustzfsson Foundation, and the European Union (S.G.E.
Andersson).
Literature Cited
Abdulkarim F, Hughes D. 1996. Homologous recombination
between the tuf genes of Salmonella typhimurium. J Mol Biol.
260:506–522.
Alsmark CM, Frank AC, Karlberg EO, et al. (13 co-authors).
2004. The louse-borne human pathogen Bartonella quintana
is a genomic derivative of the zoonotic agent Bartonella
henselae. Proc Natl Acad Sci USA. 101:9716–9721.
Altheide TK, Hayakawa T, Mikkelsen TS, Diaz S, Varki N,
Varki A. 2006. System-wide genomic and biochemical
comparisons of sialic acid biology among primates and
rodents: evidence for two modes of rapid evolution. J Biol
Chem. 281:25689–25702.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990.
Basic local alignment search tool. J Mol Biol. 215:403–410.
Andersson DI, Slechta ES, Roth JR. 1998. Evidence that gene
amplification underlies adaptive mutability of the bacterial lac
operon. Science. 282:1133–1135.
Baum J, Thomas AW, Conway DJ. 2003. Evidence for diversifying selection on erythrocyte-binding antigens of Plasmodium falciparum and P. vivax. Genetics. 163:1327–1336.
Baum J, Ward RH, Conway DJ. 2002. Natural selection on the
erythrocyte surface. Mol Biol Evol. 19:223–229.
Brayton KA, Kappmeyer LS, Herndon DR, Dark MJ,
Tibbals DL, Palmer GH, McGuire TC, Knowles DP Jr.
2005. Complete genome sequencing of Anaplasma marginale
reveals that the surface is skewed to two superfamilies of
outer membrane proteins. Proc Natl Acad Sci USA.
102:844–849.
Brayton KA, Palmer GH, Lundgren A, Yi J, Barbet AF. 2002.
Antigenic variation of Anaplasma marginale msp2 occurs
by combinatorial gene conversion. Mol Microbiol. 43:
1151–1159.
Cascales E, Christie PJ. 2003. The versatile bacterial type IV
secretion systems. Nat Rev Microbiol. 1:137–149.
Cascales E, Christie PJ. 2004. Definition of a bacterial type IV
secretion pathway for a DNA substrate. Science. 304:
1170–1173.
Dang TA, Zhou XR, Graf B, Christie PJ. 1999. Dimerization of
the Agrobacterium tumefaciens VirB4 ATPase and the effect
of ATP-binding cassette mutations on the assembly and
function of the T-DNA transporter. Mol Microbiol. 32:1239–
1253.
Deitsch KW, Moxon ER, Wellems TE. 1997. Shared themes of
antigenic variation and virulence in bacterial, protozoal, and
fungal infections. Microbiol Mol Biol Rev. 61:281–293.
Fernandez-Lopez R, Garcillan-Barcia MP, Revilla C, Lazaro M,
Vielva L, de la Cruz F. 2006. Dynamics of the IncW genetic
backbone imply general trends in conjugative plasmid
evolution. FEMS Microbiol Rev. 30:942–966.
Frank AC, Alsmark CM, Thollesson M, Andersson SG. 2005.
Functional divergence and horizontal transfer of type IV
secretion systems. Mol Biol Evol. 22:1325–1336.
Frank AC, Berglund E, Andersson SGE. 2006. The genomes of
pathogenics, pathogenic Bartonella species. In: Hacker J,
Dobrindt U, editors. Pathogenomics. Weinheim (Germany):
Wiley-VCH Verlag GmbH & Co. p. 281–300.
Fu YX, Li WH. 1993. Statistical tests of neutrality of mutations.
Genetics. 133:693–709.
Gagneux P, Varki A. 1999. Evolutionary considerations in
relating oligosaccharide diversity to biological function.
Glycobiology. 9:747–755.
Ganley AR, Kobayashi T. 2007. Highly efficient concerted
evolution in the ribosomal DNA repeats: total rDNA repeat
variation revealed by whole-genome shotgun sequence data.
Genome Res. 17:184–191.
Gevers D, Vandepoele K, Simillon C, Van de Peer Y. 2004.
Gene duplication and biased functional retention of paralogs
in bacterial genomes. Trends Microbiol. 12:148–154.
Goh CS, Bogan AA, Joachimiak M, Walther D, Cohen FE. 2000.
Co-evolution of proteins with their interaction partners. J Mol
Biol. 299:283–293.
Gray RR, Mulligan CJ, Molini BJ, Sun ES, Giacani L, Godornes C,
Kitchen A, Lukehart SA, Centurion-Lara A. 2006. Molecular
evolution of the tprC, D, I, K, G, and J genes in the pathogenic
genus Treponema. Mol Biol Evol. 23:2220–2233.
Guttman DS, Gropp SJ, Morgan RL, Wang PW. 2006.
Diversifying selection drives the evolution of the type III
secretion system pilus of Pseudomonas syringe. Mol Biol
Evol. 23:2342–2354.
Haas R, Meyer TF. 1986. The repertoire of silent pilus genes in
Neisseria gonorrhoeae: evidence for gene conversion. Cell.
44:107–115.
Hagblom P, Segal E, Billyard E, So M. 1985. Intragenic
recombination leads to pilus antigenic variation in Neisseria
gonorrhoeae. Nature. 315:156–158.
Hakes L, Lovell SC, Oliver SG, Robertson DL. 2007. Specificity in
protein interactions and its relationship with sequence diversity
and coevolution. Proc Natl Acad Sci USA. 104:7999–8004.
Jobb G, von Haeseler A, Strimmer K. 2004. TREEFINDER:
a powerful graphical analysis environment for molecular
phylogenetics. BMC Evol Biol. 4:18.
Käll L, Krogh A, Sonnhammer EL. 2005. A HMM posterior
decoder for sequence feature prediction that includes
homology information. Bioinformatics. 21:i251–i257.
Kondrashov FA, Rogozin IB, Wolf YI, Koonin EV. 2002.
Selection in the evolution of gene duplications. Genome Biol.
3:R8:1–9.
Korber B. 2000. HIV signature and sequence variation analysis.
In: Rodrigo AG, Learn GH, editors. Computational analysis
of HIV molecular sequences. Dordrecht (The Netherlands):
Kluwer Academic Publishers. p. 55–72.
Krall L, Wiedemann U, Unsin G, Weiss S, Domke N, Baron C.
2002. Detergent extraction identifies different VirB protein
subassemblies of the type IV secretion machinery in the
membranes of Agrobacterium tumefaciens. Proc Natl Acad
Sci USA. 99:11405–11410.
Lassmann T, Sonnhammer EL. 2005. Kalign–an accurate and
fast multiple sequence alignment algorithm. BMC Bioinformatics. 6:298.
Liao D. 2000. Gene conversion drives within genic sequences:
concerted evolution of ribosomal RNA genes in bacteria and
archaea. J Mol Evol. 51:305–317.
Lindroos H, Vinnere O, Mira A, Repsilber D, Naslund K,
Andersson SG. 2006. Genome rearrangements, deletions, and
amplifications in the natural population of Bartonella
henselae. J Bacteriol. 188:7426–7439.
Loytynoja A, Milinkovitch MC. 2001. SOAP, cleaning multiple
alignments from unstable blocks. Bioinformatics. 17:573–
574.
300 Nystedt et al.
Ma W, Dong FF, Stavrinides J, Guttman DS. 2006. Type III
effector diversification via both pathoadaptation and horizontal transfer in response to a coevolutionary arms race. PLoS
Genet. 2:e209.
Martin D, Rybicki E. 2000. RDP: detection of recombination
amongst aligned sequences. Bioinformatics. 16:562–563.
Matson JS, Durick KA, Bradley DS, Nilles ML. 2005.
Immunization of mice with YscF provides protection from
Yersinia pestis infections. BMC Microbiol. 5:38.
Maynard Smith J. 1992. Analyzing the mosaic structure of genes.
J Mol Evol. 34:126–129.
Maynard Smith J, Smith NH. 1998. Detecting recombination
from gene trees. Mol Biol Evol. 15:590–599.
Medini D, Covacci A, Donati C. 2006. Protein homology
network families reveal step-wise diversification of
type III and type IV secretion systems. PLoS Comp Biol.
12:e173.
Mintseris J, Weng Z. 2005. Structure, function, and evolution of
transient and obligate protein-protein interactions. Proc Natl
Acad Sci USA. 102:10930–10935.
Moyle WR, Campbell RK, Myers RV, Bernard MP, Han Y,
Wang X. 1994. Co-evolution of ligand-receptor pairs. Nature.
368:251–255.
Noormohammadi AH, Markham PF, Kanci A, Whithear KG,
Browning GF. 2000. A novel mechanism for control of
antigenic variation in the haemagglutinin gene family of
Mycoplasma synoviae. Mol Microbiol. 35:911–923.
Ohno S. 1970. Evolution by gene duplication. New York:
Springer.
Orlandi PA, Klotz FW, Haynes JD. 1992. A malaria invasion
receptor, the 175-kilodalton erythrocyte binding antigen of
Plasmodium falciparum recognizes the terminal Neu5Ac
(alpha 2-3) Gal- sequences of glycophorin. A J Cell Biol.
116:901–909.
Padidam M, Sawyer S, Fauquet CM. 1999. Possible emergence
of new geminiviruses by frequent recombination. Virology.
265:218–225.
Pettersson ME, Andersson DI, Roth JR, Berg OG. 2005. The
amplification model for adaptive mutation: simulations and
analysis. Genetics. 169:1105–1115.
Plasterk RH, Simon MI, Barbour AG. 1985. Transposition of
structural genes to an expression sequence on a linear plasmid
causes antigenic variation in the bacterium Borrelia hermsii.
Nature. 318:257–263.
Posada D, Crandall KA. 1998. Modeltest: testing the model of
DNA substitution. Bioinformatics. 14:817–818.
Posada D, Crandall KA. 2001. Evaluation of methods for
detecting recombination from DNA sequences: computer
simulations. Proc Natl Acad Sci USA. 98:13757–13762.
Sagulenko E, Sagulenko V, Chen J, Christie PJ. 2001. Role of
Agrobacterium VirB11 ATPase in T-pilus assembly and
substrate selection. J Bacteriol. 183:5813–5825.
Santoyo G, Romero D. 2005. Gene conversion and concerted
evolution in bacterial genomes. FEMS Microbiol Rev.
29:165–183.
Sawyer S. 1989. Statistical tests for detecting gene conversion.
Mol Biol Evol. 6:526–538.
Sawyer S. 1999. GENECONV: a computer package for the
statistical detection of gene conversion. Distributed by
the author. St Louis (MO): Department of Mathematics,
Washington University.
Schroder G, Dehio C. 2005. Virulence-associated type IV secretion systems of Bartonella. Trends Microbiol. 13:336–342.
Schulein R, Seubert A, Gille C, Lanz C, Hansmann Y,
Piemont Y, Dehio C. 2001. Invasion and persistent intracellular colonization of erythrocytes. A unique parasitic
strategy of the emerging pathogen Bartonella. J Exp Med.
193:1077–1086.
Seubert A, Hiestand R, de la Cruz F, Dehio C. 2003. A bacterial
conjugation machinery recruited for pathogenesis. Mol
Microbiol. 49:1253–1266.
Sim BK, Chitnis CE, Wasniowska K, Hadley TJ, Miller LH.
1994. Receptor and ligand domains for invasion of erythrocytes by Plasmodium falciparum. Science. 264:1941–1944.
Stavrinides J, Ma W, Guttman DS. 2006. Terminal reassortment
drives the quantum evolution of type III effectors in bacterial
pathogens. PLoS Pathog. 2:e104.
Swierczynski A, Ton-That H. 2006. Type III pilus of
corynebacteria: pilus length is determined by the level of its
major pilin subunit. J Bacteriol. 188:6318–6325.
Tajima F. 1989. Statistical method for testing the neutral
mutation hypothesis by DNA polymorphism. Genetics.
123:585–595.
Yang Z. 2000. Phylogenetic analysis by maximum likelihood.
PAML. Version 3.0. London: University College.
Yang Z, Nielsen R. 2000. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary
models. Mol Biol Evol. 17:32–43.
Yeo HJ, Yuan Q, Beck MR, Baron C, Waksman G. 2003.
Structural and functional characterization of the VirB5 protein
from the type IV secretion system encoded by the conjugative
plasmid pKM101. Proc Natl Acad Sci USA. 100:
15947–15952.
Zeaiter Z, Fournier PE, Ogata H, Raoult D. 2002. Phylogenetic
classification of Bartonella species by comparing groEL
sequences. Int J Syst Evol Microbiol. 52:165–171.
Zhang JR, Norris SJ. 1998. Genetic variation of the Borrelia
burgdorferi gene vlsE involves cassette-specific, segmental
gene conversion. Infect Immun. 66:3698–3704.
Edward Holmes, Associate Editor
Accepted November 11, 2007