Evidence for Genetic Drift in Endosymbionts (Buchnera): Analyses of Protein-Coding Genes

J. J. Wernegreen and N. A. Moran
Department of Ecology and Evolutionary Biology, University of Arizona

Buchnera, the bacterial endosymbionts of aphids, undergo severe population bottlenecks during maternal transmission through their hosts. Previous studies suggest an increased effect of drift within these strictly asexual, small populations, resulting in an increased fixation of slightly deleterious mutations. This study further explores sequence evolution in Buchnera using three approaches. First, patterns of codon usage were compared across several homologous Escherichia coli and Buchnera loci, in order to test the prediction that selection for the use of optimal codons is less effective in small populations. A χ²-based measure of codon bias was developed to adjust for the overall A+T richness of silent positions in the endosymbionts. In contrast to E. coli homologues, adaptive codon bias across Buchnera loci is markedly low, and patterns of codon usage lack a strong relationship with gene expression level. These data suggest that codon usage in Buchnera has been shaped largely by mutational pressure and drift rather than by selection for translational efficiency. One exception to the overall lack of bias is groEL, which is known to be constitutively overexpressed in Buchnera and other endosymbionts. Second, relative-rate tests show elevated rates of sequence evolution of numerous protein-coding loci across Buchnera, compared to E. coli. Finally, consistently higher ratios of nonsynonymous to synonymous substitutions in Buchnera loci relative to the enteric bacteria strongly suggest the accumulation of nonsynonymous substitutions in endosymbiont lineages. Combined, these results suggest a decreased effectiveness of purifying selection in purging endosymbiont populations of slightly deleterious mutations, particularly those affecting codon usage and amino acid identity.
Abbreviations: CAI = Codon Adaptation Index; Nc = effective number of codons; GC3 = percent G+C content at third-codon positions.
Key words: Buchnera, endosymbionts, codon bias, drift, population size.
Address for correspondence and reprints: Jennifer Wernegreen, Department of Ecology and Evolutionary Biology, University of Arizona, Biological Sciences West, Room 310, Tucson, Arizona 85721.
Mol. Biol. Evol. 16(1):83–97. 1999
© 1999 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038

Introduction

The rate of fixation of mutations with fitness consequences depends not only on the strength of selection for or against them but also on how effective that selection is, which in turn depends on effective population size. In populations with low rates of recombination and small effective sizes, slightly deleterious mutations may experience increased rates of fixation through drift (Ohta 1973). This predicted relationship between population structure and the rate of fixation of slightly deleterious mutations can be tested among prokaryotes. Free-living bacteria are thought to have large effective population sizes (Selander, Caugant, and Whittam 1987), and even clonal groups experience recombination that is important in their evolutionary dynamics (Maynard Smith, Dowson, and Spratt 1991; Dykhuizen and Green 1993; Maynard Smith et al. 1993). In contrast, endosymbiotic bacteria associated with several insect groups have relatively small effective population sizes and restricted opportunities for interstrain recombination because of their mode of transmission. Bacteria associated with specialized insect cells (i.e., mycetocytes) are maternally transmitted by the infection of ovaries or of internally developing embryos (reviewed in Buchner 1965; Moran and Baumann 1994; Baumann et al. 1995). The effective population size of the bacteria is reduced by the bottleneck at each inoculation of progeny, when relatively few bacteria are transmitted (Hinde 1971; A. Mira, personal communication). In addition, modeling indicates that insect host population sizes may be the primary determinant of the effective population size of intracellular genomes (C. Rispe, personal communication). Insect population sizes, while relatively large among animals, are much smaller than those of free-living bacteria (reviewed in Lambert and Moran 1998). Finally, any lateral gene transfer among endosymbionts would be confined to the bacterial genotypes present in the same host individual, and the tight bottleneck at transmission implies that these would be similar or identical. Buchnera, the endosymbionts of aphids, are particularly well characterized, and the perfect congruence between symbiont and host phylogenies supports anatomical evidence for their stable, vertical inheritance (Munson et al. 1991; Moran and Baumann 1994). The goals of this study are to explore the effects of this strict asexuality and small population size on sequence evolution in Buchnera and to test the hypothesis that Buchnera lineages experience increased rates of substitution of slightly deleterious mutations.

Codon Bias

The use of alternative codons may be shaped by biases in mutation rates among the four bases (Sueoka 1961; Muto and Osawa 1987), by selection for optimal codons that maximize the rate and efficiency of translation (Ikemura 1981, 1985), or by a combination of these processes. Studies of codon usage often attempt to distinguish the relative importance of genome nucleotide composition and selection for translational efficiency by testing alternative predictions of these models. Where patterns of codon usage largely reflect mutational pressure and drift rather than translational selection, codon bias is expected to correspond with local base compositional biases.
This pattern characterizes vertebrate genomes and some bacterial genomes with strong A+T or G+C mutational biases (Sharp et al. 1988; Andersson and Sharp 1996). In contrast, a positive relationship between the extent of codon bias and the level of gene expression is evidence of translational selection. In Escherichia coli and in yeast, for example, the correlation between the degree of codon bias and gene expression level (Gouy and Gautier 1982; Sharp, Tuohy, and Mosurski 1986; Sharp et al. 1988) is thought to reflect selection for rapid translation of highly expressed genes through the use of optimal codons. Lowly expressed genes lack such selection and retain nonoptimal codons (Shields 1990). Among bacterial genomes with base compositional biases, patterns of codon usage often reflect a combination of mutational pressure and translational selection (Shields and Sharp 1987; Ohtaka, Nakamura, and Ishikawa 1992; Wright and Bibb 1992; Ohtaka and Ishikawa 1993).

Population genetic models indicate that effective population size influences the balance between the effects of any mutational bias and selection for optimal codons (Li 1987; Bulmer 1991). In particular, translational selection will be effective in a haploid population only when the selective coefficient (s) for a particular codon is greater than 2/Ne, where Ne is the effective population size (Li 1987). In large populations, relatively weak selection may produce strong codon biases. In small populations, however, weak translational selection is countered by a strong effect of drift, which may allow the maintenance of nonoptimal codons. Drift may also allow rare shifts between coadapted states (Wright 1931), such as the links between codon frequencies and tRNA abundances (Shields 1990). Small, asexual populations are therefore expected to exhibit lower overall levels of codon bias because translational selection is weak.
Where such populations do exhibit bias, the bias may favor synonyms other than the optimal codons of related lineages because of switches in codon preferences.

Relatively few empirical studies explicitly test the effect of population size on the balance between these two processes. One such study compares levels of codon bias of chloroplast genes across several algal and angiosperm lineages (Morton 1998). The high levels of codon bias in the chloroplasts of most algal lineages, relative to those of flowering plants, support the hypothesis that selection for translational efficiency is more effective in large (algal) populations than in relatively small (angiosperm) populations. Likewise, previous analyses of protein-coding genes in Buchnera show a general A+T richness at synonymous sites and a lack of strong preferences for the optimal codons of the closely related species E. coli (Clark, Baumann, and Baumann 1992; Ohtaka and Ishikawa 1993; Brynnel et al. 1998; Clark, Baumann, and Baumann 1998). These studies suggest that codon usage in Buchnera largely reflects strong A+T mutational bias and fixation of nonoptimal codons through drift. However, previous estimates of codon bias in Buchnera do not account for local base composition. Given the strong A+T bias at synonymous sites, any preferences for particular codons are therefore difficult to identify. The present analysis includes a more extensive sample of Buchnera and E. coli homologues, representing a wide range of gene expression levels. The tests for codon bias also take into account the A+T richness of the Buchnera genome (Ishikawa 1987).

Previous studies of codon usage in A+T-rich and G+C-rich genomes highlight methods for assessing codon usage in genomes with strong mutational biases (Shields and Sharp 1987; Ohtaka, Nakamura, and Ishikawa 1992; Wright and Bibb 1992; Ohtaka and Ishikawa 1993; Andersson and Sharp 1996).
For example, the effective number of codons, Nc, is reduced either by preferences for particular codons or by biased base composition. To test the null hypothesis that codons are used randomly except for the influence of local mutational bias, expected values of Nc may be adjusted to account for local base composition. In the A+T-rich Rickettsia genome, "Nc plots" show agreement between observed Nc values and those expected given GC3. This indicates that codon usage reflects local base composition and may therefore be attributed largely to mutational bias (Andersson and Sharp 1996). Likewise, similar levels of codon bias across Rickettsia genes with very different expression levels indicate that mutational bias has a stronger effect than translational selection. In other taxa, the combined effects of mutational bias and translational selection are apparent. Across several Streptomyces loci, two observations suggest a strong effect of mutational bias (Wright and Bibb 1992):

- the correspondence of GC3 and Nc;
- a correlation between the GC3 of a locus and its position along the major axis of a correspondence analysis of codon usage.

Three lines of evidence support a slight effect of translational selection on the highly expressed Streptomyces tuf gene (Wright and Bibb 1992):

- the relatively low Nc for this locus;
- its clear distinction from other loci in correspondence analysis;
- the fact that codons apparently preferred in tuf are also preferred by another G+C-rich bacterium, Micrococcus luteus.

This combination of mutational bias and translational selection is also apparent in other genomes with mutational biases, such as those of Micrococcus luteus (Ohtaka, Nakamura, and Ishikawa 1992; Ohtaka and Ishikawa 1993), Dictyostelium discoideum (Sharp and Devine 1989), and Bacillus subtilis (Shields and Sharp 1987). Organelle genomes may also show strong nucleotide biases.
Morton (1998) recently tested the relative importance of selection and genome composition in shaping codon usage of several A+T-biased chloroplast genomes. That study compared an observed CAI (Sharp and Li 1987) with an expected distribution of CAIs based on genome-wide nucleotide composition. The CAI measures bias toward a pool of preferred codons, which there was defined from a highly expressed chloroplast gene.

The analyses above test the null hypothesis that codon usage may be explained solely by local base composition. However, Nc plots, correspondence analysis, and CAI estimates may fail to detect slight preferences among synonyms, since these methods derive a single estimate across all amino acids in a locus. In addition, CAI estimates are possible only when the optimal codons for a particular genome are known. In this study, estimates of codon bias across Buchnera loci are also adjusted for local base composition. In contrast to previous estimates, however, the χ²-based method developed here tests for nonrandom codon use within individual amino acids and does not require prior knowledge of preferred codons. This approach may be generally applicable to other organisms in which codon preferences may be absent or subtle, such as taxa with small effective population sizes and/or strong mutational biases. Our approach is similar to the scaled χ² (Shields et al. 1988), as modified by Akashi and Schaeffer (1997) to adjust for A+T content at silent positions. We compared observed codon frequencies to those expected if codon usage reflects local base composition at synonymous sites. We applied this method to several homologous loci in Buchnera and in their free-living relative, E. coli. This tests the hypothesis that translational selection is relatively ineffective in the endosymbionts, so that codon usage in Buchnera is shaped by A+T mutational bias and by the fixation of nonoptimal codons through drift.
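The sketch below makes two of the single-number baselines discussed above concrete: the Nc expected from GC3 alone, and a CAI score. It is illustrative only. The expected-Nc curve, Nc = 2 + GC3 + 29/(GC3² + (1 − GC3)²), is the standard one from Wright (1990); this paper does not state it. The pseudocount of 0.5 for codons absent from the reference set is an assumption of this sketch. All function names are hypothetical.

```python
import math
from statistics import mean


def expected_nc(gc3: float) -> float:
    """Expected Nc if codon usage reflected only G+C content at silent third
    positions (curve of Wright 1990, assumed here; not given in the text)."""
    if not 0.0 <= gc3 <= 1.0:
        raise ValueError("gc3 must be a fraction between 0 and 1")
    return 2.0 + gc3 + 29.0 / (gc3 ** 2 + (1.0 - gc3) ** 2)


def homozygosity(counts: list[int]) -> float:
    """Wright's F for one amino acid from its synonymous codon counts."""
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two codons to estimate F")
    sum_p2 = sum((c / n) ** 2 for c in counts)
    return (n * sum_p2 - 1.0) / (n - 1.0)


def observed_nc(f_by_class: dict[int, list[float]]) -> float:
    """Combine F values grouped by degeneracy class (2, 3, 4, 6) into Nc for
    the standard code: Met and Trp (the leading 2), 9 twofold, 1 threefold,
    5 fourfold and 3 sixfold amino acids."""
    f = {k: mean(v) for k, v in f_by_class.items() if v}
    if 3 not in f:  # Ile missing: approximate from the twofold and fourfold means
        f[3] = (f[2] + f[4]) / 2.0
    nc = 2.0 + 9.0 / f[2] + 1.0 / f[3] + 5.0 / f[4] + 3.0 / f[6]
    return min(nc, 61.0)  # Nc cannot exceed the 61 sense codons


def relative_adaptiveness(reference: dict[str, dict[str, int]]) -> dict[str, float]:
    """w = codon count / count of the most used synonym, from a reference set of
    highly expressed genes. Pass only multi-codon amino acids. Unobserved codons
    get a pseudocount of 0.5 (an assumption) so that log(w) stays finite."""
    w = {}
    for codon_counts in reference.values():
        top = max(codon_counts.values())
        if top == 0:
            continue  # amino acid absent from the reference set
        for codon, count in codon_counts.items():
            w[codon] = max(count, 0.5) / top
    return w


def cai(gene_codons: list[str], w: dict[str, float]) -> float:
    """Geometric mean of w over the codons of a gene that have a w value."""
    logs = [math.log(w[c]) for c in gene_codons if c in w]
    if not logs:
        raise ValueError("no scorable codons")
    return math.exp(sum(logs) / len(logs))
```

Comparing observed_nc for a locus with expected_nc at that locus's GC3 is the core of an Nc plot. Loci that fall on the curve are consistent with codon usage driven by composition alone. CAI, by contrast, needs the reference set of preferred codons, which is the limitation the χ² approach avoids.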
Rates of Sequence Divergence

The decreased effectiveness of selection in small, asexual populations is also expected to accelerate the fixation of replacement substitutions. Previous studies provide strong evidence for differences in rates and patterns of sequence evolution between endosymbiotic and free-living bacterial lineages. For example, the 16S rRNA gene of several endosymbiotic lineages has been shown to evolve 1.5–2 times faster than that of free-living relatives in the enterics (Moran, von Dohlen, and Baumann 1995). Observed changes in endosymbiont 16S rRNA genes also destabilize the secondary structure of the molecule (Lambert and Moran 1998). In addition, several protein-coding genes in Buchnera have been shown to evolve more rapidly than their E. coli homologues (Moran 1996; Brynnel et al. 1998). Relatively high ratios of nonsynonymous divergence (Ka) to synonymous divergence (Ks) imply that these substitutions are concentrated at sites that affect amino acid sequences (Moran 1996; Brynnel et al. 1998).

This study combines several types of analyses to further explore whether Buchnera loci experience increased rates of fixation of deleterious mutations, as would be expected in small, asexual populations. Here we assess patterns of codon usage across several Buchnera loci, compare levels of bias with homologues in E. coli, and explore the possibility of subtle, and possibly different, codon preferences in Buchnera. Purifying selection against nonoptimal codons is likely to be weaker than selection against replacement substitutions. Previous evidence for the accumulation of nonsynonymous substitutions through drift therefore strongly suggests that nonoptimal codons will also accumulate in Buchnera lineages. Adaptive codon bias is accordingly expected to be much lower than in E. coli. In addition, rates of sequence evolution of Buchnera and E.
coli are compared across an extensive sample of available protein-coding loci, in order to test whether previously observed rate elevation is a general phenomenon across the Buchnera genome. Previous estimates of the ratio of nonsynonymous to synonymous divergence in Buchnera were limited by high levels of synonymous divergence (Moran 1996). Values near saturation are known to have high standard errors, which may be exacerbated by the strong A+T bias of Buchnera genomes (Berg 1995). In this study, the consideration of shallower taxonomic levels allows more reliable estimates of ratios of nonsynonymous to synonymous substitutions.

Methods

Loci Sampled

Several loci from Buchnera taxa and the enteric bacteria were included (table 1) in estimates of codon bias, rates of sequence evolution, and patterns of nonsynonymous and synonymous substitution.

Alignments

Inferred protein sequences of homologous loci were aligned using Pileup of GCG (Wisconsin Sequence Analysis Program, Genetics Computer Group, Madison, Wis.), and nucleotide alignments were adjusted to conform to the amino acid alignments. Regions of loci with ambiguous amino acid alignments were excluded from the analysis.

Codon Bias

Estimation of Codon Bias at Fourfold Degenerate Sites

For each Buchnera locus, nonrandom use of U- and A-ending codons within fourfold degenerate codon families was assessed using a χ² analysis. (Arginine, leucine, and serine were treated as fourfold degenerate by considering only their fourfold degenerate synonyms.) C- and G-ending codons were excluded from the analysis because of their small sample sizes and low expected values in this A+T-rich genome (see below). The χ² analysis involved several steps. First, nucleotide composition at fourfold degenerate sites was determined for each Buchnera locus (by the computer package Molecular Evolutionary Analysis [MEA], E. Moriyama, personal communication).
Expected relative frequencies of U- and A-ending codons for each fourfold codon family were based on the relative frequencies of A's and T's at fourfold degenerate sites (calculated by MEA). Second, observed and expected relative frequencies of U- and A-ending codons were compared by a two-class χ² test adjusted for small sample size, as suggested by Sokal and Rohlf (1981) for sample sizes <200. Amino acids with fewer than five residues in a given locus were omitted. As with the scaled χ² of Shields et al. (1988), the magnitude of the χ² value reflects the deviation from random use of synonymous codons. However, the χ² values in this study measure how far the relative frequencies of U- and A-ending codons deviate from the frequencies expected if codon usage reflects local base composition. To estimate overall bias at a given locus, χ² values were averaged across the fourfold codon families, after excluding amino acids with fewer than five representatives.

Table 1
Genetic Loci of Buchnera Strains Included in Study

Taxon(a)                       Gene       GenBank accession  Notes
Acyrthosiphon pisum            argS       L18933             b
Schlechtendalia chinensis      argS       L18932             b
Schizaphis graminum            aroA       L43549             b
S. graminum                    aroE       U09230             b
S. graminum                    aroH       U11066             b
S. graminum                    atpA       2827020            b
S. graminum                    atpB       2827024            b
S. graminum                    atpC       2827018            b
S. graminum                    atpD       2827017            b
S. graminum                    atpE       2827023            b
S. graminum                    atpF       2827022            b
S. graminum                    atpG       2827019            b
S. graminum                    atpH       2827021            b
S. graminum                    cysE       M90644             b
S. graminum                    cysS       U09230             b
S. graminum                    ddlB       2738587            b
S. graminum                    dnaA       M80817             b
S. graminum                    dnaG(pt)   M90644             b, c
A. pisum                       dnaJ       D88673             b
A. pisum                       dnaK       D88673             b
S. graminum                    dnaN       M80817             b
S. graminum                    dnaQ       L18927             b
S. graminum                    fdx        2827028            b
S. graminum                    ftsA       2738589            b
S. graminum                    ftsZ       2738588            b
S. graminum                    gapA       U11045             b
S. graminum                    gidA       2827025            b
A. pisum                       groEL      X61150             b, d
Myzus persicae                 groEL      2754808            b, d
Rhopalosiphum padi             groEL      U77380             b, d
Salmonella typhimurium         groEL      U01039             c
Sitobion avenae                groEL      U77379             b, d
S. graminum                    groEL      D85628             b, d
M. persicae                    groES      2754807            b, d
S. graminum                    groES      D85628             b, d
S. graminum                    gyrB       M80817             b
S. graminum                    himD       L43549             b
S. graminum                    hscA       2827029            b
S. graminum                    hscB       2827030            b
S. graminum                    ilvC       2827034            b
S. graminum                    ilvD       2827033            b
S. graminum                    infC       U11066             b
Diuraphis noxia                leuA       AF041837           b, d
R. padi                        leuA       X71612             b, d
S. typhimurium                 leuA       47968              d
S. graminum                    leuA       AF041836           b, d
Thelaxes suberi                leuA       Y11966             b, d
D. noxia                       leuB       AF041837           b, d
R. padi                        leuB       X71612             b, d
S. graminum                    leuB       AF041836           b, d
S. typhimurium                 leuB       X53376             d
T. suberi                      leuB       Y11966             b, d
D. noxia                       leuC       AF041837           b, d
R. padi                        leuC       X71612             b, d
S. graminum                    leuC       AF041836           b, d
S. typhimurium                 leuC       M31047             d
T. suberi                      leuC       Y11966             b, d
D. noxia                       leuD       AF041837           b, d
R. padi                        leuD       X71612             b, d
S. typhimurium                 leuD       47764              d
S. graminum                    leuD       AF041836           b, d
T. suberi                      leuD       Y11966             b, d
S. graminum                    murC       AF012886           b
S. graminum                    nifS       2827032            b
S. graminum                    pfs        AF01288            b
R. padi                        rep        X71612             b
S. graminum                    rep        2827035            b
D. noxia                       repA1      AF041837           b, g
R. padi                        repA1      X71612             b, g
Tetraneura caerulescens        repA1      Y11972             b, g
T. suberi                      repA1      Y11966             b, g
D. noxia                       repA2      AF041837           b, g
R. padi                        repA2      X71612             b, g
T. suberi                      repA2      Y11966             b, g
S. graminum                    rho        2827037            b
S. graminum                    rmph       M80817             b
S. graminum                    rnh        L18927             b
S. graminum                    rnpA       M80817             b
S. graminum                    rpoB       Z11913             b
S. graminum                    rpoC       Z11913             b
S. graminum                    rpoD       M90644             b
S. graminum                    rpsA       L43549             b
S. graminum                    secB       M90644             b
S. chinensis                   sohB       U09185             b
S. graminum                    thrS       U11066             b
S. graminum                    tpiA       L43549             b
S. graminum                    trmE       2827009            b
S. chinensis                   trpA       U09185             b
S. graminum                    trpA       Z19055             b
A. pisum                       trpB       L46355             b, d
D. noxia                       trpB       AF038565           b, d
Macrosiphoniella ludovicianae  trpB       AF058428           e
Melaphis rhois                 trpB       L46357             b, d
Rhopalosiphum maidis           trpB       L46356             b, d
R. padi                        trpB       L46358             b, d
S. typhimurium                 trpB       J01810             d
S. chinensis                   trpB       U09185             b, d
S. graminum                    trpB       Z19055             b, d
ECOR 17                        trpB(pt)   U23489             e
ECOR 29                        trpB(pt)   U25425             e
ECOR 31                        trpB(pt)   U23494             e
ECOR 37                        trpB(pt)   U23496             e
ECOR 46                        trpB(pt)   U23495             e
ECOR 50                        trpB(pt)   U23497             e
ECOR 51                        trpB(pt)   U25884             e
ECOR 60                        trpB(pt)   U23499             e
ECOR 71                        trpB(pt)   U23500             e
ECOR 72                        trpB(pt)   U25429             e
Uroleucon aeneum               trpB(pt)   AF058431           d, e
Uroleucon ambrosiae            trpB(pt)   AF058432           d, e
Uroleucon astronomus           trpB(pt)   AF058433           d, e
Uroleucon caligatum            trpB(pt)   L81150             d, e
Uroleucon erigeronense         trpB(pt)   L81151             d, e
Uroleucon helianthicola        trpB(pt)   AF058434           d, e
Uroleucon jaceae               trpB(pt)   AF058435           d, e
Uroleucon jaceicola            trpB(pt)   AF058436           d, e
Uroleucon obscurum             trpB(pt)   AF058437           d, e
Uroleucon rudbeckiae           trpB(pt)   AF058439           d, e
Uroleucon rapunculoidis        trpB(pt)   AF058438           d, e
Uroleucon rurale               trpB(pt)   L81149             d, e
Uroleucon solidaginis          trpB(pt)   AF058440           d, e
Uroleucon sonchi               trpB(pt)   1137716            b, d, e
S. chinensis                   trpC       U09185             b
S. graminum                    trpC       Z19055             b
S. chinensis                   trpD       U09185             b
S. graminum                    trpD       Z19055             b
A. pisum                       trpE       L43555             b, d
D. noxia                       trpE       L46769             b, d
R. maidis                      trpE       L43550             b, d
R. padi                        trpE       L43551             b, d
S. typhimurium                 trpE       V01378             d
S. chinensis                   trpE       U09184             b, d
S. graminum                    trpE       Z21938             b, d
U. caligatum                   trpE       L8124              d
U. erigeronense                trpE       L81123             d
U. rurale                      trpE       L8112              d
U. sonchi                      trpE       1137712            d
A. pisum                       trpG       L43555             b, g
D. noxia                       trpG       L46769             b, g
R. maidis                      trpG       L43550             b, g
R. padi                        trpG       L43551             b, g
S. chinensis                   trpG       U09184             b, g
S. graminum                    trpG       Z21938             b, g
S. graminum                    trxA       2827036            b
A. pisum                       tufA       2369691            b, d
M. rhois                       tufA       2369697            b, d
Pemphigus betae                tufA       2369695            b, d
S. typhimurium                 tufA       X55116             d
S. chinensis                   tufA       L43549             b, d
S. graminum                    tufA       2369693            b, d
Aeromonas salmonicida          aroA       L05002             f
Pseudomonas putida             dnaG       U85774             f
Haemophilus ducreyi            dnaJ       U25996             f
Vibrio cholerae                dnaK       Y14237             f
P. putida                      dnaN       X14791             f
P. aeruginosa                  leuB       U29655             f
Azotobacter vinelandii         leuC       Y11280             f
A. vinelandii                  leuD       Y11280             f
P. putida                      rpoB       X15840             f

a  Buchnera strains are labeled by the aphid species from which they were isolated. All sequences from Escherichia coli and Haemophilus influenzae are accessible from the full genome sequences of these two species (GenBank accession numbers U00096 and L42023, respectively). Individual loci of these two species are not listed.
b  Sequences of Buchnera taxa (listed) and E. coli compared in codon usage analysis.
c  "(pt)" indicates that only a partial sequence is available in GenBank.
d  Sequences of Buchnera and enteric bacteria (including E. coli and the enteric species listed) used in comparison of Ka/Ks.
e  Sequences used in mapping of nucleotide changes across phylogenies, in addition to the trpB sequence of E. coli K12.
f  Taxon 3 in relative-rate tests.
g  No homologue in E. coli with sufficient similarity; locus excluded from comparison of CAI values.

Estimates of Codon Bias in Homologous E. coli Loci

Codon bias in E. coli loci was estimated by the deviation from random use of all four synonyms (not just U- and A-ending codons) at fourfold degenerate sites. Expected values were based on the nucleotide composition at fourfold degenerate sites for a particular locus (calculated using MEA), and deviation from expected was tested using a four-class χ² test. Only amino acids with five or more representatives in a given locus were considered. The magnitudes of χ² values are not directly comparable between E. coli and Buchnera, since the two tests have different numbers of classes and therefore different degrees of freedom.

Analyses of Codon Usage Based on χ² Test Results and Values

E. coli and Buchnera loci were compared in three ways:

- the proportion of all χ² tests with significant results;
- the average χ² values of homologous loci;
- the χ² values for individual amino acids across homologous loci.

Although the actual χ² values are not directly comparable between E. coli and Buchnera (see above), the latter two analyses allow a visual comparison of levels and patterns of codon bias. Codon usage in Buchnera was also explored by testing for the overrepresentation of particular loci among significant χ² test results and by identifying consistent preferences for U- or A-ending codons for particular amino acids.
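The per-locus procedure described under Methods can be sketched as follows. This is a minimal illustration, not the authors' MEA-based pipeline, and it rests on three assumptions:

- "adjusted for a small sample size" is taken to mean Yates' continuity correction on the two-class test;
- local composition comes from the pooled third positions of the locus's own fourfold degenerate codons;
- all helper names are hypothetical.

```python
# Fourfold-degenerate families of the standard code, keyed by the first two
# bases (RNA alphabet); Arg, Leu and Ser contribute only their fourfold synonyms.
FOURFOLD = {"GC": "Ala", "GG": "Gly", "CC": "Pro", "AC": "Thr",
            "GU": "Val", "CG": "Arg", "CU": "Leu", "UC": "Ser"}
MIN_RESIDUES = 5                    # amino acids with fewer codons are skipped
CRITICAL_05 = {1: 3.841, 3: 7.815}  # 5% critical chi-square values, df = 1 and 3


def fourfold_counts(cds: str) -> dict[str, dict[str, int]]:
    """Count third-position bases of fourfold-degenerate codons, per amino acid."""
    seq = cds.upper().replace("T", "U")
    counts = {aa: {b: 0 for b in "UCAG"} for aa in FOURFOLD.values()}
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        aa = FOURFOLD.get(codon[:2])
        if aa is not None and codon[2] in "UCAG":
            counts[aa][codon[2]] += 1
    return counts


def chi2_two_class(n_u: int, n_a: int, weight_u: float, weight_a: float):
    """U- vs A-ending codons against expectations from local U:A composition,
    with Yates' continuity correction (assumed small-sample adjustment)."""
    n = n_u + n_a
    if n < MIN_RESIDUES or weight_u <= 0 or weight_a <= 0:
        return None
    p_u = weight_u / (weight_u + weight_a)
    stat = 0.0
    for observed, expected in ((n_u, n * p_u), (n_a, n * (1.0 - p_u))):
        deviation = max(abs(observed - expected) - 0.5, 0.0)
        stat += deviation ** 2 / expected
    return stat


def chi2_four_class(counts: dict[str, int], base_weights: dict[str, float]):
    """All four synonyms against expectations from local composition (df = 3)."""
    n = sum(counts.values())
    if n < MIN_RESIDUES or any(base_weights[b] <= 0 for b in "UCAG"):
        return None
    total = sum(base_weights[b] for b in "UCAG")
    return sum((counts[b] - n * base_weights[b] / total) ** 2
               / (n * base_weights[b] / total) for b in "UCAG")


def significant(stat, df: int) -> bool:
    """True if a chi-square value exceeds its 5% critical value."""
    return stat is not None and stat > CRITICAL_05[df]


def locus_chi2(cds: str, four_class: bool = False):
    """Per-amino-acid chi-square values and their mean for one locus."""
    per_aa = fourfold_counts(cds)
    # Local composition: pooled third-position bases over all fourfold sites.
    comp = {b: sum(fam[b] for fam in per_aa.values()) for b in "UCAG"}
    if four_class:
        stats = {aa: chi2_four_class(fam, comp) for aa, fam in per_aa.items()}
    else:
        stats = {aa: chi2_two_class(fam["U"], fam["A"], comp["U"], comp["A"])
                 for aa, fam in per_aa.items()}
    kept = [s for s in stats.values() if s is not None]
    return stats, (sum(kept) / len(kept) if kept else None)
```

For a Buchnera locus, locus_chi2(cds) returns the two-class values and their average, as used in the text. locus_chi2(cds, four_class=True) mirrors the four-class test applied to E. coli. The two statistics have different degrees of freedom, so each should be judged only against its own critical value.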
Rates of Nucleotide Divergence

Relative-Rate Tests

Relative-rate tests were used to compare rates of nonsynonymous divergence at homologous loci of Buchnera and E. coli. Tests were performed as described previously (Moran 1996).

Estimation of Ka/Ks

Estimates of nonsynonymous and synonymous pairwise divergence and their standard deviations were calculated using Li's (1993) method (DIVERGE, GCG). Compared with other commonly used estimates of nucleotide divergence, Li's (1993) method has been shown to be reliable for loci with biased base composition (Ina 1995). To avoid inaccurate estimates due to saturation, only pairwise comparisons with Ks < 1.0 were included in the calculation of Ka/Ks ratios.

Mapping of Nucleotide Changes Across Genealogies

A portion of trpB is available for Buchnera of several Uroleucon species and for several E. coli isolates in the ECOR collection. Within each of these two groups, Ka values were too low to estimate reliable Ka/Ks ratios from pairwise divergences. Instead, the percent change at first- and second-codon positions was used to approximate nonsynonymous divergence, and percent change at third positions was used to approximate synonymous divergence. Changes at each codon position were mapped with parsimony across genealogies of the partial trpB sequence and summed across each phylogeny. The tree length at first- and second-codon positions divided by the length at third positions, adjusted for the number of sites, roughly approximates the ratio of divergence at replacement versus silent positions. Phylogenies of trpB partial sequences were estimated by parsimony analysis (PAUP; Swofford 1993). For each of the ECOR and Buchnera trpB data sets, two most-parsimonious trees were found, and a single most-parsimonious tree was selected. For each data set, nucleotide changes at each codon position were mapped across the selected tree (MacClade; Maddison and Maddison 1992).
Changes at first- and second-codon positions were summed across each tree and divided by the total number of first- and second-codon positions (452 nt) to approximate percent change at nonsynonymous sites. Likewise, changes at third positions were summed and divided by the total number of third-codon positions (227 nt) to approximate percent change at synonymous sites. The ratios of these estimates were compared across the ECOR phylogeny and subsets of the Buchnera phylogeny.

Results and Discussion

Evidence for Strong A+T Mutational Pressure Across Buchnera Loci
All Buchnera loci sampled were extremely A+T rich at fourfold degenerate sites (88.4% on average across loci). This is consistent with previous observations of A+T richness across the Buchnera genome (Ishikawa 1987; Clark, Baumann, and Baumann 1992; Ohtaka and Ishikawa 1993; Clark, Baumann, and Baumann 1998).

Comparison of E. coli and Buchnera

Number of Significant χ² Tests
The null hypothesis that codon usage in Buchnera reflects local base composition was tested by a χ² analysis of observed and expected numbers of U- or A-ending codons for eight fourfold degenerate families across several loci. Of the 772 χ² tests performed across Buchnera loci, fewer were significant than expected by chance alone: only 23 tests were significant at the 5% level (about 39 would be expected by chance), and 42 more were significant at the 10% level (table 2). This paucity of significant χ² tests indicates that codon usage generally reflects local base composition. In contrast, the majority of E. coli loci showed significant nonrandom use of alternative codons for most amino acids: of 528 χ² tests performed across E. coli genes, 274 were significant at the 5% level, and 35 more were significant at the 10% level (data not shown).
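As a rough check on the statement that fewer Buchnera tests were significant than expected by chance, the sketch below computes the expected number of significant tests and an exact binomial lower-tail probability using only the Python standard library. It treats the 772 tests as independent, which is an assumption (tests from the same locus share a base composition), so the probability is illustrative rather than a result reported in this study; the helper name `binom_cdf` is hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


N_TESTS = 772      # Buchnera chi-square tests (from the text)
SIG_05 = 23        # significant at the 5% level
SIG_10 = 23 + 42   # significant at the 10% level (65 in all)

expected_05 = N_TESTS * 0.05
expected_10 = N_TESTS * 0.10

# Probability of observing this few significant tests if every null were true,
# under the (strong) assumption that the tests are independent.
p_low_05 = binom_cdf(SIG_05, N_TESTS, 0.05)
p_low_10 = binom_cdf(SIG_10, N_TESTS, 0.10)

print(f"5% level: observed {SIG_05}, expected {expected_05:.1f}, P(X <= obs) = {p_low_05:.4f}")
print(f"10% level: observed {SIG_10}, expected {expected_10:.1f}, P(X <= obs) = {p_low_10:.4f}")
```

The expected counts (38.6 and 77.2) follow directly from the totals given in the text; the tail probabilities only indicate how the comparison could be quantified.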
Comparison of χ² Averaged Across Amino Acids
The depressed codon bias in Buchnera is also apparent in the low χ² values averaged across amino acids for each locus (fig. 1). Although the magnitudes of χ² values are not directly comparable for E. coli and Buchnera (see Methods), a large proportion of E. coli loci have average χ² values greater than the critical value for significance at the 5% level (7.815 for df = 3), whereas no Buchnera locus has an average χ² value that exceeds the critical value for significance at the 5% level (3.841 for df = 1). In addition, a strong relationship exists between the average χ² values of E. coli loci and the CAI based on preferred codons for this species, which is known to be highly correlated with gene expression level (Sharp and Li 1987). In contrast, the lack of correspondence between average χ² estimates of Buchnera genes and the CAI of homologous E. coli loci indicates that codon bias in Buchnera does not track the expression levels of the E. coli homologues, and it provides further evidence against effective translational selection in Buchnera.

Comparison of χ² Values for Individual Amino Acids
The contrast between levels of bias in Buchnera and in E. coli is also evident in the narrow range of χ² values for each fourfold degenerate family across Buchnera loci (fig. 2). Buchnera loci rarely show significantly nonrandom use of U- or A-ending codons for individual amino acids, even for groEL, which is known to be highly expressed in Buchnera (Baumann, Baumann, and Clark 1996). In contrast, χ² values are relatively high for individual amino acids across most E. coli loci, including those considered to be lowly expressed in E. coli, such as the trp genes (Sharp, Tuohy, and Mosurski 1986).

Evidence for Preferential Use of U-ending and A-ending Codons in Buchnera
Despite the severe depression of codon bias in Buchnera, codon usage cannot be attributed solely to mutational bias.
In particular, serine and arginine are generally encoded by U-ending codons, and alanine tends to be encoded by the A-ending codon. For these amino acids, significant χ² tests across loci almost always reflect a higher frequency of one particular (U- or A-ending) codon (table 2). Since the expected values of these tests are based on gene-specific base composition, these significant results cannot be attributed to local variation in A+T content. Likewise, among loci showing slight (nonsignificant) codon preferences, the preferred codon within the fourfold degenerate family for serine tends to be UCU, and alanine tends to be encoded by GCA (P < 0.01 in each case; table 3). These preferences agree with the general trends found in a previous analysis of codon usage across several loci of the endosymbiont of Schizaphis graminum, many of which are included in the genes sampled here (Clark, Baumann, and Baumann 1998). These apparent codon preferences across several Buchnera loci and taxa suggest that purifying selection may effectively reduce the frequency of A-ending codons for serine and arginine and of the U-ending codon for alanine. Buchnera populations may occasionally go through periods in which selection is more effective, possibly because of expansions of aphid population size. Strong mutational bias and drift may largely shape patterns of codon use, but these occasional periods of effective purifying selection may reduce the frequency of nonoptimal codons. Preference for the U-ending codon of the fourfold degenerate families of serine and arginine agrees with the codon preferences of highly expressed genes of E. coli (Sharp et al. 1988). However, codon preferences for alanine in E. coli depend on gene expression level: highly expressed loci are biased toward GCU, which is thought to be the optimal codon for this amino acid, whereas lowly expressed loci are biased toward GCA.
The preference for GCA across Buchnera loci may reflect a change in the optimal codon, possibly precipitated by drift in small populations. It has been suggested that such switches in codon preference among the enteric bacteria may relate to differences among lineages in effective population size (Shields 1990).

Evidence for Bias in Buchnera groEL Genes
A second line of evidence that codon use may be under effective (although weak) selection in Buchnera is the slightly higher level of bias observed at the overexpressed gene groEL. This gene had the greatest overrepresentation in the pool of significant χ² tests relative to its frequency in the original sample (4.1% of all tests in the original sample versus 12.3% of the significant tests; table 4). In particular, for groEL of the symbiont of Acyrthosiphon pisum, four of the eight amino acids considered showed significant χ² values (table 2). Slight codon bias at groEL is also suggested by the relatively high χ² values at this locus (figs. 1, 2). Compared with other Buchnera loci, groEL of Buchnera from Acyrthosiphon pisum has the highest average χ² (see fig. 2).

Table 2
Significant Nonrandom Use of U- or A-Ending Codons Across Several Buchnera Loci for Each of Eight Fourfold Degenerate Families

Taxon (a) | Locus | Amino Acid | No. of Amino Acid Residues in Protein | χ² Value for Nonrandom Use (b) | Codon Ending
Schizaphis graminum | tpiA | A | 14 | 6.401 | A
S. graminum | groES | A | 8 | 4.814 | A
S. graminum | trpG | A | 7 | 4.45 | A
S. graminum | leuA | A | 35 | 3.817 | A
Schlechtendalia chinensis | tufA | A | 22 | 3.532 | A
Acyrthosiphon pisum | groEL | A | 54 | 3.487 | A
S. graminum | trpD | A | 12 | 3.484 | A
S. graminum | trpA | A | 11 | 3.424 | A
Sitobion avenae | groEL | A | 51 | 2.803 | A
S. graminum | rpoD | A | 27 | 2.734 | A
S. graminum | atpA | A | 35 | 5.174 | U
S. chinensis | trpE | G | 18 | 9.044 | A
S. chinensis | trpA | G | 17 | 4.2 | A
S. chinensis | tuf | G | 30 | 3.452 | A
S. graminum | rpoB | G | 90 | 3.227 | A
S. graminum | rep | G | 17 | 3.039 | A
Diuraphis noxia | leuB | G | 19 | 2.754 | A
S. graminum | trpE | G | 16 | 2.858 | U
Thelaxes suberi | leuD | G | 8 | 3.125 | U
S. graminum | dnaN | L | 9 | 3.875 | A
S. chinensis | trpA | L | 5 | 2.995 | U
S. graminum | ftsA | L | 9 | 4 | U
S. graminum | hscA | L | 15 | 6.317 | U
Rhopalosiphum maidis | trpE | L | 18 | 6.368 | U
S. chinensis | trpB | P | 11 | 8.27 | A
A. pisum | groEL | P | 15 | 4.158 | A
S. graminum | dnaA | P | 10 | 2.9 | A
S. graminum | aroH | P | 10 | 2.756 | A
S. avenae | groEL | P | 15 | 2.708 | A
T. suberi | leuA | P | 7 | 2.734 | U
T. suberi | leuC | P | 12 | 2.76 | U
S. graminum | dnaG | P | 8 | 3.265 | U
S. graminum | murC | P | 11 | 3.777 | U
S. graminum | ilvD | R | 14 | 3.435 | A
S. graminum | groEL | R | 17 | 2.7299 | U
S. graminum | trpB | R | 6 | 2.7963 | U
S. graminum | rpoB | R | 35 | 3.0635 | U
S. graminum | dnaA | R | 17 | 3.184 | U
S. graminum | infC | R | 11 | 3.5566 | U
A. pisum | dnaK | R | 9 | 5.0547 | U
A. pisum | groEL | R | 18 | 6.5715 | U
A. pisum | dnaJ | R | 10 | 9.2069 | U
S. chinensis | tuf | S | 10 | 2.915 | U
S. graminum | ddlB | S | 16 | 3.0218 | U
D. noxia | trpE | S | 20 | 3.1565 | U
S. avenae | groEL | S | 29 | 3.2248 | U
R. padi | repA1 | S | 11 | 3.5667 | U
S. graminum | ftsZ | S | 17 | 4.5219 | U
S. graminum | rpoB | S | 64 | 5.8242 | U
S. graminum | trpG | S | 7 | 5.9269 | U
S. chinensis | trpE | S | 33 | 6.7696 | U
A. pisum | groEL | S | 30 | 7.9462 | U
S. graminum | rho | T | 10 | 3.784 | A
T. suberi | leuA | T | 18 | 3.613 | A
S. graminum | ilvD | T | 26 | 3.2 | A
A. pisum | argS | T | 5 | 3 | A
Melaphis rhois | tufA | T | 22 | 2.8356 | U
S. graminum | rpoB | T | 58 | 3.8263 | U
S. graminum | ilvC | T | 18 | 7.2636 | U
S. graminum | trpA | V | 7 | 5.293 | A
Pemphigus betae | tufA | V | 30 | 4.417 | A
S. graminum | gidA | V | 26 | 3.81 | A
M. rhois | tufA | V | 31 | 3.624 | A
D. noxia | leuD | V | 9 | 3.531 | A
S. chinensis | trpA | V | 11 | 2.967 | U

a Buchnera taxa are labeled by the aphid species from which they were isolated.
b Critical values for these two-class χ² tests are 2.706 (P < 0.1) and 3.841 (P < 0.05).

FIG. 1—Relationship between the χ²-based estimate of codon bias and the Codon Adaptation Index (CAI) for several E. coli or Buchnera loci. Each point represents a single locus and is positioned on the y-axis by the χ² value averaged across each fourfold degenerate amino acid, and on the x-axis by the CAI of either the same gene (E. coli) or the homologous E. coli gene (Buchnera). The average χ² includes only χ² values for amino acids with five or more residues in a given locus. In contrast to Buchnera genes, E. coli loci have consistently higher average χ² values and a strong relationship with CAI.

Relative-Rate Tests
In agreement with previous studies, relative-rate tests demonstrated higher rates of nonsynonymous substitution in Buchnera loci relative to E. coli homologues. Rates of sequence evolution in Buchnera genes were 1.3 to 6.9 times faster than in E. coli genes (table 5).

Comparisons of Ka/Ks
For several protein-coding loci, pairwise divergences at nonsynonymous and synonymous sites were calculated across several Buchnera lineages and across E. coli versus Salmonella typhimurium. These pairwise divergences were used to estimate ratios of nonsynonymous to synonymous divergence (Ka/Ks) for Buchnera and for the enterics. Across each locus sampled, values of Ka/Ks were generally higher in Buchnera than in the enterics (fig. 3). For every locus, the maximum Ka/Ks value for Buchnera exceeded the value for the enterics; except for groEL, the minimum Buchnera Ka/Ks value also exceeded the estimate for the enterics (fig. 3b). When several loci were available for a given pair of Buchnera taxa, groEL generally had the lowest Ka and the lowest Ks (fig. 3a). The relatively low Ka/Ks ratios for this gene may be attributed to a depression of nonsynonymous divergence that is more extreme than the depression observed at synonymous sites. Compared with other genes in Buchnera, purifying selection is apparently more effective against replacement substitutions at this highly expressed locus.

FIG. 2—Comparison of χ² values for individual amino acids across several homologous E. coli and Buchnera loci. Buchnera loci are labeled by the taxon from which a gene was sampled (Sg = Schizaphis graminum, Ap = Acyrthosiphon pisum, Mr = Melaphis rhois, Pb = Pemphigus betae, Sc = Schlechtendalia chinensis, Rp = Rhopalosiphum padi, Sa = Sitobion avenae, Mp = Myzus persicae, Dn = Diuraphis noxia, Rm = R. maidis, Usn = Uroleucon sonchi, Ts = Thelaxes suberi). For illustrative purposes only, χ² values of E. coli are graphed as negative values if the preferred codon is a nonoptimal codon as defined by the E. coli Relative Synonymous Codon Usage table (Sharp et al. 1988). χ² values of Buchnera are graphed as negative values if the A-ending codon is preferred. For each locus, levels of bias in E. coli are much higher than those in Buchnera. The locus showing the strongest evidence for bias in Buchnera is groEL, particularly that of Acyrthosiphon pisum.

Table 3
Number of Buchnera Loci Showing a Slight Preference for U- or A-Ending Codons for Each of Eight Fourfold Degenerate Families

Amino Acid | No. of Buchnera Loci (a): xxU | xxA | Total Loci Considered | Expected No. of Loci Showing a Preference for A-Ending Codon | χ² Value (b) | P Value (<)
A | 37 | 53 | 90 | 39.8 | 7.77 | 0.01
G | 52 | 47 | 99 | 43.8 | 0.42 |
L | 33 | 37 | 70 | 31.0 | 2.09 |
P | 37 | 48 | 85 | 37.6 | 5.12 | 0.025
R | 32 | 23 | 55 | 24.3 | 0.13 |
S | 72 | 31 | 103 | 45.6 | 8.32 | 0.01
T | 41 | 54 | 95 | 42.0 | 6.07 | 0.01
V | 55 | 51 | 106 | 46.9 | 0.64 |

a Number of Buchnera loci showing a slight preference for the A-ending codon of that amino acid and number showing a preference for the U-ending codon.
b The χ² value expresses the deviation of the observed from the expected number of loci with a preference for A-ending codons.

Mapping of Base Changes Across Genealogies
Sequence evolution at a portion of trpB was compared across two very shallow taxonomic groups: the ECOR collection of E.
coli strains and Buchnera associated with the aphid genus Uroleucon. Because of low levels of divergence within each group, nucleotide changes were summed across genealogies of trpB rather than calculated pairwise (fig. 4). Given the higher divergence among Uroleucon isolates, distinct trpB clades were considered separately. Each clade, however, gave comparable values for the ratio of the tree length at first and second positions to the tree length at third positions, adjusted for the number of sites. This approximation of nonsynonymous divergence divided by synonymous divergence is considerably higher for the Uroleucon isolates than for the E. coli strains (0.162 to 0.285 across subsets of the Buchnera trpB phylogeny versus 0.0328 across the E. coli trpB phylogeny; fig. 4). The ECOR and Buchnera trees each include nodes with relatively weak support (see bootstrap values, fig. 4). For the purposes of mapping nucleotide changes at each codon position, these weakly supported nodes do not affect the results. It should be noted, however, that the resolved trees presented are not intended to represent exact relationships among E. coli or Buchnera isolates.

Estimates of Synonymous Substitution in E. coli versus Buchnera
Under the assumption that rates of synonymous substitution reflect mutation rates, high ratios of nonsynonymous to synonymous divergence indicate that replacement substitutions accumulate at faster rates in a particular lineage (Brynnel et al. 1998). This assumption may be violated by the differing levels of codon bias in Buchnera and E. coli. Selection for translational efficiency in E. coli may depress its rate of synonymous substitution and thus elevate its Ka/Ks estimates. However, this discrepancy in codon bias would, if anything, work against the observed trend of higher Ka/Ks ratios in Buchnera.
Therefore, the low level of adaptive codon bias in Buchnera only strengthens the interpretation that relatively high Ka/Ks ratios across Buchnera loci reflect an elevation of nonsynonymous substitutions.

Table 4
Frequencies of Buchnera Loci in the Original Sample Compared to Their Representation in the Pool of Loci for Which There Is Evidence of Significant Codon Bias

Locus | No. of χ² Tests | Fraction of Total χ² Tests Performed (a) | No. of Significant χ² Tests | Fraction of Significant χ² Tests (b)
groEL (c) | 32 | 0.041 | 8 | 0.123
groES | 8 | 0.010 | 1 | 0.015
leu genes | 119 | 0.154 | 7 | 0.108
trpB | 64 | 0.083 | 2 | 0.031
trpA | 15 | 0.019 | 5 | 0.077
trpD | 14 | 0.018 | 1 | 0.015
trpE | 46 | 0.060 | 5 | 0.077
trpG | 36 | 0.047 | 2 | 0.031
tuf | 39 | 0.051 | 6 | 0.092

a A total of 772 χ² tests were performed.
b The total number of significant χ² tests was 65.
c groEL had the greatest overrepresentation in the pool of significant tests compared to its frequency in the original sample.

Table 5
Relative-Rates Test for Substitutions at Nondegenerate Sites in Loci of Buchnera Versus Escherichia coli

Gene | Function | No. Codons | Taxon 3 (a) | K12 | K13 | K23 | K13–K23 | z (b) | K01/K02
ilvC | Isoleucine valine biosynthesis | 492 | Haemophilus influenzae | 0.27 | 0.29 | 0.12 | 0.17 | 7.65*** | 4.3
ilvD | Isoleucine valine biosynthesis | 617 | H. influenzae | 0.25 | 0.28 | 0.17 | 0.10 | 5.24*** | 2.4
ilvI | Isoleucine valine biosynthesis | 568 | H. influenzae | 0.22 | 0.25 | 0.22 | 0.02 | 1.20 | 1.3
leuA | Leucine biosynthesis | 507 | H. influenzae | 0.24 | 0.29 | 0.24 | 0.06 | 2.36** | 1.6
leuB | Leucine biosynthesis | 365 | Pseudomonas aeruginosa | 0.30 | 0.45 | 0.40 | 0.05 | 1.33 | 1.4
leuC | Leucine biosynthesis | 465 | Azotobacter vinelandii | 0.31 | 0.58 | 0.37 | 0.20 | 5.18*** | 4.7
leuD | Leucine biosynthesis | 208 | A. vinelandii | 0.42 | 0.55 | 0.34 | 0.21 | 3.66*** | 2.9
aroA | Aromatic amino acid biosynthesis | 226 | Aeromonas salmonicida | 0.25 | 0.42 | 0.36 | 0.05 | 1.21 | 1.5
cysE | Serine/glycine family amino acid biosynthesis | 251 | H. influenzae | 0.43 | 0.41 | 0.21 | 0.20 | 4.83*** | 2.8
dnaJ | Heat-shock protein | 380 | Haemophilus ducreyi | 0.21 | 0.37 | 0.30 | 0.07 | 2.23* | 2.0
dnaK | Heat-shock protein | 638 | Vibrio cholerae | 0.12 | 0.16 | 0.10 | 0.06 | 3.84*** | 2.6
dnaN | Chromosome replication | 368 | Pseudomonas putida | 0.59 | 0.71 | 0.38 | 0.33 | 6.26*** | 3.6
dnaG (pt) (c) | Replication | 321 | P. putida | 0.45 | 1.06 | 0.77 | 0.29 | 3.34** | 4.7
secB | Protein export, molecular chaperonin | 155 | H. influenzae | 0.44 | 0.58 | 0.36 | 0.21 | 3.13** | 2.9
rpoB | Transcription (RNA polymerase) | 1360 | P. putida | 0.12 | 0.25 | 0.21 | 0.04 | 3.37** | 2.0
rpsA | Translation (ribosomal protein) | 557 | H. influenzae | 0.16 | 0.22 | 0.14 | 0.09 | 5.01*** | 3.4
atpD | Energy metabolism (ATP synthase) | 465 | H. influenzae | 0.11 | 0.14 | 0.07 | 0.07 | 4.48*** | 3.7
tpiA | Electron transport | 253 | H. influenzae | 0.53 | 0.56 | 0.20 | 0.36 | 7.21*** | 5.3
gapA | Electron transport | 336 | H. influenzae | 0.18 | 0.23 | 0.09 | 0.14 | 6.00*** | 6.9
gidA | Chromosome replication | 629 | H. influenzae | 0.31 | 0.32 | 0.18 | 0.14 | 6.38*** | 2.6

a In each test, taxon 1 is Buchnera of Schizaphis graminum (except for dnaJ and dnaK, for which it is Buchnera of Acyrthosiphon pisum), and taxon 2 is always Escherichia coli. Taxon 3 is a more distantly related reference taxon.
b z scores were calculated as described by Muse and Weir (1992). Probabilities for a one-tailed t-test (H0: K01 ≤ K02) are * P < 0.05, ** P < 0.01, *** P < 0.0001.
c "(pt)" indicates that only a partial sequence is available in GenBank.

More problematic is the possibility that Ks in Buchnera is underestimated because of the strong A+T bias across loci and more rapid saturation at silent sites. However, calculating Ka/Ks ratios only across shallow taxonomic levels (Ks < 1.0) avoided the problem of saturation at synonymous sites and the large standard errors that accompany high divergence estimates. In addition, comparisons of changes at first- and second- versus third-codon positions across very shallow taxonomic levels (Buchnera isolates of Uroleucon and members of the ECOR collection of E. coli) also suggest higher rates of fixation at replacement sites, relative to synonymous sites, in Buchnera.

Conclusions
Because of their small population sizes and limited opportunities for recombination, vertically inherited endosymbionts provide a good model system for testing the effects of increased drift on sequence evolution in bacteria. In this study, the lack of adaptive codon bias across several Buchnera loci suggests that codon usage is shaped primarily by A+T mutational bias rather than by translational selection. In addition, relative-rate tests and comparisons of Ka/Ks ratios support previous conclusions that Buchnera lineages experience rapid sequence evolution at nonsynonymous sites compared with their free-living relative E. coli. These results suggest that selection is ineffective in eliminating two types of weakly deleterious mutations from Buchnera populations: those resulting in nonoptimal codons and those resulting in amino acid replacements.

Loci encoding enzymes for the biosynthesis of essential amino acids might be suspected of experiencing unusual selection in Buchnera, which provisions its host insects with these nutrients; essential amino acids are limiting in the plant phloem on which aphids feed.
In several Buchnera lineages, genes for anthranilate synthase (trpEG), the first and limiting enzyme in the tryptophan biosynthetic pathway, are amplified in tandem repeats on multicopy plasmids (Lai, Baumann, and Baumann 1994; Rouhbakhsh et al. 1996, 1997; Baumann et al. 1997, 1998a). This increased copy number is considered an adaptation that benefits the aphid hosts, which do not synthesize tryptophan and depend on Buchnera to provide this essential nutrient (Baumann et al. 1995; Douglas 1998). Likewise, the observed duplications of leucine genes and their occurrence on plasmids may represent an adaptation for overexpression (Bracho et al. 1995; van Ham et al. 1997; Baumann et al. 1998b). In addition, groEL is known to be constitutively overexpressed in several intracellular bacteria, both mutualistic (Aksoy 1995; Baumann, Baumann, and Clark 1996) and pathogenic (Garduno et al. 1998). In Buchnera, the GroEL protein makes up about 10% of the total protein produced (Ishikawa 1984; Hara et al. 1990). The mechanism for this overexpression is uncertain but likely involves changes in gene regulation rather than gene duplication, since groEL is apparently single copy in at least one Buchnera species in which it is overexpressed, the symbiont of Acyrthosiphon pisum (Ohtaka and Ishikawa 1993). In E. coli, the protein is a heat shock chaperonin involved in the folding, assembly, and translocation of other polypeptides (Bochdareva, Lissen, and Girshovich 1988; Goloubinoff et al. 1989; Goloubinoff, Gatenby, and Lorimer 1989). While its role in endosymbionts is less certain, it may function to stabilize proteins that have accumulated amino acid substitutions, as suggested by Moran (1996).

FIG. 3—Comparison of levels of nonsynonymous substitutions (Ka) and synonymous substitutions (Ks) across several loci of the enteric bacteria and Buchnera. (A) Pairwise divergence at nonsynonymous and synonymous sites, calculated across Buchnera taxa and across E. coli versus Salmonella typhimurium. Ks values higher than 1.0 were excluded. (B) Ratios of nonsynonymous to synonymous substitutions, based on pairwise divergence across Buchnera taxa and across E. coli versus S. typhimurium. With the exception of groEL, all Ka/Ks estimates for Buchnera exceed the ratio calculated for the homologous gene of the enteric bacteria. All Ka/Ks ratios are based on Ks values less than 1.0.

In this study, sequence evolution of these functionally important Buchnera genes shows the same patterns as that of housekeeping genes that are not overexpressed. In particular, genes in the tryptophan (trpABC(F)DE) and leucine (leuABCD) biosynthetic pathways and groEL each show low levels of codon bias and rapid rates of nonsynonymous substitution relative to their E. coli homologues. The fact that the observed trends of depressed codon bias and accelerated evolutionary rates occur across all Buchnera loci included, even functionally important genes, argues for an increased effect of drift within endosymbiont populations. One alternative explanation for elevated evolutionary rates at nonsynonymous sites is positive selection for amino acid changes; however, such selection typically acts at specific loci rather than across the genome. In addition, there is no evidence that the acceleration of nonsynonymous substitution rates is any higher across Buchnera biosynthetic genes than across other Buchnera loci. Combining loci from table 5 with those examined previously (Moran 1996; table 2), 14 loci encode amino acid biosynthetic enzymes and 14 encode products with other functions. Of the first set, seven have K01/K02 > 2 and seven have K01/K02 < 2. Of the second set, 12 have K01/K02 > 2, and 2 have K01/K02 < 1. Thus, amino acid biosynthetic loci tend to be less accelerated than other Buchnera genes.
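The K01/K02 ratios discussed above (and listed in table 5) compare the Buchnera branch (0 to 1) with the E. coli branch (0 to 2), where 0 is their common ancestor. The following minimal sketch assumes the standard three-taxon decomposition of pairwise distances, K01 = (K12 + K13 − K23)/2 and K02 = (K12 + K23 − K13)/2, so that K01 − K02 = K13 − K23; the text defers the exact procedure to Moran (1996), and the z scores of Muse and Weir (1992) are not reimplemented here. Because the table 5 distances are rounded to two decimals, recomputed ratios can differ slightly from the published ones (ilvC gives about 4.4 rather than 4.3). The names `Triplet`, `branch_lengths`, and `rate_ratio` are hypothetical.

```python
from typing import NamedTuple, Tuple


class Triplet(NamedTuple):
    """Pairwise distances: taxon 1 = Buchnera, 2 = E. coli, 3 = outgroup."""
    k12: float
    k13: float
    k23: float


def branch_lengths(t: Triplet) -> Tuple[float, float]:
    """Split K12 into the two lineage-specific branches from their common ancestor."""
    k01 = (t.k12 + t.k13 - t.k23) / 2  # Buchnera branch
    k02 = (t.k12 + t.k23 - t.k13) / 2  # E. coli branch
    return k01, k02


def rate_ratio(t: Triplet) -> float:
    """Relative rate of the Buchnera branch versus the E. coli branch."""
    k01, k02 = branch_lengths(t)
    return k01 / k02


# Rounded distances at nondegenerate sites, copied from table 5.
TABLE5 = {
    "ilvC": Triplet(0.27, 0.29, 0.12),
    "ilvI": Triplet(0.22, 0.25, 0.22),
    "leuB": Triplet(0.30, 0.45, 0.40),
    "tpiA": Triplet(0.53, 0.56, 0.20),
}

for gene, t in TABLE5.items():
    r = rate_ratio(t)
    label = "> 2" if r > 2 else "<= 2"
    print(f"{gene}: K01/K02 = {r:.1f} ({label})")
```

Tallying loci with K01/K02 > 2, as in the comparison of biosynthetic and other genes above, then reduces to comparing `rate_ratio(...)` with 2.0 for each locus.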
A second alternative hypothesis for the lack of codon bias and the accelerated evolutionary rates in Buchnera is a relaxation of purifying selection against nonoptimal codons and replacement substitutions; perhaps the intracellular environment is more constant than that experienced by free-living bacteria. While the results here cannot exclude the possibility of relaxed selection, two features of the data argue against it. First, selection is unlikely to be relaxed across all loci, yet the trends we observed occur consistently across each locus examined. Second, selection is not likely to be relaxed at loci that are functionally important in the symbiosis and that are known to be overexpressed in Buchnera. Given their apparent significance in the symbiosis, it is difficult to imagine relaxed selection on trp genes, leu genes, and groEL. While the addition of more loci would be desirable, the consistency of the results across a variety of loci best supports the view that the effects of mutational bias and drift in Buchnera are sufficiently strong to override purifying selection against nonoptimal codons and amino acid replacements.

FIG. 4—Mapping of nucleotide changes across genealogies of trpB for (A) strains in the ECOR collection of E. coli isolates and (B) several Buchnera isolates from the aphid genus Uroleucon (Uae = Uroleucon aeneum, Uja = U. jaceae, Usl = U. solidaginis, Uam = U. ambrosiae, Uas = U. astronomus, Urd = U. rudbeckiae, Usn = U. sonchi, Uo = U. obscurum, Urp = U. rapunculoidis, Ujl = U. jaceicola, Uc = U. caligatum, Uh = U. helianthicola, Urr = U. rurale, Ue = U. erigeronense). TrpB phylogenies were estimated using parsimony analysis of all sites (679 nucleotides). For both data sets, the tree presented is one of two most-parsimonious trees. Confidence in nodes was assessed by bootstrapping (1,000 replicates). The numbers of unambiguous nucleotide changes along branches are given in parentheses; the number of changes at first- and second-codon positions is followed by the number of changes at third-codon positions (in B). Nucleotide changes were summed across the entire ECOR tree and across subsets of taxa in the Buchnera tree (see figure inset). The percent change at first and second positions, divided by the percent change at third positions, roughly approximates Ka/Ks.

In contrast to trp and leu genes, groEL does not appear to be duplicated in Buchnera (Ohtaka and Ishikawa 1993), and its constitutive overexpression may depend on rapid, efficient translation of a limited number of mRNA molecules. It is not surprising, therefore, that slight codon bias was detected for Buchnera groEL, albeit at much lower levels than for the E. coli homologue.

Acknowledgments
We thank Joana Silva for her comments on an earlier version of this paper, and we thank two anonymous reviewers for their helpful suggestions. This work was supported by a Center for Insect Science postdoctoral fellowship to J.J.W. and by an NSF grant to N.M. (DEB9527635).

LITERATURE CITED
AKASHI, H., and S. W. SCHAEFFER. 1997. Natural selection and the frequency distributions of "silent" DNA polymorphism in Drosophila. Genetics 146:295–307.
AKSOY, S. 1995. Molecular analysis of the endosymbionts of tsetse flies: 16S rDNA locus and over-expression of a chaperonin. Insect Mol. Biol. 4:23–29.
ANDERSSON, S. G. E., and P. M. SHARP. 1996. Codon usage and base composition in Rickettsia prowazekii. J. Mol. Evol. 42:525–536.
BAUMANN, P., L. BAUMANN, and M. CLARK. 1996. Levels of Buchnera aphidicola chaperonin groEL during growth of the aphid Schizaphis graminum. Curr. Microbiol. 32:279–285.
BAUMANN, P., L. BAUMANN, M. A. CLARK, and M. L. THAO. 1998a. Buchnera aphidicola: the endosymbiont of aphids. ASM News 64:203–209.
BAUMANN, P., L. BAUMANN, C. LAI, D. ROUHBAKHSH, N. MORAN, and M. CLARK. 1995. Genetics, physiology, and evolutionary relationships of the genus Buchnera: intracellular symbionts of aphids. Annu. Rev. Microbiol. 49:55–94.
BAUMANN, L., P. BAUMANN, N. A. MORAN, J. SANDSTROM, and M. L. THAO. 1998b. Genetic characterization of plasmids containing genes encoding enzymes of leucine biosynthesis in endosymbionts (Buchnera) of aphids. J. Mol. Evol. (in press).
BAUMANN, L., M. A. CLARK, D. ROUHBAKHSH, P. BAUMANN, N. A. MORAN, and D. J. VOEGTLIN. 1997. Endosymbionts (Buchnera) of the aphid Uroleucon sonchi contain plasmids with trpEG and remnants of trpE pseudogenes. Curr. Microbiol. 35:18–21.
BERG, O. G. 1995. Kinetics of synonymous codon change for an amino acid of arbitrary degeneracy. J. Mol. Evol. 41:345–352.
BOCHDAREVA, E. S., N. M. LISSEN, and A. S. GIRSHOVICH. 1988. Transient association of newly synthesized unfolded proteins with the heat-shock GroEL protein. Nature 336:254–257.
BRACHO, A. M., D. MARTINEZ-TORRES, A. MOYA, and A. LATORRE. 1995. Discovery and molecular characterization of a plasmid localized in Buchnera sp. bacterial endosymbiont of the aphid Rhopalosiphum padi. J. Mol. Evol. 41:67–73.
BRYNNEL, E. U., C. G. KURLAND, N. A. MORAN, and S. G. E. ANDERSSON. 1998. Evolutionary rates for tuf genes in endosymbionts of aphids. Mol. Biol. Evol. 15:574–582.
BUCHNER, P. 1965. Endosymbiosis of animals with plant microorganisms. Wiley and Sons, New York.
BULMER, M. 1991. The selection-mutation-drift theory of synonymous codon usage. Genetics 129:897–907.
CLARK, M. A., L. BAUMANN, and P. BAUMANN. 1992. Sequence analysis of an aphid endosymbiont DNA fragment containing rpoB (β-subunit of RNA polymerase) and portions of rplL and rpoC. Curr. Microbiol. 25:283–290.
CLARK, M. A., L. BAUMANN, and P. BAUMANN. 1998. Sequence analysis of a 34.7-kb DNA segment from the genome of Buchnera aphidicola (endosymbiont of aphids) containing groEL, dnaA, the atp operon, gidA, and rho. Curr. Microbiol. 36:158–163.
DOUGLAS, A. 1998. Nutritional interactions in insect-microbial symbioses: aphids and their symbiotic bacteria Buchnera. Annu. Rev. Entomol. 43:17–37.
DYKHUIZEN, D. E., and L. GREEN. 1993. Recombination in E. coli and the definition of biological species. J. Bacteriol. 173:7257–7268.
GARDUNO, R. A., G. FAULKNER, M. A. TREVORS, N. VATS, and P. S. HOFFMAN. 1998. Immunolocalization of Hsp60 in Legionella pneumophila. J. Bacteriol. 180:505–513.
GOLOUBINOFF, P., J. T. CHRISTELLER, A. A. GATENBY, and G. H. LORIMER. 1989. Reconstitution of active dimeric ribulose bisphosphate carboxylase from an unfolded state depends on two chaperonin proteins and Mg-ATP. Nature 342:884–889.
GOLOUBINOFF, P., A. A. GATENBY, and G. H. LORIMER. 1989. GroE heat-shock proteins promote assembly of foreign prokaryotic ribulose bisphosphate carboxylase oligomers in Escherichia coli. Nature 337:44–47.
GOUY, M., and C. GAUTIER. 1982. Codon usage in bacteria: correlation with gene expressivity. Nucleic Acids Res. 10:7055–7074.
HARA, E., T. FUKATSU, K. KAKEDA, M. KENGAKU, C. OHTAKA, and H. ISHIKAWA. 1990. The predominant protein in an aphid endosymbiont is homologous to an E. coli heat shock protein. Symbiosis 8:271–283.
HINDE, R. 1971. The control of the mycetocyte symbiotes of the aphids Brevicoryne brassicae, Myzus persicae and Macrosiphum rosae. J. Insect Physiol. 17:1791–1800.
IKEMURA, T. 1981. Correlation between the abundance of E. coli transfer RNA's and the occurrence of respective codons in its protein genes. J. Mol. Biol. 146:1–21.
IKEMURA, T. 1985. Codon usage and tRNA content in unicellular and multicellular organisms. Mol. Biol. Evol. 2:13–34.
INA, Y. 1995. New methods for estimating the numbers of synonymous and nonsynonymous substitutions. J. Mol. Evol. 40:190–226.
ISHIKAWA, H. 1984. Characterization of the protein species synthesized in vivo and in vitro by an aphid endosymbiont. Insect Biochem. 14:417–425.
ISHIKAWA, H. 1987. Nucleotide composition and kinetic complexity of the genomic DNA of an intracellular symbiont in the pea aphid Acyrthosiphon pisum. J. Mol. Evol. 24:205–211.
LAI, C. Y., L. BAUMANN, and P. BAUMANN. 1994. Amplification of trpEG: adaptation of Buchnera aphidicola to an endosymbiotic association with aphids. Proc. Natl. Acad. Sci. USA 91:3819–3823.
LAMBERT, J. D., and N. A. MORAN. 1998. Deleterious mutations destabilize ribosomal RNA in endosymbiotic bacteria. Proc. Natl. Acad. Sci. USA 95:4458–4462.
LI, W. H. 1987. Models of nearly neutral mutations with particular implications for nonrandom usage of synonymous codons. J. Mol. Evol. 24:337–345.
LI, W. H. 1993. Unbiased estimation of the rates of synonymous and nonsynonymous substitution. J. Mol. Evol. 36:96–99.
MADDISON, W. P., and D. R. MADDISON. 1992. MacClade: analysis of phylogeny and character evolution. Version 3.0. Sinauer Associates, Sunderland, Mass.
MAYNARD SMITH, J., C. B. DOWSON, and B. G. SPRATT. 1991. Localized sex in bacteria. Nature 349:29–31.
MAYNARD SMITH, J., N. SMITH, M. O'ROURKE, and B. SPRATT. 1993. How clonal are bacteria? Proc. Natl. Acad. Sci. USA 90:4384–4388.
MORAN, N. 1996. Accelerated evolution and Muller's ratchet in endosymbiotic bacteria. Proc. Natl. Acad. Sci. USA 93:2873–2878.
MORAN, N., and P. BAUMANN. 1994. Phylogenetics of cytoplasmically inherited microorganisms of arthropods. Trends Ecol. Evol. 9:15–20.
MORAN, N. A., C. D. VON DOHLEN, and P. BAUMANN. 1995. Faster evolutionary rates in endosymbiotic bacteria than in cospeciating insect hosts. J. Mol. Evol. 41:727–731.
MORTON, B. R. 1998. Selection on the codon bias of chloroplast and cyanelle genes in different plant and algal lineages. J. Mol. Evol. 46:449–459.
MUNSON, M. A., P. BAUMANN, M. A. CLARK, L. BAUMANN, N. A. MORAN, D. J. VOEGTLIN, and B. C. CAMPBELL. 1991. Evidence for the establishment of aphid-eubacterium endosymbiosis in an ancestor of four aphid families. J. Bacteriol. 173:6321–6324.
MUSE, S. V., and B. S. WEIR. 1992.
Testing for equality of evolutionary rates. Genetics 132:269–275. MUTO, A., and S. OSAWA. 1987. The guanine and cytosine content of genomic DNA and bacterial evolution. Proc. Natl. Acad. Sci. USA 84:166–169. OHTA, T. 1973. Slightly deleterious mutant substitutions in evolution. Nature 246:96–98. OHTAKA, C., and H. ISHIKAWA. 1993. Accumulation of adenine and thymine in a groE-homologous operon of an intracellular symbiont. J. Mol. Evol. 36:121–126. Evidence for Drift in Endosymbionts OHTAKA, C., H. NAKAMURA, and J. ISHIKAWA. 1992. Structures of chaperonins from an intracellular symbiont and their functional expression in E. coli groE mutants. J. Bacteriol. 174:1869–74. ROUHBAKHSH, D., M. A. CLARK, L. BAUMANN, N. A. MORAN, and P. BAUMANN. 1997. Evolution of the tryptophan biosynthetic pathway in Buchnera (aphid endosymbionts): studies of plasmid-associated trpEG within the genus Uroleucon. Mol. Phylogenet. Evol. 8:167–176. ROUHBAKHSH, D., C. Y. LAI, C. D. VON DOHLEN, M. A. CLARK, L. BAUMANN, P. BAUMANN, N. A. MORAN, and D. J. VOEGTLIN. 1996. The tryptophan biosynthetic pathway of aphid endosymbionts (Buchnera): genetics and evolution of plasmid-associated anthranilate synthase (trpEG) within the aphididae. J. Mol. Evol. 42:414–421. SELANDER, R. K., D. A. CAUGANT, and T. S. WHITTAM. 1987. Genetic structure and variation in natural populations of Escherichia coli. Pp. 1625–1648 in F. NEIDHARDT, ed. Escherichia coli and Salmonella typhimurium: cellular and molecular biology. American Society of Microbiology, Washington, D.C. SHARP, P., E. COWE, D. G. HIGGINS, D. C. SHIELDS, K. H. WOLFE, and F. WRIGHT. 1988. Codon usage patterns in E. coli, B. subtilis, S. cerevisiae, S. pombe, D. melanogaster and H. sapiens: a review of the considerable within-species diversity. Nucleic Acids Res. 16:8207–8211. SHARP, P. M., and K. M. DEVINE. 1989. Codon usage and gene expression level in Dictyostelium discoideum: highly expressed genes do ‘prefer’ optimal codons. Nucleic Acids Res. 
17:5029–39. SHARP, P. M., and W. H. LI. 1987. The codon adaptation index—a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 15: 1281–1295. 97 SHARP, P., T. M. F. TUOHY, and K. R. MOSURSKI. 1986. Codon usage in yeast: cluster analysis clearly differentiates highly and lowly expressed genes. Nucleic Acids Res. 14:5125– 5143. SHIELDS, D. C. 1990. Switches in species-specific codon preferences: the influence of mutation biases. J. Mol. Evol. 31: 71–80. SHIELDS, D. C., and P. M. SHARP. 1987. Synonymous codon usage in Bacillus subtilis reflects both translational selection and mutational biases. Nucleic Acids Res. 16:8023–8040. SHIELDS, D. C., P. M. SHARP, D. G. HIGGINS, and F. WRIGHT. 1988. ‘‘Silent’’ sites in Drosophila genes are not neutral: evidence of selection among synonymous codons. Mol. Biol. Evol. 5:704–716. SOKAL, R. R., and F. J. ROHLF. 1981. Biometry. W.H. Freeman and Co., New York. SUOEKA, N. 1961. Correlation between base composition of DNA and amino acid composition of protein. Proc. Natl. Acad. Sci. USA 47:1141–1149. SWOFFORD, D. L. 1993. PAUP: phylogenetic analysis using parsimony. Version 3.1.1. Illinois Natural History Survey, Champaign. VAN HAM, R., A. MOYA, and A. LATORRE. 1997. Putative evolutionary origin of plasmids carrying the genes involved in leucine biosynthesis in Buchnera aphidicola. J. Bacteriol. 179:4768–4777. WRIGHT, F., and M. J. BIBB. 1992. Codon usage in the G1Crich Streptomyces genome. Gene 113:55–65. WRIGHT, S. 1931. Evolution in Mendelian populations. Genetics 16:97–159. GEOFFREY I. MCFADDEN, reviewing editor Accepted September 24, 1998