
SUPPLEMENTARY MATERIALS

Ecophysiology of freshwater Verrucomicrobia inferred from genomes
recovered through time-series metagenomics

Shaomei He1,2, Sarah LR Stevens1, Leong-Keat Chan4, Stefan Bertilsson3, Tijana Glavina del
Rio4, Susannah G Tringe4, Rex R Malmstrom4, and Katherine D McMahon1,5,*

1 Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, USA
2 Department of Geoscience, University of Wisconsin-Madison, Madison, WI, USA
3 Department of Ecology and Genetics, Limnology and Science for Life Laboratory, Uppsala
University, Uppsala, Sweden
4 DOE Joint Genome Institute, Walnut Creek, CA, USA
5 Department of Civil and Environmental Engineering, University of Wisconsin-Madison,
Madison, WI, USA
* Corresponding author

SUPPLEMENTARY TEXT

Population abundance estimated from MAG coverage depth
Abundance of populations represented by MAGs at the sampling time points was inferred
from the coverage depth of these MAGs within individual metagenomes. First, merged reads
from each metagenome were mapped to all MAGs using the Burrows–Wheeler aligner
(BWA)-backtrack alignment algorithm with a 95% sequence identity cutoff and n=0.05, as
described in Bendall et al. (2016). The coverage depth of each contig was then calculated
from the number of reads mapped to it. The contig coverage depth was weighted by contig
length and averaged within each MAG, so that longer contigs (which tend to have more
reliable coverage estimates) carry more weight in the estimate of the MAG coverage depth.
Finally, the MAG coverage depth within each metagenome was normalized by the total
number of reads in that metagenome and multiplied by the maximal number of reads among
all metagenomes, so that the coverage can be compared across different time points and
different lakes (Figure 2).
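The weighting and normalization steps described above can be sketched as follows. This is a minimal illustration rather than the pipeline actually used: the function names and input structures are hypothetical, and per-contig coverage depths are assumed to have already been computed from the read mapping.

```python
from typing import Dict, List, Tuple


def mag_coverage(contigs: List[Tuple[int, float]]) -> float:
    """Length-weighted mean coverage depth of one MAG in one metagenome.

    Each contig is given as (length_bp, coverage_depth); both values are
    assumed to come from an upstream read-mapping step.
    """
    total_length = sum(length for length, _ in contigs)
    if total_length == 0:
        raise ValueError("MAG has no contig sequence")
    return sum(length * depth for length, depth in contigs) / total_length


def normalize_coverage(raw: Dict[str, float], reads: Dict[str, int]) -> Dict[str, float]:
    """Rescale MAG coverage so that metagenomes of different sizes are comparable.

    raw   : metagenome ID -> MAG coverage depth in that metagenome
    reads : metagenome ID -> total number of reads in that metagenome
    Coverage is divided by the metagenome's read count and multiplied by the
    largest read count among all metagenomes.
    """
    max_reads = max(reads.values())
    return {sample: raw[sample] * max_reads / reads[sample] for sample in raw}


if __name__ == "__main__":
    # Hypothetical MAG with a short and a long contig.
    print(mag_coverage([(1000, 10.0), (3000, 20.0)]))
    # Hypothetical metagenomes of different sequencing depth.
    print(normalize_coverage({"a": 10.0, "b": 10.0}, {"a": 1_000_000, "b": 2_000_000}))
```

With this scaling, a MAG with the same raw coverage in a smaller metagenome receives a proportionally higher normalized coverage.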

Glycolate utilization
Previously, Verrucomicrobia were suggested to be among the glycolate utilizers in humic
lakes, based on the retrieval of genes encoding glycolate oxidase subunit D (glcD) (Paver
and Kent 2010). Glycolate is an algal exudate, which was suggested to influence bacterial
community structure in lakes. The first step in bacterial glycolate utilization is the
conversion of glycolate to glyoxylate by glycolate oxidase, a multi-subunit protein complex
consisting of subunits D, E, and F (glcDEF). In E. coli, all three subunits are essential to
its activity (Pellicer et al 1996). The glc operon of E. coli also contains glcB, encoding
malate synthase G, which converts glyoxylate to malate to be utilized through the TCA
cycle. Among the MAGs, only TE4605 possesses all three subunits of glycolate oxidase
(glcDEF) (Figure S6). However, unlike in E. coli, the TE4605 glc operon lacks the malate
synthase G gene but contains genes for an alanine (or serine)-glyoxylate transaminase
(AGXT) and a glycolate permease instead (Figure S6). Therefore, it is likely that glyoxylate
generated from glycolate oxidation is converted to glycine for amino acid assimilation
(Figure S6), instead of being used for energy generation through the TCA cycle as in E. coli.
A similar operon containing glcDEF and AGXT is also present in a soil verrucomicrobial
aerobe, Chthoniobacter flavus. Notably, C. flavus was reported to be unable to grow on
glycolate as the sole carbon and energy source (Sangwan et al 2004), supporting our
hypothesis that TE4605 likely utilizes glycolate for amino acid assimilation rather than for
energy generation.

Interestingly, TE4605 also contains a second copy of glcD (glcD2), which is not associated
with glcEF. GlcD2 shares only 34% amino acid identity with the glcD in the glc operon
mentioned earlier. Notably, this glcD2 is 100% identical at the nucleotide level to the
verrucomicrobial glcD clone OTU45 from the study by Paver and Kent (2010), which was
likely derived from the same species. In fact, nearly all MAGs have glcD, some of which
share >60% amino acid identity with glcD clones (OTU43, OTU44 and OTU45). However, these
glcD genes, like glcD2 in TE4605, lack glcEF in their genomic vicinity, and are either orphan
genes or located on operons that are not apparently involved in glycolate metabolism.
This raises the question of whether these genes are bona fide glcD and whether freshwater
Verrucomicrobia are glycolate degraders in general.

Acetate metabolism
Transporters for monocarboxylic acids (such as pyruvate, acetate, and propionate) belong to
a large solute:sodium symporter (SSS) family, which can transport sugars, amino acids,
nucleosides, inositols, vitamins, urea or anions. Genes belonging to the SSS family were
present in all MAGs, yet most of their substrate specificities are unknown based on the
annotation. Several SSS genes are annotated as acetate permeases (actP). Together with
genes for acetate activation to acetyl-CoA in these MAGs, actP would allow acetate to enter
the TCA cycle for energy generation. However, these MAGs lack isocitrate lyase and malate
synthase (Figure S8), key enzymes of the glyoxylate cycle, which is necessary when cells
grow with two-carbon compounds, such as acetate, as the sole carbon source. Pathways
alternative to the glyoxylate shunt have been proposed to replenish four-carbon
intermediates during growth on acetate. Yet, among the MAGs, only TH2746 possesses key
genes in the ethylmalonyl-CoA pathway for growing on acetate (Figure S8) (Schneider et al
2012). Therefore, for most MAG-represented freshwater Verrucomicrobia populations,
acetate might be used as a supplementary source of energy, but not as the sole energy and
carbon source for growth.

Phosphorus (P) metabolism and adaptation to P-limited conditions
The high-affinity phosphate-specific transporter (PstABC) system genes were recovered in
nearly all MAGs, and the low-affinity phosphate permease (PitA) genes are also present in
most MAGs (Figure 6), allowing cells to efficiently take up inorganic phosphate at a wide
range of concentrations. In addition, alkaline phosphatase (PhoA) genes were recovered in
half of the MAGs and phosphonoacetate hydrolase (PhnA) genes are also present in some
MAGs. These two enzymes enable cells to use phosphate monoesters and
organophosphonates as a P source under P starvation, respectively. Further, the
polyphosphate kinase (PPK) genes found in nearly all MAGs may allow cells to accumulate
polyphosphate for future use when environmental P becomes scarce. Overall, the presence
of genes responding to P limitation, such as the two-component regulator (phoRB), phoA,
phnA, and pstABC, in these Verrucomicrobia populations suggests a strategy to survive P
limitation. Previously, positive correlations between freshwater Verrucomicrobia
abundance and P availability were observed (Haukka et al 2006, Lindström et al 2004).
However, despite the much higher P levels in Mendota, we did not observe higher
population abundance of Verrucomicrobia in Mendota or underrepresentation of their
genes responding to P limitation. Therefore, Verrucomicrobia populations in Mendota are
probably more influenced by the availability of organic autochthonous C substrates.

Sulfur metabolism
Dissimilatory sulfate reduction genes were only found in TH4590, and genes for
dimethylsulfoxide (DMSO) reduction and polysulfide reduction are absent in all MAGs, as
are sulfur and thiosulfate oxidation (SOX) genes (Figure S8). These findings suggest that
redox processes with sulfur-containing compounds are not important modes of energy
generation for these Verrucomicrobia populations.
The ABC-type sulfate transporter or sulfate permease genes, as well as assimilatory
sulfate reduction genes, were found in most MAGs (Figure S8). By contrast, sulfonate
transporter genes were only found in TH4903, and genes encoding alkanesulfonate
monooxygenase, which is involved in sulfur acquisition under sulfur-limiting conditions by
splitting organosulfonates to sulfite and formaldehyde, are absent in all MAGs. The
presence of sulfate transporter and assimilatory sulfate reduction genes, and the absence of
genes involved in sulfur acquisition under sulfur-limited conditions, are consistent with our
hypothesis that the degradation of sulfated polysaccharides may serve as an abundant
source of sulfur for cell biosynthesis, based on the high occurrence of sulfatase genes in
these MAGs.

Oxygen tolerance
Oxygen (O2) reduction products such as superoxide (O2-) and hydrogen peroxide (H2O2)
can damage cells. Superoxide dismutases (SODs) convert O2- to H2O2 and O2. H2O2 is
less destructive to cells and can be subsequently eliminated by the activities of catalases or
peroxidases. All MAGs have SOD genes, and the majority of them also contain catalase
and/or peroxidase genes (Figure S8). The lack of catalase and peroxidase genes is not
particularly associated with MAGs recovered from the anoxic hypolimnion, but rather is
probably due to the incomplete coverage of their genomes. Therefore, the presence of SOD,
catalase and/or peroxidase genes in most of the MAGs suggests that most of these
Verrucomicrobia, including the ones found in the hypolimnion, are able to tolerate oxygen.

Oxidative phosphorylation and alternative complex III
Most of these MAGs possess genetic components of the oxidative phosphorylation pathway,
including NADH:quinone oxidoreductase (Complex I), succinate:quinone oxidoreductase
(Complex II), the low-affinity caa3-type cytochrome c oxidase and/or the high-affinity
cbb3-type cytochrome c oxidase (Complex IV), and the F-ATPase (Complex V) (Figure S8).
However, a bona fide cytochrome bc1 complex, a quinol:cytochrome c oxidoreductase
(Complex III), is missing in all of the MAGs, and is also missing in all Verrucomicrobia
isolate genomes, including the obligate aerobes (data not shown). An alternative complex
III (ACIII) was proposed to perform the same function traditionally provided by the
cytochrome bc1 complex in the Bacteroidetes member Rhodothermus marinus (Pereira et al
2007). We found ACIII genes in Verrucomicrobia isolate genomes and most MAGs,
suggesting this phylum uses ACIII for electron transfer. In some cases, ACIII is located
immediately upstream of the cbb3-type cytochrome c oxidase complex in the same operon.
Taken together, the presence of oxidative phosphorylation and cytochrome c oxidase genes
would enable oxygen to be used as an electron acceptor for energy generation.
Interestingly, the low-affinity aa3-type cytochrome c oxidase genes are not restricted to
MAGs in the epilimnion, where oxygen is available in higher concentrations.

Occurrence of Planctomycete-specific cytochrome c and domains
A number of domains that were initially identified as “Planctomycete-specific” (Studholme
et al 2004) are abundant in our Verrucomicrobia MAGs. Among them are three
Planctomycete-specific cytochrome c domains (PSCyt1, PSCyt2, and PSCyt3, represented by
pfam07635, pfam07583, and pfam07627, respectively), five Planctomycete-specific
domains (PSD1 through PSD5, represented by pfam07587, pfam07624, pfam07626,
pfam07631, and pfam07637, respectively), and two domains with unknown functions
(DUF1501 and DUF1552, represented by pfam07394 and pfam07586, respectively).
PSCyt-containing genes in our Verrucomicrobia MAGs encode multi-domain proteins, most
of which contain both PSCyt and PSD domains and exhibit various domain architectures.
Based on the combination of specific PSCyt and PSD, these domain architectures can be
classified into three groups. Group I contains PSCyt1, but not PSD or other PSCyt; Group II
contains PSCyt2, which exclusively pairs with PSD1 and also often with PSCyt1; and Group
III contains PSCyt3, which exclusively pairs with PSD4 and also often with PSD2, PSD3 and
PSD5 (Figure 7). The pairing between specific PSCyt and PSD is also reflected in their
domain occurrence frequencies in these MAGs (Figure S9a). Further, PSCyt2-containing
genes are usually next to DUF1501-containing genes, and PSCyt3-containing genes are
usually next to DUF1552-containing genes (Figure S9b). Such conserved domain
architectures and gene organizations, as well as their high occurrence frequencies in some
of the Verrucomicrobia MAGs, are intriguing, yet nothing is known about their functions.
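The three-group scheme described above can be expressed as a simple rule set over the Pfam domains annotated on a gene. The sketch below is illustrative only: the function name is hypothetical, the Pfam-to-domain mapping is taken from the text, and the handling of genes that fit none of the three definitions (returned as "unclassified") is an assumption.

```python
from typing import Iterable

# Pfam accessions for the Planctomycete-specific domains named in the text.
PFAM_TO_DOMAIN = {
    "pfam07635": "PSCyt1", "pfam07583": "PSCyt2", "pfam07627": "PSCyt3",
    "pfam07587": "PSD1", "pfam07624": "PSD2", "pfam07626": "PSD3",
    "pfam07631": "PSD4", "pfam07637": "PSD5",
}


def classify_architecture(pfam_ids: Iterable[str]) -> str:
    """Assign a PSCyt-containing gene to Group I, II or III.

    Group I  : PSCyt1 without any PSD or other PSCyt domain.
    Group II : contains PSCyt2.
    Group III: contains PSCyt3.
    Anything else (an assumption of this sketch) is 'unclassified'.
    """
    domains = {PFAM_TO_DOMAIN[p] for p in pfam_ids if p in PFAM_TO_DOMAIN}
    has_psd = any(d.startswith("PSD") for d in domains)
    if "PSCyt2" in domains and "PSCyt3" in domains:
        return "unclassified"  # not covered by the three group definitions
    if "PSCyt2" in domains:
        return "Group II"
    if "PSCyt3" in domains:
        return "Group III"
    if "PSCyt1" in domains and not has_psd:
        return "Group I"
    return "unclassified"


if __name__ == "__main__":
    print(classify_architecture(["pfam07635"]))
    print(classify_architecture(["pfam07583", "pfam07587", "pfam07635"]))
    print(classify_architecture(["pfam07627", "pfam07631", "pfam07624"]))
```

Domains other than PSCyt and PSD (for example DUF1501 or CBM domains) are ignored by this rule set, since the grouping in the text depends only on the PSCyt and PSD combination.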
Some of the PSCyt-containing genes also contain additional domains besides PSCyt
and PSD. Most of these additional domains can be classified into two categories: one
involved in protein-protein interactions (PPI) and the other involved in carbohydrate
binding (CBM, carbohydrate-binding modules) (Figure 7), similar to previous findings in a
number of PVC genomes by Kamneva et al. (2012). These authors suggested
that PPI domains in these genes were responsible for protein complex assembly or
substrate recognition, and that cytochromes encoded by these genes likely transfer
electrons to acceptors (possibly proteins and sugars) due to the presence of CBM domains
(Kamneva et al 2012). The presence of CBM domains in redox-active proteins is indeed
interesting. For example, both CBM1 and cytochrome b562 (another redox-active protein
domain) are components of cellobiose dehydrogenase (CDH) in the white-rot fungus
Phanerochaete chrysosporium (Yoshida et al 2005) and sugar dehydrogenase (SDH) in the
mushroom Coprinopsis cinerea (Matsumura et al 2014). Therefore, it is plausible that some
of the PSCyt-containing genes, especially the ones with CBMs, are involved in carbohydrate
degradation.
REFERENCES
Bendall ML, Stevens SLR, Chan L-K, Malfatti S, Schwientek P, Tremblay J et al (2016).
Genome-wide selective sweeps and gene-specific sweeps in natural bacterial populations.
The ISME journal.
Haukka K, Kolmonen E, Hyder R, Hietala J, Vakkilainen K, Kairesalo T et al (2006). Effect of
nutrient loading on bacterioplankton community composition in lake mesocosms.
Microbial ecology 51: 137-146.
Kamneva OK, Knight SJ, Liberles DA, Ward NL (2012). Analysis of genome content
evolution in PVC bacterial super-phylum: assessment of candidate genes associated with
cellular organization and lifestyle. Genome biology and evolution 4: 1375-1390.
Lindström ES, Vrede K, Leskinen E (2004). Response of a member of the Verrucomicrobia,
among the dominating bacteria in a hypolimnion, to increased phosphorus availability.
Journal of Plankton Research 26: 241-246.
Matsumura H, Umezawa K, Takeda K, Sugimoto N, Ishida T, Samejima M et al (2014).
Discovery of a Eukaryotic Pyrroloquinoline Quinone-Dependent Oxidoreductase Belonging
to a New Auxiliary Activity Family in the Database of Carbohydrate-Active Enzymes. PloS
one 9: e104851.
Paver SF, Kent AD (2010). Temporal patterns in glycolate-utilizing bacterial community
composition correlate with phytoplankton population dynamics in humic lakes. Microbial
ecology 60: 406-418.
Pellicer MT, Badia J, Aguilar J, Baldoma L (1996). glc locus of Escherichia coli:
characterization of genes encoding the subunits of glycolate oxidase and the glc regulator
protein. Journal of bacteriology 178: 2051-2059.
Pereira MM, Refojo PN, Hreggvidsson GO, Hjorleifsdottir S, Teixeira M (2007). The
alternative complex III from Rhodothermus marinus - a prototype of a new family of
quinol:electron acceptor oxidoreductases. FEBS letters 581: 4831-4835.
Sangwan P, Chen X, Hugenholtz P, Janssen PH (2004). Chthoniobacter flavus gen. nov., sp.
nov., the first pure-culture representative of subdivision two, Spartobacteria classis nov., of
the phylum Verrucomicrobia. Applied and environmental microbiology 70: 5875-5881.
Schneider K, Peyraud R, Kiefer P, Christen P, Delmotte N, Massou S et al (2012). The
ethylmalonyl-CoA pathway is used in place of the glyoxylate cycle by Methylobacterium
extorquens AM1 during growth on acetate. The Journal of biological chemistry 287: 757-766.
Studholme DJ, Fuerst JA, Bateman A (2004). Novel protein domains and motifs in the
marine planctomycete Rhodopirellula baltica. FEMS microbiology letters 236: 333-340.
Yoshida M, Igarashi K, Wada M, Kaneko S, Suzuki N, Matsumura H et al (2005).
Characterization of carbohydrate-binding cytochrome b562 from the white-rot fungus
Phanerochaete chrysosporium. Applied and environmental microbiology 71: 4548-4555.
SUPPLEMENTARY FIGURE LEGENDS
Figure S1. A tiled display of an emergent self-organizing map (ESOM) based on the
tetranucleotide frequency (TNF) of the 19 Verrucomicrobia MAGs. TNF was calculated with
a window size of 5 kbp, with each dot on the ESOM representing a 5-kbp fragment (or a
contig if its length is shorter than 5 kbp). Dots (i.e. fragments) are colored according to
MAGs. A numeric ID is assigned to each MAG, and IDs from Mendota are labeled in black
and IDs from Trout Bog labeled in white. A red outline was drawn to indicate the clustering
of MAGs from Mendota on the ESOM.
Figure S2. Counts of GH genes among the 78 different GH families present in MAGs.
Figure S3. Heat map based on GH abundance profile patterns showing the clustering of
MAGs by different lakes.
Figure S4. Counts of carbohydrate and amino acid transporter genes.
Figure S5. Comparison of glycolate oxidase gene operons in E. coli, C. flavus and TE4605.
Figure S6. Nitrogen (N) and carbon (C) utilization in the proteome and genome. N and C
utilization in the proteome is indicated by the number of N and C atoms per amino-acid
residue side chain (ARSC) respectively, and N utilization in the genome is indicated by
genome GC content. (a and b) Quantile plots showing the number of N and C atoms per
ARSC calculated from all predicted proteins in the 19 MAGs. Plots were generated
according to Grzymski and Dussaq (2012). (c and d) The median number of N and C atoms
per ARSC for the 19 MAGs ranked by the median number. (e) Plot showing that the median
number of N per ARSC and the median number of C per ARSC are negatively correlated
(r = -0.83). (f) Plot showing that the median number of N per ARSC is positively correlated
with genome GC content. In all plots, genomes and proteomes from ME are in red, and those
from TE and TH are in blue. The three proteomes/genomes enclosed by the dashed circle
(TE1800, TH2519 and TH4093) have extremely low N- but high C-contents.
Figure S7. Summary of important metabolic genes and pathways.
Figure S8. Occurrence and gene organization of Planctomycete-specific domains,
DUF1501, and DUF1552. (a) Counts of PSCyt, PSD, DUF1501, and DUF1552 domains in the
MAGs. (b) Clustering of DUF1501- and PSCyt2-containing genes, and clustering of
DUF1552- and PSCyt3-containing genes in the genome.
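
The ARSC metric used in the legend of Figure S6 can be illustrated with a short calculation. The sketch below is an assumption-laden example, not the procedure of Grzymski and Dussaq (2012): the function names are hypothetical, the side-chain atom counts are the standard compositions of the 20 amino acids, and non-standard residue codes are simply skipped.

```python
from statistics import median
from typing import Iterable, Tuple

# Nitrogen atoms per side chain (residues not listed have none).
N_ATOMS = {"R": 3, "H": 2, "K": 1, "N": 1, "Q": 1, "W": 1}
# Carbon atoms per side chain for the 20 standard amino acids.
C_ATOMS = {
    "G": 0, "A": 1, "S": 1, "C": 1, "T": 2, "D": 2, "N": 2, "V": 3, "P": 3, "E": 3,
    "Q": 3, "M": 3, "L": 4, "I": 4, "K": 4, "H": 4, "R": 4, "F": 7, "Y": 7, "W": 9,
}


def arsc(protein: str) -> Tuple[float, float]:
    """Return (N-ARSC, C-ARSC): mean N and C atoms per residue side chain."""
    residues = [aa for aa in protein.upper() if aa in C_ATOMS]
    if not residues:
        raise ValueError("no standard amino acids in sequence")
    n = sum(N_ATOMS.get(aa, 0) for aa in residues) / len(residues)
    c = sum(C_ATOMS[aa] for aa in residues) / len(residues)
    return n, c


def median_arsc(proteins: Iterable[str]) -> Tuple[float, float]:
    """Median N-ARSC and C-ARSC over all predicted proteins of one genome."""
    values = [arsc(p) for p in proteins]
    return median(v[0] for v in values), median(v[1] for v in values)


if __name__ == "__main__":
    # Toy sequences, not real proteins.
    print(arsc("RK"))
    print(median_arsc(["RK", "GA", "W"]))
```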