Reciprocal Fusions of Two Genes in the Formaldehyde Detoxification Pathway in
Ciliates and Diatoms
Nicholas A. Stover,1 André R. O. Cavalcanti, Anya J. Li, Brian C. Richardson, and
Laura F. Landweber
Department of Ecology and Evolutionary Biology, Princeton University
During the course of a pilot genome project for the ciliate Oxytricha trifallax, we discovered a fusion gene not previously
described in any taxon. This gene, FSF1, encodes a putative fusion protein comprising an entire formaldehyde dehydrogenase (FALDH) homolog at one end and an S-formylglutathione hydrolase (SFGH) homolog at the other, two proteins
that catalyze serial steps in the formaldehyde detoxification pathway. We confirmed the presence of the Oxytricha fusion
gene in vivo and detected transcripts of the full-length fusion gene. A survey of other large-scale sequencing projects
revealed a similar fusion protein in a distantly related ciliate, Tetrahymena thermophila, and a possible fusion of these
two genes in the diatoms Phaeodactylum tricornutum and Thalassiosira pseudonana, but in the reverse order, with the
SFGH domain encoded upstream of the FALDH domain. Orthologs of these fusion proteins may be widespread within the
ciliates and diatoms.
1 Present address: Department of Genetics, Stanford University
School of Medicine.
Key words: fusion gene, formaldehyde dehydrogenase, S-formylglutathione hydrolase, alcohol dehydrogenase III, esterase D, bikont
phylogeny.
E-mail: [email protected].
Mol. Biol. Evol. 22(7):1539–1542. 2005
doi:10.1093/molbev/msi151
Advance Access publication April 27, 2005
© The Author 2005. Published by Oxford University Press on behalf of
the Society for Molecular Biology and Evolution. All rights reserved.
For permissions, please e-mail: [email protected]
Introduction
Development of the ciliate macronucleus involves
widespread chromosome breakage followed by telomere
addition (reviewed in Prescott 1994; Jahn and Klobutcher
2002). Following these processing events, most macronuclear chromosomes sequenced to date from spirotrichous
ciliates contain the coding region of a single gene flanked
by telomeres and short untranslated regions (Riley and Katz
2001; D. M. Prescott, J. D. Prescott, and R. M. Prescott
2002; Chang et al. 2004). Recently, 1,356 macronuclear
chromosomes from the spirotrichous ciliate Oxytricha trifallax (Sterkiella histriomuscorum) were cloned and end-sequenced
as part of a pilot genome project (Doak et al.
2003). During annotation of these sequences, we identified
a small set of putative multigene chromosomes by searching for chromosomes that contained different genes at opposing ends (Cavalcanti et al. 2004). One of these cloned
chromosomes appeared to encode a putative formaldehyde
dehydrogenase (FALDH) homolog at one end and a putative S-formylglutathione hydrolase (SFGH) at the other. We
sequenced the entire chromosome and found open reading
frames (ORFs) capable of encoding both full-length proteins. Intriguingly, both ORFs were oriented in the same
direction on the chromosome and were separated by only
30 bp (10 codons). We could find neither a stop codon
at the 3′ end of the upstream (FALDH) ORF nor a start
methionine codon at the beginning of the downstream
(SFGH) ORF, and these observations, together with the
preservation of the reading frame, suggested that the two
ORFs have merged to encode a bifunctional fusion protein.
Genomic DNA polymerase chain reaction and 3′ rapid amplification of cDNA ends (RACE) confirmed the presence
and transcription (data not shown), respectively, of this O.
trifallax gene, which we have called FSF1 (FALDH/S-formylglutathione hydrolase fusion 1). We subsequently
found both a genomic sequence and an expressed sequence
tag (EST) containing a gene encoding a similar fusion protein in a distantly related ciliate, Tetrahymena thermophila
(fig. 1), suggesting that this fusion gene may be widespread
among ciliate taxa. A further search of genome sequences
available online revealed a fusion of the FALDH and SFGH
genes in another protist group, the diatoms. There the two
genes are found in the opposite arrangement, with the
SFGH domain located N-terminal to the FALDH domain
(fig. 2), suggesting an independent fusion event in this clade.
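The fusion inference above rests on three checks: the two ORFs lie on the same strand, the spacer between them is a whole number of codons, and no in-frame stop codon interrupts the read-through. A minimal Python sketch of the last two checks follows; the helper `looks_fused` and the toy sequence are hypothetical and are not the O. trifallax data. It assumes the ciliate nuclear genetic code (NCBI translation table 6), in which TAA and TAG encode glutamine and TGA is the only stop codon; the standard code is included for contrast.

```python
# Minimal sketch (hypothetical helper, toy data): can two same-strand ORFs
# be read as one continuous ORF? Coordinates are 0-based, half-open.

# Ciliate nuclear code (NCBI translation table 6): TAA and TAG encode Gln,
# so TGA is the only stop codon. The standard code is included for contrast.
CILIATE_STOPS = frozenset({"TGA"})
STANDARD_STOPS = frozenset({"TAA", "TAG", "TGA"})


def looks_fused(seq, up_start, up_end, down_start, stops=CILIATE_STOPS):
    """True if reading from up_start in frame reaches down_start without a stop
    and the spacer between the two ORFs is a whole number of codons."""
    seq = seq.upper()
    spacer = down_start - up_end
    if spacer < 0 or spacer % 3 != 0 or (up_end - up_start) % 3 != 0:
        return False  # overlapping ORFs or a frameshift between them
    for i in range(up_start, down_start, 3):
        if seq[i:i + 3] in stops:
            return False  # an in-frame stop would end translation here
    return True


toy = "ATGGCTGCTGCT" + "TAAGGC" + "AAAGCTTGA"
print(looks_fused(toy, 0, 12, 18))                  # True: TAA read as Gln
print(looks_fused(toy, 0, 12, 18, STANDARD_STOPS))  # False: TAA is a stop
print(looks_fused(toy, 0, 12, 19))                  # False: 7-bp spacer shifts the frame
```

For a 30-bp spacer the frame test passes trivially, since 30 is divisible by 3; the stop-codon scan is the substantive check, and its outcome depends on which genetic code is assumed.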
FALDH and SFGH, the two enzymes joined by
these gene fusions, are both members of evolutionarily ancient protein families, with homologs of each
protein present in a wide variety of prokaryotes and eukaryotes. While enzymes in both families are known to act on
a broad range of substrates, much research has focused
on their shared role in the detoxification of intracellular formaldehyde (Harms et al. 1996). FALDH, also
known as alcohol dehydrogenase III (Holmquist and Vallee
1991), catalyzes the formation of S-formylglutathione from
S-hydroxymethylglutathione, which forms spontaneously
when formaldehyde reacts with glutathione.
SFGH catalyzes the second step in the detoxification
pathway, in which glutathione is restored and the formyl
group is released as formate. Though ours is the first report
of a genetic linkage between these two genes in eukaryotes,
prokaryotic FALDH and SFGH genes are adjacent in
the genomes of diverse bacterial species (Blattner et al.
1997; Shaw, Arioli, and Plazinski 1998) or separated by
only a few genes (Harms et al. 1996). Physical linkage
of these functionally related proteins may provide a selective advantage to the protists described here; however,
isolation of these fusion proteins from cells or their expression in a heterologous system would be needed to determine
how the fusion affects the activity of either half of the
protein.
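Written as a reaction scheme, the glutathione-dependent pathway described above runs as follows. This is a sketch for orientation rather than a statement from the paper: the NAD⁺ cofactor of FALDH and the ionization state of formate are standard biochemistry, not data reported here, and GSH denotes reduced glutathione. The net line shows that glutathione is regenerated, so it acts as a carrier rather than being consumed.

```latex
\begin{aligned}
\text{HCHO} + \text{GSH} &\xrightleftharpoons{\ \text{spontaneous}\ } \text{S-hydroxymethylglutathione}\\
\text{S-hydroxymethylglutathione} + \text{NAD}^{+} &\xrightarrow{\ \text{FALDH}\ } \text{S-formylglutathione} + \text{NADH} + \text{H}^{+}\\
\text{S-formylglutathione} + \text{H}_2\text{O} &\xrightarrow{\ \text{SFGH}\ } \text{GSH} + \text{HCOO}^{-} + \text{H}^{+}\\[4pt]
\text{net:}\quad \text{HCHO} + \text{NAD}^{+} + \text{H}_2\text{O} &\longrightarrow \text{HCOO}^{-} + \text{NADH} + 2\,\text{H}^{+}
\end{aligned}
```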
FIG. 1.—Putative Fsf1p proteins translated from the FSF1 genes of the ciliates Oxytricha trifallax (Sterkiella histriomuscorum) and Tetrahymena thermophila, aligned using the Blast 2 sequences tool at the National Center for Biotechnology Information (Tatusova and Madden 1999) with default parameters. Amino acids linking the FALDH (N-terminal) and SFGH (C-terminal) domains are shaded.
While no naturally occurring fusion of the FALDH
and SFGH genes has been described prior to this report,
a number of other protein-coding gene fusions have been
observed in both prokaryotes (Suhre and Claverie 2004)
and eukaryotes. Fusions of genes involved in pyrimidine
production and modification have recently been used to
aid in studies of eukaryote evolution. The first three
enzymes of the six-step pyrimidine biosynthetic pathway
are fused at the genetic level in unikonts (animals, fungi,
and amoebozoans) and exist as a fusion protein in these
species (Nara, Hashimoto, and Aoki 2000). The fifth and
sixth genes in the same pathway, which code for
orotate phosphoribosyltransferase (OPRT) and orotidine 5′-monophosphate decarboxylase (OMPDC), have fused
into a separate multidomain protein (OPRT-OMPDC) in
many eukaryotes (Nara, Hashimoto, and Aoki 2000). The
fusion of these two genes appears to have occurred independently in trypanosomatids, where OPRT comprises the
C-terminal half of the protein (OMPDC-OPRT). These combinatorial fusions are akin to the reciprocal arrangements
we report here for the FALDH and SFGH genes of ciliates
and diatoms. While in both cases these arrangements most
likely indicate independent fusion of coexpressed, functionally related proteins, the constituent
domains may instead have swapped positions following their initial
fusions. Further analysis of lineages near the base of the ciliate and diatom
trees may help determine whether the diatom and ciliate genes fused
independently or were rearranged in one lineage or the other.
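The ciliate and diatom genes differ only in the order of the same two domains, which can be read directly off domain-search hits sorted by position. A hedged Python sketch follows; the `DomainHit` tuples, the label mapping, and all coordinates are invented for illustration and are not the CDD results summarized in Fig. 2. The mapping from ADH_zinc_N to FALDH and from esterase or COG0627 to SFGH follows the domain names given in the Fig. 2 caption.

```python
from typing import NamedTuple


class DomainHit(NamedTuple):
    name: str    # domain label as reported by a search, e.g. "ADH_zinc_N"
    start: int   # first amino acid position of the hit
    end: int     # last amino acid position of the hit


# Hypothetical mapping from domain labels to the pathway enzyme they indicate.
LABELS = {"ADH_zinc_N": "FALDH", "esterase": "SFGH", "COG0627": "SFGH"}


def architecture(hits):
    """Return the N- to C-terminal order of recognized domains, e.g. 'FALDH-SFGH'."""
    ordered = sorted((h for h in hits if h.name in LABELS), key=lambda h: h.start)
    names = []
    for h in ordered:
        label = LABELS[h.name]
        if not names or names[-1] != label:  # collapse adjacent hits to one enzyme
            names.append(label)
    return "-".join(names)


# Invented coordinates, listed out of order to show that sorting decides the result.
ciliate_like = [DomainHit("esterase", 420, 690), DomainHit("ADH_zinc_N", 190, 320)]
diatom_like = [DomainHit("ADH_zinc_N", 480, 610), DomainHit("esterase", 20, 290)]
print(architecture(ciliate_like))  # FALDH-SFGH
print(architecture(diatom_like))   # SFGH-FALDH
```

Sorting by start position rather than trusting the order in which hits are reported keeps the result independent of how a search tool ranks its matches.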
In later steps of pyrimidine synthesis, thymidylate
synthase (TS) catalyzes the methylation of deoxyuridine
monophosphate to form deoxythymidine monophosphate,
and dihydrofolate reductase (DHFR) catalyzes the reduction of 7,8-dihydrofolate, a by-product of the methylation
reaction, back to tetrahydrofolate (Myllykallio et al. 2003). The TS and DHFR genes
are transcribed separately in unikonts and prokaryotes but
have fused to encode a DHFR-TS protein in bikonts (plants
and many protist species, including ciliates and diatoms).
This gene fusion has provided evidence that bikonts form
a single clade, which diverged early in eukaryote evolution
(Stechmann and Cavalier-Smith 2002; Stechmann and
Cavalier-Smith 2003).
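Fusion characters such as these are binary (fused or separate). One standard way to ask whether two such characters can both mark the same division of taxa is the four-gamete compatibility test: two binary characters fit on a single unrooted tree without homoplasy only if at most three of the four state combinations occur. The Python sketch below applies it to the groups named in the text. It is an illustration, not the method of the cited studies, and the state table is a simplification: it assumes the three-enzyme pyrimidine fusion is absent from the bikont groups listed, which the text implies but does not state, and it ignores prokaryotes. `STATES`, `compatible`, and `split` are hypothetical names.

```python
# Hypothetical, simplified state table: taxon -> (A, B), where
# A = first three pyrimidine-pathway enzymes fused (1) or not (0),
# B = DHFR and TS fused (1) or not (0).
STATES = {
    "animals": (1, 0), "fungi": (1, 0), "amoebozoans": (1, 0),
    "plants": (0, 1), "ciliates": (0, 1), "diatoms": (0, 1),
}


def compatible(states, i, j):
    """Four-gamete test: binary characters i and j can both be placed on one
    unrooted tree without homoplasy unless all four state pairs occur."""
    pairs = {(s[i], s[j]) for s in states.values()}
    return len(pairs) < 4


def split(states, i):
    """Return the (state 1, state 0) taxon sets defined by character i."""
    ones = frozenset(t for t, s in states.items() if s[i] == 1)
    return ones, frozenset(states) - ones


print(compatible(STATES, 0, 1))                        # True
print(set(split(STATES, 0)) == set(split(STATES, 1)))  # True: same unikont/bikont split
```

Compatibility alone says only that both fusions are consistent with one unikont/bikont bipartition; placing the root on that branch additionally requires arguing that each fused state is derived, which is the reasoning the cited studies supply.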
FIG. 2.—Conserved protein domains encoded by the fusion genes described in this paper, identified by comparison with the Conserved Domain
Database (CDD) at the National Center for Biotechnology Information (Marchler-Bauer et al. 2003). (A) The Fsf1p fusion proteins encoded by the ciliates
Oxytricha trifallax and Tetrahymena thermophila. (B) The Sff1p fusion proteins encoded by the diatoms Thalassiosira pseudonana and Phaeodactylum
tricornutum. Top-scoring CDD domains are ADH_zinc_N (gray bars) and esterase (black bars), except for the SFGH region of the P. tricornutum gene, in
which KOG3101 had the second highest score after COG0627. COG0627 is also an esterase domain. Numbers represent amino acid positions of the first
three predicted proteins and of the peptide predicted by the overlapping P. tricornutum ESTs (see Methods).
Largely on the basis of the fusion genes described above, the root of the
eukaryotic tree has recently been suggested to lie between
unikonts and bikonts (reviewed in Baldauf 2003). However, determining the evolutionary relationships among
the many clades within these two divisions remains
a major challenge for evolutionary biologists. A thorough
investigation of FALDH-SFGH and SFGH-FALDH gene
fusions in a variety of protists may help define the origins
of two major bikont clades.
Methods
We amplified and sequenced the FSF1 gene from O.
trifallax (S. histriomuscorum) DNA (Chang et al. 2004)
using primers corresponding to portions of GenBank
accession numbers CC819739 and CC819368. The complete sequence of the macronuclear chromosome containing the O. trifallax FALDH-SFGH fusion (FSF1) gene
has been deposited in GenBank under accession number
AY63987. The T. thermophila FSF1 gene was identified
in a TBlastN search of the T. thermophila genome sequence
scaffolds using the Blast server at The Institute for Genomic
Research, with the predicted O. trifallax Fsf1p protein as
the query. The coding sequence of this gene is located
between base pairs 542481 and 544951 of scaffold
8254688 of T. thermophila Genome Assembly 2, November_2003, and contains two introns in the FALDH
region. These preliminary sequence data were obtained
from The Institute for Genomic Research Web site at
http://www.tigr.org. The T. thermophila EST clone 500722-7-H06 contains sequences corresponding to the genomic
clone in the regions encoding FALDH (GenBank accession
number BM395174, base pairs 99–494) and SFGH
(BM395175, base pairs 149–724) (Turkewitz, A. P., K. M.
Karrer, C. L. Jahn, E. Orias, K. E. Kirk, J. Frankel, and
L. A. Klobutcher, personal communication).
Searches of the Thalassiosira pseudonana genome
were performed using the Joint Genome Institute T. pseudonana Blast server at http://aluminum.jgi-psf.org/prod/bin/runBlast.pl?db=thaps1. The T. pseudonana SFGH-FALDH fusion (SFF1) gene is located between base
pairs 60683 and 63183 of scaffold 8 (Release Version 1)
(Armbrust et al. 2004). Searches of online databases with the
T. pseudonana SFF1 gene, performed using the National
Center for Biotechnology Information Blast server
(Altschul et al. 1997), identified two overlapping EST clones
from Phaeodactylum tricornutum. These sequences are
listed in GenBank under accession numbers CD378851
and CD382924 (Scala et al. 2002).
Supplementary Material
The sequences of the genes and peptides described in
this paper are available at Molecular Biology and Evolution
online (www.mbe.oupjournals.org).
Acknowledgments
We thank Wei-Jen Chang for his gift of the O. trifallax
RNA and Aaron Turkewitz for assistance with the T.
thermophila EST clone. Preliminary genomic sequence
data for T. thermophila were obtained from The Institute
for Genomic Research Web site at http://www.tigr.org.
Thalassiosira pseudonana genome sequence data were
produced by the U.S. Department of Energy Joint Genome
Institute, http://www.jgi.doe.gov. This work was supported by National Institute of General Medical Sciences
Grant GM59708 and National Science Foundation Grant
EIA0121422 to L.F.L.
Literature Cited
Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang,
W. Miller, and D. J. Lipman. 1997. Gapped BLAST and
PSI-BLAST: a new generation of protein database search
programs. Nucleic Acids Res. 25:3389–3402.
Armbrust, E. V., J. A. Berges, C. Bowler et al. (42 co-authors).
2004. The genome of the diatom Thalassiosira pseudonana:
ecology, evolution, and metabolism. Science 306:79–86.
Baldauf, S. L. 2003. The deep roots of eukaryotes. Science
300:1703–1706.
Blattner, F. R., G. Plunkett III, C. A. Bloch et al. (14 co-authors).
1997. The complete genome sequence of Escherichia coli
K-12. Science 277:1453–1474.
Cavalcanti, A. R., N. A. Stover, L. Orecchia, T. G. Doak, and L. F.
Landweber. 2004. Coding properties of Oxytricha trifallax
(Sterkiella histriomuscorum) macronuclear chromosomes:
analysis of a pilot genome project. Chromosoma 113:69–76.
Chang, W. J., N. A. Stover, V. M. Addis, and L. F. Landweber.
2004. A micronuclear locus containing three protein-coding
genes remains linked during macronuclear development in
the spirotrichous ciliate Holosticha. Protist 155:245–255.
Doak, T. G., A. R. Cavalcanti, N. A. Stover, D. M. Dunn, R.
Weiss, G. Herrick, and L. F. Landweber. 2003. Sequencing
the Oxytricha trifallax macronuclear genome: a pilot project.
Trends Genet. 19:603–607.
Harms, N., J. Ras, W. N. Reijnders, R. J. van Spanning, and
A. H. Stouthamer. 1996. S-Formylglutathione hydrolase of
Paracoccus denitrificans is homologous to human esterase D:
a universal pathway for formaldehyde detoxification?
J. Bacteriol. 178:6296–6299.
Holmquist, B., and B. L. Vallee. 1991. Human liver class III alcohol and glutathione dependent formaldehyde dehydrogenase
are the same enzyme. Biochem. Biophys. Res. Commun.
178:1371–1377.
Jahn, C. L., and L. A. Klobutcher. 2002. Genome remodeling in
ciliated protozoa. Annu. Rev. Microbiol. 56:48.
Marchler-Bauer, A., J. B. Anderson, C. DeWeese-Scott et al. (24
co-authors). 2003. CDD: a curated Entrez database of conserved domain alignments. Nucleic Acids Res. 31:383–387.
Myllykallio, H., D. Leduc, J. Filee, and U. Liebl. 2003. Life
without dihydrofolate reductase FolA. Trends Microbiol.
11:220–223.
Nara, T., T. Hashimoto, and T. Aoki. 2000. Evolutionary
implications of the mosaic pyrimidine-biosynthetic pathway
in eukaryotes. Gene 257:209–222.
Prescott, D. M. 1994. The DNA of ciliated protozoa. Microbiol.
Rev. 58:233–267.
Prescott, D. M., J. D. Prescott, and R. M. Prescott. 2002. Coding
properties of macronuclear DNA molecules in Sterkiella nova
(Oxytricha nova). Protist 153:71–77.
Riley, J. L., and L. A. Katz. 2001. Widespread distribution of
extensive chromosomal fragmentation in ciliates. Mol. Biol.
Evol. 18:1372–1377.
Scala, S., N. Carels, A. Falciatore, M. L. Chiusano, and C. Bowler.
2002. Genome properties of the diatom Phaeodactylum tricornutum. Plant Physiol. 129:993–1002.
Shaw, W. H., T. Arioli, and J. Plazinski. 1998. Cloning and
sequencing of a S-formylglutathione hydrolase (FGH) gene
from the cyanobacterium Anabaena azollae (Accession No
AF035558) (PGR98-024). Plant Physiol. 116:868.
Stechmann, A., and T. Cavalier-Smith. 2002. Rooting the eukaryote tree by using a derived gene fusion. Science 297:89–91.
———. 2003. The root of the eukaryote tree pinpointed. Curr.
Biol. 13:R665–R666.
Suhre, K., and J. M. Claverie. 2004. FusionDB: a database for in-depth analysis of prokaryotic gene fusion events. Nucleic
Acids Res. 32:D273–D276.
Tatusova, T. A., and T. L. Madden. 1999. BLAST 2 sequences,
a new tool for comparing protein and nucleotide sequences.
FEMS Microbiol. Lett. 174:247–250.
Geoffrey McFadden, Associate Editor
Accepted April 18, 2005