Evolution of gene fusions: horizontal transfer versus independent

http://genomebiology.com/2002/3/5/research/0024.1
Research
Itai Yanai*, Yuri I Wolf† and Eugene V Koonin†
comment
Evolution of gene fusions: horizontal transfer versus independent
events
Addresses: *Bioinformatics Graduate Program and Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA.
†National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MA 20894, USA.
Correspondence: Eugene V Koonin. E-mail: [email protected]
Genome Biology 2002, 3(5):research0024.1–0024.13
Received: 12 November 2001
Revised: 7 February 2002
Accepted: 26 March 2002
reviews
Published: 26 April 2002
The electronic version of this article is the complete one and can be
found online at http://genomebiology.com/2002/3/5/research/0024
© 2002 Yanai et al., licensee BioMed Central Ltd
(Print ISSN 1465-6906; Online ISSN 1465-6914)
reports
Abstract
Gene fusion leading to the formation of multidomain proteins
is one of the major routes of protein evolution. Gene fusions
characteristically bring together proteins that function in a
concerted manner, such as successive enzymes in metabolic
pathways, enzymes and the domains involved in their regulation, or DNA-binding domains and ligand-binding domains
in prokaryotic transcriptional regulators [1-3]. The selective
advantage of domain fusion lies in the increased efficiency of
coupling of the corresponding biochemical reaction or signal
transduction step [1] and in the tight co-regulation of
expression of the fused domains. In signal transduction
systems, such as prokaryotic two-component regulators and
sugar phosphotransferase (PTS) systems, or eukaryotic
information
Background
interactions
Conclusion: These findings suggest a major role for horizontal transfer of gene fusions in the
evolution of protein-domain architectures, but also indicate that independent fusions of the same
pair of domains in distant species is not uncommon, which suggests positive selection for the
multidomain architectures.
refereed research
Results: The evolutionary history of gene fusions was studied by phylogenetic analysis of the
domains in the fused proteins and the orthologous domains that form stand-alone proteins.
Clustering of fusion components from phylogenetically distant species was construed as evidence
of dissemination of the fused genes by horizontal transfer. Of the 51 examined gene fusions that
are represented in at least two of the three primary kingdoms (Bacteria, Archaea and Eukaryota),
31 were most probably disseminated by cross-kingdom horizontal gene transfer, whereas 14
appeared to have evolved independently in different kingdoms and two were probably inherited
from the common ancestor of modern life forms. On many occasions, the evolutionary scenario
also involves one or more secondary fissions of the fusion gene. For approximately half of the
fusions, stand-alone forms of the fusion components are encoded by juxtaposed genes, which are
known or predicted to belong to the same operon in some of the prokaryotic genomes. This
indicates that evolution of gene fusions often, if not always, involves an intermediate stage, during
which the future fusion components exist as juxtaposed and co-regulated, but still distinct, genes
within operons.
deposited research
Background: Gene fusions can be used as tools for functional prediction and also as
evolutionary markers. Fused genes often show a scattered phyletic distribution, which suggests a
role for processes other than vertical inheritance in their evolution.
2 Genome Biology
Vol 3 No 5
Yanai et al.
receptor kinases, domain fusion is the main principle of
functional design [4-6]. Furthermore, accretion of multiple
domains appears to be one of the important routes for
increasing functional complexity in the evolution of multicellular eukaryotes [7-9].
Pairs of distinct genes that are fused in at least one genome
have been termed fusion-linked [3]. A gene fusion is presumably fixed during evolution only when the partners
cooperate functionally and, by inference, a functional link
can be predicted to exist between fusion-linked genes.
Recently, this simple concept has been used by several
groups as a means of systematic prediction of the functions
of uncharacterized genes [1-3,10,11].
In addition to their utility for functional prediction, analysis
of gene fusions may help in addressing fundamental evolutionary issues. Gene fusions often show scattered phyletic
patterns, appearing in several species from different lineages. By investigating the phylogenies of each of the two
fusion-linked genes, it may be possible to determine the evolutionary scenario for the fusion itself. A recent study provided evidence that the fission of fused genes occurred
during evolution at a rate comparable to that of fusion [12].
Here, we address another central aspect of the evolution of
gene fusions, namely, do fusions of the same domains in different phylogenetic lineages reflect vertical descent, possibly
accompanied by multiple lineage-specific fission events, or
independent fusion events, or horizontal transfer of the
fused gene? In other words, is a fusion of a given pair of
genes extremely rare and, once formed, is it spread by horizontal gene transfer (HGT) perhaps also followed by fissions
in some lineages? Alternatively, are independent fusions of
the same gene pair in distinct lineages relatively common
during evolution? Among fusions that are found in at least
two of the three primary kingdoms of life (Bacteria, Archaea
and Eukaryota), we detected both modes of evolution, but
horizontal transfer of a fused gene appeared to be more
common than independent fusion events or vertical inheritance with multiple fissions.
Results and discussion
To distinguish between a single fusion event followed by
HGT and/or fission of the fused gene and multiple, independent fusion events in distinct organisms, we analyzed phylogenetic trees that were constructed separately for each of the
fusion-linked domains (proteins). The fusion was split into
the individual component domains and phylogenetic trees
were built for each of the corresponding orthologous sets
from 32 complete microbial genomes (Figure 1, and see
Materials and methods), including both fusion components
and products of stand-alone genes. The topologies of the
resulting trees were compared to each other and to the
topology of a phylogenetic tree constructed on the basis of a
concatenated alignment of ribosomal proteins, which was
chosen as the (hypothetical) species tree of the organisms
involved [13]. If the fusion events either occurred independently of each other or were vertically inherited, perhaps followed by fission in some lineages, the distribution of the
fusion components in the phylogenetic trees for the orthologous clusters to which they belong is expected to mimic the
distribution of the species carrying the fusion in the species
tree. In contrast, if the fusion gene has been disseminated by
HGT, fusion components will form odd clusters different
from those in the species tree.
This could be a straightforward approach to reconstructing
the evolutionary history of gene fusions, if only the topology
of the species trees was well resolved. However, this is not
necessarily the case for bacteria or archaea, where relationships between major lineages remain uncertain [14,15],
although a recent detailed analysis suggested some higherlevel evolutionary affinities [13]. Because the distinction
between the three primary kingdoms is widely recognized
[14,16] and is clear in the trees for most protein families [17],
trans-kingdom horizontal transfers of fused genes can be
more reliably detected with the proposed approach. Therefore, we concentrated on the evolutionary histories of gene
fusions that are shared by at least two of the three primary
kingdoms.
As the framework for this analysis, we used the database of
clusters of orthologous groups (COGs) of proteins [18,19],
which contains sets of orthologous proteins and domains
from complete microbial genomes (32 genomes at the time
of this analysis; see Materials and methods). Domain fusions
represented in some genomes by stand-alone versions of the
fusion components are split in the COG database so that
each fusion component can be assigned to a different COG.
Whenever distinct domains of a fusion protein belong to
separate COGs, the corresponding COGs are said to be
fusion-linked [3]. A search of the COGs database revealed
405 pairs of fusion-linked COGs. The vast majority (87%) of
fusion links include fusion present in only one primary
kingdom (Table 1). Only 52 pairs of fusion-linked COGs
Table 1
Phyletic patterns of gene fusions
Kingdom profile*
Number of fusion links between COGs
abe
ab-be
a-e
a--b--e
Total
*a, Archaea; b, Bacteria; e, Eukaryota.
3
27
20
1
82
215
56
405
http://genomebiology.com/2002/3/5/research/0024.3
comment
tma
aae
syn
uur
mge
mpn
bsu
bha
ctr
cpn
tpa
bbu
cje
hpy
rpr
nme
xfa
pae
bap
eco
hin
vch
dra
mtu
reviews
COG0046
COG0067
COG0067
COG0069
COG0139
COG0145
COG0147
COG0169
COG0294
COG0304
COG0331
COG0337
COG0403
COG0439
COG0439
COG0439
COG0476
COG0511
COG0557
COG0664
COG1984
COG1155
COG0025
COG0062
COG0069
COG0077
COG0108
COG0280
COG0287
COG0301
COG0340
COG0351
COG0468
COG0475
COG0550
COG0558
COG0560
COG0649
COG0662
COG0674
COG0777
COG1013
COG1112
COG1239
COG1361
COG1387
COG1683
COG1788
COG3261
COG0511
COG0518
COG0674
included fusions represented in two or three kingdoms
(Table 1), and for reasons discussed above, we chose these
pairs of COGs for an evolutionary analysis of gene fusions.
information
Figure 2 shows the pair of phylogenetic trees for the fusionlinked COGs 1788 and 2057. In both trees, the fusion components from E. coli and B. halodurans (YdiF and BH3898,
interactions
Figure 1 shows a genome-COG matrix that reveals the
phyletic (phylogenetic) patterns of the presence or absence
of the orthologs across the spectrum of the sequenced
genomes [18] for each of the 52 pairs of fusion-linked COGs
containing cross-kingdom fusions. When assessed against
the topology of the tentative species tree based on the concatenated alignments of ribosomal proteins [13], fusions
showed a scattered distribution in phyletic patterns
(depicted by columns in Figure 1). For example, the fusion
between COG1788 and COG2057 (, and -subunits of acylCoA:acetate CoA transferase) is seen in the bacteria
Escherichia coli, Deinococcus radiodurans and Bacillus
halodurans, and in the archaea Aeropyrum pernix,
Thermophilus acidophilum and Halobacterium sp. Similarly, the fusion between COG1683 and COG3272 (uncharacterized, conserved domains) was found in the bacteria
Pseudomonas aeruginosa and Vibrio cholerae, and in the
archaeon Methanobacterium thermoautotrophicum. In
each of these cases, with the species tree used as a reference,
the bacteria involved are phylogenetically distant from each
other and more so from the archaea, and non-fused versions
of the two domains exist within the same bacterial lineages
and in archaea (Figure 1). These observations emphasize the
central question of this work: are the fusions between the
same pair of domains in different species independent or are
they best explained by HGT?
refereed research
Figure 1
Phyletic patterns of fusion-linked COGs. Each pair of COGs is represented by a double column. The dark-gray rectangles indicate fusions, the light-gray
rectangles indicate that the fusion components are represented by stand-alone genes in the given genomes, and the white rectangles indicate that there is
no representative of the given COG in the given genome. Where one rectangle in a double column is light gray and the other is white, the genome in
question has a representative of only one of the pair of fusion-linked COGs. Species abbreviations are as listed in Materials and methods.
deposited research
COG0047
COG0069
COG0070
COG0070
COG0140
COG0146
COG0512
COG0710
COG0801
COG0331
COG2030
COG0703
COG1003
COG0511
COG1038
COG0825
COG0607
COG0825
COG1098
COG1752
COG2049
COG1372
COG0569
COG0063
COG1037
COG1605
COG0807
COG0281
COG1605
COG0607
COG1654
COG1992
COG1372
COG1226
COG0551
COG1213
COG2716
COG0852
COG0836
COG1014
COG0825
COG1014
COG2251
COG1240
COG1470
COG1796
COG3272
COG2057
COG3262
COG1038
COG0519
COG1013
reports
sce
hbs
tac
ape
afu
pho
mth
mja
1-Ta
c
_1Tac
20_
Ta0
8
PA
86
ra
1-D
Fusion component
Eukaryota
Archaea
Bacteria
DR
A00
54_
8-D
ra
04
c-M
tu
06
co
ha
-BBha
2
_
82
38998_
BHH38
B
-Tac
20_2Tac 94
Taa008820_2T
98
67
2-Ecoo
ydiF_2-Ec
ydiF_
pe
2-Ape
8_ -A
468_2
E246
APE2
AP
in
-H
25
-E
74
Rv
oD
07
A0
at
HI
e su
Pa
9- jD-B
9
19
yx
PA
BS
DR
BS yodS-Bsu
BH2259-Bha
97 61
98
90
100
818
98 76
67 98
0
PA
67
81
Ta0
96
84
a
-P
7
22
100
92
63
A-E
co
98
BH3898_1-Bha
BH3898_1-Bha
0
HP
u
jE-Bs
BS yx
_2Tac
100
69
69
100
-Bsu
BS yodR
-Drara 57
RA0054_2D
DDRA0054_2-Hin
AF11
773
0
I
99-Afu
H
e
y
p
-H
2
69
a to
818
Ta0
-Ap
8_1
246
APE
ae
-P
26
02
100
HP0691-Hpy
(b)
Rv3551-Mtu
93
co
diF_1-Eco
yydiF_1-E
tu
03c-M
Rv25
ra
-D
067 e
A0
a
DR
-P
00
20
ha
PA
-B
58
22
BH
(a)
Yanai et al.
Vol 3 No 5
e
4 Genome Biology
Figure 2
Phylogenetic trees for fusion-linked COGs: , and -subunits of acyl-CoA:acetate CoA transferase. Fusion components are denoted by shading and by a
number after an underline (_1 for the amino-terminal domain and _2 for the carboxy-terminal domain). The three primary kingdoms are color-coded as
indicated in the figure. The RELL bootstrap values are indicated for each internal branch. (a) ,subunit (domain) (COG1788); (b) -subunit (domain)
(COG2057). The proteins are designated using the corresponding systematic gene names followed (after the underline) by the abbreviated species
names. Species abbreviations are as in Materials and methods and Figure 1.
respectively) confidently group with the archaeal fusion
components, to the exclusion of the non-fused orthologs.
This position of the E. coli and B. halodurans fusion components is unexpected and is in contrast to the placement
of the orthologs from other gamma-proteobacteria and
Gram-positive bacteria, as well as non-fused paralogs from
the same species (AtoA/D and BH2258/2259, respectively)
within the bacterial cluster. These observations strongly
suggest that the gene for fused subunits of acyl-CoA:acetate
CoA transferase was disseminated horizontally between
E. coli, B. halodurans, and archaea. The presence of nonfused paralogs in both these bacterial species appears to be
best compatible with gene transfer from archaea to bacteria. In contrast, the fusion of the pair of domains from the
same COGs seen in D. radiodurans seems to be an independent event because, in both trees, the D. radiodurans
branch is in the middle of the bacterial cluster
(Figure 2a,b). Thus, the history of this pair of fusion-linked
COGs appears to involve horizontal transfer of the fused
gene between bacteria and archaea (and possibly also
within kingdoms), as well as at least one additional, independent fusion event in bacteria.
Figure 3 shows the phylogenetic trees for the two domains of
phosphoribosylformylglycinamidine (FGAM) synthase, a
purine biosynthesis enzyme. The components of this fusion,
which is found in proteobacteria and eukaryotes, form a
tight cluster separated by a long internal branch from the
non-fused bacterial and archaeal orthologs. This tree topology suggests HGT between bacteria and eukaryotes, possibly
a relocation of the fused gene from the pro-mitochondrion to
the eukaryotic nuclear genome or, alternatively, gene transfer from eukaryotes to proteobacteria. An additional aspect
of the evolution of this gene is the apparent acceleration of
evolution upon gene fusion, which is manifest in the long
branch that separates the proteobacterial-eukaryotic cluster
from the rest of the bacterial and archaeal species
(Figure 3a,b).
The fusion-linked COGs 1605 and 0077 (chorismate mutase
and prephenate dehydratase, respectively) show a more
complicated history, with distinct fusion events resulting in
different domain architectures (see legend to Figure 4). The
presence, in both trees, of two distinct clusters of fusion
components and the isolated fusion in Campylobacter jejuni
http://genomebiology.com/2002/3/5/research/0024.5
(a)
XF1423_1-Xfa
XF1423_1-Xfa
99 78
(b)
XF1423_2-Xfa
-Xfa PA3763_2-Pae
PA3763_2-Pae
XF1423_2
purL_2-Eco
purL_2-Eco
50
YGR061c_2-Sce
-Sce
YGR061c_2
89
72
67
VC0869_2-Vch
97
76 VC0869_2-Vch
HI0752_2-Hin
HI0752_2-Hin
HI0752_1-Hin
HI0752_1-Hin
75
comment
NMB1996_1-Nme
NMB1996_1-Nme
PA3763_1-Pae
PA3763_1-Pae
YGR061c_1-Sce
YGR061c_1-Sce
VC0869_1-Vch
VC0869_1-Vch
100
purL_1-Eco
purL_1-Eco
NMB1996_2-Nme
NMB1996_2-Nme
Ta1318-Tac
reviews
100
100
AF1940-Afu
TM1246-Tma
TM1245-Tma
49
Ta1066-Tac
Cj0955c-Cje
AF1260-Afu
aq 1105-Aae
53
92
aq 1836-Aae
BH0629-Bha 100
73 77
84
99 69
95
89
99
MJ1264-Mja
83
100
Rv0788-Mtu
99
69
100
43
73
PAB1201-Pab
Rv0803-Mtu
VNG0864G-Hbs
VNG1945G-Hbs
MJ1648-Mja
DR0223-Dra
Cj0514-Cje
Fusion component
Eukaryota
Archaea
Bacteria
The results of phylogenetic analyses of the 51 cross-kingdom
fusion links are summarized in Tables 2 and 3 and the
information
The pair of fusion-linked COGs 0777 and 0825 (, and subunits of acetyl-CoA carboxylase, respectively) shows
unequivocal clustering of the fusion components from
numerous archaeal and bacterial species, which indicates a
prevalent role for HGT in the evolution of this fusion
(Figure 5a,b). Moreover, archaea are scattered among bacteria, suggesting multiple HGT events. However, an apparent independent fusion is seen in Mycobacterium
tuberculosis (Figure 5a,b). It could be argued that, in cases
like those in Figure 5, where there is a sharp separation (a
long, strongly supported internal branch in each of the
trees) between the fusion components and stand-alone proteins, the COGs involved needed to be reorganized, to form
one COG consisting of fusion proteins only and two separate COGs consisting of stand-alone proteins. Formally, this
would eliminate the need for HGT as an explanation of the
tree topology for any of these new COGs. However, this
solution (even if attractive from the point of view of classification) does not seem to be correct in light of the principle
of orthology that underlies the COG system: it appears that,
in both of the COGs involved, the fusion components and
stand-alone proteins are bona fide orthologs, as judged by
the high level of sequence conservation and by the fact that,
in the majority of species involved, they are the only versions of this key enzyme.
interactions
suggest at least three independent fusion events, two of
which apparently were followed by horizontal dissemination
of the fused gene (Figure 4a,b). The single archaeal fusion,
the Arachaeoglobus fulgidus protein AF0227, belongs to
one of these clusters and shows a strongly supported affinity
with the ortholog from the hyperthermophilic bacterium
Thermotoga maritima (Figure 4a,b). Given the broad distribution of this fusion in bacteria, horizontal transfer of the
bacterial fused gene to archaea is the most likely scenario.
refereed research
Figure 3
Phylogenetic trees for fusion-linked COGs: phosphoribosylformylglycinamidine (FGAM) synthase. (a) Synthetase domain (subunit) (COG0046);
(b) glutamine amidotransferase domain (subunit) (COG0047). Protein designations are as in Figure 2.
deposited research
DR0222-Dra
43
BS purL-Bsu
BH0628-Bha
Slr0520-Syn
MTH1374-Mth
MTH168-Mth
76
PH1955-Pho100
PH1953-Pho
BS purQ-Bsu
Sll1056-Syn
100
71
PAB1200-Pae 56
reports
97
6 Genome Biology
Yanai et al.
Vol 3 No 5
(a)
XF1141-Xfa
BB
UU
33
99
2211
-B-B
uu
cc
phpeheA
A 1 1-E
-Ec co
o
HH
145
I1I1
14
1-Hin
5 1-H
in
V
C
VC007
0
7055 1
60
MT
1-V-V
H80
chch
VNG
4-M
1244
90
th
C-Hb
98
s
61
92
MJ024
6-Mja
58 64
65
64
73
82
tyrA_1-Eco
HI1290_1-Hin
VC0696_1-Vch
91
Ta0245-Tac
82
MJ0637-Mja
89
100
59
85
Rv0948c-Mtu
pheA_2-Eco
pheA_2-Eco
HI1145_2-Hin
HI1145_2-Hin
MTH1220-Mth
AF0227_2-Afu
AF0227_2-Afu
VC0705_2-Vch
Vch
VC0705_2-
XF2325_2-Xfa
XF2325_2-Xfa
71
98
85
52
72
52
68 73
92
66
62
NN
MM
BB
0404446
6_1_1-N
-Nmme
PP
A
A31316
e
66_6_1
1-P-Pa
XX
F2F23
ae e
32 25
5_ _1
1-Xfa
Sll1662-Syn
71
100
53
aq951_2-Aae
951_2-Aae
aq
BS_pheA-Bsu
BH1215-Bha
Ta0915-Tac
APE0563a-Ape
Cj0316_2-Cje
Cj0316_2-Cje
Fusion component
Eukaryota
Archaea
Bacteria
PA4230-Pae
TM0155_2-Tma
TM0155_2-Tma
50
75
NMB0446_2-Nme
NMB0446_2-Nme
BSaroA_1-Bsu
aroA_1-Bsu
BS
BH3242_1-Bha
BH3242_1-Bha
AF0227_3-Afu
AF0227_3-Afu
100
PA3166_2-Pae
PA3166_2-Pae
-Aaee
51_1-Aa
aq 951_1
DR1001_1-Dra
aq 9
DR1001_1-Dra
85
je
1-CCje
31_1Cj0j031_
C
BU392_2-Buc
BU392_2-Buc
TM0155_1
TM0155_1
82
51 60 65
DR1147-Dra
YPR060c-Sce
99
49
92
YNL316c-Sce
Rv1885c-Mtu
56
90
HP0291-Hpy
(b)
PA5184-Pae
VNG2222G-Hbs
Rv3838c-Mtu
_1-Dra
ae
R15422_1-Dra
_1-Pae
DDR154
A20144_1-P
PPA201
BH1136_1-Bha
BH1136_1-Bha
BSyngE_1-Bsu
yngE_1-Bsu
BS
VVNNGG00623G
623G__1-H
1-Hbbss
CP
n00
58Cpn
CT2
93-C
tr
HP0950-Hpy
95
PP
140
AA
14
DD
RR
000_4
AA
00
33
11
00
- _4-P- ae
__
44
-D
ra
B
BHH
113
11
6_2
36_
2 BB
yngE -Bha
SS
__
yn
gE__
2
2
VV
-B- su
100
NN
0
GG
066
2
3
2
G
3
G__
-b
v2502
22
RR
-H
v2
100
s
c_2502c_
2-Mtu
PA2014_2
PA
2014_2-Pa
-Pae
99
e
_2-Mtu
v0974c_2-Mtu
RRv0974c
83
63
89
100 77
100
100
48
41
-Bssuu
jD_2-B
BS_yqqjD_2
BS_y
TM0716_2-Tma
TM0716_2-Tma
AF2217_2-Afu
AF2217_2-Afu
55 57
98
PH1287_2-Pho
PH1287_2-Pho
55
e
-Paae
a
8_22-P
-Drra
288888_
2_2-D
A
4
5
P2
1542_2
R
PA
D 1
DR
47
100
aq_1206-Aae
99
100
e
Cj
3-
YNR016c_3-Sce
n
Fusion component
Eukaryota
Archaea
Bacteria
Cp
yttI
166 -Bsu
-Bha
4-
BS
04
1
sll0336-Syn
BH3
4
j04
C
98
62
HP0557-Hpy
Sll0728-Sy
n
99
68
96
DR
121
XF
4-D
02
in
-H
06
04
HI ccA-Eco
a
VC2244-Vch
PA3639-Pae
Rv0904c_1-Mtu
Rv0904c_1-Mtu
100
aq 445-Aae
84 67
VC1000-Vch
PA3112-Pae
NMB0679-Nme
PA0211-Pae
55
NMB1177-Nme
C
T
26
5-C
CP
tr
n
93
100
DR1215-Dra
BH3165-Bha
100
74
91
51
100
100
BS_accA-Bsu
86
100
82
61
98 79
56 55
Rv0904c_2-Mtu
Rv0904c_2-Mtu
49 74 96
Rv2247_2-Mtu
Rv2247_2-Mtu
XF1467-Xfa
81
accD-Eco 93
PA1400_3-Pae
97
Cj0127c-Cje
HI1260-Hin
DRA0310_3-Dra
100
(b)
Rv3799c_2-Mtu
VNG1529G_2-Hbs
RP619_
2-Rpr
BH29
57_2
-Bha
DR1
316_
2-Dr
a
tu
Mtu
_1-M
799c_1Rv3799c
u
Rv3
t
-Mtu
_1-M
80_1
3280
r
Rvv32
Rpr
R
_1--Rp
619_1
RPP619
R
-Tma
89
82
16_1-Tma
98 97
TM0716_1
54
TM07
93
53
PP
100
68 55
128
HH
12
7_1
87_
65 AF
1-P79 79
71 AF2
ho
221
2
17_7_1
100
1-A
69
94
fu
Rv3280_2-Mtu
BH2957_1-Bha
VNG15
29G_1
-Hbs
P PA
RvRv A2828
0909
88
7474 8_81_1
c_c_
1-1- - P-a
M
e
RR
tu
v2v25
50 02
2c c_
_1 1-M
tu
6_1-Dra
DR131
su
BS yqjD_1-B
(a)
-Mtu
Rv2247_1
Figure 4
Phylogenetic trees for fusion-linked COGs: chorismate mutase and prephenate dehydratase. (a) Chorismate mutase (COG1605); (b) prephenate
dehydratase (COG0077). Protein designations are as in Figure 2. The protein AF0227contains a prephenate dehydrogenase domain in addition to the
chorismate mutase and prephenate dehydratase domains.
03
-X
ra
fa
Figure 5
Phylogenetic trees for fusion-linked COGs: , and -subunits of acetyl-CoA carboxylase. (a) - subunit (domain) (COG0777); (b) ,subunit (domain)
(COG0825). Protein designations are as in Figure 2. The proteins DRA0310 and PA1400, in addition to the domains corresponding to the , and subunits of acetyl-CoA carboxylase, contain a biotin carboxylase domain and a biotin carboxyl carrier protein domain. The clustering of these proteins in
phylogenetic trees almost certainly reflects HGT between the respective bacterial lineages.
HGT
HGT
HGT
IFE
IFE
IFE
HGT
HGT
IFE
-be
-be
-be
-be
-be
-be
-be
-be
-be
-be
-be
Glutamate synthase
domain 2
Glutamate synthase
domain 3
Glutamate synthase
domain 3
Phospho-ribosyl-ATP
pyrophospho-hydrolase
(histidine biosynthesis)
N-methylhydaintoinase B
Anthranilate/paraaminobenzoate synthase
component II
3-dehydro-quinate
dehydratase
7,8-dihydro-6hydroxymethylpterinpyrophosphokinase
(acyl-carrier-protein)
S-malonyl-transferase
Acyl dehydratase
Shikimate kinase
COG0069
COG0070
COG0070
COG0140
COG0146
COG0512
COG0710
COG0801
COG0331
COG2030
COG0703
Glutamate synthase
domain 1
Glutamate synthase
domain 1
Glutamate synthase
domain 2
Phospho-ribosyl-AMP
cyclohydrolase
(histidine biosynthesis)
N-methylhydaintoinase A
Anthranilate/paraaminobenzoate synthase
component I
Shikimate
5-dehydrogenase
Dihydropteroate
synthase
3-oxoacyl-(acyl-carrierprotein) synthase
3-oxoacyl-(acyl-carrierprotein) synthase
3-dehydroquinate
synthetase
COG0067
COG0067
COG0069
COG0139
COG0145
COG0147
COG0169
COG0294
COG0304
COG0331
COG0337
HGT
HGT
-be
Phospho-ribosylformyl-glycinamidine
(FGAM) synthase glutamine
Amidotransferase domain
COG0047
Phospho-ribosylformylglycinamidine
(FGAM) synthase,
synthetase domain
COG0046
Principal mode
of evolution†
Kingdom
pattern*
information
Protein function
interactions
COG B
deposited research
refereed research
Protein function
Gene juxtaposition‡ Evolutionary scenario
One fusion event, fused
gene transfer between
eukaryotes and proteobacteria
One fusion event, fused gene
transfer between eukaryotes and
bacteria
One fusion event, fused gene
transfer between eukaryotes and
bacteria
One fusion event, fused gene
transfer between eukaryotes and
bacteria
Uncertain
One fusion event,
fused gene transfer between
eukaryotes and (the ancestor of)
Cyanobacteria and
Actinomycetes
Independent fusion events in
eukaryotes and bacteria
Independent fusion events in
eukaryotes and bacteria
Independent fusion events in
eukaryotes and bacteria
One fusion event, fused gene
transfer between eukaryotes and
bacteria
Fused gene transfer between
eukaryotes and Actinomycetes;
additional, independent fusions
in bacteria
Independent fusion events in
eukaryotes and bacteria (with
different domain organizations)
Pyro, Paby, Tmar,
Drad, Bsub, Bhal
Aful, Mjan, Tmar
-
Aful, Mjan, Mthe
-
Mjan, Aero, Hpyl
Aful, Mthe, Taci,
Aero, Tmar, Drad,
Bsub, Bhal, Ecol,
Vcho, Xfas
Paby¶, Ecol
Llac¶, Tmar, Drad,
Bsub, Bhal
Drad, Ecol, Vcho
-
Drad, Mtub,
Proteo-bacteria,
Ctra, Cpne
Fusion
Ecol, Paer,
Vcho,Hinf,
Xfas, Nmen
Most bacteria
Most bacteria
Most bacteria
Most bacteria
Mtub, Syne, Scer
Nmen, Cjej,
Paer, Scer
Ctra, Cpne, Scer
Ctra, Cpne, Scer
Mtub, Scer
Mtub, Bsub,
Scer
Tmar, Scer
reports
COG A
reviews
Evolutionary history of trans-kingdom gene fusions
comment
Table 2
http://genomebiology.com/2002/3/5/research/0024.7
COG1372
COG0569
Biotin carboxylase
Biotin carboxylase
Biotin carboxylase
Dinucleotide-utilizing
enzyme involved in
molybdopterin and
thiamine biosynthesis
Biotin carboxyl carrier
protein
cAMP-binding domain
Allophanate hydrolase
subunit 2
Archaeal/vacuolartype H+-ATPase
subunit A
Na+/H+ and K+/H+
antiporters
Uncharacterized,
conserved protein
Glutamate synthase
domain 2
COG0439
COG0439
COG0439
COG0476
COG0511
COG0664
COG1984
COG1155
COG0025
COG0062
COG0069
COG1037
COG0063
COG2049
COG1752
COG0825
COG0607
COG0825
Ferredoxin-like domain
Predicted sugar kinase
K+ transport systems,
NAD-binding component
Intein
Allophanate hydrolase
subunit 1
Esterase
Acetyl-CoA carboxylase
,-subunit
Rhodanese-related
sulfurtransferase
Acetyl-CoA carboxylase
,-subunit
Pyruvate carboxylase,
carboxy-terminal
domain/subunit
Biotin carboxyl carrier
protein
Glycine cleavage system
protein P (pyridoxalbinding), carboxyterminal domain
Protein function
ab-
ab-
ab-
a-e
-be
-be
-be
-be
-be
-be
-be
-be
Kingdom
pattern*
HGT
AF
IFE
IFE
HGT
IFE
IFE
HGT
HGT
HGT
HGT
Principal mode
of evolution†
NA
-
-
Most bacteria
Aful, Mjan, Mthe,
NA
Tmar; (all that have
COG1037)
All archaea; all
bacteria that
have COG0062
Hbsp, Bhal,
Syne
Taci, Pyro,
Scer
Bsub, Scer
-
Mtub, Ccre||,
Scer
One ancestral fusion; fused gene
transfer from archaea to bacteria
(Thermotoga)
One ancestral fusion; fission in
eukaryotes
Uncertain
Independent fusion events in
eukaryotes and archaea
Independent fusion events in
eukaryotes and bacteria
One fusion event, fused gene
transfer between eukaryotes and
actinomycetes; an additional,
independent fusion event in
bacteria
Independent fusion events in
eukaryotes and bacteria
Pyro, Tmar, Hbsp¥
Drad, Paer,
Scer
One fusion event, fused gene
transfer between eukaryotes and
bacteria; subsequent domain
accretion in eukaryotes
One fusion event, fused gene
transfer between eukaryotes and
bacteria; subsequent domain
accretion in eukaryotes
One fusion event, fused gene
transfer between eukaryotes and
bacteria; additional, independent
fusions in bacteria
One fusion event, fused gene
transfer between eukaryotes and
proteobacteria
Independent fusion events in x
sulfurtransferase
Hbsp, Rpxx
Mjan
Bhal, Ecol, Paer
Vcho, Hinf, Xfas,
Nmen, Hpyl, Ctra,
Cpne
Hbsp, Pyro, Taci,
Aero, Tmar,
Bsub, Bhal
Gene juxtaposition‡ Evolutionary scenario
-
Mtub, Syne,
Paer, Scer
Mtub, Scer
Bsub, Scer
Hbsp, Mtub,
Rpxx, Scer
Drad, Mtub,
Syne, Ecol, Paer,
Xfas, Nmen
Fusion
Vol 3 No 5
COG1038
COG0511
COG1003
Glycine cleavage system
protein P (pyridoxalbinding), aminoterminal domain
COG0403
COG B
Protein function
COG A
Table 2 (continued from the previous page)
8 Genome Biology
Yanai et al.
ACT-domain-containing protein ab-
COG2716
Phosphoserine
phosphatase
COG0560
HGT
ab-
Predicted sugar
nucleotidyltransferase
COG1213
Phosphatidylglycerophosphate
synthase
COG0558
AF
ab-
Zn-finger domain associated
with topoisomerase type IA
COG0551
Topoisomerase IA
COG0550
HGT
ab-
Kef-type K+ transport
systems, NAD-binding
component
COG1226
Kef-type K+ transport
systems, membrane
component
COG0475
IFE
ab-
Intein
COG1372
RecA/RadA recombinase
COG0468
HGT
ab-
Uncharacterized
conserved protein
COG1992
Hydroxymethylpyrimidine/phosphomethylpyrimidine kinase
COG0351
HGT
ab-
Biotin operon
repressor
COG1654
Biotin-(acetyl-CoA
carboxylase) ligase
COG0340
IFE
Rhodanese-related
sulfurtransferase
COG0607
ATP pyrophosphatase
(thiamine biosynthesis)
COG0301
ab-
HGT
ab-
ab-
Chorismate mutase
COG1605
Prephenate
dehydrogenase
COG0287
information
IFE
Malic enzyme
COG0281
Phosphotransacetylase
COG0280
interactions
ab-
GTP cyclohydrolase II
COG0807
3,4-dihydroxy-2butanone 4-phosphate
synthase
COG0108
refereed research
HGT
Chorismate mutase
COG1605
Prephenate
dehydratase
COG0077
Principal mode
of evolution†
Kingdom
pattern*
deposited research
ab-
Protein function
COG B
One fusion event, fused gene
transfer from bacteria to archaea
(Halobacterium)
Independent fusion events in
archaea and bacteria
-
-
-
Taci, Aero, Ccre
-
NA
-
NA
-
Aero
-
Aful, Aqua, Tmar,
Ecol, Vcho, Paer,
Hinf, Xfas, Nmen,
Cjej
Aful, Aqua, Tmar,
Drad, Mtub, Bsub,
Bhal, Syne, Paer,
Vcho, Xfas, Nmen,
Hpyl, Cjej, Ctra,
Cpne
Hbsp, Ecol, Hinf,
Xfas, Rpxx
Aful, Ecol, Vcho,
Hinf
Taci, Ecol, Vcho,
Paer, Hinf
Aful, Paby, Drad,
Bsub, Bhal, Ecol,
Paer, Vcho, Xfas;
(all that have
COG1654)
Hbsp, Mjan, Pyro,
Aero, Tmar
Hbsp, Pyro, Mtub
Mthe, Ecol, Paer,
Hinf, Xfas, Nmen,
Cjej, Rpxx
Most bacteria
and archaea
Aful, Pyro, Aqua
Aful, Mtub, Paer
Uncertain
One fusion event, fused gene
transfer from archaea to bacteria
(AquIFEx)
One ancestral fusion with
subsequent fission in Aper, Aqua
One fusion event, fused gene
transfer from bacteria to archaea
(Methanobacterium)
Independent fusion events in
archaea and bacteria
One fusion event, fused gene
transfer from archaea to bacteria
(Thermotoga)
One fusion event, fused gene
transfer from bacteria to archaea
(Archaeoglobus)
Independent fusion events in
archaea and bacteria
Uncertain
Fused gene transfer between
bacteria and archaea
(Archaeoglobus and Thermotoga
lineages); additional, independent
fusions in bacteria
Gene juxtaposition‡ Evolutionary scenario
Fusion
reports
Protein function
reviews
COG A
comment
Table 2 (continued from the previous page)
http://genomebiology.com/2002/3/5/research/0024.9
Mannose-6-phosphate
isomerase
Pyruvate:ferredoxin
COG1014
oxidoreductase and
related 2-oxoacid:ferredoxin
oxidoreductases,
alpha subunit
Acetyl-CoA
carboxylase - subunit
Pyruvate:ferredoxin
COG1014
oxidoreductase and
related 2-oxoacid:ferredoxin
oxidoreductases,
beta subunit
Superfamily I DNA
and RNA helicases
and helicase subunits
Mg-chelatase
subunit ChlI
S-layer domain
Histidinol phosphatase
and related hydrolases
of the PHP family
Uncharacterized
conserved protein
Acyl-CoA:acetate CoA
transferase alpha subunit
COG0662
COG0674
COG0777
COG1013
COG1112
COG1239
COG1361
COG1387
COG1683
COG1788
COG2057
COG3272
COG1796
COG1470
COG1240
COG2251
Acyl-CoA:acetate CoA
transferase beta subunit
Uncharacterized
conserved protein
DNA polymerase IV
(family X)
Predicted membrane
protein
Mg-chelatase
subunit ChlD
Predicted metal-binding
domain
Pyruvate:ferredoxin
oxidoreductase and
related 2-oxoacid:ferredoxin
oxidoreductases,
gamma subunit
Acetyl-CoA carboxylase
, subunit
Pyruvate:ferredoxin
oxidoreductase and
related 2-oxoacid:ferredoxin
oxidoreductases,
gamma subunit
Mannose-1-phosphate
guanylyltransferase
NADH:ubiquinone
oxidoreductase 27 kD
subunit
Protein function
ab-
ab-
ab-
ab-
ab-
ab-
ab-
ab-
ab-
ab-
ab-
Kingdom
pattern*
HGT
HGT
HGT
HGT
HGT
IFE
IFE
HGT
HGT
HGT
HGT
Principal mode
of evolution†
Hbsp, Taci, Aero,
Drad, Bhal, Ecol
Mthe, Paer, Vcho
Mthe, Taci, Drad,
Bsub, Bhal; (all
prokaryotes that
have COG1796)
Aful, Pyro, Bhal
Hbsp, Mthe, Taci,
Mtub, Syne
Pyro, Mtub
Mthe, Syne, Ecol,
Vcho, Tpal
Aful, Hbsp, Pyro,
Tmar, Drad, Mtub,
Bsub, Bhal, Paer,
Rpxx
Aful, Hbsp, Taci,
Aero, Mtub, Bhal,
Syne, Ecol, Vcho,
Tpal
Aful, Pyro, Aqua,
Ecol, Paer, Vcho,
Xfas, Hpyl, Cjej
Hbsp, Aqua,
Ecol, Paer
Fusion
Mtub, Bsub,
Paer, Hinf,
Hpyl
-
NA
-
Mjan, Paer
-
Aful, Taci,
Aero, Mtub,
Bhal
-
Mjan, Mthe,
Aqua, Tmar,
Hpyl, Cjej
-
Most archaea
and bacteria
Fused gene transfer between
bacteria and archaea; a second,
independent fusion event in
bacteria
One fusion event, fused gene
transfer between archaea and
bacteria (Methanobacterium and
Vibrio/Pseudomonas, respectively)
One fusion event, fused gene
transfer between archaea to
bacteria
One fusion event, fused gene
transfer from archaea to bacteria
Fused gene transfer between
bacteria and archaea, with
subsequent fissions
Independent fusion events in
archaea and bacteria
Independent fusion events in
archaea and bacteria
Fused gene transfer from
bacteria to archaea; a second,
independent fusion event in
bacteria
Fused gene transfer from
archaea to bacteria; a second,
independent fusion event in
bacteria
Fused gene transfer from
bacteria to archaea; a second,
independent fusion event in
bacteria
One fusion event, fused
gene transfer from bacteria to
archaea (Halobacterium)
Gene juxtaposition‡ Evolutionary scenario
Vol 3 No 5
COG0825
COG0836
COG0852
NADH:ubiquinone
oxidoreductase subunit 7
COG0649
COG B
Protein function
COG A
Table 2 (continued from the previous page)
10 Genome Biology
Yanai et al.
Pyruvate:ferredoxin
COG1013
oxidoreductase and
related 2-oxoacid:ferredoxin
oxidoreductases,
alpha subunit
COG0674
*Abbreviations: a, archaea, b, bacteria, e, eukaryotes; a dash indicates that the given kingdom is not represented in at least one of the fusion-linked COGs. †AF, ancestral fusion, HGT, horizontal gene
transfer, IFE, independent fusion events. ‡In several cases, the indicated genes are separated by one to three genes or their order is switched compared to that of the fusion components. §Paby,
Pyrococcus abyssi, an archaeal genome not included in the master set of genomes analyzed in this study. ¶Llac, Lactococcus lactis, a bacterial genome not included in the master set of genomes analyzed in
this study. ||Ccre, Caulobacter crescentus, a bacterial genome not included in the master set of genomes analyzed in this study. ¥Hbsp, Halobacterium sp., an archaeal genome not included in the master set
of genomes analyzed in this study.
Fused gene transfer from
archaea to bacteria
(,-proteobacteria)
Hbsp, Mjan, Aero,
Aqua, Tmar, Mtub,
Hpyl
COG0519
GMP synthase Glutamine
amidotransferase
domain
COG0518
HGT
abe
Pyruvate:ferredoxin
oxidoreductase and
related 2-oxoacid:ferredoxin
oxidoreductases,
beta subunit
Aful, Mthe, Taci,
Pyro, Paby, Scer,
Syne, Ecol, Vcho,
Cjej, Tpal
Fused gene transfer among
bacteria, archaea, and
eukaryotes
Mthe, Pyro, Paby
Aero, Scer,
most bacteria
HGT
abe
GMP synthase
-PP-ATPase domain
Pyro
Paby, Mtub, Ecol
HGT
COG3262
Ni,Fe-hydrogenase III
large subunit
COG3261
Ni,Fe-hydrogenase III
component G
ab-
Gene juxtaposition‡ Evolutionary scenario
Fusion
Principal mode
of evolution†
Kingdom
pattern*
Protein function
COG B
Protein function
information
COG A
interactions
Table 2 (continued from the previous page)
refereed research
The apparent independent fusion of the same pair of genes
(or, more precisely, members of the same two COGs) on
multiple occasions during evolution might seem unlikely.
However, we found that one-fourth to one-third of the gene
fusions shared by at least two kingdoms might have evolved
deposited research
The results of the present analysis point to HGT as a major
route of cross-kingdom dissemination of fused genes. Horizontal transfer might be even more prominent in the evolution of fused genes within the bacterial and archaeal
kingdoms. This notion is supported by the topologies of
some of the phylogenetic trees analyzed, which show
unexpected clustering of bacterial species from different lineages (note, for example, the grouping of D. radiodurans
with P. aeruginosa in Figure 5). Massive HGT between
archaea and bacteria, particularly hyperthermophiles, has
been suggested by genome comparisons [20-24]. However,
proving HGT in each individual case is difficult, and the significance of cross-kingdom HGT has been disputed [25,26].
With gene fusions, the existence of a derived shared character
(fusion) supporting the clades formed by fusion components
and the concordance of the independently built trees for each
of the fusion components make a solid case for HGT.
reports
Examination of the genomic context of the genes that encode
stand-alone counterparts of the fusion components showed
that, in 25 of the 51 cases, these genes were juxtaposed in
some, and in certain cases, many prokaryotic genomes
(Table 2). This suggests that evolution of gene fusions often,
if not always, passes through an intermediate stage of juxtaposed and co-regulated, but still distinct, genes within known
or predicted operons. In addition, some of the juxtaposed
gene pairs might have evolved by fission of a fused gene.
reviews
Additional data. In 31 of the 51 links, an inter-kingdom horizontal transfer of the fused gene appeared to be the evolutionary mechanism by which the fusion entered one of the
kingdoms. In contrast, only 14 fusion-linked pairs of COGs
show evidence of independent fusion in two kingdoms, and
in just two cases, the fusion seems to have been inherited
from the last universal common ancestor. The latter two scenarios were distinguished on the basis of the parsimony
principle, that is, by counting the number of evolutionary
events (fusions or fissions) that were required to produce the
observed distribution of fusion components and stand-alone
versions of the domains involved across the tree branches.
Accordingly, it needs to be emphasized that we can only infer
the most likely scenario under the assumption that the probabilities of fusion and fission are comparable. It cannot be
ruled out that some of the scenarios we classify as independent fusions in reality reflect the existence of an ancestral
fused gene and subsequent multiple, independent fissions.
The detection of ancestral domain fusions may call for the
unification of the respective COG pairs in a single COG, with
the species in which fission occurred represented by two distinct proteins.
comment
One fusion event, fused gene
transfer from bacteria to archaea
http://genomebiology.com/2002/3/5/research/0024.11
12 Genome Biology
Vol 3 No 5
Yanai et al.
Table 3
Summary of evolutionary scenarios for cross-kingdom gene
fusions
Evolutionary mode*
Number of fusion-linked COG pairs
Cross-kingdom horizontal
transfer of a fused gene
31
Independent fusion events
14
Ancestral fusion
2
Uncertain
4
Total
51
*As indicated in Table 2, the evolutionary scenarios for some of the
analyzed COGs included both cross-kingdom horizontal transfer and
apparent independent gene fusion within one of the kingdoms.
through such independent events, and probable additional
independent fusions were noted among bacteria. This could
be due to the extensive genome rearrangement characteristic
of the evolution of prokaryotes [27,28], and to the selective
value of these particular fusions, which tend to get fixed once
they emerge.
matrix for minimum evolution (least-square) tree building
[30] using the FITCH program. The PROTDIST and FITCH
programs are modules of the PHYLIP software package [31].
The tree topology was then optimized by local rearrangements using PROTML, a maximum likelihood tree-building
program, included in the MOLPHY package [32]. Local
bootstrap probability was estimated for each internal branch
by using the resampling of estimated log-likelihoods (RELL)
method with 10,000 bootstrap replications [33]. The gene
order in prokaryotic genomes was examined using the
‘Genomic context’ feature of the COG database.
Additional data files
Phylogenetic trees for 82 individual COGS presented as 52
pairs of trans-kingdom fusion-linked COGs are available
with the online version of this paper. Bootstrap values (percentage of 1,000 replications) are indicated for each fork.
Archaeal proteins are designated by black squares, bacterial
proteins by gray squares and eukaryotic proteins by empty
squares. Fusion components are denoted by _1, _2, _3, etc.
Acknowledgements
Materials and methods
The version of the COG database used in this study included
the following complete prokaryotic genomes. Bacteria: Aae,
Aquifex aeolicus; Bap, Buchnera aphidicola; Bbu, Borrelia
burgdorferi; Bsu, Bacillus subtilis; Bhal, Bacillus halodurans;
Cje, Campylobacter jejuni; Cpn, Chlamydophila pneumoniae;
Ctr, Chlamydia trachomatis; Dra, Deinococcus radiodurans;
Eco, Escherichia coli; Hin, Haemophilus influenzae; Hpy,
Helicobacter pylori; Mge, Mycoplasma genitalium; Mpn,
Mycoplasma pneumoniae; Mtu, Mycobacterium tuberculosis;
Nme, Neisseria meningitidis; Pae, Pseudomonas aeruginosa;
Rpr, Rickettsia prowazekii; Syn, Synechocystis sp.; Tma, Thermotoga maritima; Tpa, Treponema pallidum; Vch, Vibrio
cholerae; Xfa, Xylella fastidiosa. Eukaryote: Sce, Saccharomyces cerevisiae. Archaea: Ape, Aeropyrum pernix; Afu,
Archaeoglobus fulgidus; Hbs, Halobacterium sp.; Mja,
Methanococcus jannaschii; Mth, Methanobacterium thermoautotrophicum; Pho, Pyrococcus horikoshii; Pab, Pyrococcus abyssi; Tac, Thermoplasma acidophilum.
COGs containing fusion components from at least two of the
three primary kingdoms, were selected for phylogenetic
analysis. COGs containing 60 or more members were
excluded because of potential uncertainty of orthologous
relationship between members of such large groups [18].
Multiple alignments were generated for each analyzed COG
using the T-Coffee program [29].
Phylogenetic trees were constructed by first generating a distance matrix using the PROTDIST program and the Dayhoff
PAM model for amino-acid substitutions and employing this
We thank Charles DeLisi, Adnan Derti, I. King Jordan, Kira Makarova,
Igor Rogozin, and Fyodor Kondrashov for critical reading of the manuscript and helpful discussions.
References
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, Eisenberg
D: Detecting protein function and protein-protein interactions from genome sequences. Science 1999, 285:751-753.
Huynen MJ, Snel B: Gene and context: integrative approaches
to genome analysis. Adv Prot Chem 2000, 54:345-379.
Yanai I, Derti A, DeLisi C: Genes linked by fusion events are
generally of the same functional category: a systematic
analysis of 30 microbial genomes. Proc Natl Acad Sci USA 2001,
98:7940-7945.
Parkinson JS, Kofoid EC: Communication modules in bacterial
signaling proteins. Annu Rev Genet 1992, 26:71-112.
Reizer J, Saier MH, Jr.: Modular multidomain phosphoryl transfer proteins of bacteria. Curr Opin Struct Biol 1997, 7:407-415.
Hunter T: Signaling - 2000 and beyond. Cell 2000, 100:113-127.
Koonin EV, Aravind L, Kondrashov AS: The impact of comparative genomics on our understanding of evolution. Cell 2000,
101:573-576.
Rubin GM, Yandell MD, Wortman JR, Gabor Miklos GL, Nelson CR,
Hariharan IK, Fortini ME, Li PW, Apweiler R, Fleischmann W, et al.:
Comparative genomics of the eukaryotes. Science 2000,
287:2204-2215.
International Human Genome Consortium: Initial sequencing and
analysis of the human genome. Nature 2001, 409:860-921.
Enright AJ, Ilipoulos I, Kyrpides NC, Ouzounis CA: Protein interaction maps for complete genomes based on gene fusion
events. Nature 1999, 402:86-90.
Galperin MY, Koonin EV: Who’s your neighbor? New computational approaches for functional genomics. Nat Biotechnol 2000,
18:609-613.
Snel B, Bork P, Huynen M: Genome evolution: gene fusion
versus gene fission. Trends Genet 2000, 16:9-11.
Wolf YI, Rogozin IB, Grishin NV, Tatusov RL, Koonin EV: Genome
trees constructed using five different approaches suggest
new major bacterial clades. BMC Evol Biol 2001, 1:8.
Pace NR: A molecular view of microbial diversity and the
biosphere. Science 1997, 276:734-740.
Teichmann SA, Mitchison G: Is there a phylogenetic signal in
prokaryote proteins? J Mol Evol 1999, 49:98-107.
http://genomebiology.com/2002/3/5/research/0024.13
comment
reviews
reports
deposited research
refereed research
16. Woese CR, Kandler O, Wheelis ML: Towards a natural system
of organisms: proposal for the domains Archaea, Bacteria,
and Eucarya. Proc Natl Acad Sci USA 1990, 87:4576-4579.
17. Brown JR, Doolittle WF: Archaea and the prokaryote-toeukaryote transition. Microbiol Mol Biol Rev 1997, 61:456-502.
18. Tatusov RL, Koonin EV, Lipman DJ: A genomic perspective on
protein families. Science 1997, 278:631-637.
19. Tatusov RL, Natale DA, Garkavtsev IV, Tatusova TA, Shankavaram
UT, Rao BS, Kiryutin B, Galperin MY, Fedorova ND, Koonin EV:
The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids
Res 2001, 29:22-28.
20. Koonin EV, Mushegian AR, Galperin MY, Walker DR: Comparison
of archaeal and bacterial genomes: computer analysis of
protein sequences predicts novel functions and suggests a
chimeric origin for the archaea. Mol Microbiol 1997, 25:619-637.
21. Aravind L, Tatusov RL, Wolf YI, Walker DR, Koonin EV: Evidence
for massive gene exchange between archaeal and bacterial
hyperthermophiles. Trends Genet 1998, 14:442-444.
22. Nelson KE, Clayton RA, Gill SR, Gwinn ML, Dodson RJ, Haft DH,
Hickey EK, Peterson JD, Nelson WC, Ketchum KA, et al.: Evidence
for lateral gene transfer between Archaea and Bacteria
from genome sequence of Thermotoga maritima. Nature 1999,
399:323-329.
23. Doolittle WF: Lateral genomics. Trends Cell Biol 1999, 9:M5-M8.
24. Koonin EV, Makarova KS, Aravind L: Horizontal gene transfer in
prokaryotes: quantification and classification. Annu Rev Microbiol 2001, 55:709-742.
25. Kyrpides NC, Olsen GJ: Archaeal and bacterial hyperthermophiles: horizontal gene exchange or common ancestry?
Trends Genet 1999, 15:298-299.
26. Logsdon JM, Faguy DM: Thermotoga heats up lateral gene
transfer. Curr Biol 1999, 9:R747-R751.
27. Dandekar T, Snel B, Huynen M, Bork P: Conservation of gene
order: a fingerprint of proteins that physically interact.
Trends Biochem Sci 1998, 23:324-328.
28. Wolf YI, Rogozin IB, Kondrashov AS, Koonin EV: Genome alignment, evolution of prokaryotic genome organization and
prediction of gene function using genomic context. Genome
Res 2001,11:356-372.
29. Notredame C, Higgins DG, Heringa J: T-Coffee: A novel method
for fast and accurate multiple sequence alignment. J Mol Biol
2000, 302:205-217.
30. Fitch WM, Margoliash E: Construction of phylogenetic trees.
Science 1967, 155:279-284.
31. Felsenstein J: Inferring phylogenies from protein sequences by
parsimony, distance, and likelihood methods. Methods Enzymol
1996, 266:418-427.
32. Adachi J, Hasegawa M: MOLPHY: Programs for Molecular Phylogenetics.
Tokyo: Institute of Statistical Mathematics; 1992.
33. Kishino H, Miyata T, Hasegawa M: Maximum likelihood inference of protein phylogeny and the origin of chloroplasts.
J Mol Evol 1990, 31:151-160.
interactions
information