An Amphioxus Emx Homeobox Gene Reveals Duplication During
Vertebrate Evolution
Nic A. Williams1 and Peter W. H. Holland
School of Animal and Microbial Sciences, University of Reading, Reading, England
Members of the Emx homeobox gene class are expressed during embryogenesis in the brain and/or other head
structures of phylogenetically diverse phyla. Here, we describe sequence, genomic structure, and molecular phylogenetic analysis of a cephalochordate (amphioxus) Emx class gene termed AmphiEmxA. The genomic structure
of AmphiEmxA is very similar to that of vertebrate Emx genes, with two conserved intron sites. The Drosophila
homolog empty spiracles (ems) has just one intron, which may be shared with chordates; the other has been
secondarily lost in this Drosophila gene and in a cnidarian Emx-related gene. We identify a highly conserved
peptide motif close to the amino terminus of Emx proteins, demonstrate its similarity to a sequence found in a
variety of transcription factors, and argue that it arose through convergent evolution in homeobox and forkhead
genes. Finally, our molecular phylogenetic analysis strongly supports the presence of a single Emx gene in the
ancestor of chordates and gene duplication along the vertebrate lineage.
Introduction
The identification of evolutionarily homologous regions or structures by comparison of expression patterns
of homologous genes is now common practice in molecular biology. In order to infer which gene expression
sites are homologous and which may be derived in specific lineages, it is helpful to construct molecular phylogenies and identify the timing of gene duplication
events during the evolution of particular taxa. We have
embarked on a program to identify gene duplication
events and acquisition of novel developmental roles during chordate evolution. A useful animal for this approach is the cephalochordate Branchiostoma floridae
(amphioxus). Being a chordate, it has a body plan comparable to that of vertebrates, yet for many genes studied, it has a typically invertebrate gene complement
(Garcia-Fernàndez and Holland 1994; Holland 1996,
1999; Williams and Holland 1998). It is therefore a useful animal for molecular comparison with diverse invertebrate taxa, including Drosophila, since it possesses
genes directly orthologous to invertebrate genes. On the
other hand, its genes may be compared with those of
vertebrates to deduce timings of gene duplication and
shed light on functional recruitment of genes within a
phylum.
A gene family of particular interest is the Emx homeobox gene class. The first members of this gene family to be cloned were two Drosophila genes, empty spiracles (ems) and E5 (Dalton, Chadwick, and McGinnis
1989; Walldorf and Gehring 1992). Drosophila ems is
the best known of these genes and functions as a gap
gene in the head in a way analogous to that of orthodenticle (otd). Emx is expressed in an anterior stripe during the syncytial blastoderm stage; loss of function mutations cause deletion of the anterior cephalic segments,
as well as deletion of parts of the deuterocerebrum and
tritocerebrum brain neuromeres (Hirth et al. 1995). Drosophila E5 is not expressed during the syncytial blastoderm or blastoderm stage. From stage 11 onward, E5
is expressed in segmentally reiterated blocks of lateral
mesoderm and/or lateral epidermis. In stage 10–12 embryos, this pattern overlaps with the reiterated lateral
epidermis expression pattern of the ems gene (W.
McGinnis, personal communication).
1 Present address: Department of Psychiatry, University of Oxford, Warneford Hospital, Oxford, England.
Key words: amphioxus, homeobox, gene duplication, hep motif,
intron.
Address for correspondence and reprints: Peter W. H. Holland,
School of Animal and Microbial Sciences, University of Reading,
Whiteknights, Reading RG6 6AJ, United Kingdom. E-mail:
[email protected].
Mol. Biol. Evol. 17(10):1520–1528. 2000
© 2000 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038
Two members of the Emx gene family have been
isolated from mice (Simeone et al. 1992); both are expressed in cephalic domains that include the presumptive cerebral cortex, the olfactory bulbs, and the olfactory epithelia, as well as the developing urogenital system. Gene targeting of the mouse Emx1 gene results in
a deletion or reduction of the hippocampus (Yoshida et
al. 1997); in homozygous mutations at the Emx2 locus,
the cortex is reduced in size, the dentate gyrus is deleted,
and the mice die postnatally due to severe urogenital
alterations (Pellegrini et al. 1996; Yoshida et al. 1997).
Similar cephalic expression domains, and putative functions, have been reported for zebrafish and Xenopus
Emx genes (Morita et al. 1995; Pannese et al. 1998); in
addition, Xenopus Emx genes are expressed in the visceral arches (Pannese et al. 1998).
The expression of Emx genes in anterior cephalic
domains of both vertebrates and Drosophila is intriguing
and (together with similar data for Otx genes) prompted
the suggestion that the process of cephalization may be
evolutionarily homologous between arthropods and
chordates (Holland, Ingham, and Krauss 1992). Since
this suggestion was made, Otx genes have been described from many invertebrate taxa, and the accumulating comparative data have added more support for
ancient roles for this gene family in the development of
the anterior body region. In contrast, Emx genes have
been cloned from very few invertebrates besides Drosophila (Dalton, Chadwick, and McGinnis 1989; Simeone et al. 1992). A putative homolog has been found in
the Caenorhabditis elegans genome (the ceh-2 gene on
cosmid C27A12), and a cnidarian Emx gene, Cn-ems,
has been cloned from the hydrozoan Hydractinia symbiolongicarpus. The latter gene is expressed in endoderm around the mouth-bearing end of the animal, apparently adding support for an ancient anterior expression domain for Emx genes (Mokady et al. 1998). It
may be prudent to treat this conclusion with caution,
however, since other authors have argued that the
mouth-bearing end of Cnidaria may not actually be homologous to the anterior pole of triploblasts (Martindale
and Henry 1998). Indeed, the huge differences in body
layout between diploblast and triploblast animals may
preclude direct comparisons of body regions.
The lack of information available regarding invertebrate Emx genes makes it difficult to draw safe conclusions concerning conservation of gene expression
pattern or function. Without a sound molecular phylogeny of these genes, it is not even clear if valid comparisons are being made. To begin to address these questions, we cloned an amphioxus member of the Emx homeobox class, termed AmphiEmxA. We describe the full
sequence and intron-exon organization of AmphiEmxA
and identify a highly conserved peptide domain outside
of the homeodomain in Emx protein sequences. We use
molecular phylogenetic analyses to examine the course
of gene duplication in the Emx gene family during animal evolution.
Materials and Methods
PCR Amplification
Amphioxi (B. floridae) were collected from Old
Tampa Bay, Florida (Holland and Holland 1993), and
high-molecular-weight genomic DNA was extracted
from a pooled sample of adults using standard methods
(Shimeld 1997a). Degenerate oligonucleotide primers
were designed to anneal to two regions within the homeobox, conserved at the amino acid level between vertebrate and Drosophila Emx genes. These primers were
IRTAFSP (sense primer NWems1: 5′-GCGGATCCGAACNGCNTTYWSNCC-3′) and AERKQLA
(antisense primer NWems3: 5′-GGCGARYTGYTTCCKYTCNGC-3′). Both sequences are 5′ of the
homeobox intron site in vertebrates. The PCR product
amplified from genomic DNA was blunt-end-cloned into
SmaI-cut pUC18. Of 10 recombinants sequenced, eight
contained spurious amplification products, and two contained an identical DNA sequence with high sequence
similarity to the homeobox of vertebrate and Drosophila
Emx class genes (designated AmphiEmxA). Low frequency of positive recombinants may relate to divergence of AmphiEmxA, causing a mismatch at the 3′ end
of NWems3.
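The degenerate primers above use IUPAC ambiguity codes (N, Y, W, S, R, K). As a minimal sketch, assuming the standard IUPAC nucleotide code, the following Python counts how many distinct oligonucleotides each primer pool represents and writes the antisense primer in sense orientation. The function names are illustrative and are not part of the original protocol.

```python
# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement of each IUPAC code (e.g. R = A/G pairs with Y = T/C).
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R", "S": "S", "W": "W",
    "K": "M", "M": "K", "B": "V", "V": "B",
    "D": "H", "H": "D", "N": "N",
}


def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count


def reverse_complement(primer):
    """Reverse complement, keeping IUPAC ambiguity codes."""
    return "".join(COMPLEMENT[base] for base in reversed(primer))


# Primer sequences as given in the text (5' to 3').
NWEMS1 = "GCGGATCCGAACNGCNTTYWSNCC"
NWEMS3 = "GGCGARYTGYTTCCKYTCNGC"

print(degeneracy(NWEMS1))          # 512-fold degenerate
print(degeneracy(NWEMS3))          # 128-fold degenerate
print(reverse_complement(NWEMS3))  # GCNGARMGGAARCARYTCGCC
```

Reading the reverse complement of NWems3 in codons (GCN GAR MGG AAR CAR YTC GCC) shows how the antisense primer corresponds to the AERKQLA peptide.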
Genomic and cDNA Library Screening
The amphioxus Emx-related PCR product was used
to screen approximately 50,000 clones of a B. floridae
genomic library (Garcia-Fernàndez and Holland 1994);
the low-stringency conditions of Holland and Hogan
(1986) were used. Six strongly hybridizing, overlapping
phage clones were isolated; two were further restriction-mapped, subcloned, and partially sequenced (Bfg 356-1
and Bfg 364-1). This procedure revealed a region of
sequence similarity to the second exon of vertebrate
Emx genes, including part of the homeobox. This sequence was identical at the amino acid level to the
AmphiEmxA PCR fragment. A region of sequence similarity to the third exon of vertebrate Emx genes, including the 3′ part of the homeobox, was identified by
hybridization with an end-labeled oligonucleotide, HB1
(CKNCKRTTYTGRAACCADATYTT), matching sequence encoding recognition helix III of Hox and Emx
class homeodomains. A probe derived from the genomic
partial homeobox sequence was used to screen 40,000
clones of an amplified B. floridae cDNA library constructed from 5–24-h embryos (kindly provided by J.
Langeland); conditions were those of Church and Gilbert (1984) at 65°C. A single, strongly hybridizing
plaque was identified. The full cDNA sequence is available in the EMBL and GenBank databases (accession
number AF261146). The 5′ region of this AmphiEmxA
cDNA was used as a probe to isolate the 5′ exon from
the Bfg 356-1 and Bfg 364-1 phage clones. Comparison
of genomic and cDNA sequences was used to confirm
intron/exon boundaries of AmphiEmxA.
Molecular Phylogenetic Analysis
The entire putative AmphiEmxA coding region was
aligned with deduced amino acid sequences of Drosophila ems and human, mouse, Xenopus, and zebrafish
Emx class genes using the CLUSTAL W program
(Thompson, Higgins, and Gibson 1994) and adjusted by
eye to maximize contiguous stretches of sequence similarity. The C. elegans ceh-2 gene (C27A12.5; accession
number AF003137) and a cnidarian Emx gene, Cn-ems
(Mokady et al. 1998), were not included in this alignment due to high divergence outside the homeodomain.
The Drosophila E5 sequence was also not included. A
total of 127 amino acid positions were used in phylogenetic analyses, after regions that could not be aligned
with confidence and all sites with gaps were excluded
from the data set. The alignment is available as supplementary information. Phylogenies were constructed using neighbor joining (NJ), maximum parsimony (MP),
and maximum likelihood (ML). NJ was implemented
using the PROTDIST and NEIGHBOR programs of
PHYLIP, version 3.573c (Felsenstein 1993), on a distance matrix calculated with the Dayhoff PAM option.
MP used PROTPARS of PHYLIP, version 3.5c, from
which a strict consensus tree was constructed. Confidence in each node was assessed by 100 bootstrap replicates for both NJ and MP. Both analyses were repeated
using an alignment of just the homeodomain, so that Cn-ems and Drosophila E5 could be included. The ML analysis was performed using the quartet sampling and NJ
parameter estimation procedure of TreePuzzle, version
4.0.2 (Strimmer and von Haeseler 1996), with 1,000
puzzling steps, the Dayhoff model of amino acid substitution, and a mixed model of between-site rate heterogeneity with four gamma-distributed rate categories
and one invariant category.
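Two steps of the procedure above, exclusion of all alignment columns containing gaps and bootstrap resampling of the remaining columns, can be illustrated with a short sketch. This is not the PHYLIP implementation; the toy alignment and function names are hypothetical and serve only to show the logic.

```python
import random


def strip_gapped_columns(seqs, gap="-"):
    """Keep only alignment columns in which no sequence has a gap."""
    kept = [col for col in zip(*seqs) if gap not in col]
    return ["".join(col[i] for col in kept) for i in range(len(seqs))]


def bootstrap_replicate(seqs, rng):
    """Resample alignment columns with replacement (one pseudo-replicate)."""
    n = len(seqs[0])
    picks = [rng.randrange(n) for _ in range(n)]
    return ["".join(s[i] for i in picks) for s in seqs]


# Hypothetical toy alignment (three sequences, eight columns).
toy = ["MKR-FSIE", "MKRAFSI-", "MRRAFTIE"]

clean = strip_gapped_columns(toy)
print(clean)  # ['MKRFSI', 'MKRFSI', 'MRRFTI']

# 100 pseudo-replicates, matching the number of bootstrap replicates used
# for NJ and MP; each would be passed to the tree-building step in turn.
replicates = [bootstrap_replicate(clean, random.Random(seed)) for seed in range(100)]
```

Each replicate has the same number of columns as the gap-free alignment, so every bootstrap tree is inferred from a data set of the original size.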
FIG. 1.—Nucleotide and predicted amino acid sequence of AmphiEmxA. The homeodomain residues are shown in bold, and the conserved
Emx peptide domain is underlined. Asterisks indicate the first 5′ and 3′ in-frame stop codons. Intron positions are indicated with triangles.
Results
Genomic Organization of AmphiEmxA
Using PCR, cDNA library, and genomic library
screening, we isolated an amphioxus member of the
Emx homeobox gene family, designated AmphiEmxA.
The cDNA is 2,884 bp long and has the potential to
encode a protein of 289 amino acids (fig. 1). We suggest
that this corresponds to the full-length protein, since
there are stop codons 5′ of the first methionine codon.
The open reading frame includes a 180-bp homeobox
sequence close to its 3′ end, followed by a long 3′ untranslated region (UTR). The encoded homeodomain belongs to the Emx class, as indicated by its high similarity to the homeodomain sequences of the human, mouse,
frog, zebrafish, Drosophila, and cnidarian Emx genes
(80%–83% to vertebrate Emx sequences, 80% to ems,
85% to E5; fig. 2).
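For readers who wish to relate these percentages to residue counts, the sketch below computes percentage identity between two aligned sequences. It assumes that the values were calculated over the 60 residues encoded by the 180-bp homeobox, with no gaps; that denominator is an assumption here, and the function names are illustrative.

```python
def percent_identity(a, b):
    """Percentage of aligned positions at which two sequences are identical."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# A 180-bp homeobox encodes 60 amino acids (assumed denominator).
HOMEODOMAIN_LENGTH = 180 // 3


def residues_for(percent, length=HOMEODOMAIN_LENGTH):
    """Approximate number of identical residues behind a percentage."""
    return round(percent * length / 100)


print(residues_for(80))  # 48 of 60 residues
print(residues_for(85))  # 51 of 60 residues
```

Under this assumption, 80% identity corresponds to 48 identical residues and 85% to 51.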
AmphiEmxA contains two introns, as determined by
comparison of genomic and cDNA sequences (figs. 1
and 3). The first is approximately 3 kb long and 5′ of
the homeobox, separating it from a region coding for a
partially conserved hexapeptide sequence found in several classes of homeobox. This intron position is shared
with mouse and human Emx (Simeone et al. 1992). The
second intron, approximately 6 kb in size, is located
between residues 44 and 45 of the homeodomain. An
intron is also found in this position in vertebrate Emx
genes (at least in those genes for which intron-exon organization has been determined, i.e., human and mouse
Emx1 and Emx2), in Drosophila E5, and in C. elegans
ceh-2. Cnidarian Cn-ems possesses an intron within the
homeobox, but this is not located in the same position
as that of amphioxus and vertebrates.
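The position of the second intron can be expressed in nucleotide terms. The sketch below assumes that homeodomain residue 1 corresponds to the first codon of the 180-bp homeobox and that the intron falls exactly between codons 44 and 45; the function name is illustrative.

```python
HOMEOBOX_BP = 180  # homeobox length given in the text


def split_homeobox(codon_before_intron, homeobox_bp=HOMEOBOX_BP):
    """Return (bp upstream, bp downstream) of an intron lying between codons."""
    upstream = codon_before_intron * 3
    if not 0 < upstream < homeobox_bp:
        raise ValueError("intron must lie inside the homeobox")
    return upstream, homeobox_bp - upstream


print(split_homeobox(44))  # (132, 48)
```

On these assumptions, 132 bp of the homeobox lie upstream of the intron and the remaining 48 bp, which include the region encoding recognition helix III, lie downstream.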
Restriction mapping and sequence analysis revealed a repeated DNA sequence of approximately 180
bp within AmphiEmxA. This motif is imperfectly repeated (fully or partially) six times within the 3′ half of
the transcribed region of the gene. Approximately 66 bp
of the most 5′ repeat unit forms part of the AmphiEmxA
coding region; the remainder and all subsequent repeats
are part of the 3′ UTR. No such region exists in Drosophila ems, and searches against EMBL and GenBank
databases revealed no significant matches. The 3′ UTR
sequences of vertebrate Emx genes have not been
published.
FIG. 2.—Alignment of the predicted homeodomain sequence of AmphiEmxA with vertebrate, fly, nematode, and cnidarian Emx homeodomain sequences. The figures indicate percentage identity to the AmphiEmxA sequence. Abbreviations: Ce, Caenorhabditis elegans; Cn, Cnidaria
(Hydractinia symbiolongicarpus); Dm, Drosophila melanogaster; H, human; M, mouse; X, Xenopus; Z, zebrafish.
A Conserved Motif in Homeobox and Other Genes
Alignment of the AmphiEmxA deduced protein sequence to other members of the Emx homeobox class
revealed a well-conserved 14-residue peptide motif
close to the N-terminus. This sequence is located four
to five residues downstream of the first methionine in
vertebrate Emx class proteins, 21 residues downstream
in Drosophila ems, and 11 residues downstream in
AmphiEmxA (underlined in fig. 1). A weakly conserved
version is present in a cnidarian Emx protein, Cn-ems.
The Emx peptide motif was used to search the
EMBL and GenBank databases. This revealed a similar
sequence in a wide variety of homeobox genes and some
other transcription factors. We find that the Emx peptide
motif overlaps with the Hep motif present in the Drosophila H2.0 homeobox gene, engrailed homeobox
genes, and homeobox genes with a paired box (Allen et
al. 1991). Similar motifs have been noted in Msx, NK-1, NK-2, gsc, Not, Pax-3/7, Rx, ceh-10, and Anf class
homeodomain proteins (Smith and Jaynes 1996; Stein,
Niß, and Kessel 1996; Galliot, de Vargas, and Miller
1999). Our analyses extend the list to include Emx and
Gbx class homeodomain proteins.
Figure 4 shows an alignment of this motif from
homeodomain proteins and some other transcription factors from diverse taxa (see Discussion). Sequence identity is more striking within a gene class than between
gene classes, suggesting the existence of functional constraints specific to each gene class. The most conserved
sites are an invariant phenylalanine at position 5 and an
almost invariant isoleucine at position 7.
Molecular Phylogenetic Analysis
In order to investigate whether AmphiEmxA is an
ortholog of a particular vertebrate Emx gene or a homolog of multiple genes, we performed molecular phylogenetic analyses on the deduced protein sequences.
Figure 5 shows phylogenetic trees inferred using NJ and
ML. Where bootstrap or quartet puzzling reliability values were less than 60%, nodes were collapsed. The trees
are rooted using Drosophila ems as the outgroup. These
analyses strongly indicate that AmphiEmxA lies outside
of a clade containing all of the vertebrate Emx genes
(98% NJ bootstrap, 100% ML reliability value). This
position is also supported by MP analysis (99% bootstrap). The implication is that a single Emx gene was
present in an ancestral chordate and that this gene underwent at least one duplication event in the vertebrate
lineage after it split from the lineage leading to the cephalochordates. We have not conclusively demonstrated
whether single or multiple Emx genes exist in amphioxus; however, the phylogenetic analyses predict that if
more Emx class genes were present in the amphioxus
genome, they would have arisen from independent gene
duplication events in the cephalochordate lineage.
The tree topology also helps to define some relationships between vertebrate Emx class genes. The
Emx2 genes of zebrafish, Xenopus, mice, and humans
clearly group into a single clade (100% NJ bootstrap,
96% MP bootstrap, 100% ML reliability). This confirms
that these genes are true orthologs. The situation regarding Emx1 genes is less clear. There is strong evidence that Xenopus, mouse, and human Emx1 genes are
orthologs (NJ, 96%; ML, 72%), but the position of the
zebrafish gene termed emx1 is not resolved with confidence. The node connecting the latter gene to the other
vertebrate Emx1 genes has been collapsed in the NJ tree
due to its low bootstrap score (51%). MP or ML, on the
other hand, places zebrafish emx1 next to the Emx2
clade, but, again, this position is supported with low
bootstrap or reliability values (MP, 56%; ML, 57%). Together, these data indicate that there has been at least
one Emx gene duplication within the vertebrate lineage
(to give Emx1 and Emx2); this predates the divergence
of actinopterygian and tetrapod lineages. The aberrant
zebrafish emx1 either is descended from a separate duplication or is a highly divergent Emx1 gene that has
confounded attempts to reconstruct ancestry from sequence data.
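The rule of collapsing nodes with support below 60% can be sketched as follows. The tree is a deliberately simplified illustration built only from the NJ bootstrap values quoted in the text (98, 51, 96, 100); it omits resolution within clades and is not a reproduction of figure 5. The tuple representation and function name are assumptions made for this example.

```python
def collapse(node, threshold=60):
    """Return a copy of the tree with weakly supported internal nodes collapsed.

    A leaf is a string; an internal node is (support, [children]).
    Children of a node whose support is below the threshold are promoted
    to that node's parent, producing a polytomy.
    """
    if isinstance(node, str):
        return node
    support, children = node
    new_children = []
    for child in children:
        child = collapse(child, threshold)
        weak = (not isinstance(child, str)
                and child[0] is not None
                and child[0] < threshold)
        if weak:
            new_children.extend(child[1])
        else:
            new_children.append(child)
    return (support, new_children)


# Simplified vertebrate clade using NJ bootstrap values quoted in the text.
emx1_tetrapods = (96, ["X Emx1", "M Emx1", "H Emx1"])
emx2_clade = (100, ["zf Emx2", "X Emx2", "M Emx2", "H Emx2"])
vertebrates = (98, [(51, ["zf emx1", emx1_tetrapods]), emx2_clade])

collapsed = collapse(vertebrates)
print(collapsed)
```

After collapsing, zebrafish emx1, the tetrapod Emx1 group, and the Emx2 clade all descend directly from the vertebrate node, which is the unresolved arrangement described for the NJ tree.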
FIG. 3.—Genomic and cDNA organization of AmphiEmxA indicating positions and sizes of introns. Black boxes represent the homeobox; other coding regions are dotted. White boxes represent 5′ and
3′ untranslated regions (UTRs). The repeat domain present in the 3′
UTR is indicated below the genomic schematic. Restriction sites are
shown above the genomic clone: Pv = PvuII, R = EcoRI.
FIG. 4.—Alignment of the conserved peptide motif present in Emx genes. The motif is 14 amino acids in length; dashes indicate sequence
identity. A selection of similar sequences encoded by other homeobox genes, forkhead domain genes, and zinc finger genes is also shown.
Abbreviations: Amphi/am, amphioxus; C/c, chicken; Cn, cnidarian; D, Drosophila; Em, Ephydatia muelleri (sponge); h, human; m, mouse; X,
Xenopus; zf, zebrafish.
To include Drosophila E5 and the divergent cnidarian Cn-ems gene, it was necessary to restrict the
alignment to just the homeodomain, thus compromising
sequence length in favor of taxonomic sampling. Phylogenetic trees obtained from this alignment had very
similar topologies to those in figure 5 (data not shown).
All of the invertebrate Emx genes were again placed
outside of a clade containing all vertebrate Emx1 and
Emx2 genes (82% NJ bootstrap, 70% MP bootstrap).
Although these scores are lower than those above, presumably due to reduced informative sequence variation
in the homeodomain, they still support the existence of
a single Emx gene in an ancestral chordate. The two
Drosophila Emx class genes (ems and E5) group together in such analyses, suggesting that they may derive
from an independent gene duplication, although bootstrap values are very low (59% NJ, 53% MP). Further
sampling of Emx genes will be required to resolve the
timing of this duplication relative to arthropod radiation.
Discussion
Conservation, Gain, and Loss of Introns
The best-known class of homeobox genes, the Hox
genes, have a simple and stereotyped genomic organization in vertebrates and amphioxus. With few exceptions, Hox genes have a single intron just 5′ of the homeobox, dividing this region from a conserved hexapeptide motif (for amphioxus, see Garcia-Fernàndez and
Holland 1994; Wada, Garcia-Fernàndez, and Holland
1999). Interestingly, this intron lies between DNA sequences coding for two functional domains of Hox proteins: the homeodomain mediating sequence-specific
binding to DNA, and the hexapeptide involved in heterodimer formation with Pbx/exd homeodomain proteins (Piper et al. 1999). A comparable intron position
is found in many other classes of homeobox gene, including ParaHox genes (Cdx, Xlox, Gsx classes), Mox,
Otx, and the Emx class genes studied here. We find that
this intron position, just upstream of the homeobox, is
conserved between human and mouse Emx genes and
the amphioxus homolog, AmphiEmxA. Drosophila ems
and the cnidarian Cn-ems gene also possess an intron 5′
of the homeobox, but sequence divergence precludes assigning this as a homologous position on the basis of
sequence alone. Nonetheless, taking into account the
fact that a comparable intron position exists in many
homeobox classes and that these classes diverged early
in metazoan evolution (Bürglin 1995), we suggest that
conservation of an ancient intron position is the most
likely explanation.
Introns within the homeobox can be compared
more easily, since sequence conservation allows sites to
be aligned with certainty. We mapped a large (6 kb)
intron within the homeobox of AmphiEmxA between codons 44 and 45. This intron site is shared with human
and mouse Emx1 and Emx2 genes, indicating an origin
before the divergence of vertebrate and cephalochordate
lineages. Presence of this same intron site in the C. elegans Emx class gene ceh-2 and the Drosophila Emx
class gene E5 (although not ems) suggests an even earlier origin, before the divergence of the major bilaterian
lineages. Bürglin (1994, 1995) noted that among homeobox genes with an intron in the homeobox, the most
frequent site is between codons 44 and 45. These include Hox genes such as Drosophila lab, pb, and Abd-B, plus nematode lin-39 and the non-Hox genes NK-1,
H2.0, lbl, and Dll, chick CdxA and CNot1, several vertebrate Dlx and Hlx genes, nematode ceh-1, ceh-9, ceh-12, and ceh-20, flatworm Dth-2, and the chordate Emx
genes discussed here.
Clearly, possession of an intron between codons 44
and 45 of the homeobox is a character shared by genes
from several related classes of homeobox genes. We follow Bürglin (1995) in arguing that possession of this
intron is an ancestral property of many metazoan homeobox gene classes, including Emx. We do not suggest, however, that this intron position was present in
the first homeobox genes. Instead, we suggest that insertion of this intron corresponds to a major division of
the homeobox gene superfamily of metazoans into a
PRD superclass and an ANTP superclass (as defined by
the phylogeny of Galliot, de Vargas, and Miller 1999).
The Emx, Hox, Cdx, Dlx, NK-1, NK-2, Lbx, Hlx, and
NEC classes (and others) are all part of the ANTP superclass; we propose that the intron was inserted into
the ancestor of this superclass.
It is interesting, therefore, that Drosophila ems and
cnidarian Cn-ems lack this intron. Indeed, Cn-ems possesses an intron at a different site in the homeobox. We
conclude that there has been both loss and gain of introns within Emx class homeoboxes.
FIG. 5.—Neighbor-joining (NJ; top) and maximum-likelihood
(ML; bottom) phylogenetic trees from an alignment of the putative
protein sequences of AmphiEmxA, vertebrate Emx genes, and Drosophila ems. Figures at nodes are scores from 100 bootstrap resamplings of the data (NJ) or quartet puzzling support values (ML). Nodes
were collapsed where scores were below 60. The two methods gave
the same overall topology except for the position of zebrafish Emx1
(see text) and minor branch swapping within the Emx2 clade.
Convergent Evolution of a Transcriptional
Modification Domain?
Alignment of the full-length deduced protein encoded by AmphiEmxA with its homologs from vertebrates and Drosophila allowed identification of a conserved 14-residue motif close to the N-terminus. Database analysis and comparisons by eye revealed similarity to or overlap with a number of independently
identified peptide motifs in several (but not all) homeodomain proteins, as well as some forkhead domain
genes and zinc finger genes. The motif was first noted
by Allen et al. (1991), who named it the Hep motif,
referring to its presence in the H2.0 (Hlx) homeobox
gene, engrailed homeobox genes, and homeobox genes
with a paired box (Hep = H2.0/engrailed/paired). The
motif is the most N-terminal of five conserved protein
stretches shared by mouse, human, and chicken engrailed (en) class genes and is designated eh1 (Logan et
al. 1992). The eh1 motif is also present in en class protein sequences from invertebrates, including Drosophila
and Artemia (e.g., Manzanares, Marco, and Garesse
1993). Smith and Jaynes (1996) extended the range of
homeodomain proteins in which the eh1 motif could be
recognized to the Msx, NK-1, NK-2, and gsc classes.
The eh1 motif from Drosophila engrailed is capable of
strongly repressing transcription when attached to a
DNA-binding domain, providing a functional reason for
wide conservation of the motif. Stein, Niß, and Kessel
(1996) noted that proteins of the Not homeodomain
class possess a Hep/eh1 motif, while Galliot, de Vargas,
and Miller (1999) noted presence of an eh1-like motif
in a range of PRD superclass homeodomain proteins,
including the Pax-3/7, Rx, ceh-10, and Anf classes.
Our finding that a similar motif exists in Emx class
proteins from invertebrates and vertebrates extends the
range of homeobox genes further. Using the Emx motif
in database searches revealed that the Gbx homeodomain class could also be added to the growing list, in
addition to allowing us to refine the extent of conservation of this motif (fig. 4). These comparisons clearly
suggest that this motif has an ancient origin within the
homeodomain superfamily, at least within Metazoa. In
addition, it suggests that Emx homeodomain proteins
possess a separate domain that is likely to act as a modulator of transcriptional activity.
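A crude first-pass screen for candidate motifs can be sketched from the two most conserved sites reported in the Results, a phenylalanine at position 5 and an isoleucine at position 7 of the 14-residue motif. This is only an illustration and is not the database search procedure used here; the test sequence is artificial and the function name is hypothetical.

```python
MOTIF_LENGTH = 14  # length of the conserved motif reported in the text


def candidate_windows(protein, length=MOTIF_LENGTH):
    """List (1-based start, window) for windows with F at 5 and I at 7."""
    hits = []
    for start in range(len(protein) - length + 1):
        window = protein[start:start + length]
        # Positions 5 and 7 of the motif are indices 4 and 6 of the window.
        if window[4] == "F" and window[6] == "I":
            hits.append((start + 1, window))
    return hits


# Artificial sequence: poly-alanine with F and I placed two residues apart.
toy = "A" * 8 + "FAI" + "A" * 11

print(candidate_windows(toy))  # [(5, 'AAAAFAIAAAAAAA')]
```

Such a filter would return many false positives on real proteins, so any hit would still need to be judged against the full alignment in figure 4.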
Shimeld (1997b) noted that the eh1 domain has remarkable similarity to a conserved domain (region II)
shared between proteins of the HNF3 family of forkhead
domain transcription factors from vertebrates, amphioxus, and arthropods. To this list of taxa we can now
add the budhead gene from Hydra (Martinez et al.
1997). As with eh1, a function has been assigned to
region II; in this case, the function is transcriptional activation rather than repression (Pani et al. 1992). Grimes
et al. (1996) and Deschet et al. (1998) noted similarity
to the SNAG repressor domain in vertebrate Gfi-1 proto-oncoproteins and to a sequence located at the N-terminus of the vertebrate Snail-Slug class of zinc finger proteins. In the case of zinc finger proteins, however, biochemical function has not been demonstrated.
It is intriguing that a similar protein motif exists in
at least three apparently unrelated transcription factor
families (homeodomain proteins, forkhead domain proteins, and zinc finger proteins). This is a highly unusual
distribution that demands explanation. It cannot be discounted as mere chance sequence similarity, because (at
least for the homeodomain and forkhead examples) the
motif has a defined biochemical function and evolutionary conservation across a wide taxonomic range. Indeed,
in each case, conservation extends across almost the full
range of Metazoa, from cnidarians to arthropods and
chordates. There are two opposing explanations for the
pattern described: conservation and convergence. Conservation would imply that similarity is a reflection of
descent from a very ancient functional motif that existed
in a ‘‘primordial’’ transcription factor. This would demand radical exon shuffling or gene fusion to copy a
domain between precursors of proteins possessing different DNA-binding domains, plus extensive loss or divergence of the motif in some subsequent lineages of
each gene family. On the basis of the unusual distribution of this motif, we favor the alternative explanation:
convergent evolution. Two other factors also argue in
favor of convergent evolution. First, the motif has distinct biochemical functions in the two gene families; it
can act as a repressor in homeodomain proteins and as
an activator in forkhead proteins. Second, the motif is
located in a different part of the protein in each case:
close to the N-terminus in homeodomain proteins, and
C-terminal in forkhead proteins.
Gene Duplication
Molecular phylogenetic analysis using information
from the entire coding sequences of chordate and Drosophila Emx class genes gives strong support for the
existence of a single Emx gene in the ancestor of chordates. This Emx gene underwent at least one gene duplication event in the vertebrate lineage, after this lineage had diverged from its sister lineage leading to amphioxus and before the divergence of ray-finned fish and
tetrapods. AmphiEmxA is a descendant of the ancestral
gene before it underwent vertebrate-specific duplication.
Hence, neither vertebrate Emx1 nor Emx2 should strictly
be considered orthologs of AmphiEmxA. Emx1 and
Emx2 are also not orthologs of Drosophila ems and E5.
These gene duplication events suggest that some caution
is necessary when comparing gene expression patterns
and developmental roles between vertebrate and invertebrate Emx genes. Similar gene duplication events in
early vertebrate evolution have been recorded for many
genes. These include several classes of transcription factors, including homeobox genes of the Hox, Otx, Msx,
Cdx, En, and Gsx classes, Pax genes, and myogenic
bHLH genes (for review, see Holland 1999). Other examples of duplicated genes are also known, raising the
possibility that gene duplication affected a large proportion of the genome in early vertebrate evolution
(Holland et al. 1994; Holland 1999). This proposal gains
additional support from total gene number estimates;
Simmen et al. (1998) estimated that the tunicate Ciona
intestinalis has approximately 15,500 genes (±3,700),
as compared with 50,000–100,000 in higher vertebrates.
Current data from individual gene families suggest that
the amphioxus condition is comparable to that of
tunicates.
Although there is now overwhelming evidence in
favor of extensive gene duplication in early vertebrate
evolution (with the Emx class adding to that evidence),
the mechanism by which duplication occurred is contentious. A popular view, originally proposed by Ohno
(1970), is that two or more polyploidy events, followed
by gene divergence and gene loss, caused a stepwise
increase in gene number during early vertebrate evolution. The existence of chromosomal paralogy regions
(regions of similar gene content on different chromosomes) in mammalian genomes seems to support the
polyploidy model (Lundin 1993). Paralogy regions may
be the echoes of at least two whole-genome duplications, but they are not necessarily faithful copies due to
subsequent gene loss and/or additional tandem gene duplication. The Emx1 and Emx2 genes of mice or humans
do not map to currently identified paralogy regions; human Emx1 maps to 2p14–p13, while Emx2 maps to
10q26.1. It is unclear, therefore, whether the gene duplication reported in this paper occurred in concert with
other genes or was an isolated event.
We have discussed the duplication of vertebrate
Emx genes as if it were a single event, since we consider
this the most parsimonious interpretation of our molecular phylogenetic analyses. However, while the monophyletic status of the vertebrate Emx2 genes is very well
supported, we cannot decide conclusively between a
monophyletic and a paraphyletic origin for Emx1 genes.
This is due to low confidence in the precise position of
zebrafish emx1, which appears to be evolving at a relatively high rate, as judged by its long branch length in
phylogenetic trees (fig. 5). It is formally possible that
zebrafish emx1 represents a third group of vertebrate
Emx class genes which has been lost from tetrapods (or
has yet to be cloned); if this is the case, there have been
at least two duplications of the ancestral Emx gene in
early vertebrate evolution. The simpler explanation is
that the zebrafish emx1 gene is a true Emx1 gene, but
its placement in the molecular phylogenetic tree is compromised by an unusually rapid rate of sequence evolution. These two alternatives were also contrasted by
Patarnello et al. (1997). These authors found a very basal placement for zebrafish emx1, as an outgroup to both
Emx1 and Emx2; our analyses do not agree with this
placement. Compared with the alignment of Patarnello
et al. (1997), we have been more conservative in identifying putatively homologous sites and excluding very
variable regions between chordate Emx genes and Drosophila ems. We suggest that this has resulted in more
reliable phylogenetic trees. We favor the parsimonious
interpretation that zebrafish emx1 is a real, but divergent,
Emx1 gene. In further support of this interpretation, no
other Emx1-type gene has been reported from zebrafish
to date, and the rapid divergence of particular duplicated
zebrafish genes is not without precedent (Williams and
Holland 1998). In summary, we conclude that the duplication of an ancestral Emx class homeobox gene in
the vertebrate lineage postdates divergence from cephalochordates and predates the divergence of ray-finned
fish and tetrapods.
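The distinction drawn above between a monophyletic and a non-monophyletic Emx1 group can be illustrated with a minimal sketch. The two topologies below are hypothetical toy trees written as nested tuples, not the trees of figure 5; the taxon labels and the helper names (leaf_set, clades, is_monophyletic) are assumptions introduced only for this illustration. The first tree places zebrafish emx1 as sister to the tetrapod Emx1 genes, and the second places it outside both Emx1 and Emx2, in the manner reported by Patarnello et al. (1997).

```python
# Toy illustration only: nested tuples stand in for rooted trees.
# Topologies and labels are hypothetical, not taken from figure 5.

def leaf_set(node):
    """Return the set of leaf labels below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))

def clades(node):
    """Yield the leaf set of every clade (subtree) in the tree."""
    yield leaf_set(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)

def is_monophyletic(tree, taxa):
    """A group is monophyletic if some clade contains exactly those taxa."""
    return frozenset(taxa) in set(clades(tree))

EMX1 = {"mouse_Emx1", "human_Emx1", "zebrafish_emx1"}
EMX2 = {"mouse_Emx2", "human_Emx2", "zebrafish_emx2"}

# Single duplication: zebrafish emx1 is a divergent but true Emx1 gene.
single_duplication = (
    "AmphiEmxA",
    ((("mouse_Emx1", "human_Emx1"), "zebrafish_emx1"),
     (("mouse_Emx2", "human_Emx2"), "zebrafish_emx2")),
)

# Alternative: zebrafish emx1 falls outside both Emx1 and Emx2.
basal_zebrafish_emx1 = (
    "AmphiEmxA",
    ("zebrafish_emx1",
     (("mouse_Emx1", "human_Emx1"),
      (("mouse_Emx2", "human_Emx2"), "zebrafish_emx2"))),
)

for name, tree in [("single duplication", single_duplication),
                   ("basal zebrafish emx1", basal_zebrafish_emx1)]:
    print(name, "| Emx1 monophyletic:", is_monophyletic(tree, EMX1),
          "| Emx2 monophyletic:", is_monophyletic(tree, EMX2))
```

In this sketch the Emx2 group forms a clade under both topologies, whereas the Emx1 group forms a clade only under the first; this is the sense in which the position of a single long-branch sequence determines whether one duplication or more must be inferred.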
Acknowledgments
We thank Jim Langeland for the B. floridae cDNA
library, Bill McGinnis for communicating unpublished
data, and Jordi Garcia-Fernàndez, Hidetoshi Saiga, and
members of the laboratory for helpful discussions. The
constructive suggestions of Dr. Richard Thomas and a
referee are also acknowledged. This work was supported
by BBSRC grant G04203.
LITERATURE CITED
ALLEN, J. D., T. LINTS, N. A. JENKINS, N. G. COPELAND, A.
STRASSER, R. P. HARVEY, and J. M. ADAMS. 1991. Novel
murine homeobox gene on chromosome-1 expressed in specific hematopoietic lineages and during embryogenesis.
Genes Dev. 5:509–520.
BÜRGLIN, T. R. 1994. A comprehensive classification of homeobox genes. Pp. 25–71 in D. DUBOULE, ed. Guidebook
to the homeobox genes. Oxford University Press, Oxford,
England.
———. 1995. The evolution of homeobox genes. Pp. 291–336
in R. ARAI, M. KATO, and Y. DOI, eds. Biodiversity and
evolution. National Science Museum Foundation, Tokyo.
CHURCH, G. M., and W. GILBERT. 1984. Genomic sequencing.
Proc. Natl. Acad. Sci. USA 81:1991–1995.
DALTON, D., R. CHADWICK, and W. MCGINNIS. 1989. Expression and embryonic function of empty spiracles: a Drosophila homeobox gene with two patterning functions on
the anterior-posterior axis of the embryo. Genes Dev. 3:
1940–1956.
DESCHET, K., F. BOURRAT, D. CHOURROUT, and J.-S. JOLY.
1998. Expression domains of the medaka (Oryzias latipes)
Ol-Gsh 1 gene are reminiscent of those of clustered and
orphan homeobox genes. Dev. Genes Evol. 208:235–244.
FELSENSTEIN, J. 1993. PHYLIP (phylogeny inference program). Version 3.5c. Distributed by the author, Department
of Genetics, University of Washington, Seattle.
GALLIOT, B., C. DE VARGAS, and D. MILLER. 1999. Evolution
of homeobox genes: Q50 Paired-like genes founded the
Paired class. Dev. Genes Evol. 209:186–197.
GARCIA-FERNÀNDEZ, J., and P. W. H. HOLLAND. 1994. Archetypal organization of the amphioxus Hox gene cluster. Nature 370:563–566.
GRIMES, H. L., T. O. CHAN, P. A. ZWEIDLER-MCKAY, B. TONG,
and P. N. TSICHLIS. 1996. The Gfi-1 proto-oncoprotein contains a novel transcriptional repressor domain, SNAG, and
inhibits G1 arrest induced by interleukin-2 withdrawal. Mol.
Cell. Biol. 16:6263–6272.
HIRTH, F., S. THERIANOS, T. LOOP, W. J. GEHRING, H. REICHERT, and K. FURUKUBO-TOKUNAGA. 1995. Developmental
defects in brain segmentation caused by mutations of the
homeobox genes orthodenticle and empty spiracles in Drosophila. Neuron 15:769–778.
HOLLAND, N. D., and L. Z. HOLLAND. 1993. Embryos and
larvae of invertebrate deuterostomes. Pp. 21–32 in C. D.
STERN and P. W. H. HOLLAND, eds. Essential developmental
biology: a practical approach. IRL Press at Oxford University Press, Oxford, England.
HOLLAND, P. W. H. 1996. Molecular biology of lancelets: insights into development and evolution. Isr. J. Zool. 42:
S247–S272.
———. 1999. Gene duplication: past, present and future. Semin. Cell Dev. Biol. 10:541–547.
HOLLAND, P. W. H., J. GARCIA-FERNÀNDEZ, N. A. WILLIAMS,
and A. SIDOW. 1994. Gene duplications and the origins of
vertebrate development. Development 1994(Suppl.):125–
133.
HOLLAND, P. W. H., and B. L. M. HOGAN. 1986. Phylogenetic
distribution of Antennapedia-like homeoboxes. Nature 321:
251–253.
HOLLAND, P. W. H., P. INGHAM, and S. KRAUSS. 1992. Mice
and flies head to head. Nature 358:627–628.
LOGAN, C., M. C. HANKS, S. NOBLETOPHAM, D. NALLAINATHAN, N. J. PROVART, and A. L. JOYNER. 1992. Cloning
and sequence comparison of the mouse, human, and chicken engrailed genes reveal potential functional domains and
regulatory regions. Dev. Genet. 13:345–358.
LUNDIN, L. G. 1993. Evolution of the vertebrate genome as
reflected in paralogous chromosomal regions in man and
the house mouse. Genomics 16:1–19.
MANZANARES, M., R. MARCO, and R. GARESSE. 1993. Genomic organization and developmental pattern of expression
of the engrailed gene from the brine shrimp Artemia. Development 118:1209–1219.
MARTINDALE, M. Q., and J. Q. HENRY. 1998. The development
of radial and biradial symmetry: the evolution of bilaterality. Am. Zool. 38:672–684.
MARTINEZ, D. E., M. L. DIRKSEN, P. M. BODE, M. JAMRICH,
R. E. STEELE, and H. R. BODE. 1997. Budhead, a fork head
HNF-3 homologue, is expressed during axis formation and
head specification in hydra. Dev. Biol. 192:523–536.
MOKADY, O., M. H. DICK, D. LACKSCHEWITZ, B. SCHIERWATER, and L. W. BUSS. 1998. Over one-half billion years of
head conservation? Expression of an ems class gene in Hydractinia symbiolongicarpus (Cnidaria: Hydrozoa). Proc.
Natl. Acad. Sci. USA 95:3673–3678.
MORITA, T., H. NITTA, K. YUJI, H. MORI, and M. MISHINA.
1995. Differential expression of two zebrafish emx homeodomain mRNAs in the developing brain. Neurosci. Lett.
198:131–134.
OHNO, S. 1970. Evolution by gene duplication. Springer-Verlag, New York.
PANI, L., D. G. OVERDIER, A. PORCELLA, X. QIAN, E. LAI, and
R. H. COSTA. 1992. Hepatocyte nuclear factor-3-beta contains 2 transcriptional activation domains, one of which is
novel and conserved with the Drosophila fork head protein.
Mol. Cell. Biol. 12:3723–3732.
PANNESE, M., G. LUPO, B. KABLAR, E. BONCINELLI, G. BARSACCHI, and R. VIGNALI. 1998. The Xenopus Emx genes
identify presumptive dorsal telencephalon and are induced
by head organizer signals. Mech. Dev. 73:73–83.
PATARNELLO, T., L. BARGELLONI, E. BONCINELLI, F. SPADA, M.
PANNESE, and V. BROCCOLI. 1997. Evolution of Emx genes
and brain development in vertebrates. Proc. R. Soc. Lond.
B Biol. Sci. 264:1763–1766.
PELLEGRINI, M., A. MANSOURI, A. SIMEONE, E. BONCINELLI,
and P. GRUSS. 1996. Dentate gyrus formation requires
Emx2. Development 122:3893–3898.
PIPER, D. E., A. H. BATCHELOR, C. P. CHANG, M. L. CLEARY,
and C. WOLBERGER. 1999. Structure of a HoxB1-Pbx1 heterodimer bound to DNA: role of the hexapeptide and a
fourth homeodomain helix in complex formation. Cell 96:
587–597.
SHIMELD, S. M. 1997a. Characterisation of amphioxus HNF-3
genes: conserved expression in the notochord and floor
plate. Dev. Biol. 183:74–85.
———. 1997b. A transcriptional modification motif encoded
by homeobox and fork head genes. FEBS Lett. 410:124–
125.
SIMEONE, A., M. GUILISANO, D. ACAMPORA, A. STORNAIULO,
M. RAMBALDI, and E. BONCINELLI. 1992. Two vertebrate
homeobox genes related to the Drosophila empty spiracles
gene are expressed in the embryonic cerebral cortex.
EMBO J. 11:2541–2550.
SIMMEN, M. W., S. LEITGEB, V. H. CLARK, S. J. M. JONES, and
A. BIRD. 1998. Gene number in an invertebrate chordate,
Ciona intestinalis. Proc. Natl. Acad. Sci. USA 95:4437–
4440.
SMITH, S. T., and J. B. JAYNES. 1996. A conserved region of
engrailed, shared among all en-, gsc-, NK1-, NK2- and
msh-class homeoproteins, mediates active transcriptional repression in vivo. Development 122:3141–3150.
STEIN, S., K. NIß, and M. KESSEL. 1996. Differential activation
of the clustered homeobox genes CNOT2 and CNOT1 during notogenesis in the chick. Dev. Biol. 180:519–533.
STRIMMER, K., and A. VON HAESELER. 1996. Quartet puzzling:
a maximum likelihood method for reconstructing tree topologies. Mol. Biol. Evol. 13:964–969.
THOMPSON, J. D., D. G. HIGGINS, and T. J. GIBSON. 1994.
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680.
WADA, H., J. GARCIA-FERNÀNDEZ, and P. W. H. HOLLAND.
1999. Colinear and segmental expression of amphioxus Hox
genes: differences from vertebrates and clues to ancestral
roles. Dev. Biol. 213:131–141.
WALLDORF, U., and W. J. GEHRING. 1992. Empty spiracles, a
gap gene containing a homeobox involved in Drosophila
head development. EMBO J. 11:2247–2259.
WILLIAMS, N. A., and P. W. H. HOLLAND. 1998. Gene and
domain duplication in the chordate Otx gene family: insights from amphioxus Otx. Mol. Biol. Evol. 15:600–607.
YOSHIDA, M., Y. SUDA, I. MATSUO, N. MIYAMOTO, N. TAKEDA, S. KURITANI, and S. AIZAWA. 1997. Emx1 and Emx2
function in development of dorsal telencephalon. Development 124:101–111.
RICHARD THOMAS, reviewing editor
Accepted June 14, 2000