Classification and expression analyses of homeobox genes
from Dictyostelium discoideum
HIMANSHU MISHRA and SHWETA SARAN*
School of Life Sciences, Jawaharlal Nehru University, New Delhi 110 067, India
*Corresponding author (Email, [email protected]; [email protected])
Homeobox genes are compared between genomes in an attempt to understand the evolution of animal development.
The ability of the protist, Dictyostelium discoideum, to shift between uni- and multicellularity makes this group ideal
for studying the genetic changes that may have occurred during this transition. We present here the first genome-wide
classification and comparative genomic analysis of the 14 homeobox genes present in D. discoideum. Based on the
structural alignment of the homeodomains, they can be broadly divided into TALE and non-TALE classes. When
individual homeobox genes were compared with members of known classes or families, we could further classify them
into 3 groups, namely, TALE, OTHER and NOVEL classes, but no HOX family was found. The 5 members of the TALE
class could be further divided into PBX, PKNOX, IRX and CUP families; 4 homeobox genes classified as NOVEL
did not show similarity to any known homeobox genes, while the remaining 5 were classified as OTHER as they
showed a certain degree of similarity to a few known homeobox genes. No unique RNA expression pattern during
development of D. discoideum emerged for members of an individual group. Putative promoter analysis revealed
binding sites for a few homeobox transcription factors among many probable factors.
[Mishra H and Saran S 2015 Classification and expression analyses of homeobox genes from Dictyostelium discoideum. J. Biosci. 40 241–255] DOI
10.1007/s12038-015-9519-3
Keywords. Classification; Dictyostelium; expression; homeobox
1. Introduction
Homeobox genes play key roles in animal, plant, fungal and amoebozoan development, including regional
patterning and the regulation of cell proliferation, differentiation, adhesion, migration and cell fate
(Gehring 1994; Derelle et al. 2007). They contain a conserved DNA motif of about 180 base pairs, which
encodes a peptide motif 60 amino acids long, called the homeodomain. The homeodomain consists of three
α-helices that form a helix-loop-helix-turn-helix DNA-binding motif that recognizes and binds to specific
DNA sequences to regulate the expression of target genes (Kissinger et al. 1990).
These proteins are usually classified based on phylogenetic analyses of homeodomain sequence similarity,
chromosomal location and additional conserved protein domains or motifs outside the homeodomain. The
homeodomain-containing proteins of animal genomes are classified into 11 classes, namely, ANTP, PRD, TALE,
POU, CERS, PROS, ZF, LIM, HNF, CUT and SINE. A few identified animal proteins that could not be placed
readily into any of these categories have been placed in a class called OTHER (Holland et al. 2007).
Homeodomain proteins from plants have been classified into 14 classes: HD-ZIP I, HD-ZIP II, HD-ZIP III,
HD-ZIP IV, KNOX, BEL, PLINC, WOX, DDT, PHD, NDX, LD, PINTOX and SAWADEE (Mukherjee et al. 2009).
Supplementary materials pertaining to this article are available on the Journal of Biosciences Website at http://www.ias.ac.in/jbiosci/jun2015/supp/Mishra.pdf
http://www.ias.ac.in/jbiosci
Published online: 27 April 2015
J. Biosci. 40(2), June 2015, 241–255, © Indian Academy of Sciences
Conservation among these genes provides an effective tool to study the evolution of organisms. Here we have
analysed Dictyostelium discoideum, a member of the Mycetozoa that belongs to the phylum Amoebozoa and is
phylogenetically closer to Opisthokonta than to the green plants. This position is supported by a combined
maximum likelihood analysis of 19 proteins (Kuma et al. 1995) and by a concatenated alignment of 82
universally conserved eukaryotic proteins (Iyer et al. 2008). Phylogenetic analyses of the combined data set
of EF-1α, HSP70, actin and β-tubulin amino acid sequences strongly support the grouping of Opisthokonta,
which includes animals and fungi, with Amoebozoa (Steenkamp et al. 2006). The possibility that
D. discoideum originated within Opisthokonta is ruled out by its lack of an 11- to 13-amino-acid insertion
found exclusively in animal and fungal EF-1α (Baldauf and Palmer 1993). An analysis of 1,280 expressed
sequence tags (ESTs) from M. balamuthi, D. discoideum and E. histolytica in a data set of 123 genes from 30
species provided very strong support for the position of Conosa (M. balamuthi, D. discoideum and
E. histolytica) as a sister clade of Opisthokonta and also suggested that Dictyostelium diverged after the
plant–animal split but before the divergence of fungi (Bapteste et al. 2002). In other words, dictyostelids
occupy a position in the evolution of eukaryotes as a sister clade of metazoans and fungi and are members of
the eukaryote crown group.
Homeobox genes have largely been studied in metazoans; in the present study, we have analysed the
homeobox genes of the unicellular eukaryote D. discoideum. On the basis of their homeodomain sequences and
predicted tertiary structures, we could classify them according to the known classification criteria for
animals (Holland et al. 2007; Takatori et al. 2008). There are 14 predicted homeobox genes, with one
duplicate on chromosome 2. The functions of a few of these genes have already been characterized in
D. discoideum: hbx1 (wariai) regulates cell-type differentiation and proportioning, hbx3 plays a role in
phagocytosis, and hbx4 is involved in cytokinesis and development (Han and Firtel 1998; Sillo et al. 2008;
Kim et al. 2011).
The homeobox genes of D. discoideum could be classified broadly into two groups: TALE class and non-TALE.
The non-TALE group could be further subdivided into NOVEL class and OTHER. TALE class members were
further divided into PBX, PKNOX, IRX and CUP families. In these phylogenetic analyses, fungal sequences were
used as the out-group since fungi are sister to both metazoans and amoebae. RNA expression analyses of all 14
genes during development of D. discoideum were carried out. Genome data from such additional unicellular
relatives of metazoans will be useful in the analysis of the evolution of multicellularity and will
facilitate comparative genomics of this important gene superclass.
2. Materials and Methods
2.1 Homeobox genes from D. discoideum
A genome-wide search for homeodomain sequences was performed using the BLAST server at dictyBase with
known homeodomain sequences from the ANTP, PRD, LIM, POU, HNF, SINE, TALE, CUT, PROS, ZF and OTHER
classes of different organisms, ranging from unicellular fungi to multicellular vertebrates. No arbitrary
E-value cut-off was selected; instead, each list of hits was manually analysed until true homeodomain
sequences ceased to be detected. Hits were analysed for the presence of a homeodomain, additional domains
and motifs using the Simple Modular Architecture Research Tool (http://smart.embl-heidelberg.de/)
(Letunic et al. 2011). The individual homeodomains of multi-homeodomain proteins in Dictyostelium were
considered as separate taxa in our phylogenetic analyses; lowercase letters appended to the gene name
distinguish downstream homeodomains that are derived from a single protein (Ryan et al. 2006).
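
The labelling of multi-homeodomain proteins described above can be sketched in a few lines of Python. This is only an illustration: the function name `homeodomain_taxa` is hypothetical, and the exact suffix scheme (first domain keeps the bare gene name, downstream domains receive b, c, ...) is an assumption, since the text states only that lowercase letters mark downstream homeodomains.

```python
from string import ascii_lowercase


def homeodomain_taxa(gene, homeodomains):
    """Map taxon labels to the homeodomain sequences of one protein.

    Assumed convention (hypothetical): the first homeodomain keeps the bare
    gene name and each downstream homeodomain gets a lowercase suffix
    (b, c, ...), so every domain enters the tree as its own taxon.
    """
    if len(homeodomains) > len(ascii_lowercase):
        raise ValueError("too many homeodomains for single-letter suffixes")
    taxa = {}
    for index, sequence in enumerate(homeodomains):
        suffix = "" if index == 0 else ascii_lowercase[index]
        taxa[gene + suffix] = sequence
    return taxa


# Placeholder strings stand in for real homeodomain sequences.
print(homeodomain_taxa("Hbx6", ["FIRST_DOMAIN", "SECOND_DOMAIN"]))
```

Treating each domain as its own taxon lets the two homeodomains of a single protein fall in different parts of a tree if their sequences have diverged.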
2.2 Database searches and retrieval of protein sequences
To classify the Dictyostelium homeobox genes, sequences of known homeodomains were collected using BLASTp
through NCBI and from other databases such as http://www.tigr.org/tdb/e2k1/cna1 (Cryptococcus neoformans)
and www.wormbase.org (Caenorhabditis elegans, WS203). An E-value cut-off of e−05 was applied. In the case of
the NOVEL class proteins, hits showed insignificant E-values and we therefore relaxed the E-value criterion
to obtain sequences showing slight similarity (E-value e0.009–e1.5) to NOVEL class members. No unnamed,
predicted or hypothetical proteins were selected. Homeodomain sequences were extracted using SMART.
Incomplete and redundant sequences were removed manually.
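
The selection rules in this subsection (an E-value cut-off and the exclusion of unnamed, predicted or hypothetical proteins) could be expressed roughly as follows. This is a minimal sketch, not the procedure actually used: the function names and the example hits are invented for illustration, and matching on words in the hit description is an assumption about how such proteins would be recognised.

```python
EXCLUDED_WORDS = ("unnamed", "predicted", "hypothetical")


def keep_hit(description, evalue, cutoff=1e-5):
    """Return True if a BLASTp hit passes the E-value and annotation filters."""
    text = description.lower()
    if any(word in text for word in EXCLUDED_WORDS):
        return False
    return evalue < cutoff


def filter_hits(hits, cutoff=1e-5):
    """Keep the (description, evalue) pairs that pass keep_hit."""
    return [(desc, ev) for desc, ev in hits if keep_hit(desc, ev, cutoff)]


# Made-up hits for illustration only; not taken from the study.
example_hits = [
    ("homeobox protein A", 2e-12),
    ("hypothetical protein B", 1e-20),
    ("homeobox protein C", 0.3),
]
print(filter_hits(example_hits))
```

Passing a larger `cutoff` mimics the relaxed criterion applied to the NOVEL class, where weaker hits were deliberately retained.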
2.3 Phylogenetic analyses
Phylogenetic analyses were performed on homeodomain sequences, after aligning the 60 amino acids, using the
Maximum Likelihood (ML) method. ML trees were constructed using the PhyML program (Guindon and Gascuel
2003; Guindon et al. 2005) with the JTT amino acid substitution model. Branch support was assessed using the
SH-like (Shimodaira-Hasegawa) test in PhyML. Sequences showing insignificant branch support were removed
manually from the final trees. Trees were re-rooted using TreeDyn (http://www.phylogeny.fr) (Dereeper et al.
2008) and visualized with MEGA6.0 (Molecular Evolutionary Genetics Analysis) (Tamura et al. 2013).
Species codes used: Gg: Gallus gallus;
Dm: Drosophila melanogaster; Hs: Homo sapiens; Mm: Mus musculus; Xl: Xenopus laevis; Sc: Saccharomyces
cerevisiae; Cf: Camponotus floridanus; Tc: Tribolium castaneum; Dp: Danaus plexippus; Hsa: Harpegnathos
saltator; Cb: Cerapachys biroi; Ae: Acromyrmex echinatior; Dd: Dictyostelium discoideum; Cn: Cryptococcus
neoformans; Ec: Encephalitozoon cuniculi; Nv: Nematostella vectensis; Cs: Cupiennius salei; Sp:
Strongylocentrotus purpuratus; Mda: Myotis davidii; Aa: Aedes aegypti; Ta: Trichoplax adhaerens; Sk:
Saccoglossus kowalevskii; Cgi: Crassostrea gigas; Dr: Danio rerio; Rn: Rattus norvegicus; Cg: Cricetulus
griseus; Cl: Columba livia; Ss: Salmo salar; So: Sepia officinalis; Am: Ambystoma mexicanum; Cc: Cyprinus
carpio; Bt: Bos taurus; Cgat: Cryptococcus gattii; Cq: Culex quinquefasciatus; Tas: Trichosporon asahii; Op:
Ogataea polymorpha; Om: Osmerus mordax; Ggo: Gorilla gorilla; Pa: Pteropus alecto; Mf: Macaca fascicularis;
Od: Oikopleura dioica; Zn: Zootermopsis nevadensis; Cj: Callithrix jacchus; Md: Microplitis demolitor; Wi:
Wallemia ichthyophaga; Cmy: Chelonia mydas; He: Heliocidaris erythrogramma; Pv: Patella vulgata; Pl:
Paracentrotus lividus; Clo: Convolutriloba longifissura; Hm: Hymenolepis microstoma; Xt: Xenopus tropicalis;
Phc: Pediculus humanus corporis; Po: Paralichthys olivaceus; Csa: Ciona savignyi; Bm: Brugia malayi; Pdu:
Platynereis dumerilii; Lc: Lethenteron camtschaticum; Eb: Eptatretus burgeri; Pd: Papilio dardanus; On:
Ostrinia nubilalis; Ha: Haliotis asinina; Td: Thermobia domestica; Fc: Folsomia candida; Lf: Lithobius
forficatus; Gm: Glomeris marginata; Sr: Symsagittifera roscoffensis; Bmu: Bos mutus; Tch: Tupaia chinensis;
Rt: Rhodosporidium toruloides.
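
Taxon labels in the trees, such as TasLim, ScCup9 and RtYox1, combine a species code from the list above with a gene name. The sketch below shows one way such labels might be split back into species and gene. It assumes, as a working convention not stated in the text, that the gene name begins with an uppercase letter; the function name `split_label` is hypothetical and only a few codes are included.

```python
import re

# A small subset of the species codes listed above.
SPECIES_CODES = {
    "Sc": "Saccharomyces cerevisiae",
    "Tas": "Trichosporon asahii",
    "Rt": "Rhodosporidium toruloides",
    "Ec": "Encephalitozoon cuniculi",
}

# Species code: one uppercase letter plus 1-3 lowercase letters.
# Gene name (assumed): everything from the next uppercase letter onwards.
LABEL = re.compile(r"^([A-Z][a-z]{1,3})([A-Z].*)$")


def split_label(label):
    """Split a taxon label into (species name, gene name)."""
    match = LABEL.match(label)
    if match is None or match.group(1) not in SPECIES_CODES:
        raise ValueError(f"unrecognised taxon label: {label!r}")
    code, gene = match.groups()
    return SPECIES_CODES[code], gene


for label in ("TasLim", "ScCup9", "RtYox1", "EcHD-10"):
    print(label, "->", split_label(label))
```

Anchoring the split at the first uppercase letter after the code is what separates Tas (Trichosporon asahii) from Ta (Trichoplax adhaerens) in a label such as TasLim.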
2.4 Homology modelling for tertiary structure predictions
Secondary and tertiary structures of Dictyostelium homeodomains were predicted with JPred (Cole et al. 2008)
(http://www.compbio.dundee.ac.uk/www-jpred), provided by the Barton Group, University of Dundee, and with
the Phyre2 Protein Homology/Analogy Recognition Engine (Kelley and Sternberg 2009)
(http://www.sbg.Bio.ac.uk.phyre2), respectively. Structures were aligned using the aligner 3D algorithm in
STRAP (http://www.bioinformatics.org/strap/) and a tree was obtained with Archaeopteryx in the same program
using the NJ method. For presentation (figure 1), the alignment was modified by changing the positions of
the TALE class genes. For homology modelling, crystal structures of different homeodomains obtained from the
Protein Data Bank (PDB) were used as templates, and the quality of each model was judged from the
Ramachandran plot generated by the PROCHECK tool on the SAVES online resource
(http://nihserver.mbi.ucla.edu/SAVES/).
2.5 RNA detection by semi-quantitative RT-PCR
RNA was isolated from cells collected at specific stages of development as described by Gosain et al. (2012).
RT-PCR reactions were performed using specific primer sets, and ig7 (rnlA) was used as an internal control
(supplementary table 1). Three to five independent mRNA isolations were carried out for each stage and
processed further for RT-PCR. The average of three was plotted in the figures.
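
A common way to turn semi-quantitative RT-PCR band intensities into relative abundance is to divide each target value by the internal control from the same isolation and then average across isolations. The sketch below assumes that this is how the ig7 control was used, which the text does not spell out; the function name and all numbers are invented for illustration.

```python
from statistics import mean


def relative_abundance(target, control):
    """Average target/control band-intensity ratios for each stage.

    target and control map a stage label to a list of band intensities,
    one value per independent RNA isolation, in the same order.
    """
    averaged = {}
    for stage, target_values in target.items():
        control_values = control[stage]
        if len(target_values) != len(control_values):
            raise ValueError(f"replicate mismatch at stage {stage}")
        ratios = [t / c for t, c in zip(target_values, control_values)]
        averaged[stage] = mean(ratios)
    return averaged


# Invented intensities for two stages (FS: freshly starved, M: mound).
gene_bands = {"FS": [1.0, 2.0, 3.0], "M": [4.0, 4.0, 4.0]}
ig7_bands = {"FS": [2.0, 2.0, 2.0], "M": [2.0, 2.0, 2.0]}
print(relative_abundance(gene_bands, ig7_bands))
```

Normalising within each isolation before averaging keeps loading differences between RNA preparations from being mistaken for changes in expression.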
2.6 In silico analyses of putative promoter regions of homeobox gene classes
The 1000 bp upstream or intergenic region of each gene was searched for probable transcription factor
binding sites using the MEME (Multiple EM for Motif Elicitation) suite (http://meme.nbcr.net/meme/)
(Bailey and Elkan 1994) with the default criterion of zero or one motif occurrence per sequence. Significant
occurrences of these motifs were then identified with the MAST program (Bailey and Gribskov 1998) in a total
of 13,495 upstream sequences of D. discoideum. Identical motifs were removed and the selected motifs were
matched against the transcription factor database TRANSFAC (Wingender et al. 1996) using the PATCH1.0
program (http://www.gene-regulation.com/pub/programs.html). The first 57 promoter sequences in the
dictyBase promoter database that did not belong to homeobox genes were chosen as random promoter sequences
for the analyses.
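
Extracting a putative promoter of this kind can be sketched as below. The sketch rests on one reading of "1000 bp upstream or intergenic region": take up to 1000 bp upstream of the gene, but stop at the neighbouring gene if it lies closer. That reading, the 0-based half-open coordinates and the function names are all assumptions made for illustration.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def upstream_region(chromosome, start, end, strand, neighbour=None, length=1000):
    """Return up to `length` bp upstream of a gene, read 5' to 3'.

    start/end are 0-based, half-open gene coordinates on the forward strand.
    neighbour (optional) is the boundary of the adjacent gene on the upstream
    side; the region is cut there so that it stays intergenic.
    """
    if strand == "+":
        left = max(0, start - length)
        if neighbour is not None:
            left = max(left, neighbour)
        return chromosome[left:start]
    right = min(len(chromosome), end + length)
    if neighbour is not None:
        right = min(right, neighbour)
    return reverse_complement(chromosome[end:right])


toy = "AAAACCCCGGGGTTTT"  # toy sequence, not real genomic data
print(upstream_region(toy, 8, 12, "+", length=4))
print(upstream_region(toy, 4, 8, "-", length=4))
```

For a gene on the reverse strand the upstream sequence lies to the right of the gene on the forward strand, so it is reverse-complemented to keep every promoter in the same orientation before motif discovery.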
3. Results and discussion
3.1 Classification of the homeobox genes from D. discoideum
The purpose of comparing and classifying the homeobox genes is to determine evolutionary relationships
between different genes and to place them in a framework that reveals their structural and functional
relationships. A few homeobox genes, such as Hox, ParaHox and NK, are present in clusters that evolved by
cis-duplication events (for review, see Garcia-Fernàndez 2005); such clustering is one of the characteristic
features of Hox genes. The 39 Hox genes of humans are present in four paralogous clusters on four different
chromosomes, whereas in Drosophila these genes are present as a single cluster of eight genes. The relative
positions of the genes in their cluster and the positions of their introns are conserved (Roy et al. 2003).
BLAST search results confirmed 14 homeobox genes, whose sequences were collected from dictyBase (Chisholm
et al. 2006), but none of them could be Hox genes as we did not observe any
clustering in the genomic structure of D. discoideum (supplementary figure 2). These genes are present on 4
different chromosomes and show no uniformity in distribution: chromosomes 2 and 6 have 4 genes each, while
chromosomes 3 and 4 have 3 genes each. This is unlikely to be the result of a recent chromosomal duplication
as their positions on the chromosomes are not conserved. Most of the genes (10 out of 14) are interrupted by
introns (1 or 2 in number) whose positions are also not conserved. The homeoboxes of hbx11 and hbx7 are
interrupted, and these genes fall in different classes. The intron in warA, which is a homeotic gene,
interrupts the ankyrin repeats (Han and Firtel 1998).
Figure 1. Multiple structural alignment of the identified homeodomains of D. discoideum. The online aligner
3D software of the STRAP server was used to construct this figure as well as the consensus sequences. The
positions of the three helices and the consensus sequences on the aligned homeodomains are marked. The box
marks the characteristic three amino acids in the loop region of TALE class members.
The homeodomain has a highly conserved architecture of three helices, in which the first and second helices
are almost antiparallel to each other while the third helix is roughly perpendicular to the other two
(Kissinger et al. 1990). The second and third helices form a helix-turn-helix motif (Harrison and Aggarwal
1990). The predicted tertiary structure alignment of homeodomains from D. discoideum shows the residues that
form the three helices, the turn and the loop (figure 1). We observe a broad division into TALE class and
non-TALE. Within the non-TALE group there are subclades, which were later recognized as NOVEL class and
OTHER (figure 2). In the OTHER group, Hbx2 appeared as an outlier and came closer to the TALE group of
genes. A comparison of the structure-based tree with the sequence-based tree was also carried out, but the
clarity was found to be better in this case.
A typical homeodomain is 60 amino acids long, with few exceptions, whereas an atypical homeodomain is
characterized by the insertion of extra residues between helices 1 and 2 or between helices 2 and 3
(Mukherjee et al. 2009). The TALE class is very ancient, being present in a wide range of eukaryotic
lineages. TALE class homeodomains always have three extra residues, Proline-Tyrosine-Proline
(Alanine-Tyrosine-Proline in the TGIF family), giving a total of 63 amino acid residues (Bertolino et al.
1995). Another characteristic feature of the TALE class is that the loop is often followed by a
Serine/Threonine or an acidic residue (Bürglin 1997). Three NOVEL class proteins (Hbx6, Hbx8 and Hbx14)
contain a double homeodomain in which the two domains are separated by six residues, with the 1st and 4th
positions occupied by Tyrosine and Asparagine residues, respectively. The second homeodomain is 56–57 amino
acids long. For the members of OTHER (Hbx2, Hbx5-1, Hbx13, Hbx10 and WarA), the first helix starts from the
tenth residue, which is mostly Proline or an acidic residue.
In total, five TALE class and nine non-TALE class members were identified in D. discoideum. Both TALE and
non-TALE classes were already present in the ancestor of eukaryotes (Derelle et al. 2007). D. discoideum is
phylogenetically distant from well-studied metazoans such as Drosophila and humans, and its multicellularity
differs from that which arose in metazoans. We found nearly 36–48% sequence similarity between the
homeodomain proteins of D. discoideum and those of metazoans.
Figure 2. Tertiary structure-based Neighbor Joining phylogenetic tree. The 3D structures of the homeodomains
of all homeodomain-containing proteins were predicted, and a tree based on them was generated with
Archaeopteryx in the STRAP server. No root was selected for the tree. The broad division into the classes
TALE, NOVEL and OTHER, which is confirmed later, can also be seen here.
The classical way to classify these proteins is based on phylogenetic analyses of the homeodomain sequences,
their chromosomal location and domain composition (Holland et al. 2007). In D. discoideum no specific
co-domains were found except the ankyrin repeats in WarA and the Chromatin organization modifier domain
(CHROMO) in Hbx12 (supplementary figure 3). The ML tree thus obtained (figure 3) strengthens our earlier
classification as shown in figure 1. Three groups of proteins were identified in this organism, namely TALE
class, NOVEL class
and OTHER, according to the similarity they share with known homeodomain proteins from different organisms.
NOVEL class proteins are unique to D. discoideum, share no similarity with any known homeodomain-containing
proteins and thus form a separate clade. OTHER group members show the most ambiguous positions, as they do
not form a united class but rather behave as individual genes showing similarity (38–61%) to members of the
ZFH, PRD, LIM and TALE classes of animals, fungi and microsporidia. WarA shows similarity to the LIM class of
Encephalitozoon cuniculi; Hbx13 forms a clade with PRD class members; Hbx10 and Hbx5-1 show affinity to ZFH
class members, while Hbx2 shares similarity with TALE class members. Structure-based tree analysis
(figure 2) places most of them in a single clade with one stray member, Hbx2. It was therefore not feasible
to assign them to a particular class, and they have been categorized as unclassified and placed in the OTHER
group of homeodomain proteins according to the animal classification (Holland et al. 2007). TALE class
members show the most promising similarity to TALE class proteins, mostly from metazoans. They fall together
in a clade along with members of the MEIS, PKNOX, CUP and PBX families of the TALE class. dictyBase also
annotates hbx4 as an orthologue of the cup gene. The phylogenetic analysis of these proteins failed to
confidently assign the homeodomains of D. discoideum to any of the major classes of homeodomains from
metazoans and fungi, except for the TALE class.
To further improve the resolution, phylogenetic trees specific to TALE class and non-TALE members were
constructed, in which D. discoideum genes were resolved into families. We retrieved homeodomain sequences of
proteins that showed an E-value <e−05 in BLASTp searches through NCBI, where individual homeodomains of
D. discoideum were used as queries. Using this approach, we obtained good support for the position of our
sequences in the phylogenetic trees (figures 4, 5 and 6).
Figure 3. Maximum likelihood (ML) tree obtained with homeodomains of selected proteins showing homology to
all homeodomain-containing proteins from D. discoideum. The tree was built using SPR tree topology searches
and the JTT amino acid substitution model. The tree has been rooted with TasLim. Keys (●, ■, ♦) represent
TALE, NOVEL and OTHER proteins of D. discoideum, respectively.
As shown in figure 4, NOVEL class members form a clade but show similarity to members of the SINE class of
metazoans. SINE class (named after the Drosophila gene so: sine oculis) homeobox genes encode a distinct
typical homeodomain, upstream of which lies the highly conserved 115-amino-acid-long SIX domain. Both the SIX
domain and the homeodomain are required for DNA binding (Kawakami et al. 2000). This similarity is distant,
as NOVEL class homeobox proteins (Hbx6, Hbx7, Hbx8 and Hbx14) do not show sequence similarity to any known
homeodomain sequences. Irrespective of this, we have used these homeodomain sequences to construct the tree
simply to show that these proteins are unique to this organism. This is not very unusual, as every organism
has a few proteins that are unique to it and therefore do not belong to any known class, for example bix1 of
Xenopus and ahbx1 of Branchiostoma (homeodb).
NOVEL class proteins such as Hbx6, Hbx8 and Hbx14 have the distinct characteristic of double homeodomains
(supplementary figure 3). Hbx6 and Hbx14 have very similar homeodomain sequences: the first homeodomain of
Hbx6 is similar to the first homeodomain of Hbx14 (98%) and the second to the second (94%), but this
similarity was not observed between the first and second homeodomains (49%) of the same protein. Their
presence on different chromosomes suggests that they may have translocated and diverged from a common
ancestor. Hbx8 and Hbx7 are present on the same chromosome but on different strands, possibly as a result of
duplication and transposition. Both human and mouse have genes with double homeodomains belonging to the DUX
family of the PRD class (Holland et al. 2007; Zhong and Holland 2011). Most of these genes are pseudogenes
derived from retrotransposition of mRNA transcripts of functional DUX genes (Booth and Holland 2007). NOVEL
class genes of D. discoideum do not show significant similarity to any of them.
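
Pairwise percentages such as those quoted for the Hbx6 and Hbx14 homeodomains are typically computed over aligned positions. The sketch below calculates plain percent identity between two aligned sequences. The text does not state which measure or tool produced the reported values, so this is an illustrative assumption, and the toy sequences are not real homeodomains.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns, skipping columns with a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == "-" or residue_b == "-":
            continue
        compared += 1
        if residue_a == residue_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned sequences: nine of ten residues match.
print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))
```

A similarity score that also counts conservative substitutions would give values at least as high as identity, so the two measures should not be mixed when comparing domains.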
Figure 4. ML tree obtained with homeodomains of selected genes that matched NOVEL class D. discoideum
homeodomain proteins. The tree was constructed using SPR tree topology searches and the JTT amino acid
substitution model. The tree has been rooted with RtYox1. Key (●) represents D. discoideum proteins.
D. discoideum homeodomains of the OTHER group fall with LIM, ZFH, POU and PRD class members (figure 5). WarA
shows similarity with the LIM class of fungi and microsporidia (Encephalitozoon cuniculi). It was also
reported earlier that one homeodomain (EcHD-10) of Encephalitozoon cuniculi is related to the homeodomain of
WarA (Bürglin 2003). WarA and Hbx10 share similarity in their protein sequences (78%), suggesting that they
may have duplicated recently from a common ancestor. Surprisingly, Hbx10 shows the PRD pattern signature
sequence at positions 16–26, except at positions 23 and 25, where this protein has aspartic acid and
methionine instead of one of the residues AEFHKQRV and FHY, respectively (Fonseca et al. 2008). Hbx13 of
D. discoideum, which falls between Hbx10 and WarA, shows affinity with members of the PRD class. This protein
is highly divergent from the rest of the proteins represented here, as is clear from its branch length, and
such variability in evolutionary rates may sometimes generate artificial groupings in the eukaryotic
molecular tree because of long branch attraction (LBA) (Philippe et al. 2000). Hbx2 and Hbx5-1 show affinity
with members of the ZFH class and the NK family homeodomain, respectively.
Since our analyses failed to classify these homeodomain proteins of D. discoideum satisfactorily, we grouped
them under OTHER, taking note of the criteria used in the human homeodomain classification (Holland et al.
2007). We placed the NOVEL class members in a separate class and did not consider them as members of OTHER.
NOVEL class proteins bear no similarity to any known class of homeodomain proteins and are unique to
D. discoideum, while the OTHER group members show weak similarity to many known homeodomain proteins but
still do not belong to any one class.
Figure 5. ML tree obtained with homeodomains of selected genes that matched OTHER D. discoideum homeodomain
proteins. The tree was constructed using SPR tree topology searches and the JTT amino acid substitution
model. The tree has been rooted with TasLim. Key (●) represents D. discoideum proteins.
The TALE class has the following gene families: in animals, MEIS, PBX, TGIF, MKX, PKNOX/PREP and IRX/IRO
(Iroquois); in plants, KNOX and BEL; and in fungi, M-ATYP (Atypical Mating type gene) and CUP (Bürglin 1997).
D. discoideum TALE class members belong to the IRX, PBX and CUP (present only in fungi) family clades of
fungi and metazoans (figure 6). In this tree, Hbx4 is placed before the animals and along with the fungal CUP
family. This position is basal to the IRX, MEIS and PKNOX families of animals. PKNOX, found only in animals,
is not orthologous to the KNOX family of plants (Holland et al. 2007). It is quite possible that Hbx4 has
features of the MEINOX domain (the amalgam of the MEIS and KNOX domains) that was already present when
plants and animals diverged (Bürglin 1997). The MEIS domain also shows similarity to the PBX domain of
animals, indicating that the PBX domain is also derived from the MEINOX domain (Bürglin 1998). The TALE class
members Hbx3, Hbx11 and Hbx12 form a clade that belongs to the PBX class of metazoans. Hbx9 shows
well-supported similarity to IRX family proteins, but if we construct a tree without IRX then Hbx9 groups
with the PKNOX family. These proteins do not show significant resemblance to the MKX and TGIF families of
vertebrates.
Figure 6. ML tree obtained with homeodomains of selected genes that matched TALE class D. discoideum
homeodomain proteins. The tree was constructed using SPR tree topology searches and the JTT amino acid
substitution model. The tree has been rooted with ScCup9. Key (●) represents D. discoideum proteins.
3.2 Temporal expression pattern of homeobox genes
RNA expression patterns of the homeobox genes at specific stages of development of D. discoideum were
analysed to find whether a unique expression pattern exists for each class. Three to five independent mRNA
isolations were carried out for each gene and an average was taken to plot the graphs shown.
Figure 7 shows the temporal expression pattern of the TALE class members hbx3, hbx4, hbx9, hbx11 and hbx12.
For hbx3, hbx4 and hbx11 (figures 7A, B and D), we found a gradual increase in expression from freshly
starved cells to the slug stage, after which the levels were maintained. hbx9 (figure 7C) showed near-uniform
expression throughout development and is the only TALE class member that is expressed at higher levels
during the vegetative stage, suggesting its possible requirement during growth and proliferation. These
patterns are consistent with the RNA-seq expression data available on dictyExpress (Rot et al. 2009). The
hbx12 (figure 7E) transcript peaked at the mound stage, which differs from dictyExpress, where it shows a
maximum in freshly starved cells. It is already known that hbx4 plays a role in cell patterning and
cytokinesis as it regulates the cadA gene, which encodes CAD-1, a Ca2+-dependent cell adhesion molecule (Kim
et al. 2011). Hbx3 plays a role during phagocytosis (Sillo et al. 2008).
Figure 7. Temporal expression pattern of TALE class members during development of D. discoideum. Relative
abundance of the specific genes during development is shown: (A) hbx3, (B) hbx4, (C) hbx9, (D) hbx11, (E)
hbx12. [FS – Freshly starved, LA – Loose aggregates, M – Mound, SL – Slug, EC – Early culminant and FB –
Fruiting body. N=3.]
Figure 8 shows the expression patterns of the OTHER members hbx2, hbx5-1, hbx10, hbx13 and warA during
development. hbx2, hbx5-1 and hbx10 (figures 8A, B and C) show higher expression in freshly starved cells,
suggesting a role during early development. hbx5-1 shows a sharp decrease at the mound stage, which is not in
accordance with dictyExpress, where it decreases gradually. In dictyExpress, hbx13 (figure 8D) shows a dip at
the mound stage, whereas in this study its expression increased gradually until early culmination,
suggesting its involvement in differentiation. warA (figure 8E) expression increased from the vegetative to
the mound stage and showed a small decrease until culmination. warA is known to be responsible for prestalkO
proportioning in slugs by regulating the number of prestalkO cells differentiating from anterior-like cells
(Han and Firtel 1998).
Figure 9 shows the expression patterns of NOVEL class members, namely hbx6, hbx7, hbx8 and hbx14, having
double homeoboxes. The level of expression for all members of this class was comparatively low and no unique
pattern was observed. hbx6 (figure 9A) expression is more or less uniform throughout development, with a
small rise at the aggregation and slug stages. hbx7 (figure 9B) expression gradually decreased from starved
cells to the mound stage and then remained at basal levels throughout development. hbx8 (figure 9C) shows a
wave-like pattern with two peaks, one in freshly starved cells and the other during the slug and early
culmination stages, with a trough at the aggregation and mound stages. RNA-seq data from dictyExpress give a
different picture, in which hbx8 shows only one peak, at the loose aggregate stage. hbx14 (figure 9D)
expression is more or less the same throughout development, with a maximum during the aggregation stage, but
according to dictyExpress it shows an increase only at culmination. One could possibly say that the NOVEL
class genes show higher expression levels in freshly starved cells.
Figure 8. Temporal expression pattern of OTHER members during development of D. discoideum. Relative
abundance of the specific genes during development is shown: (A) hbx2, (B) hbx5, (C) hbx10, (D) hbx13, (E)
warA. [FS – Freshly starved, LA – Loose aggregates, M – Mound, SL – Slug, EC – Early culminant and FB –
Fruiting body. N=3.]
Expression analyses were carried out to check whether the individual members of each class followed a unique
pattern of expression, which could further help to speculate on their functional significance for this
organism. The results did not show the emergence of any such unique pattern for any class, suggesting that
each homeodomain-containing protein may have its own specific function. We could thus only speculate on
their probable functions during development of D. discoideum. Further experiments with each of them are
necessary to show their specific roles during growth and development.
3.3 Putative promoter analysis
Homeobox genes, like other genes, are regulated by various mechanisms, including chromatin remodeling, receptor-mediated regulation and regulation by other factors. Analysis of the upstream sequences of the identified homeobox genes from D. discoideum could shed light on their functional regulation and the pathways in which they are likely involved. Hence, we searched their putative promoter regions for conserved motifs or binding sites that could possibly regulate them. We found three significant motifs for each class or group of genes; their sequence logos are shown in figure 10.
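As a rough illustration of what a motif of this kind represents, the sketch below builds a position-specific scoring matrix from a handful of aligned sites and scans an upstream sequence with it. It is not the MEME algorithm, which fits a mixture model by expectation maximization (Bailey and Elkan 1994). The example sites, the upstream sequence, the uniform base background, the pseudocount and all function names are assumptions made for illustration only; a uniform background in particular is unlikely to suit real D. discoideum upstream regions.

```python
import math

BASES = "ACGT"

def build_pssm(sites, pseudocount=1.0, background=None):
    """Return one {base: log2-odds score} dict per motif column."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    pssm = []
    for col in range(width):
        # Start every base at the pseudocount so unseen bases get a finite score.
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[col]] += 1
        total = sum(counts.values())
        pssm.append({b: math.log2((counts[b] / total) / background[b]) for b in BASES})
    return pssm

def scan(sequence, pssm):
    """Score every window of the sequence; return (start, score) pairs."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pssm[i][base] for i, base in enumerate(window))
        hits.append((start, score))
    return hits

def best_hit(sequence, pssm):
    """Return the (start, score) pair with the highest score."""
    return max(scan(sequence, pssm), key=lambda hit: hit[1])

# Hypothetical aligned sites and upstream sequence (not data from the study).
example_sites = ["TGACAG", "TGACAA", "TGATAG", "TGACAG"]
example_upstream = "AATTTTGACAGTTTAAAT"
example_pssm = build_pssm(example_sites)
start, score = best_hit(example_upstream, example_pssm)
print(start, round(score, 2))
```

Counting the windows whose score exceeds a chosen threshold in each upstream sequence would give a per-gene occurrence count; the threshold would itself be an assumption and would need to be calibrated against background sequences.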
Conservation among promoters is necessary for regulation by transcription factors of conserved pathways. Repetitive occurrence of these motifs within a class and low presence outside that class can be a measure of their uniqueness; we therefore calculated the repetitive presence of these motifs within each class using the same program and compared the results for the homeobox genes with those for 57 randomly chosen genes and 9 conserved hsp70 genes from D. discoideum. TALE class motifs occurred on average 3.2 times per gene in TALE class putative promoters, NOVEL class motifs 9.5 times per gene and OTHER motifs 4 times per gene, compared with an average of 2.7 times for the randomly selected genes and 8.4 times per gene for the conserved hsp70 genes. We searched for the presence of these motifs in non-homeobox genes using the MAST program. TALE class motifs appeared in 43 genes, NOVEL class motifs in 12 genes and OTHER group motifs in 10 non-homeobox genes out of a total of 13,495 upstream sequences of D. discoideum. These results indicate that the NOVEL class upstream regions are more unique than those of the TALE class and OTHER group. This is also reflected in the NOVEL class proteins (figure 4), which appeared as a unique clade.
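The comparison above amounts to simple ratios. The sketch below recomputes them from the figures reported in this section: the fold difference of each class's average motif count per gene relative to the randomly chosen genes, and the fraction of the 13,495 upstream sequences in which MAST found each class's motifs. The variable and function names, and the choice of a plain ratio as the summary statistic, are assumptions made for illustration; no significance test is implied.

```python
# Average motif occurrences per gene, as reported in the text.
per_gene = {"TALE": 3.2, "NOVEL": 9.5, "OTHER": 4.0}
random_genes = 2.7   # average for the 57 randomly chosen genes
hsp70_genes = 8.4    # average for the 9 conserved hsp70 genes

# Non-homeobox genes whose upstream sequence contains the class motifs (MAST).
mast_hits = {"TALE": 43, "NOVEL": 12, "OTHER": 10}
total_upstream = 13495

def fold_over_background(value, background=random_genes):
    """Ratio of a per-gene motif count to the background per-gene count."""
    return value / background

def genome_fraction(hits, total=total_upstream):
    """Fraction of all upstream sequences that contain the motifs."""
    return hits / total

for cls in per_gene:
    fold = fold_over_background(per_gene[cls])
    frac = genome_fraction(mast_hits[cls])
    print(f"{cls}: {fold:.2f}x random, motifs in {frac:.3%} of upstream sequences")

print(f"hsp70: {fold_over_background(hsp70_genes):.2f}x random")
```

On these numbers the NOVEL class motifs recur about 3.5 times as often per gene as in the random set, close to the conserved hsp70 genes (about 3.1 times), while the TALE and OTHER motifs are nearer to background (about 1.2 and 1.5 times).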
The resulting motifs were further used to find known transcription factor binding sites using the TRANSFAC database (Wingender et al. 1996) (binding sites and transcription factors are shown in supplementary figure 3A, B and C). Most of the factors were DNA-binding proteins, for example homeobox proteins, retinoic acid receptors, MADS box proteins, CACCC box binding factors and PuF (c-myc transcription factor). Most of the binding factors identified are not present in D. discoideum, but a few have orthologs or similar proteins. Sequence features of the promoters to which these factors bind could suggest the pathways or functions of these genes.
Figure 9. Temporal expression pattern of NOVEL class members during development of D. discoideum. Relative abundance of the specific genes during development is shown: (A) hbx6, (B) hbx7, (C) hbx8, (D) hbx14. [FS – Freshly starved, LA – Loose aggregates, M – Mound, SL – Slug, EC – Early culminant and FB – Fruiting body. N=3.]
NOVEL class homeobox genes may play a role in cellular differentiation, as most of them are expressed at the initial and late developmental stages, when prespore and prestalk fate determination and terminal differentiation occur. Moreover, their promoter analysis showed, among many other factors, a binding site for Mef2 (myocyte enhancer factor 2), a MADS box transcription factor. Mef2A, the Dictyostelium homologue of animal Mef2, plays a role in cellular differentiation by modulating prestalk/prespore gene expression (Galardi-Castilla et al. 2013).
OTHER group promoters have PuF and HSF (heat shock factor) binding sites. PuF has been identified as the nm23-H2 nucleoside diphosphate kinase (Postel et al. 1993). These proteins have a role in cellular growth and differentiation through TGF-β1 signaling (Hsu et al. 1994; Gervasi et al. 1996). D. discoideum has an NDPK (nucleoside diphosphate kinase) protein that might bind to these promoter sites. D. discoideum also has an HSF gene, an ortholog of Homo sapiens HSF1 and Lycopersicon HSF24. HSF has a known role in cell growth in unstressed fission yeast (Gallo et al. 1993), which could also be the case in D. discoideum.
TALE class promoters revealed binding sites for Wnt pathway transcription factors such as TCF1, LEF1 and SRY. These factors act downstream of β-catenin and GSK3. D. discoideum has orthologs of mammalian β-catenin and GSK3, Aar (Aardvark) and gskA respectively. These orthologs have roles in cell-cell adhesion and cell-type proportioning in the multicellular phase, under the influence of cAMP (Harwood et al. 1995; Grimson et al. 2000; Coates et al. 2002). This indicates that these attributes may be realized through TALE class genes via TCF1/LEF1 transcription factors.
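A binding-site lookup of this kind can be pictured as matching a degenerate consensus against both strands of an upstream sequence. The sketch below is an illustration under stated assumptions: it uses the consensus CTTTGWW, which is commonly cited for TCF/LEF factors but is not taken from this study, on a made-up sequence. TRANSFAC itself works from curated binding-site matrices rather than a single consensus string, and the function names here are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def consensus_to_regex(consensus):
    # A lookahead group lets overlapping sites all be reported.
    return re.compile("(?=(" + "".join(IUPAC[c] for c in consensus) + "))")

def find_sites(sequence, consensus):
    """Return sorted (start, strand, site) tuples; start is 0-based on the forward strand."""
    pattern = consensus_to_regex(consensus)
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(sequence)]
    rc = reverse_complement(sequence)
    for m in pattern.finditer(rc):
        site = m.group(1)
        # Convert the reverse-strand coordinate back to the forward strand.
        fwd_start = len(sequence) - m.start() - len(site)
        hits.append((fwd_start, "-", site))
    return sorted(hits)

# Assumed consensus and hypothetical upstream sequence (illustration only).
tcf_lef_consensus = "CTTTGWW"
upstream = "AAATCTTTGATTTTTTCAAAGGAAA"
for start, strand, site in find_sites(upstream, tcf_lef_consensus):
    print(start, strand, site)
```

A consensus match alone says nothing about whether a corresponding factor exists in D. discoideum, so hits of this kind are candidates to be checked against the genome annotation.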
Clearly these motifs are conserved, and some of them contain binding sites for homeobox proteins themselves, which is indicative of cross-regulation and auto-regulation. We found a few homeodomain binding sites in the promoters of these homeobox genes, but the binding factors identified in our analyses are probably not present in D. discoideum. We therefore do not discuss these aspects further in the present study.
Figure 10. Sequence logos of conserved motifs found in the putative promoter regions of different homeobox gene classes in D. discoideum. Motifs were searched using the MEME suite with position-specific scoring matrices: (A) TALE class motifs, (B) NOVEL class motifs, (C) OTHER motifs.
4. Conclusions
The 14 homeobox genes identified in D. discoideum were classified, of which 5 belong to the TALE (atypical) class and 9 to the non-TALE (typical) group. The non-TALE group could be further divided into the NOVEL class and OTHER, according to structure and sequence similarity. We did not find any Hox proteins in D. discoideum. TALE class proteins received a well-supported position in a clade with the PBC, IRX and CUP families of the metazoan TALE class. This may be because TALE class members are highly conserved and primitive in nature. The NOVEL class appears to be unique to D. discoideum, as its members do not show similarity to any known homeodomains. We think this is because these genes evolved after Dictyostelium had diverged from the animal and fungal lineages. OTHER genes show similarity to LIM, ZFH and PRD class members, but since these genes did not appear as a united class in the constructed phylogenetic tree, we were unable to classify them further. These genes have sequence features that match signature sequences of PRD class genes. We thus speculate that they are direct descendants of a common ancestor of this gene class. D. discoideum is a primitive eukaryote and holds a position before fungi and metazoans in the evolutionary tree of life.
No unique RNA expression pattern during development of D. discoideum emerged for any of the classes. To date, only a few homeobox genes have been functionally characterized in this organism. As in many multigene families, many homeobox genes may have overlapping functions. Putative promoter analysis revealed that these genes have binding sites for many probable protein factors upstream of the transcription start site. Studying those features could suggest pathways that function with or regulate the homeobox proteins in D. discoideum.
Acknowledgements
SS acknowledges the DST PURSE grant to her, and HM
acknowledges UGC for fellowship.
References
Bailey TL and Elkan C 1994 Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc. Int.
Conf. Intell. Syst. Mol. Biol. 2 28–36
Bailey TL and Gribskov M 1998 Combining evidence using p-values: application to sequence homology searches. Bioinformatics 14 48–54
Baldauf SL and Palmer JD 1993 Animals and fungi are each other’s
closest relatives: congruent evidence from multiple proteins.
Proc. Natl. Acad. Sci. USA 90 11558–11562
Bapteste E, Brinkmann H, Lee JA, Moore DV, Sensen CW, Gordon
P, Duruflé L, Gaasterland T, et al. 2002 The analysis of 100
genes supports the grouping of three highly divergent amoebae:
Dictyostelium, Entamoeba and Mastigamoeba. Proc. Natl.
Acad. Sci. USA 99 1414–1419
Bertolino E, Reimund B, Wildt-Perinic D and Clerc RG 1995 A
novel homeobox protein which recognizes a TGT core and
functionally interferes with a retinoid-responsive motif. J. Biol.
Chem. 270 31178–31188
Booth HAF and Holland PWH 2007 Annotation nomenclature and
evolution of four novel homeobox genes expressed in the human
germ line. Gene 387 7–14
Bürglin TR 1997 Analysis of TALE superclass homeobox genes
(MEIS PBC KNOX Iroquois TGIF) reveals a novel domain
conserved between plants and animals. Nucleic Acids Res. 25
4173–4180
Bürglin TR 1998 The PBC domain contains a MEINOX domain:
coevolution of Hox and TALE homeobox genes? Dev. Genes
Evol. 208 113–116
Bürglin TR 2003 The homeobox genes of Encephalitozoon cuniculi
(Microsporidia) reveal a putative mating-type locus. Dev. Genes
Evol. 213 50–52
Chisholm RL, Gaudet P, Just EM, Pilcher KE, Fey P, Merchant
SN and Kibbe WA 2006 dictyBase the model organism
database for Dictyostelium discoideum. Nucleic Acids Res.
34 D423–D427
Coates JC, Grimson MJ, Williams RSB, Bergman W, Blanton RL
and Harwood AJ 2002 Loss of the beta-catenin homologue
aardvark causes ectopic stalk formation in Dictyostelium. Mech.
Dev. 116 117–127
Cole C, Barber JD and Barton GJ 2008 The Jpred 3 secondary
structure prediction server. Nucleic Acids Res. 36 W197–W201
Dereeper A, Guignon V, Blanc G, Audic S, Buffet S, Chevenet F,
Dufayard JF, Guindon S, et al. 2008 Phylogeny.fr: robust
phylogenetic analysis for the non-specialist. Nucleic Acids Res.
36 W465–W469
Derelle R, Lopez P, Le Guyader H and Manuel M 2007
Homeodomain proteins belong to the ancestral molecular toolkit
of eukaryotes. Evol. Dev. 9 212–219
Fonseca NA, Vieira CP, Holland PWH and Vieira J 2008 Protein
evolution of ANTP and PRD homeobox genes. BMC Evol. Biol.
8 200
Galardi-Castilla M, Fernandez-Aguado I, Suarez T and Sastre L
2013 Mef2A a homologue of animal Mef2 transcription factors
regulates cell differentiation in Dictyostelium discoideum. BMC
Dev. Biol. 13 12
Gallo GJ, Prentice H and Kingston RE 1993 Heat shock factor is
required for growth at normal temperatures in the fission yeast
Schizosaccharomyces pombe. Mol. Cell. Biol. 13 749–761
Garcia-Fernàndez J 2005 The genesis and evolution of homeobox
gene clusters. Nat. Rev. Genet. 6 881–892
Gehring WJ 1994 A history of the homeobox; in Guidebook to the
homeobox genes (ed) D Duboule (New York: Oxford Univ
Press)
Gervasi F, D’Agnano I, Vossio S, Zupi G, Sacchi A and Lombardi
D 1996 nm23 influences proliferation and differentiation of
PC12 cells in response to nerve growth factor. Cell Growth
Differ. 7 1689–1695
Gosain A, Lohia R, Shrivastava A and Saran S 2012 Identification
and characterization of peptide: N-glycanase from Dictyostelium
discoideum. BMC Biochem. 13 9
Grimson MJ, Coates JC, Reynolds JP, Shipman M, Blanton RL and
Harwood AJ 2000 Adherens junctions and beta-catenin-mediated cell signalling in a non-metazoan organism. Nature
408 727–731
Guindon S and Gascuel O 2003 A simple fast and accurate algorithm to estimate large phylogenies by maximum likelihood.
Syst. Biol. 52 696–704
Guindon S, Lethiec F, Duroux P and Gascuel O 2005 PHYML
Online–a web server for fast maximum likelihood-based phylogenetic inference. Nucleic Acids Res. 33 W557–W559
Han Z and Firtel RA 1998 The homeobox-containing gene Wariai
regulates anterior-posterior patterning and cell-type homeostasis
in Dictyostelium. Development 125 313–325
Harrison SC and Aggarwal AK 1990 DNA recognition by proteins
with the helix-turn-helix motif. Annu. Rev. Biochem. 59 933–969
Harwood AJ, Plyte SE, Woodgett J, Strutt H and Kay RR 1995
Glycogen synthase kinase 3 regulates cell fate in Dictyostelium.
Cell 80 139–148
Holland PWH, Booth HAF and Bruford EA 2007 Classification and
nomenclature of all human homeobox genes. BMC Biol. 5 47
Hsu S, Huang F, Wang L, Banerjee S, Winawer S and Friedman E 1994
The role of nm23 in transforming growth factor beta 1-mediated
adherence and growth arrest. Cell Growth Differ. 5 909–917
Iyer LM, Anantharaman V, Wolf MY and Aravind L 2008 Comparative genomics of transcription factors and chromatin proteins in
parasitic protists and other eukaryotes. Int. J. Parasitol. 38 1–31
Kawakami K, Sato S, Ozaki H and Ikeda K 2000 Six family genes–
structure and function as transcription factors and their roles in
development. Bioessays 22 616–626
Kelley LA and Sternberg MJE 2009 Protein structure prediction on the
Web: a case study using the Phyre server. Nat. Protoc. 4 363–371
Kim JS, Seo JH, Yim HS and Kang SO 2011 Homeoprotein Hbx4
represses the expression of the adhesion molecule DdCAD-1
governing cytokinesis and development. FEBS Lett. 585 1864–1872
Kissinger CR, Liu BS, Martin-Blanco E, Kornberg TB and Pabo
CO 1990 Crystal structure of an engrailed homeodomain-DNA
complex at 2.8 Å resolution: a framework for understanding
homeodomain-DNA interactions. Cell 63 579–590
Kuma K, Nikoh N, Iwabe N and Miyata T 1995 Phylogenetic
position of Dictyostelium inferred from multiple protein data
sets. J. Mol. Evol. 41 238–246
Letunic I, Doerks T and Bork P 2011 SMART 7: recent updates to
the protein domain annotation resource. Nucleic Acids Res. 40
D302–D305
Mukherjee K, Brocchieri L and Bürglin TR 2009 A comprehensive
classification and evolutionary analysis of plant homeobox
genes. Mol. Biol. Evol. 26 2775–2794
Philippe H, Germot A and Moreira D 2000 The new phylogeny of
eukaryotes. Curr. Opin. Genet. Dev. 10 596–601
Postel EH, Berberich SJ, Flint SJ and Ferrone CA 1993 Human c-myc transcription factor PuF identified as nm23-H2 nucleoside
diphosphate kinase, a candidate suppressor of tumor metastasis.
Science 261 478–480
Rot G, Parikh A, Curk T, Kuspa A, Shaulsky G and Zupan B 2009
dictyExpress: a Dictyostelium discoideum gene expression database with an explorative data analysis web-based interface. BMC
Bioinformatics 10 265
Roy SW, Fedorov A and Gilbert W 2003 Large-scale comparison
of intron positions in mammalian genes shows intron loss but no
gain. Proc. Natl. Acad. Sci. USA 100 7158–7162
Ryan JF, Burton PM, Mazza ME, Kwong GK, Mullikin JC and
Finnerty JR 2006 The cnidarian-bilaterian ancestor possessed at
least 56 homeoboxes: evidence from the starlet sea anemone
Nematostella vectensis. Genome Biol. 7 R64
Sillo A, Bloomfield G, Balest A, Balbo A, Pergolizzi B, Peracino
B, Skelton J, Ivens A, et al. 2008 Genome-wide transcriptional
changes induced by phagocytosis or growth on bacteria in
Dictyostelium. BMC Genomics 9 291
Steenkamp ET, Wright J and Baldauf SL 2006 The protistan origins
of animals and fungi. Mol. Biol. Evol. 23 93–106
Takatori N, Butts T, Candiani S, Pestarino M, Ferrier DEK,
Saiga H and Holland PWH 2008 Comprehensive survey and
classification of homeobox genes in the genome of
amphioxus Branchiostoma floridae. Dev. Genes Evol. 218
579–590
Tamura K, Stecher G, Peterson D, Filipski A and Kumar S 2013
MEGA6: molecular evolutionary genetics analysis version 6.0.
Mol. Biol. Evol. 30 2725–2729
Wingender E, Dietze P, Karas H and Knüppel R 1996 TRANSFAC: a database on transcription factors and their DNA binding sites. Nucleic Acids Res. 24 238–241
Zhong Y and Holland PWH 2011 The dynamics of vertebrate
homeobox gene evolution: gain and loss of genes in mouse
and human lineages. BMC Evol. Biol. 11 169
MS received 02 December 2014; accepted 12 March 2015
Corresponding editor: DURGADAS P KASBEKAR