Genome-Wide Classification and Evolutionary Analysis
of the bHLH Family of Transcription Factors in
Arabidopsis, Poplar, Rice, Moss, and Algae1[W]
Lorenzo Carretero-Paulet*, Anahit Galstyan, Irma Roig-Villanova2, Jaime F. Martínez-García,
Jose R. Bilbao-Castro, and David L. Robertson
Department of Applied Biology (Area of Genetics), University of Almería, 04120 Almería, Spain (L.C.-P.,
J.R.B.-C.); Faculty of Life Sciences, University of Manchester, Manchester M13 9PT, United Kingdom (L.C.-P.,
D.L.R.); Department of Plant Molecular Genetics, Centre for Research in Agricultural Genomics, Consejo
Superior de Investigaciones Científicas-Institut de Recerca i Tecnologia Agroalimentàries-Universitat
Autònoma de Barcelona, 08028 Barcelona, Spain (A.G., I.R.-V., J.F.M.-G.); Institució Catalana de Recerca i
Estudis Avançats, 08010 Barcelona, Spain (J.F.M.-G.); and Biocomputing Unit, National Centre of
Biotechnology, Universidad Autónoma de Madrid, 28049 Madrid, Spain (J.R.B.-C.)
Basic helix-loop-helix proteins (bHLHs) are found throughout the three eukaryotic kingdoms and constitute one of the largest
families of transcription factors. A growing number of bHLH proteins have been functionally characterized in plants.
However, some of these have not been previously classified. We present here an updated and comprehensive classification of
the bHLHs encoded by the whole sequenced genomes of Arabidopsis (Arabidopsis thaliana), Populus trichocarpa, Oryza sativa,
Physcomitrella patens, and five algae species. We define a plant bHLH consensus motif, which allowed the identification of novel
highly diverged atypical bHLHs. Using yeast two-hybrid assays, we confirm that (1) a highly diverged bHLH has retained
protein interaction activity and (2) the two most conserved positions in the consensus play an essential role in dimerization.
Phylogenetic analysis permitted classification of the 638 bHLH genes identified into 32 subfamilies. Evolutionary and
functional relationships within subfamilies are supported by intron patterns, predicted DNA-binding motifs, and the
architecture of conserved protein motifs. Our analyses reveal the origin and evolutionary diversification of plant bHLHs
through differential expansions, domain shuffling, and extensive sequence divergence. At the functional level, this would
translate into different subfamilies evolving specific DNA-binding and protein interaction activities as well as differential
transcriptional regulatory roles. Our results suggest a role for bHLH proteins in generating plant phenotypic diversity and
provide a solid framework for further investigations into their role in the transcriptional regulation of key growth
and developmental processes.
Most biological processes in a eukaryotic cell or organism are finely controlled at the transcriptional level by transcription factors. Transcription factors usually contain two different functional domains involved in DNA binding and protein dimerization, activities that may be regulated by several mechanisms, including differential dimer formation (Riechmann et al., 2000; Amoutzias et al., 2007). In addition, transcription factors are usually encoded by multigene families, multiplying the number and complexity of possible transcriptional regulatory roles (Riechmann et al., 2000).
1 This work was supported by the Generalitat de Catalunya (Xarxa de Referència en Biotecnologia and Grup de Recerca Consolidat) and the Spanish Ministry of Science and Innovation (MICINN)-Fondo Europeo de Desarrollo Regional (grant no. BIO2008–00169 to J.F.M.-G.), by the Spanish Ministry of Education and Science (MEC) and the European Social Fund (Juan de la Cierva program grant to L.C.-P. and J.R.B.-C.), and by the Spanish MEC (Formación Profesorado Universitario program) and MICINN (Formación Personal Investigador program; predoctoral fellowships to A.G. and I.R.-V., respectively).
2 Present address: Dipartimento di Scienze Biomolecolari e Biotecnologie, Università degli Studi di Milano, Via Celoria 26, 20133 Milan, Italy.
* Corresponding author; e-mail [email protected].
The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantphysiol.org) is: Lorenzo Carretero-Paulet ([email protected]).
[W] The online version of this article contains Web-only data.
www.plantphysiol.org/cgi/doi/10.1104/pp.110.153593
Basic helix-loop-helix proteins (bHLHs) are widely distributed in all three eukaryotic kingdoms and constitute one of the largest families of transcription factors (Riechmann et al., 2000; Ledent and Vervoort, 2001). bHLHs are key regulatory components of transcriptional networks controlling a number of biological processes. In unicellular eukaryotes, such as yeast, bHLH proteins are involved in chromosome segregation, general transcriptional enhancement, and metabolic regulation (Robinson and Lopes, 2000). In animals, bHLHs have been implicated in sensing environmental signals, in regulating the cell cycle and circadian rhythms, and in the regulation of diverse essential developmental processes, including neurogenesis, myogenesis, sex and cell lineage determination, proliferation, and differentiation (Atchley and Fitch, 1997; Ledent and Vervoort, 2001; Amoutzias et al., 2004; Stevens et al., 2008). The R gene product Lc was the first plant protein reported to possess a bHLH domain and is involved in the control of flavonoid/anthocyanin biosynthesis in maize (Zea mays; Ludwig et al., 1989). The R gene belongs to a small subfamily comprising three additional genes (R, B, and Sn) for which the corresponding orthologs have been reported in Arabidopsis (Arabidopsis thaliana; AtTT8) and rice (Oryza sativa; OsRa-c; Hu et al., 1996; Nesi et al., 2000).
Plant Physiology®, July 2010, Vol. 153, pp. 1398–1412, www.plantphysiol.org © 2010 American Society of Plant Biologists
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2010 American Society of Plant Biologists. All rights reserved.
The number of characterized plant bHLHs has increased in recent years, revealing the wide and diverse
array of biological processes in which they are involved. They have been reported to function in light
signaling (Ni et al., 1998; Halliday et al., 1999; Fairchild
et al., 2000; Huq and Quail, 2002; Khanna et al., 2004;
Oh et al., 2004; Hyun and Lee, 2006; Roig-Villanova et al.,
2007; Leivar et al., 2008), hormone signaling (Abe et al.,
1997; Friedrichsen et al., 2002; Yin et al., 2005; Lee
et al., 2006), wound and drought stress responses (de
Pater et al., 1997; Smolen et al., 2002; Chinnusamy et al.,
2003; Kiribuchi et al., 2004), symbiotic ammonium transport (Kaiser et al., 1998), shoot branching (Komatsu
et al., 2001), fruit and flower development (Rajani and
Sundaresan, 2001; Liljegren et al., 2004; Szecsi et al., 2006;
Zhang et al., 2006; Gremski et al., 2007), and microspore
(Sorensen et al., 2003), trichome (Payne et al., 2000;
Morohashi et al., 2007), stomata (Pillitteri et al., 2007;
Kanaoka et al., 2008), and root (Menand et al., 2007;
Ohashi-Ito and Bergmann, 2007) development.
These proteins are defined by the bHLH signature
domain (Ferre-D’Amare et al., 1993), which is composed of approximately 60 amino acids arranged
according to the typical bifunctional structure. The
basic region, an N-terminal stretch of approximately
15 to 20 residues typically rich in basic amino acids, is
involved in DNA binding. Certain conserved amino
acids in the basic region determine recognition of
the so-called core E-box hexanucleotide consensus
sequence 5′-CANNTG-3′, whereas other residues
provide specificity for a given type of E-box
(e.g. the G-box [5′-CACGTG-3′]). In addition, flanking nucleotides outside the core have also been shown
to play a role in binding specificity (Shimizu et al.,
1997; Atchley et al., 1999; Martinez-Garcia et al., 2000;
Massari and Murre, 2000). The HLH region is composed of two amphipathic α-helices, mainly consisting
of hydrophobic residues, linked by a loop region that is
more divergent in both length and primary sequence.
The HLH domain promotes protein-protein interaction, allowing the formation of homodimeric or heterodimeric complexes (Massari and Murre, 2000).
Cocrystal structures have shown that the HLH regions
of two bHLH proteins interact and that each partner
binds half of the DNA recognition sequence (Ma et al.,
1994; Shimizu et al., 1997).
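As an illustration of the two levels of recognition described above, the following minimal sketch scans one DNA strand for the core E-box and flags G-box sites. `find_eboxes` is a hypothetical helper written for this example, not a tool used in the study, and it deliberately ignores the flanking nucleotides that also contribute to binding specificity. Because CANNTG is its own reverse complement, scanning a single strand is sufficient.

```python
from __future__ import annotations

import re

# Core E-box 5'-CANNTG-3'. The lookahead lets overlapping sites be reported.
# CANNTG is its own reverse complement, so one strand covers both strands.
E_BOX = re.compile(r"(?=(CA[ACGT]{2}TG))")
G_BOX = "CACGTG"  # the G-box is one specific E-box


def find_eboxes(seq: str) -> list[tuple[int, str, bool]]:
    """Return (0-based start, hexamer, is_g_box) for every core E-box in seq."""
    hits = []
    for match in E_BOX.finditer(seq.upper()):
        site = match.group(1)
        hits.append((match.start(), site, site == G_BOX))
    return hits


if __name__ == "__main__":
    print(find_eboxes("ttCACGTGaaCAGCTGcc"))
```

For the toy sequence above, the scan reports a G-box at offset 2 and a non-G E-box (CAGCTG) at offset 10.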
Outside the bHLH domain, bHLH proteins usually
exhibit low, if any, sequence conservation. However,
groups of evolutionary and/or functionally related
bHLH proteins may share additional motifs. Some of
these have been characterized in animals as determinants of
specificity in DNA-binding sequence recognition and
dimerization, as mediators of the activation or repression of target genes, or as sites for binding
small molecules (e.g. dioxin; Ledent and Vervoort,
2001). One example is provided by the highly conserved Leu zipper (ZIP) motif, characterized by heptad
repeats of Leu residues adjacent to the second helix of
the bHLH domain and predicted to adopt a coiled-coil
structure that permits dimerization between proteins
(Lupas, 1996). Other domains commonly found in
animal bHLH proteins are the PAS domain, the Orange domain, the WRPW motif, and the COE domain
(Ledent and Vervoort, 2001; Stevens et al., 2008).
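The heptad periodicity of the ZIP motif can be illustrated with a deliberately naive check for Leu residues spaced exactly seven positions apart. This is a hypothetical sketch (`leucine_zipper_starts` and `min_leus` are illustrative names, and four Leu residues is an arbitrary default); actual coiled-coil prediction scores whole heptads rather than Leu spacing alone.

```python
from __future__ import annotations


def leucine_zipper_starts(seq: str, min_leus: int = 4) -> list[int]:
    """Return 0-based offsets i with Leu at i, i+7, ..., i+7*(min_leus-1).

    Naive illustration only: it checks Leu spacing, not whole heptads.
    """
    seq = seq.upper()
    span = 7 * (min_leus - 1)  # distance from the first to the last Leu
    return [
        i
        for i in range(len(seq) - span)
        if all(seq[i + 7 * k] == "L" for k in range(min_leus))
    ]
```

For example, `leucine_zipper_starts("MK" + "LAAAAAA" * 4)` returns `[2]`, the offset of the first Leu of the toy repeat.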
Previous classifications of animal bHLHs have
led to the definition of six major functional and evolutionary lineages (groups A–F; Atchley and Fitch,
1997; Ledent and Vervoort, 2001) that can be further
subdivided into smaller orthologous subfamilies
(Simionato et al., 2007). Most bHLH proteins are
classified as group A or B and are expected to bind
the core E-box consensus sequences. Group B includes
members specifically displaying a G-box-binding motif configuration and proteins that share a ZIP domain
at the COOH-terminal end of the protein or that
contain the Orange domain. Group C bHLH proteins
share a pair of PAS domains and bind non E-box
sequences. Group E includes bHLH proteins that
contain a conserved Pro or Gly residue at a key
position within the basic region, preferentially bind to
sequences referred to as N-boxes, and share an additional WRPW motif. Groups D and F represent proteins particularly diverged at the basic region. Some
group D proteins, described as unable to bind DNA,
might form heterodimers that function as dominant-negative regulators of the DNA-binding activity of otherwise DNA-binding bHLHs (Fairman et al., 1993).
Group F includes the so-called COE proteins, which
share the COE domain. It has been suggested that
the ancestral bHLH sequence was a group B protein
present in early eukaryote evolution, from which
bHLHs from different lineages evolved independently
(Ledent and Vervoort, 2001; Heim et al., 2003).
Previous classifications of the family of bHLH proteins encoded by the Arabidopsis and rice genomes
(Heim et al., 2003; Toledo-Ortiz et al., 2003; Li et al.,
2006b) are essentially based on a bHLH consensus
motif constructed from the alignment of 392 sequences
mostly from groups A and B of animal DNA-binding
bHLHs (Atchley et al., 1999). The consensus was
expected to identify bHLH domain-containing proteins with a high degree of accuracy. However, highly
diverged bHLH proteins are poorly predicted from the
consensus (Atchley et al., 1999), and recent studies
have identified and characterized novel atypical
bHLHs in Arabidopsis (Fairchild et al., 2000; Hyun
and Lee, 2006; Lee et al., 2006; Roig-Villanova et al.,
2007). They were particularly diverged at the basic
region and usually lacked sequence features characterized as critical for proper DNA binding (Massari
and Murre, 2000).
Functional diversification in gene families encoding
transcription factors is emerging as a major source of
morphological and physiological diversity underlying
evolution (Doebley and Lukens, 1998; Riechmann
et al., 2000; Tsiantis and Hay, 2003; Kellogg, 2004).
We present here a comprehensive classification together with a structural and evolutionary analysis of
the plant bHLH gene family. This analysis was performed at a genome-wide level across distantly related
land plant evolutionary lineages, including three
angiosperms, Arabidopsis (eudicot-eurosids II), Populus trichocarpa (poplar; eudicot-eurosids I), and rice
(monocot), as well as Physcomitrella patens (moss;
bryophyte; Arabidopsis Genome Initiative, 2000; International Rice Genome Sequencing Project, 2005;
Tuskan et al., 2006; Rensing et al., 2008). In terms of
evolution, moss can be considered as a basal species
for land plants and, therefore, might enable inference of
the ancestral state of the land plant bHLH family
(Kenrick and Crane, 1997; Karol et al., 2001). Furthermore, to gain a broader perspective on the early
evolutionary history of the plant bHLH family, we
also searched for bHLH genes in five algal species,
including four green algae species (Volvox carteri, Chlamydomonas reinhardtii, Ostreococcus tauri, Ostreococcus
lucimarinus), which diverged from the land plants over
1 billion years ago, and the primitive red alga Cyanidioschyzon merolae (Matsuzaki et al., 2004; Merchant
et al., 2007; Palenik et al., 2007). This is a first step
toward further investigations into the biological and
molecular functions of novel bHLH transcription factors as well as into their role in plant evolutionary
diversification.
RESULTS
Identification and Classification of Arabidopsis, Poplar,
Rice, Moss, and Algae bHLH Gene Families
Previous surveys of Arabidopsis and rice bHLH
gene families had identified 162 and 167 members,
respectively (Bailey et al., 2003; Li et al., 2006b). All
but seven of these sequences encoded proteins
annotated as matching the INTERPRO 001092 domain, corresponding to the dimerization region of the
bHLH domain. To define the bHLH gene families
from poplar, moss, V. carteri, C. reinhardtii, O. tauri,
O. lucimarinus, and C. merolae, we searched through the
corresponding whole sequenced genomes for genes
encoding proteins containing the INTERPRO 001092
domain. The resulting sequences were named following the generic system proposed for Arabidopsis
(Heim et al., 2003), omitting “bHLH” from the name. Names
are composed of a number, corresponding to the relative position resulting from searches for the bHLH
domain, followed by the most common name as retrieved from the literature. Correspondences of sequence
names with gene and protein identifiers from the
corresponding genome browsers are shown in Table I
and Supplemental Table S2.
In recent years, novel atypical Arabidopsis bHLH
proteins, most of them not identified as such in
previous surveys, have been reported: At163KDR,
At164PRE5, At165PAR1, and At166PAR2 (Hyun and
Lee, 2006; Lee et al., 2006; Roig-Villanova et al., 2007).
Another group of putative novel bHLH sequences
was identified in microarray analysis as downregulated in plants constitutively overexpressing At165PAR1, designated in this work as P1R1 (for PAR1 RESPONSIVE1), P1R2, and P1R3 (corresponding to At167P1R1, At159P1R2, and At168P1R3, respectively). Of these, only At159P1R2 had previously been classified as a member of the bHLH family. With the aim of identifying additional putative homologs of these novel bHLH proteins in different plant species, we implemented a combined BLAST and HMM (hidden Markov model)-based search strategy.

Table I. Summary of novel atypical sequences accepted and discarded as bHLHs
Protein sequences newly identified in this study putatively corresponding to bHLHs are in boldface. Sequences were accepted or discarded as bHLHs according to their fit to the animal and the plant bHLH consensus used as predictive motifs. TAIR, The Arabidopsis Information Resource; TIGR, The Institute for Genomic Research.

Sequence Name    TAIR/TIGR/JGI Gene Identifier
Novel atypical bHLH sequences
At163KDR         At1g26945
At164PRE5        At3g28857
At165PAR1        At2g42870
At166PAR2        At3g58850
At167P1R1        At5g57780
At168P1R3        At3g29370
At169            At5g39240
At170            At2g18969
Os168            LOC_Os02g54870
Os169            LOC_Os01g43950
Os170            LOC_Os02g51320
Os171            LOC_Os08g16030
Os172            LOC_Os06g12210
Os173            LOC_Os10g26460
Os174            LOC_Os10g26410
Os175            LOC_Os04g56500
Os176            LOC_Os03g19780
Os177            LOC_Os08g31950
Os178            LOC_Os07g48900
Pt183            Eugene3.00061353
Pt184            eugene3.00180893
Pt185            eugene3.00002147
Pt186            estExt_fgenesh4_pg.C_LG_XIV0893
Pt187            grail3.0003051401
Pt190            fgenesh4_pg.C_LG_XVIII000779
Sequences discarded
At111            At1g31050
At133            At2g20100
At152            At1g22380
Pt032            grail3.1832000301
Pt090            eugene3.00051537
Pt102            grail3.0033027401
Pt122            eugene3.00040401
Pt170            fgenesh4_pg.C_LG_IX000768
Pt188            grail3.0139003601
Pt189            eugene3.00170483
OlbHLH2          eugene.1400010176
CrbHLH2          pasa_Sanger_mRNA29676|Chlre4
BLAST searches were performed using the novel
atypical Arabidopsis bHLHs as queries. In each case, a
large number of hits were obtained, mostly corresponding to proteins annotated as containing the
bHLH domain. However, among the best-scoring
matches, 19 Arabidopsis, poplar, and rice sequences
not previously annotated as bHLHs were also retrieved. These sequences were subsequently aligned,
and the resulting alignments were used as a seed to
generate HMM profiles. The HMM profiles were in
turn used as queries in searches against selected plant
proteome databases, resulting in the identification of
eight additional matches (Table I).
The 27 putative novel bHLH sequences were combined with the previous estimates of bHLH families,
resulting in a primary data set of 650 amino acid
sequences putatively corresponding to bHLH domains. On the basis of the corresponding alignment,
a consensus motif composed of the 25 most conserved
positions was obtained, 11 of them corresponding to
key functional residues also conserved in a consensus
previously defined from animal bHLHs (Atchley et al.,
1999; Fig. 1). Some positions specific to the plant bHLH
consensus were occupied by highly conserved amino
acids, including R16 at the basic region and P32 at the
end of the helix 1 region. Furthermore, amino acid
frequencies at some of the positions common to both
plant and animal bHLH consensus were sharply different (Supplemental Table S1). These differences likely reflect
the early divergence between animal and plant bHLHs.
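The rule stated in the Figure 1 legend (keep positions conserved in more than 50% of the sequences and, at those positions, list every residue present in more than 10%) can be sketched as follows. This is a hypothetical illustration under stated assumptions: frequencies are computed over all sequences so that gaps count against conservation, residue order follows `Counter.most_common`, and the final step that retained only the 25 most conserved positions is not reproduced.

```python
from __future__ import annotations

from collections import Counter


def plant_style_consensus(
    alignment: list[str], major: float = 0.5, minor: float = 0.1
) -> dict[int, list[str]]:
    """Map 1-based column -> residues for columns whose top residue exceeds `major`.

    Frequencies are fractions of all sequences, so gaps ('-') count against
    conservation. At retained columns, every residue above `minor` is listed.
    """
    n = len(alignment)
    if n == 0 or len({len(row) for row in alignment}) != 1:
        raise ValueError("need a non-empty alignment of equal-length rows")
    consensus: dict[int, list[str]] = {}
    for col in range(len(alignment[0])):
        counts = Counter(row[col] for row in alignment if row[col] != "-")
        if not counts:
            continue
        _, top = counts.most_common(1)[0]
        if top / n > major:
            consensus[col + 1] = [aa for aa, c in counts.most_common() if c / n > minor]
    return consensus
```

With the toy alignment `["RAL", "RSL", "KTL", "RGL", "-AI"]`, column 2 is dropped (its most frequent residue reaches only 40%), and columns 1 and 3 are reported as R/K and L/I.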
To confirm our data set of amino acid sequences as
bHLHs, we examined the fit of every sequence to both
consensus motifs by counting the number of matches
at each region of the predicted bHLH domain (Supplemental Table S2). In previous works, sequences
with more than eight to 10 mismatches from the
animal bHLH consensus motif were discarded (Buck
and Atchley, 2003; Heim et al., 2003; Toledo-Ortiz et al.,
2003; Li et al., 2006b). To ensure that atypical bHLH
domains were not eliminated for lack of correspondence to the consensus, we used a low-stringency
criterion, allowing up to 10 and 13 mismatches from the
animal and plant bHLH consensus, respectively.
From the whole data set of putative bHLH sequences, only 13 sequences did not match any bHLH
consensus and were eliminated from further analysis
(Table I). This criterion was relaxed for At168P1R3,
identified in our phylogenetic analysis as a recent
paralog of At169. The remaining 638 sequences, representing an updated classification of bHLH families
in the species examined in this study, are shown in
Supplemental Table S2.
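The acceptance criterion can be expressed as a small filter: a sequence is kept if it fits either consensus within its mismatch budget (10 for the animal motif, 13 for the plant motif). In this hypothetical sketch a consensus is modeled as a mapping from 1-based aligned position to a set of accepted residues; the real motif contents (Fig. 1; Supplemental Table S1) are not reproduced, and the manual exception made for At168P1R3 is not modeled.

```python
from __future__ import annotations


def count_mismatches(domain: str, motif: dict[int, set[str]]) -> int:
    """Count motif positions (1-based) where the aligned domain residue is not accepted."""
    return sum(
        1
        for pos, accepted in motif.items()
        if pos > len(domain) or domain[pos - 1] not in accepted
    )


def accept_as_bhlh(
    domain: str,
    animal_motif: dict[int, set[str]],
    plant_motif: dict[int, set[str]],
    max_animal: int = 10,
    max_plant: int = 13,
) -> bool:
    """Keep a sequence if it fits either consensus within its mismatch budget."""
    return (
        count_mismatches(domain, animal_motif) <= max_animal
        or count_mismatches(domain, plant_motif) <= max_plant
    )
```

Positions beyond the end of a truncated domain are counted as mismatches, which is one possible convention; an alignment-based pipeline would normally see gaps at those positions instead.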
Dimerization Activities of Atypical bHLH Proteins
As a way to evaluate the accuracy of our searches for
atypical bHLHs, we tested dimerization activity of
AtPAR1 by performing yeast two-hybrid assays. As
shown in Figure 2A, the GAL4 activation domain (AD)
fused to AtPAR1 interacts strongly with the GAL4
binding domain (BD) fused to AtPAR1, revealing the
ability of AtPAR1 to specifically interact with itself.
Therefore, we conclude that AtPAR1 has retained
protein interaction activity. Together with previous
results demonstrating that nuclear localization is required for AtPAR1 function as a direct transcriptional
repressor of specific targets (Roig-Villanova et al.,
2007), these results support the inclusion of AtPAR1 in our
analyses as a bona fide bHLH.
Conserved hydrophobic residues in the HLH region
of the animal bHLH domain presumably define protein interaction activities (Massari and Murre, 2000).
Leu-27 and Leu-73 of helix 1 and 2, respectively, have
been identified as the most highly conserved residues
across plant bHLHs (Fig. 1; Supplemental Table S1).
Furthermore, most of the amino acid changes in these
positions were conservative (Supplemental Fig. S1). To
test whether these residues played a role in dimerization activities of plant atypical bHLHs, two mutated
versions of AtPAR1, PAR1-L1mut (Leu-27Glu) and
PAR1-L2mut (Leu-73Lys), were generated. When
PAR1-L1mut and PAR1-L2mut were fused to the AD
and tested against wild-type BD-PAR1, yeast growth
was clearly affected, indicating that the interaction
was greatly reduced (PAR1-L2mut) or completely
abolished (PAR1-L1mut; Fig. 2B).

Figure 1. Plant and animal bHLH consensus. Alignment of the plant and animal bHLH consensus used as predictive motifs. The plant consensus is based on an alignment of plant bHLHs and contains positions conserved in more than 50% of the sequences. In such positions, amino acids conserved in more than 10% of the sequences were also included. The animal consensus is based on Atchley et al. (1999). Shown at the bottom are the boundaries of the different regions of the bHLH domain.

Figure 2. Yeast two-hybrid analysis of AtPAR1 protein interaction activities. A, Homodimerization activity of wild-type AtPAR1. B, Homodimerization activity of two mutated versions of AtPAR1, L1mut (Leu-27Glu) and L2mut (Leu-73Lys). SD-LT refers to the selective medium for transformed yeast cells, and SD-AHLT refers to the selective medium to perform the growth assay indicative of protein-protein interaction. Numbers refer to the combinations of BD and AD yeast constructs used in each section, as indicated in the right panels. All transformations within a section were done simultaneously. Cotransformations were repeated at least twice with identical results.
Phylogenetic Analysis of Plant bHLHs
To examine the evolutionary relationships among
plant bHLH proteins, a maximum likelihood (ML)
phylogenetic analysis based on the alignment of the
corresponding bHLH domains (Supplemental Fig. S1)
was carried out. The 638 plant bHLH proteins could be
classified into 32 subfamilies identified as clades with
high support values (Fig. 3; Supplemental Fig. S2). A
summary of the bHLH proteins grouped
into their respective subfamilies is shown in Supplemental Table S3. Our analysis was robust to the alignment method employed, as almost every sequence
clustered similarly in MUSCLE- and MAFFT-based
analyses (data not shown). Furthermore, the tree topologies
resulting from neighbor joining (NJ) and maximum
parsimony (MP) analyses were essentially the same,
with most of the subfamilies recovered (Supplemental Fig. S2). Most plant bHLH subfamilies identified
in a recent survey (Pires and Dolan, 2010) were also
detected in our analysis. Newly identified atypical
bHLHs either formed new subfamilies (subfamilies
18–22) or grouped within previously defined subfamilies (subfamily 16).
We found 18 sequences that were not members of
any of the identified subfamilies or showed ambiguous clustering between different phylogenetic trees. In
an attempt to solve their evolutionary relationships
with defined plant bHLH subfamilies, a Bayesian
analysis (BA) was performed on a restricted data set
of the original alignment, which also included representatives from the 32 subfamilies. From the resulting
tree, three additional sequences could be assigned to
existing subfamilies (Supplemental Fig. S3). The
remaining 15 sequences were considered orphans,
most likely representing either highly divergent lineage-specific bHLH sequences or sequences whose evolutionary
relationships our phylogenetic analysis could not resolve. As
typically observed in bHLH protein phylogenies, deep
nodes, those determining interclade relationships,
commonly showed low statistical support and varied
between different phylogenetic methods, likely reflecting the large number of sequences being examined, the
high divergence of the motif combined with its short
length, and the occurrence of many ancient paralogs
(Atchley and Fitch, 1997). Although beyond the scope
of this work, the BA tree also provided some preliminary insights into the deep evolutionary history of the
plant bHLH domain.
Our phylogenetic analysis permitted the estimation
of the number of ancestral bHLH genes in the most
recent common ancestors (MRCA) of plants (Nam
et al., 2004). For instance, assuming that shared clades,
composed of ortholog sequences from the four land
plant species examined, are descendants of an ancestral bHLH gene, we obtained a minimum estimate of 14
bHLH genes in the hypothetical MRCA of land plants
(Fig. 4). However, this number could represent an
underestimate if we assume that the four additional
subfamilies including moss representatives, as well as
the 13 orphan genes found in land plants, represent
divergent members of additional ancestral families
lost in specific plant lineages. Assuming the latter, we
obtained a maximum estimate of 31 bHLH genes in the
MRCA of land plants (Fig. 4). The actual number of
bHLH genes will range between these two values and
will depend on the prevalence of gene duplication or loss in specific evolutionary lineages. A similar
approach was used to estimate the number of genes in the MRCA of eudicots and monocots as
well as of eurosids I and eurosids II (Fig. 4). Interestingly, we found chlorophyte representatives in subfamilies 4 and 14, likely representing the descendants
of ancestral green plant bHLH genes. Cr7 also tends to
cluster at the base of subfamily 4 (Supplemental Figs.
S2 and S3), although with lower support. The rest of
chlorophyte bHLHs clustered in subfamily 32 at the
base of the tree. The single representative from C.
merolae did not group into any of the subfamilies,
suggesting that plant bHLH subfamilies evolved after
divergence of red algae from other photosynthetic
eukaryotes 1.5 billion years ago (Yoon et al., 2004).
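The minimum and maximum estimates for the land plant MRCA follow from simple counting over the tree. The sketch below restates that arithmetic; the function and parameter names are illustrative, and the counts are those given in the text (14 clades shared by all four land plant species, four additional subfamilies with moss representatives, and 13 land plant orphan genes).

```python
from __future__ import annotations


def ancestral_gene_range(
    shared_clades: int, extra_moss_subfamilies: int, land_plant_orphans: int
) -> tuple[int, int]:
    """Return (minimum, maximum) number of bHLH genes in the land plant MRCA.

    Minimum: one ancestral gene per clade shared by all four land plant species.
    Maximum: additionally treat each extra subfamily with moss members and each
    land plant orphan as the descendant of a separate ancestral gene.
    """
    minimum = shared_clades
    maximum = shared_clades + extra_moss_subfamilies + land_plant_orphans
    return minimum, maximum


print(ancestral_gene_range(14, 4, 13))  # (14, 31)
```

The true ancestral number lies between these bounds, depending on how often lineage-specific duplication or loss occurred.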
Figure 3. Phylogenetic relationships, intron pattern, DNA-binding motifs, and architecture of conserved protein motifs in 32
plant bHLH subfamilies. A, ML tree of 638 plant bHLH proteins (for the full representation of the tree, see Supplemental Fig. S2).
The tree has been rooted using the single representative from C. merolae. Subfamilies are represented collapsed as triangles
(except for subfamilies 5, 12, and 24), with both depth and width proportional to sequence divergence and size, respectively.
Subfamilies supported by bootstrap values greater than 50 in NJ or MP analysis are colored black. Subfamilies 5, 12, and 24,
highlighted with gray shading, were ambiguously retrieved in NJ, MP, and BA trees. Orphan genes are represented as single lines.
The tree is drawn to scale, with branch lengths proportional to evolutionary distances between nodes. The scale bar indicates the
estimated number of amino acid replacements per site. B, Summary of information of 32 plant bHLH subfamilies. Predicted
DNA-binding motifs are as follows: I, E non G; II, G binder; III, non E binder; IV, E-box; V, G-box; VI, non DNA binder. For intron
pattern designations, see Figure 5. C, Architecture of protein conserved motifs. Motifs are graphically represented as white boxes
drawn to scale for a representative plant bHLH protein of each subfamily. Motifs matching regions of the bHLH domain are
colored gray.
Sequence and Structural Analyses Provide Further
Support for the Definition of Plant bHLH Subfamilies
Intron/Exon Structure within the bHLH Domains
We analyzed the intron pattern, including intron
distribution, positions, and phases over genomic regions encoding the bHLH domains. Approximately 20% of bHLH genes had no introns at the
bHLH coding region (Fig. 5, pattern k). The rest of the
genes had up to three introns that, according to relative positions and phases, could be arranged into 21
different splicing patterns. Patterns a to g, composed
of one to three introns distributed at three highly
conserved specific positions, accounted for approximately 72% of bHLH genes. As previously observed in
Arabidopsis and rice bHLH genes (Toledo-Ortiz et al.,
2003; Li et al., 2006b), patterns a and f were also the
most common in poplar and moss bHLH
genes but were not found in algae (Fig. 5). The
remaining bHLH genes have introns at positions different from the rest of the family, forming patterns h to
l as well as nine additional patterns exclusive to single
bHLH genes. Notably, intron pattern distribution was almost absolutely conserved within most
subfamilies, providing an independent criterion for
testing the reliability of our phylogenetic analysis (Fig.
3B). An interesting exception is provided by the green
plant ancestral subfamily 4, which grouped genes with intron patterns i and j.

Figure 4. Evolution of bHLH gene family size in plants. Estimates of bHLH gene family size in the MRCA of examined plant species are represented at the corresponding nodes of a tree depicting their evolutionary relationships. Numbers correspond to minimum and maximum estimates. Branch lengths are proportional to evolutionary divergence time, according to previous estimates (Chaw et al., 2004; Yoon et al., 2004; Tuskan et al., 2006; Merchant et al., 2007; Rensing et al., 2008). The scale bar represents millions of years ago. The number of bHLH genes (subfamilies) identified in extant species is indicated for Arabidopsis (At), poplar (Pt), rice (Os), moss (Pp), four chlorophyte species (Ch), and C. merolae (Cm).
Figure 5 also shows, in each case, the position of
splicing with respect to codon (i.e. the intron phase).
An intron was designated as occurring in one of three
phases, phase 0, 1, or 2, depending on whether the
splicing occurred between codons, after the first nucleotide, or after the second nucleotide of the codon,
respectively. Among the 887 introns analyzed here, a
great majority (840) had phase 0, whereas only 15 and
32 had phases 1 and 2, respectively. Among phase 0
introns, we found all introns from patterns a to g.
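Intron phase depends only on how many coding nucleotides precede the splice site, taken modulo 3. The sketch below assumes the count starts at the first nucleotide of the start codon (so the reading frame begins at offset 0); `intron_phase` and `phase_summary` are illustrative names, not functions from the study.

```python
from __future__ import annotations

from collections import Counter


def intron_phase(coding_nt_before_intron: int) -> int:
    """Phase of an intron given the number of coding nucleotides upstream of it.

    Assumes counting starts at the first nucleotide of the start codon, so
    0 = between codons, 1 = after codon nt 1, 2 = after codon nt 2.
    """
    if coding_nt_before_intron < 0:
        raise ValueError("offset must be non-negative")
    return coding_nt_before_intron % 3


def phase_summary(offsets: list[int]) -> dict[int, int]:
    """Tally introns by phase, always reporting phases 0, 1 and 2."""
    counts = Counter(intron_phase(o) for o in offsets)
    return {phase: counts[phase] for phase in (0, 1, 2)}
```

For instance, `phase_summary([9, 10, 11, 300])` gives `{0: 2, 1: 1, 2: 1}`. Applied to the counts reported above, 840 of 887 introns (about 94.7%) are phase 0.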
359 plant bHLH proteins. Seven more sequences have
the conservative amino acid change Arg-16Lys, which
has been shown not to interfere with E-box binding
(Hua et al., 1993). Moreover, three additional residues
at the basic region, His/Lys-9, Glu-13, and Arg-17,
provide DNA-binding specificity for a specific type of
E-box, the so-called G-box (Ferre-D’Amare et al., 1994;
Shimizu et al., 1997). Eighty-six of the 366 E-box DNA-binder bHLHs lacked the G-box recognition motif, the
rest (280) being classified as G-box DNA binders
(Table II). The remaining 105 bHLHs, lacking residues
defining E-box-binding recognition specificities but
having more than five basic amino acids at the basic
region, were classified as non E-box DNA binders.
A total of 167 out of 638 plant bHLH proteins lacked a
basic region and were tentatively predicted to be non
DNA binders. However, a subset of these sequences
displayed the E-box-binding (seven) and G-box-binding
(66) recognition motifs and, in some cases, grouped
within subfamilies mostly composed of DNA-binder
bHLHs (Fig. 3B). It remains to be determined whether
these sequences have retained DNA-binding activity in
spite of their low basic region.
Some bHLH sequences displayed a significantly
higher frequency of specific amino acids. For instance,
subfamily 23 grouped several non E-binder bHLH
sequences displaying up to four Pro residues at the
basic region (Supplemental Table S3). The presence of
Pro residues in the basic region has been claimed to
indicate a differential positioning with respect to the
DNA as a result of modified folding (Toledo-Ortiz
et al., 2003). Moreover, in most non DNA-binder
bHLHs, basic residues at the basic region have been
replaced by specific amino acids such as Ser (e.g.
subfamilies 16 and 17), Gly (e.g. subfamily 22), or even
acidic amino acids (e.g. subfamily 21). The functional
significance of such specific amino acid replacements
at the basic region is yet unknown.
Architecture of Conserved Protein Motifs
Predicted DNA-Binding Properties
By examining the amino acid sequence at the basic
region of the bHLH domain, plant bHLH proteins
could be classified into different DNA-binding groups.
The distribution of the different predicted DNAbinding categories was represented across the bHLH
phylogenetic tree, revealing that most subfamilies
share predicted DNA-binding properties (Fig. 3B).
bHLH domains with at least five basic amino acids
at the basic region are expected to bind DNA (Massari
and Murre, 2000). A larger group composed of 471
plant bHLH proteins was found to fit this criterion
(Table II). DNA-binder bHLHs can be further subdivided into additional DNA-binding categories. According to three-dimensional structural analysis of
bHLH proteins, Glu-13 and Arg-16 have been reported
to be essential in E-box-binding recognition (Fig. 1;
Ferre-D’Amare et al., 1994; Shimizu et al., 1997). This
E-box-binding recognition motif has been identified in
A search for conserved motifs in plant bHLH proteins identified 50 motifs of variable length (8–80
amino acids; Supplemental Table S4). In most cases,
protein architecture is remarkably conserved within
specific subfamilies, giving further support to the phylogenetic analysis based on bHLH domains (Fig. 3C).
Motifs 1 and 2 were identified as the helix 2 and
helix 1 regions of the bHLH domain, respectively, in
almost every bHLH protein sequence analyzed. The
basic and loop regions of the bHLH domain appear to
be less conserved; consequently, no single motif was
detected matching these regions across plant bHLHs.
By contrast, some specific motifs were identified as
matching the basic and loop regions of specific subfamilies (Fig. 3C).
1404
Plant Physiol. Vol. 153, 2010
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2010 American Society of Plant Biologists. All rights reserved.
Genome-Wide Analysis of Plant bHLHs
Figure 5. Intron patterns within the bHLH domains. Alignment of bHLH domains representative of 11 intron patterns, named from a to l. The ? indicates nine additional gene-specific intron patterns. Locations of introns are indicated by triangles, and the number within the triangle corresponds to the intron phase. The number of bHLHs displaying each pattern in Arabidopsis (At), poplar (Pt), rice (Os), moss (Pp), and algae is given in the table at the right of the alignment.
Outside the bHLH domain, some subfamily-specific motifs had been previously characterized as defining additional functional properties. For instance, motif 9, observed in all members of subfamily 4, was unambiguously identified as a ZIP dimerization domain.
Motif 6, shared by members of subfamily 23, has been
characterized in AtLHW as necessary for homodimerization (Ohashi-Ito and Bergmann, 2007). Motif 14,
identified in subfamilies 2, 5, and 23, corresponds to a
motif previously identified in AtMYC3/ATR2. A conserved Asp residue in this region has been reported to
be functionally important for correct expression of
several downstream genes acting in the Trp biosynthesis pathway (Smolen et al., 2002). Motif 44, conserved among phytochrome-interacting members of
subfamily 24, has been characterized as providing a
phytochrome B-specific recognition module (Khanna
et al., 2004). Finally, motifs 7, 4, and 19 have been
reported to form the highly conserved C-terminal
domain of AtSPCH, AtMUTE, and AtFMA, grouped
within subfamily 10. Although the biological role of
this domain is still uncertain, its overexpression leads
to a weak partial reversion of the fama mutant stomatal
phenotype (Pillitteri et al., 2007).
To gain further insights into the origin and mode of
evolution of bHLH motifs, we examined their distribution across species as well as their spatial locations
across bHLH proteins. Most conserved motifs were
already present in the ancestor of land plants, as all but
motif 6 were identified in moss bHLH proteins. Apart
from motifs 1 and 2, only four conserved motifs were
also found in Cm1, while a total of 19 out of 50 motifs
were detected in chlorophyte bHLH proteins. The ZIP
motif (9) was the only one to have been found outside
plants. However, no similarities were found between
subfamily 30 of plant bHLH-ZIP proteins and animal
bHLH-ZIP proteins, and previous works supported
the independent acquisition of the motif multiple
times during plant and animal evolution (Atchley
and Fitch, 1997; Morgenstern and Atchley, 1999; Pires
and Dolan, 2010). The bHLH domain itself provides an
interesting example of variation in the relative spatial
location. In specific subfamilies, the bHLH domain is located at the NH2-terminal, middle, or COOH-terminal region of the protein (Fig. 3C). In addition, motifs 12, 22, 26, 35, 39, 45, and 48 also showed spatial variation relative to the bHLH domain.
Table II. Classification of plant bHLHs according to the presence of DNA-binding motifs in the basic region of the bHLH domain
Species | E Non G | G Binder | Non E Binder | E-Box | G-Box | Non DNA Binder
Arabidopsis | 20 | 78 | 35 | 3 | 11 | 20
Poplar | 29 | 62 | 27 | 2 | 19 | 44
Rice | 31 | 75 | 32 | 2 | 18 | 19
Moss | 3 | 60 | 10 | 0 | 16 | 9
Algae | 3 | 5 | 1 | 0 | 2 | 2
Total | 86 | 280 | 105 | 7 | 66 | 94
E Non G, G Binder, and Non E Binder: >5 basic amino acids in the basic region; E-Box, G-Box, and Non DNA Binder: <5 basic amino acids. DNA-binding motif: E13, R/K16 (E Non G and E-Box columns); H/K9, E13, R/K16, R17 (G Binder and G-Box columns); not defined (Non E Binder and Non DNA Binder columns).
Carretero-Paulet et al.
Table III. Summary of functionally characterized bHLHs from plant species examined in this study, classified by bHLH subfamily
Single, double, and triple asterisks in subfamily numbers indicate land plant, green plant, and angiosperm shared subfamilies, respectively.
Subfamily | Reported Members | Biological Function
1* | At021AMS, At022DYT1, At029FRU, At033SCRM, At116ICE, Os005TD, Os006RERJ | Response to freezing and chilling, guard mother cell differentiation, flower development, response to iron ion, response to cytokinin and jasmonic acid stimulus, microspore development, tapetal layer and anther development
2* | At004MYC4, At005MYC3, At006MYC, At013MYC7, At017AI, Os009MY | Wound, insect, drought, and oxidative stress responses, jasmonic acid and abscisic acid signaling, regulation of anthocyanin metabolism, response to chitin
3* | At020NAI1 | Endoplasmic reticulum body development, response to fungus
4** | At105ILR3, Os062bHLH | Metal homeostasis regulation, response to auxin stimulus, stress response, seed development
5*** | At001GI3, At002EGL, At012MYC, At042TT8, Os013OSB1, Os016OSB2 | Regulation of flavonoid/anthocyanin metabolism, trichome initiation, (epidermal) cell fate specification
10* | At045MUTE, At097FM, At098SPC, Os051FM, Os053SPC1, Os054SPC, Os055MUTE | Stomatal complex development
11* | At095ZOU | Embryonic development
12*** | At038ORG2, At039ORG3 | Response to iron ion, response to salicylic acid stimulus
14** | At046BIM1, At102BIM, At141BIM3 | Brassinosteroid signaling
15*** | At154ERP | Response to ethylene stimulus
16*** | At134PRE2, At135PRE3, At136PRE1, At161PRE4, At163KDR, At164PRE5 | Gibberellic acid-light and gibberellic acid signaling
17*** | At142SAC5 | Unidimensional cell growth
20*** | At159P1R2, At167P1R1 | Light and auxin signaling, shade avoidance
21*** | At165PAR1, At166PAR2 | Light and auxin signaling, shade avoidance
22 | At168P1R3 | Light and auxin signaling, shade avoidance
23*** | At155CPu, At156LHW | Root development
24* | At008PIF3, At009PIF, At015PIL, At016UN10, At024SPT, At026HFR1, At065PIL6, At072PIF7, At073ALC, At124PIL1, At132PIL2, Os102BP5 | Shade avoidance, light signaling, deetiolation, female gametophyte development, double fertilization forming zygote and endosperm, fruit dehiscence, gibberellic acid signaling, regulation of anthocyanin metabolism, regulation of chlorophyll metabolism, negative gravitropism, regulation of seed germination, regulation of photomorphogenesis
25* | At031ZCW, At044BEE1, At050BEE3, At058BEE2, At063CIB1 | Brassinosteroid and abscisic acid signaling, floral transition, petal morphogenesis
26* | At059UN12, Os096PTF | Female gametophyte development, double fertilization forming zygote and endosperm, response to phosphate deficiency stress
28* | At083RHD6, At086RSL1, Pp94RSL1, Pp96RSL2 | Root hair, rhizoid, and caulonemata development
31* | At037HEC2, At040IND, At043HEC3, At088HEC1, Os123LAX | Flower and fruit development, initiation/maintenance of axillary meristems
Orphans | At108MEE8 | Embryonic development ending in seed dormancy
DISCUSSION
Comparative studies of bHLH gene numbers in different plant species show a gradual increase in the number of bHLHs from algae to flowering plants (Fig. 4), which correlates with increasing organism complexity (Richardt et al., 2007). Although the loss of ancestral bHLH genes in specific lineages cannot be ruled out, it is unlikely that gene loss explains the observed pattern. Rather, our results support evolutionary diversification of the bHLH family through extensive expansion at key milestones during plant evolution, a pattern similar to that observed in animal bHLHs (Amoutzias et al., 2004; Simionato et al., 2007).
According to our analysis, two subfamilies (4 and 14) might represent the bHLH components of the transcriptional regulatory networks ancestral to the green plant lineage. However, the most important expansion in the bHLH family occurred after the split between green algae and land plant species. This led to the establishment of most of the diversity of DNA-binding motifs, intron patterns, and protein motifs of plant bHLH proteins and probably reflects the transition from
aquatic to terrestrial habitats. A similar evolutionary
scenario has also been postulated in a recent analysis
(Pires and Dolan, 2010). Other studies conclude that a first evolutionary expansion of the bHLH complement in metazoans and plants might have been related to the acquisition of multicellularity (Ledent and Vervoort, 2001) or might even predate it (Simionato et al., 2007). At least in certain green algae lineages, evolutionary expansion may have preceded multicellularity, as suggested by the seven bHLH genes found in the single-celled C. reinhardtii.
A second significant expansion was observed after
the split between moss and vascular plants, as reflected
in the 12 angiosperm-specific subfamilies and the
greater size of 12 land plant ancestral bHLH subfamilies in angiosperms (Fig. 3B). This expansion might
reflect the more complex body plan and specialization
of vascular and flowering plants (Richardt et al., 2007).
Our results support birth-and-death evolution
through repeated gene duplication and eventual loss
driving plant bHLH evolutionary expansion and diversification (Nei and Rooney, 2005; Zhang et al.,
2008). Signatures of birth-and-death evolution are
observed at both the sequence and genomic levels.
At the sequence level, this would translate into bHLH
sequences showing similar or higher between-species
divergence. To examine whether this was the case for
the plant bHLH family, we estimated sequence divergence at the amino acid level for the four land plant
species and C. reinhardtii. As expected, differences in
sequence divergence appear not to be significant in
any comparison at the within-species level and are
slightly increased in between-species comparisons
with C. reinhardtii (Supplemental Table S5). Previous
studies on the genome distribution of Arabidopsis and
rice bHLH genes supported a prominent role for segmental and tandem duplication in the expansion of this gene family (Heim et al., 2003; Toledo-Ortiz
et al., 2003; Li et al., 2006b). Similarly, recurrent events
of single-gene duplication have been inferred to drive
animal bHLH diversification (Amoutzias et al., 2004).
Some duplicated genes accumulate mutations as pseudogenes and gradually lose their function. We have identified several truncated and apparently nonexpressed bHLH genes in the poplar and moss genomes, likely corresponding to pseudogenes (data not shown), similar to those identified in Arabidopsis and rice in a previous survey (Li et al., 2006b).
More interestingly, some other duplicated genes remain in the genomes as differentiated functionally
specialized genes, providing a source to generate
evolutionary novelty in the form of new regulatory
functions (Nam et al., 2004; Nei and Rooney, 2005).
Regulatory roles of bHLHs are essentially based on
the recognition of a specific hexanucleotide sequence
core at the promoter of target genes (Martinez-Garcia
et al., 2000; Massari and Murre, 2000). A prominent
role has been attributed to key residues at the basic
region in discriminating between variants of this
hexanucleotide core motif, allowing the classification
of plant bHLHs into DNA-binding categories. None of
these DNA-binding categories formed monophyletic
groups, supporting the independent acquisition of
specific DNA-binding properties at different times
during plant bHLH gene family evolution (Fig. 3).
Moreover, a role for specific amino acids outside the
basic region in conferring additional DNA-binding
specificity through elements that lie outside of the
hexanucleotide core recognition motif cannot be ruled
out. Studies of the Drosophila melanogaster bHLH transcription factor Deadpan have led to the identification
of a single Lys residue at the loop region whose
replacement severely reduces DNA-binding affinity
(Nair and Burley, 2000; Winston et al., 2000). A similar
role might be inferred in plant bHLHs, as this Lys residue is conserved in 77.4% of the
sequences (position 46; Fig. 1). A second position of the
loop (position 56; Fig. 1) has also been found to be
particularly conserved, being occupied by an Asp
residue in approximately 65.5% of plant bHLHs.
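The basic-region criteria behind the DNA-binding categories of Table II can be written as a small decision procedure. This is a sketch under stated assumptions, not the authors' pipeline: the basic region is supplied as an aligned string whose first character is position 1 in the numbering used for His/Lys-9, Glu-13, Arg/Lys-16, and Arg-17; Arg, Lys, and His are all counted as basic residues; and the "at least five basic amino acids" threshold is exposed as the parameter `min_basic`. The function name and the example sequences are hypothetical.

```python
BASIC_RESIDUES = frozenset("RKH")  # assumption: Arg, Lys and His count as basic


def classify_basic_region(basic_region: str, min_basic: int = 5) -> str:
    """Predicted DNA-binding category of an aligned bHLH basic region."""
    seq = basic_region.upper()

    def at(pos: int) -> str:
        # 1-based lookup; positions beyond the string are treated as gaps
        return seq[pos - 1] if 1 <= pos <= len(seq) else "-"

    n_basic = sum(aa in BASIC_RESIDUES for aa in seq)
    # E-box recognition: Glu-13 plus Arg-16 (or the conservative Lys-16)
    e_box_motif = at(13) == "E" and at(16) in ("R", "K")
    # G-box recognition additionally needs His/Lys-9 and Arg-17
    g_box_motif = e_box_motif and at(9) in ("H", "K") and at(17) == "R"

    if n_basic >= min_basic:  # predicted DNA binder
        if g_box_motif:
            return "G-box binder"
        if e_box_motif:
            return "E non-G binder"
        return "non E-box binder"
    # Too few basic residues: predicted non DNA binder, but report retained motifs
    if g_box_motif:
        return "non DNA binder (G-box motif)"
    if e_box_motif:
        return "non DNA binder (E-box motif)"
    return "non DNA binder"


print(classify_basic_region("KRAARAAEHNLSERRRR"))  # hypothetical basic region
```

Checking the G-box motif first mirrors the nesting described in the Results, where G-box binders are a subset of E-box binders.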
Most of the novel bHLH genes and subfamilies
identified, and classified by our analysis, correspond
to atypical bHLHs, in which basic residues at the basic
region are commonly replaced by nonbasic amino
acids and are consequently predicted to lack DNA-binding activity (Supplemental Table S3). AtKDR
(subfamily 16) and AtPAR1 and AtPAR2 (subfamily
21) constitute the first characterized plant bHLH
proteins predicted to be non DNA binders. AtKDR
has been reported to negatively regulate AtHFR1
(Hyun and Lee, 2006), a bHLH protein that had
been previously reported to function as a branching
point of phytochrome-dependent signaling responses
(Fairchild et al., 2000). Later molecular and overexpression studies suggested that AtKDR, together with
a set of five additional, closely related homologs
(AtPRE1–AtPRE5), could play a role in GA-dependent
responses (Lee et al., 2006). AtPAR1 and AtPAR2 act as
direct transcriptional repressors of specific targets during shade-avoidance responses, including atypical
bHLHs (AtP1R1–AtP1R3) and specific auxin-responsive
genes (Roig-Villanova et al., 2007). Plant atypical bHLHs
might act as negative regulators of DNA-binding
bHLHs by forming heterodimers, as reported for ID
proteins from group D of animal bHLHs. Consistently,
AtKDR has been shown to heterodimerize with AtHFR1
(Hyun and Lee, 2006), and the AtPAR1 HLH domain
has retained protein interaction function (Fig. 2A).
However, no sequence similarity has been found between this group of plant bHLHs and group D of animal
bHLHs. Plant atypical bHLHs emerge as a group of
transcriptional regulators playing regulatory roles in
plant-specific biological processes, notably, by integrating phytochrome- and hormone-dependent signaling
pathways.
Plant bHLH proteins have been reported to dimerize with a wide and diverse range of transcriptional
regulators, including members of the bHLH family
(Toledo-Ortiz et al., 2003), other transcription factors,
such as R2R3-MYBs (Goff et al., 1992; Dubos et al.,
2008), BZR1-BES1 (Yin et al., 2005), or AP2s (Chandler
et al., 2009), signal transduction proteins, such as WD40
repeat proteins (Ramsay and Glover, 2005), and epigenetic regulators of gene expression (Thorstensen
et al., 2008). Dimerization expands the regulatory roles of bHLH proteins by defining additional protein interaction and
DNA-binding specificities (Massari and Murre, 2000;
Toledo-Ortiz et al., 2003). The HLH region of the
bHLH domain is responsible for the dimerization
activities of bHLH proteins. However, little is known
about how the specificity of this interaction is defined.
Three-dimensional structural analysis of the mammalian Max protein together with site-directed mutagenesis experiments on human E47 and E12
characterized two conserved Leu residues at the helix
1 and 2 regions, respectively, as essential for dimerization (Voronova and Baltimore, 1990; Ferre-D’Amare
et al., 1993). Both Leu residues have been identified as
the most conserved residues across plant bHLHs
(positions 27 and 73; Fig. 1). This essential role in dimerization appears to be conserved in plant bHLHs, as shown by yeast two-hybrid protein interaction assays using two versions of the highly diverged AtPAR1 protein mutated at these positions (Fig. 2B).
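Statements such as "conserved in 77.4% of the sequences" (position 46) or "the most conserved residues" (positions 27 and 73) are per-column frequencies over the aligned bHLH domains. A minimal sketch of that computation follows; it assumes that gaps count toward the total number of sequences (the treatment of gaps is not specified in the text), and the toy alignment is invented.

```python
from collections import Counter
from typing import List, Tuple


def column_conservation(alignment: List[str]) -> List[Tuple[int, str, float]]:
    """For each column: (1-based position, most frequent residue, its frequency).

    Gaps ('-') count in the denominator but are never reported as the
    consensus residue of a column.
    """
    if not alignment or len({len(s) for s in alignment}) != 1:
        raise ValueError("need a non-empty set of equal-length aligned sequences")
    n = len(alignment)
    result = []
    for pos, column in enumerate(zip(*alignment), start=1):
        counts = Counter(aa for aa in column if aa != "-")
        if counts:
            residue, count = counts.most_common(1)[0]
            result.append((pos, residue, count / n))
        else:
            result.append((pos, "-", 0.0))
    return result


def most_conserved(alignment: List[str], top: int = 2) -> List[Tuple[int, str, float]]:
    """The `top` columns with the highest consensus frequency."""
    return sorted(column_conservation(alignment), key=lambda t: -t[2])[:top]


toy = ["ALKL", "SLRL", "AL-L", "GLKM"]  # hypothetical aligned fragments
print(most_conserved(toy))
```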
We observed an excess of phase 0 introns and of
symmetric exons within the bHLH domain (Fig. 5).
This could explain the exchange of protein motifs, because exons flanked by introns of the same phase can be shuffled without interrupting the open reading frame.
Introns would be inserted (or eventually excised) from
the bHLH coding region in a subfamily-specific manner, in accordance with previous results showing that
numerous introns have been specifically inserted into
plants and retained in the genome (Rogozin et al.,
2003). The scattered distribution through the bHLH
phylogeny of pattern k, lacking introns, together with
its occurrence in bHLH sequences from algae species
might be indicative of its ancestral nature, consistent
with this model.
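The link between intron phase and reading-frame preservation is simple arithmetic: an internal exon lying between an intron of phase a and an intron of phase b has a coding length congruent to (b − a) mod 3. A symmetric exon (a = b) therefore spans a whole number of codons and can be duplicated, deleted, or moved without shifting the downstream frame. The sketch below labels internal exons this way; like any gene model used with it, the input convention (coding exon lengths counted from the start codon) and the example lengths are assumptions for illustration.

```python
from itertools import accumulate
from typing import List, Tuple


def internal_exon_symmetry(coding_exon_lengths: List[int]) -> List[Tuple[int, int, int, bool]]:
    """(5' intron phase, 3' intron phase, exon length, is_symmetric) per internal exon."""
    # Phase of the intron after exon i = coding nucleotides upstream of it, mod 3;
    # the final cumulative total is dropped because the last exon has no intron after it.
    phases = [n % 3 for n in accumulate(coding_exon_lengths)][:-1]
    internal = coding_exon_lengths[1:-1]  # exons with an intron on both sides
    out = []
    for i, length in enumerate(internal):
        left, right = phases[i], phases[i + 1]
        # length % 3 == (right - left) % 3, so left == right means whole codons
        out.append((left, right, length, left == right))
    return out


print(internal_exon_symmetry([120, 63, 200, 50]))  # hypothetical gene model
```

Here the 63-nt exon sits between two phase 0 introns (symmetric), whereas the 200-nt exon sits between phases 0 and 2 and would shift the frame if removed.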
Most plant bHLH proteins are multidomain proteins composed of a set of conserved motifs already
present in the MRCA of land plants. Many motifs
consist of short conserved sequences arranged following a mosaic pattern (Fig. 3C). This arrangement
might be mostly explained by modular evolution with
domain shuffling, as suggested in animal bHLHs
(Morgenstern and Atchley, 1999). Shuffling of functional domains among bHLH proteins, including specific regions of the bHLH domain, would promote
further functional diversification in specific lineages.
One might anticipate that ortholog bHLH proteins
closely clustering in a subfamily and sharing similar
intron/exon organization, the architecture of protein
motifs, predicted DNA-binding motifs, and additional
sequence features should have recent common evolutionary origins and consequently related molecular
and biological functions. However, the extent of functional diversification within specific subfamilies is
variable, ranging from functional redundancy to members displaying highly diverged specialized functions
(Table III).
Such apparent functional redundancy is observed in
subfamilies 14 and 25, clustering AtBIM and AtBEE
genes, respectively, involved in brassinosteroid signaling (Friedrichsen et al., 2002; Yin et al., 2005). Functional specialization may be observed in other plant
bHLH subfamilies. AtSPCH, AtMUTE, and AtFMA,
members of subfamily 10, have been characterized to
control stomatal development at three consecutive
steps: initiation, meristemoid differentiation, and
guard cell morphogenesis, respectively (Pillitteri
et al., 2007). The corresponding rice orthologs of
subfamily 10 also provide an interesting example of
functional conservation (Liu et al., 2009). An outstanding example of functional diversification is encountered in subfamily 1, which clusters nine plant bHLH
genes involved in very diverse biological roles (Table
III; Chinnusamy et al., 2003; Sorensen et al., 2003;
Jakoby et al., 2004; Kiribuchi et al., 2004; Li et al., 2006a;
Zhang et al., 2006; Kanaoka et al., 2008).
We found moss orthologs of bHLHs related to
biological processes specific to vascular and flowering
plants. The above-mentioned AtBIM and AtBEE genes
provide a first example. Brassinosteroids play a key
role in the differentiation of vascular tissues (xylem
and phloem; Cano-Delgado et al., 2004). Consistently,
nonvascular moss is devoid of brassinosteroid biosynthetic and signaling pathway genes (Rensing et al.,
2008). Interestingly, subfamily 14 also consistently grouped the chlorophyte representatives Cr1 and Vc3. Several moss bHLH orthologs also clustered in subfamily 24, whose members have been reported to function in phytochrome-dependent photomorphogenic responses that appeared later in the evolutionary
lineage of vascular plants, such as shade avoidance
and seed germination (Ni et al., 1998; Huq and Quail,
2002; Yamashino et al., 2003; Oh et al., 2004). Arabidopsis AtHEC genes, also grouping within subfamily
24, have been shown to work in concert with AtSPT to
coordinately regulate development of the female reproductive tract, probably in an auxin-dependent
manner (Gremski et al., 2007). An interesting question
for future research will be to investigate whether moss
(and algae) bHLH orthologs, within these subfamilies,
have retained the ancestral function or have evolved
new functions. Studies on AtRHD6 and AtRSL1 (subfamily 28), which control root hair development, provide a first insight into this question. Interestingly, the
corresponding orthologs PpRSL1 and PpRSL2 in moss
also control the development of nonhomologous organs with a rooting function (Menand et al., 2007).
Some other functionally characterized bHLHs belong to angiosperm-specific subfamilies that have undergone lineage-specific expansions, which may reflect species-specific adaptations. Subfamilies 12 and 15 provide
examples of dicot- and monocot-specific expansion,
respectively. Subfamily 12 clusters AtORG2 and AtORG3,
regulated by iron ion deficiency-mediated stress and the
phytohormones salicylic acid and jasmonic acid (Kang
et al., 2003). Subfamily 15 includes AtERP, which is
involved in GA signaling acting downstream of DELLA
proteins (Zentella et al., 2007), conserved growth repressors that modulate GA responses. Furthermore, subfamily 23 clusters together six poplar sequences but only
three Arabidopsis and rice homologs. Subfamily 23 is
represented by AtLHW, which is involved in the regulation of the Arabidopsis root vascular initial population
(Ohashi-Ito and Bergmann, 2007). Similar poplar-specific
significant expansion has been found in the MADS box
subfamily clustering AtANR1, which is also known to be
involved in root development (Zhang and Forde, 1998;
Leseberg et al., 2006), and in the R2R3-MYB C1 subfamily,
whose members showed particularly abundant expression in roots (Wilkins et al., 2009).
Twelve out of the 32 bHLH subfamilies defined here
lack any functionally characterized member. Some
subfamilies might regulate biological roles essential for land plant development, as they form large subfamilies that include representatives from the four land plant species (e.g. subfamilies 9 and 27) or are specific to angiosperms (e.g. subfamilies 7, 18, 19, and 20). We expect the comprehensive classification and evolutionary analysis of plant bHLHs presented here to provide a useful framework for ortholog identification. This is a first step toward inferring the role of newly identified plant bHLH proteins in the transcriptional regulation of growth and development processes, as well as toward further investigations into the role of the bHLH family in plant phenotypic diversification.
MATERIALS AND METHODS
Plant bHLH Sequence Identification and Analysis
Putative novel bHLH sequences were identified using BLAST (Altschul
et al., 1997) and profile HMMs (Durbin et al., 1988), generated and calibrated
with HMMER software version 2.3.3. Local searches were performed against
the proteomes of Arabidopsis (Arabidopsis thaliana), rice (Oryza sativa), poplar
(Populus trichocarpa), and moss (Physcomitrella patens), downloaded from The
Arabidopsis Information Resource, The Institute for Genomic Research Rice
Genome Annotation, Joint Genome Institute (JGI) Ptri version 1.1, and JGI
Ppatens version 2.0 browsers, respectively. Similar searches were performed
on the whole sequenced genomes of Volvox carteri, Chlamydomonas reinhardtii,
Ostreococcus tauri, and Ostreococcus lucimarinus (JGI Volca version 1.0, JGI
Chlre version 4.0, JGI Ostta version 2.0, and JGI Ostlu version 2.0, respectively), as well as Cyanidioschyzon merolae (http://merolae.biol.s.u-tokyo.ac.jp/). Only hits returning E-values of less than 0.001 were considered for
further analysis. Redundant sequences were identified through BLASTCLUST
from the BLAST stand-alone package and subsequently discarded.
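The E-value cutoff can be applied mechanically to tabular search output. A minimal sketch, assuming 12-column tab-separated BLAST output (legacy `-m 8` or BLAST+ `-outfmt 6`, with the E-value in column 11); it keeps each subject hit below the cutoff with its best E-value. The identifiers in the usage lines are made up, and redundancy removal with BLASTCLUST is not reproduced here.

```python
import csv
from typing import Dict, Iterable

E_VALUE_CUTOFF = 1e-3  # "E-values of less than 0.001"


def hits_below_cutoff(tabular_lines: Iterable[str], cutoff: float = E_VALUE_CUTOFF) -> Dict[str, float]:
    """Map each subject sequence with E < cutoff to its lowest E-value.

    Columns (assumed): qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    best: Dict[str, float] = {}
    for row in csv.reader(tabular_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):  # skip blank and comment lines
            continue
        subject, evalue = row[1], float(row[10])
        if evalue < cutoff and evalue < best.get(subject, float("inf")):
            best[subject] = evalue
    return best


example = [  # hypothetical hits
    "query1\tsubjA\t45.0\t60\t30\t1\t1\t60\t10\t70\t2e-10\t55.1",
    "query1\tsubjB\t30.0\t50\t35\t2\t1\t50\t5\t55\t0.05\t30.0",
    "query2\tsubjA\t50.0\t60\t30\t0\t1\t60\t10\t70\t1e-20\t80.0",
]
print(hits_below_cutoff(example))
```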
The bHLH sequences were aligned using the ClustalW, MUSCLE version
5.0, and MAFFT 6.0 (FFT-NS-2 algorithm) programs (Thompson et al., 1997;
Katoh et al., 2002; Edgar, 2004), and the resulting alignments were subsequently manually edited using GENEDOC 2.6.002. Limits of the bHLH
domains were defined according to the proposed predictive consensus motif (Atchley et al., 1999), which was constructed with reference to the structure of the human MAX bHLH protein (Ferre-D’Amare et al., 1993), and further corrected for predicted plant-specific bHLH domain boundaries (Toledo-Ortiz et al., 2003; Roig-Villanova et al., 2007).
The MEME version 3.5.7 tool was used to identify conserved motifs shared
among bHLH proteins (Bailey and Elkan, 1994; Bailey et al., 2006). The
following parameter settings were used: maximum number of different motifs
to find, 50; optimum motif width, 8 to 100. Subsequently, the MAST program
was used to search detected motifs in protein databases (Bailey and Gribskov,
1998). The motifs were further scanned against different domain databases,
including the National Center for Biotechnology Information’s Conserved
Domain Database, INTERPRO, and PROSITE (Apweiler et al., 2001).
Exon/intron locations, distribution, and phases in the genomic sequences encoding the bHLH domain were examined through comparisons with the predicted encoded protein using GENEWISE (Birney et al., 2004).
Phylogenetic Analysis
Reconstruction of evolutionary relationships was performed on the basis of
amino acid sequences of bHLH proteins. Only the bHLH domain was used,
because the flanking sequences of bHLH proteins from independent subfamilies are either nonhomologous or too divergent to be reliably aligned. bHLH
sequences from the different species examined were added sequentially to the
analysis, and the resulting trees were compared with previous classifications
(Bailey et al., 2003; Buck and Atchley, 2003; Heim et al., 2003; Toledo-Ortiz
et al., 2003; Li et al., 2006b; Pires and Dolan, 2010).
The Jones, Taylor, and Thornton (JTT) model with an estimated proportion of invariable sites (I) and an estimated γ-distribution parameter (G) was selected as the best-fitting amino acid substitution model with the Akaike information criterion implemented in ProtTest version 1.4 (Jones et al., 1992; Abascal et al., 2005). The ML analyses were performed using PHYML version 2.4.5 (Guindon and Gascuel, 2003), using the JTT+I+G model. Heterogeneity of amino acid substitution rates was corrected using a γ-distribution with eight categories. Tree topology searching was optimized using the subtree pruning and regrafting option. The statistical support of the retrieved topology was assessed using the Shimodaira-Hasegawa-like approximate likelihood ratio test and a bootstrap analysis with 100 replicates. NJ and MP analyses were implemented with MEGA 4.0 (Tamura et al., 2007). In NJ, distances were calculated using the JTT amino acid substitution model, γ-distributed rates among sites, and the γ-parameter set as retrieved in the ProtTest analysis. To deal with short insertions/deletions (commonly occurring throughout the loop region), the “pairwise deletion” and “all sites” settings were used in NJ and MP analyses, respectively. A bootstrap analysis with 1,000 replicates was performed in each case.
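With the "pairwise deletion" setting, a gapped alignment column is ignored only for the pair of sequences in which the gap occurs, instead of being removed from the whole alignment. The sketch below illustrates that gap handling with the uncorrected proportion of differing amino acids (p-distance); it deliberately swaps in this simpler distance for the JTT+γ distances actually used in the NJ analysis, and the function names and sequences are hypothetical.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, Optional

GAP = "-"


def p_distance(a: str, b: str) -> Optional[float]:
    """Proportion of differing residues between two aligned sequences.

    Columns where either sequence has a gap are skipped for this pair only
    (pairwise deletion). Returns None if no ungapped column is shared.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a, b):
        if x == GAP or y == GAP:
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else None


def mean_pairwise_distance(seqs: Dict[str, str]) -> Optional[float]:
    """Average p-distance over all sequence pairs sharing an ungapped column."""
    values = []
    for x, y in combinations(seqs.values(), 2):
        d = p_distance(x, y)
        if d is not None:
            values.append(d)
    return mean(values) if values else None


print(p_distance("AKR-L", "ARRGL"))  # the gapped column is ignored for this pair
```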
BA was implemented in MrBayes 3.1.2 (Huelsenbeck and Ronquist, 2001; Ronquist and Huelsenbeck, 2003). Searches were run with four Markov chains for 1 million generations, sampling every 100th tree. After stationarity was reached (determined by independent runs sampling similar likelihood values when plotted against the number of generations), the first 100,000 generations (i.e. the first 1,000 sampled trees) were discarded as burn-in, and a consensus tree was then constructed to evaluate clades with Bayesian posterior probabilities greater than 50%. The JTT model with rate heterogeneity across sites modeled as γ-distributed with eight categories and invariant sites was used.
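For reference, the JTT+I+G model used in both the ML and Bayesian analyses combines a proportion of invariable sites with a discretized γ distribution of rates. In its standard textbook form (our summary, not taken from the PHYML or MrBayes documentation), the likelihood of alignment column j with K = 8 rate categories, using the common mean-of-category discretization, is:

```latex
L_j \;=\; p_{\mathrm{inv}}\,\delta_j\,\pi_{a_j}
\;+\; \frac{1-p_{\mathrm{inv}}}{K}\sum_{k=1}^{K} L_j\!\left(r_k\right),
\qquad
r_k \;=\; K\Big[P\big(\alpha+1,\ \alpha b_k\big) - P\big(\alpha+1,\ \alpha b_{k-1}\big)\Big],
\qquad
\frac{1}{K}\sum_{k=1}^{K} r_k = 1
```

Here δ_j is 1 if column j is constant (amino acid a_j in every sequence) and 0 otherwise, π_{a_j} is the equilibrium frequency of that amino acid under JTT, L_j(r_k) is the likelihood of the column with every branch length multiplied by r_k, b_0 = 0, b_K = ∞, b_k (0 < k < K) is the k/K quantile of a γ distribution with shape α and mean 1, and P is the regularized lower incomplete gamma function. Programs differ in whether the r_k are additionally rescaled by 1/(1 − p_inv) so that the overall mean rate, including invariable sites, equals 1.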
Yeast Two-Hybrid Interaction Assays
The Matchmaker two-hybrid system (Clontech) was used to perform yeast
two-hybrid assays. The full-length open reading frame of AtPAR1 was
inserted in frame with the DNA BD and transcription AD fusion construct
using the pGBKT7 and pGADT7 vectors, respectively. The NcoI-BamHI
fragment of pACV9 (containing the entire coding sequence of AtPAR1;
Roig-Villanova et al., 2007) was subcloned into the same sites of pGBKT7
and pGADT7, resulting in pCL3 (BD-PAR1) and pCL1 (AD-PAR1), respectively. L1mut was generated by PCR-based site-directed mutagenesis using
the primers RO47 (5#-GATTGAGGCGGAGCAGAGGATTATCCCCGGAGGAG-3#) and RO48 (5#-GATAATCCTCTGCTCCGCCTCAATCTTTTCCTTGAC-3#). L2mut was similarly generated using the primers RO49
(5#-CATTCTGTCTAAACAATGTCAGATCAAAACCATTA-3#) and RO50
(5#-GATCTGACATTGTTTAGACAGAATGTAACCAGCTG-3#). In both cases,
AtPAR1 was amplified from the binary vector pBF1 (P35S:AtPAR1-GFP, a
pCAMBIA-1302-based binary vector containing full-length AtPAR1 flanked
by NcoI and SpeI) using specific primers from the P35S and GFP coding
sequences. Mutated L1mut and L2mut sequences were subcloned into pCRIITOPO (Invitrogen) to generate pIR44 and pMR5, respectively. Site-directed
mutations in these inserts were verified by sequencing. The NcoI-SpeI
fragments of pIR44 and pMR5 were subcloned into the same sites of
pGADT7, resulting in pCM7 (AD-PAR1-L1mut) and pCM8 (AD-PAR1-L2mut), respectively.
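The Methods do not describe how the mutagenic primers were designed. The sequences are consistent with complementary 5′ ends, as used in overlap-extension PCR. The Python sketch below checks this by computing the longest 5′ segment of one primer that is the reverse complement of the 5′ segment of its partner. It is an after-the-fact consistency check under that assumption, not the authors' protocol, and the function names are illustrative. For both pairs it reports a 25-nt complementary overlap.

```python
# Mutagenic primer pairs as given in the Methods (written 5'->3').
PRIMERS = {
    "RO47": "GATTGAGGCGGAGCAGAGGATTATCCCCGGAGGAG",
    "RO48": "GATAATCCTCTGCTCCGCCTCAATCTTTTCCTTGAC",
    "RO49": "CATTCTGTCTAAACAATGTCAGATCAAAACCATTA",
    "RO50": "GATCTGACATTGTTTAGACAGAATGTAACCAGCTG",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def five_prime_overlap(primer_a: str, primer_b: str) -> int:
    """Length of the longest 5' segment of primer_a that equals the reverse
    complement of the equally long 5' segment of primer_b, i.e. the region
    through which the two primers can anneal to each other."""
    rc_b = reverse_complement(primer_b)
    for k in range(min(len(primer_a), len(rc_b)), 0, -1):
        # rc_b[-k:] is the reverse complement of primer_b[:k]
        if primer_a[:k] == rc_b[-k:]:
            return k
    return 0


for fwd, rev in [("RO47", "RO48"), ("RO49", "RO50")]:
    print(fwd, rev, five_prime_overlap(PRIMERS[fwd], PRIMERS[rev]))
```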
Plant Physiol. Vol. 153, 2010
1409
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2010 American Society of Plant Biologists. All rights reserved.
Carretero-Paulet et al.
Yeast strain AH109 was transformed according to the
manufacturer’s instructions. Yeast cells were cotransformed with the different
pairs of BD-AD constructs, and independent transformants were selected on
minimal synthetic dropout (SD) medium lacking Leu and Trp (SD-LT). At least 10 independent colonies were then transferred to SD medium lacking Ade, His, Leu, and Trp (SD-AHLT) to test for positive protein-protein interactions.
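The two-step selection can be summarized as a small Python sketch. Growth on SD-LT only shows that a colony carries both plasmids. Growth on SD-AHLT additionally requires activation of the reporters by an interacting BD-AD pair. The plasmid names come from the Methods. The list of pairs tested, the data structure, and the function names are hypothetical, and the colony observations in the usage line are made up, not results from this study.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Plasmids named in the Methods: BD fusions are in pGBKT7, AD fusions in pGADT7.
CONSTRUCTS: Dict[str, str] = {
    "pCL3": "BD-PAR1",
    "pCL1": "AD-PAR1",
    "pCM7": "AD-PAR1-L1mut",
    "pCM8": "AD-PAR1-L2mut",
}

# Hypothetical pairing scheme: the BD bait tested against each AD prey.
PAIRS: List[Tuple[str, str]] = [("pCL3", "pCL1"), ("pCL3", "pCM7"), ("pCL3", "pCM8")]


@dataclass
class Colony:
    grows_on_sd_lt: bool    # SD-Leu-Trp: selects cells carrying both plasmids
    grows_on_sd_ahlt: bool  # SD-Ade-His-Leu-Trp: also requires reporter activation


def score_pair(colonies: List[Colony], min_colonies: int = 10) -> Tuple[int, int]:
    """Return (colonies growing on SD-AHLT, cotransformants tested)."""
    cotransformants = [c for c in colonies if c.grows_on_sd_lt]
    if len(cotransformants) < min_colonies:
        raise ValueError(f"need at least {min_colonies} independent colonies")
    positives = sum(1 for c in cotransformants if c.grows_on_sd_ahlt)
    return positives, len(cotransformants)


def describe(pair: Tuple[str, str]) -> str:
    """Human-readable label for a BD-AD plasmid pair."""
    bd, ad = pair
    return f"{CONSTRUCTS[bd]} + {CONSTRUCTS[ad]}"


# Illustrative, made-up observations for a single cotransformation:
observations = [Colony(True, True)] * 3 + [Colony(True, False)] * 7
print(score_pair(observations))
```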
Supplemental Data
The following materials are available in the online version of this article.
Supplemental Figure S1. ClustalW amino acid sequence alignment of 638
Arabidopsis, poplar, rice, moss, and algae bHLH domains.
Supplemental Figure S2. ML phylogenetic tree of 638 plant bHLH
proteins.
Supplemental Figure S3. BA phylogenetic tree of 50 plant bHLH proteins.
Supplemental Table S1. Plant and animal bHLH predictive consensus
motifs.
Supplemental Table S2. Species classification of 638 plant bHLH sequences examined in this study.
Supplemental Table S3. Subfamily classification of 638 Arabidopsis,
poplar, rice, moss, and algae bHLH sequences examined in this study
and additional information.
Supplemental Table S4. Summary of conserved motifs identified by
MEME in plant bHLHs.
Supplemental Table S5. Rates of sequence divergence at the amino acid
level in Arabidopsis, poplar, rice, moss, and C. reinhardtii bHLH
sequences.
ACKNOWLEDGMENTS
We thank F. Paulet-Dubois for critical reading of the manuscript and all
our laboratory members for stimulating discussions and suggestions. We also
thank two anonymous referees for their insightful comments. This work was
carried out at the University of Almería, the University of
Manchester, and the Centre CONSOLIDER for Research in Agricultural
Genomics. Finally, we thank the Apple Research and Technology Support
scheme for its support.
Received January 18, 2010; accepted May 13, 2010; published May 14, 2010.
LITERATURE CITED
Abascal F, Zardoya R, Posada D (2005) ProtTest: selection of best-fit models
of protein evolution. Bioinformatics 21: 2104–2105
Abe H, Yamaguchi-Shinozaki K, Urao T, Iwasaki T, Hosokawa D,
Shinozaki K (1997) Role of Arabidopsis MYC and MYB homologs in
drought- and abscisic acid-regulated gene expression. Plant Cell 9:
1859–1868
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W,
Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of
protein database search programs. Nucleic Acids Res 25: 3389–3402
Amoutzias GD, Robertson DL, Oliver SG, Bornberg-Bauer E (2004)
Convergent evolution of gene networks by single-gene duplications in
higher eukaryotes. EMBO Rep 5: 274–279
Amoutzias GD, Veron AS, Weiner J III, Robinson-Rechavi M, Bornberg-Bauer E, Oliver SG, Robertson DL (2007) One billion years of bZIP
transcription factor evolution: conservation and change in dimerization
and DNA-binding site specificity. Mol Biol Evol 24: 827–835
Apweiler R, Attwood TK, Bairoch A, Bateman A, Birney E, Biswas M,
Bucher P, Cerutti L, Corpet F, Croning MD, et al (2001) The InterPro
database, an integrated documentation resource for protein families,
domains and functional sites. Nucleic Acids Res 29: 37–40
Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of
the flowering plant Arabidopsis thaliana. Nature 408: 796–815
Atchley WR, Fitch WM (1997) A natural classification of the basic helix-
loop-helix class of transcription factors. Proc Natl Acad Sci USA 94:
5172–5176
Atchley WR, Terhalle W, Dress A (1999) Positional dependence, cliques,
and predictive motifs in the bHLH protein domain. J Mol Evol 48:
501–516
Bailey PC, Martin C, Toledo-Ortiz G, Quail PH, Huq E, Heim MA, Jakoby
M, Werber M, Weisshaar B (2003) Update on the basic helix-loop-helix
transcription factor gene family in Arabidopsis thaliana. Plant Cell 15:
2497–2502
Bailey TL, Elkan C (1994) Fitting a mixture model by expectation maximization to discover motifs in biopolymers. In Proceedings of the
Second International Conference on Intelligent Systems for Molecular
Biology. AAAI Press, Menlo Park, CA, pp 28–36
Bailey TL, Gribskov M (1998) Combining evidence using p-values: application to sequence homology searches. Bioinformatics 14: 48–54
Bailey TL, Williams N, Misleh C, Li WW (2006) MEME: discovering and
analyzing DNA and protein sequence motifs. Nucleic Acids Res 34:
W369–W373
Birney E, Clamp M, Durbin R (2004) GeneWise and Genomewise. Genome
Res 14: 988–995
Buck MJ, Atchley WR (2003) Phylogenetic analysis of plant basic helix-loop-helix proteins. J Mol Evol 56: 742–750
Cano-Delgado A, Yin Y, Yu C, Vafeados D, Mora-Garcia S, Cheng JC, Nam
KH, Li J, Chory J (2004) BRL1 and BRL3 are novel brassinosteroid
receptors that function in vascular differentiation in Arabidopsis.
Development 131: 5341–5351
Chandler JW, Cole M, Flier A, Werr W (2009) BIM1, a bHLH protein
involved in brassinosteroid signalling, controls Arabidopsis embryonic
patterning via interaction with DORNROSCHEN and DORNROSCHEN-LIKE. Plant Mol Biol 69: 57–68
Chaw SM, Chang CC, Chen HL, Li WH (2004) Dating the monocot-dicot
divergence and the origin of core eudicots using whole chloroplast
genomes. J Mol Evol 58: 424–441
Chinnusamy V, Ohta M, Kanrar S, Lee BH, Hong X, Agarwal M, Zhu JK
(2003) ICE1: a regulator of cold-induced transcriptome and freezing
tolerance in Arabidopsis. Genes Dev 17: 1043–1054
de Pater S, Pham K, Memelink J, Kijne J (1997) RAP-1 is an Arabidopsis
MYC-like R protein homologue, that binds to G-box sequence motifs.
Plant Mol Biol 34: 169–174
Doebley J, Lukens L (1998) Transcriptional regulators and the evolution of
plant form. Plant Cell 10: 1075–1082
Dubos C, Le Gourrierec J, Baudry A, Huep G, Lanet E, Debeaujon I,
Routaboul JM, Alboresi A, Weisshaar B, Lepiniec L (2008) MYBL2 is a
new regulator of flavonoid biosynthesis in Arabidopsis thaliana. Plant J
55: 940–953
Durbin R, Eddy SR, Krogh A, Mitchison G (1988) Biological Sequence
Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, Cambridge, UK
Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32: 1792–1797
Fairchild CD, Schumaker MA, Quail PH (2000) HFR1 encodes an atypical
bHLH protein that acts in phytochrome A signal transduction. Genes
Dev 14: 2377–2391
Fairman R, Beran-Steed RK, Anthony-Cahill SJ, Lear JD, Stafford WF III,
DeGrado WF, Benfield PA, Brenner SL (1993) Multiple oligomeric
states regulate the DNA binding of helix-loop-helix peptides. Proc Natl
Acad Sci USA 90: 10429–10433
Ferre-D’Amare AR, Pognonec P, Roeder RG, Burley SK (1994) Structure
and function of the b/HLH/Z domain of USF. EMBO J 13: 180–189
Ferre-D’Amare AR, Prendergast GC, Ziff EB, Burley SK (1993) Recognition by Max of its cognate DNA through a dimeric b/HLH/Z domain.
Nature 363: 38–45
Friedrichsen DM, Nemhauser J, Muramitsu T, Maloof JN, Alonso J, Ecker
JR, Furuya M, Chory J (2002) Three redundant brassinosteroid early
response genes encode putative bHLH transcription factors required for
normal growth. Genetics 162: 1445–1456
Goff SA, Cone KC, Chandler VL (1992) Functional analysis of the transcriptional activator encoded by the maize B gene: evidence for a direct
functional interaction between two classes of regulatory proteins. Genes
Dev 6: 864–875
Gremski K, Ditta G, Yanofsky MF (2007) The HECATE genes
regulate female reproductive tract development in Arabidopsis thaliana. Development 134: 3593–3601
Guindon S, Gascuel O (2003) A simple, fast, and accurate algorithm
to estimate large phylogenies by maximum likelihood. Syst Biol 52:
696–704
Halliday KJ, Hudson M, Ni M, Qin M, Quail PH (1999) poc1: an
Arabidopsis mutant perturbed in phytochrome signaling because
of a T DNA insertion in the promoter of PIF3, a gene encoding a
phytochrome-interacting bHLH protein. Proc Natl Acad Sci USA 96:
5832–5837
Heim MA, Jakoby M, Werber M, Martin C, Weisshaar B, Bailey PC (2003)
The basic helix-loop-helix transcription factor family in plants: a
genome-wide study of protein structure and functional diversity. Mol
Biol Evol 20: 735–747
Hu J, Anderson B, Wessler SR (1996) Isolation and characterization of rice
R genes: evidence for distinct evolutionary paths in rice and maize.
Genetics 142: 1021–1031
Hua X, Yokoyama C, Wu J, Briggs MR, Brown MS, Goldstein JL, Wang X
(1993) SREBP-2, a second basic-helix-loop-helix-leucine zipper protein
that stimulates transcription by binding to a sterol regulatory element.
Proc Natl Acad Sci USA 90: 11603–11607
Huelsenbeck JP, Ronquist F (2001) MRBAYES: Bayesian inference of
phylogenetic trees. Bioinformatics 17: 754–755
Huq E, Quail PH (2002) PIF4, a phytochrome-interacting bHLH factor,
functions as a negative regulator of phytochrome B signaling in
Arabidopsis. EMBO J 21: 2441–2450
Hyun Y, Lee I (2006) KIDARI, encoding a non-DNA binding bHLH protein,
represses light signal transduction in Arabidopsis thaliana. Plant Mol
Biol 61: 283–296
International Rice Genome Sequencing Project (2005) The map-based
sequence of the rice genome. Nature 436: 793–800
Jakoby M, Wang HY, Reidt W, Weisshaar B, Bauer P (2004) FRU
(BHLH029) is required for induction of iron mobilization genes in
Arabidopsis thaliana. FEBS Lett 577: 528–534
Jones DT, Taylor WR, Thornton JM (1992) The rapid generation of
mutation data matrices from protein sequences. Comput Appl Biosci
8: 275–282
Kaiser BN, Finnegan PM, Tyerman SD, Whitehead LF, Bergersen FJ, Day
DA, Udvardi MK (1998) Characterization of an ammonium transport
protein from the peribacteroid membrane of soybean nodules. Science
281: 1202–1206
Kanaoka MM, Pillitteri LJ, Fujii H, Yoshida Y, Bogenschutz NL,
Takabayashi J, Zhu JK, Torii KU (2008) SCREAM/ICE1 and SCREAM2
specify three cell-state transitional steps leading to Arabidopsis stomatal
differentiation. Plant Cell 20: 1775–1785
Kang HG, Foley RC, Onate-Sanchez L, Lin C, Singh KB (2003) Target
genes for OBP3, a Dof transcription factor, include novel basic helix-loop-helix domain proteins inducible by salicylic acid. Plant J 35:
362–372
Karol KG, McCourt RM, Cimino MT, Delwiche CF (2001) The closest
living relatives of land plants. Science 294: 2351–2353
Katoh K, Misawa K, Kuma K, Miyata T (2002) MAFFT: a novel method for
rapid multiple sequence alignment based on fast Fourier transform.
Nucleic Acids Res 30: 3059–3066
Kellogg EA (2004) Evolution of developmental traits. Curr Opin Plant Biol
7: 92–98
Kenrick P, Crane PR (1997) The origin and early evolution of plants on
land. Nature 389: 33–39
Khanna R, Huq E, Kikis EA, Al-Sady B, Lanzatella C, Quail PH (2004) A
novel molecular recognition motif necessary for targeting photoactivated phytochrome signaling to specific basic helix-loop-helix transcription factors. Plant Cell 16: 3033–3044
Kiribuchi K, Sugimori M, Takeda M, Otani T, Okada K, Onodera H,
Ugaki M, Tanaka Y, Tomiyama-Akimoto C, Yamaguchi T, et al (2004)
RERJ1, a jasmonic acid-responsive gene from rice, encodes a basic helix-loop-helix protein. Biochem Biophys Res Commun 325: 857–863
Komatsu M, Maekawa M, Shimamoto K, Kyozuka J (2001) The LAX1 and
FRIZZY PANICLE 2 genes determine the inflorescence architecture of
rice by controlling rachis-branch and spikelet development. Dev Biol
231: 364–373
Ledent V, Vervoort M (2001) The basic helix-loop-helix protein family:
comparative genomics and phylogenetic analysis. Genome Res 11:
754–770
Lee S, Lee S, Yang KY, Kim YM, Park SY, Kim SY, Soh MS (2006)
Overexpression of PRE1 and its homologous genes activates gibberellin-
dependent responses in Arabidopsis thaliana. Plant Cell Physiol 47:
591–600
Leivar P, Monte E, Al-Sady B, Carle C, Storer A, Alonso JM, Ecker JR,
Quail PH (2008) The Arabidopsis phytochrome-interacting factor PIF7,
together with PIF3 and PIF4, regulates responses to prolonged red light
by modulating phyB levels. Plant Cell 20: 337–352
Leseberg CH, Li A, Kang H, Duvall M, Mao L (2006) Genome-wide
analysis of the MADS-box gene family in Populus trichocarpa. Gene
378: 84–94
Li N, Zhang DS, Liu HS, Yin CS, Li XX, Liang WQ, Yuan Z, Xu B, Chu HW,
Wang J, et al (2006a) The rice tapetum degeneration retardation gene is
required for tapetum degradation and anther development. Plant Cell
18: 2999–3014
Li X, Duan X, Jiang H, Sun Y, Tang Y, Yuan Z, Guo J, Liang W, Chen L,
Yin J, et al (2006b) Genome-wide analysis of basic/helix-loop-helix
transcription factor family in rice and Arabidopsis. Plant Physiol 141:
1167–1184
Liljegren SJ, Roeder AH, Kempin SA, Gremski K, Ostergaard L, Guimil
S, Reyes DK, Yanofsky MF (2004) Control of fruit patterning in
Arabidopsis by INDEHISCENT. Cell 116: 843–853
Liu T, Ohashi-Ito K, Bergmann DC (2009) Orthologs of Arabidopsis
thaliana stomatal bHLH genes and regulation of stomatal development
in grasses. Development 136: 2265–2276
Ludwig SR, Habera LF, Dellaporta SL, Wessler SR (1989) Lc, a member of
the maize R gene family responsible for tissue-specific anthocyanin
production, encodes a protein similar to transcriptional activators
and contains the myc-homology region. Proc Natl Acad Sci USA 86:
7092–7096
Lupas A (1996) Coiled coils: new structures and new functions. Trends
Biochem Sci 21: 375–382
Ma PC, Rould MA, Weintraub H, Pabo CO (1994) Crystal structure of
MyoD bHLH domain-DNA complex: perspectives on DNA recognition
and implications for transcriptional activation. Cell 77: 451–459
Martinez-Garcia JF, Huq E, Quail PH (2000) Direct targeting of light
signals to a promoter element-bound transcription factor. Science 288:
859–863
Massari ME, Murre C (2000) Helix-loop-helix proteins: regulators of
transcription in eucaryotic organisms. Mol Cell Biol 20: 429–440
Matsuzaki M, Misumi O, Shin IT, Maruyama S, Takahara M, Miyagishima
SY, Mori T, Nishida K, Yagisawa F, Yoshida Y, et al (2004) Genome
sequence of the ultrasmall unicellular red alga Cyanidioschyzon merolae
10D. Nature 428: 653–657
Menand B, Yi K, Jouannic S, Hoffmann L, Ryan E, Linstead P, Schaefer
DG, Dolan L (2007) An ancient mechanism controls the development of
cells with a rooting function in land plants. Science 316: 1477–1480
Merchant SS, Prochnik SE, Vallon O, Harris EH, Karpowicz SJ, Witman
GB, Terry A, Salamov A, Fritz-Laylin LK, Marechal-Drouard L, et al
(2007) The Chlamydomonas genome reveals the evolution of key animal
and plant functions. Science 318: 245–250
Morgenstern B, Atchley WR (1999) Evolution of bHLH transcription
factors: modular evolution by domain shuffling? Mol Biol Evol 16:
1654–1663
Morohashi K, Zhao M, Yang M, Read B, Lloyd A, Lamb R, Grotewold E
(2007) Participation of the Arabidopsis bHLH factor GL3 in trichome
initiation regulatory events. Plant Physiol 145: 736–746
Nair SK, Burley SK (2000) Recognizing DNA in the library. Nature 404:
715, 717–718
Nam J, Kim J, Lee S, An G, Ma H, Nei M (2004) Type I MADS-box genes
have experienced faster birth-and-death evolution than type II MADS-box genes in angiosperms. Proc Natl Acad Sci USA 101: 1910–1915
Nei M, Rooney AP (2005) Concerted and birth-and-death evolution of
multigene families. Annu Rev Genet 39: 121–152
Nesi N, Debeaujon I, Jond C, Pelletier G, Caboche M, Lepiniec L (2000)
The TT8 gene encodes a basic helix-loop-helix domain protein required
for expression of DFR and BAN genes in Arabidopsis siliques. Plant Cell
12: 1863–1878
Ni M, Tepperman JM, Quail PH (1998) PIF3, a phytochrome-interacting
factor necessary for normal photoinduced signal transduction, is a novel
basic helix-loop-helix protein. Cell 95: 657–667
Oh E, Kim J, Park E, Kim JI, Kang C, Choi G (2004) PIL5, a phytochrome-interacting basic helix-loop-helix protein, is a key negative regulator of
seed germination in Arabidopsis thaliana. Plant Cell 16: 3045–3058
Ohashi-Ito K, Bergmann DC (2007) Regulation of the Arabidopsis root
vascular initial population by LONESOME HIGHWAY. Development
134: 2959–2968
Palenik B, Grimwood J, Aerts A, Rouze P, Salamov A, Putnam N, Dupont
C, Jorgensen R, Derelle E, Rombauts S, et al (2007) The tiny eukaryote
Ostreococcus provides genomic insights into the paradox of plankton
speciation. Proc Natl Acad Sci USA 104: 7705–7710
Payne CT, Zhang F, Lloyd AM (2000) GL3 encodes a bHLH protein that
regulates trichome development in Arabidopsis through interaction
with GL1 and TTG1. Genetics 156: 1349–1362
Pillitteri LJ, Sloan DB, Bogenschutz NL, Torii KU (2007) Termination of
asymmetric cell division and differentiation of stomata. Nature 445:
501–505
Pires N, Dolan L (2010) Origin and diversification of basic-helix-loop-helix
proteins in plants. Mol Biol Evol 27: 862–874
Rajani S, Sundaresan V (2001) The Arabidopsis myc/bHLH gene
ALCATRAZ enables cell separation in fruit dehiscence. Curr Biol 11:
1914–1922
Ramsay NA, Glover BJ (2005) MYB-bHLH-WD40 protein complex and the
evolution of cellular diversity. Trends Plant Sci 10: 63–70
Rensing SA, Lang D, Zimmer AD, Terry A, Salamov A, Shapiro H,
Nishiyama T, Perroud PF, Lindquist EA, Kamisugi Y, et al (2008) The
Physcomitrella genome reveals evolutionary insights into the conquest
of land by plants. Science 319: 64–69
Richardt S, Lang D, Reski R, Frank W, Rensing SA (2007) PlanTAPDB, a
phylogeny-based resource of plant transcription-associated proteins.
Plant Physiol 143: 1452–1466
Riechmann JL, Heard J, Martin G, Reuber L, Jiang C, Keddie J, Adam L,
Pineda O, Ratcliffe OJ, Samaha RR, et al (2000) Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes.
Science 290: 2105–2110
Robinson KA, Lopes JM (2000) Survey and summary: Saccharomyces
cerevisiae basic helix-loop-helix proteins regulate diverse biological
processes. Nucleic Acids Res 28: 1499–1505
Rogozin IB, Wolf YI, Sorokin AV, Mirkin BG, Koonin EV (2003) Remarkable interkingdom conservation of intron positions and massive,
lineage-specific intron loss and gain in eukaryotic evolution. Curr Biol
13: 1512–1517
Roig-Villanova I, Bou-Torrent J, Galstyan A, Carretero-Paulet L, Portoles
S, Rodriguez-Concepcion M, Martinez-Garcia JF (2007) Interaction of
shade avoidance and auxin responses: a role for two novel atypical
bHLH proteins. EMBO J 26: 4756–4767
Ronquist F, Huelsenbeck JP (2003) MrBayes 3: Bayesian phylogenetic
inference under mixed models. Bioinformatics 19: 1572–1574
Shimizu T, Toumoto A, Ihara K, Shimizu M, Kyogoku Y, Ogawa
N, Oshima Y, Hakoshima T (1997) Crystal structure of PHO4
bHLH domain-DNA complex: flanking base recognition. EMBO J 16:
4689–4697
Simionato E, Ledent V, Richards G, Thomas-Chollier M, Kerner P,
Coornaert D, Degnan BM, Vervoort M (2007) Origin and diversification
of the basic helix-loop-helix gene family in metazoans: insights from
comparative genomics. BMC Evol Biol 7: 33
Smolen GA, Pawlowski L, Wilensky SE, Bender J (2002) Dominant alleles
of the basic helix-loop-helix transcription factor ATR2 activate stress-responsive genes in Arabidopsis. Genetics 161: 1235–1246
Sorensen AM, Krober S, Unte US, Huijser P, Dekker K, Saedler H (2003)
The Arabidopsis ABORTED MICROSPORES (AMS) gene encodes a
MYC class transcription factor. Plant J 33: 413–423
Stevens JD, Roalson EH, Skinner MK (2008) Phylogenetic and expression analysis of the basic helix-loop-helix transcription factor gene
family: genomic approach to cellular differentiation. Differentiation 76:
1006–1022
Szecsi J, Joly C, Bordji K, Varaud E, Cock JM, Dumas C, Bendahmane M
(2006) BIGPETALp, a bHLH transcription factor is involved in the
control of Arabidopsis petal size. EMBO J 25: 3912–3920
Tamura K, Dudley J, Nei M, Kumar S (2007) MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol
24: 1596–1599
Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG (1997)
The CLUSTAL_X windows interface: flexible strategies for multiple
sequence alignment aided by quality analysis tools. Nucleic Acids Res
25: 4876–4882
Thorstensen T, Grini PE, Mercy IS, Alm V, Erdal S, Aasland R, Aalen RB
(2008) The Arabidopsis SET-domain protein ASHR3 is involved in
stamen development and interacts with the bHLH transcription factor
ABORTED MICROSPORES (AMS). Plant Mol Biol 66: 47–59
Toledo-Ortiz G, Huq E, Quail PH (2003) The Arabidopsis basic/helix-loop-helix transcription factor family. Plant Cell 15: 1749–1770
Tsiantis M, Hay A (2003) Comparative plant development: the time of the
leaf? Nat Rev Genet 4: 169–180
Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U,
Putnam N, Ralph S, Rombauts S, Salamov A, et al (2006) The genome
of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 313:
1596–1604
Voronova A, Baltimore D (1990) Mutations that disrupt DNA binding and
dimer formation in the E47 helix-loop-helix protein map to distinct
domains. Proc Natl Acad Sci USA 87: 4722–4726
Wilkins O, Nahal H, Foong J, Provart NJ, Campbell MM (2009) Expansion
and diversification of the Populus R2R3-MYB family of transcription
factors. Plant Physiol 149: 981–993
Winston RL, Ehley JA, Baird EE, Dervan PB, Gottesfeld JM (2000)
Asymmetric DNA binding by a homodimeric bHLH protein. Biochemistry 39: 9092–9098
Yamashino T, Matsushika A, Fujimori T, Sato S, Kato T, Tabata S, Mizuno
T (2003) A link between circadian-controlled bHLH factors and the
APRR1/TOC1 quintet in Arabidopsis thaliana. Plant Cell Physiol 44:
619–629
Yin Y, Vafeados D, Tao Y, Yoshida S, Asami T, Chory J (2005) A new class
of transcription factors mediates brassinosteroid-regulated gene expression in Arabidopsis. Cell 120: 249–259
Yoon HS, Hackett JD, Ciniglia C, Pinto G, Bhattacharya D (2004) A
molecular timeline for the origin of photosynthetic eukaryotes. Mol Biol
Evol 21: 809–818
Zentella R, Zhang ZL, Park M, Thomas SG, Endo A, Murase K, Fleet CM,
Jikumaru Y, Nambara E, Kamiya Y, et al (2007) Global analysis of DELLA
direct targets in early gibberellin signaling in Arabidopsis. Plant Cell 19:
3037–3057
Zhang H, Forde BG (1998) An Arabidopsis MADS box gene that controls
nutrient-induced changes in root architecture. Science 279: 407–409
Zhang R, Wang YQ, Su B (2008) Molecular evolution of a primate-specific
microRNA family. Mol Biol Evol 25: 1493–1502
Zhang W, Sun Y, Timofejeva L, Chen C, Grossniklaus U, Ma H (2006)
Regulation of Arabidopsis tapetum development and function by
DYSFUNCTIONAL TAPETUM1 (DYT1) encoding a putative bHLH
transcription factor. Development 133: 3085–3095