Annals of Botany 96: 669–681, 2005
doi:10.1093/aob/mci219, available online at www.aob.oupjournals.org
Detection and Preliminary Analysis of Motifs in Promoters of
Anaerobically Induced Genes of Different Plant Species
BIJAYALAXMI MOHANTY1,*, S. P. T. KRISHNAN1, SANJAY SWARUP2
and VLADIMIR B. BAJIC1
1Knowledge Extraction Laboratory, Institute for Infocomm Research, 21 Heng Mui Keng Terrace,
Singapore 119613 and 2Department of Biological Sciences, National University of Singapore, Singapore
Received: 29 October 2004 Returned for revision: 16 December 2004 Accepted: 31 January 2005 Published electronically: 18 July 2005
Background and Aims Plants can suffer from oxygen limitation during flooding or more complete submergence
and may therefore switch from Krebs cycle respiration to fermentation in association with the expression of
anaerobically inducible genes coding for enzymes involved in glycolysis and fermentation. The aim of this
study was to clarify mechanisms of transcriptional regulation of these anaerobic genes by identifying motifs shared
by their promoter regions.
Methods Statistically significant motifs were detected by an in silico method from 13 promoters of anaerobic
genes. The selected motifs were common for the majority of analysed promoters. Their significance was evaluated
by searching for their presence in transcription factor-binding site databases (TRANSFAC, PlantCARE and
PLACE). Using several negative control data sets, it was tested whether the motifs found were specific to the
anaerobic group.
Key Results Previously, anaerobic response elements have been identified in maize (Zea mays) and arabidopsis
(Arabidopsis thaliana) genes. Known functional motifs were detected, such as GT and GC motifs, but also other
motifs shared by most of the genes examined. Five motifs detected have not been found in plants hitherto but are
present in the promoters of animal genes with various functions. The consensus sequences of these novel motifs
are 5′-AAACAAA-3′, 5′-AGCAGC-3′, 5′-TCATCAC-3′, 5′-GTTT(A/C/T)GCAA-3′ and 5′-TTCCCTGTT-3′.
Conclusions It is believed that the promoter motifs identified could be functional by conferring anaerobic
sensitivity to the genes that possess them. This proposal now requires experimental verification.
Key words: Anaerobic genes, promoters, motifs, anaerobic response elements, ab initio motif detection, transcription
factors, transcription factor-binding sites, Arabidopsis thaliana, ethanolic fermentative pathway.
INTRODUCTION
Plants often suffer from a shortage of oxygen during partial
or complete submergence. Initially the inundated parts,
especially roots, suffer from hypoxia, which later turns to
anoxia, as the slow diffusion of oxygen in water (10 000
times slower than in air, Armstrong, 1979) fails to match the
demands of respiration. Rice (Oryza sativa) is the only
cereal crop that can tolerate anaerobic conditions for prolonged periods.
In addition, seasonal rainfall affects several other important
arable crops such as wheat (Triticum aestivum), maize (Zea
mays), barley (Hordeum vulgare) and other cereals via
waterlogging problems in certain soils that also lead to
low-oxygen stress particularly in their root zone. Anaerobic
stress, together with other abiotic stresses, is the prime cause
of crop loss worldwide (Grover et al., 1998; Khush and
Baenziger, 1998). Thus, elucidation of mechanisms of plant
response and adaptation to the stress is of considerable
scientific and economic importance. Despite continuing
efforts to discover the mechanisms behind submergence
tolerance, the actual mechanisms governing this behaviour
remain elusive. One step in this direction would be to
understand the regulation of anaerobic genes at the genome
level.
* For correspondence. E-mail [email protected]
During anaerobiosis, plants switch from Krebs cycle
respiration to fermentation. Although there are a number
of fermentative pathways operating during anoxia (ethanol,
lactic acid and alanine fermentation; Kennedy et al., 1992),
ethanolic fermentation is the main energy-producing pathway (ap Rees et al., 1987; Greenway and Gibbs, 2003).
However, how this pathway is controlled is not clearly
understood although various anaerobic proteins (ANPs)
become induced during anaerobiosis in the roots. Approximately 20 ANPs have been identified in maize roots by
cDNA cloning, and most are enzymes of glycolysis and
fermentation such as sucrose synthase, pyrophosphate-dependent phosphofructokinase, fructose-1,6-phosphate
aldolase, glucose-6-phosphate isomerase, glyceraldehyde-3-phosphate dehydrogenase, alcohol dehydrogenase, lactate
dehydrogenase, pyruvate decarboxylase and others (Sachs
et al., 1980, 1996). The expression of such ANPs is controlled predominantly at the transcriptional level, although
a post-transcriptional regulatory mechanism has also been
demonstrated (Fennoy and Bailey-Serres, 1995).
It is generally believed that genes having similar expression patterns contain common motifs in their promoter
regions (Vilo et al., 2000). Klok et al. (2002) analysed the
expression of low-oxygen response in arabidopsis root cultures and found that the transcriptional regulatory regions
of genes with a similar expression share similar motifs.
Thus, a common set of transcription factors (TFs) is likely
© The Author 2005. Published by Oxford University Press on behalf of the Annals of Botany Company. All rights reserved.
For Permissions, please email: [email protected]
Mohanty et al. — Promoters of Anaerobically Induced Genes
to control these genes. Common promoter motifs are the key
signatures for a family of co-regulated genes and are usually
present in the regions where complex protein interactions
occur (Z. Wang et al., 2004). However, in some cases,
single motifs can bind various transcription factors thereby
bringing the genes under multiple regulatory controls (Jin
and Martin, 1999; Yanagisawa, 2002). Extensive studies on
500 bp upstream regions of yeast promoters suggest that
regulatory elements are commonly present in those regions
(Caselle et al., 2002). In eukaryotes, the computational
detection of regulatory sites is difficult as the sequences
where TFs bind are generally much shorter than in prokaryotes (van Helden et al., 1998). In addition, they are generally active in both orientations and can be dispersed over
a large distance. Sometimes, they can be present in introns
and also in distal parts of the promoter (Caselle et al., 2002).
While many computer programs have been developed
to detect biological motifs (Lawrence et al., 1993; Bailey
and Elkan, 1994; Hertz and Stormo, 1999; Califano, 2000;
Pavesi et al., 2004; Huang et al., 2005; Yang et al., 2005;
and others), it is still a considerable challenge to detect
accurately previously defined regulatory sites in regions of
interest, let alone identify new regulatory sites responsible
for activation and expression of a functional gene group.
Anaerobically induced genes are often characterized by
the presence of anaerobic response elements (AREs) in their
promoter regions (Walker et al., 1987). AREs have been
reported in promoters of maize ADH1, ADH2 and aldolase
genes, and arabidopsis (Arabidopsis thaliana) ADH1,
LDH1 and PDC1 genes (Olive et al., 1990; Dolferus et al.,
1994; Hoeren et al., 1998). Two motifs, the GT motif and
GC motif, have been identified as AREs in these promoters
(Olive et al., 1990, 1991a, b). The transcription factor
AtMYB2 in arabidopsis binds to the GT motif of the
ADH1 promoter and a GC-binding protein binds to the
GC motif (Olive et al., 1991b). The consensus sequence
for an AtMYB2-binding site present in all known anaerobically induced genes is 5′-AAACC(G/A)(G/A)-3′ (reverse
complement of the GT-box) (Hoeren et al., 1998), while the
one for the GC motif is 5′-GCCCC-3′. There is evidence that
the transcription factor AtMYB2 is induced by low-oxygen
stress (Hoeren et al., 1998) and, thus, could be another
important factor for the regulation of anaerobic genes.
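Because the AtMYB2 consensus is described as the reverse complement of the GT-box, a site may lie on either strand of a promoter. The following minimal Python sketch illustrates one way to scan both strands for the consensus 5′-AAACC(G/A)(G/A)-3′. The function names and the reporting convention (0-based start on the forward strand) are assumptions made for illustration and are not taken from any tool used in this study.

```python
import re

# Complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# AtMYB2 consensus 5'-AAACC(G/A)(G/A)-3' written as a regular expression.
# The lookahead allows overlapping sites to be reported.
ATMYB2_SITE = re.compile(r"(?=(AAACC[GA][GA]))")
SITE_LENGTH = 7


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(promoter: str):
    """List (strand, forward-strand 0-based start, matched site) for both strands."""
    promoter = promoter.upper()
    hits = [("+", m.start(), m.group(1)) for m in ATMYB2_SITE.finditer(promoter)]
    rc = reverse_complement(promoter)
    n = len(promoter)
    for m in ATMYB2_SITE.finditer(rc):
        # Map the start on the reverse strand back to forward coordinates.
        start_fwd = n - (m.start() + SITE_LENGTH)
        hits.append(("-", start_fwd, m.group(1)))
    return sorted(hits, key=lambda h: h[1])
```

A hit reported on the "-" strand corresponds to the GT-box orientation on the forward strand.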
Our aim was to find common motifs shared by the majority of regulatory regions of anaerobic genes that would be
in addition to the previously well-characterized GC and GT
motifs. Detection of new motifs that could be potential
transcription factor-binding sites (TFBSs) could help in
understanding the transcriptional regulation of these anaerobic genes under stress conditions. More importantly, a
clearer understanding of the promoter content and architecture would allow a better understanding of co-ordinate
regulation and could provide background for reconstruction
of parts of gene regulatory networks of anaerobic responsive
genes (Werner et al., 2003). TFBSs usually have a length of
4–20 nt with 0–50 % mismatches per motif (Poluliakh and
Nakai, 2003). Important motifs can be discovered computationally as patterns common to most sequences in a family
of sequences, either as sequence patterns based on sequence
comparison or from comparison of structures (Z. Wang
et al., 2004). Many biologically relevant patterns have been
found using motif discovery algorithms. There are several
programs available for the extraction of motifs, such as
MEME (Bailey and Elkan, 1994), GibbsDNA (Lawrence
et al., 1993), CONSENSUS (Hertz and Stormo, 1999),
SPLASH (Califano, 2000) and Weeder Web (Pavesi et al.,
2004), among others. We have used the DRAGON MOTIF
BUILDER (DMB) program (Huang et al., 2005; Yang et al.,
2005) for our detection work. This program has already
been applied successfully to the analysis of 11 500 mouse
and 18 300 human promoters to detect motifs in an ab initio
manner (Huang et al., 2005).
In this report, we describe several new common patterns
that have not been reported previously for plants and
could be functional TFBSs. The fidelity of our list of over-represented motifs is enhanced by finding several already
known TFBSs in our target group of anaerobic genes. We
demonstrate by using negative control promoter sets that
the patterns we discovered are over-represented only in the
anaerobic promoter target group and are specific to that
group, giving further support to our hypothesis that the
new motifs we report here could have a functional role
as anaerobic promoter elements.
METHODS
Promoter sequences
Plant promoter sequences were extracted from SoftBerry’s
Plant Promoter Database (PPD) (Shahmuradov et al., 2003),
the Eukaryotic Promoter Database (EPD) (Praz et al., 2002)
and GenBank (Benson et al., 2004). The nature of our study
requires very accurate location of transcription start sites
(TSSs). This has been provided in the PPD and EPD databases, as the TSS locations of sequences there were verified
experimentally. Unfortunately, data in GenBank lack accuracy in this respect, and every sequence has to be manually
checked, which is a virtually impossible task. Similarly,
promoter data for arabidopsis (http://arabidopsis.med.ohiostate.edu/AtcisDB/) are predicted and thus of inadequate
accuracy for our tasks. Consequently, the main sources of
our promoter data have been PPD and EPD. In our study, we
used only the 200 bp upstream of the TSS,
a choice dictated mainly by the promoter sequences available
in SoftBerry’s PPD. For all the anaerobic genes, we have
extracted [–200, +50] regions relative to TSS for the motif
detection analysis.
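The [–200, +50] window can be illustrated with a short Python sketch. It assumes that the sequence is given on the coding strand and that the 0-based index of the TSS (position +1) is known; the function name and its parameters are hypothetical. Because there is no position 0 in TSS-relative coordinates, the window is 250 nt long.

```python
def promoter_window(sequence: str, tss_index: int,
                    upstream: int = 200, downstream: int = 50) -> str:
    """Return the [-upstream, +downstream] region around the TSS.

    tss_index is the 0-based index of the +1 nucleotide in `sequence`
    (an assumption of this sketch). Position -1 is tss_index - 1 and
    position +downstream is tss_index + downstream - 1.
    """
    start = tss_index - upstream
    end = tss_index + downstream  # exclusive end
    if start < 0 or end > len(sequence):
        raise ValueError("sequence too short for the requested window")
    return sequence[start:end]
```

Sequences that are too short for the full window raise an error rather than being silently truncated, since truncated windows would shift all TSS-relative positions.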
Data set of anaerobic genes (anaerobic set 1)
A group of 13 anaerobic genes that belong to the ethanolic fermentative pathway from seven different plant species were used to identify probable promoter regulatory
elements (motifs). We have extracted six promoter
sequences from PPD, five sequences from EPD and two
from GenBank. The genes included are maize alcohol
dehydrogenase 1 (ADH1), maize ADH2, arabidopsis ADH,
pea (Pisum sativum) ADH1, petunia (Petunia hybrida)
ADH2, tomato (Lycopersicon esculentum) ADH3a, tomato
ADH3b, barley ADH2, maize sucrose synthase, arabidopsis
sucrose synthase, rice sucrose synthase, rice pyruvate
decarboxylase (PDC) and maize fructose bisphosphate
aldolase. This promoter set we denote as anaerobic set 1.
Although anaerobic gene sequences from this same pathway, such as arabidopsis PDC1, maize glyceraldehyde-3-phosphate dehydrogenase, rice ADH2, cotton (Gossypium
hirsutum) ADH2b-2 and arabidopsis lactate dehydrogenase 1, were available, we were not able to use them since
they lack accurate information regarding TSSs and promoter
regions.
Tool for motif detection (DMB program)
To find the known and unknown promoter motifs in the
compiled promoter sequences, we used the DMB program
(http://research.i2r.a-star.edu.sg/DRAGON/Motif_Search/)
and the following parameters: EM2, single motif occurrence
in the sequences with zero or one motif per sequence. We
searched for all motifs of lengths 5–12 nt, with the total
number of ten motifs per session. In the sessions, we manually changed the thresholds.
Analysis of motifs
A total of 120 motifs were detected in the promoters of the
13 anaerobic genes using the DMB program with different
thresholds. Out of the 120 motifs, we selected 16 motifs
with the highest frequency of appearance. The significance
of the selected motifs was evaluated by searching for their
presence in TFBS databases such as TRANSFAC (Matys
et al., 2003; http://www.gene-regulation.com), PlantCARE
(Lescot et al., 2002; http://oberon.rug.ac.be:8080/
PlantCARE/index.html) and PLACE (Higo et al., 1999;
http://www.dna.affrc.go.jp/htdocs/PLACE/).
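The selection of motifs by frequency of appearance can be sketched as follows. This is only an illustration under simplifying assumptions: each motif group is represented by its consensus (for example GTTT(A/C/T)GCAA), matching is exact and on the forward strand only, and all function names are hypothetical. The DMB program itself works with motif groups that tolerate mismatches, which this sketch does not reproduce.

```python
import re


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Convert a consensus such as 'GTTT(A/C/T)GCAA' to the pattern GTTT[ACT]GCAA."""
    pattern = re.sub(
        r"\(([ACGT](?:/[ACGT])+)\)",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        consensus.replace(" ", ""),
    )
    return re.compile(pattern)


def promoter_frequency(consensus: str, promoters: list) -> tuple:
    """Return (number of promoters containing the motif, fraction of promoters)."""
    if not promoters:
        raise ValueError("no promoters supplied")
    regex = consensus_to_regex(consensus)
    n_hits = sum(1 for seq in promoters if regex.search(seq.upper()))
    return n_hits, n_hits / len(promoters)


def select_frequent(consensi: list, promoters: list, min_promoters: int) -> list:
    """Keep the motifs found in at least `min_promoters` promoters."""
    return [c for c in consensi
            if promoter_frequency(c, promoters)[0] >= min_promoters]
```

With 13 promoters, a threshold of six promoters would correspond to an occurrence of about 46 %.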
Homology search for TFs that could bind motifs
(found in anaerobic set 1) that are unknown in plants
We found five unknown motifs in anaerobic set 1, which
are not known to act as TFBSs in plants, but are known to
be TFBSs in animal cells. We searched for homology of
TFs that bind these animal TFBSs to arabidopsis and rice
proteins. We used BLAST (Altschul et al., 1997) and the
internet service at http://www.ncbi.nlm.nih.gov/BLAST/
producttable.shtml. The protein sequences found in animals
were extracted from Swiss-Prot Protein Knowledgebase
(Boeckmann et al., 2003; http://tw.expasy.org/sprot/).
Data set of 117 anaerobic genes involved in signal
transduction/transcription and various metabolic
pathways (anaerobic set 2)
Based on the results of low-oxygen response in arabidopsis root culture in a microarray experiment, by Klok et al.
(2002), we selected a larger data set of anaerobic genes that
are highly overexpressed or underexpressed. Based on gene
names in this group, we extracted from different plant
species 117 promoter sequences from PPD and EPD and
we denote this set as anaerobic set 2. This set includes 13
anaerobic genes of the ethanolic fermentative pathway
(anaerobic set 1), as well as many other genes of a heterogeneous nature involved in signal transduction/transcription
which belong to a number of different metabolic pathways.
With this set, we aimed to check to what extent motifs
found in anaerobic set 1 are shared with promoters of anaerobic set 2.
Negative control set 1 (data set of α-amylase genes)
A data set of 15 promoters of α-amylase genes from four
plants of the cereal group [rice, wild oat (Avena fatua),
barley and wheat] was selected as the negative control
set 1. α-Amylase was chosen for negative control since it
is a starch-degrading enzyme and is known to be anaerobically induced only in rice, which is in contrast to our sugar-degrading anaerobic genes. We have extracted promoter
regions of these genes from PPD and EPD databases.
Negative control set 2 (promoters from PPD and EPD
which excludes known anaerobically induced genes)
From PPD and EPD, we extracted another negative data
set of 303 genes having different functions and originating
from different plants. This set does not include either the
anaerobic genes or the genes differentially expressed by low
oxygen in the microarray experiment of Klok et al. (2002).
This was mainly done to observe whether the motifs detected for the anaerobic genes in the ethanolic fermentative
pathway would also turn up in this negative control set.
Negative control set 3 (promoters of genes induced by cold,
drought, high salinity stresses and ABA application)
Promoter sequences of the genes dehydrin, aldehyde
dehydrogenase, protease inhibitor, catalase, chlorophyll a-b
binding protein, actin and phenylalanine ammonia-lyase
(based on expression of rice genes in microarray experiments; Rabbani et al., 2003) induced by cold, drought, high
salinity and abscisic acid (ABA) application were extracted
from PPD and EPD. Altogether, 17 sequences were extracted from seven different plants [arabidopsis, rice, potato
(Solanum tuberosum), tomato, pea, curled-leaved tobacco
(Nicotiana plumbaginifolia), wood tobacco (Nicotiana
sylvestris) and oilseed rape (Brassica napus)].
Negative control set 4 (promoters of heat shock
protein genes)
To compare the motifs detected in the anaerobic genes
with other stress response genes, we extracted from PPD
and EPD a set of 12 genes responsive to heat stress from four
different plant species [rice, arabidopsis, soybean (Glycine
max) and maize].
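One way to quantify whether a motif is more frequent in the anaerobic target set than in a negative control set is a one-sided Fisher's exact test on the numbers of promoters that contain the motif. The sketch below is offered only as an illustration; the text above does not state that this particular test was used, and the function name is hypothetical.

```python
from math import comb


def enrichment_p_value(hits_target: int, n_target: int,
                       hits_control: int, n_control: int) -> float:
    """One-sided Fisher's exact test (hypergeometric upper tail).

    Probability of observing at least `hits_target` motif-containing
    promoters in the target set, given the set sizes and the total number
    of motif-containing promoters in both sets combined.
    """
    total = n_target + n_control
    total_hits = hits_target + hits_control
    denominator = comb(total, total_hits)
    upper = min(n_target, total_hits)
    tail = sum(
        comb(n_target, k) * comb(n_control, total_hits - k)
        for k in range(hits_target, upper + 1)
    )
    return tail / denominator


# Hypothetical counts for illustration only (not results from this study):
# a motif present in 11 of 13 target promoters and 30 of 303 control promoters.
p = enrichment_p_value(11, 13, 30, 303)
```

A small p-value would indicate that the motif is over-represented in the target set relative to the control set; with several motifs tested, a correction for multiple testing would also be needed.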
RESULTS AND DISCUSSION
Selection and significance of motifs
We analysed 13 anaerobic genes of the ethanolic fermentative pathway from seven plant species (anaerobic set 1)
and searched for shared promoter motifs. This pathway is
the most prominent pathway involved in anaerobic stress.
Out of 120 motif groups detected, we selected 16 that were
the most frequent (each appeared in at least six of the 13 promoters,
i.e. ≥46 %) (Fig. 1). The maximum occurrence in anaerobic promoters was 92 %
[Figure 1: bar chart of the number of occurrences in different sequences (y-axis) for each motif detected (x-axis).]
FIG. 1. Sixteen different motif groups were detected in the promoter sequences of 13 anaerobic genes involved in the fermentative pathway (anaerobic set 1).
Each motif group is presented by the consensus sequence for the group. The number of promoters that contain the motif for specific groups is given.
[Figure 2: bar chart of total information content (y-axis) against length and type of motif detected (x-axis).]
FIG. 2. The total information content (IC) for each motif group is given. These indicate how homogeneous motifs in an individual motif group are. The
smaller the difference between the maximum IC and the total IC for the group, the more homogeneous are the motifs in the group, i.e. they are more similar to
each other. These data are provided for 16 top ranked motifs detected in the promoters of 13 anaerobic genes involved in the fermentative pathway (anaerobic
set 1). The motifs are of different length, and the maximum IC for each motif group is 2 × motif length. All motif groups have a total IC >66 % of the maximum
possible IC for motifs of that length, indicating that motifs in the detected motif groups are very similar to each other within the group.
(12 out of 13), and 13 motifs occurred in at least 62 % of promoters (eight out of 13). The total information
content (IC) (for definition see Stormo, 2000 and references
therein) of each motif group is given in Fig. 2. The total IC of every motif group was >66 % of the
maximum possible value for its length, and the highest was 87 % of
that value. This indicates that the motifs in individual
groups are very homogeneous (i.e. very similar to each
other).
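The total IC of a motif group and its ratio to the maximum of 2 bits per position can be computed as in the following sketch. It assumes equal background frequencies for the four nucleotides and applies no small-sample correction; the exact formula used by the DMB program may differ, and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def total_information_content(sites: list) -> float:
    """Sum over positions of (2 - Shannon entropy), in bits.

    `sites` are the aligned occurrences of one motif group, all of the
    same length. A uniform background (0.25 per base) is assumed.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = 0.0
    for column in zip(*(s.upper() for s in sites)):
        n = len(column)
        entropy = -sum((c / n) * log2(c / n) for c in Counter(column).values())
        total += 2.0 - entropy
    return total


def fraction_of_maximum(sites: list) -> float:
    """Total IC divided by the maximum possible IC (2 x motif length)."""
    return total_information_content(sites) / (2.0 * len(sites[0]))
```

Identical sites give a fraction of 1.0, and the fraction falls as the sites in a group diverge, which is the sense in which a high value indicates a homogeneous motif group.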
We analysed the distribution of motifs in different segments of promoters and, if the first nucleotide of the motif
fell within the considered region, we counted that motif
as appearing in that region. We looked for the presence
of motifs in the [–200, –150], [–150, –100], [–100, –50],
[–50, –1] and [+1, +50] regions. We observed that most of
the motifs were found in the regions [–200, –150], [–150,
–100] and [–50, –1]. The highest percentage of the motifs
was found in the region [–50, –1]. The motifs AGCAGC,
CACAAT, TTATTA and CAACTCA were found in all
upstream regions. However, most interestingly, some motifs
seem to prefer very specific regions. For example, the
ATATAAATT motif has a preferred region [–50, –1]
where it is found in 77 % of promoters. The TATAAAAAC
motif appeared in 47 % of promoters in the same region.
Both of these motifs contain the TATA-box motif
TATAAA. It is commonly believed that the TATA-box
is present at positions around –30 relative to the TSS,
which has good concordance with our result. In plants,
two types of TATA-binding proteins are present, which
bind to the TATA-box (Vogel et al., 1993). In tobacco,
the TATA-box of the GapC4 promoter is required for anaerobic gene expression and is bound by a TATA-box-binding
protein (TBP) (Geffers et al., 2001).
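The rule of assigning a motif to the region that contains its first nucleotide can be sketched in Python as below. The sketch assumes a 250 nt window covering [–200, +50] with no position 0, exact forward-strand matching, and that a boundary position shared by two listed regions (such as –150) is counted in the region it opens; these conventions and the function names are assumptions made for illustration.

```python
# Regions as (label, first position, last position), inclusive, in
# TSS-relative coordinates. Shared boundaries are assigned to the
# region they open (an assumption of this sketch).
REGIONS = [
    ("[-200, -150]", -200, -151),
    ("[-150, -100]", -150, -101),
    ("[-100, -50]", -100, -51),
    ("[-50, -1]", -50, -1),
    ("[+1, +50]", 1, 50),
]


def index_to_position(index: int, upstream: int = 200) -> int:
    """Convert a 0-based window index to a TSS-relative position (no zero)."""
    offset = index - upstream
    return offset if offset < 0 else offset + 1


def region_of(position: int) -> str:
    """Return the label of the region containing a TSS-relative position."""
    for label, low, high in REGIONS:
        if low <= position <= high:
            return label
    raise ValueError(f"position {position} is outside the analysed window")


def motif_regions(motif: str, window: str, upstream: int = 200) -> list:
    """Region label for every occurrence of `motif`, judged by its first base."""
    labels = []
    start = window.find(motif)
    while start != -1:
        labels.append(region_of(index_to_position(start, upstream)))
        start = window.find(motif, start + 1)  # allow overlapping occurrences
    return labels
```

For example, a TATAAA element whose first base lies at position –30 would be counted in the [–50, –1] region.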
The motif AAA(A/C)CCTC was found in the region
–200 to –150 in 46 % of promoters and it contains a central
motif AACC that is at the core of the GT-box (reverse
complement). The presence of GT motifs in the promoters
of anaerobically induced genes of different plant species
was studied by Hoeren et al. (1998). They observed that
the GT motif is present in all anaerobically induced genes
and is located between 300 and 100 bp upstream of the TSS.
In our analysis, the location of the GT motif in the [–200,
–150] region agrees very closely with the observations of
Hoeren et al. In arabidopsis, a GT motif
located in the promoter of the ADH1 gene is responsible
for the induction of anaerobic genes.
Expression profiling of low-oxygen responsive genes in
arabidopsis using a 3500 cDNA array revealed 210 differentially expressed transcripts (Klok et al., 2002). These
genes were organized in six related groups based on their
patterns of RNA accumulation levels. The clustered genes
showed over-representation of 6–10 bp motifs that were
previously described for the ADH1 promoter. Out of the
six motifs listed by Klok et al. (2002), GC and GT motifs
were similar to the two motifs identified by our analysis.
The presence of previously identified motifs in our search
gives an indication that our methodology is reasonable.
The motifs TTTTTCT and TTTTCTTC are each present
in 62 % of promoters in the [–50, –1] region. The motif
GTTT(A/C/T)GCAA was also found in [–50, –1] with 47 %
occurrence. The motif TCATCAC was present in the region
[–50, –1] in 54 % of promoters. One motif, TTCCCTGTT,
is found in 54 % of promoters in the [–100, –50] region.
We also provide for convenience in Fig. 3 positional distributions of the motifs found in 13 promoters of anaerobic
set 1. In Fig. 3, the patterns ATATAAATT (motif 11) and
TATAAAAAC (motif 13) contain the commonly found
TATA-box motif and both have a preferred position in
[–50, –1] as one would expect for a TATA-box motif.
The new motifs not previously reported in plants, such as TCATCAC (motif 7) and TTCCCTGTT (motif 16), have a similar
preferred region [–75, +20], with just one occurrence of each of these
motifs falling outside the region. The motif TCATCAC is
present in the promoter sequences of ADH, sucrose synthase
and aldolase genes, whereas the motif TTCCCTGTT is present in ADH and aldolase genes. The other new motifs
AAACAAA (motif 1) and AGCAGC (motif 2) also show
closer distribution and are found in ADH, sucrose synthase
and aldolase gene sequences. Among the other motifs,
TCCTCCT (motif 14) is distributed around the same region
of the promoters and is found in ADH, PDC and sucrose
synthase genes. The motifs CACAAT (motif 3) and AA(A/G)ATT
(motif 5) are distributed similarly on the promoters
and are found in ADH and sucrose synthase genes. The
motifs TTTTTCT (motif 6) and TTTTCTTC (motif 8) are
both found in ADH, PDC and sucrose synthase genes
and also show a similar distribution pattern. The other
motifs, TTATTA, GTTT(A/C/T)GCAA and CAACTCA,
are more randomly distributed across promoters. Thus,
the positional distributions of different motifs indicate
specific positional preferences that may suggest regulatory
roles for such motifs in the fermentative pathway of
anaerobic genes.
The significance of the motifs found was determined by
searching for their presence in TFBS databases such as the
TRANSFAC, PlantCARE and PLACE databases (Table 1).
Of the 16 motifs studied, eight [TTTTTCT,
TTTTCTTC, AAA(A/C)CCTC, ATATAAATT, (A/C/G)AAAAACAAA,
TATAAAAAC, TCCTCCT and CAACTCA] are reported in the TRANSFAC database as
TFBSs for plants. The motifs CACAAT, TTATTA and
AA(A/G)ATT have been reported in the PLACE database
as parts of motifs of other TFBSs for plants. Only one motif,
AAA(A/C)CCTC, has been reported partly as the TFBS of
the ARE GT motif in maize (Walker et al., 1987). A part of
this motif, AAACC, has been reported in GapC4. The GC
motif of maize with the consensus sequence 5′-GCCCC-3′
was also detected in 11 out of 13 promoters. In the region
[–150, –100], this motif appears in 47 % of promoters. Five
other motifs are not reported as TFBSs for any plant species
but are known as TFBSs for other organisms such as human,
rat, chicken, mouse, Drosophila, etc. (see Table 1). These
motifs are AGCAGC (11 out of 13), AAACAAA (nine out
of 13), TCATCAC (11 out of 13), TTCCCTGTT (nine out
of 13) and GTTT(A/C/T)GCAA (six out of 13) and showed
a high occurrence ranging from 46 to 85 % of promoters
of our group of 13 anaerobic genes. Accordingly, they are
strong candidates to be regulatory elements for anaerobic
metabolism, particularly as they serve as binding sites for
TFs in various animals (Table 1). These motifs now urgently
require additional experimental verification to define their
potential role in control of fermentative pathways.
Presence of animal TFs (unknown anaerobic motifs
detected in anaerobic set 1) in plant homologues
The motifs AAACAAA, AGCAGC, TCATCAC, GTTT(A/C/T)GCAA and TTCCCTGTT, found in anaerobic set 1,
are not known to be binding sites for plant TFs, but are
found in animals. The protein sequences of the animal TFs
were aligned to the protein sequences of arabidopsis and rice
to observe whether the animal TFs exist in plants. The list of
BLAST hits (Table 2) did not provide sufficient similarity to
suggest that TFs similar to animal TFs exist in these two
species. However, they were promising since some of the
hits suggested that these novel TFs do occur in plants. Also,
many of the hits were to plant proteins of unknown function
or to hypothetical plant proteins. Although this analysis was
inconclusive in the sense that we were not able to detect
plant proteins that were very similar to animal TFs, the
suggested outcomes that point to plant TFs or hypothetical
proteins or proteins with unknown function are encouraging
and require further study.
Detection of motifs in promoters of 117 anaerobic
genes in anaerobic set 2
The previously analysed data set of 13 anaerobic genes
was homogeneous in the sense that all the genes belong to
the ethanolic fermentative pathway, and, thus, one could
expect that these genes share many similarities in their
promoter regions. Our finding of common promoter motifs
supports such an assumption. However, many anaerobic
genes code for proteins not involved in fermentation.
Thus, it was important also to examine a selection of such
genes. We used a larger promoter group from 117 genes
(anaerobic set 2) determined from various species. The
genes were chosen by matching gene names to those
highly over- and underexpressed in arabidopsis during
a low-oxygen microarray experiment (Klok et al., 2002).
The selected genes do not belong to the same pathway, but
to many different metabolic pathways. Thus, the overall
similarity of their promoters should be considerably
smaller. Consequently, the motifs that are potentially shared
between such promoters are more likely to belong to
the general transcriptional machinery required for basal
transcription, rather than to a more specific anaerobic
response, although one cannot exclude such a possibility.
Although we selected the top 22 motifs with high frequency
(Table 3), only four motifs (TTTTTGT, TTCATCA,
AAAACC and CAACTT) were, over a length of 5 nt
or more, found to be similar to the motifs detected in
the data set of 13 anaerobic promoters from the
ethanolic fermentative pathway (Table 4). The frequencies
of occurrence of those similar motifs ranged from 48 to
88 %. Among the four similar motifs, the motif TTCATCA
was found in 48 % of sequences and is one of the new motifs
found in the data set of 13 anaerobic genes, a motif that was not
thought to be present in plants. The other partly similar
motif AAAACC was found in 88 % of sequences. A similar
motif (AAACC) has been reported in the GapC4 gene
of maize. This observation indicates that AAAACC could
be a motif common to anaerobic genes shared across
various metabolic pathways. The other two similar motifs,
TTTTTGT and CAACTT, could play some common role in
anaerobic induction of genes related to various mechanisms,
but their nature now needs careful study.
The motifs found in anaerobic set 2,
which relate to different metabolic pathways, were searched for in the
TRANSFAC, PlantCARE and PLACE databases. A few
motifs such as AAAGAAA, AAAGAAAAA, ATTTTTAT,
AAAACC and CAACTT are listed as being present partly
in plants as TFBSs. The others were not listed for plants, but
have been found in other organisms.
[Figure 3A. Motif legend (motif number, motif sequence):
1 AAACAAA; 2 AGCAGC; 3 CACAAT; 4 TTATTA; 5 AA(A/G)ATT; 6 TTTTTCT; 7 TCATCAC; 8 TTTTCTTC;
9 AAA(A/C)CCTC; 10 GTTT(A/C/T)GCAA; 11 ATATAAATT; 12 (A/C/G)AAAAACAAA; 13 TATAAAAAC;
14 TCCTCCT; 15 CAACTCA; 16 TTCCCTGTT.
Promoter sequence legend (sequence number, gene name):
1 Alcohol dehydrogenase-1 in maize; 2 Alcohol dehydrogenase-2 in maize; 3 Alcohol dehydrogenase in arabidopsis;
4 Alcohol dehydrogenase in pea; 5 Sucrose synthase in maize; 6 Sucrose synthase in arabidopsis;
7 Sucrose synthase in rice; 8 Pyruvate decarboxylase in rice; 9 Alcohol dehydrogenase-2 in petunia;
10 Alcohol dehydrogenase-3a in tomato; 11 Alcohol dehydrogenase-3b in tomato;
12 Fructose bisphosphate aldolase in maize; 13 Alcohol dehydrogenase-2 in barley.]
Use of α-amylase as negative control
(negative control set 1)
Sugar availability plays an important role in the production of energy by fermentation in oxygen-starved tissues
(Vartapetian and Jackson, 1997). As the amount of hexoses
and disaccharides is limited, the degradation of starch
reserves becomes crucial for survival under anoxic conditions (Perata et al., 1998). Among the starch-degrading
enzymes, α-amylase plays a major role (Sun and Henson,
1991). There is evidence that, in rice, it is produced
in germinating seeds under anoxia (Perata et al., 1992),
but not in anoxia-intolerant wheat, barley and other
cereals (Gulieminetti et al., 1995). There is also a report
that α-amylase plays a similarly important role in the
anoxia-tolerant rhizome of Acorus calamus (Arpagaus
and Braendle, 2000). Loreti et al. (2003) demonstrated
that α-amylase production under anoxia is mostly due
to the activity of the Ramy3D gene. Due to the critical
role these genes have in the supply of sugar during anoxic
conditions in tolerant rice seeds and other anoxia-tolerant tubers, it was logical to compare these genes with
[Figure 3B: positional map of motifs 1–16 along each of the 13 promoters, plotted from –200 to +50 relative to the TSS.]
FIG. 3. (A) Motifs/symbols used and gene name of the promoter sequences. (B) Positional distribution of 16 motifs detected in anaerobic set 1 for
13 anaerobic genes.
Mohanty et al. — Promoters of Anaerobically Induced Genes
676
T A B L E 1. Motifs found in promoters of 13 anaerobic genes involved in the fermentative pathway (anaerobic set 1) and their
presence in TRANSFAC and PLACE databases
Motifs detected
Presence in TRANSFAC/PLACE
database (for plants)
AAACAAA
Not known in plants
AGCAGC
Not known in plants
CACAAT
TTATTA
AA(A/G)ATT
AS1 in pea and tobacco (PLACE)
AtHB6 in arabidopsis (PLACE)
WRKY1 in cotton/cytokinin gene
in cucumber (PLACE)
GT-1 in tobacco
Not known in plants
TTTTTCT
TCATCAC
TTTTTCTTC
HNF-3alpha, HNF-3B, HNF-3g (insulin-like growth factor-binding protein !) in rat;
FOXJ1 (HNF3/Fkh homologue 4) in mouse
Adf-1 (activate tandem promoters of the ADH gene) in Drosophila; CTCF in
human, Homo sapiens
GT-1 in tobacco, MNB1b, MNF1
in maize
SP1 in maize; SEF3 in soybean,
GCBP-1 in maize; PinII gene in
potato
Not known in plants
GT-1 in tobacco
IAAA4/5 in pea; MyB5 in rice,
GT-1 in tobacco; NAPA in oil
seed rape
GT-1 and GT-1b in tobacco
Pal gene in parsley
SEF3 in soybean
Not known in plants
AAA(A/C)CCTC
GTTT(A/C/T)GCAA
ATATAAATT
(A/C/G)AAAAACAAA
TATAAAAAC
TCCTCCT
CAACTCA
TTCCCTGTT
Presence in TRANSFAC (other organisms)
Nkx6-1 (pancreatic b-cell-specific factor) in rat; EN-1 in mouse,
IPF1 in human (insulin gene enhancer)
FOXF2 in mouse; AREB6 in human; C/EBPa in chicken
LEF-1 in mouse; TCF-1 (T cell factor) in human TCF-1(P)in mouse, TCF-1A,
TCF-1B, TCF-1C,TCF-!E, TCF-1F, TCF-1G and TCF-2a in human
T A B L E 2. List of BLAST hits of TFs (column 2) to proteins in arabidopsis and rice
Motifs detected in anaerobic
set 1 (not known in plants)
Transcription factors
found in animals
Transcription factors and proteins found in
arabidopsis
Transcription factors and
proteins found in rice
AAACAAA
HNF-3A in rat
FOXJ1 in mouse
ADF-1 in Drosophila
CTCF in human
–
–
–
Zinc finger (C2H2 type) family protein/transcription
factor jumonji (jmj) family protein and other proteins
Putative HD-ZIP transcription factor, HD-ZIP
transcription factor 5, 6, 7, 10, 12, 13 and 16, and
other proteins
Putative HD-ZIP transcription factor, HD-ZIP
transcription factor 5, 7, 10, 12 and 17, and other
proteins
Different protein
Zinc finger (C2H2 type) family protein/transcription
factor jumonji (jmj) family protein, CCAAT-boxbinding transcription factor and other proteins
Different proteins
Different proteins
Different proteins
–
–
–
Putative transcription
factor IIIA and other proteins
Different proteins
AGCAGC
TCATCAC
Nkx6-1 in rat
EN-1 in mouse
GTTT(A/C/T)GCAA
FOXF2 in mouse
AREB6 in human
TTCCCTGTT
LEF1 in mouse
TCF1 in human
TCF1 in mouse
anaerobic genes. Thus, we compiled a data set of promoters
of a-amylase genes from various plants as a negative
control set.
Using DMB, we detected motifs in promoters of
a-amylase genes as we did for anaerobic genes. We selected
20 motifs with the highest occurrence. These motifs together
with their percentage occurrence are presented in Table 5.
The results show that the motifs detected are not the same
as those detected for anaerobic genes (Table 1).
Different proteins
Different protein
Different proteins
Different proteins
Different proteins
Different proteins
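The percentage occurrence reported for each motif can be illustrated with a short sketch. It is not the DMB implementation: the function names and the toy sequences are hypothetical, only the given strand is scanned, and a motif written with alternatives such as AA(A/G)ATT is assumed to mean "A or G at that position".

```python
import re


def motif_to_regex(motif: str) -> str:
    """Turn a motif such as 'AA(A/G)ATT' into the regex 'AA[AG]ATT'."""
    return re.sub(
        r"\(([ACGT/]+)\)",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        motif,
    )


def percentage_occurrence(motif: str, promoters: list[str]) -> float:
    """Percentage of promoter sequences containing the motif at least once."""
    pattern = re.compile(motif_to_regex(motif))
    hits = sum(1 for seq in promoters if pattern.search(seq.upper()))
    return 100.0 * hits / len(promoters)


# Toy sequences, invented for illustration only (not real promoters).
toy_promoters = [
    "GGAAAATTCC",  # contains AAAATT
    "CCAAGATTGG",  # contains AAGATT
    "TTTTCCCCGG",  # no match
    "ACGTACGTAC",  # no match
]

print(motif_to_regex("AA(A/G)ATT"))
print(percentage_occurrence("AA(A/G)ATT", toy_promoters))
```

With the toy input, two of the four sequences contain the motif, so the function reports 50 %.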
TABLE 3. Percentage occurrence of detected motifs in promoters of 117 anaerobic genes including the genes detected through microarray experiment (anaerobic set 2)
Motifs | Percentage occurrence
TAATTA | 56
TATATA | 66
AAAGAA | 73
AATCCAA | 58
TTAAAAA | 65
CAACTT | 63
TTTTTGT | 61
TATTATA | 54
AAAACC | 88
TTCATCA | 48
ATATAAT | 56
AAAATAAA | 54
TTAAATTT | 50
TTATATATA | 63
TATATATAC | 80
AAAGAAAAA | 63
TAAAAAG | 57
ATTTTTAT | 68
AAACAT | 79
ACAAAA | 77
TTCCAC | 74
TTTGTT | 74

TABLE 5. Percentage occurrence of detected motifs in promoters of negative control set 1 (α-amylase genes)
Motifs | Percentage occurrence
CTATAA | 100
TTTCCAT | 80
GCAACAC | 73
GACTTG | 80
CCTTTTCA | 87
CCATCAGC | 80
CCAAGCAC | 80
CCATCAAC | 87
AAATACCA | 80
CTATAAATA | 100
AGCCATCA(A/G) | 80
CTTGTA(A/G)CCATC | 53
AGAGTCC(T)GGTA | 53
CA(G/T)TGCCTCCAA | 53
GTAGCCATCAAT | 60
A(G/T)TGCCTCCAA | 53
CCTATAAATACCA | 67
AGCAACACTCCAT | 80
C(A/C)CTGCCTATAAT | 67
CTGCCTATAA | 60
No motifs from 13 anaerobic promoters (anaerobic set 1) are found in the top 20.

We searched for the presence of motifs from α-amylase promoters in the TRANSFAC,
PlantCARE and PLACE databases. Six motifs that were previously reported as TFBSs
for α-amylase genes were identified. These are TTTCCAT (Amy 32B in barley),
CCTTTTCA (Amy 32B in barley), CAGTGCCTCCAA (Amy 3D in rice), GTAGCCATCAAT
(Amy 32B in barley), AGTGCCTCCAA (Amy 3D in rice) and CACTGCCTATAAAT (Amy 3D
in rice). The motifs CTATAA, CCATCAGC, CCATCAAC, AGCCATCA(A/G) and CTGCCTATAA
are unknown in plants, but they are
known for other organisms.
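A lookup of detected motifs against previously reported binding sites can be sketched as follows. This is only an illustration, not the interface of TRANSFAC, PlantCARE or PLACE: the dictionary holds just the six α-amylase sites named in the text, the function name `lookup` is hypothetical, and plain substring comparison is an assumed matching rule.

```python
# The six previously reported sites listed in the text (site -> gene, species).
KNOWN_SITES = {
    "TTTCCAT": "Amy 32B in barley",
    "CCTTTTCA": "Amy 32B in barley",
    "CAGTGCCTCCAA": "Amy 3D in rice",
    "GTAGCCATCAAT": "Amy 32B in barley",
    "AGTGCCTCCAA": "Amy 3D in rice",
    "CACTGCCTATAAAT": "Amy 3D in rice",
}


def lookup(motif: str, known: dict[str, str] = KNOWN_SITES) -> list[tuple[str, str, str]]:
    """Return (site, annotation, relation) for every known site related to the motif."""
    hits = []
    for site, annotation in known.items():
        if motif == site:
            hits.append((site, annotation, "exact"))
        elif motif in site:
            hits.append((site, annotation, "part of known site"))
        elif site in motif:
            hits.append((site, annotation, "contains known site"))
    return hits


print(lookup("TTTCCAT"))      # an exact match to a reported site
print(lookup("AGTGCCTCCAA"))  # exact match, and also part of a longer reported site
print(lookup("GACTTG"))       # no related site in this small list
```

A real database search would also have to handle degenerate positions and the reverse strand, which this sketch leaves out.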
TABLE 4. Comparison of occurrences of different motifs detected in the promoters of 13 anaerobic genes (anaerobic set 1) and promoters of 117 anaerobic genes based on the microarray experiment (anaerobic set 2)
Column 1: Motifs from the anaerobic set 1 of 13 anaerobic genes that participate in the fermentative pathway
Column 2: No. of promoters from anaerobic set 1 that contain the motif
Column 3: No. of promoters from anaerobic set 2 (117 genes expressed in anaerobic stress in a microarray experiment) and motifs found by the ab initio method
Column 4: No. of promoters from anaerobic set 1 that contain the motif listed in column 3
Column 1 | Column 2 | Column 3 | Column 4
AAACAAA | 9 | – | –
AGCAGC | 11 | – | –
CACAAT | 12 | – | –
TTATTA | 9 | – | –
AA(A/G)ATT | 9 | – | –
TTTTTCT | 8 | 71 (TTTTTGT) | 7
TCATCAC | 11 | 56 (TTCATCA) | 9
TTTTCTTC | 8 | – | –
AAA(A/C)CCTC | 7 | 103 (AAAACC) | 12
GTTT(A/C/T)GCAA | 6 | – | –
ATATAAATT | 8 | – | –
(A/C/G)AAAAACAAA | 6 | – | –
TATAAAAAC | 10 | – | –
TCCTCCT | 9 | – | –
CAACTCA | 10 | 73 (CAACTT) | 8
TTCCCTGTT | 9 | – | –
Four motifs from anaerobic set 2 were found to be similar to but not the same as the motifs detected in anaerobic set 1. The four shared motifs could be considered as potential TFBSs of TFs that are more commonly involved in the anaerobic stress response, while the other 12 motifs from column 1 could be considered as more specific to control of anaerobic genes that belong to the ethanolic fermentative pathway.
Detection of motifs in 303 promoters of presumably
non-anaerobic genes (negative control set 2)
To validate our results further and to check that the system did not generate an excessive number of false-positive
motifs, a similar detection protocol was applied using 303
genes having different functions and originating from
different plant species (negative control set 2). The
20 motifs detected with the highest frequency are reported
in Table 6. In this data set, we did not find the same motifs
as in the promoters of anaerobic set 1 (Table 6), but
we did find three shorter motifs that partly overlap with
motifs of anaerobic set 1. Hence, we conclude that the
motifs of the negative control set 2 are not prominent in
anaerobic set 1.
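What "partly overlap" means for two motifs can be made concrete by finding their longest shared run of nucleotides. The sketch below is one possible way to measure this, not the procedure used in the study; the function name is hypothetical and degenerate positions such as (A/C/G) are not handled.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Longest contiguous run of characters shared by a and b (dynamic programming)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended at the previous pair of positions.
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


# Motif pairs taken from the comparison of negative control set 2 with anaerobic set 1.
print(longest_common_substring("CAACTT", "CAACTCA"))     # shared run of 5 nt
print(longest_common_substring("TATAAAT", "ATATAAATT"))  # shorter motif fully contained
```

For CAACTT and CAACTCA the shared run is the first 5 nt, CAACT, while TATAAAT lies entirely within ATATAAATT.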
Out of 20 motifs we detected in this group, only three
partly overlap with the motifs detected in anaerobic set 1
(Table 7). One of them, TATAAAT, found in 79 % of
sequences, contains a commonly found TATA-box that
binds general TFs and, thus, its presence can be expected.
The other two motifs, AAAACAA and CAACTT, that were
similar to motifs detected in anaerobic set 1, were found in
61 and 56 % of sequences, respectively. The motif AAAACAA is present as an auxin-responsive element in pea in the
primary indole acetic acid-inducible gene (Ballas et al.,
1993) and partly overlaps the motif (A/C/G)AAAAACAAA
from anaerobic set 1. The motif CAACTT is not found in
plants, but is present in animals. This motif overlaps with
the longer motif CAACTCA from anaerobic set 1, but only
in the first 5 nt. These two motifs could therefore be
common TFBSs in plants, but we have no evidence for
this yet. However, we observe that all other detected motifs
(17) are different from the most characteristic motifs of
anaerobic set 1, which demonstrates that our method
provides reasonably accurate detection of motifs specific
for anaerobic responsiveness.

TABLE 6. Percentage occurrence of detected motifs in promoters of negative control set 2 (303 promoter sequences from PPD and EPD databases excluding genes not induced or repressed during anoxia)
Motifs | Percentage occurrence
CTATAA | 67
CAAAAT | 59
CACATT | 61
ACACGT | 62
CAACTT | 56
TTAAAAA | 60
GATTTC | 52
ATTTCAT | 45
AAAATATT | 51
TTTTCAT | 68
AAATCCA | 44
TATATAAA | 52
CTATAAATA | 53
AAAAGAA | 57
TATAAAT | 79
GAAAAA | 84
AAAACAA | 61
ACAAAT | 84
TTTTGT | 63
AAAATATA | 45

TABLE 7. Comparison of the occurrence of motifs in anaerobic promoters (anaerobic set 1) and promoters of negative control set 2
Motifs | Anaerobic promoters (13) (anaerobic set 1) | Promoters of negative control set 2 (303)
AAACAAA | 9 | –
AGCAGC | 11 | –
CACAAT | 12 | –
TTATTA | 9 | –
AA(A/G)ATT | 9 | –
TTTTTCT | 8 | –
TCATCAC | 11 | –
TTTTCTTC | 8 | –
AAA(A/C)CCTC | 7 | –
GTTT(A/C/T)GCAA | 6 | –
ATATAAATT | 8 | 239 (TATAAAT)
(A/C/G)AAAAACAAA | 6 | 185 (AAAACAA)
TATAAAAAC | 10 | –
TCCTCCT | 9 | –
CAACTCA | 10 | 171 (CAACTT)
TTCCCTGTT | 9 | –
Thirteen motifs from anaerobic set 1 of the ethanolic fermentative pathway do not appear in any significant proportion in the negative control set 2. Of the top 20 motifs detected in the negative control set 2, only three share similarity with motifs of anaerobic set 1. One of these is similar to the TATA-box, and thus cannot be considered specific for any particular gene group. The other two could be some of the more general motifs required for transcriptional activation of plant genes, but this has yet to be examined.

TABLE 8. Percentage occurrence of detected motifs in promoters of genes induced by cold, drought, high salinity stresses and ABA application
Motifs | Percentage occurrence
(A/C)AACTT | 82
AA(A/T)CAAA | 71
TATATC | 76
ATATAA | 71
ACTCTTT | 76
TTAAAAA | 82
A(G/T)CCATG | 65
CTCTT(C/T)CA | 65
AAATT(A/G)TT | 71
ATATAAATA | 76
TTTCTTTAT | 71
AA(A/G)AAAAA | 71
AACTTTG | 59
AACAAAA(A/T)GA | 47
C(A/T)AAAACAAA | 47
TAAATATAGAT | 65
TTTAAAGA | 76
AATCAATTC | 65
CTATATAA | 76
AAGAAAA(A/T) | 59
With the exception of a partial TATA-box motif and AA(A/T)CAAA which is partly similar to the AAACAAA motif of anaerobic set 1, no other motif from anaerobic set 1 appears significant among the 20 most significant motifs of negative control set 3.
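The promoter counts given in Table 7 and the percentages given in Table 6 for negative control set 2 are related by simple arithmetic. The short sketch below redoes that conversion; the counts and the total of 303 promoters come from the tables, while rounding to the nearest whole per cent is an assumption about how the percentages were reported.

```python
TOTAL_PROMOTERS = 303  # size of negative control set 2

# Promoter counts from Table 7 for the three partly overlapping motifs.
counts = {"TATAAAT": 239, "AAAACAA": 185, "CAACTT": 171}

percentages = {
    motif: round(100 * n / TOTAL_PROMOTERS) for motif, n in counts.items()
}

for motif, pct in percentages.items():
    print(f"{motif}: {counts[motif]}/{TOTAL_PROMOTERS} = {pct} %")
```

Under that rounding assumption the results agree with the 79, 61 and 56 % quoted for these motifs.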
Comparison with motifs in the promoters of
negative control set 3
A number of genes have been identified as being induced
by different abiotic stresses (Thomashow, 1999; Shinozaki
and Yamaguchi-Shinozaki, 2000). It has been reported that
the ADH1 gene in arabidopsis is induced not only by
low-oxygen stress, but also by other abiotic stresses such
as cold and drought and by the hormone abscisic acid (ABA)
(Dolferus et al., 2003). ABA plays an important role
in tolerance of different stress conditions (Shinozaki
and Yamaguchi-Shinozaki, 2000; Zhu, 2002). Thus, it would
be illuminating to look at genes other than ADH that are
induced by factors such as cold, drought and ABA as a
negative control for anaerobic set 1.
Due to the unavailability of promoter sequences in the
PPD and EPD, it was difficult to analyse all individual genes
induced by factors such as cold, drought or ABA. However,
based on cDNA microarray expression analysis performed
in rice (Rabbani et al., 2003), we compiled a data set of
seven genes out of this set whose promoters are present in
PPD and EPD. We detected 20 motifs (Table 8) with high
frequency of occurrence across the promoters. The motifs
detected in this data set were different from the motifs
detected in anaerobic set 1, except for the TATA-box
and AA(A/T)CAAA, which is partly similar to the motif
AAACAAA. The other 18 motifs were very different from
the motifs detected in anaerobic set 1. As discussed
earlier, the TATA-box is a common motif
found in many genes, but the motif AA(A/T)CAAA could
play a common role in stress-induced genes in plants.
This now requires further analysis and experimental
verification. Thus, our results suggest that most of the motifs
detected in this negative control data set are different from
the motifs found in anaerobic genes of the ethanolic fermentative pathway.
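The relationship between AA(A/T)CAAA and AAACAAA can be made explicit by expanding the degenerate motif into the concrete sequences it stands for. This is an illustrative sketch with a hypothetical helper name, and it assumes that a bracketed group lists the alternative bases at a single position.

```python
import re
from itertools import product


def expand(motif: str) -> list[str]:
    """List every concrete sequence described by a motif such as 'AA(A/T)CAAA'."""
    # Each token is either a bracketed group of alternatives or a single base.
    tokens = re.findall(r"\(([^)]+)\)|([ACGT])", motif)
    choices = [alts.split("/") if alts else [base] for alts, base in tokens]
    return ["".join(combo) for combo in product(*choices)]


variants = expand("AA(A/T)CAAA")
print(variants)
print("AAACAAA" in variants)
```

The expansion yields AAACAAA and AATCAAA, so the anaerobic set 1 motif AAACAAA is one of the two variants covered by the degenerate motif.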
Comparison with motifs in the promoters of heat shock
protein genes (negative control set 4)
As a test of specificity, we thought it logical to compare
the motifs we detected in anaerobic genes with those occurring in the promoter regions of other stress response genes.
Accordingly, we compiled a set of promoters of heat shock
protein (HSP) genes. These proteins are usually undetectable under normal growth conditions but become rapidly
induced in response to heat. The accumulation of HSPs
depends on both temperature and duration of the stress
(Howarth, 1991). Not only are the genes induced during
heat stress, but there is also evidence of HSP gene expression in response to osmotic stress, drought stress and also
constitutively at certain developmental stages (Sun et al.,
2002). The way in which HSPs protect cells against some
kinds of stress is not fully understood. They may contribute
to cellular homeostasis and are responsible for protein folding, assembly, translocation and degradation in different
cellular processes. They also help to stabilize proteins
and membranes, and maintain correct protein refolding
(W. Wang et al., 2004).
TABLE 9. Percentage occurrence of detected motifs in promoters of HSP genes, negative control set 4
Motifs | Percentage occurrence
ATATAA | 83
AACAAT | 100
(A/C)CACT(A/T) | 100
CCTTTT(A/T) | 83
GCAGAAG | 67
CTAGAAC | 67
TGTTA(A/T)CG | 67
AGCAAACA | 67
CATCTCAT | 67
AAAAAGGA | 75
CCAGAATT | 75
AAAGTT(A/T)CAT | 67
A(A/C)AAGAGAA | 75
AAACAAAATG | 67
TA(G/T)CATTTTA | 75
GAAT(C/T)TTCTA | 75
AATATCATTT | 75
TCTGGAGA | 83
AAAGTTACAT | 75
CAGAATTTTTC | 75
No motifs from anaerobic promoters are found in the top 20.

In the HSP set, we detected 20 motifs showing a high
frequency of occurrence. Every motif occurred in at least
67 % of the genes, and some were present in all the
genes (Table 9). We also searched for these motifs in the
TRANSFAC, PlantCARE and PLACE databases. Seventeen motifs are already known to be present in plants as
TFBSs, and those not seen in plants before have been found
in other organisms. None of these 20 motifs is present in
the anaerobic genes except the TATA-box. However, the
TATA-box is one of the general core promoter elements
and thus not specific for transcriptional activation of any
particular functional gene group. The motifs detected suggest that those we have found for anaerobic genes do not
have any role in HSP gene activation but may well play a
major role in the control of anaerobic genes themselves.
The list of motifs over-represented in the anaerobic genes
from the ethanolic fermentative pathway, together with the
expression profiles of these genes could provide necessary
clues for reconstruction of parts of regulatory networks for
this pathway. In addition, the results shown here will allow
biological validation using standard methods such as quantitative real-time PCR assays and reporter gene assays of
transgenic plants carrying chimaeric constructs of selected
promoter regions fused to reporter genes.
CONCLUSIONS
We detected motifs in the [−200, +50] region relative to the
TSS of 13 anaerobic genes selected from seven different
plants. Sixteen of the most significant motifs were selected
out of 120 motifs. Of these, eight are reported in the
TRANSFAC database as TFBSs, while another three are
included in the PLACE database as parts of other known
motifs. The remaining five motifs have not been reported
previously for plants as binding sites of TFs. However, they
have been reported as such for other organisms including
humans, mouse, rat, Drosophila and others, increasing the
chances that the new motifs we found in the majority of
anaerobic promoters are biologically active and relevant
to the regulation of anaerobic metabolism in plants. We
also searched for plant homologues (in arabidopsis and
rice) of the animal TFs that bind these motifs. Although the results did
not provide sufficient support to prove that the animal TFs
are present in plants, they do provide some encouraging
clues to guide further analysis, since several hits were to
TFs in plants, although these hits were not of sufficiently strong
similarity. We also detected motifs in a larger promoter
group of 117 genes from various species (anaerobic
set 2). Four motifs, TTTTTGT, TTCATCA, AAAACC and
CAACTT, were found to be similar to the motifs detected in
the data set of 13 anaerobic promoters from the ethanolic
fermentative pathway.
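The [−200, +50] window relative to the TSS can be illustrated with a short slicing sketch. The function name, the 0-based `tss_index` argument and the choice to count the TSS base as the first of the 50 downstream positions are all assumptions made for illustration; the exact coordinate convention used in the study is not restated here.

```python
def promoter_window(sequence: str, tss_index: int,
                    upstream: int = 200, downstream: int = 50) -> str:
    """Return the region from `upstream` bases before the TSS to `downstream` bases from it.

    `tss_index` is the 0-based position of the TSS base in `sequence`
    (an assumed convention). The window is clipped at the sequence ends.
    """
    start = max(0, tss_index - upstream)
    end = min(len(sequence), tss_index + downstream)
    return sequence[start:end]


# Toy sequence, invented for illustration: 300 upstream bases, a TSS base, 100 more bases.
toy = "A" * 300 + "G" + "C" * 100
window = promoter_window(toy, tss_index=300)

print(len(window))   # length of the extracted window
print(window[200])   # the TSS base sits right after the 200 upstream bases
```

Clipping at the sequence ends means that a promoter with less than 200 nt of available upstream sequence simply yields a shorter window.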
We compiled a set of α-amylase promoters as a negative
control to compare with anaerobic promoters. The motifs
found for anaerobic promoters were different from the
motifs detected for α-amylase promoters. We also checked
for the presence of motifs from anaerobic promoters in
a set of 303 promoters compiled from presumably
non-anaerobic genes (negative control set 2). Out of the 20
most significant motifs, only three partly overlapped with
motifs from anaerobic set 1. One of these was similar to
the TATA-box, which is commonly found in the upstream
regions of very many genes. The other two could be TFBSs
that are more commonly shared in plant promoters. To
further validate the motifs found in anaerobic set 1, we
detected motifs in a number of genes induced by cold,
drought, high salinity and ABA application (negative
control set 3), stresses to which ADH also responds. The 16 motifs
detected in anaerobic set 1 were not found in the top 20
significant motifs detected in the negative control set 3, with
the exception of a partial TATA-box motif and AA(A/T)CAAA, which is partly similar to the AAACAAA motif of
anaerobic set 1. This result suggests that although ADH
responds to different stress conditions, the regulation
could be different depending on the stress condition. In
the data set of HSP promoters, none of the 16 motifs
from anaerobic set 1 was present. These observations
indicate that the 16 motifs we detected for anaerobic promoters could be biologically active, as they are highly specific to promoters of anaerobic genes that belong to the
ethanolic fermentative pathway. The five new motifs that
are not yet known as plant TFBSs are potentially new binding sites in plants and they, either individually or in combination with other motifs, could play an important role in
regulating anaerobic metabolism. However, experimental
verification will be necessary to establish the functionality
of these motifs more certainly.
LITERATURE CITED
Altschul SF, Madden T, Schäffer AA, Zhang J, Zhang Z, Miller W,
Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: a new generation
of protein database search programs. Nucleic Acids Research 25:
3389–3402.
Armstrong W. 1979. Aeration in higher plants. Advances in Botanical
Research 7: 225–232.
Arpagaus S, Braendle R. 2000. The significance of α-amylase under anoxia
stress in tolerant rhizomes (Acorus calamus L.) and non-tolerant tubers
(Solanum tuberosum L. var. Desiree). Journal of Experimental Botany
51: 1475–1477.
Bailey TL, Elkan C. 1994. Fitting a mixture model by expectation
maximization to discover motifs in biopolymers. Proceedings of the
Second International Conference on Intelligent Systems for Molecular
Biology (ISMB’94) 2: 28–36, AAAI Press, Menlo Park, California,
August.
Ballas N, Wong LM, Theologis A. 1993. Identification of the auxin-responsive element, AuxRE, in the primary indoleacetic acid-inducible
gene, PS-IAA4/5, of pea (Pisum sativum). Journal of Molecular
Biology 233: 580–596.
Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. 2004.
GenBank update. Nucleic Acids Research 32: D23–D26.
Boeckmann B, Bairoch A, Apweiler R, Blatter M, Estreicher A,
Gasteiger E, et al. 2003. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Research
31: 365–370.
Califano A. 2000. Splash, structural pattern localization analysis by
sequential histograms. Bioinformatics 16: 341–357.
Caselle M, Di Cunto F, Provero P. 2002. Correlating overrepresented
upstream motifs to gene expression: a computational approach to
regulatory element discovery in eukaryotes. BMC Bioinformatics 3: 7.
Dolferus R, Jacobs M, Peacock WJ, Dennis ES. 1994. Differential
interactions of promoter elements in stress responses of the arabidopsis
Adh gene. Plant Physiology 105: 1075–1087.
Dolferus R, Klok EJ, Delessert C, Wilson S, Ismond KP, Good AG,
et al. 2003. Enhancing the anaerobic response. Annals of Botany 91:
111–117.
Fennoy SL, Bailey-Serres J. 1995. Post-transcriptional regulation of
gene expression in oxygen-deprived roots of maize. Plant Journal
7: 287–295.
Geffers R, Sell S, Cerff R, Hehl R. 2001. The TATA box and a Myb binding
site are essential for anaerobic expression of a maize Gap C4 minimal
promoter in tobacco. Biochimica et Biophysica Acta 1521: 120–125.
Greenway H, Gibbs J. 2003. Mechanism of anoxia tolerance in plants. I.
Growth, survival and anaerobic catabolism. Functional Plant Biology
30: 1–47.
Grover A, Pareek A, Singla SL, Minhas D, Katiyar S, Ghawana S, et al.
1998. Engineering crops for tolerance against abiotic stresses through
gene manipulation. Current Science 75: 689–696.
Guglielminetti L, Yamaguchi J, Perata P, Alpi A. 1995. Amylolytic
activities in cereal seeds under aerobic and anaerobic conditions. Plant
Physiology 109: 1069–1076.
van Helden J, Andre B, Collado-Vides J. 1998. Extracting regulatory
sites from the upstream region of yeast genes by computational analysis
of oligonucleotide frequencies. Journal of Molecular Biology 281:
827–842.
Hertz GZ, Stormo GD. 1999. Identifying DNA and protein patterns with
statistically significant alignments of multiple sequences. Bioinformatics 15: 563–577.
Higo K, Ugawa Y, Iwamoto M, Korenaga T. 1999. Plant cis-acting
regulatory DNA elements (PLACE) database. Nucleic Acids Research
27: 297–300.
Hoeren FU, Dolferus R, Wu Y, Peacock WJ, Dennis ES. 1998. Evidence
for a role of AtMYB2 in the induction of the arabidopsis alcohol
dehydrogenase (ADH1) gene by low oxygen. Genetics 149: 479–490.
Howarth CJ. 1991. Molecular responses of plants to an increased incidence
of heat shock. Plant, Cell and Environment 14: 831–841.
Huang E, Yang L, Chowdhary R, Kassim A, Bajic VB. 2005. An algorithm
for ab-initio DNA motif detection. In: Bajic VB, Tan TW, eds.
Information processing and living system. World Scientific, Imperial
College Press, London, 611–614.
Jin H, Martin C. 1999. Multifunctionality and diversity within the plant
MYB-gene family. Plant Molecular Biology 41: 577–585.
Kennedy RA, Rumpho ME, Fox TC. 1992. Anaerobic metabolism in
plants. Plant Physiology 84: 1204–1209.
Khush GS, Baenziger PS. 1998. Crop improvement: emerging trends in
rice and wheat. In: Chopra VL, Singh RB, Verma A, eds. Crop
productivity and sustainability—shaping the future. New Delhi:
Oxford and BH publishing, 113–125.
Klok EJ, Wilson IW, Wilson D, Chapman SC, Ewing RM, Somerville SC,
et al. 2002. Expression profile analysis of the low-oxygen response in
arabidopsis root cultures. Plant Cell 14: 2481–2494.
Lawrence CE, Altschul SF, Boguski MS, Liu JS, Neuwald AF,
Wootton JC. 1993. Detecting subtle sequence signals: a Gibbs
sampling strategy for multiple alignments. Science 262: 208–214.
Lescot M, Dehais P, Thijs G, Marchal K, Moreau Y, Van de Peer Y, et al.
2002. PlantCARE, a database of plant cis-acting regulatory elements
and a portal to tools for in silico analysis of promoter sequences.
Nucleic Acids Research 30: 325–327.
Loreti E, Yamaguchi J, Alpi A, Perata P. 2003. Sugar modulation of
α-amylase genes under anoxia. Annals of Botany 91: 143–148.
Matys V, Fricke E, Geffers R, Gößling E, Haubrock M, Hehl R, et al.
2003. TRANSFAC: transcriptional regulation, from patterns to
profiles. Nucleic Acids Research 31: 374–378.
Olive MR, Walker JC, Singh K, Dennis ES, Peacock WJ. 1990.
Functional properties of the anaerobic response element of the maize
Adh1 gene. Plant Molecular Biology 15: 593–604.
Olive MR, Walker JC, Singh K, Ellis JG, Llewellyn D, Dennis ES, et al.
1991a. The anaerobic response element. Plant Molecular Biology 2:
673–684.
Olive MR, Peacock WJ, Dennis ES. 1991b. The anaerobic responsive
element contains two GC-rich sequences essential for binding a nuclear
protein and hypoxic activation of the maize Adh1 promoter. Nucleic
Acids Research 19: 7053–7060.
Pavesi G, Mereghetti P, Mauri G, Pesole G. 2004. Weeder Web: discovery
of transcription factor binding sites in a set of sequences from
co-regulated genes. Nucleic Acids Research 32: W199–W203.
Perata P, Pozueta-Romero J, Akazawa T, Yamaguchi J. 1992. Effect
of anoxia on starch breakdown in rice and wheat seeds. Planta 188:
611–618.
Perata P, Loreti E, Guglielminetti L, Alpi A. 1998. Carbohydrate
metabolism and anoxia tolerance in cereal grains. Acta Botanica
Neerlandica 47: 269–283.
Poluliakh N, Nakai K. 2003. Extraction of biological motifs by Gibbs
Sampler from the promoters of Homo sapiens, Saccharomyces
cerevisiae and Bacillus subtilis. Genome Informatics 14: 406–407.
Praz V, Perier R, Bonnard C, Bucher P. 2002. The Eukaryotic promoter
database, EPD: new entry types and links to gene expression data.
Nucleic Acids Research 30: 322–324.
Rabbani MA, Maruyama K, Abe H, Khan MA, Katsura K, Ito Y, et al.
2003. Monitoring expression profiles of rice genes under cold, drought,
and high-salinity stresses and abscisic acid application using cDNA
microarray and RNA gel-blot analyses. Plant Physiology 133:
1755–1767.
ap Rees T, Jenkin LET, Smith AM, Wilson PM. 1987. The metabolism
of flood tolerance plants. In: Crawford RMM, ed. Plant life in aquatic
and amphibious habitats. Oxford: Blackwell Scientific, 227–238.
Sachs MM, Freeling M, Okimoto R. 1980. The anaerobic proteins of
maize. Cell 20: 761–767.
Sachs MM, Subbaiah CC, Saab IN. 1996. Anaerobic gene expression and
flooding tolerance in maize. Journal of Experimental Botany 47: 1–15.
Shahmuradov IA, Gammerman AJ, Hancock JM, Bramley PM,
Solovyev VV. 2003. PlantProm: a database of plant promoter
sequences. Nucleic Acids Research 31: 114–117.
Shinozaki K, Yamaguchi-Shinozaki K. 2000. Molecular responses to
dehydration and low temperature: differences and cross-talk between
two stress signaling pathways. Current Opinion in Plant Biology 3:
217–223.
Stormo GD. 2000. DNA binding sites: representation and discovery.
Bioinformatics 16: 16–23.
Sun Z, Henson CA. 1991. A quantitative assessment of the importance
of barley seed α-amylase, debranching enzyme, and α-glucosidase
in starch degradation. Archives of Biochemistry and Biophysics 284:
298–305.
Sun W, Montagu MV, Verbruggen N. 2002. Small heat shock proteins and
stress tolerance in plants. Biochimica et Biophysica Acta 1577: 1–9.
Thomashow MF. 1999. Plant cold acclimation: freezing tolerance genes
and regulatory mechanisms. Annual Review of Plant Physiology and
Plant Molecular Biology 50: 571–599.
Vartapetian BB, Jackson MB. 1997. Plant adaptations to anaerobic stress.
Annals of Botany 79: 3–20.
Vilo J, Brazma A, Jonassen I, Robinson A, Ukkonen E. 2000. Mining for
putative regulatory elements in the yeast genome using gene expression
data. Proceedings of the International Conference on Intelligent
Systems for Molecular Biology 8: 384–394.
Vogel JM, Roth B, Cigan M, Freeling M. 1993. Expression of the
two maize TATA binding protein genes and function of the
encoded TBP proteins by complementation in yeast. Plant Cell 5:
1627–1638.
Walker JC, Howard EA, Dennis ES, Peacock WJ. 1987. DNA sequences
required for anaerobic expression of the maize Adh1 gene.
Proceedings of the National Academy of Sciences of the USA 84:
6624–6629.
Wang W, Vinocur B, Shoseyov O, Altman A. 2004. Role of plant
heat-shock proteins and molecular chaperones in the abiotic stress
response. Trends in Plant Science 9: 244–252.
Wang Z, Dalkilic M, Kim S. 2004. Guiding motif discovery by iterative
pattern refinement. ACM Symposium on Applied Computing, Nicosia,
Cyprus. March 14–17: 162–166.
Werner T, Fessele S, Maier H, Nelson PJ. 2003. Computer modeling of
promoter organization as a tool to study transcriptional coregulation.
FASEB Journal 17: 1228–1237.
Yanagisawa S. 2002. The Dof family of plant transcription factors. Trends in
Plant Science 7: 555–560.
Yang L, Huang E, Bajic VB. 2005. Some implementation issues of heuristic
methods for motif extraction from DNA sequences. International
Journal of Computers, System, and Signals (in press).
Zhu JK. 2002. Salt and drought stress signal transduction in plants.
Annual Review of Plant Biology 53: 247–273.