Annals of Botany 96: 669–681, 2005
doi:10.1093/aob/mci219, available online at www.aob.oupjournals.org

Detection and Preliminary Analysis of Motifs in Promoters of Anaerobically Induced Genes of Different Plant Species

BIJAYALAXMI MOHANTY1,*, S. P. T. KRISHNAN1, SANJAY SWARUP2 and VLADIMIR B. BAJIC1

1Knowledge Extraction Laboratory, Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613 and 2Department of Biological Sciences, National University of Singapore, Singapore

* For correspondence. E-mail [email protected]

Received: 29 October 2004 Returned for revision: 16 December 2004 Accepted: 31 January 2005 Published electronically: 18 July 2005

Background and Aims. Plants can suffer from oxygen limitation during flooding or more complete submergence and may therefore switch from Krebs cycle respiration to fermentation, in association with the expression of anaerobically inducible genes coding for enzymes involved in glycolysis and fermentation. The aim of this study was to clarify mechanisms of transcriptional regulation of these anaerobic genes by identifying motifs shared by their promoter regions.

Methods. Statistically significant motifs were detected by an in silico method from 13 promoters of anaerobic genes. The selected motifs were common to the majority of analysed promoters. Their significance was evaluated by searching for their presence in transcription factor-binding site databases (TRANSFAC, PlantCARE and PLACE). Using several negative control data sets, it was tested whether the motifs found were specific to the anaerobic group.

Key Results. Previously, anaerobic response elements have been identified in maize (Zea mays) and arabidopsis (Arabidopsis thaliana) genes. Known functional motifs were detected, such as GT and GC motifs, but also other motifs shared by most of the genes examined. Five motifs detected have not been found in plants hitherto but are present in the promoters of animal genes with various functions. The consensus sequences of these novel motifs are 5′-AAACAAA-3′, 5′-AGCAGC-3′, 5′-TCATCAC-3′, 5′-GTTT(A/C/T)GCAA-3′ and 5′-TTCCCTGTT-3′.

Conclusions. We propose that the promoter motifs identified could be functional, conferring anaerobic sensitivity on the genes that possess them. This proposal now requires experimental verification.

Key words: Anaerobic genes, promoters, motifs, anaerobic response elements, ab initio motif detection, transcription factors, transcription factor-binding sites, Arabidopsis thaliana, ethanolic fermentative pathway.

INTRODUCTION

Plants often suffer from a shortage of oxygen during partial or complete submergence. Initially the inundated parts, especially roots, suffer from hypoxia, which later turns to anoxia, as the slow diffusion of oxygen in water (10 000 times slower than in air, Armstrong, 1979) fails to match the demands of respiration. Rice (Oryza sativa) is the only cereal crop that can tolerate anaerobic conditions for prolonged periods. In addition, seasonal rainfall affects several other important arable crops such as wheat (Triticum aestivum), maize (Zea mays), barley (Hordeum vulgare) and other cereals through waterlogging of certain soils, which also leads to low-oxygen stress, particularly in the root zone. Anaerobic stress, together with other abiotic stresses, is the prime cause of crop loss worldwide (Grover et al., 1998; Khush and Baenziger, 1998). Thus, elucidation of mechanisms of plant response and adaptation to this stress is of considerable scientific and economic importance. Despite continuing efforts, the mechanisms governing submergence tolerance remain elusive. One step in this direction would be to understand the regulation of anaerobic genes at the genome level.

During anaerobiosis, plants switch from Krebs cycle respiration to fermentation.
Although there are a number of fermentative pathways operating during anoxia (ethanol, lactic acid and alanine fermentation; Kennedy et al., 1992), ethanolic fermentation is the main energy-producing pathway (ap Rees et al., 1987; Greenway and Gibbs, 2003). How this pathway is controlled is not clearly understood, although various anaerobic proteins (ANPs) are induced in roots during anaerobiosis. Approximately 20 ANPs have been identified in maize roots by cDNA cloning, and most are enzymes of glycolysis and fermentation such as sucrose synthase, pyrophosphate-dependent phosphofructokinase, fructose-1,6-phosphate aldolase, glucose-6-phosphate isomerase, glyceraldehyde-3-phosphate dehydrogenase, alcohol dehydrogenase, lactate dehydrogenase, pyruvate decarboxylase and others (Sachs et al., 1980, 1996). The expression of such ANPs is controlled predominantly at the transcriptional level, although a post-transcriptional regulatory mechanism has also been demonstrated (Fennoy and Bailey-Serres, 1995).

© The Author 2005. Published by Oxford University Press on behalf of the Annals of Botany Company. All rights reserved. For Permissions, please email: [email protected]

It is generally believed that genes having similar expression patterns contain common motifs in their promoter regions (Vilo et al., 2000). Klok et al. (2002) analysed the low-oxygen response in arabidopsis root cultures and found that the transcriptional regulatory regions of genes with similar expression share similar motifs. Thus, a common set of transcription factors (TFs) is likely to control these genes. Common promoter motifs are the key signatures for a family of co-regulated genes and are usually present in the regions where complex protein interactions occur (Z. Wang et al., 2004).
However, in some cases, single motifs can bind various transcription factors, thereby bringing the genes under multiple regulatory controls (Jin and Martin, 1999; Yanagisawa, 2002). Extensive studies on 500 bp upstream regions of yeast promoters suggest that regulatory elements are commonly present in those regions (Caselle et al., 2002). In eukaryotes, the computational detection of regulatory sites is difficult as the sequences where TFs bind are generally much shorter than in prokaryotes (van Helden et al., 1998). In addition, they are generally active in both orientations and can be dispersed over a large distance. Sometimes, they can be present in introns and also in distal parts of the promoter (Caselle et al., 2002). While many computer programs have been developed to detect biological motifs (Lawrence et al., 1993; Bailey and Elkan, 1994; Hertz and Stormo, 1999; Califano, 2000; Pavesi et al., 2004; Huang et al., 2005; Yang et al., 2005; and others), it is still a considerable challenge to detect previously defined regulatory sites accurately in regions of interest, let alone to identify new regulatory sites responsible for activation and expression of a functional gene group.

Anaerobically induced genes are often characterized by the presence of anaerobic response elements (AREs) in their promoter regions (Walker et al., 1987). AREs have been reported in promoters of maize ADH1, ADH2 and aldolase genes, and arabidopsis (Arabidopsis thaliana) ADH1, LDH1 and PDC1 genes (Olive et al., 1990; Dolferus et al., 1994; Hoeren et al., 1998). Two motifs, the GT motif and GC motif, have been identified as AREs in these promoters (Olive et al., 1990, 1991a, b). The transcription factor AtMYB2 in arabidopsis binds to the GT motif of the ADH1 promoter and a GC-binding protein binds to the GC motif (Olive et al., 1991b).
The consensus sequence for an AtMYB2-binding site present in all known anaerobically induced genes is 5′-AAACC(G/A)(G/A)-3′ (reverse complement of the GT-box) (Hoeren et al., 1998), while the one for the GC motif is 5′-GCCCC-3′. There is evidence that the transcription factor AtMYB2 is induced by low-oxygen stress (Hoeren et al., 1998) and, thus, could be another important factor for the regulation of anaerobic genes.

Our aim was to find common motifs, shared by the majority of regulatory regions of anaerobic genes, in addition to the previously well-characterized GC and GT motifs. Detection of new motifs that could be potential transcription factor-binding sites (TFBSs) could help in understanding the transcriptional regulation of these anaerobic genes under stress conditions. More importantly, a clearer understanding of the promoter content and architecture would allow a better understanding of co-ordinate regulation and could provide background for reconstruction of parts of gene regulatory networks of anaerobic responsive genes (Werner et al., 2003).

TFBSs usually have a length of 4–20 nt with 0–50 % mismatches per motif (Poluliakh and Nakai, 2003). Important motifs can be discovered computationally as patterns common to most sequences in a family of sequences, either as sequence patterns based on sequence comparison or from comparison of structures (Z. Wang et al., 2004). Many biologically relevant patterns have been found using motif discovery algorithms. There are several programs available for the extraction of motifs, such as MEME (Bailey and Elkan, 1994), GibbsDNA (Lawrence et al., 1993), CONSENSUS (Hertz and Stormo, 1999), SPLASH (Califano, 2000) and Weeder Web (Pavesi et al., 2004), among others. We have used the DRAGON MOTIF BUILDER (DMB) program (Huang et al., 2005; Yang et al., 2005) for our detection work.
This program has already been applied successfully to the analysis of 11 500 mouse and 18 300 human promoters to detect motifs in an ab initio manner (Huang et al., 2005).

In this report, we describe several new common patterns that have not been reported previously for plants and could be functional TFBSs. Confidence in our list of over-represented motifs is strengthened by the recovery of several already known TFBSs in our target group of anaerobic genes. Using negative control promoter sets, we demonstrate that the patterns we discovered are over-represented only in the anaerobic promoter target group and are specific to that group, giving further support to our hypothesis that the new motifs we report here could have a functional role as anaerobic promoter elements.

METHODS

Promoter sequences

Plant promoter sequences were extracted from SoftBerry’s Plant Promoter Database (PPD) (Shahmuradov et al., 2003), the Eukaryotic Promoter Database (EPD) (Praz et al., 2002) and GenBank (Benson et al., 2004). The nature of our study requires very accurate location of transcription start sites (TSSs). This is provided in the PPD and EPD databases, as the TSS locations of sequences there were verified experimentally. Unfortunately, data in GenBank lack accuracy in this respect, and every sequence has to be manually checked, which is a virtually impossible task. Similarly, promoter data for arabidopsis (http://arabidopsis.med.ohiostate.edu/AtcisDB/) are predicted and thus of inadequate accuracy for our tasks. Consequently, the main sources of our promoter data have been PPD and EPD. In our study, we used only the 200 bp upstream of the TSS, mainly because of the availability of promoter sequences in SoftBerry’s PPD. For all the anaerobic genes, we extracted [–200, +50] regions relative to the TSS for the motif detection analysis.
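As an illustration only, the [–200, +50] window can be sketched in a few lines of Python. This is not the extraction procedure used with PPD or EPD; the function names and the 0-based `tss_index` argument are hypothetical, and the sketch assumes the usual convention in which the TSS is position +1 and there is no position 0.

```python
def extract_window(seq, tss_index, upstream=200, downstream=50):
    """Return the [-upstream, +downstream] region around the TSS.

    seq        -- promoter-containing sequence as a string
    tss_index  -- 0-based index of the TSS nucleotide (position +1); assumed input
    """
    start = tss_index - upstream          # index of position -upstream
    end = tss_index + downstream          # one past the index of position +downstream
    if start < 0 or end > len(seq):
        raise ValueError("sequence too short for the requested window")
    return seq[start:end]


def to_relative(index, tss_index):
    """Convert a 0-based index to a TSS-relative coordinate (no position 0)."""
    offset = index - tss_index
    return offset + 1 if offset >= 0 else offset


if __name__ == "__main__":
    # Toy sequence: 300 nt upstream, a marked TSS nucleotide, 99 nt downstream.
    toy = "A" * 300 + "G" + "C" * 99
    window = extract_window(toy, tss_index=300)
    print(len(window), to_relative(100, 300), to_relative(300, 300))
```

With the defaults the window is 250 nt long (200 upstream positions plus positions +1 to +50), which is why sequences lacking a reliable TSS annotation cannot be used in this kind of analysis.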
Data set of anaerobic genes (anaerobic set 1)

A group of 13 anaerobic genes that belong to the ethanolic fermentative pathway from seven different plant species were used to identify probable promoter regulatory elements (motifs). We extracted six promoter sequences from PPD, five sequences from EPD and two from GenBank. The genes included are maize alcohol dehydrogenase 1 (ADH1), maize ADH2, arabidopsis ADH, pea (Pisum sativum) ADH1, petunia (Petunia hybrida) ADH2, tomato (Lycopersicon esculentum) ADH3a, tomato ADH3b, barley ADH2, maize sucrose synthase, arabidopsis sucrose synthase, rice sucrose synthase, rice pyruvate decarboxylase (PDC) and maize fructose bisphosphate aldolase. We denote this promoter set as anaerobic set 1. Although anaerobic gene sequences from this same pathway, such as arabidopsis PDC1, maize glyceraldehyde-3-phosphate dehydrogenase, rice ADH2, cotton (Gossypium hirsutum) ADH2b-2 and arabidopsis lactate dehydrogenase 1, were available, we were not able to use them since they lack accurate information regarding TSSs and promoter regions.

Tool for motif detection (DMB program)

To find the known and unknown promoter motifs in the compiled promoter sequences, we used the DMB program (http://research.i2r.a-star.edu.sg/DRAGON/Motif_Search/) and the following parameters: EM2, single motif occurrence in the sequences with zero or one motif per sequence. We searched for all motifs of lengths 5–12 nt, with a total of ten motifs per session. In the sessions, we manually changed the thresholds.

Analysis of motifs

A total of 120 motifs were detected in the promoters of the 13 anaerobic genes using the DMB program with different thresholds. Out of the 120 motifs, we selected 16 motifs with the highest frequency of appearance.
The significance of the selected motifs was evaluated by searching for their presence in TFBS databases such as TRANSFAC (Matys et al., 2003; http://www.gene-regulation.com), PlantCARE (Lescot et al., 2002; http://oberon.rug.ac.be:8080/PlantCARE/index.html) and PLACE (Higo et al., 1999; http://www.dna.affrc.go.jp/htdocs/PLACE/).

Homology search for TFs that could bind motifs (found in anaerobic set 1) that are unknown in plants

We found five motifs in anaerobic set 1 that are not known to act as TFBSs in plants, but are known to be TFBSs in animal cells. We searched for homology between the TFs that bind these animal TFBSs and arabidopsis and rice proteins. We used BLAST (Altschul et al., 1997) and the internet service at http://www.ncbi.nlm.nih.gov/BLAST/producttable.shtml. The protein sequences found in animals were extracted from the Swiss-Prot Protein Knowledgebase (Boeckmann et al., 2003; http://tw.expasy.org/sprot/).

Data set of 117 anaerobic genes involved in signal transduction/transcription and various metabolic pathways (anaerobic set 2)

Based on the results of a microarray experiment on the low-oxygen response in arabidopsis root culture (Klok et al., 2002), we selected a larger data set of anaerobic genes that are highly overexpressed or underexpressed. Based on gene names in this group, we extracted 117 promoter sequences of different plant species from PPD and EPD, and we denote this set as anaerobic set 2. This set includes the 13 anaerobic genes of the ethanolic fermentative pathway (anaerobic set 1), as well as many other genes of a heterogeneous nature involved in signal transduction/transcription, which belong to a number of different metabolic pathways. With this set, we aimed to check to what extent motifs found in anaerobic set 1 are shared with promoters of anaerobic set 2.
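The sharing of a motif between promoter sets can be illustrated with a minimal Python sketch. It is not the DMB procedure: it assumes exact matching of a consensus written in the notation used here, e.g. GTTT(A/C/T)GCAA, whereas DMB motif groups tolerate mismatches. The function names and the optional `both_strands` switch are hypothetical and introduced only for illustration.

```python
import re


def consensus_to_regex(consensus):
    """Convert notation such as 'GTTT(A/C/T)GCAA' into a compiled regex."""
    def to_class(match):
        # '(A/C/T)' becomes the character class '[ACT]'
        return "[" + match.group(1).replace("/", "") + "]"
    pattern = re.sub(r"\(([ACGT](?:/[ACGT])+)\)", to_class, consensus.upper())
    return re.compile(pattern)


def reverse_complement(seq):
    """Reverse complement of a DNA string containing only A, C, G and T."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def fraction_with_motif(consensus, promoters, both_strands=False):
    """Fraction of promoters with at least one exact consensus match."""
    if not promoters:
        raise ValueError("promoter set is empty")
    rx = consensus_to_regex(consensus)
    hits = 0
    for seq in promoters:
        s = seq.upper()
        if rx.search(s) or (both_strands and rx.search(reverse_complement(s))):
            hits += 1
    return hits / len(promoters)


if __name__ == "__main__":
    # Toy sequences, not real promoters.
    toy_set = ["CCGTTTAGCAACC", "GGTTGCGAAACGG", "AAAAAAAA"]
    print(fraction_with_motif("GTTT(A/C/T)GCAA", toy_set))
    print(fraction_with_motif("GTTT(A/C/T)GCAA", toy_set, both_strands=True))
```

Running the same function on a target set and on a control set gives two fractions that can be compared directly; any statistical test applied to them would be an addition to, not a description of, the analysis reported here.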
Negative control set 1 (data set of α-amylase genes)

A data set of 15 promoters of α-amylase genes from four plants of the cereal group [rice, wild oat (Avena fatua), barley and wheat] was selected as negative control set 1. α-Amylase was chosen as a negative control since it is a starch-degrading enzyme and is known to be anaerobically induced only in rice, which is in contrast to our sugar-degrading anaerobic genes. We extracted promoter regions of these genes from the PPD and EPD databases.

Negative control set 2 (promoters from PPD and EPD, excluding known anaerobically induced genes)

From PPD and EPD, we extracted another negative data set of 303 genes having different functions and originating from different plants. This set includes neither the anaerobic genes nor the genes differentially expressed by low oxygen in the microarray experiment of Klok et al. (2002). This was mainly done to observe whether the motifs detected for the anaerobic genes in the ethanolic fermentative pathway would also turn up in this negative control set.

Negative control set 3 (promoters of genes induced by cold, drought, high salinity stresses and ABA application)

Promoter sequences of the genes dehydrin, aldehyde dehydrogenase, protease inhibitor, catalase, chlorophyll a-b binding protein, actin and phenylalanine ammonia-lyase (based on expression of rice genes in microarray experiments; Rabbani et al., 2003) induced by cold, drought, high salinity and abscisic acid (ABA) application were extracted from PPD and EPD. Altogether, 17 sequences were extracted from seven different plants [arabidopsis, rice, potato (Solanum tuberosum), tomato, pea, curled-leaved tobacco (Nicotiana plumbaginifolia), wood tobacco (Nicotiana sylvestris) and oilseed rape (Brassica napus)].
Negative control set 4 (promoters of heat shock protein genes)

To compare the motifs detected in the anaerobic genes with other stress response genes, we extracted from PPD and EPD a set of 12 genes responsive to heat stress from four different plant species [rice, arabidopsis, soybean (Glycine max) and maize].

RESULTS AND DISCUSSION

Selection and significance of motifs

We analysed 13 anaerobic genes of the ethanolic fermentative pathway from seven plant species (anaerobic set 1) and searched for shared promoter motifs. This pathway is the most prominent pathway involved in anaerobic stress. Out of 120 motif groups detected, we selected 16 that were the most frequent (they appeared in >46 % of promoters, i.e. in six or more promoters) (Fig. 1). The maximum occurrence in anaerobic promoters was 92 % (12 out of 13), and 13 motifs occurred in 62 % (eight out of 13) or more of the promoters. The total information content (IC) (for definition see Stormo, 2000 and references therein) of each motif group is given in Fig. 2. The minimum value for the total IC for each motif was >66 % of the maximum possible value, while the maximum was 87 % of that value. This indicates that the motifs in individual groups are very homogeneous (i.e. very similar to each other).

FIG. 1. Sixteen different motif groups were detected in the promoter sequences of 13 anaerobic genes involved in the fermentative pathway (anaerobic set 1). Each motif group is presented by the consensus sequence for the group. The number of promoters that contain the motif for specific groups is given. [Bar chart not reproduced; x-axis: motifs detected; y-axis: number of occurrences in different sequences.]

FIG. 2. The total information content (IC) for each motif group is given. These values indicate how homogeneous the motifs in an individual motif group are. The smaller the difference between the maximum IC and the total IC for the group, the more homogeneous are the motifs in the group, i.e. they are more similar to each other. These data are provided for the 16 top-ranked motifs detected in the promoters of 13 anaerobic genes involved in the fermentative pathway (anaerobic set 1). The motifs are of different length, and the maximum IC for each motif group is 2 × motif length. All motif groups have a total IC >66 % of the maximum possible IC for motifs of that length, indicating that motifs in the detected motif groups are very similar to each other within the group. [Bar chart not reproduced; x-axis: length and type of motif detected; y-axis: total information content.]

We analysed the distribution of motifs in different segments of promoters and, if the first nucleotide of the motif fell within the considered region, we counted that motif as appearing in that region. We looked for the presence of motifs in the [–200, –150], [–150, –100], [–100, –50], [–50, –1] and [+1, +50] regions. We observed that most of the motifs were found in the regions [–200, –150], [–150, –100] and [–50, –1]. The highest percentage of the motifs was found in the region [–50, –1]. The motifs AGCAGC, CACAAT, TTATTA and CAACTCA were found in all upstream regions. However, most interestingly, some motifs seem to prefer very specific regions. For example, the ATATAAATT motif has a preferred region [–50, –1] where it is found in 77 % of promoters.
The TATAAAAAC motif appeared in 47 % of promoters in the same region. Both of these motifs contain the TATA-box motif TATAAA. It is commonly believed that the TATA-box is present at positions around –30 relative to the TSS, which agrees well with our result. In plants, two types of TATA-binding proteins are present, which bind to the TATA-box (Vogel et al., 1993). In tobacco, the TATA-box of the GapC4 promoter is required for anaerobic gene expression and is bound by a TATA-box-binding protein (TBP) (Geffers et al., 2001). The motif AAA(A/C)CCTC was found in the region –200 to –150 in 46 % of promoters and it contains a central motif AACC that is at the core of the GT-box (reverse complement). The presence of GT motifs in the promoters of anaerobically induced genes of different plant species was studied by Hoeren et al. (1998). They observed that the GT motif is present in all anaerobically induced genes and is located within 300 to 100 bp upstream of the TSS. In our analysis, the location of the GT motif in the [–200, –150] region agrees very closely with the observations of Hoeren et al. In arabidopsis, the presence of a GT motif located in the promoter of the ADH1 gene is responsible for the induction of anaerobic genes. Expression profiling of low-oxygen responsive genes in arabidopsis using a 3500 cDNA array revealed 210 differentially expressed transcripts (Klok et al., 2002). These genes were organized in six related groups based on their patterns of RNA accumulation levels. The clustered genes showed over-representation of 6–10 bp motifs that were previously described for the ADH1 promoter. Out of the six motifs listed by Klok et al. (2002), the GC and GT motifs were similar to the two motifs identified by our analysis. The presence of previously identified motifs in our search gives an indication that our methodology is reasonable. The motifs TTTTTCT and TTTTCTTC are each present in 62 % of promoters in the [–50, –1] region.
The motif GTTT(A/C/T)GCAA was also found in [–50, –1] with 47 % occurrence. The motif TCATCAC was present in the region [–50, –1] in 54 % of promoters. One motif, TTCCCTGTT, is found in 54 % of promoters in the [–100, –50] region.

For convenience, Fig. 3 shows the positional distributions of the motifs found in the 13 promoters of anaerobic set 1. In Fig. 3, the patterns ATATAAATT (motif 11) and TATAAAAAC (motif 13) contain the commonly found TATA-box motif and both have a preferred position in [–50, –1], as one would expect for a TATA-box motif. The new motifs not previously found in plants, such as TCATCAC (motif 7) and TTCCCTGTT (motif 16), have a similar preferred region [–75, +20], with just one of each of these motifs falling outside the region. The motif TCATCAC is present in the promoter sequences of ADH, sucrose synthase and aldolase genes, whereas the motif TTCCCTGTT is present in ADH and aldolase genes. The other new motifs AAACAAA (motif 1) and AGCAGC (motif 2) also show similar distributions and are found in ADH, sucrose synthase and aldolase gene sequences. Among the other motifs, TCCTCCT (motif 14) is distributed around the same region of the promoters and is found in ADH, PDC and sucrose synthase genes. The motifs CACAAT (motif 3) and AA(A/G)ATT (motif 5) are distributed similarly on the promoters and are found in ADH and sucrose synthase genes. The motifs TTTTTCT (motif 6) and TTTTCTTC (motif 8) are both found in ADH, PDC and sucrose synthase genes and also show a similar distribution pattern. The other motifs, TTATTA, GTTT(A/C/T)GCAA and CAACTCA, are more randomly distributed across promoters. Thus, the positional distributions of different motifs indicate specific positional preferences that may suggest regulatory roles for such motifs in the fermentative pathway of anaerobic genes.

The significance of the motifs found was determined by searching for their presence in TFBS databases such as the TRANSFAC, PlantCARE and PLACE databases (Table 1).
Out of 16 motifs studied, eight [TTTTTCT, TTTTCTTC, AAA(A/C)CCTC, ATATAAATT, (A/C/G)AAAAACAAA, TATAAAAAC, TCCTCCT and CAACTCA] are reported in the TRANSFAC database as TFBSs for plants. The motifs CACAAT, TTATTA and AA(A/G)ATT have been reported in the PLACE database as parts of motifs of other TFBSs for plants. Only one motif, AAA(A/C)CCTC, has been reported partly as the TFBS of the ARE GT motif in maize (Walker et al., 1987). A part of this motif, AAACC, has been reported in GapC4. The GC motif of maize with the consensus sequence 5′-GCCCC-3′ was also detected in 11 out of 13 promoters. In the region [–150, –100], this motif appears in 47 % of promoters. Five other motifs are not reported as TFBSs for any plant species but are known as TFBSs for other organisms such as human, rat, chicken, mouse, Drosophila, etc. (see Table 1). These motifs are AGCAGC (11 out of 13), AAACAAA (nine out of 13), TCATCAC (11 out of 13), TTCCCTGTT (nine out of 13) and GTTT(A/C/T)GCAA (six out of 13), and showed a high occurrence ranging from 46 to 85 % of promoters of our group of 13 anaerobic genes. Accordingly, they are strong candidates to be regulatory elements for anaerobic metabolism, particularly as they serve as binding sites for TFs in various animals (Table 1). These motifs now require experimental verification to define their potential role in the control of fermentative pathways.

Presence of animal TFs (unknown anaerobic motifs detected in anaerobic set 1) in plant homologues

The motifs AAACAAA, AGCAGC, TCATCAC, GTTT(A/C/T)GCAA and TTCCCTGTT, found in anaerobic set 1, are not known to be binding sites for plant TFs, but are found in animals. The protein sequences of the animal TFs were aligned to the protein sequences of arabidopsis and rice to observe whether the animal TFs exist in plants. The list of BLAST hits (Table 2) did not provide sufficient similarity to suggest that TFs similar to animal TFs exist in these two species. However, the hits were promising since some of them suggested that these novel TFs do occur in plants. Also, many of the hits were to plant proteins of unknown function or to hypothetical plant proteins. Although this analysis was inconclusive in the sense that we were not able to detect plant proteins that were very similar to animal TFs, the hits that point to plant TFs, hypothetical proteins or proteins with unknown function are encouraging and require further study.

FIG. 3. (A) Motifs/symbols used and gene name of the promoter sequences. (B) Positional distribution of 16 motifs detected in anaerobic set 1 for 13 anaerobic genes. [Panel B not reproduced; it plots motif positions along each promoter from –200 to +50 with the TSS marked.]
Motif legend (panel A): 1, AAACAAA; 2, AGCAGC; 3, CACAAT; 4, TTATTA; 5, AA(A/G)ATT; 6, TTTTTCT; 7, TCATCAC; 8, TTTTCTTC; 9, AAA(A/C)CCTC; 10, GTTT(A/C/T)GCAA; 11, ATATAAATT; 12, (A/C/G)AAAAACAAA; 13, TATAAAAAC; 14, TCCTCCT; 15, CAACTCA; 16, TTCCCTGTT.
Promoter sequence legend (panel A): 1, alcohol dehydrogenase-1 in maize; 2, alcohol dehydrogenase-2 in maize; 3, alcohol dehydrogenase in arabidopsis; 4, alcohol dehydrogenase in pea; 5, sucrose synthase in maize; 6, sucrose synthase in arabidopsis; 7, sucrose synthase in rice; 8, pyruvate decarboxylase in rice; 9, alcohol dehydrogenase-2 in petunia; 10, alcohol dehydrogenase-3a in tomato; 11, alcohol dehydrogenase-3b in tomato; 12, fructose bisphosphate aldolase in maize; 13, alcohol dehydrogenase-2 in barley.

TABLE 1. Motifs found in promoters of 13 anaerobic genes involved in the fermentative pathway (anaerobic set 1) and their presence in TRANSFAC and PLACE databases
Motifs detected | Presence in TRANSFAC/PLACE database (for plants) | Presence in TRANSFAC (other organisms)
AAACAAA | Not known in plants | HNF-3alpha, HNF-3B, HNF-3γ (insulin-like growth factor-binding protein 1) in rat; FOXJ1 (HNF3/Fkh homologue 4) in mouse
AGCAGC | Not known in plants | Adf-1 (activate tandem promoters of the ADH gene) in Drosophila; CTCF in human, Homo sapiens
CACAAT | AS1 in pea and tobacco (PLACE) |
TTATTA | AtHB6 in arabidopsis (PLACE) |
AA(A/G)ATT | WRKY1 in cotton/cytokinin gene in cucumber (PLACE) |
TTTTTCT | GT-1 in tobacco |
TCATCAC | Not known in plants | Nkx6-1 (pancreatic β-cell-specific factor) in rat; EN-1 in mouse, IPF1 in human (insulin gene enhancer)
TTTTTCTTC | GT-1 in tobacco, MNB1b, MNF1 in maize |
AAA(A/C)CCTC | SP1 in maize; SEF3 in soybean, GCBP-1 in maize; PinII gene in potato |
GTTT(A/C/T)GCAA | Not known in plants | FOXF2 in mouse; AREB6 in human; C/EBPα in chicken
ATATAAATT | GT-1 in tobacco |
(A/C/G)AAAAACAAA | IAAA4/5 in pea; MyB5 in rice, GT-1 in tobacco; NAPA in oil seed rape |
TATAAAAAC | GT-1 and GT-1b in tobacco |
TCCTCCT | Pal gene in parsley |
CAACTCA | SEF3 in soybean |
TTCCCTGTT | Not known in plants | LEF-1 in mouse; TCF-1 (T cell factor) in human; TCF-1(P) in mouse; TCF-1A, TCF-1B, TCF-1C, TCF-1E, TCF-1F, TCF-1G and TCF-2a in human

TABLE 2. List of BLAST hits of TFs (column 2) to proteins in arabidopsis and rice
Motifs detected in anaerobic set 1 (not known in plants) | Transcription factors found in animals | Transcription factors and proteins found in arabidopsis | Transcription factors and proteins found in rice
AAACAAA | HNF-3A in rat | – | –
AAACAAA | FOXJ1 in mouse | – | –
AGCAGC | ADF-1 in Drosophila | – | –
AGCAGC | CTCF in human | Zinc finger (C2H2 type) family protein/transcription factor jumonji (jmj) family protein and other proteins | Putative transcription factor IIIA and other proteins
TCATCAC | Nkx6-1 in rat | Putative HD-ZIP transcription factor, HD-ZIP transcription factor 5, 6, 7, 10, 12, 13 and 16, and other proteins | Different proteins
TCATCAC | EN-1 in mouse | Putative HD-ZIP transcription factor, HD-ZIP transcription factor 5, 7, 10, 12 and 17, and other proteins | Different proteins
GTTT(A/C/T)GCAA | FOXF2 in mouse | Different protein | Different protein
GTTT(A/C/T)GCAA | AREB6 in human | Zinc finger (C2H2 type) family protein/transcription factor jumonji (jmj) family protein, CCAAT-box-binding transcription factor and other proteins | Different proteins
TTCCCTGTT | LEF1 in mouse | Different proteins | Different proteins
TTCCCTGTT | TCF1 in human | Different proteins | Different proteins
TTCCCTGTT | TCF1 in mouse | Different proteins | Different proteins

Detection of motifs in promoters of 117 anaerobic genes in anaerobic set 2

The previously analysed data set of 13 anaerobic genes was homogeneous in the sense that all the genes belong to the ethanolic fermentative pathway, and, thus, one could expect that these genes share many similarities in their promoter regions. Our finding of common promoter motifs supports such an assumption. However, many anaerobic genes code for proteins not involved in fermentation. Thus, it was important also to examine a selection of such genes. We used a larger promoter group from 117 genes (anaerobic set 2) determined from various species. The genes were chosen by matching gene names to those highly over- and underexpressed in arabidopsis during a low-oxygen microarray experiment (Klok et al., 2002). The selected genes do not belong to the same pathway, but to many different metabolic pathways. Thus, the overall similarity of their promoters should be considerably smaller. Consequently, the motifs that are potentially shared between such promoters are more likely to belong to the general transcriptional machinery required for basal transcription, rather than to be a more specific anaerobic response, although one cannot exclude such a possibility. Although we selected the top 22 motifs with high frequency (Table 3), only four motifs (TTTTTGT, TTCATCA, AAAACC and CAACTT) were, over a length of 5 nt or more, found to be similar to the motifs detected in the data set of 13 anaerobic promoters from the ethanolic fermentative pathway (Table 4). The frequencies of occurrence of those similar motifs ranged from 48 to 88 %. Among the four similar motifs, the motif TTCATCA was found in 48 % of sequences and corresponds to one of the new motifs found in the data set of 13 anaerobic genes that was not previously known in plants. The other partly similar motif, AAAACC, was found in 88 % of sequences. A similar motif (AAACC) has been reported in the GapC4 gene of maize. This observation indicates that AAAACC could be a motif common to anaerobic genes shared across various metabolic pathways. The other two similar motifs, TTTTTGT and CAACTT, could play some common role in anaerobic induction of genes related to various mechanisms, but their nature now needs careful study.

The presence of the motifs found in anaerobic set 2, related to different metabolic pathways, was searched in the TRANSFAC, PlantCARE and PLACE databases. A few motifs such as AAAGAAA, AAAGAAAAA, ATTTTTAT, AAAACC and CAACTT are listed as being present partly in plants as TFBSs. The others were not listed for plants, but have been found in other organisms.

Use of α-amylase as negative control (negative control set 1)

Sugar availability plays an important role in the production of energy by fermentation in oxygen-starved tissues (Vartapetian and Jackson, 1997). As the amount of hexoses and disaccharides is limited, the degradation of starch reserves becomes crucial for survival under anoxic conditions (Perata et al., 1998). Among the starch-degrading enzymes, α-amylase plays a major role (Sun and Henson, 1991). There is evidence that, in rice, it is produced in germinating seeds under anoxia (Perata et al., 1992), but not in anoxia-intolerant wheat, barley and other cereals (Gulieminetti et al., 1995). There is also a report that α-amylase plays a similarly important role in the anoxia-tolerant rhizome of Acorus calamus (Arpagaus and Braendle, 2000). Loreti et al. (2003) demonstrated that α-amylase production under anoxia is mostly due to the activity of the Ramy3D gene. Due to the critical role these genes have in the supply of sugar during anoxic conditions in tolerant rice seeds and other anoxia-tolerant tubers, it was logical to compare these genes with anaerobic genes. Thus, we compiled a data set of promoters of α-amylase genes from various plants as a negative control set. Using DMB, we detected motifs in promoters of α-amylase genes as we did for anaerobic genes. We selected 20 motifs with the highest occurrence. These motifs together with their percentage occurrence are presented in Table 5. The results show that the motifs detected are not the same as those detected for anaerobic genes (Table 1).

We searched for the presence of motifs from α-amylase promoters in the TRANSFAC, PlantCARE and PLACE databases. Six motifs that were previously reported as TFBSs for α-amylase genes were identified.
These are TTTCCAT (Amy 32B in barley), CCTTTTCA (Amy 32B in barley), CAGTGCCTCCAA (Amy 3D in rice), GTAGCCATCAAT (Amy 32B in barley), AGTGCCTCCAA (Amy 3D in rice) and CACTGCCTATAAAT (Amy 3D in rice). The motifs CTATAA, CCATCAGC, CCATCAAC, AGCCATCA(A/G) and CTGCCTATAA are unknown in plants, but they are known for other organisms.

TABLE 3. Percentage occurrence of detected motifs in promoters of 117 anaerobic genes including the genes detected through microarray experiment (anaerobic set 2)

Motifs | Percentage occurrence
TAATTA | 56
TATATA | 66
AAAGAA | 73
AATCCAA | 58
TTAAAAA | 65
CAACTT | 63
TTTTTGT | 61
TATTATA | 54
AAAACC | 88
TTCATCA | 48
ATATAAT | 56
AAAATAAA | 54
TTAAATTT | 50
TTATATATA | 63
TATATATAC | 80
AAAGAAAAA | 63
TAAAAAG | 57
ATTTTTAT | 68
AAACAT | 79
ACAAAA | 77
TTCCAC | 74
TTTGTT | 74

TABLE 5. Percentage occurrence of detected motifs in promoters of negative control set 1 (α-amylase genes)

Motifs | Percentage occurrence
CTATAA | 100
TTTCCAT | 80
GCAACAC | 73
GACTTG | 80
CCTTTTCA | 87
CCATCAGC | 80
CCAAGCAC | 80
CCATCAAC | 87
AAATACCA | 80
CTATAAATA | 100
AGCCATCA(A/G) | 80
CTTGTA(A/G)CCATC | 53
AGAGTCC(T)GGTA | 53
CA(G/T)TGCCTCCAA | 53
GTAGCCATCAAT | 60
A(G/T)TGCCTCCAA | 53
CCTATAAATACCA | 67
AGCAACACTCCAT | 80
C(A/C)CTGCCTATAAT | 67
CTGCCTATAA | 60

No motifs from 13 anaerobic promoters (anaerobic set 1) are found in the top 20.

TABLE 4. Comparison of occurrences of different motifs detected in the promoters of 13 anaerobic genes (anaerobic set 1) and promoters of 117 anaerobic genes based on the microarray experiment (anaerobic set 2)

Column 1: Motifs from the anaerobic set 1 of 13 anaerobic genes that participate in the fermentative pathway
Column 2: No. of promoters from anaerobic set 1 that contain the motif
Column 3: No. of promoters from anaerobic set 2 (117 genes expressed in anaerobic stress in a microarray experiment) and motifs found by the ab initio method
Column 4: No. of promoters from anaerobic set 1 that contain the motif listed in column 3

Column 1 | Column 2 | Column 3 | Column 4
AAACAAA | 9 | – | –
AGCAGC | 11 | – | –
CACAAT | 12 | – | –
TTATTA | 9 | – | –
AA(A/G)ATT | 9 | – | –
TTTTTCT | 8 | 71 (TTTTTGT) | 7
TCATCAC | 11 | 56 (TTCATCA) | 9
TTTTCTTC | 8 | – | –
AAA(A/C)CCTC | 7 | 103 (AAAACC) | 12
GTTT(A/C/T)GCAA | 6 | – | –
ATATAAATT | 8 | – | –
(A/C/G)AAAAACAAA | 6 | – | –
TATAAAAAC | 10 | – | –
TCCTCCT | 9 | – | –
CAACTCA | 10 | 73 (CAACTT) | 8
TTCCCTGTT | 9 | – | –

Four motifs from anaerobic set 2 were found to be similar to but not the same as the motifs detected in anaerobic set 1. The four shared motifs could be considered as potential TFBSs of TFs that are more commonly involved in the anaerobic stress response, while the other 12 motifs from column 1 could be considered as more specific to control of anaerobic genes that belong to the ethanolic fermentative pathway.

Detection of motifs in 303 promoters of presumably non-anaerobic genes (negative control set 2)

To validate our results further and to check that the system did not generate an excessive number of false-positive motifs, a similar detection protocol was applied using 303 genes having different functions and originating from different plant species (negative control set 2). The top 20 motifs detected with the highest frequency are reported in Table 6. In this data set, we did not find the same motifs as in anaerobic promoters of anaerobic set 1 (Table 6), but we did find three shorter motifs that partly overlap with motifs of anaerobic set 1 (Table 7). Hence, we conclude that the motifs of the negative control set 2 are not prominent in anaerobic set 1. One of the three, TATAAAT, found in 79 % of sequences, contains a commonly found TATA-box that binds general TFs and, thus, its presence can be expected. The other two motifs, AAAACAA and CAACTT, that were similar to motifs detected in anaerobic set 1, were found in 61 and 56 % of sequences, respectively.
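As an illustration of how percentage occurrences of this kind relate to the underlying promoter sets, the following minimal Python sketch converts the bracketed degenerate notation used in the tables, e.g. GTTT(A/C/T)GCAA, into a regular expression and counts the promoters that contain at least one match. The function names and the four toy sequences are hypothetical, only the forward strand is scanned, and this is not the DMB detection procedure itself, merely one way to tally a motif once it is known.

```python
import re
from typing import Sequence


def motif_to_regex(motif: str) -> str:
    """Rewrite bracketed alternatives, e.g. '(A/C/T)', as a character class '[ACT]'."""
    return re.sub(
        r"\(([ACGT](?:/[ACGT])*)\)",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        motif,
    )


def percent_occurrence(motif: str, promoters: Sequence[str]) -> float:
    """Percentage of promoters with at least one forward-strand match of the motif."""
    pattern = re.compile(motif_to_regex(motif))
    hits = sum(1 for seq in promoters if pattern.search(seq.upper()))
    return 100.0 * hits / len(promoters)


# Hypothetical toy sequences, NOT taken from the paper's data sets.
toy_promoters = [
    "AAGTTTAGCAATT",  # contains GTTTAGCAA
    "CCGTTTGGCAACC",  # G at the degenerate position: no match
    "GTTTCGCAA",      # contains GTTTCGCAA
    "ACGTACGT",       # no match
]

print(motif_to_regex("GTTT(A/C/T)GCAA"))
print(percent_occurrence("GTTT(A/C/T)GCAA", toy_promoters))
```

The same arithmetic links the counts in Table 7 to the percentages in Table 6: 239/303, 185/303 and 171/303 round to 79, 61 and 56 %, respectively.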
The motif AAAACAA is present as an auxin-responsive element in pea in the primary indoleacetic acid-inducible gene (Ballas et al., 1993) and partly overlaps the motif (A/C/G)AAAAACAAA from anaerobic set 1. The motif CAACTT is not found in plants, but is present in animals. This motif overlaps with the longer motif CAACTCA from anaerobic set 1, but only in the first 5 nt. These two motifs could therefore be common TFBSs in plants, but we have no evidence for this yet. However, we observe that all other detected motifs (17) are different from the most characteristic motifs of anaerobic set 1, which demonstrates that our method provides reasonably accurate detection of motifs specific for anaerobic responsiveness.

TABLE 6. Percentage occurrence of detected motifs in promoters of negative control set 2 (303 promoter sequences from PPD and EPD databases excluding genes not induced or repressed during anoxia)

Motifs | Percentage occurrence
CTATAA | 67
CAAAAT | 59
CACATT | 61
ACACGT | 62
CAACTT | 56
TTAAAAA | 60
GATTTC | 52
ATTTCAT | 45
AAAATATT | 51
TTTTCAT | 68
AAATCCA | 44
TATATAAA | 52
CTATAAATA | 53
AAAAGAA | 57
TATAAAT | 79
GAAAAA | 84
AAAACAA | 61
ACAAAT | 84
TTTTGT | 63
AAAATATA | 45

TABLE 7. Comparison of the occurrence of motifs in anaerobic promoters (anaerobic set 1) and promoters of negative control set 2

Motifs | Anaerobic promoters (13) (anaerobic set 1) | Promoters of negative control set 2 (303)
AAACAAA | 9 | –
AGCAGC | 11 | –
CACAAT | 12 | –
TTATTA | 9 | –
AA(A/G)ATT | 9 | –
TTTTTCT | 8 | –
TCATCAC | 11 | –
TTTTCTTC | 8 | –
AAA(A/C)CCTC | 7 | –
GTTT(A/C/T)GCAA | 6 | –
ATATAAATT | 8 | 239 (TATAAAT)
(A/C/G)AAAAACAAA | 6 | 185 (AAAACAA)
TATAAAAAC | 10 | –
TCCTCCT | 9 | –
CAACTCA | 10 | 171 (CAACTT)
TTCCCTGTT | 9 | –

Thirteen motifs from anaerobic set 1 of the ethanolic fermentative pathway do not appear in any significant proportion in the negative control set 2. Of the top 20 motifs detected in the negative control set 2, only three share similarity with motifs of anaerobic set 1. One of these is similar to the TATA-box, and thus cannot be considered specific for any particular gene group. The other two could be some of the more general motifs required for transcriptional activation of plant genes, but this has yet to be examined.

TABLE 8. Percentage occurrence of detected motifs in promoters of genes induced by cold, drought, high salinity stresses and ABA application

Motifs | Percentage occurrence
(A/C)AACTT | 82
AA(A/T)CAAA | 71
TATATC | 76
ATATAA | 71
ACTCTTT | 76
TTAAAAA | 82
A(G/T)CCATG | 65
CTCTT(C/T)CA | 65
AAATT(A/G)TT | 71
ATATAAATA | 76
TTTCTTTAT | 71
AA(A/G)AAAAA | 71
AACTTTG | 59
AACAAAA(A/T)GA | 47
C(A/T)AAAACAAA | 47
TAAATATAGAT | 65
TTTAAAGA | 76
AATCAATTC | 65
CTATATAA | 76
AAGAAAA(A/T) | 59

With the exception of a partial TATA-box motif and AA(A/T)CAAA which is partly similar to the AAACAAA motif of anaerobic set 1, no other motif from anaerobic set 1 appears significant among the 20 most significant motifs of negative control set 3.

Comparison with motifs in the promoters of negative control set 3

A number of genes have been identified as being induced by different abiotic stresses (Thomashow, 1999; Shinozaki and Yamaguchi-Shinozaki, 2000). It has been reported that the ADH1 gene in arabidopsis is induced not only by low-oxygen stress, but also by cold and drought and by the hormone abscisic acid (ABA) (Dolferus et al., 2003). ABA plays an important role in tolerance of different stress conditions (Shinozaki and Yamaguchi-Shinozaki, 2000; Zhu, 2002). Thus, it would be illuminating to look at genes other than ADH that are induced by factors such as cold, drought and ABA as a negative control for anaerobic set 1.
Due to the unavailability of promoter sequences in the PPD and EPD, it was difficult to analyse all individual genes induced by factors such as cold, drought or ABA. However, based on cDNA microarray expression analysis performed in rice (Rabbani et al., 2003), we compiled a data set of seven genes out of this set whose promoters are present in PPD and EPD. We detected 20 motifs (Table 8) with high frequency of occurrence across the promoters. The motifs detected in this data set were different from the motifs detected in anaerobic set 1, except for the TATA-box and AA(A/T)CAAA, which is partly similar to the motif AAACAAA. The other 18 motifs were very different from the motifs detected in anaerobic set 1. As we have discussed earlier, the TATA-box is a common motif found in many genes, but the motif AA(A/T)CAAA could play a common role in stress-induced genes in plants. This now requires further analysis and experimental verification. Thus, our results suggest that most of the motifs detected in this negative control data set are different from the motifs found in anaerobic genes of the ethanolic fermentative pathway.

Comparison with motifs in the promoters of heat shock protein genes (negative control set 4)

As a test of specificity, we thought it logical to compare the motifs we detected in anaerobic genes with those occurring in the promoter regions of other stress response genes. Accordingly, we compiled a set of promoters of heat shock protein (HSP) genes. These proteins are usually undetectable under normal growth conditions but become rapidly induced in response to heat. The accumulation of HSPs depends on both temperature and duration of the stress (Howarth, 1991). Not only are the genes induced during heat stress, but there is also evidence of HSP gene expression in response to osmotic stress, drought stress and also constitutively at certain developmental stages (Sun et al., 2002).
The way in which HSPs protect cells against some kinds of stress is not fully understood. They may contribute to cellular homeostasis and are responsible for protein folding, assembly, translocation and degradation in different cellular processes. They also help to stabilize proteins and membranes, and maintain correct protein refolding (W. Wang et al., 2004).

In the HSP set, we detected 20 motifs showing a high frequency of occurrence. Each motif occurred in at least 67 % of the genes, and some were present in all the genes (Table 9). We also searched for these motifs in the TRANSFAC, PlantCARE and PLACE databases. Seventeen motifs are already known to be present in plants as TFBSs, and those not seen in plants before have been found in other organisms. None of these 20 motifs is present in the anaerobic genes except the TATA-box. However, the TATA-box is one of the general core promoter elements and thus not specific for transcriptional activation of any particular functional gene group. The motifs detected suggest that those we have found for anaerobic genes do not have any role in HSP gene activation but may well play a major role in the control of anaerobic genes themselves.

TABLE 9. Percentage occurrence of detected motifs in promoters of HSP genes, negative control set 4

Motifs | Percentage occurrence
ATATAA | 83
AACAAT | 100
(A/C)CACT(A/T) | 100
CCTTTT(A/T) | 83
GCAGAAG | 67
CTAGAAC | 67
TGTTA(A/T)CG | 67
AGCAAACA | 67
CATCTCAT | 67
AAAAAGGA | 75
CCAGAATT | 75
AAAGTT(A/T)CAT | 67
A(A/C)AAGAGAA | 75
AAACAAAATG | 67
TA(G/T)CATTTTA | 75
GAAT(C/T)TTCTA | 75
AATATCATTT | 75
TCTGGAGA | 83
AAAGTTACAT | 75
CAGAATTTTTC | 75

No motifs from anaerobic promoters are found in the top 20.

The list of motifs over-represented in the anaerobic genes from the ethanolic fermentative pathway, together with the expression profiles of these genes, could provide necessary clues for reconstruction of parts of regulatory networks for this pathway.
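The comparisons with the negative control sets rest on two checks: whether a motif from anaerobic set 1 reappears exactly in the top 20 of a control set, and whether it only partly overlaps a control motif. A minimal Python sketch of both checks follows. It assumes the bracketed notation of the tables, treats two positions as compatible when their base sets intersect, and scores partial overlap as the longest ungapped run of compatible positions. These definitions and all function names are illustrative assumptions, not criteria stated by the authors.

```python
import re
from itertools import product


def parse(motif):
    """Split 'AA(A/G)ATT' into per-position base strings: ['A','A','AG','A','T','T']."""
    tokens = re.findall(r"\([ACGT/]+\)|[ACGT]", motif)
    return [tok.replace("/", "").strip("()") for tok in tokens]


def expand(motif):
    """All concrete sequences represented by a degenerate motif."""
    return {"".join(bases) for bases in product(*parse(motif))}


def shared_exact(motifs_a, motifs_b):
    """Concrete sequences that occur in both motif lists."""
    seqs_a = set().union(*(expand(m) for m in motifs_a))
    seqs_b = set().union(*(expand(m) for m in motifs_b))
    return seqs_a & seqs_b


def longest_overlap(m1, m2):
    """Longest ungapped run of compatible positions over all relative shifts."""
    a, b = parse(m1), parse(m2)
    best = 0
    for shift in range(-(len(b) - 1), len(a)):
        run = 0
        for i in range(len(a)):
            j = i - shift
            if 0 <= j < len(b) and set(a[i]) & set(b[j]):
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best


# Motif lists as given in the legend of Fig. 3A and in Table 9.
ANAEROBIC_SET_1 = [
    "AAACAAA", "AGCAGC", "CACAAT", "TTATTA", "AA(A/G)ATT", "TTTTTCT",
    "TCATCAC", "TTTTCTTC", "AAA(A/C)CCTC", "GTTT(A/C/T)GCAA", "ATATAAATT",
    "(A/C/G)AAAAACAAA", "TATAAAAAC", "TCCTCCT", "CAACTCA", "TTCCCTGTT",
]
HSP_TOP_20 = [
    "ATATAA", "AACAAT", "(A/C)CACT(A/T)", "CCTTTT(A/T)", "GCAGAAG",
    "CTAGAAC", "TGTTA(A/T)CG", "AGCAAACA", "CATCTCAT", "AAAAAGGA",
    "CCAGAATT", "AAAGTT(A/T)CAT", "A(A/C)AAGAGAA", "AAACAAAATG",
    "TA(G/T)CATTTTA", "GAAT(C/T)TTCTA", "AATATCATTT", "TCTGGAGA",
    "AAAGTTACAT", "CAGAATTTTTC",
]

print(shared_exact(ANAEROBIC_SET_1, HSP_TOP_20))
print(longest_overlap("CAACTT", "CAACTCA"))
```

With these inputs, `shared_exact` returns an empty set, consistent with the footnote to Table 9, and `longest_overlap("CAACTT", "CAACTCA")` gives 5, matching the 5-nt overlap noted for negative control set 2.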
In addition, the results shown here will allow biological validation using standard methods such as quantitative real-time PCR assays and reporter gene assays of transgenic plants carrying chimaeric constructs of selected promoter regions fused to reporter genes.

CONCLUSIONS

We detected motifs in the [−200, +50] region relative to the TSS of 13 anaerobic genes selected from seven different plants. Sixteen of the most significant motifs were selected out of 120 motifs. Of these, eight are reported in the TRANSFAC database as TFBSs, while another three are included in the PLACE database as parts of other known motifs. The remaining five motifs have not been reported previously for plants as binding sites of TFs. However, they have been reported as such for other organisms including humans, mouse, rat, Drosophila and others, increasing the chances that the new motifs we found in the majority of anaerobic promoters are biologically active and relevant to the regulation of anaerobic metabolism in plants. We also searched arabidopsis and rice for homologues of the animal TFs that bind these motifs. Although the results did not provide sufficient support to prove that the animal TFs are present in plants, they do provide some encouraging clues to guide further analysis, since several hits were to TFs in plants, but these hits were not of sufficiently strong similarity. We also detected motifs in a larger group of promoters from 117 genes of various species (anaerobic set 2). Four motifs, TTTTTGT, TTCATCA, AAAACC and CAACTT, were found to be similar to the motifs detected in the data set of 13 anaerobic promoters from the ethanolic fermentative pathway. We compiled a set of α-amylase promoters as a negative control to compare with anaerobic promoters. The motifs found for anaerobic promoters were different from the motifs detected for α-amylase promoters.
We also compared the presence of motifs from anaerobic promoters in a set of 303 promoters compiled from presumably non-anaerobic genes (negative control set 2). Out of the 20 most significant motifs, only three partly overlapped with motifs from anaerobic set 1. One of these was similar to the TATA-box, which is commonly found in the upstream regions of very many genes. The other two could be TFBSs that are more commonly shared in plant promoters. To validate further our motifs found in anaerobic set 1, we detected motifs in a number of genes induced by cold, drought, high salinity and ABA application (negative control set 3), stimuli to which ADH also responds. The 16 motifs detected in anaerobic set 1 were not found in the top 20 significant motifs detected in the negative control set 3, with the exception of a partial TATA-box motif and AA(A/T)CAAA, which is partly similar to the AAACAAA motif of anaerobic set 1. This result suggests that although ADH responds to different stress conditions, the regulation could be different depending on the stress condition. In the data set of HSP promoters, no motif (out of the top 16) from anaerobic set 1 was present. These observations indicate that the 16 motifs we detected for anaerobic promoters could be biologically active, as they are highly specific to promoters of anaerobic genes that belong to the ethanolic fermentative pathway. The five new motifs that are not yet known as plant TFBSs are potentially new binding sites in plants and they, either individually or in combination with other motifs, could play an important role in regulating anaerobic metabolism. However, experimental verification will be necessary to establish the functionality of these motifs more certainly.

LITERATURE CITED

Altschul SF, Madden T, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.
Nucleic Acids Research 25: 3389–3402.
Armstrong W. 1979. Aeration in higher plants. Advances in Botanical Research 7: 225–232.
Arpagaus S, Braendle R. 2000. The significance of α-amylase under anoxia stress in tolerant rhizomes (Acorus calamus L.) and non-tolerant tubers (Solanum tuberosum L. var. Desiree). Journal of Experimental Botany 51: 1475–1477.
Bailey TL, Elkan C. 1994. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology (ISMB’94) 2: 28–36, AAAI Press, Menlo Park, California, August.
Ballas N, Wong LM, Theologis A. 1993. Identification of the auxin-responsive element, AuxRE, in the primary indoleacetic acid-inducible gene, PS-IAA4/5, of pea (Pisum sativum). Journal of Molecular Biology 233: 580–596.
Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. 2004. GenBank update. Nucleic Acids Research 32: D23–D26.
Boeckmann B, Bairoch A, Apweiler R, Blatter M, Estreicher A, Gasteiger E, et al. 2003. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Research 31: 365–370.
Califano A. 2000. Splash, structural pattern localization analysis by sequential histograms. Bioinformatics 16: 341–357.
Caselle M, Di Cunto F, Provero P. 2002. Correlating overrepresented upstream motifs to gene expression: a computational approach to regulatory element discovery in eukaryotes. BMC Bioinformatics 3: 7.
Dolferus R, Jacobs M, Peacock WJ, Dennis ES. 1994. Differential interactions of promoter elements in stress responses of the arabidopsis Adh gene. Plant Physiology 105: 1075–1087.
Dolferus R, Klok EJ, Delessert C, Wilson S, Ismond KP, Good AG, et al. 2003. Enhancing the anaerobic response. Annals of Botany 91: 111–117.
Fennoy SL, Bailey-Serres J. 1995. Post-transcriptional regulation of gene expression in oxygen-deprived roots of maize. Plant Journal 7: 287–295.
Geffers R, Sell S, Cerff R, Hehl R. 2001. The TATA box and a Myb binding site are essential for anaerobic expression of a maize Gap C4 minimal promoter in tobacco. Biochimica et Biophysica Acta 1521: 120–125.
Greenway H, Gibbs J. 2003. Mechanism of anoxia tolerance in plants. I. Growth, survival and anaerobic catabolism. Functional Plant Biology 30: 1–47.
Grover A, Pareek A, Singla SL, Minhas D, Katiyar S, Ghawana S, et al. 1998. Engineering crops for tolerance against abiotic stresses through gene manipulation. Current Science 75: 689–696.
Guglielminetti L, Yamaguchi J, Perata P, Alpi A. 1995. Amylolytic activities in cereal seeds under aerobic and anaerobic conditions. Plant Physiology 109: 1069–1076.
van Helden J, Andre B, Collado-Vides J. 1998. Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. Journal of Molecular Biology 281: 827–842.
Hertz GZ, Stormo GD. 1999. Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics 15: 563–577.
Higo K, Ugawa Y, Iwamoto M, Korenaga T. 1999. Plant cis-acting regulatory DNA elements (PLACE) database. Nucleic Acids Research 27: 297–300.
Hoeren FU, Dolferus R, Wu Y, Peacock WJ, Dennis ES. 1998. Evidence for a role of AtMYB2 in the induction of the arabidopsis alcohol dehydrogenase (ADH1) gene by low oxygen. Genetics 149: 479–490.
Howarth CJ. 1991. Molecular responses of plants to an increased incidence of heat shock. Plant, Cell and Environment 14: 831–841.
Huang E, Yang L, Chowdhary R, Kassim A, Bajic VB. 2005. An algorithm for ab-initio DNA motif detection. In: Bajic VB, Tan TW, eds. Information processing and living system. World Scientific, Imperial College Press, London, 611–614.
Jin H, Martin C. 1999. Multifunctionality and diversity within the plant MYB-gene family. Plant Molecular Biology 41: 577–585.
Kennedy RA, Rumpho ME, Fox TC. 1992. Anaerobic metabolism in plants. Plant Physiology 84: 1204–1209.
Khush GS, Baenziger PS. 1998. Crop improvement: emerging trends in rice and wheat. In: Chopra VL, Singh RB, Verma A, eds. Crop productivity and sustainability: shaping the future. New Delhi: Oxford and BH publishing, 113–125.
Klok EJ, Wilson IW, Wilson D, Chapman SC, Ewing RM, Somerville SC, et al. 2002. Expression profile analysis of the low-oxygen response in arabidopsis root cultures. Plant Cell 14: 2481–2494.
Lawrence CE, Altschul SF, Boguski MS, Liu JS, Neuwald AF, Wootton JC. 1993. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignments. Science 262: 208–214.
Lescot M, Dehais P, Thijs G, Marchal K, Moreau Y, Van de Peer Y, et al. 2002. PlantCARE, a database of plant cis-acting regulatory elements and a portal to tools for in silico analysis of promoter sequences. Nucleic Acids Research 30: 325–327.
Loreti E, Yamaguchi J, Alpi A, Perata P. 2003. Sugar modulation of α-amylase genes under anoxia. Annals of Botany 91: 143–148.
Matys V, Fricke E, Geffers R, Gößling E, Haubrock M, Hehl R, et al. 2003. TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Research 31: 374–378.
Olive MR, Walker JC, Singh K, Dennis ES, Peacock WJ. 1990. Functional properties of the anaerobic response element of the maize Adh1 gene. Plant Molecular Biology 15: 593–604.
Olive MR, Walker JC, Singh K, Ellis JG, Llewellyn D, Dennis ES, et al. 1991a. The anaerobic response element. Plant Molecular Biology 2: 673–684.
Olive MR, Peacock WJ, Dennis ES. 1991b. The anaerobic responsive element contains two GC-rich sequences essential for binding a nuclear protein and hypoxic activation of the maize Adh1 promoter. Nucleic Acids Research 19: 7053–7060.
Pavesi G, Mereghetti P, Mauri G, Pesole G. 2004. Weeder Web: discovery of transcription factor binding sites in a set of sequences from co-regulated genes. Nucleic Acids Research 32: W199–W203.
Perata P, Pozueta-Romero J, Akazawa T, Yamaguchi J. 1992. Effect of anoxia on starch breakdown in rice and wheat seeds. Planta 188: 611–618.
Perata P, Loreti E, Guglielminetti L, Alpi A. 1998. Carbohydrate metabolism and anoxia tolerance in cereal grains. Acta Botanica Neerlandica 47: 269–283.
Poluliakh N, Nakai K. 2003. Extraction of biological motifs by Gibbs Sampler from the promoters of Homo sapiens, Saccharomyces cerevisiae and Bacillus subtilis. Genome Informatics 14: 406–407.
Praz V, Perier R, Bonnard C, Bucher P. 2002. The Eukaryotic promoter database, EPD: new entry types and links to gene expression data. Nucleic Acids Research 30: 322–324.
Rabbani MA, Maruyama K, Abe H, Khan MA, Katsura K, Ito Y, et al. 2003. Monitoring expression profiles of rice genes under cold, drought, and high-salinity stresses and abscisic acid application using cDNA microarray and RNA gel-blot analyses. Plant Physiology 133: 1755–1767.
ap Rees T, Jenkin LET, Smith AM, Wilson PM. 1987. The metabolism of flood tolerance plants. In: Crawford RMM, ed. Plant life in aquatic and amphibious habitats. Oxford: Blackwell Scientific, 227–238.
Sachs MM, Freeling M, Okimoto R. 1980. The anaerobic proteins of maize. Cell 20: 761–767.
Sachs MM, Subbaiah CC, Saab IN. 1996. Anaerobic gene expression and flooding tolerance in maize. Journal of Experimental Botany 47: 1–15.
Shahmuradov IA, Gammerman AJ, Hancock JM, Bramley PM, Solovyev VV. 2003. PlantProm: a database of plant promoter sequences. Nucleic Acids Research 31: 114–117.
Shinozaki K, Yamaguchi-Shinozaki K. 2000. Molecular responses to dehydration and low temperature: differences and cross-talk between two stress signaling pathways. Current Opinion in Plant Biology 3: 217–223.
Stormo GD. 2000. DNA binding sites: representation and discovery. Bioinformatics 16: 16–23.
Sun Z, Henson CA. 1991. A quantitative assessment of the importance of barley seed α-amylase, debranching enzyme, and α-glucosidase in starch degradation. Archives of Biochemistry and Biophysics 284: 298–305.
Sun W, Montagu MV, Verbruggen N. 2002. Small heat shock proteins and stress tolerance in plants. Biochimica et Biophysica Acta 1577: 1–9.
Thomashow MF. 1999. Plant cold acclimation: freezing tolerance genes and regulatory mechanisms. Annual Review of Plant Physiology and Plant Molecular Biology 50: 571–599.
Vartapetian BB, Jackson MB. 1997. Plant adaptations to anaerobic stress. Annals of Botany 79: 3–20.
Vilo J, Brazma A, Jonassen I, Robinson A, Ukkonen E. 2000. Mining for putative regulatory elements in the yeast genome using gene expression data. Proceedings of the International Conference on Intelligent Systems for Molecular Biology 8: 384–394.
Vogel JM, Roth B, Cigan M, Freeling M. 1993. Expression of the two maize TATA binding protein genes and function of the encoded TBP proteins by complementation in yeast. Plant Cell 5: 1627–1638.
Walker JC, Howard EA, Dennis ES, Peacock WJ. 1987. DNA sequences required for anaerobic expression of the maize Adh1 gene. Proceedings of the National Academy of Sciences of the USA 84: 6624–6629.
Wang W, Vinocur B, Shoseyov O, Altman A. 2004. Role of plant heat-shock proteins and molecular chaperones in the abiotic stress response. Trends in Plant Science 9: 244–252.
Wang Z, Dalkilic M, Kim S. 2004. Guiding motif discovery by iterative pattern refinement. ACM Symposium on Applied Computing, Nicosia, Cyprus. March 14–17: 162–166.
Werner T, Fessele S, Maier H, Nelson PJ. 2003. Computer modeling of promoter organization as a tool to study transcriptional coregulation. FASEB Journal 17: 1228–1237.
Yanagisawa S. 2002. The Dof family of plant transcription factors. Trends in Plant Science 7: 555–560.
Yang L, Huang E, Bajic VB. 2005. Some implementation issues of heuristic methods for motif extraction from DNA sequences. International Journal of Computers, System, and Signals (in press).
Zhu JK. 2002. Salt and drought stress signal transduction in plants. Annual Review of Plant Biology 53: 247–273.