special communication

Am J Physiol Regul Integr Comp Physiol 284: R1147–R1150, 2003; 10.1152/ajpregu.00448.2002.

An evolutionary approach for identifying potential transcription factor binding sites: the renin gene as an example

Ralf Mrowka, Karola Steinhage, Andreas Patzak, and Pontus B. Persson
Johannes-Müller-Institut für Physiologie, Charité, Humboldt-Universität zu Berlin, D-10117 Berlin, Germany

Submitted 24 July 2002; accepted in final form 25 December 2002

Evolutionary pressure has resulted in the conservation of certain nucleotide sequences. These conserved regions are potentially important for certain functions. Here we give an example of a comparison between noncoding sequences combined with other independent database information to shed light onto the regulation of the renin gene, a gene that has great importance for cardiovascular and renal homeostasis. To combine the information regarding conservation and weight matrices of transcription factor (TF) binding sites, an algorithm was developed (TFprofile). Notably, a local peak in the resulting binding profile coincides with a previously experimentally identified regulatory region for the renin gene. The existence of further peaks in the binding profile in the conserved 3.9-kb-long hRENc DNA block upstream of the renin gene suggests additional regions of potential importance for gene regulation. The algorithm TFprofile may be used to integrate information on cross-species evolutionary conservation and aspects of TF binding characteristics to provide putative regulatory DNA regions for experimental verification.

noncoding sequences; cross-species conservation

Address for reprint requests and other correspondence: R. Mrowka, Johannes-Müller-Institut für Physiologie, Humboldt-Universität zu Berlin, Tucholskystr. 2, D-10117 Berlin, Germany (E-mail: ralf. [email protected]).

The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

http://www.ajpregu.org 0363-6119/03 $5.00 Copyright © 2003 the American Physiological Society

ADVANCES in the sequencing projects of closely related organisms have opened the door for sequence comparisons. Currently, the complete human and mouse genomes and parts of the rat genome are available. Of the approximately three billion base pairs of the human genome, only a very small fraction contains sequences coding for proteins. Our understanding of the functions and importance of the remaining large fraction is rapidly increasing: promoter and regulatory functions, as well as previously unknown functions, are brought about by noncoding regions. For example, noncoding sequences are thought to prevent the ends of the chromosome from fraying during cell division (27).

Evolutionary pressure has resulted in the conservation of certain nucleotide sequences. The conservation of these regions may be taken as an indication of their potential functional importance (6). Noncoding motifs can be conserved over 800 million years, as has been shown for the HOX gene cluster that is important in developmental processes (8). Moreover, noncoding sequences can influence the expression of very distant genes that are separated by 120 kb, which was demonstrated in an impressive study based on computational identification of conserved sequences (11). Obtaining candidates for gene regulatory sites is more difficult than identifying exons, because regulatory sites are small and may be situated far from their target genes (11).

A first step toward their verification is the identification of noncoding sequences that are conserved among closely related organisms, such as humans, mice, and rats. If a noncoding sequence is important for the organism, one expects a certain evolutionary pressure on that sequence. Consequently, such a sequence evolves more slowly than regions whose function does not constrain their sequence. The feasibility of this method for identifying functionally important noncoding sequences is established. For instance, Wasserman et al. (29) found that 74 of 75 transcription factor (TF) binding sites (TFBS) are located within such conserved noncoding sequences. They applied a type of consensus searching algorithm (Gibbs sampler) to flanking regions of skeletal muscle-specific genes and concluded that the identified consensus motifs are only biologically meaningful if the search is restricted to conserved noncoding regions.

To identify conserved noncoding sequences of the renin gene, we compared the upstream (5′) noncoding DNA. In this study we focus on the renin gene because it has great importance for cardiovascular and renal homeostasis (2–5, 7, 12, 17, 25, 26, 28). To explore regions that may be important for the regulation of the human renin gene (hREN), we used a bioinformatics approach comparing human, mouse, and rat noncoding sequences upstream of the gene. Our approach further combines this comparison with independent database information, namely weight matrices for TFBS.

We estimated the homology of noncoding DNA between the human, the mouse, and the rat DNA sequences around the renin gene, which are presented as a percent identity plot (PIP, Fig. 1) (20). About 11–15 kb upstream of the human renin gene, a 3.9-kb-long block of human DNA, hRENc, was identified that contained a large number of conserved elements (for a detailed description please refer to the APPENDIX). We calculated the percent identity estimates of hRENc to the corresponding DNA regions for mouse and rat using blastz (20). The hRENc DNA block was searched for TFBS using MatInspector (16) with matrices for vertebrates.

To combine the information regarding conservation and binding sites, a special algorithm was developed (TFprofile). TFprofile calculates a weighted TF binding profile to identify regulatory regions as follows. We combine three independent parameters that suggest a functional regulatory relevance of a DNA sequence: 1) the number of TFs that have a putative binding site at a given location, which is informative because enrichment of putative TFBS in conserved noncoding regions (9) indicates functional importance; 2) the quality of the match of each putative TF, quantified by the core and matrix similarity of the TF with the DNA, because a high score of the binding matrix corresponds to a higher probability of binding (18, 19), and a TF that does not bind may have no functional relevance at the given position, so that higher functional relevance may be associated with stronger binding; and 3) the degree of conservation, because regulatory elements are strongly enriched in conserved noncoding DNA (29).

These three parameters were combined in the following mathematical procedure. First, TFprofile computed the TF density, which is the number of different TFs that span each particular base position. This was done for the hRENc DNA sequence.
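The TF density step can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Hit` record, the 0-based half-open coordinates, the function name `tf_density`, and the example hits are all assumptions made for illustration. The sketch counts, for every base position, how many different TFs have at least one putative binding site spanning that position.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """One putative binding site (hypothetical record layout)."""
    tf: str     # name of the transcription factor / weight matrix
    start: int  # 0-based start position, inclusive (assumed convention)
    end: int    # 0-based end position, exclusive (assumed convention)


def tf_density(hits: List[Hit], length: int) -> List[int]:
    """Number of different TFs whose putative sites span each position."""
    tfs_at_position = [set() for _ in range(length)]
    for hit in hits:
        # Clip the site to the analysed block before marking positions.
        for pos in range(max(0, hit.start), min(length, hit.end)):
            tfs_at_position[pos].add(hit.tf)
    # Using a set means overlapping sites of the same TF count only once.
    return [len(names) for names in tfs_at_position]


# Invented example: two NF-Y sites and one CREB site in a 10-bp block.
example_hits = [Hit("NF-Y", 2, 6), Hit("CREB", 4, 8), Hit("NF-Y", 3, 5)]
density = tf_density(example_hits, 10)
print(density)
```

Collecting TF names in a set per position reflects the wording "number of different TFs"; counting every hit instead would be an equally simple variant.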
In a second step, which takes conservation and weight matrix information into account, TFprofile calculated the weighted TF density by multiplying the TF density by the product of the core and matrix similarity of the weight matrix and by the product of the identity scores in each species. Identities of <50% were excluded, i.e., set to zero.

This procedure yielded the binding profile shown in Fig. 2. The profile is not uniformly distributed across the hRENc block; instead, it shows several local peaks. One peak coincides with a known, experimentally verified regulatory region (13, 14, 21–24). This regulatory region was first identified in mouse by Petrovic et al. (15) and in humans by Yan et al. (30). Shi et al. (21) first reported on its component elements in detail. They determined by gel competition and supershift analysis that nuclear factor-Y (NF-Y), a ubiquitous CAAT-box binding protein, binds to a part of that sequence. Furthermore, Shi et al. (21) detected a loss-of-function mutation in the conserved human sequence; reverting it to match the mouse sequence restores the trans-activating function. Further experimental studies have shown that the TF NF-Y is involved in the blocking of stimulatory TFs (22). In addition, electrophoretic mobility shift assays have demonstrated that TFs bind to the cAMP-responsive element (CRE) box and the E-box of the conserved region, and further experiments indicate that these elements are important for activation (13). Interestingly, the conserved renin enhancer contains a putative vitamin D receptor binding site; based on clinical observations and mouse knockout experiments, Li et al. (10) suggest that renin expression and blood pressure depend directly on vitamin D3 (24). The potential functional relevance of the two remaining peaks in the binding profile remains to be assessed by future studies.

Fig. 1. Percent identity plot (PIP) for the renin gene (hREN) on chromosome 1q32 and the 2 neighboring human genes KISS1 and FLJ10861. Each plot shows the position in the human sequence (horizontal axis) and the percent identity (vertical axis) of each aligning sequence of mouse (mREN1, mREN2) and rat (rnREN). This plot was used to identify the 3.9-kb DNA block hRENc, which contains conserved sequences approximately 11–15 kb upstream of hREN. (Note: The x-axis refers to the human sequence, i.e., distances refer to the human sequence and do not reflect distances in the other organisms. To see the distance relationships, please refer to the dotplots in the supplementary material at http://www.charite.de/bioinformatics/tfprofile.) UTR, untranslated region.

Our TFprofile approach differs from the algorithm applied by Wasserman et al. (29). Their approach involved a consensus search in conserved flanking regions of many coexpressed genes. However, considering the specific function of the renin gene, it is unlikely that many genes with similar expression patterns can be found. Hence our approach is not directly comparable with that of the study analyzing muscle-specific genes (29).

In conclusion, we describe the results of a computational method (TFprofile) that combines information from weight matrices of TF binding with information on evolutionary conservation, resulting in a binding profile. Notably, a local peak in the binding profile coincides with a previously experimentally identified regulatory region for the renin gene. The existence of further peaks in the binding profile in the conserved 3.9-kb-long hRENc DNA block upstream of the renin gene suggests additional regions of potential importance.
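As a rough illustration of how such a binding profile could be computed, the following Python sketch combines TF hits, weight matrix similarities, and conservation. Several details are assumptions, because the text does not fix them: hits are given as hypothetical tuples `(tf, start, end, core_sim, matrix_sim)` with 0-based half-open coordinates; each different TF contributes its best core × matrix similarity product at a position, and these contributions are summed (equivalent to the TF density times the mean similarity product); identities are per-position fractions between 0 and 1. It should be read as one possible interpretation, not as the published algorithm.

```python
from typing import List, Sequence, Tuple

# (tf name, start inclusive, end exclusive, core similarity, matrix similarity)
HitTuple = Tuple[str, int, int, float, float]


def weighted_profile(
    hits: Sequence[HitTuple],
    identity_mouse: Sequence[float],
    identity_rat: Sequence[float],
    min_identity: float = 0.5,
) -> List[float]:
    """Conservation-weighted TF binding profile (illustrative sketch)."""
    n = len(identity_mouse)
    if len(identity_rat) != n:
        raise ValueError("identity tracks must have the same length")

    # Best similarity product of every different TF at every position.
    best = [dict() for _ in range(n)]
    for tf, start, end, core_sim, matrix_sim in hits:
        score = core_sim * matrix_sim
        for pos in range(max(0, start), min(n, end)):
            if score > best[pos].get(tf, 0.0):
                best[pos][tf] = score

    profile = []
    for pos in range(n):
        conservation = 1.0
        for identity in (identity_mouse[pos], identity_rat[pos]):
            # Identities below the threshold are set to zero.
            conservation *= identity if identity >= min_identity else 0.0
        profile.append(sum(best[pos].values()) * conservation)
    return profile


# Invented example values for a 3-bp block.
demo_hits = [("NF-Y", 0, 2, 1.0, 0.9), ("CREB", 1, 3, 0.8, 0.5)]
demo = weighted_profile(demo_hits, [1.0, 0.8, 0.4], [0.5, 1.0, 0.9])
print(demo)
```

Because the conservation term is a product over species, a position that falls below the 50% threshold in either species contributes nothing to the profile, which matches the exclusion rule stated in the text.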
The human, mouse, and rat sequencing efforts, computational tools such as PipMaker, MatInspector, and TFprofile, and databases such as GenBank make a first-step DNA analysis possible in silico. Genome sequencing of further closely related species, such as the monkey, will provide additional information and may improve the basis for statistical analysis in the search for regulatory elements. The algorithm TFprofile may be used to integrate information on cross-species evolutionary conservation and aspects of TF binding characteristics to provide putative regulatory DNA regions for experimental verification.

Fig. 2. Top: weighted transcription factor (TF) binding profile in the 3.9-kb-long hRENc DNA block upstream of the renin gene. TFprofile computes for each position the number of possible different TFs spanning that region weighted by TF core and matrix similarity to the sequence (gray), and this is additionally weighted by the degree of conservation, giving the final result (bold). One peak of the calculated binding profile coincides with an experimentally verified regulatory region of the renin gene. The curves represent a 150-bp moving average of each profile. Bottom: corresponding percent identity of each aligning sequence from mouse (mRENc; black) and rat (rnRENc; red). Calculations and this figure do not contain the mouse tandem duplications.

REFERENCES

1. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, and Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402, 1997.
2. Bergeron R, Kjaer M, Simonsen L, Bulow J, Skovgaard D, Howlett K, and Galbo H. Splanchnic blood flow and hepatic glucose production in exercising humans: role of renin-angiotensin system. Am J Physiol Regul Integr Comp Physiol 281: R1854–R1861, 2001.
3. Brown R, Ollerstam A, Johansson B, Skott O, Gebre-Medhin S, Fredholm B, and Persson AE. Abolished tubuloglomerular feedback and increased plasma renin in adenosine A1 receptor-deficient mice. Am J Physiol Regul Integr Comp Physiol 281: R1362–R1367, 2001.
4. Cheng HF, Wang SW, Zhang MZ, McKanna JA, Breyer R, and Harris RC. Prostaglandins that increase renin production in response to ACE inhibition are not derived from cyclooxygenase-1. Am J Physiol Regul Integr Comp Physiol 283: R638–R646, 2002.
5. Cholewa BC and Mattson DL. Role of the renin-angiotensin system during alterations of sodium intake in conscious mice. Am J Physiol Regul Integr Comp Physiol 281: R987–R993, 2001.
6. Hardison RC. Conserved noncoding sequences are reliable guides to regulatory elements. Trends Genet 16: 369–372, 2000.
7. Kammerl MC, Richthammer W, Kurtz A, and Kramer BK. Angiotensin II feedback is a regulator of renocortical renin, COX-2, and nNOS expression. Am J Physiol Regul Integr Comp Physiol 282: R1613–R1617, 2002.
8. Kim CB, Amemiya C, Bailey W, Kawasaki K, Mezey J, Miller W, Minoshima S, Shimizu N, Wagner G, and Ruddle F. Hox cluster genomics in the horn shark, Heterodontus francisci. Proc Natl Acad Sci USA 97: 1655–1660, 2000.
9. Levy S, Hannenhalli S, and Workman C. Enrichment of regulatory signals in conserved non-coding genomic sequence. Bioinformatics 17: 871–877, 2001.
10. Li YC, Kong J, Wei M, Chen ZF, Liu SQ, and Cao LP. 1,25-Dihydroxyvitamin D3 is a negative endocrine regulator of the renin-angiotensin system. J Clin Invest 110: 229–238, 2002.
11. Loots GG, Locksley RM, Blankespoor CM, Wang ZE, Miller W, Rubin EM, and Frazer KA. Identification of a coordinate regulator of interleukins 4, 13, and 5 by cross-species sequence comparisons. Science 288: 136–140, 2000.
12. Marsh AC, Gibson KJ, Wu J, Owens PC, Owens JA, and Lumbers ER. Chronic effect of insulin-like growth factor I on renin synthesis, secretion, and renal function in fetal sheep. Am J Physiol Regul Integr Comp Physiol 281: R318–R326, 2001.
13. Pan L, Black TA, Shi Q, Jones CA, Petrovic N, Loudon J, Kane C, Sigmund CD, and Gross KW. Critical roles of a cyclic AMP responsive element and an E-box in regulation of mouse renin gene expression. J Biol Chem 276: 45530–45538, 2001.
14. Pan L, Xie Y, Black TA, Jones CA, Pruitt SC, and Gross KW. An Abd-B class HOX.PBX recognition sequence is required for expression from the mouse Ren-1c gene. J Biol Chem 276: 32489–32494, 2001.
15. Petrovic N, Black TA, Fabian JR, Kane C, Jones CA, Loudon JA, Abonia JP, Sigmund CD, and Gross KW. Role of proximal promoter elements in regulation of renin gene transcription. J Biol Chem 271: 22499–22505, 1996.
16. Quandt K, Frech K, Karas H, Wingender E, and Werner T. MatInd and MatInspector: new fast and versatile tools for detection of consensus matches in nucleotide sequence data. Nucleic Acids Res 23: 4878–4884, 1995.
17. Sayago CM and Beierwaltes WH. Nitric oxide synthase and cGMP-mediated stimulation of renin secretion. Am J Physiol Regul Integr Comp Physiol 281: R1146–R1151, 2001.
18. Schneider TD. Information content of individual genetic sequences. J Theor Biol 189: 427–441, 1997.
19. Schneider TD, Stormo GD, Gold L, and Ehrenfeucht A. Information content of binding sites on nucleotide sequences. J Mol Biol 188: 415–431, 1986.
20. Schwartz S, Zhang Z, Frazer KA, Smit A, Riemer C, Bouck J, Gibbs R, Hardison R, and Miller W. PipMaker--a web server for aligning two genomic DNA sequences. Genome Res 10: 577–586, 2000.
21. Shi Q, Black TA, Gross KW, and Sigmund CD. Species-specific differences in positive and negative regulatory elements in the renin gene enhancer. Circ Res 85: 479–488, 1999.
22. Shi Q, Gross KW, and Sigmund CD. NF-Y antagonizes renin enhancer function by blocking stimulatory transcription factors. Hypertension 38: 332–336, 2001.
23. Shi Q, Gross KW, and Sigmund CD. Retinoic acid-mediated activation of the mouse renin enhancer. J Biol Chem 276: 3597–3603, 2001.
24. Sigmund CD. Regulation of renin expression and blood pressure by vitamin D3. J Clin Invest 110: 155–156, 2002.
25. Skott O. Renin. Am J Physiol Regul Integr Comp Physiol 282: R937–R939, 2002.
26. Todorov V, Muller M, Schweda F, and Kurtz A. Tumor necrosis factor-α inhibits renin gene expression. Am J Physiol Regul Integr Comp Physiol 283: R1046–R1051, 2002.
27. Vogel G. The human genome. Objection #2: Why sequence the junk? Science 291: 1184, 2001.
28. Wagner KD, Essmann V, Mydlak K, Wirth M, Gmehling G, Bohlender J, Stauss HM, Günther J, Schimke I, and Scholz H. Decreased susceptibility of cardiac function to hypoxia-reoxygenation in renin-angiotensinogen transgenic rats. Am J Physiol Regul Integr Comp Physiol 283: R153–R160, 2002.
29. Wasserman WW, Palumbo M, Thompson W, Fickett JW, and Lawrence CE. Human-mouse genome comparisons to locate regulatory sites. Nat Genet 26: 225–228, 2000.
30. Yan Y, Jones CA, Sigmund CD, Gross KW, and Catanzaro DF. Conserved enhancer elements in human and mouse renin genes have different transcriptional effects in As4.1 cells. Circ Res 81: 558–566, 1997.

APPENDIX

Identification of hRENc, a 3.8-kb human DNA sequence upstream of the renin gene containing conserved elements. An 80-kb DNA sequence (hRcS) containing the renin gene from human chromosome 1q32 was extracted from GenBank. Repetitive elements in the human genomic sequence were masked using the RepeatMasker program (A. Smit and P. Green, unpublished work). We identified the orthologous mouse genomic sequence using BLAST (1) against a local mouse chromosome database downloaded from the National Center for Biotechnology Information. We found two genomic DNA blocks harboring the mREN sequence; the latter block contains two tandem duplications of mREN1 and mREN2. The corresponding rat sequence rnREN was found in the renin clone CH230–198L8 of Rattus norvegicus. Where appropriate, sequences were replaced by their reverse complement so that all sequences had the same orientation. We estimated the homology between the human, the two mouse, and the rat sequences (20), which are presented as a PIP (Fig. 1). About 11–15 kb upstream of the human renin gene, a 3.9-kb-long block of human DNA, hRENc, was identified. It contains a large number of conserved elements. The corresponding sequence parts, mRENc of mREN1 and rnRENc of rnREN, were then obtained from the mouse and rat sequences, respectively. The tandem duplications of the mouse renin gene were not used for further analysis. Finally, the percent identity between hRENc and [m,rn]RENc was assessed (Fig. 2, bottom) using blastz (20) on a local Linux computer.

Specification of sources of the DNA elements. The sources of the DNA elements were as follows:
1) hRcS: gi 22044063, position 529046–609046, reverse complement
2) hRENc: gi 22044063, position 529046–609046, reverse complement, position 25789–29689
3) rnREN: gi 23321661, reverse complement, position 80000–150000
4) rnRENc: gi 23321661, reverse complement, position 105150–108660
5) mREN1: gi 20340684, position 47000–122000
6) mREN2: gi 20340631, reverse complement, position 75000–175000
7) mRENc: gi 20340684, position 82900–86360

Neighboring relationships between the genes. Please note that the PIP identity scores in Fig. 1 are projected on the human sequence. To estimate distances and neighboring relationships of the renin gene of human, mouse, and rat, please refer to the dotplots at http://www.charite.de/bioinformatics/tfprofile.

Availability of the algorithm. The C++ source code of the TFprofile implementation is freely available for the Linux/Unix platform (GNU General Public License; www.gnu.org).
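The coordinate specifications for the DNA elements (a GenBank interval, an optional reverse complement, then a sub-interval within the result) can be read as a short sequence of slicing operations. The Python sketch below illustrates that reading on a toy sequence. The function names and the 1-based inclusive coordinate convention are assumptions made for illustration; the text does not state which convention was used, and retrieving the GenBank records themselves is not shown.

```python
# Translation table for complementing DNA; N is kept as N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract(seq: str, start: int, end: int, revcomp: bool = False) -> str:
    """Cut out positions start..end (1-based, inclusive; assumed convention)
    and optionally return the reverse complement of that piece."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("interval lies outside the sequence")
    piece = seq[start - 1:end]
    return reverse_complement(piece) if revcomp else piece


# Toy stand-in for a genomic record (invented sequence).
record = "GATTACAGG"
# Step 1: interval from the record, reverse complemented (as for hRcS).
block = extract(record, 1, 7, revcomp=True)
# Step 2: sub-interval within the reoriented block (as for hRENc).
sub_block = extract(block, 2, 4)
print(block, sub_block)
```

Applied to the listed hRENc entry, the second interval (25789–29689) spans about 3.9 kb, in line with the block length quoted in the figure legends.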
Second example of application of TFprofile. To provide further evidence of the potential usefulness of the described algorithm, we calculated, as a second example, the weighted TF binding profile for a 2-kb noncoding DNA region ~10 kb upstream of the human IL-4 gene that contains conserved elements. Like hRENc, this 2-kb DNA region was identified using PipMaker (20). Again, the peak in the profile of this 2-kb DNA segment coincides with a DNA region that was previously verified experimentally to be functionally relevant for gene expression (11). This additional binding profile calculated with our algorithm TFprofile may be found at http://www.charite.de/bioinformatics/tfprofile.
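The peaks discussed for both examples are read from smoothed profiles; the Fig. 2 legend mentions a 150-bp moving average. A minimal Python sketch of such smoothing and of a simple local-peak search follows. The centered window, the shrinking window at the sequence ends, and the peak definition are assumptions for illustration, since the text does not specify how edges or peaks were handled.

```python
from typing import List, Sequence


def moving_average(values: Sequence[float], window: int = 150) -> List[float]:
    """Centered moving average; the window shrinks at the ends (assumed)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    n = len(values)
    half = window // 2
    # Prefix sums make every window average an O(1) lookup.
    prefix = [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i - half + window)
        smoothed.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return smoothed


def local_peaks(values: Sequence[float]) -> List[int]:
    """Indices that rise above their left neighbour and are not below the
    right one; a flat top is reported at its left edge."""
    return [
        i
        for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] >= values[i + 1]
    ]


# Invented toy profile, smoothed with a 3-bp window for readability.
smooth = moving_average([0, 0, 3, 0, 0], window=3)
peaks = local_peaks(smooth)
print(smooth, peaks)
```

For a real profile one would call `moving_average(profile, window=150)` and then compare the positions returned by `local_peaks` with known regulatory regions.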