THE JOURNAL OF BIOLOGICAL CHEMISTRY, Vol. 275, No. 8, Issue of February 25, pp. 5668–5674, 2000
© 2000 by The American Society for Biochemistry and Molecular Biology, Inc. Printed in U.S.A.

Highly Hydrophilic Proteins in Prokaryotes and Eukaryotes Are Common during Conditions of Water Deficit*

(Received for publication, September 29, 1999, and in revised form, November 29, 1999)

Adriana Garay-Arroyo‡§, José M. Colmenero-Flores‡§, Alejandro Garciarrubio¶, and Alejandra A. Covarrubias‡‖

From the Departamentos de ‡Biología Molecular de Plantas and ¶Bioestructura y Reconocimiento Molecular, Instituto de Biotecnología, Universidad Nacional Autónoma de México, 62250 Cuernavaca, Morelos, Mexico

The late embryogenesis abundant (LEA) proteins are plant proteins that are synthesized at the onset of desiccation in maturing seeds and in vegetative organs exposed to water deficit. Here, we show that most LEA proteins are comprised in a more widespread group, which we call “hydrophilins.” The defining characteristics of hydrophilins are high glycine content (>6%) and a high hydrophilicity index (>1.0). By data base searching, we show that this criterion selectively differentiates most known LEA proteins as well as additional proteins from different taxons. We found that within the genomes of Escherichia coli and Saccharomyces cerevisiae, only 5 and 12 proteins, respectively, meet our criterion. Despite their deceivingly loose definition, hydrophilins usually represent <0.2% of the proteins of a genome. Additionally, we demonstrate that the criterion that defines hydrophilins seems to be an excellent predictor of responsiveness to hyperosmosis since most of the genes encoding these proteins in E. coli and S. cerevisiae are induced by osmotic stress. Evidence for the participation of one of the E. coli hydrophilins in the adaptive response to hyperosmotic conditions is presented.
Apparently, hydrophilins represent analogous adaptations to a common problem in such diverse taxons as prokaryotes and eukaryotes.

* This work was supported by Grants 0131P-N9506 and 26242-N from the Consejo Nacional de Ciencia y Tecnología (México) (to A. A. C.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
§ These authors contributed equally to this work.
‖ To whom correspondence should be addressed: Dept. de Biología Molecular de Plantas, Inst. de Biotecnología, Universidad Nacional Autónoma de México, Apdo. Postal 510-3, 62250 Cuernavaca, Mor. Mexico. Tel.: 52-73-29-1643; Fax: 52-73-13-9988; E-mail: crobles@ibt.unam.mx.
¹ The abbreviations used are: LEA, late embryogenesis abundant; RMF, ribosome modulation factor.

Complete genome data bases are enabling novel approaches to determine if different organisms have evolved similar biochemical solutions to environmentally stressful conditions. Water stress may affect all types of organisms at some stage of their life cycle. We aimed at addressing if similar proteins have been recruited during evolution to respond to water-deficit conditions. Hydrophilic proteins known as late embryogenesis abundant (LEA)¹ proteins accumulate under conditions of extreme desiccation in higher plants. These proteins accumulate to high levels during the last stage of seed formation (when a natural desiccation of the seed tissues takes place) and during periods of water deficit in vegetative organs (1). LEA proteins have been grouped into at least six families on the basis of sequence similarity (2, 3). Although significant similarity has not been detected between the members of the different classes, a unifying and outstanding feature of these proteins is their high hydrophilicity and high percentage of glycines (4–6).
Amino acid sequence analysis allows one to predict that these proteins exist primarily as random coils (5). This property has been confirmed in few cases with purified proteins (7, 8) and is supported by the fact that proteins of this type do not coagulate upon heating (9, 10). LEA protein families have been identified in a wide range of different plant species to the extent that they can be considered ubiquitous in plants (3, 11). Moreover, it has been shown that members of at least one of the LEA protein families, the so-called dehydrins, are present in a range of photosynthetic organisms, including lower plants, algae, and cyanobacteria (9, 11). Two important questions to address are, how do these structural and physicochemical characteristics relate to the largely unknown function of LEA proteins and do they represent a solution to a plant-specific problem or to a more general one? Recent data suggest that LEA-like proteins are also found in fungi, and they seem to be also involved in pathways that respond to water deficit. That is the case of HSP12 (12) and GRE1 (13) from Saccharomyces cerevisiae, whose amino acid composition shows a high content of hydrophilic amino acids distributed throughout their primary structure. In both cases, it has been shown that their corresponding transcripts accumulate in response to hyperosmosis (13, 14). The CON6-encoded protein of Neurospora crassa also exhibits hydrophilic and highly charged repeated domains. The CON6 polypeptide is present at high levels in the late stages of conidiation and in the mature conidia, stages that involve nutrient deprivation and developmentally imposed desiccation (15). Since the LEA proteins of different classes have no evident sequence similarity, it is unlikely that the structural and physicochemical characteristics they share are due to common descent. 
In contrast, it seems that these physicochemical characteristics have evolved independently in different protein families and in different organisms, and they allow LEA proteins to carry out their specific functions under partial dehydration. If this is the case, we should expect to find proteins with similar characteristics outside the plant kingdom that are also induced by conditions of water deficit. We have chosen the name hydrophilins to group the proteins that present, as most LEA proteins in plants, extreme hydrophilicity and a high percentage of glycine residues in their sequence. In this study, we explored the question of whether proteins with these characteristics are widespread among bacteria and other eukaryotes and if they are induced by water deficit. To pursue this objective, we designed an algorithm to search sequence data bases for proteins having a physicochemical resemblance to LEA proteins. Then we investigated, either from the literature or by performing Northern blot experiments, whether the transcripts of the selected genes accumulated under water deficit. After finding isolated examples in several species, we searched the complete genome data bases of Escherichia coli and S. cerevisiae. A surprisingly high proportion of the proteins chosen by the physicochemical criterion was also induced at the mRNA level under water-deficit conditions. Thus, we conclude that hydrophilins exist in several kingdoms, including plant, bacteria, and fungi, and that they have been recruited from diverse protein families to pathways involved in responses to water-deficit conditions.

This paper is available on line at http://www.jbc.org

TABLE I
Primers used to obtain E. coli and S. cerevisiae hydrophilin gene probes
Name: YCIG_ECOLI, YJBJ_ECOLI, PRTL_ECOLI, RMF_ECOLI, YHDL_ECOLI, S65242, STF2_YEAST, YBM6_YEAST, YFBO_YEAST, IF1A_YEAST, S67772, YJO4_YEAST, YJS4_YEAST, YNTO_YEAST, RL44_YEAST
Sequence (5′ to 3′): ATG GCG CTG GAA GAA CGA CCG CAG ATG GCT CAA AGT AAT ACC CTT TGG TCT ACA AGT GAA CGC TTT AAA GCG CCC TGG CCG GAG CGA AGA AAC GCC TAA TTA AAC TGG GTA TAT TAT GCA GGG AGT TCC AGA GAA GGG TGT CAA CAG GAA AGA GCC TAA GGG TCC AAG TGC AAG CAT TAA AAT CGG GCC CG GC G C GCG ACT CCG AAC G CTC CTC AAT CCT ATG GAA TTC CCG ATG GAT GGA GGA CAA CGT GCC CAA ATG CAG TCC AAT GAG GAG CAA TTT TCT AGG GTC TCA GGT ATT AAA TGA GAA TAC GAT TCC AAG GAT TGA TGC GCC CAA CAG ATC GCT CAA CAC AAT AAG GAG AGA ATT GAA AAC AGT TTC TTC GAA ATC AAA AAA ACT TAA TTG AAC CTG CTG CCA AAA GAC CCC CCT GAT CCC GCA TTA TCT GGG GTA GCT CAC GAC CAA AAA GAT CAA ACC TCA AAC ACT AAC GCA GTT GAA CTG ACT TCT GGT GGG TGA TAC ATT ACC CTG TAC AGA GAG TCA ACT TGT AGT GGA AAG CCC GAA GTA GTT AGG CAA CCC CGC AAT GC TGC TAC ACA CCG AAT AAA TCG GCA ACA GAG GTG CC TCC ACT GGG ATC TTT ATA AAA GC AAA AAC G AAA GG C GGC GAA GC CC GGT GCC GCC GGC GG CG GG GGG G GG GG C GC GG C TGC

EXPERIMENTAL PROCEDURES

Strains, Media, and Growth Conditions—S. cerevisiae strain RS58 (16) was used in this work.
Yeast cells were routinely grown on medium containing 1% yeast extract, 2% Bacto-peptone, and 2% glucose at 30 °C. The stress treatment involved a 30-min osmotic shock imposed by adding a final concentration of 0.5 M sorbitol to a yeast culture grown up to A660 = 0.5. For E. coli, most of the experiments were carried out using strain DH5α. E. coli cells were routinely grown on LB medium (1% Tryptone, 0.5% yeast extract, and 0.5% NaCl) at 37 °C. As in the case of yeast cells, an osmotic shock was imposed on bacterial cultures grown up to A600 = 0.5 by the addition of NaCl to a final concentration of 0.4 M, and cells were harvested after 10, 20, and 60 min of shock treatment. E. coli strain W3110 and its derivative, HMY15, carrying an rmf::Cmr mutation, were kindly provided by Dr. A. Ishihama (17). Osmosensitivity tests were performed by growing E. coli strains in liquid LB medium with or without 0.5 M NaCl at 37 °C. Samples were removed at the times indicated, and their A600 values were determined. Strain HMY15 was grown in the presence of chloramphenicol (20 μg/ml). Levels of survival were obtained by exposing log-phase bacterial cultures to osmotic shock with 0.5 M NaCl for 30 and 60 min. After serial dilutions, viable cells were estimated by colony counting. Mean survival is expressed as percentage of the survival obtained in cultures not exposed to osmotic shock.

DNA Constructions—All full-length hydrophilin genes were obtained by polymerase chain reaction amplification of chromosomal DNA. The primers used to obtain E. coli and S. cerevisiae hydrophilin gene probes are listed in Table I. The purified polymerase chain reaction products were used as probes for all Northern blot experiments.

Northern Blot Analysis—E. coli and S. cerevisiae RNAs were extracted following standard procedures (18) with some modifications. To remove the remaining transfer RNA, an additional precipitation step using LiCl to a final concentration of 2.0 M was included.
After centrifugation, the RNA pellet was washed twice with 70% ethanol and resuspended in water (13). For all Northern blot experiments, 10 μg of total RNA was separated by electrophoresis on formaldehyde-containing 1.2% agarose gel and blotted onto nylon N+ (Amersham Pharmacia Biotech). All hybridizations were carried out under high stringency conditions (65 °C and 300 mM sodium phosphate (pH 7.2), 7% SDS, and 10 mM EDTA) (19).

Computational Methods—To search the protein data bases, an ad hoc program written in Perl was used. For each sequence, the program records the amino acid composition and performs a hydrophilicity analysis based on the algorithm of Kyte and Doolittle (20), using a window size of 12 residues. The program’s output is a list, sorted according to a given score, of the sequences that passed the selection criterion. The selection criterion and the scoring function can be given separately to the program at runtime. Throughout this work, the selection criterion was an average hydrophilicity above 1 and at least 8% Gly; the score or index of significance was defined as shown in Equation 1.

\[
\text{score} = 100 \times (\%\,\text{Gly} - 8) \times (\text{average hydrophilicity} - 1) \qquad \text{(Eq. 1)}
\]

After trying different combinations of several structural and physicochemical features common to LEA proteins, this criterion was chosen because it is simple and can adequately discriminate between LEA proteins and other plant proteins. The score function was chosen to give zero at the threshold values and to grow larger as a protein becomes more distinguishable from the bulk of the non-LEA proteins. Different sequence sets were used in this study. To develop the selection criterion and to assess its adequacy, we used a control set that includes almost all known LEA proteins, 56 sequences in total. All other sets were derived from the Swiss Protein Database (Release 36). Sequences recorded as fragments and those shorter than 30 residues were not used.
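The selection criterion and Equation 1 can be sketched in a few lines of Python. This is an illustration only, not the authors' Perl program. It assumes that hydrophilicity is the negated Kyte-Doolittle hydropathy value and that "average hydrophilicity" is the mean of all 12-residue sliding-window means; the text does not spell out either detail. All function and constant names are hypothetical.

```python
# Kyte-Doolittle hydropathy scale (20); positive values are hydrophobic.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

GLY_MIN = 8.0     # percent Gly threshold stated in the text
HYDRO_MIN = 1.0   # average hydrophilicity threshold stated in the text
WINDOW = 12       # window size stated in the text
MIN_LENGTH = 30   # shorter sequences were not used


def gly_percent(seq):
    """Glycine content as a percentage of all residues."""
    return 100.0 * seq.count("G") / len(seq)


def average_hydrophilicity(seq, window=WINDOW):
    """Mean of sliding-window hydrophilicity values.

    Assumption: hydrophilicity = -hydropathy. Sequences must be
    upper-case and contain only the 20 standard residues.
    """
    values = [-KD_HYDROPATHY[aa] for aa in seq]
    window = min(window, len(values))
    means = [sum(values[i:i + window]) / window
             for i in range(len(values) - window + 1)]
    return sum(means) / len(means)


def passes_criterion(seq):
    """Average hydrophilicity above 1 and at least 8% Gly."""
    return (len(seq) >= MIN_LENGTH
            and average_hydrophilicity(seq) > HYDRO_MIN
            and gly_percent(seq) >= GLY_MIN)


def significance_index(seq):
    """Equation 1; only meaningful for sequences that pass the criterion."""
    return (100.0 * (gly_percent(seq) - GLY_MIN)
            * (average_hydrophilicity(seq) - HYDRO_MIN))


def rank_hydrophilins(proteins):
    """Return (name, score) pairs for passing sequences, best first."""
    hits = [(name, significance_index(seq))
            for name, seq in proteins.items() if passes_criterion(seq)]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Filtering before scoring matters here: the product in Equation 1 is also positive when both factors are negative, so the score alone cannot serve as the selection test.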
To find new hydrophilins, we used different taxon-specific subsets of the Swiss Protein Database. Classification into taxons was made by the Fetch program of the Wisconsin software package (Version 9.1, Genetics Computer Group, Inc., Madison, WI). The groups considered were plant, bacteria (excluding E. coli), fungi (excluding S. cerevisiae), E. coli, and S. cerevisiae. In the case of E. coli and S. cerevisiae, searches were performed over hypothetically translated open reading frames of the corresponding complete genomes (E. coli K12 MG1655 complete genome, Release M52, GenBank™ accession number U00096; S. cerevisiae Genome Database, Release 4.1). Three-state secondary structure prediction was performed by the PHD sec service of the PredictProtein Server.²

RESULTS

Identification of Hydrophilins in Bacteria, Fungi, and Plants—LEA proteins did not evolve from a single ancestral protein, but they can be grouped based on physicochemical features, mainly high glycine content and a large proportion of charged residues. LEA proteins also have characteristic hydropathy profiles, which distinguish them from other proteins (Table II). We used these distinctive traits to design a reliable method to retrieve LEA-like non-plant proteins from data bases. The search criterion was established by searching a control set, including most known LEA protein sequences and the plant subset of the Swiss Protein Database, with an ad hoc program (see “Experimental Procedures”) that considers only glycine content and average hydrophilicity, two properties that resulted in a high selectivity for LEA proteins (Fig. 1). Proteins with an average hydrophilicity above 1 and at least 8% glycine content were retained. We coined the name “hydrophilins” to group the proteins retrieved based on this criterion (Fig. 1). In addition to these fixed thresholds, the algorithm assigned a score to the searched proteins, which allowed us to sort them by their significance.
The score, or index of significance, defined in Equation 1 (see “Experimental Procedures”) has the properties of being zero at the threshold values and of becoming larger as a protein moves away from the bulk of the non-LEA proteins (Fig. 1).

² The computer program and the sequences in the control set are available from the authors.

TABLE II
Characteristics of the hydrophilins found in different organisms
The protein hydropathy profiles were obtained using MacMolly’s Translate package.
(1) Swiss Protein Database. Functions assigned to some of these proteins: PRTL_ECOLI, protamine-like protein (33); RMF_ECOLI, ribosome modulation factor (17); GSIB_BACSU, induced by general stress conditions (31); COTT_BACSU, spore core protein (29); S65242 (GRE1), induced by different stress conditions (13); STF2_YEAST, stabilizes the F1F0-ATPase · inhibitor complex (35); SIP18_YEAST and HS12_YEAST, induced by different stress conditions (14, 34); IF1A_YEAST, eukaryotic initiation factor 1A (36); RL44_YEAST, ribosomal protein (37); CON6_NEUCR and CONX_NEUCR, conidiation proteins (26, 27). All proteins listed in the plant group correspond to LEA proteins (11, 13).
(2) H.I.: hydropathic index. This was obtained according to Kyte and Doolittle (20) using a computer program that assigns the appropriate hydropathy value to each residue in a protein along its amino acid sequence and then successively sums those values within overlapping segments displaced from each other by one residue (20).
(3) %Gly: glycine content expressed as percentage of total amino acid content.
(4) %Coil: proportion of the polypeptide predicted to be structured as a random coil.
(5) Index of significance: corresponds to an arbitrary number that was chosen to give zero at the threshold values and that increases as a protein becomes more distinguishable from the bulk of the non-LEA proteins (see “Experimental Procedures” for details).
aa, amino acids.
FIG. 1. Plot of the hydrophilicity and percentage of Gly content of some relevant protein groups. This analysis included data obtained for all E. coli and S. cerevisiae proteins as well as for LEA proteins in the control set. For technical limitations of the graphing program, only 4000 of the ~6200 proteins in S. cerevisiae are represented. The LEA proteins that were not classified as hydrophilins have also been incorporated. The hydrophilicity index was obtained as described under “Experimental Procedures” (20). Black circles, E. coli proteins; gray circles, S. cerevisiae proteins; blue triangles, control set LEA proteins; yellow triangles, non-hydrophilin LEA proteins; red squares, E. coli hydrophilins; green circles, S. cerevisiae hydrophilins.

Of the known LEA plant proteins, 91% had index values >0 and were classified as hydrophilins. As expected, one group of LEA proteins with a low glycine content and marginal hydrophilicity was not retrieved by our selection criterion. These proteins are considered atypical and belong to group V, such as LEA5 and LEA14 from cotton (21). For this reason, they were excluded from the control set of LEA proteins (Fig. 1). That our search criterion is specific is further confirmed by the fact that of 5792 plant proteins searched, only 72 were selected, and 52 of these are known LEA proteins (Fig. 1). Additional hydrophilins were identified: two LEA proteins not previously considered, two of the so-called COR proteins, a flocculent-active protein (MO2), a seed antimicrobial peptide (MBP1), and several ribosomal proteins (14). Interestingly, some of these proteins have been associated with water-deficit situations. This is the case of the alfalfa COR410 and wheat CORA proteins, which not only present sequence similarity to some LEA proteins, but also their genes are induced during cold and drought stress (22, 23). On the other hand, MBP1 and MO2 proteins are found in the desiccated seed (24, 25).
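As a quick check of the specificity claim for the plant set, the two proportions implied by the counts quoted above (72 selected of 5792 searched, 52 of the selected being known LEA proteins) work out as follows. The percentages are derived here from those counts and are not stated in the text.

\[
\frac{72}{5792} \approx 1.2\% \ \text{of plant proteins selected}, \qquad \frac{52}{72} \approx 72\% \ \text{of the selected proteins are known LEA proteins}
\]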
Regarding the ribosomal proteins, no information is available on their expression patterns under stress conditions (Swiss-Prot accession numbers p10851, p50888, p49195, p26871, p27083, p50892, p27070, p48857, p49213, p48130, p51190, p49164, p49689, and p28520). When the same analysis was performed with the fungal and bacterial protein subsets (excluding S. cerevisiae and E. coli; see below), again a discrete group of proteins was classified as hydrophilins according to our criterion. In fungi, only 12 out of 2264 sequences passed the selection criterion. This group included two proteins, CON6 and CON10, involved in conidiation (a developmental stage at which a dehydration process occurs) (26, 27); a mitochondrial protein (28); six ribosomal proteins (Swiss-Prot accession numbers p52809, p27076, p31028, p31027, p31866, and p27075); and three hypothetical proteins. In bacteria, 14 hydrophilin candidates out of 12,911 sequences were selected. Three of them are involved in Bacillus subtilis sporulation (GsiB, SspF, and CotT), and specifically, GsiB is also induced in response to different stress conditions, including osmotic stress (29–31). Of the remaining 11 hydrophilins, seven correspond to hypothetical proteins, two are classified as single-stranded DNA-binding proteins (Swiss-Prot accession numbers p40947 and p44409), and two as ribosomal proteins (Swiss-Prot accession numbers p52829 and p46386).

Hydrophilins in Two Complete Genomes: E. coli and S. cerevisiae—To have a more complete picture of the distribution of hydrophilins in the genomes of prokaryotes and eukaryotes, we searched the complete genome data bases of the bacterium E. coli and the yeast S. cerevisiae. This was important because the study of complete genomes eliminates many sources of bias in the composition of the data bases, specifically the fact that genes tend to be isolated and studied by virtue of being related to other known genes. In E.
coli, only five candidates were found when the selection criterion was applied against the 4289 proteins of Release M52 of the complete genome, whereas the genome of S. cerevisiae codes for 6200 proteins, 12 of which qualified (Fig. 1 and Table II). Many of the identified proteins corresponded to open reading frames that had not been characterized before and whose amino acid sequences did not show significant sequence similarity among them or to other proteins in the data bases. Others had already been described, and specific functions had been assigned to them. This is the case of two of the five E. coli hydrophilins. The ribosome modulation factor (RMF) is a protein that is associated with the 100 S ribosome dimers and that accumulates in E. coli cells upon growth transition from the exponential to the stationary phase (17, 32). PRTL has been described as a small and basic polypeptide that is referred to as the P-protein because it has a sequence similar to that of the trout protamine protein; however, its function is still unknown (33). In yeast, the genes for three of the 12 selected hydrophilins (HSP12, SIP18, and GRE1, whose functions are unknown) have been reported to be induced by osmotic stress (13, 14, 34). Of the other nine genes, functional characterizations have been reported only for STF2, IF1A, and RL44. STF2 encodes a 15-kDa polypeptide that has been suggested to help stabilize the complex formed between F1F0-ATPases and an inhibitor protein upon cessation of phosphorylation, keeping the enzyme inactive during de-energization of mitochondrial membranes (35). IF1A encodes a eukaryotic initiation factor 1A that seems to be required to maximize the rate of protein biosynthesis (36). Finally, RL44 (YP2α) encodes one of four ribosomal acidic phosphoproteins characterized in S. cerevisiae that are involved in the regulation of the activity of the 60 S ribosomal subunit.
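For the two complete genomes searched above, the fraction of proteins that meet the criterion can be worked out directly from the counts given (5 of 4289 in E. coli and 12 of 6200 in S. cerevisiae). The percentages are derived here from those counts.

\[
\frac{5}{4289} \approx 0.12\% \ (\text{E. coli}), \qquad \frac{12}{6200} \approx 0.19\% \ (\text{S. cerevisiae})
\]

Both values fall below 0.2% of the proteins encoded by the genome.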
It has been shown that these phosphoproteins are not required for cell viability, but regulate the pattern of expression in yeast (37).

Secondary Structure of Hydrophilins—One of the striking characteristics of most LEA proteins is that they are predicted to be highly unstructured molecules, existing principally as random coils (Fig. 2 and Table II). This has been confirmed in a few cases by circular dichroism (7, 8). Accordingly, when the selected E. coli and S. cerevisiae hydrophilins were subjected to secondary structure prediction analysis, the results indicated that, in every case, they tend to adopt a coil configuration over much of their length (50–80%) (Fig. 2 and Table II). In contrast, in a control group made of 89 randomly chosen E. coli and S. cerevisiae proteins, only six proteins (7%) in the E. coli sample and four proteins (4.5%) in the S. cerevisiae sample had >50% predicted coil regions.

FIG. 2. Distribution of LEA proteins, 89 randomly chosen E. coli and S. cerevisiae proteins, and 17 E. coli and S. cerevisiae hydrophilins according to their predicted contents of random-coiled structures. Three-state secondary structure prediction was performed by the PHD sec service of the PredictProtein Server. Black bars, sample E. coli proteins; white bars, sample S. cerevisiae proteins; gray bars, LEA proteins; hatched bars, hydrophilins.

Hydrophilin Expression Pattern under Osmotic Stress Conditions—The high correlation between the expression pattern under water limitation and the physicochemical characteristics present in different hydrophilins (including LEA proteins) suggested that the induction in response to water-deficit conditions could be conserved among all hydrophilins. To test this hypothesis, we carried out Northern blot analysis using as probes amplified sequences corresponding to hydrophilins from E. coli and S. cerevisiae (see above). These probes were hybridized against total RNA obtained from E. coli or S.
cerevisiae grown under optimal conditions or under hyperosmotic stress (see “Experimental Procedures”). The probes corresponding to SIP18 and HSP12 were not included in this analysis because data on induction by water stress had already been reported (14, 34). All E. coli hydrophilin transcripts studied here accumulated in response to osmotic stress. The highest accumulation was for YCIG_ECOLI and YJBJ_ECOLI transcripts after 20 min of hyperosmotic treatment. In both cases, we detected additional transcripts of higher molecular weight responsive to the imposed stress conditions. This observation suggested that YCIG_ECOLI and YJBJ_ECOLI genes are part of operons being transcribed as polycistronic mRNAs. A significant transcript accumulation in response to osmotic stress was also detected for the PRTL_ECOLI gene, whose highest accumulation occurred after 10 min of stress treatment (Fig. 3). The RMF_ECOLI transcript also responded to the imposition of a water deficit; however, in contrast with the other hydrophilin genes, the increase in mRNA accumulation was not detected until 60 min of stress treatment (Fig. 3). The mRNA for the YHDL_ECOLI gene was not detectable on gel blot hybridizations under control or stress conditions and was therefore not included in the study. The analysis of the expression patterns of the 12 S. cerevisiae hydrophilin transcripts showed that only eight accumulated in response to hyperosmotic conditions: S65242, STF2_YEAST, SI18_YEAST, YBM6_YEAST, HS12_YEAST, YJ04_YEAST, YJS4_YEAST, and YNT0_YEAST (Fig. 4). As indicated above, we pursued the study of only those yeast hydrophilins whose osmotic stress response has not been reported before. As in E. coli hydrophilins, although the absolute levels of the yeast hydrophilin transcripts varied, most were clearly induced by water-deficit conditions. Interestingly, a correlation was found between the level of induction and the index of significance of the E. coli and S. cerevisiae hydrophilins. 
In yeast, for example, the highest induction was for those proteins with the highest index (S65242, STF2_YEAST, SIP18_YEAST, YBM6_YEAST, and HS12_YEAST). This result reinforces the suitability of the search criterion to select new hydrophilins as they appear in data bases.

Contribution of an E. coli Hydrophilin to the Adaptive Response to Hyperosmotic Conditions—As mentioned above, one of the E. coli proteins classified as a hydrophilin is a protein identified as RMF. This protein shows an unusual behavior, accumulating in E. coli cells upon the growth transition from the exponential to the stationary phase, and in contrast to other ribosome components or associated proteins, rmf expression depends, inversely, on growth rate (17), suggesting its participation in the bacterial response to stress conditions. To investigate the contribution of this hydrophilin to the E. coli adaptive response to osmotic stress, we took advantage of the existence of characterized null mutants in the rmf gene (rmf::Cmr) (17) to carry out osmosensitivity tests. As shown in Fig. 5, a wild-type strain (W3110) presented a similar growth rate under optimal growth conditions as well as under hyperosmotic stress. In contrast, the isogenic rmf::Cmr mutant (HMY15) showed a dramatic decrease in its growth rate under stress conditions (Fig. 5A). To evaluate the viability of the wild-type and mutant strains, both bacterial cultures were subjected to osmotic shock with 0.5 M NaCl. The results from these experiments showed that a major difference in survival was detected after a 30-min osmotic shock, when the strain carrying the wild-type rmf gene showed 82% survival, whereas the rmf::Cmr mutant strain showed 33% survival (Fig. 5B).

FIG. 3. Time course of the E. coli hydrophilin transcript accumulation in response to osmotic stress. E. coli cultures were grown to A600 = 0.5, and at this point, the stress was imposed by adding NaCl to a final concentration of 0.4 M (see “Experimental Procedures” for details). Cells were harvested at the times indicated, and total RNA was extracted. RNAs (10 μg/lane) were electrophoretically separated, blotted, and hybridized against the corresponding [α-32P]dCTP-labeled probes. High stringency conditions were used for all hybridizations and washes. Experiments were repeated at least three times. Ethidium bromide-stained ribosomal RNAs are shown at the bottom as loading reference. b, bases.

FIG. 4. S. cerevisiae hydrophilin transcript accumulation in response to osmotic stress. S. cerevisiae cultures were grown to A660 = 0.5, and at this point, the stress was imposed by the addition of sorbitol to a final concentration of 0.5 M (see “Experimental Procedures” for details). Cells were harvested after 30 min of treatment, and total RNA was extracted. RNAs (10 μg/lane) were electrophoretically separated, blotted, and hybridized against the corresponding [α-32P]dCTP-labeled probes. High stringency conditions were used for all hybridizations and washes. Ethidium bromide-stained ribosomal RNAs are shown at the bottom as loading reference. Experiments were repeated at least three times. The numbers below show the quantification of transcript accumulation when compared with the transcript levels obtained under control conditions. The quantification was done using NIH Image 1.60b7. Kb, kilobases.

DISCUSSION

In higher plants, proteins classed as LEA proteins present the common characteristics of being induced when a loss of water from the cell occurs. In this work, we focused our attention on the structural unifying aspects of most of these proteins: their extreme hydrophilicity and high content of glycine residues.
The methodical screening that we carried out indicated, first, that proteins with these properties can be clearly distinguished from the bulk of proteins in any given organism and, second, that these structural features are not restricted to the LEA proteins, but are dispersed among proteins from different organisms. This was evident from the analysis of the E. coli and S. cerevisiae genomes, which showed only 5 and 12 proteins, respectively, with these characteristics. More important, the criterion we used in this analysis, which does not imply sequence similarity and is based on two simple characteristics, is highly selective. Systematically, we found that ~1% of the proteins passed this criterion. From the complete OWL Database, only 1800 out of 250,000 sequences (0.72%) were selected. Of the two characteristics, the hydrophilicity threshold is by far the more selective, eliminating 19 out of 20 proteins, compared with the glycine threshold, which merely eliminates 16 of every 20 proteins. An additional conclusion from this work is the remarkable correlation found in plants, E. coli, and S. cerevisiae between expression under water deficit and the physicochemical characteristics of hydrophilins. The strongest correlation was found in E. coli, where the genes encoding four of the five proteins selected as hydrophilins were induced by osmotic stress treatment (one transcript was undetectable under our conditions). In the case of S. cerevisiae, there were some exceptions: 4 of the 12 selected hydrophilin transcripts did not accumulate after osmotic shock. Reassurance comes from the observation that, on average, the exceptions presented lower scores than the osmotic stress-induced hydrophilins, suggesting that they are not true hydrophilins or that they are induced under different experimental conditions (Fig. 4). By addressing the question of how many of the well studied proteins in E.
coli are related to a hyperosmotic condition, the probability that a randomly selected protein happens to be inducible by water deficit was estimated. In the E. coli genome, 1477 open reading frames are said to be >95% identical to Swiss Protein Database entries. Of these entries (which constitute a non-redundant set), 880 have some annotation under “Function” or “Induction”, 26 of which suggest some relationship to dehydration or hyperosmosis. Thus, ~3% or less of the proteins in any group are expected to be induced by osmotic shock. In contrast, the data in this work indicate that all the detectable E. coli hydrophilin genes are induced under these conditions, whereas in S. cerevisiae, 70% of the selected genes are responsive. Therefore, we can conclude that the association found between the selected physicochemical properties and transcript induction by osmotic shock cannot be due to mere chance.

Hydrophilic Proteins Are Common during Water Deficit 5673

FIG. 5. Contribution of RMF to the adaptive response to hyperosmotic conditions. A, E. coli strains W3110 and HMY15 (rmf::Cmr) were grown in liquid LB medium at 37 °C with or without 0.5 M NaCl. Culture samples were removed at the times indicated, and their A600 values were determined. Open circles, W3110 control; open squares, HMY15 control; closed circles, W3110 (0.5 M NaCl); closed squares, HMY15 (0.5 M NaCl). B, levels of survival were obtained by exposing log-phase bacterial cultures to osmotic shock with 0.5 M NaCl for 30 or 60 min. Mean survival is expressed as percentage of the survival obtained in cultures not exposed to osmotic shock. Experiments were repeated at least three times. Closed circles, W3110; open circles, HMY15.

Our criterion also selected many hydrophilin proteins from other groups besides plants, E. coli, and S. cerevisiae. Only in a few cases did we find information to suggest that the genes for these proteins were induced by water deficit.
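The chance argument made earlier in this section can be made concrete with a short calculation. The sketch below takes the background rate of 26 of 880 annotated E. coli proteins and asks how likely the observed induction counts would be under a simple binomial model. The binomial model, the independence of genes, and the use of the E. coli background rate for S. cerevisiae are all assumptions made for illustration; this is not an analysis performed in this work. The counts used are 4 of 4 detectable E. coli transcripts and 8 of 12 S. cerevisiae transcripts (4 of the 12 did not accumulate).

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


# Background rate from the annotation survey: 26 of 880 annotated entries
# suggest some relationship to dehydration or hyperosmosis (about 3%).
BACKGROUND_RATE = 26 / 880

# ASSUMPTION: each gene is induced independently with the background rate.
# E. coli: all 4 detectable hydrophilin transcripts were induced.
p_ecoli = binomial_tail(4, 4, BACKGROUND_RATE)

# S. cerevisiae: 8 of 12 transcripts accumulated. ASSUMPTION: the E. coli
# background rate is reused here only for illustration.
p_yeast = binomial_tail(12, 8, BACKGROUND_RATE)

if __name__ == "__main__":
    print(f"background rate: {BACKGROUND_RATE:.4f}")
    print(f"P(>=4 of 4 induced by chance):  {p_ecoli:.3e}")
    print(f"P(>=8 of 12 induced by chance): {p_yeast:.3e}")
```

Under these assumptions, both probabilities are several orders of magnitude below the background rate itself, which is the sense in which the association is unlikely to arise by chance.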
The list of such cases must be considered incomplete. Future releases of the Swiss Protein Database with more entries and information on the function or inducibility of their sequences will undoubtedly provide many more examples. This is true even of the plant group. Metazoa were not analyzed in this work because, with few exceptions, the response to dehydration has not been studied in this group. However, a preliminary examination indicates that the Swiss Protein Database has 193 metazoan proteins selected by our algorithm (data not shown). This set of proteins is very heterogeneous in terms of described function. It includes toxins, DNA-binding proteins, ribosomal proteins, and some extracellular proteins. At present, a functional correlation with dehydration has not been established in this group.

Even though no significant sequence similarity was found among most of the different hydrophilins, common functional features could be distinguished. This is the case for those that seem to interact with nucleic acids such as protamine, the single-stranded DNA-binding proteins, or the ribosomal proteins that interact with RNA. Thus, it could be argued that their structural properties (presence of charged amino acids) are simply associated with their ability to bind nucleic acids, and that this would hold for all DNA- or RNA-binding proteins; however, not all the proteins with this functional characteristic were classified as hydrophilins. For example, in E. coli, only one ribosomal protein, RMF, was included in the hydrophilin group. Given that a significant proportion of the hydrophilins from different organisms are ribosomal proteins (organellar and cytoplasmic), it could be hypothesized that some of them were selected as part of an adaptive mechanism that helps to maintain a functional translational apparatus under water-limited environments. This hypothesis is supported by the data in Fig.
5 showing that a strain lacking the hydrophilin identified as RMF (rmf::Cmr) is sensitive to hyperosmotic conditions.

As indicated above, the hydrophilins constitute a heterogeneous group of proteins, in which, at least in the case of plant hydrophilins (mostly LEA proteins), six different families can be identified (2, 3). The analysis of the non-plant hydrophilin protein sequences has also shown that some could be grouped into class 1 of LEA proteins since homologous repetitive motifs are conserved not only in all members of this group, but also in the GsiB protein from B. subtilis (31). The presence of conserved motifs among the proteins in each family suggests that they play similar roles, probably different from those carried out by hydrophilins in other groups. Therefore, the common physicochemical properties of hydrophilins could be the result of the selection of a general solution in proteins that could have different functions; for instance, some of them could function as stabilizers of either cell structures (e.g. membranes) or macromolecular complexes (e.g. transcriptosomes, ribosomes).

The high content of glycine residues in most LEA proteins allows one to predict that these proteins exist as random coils (5, 6). This is consistent with experimental data obtained for some LEA proteins (Em1, Dsp16, and G50) that showed a high percentage of randomly coiled moieties, suggesting that these proteins may be largely unfolded in their native state (7, 8). In the case of hydrophilins, the analysis of their predicted three-state structure also suggested that characteristically they exist as loosely structured proteins (Fig. 2), in contrast with the majority of proteins present in an organism. This might be related to their hypothetical role in helping to maintain a minimum cellular water content for survival.
If we consider that the sequence of a protein determines its function only indirectly, by guaranteeing that the protein will have the adequate structural and physicochemical characteristics required, it is possible that exceptional traits could serve as the criterion to search for the function when the proteins have no sequence similarity due either to convergence or to very distant ancestry. In our case, this was a fruitful strategy. By showing that the hydrophilin physicochemical properties are associated with a common expression pattern, our results suggest that these proteins represent a common adaptation to osmotic stress in the most diverse taxons.

Acknowledgments—We thank B. Barkla, M. Rocha, J. L. Folch, F. Campos, and E. Alvarez-Buylla for critical reading of the manuscript; R. M. Solórzano for technical assistance; and P. Gaytán and E. López for the synthesis of oligonucleotides.

REFERENCES
1. Bray, E. A. (1997) Trends Plant Sci. 2, 48–54
2. Colmenero-Flores, J. M., Campos, F., Garciarrubio, A., and Covarrubias, A. A. (1997) Plant Mol. Biol. 35, 393–405
3. Ingram, J., and Bartels, D. (1996) Annu. Rev. Plant Physiol. Plant Mol. Biol. 47, 377–403
4. Baker, J., Steele, C., and Dure, L., III (1988) Plant Mol. Biol. 11, 277–291
5. Dure, L., III (1993) in Plant Responses to Cellular Dehydration during Environmental Stress (Close, T. J., and Bray, E. A., eds) Vol. 10, pp. 91–103, American Society of Plant Physiology, Rockville, MD
6. Dure, L., III (1993) Plant J. 3, 363–369
7. Lisse, T., Bartels, D., Kalbitzer, H. R., and Jaenicke, R. (1996) Biol. Chem. 377, 555–561
8. McCubbin, W. D., and Kay, C. M. (1985) Can. J. Biochem. 63, 803–810
9. Close, T. J., and Lambert, P. J. (1993) Plant Physiol. (Bethesda) 101, 773–779
10. Close, T. J. (1996) Physiol. Plant. 97, 795–803
11. Close, T. J. (1997) Physiol. Plant. 100, 291–296
12. Mtwisha, L., Brandt, W., McCready, S., and Lindsey, G. G. (1998) Plant Mol.
Biol. 37, 513–521
13. Garay-Arroyo, A., and Covarrubias, A. A. (1999) Yeast 15, 879–892
14. Varela, J. C., van Beekvelt, C., Planta, R. J., and Mager, W. H. (1992) Mol. Microbiol. 6, 2183–2190
15. Springer, M. L., and Yanofsky, C. (1992) Genes Dev. 6, 1052–1057
16. Gaxiola, R., Larrinoa, I. F., Villalba, J. M., and Serrano, R. (1992) EMBO J. 11, 3157–3164
17. Yamagishi, M., Matsushima, H., Wada, A., Sakagami, M., Fujita, N., and Ishihama, A. (1993) EMBO J. 12, 625–630
18. Collart, M., and Oliviero, S. (1995) in Current Protocols in Molecular Biology (Ausubel, F. M., Brent, R., Kingston, R. E., Moore, D. D., Seidman, J. G., Smith, J. A., Struhl, K., eds) Vol. 2, Suppl. 23, pp. 13.12.1–13.12.5, John Wiley & Sons, Inc., New York
19. Church, G. M., and Gilbert, W. (1984) Proc. Natl. Acad. Sci. U. S. A. 81, 1991–1995
20. Kyte, J., and Doolittle, R. F. (1982) J. Mol. Biol. 157, 105–132
21. Galau, G. A., Wang, H. Y., and Hughes, D. W. (1993) Plant Physiol. (Bethesda) 101, 695–696
22. Danyluk, J., Houde, M., Rassart, E., and Sarhan, F. (1994) FEBS Lett. 344, 20–24
23. Laberge, S., Castonguay, Y., and Vezina, L. P. (1993) Plant Physiol. (Bethesda) 101, 1411–1412
24. Duvick, J. P., Rood, T., Rao, A. G., and Marshak, D. R. (1992) J. Biol. Chem. 267, 18814–18820
25. Gassenschmidt, U., Jany, K. D., Tauscher, B., and Niebergall, H. (1995) Biochim. Biophys. Acta 1243, 477–481
26. Roberts, A. N., Berlin, V., Hager, K. M., and Yanofsky, C. (1988) Mol. Cell. Biol. 8, 2411–2418
27. White, B. T., and Yanofsky, C. (1993) Dev. Biol. 160, 254–264
28. Amegadzie, B. Y., Zitomer, R. S., and Hollenberg, C. P. (1990) Yeast 6, 429–440
29. Aronson, A. I., Song, H. Y., and Bourne, N. (1989) Mol. Microbiol. 3, 437–444
30. Stephens, M. A., Lang, N., Sandman, K., and Losick, R. (1984) J. Mol. Biol. 176, 333–348
31. Völker, U., Egelmann, S., Maul, B., Riethdorf, S., Völker, A., Schmid, R., Mach, H., and Hecker, M. (1994) Microbiology 140, 741–752
32.
Wada, A., Igarashi, K., Yoshimura, S., Aimoto, S., and Ishihama, A. (1995) Biochem. Biophys. Res. Commun. 214, 410–417
33. Altman, S., Model, P., Dixon, G. H., and Wosnick, N. A. (1981) Cell 26, 299–304
34. Miralles, V. J., and Serrano, R. (1995) Mol. Microbiol. 17, 653–662
35. Yoshida, Y., Sato, T., Hashimoto, T., Ichikawa, N., Nakai, S., Yoshikawa, H., Imamoto, F., and Tagawa, K. (1990) FEBS Lett. 192, 49–53
36. Wei, C.-L., Kainuma, M., and Hershey, J. W. B. (1995) J. Biol. Chem. 270, 22788–22794
37. Remacha, M., Jimenez-Diaz, A., Bermejo, B., Rodriguez-Gabriel, M. A., Guarinos, E., and Ballesta, J. P. G. (1995) Mol. Cell. Biol. 15, 4754–4762