THE JOURNAL OF BIOLOGICAL CHEMISTRY
© 2000 by The American Society for Biochemistry and Molecular Biology, Inc.
Vol. 275, No. 8, Issue of February 25, pp. 5668 –5674, 2000
Printed in U.S.A.
Highly Hydrophilic Proteins in Prokaryotes and Eukaryotes
Are Common during Conditions of Water Deficit*
(Received for publication, September 29, 1999, and in revised form, November 29, 1999)
Adriana Garay-Arroyo‡§, José M. Colmenero-Flores‡§, Alejandro Garciarrubio¶, and
Alejandra A. Covarrubias‡‖
From the Departamentos de ‡Biología Molecular de Plantas and ¶Bioestructura y Reconocimiento Molecular,
Instituto de Biotecnología, Universidad Nacional Autónoma de México, 62250 Cuernavaca, Morelos, Mexico
The late embryogenesis abundant (LEA) proteins are
plant proteins that are synthesized at the onset of desiccation in maturing seeds and in vegetative organs exposed to water deficit. Here, we show that most LEA
proteins belong to a more widespread group,
which we call “hydrophilins.” The defining characteristics of hydrophilins are high glycine content (>6%) and
a high hydrophilicity index (>1.0). By data base searching, we show that this criterion selectively differentiates most known LEA proteins as well as additional
proteins from different taxa. We found that within the
genomes of Escherichia coli and Saccharomyces cerevisiae, only 5 and 12 proteins, respectively, meet our criterion. Despite their deceptively loose definition, hydrophilins usually represent <0.2% of the proteins of a
genome. Additionally, we demonstrate that the criterion
that defines hydrophilins seems to be an excellent predictor of responsiveness to hyperosmosis since most of
the genes encoding these proteins in E. coli and S. cerevisiae are induced by osmotic stress. Evidence for the
participation of one of the E. coli hydrophilins in the
adaptive response to hyperosmotic conditions is presented. Apparently, hydrophilins represent analogous
adaptations to a common problem in such diverse taxa as prokaryotes and eukaryotes.
Complete genome data bases are enabling novel approaches
to determine whether different organisms have evolved similar biochemical solutions to environmentally stressful conditions. Water stress may affect all types of organisms at some stage of
their life cycle. We aimed to determine whether similar proteins have
been recruited during evolution to respond to water-deficit
conditions. Hydrophilic proteins known as late embryogenesis
abundant (LEA)¹ proteins accumulate under conditions of extreme desiccation in higher plants. These proteins accumulate
to high levels during the last stage of seed formation (when a
natural desiccation of the seed tissues takes place) and during
periods of water deficit in vegetative organs (1). LEA proteins
have been grouped into at least six families on the basis of
sequence similarity (2, 3). Although significant similarity has
not been detected between the members of the different classes,
a unifying and outstanding feature of these proteins is their
high hydrophilicity and high percentage of glycines (4–6).
Amino acid sequence analysis predicts that these
proteins exist primarily as random coils (5). This property has
been confirmed in a few cases with purified proteins (7, 8) and is
supported by the fact that proteins of this type do not coagulate
upon heating (9, 10). LEA protein families have been identified
in such a wide range of plant species that they
can be considered ubiquitous in plants (3, 11). Moreover, it has
been shown that members of at least one of the LEA protein
families, the so-called dehydrins, are present in a range of
photosynthetic organisms, including lower plants, algae, and
cyanobacteria (9, 11).
* This work was supported by Grants 0131P-N9506 and 26242-N
from the Consejo Nacional de Ciencia y Tecnología (México) (to
A. A. C.). The costs of publication of this article were defrayed in part by
the payment of page charges. This article must therefore be hereby
marked “advertisement” in accordance with 18 U.S.C. Section 1734
solely to indicate this fact.
§ These authors contributed equally to this work.
‖ To whom correspondence should be addressed: Dept. de Biología
Molecular de Plantas, Inst. de Biotecnología, Universidad Nacional
Autónoma de México, Apdo. Postal 510-3, 62250 Cuernavaca, Mor.
Mexico. Tel.: 52-73-29-1643; Fax: 52-73-13-9988; E-mail: crobles@ibt.unam.mx.
¹ The abbreviations used are: LEA, late embryogenesis abundant;
RMF, ribosome modulation factor.
Two important questions to address are, how do these structural and physicochemical characteristics relate to the largely
unknown function of LEA proteins and do they represent a
solution to a plant-specific problem or to a more general one?
Recent data suggest that LEA-like proteins are also found in
fungi, where they also seem to be involved in pathways that
respond to water deficit. This is the case for HSP12 (12) and
GRE1 (13) from Saccharomyces cerevisiae, whose amino acid
composition shows a high content of hydrophilic amino acids
distributed throughout their primary structure. In both cases,
it has been shown that their corresponding transcripts accumulate in response to hyperosmosis (13, 14). The CON6-encoded protein of Neurospora crassa also exhibits hydrophilic
and highly charged repeated domains. The CON6 polypeptide
is present at high levels in the late stages of conidiation and in
the mature conidia, stages that involve nutrient deprivation
and developmentally imposed desiccation (15).
Since the LEA proteins of different classes have no evident
sequence similarity, it is unlikely that the structural and physicochemical characteristics they share are due to common descent. Instead, it seems that these physicochemical characteristics have evolved independently in different protein
families and in different organisms, and they allow LEA proteins to carry out their specific functions under partial dehydration. If this is the case, we should expect to find proteins
with similar characteristics outside the plant kingdom that are
also induced by conditions of water deficit. We have chosen the
name hydrophilins to group the proteins that, like most
LEA proteins in plants, present extreme hydrophilicity and a high
percentage of glycine residues in their sequence.
In this study, we explored the question of whether proteins
with these characteristics are widespread among bacteria and
other eukaryotes and whether they are induced by water deficit. To
pursue this objective, we designed an algorithm to search sequence data bases for proteins having a physicochemical resemblance to LEA proteins. Then we investigated, either from
the literature or by performing Northern blot experiments,
whether the transcripts of the selected genes accumulated under water deficit. After finding isolated examples in several
species, we searched the complete genome data bases of Escherichia coli and S. cerevisiae. A surprisingly high proportion of
the proteins chosen by the physicochemical criterion was also
induced at the mRNA level under water-deficit conditions.
Thus, we conclude that hydrophilins exist in several kingdoms,
including plants, bacteria, and fungi, and that they have been
recruited from diverse protein families to pathways involved in
responses to water-deficit conditions.
This paper is available on line at http://www.jbc.org
Hydrophilic Proteins Are Common during Water Deficit
TABLE I
Primers used to obtain E. coli and S. cerevisiae hydrophilin
gene probes
Name: YCIG_ECOLI, YJBJ_ECOLI, PRTL_ECOLI, RMF_ECOLI, YHDL_ECOLI, S65242, STF2_YEAST, YBM6_YEAST, YFBO_YEAST, IF1A_YEAST, S67772, YJO4_YEAST, YJS4_YEAST, YNTO_YEAST, RL44_YEAST
Sequence (5′ to 3′): [primer sequences not legible in this copy]
EXPERIMENTAL PROCEDURES
Strains, Media, and Growth Conditions—S. cerevisiae strain RS58
(16) was used in this work. Yeast cells were routinely grown on medium
containing 1% yeast extract, 2% Bacto-peptone, and 2% glucose at
30 °C. The stress treatment involved a 30-min osmotic shock imposed
by adding a final concentration of 0.5 M sorbitol to a yeast culture grown
up to A660 = 0.5. For E. coli, most of the experiments were carried out
using strain DH5α. E. coli cells were routinely grown on LB medium
(1% Tryptone, 0.5% yeast extract, and 0.5% NaCl) at 37 °C. As in the
case of yeast cells, an osmotic shock was imposed on bacterial cultures
grown up to A600 = 0.5 by the addition of NaCl to a final concentration
of 0.4 M, and cells were harvested after 10, 20, and 60 min of shock
treatment. E. coli strain W3110 and its derivative, HMY15, carrying an
rmf::Cmr mutation, were kindly provided by Dr. A. Ishihama (17).
Osmosensitivity tests were performed by growing E. coli strains in
liquid LB medium with or without 0.5 M NaCl at 37 °C. Samples were
removed at the times indicated, and their A600 values were determined.
Strain HMY15 was grown in the presence of chloramphenicol (20 μg/ml).
Levels of survival were obtained by exposing log-phase bacterial
cultures to osmotic shock with 0.5 M NaCl for 30 and 60 min. After serial
dilutions, viable cells were estimated by colony counting. Mean survival
is expressed as percentage of the survival obtained in cultures not
exposed to osmotic shock.
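The survival measure described above reduces to simple arithmetic, which can be sketched as follows. This is only an illustrative sketch: the function names are hypothetical, the plated volume and dilution values are made-up inputs, and the text does not state the exact plating scheme that was used.

```python
def cfu_per_ml(colonies, dilution, plated_ml):
    """Viable cells per ml from one plate count.

    colonies  : colonies counted on the plate
    dilution  : total dilution of the plated sample (e.g. 1e-5)
    plated_ml : volume spread on the plate, in ml (assumed input)
    """
    return colonies / (dilution * plated_ml)


def percent_survival(shocked_cfu, control_cfu):
    """Survival of the shocked culture relative to the unshocked control."""
    return 100.0 * shocked_cfu / control_cfu


# Hypothetical counts, chosen only to illustrate the calculation.
control = cfu_per_ml(colonies=200, dilution=1e-5, plated_ml=0.1)
shocked = cfu_per_ml(colonies=150, dilution=1e-5, plated_ml=0.1)
survival = percent_survival(shocked, control)
```

Because both cultures are plated at the same dilution and volume in this example, the survival percentage depends only on the ratio of the colony counts.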
DNA Constructions—All full-length hydrophilin genes were obtained
by polymerase chain reaction amplification of chromosomal DNA. The
primers used to obtain E. coli and S. cerevisiae hydrophilin gene probes
are listed in Table I. The purified polymerase chain reaction products
were used as probes for all Northern blot experiments.
Northern Blot Analysis—E. coli and S. cerevisiae RNAs were extracted following standard procedures (18) with some modifications. To
remove the remaining transfer RNA, an additional precipitation step
using LiCl to a final concentration of 2.0 M was included. After centrifugation, the RNA pellet was washed twice with 70% ethanol and resuspended in water (13). For all Northern blot experiments, 10 μg of
total RNA was separated by electrophoresis on formaldehyde-containing 1.2% agarose gel and blotted onto nylon N+ (Amersham Pharmacia
Biotech). All hybridizations were carried out under high stringency
conditions (65 °C and 300 mM sodium phosphate (pH 7.2), 7% SDS, and
10 mM EDTA) (19).
Computational Methods—To search the protein data bases, an ad
hoc program written in Perl was used. For each sequence, the program
records the amino acid composition and performs a hydrophilicity analysis based on the algorithm of Kyte and Doolittle (20), using a window
size of 12 residues. The program’s output is a list, sorted according to a
given score, of the sequences that passed the selection criterion. The
selection criterion and the scoring function can be given separately to
the program at runtime. Throughout this work, the selection criterion
was an average hydrophilicity above 1 and at least 8% Gly; the score or
index of significance was defined as shown in Equation 1.
100 × (% Gly − 8) × (average hydrophilicity − 1)    (Eq. 1)
After trying different combinations of several structural and physicochemical features common to LEA proteins, this criterion was chosen
because it is simple and can adequately discriminate between LEA
proteins and other plant proteins. The score function was chosen to give
zero at the threshold values and to grow larger as a protein becomes
more distinguishable from the bulk of the non-LEA proteins. Different
sequence sets were used in this study. To develop the selection criterion
and to assess its adequacy, we used a control set that includes almost all
known LEA proteins, 56 sequences in total. All other sets were derived
from the Swiss Protein Database (Release 36). Sequences recorded as
fragments and those shorter than 30 residues were not used. To find
new hydrophilins, we used different taxon-specific subsets of the Swiss
Protein Database. Classification into taxons was made by the Fetch
program of the Wisconsin software package (Version 9.1, Genetics Computer Group, Inc., Madison, WI). The groups considered were plant,
bacteria (excluding E. coli), fungi (excluding S. cerevisiae), E. coli, and
S. cerevisiae. In the case of E. coli and S. cerevisiae, searches were
performed over hypothetically translated open reading frames of the
corresponding complete genomes (E. coli K12 MG1655 complete genome, Release M52, GenBankTM accession number U00096; S. cerevisiae Genome Database, Release 4.1). Three-state secondary structure
prediction was performed by the PHD sec service of the PredictProtein
Server.²
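The selection criterion and score described in this section can be sketched in a few lines. The authors' program was written in Perl and is not reproduced here; the Python below is only an approximation under stated assumptions. In particular, it assumes that "average hydrophilicity" is the sign-flipped Kyte-Doolittle hydropathy averaged over all sliding windows, that residues absent from the scale are skipped, and that the function names (`percent_gly`, `average_hydrophilicity`, `score`, `screen`) are hypothetical.

```python
# Kyte-Doolittle hydropathy scale (positive values are hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def percent_gly(seq):
    """Glycine content as a percentage of all residues."""
    return 100.0 * seq.count("G") / len(seq)


def average_hydrophilicity(seq, window=12):
    """Mean of sliding-window hydrophilicity (sign-flipped hydropathy).

    Assumption: window means are averaged over the whole sequence and
    residues missing from the scale (e.g. X) are skipped.
    """
    values = [-KD[aa] for aa in seq if aa in KD]
    if not values:
        raise ValueError("sequence has no scorable residues")
    w = min(window, len(values))
    means = [sum(values[i:i + w]) / w for i in range(len(values) - w + 1)]
    return sum(means) / len(means)


def score(seq, gly_min=8.0, hyd_min=1.0, window=12):
    """Index of significance following Equation 1."""
    return 100.0 * (percent_gly(seq) - gly_min) * (
        average_hydrophilicity(seq, window) - hyd_min)


def screen(records, gly_min=8.0, hyd_min=1.0, window=12, min_len=30):
    """Return (name, score) pairs that pass the criterion, best first."""
    hits = []
    for name, seq in records:
        seq = seq.upper()
        if len(seq) < min_len:  # sequences shorter than 30 residues were not used
            continue
        gly = percent_gly(seq)
        hyd = average_hydrophilicity(seq, window)
        if gly >= gly_min and hyd > hyd_min:
            hits.append((name, 100.0 * (gly - gly_min) * (hyd - hyd_min)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

The score is only meaningful for sequences that pass both thresholds, because the product of two negative differences would also be positive; this is why `screen` applies the criterion before scoring.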
RESULTS
Identification of Hydrophilins in Bacteria, Fungi, and
Plants—LEA proteins did not evolve from a single ancestral
protein, but they can be grouped based on physicochemical
features, mainly high glycine content and a large proportion of
charged residues. LEA proteins also have characteristic hydropathy profiles, which distinguish them from other proteins
(Table II). We used these distinctive traits to design a reliable
method to retrieve LEA-like non-plant proteins from data
bases. The search criterion was established by searching a
control set, including most known LEA protein sequences and
the plant subset of the Swiss Protein Database, with an ad hoc
program (see “Experimental Procedures”) that considers only
glycine content and average hydrophilicity, two properties that
resulted in a high selectivity for LEA proteins (Fig. 1). Proteins
with an average hydrophilicity above 1 and at least 8% glycine
content were retained. We coined the name “hydrophilins” to
group the proteins retrieved based on this criterion (Fig. 1). In
addition to these fixed thresholds, the algorithm assigned a
score to the searched proteins, which allowed us to sort them by
their significance. The score, or index of significance, defined in
Equation 1 (see “Experimental Procedures”) has the properties
of being zero at the threshold values and of becoming larger as
a protein moves away from the bulk of the non-LEA proteins
(Fig. 1).
² The computer program and the sequences in the control set are
available from the authors.
TABLE II
Characteristics of the hydrophilins found in different organisms
The protein hydropathy profiles were obtained using MacMolly’s
Translate package. (1) Swiss Protein Database. Functions assigned to
some of these proteins: PRTL_ECOLI, protamine-like protein (33); RMF_ECOLI, ribosome modulation factor (17); GSIB_BACSU, induced by
general stress conditions (31); COTT_BACSU, spore core protein (29);
S65242 (GRE1), induced by different stress conditions (13);
STF2_YEAST, stabilizes the F1F0-ATPase·inhibitor complex (35);
SIP18_YEAST and HS12_YEAST, induced by different stress conditions (14, 34); IF1A_YEAST, eukaryotic initiation factor 1A (36);
RL44_YEAST, ribosomal protein (37); CON6_NEUCR and CONX_NEUCR, conidiation proteins (26, 27). All proteins listed in the plant
group correspond to LEA proteins (11, 13). (2) H.I.: hydropathic index.
This was obtained according to Kyte and Doolittle (20) using a computer
program that assigns the appropriate hydropathy value to each residue
in a protein along its amino acid sequence and then successively sums
those values within overlapping segments displaced from each other by
one residue (20). (3) %Gly: glycine content expressed as percentage of
total amino acid content. (4) %Coil: proportion of the polypeptide predicted to be structured as a random coil. (5) Index of significance:
corresponds to an arbitrary number that was chosen to give zero at the
threshold values and that increases as a protein becomes more distinguishable from the bulk of the non-LEA proteins (see “Experimental
Procedures” for details). aa, amino acids.
FIG. 1. Plot of the hydrophilicity and percentage of Gly content of some relevant protein groups. This analysis included data
obtained for all E. coli and S. cerevisiae proteins as well as for LEA
proteins in the control set. Because of technical limitations of the graphing
program, only 4000 of the ~6200 proteins in S. cerevisiae are represented. The LEA proteins that were not classified as hydrophilins have
also been incorporated. The hydrophilicity index was obtained as described under “Experimental Procedures” (20). Black circles, E. coli
proteins; gray circles, S. cerevisiae proteins; blue triangles, control set
LEA proteins; yellow triangles, non-hydrophilin LEA proteins; red
squares, E. coli hydrophilins; green circles, S. cerevisiae hydrophilins.
Of the known LEA plant proteins, 91% had index values >0
and were classified as hydrophilins. As expected, one group of
LEA proteins with a low glycine content and marginal hydrophilicity was not retrieved by our selection criterion. These
proteins are considered atypical and belong to group V, such as
LEA5 and LEA14 from cotton (21). For this reason, they were
excluded from the control set of LEA proteins (Fig. 1). That our
search criterion is specific is further confirmed by the fact that
of 5792 plant proteins searched, only 72 were selected, and 52
of these are known LEA proteins (Fig. 1). Additional hydrophilins were identified: two LEA proteins not previously considered, two of the so-called COR proteins, a flocculent-active
protein (MO2), a seed antimicrobial peptide (MBP1), and several ribosomal proteins (14). Interestingly, some of these proteins have been associated with water-deficit situations. This is
the case of the alfalfa COR410 and wheat CORA proteins,
which not only present sequence similarity to some LEA proteins, but also their genes are induced during cold and drought
stress (22, 23). On the other hand, MBP1 and MO2 proteins are
found in the desiccated seed (24, 25). Regarding the ribosomal
proteins, no information is available on their expression patterns under stress conditions (Swiss-Prot accession numbers
p10851, p50888, p49195, p26871, p27083, p50892, p27070,
p48857, p49213, p48130, p51190, p49164, p49689, and
p28520).
When the same analysis was performed with the fungal and
bacterial protein subsets (excluding S. cerevisiae and E. coli;
see below), again a discrete group of proteins was classified as
hydrophilins according to our criterion. In fungi, only 12 out of
2264 sequences passed the selection criterion. This group included two proteins, CON6 and CON10, involved in conidiation
(a developmental stage at which a dehydration process occurs)
(26, 27); a mitochondrial protein (28); six ribosomal proteins
(Swiss-Prot accession numbers p52809, p27076, p31028,
p31027, p31866, and p27075); and three hypothetical proteins.
In bacteria, 14 hydrophilin candidates out of 12,911 sequences
were selected. Three of them are involved in Bacillus subtilis
sporulation (GsiB, SspF, and CotT), and specifically, GsiB is
also induced in response to different stress conditions, including osmotic stress (29–31). Of the remaining 11 hydrophilins, seven correspond to hypothetical proteins, and two are
classified as single-stranded DNA-binding proteins (Swiss-Prot
accession numbers p40947 and p44409) and two as ribosomal
proteins (Swiss-Prot accession numbers p52829 and p46386).
Hydrophilins in Two Complete Genomes: E. coli and S. cerevisiae—To have a more complete picture of the distribution of
hydrophilins in the genomes of prokaryotes and eukaryotes, we
searched the complete genome data bases of the bacteria E. coli
and yeast S. cerevisiae. This was important because the study
of complete genomes eliminates many sources of bias in the
composition of the data bases, specifically the fact that genes
tend to be isolated and studied by virtue of being related to
other known genes. In E. coli, only five candidates were found
when the selection criterion was applied against the 4289 proteins of Release M52 of the complete genome, whereas the
genome of S. cerevisiae codes for 6200 proteins, 12 of which
qualified (Fig. 1 and Table II).
Many of the identified proteins corresponded to open reading
frames that had not been characterized before and whose
amino acid sequences did not show significant sequence similarity among them or to other proteins in the data bases.
Others had already been described, and specific functions had
been assigned to them. This is the case of two of the five E. coli
hydrophilins. The ribosome modulation factor (RMF) is a protein that is associated with the 100 S ribosome dimers and that
accumulates in E. coli cells upon growth transition from the
exponential to the stationary phase (17, 32). PRTL has been
described as a small and basic polypeptide that is referred to as
the P-protein because it has a sequence similar to that of the
trout protamine protein; however, its function is still unknown
(33).
In yeast, the genes of three of the 12 selected hydrophilins
(HSP12, SIP18, and GRE1, whose functions are
unknown) have been reported to be induced by osmotic
stress (13, 14, 34). Of the other nine genes, functional characterizations have been reported only for STF2, IF1A, and
RL44. STF2 encodes a 15-kDa polypeptide that has been suggested to help stabilize the complex formed between F1F0-ATPases and an inhibitor protein upon cessation of phosphorylation, keeping the enzyme inactive during de-energization of
mitochondrial membranes (35). IF1A encodes a eukaryotic initiation factor 1A that seems to be required to maximize the rate
of protein biosynthesis (36). Finally, RL44 (YP2α) encodes
one of four ribosomal acidic phosphoproteins characterized in
S. cerevisiae that are involved in the regulation of the activity
of the 60 S ribosomal subunit. It has been shown that these
phosphoproteins are not required for cell viability, but regulate
the pattern of expression in yeast (37).
Secondary Structure of Hydrophilins—One of the striking
characteristics of most LEA proteins is that they are predicted
to be highly unstructured molecules, existing principally as
random coils (Fig. 2 and Table II). This has been confirmed in
a few cases by circular dichroism (7, 8). Accordingly, when the
selected E. coli and S. cerevisiae hydrophilins were subjected to
secondary structure prediction analysis, the results indicated
that, in every case, they tend to adopt a coil configuration over
much of their length (50–80%) (Fig. 2 and Table II). In contrast, in a control group made of 89 randomly chosen E. coli and
S. cerevisiae proteins, only six proteins (7%) in the E. coli
sample and four proteins (4.5%) in the S. cerevisiae sample had
>50% predicted coil regions.
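The comparison above can be illustrated with a short calculation on three-state prediction strings. This is a sketch under assumptions: it supposes that each prediction is available as a string with one character per residue, with H for helix, E for strand, and any other symbol counted as coil; the actual output format of the prediction server is not described in the text, and the function names are hypothetical.

```python
def coil_fraction(prediction):
    """Fraction of residues not predicted as helix (H) or strand (E)."""
    if not prediction:
        raise ValueError("empty prediction")
    structured = sum(1 for state in prediction.upper() if state in "HE")
    return 1.0 - structured / len(prediction)


def share_above(predictions, cutoff=0.5):
    """Share of proteins whose predicted coil fraction exceeds the cutoff."""
    hits = sum(1 for p in predictions if coil_fraction(p) > cutoff)
    return hits / len(predictions)
```

Applied to a sample of predictions, `share_above` gives the kind of proportion reported for the control groups, that is, the share of proteins with more than 50% predicted coil.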
FIG. 2. Distribution of LEA proteins, 89 randomly chosen E.
coli and S. cerevisiae proteins, and 17 E. coli and S. cerevisiae
hydrophilins according to their predicted contents of random-coiled structures. Three-state secondary structure prediction was
performed by the PHD sec service of the PredictProtein Server. Black
bars, sample E. coli proteins; white bars, sample S. cerevisiae proteins;
gray bars, LEA proteins; hatched bars, hydrophilins.
Hydrophilin Expression Pattern under Osmotic Stress Conditions—The high correlation between the expression pattern
under water limitation and the physicochemical characteristics
present in different hydrophilins (including LEA proteins) suggested that the induction in response to water-deficit conditions could be conserved among all hydrophilins. To test this
hypothesis, we carried out Northern blot analysis using as
probes amplified sequences corresponding to hydrophilins from
E. coli and S. cerevisiae (see above). These probes were hybridized against total RNA obtained from E. coli or S. cerevisiae
grown under optimal conditions or under hyperosmotic stress
(see “Experimental Procedures”). The probes corresponding to
SIP18 and HSP12 were not included in this analysis because
data on induction by water stress had already been reported
(14, 34).
All E. coli hydrophilin transcripts studied here accumulated
in response to osmotic stress. The highest accumulation was for
YCIG_ECOLI and YJBJ_ECOLI transcripts after 20 min of
hyperosmotic treatment. In both cases, we detected additional
transcripts of higher molecular weight responsive to the imposed stress conditions. This observation suggested that
YCIG_ECOLI and YJBJ_ECOLI genes are part of operons being transcribed as polycistronic mRNAs. A significant transcript accumulation in response to osmotic stress was also
detected for the PRTL_ECOLI gene, whose highest accumulation occurred after 10 min of stress treatment (Fig. 3). The
RMF_ECOLI transcript also responded to the imposition of a
water deficit; however, in contrast with the other hydrophilin
genes, the increase in mRNA accumulation was not detected
until 60 min of stress treatment (Fig. 3). The mRNA for the
YHDL_ECOLI gene was not detectable on gel blot hybridizations under control or stress conditions and was therefore not
included in the study.
The analysis of the expression patterns of the 12 S. cerevisiae
hydrophilin transcripts showed that only eight accumulated in
response to hyperosmotic conditions: S65242, STF2_YEAST,
SI18_YEAST, YBM6_YEAST, HS12_YEAST, YJ04_YEAST,
YJS4_YEAST, and YNT0_YEAST (Fig. 4). As indicated above,
we pursued the study of only those yeast hydrophilins whose
osmotic stress response has not been reported before. As in E.
coli hydrophilins, although the absolute levels of the yeast
hydrophilin transcripts varied, most were clearly induced by
water-deficit conditions. Interestingly, a correlation was found
between the level of induction and the index of significance of
the E. coli and S. cerevisiae hydrophilins. In yeast, for example,
the highest induction was for those proteins with the highest
index (S65242, STF2_YEAST, SIP18_YEAST, YBM6_YEAST,
and HS12_YEAST). This result reinforces the suitability of the
search criterion to select new hydrophilins as they appear in
data bases.
Contribution of an E. coli Hydrophilin to the Adaptive Response to Hyperosmotic Conditions—As mentioned above, one
of the E. coli proteins classified as a hydrophilin is a protein
identified as RMF. This protein shows an unusual behavior,
accumulating in E. coli cells upon the growth transition from
the exponential to the stationary phase, and in contrast to
other ribosome components or associated proteins, rmf expression depends, inversely, on growth rate (17), suggesting its
participation in the bacterial response to stress conditions. To
investigate the contribution of this hydrophilin to the E. coli
adaptive response to osmotic stress, we took advantage of the
existence of characterized null mutants in the rmf gene
(rmf::Cmr) (17) to carry out osmosensitivity tests. As shown in
Fig. 5, a wild-type strain (W3110) presented a similar growth
rate under optimal growth conditions as well as under hyperosmotic stress. In contrast, the isogenic rmf::Cmr mutant
(HMY15) showed a dramatic decrease in its growth rate under
stress conditions (Fig. 5A). To evaluate the viability of the
wild-type and mutant strains, both bacterial cultures were
subjected to osmotic shock with 0.5 M NaCl. The results from
these experiments showed that a major difference in survival
was detected after a 30-min osmotic shock, when the strain
carrying the wild-type rmf gene presented an 82% survival,
whereas the rmf::Cmr mutant strain showed a survival of
33% (Fig. 5B).
FIG. 3. Time course of the E. coli hydrophilin transcript accumulation in response to osmotic stress. E. coli cultures were grown
to A600 = 0.5, and at this point, the stress was imposed by adding NaCl
to a final concentration of 0.4 M (see “Experimental Procedures” for
details). Cells were harvested at the times indicated, and total RNA was
extracted. RNAs (10 μg/lane) were electrophoretically separated, blotted, and hybridized against the corresponding [α-32P]dCTP-labeled
probes. High stringency conditions were used for all hybridizations and
washes. Experiments were repeated at least three times. Ethidium
bromide-stained ribosomal RNAs are shown at the bottom as loading
reference. b, bases.
FIG. 4. S. cerevisiae hydrophilin transcript accumulation in
response to osmotic stress. S. cerevisiae cultures were grown to A660
= 0.5, and at this point, the stress was imposed by the addition of
sorbitol to a final concentration of 0.5 M (see “Experimental Procedures”
for details). Cells were harvested after 30 min of treatment, and total
RNA was extracted. RNAs (10 μg/lane) were electrophoretically separated, blotted, and hybridized against the corresponding [α-32P]dCTP-labeled probes. High stringency conditions were used for all hybridizations and washes. Ethidium bromide-stained ribosomal RNAs are
shown at the bottom as loading reference. Experiments were repeated
at least three times. The numbers below show the quantification of
transcript accumulation when compared with the transcript levels obtained under control conditions. The quantification was done using NIH
Image 1.60b7. Kb, kilobases.
DISCUSSION
In higher plants, proteins classed as LEA proteins share
the common characteristic of being induced when the cell
loses water. In this work, we focused our attention on the structural unifying aspects of most of these proteins: their extreme hydrophilicity and high content of glycine
residues. The methodical screening that we carried out indicated,
first, that proteins with these properties can be clearly distinguished from the bulk of proteins in any given organism and,
second, that these structural features are not restricted to
the LEA proteins but are dispersed among proteins from different organisms. This was evident from the
analysis of the E. coli and S. cerevisiae genomes, which showed
only 5 and 12 proteins, respectively, with these characteristics.
More important, the criterion we used in this analysis, which
does not imply sequence similarity and is based on two simple
characteristics, is highly selective. Systematically, we found
that ~1% of the proteins passed this criterion. From the complete OWL Database, only 1800 out of 250,000 sequences
(0.72%) were selected. Of the two characteristics, the hydrophilicity threshold is by far the most selective, eliminating 19 out
of 20 proteins, compared with the glycine threshold, which
merely eliminates 16 of every 20 proteins.
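The selectivity figures quoted above can be checked with exact fractions. The sketch below assumes, purely for illustration, that the two thresholds act independently of each other; the text does not state this, and the variable names are hypothetical.

```python
from fractions import Fraction

# Share of proteins passing each threshold on its own (from the text).
pass_hydrophilicity = 1 - Fraction(19, 20)   # 1 in 20 passes
pass_glycine = 1 - Fraction(16, 20)          # 4 in 20 pass

# Joint pass rate if the two properties were independent (assumption).
joint_if_independent = pass_hydrophilicity * pass_glycine

# Observed selection rates quoted in the text.
owl_fraction = Fraction(1800, 250000)
ecoli_fraction = Fraction(5, 4289)
yeast_fraction = Fraction(12, 6200)


def as_percent(fraction):
    """Convert an exact fraction to a percentage."""
    return float(fraction) * 100
```

Under the independence assumption the joint pass rate is 1 in 100, which agrees with the ~1% reported, while the two complete genomes fall below the 0.2% mentioned in the abstract.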
An additional conclusion from this work is the remarkable
correlation found in plants, E. coli, and S. cerevisiae between
expression under water deficit and the physicochemical characteristics of hydrophilins. The strongest correlation was found
in E. coli, where the genes encoding four of the five proteins
selected as hydrophilins were induced by osmotic stress treatment (one transcript was undetectable under our conditions).
In the case of S. cerevisiae, there were some exceptions: 4 of the
12 selected hydrophilin transcripts did not accumulate after
osmotic shock. Reassurance comes from the observation that,
on average, the exceptions presented lower scores than the
osmotic stress-induced hydrophilins, suggesting that they are
not true hydrophilins or that they are induced under different
experimental conditions (Fig. 4).
FIG. 5. Contribution of RMF to the adaptive response to hyperosmotic conditions. A, E. coli strains W3110 and HMY15 (rmf::Cmr) were
grown in liquid LB medium at 37 °C with or without 0.5 M NaCl. Culture samples were removed at the times indicated, and their A600 values were
determined. Open circles, W3110 control; open squares, HMY15 control; closed circles, W3110 (0.5 M NaCl); closed squares, HMY15 (0.5 M NaCl).
B, levels of survival were obtained by exposing log-phase bacterial cultures to osmotic shock with 0.5 M NaCl for 30 or 60 min. Mean survival is
expressed as percentage of the survival obtained in cultures not exposed to osmotic shock. Experiments were repeated at least three times. Closed
circles, W3110; open circles, HMY15.
To estimate the probability that a randomly selected protein
is inducible by water deficit, we asked how many of the well studied
proteins in E. coli are related to hyperosmotic conditions. In the E. coli genome,
1477 open reading frames are reported to be >95% identical to
Swiss Protein Database entries. Of these entries (which constitute a non-redundant set), 880 have some annotation under
“Function” or “Induction,” 26 of which suggest some relationship to dehydration or hyperosmosis. Thus, ~3% or less of the
proteins in any group are expected to be induced by osmotic
shock. In contrast, the data in this work indicate that all the
detectable E. coli hydrophilin genes are induced under these
conditions, whereas in S. cerevisiae, 70% of the selected genes
are responsive. Therefore, we conclude that the association found between the selected physicochemical properties
and the transcript induction by osmotic shock cannot be attributed to
mere chance.
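The chance argument can be made concrete with a binomial tail probability. The sketch below is illustrative only and rests on two assumptions that are not part of the analysis above: that genes respond independently, and that the E. coli base rate of 26/880 also applies to S. cerevisiae. The counts are those given in the text (4 of 4 detectable E. coli hydrophilin transcripts induced; 8 of 12 in S. cerevisiae); the function name is hypothetical.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


# Base rate from the annotation survey: 26 of 880 annotated entries (~3%).
base_rate = 26 / 880

# E. coli: all 4 detectable hydrophilin transcripts were induced.
p_ecoli = binomial_tail(4, 4, base_rate)

# S. cerevisiae: 8 of the 12 selected transcripts accumulated.
# Assumption: the E. coli base rate is reused here for illustration.
p_yeast = binomial_tail(8, 12, base_rate)

if __name__ == "__main__":
    print(f"base rate: {base_rate:.4f}")
    print(f"P(>=4 of 4 induced by chance):  {p_ecoli:.3g}")
    print(f"P(>=8 of 12 induced by chance): {p_yeast:.3g}")
```

Under these assumptions both probabilities are far below any conventional significance threshold, which is the quantitative sense in which the association is not expected by chance.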
Our criterion also selected many hydrophilins from
other groups besides plants, E. coli, and S. cerevisiae. Only in a
few cases did we find information to suggest that the genes
for these proteins are induced by water deficit. The list of inducible
hydrophilins must therefore be considered incomplete. Future releases of the Swiss
Protein Database with more entries and information on the
function or inducibility of their sequences will undoubtedly
provide many more examples. This is true even of the plant
group.
Metazoa were not analyzed in this work because, with few
exceptions, the response to dehydration has not been studied in
this group. However, a preliminary examination indicates that
the Swiss Protein Database has 193 metazoan proteins selected
by our algorithm (data not shown). This set of proteins is very
heterogeneous in terms of described function. It includes toxins, DNA-binding proteins, ribosomal proteins, and
some extracellular proteins. At present, a functional correlation with dehydration has not been established in this group.
Even though no significant sequence similarity was found
among most of the different hydrophilins, common functional
features could be distinguished. This is the case for those that
seem to interact with nucleic acids such as protamine, the
single-stranded DNA-binding proteins, or the ribosomal proteins that interact with RNA. Thus, it could be argued that
their structural properties (presence of charged amino acids)
are easily associated with their ability to bind nucleic acids,
and this would be the same for all DNA- or RNA-binding
proteins; however, not all the proteins with this functional
characteristic were classified as hydrophilins. For example, in
E. coli, only one ribosomal protein, RMF, was included in the
hydrophilin group. Given that a significant proportion of the
hydrophilins from different organisms are ribosomal proteins
(organellar and cytoplasmic), it could be hypothesized that
some of them were selected as part of an adaptive mechanism
that helps to maintain a functional translational apparatus
under water-limited environments. This hypothesis is supported by the data in Fig. 5 showing that a strain lacking the
hydrophilin identified as RMF (rmf::Cmr) is sensitive to hyperosmotic conditions.
As indicated above, the hydrophilins constitute a heterogeneous group of proteins, in which, at least in the case of plant
hydrophilins (mostly LEA proteins), six different families can
be identified (2, 3). The analysis of the non-plant hydrophilin
protein sequences has also shown that some could be grouped
into class 1 of LEA proteins since homologous repetitive motifs
are conserved not only in all members of this group, but also in
the GsiB protein from B. subtilis (31). The presence of conserved motifs among the proteins in each family suggests that
they play similar roles, probably different from those
carried out by hydrophilins in other groups. Therefore, the
common physicochemical properties of hydrophilins could be
the result of the selection of a general solution in proteins that
could have different functions; for instance, some of them could
function as stabilizers of either cell structures (e.g. membranes)
or macromolecular complexes (e.g. transcriptosomes, ribosomes, etc.).
The high content of glycine residues in most LEA proteins
allows one to predict that these proteins exist in random-coil
structures (5, 6). This is consistent with experimental data
obtained for some LEA proteins (Em1, Dsp16, and G50) that
showed a high percentage of randomly coiled moieties, suggesting that these proteins may be largely unfolded in their native
state (7, 8). In the case of hydrophilins, the analysis of their
predicted three-state structure also suggested that they characteristically exist as loosely structured proteins (Fig. 2), in
contrast with the majority of proteins present in an organism.
This might be related to their hypothetical role in helping to
maintain a minimum cellular water content for survival.
If we consider that the sequence of a protein determines its
function only indirectly, by guaranteeing that the protein will
have the adequate structural and physicochemical characteristics required, it is possible that exceptional traits could serve
as the criterion to search for the function when the proteins
have no sequence similarity due either to convergence or to
very distant ancestry. In our case, this was a fruitful strategy.
By showing that the hydrophilin physicochemical properties
are associated with a common expression pattern, our results
suggest that these proteins represent a common adaptation to
osmotic stress in the most diverse taxa.
Acknowledgments—We thank B. Barkla, M. Rocha, J. L. Folch, F.
Campos, and E. Alvarez-Buylla for critical reading of the manuscript;
R. M. Solórzano for technical assistance; and P. Gaytán and E. López for
the synthesis of oligonucleotides.
REFERENCES
1. Bray, E. A. (1997) Trends Plant Sci. 2, 48–54
2. Colmenero-Flores, J. M., Campos, F., Garciarrubio, A., and Covarrubias, A. A.
(1997) Plant Mol. Biol. 35, 393–405
3. Ingram, J., and Bartels, D. (1996) Annu. Rev. Plant Physiol. Plant Mol. Biol.
47, 377–403
4. Baker, J., Steele, C., and Dure, L., III (1988) Plant Mol. Biol. 11, 277–291
5. Dure, L., III (1993) in Plant Responses to Cellular Dehydration during Environmental Stress (Close, T. J., and Bray, E. A., eds) Vol. 10, pp. 91–103,
American Society of Plant Physiology, Rockville, MD
6. Dure, L., III (1993) Plant J. 3, 363–369
7. Lisse, T., Bartels, D., Kalbitzer, H. R., and Jaenicke, R. (1996) Biol. Chem. 377,
555–561
8. McCubbin, W. D., and Kay, C. M. (1985) Can. J. Biochem. 63, 803–810
9. Close, T. J., and Lambert, P. J. (1993) Plant Physiol. (Bethesda) 101, 773–779
10. Close, T. J. (1996) Physiol. Plant. 97, 795–803
11. Close, T. J. (1997) Physiol. Plant. 100, 291–296
12. Mtwisha, L., Brandt, W., McCready, S., and Lindsey, G. G. (1998) Plant Mol.
Biol. 37, 513–521
13. Garay-Arroyo, A., and Covarrubias, A. A. (1999) Yeast 15, 879–892
14. Varela, J. C., van Beekvelt, C., Planta, R. J., and Mager, W. H. (1992) Mol.
Microbiol. 6, 2183–2190
15. Springer, M. L., and Yanofsky, C. (1992) Genes Dev. 6, 1052–1057
16. Gaxiola, R., Larrinoa, I. F., Villalba, J. M., and Serrano, R. (1992) EMBO J. 11,
3157–3164
17. Yamagishi, M., Matsushima, H., Wada, A., Sakagami, M., Fujita, N., and
Ishihama, A. (1993) EMBO J. 12, 625–630
18. Collart, M., and Oliviero, S. (1995) in Current Protocols in Molecular Biology
(Ausubel, F. M., Brent, R., Kingston, R. E., Moore, D. D., Seidman, J. G.,
Smith, J. A., Struhl, K., eds) Vol. 2, Suppl. 23, pp. 13.12.1–13.12.5, John
Wiley & Sons, Inc., New York
19. Church, G. M., and Gilbert, W. (1984) Proc. Natl. Acad. Sci. U. S. A. 81,
1991–1995
20. Kyte, J., and Doolittle, R. F. (1982) J. Mol. Biol. 157, 105–132
21. Galau, G. A., Wang, H. Y., and Hughes, D. W. (1993) Plant Physiol. (Bethesda)
101, 695–696
22. Danyluk, J., Houde, M., Rassart, E., and Sarhan, F. (1994) FEBS Lett. 344,
20–24
23. Laberge, S., Castonguay, Y., and Vezina, L. P. (1993) Plant Physiol. (Bethesda)
101, 1411–1412
24. Duvick, J. P., Rood, T., Rao, A. G., and Marshak, D. R. (1992) J. Biol. Chem.
267, 18814–18820
25. Gassenschmidt, U., Jany, K. D., Tauscher, B., and Niebergall, H. (1995)
Biochim. Biophys. Acta 1243, 477–481
26. Roberts, A. N., Berlin, V., Hager, K. M., and Yanofsky, C. (1988) Mol. Cell.
Biol. 8, 2411–2418
27. White, B. T., and Yanofsky, C. (1993) Dev. Biol. 160, 254–264
28. Amegadzie, B. Y., Zitomer, R. S., and Hollenberg, C. P. (1990) Yeast 6,
429–440
29. Aronson, A. I., Song, H. Y., and Bourne, N. (1989) Mol. Microbiol. 3, 437–444
30. Stephens, M. A., Lang, N., Sandman, K., and Losick, R. (1984) J. Mol. Biol.
176, 333–348
31. Völker, U., Egelmann, S., Maul, B., Riethdorf, S., Völker, A., Schmid, R., Mach,
H., and Hecker, M. (1994) Microbiology 140, 741–752
32. Wada, A., Igarashi, K., Yoshimura, S., Aimoto, S., and Ishihama, A. (1995)
Biochem. Biophys. Res. Commun. 214, 410–417
33. Altman, S., Model, P., Dixon, G. H., and Wosnick, N. A. (1981) Cell 26,
299–304
34. Miralles, V. J., and Serrano, R. (1995) Mol. Microbiol. 17, 653–662
35. Yoshida, Y., Sato, T., Hashimoto, T., Ichikawa, N., Nakai, S., Yoshikawa, H.,
Imamoto, F., and Tagawa, K. (1990) FEBS Lett. 192, 49–53
36. Wei, C.-L., Kainuma, M., and Hershey, J. W. B. (1995) J. Biol. Chem. 270,
22788–22794
37. Remacha, M., Jimenez-Diaz, A., Bermejo, B., Rodriguez-Gabriel, M. A.,
Guarinos, E., and Ballesta, J. P. G. (1995) Mol. Cell. Biol. 15, 4754–4762