Bendable Genes of Warm-blooded Vertebrates
Alexander E. Vinogradov
Institute of Cytology, Russian Academy of Sciences
It is shown that in the genomes of warm-blooded vertebrates the elevation of genic GC-content is associated with
an increase in the bendability of the DNA helix, which is both absolute and relative as compared with random
sequences. This trend takes place both in exons and introns, being more pronounced in the latter. At the same time,
the free energy of melting (delta G) of exons and introns increases only absolutely with elevation of GC-content,
whereas it decreases as compared with random sequences (again, this trend is stronger in the introns). In genes of
cold-blooded animals, plants, and unicellular organisms, these regularities are weaker and often not consistent.
Generally, there is a negative correlation between bendability and melting energy at any fixed GC-content value.
This effect is stronger in the introns. These findings suggest that GC-enrichment of genes in the homeotherm
vertebrates can be caused by selection for increased bendability of DNA.
Introduction
Genomes of higher eukaryotes consist of regions
which differ in their GC-percent (isochores) (reviewed
by D’Onofrio et al. 1999; Bernardi 2000). This heterogeneity reaches its highest degree in mammals and
birds. It is the GC-rich regions which seem to be an
evolved trait (Bernardi, Hughes, and Mouchiroud 1997).
There are two alternative groups of views on the emergence of these regions: neutralist (e.g., mutation bias)
(Wolfe, Sharp, and Li 1989; Wolfe and Sharp 1993;
Ellsworth, Hewett-Emmett, and Li 1994; Eyre-Walker
1994; Francino and Ochman 1999) and selectionist.
Among the proposed selectionist explanations there are
those involving the physical DNA property (higher thermal stability) (Bernardi and Bernardi 1990) and the informational content of the coding sequences (codon usage bias for better translation performance or even shift
in the amino acid composition) (D’Onofrio et al. 1999;
Bernardi 2000). The latter hypotheses, however, cannot
explain why the noncoding DNA (introns) of the GC-rich genes also shows an increase in the GC-content.
Here the GC-dependences of melting energy and bendability of DNA molecules are studied for the coding and
noncoding parts of genes in different genomes and compared to those of random sequences.
Key words: isochore, mutation bias, GC-percent, bendability,
thermal stability, introns.
Address for correspondence and reprints: Alexander E. Vinogradov, Institute of Cytology, Russian Academy of Sciences, Tikhoretsky
Ave. 4, St. Petersburg 194064, Russian Federation. E-mail:
[email protected].
Mol. Biol. Evol. 18(12):2195–2200. 2001
© 2001 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038
Materials and Methods
The sequences of nuclear genes were extracted
from GenBank (release 123). For better-represented genomes, namely, the human, mouse, nematode, fruitfly,
thale cress, rice, and fission yeast, only genes with complete coding sequences (CDS) were taken; for others, all
genes with at least two complete exons and introns between them were selected. Genes were checked for duplicates on the basis of CDS similarity (>99%). All
coding sequences, including partial ones, were checked
for the absence of internal stop codons. The intron-exon
boundaries were taken from annotations. In total, 59,856
genes with 346,409 exons and 286,764 introns (total
length of 204.3 Mb) were analyzed.
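The internal stop codon check described above can be sketched as follows. This is only an illustration: the function name `has_internal_stop`, the `frame` parameter (for partial coding sequences that do not start at a codon boundary), and the use of the standard nuclear genetic code are assumptions, not details given in the text.

```python
# Stop codons of the standard nuclear genetic code (assumed here).
STOP_CODONS = {"TAA", "TAG", "TGA"}

def has_internal_stop(cds, frame=0):
    """Return True if any codon except the last complete one is a stop codon.

    `frame` is the 0-based offset of the first complete codon (hypothetical
    parameter, useful for partial coding sequences).
    """
    cds = cds.upper()
    # Collect complete codons only; a trailing partial codon is ignored.
    codons = [cds[i:i + 3] for i in range(frame, len(cds) - 2, 3)]
    # The final codon may legitimately be the terminal stop, so skip it.
    return any(codon in STOP_CODONS for codon in codons[:-1])

clean = has_internal_stop("ATGAAATAA")        # terminal stop only
broken = has_internal_stop("ATGTAAAAATAA")    # TAA in the second codon
shifted = has_internal_stop("CATGAAATAG", frame=1)
```

A sequence flagged by such a check would be excluded from the data set or inspected for annotation errors.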
The random DNA sequences of 10-kb length were
generated with 0.1% increments in the 1%–100% range
of GC-content using Perl function rand. (Each base pair
was drawn by two iterations, one for choice between the
GC and AT pairs, and the other for choice between the
purine and pyrimidine bases on a given strand. The real
percent of GC-pairs and purine bases on a given strand
was determined after the generation of the sequence.)
Generally, the average content of purine bases on the
coding strand in exons is about 48%, and in introns,
about 52%. Therefore, the complete sets of random sequences were generated for different purine contents in
the range of 45%–55% (with 2.5% increments); their
bendability and melting energy were not found to vary
significantly. Here, the results for the 50% purine content are presented.
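The two-draw procedure for random sequences can be sketched in Python rather than Perl. This is a minimal illustration under stated assumptions: the function names, the seed, and the 45% GC target are hypothetical, and only the logic (one draw for GC versus AT, one draw for purine versus pyrimidine, then measuring the realized composition) follows the description above.

```python
import random

def random_dna(length, gc_fraction, purine_fraction=0.5, rng=None):
    """Generate one strand; each base uses two independent random draws."""
    rng = rng or random.Random()
    bases = []
    for _ in range(length):
        is_gc = rng.random() < gc_fraction          # draw 1: GC pair or AT pair
        is_purine = rng.random() < purine_fraction  # draw 2: purine or pyrimidine
        if is_gc:
            bases.append("G" if is_purine else "C")
        else:
            bases.append("A" if is_purine else "T")
    return "".join(bases)

def realized_fractions(seq):
    """Measure the actual GC and purine fractions after generation."""
    n = len(seq)
    gc = sum(seq.count(b) for b in "GC") / n
    purine = sum(seq.count(b) for b in "AG") / n
    return gc, purine

# One 10-kb sequence with a nominal GC-content of 45% and 50% purines.
seq = random_dna(10_000, 0.45, rng=random.Random(1))
gc, purine = realized_fractions(seq)
```

Because the composition is drawn base by base, the realized GC and purine fractions differ slightly from the nominal ones, which is why they are measured after generation.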
The parameters under study were determined using
the trinucleotide table for bendability based on consensus values obtained from the DNase I digestion and
nucleosome positioning studies (Gabrielian, Simoncsits,
and Pongor 1996) and the dinucleotide table for free
energy of melting (delta G) obtained from the UV absorbance and temperature profiles (SantaLucia, Allawi,
and Seneviratne 1996) in a sliding tri- and dinucleotide
frame (with a 1-nt step), respectively, and averaged for
each sequence.
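The sliding-frame averaging can be sketched as below. The lookup tables here are placeholders that simply count G/C bases in each k-mer; they are NOT the published bendability or delta G parameters, which would have to be taken from the cited sources. The function name and the choice to skip windows missing from the table (for example, those containing N) are likewise assumptions.

```python
from itertools import product

def mean_kmer_score(seq, table, k):
    """Average table[k-mer] over all windows of width k, moving 1 nt at a time."""
    seq = seq.upper()
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    # Assumption: windows absent from the table (e.g., containing N) are skipped.
    scores = [table[w] for w in windows if w in table]
    if not scores:
        raise ValueError("sequence has no scorable windows")
    return sum(scores) / len(scores)

def toy_table(k):
    """Placeholder table: score each k-mer by its number of G/C bases."""
    return {"".join(p): float(sum(b in "GC" for b in p))
            for p in product("ACGT", repeat=k)}

TOY_TRINUC = toy_table(3)  # stands in for the trinucleotide bendability table
TOY_DINUC = toy_table(2)   # stands in for the dinucleotide delta G table

toy_tri_mean = mean_kmer_score("ATGC", TOY_TRINUC, 3)  # windows ATG, TGC
toy_di_mean = mean_kmer_score("GGCC", TOY_DINUC, 2)    # windows GG, GC, CC
```

With real parameter tables substituted for the placeholders, the same function would give the per-sequence average for either property.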
The statistical analyses were done with the Statgraphics (Statistical Graphics Co.) and Statistica
(StatSoft, Inc.) software.
Results
Both the bendability and the free energy of melting
(which reflects the thermal stability) of various DNA
sequences increase with elevation of the GC-content
(fig. 1). However, the bendability of human exons and
introns seemingly increases faster than that of the
random sequences, whereas their melting energy rises more slowly than its random-sequence counterpart
(fig. 1). To check the statistical significance of this effect, the slopes of linear regression for genic and random
sequences can be compared (fig. 1, see legend). The
slope of bendability is significantly higher in the human
genes (exons, 0.0366 ± 0.0002; introns, 0.0368 ±
0.0002) as compared with random sequences (0.0302 ±
0.0004), whereas the reverse is true for the slope of
melting energy (exons, 0.0108 ± 0.0000; introns,
0.0102 ± 0.0000; random sequences, 0.0114 ± 0.0000).
FIG. 1. The plot of (A, B) bendability and (C, D) melting energy versus GC-percent for (A, C) human exons and (B, D) introns, and for
random DNA sequences (rand. seq.). The equations of linear regression for bendability: rand. seq. (for the GC-range of human genes), Y =
3.501 (±0.011) + X × 0.0302 (±0.0004); exons, Y = 3.382 (±0.016) + X × 0.0366 (±0.0002); introns, Y = 3.291 (±0.016) + X × 0.0368
(±0.0002). The corresponding equations for melting energy: random sequences, Y = 0.818 (±0.001) + X × 0.0114 (±0.0000); exons, Y =
0.837 (±0.001) + X × 0.0108 (±0.0000); introns, Y = 0.859 (±0.001) + X × 0.0102 (±0.0000).
(The line of polynomial regression and its confidence limits are shown on each plot but cannot be discerned because they do not extend beyond
the band of points representing values for random sequences.)
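One common way to compare two regression slopes is a z statistic on their difference. The sketch below applies it to the bendability slopes quoted in the text, assuming (this is not stated in the text) that the quoted uncertainties are standard errors and that the genic and random-sequence fits are independent. The text does not say which test was actually used, so this is an illustration only. The melting energy slopes are left out because their uncertainties are rounded to 0.0000.

```python
import math

def slope_difference_z(b1, se1, b2, se2):
    """z statistic for the difference between two independent slopes."""
    return (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)

# Bendability slopes from the text: (value, reported uncertainty).
RANDOM = (0.0302, 0.0004)
EXONS = (0.0366, 0.0002)
INTRONS = (0.0368, 0.0002)

z_exons = slope_difference_z(*EXONS, *RANDOM)
z_introns = slope_difference_z(*INTRONS, *RANDOM)
```

Under these assumptions both statistics come out well above 10, in line with the statement that the genic slopes are significantly higher.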
However, these relationships are not quite linear
(especially at the margins of GC-content range). Therefore, for correct comparison, the slopes for random sequences should be determined separately for each range
of exonic or intronic GC-percents. The linear regression
generally does not give a perfect approximation (r² of
bendability is only 95.4% for random sequences).
Therefore, the dependences of bendability and melting
energy on GC-content in random sequences were approximated using a nonlinear polynomial regression.
The third-order polynomial for bendability and the second-order polynomial for melting energy fitted almost perfectly, explaining 99.96% and 99.998% of the variance,
respectively. These approximated values were subtracted
from the exon and intron values of bendability and melting energy. The relationships of the obtained residuals
with the exonic and intronic GC-content were analyzed
(table 1, figs. 2 and 3). It can be seen that in the genes
of homeotherms, the residuals of bendability correlate
positively with the GC-content in both exons and introns, whereas the residuals of DNA melting energy
show quite the opposite trend. It is noteworthy that both
correlations are stronger in the introns as compared with
exons (the difference between the corresponding correlation coefficients is significant for all the homeotherm
cases, except bendability in the rabbit, which is represented by the smallest sample size).
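The baseline subtraction behind the relative values can be sketched in plain Python. The baseline below is a made-up cubic standing in for the random-sequence curve, and the two "genes" are hypothetical points placed at known offsets from it; none of these numbers come from the paper. GC-content is expressed as a fraction (0 to 1) rather than percent, purely to keep the normal equations well conditioned.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients (constant term first)."""
    n = degree + 1
    # Augmented normal-equation matrix [X^T X | X^T y].
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)]
         + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = a[i][n] - sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = s / a[i][i]
    return coeffs

def polyval(coeffs, x):
    """Evaluate the polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))

def relative_values(gene_gc, gene_values, coeffs):
    """Residuals of genic values from the random-sequence polynomial."""
    return [v - polyval(coeffs, gc) for gc, v in zip(gene_gc, gene_values)]

# Synthetic random-sequence baseline (hypothetical cubic).
TRUE = [3.0, 2.0, 0.0, 0.5]
baseline_gc = [i / 100 for i in range(1, 101)]
baseline_y = [polyval(TRUE, g) for g in baseline_gc]
coeffs = polyfit(baseline_gc, baseline_y, 3)

# Two hypothetical genes: 0.1 above and 0.2 below the baseline.
gene_gc = [0.40, 0.60]
gene_y = [polyval(TRUE, 0.40) + 0.1, polyval(TRUE, 0.60) - 0.2]
residuals = relative_values(gene_gc, gene_y, coeffs)
```

The residuals, correlated with genic GC-content, correspond to the "relative bendability" and "relative melting energy" values analyzed here.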
The bendability over melting energy trend is weaker in murids as compared with other mammals (table 1).
This trend can also be seen in some lower animals (cold-blooded vertebrates and invertebrates), although the
correlations are weaker and may not be consistent. In
unicellular organisms and plants, the correlations are
much weaker and usually not consistent (table 1). Although the signs of correlations can be similar in the
homeotherms and some other organisms, the distributions of residuals are, however, different: the greater part
of the bendability residuals in the homeotherms is positive as compared to the lower organisms (cf. figs. 2 and
3). For the melting energy, the opposite trend is observed (figs. 2 and 3).
With only a few exceptions (exons of the rat, introns and exons of a green alga), there is a negative
partial correlation between the bendability and melting
energy at fixed GC-percent (table 1). This correlation is
always stronger in the introns as compared with exons.
(The coefficients of partial correlation between the polynomial-subtracted residuals of bendability and melting
energy at fixed GC-percent were very similar and not
shown.)
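A first-order partial correlation at fixed GC-percent can be computed from the three pairwise Pearson coefficients. The sketch below uses a tiny made-up data set (the lists `gc`, `bend`, and `melt` are hypothetical) in which both properties rise with GC-content but deviate from the common trend in opposite directions. It shows how a positive raw correlation can coexist with a negative partial correlation.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_correlation(x, y, z):
    """Correlation of x and y with the linear effect of z held fixed."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Hypothetical toy data: a shared GC trend plus opposite deviations.
gc = [1.0, 2.0, 3.0, 4.0]
deviation = [1.0, -1.0, -1.0, 1.0]          # uncorrelated with gc
bend = [g + 0.1 * d for g, d in zip(gc, deviation)]
melt = [g - 0.1 * d for g, d in zip(gc, deviation)]

raw_r = pearson(bend, melt)
partial_r = partial_correlation(bend, melt, gc)
```

In this toy case the raw correlation is strongly positive, while the partial correlation is negative because the GC-driven trend has been removed.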
Table 1
Coefficients of Correlation Between GC-percentage and Relative DNA Bendability or Relative Melting Energy (delta G)^a, and Partial Correlation Between Bendability
and Melting Energy at Fixed GC-percentage for Genes of Different Organisms
                                            NUMBER OF               TOTAL LENGTH OF         RELATIVE                RELATIVE MELTING        PARTIAL CORRELATION
                                            OBJECTS STUDIED         OBJECTS STUDIED (KB)    BENDABILITY (A.U.)      ENERGY (KCAL/MOL)       AT FIXED GC-PERCENT
SPECIES                                     Exons       Introns     Exons       Introns     Exons       Introns     Exons       Introns     Exons       Introns
Homo sapiens (man)                          25,043      21,889      4,021       43,908      0.33***     0.51***     -0.34***    -0.77***    -0.13***    -0.30***
Sus scrofa (pig)                            566         464         99          289         0.27***     0.35***     -0.34***    -0.72***    -0.17***    -0.38***
Bos taurus (cow)                            516         406         86          248         0.29***     0.58***     -0.17**     -0.72***    -0.09       -0.32***
Oryctolagus cuniculus (rabbit)              248         173         42          113         0.34***     0.36***     -0.17**     -0.78***    -0.13       -0.35***
Mus musculus (mouse)                        8,209       7,111       1,312       6,749       0.19***     0.28***     -0.31***    -0.69***    -0.06***    -0.25***
Rattus norvegicus (rat)                     2,109       1,806       277         1,330       NS          0.25***     -0.28***    -0.62***    0.20***     -0.28***
Gallus gallus (chicken)                     1,087       921         164         526         0.18***     0.25***     -0.20***    -0.62***    -0.06       -0.29***
Xenopus laevis (clawed frog)                313         232         66          149         0.16**      NS          -0.30***    -0.36***    NS          -0.64***
Fugu rubripes (pufferfish)                  1,730       1,574       288         481         0.20***     0.23***     -0.16***    -0.24***    -0.18***    -0.35***
Drosophila melanogaster (fruitfly)          53,513      41,486      20,325      22,814      0.19***     0.06***     -0.12***    0.09***     -0.18***    -0.53***
Caenorhabditis elegans (nematode)           93,080      78,215      19,052      18,298      0.27***     0.32***     -0.24***    -0.08***    -0.42***    -0.70***
Oryza sativa (rice)                         16,450      13,078      3,847       6,061       0.05***     0.07***     0.22***     -0.10***    -0.20***    -0.38***
Zea mays (maize)                            1,118       930         245         296         0.10*       0.18***     0.23***     0.26***     -0.24***    -0.46***
Arabidopsis thaliana (thale cress)          132,729     111,718     29,283      18,858      0.05***     0.11*       -0.10***    -0.10***    -0.32***    -0.64***
Emericella nidulans (mold)                  1,018       759         480         51          NS          0.09        -0.10***    NS          -0.06       -0.44***
Schizosaccharomyces pombe (fission yeast)   6,820       4,594       2,930       383         0.13***     0.22***     -0.14***    0.05**      -0.32       -0.62
Plasmodium falciparum (malaria parasite)    1,123       775         747         145         -0.08*      -0.14**     NS          0.15***     -0.64***    -0.95***
Chlamydomonas reinhardtii (green alga)      737         633         166         144         NS          NS          0.13**      -0.27***    -0.10*      -0.09
NOTE. NS, not significant. Correlation coefficients without asterisks have significance level P < 0.05; *P < 0.01; **P < 10^-3; ***P < 10^-4.
^a The relative values were obtained by subtraction of polynomial regression of bendability or melting energy on GC-percentage for random DNA sequences.
FIG. 2. The regression of (A, B) relative bendability and (C, D) relative melting energy on GC-percent for (A, C) human exons and (B,
D) introns. (The relative values were obtained by subtraction of polynomial regression for random sequences. The Y-zero line corresponds to
this regression, i.e., to the random sequences.) Dotted lines, confidence limits of regression (P = 0.95).
Discussion
The accelerated growth of bendability and lagging
of melting energy with elevation of GC-percent in both
exons and introns, as compared with random sequences,
seem to contradict the hypothesis that mutation pressure
is a cause of compositional heterogeneity in the genomes of warm-blooded vertebrates. (In the latter case, the bendability and melting energy of at least the introns should not differ from the corresponding
values for random sequences with the same GC-content.)
These results suggest that it is the need for bendability
and not thermal stability which can be a leading force
behind the elevation of GC-content in the genes of
mammals and birds. The melting energy and bendability
were found to correlate negatively at fixed GC-percent
(table 1). This is probably because the thermostability
of a DNA duplex at a given GC-content is determined by base stacking energy (Doktycz et al. 1992),
which is inversely related to bendability (Anselmi et al.
2000).
FIG. 3. The regression of (A, B) relative bendability and (C, D) relative melting energy on GC-percent for nematode (A, C) exons and
(B, D) introns. (The relative values were obtained by subtraction of polynomial regression for random sequences. The Y-zero line corresponds
to this regression, i.e., to the random sequences.) Dotted lines, confidence limits of regression (P = 0.95).
Genome compositional heterogeneity is known to
be lower in the murids as compared with other mammals
(Robinson, Gautier, and Mouchiroud 1997; Douady et
al. 2000). The compositional heterogeneity was found
not only in the homeotherms, but also, to a lesser degree, in the lower animals (D’Onofrio et al. 1999; Jabbari and Bernardi 2000; Nekrutenko and Li 2000) and
plants as well. Among the latter, it is most pronounced
in the cereals but can be discerned also in the thale cress
(Carels and Bernardi 2000; Nekrutenko and Li 2000).
The present results suggest that in all these cases, except
in cereals, this heterogeneity may be driven by the
increase in bendability of GC-rich genome regions. The
cereals differ from the homeotherms by a much higher
exon-intron contrast in GC-content (Carels et al. 1998;
Vinogradov 2001) and may present a special case. If
these physical DNA properties are involved somehow
in the GC-enrichment of cereal genomes, it is either the
melting energy that may be a leading cause or there is
some subtle balance between the two forces.
Although the GC-rich regions constitute only a minor part of the genome (10%–15%), they harbor a great
part of the genes because of the very high gene concentration (Bernardi 2000). They are located in the early
replicating and highly transcribed chromatin (Saccone et
al. 1993, 1999; Federico, Saccone, and Bernardi 1998).
Therefore, the DNA helix of these genes should often be
bent and unbent in its transition from the packaged to the extended state to accommodate the operation of the transcription machinery. These requirements should extend both
to the exons and introns and probably to the intergenic
sequences as well (which are short and also GC-rich in
the heavy isochores). The average molecular properties
were suggested to dominate over the local features in
the sequence-dependent nucleosome formation (Anselmi
et al. 2000). It was supposed that introns can be necessary for correct chromatin structure (Zuckerkandl
1981; Trifonov 1993). In several cases, the involvement
of introns in the nucleosome ordering was demonstrated
experimentally (Lauderdale and Stein 1992; Liu et al.
1995). Therefore, it is interesting that the bendability
over melting energy trend is more pronounced in the
introns of the homeotherms as compared with their exons (table 1). The notion that intronic bendability
may be significant for the structure of chromatin
seemingly conflicts with the fact that introns
in the heavy isochores are GC-poorer than exons (e.g.,
Bernardi 2000; Vinogradov 2001). However, this can be
explained by the impact of transposable elements, which
decreases GC-content of introns even when these elements become nonrecognizable (Duret and Hurst 2001).
The increase in the bendability of the highly expressed
genes of mammals and birds may be associated with the
higher organizational level of these animals, which requires fast and smoothly operating transcription.
Acknowledgments
The helpful comments of three anonymous reviewers are greatly appreciated. This work was supported by
a grant from the Russian Foundation for Basic Research
(RFBR).
LITERATURE CITED
ANSELMI, C., G. BOCCHINFUSO, P. DE SANTIS, M. SAVINO, and
A. SCIPIONI. 2000. A theoretical model for the prediction
of sequence-dependent nucleosome thermodynamic stability. Biophys. J. 79:601–613.
BERNARDI, G. 2000. Isochores and the evolutionary genomics
of vertebrates. Gene 241:3–17.
BERNARDI, G., and G. BERNARDI. 1990. Compositional patterns
in the nuclear genome of cold-blooded vertebrates. J. Mol.
Evol. 31:265–281.
BERNARDI, G., S. HUGHES, and D. MOUCHIROUD. 1997. The
major compositional transitions in the vertebrate genome. J.
Mol. Evol. 44(Suppl. 1):S44–S51.
CARELS, N., and G. BERNARDI. 2000. Two classes of genes in
plants. Genetics 154:1819–1825.
CARELS, N., P. HATEY, K. JABBARI, and G. BERNARDI. 1998.
Compositional properties of homologous coding sequences
from plants. J. Mol. Evol. 46:45–53.
D’ONOFRIO, G., K. JABBARI, H. MUSTO, F. ALVAREZ-VALIN, S.
CRUVEILLER, and G. BERNARDI. 1999. Evolutionary genomics of vertebrates and its implications. Ann. N. Y. Acad.
Sci. 870:81–94.
DOKTYCZ, M. J., R. F. GOLDSTEIN, T. M. PANER, F. J. GALLO,
and A. S. BENIGHT. 1992. Studies of DNA dumbbells. I.
Melting curves of 17 DNA dumbbells with different duplex
stem sequences linked by T4 endloops: evaluation of the
nearest-neighbor stacking interactions in DNA. Biopolymers 32:849–864.
DOUADY, C., N. CARELS, O. CLAY, F. CATZEFLIS, and G. BERNARDI. 2000. Diversity and phylogenetic implications of
CsCl profiles from rodent DNAs. Mol. Phylogenet. Evol.
17:219–230.
DURET, L., and L. D. HURST. 2001. The elevated G and C
content at exonic third sites is not evidence against neutralist models of isochore evolution. Mol. Biol. Evol. 18:
757–762.
ELLSWORTH, D. L., D. HEWETT-EMMETT, and W. H. LI. 1994.
Evolution of base composition in the insulin and insulin-like growth factor genes. Mol. Biol. Evol. 11:875–885.
EYRE-WALKER, A. 1994. DNA mismatch repair and synonymous codon evolution in mammals. Mol. Biol. Evol. 11:
88–98.
FEDERICO, C., S. SACCONE, and G. BERNARDI. 1998. The gene-richest bands of human chromosomes replicate at the onset
of the S-phase. Cytogenet. Cell Genet. 80:83–88.
FRANCINO, M. P., and H. OCHMAN. 1999. Isochores result from
mutation not selection. Nature 400:30–31.
GABRIELIAN, A., A. SIMONCSITS, and S. PONGOR. 1996. Distribution of bending propensity in DNA sequences. FEBS
Lett. 393:124–130.
JABBARI, K., and G. BERNARDI. 2000. The distribution of genes
in the Drosophila genome. Gene 247:287–292.
LAUDERDALE, J. D., and A. STEIN. 1992. Introns of the chicken
ovalbumin gene promote nucleosome alignment in vitro.
Nucleic Acids Res. 20:6589–6596.
LIU, K., E. P. SANDGREN, R. D. PALMITER, and A. STEIN. 1995.
Rat growth hormone gene introns stimulate nucleosome
alignment in vitro and in transgenic mice. Proc. Natl. Acad.
Sci. USA 92:7724–7728.
NEKRUTENKO, A., and W. H. LI. 2000. Assessment of compositional heterogeneity within and between eukaryotic genomes. Genome Res. 10:1986–1995.
ROBINSON, M., C. GAUTIER, and D. MOUCHIROUD. 1997. Evolution of isochores in rodents. Mol. Biol. Evol. 14:823–828.
SACCONE, S., A. DE SARIO, J. WIEGANT, A. K. RAAP, G. DELLA
VALLE, and G. BERNARDI. 1993. Correlations between isochores and chromosomal bands in the human genome.
Proc. Natl. Acad. Sci. USA 90:11929–11933.
SACCONE, S., C. FEDERICO, I. SOLOVEI, M. F. CROQUETTE, G.
DELLA VALLE, and G. BERNARDI. 1999. Identification of the
gene-richest bands in human prometaphase chromosomes.
Chromosome Res. 7:379–386.
SANTALUCIA, J., H. ALLAWI, and P. A. SENEVIRATNE. 1996.
Improved nearest-neighbor parameters for predicting DNA
duplex stability. Biochemistry 35:3555–3562.
TRIFONOV, E. M. 1993. Spatial separation of overlapping messages. Comput. Chem. 117:27–31.
VINOGRADOV, A. E. 2001. Within-intron correlation with base
composition of adjacent exons in different genomes. Gene
276:143–151.
WOLFE, K. H., and P. M. SHARP. 1993. Mammalian gene evolution: nucleotide sequence divergence between mouse and
rat. J. Mol. Evol. 37:441–456.
WOLFE, K. H., P. M. SHARP, and W. H. LI. 1989. Mutation
rates differ among regions of the mammalian genome. Nature 337:283–285.
ZUCKERKANDL, E. 1981. A general function of noncoding
polynucleotide sequences. Mass binding of transconformational proteins. Mol. Biol. Rep. 7:149–158.
KENNETH WOLFE, reviewing editor
Accepted August 13, 2001