Andrzej Kolinski
Pawel Madziar
Department of Chemistry,
University of Warsaw,
ul. Pasteura 1,
02-093 Warsaw, Poland
Received 29 January 1997;
accepted 26 March 1997
Collapse Transitions in
Protein-Like Lattice Polymers:
The Effect of Sequence
Patterns
Abstract: The collapse transition of lattice protein-like heteropolymers has been studied by
means of the Monte Carlo method. The protein model has been reduced to the α-carbon trace
restricted to a high coordination lattice. The sequences of model heteropolymers contain two
types of mers: hydrophobic/nonpolar (H) and hydrophilic/polar (P). Interactions of HH and
PP pairs were assumed to be negative (weaker attractions of PP pairs) while the contact
energy for HP pairs was equal to zero. All sequence-specific short-range interactions have
been neglected in the present studies. It has been found that homopolymeric chains undergo
a smooth collapse transition to a dense globular state. The globule lacks any signatures of
local ordering that could be interpreted as a model of protein secondary structure. Heteropolymers with the sequences of hydrophilic and hydrophobic residues characteristic of α- and
β-type proteins undergo a somewhat sharper (though continuous) collapse transition to a
dense globular state with elements of local ordering controlled by the sequence. The helical
pattern induces more secondary structure than the β-type pattern. For all examined sequences
the level of local ordering was lower than the average secondary structure content of globular proteins. The results are compared with other theoretical work and with known experimental facts. The implications for the reduced modeling of protein systems are briefly discussed.
© 1997 John Wiley & Sons, Inc. Biopoly 42: 537–548, 1997
Keywords: collapse transition; lattice proteins; protein models; sequence patterns; Monte
Carlo method
INTRODUCTION
Under appropriate temperature and solvent conditions, globular
natural proteins undergo a collapse transition from a
swollen random coil state to a dense globular state
with a unique ordering of the polypeptide chain.1 The
transition is highly cooperative and usually reversible.2 The globular state exhibits dense packing close
to the packing density of molecular crystals of homologous (to amino acids) small organic molecules. Besides the global ordering, which is essentially different
for all nonhomologous sequences, globular proteins
exhibit some common motifs of the local ordering
of their polypeptide chains.3 These motifs consist of
helices, β-strands, β-turns, and some other typical
arrangements. On average, about half of the residues
participate in helices and β-strands, although the fraction of “regular” elements changes significantly from
sequence to sequence.
Some important questions could be asked in the
above context. Is the local ordering simply the result
of compactness, i.e., does dense packing induce a
Correspondence to: Andrzej Kolinski
Contract grant sponsor: University of Warsaw (UW) and
Howard Hughes Medical Institute (HH); contract grant number:
BST-532-34/96 (UW) and 75195-543402 (HH)
q 1997 John Wiley & Sons, Inc.
CCC 0006-3525/97/050537-12
secondary structure that is tuned by the sequence?
Or do the short-range secondary structure preferences dictate the nature of the regular motifs in the
globular state? What is the nature of the interplay
between short- and long-range interactions? How
important are highly directional interactions (such
as hydrogen bonds and some other polar interactions) that perhaps significantly reduce the conformational space of folded proteins? The answers to
these questions are by no means obvious.
In the past, it has been shown by Monte Carlo
computational experiments that model homopolymeric chains collapse to a dense globular state that
could be amorphous 4 or locally highly ordered, 5,6
depending on the balance between the long-range interactions and the local (short-range) conformational
stiffness. These simulations employed a very simple
tetrahedral lattice model with nearest neighbor, contact-type pairwise interactions as a model of tertiary
interactions and a preference for the trans state as a
model of short-range conformational stiffness. Nevertheless, the model exhibited several features of
the protein folding transition.6 First, the collapse transition of sufficiently stiff polymers, consisting of 50–
200 independent structural units, was an all-or-none
type, very much as the folding transition of single
domain globular proteins. Second, the collapse induced excess ordering, i.e., the average length of
expanded fragments in the collapsed state was a few
times larger than the length of expanded fragments
in the random coil state at the same (transition)
temperature. Thus, the transition was cooperative
with an abrupt change of entropic and enthalpic
contribution to the system free energy characteristic
for the pseudo first-order phase transition. Directionality of tertiary interactions augmented the cooperativity of the collapse transition.7 With an increase
in the chain length, the transition became more and
more diffuse (i.e., the population of intermediate
states, in which various parts of the chain have very
different spatial densities of segments, increased at the transition temperature) and the globular state
consisted of several ordered domains.6 Within the
limit of an infinitely long chain, a smooth transition,
typical for flexible polymers, could be expected.
This smoothing of the collapse transition due to the
formation of partly independent domains provides
a very suggestive analogy to the folding of multidomain proteins. Of course, flexible polymers exhibited continuous collapse in the entire range of the
chain length, 4 in agreement with theoretical predictions of Lifshitz et al.8 and others.9,10 Consequently,
compactness itself did not induce any detectable
local ordering. Later, Chan and Dill 11,12 suggested
the opposite and the issue became somewhat controversial. With an increasing body of new computational experiments and more careful definition and
analysis of secondary structure, 13 it has been concluded that, in the absence of any short-range preferences and/or specific long-range interactions, one
should not expect a substantial protein-like ordering
of collapsed homopolymers.13–17
Do the results of the simulations described above
(and the results of related work, see excellent reviews by Dill et al.17 and by Karplus and Sali 18 )
prove that the secondary preferences entirely dictate
the ordering in globular proteins? Not necessarily,
for a couple of reasons. First, the folding transition
of globular proteins seems to be even more cooperative and the amount of secondary structure induced
by the folding transition seems to be larger (or rather, the
amount of secondary structure in the random coil
state of a globular protein is smaller than that observed
in the model studies of an all-or-none collapse transition of semiflexible polymers 6 ). Second, it is known
that some protein fragments (the same short fragments of amino acid sequences) participate in various types of local ordering (different secondary
structure) in different proteins. It has been found
that the same pentapeptide fragments can participate
in helical and β-sheet structural motifs of various
proteins.19 Thus, perhaps some specific long-range
interactions are at least as important as the secondary preferences.
Using a highly simplified model of short heteropolymers (flexible, simple cubic lattice chains with
nearest neighbor pairwise interactions), Dill and coworkers, 17 Shakhnovich and co-workers, 20 and others 18,21 proved that a proper sequence pattern of
two (or more) types of residues induces a unique
structure of the collapsed state. By implication, one
may conclude that the secondary structure is induced upon collapse, provided that the sequence
patterns lead to a well-defined dense structure.
There are, however, some problems with the interpretation of these results as an answer to the questions
posed at the beginning of this paper. It is unclear
how to define geometrical equivalents of protein
secondary structure for the simple cubic lattice
chains. Thus, the question to what extent a specific
secondary structure is induced or enhanced by the
collapse is very difficult to address. Another problem emerges from the small size of the studied simple cubic lattice models. When the collapsed structure has the form of a 3 × 3 × 3 cube, the interior
consists of a single unit. Consequently, it is difficult
to model a typical globular protein segregation of
hydrophobic and hydrophilic residues between the
interior and the surface of a protein. Such segregation is perhaps an important factor responsible for
the emerging secondary structure and early stages
of protein folding. A recent study of a 5 × 5 × 5
cubic lattice protein model by Dinner et al.22 shows
that this could really be the case. Very likely, formation of unique structure for longer polymers requires
higher heterogeneity (more than two types of residues) of model sequences.23,24
The results of our earlier work employing highly
simplified protein models and the results of Dill and
co-workers, 17 Shakhnovich and co-workers, 20 and
other related work 21,24,25 provided inspiration for the
present studies. Our intention in this and in forthcoming work is to examine the effects of various
interactions on the thermodynamics and folding
pathway of protein-like systems on a very fundamental level. In the present work, we focus on the
properties of a model that could be considered
equivalent to Dill’s and Shakhnovich’s models in
the sense that all short-range interactions are neglected, and the long-range interactions are reduced
to the pairwise contact interactions between two
distinct types of residues. The present model has,
however, a geometry (and consequently a number
of distinguishable conformations per residue) that
mimics the geometry of a polypeptide main chain
of real proteins.26,27 Thus, the secondary structure
could be defined more precisely and the question
of to what extent the sequence pattern defines the
local, protein-like ordering of the globular state
hopefully could be addressed.
There is another completely different motivation
for the present series of studies. During the last
few years, Kolinski, Skolnick, and co-workers have
developed discretized models of protein structure
and dynamics.28–30 These models employed a high
coordination lattice representation of polypeptide
chains 31 and a rather complex force field derived
from the statistical analysis of the regularities seen
in known structures of globular proteins. The models were capable of reproducing the structure of
very simple, small globular proteins 29,32 and protein
assemblies 33,34 from sequence information alone.
The model failed to predict more complex structures
when a straightforward approach was attempted.
However, using various hierarchical methods to derive some global restraints, the applicability of these
models increased significantly.35 Nevertheless, due
to the complex structure of the model, it seems to
be very important to separately examine the various
aspects of protein representation and, even more so,
the role and interplay between various interactions.
This should lead to a better general understanding
of the protein folding phenomenon. Moreover, the
insight from the present studies is expected to provide a guideline for the refinement of the force field
of these discretized models. This will be beneficial
for protein structure predictions and detailed studies
of protein dynamics and thermodynamics.30,36
In the remaining part of this paper, we describe
the lattice model of the protein, the Monte Carlo sampling procedure, and the method of analysis of the simulation results. Then, the results of the simulations
of four types of protein sequences of H and
P residues are presented. We studied homopolymeric sequences (only H residues), sequences with
a β-type and with helical patterns of HP residues,
and random HP sequences (with equimolar composition). Thus, the effect of amphipathic patterns
could be well separated. Finally, we compare our
findings with other work and analyze their possible
meaning for various aspects of theoretical studies
of protein-like systems. The paper concludes with
a summary of the main implications of the present
studies and an outline of further studies.
METHODS
Polypeptide Representation
The reference frame for the geometry of the model is provided by the simple cubic lattice with a mesh size equal to 1.22 Å. The numerical value of the lattice spacing was selected previously in a way that leads to the best fit of the real proteins' α-carbon trace to the lattice, 26 with a set of restraints as discussed below. The rest of the paper employs lattice units (1 l.u. = 1.22 Å).
The model polypeptide chain consists of n united atoms (or residues), corresponding to the α-carbons in real polypeptides. The model residues' positions are restricted to a set of simple cubic lattice points. The allowed orientations of virtual bonds belong to a set of 90 vectors {v} = {|±3, ±1, ±1|, ..., |±3, ±1, 0|, ..., |±3, 0, 0|, ..., |±2, ±2, ±1|, ..., |±2, ±2, 0|, ...}. The length of vectors of the type |±3, ±1, 0| corresponds to 3.8 Å, a value very close to the average distance between two consecutive α-carbons in polypeptides (assuming the trans conformation of the polypeptide bonds). The fluctuation of the length of the basis vectors corresponds to ±0.3 Å. To reproduce the protein-like geometry of the model chain, the values of the planar angles were restricted to a range of 72.5°–154°. Notice that the average planar angle for helical conformations is about 95°, while the corresponding average is about 115° for expanded β states. Thus, the α-carbon representation of polypeptide chains is identical to the model employed previously.28–33,37 The accuracy of the Cα representation for this lattice model ranges from 0.6 to 0.7 Å of the average rms deviation from the high resolution crystallographic structures. For short reference, this lattice discretization is called the “310” lattice (in previous work, 26 we also used the term “310 hybrid” lattice).
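The bond set described above can be checked with a short enumeration. The following Python sketch is not part of the original work. It rests on an observation rather than on a rule stated in the text: the five listed vector families coincide with the integer vectors whose squared length lies between 8 and 11, and that set has 90 members. The names `bond_vectors`, `length_angstrom`, and `LATTICE_UNIT` are illustrative.

```python
from itertools import product
from math import sqrt

LATTICE_UNIT = 1.22  # Å per lattice unit, as stated in the text


def bond_vectors():
    """Integer vectors with squared length 8..11 (assumed selection rule)."""
    return [v for v in product(range(-3, 4), repeat=3)
            if 8 <= sum(c * c for c in v) <= 11]


def length_angstrom(v):
    """Length of a lattice vector converted from lattice units to Å."""
    return sqrt(sum(c * c for c in v)) * LATTICE_UNIT
```

Under this assumed rule the bond lengths span roughly 3.45 to 4.05 Å, which is consistent with the quoted 3.8 Å and its fluctuation of about ±0.3 Å. The planar angle restriction is not applied here.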
Interactions
Except for the above-mentioned restriction imposed on the planar angles and the restriction of $r_{i,i+3}$ to values greater than $8^{1/2}$ (3.5 Å), all short-range interactions have been neglected. The long-range interactions were counted for $|i - j| > 3$. The potential of the long-range interactions has the form of a square well function given below:
$$E_{ij} = \begin{cases} E_{rep} & \text{for } r_{ij} < 4 \\ \varepsilon_{ij} & \text{for } 4 \le r_{ij} \le 6 \\ 0 & \text{for } r_{ij} > 6 \end{cases} \quad \text{(lattice units)} \tag{1}$$
where $E_{rep} = 5k_BT$ is a strong repulsive potential, and $r_{ij}$ is the distance between the ith and the jth α-carbons. All energy parameters are defined in dimensionless $k_BT$ units. Consequently, the system's reduced thermodynamic temperature T is also dimensionless. There are three values of binary parameters:
$$\varepsilon_{ij} = \begin{cases} \varepsilon_{HH}, & \text{for } A_i = H \text{ and } A_j = H \\ \varepsilon_{PP}, & \text{for } A_i = P \text{ and } A_j = P \\ \varepsilon_{HP}, & \text{for } A_i = H \text{ and } A_j = P \text{, or for } A_i = P \text{ and } A_j = H \end{cases} \tag{2}$$
where $A_i$ denotes the identity of the ith residue (i.e., hydrophobic H, or polar P). The values of the binary parameters were selected in a fashion similar to Shakhnovich's AB model.38 Namely, $\varepsilon_{HH}$ was assumed to be equal to $-2k_BT$, $\varepsilon_{PP} = -1k_BT$, and $\varepsilon_{HP} = 0$. For the homopolymeric model (only H residues), the value of the interaction parameter has been reduced to $-0.5k_BT$ so as to maintain approximately the same total energy in the dense globular state. This way, the comparisons are more straightforward. In all cases, the total conformational energy is the sum of the pairwise contributions for the entire model chain.
$$E = \sum \sum E_{ij} \tag{3}$$
The cutoff distances (note the relatively wide square well) have been adjusted in order to compensate for the lack of side groups in this reduced model. The assumed range of attractive interactions, i.e., 6 lattice units (7.32 Å), covers the range of distances between the α-carbons in adjacent β-strands or helices seen in real proteins. Consequently, the density of the collapsed state is expected to mimic the density (number of residues per volume) of globular proteins.
Sequences
Four types of sequences were studied. First, we studied the homopolymeric model with the sequence that could be denoted as (-H-)n, although weaker pairwise interactions were assumed (see the previous section) for the HH pairs than for the heteropolymeric models. Three heteropolymeric models of peptides were considered. In the random sequences, the numbers of H and P residues were the same and the sequence was generated by a pseudorandom mechanism before each simulation. The final two sequences mimic the amphiphilic pattern seen in globular proteins. The helical sequences had the repeating pattern (-HHPPHPP-) and the β-type sequence had the repeating pattern (-HP-) of hydrophilic and hydrophobic residues. These patterns are, of course, idealized, as the amino acid sequences of real proteins exhibit large variations around these canonical hydrophobicity patterns. Actually, a simple statistical analysis of amino acid sequences of real proteins leads to results close to those expected for random heteropolymers of a certain composition. Usually, the contribution of the idealized amphiphilic patterns is not very well pronounced. Two values of chain length (n = 60 and n = 80) were investigated for each type of model protein sequence. This corresponds to a possible range of the chain length (the number of amino acids) of small globular proteins.
Sampling Method
The asymmetric Metropolis Monte Carlo scheme 39 was used as a sampling method with the transition probability given below:
$$p_{kl} = \min\{1, \exp(-(E_k - E_l)/k_BT)\} \tag{4}$$
where $E_k$ and $E_l$ are the conformational energies of the kth and lth conformations, respectively. The updating of the chain conformations consisted of three types of local modifications: two-bond kink moves, three-bond moves, and two-bond moves of the chain ends.28 Except for the chain end modifications, all possible local conformational transitions were enumerated and used randomly during the simulations. This advantage of the lattice models significantly accelerates the sampling process. Details of the conformational updating of the “310” lattice protein models have been described previously.27,28 The collapse transition occurred upon the simulated thermal annealing of the model systems. The temperature range was selected so that the model polypeptides sampled the expanded random coil portion of the conformational space as well as the dense globular states during the same relatively long run.
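The square-well potential and the acceptance test can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that each pair with |i − j| > 3 is counted once, that energies are in the reduced units of the text (so the Boltzmann factor uses the dimensionless temperature T), and it uses the standard Metropolis criterion with old and new energies rather than the specific asymmetric scheme of ref. 39. The local move set is not reproduced. All function and constant names are hypothetical.

```python
import math
import random

E_REP = 5.0                # repulsive energy for r_ij < 4 (reduced units)
R_REP, R_ATT = 4.0, 6.0    # square-well boundaries in lattice units
EPS = {("H", "H"): -2.0, ("P", "P"): -1.0,
       ("H", "P"): 0.0, ("P", "H"): 0.0}   # heteropolymer parameters


def pair_energy(ri, rj, ai, aj):
    """Square-well energy of one pair of residues."""
    r = math.dist(ri, rj)
    if r < R_REP:
        return E_REP
    if r <= R_ATT:
        return EPS[(ai, aj)]
    return 0.0


def total_energy(coords, seq):
    """Sum over pairs with |i - j| > 3, each pair counted once (assumption)."""
    n = len(coords)
    return sum(pair_energy(coords[i], coords[j], seq[i], seq[j])
               for i in range(n) for j in range(i + 4, n))


def metropolis_accept(e_old, e_new, temperature, rng=random):
    """Standard Metropolis test: always accept downhill, else Boltzmann."""
    delta = e_new - e_old
    if delta <= 0.0:
        return True
    return rng.random() < math.exp(-delta / temperature)
```

For the homopolymeric model the `("H", "H")` entry would be replaced by −0.5, as described in the Interactions section.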
RESULTS
Collapse Transition of Random Sequence Model: Effect of Chain Length
On the level of simple sequence statistics, the random sequence model mimics most closely the sequences of globular proteins. During the simulations, the average coil (globule) size, the average conformational energy, the heat capacity, and the number of long distance contacts were monitored as a function of the system temperature. Additionally, the local ordering (secondary structure) was analyzed for the globular states. For the high temperature random coil conformations, the lack of local ordering is common to all sequences and the proper order parameters could be computed only once.
FIGURE 1 Mean square radius of gyration $\langle S^2 \rangle$ and the mean square end-to-end distance $\langle R^2 \rangle$ plotted vs reduced temperature T for the random sequence chain consisting of n = 60 units. For easy comparison the numerical values of the average chain dimensions are divided by the chain length (number of units). The statistical error of the Monte Carlo simulations (based on a comparison of five independent runs) is in the range of the symbol size.
The observed collapse transition of the random sequence model was continuous. In Figure 1 and Figure 2, the mean square radius of gyration and the mean square distance between the chain ends are plotted as a function of the model's reduced (dimensionless) temperature T. The symbols $\langle S^2 \rangle$ and $\langle R^2 \rangle$ denote the values divided by the number of units. This normalization facilitates a more straightforward comparison of various systems. In general, unless specifically stated, all quantities are normalized by the number of chain units. At high temperatures, the chains sample random coil conformations and the normalized dimensions are essentially the same for both values of chain length. This is because the square dimensions of random coil polymers scale as $n^{6/5}$. The larger-than-1 exponent is compensated by a finite length effect and the nonzero long-range interactions. In the random coil regime, the ratio $\langle S^2 \rangle / \langle R^2 \rangle$ is close to 1/6, which is the value characteristic for the Gaussian chain.40 With decreasing temperatures, both model polypeptides collapse to a dense globular state. The collapse transition for a longer chain occurs at higher temperatures. The chain dimensions decrease to the values corresponding to the $n^{1/3}$ scaling typical for close-packed, long, flexible polymers.40 This scaling implies that the normalized dimensions for a longer chain should be smaller, as is observed. Due to the slow relaxation of the chain conformation in the low temperature regime, the statistics for the end-to-end distance are poor and are given here only for qualitative comparison. Generally, the picture is in excellent agreement with that expected for the collapse transition of a flexible homopolymeric chain of finite length.
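The two size measures discussed above can be computed from stored conformations as in the following Python sketch. It is an illustration under stated assumptions, not the authors' analysis code: each conformation is taken to be a list of (x, y, z) positions in lattice units, all units have equal weight, and the function names are hypothetical.

```python
def squared_dimensions(coords):
    """Return (S^2, R^2) for a single conformation."""
    n = len(coords)
    centre = [sum(p[k] for p in coords) / n for k in range(3)]
    # Squared radius of gyration: mean squared distance from the centroid.
    s2 = sum((p[k] - centre[k]) ** 2 for p in coords for k in range(3)) / n
    # Squared end-to-end distance between the first and last unit.
    r2 = sum((coords[-1][k] - coords[0][k]) ** 2 for k in range(3))
    return s2, r2


def normalized_averages(conformations):
    """Ensemble averages of S^2 and R^2, each divided by the chain length."""
    n_units = len(conformations[0])
    pairs = [squared_dimensions(c) for c in conformations]
    mean_s2 = sum(s for s, _ in pairs) / len(pairs)
    mean_r2 = sum(r for _, r in pairs) / len(pairs)
    return mean_s2 / n_units, mean_r2 / n_units
```

The ratio of the two returned averages is the quantity that approaches 1/6 for a Gaussian chain.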
Somewhat sharper transitions for longer chains
could be concluded from an inspection of the conformational energy and heat capacity (computed
from the fluctuations of conformational energy)
given in Figures 3 and 4 for n = 60 and n = 80,
respectively. Consistent with the chain dimension
curves (compare Figures 1 and 2), the transition
for longer chains occurs at higher temperatures. The
peak of heat capacity is shifted toward higher temperatures. The peaks of heat capacity are poorly
defined due to the smooth transition from random
coil to globular state.
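The text states only that the heat capacity was computed from the fluctuations of the conformational energy. A minimal Python sketch of the standard canonical fluctuation relation, C = (⟨E²⟩ − ⟨E⟩²)/T² in reduced units with the Boltzmann constant set to 1, is given below. The function name, the argument names, and the optional per-unit normalization are assumptions made for illustration.

```python
def heat_capacity(energies, temperature, n_units=1):
    """Heat capacity from energy fluctuations at one reduced temperature.

    energies: conformational energies sampled at this temperature.
    n_units: chain length used for per-unit normalization (assumed).
    """
    m = len(energies)
    mean_e = sum(energies) / m
    mean_e2 = sum(e * e for e in energies) / m
    return (mean_e2 - mean_e ** 2) / (temperature ** 2 * n_units)
```

A broad, low peak of this quantity as a function of T corresponds to the smooth transition described above.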
The number of long-range contacts per residue (Figures 5 and 6) is the same at high temperatures. In the globular state, it is larger for a longer chain. This is mostly due to the surface effect. Noticeably, the total compressive force is larger for a longer chain, as shown by the larger number of repulsive sphere collisions (the open circles in both figures). In both cases, the magnitude of the number of long-range interactions compares very well with the number of side chain contacts seen in globular proteins.
Similar effects attributed to the length of the chain were observed for other models of sequences studied in this work. In all cases, increasing the chain length caused a sharper transition at higher temperatures. The strongest effect was observed for the helical sequence. Nevertheless, the changes were quantitative in all cases.
FIGURE 2 Mean square radius of gyration $\langle S^2 \rangle$ and the mean square end-to-end distance $\langle R^2 \rangle$ plotted vs reduced temperature T for the random sequence chain consisting of n = 80 units. For easy comparison the numerical values of the average chain dimensions are divided by the chain length (number of units). The statistical error of the Monte Carlo simulations (based on a comparison of five independent runs) is in the range of the symbol size.
FIGURE 3 Average conformational energy E and the heat capacity $C_v$ (calculated from fluctuations of the conformational energy) plotted against reduced temperature T for the random sequence chain consisting of n = 60 units. For easy comparison with the other systems the numerical values are divided by the chain length. The statistical error of the Monte Carlo simulations is in the range of the symbol size.
FIGURE 4 Average conformational energy E and the heat capacity $C_v$ (calculated from fluctuations of the conformational energy) plotted against reduced temperature T for the random sequence chain consisting of n = 80 units. For easy comparison with the other systems the numerical values are divided by the chain length. The statistical error of the Monte Carlo simulations is in the range of the symbol size.
FIGURE 5 Average number of long range, $|i - j| > 3$, contacts between chain units plotted against reduced temperature T for the random sequence chain consisting of n = 60 units. The solid symbols correspond to the total number of contacts per chain unit. The open symbols represent the repulsive overlaps of the chain units.
Effect of Sequence on Collapse Transition
To compare the effect of sequence on the cooperativity of the collapse transition, the simulated annealing runs for the three remaining sequences (homopolymeric -HH-, heteropolymeric with the β-type pattern -HP-, and heteropolymeric with the α-helix type pattern -HHPPHPP-) were performed using a temperature range similar to that employed in the random sequence simulations. At low temperatures, the annealing was continued as long as the average chain dimensions decreased. In all cases, the close packing of the globular state was observed in the end stages of the thermal annealing procedure. The corresponding plots of the system's conformational energy and the heat capacity estimates for homopolymeric, β-type, and helical sequences of n = 60 chains are shown in Figures 7–9. Clearly, the sharpest transition was observed for the helical pattern of the model sequence. In this case, after the collapse to the dense globular state, the system relaxation is the slowest and the heat capacity at very low temperatures becomes poorly defined. It is unclear if the increase of $C_v$ below T = 1 reflects a real thermodynamic effect (the error bar for $C_v$ below T = 1 is about half of the observed jump, while everywhere else it is within the range of the symbol size). Since a similar jump of $C_v$ was observed for n = 80, it may be speculated that this is the signature of some rearrangement of the globule. The most diffuse was the collapse transition of the homopolymer. It is interesting that for these thermodynamic characteristics there is no qualitative difference between regular pattern sequences and random heteropolymers (compare Figure 3). In all cases, the collapse transition is continuous.
FIGURE 6 Average number of long range, $|i - j| > 3$, contacts between chain units plotted against reduced temperature T for the random sequence chain consisting of n = 80 units. The solid symbols correspond to the total number of contacts per chain unit. The open symbols represent the repulsive overlaps of the chain units.
FIGURE 7 Average conformational energy E and the heat capacity $C_v$ (calculated from fluctuations of the conformational energy) plotted against reduced temperature T for the homopolymeric sequence chain consisting of n = 60 units. For easy comparison with the other systems the numerical values are divided by the chain length. The statistical error of the Monte Carlo simulations is in the range of the symbol size.
FIGURE 8 Average conformational energy E and the heat capacity $C_v$ (calculated from fluctuations of the conformational energy) plotted against reduced temperature T for the β-type (-HP-) sequence chain consisting of n = 60 units. For easy comparison with the other systems the numerical values are divided by the chain length. The statistical error of the Monte Carlo simulations is in the range of the symbol size.
FIGURE 9 Average conformational energy E and the heat capacity $C_v$ (calculated from fluctuations of the conformational energy) plotted against reduced temperature T for the helical (-HHPPHPP-) sequence chain consisting of n = 60 units. For easy comparison with the other systems the numerical values are divided by the chain length. The statistical error of the Monte Carlo simulations is in the range of the symbol size (the lowest temperature point of the heat capacity curve is about 8 times less accurate).
Secondary Structure
Hydrogen bond interactions were absent in the reduced model studied here. Thus, the secondary structure cannot be identified by a standard method of analysis of the hydrogen bond pattern. However, one may use a strong correlation between the short-range geometry of the protein backbone and the secondary structure assignments. Such a method of analyzing protein structure has been used in various contexts.28,41–43 Here, we employed a set of very simple geometrical criteria that correlates very well with the secondary structure of proteins. Let us consider three consecutive vectors of the reduced Cα backbone $\mathbf{v}_{i-1}$, $\mathbf{v}_i$, and $\mathbf{v}_{i+1}$. Then, let us define a “chiral” end-to-end distance $r^{2*}_{i-1,i+2}$ for such a fragment.
$$r^{2*}_{i-1,i+2} = \mathrm{sign}\bigl((\mathbf{v}_{i-1} \times \mathbf{v}_i) \cdot \mathbf{v}_{i+1}\bigr)\,(\mathbf{v}_{i-1} + \mathbf{v}_i + \mathbf{v}_{i+1})^2 \tag{5}$$
Then, a simple geometric criterion coincides nicely with the secondary structure of proteins.28,42 For example, small positive values of $r^{2*}$ (about 20 in lattice units) correspond to right-handed compact conformations, i.e., right-handed helices. Full classification is given in Table I. Similar criteria have been used previously in various applications of high resolution lattice models of proteins.27,28,30–32,44,45
Figure 10 shows a fragment of the model chain assigned as a helix. The present model has no built-in chiral interactions; thus, the right- and left-handed helices are equally probable. A short, expanded fragment (presumably a β-type strand) is shown in Figure 11. Both snapshots were extracted from the dense globular conformations of one of the model heteropolymeric, protein-like (-HP- and -HHPPHPP-) sequences. According to the definition of secondary structure elements given in Table I, one can calculate the secondary structure content as an ensemble average. Statistics of the globular states for all model sequences and for an athermal chain are collected in Table II. The data for globular states correspond to the temperature range where the chain dimensions achieve low plateau values. In this regime the results do not depend on the specific width of the temperature range used in the averaging procedure. For easy reference, the corresponding statistics of Protein Data Bank (PDB) 46,47 structures are included as well. The last statistics show that the definition of secondary structure used here is very reasonable. Indeed, according to the employed criteria, the right-handed helices account for about 38% of protein secondary structure (as averaged over a representative database of nonhomologous globular proteins), expanded β-type states account for 31%, and the remaining conformations could be classified as loop/coil states. For the random coil state, some “secondary structure” is also detected. These are usually single helical turns or very short expanded fragments, and they come from the underlying distribution of the lattice chain conformations that could be treated as a reference state. The lattice chains at high temperatures (or with zero interactions) have about 24% of helical states and about 23% of expanded states. The majority of the three-vector fragments have to be classified as loop/coil conformations.
FIGURE 10 Representative snapshot of a helical fragment of a low temperature state of the helical sequence chain.
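The chiral end-to-end distance of Eq. (5) and the assignment by the ranges of Table I can be sketched in Python as follows. This is an illustration, not the authors' implementation. The ranges are treated as inclusive integer intervals, and the treatment of exactly planar fragments (zero triple product), which the text does not specify, is an arbitrary assumption here: they receive a positive sign. The names `chiral_r2`, `classify`, and `TABLE_I` are hypothetical.

```python
# Ranges of the chiral squared distance (lattice units) taken from Table I.
TABLE_I = [
    (-86, -57, "expanded"),
    (-56, -26, "coil/loop"),
    (-25, -9, "left-handed helix"),
    (-8, 8, "prohibited"),
    (9, 25, "right-handed helix"),
    (26, 56, "coil/loop"),
    (57, 91, "expanded"),
]


def chiral_r2(v_prev, v_mid, v_next):
    """Signed squared end-to-end distance of a three-bond fragment, Eq. (5)."""
    cross = (v_prev[1] * v_mid[2] - v_prev[2] * v_mid[1],
             v_prev[2] * v_mid[0] - v_prev[0] * v_mid[2],
             v_prev[0] * v_mid[1] - v_prev[1] * v_mid[0])
    triple = sum(c * w for c, w in zip(cross, v_next))
    total = [a + b + c for a, b, c in zip(v_prev, v_mid, v_next)]
    r2 = sum(x * x for x in total)
    # Assumption: a zero triple product is given a positive sign.
    return r2 if triple >= 0 else -r2


def classify(r2_star):
    """Return the Table I label, or None if the value is outside all ranges."""
    for low, high, label in TABLE_I:
        if low <= r2_star <= high:
            return label
    return None
```

For instance, the bond triplet (3, 1, 0), (0, 3, 1), (−3, 0, 1) gives a value of 20 and its mirror image gives −20, so the two are assigned as right- and left-handed helical states, respectively. The secondary structure content then follows by counting labels over all fragments and all sampled conformations.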
Table I Backbone Geometry-Based Assignment of Secondary Structure
Secondary Structure of ith Residue    Range of $r^{2*}_{i-1,i+2}$ (Lattice Units)
Expanded                              (−86, −57)
Coil/loop                             (−56, −26)
Left-handed helix                     (−25, −9)
Prohibited conformations              (−8, 8)
Right-handed helix                    (9, 25)
Coil/loop                             (26, 56)
Expanded                              (57, 91)
FIGURE 11 Representative snapshot of an expanded fragment of a low temperature state of the β-type sequence chain.
Table II Comparison of Secondary Structure Content for PDB Structures, Random Coil 310 Lattice Chains, and for Various Models of Sequence at the Globular State (a)
Model                 Helix Content (%) (R-helix + L-helix)    Expanded Content (%)
PDB averaged          40                                       31
athermal lattice      24                                       23
Random HP sequence    26                                       20
(-H-)n                23                                       17
(-HHPPHPP-)x          43                                       9
(-HP-)y               22                                       26
(a) x = n/7, y = n/2 (n = 60 is the model chain length).
The two statistics described above (for PDB structures and the random coil lattice chains) provide the proper reference states for the four models of protein sequences studied in this work. Let us first discuss the homopolymeric case where all pairwise interactions are of the same strength. The fraction of helical states in the collapsed structures is the same (the error is in the range of 1% for these simulations) as for the athermal random coil. The fraction of expanded states is even smaller than for the random coil chain. Possibly, due to compactness, some
longer expanded stretches have been suppressed.
Thus, it can be concluded that compactness itself,
when caused by an uniform attractive force, does
not induce any secondary structure. In other words,
the short-range correlations of the chain segments
in the closely packed globular state are essentially
the same (i.e., random) as in the random coil state.
The random heteropolymeric sequence leads to
some marginal increase of secondary structure content in the globular state. With respect to the homopolymeric case, the fraction of helices and expanded
conformation increases by about 3%.
The β-type pattern leads to a noticeable increase
in the fraction of expanded conformations; however,
the structures are by no means highly ordered. The
fraction of expanded states (26%) is smaller than
the average for the PDB structures, which do, of
course, contain various types of structural motifs.
The level of ordering seen in real globular β-type
proteins would be reproduced by ca. 50% of expanded states.
The helical sequence pattern leads to a substantial increase of helical conformations. Moreover,
the fraction of β-type expanded conformations is
significantly lower than that seen for other sequences. The degree of local ordering for this sequence is substantial, although lower (by a factor
of about 2) than the average for real helical proteins.
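The comparisons made above amount to simple differences between each model and the athermal reference state. The short sketch below tabulates them, with the values taken as read from Table II; the names `TABLE_II` and `excess_over_reference` are hypothetical, and the differences are percentage points rather than relative changes.

```python
# Secondary structure content (%) in the globular state, as read from Table II:
# (helix = R-helix + L-helix, expanded).
TABLE_II = {
    "PDB averaged": (40, 31),
    "athermal lattice": (24, 23),
    "random HP sequence": (26, 20),
    "(-H-)n": (23, 17),
    "(-HHPPHPP-)x": (43, 9),
    "(-HP-)y": (22, 26),
}


def excess_over_reference(table, reference="athermal lattice"):
    """Difference (percentage points) between each entry and the reference state."""
    ref_helix, ref_expanded = table[reference]
    return {
        model: (helix - ref_helix, expanded - ref_expanded)
        for model, (helix, expanded) in table.items()
        if model != reference
    }


for model, (d_helix, d_expanded) in excess_over_reference(TABLE_II).items():
    print(f"{model:20s} helix {d_helix:+3d}  expanded {d_expanded:+3d}")
```

Only the helical pattern moves the helix content well above the random coil reference, while the β-type pattern raises the expanded content by only a few points.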
It is interesting that the helical pattern generates
more ordered structures than the β-type pattern.
There are at least two reasons for such behavior.
First, the helical pattern is stabilized by some short-range interactions, i.e., attractions between the ith and
(i + 4)th residues. In real proteins, the helical conformations are more stable in isolation due to various
specific short-range interactions. Our model chains
lack any specific long-range interactions, and helices form much more easily due to entropic effects. The
second reason is also partly entropic. Suppose a
short helical fragment forms; it then provides a large,
well-defined hydrophobic surface that acts as a scaffold for the remaining portions of the polypeptide
chain. Moreover, the helices are thicker and more
rigid than the expanded fragments. Thus, the segregation of hydrophobic and hydrophilic residues between the surface and the interior of the globule is
more strongly associated with the presence of some
secondary structure.48 In contrast, expanded
conformations (in the absence of long-range hydrogen bonds) are more flexible, and there is a large
number of possible arrangements of various more
or less bent chain fragments in a compact globular
state with the proper phase segregation of the two
types of residues.
In all cases, the average length of secondary
structure elements is small, as can be seen in Table
III. Moreover, it does not change very much with
the change of the sequence type. It should be
pointed out that the ‘‘5’’ in Table III means that two
consecutive three-vector fragments have the same
conformation according to the coarse classification
given in Table I. Thus, R-helix (right-handed helix), length 5 means a helical turn formed by four
consecutive backbone vectors connecting five Cα
vertices. According to this definition, the shortest
detectable secondary structure element connects 4
consecutive α-carbons. The longest observed fragments of regular secondary structure elements contained 8 Cα vertices (one per ten snapshots of the globular state). Interestingly, the longest helix observed
with a well-detectable frequency (one per three snapshots of
the 60-residue globule) for the β-type
sequence contained 6 model residues, while the 6-residue β-type fragments in the helical sequence
were hardly detectable. In all cases, the main contribution to the ‘‘secondary structure’’ content came
from the three-vector, four-residue fragments. Thus,
in spite of the strong sequence effect on the ordering
of the globular state, the overall level of secondary
ordering for all models is low.
Table III Average Length of Secondary Structure Elements in the Globular State for Various Sequences(a)
Model                 Helix (R-helix or L-helix)   Expanded
Random HP sequence    5.22                         5.40
(-H-)n                5.17                         5.43
(-HHPPHPP-)x          5.32                         5.07
(-HP-)y               5.20                         5.58
(a) x = n/7, y = n/2 (n = 60 is the model chain length).
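The length convention described above (one three-vector fragment spans four residues, two consecutive fragments of the same class span five) can be illustrated with a short run-length sketch. It is only an illustration: the function names and the `min_fragments` parameter are hypothetical, and which runs enter the averages reported in Table III is not assumed here.

```python
from itertools import groupby

# Classes counted as regular secondary structure (labels follow Table I).
ORDERED = {"right-handed helix", "left-handed helix", "expanded"}


def element_lengths(fragment_labels, min_fragments=1):
    """Lengths, in residues, of runs of identically classified three-vector fragments.

    A run of k consecutive fragments with the same label spans k + 3 residues,
    so a single fragment has length 4 and two consecutive ones have length 5.
    """
    lengths = {}
    for label, run in groupby(fragment_labels):
        k = sum(1 for _ in run)
        if label in ORDERED and k >= min_fragments:
            lengths.setdefault(label, []).append(k + 3)
    return lengths


def average_lengths(fragment_labels, min_fragments=1):
    """Mean element length per class for one chain conformation."""
    lengths = element_lengths(fragment_labels, min_fragments)
    return {label: sum(v) / len(v) for label, v in lengths.items()}


# Toy classification of consecutive fragments along a chain (made-up labels).
labels = ["coil/loop", "right-handed helix", "right-handed helix",
          "coil/loop", "expanded", "right-handed helix"]
print(average_lengths(labels))
```

Setting `min_fragments=2` keeps only elements of five or more residues, which is one way to explore how the averaging threshold affects the reported mean.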
DISCUSSION
Comparison with Other Work
In the past, several computational studies of the
collapse transition of long, flexible homopolymeric
chains showed that compactness itself does not induce any noticeable secondary-structure-type local
ordering of the chain segments. In contrast,
some time ago Chan and Dill11,12 suggested that
compactness itself induces local ordering, which
could be interpreted as an equivalent of the secondary structure of globular proteins. It appears, however, that their conclusion was biased by the very
short chains they studied and the rather permissive
definition used for the secondary structure of simple
cubic lattice chains. Subsequent studies of Gregoret
and Cohen,14 Hao et al.,15 and Socci et al.16 showed
that for reasonable, protein-like densities, in the absence of some sequence-specific interactions, the
amount of induced structure is negligible, if any,
when a more rigorous definition of secondary structure is employed (see also an excellent review by
Dill et al.17).
Heteropolymeric systems are different in this respect. For instance, it is possible to design the sequences of two types of amino acids of simple lattice chains that would exhibit well-defined local or
even global ordering. This has been proven by Dill
and co-workers, 49 Shakhnovich et al., 20,50 Sali et
al., 21 and others 16 for simple cubic lattice models
of two-component copolymers. The model employed in the present work has a geometry close
to the main chain geometry of real proteins.26,31,32
Helices or β-sheets can assemble in any direction
of the lattice coordinate system with no qualitative
effect of lattice anisotropy. Thus, the question of
whether compactness and the pattern of hydrophobic/hydrophilic
residues induce secondary structure is
addressed in a more straightforward form. The answer is that a proper pattern of H and P residues in
model sequences does indeed lead to some secondary ordering of the globular state; however, the
overall degree of ordering is small and there is no
global ordering of the compact globules. For simple
exact models and for more complex (and perhaps
more ‘‘realistic’’) lattice models, a unique globular
state could be easily obtained when some secondary
preferences (consistent with the target structure)
moderate the tertiary interactions. As has been
shown in other work employing a lattice representation of protein chains similar to that studied here,
a unique structure of globular state requires some
directional interactions (like hydrogen bonds) and
secondary preferences that trigger formation of various structural motifs.13 A unique packing of side
chains (when their internal degrees of rotational
freedom are accounted for) apparently requires some
kind of multibody interactions.44 There is a complex
interplay between these interactions. Does this contradict the findings of Kolinski and Skolnick,5,6 Dill
et al., Shakhnovich et al., and Sali et al.21,25 for
simple lattice models? Not at all. These very simple
lattice models suppressed various conformational
degrees of freedom of real proteins and, therefore,
the implications of the above-mentioned complex
interplay between various interactions were to a
large extent a priori assumed. Basic physics of the
protein folding process is perhaps reproduced by
these simple motifs.
Yet another difference between the present study
and studies of simple exact models17,24 of protein-like
systems should be mentioned. In the case of
simple exact models the conclusions concerning
structural properties of the ‘‘native’’ state are based
on analysis of the lowest energy conformation(s).
In the present work we analyzed a manifold of compact
conformations at a low, but finite, temperature.
At this range of the reduced temperature the model
systems are still well equilibrated. The obtained results correspond to a free-energy minimum at a
given temperature. It cannot be excluded that upon
further ‘‘cooling’’ and careful equilibration the
structural properties (in particular the secondary
structure content) would change. Since the packing
density had already approached a plateau in the studied
range of temperatures, it is rather unlikely that these
changes would be significant.
Why Study with Reduced Representation and Incomplete Interactions?
To simulate a real protein folding pathway and a
three-dimensional structure with meaningful resolution, one needs more exact models that reproduce
a minimal amount of local detail. Recent studies
have shown that the high coordination lattice models with proper knowledge-based force fields are
good candidates for such investigations.27–32 Paradoxically, within the framework of such apparently
more realistic but also more complex models, it is
more difficult to quantitatively answer some general
questions concerning protein folding dynamics and
thermodynamics.30 This is the motivation for studying partial models, such as the one analyzed here.
In the past, we have analyzed in detail the effect
of amino acid sequence-specific short-range interactions alone on the behavior of the 310 lattice model
of protein chains.28 It has been shown that the secondary structure generated by the model in the absence of any long-range interactions was similar to
that seen in the native state of the corresponding
proteins. The accuracy of such predicted secondary
structure was on the level of predictions via the
simplest one-dimensional methods of secondary
structure prediction. Here, we asked a very different question about the effect of long-range interactions alone in their simplest and most exaggerated
form. The answer is that the sequence pattern (even
in the exaggerated binary form) induces little secondary structure when the interactions are essentially isotropic (except for some trivial connectivity
effects). The long-range interaction effect could
perhaps have been stronger if explicit side
groups had been incorporated. This would introduce
some directionality into the long-range interactions.
Moreover, due to the strong directionality of hydrogen bonds, there are some restrictions on the otherwise possible packing of side groups and main chain
fragments. What would then be the effect of a sequence pattern on protein ordering? Perhaps larger?
If so, will more random patterns than the idealized
patterns examined in the present work still induce
secondary structure? These and other questions
will be addressed in future work, and the model
from the present work will be used as a reference
system. This series of studies will hopefully provide some insight into the role of various interactions and their complex interplay in controlling protein structures, folding dynamics and thermodynamics.
CONCLUSIONS
Using the Metropolis Monte Carlo sampling
method, we examined a model of protein chains
capable of reproducing a protein-like geometry of
the α-carbon reduced backbone. The interaction scheme
was reduced to the excluded volume interactions
and the square-well pairwise interactions between
united atoms centered on the α-carbons. Two types
of residues were considered: H, nonpolar (hydrophobic), and P, polar (hydrophilic). It has
been shown that in the absence of short-range interactions (and/or some directional long-range interactions), the collapse transition of homopolymeric
chains does not induce any secondary-structure-type, protein-like structural ordering. A marginal secondary structure can be observed for random heteropolymers
with an equimolar content of H and P residues.
Heteropolymers with exaggerated patterns of H and
P residues that mimic the amino acid sequence patterns of α- and β-type proteins exhibit some enhancement of the secondary structure content in the
dense globular state. Nevertheless, the observed degree of local ordering was lower than the average
ordering seen in real globular proteins. For all model
sequences, the collapse transition was continuous,
which is in agreement with theoretical predictions
for flexible polymers8 and heteropolymers.23,24
A more cooperative transition was observed for regular patterns; however, this cooperativity was very
low in comparison to the cooperativity of the folding transition of typical globular proteins. In future
work, we will examine the effect of the explicit
modeling of side chains and short-range interactions
on the character of the collapse transition of these
idealized protein models.
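For readers unfamiliar with the sampling scheme summarized above, the following sketch shows a square-well contact energy for an HP chain together with the standard Metropolis acceptance test. It is a minimal illustration under stated assumptions, not the model used in this work: the contact energies, the cutoff `r2_cut`, the exclusion of bonded neighbours only, and all function names are hypothetical, and excluded volume and the lattice move set are omitted.

```python
import math
import random

# Hypothetical contact energies in reduced units: HH and PP attract
# (PP more weakly), HP contacts contribute zero.
CONTACT_ENERGY = {("H", "H"): -1.0, ("P", "P"): -0.5,
                  ("H", "P"): 0.0, ("P", "H"): 0.0}


def square_well_energy(coords, sequence, r2_cut):
    """Sum contact energies over non-bonded pairs with squared distance <= r2_cut."""
    energy = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 2, n):  # skip directly bonded neighbours
            d2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
            if d2 <= r2_cut:
                energy += CONTACT_ENERGY[(sequence[i], sequence[j])]
    return energy


def metropolis_accept(delta_e, temperature, u):
    """Accept downhill moves always, uphill moves with probability exp(-dE/T)."""
    if delta_e <= 0.0:
        return True
    return u < math.exp(-delta_e / temperature)


# Toy usage: energy of a four-bead square conformation and one acceptance test.
coords = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
e_old = square_well_energy(coords, "HPPH", r2_cut=1)
accepted = metropolis_accept(delta_e=0.5, temperature=1.0, u=random.random())
print(e_old, accepted)
```

Passing the uniform random number `u` explicitly keeps the acceptance test deterministic and easy to check in isolation.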
This work was partially supported by the University of
Warsaw, grant BST-532-34/96, and the Howard Hughes
Medical Institute (International Scholar grant no. 75195543402). Helpful discussions with Dr. Jeffrey Skolnick
are gratefully acknowledged.
REFERENCES
1. Anfinsen, C. B. (1973) Science 181, 223–230.
2. Creighton, T. E. (1990) Biochem. J. 270, 131–146.
3. Richardson, J. (1981) Adv. Protein Chem. 34, 167–
339.
4. Kolinski, A., Skolnick, J. & Yaris, R. (1987) Macromolecules 20, 438–440.
5. Kolinski, A. & Skolnick, J. (1986) Proc. Natl. Acad.
Sci. USA 83, 7267–7271.
6. Kolinski, A., Skolnick, J. & Yaris, R. (1986) J.
Chem. Phys. 85, 3585–3597.
7. Skolnick, J., Kolinski, A. & Yaris, R. (1988) Proc.
Natl. Acad. Sci. USA 85, 5057–5061.
8. Lifshitz, I. M., Grosberg, A. Y. & Khokhlov, A. R.
(1978) Rev. Mod. Phys. 50, 683.
9. Sanchez, I. C. (1979) Macromolecules 12, 980–988.
10. Post, C. B. & Zimm, B. H. (1979) Biopolymers 18,
1487–1501.
11. Chan, H. S. & Dill, K. A. (1989) Macromolecules
22, 4559–4573.
12. Chan, H. S. & Dill, K. A. (1990) Proc. Natl. Acad.
Sci. USA 87, 6388–6392.
13. Kolinski, A. & Skolnick, J. (1992) J. Chem. Phys.
97, 9412–9426.
14. Gregoret, L. M. & Cohen, F. E. (1991) J. Mol. Biol.
219, 109–122.
15. Hao, M.-H., Rackovsky, S., Liwo, A., Pinkus,
M. R. & Scheraga, H. A. (1992) Proc. Natl. Acad.
Sci. USA 89, 6614–6618.
16. Socci, N. D. & Onuchic, J. N. (1994) J. Chem. Phys.
100, 1519–1528.
17. Dill, K. A., Bromberg, S., Yue, K., Fiebig, K. M.,
Yee, D. P., Thomas, P. D. & Chan, H. S. (1995) Protein Sci. 4, 561–602.
18. Karplus, M. & Sali, A. (1995) Curr. Opinion Struct.
Biol. 5, 58–73.
19. Argos, P. (1987) J. Mol. Biol. 197, 331–348.
20. Shakhnovich, E., Farztdinov, G. & Gutin, A. M.
(1991) Phys. Rev. Lett. 67, 1665–1668.
21. Sali, A., Shakhnovich, E. & Karplus, M. (1994) J.
Mol. Biol. 235, 1614–1636.
22. Dinner, A. R., Sali, A. & Karplus, M. (1996) Proc.
Natl. Acad. Sci. USA 93, 8356–8361.
23. Shakhnovich, E. I. & Gutin, A. M. (1989) Biophys.
Chem. 34, 187–199.
24. Dinner, A. R., Sali, A., Karplus, M. & Shakhnovich,
E. (1994) J. Chem. Phys. 101, 1444–1451.
25. Sali, A., Shakhnovich, E. & Karplus, M. (1994) Nature 369, 248–251.
26. Godzik, A., Kolinski, A. & Skolnick, J. (1993) J.
Comput. Chem. 14, 1194–1202.
27. Kolinski, A. & Skolnick, J. (1996) Lattice Models
of Protein Folding, Dynamics and Thermodynamics,
R. G. Landes, Austin, TX.
28. Kolinski, A., Milik, M., Rycombel, J. & Skolnick, J.
(1995) J. Chem. Phys. 103, 4312–4323.
29. Kolinski, A., Galazka, W. & Skolnick, J. (1995) J.
Chem. Phys. 103, 10286–10297.
30. Kolinski, A., Galazka, W. & Skolnick, J. (1996)
Proteins 26, 271–287.
31. Kolinski, A. & Skolnick, J. (1994) Proteins 18, 353–
366.
32. Kolinski, A. & Skolnick, J. (1994) Proteins 18, 338–
352.
33. Vieth, M., Kolinski, A., Brooks, C. L., III & Skolnick, J. (1994) J. Mol. Biol. 237, 361–367.
34. Vieth, M., Kolinski, A., Brooks, C. L., III & Skolnick, J. (1995) J. Mol. Biol. 251, 448–467.
35. Skolnick, J., Kolinski, A. & Ortiz, A. R. (1997) J.
Mol. Biol. 265, 217–241.
36. Hao, M.-H. & Scheraga, H. A. (1994) J. Phys.
Chem. 98, 9882–9893.
37. Skolnick, J., Kolinski, A., Brooks, C., III, Godzik,
A. & Rey, A. (1993) Curr. Biol. 3, 414–423.
38. Shakhnovich, E. I. & Gutin, A. M. (1993) Proc.
Natl. Acad. Sci. USA 90, 7195–7199.
39. Metropolis, N., Rosenbluth, A. W., Rosenbluth,
M. N., Teller, A. H. & Teller, E. (1953) J. Chem.
Phys. 21, 1087–1092.
40. de Gennes, P. G. (1979) Scaling Concepts in Polymer Physics, Cornell University Press, Ithaca, NY.
41. Rackovsky, S. (1990) Proteins 7, 378–402.
42. Oldfield, T. J. & Hubbard, R. E. (1994) Proteins 18,
324–337.
43. Milik, M., Kolinski, A. & Skolnick, J. (1997) J.
Comput. Chem. 18, 80–85.
44. Kolinski, A., Godzik, A. & Skolnick, J. (1993) J.
Chem. Phys. 98, 7420–7433.
45. Kolinski, A., Skolnick, J., Godzik, A. & Hu, W.-P.
(1997) Proteins 27, 290–308.
46. Bernstein, F. C., Koetzle, T. F., Williams, G. J. B.,
Meyer, E. F., Jr., Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977) J.
Mol. Biol. 112, 535–542.
47. Protein Data Bank (1995) Quart. Newsl. No. 71,
January.
48. Hecht, M. H., Richardson, J. S., Richardson, D. C. &
Ogden, R. C. (1990) Science 249, 884–891.
49. Dill, K. A. (1993) Curr. Biol. 3, 99–103.
50. Shakhnovich, E. I. & Finkelstein, A. V. (1989) Biopolymers 28, 1667–1680.