
doi:10.1016/j.jmb.2009.04.011
J. Mol. Biol. (2009) 389, 619–636
Available online at www.sciencedirect.com
Desolvation Barrier Effects Are a Likely Contributor
to the Remarkable Diversity in the Folding Rates of
Small Proteins
Allison Ferguson, Zhirong Liu and Hue Sun Chan⁎
Department of Biochemistry,
University of Toronto, Toronto,
Ontario, Canada M5S 1A8
Department of Molecular
Genetics, University of Toronto,
Toronto, Ontario, Canada M5S
1A8
Department of Physics,
University of Toronto, Toronto,
Ontario, Canada M5S 1A7
Received 3 March 2009;
received in revised form
1 April 2009;
accepted 6 April 2009
Available online
9 April 2009
The variation in folding rate among single-domain natural proteins is
tremendous, but common models with explicit representations of the
protein chain are either demonstrably insufficient or unclear as to their
capability for rationalizing the experimental diversity in folding rates. In
view of the critical role of water exclusion in cooperative folding, we apply
native-centric, coarse-grained chain modeling with elementary desolvation
barriers to investigate solvation effects on folding rates. For a set of 13
proteins, folding rates simulated with desolvation barriers cover ∼ 4.6
orders of magnitude, spanning a range essentially identical to that observed
experimentally. In contrast, folding rates simulated without desolvation
barriers cover only ∼ 2.2 orders of magnitude. Following a Hammond-like
trend, the folding transition-state ensemble (TSE) of a protein model with
desolvation barriers generally has a higher average number of native
contacts and is structurally more specific, that is, less diffused, than the TSE
of the corresponding model without desolvation barriers. Folding is
generally significantly slower in models with desolvation barriers because
of their higher overall macroscopic folding barriers as well as slower
conformational diffusion speeds in the TSE that are ≈1/50 times those in
models without desolvation barriers. Nonetheless, the average root-mean-square deviation between the TSE and the native conformation is often
similar in the two modeling approaches, a finding suggestive of a more
robust structural requirement for the folding rate-limiting step. The
increased folding rate diversity in models with desolvation barriers
originates from the tendency of these microscopic barriers to cause more
heightening of the overall macroscopic folding free-energy barriers for
proteins with more nonlocal native contacts than those with fewer such
contacts. Thus, the enhancement of folding cooperativity by solvation
effects is seen as positively correlated with a protein's native topological
complexity.
© 2009 Elsevier Ltd. All rights reserved.
Edited by C. R. Matthews
Keywords: contact order; Gō model; transition state; Kramers' theory;
conformational diffusion
Introduction
*Corresponding author. E-mail address:
[email protected].
Present address: Z. Liu, College of Chemistry and
Molecular Engineering, Peking University, Beijing 100871,
China.
Abbreviations used: TSE, transition-state ensemble;
RCO, relative contact order; db, desolvation barrier; PMF,
potential of mean force; LRO, long-range order; cm,
contact minimum; ssm, solvent-separated minimum; CI2,
chymotrypsin inhibitor 2.
Theoretical studies of protein folding kinetics afford
a conceptual framework for deciphering from experimental data the physicochemical interactions underlying protein behaviors.1–11 Much progress has been
attained recently by investigating small, single-domain natural globular proteins whose folding/
unfolding thermodynamics and kinetics are two-state-like. For these proteins, although folding speed
has not been extensively optimized by evolution,12,13
no appreciable accumulation of folding or unfolding
intermediates has been observed.14–16 This hallmark
feature is in contrast with the more complex, multiphasic folding kinetics of somewhat larger proteins,
which were often subjects of earlier experiments.17–20
Two-state-like folding is also clearly set apart from the
noncooperative thermodynamics and kinetics exhibited by those proteins recently found to be likely
global downhill folders.21,22 With the folding data for
an increasing number of such single-domain proteins
becoming available since the early 1990s14 (reviewed
in Refs. 23–25), the very simplicity of their folding
processes has led researchers to take a panoramic
view of biophysically important trends across many
two-state-like folders. In a seminal discovery by
Plaxco et al., a significant correlation was found to
exist between logarithmic folding rates of two-state
folders and the values of a simple parameter termed
relative contact order (RCO) derived from the
residue–residue contacts in a given protein's native
structure; the pattern of these contacts is commonly
referred to as native topology.26 (For a perspective on
this usage of the term “topology” vis-à-vis that in
other biomolecular contexts, see Section 1.1 of Ref. 27.)
It was recognized immediately that this simple
empirical rate–topology correlation should offer
important clues to the energetics of protein folding.
The correlation actually represented a fundamental
conceptual challenge because several protein chain
models embodying common notions of protein
energetics at the time failed to reproduce a similar
trend.28 One apparent exception was a two-dimensional square-lattice-model study conducted 20
years earlier by Taketomi and Gō.29 These early
researchers concluded that local interactions speed
up folding kinetics, whereas nonlocal interactions
lead to more cooperative folding transitions. However, although this earlier finding was consistent
with the discovery of Plaxco et al., it did not directly
address the multiple-protein rate–topology correlation because only a single 49mer native structure
was considered in Ref. 29.
Ising-like constructs offered the first rationalizations for the rate–topology correlation.30,31 Their
success provided critical physical insights regarding
the different contributions of local and nonlocal
interactions to the free-energy barrier to folding.
Nonetheless, these constructs are not self-contained
heteropolymer models because they lack an explicit
representation of the protein chain.32 As such, the
relationship between Ising-like constructs and explicit-chain models—which clearly bear a more direct
resemblance to real proteins—remains to be better
elucidated.33 The rate–topology correlation was subsequently addressed using explicit-chain, continuum
Gō-like modeling34–36 by Koga and Takada, who
simulated folding rates of 18 natural proteins.37
Consistent with experiment,26 a positive correlation
between simulated rates and RCO was found, albeit
with a weaker correlation coefficient. These results
showed that the empirical rate–topology correlation
can be captured, at least to a degree, by native-centric
models. The results also revealed, however, a fundamental limitation of the common Gō-like potential
Desolvation and Topology-Dependent Folding
because it fell short in accounting for the diversity of
folding rates among natural two-state-like folders. As
noted by the authors, the simulated folding rates
spanned ≈1.5 orders of magnitude, which is much
narrower than the 6 orders of magnitude spanned by
the corresponding experimental folding rates.37
It was soon realized38,39 that a likely origin for this
shortcoming is that continuum Gō-like models with
pairwise-additive Lennard–Jones-like potentials as
well as common lattice Gō models fold less cooperatively than real two-state-like proteins.40,41 Lattice
modeling efforts inspired by this realization showed
more diverse folding rates when cooperativity was
enhanced by nonadditive many-body energy
terms.38,39 These studies further suggested that the
remarkable diversity of experimental folding rates
among two-state-like proteins is probably underpinned by specific, rather than generic, forms of
many-body interactions, for there are substantial
variations in folding rate diversity when different
many-body interaction schemes were applied. For
example, under a physically plausible nonadditive
scheme that coupled local conformational propensity with nonlocal contact interactions, folding rates
among a set of 27mer three-dimensional cubic lattice
model proteins spanned ≈ 2.6 orders of magnitude.39
(See Ref. 42 for a recent analytical model based on a
similar local–nonlocal coupling mechanism.39,43) In
contrast, under a different nonadditive scheme,38 the
corresponding computed folding rates spanned only
≈ 1.8 orders of magnitude.8 A subsequent application of the idea of many-body effects to continuum explicit-chain models showed that adding a
three-body term to the common pairwise-additive
Gō potential could increase model folding rate
diversity among 18 proteins from ≈ 2.0 to ≈ 3.0
orders of magnitude, even though the improvement
was insufficient to match the ≈ 4.5–6.0 orders of
magnitude spanned by the corresponding experimental rates.44 Analytical modeling has explored the
boosting effect of many-body terms on folding
cooperativity,45 and a recent nonexplicit-chain variational model study also indicated that folding rate
diversity was enhanced by many-body effects.46
Together, these findings have convincingly
demonstrated that folding cooperativity is a crucial
ingredient in the physical accounting of the empirical rate–topology correlation. A case in point is an
earlier lattice-model study that insightfully concluded that folding cooperativity increases with
nonlocality of native contacts.47 However, because
the models in that study were insufficiently cooperative (see Ref. 48 and Section 16.4.4, pp. 422–427,
of Ref. 49 for an assessment of the 20-letter interaction scheme used in Ref. 47), severe chevron rollovers50,51 led to the simulation results that “under
conditions at which each native conformation was
stable, the structure with mostly nonlocal contacts
folded 2 orders of magnitude faster than the one
with mostly local contacts”.47 This prediction was
opposite to the experimental rate–topology trend.26
In contrast, more cooperative native-centric models
have milder chevron rollovers, and thus a positive
correlation between simulated folding rate and RCO
can be maintained even under strongly folding
conditions.52
What are the physical origins of a high degree of
folding cooperativity? Biophysical properties of
proteins are governed by water-mediated interactions; and one key physical contributor to folding
cooperativity is the energetic effects of water
expulsion (desolvation).53 Recent simulations of
coarse-grained protein chain models with a physics-based desolvation/solvation barrier (referred to
simply as “desolvation barrier,” or “db” below) in
their native-centric potential41,52,54–58 showed that
microscopic db's could significantly enhance folding/unfolding cooperativity.54,57,59 In this respect,
db's afford protein chain models with more realistic
thermodynamics and kinetics than those stipulated
by common Gō-like models with no db's. In view of
the positive impact of desolvation effects on folding
cooperativity and the above-discussed relationship
between folding cooperativity and rate–topology
correlation, we deemed it worthwhile to investigate
the extent to which desolvation effects may account
for the remarkable diversity in folding rates among
two-state-like proteins.60
Our group has underlined the role of conformational entropy in the rate–topology correlation using
common continuum Gō-like models with no db's.27
Building upon that advance and motivated by the
prospect of gaining a deeper understanding about
the role of solvation/desolvation in protein folding,
in the present work we used db models, which have
more realistic folding cooperativity, to address two
main questions: (1) How are the folding rates of
models of small, single-domain proteins impacted
by the introduction of db's, and do they exhibit a
stronger correspondence with experimental rates?
(2) What is the physical basis behind the changes in
the overall free-energy barrier to folding and in the
speed of conformational sampling that lead to this
alteration in the rates?
Theory
The present investigation adopts the coarse-grained native-centric approach of our previous
studies.41,57–59 This modeling approach is appropriate for our purpose. Although nonnative interactions occur in the folding of some proteins,61–65 their
effects are not dominant in two-state-like folders
and can be treated as a perturbation on a native-centric background.66 Hence, our native-centric
approach may be seen as a zeroth approximation
in a more general modeling framework.66–68 As in
our previous efforts, for computational tractability,
we use an implicit-water effective db potential54,59
in place of explicit-water simulations.69,70 The
effective db potential embodies the collective effects
of many water molecules, and thus, in this sense,
represents a “many-body” contribution. However, it
should be recognized that our pairwise-additive
form of the effective db potential involves an
approximation because it neglects the nonadditivity
of the water-mediated effective interactions themselves8,71–73 (see below). Despite these limitations,
results from recent coarse-grained native-centric db
modeling have provided useful physical insights
into molecular recognition74 and mechanical stability of proteins in pulling experiments.75 Moreover,
our simulations showed that db's could significantly
reduce native-state conformational fluctuations,52,59
a notable feature consistent with the experimental
view that db's are a main factor in the kinetic stability
of proteins.76,77 As already noted above, db's
enhance kinetic cooperativity; that is, they entail a
more extended linear chevron regime.52,57,59 For the
issue at hand regarding folding rate diversity, this
chevron property, by itself, means that db's tend to
increase the diversity of folding rates of a given
protein under different folding conditions. Given
this trend, it is not unreasonable to expect that db's
would also increase the diversity of folding rates
across different native structures. This is indeed the
case, as will be detailed below.
Here we use a native-centric potential with an
implicit-water desolvation barrier54 that our group
has applied in previous investigations.41,57 Following the notation in the detailed formulation in Ref.
59, the potential is given by
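\[
U(r; r_{\rm cm}, \varepsilon, \varepsilon_{\rm db}, \varepsilon_{\rm ssm}) =
\begin{cases}
\varepsilon Z(r)\,[Z(r) - 2] & \text{for } r < r_{\rm cm} \\[4pt]
C\,Y(r)^{n} \left[ Y(r)^{n}/2 - (r_{\rm db} - r_{\rm cm})^{2n} \right] / 2n + \varepsilon_{\rm db} & \text{for } r_{\rm cm} \le r < r_{\rm db} \\[4pt]
-B\,[Y(r) - h_1] \,/\, [Y(r)^{m} + h_2] & \text{for } r \ge r_{\rm db}
\end{cases}
\tag{1}
\]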
where rcm is the contact-minimum (cm) separation, ɛ
is the magnitude of energy at cm, ɛdb is the db
height, ɛssm is the depth of the energy well at the
solvent-separated minimum (ssm), as illustrated in
Fig. 1a; and
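\[
\begin{aligned}
Z(r) &= (r_{\rm cm}/r)^{k} \\
Y(r) &= (r - r_{\rm db})^{2} \\
C &= 4n(\varepsilon + \varepsilon_{\rm db}) / (r_{\rm db} - r_{\rm cm})^{4n} \\
B &= m\,\varepsilon_{\rm ssm}\,(r_{\rm ssm} - r_{\rm db})^{2(m-1)} \\
h_1 &= (1 - 1/m)\,(r_{\rm ssm} - r_{\rm db})^{2} / (\varepsilon_{\rm ssm}/\varepsilon_{\rm db} + 1) \\
h_2 &= (m - 1)\,(r_{\rm ssm} - r_{\rm db})^{2m} / (1 + \varepsilon_{\rm db}/\varepsilon_{\rm ssm})
\end{aligned}
\tag{2}
\]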
In the above Eq. (2), rssm = rcm + 3 Å, which
followed from the consideration that 3 Å is
approximately equal to the diameter of a water
molecule, and rdb = (rssm + rcm)/2, as in the original
work of Cheung et al.54 We use k = 6, m = 3, and n = 2
as before.41,57,58
The form of this potential was motivated by the
general behavior of two nonpolar solutes in water.
In order for the solutes to be in contact, water
molecules must be pushed out of the space between
them (Fig. 1a). The finite size of the water molecules
thus leads to an energetic cost, manifested as a
barrier in the effective pair potential between the
solutes, that is, the potential of mean force (PMF)
with the water degrees of freedom averaged.79
Evidently, a similar effect is likely to have a significant
impact on the folding process of a globular
protein as well, because most, if not all, water
molecules must be excluded from the hydrophobic
core before the native folded structure can be
formed.
Fig. 1. Effective (implicit-water) potential with desolvation barrier (db). (a) The db potential energy41,54,58,59
(continuous curve) is given by the expression for U(r;
rcm, ɛ, ɛdb, ɛssm) in the text [which is identical to that in
Eqs. (2) and (3) of Ref. 59]. Here U(r) is plotted in units
of the depth ɛ of the minimum energy (= −ɛ) at the cm
separation r = rcm [U(rcm) = −1 in this plot]. Included for
comparison is the PMF of two methane molecules at
25 °C computed by atomic simulation using the TIP4P
model of water (dashed curve, data from Ref. 71). The
schematic molecular drawings illustrate the distances
between the methane molecules (full circles) at the cm,
db, and ssm positions vis-à-vis the size of a water
molecule (dashed circles). For the example in this figure,
the rcm distance, the db height ɛdb, and the ssm depth
ɛssm in U(r) (continuous curve) are shown with values
equal to those in the methane–methane PMF from
atomic simulation (dashed curve). In general, the contact
distance rcm in the U(r) potential for a pair of natively
contacting residues i and j is set equal to the Cα–Cα
distance rijn between the residues in the PDB structure,
whereas ɛdb and ɛssm may take values similar (see the
text) but not necessarily identical to those shown in this
figure. (Effects of varying ɛdb and ɛssm were explored in
Refs. 52 and 59.) (b) PMF computed by explicit-water
atomic simulation (dashed curve) for two 20-residue
polyalanine α-helices versus an implicit-water potential
for the same system (continuous curve), where r is the
distance between the centers of mass of the two helices.
The PMF shown was simulated at 25 °C for two
essentially rigid helices at a fixed crossing angle using
the TIP4P model of water (dashed curve; data from Ref.
78). The implicit-water potential here was constructed by
assuming that water-mediated interactions were pairwise additive, as follows. First, “native” contacts
between residues along the two helices were determined
by applying the criterion for native contacts to the
helices' cm configuration at r ≈ 0.75 nm. Second, the
potential U for each such pair was taken to be the U(r)
function in (a) except rijn was set equal to the given
residue pair's distance in the helices' cm configuration
(rijn can be different for different contacts). The overall
implicit-water potential energy function shown by the
continuous curve in (b) was then calculated as the sum
of all such U's.
In general, the water-mediated PMF is temperature
dependent.73,80 Therefore, to account for the temperature dependence of protein folding,81,82 some
form of temperature dependence would have to be
introduced into the effective potential, as in our
group's recent attempt to rationalize58,59 the common
yet intriguing feature of isostable intrinsic enthalpic
folding barriers.83 In the present investigation, however, we use only a temperature-independent native-centric potential with db, as in most previous
studies,41,52,54–56 because our main goal here is to
address the diversity of experimental folding rates of
different proteins measured at essentially the same
temperature. The present focus on temperature-independent interactions also serves well to ensure
that any entropic effect observed in our model must
necessarily originate from conformational entropy,
whose role in the rate–topology correlation27 is an
issue we aim to further elucidate.
Approximate additivity of the db potential
As in previous applications of db potentials41,54,59
with functional form similar to that in Fig. 1a, the
total native-centric interaction energy in a protein
chain model is the sum of db potentials between
pairs of residues. Figure 1b provides an assessment
of this additivity assumption. Here, the PMF
between two 20-residue α-helices simulated using
an explicit-water model78 is contrasted with an
effective potential constructed for the same many-body system based on assuming pairwise additivity of our db potential. Figure 1b shows that the
overall barrier to helix–helix association (at separation ∼ 1 nm) computed from explicit-water simulation (dashed curve) is lower than that calculated by a
simple summation of contributions from our implicit-water potential for individual residue pairs
(continuous curve). Nonetheless, features of the
two potentials in Fig. 1b are quite similar, including
the position and depth of the solvent (water)-separated minimum at ≈1.2 nm. This similarity
suggests that one may expect pairwise additivity of
the db potential to be a reasonable first approximation for coarse-grained modeling of desolvation/
solvation effects in protein folding.
Activated volume in pressure-dependent folding
as a db effect
A noteworthy physical implication of desolvation
effects is how they contribute to the volumetric
signatures of protein folding.84 Recent explicit-water
simulations of two-helix systems78 have revealed an
intimate relationship between the enthalpic contribution to the overall folding barrier and the activation volume of the folding transition state determined
from pressure-based experimental methods.85 The
helix simulations in Ref. 78 highlighted the creation
of a void volume when the two helices (as a model
for two parts of a folding protein) were separated by
a distance too small to accommodate water molecules in between them (a process termed “steric
dewetting”). Thus, formation of the helix dimer
entails surmounting an “activation volume” (peak of
volume increase as the two helices approach each
other from large separation) of ≈ 55 mL/mol and
≈ 150 mL/mol, respectively, for a pair of 20-residue
polyalanine and polyleucine helices (Fig. 3 of Ref.
78). Interestingly, pressure-based experiments by
Mitra et al. showed that the folding activation
volume of wild-type staphylococcal nuclease is
≈ 56 mL/mol (Table 1 of Supporting Information
for Ref. 85), suggesting that the extent of dehydration
at the folding rate-limiting step of this protein may be
similar to that typified by the dimerization of two
rigid 20-residue polyalanine helices. This comparison between activation volume data from pressure
experiments and from explicit-water simulation of
many-body hydrophobic interactions provided
further support to the hypothesis that the rate-limiting step of folding for some proteins likely
involves large-scale, near-simultaneous hydrophobic burial. If so, the height of the enthalpic folding
barrier as well as the size of activation volume may
be closely related to the degree of folding cooperativity of a given protein.58,78 How these many-body,
nonadditive effects might be captured and elucidated by coarse-grained modeling is beyond the
scope of the present work but is a question that
would be extremely interesting to explore in the
future.
db's lead to a higher overall folding free-energy
barrier
As in most of our previous studies,41,59 we adopt
ɛdb = 0.1ɛ and ɛssm = 0.2ɛ for the native-centric db
potential (Fig. 1a). We focus on the 13 proteins in the
previous study by Wallin and Chan27 (Fig. 2). The set
Fig. 2. Ribbon diagrams of the PDB structures of the set of 13 proteins used in the present investigation (labeled below
each structure by its PDB id). The same set was used in a previous study by Wallin and Chan.27 Drawings were created by
RasMol.
of native contacts used for modeling a protein is
obtained by applying the same 4.5 Å side chain–side
chain separation criterion as that in Refs. 27, 58, and
59 on the given protein's Protein Data Bank (PDB)
structure. Folding kinetics and equilibrium sampling
are conducted by Langevin dynamics.86 As before,
bias potentials are introduced to facilitate sampling52,87–89 when necessary. The parameters for
Langevin dynamics simulations are identical to
those in our previous works. In particular, the simulation time step δt = 0.02 and the friction coefficient
γ = 0.0125, as in Refs. 27 and 59. During Langevin
dynamics simulation, a pair of residues belonging to
the native contact set is considered to be in contact—
and thus contributing to the fractional native contact
number Q—if the distance between their Cα positions is not larger than that at the db peak of their
Fig. 3. Free-energy barriers and folding rates. (a)
Typical Q-based one-dimensional free-energy profiles,
shown here for the with-db (continuous curve) and no-db (dashed curve) models of the 6–85 fragment of λ-repressor (1lmb). Each curve was simulated at approximately the transition midpoint of the given model;
ΔG(Q)/kBT = −ln P(Q) + constant, where P(Q) is the
conformational population as a function of Q. (b)
Free-energy barrier height ΔG‡ (in units of kBT) versus
logarithmic midpoint folding rate kfsim determined from
simulations of with-db (filled circles) and no-db (open
squares) models of the 13 proteins we studied. Straight
lines were determined from linear regression with
correlation coefficient r = −0.98 for both cases. The x-intercepts of the straight lines provide the preexponential (front) factors in Kramers theory for the with-db (F^db) and no-db (F^(0)) models. Data for the no-db
models were taken from Fig. 4 of Ref. 27.
native-centric db potential.41,58,59 As illustrated by
the example in Fig. 3a, free-energy profiles ΔG(Q)/
kBT (kBT is Boltzmann constant times absolute
temperature) for the models with db's we studied
have higher overall free-energy barriers than the
profiles for their corresponding no-db models. This
is part of the above-noted general trend that
folding/unfolding transitions are more cooperative
in with-db models than in corresponding no-db
models.41,54,57
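The contact criterion and the Q-based profile described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the analysis code of this study: the function names, the dictionary layout of `native_contacts`, and the shift of the profile so that its lowest point is zero are all assumptions made here. The db peak position follows from rdb = (rssm + rcm)/2 with rssm = rcm + 3 Å, that is, rdb = rcm + 1.5 Å.

```python
import math
from collections import Counter

def fraction_native_contacts(coords, native_contacts, water_diameter=3.0):
    """Fractional native contact number Q of one conformation.

    coords: list of (x, y, z) C-alpha positions in angstroms.
    native_contacts: dict mapping residue pairs (i, j) to their native
        C-alpha distance r_cm (hypothetical input layout).
    A native pair counts as formed if its distance does not exceed r_db.
    """
    formed = 0
    for (i, j), r_cm in native_contacts.items():
        r_db = r_cm + 0.5 * water_diameter
        if math.dist(coords[i], coords[j]) <= r_db:
            formed += 1
    return formed / len(native_contacts)

def free_energy_profile(q_samples, n_contacts):
    """Return {Q: dG(Q)/kBT} with dG(Q)/kBT = -ln P(Q) + constant.

    The constant is chosen here so that the lowest point of the profile is zero.
    """
    counts = Counter(round(q * n_contacts) for q in q_samples)
    total = sum(counts.values())
    g = {nq / n_contacts: -math.log(c / total) for nq, c in counts.items()}
    g_min = min(g.values())
    return {q: value - g_min for q, value in sorted(g.items())}
```

In practice the Q values would come from equilibrium Langevin sampling near the transition midpoint; only Q values that were actually visited appear in the returned profile.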
db's significantly reduce conformational
diffusion at the peak of overall folding
free-energy barrier
Using the computational setup outlined above, we
have determined the folding rates of the with-db
models for the 13 proteins in Fig. 2 at or near each
with-db model protein's transition midpoint. We
have also determined the folding activation free
energy, ΔG‡, at the corresponding model temperatures for the progress variable Q. Our ΔG‡'s are
determined from Q-based free-energy profiles as
exemplified by that in Fig. 3a, wherein ΔG‡ is an
overall barrier height defined as the ΔG value at the
peak of the overall free-energy barrier minus the ΔG
value at the unfolded (or denatured, low-Q) free-energy minimum. For the 13 with-db models, Fig. 3b
shows that, to a very good approximation, there is
a linear relationship, with slope −1, between logarithmic simulated folding rate ln kfsim and ΔG‡/kBT
(circles). As noted previously,27 a similar linear relationship holds for the corresponding no-db models
as well (squares in Fig. 3b). These trends indicate
that the relationship
\[
k_f = F \exp\!\left( -\frac{\Delta G^{\ddagger}}{k_B T} \right)
\tag{3}
\]
in the conventional transition-state picture or
Kramers theory of protein folding 90,91 holds
approximately for our model midpoint folding
rates, with F denoting the preexponential front
factor92 or prefactor93 estimated by the x-intercepts
of the linear fits in Fig. 3b.
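The relationship in Eq. (3) and the extraction of a front factor from rate and barrier data can be illustrated with a short Python sketch. It is a simplified stand-in for the linear regression described above: it assumes the slope of ln kf versus ΔG‡/kBT is exactly −1, in which case the least-squares intercept ln F is simply the mean of ln kf + ΔG‡/kBT. The function names and the synthetic barriers and front factor below are hypothetical and are not simulation data.

```python
import math

def kramers_rate(front_factor, barrier_kt):
    """Eq. (3): k_f = F * exp(-dG/kBT), with the barrier given in units of kBT."""
    return front_factor * math.exp(-barrier_kt)

def estimate_front_factor(barriers_kt, rates):
    """Estimate F assuming ln k_f = ln F - dG/kBT (slope fixed at -1).

    With the slope fixed, the least-squares intercept ln F is the mean of
    ln k_f + dG/kBT over the data set.
    """
    if len(barriers_kt) != len(rates) or not rates:
        raise ValueError("need equally sized, non-empty inputs")
    log_f = sum(math.log(k) + g for g, k in zip(barriers_kt, rates)) / len(rates)
    return math.exp(log_f)

# Synthetic, illustrative inputs only: an arbitrary front factor and barrier heights
true_f = 1.0e-4
synthetic_barriers = [2.0, 4.0, 6.0, 8.0]
synthetic_rates = [kramers_rate(true_f, g) for g in synthetic_barriers]
recovered_f = estimate_front_factor(synthetic_barriers, synthetic_rates)
```

Because the synthetic rates are generated from Eq. (3) itself, `recovered_f` reproduces `true_f` up to rounding; with real simulation data the scatter about the fitted line indicates how well a single front factor describes a class of models.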
The formulation in Eq. (3) provides an analysis of
model folding rates in terms of a product of two
contributions: The front factor F characterizes the
rate of conformational diffusion at the overall folding
free-energy barrier, whereas the folding barrier
height ΔG‡/kBT is determined by the population of
conformations at the same overall barrier relative to
that at the unfolded minimum. The ensemble of
conformations at the overall barrier constitutes a
putative folding transition state27,36 because, dynamically, the value of Q can only undergo essentially
continuous variation. Hence, a chain en route to the
native state must pass through one of the conformations with Q values corresponding to that of this
putative transition-state ensemble (TSE) at the overall barrier. This ensemble acts as a folding bottleneck
when ΔG‡/kBT is large because then the conformations it encompasses have low probabilities relative
Fig. 4. Time evolution of native contact number in
Langevin dynamics. P[Q̃(t), Q̃(t + δt)] is the probability,
among all possible dynamic transitions effected by a
Langevin dynamic time step δt, that the number of native
contacts is Q̃(t) at time t and Q̃(t + δt) at a subsequent
time t + δt. Results shown are for the with-db model of CI2
(2ci2) simulated at ɛ = 1.172 (T = 1). Q̃ = QQ̃n where Q̃n is
the number of native contacts in the PDB structure and
Q̃n = 131 for 2ci2. The transition probabilities were
determined from 2 × 10^9 time steps of sampling. Probabilities
for different changes in Q̃, denoted here as
δQ̃ ≡ Q̃(t + δt) − Q̃(t), are depicted in different colors
for clarity: the black, red, and blue curves are for δQ̃ = 0, −1,
and +1, respectively. Probabilities for all other transitions
were zero in our simulation (P[Q̃(t), Q̃(t + δt)] = 0 for δQ̃ > 1
or δQ̃ < −1).
to those belonging to the unfolded state. Following a
similar argument put forth in an earlier lattice
protein model study (Fig. 2 of Ref. 94), Fig. 4 here
shows that during one simulation time step δt (which
is short by construction), the largest change in the
number of native contacts is ± 1, which is the
minimum nonzero increase or decrease possible.
Thus, as expected, Q is seen as varying in a quasicontinuous manner in our model dynamics. Accordingly, properties of the transition state, such as its
average potential energy, conformational entropy,
and average root-mean-square deviation (RMSD)95
from the native structure, are determined from
conformations sampled within a narrow range of Q
values at the peak of the overall free-energy barrier
as in Ref. 27.
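The kind of bookkeeping behind Fig. 4 can be sketched as follows: given the integer number of native contacts recorded at successive time steps, tabulate the joint probability of each (before, after) pair and the largest single-step change. This is a minimal Python sketch with hypothetical function names and a made-up toy trajectory, not the analysis code used for Fig. 4.

```python
from collections import Counter

def transition_probabilities(contact_counts):
    """Joint probability of (contacts at t, contacts at t + dt).

    contact_counts: sequence of integer native contact numbers recorded at
    successive time steps. Probabilities are normalized over all observed
    single-step transitions.
    """
    pairs = Counter(zip(contact_counts, contact_counts[1:]))
    total = sum(pairs.values())
    return {pair: count / total for pair, count in pairs.items()}

def largest_jump(contact_counts):
    """Largest absolute single-step change in the number of native contacts."""
    return max(abs(b - a) for a, b in zip(contact_counts, contact_counts[1:]))

# Toy trajectory (illustrative only, not simulation output)
toy_trajectory = [10, 10, 11, 11, 10, 10, 10]
probs = transition_probabilities(toy_trajectory)
```

A `largest_jump` value of 1 for a trajectory is the quasi-continuity condition discussed above: the progress variable cannot skip over the barrier region within a single time step.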
Figure 3b shows that folding rates in the with-db
models are substantially slower than those in the corresponding no-db models. However, the with-db
models' higher ΔG‡/kBT values (Fig. 3a) account
only partly for the slower folding rates in these
models. The analysis in Fig. 3b shows that the other
major reason for their slower folding rates is that
conformational diffusion is slower in the with-db
models. In Fig. 3b, the intercepts of the linear fits
show that the front factor F^db ≈ 1.7 × 10^−5 for the
with-db models is ∼50 times smaller than the front
factor F^(0) ≈ 9.0 × 10^−4 for the no-db models. In
general, the rate of conformational diffusion along
a single progress variable Q has been found to
depend on the progress variable.96,97 Results from
one study suggested that the variation across the
middle of the range of Q may be mild.96 Using a different model, another study concluded that the rate of
conformational diffusion decreases “with respect to
the progression of folding toward the native state,
which is caused by the collapse to a compact state
constraining the configurational space for exploration”.97 Remarkably, in light of likely variations of
conformational diffusion rate with respect to Q as
proposed in these prior theoretical studies, our
results in Fig. 3b show that the rate of transition-state conformational diffusion, as embodied by the
front factor F, is approximately uniform among a
class of models for different proteins constructed
using the same native-centric interaction scheme
(with-db or no-db), even though it can be very
different for different classes of models (with-db
versus no-db).
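As a worked illustration of Eq. (3), the ratio of the no-db to the with-db midpoint folding rate of a given protein, each evaluated at its own model's transition midpoint, separates into a front-factor part and a barrier part. The shorthand ΔΔG‡ is introduced here only for this illustration; the front-factor values are those read from the fits in Fig. 3b.

```latex
\[
\frac{k_f^{(0)}}{k_f^{\rm db}}
  = \frac{F^{(0)}}{F^{\rm db}}
    \exp\!\left( \frac{\Delta\Delta G^{\ddagger}}{k_B T} \right),
\qquad
\Delta\Delta G^{\ddagger} \equiv
  \Delta G^{\ddagger}_{\rm db} - \Delta G^{\ddagger}_{(0)},
\qquad
\frac{F^{(0)}}{F^{\rm db}}
  \approx \frac{9.0 \times 10^{-4}}{1.7 \times 10^{-5}}
  \approx 53 .
\]
```

On this reading, the slower conformational diffusion alone accounts for a rate reduction equivalent to ln 53 ≈ 4.0 kBT of additional barrier, or roughly 1.7 orders of magnitude, with the remainder arising from the higher barrier ΔG‡ of the with-db model.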
The observation here that transition-state conformational diffusion is significantly slower in the
with-db model is physically reasonable because the
presence of repulsive interactions in the db potential
creates a more bumpy energy landscape, entailing
more channeled and meandering microscopic folding paths that would take longer times to traverse.
Evidently, the rate of conformational diffusion is
dependent upon solvent viscosity.98 The present
simulations were conducted under low viscosity for
computational tractability. Nonetheless, a recent
result showing that model chevron plots maintain
their shape over a wide range of Langevin friction
coefficients52 and the above general physical consideration both suggest that a significant difference
in the rate of transition-state conformational diffusion between with-db and no-db models should
persist in Langevin dynamics with higher, more
water-like friction coefficients.86
Results and Discussion
db's significantly increase the diversity in
folding rates among model proteins of
different native topologies
Applying the with-db modeling approach described above to the 13 proteins in Fig. 2, we show
in Fig. 5a the simulated folding rates, kfsim , of
the with-db protein models at their respective
transition midpoints and compare kfsim's with experimental rates (see Ref. 27 and references therein)†.
At the model transition midpoint, folding and
unfolding rates are equal and the kinetic relaxation
is well approximated by a single exponential,41 and
thus kfsim = 1/MFPT, where MFPT is the mean first-passage
time of folding. As in the previous no-db
model study27 (Fig. 5b), we focus on kfsim at the
model transition midpoint because the behaviors
of no-db and with-db models are kinetically more
cooperative, that is, two-state-like, at midpoint
† In Ref. 27, the chain length N and folding rate kf of
colicin E9 immunity protein (PDB id 1imq) in Table 1 of
that reference should have been listed, respectively, as
N = 86 and kf = 1.5 × 10³ s⁻¹. This was merely a
typographical error that did not affect other results on
1imq in Ref. 27.
Desolvation and Topology-Dependent Folding
than when the models are under strongly folding
conditions.41,57,99
Fig. 5. Experimental folding rates (kfexp) versus simulated folding rates (kfsim) of the 13 proteins studied here for
(a) the with-db model and (b) the no-db model. The no-db
data in (b) were from Fig. 2 of Ref. 27 and included here to
facilitate comparison with the new results in (a).
Although the correlation between simulated and
experimental folding rates (kfsim and kfexp in Fig. 5) in
the with-db models is comparable with that of the
no-db models (Pearson correlation coefficient
r = 0.66 and 0.69, respectively), the with-db models
exhibit a remarkable improvement over the no-db
models in matching the experimental diversity in
folding rates. In Fig. 5a, kfsim spans a range of ≈ 4.6
orders of magnitude, almost identical to the kfexp
range of ≈ 4.5 orders of magnitude. To our knowledge, such a match over 4 orders of magnitude
between the range of folding rates from direct
kinetic simulations of explicit-chain models and
that from experiments is unprecedented. By comparison, the range of folding rates in Fig. 5b
simulated using no-db Gō-like models of the same
proteins spans only ≈ 2.2 orders of magnitude.
Interestingly, the kfsim range of ≈ 2.2 orders of
magnitude from our no-db Gō-like models is almost
identical to the range of ≈ 2.1 orders of magnitude
obtained previously by Chavez et al.100 using the
same no-db Gō-like constructs for a somewhat
different set of 13 proteins (9 of which overlap with
our set) with chain lengths within the range N = 56–
98 as in our set. In contrast to our with-db model
folding rates (Fig. 5a) but similar to our no-db model
folding rates (Fig. 5b), the no-db model folding rates
of Chavez et al. also fall short of matching the
corresponding range of experimental folding rates:
For their aforementioned 13 proteins with N = 56–98,
the experimental folding rates span ≈ 7.0 orders of
magnitude; for the set of all proteins in their study,
which include three other two-state proteins (with
N = 36, 43, and 115) and three three-state proteins,
their simulated no-db model folding rates cover ≈ 4.7
orders of magnitude, whereas the corresponding
experimental folding rates cover ≈ 8.8 orders of
magnitude (see Table 1 and upper plot in Fig. 1 of
Ref. 100)‡.
db's tend to increase Q of the folding transition
state but leave transition-state RMSD from
native essentially unchanged
The match between the ranges of simulated and
experimental folding rates in Fig. 5a suggests
convincingly that barrier effects originating from
desolvation energetics are a significant contributor
to folding rate diversity. As noted above, both the
increase in overall folding barrier height ΔG‡ and
the slower transition-state conformational diffusion
(smaller front factor F ) contribute to slower
folding in the with-db models than that in the
no-db models. However, because F db is approximately constant among the with-db models, at
least for the 13 proteins studied here (Fig. 3b), the
larger diversity in folding rates among the with-db
models vis-à-vis that among the no-db models is
underpinned almost entirely by a larger diversity
in ΔG‡ values for the with-db models. Below we
provide rationalization for both the with-db
models' higher ΔG‡ values and the larger
dispersion of the ΔG‡ values.
The example in Fig. 3a indicates that the peak of
the with-db model's higher overall folding barrier
is situated at Q ≈ 0.67, which is significantly higher
than the Q ≈ 0.53 value for the peak of the overall
folding barrier in the no-db model. Motivated by
this observation, we show in Fig. 6 the relationship
between the overall folding barrier height ΔG‡ and
the corresponding change in fractional native
contact Q from the denatured-state (low-Q) minimum (Q = QD) to the transition-state peak (Q = Q‡).
As seen in Fig. 6, ΔG‡ is well correlated with
ΔQ‡ = Q‡ − QD for both the with-db and no-db
models for 10 of the proteins we study. For these
model proteins, db's produce a shift in the Q-value
of the peak location, leading to larger values of
ΔQ‡. The Q-value of the denatured state minimum,
on the other hand, remains roughly the same for a
given protein in both models.
‡ We note that the rescaling procedure proposed by
Chavez et al. in Eq. (C.4) in Supporting Information of
Ref. 100 is unwarranted. The proposed procedure
resulted in approximately 4 orders of magnitude increase
in the range of their no-db model folding rates after
rescaling. However, even if the model native-centric
energy strength ɛ may be different for different proteins
when measured in physical energy units, this consideration cannot affect model midpoint folding rate because
kfsim at midpoint temperature Tm is controlled by the
dimensionless quantity ΔG‡/kBTm that, therefore, is
invariant with respect to change in unit for ɛ.
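The invariance argument in this note can be written out in one line. This is a sketch under the assumption, not stated explicitly here, that changing the unit of ɛ by a factor λ (a symbol introduced only for this illustration) rescales every energy in the model, and hence both ΔG‡ and kBTm, by the same factor:

```latex
\[
\frac{\Delta G^{\ddagger}(\lambda\epsilon)}{k_B T_m(\lambda\epsilon)}
  = \frac{\lambda\,\Delta G^{\ddagger}(\epsilon)}{\lambda\,k_B T_m(\epsilon)}
  = \frac{\Delta G^{\ddagger}(\epsilon)}{k_B T_m(\epsilon)}
\]
```

The factor λ cancels, so the exponent that controls the midpoint rate does not change.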
Fig. 6. Activation free energy (ΔG‡, in units of kBT)
versus “activation” Q value (ΔQ‡ = Q‡ – QD). For both the
with-db (filled circles) and no-db (open squares) models,
the correlation is significant for 10 of the proteins studied
(3 outliers not plotted, see the text). The straight lines are
least-squares linear regression; correlation coefficient
r = 0.73 and 0.75, respectively, for the with- and no-db
models plotted.
Thus, the larger ΔG‡ in the with-db models may
be viewed as resulting from a larger ΔQ‡. This
feature was noted previously for with-db models of
chymotrypsin inhibitor 2 (CI2) and barnase.59 The
more general result in Fig. 6 showing a substantial
increase in ΔQ‡ for the with-db models over that for
the no-db models is physically reasonable because
db tends to decrease the stability of partially ordered
conformations. As a result, folding does not proceed
until a sufficiently high number of contacts have
formed; that is, larger portions of the protein are
ordered into native-like structure. Additionally, Fig.
6 shows for both the with-db and no-db models that
an approximate linear relationship exists between
ΔG‡ and ΔQ‡. We consider this trend a Hammond-like behavior,101 because it shows that the extent of
structural reorganization of the transition state from
that of the reactant (denatured state in our case) is
negatively correlated with reaction (folding) speed,
and therefore positively correlated with overall
barrier height (Fig. 3b), as in the Hammond hypothesis. The underlying principle of this trend is
similar to that enunciated by Hammond, although
his original study of chemical reactions considered
potential energy as a function of reaction coordinate101 rather than the free-energy profile used in
the study of protein folding.
It should be noted, however, that Hammond-like
behavior does not apply to all of our protein
folding models. The behaviors of three outliers—
models for twitchin (1wit), spliceosomal protein
U1A (1urn), and acylphosphatase (1aps)—suggest
that once the overall folding barrier ΔG‡ becomes
sufficiently high, its relationship with ΔQ‡ does not
follow the trend exhibited in Fig. 6 for models with
comparatively lower ΔG‡ values (outlier data not
shown in Fig. 6). For with-db models of the outliers,
ΔG‡/kBT = 9.8, 10.8, and 14.9, and ΔQ‡ = 0.40, 0.51,
and 0.45, respectively. For their no-db counterparts,
ΔG‡/kBT = 4.7, 5.3, and 6.1, and ΔQ‡ = 0.31, 0.28,
and 0.30, respectively. Nonetheless, these no-db
ΔG‡ and ΔQ‡ values are lower than those for the
corresponding with-db models. In this respect, they
are similar to the results for the proteins shown in
Fig. 6.
We next turn to the increased diversity in ΔG‡
values in the with-db model. What causes some
proteins to experience a larger shift in simulated ΔQ‡ value than others when modeling is
switched from the no-db to the with-db interaction
scheme? To address the issue, we consider RMSD
from the native structure as a function of Q (shown
for two proteins in Fig. 7). In all cases, including
those for the remainder of the protein set not shown
in Fig. 7, RMSD is a decreasing function of Q,
wherein for Q values intermediate between the
denatured and native states (Q ∼ 0.2–0.8) the RMSD
at a given Q is higher for the with-db than for the
no-db model. Remarkably, the RMSD values at the
barrier peak locations of the two models (marked
by vertical lines in Fig. 7) are essentially the same.
This near-invariance of transition-state RMSD with
Fig. 7. RMSD from the native PDB conformation as a
function of fractional number of native contacts Q. Results
are shown for the examples of (a) λ–repressor (1lmb) and
(b) S6 (1ris). In each panel, filled circles (upper curve) are
for the with-db model, whereas open squares (lower
curve) are for the no-db model. Vertical lines in each
plot mark the locations of the overall barrier peaks along
the free-energy profiles for the with-db (continuous line)
and no-db (dashed line) models.
respect to the change from the no-db to with-db
interaction scheme provides a perspective for
understanding the corresponding shift in ΔG‡. It
appears that adding db's shifts the ΔG peak to a
higher Q‡ position because in the presence of the
unfavorable interactions at the db's, a larger
number of native contacts are necessary to achieve
a given RMSD threshold required at the rate-limiting step of folding, and this shift in Q‡ leads
to a higher ΔG‡ following a Hammond-like trend.
However, the magnitude of this Q‡ shift is sensitive
to native topology. Comparing the results for 1lmb
and 1ris in Fig. 7, for example, indicates that on
average a protein with higher native topological
complexity would require a larger Q‡ shift to
maintain an essentially model-independent RMSD
threshold.
Folding routes with db's are more channeled
To gain further insight into db effects, we examine
the distribution of individual native contacts along the
model folding trajectories. At each value of Q, there are
conformations with different sets of native contacts
that are consistent with the given total number of
native contacts, such that $Q = \sum_{k=1}^{\tilde{Q}_n} P(c_k|Q)/\tilde{Q}_n$,
where P(ck∣Q) is the probability of contact ck in the
set of conformations each of which has a given Q
value, the contact label k = 1, 2, …, Q̃n, with Q̃n
denoting the total number of contacts in the native
denoting the total number of contacts in the native
(PDB) structure. For an individual conformation, a
contact ck can either be formed or not formed (with
slight variation when a smooth criterion is used
instead52). But in an ensemble, P(ck) typically takes
on a fractional value because it involves averaging over
Fig. 8. Comparing the transition states in the with-db and no-db models. (a) Contact maps showing contact
probabilities (color coded as indicated) in the transition states of 2ci2 in the with-db model (upper triangle, same
simulation conditions as in Fig. 4 above) and in the no-db model (lower triangle), simulated at each model's respective
transition midpoint. Transition states are defined from Q-based free-energy profiles as discussed in the text. The bottom
drawings illustrate conformational variations in the transition states for the with-db (b) and no-db (c) models. In (b) and
(c), the thick black traces represent the backbone of the native PDB 2ci2 structure, whereas thin red traces depict
representative transition-state conformations optimally superimposed on the native structure. These drawings were
constructed using the method in Ref. 10.
different conformations with different contact sets. We
now take a closer look at the distribution of P(ck).
The contact maps in Fig. 8a show probabilities of
individual contacts at the peak of the free-energy
profile in both the with-db and no-db models for CI2
[P(ck) for a narrow range of Q centered at Q‡; see Ref.
27]. Transition-state contact maps such as Fig. 8a
provide a useful visualization of the distribution in
contact probabilities.66 The distribution of contact
probabilities is more heterogeneous in the with-db
model (upper triangle) than in the no-db model
(lower triangle). This trend is consistent with results
from the other proteins in our data set (contact maps
not shown), indicating that one effect of db's is to
favor certain contacts in the TSE more strongly than
in the no-db case. Reflecting the
higher Q‡ in the folding transition state in the with-db model than that in the no-db model (Figs. 6 and
7), the chain representations in Fig. 8b and c show a
discernibly tighter conformational ensemble for the
with-db model (Fig. 8b) than for the no-db model
(Fig. 8c).
One parameter that has been used to quantify the
heterogeneity of native contacts along the free-energy profile is the route measure
\[
R(Q) = \frac{1}{\tilde{Q}_n\,Q(1-Q)} \sum_{k=1}^{\tilde{Q}_n} \left[ P(c_k|Q) - Q \right]^2 \qquad (4)
\]
introduced by Plotkin and Onuchic102 and applied
subsequently by Chavez et al.100 to analyze simulated data obtained from no-db Gō-like models. R(Q)
is essentially the second moment of the contact
probability distribution normalized by the maximum possible spread (0 ≤ R(Q) ≤ 1). Detailed discussions of the meaning of R(Q) are provided in Refs.
100 and 102. Briefly, if R(Q) takes the maximum
value of unity, it indicates that only one specific set
of native contacts is found at Q, in which case the
protein can traverse very few possible conformational routes through the given Q value. At the other
extreme, if R(Q) = 0, it means that all native contacts
are equally probable at Q, and as a result many
different conformational routes are available for the
protein to pass through the given Q value. It follows
that the value of R(Q) indicates whether there are
many [small R(Q)→0] or few [large R(Q)→1]
folding/unfolding routes at a given Q value.100,102
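The two limiting cases described above can be checked with a short calculation. This is a minimal sketch, not the analysis code used in this study: it assumes each conformation at a given Q is supplied as a list of 0/1 flags, one per native contact, and the function names are hypothetical.

```python
def contact_probabilities(conformations):
    """Return P(c_k|Q) for each native contact k.

    `conformations` is a list of equal-length 0/1 lists (1 = contact formed),
    all assumed to have been sampled at the same Q.
    """
    n_conf = len(conformations)
    n_contacts = len(conformations[0])
    return [sum(conf[k] for conf in conformations) / n_conf
            for k in range(n_contacts)]


def route_measure(probs):
    """Route measure R(Q) of Eq. (4), with Q taken as the mean contact probability."""
    n_contacts = len(probs)
    q = sum(probs) / n_contacts
    if q <= 0.0 or q >= 1.0:
        raise ValueError("R(Q) is undefined at Q = 0 or Q = 1")
    spread = sum((p - q) ** 2 for p in probs)
    return spread / (n_contacts * q * (1.0 - q))


# One specific contact set at Q = 0.5: a single route.
single_route = [[1, 1, 0, 0], [1, 1, 0, 0]]
# Complementary contact sets at Q = 0.5: every contact equally probable.
many_routes = [[1, 1, 0, 0], [0, 0, 1, 1]]

r_single = route_measure(contact_probabilities(single_route))
r_many = route_measure(contact_probabilities(many_routes))
print(r_single, r_many)
```

The first ensemble gives the maximum value of unity and the second gives zero, matching the two extremes discussed in the text.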
Figure 9 shows the route measure in both the
with-db and no-db models for the same two
proteins studied in Fig. 7. R(Q) has been computed
before for several no-db Gō-like model proteins in
Ref. 100. For those no-db model proteins that were
considered in both that work and the present
study, we obtain agreement between the two sets
of results. For all proteins considered in our study,
R(Q) for the with-db models (Fig. 9, filled circles) is
typically larger than that for the corresponding no-db models (Fig. 9, open squares) at virtually all Q
values, indicating that there are generally fewer
folding routes in the with-db models. This result is
consistent with our expectation that the number of
Fig. 9. Route measure. Results are shown for the two
proteins in Fig. 7. In each panel, route measure for the
with-db model is plotted using filled circles (upper curve),
whereas that for the no-db model is plotted using open
squares (lower curve). Vertical lines mark the locations of
overall barrier peaks along the free-energy profiles for the
with-db (continuous line) and no-db (dashed line) models,
as in Fig. 7.
accessible conformations that are partially folded is
substantially reduced by the repulsive part of db
interactions. R(Q)'s for with-db models also exhibit
substantially more structure. Whereas R(Q)'s for
the no-db models are mostly monotonically decreasing functions with possibly a low maximum, R(Q)'s
for the with-db models often have one or more
prominent maxima. This feature implies that the
action of db's to narrow routing possibilities along
the folding trajectory is significantly more pronounced at certain values of Q. Folding may be
characterized as encountering conformational
entropic folding bottlenecks at these Q values.100
Thus, considering the above arguments together,
with desolvation effects more appropriately
accounted for by the present with-db models,
folding is seen as more channeled than that predicted by no-db Gō-like models.
Rate–topology correlation likely driven by
conformational activation entropy
To gain further insight into the biophysics of
rate–topology correlation, we resolve simulated
activation free-energy ΔG‡ into its energetic (ΔE‡)
and entropic (−TmΔS‡) components, where ΔE‡ is
activation energy and ΔS‡ is activation conformational entropy.27 Figure 10 shows ΔG‡ (same data
as that in Fig. 3b), ΔE‡, and activation conformational entropic free-energy − TmΔS‡ for the 13 with-db model proteins. As for the no-db models
studied before,27 there are large entropy-energy
compensations. For example, both ΔE‡ and
− TmΔS‡ have magnitudes ∼ 130kBT for the slowest
folding with-db model in Fig. 10, but they combine
to yield a ΔG‡ of only ∼ 15kBT.
Fig. 10. Energetic and entropic components of free-energy barrier to folding. ΔE‡ is activation energy and
ΔS‡ is activation entropy. Activation free energies (ΔG‡)
in units of kBTm at the Tm's of the 13 with-db model
proteins as well as their energetic (ΔE‡/kBTm) and
entropic (−ΔS‡/kB) components are plotted as function
of logarithmic simulated folding rate. Straight lines are
results of least-squares linear regression.
Figure 10 shows that logarithmic model folding
rate kfsim correlates quite well with ΔE‡ (negative
correlation, r = − 0.79) and also with − TmΔS‡ (positive correlation, r = 0.74). Simulation data in Fig. 10
indicate further that the sign of ΔG‡ is identical to
that of its conformational entropic component,
− TmΔS‡, but opposite to that of its energetic
component, ΔE‡. In other words, the conformational
entropic component of ΔG‡ dominates over its
energetic component. Because the variation in logarithmic folding rate across different model proteins
is underpinned by the corresponding variation in
ΔG‡ (Fig. 3b), the observation of entropic dominance in Fig. 10 implies that the rate–topology
correlation is driven mainly by conformational
entropy of the folding transition state in the with-db models, as in our previously studied no-db
models.27 This trend—which has now been obtained
from two explicit-chain simulation studies—is also
consistent with the conclusion from an earlier non-explicit-chain investigation103 and recent advances
in elucidating principles of loop closure.104 Taken
together, the robustness of the finding led us to
conclude that the rate–topology correlation in real
proteins is likely a consequence of similar conformational entropic effects at the folding rate-limiting
step. From this vantage point, the increased folding
rate diversity in with-db models is a manifestation
of the harsher restrictions imposed by db's on the
TSE conformational freedom of topologically more
complex proteins.
Figures 11 and 12 turn attention to the relationship between native topology and simulated
folding rate. Since the predictive power of RCO
was discovered,26 several measures of native
topological complexity have been devised to
Fig. 11. Topological parameters versus simulated logarithmic folding rate (kfsim). Results are shown for both
with-db (filled circles) and no-db (open squares) models.
Folding rates for the no-db models were from Ref. 27.
Straight lines are results of least-squares linear regression.
The correlation coefficients for with-db and no-db models
are, respectively, (a) r = −0.64, −0.59 for CO; (b) r = −0.73,
−0.72 for RCO; and (c) r = −0.80, −0.84 for LRO. In (c), the
ln kfsim versus LRO data for no-db models were taken from
Fig. 7b of Ref. 27 and are included here for comparison.
Fig. 12. Transition-state topological parameters versus
entropic component of activation free energy (see Fig. 10).
Present results for the with-db models (filled circles) are
compared against previous results27 for the no-db models
(open squares). Straight lines are results of least-squares
linear regression. The correlation coefficients for the with-db and no-db models are, respectively, (a) r = 0.34, r = 0.24
for CO‡, and (b) r = 0.70, r = 0.53 for LRO‡. Data for the no-db models were taken from Figs. 8b and 9b of Ref. 27.
rationalize protein folding rates.105–109 Here we
focus on RCO,26 long-range order (LRO),105 and a
measure we termed27 CO:
\[
\mathrm{CO} = \frac{1}{N\tilde{Q}_n} \sum_{i<j-3} l_{ij} \qquad (5)
\]
where N is the chain length (number of residues) of
the given protein, i and j are residue labels, lij = ∣j − i∣
and the summation is over residue–residue contacts
in the native structure. This measure was motivated
by, but differs somewhat from, the original definition of RCO. CO was introduced for Cα chain model
studies27 for its similarity with RCO. But unlike
RCO, once the native contact set is determined,
calculation of CO does not require knowledge about
side-chain positions (Fig. 11a). The RCO values in
Fig. 11b, however, are calculated by the original
definition:
\[
\mathrm{RCO} = \frac{1}{N N_a} \sum_{\text{atomic contacts}} l_{ij} \qquad (6)
\]
where the summation is over contacts between
nonhydrogen atoms of contacting residues, and Na
is the total number of such atomic contacts.26,107 We
also provide in Fig. 11c the dependence of simulated
folding rate on LRO:105
\[
\mathrm{LRO} = \frac{1}{N} \sum_{i<j-l_c} n_{ij} \qquad (7)
\]
where nij = 1 if residues i and j are in contact;
otherwise, nij = 0. Unlike CO and RCO, the terms for
LRO are not weighted by the loop length lij, and
LRO counts only long-range contacts satisfying a
sequence cutoff lc criterion. Here we use lc = 12 as in
Ref. 27.
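The difference between the loop-length-weighted CO of Eq. (5) and the unweighted, long-range-only LRO of Eq. (7) can be seen in a few lines of code. This is a minimal sketch, not the code used in this study: it assumes the native contact set is supplied as residue-index pairs (i, j) that already satisfy the contact criterion, it reads the long-range criterion as |j − i| > lc, and the function names and the toy contact list are hypothetical.

```python
def contact_order(contacts, n_residues):
    """CO of Eq. (5): mean loop length |j - i| over native contacts, divided by N."""
    total_loop_length = sum(abs(j - i) for i, j in contacts)
    return total_loop_length / (n_residues * len(contacts))


def long_range_order(contacts, n_residues, l_c=12):
    """LRO of Eq. (7): number of contacts with |j - i| > l_c, divided by N."""
    n_long_range = sum(1 for i, j in contacts if abs(j - i) > l_c)
    return n_long_range / n_residues


# Toy contact set for a hypothetical 30-residue chain (not taken from any PDB entry).
toy_contacts = [(1, 5), (2, 20), (3, 30)]
co = contact_order(toy_contacts, n_residues=30)
lro = long_range_order(toy_contacts, n_residues=30)
print(co, lro)
```

In this toy case the short (1, 5) contact contributes to CO through its loop length but is not counted in LRO, which is the distinction drawn in the text.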
Figure 11 shows for the 13 studied proteins that the
correlations between logarithmic kfsim and the topological complexity parameters are reasonably good.
Introduction of db's leads to an improved correlation
with CO (− r increases from 0.59 for the no-db models
to 0.64 for the with-db model, Fig. 11a). But db's have
little effect on the correlation of log kfsim with RCO
and LRO. As noted above, the lij terms in RCO are
weighted by the number of side-chain atomic
contacts, whereas those in CO are not [Eqs. (5) and
(6)]. Interestingly, even though kfsim is computed
using a Cα chain model with a uniform strength for
favorable native-centric energies, the correlation of
log kfsim with the RCO measure (r ≈ − 0.73) is
significantly stronger than that with the CO measure. Among the topological complexity parameters
considered, the simulated logarithmic folding rates
correlate most strongly with LRO (Fig. 11c). The
r ≈ − 0.8 value for the correlation between log kfsim
and LRO is comparable to that for the dependence of
experimental log kfexp on LRO,105 despite the weaker
correlation between log kfsim and log kfexp for the set of
proteins we study (r = 0.66, Fig. 5).
Combining the results from Figs. 10 and 11, Fig. 12
explores the relationship between activation conformational entropy ΔS‡ and the transition-state
topological complexity parameters CO‡ and LRO‡
of the with-db models, an analysis that has been
performed for the corresponding no-db models.27
The activation quantities CO‡ and LRO‡ are the CO
and LRO values computed for the TSE instead of for
the native structure; that is, they are obtained by
applying Eqs. (5) and (7) but with the summation
replaced by one that sums over contacts in each of
the TSE conformations and then averaged over the
TSE. [Note that RCO‡ cannot be computed using a
Cα chain model because the side-chain information
required in Eq. (6) is lacking.]
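The ensemble averaging just described can be sketched for LRO‡. This is a minimal illustration under stated assumptions, not the procedure's actual implementation: each TSE conformation is represented by its own list of formed contacts (i, j), the long-range criterion is read as |j − i| > lc, and the function names and toy data are hypothetical. CO‡ is left out because the normalization to use for partially formed contact sets is not spelled out here.

```python
def long_range_order(contacts, n_residues, l_c=12):
    """LRO of Eq. (7) for one conformation's set of formed contacts."""
    return sum(1 for i, j in contacts if abs(j - i) > l_c) / n_residues


def tse_average_lro(tse_contact_sets, n_residues, l_c=12):
    """LRO computed per TSE conformation, then averaged over the ensemble."""
    values = [long_range_order(contacts, n_residues, l_c)
              for contacts in tse_contact_sets]
    return sum(values) / len(values)


# Two toy TSE conformations of a hypothetical 10-residue-normalized example.
toy_tse = [
    [(1, 20)],
    [(1, 20), (2, 30), (3, 8)],
]
lro_ts = tse_average_lro(toy_tse, n_residues=10)
print(lro_ts)
```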
Figure 12a shows that there is not much correlation between ΔS‡ and CO‡ among both the with-db
and no-db models. This is not too surprising
because although log kfsim correlates reasonably
well with ΔS‡ (Fig. 10), the correlation between
log kfsim and native CO is weaker (Fig. 11a).
Nonetheless, it is interesting to note that the range
of CO‡ values spanned by the 13 model proteins is
60–80% of the corresponding range of native CO
values. This trend appears to be consistent with
recent results based on ψ-value analyses and other
experimental techniques, indicating that transition
states of several small proteins achieve approximately 60–80% (∼ 70%) of the RCO of their
respective native structures.110,111 A somewhat
lower ∼ 50% of native RCO, however, was found
in putative TSE's simulated using experimental ϕ-values as constraints112 (see also comment in Ref. 113
on the method in Ref. 112).
Figure 12b shows that activation conformational
entropy correlates much better with LRO, and that
db significantly improves the correlation, viz., r for
LRO‡ versus − ΔS‡/kB increases from 0.53 for the no-db models to 0.70 for the with-db models. Contrasting this behavior with that in Fig. 12a, our finding
suggests that the conformational entropic consequence of the topological complexity in the TSE may
be better characterized by the LRO measure than by
the RCO measure. It is clear from Fig. 12b that db's
promote nonlocal contacts in the TSE, with the
maximum LRO‡ among the 13 proteins studied
shifting from ∼ 0.65 for the no-db models to ∼ 0.8 for
the with-db models. Results in Figs. 11c and 12b
indicate that LRO‡ ∼0.4 (LRO). Experimental testing
of this predicted scaling should provide useful
topological information about the TSE in addition
to the insight gained from the CO‡ ∼ 0.7 (CO) relation discussed above.
Concluding Remarks
In summary, we have shown that incorporating
physics-inspired pairwise db's into native-centric
coarse-grained explicit-chain models of a set of
natural proteins can lead to a remarkable diversity
in folding rates almost identical to that observed
experimentally, a feat not achievable by common
Gō-like models without db's. db's give rise to more
ruggedness on the energy landscape. This ruggedness enhances rather than diminishes folding
cooperativity because db's serve to eliminate many
partially folded conformations that are prone to
kinetic trapping. In other words, energy landscapes
with db are rugged with barriers but not rugged
with traps, a distinction that has been pointed out in
a lattice modeling context.92 Consequently, we
found that folding with physical db's is more
cooperative, slower, and more channeled than that
stipulated by no-db modeling.
In our models, the slowing of folding rate by
db's as well as the concomitant enhancement of
folding cooperativity is seen as mainly a transitionstate conformational entropic effect. Broadly
speaking, this effect is more prominent for proteins
with more complex native topologies. The correlation between simulated and experimental folding
rates is fair for the set of proteins studied.
Although the match between the range of simulated and experimental folding rates improves
dramatically with the incorporation of db's, the
degrees of correlation between simulated and
experimental folding rates are practically the
same, and are not very high, for our with-db
and no-db models. This means that much needs to
be learned about the relationship between the
energetics of db models and that of real proteins,
as well as the possible connections between our
with-db models and many-body interaction
schemes that have been invoked, with various
degrees of success, to rationalize rate–topology
correlation.38,39,52
In this respect, it should be noted that the
native-centric db model has recently been applied
productively to rationalize non-two-state protein
folding.67 The modeling approach also appears
capable, at least for two members of the peripheral
subunit-binding domain family with available
PDB structures, of capturing the rank order of
folding cooperativity of homologous proteins.
However, it may not always reproduce quantitatively the full divergence in folding rates among
homologues.52,114 Experiments have shown that
even a single mutation can significantly change
folding speed,13 and folding rates of proteins with
the same architecture, such as those for the
spectrin domains, can differ by more than 3 orders
of magnitude.115 Several circular permutants'116
nonconformity to the usual rate–topology correlation117 also raised questions as to the generality
of any simple theoretical treatment based on
native topology. To what degree native-centric
approaches such as the present db model can
rationalize these intriguing findings remains to be
ascertained.
These potential limitations of the present model
notwithstanding, the fact that a simple addition of
db's is sufficient to essentially reproduce the large
range of experimental folding rates in this study
suggests strongly that db effects are a main physical
origin of the remarkable diversity in the folding
rates of natural proteins. A further tantalizing
suggestion from our results is that once a protein
sequence is designed to specifically favor a folded
conformation (as in our native-centric models) that
has an appropriate topology,52,67,118 most of folding
cooperativity and folding rate diversity might
simply follow from the physics of desolvation.
This is an attractive prospect that deserves further
investigation.
Acknowledgements
We thank Artem Badasyan, Justin MacCallum,
Cathy Royer, Peter Tieleman, and Stefan Wallin for
helpful discussions. A.F. is a postdoctoral trainee of
the Canadian Institutes of Health Research (CIHR)
Training Program in “Protein Folding: Principles
and Diseases” at the University of Toronto and
thanks the Program for stipend support. We thank
also CIHR (grant MOP-84281 to H.S.C.) and the
Canada Research Chairs Program for funding this
research.
References
1. Levitt, M. & Warshel, A. (1975). Computer simulation
of protein folding. Nature, 253, 694–698.
2. Taketomi, H., Ueda, Y. & Gō, N. (1975). Studies on
protein folding, unfolding and fluctuations by
computer simulation. 1. The effect of specific amino
acid sequence represented by specific inter-unit
interactions. Int. J. Pept. Protein Res. 7, 445–459.
3. Bryngelson, J. D., Onuchic, J. N., Socci, N. D. &
Wolynes, P. G. (1995). Funnels, pathways, and the
energy landscape of protein folding: a synthesis.
Proteins: Struct. Funct. Genet. 21, 167–195.
4. Dill, K. A., Bromberg, S., Yue, K., Fiebig, K. M., Yee,
D. P., Thomas, P. D. & Chan, H. S. (1995). Principles of
protein folding—a perspective from simple exact
models. Protein Sci. 4, 561–602.
5. Thirumalai, D. & Woodson, S. A. (1996). Kinetics of
folding of proteins and RNA. Acc. Chem. Res. 29,
433–439.
6. Dill, K. A. & Chan, H. S. (1997). From Levinthal to
pathways to funnels. Nat. Struct. Biol. 4, 10–19.
7. Mirny, L. & Shakhnovich, E. (2001). Protein folding
theory: from lattice to all-atom models. Annu. Rev.
Biophys. Biomol. Struct. 30, 361–396.
8. Chan, H. S., Shimizu, S. & Kaya, H. (2004).
Cooperativity principles in protein folding. Methods
Enzymol. 380, 350–379.
9. Onuchic, J. N. & Wolynes, P. G. (2004). Theory of
protein folding. Curr. Opin. Struct. Biol. 14, 70–75.
10. Wallin, S. & Chan, H. S. (2005). A critical assessment
of the topomer search model of protein folding using
a continuum explicit-chain model with extensive
conformational sampling. Protein Sci. 14, 1643–1660.
11. Shakhnovich, E. (2006). Protein folding thermodynamics and dynamics: where physics, chemistry, and
biology meet. Chem. Rev. 106, 1559–1588.
12. Kim, D. E., Gu, H. & Baker, D. (1998). The sequences
of small proteins are not extensively optimized for
rapid folding by natural selection. Proc. Natl Acad.
Sci. USA, 95, 4982–4986.
13. Northey, J. G. B., Di Nardo, A. A. & Davidson, A. R.
(2002). Hydrophobic core packing in the SH3 domain
folding transition state. Nat. Struct. Biol. 9, 126–130.
14. Jackson, S. E. & Fersht, A. R. (1991). Folding of
chymotrypsin inhibitor 2. 1. Evidence for a two-state
transition. Biochemistry, 30, 10428–10435.
15. Sosnick, T. R., Mayne, L., Hiller, R. & Englander, S. W.
(1994). The barriers in protein folding. Nat. Struct.
Biol. 1, 149–156.
16. Jacob, J., Krantz, B., Dothager, R. S., Thiyagarajan, P.
& Sosnick, T. R. (2004). Early collapse is not an
obligate step in protein folding. J. Mol. Biol. 338,
369–382.
17. Matthews, C. R. & Hurle, M. R. (1987). Mutant
sequences as probes of protein folding mechanisms.
BioEssays, 6, 254–257.
18. Kuwajima, K. (1989). The molten globule state as a
clue for understanding the folding and cooperativity
of globular protein structure. Proteins: Struct. Funct.
Genet. 6, 87–103.
19. Kim, P. S. & Baldwin, R. L. (1990). Intermediates in
the folding reactions of small proteins. Annu. Rev.
Biochem. 59, 631–660.
20. Matthews, C. R. (1993). Pathways of protein folding.
Annu. Rev. Biochem. 62, 653–683.
21. Sadqi, M., Fushman, D. & Muñoz, V. (2006). Atom-by-atom analysis of global downhill protein folding.
Nature, 442, 317–321.
22. Liu, F. & Gruebele, M. (2007). Tuning λ6–85 towards
downhill folding at its melting temperature. J. Mol.
Biol. 370, 574–584.
23. Jackson, S. E. (1998). How do small single-domain
proteins fold? Folding Des. 3, R81–R91.
24. Baker, D. (2000). A surprising simplicity to protein
folding. Nature, 405, 39–42.
25. Barrick, D. (2009). What have we learned from the
studies of two-state folders, and what are the
unanswered questions about two-state protein folding? Phys. Biol. 6, 015001.
26. Plaxco, K. W., Simons, K. T. & Baker, D. (1998).
Contact order, transition state placement and the
refolding rates of single domain proteins. J. Mol. Biol.
277, 985–994.
27. Wallin, S. & Chan, H. S. (2006). Conformational
entropic barriers in topology-dependent protein folding: perspectives from a simple native-centric polymer
model. J. Phys.: Condens. Matter, 18, S307–S328.
28. Chan, H. S. (1998). Matching speed and locality.
Nature, 392, 761–763.
29. Gō, N. & Taketomi, H. (1978). Respective roles of
short- and long-range interactions in protein folding.
Proc. Natl Acad. Sci. USA, 75, 559–563.
30. Alm, E. & Baker, D. (1999). Prediction of protein-folding mechanisms from free-energy landscapes
derived from native structures. Proc. Natl Acad. Sci.
USA, 96, 11305–11310.
31. Muñoz, V. & Eaton, W. A. (1999). A simple model for
calculating the kinetics of protein folding from three-dimensional structures. Proc. Natl Acad. Sci. USA, 96,
11311–11316.
32. Chan, H. S. (2000). Modeling protein density of
states: additive hydrophobic effects are insufficient
for calorimetric two-state cooperativity. Proteins:
Struct. Funct. Genet. 40, 543–571.
33. Karanicolas, J. & Brooks, C. L. (2003). The importance
of explicit chain representation in protein folding
models: an examination of Ising-like models. Proteins: Struct. Funct. Genet. 53, 740–747.
34. Micheletti, C., Banavar, J. R., Maritan, A. & Seno, F.
(1999). Protein structures and optimal folding from a
geometrical variational principle. Phys. Rev. Lett. 82,
3372–3375.
35. Shea, J.-E., Onuchic, J. N. & Brooks, C. L. (1999).
Exploring the origins of topological frustration:
design of a minimally frustrated model of fragment
B of protein A. Proc. Natl Acad. Sci. USA, 96,
12512–12517.
36. Clementi, C., Nymeyer, H. & Onuchic, J. N. (2000).
Topological and energetic factors: what determines
the structural details of the transition state ensemble
and “en-route” intermediates for protein folding? An
investigation for small globular proteins. J. Mol. Biol.
298, 937–953.
37. Koga, N. & Takada, S. (2001). Roles of native
topology and chain-length scaling in protein folding:
a simulation study with a Gō-like model. J. Mol. Biol.
313, 171–180.
38. Jewett, A. I., Pande, V. S. & Plaxco, K. W. (2003).
Cooperativity, smooth energy landscapes and the
origins of topology-dependent protein folding rates.
J. Mol. Biol. 326, 247–253.
39. Kaya, H. & Chan, H. S. (2003). Contact order
dependent protein folding rates: kinetic consequences of a cooperative interplay between favorable nonlocal interactions and local conformational
preferences. Proteins: Struct. Funct. Genet. 52,
524–533.
40. Kaya, H. & Chan, H. S. (2000). Polymer principles of
protein calorimetric two-state cooperativity. Proteins:
Struct. Funct. Genet. 40, 637–661; [Erratum: 43, 523
(2001)].
41. Kaya, H. & Chan, H. S. (2003). Solvation effects and
driving forces for protein thermodynamic and kinetic
cooperativity: how adequate is native-centric topological modeling? J. Mol. Biol. 326, 911–931; [Corrigendum: 337, 1069–1070 (2004)].
42. Ghosh, K. & Dill, K. A. (2009). Theory for protein
folding cooperativity: helix bundles. J. Am. Chem. Soc.
131, 2306–2312.
43. Kaya, H. & Chan, H. S. (2005). Explicit-chain model
of native-state hydrogen exchange: implications for
event ordering and cooperativity in protein folding.
Proteins: Struct. Funct. Bioinf. 58, 31–44.
44. Ejtehadi, M. R., Avall, S. P. & Plotkin, S. S. (2004).
Three-body interactions improve the prediction of
rate and mechanism in protein folding models. Proc.
Natl Acad. Sci. USA, 101, 15088–15093.
45. Wang, J., Lee, C. & Stell, G. (2005). The cooperative
nature of hydrophobic forces and protein folding
kinetics. Chem. Phys. 316, 53–60.
46. Qi, X. & Portman, J. J. (2007). Excluded volume, local
structural cooperativity, and the polymer physics of
protein folding rates. Proc. Natl Acad. Sci. USA, 104,
10841–10846.
47. Abkevich, V. I., Gutin, A. M. & Shakhnovich, E. I.
(1995). Impact of local and nonlocal interactions on
thermodynamics and kinetics of protein folding.
J. Mol. Biol. 252, 460–471.
48. Chan, H. S. (1999). Folding alphabets. Nat. Struct.
Biol. 6, 994–996.
49. Chan, H. S., Kaya, H. & Shimizu, S. (2002).
Computational methods for protein folding: scaling
a hierarchy of complexities. In Current Topics in
Computational Molecular Biology (Jiang, T., Xu, Y. &
Zhang, M. Q., eds), pp. 403–447, The MIT Press,
Cambridge, MA; chapt. 16.
50. Kaya, H. & Chan, H. S. (2003). Origins of chevron
rollovers in non-two-state protein folding kinetics.
Phys. Rev. Lett. 90, 258104.
51. Zhou, Y., Zhang, C., Stell, G. & Wang, J. (2003).
Temperature dependence of the distribution of the
first passage time: results from discontinuous molecular dynamics simulations of an all-atom model of
the second β-hairpin fragment of protein G. J. Am.
Chem. Soc. 125, 6300–6305.
52. Badasyan, A., Liu, Z. & Chan, H. S. (2008). Probing
possible downhill folding: native contact topology
likely places a significant constraint on the folding
cooperativity of proteins with ∼40 residues. J. Mol.
Biol. 384, 512–530.
53. Rank, J. A. & Baker, D. (1997). A desolvation barrier
to hydrophobic cluster formation may contribute to
the rate-limiting step in protein folding. Protein Sci. 6,
347–354.
54. Cheung, M. S., García, A. E. & Onuchic, J. N. (2002).
Protein folding mediated by solvation: water expulsion and formation of the hydrophobic core occur
after the structural collapse. Proc. Natl Acad. Sci. USA,
99, 685–690.
55. Karanicolas, J. & Brooks, C. L. (2002). The origins
of asymmetry in the folding transition states of
protein L and protein G. Protein Sci. 11,
2351–2361.
56. Sessions, R. B., Thomas, G. L. & Parker, M. J. (2004).
Water as a conformational editor in protein folding. J.
Mol. Biol. 343, 1125–1133.
57. Kaya, H., Liu, Z. & Chan, H. S. (2005). Chevron
behavior and isostable enthalpic barriers in protein
folding: successes and limitations of simple Gō-like
modeling. Biophys. J. 89, 520–535.
58. Liu, Z. & Chan, H. S. (2005). Desolvation is a likely
origin of robust enthalpic barriers to protein folding.
J. Mol. Biol. 349, 872–889.
59. Liu, Z. & Chan, H. S. (2005). Solvation and desolvation effects in protein folding: native flexibility,
kinetic cooperativity, and enthalpic barriers under
isostability conditions. Phys. Biol. 2, S75–S85.
60. Ferguson, A., Liu, Z. & Chan, H. S. (2007). Desolvation effects and topology-dependent protein folding.
2007 American Physical Society March Meeting
Abstract BAPS.2007.MAR.D26.3. http://meetings.
aps.org/link/BAPS.2007.MAR.D26.3.
61. Capaldi, A. P., Kleanthous, C. & Radford, S. E. (2002).
Im7 folding mechanism: misfolding on a path to the
native state. Nat. Struct. Biol. 9, 209–216.
62. Viguera, A. R., Vega, C. & Serrano, L. (2002).
Unspecific hydrophobic stabilization of folding
transition states. Proc. Natl Acad. Sci. USA, 99,
5349–5354.
63. Feng, H., Takei, J., Lipsitz, R., Tjandra, N. & Bai, Y.
(2003). Specific non-native hydrophobic interactions
in a hidden folding intermediate: implications for
protein folding. Biochemistry, 42, 12461–12465.
64. Cho, J. H., Sato, S. & Raleigh, D. P. (2004).
Thermodynamics and kinetics of non-native interactions in protein folding: a single point mutant
significantly stabilizes the N-terminal domain of L9
by modulating non-native interactions in the denatured state. J. Mol. Biol. 338, 827–837.
65. Gu, Z., Rao, M. K., Forsyth, W. R., Finke, J. M. &
Matthews, C. R. (2007). Structural analysis of kinetic
folding intermediates for a TIM barrel protein,
indole-3-glycerol phosphate synthase, by hydrogen
exchange mass spectrometry and Gō model simulation. J. Mol. Biol. 374, 528–546.
66. Zarrine-Afsar, A., Wallin, S., Neculai, A. M., Neudecker, P., Howell, P. L., Davidson, A. R. & Chan, H.
S. (2008). Theoretical and experimental demonstration of the importance of specific nonnative interactions in protein folding. Proc. Natl Acad. Sci. USA,
105, 9999–10004.
67. Zhang, Z. & Chan, H. S. (2009). Native topology of
the designed protein Top7 is not conducive to
cooperative folding. Biophys. J. 96, L25–L27.
68. Chan, H. S. & Zhang, Z. (2009). Liaison amid
disorder: non-native interactions may underpin
long-range coupling in proteins. J. Biol. 8, 27.
69. Sheinerman, F. B. & Brooks, C. L. (1998). Molecular
picture of folding of a small α/β protein. Proc. Natl
Acad. Sci. USA, 95, 1562–1567.
70. Rhee, Y. M., Sorin, E. J., Jayachandran, G., Lindahl, E.
& Pande, V. S. (2004). Simulations of the role of water
in the protein-folding mechanism. Proc. Natl Acad.
Sci. USA, 101, 6456–6461.
71. Shimizu, S. & Chan, H. S. (2002). Anti-cooperativity
and cooperativity in hydrophobic interactions: three-body free energy landscapes and comparison with
implicit-solvent potential functions for proteins.
Proteins: Struct. Funct. Genet. 48, 15–30; [Erratum:
49, 294 (2002)].
72. Shimizu, S. & Chan, H. S. (2002). Origins of protein
denatured state compactness and hydrophobic clustering in aqueous urea: inferences from nonpolar
potentials of mean force. Proteins: Struct. Funct. Genet.
49, 560–566.
73. Moghaddam, M. S., Shimizu, S. & Chan, H. S. (2005).
Temperature dependence of three-body hydrophobic
interactions: potential of mean force, enthalpy,
entropy, heat capacity, and nonadditivity. J. Am.
Chem. Soc. 127, 303–316; [Correction: 127, 2363
(2005)].
74. Levy, Y. & Onuchic, J. N. (2006). Water mediation in
protein folding and molecular recognition. Annu.
Rev. Biophys. Biomol. Struct. 35, 389–415.
75. Best, R. B. & Hummer, G. (2008). Protein folding
kinetics under force from molecular simulation. J. Am.
Chem. Soc. 130, 3706–3707.
76. Rodríguez-Larrea, D., Minning, S., Borchert, T. V. &
Sanchez-Ruiz, J. M. (2006). Role of solvation barriers
in protein kinetic stability. J. Mol. Biol. 360, 715–724.
77. Costas, M., Rodríguez-Larrea, D., De Maria, L.,
Borchert, T. V., Gómez-Puyou, A. & Sanchez-Ruiz,
J. M. (2009). Between-species variation in the kinetic
stability of TIM proteins linked to solvation-barrier
free energies. J. Mol. Biol. 385, 924–937.
78. MacCallum, J. L., Sabaye Moghaddam, M., Chan, H. S.
& Tieleman, D. P. (2007). Hydrophobic association of
α-helices, steric dewetting and enthalpic barriers to
protein folding. Proc. Natl Acad. Sci. USA, 104,
6206–6210; [Correction: 105, 19561 (2008)].
79. Pratt, L. R. & Chandler, D. (1977). Theory of the
hydrophobic effect. J. Chem. Phys. 67, 3683–3704.
80. Shimizu, S. & Chan, H. S. (2000). Temperature
dependence of hydrophobic interactions: a mean
force perspective, effects of water density, and nonadditivity of thermodynamic signatures. J. Chem.
Phys. 113, 4683–4700; [Erratum: 116, 8636 (2002)].
81. Chen, B.-L., Baase, W. A. & Schellman, J. A. (1989).
Low-temperature unfolding of a mutant of phage-T4
lysozyme. 2. Kinetic investigations. Biochemistry, 28,
691–699.
82. Oliveberg, M., Tan, Y.-J. & Fersht, A. R. (1995).
Negative activation enthalpies in the kinetics of
protein folding. Proc. Natl Acad. Sci. USA, 92,
8926–8929.
83. Scalley, M. L. & Baker, D. (1997). Protein folding
kinetics exhibit an Arrhenius temperature dependence when corrected for the temperature dependence of protein stability. Proc. Natl Acad. Sci. USA,
94, 10636–10640.
84. Chalikian, T. V. (2003). Volumetric properties of
proteins. Annu. Rev. Biophys. Biomol. Struct. 32,
207–235.
85. Mitra, L., Hata, K., Kono, R., Maeno, A., Isom, D.,
Rouget, J.-B. et al. (2007). Vi-value analysis: a
pressure-based method for mapping the folding
transition state ensemble of proteins. J. Am. Chem.
Soc. 129, 14108–14109.
86. Veitshans, T., Klimov, D. & Thirumalai, D. (1997).
Protein folding kinetics: timescales, pathways and
energy landscapes in terms of sequence-dependent
properties. Folding Des. 2, 1–22.
87. Valleau, J. P. & Torrie, G. M. (1977). A guide to
Monte Carlo for statistical mechanics: 2. Byways. In
Statistical Mechanics, Part A: Equilibrium Techniques
(Berne, B. J., ed.), pp. 169–194, Plenum Press,
New York; chapt. 5.
88. Beveridge, D. L. & DiCapua, F. M. (1989). Free-energy via molecular simulation – Applications to
chemical and biomolecular systems. Annu. Rev.
Biophys. Biophys. Chem. 18, 431–492.
89. Voter, A. F. (1997). Hyperdynamics: accelerated
molecular dynamics of infrequent events. Phys. Rev.
Lett. 78, 3908–3911.
90. Fersht, A. R., Matouschek, A. & Serrano, L. (1992).
The folding of an enzyme. I. Theory of protein
engineering analysis of stability and pathway of
protein folding. J. Mol. Biol. 224, 771–782.
91. Bilsel, O. & Matthews, C. R. (2000). Barriers in protein
folding reactions. Adv. Protein Chem. 53, 153–207.
92. Chan, H. S. & Dill, K. A. (1998). Protein folding in the
landscape perspective: chevron plots and non-Arrhenius kinetics. Proteins: Struct. Funct. Genet. 30, 2–33.
93. Portman, J. J., Takada, S. & Wolynes, P. G. (2001).
Microscopic theory of protein folding rates. II. Local
reaction coordinates and chain dynamics. J. Chem.
Phys. 114, 5082–5096.
94. Kaya, H. & Chan, H. S. (2002). Towards a consistent
modeling of protein thermodynamic and kinetic
cooperativity: how applicable is the transition state
picture to folding and unfolding? J. Mol. Biol. 315,
899–909.
95. Coutsias, E. A., Seok, C. & Dill, K. A. (2004). Using
quaternions to calculate RMSD. J. Comput. Chem. 25,
1849–1857.
96. Best, R. B. & Hummer, G. (2006). Diffusive model of
protein folding dynamics with Kramers turnover in
rate. Phys. Rev. Lett. 96, 228104.
97. Chahine, J., Oliveira, R. J., Leite, V. B. P. & Wang, J.
(2007). Configuration-dependent diffusion can shift
the kinetic transition state and barrier height of protein
folding. Proc. Natl Acad. Sci. USA, 104, 14646–14651.
98. Jacob, M. & Schmid, F. X. (1999). Protein folding as a
diffusional process. Biochemistry, 38, 13773–13779.
99. Kaya, H. & Chan, H. S. (2003). Simple two-state
protein folding kinetics requires near-Levinthal thermodynamic cooperativity. Proteins: Struct. Funct.
Genet. 52, 510–523.
100. Chavez, L. L., Onuchic, J. N. & Clementi, C. (2004).
Quantifying the roughness on the free energy landscape: entropic bottlenecks and protein folding rates.
J. Am. Chem. Soc. 126, 8426–8432.
101. Hammond, G. S. (1955). A correlation of reaction
rates. J. Am. Chem. Soc. 77, 334–338.
102. Plotkin, S. S. & Onuchic, J. N. (2002). Structural and
energetic heterogeneity in protein folding. I. Theory.
J. Chem. Phys. 116, 5263–5283.
103. Bai, Y., Zhou, H. & Zhou, Y. (2004). Critical nucleation size in the folding of small apparently two-state
proteins. Protein Sci. 13, 1173–1181.
104. Weikl, T. R. (2008). Loop-closure principles in protein
folding. Arch. Biochem. Biophys. 469, 67–75.
105. Gromiha, M. M. & Selvaraj, S. (2001). Comparison
between long-range interactions and contact order in
determining the folding rate of two-state proteins:
application of long-range order to folding rate
prediction. J. Mol. Biol. 310, 27–32.
106. Zhou, H. & Zhou, Y. (2002). Folding rate prediction
using total contact distance. Biophys. J. 82, 458–463.
107. Ivankov, D. N., Garbuzynskiy, S. O., Alm, E., Plaxco,
K. W., Baker, D. & Finkelstein, A. V. (2003). Contact
order revisited: influence of protein size on the
folding rate. Protein Sci. 12, 2057–2062.
108. Micheletti, C. (2003). Prediction of folding rates and
transition-state placement from native-state geometry. Proteins: Struct. Funct. Genet. 51, 74–84.
109. Gong, H., Isom, D. G., Srinivasan, R. & Rose, G. D.
(2003). Local secondary structure content predicts
folding rates for simple, two-state proteins. J. Mol.
Biol. 327, 1149–1154.
110. Pandit, A. D., Jha, A., Freed, K. F. & Sosnick, T. R.
(2006). Small proteins fold through transition states
with native-like topologies. J. Mol. Biol. 361, 755–770.
111. Baxa, M. C., Freed, K. F. & Sosnick, T. R. (2008).
Quantifying the structural requirements of the
folding transition state of protein A and other
systems. J. Mol. Biol. 381, 1362–1381.
112. Paci, E., Lindorff-Larsen, K., Dobson, C. M., Karplus,
M. & Vendruscolo, M. (2005). Transition state contact
orders correlate with protein folding rates. J. Mol.
Biol. 352, 495–500.
113. Hubner, I. A., Shimada, J. & Shakhnovich, E. I. (2004).
Commitment and nucleation in the protein G
transition state. J. Mol. Biol. 336, 745–761.
114. Badasyan, A., Liu, Z. & Chan, H. S. (2009).
Interplaying roles of native topology and chain
length in marginally cooperative and noncooperative folding of small protein fragments. Int. J.
Quantum Chem. In press. doi:10.1002/qua.22272.
115. Scott, K. A., Batey, S., Hooton, K. A. & Clarke, J.
(2004). The folding of spectrin domains I: wild-type domains have the same stability but very
different kinetic properties. J. Mol. Biol. 344,
195–205.
116. Lindberg, M., Tangrot, J. & Oliveberg, M. (2002).
Complete change of the protein folding transition
state upon circular permutation. Nat. Struct. Biol. 9,
818–822.
117. Miller, E. J., Fischer, K. F. & Marqusee, S. (2002).
Experimental evaluation of topological parameters
determining protein-folding rates. Proc. Natl Acad.
Sci. USA, 99, 10359–10363.
118. Cho, S. S., Weinkam, P. & Wolynes, P. G. (2008).
Origins of barriers and barrierless folding in BBL.
Proc. Natl Acad. Sci. USA, 105, 118–123.