Chapter 6
Random Walks and the
Structure of
Macromolecules
“The journey of a thousand miles begins with a single step.” - Chinese proverb
(RP: check)
6.1
What is a Structure: PDB or $R_G$?
We have already belabored the idea that the study of structure is often a prerequisite to being able to tackle the more interesting question of the functional
dynamics of a particular macromolecule or macromolecular assembly. Indeed,
this notion of the relation between structure and function has been elevated to
the status of the true central dogma of molecular biology, namely, “sequence
determines structure determines function” (Petsko and Ringe, 2004), wherein
our mission is to uncover the relation between sequence and consequence. We
have also already noted that the idea of structure is hierarchical and subtle, with
the relevant detail that is needed to uncover function often living at totally disparate spatial scales. For example, in thinking about phosphorylation-induced
conformational changes, an atom-by-atom description is required, whereas in
thinking about cell division, a much coarser description of DNA is likely more
useful. The key message of the present chapter is that there is much to be
gained in some circumstances by abandoning the deterministic, pdb mentality
described in earlier chapters for a statistical description in which we attempt
only to characterize certain average properties of the structure. Indeed, we will
argue that this type of thinking permits immediate and potent contact with a
range of experiments.
To elaborate, we continue with an aside on the concept of structure itself
with the aim of probing the subtlety that attends this idea. The use of the word
structure is in the end an attempt to lay order on top of the geometric disposition
of huge collections of atoms. When discussing the structure of a bridge such as
the Golden Gate Bridge we certainly make no mention of the underlying atomic
coordinates. Rather, in the search for a reduced description, we make reference
to the dimensions of overall objects which are treated as though they are of a
perfect rigidity and hence once the dimensions, the center of mass coordinates
and the orientation are specified, we have said all there is to say. At the opposite
extreme, when describing a crystal such as those that make up the
microprocessors on which these pages are being written, we once again bypass
an atom-by-atom reckoning, this time by exploiting the perfect monotony of
the underlying crystal lattice. There are other cases yet! When contemplating
the chemically complex and atomistically enormous molecules that make up the
living world, the notion of structure reveals other facets. In particular, we find
that a statistical description of the structure becomes entirely appropriate. That
is, rather than attempting to state the precise location of each and every atom
that makes up the macromolecule, we settle for a description of the structure
which specifies its compactness.
6.1.1
Deterministic vs Statistical Descriptions of Structure
PDB Files Reflect a Deterministic Description of Macromolecular
Structure
The notion of structure is complex and ambiguous. In the context of crystals,
we can think of structure at the level of the monotonous regular packing of the
atoms into the unit cells of which the crystal is built. This thinking applies even
to crystals of nucleic acids, proteins or complexes such as ribosomes, viruses and
RNA polymerase. Indeed, it is precisely this regularity that makes it possible
to deposit huge pdb files containing atomic coordinates in databases such as
the Protein Data Bank and VIPER. In this world view, a structure is the set
$(\mathbf{r}_1, \mathbf{r}_2, \cdots, \mathbf{r}_N)$, where $\mathbf{r}_i = (x_i, y_i, z_i)$ is the position vector of the $i$th atom
in this $N$-atom molecule. However, the structural descriptions that emerge
from x-ray crystallography provide a deceptively static picture which can only
be viewed as a starting point for thinking about the functional dynamics of
macromolecules and their complexes in the crowded innards of a cell.
Statistical Descriptions of Structure Emphasize Average Size and
Shape Rather Than Atomic Coordinates
As noted above, in the context of polymeric systems, the notion of structure
is subtle and brings us immediately to the question of the relative importance of
universality and specificity in macromolecules. In particular, there are certain
things that we might wish to say about the structure of polymeric systems that
are indifferent to the precise chemical details of these systems. For example,
when a DNA molecule is ejected from a bacteriophage into a bacterial cell,
all that we may really care to say about the disposition of that molecule is
how much space it takes up and where within the cell it does so. Similarly,
in describing the geometric character of a bacterial genome, it may suffice to
provide a description of structure only at the level of characterizing a blob of a
given size and shape. Indeed, these considerations bring us immediately to the
examination of statistical measures of structure. As hinted at in the title to this
section, one such statistical measure of structure is provided by the radius of
gyration, $R_G$, which, roughly speaking, gives a measure of the size of a polymer
blob. It is the business of the remainder of the chapter to hang quantitative
meat on this idea and to show the calculable consequences of adopting such a
description of structure.
6.2
Macromolecules as Random Walks
Recall that one of our primary missions in this book is to illustrate the ways in
which good models rear their heads under widely different circumstances. The
present chapter introduces one such model, namely, the random walk, which
is so important that it will see double duty. In the present setting, we argue
that the random walk and its cousins are just the models we need to provide
a statistical characterization of the structure of macromolecules. On the other
hand, in chap. ?? the random walk will once again rear its head, though in that
context with relevance to the diffusion of various molecules within cells.
Random Walk Models of Macromolecules View Them as Rigid Segments Connected by Hinges
As already seen in chap. 5, one way to characterize the geometric disposition of a macromolecule such as DNA is through the deterministic function
r(s). This function tells us the position (r) of that part of the polymer which
is a distance s along its contour. An alternative we will explore here is to discretize the polymer into a series of segments, each of length a, and to treat
each such segment as though it is rigid. The various segments that make up
the macromolecular chain are then imagined to be connected by flexible links
that permit the adjacent segments to point in various directions. Both one- and
three-dimensional versions of this idea are shown in fig. 6.1. Note that in the
figure, we illustrate the case in which the links are restricted to 90 degree angles,
though there are many instances in which we will consider links that can rotate
in arbitrary directions (the so-called freely jointed chain model).
As has already been noted, Crick’s “two great polymer languages” will form
the backdrop for much of what is to follow. In that spirit, figs. 6.2 and 6.3 show
the correspondence between the real structures of these molecules and their
idealization in terms of the lattice model of the random walk. In particular,
fig. 6.2 shows a conformation of DNA on a surface. Using the discretization
advocated above, we show how this same structure can be approximated using
a series of rigid rods (the Kuhn segments) connected by flexible hinges. We
will argue that this level of description can be useful in settings ranging from
estimating the entropic cost to confine DNA to the response of DNA when
Figure 6.1: Schematic representation of (A) a one-dimensional random walk and
(B) a three-dimensional random walk.
Figure 6.2: Structure of DNA on a surface as seen (a) experimentally using
atomic-force microscopy and (b) as represented as a random walk. (RP: use 2d
data- JK: put random walk on top of the experimental figure)
subjected to mechanical forces.
Fig. 6.3 has a similar mission to its partner in fig. 6.2. In this case, the lattice
model shows how the structure of the model protein lysozyme can be represented
by a series of segments (each segment corresponds to a single residue) which are
arranged in a compact, but convoluted structure. Indeed, part of the interest
that attaches to invoking random walk ideas in the context of proteins is the
mathematical care that needs to be taken to ensure that the continuous chain of
amino acids is packed compactly and that it doesn’t cross itself.
6.2.1
A Mathematical Stupor
Every Macromolecular Configuration Is Equally Probable When the
Polymer Is Viewed as a Random Walk
Figure 6.3: Structure of lysozyme as viewed from the standpoint of both a
ribbon diagram and from the point of view of a random walk.
In this section we work our way up by degrees to some of the full beauty and
depth of the random walk model. The aim of the analysis is to obtain a probability distribution for each and every possible macromolecular configuration
and to use these probabilities to compute features that can be observed such as
the mean size of the macromolecule and the energy required to force excursions
from this mean size. Our starting point will be a conventional analysis of the
random walk in one-dimension, with our discussion being guided by the ways in
which we will later both generalize these ideas and apply them in what might at
first be considered unexpected settings. (RP: want to make sure to do both lattice and continuum versions of all of this.) We begin by imagining a single random
walker confined to a one-dimensional lattice with lattice parameter $a$ as already shown in fig. 6.1(a). The life history of this walker is built up as a sequence of
left and right steps, with each step constituting a single segment in the polymer.
In addition, for now we postulate that the probabilities of left and right steps
are given as $p_r = p_l = 1/2$. The trajectory of the walker is built up by assuming
that at each step the walker starts anew with no concern for the orientation of
the previous segment. We note that for a chain with $N$ segments, this implies
that there are a total of $2^N$ different permissible macromolecular configurations,
each with probability $1/2^N$.
The Mean Size of a Random Walk Macromolecule Scales as the
Square Root of the Number of Segments, $\sqrt{N}$
Given the spectrum of possible configurations and their corresponding probabilities, one of the most immediate questions we can pose concerns the mean
distance of the walker from its point of departure as a function of the number
of segments in the chain. In the context of biology, this question is tied to problems such as the cyclization of DNA and amounts to asking for the probability
of a given end-to-end distance for the molecule of interest. The response to this
question can be obtained by recourse to both simple argumentation as well as
brute force calculation, and we will take up both of these options in turn. The
simple argument notes that the expected value of the walker’s distance from the
origin, $R$, after $N$ steps can be obtained as
$$\langle R \rangle = \left\langle \sum_{i=1}^{N} x_i \right\rangle, \tag{6.1}$$
where $x_i = \pm a$ is the excursion suffered by the walker during the $i$th step and
where we have introduced the bracket notation $\langle \cdot \rangle$ to signify an average. Recall
that to obtain such an average we sum over all possible configurations with each
configuration weighted by its probability (in this case they are all equal). This
result may be simplified by noting that the averaging operation represented by
the brackets $\langle \cdot \rangle$ on the righthand side of the equation can be passed within the
summation symbol and through the recognition that $\langle x_i \rangle = 0$. Indeed, this
leaves us with the conclusion that the mean excursion undertaken by the walker
is identically zero. A more useful measure of the walker’s departure from the
origin is
$$\langle R^2 \rangle = \left\langle \sum_{i=1}^{N} \sum_{j=1}^{N} x_i x_j \right\rangle. \tag{6.2}$$
This expression may be broken down into two parts as
$$\langle R^2 \rangle = \sum_{i=1}^{N} \langle x_i^2 \rangle + \sum_{\substack{i,j=1 \\ i \neq j}}^{N} \langle x_i x_j \rangle. \tag{6.3}$$
We note at this point that by virtue of the character of our model, namely, that
each and every step is uncorrelated with all steps that precede and follow it, the
second term on the righthand side is zero. In addition, we note that $\langle x_i^2 \rangle = a^2$,
with the result that
$$\langle R^2 \rangle = N a^2. \tag{6.4}$$
Thus, we have learned that the walker’s departure from the origin is characterized statistically by the assertion that $\sqrt{\langle R^2 \rangle} = a\sqrt{N}$, meaning that the
distance from the origin grows as the square root of the number of segments in
the chain.
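The averages in eqns. 6.1 and 6.4 can be checked by brute force for short chains. The following is a minimal sketch, assuming a hypothetical helper `walk_moments` (not part of the text) that enumerates every one of the $2^N$ equally weighted configurations and averages $R$ and $R^2$ over them.

```python
from itertools import product

def walk_moments(n_steps, a=1.0):
    """Return (<R>, <R^2>) by enumerating all 2**n_steps one-dimensional walks."""
    total_r = 0.0
    total_r2 = 0.0
    n_configs = 0
    for steps in product((-a, +a), repeat=n_steps):
        r = sum(steps)            # end-to-end distance of this configuration
        total_r += r
        total_r2 += r * r
        n_configs += 1
    # Every configuration carries the same weight 1 / 2**n_steps.
    return total_r / n_configs, total_r2 / n_configs

for n in (2, 6, 10):
    mean_r, mean_r2 = walk_moments(n)
    print(n, mean_r, mean_r2)     # <R^2> should equal n * a**2
```

Exhaustive enumeration is only practical for small $N$, since the number of configurations doubles with every added segment; its purpose here is to confirm the argument, not to replace it.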
The Probability of a Given Macromolecular Configuration Depends
Upon Its Microscopic Degeneracy
In addition to the simple argument spelled out above, it is also possible to
carry out a brute force analysis of this problem using the conventional machinery
of probability theory. We consider this an important alternative to the analysis
given above since it highlights the fact that there are many microscopic configurations that correspond to a given macroscopic configuration. In particular,
in the case in which the walker makes a total of N steps, we pose the question,
what is the probability that $n_r$ of those steps will be to the right (and hence
$n_l = N - n_r$ to the left)? Since the probability of each right or left step is given
by $p_r = p_l = 1/2$, the probability of a particular sequence of $N$ left and right
steps is given by $(1/2)^N$. On the other hand, we must remember that there are
many ways of realizing $n_r$ right steps and $n_l$ left steps out of a total of $N$ steps.
In particular, there are
$$\Omega(n_r; N) = \frac{N!}{n_r!\,(N - n_r)!} \tag{6.5}$$
distinct ways of achieving this outcome. An application of this thinking to
the case $N = 3$ is shown in fig. 6.4 where we see that there is one configuration
where all three segments are right pointing, one configuration in which all three
segments are left pointing and three configurations each for the cases in which
$n_r = 2$, $n_l = 1$ and $n_r = 1$, $n_l = 2$.
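The degeneracies for $N = 3$ can be reproduced by listing the configurations explicitly and comparing the tallies with eqn. 6.5. This is a small sketch; the function name `degeneracies` and the "L"/"R" labels are illustrative choices, not notation from the text.

```python
from collections import Counter
from itertools import product
from math import comb

def degeneracies(n_steps):
    """Count configurations by their number of right steps via explicit enumeration."""
    counts = Counter()
    for config in product("LR", repeat=n_steps):
        counts[config.count("R")] += 1
    return dict(counts)

enumerated = degeneracies(3)
formula = {n_r: comb(3, n_r) for n_r in range(4)}   # eqn. 6.5 for N = 3
print(enumerated)
print(enumerated == formula)
```

The same comparison can be run for any small `n_steps`; `math.comb(N, n_r)` evaluates the binomial coefficient $N!/(n_r!\,(N-n_r)!)$ directly.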
We have now enumerated the microscopic degeneracies of each macroscopic
configuration (characterized by a given end-to-end distance). As a result, we
are poised to write down the probability of an overall departure from the
origin characterized by $n_r$ right steps, which is given by
$$p(n_r; N) = \frac{N!}{n_r!\,(N - n_r)!} \left(\frac{1}{2}\right)^N. \tag{6.6}$$
With this probability distribution in hand, we can now evaluate any averages
characterizing the geometric disposition of the chain by summing over all of the
configurations.
To develop facility in the use of this probability distribution, we begin by
confirming that it is normalized. To do so, we ask for the outcome of the sum
$$\text{Normalization constant} = \sum_{n_r=0}^{N} p(n_r; N) = \sum_{n_r=0}^{N} \frac{N!}{n_r!\,(N - n_r)!} \left(\frac{1}{2}\right)^N. \tag{6.7}$$
To evaluate this sum, we recall the binomial theorem that tells us
$$(x + y)^N = \sum_{n_r=0}^{N} \frac{N!}{n_r!\,(N - n_r)!}\, x^{n_r} y^{N - n_r}. \tag{6.8}$$
For the case in which $x = y = 1$, we see that this implies
$$\sum_{n_r=0}^{N} \frac{N!}{n_r!\,(N - n_r)!} = 2^N. \tag{6.9}$$
Plugging this result back into eqn. 6.7 demonstrates that the probability distribution is indeed normalized.
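The normalization can also be confirmed with exact rational arithmetic, which avoids any rounding question. The sketch below assumes two hypothetical helpers, `p_right` (eqn. 6.6) and `normalization` (the sum in eqn. 6.7).

```python
from fractions import Fraction
from math import comb

def p_right(n_r, n_steps):
    """Probability of n_r right steps out of n_steps (eqn. 6.6), as an exact fraction."""
    return Fraction(comb(n_steps, n_r), 2 ** n_steps)

def normalization(n_steps):
    """Sum of p(n_r; N) over all n_r from 0 to N (eqn. 6.7)."""
    return sum(p_right(n_r, n_steps) for n_r in range(n_steps + 1))

for n in (3, 10, 100):
    print(n, normalization(n))
```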
Entropy Determines the Elastic Properties of Polymer Chains
The probability distribution for nr can be used to deduce a more telling
quantity, the probability distribution for the end-to-end distance, $R = (n_r - n_l)a$.
Figure 6.4: Cartoon revealing several conformations of a polymer and the corresponding degeneracies (panel labels: W = 1, W = 1, W = 3, W = 3).
Figure 6.5: End-to-end probability distribution for a one-dimensional “macromolecule” with 100 segments.
From $n_r + n_l = N$ it follows that $n_r = (N + R/a)/2$, and eqn. (6.6) can be rewritten
as
$$p(R; N) = \frac{N!}{\left(\frac{N + R/a}{2}\right)!\,\left(\frac{N - R/a}{2}\right)!} \left(\frac{1}{2}\right)^N, \tag{6.10}$$
to give the probability distribution of the end-to-end distance. This distribution
is plotted in fig. 6.5. For large $N$ this probability distribution is sharply peaked
at $R = 0$, taking on the form of a Gaussian distribution for $R \ll Na$. For
example, the DNA associated with an E. coli cell is roughly 5 million nucleotides
long, as discussed in detail below, and can be modeled as a random walk of some
$N = 15000$ steps. The probability that the end-to-end distance is zero for a one-dimensional walk of this many steps is $7 \times 10^{-3}$. The probability that $R = 500a$
is $2 \times 10^{-6}$ while for $R = 1000a$ the probability drops all the way down to
$2 \times 10^{-17}$. This overwhelming probability that $R$ is close to zero is responsible
for the elastic properties of polymer chains. Namely, if you imagine stretching
a polymer (say, the E. coli DNA) so that R is non-zero, then upon release it
will quickly find itself in the R ≈ 0 state solely by virtue of this being a much
more likely state. Note that this is not the result of any real physical force, such
as, for example, the electric force which is ultimately responsible for the elastic
properties of crystals, but purely a result of statistics. As such it is, like the
case of pressure of the ideal gas, another example of an entropic force.
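The three probabilities quoted for the $N = 15000$ walk follow directly from eqn. 6.10. The sketch below evaluates them exactly with integer arithmetic; `p_end_to_end` and `p_gaussian` are hypothetical names, and the explicit Gaussian form $\sqrt{2/(\pi N)}\, e^{-R^2/(2Na^2)}$ is the standard large-$N$ limit of the binomial distribution, which the text mentions but does not write out.

```python
from math import comb, exp, pi, sqrt

def p_end_to_end(r_over_a, n_steps):
    """Exact probability (eqn. 6.10) that the walk ends at R = r_over_a * a."""
    if (n_steps + r_over_a) % 2 != 0 or abs(r_over_a) > n_steps:
        return 0.0                  # parity or chain length makes this R unreachable
    n_r = (n_steps + r_over_a) // 2
    return comb(n_steps, n_r) / 2 ** n_steps    # ratio of big integers, then a float

def p_gaussian(r_over_a, n_steps):
    """Large-N Gaussian approximation; R/a changes in steps of 2, hence the factor 2."""
    return 2.0 / sqrt(2.0 * pi * n_steps) * exp(-r_over_a ** 2 / (2.0 * n_steps))

N = 15000
for r in (0, 500, 1000):
    print(r, p_end_to_end(r, N), p_gaussian(r, N))
```

Python integers have arbitrary precision, so the binomial coefficient is computed exactly before the final division; only the last step rounds to a floating-point number.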
We next interest ourselves in recovering some of the results obtained above
for the mean departure of the walker from his starting point. The mean excursion exercised by the walker is given by $\langle R \rangle = (\langle n_r \rangle - \langle n_l \rangle)a$, which by virtue of
the identity $n_l = N - n_r$ may be rewritten as $\langle R \rangle = (2\langle n_r \rangle - N)a$. Hence, we
need to exploit our probability distribution in order to compute the expectation
value $\langle n_r \rangle$, which we can already anticipate will be equal to $N/2$. To obtain
$\langle n_r \rangle$ we must evaluate
$$\langle n_r \rangle = \sum_{n_r=0}^{N} \frac{N!}{n_r!\,(N - n_r)!} \left(\frac{1}{2}\right)^N n_r. \tag{6.11}$$
To evaluate this sum we find it convenient to artificially introduce the probabilities $p_r$ and $p_l$ for right and left jumps, respectively. As a result, our expression
becomes
$$\langle n_r \rangle = \sum_{n_r=0}^{N} \frac{N!}{n_r!\,(N - n_r)!}\, p_r^{n_r} p_l^{N - n_r}\, n_r. \tag{6.12}$$
The advantage of this trick is that we can now express $\langle n_r \rangle$ simply by differentiating the binomial sum with respect to the parameter $p_r$. In particular, we have
$$\langle n_r \rangle = p_r \frac{\partial}{\partial p_r} \sum_{n_r=0}^{N} \frac{N!}{n_r!\,(N - n_r)!}\, p_r^{n_r} p_l^{N - n_r}, \tag{6.13}$$
which can be verified simply by carrying out the suggested differentiation.
However, we note that the quantity being differentiated is nothing more than
$(p_r + p_l)^N$, with the result that
$$\langle n_r \rangle = p_r N (p_r + p_l)^{N-1} = \frac{N}{2}, \tag{6.14}$$
where we have used the fact that $p_r + p_l = 1$ and that $p_r = 1/2$. The conclusion
of all of these machinations is that via brute force analysis we have recovered
what we knew already as a result of our simple qualitative reasoning. On the
other hand, the tools we have introduced here can serve us in those cases where
seat of the pants reasoning is impotent.
(RP: Higher dimensions)
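The differentiation trick of eqns. 6.12 to 6.14 can be verified numerically. In the sketch below (all function names are illustrative), $\langle n_r \rangle$ is computed once as the weighted sum of eqn. 6.12 and once as $p_r\,\partial/\partial p_r$ of the binomial sum, with the derivative approximated by a central finite difference at fixed $p_l$.

```python
from math import comb

def binomial_sum(p_r, p_l, n_steps):
    """The sum differentiated in eqn. 6.13; it equals (p_r + p_l)**n_steps."""
    return sum(comb(n_steps, k) * p_r ** k * p_l ** (n_steps - k)
               for k in range(n_steps + 1))

def mean_right_direct(p_r, p_l, n_steps):
    """<n_r> from the weighted sum of eqn. 6.12."""
    return sum(k * comb(n_steps, k) * p_r ** k * p_l ** (n_steps - k)
               for k in range(n_steps + 1))

def mean_right_derivative(p_r, p_l, n_steps, h=1e-6):
    """<n_r> as p_r * d/dp_r of the binomial sum (central difference, p_l held fixed)."""
    slope = (binomial_sum(p_r + h, p_l, n_steps)
             - binomial_sum(p_r - h, p_l, n_steps)) / (2.0 * h)
    return p_r * slope

N = 20
print(mean_right_direct(0.5, 0.5, N), mean_right_derivative(0.5, 0.5, N), N / 2)
```

Treating $p_r$ and $p_l$ as independent while differentiating, and imposing $p_r + p_l = 1$ only afterwards, is what makes the trick work; the finite difference mirrors that by varying `p_r` alone.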
Statistical Structure of Long Polymers is Described by Random Walks
With the random walk model in hand we make use of it to describe the
structure of long polymers, whose contour length $L$ is much larger than the
persistence length $\xi_p$. A good example is provided by genomic DNA of viruses
such as λ-phage with a contour length of 16.6 µm. This should be compared
to the persistence length $\xi_p \approx 50$ nm of DNA at room temperature and solvent
conditions typical of the cellular environment. Since the persistence length is
the length over which the tangent vectors to the polymer backbone become
uncorrelated, we can think of the polymer as consisting of $N \sim L/\xi_p$ connected
links which take random orientations with respect to each other. This is the
logic which gives rise to the freely jointed chain model (essentially the random
walk picture undertaken in the previous section).
As already described, in the freely jointed chain model polymer conformations
are random walks of $N$ steps. The length of the step is the Kuhn length which
is roughly equal to the persistence length already introduced in chap. 5. As
promised in the earlier discussion, we now establish the relation between the
persistence length and the Kuhn length invoked in the random walk model. To
make a more precise determination of the Kuhn length we calculate the mean-squared end-to-end distance of an elastic beam undergoing thermal fluctuations,
and compare it to the same quantity obtained for the freely jointed chain. The
end-to-end vector $\mathbf{R}$ of a beam can be expressed in terms of the tangent vector
$\mathbf{t}(s)$,
$$\mathbf{R} = \int_0^L ds\, \mathbf{t}(s). \tag{6.15}$$
Therefore
$$\langle R^2 \rangle = \left\langle \int_0^L ds\, \mathbf{t}(s) \cdot \int_0^L du\, \mathbf{t}(u) \right\rangle, \tag{6.16}$$
where $\langle \cdots \rangle$ is the thermal average. Using the result for the tangent-tangent correlation function from chap. ??, $\langle \mathbf{t}(s) \cdot \mathbf{t}(u) \rangle = \exp(-|s - u|/\xi_p)$, which provided
for the definition of the persistence length in the first place, we find
$$\langle R^2 \rangle = 2 \int_0^L ds \int_s^L du\, e^{-(u - s)/\xi_p}. \tag{6.17}$$
The above integral is obtained by splitting up the integration over the $L \times L$
box in $s$-$u$ space to integrals over the two triangles, one with $s < u$ and the
other with $s > u$, which give equal contributions (thus the factor of two). In
the limit $L \gg \xi_p$ we are considering here, we have
$$\langle R^2 \rangle \approx 2 \int_0^L ds \int_0^\infty dx\, e^{-x/\xi_p} = 2L\xi_p. \tag{6.18}$$
Comparing this to the result that follows from the random walk model, $\langle R^2 \rangle = Na^2 = aL$, we see that the Kuhn length $a$ is twice the persistence length. We are now
prepared to make estimates of the physical size of genomes in solution.
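The long-chain limit in eqn. 6.18 can be checked against a direct numerical evaluation of eqn. 6.17. The sketch below is one way to do so, using the λ-phage numbers quoted above; the function name and the choice of the midpoint rule are assumptions made for illustration.

```python
from math import exp

def mean_square_end_to_end(contour_length, xi_p, n_slices=20000):
    """Evaluate eqn. 6.17 numerically.

    The inner integral over u is done in closed form,
    int_s^L exp(-(u - s)/xi_p) du = xi_p * (1 - exp(-(L - s)/xi_p)),
    and the outer integral over s uses the midpoint rule.
    """
    ds = contour_length / n_slices
    total = 0.0
    for i in range(n_slices):
        s = (i + 0.5) * ds
        total += xi_p * (1.0 - exp(-(contour_length - s) / xi_p)) * ds
    return 2.0 * total

L_lambda = 16600.0   # nm, contour length of lambda-phage DNA quoted in the text
xi_p = 50.0          # nm, persistence length quoted in the text
r2 = mean_square_end_to_end(L_lambda, xi_p)
print(r2, 2.0 * L_lambda * xi_p)   # numerical result vs. the estimate 2 L xi_p
print(r2 / L_lambda)               # effective Kuhn length in nm, close to 2 xi_p
```

Carrying out both integrals exactly gives $2L\xi_p - 2\xi_p^2(1 - e^{-L/\xi_p})$, so the correction to $2L\xi_p$ is of relative size $\xi_p/L$ and is small for a chain this long.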
6.2.2
How Big is a Genome?
In previous sections we have demonstrated how the size of a polymer, when
viewed as a random walk, can be written in terms of key parameters such as
the persistence length $\xi_p$ and the number of Kuhn lengths making up the entire
contour. In particular, we deduced that the size of the polymer in solution may be
written as
$$\sqrt{\langle R^2 \rangle} = 2\xi_p \sqrt{N}. \tag{6.19}$$
This equation may be rewritten in terms of the polymer length once we recall
that the number of “monomers” (more correctly, the number of Kuhn lengths)
in the chain is given by $N = L/(2\xi_p)$. In light of this result, we then have
$$\sqrt{\langle R^2 \rangle} = \sqrt{2L\xi_p}. \tag{6.20}$$
A more useful result (proven by the reader in the problems at the end of the
chapter) is the relation between the radius of gyration and the number of links,
namely,
$$\sqrt{\langle R_G^2 \rangle} = \sqrt{\frac{L\xi_p}{3}}. \tag{6.21}$$
JK: Think about where we
will make contact between
the random walk and an elastic beam undergoing thermal
fluctuations.
Figure 6.6: Plot of the average size of a DNA molecule in solution as a function
of the number of base pairs using the random walk model.
Note that the radius of gyration is perhaps a more intuitive (and measurable)
measure of the size of a polymer in solution and is defined through the expression
$$\langle R_G^2 \rangle = \frac{1}{N} \sum_{i=1}^{N} \langle (\mathbf{R}_i - \mathbf{R}_{CM})^2 \rangle. \tag{6.22}$$
Recall also that the center of mass can be defined as
$$\mathbf{R}_{CM} = \frac{1}{N} \sum_{i=1}^{N} \mathbf{R}_i. \tag{6.23}$$
We may write eqn. 6.21 in an alternative form in terms of the number of
base pairs in the genome of interest by noting that $L \approx 0.34\,N_{bp}$ nm, and hence, with $\xi_p$ expressed in nm,
$$\sqrt{\langle R_G^2 \rangle} \approx 0.3 \sqrt{N_{bp}\, \xi_p}\ \text{nm}. \tag{6.24}$$
This relation between the radius of gyration of DNA in solution and the number
of base pairs is plotted in fig. 6.6.
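The radius of gyration defined in eqn. 6.22 can be estimated by simulation and compared with eqn. 6.21, which in terms of $N$ Kuhn segments of length $a = 2\xi_p$ reads $\langle R_G^2 \rangle = Na^2/6$. The following Monte Carlo sketch makes several assumptions for illustration: a freely jointed chain in three dimensions, the $N + 1$ joint positions used as the points $\mathbf{R}_i$, and a fixed random seed so that runs are repeatable.

```python
import math
import random

def random_unit_vector(rng):
    """Draw a direction uniformly on the unit sphere."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    rho = math.sqrt(1.0 - z * z)
    return rho * math.cos(phi), rho * math.sin(phi), z

def radius_of_gyration_sq(n_segments, a, rng):
    """R_G^2 of one freely jointed chain, from its n_segments + 1 joint positions."""
    x = y = z = 0.0
    points = [(x, y, z)]
    for _ in range(n_segments):
        dx, dy, dz = random_unit_vector(rng)
        x, y, z = x + a * dx, y + a * dy, z + a * dz
        points.append((x, y, z))
    n = len(points)
    cx = sum(p[0] for p in points) / n     # center of mass, eqn. 6.23
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    return sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 + (p[2] - cz) ** 2
               for p in points) / n

def mean_rg_sq(n_segments, a=1.0, n_chains=2000, seed=0):
    """Average R_G^2 over n_chains independent chains."""
    rng = random.Random(seed)
    return sum(radius_of_gyration_sq(n_segments, a, rng)
               for _ in range(n_chains)) / n_chains

N = 100
print(mean_rg_sq(N), N / 6.0)   # simulated <R_G^2> vs. N a^2 / 6 with a = 1
```

With a couple of thousand chains the statistical scatter is at the percent level, which is enough to see the agreement; finite-$N$ corrections to $Na^2/6$ are of the same order for $N = 100$.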
One application of ideas like those described above in the setting of biological electron microscopy is to images of viruses and cells that have ruptured and
are thus surrounded by the DNA debris from their genome. Two examples of
this phenomenon are shown in figs. 6.7 and 6.8 where it is seen that the DNA
adopts a configuration in solution which is much larger than the configuration
it has when packed inside of the virus or bacterium. We explore the energetic
implications of these pictures in problem ?? at the end of this chapter. To develop intuition for what is seen in such images, we exploit eqn. 6.20 to formulate
an estimate of the size of the DNA. Consider fig. 6.7 which shows bacteriophage
T2, similar to those described in section ?? with reference to DNA packaging.
Figure 6.7: Illustration of the extent of a viral genome which has escaped its
capsid.
As seen in the figure, the viral genome has leaked from what is apparently a
ruptured capsid and for the present purposes, we will assume that this DNA
in solution has adopted an equilibrium configuration. The genomes of T2 and
T4 are very similar with a genome length of roughly 150 kbp. For a genome of
length $L = N_{bp} \times 3.4$ Å $\approx 510{,}000$ Å and recalling that the persistence length is
$\xi_p \approx 500$ Å, eqn. 6.20 tells us that the mean size of the DNA seen in fig. 6.7 is
$\sqrt{\langle R^2 \rangle} = \sqrt{2 \times 500 \times 510 \times 10^3}$ Å $\approx 2$ µm (the radius of gyration of eqn. 6.21 is smaller by a factor of $\sqrt{6}$, or roughly 0.9 µm). This result is comparable to though
larger than the length scale of the exploded DNA seen in fig. 6.7. Given the
crudeness of the model and probably more importantly, the fact that the DNA
seems to be constrained via links to the capsid itself, this analysis provides a
satisfactory first approximation to the structures seen in electron microscopy.
These same arguments can be invoked again to coach our intuition concerning the size of the DNA cloud surrounding a bacterium that has lost its DNA
as well. In this case, the genome length is substantially larger than that of the
T2 phage, namely, $L \approx 4.6 \times 10^6 \times 3.4$ Å $\approx 1.6 \times 10^7$ Å $\approx 1600$ µm. Once again
invoking eqn. 6.20 tells us that the mean size of the DNA seen in fig. 6.8 is
$\sqrt{\langle R^2 \rangle} \approx 12$ µm. As with the phage calculation, the random walk calculation
should be seen as an overestimate since the DNA is clearly forced to return to the
bacterium repeatedly, inhibiting the structure from adopting a fully expanded
configuration.
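The size estimates for the phage and bacterial genomes are simple enough to tabulate. The sketch below evaluates both the root-mean-square end-to-end distance of eqn. 6.20 and the radius of gyration of eqn. 6.21, which differ by a factor of $\sqrt{6}$, using the persistence length and rise per base pair quoted in the text; the function and constant names are illustrative.

```python
from math import sqrt

XI_P_NM = 50.0     # persistence length of DNA quoted in the text (500 Angstrom)
RISE_NM = 0.34     # contour length per base pair quoted in the text (3.4 Angstrom)

def rms_end_to_end_nm(n_bp, xi_p=XI_P_NM):
    """Eqn. 6.20: sqrt(<R^2>) = sqrt(2 L xi_p), with L = RISE_NM * n_bp."""
    return sqrt(2.0 * RISE_NM * n_bp * xi_p)

def rms_radius_of_gyration_nm(n_bp, xi_p=XI_P_NM):
    """Eqn. 6.21: sqrt(<R_G^2>) = sqrt(L xi_p / 3)."""
    return sqrt(RISE_NM * n_bp * xi_p / 3.0)

for name, n_bp in (("T2 phage", 150_000), ("E. coli", 4_600_000)):
    end_to_end_um = rms_end_to_end_nm(n_bp) / 1000.0
    rg_um = rms_radius_of_gyration_nm(n_bp) / 1000.0
    print(f"{name}: end-to-end {end_to_end_um:.1f} um, R_G {rg_um:.1f} um")
```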
6.2.3
DNA Looping: From Chromosomes to Gene Regulation
Looping of Large DNA Fragments Is Dictated by the Difficulty of the
Distant Ends to Find Each Other
RP: make a figure with all of the examples of looping we can think of- RNA, DNA in
gene regulation. JK has the very interesting idea that one can think of folding as recruitment - one fold recruits
the next, etc. - can we make concrete for DNA and proteins
RP: this section should in principle talk about the Jacobson-Stockmayer theory,
but more importantly hint
Figure 6.8: Illustration of the extent of a bacterial genome which has escaped
the bacterial cell
Figure 6.9: Examples of looping. RP/JK: RNA folding (binding vs entropy)
and DNA (binding vs bending)
In the previous chapter we argued that the formation of small loops, where
“small” is relative and determined by the stiffness of the macromolecule in
question, is inhibited by the energetic cost associated with bending the DNA.
Such looping forms a part of the story of how genes are regulated (see sections
?? and ??) and, as a result, it is of interest to understand the energetic factors
contributing to the rate of such looping. Beyond the case of small loops, it is
also of interest to examine the opposite limit, namely where the DNA fragment
in question is very large. In this case, DNA looping is similarly inhibited, but
now by the entropic challenge of getting the two ends of the fragment at the
same place at the same time. The random walk model of polymers allows for a
simple calculation of the entropic cost for loop formation.
As a warm-up exercise, consider first the one dimensional model introduced
in section ??. The probability, $p_\circ$, of loop formation is the probability that
the one-dimensional random walker returns to the origin. Using eqn. (6.10) with $R = 0$, we
conclude
$$p_\circ = \frac{1}{2^N} \frac{N!}{\left(\frac{N}{2}\right)!\,\left(\frac{N}{2}\right)!}, \tag{6.25}$$
where $N$ is the number of Kuhn segments, or the ratio of the DNA length
to twice the persistence length ($\xi_p \approx 50$ nm). Here we are interested in the
long chain limit, which corresponds to $N \gg 1$. This is also the limit in which
the random walk model can be applied to DNA conformations, as discussed
previously. To further simplify eqn. (6.25) we make use of our trusty Stirling
formula, $N! \approx (N/e)^N \sqrt{2\pi N}$, for $N \gg 1$:
$$p_\circ \approx \sqrt{\frac{2}{\pi N}}. \tag{6.26}$$
The interesting prediction of the model is that the cyclization probability of
long DNA strands will decay with polymer length to the power $-1/2$.
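How quickly the Stirling estimate of eqn. 6.26 converges to the exact result of eqn. 6.25 can be seen by evaluating both for a few chain lengths. This is a minimal sketch with illustrative function names.

```python
from math import comb, pi, sqrt

def p_return_exact(n_steps):
    """Eqn. 6.25: probability that an n_steps one-dimensional walk ends at the origin."""
    if n_steps % 2:
        return 0.0                 # an odd number of steps can never return
    return comb(n_steps, n_steps // 2) / 2 ** n_steps

def p_return_stirling(n_steps):
    """Eqn. 6.26: the large-N approximation sqrt(2 / (pi N))."""
    return sqrt(2.0 / (pi * n_steps))

for n in (10, 100, 1000, 10000):
    exact = p_return_exact(n)
    approx = p_return_stirling(n)
    print(n, exact, approx, approx / exact)
```

The ratio in the last column approaches one as $N$ grows, which is the sense in which the $N^{-1/2}$ scaling is a long-chain statement.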
Unlike the scaling of the polymer size with its length, which we found to
be independent of the dimensionality of space, the effect of dimensionality on
cyclization is quite significant. Namely, consider the 3-dimensional random
walker of $N$ steps. The probability of returning to the origin can be written as
the ratio of the number of walks that return to the total number of walks. For
a walker on a cubic lattice, the latter is $6^N$, as there are 6 possible directions
to choose from at each step. The number of walks that return to the origin is
a slightly more involved combinatorial problem.
The number of walks that return to the origin can be written as
$$\Omega_\circ = \sum_{\{\mathbf{l}_i\}} \delta(\mathbf{l}_1 + \mathbf{l}_2 + \mathbf{l}_3 + \ldots + \mathbf{l}_N), \tag{6.27}$$
where the sum goes over all possible choices, $\mathbf{l}_i \in \{\pm\hat{x}, \pm\hat{y}, \pm\hat{z}\}$, for each of
the $N$ steps, $i = 1, 2, \ldots, N$. The function $\delta$ is the Kronecker delta function; it
returns one if its argument $\mathbf{l}$ is zero, otherwise the value of the function is zero. To make
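further progress we make use of a mathematical identity
$$\delta(\mathbf{l}) = \frac{1}{2\pi} \int_0^{2\pi} e^{i l_x k_x}\, dk_x \cdot \frac{1}{2\pi} \int_0^{2\pi} e^{i l_y k_y}\, dk_y \cdot \frac{1}{2\pi} \int_0^{2\pi} e^{i l_z k_z}\, dk_z, \tag{6.28}$$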
where the vector $\mathbf{l} = (l_x, l_y, l_z)$ has integer coordinates. The validity of this
identity is readily checked by evaluating the integral $\frac{1}{2\pi} \int_0^{2\pi} \exp(ilk)\, dk$, which
is one for $l = 0$ and zero for any other integer value of the parameter $l$. With
this identity in hand we can rewrite the expression for the number of walks that
return to the origin as
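$$\Omega_\circ = \sum_{\{\mathbf{l}_i\}} \left(\frac{1}{2\pi}\right)^3 \int_0^{2\pi} dk_x \int_0^{2\pi} dk_y \int_0^{2\pi} dk_z\, e^{i\mathbf{k} \cdot (\mathbf{l}_1 + \mathbf{l}_2 + \mathbf{l}_3 + \ldots + \mathbf{l}_N)}, \tag{6.29}$$
where the vector $\mathbf{k} \equiv (k_x, k_y, k_z)$.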
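The integral representation of the Kronecker delta in eqn. 6.28 is easy to test numerically. The sketch below approximates each one-dimensional integral by an equally spaced Riemann sum; the function names and the choice of 64 sample points are assumptions, and the check is meaningful only for integers with $|l|$ smaller than the number of sample points.

```python
import cmath
from math import pi

def delta_1d(l, n_points=64):
    """(1 / 2 pi) * integral_0^{2 pi} exp(i l k) dk via an equally spaced Riemann sum."""
    dk = 2.0 * pi / n_points
    total = sum(cmath.exp(1j * l * j * dk) for j in range(n_points)) * dk
    return total / (2.0 * pi)

def delta_3d(lx, ly, lz):
    """Product form of eqn. 6.28 for an integer lattice vector (lx, ly, lz)."""
    return delta_1d(lx) * delta_1d(ly) * delta_1d(lz)

for vec in ((0, 0, 0), (1, 0, 0), (2, -1, 3)):
    print(vec, abs(delta_3d(*vec)))
```

For a periodic integrand such as $e^{ilk}$ the equally spaced sum is a sum over roots of unity, so it vanishes up to rounding for every nonzero integer $l$ in the stated range and equals one for $l = 0$.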
Now we reap the benefits of writing the Kronecker delta function as an integral.
Namely, the sum over all possible values of $\mathbf{l}_i$ of the exponential term appearing
in our formula, can now be computed explicitly. In fact the careful reader should
notice that this is nothing but the sum we dealt with when computing the force
extension curve for a worm-like chain. The only difference is that the reduced
force is now replaced by the vector $i\mathbf{k}$.
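The steps of the walk are independent of each other. Therefore the sum in
eqn. (6.29) can be written as a product of $N$ identical sums over the possible
single step excursions:
$$\sum_{\{\mathbf{l}_i\}} e^{i\mathbf{k} \cdot (\mathbf{l}_1 + \mathbf{l}_2 + \mathbf{l}_3 + \ldots + \mathbf{l}_N)} = \left( \sum_{\mathbf{l} = \pm\hat{x}, \pm\hat{y}, \pm\hat{z}} e^{i\mathbf{k} \cdot \mathbf{l}} \right)^N. \tag{6.30}$$
The sum over all possible single steps is
$$\sum_{\mathbf{l} = \pm\hat{x}, \pm\hat{y}, \pm\hat{z}} e^{i\mathbf{k} \cdot \mathbf{l}} = 2(\cos k_x + \cos k_y + \cos k_z), \tag{6.31}$$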
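which in turn yields the following expression for the cyclization probability
$$p_\circ = \frac{\Omega_\circ}{6^N} = \left(\frac{1}{2\pi}\right)^3 \int_0^{2\pi} dk_x \int_0^{2\pi} dk_y \int_0^{2\pi} dk_z \left( \frac{\cos k_x + \cos k_y + \cos k_z}{3} \right)^N. \tag{6.32}$$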
Since we are interested in the limit $N \gg 1$ the function to be integrated over
will be essentially zero except for values of $\mathbf{k}$ close to $(0, 0, 0)$, $(\pi, \pi, \pi)$, etc.,
that is, close to points at which all three cosines are either one or negative one.
This point is illustrated in fig. ?? where we plot $((\cos k_x + \cos k_y)/2)^N$, the two-dimensional version of our integrand, for $N = 60$. Note that we need to take
the number of steps to be even to ensure that our walker can in fact return to
the origin.
Figure 6.10: Plot of the function $((\cos k_x + \cos k_y)/2)^N$ for $N = 60$ (horizontal axes: $k_x$ and $k_y$, each from 0 to 6; vertical axis from 0 to 1).
From the figure we see that the integral is essentially twice the volume under
the peak at (π, π), or equivalently, twice the volume under the peak at (0, 0).
Similarly, in the three-dimensional case of interest, for N ≫ 1 we can compute the
integral to a very good approximation as twice the integral over a small volume
∆V0 centered around (0, 0, 0). We can further simplify matters by expanding
the integrand in small k, which is valid since we are confined to a small region
under the (0, 0, 0) peak. For this we use the expansion cos x ≈ 1 − x^2/2. The
end result of these approximations is the expression
\[
p_\circ \approx 2 \left(\frac{1}{2\pi}\right)^{3} \int_{\Delta V_0} d^3k \left( 1 - \frac{k^2}{6} \right)^{N}
\tag{6.33}
\]
which becomes the Gaussian integral we know and love,
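\[
p_\circ \approx 2 \left(\frac{1}{2\pi}\right)^{3} \int_{\Delta V_0} d^3k \, e^{-N k^2/6}
\approx 2 \left(\frac{1}{2\pi}\right)^{3} \int d^3k \, e^{-N k^2/6}
\tag{6.34}
\]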
once we recall that e^x ≈ 1 + x for small x. Furthermore, note that in the final
expression the integral over the volume ∆V0 is replaced by an integral over all
of k-space (k_x, k_y, k_z ∈ (−∞, +∞)). This is yet another valid approximation, which
is rationalized by the fact that the Gaussian function exp(−N x^2/6) rapidly
vanishes for values of x that exceed √(6/N), so that the added region makes no
significant contribution to the integral. Referring to fig. ??, the last two
approximations correspond to writing the integral under a single peak as a
Gaussian integral over the whole plane.
To obtain the scaling of the cyclization probability with the number of Kuhn
segments of the DNA chain, which in turn is proportional to the number of base
pairs, we need only scale out the N dependence of the integral in eqn. (6.34).
Figure 6.11: Looping probability as a function of DNA segment length. For large
DNA fragments, the results of the random walk analysis are seen to provide an
appropriate description of these systems, while for short fragments, the looping
phenomenon is dominated by the cost of elastic bending. RP: data from Shore
and Baldwin, J. Mol. Bio.,170 957 (1983)
This is accomplished by a change of variables √N k ≡ x, and
\[
p_\circ \approx 2 \left(\frac{1}{2\pi}\right)^{3} \frac{1}{N^{3/2}} \int d^3x \, e^{-x^2/6} \sim N^{-3/2}.
\tag{6.35}
\]
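The scaling in eqn. (6.35) can be checked numerically. The following Python sketch is an illustration rather than part of the derivation: it evaluates the lattice integral of eqn. (6.32) as an average over a uniform grid of k values (exact for this integrand provided the grid has more than N points per axis) and compares it with the Gaussian estimate 2(3/(2πN))^{3/2} that results from carrying out the integral in eqn. (6.34). The function names and the sample values of N are our own choices.

```python
import numpy as np

def return_probability(n_steps):
    """Evaluate eqn. (6.32): probability that an n_steps walk on the cubic
    lattice ends where it started."""
    m = n_steps + 2  # grid points per axis; m > n_steps makes the grid average exact
    cos_k = np.cos(2.0 * np.pi * np.arange(m) / m)
    integrand = ((cos_k[:, None, None] + cos_k[None, :, None]
                  + cos_k[None, None, :]) / 3.0) ** n_steps
    # The mean over the grid equals (1/2pi)^3 times the triple integral.
    return float(integrand.mean())

def gaussian_estimate(n_steps):
    """Closed form of the Gaussian integral in eqn. (6.34)."""
    return 2.0 * (3.0 / (2.0 * np.pi * n_steps)) ** 1.5

for n in (20, 40, 60, 80):
    exact = return_probability(n)
    print(n, exact, gaussian_estimate(n), exact * n ** 1.5)
```

If the N^{-3/2} law holds, the last printed column (the probability multiplied by N^{3/2}) should settle toward a constant as N grows.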
We conclude that the random walk model predicts a cyclization probability
which scales with the polymer length to the −3/2 power. This is confirmed in
experiments on DNA fragments many persistence lengths in size.
RP: this section should refer to RNA folding and the role of loops in dictating the structures - and the costs
6.3 The New World of Single Molecule Mechanics
Another interesting outcome of the random walk model is its ability to shed light
on the response of macromolecules to external forcing. This is especially timely
in light of the development of a range of experimental techniques which make
it possible to query individual proteins, DNA molecules and carbohydrates as
to their response to forcing.
Single Molecule Measurement Techniques Lead to Force Spectroscopy
There are a number of different implementations of the basic idea of applying
forces to individual macromolecules. Several of these techniques are represented
in schematic form in fig. 6.12. One such technique shown in fig. 6.12(a) involves
the use of micron-sized cantilevers which are attached to a macromolecule which
is, in turn, tethered to a surface. Through control of the height of the surface to
which the molecule is tethered, for example, the cantilever will suffer a deflection
which can be measured using reflected laser light. A second example shown in
fig. 6.12(b) is optical tweezers which permit the application of forces of order
1-50 pN on macromolecules of interest. In this case, the key idea is that by
attaching a macromolecule to a micron-sized bead, it is possible to pull on the
bead (and hence the molecule) by shining laser light on the bead and using
the resulting radiation pressure from the laser light to manipulate the bead.
The same basic idea is similarly played out in the context of the magnetic
tweezers shown in fig. 6.12(c) where the bead is manipulated by magnetic fields
rather than laser light. One of the interesting variations on the forcing scheme
provided by the magnetic tweezer is the opportunity to apply torsional forces
which examine the response of molecules to twist. The final example shown
in fig. 6.12(d) is the use of a pipette-controlled force apparatus in which the
strengths of ligand receptor interactions as well as the mechanical response of
lipid bilayer vesicles can be examined. Our main point in this discussion is to
alert the reader to the emergence of single molecule techniques that complement
the tools of traditional solution biochemistry and permit the measurement of not
only the average properties of the various macromolecules of biological interest,
but also the fluctuations about this average response.
RP: optical tweezers, magnetic tweezers, pipette techniques - comment on displacement and force sensitivity and estimate the relevant scales
6.3.1 Force-Extension Curves: A New Spectroscopy
Different Macromolecules Have Different Force Signatures When Subjected to Loading
RP: analogous to ion channels with their signature
The techniques introduced above permit the explicit measurement of the
force-extension characteristics of a range of different molecules. Fig. 6.13 shows
the force-extension properties of several characteristic examples ranging from
DNA to proteins. In particular, fig. 6.13(a) shows the force-extension characteristics of a single DNA molecule subjected to loading. Note that the same
characteristic force-extension signature will be found for a given DNA molecule
regardless of which of the various techniques is used to measure it, and further, that this curve provides a unique fingerprint which serves as the basis of
force spectroscopy of macromolecules. Fig. 6.13(b) shows a plot of the force-extension properties of a particular RNA molecule. Note that the character of
the secondary structure associated with a given RNA molecule is translated, in
turn, into the character of the force-extension curve, illustrating the idea that
the force-extension curve provides a spectroscopic fingerprint of different macromolecules. Fig. 6.13(c) shows yet a third example of the intriguing diversity of
force-extension curves associated with different macromolecules, this time revealing how the multidomain protein titin unfolds in the presence of force. One
immediate statement that can be made in this example is that the number of
load drops in the curve corresponds to the number of domains in the protein,
an assertion that has been tested elegantly in the experiments of Fernandez and
coworkers. We consider it part of the business of the remainder of this book to
show the various ideas that have been set forth to respond to the types of spectroscopic measurements shown in fig. 6.13. Further, we emphasize that these
three examples are but a tiny representation of the broad class of measurements
that have been made on polysaccharides, lipids, proteins and nucleic acids as
well as their assemblies.
Figure 6.12: Schematic showing a variety of single molecule techniques. (a)
single molecule atomic-force microscopy being used to stretch a multi-domain
protein, (b) optical tweezers being used to measure the rate of transcription,
(c) magnetic tweezers being used to measure the torsional properties of DNA
and (d) pipette-based force apparatus being used to measure ligand-receptor
adhesion forces.
RP: also talk about fluctuations something not present in bulk experiment.
6.3.2 Random Walk Models for Force-Extension Curves
RP: once we have the idea of the structure in this form, can we say something about the relative probabilities of different configuration and what this implies about forces. This bit is conceptual and should include a cartoon.
Now that we have said something about those aspects of macromolecular structure that can be informed on the basis of the random walk model, we now pose
more sophisticated questions, namely, what do these structural insights imply
about the free energy cost of stretching the chain.
The Low-Force Regime in Force-Extension Curves Can Be Understood Using the Random Walk Model
One of the simplest models that can be written to capture the relation between force and extension in polymers is based on a strictly entropic interpretation of the free energy. In particular, as the chain molecule
is stretched to lengths approaching its overall contour length, the overall number
of configurations available to the molecule goes down, and with it the entropy.
This reduction in entropy corresponds to an increase in the free
energy.
We begin with a one-dimensional rendition of the freely-jointed chain model.
We imagine a polymer of overall length Ltot = N b, where N is the number of
monomers and b is the length of each monomeric segment. The basic thrust
of our argument will be to construct the free energy F (L) as a function of the
length L = (n_r − n_l)b from which the force necessary to arrive at that extension
is given by
\[
\text{force} = -\frac{\partial F(L)}{\partial L}.
\tag{6.36}
\]
As before, we use the notation nr and nl to signify how many of the total links
are right pointing (nr ) and how many are left pointing (nl ). In order to proceed,
we need an explicit formula for the free energy. As noted above, in this simplest
of models we ignore any enthalpic contributions to the free energy, with the
entirety of the free energy of the molecule taking the form,
\[
F(L) = -k_B T \ln \Omega(L; L_{tot}),
\tag{6.37}
\]
where Ω(L; Ltot ) is the number of configurations of the molecule which have
length L given that the total contour length of the molecule is Ltot .
As shown in fig. 6.14, we are interested in the equilibrium of our random
walk representation of the polymer when it is subjected to external forcing such
as can be provided by an optical tweezers setup. A particularly transparent way
to imagine this problem is to think of weights being dangled from the ends of
the polymer as shown in fig. 6.15. In this case, the free energy of eqn. 6.37 must
be supplemented with a term of the form U_weights = −mgL (a force mg acts on each
end, so stretching the chain to an end-to-end length L lowers the potential energy
of the weights by mgL). What this term
says physically is that the more the molecule is stretched, the lower the weights
will dangle, with the result that their potential energy is decreased. Putting
together this term with the contribution from eqn. 6.37, we have for the total
free energy of the system
\[
F(L) = \underbrace{-mgL}_{\text{contribution from weights}}
\; \underbrace{-\, k_B T \ln \Omega(L; L_{tot})}_{\text{entropic contribution of polymer conformations}}.
\tag{6.38}
\]
Figure 6.13: Force-displacement curve for a variety of different molecules illustrating the sense in which single molecule experiments serve as the basis of
force spectroscopy. RP: DNA, RNA, titin - DNA is Bouchiat, Biophys. J 1999,
RNA is Liphardt et al in Science and titin is Julio Fernandez in PNAS, 96, 3694
(1999).
Figure 6.14: Schematic of a model one-dimensional polymer subjected to external forcing. The cartoon is meant to suggest that our one-dimensional random
walk polymer is attached to two optical beads at its extremities and that forcing
is applied using optical tweezers.
To make further progress with this result, and in particular, to obtain the free
energy minimizing length as a function of the applied force, we must first find
a concrete expression for Ω(L; L_tot). To that end, we note that this reduces to
nothing more than the combinatoric question of how many different ways there
are of arranging N arrows, n_R of which are right pointing and n_L = N − n_R of
which are left pointing. Indeed, this is in many ways the “hydrogen atom” of
combinatoric questions and is given by
\[
\Omega(n_R; N) = \frac{N!}{n_R!\,(N - n_R)!},
\tag{6.39}
\]
where we have found it convenient to replace our reference to L and Ltot with
reference to the number of right pointing arrows and the total number of such
arrows with the recognition that they are related by L = (nR − nL )b and Ltot =
N b.
Given the free energy, our task now is to minimize it with respect to length
(or n_R). To that end, we first invoke the Stirling approximation, which we
remind the reader allows us to replace ln N! by N ln N − N. In light of this
approximation, the overall free energy may be written as
\[
F(n_R) = -2mgb\, n_R + k_B T \left( n_R \ln n_R + (N - n_R) \ln (N - n_R) \right).
\tag{6.40}
\]
Figure 6.15: Schematic of a model one-dimensional polymer subjected to external forcing through the attachment of weights on the end. This scenario is a
simple fiction that helps illustrate how to include the forcing in the overall free
energy budget. RP: replace this figure with a second one - talk to Nigel.
Note that we have neglected all constant terms since they will not contribute
during the minimization. Differentiation of this expression with respect to nR
results in
\[
\frac{\partial F}{\partial n_R} = -2mgb + k_B T \ln n_R - k_B T \ln (N - n_R) = 0
\tag{6.41}
\]
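which, upon solving for n_R/(N − n_R) = exp(2mgb/k_B T) and using ⟨L⟩ = (2n_R − N)b
with L_tot = Nb, may be rewritten in a more transparent fashion as
\[
z = \frac{\langle L \rangle}{L_{tot}} = \tanh \frac{mgb}{k_B T}.
\tag{6.42}
\]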
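The minimization leading to eqn. (6.42) can also be carried out by brute force. The sketch below is our own illustration, not the procedure used in the text: it works in units of k_B T, uses the exact ln Ω of eqn. (6.39) (via the log-gamma function) instead of the Stirling approximation, takes the energy of the load to be −f L with L = (2n_R − N)b, and searches over all n_R for the minimum. The names `free_energy` and `equilibrium_extension` and the chain length N = 10000 are assumptions made for the example.

```python
import math

def free_energy(n_right, n_links, reduced_force):
    """Free energy in units of k_B T; reduced_force = f b / (k_B T)."""
    log_omega = (math.lgamma(n_links + 1) - math.lgamma(n_right + 1)
                 - math.lgamma(n_links - n_right + 1))
    extension_in_links = 2 * n_right - n_links  # L / b
    return -reduced_force * extension_in_links - log_omega

def equilibrium_extension(n_links, reduced_force):
    """Relative extension z = L / L_tot at the free energy minimum."""
    best = min(range(n_links + 1),
               key=lambda n: free_energy(n, n_links, reduced_force))
    return (2 * best - n_links) / n_links

for x in (0.1, 0.5, 1.0, 2.0):
    print(x, equilibrium_extension(10000, x), math.tanh(x))
```

For long chains the two printed columns should agree closely, which is the content of z = tanh(fb/k_B T).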
The construct of using weights to load the molecule was a convenient fiction
and more generally the two ends are subjected to a force f with the result
that z = tanh(f b/kB T ). This force-extension relation is shown in fig. 6.16.
To gain further insight into the quantitative aspects of the model we consider
the limiting case of a small force, i.e. fb ≪ k_B T. For a dsDNA molecule in
physiological conditions (b ≈ 100 nm) this corresponds to f ≪ 40 fN, while for
the much more flexible ssDNA (b ≈ 1.5 nm) the small force regime is obtained
for f ≪ 3 pN. In the small force limit the force-extension curve is linear,
\[
\langle L \rangle = \frac{L_{tot}\, b}{k_B T}\, f,
\tag{6.43}
\]
i.e. in this regime the polymer behaves like an ideal Hookean spring with a
stiffness constant k = k_B T/(L_tot b). The fact that the stiffness of this spring is
linearly proportional to the temperature reveals its true entropic nature. For
λ-phage dsDNA, whose L_tot = 16.6 µm, the effective spring constant is k ≈
2.4 fN/µm, while for ssDNA of the same length k ≈ 160 fN/µm. Note that the
larger flexibility of ssDNA, as evidenced by its smaller persistence length, leads
to a larger value for the effective spring stiffness.
Figure 6.16: Relation between force and extension as obtained using the freely
jointed chain model. (a) One-dimensional calculation, (b) Three-dimensional
calculation.
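The numbers quoted above follow from simple arithmetic, sketched here in Python. The value k_B T = 4.1 pN nm (roughly room temperature) is our assumption, since the text does not state which value was used, so the results should be expected to match the quoted figures only to within a few percent.

```python
K_B_T_PN_NM = 4.1  # assumed thermal energy near room temperature, in pN nm

def crossover_force_pn(kuhn_length_nm):
    """Force scale k_B T / b below which the chain responds linearly (pN)."""
    return K_B_T_PN_NM / kuhn_length_nm

def spring_constant_fn_per_um(contour_length_nm, kuhn_length_nm):
    """One-dimensional entropic spring constant k = k_B T / (L_tot b)."""
    k_pn_per_nm = K_B_T_PN_NM / (contour_length_nm * kuhn_length_nm)
    return k_pn_per_nm * 1.0e6  # 1 pN/nm = 10^6 fN/um

contour_length_nm = 16600.0  # lambda-phage DNA, 16.6 um
for name, kuhn_nm in (("dsDNA", 100.0), ("ssDNA", 1.5)):
    print(name,
          crossover_force_pn(kuhn_nm),
          spring_constant_fn_per_um(contour_length_nm, kuhn_nm))
```

The ratio of the two spring constants is fixed by the ratio of the Kuhn lengths, 100/1.5 ≈ 67, independent of the value taken for k_B T.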
Note that thus far, our model of the macromolecule has been highly idealized
in that we have imagined that each monomer can only point in one of two
directions. Though that model is instructive, clearly it is of interest to expand
our horizons to the more physically realistic three-dimensional case, noting that
even there, there are features of our models that remain questionable.
The generalization of our freely-jointed chain analysis to three dimensions
holds no particular surprises. The fundamental idea is that now instead of
constraining the monomers that make up the molecule of interest to point only
right or left, we give them full three-dimensional motion. The simplest variant
of this model is to permit each monomer to point in one of six directions (i.e.
e1 , −e1 , e2 , −e2 , e3 and −e3 ). We quote the result for this model, namely,
\[
z = \frac{\langle L \rangle}{L_{tot}} = \frac{2 \sinh \beta f b}{4 + 2 \cosh \beta f b},
\tag{6.44}
\]
and leave the details as an exercise for the reader.
JK: Show data for dsDNA and ssDNA force extension at small forces, and use it to deduce a spring constant. Compare to estimate from the FJC model.
The more interesting case which we work out in greater detail is that in which
each monomer can point in any direction. In this case, rather than writing out
the free energy explicitly, we compute the partition function and use it
to deduce the relevant averages, such as the average length at a given applied
force. As each link in the chain fluctuates independently, the partition function for
N = L_tot/b links is Z_N = Z_1^N with
\[
Z_1 = \int_0^{2\pi} d\phi \int_0^{\pi} e^{f b \cos\theta / k_B T} \sin\theta \, d\theta.
\tag{6.45}
\]
The above integral over the unit sphere can be evaluated with the change of
variables x = cos θ, to give
\[
Z_1 = 4\pi \, \frac{k_B T}{f b} \, \sinh \frac{f b}{k_B T}.
\tag{6.46}
\]
Now the free energy F (f ) = −kB T ln ZN is a function of the applied force f
and we differentiate it with respect to f to obtain an expression for its thermodynamic conjugate, the average polymer length,
\[
\langle L \rangle = -\frac{\partial F}{\partial f} = N b \left( \coth \frac{f b}{k_B T} - \frac{k_B T}{f b} \right).
\tag{6.47}
\]
The small force limit, fb/k_B T ≪ 1, in this case gives the same Hookean
expression, f = k⟨L⟩, as the one-dimensional freely jointed chain, except the
effective spring constant is three times as large, k = 3k_B T/(L_tot b). The same
result follows from eqn. (6.44). Not surprisingly, the two-dimensional version of
the model, whether it be defined on a lattice or not, gives k = 2k_B T/(L_tot b).
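The three force-extension relations of eqns. (6.42), (6.44) and (6.47) can be compared side by side. The sketch below is an illustration with names of our own choosing; x denotes the dimensionless force fb/k_B T. It also checks eqn. (6.46) by integrating eqn. (6.45) numerically with a midpoint rule.

```python
import numpy as np

def z_one_dimensional(x):
    """Eqn. (6.42) with x = f b / (k_B T)."""
    return np.tanh(x)

def z_six_directions(x):
    """Eqn. (6.44): links restricted to the six lattice directions."""
    return 2.0 * np.sinh(x) / (4.0 + 2.0 * np.cosh(x))

def z_freely_jointed(x):
    """Eqn. (6.47) divided by N b (requires x > 0)."""
    return 1.0 / np.tanh(x) - 1.0 / x

def single_link_partition(x, n_theta=20000):
    """Midpoint-rule evaluation of the integral in eqn. (6.45)."""
    d_theta = np.pi / n_theta
    theta = (np.arange(n_theta) + 0.5) * d_theta
    return 2.0 * np.pi * np.sum(np.exp(x * np.cos(theta)) * np.sin(theta)) * d_theta

for x in (0.01, 0.1, 1.0, 5.0):
    print(x, z_one_dimensional(x), z_six_directions(x), z_freely_jointed(x))
```

At small x the first column should approach x and the other two x/3, which is the factor of three in the spring constant noted above.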
Force-Induced Unfolding of Multidomain Proteins Can Be Modeled
Using the Random Walk Model
One particularly satisfying application of these ideas is to the analysis of
the force-induced unfolding of multidomain proteins. The data relevant to the
particular case of the muscle protein titin has already been shown in fig. 6.13(c).
Our current objective is to use our random walk analysis to partially explain
data like that of fig. 6.13(c). The idea of the analysis we will bring to bear
on this problem is shown in fig. 6.17, where it is seen that the overall contour
length of the chain increases in a systematic and calculable way as a function
of the number of domains that have unfolded.
RP: this is a curve like from my homeworks where the random walk elasticity gives a bunch of different curves corresponding to different overall contour lengths.
The Random Walk Picture Can Be Used to Construct a Toy Model of Force-Induced Structural Transitions in DNA
As noted above, one of the most compelling features of the force-extension
curve for DNA is the observation that in the large force regime the molecule
suffers extreme deformation as shown in fig. 6.19. One set of insights into the
physical content of this deformation is that garnered from all-atom simulations
of DNA. These simulations indicate that as a result of the large forces to which
the molecule is subjected, it undergoes a structural change from the conventional
B form to a new state in which the DNA is lined up with the ribbons now nearly
parallel and the base pairs forming a structure something like the rungs on a
ladder as shown in fig. 6.19.
RP: ref the article on stretching by Richard Lavery
Figure 6.17: Random walk model for elastic parts of the titin force-extension
curve.
Figure 6.18: Illustration of the force-extension properties of a globular protein as
obtained using the random walk model of entropic elasticity as different domains
are successively unfolded, resulting in longer contour lengths.
Figure 6.19: Illustration of the structural change that attends the application of
force on DNA. (RP: where did this figure come from? Also, look at the Lavery
figure)
As a toy model to provide insight into the underlying mechanism of this force-induced structural change, we consider a generalization of the freely-jointed
chain model. As a first cut, we consider a one-dimensional generalization and
consider the three-dimensional version of the same model subsequently. We now
imagine that each monomer has four distinct states. Once again, each monomer
can point either to the left or right, but we now supplement our original space
of possibilities by allowing for each monomer to have a second variant with a
length a which costs an energy ε in excess of those of length b. Our strategy
for examining the mean length of the chain in the presence of a force f is to
evaluate the partition function
\[
Z = \sum_{\text{config}} e^{-E_{\text{config}}/k_B T}.
\tag{6.48}
\]
Since each monomer has configurations that are indifferent to those of the remaining monomers, the total partition function factorizes into a product of N
individual monomer partition functions and can be written in the form
\[
Z = \sum_{\text{monomer } 1} e^{-E_1/k_B T} \cdots \sum_{\text{monomer } N} e^{-E_N/k_B T}.
\tag{6.49}
\]
We note that each monomer can be in one of four states (left/short: E = fb,
right/short: E = −fb, left/long: E = ε + fa, right/long: E = ε − fa). The sum
over configurations is now reduced to summing over these four terms for each
monomer and results in the total partition function
\[
Z = \left( 2 \cosh \frac{f b}{k_B T} + 2 e^{-\epsilon/k_B T} \cosh \frac{f a}{k_B T} \right)^{N}.
\tag{6.50}
\]
Now that the partition function is in hand, we can determine the average length
of the molecule by evaluating
\[
\langle L \rangle = \frac{1}{\beta} \frac{1}{Z} \frac{\partial Z}{\partial f}.
\tag{6.51}
\]
Figure 6.20: Schematic of a one-dimensional freely jointed chain that supports
a force-induced transition.
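The result of carrying out this manipulation is
\[
z = \frac{\langle L \rangle}{L_{tot}}
= \frac{\frac{b}{a} \sinh \frac{f b}{k_B T} + e^{-\epsilon/k_B T} \sinh \frac{f a}{k_B T}}
       {\cosh \frac{f b}{k_B T} + e^{-\epsilon/k_B T} \cosh \frac{f a}{k_B T}}.
\tag{6.52}
\]
RP: check this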
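Eqn. (6.52) can be checked numerically against eqns. (6.50) and (6.51). The sketch below assumes that L_tot = Na, the fully extended length in the long state, which is the normalization that reproduces the b/a prefactor. It differentiates ln Z with respect to f by finite differences and compares with the closed form. The parameter values (b = 1, a = 1.7, ε = 5 in units where k_B T = 1) are purely illustrative and are not taken from the text.

```python
import math

def log_single_monomer_partition(force, b, a, eps, k_t=1.0):
    """ln of the bracket in eqn. (6.50), i.e. ln Z for one monomer."""
    return math.log(2.0 * math.cosh(force * b / k_t)
                    + 2.0 * math.exp(-eps / k_t) * math.cosh(force * a / k_t))

def relative_extension(force, b, a, eps, k_t=1.0):
    """Closed form of eqn. (6.52), with L_tot = N a assumed."""
    weight = math.exp(-eps / k_t)
    numerator = (b / a) * math.sinh(force * b / k_t) + weight * math.sinh(force * a / k_t)
    denominator = math.cosh(force * b / k_t) + weight * math.cosh(force * a / k_t)
    return numerator / denominator

def relative_extension_numeric(force, b, a, eps, k_t=1.0, step=1.0e-6):
    """Eqn. (6.51) per monomer, k_B T d(ln Z_1)/df, divided by a."""
    derivative = (log_single_monomer_partition(force + step, b, a, eps, k_t)
                  - log_single_monomer_partition(force - step, b, a, eps, k_t)) / (2.0 * step)
    return k_t * derivative / a

for f in (0.5, 2.0, 5.0, 10.0):
    print(f, relative_extension(f, 1.0, 1.7, 5.0),
          relative_extension_numeric(f, 1.0, 1.7, 5.0))
```

With these illustrative parameters the extension first saturates near b/a and then rises toward 1 as the force drives monomers into the long state, which is the qualitative signature of a force-induced transition.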
RP: the experimental landscape. Should include the worm like chain model
As noted in the previous section, the random walk model has taken us
straight to predictions about the nature of the force-displacement curves associated with macromolecules. These predictions are made all the more potent
as a result of the emergence of a new generation of experimental probes that
have made possible a new generation of mechanical testing in which the test
samples are precisely the type of large molecules described above.
Figure 6.21: Cartoon illustrating how entropy changes for a bent biofunctionalized cantilever.
6.3.3 Rate Effects
6.4 Macromolecules on Surfaces
6.4.1 DNA on a Surface
6.4.2 Biofunctionalized Cantilevers Revisited
6.5 Beyond the Random Walk
6.5.1 Compact Random Walks and the Size of Proteins
Random Walk Models Permit an Estimate of the Size of Proteins
The recurring theme of our work is that many different fields of study are
characterized by certain defining models that serve as the point of departure for
deeper analysis of a particular domain. Until now, our discussion has primarily
centered on those models which are primarily deterministic. The ambition of
the present chapter is to illustrate one of the most powerful stochastic models,
namely, the random walk. Indeed, as mentioned in section ??, one of the
defining features of the types of models it is the goal of this book to describe
is their widespread applicability. Certainly, the random walk model succeeds
admirably in this regard, serving as the cornerstone of situations as diverse as
the nature of surface diffusion, the structure of macromolecules and the forces
which hold proteins together and the temporal evolution of financial markets.
Globular proteins in their native state form compact structures. One of the
key ideas driving research in structural biology, which seeks to describe protein
structure in atomic detail, is that protein function follows from its structure.
RP: this section shoul
clude Evans and also
Spakowitz kinetics
JK: Can we take the
model of glasses and u
to build a simple mod
rate effects on heteropo
stretching (or RNA un
ing by mechanical fore)
idea would be that by p
at a certain rate wou
low for only a subset of
to be active. The tim
sociated with a trap e
would be given by the A
nius factor... need to
about this.
RP: this section should
include the entropy on
functionalized cantileve
it bends, figure out how
entropy increase there
the ssDNA
RP: reflecting walls a
don’t know what else:
avoidance. Reflecting
for tethering problems.
RP: want to show s
plots of protein size an
cuss the contrast betwee
polymer physics world
and the specificity/pr
science world view
Figure 6.22: Scaling of protein size as a function of the number of amino acid
residues.
Proteins are polymers comprised of amino acids. Therefore, a natural question
to ask is what, if any, aspects of protein structure can be understood from
simple coarse-grained models of polymers, such as the various random walks
introduced in this chapter.
Here we examine a lattice model, the compact polymer model, first studied
in the context of protein structure by Dill. Compact polymers are self-avoiding
random walks that visit every site of the lattice, usually taken to be cubic. They
take into account connectedness, compactness, and self-avoidance of a protein
molecule but not the specific sequence information as given by the identity and
order of amino acids along the chain. Compact polymers are a very coarse
grained model of proteins and, as with all coarse-grained models, one is limited
in scope and precision of the questions that the model is equipped to address.
The rewards on the other hand come in the form of simplicity and generality of
the answers obtained.
Possibly the simplest property of a globular protein is its size, as measured
by its linear dimensions, or more precisely, its radius of gyration. The protein
data bank reveals a systematic dependence of the protein size on its mass.
Namely, for globular proteins, the radius of gyration scales roughly with the
cubic root of the mass. This is a property of compact polymers as witnessed by
the configuration shown in fig. 6.23. Since a compact polymer completely fills
the lattice, its linear size will scale with the linear dimension of the lattice or
with the cubic root of the number of lattice sites. If we attach a single residue
with each site, and take these to be of roughly equal mass, we arrive at the
scaling law observed for real proteins. In other words, compactness implies that
all the space occupied by proteins is filled, with no holes present. Therefore, the
volume occupied by the protein, which necessarily scales as the cube of
its linear dimension, is proportional to the mass. For proteins in the unfolded
state the structures are better described as random walks. Random walks,
unlike compact polymers, are fractal objects with the linear size scaling as the
1/2 power of the mass. If one were to examine random self-avoiding walks, an
argument due to Flory predicts scaling of the linear size with mass to the 3/5
power, indicating a structure which is even more expanded than that of a simple
random walk.
JK: include the plot of Rg with mass from PDB and show some of the structures on the same graph.
Figure 6.23: Compact polymer configuration on a 4x4x3 cubic lattice. Taken
from Dill’s Protein Science review
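The contrast between the 1/3 and 1/2 exponents can be illustrated with a short numerical experiment. In the sketch below, which is our own illustration, a completely filled cube of lattice sites stands in for a compact polymer (the radius of gyration depends only on which sites are occupied, not on the path through them), and simple lattice random walks stand in for the unfolded state. The cube sizes, walk lengths, number of walks and random seed are arbitrary choices.

```python
import numpy as np

def radius_of_gyration(points):
    """Root-mean-square distance of the points from their center of mass."""
    centered = points - points.mean(axis=0)
    return np.sqrt((centered ** 2).sum(axis=1).mean())

def filled_cube(side):
    """All sites of a side x side x side cubic lattice."""
    axis = np.arange(side)
    grid = np.meshgrid(axis, axis, axis, indexing="ij")
    return np.stack(grid, axis=-1).reshape(-1, 3).astype(float)

def random_walk(n_steps, rng):
    """Positions visited by a simple random walk on the cubic lattice."""
    directions = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                           [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
    steps = directions[rng.integers(0, 6, size=n_steps)]
    return np.vstack([np.zeros(3), np.cumsum(steps, axis=0)])

def rms_rg_of_walks(n_steps, n_walks, rng):
    """Root-mean-square radius of gyration over an ensemble of walks."""
    rg_sq = [radius_of_gyration(random_walk(n_steps, rng)) ** 2 for _ in range(n_walks)]
    return np.sqrt(np.mean(rg_sq))

def scaling_exponent(sizes, radii):
    """Slope of log(radius) versus log(size)."""
    slope, _ = np.polyfit(np.log(sizes), np.log(radii), 1)
    return slope

rng = np.random.default_rng(0)
sides = [4, 8, 16, 32]
compact = scaling_exponent([s ** 3 for s in sides],
                           [radius_of_gyration(filled_cube(s)) for s in sides])
lengths = [100, 400, 1600]
walks = scaling_exponent([n + 1 for n in lengths],
                         [rms_rg_of_walks(n, 200, rng) for n in lengths])
print("compact exponent:", compact, "random walk exponent:", walks)
```

The fitted exponents should come out near 1/3 and 1/2 respectively, up to finite-size and sampling error.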
When examining the structure of globular proteins one of the most striking features is the preponderance of rather symmetric motifs such as helices and
sheets, which are referred to as the secondary structure. These are precisely the
features of protein structure accentuated by ribbon diagrams. An interesting
question then is to consider the origin of secondary structure. One idea is that
secondary structure motifs are a consequence of the compact state of the protein
which in turn is affected by the hydrophobicity of amino-acid residues. If indeed
compactness alone drives secondary structure formation then compact polymers
should also exhibit a rather large tendency towards secondary structure motifs.
This hypothesis is readily testable in the lattice model, a study first undertaken
by Chan and Dill. First we need to define secondary structure motifs on the
lattice. An example, for the case of two dimensional compact polymers is given
in fig. 6.24a. Then we generate the ensemble of all possible compact structures,
and for each such structure the percentage of residues taking part in secondary
structure motifs, such as helices, sheets and turns, is computed. The distribution of percentages of residues participating in secondary structure over the
ensemble of compact polymers of different lengths obtained in this way is shown
in fig. 6.24b. Two features are prominent. One, the percentage of monomers
participating in secondary structure approaches a fixed number as the size of
the protein increases, and two, this number is roughly 70%. In other words,
compactness of self-avoiding walks on the lattice indeed produces secondary
structure.
Figure 6.24: a. Secondary structure motifs for two-dimensional compact polymers. b. Histogram of percentage of residues participating in secondary structure over the ensemble of compact polymers of different length. (Recoverable plot labels: "Number Density", "Percent Secondary Structure", "Standard Normally Distributed Random Variable", "Percent of Residues Participating in Secondary Structure"; legend: N = 36, 100, 225, 400, 625, 900, 1600, 2500.)
6.5.2 Elasticity and Entropy: The Worm-like Chain
RP: wormlike chain via variational estimate
The Worm-like Chain Model Accounts for Both the Elastic Energy and Entropy of Polymer Chains
Thus far, we have revealed two starkly different ways of viewing the same underlying physical problem, namely, we have seen that we can view the problem of
polymer conformations from the perspective of elastic beam theory (chap. ??),
or from the perspective of the random walk (and its associated entropic elasticity). Evidently, the bending energy and the entropic contributions to the
overall free energy budget are both important. To that end, a class of models
known as worm-like chain models have been set forth in which one evaluates the
partition function associated with different polymer configurations, where each
such configuration is appropriately weighted by its corresponding elastic energy
cost.
Force-extension for worm-like chain model
The Worm-like Chain Model Describes the Force-Extension Properties of Macromolecules
The competition between chain entropy and bending energy is captured by
the partition function, which for the worm like chain model reads
\[
Z = \int \mathcal{D}\mathbf{t}(s) \, \exp\left( -\frac{\xi_P}{2} \int_0^L \left| \frac{d\mathbf{t}}{ds} \right|^2 ds \right).
\tag{6.53}
\]
Figure 6.25: Cartoon illustrating the two competing pieces of physics present in
the worm-like chain model. RP: Somehow we have to capture both the entropy
and elastic cost.
Figure 6.26: Data for force vs extension for double stranded DNA illustrating
the distinction between the freely-jointed chain model and the worm-like chain
model. Adapted from Cocco et al. (2002).
The mathematics expressed by the above equation translates into a simple algorithm:
1. draw a curve of length L representing a possible DNA configuration,
2. evaluate its bending energy, E_bend = (ξ_P k_B T/2) ∫_0^L |dt/ds|^2 ds, and the corresponding Boltzmann factor exp(−E_bend/k_B T), and
3. repeat 1. and 2. for all possible curves and sum all the Boltzmann factors to obtain Z.
The sum over curves is the celebrated Feynman path integral, written here as ∫Dt(s), where
t(s) is the tangent vector of the curve, parametrized by the arc length s.
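Step 2 of this algorithm is easy to make concrete for a curve sampled at equally spaced points along its arc length. The sketch below is a discretized illustration under our own assumptions: the derivative dt/ds is replaced by a finite difference of successive unit tangents, and the sample curves (a straight rod and a circular arc) and the value ξ_P = 50 nm are chosen only for the example. For an arc of radius R the continuum result is E_bend/k_B T = ξ_P L/(2R^2).

```python
import numpy as np

def bending_energy_over_kt(tangents, ds, xi_p):
    """Discrete version of (xi_p / 2) * integral of |dt/ds|^2 ds.

    tangents: array of shape (M, 3) of unit tangent vectors sampled every ds.
    """
    dt_ds = np.diff(tangents, axis=0) / ds
    return 0.5 * xi_p * np.sum(dt_ds ** 2) * ds

def boltzmann_factor(tangents, ds, xi_p):
    """Statistical weight exp(-E_bend / k_B T) of one configuration."""
    return np.exp(-bending_energy_over_kt(tangents, ds, xi_p))

def arc_tangents(length, radius, n_points):
    """Unit tangents along a planar circular arc of the given radius."""
    s = np.linspace(0.0, length, n_points)
    return np.stack([np.cos(s / radius), np.sin(s / radius), np.zeros_like(s)], axis=1)

xi_p = 50.0                        # nm, illustrative persistence length
length, n_points = 100.0, 1001     # nm, number of sample points
ds = length / (n_points - 1)
straight = np.tile([0.0, 0.0, 1.0], (n_points, 1))
arc = arc_tangents(length, 50.0, n_points)
print(bending_energy_over_kt(straight, ds, xi_p), boltzmann_factor(straight, ds, xi_p))
print(bending_energy_over_kt(arc, ds, xi_p), boltzmann_factor(arc, ds, xi_p))
```

Summing such Boltzmann factors over a large sample of curves is what step 3 asks for; in practice this is done by sampling rather than by exhaustive enumeration.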
JK: In the beam chapter we have been using Estrain but I like Ebend for the beam-bending energy.
In order to compute the force-extension curve of the worm-like chain model,
we consider a force F applied to one end of the chain in the ẑ-direction. The
energy of the worm-like chain, measured in units of k_B T, then acquires an
additional term −(F/k_B T) ∫_0^L t_z ds.
This is analogous to the energy of the freely jointed chain model in the presence
of an applied force, discussed previously. For fixed F, the chain extension
z = ∫_0^L t_z ds averaged over all configurations is
\[
\langle z \rangle = \frac{1}{Z(f)} \int \mathcal{D}\mathbf{t}(s) \, z \,
\exp\left( -\frac{\xi_P}{2} \int_0^L \left| \frac{d\mathbf{t}}{ds} \right|^2 ds + f \int_0^L t_z \, ds \right),
\tag{6.54}
\]
where Z(f ) is the partition function in the presence of the applied force F =
f kB T . Note that the reduced force f has units of inverse length. Eqn. (6.54)
can be rewritten as
\[
\langle z \rangle = \frac{d \ln Z(f)}{df},
\tag{6.55}
\]
owing to the fact that z is the thermodynamic conjugate of the force. The
problem of computing the force extension curve therefore reduces to computing
the partition function Z(f ). This is a rather difficult mathematical problem,
but it greatly simplifies in the small and large force limits.
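The relation in eqn. (6.55) can be checked numerically on a model whose partition function is known in closed form. The sketch below uses the three-dimensional freely jointed chain as a stand-in (the worm-like chain $Z(f)$ has no such closed form): for $N$ segments of length $b$, $Z(f) = [4\pi \sinh(fb)/(fb)]^N$, and differentiating $\ln Z$ should reproduce the Langevin-function extension. The function names, the step size `h`, and the parameter values are illustrative assumptions.

```python
import math

def ln_z_fjc(f, n_seg, b):
    """ln Z for a 3D freely jointed chain at reduced force f = F / kBT."""
    x = f * b
    return n_seg * math.log(4.0 * math.pi * math.sinh(x) / x)

def extension_from_ln_z(ln_z, f, h=1e-6):
    """<z> = d ln Z / df, estimated with a central finite difference."""
    return (ln_z(f + h) - ln_z(f - h)) / (2.0 * h)

def langevin_extension(f, n_seg, b):
    """Closed-form FJC extension: N b (coth(f b) - 1 / (f b))."""
    x = f * b
    return n_seg * b * (1.0 / math.tanh(x) - 1.0 / x)

f, n_seg, b = 0.01, 100, 100.0  # 1/nm, segments, nm (assumed values)
numeric = extension_from_ln_z(lambda x: ln_z_fjc(x, n_seg, b), f)
print(numeric, langevin_extension(f, n_seg, b))
```

The two printed numbers should agree to within the finite-difference error, which illustrates why knowing $Z(f)$ is sufficient to obtain the whole force-extension curve.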
First we examine the small force limit defined by $f \xi_P \ll 1$; for dsDNA (using $k_B T \approx 4.1$ pN nm and $\xi_P \approx 50$ nm) this becomes $F \ll k_B T/\xi_P \approx 0.08$ pN, a small force indeed. In this limit the partition function can be expanded in powers of $f \xi_P$
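$$Z(f) = \int \mathcal{D}\mathbf{t}(s) \, \exp\left[ -\frac{\xi_P}{2} \int_0^L \left| \frac{d\mathbf{t}}{ds} \right|^2 ds \right] \left[ 1 + f \int_0^L t_z(s) \, ds + \frac{f^2}{2} \int_0^L t_z(s) \, ds \int_0^L t_z(u) \, du + O\left( (f \xi_P)^3 \right) \right] \tag{6.56}$$
and we can safely retain only the first three terms in the expansion. The approximate expression that we get in this way can be rewritten as:
$$Z(f) = Z(0) \left[ 1 + f \int_0^L \langle t_z(s) \rangle_0 \, ds + \frac{f^2}{2} \int_0^L \int_0^L ds \, du \, \langle t_z(s) t_z(u) \rangle_0 \right]. \tag{6.57}$$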
Here $\langle \cdots \rangle_0$ is the average evaluated using the zero-force partition function, eqn. (6.53). The second term does not contribute since the average extension in the zero-force case is zero. The third term in the expansion gives a non-vanishing contribution which can be evaluated with the help of the tangent-tangent correlation function $\langle \mathbf{t}(s) \cdot \mathbf{t}(u) \rangle_0 = \exp(-|s - u|/\xi_P)$; see eqn. (??) in chapter 5. Since $\mathbf{t}(s) \cdot \mathbf{t}(u) = t_x(s) t_x(u) + t_y(s) t_y(u) + t_z(s) t_z(u)$, and since the worm-like chain energy is invariant under rotations, $\langle t_z(s) t_z(u) \rangle_0 = \frac{1}{3} \langle \mathbf{t}(s) \cdot \mathbf{t}(u) \rangle_0$ follows.
Replacing this into eqn. (6.57), and using $\int_0^L \int_0^L ds \, du \, \exp(-|s - u|/\xi_P) = 2 L \xi_P$, which holds in the $L \gg \xi_P$ limit, gives
$$Z(f) = Z(0) \left[ 1 + \frac{f^2 L \xi_P}{3} \right]. \tag{6.58}$$
Finally, making use of the relation given in eqn. (6.55) we arrive at
$$\frac{\langle z \rangle}{L} = \frac{2 f \xi_P}{3}, \tag{6.59}$$
the same as for the freely jointed chain, taking the length of the Kuhn segment to be twice the persistence length.
(JK: Both the tangent-tangent corr. and the end-to-end distance formulas for WLC should appear before this. The first in the beam-theory chapter, the second in the random walks and structure chapter.)
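The double integral used in this step can be checked directly. The sketch below compares a brute-force midpoint-rule evaluation of $\int_0^L \int_0^L ds \, du \, \exp(-|s-u|/\xi_P)$ with its closed form $2\xi_P [L - \xi_P (1 - e^{-L/\xi_P})]$ (obtained by elementary integration) and with the long-chain limit $2L\xi_P$. The grid size `n` and the lengths are illustrative assumptions, with $\xi_P = 50$ nm assumed for dsDNA.

```python
import math

def corr_integral_exact(length, xi_p):
    """Closed form of int_0^L int_0^L exp(-|s-u|/xi_p) ds du."""
    return 2.0 * xi_p * (length - xi_p * (1.0 - math.exp(-length / xi_p)))

def corr_integral_midpoint(length, xi_p, n=400):
    """Brute-force midpoint-rule estimate on an n x n grid."""
    h = length / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * h
        for j in range(n):
            u = (j + 0.5) * h
            total += math.exp(-abs(s - u) / xi_p)
    return total * h * h

def low_force_extension(f, xi_p):
    """Relative extension <z>/L in the small-force limit, eqn. (6.59)."""
    return 2.0 * f * xi_p / 3.0

XI_P = 50.0  # nm, assumed
for length in (500.0, 16000.0):  # nm; a short chain and a long one
    ratio = corr_integral_exact(length, XI_P) / (2.0 * length * XI_P)
    print(length, ratio)  # approaches 1 as L becomes much larger than xi_p

print(corr_integral_midpoint(500.0, XI_P), corr_integral_exact(500.0, XI_P))
```

The ratio printed for the shorter chain shows how large the finite-length correction still is when $L$ is only ten persistence lengths.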
Next, we consider the limit of very large forces ($f \xi_P \gg 1$) which, as remarked earlier, for dsDNA corresponds to stretching forces in excess of 0.08 pN. In this limit chain configurations that contribute appreciably to the partition sum $Z(f)$ have their tangent vectors point roughly in the direction of the force. In other words the components of the tangent vector in the $\hat{x}$ and $\hat{y}$ directions can be considered small, and
$$\mathbf{t} = \left( t_x, \; t_y, \; 1 - \frac{1}{2} (t_x^2 + t_y^2) \right). \tag{6.60}$$
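This approximate expression for the tangent vector turns the formula for the bending energy into a quadratic form in $t_x$ and $t_y$,
$$E_{bend} = \frac{\xi_P k_B T}{2} \int_0^L ds \left[ \left( \frac{dt_x}{ds} \right)^2 + \left( \frac{dt_y}{ds} \right)^2 \right] + \frac{f k_B T}{2} \int_0^L ds \left( t_x^2 + t_y^2 \right) - f k_B T L. \tag{6.61}$$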
For quadratic energies such as this the path integral for the partition function
turns into a Gaussian integral which can be evaluated explicitly.
Using eqn. (6.60) the average extension in the large force limit can be written as
$$\langle z \rangle = L - \frac{1}{2} \int_0^L ds \, \langle t_x^2 + t_y^2 \rangle \tag{6.62}$$
where $\langle \cdots \rangle$ is the average with respect to $Z(f)$. To compute $\langle t_x^2 + t_y^2 \rangle$ we first express the bending energy functional in terms of the Fourier components of the tangent vector,
$$t_i(s) = \sum_w e^{iws} \, t_i(w) \quad (i = x, y), \tag{6.63}$$
where the frequencies are $w = 2\pi j/L$ with $j$ an integer. In Fourier space the bending energy takes on the form of the potential energy of a collection of harmonic oscillators, two for each value of the frequency $w$:
$$E_{bend} = \frac{L k_B T}{2} \sum_w (\xi_P w^2 + f) \left[ t_x^2(w) + t_y^2(w) \right]. \tag{6.64}$$
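The passage from the real-space quadratic form of eqn. (6.61) to the mode sum of eqn. (6.64) can be checked on a simple test curve. The sketch below is an illustration under stated assumptions: it takes a single transverse component $t_x(s)$ built from two Fourier modes (with $t_y = 0$), reads $t_x^2(w)$ as $|t_x(w)|^2$ since the Fourier components are complex, omits the constant $-f k_B T L$, and works in units of $k_B T$. The amplitudes, the grid size `N`, and the parameter values are arbitrary choices.

```python
import cmath
import math

def fourier_components(samples):
    """t(w_j) = (1/N) sum_k t(s_k) exp(-i w_j s_k) for j = -N/2 .. N/2 - 1."""
    n = len(samples)
    comps = {}
    for j in range(-(n // 2), n // 2):
        acc = 0j
        for k, val in enumerate(samples):
            acc += val * cmath.exp(-2j * math.pi * j * k / n)
        comps[j] = acc / n
    return comps

def energy_real_space(t_vals, dt_vals, length, xi_p, f):
    """(xi_p/2) int (dt/ds)^2 ds + (f/2) int t^2 ds on a periodic grid."""
    h = length / len(t_vals)
    return h * sum(0.5 * xi_p * d * d + 0.5 * f * t * t
                   for t, d in zip(t_vals, dt_vals))

def energy_fourier_space(t_vals, length, xi_p, f):
    """(L/2) sum_w (xi_p w^2 + f) |t(w)|^2, the mode sum of eqn. (6.64)."""
    total = 0.0
    for j, c in fourier_components(t_vals).items():
        w = 2.0 * math.pi * j / length
        total += (xi_p * w * w + f) * abs(c) ** 2
    return 0.5 * length * total

L_NM, XI_P, F_RED, N = 1000.0, 50.0, 0.1, 64   # assumed values
W3, W5 = 2 * math.pi * 3 / L_NM, 2 * math.pi * 5 / L_NM
s_grid = [k * L_NM / N for k in range(N)]
tx = [0.1 * math.cos(W3 * s) + 0.05 * math.sin(W5 * s) for s in s_grid]
dtx = [-0.1 * W3 * math.sin(W3 * s) + 0.05 * W5 * math.cos(W5 * s)
       for s in s_grid]

print(energy_real_space(tx, dtx, L_NM, XI_P, F_RED),
      energy_fourier_space(tx, L_NM, XI_P, F_RED))
```

Each mode contributes independently to the energy, which is what makes the oscillator analogy and the equipartition argument that follows possible.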
This observation allows us to compute the average $\langle t_i^2(w) \rangle$ without explicitly computing the path integral. Namely, we recall the equipartition theorem of equilibrium statistical mechanics, which states that the average energy for every quadratic degree of freedom is $k_B T/2$. Therefore,
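$$\frac{L k_B T}{2} (\xi_P w^2 + f) \langle t_i^2(w) \rangle = \frac{k_B T}{2} \quad (i = x, y), \tag{6.65}$$
and
$$\frac{\langle z \rangle}{L} = 1 - \frac{1}{L} \sum_w \frac{1}{\xi_P w^2 + f} \tag{6.66}$$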
follows immediately from eqn. (6.62) if we replace the integral in that equation by a sum over Fourier modes. Now the sum in eqn. (6.66) can be evaluated by taking a continuum limit $\sum_w \to \frac{L}{2\pi} \int_{-\infty}^{+\infty} dw$; the remaining integral over $w$ gives
$$\frac{\langle z \rangle}{L} = 1 - \frac{1}{2 \sqrt{f \xi_P}}. \tag{6.67}$$
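The continuum limit taken in the last step can be tested by summing the modes explicitly. The sketch below evaluates $(1/L) \sum_w 1/(\xi_P w^2 + f)$ over $w = 2\pi j/L$ for $|j| \le j_{max}$ and compares it with $1/(2\sqrt{f \xi_P})$. The cutoff `j_max`, the contour length (roughly that of λ-phage DNA), and the reduced force are illustrative assumptions, as is $\xi_P = 50$ nm.

```python
import math

def mode_sum_over_L(f, xi_p, length, j_max=200_000):
    """(1/L) * sum over w = 2*pi*j/L, |j| <= j_max, of 1 / (xi_p w^2 + f)."""
    total = 1.0 / f                      # the j = 0 mode
    for j in range(1, j_max + 1):
        w = 2.0 * math.pi * j / length
        total += 2.0 / (xi_p * w * w + f)  # modes +j and -j contribute equally
    return total / length

def continuum_limit(f, xi_p):
    """(1/(2*pi)) * int dw / (xi_p w^2 + f) = 1 / (2 sqrt(f xi_p))."""
    return 1.0 / (2.0 * math.sqrt(f * xi_p))

def high_force_extension(f, xi_p):
    """Relative extension <z>/L in the large-force limit, eqn. (6.67)."""
    return 1.0 - continuum_limit(f, xi_p)

XI_P = 50.0        # nm, assumed
LENGTH = 16000.0   # nm, assumed contour length
F_RED = 0.25       # 1/nm, roughly 1 pN at room temperature

print(mode_sum_over_L(F_RED, XI_P, LENGTH), continuum_limit(F_RED, XI_P))
print(high_force_extension(F_RED, XI_P))
```

The discrete sum and the integral agree closely whenever $L \sqrt{f/\xi_P} \gg 1$, which is the regime in which the mode spacing $2\pi/L$ is small compared with the width of the Lorentzian in $w$.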
(JK: Need a better intuitive explanation of why the extension for the WLC model is greater than that of the FJC model at large forces. Also, need to supplement with some numbers. How high a force before the model breaks down? Add bond stretching effects?)
Unlike the low-force limit, the result in eqn. (6.67) is very different from the freely jointed chain result for the force-extension curve, which predicts a $1/(f \xi_P)$ approach to full extension $z = L$. The slower approach to full extension in this case is a consequence of the finite bending rigidity at scales below the Kuhn length in the worm-like chain model. For intermediate forces there is no closed form expression for the extension, but Marko and Siggia proposed a very successful formula that interpolates between the two limits:
$$f \xi_P \approx \frac{z}{L} + \frac{1}{4 (1 - z/L)^2} - \frac{1}{4}. \tag{6.68}$$
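The interpolation formula gives the force as a function of extension, so obtaining the extension at a given force requires a numerical inversion. The sketch below is one possible way to do this, using bisection (the right-hand side of eqn. (6.68) increases monotonically with $z/L$). It also evaluates the freely jointed chain result with Kuhn length $2\xi_P$ for comparison. The values $k_B T = 4.1$ pN nm and $\xi_P = 50$ nm, and all function names, are assumptions made for illustration.

```python
import math

KB_T_PN_NM = 4.1  # pN nm, assumed thermal energy at room temperature
XI_P_NM = 50.0    # nm, assumed persistence length of dsDNA

def reduced_force_times_xi(force_pn, kbt=KB_T_PN_NM, xi_p=XI_P_NM):
    """Dimensionless force f * xi_p = F * xi_p / kBT."""
    return force_pn * xi_p / kbt

def ms_reduced_force(x):
    """Right-hand side of eqn. (6.68): f * xi_p at extension x = z/L < 1."""
    return x + 0.25 / (1.0 - x) ** 2 - 0.25

def ms_extension(f_xi, tol=1e-12):
    """Invert eqn. (6.68) for z/L by bisection on [0, 1); needs f_xi >= 0."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ms_reduced_force(mid) < f_xi:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def fjc_extension(f_xi):
    """FJC z/L with Kuhn length 2 * xi_p: coth(2 f xi_p) - 1 / (2 f xi_p)."""
    y = 2.0 * f_xi
    return 1.0 / math.tanh(y) - 1.0 / y

for force_pn in (0.01, 0.1, 1.0, 10.0):
    f_xi = reduced_force_times_xi(force_pn)
    print(force_pn, ms_extension(f_xi), fjc_extension(f_xi))
```

At small $f \xi_P$ the inverted formula reduces to eqn. (6.59) and at large $f \xi_P$ to eqn. (6.67), while the freely jointed chain column reaches full extension noticeably faster at high force.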
6.6
The Random Walk Reexamined
6.7
Further Reading
Benedek G. B. and Villars F. M. H., Physics With Illustrative Examples
from Medicine and Biology: Statistical Physics, Springer-Verlag, Inc.,
New York: New York, 2000.
Berg H., Random Walks in Biology, Princeton University Press, Princeton:
New Jersey, 1993.
Doi M., Introduction to Polymer Physics, Oxford University Press, Oxford:
England, 1995.
Doi M. and Edwards S. F., The Theory of Polymer Dynamics, Oxford
University Press, Oxford: England, 1986.
RP: why this model is in the
hall of fame
Rensberger B., Life Itself, Oxford University Press, New York: New York,
1996. Rensberger’s book is a sophisticated popularization of some of the recent
developments in molecular biology.
Chandrasekhar article
Brush article
Bray D., Cell Movements: From Molecules to Motility, Garland Publishing, New York: New York, 2001.
de Gennes P.-G., Scaling Concepts in Polymer Physics, Cornell University
Press, Ithaca: New York, 1979.
Lubensky D. K. and Nelson D. R., Pulling Pinned Polymers and Unzipping
DNA, Phys. Rev. Lett., 85, 1572 (2000).
Du R., Grosberg A. Y. and Tanaka T., Random Walks in the Space of Conformations of Toy Proteins, Phys. Rev. Lett., 84, 1828 (2000).
Raposo E. P., de Oliveira S. M., Nemirovsky A. M. and Coutinho-Filho M. D.,
Random walks: A pedestrian approach to polymers, critical phenomena, and
field theory, Am. J. Phys., 59, 633 (1991).
Howard J., Mechanics of Motor Proteins and the Cytoskeleton, Sinauer
Associates, Inc., Sunderland: Massachusetts, 2001.
Wang M. D., Manipulation of single molecules in biology, Curr. Opinion Biotech.,
10, 81 (1999).
Grandbois M., Beyer M., Rief M., Clausen-Schaumann H. and Gaub H. E., How
Strong is a Covalent Bond?, Science, 283, 1727 (1999).
RP: the title of this paper is sort of the mantra for this section
Stuart J. K. and Hlady V., Reflection Interference Contrast Microscopy Combined with Scanning Force Microscopy Verifies the Nature of Protein-Ligand Interaction Force Measurements, Biophys. J., 76, 500 (1999).
Carrion-Vazquez M., Marszalek P. E., Oberhauser A. F. and Fernandez J. M.,
Atomic force microscopy captures length phenotypes in single proteins, Proc.
Natl. Acad. Sci., 96, 11288 (1999).
Carrion-Vazquez M., Oberhauser A. F., Fowler S. B., Marszalek P. E., Broedel
S. E., Clarke J. and Fernandez J. M., Mechanical and chemical unfolding of a
single protein: A comparison, Proc. Natl. Acad. Sci., 96, 3694 (1999).
Mehta A. D., Rief M. and Spudich J. A., Biomechanics, One Molecule at a
Time, Journal of Biol. Chem., 274, 14517 (1999).
Gittes F. and Schmidt C. F., Thermal noise limitations on micromechanical
experiments, Eur. Biophys. J., 27, 75 (1998).
Beyer M. K., The mechanical strength of a covalent bond calculated by density
functional theory, J. Chem. Phys., 112, 7307 (2000).
Evans E. and Ritchie K., Strength of a Weak Bond Connecting Flexible Polymer
Chains, Biophys. J., 76, 2439 (1999).
Lebrun A., Shakked Z. and Lavery R., Local DNA stretching mimics the distortion caused by the TATA box-binding protein, Proc. Natl. Acad. Sci., 94, 2993
(1997).
Klimov D. K. and Thirumalai D., Stretching single-domain proteins: Phase diagram and kinetics of force-induced unfolding, Proc. Natl. Acad. Sci., 96, 6166
(1999).
Baumann C. G., Bloomfield V. A., Smith S. B., Bustamante C., Wang M. D.
and Block S. M., Stretching of Single Collapsed DNA Molecules, Biophys. J.,
78, 1965 (2000).
Kellermayer M. S. Z., Smith S. B., Granzier H. L. and Bustamante C., Folding-Unfolding Transitions in Single Titin Molecules Characterized with Laser Tweezers, Science, 276, 1112 (1997).
Cluzel P., Lebrun A., Heller C., Lavery R., Viovy J.-L., Chatenay D. and Caron
F., DNA: An Extensible Molecule, Science, 271, 792 (1996).
Bustamante C., Marko J. F., Siggia E. D. and Smith S., Entropic Elasticity of
λ-Phage DNA, Science, 265, 1599 (1994).
Marko J. F. and Siggia E. D., Fluctuations and Supercoiling of DNA, Science,
265, 506 (1994).
Smith S. B., Cui Y. and Bustamante C., Overstretching B-DNA: The Elastic
Response of Individual Double-Stranded and Single-Stranded DNA Molecules,
Science, 271, 795 (1996).
Marko J. F. and Siggia E. D., Statistical mechanics of supercoiled DNA, Phys.
Rev., E52, 2912 (1995).
Li H., Rief M., Oesterhelt F., Gaub H. E., Zhang X. and Shen J., Single-molecule
force spectroscopy on polysaccharides by AFM-nanomechanical fingerprint of α(1,4)-linked polysaccharides, Chem. Phys. Lett., 305, 197 (1999).
Marko J. F., Twist and shout (and pull): Molecular chiropractors undo DNA,
Proc. Natl. Acad. Sci., 94, 11770 (1997).
Erickson H. P., Stretching Single Protein Molecules: Titin is a Weird Spring,
Biophys. J., 276, 1090 (RP).
Leckband D., Measuring the Forces that Control Protein Interactions, Annu.
Rev. Biophys. Struct., 29, 1 (2000).
Linke W. A. and Granzier H., A Spring Tale: New Facts on Titin Elasticity,
Biophys. J., 75, 2613 (1998).
Meiners J.-C. and Quake S. R., Femtonewton Force Spectroscopy of Single Extended DNA Molecules, Phys. Rev. Lett., 84, 5014 (2000).
Rich A., The Rise of Single-Molecule DNA Biochemistry, Proc. Natl. Acad.
Sci., 95, 13999 (1998).
Essevaz-Roulet B., Bockelmann U. and Heslot F., Mechanical separation of the
complementary strands of DNA, Proc. Natl. Acad. Sci., 94, 11935 (1997).
Bockelmann U., Essevaz-Roulet B. and Heslot F., Molecular Stick-Slip Motion
Revealed by Opening DNA with Piconewton Forces, Phys. Rev. Lett., 79, 4489
(1997).
Bockelmann U., Essevaz-Roulet B. and Heslot F., DNA strand separation studied by single molecule force measurements, Phys. Rev., E58, 2386 (1998).
Marszalek P. E., Lu H., Li H., Carrion-Vazquez M., Oberhauser A. F., Schulten
K. and Fernandez J. M., Mechanical unfolding intermediates in titin modules,
Nature, 402, 100 (1999).
Allemand J.-F., Bensimon D., Lavery R. and Croquette V., Stretched and overwound DNA forms a Pauling-like structure with exposed bases, Proc. Natl.
Acad. Sci., 95, 14152 (1998).
Thompson R. E. and Siggia E. D., Physical Limits on the Mechanical Measurement of the Secondary Structure of Bio-molecules, Europhys. Lett. 31, 335
(1995).
Doyle P. S., Ladoux B. and Viovy J.-L., Dynamics of a Tethered Polymer in
Shear Flow, Phys. Rev. Lett., 84, 4769 (2000).
This paper looks at experiments like those of Chu.
Strick T. R., Allemand J.-F., Bensimon D. and Croquette V., Stress-Induced
Structural Transitions in DNA and Proteins, Ann. Rev. RP., RP, 523 (2000).
Rief M., Gautel M., Oesterhelt F., Fernandez J. M. and Gaub H., Reversible
Unfolding of Individual Titin Immunoglobulin Domains by AFM, Science 276,
1109 (1997).
Rief M., Oesterhelt F., Heymann B. and Gaub H. E., Single Molecule Force
Spectroscopy on Polysaccharides by Atomic Force Microscopy, Science, 275,
1295 (1997).
Oesterhelt F., Oesterhelt D., Pfeiffer M., Engel A., Gaub H. E. and Muller D. J.,
Unfolding Pathway of Individual Bacteriorhodopsins, Science, 288, 143 (2000).
Austin R. H., Brody J. P., Cox E. C., Duke T. and Volkmuth W., Stretch Genes,
Phys. Today, pg. 32, Feb. 1997.
Lu H. and Schulten K., The Key Event in Force-Induced Unfolding of Titin’s
Immunoglobulin Domains, Biophys. J., 79, 51 (2000).
Lu H., Isralewitz B., Krammer A., Vogel V. and Schulten K., Unfolding of Titin
Immunoglobulin Domains by Steered Molecular Dynamics Simulation, Biophys.
J., 75, 662 (1998).
Rohs R., Etchebest C. and Lavery R., Unraveling Proteins: A Molecular Mechanics Study, Biophys. J., 76, 2760 (1999).
Lavery R. and Lebrun A., Modelling DNA Stretching for Physics and Biology, in
Structural Biology and Functional Genomics edited by E. M. Bradbury
and S. Pongor, Kluwer Academic Publishers, Dordrecht: The Netherlands, 1999.
Finer J. T., Simmons R. M. and Spudich J. A., Single myosin molecule mechanics: piconewton forces and nanometre steps, Nature, 368, 113 (1994).
6.8
Problems
How Big is a Genome?
(a) Compute the entropic size of the DNA associated with a human chromosome if it is not associated with any proteins.
Entropic Cost of DNA Packing. Work out the free energy cost associated
with packing the E. coli genome inside of the bacterium assuming that the
entirety of this free energy cost is entropic. (RP: be careful about the Flory-Huggins version of this story that Ken likes to talk about). RP: also do the
problem for the virus.
Force-Extension in the Freely Jointed Chain.
Work out the force extension properties of the three dimensional freely
jointed chain, both for discrete and continuous allowed orientations.