
Statistical Physics II
sf2 Fall 2012
Helmut Schiessel
1 Introduction and reminder

1.1 The partition function
To explain what statistical physics is all about, we start from a simple example
we are all familiar with since our childhood: a balloon filled with gas. The
physical state of the gas in the balloon can be fully characterized by three
physical quantities: (1) The volume V of the balloon, that corresponds to the
volume available for the gas. (2) The pressure p that describes how hard one
has to press to compress the gas. (3) The temperature T of the gas. It has
long been known that these three quantities are related to each other.
Robert Boyle found in 1662 that when a fixed amount of gas is kept at
constant temperature, the pressure and volume are inversely proportional,
i.e., p ∼ 1/V . Jacques Charles found in the 1780s that if the pressure of a
fixed amount of gas is held constant, then the volume is proportional to the
temperature, i.e., V ∼ T . And finally Joseph Louis Gay-Lussac stated in 1802
that the pressure of a fixed amount of gas in a fixed volume is proportional to
the temperature, i.e., p ∼ T . You can easily check that these three laws are all
fulfilled if the ratio between the pressure-volume product and the temperature
is constant:
pV/T = const.   (1)
That means if we look at a gas at two different states, (p1 , V1 , T1 ) and (p2 , V2 , T2 ),
we always find p1 V1 /T1 = p2 V2 /T2 . What is that value, i.e., the value of the
constant on the right hand side (rhs) of Eq. 1? The value depends on the amount
of gas inside the balloon. An amount of gas that occupies V = 22.4 litres at
T = 0 °C = 273.15 K and atmospheric pressure, p = 1.013 bar, is called one
mole. The constant then takes the value
R = 8.31 J/(mol K)   (2)
and is called the universal gas constant. If the amount of gas in the balloon is
n moles, then the constant in Eq. 1 has the value nR.
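As a quick numerical sanity check, we can verify with the values quoted above that one mole at 0 °C and atmospheric pressure indeed gives pV/T ≈ 8.31 J/K:

```python
# Check the combined gas law, Eq. 1, with the one-mole values quoted above.
p = 1.013e5     # pressure in Pa (1.013 bar)
V = 22.4e-3     # volume in m^3 (22.4 litres)
T = 273.15      # temperature in K (0 degrees Celsius)
R = p * V / T   # should come out close to the universal gas constant
print(R)        # close to 8.31
```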
Equation 1, the so-called combined gas law, is an example of an empirical law
that relates measurable physical quantities. Statistical physics is the theoretical
framework that allows us to derive such laws from first principles. This is
quite a daunting task. A gas is a collection of a huge number of particles. We
nowadays know that one mole of gas contains NA = 6.02 × 10^23 particles
(independent of the type of gas chosen: normal air, helium, etc.), where NA
is called the Avogadro
constant. This is a rather mind-blowing fact: a balloon which contains one mole
of particles can be fully characterized by three so-called macroscopic variables,
p, V and T , yet it has a myriad of microstates characterized by the positions
and velocities (in the X-, Y - and Z-directions) of 6 × 10^23 gas molecules!
Let us try to derive Eq. 1 from the microscopic structure of the gas. This will
serve as a concrete example to introduce the methods of statistical physics. As
a first step we introduce the very high-dimensional phase space of our system,
which contains the positions and momenta of all the N particles. A point in phase
space is given by (x1, y1, z1, ..., xN, yN, zN, p1^x, p1^y, p1^z, ..., pN^x, pN^y, pN^z), where e.g.
yi and pi^y denote the position and momentum of the ith particle, both in the
Y -direction. In short-hand notation we can write (q, p) for such a point in
phase space where q is a high dimensional vector that contains all the positions
and p all the momenta of the N particles. The amazing thing is that as the
gas molecules move inside the balloon and bounce off its surface, i.e., as the
point (q, p) races through phase space, we cannot see anything happen to
the balloon in our hands, which remains quiet at constant pressure, volume and
temperature.
This suggests the following: To a given macrostate characterized by the
triplet (p, V, T ) there is a myriad of microstates, each characterized by a
high-dimensional vector (q, p). But not all possible microstates should have the
same probability. Just as it is highly unlikely (but in principle not impossible)
to throw a die one billion times and find a six each time, so it is highly unlikely
that at a certain point in time all 6 × 10^23 particles are in the left half of the
balloon. We thus need to introduce the concept of probabilities by assigning to
each microstate (q, p) a probability ρ = ρ(q, p).
We now present a line of argument that allows us to determine the form of ρ,
namely Eq. 5 below. Please be warned that even though each of the steps looks
rather compact, it is not easy to grasp them entirely. At this stage you might
rather consider this as a rough outline, providing you with a general view of
things and allowing you to quickly get to something concrete to work with.
You do not have to feel too uncomfortable with this, since we shall later on
provide a completely different argument that again leads us to Eq. 5.
In Fig. 1 we show the balloon with its gas molecules; we call this whole
system Σ. We consider now two subsystems, Σ1 and Σ2, namely the molecules
to the left and to the right of a virtual dividing plane as indicated in the figure by
a dashed line. Real gas molecules have a very short range of interaction that is
much shorter than the diameter of the balloon. This means that only a very tiny
fraction of the molecules in Σ1 feel molecules from Σ2 and vice versa. Therefore,
to a good approximation, the two subsystems can be considered as independent
from each other. We can thus separately, for each subsystem, define probability
densities ρ1 and ρ2 – without going here further into mathematical details. Now
since Σ1 and Σ2 are independent the probability of the whole system is simply
the product of the probabilities of its subsystems, ρ = ρ1 ρ2 (just as for two dice:
the probability to throw a 6 is 1/6 for each die, and the probability
that both dice yield a 6 is 1/6 × 1/6 = 1/36). Using the functional property of
Figure 1: A balloon filled with gas molecules. The virtual line divides the whole
balloon (system Σ) into two subsystems, its left half, Σ1 , and its right half, Σ2 .
To a good approximation these two halves are independent from each other, as
mathematically expressed in Eq. 3.
the (natural) logarithm, ln ab = ln a + ln b, this can be rewritten as:
ln ρ = ln ρ1 + ln ρ2.   (3)
This is one of the conditions that ρ needs to fulfill. A second one is the
following. Here and in the rest of this lecture we are considering systems in
equilibrium. What we mean by this is that the system has evolved to a state
where nothing changes anymore. For our balloon this means that the values of
p, V and T stay constant in time (unlike e.g. a glass of water where all the water
evaporates if you wait long enough). Likewise microscopically nothing changes
anymore, i.e., the function ρ = ρ (q, p, t) does not explicitly depend on time but
is of the form ρ = ρ (q, p), as we had written it in the first place. In other words
∂ρ/∂t = 0.   (4)
Amazingly Eqs. 3 and 4 are enough to determine ρ. We know from Eq. 4
that ρ is a conserved quantity, meaning a quantity that does not change in time.
ρ must thus be a function of a conserved physical quantity. Possible candidate
quantities are: (a) the total energy H of the system, (b) its total momentum
P and (c) the particle number N (for different types of particles the numbers
Nα of each type). Most systems are confined by walls (e.g. a gas in a balloon).
Whenever a gas molecule hits the balloon, it gets reflected and thereby transmits
momentum to the balloon; thus P of the gas is not conserved. This means ρ
can only depend on H and N . From Eq. 3 we know that ln ρ is an additive
quantity, and so are the energy H, HΣ = HΣ1 + HΣ2, and the particle number N,
NΣ = NΣ1 + NΣ2. This means we know more about how ln ρ should depend on
H and N : it must be a linear function of these additive, conserved quantities.
This leaves several possibilities for ln ρ that depend on the concrete physical
situation. For the balloon the number of particles inside the balloon is fixed
since the gas molecules cannot pass through the balloon skin. However, energy
can flow in and out of the balloon in the form of heat. In that case we should
expect that ln ρ ∼ −βH where β is some constant. If in addition particles can
move in and out, one should expect that ln ρ ∼ +αN − βH with α being yet
another constant. The plus and minus signs here are just conventions and do
not mean anything since we do not yet know the signs of α and β.
Let us begin with the first case, the one with N fixed. Such a system is
called the canonical ensemble. From above we know that ρ must be of the form:
ρ(q, p) = (1/Z) e^{−βH(q,p)}   (5)
where the function H = H (q, p) is the energy of the system that depends
on the positions and momenta of all the particles. The role of the prefactor
1/Z is to normalize the probability distribution such that the sum over all
different possible states of the system adds up to one. Surprisingly, this seemingly
harmless prefactor is the whole key to understanding the properties of the system,
as we shall see below. As it turns out to be so important, it should not surprise
you that it has a name: the partition function. We need to choose Z such that

(1/(N! h^{3N})) ∫ ρ(q, p) d^{3N}q d^{3N}p = 1   (6)

and hence

Z = (1/(N! h^{3N})) ∫ e^{−βH(q,p)} d^{3N}q d^{3N}p.   (7)
The prefactor 1/N !h3N in front of the integrals in Eqs. 6 and 7 seems to be
an unnecessary complication in the notation and needs some explanation. Let us
start with the factor 1/N !. Here N ! is the number of possible ways one could
number the N particles (we pick a particle and give it a number between 1
and N , then pick a second particle and give it one of the N − 1 remaining tags,
and so on). If the microscopic world behaved classically (like the macroscopic
world we are used to living in), we could give each of the N gas molecules such
an individual tag and follow its course in time. That way the two configurations
shown in Fig. 2(a) are different from each other, since particles 1 and 2 are
exchanged. However, the microscopic world of these particles is governed by
the laws of quantum mechanics. One of these laws is that identical particles are
indistinguishable; in other words, the two configurations shown in Fig. 2(a) are
identical and belong to exactly the same physical state, the one shown to the
right, Fig. 2(b). When performing the integrals ∫ d^{3N}q d^{3N}p in Eqs. 6 and 7 one
would encounter each such configuration N ! times. The prefactor 1/N ! prevents
this overcounting.
Figure 2: A balloon with N = 5 identical gas molecules. (a) In classical mechanics we can number the particles individually allowing us to distinguish between
the configurations shown in the top and in the bottom. (b) In quantum mechanics identical particles are indistinguishable, which means that the two states that
are shown on the left are one and the same, namely the configuration depicted
at the right. This quantum-mechanical law is the cause of the 1/N ! factor in Eq. 7.
Next we discuss the factor h3N . This factor is introduced to make Z dimensionless, i.e., no matter what units we use (e.g. meters or inches for length) Z is
always the same. h is a quantity with the dimensions of length times momentum
(or equivalently energy times time), namely
h = 6.626 × 10^−34 J s.   (8)
Even though this choice seems arbitrary from the viewpoint of classical mechanics,
it can be motivated as the most natural choice in the realm of quantum
mechanics. The quantity h is the so-called Planck constant that appears in a
famous relation in quantum mechanics: it is impossible to measure the position
and momentum of a particle beyond a certain precision. According to
Heisenberg's uncertainty principle the uncertainty in position, ∆x, and
in momentum, ∆px, both in X-direction, obey the relation ∆x ∆px ≥ h/4π
(more precisely, ∆x and ∆px are the standard deviations found when the
measurements are repeated again and again under identical conditions). So if one
measures the position of a particle very precisely, there is a large uncertainty in
its momentum and vice versa, a consequence of the particle-wave duality that
we shall not discuss here further. Because of this it makes sense to divide our
6N-dimensional phase space into small hypercubes of volume h^{3N}, which explains
the choice of the prefactor in Eq. 7.
To give a concrete example we calculate the partition function of the gas in
the balloon, Fig. 1. Before we can start to evaluate the integral, Eq. 7, we need
to have an expression for the energy of the gas, H = H (q, p), its Hamiltonian.
We consider here an ideal gas, an idealization of a real gas. In this model the
interaction between different gas molecules is neglected altogether. This turns
out to be an excellent approximation for most gases since the concentration of
gas molecules is so low that they hardly ever feel each other’s presence. This
means that the energy is independent of the distribution of the molecules in
space, i.e., H = H (q, p) does not depend on q. This leaves us just with the
kinetic energy of the particles (assumed here to all have the same mass m):
H = H(p) = Σ_{i=1}^N p_i²/(2m) = (1/(2m)) Σ_{i=1}^N [(p_i^x)² + (p_i^y)² + (p_i^z)²]   (9)
with p_i = |p_i| being the length of the momentum vector p_i = (p_i^x, p_i^y, p_i^z).
Plugging this into Eq. 7 we realize that we have Gaussian integrals. The momentum
integration of each of the 3 components of each particle gives a factor √(2πm/β).
In addition each particle is allowed to move within the whole balloon so that its
position integration gives a factor V . Altogether this leads to
Z = (V^N/(N! h^{3N})) ∫ e^{−(β/2m) Σ_{i=1}^N p_i²} d^{3N}p = (V^N/(N! h^{3N})) (2πm/β)^{3N/2}.   (10)
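The one-dimensional Gaussian momentum integral used here is easy to check numerically; the following sketch, with illustrative values for β and m, confirms that each component contributes a factor √(2πm/β):

```python
import math

# Numerical check of the Gaussian integral behind Eq. 10: the integral of
# exp(-beta p^2 / 2m) over one momentum component equals sqrt(2 pi m / beta).
# beta and m are illustrative values, not tied to any real gas.
beta, m = 2.0, 1.5
dp = 1e-3
s = sum(math.exp(-beta * p * p / (2 * m)) * dp
        for p in (i * dp for i in range(-20000, 20001)))
exact = math.sqrt(2 * math.pi * m / beta)
print(s, exact)   # the Riemann sum matches the closed form
```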
It is customary to introduce a quantity called the thermal de Broglie wavelength

λT = h √(β/(2πm))   (11)
that allows us to write the partition function Z of the ideal gas very compactly:
Z = (1/N!) (V/λT³)^N.   (12)
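To get a feeling for the numbers in Eq. 11: anticipating the identification β = 1/(kB T) derived below (Eq. 24), one can evaluate λT for a concrete gas. The helium mass used here is our own illustrative input, not a value from the text:

```python
import math

# Thermal de Broglie wavelength, Eq. 11, with beta = 1/(kB T) (cf. Eq. 24).
# The helium mass is an illustrative input of our own.
h  = 6.626e-34   # Planck constant in J s, Eq. 8
kB = 1.38e-23    # Boltzmann constant in J/K, Eq. 25
m  = 6.65e-27    # mass of a helium atom in kg (assumed value)
T  = 293.0       # room temperature in K
lam = h * math.sqrt(1.0 / (2 * math.pi * m * kB * T))
print(lam)       # a fraction of an angstrom, far below the interparticle distance
```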
We introduced the partition function in Eq. 5 merely as a prefactor necessary
to normalize the probability distribution, but we already mentioned that one
can derive from Z almost everything one would like to know about the
macroscopic system. As a first example we show now that knowing Z means
that it is straightforward to determine E = ⟨H⟩, the average energy of the
system:
⟨H⟩ = [∫ H(q, p) e^{−βH(q,p)} d^{3N}q d^{3N}p] / [∫ e^{−βH(q,p)} d^{3N}q d^{3N}p]
    = (1/Z) (1/(N! h^{3N})) ∫ H(q, p) e^{−βH(q,p)} d^{3N}q d^{3N}p.   (13)
Here the denominator is necessary to normalize the canonical distribution
and is, of course, again proportional to the partition function. It seems at first
that the integral on the rhs of Eq. 13 needs to be evaluated all over again.
However, the beauty of the partition function Z is that it is of such a form that
it allows expressions such as Eq. 13 to be obtained from it by straightforward
differentiation. You can easily convince yourself that one has
E = ⟨H⟩ = −∂ ln Z/∂β.   (14)
The differentiation of the ln-function produces the 1/Z prefactor on the rhs of
Eq. 13, and the form of its integrand, H e^{−βH}, follows simply from the
differentiation −∂e^{−βH}/∂β. This means all the hard work lies in calculating Z through
a high-dimensional integral, Eq. 7. Once this is done, the harvest consists of
straightforward differentiation as in Eq. 14.
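This recipe can be illustrated numerically: a finite-difference derivative of ln Z from Eq. 10, with arbitrary illustrative parameter values, reproduces the ideal-gas result 3N/(2β):

```python
import math

# Finite-difference check of Eq. 14 for the ideal gas: ln Z from Eq. 10
# gives E = -d(ln Z)/d(beta) = 3N/(2 beta). Parameter values are illustrative.
N, m, V, h = 10, 1.0, 1.0, 1.0

def ln_Z(beta):
    # ln of Eq. 10: V^N / (N! h^{3N}) * (2 pi m / beta)^{3N/2}
    return (N * math.log(V) - math.lgamma(N + 1) - 3 * N * math.log(h)
            + 1.5 * N * math.log(2 * math.pi * m / beta))

beta, eps = 2.0, 1e-6
E = -(ln_Z(beta + eps) - ln_Z(beta - eps)) / (2 * eps)
print(E, 3 * N / (2 * beta))   # the two agree
```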
We can also calculate the variance of the energy fluctuations of the gas.
These fluctuations result from the exchange of heat with the surrounding air
outside the balloon that constitutes a so-called heat bath. This quantity is
σE² = ⟨H²⟩ − ⟨H⟩² and follows simply by differentiating ln Z twice:

σE² = ∂² ln Z/∂β² = −(∂/∂β)⟨H⟩
    = ⟨H²⟩ − Z⟨H⟩ (∂/∂β)(1/Z) = ⟨H²⟩ − ⟨H⟩².   (15)

To arrive at the second line we used Eq. 13; the first term accounts for the
β-dependence inside the integral, the second for that of the Z⁻¹ prefactor.
Since we already calculated the partition function of the ideal gas, Eq. 10,
we can immediately obtain, via Eq. 14, its average energy:

E = ⟨H⟩ = 3N/(2β).   (16)
The energy is thus proportional to the particle number N , as one should expect
for non-interacting particles, and inversely proportional to the quantity β. We
still do not know the physical meaning of that quantity – even though, as we
shall soon see, it is well-known to us; we even have a sensory organ for it. For
now we can only give β a rather technical meaning: it allows us, via Eq. 16, to
set the average energy ⟨H⟩ of the gas to a given value.
We can now also calculate the typical relative deviation of the energy from
its mean value ⟨H⟩. It follows from Eqs. 10 and 15 that

σE/⟨H⟩ = √(2/3) · 1/√N.   (17)
This means that for large systems the relative fluctuations around the mean
value are so tiny that the system is, for any practical purposes, indistinguishable
from a microcanonical ensemble, a system that is thermally isolated, i.e., that
cannot exchange energy with the outside world.
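A quick numerical illustration of Eq. 17 shows how fast the relative fluctuations die out with system size:

```python
import math

# Relative energy fluctuations of the ideal gas, Eq. 17:
# sigma_E/<H> = sqrt(2/3)/sqrt(N). For a mole of gas this is of order
# 1e-12, which is why the gas is in practice indistinguishable from a
# thermally isolated (microcanonical) system.
rel = {N: math.sqrt(2.0 / (3.0 * N)) for N in (100, 1e6, 6.02e23)}
for N, r in rel.items():
    print(N, r)
```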
Our aim is now to derive an equation for the pressure of the ideal gas and to
check whether statistical mechanics allows us to derive from first principles the
combined gas law, Eq. 1. To make the analysis more convenient we put the gas
in a cylinder with a movable piston, Fig. 3, instead of a balloon. If we apply a
force f on the piston, then the pressure on it is given by p = f /A where A is the
area of the piston. The gas occupies a volume V = Al with l denoting the height
of the piston above the bottom of the cylinder. To better understand how the
gas can exert a force on the piston we add to the Hamiltonian H (q, p) a wall
Figure 3: Gas in a cylinder. The piston is under an externally imposed force
f that counterbalances the individual forces fi of the gas molecules close to
the surface of the piston. Each of these forces follows from a short ranged wall
potential u that smoothly goes to infinity as the gas molecule reaches the surface
of the piston.
potential Uwall (l, q) that depends on the positions of all the particles and on the
height l of the piston. We do not assume anything here about the form of the
Hamiltonian H (q, p) so the following formulas are general. The wall potential
Uwall (l, q) takes an infinite value if any of the molecules is outside the allowed
volume. This way the gas is forced to stay inside the cylinder. To calculate the
force exerted by the gas molecules we assume that the potential goes smoothly
to infinity over a microscopically short distance δ when a particle gets close to
the surface of the piston (for the other confining walls we simply assume that
the potential jumps right to infinity). More specifically, the wall potential is of
the form
Uwall(l, q) = Σ_{i=1}^N u(l − xi)   (18)
as long as all particles are inside the cylinder, and infinity otherwise. Most
particles are far from the surface of the piston, l − xi > δ, and thus do not feel
it, i.e., u(l − xi) = 0. But a small fraction of them are close by, l − xi < δ,
and they are pushed to the left, exerting a force on the piston. For a given
configuration of particles, q = (x1, y1, z1, ..., xN, yN, zN), this force is given by
f = −∂Uwall(l, q)/∂l = −Σ_{i=1}^N ∂u(l − xi)/∂l.   (19)
We are, however, interested in the mean force ⟨f⟩ that is given by

⟨f⟩ = (1/Z) (1/(N! h^{3N})) ∫ (−∂Uwall(l, q)/∂l) e^{−β[H(q,p)+Uwall(l,q)]} d^{3N}q d^{3N}p.   (20)
This expression might look complicated, but again it is just a simple derivative
of the partition function, namely
⟨f⟩ = (1/β) ∂ ln Z/∂l.   (21)
This is the average force that is exerted by the gas on the piston (and vice
versa). Using the relations p = f /A and V = Al we can immediately write
down the relation for the pressure:
⟨p⟩ = ⟨f⟩/A = (1/β) ∂ ln Z/∂V.   (22)
We can now use Eq. 22 to determine the pressure of an ideal gas. When
calculating its partition function in Eq. 10 we did not take account of a detailed
wall potential. But since the wall potential increases over a microscopically
small distance δ ≪ l, the partition function is not affected by such details.
Using Eq. 12 we find

⟨p⟩ = N/(βV).   (23)
Comparison with the combined gas law, Eq. 1, lets us finally understand the
physical meaning of β: it is inversely proportional to the temperature:
β = 1/(kB T).   (24)
The quantity kB is called the Boltzmann constant. Its value follows from Eq. 1
together with Eq. 2:
kB = R/NA = 1.38 × 10^−23 J/K.   (25)
To summarize we have found two equations that characterize an ideal gas.
From Eq. 16 we find for the energy
E = (3/2) N kB T   (26)
and from Eq. 23 we obtain the ideal gas equation of state
pV = N kB T.   (27)
The first relation, Eq. 26, states that each gas molecule has on average an
energy of (3/2) kB T; this is, as we can see from Eq. 9, its kinetic energy. The
temperature of a gas is thus a measure of the average kinetic energy of its
molecules that move on average faster inside a hotter gas. The second relation
states how these molecules exert a force when they bounce off the inner side
of the wall of the balloon, Fig. 1, or the piston, Fig. 3. The hotter the gas
the faster the gas molecules and the larger the transferred momentum during
collision. The larger the volume, the longer the time before the molecule hits
the wall again and thus the lower the average pressure.
The quantity kB T is called the thermal energy. At room temperature, T ≈ 300 K,
one has

kB T ≈ 4.1 pN nm.   (28)

It is worthwhile to know this value by heart (instead of Eqs. 2 and 25).
Let us now come to the second case, the case of a system that exchanges
energy and particles with its surroundings. In this case only the expectation
values of the energy, E = ⟨H⟩, and the particle number, N = ⟨N⟩, can be given.
This is the so-called grandcanonical ensemble. In that case we expect a density
distribution ρ of the form:
ρ = (1/ZG) e^{αN − βH}.   (29)
The grandcanonical partition function is a summation and integration over all
possible states of the system, each state weighted with ρ. This means we have
to sum over all particle numbers and then, for each number, over the positions
and momenta of all the particles:
ZG = Σ_{N=0}^∞ (1/(N! h^{3N})) ∫ e^{αN − βH(q,p)} d^{3N}q d^{3N}p.   (30)
This can be rewritten as
ZG = Σ_{N=0}^∞ e^{αN} ZN = Σ_{N=0}^∞ z^N ZN   (31)
where ZN is the canonical partition function of a system of N particles, i.e., the
quantity that we called Z in Eq. 7. On the rhs of Eq. 31 we introduced the
so-called fugacity z = e^α. It is straightforward to see, using similar arguments
as the ones that led to Eqs. 14 and 15, that
E = ⟨H⟩ = −∂ ln ZG/∂β,   σE² = ∂² ln ZG/∂β²,
N = ⟨N⟩ = ∂ ln ZG/∂α,   σN² = ∂² ln ZG/∂α².   (32)
For large N the relative fluctuations in energy and particle number, σE /E
and σN /N , become so small (just as in Eq. 17) that the grandcanonical ensemble
with mean energy E and mean particle number N becomes physically equivalent
to the canonical ensemble with mean energy E and exact particle number N .
It is thus just a matter of convenience which ensemble one chooses. Many
calculations are more convenient in the grandcanonical ensemble since one does
not have such a strict condition on N .
Let us again consider the ideal gas. Inserting Eq. 12 into Eq. 31 we find its
grandcanonical partition function
ZG = Σ_{N=0}^∞ z^N ZN = Σ_{N=0}^∞ (1/N!) (zV/λT³)^N = e^{zV/λT³}.   (33)
The expectation value of the particle number follows from Eq. 32
N = ∂ ln ZG/∂α = zV/λT³   (34)
and that of the energy as well
E = −∂ ln ZG/∂β = (3/2) kB T zV/λT³ = (3/2) N kB T.   (35)
This is equivalent to Eq. 26, but N is now strictly speaking ⟨N⟩. The pressure
formula, Eq. 27, follows even more directly from these relations, as we shall see
later below (cf. Eq. 62).
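Equations 31 and 33 imply that the particle number of the grandcanonical ideal gas is Poisson distributed: the probability of finding exactly N particles is z^N ZN/ZG = e^{−µ} µ^N/N! with µ = zV/λT³. A small numerical sketch with an illustrative µ confirms that then σN² = ⟨N⟩, consistent with Eq. 32:

```python
import math

# Particle-number statistics of the grandcanonical ideal gas, Eqs. 31 and 33:
# P(N) = exp(-mu) mu^N / N! with mu = z V / lambda_T^3 (illustrative mu).
mu = 4.0
P = [math.exp(-mu) * mu**n / math.factorial(n) for n in range(60)]
mean = sum(n * p for n, p in enumerate(P))
var = sum(n * n * p for n, p in enumerate(P)) - mean**2
print(mean, var)   # both equal mu: sigma_N^2 = <N>, consistent with Eq. 32
```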
1.2 The entropy
In stf1 you learned about a quantity that is crucial for the understanding of
macroscopic systems: the entropy. As we shall see, the concept of entropy allows
for a different, more convincing argument for the Boltzmann distribution, Eq.
5. But before we come to that we start with a simple model system where it is
quite straightforward to grasp the ideas behind entropy, especially the relation
between a macroscopic state and its associated microscopic states.
The following system can be considered as an idealization of a so-called
paramagnet. A paramagnet is a substance that consists of atoms that have a
magnetic dipole moment. The different dipoles do not feel each other and point
in random directions. As a result such a system shows no net macroscopic
magnetization. The model consists of a collection of microscopic so-called spins
on a lattice as shown in Fig. 4. Each spin represents an atom sitting on the
lattice of a solid – in contrast to a gas, Fig. 1, where the atoms can move freely
in space. We call the spin at the ith site si and assume that it can take either
the value +1 or −1 with a corresponding magnetic moment +µ or −µ. This
leads to the overall magnetization
M = µ Σ_{i=1}^N si.   (36)
We assume that the spins do not interact with each other. We also assume
that there is no energy change involved when a spin flips from one value to
the other. This means that all states have exactly the same energy. Therefore
each microscopic state {s1 , s2 , ..., sN } is as good as any other. The spins in a
paramagnet permanently flip back and forth due to the thermal environment.
Figure 4: A system of N noninteracting spins. Each spin can either point up or
down.
We should thus expect, when we look long enough at such a system, to measure
any value of M between −µN and +µN. However, for a large system, N ≫ 1,
a paramagnetic substance always ("always" not in the strict mathematical sense,
but almost always during the lifetime of the universe) shows an extremely small
value, |M| ≪ µN. How is this possible?
To understand this we have to look at the possible number of microstates
that correspond to a given macrostate, i.e., a state with a given value M of
magnetization. If we find a macrostate M , then there must be k spins pointing
up (and hence N − k spins pointing down) such that
M = µk − µ(N − k) = µ(2k − N).   (37)
Let us determine the number of microstates that have this property. This is a
simple problem in combinatorics. There are

N!/(k! (N − k)!)   (38)

possible combinations of spins where k spins point up and N − k point down.
The point is now that for large N there are overwhelmingly more configurations
that lead to a vanishing M, k = N/2, than there are states for which M takes
its maximal possible value, M = µN. For the latter case there is obviously only
one such state, namely all spins pointing up, whereas the former case can be
achieved in N!/((N/2)!)² different ways. To get a better understanding of how big
this number is, we employ Stirling's formula, which gives the leading behavior of
N! for large values of N:
N! → (N/e)^N √(2πN)  for N → ∞.   (39)
Equation 39 holds up to additional terms that are of the order 1/N smaller and
can thus be neglected for large values of N . Combining Eqs. 38 and 39 it is
straightforward to show that the number Nmax of spin configurations that lead
to M = 0 obeys
Nmax = N!/((N/2)!)² ≈ √(2/π) · 2^N/√N.   (40)
As you can see, Nmax grows exponentially with N, Nmax ∼ 2^N. Macroscopic
systems may contain something like 10^23 spins, which means that there is an
astronomically large number of states with M = 0 (namely a 1 with about 10^22
zeros), compared to one state with M = µN.
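Stirling's formula is easy to test numerically; comparing ln N! with the logarithm of Eq. 39 (working with logarithms avoids overflow) shows the error shrinking like 1/(12N):

```python
import math

# Stirling's formula, Eq. 39: ln N! ≈ N ln N − N + (1/2) ln(2 pi N).
# The error of the logarithm shrinks like 1/(12 N).
diffs = {}
for N in (10, 100, 1000):
    exact = math.lgamma(N + 1)   # ln N!
    approx = N * math.log(N) - N + 0.5 * math.log(2 * math.pi * N)
    diffs[N] = exact - approx
    print(N, diffs[N])
```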
Let us call Nmicro (M ) the number of microstates corresponding to a given
macrostate characterized by M. One can show that to a good approximation

Nmicro(M) = Nmax e^{−M²/(2µ²N)}.   (41)
This function is extremely peaked around M = 0 with the value Nmax given by
Eq. 40. It decays rapidly when one moves away from M = 0; e.g., it has decayed
to Nmax/e for M = ±µ√(2N), a value much smaller than the maximal possible
magnetization ±µN . Suppose we could somehow start with some macroscopic
state with a large value of M . Over the course of time the spins flip back
and forth randomly. Given enough time it is overwhelmingly probable that
M will have values that stay in an extremely narrow range around M = 0,
simply because there are so many more microstates available with tiny M -values
than with larger M -values. Therefore it is just an effect of probabilities that a
paramagnetic substance shows (close to) zero magnetization.
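Equation 41 can also be checked directly against the exact count of Eq. 38; the following sketch (with µ = 1, so that M = 2k − N) shows that the Gaussian approximation is excellent near M = 0:

```python
import math

# Compare the exact number of microstates, Eq. 38, with the Gaussian
# approximation Nmax * exp(-M^2 / (2 mu^2 N)) of Eq. 41, taking mu = 1.
N = 1000
Nmax = math.comb(N, N // 2)
ratios = []
for k in (500, 520, 540):
    M = 2 * k - N                           # magnetization, Eq. 37 (mu = 1)
    exact = math.comb(N, k)
    approx = Nmax * math.exp(-M**2 / (2 * N))
    ratios.append(exact / approx)
    print(k, exact / approx)                # ratios stay very close to 1
```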
We can formulate this in a slightly different way. A macroscopic system will
go to that state where there is the largest number of microstates to a given
macrostate. This state is called the equilibrium state since once the system has
reached this state it does not leave it anymore – not because this is in principle
impossible, but because it is overwhelmingly improbable. We can also say the
following: Of all the possible macroscopic states, the system chooses the one for
which our ignorance of the microstate is maximal. If we measure M = µN we
would know for sure the microstate of the system, but if we measure M = 0 we
only know that our system is in one of about 2^N (see Eq. 40) possible states.
We introduce a quantity that measures our ignorance about the microstate.
If we require that this quantity is additive in the sense that if we have two
independent (sub)systems our ignorance of the two systems is simply the sum
of the two, then we should choose this quantity to be given by
S = kB ln Nmicro.   (42)
The prefactor is in principle arbitrary, yet it is convention to choose it equal
to the Boltzmann constant kB, the quantity introduced in Eq. 25. A macroscopic
system will always – given enough time – find the macroscopic state
that maximizes its entropy. Let us reformulate Eq. 42. Suppose we know the
macrostate of the system, namely that there are k spins pointing up. Then each
of the microstates corresponding to that macrostate has the same probability
pk = 1/Nmicro . We can then rewrite Eq. 42 as follows
S = −kB ln pk.   (43)
When k, and therefore M , changes, the entropy changes. Since the entropy is
extremely sharply peaked around k = N/2, the system will spontaneously reach
states around k ≈ N/2 and never deviate from this anymore, not because it is
forbidden, but because it is extremely improbable.
Figure 5: The method of the Lagrange multiplier. The objective is to find the
maximum of the function f (x1 , x2 ) under the constraint g (x1 , x2 ) = C. Shown
are lines of equal height of f (purple curves) and of g (blue curves). The red
point indicates the maximum of interest. It is the highest point of f on the
line defined by g = C. At this point the gradients of the two height profiles
are parallel or antiparallel (the case shown here). This means there exists a number
λ ≠ 0, called the Lagrange multiplier, for which ∇f = λ∇g.
The goal in the following is to extend the concept of entropy to a system
like our gas in a balloon. In such a case we also expect that the system goes to
a macrostate with the largest number of microstates or, in other words, to the
macrostate for which we know least about the microstate, the state of maximal
entropy. In this case there is, however, a complication. We had required that
the average energy has a certain value, ⟨H⟩ = E, cf. Eq. 16. So we need to
maximize the entropy under the constraint

⟨H⟩ = Σ_i pi Ei = E.   (44)
Here we assume that the states are discrete, which – as outlined above – should
in principle always be assumed due to the uncertainty principle. We already
know from the previous section that the probabilities of states with different
energies are different. Extending Eq. 43 we now define the entropy as our
average ignorance about the system:
S = −kB Σ_i pi ln pi.   (45)
What we need to do is to maximize S, Eq. 45, under the constraint of having
a certain average energy, Eq. 44. This can be achieved using the method of
Lagrange multipliers. Suppose you want to maximize the function f(x1, ..., xm).
If this function has a maximum, it must be at one of the points where the
function has zero slope, i.e., where its gradient vanishes: ∇f = 0 with ∇ =
(∂/∂x1, ..., ∂/∂xm). What do we have to do, however, if there is an additional
constraint, g (x1 , ..., xm ) = C with C some constant? This constraint defines an
(m − 1)-dimensional surface in the m-dimensional parameter space. Figure 5
explains the situation for m = 2. In that case f (x1 , x2 ) gives the height above
(or below) the (x1 , x2 )-plane. As in a cartographic map we can draw contour
lines for this function. The constraint g (x1 , x2 ) = C defines a single line gC
(or combinations thereof) in the landscape. The line gC crosses contour lines
of f . We are looking for the highest value of f on gC . It is straightforward to
convince oneself that this value occurs when gC touches a contour line of f (if
it crosses a contour line one can always find a contour line with a higher value
of f that still crosses the gC -line). Since gC and the particular contour line of f
touch tangentially, the gradients of the two functions at the touching point are
parallel or antiparallel. In other words, at this point a number λ exists (positive
or negative), called the Lagrange multiplier, for which
∇(f − λg) = 0.    (46)
Let us use this method in the context of the entropy. We want to find the
maximum of S/kB , a function depending on the parameters (p1 , ..., pNtot ) where
pi denotes the probability of the ith of the Ntot microstates. In addition we
need to fulfill the constraint 44. This leads to a condition equivalent to Eq. 46,
namely
∇(S/k_B − β⟨H⟩) = 0,    (47)
with ∇ = (∂/∂p1 , ..., ∂/∂pNtot ) and β a Lagrange multiplier. For each i, i =
1, ..., Ntot , we find the condition
∂/∂p_i (S/k_B − β⟨H⟩) = −ln p_i − 1 − βE_i = 0.    (48)
This leads to pi ∼ e−βEi which then still needs to be normalized to one, leading
to

p_i = (1/Z) e^{−βE_i}.    (49)
This means that we again recover the Boltzmann distribution, Eq. 5, using
a different line of argument. Whereas the previous argument combined the
arguments concerning conserved physical quantities and independence of subsystems, the current argument simply looked for the macroscopic state where
our ignorance about the microscopic state is maximal. The inverse temperature
β has now entered the scene as a Lagrange multiplier.
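This variational statement is easy to check numerically. The following Python sketch (not part of the lecture; it assumes k_B = 1 and three made-up energy levels) constructs the Boltzmann distribution for a given β and verifies that every competing distribution with the same normalization and the same average energy has a smaller entropy:

```python
import math

E = [0.0, 1.0, 2.0]   # three hypothetical energy levels (k_B = 1)
beta = 0.7            # the Lagrange multiplier / inverse temperature

Z = sum(math.exp(-beta * e) for e in E)
p = [math.exp(-beta * e) / Z for e in E]          # Boltzmann distribution, Eq. 49
E_avg = sum(pi * e for pi, e in zip(p, E))

def entropy(q):
    # Eq. 45 with k_B = 1
    return -sum(qi * math.log(qi) for qi in q if qi > 0)

S_B = entropy(p)

# the direction (1, -2, 1) preserves both the norm and <E> for E = (0, 1, 2),
# so p + t*(1, -2, 1) runs through all competing distributions
d = [1.0, -2.0, 1.0]
for t in (-0.05, -0.01, 0.01, 0.05):
    q = [pi + t * di for pi, di in zip(p, d)]
    assert abs(sum(q) - 1.0) < 1e-12
    assert abs(sum(qi * e for qi, e in zip(q, E)) - E_avg) < 1e-12
    assert entropy(q) < S_B     # the Boltzmann distribution wins
```

With three levels the two constraints leave exactly one free direction in probability space, which is why a single parameter t suffices.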
Inserting the Boltzmann distribution, Eq. 49, into the entropy, Eq. 45, one
finds
S = −k_B Σ_i (1/Z) e^{−βE_i} (−ln Z − βE_i) = k_B ln Z + (1/T) ⟨H⟩.    (50)
Solving this relation for −kB T ln Z leads to
F ≡ −k_B T ln Z = E − TS.    (51)
From the partition function Z thus follows immediately the difference between
the internal energy E of the system and the product TS. The quantity F is called
the free energy. Since in equilibrium the quantity S/k_B − β⟨H⟩ is maximized, cf.
Eq. 47, the free energy has to be minimized to find the most probable macrostate
characterized by the temperature, volume and number of particles. F is thus a
function of these quantities, i.e., F = F (T, V, N ). The free energy is an example
of a so-called thermodynamic potential, a function from which one can find the
equilibrium state of the system via minimization. Knowing F allows one to directly
determine average quantities via differentiation, e.g. by combining Eqs. 22 and
51 we find
p = −∂F/∂V.    (52)
As an example let us again consider the ideal gas. The free energy follows
from Eq. 12:
F = −k_B T ln[ (1/N!) (V/λ_T³)^N ] ≈ k_B T N [ ln(λ_T³ N/V) − 1 ].    (53)
On the rhs we used Stirling’s formula, Eq. 39, and then neglected the term
(kB T /2) ln (2πN ) that is much smaller than the other terms. The pressure
follows by differentiation of Eq. 53 with respect to V , see Eq. 52, leading again
to p = kB T N/V .
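As a small numerical cross-check (a sketch, with k_B and λ_T³ set to 1 in hypothetical units), differentiating Eq. 53 according to Eq. 52 indeed returns the ideal gas pressure:

```python
import math

kB, lam3 = 1.0, 1.0    # k_B and lambda_T^3 set to 1 (hypothetical units)

def F_ideal(T, V, N):
    # Eq. 53: F = k_B T N [ln(lambda_T^3 N / V) - 1]
    return kB * T * N * (math.log(lam3 * N / V) - 1.0)

def pressure(T, V, N, h=1e-6):
    # Eq. 52: p = -dF/dV, here via a central finite difference
    return -(F_ideal(T, V + h, N) - F_ideal(T, V - h, N)) / (2 * h)

T, V, N = 2.0, 5.0, 100.0
assert abs(pressure(T, V, N) - N * kB * T / V) < 1e-6   # ideal gas law
```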
The method of Lagrange multipliers can also be used to derive the grand-
canonical ensemble. Maximizing the entropy with two constraints, E = ⟨H⟩
and N = ⟨N⟩, can be done in an analogous way to the canonical case, Eq. 48,
and leads to the condition

∇(S/k_B − β⟨H⟩ + α⟨N⟩) = 0.    (54)
The requirement is thus

∂/∂p_i (S/k_B − β⟨H⟩ + α⟨N⟩) = −ln p_i − 1 − βE_i + αN_i = 0.    (55)
This leads directly to the Boltzmann factor for the grandcanonical case, Eq. 29.
Inserting this distribution into the entropy, Eq. 45, we obtain

S = (k_B/Z_G) Σ_i e^{−βE_i + αN_i} (ln Z_G + βE_i − αN_i) = k_B ln Z_G + (1/T)⟨H⟩ − k_B α⟨N⟩.    (56)

Solving this relation for −k_B T ln Z_G leads to

K = K(T, µ, V) = −k_B T ln Z_G = E − TS − µN    (57)

where we introduced the quantity µ = α/β, called the chemical potential. From
Z_G thus follows the combination E − TS − µN.
The thermodynamic potential K is called the grandcanonical potential or Gibbs
potential.
Surprisingly the grandcanonical potential is directly related to the pressure
of the system:
K = −pV.
(58)
To see this we start from the fact that E, S, V and N are so-called extensive
quantities, i.e., quantities that are additive. For instance, let us look again
at the gas-filled balloon, Fig. 1: The volume of the whole system Σ is simply
the sum of the volumes of the subsystems Σ1 and Σ2 and so are the energies,
particle numbers and entropies. On the other hand, the temperature T , the
pressure p and the chemical potential µ are intensive quantities. For systems
in equilibrium such quantities have the same value in the full system and in all
its subsystems. Products of an intensive and an extensive quantity like T S are
thus also extensive. From this follows that the Gibbs potential K is an extensive
quantity since all of its terms, E, −T S and µN , are extensive. This means that
K fulfills the relation
K (T, µ, λV ) = λK (T, µ, V )
(59)
for any value of λ > 0. If we choose e.g. λ = 1/2, then the left hand side (lhs)
of Eq. 59 gives the Gibbs potential of a subsystem with half the volume of the
full system. Its Gibbs potential is half of that of the full system (rhs of Eq. 59).
We now take the derivative with respect to λ on both sides of Eq. 59 and then
set λ = 1. This leads to
(∂K/∂V) V = K.    (60)
Now in complete analogy to the derivation of the relation for the free energy in
Eq. 52 one can show that p = −∂K/∂V and hence
pV = −K = kB T ln ZG .
(61)
We can thus immediately obtain the pressure from ZG . For instance, for the
ideal gas we calculated ZG in Eq. 33 from which follows
pV = k_B T zV/λ_T³ = N k_B T    (62)
where we used Eq. 34. We thus rederived the ideal gas equation of state, Eq.
27.
Finally, let us take one more close look at the example discussed earlier, the
gas in a cylinder, Fig. 3. We were a little bit sloppy since we said in the legend
of that figure that “the piston is under an externally imposed force f ”, but then
calculated instead the expectation value of the force for a given volume, cf. Eqs.
20 to 23. If we want to be formally correct, then we need to maximize the
entropy under the two constraints ⟨V⟩ = V and ⟨H⟩ = E. This is achieved by
solving the following set of conditions

∂/∂p_i (S/k_B − β⟨H⟩ − γ⟨V⟩) = −ln p_i − 1 − βE_i − γV_i = 0    (63)
where we introduced the additional Lagrange multiplier γ. Along similar lines
that led us to the grandcanonical potential, Eq. 57, we find a thermodynamic
potential G = E − T S + (γ/β) V . The ratio of the two Lagrange parameters in
front of V is just the pressure, p = γ/β, as we shall see in a moment. The new
thermodynamic potential
G (T, p, N ) = F (T, V (T, p, N ) , N ) + pV (T, p, N )
(64)
is called the free enthalpy G. We can immediately check
∂G/∂p = (∂F/∂V)(∂V/∂p) + V + p (∂V/∂p) = V    (65)
where we used Eq. 52.
For an ideal gas we find from its free energy, Eq. 53, and V(T, p, N) =
N k_B T/p (i.e., Eq. 23 solved for V):

G(T, p, N) = k_B T N ln( λ_T³ p/(k_B T) ).    (66)
Inserting this into Eq. 65 one recovers indeed the ideal gas law, Eq. 27, but this
time in the version hV i = kB T N/p.
The grandcanonical potential obeys a very simple relation, K = −pV (cf.
Eq. 61), and so does the free enthalpy. Using the same line of argument that
led to Eq. 61 we find
G = (∂G/∂N) N = µN.    (67)
That ∂G/∂N is the chemical potential µ follows by comparing Eqs. 57 and 58
to Eq. 64.
Only in a few exceptional cases can one calculate Z, Eq. 7, or Z_G, Eq. 30,
exactly, e.g. for the ideal gas. Such systems are often somewhat trivial and
do not even show phase transitions as we know them for any real substance.
Phase transitions can only come about if the molecules interact with each other
but then Z cannot be calculated anymore. Nevertheless, in many cases a real
system is close to an exactly solvable case. The idea of approximate methods is
usually to describe the deviations by a small parameter ε and then to expand
Z in powers of ε around the exactly solvable case:
Z = Z_exact + C_1 ε + C_2 ε² + ...
Here are a few examples:
1. perturbation theory: H = H0 + λW , expansion in λ
2. quasiclassical approximation: expansion in ℏ
3. high temperature expansion in T0 /T
4. low temperature expansion in T /T0
5. expansion around the critical point: τ = (T − T_c)/T_c
6. 1/N -expansion with N number of components of a suitable field
7. ε-expansion: D = 4 − ε with D space dimension
8. virial expansion in n = N/V
In addition there is meanfield theory that cannot be cast easily in the above
scheme. The idea of meanfield theory is to replace the interaction of a particle
with all the other particles by the interaction of that particle with a suitable
meanfield.
In the following chapter we will consider the virial expansion and apply it
to the gas-liquid transition. In Chapter 3 we use high- and low-temperature
expansions to prove the existence of a phase with spontaneous magnetization
for the 2-dimensional Ising model (ferromagnetism). In Chapter 4 we use
meanfield theory to get a very simple view on that phase transition. We also
use this framework to study a system where the virial expansion fails, namely
salt solutions.
2 Virial expansion

2.1 Virial expansion up to second order
The virial expansion is an expansion in the density n = N/V. It should thus
be a good approximation for sufficiently dilute systems in which the particles
interact via short-range interactions (as we shall see later - in Section
3.3 - it does not work for a salt solution where the ions experience a long-range
1/r interaction). We first study the expansion up to second order which is
relatively straightforward. Then we also study the much more complex case of
the virial expansion to arbitrary order.
The Hamiltonian of a real gas is of the following form
H(p, q) = Σ_{i=1}^N p_i²/(2m) + Σ_{i<j} w(|q_i − q_j|).    (68)
The first term represents the kinetic energy, the same as for the ideal gas, Eq.
9. The second term accounts for the interactions between the particles. The
sum goes over all pairs of particles (“i < j” makes sure that each pair is only
counted once) and we assume that the interaction potential w depends only on
the distances between the particles. It is now most convenient to use the grand
canonical ensemble for which the partition function is of the form
Z_G = 1 + Σ_{N=1}^∞ z^N Z_N = 1 + Σ_{N=1}^∞ (1/N!) (z/λ_T³)^N I_N.    (69)
The first step is just Eq. 31 where we wrote the N = 0 term separately. In
the second step we inserted the explicit form of ZN , Eq. 7, with H (p, q) given
by Eq. 68, and performed immediately the integration over the momenta. I_N
thus denotes the remaining integral

I_N = ∫ e^{−β Σ_{i<j} w(|q_i − q_j|)} d³q_1 ... d³q_N.    (70)
Let us first consider again the ideal gas. In this case I_N = V^N and thus
Z_G = e^{zV/λ_T³}, Eq. 33. From that result we derived above, in Eq. 34, that
N = zV/λ_T³. In other words the quantity z/λ_T³ that appears in Eq. 69 is in the
case of the ideal gas precisely its density n = N/V . Now consider a real gas. If
this gas is sufficiently dilute, then the interaction between its particles constitutes
only a small effect. The ratio z/λ_T³ is then very close to its density. Since we
assumed here the density to be small, the quantity z/λ_T³ is small as well. We
can thus interpret Eq. 69 as a series expansion in that small parameter. From this
expansion we can learn how the interaction between the particles influences the
macroscopic behavior of the system – at least in the regime of sufficiently dilute
gas. In that regime it is then often sufficient to only account for the first or the
first two correction terms since the higher order terms are negligibly small.
Unfortunately the quantity z/λ_T³ does not have as clear a physical meaning as
the density n. But since both parameters are similar and small we can rewrite
Eq. 69 to obtain a series expansion in n instead of z/λ_T³. This can be done in a
few steps that we outline here, for simplicity only to second order in ζ = z/λ_T³.
We start from
Z_G = 1 + ζ I_1 + (ζ²/2) I_2 + ...    (71)
To obtain the density n = N/V we need to calculate the expectation value of N
that follows from ln ZG via Eq. 32. We thus need next to find the expansion of
ln ZG starting from the expansion of ZG . This is achieved by inserting ZG from
Eq. 71 into pV = k_B T ln Z_G, Eq. 61. To obtain again a series expansion in ζ
we use the series expansion of the logarithm around x = 0, ln(1 + x) =
Σ_{k=1}^∞ (−1)^{k+1} x^k/k. This leads to
βpV = ln Z_G = ln(1 + ζI_1 + (ζ²/2)I_2 + ...)
    = ζI_1 + (ζ²/2)I_2 − (ζ²/2)I_1² + ... = ζI_1 + (ζ²/2)(I_2 − I_1²) + ...    (72)
When going from the first to the second line in Eq. 72 we neglected all terms
higher than ζ 2 . The particle number follows by taking the derivative of ln ZG
with respect to α, Eq. 32. Since ζ = e^α/λ_T³ one has ∂ζ/∂α = ζ and thus

N = ∂ ln Z_G/∂α = ζI_1 + ζ²(I_2 − I_1²) + ...    (73)
We are now in the position to write an expansion in the density n = N/V
(instead of in ζ) by subtracting Eq. 73 from Eq. 72 and dividing by V. This
leads to

βp = N/V − (ζ²/(2V)) (I_2 − I_1²) + ...    (74)
With this step we got rid of terms linear in ζ but there is still a ζ²-term. This
term can now easily be replaced by using Eq. 73 that states that N = ζI_1 up
to terms of the order ζ². We can thus replace the ζ²-term in Eq. 74 by (N/I_1)²,
neglecting terms of the order ζ³. We arrive then at

βp = n − (N/I_1)² (1/(2V)) (I_2 − I_1²) + ...    (75)
To see that Eq. 75 is indeed an expansion in n, we need to evaluate the integrals,
I1 and I2 , defined in Eq. 70. We find
Z
I1 =
d3 q = V
(76)
V
and

I_2 = ∫_V ∫_V d³q_1 d³q_2 e^{−βw(|q_1 − q_2|)} = ∫_V d³q_1 ∫_{"V − q_1"} d³r e^{−βw(r)}
    ≈ V ∫_V d³r e^{−βw(r)}.    (77)
The first step in Eq. 77 is simply the definition of I_2, Eq. 70. In the second step
we substitute q2 by r = q2 − q1 , the distance vector between the two particles.
The integration goes over all values of r such that q2 = q1 + r lies within the
volume that we symbolically indicate by the shifted volume ”V − q1 ”. The
last step where we replaced the shifted volume by the unshifted one involves an
approximation. This can be done since the interaction between the particles,
w(r), decays to practically zero over microscopically small distances. Thus only a
negligibly small fraction of configurations, namely those where particle 1 has a
distance to the wall below that microscopically small distance, is not properly
accounted for.
Now we can finally write down the virial expansion to second order. Plugging
the explicit forms of the integrals, Eqs. 76 and 77, into Eq. 75 we arrive at
βp = n − (n²/2) ∫_V d³r (e^{−βw(r)} − 1) + ... ≈ n − (n²/2) ∫ d³r (e^{−βw(r)} − 1) + ...
   = n + n² B_2(T) + ...    (78)
In the second step we replaced the integration over V by an integration over
the infinite space. This is again an excellent approximation for short-ranged
w (r) since e−βw(r) − 1 vanishes for large r. The quantity B2 (T ) depends on
the temperature via β and is called the second virial coefficient. Introducing
spherical coordinates (r, θ, ϕ) with r1 = r sin θ cos ϕ, r2 = r sin θ sin ϕ, and r3 =
r cos θ we can write B2 (T ) as
B_2(T) = −(1/2) ∫_0^{2π} dϕ ∫_{−1}^{1} d cos θ ∫_0^∞ dr r² (e^{−βw(r)} − 1)
       = −2π ∫_0^∞ dr r² (e^{−βw(r)} − 1).    (79)
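Since Eq. 79 is a one-dimensional integral, B_2(T) is easy to evaluate numerically for a given model potential. As an illustration (not part of the lecture) the following Python sketch does this for an assumed square-well potential, for which B_2 is also known in closed form:

```python
import math

d, lam, eps, beta = 1.0, 1.5, 0.5, 1.0   # hypothetical square-well parameters

def w(r):
    if r < d:
        return float("inf")              # hardcore repulsion
    if r < lam * d:
        return -eps                      # attractive well
    return 0.0

# Eq. 79: B2 = -2*pi * int_0^inf r^2 (e^{-beta w(r)} - 1) dr, via trapezoids
R, nstep = 5.0, 200000                   # w(r) = 0 beyond lam*d, so R = 5 suffices
h = R / nstep
total = 0.0
for i in range(nstep + 1):
    r = i * h
    weight = 0.5 if i in (0, nstep) else 1.0
    total += weight * r * r * (math.exp(-beta * w(r)) - 1.0)
B2_num = -2.0 * math.pi * total * h

# closed form for the square well, for comparison
B2_exact = (2 * math.pi / 3) * d**3 * (1 - (lam**3 - 1) * (math.exp(beta * eps) - 1))
assert abs(B2_num - B2_exact) < 2e-3
```

The remaining discrepancy comes from the jumps of the integrand at r = d and r = λd, which the trapezoid rule resolves only to order h.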
2.2 Virial expansion to arbitrary order
We calculated the equation of state by reducing it to an effective two-particle
system. We show here that the l-th order in n, i.e. n^l, follows by accounting for
interactions of complexes ("clusters") of l interacting particles. We start again
from

Z_G = Σ_{N=0}^∞ (ζ^N/N!) I_N(T, V)    (80)
with I_N in the classical case

I_N = ∫ e^{−β Σ_{i<j} w(x_i − x_j)} d³x_1 ... d³x_N    (81)

or in the quantum-mechanical case something like

I_N = Tr e^{−βH_N} = ∫ ⟨x_1, ..., x_N| e^{−βH_N} |x_1, ..., x_N⟩ d³x_1 ... d³x_N.    (82)

In general I_N will be of the form

I_N = ∫ W_N(x_1, ..., x_N) d³x_1 ... d³x_N.    (83)
The quantity of interest is, however, not ZG but ln ZG . When one knows
ZG exactly, this is a trivial step but here this turns out to be very challenging.
ln Z_G can also be expanded in ζ. It turns out to be of the form

ln Z_G = Σ_{r=1}^∞ (1/r!) ζ^r J_r(T, V)
       = Σ_{r=1}^∞ (1/r!) ζ^r ∫ U_r(x_1, ..., x_r) d³x_1 ... d³x_r.    (84)
How can the functions U_r be calculated? For small indices this can be done by
hand by inserting the expansions, Eqs. 80 and 84, into the identity Z_G = e^{ln Z_G}
and then by comparing the coefficients. This leads to

W_1(x_1) = U_1(x_1) = 1,
W_2(x_1, x_2) = U_2(x_1, x_2) + U_1(x_1) U_1(x_2),
W_3(x_1, x_2, x_3) = U_3(x_1, x_2, x_3) + U_2(x_1, x_2) U_1(x_3) + U_2(x_2, x_3) U_1(x_1)
                   + U_2(x_1, x_3) U_1(x_2) + U_1(x_1) U_1(x_2) U_1(x_3).
Inverting these relations allows one to calculate the U_r's from the W_N's.
What we try to find now is a general relation between the WN ’s and the
Ur ’s. As a first step we need the following definition:
Definition: A partition P of the set {1, 2, ..., N} is the decomposition of the
set into disjoint non-empty subsets S_1, ..., S_K (1 ≤ K ≤ N) such that

{1, 2, ..., N} = ∪_{S_k ∈ P} S_k.

We call P^N the set of all such partitions, and P_K^N the set of all partitions made
from K subsets.
N
For a partition P ∈ PK
we denote by ri = |Si | the cardinality (number of
elements) of the set Si that belongs to P , and by nri the number of sets in P
of cardinality ri . One can see immediately that there are
AK ({ri }) =
1
N!
Q
r1 !....rK ! nri !
(85)
N
different partions in PK
for a given set of ri ’s.
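Eq. 85 can be verified by brute force for small N. The following Python sketch (an illustration with assumed N = 5) enumerates all set partitions of {1, ..., N} and compares, for every multiset of block sizes, the count with A_K({r_i}):

```python
import math
from collections import Counter

def partitions(s):
    # generate all set partitions of the list s, each as a list of blocks
    if not s:
        yield []
        return
    first, rest = s[0], s[1:]
    for part in partitions(rest):
        for i in range(len(part)):                 # add `first` to an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield part + [[first]]                     # or open a new block

def A(N, sizes):
    # Eq. 85: number of partitions of {1,...,N} with the given block sizes
    denom = 1
    for r in sizes:
        denom *= math.factorial(r)                 # r_i! factors
    for mult in Counter(sizes).values():
        denom *= math.factorial(mult)              # n_{r_i}! factors
    return math.factorial(N) // denom

N = 5
counts = Counter()
for part in partitions(list(range(1, N + 1))):
    counts[tuple(sorted(len(b) for b in part))] += 1

assert all(count == A(N, sizes) for sizes, count in counts.items())
assert sum(counts.values()) == 52      # Bell number B_5: total partitions of 5 elements
```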
In our case it turns out that partitions of particles are important. Define for
a given P ∈ P^N the function

U_|S|(S) = U_|S|(x_{i_1}, ..., x_{i_|S|})

as the function whose arguments x_i carry exactly the indices contained in S.
One can then formulate the following important theorem:
Theorem: The function W_N can be decomposed into its connected parts,
the cluster functions U_r, as follows

W_N(x_1, ..., x_N) = Σ_{P ∈ P^N} Π_{S ∈ P} U_|S|(S).    (86)
We can immediately check that the explicit cases N = 1, 2, 3 given above do
indeed fulfill Eq. 86. The proof of this relation for general N goes as follows.
We start out from the equality Z_G = e^{ln Z_G}:

Z_G = Σ_{N=0}^∞ (ζ^N/N!) I_N = Σ_{k=0}^∞ (1/k!) ( Σ_{r=1}^∞ (1/r!) ζ^r J_r )^k.
Comparison of the coefficients of ζ^N for N ≥ 1 gives:

I_N/N! = Σ_{k=1}^N (1/k!) Σ_{r_1,...,r_k ≥ 1; r_1+...+r_k = N} (1/(r_1! ... r_k!)) Π_{i=1}^k J_{r_i}.
As a next step we order the r_i's such that r_1 ≤ r_2 ≤ ... ≤ r_k and take this into
account via the combinatorial factor k!/Π n_{r_i}!:

I_N/N! = Σ_{k=1}^N (1/k!) Σ_{{r_i} ordered} (k!/Π n_{r_i}!) (1/(r_1! ... r_k!)) Π_{i=1}^k J_{r_i}.
Here the summation over "{r_i} ordered" is a short-hand notation for the following
summation: {{r_i} | Σ r_i = N, r_1 ≤ r_2 ≤ ... ≤ r_k}. This can be written more
compactly by using A_k({r_i}) from Eq. 85:

I_N = Σ_{k=1}^N Σ_{{r_i} ordered} A_k({r_i}) Π_{i=1}^k J_{r_i}.
Putting in the explicit forms of the J_r's we arrive at

I_N = Σ_{k=1}^N Σ_{{r_i} ordered} A_k({r_i}) ∫ U_{r_1}(S_1) ... U_{r_k}(S_k) d³x_1 ... d³x_N

which is simply

I_N = Σ_{k=1}^N Σ_{P ∈ P_k^N} ∫ U_{r_1}(S_1) ... U_{r_k}(S_k) d³x_1 ... d³x_N.

From this follows finally

I_N = ∫ Σ_{P ∈ P^N} Π_{S ∈ P} U_|S|(S) d³x_1 ... d³x_N

from which follows directly Eq. 86.
What is the physical meaning of the connected parts? To a partition P ∈
P^N is associated a partitioning of the arguments x_1, ..., x_N into disjoint subsets
S_i with i = 1, ..., k. We define the cluster limit of a partition P as the limit
|x_a − x_b| → ∞ for all x_a, x_b with x_a ∈ S_i and x_b ∈ S_j where S_i ∩ S_j = ∅,
while x_a − x_b is kept fixed if x_a and x_b are from the same subset. The functions
W_N(x_1, ..., x_N) obviously have the following cluster property: In the cluster
limit of a partition P, W_N factorizes (asymptotic factorization):

W_N(x_1, ..., x_N) → Π_{S ∈ P} W_|S|(S)    (87)
if w(r) → 0 fast enough for r → ∞. Then e.g.

lim_{|x_1 − x_2| → ∞} W_2(x_1, x_2) = W_1(x_1) W_1(x_2)

and hence (see above)

lim_{|x_1 − x_2| → ∞} U_2(x_1, x_2) = 0.
The latter property of U_2 can be generalized as follows:
All functions U_r(x_1, ..., x_r) go to zero in every cluster limit from P_k^r for any
k > 1 if W_N(x_1, ..., x_N) has the cluster property.
This can be proven by induction as follows. Suppose we have shown this for
all r ≤ N − 1. Now consider a cluster limit P from P_k^N for any k > 1. Then W_N
takes asymptotically the form of Eq. 87. Using Eq. 86 we can write

W_N(x_1, ..., x_N) → Π_{S ∈ P} [ Σ_{P' ∈ P^{|S|}} Π_{S' ∈ P'} U_{|S'|}(S') ]    (88)

where P' runs over all partitions of the set S.
On the other hand we can use directly Eq. 86 and then write the term containing
U_N separately. This leads to

W_N(x_1, ..., x_N) = U_N(x_1, ..., x_N) + Σ_{P ∈ P_k^N, k>1} Π_{S ∈ P} U_|S|(S)    (89)
Now we take again the cluster limit of this expression. We do not know yet
anything about the first term, UN , but we know that in the summation of the
second term only those terms survive where there is no U|S| in the product that
contains arguments from more than one cluster. This is exactly the expression
above, Eq. 88. The only difference between Eq. 88 and the cluster limit of Eq.
89 is the term U_N, which must thus be zero.
The property of U_r that it vanishes in any non-trivial cluster limit means
that the quantity

lim_{V → ∞} (1/V) ∫_V d³x_1 ... d³x_r U_r(x_1, ..., x_r)

exists, since only configurations where all particles are together in one cluster
contribute to the integral. This is in general not the case for the W_N's.
The calculation of the cluster functions can be simplified in the classical case.
Define

f_{ij} = f_{ji} = e^{−βw(|x_i − x_j|)} − 1.    (90)

Then W_N can be rewritten as

W_N(x_1, ..., x_N) = e^{−β Σ_{i<j} w(|x_i − x_j|)} = Π_{i<j} (1 + f_{ij}).    (91)



On the rhs there are 2^{N(N−1)/2} terms, each a product of f_{ij}'s and 1's. It is now
convenient to represent each term by a numbered graph: every particle i is drawn
as a numbered vertex and every factor f_{ij} as an edge between vertices i and j.
As an example, one of the terms occurring in Eq. 91 for N = 6 is the graph with
edges 2-4, 4-5, 2-5 and 3-6, which represents (f_{24} f_{45} f_{25}) f_{36}.
For N = 1, 2, 3 we find (writing out the graphs as products of f_{ij}'s):

W_1(x_1) = 1,
W_2(x_1, x_2) = 1 + f_{12},
W_3(x_1, x_2, x_3) = 1 + f_{12} + f_{13} + f_{23} + f_{12}f_{13} + f_{12}f_{23} + f_{13}f_{23} + f_{12}f_{13}f_{23},

where each of the eight terms of W_3 corresponds to one numbered graph on three
vertices.
A graph is called connected if any two of its vertices are connected (directly or
indirectly) by edges, otherwise it is called disconnected. The function of a
disconnected graph is the product of those of its connected subgraphs.
We now formulate the following theorem: The function Ur (x1 , ..., xr ) is
given by the sum of all connected, numbered graphs with r vertices.
Before we prove this theorem we give two examples:

U_2(x_1, x_2) = f_{12},
U_3(x_1, x_2, x_3) = f_{12}f_{13} + f_{12}f_{23} + f_{13}f_{23} + f_{12}f_{13}f_{23}.

From these examples we can immediately see that U_r vanishes in every non-trivial
cluster limit.
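The decomposition can be checked numerically for N = 3: assigning arbitrary numbers to f_12, f_13, f_23, the cluster function U_3 obtained by inverting Eq. 86 coincides with the sum of the four connected graphs. A minimal Python sketch (an illustration, not part of the lecture):

```python
import random

random.seed(1)
f12, f13, f23 = (random.uniform(-0.5, 0.5) for _ in range(3))

# Eq. 91 for N = 3: W3 is the product over all pairs
W3 = (1 + f12) * (1 + f13) * (1 + f23)

# invert the decomposition Eq. 86 with U1 = 1 and U2(i, j) = f_ij:
U3 = W3 - (f12 + f13 + f23) - 1

# sum of the four connected numbered graphs on three vertices:
# three "chains" (two edges each) plus the "triangle" (three edges)
U3_graphs = f12 * f13 + f12 * f23 + f13 * f23 + f12 * f13 * f23

assert abs(U3 - U3_graphs) < 1e-12
```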
Now we come to the proof of the theorem. The cases r = 1, 2 are clear.
Suppose we have proven the theorem up to r = N − 1. We know that W_N is of
the form

W_N = Con({1, 2, ..., N}) + Σ_{k>1} Σ_{P ∈ P_k^N} Con(S_1) ... Con(S_k)

where Con(S) is the sum of all connected graphs on the set S. Using Eq. 86 we
know that

W_N = U_N + Σ_{k>1} Σ_{P ∈ P_k^N} U_{|S_1|}(S_1) ... U_{|S_k|}(S_k).

Thus U_N = Con({1, 2, ..., N}) as stated in the theorem above.
For the integrals J_r(T, V) = ∫_V d³x_1 ... d³x_r U_r(x_1, ..., x_r) every numbered
graph that differs only by its numbering gives the same contribution. The
integrals can thus be represented by unnumbered graphs (cluster integrals). To
give examples: the triangle graph gives

∫_V d³x_1 d³x_2 d³x_3 f(x_1 − x_2) f(x_2 − x_3) f(x_1 − x_3) = V ∫ d³x d³y f(x) f(y) f(x + y),

whereas the chain of three vertices gives

V ∫ d³x d³y f(x) f(y) = V ( ∫ d³x f(x) )².

Cluster integrals factorize if the graph can be disconnected by the removal of a
vertex, as in the latter example.
Let us define

b_l(T, V) = J_l(T, V)/(l! V) = (1/(l! V)) ∫_V d³x_1 ... d³x_l U_l(x_1, ..., x_l).    (92)

We expect that the limit

lim_{V → ∞} b_l(T, V) = b_l(T)

exists.
Then
(1/V) ln Z_G = p/(k_B T) = Σ_{l=1}^∞ ζ^l b_l(T, V)    (93)

and

N/V = n = (1/V) ∂ ln Z_G/∂α = Σ_{l=1}^∞ l ζ^l b_l(T, V).    (94)
Furthermore, the energy density follows from

E/V = −(1/V) ∂ ln Z_G/∂β = −(1/V) (∂T/∂β)(∂ ln Z_G/∂T) = (k_B T²/V) ∂ ln Z_G/∂T.

Using Eqs. 69 and 93 one finds

E/V = k_B T² [ (3/(2T)) Σ_{l=1}^∞ l ζ^l b_l(T, V) + Σ_{l=1}^∞ ζ^l ∂b_l(T, V)/∂T ].

With Eq. 94 this leads to

E/V = (3/2) n k_B T + k_B T² Σ_{l=1}^∞ b_l'(T, V) ζ^l    (95)

with b_l' = ∂b_l/∂T.
Equations 93 to 95 with the coefficients being the cluster integrals (with their
U_r being the sums of connected graphs) are very elegant expressions. Unfortunately
they are expansions in the activity ζ and not in the density n. To go to
a general virial expansion

p/(k_B T) = Σ_{l=1}^∞ B_l(T) n^l    (96)

requires some extra work. The quantity B_l(T) in Eq. 96 is called the l-th virial
coefficient.
To calculate the virial expansion up to l-th order, one has to do the following
steps (here given for the case l = 3). We start with the power series in ζ for n,
Eq. 94, up to third order:

n = ζ + 2b_2 ζ² + 3b_3 ζ³ + O(ζ⁴)    (97)

where we used the fact that b_1 = 1 (cf. Eq. 92). Now we have to solve for ζ up
to terms of third order in n. We use the ansatz:

ζ = a_1 n + a_2 n² + a_3 n³ + O(n⁴).    (98)

We plug Eq. 98 into Eq. 97 and find

n = a_1 n + (a_2 + 2b_2 a_1²) n² + (a_3 + 4b_2 a_1 a_2 + 3b_3 a_1³) n³ + O(n⁴).

For this equation to hold we need a_1 = 1 and the coefficients in front of n² and
n³ to vanish. This means that a_2 = −2b_2 and a_3 = 8b_2² − 3b_3. Thus

ζ = n − 2b_2 n² + (8b_2² − 3b_3) n³ + O(n⁴).

Plugging this into Eq. 93 leads finally to

p/(k_B T) = n − b_2 n² + (4b_2² − 2b_3) n³ + O(n⁴).    (99)
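The inversion leading from Eq. 97 to Eq. 99 can be checked numerically: for made-up values of b_2 and b_3 and a small density n, solving Eq. 97 for ζ by iteration reproduces the series coefficients. A Python sketch:

```python
# verify the inversion of Eq. 97: pick values for b2, b3, solve for zeta by
# fixed-point iteration and compare with the series coefficients
# a1 = 1, a2 = -2 b2, a3 = 8 b2^2 - 3 b3 (arbitrary test values below)
b2, b3 = 0.3, -0.2
n = 1e-3                       # small density, so O(n^4) terms are negligible

# solve n = zeta + 2 b2 zeta^2 + 3 b3 zeta^3 for zeta
zeta = n
for _ in range(100):
    zeta = n - 2 * b2 * zeta**2 - 3 * b3 * zeta**3

zeta_series = n - 2 * b2 * n**2 + (8 * b2**2 - 3 * b3) * n**3
assert abs(zeta - zeta_series) < 1e-11      # they differ only at O(n^4)

# the resulting pressure matches Eq. 99
p_over_kT = zeta + b2 * zeta**2 + b3 * zeta**3
p_series = n - b2 * n**2 + (4 * b2**2 - 2 * b3) * n**3
assert abs(p_over_kT - p_series) < 1e-11
```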
We give here the first four virial coefficients:

B_1 = b_1 = 1
B_2 = −b_2
B_3 = 4b_2² − 2b_3
B_4 = −20b_2³ + 18b_2 b_3 − 3b_4
Finally, let us give the expansions for the energy E and the free energy F.
Using pV = k_B T ln Z_G and Eq. 32 we find

E/N = (3/2) k_B T − k_B T² Σ_{l=2}^∞ (n^{l−1}/(l − 1)) B_l'(T).    (100)
Figure 6: The typical interaction potential w between two molecules as a function
of their distance r. It typically is the sum of two contributions. The first one
is a hardcore repulsive term, w_hard, that forbids particles to overlap (excluded
volume). The second is a longer-ranged attractive potential, w_attr.
To derive F remember that the pressure follows from F by differentiation, p =
−∂F/∂V, Eq. 52. The free energy is thus obtained by integrating the pressure,
Eq. 96, leading to

βF = N (ln(λ_T³ n) − 1) + V Σ_{l=2}^∞ (n^l/(l − 1)) B_l(T).    (101)

The first term in Eq. 101 follows from integrating the l = 1 term that leads to
−N ln V. All the other contributions to the first term are just the integration
constant that has to be chosen such that the result matches the ideal gas result,
Eq. 53, for the case that B_l = 0 for all l ≥ 2. You can easily convince yourself
that one indeed obtains Eq. 96 from the virial expansion of F by taking the
derivative with respect to V, p = −∂F/∂V.
2.3 Van der Waals equation of state
The van der Waals equation of state is an ingeniously simple ad hoc approach
that gives a qualitative idea of the equation of state of a real substance, including
its gas-liquid phase transition. It was introduced by Johannes van der
Waals in his thesis "Over de Continuïteit van den Gas- en Vloeistoftoestand"
(Leiden University, 1873). Van der Waals assumed the existence of atoms (disputed
at that time) and, even more, that they have excluded volume and attract
each other. Here we will discuss how his approach can be understood in a more
systematic way as the first two (or three) terms in the virial expansion of a real
gas.
As a start let us estimate from Eq. 79 the typical temperature dependence
of the second virial coefficient B2 (T ). Figure 6 depicts the typical form of
the interaction w (r) between two molecules. For short distances w (r) rises
sharply, reflecting the fact that two molecules cannot overlap in space due to
hardcore repulsion. For larger distances there is typically a weak attraction.
As schematically indicated in the figure the total interaction potential can be
written as the sum of these two contributions, w (r) = whard (r) + wattr (r).
To a good approximation the hardcore term can be assumed to be infinite for
r ≤ d and zero otherwise, where d denotes the center-to-center distance of the
touching particles, i.e., their diameter. The integral in Eq. 79 can then be divided
into two terms accounting for the two contributions to the interaction:
B_2(T) = −2π ∫_0^d r² (−1) dr − 2π ∫_d^∞ r² (e^{−βw_attr(r)} − 1) dr
       ≈ (2π/3) d³ + 2π ∫_d^∞ r² βw_attr(r) dr = v_0 − a/(k_B T).    (102)
The approximation involved in going to the second line is to replace e^{−βw_attr(r)}
by 1 − βw_attr(r), which is a good approximation if the attractive part is small
compared to the thermal energy, i.e., β|w_attr(r)| ≪ 1 for all values of r > d.
In the final expression of Eq. 102 the volume v_0 = 2πd³/3 accounts for the
excluded volume of the particles. It is actually four times the volume 4π(d/2)³/3
of a particle. The factor 4 is the combination of two effects: (i) A particle
excludes for the other a volume 4πd³/3 that is eight times the eigenvolume. (ii)
An additional factor 1/2 accounts for the implicit double counting of particle
pairs by the n²-term. The term a = −2π ∫_d^∞ r² w_attr(r) dr is a positive quantity
(assuming w_attr(r) ≤ 0 everywhere, as is the case in Fig. 6). We thus find that
with increasing temperature the attractive term becomes less and less important
and the system behaves more and more like a system with pure hardcore
repulsion. There is a temperature T* = a/(k_B v_0) below which B_2(T) becomes
negative, i.e., the particles effectively start to attract each other.
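The estimate B_2(T) ≈ v_0 − a/(k_B T) can be tested against a potential for which B_2 is known exactly, e.g. a square well of depth ε between d and λd (assumed parameters below). The following Python sketch confirms the high-temperature agreement and the sign change of B_2:

```python
import math

d, lam, eps = 1.0, 1.5, 0.2          # hypothetical square-well gas (k_B = 1)
v0 = 2 * math.pi * d**3 / 3          # excluded volume
a = v0 * eps * (lam**3 - 1)          # a = -2*pi*int_d^inf r^2 w_attr(r) dr here

def B2_exact(kT):
    # exact second virial coefficient of the square well (from Eq. 79)
    return v0 * (1 - (lam**3 - 1) * (math.exp(eps / kT) - 1))

for kT in (2.0, 5.0, 10.0):
    approx = v0 - a / kT             # Eq. 102
    assert abs(B2_exact(kT) - approx) / abs(approx) < 0.05

# near T* = a/(kB v0) the sign of B2 changes (the exact zero crossing sits
# somewhat above this linearized estimate, hence the generous factors)
Tstar = a / v0
assert B2_exact(0.5 * Tstar) < 0 < B2_exact(2.0 * Tstar)
```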
We can now write the virial expansion up to second order in n = 1/v (v is
the volume per particle). From Eq. 78

p/(k_B T) = 1/v + B_2/v² = (1/v)(1 + v_0/v) − a/(v² k_B T)    (103)
and from Eq. 100

E/N = (3/2) k_B T − k_B T² n B_2'(T) = (3/2) k_B T − a/v.    (104)
Eq. 103 does not make sense for small values of v since then p → −∞ rather
than the physically expected p → ∞. This is not surprising since the virial
expansion up to second order is not expected to hold in this regime. In fact,
one can show that even the complete virial expansion, Eq. 96, will break down
in that regime.
To come to at least a qualitative understanding of this system one can add
various modifications to Eq. 103. The most natural thing would be to go in
the virial expansion up to third order, i.e., adding an additional term of the
form B_3/v³ and simply assuming B_3 to be some positive constant. This is a
procedure often done in the context of polymer physics (Flory, 1934). When
thinking about a real gas, however, people typically tend to use the approximation
introduced by van der Waals, which constitutes a rather arbitrary procedure.
Replace in Eq. 103:

(1/v)(1 + v_0/v) → 1/(v − v_0).
Figure 7: Isotherms of the van der Waals gas, Eq. 105, for T > T*, T = T*
and T < T*; the critical values p* and v* and the excluded volume v_0 are
marked on the axes.
Now this term goes to infinity at v = v_0, which makes sense because the pressure
should become infinite once the molecules are densely packed. Even though the
replacement seems completely arbitrary, note that for small v_0/v:

1/(v − v_0) = (1/v) (1/(1 − v_0/v)) = (1/v) (1 + v_0/v + O((v_0/v)²)).
Using this replacement we can write Eq. 103 as follows:

p/(k_B T) = 1/(v − v_0) − a/(v² k_B T)

which can be recast in the famous van der Waals equation

(p + a/v²)(v − v_0) = k_B T.    (105)
Originally this equation was introduced more phenomenologically by modifying
the ideal gas equation pv = k_B T as follows: v → v − v_0 as already
discussed above, and p → p + a/v², which means that the pressure is effectively
reduced due to the attraction between the particles, this reduction being
proportional to n².
Amazingly the isotherms of the van der Waals equation (or of the above-
mentioned virial expansion up to third order) are qualitatively very similar to the
isotherms that one measures for real substances - including even the liquid-gas
phase transition, see Fig. 7. There is a so-called critical temperature T* such
that for temperatures above T*, T > T*, the isotherms are similar to those of
an ideal gas, whereas for low temperatures, T < T*, the isotherms feature a
local minimum and maximum. The isotherm at T = T*, the so-called critical
isotherm, features an inflection point that follows from the conditions

∂p/∂v (T*, v*) = ∂²p/∂v² (T*, v*) = 0.    (106)
Figure 8: The Maxwell construction (see text), shown for the isotherm T = 0.7 T*.
States where gas and liquid coexist (cylinder in the middle) lie on the horizontal
line between the points G (pure gas at volume v_G, right cylinder) and L (pure
liquid at volume v_L, left cylinder).
From this follows

v* = 3 v0,   kB T* = (8/27) (a/v0),   p* = (1/27) (a/v0²)          (107)
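As a quick check, the critical-point values of Eq. 107 can be reproduced symbolically from the conditions 106; a minimal sympy sketch (the symbol names are our own, with kB absorbed into kT):

```python
import sympy as sp

v, v0, a, kT = sp.symbols("v v0 a kT", positive=True)
p = kT/(v - v0) - a/v**2                 # van der Waals isotherm, Eq. 105 solved for p

# dp/dv = 0 fixes kT as a function of v; d^2p/dv^2 = 0 then fixes v itself
kT_c = sp.solve(sp.diff(p, v), kT)[0]
v_c = sp.solve(sp.diff(p, v, 2).subs(kT, kT_c), v)[0]
kT_c = sp.simplify(kT_c.subs(v, v_c))
p_c = sp.simplify(p.subs({v: v_c, kT: kT_c}))
print(v_c, kT_c, p_c)   # v* = 3 v0, kT* = 8a/(27 v0), p* = a/(27 v0^2), cf. Eq. 107
```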
Note, however, that the isotherms for T < T* around v* are not physical since there the substance has a negative compressibility, which makes the system unstable. In fact, this hints at a first-order phase transition between a liquid and a gas. In the coexistence region (only present for T < T*) the isotherm is supposed to be horizontal, i.e., one should have no increase in p with decreasing v but condensation of the gas (low-density state) into the liquid (high-density state). At T = T* the gas and the liquid have the same density, and for T > T* there is no phase transition anymore (there is no distinction between gas and liquid anymore).
To make the unphysical isotherms of Eq. 105 that occur for T < T* physical, one needs to employ the Maxwell construction: one replaces a part of the isotherm by a horizontal line as shown in Fig. 8. The height of the horizontal line needs to be chosen such that the two areas enclosed between that line and the original isotherm are equal. We will give a justification of this in a moment, but for now let us discuss what happens to the system when it moves along the horizontal line.
Suppose we start with a very dilute system, i.e., at a large v-value. When we compress this system at constant temperature, the pressure first rises. Once the volume vG per particle is reached something dramatic happens: the pressure remains constant under further compression, see Fig. 8. This signals the onset
of a phase transition. Whereas at point G the cylinder is still completely filled
with gas (right cylinder), as soon as we move along the Maxwell line there is
also a second phase in the cylinder shown in darker blue in the middle cylinder.
This is the liquid phase that has a higher density and it is thus found at the
bottom of the cylinder. By compressing the volume further, more and more
molecules in the gas phase enter into the liquid. Once the point L is reached all
the molecules have been transferred to the liquid phase, see the cylinder on the
left. Upon further compression of the system the pressure rises sharply following
again the original isotherm, reflecting now the compression of the liquid phase.
How is it possible that two phases coexist inside the cylinder? This is only
possible if three conditions are fulfilled: (i ) The temperatures in the two phases
need to be the same since otherwise heat will flow from the hotter to the colder
phase. This condition is fulfilled since both points, L and G, lie on the same
isotherm. (ii ) The pressure in both phases needs to be the same since otherwise
the phase with the higher pressure expands at the expense of the phase with
the lower pressure. The horizontal line is by construction a line of constant
pressure. (iii ) Finally, the chemical potentials of the two phases need to be the
same, i.e., the chemical potentials at points L and G in Fig. 8 have the same
value:
µG (T, p) = µL (T, p) .
(108)
Since this is the least intuitive condition we explain it here in more detail. We
can think of each phase as a system under a given pressure p at a given temperature T . The appropriate thermodynamic potential is thus the free enthalpy,
Eq. 64. Using Eq. 67 the total free enthalpy of the two coexisting phases is
given by
G = µG NG + µL NL = µG NG + µL (N − NG ) .
(109)
On the rhs we used the fact that the total number of particles, N, is the sum of the particles in the two phases, NG + NL. Suppose now that the two chemical potentials were different, e.g., µG > µL. In that case the free enthalpy can be lowered by transferring particles from the gas to the liquid phase. Equilibrium between the two phases, as shown inside the middle cylinder of Fig. 8, is thus only possible if the two chemical potentials are the same. Only then is the free enthalpy minimized: ∂G/∂NG = µG − µL = 0.
We now need to show that condition 108 is fulfilled when the equal area construction is obeyed. Combining Eqs. 64 and 67 we find for each phase the relation

µk (T, p) = (Fk + p Vk)/Nk          (110)
with k = G, L. The coexistence condition, Eq. 108 together with the relation
110 then leads to the condition
fL − fG = p (vG − vL )
(111)
where fk denotes the free energy per molecule in the kth phase. Next we calculate the difference fL − fG purely formally by integrating along the (unphysical) isotherm:

fL − fG = ∫_{isotherm G→L} df = − ∫_{vG}^{vL} p (T, v) dv.          (112)
In the second step we used the relation

df = f (T + dT, v + dv) − f (T, v) = (∂f/∂T) dT + (∂f/∂v) dv = −p dv along the isotherm.          (113)
On the rhs we made use of the fact that by definition dT ≡ 0 along the isotherm and of the relation ∂f/∂v = ∂F/∂V = −p, Eq. 52. Combining Eqs. 111 and 112 we arrive at

∫_{vL}^{vG} p (T, v) dv = p (vG − vL).          (114)
This is just the mathematical formulation of the equal area requirement since
only then the area under the isotherm between vL and vG equals the area of a
rectangle of height p and width vG − vL .
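The equal-area requirement can also be implemented numerically. The sketch below is our own construction (function names are ours, using numpy): for the reduced van der Waals isotherm of Eq. 115 at T̃ = 0.9 it bisects on the pressure until the area under the isotherm between vL and vG equals the rectangle of Eq. 114.

```python
import numpy as np

def pressure(v, T):
    # reduced van der Waals isotherm, Eq. 115 solved for the pressure
    return 8*T/(3*v - 1) - 3/v**2

def coexistence_pressure(T):
    """Maxwell construction for the reduced vdW gas (assumes 0.85 < T < 1)."""
    def volumes(p):
        # (p + 3/v^2)(3v - 1) = 8T  <=>  3p v^3 - (p + 8T) v^2 + 9v - 3 = 0
        r = np.roots([3*p, -(p + 8*T), 9.0, -3.0])
        r = np.sort(r.real[np.abs(r.imag) < 1e-7])
        return r[0], r[-1]                        # v_L and v_G
    def mismatch(p):
        vL, vG = volumes(p)
        # integral of p(v) dv along the isotherm minus the rectangle, Eq. 114
        integral = (8*T/3)*np.log((3*vG - 1)/(3*vL - 1)) + 3/vG - 3/vL
        return integral - p*(vG - vL)
    # bracket the answer between the two spinodal pressures (where dp/dv = 0)
    s = np.roots([4*T, -9.0, 6.0, -1.0])
    s = np.sort(s.real[(np.abs(s.imag) < 1e-7) & (s.real > 1/3)])
    lo, hi = pressure(s[0], T) + 1e-6, pressure(s[-1], T) - 1e-6
    sign_lo = mismatch(lo) > 0
    for _ in range(100):
        mid = 0.5*(lo + hi)
        if (mismatch(mid) > 0) == sign_lo:
            lo = mid
        else:
            hi = mid
    return 0.5*(lo + hi)

print(round(coexistence_pressure(0.9), 3))   # ≈ 0.647 in reduced units
```

The bisection works because the area mismatch decreases monotonically with p (its derivative is −(vG − vL) < 0), so the coexistence pressure is the unique zero between the two spinodal pressures.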
Next we point out that one can bring Eq. 105 into a universal form (i.e., make it independent of the specific values of v0 and a) by making everything dimensionless. We introduce

ṽ = v/v*,   p̃ = p/p*,   T̃ = T/T*,   ẽ = e/e*
With Eq. 107, Eqs. 104 and 105 take the universal forms

(p̃ + 3/ṽ²) (3ṽ − 1) = 8T̃,   ẽ = 4T̃ − 3/ṽ          (115)
Eq. 115 is qualitatively in good agreement with experiments but not quantitatively. For instance, Eq. 107 predicts

p* v* = (3/8) kB T*          (116)
whereas for real substances one finds typically p∗ v ∗ ≈ 3.4kT ∗ . Also the behavior
close to the critical point is not correctly captured.
However, the existence of a universal equation of state

p̃ = p̃ (T̃, ṽ),   ẽ = ẽ (T̃, ṽ)          (117)
is quite well confirmed experimentally. Within classical statistical mechanics it can be understood as follows. Assume for all substances a qualitatively similar interaction profile w (r) but of different depth ε and range σ:

w (r) = ε w̃ (r/σ)
This means for the partition function

ZN = (1/(N! λ^{3N})) ∫_V d^{3N}x e^{−βε Σ_{i<j} w̃(x_{ij}/σ)}
   = (σ^{3N}/(N! λ^{3N})) ∫_{V/σ³} d^{3N}x′ e^{−(βε) Σ_{i<j} w̃(x′_{ij})}          (118)
From this we can directly see that σ and ε can be absorbed in V and T, which explains the universal form of Eq. 117.
3
Low and high temperature expansion
In the following we will consider the Ising model to achieve some understanding of the phase transition in a ferromagnet. Here we consider the case of a vanishing external magnetic field. In the next chapter we shall also study the influence of a magnetic field within the meanfield approximation. In a ferromagnet one has an ordered phase with a non-vanishing magnetization below a critical temperature, the Curie temperature, and a disordered phase with zero magnetization above that temperature. The Ising model on a D-dimensional lattice is a simple model for a uniaxial ferromagnet. On each lattice site i sits a spin that can assume
the values σi = ±1. The Hamiltonian is given by

H ({σi}) = −h Σ_i σi − J Σ_NN σi σj          (119)
Here h = mB is the energy of the magnetic moment m of the spin in the magnetic field B, J is the nearest-neighbor coupling energy (spins prefer to align parallel) and Σ_NN means the summation over nearest neighbours. We will first study the one-dimensional case as a warming-up exercise. There are many ways of calculating its partition function; here we use a more exotic one that sums over self-avoiding walks. Whereas the 1D case is trivial (no phase transition), the same method will be applied later to the 2D Ising model. This case is much harder to deal with but it has the beautiful feature that it shows a phase transition to an ordered phase with non-zero magnetization.
3.1
The one-dimensional Ising model
We consider a one-dimensional lattice with spins i = 1, ..., N. For simplicity we close the lattice into a ring, i.e., spins 1 and N are considered to be nearest neighbors (σ_{N+1} = σ_1). We consider the case without external magnetic field, h = 0. The partition function is then given by
Z = Σ_{σi=±1} e^{βJ Σ_{i=1}^N σi σi+1} = Σ_{σi=±1} Π_{i=1}^N e^{βJ σi σi+1}
  = Σ_{σi=±1} Π_{i=1}^N (cosh βJ + σi σi+1 sinh βJ)          (120)
In the second line we used the identity

e^{βJ σi σi+1} = cosh βJ + σi σi+1 sinh βJ = { e^{βJ} for σi σi+1 = 1;  e^{−βJ} for σi σi+1 = −1 }          (121)
Equation 120 can be further rewritten as follows:

Z = (cosh βJ)^N Σ_{σi=±1} Π_{i=1}^N (1 + σi σi+1 tanh βJ)
  = (cosh βJ)^N Σ_{σi=±1} [ 1 + tanh βJ Σ_i (σi σi+1) + (tanh βJ)² Σ_{i≠j} (σi σi+1) (σj σj+1) + ... ]          (122)
For the various terms of this expression we can introduce a graphical representation: each pair (σi σi+1) is represented by a line that connects the lattice points i and i + 1. The first term in the expansion then corresponds to a graph of N points without any connecting line. The second contains graphs in which exactly one pair of neighboring points is connected. The third term is made from graphs where 2 different pairs of points are connected by a line, and so on. Finally, the last term consists of one graph where all lines are inscribed. Since
Σ_{σi=±1} σi² = 2   and   Σ_{σi=±1} σi = 0
we see that only those terms contribute to the summation over {σi = ±1} in which each σi either does not appear at all or appears quadratically. That means that only 2 terms contribute to the summation, namely the first term (no lines) and the last term (all lines present). Therefore
Z = (cosh βJ)^N Σ_{σi=±1} [1 + (tanh βJ)^N] = (2 cosh βJ)^N [1 + (tanh βJ)^N]          (123)
The righthand side can be interpreted (up to the factor (2 cosh βJ)^N) as a summation over all closed walks on a lattice for which no line is used twice. Each walk has to be weighted by a factor (tanh βJ)^l (l: length of walk). There are two such self-avoiding walks: one of length 0 and one that goes all around the lattice. The latter contribution disappears in the thermodynamic limit N → ∞. The partition function is analytic for all temperatures. A finite temperature with zero magnetization above and non-zero magnetization below it can thus not exist, and there is no phase transition.
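The closed-form result of Eq. 123 is easy to check against a brute-force enumeration of all 2^N spin configurations on the ring (a small sketch of our own):

```python
import itertools
import math

def Z_brute_force(N, bJ):
    # sum exp(beta*J * sum_i s_i s_{i+1}) over all configurations on a ring
    return sum(math.exp(bJ*sum(s[i]*s[(i + 1) % N] for i in range(N)))
               for s in itertools.product([1, -1], repeat=N))

N, bJ = 8, 0.7
Z_formula = (2*math.cosh(bJ))**N * (1 + math.tanh(bJ)**N)   # Eq. 123
print(abs(Z_brute_force(N, bJ)/Z_formula - 1) < 1e-12)      # True
```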
3.2
The two-dimensional Ising model
First studied in Ising's PhD thesis in 1925, this model features a phase with spontaneous magnetization, as proven by Peierls in 1936. Kramers and Wannier calculated in 1941 the exact temperature below which spontaneous magnetization occurs (in the absence of a magnetic field). Onsager was the first to find the free energy of the 2D Ising model with algebraic methods (again in the absence of a magnetic field). We perform high- and low-temperature expansions of this model in the following subsection. In later subsections we will prove the existence of a phase transition and then determine the exact temperature where the phase transition occurs.
3.2.1
High- and low temperature expansions
We consider spins living on a two-dimensional quadratic lattice. As for the 1D Ising model we have the following identity for neighboring spins:

e^{βJ σi σj} = cosh βJ + σi σj sinh βJ = { e^{βJ} for σi σj = 1;  e^{−βJ} for σi σj = −1 }          (124)

leading to the partition function

Z = (cosh βJ)^{2N²} Σ_{σi=±1} Π_{⟨i,j⟩} (1 + σi σj tanh βJ)
Here 2N² is the number of lines on a quadratic N × N lattice when closing the lattice into a torus. As in the 1D case one can multiply out all the factors in the product and represent the resulting terms by graphs, connecting all spin pairs that occur in the corresponding term. Only such terms will contribute where no spin occurs an odd number of times. This means that only such terms count where walks are closed. As no direction is associated with these self-avoiding walks, it is better to speak of polygons. Note that a term can correspond to a single polygon or several closed polygons. The total length of those polygons, l, leads to a weight factor (tanh βJ)^l. The partition function is thus given by
[Table to Fig. 9: the empty polygon (length 0, multiplicity 1); length 4, multiplicity N²; length 6, multiplicity 2N²; length 8, multiplicities 4N², 2N² and N² for the connected shapes plus N² (N² − 5)/2 for two disjoint squares.]
Figure 9: All closed polygons on a quadratic lattice up to length l = 8.
Z = (cosh βJ)^{2N²} 2^{N²} Σ_Polygons (tanh βJ)^l = (2 cosh² βJ)^{N²} Σ_l P (l) (tanh βJ)^l          (125)
P (l) is the number of closed polygons of length l. Note that this representation
of the partition function of the Ising model is valid for any dimension of the
lattice.
An expansion of Z in powers of τ = tanh βJ is a high-temperature expansion (since tanh βJ ≪ 1 for high temperatures). All the closed polygons of length l ≤ 8 are depicted in Fig. 9 together with their multiplicity. From this follow the first terms of the partition function in the high temperature expansion
Z = (2 cosh² βJ)^{N²} [ 1 + N² τ⁴ + 2N² τ⁶ + (7N² + N² (N² − 5)/2) τ⁸ + ... ]          (126)
and the corresponding free energy per spin
1
1
9 8
2
4
6
ln 2 cosh βJ + τ + 2τ + τ + ...
(127)
F∞ = − lim
ln Z = −
2
β
2
N →∞ βN
where we used ln (1 + x) ≈ x − x2 /2 for x 1.
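The step from Eq. 126 to Eq. 127, in particular the exact cancellation of the N-dependence in the τ⁸ coefficient by the −x²/2 term, can be verified symbolically (a small sympy sketch of our own):

```python
import sympy as sp

N, tau = sp.symbols("N tau", positive=True)
# the bracket of Eq. 126
bracket = 1 + N**2*tau**4 + 2*N**2*tau**6 + (7*N**2 + N**2*(N**2 - 5)/2)*tau**8
# free energy per spin: expand log(bracket)/N^2 in powers of tau
per_spin = sp.expand(sp.series(sp.log(bracket), tau, 0, 9).removeO()/N**2)
print([per_spin.coeff(tau, k) for k in (4, 6, 8)])   # [1, 2, 9/2], cf. Eq. 127
```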
Figure 10: The dual lattice (see text for details).
As the next step we derive a low-temperature expansion (β → ∞). As a first step we put a configuration-independent factor in front of the partition function:

Z = Σ_{σi=±1} e^{βJ Σ_{⟨i,j⟩} (σi σj − 1 + 1)} = e^{2N²βJ} Σ_{σi=±1} e^{βJ Σ_{⟨i,j⟩} (σi σj − 1)}          (128)
Here 2N² is the number of nearest-neighbor bonds on a quadratic lattice. The weight of a given configuration of spins is given by the number of lines that connect spins with opposite orientation. Each such line contributes a factor e^{−2βJ}. It is then useful to go to the dual lattice where the roles of the plaquettes (the squares defined by 4 lines) and of the vertices are exchanged. The centers of the plaquettes of the original lattice are then the vertices of the dual lattice. The spins are now defined on the plaquettes of the dual lattice. We can now highlight the lines of the dual lattice bordering regions of spin +1 and spin −1, see Fig. 10. These border lines are dual to the lines that contribute a factor e^{−2βJ} in the partition function. Each spin configuration corresponds to a configuration of border lines with the weight e^{−2βJl} where l denotes the total length of the border between plus and minus spin regions.
Remarkably we encounter here again self-avoiding closed polygons as we did above in the high-temperature expansion. We therefore can write the partition function as follows:

Z = 2 e^{2βJN²} Σ_Polygons e^{−2βJl}          (129)
Again we find for the two-dimensional Ising model a representation as a summation over closed polygons but this time with a weight that vanishes for β → ∞,
a low-temperature expansion. The first few terms of the partition function are
now given by

Z = 2 e^{2βJN²} [ 1 + N² e^{−8βJ} + 2N² e^{−12βJ} + (7N² + N² (N² − 5)/2) e^{−16βJ} + ... ]          (130)
from which follows

−βF∞ = lim_{N→∞} (1/N²) ln Z = 2βJ + e^{−8βJ} + 2e^{−12βJ} + (9/2) e^{−16βJ} + ...          (131)
The equivalence between the high- and low-temperature expansions of the 2D Ising model is also called self-duality and will be useful later to determine the phase transition temperature. But first we need to prove that there is a phase transition at all.
3.2.2
Proof of the existence of a phase transition
The Ising model in two (and also in any higher) dimensions has a phase transition that separates a phase with a spontaneous magnetization from a phase without spontaneous magnetization. This can be described by an order parameter that is non-zero in the ordered phase and zero in the disordered phase. The most obvious candidate for such a quantity is the expectation value of the spin orientation. But the problem is that the Boltzmann factor is invariant under the transformation σi → −σi and thus the expectation value ⟨σi⟩ always vanishes. To circumvent this problem we introduce a quantity that does not vanish in the ordered phase:

µ = lim_{r→∞} lim_{N→∞} ⟨σi σi+r⟩          (132)
This quantity determines the correlation between 2 spins that are infinitely far apart. In the case of spontaneous magnetization the probability for them to be parallel is larger than to be antiparallel and thus µ > 0. In the disordered phase there are no correlations between spins that are infinitely far apart from each other and µ = 0.
For T = 0 there are only 2 spin configurations that contribute, namely all spins +1 or all spins −1, both leading to µ = 1. This, however, is also true for the one-dimensional Ising model, even though this system has no phase transition since the magnetization vanishes for any T ≠ 0. What needs to be shown here is that there is a finite temperature range above T = 0 where the magnetization is different from 0. The following proof is typical for an estimate of this kind in statistical mechanics. As a first step we rewrite µ as follows:


µ = 2 lim_{r→∞} lim_{N→∞} (1/Z) [ Σ⁺_{σ} e^{βJ Σ_{⟨i,j⟩} σi σj} − Σ⁻_{σ} e^{βJ Σ_{⟨i,j⟩} σi σj} ]
  = 2 lim_{r→∞} lim_{N→∞} (e^{2N²βJ}/Z) [ Σ⁺_Polygons e^{−2βJl} − Σ⁻_Polygons e^{−2βJl} ]          (133)

The indices “+” and “−” indicate that the summations go only over those spin configurations where σr = 1 and σ0 = +1 or σ0 = −1, respectively. The factor 2 in front of the expressions accounts for the fact that we do not count the states with σr = −1 separately. The transition from the first to the second line is the same as the one from Eq. 128 to 129.
We now have to show that for sufficiently small temperatures the second sum (the one with σ0 = −1) is smaller than the first one and hence µ > 0. Since we only want to prove the existence of this ordered phase, a rough estimate is sufficient. We do this using the formulation in terms of the polygons. The important point here is that contributions to the second sum need at least one closed border around the spin at i = 0, whereas this is not the case for equal spins, σ0 = 1. We use the inequality


Σ⁺_Polygons e^{−2βJl} − Σ⁻_Polygons e^{−2βJl} > Σ⁺_Polygons e^{−2βJl} ( 1 − 2 Σ_{Polygons around 0} e^{−2βJl′} )          (134)
The summation “Polygons around 0” means a summation over all closed polygons that enclose the point 0, are simply connected and do not cross or touch themselves. That means for each term in that sum that there is an area with −1-spins around the point 0 and outside this area one has +1-spins. The reason why this is an inequality lies in the fact that not every polygon around 0 is allowed to transform a given configuration from the “+”-sum into a “−”-configuration, because many such polygons around 0 would have common lines with polygons from that given “+”-configuration. The factor 2 accounts for the fact that we could also have put the boundary around the point at r. Even though we vastly overestimate the number of “−”-configurations on the righthand side of Eq. 134 we shall see now that the expression is still larger than zero for sufficiently small non-zero temperatures. In other words, there is a range of large β-values where
1/2 > Σ_{Polygons around 0} e^{−2βJl}          (135)
A closed polygon of length l encloses an area not larger than (l/4)² = l²/16. This gives an upper estimate of the number of possibilities to position such a polygon around 0. The number of shapes of polygons of length l cannot be larger than 4 × 3^{l−1} since for the first step there are 4 directions to go and for the following steps only 3, as the polygon is self-avoiding. This largely overcounts the number of possible shapes since they need to be closed after l steps and cannot cross themselves. This very rough estimate sits in between the two quantities in Eq. 135 such that
1/2 > Σ_{l=4,6,8,...} (l²/12) 3^l e^{−2βJl} > Σ_{Polygons around 0} e^{−2βJl}          (136)
It is the first inequality that remains to be proven. The summation can be performed exactly:

1/2 > Σ_{l=4,6,8,...} (l²/12) 3^l e^{−2βJl} = x² (4 − 3x + x²) / (3 (1 − x)³)          (137)
with x = 9e^{−4βJ}. One can convince oneself that this inequality is fulfilled for large enough values of β (larger than βJ ≈ 0.8).
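The quoted threshold can be checked numerically. Since the right-hand side of Eq. 137 decreases monotonically with βJ, a short bisection of our own locates the value at which it drops below 1/2:

```python
import math

def rhs(bJ):
    x = 9*math.exp(-4*bJ)
    return x**2*(4 - 3*x + x**2)/(3*(1 - x)**3)   # right-hand side of Eq. 137

lo, hi = 0.7, 1.5          # rhs(0.7) > 1/2 > rhs(1.5)
for _ in range(60):
    mid = 0.5*(lo + hi)
    lo, hi = (mid, hi) if rhs(mid) > 0.5 else (lo, mid)
threshold = 0.5*(lo + hi)
print(round(threshold, 2))   # 0.81: the inequality holds for larger beta*J
```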
To complete our proof of the existence of a phase transition we now show that there is a phase with vanishing magnetization for sufficiently large temperatures. The high-temperature expansion of the correlation function between 2 spins

⟨σi σj⟩ = (1/Z) Σ_{σi=±1} σi σj e^{−βH}          (138)
can be written as the sum over all polygons where one polygon connects i with j whereas all the other polygons are closed (that way each spin occurs only in the form σk² or not at all and thus does not lead to a cancellation in the spin summation). For each term in the polygon summation of Z one obtains a term in the summation of the numerator by adding an allowed connection between i and j weighted with (tanh βJ)^l where l denotes the length of that connection (“allowed” means that at each lattice point only 2 or 4 lines can come together). This leads to the inequality

⟨σi σj⟩ < Σ_{connections i→j} (tanh βJ)^l          (139)
We take again the very rough estimate 4 × 3^{l−1} from above for the number of connections of length l:

Σ_{connections i→j} (tanh βJ)^l < (4/3) Σ_{l≥|i−j|} (3 tanh βJ)^l = (4/3) (3 tanh βJ)^{|i−j|} / (1 − 3 tanh βJ)          (140)
Here |i − j| denotes the length of the shortest connection between points i and j. The geometric series converges for sufficiently small values of β. The value of µ that follows in the limit |i − j| → ∞ vanishes, i.e., there is no spontaneous magnetization for sufficiently large temperatures.
3.2.3
Self-duality of the two-dimensional Ising model
We now use this self-duality to calculate exactly the temperature where the phase transition occurs (following Kramers and Wannier). We define the dual temperature through

e^{−2β*J} = tanh βJ          (141)
i.e.

β*J = −(1/2) ln tanh βJ          (142)

Comparing the low-temperature expansion

Z (β) = 2 e^{2βJN²} Σ_{l=0}^∞ P (l) e^{−2βJl}          (143)
and the high-temperature expansion

Z (β) = (2 cosh² βJ)^{N²} Σ_{l=0}^∞ P (l) (tanh βJ)^l          (144)
one finds the following relation between Z (β) and Z (β*):

Z (β) = 2 (2 cosh βJ sinh βJ)^{N²} Z (β*)          (145)
If we know Z (β) we can calculate Z (β*). For large values of β one has a small value of β* and vice versa. Equation 145 connects the partition function at low temperatures with the partition function at high temperatures. The duality transformation, Eq. 142, is an involution:

β**J = −(1/2) ln tanh β*J = −(1/2) ln tanh (−(1/2) ln tanh βJ) = βJ          (146)
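Both the involution property of Eq. 146 and the symmetry of the defining relation 141 under exchanging β and β* can be confirmed numerically (a small sketch; the test value of βJ is arbitrary):

```python
import math

bJ = 0.37                                  # arbitrary coupling
bsJ = -0.5*math.log(math.tanh(bJ))         # dual coupling, Eq. 142
# applying the transformation twice returns the original coupling, Eq. 146
bssJ = -0.5*math.log(math.tanh(bsJ))
print(abs(bssJ - bJ) < 1e-12)              # True
# the defining relation is symmetric: exp(-2 bJ) = tanh(bsJ) as well, cf. Eq. 141
print(abs(math.exp(-2*bJ) - math.tanh(bsJ)) < 1e-12)   # True
```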
In terms of the free energy per spin F∞ = − lim_{N→∞} (1/(βN²)) ln Z the relation 145 is given by

β F∞ (β) = − ln (sinh 2βJ) + β* F∞ (β*)          (147)
We assume now that the Ising model in 2 dimensions has only one phase transition, i.e., the free energy per spin has only one point where it is non-analytic. That point must then be a fixpoint of the duality transformation β → β*. This leads to the condition

e^{−2βJ} = tanh βJ          (148)

which is solved by

βc J = (1/2) ln (1 + √2) ≈ 0.440687          (149)
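Condition 148 can also be solved numerically and compared with the closed form of Eq. 149 (a short bisection sketch of our own):

```python
import math

f = lambda x: math.exp(-2*x) - math.tanh(x)    # Eq. 148 with x = beta*J
lo, hi = 0.1, 1.0                              # f(lo) > 0 > f(hi)
for _ in range(100):
    mid = 0.5*(lo + hi)
    lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
beta_c = 0.5*(lo + hi)
print(round(beta_c, 6))                          # 0.440687
print(round(0.5*math.log(1 + math.sqrt(2)), 6))  # 0.440687, Eq. 149
```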
4
Meanfield approximation

4.1
Introduction
To introduce the meanfield approximation in some generality, let us first write down the exact probability to find a particle at position x1:

n1 (x1) = (N/Z) (1/(λ^{3N} N!)) ∫ d³x2 ... d³xN e^{−βVN (x1, ..., xN)}          (150)

with

VN (x1, ..., xN) = (1/2) Σ_{i≠j} w (xi − xj) + Σ_{i=1}^N U (xi)          (151)
Here w describes the interaction between particles and U is an external potential. Eq. 150 follows by integrating over all degrees of freedom one is not interested in (i.e., the positions of particles 2 to N and all the N momenta). The additional factor N is chosen such that

∫ d³x n1 (x) = N          (152)
Eq. 150 is in general too complicated to be solved explicitly. There are various approximation schemes taking correlations between particles into account up to a certain extent. The crudest approximation is to neglect correlations altogether, which is the so-called meanfield approximation:

n1 (x) = e^{β (µ − U(x) − ∫ d³x′ w(x−x′) n1(x′))}          (153)
Here the influence of all the other particles onto a given particle, which is given by the potential

V (x) = U (x) + Σ_{i=2}^N w (x − xi) = U (x) + ∫ d³x′ w (x − x′) n (x′)          (154)

where n (x′) = Σ_{i=2}^N δ (x′ − xi), is replaced by its average:

Ueff (x) = U (x) + ∫ d³x′ w (x − x′) n1 (x′)          (155)
Even though this is a very, very big approximation (throwing most details out of the window), the resulting expression, Eq. 153, is often not trivial to solve since it is a nonlinear selfconsistent equation for n1 (x). The meanfield approximation is expected to be good if many particles contribute to Ueff. Thus it usually works better for a system in higher space dimensions. It also works well when the particle-particle interaction w (r) becomes long-ranged (like for charged particles in an electrolyte or a plasma). In such a case the virial expansion does not work, and in that sense the meanfield approximation can be complementary to the virial expansion. We first discuss the Weiss theory of ferromagnetism, a prototype meanfield theory that predicts a first-order phase transition very similar to the one predicted by the van der Waals theory. We then discuss the Poisson-Boltzmann theory for electrolytes (salt solutions). We shall see that this case shows a low-density behavior that is markedly different from what a virial approximation can predict. We will in addition focus on the role of the nonlinearity and its physical interpretation and discuss the breakdown of meanfield theory in the case of strong ion-ion coupling where correlations dominate the behavior.
4.2
Ferromagnetism
We consider here the Ising model on a D-dimensional lattice as a simple model for a uniaxial ferromagnet. On each lattice site i sits a spin that can assume
the values σi = ±1. The Hamiltonian is given by

H ({σi}) = −h Σ_i σi − J Σ_NN σi σj          (156)
Here h = mB is the energy of the magnetic moment m of the spin in the magnetic field B, J is the nearest-neighbor coupling energy and Σ_NN means the summation over nearest neighbours. The 1D case is trivial to solve but it
shows no phase transition. The 2D case features a phase transition and Lars
Onsager (1944) managed to calculate the free energy of the 2D Ising model
exactly but only for h = 0. Nobody has been able so far to solve the 3D case
analytically.
The meanfield theory of ferromagnetism provides a simple view (qualitative
but not quantitative) of such systems. It predicts a phase transition irrespective
of the dimensionality of the system and thus is obviously a bad approximation
in 1D where we know that there is no phase transition. It works better and
better at larger space dimensions and certain predictions of the theory become
exact for D ≥ 4. The meanfield Hamiltonian is assumed to be
HMF ({σi}) = − (h + Jz ⟨σ⟩) Σ_i σi          (157)
Here z is the coordination number of the lattice (e.g., z = 6 for a three-dimensional cubic lattice). In Eq. 157 the interaction of the given spin σi with its z nearest neighbours (cf. Eq. 156) has been replaced by the interaction of that spin with the “meanfield” of the neighboring spins, assumed to be given by the mean magnetization per spin ⟨σ⟩. This approximation obviously neglects correlations, e.g., that the spin-spin coupling favors configurations where σi is surrounded by spins with the same orientation. ⟨σ⟩ will be calculated below in a selfconsistent manner.
The partition function for the Hamiltonian 157 can be calculated exactly:

ZMF = Σ_{σi=±1} e^{−βHMF({σi})} = Σ_{σi=±1} Π_i e^{β(h+Jz⟨σ⟩)σi}
    = [e^{β(h+Jz⟨σ⟩)} + e^{−β(h+Jz⟨σ⟩)}]^N = [2 cosh (β (Jz ⟨σ⟩ + h))]^N          (158)

The free energy is thus

FMF = −kB T ln ZMF = −kB T N ln (2 cosh (β (Jz ⟨σ⟩ + h)))          (159)
Now we are in the position to calculate the meanfield magnetization per spin:

⟨σ⟩ = −(1/N) ∂FMF/∂h = tanh (β [Jz ⟨σ⟩ + h])          (160)
We arrive here at a selfconsistent, nonlinear equation for ⟨σ⟩, a typical feature of meanfield theories. Here the meanfield ⟨σ⟩ is simply a constant, whereas in the general case, Eq. 153 (which also accounts for the possible presence of a
Figure 11: (a) Graphical representation of the selfconsistent equation for ⟨σ⟩, Eq. 161. The intersections between the lines are solutions of that equation. (b) From the intersections in (a) follows the magnetization per particle as a function of h. Note that for T < TC one has a first-order phase transition from a state with spontaneous negative magnetization to a state with spontaneous positive magnetization. To make the multivalued T < TC curve physical one needs to replace it by the curve with a jump, in very much the same way as the Maxwell construction for the liquid-gas transition, Fig. 8.
nonhomogeneous external potential), one has to determine a function in space, n1 (x).
We now rewrite Eq. 160 as follows:

arctanh ⟨σ⟩ = βh + βJz ⟨σ⟩          (161)
In this form the selfconsistent equation can be solved graphically. In Fig. 11 we plot the lhs of Eq. 161, namely arctanh ⟨σ⟩ vs. ⟨σ⟩, as well as the rhs of that equation, βh + βJz ⟨σ⟩ vs. ⟨σ⟩. The solutions of Eq. 161 are the points where the curves cross. For h = 0 there are either one or three intersections between the arctanh and the linear function. This depends on whether the temperature is above or below a critical temperature

TC = Jz/kB          (162)
the so-called Curie temperature. Above T = TC there is no spontaneous magnetization, ⟨σ⟩ = 0; below T = TC the system becomes ferromagnetic and features spontaneous magnetization (the solution with ⟨σ⟩ = 0 is then irrelevant since it becomes a maximum of the free energy 159).
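The selfconsistent equation 160 can also be solved by simple fixed-point iteration, which converges to the graphical solutions of Fig. 11; a minimal sketch of our own, in units with kB = 1, J = 1 and a starting value we chose to select the non-negative branch:

```python
import math

def magnetization(T, J=1.0, z=6, h=0.0):
    # iterate <sigma> = tanh(beta*(J*z*<sigma> + h)), Eq. 160
    beta, m = 1.0/T, 0.1       # small positive start picks the m >= 0 branch
    for _ in range(10000):
        m = math.tanh(beta*(J*z*m + h))
    return m

Tc = 6.0                        # Curie temperature T_C = J*z/k_B, Eq. 162
print(magnetization(0.5*Tc))    # spontaneous magnetization below T_C
print(magnetization(2.0*Tc))    # essentially 0 above T_C
```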
It is important to note that the van der Waals equation of state, Fig. 7, is very similar to the magnetic case, Fig. 11(a), if one identifies p with h and ⟨σ⟩ with V. In fact, one can show that lattice gases can be mapped one-to-one onto spin models. An important experimental difference is, however, that p, V and h can be imposed on a system but not ⟨σ⟩. Note that, as for the van der Waals equation, the meanfield model for ferromagnets works only qualitatively but not quantitatively. For instance, at T = TC one finds in the meanfield approximation ⟨σ⟩ ∼ h^{1/δ} (cf. middle curve in Fig. 11(b)) with δ = 3. But in 1D there is no phase transition at all, in 2D one finds such a power law but with δ = 15, in 3D with δ ≈ 4.8. The exponent δ takes the meanfield value only from 4 dimensions onwards.
4.3
Poisson-Boltzmann theory
A living cell is essentially a bag filled with charged objects. Besides the charged macromolecules (DNA, RNA and proteins; see Fig. 12) and the membranes (that also contain some charged lipids) there are lots of small ions. These ions are mostly cations, positively charged ions, compensating the overall negative charges of the macromolecules: 5-15 mM sodium ions, Na+, 140 mM potassium ions, K+, as well as smaller amounts of divalent ions, 0.5 mM magnesium, Mg2+, and 10^{−7} mM calcium, Ca2+. Here mM stands for millimolar, 10^{−3} moles of particles per liter. There are also small anions, mainly 5-15 mM chloride ions, Cl−. We know the forces between those charged objects; in fact, basic electrostatics is even taught in school. But even if, for simplicity, we consider the macromolecules as fixed in space, a cell contains a huge number of mobile small ions that move according to the electrostatic forces acting on them, which in turn modifies the fields around them, and so on. This problem is far too complicated to allow an exact treatment. There is no straightforward statistical physics approach that can treat all kinds of charge-charge interactions occurring inside a cell. In other words, we do not yet have a good handle on electrostatics, the major interaction force between molecules in the cell, and that despite many years of hard work. The current chapter tries to give you a feeling of what we understand well and what not.
The standard approach to theoretically describe the many-body problem of mobile charges in an aqueous solution in the presence of charged surfaces is the so-called Poisson-Boltzmann (PB) theory. It is not an exact theory but is yet another example of the meanfield approximation. As I will argue, one needs to
be quite careful when applying it to the highly charged molecules encountered
in a cell.
To construct the PB theory one first distinguishes between mobile and fixed
ions. This distinction comes very natural since the small ions move much more
rapidly than the macromolecules. So it is usually reasonable to assume that
at any given point in time the small ions have equilibrated in the field of the
much slower moving macromolecules. Let us denote the concentration of small
ions of charge Zi e by ci (x) where e denotes the elementary charge and |Zi | the
valency of the ion: |Zi | = 1 for monovalent ions, |Zi | = 2 for divalent ions and
so on. The charge density of the fixed charges, the “macromolecules”, is denoted by
ρfixed(x). The total charge density at point x is then
ρ(x) = Σ_i Z_i e c_i(x) + ρfixed(x) .   (163)
From a given charge density ρ (x) the electrostatic potential ϕ (x) follows via
Figure 12: (a) The DNA double helix: each phosphate carries a negative charge.
(b) The lysozyme (a protein): the colors indicate the electrostatic potential
(blue: positive potential, red: negative)
the Poisson equation:
∇·∇ϕ(x) = Δϕ(x) = −(4π/ε) ρ(x) .   (164)
Here ε is the so-called dielectric constant that has the value ε = 1 in vacuum and
the much larger value ε ≈ 80 in water. That this value is so high in water, the
main ingredient of the cell, is crucial since otherwise free charges would hardly
exist, as we shall see below.
The Poisson equation, Eq. 164, is linear in ϕ and ρ so that it is straightforward
to solve for any given charge density. First one needs to know the
Green's function, i.e., the solution for a single point charge e at position x′,
ρ(x) = e δ(x − x′). Since Δ(1/|x − x′|) = −4πδ(x − x′) this is given by

ϕ(x) = e G(x, x′) = e/(ε |x − x′|) .   (165)
Having the Green's function G(x, x′) of the Poisson equation, one can calculate
the potential resulting from any given charge distribution ρ(x) via integration:

ϕ(x) = ∫ G(x, x′) ρ(x′) d³x′ = ∫ ρ(x′)/(ε |x − x′|) d³x′ .   (166)
You can easily check that this indeed solves Eq. 164. Physically the integral in
Eq. 166 can be interpreted as being a linear superposition of potentials of point
charges, Eq. 165.
Unfortunately things are not as easy here since mobile ions are present. The
potential produced by a given charge density is in general not flat so that the
mobile charges experience forces, i.e., they will move. If they move the charge
density changes and thus also the potential and so on. What we are looking for
is the thermodynamic equilibrium. In that case the charge density of each ion
type is given by the Boltzmann distribution:
c_i(x) = c_i⁰ e^(−Z_i e ϕ(x)/kB T)   (167)
with c_i⁰ denoting the concentration at places in space where ϕ(x) = 0. Combining Eqs. 163, 164 and 167 leads to the Poisson-Boltzmann equation:
Δϕ(x) + Σ_i (4π Z_i e c_i⁰/ε) e^(−Z_i e ϕ(x)/kB T) = −(4π/ε) ρfixed(x) .   (168)
This is an equation for ϕ (x); the charge densities of the different mobile ion
species are then given by Eq. 167. An additional constraint is that the total
charge of the system needs to be zero:
∫_system ρ(x) d³x = 0 .   (169)
This condition can be understood as follows: if the system of size R (here e.g.
the whole cell) carried a non-vanishing charge Q, then the energy needed to
charge it would scale like Q²/(εR). It is extremely unlikely that this energy
is much larger than the thermal energy and therefore Q needs to stay very small.
In other words, the huge positive and negative charges inside the cell need to
cancel each other, leading to a total charge Q that can be considered zero for
all practical purposes.
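The scale of this charging energy is easy to check numerically. In units of kB T it reads βE = lB N²/R for a net charge of N elementary charges, so the largest imbalance that costs only about the thermal energy is N ≈ √(R/lB). A minimal sketch in Python; the cell radius of 10 μm is an assumed illustrative value:

```python
import math

L_B = 0.7    # Bjerrum length in water [nm]
R = 1.0e4    # radius of the system [nm]; 10 micrometres, an assumed cell size

def charging_energy_kT(n_charges):
    """Energy Q^2/(eps*R) in units of k_B T for a net charge of N elementary
    charges; in Gaussian units beta*E = l_B * N^2 / R."""
    return L_B * n_charges ** 2 / R

# Largest imbalance N whose charging energy is only about k_B T:
n_max = math.sqrt(R / L_B)
print(f"N for E ~ k_B T : {n_max:.0f} elementary charges")
# Even a small imbalance of 10^6 charges (out of ~10^10 ions) is prohibitive:
print(f"E for N = 10^6  : {charging_energy_kT(1e6):.1e} k_B T")
```

With these numbers only about a hundred unpaired elementary charges are thermally affordable, which is why Eq. 169 holds to excellent accuracy.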
There are two problems when dealing with the PB equation, one of a more
practical, the other of a more fundamental nature. The practical problem is that this is a
non-linear differential equation for the potential ϕ (x) that is usually very hard
to solve analytically; there exist exact solutions only in a few special cases, two
of which will be discussed below. That ϕ (x) occurs at two different places in Eq.
168 just follows from the above mentioned fact that charges move in response to
the potential and at the same time determine the potential. A solution needs to
be self-consistent, i.e., the distribution of charges needs to induce an electrical
potential in which they are Boltzmann distributed. The non-linearity makes it
in many cases hard to understand how sensitive the solution is to details in the
charge distribution.
What is, however, much more worrisome is the second problem. Solutions of
Eq. 168 are usually smooth functions that look very different from the potentials
found in real electrolyte solutions. Close to each ion the potential has very large
absolute values that in the limit of point charges even go to infinity. Something
has been lost on the way when we constructed the PB equation: instead of looking
at concrete realizations of ion distributions we consider averaged densities
c_i(x), Eq. 167. These averages create smooth potentials. This is a typical
example of a meanfield approximation: the effect of ions on a given ion is replaced
by an averaged effect. A priori it is not at all clear whether such an
approximation makes any sense when applied to the electrostatics of the cell. But it
is intuitively clear that the field emerging from a solution of monovalent ions
shows less dramatic variations than that of a solution of ions of higher valency.
The questions that we have to answer are when PB works reasonably well,
when it breaks down and what new phenomena might emerge in that case. As
we shall see, this is a fascinating topic with many surprising results.
Electrostatics of charged surfaces
We aim at understanding the electrostatic interactions between macromolecules.
Especially we would like to know what happens if two DNA chains come close
to each other or if a positively charged protein approaches a DNA chain. Usually the charges are not distributed homogeneously on the surface of a macromolecule. For instance, charges on the DNA double helix are located along the
helical backbones and the distribution of charged groups on a protein is often
rather complicated, see Fig. 12. Despite these complications, we shall see that
one can learn a great deal about these systems by looking at much simpler geometries, especially by looking at the electrostatics of charged flat surfaces. The
reason for this is that in many cases all the interesting electrostatics happens
very close to the surface of a macromolecule. Essentially the ions experience
then the macromolecules in a similar way as we experience our planet, namely
as a flat disk. We shall see in the following section that this is indeed true; in
this section we focus on charged planes.
To get started we rewrite the PB equation 168 in a more convenient form
by multiplying both sides by e/kB T:

ΔΦ(x) + Σ_i 4π Z_i lB c_i⁰ e^(−Z_i Φ(x)) = −4π lB ρfixed(x)/e .   (170)
Here Φ(x) denotes the dimensionless potential Φ(x) = eϕ(x)/kB T. In addition
we introduced in Eq. 170 one of three important length scales in electrostatics,
the so-called Bjerrum length

lB = e²/(ε kB T) .   (171)
This is the length where two elementary charges feel an interaction energy kB T :
e2 / (εlB ) = kB T . In water with ε = 80 one has lB = 0.7nm. This is small
enough compared to atomic scales so that two oppositely charged ions “unbind”.
On the other hand, inside a protein core the dielectric constant is much smaller,
roughly that of oil with ε ≈ 5, and thus there are hardly any free charges inside
the core. Inspecting again Eq. 171 one can see that another route to free charges
is to heat a substance to extremely high temperatures. This leads to a so-called
plasma, a state of matter of no biological relevance.
As a warm-up exercise let us first consider a simple special case, namely an
infinite system without any fixed charges, only mobile ones. Suppose we have
Figure 13: Atmosphere of positively charged counterions (blue) above a surface
with a negative charge number density −σ (light red) that is assumed to be
homogeneously smeared out.
an equal number of positively and negatively charged ions of valency Z. In this
case the PB equation 170 reduces to
−ΔΦ(x) + 8π lB Z csalt sinh(ZΦ(x)) = 0   (172)
with csalt denoting the bulk ion density, the salt concentration. At first sight Eq.
172 might look difficult to solve but in fact the solution is as trivial as possible,
namely
Φ(x) = 0   (173)
everywhere. This result is rather disappointing but not really surprising since
the PB equation results from a mean-field approximation. And the mean electrical field of an overall neutral system of uniform positive and negative charges
vanishes. In reality one has thermal fluctuations that lead locally to an imbalance between the two charge species. But such fluctuations are not captured in
PB theory. So far it seems that PB produces nothing interesting. This is, however, not true: as soon as fixed charges are introduced one obtains non-trivial
insights. As we shall see later on, even the fluctuations in a salt solution in the
absence of fixed charges can be incorporated nicely in a linearized version of the
PB theory, the Debye-Hückel theory, that we shall discuss later.
In the following we study the distribution of ions above a charged surface as
depicted in Fig. 13. This is an exactly solvable case that provides crucial insight
into the electrostatics of highly charged surfaces and – as we shall see later – of
DNA itself. The system consists of the infinite half-space z ≥ 0 and is bounded
by a homogeneously charged surface of surface charge number density −σ at
z = 0. Above the surface, z > 0, we assume to have only ions that carry charges
of sign opposite to that of the surface, so-called counterions. The counterions
can be interpreted to stem from a chemical dissociation at the surface, leaving
behind the surface charges. These ions will make sure that the charge neutrality
condition, Eq. 169, is respected. We assume that there is no added salt, i.e.,
there are no negatively charged ions present. The PB equation, Eq. 170, takes
now the following form:
Φ′′(z) + C e^(−Φ(z)) = 4π lB σ δ(z) .   (174)
We replaced here the term 4π lB Z c⁰ by the constant C to be determined below
and the primes denote differentiation with respect to z, Φ′ = dΦ/dz. As a
result of the symmetry of the problem, this is an equation in the z-direction
only since the potential is constant in directions parallel to the surface.
To solve Eq. 174 let us consider the space above the surface, z > 0. Due to
the absence of fixed charges, we find
Φ′′(z) + C e^(−Φ(z)) = 0 .   (175)
Multiplying this equation with Φ′ and integrating along z leads to

E = (1/2)(Φ′)² − C e^(−Φ)   (176)
where E denotes an integration constant. To solve Eq. 176 we use the trick of
separation of variables, here of z and Φ, i.e., we rewrite this equation as

dz = ± dΦ/√(2E + 2C e^(−Φ)) .   (177)
Integration yields
z − z̄ = ± ∫_Φ̄^Φ dΦ/√(2E + 2C e^(−Φ))   (178)
where we start the integration at height z̄ above the surface where Φ (z̄) = Φ̄.
As we shall see a posteriori we obtain the solution with the right boundary
conditions if we use the positive sign and set E = 0. This makes the integral in
Eq. 178 trivial. If we set z̄ = 0 and choose Φ̄ = 0 we find
z = (1/√(2C)) ∫₀^Φ e^(Φ/2) dΦ = √(2/C) (e^(Φ/2) − 1) .   (179)

Solving this for Φ finally gives the potential as a function of z:

Φ = 2 ln(1 + √(C/2) z) .   (180)
At a charged surface the electrical field −dϕ/dz makes a jump proportional to
the surface charge density. It vanishes below the surface and attains just above
the surface the value
dΦ/dz|_(z↓0) = 4π lB σ = √(2C) .   (181)
Figure 14: Potential Φ, Eq. 183, and rescaled counterion density 2π lB λ² c, Eq.
186, as a function of the rescaled height z/λ above a charged surface. The
dashed lines indicate a simplified counterion profile where all counterions form
an ideal gas inside a layer of thickness λ.
This sets C. In fact, √(2/C) turns out to be the second important length scale
in electrostatics, the Gouy-Chapman length:

λ = 1/(2π lB σ) .   (182)
The physical meaning of this length becomes clear further below. We can now
rewrite Eq. 180 as

Φ = 2 ln(1 + z/λ) .   (183)
The atmosphere of counterions above the surface is then distributed according
to Eq. 167:
c(z) = c⁰ e^(−Φ) = c⁰ λ²/(z + λ)² .   (184)
The prefactor c⁰ in Eq. 184 has to be chosen such that the total charge of the
counterions exactly compensates the charge of the surface, see Eq. 169:

∫_−∞^∞ [c(z) − σδ(z)] dz = 0 .   (185)
This sets c⁰ = σ/λ and hence

c(z) = 1/(2π lB (z + λ)²) .   (186)
This distribution is depicted together with the potential Φ, Eq. 183, in Fig. 14.
The density of ions above the surface decays algebraically as z⁻² for distances
larger than λ. This is somewhat surprising since we have seen that the
distribution of gas molecules in a gravity field decays exponentially, namely as
c(z) ∼ e^(−mgz/kB T), the so-called barometric formula. The physical reason is
that the gas particles do not feel each other but the ions do. The higher the
ions are above the surface, the less they “see” the original surface charge density
since the atmosphere of ions below masks the surface charges. As a result the
ions farther above the surface feel less strongly attracted, which leads to a slower
decay of the density with height.
We can now attach a physical meaning to the Gouy-Chapman length λ.
First of all, λ is the height below which half of the counterions are found since
∫₀^λ c(z) dz = σ/2. Secondly, if we take a counterion at the surface where Φ(0) =
0 and move it up to the height λ where Φ(λ) = 2 ln 2, we have to do work on the
order of the thermal energy, eϕ = 2 ln 2 kB T ≈ kB T. One can say that the ions
in the layer of thickness λ above the surface form an ideal gas since the thermal
energy overrules the electrostatic attraction to the surface. On the other hand,
if an ion attempts to “break out” and escape to infinity, it will inevitably fail
since it would have to pay an infinite price: Φ → ∞ for z → ∞. That means
that all the counterions are effectively bound to the surface. But half of the
counterions, namely those close to the surface, are effectively not aware of their
“imprisonment.”
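Both properties of the Gouy-Chapman length can be checked numerically from Eqs. 182, 183 and 186. A short sketch; the surface charge density of 1 nm⁻² is an assumed illustrative value:

```python
import math

L_B = 0.7      # Bjerrum length [nm]
SIGMA = 1.0    # surface charge number density [nm^-2], an assumed value
LAM = 1.0 / (2 * math.pi * L_B * SIGMA)   # Gouy-Chapman length, Eq. 182

def phi(z):
    """Dimensionless potential above the surface, Eq. 183."""
    return 2.0 * math.log(1.0 + z / LAM)

def c(z):
    """Counterion density profile, Eq. 186."""
    return 1.0 / (2 * math.pi * L_B * (z + LAM) ** 2)

def integrate(f, a, b, n=100000):
    """Midpoint rule; accurate enough for this smooth integrand."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

half = integrate(c, 0.0, LAM)      # counterions found below z = lambda
print(half / SIGMA)                # -> 0.5: half of all counterions
print(phi(LAM))                    # -> 2 ln 2 ~ 1.39: about k_B T of work
```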
Based on these ideas let us now try to estimate the free energy f_approx per
area of this so-called electrical double layer. We assume that all the counterions
form an ideal gas confined to a slab of thickness λ above the surface as indicated
in Fig. 14 by the dashed line. The density of the ions is thus c = σ/λ which,
according to Eq. 53, leads to the free energy per area

βf_approx = c (ln(c λT³) − 1) λ = σ (ln(σ λT³/λ) − 1)   (187)
where λT is the thermal de Broglie length, see Eq. 11.
We show now that this simple expression is astonishingly close to the exact
(mean-field) expression. A more formal, less intuitive way of introducing the
PB theory would have been to write down an appropriate free energy functional
F from which the PB equation follows via minimization. This functional is
the sum of the electrostatic internal energy and the entropy of the ions in the
solution:
βF = (1/(8π lB)) ∫ (∇Φ(r))² d³r + ∫ (ρ(r)/e) [ln((ρ(r)/e) λT³) − 1] d³r .   (188)
Replacing ρ(r) in this functional by Φ(r) through the Poisson equation ΔΦ =
−4π lB ρ/e, Eq. 164, one finds that the Euler-Lagrange equation is indeed
identical to the PB equation, Eq. 170, namely here ΔΦ(x) + 4π lB c⁰ e^(−Φ(x)) = 0.
Inserting the PB solution for a charged surface, Eqs. 183 and 186, into the free
energy functional, Eq. 188, we find the following free energy per area:

βf = σ (ln(σ λT³/λ) − 2) .   (189)
The exact expression, Eq. 189, differs from the approximate one, Eq. 187, just
by a term −σ. Given that agreement, it is fair to say that we have achieved
Figure 15: Two parallel, negatively charged surfaces and their counterions. (a)
For large separations D between the surfaces the two counterion clouds hardly
interact. (b) If the planes are close by, the two clouds combine and form a dense
“gas”, homogeneously distributed across the gap.
a rather clear qualitative understanding of the physics of the electrical double
layer.
Since we are mainly interested in the interactions between macromolecules,
especially between two DNA molecules and between a DNA molecule and a
protein, we discuss now two model cases: the interaction between two negatively
charged surfaces and the interaction between two oppositely charged surfaces.
We begin with two negatively charged surfaces. The exact electrostatics can be
worked out along the lines of Eqs. 174 to 180 using appropriate values of the
integration constant. We prefer to give here a more physical line of argument.
Suppose the two parallel walls, at distance D, carry exactly the same surface
charge density −σ, see Fig. 15. Then due to the symmetry of the problem the
electrical field in the midplane vanishes; this plane is indicated in the drawing,
Fig. 15, by dashed lines. The disjoining pressure Π between the two planes,
i.e., the force per area with which they repel each other, can then be easily
calculated since it must equal the pressure of the counterions in that midplane.
Using the ideal gas law, Eq. 27, we find

Π/kB T = c(D/2) .   (190)
Without doing any extra work we can now predict the disjoining pressure
between the two surfaces in two asymptotic cases. If the distance is much larger
than the Gouy-Chapman length λ of the planes, D ≫ λ, we can assume that
the two counterion clouds are independent of each other. The density in the
midplane is then the sum of the two single-plane densities, see Fig. 15(a). From
Eq. 186 we obtain
Π/kB T ≈ 2 · 1/(2π lB (D/2)²) = 4/(π lB D²) .   (191)
Remarkably the disjoining pressure is here independent of σ. This results from
Figure 16: Two oppositely charged surfaces and their counterions. (a) For large
separation D between the surfaces the two counterion clouds hardly interact.
(b) If the planes are close by, the counterions are not needed anymore. They gain
entropy by escaping to infinity.
the fact that the single-plane counterion density, Eq. 186, becomes independent
of λ (and thus σ) for D ≫ λ. In the other limit, D ≪ λ, the two counterion
clouds are strongly overlapping and we expect a flat density profile, see Fig.
15(b). Hence
Π/kB T ≈ 2σ/D = 1/(π lB λ D) .   (192)
The pressure is here linear in σ, reflecting the counterion density. Note that
these results show that the situation is very different from how we are used
to thinking about electrostatics, namely that the pressure results from the direct
electrostatic repulsion of the two charged surfaces. In fact, in the absence of
counterions the electrical field between the surfaces is constant and follows from
the boundary condition, Eq. 181. This leads to
Π/kB T = 4π lB σ²   (193)
which is independent of the distance between the surfaces and, as a result of
the pairwise interaction between surface charges, proportional to σ². Thus the
counterions completely change and, in fact, “rule” the electrostatics.
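A quick numerical comparison of the three pressure expressions, Eqs. 191-193, illustrates this point. The surface charge density below is an assumed illustrative value; note that the two counterion asymptotics happen to cross at D = 4λ:

```python
import math

L_B = 0.7     # Bjerrum length [nm]
SIGMA = 1.0   # surface charge number density [nm^-2], an assumed value
LAM = 1.0 / (2 * math.pi * L_B * SIGMA)   # Gouy-Chapman length, Eq. 182

def pressure_far(D):
    """Disjoining pressure for D >> lambda, Eq. 191 [k_B T / nm^3]."""
    return 4.0 / (math.pi * L_B * D ** 2)

def pressure_near(D):
    """Disjoining pressure for D << lambda, Eq. 192 [k_B T / nm^3]."""
    return 1.0 / (math.pi * L_B * LAM * D)

def pressure_bare():
    """Naive counterion-free result, Eq. 193: constant in D."""
    return 4 * math.pi * L_B * SIGMA ** 2

# The two counterion asymptotics match exactly at D = 4*lambda:
D = 4.0 * LAM
print(pressure_far(D), pressure_near(D), pressure_bare())
```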
This becomes even more evident when looking at the interaction between
two oppositely charged surfaces, see Fig 16. Such a situation arises when a
positively charged protein comes close to a negatively charged DNA molecule.
For simplicity, let us assume that the charge number densities of the two surfaces
are identical, σ⁺ = σ⁻ = σ. If the two surfaces are very far from each other,
we can assume that both form the usual electrical double layer of thickness λ,
one with positive counterions, one with negative ones, see Fig 16(a). If the
two surfaces come close to each other, Fig 16(b), there is, however, no need
for counterions anymore since the two surfaces can neutralize each other. The
counterions can therefore escape to infinity and gain translational entropy on
the order of kB T . The binding energy per area of the two surfaces as a result of
this counterion release should thus be something on the order of kB T σ. If the
surface charge densities are not the same, charge neutrality enforces that some
of the counterions remain between the surfaces.
Note that our model system that assumes two infinitely large surfaces and
no added salt is quite academic and that a precise calculation of this effect is
not possible in the current framework. No matter how far the two surfaces
are apart: if we look at length scales much larger than their separation, they
together look like a neutral plane. As a result, the
counterions are never really bound. In the following sections we have to come
up with slightly more realistic situations that allow better descriptions of the
counterion release mechanism.
Electrostatics of cylinders and spheres
So far we have discussed planar charged surfaces. However, at length scales
below its persistence length the DNA double helix looks more like a cylinder
and the shapes of globular proteins might be better described by spheres. We ask
here the question whether the basic physics that we described in the previous
section still holds for such objects. As we shall see, this is actually a subtle
problem that can be understood in beautiful physical terms.
Let us start with DNA. DNA is a charged cylinder with a diameter of 2nm
and line charge density of −2e/0.33nm. The question that we like to answer is
whether such a charged cylinder has its counterions effectively bound or whether
they are free. The answer is surprising: around three quarters of the DNA's
counterions are indeed condensed but the rest are free and can go wherever they
like. We give here a simple physical argument that goes back to the great
Norwegian scientist Lars Onsager. For simplicity, we describe the DNA molecule
as an infinitely long cylinder of line charge density −e/b and diameter 2R. The
charges are assumed to be homogeneously smeared out on its surface. The
dimensionless electrostatic potential of a cylinder is known to be

Φ(r) = (2 lB/b) ln(r/R)   (194)
where r ≥ R denotes the distance from the centerline of the cylinder. Suppose
we start with a universe that consists only of one infinite cylinder. Now let us
add one counterion. We ask ourselves whether this counterion will be bound to
the cylinder or whether it is able to escape to infinity. In order to find out we
introduce two arbitrary radii r1 and r2 with r2 ≫ r1 ≫ R as depicted in Fig.
17. Now suppose the counterion tries to escape from the cylindrical region of
radius r1 to the larger cylindrical region of radius r2 . According to Eq. 194 the
counterion has to pay a price, namely it has to move uphill in the electrostatic
potential by an amount of the order of
ΔΦ = Φ(r2) − Φ(r1) = (2 lB/b) ln(r2/r1) .   (195)
Figure 17: Onsager's argument for counterion condensation on charged cylinders
is based on an estimate of the free energy change for a counterion that goes from
a cylindrical region of radius r1 to a larger region of radius r2.
At the same time it has a much larger space at its disposal, i.e., it enjoys an
entropy gain. The entropy of a single ion in a volume V follows from the ideal
gas entropy S = kB N (ln(V/(N λT³)) + 5/2) with N = 1. This equation follows
from combining Eq. 53 with Eqs. 26 and 51. We assume here V ≫ λT³ so that
we can neglect the 5/2-term. When the ion moves from the smaller to the larger
region we find the following change in entropy:

ΔS = S(r2) − S(r1) = kB ln(r2²/r1²) = 2 kB ln(r2/r1) .   (196)
Altogether this amounts to a change in the free energy of

ΔF/kB T = ΔΦ − ΔS/kB = 2 (lB/b − 1) ln(r2/r1) .   (197)
There are two possible cases. For weakly charged cylinders, b > lB , the free
energy change is negative, ∆F < 0, and the counterion eventually escapes to
infinity. For highly charged cylinders, b < lB , one finds ∆F > 0. In that case
the energy cost is too high as compared to the entropy gain and the counterion
stays always in the vicinity of the cylinder.
Now the same argument can be used for the rest of the counterions. What
we have to do is simply to add, one by one, all the counterions. The nontrivial and thus interesting case is that of a highly charged cylinder with b < lB .
In the beginning all the counterions that we add condense, thereby reducing
the effective line charge. This continues up to the point when the line charge
density has been lowered to the value −e/lB . All the following counterions that
are added feel a cylinder that carries an effective line density that is just too
weak to keep them sufficiently attracted allowing them to escape to infinity. To
conclude, the interplay between entropy and energy regulates the charge density
Figure 18: Onsager's argument applied to a charged sphere.
of a cylinder to the critical value −e/lB . Cylinders with a higher effective charge
density simply cannot exist.
According to the above definition DNA is a highly charged cylinder.
Counterion condensation reduces its bare charge density of 1/b = 2/(0.33 nm)
to the critical value 1/lB = 1/(0.7 nm). That means that a fraction

(e/b − e/lB)/(e/b) = 1 − b/lB = 1 − 1/ξ ,   (198)
i.e., about 76%, of the DNA's counterions are condensed. Counterion condensation
on cylinders is called Manning condensation and is characterized by the
dimensionless ratio ξ = lB/b, the Manning parameter. Cylinders with ξ > 1 are
highly charged and have condensed counterions. More precise treatments based
on the PB equation show that this simple line of argument is indeed correct.
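Eq. 198 is simple enough to encode directly. The sketch below evaluates the Manning fraction for DNA's charge spacing (two phosphates per 0.33 nm of helical rise):

```python
L_B = 0.7          # Bjerrum length in water [nm]
B_DNA = 0.33 / 2   # DNA charge spacing b [nm]: two phosphates per 0.33 nm rise

def condensed_fraction(b, l_b=L_B):
    """Condensed counterion fraction of a charged cylinder, Eq. 198.
    xi = l_B / b is the Manning parameter; nothing condenses for xi <= 1."""
    xi = l_b / b
    return 1.0 - 1.0 / xi if xi > 1.0 else 0.0

print(condensed_fraction(B_DNA))   # -> ~0.76: about 76% condensed
print(condensed_fraction(1.0))     # -> 0.0: weakly charged cylinder
```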
There is another interesting interpretation of Manning condensation. We
have seen above that all the counterions of an infinite, planar surface are
condensed. Now a cylinder looks like a flat surface to a counterion if the
Gouy-Chapman length, the typical height at which it lives above the surface, is much
smaller than the radius of the cylinder, i.e., if λ ≪ R. Using the definition of
λ, Eq. 182, this leads to the condition ξ ≫ 1. One can say that for ξ > 1 a
counterion experiences the cylinder as a flat surface and thus stays bound to it.
Let us now study a model protein, a sphere of radius R that carries a total
charge eZ homogeneously smeared out over its surface. We can again use an
Onsager-like argument by adding a single counterion to a universe that consists
only of that sphere. We estimate the change in free energy when the counterion
moves from a spherical region of radius r1 ≫ R around the sphere to a larger
region of radius r2 ≫ r1, see Fig. 18. The change in electrostatic energy is
given by ΔΦ = Φ(r2) − Φ(r1) = lB Z (r2⁻¹ − r1⁻¹) and that of the entropy by
ΔS = 3 kB ln(r2/r1). We learn from this that the free energy change ΔF/kB T =
ΔΦ − ΔS/kB goes to −∞ for r2 → ∞, no matter how highly the sphere is
charged. This suggests that a charged sphere will always lose all its counterions.
Figure 19: A highly charged sphere in a salt solution. To a good approximation
ions can “live” in two zones. Zone I contains “condensed” counterions, zone II
the bulk electrolyte solution.
Our results on counterion condensation that we have obtained so far can be
summarized as follows. The fraction of condensed ions, f_cond, depends on the
shape of the charged object as follows:

plane:    f_cond = 1,
cylinder: f_cond = 1 − ξ⁻¹ for ξ > 1, f_cond = 0 otherwise,
sphere:   f_cond = 0.
It is important to realize that we have considered so far fairly academic special
cases. First of all, we assumed infinitely extended planes and infinitely long
cylinders but any real object is of finite size. Any object of finite extension
looks from far apart like a point charge and will thus lose all its counterions,
as a sphere does. One might therefore think that theorizing about counterion
condensation is a purely academic exercise. This is luckily not the case since, as
we shall see now, counterions might also condense on spheres. We came above
to the conclusion that for the spherical case fcond = 0 by assuming that we had
only one sphere in the universe. If there is a finite density of spheres, each with
its counterions, the situation can be different. Also we assumed that there are
no small ions present, except the counterions of the sphere. If we have a single
sphere but a finite salt concentration, the situation can again be different from
the above given academic case. In both cases, for a finite density of spheres
or for a finite salt concentration, the entropy gain for a counterion to escape
to infinity is not infinite anymore. Depending on the sphere charge and on the
concentration of small ions in the bulk, there might be a free energy penalty
instead.
We consider now a single sphere in a salt solution following the line of argument given by Alexander and coworkers (1984). For a highly charged sphere at
moderate salt concentration csalt they postulated two zones, see Fig. 19. Zone
I is the layer of condensed counterions of thickness λ and zone II is the bulk.
When a counterion from the bulk, zone II, enters zone I it looses entropy since
it goes from the dilute salt solution of concentration csalt to the dense layer of
condensed counterions. The ion concentration of that layer can be estimated to
be ccond ≈ σ/λ = 2π lB σ² where σ denotes the surface charge number density
of the sphere. We assume here that the sphere is so highly charged that most
of its counterions are confined to zone I. The entropy loss is then given by

ΔS = S_I − S_II ≈ −kB ln(ccond/csalt) = −kB Ω .   (199)
The counterion also gains something by entering zone I. In zone II it does not feel
the presence of the charged sphere since the electrostatic interaction is screened
by the other small ions as shall become clear in the following section. On the
other hand, in zone I it sees effectively a sphere of charge Z ∗ where Z ∗ denotes
the sum of the actual sphere charge, Z, and the charges from the condensed
counterions inside zone I. The gain in electrostatic energy is thus

ΔΦ ≈ −lB Z*/R .   (200)

If we start with a system where all counterions are inside the bulk, counterions
flow into zone I up to a point when there is no free energy gain anymore. This
point is reached when the charge is renormalized to the value

eZ* = eΩ R/lB .   (201)
To formulate it in a more elegant way: Z ∗ is the point where the chemical
potentials of zone I and II are identical.
Note, however, that in order to obtain Eq. 201 we cheated a bit since we
assumed that Ω is a constant. This is not really the case since according to Eq.
199 Ω depends on ccond and thus on Z ∗ . Since this dependence is logarithmic,
i.e., very weak, this simplification is quite reasonable and one can assume Ω
to be a constant with a value of around 5 for typical salt concentrations and
surface charge densities encountered in cells. A more concise way of calculating
the renormalized charge Z ∗ is given in the next section.
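The estimate leading to Eq. 201 fits in a few lines of code: compute Ω from Eq. 199 with ccond taken from the bare surface charge density, then Z* = Ω R/lB. The concrete numbers below (a 5 nm sphere with σ = 1 nm⁻² in 100 mM monovalent salt) are assumed purely for illustration:

```python
import math

L_B = 0.7          # Bjerrum length [nm]
M_TO_NM3 = 0.602   # 1 mol/l expressed in particles per nm^3

def renormalized_charge(R, sigma, c_salt_molar):
    """Saturated sphere charge Z* from Eq. 201.
    R: sphere radius [nm]; sigma: bare surface charge number density [nm^-2];
    c_salt_molar: bulk salt concentration [mol/l]."""
    c_salt = c_salt_molar * M_TO_NM3
    c_cond = 2 * math.pi * L_B * sigma ** 2    # condensed-layer density sigma/lambda
    omega = math.log(c_cond / c_salt)          # entropy cost Omega, Eq. 199
    return omega * R / L_B                     # Eq. 201

# Assumed example: a highly charged 5 nm sphere in 100 mM monovalent salt
z_star = renormalized_charge(R=5.0, sigma=1.0, c_salt_molar=0.1)
print(z_star)   # -> ~31 elementary charges, with Omega ~ 4.3
```

The resulting Ω is indeed of order 5, as quoted in the text, and depends only logarithmically on the assumed densities.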
We are now in the position to refine our argument on counterion release from
above. Consider again the case of two oppositely charged surfaces as depicted in
Fig. 16 but with additional salt. In the case of equal surface charge densities of
the two surfaces all the counterions are released and the free energy gain reflects
the change of concentration that the counterions experience. The free energy
change per surface thus scales as

f/kB T ≈ σΩ ≈ σ ln(2π lB σ²/csalt) .   (202)
When discussing PB theory above – especially for spherical geometry where
no analytical solutions exist – we had to rely on simplified arguments. It turns
out that one can gain a great deal of insight by linearizing PB theory. Strictly
speaking such a linearization only makes sense for weakly charged surfaces but
we shall see that there is an elegant argument that also allows us to extend this
framework to highly charged objects.
Debye-Hückel theory
As mentioned earlier, the PB equation is hard to handle since it is non-linear.
Here we study its linearized version, the well-known Debye-Hückel (DH) theory.
It provides an excellent approximation to PB theory for the case that the fixed
charges are weak. Consider the PB equation of a salt solution of valency Z = 1
and concentration csalt in the presence of fixed charges of density ρfixed . The
PB equation 170 then takes the form

ΔΦ + 4π lB csalt (e^(−Φ) − e^(+Φ)) = −4π lB ρfixed/e .   (203)
Let us now assume that the electrostatic energy is small everywhere, i.e., that
Φ(x) ≪ 1 for all x. In that case we can linearize the exponential functions,
eΦ ≈ 1 + Φ and e−Φ ≈ 1 − Φ. This results in the DH equation
−∆Φ + κ²Φ = 4πlB ρfixed/e.    (204)
We introduced here the final of the three length scales important in electrostatics, the Debye screening length κ−1 . For monovalent salt, as assumed here, this
length is given by
κ−1 = 1/√(8πlB csalt).    (205)
Its physical meaning will become clear below.
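To get a feeling for the numbers, the short sketch below (an added illustration, not from the text) evaluates Eq. 205 with lB ≈ 0.7 nm, the Bjerrum length of water at room temperature used later in the text; the molar-to-nm⁻³ conversion factor is an assumption of this sketch:

```python
import math

lB = 0.7  # Bjerrum length of water at room temperature, in nm

def debye_length(c_molar):
    """Debye screening length (nm), Eq. 205, for a monovalent salt of molarity c_molar."""
    c = c_molar * 6.022e23 / 1e24   # mol/L -> ions per nm^3
    return 1.0 / math.sqrt(8.0 * math.pi * lB * c)

# physiological salt (~100 mM) screens on the nanometer scale
print(debye_length(0.1))    # ~1 nm
print(debye_length(0.001))  # ~10 nm at 1 mM
```

Note how the screening length grows only like the inverse square root of the salt concentration.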
We can now come back to the disappointing result we encountered earlier
when we looked at a salt solution in the absence of fixed charges where the PB
equation 172 is solved by Φ ≡ 0. This has not changed here since also the DH
equation produces the same trivial answer. But now we are in the position to
go beyond this result and to include in our discussion correlations between salt
ions. This would have been very difficult to do for the PB equation where no
appropriate analytical solutions are available. Consider a point charge +eZ at
position x0 . The DH equation for such a test charge takes the form:
(−∆ + κ²) G(x, x0) = 4πlB Z δ(x − x0).    (206)
Knowing G(x, x0), the Green’s function, allows us to calculate Φ for an arbitrary
distribution of fixed charges:

Φ(x) = ∫ G(x, x0) ρfixed(x0)/(eZ) d³x0.    (207)
The Green’s function for Eq. 206 is given by
G(x, x0) = lB Z e−κ|x−x0|/|x − x0|.    (208)
One calls this a Yukawa-type potential, referring to Yukawa’s original treatment
introduced to describe the nuclear interaction between protons and neutrons
due to pion exchange.

Figure 20: An ion of charge +eZ is surrounded by an oppositely charged ion
cloud of typical size κ−1.

That this indeed solves Eq. 206 can be checked by letting the Laplace operator
in spherical coordinates act on the potential of a point charge at the origin:
(∂²/∂r² + (2/r) ∂/∂r) (e−κr/r) = −4πδ(r) + κ² e−κr/r.    (209)
To derive Eq. 209 we use the fact that ∆(1/|x − x0|) = −4πδ(x − x0) as mentioned above Eq. 165.
What is the physical picture behind Eq. 208? In the absence of any salt ions
one would have just the potential Φ (x) = lB Z/ |x − x0 | around our test charge,
i.e., Eq. 208 without the exponential term or, if you prefer, the full Eq. 208
but with κ = 0. In the presence of salt ions the test charge is surrounded by
an oppositely charged ion cloud as schematically depicted in Fig. 20. This ion
cloud effectively screens the test charge so that the potential decays faster than
1/r, namely like e−κr /r. The screening length κ−1 reflects the typical cloud
size.
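Away from the source the Yukawa form satisfies the homogeneous DH equation exactly, since ∆(e−κr/r) = κ² e−κr/r for r > 0. This can also be verified numerically; the sketch below (added here, not from the text) applies the radial Laplacian ∆f = (1/r) d²(rf)/dr² by finite differences:

```python
import math

kappa = 2.0
f = lambda r: math.exp(-kappa * r) / r   # Yukawa potential (up to the prefactor lB Z)

def radial_laplacian(fun, r, h=1e-4):
    # in spherical symmetry: Laplacian f = (1/r) d^2(r f)/dr^2, via central differences
    g = lambda x: x * fun(x)
    return (g(r + h) - 2.0 * g(r) + g(r - h)) / (h * h) / r

# the residual of (-Laplacian + kappa^2) f should vanish away from the origin
for r in (0.5, 1.0, 3.0):
    residual = -radial_laplacian(f, r) + kappa**2 * f(r)
    print(r, residual)   # ~0 in each case
```

The delta function at the origin is of course not probed by this check; it fixes the prefactor lB Z.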
Having at hand an expression for the potential around an ion, we calculate
now the free energy of a salt solution on the level of the DH theory. As a first
step we determine the change of the self-energy of an ion that is brought from
ion-free water to the salt solution. We consider the ion as a homogeneously
charged ball of radius a and charge density ρ = 3e/(4πa³). We shall show
below that the result will not depend on the radius so that we can take the
limit a → 0. In an electrolyte-free environment the self-energy is
lim_{a→0} (1/2) ∫∫ ρ(x) ρ(x0)/(ε|x − x0|) d³x0 d³x = lim_{a→0} (1/2) e²/(εa) = ∞.    (210)
On the right-hand side we assumed a point-like charge for which ρ(x) = e δ(x).
There is evidently a problem since the self-energy of the point charge is infinite.
Let us nevertheless go ahead and calculate also the self-energy of the point
charge inside an electrolyte solution:
lim_{a→0} (1/2) ∫∫ ρ(x) ρ(x0) e−κ|x−x0|/(ε|x − x0|) d³x0 d³x = lim_{a→0} (1/2) e² e−κa/(εa) = ∞.    (211)
Also here the self-energy is infinite. What saves us is that we are not interested
in what it costs to “form” a point ion. What we want to know instead is the
change in the self-energy when the ion is transferred from ion-free water to the
electrolyte solution. This change turns out to be finite:
βEself = (lB/2) lim_{a→0} [(e−κa − 1)/a] = −lB κ/2.    (212)
Each particle in the electrolyte contributes this value to the internal energy.
This leads to the following change in the internal energy density:
β∆u = 2csalt βEself = −κ³/(8π).    (213)
Combining Eqs. 14 and 51 we know that the average internal energy density
h∆ui follows from the free energy density ∆f via

h∆ui = ∂/∂β [β∆f].    (214)
This allows us to calculate the electrostatic contribution of the charge fluctuations to the free energy density:

∆f = −kB T κ³/(12π).    (215)
This finding should surprise you. We discussed in Section 2.3 the impact of
the interaction between particles of a real gas on its pressure and free energy. We
found to lowest order in the density n that the ideal gas expressions are changed
by terms of the order n2 , see Eqs. 96 and 101. This reflects interactions between
pairs of particles. Surprisingly, for the ion solution we find that interactions
between ions lead to a free energy contribution proportional to κ³ ∼ csalt^{3/2} instead.
How can one understand this discrepancy? The reason lies in the fact that the
electrostatic interaction decays very slowly with distance. If one attempts to
calculate the second virial coefficient B2 for such a long-ranged 1/r-potential
one finds a diverging integral: According to Eq. 79 the integrand is proportional
to r² (e−βw(r) − 1), which scales then for large r as r² (1/r) = r.
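The bookkeeping between Eqs. 213–215 can be verified numerically. In the sketch below (an added check; the particular values of lB and csalt are arbitrary) we use that lB ∝ β and hence κ ∝ β^{1/2}, take β∆f from Eq. 215, and recover β∆u = −κ³/(8π) of Eq. 213 by differentiating as in Eq. 214:

```python
import math

lB0, csalt = 0.7, 0.1   # Bjerrum length at beta = 1 and salt density (arbitrary units)

def kappa(beta):
    # lB is proportional to beta, so kappa grows like sqrt(beta)
    return math.sqrt(8.0 * math.pi * lB0 * beta * csalt)

def beta_df(beta):
    # Eq. 215: beta * Delta f = -kappa^3 / (12 pi)
    return -kappa(beta)**3 / (12.0 * math.pi)

beta, h = 1.0, 1e-6
du = (beta_df(beta + h) - beta_df(beta - h)) / (2.0 * h)   # Eq. 214: Delta u
print(beta * du, -kappa(beta)**3 / (8.0 * math.pi))         # the two numbers agree (Eq. 213)
```

The factor 3/2 between the 12π and 8π denominators is exactly the exponent of κ³ ∼ β^{3/2}.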
We provide now a scaling argument that makes Eq. 215 transparent. Consider
a very small volume V inside the electrolyte solution. Ions can enter and
leave this volume at will, as if they were uncharged, and as a result the
volume displays random fluctuations in its net charge. According to the
central limit theorem the net charge Q can be estimated to be proportional to
the square root of the number of ions Nion inside that volume, i.e.,

Q/e ≈ ±√Nion = ±√(csalt V).    (216)

The assumption that the ions are independent of each other is only true up to
regions of size L with volume V = L³ for which the electrostatic self-energy
equals the thermal one:

lB Q²/L = 1.    (217)
Figure 21: Schematic sketch of charge fluctuations inside an electrolyte solution.
Regions of typical size κ−1 with an excess of negative ions are surrounded by
regions with positive net charge and vice versa.
This condition can be rewritten as

L = 1/√(lB csalt) ∼ κ−1,    (218)
i.e., the length scale up to which ions move independently from each other
is just the Debye screening length, Eq. 205. For larger length scales a region
of volume κ−3 that happens to carry a positive excess charge is typically surrounded by
regions with negative excess charge as schematically indicated in Fig. 21. The
interaction energy of two such neighboring, oppositely charged regions is on
the order of −kB T as follows directly from Eq. 217. Thus we expect that the
fluctuations in the charge distribution lead to a contribution to the free energy
density that scales like −kB T /κ3 . This is indeed what we found from the exact
DH treatment, Eq. 215.
The DH equation can be solved analytically for various geometries. We
present here the solutions for three standard geometries: a plane, a line and a
charged ball. The DH equation for a plane of charge density σ is given by
(−∂²/∂z² + κ²) Φ = 4πlB σ δ(z).    (219)
It is straightforward to check that this is solved by the potential
Φ(z) = 4πlB σ κ−1 e−κz    (220)
for z ≥ 0 and Φ (z) = 0 for z < 0. A corresponding DH equation in cylindrical
symmetry for a charged line of line charge density b−1 leads to the potential
Φ(r) = (2lB/b) K0(κr) ≈ −(2lB/b) ln(κr)   for κr ≪ 1,
Φ(r) ≈ (lB/b) √(2π/(κr)) e−κr   for κr ≫ 1.    (221)
The function K0 is a modified Bessel function whose asymptotic behavior for
small and large arguments has been used on the rhs of Eq. 221 to predict the
potential close to and far from the charged line. The short-distance behavior is
identical to the one of a naked rod, Eq. 194; for larger distances the line charge
is screened as e−κr/√r (up to logarithmic corrections). Finally, for a charged
sphere of radius R and charge Z one finds for r > R the potential

Φ(r) = [lB Z/(1 + κR)] e−κ(r−R)/r.    (222)
As for a point charge the potential decays proportional to e−κr /r. Here, however, for a sphere larger than the screening length, κR > 1, the full charge can
never be seen, not even close to its surface, since it is distributed in a volume
larger than the screening length. Z is then effectively reduced to Z/ (κR).
The above given three potentials are not only exact solutions to the DH
equation but also excellent approximations to the PB equation if the potential
is everywhere much smaller than one, Φ ≪ 1. For a line charge this condition
requires lB ≪ b, i.e., the Manning parameter ξ needs to be much smaller than
one. Hence DH theory works well if we do not have Manning condensation.
In other words, counterion condensation is just a physical manifestation of the
nonlinearity of the PB equation. For spheres the situation is similar. Assuming
a sufficiently small sphere so that κR < 1, the DH approximation works well if
lB Z/R ≪ 1, see Eq. 222. This condition is fulfilled if the sphere charge is much
smaller than the charge Z ∗ , Eq. 201, the value to which a highly charged sphere
would be renormalized. In other words, DH can be used for weakly charged
spheres that do not have charge renormalization.
But what can one do if surface charge densities are so high that Φ becomes
larger than unity? Does one necessarily have to deal with the difficulties of
nonlinear PB theory or can one somehow combine the insights into counterion
condensation and DH theory to construct something that can be handled more
easily? That this is indeed possible has been demonstrated by Alexander and
coworkers (1984). The idea is that the nonlinearities of the PB equation cause
the charge renormalization of highly charged surfaces. As a result the potential
slightly away from such a surface is so small that DH theory can be used, but
a DH theory with a properly reduced surface charge. Consider, for instance,
a sphere with κR < 1. If the sphere is weakly charged we can simply use Eq.
222. If the sphere is highly charged the nonlinearities of the PB theory predict a
layer of condensed counterions of thickness λ that effectively reduces the sphere
charge Z to a smaller value Z ∗ as estimated in Eq. 201. We thus expect that
the potential sufficiently away from the sphere’s surface is given by

ΦZ∗(r) = lB Z∗ e−κ(r−R)/r.    (223)

Note, however, that Eq. 201 is just a rough estimate of Z∗ based on an argument
where the space around the sphere is artificially divided into two zones.
We are now in the position to give Z∗ a precise meaning by requiring that
the renormalized DH solution ΦZ∗ and the exact PB solution ΦPB – that is here
only known numerically – match asymptotically for large distances:

lim_{r→∞} ΦZ∗(r) = lim_{r→∞} ΦPB(r).    (224)
Figure 22: (a) Schematic sketch of the potential around a charged sphere for the
full solution (PB), the DH solution (Z) and the DH solution with renormalized
charge (Z ∗ ). (b) Resulting counterion density for the PB solution and for the
DH solution with renormalized charge. At large distances the densities are the
same but close to the sphere PB predicts a dense layer of condensed counterions.
That Eq. 224 has a precise mathematical meaning follows from two facts: (1)
due to the symmetry of the problem the electrical field is radially symmetric
and (2) the potential decays to zero away from the sphere. Therefore the potential must asymptotically look like the DH solution of a charged sphere. In
Fig. 22(a) we sketch schematically the potential Φ (r) for the three solutions
around a highly charged sphere: the full PB solution, the DH solution with the
bare charge Z and the DH solution with the renormalized charge Z ∗ . For a
non-renormalized charge the DH solution overestimates the potential at large
distances whereas the renormalized DH solution matches asymptotically the
full PB solution. The resulting counterion density c (r) ∼ eΦ(r) for the full PB
solution and the renormalized DH solution is depicted in Fig. 22(b).
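Charge renormalization can be made fully explicit in planar geometry, where the nonlinear PB equation has a closed-form (Gouy-Chapman) solution. The sketch below is an added illustration of the same matching idea, not Alexander's sphere construction itself: the exact far field of a charged plane decays as 4γ e−κz with γ = tanh(Φ0/4), so the effective amplitude saturates at 4, while the bare DH amplitude 4πlBσ/κ grows without bound:

```python
import math

lB, kappa = 0.7, 1.0   # nm and 1/nm (roughly 100 mM salt); illustrative values

def far_field_amplitudes(sigma):
    """Amplitude A of Phi ~ A exp(-kappa z) far from a charged plane in salt.
    Returns (bare DH amplitude, exact amplitude from the Gouy-Chapman solution)."""
    # Grahame relation for the surface potential: sinh(Phi0/2) = 2 pi lB sigma / kappa
    phi0 = 2.0 * math.asinh(2.0 * math.pi * lB * sigma / kappa)
    gamma = math.tanh(phi0 / 4.0)
    return 4.0 * math.pi * lB * sigma / kappa, 4.0 * gamma

weak_dh, weak_exact = far_field_amplitudes(0.001)    # weakly charged plane: DH is fine
strong_dh, strong_exact = far_field_amplitudes(5.0)  # highly charged plane
print(weak_dh, weak_exact)     # nearly identical
print(strong_dh, strong_exact) # bare DH overestimates; the exact amplitude stays below 4
```

The ratio of the two amplitudes plays exactly the role of Z∗/Z in the spherical case.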
You might be worried that all the details of the PB theory are lost since
in this simple procedure everything is lumped together in one number, the
renormalized charge. It is true that renormalized DH theory can only describe
the electrostatics beyond the Gouy-Chapman length. It has nothing to say
about the microscopic details inside the double layer. One can, however, argue
that one does not really want to know about those microscopic details anyway.
As a concrete example let us consider again DNA that has 2 elementary charges
per 0.33nm and a radius of R = 1nm. This leads to the Gouy-Chapman length
λ = σ−1/(2πlB) = (0.33 nm × R)/(2 × 0.7 nm) ≈ 0.24 nm.    (225)
Up to now we assumed that the DNA charges are homogeneously smeared out.
In reality the DNA surface area per phosphate charge is given by

A = (2πR × 0.33 nm)/2 ≈ 1 nm².    (226)
Figure 23: The area A per charged group on a DNA chain is around 1nm2 but
the Gouy-Chapman length λ of a homogeneously charged surface with the same
surface charge density is only 0.24nm.
In other words, the layer of condensed counterions per surface charge is much
thinner than it is wide. We must thus expect that the details of the charge
distribution, namely its graininess, have an effect on the counterion condensation.
Smearing out the surface charges might create huge errors, e.g., in the value
of the renormalized charge. It is, however, difficult to estimate the size of this
error since the PB theory is extremely nonlinear close to the surface.
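The two DNA numbers, Eqs. 225 and 226, follow directly from the geometry quoted in the text (2 elementary charges per 0.33 nm rise, R = 1 nm, lB = 0.7 nm); a quick arithmetic check:

```python
import math

lB, R, rise = 0.7, 1.0, 0.33   # nm; B-DNA carries 2 phosphate charges per 0.33 nm rise

sigma = 2.0 / (2.0 * math.pi * R * rise)   # surface charge density in nm^-2
lam = 1.0 / (2.0 * math.pi * lB * sigma)   # Gouy-Chapman length, Eq. 225
A = 2.0 * math.pi * R * rise / 2.0         # surface area per phosphate, Eq. 226

print(lam)  # ~0.24 nm
print(A)    # ~1 nm^2
```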
In principle it is, of course, possible to numerically solve the PB equation for
any distribution of surface charges, but one has to ask oneself how meaningful
that is. Typical ion radii are of the order of the λ-value of DNA and might
have an effect that is again hard to determine due to the inherent nonlinearity
of PB theory. And finally, there is yet another effect that we have brushed
under the carpet: the difference in the dielectric constants between the inside
of a macromolecule and the surrounding water. Since electrical field lines try
to avoid regions of low dielectricity, e.g. the inside of a protein, ions feel an
effective repulsion from such a region. In standard electrostatics such effects
can be modelled via the introduction of so-called image charges, virtual charges
that “live” inside regions of low dielectricity and repel real ions nearby. Again
this is an effect where microscopic details matter and that can hardly be properly
estimated. All we can say is that all these effects act together in effectively
reducing the charge densities of highly charged surfaces.
Breakdown of mean-field theory
When discussing PB theory and its linearized version, DH theory, we might have
given the impression that these theories always work in one way or another. We
noted that the strong non-linearities close to highly charged surfaces are somewhat problematic but claimed that proper charge renormalization will always
fix that problem. However, as we shall see now, electrostatics is not always as
simple as that. Let us go back to the problem of two equally charged surfaces.
PB theory predicts that two such surfaces repel, see the two expressions for the
disjoining pressure at short and large separations, Eqs. 191 and 192. However,
Figure 24: Two Wigner crystals formed by condensed counterions induce an
attraction between two equally charged planes: (a) top view indicating the
displacement vector c that leads to maximal attraction and (b) side view.
in many experiments it has been observed that equally charged objects attract,
an effect that – as one can show strictly mathematically – can never be produced
by PB theory. In other words, PB theory does sometimes not even get the sign
of the force right. A well-known example is DNA. Under the right conditions a
DNA molecule can condense onto itself. Such a condensed DNA molecule forms
typically a toroid, thereby avoiding in its middle a region of too high curvature.
How is it possible that a highly charged molecule like DNA attracts itself? In
fact, this never happens inside monovalent salt solutions but when a sufficient
amount of trivalent ions or ions of even higher valency is added, such a collapse is typically observed. It can be shown that the mean-field approximation
becomes less and less accurate with increasing ion valency. We are lucky that
monovalent ion charges are small enough that PB theory can be applied. In
fact, one can go much further and not just worry about the applicability of that
theory: if the smallest charge unit were, e.g., 4e instead of e, everything
would glue together and life would simply not be possible.
To come up with a very clean theory that describes the origins of this attraction is not straightforward. We give here a simple argument that goes back
to Rouzina and Bloomfield (1996). Again we study the interaction between
two identically charged surfaces with their counterions. We assume monovalent
counterions but lower the temperature to zero, i.e., we study the ground state
of the system. This is, of course, rather academic since water freezes long before
but what we are aiming at is just a basic understanding of the principle. According to the so-called Earnshaw’s theorem any electrostatic system collapses
at a sufficiently low temperature. Two surfaces with their counterions should
thus be stuck on top of each other, D = 0, at zero temperature. We shall see that
the two surfaces indeed attract in that case.
Let us start by first considering a single charged plane. For T → 0 its
Gouy-Chapman length goes to zero, λ → 0, since the Bjerrum length goes to infinity,
lB → ∞. This means that all the counterions sit on the surface. In order to
minimize their mutual repulsion, they form a two-dimensional triangular so-called Wigner crystal as depicted in Fig. 24. If we have now two such surfaces
sufficiently far apart, then the counterions at both surfaces form such patterns
independent from each other. When the two surfaces come closer, the counterions lower the electrostatic energy further by shifting their two Wigner crystals
with respect to each other by a vector c as indicated in Fig. 24(a). That way an
ion in plane B is located above an ion-free area in plane A, namely above the
center of a parallelogram with A-ions in its corners. In other words, the relative
position of the two planes is shifted with respect to each other by half a lattice
constant, so that the two Wigner crystals are out-of-register.
A counterion sitting on one plane, say plane A, feels then the following dimensionless potential resulting from the interaction with plane B and its counterions:
Φ(D) = lB Σ_l 1/√(|Rl + c|² + D²) − lB σ ∫ d²r/√(r² + D²).    (227)
The first term on the rhs describes the repulsion from the counterions condensed
on surface B that are located at positions Rl +c with c denoting the displacement
vector between the two planes (both, Rl and c, are in-plane vectors). The second
term accounts for the attraction of the counterion to the homogeneous surface
charge on plane B. Further terms do not appear in Eq. 227 since the attraction
of the fixed charge of plane A to ions in plane B is exactly cancelled by the
repulsion from the fixed charge of plane B. From Eq. 227 follows directly the
pressure between the two surfaces:
Π(D)/(kB T) = −σ ∂Φ(D)/∂D ≈ −8πσ² lB e^{−(2π/3^{1/4})√σ D}.    (228)
This formula, derived in the Appendix, is accurate for distances D much larger
than the counterion spacing ∼ 1/√σ. We thus find an attraction with a decay
length proportional to the counterion-counterion spacing.
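To put numbers on Eq. 228, the sketch below (added; the DNA-like values σ = 1 nm⁻² and lB = 0.7 nm are assumed) evaluates the decay length and the attractive pressure at a 1 nm separation:

```python
import math

sigma, lB = 1.0, 0.7   # surface charge density (nm^-2) and Bjerrum length (nm)

# decay length of the attraction in Eq. 228: 3^{1/4} / (2 pi sqrt(sigma))
decay = 3.0**0.25 / (2.0 * math.pi * math.sqrt(sigma))

def pressure(D):
    # Eq. 228, in units of kB*T per nm^3
    return -8.0 * math.pi * sigma**2 * lB * math.exp(-D / decay)

print(decay)          # ~0.21 nm, of the order of the counterion spacing
print(pressure(1.0))  # already weak at D = 1 nm because of the fast decay
```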
What is the condition that needs to be fulfilled to have attraction between
equally charged surfaces? Above we argued that PB theory is not useful anymore
if the Gouy-Chapman length becomes shorter than the distance between fixed
charges on the surface, see Fig. 23. Here we use a similar argument, but this time
we focus on the counterions in order to estimate when the alternative theory of
correlated counterions becomes reasonable. If the counterions have valency Z,
then the height up to which half of the counterions are found is λ/Z. On the
other hand, the spacing a between the counterions sitting in a Wigner crystal
as shown in Fig. 24 is given by (√3/2) a² = Z/σ. The typical lateral distance
between counterions is larger than the height of the counterion cloud if a > λ/Z.
This leads to the condition

Z³ > (√3/2) × 1/(4π² lB² σ).    (229)
From this it follows that the cloud is essentially two-dimensional for large enough
counterion valencies (note the cubic dependence) and for large enough surface
charge densities. Remarkably, when condition 229 is fulfilled, one finds that –
up to numerical prefactors – the spacing between counterions fulfills a < Z² lB,
i.e., the neighboring ions feel a mutual repulsion on the order of or larger than
kB T. Even though this is by far not strong enough to induce their ordering into
a perfect Wigner crystal, the ions are correlated to some extent and can induce
the attraction between the charged surfaces. For DNA one has σ = 1nm−2
and condition 229 reads Z³ > 0.06 or Z > 0.4. This seems to suggest that
monovalent ions are already strong enough to cause attraction but the argument
is evidently too simple to give a reliable quantitative estimate. In reality, ions
with Z = 3 or larger cause attraction between DNA double helices.
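Condition 229 is easy to evaluate; the sketch below (added) uses the DNA-like numbers of the text, lB = 0.7 nm and σ = 1 nm⁻², and yields a threshold valency close to the text's rounded estimate Z > 0.4 (the exact number depends on the order-one prefactors dropped in the scaling argument):

```python
import math

lB, sigma = 0.7, 1.0   # nm and nm^-2 (DNA-like values from the text)

# right-hand side of condition 229
rhs = math.sqrt(3.0) / 2.0 / (4.0 * math.pi**2 * lB**2 * sigma)
z_min = rhs ** (1.0 / 3.0)

print(rhs)    # a number of order 0.05
print(z_min)  # ~0.4: nominally even monovalent ions would qualify
```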
Appendix: Interaction between two charged plates at zero temperature
Here we derive the pressure between two equally charged plates at zero temperature, Eq. 228. We begin by rewriting the sum over l in Eq. 227:
I = lB Σ_l 1/√(|Rl + c|² + D²) = lB ∫ dr Σ_l δ(r − Rl)/√(|r + c|² + D²).    (230)
The XY -positions of the ions on one surface form a lattice given by the set of
2D vectors Rl , whereas the ions on the other surface are shifted to the positions
Rl + c. The integral introduced above is thus two dimensional. This leads to a
sum over delta-functions that is a periodic function in two dimensions. Any such
periodic function f (r) can be written in the form of a plane wave expansion, a
2D version of the Fourier expansion introduced in Appendix 4.2. Here
f(r) = Σ_l δ(r − Rl) = Σ_k fk e^{ikr}    (231)
where the summation goes over all vectors k of the reciprocal lattice that is
defined further below. The fk are the Fourier coefficients that are given by
fk = σ ∫_C e^{−ikr} f(r) dr    (232)
with C denoting a primitive cell of the direct lattice, a minimum repeat unit
containing one ion. Here fk = σ and hence
I = lB σ Σ_k ∫ e^{ikr}/√(|r + c|² + D²) dr = lB σ Σ_k e^{−ikc} ∫ e^{ikr}/√(r² + D²) dr.    (233)
We exchanged here the order of summation and integration; substituting r + c
by r yields the phase factor e−ikc.

Figure 25: Primitive vectors a1 and a2 that span the triangular lattice. Also
indicated are the primitive vectors of the reciprocal lattice, b1 and b2, and the
shift vector c for maximal attraction between the two surfaces.

Note that the term with k = 0 in the summation corresponds exactly to the
second term in Eq. 227. Hence we can write the dimensionless potential as
Φ(D) = lB σ Σ_{k≠0} e−ikc ∫ e^{ikr}/√(r² + D²) dr.    (234)
Using Eq. 228 we calculate the pressure from the potential by differentiation:

Π(D)/(kB T) = lB σ² D Σ_{k≠0} e−ikc ∫ dr e^{ikr}/(r² + D²)^{3/2}
            = lB σ² D Σ_{k≠0} e−ikc ∫₀^{2π} dφ ∫₀^∞ dr r e^{ikr cos φ}/(r² + D²)^{3/2}.
We introduced here polar coordinates where φ denotes the angle between the
respective k-vector and r. The double integral can be calculated analytically
(first integrate over φ, then over r) and yields (2π/D) e−kD with k = |k|. This
leads to
Π(D)/(kB T) = 2πlB σ² Σ_{k≠0} e−ikc e−kD.    (235)
We have thus expressed the interaction between the two surfaces as an infinite
sum of exponentials. In the following we are interested in the leading terms of
this sum for large distances. These will be the terms with the smallest value of
k.
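The analytical value (2π/D) e−kD of the double integral quoted above can be confirmed numerically. The sketch below (added) uses the Bessel-function identity ∫₀^{2π} e^{ikr cos φ} dφ = 2π J0(kr) and then integrates over r by simple quadrature:

```python
import math

N = 1000
H = math.pi / N
SINS = [math.sin(i * H) for i in range(N + 1)]

def bessel_j0(x):
    # Bessel J0 via J0(x) = (1/pi) int_0^pi cos(x sin t) dt (trapezoidal rule)
    s = 0.5 * (math.cos(x * SINS[0]) + math.cos(x * SINS[N]))
    for i in range(1, N):
        s += math.cos(x * SINS[i])
    return s * H / math.pi

def double_integral(k, D):
    # int_0^{2pi} dphi int_0^inf dr  r e^{ikr cos phi} / (r^2 + D^2)^{3/2};
    # the phi integral equals 2 pi J0(k r), the r integral is truncated at R
    h, R = 0.01, 30.0
    s = 0.0
    for i in range(1, int(R / h) + 1):
        r = i * h
        s += 2.0 * math.pi * bessel_j0(k * r) * r / (r * r + D * D) ** 1.5
    return s * h

value = double_integral(2.0, 1.0)
print(value, 2.0 * math.pi * math.exp(-2.0))  # numeric result vs (2 pi / D) e^{-kD}
```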
The ground state of a single plane is given by counterions that form a triangular Wigner crystal. We expect that each surface with its counterions still
remains in this triangular ground state as long as D is much larger than the spacing between counterions within their planes. More specifically, the positions of
the counterions in one lattice are given by n1 a1 + n2 a2 with ni = 0, ±1, ±2, ...,
an example of a so-called Bravais lattice. The vectors ai that span the lattice,
the so-called primitive vectors, are given by
a1 = a ex,   a2 = (a/2) ex + (√3 a/2) ey    (236)
and are indicated in Fig. 25. The lattice spacing a has to be chosen such that it
matches the charge density σ, leading to a = 2/(3^{1/4} σ^{1/2}). The reciprocal lattice,
the set of all vectors k for which e^{ikR} = 1 for all R in the Bravais lattice, is
given by k = k1 b1 + k2 b2, ki = 0, ±1, ±2, ..., with

b1 = (2π/a) (ex − (1/√3) ey),   b2 = (4π/(√3 a)) ey.    (237)
The primitive vectors of the reciprocal lattice fulfill bi aj = 2πδij , see also
Fig. 25. For large distances the leading terms in Eq. 235 are the ones with
the smallest value of k, namely (k1 , k2 ) = (±1, 0) and (k1 , k2 ) = (0, ±1). For
distances D with D ≫ a all higher order terms are negligible. The large distance
pressure is thus to a very good approximation given by
Π(D)/(kB T) ≈ 4πσ² lB (cos(b1 c) + cos(b2 c)) e^{−4πD/(√3 a)}.    (238)
For a vanishing length of c counterions of one surface are just on top of counterions of the other surface so that the two surfaces repel each other. One finds
then cos (b1 c) + cos (b2 c) = 2 leading to maximal repulsion. If we, however,
allow one of the plates with its counterions to move in the XY -plane relative to
the other at a fixed value of D, the system can lower its energy. It reaches the
groundstate when cos (b1 c) + cos (b2 c) = −2. This can be achieved by choosing
e.g. the shift c such that b1 c = −π and b2 c = π. This is achieved for
c = −(a/4) ex + (√3 a/4) ey,    (239)
as shown in Figs. 24 and 25. In that case we find Eq. 228 from Eq. 238.