Large Deviations

Stefano Olla
Draft date: January 11, 2014
Chapter 1
Tilting
1.1 Legendre transforms and the rate function I
Let $\alpha(dx)$ be a probability distribution on $\mathbb{R}$. We define the moment generating function
\[ M(\lambda) = \int e^{\lambda x}\,\alpha(dx) \tag{1.1.1} \]
and assume that there exists $\lambda^* > 0$ such that $M(\lambda) < \infty$ if $|\lambda| < \lambda^*$. Notice that, since $|x| \le \lambda^{-1}(e^{\lambda x} + e^{-\lambda x})$ for any $\lambda > 0$, this condition implies that all moments are finite, and we denote $m = \int x\,\alpha(dx) \in \mathbb{R}$. It is easy to see that $m = M'(0)$. We are interested in the logarithmic moment generating function
\[ Z(\lambda) = \log M(\lambda). \tag{1.1.2} \]
By Jensen's inequality, we have $Z(\lambda) \ge \lambda m > -\infty$. Let $D_Z = \{\lambda : Z(\lambda) < +\infty\}$. Under our hypothesis, $0 \in D_Z^o$ (the interior of $D_Z$).
Lemma 1.1.1
1. Z(·) is convex.
2. $Z(\cdot)$ is continuously differentiable in $D_Z^o$ and
\[ Z'(\lambda) = \int x\,e^{\lambda x - Z(\lambda)}\,\alpha(dx), \qquad \lambda \in D_Z^o. \]
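Proof: For any $\gamma \in [0,1]$, Hölder's inequality gives
\[ M(\gamma\lambda_1 + (1-\gamma)\lambda_2) \le M(\lambda_1)^{\gamma}\,M(\lambda_2)^{1-\gamma} \]
and consequently
\[ Z(\gamma\lambda_1 + (1-\gamma)\lambda_2) \le \gamma Z(\lambda_1) + (1-\gamma)Z(\lambda_2). \]
For the differentiability, the function $f_\varepsilon(x) = (e^{(\lambda+\varepsilon)x} - e^{\lambda x})/\varepsilon$ converges pointwise to $x e^{\lambda x}$ as $\varepsilon \to 0$, and
\[ |f_\varepsilon(x)| \le e^{\lambda x}(e^{\delta|x|} - 1)/\delta \le e^{\lambda x}(e^{\delta x} + e^{-\delta x})/\delta = h(x) \]
for every $|\varepsilon| \le \delta$.
For any $\lambda \in D_Z^o$, there exists a $\delta > 0$ small enough such that $\int h(x)\,d\alpha(x) = \delta^{-1}[M(\lambda+\delta) + M(\lambda-\delta)] < +\infty$. Then the result follows by the dominated convergence theorem.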
Using the same argument one can prove that $Z(\cdot) \in C^\infty(D_Z^o)$. Computing the second derivative we obtain
\[ Z''(\lambda) = \int x^2 e^{\lambda x - Z(\lambda)}\,\alpha(dx) - \left(\int x\,e^{\lambda x - Z(\lambda)}\,\alpha(dx)\right)^2 \ge 0. \]
Observe that $\alpha_\lambda(dx) := e^{\lambda x - Z(\lambda)}\alpha(dx)$ is a probability measure, with average $Z'(\lambda)$ and variance $Z''(\lambda)$.
To avoid the trivial deterministic case, we assume that $Z''(0) > 0$. It follows that $Z''(\lambda) > 0$ for any $\lambda \in D_Z^o$, i.e. $Z(\cdot)$ is strictly convex.
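The identification of $Z'(\lambda)$ and $Z''(\lambda)$ with the mean and variance of the tilted measure can be checked numerically. The following is only a sketch: the three-point law (`POINTS`, `WEIGHTS`) and all function names are illustrative assumptions, not part of the notes, and the derivatives are approximated by central finite differences.

```python
import math

# Hypothetical three-point law alpha on {-1, 0, 2}; any finite-support law would do.
POINTS = [-1.0, 0.0, 2.0]
WEIGHTS = [0.5, 0.3, 0.2]

def log_mgf(lam):
    """Z(lam) = log sum_x exp(lam * x) alpha(x)."""
    return math.log(sum(w * math.exp(lam * x) for x, w in zip(POINTS, WEIGHTS)))

def tilted_weights(lam):
    """Weights of alpha_lam(dx) = exp(lam * x - Z(lam)) alpha(dx)."""
    z = log_mgf(lam)
    return [w * math.exp(lam * x - z) for x, w in zip(POINTS, WEIGHTS)]

def tilted_mean_var(lam):
    """Mean and variance of the tilted measure alpha_lam."""
    tw = tilted_weights(lam)
    mean = sum(w * x for x, w in zip(POINTS, tw))
    var = sum(w * (x - mean) ** 2 for x, w in zip(POINTS, tw))
    return mean, var

def numeric_derivatives(lam, h=1e-4):
    """Central finite differences approximating Z'(lam) and Z''(lam)."""
    zp, z0, zm = log_mgf(lam + h), log_mgf(lam), log_mgf(lam - h)
    return (zp - zm) / (2 * h), (zp - 2 * z0 + zm) / h ** 2
```

For any fixed `lam`, `tilted_mean_var(lam)` and `numeric_derivatives(lam)` should agree up to the finite-difference error.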
We define the rate function as the Fenchel-Legendre transform of $Z$:
\[ I(x) = \sup_{\lambda \in \mathbb{R}} \{\lambda x - Z(\lambda)\}. \tag{1.1.3} \]
It is immediate to see that $I$ is convex and lower semicontinuous (as a supremum of affine functions), hence continuous in the interior of the set where it is finite, and that $I(x) \ge 0$ (take $\lambda = 0$). Furthermore we have that $I(m) = 0$. In fact, by Jensen's inequality $M(\lambda) \ge e^{\lambda m}$ for any $\lambda \in \mathbb{R}$, so that
\[ \lambda m - Z(\lambda) \le 0, \]
with equality for $\lambda = 0$. We conclude that $I(m) = 0$.
Consequently $m$ is a minimum of the convex nonnegative function $I(x)$. It follows that $I(x)$ is nondecreasing for $x \ge m$ and nonincreasing for $x \le m$.
Observe that if $x > m$ and $\lambda < 0$, then
\[ \lambda x - Z(\lambda) \le \lambda m - Z(\lambda) \le 0, \]
which implies
\[ I(x) = \sup_{\lambda \ge 0} \{\lambda x - Z(\lambda)\}, \qquad x > m. \tag{1.1.4} \]
Similarly one obtains
\[ I(x) = \sup_{\lambda \le 0} \{\lambda x - Z(\lambda)\}, \qquad x < m. \tag{1.1.5} \]
Here are other important properties of I(·):
Lemma 1.1.2 I(x) → +∞ as |x| → ∞, and its level sets are compact.
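Proof: If $x > m \vee 0$, for any positive $\lambda \in D_Z$,
\[ \frac{I(x)}{x} \ge \lambda - \frac{Z(\lambda)}{x} \]
and $\lim_{x\to+\infty} Z(\lambda)/x = 0$, so we have $\liminf_{x\to+\infty} I(x)/x \ge \lambda > 0$. The case $x \to -\infty$ is treated in the same way with negative $\lambda$. Consequently the level sets $\{x : I(x) \le a\}$ are bounded, and they are closed by the lower semicontinuity of $I$.
1.1.1 Properties of Legendre transforms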
We denote DI = {x ∈ R : I(x) < ∞}.
Lemma 1.1.3 The function $I$ is convex in $D_I$, strictly convex in $D_I^o$, and $I \in C^\infty(D_I^o)$. Furthermore, for any $\bar x \in D_I^o$ there exists a unique $\bar\lambda \in D_Z^o$ such that
\[ \bar x = Z'(\bar\lambda) \qquad\text{and}\qquad \bar\lambda = I'(\bar x). \]
Furthermore $I(\bar x) = \bar\lambda\bar x - Z(\bar\lambda)$.
We will say that $\bar x$ and $\bar\lambda$ are in duality if the conditions of the above lemma are satisfied.
Proof: The function $F_{\bar x}(\lambda) = \lambda\bar x - Z(\lambda)$ has a unique maximum at $\lambda = \bar\lambda$. This is because it is strictly concave and $\partial_\lambda F_{\bar x}(\bar\lambda) = 0$. It follows that $I(\bar x) = \bar\lambda\bar x - Z(\bar\lambda)$ and that $Z(\lambda) = \sup_x \{\lambda x - I(x)\}$. By the same argument $G_{\bar\lambda}(x) = \bar\lambda x - I(x)$ is maximized by $\bar x$.
Examples in R
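1. Let $\alpha$ be the gaussian distribution
\[ \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-(x-m)^2/2\sigma^2}\,dx; \]
then $I(x) = (x-m)^2/2\sigma^2$.
2. $\alpha = \frac12(\delta_0 + \delta_1)$ (Bernoulli). Then $M(\lambda) = \frac12(1 + e^{\lambda})$ and
\[ I(x) = x\log x + (1-x)\log(1-x) + \log 2 \qquad \text{if } x \in [0,1], \]
and $I(x) = +\infty$ otherwise.
3. For the exponential law $\alpha(dx) = \beta e^{-\beta x}1_{x\ge 0}\,dx$, we have $M(\lambda) = \beta/(\beta-\lambda)$ for $-\infty < \lambda < \beta$, otherwise $M(\lambda) = +\infty$. Then
\[ I(x) = \beta x - 1 - \log(\beta x) \qquad \text{if } x > 0, \]
and $I(x) = +\infty$ if $x \le 0$.
4. If $\xi$ is a random variable with law $N(0, 1/\beta)$, then $\xi^2$ has a (scaled) $\chi^2(1)$ law, i.e. a gamma law $\Gamma(1/2, \beta/2)$, which has density
\[ \frac{\beta^{1/2}}{\sqrt{2}\,\Gamma(1/2)}\,x^{-1/2}e^{-\beta x/2}, \qquad x > 0. \]
Its moment generating function is $M(\lambda) = (\beta/(\beta - 2\lambda))^{1/2}$ if $\lambda < \beta/2$, otherwise it is equal to $+\infty$. The resulting rate function is
\[ I(x) = \frac12\{\beta x - \log(\beta x) - 1\} \qquad \text{if } x > 0, \]
and $+\infty$ if $x \le 0$.
5. If $Z(\lambda) = p^{-1}|\lambda|^p$ with $p > 1$, then $Z^*(x) = q^{-1}|x|^q$, with $p^{-1} + q^{-1} = 1$.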
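The closed forms in examples 2 and 3 can be compared with a direct numerical evaluation of the supremum in (1.1.3). The sketch below is an illustration, not the notes' method: the helper `legendre_transform`, its search interval and the other function names are assumptions. It uses a ternary search, which is valid here because $\lambda \mapsto \lambda x - Z(\lambda)$ is concave.

```python
import math

def legendre_transform(Z, x, lo, hi, iters=200):
    """Approximate sup over lam in [lo, hi] of lam * x - Z(lam).

    Z is convex, so the objective is concave and ternary search applies.
    """
    def objective(lam):
        return lam * x - Z(lam)
    a, b = lo, hi
    for _ in range(iters):
        m1 = a + (b - a) / 3
        m2 = b - (b - a) / 3
        if objective(m1) < objective(m2):
            a = m1
        else:
            b = m2
    return objective((a + b) / 2)

def Z_bernoulli(lam):
    """log M(lam) for alpha = (delta_0 + delta_1) / 2."""
    return math.log(0.5 * (1 + math.exp(lam)))

def Z_exponential(lam, beta):
    """log M(lam) for the exponential law; finite only for lam < beta."""
    return math.log(beta / (beta - lam))

def rate_bernoulli(x):
    """Closed form of example 2, for 0 < x < 1."""
    return x * math.log(x) + (1 - x) * math.log(1 - x) + math.log(2)

def rate_exponential(x, beta):
    """Closed form of example 3, for x > 0."""
    return beta * x - 1 - math.log(beta * x)
```

For the exponential law the search interval has to stay strictly below $\beta$, for instance `legendre_transform(lambda lam: Z_exponential(lam, 1.5), 2.0, -40.0, 1.5 - 1e-9)`.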
1.2 A more general setup
We can extend the above setup to the situation where $\alpha$ is only a positive measure on a measurable topological space $\Omega$. We will be interested essentially in $\Omega = \mathbb{R}^d$ (eventually $d = 2$) with $\alpha$ the Lebesgue measure on it. Other examples are $\Omega = S^d$, the $d$-dimensional sphere, with $\alpha$ the corresponding uniform measure.
Let $g : \Omega \to \mathbb{R}$ be a measurable function and define
\[ Z(\lambda) = \log\int_\Omega e^{\lambda g(\omega)}\,d\alpha(\omega) \tag{1.2.1} \]
on the corresponding domain $D_Z$. Notice that now this could be empty, and in particular $0 \notin D_Z$ if $\alpha$ is not of finite measure. We assume that $D_Z^o \ne \emptyset$, and that $g$ is not constant. With similar proofs as above, $Z$ is $C^\infty$ on the interior of $D_Z$.
In any case, if $\lambda \in D_Z$, then $d\alpha_\lambda(\omega) = e^{\lambda g - Z(\lambda)}\,d\alpha(\omega)$ is a probability distribution on $\Omega$. With respect to $\alpha_\lambda$, $g$ can be seen as a random variable with average $Z'(\lambda)$ and variance $Z''(\lambda)$. Notice that $Z''(\lambda) > 0$ just because $g$ is not constant.
Examples are easily recovered in this setup. The gaussian distribution of example 1 is obtained by taking $\Omega = \mathbb{R}$, $d\alpha = dx$, and $g(x) = -(x-m)^2/2$. Then $Z(\lambda) = \frac12\log\frac{2\pi}{\lambda}$, defined only for $\lambda > 0$, and the tilted measure $\alpha_\lambda$ is the gaussian measure with average $m$ and variance $\sigma^2 = \lambda^{-1}$. Notice now that the Legendre transform of $Z$, which we now denote by $Z^*$, is different from the rate function $I$: maximizing $\lambda x - Z(\lambda)$ at $\lambda = -1/(2x)$ gives $Z^*(x) = -\frac12\log(-4\pi x) - \frac12$ for $x < 0$, and $Z^*(x) = +\infty$ if $x \ge 0$.
More generally we need a multidimensional setup. Let $g : \Omega \to \mathbb{R}^r$ be a vector valued measurable function and
\[ Z(\lambda) = \log\int_\Omega e^{\lambda\cdot g(\omega)}\,d\alpha(\omega) \tag{1.2.2} \]
finite in the corresponding domain $D_Z \subset \mathbb{R}^r$. $Z(\lambda)$ is convex and lower semicontinuous (it may fail to be continuous). Again strict convexity follows by assuming that every component of $g$ is not constant. Furthermore $D_Z$ is convex.
The Fenchel-Legendre transform is now defined by
\[ Z^*(x) = \sup_{\lambda}\{\lambda\cdot x - Z(\lambda)\}, \tag{1.2.3} \]
which, in the domain $D_{Z^*} = \{x \in \mathbb{R}^r : Z^*(x) < +\infty\}$, is also convex and lower semicontinuous, as a supremum of affine functionals. As before, there is a unique correspondence from $D_Z$ to $D_{Z^*}$ such that
\[ \bar x = \nabla Z(\bar\lambda), \qquad \bar\lambda = \nabla Z^*(\bar x). \tag{1.2.4} \]
For $\lambda \in D_Z$, we have the probability measure $d\alpha_\lambda(\omega) = e^{\lambda\cdot g - Z(\lambda)}\,d\alpha(\omega)$ on $\Omega$. With respect to $\alpha_\lambda$, $g$ can be seen as a vector valued random variable with average $\nabla Z(\lambda)$ and covariance matrix $\mathrm{Hess}\,Z(\lambda) = \nabla^2 Z(\lambda)$. Again $\mathrm{Hess}\,Z(\lambda) > 0$ just because $g$ is not constant.
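The gaussian example can be recovered with the choice $d = 1$, $r = 2$, $g = (-x^2/2, x)$, $\alpha = dx$. Then, for $\lambda_1 > 0$,
\[ Z(\lambda_1,\lambda_2) = \frac12\log\frac{2\pi}{\lambda_1} + \frac12\,\frac{\lambda_2^2}{\lambda_1} \]
and $\alpha_\lambda$ is the gaussian measure on $\mathbb{R}$ with variance $\lambda_1^{-1}$ and average $\lambda_2\lambda_1^{-1}$.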
We will be interested mostly in the following example.
Let $\Omega = \mathbb{R}^2$; we will denote $\omega = (r,p)$, and $\alpha$ is the usual Lebesgue measure $dr\,dp$. Let $V : \mathbb{R} \to \mathbb{R}_+$ be a smooth function such that $V(r) \to +\infty$ as $|r| \to \infty$, and such that
\[ \int e^{-\beta V(r)}\,dr < +\infty \qquad \forall\beta > 0. \tag{1.2.5} \]
Then we choose $g = (-[p^2/2 + V(r)], r)$. We will make sure that all the conditions assumed in the following are satisfied by this example.
Observe that
\[ Z(\lambda+\delta) - Z(\lambda) = \log\int_\Omega e^{\delta\cdot g}\,d\alpha_\lambda. \tag{1.2.6} \]
So in the following we will work with a probability measure $\alpha$, which stands for $\alpha_{\lambda_0}$ for some $\lambda_0$.
In particular notice that the rate function $I_\lambda(x)$ corresponding to the tilted measure $\alpha_\lambda$ is given by
\[ I_\lambda(x) = Z^*(x) - \lambda\cdot x + Z(\lambda). \tag{1.2.7} \]
1.3 Local Central Limit Theorem
Theorem 1.3.1 (Local central limit theorem). Let $\varphi(k)$ be the characteristic function of a centered probability measure $\nu(dx)$ on $\mathbb{R}^r$ with finite covariance matrix $\sigma^2$, and assume that $|\varphi(k)| < 1$ if $k \ne 0$ and that there exists an integer $n_0 \ge 1$ such that $|\varphi|^{n_0}$ is integrable. Let $\tilde g_n(x)$ be the probability density of $(X_1 + \dots + X_n)/\sqrt n$, where the $X_j$ are i.i.d. with common law $\nu$. Then
\[ \lim_{n\to\infty}\tilde g_n(x) = \frac{e^{-x\cdot(\sigma^2)^{-1}x/2}}{(2\pi)^{r/2}\sqrt{\det\sigma^2}}. \]
Proof. This is a standard proof; we will consider here the one dimensional case, the multidimensional case being straightforward.
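The characteristic function of $\nu$ is defined by
\[ \varphi(k) = \int e^{ixk}\,\nu(dx). \tag{1.3.1} \]
The characteristic function of the distribution of $X_1 + \dots + X_n$ is $\varphi^n(k)$, which is integrable for $n \ge n_0$. It follows that the probability density $\tilde g_n(x)$ exists for any $n \ge n_0$ (cf. Feller, theorem XV.3.3). Then
\[ \tilde g_n(x) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} e^{-ixk}\,\varphi\!\left(\frac{k}{\sqrt n}\right)^{n} dk \]
and therefore
\[ \left|\tilde g_n(x) - \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-x^2/2\sigma^2}\right| \le \frac{1}{2\pi}\int_{-\infty}^{+\infty}\left|\varphi\!\left(\frac{k}{\sqrt n}\right)^{n} - e^{-k^2\sigma^2/2}\right| dk. \]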
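Given $a > 0$, we split the integral in three parts.
1. Uniformly in $k \in [-a,a]$,
\[ \varphi\!\left(\frac{k}{\sqrt n}\right)^{n} = \left(1 - \frac{k^2\sigma^2}{2n} + o\!\left(\frac1n\right)\right)^{n} \underset{n\to\infty}{\longrightarrow} e^{-k^2\sigma^2/2} \]
so that
\[ \int_{-a}^{+a}\left|\varphi\!\left(\frac{k}{\sqrt n}\right)^{n} - e^{-k^2\sigma^2/2}\right| dk \to 0. \]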
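2. Observe that it is possible to choose $\delta > 0$ such that
\[ |\varphi(k)| \le e^{-k^2\sigma^2/4} \qquad \text{if } |k| \le \delta. \]
Then for the interval $|k| \in (a, \delta\sqrt n)$, we can estimate
\[ \int_a^{\delta\sqrt n}\left|\varphi\!\left(\frac{k}{\sqrt n}\right)^{n} - e^{-k^2\sigma^2/2}\right| dk \le \int_a^{\delta\sqrt n} 2e^{-k^2\sigma^2/4}\,dk \le \int_a^{+\infty} 2e^{-k^2\sigma^2/4}\,dk, \]
which converges to 0 as $a \to \infty$.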
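3. It remains to estimate the contribution from the interval $(\delta\sqrt n, +\infty)$. Since we assumed that $|\varphi(k)| < 1$ for $k \ne 0$, and since $|\varphi|^{n_0}$ is integrable, we have $\varphi(k) \to 0$ as $|k| \to \infty$. Consequently we must have $\sup_{|k|\ge\delta}|\varphi(k)| = \eta < 1$, and we can estimate
\[ \int_{\delta\sqrt n}^{+\infty}\left|\varphi\!\left(\frac{k}{\sqrt n}\right)^{n} - e^{-k^2\sigma^2/2}\right| dk \le \eta^{n-n_0}\int_{-\infty}^{+\infty}\left|\varphi\!\left(\frac{k}{\sqrt n}\right)\right|^{n_0} dk + \int_{\delta\sqrt n}^{+\infty} e^{-k^2\sigma^2/2}\,dk \]
\[ = \eta^{n-n_0}\sqrt n\int_{-\infty}^{+\infty}|\varphi(k)|^{n_0}\,dk + \int_{\delta\sqrt n}^{+\infty} e^{-k^2\sigma^2/2}\,dk, \]
which converges to 0 as $n \to \infty$.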
Distributions whose characteristic function satisfies $|\varphi(k)| < 1$ for $k \ne 0$ are called non-lattice ([1], chapter 2). This property does not imply that they have a density.
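A case where Theorem 1.3.1 can be checked by hand is the centered exponential law, $X_j = E_j - 1$ with $E_j$ exponential of parameter 1. Here $\sigma^2 = 1$, $|\varphi(k)| = (1+k^2)^{-1/2}$, and $|\varphi|^2$ is integrable, so $n_0 = 2$. Since $E_1 + \dots + E_n$ has a Gamma$(n,1)$ law, the density $\tilde g_n$ is explicit. The sketch below (function names are illustrative assumptions) compares it with the gaussian limit.

```python
import math

def normalized_sum_density(x, n):
    """Density at x of (X_1 + ... + X_n)/sqrt(n), with X_j = E_j - 1 and E_j ~ Exp(1)."""
    s = n + math.sqrt(n) * x  # corresponding value of E_1 + ... + E_n ~ Gamma(n, 1)
    if s <= 0:
        return 0.0
    log_gamma_pdf = (n - 1) * math.log(s) - s - math.lgamma(n)
    return math.sqrt(n) * math.exp(log_gamma_pdf)

def gaussian_density(x):
    """Limit density of Theorem 1.3.1 for r = 1 and sigma^2 = 1."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
```

Evaluating `abs(normalized_sum_density(0.5, n) - gaussian_density(0.5))` for increasing `n` shows the gap shrinking, consistent with the theorem.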
1.4 A Local Large Deviation Theorem
We assume now that the probability measure $\nu(dx)$ satisfies all the assumptions made in section 1.1, and furthermore that its characteristic function satisfies the conditions of the local central limit theorem 1.3.1. Then, for $n \ge n_0$, the distribution on $\mathbb{R}^r$ of the empirical mean $\hat S_n = (X_1 + \dots + X_n)/n$ of i.i.d. variables with law $\nu$ has a density that we denote by $f_n(x)$.
Theorem 1.4.1 For any $y \in D_I^o$ we have
\[ \lim_{n\to\infty}\frac1n\log f_n(y) = -I(y). \tag{1.4.1} \]
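Before the proof, (1.4.1) can be illustrated on the exponential law of parameter $\beta$, for which $I(y) = \beta y - 1 - \log(\beta y)$ (example 3 of section 1.1) and the mean of $n$ samples has an explicit density, because the sum is Gamma$(n,\beta)$ distributed. This is only a numerical sketch with illustrative function names.

```python
import math

def log_density_of_mean(y, n, beta):
    """log f_n(y), f_n being the density of the mean of n i.i.d. Exp(beta) variables.

    The sum is Gamma(n, beta), so f_n(y) = n * gamma_pdf(n * y).
    """
    s = n * y
    log_gamma_pdf = (n * math.log(beta) + (n - 1) * math.log(s)
                     - beta * s - math.lgamma(n))
    return math.log(n) + log_gamma_pdf

def rate_exponential(y, beta):
    """I(y) = beta * y - 1 - log(beta * y) for y > 0."""
    return beta * y - 1 - math.log(beta * y)
```

The quantity `log_density_of_mean(y, n, beta) / n` approaches `-rate_exponential(y, beta)` as `n` grows, with a correction of order $(\log n)/n$.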
Proof. Again we will prove it for $r = 1$; the generalization is straightforward.
Let $\tau_y\nu$ be the translation of the measure $\nu$ by $y$. Assume that $m = \int x\,\nu(dx) = 0$, otherwise just recenter it and consider $\tau_m\nu$.
Let $y \in D_I^o$. Then by lemma 1.1.3 there exists a unique $\lambda \in D_Z^o$ such that $y = Z'(\lambda)$, $\lambda = I'(y)$, and $I(y) = \lambda y - Z(\lambda)$. Define
\[ \tilde\nu(y,dx) = e^{(x+y)\lambda - Z(\lambda)}\,\tau_y\nu(dx). \]
Observe that this is a probability distribution with 0 average. In fact
\[ \int\tilde\nu(y,dx) = \frac{1}{M(\lambda)}\int e^{z\lambda}\,\nu(dz) = 1 \]
and
\[ \int x\,\tilde\nu(y,dx) = -y + \frac{1}{M(\lambda)}\int z e^{z\lambda}\,\nu(dz) = -y + Z'(\lambda) = 0. \]
So we treat here $y$ as a parameter. Let $X_1^y,\dots,X_n^y$ be i.i.d. random variables with law given by $\tilde\nu(y,dx)$.
For $n \ge n_0$ the distribution of $(X_1^y + \dots + X_n^y)/n$ has a density, which we denote by $f_n(x,y)$, and it is equal to
\[ f_n(x,y) = e^{n((x+y)\lambda - Z(\lambda))}f_n(x+y) = e^{n(I(y)+\lambda x)}f_n(x+y). \]
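To prove this formula, compute, for a given bounded measurable function $G(\cdot)$, with $\hat s_n = (x_1 + \dots + x_n)/n$:
\[ E\big(G((X_1^y + \dots + X_n^y)/n)\big) = \int_{\mathbb{R}^n} G(\hat s_n)\,e^{n(I(y)+\lambda\hat s_n)}\,\tau_y\nu(dx_1)\dots\tau_y\nu(dx_n) = \int_{\mathbb{R}} G(\hat s)\,e^{n(I(y)+\lambda\hat s)}f_n(\hat s+y)\,d\hat s. \tag{1.4.2} \]
It follows that
\[ f_n(y) = e^{-nI(y)}f_n(0,y). \]
To conclude we only need to prove that $(\log f_n(0,y))/n \to 0$ as $n \to \infty$.
Let $\tilde f_n(x,y)$ be the density of $(X_1^y + \dots + X_n^y)/\sqrt n$. Then $f_n(x,y) = \sqrt n\,\tilde f_n(\sqrt n\,x, y)$. By the local central limit theorem 1.3.1, $\tilde f_n(0,y)$ converges to a finite positive limit, so that $(\log f_n(0,y))/n = (\log\sqrt n + \log\tilde f_n(0,y))/n \to 0$, and the result follows.
1.5 Large deviations probabilities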
Corollary 1.5.1 For any closed set $C$
\[ \limsup_{n\to\infty}\frac1n\log\int_C f_n(x)\,dx \le -\inf_{x\in C}I(x) \]
and for any open set $A$
\[ \liminf_{n\to\infty}\frac1n\log\int_A f_n(x)\,dx \ge -\inf_{x\in A}I(x). \]
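Proof. Under the conditions of the previous section, let $C \subset \mathbb{R}^r$ be a compact set such that $C \cap D_I^o \ne \emptyset$. Then it is immediate that
\[ \limsup_{n\to\infty}\frac1n\log\int_C f_n(x)\,dx \le -\inf_{x\in C}I(x). \tag{1.5.1} \]
In fact let $x_0$ be a point where $I(\cdot)$ attains its minimum on the compact set $C$ (under our assumptions $I$ is continuous). Then, using $f_n(x) = e^{-nI(x)}f_n(0,x)$,
\[ \int_C f_n(x)\,dx \le e^{-nI(x_0)}\int_C f_n(0,x)\,dx = e^{-nI(x_0)}\sqrt n\int_C\tilde g_n(0,x)\,dx, \]
where $\tilde g_n(0,x)$ denotes the density at 0 of the normalized sum $(X_1^x + \dots + X_n^x)/\sqrt n$. It is not hard to see that the local central limit theorem 1.3.1 can be proved with enough uniformity in the parameter $x$ to obtain
\[ \limsup_{n\to\infty}\frac1n\log\int_C\tilde g_n(0,x)\,dx = 0. \]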
If $C$ is closed but unbounded, then by Lemma 1.1.2 we have
\[ \lim_{k\to\infty}\limsup_{n\to\infty}\frac1n\log\int_{C\cap\{|x|>k\}}f_n(x)\,dx = -\infty \]
and since
\[ \log\int_C f_n(x)\,dx \le \max\left\{\log\int_{C\cap\{|x|\le k\}}f_n(x)\,dx,\ \log\int_{C\cap\{|x|>k\}}f_n(x)\,dx\right\} + \log 2, \]
dividing by $n$ and letting $n \to \infty$, we are reduced to the compact case.
About the lower bound, if $A$ is an open set such that $A \cap D_I \ne \emptyset$, then take any $x_0 \in A$ such that $I(x_0) < +\infty$. For any $\varepsilon > 0$ let $\delta > 0$ be such that $I(x) < I(x_0) + \varepsilon$ if $|x - x_0| < \delta$. Then
\[ \int_A f_n(x)\,dx \ge \int_{A\cap J(x_0,\delta)}f_n(x)\,dx \ge e^{-n(I(x_0)+\varepsilon)}\int_{A\cap J(x_0,\delta)}f_n(0,x)\,dx \]
where $J(x_0,\delta) = \{|x - x_0| < \delta\}$. Since $A \cap J(x_0,\delta) \ne \emptyset$, it follows that
\[ \liminf_{n\to\infty}\frac1n\log\int_A f_n(x)\,dx \ge -I(x_0) - \varepsilon. \]
Since $\varepsilon$ is arbitrarily small at this point, we have obtained the lower bound.
1.6 Generalities on Large Deviations
Let $X$ be a complete separable metric space and $P_n$ a family of probability distributions on $X$. In the previous sections $X = \mathbb{R}^d$ and $P_n$ is the distribution of $\hat S_n$. We say that $\{P_n\}$ satisfies a large deviation principle with good rate function $I(\cdot)$ if there exists a function $I : X \to [0,\infty]$ such that:
1. $I(\cdot)$ is lower semicontinuous.
2. For each $\ell < \infty$ the set $\{x : I(x) \le \ell\}$ is compact in $X$.
3. For each closed set $C \subset X$
\[ \limsup_{n\to\infty}\frac1n\log P_n(C) \le -\inf_{x\in C}I(x). \]
4. For each open set $G \subset X$
\[ \liminf_{n\to\infty}\frac1n\log P_n(G) \ge -\inf_{x\in G}I(x). \]
Here the adjective good refers to properties 1 and 2. The next lemma does not require the rate function $I$ to be good.
Theorem 1.6.1 (Varadhan's Lemma). Let $P_n$ satisfy the large deviation principle with rate function $I$. Then for any bounded continuous function $F(x)$ on $X$
\[ \lim_{n\to\infty}\frac1n\log\int e^{nF(x)}\,dP_n(x) = \sup_{x\in X}\{F(x) - I(x)\}. \]
Proof.
Upper bound. For any given $\delta > 0$, since $F$ is bounded and continuous, we can find a finite number of closed sets $C_1,\dots,C_m$ covering $X$ such that the oscillation of $F(\cdot)$ on each of these closed sets is less than or equal to $\delta$. Then
\[ \int e^{nF(x)}\,dP_n(x) \le \sum_{j=1}^m\int_{C_j}e^{nF(x)}\,dP_n(x) \le \sum_{j=1}^m e^{n(F_j+\delta)}P_n(C_j) \]
where $F_j = \inf_{C_j}F(x)$. It follows
\[ \limsup_{n\to\infty}\frac1n\log\int e^{nF(x)}\,dP_n(x) \le \sup_{1\le j\le m}\Big[F_j + \delta - \inf_{C_j}I(x)\Big] \le \sup_{1\le j\le m}\sup_{C_j}[F(x) - I(x)] + \delta = \sup_{x\in X}[F(x) - I(x)] + \delta. \]
Since $\delta$ is arbitrary, we can let it go to 0.
Lower bound. By definition of a supremum, for any $\delta > 0$ we can find $y \in X$ such that $F(y) - I(y) \ge \sup_x[F(x) - I(x)] - \delta/2$. Since $F$ is continuous we can find an open neighborhood $U$ of $y$ such that $F(x) \ge F(y) - \delta/2$ for any $x \in U$. Then we obtain
\[ \liminf_{n\to\infty}\frac1n\log\int e^{nF(x)}\,dP_n(x) \ge \liminf_{n\to\infty}\frac1n\log\int_U e^{nF(x)}\,dP_n(x) \ge F(y) - \frac\delta2 - \inf_{x\in U}I(x) \ge F(y) - I(y) - \frac\delta2 \ge \sup_x[F(x) - I(x)] - \delta \]
and we conclude from the arbitrariness of $\delta$.
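Varadhan's lemma can be illustrated on the empirical mean of $n$ fair coin tosses, whose rate function is $I(x) = x\log x + (1-x)\log(1-x) + \log 2$ on $[0,1]$ (example 2 of section 1.1). The sketch below computes the left-hand side exactly from the binomial weights, through a log-sum-exp, and the right-hand side on a grid. The function names and the grid size are illustrative assumptions.

```python
import math

def rate_bernoulli(x):
    """I(x) = x log x + (1 - x) log(1 - x) + log 2 on [0, 1], with 0 log 0 = 0."""
    def xlogx(t):
        return t * math.log(t) if t > 0 else 0.0
    return xlogx(x) + xlogx(1 - x) + math.log(2)

def scaled_log_expectation(F, n):
    """(1/n) log E[exp(n F(S_n / n))] for S_n ~ Binomial(n, 1/2), via log-sum-exp."""
    terms = []
    for k in range(n + 1):
        log_prob = (math.lgamma(n + 1) - math.lgamma(k + 1)
                    - math.lgamma(n - k + 1) - n * math.log(2))
        terms.append(log_prob + n * F(k / n))
    top = max(terms)
    return (top + math.log(sum(math.exp(t - top) for t in terms))) / n

def varadhan_sup(F, grid=100001):
    """sup over x in [0, 1] of F(x) - I(x), evaluated on a uniform grid."""
    return max(F(i / (grid - 1)) - rate_bernoulli(i / (grid - 1))
               for i in range(grid))
```

For a bounded continuous `F` on $[0,1]$, for instance `F = lambda x: x * x`, the value `scaled_log_expectation(F, n)` approaches `varadhan_sup(F)` as `n` grows.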
Theorem 1.6.2 (Contraction Principle). Let $P_n$ satisfy the large deviation principle with rate function $I$, and let $\pi : X \to Y$ be a continuous mapping from $X$ to another complete separable metric space $Y$. Then $\tilde P_n = P_n\pi^{-1}$ satisfies a large deviation principle with rate function
\[ \tilde I(y) = \inf_{x:\pi(x)=y}I(x), \qquad \tilde I(y) = +\infty \ \text{ if } \{x : \pi(x) = y\} = \emptyset. \]
Proof. Since $\pi$ is continuous, given any closed set $\tilde C \subset Y$, the subset $C = \pi^{-1}(\tilde C)$ is closed in $X$. Then
\[ \limsup_{n\to\infty}\frac1n\log\tilde P_n(\tilde C) = \limsup_{n\to\infty}\frac1n\log P_n(C) \le -\inf_{x\in C}I(x) = -\inf_{y\in\tilde C}\ \inf_{x:\pi(x)=y}I(x), \]
and similarly for the lower bound.
Chapter 2
Conditioning
2.1 Conditional measures
Let $\Omega$ be a $d$-dimensional manifold, as in the previous chapter, typically $\mathbb{R}^d$, $S^d$ etc. Let $\alpha$ be a positive $\sigma$-finite measure on $\Omega$, typically the corresponding Lebesgue measure.
Let $g : \Omega \to \mathbb{R}^r$ be a measurable function. We assume that it is not constant and that it has compact level sets.
On the product space $\Omega^n$, with the product measure $d\alpha^n = \otimes_j\,d\alpha(\omega_j)$, we define the function
\[ g^{(n)} = \frac1n\sum_{j=1}^n g(\omega_j). \]
For any $y \in \mathbb{R}^r$ consider the set
\[ \Sigma_n(y) = \left\{(\omega_1,\dots,\omega_n) \in \Omega^n : g^{(n)} = y\right\}. \tag{2.1.1} \]
This is a bounded set since it is the boundary of a compact set.
We denote the projection of $\alpha^n$ on $\Sigma_n(y)$ as the positive measure $d\gamma_n(\omega_1,\dots,\omega_n;y)$ on $\Sigma_n(y)$ defined by
\[ \int_{\Omega^n}F(g^{(n)})\,G(\omega_1,\dots,\omega_n)\,d\alpha^n = \int_{\mathbb{R}^r}dy\,F(y)\int_{\Sigma_n(y)}G(\omega_1,\dots,\omega_n)\,d\gamma_n(\omega_1,\dots,\omega_n;y) \tag{2.1.2} \]
for any bounded measurable functions $F : \mathbb{R}^r \to \mathbb{R}$, $G : \Omega^n \to \mathbb{R}$. In general $\gamma_n(\cdot;y)$ is a finite (not normalized) measure on $\Sigma_n(y)$. The total volume is given by
\[ W_n(y) = \int_{\Sigma_n(y)}d\gamma_n(\omega_1,\dots,\omega_n;y). \tag{2.1.3} \]
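In the case $r = 1$, we have that
\[ W_n(y) = \frac{d}{dy}\int_{g^{(n)}\le y}d\alpha^n(\omega_1,\dots,\omega_n). \tag{2.1.4} \]
Recall the definition of $Z(\lambda)$ and its Legendre transform $Z^*(y)$:
\[ Z(\lambda) = \log\int_\Omega e^{\lambda\cdot g(\omega)}\,d\alpha(\omega), \qquad Z^*(y) = \sup_\lambda\,[\lambda\cdot y - Z(\lambda)]. \]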
Theorem 2.1.1 For any $y \in D_{Z^*}$:
\[ \lim_{n\to\infty}\frac1n\log W_n(y) = -Z^*(y). \tag{2.1.5} \]
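Proof.
Let $\lambda = \nabla Z^*(y)$, and consider the tilted probability measure $d\alpha_\lambda = e^{\lambda\cdot g(\omega) - Z(\lambda)}\,d\alpha(\omega)$. Then the product probability measure on $\Omega^n$ is given by
\[ d\alpha_\lambda^n = e^{n[g^{(n)}\cdot\lambda - Z(\lambda)]}\,d\alpha^n(\omega_1,\dots,\omega_n). \]
Then, under the probability $\alpha_\lambda^n$, we can see $g^{(n)}$ as a normalized sum of independent random variables. We denote by $f_n(x,\lambda)$ the density of its probability distribution, i.e. for any $F : \mathbb{R}^r \to \mathbb{R}$,
\[ \int_{\mathbb{R}^r}F(x)f_n(x,\lambda)\,dx = \int_{\Omega^n}F(g^{(n)})\,d\alpha_\lambda^n = \int_{\Omega^n}F(g^{(n)})\,e^{n[g^{(n)}\cdot\lambda - Z(\lambda)]}\,d\alpha^n = \int_{\mathbb{R}^r}F(x)\,e^{n[x\cdot\lambda - Z(\lambda)]}\,W_n(x)\,dx, \]
i.e. $f_n(x,\lambda) = e^{n[x\cdot\lambda - Z(\lambda)]}W_n(x)$. Applying theorem 1.4.1 together with (1.2.7), we have that
\[ \lim_{n\to\infty}\frac1n\log f_n(y,\lambda) = -I_\lambda(y) = -[Z^*(y) + Z(\lambda) - \lambda\cdot y] \]
and (2.1.5) follows.
The conditional probability distribution of $\alpha_\lambda^n$ on $\Sigma_n(y)$ is defined by
\[ \int_{\Omega^n}F(g^{(n)})\,G(\omega_1,\dots,\omega_n)\,e^{n[\lambda\cdot g^{(n)} - Z(\lambda)]}\,d\alpha^n(\omega_1,\dots,\omega_n) = \int_{\mathbb{R}^r}dy\,F(y)f_n(y,\lambda)\int_{\Sigma_n(y)}G(\omega_1,\dots,\omega_n)\,d\alpha_\lambda^n(\omega_1,\dots,\omega_n|y). \]
Since $f_n(y,\lambda) = e^{n[\lambda\cdot y - Z(\lambda)]}W_n(y)$, comparing with (2.1.2) we have the relation
\[ d\alpha_\lambda^n(\cdot|y) = d\gamma_n(\cdot;y)/W_n(y), \]
which in particular implies that it does not depend on $\lambda$ (see footnote 1).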
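Theorem 2.1.1 can be checked on a case where $W_n$ is explicit: $\Omega = \mathbb{R}$, $\alpha$ the Lebesgue measure and $g(x) = x^2$. Then $\{g^{(n)} \le y\}$ is the ball of radius $\sqrt{ny}$ in $\mathbb{R}^n$, so (2.1.4) gives $W_n(y)$ as the derivative of its volume, while $Z(\lambda) = \frac12\log(\pi/(-\lambda))$ for $\lambda < 0$ and the supremum defining $Z^*(y)$ is attained at $\lambda = -1/(2y)$. The sketch below uses illustrative function names that are not part of the notes.

```python
import math

def log_volume_density(n, y):
    """log W_n(y) for Omega = R, alpha = Lebesgue, g(x) = x^2, using (2.1.4).

    The ball of radius sqrt(n*y) in R^n has volume
    pi^(n/2) (n*y)^(n/2) / Gamma(n/2 + 1); its y-derivative is volume * n / (2*y).
    """
    log_ball = (n / 2) * math.log(math.pi * n * y) - math.lgamma(n / 2 + 1)
    return log_ball + math.log(n / (2 * y))

def legendre_of_Z(y):
    """Z*(y) for Z(lam) = (1/2) log(pi / (-lam)), lam < 0, maximized at lam = -1/(2y)."""
    lam = -1.0 / (2.0 * y)
    return lam * y - 0.5 * math.log(math.pi / (-lam))
```

For large `n`, `log_volume_density(n, y) / n` should be close to `-legendre_of_Z(y)`, which equals $\frac12 + \frac12\log(2\pi y)$.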
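Lemma 2.1.2 Let $F$ be a bounded continuous function on $\Omega$, and $g : \Omega \to \mathbb{R}^r$ as above. Denote $F^{(n)} = n^{-1}\sum_{j=1}^n F(\omega_j)$, in analogy with $g^{(n)}$. Then for any $\theta \in \mathbb{R}$
\[ \lim_{n\to\infty}\frac1n\log\int_{\Sigma_n(y)}e^{\theta nF^{(n)}}\,d\alpha^{(n)}(\omega_1,\dots,\omega_n|y) = -Z_\theta^*(y) + Z^*(y) \tag{2.1.6} \]
where $Z_\theta^*(y)$ is the Fenchel-Legendre transform of $Z_\theta(\lambda) = \log\int_\Omega e^{\theta F + \lambda\cdot g}\,d\alpha$:
\[ Z_\theta^*(y) = \sup_\delta\{\delta\cdot y - Z_\theta(\delta)\}. \]
Furthermore
\[ \partial_\theta Z_\theta^*(y)\Big|_{\theta=0} = -\int_\Omega F\,e^{\delta^*\cdot g - Z(\delta^*)}\,d\alpha = -\int_\Omega F\,d\alpha_{\delta^*} \tag{2.1.7} \]
for $\delta^* = \nabla Z^*(y)$.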
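Proof:
Consider the doubly tilted probability measure
\[ d\alpha_{\lambda,\theta}(\omega) = e^{\theta F(\omega) + \lambda\cdot g(\omega) - Z_\theta(\lambda)}\,d\alpha(\omega). \]
Under the product measure $\alpha_{\lambda,\theta}^n$ on $\Omega^n$, the probability distribution of $g^{(n)}$ has density
\[ f_n(y;\lambda,\theta) = e^{n[\lambda\cdot y - Z_\theta(\lambda)]}\,W_n(y)\int_{\Sigma_n(y)}e^{n\theta F^{(n)}}\,d\alpha^{(n)}(\omega_1,\dots,\omega_n|y). \]
Then, applying theorem 1.4.1 and (1.2.7) we have
\[ \lim_{n\to\infty}\frac1n\log f_n(y;\lambda,\theta) = -I(y;\lambda,\theta) = -Z_\theta^*(y) + \lambda\cdot y - Z_\theta(\lambda), \]
and (2.1.6) follows directly, using (2.1.5).
About (2.1.7), observe that
\[ Z_\theta^*(y) = \delta^*\cdot y - Z_\theta(\delta^*) \]
where $\delta^* = \nabla_y Z_\theta^*(y)$ depends on $y$ and $\theta$, and $y = \nabla_\delta Z_\theta(\delta^*)$. Differentiating in $\theta$ we have:
\[ \partial_\theta Z_\theta^*(y) = y\cdot\partial_\theta\delta^* - (\partial_\theta Z_\theta)(\delta^*) - \nabla_\delta Z_\theta(\delta^*)\cdot\partial_\theta\delta^* = -(\partial_\theta Z_\theta)(\delta^*) = -\int_\Omega F(\omega)\,d\alpha_{\delta^*,\theta}(\omega), \]
and (2.1.7) follows after taking $\theta \to 0$.
Footnote 1: In statistics we say that $g^{(n)}$ is a sufficient statistic for estimating $\lambda$.
2.2 Equivalence of ensembles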
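In the following $y \in D_{Z^*}^o$.
Theorem 2.2.1 There exists a constant $C > 0$ such that for any $\varepsilon > 0$
\[ \limsup_{n\to\infty}\frac1n\log\int_{\Sigma_n(y)}1_{[|F^{(n)} - \int F\,d\alpha_\lambda| \ge \varepsilon]}\,d\alpha^{(n)}(\omega_1,\dots,\omega_n|y) \le -C\varepsilon^2. \tag{2.2.1} \]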
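Proof:
Without losing any generality we can here assume that $\int F\,d\alpha_\lambda = 0$. Consequently, by (2.1.7), $Z_\theta^*(y) - Z^*(y) = O(\theta^2)$. Then for any $\theta > 0$, by the exponential Chebyshev inequality:
\[ \int_{\Sigma_n(y)}1_{[|F^{(n)}| \ge \varepsilon]}\,d\alpha^{(n)}(\omega_1,\dots,\omega_n|y) \le e^{-n\theta\varepsilon}\int_{\Sigma_n(y)}e^{\theta|nF^{(n)}|}\,d\alpha^{(n)}(\omega_1,\dots,\omega_n|y) \le e^{-n\theta\varepsilon}\int_{\Sigma_n(y)}\left[e^{\theta nF^{(n)}} + e^{-\theta nF^{(n)}}\right]d\alpha^{(n)}(\omega_1,\dots,\omega_n|y) \]
and by (2.1.6)
\[ \limsup_{n\to\infty}\frac1n\log\int_{\Sigma_n(y)}1_{[|F^{(n)}| \ge \varepsilon]}\,d\alpha^{(n)}(\omega_1,\dots,\omega_n|y) \le -\theta\varepsilon + \max\{-Z_\theta^*(y) + Z^*(y),\ -Z_{-\theta}^*(y) + Z^*(y)\}, \]
and since $Z_\theta^*(y) - Z^*(y) = O(\theta^2)$, we can choose $\theta$ proportional to $\varepsilon$ such that the right hand side is bounded by $-C\varepsilon^2$ for some positive constant $C$.
Since $d\alpha^{(n)}(\omega_1,\dots,\omega_n|y)$ is a symmetric measure:
\[ \int_{\Sigma_n(y)}F(\omega_1)\,d\alpha^{(n)}(\omega_1,\dots,\omega_n|y) = \int_{\Sigma_n(y)}F^{(n)}\,d\alpha^{(n)}(\omega_1,\dots,\omega_n|y) \underset{n\to\infty}{\longrightarrow} \int_\Omega F\,d\alpha_\lambda. \]
More generally we have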
Theorem 2.2.2 Let $F(\omega_1,\dots,\omega_k)$ be a bounded continuous function on $\Omega^k$ and $y \in D_{Z^*}^o$; then
\[ \lim_{n\to\infty}\int_{\Sigma_n(y)}F(\omega_1,\dots,\omega_k)\,\alpha^{(n)}(d\omega_1,\dots,d\omega_n|y) = \int_{\Omega^k}F(\omega_1,\dots,\omega_k)\,\alpha_\lambda(d\omega_1)\dots\alpha_\lambda(d\omega_k). \]
Proof. It is enough to consider functions of the form $F(\omega_1,\dots,\omega_k) = F_1(\omega_1)\dots F_k(\omega_k)$. For simplicity let us prove the case $k = 2$; the generalization to any $k$ is straightforward. Without losing generality, let us assume that $\int F_j(\omega)\,\alpha_\lambda(d\omega) = 0$. By the exchange symmetry of $d\alpha^{(n)}(\cdot|y)$ we have
\[ \int F_1(\omega_1)F_2(\omega_2)\,\alpha^{(n)}(d\omega_1,\dots,d\omega_n|y) = \frac{1}{n(n-1)}\sum_{i\ne j}\int F_1(\omega_i)F_2(\omega_j)\,\alpha^{(n)}(d\omega_1,\dots,d\omega_n|y) = \frac{n^2}{n(n-1)}\int F_1^{(n)}F_2^{(n)}\,\alpha^{(n)}(d\omega_1,\dots,d\omega_n|y) + O\!\left(\frac1n\right) \]
and this last expression converges to 0 as $n \to \infty$ by (2.2.1).
Examples
1. Choose $\Omega = \mathbb{R}$ and $g(x) = x^2$. From the above equivalence follows the so called Poincaré lemma (see footnote 2): the uniform measure on the $n$-dimensional sphere with radius $\sqrt n$ converges, in terms of the finite dimensional distributions, to the product of gaussian measures $e^{-x_i^2/2}/\sqrt{2\pi}$.
Footnote 2: Apparently Poincaré has nothing to do with such a statement, which should be attributed to Maxwell.
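For the first coordinate, the convergence in the Poincaré lemma can be seen from an explicit formula: under the uniform law on the sphere $\{x \in \mathbb{R}^n : \sum_i x_i^2 = n\}$, the first coordinate has density proportional to $(1 - x^2/n)^{(n-3)/2}$ on $|x| < \sqrt n$. The sketch below evaluates this density and the gaussian limit. The function names are illustrative, and the sphere is understood as embedded in $\mathbb{R}^n$, which is an assumption about the notes' convention.

```python
import math

def sphere_marginal_density(x, n):
    """Density of the first coordinate under the uniform law on the sphere of radius sqrt(n) in R^n."""
    if x * x >= n:
        return 0.0
    log_norm = (math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
                - 0.5 * math.log(math.pi * n))
    return math.exp(log_norm + 0.5 * (n - 3) * math.log(1 - x * x / n))

def gaussian_density(x):
    """Standard gaussian density, the claimed limit."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
```

For `n = 3` the marginal is uniform on $(-\sqrt3, \sqrt3)$, and for large `n` the two functions agree up to a correction of order $1/n$.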
Chapter 3
Statistical mechanics and thermodynamics of one dimensional chain of oscillators
3.1 Grand canonical formalism
We study a system of $m = [nM]$ anharmonic oscillators, where $M > 0$ is a positive parameter corresponding to the macroscopic mass of the total system. The particles are denoted by $j = 1,\dots,m$. We denote by $q_j$, $j = 1,\dots,m$, their positions, and by $p_j$ the corresponding momenta (which are equal to the velocities since we assume that all particles have mass 1). We consider first the system attached to a point, and we set $q_0 = 0$, $p_0 = 0$. Between each pair of consecutive particles $(i, i+1)$ there is an anharmonic spring described by its potential energy $V(q_{i+1} - q_i)$. We assume $V$ is a positive smooth function such that $V(r) \to +\infty$ as $|r| \to \infty$ and such that
\[ Z(\lambda,\beta) := \int e^{-\beta V(r) + \lambda r}\,dr < +\infty \tag{3.1.1} \]
for all $\beta > 0$ and all $\lambda \in \mathbb{R}$. Let $a$ be the equilibrium interparticle spacing, where $V$ attains its minimum that we assume is 0: $V(a) = 0$. It is convenient to work with interparticle distances as coordinates, rather than absolute particle positions, so we define $\{r_j = q_j - q_{j-1} - a,\ j = 1,\dots,m\}$. Without losing any generality, we will choose $a = 0$ in the sequel.
The configuration of the system is given by $\{p_j, r_j,\ j = 1,\dots,m\} \in \mathbb{R}^{2m}$, and the energy function (Hamiltonian) defined on each configuration is given by
\[ H = \sum_{j=1}^m E_j \]
where
\[ E_j = \frac12 p_j^2 + V(r_j), \qquad j = 1,\dots,m \]
is the energy of each oscillator. This choice is a bit arbitrary, because we associate
the potential energy of the bond V (rj ) to the particle j. Different choices can be
made, but this one is notationally convenient.
At the other end of the chain we apply a constant force $\tau \in \mathbb{R}$ on the particle $m$ (tension). The position of the particle $m$ is given by $q_m = \sum_{j=1}^m r_j$. We consider the Hamiltonian dynamics:
\[ \dot r_j(t) = p_j(t) - p_{j-1}(t), \qquad j = 1,\dots,m, \]
\[ \dot p_j(t) = V'(r_{j+1}(t)) - V'(r_j(t)), \qquad j = 1,\dots,m-1, \tag{3.1.2} \]
\[ \dot p_m(t) = \tau - V'(r_m(t)). \]
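It is easy to see that, for any $\beta > 0$, the grand canonical measure $\mu^{m,gc}_{\tau,\beta}$ defined by
\[ d\mu^{m,gc}_{\tau,\beta} = \prod_{j=1}^m\frac{e^{-\beta(E_j - \tau r_j)}}{\sqrt{2\pi\beta^{-1}}\,Z(\beta\tau,\beta)}\,dr_j\,dp_j \tag{3.1.3} \]
is stationary for this dynamics.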
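The stationarity of (3.1.3) rests on two facts: the vector field of (3.1.2) is divergence free in the coordinates $(r_j, p_j)$, and the quantity $\sum_j (E_j - \tau r_j)$ appearing in the exponent is conserved. The second fact can be checked numerically. The sketch below integrates (3.1.2) with a classical Runge-Kutta step; the potential $V(r) = r^2/2 + r^4/4$, the integrator and all names are illustrative assumptions, not choices made in the notes.

```python
def potential(r):
    """Illustrative anharmonic potential V(r) = r^2/2 + r^4/4 (an assumption)."""
    return 0.5 * r * r + 0.25 * r ** 4

def potential_prime(r):
    """V'(r) for the illustrative potential."""
    return r + r ** 3

def vector_field(r, p, tau):
    """Right-hand side of (3.1.2). Lists are 0-based: r[j-1] holds r_j, and p_0 = 0."""
    m = len(r)
    dr = [p[j] - (p[j - 1] if j > 0 else 0.0) for j in range(m)]
    dp = [potential_prime(r[j + 1]) - potential_prime(r[j]) for j in range(m - 1)]
    dp.append(tau - potential_prime(r[m - 1]))
    return dr, dp

def rk4_step(r, p, tau, dt):
    """One classical Runge-Kutta step for the state (r, p)."""
    def shift(state, slope, h):
        return [s + h * k for s, k in zip(state, slope)]
    def combine(state, k1, k2, k3, k4):
        return [s + dt / 6 * (a + 2 * b + 2 * c + d)
                for s, a, b, c, d in zip(state, k1, k2, k3, k4)]
    k1r, k1p = vector_field(r, p, tau)
    k2r, k2p = vector_field(shift(r, k1r, dt / 2), shift(p, k1p, dt / 2), tau)
    k3r, k3p = vector_field(shift(r, k2r, dt / 2), shift(p, k2p, dt / 2), tau)
    k4r, k4p = vector_field(shift(r, k3r, dt), shift(p, k3p, dt), tau)
    return combine(r, k1r, k2r, k3r, k4r), combine(p, k1p, k2p, k3p, k4p)

def tilted_energy(r, p, tau):
    """sum_j (E_j - tau * r_j), the quantity in the exponent of (3.1.3)."""
    return sum(0.5 * pj * pj + potential(rj) - tau * rj for rj, pj in zip(r, p))
```

Iterating `rk4_step` from any initial configuration, `tilted_energy` should stay constant up to the integration error.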
The distribution $\mu^{m,gc}_{\tau,\beta}$ is called the grand canonical Gibbs measure at temperature $T = \beta^{-1}$ and tension (or pressure) $\tau$. Notice that $\{r_1,\dots,r_m,p_1,\dots,p_m\}$ are independently distributed under this probability measure.
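We can apply the results of chapters 1 and 2, with $\Omega = \mathbb{R}^2$, $g(r,p) = (E(p,r), r)$, $\lambda = (-\beta, \beta\tau)$, so that $\lambda\cdot g = -\beta(E - \tau r)$ as in (3.1.3), and $Z(\lambda) = \log\big(Z(\beta\tau,\beta)\sqrt{2\pi/\beta}\big)$. Then, for $M > 0$, $U > 0$, $L \in \mathbb{R}$, we have the microcanonical surface
\[ \tilde\Sigma_m(M, MU, ML) := \left\{(r_1,p_1,\dots,r_m,p_m) : \frac1n\sum_{j=1}^m E_j = MU,\ \frac1n\sum_{j=1}^m r_j = ML\right\} = \Sigma_m(U,L) = \left\{(r_1,p_1,\dots,r_m,p_m) : E^{(m)} = U,\ r^{(m)} = L\right\}. \tag{3.1.4} \]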
Defining $W_m(U,L)$ as in (2.1.3), by theorem 2.1.1 the following limit exists:
\[ S(M, MU, ML) := \lim_{n\to\infty}\frac1n\log W_m(U,L) = M\,S(1,U,L). \tag{3.1.5} \]
In fact it follows from the definition that $S$ is homogeneous of degree 1. In the following we just use the notation $S(U,L) = S(1,U,L)$. This function is the thermodynamic entropy, which by (2.1.5) is given by
\[ S(U,L) = \inf_{\lambda,\ \beta>0}\left\{-\lambda L + \beta U + \log\left(Z(\lambda,\beta)\sqrt{2\pi\beta^{-1}}\right)\right\}. \tag{3.1.6} \]
This is the fundamental relation that connects the microscopic system to its
thermodynamic macroscopic description.
The limit in (3.1.5) is intended for all values of the internal energy U > 0.
$S$ is concave, since it is an infimum of linear functions.
We can now define the other thermodynamic quantities from the entropy definition (3.1.6). From equation (3.1.6) we have
\[ \lambda(L,u) = -\frac{\partial S(L,u)}{\partial L}, \qquad \beta(L,u) = \frac{\partial S(L,u)}{\partial u} \tag{3.1.7} \]
and we will always define the tension as $\tau(L,u) = \lambda(L,u)/\beta(L,u)$.
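Moreover, with $\lambda = \beta\tau$,
\[ L(\lambda,\beta) = \frac{\partial\log Z(\lambda,\beta)}{\partial\lambda} = \int r\,\frac{e^{\lambda r - \beta V(r)}}{Z(\lambda,\beta)}\,dr = \int r_j\,d\mu^{gc}_{\tau,\beta}, \]
\[ u(\lambda,\beta) = -\frac{\partial\log\big(Z(\lambda,\beta)\sqrt{2\pi/\beta}\big)}{\partial\beta} = \int V(r)\,\frac{e^{\lambda r - \beta V(r)}}{Z(\lambda,\beta)}\,dr + \frac{1}{2\beta} = \int E_j\,d\mu^{gc}_{\tau,\beta}. \tag{3.1.8} \]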
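The two equalities in (3.1.8) can be checked numerically for a concrete potential. The sketch below takes $V(r) = r^2/2 + r^4/4$, computes $Z(\lambda,\beta)$ of (3.1.1) by the trapezoidal rule on a truncated interval, and compares finite-difference derivatives of $\log\big(Z\sqrt{2\pi/\beta}\big)$ with the Gibbs averages. The potential, the truncation to $[-8, 8]$ and all names are illustrative assumptions.

```python
import math

def potential(r):
    """Illustrative anharmonic potential V(r) = r^2/2 + r^4/4 (an assumption)."""
    return 0.5 * r * r + 0.25 * r ** 4

def integrate(f, lo=-8.0, hi=8.0, steps=8000):
    """Composite trapezoidal rule; the integrands used here are negligible outside [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h

def partition(lam, beta):
    """Z(lam, beta) of (3.1.1)."""
    return integrate(lambda r: math.exp(lam * r - beta * potential(r)))

def gibbs_average(obs, lam, beta):
    """Average of obs(r) under the density exp(lam * r - beta * V(r)) / Z(lam, beta)."""
    z = partition(lam, beta)
    return integrate(lambda r: obs(r) * math.exp(lam * r - beta * potential(r))) / z

def length_and_energy(lam, beta, h=1e-4):
    """L and u of (3.1.8), from central differences of log(Z * sqrt(2 pi / beta))."""
    def log_full(l, b):
        return math.log(partition(l, b)) + 0.5 * math.log(2 * math.pi / b)
    length = (log_full(lam + h, beta) - log_full(lam - h, beta)) / (2 * h)
    energy = -(log_full(lam, beta + h) - log_full(lam, beta - h)) / (2 * h)
    return length, energy
```

The pair returned by `length_and_energy(lam, beta)` should match `gibbs_average(lambda r: r, lam, beta)` and `gibbs_average(potential, lam, beta) + 1 / (2 * beta)` respectively.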
Computing the total differential of $S(L,u)$ we have
\[ dS = -\beta\tau\,dL + \beta\,du = \frac{đQ}{T} \tag{3.1.9} \]
where $đQ$ is the (non-exact) differential
\[ đQ = -\tau\,dL + du. \tag{3.1.10} \]
3.2 Microcanonical measure
Instead of applying a force (tension) to one side of the chain, one can fix the particle $n$ to another wall at distance $nr$ ($q_n = \sum_{j=1}^n r_j = nr$ and $p_n = \dot p_n = 0$). The corresponding constrained dynamics is
\[ \dot r_j(t) = p_j(t) - p_{j-1}(t), \qquad j = 1,\dots,n-1, \]
\[ \dot p_j(t) = V'(r_{j+1}(t)) - V'(r_j(t)), \qquad j = 1,\dots,n-1, \tag{3.2.1} \]
\[ r_n(t) = nr - \sum_{j=1}^{n-1}r_j(t). \]
The dynamics now conserves the total energy $H = \sum_j E_j = nu$ and the total length $\sum_{j=1}^n r_j = nr$. The microcanonical measures $\mu^{n,mc}_{r,u}$ are now stationary for this dynamics. These are defined in the following way:
Consider the vector valued i.i.d. random variables
\[ \{X_j = (r_j, E_j),\ j = 1,\dots,n\}, \]
distributed by $d\mu^{n,gc}_{\tau_0,\beta_0}$. Fix $x = (r,u)$, and define $\mu^{n,mc}_x$ as the conditional distribution of $(r_1,p_1,\dots,r_n,p_n)$ on the manifold $\sum_{j=1}^n X_j = nx$. This is defined, for any bounded continuous functions $G : \mathbb{R}\times\mathbb{R}_+ \to \mathbb{R}$ and $H : \mathbb{R}^{2n} \to \mathbb{R}$, by
\[ \int G(\hat S_n)\,H(r_1,p_1,\dots,r_n,p_n)\,d\mu^{n,gc}_{\tau_0,\beta_0}(r_1,p_1,\dots,r_n,p_n) = \int_{\mathbb{R}\times\mathbb{R}_+}dx\,G(x)f_n(x)\int H(r_1,p_1,\dots,r_n,p_n)\,d\mu^{n,mc}_x \]
where $\hat S_n = \frac1n\sum_{i=1}^n X_i$ and $f_n$ is the density of $\hat S_n$ under $\mu^{n,gc}_{\tau_0,\beta_0}$. It is easy to see that $\mu^{n,mc}_x$ does not depend on $\tau_0, \beta_0$. We call $\mu^{n,mc}_x$ the microcanonical measure.
The multidimensional application of theorem ?? gives the following equivalence
between microcanonical and grandcanonical measure:
Theorem 3.2.1 Given $x = (r,u)$, let
\[ \beta = \beta(r,u), \qquad \tau = \lambda(r,u)\beta^{-1}. \]
Then for any bounded continuous function $F : \mathbb{R}^{2k} \to \mathbb{R}$ we have
\[ \lim_{n\to\infty}\int F(r_1,p_1,\dots,r_k,p_k)\,d\mu^{n,mc}_x(r_1,p_1,\dots,r_n,p_n) = \int F(r_1,p_1,\dots,r_k,p_k)\,d\mu^{gc}_{\tau,\beta}(\dots,r_1,p_1,\dots,r_n,p_n,\dots). \]
The equivalence of ensembles will be useful later in the following form:
Theorem 3.2.2 Under the same conditions of Theorem 3.2.1, assume that
\[ \int F(r_1,p_1,\dots,r_k,p_k)\,d\mu^{k,gc}_{\tau,\beta}(r_1,p_1,\dots,r_k,p_k) = 0. \]
Then
\[ \lim_{n\to\infty}\int\frac{1}{n-k}\sum_{i=1}^{n-k}F(r_i,p_i,\dots,r_{i+k},p_{k+i})\,d\mu^{n,mc}_x = 0. \]
The proof of these two theorems follows the argument used for Theorem 2.2.2.
3.3 Canonical measure
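Applying a Langevin thermostat at temperature $T = \beta^{-1}$ to the particle $n$ (or to any other particle), we obtain a dynamics that has the canonical measure $\mu^{n,c}_{r,\beta}$ as stationary measure:
\[ \dot r_j(t) = p_j(t) - p_{j-1}(t), \qquad j = 1,\dots,n-1, \]
\[ dp_j(t) = \big(V'(r_{j+1}(t)) - V'(r_j(t))\big)\,dt + \delta_{j,n-1}\left(-p_j(t)\,dt + \sqrt{2\beta^{-1}}\,dw(t)\right), \qquad j = 1,\dots,n-1, \tag{3.3.1} \]
\[ r_n(t) = nr - \sum_{j=1}^{n-1}r_j(t). \]
Here $w(t)$ is a standard Wiener process, and the noise intensity $\sqrt{2\beta^{-1}}$ is the one for which a friction term $-p\,dt$ leaves the gaussian density $e^{-\beta p^2/2}$ invariant.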
The canonical measure is defined as follows:
If we condition the grand canonical measure $\mu^{n,gc}_{0,\beta}$ on the total length of the chain being equal to $L = nr = \sum_j r_j = q_n - q_0$, we obtain the canonical measure that we denote by $\mu^{n,c}_{r,\beta}$. We can formally write
\[ d\mu^{n,c}_{r,\beta} = \frac{e^{-\beta\sum_j V(r_j)}}{Z_{n,c}(r,\beta)}\,\delta\Big(\sum_j r_j = nr\Big)\prod_j dr_j \otimes \prod_j\frac{e^{-\beta p_j^2/2}}{\sqrt{2\pi\beta^{-1}}}\,dp_j \]
where $Z_{n,c}(r,\beta)$ is the normalization constant (canonical partition function).
Statements similar to theorems 3.2.1 and 3.2.2 hold, with $\mu^{n,c}_{r,\beta}$ converging to the grand-canonical measure $\mu^{n,gc}_{\tau,\beta}$, with $\tau$ given by the thermodynamic relations (3.1.7).
Other boundary conditions can be chosen, like applying a tension $\tau$ and a Langevin thermostat at temperature $\beta^{-1}$ to the particle $n$, obtaining a system with $\mu^{gc}_{\tau,\beta}$ as stationary measure.
3.4 Local equilibrium, local Gibbs measures
The Gibbs distributions defined in the above sections are also called equilibrium distributions for the dynamics. To study the non-equilibrium behaviour we need the concept of local equilibrium distributions. These are probability distributions that have some asymptotic properties when the system becomes large ($n \to \infty$): vaguely speaking, locally they look like Gibbs measures. We need a precise mathematical definition, which will be useful later for proving the macroscopic behaviour of the system.
Definition 3.4.1 Given two functions $\beta(y) > 0$, $\tau(y)$, $y \in [0,1]$, we say that the sequence of probability measures $\mu_n$ on $\mathbb{R}^{2n}$ has the local equilibrium property (with respect to the profiles $\beta(\cdot)$, $\tau(\cdot)$) if for any $k > 0$ and $y \in (0,1)$,
\[ \lim_{n\to\infty}\mu_n\big|_{([ny],[ny]+k)} = \mu^{k,gc}_{\tau(y),\beta(y)}. \tag{3.4.1} \]
Sometimes we will need some weaker definition of local equilibrium (for example relaxing the pointwise convergence in $y$). It is important here to understand that local equilibrium is a property of a sequence of probability measures.
The simplest example of a local equilibrium sequence is given by the local Gibbs measures:
\[ \prod_{j=1}^n\frac{e^{-\beta(j/n)(E_j - \tau(j/n)r_j)}}{\sqrt{2\pi\beta(j/n)^{-1}}\,Z(\beta(j/n)\tau(j/n),\beta(j/n))}\,dr_j\,dp_j = g^n_{\tau(\cdot),\beta(\cdot)}\prod_{j=1}^n dr_j\,dp_j. \tag{3.4.2} \]
Of course, small order perturbations of this sequence are also local equilibrium sequences, like
\[ e^{\sum_j F_j(r_{j-h},p_{j-h},\dots,r_{j+h},p_{j+h})/n}\,g^n_{\tau(\cdot),\beta(\cdot)}\prod_{j=1}^n dr_j\,dp_j \tag{3.4.3} \]
where $F_j$ are local functions.
To a local equilibrium sequence we can associate a thermodynamic entropy, defined as
\[ S(r(\cdot),u(\cdot)) = \int_0^1 S(r(y),u(y))\,dy \tag{3.4.4} \]
where $r(y)$, $u(y)$ are computed from $\tau(y)$, $\beta(y)$ using (3.1.8).
Bibliography
[1] R. Durrett. Probability: Theory and Examples. Duxbury Press, third edition, 2005.