A short proof of Cramér’s theorem in R
Raphaël Cerf and Pierre Petit
Abstract
We give a short proof of Cramér’s large deviations theorem based on
convex duality. This proof does not resort to the law of large numbers or
any other limit theorem.
The most fundamental result in probability theory is the law of large numbers
for a sequence (X_n)_{n≥1} of independent and identically distributed real-valued
random variables. Define the empirical mean of (X_n)_{n≥1} by

    X̄_n = (1/n) ∑_{i=1}^{n} X_i.

The law of large numbers asserts that the empirical mean X̄_n converges (almost
surely) towards the theoretical mean E(X_1) provided that E(|X_1|) is finite. The
next fundamental results are the central limit theorem and Cramér’s theorem.
Both are refinements of the law of large numbers in two different directions. The
central limit theorem describes the random fluctuations of X̄_n around E(X_1),
in fact “small deviations” of order 1/√n. Cramér’s theorem estimates the
probability that X̄_n deviates significantly from E(X_1):

    P(X̄_n ≥ E(X_1) + ε)

for ε > 0, that is, the probability of “large deviation” events. It turns out that
this probability decays exponentially fast with n when |X_1| admits exponential
moments. The first estimate of this kind can be traced back to Cramér’s paper
[6], which deals with variables possessing a density and exponential moments.
In [5] Chernoff relaxed the first assumption. Bahadur [2] finally gave a proof
without any assumption on the law of X_1. Coming from statistical mechanics,
Lanford imported the subadditivity argument into the proof [10]. The result of
Cramér was generalized in several directions. Bahadur and Zabell [3] extended
Cramér’s theory to infinite-dimensional topological vector spaces. Hammersley [9] considered superadditive sequences in R; taking advantage of the order
structure of R, he found a quite different efficient proof of the result (we thank
Olivier Garet for drawing our attention to Hammersley’s paper [9]). The texts
of Azencott [1], Deuschel and Stroock [8], Dembo and Zeitouni [7], and Cerf [4]
take stock of the foregoing improvements. The proofs of Cramér’s theorem in
R presented in these texts resort to the law of large numbers (see, e.g., [7]),
to Mosco’s theorem (see, e.g., [4]), or to another limit theorem. We give here a
direct proof of Cramér’s theorem in R which combines the ideas of Hammersley,
Lanford, Bahadur, and Zabell, with two simplifying features: we first prove the
dual version of Cramér’s theorem (in the sense of convex functions) and we use
conditioning by a compact convex set. Not only is the proof clearer, but it
adapts easily to infinite-dimensional spaces and even to nonindependent variables.
Cramér’s theorem. Let (X_n)_{n≥1} be a sequence of independent and identically
distributed real-valued random variables and let X̄_n be the empirical mean:

    X̄_n = (1/n) ∑_{i=1}^{n} X_i.

For all x ∈ R the sequence

    (1/n) log P(X̄_n ≥ x)

converges in [−∞, 0] and

    lim_{n→∞} (1/n) log P(X̄_n ≥ x) = inf_{λ≥0} ( log E(e^{λX_1}) − λx ).
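For instance, if X_1 has the standard Gaussian distribution, then log E(e^{λX_1}) = λ²/2; for x ≥ 0 the infimum of λ²/2 − λx over λ ≥ 0 is attained at λ = x, so the theorem says that (1/n) log P(X̄_n ≥ x) converges towards −x²/2, while for x < 0 the infimum is attained at λ = 0 and equals 0, in accordance with the law of large numbers.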
Let us define the entropy of the sequence (X_n)_{n≥1} by

    ∀x ∈ R    s(x) = sup_{n≥1} (1/n) log P(X̄_n ≥ x)

and the pressure of X_1 (or the log-Laplace transform of the law of X_1) by

    ∀λ ∈ R    p(λ) = log E(e^{λX_1}).
The entropy and the pressure may take infinite values. We work in [−∞, +∞]
with the natural extensions of addition and the order relation. Our strategy is
to show a dual version, in the sense of convex functions, of Cramér’s theorem.
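(As an illustration of infinite values: if X_1 has the standard exponential distribution, then p(λ) = −log(1 − λ) for λ < 1 and p(λ) = +∞ for λ ≥ 1, while if X_1 ≤ M almost surely, then s(x) = −∞ for every x > M.)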
Dual equality. For all λ ≥ 0,

    p(λ) = sup_{u∈R} ( λu + s(u) ).
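Before turning to the proof, here is a sanity check on the Gaussian example above: there p(λ) = λ²/2 and, once the theorem is proved, s(u) = −u²/2 for u ≥ 0 and s(u) = 0 for u ≤ 0; for λ ≥ 0 the supremum of λu + s(u) over u ∈ R is then attained at u = λ and equals λ²/2 = p(λ), as the dual equality asserts.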
Proof. Chebyshev’s classical inequality will yield one half of the above equality.
To prove the other half we will condition X̄_n to be bounded by K and then let
K grow towards +∞.
Since X_1, . . . , X_n are independent and identically distributed, we have, for all
λ ≥ 0 and n ≥ 1,

    E(e^{nλX̄_n}) = ( E(e^{λX_1}) )^n.

Thus, for u ∈ R and n ≥ 1, it follows from Chebyshev’s inequality that

    p(λ) = log E(e^{λX_1}) = (1/n) log E(e^{nλX̄_n})
         ≥ (1/n) log ( e^{nλu} P(e^{nλX̄_n} ≥ e^{nλu}) ) ≥ λu + (1/n) log P(X̄_n ≥ u).

Hence, taking the supremum over n ≥ 1 and then over u ∈ R, we get

    ∀λ ≥ 0    p(λ) ≥ sup_{u∈R} ( λu + s(u) ).
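(In the Gaussian example, this computation with λ = u ≥ 0 is the classical Chernoff bound P(X̄_n ≥ u) ≤ e^{−nu²/2}.)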
Next, we prove equality for λ = 0. Let us state a simple and useful inequality.
Useful inequality. For all x ∈ R and n ≥ 1,

    P(X_1 ≥ x)^n ≤ P(X̄_n ≥ x) ≤ e^{n s(x)}.

Indeed, the first inequality follows from the inclusion {X_1 ≥ x} ∩ · · · ∩ {X_n ≥ x} ⊂ {X̄_n ≥ x}
and the independence of the variables, and the second from the definition of s.
In particular, s(u) ≥ log P(X_1 ≥ u). Hence, since s ≤ 0 by definition, letting u
go to −∞, we see that

    sup_{u∈R} s(u) = 0 = p(0).
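As an illustration, suppose that X_1 takes the values 0 and 1 with probability 1/2 each and take x = 1: all three quantities in the useful inequality are then equal to 2^{−n}, so both inequalities can be equalities.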
Now we prove the converse inequality for λ > 0. Let λ > 0 and K > 0. If X
is a random variable, 1_{X≤x} will denote the random variable that is 1 if X ≤ x
and 0 otherwise. For all n ≥ 1, using the fact that X_1, . . . , X_n are independent
and identically distributed, we have

    log E(e^{λX_1} 1_{|X_1|≤K}) = (1/n) log E( e^{λ(X_1+···+X_n)} 1_{|X_1|≤K} · · · 1_{|X_n|≤K} )
                                ≤ (1/n) log E( e^{nλX̄_n} 1_{|X̄_n|≤K} )

since

    {|X_1| ≤ K} ∩ · · · ∩ {|X_n| ≤ K} ⊂ {|X̄_n| ≤ K},

where {X ≤ x} denotes the event that the random variable X is at most x.
Then, writing exp(nλX̄_n) as an integral, we get

    log E(e^{λX_1} 1_{|X_1|≤K})
        ≤ (1/n) log E( ( e^{−nλK} + ∫_{−K}^{X̄_n} nλ e^{nλu} du ) 1_{|X̄_n|≤K} )
        ≤ (1/n) log ( e^{−nλK} + ∫_{−∞}^{+∞} E( 1_{−K≤u≤X̄_n} 1_{|X̄_n|≤K} ) nλ e^{nλu} du ),
the last step being a consequence of Fubini’s theorem. Since
    E( 1_{−K≤u≤X̄_n} 1_{|X̄_n|≤K} ) ≤ E( 1_{X̄_n≥u} 1_{−K≤u≤K} ) = P(X̄_n ≥ u) 1_{|u|≤K} ≤ e^{n s(u)} 1_{|u|≤K},
we get
    log E(e^{λX_1} 1_{|X_1|≤K}) ≤ (1/n) log ( e^{−nλK} + ∫_{−K}^{K} nλ e^{n(λu+s(u))} du )
                               ≤ (1/n) log ( e^{−nλK} + 2Knλ exp( n sup_{u∈R} ( λu + s(u) ) ) ).
Let K be large enough so that (recall that the supremum of s is 0, whence there
is some u ∈ R such that s(u) > −∞)
    −λK < sup_{u∈R} ( λu + s(u) ).
Then

    log E(e^{λX_1} 1_{|X_1|≤K}) ≤ (1/n) log ( (1 + 2Knλ) exp( n sup_{u∈R} ( λu + s(u) ) ) )
                               ≤ (1/n) log(1 + 2Knλ) + sup_{u∈R} ( λu + s(u) ).
Sending n to ∞ we obtain
    log E(e^{λX_1} 1_{|X_1|≤K}) ≤ sup_{u∈R} ( λu + s(u) ).
Finally, sending K to +∞ we get

    p(λ) ≤ sup_{u∈R} ( λu + s(u) ),

which, combined with the reverse inequality obtained from Chebyshev’s bound,
proves the dual equality.
To deduce Cramér’s theorem from the dual equality, we need to prove that the
function s is concave.
Convergence. For all x ∈ R,

    (1/n) log P(X̄_n ≥ x)

converges in [−∞, 0] towards s(x).
Proof. Let x ∈ R. The proof relies on the subadditivity of the sequence
−log P(X̄_n ≥ x). Suppose that P(X_1 ≥ x) > 0 and let m ≥ 1. For n ≥ m, let
n = mq_n + r_n be the Euclidean division of n by m. Then

    { X̄_n ≥ x } ⊃ ( ⋂_{k=0}^{q_n−1} { (1/m) ∑_{i=mk+1}^{m(k+1)} X_i ≥ x } ) ∩ ( ⋂_{i=mq_n+1}^{n} { X_i ≥ x } )

since X̄_n is a weighted mean of the random variables appearing in the events of
the right-hand side. Since the X_i are i.i.d., for all k ∈ {0, . . . , q_n − 1},

    P( (1/m) ∑_{i=mk+1}^{m(k+1)} X_i ≥ x ) = P(X̄_m ≥ x)

and, for all i ∈ {mq_n + 1, . . . , n}, P(X_i ≥ x) = P(X_1 ≥ x). Therefore

    P(X̄_n ≥ x) ≥ P(X̄_m ≥ x)^{q_n} P(X_1 ≥ x)^{r_n} ≥ P(X̄_m ≥ x)^{q_n} P(X_1 ≥ x)^{m}.
Now, taking logarithms, dividing by n, sending n to ∞, and remembering that
q_n/n → 1/m, we get

    liminf_{n→∞} (1/n) log P(X̄_n ≥ x) ≥ (1/m) log P(X̄_m ≥ x).
Taking the supremum over m ≥ 1, and recalling that (1/n) log P(X̄_n ≥ x) ≤ s(x)
for every n, we conclude that

    lim_{n→∞} (1/n) log P(X̄_n ≥ x) = s(x).
If x is such that P(X_1 ≥ x) = 0, then X̄_n < x almost surely and

    ∀n ≥ 1    (1/n) log P(X̄_n ≥ x) = −∞,

whence the sequence converges towards −∞ = s(x).
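In the fair-coin example above (X_i ∈ {0, 1} with probability 1/2 each) and for x = 1, the superadditivity is in fact an equality: P(X̄_{n+m} ≥ 1) = 2^{−(n+m)} = P(X̄_n ≥ 1) P(X̄_m ≥ 1), and the limit −log 2 coincides with inf_{λ≥0} ( log((1 + e^λ)/2) − λ ), an infimum approached as λ → +∞ but not attained.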
Concavity. The function s : R → [−∞, 0] is concave.

Proof. For x, y ∈ R and n ≥ 1,

    { X̄_{2n} ≥ (x + y)/2 } ⊃ { (1/n) ∑_{i=1}^{n} X_i ≥ x } ∩ { (1/n) ∑_{i=n+1}^{2n} X_i ≥ y }
so that, taking the logarithm of the probability of each side, dividing by n, and
sending n to ∞, we get
    s( (x + y)/2 ) ≥ (1/2) ( s(x) + s(y) ).

Iterating, we have for every dyadic rational α ∈ [0, 1]

    s( αx + (1 − α)y ) ≥ α s(x) + (1 − α) s(y).

Since s is nonincreasing (the events {X̄_n ≥ x} shrink as x increases), we conclude
that s is concave.
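Once Cramér’s theorem is proved, one gets for the fair coin s(x) = inf_{λ≥0} ( log((1 + e^λ)/2) − λx ), which equals 0 for x ≤ 1/2, −x log(2x) − (1 − x) log(2(1 − x)) for 1/2 ≤ x ≤ 1 (hence −log 2 at x = 1, as computed above), and −∞ for x > 1: a nonincreasing concave function, as expected.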
We now finish the proof of Cramér’s theorem. At this point we know that for
all x ∈ R
    inf_{λ≥0} ( p(λ) − λx ) = inf_{λ≥0} sup_{u∈R} ( λ(u − x) + s(u) ).
It remains to prove that the latter quantity equals s(x). The result is standard in
convex function theory and known as Fenchel-Legendre duality. Let us give an
elementary proof in our setting. The right-hand side of the previous equation
is clearly greater than or equal to s(x): take u = x. To prove the converse
inequality we set
    c = inf { x ∈ R : P(X_1 ≥ x) = 0 }

and we distinguish the two cases x < c and x ≥ c.
• Suppose x < c. Then s(x) > −∞ (remember the useful inequality). Since s is
concave and nonincreasing, let

    −λ = lim_{u→x^−} ( s(u) − s(x) ) / (u − x) ≤ 0

be the left derivative of s at the point x. Then, by concavity,

    ∀u ∈ R    s(u) ≤ s(x) − λ(u − x),

so that sup_{u∈R} ( λ(u − x) + s(u) ) ≤ s(x) for this particular λ ≥ 0, from which
the result follows.
• Suppose x ≥ c. Note that

    P(X_1 > c) = lim_{ε→0^+} P(X_1 ≥ c + ε) = 0,

so that X_1 ≤ c ≤ x almost surely. Then, for all λ ≥ 0 and ε > 0,

    p(λ) − λx = log E( e^{λ(X_1−x)} ( 1_{X_1<x−ε} + 1_{x−ε≤X_1≤c} ) )
              ≤ log ( e^{−λε} + P(X_1 ≥ x − ε) ).

Taking the infimum over λ ≥ 0 and sending ε to 0 we get

    inf_{λ≥0} ( p(λ) − λx ) ≤ log P(X_1 ≥ x) ≤ s(x).
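Two illustrations of the case x ≥ c: for the fair coin, c = 1 and P(X_1 ≥ 1) = 1/2, and both sides of Cramér’s formula at x = 1 equal −log 2; for X_1 uniform on [0, 1], c = 1 but P(X_1 ≥ 1) = 0, and both sides at x = 1 equal −∞, since log E(e^{λX_1}) − λ = log((e^λ − 1)/λ) − λ tends to −∞ as λ → +∞.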
Acknowledgments. We would like to thank Cécile Dubois and Yann Fuchs
for their careful reading, and the referees for their remarks.
[1] R. Azencott, Grandes déviations et applications, in École d’Été de Probabilités
de Saint-Flour VIII-1978, Lecture Notes in Mathematics, vol. 774,
Springer-Verlag, Berlin, 1980, 1–176.
[2] R. R. Bahadur, Some Limit Theorems in Statistics, Conference Board of the
Mathematical Sciences Regional Conference Series in Applied Mathematics,
no. 4, Society for Industrial and Applied Mathematics, Philadelphia, PA,
1971.
[3] R. R. Bahadur and S. L. Zabell, Large deviations of the sample mean in
general vector spaces, Ann. Probab. 7 (1979) 587–621.
[4] R. Cerf, On Cramér’s Theory in Infinite Dimensions, Panoramas et
Synthèses, no. 23, Société Mathématique de France, Paris, 2007.
[5] H. Chernoff, A measure of asymptotic efficiency for tests of a hypothesis
based on the sum of observations, Ann. Math. Statistics 23 (1952) 493–507.
[6] H. Cramér, Sur un nouveau théorème-limite de la théorie des probabilités,
Actual. Sci. Indust. 736 (1938) 5–23.
[7] A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications,
2nd ed., Springer, New York, 1998.
[8] J.-D. Deuschel and D. W. Stroock, Large Deviations, Academic Press, San
Diego, 1989.
[9] J. M. Hammersley, Postulates for subadditive processes, Ann. Probab. 2
(1974) 652–680.
[10] O. E. Lanford III, Entropy and equilibrium states in classical statistical
mechanics, in Statistical Mechanics and Mathematical Problems, Lecture
Notes in Physics, vol. 20, Springer-Verlag, Berlin, 1973, 1–113.
Mathématique, bâtiment 425, Université Paris Sud, 91405 Orsay Cedex
[email protected]
[email protected]