The Moderate Deviations Result

Steven R. Dunbar
Department of Mathematics
203 Avery Hall
University of Nebraska-Lincoln
Lincoln, NE 68588-0130
http://www.math.unl.edu
Voice: 402-472-3731
Fax: 402-472-8466
Topics in Probability Theory and Stochastic Processes
Rating
Mathematicians Only: prolonged scenes of intense rigor.
Section Starter Question
Key Concepts
1. For any sequence $a_n$ with $\sqrt{n} \ll a_n \ll n$ we have
$$\mathbb{P}_n[S_n - pn \ge a_n] \to 0,$$
but neither the Central Limit Theorem nor the Large Deviations Principle tells us how fast the convergence is, nor what the precise rate of growth is for $a_n$. Making this precise is the domain of the Moderate Deviations Theorem.
2. Precisely, if $a_n \to \infty$ and $\lim_{n\to\infty} a_n/n^{1/6} = 0$, then
$$\mathbb{P}_n\left[\frac{S_n}{n} - p \ge \sqrt{p(1-p)}\,\frac{a_n}{\sqrt{n}}\right] \sim \frac{1}{a_n\sqrt{2\pi}}\,e^{-a_n^2/2}.$$
Vocabulary
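1. Moderate deviations results are refinements of the Central Limit Theorem of the form
$$\mathbb{P}_n\left[\frac{S_n}{n} - p \ge \sqrt{p(1-p)}\,\frac{a_n}{\sqrt{n}}\right] \sim \frac{1}{a_n\sqrt{2\pi}}\,e^{-a_n^2/2}$$
when $a_n \to \infty$ and $a_n = o(n^{1/6})$.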
Mathematical Ideas
Recall that the $X_k$ are independent Bernoulli random variables, each taking the value 1 or 0 with probability $p$ or $1-p$ respectively. Then
$$S_n = \sum_{k=1}^{n} X_k$$
is a binomial random variable indicating the number of successes in a composite experiment.
The Large Deviations Estimate shows that the probability of large deviation events of the type
$$\mathbb{P}_n\left[\frac{S_n}{n} - p > x\right],$$
that is, the sample mean exceeds the mean by more than a fixed amount $x > 0$, decays exponentially in $n$. Equivalently, the probability $\mathbb{P}_n[S_n - np \ge nx]$ that the partial sum $S_n$ exceeds its mean by more than $nx$ is exponentially small in $n$. The de Moivre-Laplace Central Limit Theorem tells us the probability that the partial sum exceeds its mean by an amount of order $\sqrt{n}$. Precisely,
$$\mathbb{P}_n\left[\frac{S_n}{n} - p \ge \sqrt{p(1-p)}\,\frac{x}{\sqrt{n}}\right] \to 1 - \Phi(x) > 0.$$
Equivalently, the probability $\mathbb{P}_n\left[S_n - np \ge \sqrt{p(1-p)}\,\sqrt{n}\,x\right]$ that the partial sum exceeds its mean by $x$ standard deviations approaches the standard normal tail probability $1 - \Phi(x)$. This implies that for any sequence $a_n$ with $\sqrt{n} \ll a_n \ll n$ (that is, $a_n/\sqrt{n} \to \infty$ and $a_n/n \to 0$) we still have
$$\mathbb{P}_n[S_n - pn \ge a_n] \to 0,$$
and neither the Central Limit Theorem nor the Large Deviations Estimate tells us how fast the convergence is, nor what the precise rate of growth is for $a_n$. Making this precise is the domain of the Moderate Deviations Theorem.
Figure 1: Comparison of the binomial distribution with n = 12, p = 4/10
with the normal distribution with mean np and variance np(1 − p).
The Moderate Deviations Theorem is due to Harald Cramér in 1938.
First we have to prove two supplementary results, each of which is interesting in its own right.
Proposition 1 (“Optimization” extension of de Moivre-Laplace Central Limit Theorem). Assume
1. For $0 \le k \le n$, define $\delta_n(k)$ by
$$\binom{n}{k} p^k (1-p)^{n-k} = \frac{1}{\sqrt{2\pi p(1-p)n}}\,\exp\left(\frac{-(k-np)^2}{2np(1-p)}\right)(1 + \delta_n(k)).$$
2. Let $c_n$ be a positive real sequence with $\lim_{n\to\infty} c_n = 0$.
3. Let $I_n' = \{k \in \mathbb{Z} : |k - np| < c_n n^{2/3}\}$.
Then
$$\lim_{n\to\infty} \max_{k \in I_n'} |\delta_n(k)| = 0.$$
Remark. In Figure 1 the amount $\delta_n(k)$ is the small relative error between the height of the normal distribution curve and the height of the binomial distribution histogram over the integer $k$.
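To make the relative error concrete, here is a minimal Python sketch (the function names are illustrative, not from the source) that evaluates $\delta_n(k)$ straight from its definition in Proposition 1, for the parameters of Figure 1, $n = 12$ and $p = 4/10$. It works with logarithms through math.lgamma so the same helpers stay usable for large $n$.

```python
import math


def log_binom_pmf(n: int, k: int, p: float) -> float:
    """Log of the binomial point mass C(n, k) p^k (1-p)^(n-k), via lgamma."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def log_normal_height(n: int, k: float, p: float) -> float:
    """Log of the N(np, np(1-p)) density evaluated at k."""
    var = n * p * (1 - p)
    return -(k - n * p) ** 2 / (2 * var) - 0.5 * math.log(2 * math.pi * var)


def delta(n: int, k: int, p: float) -> float:
    """delta_n(k) from Proposition 1: binomial mass = normal height * (1 + delta)."""
    return math.expm1(log_binom_pmf(n, k, p) - log_normal_height(n, k, p))


if __name__ == "__main__":
    n, p = 12, 0.4  # the parameters of Figure 1
    for k in range(n + 1):
        print(f"k = {k:2d}   delta_n(k) = {delta(n, k, p):+.4f}")
```

For $n = 12$ the relative error is within a few percent near $np = 4.8$ but is no longer small at the extremes $k = 0$ and $k = 12$, which is why Proposition 1 only controls $\delta_n(k)$ on a window around $np$.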
Remark. Compare the statement of this proposition to the statement of the de Moivre-Laplace Binomial Point Mass Limit, Lemma 9 in de Moivre-Laplace Central Limit Theorem. Here the domain of the maximum is $I_n' = \{k \in \mathbb{Z} : |k - np| < c_n n^{2/3}\}$, which is larger than the domain in the de Moivre-Laplace Binomial Point Mass Limit, $I_n = \{k \in \mathbb{Z} : |k - np| < a\sqrt{n}\}$, whenever $c_n n^{1/6} > a$ (for instance, for all large $n$ when $c_n \to 0$ more slowly than $n^{-1/6}$).
Proof.
1. Recall from the de Moivre-Laplace Theorem (see step 2 of the proof of Lemma 9 in de Moivre-Laplace Central Limit Theorem) that from Stirling's Formula
$$\binom{n}{k} p^k (1-p)^{n-k} = \frac{n!}{k!\,(n-k)!}\,p^k (1-p)^{n-k} = \frac{1}{\sqrt{2\pi}}\sqrt{\frac{n}{k(n-k)}}\left(\frac{np}{k}\right)^{k}\left(\frac{n(1-p)}{n-k}\right)^{n-k}\frac{1 + \epsilon_n}{(1 + \epsilon_k)(1 + \epsilon_{n-k})},$$
where $|\epsilon_n| < A/n$, $|\epsilon_k| < A/k$, $|\epsilon_{n-k}| < A/(n-k)$ for some constant $A$.
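2. For $k \in I_n'$,
$$\frac{n}{(np + c_n n^{2/3})(n(1-p) + c_n n^{2/3})} \le \frac{n}{k(n-k)} \le \frac{n}{(np - c_n n^{2/3})(n(1-p) - c_n n^{2/3})},$$
so
$$\frac{1}{n}\cdot\frac{1}{(p + c_n n^{-1/3})((1-p) + c_n n^{-1/3})} \le \frac{n}{k(n-k)} \le \frac{1}{n}\cdot\frac{1}{(p - c_n n^{-1/3})((1-p) - c_n n^{-1/3})},$$
and therefore
$$\frac{1}{np(1-p)}\cdot\frac{1}{\left(1 + \frac{c_n n^{-1/3}}{p}\right)\left(1 + \frac{c_n n^{-1/3}}{1-p}\right)} \le \frac{n}{k(n-k)} \le \frac{1}{np(1-p)}\cdot\frac{1}{\left(1 - \frac{c_n n^{-1/3}}{p}\right)\left(1 - \frac{c_n n^{-1/3}}{1-p}\right)}.$$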
Compare this to step 3 of the proof of Lemma 9 in de Moivre-Laplace Central Limit Theorem.
3. Therefore, for $k \in I_n'$,
$$\frac{n}{k(n-k)} = \frac{1}{np(1-p)}\cdot\left(1 + O_u(c_n n^{-1/3})\right), \qquad \sqrt{\frac{n}{k(n-k)}} = \frac{1}{\sqrt{np(1-p)}}\cdot\left(1 + O_u(c_n n^{-1/3})\right). \tag{1}$$
This follows from the One-Term Geometric Series Expansion and the Square-Root Expansion Proposition in the section Big-Oh Algebra. Compare this to steps 4 and 5 of the proof of Lemma 9 in de Moivre-Laplace Central Limit Theorem.
4. Since $k \in I_n'$, $\frac{k-np}{k} = O_u(c_n n^{-1/3})$ and $\frac{k-np}{n-k} = O_u(c_n n^{-1/3})$. Compare this to step 7 of the proof of Lemma 9 in de Moivre-Laplace Central Limit Theorem.
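5. Using the Taylor series expansion for the logarithm (the first-order terms $-(k-np)$ and $+(k-np)$ from the two factors cancel),
$$\ln\left[\left(\frac{np}{k}\right)^{k}\left(\frac{n(1-p)}{n-k}\right)^{n-k}\right] = -\frac{1}{2}(k-np)^2\left(\frac{1}{k} + \frac{1}{n-k}\right) + k\,O_u(c_n^3 n^{-1}) + (n-k)\,O_u(c_n^3 n^{-1})$$
$$= -\frac{1}{2}(k-np)^2\,\frac{1}{np(1-p)} + O_u(c_n^3).$$
The second equality uses $\frac{1}{k} + \frac{1}{n-k} = \frac{n}{k(n-k)}$ together with (1); the resulting error $(k-np)^2\cdot\frac{O_u(c_n n^{-1/3})}{np(1-p)}$ is $O_u(c_n^3)$ because $(k-np)^2 < c_n^2 n^{4/3}$. Compare this to step 8 of the proof of Lemma 9 in de Moivre-Laplace Central Limit Theorem.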
6. Thus
$$\left(\frac{np}{k}\right)^{k}\left(\frac{n(1-p)}{n-k}\right)^{n-k} = \exp\left(\frac{-(k-np)^2}{2np(1-p)}\right)\left(1 + O_u(c_n^3)\right). \tag{2}$$
See the Exponential Expansion Proposition in the section Big-Oh Algebra.
7. Step 10 of the proof of Lemma 9 in de Moivre-Laplace Central Limit Theorem showed why
$$\frac{1 + \epsilon_n}{(1 + \epsilon_k)(1 + \epsilon_{n-k})} = 1 + O_u\left(\frac{1}{n}\right). \tag{3}$$
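8. Now combining equations (1), (2) and (3) above into step 1, we get
$$\binom{n}{k} p^k (1-p)^{n-k} = \frac{1}{\sqrt{2\pi p(1-p)n}}\,\exp\left(\frac{-(k-np)^2}{2np(1-p)}\right)\left(1 + O_u(c_n')\right),$$
where $c_n' = \max\left(c_n n^{-1/3}, c_n^3, n^{-1}\right)$. Since $c_n' \to 0$ and the $O_u$ bounds are uniform in $k \in I_n'$, it follows that $\max_{k \in I_n'} |\delta_n(k)| \to 0$.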
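Proposition 1 can be checked numerically. The Python sketch below computes $\max_{k \in I_n'} |\delta_n(k)|$ for $p = 0.3$ and the arbitrary choice $c_n = 1/\ln n$ (any positive sequence tending to $0$ would do); that choice, the value of $p$, and the helper names are assumptions made for illustration. The printed maxima should shrink as $n$ grows.

```python
import math


def log_binom_pmf(n, k, p):
    """Log of C(n, k) p^k (1-p)^(n-k), via lgamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def log_normal_height(n, k, p):
    """Log of the N(np, np(1-p)) density evaluated at k."""
    var = n * p * (1 - p)
    return -(k - n * p) ** 2 / (2 * var) - 0.5 * math.log(2 * math.pi * var)


def max_abs_delta(n, p, c_n):
    """max |delta_n(k)| over I'_n = {k in Z : |k - np| < c_n n^(2/3)}."""
    half_width = c_n * n ** (2 / 3)
    lo = max(math.floor(n * p - half_width) + 1, 0)  # smallest k with k > np - w
    hi = min(math.ceil(n * p + half_width) - 1, n)   # largest k with k < np + w
    return max(abs(math.expm1(log_binom_pmf(n, k, p) - log_normal_height(n, k, p)))
               for k in range(lo, hi + 1))


if __name__ == "__main__":
    p = 0.3
    for n in (10**3, 10**4, 10**5, 10**6):
        c_n = 1 / math.log(n)  # an arbitrary sequence with c_n -> 0
        print(f"n = {n:>8}   max |delta_n(k)| on I'_n = {max_abs_delta(n, p, c_n):.2e}")
```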
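Proposition 2. Assume
1. $k_n$ and $\ell_n$ are two sequences with $k_n < \ell_n$ for all $n$.
2. $k_n = np + o(n^{2/3})$ and $\ell_n = np + o(n^{2/3})$; i.e., $k_n = np + c_n' n^{2/3}$ where $c_n' \to 0$ as $n \to \infty$ and $\ell_n = np + c_n'' n^{2/3}$ where $c_n'' \to 0$ as $n \to \infty$.
3. Let $a_n = \frac{k_n - np}{\sqrt{np(1-p)}}$ and $b_n = \frac{\ell_n - np}{\sqrt{np(1-p)}}$.
Then
$$\mathbb{P}_n[k_n \le S_n \le \ell_n] \sim \frac{1}{\sqrt{2\pi}}\int_{a_n}^{b_n} e^{-x^2/2}\,dx$$
as $n \to \infty$.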
Remark. If $(a_n)$ and $(b_n)$ converge respectively to $a$ and $b$ with $a < b$, then this proposition becomes the de Moivre-Laplace Central Limit Theorem.
Proof.
1. Take $n$ so large that $0 \le k_n < \ell_n \le n$.
2. Let $h(n) = \frac{1}{\sqrt{np(1-p)}}$. Then by the Optimization Proposition 1,
$$\mathbb{P}_n[S_n = j] = \frac{h(n)}{\sqrt{2\pi}}\exp\left(\frac{-(j-np)^2}{2np(1-p)}\right)(1 + \delta_n(j))$$
and
$$\mathbb{P}_n[k_n \le S_n < \ell_n] = \frac{h(n)}{\sqrt{2\pi}}\sum_{j=k_n}^{\ell_n - 1}\exp\left(\frac{-(j-np)^2}{2np(1-p)}\right)(1 + \delta_n(j)). \tag{4}$$
The hypotheses on the sequences $(k_n)$ and $(\ell_n)$ along with Proposition 1 imply that $\delta_n(j)$ converges to zero uniformly for $k_n \le j \le \ell_n$.
3. Therefore it suffices to show that
$$h(n)\sum_{j=k_n}^{\ell_n - 1}\exp\left(\frac{-(j-np)^2}{2np(1-p)}\right) \sim \int_{a_n}^{b_n} e^{-x^2/2}\,dx.$$
4. Set
$$x(j) = \frac{j - np}{\sqrt{np(1-p)}}.$$
Then $a_n = x(k_n)$, $b_n = x(\ell_n)$, and $x(j+1) - x(j) = h(n)$.
5. The claim is:
$$h(n)\sum_{j=k_n}^{\ell_n - 1}\exp\left(\frac{-(j-np)^2}{2np(1-p)}\right) - \int_{a_n}^{b_n} e^{-x^2/2}\,dx = o\left(\int_{a_n}^{b_n} e^{-x^2/2}\,dx\right). \tag{5}$$
This claim will follow by considering the Riemann sums for the integral of $e^{-x^2/2}$.
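6. In the case $a_n > 0$, the integrand $e^{-x^2/2}$ is decreasing on each interval $[x(j), x(j+1)]$, which has length $h(n)$, so
$$h(n)\exp\left(-\frac{x(j+1)^2}{2}\right) < \int_{x(j)}^{x(j+1)} \exp(-x^2/2)\,dx < h(n)\exp\left(-\frac{x(j)^2}{2}\right).$$
Summing over $k_n \le j \le \ell_n - 1$, the upper and lower sums differ by a telescoping sum, and we obtain
$$0 \le h(n)\sum_{j=k_n}^{\ell_n - 1}\exp\left(\frac{-x(j)^2}{2}\right) - \int_{a_n}^{b_n}\exp(-x^2/2)\,dx \le h(n)\left(\exp(-a_n^2/2) - \exp(-b_n^2/2)\right). \tag{6}$$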
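7. Also, since $0 < x/b_n \le 1$ on $[a_n, b_n]$,
$$\int_{a_n}^{b_n}\exp(-x^2/2)\,dx \ge \frac{1}{b_n}\int_{a_n}^{b_n} x\exp(-x^2/2)\,dx = \frac{1}{b_n}\left(\exp(-a_n^2/2) - \exp(-b_n^2/2)\right). \tag{7}$$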
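8. Note $h(n) = o(b_n^{-1})$ since $b_n = o(n^{1/6})$ by hypothesis 2. Then combining (6) and (7) bounds the left side of (5) by $h(n)\,b_n\int_{a_n}^{b_n} e^{-x^2/2}\,dx$ with $h(n)\,b_n \to 0$, which yields (5).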
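The Riemann-sum comparison behind claim (5) is easy to test. This Python sketch is an illustration only: the helper name and the choice $n = 10^6$, $p = 1/2$, $k_n = 501500$, $\ell_n = 502000$ (so that $a_n = 3$ and $b_n = 4$) are assumptions. It computes the left Riemann sum with mesh $h(n)$, evaluates the integral in closed form as $\sqrt{\pi/2}\,\bigl(\operatorname{erfc}(a_n/\sqrt{2}) - \operatorname{erfc}(b_n/\sqrt{2})\bigr)$, and reports the relative error next to the bound $h(n)\,b_n$ that (6) and (7) provide.

```python
import math


def riemann_sum_check(n, p, k_n, l_n):
    """Return (Riemann sum, exact integral, h(n) * b_n) for claim (5)."""
    sigma = math.sqrt(n * p * (1 - p))
    h = 1 / sigma  # h(n) = x(j+1) - x(j), the mesh of the Riemann sum

    def x(j):
        return (j - n * p) / sigma

    a_n, b_n = x(k_n), x(l_n)
    riemann = h * sum(math.exp(-x(j) ** 2 / 2) for j in range(k_n, l_n))
    # int_a^b e^{-x^2/2} dx = sqrt(pi/2) * (erfc(a/sqrt 2) - erfc(b/sqrt 2))
    integral = math.sqrt(math.pi / 2) * (math.erfc(a_n / math.sqrt(2))
                                         - math.erfc(b_n / math.sqrt(2)))
    return riemann, integral, h * b_n


if __name__ == "__main__":
    riemann, integral, bound = riemann_sum_check(10**6, 0.5, 501_500, 502_000)
    print("relative error:", riemann / integral - 1, "  bound h(n) b_n:", bound)
```

The relative error is positive because a left Riemann sum overestimates the integral of a decreasing function, and it stays below $h(n)\,b_n$, exactly as the proof predicts.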
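Theorem 3 (Moderate Deviations Theorem). Suppose
1. $(a_n)$ is a sequence of real numbers,
2. $a_n \to \infty$ as $n \to \infty$, and
3. $\lim_{n\to\infty} \frac{a_n}{n^{1/6}} = 0$.
Then
$$\mathbb{P}_n\left[\frac{S_n}{n} - p \ge \sqrt{p(1-p)}\,\frac{a_n}{\sqrt{n}}\right] \sim \frac{1}{a_n\sqrt{2\pi}}\,e^{-a_n^2/2}.$$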
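A direct numerical comparison shows how the two sides of Theorem 3 approach each other. The Python sketch below (helper names are illustrative) takes $p = 1/2$ and $a_n = n^{1/8}$, the choice used in the Example, computes the exact binomial tail by summing point masses from $\lceil np + \sqrt{np(1-p)}\,a_n\rceil$ upward, and divides by $\frac{1}{a_n\sqrt{2\pi}}e^{-a_n^2/2}$. The ratio should move toward $1$ as $n$ grows, though slowly.

```python
import math


def log_binom_pmf(n, k, p):
    """Log of C(n, k) p^k (1-p)^(n-k), via lgamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def upper_tail(n, p, m):
    """P[S_n >= m] for m above the mean, stopping once terms are negligible."""
    total = 0.0
    for j in range(m, n + 1):
        term = math.exp(log_binom_pmf(n, j, p))
        total += term
        if term < 1e-17 * total:  # point masses decrease for j > np
            break
    return total


def compare(n, p, a_n):
    """Exact left side of Theorem 3 and the moderate deviations estimate."""
    threshold = n * p + math.sqrt(n * p * (1 - p)) * a_n
    exact = upper_tail(n, p, math.ceil(threshold))  # S_n is integer valued
    estimate = math.exp(-a_n**2 / 2) / (a_n * math.sqrt(2 * math.pi))
    return exact, estimate


if __name__ == "__main__":
    p = 0.5
    for n in (10**4, 10**5, 10**6):
        a_n = n ** (1 / 8)
        exact, estimate = compare(n, p, a_n)
        print(f"n = {n:>8}  a_n = {a_n:.3f}  exact = {exact:.4e}  "
              f"estimate = {estimate:.4e}  ratio = {exact / estimate:.4f}")
```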
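Remark. Step 6 of the proof of the Moderate Deviations Theorem shows that
$$\frac{1}{\sqrt{2\pi}}\int_{a_n}^{b_n} e^{-x^2/2}\,dx \sim \frac{1}{a_n\sqrt{2\pi}}\,e^{-a_n^2/2}.$$
Since $\int_{a_n}^{b_n} e^{-x^2/2}\,dx \le \int_{a_n}^{\infty} e^{-x^2/2}\,dx \le \frac{1}{a_n}e^{-a_n^2/2}$ by the same estimate as in step 6(a), the integral up to $\infty$ has the same asymptotics, so that an equivalent result is that
$$\mathbb{P}_n\left[\frac{S_n}{n} - p \ge \sqrt{p(1-p)}\,\frac{a_n}{\sqrt{n}}\right] \sim \frac{1}{\sqrt{2\pi}}\int_{a_n}^{\infty} e^{-x^2/2}\,dx = 1 - \Phi(a_n).$$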
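The asymptotic equivalence in this remark is the classical Mills-ratio estimate for the normal tail. This short Python sketch (function names are illustrative) compares $1 - \Phi(a)$, computed with math.erfc, against $\frac{1}{a\sqrt{2\pi}}e^{-a^2/2}$. The upper bound checked in the test is the estimate of step 6(a) with $b_n$ replaced by $\infty$; the lower bound $\frac{a}{1+a^2}\cdot\frac{e^{-a^2/2}}{\sqrt{2\pi}} \le 1 - \Phi(a)$ is a standard inequality that is not proved in this section.

```python
import math


def normal_tail(a):
    """1 - Phi(a) = (1/sqrt(2 pi)) * integral from a to infinity of e^{-x^2/2}."""
    return 0.5 * math.erfc(a / math.sqrt(2))


def mills_estimate(a):
    """Right-hand side of the Moderate Deviations Theorem: e^{-a^2/2} / (a sqrt(2 pi))."""
    return math.exp(-a * a / 2) / (a * math.sqrt(2 * math.pi))


if __name__ == "__main__":
    for a in (1, 2, 3, 5, 10):
        ratio = normal_tail(a) / mills_estimate(a)
        print(f"a = {a:2d}   (1 - Phi(a)) / estimate = {ratio:.5f}")
```

The ratio climbs toward $1$ roughly like $1 - 1/a^2$, which is one reason the moderate deviations estimate is noticeably too large for moderate values of $a_n$.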
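Remark. The de Moivre-Laplace Central Limit Theorem tells us that for each fixed $a$, as $n \to \infty$,
$$\mathbb{P}_n\left[\frac{S_n}{n} - p \ge \sqrt{p(1-p)}\,\frac{a}{\sqrt{n}}\right] \to 1 - \Phi(a) = \frac{1}{\sqrt{2\pi}}\int_{a}^{\infty} e^{-x^2/2}\,dx.$$
The moderate deviations result tells us that this estimate remains true, in the sense of asymptotic equivalence, when $a$ is replaced by a sequence $a_n$ that approaches $\infty$ at a slow enough rate, namely $a_n = o(n^{1/6})$.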
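Proof.
1. The hypothesis 3 says that $a_n \to \infty$ less quickly than $n^{1/6}$.
2. Let $d_n = \sqrt{a_n}$. Then $\lim_{n\to\infty} d_n/a_n = 0$ and so $d_n = o(a_n)$.
3. Let $k_n = \left\lceil np + \sqrt{np(1-p)}\,a_n\right\rceil$ and $\ell_n = \left\lceil np + \sqrt{np(1-p)}\,(a_n + d_n)\right\rceil$. Schematically, for large $n$ the sequences sit relative to each other on the number line as
$$np < np + \sqrt{np(1-p)}\,a_n \le k_n < np + \sqrt{np(1-p)}\,(a_n + d_n) \le \ell_n,$$
where the strict middle inequality holds once $\sqrt{np(1-p)}\,d_n > 1$.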
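4. Since $S_n$ is integer-valued, the event $[S_n \ge k_n]$ is the same as the event $\left[S_n \ge np + \sqrt{np(1-p)}\,a_n\right] = \left[\frac{S_n}{n} - p \ge \sqrt{p(1-p)}\,\frac{a_n}{\sqrt{n}}\right]$. Thus,
$$\mathbb{P}_n\left[\frac{S_n}{n} - p \ge \sqrt{p(1-p)}\,\frac{a_n}{\sqrt{n}}\right] = \mathbb{P}_n[S_n \ge k_n] = \mathbb{P}_n[k_n \le S_n < \ell_n] + \mathbb{P}_n[S_n \ge \ell_n].$$
Step 5 below will take care of the first summand. Step 8 below will take care of the second summand.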
5. By hypothesis 3, $a_n = o(n^{1/6})$ and so $k_n, \ell_n = np + o(n^{2/3})$. From Proposition 2, set
$$a_n' = \frac{k_n - np}{\sqrt{np(1-p)}}, \qquad b_n = \frac{\ell_n - np}{\sqrt{np(1-p)}},$$
and so
$$\mathbb{P}_n[k_n \le S_n < \ell_n] \sim \frac{1}{\sqrt{2\pi}}\int_{a_n'}^{b_n} e^{-x^2/2}\,dx.$$
This allows us to say that
$$\mathbb{P}_n[k_n \le S_n < \ell_n] \sim \frac{1}{\sqrt{2\pi}}\int_{a_n}^{b_n} e^{-x^2/2}\,dx - \frac{1}{\sqrt{2\pi}}\int_{a_n}^{a_n'} e^{-x^2/2}\,dx.$$
Step 6 below will take care of the first summand. Step 7 below will take care of the second summand.
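6. The claim is that
$$\int_{a_n}^{b_n} e^{-x^2/2}\,dx \sim \frac{1}{a_n}\,e^{-a_n^2/2}.$$
(a) Notice that
$$\int_{a_n}^{b_n} e^{-x^2/2}\,dx \le \frac{1}{a_n}\int_{a_n}^{\infty} x e^{-x^2/2}\,dx = \frac{1}{a_n}\,e^{-a_n^2/2}.$$
(b) We also have that $b_n \ge a_n + d_n$, since normalizing the ceiling is at least as big as normalizing the argument of the ceiling. Thus,
$$\int_{a_n}^{b_n} e^{-x^2/2}\,dx \ge \int_{a_n}^{a_n + d_n} e^{-x^2/2}\,dx \ge \frac{1}{a_n + d_n}\int_{a_n}^{a_n + d_n} x e^{-x^2/2}\,dx = \frac{1}{a_n + d_n}\left[\exp\left(\frac{-a_n^2}{2}\right) - \exp\left(\frac{-(a_n + d_n)^2}{2}\right)\right].$$
Divide by $\frac{1}{a_n}\exp\left(\frac{-a_n^2}{2}\right)$ to get on the right-hand side
$$\frac{a_n}{a_n + d_n} - \frac{a_n}{a_n + d_n}\cdot\frac{\exp\left(-(a_n + d_n)^2/2\right)}{\exp\left(-a_n^2/2\right)} \to 1 - 0 = 1,$$
using $d_n = o(a_n)$ for the first term.
Now combine steps 6(a) and 6(b) to get the claim of step 6.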
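7. The claim is that
$$\int_{a_n}^{a_n'} e^{-x^2/2}\,dx = o\left(\frac{1}{a_n}\,e^{-a_n^2/2}\right).$$
The fact that $0 \le a_n' - a_n \le (np(1-p))^{-1/2}$ directly implies that
$$\int_{a_n}^{a_n'} e^{-x^2/2}\,dx \le \frac{1}{\sqrt{np(1-p)}}\,\exp\left(\frac{-a_n^2}{2}\right)$$
by approximating the integral with a one-box left Riemann sum, which is an upper sum because the integrand is decreasing for $x > 0$. Divide through by $\frac{1}{a_n}\exp\left(\frac{-a_n^2}{2}\right)$:
$$\frac{\int_{a_n}^{a_n'} e^{-x^2/2}\,dx}{\frac{1}{a_n}\exp\left(\frac{-a_n^2}{2}\right)} \le \frac{a_n}{\sqrt{np(1-p)}} \to 0,$$
since $a_n = o(n^{1/6})$.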
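8. The claim is that
$$\mathbb{P}_n[S_n \ge \ell_n] = o\left(\frac{1}{a_n}\,e^{-a_n^2/2}\right).$$
(a) By the Large Deviations Theorem we have
$$\mathbb{P}_n[S_n \ge \ell_n] \le \exp\left(-n\,h_+\left(\sqrt{p(1-p)}\,\frac{b_n}{\sqrt{n}}\right)\right),$$
where $h_+(\epsilon) = \frac{\epsilon^2}{2p(1-p)} + O(\epsilon^3)$ for $\epsilon \to 0$. Thus,
$$\mathbb{P}_n[S_n \ge \ell_n] \le \exp\left(\frac{-b_n^2}{2} + O\left(\frac{b_n^3}{\sqrt{n}}\right)\right) \sim \exp\left(\frac{-b_n^2}{2}\right),$$
since $b_n = o(n^{1/6})$, so that $b_n^3/\sqrt{n} \to 0$.
(b) Notice that
$$\exp\left(\frac{-b_n^2}{2}\right) \le \exp\left(\frac{-(a_n + d_n)^2}{2}\right) = o\left(\exp\left(\frac{-a_n^2}{2}\right)\exp\left(\frac{-d_n^2}{2}\right)\right).$$
(c) We can see that $\exp\left(\frac{-d_n^2}{2}\right) \le \frac{1}{a_n}$ by our choice of $d_n$. (Note that $d_n > \sqrt{2\ln a_n}$.)
This concludes step 8.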
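The construction in the proof can be traced numerically. This Python sketch (helper names are illustrative) builds $d_n$, $k_n$, $\ell_n$, $a_n'$ and $b_n$ as in steps 2, 3 and 5 for $p = 1/2$, $n = 10^6$ and $a_n = n^{1/8}$, computes the two summands of step 4 exactly, and checks the step 8(a) bound. The function $h_+$ is not defined in this section; the code assumes it is the relative entropy $h_+(\epsilon) = (p+\epsilon)\ln\frac{p+\epsilon}{p} + (1-p-\epsilon)\ln\frac{1-p-\epsilon}{1-p}$ from the Chernoff bound, which has the expansion $\frac{\epsilon^2}{2p(1-p)} + O(\epsilon^3)$ used in step 8(a).

```python
import math


def log_binom_pmf(n, k, p):
    """Log of C(n, k) p^k (1-p)^(n-k), via lgamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def prob_range(n, p, lo, hi=None):
    """P[lo <= S_n < hi]; hi=None means P[S_n >= lo]. Assumes lo > np."""
    stop = n + 1 if hi is None else hi
    total = 0.0
    for j in range(lo, stop):
        term = math.exp(log_binom_pmf(n, j, p))
        total += term
        if hi is None and term < 1e-17 * total:
            break  # point masses decrease for j > np, so the rest is negligible
    return total


def h_plus(p, eps):
    """ASSUMED form of h_+: relative entropy of Bernoulli(p + eps) to Bernoulli(p)."""
    q = p + eps
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))


def proof_quantities(n, p):
    sigma = math.sqrt(n * p * (1 - p))
    a_n = n ** (1 / 8)                       # the sequence used in the Example
    d_n = math.sqrt(a_n)                     # step 2
    k_n = math.ceil(n * p + sigma * a_n)     # step 3
    l_n = math.ceil(n * p + sigma * (a_n + d_n))
    a_prime = (k_n - n * p) / sigma          # step 5
    b_n = (l_n - n * p) / sigma
    target = math.exp(-a_n**2 / 2) / (a_n * math.sqrt(2 * math.pi))
    middle = prob_range(n, p, k_n, l_n)      # first summand of step 4
    far = prob_range(n, p, l_n)              # second summand, P[S_n >= l_n]
    ld_bound = math.exp(-n * h_plus(p, l_n / n - p))  # step 8(a)
    return dict(sigma=sigma, a_n=a_n, d_n=d_n, k_n=k_n, l_n=l_n,
                a_prime=a_prime, b_n=b_n, target=target,
                middle=middle, far=far, ld_bound=ld_bound)


if __name__ == "__main__":
    q = proof_quantities(10**6, 0.5)
    print("k_n, l_n        =", q["k_n"], q["l_n"])
    print("middle / target =", q["middle"] / q["target"])
    print("far / target    =", q["far"] / q["target"])
    print("far <= bound    :", q["far"] <= q["ld_bound"])
```

The first summand carries essentially all of the probability, while $\mathbb{P}_n[S_n \ge \ell_n]$ is many orders of magnitude smaller than $\frac{1}{a_n\sqrt{2\pi}}e^{-a_n^2/2}$, in line with step 8.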
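Example. Take $a_n = n^{1/8}$, so that $a_n \to \infty$ and $\lim_{n\to\infty} a_n/n^{1/6} = \lim_{n\to\infty} n^{-1/24} = 0$. Take $p = 1/2$. Take $n = 10^4$, so $a_{10^4} = \sqrt{10}$ and
$$\mathbb{P}_n\left[\frac{S_{10^4}}{10^4} - \frac{1}{2} \ge \frac{1}{2}\cdot\frac{\sqrt{10}}{10^2}\right] = \mathbb{P}\left[S_{10^4} \ge 5000 + 50\sqrt{10}\right].$$
Since $5000 + 50\sqrt{10} \approx 5158.11$ and $S_{10^4}$ is integer-valued, this is the event $[S_{10^4} \ge 5159]$. The commands below subtract 1 from the non-integer threshold, and the R function pbinom and the Octave function binocdf round their first argument down, so they evaluate $\mathbb{P}[S_{10^4} \ge 5158]$; this differs from the exact tail by the single term $\mathbb{P}[S_{10^4} = 5158]$.
R:
1-pbinom(5000+50*sqrt(10)-1, 10^4,0.5)
0.0008156979
(1/(sqrt(10*2
Octave:
1-binocdf(5000+50*sqrt(10)-1, 10^4, 0.5)
8.1570e-04
(1/(sqrt(10*
Using the de Moivre-Laplace Central Limit Theorem in R: 1- pnorm(5000+50*sqrt (10), mea
0.0007827011.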
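The numbers in the Example can also be reproduced without R or Octave. The following Python sketch (standard library only; the helper names are illustrative) sums the exact binomial tail $\mathbb{P}[S_{10^4} \ge 5159]$, evaluates the normal tail $1 - \Phi(\sqrt{10})$ with math.erfc, and computes the moderate deviations estimate $\frac{1}{\sqrt{10}\sqrt{2\pi}}e^{-5}$, which is about $0.00085$.

```python
import math


def log_binom_pmf(n, k, p):
    """Log of C(n, k) p^k (1-p)^(n-k), via lgamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def upper_tail(n, p, m):
    """P[S_n >= m], summing every point mass from m to n."""
    return sum(math.exp(log_binom_pmf(n, j, p)) for j in range(m, n + 1))


n, p = 10**4, 0.5
a_n = n ** (1 / 8)                      # equals sqrt(10) up to rounding
threshold = n * p + 50 * math.sqrt(10)  # 5000 + 50 sqrt(10), about 5158.11
m = math.ceil(threshold)                # smallest integer value of S_n in the event
exact = upper_tail(n, p, m)
normal = 0.5 * math.erfc(a_n / math.sqrt(2))  # 1 - Phi(a_n)
estimate = math.exp(-a_n**2 / 2) / (a_n * math.sqrt(2 * math.pi))

print("m =", m)
print("exact binomial tail:", exact)
print("normal tail 1 - Phi(a_n):", normal)
print("moderate deviations estimate:", estimate)
```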
Sources
The explanatory remarks at the beginning comparing the Moderate Deviations Theorem to the Large Deviations Theorem and the Central Limit Theorem are from the survey article by Mörters [2]. This section is adapted from Heads or Tails, by Emmanuel Lesigne, Student Mathematical Library Volume 28, American Mathematical Society, Providence, 2005, Chapter 8 [1].
Algorithms, Scripts, Simulations
Algorithm
The experiment is to flip a coin $n$ times and to repeat the experiment $k$ times. Then compare the empirical probability of a moderate deviation with the estimate from the Moderate Deviations Theorem.
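As an alternative to simulating all $nk$ individual coin flips, one can draw the $k$ totals $S_n$ directly from the binomial distribution. The Python sketch below assumes NumPy's Generator API (numpy.random.default_rng and its binomial method); the function name, the seed, and the larger number of repetitions $k = 10^6$ are illustrative choices, not from the source. The larger $k$ matters because the target probability is below $10^{-3}$, so with $k = 1000$ repetitions the empirical frequency is typically $0$, $0.001$ or $0.002$.

```python
import numpy as np


def moderate_deviation_experiment(n=10_000, p=0.5, k=1_000_000, seed=12345):
    """Repeat the n-flip experiment k times; return (empirical, theoretical)."""
    rng = np.random.default_rng(seed)
    # Draw the k totals S_n directly instead of storing n*k individual flips.
    heads_total = rng.binomial(n, p, size=k)
    a_n = n ** (1 / 8)
    threshold = p * n + np.sqrt(p * (1 - p) * n) * a_n
    empirical = np.mean(heads_total >= threshold)  # fraction of moderate deviations
    theoretical = np.exp(-a_n**2 / 2) / (np.sqrt(2 * np.pi) * a_n)
    return float(empirical), float(theoretical)


if __name__ == "__main__":
    empirical, theoretical = moderate_deviation_experiment()
    print("Empirical probability:", empirical)
    print("Moderate Deviations Theorem estimate:", theoretical)
```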
Scripts
R script for Moderate Deviations
p <- 0.5
n <- 10000
k <- 1000
coinFlips <- array(0 + (runif(n*k) <= p), dim = c(n, k))
# 0+ coerces Boolean to numeric
headsTotal <- colSums(coinFlips)
# 0..n binomial rv sample, size k
an <- n^(1/8)
mu <- p*n
stddev <- sqrt(p*(1-p)*n)
moddev <- mu + stddev*an
prob <- sum(0 + (headsTotal > moddev))/k
theoretical <- (1/(sqrt(2*pi)*an))*exp(-(an)^2/2)
cat(sprintf("Empirical probability: %f \n", prob))
cat(sprintf("Moderate Deviations Theorem estimate: %f \n", theoretical))
Octave script for Moderate Deviations
p = 0.5;
n = 10000;
k = 1000;
coinFlips = rand(n, k) <= p;
headsTotal = sum(coinFlips);
# 0..n binomial rv sample, size k
an = n^(1/8);
mu = p*n;
stddev = sqrt(p*(1-p)*n);
moddev = mu + stddev*an;
prob = sum(headsTotal > moddev)/k;
theoretical = (1/(sqrt(2*pi)*an))*exp(-(an)^2/2);
disp("Empirical probability: "), disp(prob)
disp("Moderate Deviations Theorem estimate: "), disp(theoretical)
Perl PDL script for Moderate Deviations
use PDL;                     # core PDL, provides random() and sumover
use PDL::NiceSlice;
use PDL::Constants qw(PI);
$p = 0.5;
$n = 10000;
$k = 1000;
$coinFlips = random($k, $n) <= $p;   # note order of dims!!
$headsTotal = $coinFlips->transpose->sumover;
# 0..n binomial r.v. sample, size k
# note transpose: sumover sums along the first (x, row) dimension, which PDL threads over implicitly
$an = $n**(1/8);
$mu = $p * $n;
$stddev = sqrt($p * (1 - $p) * $n);
$moddev = $mu + $stddev * $an;
$prob = (($headsTotal > $moddev)->sumover) / $k;
$theoretical = (1 / (sqrt(2 * PI) * $an)) * exp(-($an**2) / 2);
print "Empirical probability: ", $prob, "\n";
print "Moderate Deviations Theorem estimate: ", $theoretical, "\n";
Scientific Python script for Moderate Deviations
import numpy as np
# NumPy supplies random, sum, sqrt, exp and pi; older SciPy versions re-exported them as scipy.random, scipy.sum, ...

p = 0.5
n = 10000
k = 1000

coinFlips = np.random.random((n, k)) <= p
# Note Booleans True for Heads and False for Tails
headsTotal = np.sum(coinFlips, axis=0)  # 0..n binomial r.v. sample, size k
# Note how Booleans act as 0 (False) and 1 (True)
an = n**(1./8.)
mu = p*n
stddev = np.sqrt(p*(1-p)*n)
moddev = mu + stddev*an
prob = np.sum(headsTotal > moddev) / k
# In Python 3, / is true division, so prob is already a float
theoretical = (1/(np.sqrt(2*np.pi)*an))*np.exp(-(an**2)/2)
print("Empirical probability:", prob)
print("Moderate Deviations Theorem estimate:", theoretical)
Problems to Work for Understanding
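1. Using Proposition 1, show that if $(a_n)$ and $(b_n)$ converge respectively to $a$ and $b$ with $a < b$, then Proposition 2 becomes the de Moivre-Laplace Central Limit Theorem.
2. Explain why
$$\mathbb{P}_n[S_n \ge \ell_n] \le \exp\left(\frac{-b_n^2}{2} + O\left(\frac{b_n^3}{\sqrt{n}}\right)\right) \sim \exp\left(\frac{-b_n^2}{2}\right),$$
if $b_n = o(n^{1/6})$.
3. Explain why for $k \in I_n'$, $\frac{k-np}{k} = O_u(c_n n^{-1/3})$ and $\frac{k-np}{n-k} = O_u(c_n n^{-1/3})$.
4. Explain why
$$\frac{\exp\left(-(a_n + d_n)^2/2\right)}{\exp\left(-a_n^2/2\right)} \to 0.$$
5. Explain why
$$\exp\left(\frac{-(a_n + d_n)^2}{2}\right) = o\left(\exp\left(\frac{-a_n^2}{2}\right)\exp\left(\frac{-d_n^2}{2}\right)\right).$$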
Reading Suggestion:
References
[1] Emmanuel Lesigne. Heads or Tails: An Introduction to Limit Theorems
in Probability, volume 28 of Student Mathematical Library. American
Mathematical Society, 2005.
[2] Peter Mörters. Large deviation theory and applications. http://people.bath.ac.uk/maspm/LDP.pdf, November 2008. Cramér's theorem, large deviations, moderate deviations.
Outside Readings and Links:
I check all the information on each page for correctness and typographical
errors. Nevertheless, some errors may occur and I would be grateful if you would
alert me to such errors. I make every reasonable effort to present current and
accurate information for public use, however I do not guarantee the accuracy or
timeliness of information on this website. Your use of the information from this
website is strictly voluntary and at your risk.
I have checked the links to external sites for usefulness. Links to external
websites are provided as a convenience. I do not endorse, control, monitor, or
guarantee the information contained in any external website. I don’t guarantee
that the links are active at all times. Use the links here with the same caution as
you would all information on the Internet. This website reflects the thoughts, interests and opinions of its author. They do not explicitly represent official positions
or policies of my employer.
Information on this website is subject to change without notice.
Steve Dunbar’s Home Page, http://www.math.unl.edu/~sdunbar1
Email to Steve Dunbar, sdunbar1 at unl dot edu
Last modified: Processed from LATEX source on November 29, 2012