Degree Project
Edgeworth Expansion and Saddle Point
Approximation for Discrete Data with
Application to Chance Games.
Rani Basna
2010-09-27
Subject: Mathematics
Level: Master
Course code: 5MA11E
Edgeworth Expansion and Saddle Point Approximation for Discrete Data With
Application to Chance Games
Abstract
We investigate two mathematical tools, the Edgeworth series expansion and the saddle point method, which are approximation techniques for estimating the distribution function of the standardized mean of independent identically distributed random variables, with particular attention to the lattice case. We then describe an important application of these tools: game developing companies can use them to reduce the amount of time needed to verify their standard requirements before approving a game.
Keywords
Characteristic function, Edgeworth expansion, Lattice random variables, Saddle point
approximation.
Acknowledgments
First I would like to show my gratitude to my supervisor Dr. Roger Pettersson for his continuous support and help. It is a pleasure to thank Boss Media® for giving me this problem. I would also like to thank my fiancée Hiba Nassar for her encouragement. Finally I want to thank my parents for their love, and I dedicate this thesis to my father who inspired me.
Rani Basna
Number of Pages: 45
Contents

1 Introduction
2 Notations and Definitions
  2.1 Characteristic Function
  2.2 Central Limit Theorem
  2.3 Definition of Lattice Distribution and Bounded Variation
3 Edgeworth Expansion
  3.1 First Case
  3.2 Second Case
    3.2.1 Auxiliary Theorems
    3.2.2 On The Remainder Term
    3.2.3 Main Theorem
    3.2.4 Simulation Edgeworth Expansion for Continuous Random Variables
  3.3 Lattice Edgeworth Expansion
    3.3.1 The Bernoulli Case
    3.3.2 Simulating for the Edgeworth Expansion Bernoulli Case
    3.3.3 General Lattice Case
    3.3.4 Continuity-Corrected Points Edgeworth Series
    3.3.5 Simulation on Triss Cards
4 Saddle Point Approximation
  4.1 Simulation With Saddle Point Approximation Method
A Matlab Codes
References
1 Introduction
The basic idea in this Master's Thesis is to define, adjust, and apply two mathematical tools: the Edgeworth expansion and the saddle point approximation. Mainly we want to estimate the cumulative distribution function of the standardized mean of independent and identically distributed random variables, specifically lattice random variables. These approximation methods allow us to reduce the number of independent random variables needed for our estimate, compared with the usual normal approximation based on the central limit theorem. This makes the methods more applicable in real-life industry; more precisely, the chance game industry may use them to diminish the time needed to publish a new trusted game. We write Matlab codes to verify the theoretical results by simulating a Triss game similar to real ones, with three boxes, and then apply the codes to this game to see how accurate our methods are.
In the second chapter we define some basic concepts and present important theorems that will help us in our work. In the third chapter we introduce the Edgeworth expansion, together with the improvement of the remainder term presented by Esseen (1968) [11], and then treat the lattice random variables case, which is the most important for us since we meet it in the applications; there we also apply the Edgeworth method to our main problem and look at the results. In the fourth chapter we describe the most important formulas of the saddle point approximation technique, without theoretical details, and finally apply the saddle point approximation method to our problem.
2 Notations and Definitions

2.1 Characteristic Function
The definitions and theorems presented below can be found, for example, in [14], [18], and [19].

Definition 2.1 (Distribution Function). For a random variable X, F_X(x) = P(X ≤ x) is called the distribution function of X.

Definition 2.2 (Characteristic Function). For a random variable X, let
\[ \psi_X(t) = E\left[e^{itX}\right] = \int_{-\infty}^{\infty} e^{itx}\, dF_X(x), \]
called the characteristic function of X. Here the integral is in the usual Stieltjes sense.
Theorem 2.1. Let X be a random variable with distribution function F and characteristic function ψ_X(t).

1. If E|X|^n < ∞ for some n = 1, 2, ..., then
\[ \left| \psi_X(t) - \sum_{j=0}^{n} \frac{(it)^j}{j!}\, E X^j \right| \le E \min\left\{ \frac{2\,|t|^n |X|^n}{n!},\; \frac{|t|^{n+1} |X|^{n+1}}{(n+1)!} \right\}. \]
In particular, if E|X| < ∞, then
\[ |\psi_X(t) - 1| \le E\min\{2, |tX|\}, \qquad |\psi_X(t) - 1 - itEX| \le E\min\left\{2|tX|,\, t^2X^2/2\right\}, \]
and if EX² < ∞, then
\[ \left|\psi_X(t) - 1 - itEX - \frac{(it)^2 EX^2}{2}\right| \le E\min\left\{t^2X^2,\, |tX|^3/6\right\}. \]

2. If E|X|^n < ∞ for all n, and (|t|^n/n!)\,E|X|^n → 0 as n → ∞ for all t ∈ ℝ, then
\[ \psi_X(t) = 1 + \sum_{j=1}^{\infty} \frac{(it)^j}{j!}\, E X^j. \]
Theorem 2.2 (Characteristic Function of Normal Random Variables). If X ∈ N(µ, σ), then its characteristic function is
\[ \psi_X(t) = \exp\left( i\mu t - \frac{\sigma^2 t^2}{2} \right). \tag{1} \]
To make the paper more self-contained a proof is included.

Proof: We know that
\[
\begin{aligned}
\psi_X(t) &= \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{itx}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx
= \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{itx}\, e^{-\frac{x^2 - 2x\mu + \mu^2}{2\sigma^2}}\, dx \\
&= \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{x^2 - 2x\mu + \mu^2 - 2itx\sigma^2}{2\sigma^2}}\, dx
= \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{x^2 - 2x\mu - 2itx\sigma^2}{2\sigma^2}}\, e^{-\frac{\mu^2}{2\sigma^2}}\, dx \\
&= \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{x^2 - 2(\mu + it\sigma^2)x}{2\sigma^2}}\, e^{-\frac{\mu^2}{2\sigma^2}}\, dx \\
&= \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{x^2 - 2(\mu+it\sigma^2)x + (\mu+it\sigma^2)^2}{2\sigma^2}}\, e^{\frac{(\mu+it\sigma^2)^2 - \mu^2}{2\sigma^2}}\, dx \\
&= \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{(x-\mu-it\sigma^2)^2}{2\sigma^2}}\, e^{\frac{(\mu+it\sigma^2)^2 - \mu^2}{2\sigma^2}}\, dx
= \frac{\exp\!\left(\mu i t - \frac{t^2\sigma^2}{2}\right)}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{(x-\mu-it\sigma^2)^2}{2\sigma^2}}\, dx,
\end{aligned}
\]
since ((µ + itσ²)² − µ²)/(2σ²) = µit − t²σ²/2. By substituting y = (x − µ − itσ²)/σ we get dy = dx/σ and
\[ \psi_X(t) = \frac{\exp\!\left(it\mu - \frac{t^2\sigma^2}{2}\right)}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{y^2}{2}}\, dy, \]
where \(\frac{1}{\sqrt{2\pi}} \int e^{-y^2/2}\, dy = 1\), so
\[ \psi_X(t) = \exp\left( it\mu - \frac{t^2\sigma^2}{2} \right). \]
Using the Maclaurin expansion of the exponential function in (1) we get
\[ \psi_X(t) = 1 + \left( \mu i t - \frac{t^2\sigma^2}{2} \right) + \frac{1}{2}\left( \mu i t - \frac{t^2\sigma^2}{2} \right)^2 + \ldots \]
In addition, if we have two independent normal random variables X and Y with
\[ \psi_X(t) = \exp\left( \mu_X it - \frac{t^2\sigma_X^2}{2} \right) \quad \text{and} \quad \psi_Y(t) = \exp\left( \mu_Y it - \frac{t^2\sigma_Y^2}{2} \right), \]
then we can easily prove that
\[ \psi_{X+Y}(t) = \psi_X(t)\psi_Y(t) = \exp\left[ (\mu_X + \mu_Y)it - \frac{t^2(\sigma_X^2 + \sigma_Y^2)}{2} \right]. \]
For the special case X ∈ N(0, 1) we have the formula ψ_X(t) = e^{-t²/2}.
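As a quick numerical sanity check of (1) (not part of the original derivation), one can compare a Monte Carlo estimate of E[e^{itX}] with the closed form; the parameter values below are arbitrary illustrations.

% Monte Carlo check of psi_X(t) = exp(i*mu*t - sigma^2*t^2/2) for X ~ N(mu,sigma)
mu = 1; sigma = 2; N = 1e5;
X = mu + sigma*randn(N,1);          % sample from N(mu,sigma)
for t = [0.2 0.5 1]
    mc = mean(exp(1i*t*X));         % empirical characteristic function at t
    ex = exp(1i*mu*t - sigma^2*t^2/2);
    fprintf('t=%.1f  MC=%.4f%+.4fi  exact=%.4f%+.4fi\n', ...
            t, real(mc), imag(mc), real(ex), imag(ex));
end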
2.2 Central Limit Theorem
Theorem 2.3 (Central Limit Theorem). Let X₁, X₂, ... be a sequence of independent and identically distributed random variables, each having mean µ and variance σ² < ∞. Then the distribution of
\[ \frac{X_1 + \ldots + X_n - n\mu}{\sigma\sqrt{n}} \]
tends to the standard normal distribution as n → ∞.

The theorem is fundamental in probability theory. One simple derivation is in Blom [3], which we follow below.
Proof: Put Y_k = (X_k − µ)/σ and S_n = ∑_{k=1}^{n} Y_k/√n. We will show that
\[ \psi_{S_n}(t) = E\left( e^{itS_n} \right) \]
converges to e^{-t²/2}, the characteristic function of the standard normal distribution. By independence,
\[ \psi_{S_n}(t) = \psi_{\sum Y_k}(t/\sqrt{n}) = \left[ \psi_Y(t/\sqrt{n}) \right]^n. \]
Furthermore
\[ \psi_Y(t) = 1 + iE(Y)t - E(Y^2)\frac{t^2}{2} + t^3 H(t), \]
where H(t) is bounded in a neighborhood of 0. We get
\[ \psi_Y(t/\sqrt{n}) = 1 + iE(Y)t/\sqrt{n} - E(Y^2)\frac{t^2}{2n} + n^{-3/2}H_n, \]
where H_n is finite and E(Y) = 0, Var(Y) = 1. Hence
\[ \psi_{S_n}(t) = \left[ 1 - \frac{t^2}{2n} + n^{-3/2}H_n \right]^n \]
and
\[ \ln \psi_{S_n}(t) = n \ln\left( 1 - \frac{t^2}{2n} + n^{-3/2}H_n \right) = n\left( -\frac{t^2}{2n} + n^{-3/2}H_n \right) \frac{\ln\left(1 - t^2/2n + n^{-3/2}H_n\right)}{-\frac{t^2}{2n} + n^{-3/2}H_n}. \]
From the logarithm property ln(1 + x)/x → 1 as x → 0. Thus
\[ n\left( -\frac{t^2}{2n} + n^{-3/2}H_n \right) \to -\frac{t^2}{2}, \quad \text{as } n \to \infty, \]
and
\[ \frac{\ln\left(1 - t^2/2n + n^{-3/2}H_n\right)}{-\frac{t^2}{2n} + n^{-3/2}H_n} \to 1, \quad \text{as } n \to \infty, \]
so ln ψ_{S_n}(t) → −t²/2, that is, ψ_{S_n}(t) → e^{-t²/2}. Since the characteristic function of S_n converges to the characteristic function of the standard normal distribution, S_n converges in distribution to a standard normal random variable; see e.g. Cramér [4].
Theorem 2.4 (Lyapunov's Theorem). Suppose that for each N the random variables X₁, X₂, ..., X_N are independent and satisfy E[X_n] = 0, Var[X_n] = σ_n², and put S_N² = ∑_{n=1}^{N} σ_n². If for some δ > 0 the expected values E[|X_k|^{2+δ}] are finite for every k, and Lyapunov's condition
\[ \lim_{N \to \infty} \frac{1}{S_N^{2+\delta}} \sum_{n=1}^{N} E\left[ |X_n|^{2+\delta} \right] = 0 \]
holds, then the central limit theorem holds.

Remark 2.5. This theorem can be considered a further development of Lyapunov's solution to the second problem of Chebyshev (see Gnedenko and Kolmogorov [13] for more details), which turned out to be much simpler and more useful in applications of the central limit theorem than former solutions.
2.3 Definition of Lattice Distribution and Bounded Variation
Definition 2.3 (Lattice distribution). A random variable X is said to have a lattice distribution if with probability 1 it takes only values of the form a + kh, k = 0, ±1, ±2, ..., where a and h > 0 are constants. We call the largest such number h the span of the distribution.
Definition 2.4 (Function of Bounded Variation). Let F(x) be a real- or complex-valued function of the real variable x. We say F(x) has bounded variation on the whole real axis if
\[ V(F) = \int_{-\infty}^{\infty} |dF(x)| < \infty. \]
For a function F of bounded variation, define F(x) at the discontinuity points in such a way that
\[ F(x) = \frac{1}{2}\left[ F(x+0) + F(x-0) \right], \]
where F(x+0) = lim_{ε↓0} F(x+ε) and F(x−0) = lim_{ε↓0} F(x−ε). If furthermore F(−∞) = 0 and F(∞) = 1, then F(x) is said to be a distribution function.
3 Edgeworth Expansion
Let X₁, X₂, ..., X_n be independent and identically distributed random variables with mean µ and variance σ². By the Central Limit Theorem,
\[ S_n = \frac{\sum_{i=1}^{n} X_i/n - \mu}{\sigma/\sqrt{n}} \]
is asymptotically normally distributed with zero mean and unit variance. What we are interested in here is the asymptotic behavior of the difference between the normal distribution Φ(x) and the distribution function F_n(x) of S_n. In other words, we want to describe the error of the normal approximation, and one way to do that is via characteristic functions.
Three cases may appear, which together cover all possibilities:

1. The characteristic function ψ_X(t) satisfies the condition
\[ \limsup_{|t| \to \infty} |\psi_X(t)| < 1, \tag{C} \]
called the Cramér condition. Then the distribution function has the following expansion:
\[ F_n(x) = \Phi(x) + \sum_{j=1}^{s} \frac{p_j(x)}{n^{j/2}}\, e^{-\frac{x^2}{2}} + O\left( \frac{1}{n^{\frac{s+1}{2}}} \right), \quad s \ge 1, \tag{2} \]
where p_j(x) is a polynomial in x.

2. Condition (C) is not satisfied and the distribution is not of lattice type. It is found that
\[ F_n(x) = \Phi(x) + \frac{\alpha_3}{6\sigma^3\sqrt{2\pi n}}\, (1 - x^2)\, e^{-\frac{x^2}{2}} + o\left( \frac{1}{\sqrt{n}} \right), \]
α₃ being the third central moment of X_i, α₃ = E[(X − EX)³].

3. F_n(x) is a lattice distribution function. Even if all moments are finite, an expansion like the latter one is impossible: F_n(x) has jumps at discontinuity points of order of magnitude 1/√n. By adding an extra term to the expansion we diminish the order of magnitude of the remainder term.

Note: when s = 1, the error term O(1/n) is of smaller order than o(1/√n); that is, when (C) is satisfied the error is much smaller than when it is not.
Remark 3.1. If the distribution function of X is absolutely continuous, i.e.
X is a continuous random variable, then (C) is valid.
Remark 3.2. Kolmogorov [13] noted that there are discrete distributions which are not lattice distributions. For instance, if X_j takes only the values ±1 and ±√3, each with probability 1/4, then its distribution is not a lattice distribution, because the system of equations
\[ -\sqrt{3} = a + k_1 h, \quad -1 = a + k_2 h, \quad 1 = a + k_3 h, \quad \sqrt{3} = a + k_4 h, \]
with k_i ∈ {0, ±1, ±2, ...}, has no solution.
3.1 First Case
Here a proof of (2) will be presented, where we assume that the distribution function of X is absolutely continuous, which implies (C); recall Remark 3.1.

We know that S_n is asymptotically normal N(0, 1). Then the characteristic function ψ_{S_n} of S_n converges to that of N(0, 1) as n → ∞:
\[ \psi_{S_n}(t) = E[\exp(itS_n)] \to E[\exp(itN)] = e^{-t^2/2}, \quad -\infty < t < \infty. \tag{3} \]
Now if we put Y = (X − µ)/σ, where X is equal in law to X_i, and ψ_Y is the characteristic function of Y, then we have
\[
\psi_{S_n}(t) = E\left[ e^{itS_n} \right] = E\left[ e^{it \sum_{i=1}^{n} \frac{X_i - \mu}{\sigma\sqrt{n}}} \right]
= E\left[ e^{\frac{it}{\sqrt{n}} \frac{X_1-\mu}{\sigma}} \right] \cdots E\left[ e^{\frac{it}{\sqrt{n}} \frac{X_n-\mu}{\sigma}} \right]
= \left( E\left[ e^{\frac{it}{\sqrt{n}} \frac{X-\mu}{\sigma}} \right] \right)^n = \left( E\left[ e^{\frac{it}{\sqrt{n}} Y} \right] \right)^n = \left( \psi_Y(t/n^{1/2}) \right)^n.
\]
We can also write the characteristic function as an expansion:
\[ \log \psi_Y(t) = \sum_{j=1}^{\infty} \frac{k_j (it)^j}{j!}. \tag{4} \]
Then
\[ \psi_Y(t) = \exp\left( k_1 it + \frac{1}{2} k_2 (it)^2 + \ldots + \frac{1}{j!} k_j (it)^j + \ldots \right). \tag{5} \]
Formula (5) follows by using the expansion log(1 + z) = z − z²/2 + ... ± z^j/j + O(z^{j+1}), replacing 1 + z by ψ_Y(t), and doing some rearrangements. In addition, from the characteristic function developed in Maclaurin's series for small values of t we have
\[ \psi_Y(t) = 1 + E[Y]\,it + \frac{1}{2} E[Y^2](it)^2 + \ldots + \frac{1}{j!} E[Y^j](it)^j + \ldots \]
We can define the cumulants k_j using the formal identity
\[ \sum_{j \ge 1} \frac{1}{j!} k_j (it)^j = \log\left( 1 + \sum_{j \ge 1} \frac{1}{j!} E[Y^j](it)^j \right) = \sum_{i \ge 1} (-1)^{i+1} \frac{1}{i} \left( \sum_{j \ge 1} \frac{1}{j!} E[Y^j](it)^j \right)^i, \]
and by equating coefficients of (it)^j,
\[
\begin{aligned}
k_1 &= E[Y], \\
k_2 &= E[Y^2] - E[Y]^2 = \mathrm{Var}(Y), \\
k_3 &= E[Y^3] - 3E[Y^2]E[Y] + 2E[Y]^3 = E\left[(Y - E[Y])^3\right], \\
k_4 &= E[Y^4] - 4E[Y^3]E[Y] - 3E[Y^2]^2 + 12E[Y^2]E[Y]^2 - 6E[Y]^4 = E\left[(Y - E[Y])^4\right] - 3(\mathrm{Var}(Y))^2, \tag{6}
\end{aligned}
\]
and so on.
We have expressed the cumulants as homogeneous polynomials of degree j in the moments. The moments, too, can be expressed as homogeneous polynomials in the cumulants.
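For the distributions used later, the standardized cumulants k₃ and k₄ are easily computed from raw moments exactly as in (6). The Matlab sketch below does this for the Triss probabilities of Section 3.3.5, mirroring the moment calculations of Appendix A (the variable names are chosen here for illustration):

% Standardized cumulants k3, k4 from raw moments, as in (6)
xvalues = [0 2 10]; pvalues = [.53 .46999 .00001];  % Triss example of Section 3.3.5
m1 = xvalues*pvalues'; m2 = (xvalues.^2)*pvalues';
m3 = (xvalues.^3)*pvalues'; m4 = (xvalues.^4)*pvalues';
sigma = sqrt(m2 - m1^2);
k3 = (m3 - 3*m1*m2 + 2*m1^3)/sigma^3;                          % third cumulant of (X-mu)/sigma
k4 = (m4 - 4*m3*m1 - 3*m2^2 + 12*m2*m1^2 - 6*m1^4)/sigma^4;    % fourth cumulant of (X-mu)/sigma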
We have standardized the random variable Y for location and scale, so now E[Y] = k₁ = 0 and Var(Y) = k₂ = 1. Hence by (4) and (5), and using the series expansion of the exponential function, we get
\[ \psi_Y(t/n^{1/2}) = \exp\left\{ k_1 \frac{it}{\sqrt{n}} + \frac{k_2}{2!}\left( \frac{it}{\sqrt{n}} \right)^2 + \ldots + \frac{k_j}{j!}\left( \frac{it}{\sqrt{n}} \right)^j + \ldots \right\} = \exp\left\{ k_1 \frac{it}{\sqrt{n}} + \frac{k_2 (it)^2}{2!\, n} + \ldots + \frac{k_j (it)^j}{j!\, n^{j/2}} + \ldots \right\}. \]
Hence
\[ \psi_{S_n}(t) = e^{k_1 it\sqrt{n}}\, e^{\frac{k_2 (it)^2}{2!}} \cdots e^{\frac{k_j (it)^j}{j!\, n^{j/2-1}}} \cdots \]
Substituting k₁ = 0 and k₂ = 1 we get
\[ \psi_{S_n}(t) = \exp\left( -\frac{t^2}{2} + n^{-1/2}\frac{k_3(it)^3}{3!} + \ldots + n^{-\frac{j-2}{2}}\frac{k_j(it)^j}{j!} + \ldots \right), \]
that is,
\[ \psi_{S_n}(t) = e^{-t^2/2}\left( 1 + n^{-1/2} r_1(it) + n^{-1} r_2(it) + \ldots + n^{-j/2} r_j(it) + \ldots \right), \tag{7} \]
where r_j is a polynomial of degree 3j depending on the cumulants; this expansion is consistent with the central limit theorem convergence (3). We can see that r_j is an even polynomial when j is even and an odd polynomial when j is odd, and it is obvious from (7) that
\[ r_1(u) = \frac{1}{6} k_3 u^3 \tag{8} \]
and
\[ r_2(u) = \frac{1}{24} k_4 u^4 + \frac{1}{72} k_3^2 u^6. \tag{9} \]
We can rewrite (7) in the form
\[ \psi_{S_n}(t) = e^{-t^2/2} + n^{-1/2} r_1(it)\, e^{-t^2/2} + n^{-1} r_2(it)\, e^{-t^2/2} + \ldots + n^{-j/2} r_j(it)\, e^{-t^2/2} + \ldots \tag{10} \]
Now since
\[ \psi_{S_n}(t) = \int_{-\infty}^{\infty} e^{itx}\, dP(S_n \le x) \]
and
\[ e^{-t^2/2} = \int_{-\infty}^{\infty} e^{itx}\, d\Phi(x), \tag{11} \]
where Φ denotes the standard normal distribution function, applying the inverse Fourier–Stieltjes transform to (10) gives
\[ P(S_n \le x) = \Phi(x) + n^{-1/2} R_1(x) + n^{-1} R_2(x) + \ldots + n^{-j/2} R_j(x) + \ldots, \tag{12} \]
where R_j(x) represents a function whose Fourier–Stieltjes transform equals r_j(it)e^{-t²/2}:
\[ \int_{-\infty}^{\infty} e^{itx}\, dR_j(x) = r_j(it)\, e^{-t^2/2}. \]
Our focus now is to compute R_j. Integrating by parts j times in (11) gives
\[ e^{-t^2/2} = (-it)^{-1} \int_{-\infty}^{\infty} e^{itx}\, d\Phi^{(1)}(x), \quad e^{-t^2/2} = (-it)^{-2} \int_{-\infty}^{\infty} e^{itx}\, d\Phi^{(2)}(x), \quad \ldots, \quad e^{-t^2/2} = (-it)^{-j} \int_{-\infty}^{\infty} e^{itx}\, d\Phi^{(j)}(x), \tag{13} \]
where Φ^{(j)}(x) = (d/dx)^j Φ(x). Writing D for the differential operator d/dx, so that r_j(−D) is a differential operator, we hence have
\[ \int_{-\infty}^{\infty} e^{itx}\, d\left\{ (-D)^j \Phi(x) \right\} = (it)^j e^{-t^2/2} \tag{14} \]
and
\[ \int_{-\infty}^{\infty} e^{itx}\, d\left\{ r_j(-D)\Phi(x) \right\} = r_j(it)\, e^{-t^2/2}, \]
so r_j(−D)Φ(x) is the function we are looking for:
\[ R_j(x) = r_j(-d/dx)\,\Phi(x), \quad j \ge 1. \tag{15} \]
The Chebyshev–Hermite polynomials can be defined by the formula
\[ H_k(x) = (-1)^k e^{x^2/2} \frac{d^k}{dx^k} e^{-x^2/2}, \]
which gives
\[ H_0(x) = 1, \quad H_1(x) = x, \quad H_2(x) = x^2 - 1, \quad H_3(x) = x^3 - 3x, \quad \ldots \tag{16} \]
Using the Hermite polynomials we can write
\[ (-D)^j \Phi(x) = -H_{j-1}(x)\,\varphi(x). \]
We note that the Hermite polynomials are orthogonal with respect to the weight φ(x), and that H_j is a polynomial of degree j which is even when j is even and odd when j is odd. We now get from (9), (8), and (15) that
\[ R_1(x) = -\frac{1}{6} k_3 (x^2 - 1)\,\varphi(x), \]
\[ R_2(x) = -x\left[ \frac{1}{24} k_4 (x^2 - 3) + \frac{1}{72} k_3^2 (x^4 - 10x^2 + 15) \right] \varphi(x). \]
For general j ≥ 1,
\[ R_j(x) = p_j(x)\,\varphi(x). \]
Here the polynomial p_j has degree 3j − 1 and is odd for even j; this is clear from the degree of r_j. Hence
\[ p_1(x) = -\frac{1}{6} k_3 (x^2 - 1) \tag{17} \]
and
\[ p_2(x) = -x\left[ \frac{1}{24} k_4 (x^2 - 3) + \frac{1}{72} k_3^2 (x^4 - 10x^2 + 15) \right]. \tag{18} \]
Now we can rewrite formula (12) as
\[ P(S_n \le x) = \Phi(x) + n^{-1/2} p_1(x)\varphi(x) + n^{-1} p_2(x)\varphi(x) + \ldots + n^{-j/2} p_j(x)\varphi(x) + \ldots, \tag{19} \]
called the Edgeworth expansion of the distribution of S_n. The third cumulant k₃ refers to skewness, so the term of order n^{-1/2} in the expansion improves the basic normal approximation of the cumulative distribution function of S_n by performing a skewness correction. In the same way k₄ refers to kurtosis in the term of order n^{-1}, which improves the normal approximation further by adjusting for kurtosis. In other words, the O(1/√n) rate of the Berry–Esseen theorem is improved to uniform errors of order n^{-1} and n^{-3/2} by the one- and two-term Edgeworth expansions.
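Once the cumulants are known, (19) is straightforward to evaluate numerically. A minimal Matlab sketch, assuming the standardized cumulants k3, k4 from the earlier sketch and a sample size n are already defined, is:

% One- and two-term Edgeworth approximations (19) of P(Sn <= x),
% with p1 and p2 taken from (17) and (18)
p1 = @(x) -k3*(x.^2 - 1)/6;
p2 = @(x) -x.*(k4*(x.^2 - 3)/24 + k3^2*(x.^4 - 10*x.^2 + 15)/72);
F1 = @(x) normcdf(x) + normpdf(x).*p1(x)/sqrt(n);   % one-term expansion
F2 = @(x) F1(x) + normpdf(x).*p2(x)/n;              % two-term expansion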
It is very rare that this expansion converges, according to Hall [15]. In fact there is a condition on this expansion (19), due to Cramér [5], which says that if X has an absolutely continuous distribution function, then the condition for convergence of (19) is
\[ E \exp\left( \frac{1}{4} Y^2 \right) < \infty, \]
which is a very severe condition. Usually (19) exists as an asymptotic series, which means that if the series is stopped at a specific order, the remainder is of smaller order than the last omitted term:
\[ P(S_n \le x) = \Phi(x) + n^{-1/2} p_1(x)\varphi(x) + n^{-1} p_2(x)\varphi(x) + \ldots + n^{-j/2} p_j(x)\varphi(x) + o(n^{-j/2}). \tag{20} \]
In fact, the remainder term o(n^{-j/2}) is much smaller, namely O(n^{-\frac{j+1}{2}}) [10]. The restrictions on (20) are
\[ E\left(|X|^{j+2}\right) < \infty \quad \text{and} \quad \limsup_{|t| \to \infty} |\psi(t)| < 1. \tag{21} \]
You can find the proof of this fact in Hall [15].

We derived the Edgeworth expansion from an expansion for the logarithm of the characteristic function of S_n. The Cramér condition (C) ensures that the characteristic function can be expanded in the requested manner. The expansion of F_n was obtained by a Fourier inversion of the expansion for the characteristic function.
3.2 Second Case

3.2.1 Auxiliary Theorems
Now we move to the case where condition (C) is not satisfied and the distribution is not of lattice type. For that we need some auxiliary theorems before the proof of the main theorem, Theorem 3.10.
Theorem 3.3. Let A, T, and ε > 0 be constants, F(x) a non-decreasing function, and G(x) a function of bounded variation. If

1. F(−∞) = G(−∞), F(+∞) = G(+∞),
2. ∫ |F(x) − G(x)| dx < ∞,
3. G′(x) exists for all x and |G′(x)| ≤ A,
4. \(\int_{-T}^{T} \left| \frac{f(t) - g(t)}{t} \right| dt = \varepsilon,\)

then to every number k > 1 there corresponds a finite positive number c(k) depending only on k such that
\[ |F(x) - G(x)| \le k\frac{\varepsilon}{2\pi} + c(k)\frac{A}{T}. \]
Theorem 3.4. Let A, T, ε be arbitrary positive constants, F(x) a non-decreasing discontinuous function, and G(x) a function of bounded variation. If

1. F(−∞) = G(−∞) = 0, F(+∞) = G(+∞),
2. ∫ |F(x) − G(x)| dx < ∞,
3. the functions F(x) and G(x) have discontinuities only at the points x = x_i (x_i < x_{i+1}; i = 0, ±1, ±2, ...), and there exists an l such that min(x_{i+1} − x_i) ≥ l,
4. everywhere except at x = x_i (i = 0, ±1, ±2, ...), |G′(x)| ≤ A,
5. \(\int_{-T}^{T} \left| \frac{f(t) - g(t)}{t} \right| dt = \varepsilon,\)

then to every number k > 1 there correspond two finite numbers c₁(k) and c₂(k) depending only on k such that
\[ |F(x) - G(x)| \le k\frac{\varepsilon}{2\pi} + c_1(k)\frac{A}{T} \]
whenever Tl ≥ c₂(k).
3.2.2 On The Remainder Term

Theorem 3.5. If the random variables X₁, X₂, ..., X_n, ... have finite third moments, then
\[ |F_n(x) - \Phi(x)| \le c\, \frac{\rho_3}{\sqrt{n}}, \]
where ρ₃ = β₃/σ³ and c is a constant.
Theorem 3.6. If the random variables X₁, X₂, ..., X_n, ... are identically distributed and have finite third moments, then for
\[ |t| \le \frac{\sigma^3 \sqrt{n}}{5\beta_3} = T_n \]
the following inequality holds:
\[ \left| f_n(t) - e^{-\frac{t^2}{2}} \right| \le \frac{7}{6}\, \frac{|t|^3 \beta_3}{\sigma^3 \sqrt{n}}\, e^{-\frac{t^2}{4}}, \]
where β₃ denotes the absolute moment of order 3,
\[ \beta_3 = \int_{-\infty}^{\infty} |x|^3\, dF(x). \]
We found that the characteristic function of S_n = (∑_{i=1}^{n} X_i/n − µ)/(σ/√n), where the X_i are independent and identically distributed, is
\[ \psi_{S_n}(t) = e^{-t^2/2} \left\{ 1 + \sum_{j=1}^{\infty} r_j(it)\left( \frac{1}{\sqrt{n}} \right)^j \right\}. \tag{22} \]
Theorem 3.7. If in the sum (22) the summands have finite moments of order s, where s ≥ 3, then for |t| ≤ T_{sn} = √n/(8 s ρ_s^{3/s}), where ρ_s = β_s/σ^s, the inequality
\[ \left| f_n(t) - e^{-\frac{t^2}{2}} \left( 1 + \sum_{j=1}^{s-3} r_j(it)\left( \frac{1}{\sqrt{n}} \right)^j \right) \right| \le \frac{c_1(s)}{T_{sn}^{s-2}} \left( |t|^s + |t|^{3(s-2)} \right) e^{-\frac{t^2}{4}} \tag{23} \]
holds, where c₁(s) depends only on s; also the inequality
\[ \left| f_n(t) - e^{-\frac{t^2}{2}} \left( 1 + \sum_{j=1}^{s-2} r_j(it)\left( \frac{1}{\sqrt{n}} \right)^j \right) \right| \le \frac{\delta(n)}{n^{\frac{s-2}{2}}} \left( |t|^s + |t|^{3(s-1)} \right) e^{-\frac{t^2}{4}} \tag{24} \]
holds, where δ(n) depends only on n and
\[ \lim_{n \to \infty} \delta(n) = 0. \]

Remark 3.8. By Gnedenko and Kolmogorov [13, p. 204, Theorem 1b] we instead have
\[ \left| f_n(t) - e^{-\frac{t^2}{2}} \left( 1 + \sum_{j=1}^{s-2} r_j(it)\left( \frac{1}{\sqrt{n}} \right)^j \right) \right| \le \frac{c_2(s)\,\delta(n)}{T_{sn}^{s-2}} \left( |t|^s + |t|^{3(s-2)} \right) e^{-\frac{t^2}{4}}. \]
Theorem 3.9. If the distribution function F(x) is non-lattice, then whatever the number w > 0 may be, there exists a function λ(n) such that
\[ \lim_{n \to \infty} \lambda(n) = \infty \]
and
\[ I = \int_{w}^{\lambda(n)} \frac{|f(t)|^n}{t}\, dt = o\left( \frac{1}{\sqrt{n}} \right). \]
The theorem we mention next was first proved by Cramér [5] under condition (C), and later by Esseen [10], whose version we present here.
3.2.3 Main Theorem

Theorem 3.10. If the independent random variables X₁, X₂, ..., X_n are identically distributed, non-lattice, and have finite third moments, then
\[ F_n(x) = \Phi(x) + \varphi(x)\frac{p_1(x)}{\sqrt{n}} + o\left( \frac{1}{\sqrt{n}} \right) \tag{25} \]
uniformly in x, where p₁(x) = (k₃/6)(1 − x²).
Proof: Put s = 3 in formula (24) of Theorem 3.7. We then deduce that
\[ \left| f_n(t) - e^{-\frac{t^2}{2}} - \frac{r_1(it)}{\sqrt{n}}\, e^{-\frac{t^2}{2}} \right| \le \frac{\delta(n)}{\sqrt{n}} \left( |t|^3 + |t|^6 \right) e^{-\frac{t^2}{4}}. \tag{26} \]
The characteristic function of the function
\[ R_1(x) = -\frac{k_3}{6}\, \Phi^{(3)}(x) = \frac{k_3}{6\sqrt{2\pi}}\, (1 - x^2)\, e^{-x^2/2} \]
is equal to
\[ \frac{k_3}{6}\, (it)^3 e^{-t^2/2} = r_1(it)\, e^{-t^2/2}. \]
Now apply Theorem 3.3 with
\[ F(x) = F_n(x), \qquad G(x) = \Phi(x) + \frac{1}{\sqrt{n}} R_1(x), \]
\[ A = \max_x |G'(x)| < +\infty, \qquad T = \lambda(n)\sqrt{n}. \]
Without loss of generality we can suppose that T ≥ T₃ₙ, and we then estimate the integral
\[ \varepsilon = \int_{-T}^{T} \left| \frac{f_n(t) - g(t)}{t} \right| dt = \int_{-T}^{-T_{3n}} + \int_{-T_{3n}}^{T_{3n}} + \int_{T_{3n}}^{T}. \]
From (26) we get
\[ \int_{-T_{3n}}^{T_{3n}} \left| \frac{f_n(t) - g(t)}{t} \right| dt \le \frac{\delta(n)}{\sqrt{n}} \int \left( t^2 + |t|^5 \right) e^{-t^2/4}\, dt = o\left( \frac{1}{\sqrt{n}} \right), \]
and according to [13, p. 202] we obtain
\[ \int_{T_{3n}}^{T} \left| \frac{f_n(t) - g(t)}{t} \right| dt \le \int_{T_{3n}}^{T} \left| f\left( \frac{t}{B_n} \right) \right|^n \frac{dt}{t} + \int_{T_{3n}}^{T} |g(t)|\, \frac{dt}{t}, \]
where B_n = √(nβ₂) = √n σ. However,
\[ \int_{T_{3n}}^{T} |g(t)|\, \frac{dt}{t} \le \frac{1}{\sqrt{2\pi}\, T_{3n}} \int_{T_{3n}}^{T} e^{-t^2/2} \left( 1 + \frac{k_3 |t|^3}{6\sqrt{n}} \right) dt = o\left( \frac{1}{\sqrt{n}} \right), \]
and by the previous theorem we get
\[ \int_{T_{3n}}^{T} \left| f\left( \frac{t}{B_n} \right) \right|^n \frac{dt}{t} = \int_{w}^{\lambda(n)} |f(t)|^n\, \frac{dt}{t} = o\left( \frac{1}{\sqrt{n}} \right). \]
The integral ∫_{−T}^{−T₃ₙ} can be estimated in the same way. Hence ε = o(1/√n), and by applying Theorem 3.3 we get the inequality
\[ \left| F_n(x) - \Phi(x) - \frac{R_1(x)}{\sqrt{n}} \right| \le a\frac{\varepsilon}{2\pi} + \frac{c(a)A}{\lambda(n)\sqrt{n}} = o\left( \frac{1}{\sqrt{n}} \right), \]
which proves the theorem.
3.2.4 Simulation Edgeworth Expansion for Continuous Random Variables

Let us test the accuracy of the Edgeworth expansion. For instance, we apply it to exponential random variables, which are continuous random variables, and compare the standardized mean of the exponential random variables with the normal approximation. We choose 5 independent identically (exponentially) distributed random variables X₁, X₂, X₃, X₄, X₅ with λ = 2, i.e. mean 1/2, and repeat the simulation 10000 times, using the Monte Carlo method to generate exponential random outcomes. We can see from Figure 1 below that the Edgeworth approximation gives a better estimate of the distribution function than the normal approximation does.
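The simulation is carried out by the first program in Appendix A; condensed, its essential steps are:

% Standardized mean of n = 5 Exp(2) variables versus the normal and
% third-order Edgeworth approximations (condensed from Appendix A)
N = 10000; lambda = 2; n = 5;
X = -1/lambda*log(1 - rand(n,N));          % inverse-transform exponential sample
m1 = 1/lambda; m2 = 2/lambda^2; m3 = 6/lambda^3;
sigma = sqrt(m2 - m1^2);
gamma3 = (m3 - 3*m1*m2 + 2*m1^3)/sigma^3;  % standardized third cumulant
E3 = @(x) normcdf(x) - normpdf(x).*gamma3.*(x.^2 - 1)/(6*sqrt(n));
stand = (mean(X,1) - m1)/(sigma/sqrt(n));  % standardized means
ecdf(stand); hold on
y = linspace(-4,4); plot(y, normcdf(y), 'g', y, E3(y), 'r'); hold off
legend('empirical distr','normcdf','E3')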
3.3 Lattice Edgeworth Expansion

All absolutely continuous distributions satisfy the Cramér condition (C). On the other hand, discrete distributions do not; in particular, no lattice distribution satisfies (C). The expansion we have seen above does not hold for lattice distributions. One motivating reason is that, in the lattice case, the distribution function of S_n has jump points for every n, while the Edgeworth expansion is a smooth function, so it cannot approximate the distribution function of S_n with the accuracy claimed in (25). Therefore, for lattice distributions to admit an expansion similar to (25), we need to add extra terms to account for the jumps.
Figure 1: The Edgeworth expansion here is of third order (E3). The Edgeworth expansion approximates the empirical data better than the normal approximation.
3.3.1 The Bernoulli Case

The simplest lattice distribution example is the Bernoulli case. Suppose we have an experiment whose outcome can be classified as either a success or a failure; if we put X = 1 when the experiment is a success and X = −1 when it is a failure, then the probability distribution function has jumps p and 1 − p at X = ±1, where 0 ≤ p ≤ 1 is the probability of success. A random variable with this probability distribution function is said to be a Bernoulli random variable, and we call the distribution a symmetric Bernoulli distribution if F(x) has jumps of size 1/2 at X = ±1, i.e. p = 1/2. Here F(x) is a lattice distribution and the characteristic function is ψ_X(t) = E[e^{itX}] = (1/2)e^{it} + (1/2)e^{−it} = cos t. Now if we suppose that n is an even number, then F_n(x) is purely discontinuous with jumps at the points x = k/√n, k = 0, ±2, ±4, ..., ±n.
We now look for an expression for this particular Bernoulli example. We will end up with an expansion of F_n(x) formulated in Esseen [10] and Gnedenko and Kolmogorov [13]; the arguments there differ somewhat, and omitted details are filled in here. To motivate the expansion formula for the Bernoulli example we first need the following theorem.

Theorem 3.11 (The de Moivre–Laplace Theorem). Let X₁, X₂, ..., X_n be independent identically distributed random variables with P(X_i = 1) = P(X_i = −1) = 1/2, and let Z_n = X₁ + X₂ + ... + X_n. If a < b, then as n → ∞,
\[ P\left( a \le Z_n/\sqrt{n} \le b \right) \to \int_{a}^{b} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx. \]
By the de Moivre–Laplace Theorem, the jump size of F_n(x) at x = k/√n, for k even, is approximately
\[ \int_{\frac{k-1}{\sqrt{n}}}^{\frac{k+1}{\sqrt{n}}} \varphi(y)\, dy \approx \varphi\left( \frac{k}{\sqrt{n}} \right) \left( \frac{k+1}{\sqrt{n}} - \frac{k-1}{\sqrt{n}} \right) = 2\, \frac{\varphi(k/\sqrt{n})}{\sqrt{n}} = \frac{2}{\sqrt{2\pi n}}\, e^{-(k/\sqrt{n})^2/2}. \]
Let us investigate F_n(x) for x ∈ (−1/√n, 1/√n); that is, we look at the distribution function F_n of S_n = Z_n/√n in this interval. We have from Theorem 3.11 that
\[ P\left( -\frac{1}{\sqrt{n}} \le S_n \le \frac{1}{\sqrt{n}} \right) \approx \int_{-1/\sqrt{n}}^{1/\sqrt{n}} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx = \int_{0}^{2/\sqrt{n}} \frac{2}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx \cdot \frac{1}{2} = \frac{2}{\sqrt{2\pi n}} + o\left( \frac{1}{\sqrt{n}} \right). \]
Inside the interval (−1/√n, 1/√n) the distribution function F_n thus has a jump at zero asymptotically equal to 2/√(2πn), while in the neighborhood of x = 0 the normal distribution function Φ(x) is approximately Φ(0) + φ(0)x ≈ 1/2 + x/√(2π). Furthermore
\[ \frac{1}{2}\left( F_n(+0) + F_n(-0) \right) = \Phi(0) = \frac{1}{2}. \]
For the particular case 0 ≤ x < 1/√n,
\[ F_n(x) - \Phi(x) = F_n(0+) - \Phi(x) \approx F_n(0+) - \Phi(0) - \varphi(0)x = \frac{1}{2}\left[ F_n(0+) - F_n(-0) \right] - \varphi(0)x \]
\[ = \frac{1}{\sqrt{2\pi n}} - \frac{x}{\sqrt{2\pi}} = \frac{2}{\sqrt{2\pi n}}\left( \frac{1}{2} - \frac{x\sqrt{n}}{2} \right). \]
We represent the behavior of F_n(x) and Φ(x) around x = 0 in Figures 2 and 3 below, neglecting the term o(1/√n). Generally, for x ∈ ℝ,
\[ F_n(x) - \Phi(x) \approx \frac{2}{\sqrt{2\pi n}}\, Q\left( \frac{x\sqrt{n}}{2} \right), \]
where
\[ Q(x) = [x] - x + \frac{1}{2} \]
and [x] is the integral part of x.
Figure 2: Behavior of F_n(x) and Φ(x).
Esseen [10] suggests writing
\[ D_n(x) = \frac{2}{\sqrt{2\pi n}}\, Q\left( \frac{x\sqrt{n}}{2} \right) e^{-\frac{x^2}{2}}. \]
It is easy to see that Q(x√n/2) is a periodic function taking values between +1/2 and −1/2; one then needs to study the expression
\[ F_n(x) - \Phi(x) - D_n(x). \]
Here we will not go further with this particular Bernoulli example.
3.3.2 Simulating for the Edgeworth Expansion Bernoulli Case

Here we apply the Edgeworth approximation method using Matlab. We simulate 10 random variables 3000 times, and then apply the normal approximation in addition to our Edgeworth series approximation. We can easily see that there is a big difference when we deal with lattice distributions: the normal distribution can never capture the stair shape of the distribution of S_n, while the lattice Edgeworth approximation does (see Figure 4).
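This simulation is the second program in Appendix A; the key lines implementing the jump term D_n(x) are, in condensed form:

% Symmetric Bernoulli case (condensed from the second program in Appendix A)
n = 10; m = 3000;                               % summands and simulations
win = 2*unidrnd(2,n,m) - 3;                     % values -1, +1, each with probability 1/2
S = @(x) floor(x) - x + 1/2;                    % the periodic function Q(x)
D = @(x) 2/sqrt(n)*S(x*sqrt(n)/2).*normpdf(x);  % jump correction Dn(x)
Kol = @(x) normcdf(x) + D(x);                   % lattice Edgeworth approximation
standwin = mean(win,1)*sqrt(n);                 % standardized means (mu = 0, sigma = 1)
ecdf(standwin); hold on
x = linspace(-4,4); plot(x, normcdf(x), 'g', x, Kol(x), 'r'); hold off
legend('empirical distr','normcdf','Kol1')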
3.3.3 General Lattice Case

Now we move to the general case of the lattice distribution.
Figure 3: Difference between F_n(x) and Φ(x) for the Bernoulli case.
Figure 4: Kol1 is the Edgeworth expansion of order 3; normcdf is the normal approximation.
Let us consider a sequence of independent random variables X₁, X₂, ..., X_n, all with the same distribution function F(x), characteristic function ψ_X(t), mean value µ, standard deviation σ ≠ 0, moments α_j, and absolute moments β_j (j = 3, 4, ...). We denote by F_n(x) the distribution function of the standardized mean
\[ S_n = \frac{\sum_{j=1}^{n} X_j/n - \mu}{\sigma/\sqrt{n}}. \]
The possible values of the lattice random variable X_n are x = a + kh, k = 0, ±1, ±2, ..., with h the span of the distribution; see Definition 2.3. Then the lattice random variable Y_j = (X_j − µ)/σ, which has mean zero and standard deviation one, takes values of the form a/σ + k(h/σ).

Following the same steps as in the Bernoulli case we get the expression
\[ D_n(x) = \frac{2}{\sqrt{2\pi n}}\, Q\left( \frac{x\sqrt{n}}{2} \right) e^{-\frac{x^2}{2}}. \]
Here we state the main theorem for the general lattice distribution in the form Esseen (1945) [10] presents it; we note that there is an error in the original Esseen formula (see also Hall [15] concerning that observation), and the formula given here is from Kolassa [17]. The proof essentially consists of calculating the characteristic function of D_n(x), applying Theorem 3.4, and splitting the resulting integral into 3 parts, each of which is estimated as o(1/√n). That proves the theorem.
Theorem 3.12. If X₁, X₂, ..., X_n are independent, identically distributed lattice random variables with finite third moments, then, uniformly in x,
\[ F_n(x) = \Phi(x) + \frac{e^{-\frac{x^2}{2}}}{\sqrt{2\pi}} \left( \frac{P_1(x)}{\sqrt{n}} + \frac{Q_1(x)}{\sqrt{n}} \right) + o\left( \frac{1}{\sqrt{n}} \right), \tag{27} \]
where
\[ P_1(x) = -\frac{k_3}{6}(x^2 - 1) \quad \text{and} \quad Q_1(x) = \frac{h}{\sigma}\, Q\left( \frac{\left(x - \frac{a\sqrt{n}}{\sigma}\right)\sqrt{n}}{h/\sigma} \right). \]
Note that P₁(x) = p₁(x) in (17).

Gnedenko and Kolmogorov [13] show the expansion in the lattice case with one term, neglecting terms of order o(1/√n). Kolassa [17], on the other hand, gives the expansion as a series with more terms, neglecting terms of order o(n^{(1−j)/2}). Now denote the Edgeworth expansion for continuous random variables by E_j(x; kⁿ), where j represents the order of the first cumulants used and kⁿ represents the cumulants of S_n = (∑_{j=1}^{n} X_j/n − µ)/(σ/√n). In the same way we can denote the discontinuous part of the Edgeworth expansion by D_j(x, kⁿ). Then we can rewrite formula (27) in the form
\[ F_n(x) = E_3(x, k^n) + D_3(x, k^n), \]
where
\[ E_3(x, k^n) = \Phi(x) + \varphi(x)\frac{P_1(x)}{\sqrt{n}} + o\left( \frac{1}{\sqrt{n}} \right) \]
and
\[ D_3(x, k^n) = \varphi(x)\frac{Q_1(x)}{\sqrt{n}} + o\left( \frac{1}{\sqrt{n}} \right). \]
We will use formula (27) in applications later, but to get more accurate results we should calculate more terms, so we look at a formula in Kolassa [17] which contains what we need (more than one term); after that we can compare it with the one-term version to see if we get any improvement. Applying the Edgeworth cumulative distribution function approximation using the first four cumulants,
\[ F_n(x) = E_4(x, k^n) + o\left( \frac{1}{n} \right) = \Phi(x) - \varphi(x)\left( \frac{h_2(x)\, k_3^n}{6} + \frac{h_3(x)\, k_4^n}{24} + \frac{h_5(x)\, (k_3^n)^2}{72} \right) + o\left( \frac{1}{n} \right), \tag{28} \]
where h_j denotes the Chebyshev–Hermite polynomial H_j of (16) and k_j^n is the cumulant of order j of S_n,
\[ k_j^n = \frac{k_j\, n^{\frac{2-j}{2}}}{\sigma^j}. \tag{29} \]
We can prove (29) easily. Using the cumulant generating function form of (5), E[e^{θX}] = exp(∑_{j=1}^{∞} k_j θ^j/j!), where the k_j are the cumulants of X (so that k₁ = µ), we get
\[
\psi_{S_n}(\theta) = E\left[ e^{S_n \theta} \right] = E\left[ e^{\theta \sum_{j=1}^{n} \frac{X_j - \mu}{\sigma\sqrt{n}}} \right]
= e^{-\frac{\theta\mu\sqrt{n}}{\sigma}}\, E\left[ e^{\frac{\theta X}{\sigma\sqrt{n}}} \right]^n
= e^{-\frac{\theta\mu\sqrt{n}}{\sigma}} \exp\left( n \sum_{j=1}^{\infty} \frac{k_j}{j!} \left( \frac{\theta}{\sigma\sqrt{n}} \right)^j \right)
= \exp\left( \sum_{j=2}^{\infty} \frac{k_j\, n^{\frac{2-j}{2}}}{\sigma^j}\, \frac{\theta^j}{j!} \right)
= \exp\left( \sum_{j=2}^{\infty} k_j^n\, \frac{\theta^j}{j!} \right),
\]
since the j = 1 term cancels against the factor e^{−θµ√n/σ}. Hence
\[ k_j^n = \frac{k_j\, n^{\frac{2-j}{2}}}{\sigma^j}, \quad j \ge 2; \]
in particular k₂ⁿ = 1.
We have from Kolassa [17] the following expansion:
\[ D_4(x, k^n) = \sum_{l=1}^{r-2} \frac{1}{l!} \left( \frac{h/\sigma}{\sqrt{n}} \right)^l Q_l\left( \frac{\left(x - \frac{a\sqrt{n}}{\sigma}\right)\sqrt{n}}{h/\sigma} \right) (-1)^l\, \frac{d^l}{dx^l} E_j(x, k^n), \tag{30} \]
where
\[ Q_l(y) = \begin{cases} (l-1)!\; g_l \sum_{j=1}^{\infty} \cos(2\pi j y)/\left(2^{l-1}(\pi j)^l\right) & \text{if } l \text{ is even} \\ (l-1)!\; g_l \sum_{j=1}^{\infty} \sin(2\pi j y)/\left(2^{l-1}(\pi j)^l\right) & \text{if } l \text{ is odd} \end{cases} \tag{31} \]
with the constant g_l given by
\[ g_l = \begin{cases} +1 & \text{if } l = 4j+1 \text{ or } l = 4j+2 \text{ for some integer } j \\ -1 & \text{if } l = 4j+3 \text{ or } l = 4j \text{ for some integer } j, \end{cases} \]
so
\[ F_n(x) = E_4(x, k^n) + D_4(x, k^n). \]
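The third program in Appendix A codes the one-term formula (27) as Kol1 and builds the higher-order versions (Kolasa2) from (28) and (30). The core lines for the one-term version are, condensed (gamma3, sigma, a, h and n are assumed computed as in that program):

% One-term lattice Edgeworth expansion (27), condensed from Appendix A
S = @(x) x - floor(x) - 1/2;                     % equals -Q(x)
an = -a*sqrt(n)/sigma;                           % location shift of the lattice
Q1 = @(x) -h/sigma*S((x + an)*sigma*sqrt(n)/h);  % jump term Q1(x) of Theorem 3.12
P1 = @(x) gamma3*(1 - x.^2)/6;                   % polynomial term P1(x)
Kol1 = @(x) normcdf(x) + normpdf(x).*(P1(x) + Q1(x))/sqrt(n);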
3.3.4 Continuity-Corrected Points Edgeworth Series

Feller [12] shows that the Edgeworth series, estimated only at the continuity-corrected points
\[ x^+ = x + \frac{h/\sigma}{2\sqrt{n}}, \]
will yield results accurate to o(1/√n). In other words, Feller suggests that it is possible to add corrections to the standard Edgeworth series expansion for continuous random variables: by modifying the cumulants of the standard Edgeworth expansion for continuous random variables we get a formula valid for lattice random variables. Specifically, this method uses the cumulants adjusted by Sheppard's corrections to approximate the lattice distribution. Kolassa and McCullagh [16] prove that by evaluating such an Edgeworth expansion at continuity-corrected points, the resulting errors are as small as usual in an Edgeworth approximation. The main purpose of their work was to show that the discrete part in Esseen's approximation for lattice random variables can be omitted by modifying the expansion for continuous random variables. In equations, they put
\[ E_r(x^+; \lambda) = E_r(x^+, k^n) + D_r(x^+, k^n) + o\left(n^{\frac{1-r}{2}}\right), \quad x \text{ lattice points}, \tag{32} \]
where
\[ \lambda_i = \begin{cases} k_i & \text{for } i > 1 \text{ and odd} \\ k_i - \varepsilon_i \dfrac{h^i}{n\sigma^i} & \text{for } i \text{ even} \end{cases} \tag{33} \]
are the Sheppard-corrected cumulants. Wold [21] describes these corrected cumulants further. The adjustment ε_i is the ith cumulant of the uniform random variable U on (−1/2, 1/2), i.e. ε₂ = 1/12, ε₃ = 0 and ε₄ = −1/120.
Remark 3.13. If the x are not lattice points then (32) does not hold. Moreover, graphically E_r(x⁺, kⁿ) does produce the stair shape of the distribution function.

Remark 3.14. There are discrete distributions which are not lattice distributions. For those, none of the Edgeworth expansions presented here may work. This problem can be solved using the saddle point approximation method, which we describe later.
3.3.5 Simulation on Triss Cards

Our main task here is to apply the Edgeworth expansion for lattice random variables to a real application that companies may use, and one such application is chance games, for example Triss. We present the case where the company produces a Triss card with three boxes; when you scratch a box you may win 0 SEK, 2 SEK, or 10 SEK with probabilities (0.53, 0.46999, 0.00001), respectively. We write a Matlab program for this purpose. The aim of the program is to approximate the cumulative distribution function of S_n better than the simple normal approximation does; one important benefit of this is to reduce the time spent by game companies checking the statistical aspects when deciding whether a specific chance game is reliable according to their rules. A sketch of the outcome generation is given below; the full programs are in Appendix A.
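% Generating n x m Triss outcomes 0, 2, 10 SEK by inverse transform
% (as in the third and fourth programs of Appendix A)
n = 1000; m = 3000;                      % scratched boxes per sample, simulations
x = rand(n,m);
win = 0*(x <= 0.53) + 2*(x > 0.53).*(x <= 0.53 + 0.46999) ...
      + 10*(x > 0.53 + 0.46999);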
First we start with n = 1000 lattice random numbers taking values in {0, 2, 10} with probabilities (0.53, 0.46999, 0.00001), and let 3000 be the number of simulations. Estimating the cumulative distribution function gives Figure 5.
Figure 5: Kol1 represents the Edgeworth series expansion with one term in (27); Kolasa1 is the same as Kol1 but derived from the Kolassa formula [17]; Kolasa2 represents the Edgeworth expansion with two terms [17]; Kolasa3 is the Edgeworth expansion at continuity-corrected points.
Now if we zoom in for a better picture of our approximation methods we get Figure 6.

Figure 6: Edgeworth approximation of the distribution function, n = 1000.
As a direct conclusion we can see that the normal approximation does not give as accurate a result as the others; it does not even have the stair shape, while Kol1, Kolasa1, and Kolasa2 are exactly the same curve and lie closer to the empirical distribution function. As for Kolasa3, the continuity-corrected method gives a very accurate result, but only at the continuity-corrected points, which are the points where the distribution function jumps.

Our second attempt is with n = 100 and 3000 simulations. Here we reduce the number of lattice random numbers to test the ability of the Edgeworth expansion to give an accurate outcome even with small n; see Figure 7.
Figure 7: Edgeworth approximation of the distribution function, n = 100.
We notice that our Edgeworth approximation methods do well here, much better than the normal approximation, and the stairs are also close to the empirical distribution function; the same holds for the continuity-corrected method (Kolasa3), which gives an accurate estimate at the continuity-corrected points. This is clearer in the zoomed picture, Figure 8.

We still want to try one more case: here we change the winning values to 0 SEK, 2 SEK, and 50 SEK with the same probabilities as before, and apply the same program to the new lattice case. Running the program gives Figure 9, with 1000 generated independent identically distributed lattice random numbers and 3000 simulations.
Figure 8: Edgeworth approximation of the distribution function, n = 100.
Figure 9: Edgeworth approximation of the distribution function, n = 1000.
We notice that, similar to the previous cases, the normal approximation does not give as accurate a result as the Edgeworth approximation does. We also see the same big difference between the continuity-corrected method and the normal method at the continuity-corrected points. Now if we change n, the number of generated identically distributed lattice random numbers, to 100, with the same number of simulations, running the program gives Figure 10.
Figure 10: Edgeworth approximation of the distribution function, n = 100.
We now present another approximation method, focusing on the applied part without going through its theoretical aspects.
4 Saddle Point Approximation
The statistical direction of the saddle point approximation started with the pioneering article by Daniels (1954) [6], where he treated the case of the mean of identically distributed random variables. The saddle point approximation is derived by applying saddle point integration techniques to a density inversion integral. Here we consider the problem of estimating the density of a random variable X̄ whose distribution depends on n. As mentioned before, we will not go into theoretical details; instead we present the formulas and how to use them.

First let X₁, X₂, ..., X_n be iid continuous observations having density function f(x). Suppose the moment generating function (mgf) of f,
\[ \varphi(t) = E\left[e^{tX}\right] = \int e^{tx} f(x)\, dx, \]
exists for t around the origin, and let K(t) = log(φ(t)) be the cumulant generating function (cgf). Then for given x, let φ̂ = φ̂(x) denote the saddle point of K, in other words the solution of the equation K′(φ̂) = x, which in many applications is unique. For given n let f_n(x) denote the density of X̄. Then the saddle point approximation to f_n says the following.
Theorem 4.1. For any given value x,
\[ f_n(x) = \sqrt{\frac{n}{2\pi K''(\hat{\varphi})}}\; e^{n\left[ K(\hat{\varphi}) - \hat{\varphi}x \right]} \left( 1 + O(n^{-1}) \right). \]
The expansion can be carried to higher-order terms, in which case the error of the approximation goes down in powers of 1/n rather than powers of 1/√n. This hints that the saddle point approximation may be more accurate at small sample sizes; an explicit explanation of this fact can be found in Daniels (1954) [6]. The result in Theorem 4.1 can be derived in at least two ways, e.g. by Laplace's method or by an Edgeworth expansion with exponential tilting, which can be found in DasGupta [8].

Example: Suppose the X_i are independent identically distributed N(0, 1). Then ψ(t) = e^{t²/2} and so K(t) = t²/2. The unique saddle point φ̂ solves K′(φ̂) = φ̂ = x, giving the approximation
\[ f_n(x) = \sqrt{\frac{n}{2\pi}}\; e^{n\left[ \frac{x^2}{2} - x^2 \right]} = \sqrt{\frac{n}{2\pi}}\; e^{-\frac{n x^2}{2}}, \]
which coincides with the exact density of X̄.
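As a further worked example (not from the thesis): if X_i ~ Exp(λ), then K(t) = −log(1 − t/λ) for t < λ and K′(t) = 1/(λ − t), so the saddle point is φ̂ = λ − 1/x with K″(φ̂) = x². The sketch below compares Theorem 4.1 with the exact Gamma(n, nλ) density of the mean:

% Saddle point density of the mean of n Exp(lambda) variables (Theorem 4.1)
% versus the exact Gamma(n, n*lambda) density of the mean
lambda = 2; n = 5;
K = @(t) -log(1 - t/lambda);            % cumulant generating function, t < lambda
K2 = @(t) 1./(lambda - t).^2;           % second derivative K''(t)
x = linspace(0.1, 1.5, 100);
phi = lambda - 1./x;                    % saddle point solving K'(phi) = x
fsad = sqrt(n./(2*pi*K2(phi))).*exp(n*(K(phi) - phi.*x));
fexact = (n*lambda)^n*x.^(n-1).*exp(-n*lambda*x)/gamma(n);
plot(x, fexact, 'b', x, fsad, 'r--')
legend('exact Gamma density','saddle point approximation')

The two curves are close already for n = 5; the relative error is the 1 + O(1/n) factor of Theorem 4.1.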
Theorem 4.2 (The Lattice Case). Let X₁, X₂, ..., X_n be independent identically distributed lattice random variables. With
\[ \hat{\zeta} = \sqrt{2n\left[\hat{\varphi}x - K(\hat{\varphi})\right]}\; \mathrm{sgn}(\hat{\varphi}) \quad \text{and} \quad \hat{Z} = \left( 1 - e^{-\hat{\varphi}} \right) \sqrt{n K^{(2)}(\hat{\varphi})}, \]
then
\[ Q_n(x) = P\left( \bar{X} > x \right) = 1 - \Phi(\hat{\zeta}) + \varphi(\hat{\zeta}) \left( \frac{1}{\hat{Z}} - \frac{1}{\hat{\zeta}} \right) + O\left(n^{-3/2}\right). \tag{34} \]

Remark: A continuity-corrected version of this theorem is presented in Daniels (1987) [7], but its practical improvement over the uncorrected version appears questionable.
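Formula (34) is what the fourth program in Appendix A evaluates for the Triss game; its core, rearranged here to return the upper tail Q_n(x) of (34), is:

% Evaluating (34) for one Triss outcome distribution (condensed from
% the fourth program in Appendix A)
xvalues = [0 2 10]; pvalues = [.53 .46999 .00001]; n = 1000;
psi = @(t) sum(exp(t*xvalues).*pvalues);                        % mgf of one outcome
k = @(t) log(psi(t));                                           % cgf K(t)
k1 = @(t) sum(xvalues.*exp(t*xvalues).*pvalues)/psi(t);         % K'(t)
k11 = @(t) sum(xvalues.^2.*exp(t*xvalues).*pvalues)/psi(t) - k1(t)^2;  % K''(t)
x = 1.1;                                   % a point where Qn(x) is wanted
theta = fzero(@(t) k1(t) - x, 0.1);        % saddle point solving K'(theta) = x
zeta = sqrt(2*n*(theta*x - k(theta)))*sign(theta);
Z = (1 - exp(-theta))*sqrt(n*k11(theta));
Qn = 1 - normcdf(zeta) + normpdf(zeta)*(1/Z - 1/zeta)  % approximates P(mean > x)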
4.1 Simulation With Saddle Point Approximation Method

We apply the saddle point approximation method to the same chance game as before by simulating the game and then comparing the empirical distribution function with the distribution function we get from the saddle point approximation technique. Programming the saddle point method in Matlab and running it for 1000 iid random numbers with 3000 simulations gives Figure 11.
Figure 11: Saddle point approximation (sad). The normal approximation shown for comparison may be hardly visible on a black-and-white print.
In this figure we can see that the saddle point method gives a better estimate than the normal approximation at the discontinuity points where the distribution function has jumps.
A Matlab Codes
%Rani Basna
N=10000; lambda=2;n=5;
u=unifrnd(0,1,n,N);
X=-1/lambda*log(1-u);
x=linspace(0,4*lambda);
m1=1/lambda;
m2=2/(lambda)^2;
m3=6/(lambda)^3;
m4=6;
sigma=sqrt(m2-m1^2);
gamma3=(m3-3*m1*m2+2*m1^3)/sigma^3;
h2=@(x)x.^2-1;
h3=@(x)x.^3-3*x;
h4=@(x)x.^4-6*x.^2+3;
h5=@(x)x.^5-10*x.^3+15*x; % Hermite polynomial H5
E3=@(x)normcdf(x)-normpdf(x).*(gamma3*h2(x)/(sqrt(n)*6));
stand=(mean(X,1)-m1)/(sigma/sqrt(n));
ecdf(stand);
y=linspace(-4,4),hold,
plot(y,normcdf(y),'g',y,E3(y),'r')
legend('empirical distr','normcdf','E3')
%Rani Basna
n=10;%n as in Gnedenko and Kolmogorov
m=3000;%number of simulations
win=2*unidrnd(2,n,m)-3;
m1=[-1 1]*[.5 .5]’;
m2=([-1 1].^2)*[.5 .5]’;
sigma=sqrt(m2-m1^2);
S=@(x)floor(x)-x+1/2;
D=@(x)2/sqrt(n)*S(x*sqrt(n)/2).*normpdf(x);
Kol=@(x)normcdf(x)+D(x);
standwin=(mean(win,1)-m1)/(sigma/sqrt(n));
ecdf(standwin)
x=linspace(-4,4), hold,plot(x,normcdf(x),'g',x,Kol(x),'r'),hold off
legend('empirical distr','normcdf','Kol1')
%Rani Basna and Roger Pettersson
clear
n=1000;%n as in Gnedenko and Kolmogorov
m=3000;%number of simulations
x=rand(n,m);
kolasa1=zeros(size(x));
win=0*(x<=0.53)+2*(x>0.53).*(x<=0.53+0.46999)+50*(x>0.53+0.46999);
mwin=mean(win,1); %sample of mean wins
xvalues=[0 2 50];
pvalues=[.53 .46999 .00001];
m1=xvalues*pvalues’;
m2=(xvalues.^2)*pvalues’;
m3=(xvalues.^3)*pvalues’;
m4=(xvalues.^4)*pvalues’;
a=max(xvalues)/2-m1;
h=2;
sigma=sqrt(m2-m1^2);
gamma3=(m3-3*m1*m2+2*m1^3)/sigma^3;
gamma4=((xvalues-m1).^4)*pvalues’/sigma^4;
%S1=@(x)floor(x)-x+1/2;
S=@(x)x-floor(x)-1/2;
S22=@(x)(x-floor(x)).^2-(x-floor(x))+1/6;
D=@(x)2/sqrt(n)*S(x*sqrt(n)/2).*normpdf(x);
pbar=-a/h;
an=h*n*pbar/sigma/sqrt(n);
S1=@(x)-h/sigma*S((x+an)*sigma*sqrt(n)/h);
S2=@(x)h^2/(2*sigma^2*n)*S22((x+an)*sigma*sqrt(n)/h);
Q1=@(x)1/6*gamma3*(1-x.^2);
Q2=@(x)10*gamma3^2*x.^5/prod(1:6)+(gamma4/3-10*gamma3^2/9)...
*x.^3/8+(5*gamma3^2/24-gamma4/8)*x;
Kol1=@(x)normcdf(x)+normpdf(x).*(Q1(x)+S1(x))/sqrt(n);
Kol2=@(x)Kol1(x)+normpdf(x).*Q2(x)/n;
h2=@(x)x.^2-1;
h3=@(x)x.^3-3*x;
h4=@(x)x.^4-6*x.^2+3;
h5=@(x)x.^5-10*x.^3+15*x; % Hermite polynomial H5
E3=@(x)normcdf(x)-normpdf(x).*(gamma3*h2(x)/(sqrt(n)*6));
E31=@(x)normpdf(x).*(1+gamma3*h3(x)/(6*sqrt(n)));
D3=@(x)S1(x)/sqrt(n).*E31(x);
Kolasa1=@(x)D3(x)+E3(x);
E4=@(x)E3(x)-normpdf(x).*(gamma4*h3(x)/(24*n)+gamma3^2*h5(x)/(72*n)); % per (28)
E41=@(x)normpdf(x).*(1+gamma3*h3(x)/(6*sqrt(n))+...
gamma4*h4(x)/(24*n)+gamma3^2*h6(x)/(72*n));
E411=@(x)-normpdf(x).*(x+gamma3*h4(x)/(6*sqrt(n))+...
gamma4*h5(x)/(24*n)+gamma3^2*h7(x)/(72*n));
D4=@(x)D3(x)+S2(x).*x.*normpdf(x);
Kolasa2=@(x)D4(x)+E4(x);
eps2=1/12;
eps4=-1/120;
lamda3=gamma3;
lamda4=gamma4-eps4/n*(h/sigma)^4;
Z=@(x)x+h/(sigma*2*sqrt(n));
e3=@(x)normcdf(Z(x))-normpdf(Z(x)).*(lamda3*h2(Z(x))/(sqrt(n)*6));
e31=@(x)normpdf(Z(x)).*(1+lamda3*h3(Z(x))/(6*sqrt(n)));
d3=@(x)S1(Z(x))/sqrt(n).*e31(Z(x));
Kolasa3=@(x)e3(x);
e4=@(x)e3(x)-normpdf(Z(x)).*(lamda4*h3(Z(x))/(24*n)+lamda3^2*h5(Z(x))/(72*n));
e41=@(x)normpdf(Z(x)).*(1+lamda3*h3(Z(x))/(6*sqrt(n))+...
lamda4*h4(Z(x))/(24*n)+lamda3^2*h6(Z(x))/(72*n));
e411=@(x)-normpdf(Z(x)).*(Z(x)+lamda3*h4(Z(x))/(6*sqrt(n))+...
lamda4*h5(Z(x))/(24*n)+lamda3^2*h7(Z(x))/(72*n));
d4=@(x)d3(Z(x))+S2(Z(x)).*Z(x).*normpdf(Z(x));
Kolasa4=@(x)e4(x);
Kolasa11=@(x)normcdf(x)-normpdf(x)/sqrt(n).*((gamma3*h2(x))/6+...
(gamma4*h3(x))/(sqrt(n)*24)+(gamma3^2*h5(x))/(72*sqrt(n))+...
S1(x)+S1(x).*(gamma3*h3(x))/(6*sqrt(n))-(x.*S2(x))*h/(2*sqrt(n)*sigma));
standwin=(mean(win,1)-m1)/(sigma/sqrt(n));%standardized mean data
%subplot(211),ecdf(standwin);
plot(211),ecdf(standwin);
display('emp percentiles; normal perc')
[prctile(mwin,2.5) prctile(mwin,97.5); %empirical percentiles
m1+norminv(0.025)*sigma/sqrt(n) m1+norminv(0.975)*sigma/sqrt(n)]
alpha=0.025;
R=@(y)1-Kol1(y);
zalpha=norminv(1-alpha);
yalpha=fzero(@(y)R(y)-alpha,zalpha);
[yalpha zalpha];
xalphaNormal=m1+zalpha*sigma/sqrt(n)
xalphaKol1=m1+yalpha*sigma/sqrt(n)
%y=linspace(-2,2), hold ,plot(y,normcdf(y),'g',y,Kol1(y),'r',y,Kolasa1(y)...
,'c',y,Kolasa2(y),'m',y,Kolasa3(y),'y',y,Kolasa4(y),'k'),hold off
y=linspace(-2,2);, hold ,plot(y,normcdf(y),'g',y,Kol1(y),'r',y,Kolasa1(y)...
,'c',y,Kolasa2(y),'m',y,Kolasa3(y),'y'),hold off
legend('empirical distr','normcdf','Kol1','Kolasa1','Kolasa2','Kolasa3')
%Rani Basna and Roger Pettersson
clear
n=100;
m=3000;
X=rand(n,m);
win=0*(X<=0.53)+2*(X>0.53).*(X<=0.53+0.46999)+10*(X>0.53+0.46999);
xvalues=[0 2 10];
pvalues=[.53 .46999 .00001];
mu=sum(xvalues.*pvalues);
sigma=sqrt(sum(((xvalues-mu).^2).*pvalues));
xbar=mean(win,1);
stand=(xbar-mu)/(sigma/sqrt(n));
psi=@(t)sum(exp(t*xvalues).*pvalues);
psi1=@(t)sum(xvalues.*exp(t*xvalues).*pvalues);
psi11=@(t)sum(xvalues.^2.*exp(t*xvalues).*pvalues);
k=@(t)log(psi(t));
k1=@(t)psi1(t)/psi(t);
k11=@(t)(psi11(t).*psi(t)-psi1(t).^2)/psi(t).^2;
y=linspace(-2,2);
x=mu+sigma*y/sqrt(n);
for i=1:length(x)
%plot(x(i),k1(x(i)),'.'),hold on
end
%hold off
for i=1:length(x)
theta(i)=fzero(@(t)k1(t)-x(i),max(x(i)-1,0));
Zetabar(i)=sqrt(2*n*(theta(i)*x(i)-k(theta(i))))*sign(theta(i));
Zbar(i)=(1-exp(-theta(i)))*sqrt(n*k11(theta(i)));
sad(i)=normcdf(Zetabar(i))-normpdf(Zetabar(i))*(1/Zbar(i)-1/Zetabar(i));
end
ecdf(stand),hold,
plot(y,normcdf(y),'g',y,sad,'r')
legend('empirical distr','normcdf','sad')
References
[1] Bhattacharya, R.N. and Rao, R.R. (1976). Normal Approximation and Asymptotic Expansions. Wiley, New York.
[2] Billingsley, P. (1986). Probability and Measure. 2nd ed. Wiley, New York.
[3] Blom, G. (1984). Sannolikhetsteori med tillämpningar. Studentlitteratur.
[4] Cramér, H. (1946). Mathematical Methods of Statistics. Princeton University Press, Princeton, NJ.
[5] Cramér, H. (1970). Random Variables and Probability Distributions. 3rd ed. Cambridge University Press, Cambridge, UK.
[6] Daniels, H.E. (1954). Saddle Point Approximations in Statistics. Ann. Math. Statist. 25, 631-649.
[7] Daniels, H.E. (1987). Tail Probability Approximations. Int. Stat. Rev. 55, 37-48.
[8] DasGupta, A. (2008). Asymptotic Theory of Statistics and Probability. Springer, New York.
[9] Durrett, R. (1996). Probability: Theory and Examples. Wadsworth.
[10] Esseen, C.G. (1945). Fourier Analysis of Distribution Functions. Acta Mathematica 77, 1-125.
[11] Esseen, C.G. (1968). On the Remainder Term in the Central Limit Theorem. Arkiv för Matematik 8, 7-15.
[12] Feller, W. (1971). An Introduction to Probability Theory and Its Applications, Vol. 2, 2nd ed. Wiley, New York.
[13] Gnedenko, B.V. and Kolmogorov, A.N. (1954). Limit Distributions for Sums of Independent Random Variables. Addison-Wesley, Reading, MA.
[14] Gut, A. (2005). Probability: A Graduate Course. Springer.
[15] Hall, P. (1992). The Bootstrap and Edgeworth Expansion. Springer-Verlag, New York.
[16] Kolassa, J.E. and McCullagh, P. (1990). Edgeworth Series for Lattice Distributions. Annals of Statistics 18, 981-985.
[17] Kolassa, J.E. (2003). Series Approximation Methods in Statistics. Springer.
[18] Petrov, V.V. (1975). Sums of Independent Random Variables. Springer-Verlag, New York.
[19] Ross, S. (2006). A First Course in Probability. 7th ed. Pearson Prentice Hall.
[20] Råde, L. and Westergren, B. (2004). Mathematics Handbook for Science and Engineering. Studentlitteratur.
[21] Wold, H. Sheppard's Correction Formulae in Several Variables. Skandinavisk Aktuarietidskrift 17, 248-255.
SE-351 95 Växjö / SE-391 82 Kalmar
Tel +46-772-28 80 00
[email protected]
Lnu.se/dfm