Chu, John T.; (1955). "Errors in normal approximation to certain types of distribution functions." (Navy Research) Paper

ERRORS IN NORMAL APPROXIMATION TO CERTAIN TYPES OF
DISTRIBUTION FUNCTIONS
by
J. T. Chu
University of North Carolina
S'pe~Jal.re'port-to---eije
Office 10f Naval Research
of work at;C~~~l.JiLll.Uftder:Contract NR 042 031,
Projet"'t1f7~onr-284( 02) ,-!.2F.research in probability;andtatat1st1ds~ Reproduction in whole or
in part is ~perm1~W to;p -any''''Purpose of the
Un1ted~SUites·Government.
-"","
Institute of Statistics
Mimeograph Series No. 13.0
May 27, 1955
----'
ERRORS TIJ NORMAL APPROXD!ATIONS TO CERTATIJ TYPES OF
DISTRIBUTION FUNCTIONS
by
J. T. Chu
University of North Carolina
1.
Swmnary.
For certain types of sequences of distribution functions which
are asymptotically normal, say with mean 0
ed for deriving upper and lower bounds in
function.
1, a method is obtain-
of the asymptotic distribution
terr~
It is then shown that to many of these distribution functions, the
errors are small in using the normal
2.
and variance
Introduction.
Let
approx~nation.
F (x), n = 1, 2, •.• , be a sequence of cdf's (cumn
ulative distribution functions) such that for every fixed x , Fn(x) - > F(x)
as
n ->
00,
where
F(x)
is a cdf independent of n.
of view, it i3 desirable to know how large
I Fn(X) -
F(x)
I
From a practical point
n has to be in order that Dn (x) "'"
be small so that F(x) nay be used as an approximation to
F (x), althou3h approximations are often used in practice without much knowledge
n
about the
ma~nitudes
of the errors.
The function D (x) may vary, of course,
.
considerably for different values of n
and x.
n
But the most interesting kind
of Dn (x)'s is probably those which tend rapidly to
such cases,
0,
uniformly in x.
In
F(x) provides for all n's greater than some minimum and for all XIs
a satisfactory approximation for
Fn(x).
is ample numerical evidence that as
Generally, however, even though there
n incroases D (x) rapidly becomes ill1iform-
ly small, it may not be easy to obtain a
n
m:~thciatical
proof.
There are, on the other hand, types of sequences of cdf's for which we arc
able to confirm riGorously that they do tend rapidly to normality.
If a cd! hus
1. Work sponsored by the Office of Naval Research under Contract NR 042 031
at Chapel Hill.
2
one of tho followinC fonns:
( 1 +
+
Z In)
2
lrl
/
2
dz
,
-00
where
en
and r,l depend only on n which takes integral values" and
Lim
n
mIn - I" then as
n ->
1
Fn(x) - > '
00"
for every fixed x ( Lanma 4).
--:> 00
(x)" the normal cd! specified by (6),
We find: by usinC simple transformations such as
u ....
r + n log
--
( 1 + z2 In)
J
1
upper and lower bounds can be easily obtained for the integral in F (x) in terms
n
of ~(x)
(Lemma 3)" and if
en
is not a very complicated function of n, then
rather simple upper bounds can usually be derived for the error in using ]i(x)
as an approximation to Fn(x).
~
If the bound is small, then so is the error.
In
4" applicQtions nre given to sequences of cdfts corresponding to tho Student
t-distribution,
t'he --r-distribution of W. R. Thompson ["4_7, and the distribu-
tions of the partial and total correlation coefficients when the vcriatos involved are llldependcntly and normally distributed.
For most of these cdirs, we
are able to show that the errors are small in using the normal approximation,
though the actual values of the errors scorn even smaller.
Similnr methods were used by the author
bounds for the cdf of the sample medinn
tion function (which is normal).
x
L-I_7
to derive upper and lower
in terms of its asymptotic distribu-
There we ~lso showed that if the parent dis-
tribution is normal" then even for samples of moderate sizes, the error is smnll
in
using tho normal approximation to the
,-.oJ
cdf of x •
Tho cdf
of
duced to one of the forms given above by sever",l transformations.
x can be reBut more re-
3
fined arguments aro nooded in order to got the bounds obtained in ~1_7.
Some further applications and remarks arc given at the end of
3.
~
4
and ~
=
0, hence
5.
Lemmas.
Lemma 1.
(1)
, for all real
(2)
1 +
x-x
a
>
'""'
<
x
2
/2
x
,
>
<
, according as x
o.
If x 2; 0, thon
(3)
> 1 ,
x
Proof.
2
e- x /(1 _ e-
The function
eX
.1
2
X
and
2 < 1 .
)
- x - I has its minill1urn
0
at x
we have (1).
(2) holds because log(l + x ) -x+ x 2/2 is monotonically increasing for
all x >-1-
(3)
is a consequence of
(left hmd side) tends to
function of
1
(1). (4)
follows from the facts that the
as x --> 0 and is n monotonically decroasing
x (Differontiate twice).
L;mrna
2.
Lot
1
1 • 3 ••• (2n-l)
)2
b (c) = -2-.-:4~.-.-.~(2~n-:)~
(n + c
, n "" 1, 2, •.• ,
n
whero
c >
-1.
Than
LHS
1
>
~
Proof.
bn(O)
_1
j'(' 2
£5,
is known
,
if
c <
t ,.
,
if
c >
g
P. 351
_7
4
•
7
as the Wallis product and tends to
w
as
n _>
•
Obviously b (c) tends to the sarne limit for every fixed c.
n
00.
By examining the square of the ratio
b
~
l(c)/b (c), it can be shown that
n
is a monotonicCllly incroasing function of n
monotonically decreasing function of
n
Lemma
i f and only i f
if .and only if
c
c ~
t'
b (c)
n
and is a
~ ~. Hence we have (5).
3.
Let
,
(6)
and
~o(x)
-
I
(x) -
~
and let the inequalities
•
Let (7) below be abbreviated as
-
-
-
- - - -
A < B < C < D < E,
A < B, B < C, C < D, and D < E be respectively referred
...
~
(1.1), (7.2), ••• , and (1.4). Thon (7.1) and (7.2) hold for all m, n > 0,
and 0 < x ~ n1/2 • (7.3) holds for all m, n > 0, and 0 S x < 00; and ( 7.4),
to as
all m > 3, n > 0, nnd ..0 :5 x <
00,
l
x(1_. 2In) m/2 d.
o
1
<
IE! 0
-~m
(x
IF)
,In
_< (2n)-2
~
x
(1+ z2/n )-m/2 dz
o
Proof.
of
It is easy to see that (7.2) and (7.3) aro immediate consequences
(1). Now use the transformation
1
v(z)
=
(1+z 2/h)-m/2 dZ
=t
J "2 ,
log (1 + Z 2
In)
in
then
x
J
o
.1
where hex) is the illS of (4).
we have
(1.4).
Finally
(1.1)
-
2
By (2) and (4), h{v/n )
S 1 and vex) ~ x. Hence
can bo obtained in a similar way by using
(3)
after
applying to the integral the transformation
u(z ) =
-r -
1
n log (1 - z
Lomm.1.
4.
~
nO
Suppose that for overy integer
n
~
1
22
In) J
J
x
Fn(X) = en
(8)
j
2
(1.: z2/n );111/ dz
-00
is a cd!. where C and rn depend only on n
. n
is (1 - z
2
In
)m/ 2'
,
it should be replaced by
and
Lim
n->
min
00
o uhcncver
'z'
:$
1 (If tho intcgr~nd
~ n~).
Then, for every
fixed x,
Lim
n _>
whore
I
00
F (x)
n
=
(~) is defined by (6).
Proof.
By Lemma 3, we have
~(x)
,
1
C'" (2n)-2.
Lim
n _>
00
n
Using the same
lCInInt:l
6
once again, we obtain
4.
~ormnl.
types
(8),
(9).
4, we shOlled that if a cd! is of ('no of tho
apprOxilllati r ns. In Lemma
then it tends to tho
][ of
(6)
as
shall find uppor and lower bounds for tho C
n
IS
n --->
00.
In this sectian, wo
correspcnding to various well known
cdfs of the typos (8); then usc Lemma 3 to obtain uppor and lower bounds for those
:i
cdfs thomselves, in terms of
and fihdly derive upper bounds for the pro-
j
portional errors(~) in using 5[ as an approximation to those cdfs.
A. t-distribution
The cdf of the t-distribution \-1itIl
n == 1, 2,
... ,
n
d. f. (degrees of freedom),
is given by
r
(10)
a
n
(1 + Z2;n )-(n+l)/2dZ,
where
- 00
It is well knol'm that as
fixed
Xj
n ->
00,
Fn(X) - >
!
(x) of (6) for every
and the llspeed ll of approaching the limit is rather fast.
normal approximation is often used in practice.when
n ~ 30.
In fact, the
We shall derive for
Fn(x) upper and louer bounds in terms of !(:::), then show that the proportional
error in using ~(x) as an approximation to
ApplYing (7.4) to
.
~ (-x) -1 -
n
~
F (y)-1/2 and 1/2 - F (-x) and using tho fact that
n
F (y) - F (-x) <
n
n
c
n
= (1
n
"!(x), it can be shown e1sily that for arbitrary x, y ~ 0, and
3,
where
Fn(X) is less than lin for all n ~ 8.
1
(2n)2 a c-lr-;h(c y)
nn-
~
n
- ::h(-c x) 7
~
n -
.1
- 2/n)2.
Using
(7.3), we obtain, in a similar way,
,
7
1
1
F (y) - F (-x) > (2n)2 a d-
(13)
n
where
d
= (1
n
Using
n
r(x+1)
m =' 1", 2, •. ' •
~
xr(x) and r(1/2)
and a
wtting
m=l, 2, •••
•
2~
i:. (-dnx) J
o
if
= 1/2
it
c~n
be seen that
,
~
2m
~
1 b-1 (c)
m
=
where
i f n - 2m+1,
3 ,
(1 _ J/(7n)
)1/2
and 1/4 respectiV-fl1y, we obtain
n:or 2m
a
n-
= 2m and ~ (1 - J/(7n»1/2
n-
Likewise, lotting
= n1 / 2 ,
.
(2n)1/2 n <
_71/2
-
= L-(m+c)/(2m+1) J1/2
1
it' n
In general, for n
(14)
n/(n+l)
~
(d y)
:::t::. n
1;4 and 2/7 in turn, we obtain, by (5),
be
0
~ L-2n/(2n+l) Jl/2
(2n)1/2 an
r
nn-
+ 1/n)1/2.
rm/(2m+2c) J1/2 b (c)
m
£'
-
and;: (1 - 1/2n)1/2
if
•
(2n)I/2 an
~
n;;a 2m + 1, m :z 1, 2, • •• •
In general, for n > 2.
By (12) - (15), VIe have, for nrbitrnr~r x, y ~ 0, and n ~ J •
(16)
. ,
Fn (y) - Fn (-x) -< .I.r- ;n-i4
71/ 2 _.::t.
/-~(o ny)- 1h(_C
x) n- ;t.
n
F (y) - F (-x) > d-2 /- ~ (d y) n
n
-n-.::t:.n
(17)
The proportional error in using
-,:.- (-d x)
~
n
-
7.,
7 .
as an qpproximation to B is defined
A
to be
E-I*
(18)
whore
I
A
I
is tho absolute valuo of
A.
-1\
UOVI
omitting
on
and d
n
in the
8
arguments in the ~rs
of (16) and (17), we see that
f( 7n-3)/(7n-14) Jl/2 - 1
maximum of
E < :../n
state that
for.::l11
much smaller than l/n.
-
-
and 1 - n/(n+1).
For simplicity, we may
It seems that the actual values c£
For example, if
J: (y)
Fn( -x) = .95 while
n > 8.
E is not more than the
E
arc
n'" 30, and y = x = 2.°42, thGn F (y) n
.$: ( -x) == .9588
so E';'
.0092.
Nevertheless, tho
bound is of precise and general natura cnd smnll cncueh to justify the use of such
approximations for large n.
B.
Thompson 1 s
'"'L. -distribution.
The cdf of the (; - distribution is given
L-2,
p. 241
J
by
(19)
...
-
where I x I <
•
For
applications of the 1:- distribution the readers are referred to 1:2, P.390_7
and
r4_7: Obviously by (11), a':= (1 - 1/n)1/2 an-l~
L n .
we obtain for
-
-
x,y > 0, and n > 4 •
~(y jn-l)
L- :::t:.
n
F (y) - F (-x) > (1 - 1/n)1/2
(21)
using (7), (14), and (15),
n
n
-
;: (1 - lin)
f
~ (y) -
- !.(-x jn-!)
i
O(x)
if
0:: a
oS 1. Thus in using 1: (y) -
J
:i (-x) _7 •
The second inequality of (21) is obtained by using the fact that
a
n
IJax) ~
~ (-x) as an approximation to
F (y) - F (.x), the proportion,~l error E, as defined by (18) is not more than the
n
n
9
1
maximum of £(7n-1o)/( 7n-21) _7
/'2 -
1 and 1 - (1 - lin).
For n
~
13, this
max imum is l/n.
c.
The correlation coefficients.
Let a
s~ple
of size n + 1 bo drawn from each of k independently and
normally distributod populations.
Tho pdf of the partial correlation coefficients
r12.34~ •• k,(k~ 2), correspcnding to tho k s~ples is then ~2, P. 412_7
-
I zI <
(22)
where nk ~ n - k + 2.
1
,
If k = 2, then (22) reduces to the pdf of the totnl
correlation coefficient r 12 •
Th8 variance of
r 12 • 4••• k is ~l.
3
Let F~(X) be
1/2
-the cdf of ~ r 12 • 34•.• k' then
x
(23)
f-n
1/2
r
2
(n k -3)/2
an (1 - z Ink)
dz
k
1/2
, l x l <- n1<:
•
k
This, obviously, is only a special c~se of (19).
in using ~ (y) l/~.
Therefore the proportional
error
I.(-x) as an approxim:>tion to F (y) - F (-x) is not more than
n
n
k
k
Rotelling £3, p. 196 _7 st.:ltcd that (tho nOrrrl.::tl approximntion) is in ordi-:-
nary cases the most convenient of all (methods for evalu1ting the intogrnl (23) ),
but no suitable bound for the error has been Qvailnblo.
The bound we obtain hero
seems acceptable, at least when n is large compared with k.
D.
~2-distribution.
It is well known £2, p. 251_7 that i f -x,2 has a x,2-distribution
with n d.f., then both (X 2_n )/(2n)1/2
normally distributed with mean 0 and
and (2 X2)1/2 - (2n)1/2 are asymptotically
vari~nce
1.
According to
Fishe~thc
distri-
10
mticn of (2 x,2)1/2 _ (2n_l)1/2 tends to normality even "faster".
Unfortunately,
we are not able to obtain upper and lower bounds for the cdfs of these distributions.
of
lole shall derive, as another application of (7), a lower bound, in terms
:!" 0
F (y) - F (0), where
n
n
of (6), for
(~2)1/2
_
(2n_2)1/2~
If
F (Y) - F (0)
(24)
n
n
y
Q
~
is the cdf of
0, then
2-n /2
Ym
r-l
(n/2)
Jm
xm/2-1 e -x/2 dx
y
=
r-
1
( n/2) (m/2c )m/2
iI
2
2
£1+z/( an) 1/ ]oXPf:-z/( an)1/2-( 1/2 )z /2m_:
o
where
Y,
::II
m
£(2m)1/2 + y
-
71 / 2/2,
m = n-l, and
practical purposes, we may use, for
n
~
Z
= (2x)1/2 - (2m)1/2. For all
4,
n! ~ (2n)1/2 nn+l/2 exp
J: -n
+ 1/12n
r- l (m/2)(m/2o)m/2 ~
Using this approximation and (2), it can be shown that
for n ~ 10.
7.
(2n)-1/2
Nou. applying (2) to the second factor of the integrand in the second.
integral of (24), ne sec that the integrand is not less than (1 - z2/2m )m (Vle assume
that
y < (2m)1/2):
-
using (7).
.
Therofore a lO\'lCr bound can be obtained for
F (y) - F (0) by
n
n
However, a better lower bound cnn be obtained by applying (2) to the
first factor of the integrand.
Siimilarly i f
0
~x~
He have, for all
y
~
0,
(2m)1/2, then
F (0) - F (-x) < ~ (x)
n
n
- ;1:.0
•
11
5.
P;.
remark. SOqu,onces of distribution f'w:lctions are often encountered which
have asymptotic ~2-distributions. For example, if x has a ~-distribution with
parameters m and n, then the c.d.f. of nx is given by ["2. p. 243_7'
y
F
m,n
(y)
\
o
- -
where 0 < Y <n, and m,n > O.
As n - >
of tho -x,2-distribution with m d.f.
00,
2
m,n (y) tends to the c.d.f. Xm(y)
F
B Y methods similar to those used in deriving
(1), we can obta.in upper and lower bounds; for F . (y) in terms of X- 2 (y). Unfor.
m,n
m
tunately, these bounds are not very close to each other unless min is very small;
consequently they are of little practical interest.
The author wishes
We omit the details
to thank Professor Hotelling for his critical reading of
the manuscript.
REFERENCES
J. T. Chu, uOn the distribution of-the sample median," Annals of Ma.thematical Statistics, Vol. 26 (1955), pp. 112-116.
L
. .'
i
I
H. Cramer, Mathematical Methods of Statistics, Princeton University Press
1946.
Harold Hotelling, uNew light on the correlation coefficient and its transferms,u Journal of the Royal Statistical Society, Series B,
Vol. 15 {I9S3', PP. 193-232.
•
UOn a criterion for tho rejection of observations and the
-r4-7 w. R. Thompson,
distribution of the ratio of deviation to sample standard deViation, II Annals of Hnthomatical Statistics, Vol. 6 (1935)" pp.
214-219.
' ,
rr; 7
-
-
J.
v. Usponsky,· Introduction to Mathematical Probability, HcOraw-Hill, New
York, 1931.