Gertrude M. Cox
ERRORS IN NORMAL APPROXIMATION TO CERTAIN TYPES OF
DISTRIBUTION FUNCTIONS
by
J. T. Chu
University of North Carolina
Special report to the Office of Naval Research
of work at Chapel Hill under Contract NR 042 031,
Project N7-onr-284(02), for research in probability and statistics. Reproduction in whole or
in part is permitted for any purpose of the
United States Government.
Institute of Statistics
Mimeograph Series No. 130
May 27, 1955
ERRORS IN NORMAL APPROXIMATIONS TO CERTAIN TYPES OF
DISTRIBUTION FUNCTIONS
by
J. T. Chu
University of North Carolina
1. Summary. For certain types of sequences of distribution functions which are asymptotically normal, say with mean 0 and variance 1, a method is obtained for deriving upper and lower bounds in terms of the asymptotic distribution function. It is then shown that for many of these distribution functions, the errors in using the normal approximation are small.

2. Introduction. Let $F_n(x)$, $n = 1, 2, \ldots$, be a sequence of cdf's (cumulative distribution functions) such that for every fixed $x$, $F_n(x) \to F(x)$ as $n \to \infty$, where $F(x)$ is a cdf independent of $n$. From a practical point of view, it is desirable to know how large $n$ has to be in order that $D_n(x) = |F_n(x) - F(x)|$ be small, so that $F(x)$ may be used as an approximation to $F_n(x)$, although approximations are often used in practice without much knowledge about the magnitudes of the errors. The function $D_n(x)$ may vary, of course, considerably for different values of $n$ and $x$. But the most interesting $D_n(x)$'s are probably those which tend rapidly to 0, uniformly in $x$. In such cases, $F(x)$ provides, for all $n$ greater than some minimum and for all $x$, a satisfactory approximation for $F_n(x)$. Generally, however, even though there is ample numerical evidence that as $n$ increases $D_n(x)$ rapidly becomes uniformly small, it may not be easy to obtain a mathematical proof.

There are, on the other hand, types of sequences of cdf's for which we are able to confirm rigorously that they do tend rapidly to normality. If a cdf has one of the following forms:

$$F_n(x) = C_n \int_{-\infty}^{x} (1 + z^2/n)^{-m/2}\,dz \quad\text{or}\quad F_n(x) = C_n \int_{-\sqrt{n}}^{x} (1 - z^2/n)^{m/2}\,dz,$$

where $C_n$ and $m$ depend only on $n$, which takes integral values, and $\lim_{n\to\infty} m/n = 1$, then as $n \to \infty$, $F_n(x) \to \Phi(x)$, the normal cdf specified by (6), for every fixed $x$ (Lemma 4). We find that by using simple transformations such as

$$u = [\pm n \log(1 \pm z^2/n)]^{1/2},$$

upper and lower bounds can be easily obtained for the integral in $F_n(x)$ in terms of $\Phi(x)$ (Lemma 3), and if $C_n$ is not a very complicated function of $n$, then rather simple upper bounds can usually be derived for the error in using $\Phi(x)$ as an approximation to $F_n(x)$. If the bound is small, then so is the error. In §4, applications are given to sequences of cdf's corresponding to the Student t-distribution, the $\tau$-distribution of W. R. Thompson [4], and the distributions of the partial and total correlation coefficients when the variates involved are independently and normally distributed. For most of these cdf's, we are able to show that the errors are small in using the normal approximation, though the actual values of the errors seem even smaller.

Footnote 1. Work sponsored by the Office of Naval Research under Contract NR 042 031 at Chapel Hill.

Similar methods were used by the author [1] to derive upper and lower bounds for the cdf of the sample median $\tilde{x}$ in terms of its asymptotic distribution function (which is normal). There we also showed that if the parent distribution is normal, then even for samples of moderate sizes, the error is small in using the normal approximation to the cdf of $\tilde{x}$. The cdf of $\tilde{x}$ can be reduced to one of the forms given above by several transformations. But more refined arguments are needed in order to get the bounds obtained in [1]. Some further applications and remarks are given at the end of §4 and in §5.

3. Lemmas.

Lemma 1.

(1) $e^{x} \ge 1 + x$, for all real $x$;

(2) $\log(1 + x) \ge x - x^2/2$ or $\le x - x^2/2$, according as $x \ge 0$ or $-1 < x \le 0$.

If $x \ge 0$ (the left-hand sides being taken as their limits at $x = 0$), then

(3) $x^2 / (1 - e^{-x^2}) \ge 1$,

(4) $x\,e^{-x^2} / (1 - e^{-x^2})^{1/2} \le 1$.

Proof. The function $e^{x} - x - 1$ has its minimum, 0, at $x = 0$; hence we have (1). (2) holds because $\log(1 + x) - x + x^2/2$ is monotonically increasing for all $x > -1$ and vanishes at $x = 0$. (3) is a consequence of (1). (4) follows from the facts that the LHS (left hand side) tends to 1 as $x \to 0$ and is a monotonically decreasing function of $x$ (differentiate twice).
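
The four inequalities of Lemma 1 can be spot-checked numerically. The sketch below is only an illustration, not part of the proof. It assumes the readings (3) $x^2/(1 - e^{-x^2}) \ge 1$ and (4) $x e^{-x^2}/(1 - e^{-x^2})^{1/2} \le 1$, which are the forms needed later in the proof of Lemma 3; the function name `lemma1_checks` is hypothetical.

```python
import math

def lemma1_checks(x):
    """Evaluate inequalities (1)-(4) of Lemma 1 at a point x > 0.

    (3) and (4) are used in the assumed forms stated in the lead-in.
    Returns a tuple of booleans, one per check.
    """
    # (1): e^t >= 1 + t, checked at t = x and t = -x
    ineq1 = math.exp(x) >= 1 + x and math.exp(-x) >= 1 - x
    # (2) for a positive argument: log(1 + x) >= x - x^2/2
    ineq2_pos = math.log1p(x) >= x - x * x / 2
    # (2) for a negative argument -x, which requires -1 < -x
    ineq2_neg = True if x >= 1 else math.log1p(-x) <= -x - x * x / 2
    one_minus_exp = -math.expm1(-x * x)  # 1 - e^{-x^2}, computed stably
    ineq3 = x * x / one_minus_exp >= 1
    ineq4 = x * math.exp(-x * x) / math.sqrt(one_minus_exp) <= 1
    return ineq1, ineq2_pos, ineq2_neg, ineq3, ineq4

# Scan a grid of points in (0, 5).
all_ok = all(all(lemma1_checks(k / 100)) for k in range(1, 500))
```

A grid scan of this kind cannot replace the monotonicity arguments of the proof; it only guards against a misreading of the inequalities.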

Lemma 2. Let

$$b_n(c) = \frac{1 \cdot 3 \cdots (2n-1)}{2 \cdot 4 \cdots (2n)}\,(n + c)^{1/2},$$

where $c > -1$. Then, for $n = 1, 2, \ldots$,

(5) $\pi^{1/2}\, b_n(c) < 1$ if $c \le 1/4$, and $\pi^{1/2}\, b_n(c) > 1$ if $c \ge 2/7$.

Proof. $b_n(0)$ is known [5, p. 351] as the Wallis product and tends to $\pi^{-1/2}$ as $n \to \infty$. Obviously $b_n(c)$ tends to the same limit for every fixed $c$. By examining the square of the ratio $b_{n+1}(c)/b_n(c)$, it can be shown that $b_n(c)$ is a monotonically increasing function of $n$ if and only if $c \le 1/4$, and is a monotonically decreasing function of $n$ if and only if $c \ge 2/7$. Hence we have (5).

Lemma 3. Let

(6) $$\Phi(x) = (2\pi)^{-1/2} \int_{-\infty}^{x} e^{-z^2/2}\,dz, \qquad \Phi_0(x) = \Phi(x) - 1/2.$$

Let (7) below be abbreviated as $A \le B \le C \le D \le E$, and let the inequalities $A \le B$, $B \le C$, $C \le D$, and $D \le E$ be respectively referred to as (7.1), (7.2), (7.3), and (7.4). Then (7.1) and (7.2) hold for all $m, n > 0$ and $0 \le x \le n^{1/2}$; (7.3) holds for all $m, n > 0$ and $0 \le x < \infty$; and (7.4) holds for all $m > 3$, $n > 0$, and $0 \le x < \infty$.

(7) $$\Big(\frac{n}{m+2}\Big)^{1/2} \Phi_0\Big(x \Big(\frac{m+2}{n}\Big)^{1/2}\Big) \le (2\pi)^{-1/2} \int_0^x (1 - z^2/n)^{m/2}\,dz \le \Big(\frac{n}{m}\Big)^{1/2} \Phi_0\Big(x \Big(\frac{m}{n}\Big)^{1/2}\Big) \le (2\pi)^{-1/2} \int_0^x (1 + z^2/n)^{-m/2}\,dz \le \Big(\frac{n}{m-3}\Big)^{1/2} \Phi_0\Big(x \Big(\frac{m-3}{n}\Big)^{1/2}\Big).$$
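
The chain (7) lends itself to a direct numerical check. The following is a minimal sketch, assuming (7) reads $A \le B \le C \le D \le E$ with $A = (n/(m+2))^{1/2}\,\Phi_0(x((m+2)/n)^{1/2})$, $C = (n/m)^{1/2}\,\Phi_0(x(m/n)^{1/2})$, $E = (n/(m-3))^{1/2}\,\Phi_0(x((m-3)/n)^{1/2})$, and $B$, $D$ the two integrals multiplied by $(2\pi)^{-1/2}$. The helper names `phi0`, `simpson` and `chain7` are hypothetical, and the integrals are approximated by Simpson's rule.

```python
import math

def phi0(x):
    """Phi_0(x) = Phi(x) - 1/2, the normal integral from 0 to x."""
    return 0.5 * math.erf(x / math.sqrt(2.0))

def simpson(f, a, b, steps=2000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def chain7(x, m, n):
    """Return (A, B, C, D, E) of (7); needs m > 3 and 0 <= x <= sqrt(n)."""
    k = 1.0 / math.sqrt(2.0 * math.pi)
    a = math.sqrt(n / (m + 2)) * phi0(x * math.sqrt((m + 2) / n))
    b = k * simpson(lambda z: (1 - z * z / n) ** (m / 2), 0.0, x)
    c = math.sqrt(n / m) * phi0(x * math.sqrt(m / n))
    d = k * simpson(lambda z: (1 + z * z / n) ** (-m / 2), 0.0, x)
    e = math.sqrt(n / (m - 3)) * phi0(x * math.sqrt((m - 3) / n))
    return a, b, c, d, e

def is_ordered(values, tol=1e-9):
    """True if the values are non-decreasing up to a small tolerance."""
    return all(u <= v + tol for u, v in zip(values, values[1:]))
```

For instance, `is_ordered(chain7(1.0, 11, 10))` evaluates the chain at $x = 1$, $m = 11$, $n = 10$. The tolerance only absorbs quadrature error; it is not part of the lemma.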

Proof. It is easy to see that (7.2) and (7.3) are immediate consequences of (1). Now use the transformation

$$v(z) = [\,n \log(1 + z^2/n)\,]^{1/2};$$

then

$$\int_0^x (1 + z^2/n)^{-m/2}\,dz = \int_0^{v(x)} e^{-(m-3)v^2/(2n)}\, h(v/n^{1/2})\,dv,$$

where $h(x)$ is the LHS of (4). By (2) and (4), $h(v/n^{1/2}) \le 1$ and $v(x) \le x$. Hence we have (7.4). Finally (7.1) can be obtained in a similar way by using (3) after applying to the integral the transformation

$$u(z) = [\,-n \log(1 - z^2/n)\,]^{1/2}.$$

Lemma 4. Suppose that for every integer $n \ge n_0 \ge 1$,

(8) $$F_n(x) = C_n \int_{-\infty}^{x} (1 + z^2/n)^{-m/2}\,dz \quad\text{or}\quad F_n(x) = C_n \int_{-\infty}^{x} (1 - z^2/n)^{m/2}\,dz$$

is a cdf, where $C_n$ and $m$ depend only on $n$ and $\lim_{n\to\infty} m/n = 1$. (If the integrand is $(1 - z^2/n)^{m/2}$, it should be replaced by 0 whenever $z^2 \ge n$.) Then, for every fixed $x$,

(9) $$\lim_{n\to\infty} F_n(x) = \Phi(x),$$

where $\Phi(x)$ is defined by (6).

Proof. By Lemma 3, we have

$$\lim_{n\to\infty} (2\pi)^{1/2}\, C_n = 1.$$

Using the same lemma once again, we obtain (9).

4. Normal approximations. In Lemma 4, we showed that if a cdf is of one of the types (8), then it tends to the $\Phi$ of (6) as $n \to \infty$. In this section, we shall find upper and lower bounds for the $C_n$'s corresponding to various well known cdfs of the types (8); then use Lemma 3 to obtain upper and lower bounds for these cdfs themselves, in terms of $\Phi$; and finally derive upper bounds for the proportional errors, defined by (18), in using $\Phi$ as an approximation to those cdfs.

A. t-distribution

The cdf of the t-distribution with $n$ d.f. (degrees of freedom), $n = 1, 2, \ldots$, is given by

(10) $$F_n(x) = a_n \int_{-\infty}^{x} (1 + z^2/n)^{-(n+1)/2}\,dz,$$

where

(11) $$a_n = \frac{\Gamma((n+1)/2)}{(n\pi)^{1/2}\, \Gamma(n/2)}.$$

It is well known that as $n \to \infty$, $F_n(x) \to \Phi(x)$ of (6) for every fixed $x$, and the "speed" of approaching the limit is rather fast. In fact, the normal approximation is often used in practice when $n \ge 30$. We shall derive for $F_n(x)$ upper and lower bounds in terms of $\Phi(x)$, then show that the proportional error in using $\Phi(x)$ as an approximation to $F_n(x)$ is less than $1/n$ for all $n \ge 8$.

Applying (7.4) to $F_n(y) - 1/2$ and $1/2 - F_n(-x)$ and using the fact that $\Phi(-x) = 1 - \Phi(x)$, it can be shown easily that for arbitrary $x, y \ge 0$, and $n \ge 3$,

(12) $$F_n(y) - F_n(-x) \le (2\pi)^{1/2}\, a_n\, c_n^{-1}\, [\,\Phi(c_n y) - \Phi(-c_n x)\,],$$

where $c_n = (1 - 2/n)^{1/2}$. Using (7.3), we obtain, in a similar way,

(13) $$F_n(y) - F_n(-x) \ge (2\pi)^{1/2}\, a_n\, d_n^{-1}\, [\,\Phi(d_n y) - \Phi(-d_n x)\,],$$

where $d_n = (1 + 1/n)^{1/2}$.

Using $\Gamma(x+1) = x\,\Gamma(x)$ and $\Gamma(1/2) = \pi^{1/2}$, it can be seen that

$$(2\pi)^{1/2} a_n = [\,2m/(2m+2c)\,]^{1/2}\, \pi^{1/2}\, b_m(c) \quad \text{if } n = 2m,$$

$$(2\pi)^{1/2} a_n = [\,(2m+2c)/(2m+1)\,]^{1/2}\, \pi^{-1/2}\, b_m^{-1}(c) \quad \text{if } n = 2m+1,$$

$m = 1, 2, \ldots$. Letting $c = 1/4$ and $2/7$ in turn, we obtain, by (5),

$$(2\pi)^{1/2} a_n \le [\,2n/(2n+1)\,]^{1/2} \ \text{ if } n = 2m, \qquad (2\pi)^{1/2} a_n \le (1 - 3/(7n))^{1/2} \ \text{ if } n = 2m+1,$$

$m = 1, 2, \ldots$. In general, for $n \ge 3$,

(14) $$(2\pi)^{1/2} a_n \le (1 - 3/(7n))^{1/2}.$$

Likewise, letting $c = 1/2$ and $1/4$ respectively, we obtain

$$(2\pi)^{1/2} a_n \ge [\,n/(n+1)\,]^{1/2} \ \text{ if } n = 2m, \qquad (2\pi)^{1/2} a_n \ge (1 - 1/(2n))^{1/2} \ \text{ if } n = 2m+1,$$

$m = 1, 2, \ldots$. In general, for $n \ge 2$,

(15) $$(2\pi)^{1/2} a_n \ge [\,n/(n+1)\,]^{1/2}.$$
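
The bounds (14) and (15) on the normalizing constant can be checked directly. This is a sketch under one assumption: $a_n$ is taken to be the usual constant of the t density, $a_n = \Gamma((n+1)/2) / [(n\pi)^{1/2}\,\Gamma(n/2)]$. The function names are hypothetical.

```python
import math

def scaled_t_constant(n):
    """(2*pi)^(1/2) * a_n, with a_n the t-density constant (assumed form).

    Computed through log-gamma to avoid overflow for large n.
    """
    log_ratio = math.lgamma((n + 1) / 2) - math.lgamma(n / 2)
    return math.sqrt(2.0 / n) * math.exp(log_ratio)

def bounds_14_15(n):
    """Return (lower, value, upper): the bound of (15), the constant, the bound of (14)."""
    lower = math.sqrt(n / (n + 1))
    upper = math.sqrt(1 - 3 / (7 * n))
    return lower, scaled_t_constant(n), upper

# (14) and (15) are stated for n >= 3 and n >= 2 respectively; scan n = 3, ..., 200.
holds = all(lo <= val <= up for lo, val, up in map(bounds_14_15, range(3, 201)))
```

Both bounds behave like $1 - \text{const}/n$, with constants $1/2$ and $3/14$ on either side of the true value $1/4$, which is why the resulting error bounds are of order $1/n$.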

By (12) - (15), we have, for arbitrary $x, y \ge 0$, and $n \ge 3$,

(16) $$F_n(y) - F_n(-x) \le \Big[\frac{7n-3}{7n-14}\Big]^{1/2} [\,\Phi(c_n y) - \Phi(-c_n x)\,],$$

(17) $$F_n(y) - F_n(-x) \ge d_n^{-2}\, [\,\Phi(d_n y) - \Phi(-d_n x)\,].$$

The proportional error in using $A$ as an approximation to $B$ is defined to be

(18) $$E = |\,1 - B/A\,|,$$

where $|A|$ is the absolute value of $A$. Now omitting $c_n$ and $d_n$ in the arguments in the $\Phi$'s of (16) and (17), we see that $E$ is not more than the maximum of $[(7n-3)/(7n-14)]^{1/2} - 1$ and $1 - n/(n+1)$. For simplicity, we may state that $E < 1/n$ for all $n \ge 8$. It seems that the actual values of $E$ are much smaller than $1/n$. For example, if $n = 30$ and $y = x = 2.042$, then $F_n(y) - F_n(-x) = .95$ while $\Phi(y) - \Phi(-x) = .9588$, so $E \doteq .0092$. Nevertheless, the bound is of precise and general nature and small enough to justify the use of such approximations for large $n$.
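
The numerical example above can be reproduced with a few lines of code. The sketch below assumes that the proportional error of (18) is $E = |1 - B/A|$, with $A$ the normal value and $B$ the exact t value (the reading that matches the quoted figure of about .0092), and that the bound is the maximum of $[(7n-3)/(7n-14)]^{1/2} - 1$ and $1 - n/(n+1)$. The function names are hypothetical; the t probability is obtained by Simpson's rule rather than a library routine.

```python
import math

def phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def t_interval_prob(n, x, y, steps=4000):
    """F_n(y) - F_n(-x) for Student's t with n d.f., by Simpson's rule."""
    a_n = math.exp(math.lgamma((n + 1) / 2) - math.lgamma(n / 2)) / math.sqrt(n * math.pi)
    f = lambda z: (1.0 + z * z / n) ** (-(n + 1) / 2)
    h = (x + y) / steps
    total = f(-x) + f(y)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(-x + i * h)
    return a_n * total * h / 3.0

def proportional_error(n, x, y):
    """E = |1 - B/A| with A the normal approximation and B the exact value."""
    approx = phi(y) - phi(-x)
    exact = t_interval_prob(n, x, y)
    return abs(1.0 - exact / approx)

def error_bound(n):
    """Upper bound on E read off from (16) and (17); intended for n >= 3."""
    return max(math.sqrt((7 * n - 3) / (7 * n - 14)) - 1.0, 1.0 - n / (n + 1))

e30 = proportional_error(30, 2.042, 2.042)  # roughly 0.009
b30 = error_bound(30)                        # below 1/30
```

Comparing `e30` with `b30` illustrates the remark in the text that the actual error is considerably smaller than the bound.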

B. Thompson's $\tau$-distribution

The cdf of the $\tau$-distribution is given [2, p. 241] by

(19) $$F_n(x) = a'_n \int_{-n^{1/2}}^{x} (1 - z^2/n)^{(n-3)/2}\,dz,$$

where $|x| \le n^{1/2}$. For applications of the $\tau$-distribution the readers are referred to [2, p. 390] and [4]. Obviously by (11), $a'_n = (1 - 1/n)^{1/2}\, a_{n-1}$. Using (7), (14), and (15), we obtain for $x, y \ge 0$, and $n \ge 4$,

(20) $$F_n(y) - F_n(-x) \le \Big[\frac{7n-10}{7n-21}\Big]^{1/2} \Big[\,\Phi\Big(y \Big(\frac{n-3}{n}\Big)^{1/2}\Big) - \Phi\Big(-x \Big(\frac{n-3}{n}\Big)^{1/2}\Big)\Big],$$

(21) $$F_n(y) - F_n(-x) \ge (1 - 1/n)^{1/2} \Big[\,\Phi\Big(y \Big(\frac{n-1}{n}\Big)^{1/2}\Big) - \Phi\Big(-x \Big(\frac{n-1}{n}\Big)^{1/2}\Big)\Big] \ge (1 - 1/n)\, [\,\Phi(y) - \Phi(-x)\,].$$

The second inequality of (21) is obtained by using the fact that $\Phi_0(ax) \ge a\,\Phi_0(x)$ if $0 \le a \le 1$. Thus in using $\Phi(y) - \Phi(-x)$ as an approximation to $F_n(y) - F_n(-x)$, the proportional error $E$, as defined by (18), is not more than the maximum of $[(7n-10)/(7n-21)]^{1/2} - 1$ and $1 - (1 - 1/n)$. For $n \ge 13$, this maximum is $1/n$.
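
The threshold $n \ge 13$ can be confirmed by evaluating the two competing terms. A minimal sketch, assuming the bound is the maximum of $[(7n-10)/(7n-21)]^{1/2} - 1$ and $1 - (1 - 1/n) = 1/n$; the function name `tau_error_bound` is hypothetical.

```python
import math

def tau_error_bound(n):
    """Maximum of the two terms bounding E for the tau-distribution (n >= 4)."""
    upper_term = math.sqrt((7 * n - 10) / (7 * n - 21)) - 1.0
    lower_term = 1.0 / n
    return max(upper_term, lower_term)

# Smallest n (searching from 4) at which the maximum equals 1/n.
first_n = next(n for n in range(4, 100) if tau_error_bound(n) == 1.0 / n)
```

For large $n$ the first term behaves like $11/(14n)$, which stays below $1/n$, so once the maximum equals $1/n$ it remains so.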

C. The correlation coefficients

Let a sample of size $n + 1$ be drawn from each of $k$ independently and normally distributed populations. The pdf of the partial correlation coefficient $r_{12.34 \ldots k}$ ($k \ge 2$), corresponding to the $k$ samples, is then [2, p. 412]

(22) $$\frac{\Gamma(n_k/2)}{\pi^{1/2}\, \Gamma((n_k - 1)/2)}\, (1 - z^2)^{(n_k - 3)/2}, \qquad |z| \le 1,$$

where $n_k = n - k + 2$. If $k = 2$, then (22) reduces to the pdf of the total correlation coefficient $r_{12}$. The variance of $r_{12.34 \ldots k}$ is $1/n_k$. Let $F_{n_k}(x)$ be the cdf of $n_k^{1/2}\, r_{12.34 \ldots k}$; then

(23) $$F_{n_k}(x) = a'_{n_k} \int_{-n_k^{1/2}}^{x} (1 - z^2/n_k)^{(n_k - 3)/2}\,dz, \qquad |x| \le n_k^{1/2}.$$

This, obviously, is only a special case of (19). Therefore, for $n_k \ge 13$, the proportional error in using $\Phi(y) - \Phi(-x)$ as an approximation to $F_{n_k}(y) - F_{n_k}(-x)$ is not more than $1/n_k$. Hotelling [3, p. 196] stated that (the normal approximation) is in ordinary cases the most convenient of all (methods for evaluating the integral (23)), but no suitable bound for the error has been available. The bound we obtain here seems acceptable, at least when $n$ is large compared with $k$.

D. $\chi^2$-distribution

It is well known [2, p. 251] that if $\chi^2$ has a $\chi^2$-distribution with $n$ d.f., then both $(\chi^2 - n)/(2n)^{1/2}$ and $(2\chi^2)^{1/2} - (2n)^{1/2}$ are asymptotically normally distributed with mean 0 and variance 1. According to Fisher, the distribution of $(2\chi^2)^{1/2} - (2n-1)^{1/2}$ tends to normality even "faster". Unfortunately, we are not able to obtain upper and lower bounds for the cdfs of these distributions.
We shall derive, as another application of (7), a lower bound, in terms of $\Phi_0$ of (6), for $F_n(y) - F_n(0)$, where $F_n(y)$ is the cdf of $(2\chi^2)^{1/2} - (2n-2)^{1/2}$. If $y \ge 0$, then

(24) $$F_n(y) - F_n(0) = 2^{-n/2}\, \Gamma^{-1}(n/2) \int_{m}^{y_m} x^{n/2 - 1} e^{-x/2}\,dx = \Gamma^{-1}(n/2)\, (m/2e)^{m/2} \int_0^{y} \Big\{ \big[\,1 + z/(2m)^{1/2}\,\big] \exp\big[-z/(2m)^{1/2} - (1/2) z^2/2m\,\big] \Big\}^{m}\,dz,$$

where $y_m = [\,(2m)^{1/2} + y\,]^2/2$, $m = n - 1$, and $z = (2x)^{1/2} - (2m)^{1/2}$. For all practical purposes, we may use, for $n \ge 4$,

$$n! \doteq (2\pi)^{1/2}\, n^{n + 1/2} \exp[\,-n + 1/12n\,].$$

Using this approximation and (2), it can be shown that

$$\Gamma^{-1}(n/2)\, (m/2e)^{m/2} \ge (2\pi)^{-1/2}$$

for $n > 10$. Now applying (2) to the second factor of the integrand in the second integral of (24), we see that the integrand is not less than $(1 - z^2/2m)^{m}$ (we assume that $y \le (2m)^{1/2}$). Therefore a lower bound can be obtained for $F_n(y) - F_n(0)$ by using (7). However, a better lower bound can be obtained by applying (2) to the first factor of the integrand. We have, for all $y \ge 0$,

$$F_n(y) - F_n(0) \ge \Phi_0(y).$$

Similarly, if $0 \le x \le (2m)^{1/2}$, then

$$F_n(0) - F_n(-x) \le (2\pi)^{1/2}\, \Gamma^{-1}(n/2)\, (m/2e)^{m/2}\, \Phi_0(x).$$

5. A remark. Sequences of distribution functions are often encountered which have asymptotic $\chi^2$-distributions. For example, if $x$ has a $\beta$-distribution with parameters $m$ and $n$, then the c.d.f. of $nx$ is given by [2, p. 243]

$$F_{m,n}(y) = \frac{\Gamma((m+n)/2)}{n^{m/2}\, \Gamma(m/2)\, \Gamma(n/2)} \int_0^{y} z^{m/2 - 1} (1 - z/n)^{n/2 - 1}\,dz,$$

where $0 < y < n$, and $m, n > 0$. As $n \to \infty$, $F_{m,n}(y)$ tends to the c.d.f. $\chi^2_m(y)$ of the $\chi^2$-distribution with $m$ d.f. By methods similar to those used in deriving (7), we can obtain upper and lower bounds for $F_{m,n}(y)$ in terms of $\chi^2_m(y)$. Unfortunately, these bounds are not very close to each other unless $m/n$ is very small; consequently they are of little practical interest. We omit the details.

The author wishes to thank Professor Hotelling for his critical reading of the manuscript.

REFERENCES

[1] J. T. Chu, "On the distribution of the sample median," Annals of Mathematical Statistics, Vol. 26 (1955), pp. 112-116.

[2] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946.

[3] Harold Hotelling, "New light on the correlation coefficient and its transforms," Journal of the Royal Statistical Society, Series B, Vol. 15 (1953), pp. 193-232.

[4] W. R. Thompson, "On a criterion for the rejection of observations and the distribution of the ratio of deviation to sample standard deviation," Annals of Mathematical Statistics, Vol. 6 (1935), pp. 214-219.

[5] J. V. Uspensky, Introduction to Mathematical Probability, McGraw-Hill, New York, 1937.