Chu, John T.; (1954)On the distribution of the sample median." (Navy Research)

.
ON THE DISTRIBUTION OF
TH:C: SAl-1PLE MEDIAN
by
John T. Chu
Special report to the Office of Naval Research
of work at Chapel Hill under Contract N1-onr-28402
for research in probability and statistics.
Institute of Statistics
Mimeograph Series No. 105
May
1954
On the Distribution of the Sample Median l
by John T. Chu
Institute of Statistics, University of North Carolina
1.
Summary.
Upper and lower bounds are obtained for the cumulative distribu-
tion function of the sample median of a sample of size 2n+l drawn from a continuous population.
It is shown that if the parent population is normal, then
the distribution of the sample median tends "rapidly" to normality.
Other kinds
of parent populations are also discussed.
2.
Introduction.
Let a continuous population be given with cdf F(x) (cumula-
tive distribution function) and median £ (assume it exists uniquely).
sample of size 2n+l, let x denote the sample median.
xis,
p. 369_7 that the distribution of
normal with mean
£ and
variance
2
0n
Ll_7,
It is well-Y~own £"2,
under certain conditions, as~~ptotically
= 1/4
the pdf (probability density function).
£"4_7 and J. H. Cadwell
For a
72 (2n+l)
r-f(~)
L
_
where f(x)
= F'(x)
is
Several authors, among them T. Rojo
claimed that numerical investigations showed
that, i f the parent population is normal, "the convergence (of the distribution
of
x)
to normality is surprisingly fast."
It seems, however, no mathematical
proof or disproof has ever been given to this experimental result.
It is the
purpose of this paper to show mathematically that their findings are correct.
·Upper and. lower bounds are obtained for P
r -x < (x - £) /
-
0
n
<
y
-
7
(Equations
( 23) and( 24)), and for large samples, these bounds are reduced to a simpler form,
(25) and (26).
By examining these bounds, it then becomes evident that the
distribution of
(x - £)/0n
tends "rapidly" to normality.
Rectangular and Laplace parent popUlations are briefly discussed.
seems that in these cases the distribution of
It
(x - ~)/on tends to normality at
a "much slower speed."
1 Special report to the Office of Naval Research of work at Chapel Hill under
Contract N7-onr-28402 for research in probability and statistics.
2
3. Upper and Lower Bounds.
Let F(x) and f(x) be respectively the cdf and pdf of a certain popula.tion whose median is~.
If g(x) is the pdf of the sample median
x of a. semple
of size 2n+l, then
(1)
where C = (2n+l) !jn~n~.
It is known 1-2, P. 369_7 that :l.f f( ~ )
n
f' (x) is continuous in some neighborhood. of x
normal distribution with
mean~
=~ ,
:f
-
then x has a,n a.symptotica.lly
and variance
(2)
For finite n, let the cdf of (i-~
~
!
(3)
1
H(y) - "2 =
1
I
I
+ya
)jan
be H(x).
g(t)dt
= en
j
J
~
Then for any y > 0,
, F(~ +yo n )
n
1
'2
° and
u n (l-u )n du
3
Applying to (3) the transformation
v
( 4)
= 1 L-l
2
1
- e- t 21(2n+l)72
'
we obtain without difficulty
,tl
1
,(
H(y) - -> a. B
2_
n I
___1_ e
o
(6)
H(y)
_12
_<bBn
t2
2n+l
f2$.
I
I
_ -E:!:!
t
J
o
t
1
2n+l
a
h ( t ) dt,
2 /2n+l
j2;c
(8)
=t
dt,
for
0.:: 8 .:: 1,
_~ t 2
where
h (t )
2
)
/2n+l
2
1
h (
a -t
2
2
1
I( I-a -t)2 ,
for b > 1,
4
(10)
(11)
t
2
2
= L-(2n +1) log ( 1 _(4/b )
LF(~
1
~_72 t7
2
+ yOn) -
It can be shown (by differentiations) that hl(t) and h (t) are respectively
2
monotonically increasing and decreasing functions of t, when t
Lim
t -> 0
hl(t)
= Lim
t -:;> 0
h (t)
2
= 1.
(12)
H ( y)
1/
f
- '2 ~ a Bn
1 -
2ii+2 Lr ¢
1
.--
/2n+2)
(tl,j
2ii+I
-
1
'2 _
7,
where
t
(14)
11' (t)
1
) -f2i
=
-00
In a similar way we can show that for arbitrary x
(15)
H(y) - H( -x)
>a
-
B
nv
/;"'.~
1
2n+2
0, and that
Hence we obtain, from (5) and (6),
:----_.1
~
>0
and y
> 0,
5
where t
3
y by -x.
and t
4
are obtained from t
l
and t
2
in (10) and (11) by replacing
It can be seen from (10) and (11) that if x and yare fixed,
= t 4 = x + 0(1) for large n. In the following
3
sections we will show, for various kinds of parent distributions, that if
t
l
=t2 =Y+
0(1) and t
a and b are properly chosen, then (15) and (16) remain valid if t , t ; and
l
2
t , t are respectively replaced by y and x.
4
3
If n is large, upper and lower bounds for B can be obtained by using
n
W. Feller ["3_7 showed that for n ~ 4,
Stirling's formula.
where
n~
= f21t
IQ I ~
1/6.
n
n+.l2
e
·1 (l+Q)
-n+--l2n 360n 3
1+Q-
If the last term, - ----- , is omitted, then it can be
360n 3
shown that
(18)
1 +
1
trri
7n+3
<B<1+ 1 +
2
n
En
24n (2n+l)
1
16n(8n-l) ,
or
4. Normal Parent Population
Suppose that a sample of size 2n+l is drawn from a normal population
-
2
with mean S and variance a • The distribution of x is then asymptotically
normal with mean
s and
variance ~a2/2(2n+l). It has been shown that if for
6
x>
0,
II> (X) _ <l?( -X)
(20)
= a(x)
2 2
Ll _ e- it X
1
7"2 ,
then a(x), a function of x, never exceeds 1 and is very close to 1 for
all values of x
> O.
J. D. Williams
L6_7
proved that a(x) .$ 1 and tabulated
l/a(x) -1 for a number of values of x ranging from .1 to 2.0.
G.
P61ya
L5_7
gave several proofs for the same inequality and remarked that if
2 2
- -
(1 - e
X
1
:n:)2
is used as an approximation to lI>(x) - II>(-x), "then the
error committed is less than one per cent (even less than
of the quantity approximated."
(21)
For arbitrary x
(22)
,
a(x) > .9929
>0
and y
x
/2ii+I
H(y) - H(-x)
> Min
> O.
let
,
Applying (20) to (15) and
( 23)
In other words,
for all x
> 0,
.71 per cent)
(16), it follows that
r;. - 2ii+2
;,.
( a(x ), a(y )} B
n
n
n.J
Ii( j2n+2) - <IJ(_x/2n+?, )7
2n+l
~2n+l -
L .... Y.,..
where Bn , <l?(x), and a(x) are defined by (7), (14), and (20).
we suggest to use
For large n
7
/--
(26)
5.
~
H{y) - H{ -x)
(I +
§n)
j 1 + ~ L4>(y)
- 4>( -x)_7.
Other Parent Populations.
A. Rectangular Distribution.
then
s = !(c+d),
Ir x > 0 and y
and 0n2
> 0,
Let rex)
= (c-d)2/4(2n+l).
= l/{d-c),
where c < x < d,
Let H(x) be the cdr or
(x-s)/on .
then lower bounds rorH(y) - H(-x) are the RHS (right-
hand sides) or (23) and (25),
without the factors Min (a(x ), a(y )J and
n
n
.9929 and upper bounds for H(y) - H(-x) are the RHS of (24) and (26) with
an additiona.l ractor Max (b(x ), bey ) J where b(x) is derined by
n
n
b{x)
=0
l-e -x
,
> 0,
x
and
,
(28)
We note that b(x) is close to 1 only if x is close to 0, e. g.,
b(.l)
= 1.02
and b(.2}
= 1.05.
This means: unless x and yare small, upper
and lower bounds for H(y) - H{-x) are not very close to each other except for
large n.
B.
where
-00
Laplace Distribution.
< x < 00,
then
s is
Let rex)
= 2~
the median and ~
_ .k.lJ
e
A
,
= A2/{2n+l). Define c{x) by
8
1 _ e- x
We say that c{x)
it.
~
= c{x)
2 1
(l _ e-x )2
1 : if x
~
, X
> O.
1, this is obvious; if x
~
1, use (20) to prove
It then becomes clear that upper bounds for H{y) - H(-x) are the same
as those corresponding to a normal parent population, i.e., (24) and (26).
Lower bounds for H(y) - H(-x) are the RHS of (23) and (25) with
Min (a{xn ), a(Yn») and .9929 replaced by Min (c(Xn ), c{Yn) J,:.wher$.c(x) is
defined by (29) and x
n
= x//2n+l
• Again we remark that c(x) is close to 1
if x is small or large, e. g., c(.l) = .99, c(.2) = .92, and c(3)
while c(l)
= .79
and c(2)
=
.97,
= .87.
Finally we note that since c(x) tends to 1 as x tends to 0,
hey) - H(-x) tends to ¢(y) - ¢(-x) as n tends to infinity.
(x-~)/a has an asymptotically normal distribution.
n
Therefore
In the general theorem
which showed that (x-~)/an has an asymptotically normal distribution
(see ~2, p. 369_7 , also the beginning of §
be continuous.
3), it is required that
For a Laplace distribution, however,
ft(~)
ft(~)
does not exist.
9
References
["1_7 J. H. Cadwell, "The distribution of quantiles of small samples,"
Biometrika, Vol. 39 (1952), pp. 207-211.
~2_7
H. Cramer, Mathematical Methods of Statistics, Princeton University
Press, 1946.
["3J
W. Feller, "On the normal approximation to the binomial distribution,"
Ann. Math. Stat., Vol. 16 (191~5), pp. 319-329.
~4_7
T. Hojo, "Distribution of the median, quartiles, and interquartile
distance in s8.Illples from a normal population," Biometrika
Vol. 23 (1931), pp. 315-360.
"Remarks on computing the probability integral in one and two
dimensions," Proceedings of the Berkeley Symposium on Mathematical Statistics and Probability, University of California
Press, Berkeley, 1949, pp. 63-78.
1:6_7
J. D. Hil1iams, "An approximation to the probability integral," Ann.
Math. Stat., Vol. 17 (1946), pp. 363-365.