ON WEISSMAN'S METHOD OF ESTIMATING LARGE PERCENTILES

Dennis D. Boos
Institute of Statistics Mimeo Series #1369

ABSTRACT

Weissman (1978, JASA) suggested percentile estimators $\hat\eta_p$ based on the joint limiting distribution of the k largest order statistics. The present work extends the method to censored situations and investigates the consistency and asymptotic distribution of $\hat\eta_p$ under two different limiting schemes. Mean squared error calculations indicate that $\hat\eta_p$ is often an improvement over the usual sample percentile estimators.

1. INTRODUCTION
Weissman (1978) suggested a method of estimating the large (or
small) percentiles of a distribution using extreme value theory.
His
approach is attractive because it is asymptotically nonparametric
as long as the domain of attraction of the underlying distribution is
known.
Although the method is applied to estimation of specific quantiles, it may
also be viewed as a smoothing technique for the tails of the quantile
process.
The purpose of this paper is to extend
Weissman's work to censored samples, derive confidence intervals,
address the consistency issue, and identify those situations where
improvement over raw percentile methods is possible.
Weissman's results can be summarized as follows.
Let $X_1, \ldots, X_n$ be a sample from the distribution function $F$ and let $X_{1n} \ge X_{2n} \ge \cdots \ge X_{nn}$ be the order statistics labeled from largest to smallest. Suppose that there are sequences $a_n > 0$ and $b_n$ such that

    $P((X_{1n} - b_n)/a_n \le x) \to G(x)$ as $n \to \infty$                      (1.1)

for all $x$ in the support of $G$. For convenience, consider only the case $G(x) = \exp(-\exp(-x))$ since results for the other two possible limiting distributions are similar. Then for $k$ fixed, Theorem 2 of Weissman (1978) yields

    $\left( (X_{1n} - b_n)/a_n, \ldots, (X_{kn} - b_n)/a_n \right) \stackrel{d}{\to} M_k,$      (1.2)
where $M_k$ is the $k$-dimensional extremal variate with density

    $w_k(x_1, \ldots, x_k) = \exp\{-\exp(-x_k) - \sum_{i=1}^{k} x_i\}, \qquad x_1 \ge \cdots \ge x_k.$

Thus $(X_{1n}, \ldots, X_{kn})$ has approximately the density $w_k((x_1-b_n)/a_n, \ldots, (x_k-b_n)/a_n)/a_n^k$, and $a_n$ and $b_n$ can be estimated by maximum likelihood to get

    $\hat a_n = \bar X_{kn} - X_{kn}$ and $\hat b_n = \hat a_n \ln k + X_{kn},$

where $\bar X_{kn} = k^{-1}\sum_{i=1}^{k} X_{in}$. Let $\eta_{1-c/n}$ be the upper $(1-c/n)$th quantile of $F$. (I use the terms "quantile" and "percentile" interchangeably although technically the .9th quantile is the 90th percentile.) Then, if (1.1) holds, $(\eta_{1-c/n} - b_n)/a_n \to -\ln c$ for each $c > 0$, and a natural quantile estimator is

    $\hat\eta_{1-c/n} = \hat a_n(-\ln c) + \hat b_n = \hat a_n \ln(k/c) + X_{kn}.$

Weissman suggested $\hat\eta_{1-c/n}$ for the situation where only the $k$ largest order statistics are available or as an alternative to raw percentile methods.
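The estimator uses only the top $k$ order statistics. A minimal sketch in Python (the function name and the example sample are mine, not the paper's):

    import numpy as np

    def weissman_quantile(sample, k, p):
        """Estimate the pth quantile as a_hat*ln(k/c) + X_{kn}, with c = n(1-p)."""
        n = len(sample)
        c = n * (1.0 - p)                    # so that p = 1 - c/n
        top_k = np.sort(sample)[-k:][::-1]   # X_{1n} >= ... >= X_{kn}
        a_hat = top_k.mean() - top_k[-1]     # mean of the top k minus X_{kn}
        return a_hat * np.log(k / c) + top_k[-1]

    # Example: the .99 quantile of a standard normal sample (true value 2.326).
    rng = np.random.default_rng(0)
    print(weissman_quantile(rng.standard_normal(1000), k=40, p=0.99))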
This investigation is aimed at the latter context and was motivated by the practical problem explained in Section 2. Section 3 extends the Weissman theory to include censoring of the extremal variate and gives confidence intervals for $\eta_p$. Section 4 discusses consistency of $\hat\eta_p$ under two different limiting schemes. Section 5 reports some Monte Carlo results which suggest values of $(n, k, c)$ for $\hat\eta_p$ to be a true improvement over raw percentile methods. In Section 6, a numerical example illustrates the procedures. Section 7 is a summary.

2. A MOTIVATING EXAMPLE
While preparing a paper (Boos, 1981) on minimum distance estimation,
it became necessary to use Monte Carlo methods to estimate the upper
percentiles of a statistic which involved minimization in several
dimensions. The minimizations were fairly costly so that I allowed myself only N = 1000 replications.
A packaged program (see Dickey, 1981)
processed the Monte Carlo results by placing the output in 800 bins
rather than printing out the N exact observations.
However, in some
situations I did not specify a large enough range in the program, so
that a few of the largest observations were censored.
Likewise, in
other cases I did not trust the largest observations because certain
convergence criteria had not been met.
Thus, I had situations of
Type I censoring and Type II censoring (trimming).
The grouping effect
of the bins was small compared to the standard error of percentile
estimators.
Unfortunately, these latter standard errors for the 95th-99th sample percentile estimators were higher than desired and no more observations could be taken. The extreme value approach turned out to be a useful alternative. See Section 6 for a specific example.

3. INFERENCE FROM THE EXTREMAL VARIATE
In this section I want to extend Weissman's (1978) theory to include censoring of the extremal variate. Suppose that $(X_1, X_2, \ldots, X_k)$ is an extremal variate with location and scale parameters $\mu$ and $\sigma$, i.e., having density $w_k((x_1-\mu)/\sigma, \ldots, (x_k-\mu)/\sigma)/\sigma^k$.

Type I censoring. Suppose that we can only observe those values of $(X_1, \ldots, X_k)$ which are $\le x_0$. Then, if $\infty > x_0 \ge X_r > \cdots > X_k$, the likelihood is

    $\frac{1}{\sigma^{k-r+1}(r-1)!}\exp\{-(r-1)(x_0-\mu)/\sigma\}\exp\{-e^{-(x_k-\mu)/\sigma} - \sum_{i=r}^{k}(x_i-\mu)/\sigma\}.$      (3.1)
Solving the likelihood equations gives

    $\hat\sigma = \Big[\sum_{i=r}^{k} X_i + (r-1)x_0 - kX_k\Big]/(k-r+1)$ and $\hat\mu = \hat\sigma \ln k + X_k.$      (3.2)

Type II censoring. Here, $(X_1, \ldots, X_{r-1})$ are again unavailable but $r$ is not random. The likelihood is just the marginal density of the remaining random variables and is the same as (3.1) except that $x_0$ is replaced by $X_r$. Likewise, $\hat\mu$ and $\hat\sigma$ are given by (3.2) with $x_0$ replaced by $X_r$.
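As a computational aside, (3.2) is immediate to code. A sketch in Python (the function name is mine; for Type I censoring one would pass the censoring point $x_0$ in place of $X_r$):

    import numpy as np

    def extremal_mle_type2(top_k, r):
        """ML estimates (3.2) with x_0 replaced by X_r (Type II censoring).

        top_k : the k largest observations in decreasing order, of which
                the first r-1 are regarded as unavailable (trimmed).
        """
        x = np.asarray(top_k, dtype=float)
        k = len(x)
        x_r, x_k = x[r - 1], x[-1]       # X_r and X_k (1-based indexing)
        sigma_hat = (x[r - 1:].sum() + (r - 1) * x_r - k * x_k) / (k - r + 1)
        mu_hat = sigma_hat * np.log(k) + x_k
        return sigma_hat, mu_hat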
The next results apply only to Type II censoring. Weissman (1978, Theorem 3) showed that the spacings $D_i = (X_i - X_{i+1})/\sigma$ are independent exponentials with mean $i^{-1}$ and also independent of $X_k$. One can easily verify that $\hat\sigma = \sigma T_{k-r}/(k-r+1)$, where $T_{k-r} = \sum_{i=r}^{k-1} iD_i$ is a standard gamma random variable with parameter $k-r$ which is independent of $X_k$. For the situation described in the introduction, $\hat a_n = \hat\sigma$ and $\hat b_n = \hat\mu$. Weissman's estimator of the $p$th quantile $\eta_p$ is given by $\hat\eta_p = \hat\sigma \ln(k/c) + X_k$, where $c = n(1-p)$ and $n$ is the sample size. Then using the above expression for $\hat\sigma$ and Weissman's (1978) expressions (2.8) and (2.9) for the mean and variance of $X_k$, we have

    $E\hat\eta_p = \sigma\Big[\frac{k-r}{k-r+1}\ln(k/c) + \gamma - \sum_{j=1}^{k-1} j^{-1}\Big] + \mu$      (3.3)

and

    $\mathrm{Var}\,\hat\eta_p = \sigma^2\Big[\frac{(k-r)(\ln(k/c))^2}{(k-r+1)^2} + \frac{\pi^2}{6} - \sum_{j=1}^{k-1} j^{-2}\Big],$
where $\gamma = .5772\ldots$ is Euler's constant. If $(X_{1n}, \ldots, X_{kn})$ is approximately an extremal variate, then (3.3) with $\mu = b_n$ and $\sigma = a_n$ should give reasonable approximations to the mean and variance of $\hat\eta_p$. Moreover, since the distribution of $W = (\hat\eta_p - (-\sigma\ln c + \mu))/\hat\sigma$ is free of parameters, confidence intervals can be constructed for $-\sigma\ln c + \mu$ using percentiles of $W$. For the case $r = 1$, $c = 1$, $2 \le k \le 30$, the percentiles of $W$ may be obtained from a table in Weissman (1978). In general I suggest Pearson curve approximations to the distribution of $W$ (see Solomon and Stephens, 1978) since the moments can be calculated using the fact that

    $W = \ln(k/c) + \big[(X_k - \mu)/\sigma + \ln c\big](k-r+1)/T_{k-r},$

where $T_{k-r}$ and $(X_k - \mu)/\sigma$ are independent. Since $T_{k-r}$ is a gamma, $E(T_{k-r})^{-\ell} = 1/[(k-r-1)(k-r-2)\cdots(k-r-\ell)]$. The first four moments $\alpha_1, \ldots, \alpha_4$ of $(X_k - \mu)/\sigma$ are given by

    $\alpha_1 = \gamma - S_{1k}$
    $\alpha_2 = (S_{2\infty} - S_{2k}) + \alpha_1^2$
    $\alpha_3 = 2(S_{3\infty} - S_{3k}) + 3\alpha_1\alpha_2 - 2\alpha_1^3$
    $\alpha_4 = 6(S_{4\infty} - S_{4k}) + 4\alpha_1\alpha_3 - 12\alpha_1^2\alpha_2 + 3\alpha_2^2 + 6\alpha_1^4,$

where $\gamma = .5772\ldots$ is Euler's constant, $S_{ik} = \sum_{j=1}^{k-1} j^{-i}$, and $S_{i\infty} = \sum_{j=1}^{\infty} j^{-i}$. At $r = 1$ and $c = 1$, the skewness and kurtosis of $W$ are $\sqrt{\beta_1} = -.409$ and $\beta_2 = 3.317$ at $k = 100$, $\sqrt{\beta_1} = -.331$ and $\beta_2 = 3.207$ at $k = 150$, and $\sqrt{\beta_1} = -.286$ and $\beta_2 = 3.154$ at $k = 200$.
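These values can be checked by simulation, using the representations $(X_k - \mu)/\sigma = -\ln(\mathrm{Gamma}(k))$ and $T_{k-r} = \mathrm{Gamma}(k-r)$. A sketch (for verification only, not the Pearson-curve calculation itself):

    import numpy as np

    def w_moments(k, r, c, reps=1_000_000, seed=0):
        """Monte Carlo mean, variance, skewness, and kurtosis of W."""
        rng = np.random.default_rng(seed)
        v = -np.log(rng.gamma(k, size=reps))      # (X_k - mu)/sigma
        t = rng.gamma(k - r, size=reps)           # T_{k-r}, independent of v
        w = np.log(k / c) + (v + np.log(c)) * (k - r + 1) / t
        m, s = w.mean(), w.std()
        return m, s**2, ((w - m)**3).mean() / s**3, ((w - m)**4).mean() / s**4

    # k = 100, r = 1, c = 1: skewness near -.41 and kurtosis near 3.32.
    print(w_moments(100, 1, 1))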
Clearly, the distribution of $W$ is approaching normality as $k \to \infty$. Since $\gamma - S_{1k} \approx -\ln k$, $S_{2\infty} - S_{2k} \approx k^{-1}$, and $(k-r+1)^2/[(k-r-1)(k-r-2)] \approx 1$, we have for $k \gg r$

    $\mathrm{Var}\,W \approx [1 + (\ln(k/c))^2]/k.$

Thus, for large $k$ an approximate $(1-u) \times 100\%$ confidence interval for $\eta_p$ is given by

    $\hat\eta_p \pm z_{1-u/2}\,\hat a_n\,\{[1 + (\ln(k/c))^2]/k\}^{1/2},$      (3.4)

where $z_u$ is the $u$th quantile of the standard normal.
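In code, (3.4) is one line of arithmetic. A sketch, assuming $\hat\eta_p$ and $\hat a_n$ have already been computed as above:

    import numpy as np
    from scipy.stats import norm

    def weissman_ci(eta_hat, a_hat, k, c, u=0.10):
        """Approximate (1-u) x 100% confidence interval (3.4) for eta_{1-c/n}."""
        z = norm.ppf(1.0 - u / 2.0)
        half = z * a_hat * np.sqrt((1.0 + np.log(k / c) ** 2) / k)
        return eta_hat - half, eta_hat + half

    # The Section 6 example: k = 50, c = 10, eta_hat = 1.498, a_hat = .271.
    print(weissman_ci(1.498, 0.271, k=50, c=10))   # about (1.378, 1.617)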
4. CONSISTENCY
I want to consider the limiting behavior of $\hat\eta_{1-c/n} = \hat a_n \ln(k/c) + X_{kn}$ under Type II censoring, where

    $\hat a_n = \Big[\sum_{i=r}^{k} X_{in} + (r-1)X_{rn} - kX_{kn}\Big]/(k-r+1).$

In terms of the original sample $X_1, \ldots, X_n$, the $r-1$ largest observations are censored on the right. But since we have relabeled the upper $k$ order statistics $X_{1n} \ge X_{2n} \ge \cdots \ge X_{kn}$ in reverse order, the censoring of $(X_{1n}, \ldots, X_{r-1,n})$ may be said to be on the left. The first lemma is for the $k$ fixed situation described by (1.2).
LEMMA 4.1. If (1.1) holds and $k > 0$, $c > 0$, and $r \ge 1$ are fixed, then

a) $\hat a_n/a_n \stackrel{d}{\to} T_{k-r}/(k-r+1)$, and

b) $(\hat\eta_{1-c/n} - \eta_{1-c/n})/a_n \stackrel{d}{\to} T_{k-r}\ln(k/c)/(k-r+1) + Y_k + \ln c$,

where $T_{k-r}$ is a gamma random variable with parameter $k-r$ which is independent of $Y_k$, and $Y_k$ has density $\exp\{-\exp(-x) - kx\}/(k-1)!$. The proof follows directly from (1.1) since $\hat a_n$ and $\hat\eta_{1-c/n}$ are linear combinations of $(X_{rn}, \ldots, X_{kn})$.

When $c$ is fixed, $\eta_{1-c/n}$ converges as $n \to \infty$ to the supremum of the support of $F$ (typically $+\infty$). Thus, let us define consistency to mean that

    $\hat\eta_{1-c/n} - \eta_{1-c/n} \stackrel{p}{\to} 0$ as $n \to \infty$.      (4.1)

Lemma 4.1 says that (4.1) holds iff $a_n \to 0$ as $n \to \infty$. This notion of consistency may be a little stringent since $\eta_{1-c/n}$ is harder to estimate as $n$ gets large.
EXAMPLES. If $F = \Phi$, the standard normal, then $a_n = (2\ln n)^{-1/2}$ (see Galambos, 1978, p. 65) and (4.1) holds. If $F$ is a Weibull with parameter $1/2$, i.e., $F(x) = 1 - \exp(-x^{1/2})$, $x > 0$, then $a_n = 2(\ln n + 1)$ and (4.1) does not hold. If $F$ is a chi-squared distribution with four degrees of freedom, then $a_n \to 2$ and (4.1) does not hold. If $F$ has exponential tails, i.e., $F(x) = 1 - \exp(-(x-d_1)/d_2)$ for all $x$ sufficiently large and some constants $d_1$ and $d_2$, then $a_n = d_2$ and (4.1) does not hold.
The above examples suggest that (4.1) is not adequate, especially since the convergence in (1.1) is well known to be faster for the exponential than for the normal (see Serfling, 1980, p. 90, or Galambos, 1978, Sec. 2.11). An alternative definition of consistency which is satisfied by all of the above examples is

    $(\hat\eta_{1-c/n} - \eta_{1-c/n})/\eta_{1-c/n} \stackrel{p}{\to} 0$ as $n \to \infty$.

A second asymptotic approach is to let $k$ and $c$ grow with the sample size $n$. Then we may apply standard L-statistic theory (see Serfling, 1980, Ch. 8) to get
LEMMA 4.2. Let $k/n \to \alpha_1$, $c/n \to 1-p$, and $r/n \to \alpha_2$ as $n \to \infty$, where $0 < \alpha_2 < 1 - \alpha_1 < 1$ and $0 < p < 1$. If $\int |x|\,dF(x) < \infty$ and $\eta_{1-\alpha_1}$ and $\eta_{1-\alpha_2}$ are unique quantiles, then

a) $\hat a_n \stackrel{a.s.}{\to} \sigma_\infty = \Big[\int_{1-\alpha_1}^{1-\alpha_2} \eta_t\,dt + \alpha_2\eta_{1-\alpha_2} - \alpha_1\eta_{1-\alpha_1}\Big]/(\alpha_1 - \alpha_2)$, and

b) $\hat\eta_{1-c/n} \stackrel{a.s.}{\to} \eta_\infty = \sigma_\infty\ln(\alpha_1/(1-p)) + \eta_{1-\alpha_1}$.

Moreover, if $\int [F(x)(1-F(x))]^{1/2}\,dx < \infty$, $F'(\eta_{1-\alpha_1}) > 0$, and $F'(\eta_{1-\alpha_2}) > 0$, then

c) $n^{1/2}(\hat\eta_{1-c/n} - \eta_\infty) \stackrel{d}{\to} N(0, E\,IC^2(X_1))$,

where $IC(\cdot)$ is the influence curve of $\hat\eta_{1-c/n}$ given by

    $IC(x) = \ln(\alpha_1/(1-p))\,[IC_1(x) + \alpha_2 IC_2(x) - \alpha_1 IC_3(x)]/(\alpha_1 - \alpha_2) + IC_3(x)$
with

    $IC_1(x) = [I(x \le \eta_{1-\alpha_1}) - (1-\alpha_1)]\eta_{1-\alpha_1} + x\,I(\eta_{1-\alpha_1} < x \le \eta_{1-\alpha_2})$
               $+ [I(x > \eta_{1-\alpha_2}) - \alpha_2]\eta_{1-\alpha_2} - \int_{1-\alpha_1}^{1-\alpha_2} \eta_t\,dt,$

    $IC_2(x) = [1 - \alpha_2 - I(x \le \eta_{1-\alpha_2})]/F'(\eta_{1-\alpha_2}),$

    $IC_3(x) = [1 - \alpha_1 - I(x \le \eta_{1-\alpha_1})]/F'(\eta_{1-\alpha_1}).$

Note that $\hat a_n$ is just a trimmed mean plus a linear combination of $X_{rn}$ and $X_{kn}$. The asymptotic variance in c) is fairly complicated but may be a more accurate approximation in finite samples than that provided by Lemma 4.1(b) (see Tables 3 and 4).
Examples. Suppose that $F$ has exponential tails, i.e., $\eta_t = d_1 - d_2\ln(1-t)$ for all $t$ sufficiently large. Then $\sigma_\infty = d_2$ and $\hat\eta_p$ is strongly consistent in the usual sense under the conditions of Lemma 4.2. Suppose that $F(x) = (1 + \exp(-x))^{-1}$, the standard logistic. Then $F$ has approximately exponential tails, and the asymptotic bias $\eta_\infty - \eta_p$ divided by $\sigma_p$ is given in Table 1.

---insert Table 1 here---
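The Table 1 entries can be reproduced numerically from Lemma 4.2 (the standardization by $\sigma_p$ is described in the next paragraph). A sketch for the logistic; quadrature of the quantile function is my implementation choice:

    import numpy as np
    from scipy.integrate import quad

    def eta(t):                          # logistic quantile function
        return np.log(t / (1.0 - t))

    def standardized_bias(alpha1, alpha2, p):
        """(eta_inf - eta_p)/sigma_p for the standard logistic."""
        integral, _ = quad(eta, 1.0 - alpha1, 1.0 - alpha2)
        tail = alpha2 * eta(1.0 - alpha2) if alpha2 > 0 else 0.0
        sigma_inf = (integral + tail
                     - alpha1 * eta(1.0 - alpha1)) / (alpha1 - alpha2)
        eta_inf = sigma_inf * np.log(alpha1 / (1.0 - p)) + eta(1.0 - alpha1)
        f_p = p * (1.0 - p)              # logistic density at eta_p
        sigma_p = np.sqrt(p * (1.0 - p)) / f_p
        return (eta_inf - eta(p)) / sigma_p

    print(standardized_bias(0.2, 0.0, 0.95))   # about -.0025
    print(standardized_bias(0.2, 0.0, 0.99))   # about  .0133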
Note that $\sigma_p^2/n$, with $\sigma_p^2 = p(1-p)/f^2(\eta_p)$, is the asymptotic variance of the sample $p$th quantile. Standardization by $\sigma_p$ helps compare across distributions, and the square of each entry multiplied by $n$ allows one to assess the importance of the bias. For example, at $\alpha_1 = .2$, $\alpha_2 = 0$, and $p = .95$, $(.0025)^2(1000) = .006$ suggests that bias is not important. However, at $p = .99$, $(.0133)^2(1000) = .18$ suggests that bias is beginning to have an effect since it is 18% as large as the asymptotic variance of the 99th sample percentile. The ratio $k/c \approx 4$-5 tends to give the smallest bias except for the two zeros which occur when $k/c = 1$ and $\hat\eta_p$ reduces to the usual sample $p$th quantile.
---insert Table 2 here---

Table 2 contains similar results for the Weibull, $F(x) = 1 - \exp(-x^{1/2})$, $x \ge 0$. Since the tails of $F$ are not close to exponential, it is not surprising that the bias is generally larger than in Table 1. Here the ratio $k/c \approx 8$-10 appears to give the smallest bias. Of course, the variance of $\hat\eta_p$ will decrease as $k$ gets large, so that a balance must be struck between the contribution of bias to mean squared error (MSE) and the variance contribution.

Tables 1 and 2 are based on Lemma 4.2. However, either Lemma 4.1 (actually (3.3) with $\mu = b_n$ and $\sigma = a_n$) or Lemma 4.2 can be used to calculate the bias, variance, and MSE of $\hat\eta_p$. Tables 3 and 4 illustrate these calculations. Table 3 is for $p = .99$, $\alpha_2 = 0$ (no censoring), and $F = \Phi$, the standard normal.

---insert Table 3 here---
The "truth"
is supplied by Monte Carlo results explained in the next section.
Clearly, the $k/n \to \alpha_1$ approach of Lemma 4.2 gives much better results than the $k$ fixed approach of Lemma 4.1. A partial explanation may be the slow convergence of (1.2) for the normal and/or the inaccuracy of using $a_n = (2\ln n)^{-1/2}$ and $b_n = (2\ln n)^{1/2} - \frac{1}{2}[\ln\ln n + \ln 4\pi]/(2\ln n)^{1/2}$. Table 4 lists similar calculations for the chi-squared distribution with 4 degrees of freedom. Here, the $k$ fixed approach does better, but still not quite as good as the L-statistic approach.
It is interesting that the asymptotic theory which produced the estimators $\hat\eta_p$ does not give as good an approximation in finite samples as the L-statistic approach. The last columns of Tables 3 and 4 may be compared to 13.94 and 524, the respective asymptotic variances (times $n$) of the 99th sample percentiles. These last comparisons illustrate the kinds of reductions in MSE that can be made by using $\hat\eta_p$. More comparisons are given in the next section.
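For instance, the "From Lemma 4.1" entries of Table 3 follow from (3.3) with $\mu = b_n$ and $\sigma = a_n$. A sketch for the standard normal at $r = 1$, using the normalizing constants quoted above:

    import numpy as np
    from scipy.stats import norm

    def lemma41_bias_var(n, k, p):
        """sqrt(n) x bias and n x variance of eta-hat from (3.3), normal case."""
        ln2n = 2.0 * np.log(n)
        a_n = ln2n ** -0.5
        b_n = ln2n ** 0.5 - 0.5 * (np.log(np.log(n))
                                   + np.log(4 * np.pi)) / ln2n ** 0.5
        c = n * (1.0 - p)
        j = np.arange(1, k)                       # j = 1, ..., k-1
        mean = a_n * ((k - 1) / k * np.log(k / c)
                      + np.euler_gamma - np.sum(1.0 / j)) + b_n
        var = a_n ** 2 * ((k - 1) * np.log(k / c) ** 2 / k ** 2
                          + np.pi ** 2 / 6 - np.sum(1.0 / j ** 2))
        return np.sqrt(n) * (mean - norm.ppf(p)), n * var

    print(lemma41_bias_var(100, 10, 0.99))        # about (-.19, 6.3)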
5. MONTE CARLO RESULTS

The goal of this section is to indicate empirically some values of $n$, $k$, and $p = 1 - c/n$ which can be used in practical situations. The distributions studied were normal, t distributions with 3 and 8 degrees of freedom, and chi-squared distributions with 1, 4, and 8 degrees of freedom. These distributions were motivated by the fact that most statistics of interest are approximately normal or chi-squared distributed. A secondary motivation for their use was the availability of the location swindle (see Gross, 1973) to reduce Monte Carlo variance. Sample sizes studied were $n = 50$, 100, and 500 and the Monte Carlo replication size was $N = 5000$. The estimated standard errors for the bias calculations in Table 5 are in the range ±.01 to ±.09. For MSE the maximum relative standard error (s.e./nMSE) for each column is given in parentheses at the bottom of each column. Overall these relative standard errors ranged from .010 to .024. All random variables were generated from normal deviates produced by the Super-Duper generator of Marsaglia et al. (1976).
For comparison purposes I computed the four percentile estimators listed as options in SAS routine UNIVARIATE (see Chilko (1979), p. 429) and a fifth estimator (called AV) motivated by the usual definition of the sample median. If $X_{(1)} \le \cdots \le X_{(n)}$ are the order statistics of the sample in ascending order and $[\cdot]$ is the greatest integer function, this last estimator is

    AV $= (X_{(np)} + X_{(np+1)})/2$  if $np$ is an integer,
       $= X_{([np]+1)}$              otherwise.

The best of the SAS methods was the default option on UNIVARIATE, which I shall call LINT for linear interpolation. LINT is actually obtained by linear inverse interpolation of the empirical distribution function and defined by

    LINT $= (1-g)X_{([np])} + gX_{([np]+1)},$

where $g = np - [np]$.
AV performed very well in terms of bias. However, when $np \ne [np]$, LINT outperformed AV in terms of MSE. When $np = [np]$, no clear winner emerged. Other "raw percentile" methods such as k-point inverse interpolation or methods based on nonparametric density estimators (see Azzalini, 1981) may give small improvements over LINT and AV.
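For reference, sketches of LINT and AV in Python (the small epsilon guards against floating-point error in computing $np$; both assume $0 < p < 1$ with $np < n$):

    import numpy as np

    def lint(sample, p):
        """Linear inverse interpolation of the empirical d.f. (SAS default)."""
        x = np.sort(sample)
        np_ = len(x) * p
        j = int(np_ + 1e-9)                      # [np]
        g = np_ - j
        return (1.0 - g) * x[j - 1] + g * x[j]   # (1-g)X([np]) + gX([np]+1)

    def av(sample, p):
        """Median-style estimator: average adjacent order statistics."""
        x = np.sort(sample)
        np_ = len(x) * p
        j = int(np_ + 1e-9)
        if abs(np_ - j) < 1e-9:                  # np an integer
            return 0.5 * (x[j - 1] + x[j])
        return x[j]                              # X([np]+1) otherwise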
---insert Tables 5 and 6 here---

Table 5 lists Monte Carlo estimates of $n^{1/2}$ bias and $n$MSE for $\hat\eta_p$ in the standard normal. Small amounts of censoring, say (number censored)/$k \le .1$, have only a minor effect on the results. For example, in the first two rows of Table 5 under $p = .95$, we see that one censored observation caused an inflation of $n$MSE from 3.31 to 3.57. However, five observations censored (row 3) inflated the $n$MSE to 6.43. When $k$ gets too large, bias makes a significant contribution to the MSE. At $p = .975$ and $n = 100$, the $n^{1/2}$ bias values for $k = 10$, 20, and 50 (none censored) are -.69, -.05, and 3.86, respectively. The associated $n$-variance values 4.83, 5.33, and 6.40 are relatively stable in comparison.
Table 6 is a basic reduction of Table 5 and lists the ratio of estimates of MSE of $\hat\eta_p$ to that of LINT (the censored cases have been deleted). Thus, if an entry in Table 6 is $< 1$, then $\hat\eta_p$ is outperforming LINT in terms of MSE. From Table 6 we see that when $k/c \le 1$, $\hat\eta_p$ is clearly beaten by LINT, and this is also typically true for $k/n > .4$. The optimal ratio $k/c$ for $\hat\eta_p$ appears to be about 4.
---insert Tables 7 and 8 here---

Table 7 has the same structure as Table 6, except that the data is from a t distribution with 8 degrees of freedom. The pattern of results is very similar to that of Table 6, as one might expect since $t_8$ is fairly close to a normal distribution. However, Table 8, based on the t distribution with 3 degrees of freedom, is markedly different. LINT outperforms $\hat\eta_p$ for most of the entries. It was also discovered, though not listed in Table 8, that censored versions of $\hat\eta_p$ often outperformed uncensored versions. It is easy to verify (see Galambos, 1978, Theorem 2.4.3 (iii)) that (1.1) does not hold for any t distribution.
---insert Tables 9-11 here---

Tables 9-11 are for chi-squared distributions with 1, 4, and 8 degrees of freedom, respectively. In these tables we can see that $k/c$ is unimportant; rather, choosing $k$ large tends to optimize the performance of $\hat\eta_p$. In Tables 9 and 10 bias begins to play a role in the MSE at $k/n = .4$ and $n = 500$, so that $k/n = .2$ is preferred to $k/n = .4$. In Table 11 bias is important even at $n = 100$, so that $k/n = .2$ is preferred for $n \ge 100$ and $p \ge .95$.
~
In general, the new estimators can be expected to do well in
distributions with approximately exponential tails using the rough
kin
rule of thumb
.2.
=
Of course for
n
large enough, bias will
dominate and raw percentile methods will be optimal.
In approximately
normal distributions, more attention needs to be given the ratio
with the rough rule of thumb
~
k/c
4
suggested.
n'"
raw percentile methods are preferable to
6•
for
k/c
In all situations
k/c < 1 .
NUMERICAL EXAMPLE
The basic situation has been described in Section 2. The specific example considered here is Monte Carlo estimation of the percentiles of $T_n = \min_\mu d(F_n, F_\mu)$, where $F_n$ is the empirical distribution function of $n = 50$ standard logistic random variables, $F_\mu(x) = [1 + \exp(-(x-\mu))]^{-1}$, and $d(\cdot,\cdot)$ is the Anderson-Darling distance (see Boos, 1981, for details). In this particular case the percentiles of $T_n$ have been tabulated in Stephens (1979, Table 1) and appear as the first row of Table 12. The next four rows of Table 12 are the new estimators $\hat\eta_p$ using $k = 200$, 150, 100, and 50.
Five observations were outside the reporting range [0,2] and thus were unavailable. However, I used $\hat\eta_p$ based on Type II censoring because the gap between the 995th observation and the upper limit 2 was larger than expected, which suggested that I would have trimmed the five largest observations even if they were available. The last row of Table 12 is the raw percentile estimator LINT, which in these five cases is just $X_{(Np)}$ since $Np$ is an integer. At $p = .9$, it would be hard to choose between LINT and $\hat\eta_p$ for $k = 200$ or $k = 150$. At $p = .95$, all three $\hat\eta_p$ with $k/c > 1$ are better than LINT. At $p = .975$ and $p = .995$, all four $\hat\eta_p$ are an improvement over LINT. At $p = .99$ all but $k = 200$ are an improvement over LINT. It appears that in the range $p = .95$ to $p = .99$, $k/c \approx 4$-6 is best.

For $k = 50$, $r = 6$, and $c = 10$, the moment expressions of Section 3 give $E(W - \ln(k/c)) = -1.6738$, $\mathrm{Var}\,W = .0894$, $\sqrt{\beta_1} = -.5967$, and $\beta_2 = 3.7061$. From a table of standardized Pearson curve percentage points (Bouver and Bargmann, 1974), the standardized 5th percentile is -1.784 and the standardized 95th percentile is 1.466, so that the 5th and 95th percentiles of $W$ are -.598 and .374. Thus, using $\hat a_n = .271$, a 90% confidence interval for $\eta_{.99}$ is given by

    $(1.498 - \hat a_n(.374),\ 1.498 + \hat a_n(.598)) = (1.396, 1.660).$

The approximate method of (3.4) yields (1.378, 1.617). The usual distribution-free method based on order statistics (see Serfling, 1980, p. 103) yields $(X_{(985)}, X_{(995)}) = (1.349, 1.801)$.
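The arithmetic behind the Pearson-curve interval, as reconstructed here from the moments above (the standardized percentiles come from the Bouver-Bargmann table; all numbers are those quoted in the text):

    import numpy as np

    k, c = 50, 10
    eta_hat, a_hat = 1.498, 0.271
    mean_shifted, var_w = -1.6738, 0.0894    # E(W - ln(k/c)) and Var W
    z05, z95 = -1.784, 1.466                 # standardized Pearson percentiles

    w05 = np.log(k / c) + mean_shifted + np.sqrt(var_w) * z05   # 5th pct of W
    w95 = np.log(k / c) + mean_shifted + np.sqrt(var_w) * z95   # 95th pct of W
    print(eta_hat - a_hat * w95, eta_hat - a_hat * w05)         # (1.396, 1.660)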
7. SUMMARY

The percentile estimators $\hat\eta_p$ derived from extreme value theory are biased but often have lower MSE than raw percentile estimators. If the shape of the underlying distribution $F$ is known, then a fairly accurate estimate of the MSE of $\hat\eta_p$ can be obtained from L-statistic theory (Lemma 4.2). Simpler but less accurate calculations may be based on Lemma 4.1 or (3.3). Such calculations along with Monte Carlo results indicate that moderate reductions (20-35%) in MSE are possible for normal data using the rule of thumb $k/[n(1-p)] \approx 4$. Larger reductions may be expected in distributions with approximately exponential tails. For example, in the chi-squared distributions with 1, 4, and 8 degrees of freedom, reductions in MSE of 20-80% were obtained. If a reasonable guess of the shape of $F$ is not possible, then a natural approach would be to let the sample itself select $k$. This idea will be pursued in a later report.
TABLE 1. Standardized Asymptotic Bias for the Logistic

                      p = .9   p = .95   p = .975   p = .99   p = .995
  α1 = .2,  α2 = 0    -.0113    -.0025     .0067      .0133     .0147
  α1 = .1,  α2 = 0     0        -.0037    -.0009      .0028     .0042
  α1 = .05, α2 = 0     .0108     0        -.0013      .00004    .0009

TABLE 2. Standardized Asymptotic Bias for the Weibull*

                      p = .9   p = .95   p = .975   p = .99   p = .995
  α1 = .2,  α2 = 0     .0656     .0326    -.0036     -.0326    -.0417
  α1 = .1,  α2 = 0     0         .0347     .0185     -.0076    -.0200
  α1 = .05, α2 = 0    -.1351     0         .0197      .0069    -.0047

*F(x) = 1 - exp(-x^{1/2}), x > 0.
TABLE 3. Estimates of Bias, Variance, and MSE in the Standard Normal at p = .99

                     n^{1/2} Bias             n Variance                 n MSE
  n    k   α1     L4.1   L4.2    MC       L4.1   L4.2    MC       L4.1   L4.2    MC
 100   10  .1     -.19    .45   -.50      6.32  10.06   9.29      6.36  10.26   9.54
 100   20  .2     -.01   1.87   1.13      5.18  10.05  10.01      5.18  13.57  11.29
 100   50  .5      .17   7.95   7.33      3.48  11.40  11.07      3.51  74.60  64.82
 500   10  .02    2.67   -.40   -.53      5.97   9.76   9.97     13.10   9.93  10.25
 500   20  .04    2.51   -.36   -.73      5.73   9.74   9.67     12.03   9.87  10.20
 500   50  .1     2.56   1.01    .68      4.99  10.06  10.19     11.55  11.08  10.65
 500  100  .2     2.63   4.19   3.89      3.98  10.05  10.26     10.90  27.62  25.38
 500  200  .4     2.69  12.42  12.23      2.93  10.64  10.80     10.16  164.9  160.3

Note: α1 = k/n; "L4.1" and "L4.2" denote calculations from Lemma 4.1 and Lemma 4.2, and "MC" denotes Monte Carlo. The asymptotic variance of the 99th sample percentile (times n) is 13.94.
TABLE 4. Estimates of Bias, Variance, and MSE in the $\chi^2_4$ at p = .99

                     n^{1/2} Bias             n Variance                 n MSE
  n    k   α1     L4.1   L4.2    MC       L4.1   L4.2    MC       L4.1   L4.2    MC
 100   10  .1    -4.06    .50  -3.93       298    336    312       314    337    328
 100   20  .2    -2.82   2.04  -1.13       244    277    265       252    281    266
 100   50  .5    -1.54   8.25   6.18       164    199    193       166    267    231
 500   10  .02     .94   -.44   -.97       363    383    378       364    383    379
 500   20  .04    -.33   -.39  -2.32       349    378    364       349    379    369
 500   50  .1      .08   1.11   -.50       304    336    323       304    338    323
 500  100  .2      .62   4.55   3.20       242    277    265       242    298    275
 500  200  .4     1.07  13.13  12.34       178    216    210       179    388    362

Note: α1 = k/n. The asymptotic variance of the 99th sample percentile (times n) is 524.
TABLE 5. Monte Carlo Estimates of Bias and MSE for Percentile Estimators in the Standard Normal

                        p = .9             p = .95            p = .975           p = .99
        No.        n^{1/2}           n^{1/2}            n^{1/2}            n^{1/2}
  k   Censored     Bias    nMSE      Bias    nMSE       Bias    nMSE       Bias    nMSE

n = 50
 10      0         -.44    2.22      -.56    3.31       -.34    5.08        .31    9.21
 10      1         -.39    2.25      -.45    3.57       -.17    5.89        .56   11.3
 10      5         -.39    2.93      -.45    6.43       -.17   12.4         .56   25.0
 20      0         -.50    2.22       .22    3.44       1.28    7.28       3.04   19.1
 20      5          .10    2.87      1.12    6.85       2.48   15.9        4.63   38.8
 20     10          .50    5.18      1.71   13.4        3.27   29.4        5.7    65.9
 LINT              -.45    2.95      -.69    4.17      -1.14    6.29      -1.92   10.2
 AV                -.05    2.77      -.12    4.33       -.73    6.32       -.51   10.7
                          (.021)            (.014)             (.024)             (.021)

n = 100
 10      0          .26    3.05      -.46    3.37       -.69    5.31       -.50    9.54
 10      1          .26    3.05      -.41    3.45       -.60    5.70       -.35   10.9
 10      5          .26    3.05      -.47    4.35       -.73    9.41       -.56   21.0
 20      0         -.58    2.35      -.55    3.41       -.05    5.33       1.13   11.3
 20      5         -.34    2.37      -.07    4.29        .68    8.62       2.18   20.8
 20     10         -.22    2.91       .16    7.03       1.03   15.5        2.68   36.7
 50      0         -.15    2.14      1.62    6.41       3.86   21.3        7.33   64.8
 50      5          .49    2.73      2.53   11.0        5.04   33.3        8.88   92.6
 50     10         1.07    4.16      3.35   17.1        6.12   47.5       10.3   123
 LINT              -.30    2.92      -.54    4.39       -.89    6.56      -1.85   13.0
 AV                -.02    2.83      -.07    4.20       -.17    6.96       -.05   11.2
                          (.019)            (.015)             (.022)             (.023)

n = 500
 10      0         5.78   57.4       2.79   22.1         .87   10.1        -.53   10.2
 10      1         5.71   59.1       2.75   22.7         .87   10.1        -.50   10.6
 10      5         6.25   79.8       3.06   29.1         .94   10.6        -.74   13.4
 20      0         2.90   17.3        .72    6.19       -.39    5.58       -.73   10.2
 20      5         2.64   17.4        .66    6.23       -.25    5.80       -.33   12.8
 20     10         2.66   20.0        .66    6.37       -.26    6.48       -.36   18.7
 50      0          .13    2.97      -.80    3.78       -.64    5.46        .68   10.6
 50      5          .13    2.97      -.59    3.61       -.23    5.69       1.37   13.8
 50     10          .13    2.97      -.44    3.62        .08    6.44       1.88   17.8
100      0        -1.17    3.37      -.74    3.68        .77    6.03       3.89   25.4
100     10         -.86    2.81      -.12    3.55       1.70    9.33       5.23   39.8
100     20         -.60    1.55       .39    4.23       2.46   13.8        6.33   55.2
 LINT              -.12    2.92      -.20    4.50       -.38    7.01       -.83   13.4
 AV                 .00    2.91       .02    4.47        .00    7.21        .00   13.1
                          (.017)            (.019)             (.019)             (.020)

Note: In parentheses are the maximum estimated relative standard errors (s.e./nMSE) for each column.
TABLE 6. Ratio of MSE of $\hat\eta_p$ to MSE of LINT for Normal Data

                  p = .9         p = .95        p = .975       p = .99
   n     k     k/c   Ratio    k/c   Ratio    k/c   Ratio    k/c   Ratio
  50    10      2     .75      4     .79      8     .81     20     .90
  50    20      4     .75      8     .83     16    1.16     40    1.87
 100    10      1    1.04      2     .77      4     .81     10     .74
 100    20      2     .80      4     .78      8     .81     20     .87
 100    50      5     .73     10    1.46     20    3.24     50    5.00
 500    10     .2   19.69     .4    4.92     .8    1.44      2     .76
 500    20     .4    5.94     .8    1.38    1.6     .80      4     .76
 500    50      1    1.02      2     .84      4     .78     10     .79
 500   100      2    1.15      4     .82      8     .86     20    1.89
 500   200      4     .99      8    1.63     16    5.81     40   11.92
TABLE 7. Ratio of MSE of $\hat\eta_p$ to MSE of LINT for $t_8$ Data

                  p = .9         p = .95        p = .975       p = .99
   n     k     k/c   Ratio    k/c   Ratio    k/c   Ratio    k/c   Ratio
  50    10      2     .78      4     .82      8     .81     20     .74
  50    20      4     .86      8     .77     16     .77     40     .71
 100    10      1    1.08      2     .80      4     .86     10     .75
 100    20      2     .78      4     .78      8     .76     20     .57
 100    50      5     .75     10    1.02     20    1.30     50    1.17
 500    10     .2   20.4      .4    5.23     .8    1.48      2     .83
 500    20     .4    4.94     .8    1.35    1.6     .82      4     .80
 500    50      1    1.01      2     .76      4     .78     10     .67
 500   100      2     .82      4     .75      8     .73     20     .59
 500   200      4     .78      8    1.14     16    1.83     40    1.88
TABLE 8. Ratio of MSE of $\hat\eta_p$ to MSE of LINT for $t_3$ Data

                  p = .9         p = .95        p = .975       p = .99
   n     k     k/c   Ratio    k/c   Ratio    k/c   Ratio    k/c   Ratio
  50    10      2    1.29      4    1.38      8     .99     20     .60
  50    20      4    1.23      8    1.01     16     .61     40     .36
 100    10      1    1.17      2    1.53      4    1.58     10    1.00
 100    20      2    1.40      4    1.46      8    1.10     20     .62
 100    50      5    1.31     10     .97     20     .56     50     .28
 500    10     .2  190        .4   21.6      .8    1.74      2    1.67
 500    20     .4   26.7      .8    1.62    1.6    1.56      4    1.58
 500    50      1    1.04      2    1.90      4    1.71     10     .87
 500   100      2    1.97      4    1.83      8    1.04     20     .60
 500   200      4    2.02      8    1.32     16     .59     40     .46
TABLE 9. Ratio of MSE of $\hat\eta_p$ to MSE of LINT for $\chi^2_1$ Data

                  p = .9         p = .95        p = .975       p = .99
   n     k     k/c   Ratio    k/c   Ratio    k/c   Ratio    k/c   Ratio
  50    10      2     .79      4     .85      8     .81     20     .78
  50    20      4     .74      8     .75     16     .67     40     .62
 100    10      1    1.15      2     .79      4     .84     10     .74
 100    20      2     .78      4     .76      8     .77     20     .60
 100    50      5     .71     10     .67     20     .68     50     .56
 500    10     .2   24.39     .4    5.52     .8    1.52      2     .77
 500    20     .4    5.77     .8    1.42    1.6     .81      4     .75
 500    50      1    1.03      2     .75      4     .76     10     .64
 500   100      2     .79      4     .73      8     .68     20     .55
 500   200      4     .75      8     .69     16     .79     40     .78
-22-
-_
TABLE 10.
'"
Ratio of MSE of n
p
p = .9
to MSE of LINT for X 2 Data
4
P = .95
k/c Ratio
P = .975
k/c Ratio
P = .99
k/c Ratio
k/c
Ratio
10
20
2
4
.76
.74
4
8
.81
.76
8
16
.80
.68
20
40
.77
.61
100
100
100
10
20
50
1
2
5
1.10
.73
.71
2
4
10
.75
.74
.67
4
8
20
.82
.76
.66
10
20
50
.74
.60
.53
500
500
500
500
500
10
.20'
50
100
200
.2
.4
1
2
4
19.80
5.07
1.01
.74
.72
.4
.8
2
4
8
5.15
1.38
.73
.72
.71
.8
1.6
4
8
16
1.50
.78
.75
.69
.76
2
4
10
20
40
.77
.75
.66
.56
.73
n
k
50
50
TABLE 11. Ratio of MSE of $\hat\eta_p$ to MSE of LINT for $\chi^2_8$ Data

                  p = .9         p = .95        p = .975       p = .99
   n     k     k/c   Ratio    k/c   Ratio    k/c   Ratio    k/c   Ratio
  50    10      2     .74      4     .83      8     .81     20     .78
  50    20      4     .73      8     .78     16     .73     40     .71
 100    10      1    1.09      2     .77      4     .82     10     .74
 100    20      2     .72      4     .75      8     .78     20     .65
 100    50      5     .70     10     .78     20     .94     50     .95
 500    10     .2   19.2      .4    5.05     .8    1.48      2     .76
 500    20     .4    5.10     .8    1.39    1.6     .79      4     .75
 500    50      1    1.02      2     .75      4     .75     10     .67
 500   100      2     .79      4     .74      8     .71     20     .68
 500   200      4     .76      8     .81     16    1.22     40    1.62
TABLE 12. Estimates of the Percentiles of $T_n$

                        p = .9   p = .95   p = .975   p = .99   p = .995   $\hat a_n$
True percentiles         .854     1.043     1.238      1.502     1.707
$\hat\eta_p$, k = 200    .860     1.044     1.227      1.470     1.654      .265
$\hat\eta_p$, k = 150    .858     1.047     1.235      1.484     1.672      .272
$\hat\eta_p$, k = 100    .851     1.045     1.240      1.497     1.692      .281
$\hat\eta_p$, k = 50     .873     1.061     1.249      1.498     1.686      .271
LINT                     .849     1.057     1.206      1.471     1.801
REFERENCES

Azzalini, A. (1981), "A Note on the Estimation of a Distribution Function and Quantiles by a Kernel Method," Biometrika, 68, 326-328.

Boos, D. D. (1981), "Minimum Anderson-Darling Estimation," preprint.

Bouver, H., and Bargmann, R. E. (1974), "Tables of the Standardized Percentage Points of the Pearson System of Curves in Terms of √β1 and β2," Technical Report No. 107, Department of Statistics and Computer Science, University of Georgia, Athens.

Chilko, D. M. (1979), "Univariate Procedure," in SAS User's Guide, 1979 Edition, eds. J. T. Helwig and K. A. Council, SAS Institute, Inc., 427-432.

Dickey, D. A. (1981), "Histograms, Percentiles, and Moments," The American Statistician, 35, 164-165.

Galambos, J. (1978), The Asymptotic Theory of Extreme Order Statistics, New York: John Wiley.

Gross, A. M. (1973), "A Monte Carlo Swindle for Estimators of Location," Applied Statistics, 22, 347-353.

Marsaglia, G., Ananthanarayanan, K., and Paul, N. J. (1976), "Improvements on Fast Methods for Generating Normal Random Variables," Information Processing Letters, 5, 27-30.

Serfling, R. J. (1980), Approximation Theorems of Mathematical Statistics, New York: John Wiley.

Solomon, H., and Stephens, M. A. (1978), "Approximations to Density Functions Using Pearson Curves," Journal of the American Statistical Association, 73, 153-160.

Stephens, M. A. (1979), "Tests of Fit for the Logistic Distribution Based on the Empirical Distribution Function," Biometrika, 66, 591-595.

Weissman, I. (1978), "Estimation of Parameters and Large Quantiles Based on the k Largest Observations," Journal of the American Statistical Association, 73, 812-815.