Steinberg, Seth Michael (1983). "Confidence Intervals for Functions of Quantiles Using Linear Combinations of Order Statistics."

CONFIDENCE INTERVALS FOR FUNCTIONS OF QUANTILES
USING LINEAR COMBINATIONS OF ORDER STATISTICS
by
Seth Michael Steinberg
Department of Biostatistics
University of North Carolina at Chapel Hill
Institute of Statistics Mimeo Series No. 1433
March 1983
CONFIDENCE INTERVALS FOR FUNCTIONS OF QUANTILES
USING LINEAR COMBINATIONS OF ORDER STATISTICS
by
Seth Michael Steinberg
A Dissertation submitted to the faculty of The University of North
Carolina at Chapel Hill in partial fulfillment of the requirements
for the degree of Doctor of Philosophy in the Department of
Biostatistics, School of Public Health.
Chapel Hill
1983
Approved by:
Advisor
ABSTRACT
SETH MICHAEL STEINBERG. Confidence Intervals for Functions of Quantiles Using Linear Combinations of Order Statistics. (Under the
direction of C.E. DAVIS)
Estimators for quantiles based on linear combinations of order statistics have been proposed by Harrell and Davis (1982) and Kaigh and Lachenbruch (1982). Both estimators have been demonstrated to be at least as efficient for small-sample point estimation as an ordinary sample quantile estimator based on one or two order statistics. Distribution-free confidence intervals for quantiles can be constructed using either of the two approaches. By means of a simulation study, these confidence intervals have been compared with several other methods of constructing confidence intervals for quantiles in small samples. For the median, the Kaigh and Lachenbruch method performed the best overall. For other quantiles, no method performed better than the method which uses pairs of order statistics.

The interquantile difference is often useful as a measure of dispersion. Both the Harrell-Davis and Kaigh-Lachenbruch estimators are modified to estimate interquantile differences. Theoretical developments needed to establish large-sample use of the normal distribution for these estimators are presented. Both of these methods are used to form pivotal quantities with asymptotic normal distributions, and thus are readily used for construction of confidence intervals.

The point estimators of interquantile difference are compared through simulations on the basis of relative mean squared errors. The estimator based on the Harrell-Davis method generally performed best in this regard. Confidence intervals are constructed and compared with a method based on pairs of order statistics. This order statistic method produced very conservative intervals. The performance of the other estimators varied, and was better for symmetric distributions. Neither method could consistently produce intervals of the desired confidence.

Finally, an example using data from the Lipid Research Clinics Program is presented to illustrate use of the new estimators for point and interval estimation of quantiles and interquantile differences.
ACKNOWLEDGEMENTS
My committee, chaired by Dr. C.E. Davis, was extraordinary in
the amount of time and effort put forth to assist with this project.
I sincerely thank Dr. Davis, and am grateful for his availability to discuss my work, his suggestions, and his guidance throughout my writing of the dissertation.
I want to thank the other members of the committee, Drs. Shrikant Bangdiwala, Frank Harrell, Abdel Omran, and Dana Quade, for their comments and suggestions, and for maintaining a strong interest in the project.
The past few years in Chapel Hill have been very enjoyable.
This is due in large part to the many wonderful friends I have made
here.
I thank them all for their support along the way.
My parents deserve special thanks for encouraging me to obtain
a worthwhile education and for supporting me throughout the whole
process.
I would like to thank Dr. P.K. Sen for initially suggesting my
investigation of this area of research, and for providing helpful
information when it was needed.
Dr. William Kaigh of The University of Texas at El Paso and Dr. Bruce Schmeiser of Purdue made available some results of their own research, which is gratefully acknowledged.
Data for the example in Chapter V are used with permission of the
National Heart, Lung, and Blood Institute.
Finally, I would like to thank Ernestine Bland for providing superb, speedy typing services for this manuscript, and the entire faculty and staff of the Department of Biostatistics for making my experience pleasant and rewarding. Funding was provided by NICHD training grant #5-T32-HD07102-05, and by Survey Design, Inc.
TABLE OF CONTENTS

                                                                 Page

ACKNOWLEDGEMENTS ..............................................    iv

LIST OF TABLES ................................................    ix

CHAPTER

I    INTRODUCTION AND REVIEW OF THE LITERATURE ................     1
     1.1  Introduction ........................................     1
     1.2  Review of the Literature ............................     2
          1.2.1  Simple Point and Interval Estimators .........     2
          1.2.2  Various Median Estimation Methods ............     8
          1.2.3  Estimators for the p-th Quantile .............    16
          1.2.4  Quantile Estimators for Specific
                 Distributions ................................    23
                 1.2.4.1  Normal Distribution Quantiles .......    23
                 1.2.4.2  Exponential Distribution
                          Quantiles ...........................    25
                 1.2.4.3  Quantile Estimation for Other
                          Distributions .......................    28
          1.2.5  Estimation of Quantile Intervals .............    28
          1.2.6  Estimation of Quantile Differences ...........    31
     1.3  Outline of the Research Proposal ....................    32

II   A COMPARISON OF CONFIDENCE INTERVALS FOR QUANTILES .......    34
     2.1  Introduction ........................................    34
     2.2  Selection of Interval Estimators for Comparison .....    34
     2.3  Note on the Use of the Kaigh and Lachenbruch
          Estimator ...........................................    36
     2.4  Evaluation of Confidence Intervals ..................    38
          2.4.1  Exact Confidence Intervals ...................    38
                 2.4.1.1  Determination of Confidence .........    38
                 2.4.1.2  Expected Length of Confidence
                          Intervals ...........................    39
          2.4.2  Simulated Confidence Intervals ...............    43
                 2.4.2.1  Determination of Confidence .........    44
                 2.4.2.2  Expected Lengths of Intervals .......    45
          2.4.3  Selection of Distribution for Pivotal
                 Quantity .....................................    47
     2.5  Details of the Simulation Process ...................    47
     2.6  Results from Simulated or Theoretical
          Construction of Intervals ...........................    49
     2.7  Conclusions .........................................    51

III  THEORY FOR ESTIMATION OF AN INTERQUANTILE DIFFERENCE .....    69
     3.1  Introduction ........................................    69
     3.2  Theory for the L-COST Estimator of Interquantile
          Difference ..........................................    70
          3.2.1  The L-COST Interquantile Difference
                 Estimator ....................................    70
          3.2.2  Theoretical Framework for Convergence
                 to Normality .................................    71
                 3.2.2.1  L-estimators and the L-COST
                          Estimator ...........................    71
                 3.2.2.2  Establishing Conditions for
                          Convergence .........................    72
          3.2.3  Convergence Theorems for the L-COST
                 Estimator of Interquantile Difference ........    77
          3.2.4  Confidence Interval Estimator Based
                 on L-COST Interquantile Difference
                 Estimator ....................................    80
     3.3  Theory for the Kaigh and Lachenbruch (1982)
          Estimator of an Interquantile Difference ............    81
          3.3.1  The K-L Interquantile Difference
                 Estimator ....................................    81
          3.3.2  Convergence Theorems for the K-L
                 Estimator of Interquantile Difference ........    82
          3.3.3  Confidence Interval Estimator for the
                 K-L Interquantile Difference Estimator .......    84

IV   A COMPARISON OF POINT AND CONFIDENCE INTERVAL
     ESTIMATORS OF INTERQUANTILE DIFFERENCES ..................    86
     4.1  Introduction ........................................    86
     4.2  Point Estimators for the Interquantile
          Difference ..........................................    87
     4.3  Evaluation of Point Estimators ......................    87
          4.3.1  Methodology for Comparisons ..................    87
          4.3.2  Results of Comparisons .......................    89
     4.4  Evaluation of Confidence Intervals ..................    91
     4.5  Results from Simulated Confidence Intervals .........    93
     4.6  Conclusion and Summary ..............................    95

V    EXAMPLE OF QUANTILE ESTIMATION METHODS ...................   105
     5.1  Introduction ........................................   105
     5.2  Comparison of Results for the Example ...............   106
     5.3  Conclusion ..........................................   108

VI   SUMMARY AND SUGGESTIONS FOR FURTHER RESEARCH .............   119
     6.1  Summary .............................................   119
     6.2  Suggestions for Further Research ....................   121

BIBLIOGRAPHY ..................................................   123

APPENDIX ......................................................   129
LIST OF TABLES

                                                                 Page

TABLE

2.1   Order Statistics X(j), X(k) Comprising a Confidence
      Interval (with Theoretical Confidence) for Various
      Quantiles and Sample Sizes ..............................    54

2.2   Expected Lengths of 95% Confidence Intervals (and
      Theoretical or Observed Confidence) Computed for
      Various Quantiles of the Uniform Distribution, with
      Three Sample Sizes ......................................    55

2.3   Expected Lengths of 99% Confidence Intervals (and
      Theoretical or Observed Confidence) Computed for
      Various Quantiles of the Uniform Distribution, with
      Three Sample Sizes ......................................    56

2.4   Expected Lengths of 95% Confidence Intervals (and
      Theoretical or Observed Confidence) Computed for
      Various Quantiles of the Normal Distribution, with
      Three Sample Sizes ......................................    57

2.5   Expected Lengths of 99% Confidence Intervals (and
      Theoretical or Observed Confidence) Computed for
      Various Quantiles of the Normal Distribution, with
      Three Sample Sizes ......................................    58

2.6   Expected Lengths of 95% Confidence Intervals (and
      Theoretical or Observed Confidence) Computed for
      Various Quantiles of the Cauchy Distribution, with
      Three Sample Sizes ......................................    59

2.7   Expected Lengths of 99% Confidence Intervals (and
      Theoretical or Observed Confidence) Computed for
      Various Quantiles of the Cauchy Distribution, with
      Three Sample Sizes ......................................    60

2.8   Expected Lengths of 95% Confidence Intervals (and
      Theoretical or Observed Confidence) Computed for
      Various Quantiles of the Exponential Distribution,
      with Three Sample Sizes .................................    61

2.9   Expected Lengths of 99% Confidence Intervals (and
      Theoretical or Observed Confidence) Computed for
      Various Quantiles of the Exponential Distribution,
      with Three Sample Sizes .................................    63

2.10  Expected Lengths of 95% Confidence Intervals (and
      Theoretical or Observed Confidence) Computed for
      Various Quantiles of the Lognormal Distribution,
      with Three Sample Sizes .................................    65

2.11  Expected Lengths of 99% Confidence Intervals (and
      Theoretical or Observed Confidence) Computed for
      Various Quantiles of the Lognormal Distribution,
      with Three Sample Sizes .................................    67

4.1   Relative Bias of Proposed Estimators ....................    97

4.2   Relative Efficiency of Proposed Estimators vs.
      Sample Quantiles Method .................................    98

4.3   Indices for Order Statistics Selected for Formation
      of Confidence Intervals Described by Chu (1957) .........    99

4.4   Expected Lengths of Confidence Intervals (and
      Observed Confidence) Computed for Interquantile
      Distances from the Uniform Distribution .................   100

4.5   Expected Lengths of Confidence Intervals (and
      Observed Confidence) Computed for Interquantile
      Distances from the Normal Distribution ..................   101

4.6   Expected Lengths of Confidence Intervals (and
      Observed Confidence) Computed for Interquantile
      Distances from the Cauchy Distribution ..................   102

4.7   Expected Lengths of Confidence Intervals (and
      Observed Confidence) Computed for Interquantile
      Distances from the Exponential Distribution .............   103

4.8   Expected Lengths of Confidence Intervals (and
      Observed Confidence) Computed for Interquantile
      Distances from the Lognormal Distribution ...............   104

5.1   Estimates of Median, Lipid Data, Sample Size 51,
      Users and Nonusers of Oral Contraceptives ...............   110

5.2   Limits for 95% Confidence Intervals for Median
      of Lipid Data, Sample Size 51, Users and Nonusers
      of Oral Contraceptives ..................................   111

5.3   Limits for 99% Confidence Intervals for Median
      of Lipid Data, Sample Size 51, Users and Nonusers
      of Oral Contraceptives ..................................   112

5.4   Estimates of Interdecile Difference, Lipid Data,
      Sample Size 51, Users and Nonusers of Oral
      Contraceptives ..........................................   113

5.5   Limits for 95% Confidence Intervals on Interdecile
      Range, Lipid Data, Sample Size 51, Users and
      Nonusers of Oral Contraceptives .........................   114

5.6   Limits for 99% Confidence Intervals on Interdecile
      Range, Lipid Data, Sample Size 51, Users and
      Nonusers of Oral Contraceptives .........................   115

5.7   Estimates of Interquartile Difference, Lipid Data,
      Sample Size 51, Users and Nonusers of Oral
      Contraceptives ..........................................   116

5.8   Limits for 95% Confidence Intervals on Interquartile
      Range, Lipid Data, Sample Size 51, Users and
      Nonusers of Oral Contraceptives .........................   117

5.9   Limits for 99% Confidence Intervals on Interquartile
      Range, Lipid Data, Sample Size 51, Users and
      Nonusers of Oral Contraceptives .........................   118
CHAPTER I
INTRODUCTION AND REVIEW OF THE LITERATURE
1.1 Introduction
Suppose there is interest in the probability distribution of
some random variable, X, having cumulative distribution function
denoted F(x) and probability density (or mass) function f.
It may
be desired to estimate various characteristics of this distribution
by means of a random sample of size n.
Denote this random sample by $X_1,\ldots,X_n$ and its observed realization by $x_1,\ldots,x_n$. Often a mean and standard deviation are estimated from this sample, but there are many instances where additional measures of location and dispersion are of more value.
For example, to determine the number
of months of marriage after which half of the mothers in a study
gave birth to their first child, estimate the median time until
first birth.
Or it may be necessary to know the cholesterol level
which is exceeded by only 5% of the studied population.
In each of
these examples, the quantity of interest is called a population
quantile.
Formally, a p-th quantile, denoted $F^{-1}(p)$ or $\xi_p$, of a probability distribution $F(x)$ is defined by
$$\int_{-\infty}^{\xi_p} f(x)\,dx = p.$$
Thus, $\xi_{.5}$ is the median of the distribution, $\xi_{.95}$ is the 95th "percentile," and so on.
If $X$ is discretely distributed, $P(X < \xi_p) \le p \le P(X \le \xi_p)$ and $\xi_p = F^{-1}(p)$, where $F^{-1}(p) = \inf\{x : F(x) \ge p\}$.
In addition to the quantiles themselves, important functions of quantiles exist. For example, $\xi_{.75} - \xi_{.25}$ is called the interquartile range and $\xi_{.90} - \xi_{.10}$ is the interdecile range. Each of these quantities provides a useful measure of dispersion in a population, especially if there is uncertainty about the shape of the distribution from which the data arise.
1.2 Review of the Literature
1.2.1 Simple Point and Interval Estimators
Many methods have been proposed for estimating a quantile, and the very simplest one is based on a single ordered value from the random sample. If the observed values of the sample, $x_1,\ldots,x_n$, are arranged in ascending order of magnitude and denoted by $x_{(i)}$, then $x_{(1)} \le \cdots \le x_{(n)}$ constitute the order statistics corresponding to the random sample. One such definition of the p-th sample quantile is
$$X_p = \begin{cases} x_{(np)} & \text{if } [np] = np, \\ x_{([np]+1)} & \text{if } [np] < np, \end{cases}$$
where $[y]$ denotes the greatest integer less than or equal to $y$. $X_p$ may be used as a point estimator of $\xi_p$; for example, see Ogawa (1962).
This estimator has the following important property: if $f(x)$ is differentiable in the neighborhood of $x = \xi_p$ and $f(\xi_p) \ne 0$, the distribution of the random variable
$$\sqrt{n/(p(1-p))}\; f(\xi_p)(X_p - \xi_p)$$
tends to that of a $N(0,1)$ random variable (normal random variable with mean 0, variance 1) as $n \to \infty$, as explained by Ogawa (1962).
Another estimator, defined as $X_{[p(n+1)]}$, is biased for $\xi_p$, as shown by a simple example in Schmeiser (1975). Hogg and Craig (1978, pp. 308-311) demonstrate, however, why $X_{((n+1)p)}$ is the 100p-th percentile of the sample only for $p$ such that $(n+1)p$ is an integer.
Let $Z = F(X)$. Then $Z$ is uniformly distributed on the unit interval. If the sample is ordered $X_{(1)} < X_{(2)} < \cdots < X_{(n)}$, then with $Z_k = F(X_{(k)})$, the density function for $Z_k$ is
$$h_k(z_k) = \frac{n!}{(k-1)!(n-k)!}\, z_k^{k-1}(1-z_k)^{n-k}, \qquad 0 < z_k < 1.$$
Define the random variables $W_1 = F(X_{(1)})$, $W_2 = F(X_{(2)}) - F(X_{(1)})$, and so on. $W_i$ is called a coverage of the random interval $\{I : X_{(i-1)} < I < X_{(i)}\}$, and each $W_i$ has the same pdf as $Z_1 = F(X_{(1)})$. Thus, the expected value of each coverage can be shown to be
$$E(W_i) = \int_0^1 n\, w_i (1-w_i)^{n-1}\,dw_i = 1/(n+1).$$
(The order statistics lead to a partition of the probability distribution into $n+1$ sections, with common expected value $1/(n+1)$.) Because $F(X_{(j)}) - F(X_{(i)})$ is the sum of $j-i$ coverages, $E(F(X_{(j)}) - F(X_{(i)})) = (j-i)/(n+1)$; so if $(n+1)p = k$, then $E(F(X_{(k)})) = k/(n+1) = (n+1)p/(n+1) = p$. This implies that $X_{(k)}$ would be a reasonable estimator for $F^{-1}(p) = \xi_p$.
Another quantile estimator is based on two adjacent order statistics. As reported in Harrell and Davis (1982) as well as in Schmeiser (1975),
$$\hat F_n^{-1}(p) = (1-a)X_{(r)} + a X_{(r+1)}, \qquad r = [p(n+1)], \quad a = p(n+1) - [p(n+1)],$$
for $p \in [1/(n+1),\, n/(n+1)]$. Schmeiser (1975) demonstrates that this is unbiased for data from a uniform $(0,1)$ distribution.
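As a concrete sketch of this interpolation scheme (Python is ours, not part of the dissertation, and the function name `two_os_quantile` is hypothetical), the estimator can be computed as:

```python
import math

def two_os_quantile(x, p):
    """Estimate the p-th quantile by interpolating between the two
    adjacent order statistics x_(r) and x_(r+1), where r = [p(n+1)]
    and a = p(n+1) - [p(n+1)], for p in [1/(n+1), n/(n+1)]."""
    xs = sorted(x)
    n = len(xs)
    if not (1 / (n + 1) <= p <= n / (n + 1)):
        raise ValueError("p must lie in [1/(n+1), n/(n+1)]")
    r = math.floor(p * (n + 1))          # index of the lower order statistic
    a = p * (n + 1) - r                  # interpolation weight
    if r >= n:                           # p = n/(n+1): no upper neighbor
        return xs[-1]
    return (1 - a) * xs[r - 1] + a * xs[r]   # 1-based -> 0-based indexing
```

For example, with the sample 1, 2, ..., 9 and p = .25, the estimate falls halfway between the second and third order statistics.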
Only point estimators have been discussed so far. If $X$ is a continuous random variable, a confidence interval $(X_{(j)}, X_{(k)})$ can be derived for $\xi_p$ with an approximate confidence coefficient of $\gamma = 1-\alpha$. As shown in Mood, Graybill, and Boes (1974),
$$\gamma = 1 - P[F(X_{(j)}) > p] - P[F(X_{(k)}) < p].$$
It is easily shown that the probability density function of $X_{(j)}$ is
$$f(x_{(j)}) = \frac{n!}{(j-1)!(n-j)!}\, [F(x_{(j)})]^{j-1}\, [1 - F(x_{(j)})]^{n-j}\, f(x_{(j)}).$$
Thus, if we let $Z = F(X_{(j)})$, then $\frac{dz}{dx} = f(x_{(j)})$. So,
$$f_Z(z) = \left|\frac{dx}{dz}\right| f(x_{(j)}) = \frac{n!}{(j-1)!(n-j)!}\, z^{j-1}(1-z)^{n-j}, \qquad 0 < z < 1.$$
This means that
$$P[F(X_{(j)}) \le u] = \int_0^u f_Z(z)\,dz = \frac{1}{B(j,\, n-j+1)} \int_0^u z^{j-1}(1-z)^{(n-j+1)-1}\,dz,$$
where the beta function is
$$B(a,b) = \int_0^1 t^{a-1}(1-t)^{b-1}\,dt = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}, \qquad a > 0,\ b > 0,$$
and the incomplete beta function is defined by
$$I_p(a,b) = \int_0^p t^{a-1}(1-t)^{b-1}\,dt \Big/ B(a,b).$$
In practice, the appropriate interval is defined by $(X_{(j)}, X_{(k)})$ with $j$ and $k$ satisfying
$$I_p(j,\, n-j+1) - I_p(k,\, n-k+1) = \gamma = 1-\alpha.$$
An alternative derivation in David (1981) leads to
$$\gamma = \sum_{i=j}^{n} \binom{n}{i} p^i (1-p)^{n-i} - \sum_{i=k}^{n} \binom{n}{i} p^i (1-p)^{n-i} = \sum_{i=j}^{k-1} \binom{n}{i} p^i (1-p)^{n-i} = \pi(n,j,k,p).$$
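Since $\pi(n,j,k,p)$ is a finite binomial sum, suitable indices can be found by direct enumeration. The sketch below (Python and the function names are ours) evaluates the coverage and searches for a shortest pair $(j,k)$ meeting a target confidence:

```python
from math import comb

def pi_coverage(n, j, k, p):
    """pi(n, j, k, p): sum of Bin(n, p) probabilities for
    i = j, ..., k-1, i.e. P(X_(j) <= xi_p < X_(k))."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(j, k))

def shortest_interval(n, p, gamma):
    """Brute-force search for a pair (j, k) with coverage >= gamma
    and the smallest index spread k - j."""
    best = None
    for j in range(1, n + 1):
        for k in range(j + 1, n + 1):
            cov = pi_coverage(n, j, k, p)
            if cov >= gamma and (best is None or k - j < best[1] - best[0]):
                best = (j, k, cov)
    return best
```

For $n = 10$ and the median, for instance, $(X_{(2)}, X_{(9)})$ attains coverage $1002/1024 \approx .979$.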
In the discrete case, $P(X < \xi_p) \le p$ and $P(X \le \xi_p) \ge p$ together imply that
$$P(X_{(j)} \le \xi_p \le X_{(k)}) \ge \pi(n,j,k,p), \qquad P(X_{(j)} < \xi_p < X_{(k)}) \le \pi(n,j,k,p).$$
Many years ago, Thompson (1936), and later Scheffé (1943), presented this result in terms of intervals for the unknown median, $M$, of a population:
$$P(X_{(k)} < M < X_{(n-k+1)}) = 1 - 2I_{.5}(n-k+1,\, k) \qquad \text{for } 2k < n+1.$$
Nair (1940) compares this result to a very similar one obtained independently by Savur (1937). The latter method differs only because of the assumption that there is a finite probability for an individual observation to equal the population median, which implies that $F(x)$ is noncontinuous.
Scheffé and Tukey (1945) discuss the problem of interval estimation of quantiles, paying particular attention to discrete distributions. This work is limited to showing how to employ the probability integral transformation to define the interval regardless of the form the cdf assumes.

Noether (1948) proceeds by means of step functions defined to be parallel to $F_n(x)$, the empirical cdf. He then demonstrates that his method leads to the same kind of interval as that of Thompson.
Wilks (1948) begins with a slight rewriting of the interval definition,
$$\gamma = P(U < p < U + V),$$
but arrives at Thompson's result as well.
This methodology is still very much in use today. Lever (1969) provides a simple example of how to apply this type of interval to cover the p-th quantile from a mortality distribution for data observed in a laboratory setting.

Because it is based on the binomial distribution, the one-sample sign test (for example, Mood, Graybill, and Boes (1974), p. 514) can easily be converted into a confidence interval for any desired quantile. This is done by including in the interval any gaps between order statistics whose binomial probabilities of occurrence (under the null hypothesis that $p = p_0$, the true value for the p-th quantile) can be added appropriately to bracket the hypothesized value with a stated total confidence. For example, in order to estimate the lower quartile ($\xi_{.25}$), first form the binomial array based on $\binom{n}{i}(.25)^i(.75)^{n-i}$, $i = 1,\ldots,n$. Then, beginning with the gap between order statistics with the largest confidence, determine the interval by adding together probabilities around this value until at least the required confidence has been achieved. The interval should be closed by including the values of the order statistics defining the ends of the interval.
As well, two methods have been discussed which are based on the large-sample normal approximation to the binomial distribution. The first, a simple rule of thumb, is presented in David (1981). For a sample size of at least 10, an approximate $1-\alpha$ confidence interval for the median is obtained by counting off $\tfrac{1}{2}\sqrt{n}\,\Phi^{-1}(1-\alpha/2)$ observations on either side of $X_{[\frac{1}{2}(n+1)]}$, the sample median, where $\Phi^{-1}(1-\alpha/2)$ is the upper $\alpha/2$ point of a normal distribution.
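A minimal sketch of this rule of thumb (the Python is ours; the convention of rounding the offset upward is an assumption, since the text does not specify one):

```python
import math
from statistics import NormalDist

def median_ci_count_off(x, alpha=0.05):
    """Approximate (1 - alpha) CI for the median: count off
    (sqrt(n)/2) * Phi^{-1}(1 - alpha/2) observations on either side
    of the sample-median position (n + 1)/2 (rule of thumb, n >= 10)."""
    xs = sorted(x)
    n = len(xs)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    offset = math.ceil(math.sqrt(n) / 2 * z)   # assumed: round up
    mid = (n + 1) / 2                          # 1-based median position
    lo = max(1, math.floor(mid) - offset)
    hi = min(n, math.ceil(mid) + offset)
    return xs[lo - 1], xs[hi - 1]
```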
The second method, described by Wilks (1962), defines $n_1$ as the number of components of the sample less than $\xi_p$. By the De Moivre-Laplace theorem, $n_1 \sim \mathrm{Bin}(n,p)$ is distributed asymptotically as $N(np,\, np(1-p))$. Thus,
$$\lim_{n\to\infty} P\!\left(-y_\gamma < \frac{n_1 - np}{\sqrt{np(1-p)}} < y_\gamma\right) = \gamma,$$
where
$$\gamma = \frac{1}{\sqrt{2\pi}} \int_{-y_\gamma}^{y_\gamma} e^{-\frac{1}{2}x^2}\,dx.$$
Solving $(n_1 - np)^2 / (np(1-p)) = y_\gamma^2$ for $p$ provides an approximate $\gamma$ confidence interval $(\underline{p}_\gamma, \bar{p}_\gamma)$ for $p$. This leads to using $(X_{[n\underline{p}_\gamma]}, X_{[n\bar{p}_\gamma]})$ as a $\gamma$-level confidence interval for $\xi_p$.
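Solving the quadratic $(n_1 - np)^2 = y_\gamma^2\, np(1-p)$ for $p$ gives endpoints of Wilson score form. The sketch below (ours) carries the construction through to the order statistics; taking $n_1 = np$ as the observed count is our simplifying assumption:

```python
import math
from statistics import NormalDist

def wilks_quantile_ci(x, p, gamma=0.95):
    """CI for the p-th quantile via the normal approximation to the
    binomial: solve (n1 - n t)^2 = y^2 n t (1 - t) for t (a quadratic
    whose roots have Wilson score form), then take the order statistics
    at positions [n t_lower] and [n t_upper]."""
    xs = sorted(x)
    n = len(xs)
    y = NormalDist().inv_cdf((1 + gamma) / 2)
    n1 = n * p                               # assumed count below xi_p
    center = n1 + y * y / 2.0
    half = y * math.sqrt(n1 * (1 - p) + y * y / 4.0)
    t_lo = (center - half) / (n + y * y)
    t_hi = (center + half) / (n + y * y)
    lo = max(1, math.floor(n * t_lo))
    hi = min(n, math.ceil(n * t_hi))
    return xs[lo - 1], xs[hi - 1]
```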
1.2.2 Various Median Estimation Methods
In recent years, many competing estimators have been proposed.
Some have application only to the population median, while others
can be applied to virtually any quantile.
Much attention has been devoted to point and interval estimation of the median. Some relatively simple interval estimators for the median were proposed by Walsh (1958). One of his two-sided confidence intervals for $\xi_{.5}$ is defined for small samples with $1 \le i \le 5 \le n \le 12$ and $i \le n-4$. Another two-sided interval is of the form
$$\left(\min\!\left[\tfrac{1}{2}(X_{(1)} + X_{(1+i)}),\, X_{(2)}\right],\ \max\!\left[\tfrac{1}{2}(X_{(n)} + X_{(n-j)}),\, X_{(n-1)}\right]\right).$$
The confidence coefficient for this interval has the value
$$P\!\left[\min\!\left[\tfrac{1}{2}(X_{(1)} + X_{(1+i)}),\, X_{(2)}\right] < \xi_{.5}\right] + P\!\left[\max\!\left[\tfrac{1}{2}(X_{(n)} + X_{(n-j)}),\, X_{(n-1)}\right] > \xi_{.5}\right] - 1.$$
The lower and upper bounds are determined by setting each of these probability expressions equal to its upper and lower bound value. These probabilities are tabled according to sample size and parameters $A$ and $B$ which reflect the degree of population symmetry.
Two simple median estimation methods are proposed by Ekblom (1973). Define $\lambda$ as a constant whose value can be between 1 and $+\infty$, and assume there is a sample of size $n$. Let the sample median be defined by
$$M = \begin{cases} x_{(k)} & \text{if } n = 2k-1, \\ \tfrac{1}{2}(x_{(k)} + x_{(k+1)}) & \text{if } n = 2k. \end{cases}$$
Then, the two estimators are
$$P(\lambda) = \begin{cases} x_{(k)} & \text{if } x_{(k)} - x_{(1)} \ge \lambda(x_{(n)} - x_{(k+1)}), \\ x_{(k+1)} & \text{if } x_{(n)} - x_{(k+1)} \ge \lambda(x_{(k)} - x_{(1)}), \\ (x_{(k)} + x_{(k+1)})/2 & \text{otherwise}, \end{cases}$$
and
$$N(\lambda) = \begin{cases} x_{(k)} & \text{if } x_{(n)} - x_{(k+1)} \ge \lambda(x_{(k)} - x_{(1)}), \\ x_{(k+1)} & \text{if } x_{(k)} - x_{(1)} \ge \lambda(x_{(n)} - x_{(k+1)}), \\ (x_{(k)} + x_{(k+1)})/2 & \text{otherwise}. \end{cases}$$
If $\lambda = \infty$, the sample median, $M$, would be used. Monte Carlo tests on various values of $\lambda$ indicate that $P(2)$ had higher relative efficiency than $M$ for the normal, triangle, and uniform distributions. $N(\lambda)$, for $\lambda = 1$, 1.5, or 2, performed better for these distributions as well as for the Cauchy. Since each estimator is a simple function of order statistics, asymptotic normal theory results lead to constructing a confidence interval of the form $P(\lambda) \pm \Phi^{-1}(1-\alpha/2)\,S(P(\lambda))$ or $N(\lambda) \pm \Phi^{-1}(1-\alpha/2)\,S(N(\lambda))$, where $S(P(\lambda))$ and $S(N(\lambda))$ are estimated standard deviations of the estimators.
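The case definitions translate directly into code. This sketch (ours) handles the even case $n = 2k$; the inequality directions follow the text as recovered here and should be checked against Ekblom (1973) before serious use:

```python
def ekblom_estimators(x, lam):
    """Ekblom's P(lambda) and N(lambda) median estimators, even n = 2k.
    The tail gaps x_(k) - x_(1) and x_(n) - x_(k+1) are compared through
    the threshold lambda to pick x_(k), x_(k+1), or their average."""
    xs = sorted(x)
    n = len(xs)
    if n % 2:
        raise ValueError("this sketch assumes even n = 2k")
    k = n // 2
    lo_gap = xs[k - 1] - xs[0]     # x_(k) - x_(1)
    hi_gap = xs[-1] - xs[k]        # x_(n) - x_(k+1)
    mid = (xs[k - 1] + xs[k]) / 2

    def pick(first_cond, second_cond):
        if first_cond:
            return xs[k - 1]
        if second_cond:
            return xs[k]
        return mid

    p_lam = pick(lo_gap >= lam * hi_gap, hi_gap >= lam * lo_gap)
    n_lam = pick(hi_gap >= lam * lo_gap, lo_gap >= lam * hi_gap)
    return p_lam, n_lam
```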
Maritz and Jarrett (1978) provide formulas for obtaining estimates of the variance of the sample median for both even and odd sample sizes. For odd sample sizes ($n = 2m+1$) it is implied that the variance, $E(\tilde X_n^2) - [E(\tilde X_n)]^2$, where
$$E(\tilde X_n^r) = \frac{(2m+1)!}{(m!)^2} \int_{-\infty}^{\infty} x^r\,[F(x)(1-F(x))]^m f(x)\,dx,$$
can be obtained by using $F_n(x)$ to estimate $y = F(x)$. Then, the estimate of the variance can be written in terms of
$$W_j = \frac{(2m+1)!}{(m!)^2} \int_{(j-1)/n}^{j/n} y^m (1-y)^m\,dy.$$
A more complicated expression is derived for the estimate of the variance when the sample size is even. Applying the results of these methods can lead to approximate confidence intervals for the median.
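Because the prefactor $(2m+1)!/(m!)^2$ is exactly $1/B(m+1,\,m+1)$, each $W_j$ is an increment of the regularized incomplete beta function $I_x(m+1,\,m+1)$, which for integer arguments reduces to a binomial tail sum. A sketch (ours, odd $n$ only):

```python
from math import comb

def ibeta_int(x, a):
    """Regularized incomplete beta I_x(a, a) for integer a, via the
    identity with the upper tail of Bin(2a - 1, x)."""
    n = 2 * a - 1
    return sum(comb(n, i) * x**i * (1 - x)**(n - i) for i in range(a, n + 1))

def maritz_jarrett_variance(x):
    """Maritz-Jarrett variance estimate for the sample median, odd
    n = 2m + 1: W_j = I_{j/n}(m+1, m+1) - I_{(j-1)/n}(m+1, m+1), and
    the estimate is sum W_j x_(j)^2 - (sum W_j x_(j))^2."""
    xs = sorted(x)
    n = len(xs)
    if n % 2 == 0:
        raise ValueError("this sketch handles odd n = 2m + 1 only")
    m = (n - 1) // 2
    w = [ibeta_int(j / n, m + 1) - ibeta_int((j - 1) / n, m + 1)
         for j in range(1, n + 1)]
    e1 = sum(wi * xi for wi, xi in zip(w, xs))
    e2 = sum(wi * xi * xi for wi, xi in zip(w, xs))
    return e2 - e1 * e1
```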
Another method of arriving at confidence intervals for the median is mentioned in Hartigan (1969). It requires forming all $N = 2^n - 1$ possible nonempty subsets of the index set $\{1,\ldots,n\}$. For each subset $s$, form the "subsample mean" $\sum_{i \in s} x_{(i)} / \sum_{i \in s} 1$. Letting $Z_{(1)},\ldots,Z_{(N)}$ denote these subsample means arranged in ascending order of magnitude, use $(Z_{(i)}, Z_{(j)})$ as the confidence interval, where $i$ and $j$ are selected to be consistent with the interval's confidence of $(j-i)/N$.
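Hartigan's construction is easy to enumerate for small samples (the subset count is exponential in $n$). The sketch below (ours) follows the text's convention that $(Z_{(i)}, Z_{(j)})$ carries confidence $(j-i)/N$:

```python
from itertools import combinations

def hartigan_interval(x, i, j):
    """Sort the N = 2^n - 1 subsample means over all nonempty subsets
    and return (Z_(i), Z_(j)) together with its stated confidence."""
    n = len(x)
    means = sorted(sum(s) / len(s)
                   for size in range(1, n + 1)
                   for s in combinations(x, size))
    N = 2**n - 1
    return means[i - 1], means[j - 1], (j - i) / N
```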
A simple method, which is Lanke's (1974) main focus, requires a unimodal distribution symmetric about $\xi_{.5}$. Define $R = X_{(n)} - X_{(1)}$; then for every $\lambda \ge 0$, it is shown that
$$P(X_{(1)} - \lambda R < \xi_{.5} < X_{(n)} + \lambda R) \ge 1 - (2+2\lambda)^{-n+1},$$
and the lower bound is the best possible. Specifically, if $\lambda = \tfrac{1}{2}\alpha^{-1/(n-1)} - 1$ is selected, then the interval $(X_{(1)} - \lambda R,\, X_{(n)} + \lambda R)$ has a confidence coefficient of at least $1-\alpha$ for the median. If the underlying population distribution is normal, numerical integration can be used to obtain $\lambda$ values for which the true confidence level is $1-\alpha$. Compared to the usual parametric confidence interval, this one has been shown to be no more than 11% longer. A second modification is also discussed which performs very well for a uniform distribution.
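The $\lambda$ formula inverts the lower bound directly (set $(2+2\lambda)^{-(n-1)} = \alpha$). A two-line sketch (ours; meaningful when the resulting $\lambda$ is nonnegative, i.e. for small $\alpha$):

```python
def lanke_interval(x, alpha):
    """Lanke (1974) interval for the median of a unimodal symmetric
    distribution: (X_(1) - lam*R, X_(n) + lam*R) with R the sample range
    and lam = alpha**(-1/(n-1))/2 - 1, giving confidence >= 1 - alpha."""
    xs = sorted(x)
    n = len(xs)
    r = xs[-1] - xs[0]
    lam = 0.5 * alpha ** (-1.0 / (n - 1)) - 1.0
    return xs[0] - lam * r, xs[-1] + lam * r
```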
Guilbaud (1979) provides an interval estimate for the median which extends Thompson's results. Let
$$C_n(r) = 1 - 2\sum_{v=0}^{r-1} \binom{n}{v} 2^{-n},$$
and let $I_n(r) = [X_{(r)},\, X_{(s)}]$ with $1 \le r \le s = n-r+1$; it has been shown that $P(\xi_{.5} \in I_n(r)) \ge C_n(r)$. Define a second interval $I_n(r;t)$ with endpoints $\tfrac{1}{2}(X_{(r)} + X_{(r+t)})$ and $\tfrac{1}{2}(X_{(s-t)} + X_{(s)})$, where $s = n-r+1$ and $0 \le t \le s-r$. If $\xi_{.5}$ is such that $P(X \le \xi_{.5}) \ge \tfrac{1}{2} \le P(X \ge \xi_{.5})$, and $r$, $t$, $n$ satisfy $1 \le r \le n-r+1$ and $0 \le t \le s-r$, then $P(\xi_{.5} \in I_n(r;t)) \ge L_n(r;t)$, where $L_n(r;t) = \tfrac{1}{2}C_n(r) + \tfrac{1}{2}C_n(r+t)$. The lower bound in this inequality is stated to be the best possible. If $F(x)$ is continuous and strictly increasing, and $F'(u)$ is uniquely defined and continuous on $(0,1)$, then the probability of lying in the interval $I_n(r;t)$ is increased by a function of $S_F(u)$, a rather complex "symmetry function."
A slight modification to the confidence interval derived from the sign test is presented by Noether (1973). The technique only requires looking at the $m$ largest differences among the $|X_i - \eta_0|$, $m \le n$, and forming the statistics
$$T_- = \sum_{j=1}^{m} t_j, \qquad T_+ = \sum_{j=1}^{m} (1 - t_j).$$
The test is: reject $H_0$, that $\xi_{.5} = \xi_{.5}^0$, if $\min(T_+, T_-) \le C$. Because a symmetric population with a continuous cdf is assumed, the significance level is $\alpha = 2\sum_{s=0}^{C} b(s; m; \tfrac{1}{2})$, where $s$ is the number of successes and $m$ is the number of trials. It is explained that the confidence interval associated with this modification can be written in terms of the ordered differences, where $g$ and $h$ can be chosen to minimize the expected length of the interval. The confidence coefficient associated with this is
$$\gamma_{gh} \ge 1 - \alpha = 1 - \left(\tfrac{1}{2}\right)^{g+h-2} \sum_{s=0}^{g-1} \binom{g+h-1}{s}.$$
Confidences corresponding to various values of $g$ and $h$ are tabled.
A graphical method for obtaining a confidence interval for the median, based on Wilcoxon's signed rank test, is discussed briefly in Moses (1965). Assume Wilcoxon's signed rank statistic, $W$, is formed, where
$$W = \sum_{i=1}^{n} Z_i R_i, \qquad Z_i = \begin{cases} +1 & x_i > 0, \\ -1 & x_i < 0, \end{cases}$$
and $R_i$ is the rank of $|X_i|$ among $|X_1|,\ldots,|X_n|$; the critical value for a test of median $= 0$ is called $S^*$. The procedure described essentially requires forming all possible averages from two sample observations. Moses proposes that this can be done by a graphical technique involving the intersection of lines from each pair of sample observations. The smallest and largest $S^*$ values among the averages, and the data points themselves, are excluded. The $(S^*-1)$-th lowest and highest remaining averages or observed values constitute the endpoints of the interval.
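The averages in question are the Walsh averages $(x_i + x_j)/2$, $i \le j$. The sketch below (ours) is an algebraic stand-in for the graphical rule rather than Moses's exact procedure: it trims a chosen number of extreme averages from each end, with the trim count to be taken from signed-rank critical values:

```python
def walsh_average_interval(x, trim):
    """Median CI from Walsh averages: sort the M = n(n+1)/2 pairwise
    averages (x_i + x_j)/2, i <= j (self-pairs give the data points),
    and drop `trim` values from each end."""
    n = len(x)
    walsh = sorted((x[i] + x[j]) / 2
                   for i in range(n) for j in range(i, n))
    m = n * (n + 1) // 2
    if not 0 <= trim < m / 2:
        raise ValueError("trim out of range")
    return walsh[trim], walsh[m - 1 - trim]
```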
Efron (1979) describes a technique for estimating the expected squared error of estimation for the sample median. The method, entitled the "bootstrap," is described as follows. Assume it is desired to estimate the sampling distribution of $R = R(\mathbf{X}, F) = t(\mathbf{X}) - \theta(F)$. This random variable is the difference between a parameter of interest, $\theta(F)$, and its estimator, $t(\mathbf{X})$. A random sample of size $n$ is drawn from the population. Then, the sample probability distribution, $\hat F$, is constructed by equally weighting each data point by $1/n$. From this $\hat F$, a subsample of size $n$ is drawn, with replacement, which will be called the bootstrap sample and denoted $\mathbf{X}^* = (X_1^*,\ldots,X_n^*)$, with observed values $\mathbf{x}^* = (x_1^*,\ldots,x_n^*)$. Thus, this is not a permutation of the original sample unless by chance each element is selected exactly one time. The sampling distribution of $R(\mathbf{X}, F)$ can be approximated by the distribution of $R^* = R(\mathbf{X}^*, \hat F)$, which is the bootstrap distribution induced by selecting a bootstrap sample from a fixed $\hat F$. The distribution of $R^*$ would equal the distribution of $R$ if $\hat F$ were exactly equal to $F$, and, as Efron explains, must be "close" since $\hat F$ is "close" to $F$. Exactly how well $R^*$'s distribution approximates that of $R$ depends upon the form of $R$.

For estimation of the median, use $\theta(F) =$ median of $F$ and $t(\mathbf{X}) = X_{(m)}$, the sample median from a sample of size $n = 2m-1$. Let $\mathbf{N}^* = [N_1^*,\ldots,N_n^*]$, where $N_i^*$ denotes the number of times $x_i$ is selected with the bootstrap sampling procedure. Within this bootstrap sample, denote the ordered values $x_{(1)} \le \cdots \le x_{(n)}$ and the corresponding $\mathbf{N}^*$ values $N_{(1)},\ldots,N_{(n)}$. Then, the bootstrap value of $R$ is $R^* = R(\mathbf{X}^*, \hat F) = X_{(m)}^* - x_{(m)}$, the sample median of the bootstrap distribution minus the median from the empirical distribution, $\hat F$. To obtain the estimate of the variance, for any integer $\ell$, $1 \le \ell \le n$, calculate
$$\mathrm{Prob}^*\{N_{(1)} + \cdots + N_{(\ell)} \le m-1\} = \mathrm{Prob}\!\left\{\mathrm{Bin}\!\left(n, \tfrac{\ell}{n}\right) \le m-1\right\} = \sum_{j=0}^{m-1} \binom{n}{j} \left(\tfrac{\ell}{n}\right)^{j} \left(\tfrac{n-\ell}{n}\right)^{n-j}. \tag{1.1}$$
Thus,
$$\mathrm{Prob}^*\{R^* = x_{(\ell)} - x_{(m)}\} = P\!\left\{\mathrm{Bin}\!\left(n, \tfrac{\ell-1}{n}\right) \le m-1\right\} - P\!\left\{\mathrm{Bin}\!\left(n, \tfrac{\ell}{n}\right) \le m-1\right\}.$$
Finally, for a random sample of size $n$, calculate
$$\sum_{\ell=1}^{n} (x_{(\ell)} - x_{(m)})^2\, \mathrm{Prob}^*\{R^* = x_{(\ell)} - x_{(m)}\}$$
as an estimate of the expected squared error of estimation for the sample median. If $E(t(\mathbf{X})) \doteq \theta(F)$ can be assumed, a confidence interval for the median can be based on this quantity, at least approximately.
In a later work, Efron (1981) presents an estimator, $\hat\sigma_{\mathrm{Boot}}$, which is based on repeatedly drawing bootstrap samples from the empirical distribution function, $\hat F$. Denoting these bootstrap estimates of the median by $\hat\theta_1^*,\ldots,\hat\theta_N^*$, $\hat\sigma_{\mathrm{Boot}}$ is the standard deviation of these $N$ values about their mean. The resulting confidence interval is then $t(\mathbf{X}) \pm \Phi^{-1}(1-\alpha/2)\,\hat\sigma_{\mathrm{Boot}}$.
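A minimal sketch of the resampling form of $\hat\sigma_{\mathrm{Boot}}$ (the Python is ours; the resample count and seed are arbitrary choices):

```python
import random
import statistics

def bootstrap_median_se(x, n_boot=1000, seed=0):
    """Draw n_boot bootstrap samples (size n, with replacement) from the
    data -- i.e., from the empirical distribution F-hat -- compute the
    median of each, and return the standard deviation of those medians."""
    rng = random.Random(seed)
    n = len(x)
    meds = [statistics.median(rng.choices(x, k=n)) for _ in range(n_boot)]
    return statistics.stdev(meds)
```

The interval then follows as the sample median plus or minus a normal quantile times this standard error (e.g., via statistics.NormalDist().inv_cdf).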
Other median estimators deserve mention. Bauer (1972) demonstrates one way of forming the confidence interval for the median of a symmetric distribution based on the Wilcoxon signed rank test. This method is an algebraic expression of that discussed by Moses (1965), but appears more difficult to implement. Desu and Rodine (1969) present an estimator of the form $T(a,r) = aX_{(r)} + (1-a)X_{(n-r+1)}$, where $0 < a < 1$ and $1 \le r \le [n/2]$. They derive its density and distribution assuming symmetric underlying densities, but the density function is extremely complex and confidence intervals are difficult to obtain. Finally, three separate articles, written by Reid (1981), Emerson (1982), and Brookmeyer and Crowley (1982), discuss median estimation from survival data both with and without censoring. The sign test is modified to handle these situations.
1.2.3 Estimators for the p-th Quantile

The estimators and intervals discussed above are intended specifically for the population median. There are several other estimators and intervals which can be used to estimate arbitrary p-th quantiles, although perhaps with restrictions.
Kubat and Epstein (1980) develop two point estimators of ξ_p from any distribution in the location-scale family:

    F_X(ξ_p) = F((ξ_p − λ)/δ) = p.

Their simpler estimator is based on two order statistics. It can be expressed as X̂_ξ(a,b) = C₁X(L) + C₂X(M), subject to C₁ + C₂ = 1 and C₁Z_a + C₂Z_b = Z_ξ, where ξ_p = δZ_ξ + λ, 0 < a < p < b < 1, L = [na]+1, and M = [nb]+1. Solving for the constants leads to C₁ = (Z_ξ − Z_b)/(Z_a − Z_b) and C₂ = 1 − C₁. The variance of the estimator is derived, a* and b* are chosen to maximize the asymptotic relative efficiency, and the optimal X(L) and X(M) are determined. The final estimator then becomes

    X̂_ξ(a*,b*) = C₁*X(L*) + C₂*X(M*),

where L* = [na*]+1, M* = [nb*]+1, and C₁* = (Z_ξ − Z_b*)/(Z_a* − Z_b*). Since this estimate is a simple function of order statistics, asymptotic normal results apply, and the resulting confidence interval is of the form X̂_ξ(a*,b*) ± Φ⁻¹(1−α/2) S(X̂_ξ(a*,b*)), where the variance V(X̂_ξ(a,b)) is presented in the paper. A similar estimator is proposed for three order statistics. For both estimators, the full sample need not be observed, and knowledge of F(·) is only required in an interval covering ξ_p.
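As a rough illustration of the two-order-statistic construction (not Kubat and Epstein's optimized version: the standardized quantiles Z below assume a standard normal parent, and a, b are arbitrary choices rather than the optimal a*, b*):

```python
from statistics import NormalDist

def two_os_quantile(x, p, a, b):
    """Two-order-statistic estimate of the p-th quantile,
    assuming a normal location-scale parent (illustrative only)."""
    assert 0 < a < p < b < 1
    n = len(x)
    xs = sorted(x)
    L = int(n * a) + 1          # L = [na] + 1
    M = int(n * b) + 1          # M = [nb] + 1
    z = NormalDist().inv_cdf    # standardized quantiles Z
    c1 = (z(p) - z(b)) / (z(a) - z(b))  # C1; C2 = 1 - C1
    c2 = 1.0 - c1
    return c1 * xs[L - 1] + c2 * xs[M - 1]
```

Because z(a) < z(p) < z(b), the weights fall in (0,1), so the estimate always lies between the two chosen order statistics.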
Another type of interval estimator is discussed in Schmeiser (1975). Based on the above-mentioned estimator, a normal theory estimator is formed. Given m independent estimates of the quantile, a confidence interval for the p-th quantile can then be constructed as

    G(F̂_n(p)) ± t_{α/2; m−1} S/√m.

Schmeiser explains the validity of the interval in terms of a result that as l → ∞, with p = r/l fixed,

    X(r) → N[F⁻¹(p), p(1−p)/(l [f(F⁻¹(p))]²)],

with f corresponding to F_n (from Gibbons (1971), p. 40).
Obtaining a confidence interval of pre-assigned length for any p-th quantile is the subject of Weiss's (1960) work. His method lets us select a pre-assigned length Δ for the confidence interval for the p-th quantile with desired confidence coefficient β. First, two definitions need to be given:

1) Define the p-th sample quantile from a random sample of size n as the sample value with exactly [np] observations below it. (A variation of X_p.)

2) Let 0 ≤ α, γ ≤ 1 be any two values, and define N(α,γ) as the smallest positive integer n which satisfies

    ∫_{max(0,q−γ)}^{min(1,q+γ)} [n!/([np]!(n−[np]−1)!)] y^{[np]} (1−y)^{n−[np]−1} dy ≥ α.

To construct the interval, choose quantities α, w, r, all between 0 and 1, so that αw = β and r > max(p, 1−p). Then, select a sample of m observations, where m is the smallest positive integer that satisfies 1 − [mr^{m−1} − (m−1)r^m] ≥ w. Next, defining L and U to be the smallest and largest among these m observations, let

    γ = min[r − p, r − (1−p), Δ/(2(U−L))].

Then take a second sample of n observations, where n is the smallest integer greater than N(α,γ) such that np is not an integer. Denote by Z the p-th sample quantile of the second sample. The confidence interval for ξ_p is then (Z − Δ/2, Z + Δ/2).
Azzalini (1981) presents a method based on the so-called kernel estimate of the density f(·). The estimator is

    f̂(x) = (nb)⁻¹ Σ_{i=1}^{n} w((x − x_i)/b),

where w(·) is some bounded density function with w(t) = w(−t) for all real t, and b > 0. The distribution function estimate then is simply

    F̂(x) = n⁻¹ Σ_{i=1}^{n} W((x − x_i)/b),

where W(t) = ∫_{−∞}^{t} w(u) du. To estimate ξ_p = F⁻¹(p), use x̂_p defined by p = F̂(x̂_p). Under regularity conditions, it is shown that x̂_p has the same asymptotic distribution as its corresponding sample quantile. Thus, it is possible to form a confidence interval of the form x̂_p ± Φ⁻¹(1−α/2) S(x̂_p), where S(x̂_p) is the estimated standard deviation of x̂_p.
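A minimal sketch of this kernel-inversion idea, assuming a Gaussian kernel w and a fixed, arbitrarily chosen bandwidth b (Azzalini discusses bandwidth choice; none is attempted here). Since F̂ is continuous and increasing, x̂_p can be found by bisection:

```python
from statistics import NormalDist

def kernel_cdf(x, data, b):
    """Smooth cdf estimate F^(x) = n^{-1} sum_i W((x - x_i)/b), Gaussian W."""
    W = NormalDist().cdf
    return sum(W((x - xi) / b) for xi in data) / len(data)

def kernel_quantile(data, p, b, tol=1e-8):
    """Solve p = F^(x) for x by bisection."""
    lo, hi = min(data) - 10 * b, max(data) + 10 * b
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if kernel_cdf(mid, data, b) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For data symmetric about zero, the estimated median is zero by the symmetry of the kernel.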
Another novel approach is called Nomination Sampling by Willemain (1980). The method is demonstrated for estimating the median, but works for any quantile by making slight modifications. Instead of drawing a random sample of size n, draw n independent random subsamples of size N. We then "nominate" the largest value x_(N)i from each of the i = 1, ..., n random subsamples. The proposed estimator is of the form

    θ̂_p = a x(i) + (1−a) x(i+1),

where

    i = [(n+1)/2^N], and a = [((i+1)/(n+1))^{1/N} − 1/2] / [((i+1)/(n+1))^{1/N} − (i/(n+1))^{1/N}].

Note that x(i) provides an estimate of the (i/(n+1)) fractile of the distribution of nominees, so it provides an estimate of the (i/(n+1))^{1/N} fractile of the distribution of the general population. An approximate confidence interval based on this estimator can be formed by using θ̂_p ± Φ⁻¹(1−α/2) S(θ̂_p). The standard deviation, S(θ̂_p), can be estimated numerically by using the empirical cdf in the formula presented for the probability density, and then calculating appropriate moments.
If a finite population, Π_N, with N elements is assumed, and a simple random sample of size n is selected without replacement, an interval estimation method described by Wilks (1962) and extended in Sedransk and Meyer (1978) may be used. Let t be a fixed integer, 1 ≤ t ≤ N, so X(t) is the (t/N)-th quantile of Π_N. If the confidence interval is defined to be (x(i), x(j)), where 1 ≤ i < j ≤ n, we have

    P(x(i) ≤ X(t) ≤ x(j)) = P(x(i) ≤ X(t)) − P(x(j) ≤ X(t−1)),

which turns out to be a difference of sums of hypergeometric probabilities.
The estimator which will be the major focus of this dissertation has been developed by Harrell and Davis (1982). Their paper states that since lim_{n→∞} E[X((n+1)p)] = F⁻¹(p) for p ∈ (0,1), it would be desirable to estimate E[X((n+1)p)], and hence the p-th quantile of the population, whether or not (n+1)p is an integer. They suggest using

    Q̂_p = [B((n+1)p, (n+1)(1−p))]⁻¹ ∫₀¹ F_n⁻¹(y) y^{(n+1)p−1} (1−y)^{(n+1)(1−p)−1} dy,

where y = F(x) and F_n(x) = n⁻¹ Σ_i I(x_i ≤ x). Following Maritz and Jarrett (1978), Q̂_p can be expressed as

    Q̂_p = Σ_{i=1}^{n} W_{n,i} X(i),                                    (1.2)

where

    W_{n,i} = [B((n+1)p, (n+1)(1−p))]⁻¹ ∫_{(i−1)/n}^{i/n} y^{(n+1)p−1} (1−y)^{(n+1)(1−p)−1} dy
            = I_{i/n}{p(n+1), (1−p)(n+1)} − I_{(i−1)/n}{p(n+1), (1−p)(n+1)}.

Thus, the estimator for the p-th quantile is a linear function of all the order statistics.
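The weights W_{n,i} are differences of the regularized incomplete beta function, so expression (1.2) can be sketched directly (a minimal illustration using SciPy's `betainc` for I_t(a,b); this follows the formula above, not Harrell and Davis's own program):

```python
from scipy.special import betainc  # regularized incomplete beta I_x(a, b)

def harrell_davis(x, p):
    """Harrell-Davis estimate Q_p = sum_i W_{n,i} X(i), per (1.2)."""
    n = len(x)
    xs = sorted(x)
    a, b = (n + 1) * p, (n + 1) * (1 - p)
    # W_{n,i} = I_{i/n}(a, b) - I_{(i-1)/n}(a, b)
    w = [betainc(a, b, i / n) - betainc(a, b, (i - 1) / n)
         for i in range(1, n + 1)]
    return sum(wi * xi for wi, xi in zip(w, xs))
```

Since I₀ = 0 and I₁ = 1, the weights sum to one, so the estimator is a weighted average of all n order statistics.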
A jackknifed variance estimator is presented in the form

    S²(Q̂_p) = [(n−1)/n] Σ_{j=1}^{n} (S_j − S̄)²,                        (1.3)

where S_j is the quantile estimate with the j-th order statistic removed,

    S_j = Σ_{i≠j} W_{n−1, i−1[i>j]} X(i),  and  S̄ = n⁻¹ Σ_{j=1}^{n} S_j.

By appealing to asymptotic normality results, a confidence interval for this method is seen to be Q̂_p ± Φ⁻¹(1−α/2) S(Q̂_p), where S(Q̂_p) is the estimated standard deviation for the estimator, obtained by the jackknife procedure. The estimator's performance is tested on a variety of shapes of distributions, and is generally shown to be more efficient than traditional estimators based on one or two order statistics.
Kaigh and Lachenbruch (1982) present another method using all of the order statistics. This technique requires drawing all possible subsamples of size k from the n elements selected in a random sample. Then, the average of the p-th sample quantiles from the subsamples is used to form the estimator for the p-th quantile. This estimator is a U-statistic which can be expressed as a linear combination of order statistics:

    ξ̂_p(K-L) = Σ_{j=r}^{r+n−k} [C(j−1, r−1) C(n−j, k−r) / C(n, k)] X(j), where r = [(k+1)p],    (1.4)

and C(a,b) denotes the binomial coefficient. A confidence interval for this estimator may be written as ξ̂_p(K-L) ± t_{n−k} S(ξ̂_p(K-L)), since the statistic is shown to be asymptotically normally distributed. A jackknife estimator for S(ξ̂_p(K-L)) is presented in Kaigh (1982). Monte Carlo studies have generally shown this estimator to be more efficient than the sample quantile.
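The negative hypergeometric weights in (1.4) can be sketched with `math.comb` (a minimal illustration of the formula, not Kaigh and Lachenbruch's implementation; by Vandermonde's identity the weights sum to one):

```python
from math import comb

def kl_quantile(x, p, k):
    """Kaigh-Lachenbruch estimate: average of p-th subsample quantiles,
    written as the linear combination of order statistics in (1.4)."""
    n = len(x)
    xs = sorted(x)
    r = int((k + 1) * p)  # r = [(k+1)p]
    est = 0.0
    for j in range(r, r + n - k + 1):
        w = comb(j - 1, r - 1) * comb(n - j, k - r) / comb(n, k)
        est += w * xs[j - 1]
    return est
```

For a constant sample the estimate equals that constant, confirming that the weights sum to one.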
1.2.4 Quantile Estimators for Specific Distributions

Many articles have been written which present estimators for specific distributions. They use a variety of techniques for developing the estimator and a variety of criteria for evaluating the estimator's utility. Often, the principle used for obtaining these estimators is maximum likelihood estimation. As Green (1969) states, for any vector θ of unknown parameters of the distribution, "If the p-th quantile, ξ_p, is equal to g(θ) and θ̂ is the maximum likelihood estimator of θ, then ξ̂_p = g(θ̂)."

1.2.4.1 Normal Distribution Quantiles

For example, Green notes, for a normal distribution,

    ξ̂_p = X̄ + ζ_p S √((n−1)/n),

where Φ(ζ_p) = p, and X̄ and ((n−1)/n)S² are the usual maximum likelihood estimators (MLEs) of μ and σ².
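Green's plug-in rule for the normal case is a one-liner (a sketch; `stdev` below is the unbiased sample standard deviation, rescaled by √((n−1)/n) to give the MLE of σ):

```python
import math
from statistics import NormalDist, mean, stdev

def normal_mle_quantile(x, p):
    """MLE of the normal p-th quantile: xbar + zeta_p * sigma_hat."""
    n = len(x)
    sigma_mle = stdev(x) * math.sqrt((n - 1) / n)  # MLE of sigma
    return mean(x) + NormalDist().inv_cdf(p) * sigma_mle
```

At p = .5 the estimate is simply X̄, since ζ_{.5} = 0.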
Several other authors have addressed the problem of estimating the p-th quantile from the normal distribution. Owen (1968) defined a confidence interval for μ + ζ_p σ, the p-th quantile, where ζ_p is the standardized normal deviate corresponding to probability p. This interval is defined by

    P{X̄ + E_{(1−γ)/2} S ≤ μ + ζ_p σ ≤ X̄ + E_{(1+γ)/2} S} = γ,

where γ = 1−α, S is the sample standard deviation, and √n E_{(1−γ)/2}, √n E_{(1+γ)/2} refer to deviates from a non-central t-distribution, values which are tabled in the article.
Zidek (1971) presents the minimax estimator for ξ_p,

    θ̂_p = X̄ + η C_n S,

where X̄ = Σ_{i=1}^{n} X_i/n, η is a given standardized deviate, and C_n = Γ(n/2) [√2 Γ((n+1)/2)]⁻¹, n = 2, 3, .... He shows this estimator to be "inadmissible" because there is another estimator, θ̂₁, constructed from a function H(t) defined for −∞ < t < ∞, for which

    E_{μ,σ}(θ̂₁ − ξ_p)² < E_{μ,σ}(θ̂_p − ξ_p)².

Confidence intervals appear difficult to form, however.
Dyer, Keating, and Hensley (1977) present two other estimators for normal quantiles. These are:

1) A minimum variance unbiased estimator, based on the constant

    m = [Γ(n/2)/Γ((n−1)/2)] (2/(n−1))^{1/2}.

2) A "best-invariant" estimator, X̂_p(BIE) = X̄ + Φ⁻¹(1−p) m S, which is the Pitman-closest estimator, defined as satisfying P(|θ̂₁ − θ| < |θ̂₂ − θ|) ≥ .5 for any other estimator θ̂₂ and unknown parameter θ.

These two estimators are also compared with each other as well as with the maximum likelihood estimator. Different conclusions regarding relative quality are found depending on the judgement criterion chosen. Dyer and Keating (1979) also present an estimator which meets a fairly complex criterion of achieving minimum mean absolute error. The estimator involves t_{.5}(f,δ), the solution to T(t; f, δ) = .5 from the noncentral t-distribution with noncentrality parameter δ on f degrees of freedom. Confidence intervals for any of these estimators can be based on the noncentral t-distribution as described in Owen (1968).
1.2.4.2 Exponential Distribution Quantiles

In addition, estimation of quantiles from the exponential distribution has been discussed. Robertson (1977) proposed three formulas for linear estimators of the quantile k_p θ of the single-parameter exponential distribution with pdf (1/θ) exp(−x/θ), θ > 0, 0 ≤ x < ∞. Essentially, the first method sets out to find a constant K so that 1 − exp(−KX̄) will be of minimal MSE. This K is shown to be

    K = K₀ = n[exp{k_p/(n+1)} − 1] / [2 − exp{k_p/(n+1)}].

Without regard to the minimal MSE criterion, another choice is to let K = K₁ = k_p. The third choice, K₂ = n[exp(k_p/n) − 1], makes exp(−K₂X̄) unbiased for exp(−k_p), suppressing θ. Obtaining exact confidence intervals based on these estimators is not discussed, but mean squared errors for the predicted distribution function are presented to indicate performance.
A recent work by Rukhin and Strawderman (1982) deals with estimating the quantile ξ_p = λ + bσ from the two-parameter exponential distribution with pdf (1/σ) exp(−(x−λ)/σ), where λ and σ are both unknown, and b is a given constant ≥ 0. They demonstrate that an estimator often considered to be the best equivariant estimator,

    δ₀ = x(1) + (b − n⁻¹)(x̄ − x(1)),

is not as good an estimator as

    δ = δ₀ − 2(n+1)⁻¹ [(b − 1 − n⁻¹)(x̄ − x(1)) − (bn⁻¹) x(1)]

when b > 1 + n⁻¹. Confidence intervals based on this estimator appear difficult to obtain. The performance of this estimator is measured through risk functions.
Greenberg and Sarhan (1962) propose a nonparametric estimator of the same distribution's quantile. If Z_p = (X_p − λ)/σ, then their estimator, ξ̂_p(S-G), is a linear combination of order statistics with variance derived in their article. An approximate 1−α confidence interval can be formed based upon this estimator:

    ξ̂_p(S-G) ± Φ⁻¹(1−α/2) S(ξ̂_p(S-G)).

S(ξ̂_p(S-G)) may be easily obtained since Z_p = −ln(1−p), and λ and σ can be estimated from the data.
Ali, Umbach, and Hassanein (1981) use much of the Kubat and Epstein (1980) technique to estimate a quantile from the two-parameter exponential distribution. After algebra designed to minimize the variance of the estimator, the result for this distribution is presented as:

    x̂_p = (1 − Z_p/1.59362) x(1) + (Z_p/1.59362) x([.7967n]+1),   for 0 ≤ p ≤ .3339 and .9296 < p;

    x̂_p = .745 x([(1.50137p − .50137)n]+1) + .255 x([(.30506p + .69494)n]+1),   for .3339 ≤ p ≤ .9296.

A confidence interval based on this estimator assumes the form x̂_p ± Φ⁻¹(1−α/2) S(x̂_p), since this method follows the work of Kubat and Epstein. This method has been shown to have much higher asymptotic relative efficiency than the sample quantile.
1.2.4.3 Quantile Estimation for Other Distributions

Much has also been written about estimating quantiles from other distributions. For the Weibull, Lawless (1975), Mann and Fertig (1975, 1977), Schafer and Angus (1979), as well as others, have proposed relevant estimators.

Many other distributions have had their quantiles estimated, generally by methods not easily leading to confidence interval construction. Angus and Schafer (1979) discuss estimation of logistic quantiles. Ali, Umbach, and Hassanein (1981) estimate double exponential quantiles. Umbach, Ali, and Hassanein (1981) estimate Pareto quantiles. Lawless (1975), and Mann and Fertig (1977) estimate quantiles from the extreme-value distribution. Sarhan and Greenberg (1962) estimate the p-th quantile from the Uniform (0,1) distribution. Finally, Weissman (1978) demonstrates estimation of large quantiles based on the k largest observations in the sample if the cdfs have the general forms:

    I.   Λ(x) = exp(−e^{−x}),    −∞ < x < ∞;
    II.  Φ(x) = exp(−x^{−α}),    x > 0, α > 0; or
    III. Ψ(x) = exp(−(−x)^{α}),  x < 0, α > 0.
1.2.5 Estimation of Quantile Intervals

Previously, we have only discussed estimation of individual quantiles of a distribution. There is often interest in other related quantities which are functions of quantiles. Two such quantities are quantile intervals (ξ_{p1}, ξ_{p2}) and quantile differences, ξ_{p2} − ξ_{p1}. An important example of the latter is ξ_{.75} − ξ_{.25}, the interquartile range. This is shown in Chu (1957) to be a reasonable measure of dispersion for many distributions since it is only a constant multiplier of the standard deviation.
Wilks (1962) defines an outer confidence interval for the quantile interval (ξ_{p1}, ξ_{p2}), based on x(i), x(j) (1 ≤ i < j ≤ n), as the random interval (x(i), x(j)) with P((ξ_{p1}, ξ_{p2}) ⊂ (x(i), x(j))) ≥ γ, where γ is the nominal confidence coefficient. An inner confidence interval is defined as the random interval (x(i), x(j)) with P((x(i), x(j)) ⊂ (ξ_{p1}, ξ_{p2})) ≥ γ. Without loss of generality, one may show that if 0 < p1 < p2 < 1, 1 ≤ i < j ≤ n, and X(1) < ... < X(n) is a random sample from a Uniform (0,1) distribution, then

    P{X(i) < ξ_{p1} < ξ_{p2} < X(j)} = [n!/(i−1)!] Σ_{k=0}^{j−i−1} (−1)^k p1^{i+k} I_{1−p2}(n−j+1, j−i−k) [k!(n−i−k)!(i+k)]⁻¹,

with a companion expression for the inner interval, where I_t(a,b) is the incomplete beta function.
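The outer-coverage probability for uniform samples is easy to check by direct simulation (a quick sketch, not from the dissertation; for n = 2, i = 1, j = 2 the event reduces to min < p1 and max > p2, whose probability is 2 p1 (1 − p2) when p1 < p2):

```python
import random

def outer_coverage(n, i, j, p1, p2, reps=200_000, seed=1):
    """Monte Carlo estimate of P{X(i) < p1 < p2 < X(j)} for Uniform(0,1) samples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = sorted(rng.random() for _ in range(n))
        if xs[i - 1] < p1 and xs[j - 1] > p2:
            hits += 1
    return hits / reps
```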
David (1981) provides lower bounds for these interval probabilities.
Krewski (1976) produces tighter bounds on the confidence coefficient for outer confidence intervals. His bounds are expressed through ratios of incomplete beta functions; in particular, with

    α₂ = max{ I_{p2}(j+1, n−j+1) / I_{p2}(j, n−j+1), j/(n+1) },

one bound takes the form

    P(X(i) < ξ_{p1} < ξ_{p2} < X(j)) ≥ I_{p1/α₂}(i, n−i+1) − I_{p1/α₂}(i, j−1) I_{p2}(j, n−j+1),

and a companion bound involves α₁ = max{ I_{1−p2}(n−j+1, j+1) / I_{1−p2}(n−j+1, j), j/(n+1) }.
Shortly thereafter, Reiss and Rüschendorf (1976) proposed a method which sometimes led to sharper confidence bounds than Krewski could produce. Their bounds involve partitioning the interval between p1 and p2 into smaller sections and labeling each partition point a_i, so p1 = a₀ < a₁ < ··· < a_k = p2. These intermediate probability values are then employed in complex expressions involving functions of the beta function, which results in more closely bracketing the probability of lying in the outer interval. The results presented in the paper indicate slightly tighter bounds than those of Krewski.

Sathe and Lingras (1981) improve upon both of the previous papers' ideas by introducing the notion of convex functions. This method is extremely complicated and will not be discussed. Essentially, they demonstrate that they can obtain even sharper bounds than Reiss and Rüschendorf's, and that the bounds can be made even sharper by subdividing the interval between the two probabilities.
1.2.6 Estimation of Quantile Differences

Only one paper has been published specifically discussing intervals for interquartile ranges or other quantile differences. The main theorem of Chu (1957) provides bounds on the confidence coefficients for estimates of differences of the form ξ_q − ξ_p. Let the confidence intervals for ξ_q − ξ_p be of the form (x(v) − x(u), x(s) − x(r)). Then, if

    B_n(r, p) = Σ_{i=0}^{r} C(n, i) p^i (1−p)^{n−i},

bounds on the confidence coefficient can be expressed in terms of differences of these cumulative binomial probabilities. David (1981) provides a simpler proof than does Chu. The article also explains that if k = [np]+1 and m = [nq]+1, then as n → ∞, x(m) − x(k) is asymptotically distributed as N(ξ_q − ξ_p, O(1/n)). Estimation of a symmetric quasi-range, such as the interquartile range or interdecile range, is also discussed.
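The cumulative binomial B_n(r, p) above is elementary to compute exactly (a small helper sketch; Chu's actual bounds, which combine several such terms, are not reproduced here):

```python
from math import comb

def cum_binom(n, r, p):
    """B_n(r, p) = sum_{i=0}^{r} C(n, i) p^i (1-p)^(n-i)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(r + 1))
```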
1.3 Outline of the Research Proposal

In the present work, the papers of Harrell and Davis (1982) and Kaigh and Lachenbruch (1982) are extended. Chapter II will consider the problem of relative performance of the proposed estimators for construction of confidence intervals. The confidence interval for the linear combination of order statistics estimator from Harrell and Davis will be of the form Q̂ ± C S(Q̂), where S(Q̂) is the estimated standard deviation of the estimator, and C is a standard normal deviate.
The confidence interval for the other estimator will be as proposed in Kaigh (1982). Under the assumption that the data follow the uniform, normal, exponential, lognormal, or Cauchy distributions, confidence intervals based on the proposed estimators will be constructed. The confidence intervals so obtained will be judged relative to other confidence interval estimators with regard to expected length and ability to preserve the desired confidence level. This comparison will consider parametric and nonparametric estimators under the distributions mentioned.

In Chapter III, estimators for the difference of two quantiles, ξ_q − ξ_p, such as the interquartile range, will be constructed based on the linear combination of order statistics estimators.
Important distributional properties of these estimators will be derived; for example, the conditions under which asymptotic normality holds, and how their variances may be estimated, will be discussed.
Confidence intervals will be constructed based on distributional results in large samples.

Chapter IV will present results and methodology of simulations used to demonstrate the performance of the estimators of interquantile differences. Their performance will be discussed relative to the confidence interval bounds presented by Chu (1957). In Chapter V, an example will be presented to demonstrate the estimators' application to health data. This will consist of a demonstration on quantiles of lipid distributions from the Lipid Research Clinics Project. Finally, Chapter VI will summarize results obtained and will offer a few suggestions for further research.
CHAPTER II
A COMPARISON OF CONFIDENCE INTERVALS FOR QUANTILES
2.1 Introduction
In order to evaluate the performance of a confidence interval for quantile estimation, two criteria will be used. Ability to preserve the desired confidence level is foremost. If a constructed interval of intended confidence (1−α) is demonstrated to be of confidence clearly below 1−α, then this interval is of little use. Among those intervals which can be shown to hold the desired (1−α) confidence, the preferred interval is the shortest.
Throughout this chapter, the notation (n,p)L(1−α) will refer to the length of a (1−α) confidence interval for the p-th quantile based on a sample of size n. The expected length of this confidence interval is denoted E((n,p)L(1−α)) and is the difference between the expected values of the random variables comprising the ends of the interval. This chapter will systematically evaluate various confidence intervals constructed for estimating quantiles and offer recommendations regarding their use.
2.2 Selection of Interval Estimators for Comparison
As is evident from the previous chapter, there are numerous methods available to construct confidence intervals for quantiles. The decision to include or exclude estimators for the comparisons in this chapter is based on several criteria. For nonparametric estimators, the first criterion is that the confidence interval can be constructed from a single random sample of arbitrary size n. This excludes, for example, Schmeiser's (1975) normal-theory estimator, which employs m such samples, as well as nomination sampling proposed by Willemain (1980), which also requires multiple samples. Walsh's (1958) method was excluded because it was only readily computable for samples up to size 12.

Secondly, no knowledge of the shape or type of the underlying distribution should be required. Several estimators require symmetric underlying distributions to estimate the median, so these were not considered. Kubat and Epstein's (1980) method was not considered because knowledge of the distribution is required in a small interval around the quantile.

Finally, the method must be straightforward to calculate for the quantile(s) of interest. The methods of Ekblom (1973), Guilbaud (1979), Weiss (1960), and Azzalini (1981) were all eliminated for this reason.
The remaining candidates for comparison were few in number. The linear combination of order statistics (L-COST) estimator proposed by Harrell and Davis (1982), the method of Kaigh and Lachenbruch (1982), the bootstrap for the median, discussed by Efron (1979), and the method using the order statistics X(j) and X(k) for endpoints, as discussed in David (1981, p. 15) and elsewhere, were all considered reasonable to compare. Finally, when one or more parametric methods for construction of the interval for a particular distribution appear in the literature, the simplest such method is included for comparison purposes.
2.3 Note on Use of the Kaigh and Lachenbruch Estimator

The purpose of this chapter is to compare the above-mentioned confidence interval estimators for a quantile. In order to use the Kaigh and Lachenbruch (1982) estimator, either as a point estimator or in a confidence interval, it is necessary to select a value for k, the subsample size, as explained in Chapter I. Kaigh and Lachenbruch suggest choosing k so as to minimize E(ξ̂_p(K-L) − ξ_p)², where ξ̂_p(K-L) denotes their point estimator for the p-th quantile. However, making the choice of k in this way implies knowledge concerning the underlying distribution. Empirical results have indicated that lengths of confidence intervals and levels of observed confidence are not very sensitive to moderate variation in k (see Kaigh (1982)). Thus, choosing a reasonable, but not necessarily optimal, k will be sufficient. Secondly, since the Kaigh and Lachenbruch confidence interval estimator is constructed using a t-distribution with n−k degrees of freedom, when the sample is small, it is preferable to select a smaller value of k, all else being equal. This will prevent the critical t value from becoming excessively large. Finally, regardless of quantile, a value of k should not be selected which is small enough to permit the extreme, or nearly-extreme, order statistics to be included. This is of most help when dealing potentially with long-tailed distributions.
Kaigh and Lachenbruch (1982) suggest choosing moderate values of k for median estimation, and somewhat larger values for the estimation of other quantiles. This might suggest, for example, choosing a value of k of about n/3 for estimation of the median, and about 3n/4 for other quantiles, assuming a sample of size n.

The method used in this chapter is more complicated, but may lead to an interval centered around a point estimator which has lower bias. The first step in this process, regardless of the quantile to be estimated, is to examine the negative hypergeometric probability distribution (weights) for the particular quantile and sample size of interest, for various values of k. A sufficiently large number of values of k was selected so as to reasonably cover a range from small subsamples up to the entire sample size. Choosing about n/5 values of k was considered adequate. One probability distribution was constructed for each value of k (and n and p) under consideration.

To estimate the median, the order statistics which combine to form the sample median were determined. All weight distributions computed were roughly symmetric about this sample median. A value of k was chosen for which about one-third of the order statistics on either side of the sample median, nearest the ends, have zero weight. This resulted in a probability distribution with a well-defined, but smooth, "peak" at the sample median. That is, one for which the probabilities decrease greatly in magnitude when more than four or five order statistics away from the sample median. This is intended to reduce variability.
To estimate a quantile other than the median, the order statistic(s) which form(s) the sample quantile were determined. Among the probability distributions computed, the one which assigns the greatest probability very close to, or at, the sample quantile was identified. When more than one value of k had a distribution with high probability near the sample quantile, the value of k for which the probabilities surrounding the sample quantile appeared to be in the most well-defined, but smooth, cluster was chosen.

These steps were taken to help reduce the bias of the point estimator, and hence lead to an interval centered near the quantile of interest. It must be realized, however, that varying k leads to variation in the standard deviation of the estimator as well as in the critical t-value. Hence, choosing k according to these suggestions may or may not lead to realization of the interval with the most confidence.
2.4 Evaluation of Confidence Intervals

The intervals considered are either determined exactly or require simulation. These need to be considered separately with regard to their evaluation.

2.4.1 Exact Confidence Intervals

2.4.1.1 Determination of Confidence

Although it is intended to construct intervals of (1−α)×100% confidence, this may not always be the case. Two of the methods do permit nearly exact determination of the confidence of an interval obtained. In the cases of the uniform, exponential (λ=1), and N(0,1) distributions, parametric estimators exist in the literature. In each case, the expected value of the estimator depends on p, n, and Φ⁻¹(1−α/2). As Φ⁻¹(1−α/2) can be determined almost exactly from tables, an interval can be determined with nearly exact confidence 1−α.
Also, for the same distributions, because the expected values of order statistics are readily computed or already tabled, the order statistics confidence interval, (X(j), X(k)), for any quantile can be evaluated almost exactly. In these cases, regardless of distribution, the exact probability of having the true quantile between the two order statistics can be found by examining a table of the cumulative binomial probability distribution for the p and n specified. See, for example, the Harvard University (1955) tables. Table 2.1 is constructed in part from the Harvard tables. Simulated confidence intervals will be discussed in section 2.4.2.
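The table lookup can be replaced by direct computation: for a continuous distribution, P(X(j) < ξ_p < X(k)) is the probability that the number of observations below ξ_p falls in {j, ..., k−1}, a sum of binomial terms (a sketch, not part of the dissertation's own programs):

```python
from math import comb

def os_coverage(n, j, k, p):
    """P(X(j) < xi_p < X(k)) = sum_{i=j}^{k-1} C(n, i) p^i (1-p)^(n-i)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(j, k))
```

For example, with n = 10 and the median, the interval (X(2), X(9)) has exact coverage 1002/1024 ≈ .9785.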
2.4.1.2 Expected Length of Confidence Intervals

If an underlying N(0,1) distribution is assumed, then Owen (1968) provides

    P(X̄ + E_{α/2} S < μ + ζ_p σ < X̄ + E_{1−α/2} S) = 1−α,

where S = sample standard deviation, and √n E_{α/2}, √n E_{1−α/2} are, respectively, lower and upper critical points of a noncentral t-distribution with noncentrality parameter √n Φ⁻¹(p). Then,

    E((n,p)L(1−α)) = E(X̄ + E_{1−α/2} S) − E(X̄ + E_{α/2} S) = (E_{1−α/2} − E_{α/2}) E(S).

When X is distributed as a N(0,1) random variable,

    E(S) = Γ(n/2) (2/(n−1))^{1/2} (Γ((n−1)/2))⁻¹.

Thus,

    E((n,p)L(1−α)) = (E_{1−α/2} − E_{α/2}) Γ(n/2) (2/(n−1))^{1/2} (Γ((n−1)/2))⁻¹.

The last column of Tables 2.4 and 2.5 presents these expected lengths.
Greenberg and Sarhan (1962) present an appropriate parametric estimator for a quantile of the one-parameter exponential distribution. Following the notation established in the previous chapter,

    ξ̂_p(S-G) = −θ* ln(1−p),

where θ* is the estimator of θ. This estimator has variance [ln(1−p)]² θ²/n, with θ² estimated by θ*². This would then lead to an estimator of the standard deviation:

    S(ξ̂_p(S-G)) = −θ* ln(1−p)/√n.

Following a large-sample approach, the confidence interval can be constructed as

    ξ̂_p(S-G) ± Φ⁻¹(1−α/2) S(ξ̂_p(S-G)).

Thus,

    E((n,p)L(1−α)) = 2 Φ⁻¹(1−α/2) E(−θ* ln(1−p)/√n) = −2 Φ⁻¹(1−α/2) ln(1−p) E(θ*)/√n.

It is easy to show that when X ~ exponential (λ = 1/θ),

    E(θ*) = [n/(n−1)] θ (1 − 1/n) = θ.

Therefore,

    E((n,p)L(1−α)) = −2 Φ⁻¹(1−α/2) θ ln(1−p)/√n.

These expected lengths are presented in the last column of Tables 2.8 and 2.9.
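As a numeric sanity check of the last formula (a small sketch, not from the dissertation; the defaults θ = 1 and α = .05, so Φ⁻¹(.975) ≈ 1.96, are arbitrary):

```python
import math
from statistics import NormalDist

def exp_sg_expected_length(n, p, theta=1.0, alpha=0.05):
    """E((n,p)L(1-alpha)) = -2 * z_{1-alpha/2} * theta * ln(1-p) / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return -2 * z * theta * math.log(1 - p) / math.sqrt(n)
```

As expected, the length shrinks like 1/√n and grows for more extreme quantiles.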
If the data have an underlying uniform distribution on the interval (0,θ), then

    ξ̂_p(S-G) = p θ*,

where θ* = [(n+1)/n] X(n). This estimator has variance

    Var(ξ̂_p(S-G)) = p² Var[((n+1)/n) X(n)] = p² θ² / (n(n+2)).

These formulae permit an asymptotic (1−α)×100% confidence interval to be constructed of the form ξ̂_p(S-G) ± Φ⁻¹(1−α/2) S(ξ̂_p(S-G)), where

    S(ξ̂_p(S-G)) = p θ* / √(n(n+2)).

As θ* is unbiased for θ, the expected length is

    E((n,p)L(1−α)) = 2 Φ⁻¹(1−α/2) p θ / √(n(n+2)).

These tabled values are presented in the last column of Tables 2.2 and 2.3.
For several distributions, the exact expected value of the order statistics comprising the endpoints of this type of interval can be computed readily. The interval of the form (X(j), X(k)) is constructed such that P(X(j) < ξ_p < X(k)) ≥ 1−α and as close to 1−α as possible.

For the N(μ,σ²) distribution with cdf Φ, the expected value of the k-th order statistic involves an integral with no closed-form solution, and thus the expected length of the confidence interval is difficult to obtain in a closed form. Harter (1961) has performed the required numerical integration to obtain the expected value of each order statistic from a N(0,1) distribution. Using his tables, the expected value of the interval is easily obtained as the difference of the expected values of the order statistics at the endpoints. The next to last column of Tables 2.4 and 2.5 presents the expected lengths.

The expected value of the k-th order statistic from an exponential (λ = 1/θ) distribution is

    E(X(k)) = θ Σ_{i=1}^{k} (n − i + 1)⁻¹,

so the expected lengths follow directly; they are presented in the next to last column of Tables 2.8 and 2.9.

For a uniform distribution on (0,θ), the expected value of the j-th order statistic is E(X(j)) = θj/(n+1). Thus, under a uniform (0,θ) distribution,

    E((n,p)L(1−α)) = E(X(k)) − E(X(j)) = θ(k−j)/(n+1).

These lengths are in the next to last column of Tables 2.2 and 2.3.
2.4.2 Simulated Confidence Intervals

In the case of the L-COST, Kaigh and Lachenbruch, and Bootstrap methods, the intervals were simulated. As a check, the order statistic method was also simulated.

2.4.2.1 Determination of Confidence

The observed confidence for the methods requiring simulation was computed as

    γ̂ = S₀⁻¹ Σ_{i=1}^{S₀} I_i(A_i < ξ_p < B_i),                        (2.1)

where

    I_i(A_i < ξ_p < B_i) = 1 if A_i < ξ_p < B_i, and 0 otherwise;
    S₀ = number of simulations performed;
    (A_i, B_i) = confidence interval constructed.
The number of simulations performed for each combination of sample size, quantile, method, underlying distribution, and desired confidence of the interval is S₀. A confidence interval is constructed from each simulated sample. Based on the known, underlying distribution of the sample, the p-th population quantile is calculated. Formula (2.1) is used to find an estimate of the true confidence. To decide whether the observed confidence is acceptable, the confidence interval may be evaluated on the assumption that inclusion/exclusion of ξ_p from the interval is distributed as a binomial random variable with probability parameter γ = 1−α and sample size S₀. An appropriate probability statement takes the form:

    P(γ̂ − 1.96 [γ(1−γ)/S₀]^{1/2} ≤ γ ≤ γ̂ + 1.96 [γ(1−γ)/S₀]^{1/2}) ≈ .95.    (2.2)

Since γ̂ is the observed confidence, a 95% interval for γ is readily obtained from (2.2).
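A sketch of this binomial check (the half-width uses the normal approximation evaluated at the nominal γ = 1−α, an assumed form of (2.2)):

```python
import math

def coverage_band(gamma, s0, z=1.96):
    """95% acceptance band for the observed confidence when the true
    coverage is gamma and s0 intervals are simulated."""
    half = z * math.sqrt(gamma * (1 - gamma) / s0)
    return gamma - half, gamma + half
```

With γ = .95 and S₀ = 1000 the band is roughly (.9365, .9635); an observed confidence outside it suggests the interval does not hold its nominal level.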
2.4.2.2 Expected Lengths of Intervals

The expected lengths of simulated intervals can be easily estimated. For each underlying distribution and quantile to be considered, confidence intervals have been simulated based on X(j) and X(k).

The bootstrap method of Efron (1979) can be used to form a confidence interval for the median, as shown in Chapter I. The expected squared error of estimation for the sample median is estimated by E*(R*)² and can be simulated based on formulae presented as (1.1) of Chapter I. The expected length of the interval is

    2 Φ⁻¹(1−α/2) E[(E*(R*)²)^{1/2}].
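A minimal sketch of the bootstrap interval for the median (resampling from the empirical cdf in the spirit of Efron (1979); the number of resamples, 500, is an arbitrary choice here, not a value used in the dissertation):

```python
import math
import random
import statistics

def bootstrap_median_interval(x, z=1.96, n_boot=500, seed=1):
    """Median +/- z * bootstrap standard error."""
    rng = random.Random(seed)
    med = statistics.median(x)
    boots = []
    for _ in range(n_boot):
        resample = [rng.choice(x) for _ in x]  # draw n values from the empirical cdf
        boots.append(statistics.median(resample))
    mb = sum(boots) / n_boot
    se = math.sqrt(sum((b - mb) ** 2 for b in boots) / n_boot)
    return med - z * se, med + z * se
```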
The L-COST interval is of the form Q̂_p ± Φ⁻¹(1−α/2) S(Q̂_p), as shown in (1.2) and (1.3) of Chapter I. The expected length of the confidence interval is then

    2 Φ⁻¹(1−α/2) E(S(Q̂_p)),

with S(Q̂_p) being simulated. Finally, the Kaigh and Lachenbruch (1982) estimator can be used in a confidence interval of the form ξ̂_p(K-L) ± t_{n−k} S(ξ̂_p(K-L)). This leads to an expected length of 2 t_{n−k} E(S(ξ̂_p(K-L))).
46
A variance estimator,

    S²(ξ̂_p(K-L)) = V̂(ξ̂_p(K-L)),

can be based on the jackknife as follows.  Writing ξ̂_p(K-L) as θ̂^0_n,
and θ̂^i_{n-1} for the point estimate for a sample with the i-th
observation removed and the weights readjusted accordingly, a
jackknifed estimator for the point estimator can be written as

    θ̂^*_i = n θ̂^0_n - (n-1) θ̂^i_{n-1}.                            (2.3)

Using (2.3), an estimator for the variance can be constructed as
follows:

    S² = Σ_{i=1}^{n} (θ̂^*_i - θ̄^*)² / (n(n-1))                     (2.4)

where

    θ̄^* = Σ_{i=1}^{n} θ̂^*_i / n.

Thus,

    S² = (n(n-1))^{-1} Σ_{i=1}^{n} [n θ̂^0_n - (n-1) θ̂^i_{n-1}
                                    - (n θ̂^0_n - (n-1) θ̄_{n-1})]²
       = (n(n-1))^{-1} Σ_{i=1}^{n} (n-1)² (θ̂^i_{n-1} - θ̄_{n-1})²
       = ((n-1)/n) Σ_{i=1}^{n} (θ̂^i_{n-1} - θ̄_{n-1})²              (2.5)

where

    θ̄_{n-1} = Σ_{i=1}^{n} θ̂^i_{n-1} / n.
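The identity leading to (2.5) can be checked numerically; the sketch below builds the pseudo-values of (2.3)-(2.4) and compares them with the leave-one-out form (2.5), with the sample median standing in for the K-L point estimator (names are illustrative):

```python
import statistics

def jackknife_variance(x, est):
    """Jackknife variance via the pseudo-values of (2.3)-(2.4), together
    with the equivalent leave-one-out form (2.5)."""
    n = len(x)
    theta0 = est(x)
    leave_one = [est(x[:i] + x[i + 1:]) for i in range(n)]
    pseudo = [n * theta0 - (n - 1) * t for t in leave_one]          # (2.3)
    pbar = sum(pseudo) / n
    s2 = sum((p - pbar) ** 2 for p in pseudo) / (n * (n - 1))       # (2.4)
    lbar = sum(leave_one) / n
    s2_alt = (n - 1) / n * sum((t - lbar) ** 2 for t in leave_one)  # (2.5)
    return s2, s2_alt

# the sample median stands in for the K-L point estimator here
x = [2.1, 3.4, 1.7, 5.0, 4.2, 2.8, 3.9]
s2, s2_alt = jackknife_variance(x, statistics.median)
```

The two forms agree exactly, since each pseudo-value deviation is -(n-1) times the corresponding leave-one-out deviation.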
2.4.3 Selection of Distribution for Pivotal Quantity
The statistic

    (θ̂ - θ) / S(θ̂)

is frequently known as a pivotal quantity.  When θ = ξ_p, and θ̂ is
based on the Bootstrap or L-COST method for estimation, the normal
distribution is used to approximate the distribution of the pivotal
quantity.  Tukey (1958) would argue in favor of using a t-statistic
with n-1 degrees of freedom when both the point estimator and its
standard error are based on a jackknife.  Neither the L-COST nor the
Bootstrap point estimator is a jackknife estimator, so his argument
does not necessarily apply to pivotal quantities based on these
estimators.  Since the pivotal quantities have been shown to be
distributed normally, at least asymptotically (see, e.g., Kaigh
(1982)), the normal distribution was chosen to approximate L-COST and
Bootstrap critical values.  Since Kaigh (1982) suggests using t_{n-k}
for his estimator, this was used for the comparisons in this chapter.
2.5 Details of the Simulation Process

In order to represent a range of shapes of distributions, the
uniform (0,1), N(0,1), exponential (λ=1), standard Cauchy, and
standard lognormal distributions were selected for the simulation
study.  Because of its ease of use, PROC MATRIX in SAS (1979) was
employed to code and execute the simulations performed.  To generate
uniformly distributed random variables, the UNIFORM function was
used.  For N(0,1) random variables, the NORMAL function was used.
To generate exponential random variables, the probability integral
transform was employed: if X has a continuous cumulative distribution
function, F(x), then Y = F(X) is uniformly distributed on the interval
(0,1).  Let U be a uniform (0,1) random variable as generated by the
SAS function UNIFORM.  Then E = -ln(1-U) is a standard exponential
random variable and C = tan(π(U-½)) is a standard Cauchy random
variable.  To simulate a sample from the lognormal distribution, the
relation that X is distributed as a lognormal random variable if
Y = log X is distributed as a N(0,1) random variable was employed.
Thus, to generate X from a lognormal distribution, Y was generated
from a N(0,1) distribution and the transformation X = EXP(Y) was used.
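These transformations can be sketched in Python in place of the original SAS code (function names are illustrative):

```python
import math
import random

random.seed(1433)

def exponential(u):
    """Probability integral transform: E = -ln(1-U) is standard exponential."""
    return -math.log(1.0 - u)

def cauchy(u):
    """C = tan(pi*(U - 1/2)) is standard Cauchy."""
    return math.tan(math.pi * (u - 0.5))

def lognormal():
    """X = exp(Y) with Y ~ N(0,1) is standard lognormal."""
    return math.exp(random.gauss(0.0, 1.0))

u = random.random()
print(exponential(u), cauchy(u), lognormal())
```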
Five hundred simulations were computed.
This allows observed
confidences within about .02 of .95 to be acceptable for the 95% confidence interval, and within about .01 of .99 to be acceptable for the
99% interval.
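The stated tolerances follow from two binomial standard errors with S_0 = 500; a quick check (a sketch, not part of the original SAS program):

```python
import math

s0 = 500
for gamma in (0.95, 0.99):
    # two-sided normal-approximation margin for an observed confidence
    margin = 1.96 * math.sqrt(gamma * (1 - gamma) / s0)
    print(gamma, round(margin, 3))   # about .02 for .95, about .01 for .99
```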
Three odd sample sizes, 11, 31, and 51, were chosen to
permit direct use of the Bootstrap method, and to permit some generalization of results for small-to-medium sized samples.
The median,
quartiles, and deciles were reasonable to estimate from the largest
sample size, but the deciles were not estimated from the sample of
size 31, and only the median was estimated from the smallest sample
size.
This was done in order to use only those quantiles for which the
order statistic method can achieve the desired confidence without
extending beyond the first or last order statistic.
As the uniform and normal distributions are symmetric,
only the quantiles at or below the median were estimated.
The Cauchy
is also symmetric, but due to very wide tails, it appeared useful to
estimate the upper and lower quartiles as the two estimates may
differ.
Deciles from the Cauchy were not estimated as trial results
indicated very poor stability of estimates over repeated simulations.
2.6 Results from Simulated or Theoretical Construction of Intervals
This section contains the results of the simulated or exact
confidence intervals constructed for the cases described above.
The
first table, 2.1, contains the order statistics which would form the
endpoints of the order statistic estimator, as well as the theoretical
confidences obtainable.
Then, tables 2.2 through 2.11 report each
underlying distribution's results in separate tables for 95% confidence intervals and 99% confidence intervals.
Each expected length
estimated or calculated is presented with its observed or theoretical
confidence directly beneath in parentheses.
The first comparison of results is between the simulated order
statistic interval and the exact order statistic interval.
For the
distributions which permit exact calculation of the order statistic
interval, it is clear that the simulated interval agrees to a large
degree with the exact one, both in terms of expected length and
ability to preserve the confidence level.
This agreement testifies
to the quality of the samples which were generated.
Thus, the esti-
mates which result from simulation of other methods are likely to be
reasonable.
Considering median estimation, it seems that the L-COST interval
estimator, constructed using the normal distribution, is generally
unable to preserve the desired confidence.
The exception is for a
small sample generated to be from a Cauchy distribution.
It was not
a poor estimator, but did violate this important criterion.
On the
other hand, the Kaigh and Lachenbruch method almost always exceeded
or equaled 93% confidence for a 95% confidence interval and 98% for
a 99% confidence interval.
The bootstrap estimator tended to perform about as well as the
Kaigh and Lachenbruch estimator with respect to preservation of
confidence.
In every instance, except for estimation of a Cauchy median
by samples of size 11, the length of the confidence interval from a
bootstrap was at least as great as that from the Kaigh and Lachenbruch
method.
The simulated order statistics interval is quite good with
regard to preserving confidence, but is always longer than the equivalent Kaigh and Lachenbruch estimator.
Evaluating the estimators' performance at other quantiles provides different results.
For distributions other than the Cauchy,
neither the Kaigh and Lachenbruch nor the L-COST interval, as
presently formulated, is able to provide the specified confidence.
It is evident that selection of k by the more complicated method
described in section 2.3 is not adequate to ensure proper confidence
for these quantiles.
Kaigh's (1982) article describing construction of confidence
intervals from the Kaigh and Lachenbruch estimator also presents
results of simulations.
The article compares the expected length
of confidence intervals and observed confidence from the Kaigh and
Lachenbruch estimator, the order statistics estimator, and an estimator which is a generalization of the L-COST estimator.
This generalized estimator assumes the form

    L* = Σ_{j=1}^{n} [ ∫_{(j-1)/n}^{j/n} x^{r-1} (1-x)^{k-r}
                       / B(r, k-r+1) dx ] X_(j),     0 < x < 1,

where r = [(k+1)p].  Letting k=n produces the L-COST estimator.
His
results show that L* is better able, under the symmetric distributions presented, to preserve the desired confidence than is L-COST.
The results are presented only for sample sizes 19 and 99, and for
the uniform, normal, double exponential, and Cauchy distributions,
so ability to perform under a wider variety of distributions is not
considered.
The problem of selecting k would remain under this modification,
so it could be considered equivalent in many respects to
the Kaigh and Lachenbruch estimator.
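The weights of L* are Beta(r, k-r+1) probability masses over the intervals ((j-1)/n, j/n]; a sketch using SciPy (`lstar_weights` is an illustrative name).  Note that with k = n and (n+1)p an integer, the Beta parameters coincide with those of the L-COST weights:

```python
from scipy.stats import beta

def lstar_weights(n, k, p):
    """Weights of the generalized estimator L*: the j-th order statistic
    receives the Beta(r, k-r+1) mass on ((j-1)/n, j/n], r = [(k+1)p]."""
    r = int((k + 1) * p)
    b = beta(r, k - r + 1)
    return [b.cdf(j / n) - b.cdf((j - 1) / n) for j in range(1, n + 1)]

# k = n = 19, p = .5: r = 10, so the weights are Beta(10, 10) masses,
# the same Beta parameters as L-COST with (n+1)p = 10
w = lstar_weights(19, 19, 0.5)
```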
To the L-COST's credit, it should be noted that when the confidence obtained by the L-COST estimator is at least as great as that
of the Kaigh and Lachenbruch estimator, the expected length of the
former interval is never greater than that of the latter.
The order
statistics confidence interval, however, clearly preserves the desired confidence, and except for the 99% confidence interval for
Cauchy quantiles, its expected lengths are quite close to those of
methods which cannot preserve the desired confidence.
2.7 Conclusions
From the results presented in the previous section, it is
apparent that the L-COST interval estimator cannot be depended upon
to provide a confidence interval of stated level (l-a) in small to
moderate samples so long as the normal distribution is used to construct the interval.
Only under certain distributions is an assumption of approximate
normality valid in small samples.
In view of the merits of the L-COST method for point estimation,
it may be worthwhile to explore its performance when constructed with
other than the normal distribution.
For example, if a t_{n-1} statistic
were used in construction of the interval, the length would increase
about 13% for a sample of size 11, but only 4% for a sample of size
31, and 2% for 51.
Based upon the results obtained, this may, in some
cases, be sufficient extra length to provide adequate coverage.
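The quoted increases can be verified from the standard normal quantile and two-sided .975 critical values of the t distribution taken from standard tables (a sketch; only the three sample sizes used in the study are checked):

```python
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)              # 1.960
t_crit = {10: 2.228, 30: 2.042, 50: 2.009}   # t_{n-1, .975} for n = 11, 31, 51
for df, t in t_crit.items():
    # percent increase in interval length from using t instead of normal
    print(df, round(100 * (t / z - 1), 1))   # about 13%, 4%, and 2-3%
```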
For median estimation, the Kaigh and Lachenbruch estimator performs satisfactorily with regard to preservation of confidence when
considering each interval separately.
However, it appears to be
biased towards being below the desired confidence level when results
are examined overall.
This is reflected in its expected length almost
always being less than that of other interval estimates.
For estimation of quantiles other than the median, the estimator
of choice is the order statistic method given the other intervals as
they are presently constructed.
This method consistently produced
intervals of the desired confidence, or better, and its interval
length was generally quite comparable to interval lengths from estimators which could not attain the desired confidence with enough
regularity.
Of course, all the estimators considered above were nonparametric.
If there is definite knowledge regarding the underlying distribution,
it is obvious that the parametric estimator is the best choice, regardless of quantile.
With regard to the last point, it should be noted that in the
only instances for which the parametric estimator was greater in expected interval length than the L-COST interval, the L-COST interval
was not providing an interval with nearly the specified confidence.
The confidence interval appears to be biased upwards, and its length
is biased towards being too short.
An adjustment in length resulting
from employing a t-distribution would change this, and should be considered.
Or perhaps, the question of whether to jackknife the linear
combination of order statistics should be reconsidered, as discussed
in Efron (1979), and Parr and Schucany (1982).
In any event, when constructing confidence intervals from small
to medium sized samples for the median, the Kaigh and Lachenbruch
method is on the borderline of acceptability, but the order statistic
method assures the desired confidence.
For other quantiles, the order
statistic method is also preferable when compared with other methods
as they are presently constructed.
The L-COST interval might also
perform satisfactorily when constructed using other than the normal
distribution.
TABLE 2.1
ORDER STATISTICS X(j);X(k) COMPRISING A CONFIDENCE
INTERVAL (WITH THEORETICAL CONFIDENCE) FOR
VARIOUS QUANTILES AND SAMPLE SIZES

Desired                                   Quantile
Confidence  n     .10          .25          .50           .75           .90
  .99      11  Start*;X(5)  Start;X(7)   X(1);X(10)   X(5);End*     X(7);End
                 (.997)       (.992)       (.994)       (.992)        (.997)
           31  Start;X(9)   X(3);X(16)   X(8);X(23)   X(16);X(29)   X(23);End
                 (.997)       (.990)       (.993)       (.990)        (.997)
           51  X(1);X(12)   X(6);X(22)   X(16);X(35)  X(31);X(47)   X(40);X(51)
                 (.992)       (.991)       (.992)       (.991)        (.992)
  .95      11  Start;X(4)   Start;X(6)   X(2);X(9)    X(6);End      X(8);End
                 (.981)       (.966)       (.961)       (.966)        (.981)
           31  Start;X(7)   X(4);X(14)   X(11);X(23)  X(18);X(28)   X(25);End
                 (.969)       (.958)       (.959)       (.958)        (.969)
           51  X(2);X(11)   X(8);X(21)   X(19);X(33)  X(31);X(44)   X(41);X(50)
                 (.958)       (.953)       (.951)       (.953)        (.958)

*Start = x such that F(x) = 0; End = x such that F(x) = 1.
TABLE 2.2
EXPECTED LENGTHS OF 95% CONFIDENCE INTERVALS (AND THEORETICAL
OR OBSERVED CONFIDENCE) COMPUTED FOR VARIOUS QUANTILES OF
THE UNIFORM DISTRIBUTION, WITH THREE SAMPLE SIZES

                          Kaigh and               Simulated    Exact
 P    N   K*   L-COST    Lachenbruch  Bootstrap  Order Stat.  Order Stat.  Parametric
.1   51   29  .15 (.92)  .15 (.90)       --      .17 (.98)    .17 (.96)    .01 (.95)
.25  31   23  .26 (.90)  .31 (.90)       --      .31 (.96)    .31 (.96)    .03 (.95)
     51   29  .21 (.93)  .22 (.91)       --      .25 (.97)    .25 (.95)    .02 (.95)
.5   11    3  .45 (.89)  .48 (.95)   .57 (.95)   .59 (.98)    .58 (.96)    .16 (.95)
     31    9  .30 (.89)  .30 (.93)   .36 (.92)   .37 (.96)    .38 (.96)    .06 (.95)
     51   19  .25 (.92)  .25 (.94)   .28 (.93)   .27 (.94)    .27 (.95)    .04 (.95)

*Kaigh and Lachenbruch method only.
TABLE 2.3
EXPECTED LENGTHS OF 99% CONFIDENCE INTERVALS (AND THEORETICAL
OR OBSERVED CONFIDENCE) COMPUTED FOR VARIOUS QUANTILES OF
THE UNIFORM DISTRIBUTION, WITH THREE SAMPLE SIZES

                          Kaigh and               Simulated    Exact
 P    N   K*   L-COST    Lachenbruch  Bootstrap  Order Stat.  Order Stat.  Parametric
.1   51   29  .19 (.96)  .21 (.95)       --      .21 (.99)    .21 (.99)    .01 (.99)
.25  31   23  .34 (.95)  .46 (.95)       --      .40 (.99)    .41 (.99)    .04 (.99)
     51   29  .28 (.97)  .30 (.97)       --      .31 (1.0)    .31 (.99)    .02 (.99)
.5   11    3  .58 (.96)  .70 (.99)   .75 (.98)   .74 (.99)    .75 (.99)    .22 (.99)
     31    9  .40 (.95)  .41 (.97)   .47 (.97)   .47 (.99)    .47 (.99)    .08 (.99)
     51   19  .33 (.96)  .34 (.98)   .37 (.98)   .37 (.99)    .37 (.99)    .05 (.99)

*Kaigh and Lachenbruch method only.
TABLE 2.4
EXPECTED LENGTHS OF 95% CONFIDENCE INTERVALS (AND THEORETICAL
OR OBSERVED CONFIDENCE) COMPUTED FOR VARIOUS QUANTILES OF
THE NORMAL DISTRIBUTION, WITH THREE SAMPLE SIZES

                           Kaigh and                Simulated    Exact
 P    N   K*   L-COST     Lachenbruch  Bootstrap   Order Stat.  Order Stat.  Parametric
.1   51   29   .84 (.89)   .92 (.89)       --      1.06 (.95)   1.03 (.96)    .76 (.95)
.25  31   23   .86 (.90)  1.08 (.92)       --      1.04 (.94)   1.04 (.96)    .81 (.95)
     51   29   .69 (.91)   .75 (.91)       --       .80 (.94)    .80 (.95)    .62 (.95)
.5   11    3  1.26 (.89)  1.43 (.94)  1.62 (.95)   1.80 (.97)   1.79 (.96)   1.31 (.95)
     31    9   .81 (.92)   .82 (.94)   .95 (.96)   1.01 (.96)   1.01 (.96)    .73 (.95)
     51   19   .64 (.93)   .65 (.95)   .72 (.95)    .70 (.95)    .70 (.95)    .56 (.95)

*Kaigh and Lachenbruch method only.
TABLE 2.5
EXPECTED LENGTHS OF 99% CONFIDENCE INTERVALS (AND THEORETICAL
OR OBSERVED CONFIDENCE) COMPUTED FOR VARIOUS QUANTILES OF
THE NORMAL DISTRIBUTION, WITH THREE SAMPLE SIZES

                           Kaigh and                Simulated    Exact
 P    N   K*   L-COST     Lachenbruch  Bootstrap   Order Stat.  Order Stat.  Parametric
.1   51   29  1.10 (.95)  1.25 (.96)       --      1.50 (.99)   1.51 (.99)   1.02 (.99)
.25  31   23  1.14 (.95)  1.57 (.97)       --      1.39 (.99)   1.38 (.99)   1.09 (.99)
     51   29   .91 (.95)  1.01 (.97)       --      1.04 (.99)   1.02 (.99)    .83 (.99)
.5   11    3  1.65 (.95)  2.08 (.99)  2.13 (.98)   2.70 (.99)   2.64 (.99)   1.86 (.99)
     31    9  1.06 (.97)  1.12 (.99)  1.25 (.99)   1.29 (.99)   1.29 (.99)    .98 (.99)
     51   19   .83 (.97)   .87 (.98)   .95 (.98)    .98 (.99)    .97 (.99)    .75 (.99)

*Kaigh and Lachenbruch method only.
TABLE 2.6
EXPECTED LENGTHS OF 95% CONFIDENCE INTERVALS (AND THEORETICAL
OR OBSERVED CONFIDENCE) COMPUTED FOR VARIOUS QUANTILES OF
THE CAUCHY DISTRIBUTION, WITH THREE SAMPLE SIZES

                           Kaigh and                Simulated
 P    N   K*   L-COST     Lachenbruch  Bootstrap   Order Stat.
.25  51   29  1.59 (.94)  1.88 (.96)       --      1.84 (.97)
.5   11    3  2.45 (.97)  3.41 (.99)  3.39 (.99)   4.74 (.98)
     31    9  1.12 (.92)  1.23 (.95)  1.34 (.95)   1.50 (.96)
     51   19   .86 (.93)   .89 (.95)   .98 (.96)    .96 (.94)
.75  51   39  1.59 (.95)  1.75 (.94)       --      1.82 (.96)

*Kaigh and Lachenbruch method only.
TABLE 2.7
EXPECTED LENGTHS OF 99% CONFIDENCE INTERVALS (AND THEORETICAL
OR OBSERVED CONFIDENCE) COMPUTED FOR VARIOUS QUANTILES OF
THE CAUCHY DISTRIBUTION, WITH THREE SAMPLE SIZES

                           Kaigh and                Simulated
 P    N   K*   L-COST     Lachenbruch  Bootstrap   Order Stat.
.25  51   29  2.09 (.98)  2.55 (.98)       --       2.93 (1.0)
.5   11    3  3.22 (.99)  4.96 (.99)  4.46 (.99)   31.87 (.99)
     31    9  1.47 (.97)  1.68 (.98)  1.76 (.98)    2.05 (.99)
     51   19  1.13 (.98)  1.20 (.99)  1.29 (.99)    1.38 (.99)
.75  51   39  2.09 (.98)  2.45 (.99)       --       3.56 (.99)

*Kaigh and Lachenbruch method only.
TABLE 2.8
EXPECTED LENGTHS OF 95% CONFIDENCE INTERVALS (AND THEORETICAL
OR OBSERVED CONFIDENCE) COMPUTED FOR VARIOUS QUANTILES OF
THE EXPONENTIAL DISTRIBUTION, WITH THREE SAMPLE SIZES

                           Kaigh and                Simulated    Exact
 P    N   K*   L-COST     Lachenbruch  Bootstrap   Order Stat.  Order Stat.  Parametric
.1   51   29   .17 (.93)   .17 (.93)       --       .20 (.96)    .20 (.96)    .06 (.95)
.25  31   23   .37 (.90)   .45 (.89)       --       .46 (.95)    .45 (.96)    .16 (.95)
     51   29   .30 (.94)   .30 (.91)       --       .36 (.96)    .36 (.95)    .12 (.95)
.5   11    3  1.03 (.91)  1.20 (.96)  1.36 (.96)   1.31 (.98)   1.32 (.96)    .82 (.95)
     31    9   .65 (.90)   .67 (.93)   .78 (.93)    .88 (.95)    .88 (.96)    .49 (.95)
     51   19   .52 (.93)   .53 (.94)   .60 (.95)    .57 (.94)    .56 (.95)    .38 (.95)
.75  31   23  1.07 (.91)  1.31 (.90)       --      1.31 (.96)   1.35 (.96)    .98 (.95)
     51   39   .88 (.93)  1.01 (.92)       --      1.01 (.97)   1.05 (.95)    .76 (.95)
.9   51   39  1.53 (.92)  1.81 (.89)       --      1.95 (.98)   1.91 (.96)   1.26 (.95)

*Kaigh and Lachenbruch method only.
TABLE 2.9
EXPECTED LENGTHS OF 99% CONFIDENCE INTERVALS (AND THEORETICAL
OR OBSERVED CONFIDENCE) COMPUTED FOR VARIOUS QUANTILES OF
THE EXPONENTIAL DISTRIBUTION, WITH THREE SAMPLE SIZES

                           Kaigh and                Simulated    Exact
 P    N   K*   L-COST     Lachenbruch  Bootstrap   Order Stat.  Order Stat.  Parametric
.1   51   29   .22 (.97)   .23 (.96)       --       .25 (.99)    .25 (.99)    .08 (.99)
.25  31   23   .49 (.95)   .65 (.96)       --       .62 (.99)    .61 (.99)    .21 (.99)
     51   29   .39 (.98)   .41 (.96)       --       .44 (.99)    .43 (.99)    .16 (.99)
.5   11    3  1.35 (.96)  1.75 (.99)  1.79 (.99)   1.89 (.99)   1.92 (.99)   1.08 (.99)
     31    9   .85 (.95)   .91 (.98)  1.02 (.97)   1.02 (.99)   1.02 (.99)    .64 (.99)
     51   19   .68 (.97)   .71 (.98)   .78 (.99)    .77 (.99)    .77 (.99)    .50 (.99)
.75  31   23  1.41 (.95)  1.90 (.96)       --      1.76 (.99)   1.82 (.99)   1.28 (.99)
     51   39  1.16 (.97)  1.42 (.97)       --      1.53 (.99)   1.51 (.99)   1.00 (.99)
.9   51   39  2.01 (.97)  2.54 (.94)       --      3.04 (.99)   3.02 (.99)   1.66 (.99)

*Kaigh and Lachenbruch method only.
TABLE 2.10
EXPECTED LENGTHS OF 95% CONFIDENCE INTERVALS (AND THEORETICAL
OR OBSERVED CONFIDENCE) COMPUTED FOR VARIOUS QUANTILES OF
THE LOGNORMAL DISTRIBUTION, WITH THREE SAMPLE SIZES

                           Kaigh and                Simulated
 P    N   K*   L-COST     Lachenbruch  Bootstrap   Order Stat.
.1   51   29   .24 (.89)   .25 (.89)       --       .29 (.95)
.25  31   23   .46 (.90)   .55 (.90)       --       .55 (.94)
     51   29   .36 (.91)   .36 (.89)       --       .43 (.94)
.5   11    3  1.46 (.91)  1.80 (.95)  1.98 (.97)   1.88 (.97)
     31    9   .84 (.91)   .89 (.95)  1.02 (.95)   1.19 (.96)
     51   19   .66 (.93)   .68 (.95)   .76 (.95)    .73 (.95)
.75  31   23  1.83 (.90)  2.21 (.90)       --      2.28 (.96)
     51   39  1.42 (.91)  1.61 (.91)       --      1.62 (.94)
.9   51   39  3.29 (.92)  3.86 (.87)       --      4.47 (.97)

*Kaigh and Lachenbruch method only.
TABLE 2.11
EXPECTED LENGTHS OF 99% CONFIDENCE INTERVALS (AND THEORETICAL
OR OBSERVED CONFIDENCE) COMPUTED FOR VARIOUS QUANTILES OF
THE LOGNORMAL DISTRIBUTION, WITH THREE SAMPLE SIZES

                           Kaigh and                Simulated
 P    N   K*   L-COST     Lachenbruch  Bootstrap   Order Stat.
.1   51   29   .31 (.96)   .34 (.96)       --       .36 (.99)
.25  31   23   .60 (.95)   .81 (.97)       --       .76 (.99)
     51   29   .47 (.97)   .49 (.95)       --       .54 (.99)
.5   11    3  1.92 (.96)  2.61 (.99)  2.61 (.99)   3.01 (.99)
     31    9  1.11 (.97)  1.21 (.99)  1.34 (.99)   1.35 (.99)
     51   19   .87 (.97)   .92 (.98)  1.00 (.98)   1.01 (.99)
.75  31   23  2.40 (.94)  3.21 (.95)       --      3.20 (.99)
     51   39  1.86 (.96)  2.25 (.96)       --      2.64 (.99)
.9   51   39  4.33 (.95)  5.41 (.94)       --      8.33 (.99)

*Kaigh and Lachenbruch method only.
CHAPTER III
THEORY FOR ESTIMATION OF AN INTERQUANTILE DIFFERENCE
3.1 Introduction
Until the present chapter, most of this work's discussion regarding
quantile estimation has centered on estimation of a single quantile.
The difference between quantiles, however, is also useful.  It can
serve as a nonparametric measure of dispersion.  Chu (1957) discusses
how the standard deviation is in fact a constant multiple of an
interquantile difference in many cases.  Thus, it is of practical
interest to determine useful methods of estimating differences between
quantiles and to provide a comparison of them.  This difference
between two quantiles is called an interquantile difference.
Definition 3.1:
Let ξ_p and ξ_q be the p-th and q-th quantiles from the cdf F,
respectively, with q > p.  The interquantile difference is then
ξ_q - ξ_p.  If F is the cdf of a continuous random variable, then the
interquantile difference is t - s when F(t) = q and F(s) = p.
Both the L-COST method of Harrell and Davis (1982) and the "K-L"
method of Kaigh and Lachenbruch (1982) will be modified in order to
estimate the interquantile difference.  This chapter will present
the theory needed to extend the use of both of these estimators.
The appropriate confidence intervals will also be presented.
3.2 Theory for the L-COST Estimator of Interquantile Difference

3.2.1 The L-COST Interquantile Difference Estimator

The L-COST estimator for the p-th quantile can be written as:

    Q̂_p = Σ_{i=1}^{n} ^p w_{n,i} X_(i)

where

    ^p w_{n,i} = [B((n+1)p, (n+1)(1-p))]^{-1}
                   ∫_{(i-1)/n}^{i/n} y^{(n+1)p-1} (1-y)^{(n+1)(1-p)-1} dy
               = I_{i/n}{p(n+1), (1-p)(n+1)} - I_{(i-1)/n}{p(n+1), (1-p)(n+1)},

and I_{i/n}(a,b) is the incomplete beta function.  An estimator of the
q-th quantile can be defined similarly as

    Q̂_q = Σ_{i=1}^{n} ^q w_{n,i} X_(i).

This readily allows construction of the L-COST interquantile difference
estimator as follows.

Definition 3.2:
The L-COST interquantile difference estimator is defined to be

    ξ̂_(q-p)(L-COST) = Σ_{i=1}^{n} ^d w_{n,i} X_(i)                  (3.1)

where

    ^d w_{n,i} = ^q w_{n,i} - ^p w_{n,i}.                           (3.2)

Thus, the estimator for an interquantile difference is again a linear
combination of order statistics, with coefficients consisting of a
function of incomplete beta functions as indicated in (3.2).
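Definition 3.2 can be sketched directly from the incomplete beta representation of the weights, using SciPy's `beta.cdf` for the regularized incomplete beta function (the function names are illustrative, not from the original):

```python
import numpy as np
from scipy.stats import beta

def lcost_weights(n, p):
    """w_{n,i}(p): Beta((n+1)p, (n+1)(1-p)) mass on ((i-1)/n, i/n]."""
    b = beta((n + 1) * p, (n + 1) * (1 - p))
    return np.diff(b.cdf(np.arange(n + 1) / n))

def lcost_iq_difference(x, p, q):
    """(3.1)-(3.2): dot the difference of the q- and p-weights with
    the order statistics."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = len(xs)
    dw = lcost_weights(n, q) - lcost_weights(n, p)   # (3.2)
    return float(np.dot(dw, xs))                     # (3.1)

rng = np.random.default_rng(1983)
iqr_hat = lcost_iq_difference(rng.normal(size=51), 0.25, 0.75)
```

For q > p the q-weights put more mass on the larger order statistics, so the estimate is positive for any non-degenerate sample.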
3.2.2 Theoretical Framework for Convergence to Normality

In order to establish the use of the normal distribution for a
pivotal quantity, it is important to demonstrate convergence to
normality of the estimator presented in (3.1).  The framework in which
the results will be demonstrated is that of L-estimators as discussed
in Parr and Schucany (1982), Cheng (1982), and Sen (1982).

3.2.2.1 L-estimators and the L-COST Estimator

Consider

    L_n = n^{-1} Σ_{i=1}^{n} J(t_{n,i}) g(X_(i)),                    (3.3)

a general form for an L-estimator, which is a linear combination of
order statistics.  Let J(t_{n,i}) be a score or weight function whose
argument depends on both n and i, and g be some suitably bounded
function of the order statistics.  In this case, g(X_(i)) will be
X_(i), the order statistic itself.

As discussed in Parr and Schucany (1982), and Sen (1982), (3.3)
is asymptotically equivalent to a form

    U_n = Σ_{i=1}^{n} c_{i,n} X_(i)                                  (3.4)

where

    c_{i,n} = ∫_{(i-1)/n}^{i/n} J(u) du,

and thus U_n can be considered equally well in asymptotic contexts.
In fact, Sen (1982) specifically makes the equivalence

    J(t_{n,i}) = J(i/n) = J(u)     for (i-1)/n < u ≤ i/n

due to this asymptotic equivalence.

The L-COST estimator for interquantile differences can be
expressed in the form of (3.4).  To do so, consider

    J(u) = J_1(u) - J_2(u)                                           (3.5)

where

    J_1(u) = u^{(n+1)q-1} (1-u)^{(n+1)(1-q)-1} / B((n+1)q, (n+1)(1-q))

and

    J_2(u) = u^{(n+1)p-1} (1-u)^{(n+1)(1-p)-1} / B((n+1)p, (n+1)(1-p)).

To simplify the subsequent expressions, (3.5) will be written as

    J(u) = C_1 u^{a_1-1} (1-u)^{a_2-1} - C_2 u^{b_1-1} (1-u)^{b_2-1}  (3.6)

where

    C_1 = [B((n+1)q, (n+1)(1-q))]^{-1}     a_1 = (n+1)q     a_2 = (n+1)(1-q)
    C_2 = [B((n+1)p, (n+1)(1-p))]^{-1}     b_1 = (n+1)p     b_2 = (n+1)(1-p).
3.2.2.2 Establishing Conditions for Convergence

Sen (1982) and Serfling (1980) provide several conditions needed to
establish convergence to normality of a pivotal quantity based on the
estimator.  Also, the estimator of variance, proposed in section
3.2.3, will be shown to converge to the true asymptotic variance.
Each of the conditions needed will be established for the L-COST
interquantile difference situation prior to presenting the main
theorems.  The first five conditions which follow are required by
Sen (1982) and the remainder by Serfling (1980).

Condition (i):
If b(u) = g(F^{-1}(u)), 0 < u < 1, then for ε ∈ (0,½), b(u) is of
bounded variation on (ε, 1-ε).

Condition (ii):
If b(u) = g(F^{-1}(u)), 0 < u < 1, then |b(u)| ≤ K{u(1-u)}^{-a} for
some positive, finite K, real a, and all u ∈ (0,1).

Condition (iii):
J(u) has continuous first-order derivatives, {J'(u); 0 < u < 1},
almost everywhere.

Condition (iv):
|J(u)| ≤ K{u(1-u)}^{-b} and |J'(u)| ≤ K{u(1-u)}^{-b-1} for all
0 < u < 1, real b, and some finite, positive K.

Condition (v):
a + b = ½ - δ, for some δ > 0, where a and b are defined in
condition (ii) and condition (iv).

Condition (vi):
If T_n = Σ_{i=1}^{n} J(t_{n,i}) X_(i), then
n max_{1≤i≤n} |t_{n,i} - i/n| = O(1).

Condition (vii):
For some d > 0, …

Condition (viii):
If E|X|^r < ∞ for some r, then for δ > 0,

    |J(u)| ≤ M{u(1-u)}^{-½ + 1/r + δ},   0 < u < 1.
Conditions (i) through (viii) will now be shown to be satisfied by
the L-COST interquantile difference estimator.

Condition (i):
As g(X_(i)) = X_(i), g(F^{-1}(u)) = F^{-1}(u).  On (0,1),
-∞ < F^{-1}(u) < ∞.  By Rudin (1976, p. 128),

    ∫_0^{1-ε} F^{-1}(u) du < ∞,

and thus b(u) is of bounded variation on (0, 1-ε).

Condition (ii):
Since b(u) = F^{-1}(u), write K_1 = F^{-1}(u) u^a (1-u)^a.  Then, on
u ∈ (0,1),

    |b(u)| ≤ K{u(1-u)}^{-a}

for some K_1 ≤ K < ∞.

Condition (iii):
From (3.6),

    J(u) = C_1 u^{a_1-1} (1-u)^{a_2-1} - C_2 u^{b_1-1} (1-u)^{b_2-1}.

Thus

    J'(u) = C_1[(a_1-1) u^{a_1-2} (1-u)^{a_2-1}
              - (a_2-1) u^{a_1-1} (1-u)^{a_2-2}]
          - C_2[(b_1-1) u^{b_1-2} (1-u)^{b_2-1}
              - (b_2-1) u^{b_1-1} (1-u)^{b_2-2}].

It is obvious that the first derivative exists everywhere for
u ∈ (0,1).  By Rudin (1976, p. 104), J'(u) is continuous on (0,1).
Condition (iv):
From (3.6), J(u) can be written as:

    J(u) = {C_1 u^{a_1-1+b} (1-u)^{a_2-1+b}
             - C_2 u^{b_1-1+b} (1-u)^{b_2-1+b}} {u(1-u)}^{-b}
         = K_2 {u(1-u)}^{-b}.

Also, J'(u) can be written as:

    J'(u) = {C_1[(a_1-1) u^{a_1+b-1} (1-u)^{a_2+b}
               - (a_2-1) u^{a_1+b} (1-u)^{a_2+b-1}]
            - C_2[(b_1-1) u^{b_1+b-1} (1-u)^{b_2+b}
               - (b_2-1) u^{b_1+b} (1-u)^{b_2+b-1}]} {u(1-u)}^{-b-1}
          = K_3 {u(1-u)}^{-b-1}.

For each of these expressions, every term within the sum or product
of terms is easily shown to be finite.  Thus, it follows that K_2 and
K_3 are each finite.  Consider K = max{K_1, K_2, K_3}, and conditions
(ii) and (iv) are simultaneously satisfied by one K.
Condition (v):
Proper choice of a and b will still preserve the required finiteness
of K and meet this condition.

Condition (vi):
As T_n is asymptotically equivalent to U_n, it suffices to show that if

    U_n = Σ_{i=1}^{n} (∫_{(i-1)/n}^{i/n} J(u) du) X_(i),

then n max_{1≤i≤n} |u - i/n| = O(1) for (i-1)/n < u ≤ i/n.  Since

    max_{1≤i≤n} |u - i/n| ≤ max_{1≤i≤n} |i/n - (i-1)/n| ≤ 1/n,

it follows that

    n max_{1≤i≤n} |u - i/n| = O(1).

Condition (vii):
Consider u ∈ ((i-1)/n, i/n].  It is obvious that this relation will
hold for arbitrary n and i ≤ n by suitably selecting d.

Condition (viii):
This condition is required by Serfling (1980) and is more specific
than condition (iv) required by Sen (1982), but is shown by the same
techniques.
3.2.3 Convergence Theorems for the L-COST Estimator of Interquantile
Difference

Having established all the needed conditions, the following
theorems regarding convergence for the L-COST estimator of
interquantile difference may be established.

THEOREM 3.1.
Consider

    Z_1 = n^{1/2} [ξ̂_(q-p)(L-COST) - (ξ_q - ξ_p)] / σ_L

as a pivotal quantity for the L-COST interquantile difference
estimator, where ξ̂_(q-p)(L-COST) converges to ξ_q - ξ_p, and
J(u) = J_1(u) - J_2(u) as defined in (3.5).  Let

    σ²_L = ∫_0^1 ∫_0^1 [min(s,t) - st] J(s) J(t) dF^{-1}(s) dF^{-1}(t)  (3.7)

be the asymptotic variance.  Assume {X_i} are independent and
identically distributed for any cdf, F.  Assume E|X|^r < ∞ for some
r > 0.  Then, since conditions (iii), (vi), (vii), and (viii) are
satisfied,

    Z_1 →_d N(0,1)   as n → ∞.

Proof:  From a result in Serfling (1980, p. 277), since the required
conditions are satisfied, the result has been established for the
interquantile difference estimator.  ∎
To define an appropriate estimator for the variance, σ²_L, consider
using the jackknife framework.

Definition 3.3:
Let

    D_j = Σ_{i≠j} ^d w_{n-1, i-I(i>j)} X_(i)
          (removing the j-th order statistic)

where

    ^q w_{n-1, i-I(i>j)}
        = I_{i/(n-1)}{qn, (1-q)n} - I_{(i-1)/(n-1)}{qn, (1-q)n}
              if i < j                                               (3.8)
        = I_{(i-1)/(n-1)}{qn, (1-q)n} - I_{(i-2)/(n-1)}{qn, (1-q)n}
              if i > j, i ≤ n,

and ^p w_{n-1, i-I(i>j)} is similarly defined, replacing q by p
throughout in (3.8).  Then

    S*²_n(L-COST) = v̂ar(ξ̂_(q-p)(L-COST))
                  = ((n-1)/n) Σ_{j=1}^{n} (D_j - D̄)²                 (3.9)

where

    D̄ = Σ_{j=1}^{n} D_j / n.

This can be shown by applying (4) of Harrell and Davis (1982).  ∎
To prove the next theorem, consider the following jackknife
notation based on the L-statistic, U_n.  Let

    U_n = Σ_{i=1}^{n} c_{i,n} X_(i),
    where c_{i,n} = ∫_{(i-1)/n}^{i/n} J(u) du                        (3.10)

(as in (3.4) and (3.5)).  Then

    U_{n,i} = n U_n - (n-1) U^{(i)}_{n-1}                            (3.11)

where U^{(i)}_{n-1} = U_n based on n-1 sample observations, removing
the i-th order statistic, and

    U*_n = n^{-1} Σ_{i=1}^{n} U_{n,i}.                               (3.12)
THEOREM 3.2.
Let S*²_n(L-COST) be the jackknife variance estimator of
ξ̂_(q-p)(L-COST) as defined in (3.9), and σ²_L be the asymptotic
variance as shown in (3.7).  Then, under satisfaction of conditions
(i) through (v), S*²_n(L-COST) → σ²_L almost surely.

Proof:  Let

    S²_n = (n-1)^{-1} Σ_{i=1}^{n} (U_{n,i} - U_n)²                   (3.13)
         = S*²_n(L-COST) + n(n-1)^{-1} (U*_n - U_n)².                (3.14)

By (3.10) through (3.14) and satisfaction of conditions (i) through
(v), Sen (1982) shows U*_n - U_n → 0 almost surely as n → ∞.  Thus,

    S²_n - S*²_n(L-COST) → 0 almost surely as n → ∞.

As shown in Sen (1982), S²_n - σ²_L → 0 almost surely as n → ∞.
Thus, under conditions (i) through (v) with the asymptotic
equivalence of T_n and U_n, S*²_n(L-COST) - σ²_L → 0 almost surely.  ∎
With these two theorems, the use of the L-COST difference estimator and its jackknife variance estimator are both correct based on
their convergence to the desired parameters.
3.2.4 Confidence Interval Estimator Based on L-COST Interquantile
Difference Estimator

Establishment of Theorems 3.1 and 3.2 allows the use of the
L-COST estimator and the jackknife variance estimator in a pivotal
quantity for the construction of a (1-α)×100% confidence interval for
the interquantile difference.

Again, the small sample distribution of the pivotal quantity is
not clearly determined, so the asymptotic normality results will be
applied.  This allows the confidence interval to be defined in the
following way.

Definition 3.4:
A (1-α)×100% confidence interval for the interquantile difference
based on the L-COST interquantile difference estimator is

    ξ̂_(q-p)(L-COST) ± Φ^{-1}(1-α/2) S*_n(L-COST)

where S*²_n(L-COST) is the jackknife variance estimator defined in
(3.9).  ∎
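This interval can be sketched as follows.  Deleting the j-th order statistic and recomputing the L-COST weights on the remaining n-1 points reproduces the (3.8) weights, since (n-1)+1 = n; SciPy is assumed and the function names are illustrative:

```python
import numpy as np
from scipy.stats import beta, norm

def lcost_diff(xs, p, q):
    """L-COST interquantile difference on already-sorted data xs."""
    n = len(xs)
    grid = np.arange(n + 1) / n
    wq = np.diff(beta((n + 1) * q, (n + 1) * (1 - q)).cdf(grid))
    wp = np.diff(beta((n + 1) * p, (n + 1) * (1 - p)).cdf(grid))
    return float(np.dot(wq - wp, xs))

def jackknife_ci(x, p, q, alpha=0.05):
    """Estimate +/- Phi^{-1}(1 - alpha/2) times the jackknife SE, with
    the D_j of Definition 3.3 obtained by deleting one order statistic
    at a time (3.9)."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = len(xs)
    est = lcost_diff(xs, p, q)
    d = np.array([lcost_diff(np.delete(xs, j), p, q) for j in range(n)])
    var = (n - 1) / n * np.sum((d - d.mean()) ** 2)   # (3.9)
    half = norm.ppf(1 - alpha / 2) * np.sqrt(var)
    return est - half, est + half

rng = np.random.default_rng(42)
lo, hi = jackknife_ci(rng.normal(size=51), 0.25, 0.75)
```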
81
3.3 Theory for the Kaigh and Lachenbruch (1982) Estimator of an
Interguanti1e Difference
3.3.1
The K-L Interguanti1e Difference Estimator
Recall that
where k is the subsamp1e size and r
=
[(k+1)p].
the q-th quantile is defined in a similar manner.
The estimator of
This allows con-
struction of an interquanti1e difference estimator as follows:
Definition 3.5:
The K-L Interquantile Difference Estimator is defined to be:

    ξ̂_(q-p)(K-L) = ξ̂_q(K-L) - ξ̂_p(K-L),

where
    k_1 = subsample size for the q-th quantile estimator
    k_2 = subsample size for the p-th quantile estimator
    r_1 = [(k_1+1)q]
    r_2 = [(k_2+1)p].  ∎
Thus, this estimator is also a linear combination of order statistics.
It should be noted that k_1 and k_2 are equal only when the two subsample sizes can equally well be used in the estimation of their intended quantiles.  As discussed in Chapter II, selection of k may be difficult and is somewhat arbitrary, requiring much care.
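The K-L estimator admits a closed form: averaging the r-th order statistic over all C(n,k) subsamples of size k places a hypergeometric-type weight C(j-1, r-1)C(n-j, k-r)/C(n,k) on X_(j). The sketch below is illustrative (function names are hypothetical) and takes r = [(k+1)p] as in the definition above.

```python
from math import comb

def kl_quantile(x, k, p):
    """Kaigh-Lachenbruch estimate of the p-th quantile: the average of the
    r-th order statistic over all size-k subsamples, r = [(k+1)p]."""
    xs = sorted(x)
    n = len(xs)
    r = int((k + 1) * p)
    # weight on X_(j): number of subsamples in which X_(j) is the r-th smallest
    return sum(comb(j - 1, r - 1) * comb(n - j, k - r) * xs[j - 1]
               for j in range(r, n - k + r + 1)) / comb(n, k)

def kl_interq_diff(x, p, q, k1, k2):
    """K-L interquantile difference estimator of Definition 3.5."""
    return kl_quantile(x, k1, q) - kl_quantile(x, k2, p)
```

With k = n the estimator reduces to the single order statistic X_(r); for example, kl_quantile(list(range(1, 8)), 7, 0.5) is 4.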
3.3.2 Convergence Theorems for the K-L Estimator of Interquantile
Difference
The asymptotic normality of a pivotal quantity for the K-L interquantile difference estimator is established in the following theorem.
THEOREM 3.3.
Let

    m_{r_i:k_i}(u) = u^(r_i - 1) (1-u)^(k_i - r_i) / B(r_i, k_i - r_i + 1),   0 < u < 1,

    m_{dr:k}(u) = m_{r_1:k_1}(u) - m_{r_2:k_2}(u),

and

    ξ_{r_i:k_i}(F) = ∫₀¹ F⁻¹(u) m_{r_i:k_i}(u) du

for i = 1, 2, corresponding to q and p.  Suppose ∫ |x|² dF(x) < ∞.  Then

    Z₂ = n^(1/2) [ξ̂_(q-p)(K-L) - (ξ_{r_1:k_1}(F) - ξ_{r_2:k_2}(F))] / σ_d(F)

is a pivotal quantity, where σ_d²(F) is defined by (3.15), and Z₂ converges to the standard normal distribution as n → ∞.
Proof:
Extending directly from Theorem 2.1 of Kaigh (1982), and replacing m_{r:k}(u) by m_{dr:k}(u), yields the result.  ∎
To define the variance estimator for the Kaigh and Lachenbruch
estimator, an extension of the jackknife variance estimator discussed
in Chapter II will be considered.
Definition 3.6:
Let θ̂_n⁰ = ξ̂_(q-p)(K-L), and let θ̂_{n-1}^i denote the same estimator computed from the sample with the i-th observation removed, where r_1, r_2, k_1, k_2 are defined in Definition 3.5.  Let

    θ̂_i = n θ̂_n⁰ - (n-1) θ̂_{n-1}^i,   i = 1, ..., n,

and

    θ̄* = Σ_{i=1}^n θ̂_i / n.

Then the jackknife variance estimator of ξ̂_(q-p)(K-L) is

    S_n*²(K-L) = Σ_{i=1}^n (θ̂_i - θ̄*)² / (n(n-1)).                    (3.16)

From (2.8),

    S_n*²(K-L) = ((n-1)/n) Σ_{i=1}^n (θ̂_{n-1}^i - θ̄_{n-1})²          (3.17)

where θ̄_{n-1} = Σ_{i=1}^n θ̂_{n-1}^i / n.  ∎
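The equivalence of the two forms (3.16) and (3.17) is a standard jackknife identity: each pseudo-value deviation is just -(n-1) times the corresponding leave-one-out deviation. A small illustrative check (helper name hypothetical):

```python
def jackknife_var_two_ways(theta_full, loo):
    """Compute the jackknife variance estimator both from the pseudo-value
    form (3.16) and from the leave-one-out form (3.17); the two agree."""
    n = len(loo)
    pseudo = [n * theta_full - (n - 1) * t for t in loo]
    pbar = sum(pseudo) / n
    v_316 = sum((ps - pbar) ** 2 for ps in pseudo) / (n * (n - 1))
    lbar = sum(loo) / n
    v_317 = (n - 1) / n * sum((t - lbar) ** 2 for t in loo)
    return v_316, v_317
```

For any full-sample estimate and list of leave-one-out estimates, the two returned values coincide up to rounding.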
The next theorem establishes the almost sure convergence of the
jackknife variance estimator for the K-L interquantile difference
estimator to the asymptotic variance.
THEOREM 3.4.
Let ∫ |x|^(2+ε) dF(x) < ∞ for some ε > 0.  Then, for fixed p ∈ (0,1) and k, as n → ∞,

    S_n*²(K-L) - σ_d²(F) → 0

almost surely, where σ_d²(F) is defined by (3.15).
Proof:
Since the estimator for the difference between two quantiles
is the difference of two U-statistics, it is also a U-statistic.
From Theorem 3.1 of Kaigh (1982), established by Theorem 6 of
Arvesen (1969), the result follows.  ∎
Remark 1 of Theorem 3.1 in Kaigh (1982) indicates that the
existence of the variance of the r-th order statistic in a random
sample of size k from F is sufficient for this theorem to hold.
This condition is easily met for most distributions.  For a Cauchy distribution, the smallest and largest order statistics do not have moments
(David, 1981, p. 34).  Choosing k so as to eliminate the extreme
order statistics will permit the needed conditions to be met for that
distribution.
3.3.3 Confidence Interval Estimator for the K-L Interquantile
Difference Estimator
Theorems 3.3 and 3.4 having been established allows the K-L
estimator and its jackknife variance estimator to be used in the
pivotal quantity for construction of a (1-α)×100% confidence interval for the interquantile difference.  This confidence interval is
defined as follows.
Definition 3.7:
A (1-α)×100% confidence interval for the interquantile difference based on the K-L estimator for the interquantile difference is defined as

    ξ̂_(q-p)(K-L) ± t_{n-max(k_1,k_2), 1-α/2} S_n*(K-L).

The number of degrees of freedom for the t-statistic, n - max(k_1, k_2), is chosen to be the smaller of the degrees of freedom for the t-statistics
used in the confidence interval for each individual quantile's estimator.  ∎
CHAPTER IV
A COMPARISON OF POINT AND CONFIDENCE INTERVAL
ESTIMATORS FOR INTERQUANTILE DIFFERENCES
4.1 Introduction
In the previous chapter, point and interval estimators of inter-
quantile distance were defined, and distributional properties developed.
It is of interest to investigate numerically the relative per-
formance of these estimators.
Two estimators, based on the L-COST and the Kaigh and Lachenbruch
(K-L) methods, will be evaluated as point estimators based on their
Mean Squared Error (MSE) and relative bias.
Their relative efficiency
(RE), as compared with an interquantile difference estimator based on
the difference of sample quantiles, will be computed.
Only inter-
decile and interquartile difference estimators will be considered
because of their easy interpretation and application.
Confidence
intervals for interquantile difference estimators will be constructed
as described in Chapter III, and their ability to preserve
confidence
and expected length will be evaluated using methods described in
Chapter II.
These intervals will also be discussed relative to
bounds which were developed by Chu (1957).
4.2 Point Estimators for the Interquantile Difference
The L-COST of Harrell and Davis (1982) and the K-L estimator
of Kaigh and Lachenbruch (1982) have been developed into estimators
for the interquantile difference.
Definition 3.2 in Chapter III
specifies the form of the L-COST Interquantile Difference Estimator,
denoted ξ̂_(q-p)(L-COST).  The Kaigh and Lachenbruch (K-L) Interquantile Difference Estimator, denoted ξ̂_(q-p)(K-L), was presented in
Definition 3.5, also in the previous chapter.
The interquantile
difference estimator based on sample quantiles (SQ method) is defined as follows.
Definition 4.1:
The sample quantile interquantile difference estimator is defined to be

    ξ̂_(q-p)(SQ) = [(1-a_1) X_(r_1) + a_1 X_(r_1+1)] - [(1-a_2) X_(r_2) + a_2 X_(r_2+1)],

where
    r_1 = [q(n+1)]
    r_2 = [p(n+1)]
    a_1 = q(n+1) - r_1
    a_2 = p(n+1) - r_2.  ∎
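Definition 4.1 interpolates between adjacent order statistics at the positions p(n+1) and q(n+1). A minimal sketch (hypothetical names), with guards for positions outside 1..n:

```python
def sq_quantile(x, p):
    """Sample quantile: interpolate between X_(r) and X_(r+1) at p(n+1)."""
    xs = sorted(x)
    n = len(xs)
    g = p * (n + 1)
    r = int(g)          # r = [p(n+1)]
    a = g - r           # fractional weight on the next order statistic
    if r < 1:           # p(n+1) below the smallest index
        return xs[0]
    if r >= n:          # p(n+1) at or beyond the largest index
        return xs[-1]
    return (1 - a) * xs[r - 1] + a * xs[r]

def sq_interq_diff(x, p, q):
    """SQ interquantile difference estimator of Definition 4.1."""
    return sq_quantile(x, q) - sq_quantile(x, p)
```

For the sample 1, 2, ..., 51, the positions .25(52) = 13 and .75(52) = 39 are integers, so the estimate of the interquartile difference is simply 39 - 13 = 26.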
All three of these estimators will be compared as point estimators.
4.3 Evaluation of Point Estimators
4.3.1 Methodology for Comparisons
The L-COST, K-L, and SQ interquantile difference estimators were
all evaluated with respect to relative bias and MSE.
In order to evaluate the estimators, 500 simulations were performed for different combinations of sample size and underlying distribution.
The interquartile differences were estimated for samples
of size 31 and 51 for the uniform, normal, exponential, and lognormal
distributions.
The interquartile distance for the Cauchy distribution
was only estimated from a sample of size 51 because far more than 500
simulations would be required to obtain an adequate estimate in this
case.
The interdecile difference for the uniform, normal, exponential,
and lognormal distributions was also evaluated for a sample size of 51.
Values of K1 and K2 for the K-L estimator as defined above were
chosen to be the same as those selected for the analyses in Chapter
II.  Depending on the K1 and K2 values selected, the results will vary.
As each underlying distribution has a known value for ξ_q - ξ_p,
the interquantile difference of interest, the relative bias for each
estimator was approximated as follows:

    Relative bias = bias / (ξ_q - ξ_p),

where

    bias = (1/S₀) Σ_s ξ̂_(q-p),s(Method) - (ξ_q - ξ_p),

the sum is over the simulated samples s, S₀ is the number of simulations (500) performed, and Method is
either SQ, L-COST, or K-L.
The MSE of each estimator was also estimated from simulations.
Within each simulation, a value of (ξ̂_(q-p)(Method) - (ξ_q - ξ_p))² was
obtained.  These squared errors were averaged over the number of
simulations performed to obtain an estimate of MSE.
To obtain relative efficiencies, the MSE of the sample quantile
(SQ) method was divided by the MSE of the L-COST and K-L methods.
This then gives a measure of the performance of each estimator's MSE
relative to a standard, i.e., the SQ method.
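The evaluation loop above can be sketched as follows. This is illustrative only, not the dissertation's actual code: it draws Uniform(0,1) samples, uses a simple interpolated sample-quantile interquartile range as the estimator under study (names hypothetical), and returns relative bias and MSE; relative efficiency is then the ratio of two such MSEs.

```python
import random

def sim_metrics(estimator, true_value, n, sims=500, seed=1):
    """Monte Carlo relative bias and MSE of `estimator` on Uniform(0,1)
    samples of size n; true_value is the target interquantile difference."""
    rng = random.Random(seed)
    errors = []
    for _ in range(sims):
        x = [rng.random() for _ in range(n)]
        errors.append(estimator(x) - true_value)
    bias = sum(errors) / sims
    mse = sum(e * e for e in errors) / sims
    return bias / true_value, mse

def sq_iqr(x):
    """Interpolated sample-quantile interquartile range."""
    xs = sorted(x)
    n = len(xs)
    def q(p):
        g = p * (n + 1)
        r = int(g)
        a = g - r
        return (1 - a) * xs[r - 1] + a * xs[min(r, n - 1)]
    return q(0.75) - q(0.25)

# true IQR of Uniform(0,1) is .75 - .25 = 0.5;
# relative efficiency of a method = MSE(reference) / MSE(method)
rel_bias, mse = sim_metrics(sq_iqr, 0.5, 31)
```

Replacing sq_iqr with an L-COST or K-L interquantile difference estimator and dividing the reference MSE by the method's MSE reproduces the relative efficiencies reported in Table 4.2.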
4.3.2 Results of Comparisons
Estimates of relative bias are presented in Table 4.1.
The
amount of relative bias present in the estimators varied noticeably
depending on which distribution the data were sampled from, but also
varied somewhat between estimators and by changing sample size.
For the uniform distribution, relative bias was small compared
to that for other distributions.  The sample quantiles method was most
nearly unbiased.  Negative bias was present when using the L-COST
method, as well as when any method was used for sample size 31 estimation of the interquartile range.
Relative bias patterns were similar for the normal and exponential distributions.
For both of these distributions, the L-COST
method always produced estimates with lowest relative bias, followed
by the Sample Quantiles method, and the K-L method with consistently
higher
relative bias.
Increasing the sample size reduced the rela-
tive bias under the SQ and L-COST methods, but increased it under
the K-L method.
For the Cauchy distribution, the SQ method produced the lowest
relative bias, followed by the L-COST and K-L methods.
The same
almost holds true under the lognormal distribution, except that the
K-L and L-COST methods had approximately the same relative bias when
estimating the interquartile difference from a sample of size 31, and
the interdecile difference from a sample of size 51.
With both of
these distributions, all three methods resulted in moderately large
bias, however.
Finally, there is some tendency for the interdecile
difference estimates to have slightly higher relative bias than the
interquartile difference estimates.
The MSEs for the same cases were computed.
As a summary measure
of performance, the relative efficiencies of both the L-COST and K-L
methods for estimating interquantile differences were computed using
the SQ method as a reference.
These values are found in Table 4.2.
Neither the L-COST nor the K-L estimator was as efficient as
the SQ method when the Cauchy distribution is sampled.
In every
other case, no matter what distribution or which interquantile difference, the L-COST estimator was more efficient than the SQ method.
Except for the interdecile range computed for the uniform distribution,
the L-COST estimator had higher relative efficiency than did the
particular K-L estimator under consideration.
While it is possible
that changing values of K1 and K2 could improve estimation by the
K-L method, preliminary work indicates that different choices of these
parameters vary the results, but not in a consistent fashion across
distributions.  Overall, as formulated for this chapter, the K-L
estimator is approximately as efficient as the sample quantile method,
whereas the L-COST method is generally more efficient.
4.4 Evaluation of Confidence Intervals
Three types of confidence intervals for the interquantile difference estimator will be considered.
One is based on the modified
version of the L-COST estimator and is presented in Definition 3.4.
Another, shown as Definition 3.7, is based on the K-L estimator as
modified in Chapter III.  The third is based on the bounds of Chu
(1957) as presented in section 1.2.6 of Chapter I.
For purposes of this chapter, only symmetric interquantile
ranges, for which q = 1-p, are being considered.  In this case, Chu
describes a simplification of his general bounds as follows:

    P(X_(s) - X_(r) ≥ ξ_q - ξ_p) ≥ 1 - α/2

if r is chosen such that B_n(r-1, p) ≤ α/4 and s = n-r+1.  Also,

    P(X_(v) - X_(u) ≤ ξ_q - ξ_p) ≥ 1 - α/2

if u is chosen such that B_n(u-1, p) ≥ 1 - α/4, where v = n-u+1.  Selecting r and u in this manner will assure that

    P(X_(v) - X_(u) ≤ ξ_q - ξ_p ≤ X_(s) - X_(r)) ≥ 1 - α.

This probability is very conservative because of the method by
which Chu (1957) determined these bounds.  He used the relation

    P(X_(s) - X_(r) ≥ ξ_q - ξ_p) ≥ P(X_(r) ≤ ξ_p, X_(s) ≥ ξ_q)

to obtain his result.  The probability on the left-hand side actually
equals the probability on the right-hand side added to the probabilities of other combinations of events for which X_(s) - X_(r) exceeds ξ_q - ξ_p.  Similar simplifications apply
to the upper bound with X_(v) - X_(u).  Thus, the confidence interval
can be of much greater confidence than is initially represented,
and hence be much wider than an order statistic bound of this type
could be constructed to be.
Following closely Chu's (1957) recommendations, the values
of r, s, u, v selected for bounds which meet at least (1-α)×100% confidence in the two-sided interval are found by examining the binomial
distribution and are presented in Table 4.3.
In the case of the interdecile range, the actual X(r) and X(s)
chosen lead to slightly shorter intervals than Chu would require.
They are, however, the extreme order statistics, and are the best
possible approximation.
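The search for r and u can be carried out directly on the binomial distribution function. The sketch below (hypothetical names, pure-Python binomial CDF) reproduces the 95% indices of Table 4.3; when no index satisfies the lower-tail condition, as happens for the interdecile case, it falls back to the extreme order statistic, as done above.

```python
from math import comb

def binom_cdf(j, n, p):
    """B_n(j, p) = P(X <= j) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(j + 1))

def chu_indices(n, p, alpha):
    """(r, s, u, v) for Chu's bounds on xi_q - xi_p with q = 1 - p.
    r: largest index with B_n(r-1, p) <= alpha/4 (falls back to r = 1);
    u: smallest index with B_n(u-1, p) >= 1 - alpha/4."""
    r = 1
    while binom_cdf(r, n, p) <= alpha / 4:
        r += 1
    s = n - r + 1
    u = 1
    while binom_cdf(u - 1, n, p) < 1 - alpha / 4:
        u += 1
    v = n - u + 1
    return r, s, u, v
```

For example, chu_indices(31, 0.25, 0.05) returns (3, 29, 14, 18), matching the first row of Table 4.3.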
Just as in Chapter II, the confidences were evaluated using
the simulated interval endpoints.
The equation (2.1) was used to
compute the proportion of times that the true interquantile difference is within the observed bounds.
This applied for all three
methods considered.
The average lengths of the intervals are also obtained in a
manner similar to that in Chapter II.  For the L-COST interquantile
distance estimator, the expected length of the interval is

    2 Φ⁻¹(1-α/2) E(S_n*(L-COST)),

where S_n*²(L-COST) is defined in (3.9).  The K-L estimator has an
interval with average length

    2 t_{n-max(k_1,k_2), 1-α/2} E(S_n*(K-L)),

where S_n*²(K-L) is defined in (3.16).  Finally, for the interval bounds
presented by Chu (1957), the expected length is

    E((X_(s) - X_(r)) - (X_(v) - X_(u))) = E(X_(s) + X_(u) - X_(r) - X_(v)).

For all three intervals, the expectations are estimated through the
simulations, which are conducted as discussed in section 2.5 of
Chapter II.
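Under a Uniform(0,1) parent, the last expectation has a closed form, since E(X_(j)) = j/(n+1); this gives a quick analytic check on the Chu column of the uniform-distribution table below (helper name hypothetical):

```python
def chu_expected_length_uniform(n, r, s, u, v):
    """E[(X_(s) - X_(r)) - (X_(v) - X_(u))] for a Uniform(0,1) sample of
    size n, using E[X_(j)] = j/(n+1)."""
    return ((s - r) - (v - u)) / (n + 1)
```

With the 95%, n = 31 interquartile indices of Table 4.3, chu_expected_length_uniform(31, 3, 29, 14, 18) is 0.6875, in line with the simulated value .69 reported in Table 4.4.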
4.5 Results from Simulated Confidence Intervals
The expected lengths and observed confidences for all three
methods appear in Tables 4.4 through 4.8.
The most obvious feature
from the tables is that Chu's (1957) bounds are indeed overly wide
and exceed their intended confidence.  This was expected, as described
previously.  Thus, it appears that these bounds on interval length
can be considered generous upper bounds on the true interquantile
difference, and should be improved upon by other methods.
In fact,
even 99% intervals from the K-L or L-COST methods would always preserve at least 95% confidence and yet be shorter than Chu's 95%
intervals.
Regardless of interquantile difference or underlying
distribution, both the K-L and L-COST methods achieve interval
lengths much lower than those of Chu, but they do this in nearly all
cases by reducing the actual level of confidence below that intended.
For the uniform distribution, both the L-COST and K-L methods
are close to performing acceptably for the 95% confidence intervals,
but both estimators have somewhat lower confidence than acceptable
for the 99% intervals.
The K-L method always equals or exceeds the
confidence of the L-COST method, but K-L intervals are often longer
as well.
Results are somewhat different for the normal distribution.
In
this instance, the K-L method always provides an interval of acceptable confidence, whereas the L-COST method is only within acceptable
tolerances when using the larger sample size.
For equal confidence,
the K-L method provides a longer interval, but it provides an
acceptable, longer interval when L-COST cannot attain the approximate
confidence desired.
For the Cauchy distribution, either estimator provides an interval of acceptable confidence.
Again, the K-L intervals are longer
than those from the L-COST method.
Entirely different results are obtained when forming intervals
for quantile differences from exponential and lognormal distributions.
Neither the L-COST nor the K-L method was acceptable for attaining
the desired confidence, regardless of sample size or interquantile
difference of interest.
A minor exception is that the L-COST method
did achieve an observed .94 confidence for the interquartile range
from an exponential sample of size 51.
Otherwise, when sampling from
these asymmetric distributions with the moderate to large sample
sizes used for this work, it appeared that neither method was of
acceptable confidence to use for estimation of interquantile ranges.
4.6 Conclusion and Summary
Based upon the results presented in section 4.3, the L-COST
estimator for interquantile differences is of great value as a point
estimator.
Its bias is generally within a reasonable amount of that
of the SQ method, and is actually less biased for two distributions.
It also had lower MSE than either the SQ or K-L method, with the exception of estimation from a Cauchy-distributed sample.
Thus, assuming adequate sample sizes and reasonable interquantile ranges, the
L-COST estimator for interquantile differences is the preferable
method, in general, for use as a point estimator.
This is consistent
with the finding of Harrell and Davis (1982), which was that their
L-COST estimator for a single quantile had generally higher efficiency
than the sample quantile method for sample sizes between 20 and 60.
For construction of confidence intervals, not one of the three
methods in their present form performed satisfactorily overall, and
thus none can be recommended for general use.
The Chu (1957) method
overstated the bounds required, and thus led to overly wide bounds
which were too conservative.
The L-COST method provided shorter
bounds, but too often, its confidence was below the acceptable thresholds as defined in section 2.5 of Chapter II.
For symmetric distri-
butions, the K-L method achieved acceptable confidences in virtually
every case simulated.
Thus, it is acceptable to use in the instances
in which the underlying distribution is considered to have some type
of symmetric shape.  This assumes, of course, adequate sample sizes
as well.
It should be kept in mind that results looked less promising for interdecile ranges than for interquartile ranges.
Further work is needed to find an acceptable confidence interval estimator
for general, unspecified distributions.
TABLE 4.1
RELATIVE BIAS OF PROPOSED ESTIMATORS

Distribution   N    K1*  K2*  ξq - ξp**   Sample Quantiles   L-COST    K-L
Uniform        31   23   23   Q           -.0058             -.0366    -.0054
               51   39   29   Q           -.0082             -.0138    -.0392
               51   39   29   D            .0019             -.0173     .0018
Normal         31   23   23   Q            .0359              .0272     .0460
               51   39   29   Q            .0267              .0213     .0790
               51   39   29   D            .0377              .0234     .0456
Cauchy         51   39   29   Q            .0641              .1157     .1744
Exponential    31   23   23   Q            .0359              .0330     .0482
               51   39   29   Q            .0353              .0320     .0590
               51   39   29   D            .0508              .0502     .0628
Lognormal      31   23   23   Q            .0740              .0950     .0943
               51   39   29   Q            .0637              .0688     .0884
               51   39   29   D            .0800              .1034     .1041

*Kaigh and Lachenbruch method only.
**Q = ξ.75 - ξ.25; D = ξ.90 - ξ.10.
TABLE 4.2
RELATIVE EFFICIENCY† OF PROPOSED ESTIMATORS VS SAMPLE QUANTILE METHOD

Distribution   N    K1*  K2*  ξq - ξp**   L-COST   Kaigh and Lachenbruch
Uniform        31   23   23   Q           1.375    1.209
               51   39   29   Q           1.293    1.084
               51   39   29   D           1.186    1.216
Normal         31   23   23   Q           1.329    1.128
               51   39   29   Q           1.367     .958
               51   39   29   D           1.188     .916
Cauchy         51   39   29   Q            .944     .685
Exponential    31   23   23   Q           1.339    1.158
               51   39   29   Q           1.218    1.047
               51   39   29   D           1.225    1.044
Lognormal      31   23   23   Q           1.147    1.085
               51   39   29   Q           1.197    1.051
               51   39   29   D           1.063     .976

†Rel. Eff. = MSE(SQ) / MSE(Method).
*Kaigh and Lachenbruch method only.
**Q = ξ.75 - ξ.25; D = ξ.90 - ξ.10.
TABLE 4.3
INDICES FOR ORDER STATISTICS SELECTED FOR FORMATION OF
CONFIDENCE INTERVALS DESCRIBED BY CHU (1957)

Intended                        Indices of Order Statistics
Confidence   N    ξq - ξp*      r**   s    u    v
95%          31   Q             3     29   14   18
             51   Q             6     46   21   31
             51   D             1     51   11   41
99%          31   Q             2     30   16   16
             51   Q             5     47   23   29
             51   D             1     51   13   39

*Q = ξ.75 - ξ.25; D = ξ.90 - ξ.10.
**r as in X_(r).
TABLE 4.4
EXPECTED LENGTHS OF CONFIDENCE INTERVALS (AND OBSERVED CONFIDENCE) COMPUTED FOR INTERQUANTILE
DIFFERENCE FROM UNIFORM DISTRIBUTION

Intended
Confidence   N    K1*  K2*  ξq - ξp**   Chu        L-COST      K-L
95%          31   23   23   Q           .69 (1)    .29 (.92)   .39 (.94)
             51   39   29   Q           .58 (1)    .24 (.94)   .29 (.94)
             51   39   29   D           .38 (1)    .19 (.93)   .22 (.94)
99%          31   23   23   Q           .84 (1)    .39 (.97)   .57 (.98)
             51   39   29   Q           .69 (1)    .32 (.98)   .41 (.99)
             51   39   29   D           .46 (1)    .26 (.97)   .31 (.97)

*Kaigh and Lachenbruch method only.
**Q = ξ.75 - ξ.25; D = ξ.90 - ξ.10.
TABLE 4.5
EXPECTED LENGTHS OF CONFIDENCE INTERVALS (AND OBSERVED CONFIDENCE) COMPUTED FOR INTERQUANTILE
DIFFERENCE FROM NORMAL DISTRIBUTION

Intended
Confidence   N    K1*  K2*  ξq - ξp**   Chu         L-COST       K-L
95%          31   23   23   Q           2.45 (1)    .98 (.92)    1.29 (.94)
             51   39   29   Q           1.96 (1)    .79 (.95)     .94 (.95)
             51   39   29   D           2.85 (1)    1.13 (.92)   1.37 (.93)
99%          31   23   23   Q           3.01 (1)    1.28 (.97)   1.88 (.99)
             51   39   29   Q           2.39 (1)    1.03 (.98)   1.32 (.98)
             51   39   29   D           3.11 (1)    1.49 (.96)   1.92 (.98)

*Kaigh and Lachenbruch method only.
**Q = ξ.75 - ξ.25; D = ξ.90 - ξ.10.
TABLE 4.6
EXPECTED LENGTHS OF CONFIDENCE INTERVALS (AND OBSERVED CONFIDENCE) COMPUTED FOR INTERQUANTILE
DIFFERENCE FROM CAUCHY DISTRIBUTION

Intended
Confidence   N    K1*  K2*  ξq - ξp**   Chu          L-COST       K-L
95%          51   39   29   Q           5.47 (1)     1.89 (.94)   2.30 (.95)
99%          51   39   29   Q           7.45 (.99)   2.49 (.99)   3.22 (1)

*Kaigh and Lachenbruch method only.
**Q = ξ.75 - ξ.25.
TABLE 4.7
EXPECTED LENGTHS OF CONFIDENCE INTERVALS (AND OBSERVED CONFIDENCE) COMPUTED FOR INTERQUANTILE
DIFFERENCE FROM EXPONENTIAL DISTRIBUTION

Intended
Confidence   N    K1*  K2*  ξq - ξp**   Chu          L-COST       K-L
95%          31   23   23   Q           2.12 (.99)    .99 (.90)   1.26 (.91)
             51   39   29   Q           1.73 (1)      .82 (.94)    .96 (.92)
             51   39   29   D           3.17 (.99)   1.52 (.92)   1.80 (.90)
99%          31   23   23   Q           2.88 (1)     1.31 (.95)   1.83 (.96)
             51   39   29   Q           2.11 (1)     1.08 (.97)   1.35 (.97)
             51   39   29   D           3.39 (.99)   2.00 (.97)   2.52 (.95)

*Kaigh and Lachenbruch method only.
**Q = ξ.75 - ξ.25; D = ξ.90 - ξ.10.
TABLE 4.8
EXPECTED LENGTHS OF CONFIDENCE INTERVALS (AND OBSERVED CONFIDENCE) COMPUTED FOR INTERQUANTILE
DIFFERENCE FROM LOGNORMAL DISTRIBUTION

Intended
Confidence   N    K1*  K2*  ξq - ξp**   Chu          L-COST       K-L
95%          31   23   23   Q           3.62 (1)     1.71 (.88)   2.11 (.90)
             51   39   29   Q           2.70 (1)     1.32 (.92)   1.53 (.92)
             51   39   29   D           8.53 (.99)   3.27 (.91)   3.84 (.88)
99%          31   23   23   Q           5.13 (1)     2.25 (.95)   3.07 (.96)
             51   39   29   Q           3.37 (1)     1.74 (.96)   2.14 (.96)
             51   39   29   D           8.86 (.99)   4.30 (.95)   5.38 (.94)

*Kaigh and Lachenbruch method only.
**Q = ξ.75 - ξ.25; D = ξ.90 - ξ.10.
CHAPTER V
EXAMPLE OF QUANTILE ESTIMATION METHODS
5.1 Introduction
In the preceding chapters, comparisons of various estimation
methods have been performed using data simulated from known underlying distributions.
These simulations provided a basis for assess-
ing a constructed confidence interval's expected length and ability
to achieve desired confidence for the particular combination of
sample size and function of quantiles.
While the simulated inter-
vals are of use for evaluating the merits of the various intervals,
an example is provided in this chapter to illustrate the variation
among the different point and confidence interval estimates obtained
from using the different methods on real data.
The data for this chapter are taken from the Lipid Research
Clinics Program Prevalence Study (Davis, 1980), which has the goal
of improved understanding of heart disease.
The study is cross-
sectional, including data from subjects with widely differing socioeconomic and cultural backgrounds.
For this example, separate
random subsamples of 51 users and 51 nonusers of oral contraceptives
were selected from among the female study participants aged 20 to 29.
The variables included in this illustration are Total Cholesterol
(CHOL), Triglycerides (TRIG), and High-Density Lipoprotein Cholesterol (HDL-C).
The example is not intended to provide a definitive
analysis of the data in the sample, but rather to illustrate the
methods for quantile estimation on a real data set.
5.2 Comparison of Results for the Example
Point estimators and confidence intervals were constructed for
estimators of the median as well as for the interdecile and interquartile difference.
Several different methods were compared for
each class of oral contraceptive use and type of lipid.
Considering the median, Table 5.1 provides evidence that all four
order statistic-based point estimation methods discussed in Chapter II
provide estimates which are quite close to one another.
The greatest
variability occurred among estimates of median TRIG levels for women
who do not use oral contraceptives; even here, the estimates were
within 10% of each other.
All methods indicate that nonusers of oral
contraceptives have noticeably lower median CHOL and TRIG levels than
users, but about the same levels of HDL-C.
Both the 95% and 99% confidence intervals for the median had
similar values among the various methods.
That is, the upper and
lower endpoints for intervals constructed from all four methods
agreed to a large extent.
Once again, the exception was for the
TRIG levels for nonusers, specifically the lower endpoints of the
interval.
These lower limits varied from 42.9 for the bootstrap,
up to 63 for the order statistic method for 95% intervals, and from
35.7 up to 58 for the 99% intervals.
The long-tailed distribution
of triglycerides among nonusers is probably a major factor in this
discrepancy among methods.
Interval lengths for median CHOL levels
were somewhat greater for nonusers than for users of oral contraceptives regardless of estimator.
The same held true for TRIG levels
but with a wider variation in interval lengths among methods, whereas
interval lengths for median HDL-C were virtually independent of method
and class of contraceptive use.
Estimates of interdecile difference were constructed using the
sample quantile, L-COST, and K-L methods as described in Chapter IV.
The results are presented in Table 5.4.
Essentially, results are of
the same nature as for the median in terms of variability among methods.
The methods all yielded virtually identical values across all variables
and contraceptive use classes with the exception of TRIG level interdecile differences for nonusers.
The range of variation, in this
case, was smaller than for medians, however.
Confidence intervals for the interdecile difference were constructed using Chu's method, the L-COST method, and the K-L method
for interquantile differences, which were also discussed in Chapter
IV.
The most striking feature of both Tables 5.5 and 5.6 is that
the interval endpoints obtained using Chu's method differ widely
from those of the L-COST and K-L methods, and result in a much longer
interval than either of these more recent competitors.
For instance,
the length of the 95% confidence interval for the interdecile range
for CHOL level among users of oral contraceptives is 115 when obtained
by the Chu method, 41.7 by
L-COST,
and 38.7 by the K-L method.
Mainly,
this difference is due to Chu's method of constructing very conservative
intervals which exceed the desired confidence with high probability.
By contrast, the K-L and L-COST methods yield intervals that are
generally in fairly close agreement (with the exception again being
a slightly wider disparity between measurements of TRIG among nonusers of contraceptives) and much shorter than intervals produced by
Chu's method.
Finally, interquartile differences were computed and are presented in Table 5.7, and confidence intervals computed for interquartile differences are presented in Tables 5.8 and 5.9.
As is
readily apparent from Tables 5.8 and 5.9, the results for the interquartile difference have similar characteristics as those of the
interdecile difference.
A minor difference is the increased discrepancy between estimates obtained from the L-COST method as opposed to
the other two methods of point estimation considered for this difference.
As the discrepancy is most pronounced only for nonusers' levels
of CHOL, it might be attributed to the particular pattern in the data.
5.3 Conclusion
The example in this chapter suggests that, for the particular
data used, different point estimators for medians or for interquartile
and interdecile differences will lead to only slightly varying results
with real data, and only a somewhat more noticeable disagreement when
comparing the TRIG levels for nonusers of oral contraceptives.
This
may be an unusual situation, so the generally slight differences between results may lead one to choose the most precise method from
which to actually report results.
The L-COST method has been
demonstrated in Harrell and Davis (1982) to be of greater small
sample efficiency than the order statistic method for point estimation.
The method for estimating interquantile differences, which
is based on the L-COST method, was also shown (in Chapter IV) to be
more efficient in small to moderate samples than either K-L or
sample quantile methods for interquantile differences.
Thus, since
precision of estimators is an important criterion for selection of
a method to use, the L-COST method should probably be preferred.
For construction of confidence intervals for the median, the
example shows that the methods were in fairly close agreement, with
the exception of TRIG levels among nonusers, as was noted.
Thus,
from the results of this example, the methods are likely to give close
results.
The intervals for interquantile differences constructed using
Chu's method were much broader than those constructed from either the
K-L or L-COST method.
If one needed to be very conservative, one
could use Chu's bounds; otherwise the example illustrates how small
the differences really are between the other methods.
Thus, while there is no clear choice, the methods of K-L and
L-COST appear useful in examples such as the ones presented in this
chapter.
TABLE 5.1
ESTIMATES OF MEDIANS, LIPID DATA, SAMPLE SIZE 51,
BY USERS AND NONUSERS OF ORAL CONTRACEPTIVES

Lipid   Oral Contraceptive Use   Sample Quantile   L-COST   K-L (K=19)   Bootstrap
CHOL    USER                     205               204.8    204.8        205
        NONUSER                  178               177.4    177.5        178
TRIG    USER                     106               106.9    107.1        106
        NONUSER                  66                71.0     72.4         66
HDL-C   USER                     51                51.2     51.4         51
        NONUSER                  52                51.8     51.7         52
TABLE 5.2
LIMITS FOR 95% CONFIDENCE INTERVALS FOR MEDIAN OF LIPID DATA,
SAMPLE SIZE 51, USERS AND NONUSERS OF ORAL CONTRACEPTIVES

Lipid   Oral Contraceptive Use   Limit*   Order Statistic**   L-COST   K-L (K=19)   Bootstrap
CHOL    USER                     L        193                 193.9    193.6        193.1
                                 U        217                 215.7    215.9        216.9
        NONUSER                  L        160                 166.1    164.9        165.1
                                 U        193                 188.7    190.2        190.9
TRIG    USER                     L        100                 98.0     98.0         96.0
                                 U        117                 115.9    116.2        115.9
        NONUSER                  L        63                  54.5     56.3         42.9
                                 U        95                  87.5     88.5         89.1
HDL-C   USER                     L        47                  47.1     46.8         46.1
                                 U        59                  55.3     56.0         55.9
        NONUSER                  L        47                  47.3     47.5         47.1
                                 U        55                  56.2     55.9         56.9

*L = Lower limit; U = Upper limit.
**L = X_(19); U = X_(33).
TABLE 5.3
LIMITS FOR 99% CONFIDENCE INTERVALS FOR MEDIAN OF LIPID DATA,
SAMPLE SIZE 51, USERS AND NONUSERS OF ORAL CONTRACEPTIVES

Lipid   Oral Contraceptive Use   Limit*   Order Statistic**   L-COST   K-L (K=19)   Bootstrap
CHOL    USER                     L        184                 190.6    189.7        189.3
                                 U        220                 219.1    219.8        220.7
        NONUSER                  L        158                 162.6    160.5        161.1
                                 U        200                 192.3    194.6        194.9
TRIG    USER                     L        94                  95.2     94.8         92.9
                                 U        118                 118.7    119.3        119.1
        NONUSER                  L        58                  49.4     50.8         35.7
                                 U        96                  92.7     94.1         96.3
HDL-C   USER                     L        44                  45.8     45.2         44.5
                                 U        60                  56.6     57.6         57.5
        NONUSER                  L        46                  45.9     46.0         45.5
                                 U        57                  57.6     57.4         58.5

*L = Lower limit; U = Upper limit.
**L = X_(16); U = X_(35).
TABLE 5.4
ESTIMATES OF INTERDECILE DIFFERENCE, LIPID DATA, SAMPLE
SIZE 51, USERS AND NONUSERS OF ORAL CONTRACEPTIVES

Lipid   Oral Contraceptive Use   Sample Quantile   L-COST   K-L (K1=39; K2=29)
CHOL    USER                     89.4              85.0     86.1
        NONUSER                  95.4              96.5     96.5
TRIG    USER                     94.0              91.5     91.4
        NONUSER                  142.8             133.7    141.1
HDL-C   USER                     42.6              41.0     41.9
        NONUSER                  30.6              30.5     31.5
TABLE 5.5
LIMITS FOR 95% CONFIDENCE INTERVALS ON INTERDECILE RANGE, LIPID
DATA, SAMPLE SIZE 51, USERS AND NONUSERS OF ORAL CONTRACEPTIVES

Lipid   Oral Contraceptive Use   Limit*   Chu**   L-COST   K-L (K1=39; K2=29)
CHOL    USER                     L        51      64.2     66.7
                                 U        166     105.9    105.4
        NONUSER                  L        75      82.8     81.1
                                 U        155     110.1    111.9
TRIG    USER                     L        59      72.3     73.4
                                 U        283     110.7    109.5
        NONUSER                  L        66      81.9     77.5
                                 U        219     185.4    204.7
HDL-C   USER                     L        24      33.2     34.6
                                 U        75      48.7     49.2
        NONUSER                  L        19      23.8     23.6
                                 U        51      37.3     39.4

*L = Lower limit; U = Upper limit.
**L = X_(v) - X_(u); U = X_(s) - X_(r).  See Table 4.3 for r, s, u, v.
TABLE 5.6
LIMITS FOR 99% CONFIDENCE INTERVALS ON INTERDECILE RANGE, LIPID
DATA, SAMPLE SIZE 51, USERS AND NONUSERS OF ORAL CONTRACEPTIVES

Lipid   Oral Contraceptive Use   Limit*   Chu**   L-COST   K-L (K1=39; K2=29)
CHOL    USER                     L        45      57.7     58.9
                                 U        166     112.4    113.2
        NONUSER                  L        68      78.5     75.0
                                 U        155     114.4    118.1
TRIG    USER                     L        49      66.2     66.2
                                 U        283     116.8    116.7
        NONUSER                  L        61      65.7     52.0
                                 U        219     201.7    230.3
HDL-C   USER                     L        20      30.8     31.6
                                 U        70      51.1     52.2
        NONUSER                  L        16      21.7     20.4
                                 U        51      39.4     42.6

*L = Lower limit; U = Upper limit.
**L = X_(v) - X_(u); U = X_(s) - X_(r).  See Table 4.3 for r, s, u, v.
TABLE 5.7
ESTIMATES OF INTERQUARTILE DIFFERENCE, LIPID DATA, SAMPLE
SIZE 51, USERS AND NONUSERS OF ORAL CONTRACEPTIVES

Lipid   Oral Contraceptive Use   Sample Quantile   L-COST   K-L (K1=39; K2=29)
CHOL    USER                     45                44.1     46.1
        NONUSER                  68                63.7     68.6
TRIG    USER                     49                46.6     48.4
        NONUSER                  61                56.7     58.9
HDL-C   USER                     20                20.7     21.7
        NONUSER                  16                15.4     16.2
TABLE 5.8
LIMITS FOR 95% CONFIDENCE INTERVALS ON INTERQUARTILE RANGE, LIPID
DATA, SAMPLE SIZE 51, USERS AND NONUSERS OF ORAL CONTRACEPTIVES

Lipid   Oral Contraceptive   Limit*   Chu**            L-COST   K-L
        Use                           (L=X(v)-X(u))             (K1=39; K2=29)
                                      (U=X(s)-X(r))
CHOL    USER                 L          19               34.3     35.7
                             U          83               53.9     56.5
        NONUSER              L          13               46.3     48.4
                             U          89               81.1     88.8
TRIG    USER                 L          15               24.9     20.0
                             U          86               68.3     76.9
        NONUSER              L          30               39.1     36.3
                             U         134               74.4     81.5
HDL-C   USER                 L           7               14.7     14.5
                             U          41               26.7     28.9
        NONUSER              L           7               11.2     11.2
                             U          29               19.7     21.2

*L = Lower limit; U = Upper limit.
**See Table 4.3 for r, s, u, v.
TABLE 5.9
LIMITS FOR 99% CONFIDENCE INTERVALS ON INTERQUARTILE RANGE, LIPID
DATA, SAMPLE SIZE 51, USERS AND NONUSERS OF ORAL CONTRACEPTIVES

Lipid   Oral Contraceptive   Limit*   Chu**            L-COST   K-L
        Use                           (L=X(v)-X(u))             (K1=39; K2=29)
                                      (U=X(s)-X(r))
CHOL    USER                 L           7               31.3     31.5
                             U          88               56.9     60.7
        NONUSER              L           9               40.9     40.3
                             U          97               86.6     96.9
TRIG    USER                 L           8               18.1      8.6
                             U          96               75.1     88.3
        NONUSER              L          18               33.5     27.2
                             U         145               79.9     90.6
HDL-C   USER                 L           2               12.9     11.6
                             U          44               28.5     31.8
        NONUSER              L           4                9.8      9.2
                             U          31               21.0     23.2

*L = Lower limit; U = Upper limit.
**See Table 4.3 for r, s, u, v.
CHAPTER VI
SUMMARY AND SUGGESTIONS FOR FURTHER RESEARCH
6.1 Summary
The research in this dissertation was motivated by the need
for further understanding of the properties of proposed estimators
for quantiles.
Until this work, the L-COST and K-L estimators had
only been evaluated in a limited way for use in confidence intervals.
Their use had not been explored for estimating important
functions of quantiles, such as the interquantile range, a useful
measure of dispersion.
In Chapter I, the existing literature is reviewed, with an
emphasis on formation of confidence intervals for quantiles.
Both
parametric and nonparametric methods are considered, and estimators
for quantile intervals and quantile differences are discussed.
Chapter II discusses formation of confidence intervals based
upon six potentially useful estimators for single quantiles.
The
methods used to determine average interval length and ability to
preserve confidence are detailed for both simulated and exact intervals.
Simulations were performed to compare the Bootstrap, L-COST,
K-L, and order statistics methods under five distributions for the
data.
Results showed that the L-COST interval would need an ordinate
other than one from the normal distribution to perform consistently
well in small samples.
The K-L method, as described, performs
reasonably well for median estimation.
Finally, the method based
on order statistics produced longer intervals, both when simulated
and (when possible) when calculated explicitly.
It did maintain the
desired confidence quite well, however.
Theoretical developments needed to establish large-sample use
of the normal distribution for estimators of interquantile difference were presented in Chapter III.
It was shown that both the K-L
and L-COST methods could be used to form pivotal quantities with
asymptotic normal distributions, and thus readily lend themselves to
use in confidence intervals.
Chapter IV first compared the interquantile difference estimators
based on L-COST and K-L methods.
As a point estimator, the L-COST
method performed very well under most distributions.
Confidence inter-
vals based on these methods were also constructed and compared with
the order statistic method of Chu (1957).
Chu's method was found to
yield very conservative, long intervals, and neither the K-L nor the
L-COST method consistently provided intervals meeting the desired
confidence with the particular ordinates selected.
The K-L method
was useful for intervals based on symmetric underlying distributions.
Finally, in Chapter V, an example using data from the Lipid
Research Clinics Program was constructed to provide a simple illustration of how the methods discussed in both Chapters II and IV
compare when applied to a real data set.
Generally, differences
were slight; Chu's method for intervals of interquantile differences
led to vastly different intervals, however.  In some cases, characteristics of the different variables analyzed led to more noticeable
variations among estimates.
6.2 Suggestions for Further Research
The research in the preceding chapters attempted to address the
question of how well the L-COST and K-L estimators would perform
when used for confidence interval construction, estimation of simple
functions of quantiles, and estimation of confidence intervals for
functions of quantiles.
Further work on the following areas would
provide additional information in evaluating the methods described.
i) Determine the distribution of (L-COST - ξ_p) in small samples.
Determine the accuracy of an approximation by the normal distribution.
If an appropriate distribution can be shown to be a t-distribution,
determine the method for finding the degrees of freedom.  Otherwise,
empirically find those factors which closely yield the desired
confidence for intervals formed.

ii) Develop a randomized procedure such that either the observed
confidence level or interval length could be held constant so as to
permit uniform evaluation of the different methods.

iii) Develop an alternative variance estimator to one based on the
Jackknife, and compare results obtained with those in this
dissertation.

iv) Further investigate methods for selecting k for the K-L method.

v) Develop tighter bounds, based on Chu's (1957) method, which would
better account for the probabilities previously not considered.
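The empirical route suggested in (i) can be sketched by Monte Carlo.  The following Python fragment is an illustrative sketch only, not part of the dissertation's programs: the choice of the N(0,1) median, the sample size, and the number of replications are arbitrary assumptions for the illustration, and scipy's beta.cdf stands in for the incomplete beta function used by the L-COST (Harrell-Davis) weights.

```python
import numpy as np
from scipy.stats import beta

def hd_quantile(x, p):
    # Harrell-Davis (L-COST) estimate: increments of the
    # Beta(p(n+1), (1-p)(n+1)) cdf as weights on the order statistics
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    a, b = p * (n + 1), (1 - p) * (n + 1)
    i = np.arange(1, n + 1)
    w = beta.cdf(i / n, a, b) - beta.cdf((i - 1) / n, a, b)
    return w @ x

rng = np.random.default_rng(1)
n, p, xi_p = 11, 0.5, 0.0   # true N(0,1) median is 0
errors = np.array([hd_quantile(rng.standard_normal(n), p) - xi_p
                   for _ in range(2000)])
# The empirical quantiles of `errors` can now be compared with those of
# a fitted normal (or t) distribution to judge the approximation.
```

The collected errors give the small-sample distribution of L-COST − ξ_p directly, so candidate reference distributions and scale factors can be checked against it.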
BIBLIOGRAPHY
ALI, MIR MASOOM, UMBACH, DALE, and HASSANEIN, KHATAB M. (1981),
"Estimation of Quantiles of Exponential and Double Exponential
Distributions Based on Two Order Statistics", Communications
in Statistics, A10, 1921-1932.

ANGUS, J.E. and SCHAFER, R.E. (1979), "Estimation of Logistic Quantiles with Minimum Error in the Predicted Distribution Function",
Communications in Statistics, A8, 1271-1284.

ARVESEN, JAMES N. (1969), "Jackknifing U-Statistics", Annals of
Mathematical Statistics, 40, 2076-2100.

AZZALINI, A. (1981), "A Note on the Estimation of a Distribution
Function and Quantiles by a Kernel Method", Biometrika, 68,
326-328.

BAUER, DAVID F. (1972), "Constructing Confidence Sets Using Rank
Statistics", Journal of the American Statistical Association, 67,
687-690.

BROOKMEYER, RON and CROWLEY, JOHN (1982), "A Confidence Interval for
the Median Survival Time", Biometrics, 38, 29-41.

CHENG, KUANG-FU (1982), "Jackknifing L-estimates", Canadian Journal
of Statistics, 10, 49-58.

CHU, J.T. (1957), "Some Uses of Quasi-Ranges", Annals of Mathematical
Statistics, 28, 173-180.

DAVID, HERBERT A. (1981), Order Statistics, Second Edition, New York:
John Wiley.

DAVIS, C.E., et al. (1980), "Correlations of Plasma High-Density
Lipoprotein Cholesterol Levels with Other Plasma Lipid and Lipoprotein Concentrations", Circulation, Part II, 62: IV-24 - IV-30.

DESU, MAHAMUNULU M. and RODINE, R.H. (1969), "Estimation of the
Population Median", Skandinavisk Aktuarie Tidskrift, 67-70.

DYER, DANNY D. and KEATING, JEROME P. (1979), "A Further Look at the
Comparison of Normal Percentile Estimation", Communications in
Statistics, A8, 1-16.

DYER, D.D., KEATING, J.P., and HENSLEY, O.L. (1977), "Comparison of
Point Estimators of Normal Percentiles", Communications in
Statistics, B6, 269-283.

EFRON, B. (1979), "Bootstrap Methods: Another Look at the Jackknife",
Annals of Statistics, 7, 1-26.
EFRON, BRADLEY (1981), "Censored Data and the Bootstrap", Journal
of the American Statistical Association, 76, 312-319.

EKBLOM, H. (1973), "A Note on Nonlinear Median Estimators", Journal
of the American Statistical Association, 68, 431-432.

EMERSON, JOHN D. (1982), "Nonparametric Confidence Intervals for
the Median in the Presence of Right Censoring", Biometrics, 38,
17-27.

GIBBONS, J.D. (1971), Nonparametric Statistical Inference, New York:
McGraw-Hill.

GREEN, J.R. (1969), "Inference Concerning Probabilities and Quantiles",
Journal of the Royal Statistical Society, Ser. B, 31, 310-316.

GREENBERG, B.G. and SARHAN, A.E. (1962), "Exponential Distribution:
Best Linear Unbiased Estimates", in Contributions to Order
Statistics, eds. A.E. Sarhan and B.G. Greenberg, New York:
John Wiley, 352-360.

GUILBAUD, OLIVIER (1979), "Interval Estimation of the Median of a
General Distribution", Scandinavian Journal of Statistics, 6,
29-36.

HARRELL, FRANK E. JR. and DAVIS, C.E. (1982), "A New Distribution-Free
Quantile Estimator", Biometrika, 69, 635-640.

HARTER, LEON (1961), "Expected Values of Normal Order Statistics",
Biometrika, 48, 151-165.

HARTIGAN, J.A. (1969), "Using Subsample Values as Typical Values",
Journal of the American Statistical Association, 64, 1303-1317.

HARVARD UNIVERSITY COMPUTATION LABORATORY (1955), Tables of the
Cumulative Binomial Probability Distribution, Cambridge: Harvard
University Press.

HOGG, ROBERT V. and CRAIG, ALLEN T. (1978), Introduction to Mathematical Statistics, Fourth Edition, New York: Macmillan.

JENNETT, W.J. and WELCH, B.L. (1939), "The Control of Proportion
Defective as Judged by a Single Quality Characteristic Varying
on a Continuous Scale", Journal of the Royal Statistical Society,
Supplement, 6, 80-88.

KAIGH, W.D. (1982), "Quantile Interval Estimation", unpublished manuscript.

KAIGH, W.D. and LACHENBRUCH, PETER A. (1982), "A Generalized Quantile
Estimator", Communications in Statistics, A11, 2217-2238.
KREWSKI, DANIEL (1976), "Distribution-Free Confidence Intervals for
Quantile Intervals", Journal of the American Statistical
Association, 71, 420-422.

KUBAT, PETER and EPSTEIN, BENJAMIN (1980), "Estimation of Quantiles
of Location-Scale Distributions Based on Two or Three Order
Statistics", Technometrics, 22, 575-581.

LANKE, JAN (1974), "Interval Estimation of a Median", Scandinavian
Journal of Statistics, 1, 28-32.

LAWLESS, J.F. (1975), "Construction of Tolerance Bounds for the
Extreme-Value and Weibull Distributions", Technometrics, 17,
255-261.

LEVER, W.E. (1969), "Note: Confidence Limits for Quantiles of Mortality Distributions", Biometrics, 25, 176-178.

MANN, NANCY R. and FERTIG, KENNETH W. (1975), "Simplified Efficient
Point and Interval Estimators for Weibull Parameters",
Technometrics, 17, 361-368.

MANN, NANCY R. and FERTIG, KENNETH (1977), "Efficient Unbiased Quantile Estimators for Moderate-Size Complete Samples from Extreme-Value and Weibull Distributions; Confidence Bounds and Tolerance
and Prediction Intervals", Technometrics, 19, 87-93.

MARITZ, J.S. and JARRETT, R.G. (1978), "A Note on Estimating the
Variance of the Sample Median", Journal of the American Statistical
Association, 73, 194-196.

MOOD, ALEXANDER M., GRAYBILL, FRANKLIN A., and BOES, DUANE C. (1974),
Introduction to the Theory of Statistics, Third Edition, New
York: McGraw-Hill.

MOSES, LINCOLN E. (1965), "Queries: Confidence Limits from Rank Tests",
Technometrics, 7, 257-260.

NAIR, K.R. (1940), "Tables of Confidence Intervals for the Median
in Samples from any Continuous Population", Sankhya, 4, 551-558.

NOETHER, GOTTFRIED E. (1948), "On Confidence Limits for Quantiles",
Annals of Mathematical Statistics, 19, 416-419.

NOETHER, GOTTFRIED E. (1973), "Some Simple Distribution-Free Confidence Intervals for the Center of a Symmetric Distribution",
Journal of the American Statistical Association, 68, 716-719.

OGAWA, JUNJIRO (1962), "Distribution and Moments of Order Statistics",
in Contributions to Order Statistics, eds. A.E. Sarhan and B.G.
Greenberg, New York: John Wiley, 11-19.

OWEN, DON B. (1968), "A Survey of Properties and Applications of the
Noncentral t-Distribution", Technometrics, 10, 445-478.

PARR, WILLIAM C. and SCHUCANY, WILLIAM R. (1982), "Jackknifing L-Statistics with Smooth Weight Functions", Journal of the American
Statistical Association, 77, 629-638.

REID, NANCY (1981), "Estimating the Median Survival Time", Biometrika,
68, 601-608.

REISS, ROLF D. and RUSCHENDORF, LUDGER (1976), "On Wilks' Distribution-Free Confidence Intervals for Quantile Intervals", Journal of the
American Statistical Association, 71, 940-944.

ROBERTSON, C.A. (1977), "Estimation of Quantiles of Exponential Distributions with Minimum Error in Predicted Distribution Functions",
Journal of the American Statistical Association, 72, 162-164.

RUDIN, WALTER (1976), Principles of Mathematical Analysis, New York:
McGraw-Hill.

RUKHIN, ANDREW L. and STRAWDERMAN, WILLIAM E. (1982), "Estimating
a Quantile of an Exponential Distribution", Journal of the
American Statistical Association, 77, 159-162.

SARHAN, A.E. (1954), "Estimation of the Mean and Standard Deviation
by Order Statistics", Annals of Mathematical Statistics, 25,
317-328.

SARHAN, A.E. and GREENBERG, B.G. (1962), "Other Distributions:
Rectangular Distribution", in Contributions to Order Statistics,
eds. A.E. Sarhan and B.G. Greenberg, New York: John Wiley,
383-390.

SAS INSTITUTE (1979), SAS User's Guide, 1979 Edition, Raleigh: SAS
Institute.

SATHE, Y.S. and LINGRAS, S.R. (1981), "Bounds for the Confidence
Coefficients of Outer and Inner Confidence Intervals for Quantile Intervals", Journal of the American Statistical Association,
76, 473-475.

SAVUR, S.R. (1937), "The Use of the Median in Tests of Significance",
Proceedings of the Indian Academy of Science, Section A, 5,
564-576.

SCHAFER, R.E. and ANGUS, J.E. (1979), "Estimation of Weibull Quantiles
with Minimum Error in the Distribution Function", Technometrics,
21, 367-370.

SCHEFFE, HENRY (1943), "Statistical Inference in the Nonparametric
Case", Annals of Mathematical Statistics, 14, 305-332.
SCHEFFE, H. and TUKEY, J.W. (1945), "Nonparametric Estimation. I.
Validation of Order Statistics", Annals of Mathematical
Statistics, 16, 187-192.

SCHMEISER, BRUCE W. (1975), "On Monte Carlo Distribution Sampling,
with Application to the Component Randomization Test", Ph.D.
Dissertation, Georgia Institute of Technology, Atlanta.

SEDRANSK, J. and MEYER, J. (1978), "Confidence Intervals for the
Quantiles of a Finite Population: Simple Random and Stratified
Simple Random Sampling", Journal of the Royal Statistical
Society, Ser. B, 40, 239-252.

SEN, P.K. (1982), "Jackknifing L-Estimators: Affine Structure and
Asymptotics", Institute of Statistics Mimeo Series No. 1415,
The University of North Carolina, Chapel Hill, North Carolina.

SERFLING, ROBERT J. (1980), Approximation Theorems of Mathematical
Statistics, New York: John Wiley.

STIGLER, S.M. (1969), "Linear Functions of Order Statistics", Annals
of Mathematical Statistics, 40, 770-788.

THOMPSON, WILLIAM R. (1936), "On Confidence Ranges and Other Expectation Distributions for Populations of Unknown Distribution
Form", Annals of Mathematical Statistics, 7, 122-128.

TUKEY, JOHN W. (1958), "Bias and Confidence in Not-Quite Large
Samples (Abstract)", Annals of Mathematical Statistics, 29, 614.

UMBACH, DALE, ALI, MIR MASOOM, and HASSANEIN, KHATAB M. (1981),
"Estimating Pareto Quantiles Using Two Order Statistics",
Communications in Statistics, A10, 1933-1941.

WALSH, JOHN E. (1958), "Efficient Small Sample Nonparametric Median
Tests with Bounded Significance Levels", Annals of the Institute
of Statistical Mathematics, Tokyo, 9, 185-199.

WEISS, LIONEL (1960), "Confidence Intervals of Preassigned Length
for Quantiles of Unimodal Populations", Naval Research Logistics
Quarterly, 7, 251-256.

WEISSMAN, ISHAY (1978), "Estimation of Parameters and Large Quantiles Based on the k Largest Observations", Journal of the
American Statistical Association, 73, 812-815.

WILKS, S.S. (1948), "Order Statistics", Bulletin of the American
Mathematical Society, Series 2, 54, 6-50.

WILKS, SAMUEL S. (1962), Mathematical Statistics, New York: John
Wiley.
WILLEMAIN, THOMAS R. (1980), "Estimating the Population Median by
Nomination Sampling", Journal of the American Statistical
Association, 75, 908-911.

ZIDEK, JAMES V. (1971), "Inadmissibility of a Class of Estimators
of a Normal Quantile", Annals of Mathematical Statistics, 42,
1444-1447.
L-COST Quantile Estimation Program
PROC MATRIX;
* L-COST QUANTILE ESTIMATION PROGRAM;
* N IS THE SAMPLE SIZE, P REFERS TO P-TH QUANTILE;
N=11;  * THESE LINES;
P=.5;  * WILL VARY;
*** CONSTANTS FOR THE INCOMPLETE BETA;
A1=P*(N+1);
A2=P*N;
B1=(1-P)*(N+1);
B2=(1-P)*N;
*** INITIALIZE W'S, LAMBDA, AND THE U VECTOR;
W1=J(N,1,0);
W2=J(N,1,0);
W3=J(N,1,0);
U1=J(N,1,0);
LAMBDA=J(N,N,0);
*** FORM W'S AND THE U VECTOR;
DO I=1 TO N;
  W1(I,)=PROBBETA(I#/N,A1,B1)-PROBBETA((I-1)#/N,A1,B1);
  IF (I > 1) THEN
    W2(I,)=PROBBETA((I-1)#/(N-1),A2,B2)-PROBBETA((I-2)#/(N-1),A2,B2);
  ELSE W2(I,)=0;
  IF (I < N) THEN
    W3(I,)=PROBBETA(I#/(N-1),A2,B2)-PROBBETA((I-1)#/(N-1),A2,B2);
  ELSE W3(I,)=0;
  *** FORM THE U VECTOR;
  D=W2(I,);
  E=W3(I,);
  U1(I,)=(I-1)*D + (N-I)*E;
END;  * OF I LOOP;
*** CONSTRUCT LAMBDA FROM VARIOUS W'S;
DO L=1 TO N;
  DO M=1 TO N;
    *** SEPARATE W'S FOR INDEX L AND M;
    FL=W2(L,);
    GL=W3(L,);
    FM=W2(M,);
    GM=W3(M,);
    *** FILL LAMBDA WITH CORRECT COMBINATION OF W'S;
    IF L=M THEN
      LAMBDA(L,M)=(L-1)*(FL**2)+(N-L)*(GL**2);
    IF M > L THEN
      LAMBDA(L,M)=(L-1)*FL*FM + (M-L-1)*GL*FM + (N-M)*GL*GM;
    IF M < L THEN
      LAMBDA(L,M)=LAMBDA(M,L);
  END;  * OF M LOOP;
END;  * OF L LOOP;
* INDATA IS A SAS DATASET CONTAINING;
* ONLY THE VARIABLE OF INTEREST;
FETCH X DATA=INDATA;
A=X;
* OBTAIN THE ORDER STATISTICS;
A4=A;
B4=A4;
Y=RANK(A4);
A4(Y,)=B4;
* QP IS THE ESTIMATOR OF THE P-TH QUANTILE;
QP=W1'*A4;
* FORM THE MEAN OF THE S'S;
S=(1#/N)*U1'*A4;
* FORM THE SUM OF S-SUB-J SQUARED;
V1=A4'*LAMBDA*A4;
* SBAR2=SQUARE(MEAN OF THE S'S);
SBAR2=S##2;
* FORM THE VARIANCE AND STD DEVIATION;
VARQP=(N-1)*((1#/N)*V1 - SBAR2);
SDEVQP=VARQP##.5;
* FORM INTERVAL LENGTHS;
LEN95=2*1.96*SDEVQP;
LEN99=2*2.575*SDEVQP;
* UPPERXX AND LOWERXX ARE;
* UPPER AND LOWER .XX CONFIDENCE LIMITS;
UPPER95=QP + 1.96*SDEVQP;
LOWER95=QP - 1.96*SDEVQP;
UPPER99=QP + 2.575*SDEVQP;
LOWER99=QP - 2.575*SDEVQP;
PRINT N P QP LEN95 LEN99
      UPPER95 LOWER95 UPPER99 LOWER99;
TITLE1 HARRELL AND DAVIS (1982) L-COST ESTIMATOR;
TITLE2 SAMPLE SIZE=11;
TITLE3 QUANTILE: P=.5;
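For readers without PROC MATRIX, the same computation can be sketched in Python.  This is an illustrative translation, not the dissertation's code: instead of the closed-form U/LAMBDA algebra above, it recomputes the Harrell-Davis estimate on each leave-one-out sample, which should agree with the closed form since both express the jackknife variance (scipy's beta.cdf plays the role of PROBBETA).

```python
import numpy as np
from scipy.stats import beta

def hd_quantile(x, p):
    # Harrell-Davis weights: increments of the Beta(p(n+1), (1-p)(n+1)) cdf
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    a, b = p * (n + 1), (1 - p) * (n + 1)
    i = np.arange(1, n + 1)
    w = beta.cdf(i / n, a, b) - beta.cdf((i - 1) / n, a, b)
    return w @ x

def lcost_interval(x, p, z=1.96):
    # normal-approximation interval with a jackknife standard error
    x = np.asarray(x, dtype=float)
    n = len(x)
    qp = hd_quantile(x, p)
    loo = np.array([hd_quantile(np.delete(x, j), p) for j in range(n)])
    var = (n - 1) / n * np.sum((loo - loo.mean()) ** 2)
    se = np.sqrt(var)
    return qp - z * se, qp + z * se
```

The two returned values correspond to LOWER95 and UPPER95 above; substituting z = 2.575 gives the 99% limits.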
K-L Quantile Estimation Program
PROC MATRIX;
* K-L ESTIMATION PROGRAM;
* READ IN N, K, P;
N=51;
P=.5;
K=19;
DIFF1=N-K;
R=INT((K+1)*P);
F1=SQRT(1#/N);
* INITIALIZE THE WEIGHT VECTOR;
WEIGHT=J(N,1,0);
* FILL THE WEIGHT VECTOR WITH PROPER WEIGHTS;
NFACT=GAMMA(N+1);
KFACT=GAMMA(K+1);
NKFACT=GAMMA(N-K+1);
NCHOOSEK=NFACT#/(KFACT*NKFACT);
DO J=R TO N+R-K;
  J1FACT=GAMMA(J);
  R1FACT=GAMMA(R);
  JRFACT=GAMMA(J-R+1);
  J1CHR1=J1FACT#/(R1FACT*JRFACT);
  NJFACT=GAMMA(N-J+1);
  KRFACT=GAMMA(K-R+1);
  NJKRFACT=GAMMA(N-J-K+R+1);
  NJCHKR=NJFACT#/(KRFACT*NJKRFACT);
  WEIGHT(J,)=(J1CHR1*NJCHKR)#/NCHOOSEK;
END;  * OF J LOOP (CONSTRUCTING WEIGHTS);
* INITIALIZE JACKKNIFE WEIGHT VECTOR;
WEIGHT2=J(N-1,1,0);
* FILL THE WEIGHT VECTOR WITH PROPER WEIGHTS;
NFACT2=GAMMA(N);
KFACT2=GAMMA(K+1);
NKFACT2=GAMMA(N-K);
NCHUZK2=NFACT2#/(KFACT2*NKFACT2);
DO J1=R TO N+R-K-1;
  J2FACT=GAMMA(J1);
  R2FACT=GAMMA(R);
  JRFACT2=GAMMA(J1-R+1);
  J2CHR1=J2FACT#/(R2FACT*JRFACT2);
  NJFACT2=GAMMA(N-J1);
  KRFACT2=GAMMA(K-R+1);
  NJKRFAC2=GAMMA(N-J1-K+R);
  NJCHKR2=NJFACT2#/(KRFACT2*NJKRFAC2);
  WEIGHT2(J1,)=(J2CHR1*NJCHKR2)#/NCHUZK2;
END;  * OF J1 LOOP (CONSTRUCTING WEIGHTS WITH ONE DELETION);
* INDATA IS A SAS DATASET CONTAINING ONLY;
* THE VARIABLE OF INTEREST;
FETCH X DATA=INDATA;
A=X;
B=A;
Y=RANK(A);
A(Y,)=B;
* A IS NOW A VECTOR OF ORDER STATISTICS;
* FORM THE K-L ESTIMATOR;
EST1=WEIGHT'*A;
* FORM THE JACKKNIFED ESTIMATES;
JACKEST=J(N,1,0);
DO I=1 TO N;
  IF I=1 THEN SHORTER=A(2:N,);
  IF ((I > 1) AND (I < N))
    THEN SHORTER=A(1:(I-1),)//A((I+1):N,);
  IF I=N THEN SHORTER=A(1:(N-1),);
  JACKEST(I,)=WEIGHT2'*SHORTER;
END;  * OF I LOOP FOR JACKKNIFED ESTIMATES;
MEAN1=SUM(JACKEST)#/N;
VECMEAN1=J(N,1,MEAN1);
DIFF=JACKEST - VECMEAN1;
SDEV1=((DIFF'*DIFF)*(N-1))##.5;
* SELECTED VALUES OF N-K HAVE T-VALUES BELOW;
* ADD OTHER ONES AS NEEDED;
IF DIFF1=8 THEN DO;  T1=2.3060; T2=3.3554; END;
IF DIFF1=12 THEN DO; T1=2.1788; T2=3.0545; END;
IF DIFF1=17 THEN DO; T1=2.1098; T2=2.8982; END;
IF DIFF1=22 THEN DO; T1=2.0739; T2=2.8188; END;
IF DIFF1=28 THEN DO; T1=2.0484; T2=2.7633; END;
IF DIFF1=32 THEN DO; T1=2.0369; T2=2.7385; END;
* FORM LENGTHS OF INTERVALS;
LEN95=2*T1*F1*SDEV1;
LEN99=2*T2*F1*SDEV1;
* UPPERXX AND LOWERXX ARE ENDPOINTS;
* OF THE .XX CONFIDENCE INTERVALS;
UPPER95=EST1 + T1*F1*SDEV1;
LOWER95=EST1 - T1*F1*SDEV1;
UPPER99=EST1 + T2*F1*SDEV1;
LOWER99=EST1 - T2*F1*SDEV1;
PRINT P N K EST1 UPPER95 LOWER95
UPPER99 LOWER99 LEN95 LEN99;
TITLE1 KAIGH AND LACHENBRUCH METHOD;
TITLE2 SAMPLE SIZE 51;
TITLE3 QUANTILE: P=.5;
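The binomial-coefficient weights that the GAMMA-function ratios build above can be written more directly with integer combinatorics.  A minimal Python sketch (illustrative only; math.comb replaces the factorial ratios):

```python
from math import comb

def kl_quantile(x, p, k):
    # Kaigh-Lachenbruch estimator: weighted average of order statistics,
    # with weight C(j-1, r-1) C(n-j, k-r) / C(n, k) on x_(j), r = int((k+1)p)
    xs = sorted(x)
    n = len(xs)
    r = int((k + 1) * p)
    total = sum(comb(j - 1, r - 1) * comb(n - j, k - r) * xs[j - 1]
                for j in range(r, n - k + r + 1))
    return total / comb(n, k)
```

With k = n the weights collapse onto the single order statistic X(r), so the estimator reduces to an ordinary sample quantile; smaller k spreads the weight over neighboring order statistics.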
H-D Interquantile Difference Program
PROC MATRIX;
* L-COST INTERQUANTILE DIFFERENCE PROGRAM;
* INSERT N,P,Q VALUES;
N=51;
P=.10; Q=.90;
* DEFINE CONSTANTS FOR THE INCOMPLETE BETA;
A1=Q*(N+1);
B1=(1-Q)*(N+1);
A2=P*(N+1);
B2=(1-P)*(N+1);
A3=Q*N;
B3=(1-Q)*N;
A4=P*N;
B4=(1-P)*N;
* INITIALIZE WEIGHTS;
W1=J(N,1,0);
W2=J(N,1,0);
W3=J(N,1,0);
D=J(N,N,0);
* CONSTRUCT WEIGHTS FOR THE POINT ESTIMATOR;
DO I=1 TO N;
  W1(I,)=PROBBETA(I#/N,A1,B1)-PROBBETA((I-1)#/N,A1,B1)
         -PROBBETA(I#/N,A2,B2)+PROBBETA((I-1)#/N,A2,B2);
END;  * OF LOOP TO CALCULATE WEIGHTS FOR THE POINT ESTIMATOR;
* CALCULATE WEIGHTS FOR THE JACKKNIFE VARIANCE ESTIMATOR;
DO J=1 TO N;
  IF J > 1 THEN DO;
    DO I=1 TO J-1;
      D(I,J)=PROBBETA(I#/(N-1),A3,B3)
             -PROBBETA((I-1)#/(N-1),A3,B3)
             -PROBBETA(I#/(N-1),A4,B4)
             +PROBBETA((I-1)#/(N-1),A4,B4);
    END;  * OF I LOOP;
  END;  * OF IF-THEN GROUP;
  IF J < N THEN DO;
    DO I=J+1 TO N;
      D(I,J)=PROBBETA((I-1)#/(N-1),A3,B3)
             -PROBBETA((I-2)#/(N-1),A3,B3)
             -PROBBETA((I-1)#/(N-1),A4,B4)
             +PROBBETA((I-2)#/(N-1),A4,B4);
    END;  * OF I LOOP;
  END;  * OF IF-THEN GROUP;
END;  * OF J LOOP;
* SPECIFY T1 AND T2;
T1=1.96;
T2=2.575;
* INITIALIZE VECTORS;
DJ=J(N,1,0);
DBAR=J(N,1,0);
* INDATA IS A SAS DATASET CONTAINING;
* ONLY THE VARIABLE OF INTEREST;
FETCH X DATA=INDATA;
A=X;
BVEC=A;
Y=RANK(A);
A(Y,)=BVEC;
* NOW A CONTAINS THE ORDER STATISTICS;
* FORM THE ESTIMATOR FOR THE SAMPLE;
QP=W1'*A;
* COMPUTE THE N D-SUB-J VALUES;
DO J=1 TO N;
  DJ(J,)=D(,J)'*A;
END;  * OF J LOOP;
* OBTAIN DBAR;
SUMDJ=SUM(DJ);
DBARELEM=SUMDJ#/N;
DBAR=J(N,1,DBARELEM);
* OBTAIN THE VARIANCE AND STD DEV OF THE ESTIMATOR;
VARQP=((N-1)#/N)*((DJ-DBAR)'*(DJ-DBAR));
SDEVQP=VARQP##.5;
* FORM LENGTHS OF INTERVALS;
LEN95=2*T1*SDEVQP;
LEN99=2*T2*SDEVQP;
* UPPERXX AND LOWERXX ARE ENDPOINTS;
* OF THE .XX CONFIDENCE INTERVAL;
UPPER95=QP+T1*SDEVQP;
LOWER95=QP-T1*SDEVQP;
UPPER99=QP+T2*SDEVQP;
LOWER99=QP-T2*SDEVQP;
PRINT N P Q QP LEN95 LEN99
      UPPER95 LOWER95 UPPER99 LOWER99;
TITLE1 L-COST INTERQUANTILE DISTANCE ESTIMATOR;
TITLE2 SAMPLE SIZE=51;
TITLE3 INTERDECILE DIFFERENCE;
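The D-SUB-J computation above is a closed form for the leave-one-out estimates; the same logic can be sketched compactly in Python.  This is an illustrative sketch rather than the dissertation's code: it forms the interquantile difference of two Harrell-Davis estimates and jackknifes it directly, with scipy's beta.cdf standing in for PROBBETA.

```python
import numpy as np
from scipy.stats import beta

def hd_quantile(x, p):
    # Harrell-Davis weights: increments of the Beta(p(n+1), (1-p)(n+1)) cdf
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    a, b = p * (n + 1), (1 - p) * (n + 1)
    i = np.arange(1, n + 1)
    w = beta.cdf(i / n, a, b) - beta.cdf((i - 1) / n, a, b)
    return w @ x

def hd_diff_interval(x, p, q, z=1.96):
    # interquantile difference (q-th minus p-th) with a jackknife SE
    x = np.asarray(x, dtype=float)
    n = len(x)
    est = hd_quantile(x, q) - hd_quantile(x, p)
    loo = np.array([hd_quantile(np.delete(x, j), q)
                    - hd_quantile(np.delete(x, j), p) for j in range(n)])
    se = np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2))
    return est - z * se, est + z * se
```

Calling hd_diff_interval(data, .10, .90) corresponds to the interdecile setting of the program above; z = 2.575 gives the 99% limits.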
K-L Interquantile Difference Program
PROC MATRIX;
* K-L INTERQUANTILE DIFFERENCE PROGRAM;
* READ IN NEEDED PARAMETER VALUES;
* K1, R1 CORRESPOND TO THE Q-TH QUANTILE;
* K2, R2 CORRESPOND TO THE P-TH QUANTILE;
P=.25; Q=.75;
N=51;
* PARAMETER VALUES WILL VARY;
IF N=31 THEN DO; K1=23; K2=23; END;
IF N=51 THEN DO; K1=39; K2=29; END;
DIFF1=N-K1;
DIFF2=N-K2;
IF (DIFF1 > DIFF2) THEN DIFFM=DIFF2;
ELSE DIFFM=DIFF1;
R1=INT((K1+1)*Q);
R2=INT((K2+1)*P);
IF DIFFM=8 THEN DO;  T1=2.3060; T2=3.3554; END;
IF DIFFM=12 THEN DO; T1=2.1788; T2=3.0545; END;
IF DIFFM=17 THEN DO; T1=2.1098; T2=2.8982; END;
IF DIFFM=22 THEN DO; T1=2.0739; T2=2.8188; END;
IF DIFFM=28 THEN DO; T1=2.0484; T2=2.7633; END;
IF DIFFM=32 THEN DO; T1=2.0369; T2=2.7385; END;
* INITIALIZE THE WEIGHT VECTORS;
WEIGHT1Q=J(N,1,0);
WEIGHT1P=J(N,1,0);
WEIGHT2Q=J((N-1),1,0);
WEIGHT2P=J((N-1),1,0);
* COMPUTE THE WEIGHTS FOR Q-TH QUANTILE, FULL SAMPLE;
NFACT1=GAMMA(N+1);
K1FACT1=GAMMA(K1+1);
NK1FACT1=GAMMA(N-K1+1);
NCHUZK11=NFACT1#/(K1FACT1*NK1FACT1);
DO J1=R1 TO N+R1-K1;
  J1FACT1=GAMMA(J1);
  R1FACT1=GAMMA(R1);
  J1R1FAC1=GAMMA(J1-R1+1);
  J1CHR11=J1FACT1#/(R1FACT1*J1R1FAC1);
  NJ1FACT1=GAMMA(N-J1+1);
  K1R1FAC1=GAMMA(K1-R1+1);
  NJKRFAQ1=GAMMA(N-J1-K1+R1+1);
  NJQ1K1R1=NJ1FACT1#/(K1R1FAC1*NJKRFAQ1);
  WEIGHT1Q(J1,)=(J1CHR11*NJQ1K1R1)#/NCHUZK11;
END;  * OF J1 LOOP FOR FULL SAMPLE Q-TH QUANTILE WEIGHTS;
* COMPUTE WEIGHTS FOR THE P-TH QUANTILE, FULL SAMPLE;
NFACT1=GAMMA(N+1);
K2FACT1=GAMMA(K2+1);
NK2FACT1=GAMMA(N-K2+1);
NCHUZK21=NFACT1#/(K2FACT1*NK2FACT1);
DO J2=R2 TO N+R2-K2;
  J2FACT1=GAMMA(J2);
  R2FACT1=GAMMA(R2);
  J2R2FAC1=GAMMA(J2-R2+1);
  J2CHR21=J2FACT1#/(R2FACT1*J2R2FAC1);
  NJ2FACT1=GAMMA(N-J2+1);
  K2R2FAC1=GAMMA(K2-R2+1);
  NJKRFAP1=GAMMA(N-J2-K2+R2+1);
  NJP1K2R2=NJ2FACT1#/(K2R2FAC1*NJKRFAP1);
  WEIGHT1P(J2,)=(J2CHR21*NJP1K2R2)#/NCHUZK21;
END;  * OF J2 LOOP OVER FULL SAMPLE P-TH QUANTILE WEIGHTS;
* COMPUTE WEIGHTS FOR Q-TH QUANTILE, REDUCED SAMPLE;
NFACT2=GAMMA(N);
K1FACT2=GAMMA(K1+1);
NK1FACT2=GAMMA(N-K1);
NCHUZK12=NFACT2#/(K1FACT2*NK1FACT2);
DO J3=R1 TO N+R1-K1-1;
  J3FACT2=GAMMA(J3);
  R1FACT2=GAMMA(R1);
  J3R1FAC2=GAMMA(J3-R1+1);
  J3CHR12=J3FACT2#/(R1FACT2*J3R1FAC2);
  NJ3FACT2=GAMMA(N-J3);
  K1R1FAC2=GAMMA(K1-R1+1);
  NJKRFAQ2=GAMMA(N-J3-K1+R1);
  NJQ2K1R1=NJ3FACT2#/(K1R1FAC2*NJKRFAQ2);
  WEIGHT2Q(J3,)=(J3CHR12*NJQ2K1R1)#/NCHUZK12;
END;  * OF J3 LOOP FOR REDUCED SAMPLE Q-TH QUANTILE WEIGHTS;
* COMPUTE WEIGHTS FOR P-TH QUANTILE, REDUCED SAMPLE;
NFACT2=GAMMA(N);
K2FACT2=GAMMA(K2+1);
NK2FACT2=GAMMA(N-K2);
NCHUZK22=NFACT2#/(K2FACT2*NK2FACT2);
DO J4=R2 TO N+R2-K2-1;
  J4FACT2=GAMMA(J4);
  R2FACT2=GAMMA(R2);
  J4R2FAC2=GAMMA(J4-R2+1);
  J4CHR22=J4FACT2#/(R2FACT2*J4R2FAC2);
  NJ4FACT2=GAMMA(N-J4);
  K2R2FAC2=GAMMA(K2-R2+1);
  NJKRFAP2=GAMMA(N-J4-K2+R2);
  NJP2K2R2=NJ4FACT2#/(K2R2FAC2*NJKRFAP2);
  WEIGHT2P(J4,)=(J4CHR22*NJP2K2R2)#/NCHUZK22;
END;  * OF J4 LOOP FOR REDUCED SAMPLE P-TH QUANTILE WEIGHTS;
JACKEST=J(N,1,0);
* INDATA IS A SAS DATASET CONTAINING ONLY THE;
* VARIABLE OF INTEREST;
FETCH X DATA=INDATA;
A=X;
B=A;
Y=RANK(A);
A(Y,)=B;
* A IS NOW A VECTOR OF ORDER STATISTICS;
* FORM THE K-L ESTIMATOR OF INTERQUANTILE DIFFERENCE;
EST1=(WEIGHT1Q'*A)-(WEIGHT1P'*A);
* FORM THE JACKKNIFE ESTIMATES;
DO I=1 TO N;
  IF I=1 THEN SHORTER=A(2:N,);
  IF ((I > 1) AND (I < N))
    THEN SHORTER=A(1:(I-1),)//A((I+1):N,);
  IF I=N THEN SHORTER=A(1:(N-1),);
  JACKEST(I,)=(WEIGHT2Q'*SHORTER)-(WEIGHT2P'*SHORTER);
END;  * OF I LOOP FOR JACKKNIFE ESTIMATES;
MEAN1=SUM(JACKEST)#/N;
VECMEAN1=J(N,1,MEAN1);
DIFF=JACKEST-VECMEAN1;
SDEV1=((DIFF'*DIFF)*(N-1)#/N)##.5;
* FORM THE LENGTHS OF THE INTERVALS;
LEN95=2*T1*SDEV1;
LEN99=2*T2*SDEV1;
* UPPERXX AND LOWERXX ARE UPPER AND;
* LOWER .XX CONFIDENCE LIMITS;
UPPER95=EST1+T1*SDEV1;
LOWER95=EST1-T1*SDEV1;
UPPER99=EST1+T2*SDEV1;
LOWER99=EST1-T2*SDEV1;
PRINT N Q P K1 K2 EST1 LEN95 LEN99
UPPER95 LOWER95 UPPER99 LOWER99;
TITLE1 K-L INTERQUANTILE DISTANCE ESTIMATOR;
TITLE2 SAMPLE SIZE=51;
TITLE3 INTERQUANTILE DIFFERENCE;