This research was supported by the U. S. Army Research Office
(Durham) Grant No. DA-ARO(D)-31-124-G-746.
SOME CONTRIBUTIONS TO ORDER STATISTICS
by
Prakash Chandra Joshi
University of North Carolina
Institute of Statistics Mimeo Series No.
May 1969
623
SOME CONTRIBUTIONS TO ORDER STATISTICS
by
Prakash Chandra Joshi
A thesis submitted to the faculty of
the University of North Carolina at
Chapel Hill in partial fulfillment of
the requirements for the degree of
Doctor of Philosophy in the Department
of Statistics
Chapel Hill
1969
Approved by:
Adviser
PRAKASH CHANDRA JOSHI. Some Contributions to Order Statistics.
(Under the direction of HERBERT A. DAVID).
Three different problems in order statistics are considered in
this dissertation.
The first problem deals with the recurrence relations between
moments and other functions of order statistics.
It is shown that
recurrence relations valid for independent and identically distributed
random variables continue to hold for exchangeable variables.
In the second problem a method, based upon orthogonal polynomials,
for obtaining bounds and approximations for the moments of order statis tics is given.
These bounds and approximations depend on the dis-
tribution function only through certain moments of order statistics
in small samples.
It is shown that for the Cauchy distribution bounds
and approximations of all finite moments can be obtained.
Finally, the problem of detecting a single outlier in a fixed
effects linear regression model is considered in some detail.
various cases considered are:
The
(i) known variance, (ii) external
studentization and (iii) pooled studentization.
In each case, one- and
two-sided test statistics for detecting a single outlier are proposed.
These statistics are maxima of suitably standardized or studentized
weighted residuals.
With the help of Bonferroni and other inequalities
upper and lower limits for the true upper percentage points of the
proposed statistics are developed and some tables are provided.
Some
measures of performance, appropriate for our purposes are also introduced and studied.
Finally, a comparison between external and pooled
studentization is made.
ii
TABLE OF CONTENTS
CHAPTER
PAGE
LIST OF TABLES
ACKNOWLEDGEMENTS
ABSTRACT
I
vii
1
1.1.
1.2.
1.3.
Scope . . . . . . .
1
Recurrence relations for order statistics
Bounds and approximations for the moments
of order statistics • • • • • • • • • • •
Outliers in regression models . • • • . . •
1.4.1. The problem of outlier detection ••
1.4.2. Notations and summary ••
1
3
5
RECURRENCE RELATIONS BETWEEN MOMENTS OF ORDER STATISTICS
FOR EXCHANGEABLE VARIATES
9
2.1.
2.2.
2.3.
2.4.
III
vi
INTRODUCTION AND SUMMARY
1. 4.
II
v
Introduction. . . . . . . . . . . . . . .
Recurrence relations • • • . • . • • . • • .
Direct proof and generalizations.
Some applications • • • • • • • •
BOUNDS AND APPROXIMATIONS FOR THE MOMENTS OF
ORDER STATISTICS
3.1.
3.2.
3.3.
3.4.
3.5.
2
3
9
10
12
14
17
Introduction. . . • . . . . . . . . . . . . . . . .
17
Notations and some preliminary results concerning
orthonormal functions • • •
. • • . • •
Bounds and approximations • • •
Some applications • • • • • • • •
Concluding remarks and comments •
. • . .
18
20
25
29
iii
TABLE OF CONTENTS (Continued)
CHAPTER
IV
PAGE
SINGLE OUTLIER IN A REGRESSION MODEL
4.1.
Formulation of the problem and the test
4.2.
Bounds for correlation coefficients • • . . . • . •
Upper percentage points of statistics
expressible as maxima •
...•...•.
Measures of performance
Examples • • • . • • • .
proce~ures. . . . . . . • . • . • . •
4.3.
4.4.
4.5.
V
DISTRIBUTION THEORY WHEN VARIANCE IS KNOWN
5.1.
5.2.
5.3.
5.4.
VI
DISTRIBUTION THEORY WHEN VARIANCE IS UNKNOWN EXTERNAL STUDENTIZATION
32
37
40
41
44
47
47
48
48
48
51
52
54
58
66
Introduction. . .
Percentage points
• • • .
6.2.1. Upper limits.
.•••
6.2.2. Improved upper limits •
6.2.3. Lower limits • • • • . .
Performance of test statistics ••
Applications. • • • • • . •
66
67
67
67
69
70
73
DISTRIBUTION THEORY WHEN VARIANCE IS UNKNOWN POOLED STUDENTIZATION
76
6.1.
6.2.
6.3.
6.4.
VII
Introduction. • • • • .
Percentage points . • .
5.2.1. Upper limits..
. •••
5.2.2. Improved upper limits .
5.2.3. Lower bound for the significance
level attained . • • • .
5.2.4. Lower limits . . • • . •
Performance of test statistics •.
Applications. . • • • • . •
. .••
32
7.1.
7.2.
7.3.
7.4.
7.5.
7.6.
7.7.
Introduction... • • • • • • • • •
76
Marginal and joint distributions. •
• • • • • 77
Evaluation of bivariate probability. • • • • .
86
A probability inequality. • • • . . • • • . • • • • 91
Percentage points. • • • . • •
97
7.5.1. Upper and lower limits.
97
7.5.2. True percentage points. .
98
Performance of test statistics. •
99
Comparison between external and pooled
studentization. • . • • . • • . • • • •
• 101
iv
TABLE OF CONTENTS (Continued)
CHAPTER
PAGE
BIBLIOGRAPHY
104
APPENDIX
109
v
LIST OF TABLES
Page
Table
2.4.1.
3.4.1.
3.4.2.
5.4.1.
Upper 5 and 1% points of X l ' the second largest among
n equi-correlated standardnnotMal variates with
correlation coefficient p = .5 . • • . • • . . . . . . . .
16
Approximate values and bounds for ~ '10
(a) Normal distribution • . • • • :' • .
(b) Cauchy dis tribu tion • • • • • • • • .
28
28
Approximate values and bounds for ~ '20
(a) Normal distribution . • • • • :' • .
(b) Cauchy distribution . • • . • • • •
28
28
Comparison between the lower bounds (5.4.1) (given in
top row) and (5.4.2) (given in bottom row) for a = .05
63
5.4.2.
Lower bound (5.4.3) for Q for a = .05 .
l
63
5.4.3.
Upper and lower limits of the statistic Ul for a two-way
layout with r rows and c columns and a = .05 . . . . .
64
Performance PI of the test statistic Ul for a two-way
layout with r rows and c columns and a = .05 . .
64
Upper and lower limits of the statistic VI for a
two-way layout with r rows and c columns and a = .05 .
65
Performance PI of the test statistic VI for a two-way
layout with r rows and c columns and a = .05 . • • • •
65
Improved upper limits v~ from equation (6.2.6)
for a = .05. •
. .••
75
5.4.4.
5.4.5.
5.4.6.
6.4.1.
6.4.2.
Performance Pa given at equation (6.4.1) for the
statistic V for a = .05 • • • • • • • .
2
75
vi
ACKNOWLEDGEMENTS
I acknowledge my sincere gratitude to Professor H. A. David,
not only for proposing the problems discussed in this dissertation,
but also for his guidance and encouragement throughout my stay in
Chapel Hill.
It has been a pleasure to work under him as a student
and also as a research assistant.
I also wish to thank the other members of my examination
committee:
Professors I. M. Chakravarti, N. L. Johnson, P. K. Sen
and N. M. Wigley.
Thanks are also extended to other members of the
faculty of the Departments of Statistics, Biostatistics and Mathematics who have contributed towards my graduate training.
I am
particularly indebted to Professor T. G. Donnelly for a stimulating
course in computer programming and allowing me to use some of his
CALL A COMPUTER programs in this dissertation.
For financial support, I sincerely thank the Department of
Biostatistics and the U. S. Army Research Office, Durham.
I also
wish to thank the State Government of Uttar Pradesh for providing
a partial travel grant for this purpose.
Finally, I wish to thank Mrs. Delores Gold for her excellent
typing of the manuscript.
vii
ABSTRACT
Three different problems in order statistics are considered in
this dissertation.
The first problem deals with the recurrence relations between
moments and other functions of order statistics.
It is shown that
recurrence relations valid for independent and identically distributed
random variables continue to hold for exchangeable variables.
In the second problem a method, based upon orthogonal polynomials,
for obtaining bounds and approximations for the moments of order statistics is given.
These bounds and approximations depend on the dis-
tribution function only through certain moments of order statistics
in small samples.
It is shown that for the Cauchy distribution bounds
and approximations of all finite moments can be obtained.
Finally, the problem of detecting a single outlier in a fixed
effects linear regression model is considered in some detail.
various cases considered are:
The
(i) known variance, (ii) external
studentization and (iii) pooled studentization.
In each case, one- and
two-sided test statistics for detecting a single outlier are proposed.
These statistics are maxima of suitably standardized or studentized
weighted residuals.
With the help of Bonferroni and other inequalities
upper and lower limits for the true upper percentage points of the
proposed statistics are developed and some tables are provided.
Some
measures of performance, appropriate for our purposes are also introduced and studied.
Finally, a comparison between external and pooled
studentization is made.
CHAPTER I
INTRODUCTION AND SUMMARY
1.1.
Scope
In this dissertation some well known problems in the theory of
order statistics are considered.
Throughout this work, we shall assume
that the order statistics are obtained by re-arranging in non-decreasing
order of magnitude variates having common marginal c.d.f.
In all, three different problems, along with some applications,
are treated.
These are:
1.
order statistics for exchangeable variates,
2.
bounds and approximations for the moments of
order statistics and
3.
detection of outliers in a linear regression model.
In each case, some existing results are generalized and, where
necessary, new concepts are introduced.
Out of these three problems,
the third has been considered in greatest detail and constitutes the
bulk of the dissertation.
1.2.
Recurrence relations for order statistics
Recurrence relations between the moments and other functions of
order statistics have been derived by many authors, usually on the
assumption that the random variables X ,X 2 ""'X are independent
n
l
continuous variates with common marginal c.d.f. P(x).
result is for r=1,2, •.. ,n-l,
The simplest
2
(1.2.1)
where ~
size n.
n ~r:n-l
r:n
=
r ~r+l:n + (n-r)~r:n'
is the expected value of r
th
order statistic in a sample of
In a recent paper, Young [43] shows that (1.2.1) and other re-
suIts deducible from (1.2.1) continue to hold when the X. are exchangeable
1.
In Chapter II, we give a simplified version of the proof given
by Young and another simple probabilistic argument which establishes
(1.2.1) and multivariate generalizations for exchangeable variates.
Some applications of the results are also included.
1.3.
Bounds and approximations for the moments of order statistics
Let X ,X "",X be a random sample of size n from a continuous
l 2
n
distribution with c.d.f. P(x).
Let
Xl :n -< X2 :n -< ••• < Xn:n
be the corresponding order statistics.
The problem of finding bounds and approximations for the moments
of order statistics has drawn considerable attention in the literature.
One of the early papers is due to Plackett [31], who showed that there
is a universal upper bound for E(X
deviation of X.
-Xl )/a, where a is the standard
n:n
:n
Later work on these lines is by Moriguti [28], Gumbel
[18], Hartley and David [21] and finally by Sugiura [42].
All these
authors with the exception of Sugiura only give the bounds, while
Sugiura gives both bounds and approximations.
3
In Chapter III, we generalize Sugiura's method and show that
even with less stringent conditions, one or more different sequences
of bounds and approximations for all finite moments can be obtained.
These bounds and approximations depend on the distribution function
only through certain moments of order statistics in small samples.
Further, it has been shown that for the Cauchy distribution bounds and
approximations of all finite moments can be obtained.
Some numerical
calculations for normal and Cauchy distributions are also given.
1.4.
Outliers in regression models
1.4.1.
The problem of outlier detection
The remainder of this dissertation has been devoted to the prob-
lem of detecting outliers in fixed effects additive regression models.
The problem can be formulated as follows:
A sample of size n has been observed on a variable Y which has a
linear regression on a known set of m variables X ,X , .•. ,X •
m
l 2
The usual
normal theory model can be described by
(1.4.1)
,
where
i:n = (Yl'Y2"·"Yn) is the observation vector,
matrix of rank m«n), Sl,S2, ..• ,Sm' 0 2
the identity matrix of order n.
X is a known
mxn
are unknown parameters and I is
Also the symbol "d" stands for "is
distributed according to".
For simplicity, we shall neglect the possibility of having any
"errors" arising from the entries of the design matrix X.
Among the
n observations, one or more may show some sign of deviation from the
assumed regression or may have a different variability or both.
Such
4
observations are generally termed outliers and can arise for various
reasons (see e.g. Kruskal [26]).
It is clear that the inclusion of such observations in any
analysis may yield quite erroneous conclusions.
Moreover, at times,
these observations may themselves be of interest.
Therefore, the main
problem is to develop some suitable test procedures which can be used
to isolate such observations.
We shall restrict our attention to
this aspect of the outlier problem.
In practice, we do not usually know the total number of outliers
in any given data set.
much more difficult.
This makes the problem of detecting outliers
In the special case, when we have a sample from
a N(~,a2) parent, the problem has been studied by many authors (see e.g.
Chew [6], who summarizes these and other results).
The outlying observa-
tions are usually assumed to differ in mean or in variance from the other
o~servations.
Numerous test procedures, mostly depending on a pre-
assigned number of outliers, have been studied in this case.
The
general problem of a single outlier in regression models has been considered by Srikantan [40].
In the present study, we shall assume that at most one observation -- we do not know which one -- is an outlier.
Moreover, this
observation is assumed to differ from the assumed regression only in
mean.
In this case, certain test statistics have been proposed by
Srikantan.
For these test statistics, he has obtained the nominal
Eercentage points, which control the error of the first kind at a level
not exceeding the specified one.
He has also shown that under certain
conditions, which in general hold in small samples, the nominal
5
percentage points coincide with the true percentage points.
Here, we
consider some additional test procedures, depending on the knowledge
available about
0
2
and study their performance.
This is summarized
in the following section.
1.4.2.
Notations and summary
Let
= I-X' (XX') -lX = «A
11.
nXn
ij
)) ,
e = 1I.y = residual vector,
S2
Pij
e'e = error sum of squares,
A..
1.J
=
(AiiA jj )
Pmin = Min
i;'j
k =
2
correlation between e. and e. ,
1.
J
Pij
and
= Max
i;'j
p ..•
1.J
Throughout this study we shall assume that
-1 < p.
<
m1.n -
p
max
< 1.
In Chapter IV, we propose one- and two-sided test statistics for
detecting a single outlier in situations where 1) 0 2 is known and 2) 0 2
is unknown, but an independent root mean square estimator, sv, of 0 based
on V degrees of freedom is available.
The latter case has been further
divided into two categories which are termed external studentization and
pooled studentization.
The one-sided test statistics for 0 2 known,
external studentization and pooled studentization are U ' U and U
3
1
2
6
respectively, where
e.
VI
Max b. ,
1.
i
b. =
1.
1.
--1
A.. ~
/
0,
1.1.
e.
V
Max t. ,
1.
2 i
t. =
1.
1.
--i~
A..
1.1.
e.
V = Max Wi'
3
i
(1.4.2)
w.
1.
1.
=~
A..
2
/ s\>,
k
2
/ (82+\>s2)
\>
1.1.
and the maximum is taken over the set {1,2, ••• ,n}.
The corresponding two-
sided statistics are denoted by VI' V and V respectively, where
2
3
VI = Maxlb.1 etc.
i
1.
The first two Bonferroni inequalities are suggested to get upper
and lower limits for the true upper lOOa% points of these statistics.
These are improved in later chapters for some special cases.
To study the performance of the proposed statistics, we assume the
null hypothesis, H , that there are no outliers and that under H our model
o
0
is given by equation (1.4.1).
The alternative hypothesis H is the union
a
of n mutually exclusive hypotheses H ,H , •.• ,H and under H
l 2
n
k
where
where
~k
is an n-vector with k
th
component 1 and all other components O.
Now under H , all the proposed test statistics are of the form
o
Z = Max{zi:
i=1,2, •.• ,n},where zl,zZ"",zn are identically distributed
random variables.
k=1,2" ",n let
Let z
a
be the true upper lOOa% point of Z and for
7
Probability that zk is significantly large when H is true
k
= Pr(zk > zaIHk)'
Q
k
Probability of rejecting H when H is true
o
k
The measures of performance introduced are:
P
a
Min P
k
k
,
Qa = Min Qk ,
k
1
P =n
b
1
Qb =n
n
L Pk ,
k=l
n
L Qk'
~l
In some special cases, these reduce to the measures considered "by David
and Paulson [10].
Finally, three examples of regression models, viz., 1) sample
from a N(~,a2) parent, 2) one-way layout and 3) two-way layout are briefly
mentioned.
These are considered in later chapters as an application of
the theory developed.
In Chapter V, the statistics U and VI are treated in detail.
l
It
is shown that one can always obtain better upper limits for the true
upper 100a% points of VI than those obtained by the first Bonferroni
inequality.
Some interesting results for U in the case p
< 0 are
max l
also obtained,
Among the measures of performance, it was observed that
for both Ul and VI' Pa depends on A(l)' where A(l) = Min Akk ,
Moreover,
k
P is a poor measure if anyone of the A is much smaller than the
a
kk
others and in such cases P might be worth computing.
b
Upper and lower
limits for the true upper percentage points of U and VI for the two-way
l
layout are tabulated and the measure P has been studied for this case.
a
8
The statistics U and V are investigated in Chapter VI.
z
z
A
number of results in this case are analogous to that of Chapter V.
For the measures of performance we have mainly considered P
a
provided a table for computing P
a
and
for the two-sided test statistic.
In Chapter VII, a number of distribution theory results related
to U and V are obtained.
3
3
Findings of this chapter generalize the
results due to Doornbos et a1. (see e.g. Doornbos [11]) and Srikantan
[40].
The joint distribution of w and w ' where wi is given at (1.4.Z)
1
z
is derived and an expression useful for evaluating Pr(w >c , wz>c ) is
z
1 1
obtained.
It is shown that if P1Z < 0, then
provided both c
1
and
C
z
are of the same sign.
Following a geometrical argument, it is shown that for some small
values of n and v, the upper limits for the true upper percentage points
for both U and V coincide with the true upper percentage points.
3
3
comparison between U and U using P is also included.
z
k
3
A
It was found
that U has a definite edge over U ' when we use the upper limits in the
3
Z
expressions for P •
k
CHAPTER II
RECURRENCE RELATIONS BETWEEN MOMENTS OF
ORDER STATISTICS FOR EXCHANGEABLE VARIATES
2.1.
Introduction
Let X.
(i=1,2, ••. ,n) be the order statistics obtained by re1:n
arranging in non-decreasing order of magnitude the variates X. having
1
common marginal c.d.f. p(x).
Denote by F. (x)
1:n
expected value of X.
respectively.
1:n
and other functions of the Xi
:n
and~.
1:n
the c.d.f. and
Recurrence relations for moments
have been derived by many authors,
usually on the assumption that the X. are independent continuous variates.
1
The most basic of these relations states that for r=1,2, ••• ,n-l,
(2.1.1)
n ~r:n-l = r ~r+l:n + (n-r)~r:n·
In a recent paper, Young [43] shows (in effect) that (2.1.1) and hence
results deducible from (2.1.1) continue to hold if the X. are exchange1
able, continuous or discrete variates, i.e., if
is sYmmetric in x ,x , •.• ,x •
l 2
n
Pr(Xl<xl,X2~x2,••• ,X~xn)
In Section 2.2, we give a simplified
version of the proof given by Young and in Section 2.3, another simple
probabilistic argument which establishes (2.1.1) and multivariate
generalizations for exchangeable variates is given.
some applications of the results are mentioned.
In Section 2.4,
10
2.2.
Recurrence relations
Let A.1 (i=1,2, ••. ,n) be the event X.1 <
x.
-
(2.2.1)
S
m
Define
E Pr(A.1 ,A.1 , ... ,A.1 ),
1
2
m
n
the summation extending over all (m) sets of integers 1<il<i2< ... <i~n.
Then for r=1,2, •.. ,n,
Pr(X
< x)
r:n -
= Pr(at least r of the X.'s are
1
~
x)
(2.2.2)
The last equality follows by using a well known theorem for the realization of at least r out of n events (see e.g. Feller [14]).
Equation (2.2.2) is valid for any set of n random variables.
In
particular, if the X. are exchangeable variates, then from (2.2.1)
1
= (n) Pr(X
< x)
m
m:m -
Substituting in (2.2.2), we get
n
(2.2.3)
Fr : n ()
x =
~ (_l)m-r(m-l)(n)F
(x)
L
r-l m m:m
'
m=r
a relation linking the c.d.f. of X
with the c.d.f.'s of the maximum
r:n
in samples of r,r+l, ••• ,n.
Differentiating or differencing, multiplying
11
itx
.
.
' t h e same re 1at10n
.
b etween
a nd'1ntegrat1ng
or summ1ng,
we 0b
ta1n
by e
p.d.f.
IS,
characteristic functions, and hence raw moments.
Equation
(2.2.3) has been proved by Young [43] by a similar but rather complicated
method. (see equation (11) of Young's paper).
With the help of (2.2.3) we can derive what may be called the
basic recurrence relation for order statistics:
(2.2.4)
n Fr:n- l(x) = r Fr +1 :n (x) + (n-r)F r:n (x).
This formula, usually stated in terms of moments, follows on applying
(2.2.3) to each term of (2.2.4).
We have here reversed the usual sequence whereby (2.2.3) and indeed
some more general related results are deduced by repeated application of
(2.2.4).
It may also be noted that for identically distributed standard-
ized multi-normal variates with equal correlation p, (2.2.4) for arbitrary
P follows readily from (2.2.4) for p=o in view of the representations of
X. by (Owen and Steck [30])
1
k:
X. = P 2y
1
O
+ (l-p)
k:
(p>o; i=1,2, ••. ,n),
2y.
1
where YO'Yl""'Y n are independent N(O,l) variates and
(-l/(n-l)~p<O;
i=1,2, •.• ,n),
\-
j
where Zl,Z2, .•. ,Zn are
in9~p~ndent
N(O,l) variates, Zo is N(O,l) and
If in addition to exchangeability multivariate symmetry (say about
zero) also holds, i.e.
12
then as for independent symmetric variates
(2.2.5)
Pr(X
<x) = Pr(X
+1 >-x).
r:nn-r :n-
To prove this, we only need to note that in view of multivariate
symmetry, the joint distributions of
(Xl : n
,X:2n ' .•• ,Xn:n )
and
1 "",-X 1 :n )
(-Xn:n ,-X n-:n
coincide.
Hence for any x
Pr(X
<x) = Pr(-X
+1 <x)
r:nn-r :n-
and (2.2.5) follows.
For continuous variates we have, in particular,
Fn:n (x) -- 1 - Fl:n (-x)
n
= 1 -
L
m=l
(_1)m-1(n)F
(-x)
m m:m
(by (2. 2 . 3) ) ,
a result obtained by Steck [41] for the equi-corre1ated mu1tinorma1
variates.
2.3.
Direct proof and generalizations
Of the n variates X. drop one at random and let Y.
1
1
1:nth
(i=1,2, ... ,n-1) denote the i
order statistic in the reduced set of
n-1 exchangeable variates.
If X.
is dropped (i=1,2, •.• ,r) the r
1:n
th
13
largest variate in the set of n-l was the (r+l)
(A)
Y
r:n-l
= X
r+l:n
th
largest out of n, i.e.
•
Likewise, i f X.
is dropped (i=r+l,r+2, ..• ,n), then
1:n
(B)
Y
r:n-l
= X
r:n
Since (A) and (B) have respective probabilities r/n and (n-r)/n, it
follows that for any x
(2.3.1)
r
n-r
Prey r:n- l<X)
= -n Pr(Xr+1
<x) + --<x),
:nn Pr(Xr:n-
that is
n Fr:n_l(x) = r Fr+l:n(x) + (n-r)Fr:n(x),
which is equation (2.2.4).
The above argument is readily generalized to the joint c.d.f. of
two or more order statistics.
Let F
r,s:n
of X
and X
(l<r<s<n; x<y) and let
r:n
s:n -
~
(x,y) denote the joint c.d.f.
r,s:n
= E(X
X
).
r:n s:n
corresponding to (2.3.1), we now have for any x,y (x<y)
=
r
n Pr(Xr+l:~x,
Xs+l:;:Y)
. s-r
+ --- Pr(X
<x, X
<y)
n
r:ns+l:nn-s
+ --- Pr(X
<x, Xs : <v).
n
r:nrr-"
This can be rewritten as
Then
14
(2.3.2) n Fr,s:n- l(x,y) = r Fr +1 ,s +1..n (x,y)+(s-r)F r,s +1..n (x,y)
+(n-s)F
r,s:n
(x,y).
As in Section 2.2, this result can be converted into one linking the
corresponding product-moments of any order, to give in particular,
(2.3.3)
n
~
r,s:n-l
= r
+
~
+(s-r)~
r+l,s+l:n
(n-s)~
r,s:n
r,s+l:n
•
(2.3.3) has been established by Govindarajulu [16] for independent
identically distributed continuous variates.
For the equi-correlated
multinormal case (with common marginal c.d.f.) (2.3.3) may also be proved
with the help of expressions for the moments of order statistics given
by Owen and Steck [30].
2.4.
Some applications
For numerical work formula (2.2.3) requires tables of the c.d.f.
of largest order statistic.
If Xi (i=1,2, •.• ,n) are multivariate normal
with means zero, variances unity, and common correlation coefficient p,
then Gupta [19] tabulates F
(x) for n=1(1)12 and several values of p.
n:n
An important special case of (2.2.3) gives the c.d.f. of X 1
n- :n
as
(2.4.1)
As pointed out by Fisher tIS] in connection with harmonic analysis a
test of the second largest variate X 1
becomes of special interest
n- :n
when the test based on X
is inconclusive, that is, close to the
n:n
chosen level of significance.
15
As an application of (2.4.1) consider the problem of testing n
"treatment" means against a "control" mean (Dunnett [12]).
and Zoh (i=1,2, ... ,n; j=1,2, ... ,k; h=1,2, ...
,~)
Let Z..
1J
be mutually independent
normal variates, Z.. and Z h being respectively N(~.,02) and N(~ ,0 2),
1J
with 0
2
0
assumed known.
1
0
In order to test simultaneously whether any
of the treatment means Z. differ from the control mean
1
Z0
we may use
the statistic
X
n:n
=
Max X.,
i
1
where
Z.-Z
X. =
1
1
,
0
o(l + l)~
i=1,2, .•. ,n,
~
k
k
1
Z. = - L Z'j
1
k j=l 1
,
i=1,2, ••• ,n
and
1
Z =0
~
~
I Z h'
h-l
0
It is easy to show that the Xi are equi-correlated standard normal
variates with correlation coefficient p =
k/(k+~)
so that Gupta's tables
may be used to obtain F 1 (x) and hence percentage points of X l .
n- :n
n- :n
For the case
Table 2.4.1.
k=~,
i.e.
p=~,
upper 5 and 1% points are given in
16
Table 2.4.1.
Upper 5 and 1% points of X l ' the second largest
n- :n
among n equi-corre1ated standard normal variates
with correlation coefficient p=.5
n
2
3
4
5
6
7
8
9
10
11
12
5%
1%
1.100
1.400
1.569
1.685
1. 773
1.843
1.901
1.950
1.993
2.031
2.065
1.713
1.981
2.134
2.242
2.324
2.390
2.443
2.490
2.532
2.569
2.597
CHAPTER III
BOUNDS AND APPROXIMATIONS FOR THE MOMENTS
OF ORDER STATISTICS
3.1.
Introduction
Let X ,X "",X be a random sample of size n from a continuous
1 2
n
distribution with c.d.f. p(x) and p.d.f. p(x).
Let
X
< X
< ••• < X
l:n - 2:n n:n
be the corresponding order statistics.
Denote E(X
r:n
) by
~
r:n
•
Several authors (see e.g. B10m [5], David and Johnson [8],
Sugiura [42]) have given methods of finding approximations for the
moments of order statistics.
The method due to Sugiura also gives the
bounds and requires that the r.v. X has a finite variance.
By a
generalization of his method, it will be shown here that, even with
less stringent conditions, one or more different sequences of bounds
for all finite moments can be obtained.
These bounds and approxima-
tions depend on the distribution function only through certain moments
of order statistics in small samples.
All such bounds, e.g. for
~
r:n
are of the form
~
r,n,t
± c
r,n,t
,
where ~r,n,t is a (t+1)-term approximation to ~r:n and ~r,n,t
c
r,n,t
+
0 as t+oo.
+ ~r:n'
,
18
3.2.
Notations and some preliminary results concerning orthonormal
functions
Let {~k(u)}; k=0,1,2""'~o(u)=1, be an orthonormal system over
the closed interval [0,1], i.e.
f
1
o
{ 1
~ (u)~ (u)du =
m
n
0
if m=n
otherwise.
Let f(u) and g(u) be square-integrable functions over [0,1].
(3.2.1)
ak =
f
(3.2.2)
bk =
f
Thus a
to
and b
k
~k(u)
If
o
1
o
f(u)~k(u)du,
g(u)~k(u)du.
are the Fourier coefficients of f(u) and g(u) relative
respectively.
Lemma 3.2.1.
(3.2.3)
k
1
We then have the following lemma of Sugiura [42].
For. any integral t>O
1
o
f (u)g (u)du -
where the equality holds if and only if
t
f(u) -
L ak~k(u)
k=O
Proof:
Put
~ g(u) -
Applying Schwarz inequality to
t
L bk~k(u).
k=O
19
t
t
f(u) -
L ak¢k(u)
and
L bk¢k(u)
g(u) -
k=O
k=O
we have
II
1
°
t
(f(u) -
t
L ak¢k(u»(g(u)
L bk¢k(u»dul
-
k=O
k=O
Multiplying and using (3.2.1), (3.2.2) and the orthonormality of
functions {¢k(u)}, the lemma follows.
Remark 1.
In Lemma 3.2.1 the members ¢k(u) (k=O,l, .•• ,t)of the ortho-
normal system have been used to give (3.2.3).
However, the same proof
applies for any subset of {¢k(u)}, thus allowing us to select subsets
which result in useful bounds.
t
Remark 2.
I
akb
provides an approximation to
I
1
fg du, an upper bound
k
k=O
to the error involved being given by the R.H.S. o f (3.2.3). If, further,
°
{¢k} constitute a complete orthonormal system in [0,1] (see e.g. [35]),
then the R.H.S. of (3.2.3) tends to
° as
t~,
and hence the approximation
can be made as accurate as we please by choosing t large enough.
As an example of a complete orthonormal system in [0,1], we have
the Legendre polynomials in [0,1] given by
(3.2.4)
12k+l
k!
dk
du
k
k
- k u (u-l) ;
k=O, 1,2, ..••
20
3.3.
Bounds and approximations
Let the r.v. X have a continuous c.d.f. p(x) and let
finite.
~
r:n
be
Then by using the probability integral transformation
u = p(x) we can write
_ _=1_ _ fl
(3.3.1)
~r:n = B(r,n-r+l) 0 x(u)u
r-l
(l-u)
n-r
du,
where x(u) denotes that x has been expressed as a function of u.
Theorem 3.3.1.
Let ¢o=l, ¢1'¢2"'. be any orthonormal system in
[0,1] and let E(X~p+l:2p+2q+l) be finite for some integral p, ~O.
Then for r=1,2, •.• ,n and any integral
t~O
t
<
IB(p+r,q+n-r+l) E X
\
I
B(r,n-r+l)
( p+r:p+q+n) - k~oakbk
t
1
[B(2p+l,2q+l)E(X~p+l:2p+2q+l) - k~Oa~]~
(3.3.2)
I
• [B(2r-l,2n-2r+l) _
b2]~,
[B(r,n-r+l)] 2
k=O k
where a
k
and b
k
are given by (3.2.1) and (3.2.2) with
(3.3.3)
g(u) =
(3.3.4)
Proof:
1
ur-l(l_u)n-r.
B(r,n-r+l)
We have
f
1
o
f2(u)du = B(2p+l,2q+l)E(X~p+l:2p+2q+l)'
21
f
1
g2(u)du =
o
B(2r-l,2n-2r+l)
[B(r,n-r+l)]2
and
f
1
o
f(u)g(u)du =
B(p+r,q+n-r+l)
E(X
).
B( r,n-r+l )
p+r:p+q+n
Applying Lemma 3.2.1, the result follows.
It should be noted that for a given orthonormal system, a
k
depends on the c.d.f. P(x) of the r.v. X, but not on rand n; while b
does not depend on the P(x), but only on rand n.
shows that the approximations and bounds for
~
Theorem 3.3.1 then
+
+ + (r=1,2, ••. ,n),
p r:p q n
which depend on P(x) can be obtained, provided E(X~p+l:2p+2q+l) is
finite.
In terms of p(x), a sufficient condition for this is
f
00
2
--
2p l
Jxl + • dP(x) <
00
_00
for
~~o
(Sen [37]).
It is clear that the same technique can be used to find the
bounds and approximations for moments of any order.
Thus, to get
s
+ +n)' where s>O, we only require the
similar results for E(X +
p r:p q
square integrability of the function [x(u)]s uP(l-u)q.
equivalent to saying that
E(X~;+1:2p+2q+l) must
This is
be finite.
We shall now turn our attention to the case where the distribution is known to be symmetric, say about x=O.
This additional
information leads to sharper bounds.
Theorem 3.3.2.
about x=O.
Let the distribution of X be continuous and symmetric
Let ¢o=l, ¢1'¢2' .••. be any complete orthonormal system
in [0,1] satisfying
k
22
(3.3.5)
If E(X~m+1:4m+1) is finite for some integral m~O, then for r=1,2, ... ,n
and any integral t>O
t
\
I
L a2k+1b2k+1
k=O
B(m+r,m+n-r+1)
B(r,n-r+1)
E(Xm+r:2m+n) -
I
t
(3.3.6)
t
o [B(2r-1,2n-2r+1)-B(n,n) _
2[B(r,n-r+1)]2
where a
k
and b
1
[B(2m+1,2m+1)E(X~m+1:4m+1) - k~Oa~k+1]~
<
k
1:
\ b2
]2
L
2k+1 '
k=O
are as in Theorem 3.3.1, with p and q equal to m.
If,
further, E(X~m+1:4m+1) is finite for some integral m~O, then for
r=1,2, .•. ,n and any integral
IB(m+r,m+n-r+1)
B(r,n-r+1)
t~O
E(X2
) _
m+r:2m+n
(3.3.7)
o[B ( 2r-1,2n-2r+1)+B(n,n) _ I b2 ]~
2[B(r,n-r+1)]2
k=O 2k
t
1
where
1
a'
k
Proof:
= J..x 2 (u)
o
um(l_u)m cP (u)du.
k
Take p=q=m in (3.3.3) and apply Lemma 3.2.1 with k=1,3,5, .•. ,
2t+1,O,2,4, ..•. , thus giving
Z3
IB(m+r,m+n-r+l) E(X
) _
B(r,n-r+l)
m+r:Zm+n
t
00
L aZk+lbZk+l
k=O
< [B(Zm+l,Zm+l)E(X~m+l:4ffi+l) -
-
L aZkbZkl
k=O
t o o l
L a~k+l
k=O
- k_~oa~k]~
.[B(Zr-l,Zn-Zr+l) _
[B (r ,n-r+l)]2
(3.3.8)
Since the distribution of X is symmetric about x=O, the inverse
function x(u) is odd and hence on using (3.3.5)
1
a Zk = ~ x(u)um(l-u)ffi¢Zk(u)du
=
I
1
o
x(l-v)vm(l-v)m¢Zk(v)dv; v=l-u
So that
(3.3.9)
We now show that
00
(3.3.10)
L b 2 = B(Zr-l,Zn-Zr+l)+B(n,n)
k=O Zk
2[B(r,n-r+l)]2
This result has been prove,d by Sugiura [42] for Legendre polynomials
(3.2.4), for which (3.3.5) is satisfied.
on parallel lines.
(3.3.11)
The proof given here follows
Define
1
un-r(l_u)r-l,
g*(u) = B(r,n-r+l)
O<u<l.
24
Then on using (3.3.5)
1
f
~(g(u)+g*(u))¢2k(u)du
a
f
(3.3.12)
1
h(u)¢2k(u)du,
a
where
(3.3.13)
h(u)
= ~(g(u)+g*(u)).
Since {¢k} is a complete orthonormal system in [0,1], hence
f
(3.3.14 )
1
a
where c
k
00
h 2 (u)du =
is the Fourier coefficient of h(u) relative to ¢k(u) , i.e.
1
c
k
= f h(u)¢k(u)du.
a
Now by (3.3.12) and (3.3.5) we see that
and
=
o.
Equation (3.3.14) then reduces to
Therefore
00
I
k=O
.'
b~k
=~
f
1
a
(g(u)+g*(u))2du
and (3.3.10) follows.
Use of equations (3.3.9) and (3.3.10) in (3.3.8) completes the
proof of first part of the theorem.
25
To prove the second part, note that
(3.3.15)
and
00'
= B(2r-l,2n-2r+l)-B(n,n)
2[B(r,n-r+l)]2
An application of Lemma 3.2.1 with k=0,2,4, ••. ,2t,1,3,5, •.•.
then gives (3.3.7).
It is clear that similar results for any odd and even order
moments can be obtained by applying the same techniques as in the proofs
of (3.3.6) and (3.3.7) respectively.
3.4.
Some applications
As an application of (3.3.6), let X have a Cauchy distribution,
p(x)
1
-OO<x<oo.
In this case Barnett [4] has shown that E(X
s
) is finite for all s<r<n-s.
r:n
In particular, this means that E(X~:5) is finite and hence (3.3.6) is
applicable with m=l.
We thus get bounds for E(X + : + ), r=1,2, •.. ,n,
r l n 2
i.e., for all finite moments.
However, one can take any integral
m~l.
Thus for m=2 we get another sequence of bounds for E(X + : + ),
r 2 n 4
r=1,2, ••• ,n, i.e., for all finite moments except for the second smallest
and second largest order statistics.
The case m=O has been treated by Sugiura [42] who has also given
bounds and approximations for a normal distribution for n=lO, 20.
Here
we consider the case m=l and use the Legendre polynomials (3.2.4) for
which (3.3.5) is satisfied.
In general, we can write
26
(3.4.1)
where c ' are constants.
k ,J.
For k=O,1,2 and 3 these are given below:
LO(u) = 1,
Ll(u) = 13 (2u-l),
L (u) = 15 (6u 2-6u+l),
2
L (u) = 17 (20u 3 -30u 2+l2u-l).
3
With m=l and L in place of ¢k we have
k
k
(3.4.3)
c
'
k ,J.
\'
(
)
L
(·+2)(·
i=O J.
J.+3) E XJ.'+2 • J.'+3 '
(3.4.2)
b
k
L
r(r+l) •••• (r+i-l)
k = i=Ock,i (n+l)(n+2) .•• (n+i)
and (3.3.6) reduces to
r(n-r+l)
I (n+l)(n+2)
(3.4.4)
~
E(X r +l : n+ 2) - k~oa2k+lb2k+l
1
2
t
[30 E(X 3 •• S) -
2
L a 2k
+l ]
I
<
k
2
k=O
I
• [B(2r-l,2n-2r+l)-B(n,n) _
b2
]~
2[B(r,n-r+l)]2
k=O 2k+l
.
Note that a k is a
l~near
function of the moments
~i+2:i+3
in
small samples and can be evaluated by using tables of moments of order
statistics for the r.v. X.
In cases where no such tables are available,
the moments could be evaluated by numerical integration.
27
The necessary moments for computation are extensively tabulated
for the normal distribution in [36] and for the Cauchy distribution in
[4] and [34].
Bounds and approximate values for n=8 and n=18, using
(3.4.4), are given in Tables 3.4.1 and 3.4.2.
first bound and t=l gives the second bound.
t=O in (3.4.4) gives the
These tables show that the
approximations and bounds are remarkably good for Cauchy distribution
for all r.
For the normal distribution, the first bound for
is rather bad.
~
n+l:n+2
However, a comparison with the corresponding results
due to Sugiura [42] (with n=lO, 20 and m=O in (3.3.6)) shows that our
first bound is approximately of the same order as his first bound,
except for
~n+l:n+2'
for which our bound is inferior.
bound is far superior to his second bound.
But our second
In fact, in many cases our
second bound is even superior to his third bound, which is obtained by
taking t=2 in (3.3.6).
This, for example, for
~11:20
his third bound
is 0.066 ± 0.008, while our second bound is 0.0644 ± 0.0037.
It is worthwhile noting that for symmetric parent distributions,
equations (3.3.9) and (3.3.15) can also yield some simple recurrence
relations of the form mentioned in Chapter II.
Thus, for example, on
taking k=l and L in place of ¢k in (3.3.9), we get
k
1
J x(u)um(1-u)~2(u)du
o
= 0,
that is,
1
J
o
m
x(u)u (l-u)m(6u 2 -6u+l)du = 0
and this simplifies to
(m+2)~m+3:2m+3 = (2m+3)~m+2:2m+2'
28
Table 3.4.1.
Approximate values and bounds for
(a)
Normal distribution
r
First bound
9
8
7
6
1. 299
0.530
0.248
0.074
±
±
±
±
0.• 298
0.133
0.136
0.057
Second bound
1. 0041
0.6509
0.3788
0.1249
(b)
Cauchy distribution
r
First bound
9
8
7
6
3.053
1.246
0.582
0.174
±
±
±
±
Table 3.4.2.
0.073
0.032
0.033
0.014
2.9822
1.2749
0.6129
0.1866
r
First bound
±
±
±
±
±
±
±
±
±
1.481
0.377
0.247
0.228
0.222
0.213
0.188
0.134
0.049
2.9814
1. 2755
0.6132
0.1866
1. 40760
1.13095
0.92098
0.74538
0.59030
0.44833
0.31493
0.18696
0.06200
I
First bound
0.361
0.092
0.060
0.056
0.054
0.052
0.046
0.033
0.012
~r:20
±
±
±
±
±
±
±
±
±
1.4693
1.1023
0.9002
0.7390
0.5940
0.4570
0.3241
0.1937
0.0644
r
±
±
±
±
±
±
±
±
±
Tabled value
Tabled value
Cauchy distribution
6.590
3.079
1.890
1.279
0.897
0.626
0.415
0.238
0.078
0.0015
0.0028
0.0016
0.0013
1. 00136
0.65606
0.37576
0.12267
Second bound
(b)
19
18
17
16
15
14
13
12
11
±
±
±
±
Tabled value
Approximate values and bounds for
Normal distribution
2.805
1.310
0.804
0.545
0.382
0.267
0.177
0.101
0.033
0.0028
0.0053
0.0031
0.0025
Second bound
(a)
19
18
17
16
15
14
13
12
11
±
±
±
±
~r:10
0.0619
0.0310
0.0225
0.0116
0.0063
0.0094
0.0115
0.0095
0.0037
Second bound
6.2705
3.0287
1.9128
1.3259
0.9480
0.6718
0.4506
0.2600
0.0851
±
±
±
±
±
±
±
±
±
0.0324
0.0162
0.0118
0.0060
0.0033
0.0049
0.0060
0.0050
0.0019
Tabled value
6.2648
3.0293
1.9140
1.3268
0.9484
0.6720
0.4506
0.2599
0.0850
29
3.5.
Concluding remarks and comments
We conclude this chapter by briefly mentioning a few points of
interest.
Work on some of these, and related problems, could be further
pursued, although a good deal of computation might be needed.
1.
First, we consider the effect on the approximation as n
increases.
To this end note that g(u) is a polynomial of degree n-l
and hence can be written as a linear combination of Legendre polynomials.
Thus
n-l
(3.5.1)
L bkLk(u)
g(u) =
k=O
and
1
J g2(u)du
o
where b
k
=
is the Fourier coefficient of g(u) relative to L •
k
Now using
Theorem 3.3.1, say with p=q=O and t=n-l, we have
(3.5.2)
llr:n =
where
1
ak =
J
o
x(u)Lk(u)du
k
\'
(3.5.3)
= ..L..
1.= O
c
.
k
~
i+l· ]1i+l: i+l'
Equations (3.5.2) and (3.5.3) also show that ]1
r:n
combination of ]1 . . , i=1,2, .•. ,n (see equation (2.2.3)).
1.:1.
approximation of ]1
r:n
is a linear
Further, the
is obtained by taking the first (t+l) terms of
30
the R.H.S. of (3.5.2).
From this, it may be conjectured that if n is
increased, then more terms will be needed to get the same degree of
approximation.
Some numerical computations for the normal distribu-
tion appear to confirm this.
2.
It is clear that the results of Theorems 3.3.1 and 3.3.2
apply to discrete random variables provided only that the necessary
moments are finite.
From equation (3.5.2) it can be shown that
n ].l
r:n-l
= r ].l
r+l:n
+ (n-r)].l
r:n
•
Now equation (2.Z.4) shows that this result holds for exchangeable
r.v.'s also.
Consequently, the approximation for ].lr:n is valid for
exchangeable r.v.'s as well.
3.
As pointed out in Section 3.4, we can get several different
sequences of bounds converging to ].lr+m:n+Zm by taking different values
of m and n in (3.3.6).
However, there does not seem to be any theoretical
way of finding the value of m, for which the results are best possible
(in the sense that for fixed t, the error bound term is
~
minimum).
Some
numerical computations performed for the normal distribution with n+Zm=ZO
and 50 show that, in general, m=l gives better results than m=O or m=Z
for the second approximation (t=l).
4.
The results of Section 3.3 hold for any orthonormal system
which satisfies the
given.~onditions.
Thus, instead of using the
Legendre polynomials, we can also use the trigonometric functions
defined by
¢ (u) = 1,
o
1
1
¢Zk-l(u) = IZ sin Zkrru, ¢Zk(u) =;r cos Zkrru, k>l.
31
These are known to form a complete orthonormal system in [0,11 and also
satisfy (3.3.5).
Obviously, the Fourier coefficient a
linear combination of moments of order statistics.
k
is no longer a
However, a
k
and b
could be evaluated by using numerical integration.
This observation raises an interesting question:
Does there
exist an orthonormal system for a given distribution function, for
which the results are best possible, uniformly in n?
question open for further investigation.
We leave this
k
CHAPTER IV
SINGLE OUTLIER IN A REGRESSION MODEL
4.1.
Formulation of the problem and the test procedures
Let Y1'Y2""'Yn be n independently and normally distributed
observations such that for j=1,2, •.• ,n
(4.1.1)
where
0
2
,
Sl,S2"",Sm (m<n) are unknown parameters and xij's are known
real coefficients.
present in the data.
This is the model when there are no outliers
Letting
Y1
Y2
Z =
nx 1
.
,
S
mx1 =
Sl
x
n x 12
x
S2
x
x
x
.
,
22
2n
X
mXn =
x
Sm
Yn
21
1n
m1
x
m2
...
x
mn
the model can be written as
where I is the identity matrix of order n.
generality, let X be of rank m.
Then the least-square estimate of
.f is given by
~
The residual vector,
~,
is
Without any loss of
(XX,)-l X Y .
33
e
=Ay,
nxl
(4.1.2)
where
(4.1.3)
is an idempotent matrix of rank n-m.
Further
(4.1.4)
The error sum of squares is
(4.1.5)
8
2
= ~'~ = y'
11. Y ,
which is distributed as a2X2 with n-m degrees of freedom.
The method for detecting a single outlier will be based on these
residuals, standardized in some way.
(i=1,2, .•• ,n).
We now assume that A.• > 0
u.
A necessary and sufficient condition for this is given
at the end of Theorem 4.1.1.
We need the following lemma (see e.g.
Rao [33], p. 29).
Lemma 4.1.1.
Let A and D be non-singular square matrices of orders
m and n respectively and B be a mXn matrix.
If B'A-1B + D-
l
is in-
vertible, then A + BDB' is also invertible and
Theorem 4.1.1.
With the notations used as above, let X and 11. be parti-
tioned according to
X
34
where Xl is of order mxk and All is of order kxk.
A necessary and
sufficient condition for All to be positive definite is that X2 is
of rank m.
Proof:
Let
~l
By definitiDn
be the first k components of the residual vector e.
Then by
(4.1.4)
which shows that All is at least positive semidefinite.
If X is of
2
rank m, then X Xi is also of rank m and by Lemma 4.1.1.
2
which proves that All is positive definite.
Conversely, if All is positive definite and X is of rank m,
then on applying the lemma to XX'-X Xi we see that XX'-X1Xi is in1
vertible.
But X Xi
2
=XX'-X1Xi.
Hence X Xi is of rank m and so is
2
Applying Theorem 4.1.1 with k=l, we see that A •• > 0 provided
J.J.
that the rank of the matrix obtained by deleting the i th column of X
is m.
Denote the correlation coefficient between eJ.. and e. by p ...
J
J.J
Then
= A •• / (A .• A •• )
J.J
J.J. J J
k
2.
35
Note that p .. is well defined, because by our assumption A.. > 0 for
1J
11
Let
all i.
(4.1.6)
Pmin = Min P "
iJ
i~j
Pmax = Max p .. ,
i~j 1J
We will test for a single outlier under the following assumption:
(4.1.7)
This condition holds for a wide class of regression models and insures
that the joint distribution of e. and e. is non-singular.
1
J
Now apply-
ing Theorem 4.1.1 with k=2, it follows that assumption (4.1.7) holds,
provided that the rank of the matrix obtained by deleting the i
jth columns of X is m.
th
and
Another sufficient condition for the validity
of (4.1.7) is given later in Corollary 4.2.1.
A test procedure for detecting a single specified outlier with
a shift in location has been given by Srikantan [40].
He also gives
the test statistics when the outlier is not specified.
For this case,
he only obtains the nominal percentage points which control the error
of the first kind at a level not exceeding the specified one.
These
points essentially give an upper limit for the true percentage points.
In this dissertation, we shall consider some additional test statistics
depending on the knowledge available about a 2 •
One-sided statistics,
appropriate for testing an outlier on the right are denoted by U and
the corresponding two-sided statistics are denoted by V.
All these
test statistics are maxima of suitably standardized or studentized
residuals and in each case, the maximum is taken over the set
{l,2, •..
,nL
36
Case 1:
0
2
known
e.
(4.1.8)
U = Max
l
i
Case 2:
0
2
~
A..
0
V = Max
l
k< '
i
2
~~
unknown, but a root mean square estimator sv of
0
based
on V degrees of freedom and independent of Yl'Y2""'Yn is available.
In this case, we can either use external studentization or pooled
studentization.
In the former method, any knowledge about
02
from the
sample is totally ignored, while in the latter, it is pooled with sv'
The test statistics considered for external studentization are
e.
~
U2 = Max
k<
i
SA •• 2
(4.1.9)
V
'
Max
i
~~
and for pooled studentization are
(4.1.10)
e.
~
U = Max _--:::-:;k<
3
i
S A•• 2
P
~~
'
Max
i
where
When no such Sv is available, we will use U and V with v=O.
3
3
(The
case considered by Srikantan).
In view of continuity of el,eZ, ••. ,e
n
and the assumption (4.1.7),
it follows that the maximum in all of these test statistics will occur
for a single i, say i=i.
o
is an outlier.
Large values of U or V then indicate that
To find the percentage points of these test
37
statistics, we need their distribution in the null case (when there
is no outlier).
These, in general, involve the elements of
A.
How-
ever, upper and lower limits for the true percentage points can be
obtained by using the Bonferroni inequalities.
Some general results
for this are given in Section 4.3.
4.2.
Bounds for correlation coefficients
It is of some theoretical interest to get bounds for the
magnitude of the correlation coefficients {P
defined in Section 4.1.
upper bounds for
/p 1.J
.. 1
ij
}, where P
ij
has been
In this section, we will obtain some simple
(i~j).
Among other things, this will also
provide sufficient conditions for the validity of assumption (4.1.7).
Theorem 4.2.1.
I-A ..
2 < M'1.n ( A 1.1.
Pij
jj
(4.2.1)
f
Proo:
Let A be given by equation (4.1.3).
Then for
I-A ..
JJ).
A..
.
1.1.
'
.
A2 ~~A
d'
. . )th
By cons1.' d
er1.ng
t h e equat1.on
11
an equat1.ng
t h e ( 1.,1.
element on both sides we get
A•••
1.1.
Hence for
i~j
A~.
1.1.
that is,
(4.2.2)
Similarly
(4.2.3)
i~j
+ A~.
< A.. ,
1.J - 1.1.
38
Combining (4.2.2) and (4.2.3) we have
l-A ..
2
< M'~n ( A ~~
Pij
jj
l-A ..
JJ)
A.. •
~~
Using equation (4.2.1), a sufficient condition for assumption
(4.1.7) can be easily obtained.
However, the following corollary
gives a rather compact result.
Corollary 4.2.1.
A sufficient condition for the validity of assump-
tion (4.1.7) is that
where A(l) is the smallest of Akk (k=1,2, .•• ,n) and A(2) is the
second smallest.
Proof:
Adding equations (4.2.2) and (4.2.3) we see that for
(4.2.4)
Now i f
then
A•• + A •• > 1
~~
JJ
for all
i~j.
.,
Hence from equation (4.2.4)
for all
and the result follows.
i~j
i~j
39
Remark 1.
We shall show later that for A.. + A.. > 1, equation (4.2.4)
1.1.
JJ
can be replaced by a stronger inequality
(4.2.5)
The proof of this involves some geometric considerations and is deferred
till Section 7.2.
Note that
I-Ai'
l-A j .
2
Min( A 1., A J) < A A - I .
jj
ii - ii+ j j
Consequently, equation (4.2.1) gives better bounds than (4.2.4).
How-
ever, a comparison between (4.2.1) and (4.2.5) is difficult.
So far we have not made any assumption about the homoscedasticity
of the residuals, which we do now.
Anscombe [3] lists a number of
examples where the residuals have equal variance, viz. (n-m)cr 2 /n.
He
has also shown that a lower bound for the magnitude of the largest
correlation coefficient is [m/(n-m) (n-l)]
k
2.
From equation (4.2.5) we
see that in this case
(4.2.6)
IPij I -<--!!--l
n-m
m
=--
n-m
and a sufficient condition for the validity of (4.1.7) is simply n>2m.
In practice, the sample size n is usually much larger than the
number of parameters m ana the condition n>2m is likely to be satisfied.
Moreover, if m is small and n is large, then equation (4.2.6) shows
that the correlations will be close to 0 and can often be ignored.
40
4.3.
Upper percentage points of statistics expressible as maxima
Let zl,z2"",zn be n identically distributed r.v.'s and let
Z = Max{z.: i=1,2, .•. ,n}.
1.
Then clearly
n
Pr(Z>z) = Pr ( U (z. > z) ) .
i=l 1.
(4.3.1)
Applying the first two Bonferroni inequalities to (4.3.1), we get
(4.3.2) n pr(zl>z) -
L pr(z.>z,Zj>z)
i<j
~ Pr(Z>z) ~ n Pr(zl>z).
1.
Let z be the true upper 100a% point of Z.
a
Then an upper
limit z for z is obtained by solving for z in
a
a
(4.3.3)
Similarly, a lower limit z
--01.
for z
a
may be obtained by solving for z
in
(4.3.4)
n Pr(zl>z) -
L pr(z.>z,Zj>z)
i<j
= a.
1.
We here point out that it is possible for (4.3.4) not to· have a solution for all values of a, especially for large a.
(4.3.5)
Pr(z.>z ,z.>z ) < [pr(zl>zN)]2
1.
a J a
""
for all i<j,
then (4.3.2) gives
(4.3.6)
Pr(Z>z a ) > a -
(n-l)a 2
2n -> a - ~ a 2 •
For a small, the lower bound given at (4.3.6) is close to a and hence
z
a
will be approximately equal to z.
a
Further, if
41
(4.3.7)
o
Pr(z.>z ,z.>z )
J..
a
J
a
then equation (4.3.2) shows that z
a
for all i<j
coincides with z .
a
Apart from these simplifications, it is possible to sharpen the
upper limits in some special cases by using the properties of the joint
distribution of zl,z2, .•• ,zn'
This will be illustrated in later
chapters.
4.4.
Measures of performance
We now study the performance of the various test statistics
proposed in Section 4.1 in the non-null situation when outliers are
present.
It is assumed that just one of the observations -- we do not
know which one -- is an outlier.
The null hypothesis, H , specifies
o
that there is no outlier and under H our model is given by equation
o
(4.1.1).
The alternative hypothesis is the union of n mutually
where for j=1,2, .•. ,n
i f j:fk
(4.4.1)
i f j=k
and
~.
J
is defined in (4.1.1).
For outliers on the right 8>0 and for
outliers in either direction 8:fO.
Under H , the distribution of the residual vector e is given
o
by (4.1.4).
From equation (4.1.2) it is clear that under H ,
k
~
is
normally distributed with variance-covariance matrix A 0 2 and mean
42
A E(!IH k )
A[E(!IH o ) + ~ke] ,
where
O.
~k
is an n-vector with k
th
component 1 and all other components
Thus
E(~IHk) = AX'~ + A~ke
A~ke
(4.4.2)
,
because
AX' = [I-X'(XX,)-lX]X'
XI_XI =
O.
Further the error sum of squares, S2, has a noncentral cr 2X2 distribution with n-m degrees of freedom and noncentrality parameter ~~k' where
cr2~~k
E(!'IH k ) A E(yIH )
k
= (~'X + e:ke)A(X'~ + ~ke)
= -k
e:' A -k
e: e 2
(4.4.3)
=
A
kk
e2
•
For a random sample of size n from a N(~,cr2) parent, David
and Paulson [10] have proposed a number of measures of performance.
However, their measures are not suitable in the present case because
the distribution of e under H depends on A.
k
Thus, we are led to
consider some other measures, which in special cases reduce to the
measures given by David and Paulson.
Note that, under H , the U and V test statistics of Section
o
4.1 are of the form Z = Max{zi: i=1,2, ••• ,n}, where zl,z2, ••. ,zn
43
are identically distributed random variables.
With the notation used
in Section 4.3, let
P
k
= Probability that zk is significantly large
when
1\
is true
(4.4.4)
Q = Probability of rejecting H when H is true
k
o
k
(4.4.5)
=
Pr(Z > za l1\)'
In general, both P and Q depend on k.
k
k
Also, any measure of
performance must be symmetric with respect to hypotheses H ,H , •.• ,H •
n
I 2
If we now assume that a priori each observation has an equal chance of
being declared an outlier, then the following measures seem reasonable:
(4.4.6)
P = Min P ,
a
k
k
(4.4.7)
Qa = Min Qk'
k
I
P =n
b
I
Q = -n
b
n
L Pk ,
k=l
n
L Qk'
k=l
where the minimum is taken over the set {1,2, ••• ,n}.
Pa
~
P and Q
b
a
~
Qb'
In general
But, in the special case when Pk and Qk do not
depend on k, then
P = P = PI = Probability that zl is significantly
a
b
large when the alternative hypothesis
is true
and
Q = Q = Q = Power function.
I
a
b
44
These are the two measures considered in detail by David and
Paulson [10].
It is easy to show that
(4.4.8)
k=1,2, ..• ,n.
Hence
Qa -> Pa
Evaluation of Q is quite laborious, but again Bonferroni inequalities
k
can be used to get upper and lower bounds for Qk'
Thus
Max(Pk,gk) ~ Qk ~ Qk'
(4.4.9)
where
n
=
L Pr(z, > zal~)
Qk
i=l
].
n
= P + L Pr(z. > zalHk)
k
i=l
].
i;&k
(4.4.10)
and
(4.4.11)
gk = Qk - L Pr(z. > z , z.J > za IHk ) .
].
a
].'<'J
Using these results, lower and upper bounds for Q and Q can
a
b
be obtained.
In practice, z is available for very few regression
a
models and as an approximation we will use the upper limit
(4.4.4) and (4.4.5).
4.5.
za
in
This will give a lower bound for P and Qk'
k
Examples
We shall consider the following examples of the fixed effects
models, which are of interest in practice:
45
Example 4.5.1.
Sample from a N(~.a2) parent.
In this case. statistics
related to U and V have been studied by several authors (see e.g. Chew
[6] for a good review of the literature).
We have included it here for
the sake of completeness and performance studies.
The variance of any
residual is (n-l)a 2/n and the rank of matrix A is n-l.
The correlation
between any two residuals is -l/(n-l) and hence the condition (4.1.7)
holds for n > 3.
Example 4.5.2.
the i
th
One-way layout.
Let Yij be the jth observation from
class (j=1.2 •••• ,n.; i=1,2, .•• ,m; m>l) and let e .. be the
~
1
corresponding residual, i.e.
where
n.
1
1
I
=n . . 1Y'j'
1
1
J=
It is easy to show that the variance of e .. is (n.-l)if/n. and the rank
1J
1
1
m
~
of matrix A is n-m, where n =
i=l
n •
i
J-l/(n.-l)
p(eij,ei'j') -
1 :
Further
i f i=i' ,j:Fj'
otherwise.
This shows that p
= 0 and the condition (4.1.7) holds provided that
max
n.1 >
3 for all i.
"
1n the l. th
b servat10n
h
Let y .. b e teo
1J
1 2 , .•. ,r; J=
. 1 , 2 , ••• ,c ) and
rowan d J.th co 1umn 0 f a two-way 1ayout (.1=,
Example 4.5.3.
Two-way layout.
let e .. be the corresponding residual, i.e.
1J
46
where
1
Yio
=c
1
c
L YiJ·,
j=l
YoJ.
= -
r
\' Y •. , Y oo
L
r i=l
l.J
Here
n
= rc
and
m = r+c-l.
The variance of e .. is (n-m)a 2 /n and the rank of matrix A is n-m.
l.J
Further
p(eij,ei'j') =
-1/ (c-l)
if
i=i', j:fj'
-l/(r-l)
if
i:Fi', j=j'
1/ «r-l) (c-l»
if
.J.. ,
l.rl. ,
j:fj , •
From this, it follows that the correlations are both positive and
negative.
But these can assume at most three different values.
Also,
the condition (4.1.7) holds provided both rand c are greater than or
equal to 3.
CHAPTER V
DISTRIBUTION THEORY WHEN VARIANCE IS KNOWN
5.1.
Introduction
In this chapter, we will study the statistics Ul and VI of
Section 4.1.
For convenience, we shall drop the suffixes and write
the test statistics as U and V.
Letting
e.
(5.1.1)
b. =
~
~
---'--ck;:-
a A••
,
i=1,2, ••• ,n,
2
~~
we have from (4.1.8)
(5.1.2)
U = Max b ,
i
i
V = Maxlb.J.
i
Further from equation (4.1.4), under H '
o
~
~'
=
(b l ,b , .•• ,b ) has a
2
n
singular normal distribution with means zero, unit variances and
correlations {P
ij
}.
Hence the joint distribution of bi' b
N(O,O,l,l,p .. ), which in view of (4.1.7) is non-singular.
1.J
j
is
Moreover,
marginally, each b. has a unit normal distribution.
~
In Section 5.2, we give upper and lower limits for the true
percentage points of U and V.
Some results regarding the performance
of these statistics are given in Section 5.3.
are considered in Section 5.4.
Finally some applications
48
5.2.
Percentage points
5.2.1.
Upper limits
Following the notations introduced in Section 4.3, we shall
denote the true upper 100a% points of Uand V by u and v respectively.
a
a
Then the upper limits for u and v , denoted by u- and v- respectivea
a
a
a
ly, are obtained by solving
(5.2.1)
that is,
=1
<!l(u )
a
_ a
=1
n
a
- 2n'
where
a
<!lea) =
J
_00
5.2.2.
1
ffTI
Improved upper limits
We now show that it is always possible to get better upper
limits for v
a
than
va
of equation (5.2.1).
We first state some in-
equalities involving multivariate normal distributions, which are of
..
value in the present context (see e.g. "Sidak [38]).
Theorem 5.2.1.
Let (Xl, ••• ,X ) be a random vector having an-variate
n
normal distribution with mean values 0, variances 1, and having, under
the probability law P , the correlation matrix K
K
=
the probability law P , the correlation matrix R =
R
k"1J -< r 1'j
then
for all i,j,
»,
((r, ,».
((k
ij
1J
and, under
If
49
(5.2.2)
Corollary 5.2.1.
If k .. < 0 for
1J
i~j=1,2, ...
,n, then
n
PK(Xl~cl""'Xn-<cn)
(5.2.3)
Proof:
Take R = I
Corollary 5.2.2.
n
< IT Pr(X.<c.).
i=l
1- 1
in (5.2.2).
If r
ij
> 0 for
i~j=1,2, .•.
,n, then
(5.2.4)
Proof:
Take K = I
Theorem 5.2.2.
n
in (5.2.2).
Let (Xl, •.. ,X ) have a multivariate normal distribun
tion with mean values 0 and with an arbitrary correlation matrix, then
n
pr(lxll~cl,·.·,lxnl~cn) ~ IT pr(IXil~ci)
i=l
(5.2.5)
for any non-negative numbers c ,c 2 ' ••• ,c u •
l
We now apply Theorem 5.2.2 to get improved upper limits for v •
a
Theorem 5.2.3.
v'a
as obtained by solving
1
pr(lbll>v~) = 1 - (l_a)n
(5.2.6)
gives an upper limit for v
of (5.2.1).
Proof:
By definition
a
which is lower (and hence better) than v
a
50
Pr(V>v' ) = Pr(Max!b.l>v,)
a
i
l.
a
n
=
1-Pr
(n
(I b . I<v ' ) )
i=l
a
l.
n
< 1 -
IT Pr(lb.!<v')
i=l
l. -
by (5.2.5)
1
n
n
IT (I-a)
= 1 -
a
by (5.2.6)
i=l
=
a.
Hence v' is a nominal upper 1000.% point of V.
a
show that
Further it is easy to
1
a < l-(l-a)n,
nthat is,
This implies that
v a ->
v'a
'
which shows that v' is closer to the true percentage point than v ;
a
a
i.e., v~ is an improved upper limit for va'
1
1-(1-0.)
n
~
Note that
0. 2
n + 2n
a
and the improvement is slight if a is small and n is large.
We now consider the statistic U.
If p.
> 0 then an applicaml.n -
tion of (5.2.4) shows that an improved upper limit for u
1
pr(bl>u~) = l-(l-a)n.
a
is u~, where
51
Remark 1.
Regression models satisfying P. > 0 are rare.
mJ.n -
example is given below.
does not hold.
One such
However, for this example the condition (4.1.7)
We are not aware of any example where both these con-
ditions hold simultaneously.
Example 5.2.1.
Let the design matrix X of the regression model be
-1
-1
1
-1
Then
xx'
and
A=
so that P
ij
5.2.3.
(i~j)
~
~
o
o
~
~
0
0
is either 0 or 1.
Lower bound for the significance level attained
It is clear that by using the upper limits instead of the true
percentage points, we are actually working at a reduced significance
level.
But how big is the reduction?
Obviously, this will depend on
the particular regression model under consideration.
In an important
case when P
< 0, the following theorem shows that the significance
max level for the one-sided test statistic cannot drop below l-e
using u •
a
Theorem 5.2.4.
If P
< 0, then
max -
-a by
52
Pr(U>u ) > 1-(1 _ a)n > 1-e-a
a
n ,
(5.2.7)
where u is given by (5.2.1).
a
Proof:
For any c,
n
Pr(U>c) =pr(U(b.>c))
i=l 1.
n
= 1-Pr(
n (b.<c))
i=l
1.-
n
> 1 (5.2.8)
IT Pr(b.<c)
i=l
1.-
= 1- [ <P (c) ]
n
by (5.2.3)
•
The first inequality of (5.2.7) follows if we set c = u and use
a
equation (5.2.1).
To prove the second inequality, note that the
function
a t
h (t) = (1 - -) for t>l, O<a<l
a
t
--is a monotonic non-decreasing function of t and tends to e
to
-a as t tends
00.
Note that the same approach applied to the second Bonferroni
bound (equation (4.3.6)) gives
(n-1)a 2
Pr (U>u ) > a - ~::--<-~
a
2n
a n
This is slightly less than 1-(1 - -) •
n
5.2.4.
Lower limits
For the lower limits u and v , we solve for u and v in
--a
--a
53
(5.2.9)
n Pr(bl>u) -
l
Pr(b.>u, b.>u) = a
1.
J
l
Pr ( lb. I>v, lb. I>v) = a
1.
J
i<j
and
n Pr ( Ib11 >v) -
(5.2.10)
respectively.
i<j
Let
00 00
(5.2.11)
L(h,k,p)
=J J
h k
1
exp[1
(x 2_2pxy+y2)] dydx.
2TI/l- p2
2(1-p2)
Then
Pr(b.>u, b.>u)
1.
J
= L(u,u,P 1.J
.. )
and
pr(lb1.. I>v, Ibjl>v)
= 2L(v,v,p 1.J
.. )
+ 2L(v,v,-p .. ).
1.J
The function L(h,k,p) has been tabulated by the National Bureau
of Standards [29].
But, here we are interested in the "tail" prob-
abilities and these tables are not of much use.
It is well known that
(see e.g. [29]) for positive values of hand k
TI
(5.2.12)
exp[-~(h2+k2-2hk cos w)cosec 2w]dW'
J
p
Therefore
1
L(h,h,p) = 2~
/I
f
TI
arc cos p
_h 2
exp[l+cos W] dw •
This integral can be evaluated by using the quadrature method
given in the Appendix.
It is clear that unless p .. can assume very few
1.J
values, the evaluation of u and v will require a good deal of compu-a.
-a.
tation.
54
In the special case when P
max
~
0, the solution of equation
1
(5.2.13)
(I-a)
<Il(u) =
n
for u, gives another lower limit, u', for u.
-a
a
This follows immediately
by setting c=u in equation (5.2.8).
In general, u' is expected to be
slightly inferior to u
However, if Pij's are close to 0
-a
-a
of (5.2.9).
then it may be even superior to u .
Since
-a
1
(l_a)n ;
it follows, from (5.2.1) and (5.2.13), that the difference between the
upper limit u
a
and lower limit u' will be small.
-a
for a=.05 and n=lO we have
5.3.
ua
Thus, for example,
= 2.576 and u'=2.568.
-a
Performance of test statistics
In this section, we shall continue to use the notation intro-
duced in Section 4.4.
From equations (4.4.2) and (5.1.1), it follows
that under H the joint distribution of b ,b , .•• ,b is multivariate
k
l 2
n
normal with unit variances, correlation matrix «P
ij
)) and means given
by
(5.3.1)
=
°ik •
8
i=1,2, ••• ,n,
(J
where
A
(5.3.2)
= - - 1ik"I
/\ii
~
Since all the measures of performance depend on the ratio 8/a, we may
without loss of generality take a=l.
55
One-sided test statistic U.
From equation (4.4.4)
(5.3.3)
For fixed 8>0, P will attain a minimum when A is a minimum.
k
kk
P
=
a
Rence
Min P
k
k
where A(l) = Min Akk •
k
The computation of P requires the evaluation of at most n
b
different normal probabilities given at (5.3.3).
then P
a
= Pb = Pl'
If all A are equal,
kk
But if anyone of the A is much smaller than the
kk
others, then P will be a better measure of performance and will be
b
worth computing.
Similarly, equation (4.4.5) gives
n
(5.3.4)
=
Pr ( U (b. >
. 1
1.=
1.
I
u
H. )) •
ex. -1<.
Equations (4.4.10) and (4.4.11) then give
n
L ¢(-uex.
(5.3.5)
. 1
+ o'k8 )
1.
1.=
i;'k
and
=
(5.3.6)
Qk -
L Pr(b.>u
i<j
1.
ex.
, b.>u IRk)
J
ex.
56
where the function L has been defined at (5.2.11).
Since the bi-
variate probabilities in (5.3.6) are difficult to obtain, it is usefu1 to note that in the important case when p
< 0
max -
where
n
(5.3.7)
Proof:
gk'
=1
-
IT ~(u -C'ke ).
• 1
ex, J.
J.=
From equations (5.3.1) and (5.3.4)
n
Qk = Pr(
U (w,
i=l
J.
> u -a'ke)),
ex, J.
where w ,w , .•. ,w have a multinormal distribution with mean values 0,
1 2
n
variances 1 and correlation matrix «p, ,)).
J.J
Now on applying the same
technique as in the proof of Theorem 5.2.4, we get
Qk
~
1 -
n
IT Pr(w, < u -C'ke),
'1
J.- ex, J.
J.=
that is, Qk ~~k'
A direct comparison between (5.3.6) and (5.3.7) is difficult,
but (5.3.6) is expected to give better results, unless the p,,'s are
J.J
close to O.
Some comparative results for Example 4.5.1 are given in
the next section.
However (5.3.7) is better than (4.4.8), because
from (5.3.7)
n
•
IT ~(u -c, 8)
i=l
ex, J.k
i;'k
57
Two-sided test statistic V.
Now
(5.3.8)
To find P , we need the following easily proved lemma.
a
Lemma 5.3.1.
For
A~O, a~O
and fixed 8
h(A) = $(-a+A8) + $(-a-A8)
is a non-decreasing function of A.
Applying the lemma, we see that
(5.3.9)
where A(l)
= Min
k
Akk •
Similarly, from equations (4.4.10) and (4.4.11) we get
n
L Pr (I b '\ >v
i=l
1.
a
IHk )
i~k
n
8)
L [$(-va +O'k
1.
(5.3.10)
'1
1.=
+ $(-va -O'k8)]
1.
i~k
and
(5.3.11)
Slk =
Qk
-
L Pr ( lb.1. I>va ,
i<j
I
I
\ b , >v H ) •
J
a k
The bivariate probability at (5.3.11) is a linear function of
at most 4 "L" functions defined in (5.2.11).
For computational purposes,
58
we give the closed form for
Let
h.1 = va -O'ke,
1
g.1 = va +O'ke,
1
i=1,2
and
Then
(5.3.12)
Note that P also gives a lower limit for Q and may be better than
k
k
Qk in some cases.
5.4.
Thus
Applications
We first consider Example 4.5.1 with 0=1.
In this case, the
true percentage points of the statistic A = Max(y.-y) have been
1
i
1
tabulated by Grubbs [17] for n=3(1)25 and several values of a.
Note
that
From equation (5.3.4), it is easy to see that in this case Qa=Qb=Q •
l
The lower bounds for Q given at (5.3.6) and (5.3.7) simplify to
1
59
(5.4.1)
and
(5.4.2)
where
e
+ -----,1.<:-
u
a
[n(n-1)]
2
and
P
1
= -
n-1 •
Q1 has been tabulated by David and Paulson [10] for a=.Ol and
.05 and several values of n.
and n=5(5)20.
For
h1~0
Table 5.4.1 compares
Q1 and
Qi
for a=.05
and h >0, the bivariate probabilities at
2
(5.4.1) were obtained by using equation (5.2.12) and the quadrature
method described in the Appendix.
For other values of h , h , these
1
2
were obtained by using the expressions given in [29].
In general,
Qi
is only slightly inferior to Q1; the maximum difference between Q
1
and
Qi
is less than .003.
Results for a=.Ol are not tabled here, but
the differences are even less in this case.
Both upper and lower Bonferroni type limits of the statistic
Maxly.-Yl are tabulated by Halperin et a1. [20] (corresponding to
i
].
their case m=oo) for a=.05 and .01 and various sample sizes.
However,
their upper limits can be slightly decreased by using Theorem 5.2.3.
We now study the performance of
-I
V = (n/ (n-1)) 1.<2 Max Iy .-y
1
i
].
60
by using the improved upper limits
v'a
given at (5.2.6).
Since all
Qk's are equal, the power function is Qa=Qb=Q l with a lower bound given
Using v~ in place of va in equations (5.3.8) and (5.3.11)
by (5.3.13).
we have
(5.4.3)
where
and
gl = PI +
(n-l)[~(-h3)
~(-h4)]
+
-(n-l)[L(h ,h ,p) + L(h ,h ,p) + L(h ,h ,-p) + L(h ,h ,-p)]
2 3
l 3
2 4
l 4
n-l
-( 2 )[L(h ,h ,p) + L(h ,h ,p) + 2L(h ,h ,-p)],
4 4
3 3
3 4
where
hI = VIa
h
3
=
VI +
a
-
(n-l)~8
n
8
1
[n(n-l)]~
'
,
h
h
4
2
=
=
VI +
a
VI a
(n-l)~8
n
'
8
[n(n-l)]
k
,
2
and p = -l/(n-l).
Since PI and gl are symmetric functions of 8, we only need the
values for 8>0.
For a=.05, the lower bound (5.4.3) is tabled in
Table 5.4.2 for n=3(l)lO(5)30.
was greater than gl for
8~5
It was observed that, in general, PI
and in this case the difference (Pl-gl )
was largest for small values of n.
Note that for fixed 8, the tabu-
lated values first increase and then decrease as the sample size
increases.
61
For a one-way layout (Example 4.5.2), P
< O.
max -
Hence the
lower bound (5.2.7) holds and equation (5.2.13) can be used to get
the lower limits for u.
a
Other results are also expected to be
similar to that of previous example.
We next consider the statistic U for a two-way layout with r
rows and c columns (Example 4.5.3).
The upper limit, u , is immea
diately given by (5.2.1) with n=rc.
The lower limit,
~,
is
obtained by solving equation (5.2.9), which in the present case
simplifies to
where
P3 = l/[(r-l)(c-l)].
P2 = -l/(c-l),
PI = -l/(r-l),
Define a function h(u) by
h(u) =
~(u)
+
~[(r-I)L(u,u,PI)
+ (c-I)L(u,u,P 2)
+ (r-I)(c-I)L(u,u,P 3 )]·
Then u
~
(5.4.4)
is the solution of the equation
h(u)
a
rc
=I
To solve this equation, note that
a
h(u ) > I
rc
a
We also know that the lower limit u
~
lies between 0 and u.
a
find a value d (if possible) such that
We then
62
h(~ -d) < 1
(5.4.5)
a
a
rc
This shows that equation (5.4.4) has a root between u -d and u.
a
a
Here,
we increased d in steps of 0.1 till the inequality (5.4.5) was satisfied.
This method then gives a value of d for which
u -d < u
< u -d+O.l.
a --a - a
The range of u was then narrowed down to get 3-decimal place accuracy.
-a
Values of u are given in Table 5.4.3 for 0.=.05 and r=3(1)12, c=3(1)7.
-a
The table is symmetric in rand c and the values above the diagonal are
For comparison purposes, the upper limit ~
not tabulated.
included.
is also
a
This table shows that the difference between upper and lower
limits is small; the largest difference is .023 for r=c=3.
Using the upper limit ~o. in (5.3.3) we have
P = P = P = ~(-u + «r-l)(c-l))~e).
1
a
b
a
rc
This is tabulated in Table 5.4.4 for 0.=.05 and r=c=3(1)12.
It appears
that the values of PI do not depend greatly on the total number of
observations rc, especially for
e~3.
Corresponding results for the two-sided test statistic V are
given in Tables 5.4.5 and 5.4.6.
The upper and lower limits were
obtained from equations (5.2.6) and (5.2.10).
The maximum difference
between the upper and lower limits is .075, when r=c=3.
ference decreases rapidly as rand c increase.
it is only .010.
The dif-
Thus for r=5 and c=6,
63
Table 5.4.1. Comparison between the lower bounds (5.4.1)
(given in top row) and (5.4.2) (given in bottom row) for a=.05
X
1
2
3
4
5
6
5
.0975
.0960
.3059
.3036
.6435
.6417
.8956
.8950
.9842
.9841
.9988
.9988
10
.0845
.0832
.2700
.2676
.6168
.6143
.8915
.8905
.9853
.9851
.9991
.9991
15
.0777
.0768
.2445
.2425
.5878
.5855
.8795
.8785
.9835
.9833
.9990
.9990
20
.0733
.0726
.2258
.2242
.5633
.5613
.8677
.8668
.9813
.9812
.9988
.9989
Table 5.4.2.
x
Lower bound (5.4.3) for Q for a=.05
1
0
1
2
3
4
5
6
3
4
5
6
7
8
9
10
.045
.047
.048
.049
.049
.049
.049
.049
.094
.092
.088
.085
.083
.080
.078
.077
.271
.267
.259
.251
.243
.236
.230
.224
.562
.574
.573
.569
.563
.556
.550
.543
.816
.842
.851
.853
.853
.852
.850
.847
.955
.967
.972
.973
.974
.974
.974
.974
.994
.997
.997
.998
.998
.998
.998
.998
15
20
25
30
.050
.050
.050
.050
.071
.068
.065
.063
.201
.185
.173
.164
.513
.490
.470
.454
.833
.820
.807
.797
.971
.968
.965
.963
.998
.998
.997
.997
64
Table 5.4.3. Upper and lower limits of the statistic U1
for a two-way layout with r rows and c columns and a=.05
I~
3
4
6
5
7
3
2.539
2.516
4
2.638
2.622
2.734
2.722
5
2.713
2.700
2.807
2.796
2.878
2.868
6
2.773
2.761
2.865
2.855
2.935
2.926
2.991
2.983
7
2.823
2.812
2.914
2.904
2.983
2.974
3.038
3.030
3.084
3.076
8
2.865
2.855
2.955
2.946
3.023
3.015
3.078
3.070
3.124
3.116
9
2.902
2.893
2.991
2.983
3.059
3.051
3.113
3.105
3.158
3.151
10
2.935
2.926
3.023
3.015
3.090
3.082
3.144
3.136
3.189
3.181
11
2.965
2.956
3.052
3.044
3.118
3.111
3.172
3.164
3.216
3.209
12
2.991
2.983
3.078
3.070
3.144
3.136
3.197
3.189
3.241
3.234
I
Table 5.4.4. Performance PI of the test statistic U1 for a
two-way layout with r rows and c columns and a=.05
X
3
4
5
6
7
8
9
10
11
12
1
.031
.024
.019
' .015
.013
.011
.010
.008
.007
.007
2
3
4
5
6
7
.114
.109
.101
.093
.085
.079
.073
.068
.064
.060
.295
.314
.316
.312
.304
.295
.286
.277
.269
.261
.551
.605
.626
.634
.635
.632
.627
.622
.615
.608
.786
.845
.869
.880
.885
.887
.888
.887
.885
.883
.928
.961
.973
.978
.980
.982
.982
.983
.983
.982
.983
.994
.997
.998
.998
.998
.999
.999
.999
.999
65
Table 5.4.5. Upper and lower limits of the statistic V1
for a two-way layout with r rows and c columns and a=.05
I>'Z
3
4
6
5
7
3
2.766
2.691
4
2.858
2.812
2.948
2.922
5
2.928
2.893
3.016
2.998
3.083
3.071
6
2.984
2.955
3.071
3.057
3.137
3.127
3.190
3.183
7
3.031
3.006
3.116
3.105
3.182
3.174
3.234
3.228
3.278
3.273
8
3.071
3.048
3.156
3.145
3.220
3.214
3.272
3.267
3.315
3.312
9
3.106
3.085
3.190
3.180
3.254
3.248
3.305
3.301
3.348
3.345
10
3.137
3.118
3.220
3.212
3.283
3.278
3.335
3.331
3.377
3.374
11
3.165
3.147
3.247
3.240
3.310
3.306
3.361
3.358
3.403
3.401
12
3.190
3.173
3.272
3.265
3·.335
3.330
3.385
3.382
3.427
3.425
Table 5.4.6. Performance P1 of the test statistic V1 for a
two-way layout with r rows and c columns and a=.05
~
3
4
5
6
7
8
9
10
11
12
1
2
3
4
5
6
7
.018
.014
.011
.009
.008
.007
.006
.005
.004
.004
.076
.074
.069
.064
.059
.055
.051
.047
.044
.041
.222
.243
.247
.245
.240
.233
.227
.219
.213
.206
.461
.521
.547
.557
.560
.559
.555
.550
.544
.538
.715
.789
.820
.836
.843
.847
.848
.848
.846
.844
.891
.940
.957
.965
.969
.971
.972
.973
.973
.973
.971
.989
.994
.996
.997
.997
.997
.998
.998
.998
CHAPTER VI
DISTRIBUTION THEORY WHEN VARIANCE IS
UNKNOWN -- EXTERNAL STUDENTIZATION
6.1.
Introduction
Let b
i
be as defined in (5.1.1) and let
b.
e.
(6.1.1)
where s
ti
v
=
_---::1."""7"""
s v >".i
1.
~2
1.
= s /a '
i=l,Z, .•• ,n,
V
is a root mean square estimator of a based on v degrees of
freedom and independent of Yl'YZ, .•• ,Y •
n
We now take a=l.
The test
statistics for locating a single outlier in this case are U and V
2
z
defined at (4.1.9).
(6.1.Z)
Dropping the suffixes from U and V ' we have
Z
z
U=Maxt.,
i
1.
V = Maxlt.J.
i
1.
From (6.1.1) and (4.1.4) it follows that, under H , each t. has
o
1.
a Student t distribution with V degrees of freedom and the joint distribution of, say, t
l
and t
z is
bivariate t with v degrees of freedom
(see e.g. Dunnett and Sobel [13]) with density given by
(6.1.3)
where p= P.i.Z.
In this chapter, our approach is similar to that of Chapter V.
The percentage points of these statistics are given in Section 6.2.
67
In Section 6.3, we study their performance.
Some applications are
considered in Section 6.4.
6.2.
Percentage points
6.2.1.
Upper limits
The solution of equations
(6.2.1)
give an upper limit for u
and v
a
respectively, where u and v are the
a
a
a
true upper 100a% points of U and V.
The c.d.f. of the t distribution
has been tabulated by Hartley and Pearson [22].
and v
a
can be readily obtained.
Using their tables
Alternatively, tables of the incomplete
beta function can be used for this purpose, since for
Pr(t 1 >a) =
ua
~
I
a~O
(~, ~),
V
v+a 2
where
(6.2.2)
=
f
x
o
1
B(p,q)
-1
q-1
u P (l-u)
du
is the incomplete beta function.
6.2.2.
Improved upper limits
Similar to the case of known variance, here also it is possible
to improve the upper limits in some cases.
Note that Corollary 5.2.1
has no analogue in the present case (see Hume [23]).
However, the
following theorem gives a partial analogue of Corollary 5.2.2.
Theorem 6.2.1.
If P
> 0 for all i#j, then
ij
68
n
(6.2.3)
where t
pr(t <c ' .•.. ,t <c ) > IT Pr(t.<c.),
1- 1
n- n
i=l
~- ~
i
is given in (6.1.1) (with 0=1) and c ,c , •.• ,c are any non1 2
n
negative numbers.
Proof:
00
(6.2.4)
=J
o
Pr(t <c , .•. ,t <c )
n- n
1- 1
where f(sv) is the p.d.f. of sv'
Pr(b <c s , .•• ,b <c s )·f(s )ds ,
1- 1 v
n-nv
V
V
Now applying Corollary 5.2.2 to the
first term of the integrand on the R.H.S. of (6.2.4), we get
oon
pr(t <c , ••• ,t <c ) > J IT Pr(b.<c.s )f(s )ds
1- 1
n- n - 0 i=l
~- ~ v
v v
n
= E{ IT Pr(b.<c.s )}
. 1
~=
(6.2.5)
~- ~
V
n
>
IT E (Pr (b. <c. s ))
i=l
~- ~ V
n
= IT Pr(t.<c.).
i=l
~- ~
The inequality at (6.2.5) follows on using a result due to Kimball
[25] •
The analogue of Theorem 5.2.2 is given in Theorem 6.2.2 and is
v
..
due to Sidak (see e.g. [38]).
Theorem 6.2.2.
With the notations of Theorem 6.1.1,
n
IT Pr ( It. I<c . ) •
. 1
~=
~
-
~
69
Corresponding to Theorem 5.2.3, we now have
Theorem 6.2.3.
The solution of
1
(6.2.6)
=
l-(l-a)
n
gives an improved upper limit for v •
a
Similarly, if p. > 0 then an improved upper limit for u is
m1n a
ti'a' where
1
= l-(l-a)
6.2.3.
n
Lower limits
The evaluation of the lower limits u
--a.
and v
--a.
is similar to the
case of known variance with the exception that now we have to deal with
the bivariate t distribution.
Denote pr(tl>h, t >k) by L(h,k,p,v), where the joint distribu2
tion of t
l
and t
z
is given at (6.1.3).
Pr(ltll>h,ltzl>k]
Then
= ZL(h,k,p,v) + ZL(h,k,-p,v).
Tables prepared by Dunnett and Sobel [13] and Siotani [39] could be
used to evaluate such probabilities.
However, for computational
purposes, it is convenient to express the double integral
00
00
as a single integral and to use numerical integration.
note that for h,k positive
To this end
70
00
L(h,k,p,v)
= J Pr(b 1>hs , b 2>ks ) • f(s )ds ,
o
V
V
V
V
where f(sv) is the p.d.f. of sv.
Now using (5.2.12) and interchanging
the order of integration, we get
L(h,k,p,v) = 2~
'IT
00
J exp[-~s2(h2+k2-2hk cos w)cosec 2 w]
J
arc cos p 0
2.
(~)
k:V
2
r(~)
1
= 2'IT
v
'IT
2
J
dw.
arc cos
For h=k, this reduces to
1
L(h,h,p,v) = 2'IT
6.3.
'IT
J
arc cos p
2h 2
[1 + v(l+cos w)]
v
2
dw.
Performance of test statistics
To study the performance of U and V, we shall proceed similarly
to the case of known variance.
From equations (5.3.1) and (6.1.1), it
is clear that under H , we can write
k
where 0ik is given by equation (5.3.2) and z has a standard unit normal
distribution independent of sv.
Now letting
(6.3.1)
it follows that under H , t has a noncentra1 t distribution with V
k
i
71
degrees of freedom and noncentrality parameter 6
notation 6
ik
ik
•
Note that the
is in agreement with equation (4.4.3) for 0=1.
One-sided statistic U.
Pk
From equation (4.4.4)
= Pr(tk>ua !Hk ),
k=l, 2, ••• ,n
= Pr(t'v,u" >u),
a
(6.3.2)
kk
where t' 6
v, kk
has the noncentral t distribution with V d.f. and non-
centrality parameter 6
k
= A~k8.
kk
tion are needed to evaluate P .
k
Tables of the noncentral t distribuHowever, the existing tables (see e.g.
[27]) require a considerable amount of interpolation and one may use
the normal approximation to the c.d.f. of noncentral t distribution
(see e.g. [24] and [1] p. 949) given by
Pr(t' ,,~t ) • ~(
(6.3.3)
t
0
v,~ o '
(1 - ..!..)-6
4
V)
2
(1
t
k2
•
2v
This gives a reasonably good approximation.
v=5, 6= (v+l)
1
+ .-Q.)~
Thus, for example, for
and t =5, the approximate value from (6.3.3) is .891
o
while the exact value is .900.
improves with increasing
The accuracy of the approximation
v.
To find Pa' note that P can be rewritten as
k
00
=
fo Pr(z>ua s v-Okk8)f(sv )dsv
00
=
fo ~(-ua s v+Akkk 8)f(sv )dsv
2
where f(sv) is the p.d.f. of sv'
occurs when A is a minimum and hence
kk
72
(6.3.4)
P = Min P = Pr(t' A>U ),
k
a
k
V,Ll ex
where
(6.3.5)
Similarly from (4.4.10)
n
Qk = Pk + L Pr(t.>u I~)
i=l
J.
ex
i:fk
n
= Pk + L Pr(t' 6 >u).
i=l
v, ik ex
i:fk
The lower bound Q given at (4.4.11) involves the bivariate noncentra1
k
t distribution and is not considered here.
Two-sided statistic V.
Now
Pk = Pr(1tkl>vexIHk)'
k=l, 2, •••• ,n
= Pr(lt' A I>v)
v,Llkk
ex
00
=
of pr(lz+o kk81>vex s V )f(sV )dsV
00
= of
[~(-v s +okk8)~(-v s -okk8)]of(s )ds
ex V
ex V
v V
Now using Lemma 5.3.1, we have
(6.3.6)
P = Min P = Pr(lt' AI>v ),
k
a
k
V,Ll
ex
where 6 is given at (6.3.5).
Expression for the upper bound for Q is similar.
k
73
6.4.
Applications
For Example 4.5.1, the true percentage points of
y.-y
C1 = Max (-~-)
i
sv
have been tabulated by David [9] for n=3(1)12 and several values of
v and a.
Further, the performance P has been studied by David and
a
Paulson [10].
The upper and lower limits for the true percentage points of
Max
i
are given by Halperin et al. [20] for a=.05, .01 and several values of
nand V.
As for the case of known variance, the upper limits can be
slightly improved by using (6.2.6).
We now turn our attention to the general regression model.
From equations (6.2.1) and (6.2.6), it follows that the upper limits
depend only on a,n and V.
To fix the ideas, we consider the two-sided
statistic V and use the improved upper limit
v'.
a
Equation (6.3.6)
then reduces to
Pa
(6.4.1)
=
where 6. is given at (6.3.5).
pr(\t' A!>V'),
v,u
a
This shows that for fixed a,n and v, P
a
depends only on 6. and hence is a conservative measure of performance.
This is given in Table 6.4.2 for n=4(2)10; v=5(5)20 and a=.05.
purposes of comparison P
a
The values of
n,V and a.
v'a
for
0
For
known (v=oo) has also been included.
are tabulated in Table 6.4.1 for the above values of
74
For the numerical work involving the noncentral t distribution,
the series expansion for pr(lt~,~I~to)' originally due to Craig [7]
was found most convenient.
However, as pointed out by Amos [2], Craig's
formula contains an error.
The correct formula is
(6.4.2)
00
I
I
(j+1-'2' ~v),
t2
j=O
o
t 2 +v
o
where I (p,q) is given at (6.2.2).
x
00
(6.4.3) Pr(lt' AI>t )
V,LI
0
I
j=O
From (6.4.2) we get
1
(~~2)j • jT
I
V
(~V, j +1-'2) •
V+t 2
o
This series was used for the numerical work.
The incomplete beta
functions were obtained by using the quadrature method given in the
Appendix and the series was summed to give at least 3-decimal place
accuracy.
The normal approximation (6.3.3) was also studied in this
connection.
For all the values tabulated, it was found that the approx-
imation is quite good for all
v~lO
and in this case the largest dif-
ference between the tabled and the approximate values was .003.
for v=5, the largest difference was .012.
Even
Further, for fixed V, the
approximation appeared to be less accurate for large values of
v'.
ex.
Note that Table 6.4.2 can be used to get any P and hence Pb'
k
However, the normal approximation (6.3.3) is recommended for v>lO.
75
Table 6.4.1.
~
4
6
8
10
5
10
15
20
3.791
3.027
2.827
2.736
2.491
4.197
3.264
3.026
2.918
2.631
4.501
3.434
3.166
3.045
2.727
4.747
3.568
3.275
3.143
2.800
00
Table 6.4.2.
~
4
5
10
15
20
00
6
5
10
15
20
00
8
5
10
15
20
00
10
VIa from equation (6.2.6) for a=.05
Improved upper limits
5
10
15
20
00
Performance Pa given at equation (6.4.1)
for the statistic V for a=.05
2
1
2
3
4
5
.040
.051
.056
.059
.068
.147
.215
.244
.260
.312
.355
.519
.579
.608
.695
.610
.809
.861
.883
.934
.819
.954
.975
.982
.994
.936
.994
.998
.999
1.000
.028
.037
.041
.043
.052
.108
.169
.196
.212
.264
.278
.444
.510
.544
.644
.513
.748
.816
.845
.914
.737
.928
.961
.973
.991
.887
.988
.996
.998
1.000
.022
.029
.033
.035
.042
.086
.141
.167
.182
.234
.231
.393
.462
.498
.608
.447
.700
.779
.815
.898
.672
.905
.948
.963
.988
.842
.981
.993
.996
.999
.018
.024
.027
.029
.036
.072
.122
.147
.161
.212
.199
.356
.426
.464
.579
.398
.661
.749
.789
.885
.619
.883
.936
.955
.986
.801
.975
.991
.995
.999
6
CHAPTER VII
DISTRIBUTION THEORY WHEN VARIANCE IS
UNKNOWN -- POOLED STUDENTIZATION
7.1.
Introduction
The statistics, which we will consider here are U and V as
3
3
defined at (4.1.10).
Let
e.
(7.1.1)
W.
1.
=
1.
~--=.-
~
i=1,2, •••
,n,
A.. S
1.1.
P
where
(7.1.2)
is the pooled sum of squares based on
(7.1.3)
p
degrees of freedom.
= n-m+v
Note that the assumption (4.1.7) implies that
p~2.
Dropping the suffixes from U and V , we have from equation (4.1.10)
3
3
(7.1.4)
U
= Max
Wi'
i
V
= Max Iw·1
•
1.
i
Whereas the results of Chapter VI are similar to that of
Chapter V, the situation is quite different in the present case.
This is due to the fact that the statisticS U and V are the maximum
of n bounded random variables.
As we shall see in Section 7.5, this
77
allows us to evaluate the true percentage points in some cases.
The necessary distribution theory results are given in Sections
7.2 and 7.3.
In Section 7.4, we give an analogue of Corollary 5.2.1
for a special case.
The percentage points of U and V are considered
in Section 7.5 and some results regarding the performance are discussed
in Section 7.6.
A comparison with external studentization is made in
Section 7.7.
7.2.
Marginal and joint distributions
We first obtain the marginal distribution of w.•
1
Using the
Cochran's theorem, it is easy to show that
where
and
and the two quadratic forms are independent.
Hence
(7.2.1)
has a beta
(~, ~(p-1»
distribution.
From this we get the marginal
distribution of w.
1
(7.2.2)
Next, consider the joint distribution of, say, w and w •
1
2
By assumption (4.1.7), we see that the matrix
78
is positive definite.
Put
Azzei - ZA 1Z e 1e Z + A11e~
=
All AZZ - Aiz
and
Case 1.
P2:.3
Again by using the Cochran's theorem, we see that Q and QZ are
1
independent cr 2 X2 variates with Z and (p-Z) degrees of freedom respective1y.
Further, Q can be written as
l
= zi +
Ql
Z'2,
where
z.
(7.Z.3)
~
= e. fA ~• • ,
~
~~
i=l,Z,
and
Thus, we have the decomposition
where zi'
Z'2
and Q are mutually independent cr 2 X2 variates with 1,1
Z
79
and (p-2) degrees of freedom respectively.
This implies that the
joint distribution of zl' z' and Q is
2
1
(2a2)~(p-2) • r(~(p-2»
Making the transformation
we get
where
(7.2.4)
To get the joint distribution of w and w we note that
1
2
Zi
wi
i=1,2
= -z"""i--2-P-Z-l-Z-2-+-Z"""~---k
(
=
l-p2
+ Q2 )2
Zi
1
l'
(z'p-·Z+Q )~
2
---:-=~-:-
where
P
= [~
il
.
80
Make a transformation
i=l,Z
The jacobian of the transformation, after some simplifications, is
Cl(Zl'ZZ,QZ)
Cl (Wl'WZ',QZ) =
where
Therefore, the joint distribution of wl'
Now integrating out Q from 0 to
Z
-.
P
-.-
QZ is
2
-1
Zo- (l-w' P w)
c •
(7.2.5)
z and
W
00,
we get
•
(l-~'P
2
Z7T(1-p2)~
-1 . k2 (p-4)
w)
•
The region of positive density is the interior of the ellipse
i.e.,
(7.Z.6)
81
Case 2.
p=2
Using the same notations as in case 1, we now have
z.~
(~'p
-1
~)
!.< '
i=1,2.
2
It is easy to show that
and hence the joint density of wI' w can not exist.
2
However, the
marginal distribution of w. is still given by (7.2.2).
~
Note that the R.H.S. of (7.2.5) vanishes on putting p=2.
Hence we will continue to use (7.2.5), keeping in mind that no joint
density exists for p=2.
Remark 1.
(7.2.5) generalizes a result obtained by Doornbos et a1.
(see e.g. Doornbos [11]) for the special case of Example 4.5.1.
Their
result has been re-derived by Quesenberry and David [32] by a simplified argument.
Remark 2.
By using the inverse transformation
i=1,2,
where!.' = (tl' t ), it can be shown that the joint distribution of t
l
2
and t
2
is bivariate t with p-2 degrees of freedom.
In view of this
result and the corresponding result between univariate beta and univariate t distributions, we may call the joint distribution of wi and
w~ a generalized bivariate beta distribution.
82
Next consider the ellipse
in the (w1t w )-p1ane.
2
and the axis w -w
1 2
The principal axes of the ellipse are
= 0 intersects the ellipse at the points (cotc o )
and (-c t-C )t where c
000
= [~(l+p)]
k
2
(see Figure 7.2.1).
We now have
the following lemma (see also Srikantan [40)).
c
w -w =0
1 2
----I-----------..,j£------------I'-----~w1
Figure 7.2.1
83
Lemma 7.2.1.
(7.2.5) .
Proof:
Let the joint distribution of w and w be given by
1
2
Then
~ [~(l+p)]
k
(a)
For any c
(b)
For any c' ~ [~(l+lpl)] 2
2
k
Result (a) is obvious.
To prove (b), let
(7.2.7)
Then
(7.2.8)
pr(lw1 !>h, Iw21>k) = 2 M(h,k,p,p)
+ 2 M(h,k,-p,p).
Setting h=k=c' in (7.2.8) and applying result (a), we see
that both
terms appearing on the R.H.S. of (7.2.8) will vanish, provided that
1
1
c' ~ Max([~(l+p)]\ [~(l-p)]~)
We are now in a position to prove the stronger version of
equation (4.2.4) as mentioned in Remark 1 of Section 4.2.
Theorem 7.2.1.
(7.2.9)
Proof:
Let A be given by (4.1.3).
IP 'j I
~
<
-
2
,+,
A
ii
A
- 1,
jj
From (7.1.1) and (7.1. 2) we have
Then
.J..
~'f'J
•
84
n
n
I
i=l
A •• w~ =
11 1
I e~
i=l 1
S2
S2+VS 2 = --S""':2+=-V-s"""2 •
V
V
Hence
n
I
(7.2.10)
i=l
A.. w~ < 1.
11 1 -
Without any loss of generality we now take i=l and j=2.
Then (7.2.10)
yields.
(7.2.11)
The equation
(7.2.12)
describes an ellipse in the (w ,w )-p1ane (see Figure 7.2.2) and the
1 Z
line w1-w
2
=0
intersects the ellipse at the points (d,d) and (-d,-d),
where
(7.2.13)
Similarly, the line w +w = 0 intersects the ellipse (7.2.12) at the
1 2
points (-d,d) and (d,-d).
By (7.2.11), it then follows that
(7.2.14)
Pr( Iw1 \>d,
IW2 \>d)
=
o.
Further from Lemma 7.2.1
Pr( I w1 1 >d 1 ,
where
Iw2 1>d1 )
= 0,
85
(1
W
l
-w
2
=0
(d,d)
------+---\------~E__---_+--___jr__----~W
Ellipse (1):
Ellipse (2):
Figure 7.2.2
We now claim that
d~dl;
for otherwise equation (7.2.14) will
contradict the fact that the joint density of w ,w is positive
l 2
inside the ellipse (7.2.6).
i.e. ,
Hence
1
86
7.3.
Evaluation of bivariate probability
Before proceeding any further, we will obtain an expression for
the bivariate probability in terms of a single integral.
This will be
used for the evaluation of lower limits of the true percentage points.
From equation (7.2.7)
(7.3.1)
where the integration is over the region
(7.3.2)
Define
(7.3.3)
Q(a) == Pr(w1>a)
== K
1
J1(1-wi)~(P-3)dW1
a
where
(7.3.4)
The following properties of the M function are obvious:
M(h,k,p,p)
= M(k,h,p,p),
M(-h,k,p,p) + M(h,k,-p,p)
M(-h,-k,p,p)
= Q(k),
= l-Q(h)-Q(k)+M(h,k,p,p).
From these relations, it is clear that we only need to consider
the case h,k>O.
Let A be the point (h,k) in the (w1 ,w 2)-p1ane.
We
87
assume that A lies inside the ellipse
0.3.5)
for otherwise, the desired probability is zero.
The region of integra-
tion given at (7.3.2) is then the shaded area ABCA (see Figure 7.2.1),
where
and
The extended line OA intersects the ellipse at the point D,
where
If we now define
then due to symmetry
(7.3.6)
M(h,k,p,p) = Ml(h,k,p,p) + Ml(k,h,p,p).
Hence we only have to find an expression for Ml(h,k,p,p).
assume that k>O, for Ml(h,O,p,p)
= 0.
We now
Make a transformation
Then the joint distribution of zl' z2 is
88
and the region of integration for M (h,k,p,p) is the shaded region
1
A B D A (Figure 7.3.1), where
1 1 1 1
A
1
h-pk
= (
1
(1_p2)~
"
k)
o
Figure 7.3.1
89
Put
and
r cos 8
= zl
r sin 8
=
Then, the joint distribution of rand
g(r,8)
= p;;
zZ.
e is
• r (1_r2)~(p-4), O<8<ZTI, O<r<l.
Moreover
I
1
k cosec 8
On putting
this can be rewritten as
where the range of integration is
(7.3.7)
~ r(1-r2)~(p-4)drd8
ZTI
90
The expression for M (k,h,p,p) is similar, with hand k interchanged
l
(valid for h>O).
Note that the range of integration for M (k,h,p,p)
l
is same as (7.3.7).
Hence on using (7.3.6)
where the range of integration is as in (7.3.7).
Putting
u = cos w
we get for h>O, k>O
arc
1
(7.3.8) M(h,k,p,p) = 2TI
2
h +k 2 -2hk cos w)~(p-2)dw.
(1 sin 2 w
J
arc cos p
It is easy to see that this expression is also valid when either h or
k is equal to 0.
In fact, this is valid even for h=k=O.
To this end
note that
= Pre
e
1
A~ oS
11
where u
(7.3.9)
l
and u
2
e
l
> 0,
A2
22
p
2
-=-J.,;--=- > 0)
0
S
p
have a N(O,O,l,l,p) distribution.
Hence
1
M(O,O,p,p) = ~ + 2TI arc sin p
and this is the value we obtain from equation (7.3.8) as well.
91
7.4.
A probability inequality
We now turn to a probability inequality, which gives a restricted
analogue to Corollary 5.2.1.
Theorem 7.4.1.
(7.2.5).
Let the Joint distribution of wI and w be given by
2
If p<O, then
Pr(wl~cl' w2~cZ) ~
(7.4.1)
Z
IT
Pr(w.<c.),
. 1
1.- 1.
J,.=
provided both c
Remark 1.
l
and
C
z are
of the same sign.
This theorem has been proved by Doornbos et a1. (see Doornbos
[11]) for the special case of Example 4.5.1.
The proof given here follows
on parallel lines.
Proof:
First consider the case when both c
1
and
C
z are
negative.
Let
A be the point (c ,c ) (see Figure 7.4.1), which lies within the ellipse
1 Z
Then the region (w1<c ' w <c ) intersecting with the ellipse is ABCA,
l
2 z
where
and
92
----\--------+--------4-~.
W
c
Figure 7.4.1
Now consider the function
We shall show that
which will prove the result.
From (7.2.2), we see that
C.
(7.4.3)
Pr(wi<c.)
-
J.
J.
= Kl J
-1
1 (
(l-w~)~ pJ.
where
(7.4.4)
1
B(~,~(p-l))
3)
dw., 1=1,2,
J.
1
93
Define
(7.4.5)
h(t)
Then
(7.4.6)
where g(wl'w ) is given at (7.2.5).
2
Now making a transformation
and
(7.4.7)
Z
we get the joint distribution of wl'w as
= K (l_w 2)!a(p-3).K (l-w' 2)!a(p-4)
(7.4.8)
1
1
22'
where
and K is given at (7.4.4).
l
Also (7.4.6) reduces to
(7.4.9)
where w (w ,c ) is obtained from (7.4.7).
l 2
Z
and (7.4.9) in (7.4.2), we get
Substituting from (7.4.3)
94
(7.4.10)
We now treat it as a function of c ,c for c1<0,
1 Z
¢(c 1 ,c Z) with respect to c ' we get
1
(7.4.11)
where
(7.4.12)
Writing
c
(7.4.13)
¢1(c ,c )
1 2
=K
1 -1
(7.4.11) can be rewritten as
(7.4.14)
f
2
(1-w2)~(p-3)dw
2
2
cZ~O.
Differentiating
Now differentiating wi(c ,c Z) with respect to c ' we have
l
1
(7.4.15 )
dwi(c1,cZ)
dC
1
Since both c
1
and
C
z are
negative and
p~O,
hence by (7.4.15)
This implies that wi(c 1 ,c ) is a nondecreasing function of c for
l
Z
fixed c Z•
From (7.4.13), it then follows that
increasing function of c
l
for fixed c •
Z
= ~ - ~ = O.
Also
~l(cl'cZ)
is a
non~
96
and
o
p
-----t---------'f;:---------t---~
c
l
T
r--~---__IR
8
Figure 7.4.Z
Therefore
T to R.
~(cl,c2)
But the function
increases from Q to T and decreases from
~
is non-negative at Q (since
at Q) and at R (since R lies on the line 80).
~(cl'cZ) ~
both c
and
l
This shows that
0 on QR for any fixed c z' which completes the proof when
C
z are
negative.
To complete the proof when both c
l
and
C
z are
that (7.4.1) is equivalent to
(7.4.16)
Pr(wl~cl'wZ<cZ)~O
Pr(wl~cl'
w2>c z) <
Z
11 Pr(wi~ci)'
i=l
positive, note
97
where c
and
l
C
z are
negative.
Since the probability distribution of
(wl,w Z) is same as that of (-wl,-w ), hence (7.4.16) yields
Z
z
Pr(wl-S-d l , wZ.S-dZ)'s- II Pr(wi.S-di),
i=l
where d i = -c (i=1,2) is positive.
i
Remark 2.
In view of Lemma 7.2.1, one suspects that this theorem is
valid for p>O in some region near the "boundary" of the ellipse.
However, this is not true for all points within the ellipse as can be
easily seen by using equation (7.3.9).
7.5.
Percentage points
7.5.1.
Upper and lower limits
The solution of equations
(7 .5.1)
Ua
and va denote the true
upper lOO<Y.% points of U and V respectively.
These can be obtained
give an upper limit for ua and va' where
by using the tables of the incomplete beta function.
Thus to get
2a
=n
and
1
-2
1-v
(~(p-l), ~)
a
=-an
where I x (p,q) is defined at (6.2.2).
The method of finding the lower limits is similar to the case
of known variance with the exception that now we use the bivariate
probabilities given by (7.3.8) and (7.2.8).
Note that if P
< 0, then on using Theorem 7.4.1
max
Pr(U>u ) > a _ (n-l)~2
a
2n
7.5.2.
True percentage points
For the case v=O, Srikantan [40] has shown that under certain
conditions, which in general hold for small values of n, the upper
limits given at (7.5.1) coincide with the true percentage points.
This is true in the present case as well
Theorem 7.5.2.
(a)
u
v
u~.
coincides with
u
(7.5.2)
(b)
a
> [ ~ ( 1+p
a -
max
u.
)]
provided that
J-.,;
2
coincides with v • provided that
a
a
v
(7.5.3)
Proof:
> [~(1+1 p .. 1)]
a -
If the conditions
k
2
1.J
for all i#j.
(7.5.2) and (7.S.3) are satisfied, then by
Lemma 7.2.1, we have for all
i~j
Pr(w.>u
1.
a,
W
j
>ua )
=
°
and
Pr ( IW1.• I>va , Iw . I>va )
j
= o.
This means that condition (4.3.7) is satisfied by both u
hence u
a
and
va
coincide with u
a
a
and v
a
and
and v~ respectively.
u.
It should be noted that this theorem is of limited use, as it
gives the true percentage points for some small values of n and v only.
99
Illustration.
Ua
va
and
7.6.
For Example 4.5.3 with r=c=4 and a=.05, the upper limits
give the true percentage points for v~l and v=O respectively.
Performance of test statistics
For simplicity, we only consider the measure$ P and P •
a
b
now need the distribution of w under H •
k
k
We
From equation (7.2.1), we
can write
(7.6.1)
where
and
Now, by the results established in Section 4.4, under
has a N(A
kk
~,
ek
8, A 0 2 ) distribution and S2 has a noncentra1 02 X2 distrikk
bution with n-m degrees of freedom and noncentra1ity parameter
It is clear that the distribution of w~ depends on the ratio
8/0 and hence we may take 0=1.
Considering the decomposition
we see that Q has a noncentra1 X2 distribution with 1 d.f. and non1
centrality parameter ~~k = A e2 , (S2_ Ql ) has a central X2 distribution
kk
with (n-m-l) d.f. and the two X2 's are independent. Hence
100
··e
x' 2
(7.6.2)
2
k
W
where X,Z
d
1,6k2k
-X-=-,--n2- 2-+-"""2-'
Xp-l
1,6
kk
stands for a noncentral X2 variate with 1 d.f. and non-
1,6~k
centrality parameter 6~k' X;_l stands for a central X2 variate with
p-l d.f. and the two XZ's are independent.
noncentral beta (~,~(p-l),6~k) distribution.
Thus, under H , w~ has a
k
The distribution of w
k
is now immediate.
Note that (7.6.2) can be rewritten as
(7.6.3)
where
ti , 6kk
is the noncentral t distribution with f=p-l d.f. and
noncentrality parameter
~k'
For the one-sided statistic, we now have
k=1,2, ••• ,n
t'
f,6
= Pr(
(t'Z
f,6
kk
1
+f)~
> u )
a
kk
(7.6.4)
This can be compared with P of equation (6.3.2).
k
same in both expressions.
Now
Note that 6
kk
is
101
p
a
where 6 is given at (6.3.5).
Similarly, for the two-sided statistic
k=l, Z, ..• ,n
and
7.7.
Comparison between external and pooled studentization
In this section, we deviate from the notations used so far
and retain the suffixes in various statistics and related quantities.
Thus, for example, the true upper 100a% point of U will now be denoted
z
by u
z,a
and V as used in Chapter VI for external
stud~ntization will
be denoted by v •
Z
The non-availability of true percentage points for most regression models makes the comparison difficult.
One way to compare the
two cases is to use the upper limits and study the various measures
of performe;tnce mentioned in Section 4.4.
We now restrict our attention to the one-sided statistics U
z
and U •
3
satisfy
z,a.
From equations (6.Z.1) and (7.5.1) we see that U
and u
3,a
102
2a
(7.7.1)
k) = ( kV
2 2' 2
n
and
(7.7.2)
2a
I l _u2 (~(n-m+v3-l),~) =n3,a
Similarly, using the upper limits in the expressions for P 2 ,k and
P
3,
k given at (6.3.2) and (7.6.4) respectively, we get for k=1,2, ..• ,n
(7.7.3)
and
f
(7.7.4)
= Pr(ti
A
3'Ukk
>
k -
2· U
3
(1-u 2
3,a)
1
,
3,a
)~
where
and
From equations (7.7.1)-(7.7.4), it follows that if f =V 2 then
3
f
=
u
2,a
and
k2 -
.u
3 3 a~
_.;;;..-.-:;:..z..'
(l-u 2
3 ,a
)~
103
Next consider the case when V =V ,
Z 3
By our assumption (4,1,7),
the matrix A is at least of rank Z and hence n-m-l>l,
that f 3 >v Z and P ,k>P Z,k'
3
This implies
Hence, if we use the upper limits, then for
the measures P and P , U will have a definite edge over U '
a
b
3
Z
Obviously, the gain will be large when (n-m-l) is large compared to
BIBLIOGRAPHY
105
BIBLIOGRAPHY
[1]
Abramowitz, M. and Stegun, I. A. (eds.) (1964). Handbook of
Mathematical Functions. National Bureau of Standards,
Applied Mathematics Series 55, U. S. Government Printing
Office, Washington, D. C.•
[2]
Amos, D. E. (1964). "Representations of the central and
non-central t distributions", Biometrika, 51, 451-458.
[3]
Anscombe, F. J. (1960).
1, 123-147.
[4]
Barnett, V. D. (1966). "Order statistics estimators of the
location of the Cauchy distribution". Journal of the
American Statistical Association, 61, 1205-1218; correction
63, 383-385.
-
[5]
Blom, G. (1958). Statistical Estimates and Transformed Beta
Variables. Almqvist and Wiksell, Uppsala, Sweden.
[6]
Chew, V. (1964). "Tests for the rejection of outlying observations". RCA Systems Analysis Technical Memorandum No. 64-7,
Patrick Air Force Base, Florida.
[7]
Craig, C. C. (1941). "Note on the distribution of non-central
t with an application". Annals of Mathematical Statistics,
12, 224-228.
[8]
David, F. N. and Johnson, N. L. (1954). "Statistical treatment
of censored data, Part I, Fundamental formulae". Biometrika,
41, 228-240.
[9]
David, H. A. (1956). "Revised upper percentage points of the
extreme studentized deviate from the sample mean".
Biometrika, 43, 449-451.
"Rejection of outliers", Technometrics,
[10]
David, H. A. and Paulson, A. S. (1965). "The performance of
several tests for outliers". Biometrika, 52, 429-436.
[11]
Doornbos, R. (1966).
Amsterdam.
Slippage Tests.
Mathematisch Centrum,
106
[12]
Dunnett, C. W. (1955). "A multiple comparison procedure for
comparing several treatments with a control". Journal
of the American Statistical Association, 50, 1096-1121.
[13]
Dunnett, C. W. and Sobel, M. (1954). "A bivariate generalization of Student's t-distribution, with tables for certain
special cases". Biometrika, 41, 153-169.
[14]
Feller, W. (1957). An Introduction to Probability Theory and
Its Applications, Volume I. John Wiley and Sons, Inc.,
New York.
[15]
Fisher, R. A. (1940). "On the similarity of the distributions
found for the test of significance in harmonic analysis,
and in Stevens's problem in geometrical probability".
Annals of Eugenics, 10, 14-17.
[16]
Govindarajulu, Z. (1963). "On moments of order statistics
and quasi-ranges from normal populations". Annals of
Mathematical Statistics, 34, 633-651.
[17]
Grubbs, F. E. (1950). "Sample criteria for testing outlying
observations". Annals of Mathematical Statistics, 21,
27-58.
[18]
Gumbel, E. J. (1954). "The maxima of the mean largest value
and of the range". Annals of Mathematical Statistics,
~, 76-84.
[19]
Gupta, S. S. (1963). "Probability integrals of multivariate
normal and multivariate t". Annals of Mathematical
Statistics, 34, 792-828.
[20]
Halperin, M., Greenhouse, S. W., Cornfield, J. and Zalokar, J.
(1955). "Tables of percentage points for the studentized
maximum absolute deviate in normal samples". Journal of
the American Statistical Association, 50, 185-195.
[21]
Hartley, H. O. and David, H. A. (1954). "Universal bounds for
mean range and extreme observation". Annals of Mathematical
Statistics, ~, 85-99.
[22]
Hartley, H. O. and Pearson, E. S. (1950). "Table of the probability integral of the t-distribution". Biometrika, 21,
168-172.
[23]
Hume, M. W. (1965). "The distribution of statistics expressible
as maxima". The Virginia Journal of Science, 16, New
Series No.2, 120-127.
107
[24]
Johnson, N. L. and Welch, B. L. (1940). "Applications of the
noncentral t-distribution". Biometrika, 11:, 362-389.
[25]
Kimball, A. W. (1951). "On dependent tests of significance in
the analysis of variance". Annals of Mathematical Statistics,
~,
600-602.
[26]
Kruskal, W. H. (1960). "Some remarks on wild observations".
Technometrics, l, 1-3.
[27]
Locks, M. 0., Alexander, M. J. and Byars, B. J. (1963). New
Tables of the Noncentral t Distribution. Aeronautical
Research Laboratories, Wright-Patterson Air Force Base,
Ohio.
[28]
Moriguti, S. (1951).
distributions".
"Extremal properties of extreme value
Annals of Mathematical Statistics, ~,
523-536.
[29]
National Bureau of Standards (1959). Tables of the Bivariate
Normal Distribution Function and Related Functions.
Applied Mathematics Series 50, U. S. Government Printing
Office, Washington, D. C••
[30]
Owen, D. B. and Steck, G. P. (1962). "Moments of order
statistics from the equicorrelated multivariate normal
distribution." Annals of Mathematical Statistics, 33,
1286-1291.
[31]
Plackett, R. L. (1947). "Limits of the ratio of mean range
to standard deviation". Biometrika, 34, 120-122.
[32]
Quesenberry, C. P. and David, H. A. (1961).
outliers". Biometrika, 48, 379-390.
[33]
Rao, C. R. (1965).
Applications.
[34]
Rider, P. R. (1960). "Variance of the median of samples from
a Cauchy distribution". Journal of the American
Statistical Association, ~, 322-323.
[35]
Sansone, G. (1959).
New York.
[36]
Sarhan, A. E. and Greenberg, B. G. (eds.) (1962). Contributions
to Order Statistics. John Wiley and Sons, Inc., New York.
[37]
Sen, P. K. (1959). "On the moments of the sample quantiles".
Calcutta Statistical Association Bulletin, ~, 1-19.
"Some tests for
Linear Statistical Inference and Its
John Wiley and Sons, Inc., New York.
Orthogonal Functions.
Interscience, Inc.,
108
[38]
v
~
Sidak, Z. (1968). "On multivariate normal probabilities of
rectangles: their dependence on correlations". Annals
of Mathematical Statistics, ~, 1425-1434.
[39]
Siotani, M. (1964). "Interval Estimation for linear combinations of means". Journal of the American Statistical
Association, ~, 1141-1164.
[40]
Srikantan, K. S. (1961). "Testing for the single outlier in
a regression model". Sankhya Series A, 23, 251-260.
[41]
Steck, G. P. (1962). "Orthant probabilities for the equicorrelated multivariate normal distribution". Biometrika,
49, 433-445.
[42]
Sugiura, N. (1962). "On the orthogonal inverse expansion with
an application to the moments of order statistics". Osaka
Mathematical Journal, 14, 253-263.
[43]
Young, D. H. (1967). "Recurrence relations between the P.D.F.'s
of order statistics of dependent variables, and some
applications". Biometrika, 54, 283-292.
110
APPENDIX
EVALUATION OF DEFINITE INTEGRALS
For our computational needs, the definite integrals were
evaluated by
u~ing
the Gauss quadrature formula (see e.g. [1], p. 887).
The formula can be briefly summarized as below:
Let a and b be finite real numbers.
f
(A.l)
a
b
f(y)dy ~ ~(b-a)
Then for n point formula
n
I
w.f(y.),
i=l 1
1
where
Yi =
~(b-a)xi
+
~(b+a),
i=1,2, ••• ,n
and the points x. and weights w. are constants extensively tabulated
1
1
in [1] for various values of n.
It should be noted that (A.l) gives
an exact result when fey) is a polynomial of degree (2n-l) or less.
The value n=20 was found to give sufficient accuracy for all
of our computations.
The points xi and weights wi were directly taken
from the above tables and the computations were performed on a GE235
CALL A COMPUTER.
© Copyright 2026 Paperzz