TESTS FOR ONE OR TWO OUTLIERS
by
Robert G.MoMillan
•
Institute ot Statistics
Mimeograph Series No. 613
August 1968
iv
TABLE OF CONTENTS
Page
LIST OF TABLES
LIST OF FIGURES •
1.
1.3
•
0
•
•
.
0
..
Cl
1
The Outlier Problem • • • •
Review of Literature •
Problems CQu'sidered i.n This Study
...
1
1
5
2.2
9
Known Variance • • • • • • • •
2.1.1
2.1.2
2.1.3
2.1.4
9
The Test Procedure
Performance Criteria
Calculation :>f Pe:rformance
Two-sided A:.t:ernat::':'ve.s • ,
9
• 10
· • 10
«
,
•
•
•
2.2.5
2.2.6
2.2·7
G
•
0
•
...
Introduction.
Variance.
....
~nown
•
•
· 44
44
· . 44
•
44
4(.
:1
5:
Unknown Variance • •
3.3.1
3.3.2
The Problem of Rejection Cons·::an.:~8
Condition for Rejec:::ion of at M,~±.:;:: One Set of
3.3.3
3.3.4
A Bound for Err.:>:c ix. a in Othe:t' c,i:tses •
Performance
• • • •
Kx's • •
3.4
16
The Test Proceiure
• • 21
Performance CL~teria • •
22
Calculation of Performance
• • 22
Asymptotic Properties of Perrc:rmance Criteria • • 32
Numerical ReSUlts • • • •
• 33
Two-sided Alte'c:,arives • • « •
•
•
•
•
•
36
External Student ~,za'::ion
• • ••
• 39
Rejection Cunstan~s •
Performance
3.3
0
21
MURPHY I S TEST FOR TWO OUTLIERS
3.1
3.2
•
Unknown Variance •
2.2.1
2.2.2
2.2.3
2.2.4
vi
. vii
SEQUENTIAL TEST OF MAXIMUM RESIDUAL •
2.1
3.
•
•
INTRODUCTION AND LITERATURE SURVEY
1.1
1. 2
2.
• •
1f.01
~
••
o.
• 51
· 53
· • 58
• • 63
An Optimum Test for a Pr.eje·:.~::mined Num'bei: of Outliers • •
64
v
TABLE OFCONTUNTS (continued)
Page
4.
~RUBBst
TEST FOR TWO OUTLIERS •
CI
•
4.1 The Test PrQcedure • • • •
4,.2 Pex-£ormance' of Gru~bs' Test
...
,.
C~PAIUSON
•
•
.... .
"
..
~
•
0
•
0
"
II
•
et
0
If
0.
0
II
0
OF PROCEDURES AND CONCLUSIONS
Conclusions
.11
• 71
· 71
· . . 75
5.1 Comparison of Murphy's and Gr'l1bbs' Tests •
• • • • •
5.2 Comparison of Three Tests
5~3
•
•
G
•
. • • • . •
. . · . ··
"
1:1
75
78
· 81
Jp3.1 Samples Expected to Contain at Most One Outlier • 81
5.3.2 Samples with a Predet~Pnined Number of,
potential
Outli~rs
• • • • • • • • • • •
5.3.3 Samples with an Unknown Number of Outliers,
Known Variance
• • • • • • • • • • . ••
of Outliers,
Unknown Variance
• • • • • • • • • •
5.'.5 External Studentization is Multiple Outliers
Are Suspected • •
• • • .
5.3. 4 Sample. with an Unknown Number
6.
LIST OF
REFERENCES,.•
7. APPENnICES
7~l
7.3
7.3
.. .
I I Ill". "
...
· 8,
· 83
• 83
83
· 84
· . • 86
Cpnd!tiQns for Which Simultaneous Detection Implies
Sequen~ial Detection • • • • • • • • • • • • • . •
A Generalization of the Neyman-Pearson Fundamental
Letnllla • • • • • • • . • •
••• • • • .
Tablee,of Numerical Results •
p
•
•
•
•
•
•
•
•
•
• 86
. 90
•
93
vi
LIST OF
TABLE~·
Page
·,.1 Rejectienconstants bn,O: for sum of .largest two
$ tandatdiz ed residuals , •
0
7.1
. ' 00
0
0
•
0
•
•
o
8
sequential maximum residual test of section
2,,1.1, when'x l ,x2 '"" l'l'(Il+A,l), XyP8,Xnrv N(Il-!l), A> 0 8
~~tfQrmjnee,of
7.2 PerfQrmanceo£ sequential maximAAm reSidual
Pe~forma~ceo£
= ~ 05",
.,
Go
••
93
rv
N(Il,l) •• 94
sequential maximum residual test of section
Z.,2.1 when xl,x
~
0
of $ectiQn
tes~
Z~l.~ when Xl rv N(Il+A,?l), x 2 rv N(Il,.,A,l) , ~3'q''?Xn
7.3
ell.
0
z
.,
p"
<)
N(I,.l;-lrA,J!) .. JJ:
~,
..
9'
..
,
:?, . . ,x
3
.;"
I')
•
0
N(Il"rI),y
rv
n
o.
l'
.(1
~
,.
•
,
> 0,,'
•
I')
q
'0
~5
P«r£OPnaflce'of sequential maximum.·resiflual test·pf section
2. 2~ 1 when Xl rv
x ,·q,X
"
N~f.tHl'
2)
cr
"" N(Il,o' )., A
n
l
3
·P.erfo'tft\aq,ce-o£ sequentid
1
. (
2) ,
X2 rv N Il+A2 ,O" ,
.; • • . 96
2 >0, ex = .05
ma~imum residual test
==
21
o:f'·s,c~ion
2
'2 .
~.2.6 when Xl rv N(Il+ A,cr ),x ~ N(Il- A,& ),
2
.' 2
.
~3'q!Jxn rvN(IJ.,o- ), A >0, ex ::;:.05" -v = 0 • • • • • • •
9
••
97
i.rfQ-r;mance-ofseq\1,ential maxitllW1l residual test! of section'
2.2.7 with external stttden;ization and with in~erna1 and
,
.
2"
exteJ:'nal. studenti~ationWhen X ,,";K/2 rv N(f,L+A, cr ) ~
J
"3'· •• ,xn.'" N(Il,o- ), A > O,Ql ;II! ,.05 • • • •
0
,
••••
~
••
98
'erform,ance-9f Murphy's test for two outliers when.
. .
wc'+'X$
· 2 2
p ~, ?~n·N ~(f,L, 0" ),
1:> 0 .• •.•• • , P '
rv N(Il+:A-,o- ), x y
l,rfQ~ance
,
"
99
of Grubbs' test for twp,outliers whEllo'\
.2
~:l.'~~ ..... N(f,L+:A-"o-),
' 2 '
xy .•• "Xn,N N(fJ.,cr), 1 >0 • • • • • • • • 100
vii
LIST OF FIGUilS
Page
2.1
2.2
Performances Pa , ?b' Pc of sequential maximum residual test
of section 2.1.1 at level ex when x ,x 2 '" N(IJ.+)..,l) ,and
1
x , .•• ,x '" N(;.L.,l), with dash~d lines showing the
n
3
performanc~ oi the ordinary maximum restdual test in
the presence ~f one outli,r
. . . . • • • • • • . 0 •
Performances Pa , Pc
of
sE~quential
•
•
13
maximum residual test of
section 2.1.4 wilen Xl n., N(IJ.+"-,l), x2 '" N(IJ.-:A,l) and
~~, ••• ,x '" N(p., 1), with dasped lines shoWing P ' P
a b
n
of the one-sided tast of sect~9n 2.1.1 • ~ • . • • • . • • • • 18
,
I
2.3 Rejeotion regions
~or first sta~e of sequential maximum
test of s~ction 2.~.~ for n ~'11, v = 0, ex = .05,
and x'a fixed, &lhQwing corresponding regions for rejection
of low outliers . . • • . • • • .,. • • . • • • • ••
•• 27
rea~dual
2.4
Rejectto~
regions rQr f1rst stage of sequential max~mum
l;'esidual test of section 2.2.1 for x2 ::: X:
.. . .
• .29
5
~., ' Performances P , P
test
bl Pc of sequential maximum ,~,sidual
a
'
2
of section 2.2.1 when ex = .05 and x "x2 '" N(IJ.+A, CT ),
1'
2
x y .. " x '" N(1-1-, (J") T............·...
" 34
n
2.6 Performances p" Ph' P of ::Jequential maxim~m residual test
a
c
2
of sections 2.2.1 and 2;1.1 when 0: =.05 and xl '" N(IJ.+).l'O- ),
2·7
x2 '" N(IJ. +2' A1 , 0- ), x '· .. ,xn N(IJ." cr) • • • . • • . • • • • 35
3
Performances Pa , Ph' 'Pc of sequential,maximum +esidl,181 test
of section 2~2.1 when n+v ::: 21, ex= .05 and
1
2
,2
'V
.
2
"
'2
x 1,x 2 '" N(IJ.+A,cr ), x3'''''~n '" N(IJ.,cr ) . • • • . • ,. , • • . • 37
2.8 Performances Pa ' Ph' Pc of sequential maximum re~idual test
of section 2.2.6 when q := .05 and xl '" N(IJ.+A, cr ),
',,2
x2 '" N( IJ.- A, ':' )-, ,x3"o., x
rt
''V
,', 2
N( IJ., J )
•
•
••
•
•
•
• •
•
40
2.9 'Performances ,p a , Ph' Pc of externally studentizecl sequential
maximum residual test of section 2.2.7 when ex ;;,.05 and
2
x 1 ,x 2 '" N(IJ.+A,J2), x , ... ,x n '" N(IJ.,cr ), showing the
3
performances of the test of section 2.2.1 as da3hed lines
for comparison • • • • . . • • • • • • • • . • . • • • • • • , 41
viii
LI ST
~,
~. ~ Gtc hi S (C
tll
,
j
Page
",l
).2
performance of Mu.rphy's test fot' tw~ outliers when
xl"ox2'~ N(;;.~),..,,;L); Xyo",,;I.. '~ N(\J:.1),7 A> 0
••••••
Performa,pce of Murphy's test for two outliers when
2
x ,x "-' N(;;'+fc,.,o2)" KyO"";>'l'v N(u,er )" A, > 0 , • • • • , ••
1 2
Performance of Grubbs! test fn' two
x
"x.
' l'N(',Ii,,' '"
2,);
K· },,,·,}i; II
·'V
3
QutH~rs
N( 11·) 0'2)
,
J
'\>0
~\
()
•
.,
e
(J
0
•
5.4
e
74
• 77
CUl!;,;Cdnt.
jtE
pU!u.tt1,W.,V
Grubbs' and Murphy's
79
te~ts
PerformanCe: \)L sequ~ntial rltd1~ mUtt! .residual and Murphy I s tests
for two outliers when KIJx~?'v N([lfA,lL x ,· .. ,x n "-' N(Il,l) •• 80
3
Performance of three tests f;:·' two outliers when
~
2
,. , ,+" 0 c: ) J x 3} • ,~.
N(. ..,'.:' ,rf'
)
X 1 J X 2 ' VN
" " ' II}
J
1
'"
0
<",
First and second stage reject
maximum residual test with
2
2
X
7.2
65
5 • • • . . 76
Xl' x)
Liues of
52
when
n ::::
";
p
:;: X
~'
.?
•
,
v
0
~
0
'J
e
0
It
82
regions for sequential
'
unknown when n == 21, ex :;: .05,
~on
2
J'
,
. 87
First and second stage rejection regions for sequential
maximum residual test with J2 known when n == 21, ex :;: .05 . . •
89
I,
II>lTRODUCTION AND LITERATURE
1, .L
SURVE~
The Outl;ter Problem
•
-
j
•
The problem of treating outlying opservations has been
contex~s
in various
that a sample tlas
may be from a
and with various objeqtives.
be~n
appro~ched
Usually it is assumed
obsf;rved in which one or more of the observations
~i~tribution
other than the distribution of chief interest.
In cltemical laboratories.;! fer example" it is common practiae to monitor
the precision of an analytical method with a
homogeneous batch of material,
rhese
serie~
analy~es
of analyses on a
may be used to provide an
estimate of the true batch concentration" and this estimate may also be
used in
~he calibr~tion
of other methods,
expel'ienced persclUnel or other
Occasionally, Qecause
ca4ses,gra~sly d~vi,ant
o~
in.
ana.lyses occur,
Suah re$ults are frequently helpful in drawing attention to conditions
needing attentiQp~
Beyoqd th~$ they al'e likely to be harmful in anal-
ysts 9f the date, ?nd their rejectton, or at 1,ast isolation, is desirable.
For normally distributed observations, the qu~~iers are usually
assume<;l to differ in m,ean (lQcation cQptaminat'ion) or in variance
(scala;r c;.ontaf\1inat;ion) fram the Qther
obse~'"Vat1ons.
Some studies have
COIulidel;'ed the effect of rejection of QbsGrvat;ions on the estimation of
parameters!
object~ve
In other: studif;!s the detection of outUen has been an
in itself.
1,2
Since reviews of the
Revi~w
,
of Literature
I
lite~atul'e
on this subject are available el&e-
where, ~.!o Anscambe (~96o) an4 Ferguson
(1961), only the
cQntribution~
2
most relevFnt to the
p~esent
study will be mentioned
five times the probable error should be rejecf:;f;'d.
d~ntized by Tho~pson
(1935), who proposes that
h~re.
Residuals
This r..:tle is stu-
';.!l a normal sample)
x 1 ,:x 2 '.·· ,x n ' thos~ observ.;1ti,ons for which Ixi"Ox;ls exceel$ some Gon_
1 n
2
1 n
r,
stant be cb;lssed as outliers, where x =..... L. x .• and s "" - r. ex. -i)'"·.
n i=l 1"
q 1=1 1
ihQm.pson prQvidesa ta1>le of constants which allows the a\rarclge number
~earsQ~
of rejections per samp1e free of outliers to be predetermiufd.
and Chanqra Sek9 :r (1936) point out that (xir-x) /a is, likel? to be ~n
~or
eUj,cient
the $amph.
tlla~k
ou~1iers
detectiol\ of
'rhe
all of them.
pre~ence
i f there ~s morl! than one outlier
I,:.
of additional outli,ers tends to :increase s ar'.::
They .haw that: i f x(l)
< x(2) < ••. < xen) are the
,,:r.·atlre~ QbserviSt;i.9Q,S, there are algebraic ma?Cima'oIfh~<th (~(n) .. i)
;.'1
(X~n_l) .... ~)/~J etc. e~nnl!l,t excee4iii
The dis tributl,on of (x (rt) -x)! (f is obtained ;:>y 1I1cltay (19:'5).
fj~n~p,1ar. defivat~ons a:re givt;ln by Nair
'fabulat~s
the probatility ;f.ntegral of this statiijtic for n <
,~'nrLsl'H~s t.he upper.
; '!t'!f.:ed
(1948) and Grubbs (l:j~'C).
c~nst.ant!sQ
5%, 1%, 5%"
and :1.0% values" n
9.
Nair
Grubbs
-
< 2"5, for ;Jse as l;'e-
For l;)'Se wben the variapce is unk1lown'j NL:::'r determines
"
'l.j,,~;: 1,%,
.
f,~':te
and 5% points f'or (X.C
n
)""x)!s v,
where s2 is a lllean square estiv
of variance with v d. f. and independerLt of xl'x 21 , ,. ,l:8(t'
Gtubbs
q~taini the distr~Qution of a statistic equiv~lent te (x(n)-i)!s and
I..l.ii,$t' l%~
2.5%, 5%, an4
cO:'v,stants
f01;
rejection
lO~
values.
Quesenberry and David (:961) g;i,ve
Q' oCltliers b~sed on
(~(n)-x) ! Ji~J, (~i .. i)2 +v~:~"
:3
thus using all the data to estimate
~
when external information is
avaUabh.
Proc~dures in,which x(n) is classed as an outlier if (X(n)-x)/s
eKc~eds
some constant are shown by Murphy (1951), Paulson (1952), and
(1956) to have a certain
~udo
opt~mum
property.
the optimality will be exp1aineQ in Chapter 3.
opments are based on restriction to procedures
chpnge of scale and 1Qcation (y
the
~
ax + b, a >
if present, have means
out1i~r~,
observations.
differ~nt
Each of them tteats a different
outlier problem.
The ~recise nature ot
All three of the deve1 . .
~hieh
are invariant under
0), and all assume that
from that of the other
generali~ation
Murphy ahows that the most power£u1
of the
invaria~t
proce-
dure tor sdecting l< "bservatious as ourHers is based on
(an (K(n) -x) + a n _1 (:x: (n-1) -~) + ... + a n _k +1 (x (n.,.k+1)-x) J/s •
H~s d~velp~ment pssum~~ no external in£prmation as to ~2, and that;
an ,. an_I> ••
>
Q
an,.... k+1
by which the means of
outli~rs.
whi~h
are known
th~
outliers
~oustaqts
d~ffer
P"l'oporti,onal to the FlmQunts
from the mean of hhe non-
Since the procedure depends on the a's, there is no procedure
is most powerful for all outlier means if k > I.
are assumed to have the
sa~e
If the outliers
distribution, however, the a's may be taken
as equGll, and an equivalent statistic 1s (xl'-i{2)/s, where Xl is the mean
of the k highes t
~ions9
to
observat~onsand
x2
the me{lU of the remaining observa-
If the test is for only one outlier, the statistic is
equ~valent
(X(n)""x)/s.
J?au~son
t'l:'eat$ the case of the one-way C;1assification with equal
nun,.be:j:;s in t'he
c~l1li1o
He develops a
pr9cedu:J:'~
for deciding
thq~
all
4
cells have the's8me mean or that one-cell has a mean highe:r than the
o~her~,
the
otQermea~being equal.
Paulson's
pro~edure
is based on
(X(n)_X)/S when the:reisbl\t -one observation per cell, and the problem
reduces to that of at most one outlier in a normal sample.
Kudo considers three groups of independently distributedobservations.
The firstg:roup is distributed N(iJ.,cr2 ) e~cept possibly for one
outliet', which has a mean exceeding iJ..
2
The second group is N(j..t,cr ) and
2
The third group is N(J..L ,cr ) with iJ. f J..L. Whenthere
1
1
are nO,9pservationsin the ..ec-ond g~oup, the ~eststatistic which Kudo
free of outliers.
derive'$reduees to that fo¥' 'lIl'hich Quesenbe-rry anel David determine rejection constants.
When there
~re
o.bservations only in the first group it
reduces to (x(n)-i)/s.
Dh:on (195J) investi,giltes the performance o£ outlier testsU$ing
randOqlsampl~n.g~
Thetest:$s~udied include
Gl;1,1bbs ' test and vario4-$
tests based on ratios of differences of order statistics, such as
(X(n)-X(n_1))/(X(n)-X(1))'
consid~red,
Both scalar and location contamination are
with one or two outliers, or with Some percent of the popu-
lation sampled being contaminateel.
For one or two outliers, performance
is the percentage of samples in which the outliers are extreme values
and are detected as outliers.
For contamination of a percentage of the
population samples, performance is the percent of totill contamination
discQvered.
David and
Pa~lson
(1965) mention several possible measures of the
per£orqlance of outliet' tests for the case o.f at most one outlier.
these measures
t~e
emphasis is On detection of outliers as an end in
itself rather than a means of improving estimates of parameters.
e
I
rn
The
5
different measures result from such
lier is also the
extr~~
$ignificant result in the
consid~rat;i,ons
whethe~
value and
. Some
samrlf~
as wlll:ither the
~~~~ier
tlle
~~ ~hese
0ut-
is the only
measures would be
Fpr a variety of cases
extref\1ety difficlfltto evaluate ul1mer:i,cany...
David and EaulSoJ:1, d.€'t.errnJne
..,.
.
/~
V1==1
(x._x)2 +
V8
1.
.t"
~
x _·x
1
Pr ( ....-.................- - - - - - -
> Vf'~J v)
ex
2
V
a's, a function of the shUt in mean of
9;nly
ou~aet;',
~1'
which iii! assumed tq be the
with vil.1,Jv) Bu·ch that, with no outliers,?
p
Pr (~-r--,.-,_X~(n~)~~_x_,_ _
2
Z (x.-i) . +
Thu$ the¥
1
determ~ne
.=
ex
2
n
. 1
1':::'
> V (J;l." v)
V
the
103
probab~lity
Problems
In the present study
i.
that the outlier
val~es
are also
Co~sidered
;
prool~s
k ~ N(~+AJ~2)
A>0
j,k
unk~o~
signiticant,
in Thi$ Study
j
of more thqn one
o~ i~dependent obs~rvations
i~
significan~o
i
sidered, with emphasis on the case of two
xj,X
0
'liS
with@ut regard to whether other
that a sample
ex
•
outlie~
outl~ersQ
x
1
'~2gooo,X
Us~ally
n
are conwe assume
is such that
6
Some consideration is given to cases in which the two outliers do not
In these cases it i~< assumed either that x
have the same mean.
have
ex~ectat1ons ~+Al' ~+A2J
2~
In Chapters
•~
0-;'1..-_,
lOWing three
with A ,A2 > 0b or
1
j
and x
~+A, ~_A.
36 and 41.'/ respectively, the performance of the
pJ;'9~edt.tres
k
fo~ ...
for detecting outliers in this situation-ia
evaluated~
1.
S~uential
test of maximum
ceqs1dQr~ ~a ~,tlt~
<
r~iduat~
In this procedure, x(n) is
if
,-
(X(n) ~)/(Y ..; e
when the variance is unknown.
If :K(n) is detected as an outliet;,
the test is repeated on the remaining n-l observations.
For two';"
stded alternatives x(n)-i 18 replaced with the largest resld¥al in
absolute value.
2.
Murphy's test.
The two largest
observation~
are considered outliers
if
X(n)-X + X(n_l)-x
s
>c.
If the variance is known, the denominator is replaced with (Y.
3.
Grubbs' test.
The two largest observations are considered outliers
if
n-2
L:
i=1
_
(x(_)-x
1
n
L:
i=1
2
n,n-
(x.-x)
1
1)
2
n-2
<
c
where
x
n,n-l
=
L:
i=1
x(.')/n-2
1
7
Here c is a generic constant: which provid.e.s in each case pr-;ba-tili.ty
1·-ex t4at no observations will be
cl~c1ar~d
outliers if in fact there are
none.
These procedures are imprQvements over Thompson's procedure if
bhere are
f~ature
multipleo~tliers.
when the number Qf potential outliers is unknown, since the
st~p
first
The sequential procedure has a desirable
has good properties when there is only one outlier.
Gru~bs'
and Murphy 1 s tests are deslgned to test for exactly two outliers.
In evaluating
performan~'e,
Xl
~md
x
2
are assumed to be outliers.
The probabUi,ties of rejeqiqn (;onstants being exceeded for Xl and x
were calculated, without l:eganl
other considerations.
ThIs is
10
J
2
whether they are extl;"eme values or
direct generalization of the approach
of David and Paulson.
For the sequential p,l'ocedure there al;"e three rather natural measures of performance as follows:
1.
Probab~lity
2.
Probability of detecting both outliers in two steps J
3.
ProbabHity that both outliers are
of detecting at least one outlier,
signifi~ant
at the first
step.
For Grubbs' and Murphy's tests the choice is between two outliers and no
outliers.
and x
2
The only measut-e cop.sidered is thl= probability that both Xl
are detected.
The performance measures
w~re
evaluated numerically
a/:l a function of A/ry in q. vari.ety of cases.
Chapter 3 captains a development of the distribution of
(X(n) + X(n_l) - ~x)/(J", and upper 5% and 1% percentage points.
Also a
simplified de·ri.vation is presented for an optJ.mum p:t'ocedure for deteqting
8
a predetermined number of outliers.
general to include
~pecial
th~
Thi.s derivation is sufficiently
prQCedUl'es of Murphy, paulson? and Kudo as
cases or easy extensioqs, but is simplified
consider~bly
by the
use of sufficient statistics.
In Chapter 5 performances of the three procedures are compared,
summarizing results of the preceding chapters.
Finally, some conclu-
sions as to the use of out:li.er tests are presented.
9
2.
SEQUENTIAL TEST OF MAXIMUM Ri:SIDUAL
2.1
Known
2.1.1
Va~iance
t
f
The Test Procedure
Let x ,x ' ••• 'x be a sample of independent
1 2
n
distrH.lt~d
N(Il,0-2) i f there are no outliers.
2
2
distd outed N(Il+X, 0- )) A > O.
2
veniencethat 0< ••• < x(n).
~ 1.
If 0-
We denote the
0bs~ryations,
each
Outliers, i f any, are
is known we can assume for con-
ordere~ observati~nsx(1) <x(2)
A suitable procedure for detection of a single outlier is
to consider zen) an
outll~r
if
;K(n)
1
where x- =
..,.
n
2: x. and v(n)
a
i=1 ~
(1950) such that
n
_
Pr { xen) - x
is
);>
the rejection
con3ta~t tab~lated by Gr~bbs
(n)}
=a
Va
if actually there are no outliers.
For at most Qne
dure is optimum in a sense discussed in Chapter 3.
ever, the number of potential Qutliers is
lik~ly
outli~r,
this proce-
In practice, how-
to be unknown.
It
seems reasonable to consider the sequential procedure of testing for a
single outlier and, if an outlier is detected, rejecting it and repeatin~
the test on the remaining data.
in the first stage.
wher~
-
Thus if (2,1) holds, we reject x(n)
If also
x(n) is the mean of observations excluding x(n)' we reject X(n_l)'
10
etc.. until the test fa Us to
pro.bab~lity
of
rej~cting
re"j~ct
an o:bsenrati-eti;.
no observations if
In th.is
~here a~~
no
w~y
~utlier$
the
remains
h(X" whUe in some saU1ple~ more than Qne ob,eJ'Vatiqn is reJ~c.ted.
Performance Criteria
~.~.2
Ii
To
I
tha~
assume
the performance of this procedure with two outlters we
evaluat~
the outliers are
~l
~2; !q~.
and
(2.2)
.
.
~nd
consider the fotlowing
'.
i ::
l,f?
i ::;:
3,4, .. "n
~eas4res
of
1>0
~erformance:
Pa ::;: Probability ,thfit,at'l.;;ast. vne out~:ter. is detected
,~'4
Th~s
we
.
Probabi1~tythat
Pe ="
probabil~~Y
:.that both outliel;'s, are ~ignificant a~ the
.• ' ...
tio~s
also
2.1.3
'£alcula~ion
w~tho~t
detQ~1=ed
regard to whether other observa-
a~e$ign1£icapt.
. I
i.
"
of
Performan~e
t
II
The following easily verified
evaluatioll of tlhe performance
(2.4)
both·outliers are
qnly whether the test pl;'OCedure yields significant re-
suIts with the true,outliers,
'.
~~\:~:.it:
Pb ::;:
co~~ider
"
.
relSuH~
meq~ure$,
are tJ,elpf'l.,ll in
rl.1:am~rica1
11.
The joint distrfb~~ion of xl-x and x 2 -xis bivariat.e normal
(2.6)
with p ;:::..
11'"".
n-
x 1.•
Here, as el~~where, x~ de~~~emean of observa~iQns excludino·lIO
That (2.3) h91d.s Ciin
bC!\Te~"fied from
f;:he table
shQwn analytically fop a of the usual size.
simple
algebrai~
Hrst stage
Then
and Can a 1$0 be
(2.4) and (2.5) are
The importance of (2.~) i$ that simultaneous
results.
signifi~ance
o~ v~n)
of xl and x 2
implie~
$equential
~ignificance.
consequeI\t~y
Calc\llation Qf the probabiliUes 1?
forward all;d
v(n)
( m;in(~1-x)x2-x) >
. a }
= Pr
c
00
: :; J
v(n)
ex
2
n-1
(J'y
:c ....."...,...
J1
1
p
Also
Ph' an,c;l P.c is now E;traighf;:..
as foUows:
proc~eds
P
<li'
1
= -n-1
12
n
(
.r
xl~x
> Vex(n)
x t.Q -x-
J
( x _·x-
= 2Pr
1
PI
1
> v(n-l)
}
rv
u,
X
2
·-x
(n.. l)
>V
1
v(n) "' ~-?
(
ex
n . j [ 1 ..
=2[1-<9'
\ --~-----_ .. - _ .... I
~
" .....n _.
\j
\
,
P
Q'
v(Q~ 1)
fa
c
n .• 2 '
n .. }
. - .A
l---I n _2
)
]
- P
c
V;:I
,-..
where
l
z
I(z) :;;
1
---
J
-00
e
'.
III
211
Similarly,
P
a
= Pr
{ x -x > v(n) } + Pr ( x -x > v(n) } _
ex
1
ex
2
P
c
) ] - Pc
The bivariate normal integrals can be evaluated easily using tables
published by Na~iona1 Bureau of Standards (1959) provided n~l is an
integral multiple of 0.05.
Results for
~
and .01 are plotted in Figures 2.1(a)-(f).
= 6, 11, and 21 and ex
For comparison, the prob_
ability of detecting a single outlier is also shown.
in most cases P
a
single outlier.
= .05
It appears that
is slightly higher than the probability of detecting a
13
.75
(a)
.50
n=6
CX=.05
.25 -
loOOr
I
I
p
a
::r
(b)
n=6
CX=.Ol
.25
o
Figure 2.1
8
Performances Pa , Pb' Pc of sequential maximwm residual test
of section 2 1..1 at level a: when x ,x2 " N(fl+A y 1) and
0
xy
".. ,xn
an
("~
l
'" N(fl,l) , with dashed lines showing the perform
of the ordinary maximwn residual test in the presence
of one outlier
14
(c)
n=ll
a;"'. 05
8
(d)
n=l,;L.
0::;:.01
-.
Fi~re
2.1
(conti~4ed)
l5
.
·50
l
~
(e)
n=2l
0:=.05
..
.25
·75
(f)
..L
::J-
3
4
A
5
6
7
8
n=2l
a:=.Ol
16
2.1. 4 ,Two_sided
A1ternat.ives
,
:~
Up to now
N(~+A,l),
~t h~s
been assumed that the distribution of outliers is
If we dQ not know the direction in which the di~tribu-
A> O.
tian of the outliers has shifted, or if the two
Qutlie~
distributions
may have .ntfted in opposite directions, we consider the procedure of
d~claring
outliers those observations
~.
1
for whioh
(2.9)
Rejection constants,
(1955).
.=!
a1.
Now, howeve.l' y we evaluate performance for the case
X.
~
~ake
for such tests were given by Halperin
As b~fore, we consider the probabilities of significance of the
t;rue outliers,
and
w~l1) ,>
,....,
N(IJ.,l)
i =
3,4, ... ,u
as performance measures
P
a
P
c
= Probability
of detecting at least one outlier
Probabi.U.ty of detecting both outliers
.
In this case we do not assume the test to be repeated using observations
not rejected using (2.9), si.nce sequential rejection of two outliers in
opposite directions impli.es first stage rejection of both..
If
tllen
x - ~ <
2
ex
...
x -x
it :; _w( n-1) _ 2.....
l'
ex
n-l
+ x_
- W( n- ) +
()
< ,. W n-l
a
This im,pHes
'if
W(f. )
(n-l)
-Wex
OJ,'
...
a
__
,
~.,....., 'J~~ ~
_
.~~
0-1
,
(2.+1)
'
That (2A 11) hqlds cllj,n b~ v~rifi~d nume,ricall.y £~r 'all <ras~s 9£ ~,nterest.
'I'her~
1.$" of QOU1;'se, pod tive prQbcibilj"ty tlJ.at an qutl:ier is sign;i.fi...
cant iti the opposi,t!e di:rection frorp its tl1ue shift.
bed~teGtep ~:y
If this happens, an
~d.c;1itiop.al (!)~t1ieJ;"
can
ins QbseJ;'Vat;i.oJ:l.s.
With the mqc;1et (2A 10) ,howeve:r, th~' proballqity that
repeating the
this occurs is :J;'l,egligi'Qle for otlJ.er thaJ;l sm.a1~:A..
t~st
on t;;he remaiJ;l-'
We theref9re dis-
. !c'
t'~g~q:d
.
P .# tqe probabilU:yo~ ~eq\1-ential detection of ~he twooutliers~
b
Results t~i J? 1:\ and R¢ usi,p.g 1\:1)e BlOd~l (2.'9)
'
repre~enta~ive caSeS in Fi~ure~ 2.2(a)-(f).
are shown for
som~
T~e p~obqb~lities ~a and
Ph ~or tl1e one... sidec!J~odd cQnsi<;lel!'~d earl:J.er are shqwp., a$ dash~~ lines
t'or
·e
Cf>Qlpad~onff
18
(b)
n=6
a=.01
....
.,
F~~r~
2.2
Perfqrman~es
Pa' Pc of
f~quential m~imUm
. ~ection2.1.4'whell xl "" N(J,l+:A.,l), x
XyH ~ ,xn
rv
2
rv
residual test of
N(p.-A,l) and
N(P., J,.) ,with dashedlj,nesshowing Pa' P
b
of' the 9ne-s;ided i1est of;;sectiotl,2", 1,
19
(c)
n=ll
a=~o5
.50
(d)
m::ll
(¥:;::;.Ol
.t
.25
~
~
0
1
Figv~e 2.2
2
(continued)
3
5
7
8
20
(e)
n=21
0:::;:.05
1
2
3
(f)
m=21
0:<=.01
21
Perfo~ance for the model withboth'o~t11ers
the direction of shift un~nown was not evaluated.
on the,same side with
Except for small A
this sit~ation is similar to the one-sided model (2.2) but using the
~~rger
cQn~tants w~n).
rejection
advantageous.
The performance
sided case
becau~e
Figure
indicates
2.~
The
wou~d
sequen~ial
procedure is again
be inferiqr to that of
of the larger rejectipn constants.
th~
On the
two outliers
. in.. opposite .directions
.
~hat
. easier to d'etect than two on,
tl:t~
same side evet:1
i~
one...
ot~er
hand
~~ybe
the direc;tion is
known.
Unknown Variance
,
2,2
2.2.1
;
The Test Procedure
,
, ;
j
;
As wi+h known varianc;e, we
co~ider
a sample
x ,x2 ,··.,xn of in_
1
dependent obsefVa~ions, each distributed N(~,~2) if ~her~ are no out-
~2
n
(J(; .... i)
Z
>::;
1=1
8
2 =
K
2
+ vs
;1.
2
v
,
(x.-x )2 + vs 2
v
iII.< ~ K
Z
where ~~ is a lJ1e~n sqUgife ~s,~:(,mate of variance based all v d. f.,. independent of x
1
,x2~"'Jx.
n
W~
consider a
we reject;: tne largest observation in the
-
x(n)~X
S
)
> V(
o,V
ex
seque~t~al
ftr~t
procedure ill
stage if
w~ich
22
Th\;Js the fi,rststage is based on th~ sta~istic p;rQPosed
., .
Dav;f,.d (1961).
by ~u40 (1956), wh~ch is 0l'timt.im i.n a pa1tt;lce:;ul~r l$ell~e i,£ ther~ is at
.
m08~ 6ne o\.\tlier.
If ~(n) is declal;'~d an. o\,ttlieX'i the test ~$~epeat\i!d
~ r~i,l'ling
w;l.th
•
,2.4.2
i~
1:.•.=.
.ervat;i,ons ,
>
then K(n_l)
.
~f
(n_.l.v)
.
ex
V
also declared an
outl~~r.
l?erro1;'ll1ance Crit;eria
I
'i
We nl!)W
I
aSSurn~
that t;he only
a~d c9~sidp~ ~rqbabi~i,tie$
o~t1iers i~
that these
the
s~mp1e Q'l'e
o~se~ation~
Xl and x 2
yie14 ,ignificance in
to
order relativn.b1pB.
h91~~
I~ addi,tiqn we ne~d
the retatioqs above, but without vegar4
We use
2. 2 r 3 Oal~ulatiQnof P~i.~Q~aj.lce
The
res~~ts
n
,
L:, (x; ... x)
!l,;;;:1
~
(F.3)
2
...
.,
(2.15)
The in~qu~~ity (~.12)
tion
(2.5) $till
::;: ~ (~.~x)
n .. l K
2
+L: (x, -x )
l' .L.K'
1"
1,
2
l{
are ind,epengent.
t$
~r~£ied nume~ic~~ly frpm the ~able of rej~~
~Q~~tants. ~,l~ti,on (2.~3) is'4n ~j&eb~aic identity;
it
is
23
frequently useful
th~t
prove (2.14), we have from (2.4) and
.
n~l (~1-~)
\
A
(2~12).
the la$t step following from
implies
si~ultaneous
using a
A
= 0,
theor~m
2
+
n
~ (x.-x )
i=2 1 1
of Basu.
Hence
rel~tion
vS
~
v
ho~ds
Qu~senberry
for
an,d David
They needed the conclusion only in the ca$e of
but it fq~lows immediate~y for A f O.
dif~icult
with un-
variance, but can be accomplished by numerical integration.
conditiqn~
(2.16)
~
+
(2.8) also
(2.15) was proved py
Conclusion
fore proceeding with thiS,
'.
2
This means that, as with known
Evaluation of the performance measures is more
k~own
To
first stage significance of two observations
two~stage significan~e.
unknown variance.
are independent.
(2.l3)
n
variance,
t~e r~ght
the tWQ terms on
P
c
~
o.
~t
is helpful
~o
recall that under some
Pearson and Ghandra Sekar
<
(1936)
s~owed
\V2U
fU-2.
and
\
that
ae-
24
where Ix~xl(n_1) denotes the second large~t of the residuals in absolute
Thu~
value.
(2.18)
we also have
x(n_1)-X
'8'
<
which is to say that for any n and v, one can choose a rejection
.
.
constant so large that the
impossible.
(The
co~dition
Quesenberry ana David.)
Pc
=0
are as
si~ultaheous
for
For a
P
c
=0
= 0.05
rejections of two outliers is
is implicit in expression 3.3 of
and 0.01, the n and v for which
follows~
n
v
a=.05
a=.Ol
15
<
24
.".
< 10
< 18
< 8
< 7
:::: 15
'"
< 12
< 6
< 11
< 5
< :LO
< 4
< 9
< :3
< 8
< 1
<
..,. 6
0
< 4
~
J
S 3
0
Fpr n :;>·14 with a
v
= O.
.
::<::
.05 and n > 21
wi~h
a
:;=
.01.,
Pc i~
pOlilit:ive even if
25
To
ab~l1ty
calcula~e
~l
that
the
perfqrman~~~easuresJ we consi~er
issignit:f.,eant a,t the
fit~t st~~eJ
f~rst
the prob-
!.!.:
..
wheJ;e
i~
x12
i~
the meqri of ob~ervafi:ions ~c.ll-l<;li,ng xl anq
conf~~io~J
no qanger of
fo~r t~rms
.exP)iession the
we write for
in
v~n,v}
~quai:e brac~ets
xao . When
simply V.
there
In t~~ last
are m\1tual1y ind-ependep.t •
. 2
.
Th,e sum of 1:;'I;\e la~t WO of th~,e teIlM i,~ 4i$trib\ited ,as X with n-3+v
d. f
f
Let!tin~
K
WI!!.
-
1
-~
-
¥
;K::;::Ii:
-7;3··2;1
::;:
have
. Pr { xl.,.x > VS
.
2
+ Xn- 3·+]'
.v
which,;
a~ter
, .(2.19)
t
1 > 0 }
S9me rearrangl'lmep.t) becQt11es
Pr { xl-x
;l>
V8 } ;:: p~
(. t~J,
[~
Vt:.
;., (n-l)]
2
- f?t
1 t 2 - (n
\ - 1)t
·2
26
The
o£ the
disc~~inant
function is
quadra~ic
n-l .
wh ie h is positive, since V2 < ~
~s
regipnin the t ,t2 plane where xl
1
br~neh ~f
2
for fixed X
Con$eque~tly,
n
sigpificant
i~
th~
boundeq by one
By analogy,
a qyperbola.
(2.20)
In
~igur~
2.3
th~
situation for
I1
;::;0
11;1 v
= 0, a = .05 is
represent~d.
The quadratic e~Hression& in (2.19) and (2.20) spe~i~y t~milies of
axes, respe¢t~vely. Each fa~ily
2
2 2 2
1$ ind~xed by X~_3+v' 'l'he hyperbolas drawn are for \ 10'
'jJ' and
hyperbola $ intersecting
X~90'
where
x;
is the
~he
t
1
and t
~olution of
pJ; { '1.,2 <
x: J =
X.
P.
Asymptqtes, wl;1.ich
corre~pond to '1.,2 = 0, aJ;e also shown.
2
For fixed X the probability that
x~
is
signifi~antly high
is the
integra.! of the 'joint density of t 1 and t 2 ovel; the region bounded by
the branch intersecting the ppsitive t
ing to the fixed '1.,2.
that
axis pf the hyperbola correspond_
1
The propabtlities that Xl is significantly low and
2 is significantly high or low can be obtained in the same way,
still conditioned on X2 • The unco~qitional probabilities can Qe ob:R
tai~ed
by
tntegra~ing
numericallY the Conditional probabilities witl;1.
re~pect to the pqf of '1.,2.
1n this example the probability of simultanepus rejection of Xl and
'x
2 is zerq, which is seen
fro~
the
non-ov~r~apping of
the
hyperbo~as
for
27
t
------·-·-··.·--···--·-lL.
7 -t,
6
t-
5
I
II
i
I
~.,_jl--_'_ _~
Figure 2.3
Rejection regions for first stage of sequential maximum
residual test of section 2.2 1 for n = 11, v = 0, a = 005,
2
and X fixed, showing corresponding regions for rejection
of low outliers
0
28
2
all X.
Otherwise the probability of sUnultaneaus rejection, Pc" can be
integrati~g
camputed by
e:xample" the two
a~ymptotes
ov~rlapping
$imultan~ous
Also, in this
In fact, the probability, or
:re)ecUon of one )'ligh and one low obsj:arvation can be ob-
integrat~ng
quadrants.
regiqns.
in the, second aJ;l.Q. fourth quadrants virtually
This h not generally true.
oo:f,llcide.
tained by
over the
ove;!!, the appropriate
Othe~ exampl~s,
in which simultaneous rejection has non-zero
are shown in Figure 2.4.
p~Qbab11itYJ
tive branches of the hyperbolas for
Once the
probab!l~ties
overlapping regions in those
In these figures only thj:a posi-
x.2 5O
are drawn.
of first stage rejection of one or both of
Xl and x are established, calculation of P and P is relatively easy.
a
2
b
We have
P = Pr (
a
=~
P
b
= Pr
x
a~
Pr {
-x
Pr ( x
1
>v(n,v)s ) + Pr { x
-x > V(n,v)S
a
a 2
-x
> v(n,v)s ) ~ P
a
c
( x1-x > v~n,v)s ) Pr { x
'"'"
2
Probabilities of
Such
1
re~ectingone
-x 1
> v(n-l,v)s }
a
1
Qutlier after the other has been rej,cted,
~2-xl > v~n-l"v)sl }~
are the
on~y ~roblems
These probabilities can be expressed as noncentral t
remaining.
integ~als an~
29
·t
-------.---,.-.,,_._. ---------'1 2
6-t'.
I
(a)
n=21
v=O
a=_.O_5
L - . ._ _
Figure 2.4
-4 -1-
I
-5 .
-6~
\,"'
J
Rejection regions for first stage of sequential maximum
2
residual test of sect~on 2.2.1 for x = X~5
30
(b)
n=6
v =1;5
a=.05
."
Figur~
8..4 (cQntinued)
31
ev~luated
from t~bles as did David anq Paulson (1965), or obtained by
nl1,q1erical
integration~
integra~i~n
scheme
ties, if
~he·means
prope~ly
adju.ted.
rn tact they can be obtatned with the
j~st
of
Qiscus$ed for first stage. rejection
th~
With
bivariate normal variables t
samplesi~e
h:i
and t 2 can ~e taken ~s
1
cated approach, b~t provides
t
1
nqme',t'~cal
probabi~t-
and t
2
are
n-1 and one outlier, the means of
A and O.
Thi~ is an upnecesssrily cempli-
useful checks on numerical results.
so~e
The computational procedure for first stage rejection probabilities
t~
also applicable if there are two outliers with unequal means.
tM~ ~e.,
The
Pea~sQn
Pb~
of course, the last formula for
~eometric
representation used
and Chandra Sekar.
Because of
In
ab9ve, J;l.O longer holds.
herev~rifies
symmet~y,
the·results of
the condition that
hyperbolas d9SQ.t 9verlap in fhe first (or thir4) quadrant can be expressed by requiring the slope Yo pf the t
1
as~ptqte
to. be less than
one, -..,.
i,~ e.
i
-1 +
=
which
n.. 2
( n-1 ) [ :2 '" (n-I)
V
J
n-1
<
1
imp1ie~
verifY~ng
(2.16).
Also, requiring the slope, 1 , of t;heasymptotein
1
the fO\,1rth (~r second) q'j,lsdrant to exceed -1,
£p!.
-1 -
> .. 1
yields
32
1
> 2"
verifyi~g (2.17).
2.2.4
Asymptotic Properties of Performance Criteria
,
If the two outliers are from the same distribution, the·meanof the
biv~riate
normal distribution of t
"f"~ca11'
spec~
y at
t he
"
po~nt
and t
1
2
(n-2
--- AJ -n-2)
-- A •
n
n
if the hyperbolas for first stage rejection
A --> 00.
But if Pc
= 0,
approaches 0 as A -->
00,
~s
on the line t
~
1
t ,
2
Consequently if P > 0, i. e.
c
Qver~ap,
then P --.> 1 as
c
the probability of detecting even one-outlier
except in the case-in which the
~symptotes
coincide, for which Pa --> 1.
If the·me{lns of the two outliers are different., say Al and A , let
2
(n-l)Tj-l
n-l-Tj
Then, as Al and A increase, the mean of the .distribuci-on of t and t
2
2
1
moves along the line t
= yt 1 •
2
If this line does not inters-ect tlt.e
hyperbolas of either family, the probability of rejecting either outlier
~pproach~s
zero as AI' A2 -->
00.
The portion of the AI' A2 plane for
which detection of either outlier becomes increasingly difficult as
A1,A
a increase
Y
can be expressed as the region
o
1
< y<--.
Yo
prOVided y < I.
If y
o
0
> 1, there is no such region.
tive values· of yare as fol'lows:
o
Somerepresenta-
33
n
v
V. 05
Yo
6
0
.81.5
.474
2
.732
.662
4
.666
.826
6
.614
.971
0
.706
.803
5
Jioo
1.128
10
. "528
1.. 396
15
.. lj.?/
1.,622
11
The value of Yo depends onty on n aQd V, not specifically on v, although
for given ex, V depen<;ls on v,
2.2.5
Numeric~l
Results
The :foregoing discussion provides some
indi~at;i.on
of the·circum-
stances in which this sequential procedure; might be effective in a
$ample containing two outliers.
A rough
indicat~on
of what is likely
to happeI1- at the first stage for any Al and 11. can. be seen from the
2
fami,lies of hyperbolas.
Numerical results for P , Pb , aJ;ld Pc have been
a
obtained for a few cases and are shown in the followiJ;lg figures.
Figure 2.5 sh,ows performance charac tex'is tic.s for n = 11 and n = 21
for unknown variance wit4
E
= OJ
ex
simultaneous rejection of Xl and x
ch~racteristics
Fi~e2.6
approach zero as
is also fO'r n
:=:
2
:=:
..
0 5.~ Al = A2 •
In the firs t case
is impossible;) and the performance
A increases.
11, but here Al
:=:
2~~2.
Again simultane-
ous rejection of two ouuliers is impo66ible y but P and P approaGh 1 as
a
b
A increases.
Performance for known variance in this case is also shown.
·75
n=11
v=O
0:=905
·50
.
"
.25
o.,.::::;_.L.-_illlli:::.-...L---'---I---.J,--......---"----'---'
.1
2
3
5
9
7
A
T
1.°°1
·75
n=21
v=O
0:=.05
·50
.25
.1
2
3
5
A
Figure 2.5
Performances P , Ph' P of sequential maximum
a
c
residual test of section 2.2.1 when 0: = .05 and
2
2
x1,x2
~ N(~+A, ~
), x ,···,x n
3
~ N(~,~
)
1. OO....---...---...----r---.,...---r-----r--.....--..,----r----,.
Unknown Variance
·75
.;0
·-.
.25
p·=o
c
0
1
2
5
3
9
7
Al
1 0 00
Known
Variance
·75
.25
~
··
0
1
2
3
4
5
6
7
8
9
Al
Figure 2
0
6 Performances Pa , Ph' Pc
o~ sequential maximum
test of sections 2 2 1 and 20101 when
212
ex = .05 and xl '" N(I-l.+A1 , cr ).? x2 "-' N(I-l. +'2 AI' cr ).?
2
x ,··0,xn '" N(I-l.,cr )
3
r~sidual
0
0
35
36
FigilP? 2,7 shews a compar.ison of the 'lla11;,8 of in.ternal and external
d.~.
wi,th respect; to the
,It appears
d~tecti~g
characteristics, withAl
= "-2'
external d.f, are not necessartly more valuable in
outliers.
TwO-Sided Alternativ¢~
2.2.6
~
t~at
performan~e
We consider here the problem with unknown variance of testing for
Qutlie.G \"'bi.ch !flay have shifted by differenL amounts, or even in dif·,
ferent directions.
OUtllU
The first stage uf the pJ::ocedure
i.!:l
'L,0
c:tas,<:l a", an
the observation with the largest absolute residual if
_-
-----...,',
il
where
w~n,v)
is the two...sided rejection constant tabulated by Quesen-
berry and David.
If an observation is classed as an outlier at this
$tage.. t:he test is repeated on the remaining observations;> etc.
Unlike
tJ;1e con::esponding test with known variance, this procedure can reject
an observation in the second stage in the opposite direction from one
,rejected i,n the first stage.
In fact simultaneous detection at the
first stage ordiI).arily implies sequential detection.
An explanation of
conditions for which thiais true is given in Appendix 7.1.
Ear'Herit was pointed out -that there are two families
ot hyper-
bolas wh;i.ch relate to the efficiency of the maximum studentized residual
test with two outliers shifted to the-right.
nigh a,nd one lowo1..ltUer.
This is also true with one
For means shifted by the-same'amount, A, in
opposi,te directions,? the probability that both outliers &re detected at
the
fir~ t
st<;lge approaches one as A ->
00
if the two families overlap
37
n=2:\.. ~__#)II
·75
V::;()
.50
>.
.25
J,.
2
3
4
5
7
8
9
A
·75
n=ll
v::;:lO
p.=21
.50
v:;:;()
n=6
v=15
1
5
7
A
Figure 2.7
rerformances Pa ~ Pb' Bc of sequential
maxim~
tel'! t of section 2.2.1 when. n+v = 21 J or
2
2
x ,x2 rv N(Il+A,o-)J
xyo •. ,xn
N ( Il,o-)
1
'V
residual
= .05 and
38
in the second and fou:crh quadrants.
The· c.ondi.tion for the hyperbolas to
If
tecting even one outlier approaches
W·
2
>~
° as
the probability of de-
:J
A -->
The portion of the
00.
A ,1 plane for which the first stage detection pI·obabili.ties approach
1 2
for A ,12
1
->
00.. ).-2/Al cl)nstant is 01:: tained from
<
".
°
<
y
Smne rqHesentative val.l.Les of y 1
hI:':"
the twu ... sided situation are as
follows:
n
6
11
A c.mnparison of
th~se
\'
W
.0,')
0
. U'ltl
-..
2
. TIl
.. 0821
4
,708
.- .997
6
.657
-1.1 45
°
.744
... .897
5
. 638
.-1.205
10
,,566
-1.449
15
·51'3
.-1. 659
l
1
.604
results with those on page 32 indic.;Ites that
convergence of first stage probabiU.ties is more favorable in test:,i.ng
for one high and one low outlier than in testing for two high outliers.
In this statement more. favorable convergence means that: convergence to 1
OCCUl:S foY a greater portion of the appt'opriilt e quarter plane of AI' A •
2
Conv~rgulLe
liers ei
l. h"~r
is still
les~
favorable' tf '.vE'
a}"c~
both pwsitlve or bc,rh. negat.ive.
CC1H
(,rued wit.h two out-
This situation would
39
require
'U.BE:
of the rejecti.cm constan.ts for
lx-x If)
S ,n. ..? which are larger
than for (X(n)-x)/s, but for the first stage probabilities to converge
to 1. we would have to have overlapping hyperbolas in the first and third
quadrants.
Figure 2.8 shows the perfcrmance of this
'.
te~t
against the alterna-
tive of one high and one low outlier gi.ven in (2.1lj).
F;jr n
convergetlce to U Ui ""luwEt chaf' in the ,:r.e-8.L<ied .::ase.
expected] since 'Y
tW0;:;Llcd casc.
o
:c'
ThiS might be
.803 ' .. ::b.t-: ,~,·ne si.ded... case and. 1'1 =
I'vl n
L;.
11, the
-.897 for the
Ui"; a.,ympLYtd- favor the one-sided case.
Neverthc t<:,ss the two·.. si.dE:d r··l'rlolmanc.e seems
the lar,'ger values of 11, 1.nd LtC/til'i',
Uk:
to
be more favorable fc,r
difference in correlations of
residuals is important.
2.2.7
Ext.ernal Studentizat.ion
One way of escaping part of the masking effect of multiple outliers
is to s(::udentize with an independent variance estimate.
This procedure
is obviously superi.or in some ci'rcumstanc.es j since the test based on
combined internal and external studentizationmay have decreasing probability of detecting even one outli.er as 11 increase.s.
It may happen
that external stl,ldentization is preferable in other eases also y as shown
by the examples of Figure 2.9.
With results for additional cases, one could perhaps determine for
each sample size the mi.nimum
be in same sense superior.
'V
for which external studentization would
The/i.ncentive to do
since if one knew the alternative
iO
£0
i.s
somewhat limited,
Spt·cHy exact I.y two outliers there
are better tests, as discussed in Chapter 3.
---r-_----.
1. 00 r--"-"'-'~--r---r---,-"'--r--~........
0
1
~
-
2
4
\
7
5
I
I
8
9
8
9
:A.
.
"
1.00
n=21
.75
ex==.05
.50
.25
1
Figu;re 2.8
3
4
6
7
Performances Pa , Pb'
PC ofsequentid max:itnum
,
residual test of sect~op
2
2.?6
2
Xl ~ N(~+A,~ ), x2 ~ N(~-~,~ ),
2
x3,···,x
. n
~ N(~,~·)
when ex
= .05
and
40
41
1.00
(a) n=11
v=10
.75
/
.50
.
/
/
/
/
,
.25
1
2
3
4
5
6
7
8
9
A
-
External
External and
Internal
(b) n=6
v=15
·50
1
Figur~
2.9
2
3
5)...
7
9
Performances Pa , Pb' Pc of externally studentized
sequential maximum residual test of section 2.2.7
2
when a = .05 and x1'x~ ~ N(~+)..,~ ),
~ N(~,~2)J showing th~ performances of the
n
test of section 2.2.1 as dash~d lines for comparison
x , .•. ,x
3
42
1.00
..
,--r---r-...-r--,.--r--.c:;aIII.....-,----,
·75
n=11
v=5
.50
.25
o ~:--JI.oalI!:!!IIli!::II---L---I.._...,L,.=~_-J.
1
2
3
4
5
A
Figure
2.9
(continued)
6
_ _.J..I---l..--J
7
8
9
43
The numerical C'al<.:ulati.on (,·f Pa' Pb " at\d Pc for external
stud~ntizati0n w~reobtain~d
by rrumet1cal integration.
For this test
procedure'the first: and second stage te$'C statistics (xl~'x) /sv and
(x2~xl) / ~ l are 11,'C l,ndE?pender.. t.
P
c
I\l.re
~a~dlJ
pdf of s •
v
o.et.~.nr,i~~d,
'For fix~d
and,.an be
i::l
v. h~1W'Iii·ve:r.\, Pa' Pb , and
9
integr~c.ed
w:l.th reepee..- L
the
44
')"
MURPHY'S TEST FOR TWO OUTLIERS
3.1 Introduction
Suppose that in a sample of n normally distributed observations J
N(~+A,~2)J
K observations are distributed
N(~J~2). Among tests invariant under change of scale
are distributed
and location, Murphy
largest observations
;{
(-, I)
when
(1951) showed that the most powerful test of the
- 0 ~gJiQbL rhe alternative A
hypothebls
"- . ) "
A ~ 0, and n-K observations
fJ
outlt~x8
>
0 is:
declare the K
if
+ x(
)1' x lj.,)
n-K+1)
- KX
>
b
,-.
IT iil
nJa
known, aqd
(3.2)
('
--n:, a
s
if cr is unknown.
In this chapter rejec(.lnn constants are presented for
some tests with K
= 2.
tions are given.
Finally, simpler proofs of some of Murphy's results
When this is not feasible bounds and approxima-
are included.
3.. 2
Known. VariancE.'
3.2.1 Rejection Constants
For the case of known variance
we
aSSume
2
UC'·.L
is the development of suitable rejection constants, b
Pr ( x(n) + x(
n·-1 ) ,.
.?x " h fi,a
when in f.'lct: there are no outJ!yt'.':,
The first problem
• such that
n,a'
0'
UE;ing an ilpj»"i.lcb Flmj.lar to that
of Gruhb,'; (195)) and Nedr (1948) i.n obtain.ing the distri,bution of the
la.rges t
1 ~.;,
LJualy a method of olnaining ex for given b will be developed"
Then b corresponding to a prescribed ex can be obtained by inverse interpolation.
Following Grubbs and using his notation in this section y the
density of the ordered observations xl < x
o(n-J) ~n - -x.1..
-
2
< .
xn J
x,.)
:ir
Q
+
0
< x
n
is
(n-l)x
.
n
n
we have
f (qM ., " . ,11 , 't)
c'
. n'
n-IJ.
)
e
o
q
<'
)',1
Tl
2
46
and
1
_. '2
nl
- (2:rc) (n-lJ72
n
Z
'T1
i=2
e
2
i
Making the further transformation
lr(r-l)
ur
'I)
r
r
we have., stU.1. followi.ng Grubbs and Nair
1
r (LI
n ~
._)'1
'J
,-'
"
"IJ
- '2
Vii-
. 'I) /2
j"
e
(r-l)u
The
UiS
are cloeely related to residuals.
u
< ru
r-l -
r
In fact
",
x
r
r
where
x-
x
r
1
+
+- x
r
r
Grubbs and Nair integrate out. u ,
2
;
° " ".;
Iln,~,l in (3. Ii).,
Far our purpose we
want the joint densi.ty of U
;Ju, and int.egratE ('ut pnly u2:,."oJun_20
n" .. } no
.
Us:i,.ng the following functions
8S
u
I
()
dnflned by Grubbs J
.)
e-
(,'"}' dx
n
Vn
F (u) - ~n- 1
n
V
f
u
1 (~ x2)
~
n-l
e
F
1
--
V2rr
0
(
n)
--x dx
n=1 n-l
we obt.ain
f
Il.~
....
(° " " " . U n )
3
'''J
i"
I.,
i(u ", ' .
4
"iJ
p
rn-n:-l rrr
e
--2-
'j2
"I;,
\f2- \f2;
!
'.T.,
]VJ
J
(2Jr)
etc., until finally
f(
. un_I' un )
Ii:
n(n .. J.)" _~_
21l
n-2
;=
n
i
L'
L'
11",,;:"
n ,.. 1
\n~'L)
-
U
l[.n-l II 2
- -
n a
+--U;..J
)e- 2 n,·2n-l n-l i:l:
n-,l .
n
o<u fl·· 1 <--u
-- n·.. l n
The functions F (x) have been tabulated by Grubbs for x in intervals of
n
.05 ,and for n
s: 250
This means that numerical Integration of feu
is fairly easy.
Recall that
u
n
x ,..xn
~-,
.- x
n ..
J
·~x
n,~J
(if
n ., I
.,x)
j
1
(x ,,):)
n,.·.·]· n
"
n-
IJu)
n
48
We a'.re
i!li:E;re~red
&
n
+
in t.h"
r",gion where
X
2i >
n-1
-
h
or
u
..
+n-2
_"
n-1
n... 1 ""n
>h.
'l''? facUit;ate evaluat:i.onJ let
~I,,·l
11
n" J
v
_
r .
\'i "
ll' I ' l l
Hence
1 n".2 2
V
c n-·1
1
'0 ~~
F
'tl
.)
.. t
tv')
.I.'
l•
whe:re
1
We now want to tntegt:a te f (v 1'- V;J over thE' J'f'gi on
n.::2
V
n::I l
i. eo
V~
c:
+
>
rememberi.ng alsq that
n~,2
n:I
I'<IX
V",
n
('
l(V )
2
>
V._,
r::'
V
1
Hence for any b the p-robabtU.ty that (3.3) holds when there are no outapproxi~ated
liers can be
..
by
1 n... 2
- -- -
V
2
2n.-11i[1_
a:(b) ;::
~--_.,-.-.
·f,
_..._----
Rn,.. if
\i
I
I
I
L
( 111'1 X
n-.2
V
n~l
V'n:1 n.,2 - V
(b
)
1-
]
VI.
1.
I.
t
0', Rnd
can then be obt;li.l.wd bV
to b
tl,Ja:
i.nlerpu!nti"l'I"
jI
\.viH·
I(lUlld l,b;)!
ex =
.01
approximations
with, \,lI'j.cert<;lhlt..Les only in the foun:h sign U lC8nt figure can be
01, N abolltYL and V1 tmi,tably chosen for the
-1
lksl,llt$ are pl('~l'nll'd in Table )" 10
Qbt:ainecl with L\
part.iculaT
11
0
X.
I.
.'0
N(II.4A" 1)
K.
I
Pt-
t)1
1)( .
L
i·
u., (y
Table 3.1
Rejection constants b
for sum of l/irgest. two standardized
residuals
n)a
a
..
4
5
/
u
'7
n
u
In
.Ll
~
.01
2.388
2. 80 5
3.098
3.326
)·509
3.661
791
2.934
3.380
3.693
3·932
4.122
4.279
4.411
4.526
4,,625
4.714
4" 79)-1
4.866
IL931
40990
5.045
5.097
5.144
5.190
5.232
'5. 273
5·310
5.346
5.379
5.410
":1
"
1"c
Il"I)iJ'j
13
14
4.091
4.1'70
4.241
4.307
4.368
4.426
4.479
h.526
4·571
4.613
4.654
)~. 690
4.728
4.764
1'"
.:J
"
.05
-n
16
17
18
19
20
21
22
23
24
25
26
27
4. '(9f.)
51
witp.,mr. t'egard to whe-cheT other x s are si.gnj.:ficant.
Since xl + x
j
•lS nOl"JlJ.8 J".''t (l.strl
I'
°b'
' h mt"'u,
....
ute d vat
..
tersect at
SUPPO"H~
n ..
~:.
n
one point.
,
Tlwn
j".
1I
CQm:pari.so ll
.
2(n-2)
an d va.rl.ance
n " the per-
a, the curves for two values of
e~actly
I.
h'
1"
n~
say n
l
and u2J in-
The 1 coordinate of this point is
Ii,
nu,!Uce
for ill will be better than that
'rnance for n
·,t til,} M1Jrp\'lv
- 2X
Some re£ults are shown in Figure 3.1.
formance i.s easily calculated.
For a particular
tn
n
,.2).\11.
C\~n
2
I.
I'b t
lInd 'he
2
i.s better.
"H;'lfl)(~nt"'al
test of Chapter 2
fonnanc"" of l,hp t·tl.rrphy test is nor dirl'! L ly compara'bJ'" wiTh Ph of the
previous chapte'r (P
b
being t'ht· [Jtob.:J.bi I ity of seq\Jenl.lal detccti.on of
both outliers) with the same ex.
observations reje.cted in
sam.plf~s
This Is because the average number of
wit.h no cmtli.en;; J.8 dUferent.
3.3 Unknown Varianc.e
H if
o
.,.
x
rJ"O:
52
1.00r----r---.----r-----:::::J;;iiiilll--,
·75 -
ex= .05
.50
.25
1
2
3
5
L 00 r------r----,-----;-.---.,...--.:::::iiI__--..
".75
ex = ,01
·50
.25
1
Figure 3.1
2
3
5
Performance of Murphy's test for two outliers
when ~1,x2 ~ N(~+A,l), x ,ooo,xn ~ N(~,l),
3
1>0
53
test~
TO ev;,luate the perf.Grmance of the
c
n,ex
or even to apply it, constants
are needed such that: 0.')) h,~\lds with probability
no outliers.
ex when there are
The general solution of this problem is difficult.
some cases, however, it happens that at most one pair of
rejected with a test of the form
(3.5),
XiS
In
can be
and a simplification results.
This oCGurrence is a generalization of the masking effect mentioned
earlier o
Previously it was e,nough to know the upper bound for the
next largest ,nHd"'DtJzed t(I;d,duaL
Now we need the corresponding bound
for the next latgest sum of K residuals.
From this bound we can obtain
immedIately the c,:Jnd,itLm nuder \oJhich Murphy's te$t can reject at most
one pair of
3.3 2
0
XIS"
Condition for. Rejection
uL
<-lL Mo~c
One Set: of K
XiS
---'~------~""""~~"'-----~~~--""""'~-~_·"-~"·-""--~-----~"""'---~··r_.<-,~--
The fol10\oJ:tng bound for t,he SE:cmld laxgest sum, of K studentized
Tesiduals from a sample of n was given by Murphy"
The derivation
in~
eluded here is simpler than that of Muxphyp and uses an approach similar
to that of Pearson and Chandra Sekar (1936) in t,reating a single ordered
~tudentized
residual.
'K
There are 'D) distinct sets of K residuals 1n a
sample of n observations and corresponding to each a 'value
..
T. -
1
where the indices j 1-' j 2:'
0
0
• •?
jK are chos en from 1,2, ••• J n.
< T(m) the ordered Tis.
The01;'em~
54
= [(x(n) ~x) + (x(n_1) -,x) +", + (x (n-K+2) -x) + (X(n_K) -x) ~2
Proof~
n
2: (x.-x)
i=l
where
1
n
x- =
2:
xC") / K-1
1
1
i=n-K+2
x-
(n-4) S
n-K-1
= 2:
xCi) / n-K-1
2
i=l
2
0
n
n-K-1
=
2:
i=l
(X(i)-x 2 )2 +
Hence T(m_1) attains its maximum when
possible, i,e, x( K ) = x( K)'
-n- +1
n-
2:
i=n-K+2
s~
X(n_K+2) =
"0
(X(i)-x 1 )2
= 0 and x(n_K+1) is as small as
~,~,
= x(n) ,
The problem is unaffected by scale and location,
- x
(n-K) -
,
The only problem remaining is the
spacing between the three groups of XiS,
x
2
- 1
(n-K+1)-
~(n-K+2) = ". = X(n) = t
.
~o
let
55
We want to
m~i..mize
in t the functi.on
.( 1+(K-l) t_ KiJ 2
Z(~Cx)2
wh~ch
after simplification is
t
2
(n-K+l) (K-l)n + t[ .,.,4n(K-l)
Differentiatin~ with
J
+ 2n(n-2)
respect to t: the maximum is found to occur at t
consequent~y
and
<
2
2J,lK _ 2K
2n
With a little algebraic manipulat;ion this agrees wi·th Murphy's result.
For
th~ two'outl~er
case we have
(3.6)
Thus
(Xi-X) + (xj-x)
I
V
c
- 2
n
-2:;
>
i~l
(x.=:il:)
1
holds for at most one pair i,j provided
c ->
We
V3n2n- 8
then hav~ by symmetry under H
o
=2"
(3.8)
> c }
.
~
The right hand
~ide
of
(3.8) can
be evaluated as follows~
x
n
,,( x.~x
.,,)2
~
i=l
1,
= Pr (
i
(3.9)
2
(.§. _
c
)
f 2 (n-2)] (x -x ) 2
{
= Pr
2
2
( (.§.,.
c
n
n.,.. 2) t
= Pr
( t
n
>
V
.
2
> c2 ;
xl > x
2
C
c
2
n~2
}
>
x
2
2
> c ;
/n- 2
n~2
Ii -
where
X2
n-2
I
1
. . n
= P'r
1
}
1
+ x
2
2 > -
x
57
n
I
r: xi
=
0,-2
ir=3
and t has the t dhtributiQn with
n,~2 d~grees
of freedomo
'rhus we have
'.
that if (3.7) holds, then
(3.10)
a(c)
=
(~) Pr (
t
>
c
2-
V
'rhe restriction on c which allows at m0st one pair of
sign~f~cant
is equivalent to a restriction on a.
The
XIS
maxi~um
to be
a which
can be evalt,lated precisely with (3010) i;s obtainable f;rQm (3010) with
c
=
V3~~8
.
There!:iults are as follows:
-
0,
M;aximum
a
,
100
0901
.672
0464
0304
.191
0115
.0 69
.040
0026
.013
.007
0004
3
4
5
6
7
8
9
10
11
12
13
14
15
These values show that
a~curate
with (3.8) only for n < 10 if a
rejection constants can be obtained
= .05
and n < 13 if a
= 401.
3.3.:3 A Bound for Krror in ex in Other Cases
i
,
Th.e boun4 for error in
the·clas~
ca~e,
a derived below may be useful in extending
of tests slightly beyond those for which c
2
> 32~8.
In any
we have by Bonferroni inequalities
> c ) < m Pr ( Tij > c
Pr (T
I\lax
where
m
-
x. ···x
:::=:
.f- X].'-X
-;:======
n
x: ex . . j{) 2
1
i
1.
V
and the last surn in (3.12) ~xtend$ over all d:j..stinct pa:j.rs, of which
there are (~).
We call T
and T
an overlappi.ng pair of T t s i f the
ij
Kf
pairs (i,j) an~ (K,'If) have one element in common.
nQn-overlapping.
Otherwise they are
Let the numoer of non.•.overlapPl.ng pairs be N and the
o
nqmber of 9verla:pping pairs be N •
1
Then
_ n(n_1)(n~2)(n-3)
No -- (n)3
4 - -----cn
since we can choose the four
ind~ces
.n) w'ays, and each of these sets
in (4
can be divided in'l;:o two pairs in 3 ways.
Similarly
since the t'Qree indi,:es can be cho$en ~Q (~) ways and the common element
...J
59
can then be chosen in three ways.
It i.B eas:Uy verified that.
The problem now is to compute an upper bound for
..
The last term is the more important.
It will be shown later that the
term with naIl-overlapping TIs is small in the cases of interest.
Let
Xl
3
x.
1:
"3
i""l
~
n
x
2
8
2
==
L;
8
then
2
1
2
Z
~
. 4: n-3
~==
n
==
L;
i==l
8
x.
-
3
==
L:
i::::l
(x.-x) 2
~
(x .~xl) 2
1.
n
=
L:
i==4
(xj,,'-x ) 2
2
"~') 2
c:x
( xl + x :::)
2' ,
S
=
60
- + x ·-x
- - 2 (- ))2
(x ,=x
x-x
1
1
2
1
1
--,...,~.,._--;:;:-----;=:----;:::---2
2
8
f, ~2, + 8
1
2
=
".
where
2
For any given X and L\.. the. conti guration
min(T
2
12
0
f t "t "t
1
2
3
that m.aximizes
"
, T~3)
is
-2
t
1
= 22
(say)
.
Hence
The value of Z which maxi.mi.zes thi.s function
.l
+ ~2
bKLi-.
1.8
Hence
Thus for
nQn~~+o
jj.2
"l
probabj.lity that both
212
e~ceed c. we need
13
T12~ T
1
>
2
K
c
2
1
.- 1
"6
~
or
t
(3.14)
2
n-4
n-4
>
c
wner~
for
=
2
t _ is student's
n 4
K
2
1 - 1
- "6
t
n-4
4(n-3)
1
3n (2
c -"6)
-1
with n-4 degrees of freedom.
An upper limit
~he probability that T~2)T~3 both exceed c 2 can be obtained by
fi~cling th~
2
prQ1;>ability that (3.14) holds, .!:..~. the probability that
2
X and t;. a1;e in such a ratio that simultaneous significance is possible.
We have
Pr { T212 > c 2 , T2 /~ c2 } < Pr (
..
}
13
Now LHS
= ~
Pr ( T > c J T >
12
13
~
.
- 1
} + 2 pr ( T12 > c, T < -c
13
}~
so
that
Pr ( T12 > c, T13 > c ) < Pr (
t
n _4 > \
n-4
}
.
62
We now cO'P-sider bhe simultaneous signif:Lcance of non.. Qverlapping T 1 so
Since T
and T
are n?gatively correlated,
12
34
bounds
PI' ( T
max
(3.17)
>c
}
<
Pi: ( t 2
n~
ill
J-2-n
(3.18)
PI' ( T
max
>
>c
Pr
ill
t
n
_2
c
> \
2
>
c
2}
-n.~2
\j c~
\
2
n
.~
c
n=2
)
}
=
The$econd tenn on the right in
_~reqqent1y
Pl;" { T
be disregardljld.
>
12
J
c
te~
in
N
e j ::::;
qu~tion
ci : : ;
02
ill
which is
>
s~all
reJ.atively small and can
I;or c; will usuall.y be established
is appraximately a/m,o
P-,: ( T
12
. the
(3018) is
If
aim
becomes
(n-2) (n-3) a2
2n(n-l)
0
1
if
<-2"
enough to be ignored in most cases o
$0
tl;1at
6.3
The followi.ng tabl.e was obtained. using ().17) and (3.18)
n
ex
ex
-
max
---
c
min
.9926
"1375
.02
100489
00550
,,0543
100538
005)0
.0497
:1-5
1.00.37
.050
,,015
C!.J
100064
.0100
.0054
11
"-
0
F:tOl1l these J:'esul ts i.t appears that the simultaneous significanc.e of two
TIs can safely be ignored only sli.ghtly beyond the range of n :for which
Simultaneous signiflcanre is illJf\oBsihle.
Fot:' example the uu,certainty
in ex :\.I\tended to b€ .0':) 1.1': tlf.;"lgligible tor
11
n
=:
'Cc
11 but considerable for
15.
3.2. )+
Performance
.
,......
As before VIe consIder the slpgle U1eHsqXE' ot perfl)rm8,nCE:~ which is
the probability of rejectlon. of the two
out.1ier~o
In this case we
cpmpute
e }
where
X.
1.
For the-cQmbipatlqns N"n: cOlwi,lplcd" the
UlH01L;.'d,tU'y
in ex l.,s eIther. nil
64
aJthou~h
or negligible
two
TiS
was disregarded in establishing c.
wer~9btainedwith the
(1963),
the' possibility of simultaneous significance of
help. of noncentral
and are· shown in Figure
3.4
Some results of performance
t
tables of Locks at 01.1.
3.2.
An Op~imum Test for a Predetermined Number of Outliers
Suppose x ,x ,.o.,x are independent;> normal observations with
n
l 2
common y~riance ~2 aqd means ~i.
~i ~ ~
Ro :
= l,2,o.oJn
~
We want to test the hypothesis
against the alternative that the means of
k.<: n Qf the'observations have shifted by amounts propo;rti,onal to known
cons~antso
Thus under H
a
Ex
n
-. cx. A + fl
wher~ CXl,a2Jo.o'~n
~
n
are known and il' ••. Jin is an unknown permutation of
theip.tegers l,2,ooo,n.
Consequently there are
n~
alternative·hypothe-
ses.
The
unde~
the restrictions imposed later they are not all equal, and at
(XiS
are not necessarily distinct, and some may be zero, but
leas.t one'm1J,st be nonzero.
Without loss of general:1ty we assume
We. reqUire a pn)cedure for dec.i.ding between H
o
and Que· of the n,! alternative hypotheses corresponding to the n! permutations •. The' prG>cedure. should have the foUowing--p1:'o.p&rtiesu
1.
The l'r(;)bability of choosing H when H
o
0
ts true shqul\ibe l=CXo
1.00
0:=~Q5
.75
.25
1
2
3
4
5
6
7
8
9
A
1.00
·75
n=ll
0:=.05
·50
.25
1
2
3
4
5
A
Figure
3~2
6
7
8
9
Performance of Murphy's test for two outliers
2
2
when x ,x ~. N(~+A,~), x , •.• ,xn ~ N(~,~ ),
1 2
3
A>O
2
The
0
procecl~Te
is unc.hanged if the same Il't!mbe:r. is ,added to each
obse.rvati.on o
The procedure is uncbanged i f each obs€:Ivati.on is mu.ltiplied by
30
the
~ame
posit:i,ve number
0
The probab1.Hty of chr:)oaing an alternattv'€; if that alternative
:f,s tt'1,le is the same tor each of the
n~
alte;r:natives o
The probab1.li.ty of. a correc.t dEH.isl':Jn 1:8 max l.mum
Consider first the problem of testing whether a slippage has occurred corresponding to a particular permutation of the α's.  For simplicity of notation suppose this permutation takes the subscripts in their natural order.  Hence, under this alternative, E xⱼ = αⱼλ + μ, j = 1, 2, ..., n, and we want to test

H₀:  E xⱼ = μ,  j = 1, 2, ..., n,

against

Hₐ:  E xⱼ = αⱼλ + μ,  j = 1, 2, ..., n.

In this case the likelihood contains the factor

exp{ −Σ(xᵢ − μ − αᵢλ)² / 2σ² },

and consequently the statistics Σxᵢ, Σαᵢxᵢ, and Σxᵢ² are sufficient for μ, λ, and σ².  Equivalently we can take as sufficient x̄, Σαᵢ(xᵢ − x̄), and s₀², where

(n−2)s₀² = Σ(xᵢ − x̄)² − [Σαᵢ(xᵢ − x̄)]² / Σ(αᵢ − ᾱ)²;

here s₀² is the familiar estimate of residual variance in regression.  For invariance under location change the test must depend only on Σαᵢ(xᵢ − x̄) and s₀, and for invariance under scale change it depends only on the ratio Σαᵢ(xᵢ − x̄)/s₀.  But this is equivalent to a t statistic with noncentrality parameter proportional to λ[Σ(αᵢ − ᾱ)²]^{1/2}, and the test desired rejects H₀ if

f_a(t)/f₀(t) > c,

where f_a and f₀ are the noncentral and central densities of t.  We use c as a generic constant in the following statements.  It has been shown (e.g. Fraser (1957), p. 103) that the preceding statement is equivalent to

t > c,  or  Σαᵢ(xᵢ − x̄)/s₀ > c.

Using the definition of s₀², we can say that the test rejects H₀ when

Σαᵢ(xᵢ − x̄) / [Σ(xᵢ − x̄)²]^{1/2} > c.

Returning to the decision problem and applying the generalized Neyman-Pearson lemma proved in the appendix, the procedure is:  choose H₀ if the maximum of this statistic over the n! permutations of the α's is less than c; choose H_m if the maximum exceeds c, where m is the permutation for which the test statistic is maximized.
Ordinarily the αᵢ's are not all different, and hence the n! permutations are not all distinct.  At times only a restricted set of permutations is relevant because of the nature of the problem.  The following examples show some of the more likely applications.
Example 1.  The test for K outliers with the same distribution.  Suppose we want to choose between the null hypothesis of no outliers and the alternative that K of the observations come from a distribution with mean μ + λ, λ > 0.  Let α₁ = α₂ = ⋯ = α_{n−K} = 0 and α_{n−K+1} = ⋯ = αₙ = 1.  The procedure selects the K largest observations as outliers if the test statistic exceeds c, and otherwise accepts H₀.  The usual test for one outlier and the test for two outliers considered earlier in this chapter are all special cases.
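A minimal sketch of this selection rule, written in terms of the statistic given for the general procedure above (the sum of the residuals of the K largest observations divided by the square root of the total sum of squares); the function name, the sample, and the rejection constant are illustrative only, since the constants c are not tabulated here.

```python
# Sketch of the selection rule of Example 1, assuming the statistic of the
# general procedure: sum_i alpha_i(x_i - xbar) / [sum_j (x_j - xbar)^2]^(1/2),
# maximized over permutations of the alpha's.  With alpha = (0,...,0,1,...,1)
# the maximum is attained by the K largest observations.  The constant c is an
# input and is not derived here.
import numpy as np

def k_outlier_test(x, K, c):
    x = np.asarray(x, dtype=float)
    resid = x - x.mean()
    total_ss = np.sum(resid ** 2)
    stat = np.sort(resid)[-K:].sum() / np.sqrt(total_ss)   # K largest residuals
    if stat > c:
        idx = np.argsort(x)[-K:]                            # indices of the K largest observations
        return stat, sorted(idx.tolist())                   # declared outliers
    return stat, []                                         # accept H0

x = [9.8, 10.2, 10.1, 9.9, 10.0, 13.4, 13.9]                # hypothetical sample
print(k_outlier_test(x, K=2, c=1.0))                        # c = 1.0 is illustrative only
```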
Example 2.  The test for one outlier to the right with some observations known to be nonoutliers.  Suppose the last r observations contain the outlier, if any.  The nonzero α may be taken as 1, and the only permutations of the α's which correspond to distinct alternative hypotheses are the r permutations containing 1 in one of the last r positions and 0's elsewhere.  The procedure chooses the largest among the last r observations as an outlier if

max_{i > n−r} (xᵢ − x̄) / [Σᵢ₌₁ⁿ (xᵢ − x̄)²]^{1/2} > c.

This test is easily generalized to consider the possibility of a number of outliers greater than one.
Example 3.  Paulson's result.  Suppose the observations are in r groups, with K observations in each group.  We want to decide which group, if any, comes from a distribution with mean shifted to the right.  Let K of the αᵢ's be 1 and all others 0.  The only permutations of interest are those in which the nonzero αᵢ's fall in the same group.  The procedure chooses the group with the largest sample mean x̄ᵢ if the corresponding test statistic exceeds c.
Example 4.  External estimate of variance to be included.  Suppose that in addition to the data considered above we have an external estimate of variance s_v², based on v degrees of freedom.  The sufficient statistics are now x̄, Σαᵢ(xᵢ − x̄), s₀², and s_v²; but x̄, Σαᵢ(xᵢ − x̄), and

s_p² = [(n−2)s₀² + v s_v²] / (n−2+v)

are also sufficient.  After imposing the requirements of invariance we find the test must depend on Σαᵢ(xᵢ − x̄)/s_p, which again is a t statistic, now with n−2+v degrees of freedom.  The external estimate of variance is essentially the same information as contained in Kudo's third group of observations.
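A minimal sketch of the pooled estimate just described; the helper name and the numerical values are illustrative, and only the pooling formula itself is taken from the text.

```python
# Pool the internal residual variance (n-2 d.f.) with an external estimate (v d.f.),
# as in s_p^2 = [(n-2)s_0^2 + v*s_v^2]/(n-2+v).  The arguments below are hypothetical.
def pooled_variance(s0_sq, n, sv_sq, v):
    return ((n - 2) * s0_sq + v * sv_sq) / (n - 2 + v)

print(pooled_variance(s0_sq=1.3, n=11, sv_sq=0.9, v=5))
```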
4.  GRUBBS' TEST FOR TWO OUTLIERS

4.1  The Test Procedure

The test for two outliers proposed by Grubbs (1950) is to reject the two highest observations if

Σᵢ₌₃ⁿ (x₍ᵢ₎ − x̄₁₂)² / Σᵢ₌₁ⁿ (xᵢ − x̄)² < c,

where x̄₁₂ is the mean of the observations excluding the two highest.  The corresponding statistic can be used to test the two lowest observations.  Grubbs derived the distribution of the test statistic and gave a table of rejection constants.  The test was proposed on intuitive grounds and no optimum properties are claimed.  Since the test is easy to apply, however, and is perhaps widely used, some numerical results for its performance will be given here.
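A short sketch of the criterion just described, namely the ratio of the sum of squares computed without the two highest observations to the total sum of squares; the critical value is supplied by the caller, and the value used below is only a placeholder for a tabled constant.

```python
# Grubbs' two-outlier criterion as described above: reject the two highest
# observations when the reduced sum of squares is small relative to the total.
import numpy as np

def grubbs_two_highest_ratio(x):
    x = np.sort(np.asarray(x, dtype=float))
    reduced = x[:-2]                                   # drop the two highest
    ss_reduced = np.sum((reduced - reduced.mean()) ** 2)
    ss_total = np.sum((x - x.mean()) ** 2)
    return ss_reduced / ss_total

x = [10.1, 9.7, 10.0, 10.3, 9.9, 12.8, 13.1]           # hypothetical sample
ratio = grubbs_two_highest_ratio(x)
print(ratio, ratio < 0.10)                             # 0.10 stands in for a tabled constant
```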
4.2  Performance of Grubbs' Test

As before, we investigate the probability that x₁ and x₂, assumed to be the true outliers, are significant, without regard to whether other observations are significant.  The performance index is thus

Pr{ Σᵢ₌₃ⁿ (xᵢ − x̄₁₂)² / Σᵢ₌₁ⁿ (xᵢ − x̄)² < d_{n,α} },

where x̄₁₂ = Σᵢ₌₃ⁿ xᵢ / (n−2).
Since

Σᵢ₌₁ⁿ (xᵢ − x̄)² = Σᵢ₌₃ⁿ (xᵢ − x̄₁₂)² + [2(n−2)/n][(x₁+x₂)/2 − x̄₁₂]² + (x₁ − x₂)²/2,

we have the performance as

Pr{ Σᵢ₌₃ⁿ (xᵢ − x̄₁₂)² / Σᵢ₌₁ⁿ (xᵢ − x̄)² < d_{n,α} }
   = Pr{ χ'₂² / χ²_{n−3} > 1/d_{n,α} − 1 }

(4.1)
   = Pr{ F' > (1/d_{n,α} − 1)(n−3)/2 },

where χ'₂² is a noncentral χ² with 2 d.f., χ²_{n−3} is an ordinary χ² with n−3 d.f., and F' is a noncentral F with 2 and n−3 d.f.  If the two outliers have equal λ, the noncentrality parameter is 2(n−2)λ²/n, say 2K.
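A short sketch of the calculation in (4.1) using a modern noncentral F routine rather than a series expansion; the rejection constant d comes from Grubbs' tables and is not reproduced here, so the value below is only a placeholder.

```python
# Performance in (4.1): Pr{F' > (1/d - 1)(n-3)/2} with noncentrality
# 2(n-2)*lam^2/n (lam measured in sigma units).  The constant d is a placeholder.
from scipy.stats import ncf

def grubbs_performance(n, lam, d):
    nc = 2.0 * (n - 2) * lam ** 2 / n
    threshold = (1.0 / d - 1.0) * (n - 3) / 2.0
    return ncf.sf(threshold, 2, n - 3, nc)

print(grubbs_performance(n=11, lam=3.0, d=0.20))   # d = 0.20 is illustrative, not a tabled value
```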
Tang (1938) gave a formula for the evaluation of noncentral F probabilities when the number of degrees of freedom of the denominator χ² is even:

(4.2)  [Tang's finite series, in which b = ½(denominator χ² degrees of freedom) = ½v₂.]

The performance of Grubbs' test for n = 11 and n = 21 was evaluated using formula (4.2), with the results shown in Figure 4.1.
[Figure 4.1:  Performance of Grubbs' test for two outliers, plotted against λ for n = 11 and n = 21.]
5.  COMPARISON OF PROCEDURES AND CONCLUSIONS

5.1  Comparison of Murphy's and Grubbs' Tests

The performances with σ² unknown of Murphy's (1951) and Grubbs' (1950) tests for two outliers with the same distribution are shown in Figures 5.1 and 5.2 for n = 5 and n = 11.  Murphy's test is superior, as expected from the development of section 3.3, but by a smaller margin for n = 11.
If the outliers are from different distributions, say of means μ+λ₁ and μ+λ₂, the optimum procedure in the sense of section 3.3 would be based on the weighted sum of two residuals with the weights proportional to λ₁ and λ₂.  But since λ₁ and λ₂, or even their ratio, are not likely to be known, it may be of interest to consider the performance of Murphy's test for two outliers from the same distribution when in fact λ₁ ≠ λ₂.  We recall that the performance of Murphy's test involves the integral of a noncentral t statistic.  It follows from (3.9) that the noncentrality parameter is proportional to λ₁ + λ₂.  Consequently, lines of constant performance in the λ₁, λ₂ plane are of the form λ₁ + λ₂ = K.
Performance of Grubbs' test is an integral of the noncentral F distribution of expression (4.1).  If the outlier means are unequal, the noncentrality parameter is

[(λ₁+λ₂)/2]² · 2(n−2)/n + [(λ₁−λ₂)/2]² · 2 = [(n−1)/n](λ₁² + λ₂²) − (2/n)λ₁λ₂,

and the curves of constant performance are the family

λ₁² + λ₂² − 2λ₁λ₂/(n−1) = K.
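As a quick arithmetic check of the expression above, the following sketch evaluates both displayed forms and verifies that they reduce to the equal-shift noncentrality 2(n−2)λ²/n of section 4.2; all numbers are illustrative.

```python
# Check that the two forms of the noncentrality expression agree, and that the
# expression reduces to 2(n-2)*lam^2/n when lam1 = lam2.
def grubbs_noncentrality(lam1, lam2, n):
    return ((lam1 + lam2) / 2) ** 2 * 2 * (n - 2) / n + ((lam1 - lam2) / 2) ** 2 * 2

def grubbs_noncentrality_alt(lam1, lam2, n):
    return (n - 1) / n * (lam1 ** 2 + lam2 ** 2) - 2 / n * lam1 * lam2

n, lam1, lam2 = 11, 3.0, 1.5
print(grubbs_noncentrality(lam1, lam2, n), grubbs_noncentrality_alt(lam1, lam2, n))
print(grubbs_noncentrality(2.0, 2.0, n), 2 * (n - 2) * 2.0 ** 2 / n)   # equal-shift case
```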
[Figure 5.1:  Performance of Grubbs' and Murphy's tests when x₁, x₂ ~ N(μ+λ, σ²), x₃, ..., xₙ ~ N(μ, σ²), λ > 0, n = 5; panels for α = .05 and α = .01.]
[Figure 5.2:  Performance of Grubbs' and Murphy's tests when x₁, x₂ ~ N(μ+λ, σ²), x₃, ..., xₙ ~ N(μ, σ²), λ > 0, n = 11; panels for α = .05 and α = .01.]
Representative lines of constant performance for Murphy's and Grubbs' tests are shown in Figure 5.3.  The performances associated with the lines were read graphically, and are accurate to the second place at most.  Nevertheless they suffice to show that if the assumption of equal outlier means is not met, the superiority of Murphy's test diminishes rapidly, while Grubbs' test is more robust.  If we cannot specify the shift of one outlier relative to the other but believe both have shifted to the right, we may do better with Grubbs' test.
5.2  Comparison of Three Tests

To compare the sequential test with Murphy's or Grubbs' tests we can consider P_b, the probability of detecting both outliers with the sequential test, along with the single criterion used with the other tests.  If α = .05 for the sequential procedure, however, the average number of observations rejected per sample with no outliers is .05 + (.05)² = .0525.  Since the average number of rejections with the other tests is 2α, we choose α = .02625 to obtain performance comparable to P_b with the sequential test.  The comparison still favors Grubbs' and Murphy's tests somewhat, since the sequential test will detect a single outlier sometimes when it fails to detect both.  Also, should it happen that only one outlier exists, the sequential test is preferable.

Grubbs' test is intended for use when σ² is unknown.  A comparison of the sequential test and Murphy's test for known variance is shown in Figure 5.4.  As expected, Murphy's test shows the better performance.  The sequential performance is nevertheless reasonably good.
[Figure 5.3:  Lines of constant performance for Grubbs' and Murphy's tests in the λ₁, λ₂ plane.]
[Figure 5.4:  Performance of sequential maximum residual and Murphy's tests for two outliers when x₁, x₂ ~ N(μ+λ, 1), x₃, ..., xₙ ~ N(μ, 1); panels for n = 6 and n = 21, Murphy's test at α = .02625.]
A given performance level can be attained with the sequential procedure with λ about 1σ larger than the λ needed to attain that performance with Murphy's test.  This difference is the price paid for not knowing the exact number of potential outliers.

Figure 5.5 shows a comparison of the three tests with unknown variance for n = 11, and of Grubbs' and sequential tests with n = 21.  The rejection constant for Murphy's test for n = 21, α = .02625 is not accurately known.  The sequential test, compared on this basis, seems to have consistently lower performance than either of the other two tests.  The inefficiency of the sequential test is really serious, however, in those cases for which P_c = 0.  In other cases the sequential procedure is not unreasonable, especially if at most one outlier is expected in the majority of samples to be examined.
5.3  Conclusions

Based on results obtained in this study the following conclusions appear reasonable.  The conclusions apply only to outliers displaced in mean, and, except where noted, to problems of at most two outliers.

5.3.1  Samples Expected to Contain at Most One Outlier

If a sample is virtually certain to contain at most one outlier, one of the sequential tests of Chapter 2 is the best procedure for outlier testing.  The appropriate form of the test should be used, depending on whether the variance is known or estimated and whether one-sided or two-sided alternatives apply.
[Figure 5.5:  Performance of three tests for two outliers when x₁, x₂ ~ N(μ+λ, σ²), x₃, ..., xₙ ~ N(μ, σ²); n = 11 and n = 21, Murphy's and Grubbs' tests at α = .02625, sequential maximum residual test at α = .05.]
5.3.2  Samples with a Predetermined Number of Potential Outliers

If the number of potential outliers is K > 1 (K known), and the choice is between K outliers or no outliers, a test for the particular K should be used.  Grubbs' and Murphy's tests are preferable to the sequential test if K = 2.  If the relative shifts of the outliers are known, as with outliers equal in mean, Murphy's test should be used when rejection constants are available.  If the relative shifts cannot be assumed, Grubbs' test is preferable.

5.3.3  Samples with an Unknown Number of Outliers, Known Variance

If the number of potential outliers is unknown and the variance is known, the sequential maximum residual procedure is satisfactory for one or two outliers and n > 6.
5.3.4  Samples with an Unknown Number of Outliers, Unknown Variance

If the number of potential outliers and the variance are both unknown, a choice of procedures should be made depending on the seriousness of masking by multiple outliers, if present, for the particular n, v, and α.  Under conditions for which P_c = 0, e.g. n + v < 15 (approximately) for α = .05, the sequential test of maximum residual should be avoided.  For somewhat larger n + v the sequential procedure is satisfactory.

5.3.5  External Studentization if Multiple Outliers Are Suspected

For cases in which the sequential test is inefficient, as for n + v < 15, α = .05, or marginal, as for n + v slightly larger, the test based on external studentization can be preferable to the one based on internal and external studentization even if v is as low as 5.
6.  LIST OF REFERENCES

Anscombe, F. J. 1960. Rejection of outliers. Technometrics 2:123-147.

David, H. A., and A. S. Paulson. 1965. The performance of several tests for outliers. Biometrika 52:429-436.

Dixon, W. J. 1950. Analysis of extreme values. Annals of Mathematical Statistics 21:488-506.

Ferguson, T. S. 1961. Rules for rejection of outliers. Revue Inst. de Stat. 3:29-42.

Fraser, D. A. S. 1957. Nonparametric Methods in Statistics. John Wiley and Sons, Inc., New York, N. Y.

Grubbs, F. E. 1950. Sample criteria for testing outlying observations. Annals of Mathematical Statistics 21:27-58.

Halperin, M., S. Greenhouse, J. Cornfield, and J. Zalokar. 1955. Tables of percentage points for the studentized maximum absolute deviate in normal samples. Journal of the American Statistical Association 50:185-195.

Kudo, A. 1956. On the testing of outlying observations. Sankhya 17:67-76.

Locks, M. O., M. J. Alexander, and B. J. Byars. 1963. New tables of the noncentral t distribution. Office of Technical Services, U. S. Department of Commerce, Washington, D. C.

McKay, A. T. 1935. The distribution of the difference between the extreme observation and the sample mean in samples of n from a normal universe. Biometrika 27:466-472.

Murphy, R. B. 1951. On tests for outlying observations. Unpublished PhD thesis, Department of Mathematics, Princeton University, Princeton, N. J. University Microfilms, Ann Arbor, Michigan.

Nair, K. R. 1948. The distribution of the extreme deviate from the sample mean and its studentized form. Biometrika 35:118-144.

National Bureau of Standards. 1959. Tables of the Bivariate Normal Distribution and Related Functions. Applied Mathematics Series 50. Government Printing Office, Washington, D. C.

Paulson, E. 1952. An optimum solution to the k-sample slippage problem for the normal distribution. Annals of Mathematical Statistics 23:610-616.

Pearson, E. S., and C. Chandra Sekar. 1936. The efficiency of statistical tools and a criterion for the rejection of outlying observations. Biometrika 28:308-320.

Quesenberry, C. P., and H. A. David. 1961. Some tests for outliers. Biometrika 48:379-390.

Tang, P. C. 1938. The power function of the analysis of variance tests with tables and illustrations of their use. Statistical Research Memoirs 2:126-146.

Thompson, W. R. 1935. On a criterion for the rejection of observations and the distribution of the ratio of deviation to sample standard deviation. Annals of Mathematical Statistics 6:214-219.

Wright, T. W. 1884. A Treatise on the Adjustment of Observations by the Method of Least Squares. D. Van Nostrand Company, New York, N. Y.
7.  APPENDICES

7.1  Conditions for Which Simultaneous Detection Implies Sequential Detection

To establish conditions under which simultaneous detection of one high and one low outlier implies sequential detection, we use the geometrical representation introduced in section 2.2.3.
In Figure 7.1 the situation for n = 21, v = 0, α = .05 is shown.  Only the hyperbolas for χ² = χ²·₅₀ = 17.338 are drawn.  To relate this to the second stage rejection given that x₁ was rejected at the first stage, we have a relation between the first and second stage statistics, which leads to

(7.1)  [a relation involving the factor (n−1)/(n−2)].

This allows us to show, for the same fixed χ², the upper and lower rejection regions for x₁ and x₂ at the second stage.  These regions have linear boundaries.  The two shaded regions in the figure are the region in which x₁ can be rejected first as a high outlier, followed by x₂ as a low outlier, and the region in which x₂ is rejected as low followed by x₁ as high.  The intersection region is that of simultaneous rejection.  From the figure we see that initial significance of t₁ and t₂ implies sequential significance.
[Figure 7.1:  First and second stage rejection regions for the sequential maximum residual test with σ² unknown, when n = 21, α = .05, χ² = χ²·₅₀.]
For an algebraic expression of a sufficient condition for this, we can require the point of intersection Q to be within the lower second stage x₂ rejection region (or the upper second stage x₁ rejection region).  This requirement can be stated as

(7.2)  [an inequality depending on n, v, and α].

The requirement does not contain χ², and hence holds for all χ² if it holds at all.  For n, v, and α ordinarily of interest, (7.2) can be verified numerically.  In the limit as v → ∞, however, the condition fails.

Figure 7.2 shows the first and second stage rejection regions for x₁ and x₂ with known variance, for n = 21, α = .05.  The first stage rejection regions are

|xᵢ − x̄| = |tᵢ| > 2.97,  i = 1, 2,

and are shown as vertical and horizontal dotted lines.  The second stage rejection regions are given by

|x₂ − x̄₁| = |t₂ + t₁/(n−1)| > w(20)_{.05} = 2.94,

|x₁ − x̄₂| = |t₁ + t₂/(n−1)| > w(20)_{.05} = 2.94,

and are shown as sloping lines.  Simultaneous first stage rejection regions are shaded in the first and fourth quadrants, and show the different relation to second stage rejection depending on whether the outliers are on the same side of the mean.
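A small numerical sketch of the known-variance case just described; it scans points with both first stage residuals beyond the quoted constant and checks whether the second stage criterion also holds.  The residual identity x₂ − x̄₁ = t₂ + t₁/(n−1) is simple algebra; the grid range and step are arbitrary choices for illustration.

```python
# For n = 21 with sigma known, check where simultaneous first stage rejection
# fails to imply sequential rejection.  The larger residual is rejected first,
# then the other observation is tested against the second stage constant.
import numpy as np

n, c1, w = 21, 2.97, 2.94            # first and second stage constants quoted above
grid = np.arange(-6.0, 6.01, 0.05)

fails = []
for t1 in grid:
    for t2 in grid:
        if abs(t1) > c1 and abs(t2) > c1:             # simultaneous first stage rejection
            first, second = (t1, t2) if abs(t1) >= abs(t2) else (t2, t1)
            if abs(second + first / (n - 1)) <= w:    # second stage fails
                fails.append((round(t1, 2), round(t2, 2)))

print(len(fails), fails[:5])          # failures occur only for outliers on opposite sides
```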
[Figure 7.2:  First and second stage rejection regions for the sequential maximum residual test with σ² known, when n = 21, α = .05.]
7.2  A Generalization of the Neyman-Pearson Fundamental Lemma
Suppose we require a procedure for classifying the observations x₁, x₂, ..., xₙ as having come from one of the densities f₀, f₁, ..., f_m, such that

(a)  the probability of choosing f₀ when f₀ is correct is 1 − α,

(b)  the probability of choosing fᵢ when fᵢ is correct is the same for i = 1, 2, ..., m,

(c)  the probability of a correct decision is maximum, subject to (a) and (b).

The procedure is to choose f₀ if cf₀(x) > fᵢ(x), i = 1, 2, ..., m, and to choose fᵢ if fᵢ(x) > fⱼ(x), j ≠ i, and fᵢ(x) > cf₀(x).

Proof:  Let the decision functions be dᵢ(x), i = 0, 1, ..., m, with dᵢ = 1 where fᵢ is selected and dᵢ = 0 otherwise.  Consider the integral

(7.3)  ∫ [c d₀(x)f₀(x) + Σᵢ₌₁ᵐ dᵢ(x)fᵢ(x)] dx.

The required decision procedure is that which maximizes (7.3) subject to (a) and (b).  Let D₀, D₁, ..., D_m be the partition of the sample space corresponding to the procedure

d₀ = 1  if  cf₀ > fᵢ,  i = 1, 2, ..., m,

dᵢ = 1  if  fᵢ > fⱼ, j ≠ i,  and  fᵢ > cf₀,

with c chosen to satisfy (a).  Let C₀, C₁, ..., C_m be the partition corresponding to any other procedure, say dᵢ', also satisfying (a) and (b).
Consider the set S_{jK}, on which d selects f_j and d' selects f_K.  If j = K, the integral of the form (7.3) over S_{jK} is the same for d and d'.  If j ≠ K and j, K > 0 we have

∫_{S_{jK}} [c d₀'(x)f₀(x) + Σᵢ₌₁ᵐ dᵢ'(x)fᵢ(x)] dx = ∫_{S_{jK}} f_K(x) dx ≤ ∫_{S_{jK}} f_j(x) dx = ∫_{S_{jK}} [c d₀(x)f₀(x) + Σᵢ₌₁ᵐ dᵢ(x)fᵢ(x)] dx.

Similarly, if j = 0,

∫_{S_{0K}} [c d₀'(x)f₀(x) + Σᵢ₌₁ᵐ dᵢ'(x)fᵢ(x)] dx = ∫_{S_{0K}} f_K(x) dx ≤ ∫_{S_{0K}} c f₀(x) dx = ∫_{S_{0K}} [c d₀(x)f₀(x) + Σᵢ₌₁ᵐ dᵢ(x)fᵢ(x)] dx,

and if K = 0,

∫_{S_{j0}} [c d₀'(x)f₀(x) + Σᵢ₌₁ᵐ dᵢ'(x)fᵢ(x)] dx = ∫_{S_{j0}} c f₀(x) dx ≤ ∫_{S_{j0}} f_j(x) dx = ∫_{S_{j0}} [c d₀(x)f₀(x) + Σᵢ₌₁ᵐ dᵢ(x)fᵢ(x)] dx.

Hence the integral (7.3) is no larger with the procedure d' than with d.  This completes the proof.
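A minimal simulation sketch of the classification rule just proved optimal.  The three densities, the sample, and the constant c are illustrative assumptions only; in practice c would be chosen so that condition (a) is satisfied.

```python
# Classification rule above: choose f0 if c*f0(x) exceeds every f_i(x),
# otherwise choose the f_i with the largest density.  Here f0 takes all
# observations as N(0,1) and f_i shifts observation i to N(2,1); these choices
# and the constant c are hypothetical.
import numpy as np
from scipy.stats import norm

def classify(x, c):
    x = np.asarray(x, dtype=float)
    log_f0 = norm.logpdf(x).sum()
    log_fi = np.array([log_f0 - norm.logpdf(x[i]) + norm.logpdf(x[i], loc=2.0)
                       for i in range(len(x))])
    if np.log(c) + log_f0 > log_fi.max():
        return 0                                  # choose f0
    return int(np.argmax(log_fi)) + 1             # choose the maximizing alternative

rng = np.random.default_rng(0)
sample = rng.normal(size=5)
sample[2] += 2.0                                  # make observation 3 an outlier
print(classify(sample, c=5.0))                    # c = 5.0 is illustrative only
```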
7.3  Tables of Numerical Results

Table 7.1  Performance of sequential maximum residual test of section 2.1.1 when x₁,x₂ ~ N(μ+λ,1), x₃,...,xₙ ~ N(μ,1), λ > 0

                      α = .05                      α = .01
  n   λ       Pa      Pb      Pc          Pa      Pb      Pc
  6   1      .096    .007    .001        .027    .001    .000
      2      .332    .085    .019        .138    .017    .002
      3      .694    .392    .146        .422    .158    .035
      4      .934    .786    .469        .776    .537    .213
      5      .995    .966    .797        .962    .877    .565
      6     1.000    .998    .954        .998    .986    .855
      7             1.000    .993       1.000    .999    .971
      8                      .999               1.000    .996
      9                     1.000                       1.000

 11   1      .080    .003    .001        .024    .000    .000
      2      .346    .065    .028        .156    .014    .004
      3      .753    .370    .222        .512    .162    .075
      4      .966    .789    .626        .872    .573    .374
      5      .997    .968    .911        .989    .903    .770
      6     1.000    .998    .989       1.000    .991    .958
      7             1.000    .999               1.000    .996
      8                     1.000                       1.000

 21   1      .058    .001    .001        .018    .000    .000
      2      .310    .041    .025        .143    .009    .005
      3      .743    .304    .228        .517    .130    .085
      4      .969    .746    .658        .889    .532    .425
      5      .999    .962    .932        .993    .890    .824
      6     1.000    .998    .994       1.000    .989    .976
      7             1.000   1.000               1.000    .999
      8                                                 1.000
Table 7.2  Performance of sequential maximum residual test of section 2.1.4 when x₁ ~ N(μ+λ,1), x₂ ~ N(μ−λ,1), x₃,...,xₙ ~ N(μ,1)

                  α = .05              α = .01
  n   λ       Pa      Pc          Pa      Pc
  6   1      .117    .008        .040    .001
      2      .525    .137        .297    .043
      3      .913    .576        .772    .342
      4      .996    .924        .980    .804
      5     1.0      .996       1.0      .981
      6     1.0     1.0         1.0      .999

 11   1      .072    .002        .023    .000
      2      .399    .064        .207    .016
      3      .844    .404        .663    .204
      4      .990    .838        .956    .666
      5     1.0      .984        .999    .948
      6     1.0      .999       1.0      .997

 21   1      .043    .001        .014    .000
      2      .292    .029        .142    .007
      3      .754    .271        .554    .121
      4      .976    .733        .920    .535
      5     1.0      .963        .997    .900
      6     1.0      .998       1.0      .992
Table 7.3  Performance of sequential maximum residual test of section 2.2.1 when x₁,x₂ ~ N(μ+λ,σ²), x₃,...,xₙ ~ N(μ,σ²), λ > 0, α = .05

[Values of Pa, Pb, and Pc for several combinations of n, v, and λ.]
Table 7.4  Performance of sequential maximum residual test of section 2.2.1 when x₁ ~ N(μ+λ₁,σ²), x₂ ~ N(μ+λ₂,σ²), x₃,...,xₙ ~ N(μ,σ²), λ₁ = 2λ₂ > 0, α = .05
Table 7.5  Performance of sequential maximum residual test of section 2.2.6 when x₁ ~ N(μ+λ,σ²), x₂ ~ N(μ−λ,σ²), x₃,...,xₙ ~ N(μ,σ²), λ > 0, α = .05, v = 0

  n   λ       Pa      Pb      Pc
 11   1      .036     0       0
      2      .109    .010     0
      4      .243    .129     0
      6      .280    .258     0
      8      .263    .262     0

 21   1      .028     0       0
      2      .137    .013     0
      4      .640    .411    .012
      6      .959    .934    .257
      8      .999    .997    .605
Table 7.6  Performance of sequential maximum residual test of section 2.2.7 with external studentization and with internal and external studentization when x₁,x₂ ~ N(μ+λ,σ²), x₃,...,xₙ ~ N(μ,σ²), λ > 0, α = .05

[Values of Pa, Pb, and Pc under external studentization (E) and under internal and external studentization (I+E), for λ = 2, 4, 6, 8 and (n, v) = (6, 15), (11, 10), and (16, 5).]
Table 7.7  Performance of Murphy's test for two outliers when x₁,x₂ ~ N(μ+λ,σ²), x₃,...,xₙ ~ N(μ,σ²), λ > 0

Performance with known σ²

                  α = .05                     α = .01
  λ         n=6     n=11    n=21        n=6     n=11    n=21
  1        .063    .038    .020        .020    .012    .006
  2        .354    .311    .240        .187    .164    .121
  3        .783    .784    .738        .605    .618    .570
  4        .974    .980    .976        .922    .943    .936
  5        .999   1.0     1.0          .995    .998    .998

Performance with unknown σ²

[Values of λ and the corresponding performance for n = 4, 6, 10, and 11, at the values of α (.05, .025, .02, .01) for which rejection constants are available.]
Table 7.8  Performance of Grubbs' test for two outliers when x₁,x₂ ~ N(μ+λ,σ²), x₃,...,xₙ ~ N(μ,σ²), λ > 0

                       Performance
  n    λ       α=.05    α=.025    α=.01
  5    1       .029      .014      .006
       2       .060      .030      .012
       4       .178      .091      .036
       8       .514      .299      .129
      12       .798      .545      .264

 11    1       .021      .011      .004
       2       .115      .067      .031
       3       .358      .240      .132
       4       .675      .526      .345
       6       .979      .939      .834

 21    1       .016      .008      .003
       2       .132      .085      .045
       3       .477      .373      .251
       4       .849      .765      .634
       6       .999      .997      .990