Resistance to Gross Errors in Some Rank Tests

Hany M. Zayed
Family Health International, Biostatistics Division
P.O. Box 13950
Research Triangle Park, NC 27709, USA
Hypothesis testing is one of the standard tasks in most statistical investigations. However, when the data suffer from contaminated observations, choosing an appropriate test procedure becomes challenging. In this paper, test resistance is used to evaluate a test's tolerance to gross errors (e.g., measurement, recording, and round-off errors). Tests with higher resistance are recommended.
Notation and Assumptions
Let $\theta \in \Theta$ be the parameter of interest. Consider a test statistic $T(S_n)$, based on a random sample $S_n$ of size $n$, for testing the null hypothesis $H_0: \theta \in \Theta_0$ against the alternative $H_a: \theta \in \Theta_1$. Suppose that $T(S_n) = M(S_n)/\lambda_n$, where $\lambda_n$ is the lowest common denominator that makes $M(S_n)$ an integer. Assume that $\lambda_n = \beta(1-\epsilon_n)n^{\gamma}$, where $\beta > 0$ and $\gamma \ge 1$ are constants that do not depend on $n$, and $\epsilon_n \to 0$ as $n \to \infty$.
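The parametrization $\lambda_n = \beta(1-\epsilon_n)n^{\gamma}$ can be checked against the familiar normalizing constants of the rank statistics considered later. The Python sketch below is illustrative only: it assumes the $(\beta, \epsilon_n, \gamma)$ values listed in Table 1, and the names `PARAMS`, `CLOSED_FORM` and `lam` are hypothetical. Using exact rational arithmetic, it confirms that the parametrization reproduces $n$, $n(n+1)/2$, $n(n-1)/2$ and $n(n^2-1)/6$.

```python
from fractions import Fraction

# Hypothetical table of (beta, eps_n as a function of n, gamma), as in Table 1.
PARAMS = {
    "sign": (Fraction(1), lambda n: Fraction(0), 1),
    "signed_ranks": (Fraction(1, 2), lambda n: Fraction(-1, n), 2),
    "kendall_tau": (Fraction(1, 2), lambda n: Fraction(1, n), 2),
    "quadrant": (Fraction(1), lambda n: Fraction(0), 1),
    "spearman_rho": (Fraction(1, 6), lambda n: Fraction(1, n * n), 3),
}

# Familiar closed-form denominators of each statistic.
CLOSED_FORM = {
    "sign": lambda n: n,
    "signed_ranks": lambda n: Fraction(n * (n + 1), 2),
    "kendall_tau": lambda n: Fraction(n * (n - 1), 2),
    "quadrant": lambda n: n,
    "spearman_rho": lambda n: Fraction(n * (n * n - 1), 6),
}


def lam(test: str, n: int) -> Fraction:
    """lambda_n = beta * (1 - eps_n) * n**gamma, computed exactly."""
    beta, eps, gamma = PARAMS[test]
    return beta * (1 - eps(n)) * n**gamma


for test in PARAMS:
    for n in (5, 20, 50):
        assert lam(test, n) == CLOSED_FORM[test](n), (test, n)
print("all denominators match")
```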
Assumption 1. Under the null hypothesis, assume that $E_0[T(S_n)] = 0$, $n\,\mathrm{Var}_0[T(S_n)] = \sigma_n^2 \to \sigma^2$, and the asymptotic null distribution of $\sqrt{n}\,T(S_n)/\sigma_n$ is $\Phi(\cdot)$.
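As a concrete instance of Assumption 1, consider the sign statistic, normalized here (an assumption for illustration) as $T(S_n) = (\#\text{positive} - \#\text{negative})/n$, so that $\lambda_n = n$. With no zero observations, the number of positives is Binomial$(n, 1/2)$ under $H_0$, so $E_0[T] = 0$ and $n\,\mathrm{Var}_0[T] = 1$ for every $n$; the central limit theorem then gives the normal limit with $\sigma_n = \sigma = 1$. The sketch below computes both moments exactly; `sign_null_moments` is a hypothetical helper.

```python
from fractions import Fraction
from math import comb


def sign_null_moments(n: int) -> tuple[Fraction, Fraction]:
    """Exact E0[T] and n*Var0[T] for T = (#pos - #neg)/n with #pos ~ Bin(n, 1/2)."""
    total = 2**n
    mean = Fraction(0)
    second = Fraction(0)
    for x in range(n + 1):
        p = Fraction(comb(n, x), total)  # null probability of x positives
        t = Fraction(2 * x - n, n)       # #pos - #neg = 2x - n
        mean += p * t
        second += p * t * t
    return mean, n * (second - mean**2)


for n in (5, 20, 50):
    mean, n_var = sign_null_moments(n)
    print(n, mean, n_var)  # mean is 0 and n*Var0 is 1 for every n
```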
Arguments throughout this work are based on the addition contamination model, whereby the contaminating observations $B_k$ are added to the original sample $S_n$ to form a new sample, $S_{n+k} = S_n \cup B_k$, of size $n + k$.
Assumption 2. Given a random sample $S_n$ of size $n$ and a set $B_k$ of $k \ge 1$ contaminants,
$$\max_{B_k} M(S_n \cup B_k) = M(S_n) + (\lambda_{n+k} - \lambda_n) + R^{*}(n,k), \quad \text{where } R^{*}(n,k) \to 0 \text{ as } n \to \infty. \qquad (1)$$
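For the Wilcoxon signed-rank statistic, written here as $M = W^{+} - W^{-}$ with $\lambda_n = n(n+1)/2$ (the normalization implied by Table 1), Assumption 2 holds with $R^{*}(n,k) = 0$: the worst case places all $k$ contaminants on the positive side, beyond every $|x_i|$, where they take ranks $n+1, \dots, n+k$ and leave the existing ranks unchanged. The sketch below assumes no ties or zeros and uses simulated data and hypothetical helper names. For $k = 1$ it also checks that no other placement of a single positive point increases $M$ by more.

```python
import random


def signed_rank_M(xs: list[float]) -> int:
    """M = W+ - W-: sum of signed ranks of |x| (assumes no ties and no zeros)."""
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i]))
    m = 0
    for rank, i in enumerate(order, start=1):
        m += rank if xs[i] > 0 else -rank
    return m


def lam(n: int) -> int:
    return n * (n + 1) // 2  # lambda_n for the signed-rank statistic


random.seed(1)
n, k = 20, 5
sample = [random.gauss(0.0, 1.0) for _ in range(n)]
big = max(abs(x) for x in sample)
# Hypothetical worst-case contaminants: positive and larger than every |x|.
contaminants = [big + 1.0 + j for j in range(k)]

m0 = signed_rank_M(sample)
mk = signed_rank_M(sample + contaminants)
print(mk - m0 == lam(n + k) - lam(n))  # True: R*(n, k) = 0 here

# k = 1: try a positive point below, between, and above the existing |x| values.
sorted_abs = sorted(abs(x) for x in sample)
candidates = (
    [sorted_abs[0] / 2]
    + [(a + b) / 2 for a, b in zip(sorted_abs, sorted_abs[1:])]
    + [big + 1.0]
)
best = max(signed_rank_M(sample + [c]) for c in candidates)
print(best == m0 + lam(n + 1) - lam(n))  # True: the top placement is the maximum
```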
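From the above assumption, dividing through by $\lambda_{n+k}$, it follows that
$$\max_{B_k} T(S_n \cup B_k) = \omega_{n,k}\,T(S_n) + (1 - \omega_{n,k}) + R(n,k), \quad \text{where } \omega_{n,k} = \frac{\lambda_n}{\lambda_{n+k}} = \frac{(1-\epsilon_n)\,n^{\gamma}}{(1-\epsilon_{n+k})\,(n+k)^{\gamma}}, \qquad (2)$$
and $R(n,k) = R^{*}(n,k)/\lambda_{n+k} \to 0$ as $n \to \infty$.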
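Equation (2) can be verified directly for Kendall's tau, taken here as $\tau_a = (C - D)/[n(n-1)/2]$ with no ties (so $\beta = 1/2$, $\epsilon_n = 1/n$, $\gamma = 2$). Contaminants lying beyond every existing point in both coordinates are concordant with all other points. Since each new pair contributes at most $+1$ and existing pairs are unaffected, this placement attains the maximum over $B_k$, and (2) holds exactly with $R(n,k) = 0$. The data are simulated and the helper names are hypothetical.

```python
import random
from fractions import Fraction
from itertools import combinations


def kendall_tau_a(pts: list[tuple[float, float]]) -> Fraction:
    """tau_a = (concordant - discordant) / (n(n-1)/2); assumes no ties."""
    n = len(pts)
    s = 0
    for (x1, y1), (x2, y2) in combinations(pts, 2):
        s += 1 if (x1 - x2) * (y1 - y2) > 0 else -1
    return Fraction(s, n * (n - 1) // 2)


random.seed(7)
n, k = 15, 4
sample = [(random.random(), random.random()) for _ in range(n)]
# Hypothetical contaminants: beyond every existing point in both coordinates,
# hence concordant with all other points (including each other).
contaminants = [(2.0 + j, 2.0 + j) for j in range(k)]

t0 = kendall_tau_a(sample)
tk = kendall_tau_a(sample + contaminants)
omega = Fraction(n * (n - 1), (n + k) * (n + k - 1))
print(tk == omega * t0 + (1 - omega))  # True: equation (2) with R(n, k) = 0
```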
As will be shown in the subsequent sections, the above assumptions are not as restrictive as they may seem: they are satisfied by a host of rank tests. For notational convenience, $T_{n,k}$ is used as shorthand for $T(S_n \cup B_k)$, the value of the test statistic based on $n$ "good" observations and $k$ "bad" ones, and $T_{n,0}$ is used as shorthand for $T(S_n)$, the value of the test statistic before contamination.
Maximum Resistance to Rejection
The maximum resistance to rejection ($MRR_T$) measures how sensitive a test statistic is to gross errors in the case least favorable to rejecting the null hypothesis, given that the original sample (before contamination) accepts that hypothesis. Under the addition contamination model,
$$MRR_T(n) = \min\left\{ \frac{k}{n+k} : \min_{S_n}\max_{B_k} T_{n,k} \ge c(n+k,\alpha),\ k \ge 1 \right\}, \qquad (3)$$
where $c(n,\alpha)$ is the level-$\alpha$ critical value.
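Definition (3) can be evaluated by direct search once the worst-case statistic and the critical values are available. In the sketch below both are supplied as functions by the caller; the names `mrr_from_definition`, `worst_case_T`, `crit` and `sign_worst_case` are hypothetical. For the sign statistic, normalized as $(\#\text{positive} - \#\text{negative})/n$, the least favorable $S_n$ is all negative and the worst-case $B_k$ is all positive, giving $T_{n,k} = (k-n)/(n+k)$.

```python
from fractions import Fraction
from typing import Callable


def mrr_from_definition(
    n: int,
    worst_case_T: Callable[[int, int], Fraction],
    crit: Callable[[int], Fraction],
    k_max: int = 10_000,
) -> Fraction:
    """Smallest k/(n+k), k >= 1, with worst_case_T(n, k) >= crit(n + k).

    worst_case_T(n, k) must return min over S_n of max over B_k of T_{n,k};
    crit(m) must return the level-alpha critical value c(m, alpha).
    """
    for k in range(1, k_max + 1):
        if worst_case_T(n, k) >= crit(n + k):
            return Fraction(k, n + k)
    raise ValueError("no k <= k_max forces rejection")


def sign_worst_case(n: int, k: int) -> Fraction:
    # All n original points negative, all k contaminants positive.
    return Fraction(k - n, n + k)
```

For example, `mrr_from_definition(20, sign_worst_case, lambda m: Fraction(0))` returns `Fraction(1, 2)`. A zero critical value mimics the large-sample limit $c(n,\alpha) \to 0$; it is not a level-0.05 critical value for finite $n$.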
Table 1: Test Resistance for the One-sample Location and Association Models

                                                                          MRR_T(n)
Test                      β     ε_n     γ   ω_{n,k}                       n = 20   n = 50   n → ∞
Sign (Fisher)             1     0       1   n/(n+k)                       0.5000   0.5000   0.5000
Signed Ranks (Wilcoxon)   1/2   −1/n    2   n(n+1)/[(n+k)(n+k+1)]         0.3103   0.2958   0.2929
Kendall's Tau             1/2   1/n     2   n(n−1)/[(n+k)(n+k−1)]         0.2857   0.2958   0.2929
Quadrant Correlation      1     0       1   n/(n+k)                       0.5000   0.5000   0.5000
Spearman's Rho            1/6   1/n²    3   n(n²−1)/[(n+k)((n+k)²−1)]     0.2000   0.2647   0.2062
Theorem 1. Let $T_{n,k}$ be a statistic that satisfies Assumption 2. The maximum resistance to rejection is $MRR_T(n) = k/(n+k)$, where $k$ is the smallest integer that satisfies the inequality
$$(n+k)^{\gamma}(1-\epsilon_{n+k})\,[1 - c(n+k,\alpha)] \ge n^{\gamma}(1-\epsilon_n)\,[1 - T^{-}_{n,0}], \qquad (4)$$
where $T^{-}_{n,0}$ is the value of the statistic that gives the largest possible p-value.
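Proof. Let $k$ be the smallest number of contaminating observations required to force the test statistic $T_{n,k}$ to reject the null hypothesis. Then $\min_{S_n}\max_{B_k} T_{n,k} \ge c(n+k,\alpha)$. Since $\omega_{n,k} > 0$, the right-hand side of (2) increases with $T_{n,0}$, so the minimum over $S_n$ is attained at $T_{n,0} = T^{-}_{n,0}$. Given Assumption 2, and after some algebra, it can be shown that $\omega_{n,k}(T^{-}_{n,0} - 1) \ge c(n+k,\alpha) - 1$. Substituting the expression for $\omega_{n,k}$ in (2) and rearranging the terms yields (4).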
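Theorem 1 reduces the search to inequality (4), which needs only $\epsilon_n$, $\gamma$, $T^{-}_{n,0}$ and the critical values. The sketch below finds the smallest $k$ satisfying (4). It takes $T^{-}_{n,0} = -1$ by default, an assumption that fits normalized statistics ranging over $[-1, 1]$. The function name is hypothetical, and the zero critical value used in the example corresponds to the large-sample limit, not to a finite-$n$ level-$\alpha$ test.

```python
from fractions import Fraction
from typing import Callable


def mrr_theorem1(
    n: int,
    eps: Callable[[int], Fraction],
    gamma: int,
    crit: Callable[[int], Fraction],
    t_minus: Fraction = Fraction(-1),
    k_max: int = 100_000,
) -> tuple[int, Fraction]:
    """Return (k, k/(n+k)) for the smallest k >= 1 satisfying inequality (4)."""
    rhs = n**gamma * (1 - eps(n)) * (1 - t_minus)
    for k in range(1, k_max + 1):
        m = n + k
        lhs = m**gamma * (1 - eps(m)) * (1 - crit(m))
        if lhs >= rhs:
            return k, Fraction(k, m)
    raise ValueError("no k <= k_max satisfies (4)")


def zero(m: int) -> Fraction:
    return Fraction(0)  # large-sample limit c(m, alpha) -> 0


print(mrr_theorem1(20, lambda m: Fraction(0), 1, zero))      # sign: (20, Fraction(1, 2))
print(mrr_theorem1(20, lambda m: Fraction(-1, m), 2, zero))  # signed ranks: (9, Fraction(9, 29))
```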
Exact values of $c(n,\alpha)$ are usually tabulated for small to medium sample sizes. Given Assumption 1, for large $n$,
$$c(n,\alpha) \approx \frac{\sigma_n}{\sqrt{n}}\,\Phi^{-1}(1-\alpha).$$
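This large-sample approximation is a one-liner with the Python standard library's `statistics.NormalDist`. The sketch assumes $\sigma_n$ is known; for the sign statistic normalized as $(\#\text{positive} - \#\text{negative})/n$, $n\,\mathrm{Var}_0[T] = 1$, so $\sigma_n = 1$. The function name is hypothetical. Note that the approximation shrinks like $1/\sqrt{n}$, so $c(n,\alpha) \to 0$.

```python
from math import sqrt
from statistics import NormalDist


def approx_critical_value(n: int, alpha: float, sigma_n: float = 1.0) -> float:
    """Large-sample c(n, alpha) ~ (sigma_n / sqrt(n)) * Phi^{-1}(1 - alpha)."""
    return sigma_n / sqrt(n) * NormalDist().inv_cdf(1 - alpha)


# Sign statistic: sigma_n = 1 (assumed normalization, see lead-in).
for n in (20, 50, 200):
    print(n, round(approx_critical_value(n, 0.05), 4))
```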
Theorem 2. Let $T_{n,k}$ be a consistent test that satisfies Assumptions 1 and 2. The limiting maximum resistance to rejection is
$$MRR_T = 1 - (1 - T^{-}_{n,0})^{-1/\gamma}, \quad \text{where } \gamma \ge 1 \text{ and } T^{-}_{n,0} < 1. \qquad (5)$$
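Proof. For any consistent test, $c(n,\alpha) \to 0$ as $n \to \infty$. By assumption $\epsilon_n \to 0$, and hence $\epsilon_{n+k} \to 0$, as $n \to \infty$. Dividing both sides of (4) by $n^{\gamma}$ and writing $r = k/(n+k)$, so that $(n+k)/n = 1/(1-r)$, the limiting form of (4) is $(1-r)^{-\gamma} \ge 1 - T^{-}_{n,0}$. The smallest such $r$ is given by (5).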
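Equation (5) is easy to tabulate. The sketch below (function name hypothetical) takes $T^{-}_{n,0} = -1$ by default and evaluates $\gamma = 1, 2, 3$; the results agree with the limiting column of Table 1 to three decimal places.

```python
def limiting_mrr(gamma: float, t_minus: float = -1.0) -> float:
    """Equation (5): MRR_T = 1 - (1 - T^-_{n,0}) ** (-1/gamma)."""
    if gamma < 1 or t_minus >= 1:
        raise ValueError("requires gamma >= 1 and T^- < 1")
    return 1.0 - (1.0 - t_minus) ** (-1.0 / gamma)


for name, gamma in [("Sign / Quadrant", 1), ("Signed Ranks / Kendall", 2), ("Spearman", 3)]:
    print(f"{name:24s} {limiting_mrr(gamma):.4f}")
```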
Corollary 1. For test statistics that satisfy Assumptions 1 and 2, the largest possible value of the limiting maximum resistance to rejection is attained when $\gamma = 1$; with $T^{-}_{n,0} = -1$, as for all tests in Table 1, it equals $MRR_T = 1/2$.
It should be noted that the above arguments and definitions can analogously be extended to define and derive the maximum resistance to acceptance ($MRA_T$).
Illustrative Examples
Table 1 shows the test parameters for the various models. It also gives $MRR_T(n)$ for $n = 20$ and $50$ at $\alpha = 0.05$, together with the limiting $MRR_T$. For the one-sample location problem, the Sign test is more resistant than the Signed Ranks test. For the association problem, the quadrant correlation is more resistant than Kendall's tau and Spearman's rho.