ASYMPTOTIC NORMALITY AND ROBUSTNESS OF ONE-SAMPLE CHERNOFF-SAVAGE STATISTICS FOR HETEROGENEOUS DISTRIBUTIONS*

By PRANAB KUMAR SEN
University of North Carolina at Chapel Hill

Department of Biostatistics, University of North Carolina, Chapel Hill, N. C.
Institute of Statistics Mimeo Series No. 556, November 1967

* Work supported by the National Institutes of Health, Public Health Service, Grant GM-12868.

1. Introduction and summary. Let X_1, ..., X_n be independent random variables having continuous cumulative distribution functions (cdf's) F_1(x), ..., F_n(x), respectively. Consider the Chernoff-Savage (1958) statistic

(1.1)  T_n = (1/n) \sum_{i=1}^n E_{n,i} Z_{n,i},

where E_{n,i} = J_n(i/(n+1)), i = 1, ..., n, are (explicitly known) rank scores (satisfying Assumptions 1, 2 and 3 of Section 2), and Z_{n,i} is 1 or 0 according as the ith smallest observation among |X_1|, ..., |X_n| corresponds to a positive X or not (i = 1, ..., n). For F_1 = ... = F_n = F, the asymptotic normality of the standardized form of T_n has been obtained by Govindarajulu (1960) (see also Sen and Puri (1967) and Pyke and Shorack (1967)). The present paper is concerned with (i) the asymptotic normality of T_n for arbitrary continuous F_1, ..., F_n, and (ii) the robust efficiency of T_n for shift alternatives when F_1, ..., F_n are not all identical.

2. The basic assumptions and the main theorem. Let c(u) be 1 or 0 according as u \ge 0 or not. Define

(2.1)  H_i(x) = P\{|X_i| \le x\} = F_i(x) - F_i(-x-), x \ge 0;  H_{(n)}(x) = (1/n) \sum_{i=1}^n H_i(x);

(2.2)  H_n^*(x) = (1/n)[\text{number of } |X_j| \le x, j = 1, ..., n];

(2.3)  F_n^*(x) = (1/n)[\text{number of } X_j: 0 < X_j \le x];  F_{(n)}(x) = (1/n) \sum_{i=1}^n F_i(x).

As in Chernoff and Savage (1958), we extend the domain of J_n(u) to (0,1) by letting it have constant values on ((i-1)/(n+1), i/(n+1)], i = 1, ..., n. Then T_n in (1.1) may be written as

(2.4)  T_n = \int_0^\infty J_n\bigl(\tfrac{n}{n+1} H_n^*(x)\bigr)\, dF_n^*(x).

It is assumed that:

(2.5)  Assumption 1. \lim_{n\to\infty} J_n(u) = J(u) exists for all 0 < u < 1 and is not a constant;

(2.6)  Assumption 2. \int_0^\infty [J_n(\tfrac{n}{n+1}H_n^*(x)) - J(\tfrac{n}{n+1}H_n^*(x))]\, dF_n^*(x) = o_p(n^{-1/2});

(2.7)  Assumption 3. |J^{(r)}(u)| = |d^r J(u)/du^r| \le K[u(1-u)]^{-r-1/2+\delta}, r = 0, 1, 2,

for some \delta > 0, where K is a finite positive constant. Let us also define

(2.8)  \mu_n^* = \int_0^\infty J[H_{(n)}(x)]\, dF_{(n)}(x)  (so that |\mu_n^*| \le \int_0^1 |J(u)|\, du < \infty);

(2.9)  \gamma_{n,i}^2 = \int_0^\infty J^2[H_{(n)}(x)]\, dF_i(x) - \bigl[\int_0^\infty J[H_{(n)}(x)]\, dF_i(x)\bigr]^2
         + 2\Bigl[ \iint_{0<x<y<\infty} H_i(x)[1-H_i(y)] J'[H_{(n)}(x)] J'[H_{(n)}(y)]\, dF_{(n)}(x)\, dF_{(n)}(y)
         + \iint_{0<x<y<\infty} J[H_{(n)}(x)] J'[H_{(n)}(y)]\, dF_i(x)\, dF_{(n)}(y)
         - \bigl\{\int_0^\infty J[H_{(n)}(x)]\, dF_i(x)\bigr\}\bigl\{\int_0^\infty H_i(x) J'[H_{(n)}(x)]\, dF_{(n)}(x)\bigr\} \Bigr],  i = 1, ..., n;

(2.10)  \gamma_n^2 = (1/n) \sum_{i=1}^n \gamma_{n,i}^2.

The main theorem of the paper is the following.

THEOREM 1. If Assumptions 1, 2 and 3 hold and if \inf_n \gamma_n^2 > 0, then

\lim_{n\to\infty} P\{ n^{1/2}(T_n - \mu_n^*)/\gamma_n \le x \} = \Phi(x) = (2\pi)^{-1/2} \int_{-\infty}^x e^{-t^2/2}\, dt,

uniformly in x and F_1, ..., F_n.

The proof is postponed to Section 4.

3. Some fundamental lemmas. Define

(3.1)  \gamma^2(F_{(n)}) = \int_0^\infty J^2[H_{(n)}(x)]\, dF_{(n)}(x) - \bigl[\int_0^\infty J[H_{(n)}(x)]\, dF_{(n)}(x)\bigr]^2
         + 2\Bigl[ \iint_{0<x<y<\infty} H_{(n)}(x)[1-H_{(n)}(y)] J'[H_{(n)}(x)] J'[H_{(n)}(y)]\, dF_{(n)}(x)\, dF_{(n)}(y)
         + \iint_{0<x<y<\infty} J[H_{(n)}(x)] J'[H_{(n)}(y)]\, dF_{(n)}(x)\, dF_{(n)}(y)
         - \bigl\{\int_0^\infty J[H_{(n)}(x)]\, dF_{(n)}(x)\bigr\}\bigl\{\int_0^\infty H_{(n)}(x) J'[H_{(n)}(x)]\, dF_{(n)}(x)\bigr\} \Bigr];

(3.2)  \alpha_{n,i} = \int_0^\infty J[H_{(n)}(x)]\, dF_i(x) - \mu_n^*,  i = 1, ..., n;

(3.3)  \beta_{n,i} = \int_0^\infty [H_i(x) - H_{(n)}(x)] J'[H_{(n)}(x)]\, dF_{(n)}(x),  i = 1, ..., n.

LEMMA 3.1. For arbitrary F_1, ..., F_n,

\gamma_n^2 = \gamma^2(F_{(n)}) - (1/n) \sum_{i=1}^n (\alpha_{n,i} + \beta_{n,i})^2 \le \gamma^2(F_{(n)}) < \infty.

The proof follows by straightforward computations using (2.1), (2.2), (2.7)-(2.10) and (3.1)-(3.3); for brevity, the details are omitted. Define

(3.4)  B_{n1}(X_i) = c(X_i) J[H_{(n)}(|X_i|)],  i = 1, ..., n;

(3.5)  B_{n2}(X_i) = \int_0^\infty [c(x - |X_i|) - H_i(x)] J'[H_{(n)}(x)]\, dF_{(n)}(x),  i = 1, ..., n;

(3.6)  B_n(X_i) = B_{n1}(X_i) + B_{n2}(X_i),  i = 1, ..., n.

LEMMA 3.2. Under Assumption 3, (1/n) \sum_{i=1}^n E\{|B_n(X_i)|^{2+\delta}\} < \infty, uniformly in n and F_1, ..., F_n.

PROOF. By virtue of the inequality |a+b|^{2+\delta} \le 2^{1+\delta}\{|a|^{2+\delta} + |b|^{2+\delta}\}, it suffices to show that, uniformly in F_1, ..., F_n and n,

(3.7)  (1/n) \sum_{i=1}^n E\{|B_{nr}(X_i)|^{2+\delta}\} < \infty,  r = 1, 2.

Upon using Assumption 3, it follows from (3.4) that

(3.8)  (1/n) \sum_{i=1}^n E\{|B_{n1}(X_i)|^{2+\delta}\} = \int_0^\infty |J[H_{(n)}(x)]|^{2+\delta}\, dF_{(n)}(x) \le \int_0^1 |J(u)|^{2+\delta}\, du < \infty,

as dF_{(n)} \le dH_{(n)} and (2+\delta)(-1/2+\delta) > -1.

Let now Y_n be a random variable (independent of X_1, ..., X_n) following the cdf F_{(n)}(x). Define

(3.9)  d_n(X_i, Y_n) = [c(Y_n - |X_i|) - H_i(Y_n)] J'[H_{(n)}(Y_n)],  i = 1, ..., n.

It is easy to verify that

(3.10)  B_{n2}(X_i) = E\{ d_n(X_i, Y_n) \mid X_i \},  i = 1, ..., n.

Consequently, by straightforward computations, we obtain that

(3.11)  E\{|B_{n2}(X_i)|^{2+\delta}\} \le E\{ [E( |d_n(X_i, Y_n)|^{1+\delta/2} \mid X_i )]^2 \}
         = 2 E \iint_{0<x<y<\infty} |\{c(x-|X_i|) - H_i(x)\}\{c(y-|X_i|) - H_i(y)\} J'[H_{(n)}(x)] J'[H_{(n)}(y)]|^{1+\delta/2}\, dF_{(n)}(x)\, dF_{(n)}(y)
         \le 6 \iint_{0<x<y<\infty} H_i(x)[1-H_i(y)] |J'[H_{(n)}(x)] J'[H_{(n)}(y)]|^{1+\delta/2}\, dF_{(n)}(x)\, dF_{(n)}(y).

Upon noting that (1/n) \sum_{i=1}^n H_i(x)[1-H_i(y)] = H_{(n)}(x)[1-H_{(n)}(y)] - (1/n) \sum_{i=1}^n [H_i(x) - H_{(n)}(x)][H_i(y) - H_{(n)}(y)], we obtain from (3.11) that

(3.12)  (1/n) \sum_{i=1}^n E\{|B_{n2}(X_i)|^{2+\delta}\}
         \le 6\Bigl[ \iint_{0<x<y<\infty} H_{(n)}(x)[1-H_{(n)}(y)] |J'[H_{(n)}(x)] J'[H_{(n)}(y)]|^{1+\delta/2}\, dF_{(n)}(x)\, dF_{(n)}(y)
         - (1/2n) \sum_{i=1}^n \bigl\{ \int_0^\infty [H_i(x) - H_{(n)}(x)] |J'[H_{(n)}(x)]|^{1+\delta/2}\, dF_{(n)}(x) \bigr\}^2 \Bigr]
         \le 6 \iint_{0<u<v<1} u(1-v) |J'(u) J'(v)|^{1+\delta/2}\, du\, dv < \infty,

by (2.7) (as dF_{(n)} \le dH_{(n)}). Therefore the proof follows from (3.8) and (3.12). Q.E.D.

4. The proof of Theorem 1. Using (2.4) and (3.4)-(3.6), one can write

(4.1)  T_n = (1/n) \sum_{i=1}^n B_n(X_i) + \sum_{r=1}^4 C_{r,n},

where B_n(X_i) is defined by (3.6) and

(4.2)  C_{1,n} = o_p(n^{-1/2}), by (2.6);

(4.3)  C_{2,n} = [-1/(n+1)] \int_0^\infty H_n^*(x) J'[H_{(n)}(x)]\, dF_n^*(x);

(4.4)  C_{3,n} = \int_0^\infty [H_n^*(x) - H_{(n)}(x)] J'[H_{(n)}(x)]\, d[F_n^*(x) - F_{(n)}(x)];

(4.5)  C_{4,n} = \int_0^\infty \bigl\{ J[\tfrac{n}{n+1}H_n^*(x)] - J[H_{(n)}(x)] - [\tfrac{n}{n+1}H_n^*(x) - H_{(n)}(x)] J'[H_{(n)}(x)] \bigr\}\, dF_n^*(x).

Straightforward computations using (2.8)-(2.10) and (3.4)-(3.6) yield that

(4.6)  (1/n) \sum_{i=1}^n E\{B_n(X_i)\} = \mu_n^*,  |\mu_n^*| < \infty;

(4.7)  (1/n) \sum_{i=1}^n \operatorname{Var}\{B_n(X_i)\} = \gamma_n^2.

Further, by Lemma 3.1 and the assumption that \inf_n \gamma_n^2 > 0,

(4.8)  0 < \gamma_n < \infty,  \inf_n \gamma_n > 0,

uniformly in F_1, ..., F_n and n. Finally, by Lemma 3.2 and (4.6),

(4.9)  (1/n) \sum_{i=1}^n E\{|B_n(X_i) - E B_n(X_i)|^{2+\delta}\} < \infty,

uniformly in F_1, ..., F_n and n. Hence the independent random variables \{B_n(X_1), ..., B_n(X_n)\} satisfy the Liapounoff condition of the central limit theorem [cf. Gnedenko (1962, p. 322)]. Consequently,

(4.10)  P\{ n^{-1/2} \sum_{i=1}^n [B_n(X_i) - E B_n(X_i)]/\gamma_n \le x \} \to \Phi(x), uniformly in x and F_1, ..., F_n.

It remains only to prove that C_{r,n} = o_p(n^{-1/2}) for r = 2, 3, 4 (uniformly in F_1, ..., F_n), and this is accomplished in the Appendix (Section 7). Q.E.D.
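A concrete illustration of the statistic (1.1) may help fix ideas. The following sketch (ours, not part of the original paper; the function name is hypothetical) computes T_n for the Wilcoxon-type scores J_n(u) = u, in which case n(n+1)T_n coincides with the classical Wilcoxon signed-rank sum W+, which provides a convenient check.

```python
import numpy as np

def chernoff_savage_T(x, J):
    """One-sample Chernoff-Savage statistic T_n = (1/n) sum_i E_{n,i} Z_{n,i},
    with scores E_{n,i} = J(i/(n+1)) and Z_{n,i} = 1 iff the observation with
    the i-th smallest absolute value is positive."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    order = np.argsort(np.abs(x))            # indices sorted by |X|
    Z = (x[order] > 0).astype(float)         # Z_{n,i}, i = 1, ..., n
    E = J(np.arange(1, n + 1) / (n + 1.0))   # rank scores J(i/(n+1))
    return E @ Z / n

rng = np.random.default_rng(0)
# heterogeneous normals (unequal scales), shifted by 0.5
x = rng.normal(loc=0.5, scale=np.tile([1.0, 2.0, 0.5], 10))
n = len(x)

Tn = chernoff_savage_T(x, lambda u: u)       # Wilcoxon scores J(u) = u

# Check: for J(u) = u, n(n+1) T_n equals the Wilcoxon signed-rank sum W+.
ranks = np.abs(x).argsort().argsort() + 1    # ranks of |X_1|, ..., |X_n|
W_plus = ranks[x > 0].sum()
assert np.isclose(n * (n + 1) * Tn, W_plus)
```

For a normal-scores version one would instead pass a score function based on the inverse of the folded normal cdf (cf. (6.2)-(6.3)), e.g. `lambda u: scipy.stats.norm.ppf((1 + u) / 2)`.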
5. The general hypothesis and robustness of T_n. Let 𝓕_0 be the class of all continuous cdf's symmetric about 0. We want to test the general hypothesis

(5.1)  H_0: F_i \in 𝓕_0,  i = 1, ..., n,

without bringing in the assumption that F_1 = ... = F_n; (5.1) is thus less restrictive than H_0^*: F_1 = ... = F_n = F \in 𝓕_0. It will be seen that T_n in (1.1) provides a robust test for H_0 in (5.1), for all F_i \in 𝓕_0, i = 1, ..., n. For this, it may be noted that

(5.2)  H_i(x) = 2F_i(x) - 1 and dF_i(x) = (1/2)\, dH_i(x), for x \ge 0, if F_i \in 𝓕_0.

From (2.8) and (5.2), it follows that under H_0 in (5.1),

(5.3)  \mu_n^* = (1/2) \int_0^1 J(u)\, du = (1/2)\bar\mu  (say).

Upon using (5.2) and (3.1), straightforward computations yield that

(5.4)  \gamma^2(F_{(n)}) = (1/4) \int_0^1 J^2(u)\, du = (1/4)A^2  (say), if F_{(n)} \in 𝓕_0.

Again, using (5.2) and integrating (3.3) by parts, it readily follows that

(5.5)  \alpha_{n,i} + \beta_{n,i} = 0, for all i, if H_0 in (5.1) holds.

Consequently, from Theorem 1, (5.3), (5.4), (5.5) and Lemma 3.1, it follows that if H_0 in (5.1) holds,

(5.6)  \mathcal{L}( 2n^{1/2}[T_n - \tfrac12\bar\mu]/A ) \to \mathcal{N}(0, 1),  uniformly in F_1, ..., F_n.

This clearly indicates the robustness of T_n for arbitrary symmetric F_1, ..., F_n. Thus, like the well-known sign test (for location), we need not assume the identity of F_1, ..., F_n (identity of their medians is enough). However, unlike the sign test, symmetry of each F_i (i = 1, ..., n) appears to be necessary.

REMARK 1. The sign-invariant permutation distribution theory of one-sample Chernoff-Savage statistics, developed in the more general multivariate case by Sen and Puri (1967), can easily be shown to remain valid (in the univariate case) even when the sample observations are drawn from different (but symmetric) distributions. This permutation principle leaves scope for an exact test for H_0 in (5.1) (based on T_n in (1.1)) when n is small and F_1, ..., F_n are not necessarily all identical. Also, along the lines of Theorem 3.2 of Sen and Puri (1967), the asymptotic convergence of the sign-invariant permutation distribution of 2n^{1/2}[T_n - \tfrac12\bar\mu]/A to a standard normal distribution can be readily deduced. This patches up the link between the small-sample and large-sample tests for H_0 in (5.1) based on T_n.

REMARK 2. Not only may T_n be used to test H_0 in (5.1); it may also be used to estimate the common median (say, \theta) of F_1, ..., F_n. Thus the Hodges-Lehmann estimate of \theta based on T_n is robust against any possible heterogeneity of the (symmetric) F_1, ..., F_n.

REMARK 3. Recently, Puri (1967) has considered the problem of combining independent one-sample tests of significance. His procedure encounters some difficulty when the number of sources is large but the numbers of observations in the sources are not all large. This difficulty can be readily avoided by considering a single one-sample test based on all the samples pooled together. Here also, symmetry of the different cdf's is enough; their identity is not necessary.

6. Robust-efficiency of T_n. Consider the sequence of shift alternatives \{H_n\}, where H_n specifies that X_1, ..., X_n are independent random variables having absolutely continuous cdf's F_{n1}, ..., F_{nn}, respectively, where F_{ni}(x) = F_i(x - n^{-1/2}c_i\theta), i = 1, ..., n (the F_i being all symmetric about 0), and c_1, ..., c_n and \theta are all real and finite. It is also assumed that F_i has a continuous (a.e.) density function f_i for all i = 1, ..., n. We define

(6.1)  f_{(n)}(x) = (1/n) \sum_{i=1}^n f_i(x).

Further, it is assumed that f_{(n)}(x) J[H_{(n)}(x)] is bounded as x \to \pm\infty. Concerning the rank scores \{E_{n,i}\}, it is assumed that E_{n,i} is the expected value of the ith order statistic of a sample of size n from the distribution function \Psi^*(x) given by

(6.2)  \Psi^*(x) = \Psi(x) - \Psi(-x-) = 2\Psi(x) - 1,  x \ge 0,

where \Psi is a continuous cdf symmetric about 0. This implies that

(6.3)  J(u) = \Psi^{*-1}(u),  0 < u < 1;  we also write J^*(u) = \Psi^{-1}(u), so that J(u) = J^*((1+u)/2).

Define \bar\mu and A^2 as in (5.3) and (5.4), and let

(6.4)  B(F) = \int_{-\infty}^{\infty} J^{*\prime}[F(x)] f^2(x)\, dx, for all F \in 𝓕_0 (f being the density of F).
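The uniform null normality asserted in (5.6) lends itself to a quick numerical check. The sketch below (our illustration, not from the paper) uses the Wilcoxon scores J(u) = u, for which \bar\mu = \int_0^1 u\,du = 1/2 and A^2 = \int_0^1 u^2\,du = 1/3, and draws observations from heterogeneous but symmetric-about-0 cdf's (normal and Laplace variates with unequal scales); the standardized statistic 2n^{1/2}(T_n - \tfrac12\bar\mu)/A should then be approximately N(0, 1).

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 200, 2000
scales = np.linspace(0.5, 3.0, n)   # unequal scales: F_1, ..., F_n differ
stats = np.empty(reps)
for r in range(reps):
    # each X_i is symmetric about 0, but the F_i are far from identical
    x = np.where(rng.random(n) < 0.5,
                 rng.normal(0.0, scales),
                 rng.laplace(0.0, scales))
    ranks = np.abs(x).argsort().argsort() + 1
    Tn = (ranks * (x > 0)).sum() / (n * (n + 1))   # Wilcoxon scores J(u) = u
    # standardization as in (5.6), with mu-bar = 1/2 and A^2 = 1/3
    stats[r] = 2.0 * np.sqrt(n) * (Tn - 0.25) * np.sqrt(3.0)

# empirical moments close to those of N(0, 1)
assert abs(stats.mean()) < 0.1
assert abs(stats.var() - 1.0) < 0.15
```

Nothing in the simulation uses the identity of the F_i; only their symmetry about 0 enters, which is precisely the content of (5.6).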
Then, by routine computations, it follows that under \{H_n\},

(6.5)  \mu_n^* = (1/2)\bar\mu + (\theta/2n^{1/2}) \Bigl( (1/n) \sum_{i=1}^n c_i \int_{-\infty}^{\infty} J^{*\prime}[F_{(n)}(x)] f_{(n)}(x)\, dF_i(x) \Bigr) + o(n^{-1/2});

(6.6)  \gamma_n^2 = (1/4)A^2 + o(1).

Thus, it follows from Theorem 1, (6.5) and (6.6) that

(6.7)  \lim_{n\to\infty} P_{H_n}\Bigl\{ 2n^{1/2}[T_n - \tfrac12\bar\mu]/A \le x + (\theta/A)(1/n) \sum_{i=1}^n c_i \int_{-\infty}^{\infty} J^{*\prime}[F_{(n)}(x)] f_{(n)}(x)\, dF_i(x) \Bigr\} = \Phi(x),

for all real x. Let us now assume that the cdf F_i has the variance \sigma_i^2, i = 1, ..., n, and denote \bar X_n = (1/n)\sum_{i=1}^n X_i, \bar\sigma_n^2 = (1/n)\sum_{i=1}^n \sigma_i^2. Then, by the well-known central limit theorem (for non-identically distributed independent random variables), we have for any real x

(6.8)  \lim_{n\to\infty} P_{H_n}\{ n^{1/2}\bar X_n/\bar\sigma_n \le x + (\theta/\bar\sigma_n)\bar c_n \} = (2\pi)^{-1/2} \int_{-\infty}^x e^{-t^2/2}\, dt;  \bar c_n = (1/n) \sum_{i=1}^n c_i.

Thus, if \bar c_n is different from 0, the asymptotic relative efficiency (A.R.E.) of T_n with respect to \bar X_n may be computed as

(6.9)  e_n = (\bar\sigma_n^2 / A^2 \bar c_n^2) \Bigl[ (1/n) \sum_{i=1}^n c_i \int_{-\infty}^{\infty} J^{*\prime}[F_{(n)}(x)] f_{(n)}(x)\, dF_i(x) \Bigr]^2.

Thus, in general, the A.R.E. depends on (c_1, ..., c_n) as well as on F_1, ..., F_n. Two special cases are of interest and yield some interesting results.

Case I. c_1 = ... = c_n = c \ne 0, i.e., equal shifts but not necessarily identical cdf's. It readily follows from (6.4) and (6.9) that e_n (= e_n^{(1)}) reduces to

(6.10)  e_n^{(1)} = [\bar\sigma_n B(F_{(n)})/A]^2,

which agrees with the expression for the Chernoff-Savage (1958) efficiency, but with the cdf F_{(n)}. As such, for the normal scores statistic (i.e., when \Psi in (6.2) is a standard normal cdf), (6.10) will be at least as large as 1, where the equality sign holds only when F_{(n)} is itself normal. This clearly illustrates the robust-efficiency of the normal scores test. Incidentally, when F_1, ..., F_n are all normal cdf's differing only in \sigma_1^2, ..., \sigma_n^2, F_{(n)} cannot be normal unless \sigma_1 = ... = \sigma_n. Hence, for normal cdf's, the normal scores statistic will have an A.R.E. (relative to \bar X_n) \ge 1, where the equality sign holds only when \sigma_1 = ... = \sigma_n. For the Wilcoxon signed rank statistic, similar results have already been deduced in an earlier note [Sen (1968)], and hence the discussion is omitted.

Case II. F_1 = ... = F_n = F but the c_i's not all equal, i.e., homogeneous cdf's but heterogeneous shifts. It follows from (6.4) and (6.9) that

(6.11)  e_n (= e_n^{(2)}) = [B(F)\sigma/A]^2,

where \sigma^2 is the variance of the cdf F. This indicates that the A.R.E. is not affected by heterogeneity of the shifts, and the Chernoff-Savage (1958) bounds are equally applicable in this situation.

REMARK. In the two-sample case, a distribution-free estimate of B(F) (defined by (6.4)) has been considered by Sen (1966), and the same procedure yields a similar estimate of B(F) in the one-sample case when F_1 = ... = F_n = F. It follows from (5.6), (6.7) and some routine computations along the lines of Sen (1966) that this one-sample estimator consistently estimates (i) B(F_{(n)}) in Case I, when F_1, ..., F_n are not necessarily identical, and (ii) B(F) in Case II, when c_1, ..., c_n are not necessarily identical. Also, the interval estimation of the common median of F_1, ..., F_n based on T_n remains valid even when F_1, ..., F_n are not necessarily identical.

7. Appendix: the treatment of C_{2,n}, C_{3,n} and C_{4,n}. Let (a_n, b_n) be the interval S_{n,\varepsilon} such that

(7.1)  H_{(n)}(a_n) = \eta_\varepsilon/n,  H_{(n)}(b_n) = 1 - \eta_\varepsilon/n,

where \varepsilon is an arbitrary positive number and \eta_\varepsilon (> 0) depends on \varepsilon. Upon noting that

(7.2)  P\{ \max_i |X_i| \le x \} = \prod_{i=1}^n H_i(x) \le [H_{(n)}(x)]^n,

(7.3)  P\{ \min_i |X_i| > x \} = \prod_{i=1}^n [1 - H_i(x)] \le [1 - H_{(n)}(x)]^n,

and proceeding as in Chernoff and Savage (1958, p. 986), it follows that, for a suitable choice of \eta_\varepsilon,

(7.4)  P\{ |X_1|, ..., |X_n| \text{ all lie in } S_{n,\varepsilon} \} \ge 1 - \varepsilon.

Thus, from (4.3) and (7.4), it follows that with probability \ge 1 - \varepsilon,

(7.5)  |C_{2,n}| \le [1/(n+1)] \int_{S_{n,\varepsilon}} H_n^*(x) |J'[H_{(n)}(x)]|\, dF_n^*(x) \le (K' n^{1/2-\delta/2}/n) \int_{S_{n,\varepsilon}} [H_{(n)}(x)\{1 - H_{(n)}(x)\}]^{-1+\delta/2}\, dH_n^*(x),

where the second step uses H_n^*(x) \le 1, dF_n^* \le dH_n^*, and (7.6) below. From (2.7) and (7.1), it follows that for all x \in S_{n,\varepsilon},

(7.6)  |J'[H_{(n)}(x)]| \le K[H_{(n)}(x)\{1 - H_{(n)}(x)\}]^{-3/2+\delta} \le K' n^{1/2-\delta/2} [H_{(n)}(x)\{1 - H_{(n)}(x)\}]^{-1+\delta/2},

since H_{(n)}(x)[1 - H_{(n)}(x)] is bounded below by a multiple of \eta_\varepsilon/n on S_{n,\varepsilon}. Since the extreme right-hand side of (7.5) involves an average over independent random variables, by Markov's law of large numbers,

(7.7)  \int_{S_{n,\varepsilon}} [H_{(n)}(x)\{1 - H_{(n)}(x)\}]^{-1+\delta/2}\, dH_n^*(x) = O_p(1),  as \int_0^1 [u(1-u)]^{-1+\delta/2}\, du < \infty.

Therefore, (7.5), (7.6) and (7.7) yield that

(7.8)  C_{2,n} = O_p(n^{-(1+\delta)/2}) = o_p(n^{-1/2}),  uniformly in F_1, ..., F_n.

For C_{3,n} and C_{4,n}, we require the following theorems.

THEOREM 7.1. For any \varepsilon > 0 there exists a c(\varepsilon) (< \infty) such that, for \delta' > 0,

P\Bigl\{ \sup_{t>0} n^{1/2}|H_n^*(t) - H_{(n)}(t)| / [H_{(n)}(t)\{1 - H_{(n)}(t)\}]^{1/2-\delta'} \ge c(\varepsilon) \Bigr\} \le \varepsilon.

PROOF.
Define the stochastic process V_n(t) by

(7.9)  V_n(t) = n^{1/2}[H_n^*(t) - H_{(n)}(t)],  t \ge 0.

Then, by direct computations, for all t > 0,

(7.10)  \bar V_n(t) = E\{V_n^2(t)\} = H_{(n)}(t)[1 - H_{(n)}(t)] - (1/n) \sum_{i=1}^n [H_i(t) - H_{(n)}(t)]^2 \le H_{(n)}(t) < 1.

Also, on using (2.1), (2.2) and (2.3), by straightforward manipulations it follows that, for 0 < s \le t,

(7.11)  E\{[V_n(t) - V_n(s)]^2\} = (1/n) \sum_{i=1}^n [H_i(t) - H_i(s)][1 - H_i(t) + H_i(s)] \ge 0.

Define t_n^0 by H_{(n)}(t_n^0) = 1/2, and let g_n(t) = K^2 [H_{(n)}(t)\{1 - H_{(n)}(t)\}]^{1-2\delta'} for 0 < t \le t_n^0. Then (i) \lim_{t\to 0} \bar V_n(t)/g_n(t) = 0 and (ii) \int_0^{t_n^0} [1/g_n(t)]\, d\bar V_n(t) < \infty. Thus, using Theorem 5.1 of Birnbaum and Marshall (1961), it follows that

(7.12)  P\Bigl\{ \sup_{0<t\le t_n^0} |V_n(t)|/g_n^{1/2}(t) \ge 1 \Bigr\} \le \int_0^{t_n^0} [1/g_n(t)]\, d\bar V_n(t) \le (2/K^2) \int_0^{1/2} [u(1-u)]^{-1+2\delta'}\, du

(as d\bar V_n(t) \le [1 + 2H_{(n)}(t)]\, dH_{(n)}(t) and 1 + 2H_{(n)}(t) < 2 for all t \le t_n^0). But

(7.13)  |V_n(t)|/g_n^{1/2}(t) = n^{1/2}|H_n^*(t) - H_{(n)}(t)| / \bigl( K[H_{(n)}(t)\{1 - H_{(n)}(t)\}]^{1/2-\delta'} \bigr).

Hence, it follows from (7.12) and (7.13) that

(7.14)  P\Bigl\{ \sup_{0<t\le t_n^0} n^{1/2}|H_n^*(t) - H_{(n)}(t)| / [H_{(n)}(t)\{1 - H_{(n)}(t)\}]^{1/2-\delta'} \ge K+1 \Bigr\} \le (2/K^2) \int_0^{1/2} [u(1-u)]^{-1+2\delta'}\, du.

In a similar manner, it can be shown that

(7.15)  P\Bigl\{ \sup_{t\ge t_n^0} n^{1/2}|H_n^*(t) - H_{(n)}(t)| / [H_{(n)}(t)\{1 - H_{(n)}(t)\}]^{1/2-\delta'} \ge K+1 \Bigr\} \le (2/K^2) \int_{1/2}^1 [u(1-u)]^{-1+2\delta'}\, du.

Since 2\int_0^{1/2} [u(1-u)]^{-1+2\delta'}\, du = c_0 < \infty, K can always be so selected that c_0/K^2 \le \varepsilon, and we set K+1 = c(\varepsilon). With this choice of K (say, K_\varepsilon), the proof of the theorem readily follows from (7.14) and (7.15). Q.E.D.

REMARK. The theorem generalizes Lemma 6 of [5] to non-identically distributed random variables, without using the Poisson distribution in conjunction with a binomial distribution; the latter approach becomes quite involved in the case of non-identical cdf's considered above.

THEOREM 7.2. \sup \{ H_{(n)}(t)/H_n^*(t): H_n^*(t) > 0 \} = O_p(1), uniformly in F_1, ..., F_n.

PROOF. For H_{(n)}(t) \ge c n^{-1}, c > 0, the proof readily follows from that of Theorem 7.1 and some routine computations; for H_{(n)}(t) < c n^{-1}, one can apply the well-known results on the standard Poisson process to deduce the desired result. Q.E.D.

COROLLARY. \sup \{ [1 - H_{(n)}(t)]/[1 - H_n^*(t)]: H_n^*(t) < 1 \} = O_p(1), uniformly in F_1, ..., F_n.

Let now \bar S_{n,\varepsilon} be the interval complementary to S_{n,\varepsilon}. Then,

(7.16)  C_{3,n} = \int_{S_{n,\varepsilon}} [H_n^*(x) - H_{(n)}(x)] J'[H_{(n)}(x)]\, d[F_n^*(x) - F_{(n)}(x)] + \int_{\bar S_{n,\varepsilon}} [H_n^*(x) - H_{(n)}(x)] J'[H_{(n)}(x)]\, d[F_n^*(x) - F_{(n)}(x)].

Since, with probability \ge 1 - \varepsilon, there is no observation in \bar S_{n,\varepsilon}, and as dF_{(n)} \le dH_{(n)}, it is easily seen on using (2.7) that the integral over \bar S_{n,\varepsilon} is O_p(n^{-1/2-\delta}), i.e., o_p(n^{-1/2}). Again, on making use of Theorem 7.1 and (2.7) (with 0 < \delta' < \delta), straightforward computations yield that the integral over S_{n,\varepsilon} in (7.16) is also o_p(n^{-1/2}). Thus, C_{3,n} = o_p(n^{-1/2}). Finally, with Theorem 7.2 and its corollary, the proof of C_{4,n} = o_p(n^{-1/2}) follows by routine computations and is omitted.

REFERENCES

[1] BIRNBAUM, Z. W., and MARSHALL, A. W. (1961). Some multivariate Chebyshev inequalities with extensions to continuous parameter processes. Ann. Math. Statist. 32, 687-703.

[2] CHERNOFF, H., and SAVAGE, I. R. (1958). Asymptotic normality and efficiency of certain nonparametric test statistics. Ann. Math. Statist. 29, 972-994.

[3] GNEDENKO, B. V. (1962). Theory of Probability. Chelsea Publishing Co., New York. (Translated by B. D. Seckler.)

[4] GOVINDARAJULU, Z. (1960). Central limit theorems and asymptotic efficiency of one sample nonparametric procedures. Tech. Rep. 11, Dept. of Statistics, Univ. of Minnesota.

[5] GOVINDARAJULU, Z., LECAM, L., and RAGHAVACHARI, M. (1965). Generalizations of theorems of Chernoff and Savage on the asymptotic normality of test statistics. Proc. Fifth Berkeley Symp. Math. Statist. Prob. 1, 608-638. Univ. of California Press.

[6] PURI, M. L. (1967). Combining independent one-sample tests of significance. Ann. Inst. Statist. Math. 19, 285-300.

[7] PYKE, R., and SHORACK, G. (1967). A Chernoff-Savage theorem for random sample sizes (abstract). Ann. Math. Statist. 38, 1313.

[8] SEN, P. K. (1966). On a distribution-free method of estimating asymptotic efficiency of a class of nonparametric tests. Ann. Math. Statist. 37, 1759-1770.

[9] SEN, P. K. (1968). On a further robustness property of the test and estimator based on Wilcoxon's signed rank statistic. Ann. Math. Statist. 39, No. 1 (in press).

[10] SEN, P. K., and PURI, M. L. (1967). On the theory of rank order tests for location in the multivariate one sample problem. Ann. Math. Statist. 38, 1216-1228.