Completeness Comparisons Among Sequences of Samples II. Censoring from Above or Below, and General Censoring* '. • by Nonnan L. Johnson Department of Statistics University of North Carolina at Chapel Hill Chapel Hill, North Carolina 27514 Institute of Statistics r.limeo Series No. 1103 February, 1977 • * This research was supported by the U. S. Contract No. DAH G29-74-C-0030. Arrrrt Research Office lmder Completeness Comparisons A~ong Sequences of Samples II. Censoring from Above or Below, and General Censoring* by N. L. Jolmson Department of Statistics University of North Carolina at Chapel lIill Chapel tIill, North Carolina 27514 1. Introduction As in [1] we will consider situations in which we have two sequences of samples Al =AllsA12"'.; the order statistics in Aij AZ =AZl,AZZ ".' respectively. We denote by Xijl S XijZ S ••• S Xijr .. 1) ( r ij is the number of observed values in Aij ). For convenience "Ie will also use the notation X·1 l S X.1 Z S ••• S x.1r i for order statistics in a typical member of Ai (i = 1,2). We further suppose that it is known that each sample in one of the two sequences (it is not specified which one) is a complete random sample while the other is a random sample which has been censored by removal of some extreme observations. We will denote the original sample or sequence by placing a bar over the corresponding symbol - thus All would belong to * This research was supported by the U. S. Anny Research Office mder Contract No. DAtI GZ9-74-C-0030. Ai . 2 ,e In [1] it was supposed that symmetrical censoring (removal of the s greatest and s least values from an original complete random sample of size (r+2s)) had been used on one sequence. In the present paper we will consider censoring from above or below - for example, removal of the greatest s 01' the least s random sample of size (but not both) from an original complete (r+s). We will also consider a natural extension of a general purpose test of extreme sample censoring (suggested in [2] for use when the proportions of greatest and least values removed are tmknown) • 2. Censoring from Above or Below For definiteness we will discuss censoring from above~ in which the greatest values are removed from a complete random sample of size or more generally (rij + s). The cumulative distribution function~ (cdf) (r+s)- Fi(x), will be supposed to be the same for each of the i-th pair of samples Ail and Ai2 • It will be assumed to be absolutely continuous, with probability density function f i (x) = dFi/dx. We "rill also use F(x) for the cdf in a general pair of samples for Al and AZ for convenience. 2.1 Population Distribution's) Known In this case the ratio for the hypotheses HI vs. Hi = sequence Ai is censored HZ where s 3 is ~J L/L = n 1 Z j=l 2 )1 ] 1J } {l-F(Xlj ,1' m [(1' .. +1)[5] 0 (1'Zj+1)[sr (1) I-F(XZj ,1'2/ J (remember that censoring is from above). 'Whatever the value of s, we see that discrimination between HI and HZ will be based on the statistic m T = .n U-F(X1J, l' )H1-F(XZJ' l' e)} J=l ' 1J ' ZJ where Zij,r If 1'lj = r Zj =1 - F(Xij,r)' for all -1 m =,n (2 1J, l' ,/Z2J' 1',) J=l ' 1J ' ZJ Generally HI (HZ) is accepted if T >«) K. S the likelihood ratio is just T j (2) and it is natural to take K=l and use the rule "Accept HI if T>l. (3} Accept HZ if T<l. II (If T=l, no decision is reached.) If HI is valid Zl' has a standard beta distribution with J ,r1j parameters (s+l) ,1'1' and Z2' , independent of ZI' , has a J J,r Zj J,r1j standard beta distribution with parameters 1,rZj . is distributed as tIle ratio Each of the ratios V, = Zl' l' /2 Z' l' J ), 1j J, Zj of independent standard beta variables with parameters (s+1),r1j and 1,rZj respectively. The V j '5 are IlUltually independent and we have m Pr[cor1'ect decision IHI] = Pr[, n V , > K]. )=1 J When m=l and 1'1 = 1'2 = r we have 4 = Pr[Vl Pr(correct decisionlI\] = > 1] 1 zl (s+r)!r! J zS(l-z )r-1(1_Z )r-1dz dz s!{(r-1)!}2 0 0 1 1 2 2 1 J = (s+r)!r! II zS(l-z )r-1 s!{(r-1)!}2 0 1 1 = (s+r)!r! s!{(r-1) I}2 r- 1{1-(1-Z )r}dz 1 1 0 $fB(s+l~r) - B(s+1,2r)] 0 (4') This is~ valid. of also the probability of correct decision when H2 is Some numerical values are given in Table 1. course~ TABLE 1 Probability of Correct Decision~ Based on Single Pair of Samples~ Each of Size r sir 1 2 3 4 5 5 0.727 0.841 0.902 0.937 0.958 10 0.738 0.857 0.919 0.953 0.972 15 0.742 0.863 0.925 0.958 0.976 20 0.744 0.866 0.928 0.961 0.978 The probabilities in the above tables since (VI > 1) is equivalent to are~ 25 0.745 0.868 0.930 0.962 0.980 in fact~ 50 0.748 0.871 0.934 0.966 0.982 00 0.750 0.875 0.9375 0.969 0.984 simply values of eXlr < XZr ). For sequences AI' A2 of m samples~ when m is large enough, we may use a normal approximation to the distribution of 5 m = l T' j=l R.n VJo Using the methods described in Section 2.2 of [1] we obtain (5) If r 1j = r Zj = r for all j = 1,2,. •.m, then (t-Z)l[ r I h-(t-l) + (_l)t h=l r r (h+s)-(t-l)]. (6) h=l 2.2 Population Distribution Not Known Using the method described in section 3 of [1] we suppose that among r l , r Z observed values from samples in sequences AI' AZ respectively (with a common distribution) the GZ greatest are from AZ • Under HI ' there were originally from which the greatest (rl + s) s values in a complete random sample Al have been removed to fonn Al ' while the r 2 values from AZ represent a complete random sample. The number of orderings of the original erl + r Z + s) values which would produce an ordering wi th the G greatest values from AZ ' after censoring in the ,TJ1y described Z is and the corresponding likelihood is 6 If r l = r 2 = r then the likelihood ratio is just (8) Since one of Gl and GZ must be zero (and the other positive) we see that the likelihood ratio approach leads to the decision rule "Accept Hi if Gi > 0 Since G > 0 is equivalent to l ruZe (3) (i = 1,2)." (9) lr > X2r this rule is equivaZent to the X obtained from the approaah based on knowZedge of the population distribution. The probabilities of correct decision given in Table 1 thus apply, and we see that in this case we appear to lose nothing by not knowing the population distribution. The situation is different when we have more than one sample in each sequence (m>1) . If we suppose r ij = r to the rule m tlAccept HI if.rr J=l [(G for all i and all j, we are led [S]j > 1.. .+1) 2J [s] (Glj+l) ClO) .Accept I-I2 if the ratio is less than I.'! (If the ratio equals 1, no decision is reached.) The pairs (Glj ,G2j ) for j = l, .•.m are mutually independent, and the joint distribution of Glj and GZj if HI is valid is 7 (11) (gl,g2 = 1,Z, ••• ,r). The possible values of the ratio w. = (GZ. J J + l)[s]/(G . + l)[s] lJ are, in increasing order, s!{(g+l)[s]}-l for g (s.I)-l(g+l)(s].c ~or g -_ 1,Z, ••. r. For small values of m and r = r, r-l, ••• 1 and it is possible to evaluate m Pr[correct decision] = Pr[correct decisionlHl ] = Pr[.IT WJ. J=l For eX~1Iple > llHl ]. (12) if m=Z, we have Pr[correct decision] = Pr[correct decision on each of (All,A zl) and (A12 ,A ZZ ) separately] + Pr[one correct and one incorrect. decision). g2+ S] [2r-gz-l] [2r- gl -l] r-l r-l [ s (using (4) and (11) ). (13) The double summation is conveniently evaluated as gzIz (gfS) [zr~:tl::i: (zr~:~ -1]. (14) 8 With m>l there is the possibility of no decision being reached. For m=Z, the probability that no decision is reached is I z(zr+s)-2 (g+s) (2r- g-l) r g=l s 1'-1 2 (15) TABLE 2 - Probabilities of Correct Decision and No Decision when m=2 5 s/r Correct 0.69Z 0.842 0.915 0.953 0.972 1 2 3 4 5 10 No 0.068 0.039 0.OZ3 0.013 0.008 Correct 0.717 0.870 0.940 0.972 0.986 15 No 0.058 0.030 0.015 0.007 0.004 Correct 0.726 0.880 0.948 0.977 0.990 No 0.055 0.027 ·O.OlZ 0.005 0.003 3. General Censoring of Extremes If we do not know the numbers s Y, sYi of least and greatest observations removed from one of two samples (or from each sample in one of two sequences) - though we do know that such removal has taken place from just one of them, we carmot use the direct likelihood ratio approach. obtaL~ed Results in [Z] suggest that we might use criteria based on quantities D." = G.. + L"" J.J J.J 1J where GJ." J' (LJ.'J" ) is equal to the number of values among the sample Aij which are greater(less) than any values inA 3_i ,j • If we simply have one pair of samples (m=l) of equal size r, then we would: "Accept HI if Gl + Ll < GZ + LZ Accept HZ if Gl + L1 > GZ + LZ Reach.no decision if G1 + Ll = GZ + L 2 •Ii 9 This is the same test as that obtained in [1] (see (30' L page 13) for the case wh~n the censoring is known to be symmetrical. If m is greater than 1 ~ the value of Dij might be combined by either multiplication or summations leading to the rules: or m m "Accept Bl(HZ) if IT (D.. + 1) «» IT (DZ· + 1)11 j=l 1J j=l J m m Accept HI (HZ) if IT Dl · «» L DZ.• 11 j=l J j=l J (16) (17) In both (16) and (17), no decision is reached when there is equality. -. Neither of these rules is the same as that suggested for syrnmettical censoring in [1] (foot of page 12). An approximate assessment of the properties of procedure (17) can be based on normal approximation to the distribution of The first and second moments of (Dlj - D2j ) tmder HI are derived in the Appendix (equations (AlZ) and (A27) ). Numerical values for the expected value and variance of (Dlj - DZj ) tmder HI are given in Table 3. The variance lUlder HZ is the same as lUlder HI ' while the expected value has its sign reversed (positive tmder HZ ' negative lUlder HI ). 10 TP,BLE 3 r s' 15 2 3 4 2 3 1 2 1 0 10 2 3 1 2 1 0 5 1 2 1 0 s" E[D1- D21I1] 2 -3.852 1 -3.709 :"3.234 0 1 -3.078 0 -2.682 1 -2.271 0 -2.014 0 -1.163 0 0 1 -2.647 0 -2.243 1 -1.986 0 -1. 726 0 -1.024 0 0 1 -1.336 0 -1.059 0 -0.907 0 0 Moments of (D1-D2) var(D1-D2IH1) s.d. (D1-D21I1) m* m** 8.998 9.085 8.912 9.323 9.010 9.568 9.177 9.340 9.015 7.589 7.208 7.958 7.546 7,899 7.818 4.750 4.339 4.229 5.063 3.000 3.014 2.985 3.053 3.002 3.093 3.029 3.056 3.002 2.755 2.685 2.821 2.745 2.810 2.796 2.179 2.083 2.057 2.250 4 4 5 6 7 11 13 38 6 10 12 18 22 66 6 8 11 14 41 11 14 20 25 72 15 21 28 26 37 50 For m modestly large - say in ~ 7 9 4 - the distribution of m T = I (D1 · - D2 ·) m j=1 J J "lill be approximately nonnal, and the probability of correct decision, given either HI or H2 ' will be approximately (18) In the last two columns of Table 3 m* gives the minimum integer value of m for which the approximate probability of correct decision is at least 0.99, ie m*{E[D1-DZIH1]}2 var(D -D fIf,"J ~ 2.326 121 (19) 11 and m** gives the mini.Im.Jm value for which the approximate probability of correct decision is at least 0.999. As r+<», (s ' and Sil remaining constant) the limiting values of expected value and variance are (20) and lim var(Dl-DzIHl) r-+oo = 2(s'+s"+2) + (2S'+5)2- s ' + (ZS"i+S)S-Stv _ 4- 5 ' _ 4- 5 " • (Zl) (We use Table 4 gives some numerical values of these limits and corresponding values of m* and m**. From Table 4 it appears that the limiting values are not closely approached unless r is about 50 or more. TABLE 4 Limi ting Expected Value and Variance as 51 SIV lim E[Dl-DZIHl ] -5.5 -5.375 -4.984375 -4.25 -3.875 -3.0 -2.75 -1.5 0.0 !'-+otl 2 2 3 1 0 4 2 3 1 2 1 0 1 0 1 0 0 0 r-+oo (s lim var(D1-D2IH1) I , s" constant) m* m** ~ 16.. 375 16.609375 16.8046875 15.484375 15.359375 14.50 14.1875 13.25 12.0 3 4 4 5 6 9 11 32 6 6 7 9 10 16 18 57 12 The decision rule (16) is just as simple to apply as (17), but the distribution theory associated with it is rather more complex. We can examine the relationship between (16) and (17) in a simple case (r=Z, s'=l, SIl=O) by direct calculation of probabilities. In this case (~) = 10 possible configurations of the combined sample there are In seven of these Dl = DZ ; in two Dl = 0 and D2 = 2, and in one Dl = 2 and DZ = O. Now (16) and (17) each lead to acceptance of HI (which is valid Al n A2 • since s '>0 ) when there are more pairs of samples with one of the two (Dl = 0, DZ = 0) D2 = 2) configurations than there are with the single configuration. (Dl = 2, There will be no decision when the number of pairs of samples is the same for each of the two types of configuration. The probability of correct decision is 'f (f:l) (7 )m-j (3 )j j=l J where [~j] ro: ill ! (j) (2)h(1)j-h 3 m . = (0.7)m I (~) 7- J j=l J h=[~j]+l h 3 (22) denotes the integer part of ~j. The probability of no decision is [~] = (0.7)m(m!).I . {(m_2j)~}-1(j!)-Z(jg)J J=O (Of course, we take (~) = 1.) Alternatively, from the point of view of procedure (17), we need the distribution of Tm = Ij=l (D1rD2j)' The (23) 13 !1 for d=-2 7 10 for d=O k for d=2 and the probability generating function is The probability generating function of T m is lO-mt -2m(2 + 7t 2 + t 4)rn • Hence the probability of correct decision when HI is valid, using (17), is Pr[Tm < 0IH ] = 10·m x (SlDIlS of coefficients in tems of t a l with a<2m in (2+7t 2+t 4)ffi ). (24) The probability of no decision, when using (17) is Table 5 contains a few numerical values. The values given by (22) and (24), and by (23) and (25) are of course identical. In general, procedures (16) and (17) will not be identical, but this example is of interest in showing that they can be. TABLE 5 Decision Probabilities when H (or H) is Valid Under Either Rules (16) or (17) *hen r~2, s';:l, 5 11 =0 m 2 4 6 Correct 0.32 0.45 0.53 No 0.53 0.36 0.28 14 APPENDIX f'1oments of (D -D2) 1 (~) = 0 if a<b. (i) We will use the convention (ii) We will use the relationships (a<b) (Al) and (A2) For convenience we will omit the subscript denoting order in the series and consider where L i (G i ) is the number of values in Ai less(greater) than any values in A3 -1. . We introduce indicator variables x = {l L ih if the h-th order statistic of A. is less than any value in A3- i 1 0 otherwise and GXih' defined as LXih but with I less v replaced by 'greater v • Then r L. = 1 L h=l r TX·h; G. = ~ 1 1 r h=l (A3) GXoh 1 and (a E[LXih LXik] = E[LXik] for k~h E[cfih GXik] = E(dCih] for k~h. = 1,2) (4.1) (4.2) 15 e Under I\ ' the total number of ways of arranging the original values in Al and the r values of AZ in ascending order of magnitude is C2r+s'+s" r ). Consequently (r+s'+s") Number of arrangements for which LXih=1 E[LXihIHl] = Zr+s'+sh ( r ) and similarly for expected values of other quantities. It is convenient to introduce the symbols (AS) in our calculations. We will use pictorial representation to indicate the derivation of the various combinatorial formulae. E*[ X IF] - C2r+s"-h) L 111 11 r s' t E* [LXZh IHI] = L u=O (u+h-l) (2r+s'+s"-u-h) h-1 r-h Al s'+h r+s"-h (A6) A Z 0 r I Al uss r+s' +Sli_ U AZ h-lx • (A7) r-h From (A3) and (A6) (AB) (using (Al) ). Similarly E*[G IH ] = (2r+s') _ (r+s') 1 1 r+l r+l· From (A3) and (A7) (A9) 16 E*[L IH] 2 1 , 1 ? +'" v + (u+n)(Wr_~. s u=O h=l h-l r-h = 5'L rL 5' lJ 2r+s'+sli r-l) = L ( U=O -u~ h ) (using (A2) ) = (S'+1)(2r+;~~sii). (AlO) Similarly (All) From (AS) - (All) (2r+s')+(2r+s ll)_(r+s')_(r+s ii )_(s'+sfl+2) (2r+s'+s") E[D -D IR] = r+l r+l r+l r+l . r-I • 1 Z -"1. 2r+s Y +Sil ( r (Al2) ) We now turn to the evaluation of (A13) (Note that 1l LZ are GlG Z are always zero.) In view of (A4.l) = E*[Li r + 2 L Ie=l (k-I) LXo1 jHl ]. 111 (A14) Now (cont.) 17 ii ) = (2r+s r+2 / (r+si l+1) _ ( _l)C r +sll ) r+2 r r+l 2r+s li r+s i1 r+s lV = ( r+2 ) - (r+2 ) - r(r+1 ) and so = (2r+s r+l ll ) + 2C 2r+s' ) _ (2r+l)(r+s ' ) _ 2Cr +s '). r+2 r+1 r+2 (A1S) Similarly Also r '\' E*[ l (k-l) k=l LXzkIHl] r 51 '\' (2r+s'+si1- u -k ) = L (k-l) 'L\' (u+k-1) k-1 r-k k=l U=O 1 2 = L (u+1) rL (u+k-)( r+s +5 Sl U=O I k=2 Zr+s'+sii = (r+s'+sli+Z) = u+1 \I r-h k -u- ) Sl r U=O (u+l) 1i J~(s'+l) (5'+2) CZr+S '+5 ) r-2 (using CAZ)) (Al7) and so (AlB) Similarly 18 (AZO) Al s'+b. k-h-l s\/+r-k+l i I A 0 1 r: 0 Z From this r-l = ~ (Zr-h) = (Zr ) h~l r+l r+Z (AZl) (using (Al) twice). Also, with k>h again 5' 5" ~ ~ (u+h-l) (r+s' +sli-u-v+k-h-I) (v+r-k). [ I ] E* LXZh GXZk HI = L L h-l k-h-l r-k • u=O v=O (AZ2) Al ussllr+sl+sll_u-vIVSSIl A h-l x k-h-l xr-k Z Hence E*[LzGzIHl] 51 S" = L I r L (v+r-k) u=o v=O k=Z 5I 5 11 r r-k k=l h=l (u+h-l) (r+s'+s"-u-v+k-h-l) u r+s l +s 1i -u-v = L L L u=O v=O k=2 5' ~ (using (AZ)) gil ~ 2r+s l +s" = u=O L L ( ) v=O r-2 (using (A2) again) = (S'+1)(SIl+1)(2r+~~;sl1). To calculate E*[L1GzII\J we need (for any h,k (A23) = 1, .•. r) 19 (A24) Al s i+h r+sYl-v-h V~S:i A2 0 k-1 r-k whence E*~L G IH ] = 1 2 1 r r 1: v=O h=l k=l SH 1: r (v+r-k)(r+s!i-~-h+k-1) r+s -v-h v (using Sil = r+s 2r+s" r ( ) (r ) v=O r iV = (Sil+1){(2r+s r (AZ)) (using (Al)) li (A25) ) Similarly i i E*[G L IH_] = (si+1){(2r+s ) _ (r+s )} 1 2 -~ r r' (A26) Combining (Al3, 15, 16, 18, 19, 21, 23, 25, 26) we obtain after some rearrangement E[(D1-D2)2IH1] = (2r+s~+sv'fln:'{2(;~;s) + (;~~s) _ 2(S+1)(2~+s) - Z(r+s ) - (2r+1)(r+ s) + Z(s+l)(r+s)} r+ 2 r+1 r Zr+s'+sll Zr+s i +s + (s9+s"+2)( r-1 ) + (s'+s"+2)(S9+SIl+3)( r-2 ) l1 + 2(2r )] r-2 where r9 denotes StmIlllation over s::s', S=Sii. (A27) 20 when Ll + Gl = L2 + G2 = 1 and as (AlZ) gives E[Dl-DZIH1] = 0 and (A27) gives As a check we note that for Dl - D2 = 0 identically~ var(Dl-DZIH1) = O. r=l~ Of course, if in a sequence of sample pairs each of size 1 we had a preponderance of pairs with Ll = 1 = GZ we would suspect tl1at one of the two samples had been censored tmsymmetrically, but we would not be able to decide which one. If S9 = sl1 = 0 then, as we would expect, from (Al2) (for any r) and from (A27) var(Dl-DZIHI) -- Zr -1 4(3r z-4r-2) 4(r) + (r+l)(r+2) • TIle joint distribution of L1' Gl' L2~ GZ when HI is valid is given by the following table. , - R.l &1 R. 2 &2 S:o ... >0 0 0 >0 0 0 >0 0 >0 >0 0 0 0 >0 >0 (2r+s ' +s\1)Pr[ ~ (L.=R..) ~ (G.=g.)] . 1 11 r . I I I 1= 1= ] [Zr-Z-~l-gZ] . ['' +Sz] r-R.I -1 sIt [zr- Z-~2-g1] .[" +~z] r-gl-I [zr+~Z-g2]. (s'+~z]. [S"+gz]. lr-Z s' s" er-Z-~l-gl , (AZ8) r-2 Sl 21 The joint distribution of D1 = 11 + G1 and DZ = 12 + G2 is given by the following table • d1 dZ >0 >0 >0 0 0 >0 (2~+s~",:s'\Pr[ (Di::;d1) () (DZ=dZ)] z){[ +dz]+(SOO+d Z]} [Zr-Z-d1-d dZ dZ r-l-d1 r-Z-d1] (r-1) r-Z [zr-Z-dztz-1s'+j [S"+dz-j] 5 • ' e r- Z = .~ (s) J=l s" [zr~:;dZ]{(S'+S~:l+dZ}_(S~~Z}_[S:~Z]}. Calculation of the moments of (D1-D2) from the joint distribution of D1 and DZ does not seem to be as convenient as calculation using indicator variables. This method is easily extended to the case when r 1 and r Z ~ the sample sizes in A1~ AZ respectively, are not necessarily equal. However, the appropriate test criteria would then be different~ and so the results will not be given here. Tukey [3] uses a test statistic (for differences between two samples) which in our notation, would be defined as: - if 1.G . ;t 0, the value of 1. 3- 1. the statistic is (1i + G3- i ). (If 11G2 = 12Gl = 0, no value is assigned • to the statistic.) The reader may find the derivation of the distribution of the statistic in [3] of interest. 22 REFEREnCES [lJ • Johnson~ N. 1. (1976) !1Comp1eteness comparisons among sequences of samp1es. 1i Institute of Statistias~ University of North CaroZina, Mimeo Series No. 1056. (To appear in H. O. HartZey 65th Birthday VoZume. ) [2] Johnson, N.,L. (1970) iiA general purpose test of censoring of sample extreme values!! (In Essays in Pr>obabiUty and Statistias (S. N. Roy MemoriaZ VoZwne)) Chapel Hill: University of North Carolina Press (pp. 379-384). • [3] Tukey, J. W. (1959) ilA quick~ compact o~o-sample test to Duckworth's specificationsi l Teahnometrias~ 1, 21-48. & •
© Copyright 2024 Paperzz