RANDOM SET SAMPLING WITH NON RESPONSES

Carlos N. BOUZA1
ABSTRACT: The problem of estimating the population mean under
non responses is studied when Ranked Set Sampling [rss] is the
sampling design used for selecting the sample among the non
respondents [nr]. Two rss strategies are proposed in this paper:
1. Selection of a sub sample from the nr in each cycle.
2. Selection of sub samples from the nr in each rank.
The corresponding variances and their expectations are derived.
We find that the resulting error is smaller than that of the
classical simple random sampling alternative. The behavior of the
proposed estimators is illustrated using some experiments.
KEYWORDS: Non respondent strata, sub sampling, efficiency.
1 Introduction
Ranked set sampling [rss] is a procedure which was first
proposed by McIntyre (1952) when modeling the estimation of
pasture yields.
1 Facultad de Matemática y Computación, Universidad de La Habana, San Lázaro y L, Habana, CP 10 400, Cuba. Email: [email protected].
Rev. Mat. Estat., São Paulo, 19: 297-308, 2001

The ranked set sampling procedure involves drawing m independent
sets of m units each by simple random sampling with replacement
[srswr]. The units of each set are ranked, and the unit with rank j
in the j-th set is measured, which yields the value $y_{j(j)}$ of Y.
Thus m units are quantified out of the $m^2$ units in the m evaluated
sets. This procedure is repeated r times [cycles], so that $n = mr$
measurements are made. We denote by $s_i$ the sample corresponding to
cycle i. Rss reduces costs and increases the efficiency when it is
easy to rank the sampled units. Li, Sinha and Perron (1999) discuss
different features of this model.
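The selection scheme described above can be sketched in Python. This is only an illustrative sketch under stated assumptions: units are ranked on Y itself (perfect ranking), srswr from a large population is approximated by independent draws from a distribution, and the names `ranked_set_sample` and `draw_normal` are hypothetical, not taken from the paper.

```python
import numpy as np

def ranked_set_sample(draw, m, r, rng):
    """Return an (r, m) array whose entry [i, j] is the measured unit of
    rank j + 1 in the (j + 1)-th set of cycle i.

    `draw(rng, size)` is an assumed helper that returns `size` values of Y.
    """
    sample = np.empty((r, m))
    for i in range(r):                 # r cycles
        for j in range(m):             # m independent sets per cycle
            ranked_set = np.sort(draw(rng, m))  # assumption: perfect ranking on Y
            sample[i, j] = ranked_set[j]        # keep only the unit with rank j + 1
    return sample

# Usage: r = 10 cycles of set size m = 5 from a N(1, 1) population (assumed values)
rng = np.random.default_rng(0)
draw_normal = lambda rng, size: rng.normal(1.0, 1.0, size)
y = ranked_set_sample(draw_normal, m=5, r=10, rng=rng)
rss_mean = y.mean()   # rss estimate of the mean from n = m * r = 50 measurements
```

Only n = mr units are measured, although m^2 r units are drawn and ranked, which is where the cost saving comes from when ranking is cheap.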
The classical non response problem [nr], see Cochran (1977)
and Ardilly (1994), concerns the existence of missing
observations in the random sample s. We study the problem of
estimating the population mean when the information on some
units is not available and rss is used.
When nr are observed, the population U is divided into two strata
$$U_1 = \{u \in U \mid u \text{ responds at the first attempt}\}$$
$$U_2 = \{u \in U \mid u \text{ does not respond at the first attempt}\}$$
so that $U = U_1 + U_2$ and the sample is $s_i = s_{i1} + s_{i2}$, where
$s_{it} \subset U_t$, $t = 1, 2$. The size of $s_{it}$, $m_{it} = |s_{it}|$,
is a random variable. Let $N_t$ represent the number of units within
$U_t$; hence $|U| = N_1 + N_2 = N$.
We will use the following notation and results:
$$E[Y_u \mid u \in U_t] = \mu_t, \quad V[Y_u \mid u \in U_t] = \sigma_t^2, \quad t = 1, 2,$$
$$E[Y_{(j)}] = \mu_{(j)}, \quad V[Y_{(j)}] = \sigma^2_{(j)}, \quad j = 1, 2, \dots, m,$$
$$E[Y_{(j)} \mid U_2] = \mu_{2(j)}, \quad V[Y_{(j)} \mid U_2] = \sigma^2_{2(j)}, \quad j = 1, 2, \dots, m,$$
$$\Delta_{(j)} = \mu_{(j)} - \mu, \quad \Delta_{2(j)} = \mu_{2(j)} - \mu_2, \quad j = 1, 2, \dots, m,$$
$$E[Y] = \mu = W_1\mu_1 + W_2\mu_2 \quad \text{and} \quad V[Y] = \sigma^2.$$
Here $Y_{(j)}$ denotes the statistic of order j and $W_t = N_t/N$.
The surveyor may select the sub sample from the nr by using
one of the following strategies:
1. Select a sub sample $s'_{i2}$ of size $m'_{i2}$ from each $s_{i2}$, $i = 1, \dots, r$.
2. Select a sub sample $s'_{2(j)}$ of size $r'_{2(j)}$ from the nr with rank
j, $j = 1, \dots, m$.
In both cases srswr is used for selecting the sub samples.
We will obtain estimators of $\mu$ for the model associated with each
strategy.
Section 2 is devoted to the development of unbiased
estimators of $\mu$ and their errors. In Section 3 their accuracy
is analyzed through simulation experiments and the behavior
of the proposed estimators is evaluated. The result is that
Strategy 1 is the best alternative.
2 Sub sampling the non respondent stratum with rss
The rss method produces r independent samples $s_{i2}$. If
$m_{i2} = |s_{i2}| \neq 0$, a sub sample $s'_{i2}$ should be selected
from $s_{i2}$. The nr may be a consequence of a different behavior of Y
in the units that belong to $U_2$. Therefore $\mu_2$, $\mu_{2(j)}$,
$\sigma_2^2$ and $\sigma^2_{2(j)}$ may be very different from $\mu$,
$\mu_{(j)}$, $\sigma^2$ and $\sigma^2_{(j)}$. The surveyor fixes
$m'_{i2}$ and selects the sub sample from $s_{i2}$ using srswr for
obtaining information from $U_2$.
Take $y'_{iu}$ as the value of Y in the unit u of $s'_{i2}$. The sub sample
mean is given by:
$$\bar y'_{i2} = \frac{1}{m'_{i2}} \sum_{u=1}^{m'_{i2}} y'_{iu}$$
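The mean of $s_{i1}$ is denoted by
$$\bar y_{rss,i1} = \frac{1}{m_{i1}} \sum_{j=1}^{m} \tilde y_{ij},$$
defining
$$\tilde y_{ij} = \begin{cases} y_{j(j)} & \text{if the unit with rank } j \text{ in the } j\text{-th ranked set responds} \\ 0 & \text{otherwise.} \end{cases}$$
From Cochran [1977] it follows that an estimator of $\mu$ is the
sample weighted mean
$$\bar{\bar y}_i = w_{i1}\,\bar y_{rss,i1} + w_{i2}\,\bar y'_{i2}$$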
where $w_{it} = m_{it}/m$, $t = 1, 2$, and $\bar y_{rss,i1}$ is the rss
estimator of the population mean of $U_1$; hence it is an unbiased
estimator of $\mu_1$. As $s'_{i2} \subset U_2$ we have that
$\bar y'_{i2}$ is also unbiased for $\mu_2$. The rss estimator of
$\mu_2$ is
$$\bar y_{rss,i2} = \frac{1}{m_{i2}} \sum_{j=1}^{m} \left( y_{j(j)} - \tilde y_{ij} \right).$$
Therefore we can use
$$\bar{\bar y}_i = w_{i1}\,\bar y_{rss,i1} + w_{i2}\,\bar y_{rss,i2} + w_{i2}\left( \bar y'_{i2} - \bar y_{rss,i2} \right).$$
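As an illustration, the weighted estimator for a single cycle under Strategy 1 can be sketched as follows. This is a hedged sketch, not the author's implementation: the function name `strategy1_cycle_estimate` is hypothetical, the sub sample size is taken as m_i2 / K rounded up (the rounding is an assumption), and the values of the nr are assumed to be obtainable for the sub sampled units.

```python
import math
import numpy as np

def strategy1_cycle_estimate(y_ranked, responds, K, rng):
    """Weighted estimate for one cycle.

    y_ranked[j]: value y_{j(j)} of the unit with rank j + 1 in set j + 1.
    responds[j]: True if that unit answers at the first attempt.
    K: sub sampling factor, sub sample size = ceil(m_i2 / K) (assumed rounding).
    """
    y_ranked = np.asarray(y_ranked, dtype=float)
    responds = np.asarray(responds, dtype=bool)
    m = y_ranked.size
    m1 = int(responds.sum())          # respondents in the cycle
    m2 = m - m1                       # non respondents in the cycle
    w1, w2 = m1 / m, m2 / m
    mean_resp = y_ranked[responds].mean() if m1 > 0 else 0.0
    if m2 == 0:                       # complete response: plain rss mean
        return w1 * mean_resp
    m2_sub = max(1, math.ceil(m2 / K))
    # srswr among the non respondents of the cycle
    sub = rng.choice(y_ranked[~responds], size=m2_sub, replace=True)
    return w1 * mean_resp + w2 * sub.mean()

rng = np.random.default_rng(0)
estimate = strategy1_cycle_estimate([1, 2, 3, 10, 10],
                                    [True, True, True, False, False],
                                    K=2, rng=rng)
```

Averaging such per-cycle estimates over the r cycles gives the overall Strategy 1 estimate.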
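Note that the first term, $\bar y_{rss,i} = w_{i1}\bar y_{rss,i1} + w_{i2}\bar y_{rss,i2}$,
is the rss estimator of $\mu$ under the complete response model.
As the conditional expectation of the second term,
$w_{i2}(\bar y'_{i2} - \bar y_{rss,i2})$, is zero, $\bar{\bar y}_i$ is
unbiased for $\mu$ and its variance is:
$$V\left[\bar{\bar y}_i\right] = V\left[\bar y_{rss,i}\right] + w_{i2}^2\, E\left[\left(\bar y'_{i2} - \bar y_{rss,i2}\right)^2\right] + \mathrm{Cov}\left(\bar y_{rss,i},\; \bar y'_{i2} - \bar y_{rss,i2}\right)$$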
It is well known, see Kaur, Patil and Taillie (1997) for
example, that
$$V\left[\bar y_{rss,i}\right] = \frac{1}{m^2} \sum_{j=1}^{m} \sigma^2_{(j)}$$
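Taking
$$\bar y'_{i2} - \mu_2 = \left(\bar y'_{i2} - \bar y_{rss,i2}\right) + \left(\bar y_{rss,i2} - \mu_2\right),$$
the expectation of the second term of $V[\bar{\bar y}_i]$ is
$$E\left[\left(\bar y'_{i2} - \bar y_{rss,i2}\right)^2\right] = \frac{\sigma_2^2}{m'_{i2}} - V\left[\bar y_{rss,i2}\right]$$
because there is no contribution from the cross term. Then
$$V\left[\bar{\bar y}_i\right] = \frac{1}{m^2} \sum_{j=1}^{m} \sigma^2_{(j)} + w_{i2}^2 \left[ \frac{\sigma_2^2}{m'_{i2}} - \frac{1}{m_{i2}^2} \sum_{j=1}^{m_{i2}} \sigma^2_{2(j)} \right] \qquad (1)$$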
For calculating $EV[\bar{\bar y}_i]$ we need an explicit expression of
$m'_{i2}$. Suppose the surveyor uses Hansen-Hurwitz's rule,
$m'_{i2} = m_{i2}/K_i$, $K_i > 1$. In this case, since $w_{i2} = m_{i2}/m$,
$$V\left[\bar{\bar y}_i\right] = \frac{1}{m^2} \sum_{j=1}^{m} \sigma^2_{(j)} + \frac{w_{i2} K_i \sigma_2^2}{m} - \frac{1}{m^2} \sum_{j=1}^{m_{i2}} \sigma^2_{2(j)} \qquad (2)$$
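Using the results of Dell and Clutter (1972) it is possible to
express the sums of the variances of the order statistics involved
as
$$\frac{1}{m^2} \sum_{j=1}^{m} \sigma^2_{(j)} = \frac{\sigma^2}{m} - \frac{1}{m^2} \sum_{j=1}^{m} \Delta^2_{(j)}$$
$$\frac{1}{m_{i2}^2} \sum_{j=1}^{m_{i2}} \sigma^2_{2(j)} = \frac{\sigma_2^2}{m_{i2}} - \frac{1}{m_{i2}^2} \sum_{j=1}^{m_{i2}} \Delta^2_{2(j)}$$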
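The decomposition of the sum of order-statistic variances can be checked exactly in a simple case. The sketch below uses the uniform distribution on (0, 1), which is an assumption made only for illustration (the paper does not use it): there the j-th order statistic of m draws has mean j/(m+1) and variance j(m-j+1)/((m+1)^2 (m+2)), and the population has mean 1/2 and variance 1/12. The helper names are hypothetical.

```python
from fractions import Fraction

def uniform_order_stat_moments(m):
    """Exact means and variances of the order statistics of m iid U(0,1) draws."""
    means = [Fraction(j, m + 1) for j in range(1, m + 1)]
    variances = [Fraction(j * (m - j + 1), (m + 1) ** 2 * (m + 2))
                 for j in range(1, m + 1)]
    return means, variances

def check_identity(m):
    """Return both sides of (1/m^2) sum sigma^2_(j) = sigma^2/m - (1/m^2) sum Delta^2_(j)."""
    mu, sigma2 = Fraction(1, 2), Fraction(1, 12)   # U(0,1) mean and variance
    means, variances = uniform_order_stat_moments(m)
    lhs = sum(variances) / m ** 2
    rhs = sigma2 / m - sum((mu_j - mu) ** 2 for mu_j in means) / m ** 2
    return lhs, rhs

for m in (2, 5, 10):
    lhs, rhs = check_identity(m)
    print(m, lhs, rhs, lhs == rhs)
```

Exact rational arithmetic is used so that the two sides can be compared for equality rather than up to rounding error.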
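Then we can rewrite $V[\bar{\bar y}_i]$, and its expectation is easily
obtained. It is
$$EV\left[\bar{\bar y}_i\right] = \frac{\sigma^2}{m} + \frac{W_2 [K_i - 1] \sigma_2^2}{m} - \frac{1}{m^2} \left[ \sum_{j=1}^{m} \Delta^2_{(j)} - E\Big[ \sum_{j=1}^{m_{i2}} \Delta^2_{2(j)} \Big] \right] \qquad (3)$$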
The rss procedure establishes that we must replicate r times
for obtaining a sample of size $n = mr$ and average the r
sample means. In our case the corresponding mean
of the r samples $s'_1, \dots, s'_r$ is
$$\bar{\bar y}_{rss} = \frac{1}{r} \sum_{i=1}^{r} \bar{\bar y}_i$$
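It is unbiased for $\mu$ and its error is given by
$$EV\left[\bar{\bar y}_{rss}\right] = \frac{\sigma^2}{n} + \frac{W_2 [\bar K - 1] \sigma_2^2}{n} - \frac{1}{n} \left[ \bar\Delta^2 - \frac{1}{n} \sum_{i=1}^{r} E\left[\Delta^2_{2i}\right] \right]$$
where
$$\Delta^2_{2i} = \sum_{j=1}^{m_{i2}} \Delta^2_{2(j)}, \qquad \bar\Delta^2 = \frac{1}{m} \sum_{j=1}^{m} \Delta^2_{(j)} \qquad \text{and} \qquad \bar K = \frac{1}{r} \sum_{i=1}^{r} K_i.$$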
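Note that if $K_i = K$ the third term measures the gain in
accuracy of the proposed model with respect to the use of srs in the
presence of nr. The proposed model is more accurate than the srswr
one if the bracketed quantity in that term is positive. Otherwise the
relation between $\bar K$ and $K$ should be included in the analysis of
the behavior of the expected variance. Let us assume that the surveyor
has information for fixing $\Delta^2_{0i} = \max_j [\Delta^2_{2(j)}]$
for each $i = 1, \dots, r$. Then:
$$\Delta^2_{0i}\, m_{i2} \ge \sum_{j=1}^{m_{i2}} \Delta^2_{2(j)}$$
and we have that $m W_2 \Delta^2_{0i}$ is an upper bound of
$E[\sum_{j=1}^{m_{i2}} \Delta^2_{2(j)}]$. Hence
$$\bar\Delta^2 - \frac{1}{n} \sum_{i=1}^{r} E\Big[\sum_{j=1}^{m_{i2}} \Delta^2_{2(j)}\Big] \ge \bar\Delta^2 - \frac{W_2}{r} \sum_{i=1}^{r} \Delta^2_{0i},$$
where $\bar\Delta^2 = \frac{1}{m}\sum_{j=1}^{m}\Delta^2_{(j)}$, and a gain in
accuracy is expected whenever the right-hand side is positive.
Generally $W_2$ is sufficiently small to guarantee the positiveness of
this bound.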
Note that the r observed units with rank j constitute an
independent sample $s_{(j)}$ from the density $f_{(j)}$. In the
alternative strategy a sub sample $s'_{2(j)}$ of size $r'_{2(j)}$ units
should be selected among the nr in $s_{(j)}$. The sample means
$$\bar y'_{2(j)} = \frac{1}{r'_{2(j)}} \sum_{p=1}^{r'_{2(j)}} y'_{p(j)} \qquad \text{and} \qquad \bar y_{1(j)} = \frac{1}{r_{1(j)}} \sum_{p=1}^{r_{1(j)}} y_{p(j)}$$
can be computed for $s'_{2(j)}$ and $s_{1(j)}$. We have that an unbiased
estimator of $\mu_{(j)}$ is
$\bar{\bar y}_{(j)} = q_{1(j)} \bar y_{1(j)} + q_{2(j)} \bar y'_{2(j)}$,
defining $q_{t(j)} = r_{t(j)}/r$, $t = 1, 2$. Mimicking the procedure
used with Strategy 1,
$$\bar{\bar y}_{(j)} = q_{1(j)} \bar y_{1(j)} + q_{2(j)} \bar y_{2(j)} + q_{2(j)} \left( \bar y'_{2(j)} - \bar y_{2(j)} \right)$$
where $\bar y_{2(j)}$ is the mean of the nr. As
$E[\bar y'_{2(j)}] = E[\bar y_{2(j)}] = \mu_{2(j)}$, we obtain
$E[\bar{\bar y}_{(j)}] = \mu_{(j)}$; that is, $\bar{\bar y}_{(j)}$ is also
unbiased.
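Its conditional variance is
$$V\left[\bar{\bar y}_{(j)}\right] = V\left[\bar y_{(j)}\right] + q^2_{2(j)}\, E\left[\left(\bar y'_{2(j)} - \bar y_{2(j)}\right)^2\right]$$
because the cross term has null expected value. The first term is
$V[\bar y_{(j)}] = \frac{1}{r} \sigma^2_{(j)}$. If Hansen-Hurwitz's rule
is used we have that $r'_{2(j)} = r_{2(j)}/K_{(j)}$ and
$$E\left[\left(\bar y'_{2(j)} - \bar y_{2(j)}\right)^2\right] = \sigma^2_{2(j)}\, \frac{K_{(j)} - 1}{r_{2(j)}}$$
Therefore:
$$V\left[\bar{\bar y}_{(j)}\right] = \frac{1}{r} \sigma^2_{(j)} + \frac{q_{2(j)} \left[K_{(j)} - 1\right] \sigma^2_{2(j)}}{r}$$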
As an estimator of $\mu$ we use the mean of the m estimators of
the rank population means
$$\bar{\bar y}_{(rss)} = \frac{1}{m} \sum_{j=1}^{m} \bar{\bar y}_{(j)}$$
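The rank-wise strategy can be illustrated with a short Python sketch. It is only a sketch under assumptions: the data are stored as an r by m array (cycle by rank), the function name `strategy2_estimate` is hypothetical, and the sub sample size for rank j is taken as the number of nr with that rank divided by K, rounded up.

```python
import math
import numpy as np

def strategy2_estimate(y, responds, K, rng):
    """Mean of the m rank-wise estimates.

    y[i, j]: value of the rank-(j + 1) unit measured in cycle i.
    responds[i, j]: True if that unit answers at the first attempt.
    K: sub sampling factor, sub sample size = ceil(r_2(j) / K) (assumed rounding).
    """
    y = np.asarray(y, dtype=float)
    responds = np.asarray(responds, dtype=bool)
    r, m = y.shape
    rank_estimates = np.empty(m)
    for j in range(m):
        resp = y[responds[:, j], j]        # respondents with rank j + 1
        nonresp = y[~responds[:, j], j]    # non respondents with rank j + 1
        q1, q2 = resp.size / r, nonresp.size / r
        est = q1 * resp.mean() if resp.size else 0.0
        if nonresp.size:
            r2_sub = max(1, math.ceil(nonresp.size / K))
            sub = rng.choice(nonresp, size=r2_sub, replace=True)  # srswr
            est += q2 * sub.mean()
        rank_estimates[j] = est
    return rank_estimates.mean()

rng = np.random.default_rng(0)
y = np.array([[1.0, 4.0], [3.0, 4.0], [9.0, 4.0], [9.0, 4.0]])
responds = np.array([[True, True], [True, True], [False, True], [False, True]])
estimate = strategy2_estimate(y, responds, K=2, rng=rng)
```

In contrast with Strategy 1, the sub sampling here is done within each rank across the r cycles, so the weights q depend on how many units of each rank fail to respond.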
It is unbiased and its variance is given by:
$$V\left[\bar{\bar y}_{(rss)}\right] = \frac{1}{m^2 r} \left[ \sum_{j=1}^{m} \sigma^2_{(j)} + \sum_{j=1}^{m} q_{2(j)} \left[K_{(j)} - 1\right] \sigma^2_{2(j)} \right]$$
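Denote by $P_{2(j)}$ the probability that a sample unit with rank j
does not respond. Hence:
$$EV\left[\bar{\bar y}_{(rss)}\right] = \frac{\sigma^2}{n} - \frac{1}{nm} \sum_{j=1}^{m} \Delta^2_{(j)} + \frac{1}{nm} \sum_{j=1}^{m} P_{2(j)} \left[K_{(j)} - 1\right] \sigma^2_{2(j)}$$
and adding and subtracting $\frac{1}{nm}\sum_{j=1}^{m} \sigma^2_{2(j)}$ in the last term we have
that
$$\frac{1}{nm} \sum_{j=1}^{m} P_{2(j)} \left[K_{(j)} - 1\right] \sigma^2_{2(j)} = \frac{1}{nm} \sum_{j=1}^{m} \left[ P_{2(j)} \left[K_{(j)} - 1\right] - 1 \right] \sigma^2_{2(j)} + \frac{1}{nm} \sum_{j=1}^{m} \sigma^2_{2(j)}$$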
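Then the expected variance is given by:
$$EV\left[\bar{\bar y}_{(rss)}\right] = \frac{\sigma^2}{n} + \frac{\sigma_2^2}{n} + \frac{1}{nm} \sum_{j=1}^{m} \left[ P_{2(j)} \left[K_{(j)} - 1\right] - 1 \right] \sigma^2_{2(j)} - \frac{1}{nm} \sum_{j=1}^{m} \left[ \Delta^2_{(j)} + \Delta^2_{2(j)} \right]$$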
An analysis of the gain in accuracy of this estimator depends
on the values of the $P_{2(j)}$'s, which are generally unknown. Therefore
we cannot conclude whether it is a better alternative than the
other estimators.
3 Evaluating the efficiency
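The classical srswr estimator of $\mu$ under nr is
$$\bar{\bar y}_{srs} = \frac{n_1 \bar y_1 + n_2 \bar y'_2}{n}$$
where $n_t = |s_t|$, $s_t \subset U_t$, $t = 1, 2$,
$$\bar y_1 = \frac{1}{n_1} \sum_{i=1}^{n_1} y_i$$
is the sample mean of the respondents,
$$\bar y'_2 = \frac{1}{n'_2} \sum_{i=1}^{n'_2} y'_i$$
is the mean corresponding to the sub sample among the nr,
and $n'_2 = n_2/K$. The error of $\bar{\bar y}_{srs}$, see Cochran [1977], is
$$EV\left[\bar{\bar y}_{srs}\right] = \frac{\sigma^2}{n} + \frac{W_2 [K - 1] \sigma_2^2}{n}$$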
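As a worked illustration of the srs expected variance, the following sketch evaluates it for a two-stratum normal population. The function names are hypothetical, and the overall variance is obtained from the standard mixture formula, which is an assumption added here for the example and is not stated in the paper.

```python
def mixture_variance(W1, mu1, var1, mu2, var2):
    """Variance of Y when a fraction W1 of units has (mu1, var1) and the rest (mu2, var2)."""
    W2 = 1.0 - W1
    return W1 * var1 + W2 * var2 + W1 * W2 * (mu1 - mu2) ** 2

def expected_variance_srs(sigma2, sigma2_nr, W2, K, n):
    """sigma^2 / n + W2 (K - 1) sigma_2^2 / n for srs with sub sampling of the nr."""
    if K < 1 or n <= 0:
        raise ValueError("K must be >= 1 and n must be positive")
    return sigma2 / n + W2 * (K - 1) * sigma2_nr / n

# Example values (assumed): respondents N[1; 1], nr N[3; 1], W1 = 0.8, n = 100
sigma2 = mixture_variance(0.8, 1.0, 1.0, 3.0, 1.0)
for K in (1, 2, 4):
    print(K, expected_variance_srs(sigma2, 1.0, 0.2, K, 100))
```

With K = 1 every nr is followed up and the second term vanishes; larger K saves follow-up effort at the price of a larger variance.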
We made a comparison of the behavior of $\bar{\bar y}_{srs}$ with the rss
estimators $\bar{\bar y}_{(rss)}$ and $\bar{\bar y}_{rss}$.
A set of experiments was developed using a normal $N[\mu_1, \sigma_1^2]$
distribution for generating $m_1$ values of $Y_u$, $u \in U_1$, and another
$N[\mu_2, \sigma_2^2]$ for the $m_2$ nr. Each $m_t$ was generated by a Binomial
with parameters m and $W_t$.
A similar procedure was used for generating $n_1$ and $n_2$.
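The data generation step of the experiments might look as follows in Python. This is a sketch under assumptions: the function name `generate_cycle` is hypothetical, the second parameter of each normal is treated as a variance, and the number of nr is taken as m minus the Binomial count of respondents so that the two sizes add up to m.

```python
import numpy as np

def generate_cycle(m, W1, mu1, var1, mu2, var2, rng):
    """Generate the values of the respondents and of the nr for one set of m units."""
    m1 = rng.binomial(m, W1)           # number of respondents, Binomial(m, W1)
    m2 = m - m1                        # assumption: the remaining units are nr
    y_resp = rng.normal(mu1, np.sqrt(var1), m1)
    y_nonresp = rng.normal(mu2, np.sqrt(var2), m2)
    return y_resp, y_nonresp

rng = np.random.default_rng(2001)
y_resp, y_nonresp = generate_cycle(10, 0.8, 1.0, 1.0, 3.0, 1.0, rng)
print(y_resp.size, y_nonresp.size)
```

Repeating this for each of the r cycles and each generated population gives the inputs needed to compare the estimators empirically.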
The efficiency of the estimators is denoted by:
$$\varepsilon[a, b] = \frac{EV\left[\bar{\bar y}_a\right]}{EV\left[\bar{\bar y}_b\right]}, \qquad a \neq b$$
We computed $\varepsilon[a, b]$, where the sub indexes a and b denote rss, (rss) and
srs. The number of cycles was fixed as r = 10, with $K = \bar K = K_{(j)}$, and
100 populations of size 500 were generated. The results are given
in Table 1.
The analysis of the table establishes that $\bar{\bar y}_{rss}$ performs better
than $\bar{\bar y}_{(rss)}$ and $\bar{\bar y}_{srs}$, but $\bar{\bar y}_{(rss)}$ is worse
than $\bar{\bar y}_{srs}$. Note that $\bar{\bar y}_{rss}$ produces larger gains in
accuracy when the parameters $W_1$ and $\sigma_2$ are large.
Table 1. Efficiency of the estimators under normal distributions: r = 10.

Distributions                     m    e[rss,(rss)]    e[rss,srs]     e[(rss),srs]
                                       K=2    K=4     K=2    K=4     K=2    K=4
N[1;1], N[3;1], W1 = 0.8         10    0.81   0.67    0.92   0.88    1.12   1.22
                                 20    0.74   0.64    0.87   0.83    1.18   1.28
                                 30    0.60   0.57    0.81   0.76    1.22   1.31
N[1;1], N[3;1], W1 = 0.2         10    0.92   0.84    0.98   0.89    1.34   1.36
                                 20    0.87   0.77    0.93   0.85    1.38   1.40
                                 30    0.83   0.69    0.89   0.79    1.46   1.45
N[1;1], N[3;1], W1 = 0.8         10    0.71   0.66    0.86   0.81    1.43   1.49
                                 20    0.65   0.53    0.70   0.75    1.49   1.53
                                 30    0.59   0.48    0.67   0.69    1.52   1.55
N[1;1], N[3;3], W1 = 0.2         10    0.88   0.76    0.90   0.84    1.37   1.46
                                 20    0.76   0.63    0.85   0.77    1.45   1.50
                                 30    0.73   0.59    0.81   0.69    1.48   1.54

Here e[a,b] denotes the efficiency of estimator a with respect to estimator b.
In Table 2 we present a comparison among the estimators
using real life data. They were obtained from a study of the
abundance of Plankton in Veracruz, Mexico. The selection of
the units was made by ranking the results of the analysis of the
Hemoglobin. The variable of interest is the evaluation of the
blood's quality expressed as a percent of the computed result
with respect to a fixed ideal standard. One hundred samples were
selected and the accuracy measure $A[\bar{\bar y}_a]$, an average over the
samples $s = 1, \dots, 100$ of a quantity based on $\bar{\bar y}_a$, was
computed. The results coincide with those obtained in the
simulations. They again suggest that $\bar{\bar y}_{rss}$ is the best alternative
and $\bar{\bar y}_{(rss)}$ the worst.
Table 2. Accuracy of the estimators in a Hematic Biometry
study. r = 10

 m    A[y rss]        A[y srs]        A[y (rss)]
      K=2    K=4     K=2    K=4     K=2    K=4
10    0.23   0.26    0.33   0.38    0.44   0.56
20    0.11   0.19    0.25   0.31    0.35   0.46
30    0.10   0.08    0.10   0.20    0.28   0.39
Another set of data is given by the yields of rice in 1551 plots
generated under experimental conditions. The results are given in
Table 3. Again $\bar{\bar y}_{rss}$ is the best alternative, but $\bar{\bar y}_{(rss)}$
behaves very similarly to the srs estimator.
Table 3. Accuracy of the estimators in a study of rice's yields.
r = 10

 m    A[y rss]        A[y srs]        A[y (rss)]
      K=2    K=4     K=2    K=4     K=2    K=4
10    1.61   0.82    2.83   3.45    3.06   3.51
20    1.38   0.73    2.33   2.87    2.52   2.96
30    0.76   0.69    1.81   1.12    1.74   1.25
The results suggest that the first strategy should be preferred.
BOUZA, C.N. Conjunto de amostras aleatórias com respostas
faltantes. Rev. Mat. Estat. (São Paulo), v. 19, p. 297-308, 2001.
RESUMO: O problema de estimar médias populacionais na ausência
de respostas é estudado quando o Conjunto de Amostras Ordenadas
(RSS) é o delineamento amostral usado para selecionar a amostra
entre as respostas faltantes (NR). Duas estratégias RSS são propostas
no artigo: 1) Seleção de uma sub amostra de NR em cada ciclo. 2)
Seleção de uma sub amostra de NR em cada passo. As variâncias
correspondentes e seus valores esperados são deduzidos. Obtemos
que o erro resultante é menor do que o da alternativa clássica de
amostragem aleatória simples. O comportamento dos estimadores
propostos é ilustrado através de alguns experimentos.
PALAVRAS-CHAVE: Extratos de respostas faltantes, sub amostragem,
eficiência.
References
ARDILLY, P. Les techniques de sondages. Paris: Technip, 1994. p.393.
COCHRAN, W.G. Sampling techniques. New York: Wiley, 1977. p.435.
DELL, T.R.; CLUTTER, J.L. Ranked set sampling theory with
order statistics background. Biometrics, v.28, p.545-55, 1972.
KAUR, A.; PATIL, G.P.; TAILLIE, C. Unequal allocation models
for ranked set sampling with skew distributions. Biometrics, v.53,
p.123-30, 1997.
HANSEN, M.H.; HURWITZ, W.N. The problem of non response
in sample survey. J. Amer. Stat. Assoc., v.41, p.517-29, 1946.
LI, D.; SINHA, B.K.; PERRON, F. Random selection in ranked set
sampling and its applications. J. Stat. Plan. Infer., v.76, p.185-201,
1999.
McINTYRE, G.A. A method of unbiased selection sampling using
ranked sets. Aust. J. Agric. Res., v.3, p.385-90, 1952.
PATIL, G.P.; SINHA, A.; TAILLIE, C. Ranked set sampling.
In: PATIL, G.P.; RAO, C.R. (Ed.). Environmental statistics.
Amsterdam: New Holland, 1994. p.167-98. (Handbook of
Statistics, 12)
SAMAWI, H.M. Stratified set sampling. Pakistan J. Stat., v.12,
p.9-16, 1996.
Received 19.09.1999.