Algorithm for computing the “optimal” cut off point for simple design

Dealing with Representative Outliers in Survey Sampling: Algorithms
Final Report
Robbert Renssen, Marc Smeets, Sabine Krieg – Statistics Netherlands
March 2002
1. Introduction
In this note we describe the algorithms for three types of adaptive censored estimators, which are discussed
extensively in Renssen, Smeets, and Krieg (2002). The first is the one-sided adaptive censored estimator for
i.i.d. situations (Renssen et al., section 2.2), the second is the two-sided adaptive censored estimator for
i.i.d. situations (Renssen et al., section 3.2), and the third is the (one-sided) adaptive censored estimator
for stratified designs (Renssen et al., section 4).
2. Adaptive censoring for simple random sampling with replacement (one-sided)
Consider a sample $y_1, \dots, y_n$ of $n$ elements drawn from a population of $N$ elements. We first compute
the estimated optimal cut-off point $t$ from the sample (steps 1 to 4). Given this cut-off point, we compute
the estimator (step 5). Without loss of generality, we assume that the sample is ordered: $y_1 \le y_2 \le \dots \le y_n$.
Step 1: take $q = 1/n$.
Step 2: compute $j = n(1-q)$, $p = j/n$, $y_m = \frac{1}{j}\sum_{i=1}^{j} y_i$ and $y_r = \frac{1}{n-j}\sum_{i=j+1}^{n} y_i$.
Step 3: compute $t = \dfrac{p\,y_m + nq\,y_r}{p + nq}$.
Step 4: If $y_j \le t$ and $y_{j+1} \ge t$, go to step 5. Otherwise take $q := q + 1/n$ and go to step 2.
Step 5: The $t$ computed in step 3 is the estimated optimal cut-off point. Compute $\hat{\mu}_t = p\,y_m + q\,t$.
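The five steps above can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: the function name `one_sided_cutoff` is hypothetical, $q$ is tracked through the integer count $k = nq$ so that $j = n(1-q)$ stays exact, and, because the note does not say what happens when step 4 never succeeds, the sketch stops with an error once only one uncensored value would remain.

```python
from typing import List, Tuple


def one_sided_cutoff(y: List[float]) -> Tuple[float, float]:
    """Steps 1-5 of the one-sided algorithm; returns (t, mu_hat_t)."""
    ys = sorted(y)                      # y_1 <= y_2 <= ... <= y_n
    n = len(ys)
    # Steps 1 and 4: q = 1/n, 2/n, ...; we store k = n*q so j = n - k is exact.
    for k in range(1, n):
        q = k / n
        j = n - k                       # step 2: j = n(1 - q)
        p = j / n
        y_m = sum(ys[:j]) / j           # mean of the j smallest values
        y_r = sum(ys[j:]) / (n - j)     # mean of the n - j largest values
        t = (p * y_m + n * q * y_r) / (p + n * q)   # step 3
        # Step 4 in 0-based indexing: y_j <= t <= y_{j+1} is ys[j-1] <= t <= ys[j]
        if ys[j - 1] <= t <= ys[j]:
            return t, p * y_m + q * t   # step 5: mu_hat_t = p*y_m + q*t
    # Assumption: the note does not specify what to do if step 4 never succeeds.
    raise RuntimeError("no cut-off found with at least one uncensored value")
```

For the sample 1, 2, 3, 4, 100 the first pass ($q = 1/5$) already satisfies step 4, so only the value 100 is censored, at $t = 170/3$, giving $\hat{\mu}_t = 40/3$. For 1, 2, 3, 4, 5 the first pass fails and the loop moves on to $q = 2/5$.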
3. Adaptive censoring for simple random sampling with replacement (two-sided)
Consider a sample $y_1, \dots, y_n$ of $n$ elements drawn from a population of $N$ elements. We want to estimate
the optimal cut-off points $s$ and $t$ from the sample. The sample is ordered: $y_1 \le y_2 \le \dots \le y_n$.
Step 1: take $i = 0$ and $j = 0$.
Step 2: compute $q_l = \frac{i+1}{n}$, $q_r = \frac{j+1}{n}$, $p_m = 1 - q_l - q_r$,
$y_m = \frac{1}{n-i-j-2}\sum_{k=i+2}^{n-j-1} y_k$, $y_l = \frac{1}{i+1}\sum_{k=1}^{i+1} y_k$ and $y_r = \frac{1}{j+1}\sum_{k=n-j}^{n} y_k$.
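Step 3: solve the following system of linear equations for $s$ and $t$:
$$
\begin{aligned}
\Big(\frac{p_m + q_r}{n} + q_l\Big)s - \frac{q_r}{n}\,t &= \frac{p_m}{n}\,y_m + q_l\,y_l \\
-\frac{q_l}{n}\,s + \Big(\frac{p_m + q_l}{n} + q_r\Big)t &= \frac{p_m}{n}\,y_m + q_r\,y_r
\end{aligned}
$$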
step 4: If yi1  s  yi2 and y n j 1  t  y n j then go to step 5; else



If j  0 then j : i  1 , i : 0 and go to step 2.
If j  0 and j  i then i : i  1 , j : j and go to step 2.
If j  0 and j  i then i : i , j : j  1 and go to step 2.
Step 5: The $s$ and $t$ computed in step 3 are the estimated optimal cut-off points. Now compute:
$y_{s,t} = q_l\,s + p_m\,y_m + q_r\,t$.
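A corresponding sketch of the two-sided algorithm follows; it is a hypothetical illustration, not the authors' code. `next_pair` encodes the three update rules of step 4, which visit the pairs $(0,d), (1,d), \dots, (d,d), (d,d-1), \dots, (d,0)$ before moving to $d+1$. The $2 \times 2$ system of step 3 is solved by Cramer's rule. The guard that skips pairs with an empty middle group, and stops once no admissible pair remains, is our own assumption, since the note leaves termination unspecified.

```python
from typing import List, Tuple


def next_pair(i: int, j: int) -> Tuple[int, int]:
    """Step 4 update rules for the pair (i, j)."""
    if j == 0:
        return 0, i + 1
    if j > i:
        return i + 1, j
    return i, j - 1


def two_sided_cutoffs(y: List[float]) -> Tuple[float, float, float]:
    """Steps 1-5 of the two-sided algorithm; returns (s, t, y_st)."""
    ys = sorted(y)                                     # y_1 <= ... <= y_n
    n = len(ys)
    i, j = 0, 0                                        # step 1
    # Assumed guard: once max(i, j) + 2 >= n, every later pair has an empty middle group.
    while max(i, j) + 2 < n:
        if i + j + 2 < n:                              # skip pairs with no middle values
            q_l, q_r = (i + 1) / n, (j + 1) / n        # step 2
            p_m = 1 - q_l - q_r
            y_l = sum(ys[:i + 1]) / (i + 1)            # i+1 smallest values
            y_m = sum(ys[i + 1:n - j - 1]) / (n - i - j - 2)
            y_r = sum(ys[n - j - 1:]) / (j + 1)        # j+1 largest values
            # Step 3: the 2x2 linear system, solved with Cramer's rule
            a11, a12 = (p_m + q_r) / n + q_l, -q_r / n
            a21, a22 = -q_l / n, (p_m + q_l) / n + q_r
            b1 = p_m / n * y_m + q_l * y_l
            b2 = p_m / n * y_m + q_r * y_r
            det = a11 * a22 - a12 * a21                # positive for these coefficients
            s = (b1 * a22 - a12 * b2) / det
            t = (a11 * b2 - a21 * b1) / det
            # Step 4 (0-based): y_{i+1} <= s <= y_{i+2} and y_{n-j-1} <= t <= y_{n-j}
            if ys[i] <= s <= ys[i + 1] and ys[n - j - 2] <= t <= ys[n - j - 1]:
                return s, t, q_l * s + p_m * y_m + q_r * t   # step 5
        i, j = next_pair(i, j)
    raise RuntimeError("no admissible pair (i, j) found")
```

For the ten values $-100, 1, 2, \dots, 8, 100$ the first pair $(i, j) = (0, 0)$ already passes step 4, giving $s = -48$, $t = 52$ and $y_{s,t} = 4$.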
4. Adaptive censoring for stratified designs (one-sided)
Consider $L$ strata with $N_1, \dots, N_L$ elements in the population and $n_1, \dots, n_L$ elements in the
sample. The problem is to compute the optimal stratum cut-off points $t_1, \dots, t_L$ from the sample. The
sample elements are denoted $y_{11}, \dots, y_{1n_1}, y_{21}, \dots, y_{Ln_L}$. Again, without loss of generality, we assume
that the sample of the $h$-th stratum is ordered such that $y_{h1} \le y_{h2} \le \dots \le y_{hn_h}$. We have to solve the
system of non-linear equations given in Renssen et al., formula 4.5, for $t_1, \dots, t_L$. We suggest the following heuristic.
Step 1: use the algorithm described in section 2 for each stratum separately to obtain a preliminary
estimate of the stratum mean; that is, take $n := n_h$, $N := N_h$, $y_i := y_{hi}$ for $i = 1, \dots, n_h$. Denote these
stratum estimates by $\hat{\mu}_h$.
Step 2: compute the transformed values $y^*_{hi} := \dfrac{N_h}{n_h}\,(y_{hi} - \hat{\mu}_h)$.
Step 3: calculate $N := \sum_{h=1}^{L} N_h$, $n := \sum_{h=1}^{L} n_h$ and $y_j := y^*_{hi}$, $h = 1, \dots, L$, $i = 1, \dots, n_h$, $j = 1, \dots, n$.
Order these values such that $y_1 \le \dots \le y_n$. Use the algorithm described in section 2 to calculate the
cut-off point $t^* := t$.
Step 4: compute for each stratum the fraction $q_h$ of transformed values larger than $t^*$:
$q_h = \#\{i : y^*_{hi} > t^*\} / n_h$.
Step 5: given $q_1, \dots, q_L$, compute $p_{hm} = 1 - q_h$ and $r_h = p_{hm} n_h$, $\mu_{hm} = \frac{1}{r_h}\sum_{i=1}^{r_h} y_{hi}$, and
$\mu_{hr} = \frac{1}{n_h - r_h}\sum_{i=r_h+1}^{n_h} y_{hi}$ for $q_h > 0$, or $\mu_{hr} = 0$ for $q_h = 0$, $h = 1, \dots, L$. Compute also
$W_h = N_h / N$, $h = 1, \dots, L$, and solve the following system of linear equations for $t_1, \dots, t_L$:
$$
\begin{aligned}
W_1\Big[\frac{W_1 p_{1m}}{n_1}(t_1 - \mu_{1m}) - \sum_{h=1}^{L} W_h q_h (\mu_{hr} - t_h)\Big] &= 0 \\
&\;\;\vdots \\
W_L\Big[\frac{W_L p_{Lm}}{n_L}(t_L - \mu_{Lm}) - \sum_{h=1}^{L} W_h q_h (\mu_{hr} - t_h)\Big] &= 0.
\end{aligned}
$$
This gives $t_1, \dots, t_L$.
Step 6: compute $\tilde{q}_h = \#\{i : y_{hi} > t_h\} / n_h$ for all $h$.
Step 7: if $q_h = \tilde{q}_h$ for all $h$, go to step 8. Otherwise take $h_{\min} = \min\{h : q_h \ne \tilde{q}_h\}$. If $q_{h_{\min}} < \tilde{q}_{h_{\min}}$,
then $q_{h_{\min}} := q_{h_{\min}} + 1/n_{h_{\min}}$; else $q_{h_{\min}} := q_{h_{\min}} - 1/n_{h_{\min}}$. Go to step 5.
Step 8: the solution $t_1, \dots, t_L$ found in step 5 is the optimal cut-off vector. Compute the estimate
$$
\hat{\mu}_t = \sum_{h=1}^{L} \frac{N_h}{N}\,\frac{\sum_{j=1}^{r_h} y_{hj} + (n_h - r_h)\,t_h}{n_h}.
$$
Remark: in our simulations, step 5 often gave the optimal cut-offs the first time. Otherwise, steps 5 to 7
were repeated a few times. However, it is not certain that this heuristic will find the correct solution;
steps 5 to 7 might repeat indefinitely. It is therefore recommended to limit the number of repetitions to,
say, 1000.
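The heuristic of steps 1 to 8, including the cap of 1000 repetitions suggested in the remark, can be sketched as below. This is our own illustration, not the authors' code: the function names are hypothetical, $q_h$ is stored as the integer count $k_h = n_h q_h$, and the step-5 system is solved through an algebraic rearrangement of our own rather than a general linear solver. Dividing equation $g$ by $W_g$ shows that it states $(W_g p_{gm}/n_g)(t_g - \mu_{gm}) = B$ with the common sum $B = \sum_h W_h q_h(\mu_{hr} - t_h)$, so $B$ and then each $t_g$ follow in closed form, provided every $p_{gm} > 0$.

```python
from typing import List, Tuple


def one_sided_cutoff(y: List[float]) -> Tuple[float, float]:
    """Section 2 algorithm (same sketch as before); returns (t, mu_hat)."""
    ys = sorted(y)
    n = len(ys)
    for k in range(1, n):                       # q = k/n, j = n - k
        j, q, p = n - k, k / n, (n - k) / n
        y_m, y_r = sum(ys[:j]) / j, sum(ys[j:]) / k
        t = (p * y_m + n * q * y_r) / (p + n * q)
        if ys[j - 1] <= t <= ys[j]:
            return t, p * y_m + q * t
    raise RuntimeError("section 2 algorithm found no cut-off")


def stratified_cutoffs(strata: List[List[float]], pop_sizes: List[int],
                       max_iter: int = 1000) -> Tuple[List[float], float]:
    """Steps 1-8 of the stratified heuristic; returns ([t_1..t_L], mu_hat_t)."""
    samples = [sorted(s) for s in strata]       # y_h1 <= ... <= y_hn_h
    L = len(samples)
    n_h = [len(s) for s in samples]
    N = sum(pop_sizes)
    W = [Nh / N for Nh in pop_sizes]            # W_h = N_h / N
    # Step 1: preliminary stratum means via the section 2 algorithm
    mu_hat = [one_sided_cutoff(s)[1] for s in samples]
    # Step 2: y*_hi = (N_h / n_h)(y_hi - mu_hat_h)
    ystar = [[pop_sizes[h] / n_h[h] * (v - mu_hat[h]) for v in samples[h]]
             for h in range(L)]
    # Step 3: pooled cut-off t* (one_sided_cutoff sorts the pooled values)
    t_star = one_sided_cutoff([v for s in ystar for v in s])[0]
    # Step 4: k_h = n_h * q_h = number of transformed values above t*
    k = [sum(v > t_star for v in s) for s in ystar]
    for _ in range(max_iter):
        # Step 5: r_h = n_h - k_h uncensored values per stratum
        r = [n - kh for n, kh in zip(n_h, k)]
        if min(r) < 1:
            raise RuntimeError("a stratum would be fully censored (p_hm = 0)")
        q = [kh / n for kh, n in zip(k, n_h)]
        p = [1 - qh for qh in q]
        mu_m = [sum(samples[h][:r[h]]) / r[h] for h in range(L)]
        mu_r = [sum(samples[h][r[h]:]) / k[h] if k[h] > 0 else 0.0
                for h in range(L)]
        # Closed form (our rearrangement):
        # B = sum_h W_h q_h (mu_hr - mu_hm) / (1 + sum_h q_h n_h / p_hm)
        B = (sum(W[h] * q[h] * (mu_r[h] - mu_m[h]) for h in range(L))
             / (1 + sum(q[h] * n_h[h] / p[h] for h in range(L))))
        t = [mu_m[h] + n_h[h] * B / (W[h] * p[h]) for h in range(L)]
        # Step 6: counts of original values above the new stratum cut-offs
        k_new = [sum(v > t[h] for v in samples[h]) for h in range(L)]
        if k == k_new:
            # Step 8: the stratified censored estimate
            est = sum(W[h] * (sum(samples[h][:r[h]]) + (n_h[h] - r[h]) * t[h])
                      / n_h[h] for h in range(L))
            return t, est
        # Step 7: move q of the first mismatching stratum by 1/n_h
        h_min = next(h for h in range(L) if k[h] != k_new[h])
        k[h_min] += 1 if k[h_min] < k_new[h_min] else -1
    raise RuntimeError(f"no convergence within {max_iter} iterations")
```

With a single stratum the sketch reproduces the section 2 result ($t_1 = 170/3$ for the sample 1, 2, 3, 4, 100), which is a useful consistency check of the system in step 5.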
References
Renssen, R., M. Smeets, and S. Krieg (2002), Dealing with Representative Outliers in Survey Sampling:
Methodology, Euredit report, WPx.2.
