D45-2-1

Dealing with
Representative
Outliers in Survey
Sampling:
Methodology
Final Report
Robbert Renssen, Marc Smeets, and Sabine Krieg – Statistics Netherlands
March 2002
1
Contents
1 Introduction ..........................................................................................3
2 One-sided censored estimators for i.i.d. sampling ................................5
3 Two-sided censored estimators for i.i.d. sampling .............................11
4 One-sided censored estimators for stratified designs .........................15
5 Use of auxiliary information ................................................................20
6 Level of censoring .............................................................................22
Appendix .................................................................................................23
References ..............................................................................................31
2
Dealing with Representative Outliers
in Survey Sampling
by
Robbert Renssen
Marc Smeets
Sabine Krieg
Abstract
This paper discusses adaptive censoring for both i.i.d. situations and stratified simple random
sampling designs in order to deal with representative outliers. Based on the influence function
approach the design mean squared error of these estimators is calculated and compared with
the design variance of their classical design unbiased counterparts. Roughly, for populations
that have few isolated outliers adaptive censoring plainly gives better results than their
classical counterparts. Adaptive censoring also performs better for populations that are
smoothly skewed. The results are only slightly worse for populations that contain no outliers.
Keywords: Adaptive censoring; Adaptive weighting; Influence function; Finite population
sampling; Resistant estimators.
1. Introduction
The problem considered in this paper is to find a practical way to deal with large observations that
practitioners of sampling theory may encounter in their sampled data. In order to state the problem more
precisely, we follow Chambers (1986) and distinguish between representative and non-representative
outliers. A representative outlier is a sample unit with a value that has been correctly recorded and that
cannot be regarded as unique; the non-sampled part of the population may contain similar units. A nonrepresentative outlier is typically associated with a sample unit whose values are incorrectly measured
(gross errors) or with a sample unit whose values are unique in the sense that there are no other units like
them in the finite population. In this paper, it is assumed that all gross errors are corrected at the editing
stage. At this stage, robust multivariate outlier detection methods as discussed in Beguin and Hulliger
(2001) could be very useful to detect irregularities in the data. Once these irregularities have been
detected, the next step is to decide whether the causing observations have been measured correctly or not.
Ideally, this check is done by re-contacting the supplier of the sampled data. Otherwise, one could consult
a subject matter specialist and/or make use of external sources.
There remains the problem of treating outliers that are correctly measured. Often, if prior
information about such observations is available, they are sampled with inclusion probability one and
considered as non-representative, i.e. they obtain a sample weight of one and are excluded from the
weighting process. If the prior information is lacking, they may pop up as representative outliers with
some inclusion probability smaller than one. Lee (1995) gave an overview of the several proposals that
have been made in literature to treat such outliers. He distinguished between basically three approaches,
namely lowering the (sample) weights of outliers, reducing the outlier values, or make use of robust
estimation techniques. For heavy tailed distributions these proposals will generally decrease the design
variance. For skewed populations, however, they also imply an increase of design bias. Inspired by the
results of Searls (1966), Ernst (1980), and Kokic and Bell (1994) this paper uses the censored estimator as
a starting point to deal with representative outliers in survey sampling.
In case of i.i.d. sampling, Searls (1966) already showed that the mean squared error of the
sample mean can be actually lowered by censoring large observations at some (optimal) cut-off point.
Although the derivation is quite cumbersome, the obtained result is surprisingly simple. As expected, the
3
optimal cut-off point depends on the sample size, namely it increases as the sample size increases. Ernst
(1980) proved that censoring observations at the optimal cut-off point is at least as efficient as trimming
or winsorising observations, i.e. it was shown that the censored estimator with optimal cut-off point has
no larger mean squared error than the one-sided -trimmed mean and the one-sided -winsorised mean
for any . In a certain sense, the results of Searls were extended to stratified designs by Kokic and Bell
(1994). Namely, they examined an optimal winsorised estimator in a stratified context, where the notion
‘optimal’ is defined by minimising a mean squared error criterion as a function of the stratum cut-off
points. They concluded that winsoring outliers may lead to a considerable improvement over the classical
estimator, especially when there is only a small proportion of extreme outliers present in the sample.
For the practitioner of sampling theory the applicability of these results remains limited as the
optimal cut-off points not only depend on the sample size but also on unknown population parameters. In
an ideal situation, these parameters can be accurately estimated from the past. When such information is
lacking, they have to be estimated from the sample itself. Naturally, this changes the statistical behaviour
of the censored estimator. Kokic and Bell (1994) mentioned this possibility, however, they did not
elaborate on this. This paper uses the results of Searls (1966) as a starting point and discusses both for
i.i.d. situations and stratified designs several types of censored estimators with estimated optimal cut-off
points. We emphasis that, according to these estimators, it is quite well possible that observations that are
identified as outliers at the editing stage should not be censored at the estimation stage. Based on the
influence function approach (see e.g. Hampel et al., 1986), the variances and mean squared errors of these
estimators are approximated. Unlike the censored estimator with known optimal cut-off point, the
estimator with estimated cut-off point may be less efficient than the sample mean. In case of simple
random sampling, a comparison of the mean squared error of our (one-sided) censored estimator with the
variances of the sample mean gives some insight under which circumstances the use of censored
estimators with estimated cut-off point may be fruitful. Roughly, samples that contain few isolated
outliers are worthwhile to be censored. We note that this conclusion parallels the conclusion of Kokic and
Bell (1994).
The outline of this paper is as follows. Section 2 deals with one-sided censored estimators in
i.i.d. situations. In Section 2.1 the results of Searls (1966) are briefly repeated and the one-sided censored
estimator with (known) optimal cut-off point is defined. When the optimal cut-off point unknown it may
be estimated from the current sample. This gives an adaptive censored estimator. In Section 2.2 some
statistical properties of this adaptive censored estimator are discussed. Among others, its design bias and
design variance are approximated. In Section 2.3 the results are applied to pps-sampling and in Section
2.4 to simple random sampling without replacement. Some numerical results are given in Section 2.5.
Section 3 deals with two-sided censoring in an i.i.d. situation. The two-sided censored estimator with
(known) left and right cut-off points is defined in Section 3.1, while the adaptive two-sided censored
estimator is discussed in Section 3.2. One-sided censored estimators for stratified design are discussed in
Section 4. Extending the results of Searls (1996) in an obvious way, an ‘optimal’ set of stratum cut-off
points is defined. Similarly as in the i.i.d situation, if this set is estimated by the current sample, we obtain
an adaptive censored estimator for stratified designs. Section 4 is ended by a small simulation study for
both the Neyman allocation and proportional allocation in order to get an insight into the gain of adaptive
censoring under different allocations. Section 5 discusses adaptive censoring when auxiliary information
is used, and finally in Section 6 a strategy is presented to deal with the so-called level of robustification.
The treatment of representative outliers in finite population sampling has been the subject of
several papers. In an i.i.d. situation, Searls (1966) showed the existence of an (optimal) cut-off point such
that the censored estimator has smaller mean squared error than the sample mean. Ernst (1980) extended
this result by showing that this estimator is not only at least as efficient as the sample mean, but also as
any of six competitive estimators that are supposed to reduce the effect of large observations. Hidiroglou
and Srinath (1981) considered various methods for reducing the weights of outliers once they have been
identified. Fuller (1991) investigated the mean squared error of the once-winsorised estimator for the
mean of the Weibull distribution, and Kokic and Bell (1994) discussed an ‘optimal’ winsorised in a
stratified context. Clarke (1995) and Chambers and Kokic (2000) discussed the censored estimator from a
model-based point of view. They suggested to adjust the value (weight) of observations that are
significantly larger than the fitted values implied by some estimation model, where the notion ‘significant
larger’ is defined by means of a threshold parameter that is calculated on past survey information.
4
Chambers (1986), Gwet and Rivest (1992), and Hulliger (1995, 1999) used M-estimation techniques to
define resistant estimators that implicitly take account of outliers. In fact, the MER-estimator defined by
Hulliger (1995) resembles our censored estimator with estimated cut-off point in the sense that the
involved  -function has estimated tuning constant. Hulliger, however, gave no approximation formulas
for the variance of MER-estimators. We finally mention Duchesne (1999), who used M-estimation
techniques to define robust calibration estimators.
2. One-sided censored estimators for i.i.d. sampling
2.1 One-sided censored estimators with fixed cut-off point
Let Y1,...,Yn be n independent identically distributed (i.i.d.) random variables according to some unknown
distribution function F. In particular, in case of simple random sampling with replacement from a finite
population {u1,..., uN } , we have
F (u ) 
N

1
N
yi (u ) ,
(2.1)
i 1
where y1,..., yN denote the values of a target variable, and  yi denotes the distribution function with
point-mass 1 at yi . Our goal is to find an estimator for the population mean   EF (Y ) that is (more)
resistant against representative outliers than the ordinary sample mean. In literature, many resistant
estimators have been proposed to deal with outliers in sampling theory, see e.g. Lee (1995) or Hulliger
(1999). In this paper we limit ourselves to censored type estimators and start with one-sided censored
estimators for i.i.d. sampling
r
Y
(i )
̂t 
 (n  r )t
i 1
,
n
(2.2)
where n  r values larger than t are replaced by t. We note that such one-sided censored estimators are of
particular interest if the population distribution function F is skewed to the right. Let pm  pm (t ) denote
the population fraction of the non-truncated values, m  m (t ) the population mean of the non-truncated
values, qr  qr (t ) the population fraction of the truncated values, r  r (t ) the population mean of the
truncated values, and  m2   m2 (t ) the population variance of the non-truncated values, and define
Y if Yi  t
.
Yi*   i
 t if Yi  t
Then, the variance, bias, and mean squared error of (2.2) can be expressed as
Var ( ˆ t ) 
2
n

2
pm m
 pm qr (  m  t ) 2
,
n
Bias (ˆ t )  EF (ˆ t )    qr (r  t ) ,
MSE ( ˆ t ) 
2
pm m
 pm qr (t   m ) 2
 qr2 (r  t )2 ,
n
respectively, where  2 is the population variance of Yi* . For differentiable distribution function F, Searls
(1966) showed that the cut-off point that minimises the mean squared error satisfies
5
pm
(t   m )  qr (  r  t )  0 .
n
(2.3)
When considering this point as a function of n, we have by the implicit function theorem
t qr (  r  t )

 0.
n
pm  nqr
So, the optimal cut-off of the one-sided censored estimator increases as the sample size n increases. This
makes sense because, by increasing sample size, the bias component of the MSE will more and more
dominate the variance component.
Although the derivation of (2.3) assumes the differentiability of F , (2.3) itself does not depend
on the density function, and it makes sense to use (2.3) to define the cut-off value for finite populations as
well. Below we show that this cut-off value is well defined. Consider h(t )  pm (t  m )  nqr (r  t ) as a
function of t, and observe that h(t ) is continuous, piecewise linear, i.e. it is linear between y(i )  t  y(i 1) ,
2  i  N  1 , where y(1) , y(2) ,..., y( N ) denotes the order statistic of y1,..., yN , and has a crack at each
observation. Figure 1 exhibits a prototypical h(t ) -function with N  9 . The derivative of h(t ) , if exists,
equals pm  nqr  0 from which we conclude that h(t ) is strictly increasing. Note that this derivative
decreases as t increases. Finally, we have h(  )  0 and h( y( N ) )  0 , and so there exists a
unique   t0  y( N ) such that h(t0 )  0 . Let
t[i ] 
n 1 ( y(1)  ...  y(i ) )  ( y(i 1)  ...  y( N ) )
n 1i  ( N  i)
, 2  i  N 1 ,
then, it is easily seen from figure 1 that the optimal cut-off value can be expressed as t0  max (t[ j ] ) . If
j
4
Figure 1
h(t )

2
0
0,00
1,00
2,00
3,00
4,00
5,00
6,00
-2
-4
-6
-8
-10
-12
-14
t
the optimal cut-off point given by (2.3) is known, then the minimised mean squared error of the censored
estimator can be written as
2
n  MSE ( ˆ t )  pm m
 pm qr
nqr
(r  m )2 .
nqr  pm
(2.4)
Indeed, (2.4) is smaller than the variance of the ordinary sample mean:
2
n  Var (ˆ )   2  pm m
 qr r2  pmqr (r  m )2 ,
(2.5)
6
where  r2 denotes the population variance of the truncated values. Unfortunately, in practice the optimal
cut-off point will be unknown and hence, it has to be estimated. Ideally, one may use past surveys to
approximate the unknown distribution and to guess the optimal cut-off point for the given sample size. By
means of a numerical example, Searls (1966) showed that this guess needs not be very accurate for gains
to be achieved.
We emphasis, however, that (2.4) is no longer valid for sub-optimal t, and that a theoretical
foundation of this claim is lacking. Especially, when t is accidentally chosen too small, we expect that the
reduction of variance is largely exceeded by an increase of bias. Therefore, as an alternative, we suggest
to estimate t from the sample itself and to define a censored estimator with estimated cut-off point.
Naturally, we have to re-consider the mean squared error, as (2.4) is derived for fixed t.
2.2 Adaptive censoring; one-sided censored estimators with estimated cut-off point
If past information is lacking, one may estimate the optimal cut-off point (and the other population
parameters) from the sample:
tˆ0  max (t[ j ] )  max
j
n 1 (Y(1)  ...  Y( j ) )  (Y( j 1)  ...  Y( n) )
n 1 j  (n  j )
j

pˆ mYm  nqˆrYr
,
pˆ m  nqˆr
where
pˆ m 
r
, Ym  r 1
n
r

Y(i ) , qˆ r 
i 1
nr
, and Yr  (n  r ) 1
n
n
Y
(i )
,
i  r 1
with r the number of observations smaller than tˆ0 .This gives the following adaptive one-sided censored
estimator:
ˆ tˆ  pˆ mYm  qˆr tˆ0 
0
1  nqˆr
nqˆr
pˆ mYm 
qˆrYr .
pˆ m  nqˆr
pˆ m  nqˆr
(2.6)
We note that Hulliger (1995) suggested a similar idea in a M-estimation context. His idea was to take an
M-estimator, to estimate the mean squared error from the sample for a set of tuning constants, and to
choose the tuning constant with least estimated mean squared error. He called the resulting estimators
Minimum Estimated Risk estimators (MER-estimators).
It is easy to see that tˆ0 is location and scale equivariant, from which it follows that p̂m and q̂r
are location and scale invariant and subsequently ˆ tˆ0 is location and scale equivariant (compare (3.6)). If
the traditional sample mean is decomposed as
ˆ  pˆ mYm  qˆ r Yr
(2.7)
we observe that (2.6) actually up-weights (down-weights) inliers (outliers), where the set of inliers
(outliers) is determined by the estimated optimal cut-off point.
In the following we show under which circumstances (2.6) has a smaller mean squared error than
the ordinary sample mean. To that purpose we first calculate its influence function:

( y   t 0 )  qr


IF ( tˆ ; F )  
0
(t0  t )  qr
0


( y  t0 )
pm  nqr
n( y  t0 )
pm  nqr
if y  t0
if y  t0
7
p  nqr  1
 1  nqr
( y   m )  qr ( m
)( m  t0 ) if y  t0

pm  nqr
 pm  nqr

p  nqr  1
nqr

( y   r )  pm ( m
)( m  t0 ) if y  t0
p

nq
pm  nqr

r
 m
(2.8)
see appendix Ia. This influence function is continuous and consists of two linear pieces with a crack at
y  t0 . It is unbounded, which is due to the fact that all observations are used to estimate the optimal cutoff value. By means of (2.8) we can approximate the variance of ˆ tˆ :
0
 1  nqr
nVar (ˆ tˆ )  pm 
0
 pm  nqr
2
 2
 nqr
  m  qr 

 p  nq
r

 m

qr
 pm 1 
p
m  nqr

2
 2
 1  pm  nqr
  r  pm qr 

 p  nq
m
r


2
 2

pm
  m  qr 1 


p
m  nqr






2
 nqr

 p  nq
r
 m
2

 (r  m )2


2
2
2

 2
pm
 nqr2 
  r  pm qr 1 
(r  m ) 2 .

2

( pm  nqr ) 


A term-wise comparison with the variance of the ordinary sample mean that is given by (2.4) shows that
the first term is larger, the second term smaller, and the third term smaller (larger) if nqr2  pm2
( nqr2  pm2 ). Noting that for positively skewed populations,  m2 is relatively small compared to
2
(r  m )2 , we suggest on account of the third term to use nqr2  pm
 qr  ( n  1) (n  1) as a rule of
thumb to expect variance reduction. To obtain the mean squared error, we need to incorporate the squared
bias:
n  Bias 2 ( ˆ tˆ ) 
0
2 2
npm
qr
( pm  nqr ) 2
(r  m )2 .
If this squared bias is added to the (third term of) variance, then, again on account of the third term, we
expect gain, i.e. a reduction in mean squared error, if
2
2
2
2 
2
2 



npm qr
pm
 nqr2 
1  pm  nqr  
1  nqr  nqr  .
 1 

1

 ( p  nq )2 
 ( p  nq ) 2 

( pm  nqr ) 2 
( pm  nqr ) 2
m
r
m
r





(2.9)
The fulfilment of (2.9) depends on the sample size in combination with the shape of the (finite)
population. If the population contains a few number of isolated outliers to the right and the sample size is
not too large, then (2.9) is probably satisfied. In particular, (2.9) is satisfied if pm2  nqr , however this rule
of thumb probably is too crude.
2.3 Application to PPS-sampling
In sample surveys, many samples are drawn according to some complex design. In order to study the
impact of unequal probabilities on our one-sided censoring strategy we devote a separate sub-section to
pps-sampling. Let y1,..., yn denote n independent observations from some finite population y1,..., y N and
suppose that these observations are not made at random, but with drawing probabilities pi . Then, each
observation yi leads to a random variable with probability distribution
N
G (u ) 
p
i yi
(u ) .
i 1
Now, if the population mean of the y-values is still the parameter of interest, we can adapt our censoring
strategy by replacing the population values yi by zi  yi pi and the corresponding distribution function
(2.1) by
8
N
G (u ) 
p
i zi
(u ) .
i 1
All parameters, like pm , m ,  m2 , qr , r ,and  r2 should now be defined in terms of G. Among others,
this implies that the optimal cut-off point is the solution of pm (t   m )  nqr ( r  t ) , where pm ,  m , qr ,
and  r are defined as
t

pm 
dG (u ) ,  m 

1
pm

t


zdG( z ) , qr  dG (u ) , and  r 

t
1
qr

 zdG( z)
t
respectively. We note that this cut-off point can be alternatively defined as
t  max (t[ j ] ) , 2  i  N  1 ,
j
with
t[ j ] 
n 1 ( y(1)  ...  y( j ) )  ( y( j 1)  ...  y( N ) )
n 1 ( Np(1)  ...  Np( j ) )  ( Np( j 1)  ...  Np( N ) )
,
where the order in both the numerator and denominator is determined by the order of z(1) ,..., z( N ) . For
example, y( N ) is not necessarily the largest yi -value, but corresponds to the largest zi -value. If this
optimal cut-off point is unknown we may estimate it from the sample
tˆ  max (t[ j ] ) , 2  i  n  1
j
with
tˆ[ j ] 
n 1 ( z(1)  ...  z( j ) )  ( z( j 1)  ...  z( n) )
n 1n( j )  (n  n( j ) )
.
It is well known that pps-sampling is efficient as the pi ’s are proportional to the yi ’s. In
comparison with simple random sampling, such efficient designs reduce the gain of censoring. First, as
large outlying yi -values are divided by large pi -values, the outlyingness of their corresponding zi values might be softened somewhat. Second, on the average, an efficient design causes an
overrepresentation of large yi -values ( zi -values) in the sample, which gives a relatively large optimal
cut-off point.
2.4. Adaptation to SRWHR-Sampling
In case of simple random sampling without replacement, one may simply adapt (2.5) to take account of
the finite population correction factor: ˆ ftˆ  fˆ  (1  f )ˆtˆ , where f  n / N . Note that the influence
0
0
function of this estimator equals IF (  ftˆ0 ; F )  f  IF (  : F )  (1  f )  IF ( tˆ0 ; F ) , where IF (  ; F )  y  
and IF ( tˆ0 ; F ) is given by (2.7). By means of this influence function, we can approximate the variance of
this adapted estimator:
n
Var ( ˆ ftˆ ) 
0
(1  f )

(1  f )qr
pm 1 
pm  nqr

2
2
2
2

 2

(1  f ) pm  2
(1  f )( pm
 nqr2 ) 
  m  qr 1 
  r  pm qr 1 
(r  m )2 .


2


pm  nqr 
(
p

nq
)


m
r


9
As we see, in each term the effect of censoring is softened by the factor 1-f. On account of the third term,
we still expect variance reduction if qr  ( n  1) (n  1) . Adding
n
npt2 qt2
Bias 2 ( ˆ acen )  (1  f )
( t  t  ) 2
(1  f )
( pt  nqt ) 2
to the (third term of) variance, we obtain an expression for the mean squared error. On account of the
third term of this expression, we expect a reduction in this mean squared error if
2
2
2
2 
2
2 
2 



1  (1  f )( pm  nqr )   (1  f )npm qr  1  1  (1  f )( pm  nqr )   1  (1  f )(nqr  nqr )  .



( pm  nqr ) 2 
( pm  nqr ) 2
( pm  nqr ) 2 
( pm  nqr ) 2 



Clearly, this inequality is satisfied under the same conditions as (2.9), i.e. we may expect a reduction in
mean squared error under exactly the same conditions that apply for the one-sided censored estimators
that are defined for simple random sampling with replacement.
2.5. Some numerical results
In order to illustrate the behaviour of our one-sided censored with estimated cut-off point numerically, we
have calculated its variance and mean squared errors in some specific situations and compared it with the
variance of the ordinary sample mean. The data sets stem from a production survey. The variable of
interest is total turnover. We distinguish three different size classes and consider each size class as a
population. The first size class consists of 788 units. Its population mean is approximately 358 (thousand
guilders). Furthermore its maximum value is 5600, which might be considered as an outlier. The second
and third largest values are approximately 3000 and it is unclear whether they should be considered as an
outlier or not. The second size class consists of 774 units, its population mean is 1353, and it has one
obvious outlier with a value of 17000. Finally, the third size class consists of 544 units, it has a mean of
1488, and has no clear (isolated) outliers. Its largest value is approximately 5600. For each of these
populations we have calculated the variance and mean squared error of our one-sided censored estimator
with estimated cut-off point in a simple random sampling without replacement setting with sample sizes
of n = 20, 50, 100, 200, 500, and 1000. To that purpose we have used the influence function approach.
The results are given in table 2.5.1.
As expected, the largest gain of our estimator is achieved in the second size class as this class
contains one clear outlier. The influence of this outlier with respect to the population mean remains
limited as the population size is rather large, however if this outlier is accidentally sampled it may have a
rather large impact on the sample mean. Naturally, the influence on the sample mean becomes smaller as
the sample size increases. For sufficiently large sample sizes, say n = 1000, the effect of adaptive
censoring becomes negligible. In the third size class the variances of the sample mean and the mean
squared errors of our one-sided censoring estimator are more or less the same for all sample sizes. There
is neither much gain nor much loss if there are no obvious outliers. The first size class, which can be
considered as ‘smoothly’ skewed population to the right, shows some gain in favour of our one-sided
censored estimator.
Table 2.5.1:
Sample size Size class 1: Mean = 358
V ( y ) - M ( ytˆ ) - B( ytˆ )
Size class 2; Mean = 1353
V ( y ) - M ( ytˆ ) - B( ytˆ )
Size class 3; Mean = 1488
V ( y ) - M ( ytˆ ) - B( ytˆ )
20
50
100
200
500
1000
34687
13875
6937
3469
1387
694
15048
6019
3009
1504
602
301
6213
2485
1242
621
248
124
4512
1952
1024
563
241
120
26
16
11
8
5
3
27967
9731
4802
2351
1156
662
47
27
21
17
13
10
14825
5771
2921
1478
608
304
37
21
14
9
5
3
10
Especially for small sample sizes, say n  20, the gain of censoring seems to be considerable.
However, it should be noted that this gain is somewhat deceptive as for very small sample sizes even the
estimator with the smallest mean squared error may have mean squared error that is probably too large for
publication purposes.
Till now, we have considered populations that are skewed to the right. We note that the
performance of our one-sided censored estimator compared to that of the sample mean is worse for
populations that also contain outliers to the left. Therefore, we consider in the next section two-sided
censored estimators, which should be able to handle outliers on both sides.
3. Two-sided censored estimators for i.i.d. sampling
3.1. Two-sided censored estimators with fixed cut-off point
Let Y1,...,Yn be n independent identically distributed random variables according to some unknown
distribution function F. For given s and t, satisfying s  t , we define a two-sided censored estimator by
replacing all y-values that are smaller than s by the value s and all y-values larger than t by the value t:
r
y nsn t
i
ys ,t 
l
i 1
n
r
,
where r denotes the number of non-truncated values and nl ( nr ) the number of sample values that are
truncated to the left (right). Similarly as the one-sided censored estimator we can derive its variance:
V ( y s ,t ) 
pm 2
qq
[ m  ql (  m  s ) 2  qr (t   m ) 2 ]  l r (t  s ) 2 ,
n
n
where pm  pm (s, t ) is the population fraction of non-truncated values, m  m (s, t ) the population mean
of the non-truncated values,  m2   m2 (s, t ) the population variance of the non-truncated values,
ql  ql (s) the population fraction of the truncated values to the left, qr  qr (t ) the population fraction of
the truncated values to the right, see Smeets (2001). If we additionally denote l  l (s) and r  r (t ) as
the population means of the values that are truncated to the left and right respectively, then its bias is
easily seen to be
B( ys,t )  ql (s  l )  qr (r  t ) .
Similarly as in the one-sided case, one could consider the mean squared error as a function of s and t, and
define the optimal cut-off points by minimising this function. However, by taking s  t   we obtain a
zero bias and a zero variance, and we conclude that the optimal cut-off values in the two-sided case, i.e.
the values that minimise the mean squared error, offer us nothing to go on. Instead we suggest to
minimise
Q( s, t ) 
pm 2
qq
[ m  ql (  m  s ) 2  qr (t   m ) 2 ]  l r (t  s ) 2 + ql2 (s  l )2  qr2 (r  t )2
n
n
as a function of s and t. Note that the cross-product of the squared bias is deleted in this minimisation
function in order to avoid the aforementioned trivial solution. By taking the partial derivatives with
respect to s and t and setting these derivatives at zero, we obtain that the optimal solution satisfies the
following two equations:
11
pm
q
(  m  s )  r (t  s )  ql ( s  l )  0
n
n
(3.1)
pm
q
(t   m )  l (t  s )  qr (  r  t )  0
n
n
(3.2)

A comparison with (2.3) shows that (3.1) defines the (one-sided) optimal cut-off value at the left, where,
for fixed t, all values larger than t have been replaced by t, while (3.2) defines the (one-sided) optimal cutoff value at the right, where, for fixed s, all values smaller than s have been replaced by s. Clearly, the
simultaneous solution of (3.1) and (3.2) defines the two-sided optimal cut-off values.
In the following we will show that these cut-off values are well defined, that is, we will show
that there exists a unique solution of (3.1) and (3.2). Let
F1 ( s, t )  
F2 ( s, t ) 
pm
q
(  m  s )  r (t  s )  ql ( s  l )
n
n
pm
q
(t   m )  l (t  s )  qr (  r  t )
n
n
and define the map (s, t )  (F1(s, t ), F2 (s, t )) on the region ( ,  ]  [  , ) . If the underlying distribution
function F is differentiable, then the Jacobian of this map equals
 pm qr

 ql

n
J ( s, t )   n
q

 l

n

qr


n
.
pm ql
  qr 
n
n


It is easily seen that this Jacobian is a P-matrix1, so the map is one-to-one (see Gale and Nikaido, 1965). It
follows that the solution to (3.1) and (3.2), if it exists, is unique.
For finite populations, however, the differentiability condition of F is not met, and consequently
the result is not applicable. Instead, we proceed as follows. Similar as in the one-sided case it holds that,
for each fixed t   , F1(s, t ) is continuous and monotone increasing. Furthermore, for each fixed t we
have that F1(s, t )  0 as s approaches the lower limit of the distribution and F1(s, t )  0 as s   . The latter
assertion follows from the following series of (in-)equalities:
pm
q
(  m   )  r (t   )  ql (   l )
n
n
pm
q
(  m   )  r (  r   )  ql (   l )
> 
n
n
ql
( l   )  ql (   l ) > 0.
=
n
F1 (  , t )  
It follows that, for each fixed t , the first component has a (unique) zero at some s (t )   , that depends
continuously on t . Next, consider the second component F2 (s(t ), t ) as a function t , and note that also this
function is piecewise linear and continuous. Its derivative, if exists, equals
p
q
F2 ( s (t ), t )
s
s
 J 21( s, t )  J 22 ( s, t )  m  l (1  )  qr ,
t
t
n
n
t
(3.3)
where, according to the implicit function theorem,
1
A pp matrix A is said to be a P-matrix if all its principal minors are positive.
12
s
J ( s, t )
qr
  12

 1.
t
J11( s, t ) pm  qr  nql
It follows that (3.3) is strictly positive and so F2 (s(t ), t ) is also monotone increasing. For t   we have
pm
q
(    m )  t (   s (  ))  qr (  r   )
n
n
pm
q
(    m )  t (    l )  qr (  r   )
<
n
n
qr
=  (    r )  qr (  r   ) < 0,
n
F2 ( s (t ), t ) 
and for t approaching the upper limit of the distribution we have that F2 (s(t ), t )  0 . We conclude that
there exists a unique (s0 , t0 )  ( y(1) ,  ]  [ , y( N ) ) such that (1) and (2) are zero.
3.2. Adaptive censoring; two-sided censored estimators with estimated cut-off points
Until now, we have discussed the two-sided censored estimator with fixed cut-off points satisfying some
optimality condition. Similarly as in the one-sided case, in many practical situations, these cut-off points
are unknown. Therefore we suggest to estimate them from the sample, that is, (s0 , t0 ) is estimated by
solving

pˆ m
qˆ
( ym  s )  r (t  s )  qˆl ( s  yl )  0
n
n
(3.4)
pˆ m
qˆ
(t  ym )  t (t  s )  qˆ r ( yr  t )  0
n
n
(3.5)
for (s, t ) , where pˆ m  pˆ m (s, t ) is the sample fraction of non-truncated values, ym  ym (s, t ) the sample
mean of the non-truncated vales, qˆl  qˆl ( s) the sample fraction of the truncated values to the left, and
qˆr  qˆr (t ) the sample fraction of the truncated values to the right. We denote this estimate by ( sˆ0 , tˆ0 ) .
Naturally, the existence and uniqueness of ( sˆ0 , tˆ0 ) can be shown similarly as the existence and
uniqueness of (s0 , t0 ) . For the sake of brevity, if we denote
 pˆ m qˆ r

 qˆl

n
Ĵ ( s, t )   n
qˆ

 l

n

qˆ r




n
 and b̂( s, t )  
ˆpm qˆl



 qˆ r 

n
n



pˆ m

ym  qˆl yl 
n
,
pˆ m

ym  qˆ r yˆ r 
n

then ( sˆ0 , tˆ0 ) is the unique value that satisfies ( s, t )t  Jˆ 1 ( s, t )bˆ ( s, t ) . In the following we shortly discuss a
practical method to find ( sˆ0 , tˆ0 ) . First, observe that the domain of ( s , t ) can be divided into a finite number
of rectangular sub-domains, say S j  ( y(1) , y ]  [ y, y(n) ) , j  1,..., K , such that Jˆ ( s, t ) and bˆ ( s, t ) are
constant within S j . An example of such a sub-domain is S1  ( y(1) , y(2) ]  [ y(n 1) , y(n) ) . For each subdomain, we may calculate Jˆ 1 ( s, t )bˆ ( s, t ) . Denote these outcomes by Jˆ j 1bˆ j , j  1,..., K . Obviously, ( sˆ0 , tˆ0 )
corresponds to one of those outcomes. However, since ( sˆ0 , tˆ0 ) is unique, this is the only outcome that
satisfies Jˆ j 1bˆ j  S j . So, ( sˆ0 , tˆ0 ) can be found by checking on this condition. In doing so, we propose to
start checking on ( y(i 1) , y(i  2) ]  [ y(n  j 1) , y(n  j ) ) with (i, j )  (0,0) , and then gradually increase i and j .
For example, the next series of candidates could be (0,1) , (1,1) , and (1,0) ; then (0,2) , (1,2) , (2,2) , ( 2,1) ,
13
and (2,0) ; etc.. If there are nl ( nr ) observations that should be truncated to the left (right), then it suffices
to check no more than max[ nl2 , nr2 ] sub-domains.
The location equivariance of ( sˆ0 , tˆ0 ) can be demonstrated as follows. Denote zi  yi  a for
some constant a . Given ( s , t ) , let zl  yl  a , zr  yr  a , and zm  ym  a be the means of the z-values
that are truncated to the left, the z-values that are truncated to the right, and the non-truncated z-values,
respectively. Furthermore, let wˆ l  wˆ l (s) , wˆ r  wˆ r (t ) , and vˆm  vˆm ( s, t ) , denote the fractions of the zvalues that are truncated to the left, the z-values that are truncated to the right, and the non-truncated zvalues, respectively, and note that wˆ l ( s  a)  qˆl (s) , wˆ r (t  a)  wˆ r (t ) , and vˆm (s  a, t  a)  pˆ m (s, t ) . By
means of these considerations, it is readily seen that, if ( sˆ0 , tˆ0 ) is the unique solution to (3.4) and (3.5),
then ( sˆ0  a, tˆ0  a) is the unique solution to (3.4) and (3.5), where q̂l , q̂r , p̂m , yl , yr , and ym are
replaced by v̂l , v̂ r , ŵm , zl , z r , and zm , respectively. Similarly, we can show the scale equivariance of
( sˆ0 , tˆ0 ) .
Now, by means of ( sˆ0 , tˆ0 ) we define our two-sided censored estimator with estimated cut-off
point as
ysˆ
ˆ
`0 , t 0
 qˆl sˆ0  pˆ m ym  qˆ r tˆ0 ,
(3.6)
where the parameters q̂l , q̂r , p̂m , and ym are determined by ( sˆ0 , tˆ0 ) , i.e. qˆl  qˆl ( sˆ0 , tˆ0 ) , qˆ r  qˆ r ( sˆ0 , tˆ0 ) ,
pˆ m  pˆ m ( sˆ0 , tˆ0 ) , and ym  ym ( sˆ0 , tˆ0 ) . The location equivariance of (3.6) follows from the following series
of observations. First, if a constant value a is added to each observation, then, since ( sˆ0 , tˆ0 ) is location
equivariant, ( sˆ0  a, tˆ0  a) is the new estimated optimal cut-off point. Second, at this new cut-off point
we have vˆl (sˆ0  a)  qˆl (sˆ0 ) , vˆr (tˆ0  a)  qˆ r (tˆ0 ) , and wˆ m ( sˆ0  a, tˆ0  a)  pˆ m ( sˆ0 , tˆ0 ) , i.e. the fractions of lefttruncated, right-truncated, and non-truncated observations remain unaltered. Third, since the fraction of
non-truncated observations remains unaltered, it holds that zm ( sˆt  a, tˆ0  a)  ym ( sˆ0 , tˆ0 )  a . We conclude
that each term on the right-hands-side of (4.6) is increased by a, and so (4.6) is location equivariant.
Again, the scale equivariance can be demonstrated similarly. If (sˆ0 , tˆ0 )t  Jˆ 1(sˆ0 , tˆ0 )bˆ (sˆ0 , tˆ0 ) is elaborated
as
tˆ0 
[(
pˆ m
1
qˆ
ym  qˆl yl )(  qˆ r )  r (qˆ r yr  qˆl yl )]
n
n
n
(3.7)
[(
pˆ m
qˆ
1
ym  qˆ r yr )(  qˆl )  l (qˆ r yr  qˆl yl )] ,
n
n
n
(3.8)
1
sˆ0 
det[Jˆ ( sˆ0 , tˆ0 )]
1
det[Jˆ ( sˆ0 , tˆ0 )]
where det[Jˆ (sˆ0 , tˆ0 )] denotes the determinant of Jˆ (sˆ0 , tˆ0 ) , and (3.7) and (3.8) are substituted into (3.6), we
obtain
ysˆ
ˆ
`0 ,t 0

1
det[Jˆ ]
qˆl [(
pˆ m
pˆ
1
1
1
ym  qˆl yl )(  qˆr )]  pˆ m ym 
qˆr [( m ym  qˆr yr )(  qˆl )] ,
n
n
n
n
det[Jˆ ]
which can be further elaborated as
ysˆ
ˆ
`0 ,t 0

1
det[Jˆ ]
1
1
1
1
[qˆl2 (  qˆr ) yl  pˆ m (  qˆr )(  qˆl ) ym  qˆr2 (  qˆl ) yr ]
n
n
n
n
 ˆl qˆl yl  ˆm pˆ m ym  ˆr qˆr yr .
(3.9)
14
This last expression facilitates a comparison with the ordinary sample mean: y  qˆl yl  pˆ m ym  qˆr yr .
Elaborating the determinant of Ĵ as
pˆ m qˆl
pˆ
qˆ qˆ
qˆ

 qˆ r )( m  r  qˆl )  l r
n
n
n
n
n n
qˆl 1
1
1
qˆ r 1
 (  qˆ r )(  qˆl )  (  qˆ r ) 
(  qˆl )
n
n
n n
n n
1
1
1
1
 pˆ m (  qˆ r )(  qˆl )  qˆl2 (  qˆ r )  qˆ r2 (  qˆl ) ,
n
n
n
n
det[Jˆ ]  (
(3.10)
(3.11)
and substituting (3.11) into (3.9) gives a formal expression of (3.9) as a weighted mean with weights
depending on the sample. By (3.10) we have ˆm  1 implying that the non-truncated values are upweighted. If qˆr  qˆl  pˆ m , a little algebra shows that both ˆl  1 and ˆr  1 , and so the truncated values
are down-weighted. If accidentally qˆl  qˆ r  qˆ , then (3.9) simplifies considerably:
y sˆ
ˆ
0 ,t 0
 qˆ (
nqˆ
1  nqˆ
nqˆ
) yl  pˆ m (
) ym  qˆ (
) yr .
pˆ m  nqˆ
pˆ m  nqˆ
pˆ m  nqˆ
(3.12)
This last expression resembles very much our censored estimator for the one-sided case; compare (2.6).
We close this section by giving the influence function of the functional form of (3.6) or (3.9):
IF (  sˆ
ˆ
0 ,t 0 )

1  y  s0 
y  s0
 ( s0   s0 ,t 0 )  (ql qr )Q A l 
 if
 y  t0 

 y  s0 

 if s0  y  t0
 ( y   s0 ,t 0 )  (ql qr )Q 1A m 
 y  t0 


 y  s0 
1
 if
y  t0
 (t0   s0 ,t 0 )  (ql qr )Q A r 

 y  t0 
see appendix Ib, where  s0 , t 0  ql s0  pm m  qr t0 is the two-sided censored population mean, ql  ql (s0 , t0 )
and qr  qr (s0 , t0 ) are the truncated population fractions, Q  J(s0 , t0 ) , and
 n 1 0 
 1  n 1 
0 
 1
 , and A r   n
.
Al   1 1  , A m  

1


 0
n 
1 
 n
 0 n 

By the derivation of this influence function, it is tacitly assumed that the underlying distribution function
F is differentiable. This influence function typically consists of three linear pieces with a crack at both
y  s0 and y  t0 . The continuity of the influence function at these points is easily verified. It gives an
heuristic tool to approximate the variance of (3.6) or (3.9). By estimating this influence function, we
obtain a heuristic tool to estimate this variance. We will not elaborate hereon.
4. One-sided censored estimators for stratified designs
In stratified sampling designs the population is divided into, say L , mutually exclusive sub-populations
(strata) of N1, N2 ,..., N L units respectively. The data consist of separate samples from each stratum and the
sample sizes within the strata are denoted by n1, n2 ,..., nL respectively. Cochran (1977, page 89) gave a
number of reasons to apply stratified designs, including the following one. Stratification may increase the
efficiency of estimates of characteristics of the whole population. It may be possible to derive a
heterogeneous population into strata, each of which is internally homogeneous. If each stratum is
15
homogeneous, a precise estimate can be made of any stratum mean, which can be combined into a precise
estimate for the whole population. As is well-known, in comparison with simple random sampling, the
efficiency gain of such an estimator is strongly related to differences in the population stratum means.
In view of treating outliers in case of stratified designs, one could apply a censored estimator for
each stratum separately, and then combine these estimators to obtain a censored estimator for the whole
population. However, the accuracy of this combined estimator may be unnecessarily low, as, in the end,
too many observations may be censored. To overcome this problem, an obvious alternative is to combine
the separate samples and to replace all outlying observations by some cut-off point that is determined by
means of the combined sample. For example, in case of proportional allocation this combined sample
corresponds to the so-called self-weighting sample. In case of disproportional allocation one could weight
each observation in the combined sample to compensate for differences in sampling fractions. This
alternative is unattractive too, as it disregards any differences in the population stratum means. The
problem of treating outliers in case of stratified designs is to identify influential observations at the level
of the whole population, taking into account the sub-division of this population into (homogeneous)
strata.
Before we give our approach, we shortly discuss a third alternative. Given a preliminary robust
estimate of each stratum mean, one may define residuals per stratum and apply the censored estimator on
these residuals. This gives a cut-off point on the level of the residuals. If this cut-off point is added to
each of the preliminary estimates we obtain a cut-off point on the level of the original observations for
each stratum. Based on these cut-off points one may define a censored estimate per stratum and
subsequently combine these estimates to obtain a censored estimate for the whole population. We note
that this estimator resembles robust post-stratification, which will be discussed as a special case of the
robustified regression estimator in the next section. Intuitively, it also resembles the estimator to be
discussed below. However, the latter estimator does not need preliminary estimates.
Let Yh1 ,..., Yhnh be an i.i.d. sample from the h-th stratum. Let t  (t1,..., t L )t be fixed and let rh be
the number of observations in the h-th stratum that are smaller than th , h  1,..., L . Our approach is 1) to
define
rh
ˆt 
L

h 1
Y
 ( nh  rh )t h
h( j )
Nh
N
j 1
,
nh
(4.1)
as a one-sided censored estimator for stratified designs, 2) to take the derivative of the mean squared error
of (4.1) with respect to t , 3) to define a cut-off point t cut by means of this derivative, 4) to estimate this
cut-off point, and 5) to insert this estimated cut-off point into (4.1) to obtain a censored estimator with
estimated cut-off point for stratified designs. Now, for fixed t , the variance and bias of (4.1) equal
Var (ˆt ) 
L
( N )
Nh
2
h 1
2
phm hm
 phmqhr (  hm  t h ) 2
nh
(4.2)
and
Bias (ˆt )  
L
Nq
Nh
hr (  hr
 th )
(4.3)
h 1
respectively. Hence the mean squared error of (4.1) can be expressed as
MSE (ˆt ) 
2
L

N
p  2  phmqhr (  hm  th ) 2 
Nh
( h ) 2 hm hm

qhr (  hr  th )  ,


N
nh
N
h 1
 h 1

L


(4.4)
16
where phm  phm (t) is the h-th stratum population fraction of non-truncated values, hm  hm (t) the h-th
2
2
stratum population mean of the non-truncated values,  hm
  hm
(t) the h-th stratum population variance
of the non-truncated values, qhr  qhr (t) the h-th stratum population fraction of the truncated values, and
hr  hr (t) the h-stratum population mean of the truncated values.
By differentiating MSE(ˆt ) with respect to t we obtain after a few calculations that the optimal
cut-off value satisfies the following system of equations:
2W1q1r [
W1 p1m
(t1  1m ) 
n1
L
W q
h hr (  hr
 t h )]  0
h 1
.
2WL q Lr [
WL p Lm
(t L   Lm ) 
nL
L
W q
h hr (  hr
 t h )]  0 ,
h 1
with Wh  Nh / N , h  1,..., L . We note that, if the stratum finite population correction factors are
neglected, this system of equations resembles very much the system of equations of Kokic and Bell
(1994), which was derived in a slightly different context. Denote Ym  (1m ,...,  Lm )t , Yr  (1r ,...,  Lr )t ,
Pm  diag (W1 p1m ,..., WL pLm) , Qr  diag (W1q1r ,...,WLqLr ) , qr  (W1q1r ,...,WL qLr )t , and Dn  diag (n1,..., nL ) ,
then our system can be written for short as
2Qr [Dn1Pm (t  Ym )  lqtr (Yr  t)]  0 .
Unlike the one- and two-sided cases in i.i.d. situations, some qhr (not all) may be zero. Ignoring the fact
that Q r may be singular, we define our cut-off point t cut as the solution of
Dn1Pm (t  Ym )  lqtr (Yr  t)  0 .
(4.5)
As we will show below, this cut-off point gives a mean squared error that is smaller than the variance of
the ordinary stratified estimator. First we elaborate a bit on (4.5).
For differentiable stratum distribution functions, it can be shown that the Jacobian of the map
t  F(t)  Dn1Pm (t)[t  Ym (t)]  lqtr (t)[Yr (t)  t] is equal to J(t)  Dn1Pm (t)  lqtr (t) . Let
 D1P (t) l 
,
A(t)   n t m
 q (t)
 1
r

and observe that the determinant of A = J (t ) , see e.g. Rao (1973, problem 4.4). However, if Pm (t) is
non-singular, A can also be written as Dn1Pm (t) 1  qtr (t)Pm1 (t)Dnl . It follows that
1
J (t ) = Dn1Pm (t ) 1  qtr (t )Pm
(t)Dnl =
L

h 1
Wh phm (t ) 
1 
nh


L

h 1
nh qhm (t ) 
.
phm (t ) 

Provided that all phm (t)  0 we conclude that J (t ) > 0. Noting that this reasoning is applicable for each
of the principal minors of J (t ) we have shown that J (t ) is a P-matrix on {t : phm (t)  0, h  1,...L} . This
suggest that the solution to (4.5), if exists, is unique.
17
Provided that Pm (t) is non-singular, it is interesting to note that the inverse of J (t ) can be
written as (compare Bartlett’s identity)
J 1 (t )  Pm1 (t )D n 
1
1  q tr (t )Pm1 (t )D n l
Pm1 (t )D n lqtr (t )Pm1 (t )D n .
(4.6)
In the following of this section we frequently need an expression of the form qtr (t)J 1 (t) . By (4.6) we
have
1
q tr (t )Pm1 (t )D n
1 
qtr (t)J 1 (t) =
(4.7)
with
L

  q tr (t )Pm1 (t )D n l 
h 1
nh qhr (t )
.
phm (t )
For example (4.7) is useful to find an expression for the mean squared error at the cut-off point defined by
(4.5). First note that this cut-off value can be written as t cut  J 1[Dn1Pm Ym  lqtr Yr ] , where the involved
matrices, J , Pm , Q r , Ym , and Yr are evaluated at t  t cut . Inserting this value into (4.2) and using (4.7)
we obtain the following expression for the bias
Bias (ˆt cut )  
H
Nq
Nh
hr (  hr
 t h )  q tr ( Yr  t cut ) 
h 1
1 t
q r ( Yr  Ym )
1 
Furthermore, inserting t cut into (4.3) and using (4.7) we obtain
Var (ˆt cut ) 
L
( N )
Nh
2
h 1
L

2
N h 2 phm hm
)
 ( Ym  t cut )t Pm D n1Q r ( Ym  t cut ) =
N
nh
(
h 1
L
2
phm hm
 phmqhr (  hm  t h ) 2
=
nh
( N )
Nh
2
phm hm


( Yr  Ym )t q r q tr ( Yr  Ym ) .
nh
(1   ) 2
2
h 1
It follows that the mean squared error at t  t cut can be written as
MSE (ˆt cut ) 
H
( N )
Nh
h 1
2
2
phm hm
1

( Yr  Ym )t q r q tr ( Yr  Ym )
nh
1 
(4.8)
Decomposing the variance of the ordinary stratified estimator as
H
Var ( y st ) 
( N )
Nh
2
2
2
phm hm
 qhr hr
 phmqhr (  hr   hm ) 2
nh
h 1
H

( N )
h 1
Nh
2
2
phm hm

nh
H
( N )
h 1
Nh
2
2
qhr hr
 ( Yr  Ym )t Pm D n1Q r ( Yr  Ym )
nh
(4.9)
and subtracting (4.8) from (4.9) gives
18
Var ( y st )  MSE (ˆt cut ) 
H

(
h 1
2
N h 2 qhr hr
q qt
)
 ( Yr  Ym )t [Pm D n1Q r  r r ]( Yr  Ym ) .
N
nh
1 
(4.10)
The right-hand-side of (4.10) clearly is larger than or equal to zero if we can show that the quadratic form
is semi-definite positive. Now, we only show that Pm Dn1Q r  (1   ) 1q r qtr  0 , noting that the other
principal minors have identical mathematical forms. To that purpose we consider the determinant of
 P D1Q
B   m nt r

qr

(1   ) 1q r 
.

1

Again, by using the theory of partitioned matrices (Rao, 1973, problem 2.4), we have
B  Pm Dn1Q r  (1   ) 1q r qtr  Pm Dn1Q r 1 
qtr Q r 1Dn Pm1q r
 Pm Dn1Q r (1   ) 1  0
1 
giving the result. It is interesting to note that (4.10) suggests that the gain of censoring depends on the
homogeneity of the strata. The more homogenous the strata are, i.e. the more efficient the stratified design
is, the less will be the effect of censoring.
The cut-off point as defined by (4.5) depends on unknown population parameters. The last step
of our approach is to estimate this value by the sample. In an obvious notation, the estimated cut-off value
satisfies
ˆ y  lqˆ t y ] ,
tˆcut  Jˆ 1[Dn1P
m m
r r
where Ĵ , P̂m , Q̂ r , y m , and y r are the sample analogues of J , Pm , Q r , Ym , and Yr evaluated at t̂ cut .
For stratified designs we finally arrive at the following censored estimator with estimated cut-off point:
ˆtˆ
cut
 pˆ tm y m  qˆ tr tˆ cut ,
where pˆ m  Pˆ ml . Using (4.7), we have
qˆ tr tˆ cut 
1
1  ˆ
qˆ tr y m 
ˆ
1  ˆ
 1

ˆ
qˆ tr y r = qˆ tr 
ym 
yr  ,
ˆ
ˆ
1   
1  
and so
ˆtˆ
cut
 (pˆ tm 
1
1  ˆ
qˆ tr )y m  (
ˆ
1  ˆ
qˆ tr )y r .
(4.10)
Similarly as in section 2 we see that the inliers according to the estimate cut-off value are up-weighted,
while the outliers according to this value are down-weighted. A weighting type from of our censored
estimator is given by
ˆtˆ

cut
d y
*
i i
iS
with weights
19
di*
qˆ hm
 Nh
) if yi is an inlier of the h - th stratum
 Nn (1 
 h
pˆ hm (1  ˆ )

 N h (1  1 )
if yi is an outlier of the h - th stratum
 Nnh
1  ˆ
depending on the sample. It is easy to verify that

iS h
d i*  Wh , and so

iS
di*  1 . In appendix IIa
we have calculated the influence function of the functional form of (4.10), by means of which the
variance of (4.10) can be approximated.
Naturally, since the cut-off value is estimated instead of fixed, (4.4) and (4.8) are not valid for
(4.10) 2. In order to simulate its mean squared error of (4.10) for a specific population we have performed
a small simulation study. The population consists of 2665 units, which are divided into 6 strata. The first
three of these strata are already discussed in section 2.5. The population sizes of these strata are N1  788 ,
N2  774 , N3  544 , N4  318 , N5  191 , and N6  50 . From this population we have drawn ten times
500 stratified samples with replacement. The first five times we have applied a proportional allocation for
sample sizes n = 50, 100, 200, 500, and 1000. The last five times we have applied the Neyman allocation
for the same sample sizes. For each replicate we have calculated the ordinary stratified estimate and the
censored estimate given (4.10). Based on these replicates we have estimated the variance of the ordinary
stratified estimator and the bias and mean squared error of our censored estimator. The results are given
in tables 4.1 and 4.2.
Table 4.1. Some simulation results concerning the censored estimator; proportional allocation.
n  50
n  100
n  200
n  500
n  1000
Var ( yst )
Bias (ˆt̂ )
MSE (ˆt̂ )
MSE / Var
52154
34551
20219
8827
4627
-68
-65
-50
-32
-16
44991
30715
18989
8551
4455
0.86
0.89
0.94
0.97
0.96
Table 4.2. Some simulation results concerning the censored estimator; Neyman allocation.
n  50
n  100
n  200
n  500
n  1000
Var ( yst )
Bias (ˆt̂ )
MSE (ˆt̂ )
MSE / Var
21860
11299
5722
2256
1171
-47
-33
-21
-11
-9
20982
10852
5615
2215
1175
0.96
0.96
0.98
0.98
1.00
A number of comments are of interest. First, the Neyman allocation gives for both the ordinary stratified
estimator and the censored stratified estimator smaller mean squared errors than the proportional
allocation. Second, the gain of censoring is larger in case of proportional allocation than in case of the
Neyman allocation. This makes sense as the Neyman allocation is an efficient way to deal with outliers,
leaving less space for the censored estimator to deal with outliers. Third, the gain of censoring becomes
smaller as the sample size increases. Fourth, even in case of the Neyman allocation, the stratified
censored estimator performs not worse than the ordinary stratified estimator for all sample sizes.
5. Use of Auxiliary Information
In this section we associate with each element target variable yi a p-vector of auxiliary variables xi . It is
assumed that there are no measurement errors and that the population total of the xi -values is known or
2
We have derived the influence function of (4.10), but not implemented yet.
20
can be precisely estimated. When using auxiliary information, the following decomposition is a good
starting point to define estimators:

N
1
N
yi  b0t X 
i 1
1
N

N
( yi  b0t xi )  b0t X 
i 1
1
N
e
N
i
i 1
where X denotes the population mean of xi and b0 is some p-vector describing a relationship between
yi and xi . Assuming we have some pre-assigned value for b0 , it suffices to estimate the population mean
of the residuals.
To that purpose, it is important to have some idea about the skewness of the residuals. For
example, for categorical xi it is reasonable to assume that the skewness of the original yi -values is turned
over to the residuals. However, if the xi -values are positively skewed too, then the residuals may have
very thick tails on both sides. In the first situation we could apply our one-sided censored estimator. In the
second situation two-sided censored estimators could be of interest.
We only consider the first situation and assume that the sample is simple random with
replacement. Then, we can apply our one-sided censored estimator derived in section 2 on the ei -values:
1  nqˆtˆ
nqˆtˆ
ˆ
Etˆ 
pˆ ˆeˆ 
qˆ ˆeˆ ,
pˆ tˆ  nqˆtˆ t t pˆ tˆ  nqˆtˆ t t 
where the set of inliers / outliers is determined by means of the optimal cut-off point with respect to the
residuals. This gives
ˆ
b0t X  Etˆ 

di* yi  b0t  X 

i 1

n

n

 d x  ,
*
i i
i 1
(5.1)

with
 1 1  nqˆt
 n  pˆ  nqˆ

t
t
d 
ˆ
 1  nqt

 n pˆ t  nqˆt
if eˆi  yi -b0t xi is inlier according to the estimated cutoff point
*
i
if eˆi  yi -b0t xi is outlier according to the estimated cutoff point
as an estimator for the population mean of y. This estimator assumes the form of the classical difference
estimator. Since b0 is fixed, its variance can be approximated along the lines of section 2.
An important issue that deserves attention is the choice of b0 . Motivated by the regression model
yi  b x   i , i  1,..., N , where the i-th residual  i has mean zero and variance  i2 , Särndal et al. (1992)
suggest to take
t
0 i

b0  



N
i 1
xi xit 
 i2 
1

N
i 1
xi yi
2
i
.
This choice, however, is non-robust and we prefer a robust version of the regression coefficient.
Following Gwet and Rivest (1992), we take a Sweppe version, which is implicitly defined by
 ( yi  b0t xi )h( xi ) 
0.

i

 h( x )  
N
xi
i 1
i
i
The function h( xi ) is used to limit the influence of the auxiliary information. For categorical xi there is
no need to bound this influence. So, for categorical auxiliary information we take h( xi )  1 , which
21
corresponds to Huber’s proposal, see Huber (1981, chapter 6). If xi  0 is a scalar, one could take
h( xi )  xi / x and  i  K xi , for some positive scaling factor K, see Gwet and Rivest (1992).
After the weights d i* are derived by means of a preliminary estimate b0 , one could also consider
the following estimator:

di* yi  Bˆ 0t  X 

i 1

n
y *greg 


di* xi 

i 1

n



di* xi xit 

 i 1

with Bˆ 0  
n

1
n
d x y
*
i i i
.
(5.2)
i 1
This estimator can be considered as a regression type estimator with starting weights d i* that are
determined by means of the (estimated) optimal cut-off point, which in turn is determined by means of a
robust estimate of b0 . An explicit expression of the resulting regression type weights is given by

w  d [1  x 


*
i
*
i
t
i
1

d xjx  (X 

j 1

n

*
j
t
j
n
 d x )] .
*
i i
j 1
An important advantage of these weights is that, when applying them to the auxiliary variables $x_i$, we
reproduce the known population mean $\bar{X}$. For many statistical bureaus this is an important tool to achieve
consistent estimates between various sources.

An interesting special case is $x_i = 1$ for all $i \in U$. Then, $e_i = y_i - \hat{B}_0 x_i$ is just a shift of location by
the amount of $\hat{B}_0$. But, since $d_i^*$ solely depends on the estimated fractions of (non-)truncated values at
the estimated optimal cut-off point, and these fractions remain unaltered by a shift of location, $d_i^*$ remains
unaltered by a shift of location. Noting that the $d_i^*$ sum up to one, we have that $\bar{y}^*_{greg}$ equals the one-sided
censored estimator that is given by (2.6).
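The weights $w_i^*$ are straightforward to compute once the starting weights $d_i^*$ are available. The sketch below (function name illustrative) mirrors the explicit expression above and includes the calibration check that the weights reproduce the known population mean $\bar{X}$.

```python
import numpy as np

def greg_weights(d, x, x_bar):
    """Sketch: regression-type weights w_i* from starting weights d_i*.

    d     : (n,) starting weights d_i* from the censoring step (summing to one)
    x     : (n, p) sampled auxiliary values
    x_bar : (p,) known population mean of the auxiliary variables
    """
    t_matrix = (x * d[:, None]).T @ x                # sum_j d_j* x_j x_j'
    lam = np.linalg.solve(t_matrix, x_bar - d @ x)   # T^{-1} (X_bar - sum_j d_j* x_j)
    return d * (1.0 + x @ lam)                       # w_i* = d_i* [1 + x_i' lam]

# calibration property: the weights reproduce the known population mean of x, i.e.
# np.allclose(greg_weights(d, x, x_bar) @ x, x_bar) holds up to rounding error.
```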
6. Level of censoring
One practical problem that has not been discussed yet is the level of censoring. Given the sampling strategy, i.e.
a combination of sampling design and use of auxiliary information, we have only defined optimal cut-off
values at the level of the whole universe. However, most statistical surveys are conducted to obtain
estimates for many different sub-domains. If estimates are required for different sub-domains, then either
the optimal cut-off points have to be recalculated or the cut-off points with respect to the whole universe
have to be used. In the former case more efficient estimates for the sub-domains will result; however,
these estimates will not sum up to the estimate for the whole universe. In the latter case, the estimates for the
sub-domains may be less efficient than those obtained using the re-estimated cut-off points. Kokic and
Bell (1994) addressed this problem as the level of categorisation, Chambers and Kokic (2000) as the level
of winsorisation, and Hulliger (1999) as the reference for robustification.

Since it is impracticable to define a general estimation strategy that takes account of all possible
domains of interest, we restrict our attention to estimating a tabular form, which already plainly reflects
the problem of the level of censoring. As an example, suppose we would like to estimate total turnover for A
size classes and that the union of these size classes constitutes the whole universe. In order to do so, we
suggest a top-down approach. Namely, we suggest to first estimate total turnover for the whole universe.
Then, the reference population for calculating the optimal cut-off point obviously is the whole universe. This
results in an estimate, which is denoted by $\hat{\theta}$. Next, we suggest to estimate total turnover for each
size class separately. To that purpose we have to calculate for each size class an optimal cut-off point by
means of which the total turnover per size class can be estimated. Denote these size-class estimators by
$\hat{\theta}_1,\ldots,\hat{\theta}_A$. We typically have
$$\sum_{a=1}^{A}\hat{\theta}_a \neq \hat{\theta}.$$
In order to fix this inconsistency we suggest to minimally adjust the estimates for the size classes. Denote
$\hat{a} = (\hat{\theta}_1,\ldots,\hat{\theta}_A)^t$; then the adjusted estimates $\hat{b} = (\hat{\theta}_1^*,\ldots,\hat{\theta}_A^*)^t$ should satisfy the constraint
$l^t\hat{b} = \hat{\theta}$, where $l$ is an $A$-vector consisting of ones. Now, $\hat{b}$ can be found by minimising the Lagrangean

$$L = \tfrac{1}{2}(\hat{b}-\hat{a})^t\Omega^{-1}(\hat{b}-\hat{a}) + \lambda(l^t\hat{b}-\hat{\theta})$$

with respect to $\hat{b}$ and the Lagrange multiplier $\lambda$. The symmetric positive definite matrix $\Omega$ defines a
measure of closeness between the original and the adjusted estimates for the class totals. After some
straightforward algebra we obtain as a solution

$$\hat{b} = \hat{a} + C(\hat{\theta} - l^t\hat{a}) \quad\text{with}\quad C = \Omega l\,(l^t\Omega l)^{-1}.$$
This top-down approach is easily extended to higher-dimensional tables: first we estimate the total
turnover for the whole universe; secondly its one-dimensional margins given the estimate for the whole
universe; thirdly its two-dimensional margins given its one-dimensional margins; etc. The Lagrangean
function to be minimised at each estimation step is extensively discussed in Boonstra (2002) and not
repeated here.
We end this section by discussing a number of interesting $\Omega$-matrices for our one-dimensional
margin. First, observe that for a diagonal $\Omega = \mathrm{diag}(\omega_1,\ldots,\omega_A)$ the $i$-th component of $C$ equals

$$c_i = \Big(\sum_{a=1}^{A}\omega_a\Big)^{-1}\omega_i.$$

The choice $\Omega = I$, the identity matrix, divides the difference $\hat{\theta} - l^t\hat{a}$ equally over the $A$ different size
classes, irrespective of the (estimated) population size or estimated total turnover of the size classes. Let
$\hat{N}_a$ denote the estimated size of the $a$-th size class, $a = 1,\ldots,A$; then the choice $\Omega = \mathrm{diag}(\hat{N}_1,\ldots,\hat{N}_A)$
divides the difference proportionally to these estimated sizes. Naturally, if the population sizes are known,
one could use $\Omega = \mathrm{diag}(N_1,\ldots,N_A)$ instead. These two latter choices are interesting if the $A$ population
class means of turnover are expected to be more or less the same. Finally, the choice $\Omega = \mathrm{diag}(\hat{a})$ divides
the difference proportionally to the a priori estimated class totals of turnover. An elaboration of this
particular choice gives the simple ratio adjustment $\hat{\theta}_i^* = \big(\sum_a \hat{\theta}_a\big)^{-1}\hat{\theta}\,\hat{\theta}_i$. A worked sketch of this
adjustment is given below.
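For a diagonal $\Omega$ the adjustment reduces to a one-line computation. The following sketch (function name and the toy numbers are illustrative) implements $\hat{b} = \hat{a} + C(\hat{\theta} - l^t\hat{a})$ and shows the choices $\Omega = I$ and $\Omega = \mathrm{diag}(\hat{a})$ discussed above; the latter reproduces the simple ratio adjustment.

```python
import numpy as np

def adjust_to_universe(a_hat, theta_hat, omega_diag):
    """Sketch: minimally adjust class estimates so that they add up to the
    estimate for the whole universe, for a diagonal closeness matrix Omega.

    a_hat      : (A,) class estimates (theta_hat_1, ..., theta_hat_A)
    theta_hat  : scalar estimate for the whole universe
    omega_diag : (A,) diagonal elements of Omega
    """
    c = omega_diag / omega_diag.sum()                 # i-th component of C = Omega l (l' Omega l)^{-1}
    return a_hat + c * (theta_hat - a_hat.sum())      # b_hat = a_hat + C (theta_hat - l' a_hat)

# illustrative numbers only
a_hat = np.array([120.0, 80.0, 55.0])
theta_hat = 250.0
equal_split = adjust_to_universe(a_hat, theta_hat, np.ones_like(a_hat))  # Omega = I
ratio_adjust = adjust_to_universe(a_hat, theta_hat, a_hat)               # Omega = diag(a_hat)
# ratio_adjust equals theta_hat * a_hat / a_hat.sum(), the simple ratio adjustment
```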
Appendix I: Influence functions for i.i.d. situations
Let Y1,..., Yn be n independent identically distributed random variables with values in IR according to
some unknown distribution function $F$. Let $\theta(F)$ be the functional representation of an estimate $\hat{\theta}$, i.e.
$\hat{\theta} = \theta(\hat{F})$, where $\hat{F}$ denotes the empirical distribution function of the sample $y_1,\ldots,y_n$. Let $\delta_y$ denote the
probability measure that puts mass 1 at the point $y$ and define $F_\varepsilon = (1-\varepsilon)F + \varepsilon\delta_y$. Then the influence
function of $\theta$ at $F$ is defined as

$$IF(y,\theta;F) = \lim_{\varepsilon\to 0}\frac{\theta(F_\varepsilon) - \theta(F)}{\varepsilon}$$

if this limit exists for every $y \in$ IR, see e.g. Hampel et al. (1986). The influence function can be obtained
by differentiating the real-valued function $\theta(F_\varepsilon)$ with respect to $\varepsilon$ at $\varepsilon = 0$. A very nice property of the
influence function is that in most situations its expected square approximates the variance of $\hat{\theta}$. In the
following we will derive the influence functions of (2.6) and (3.6).
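The definition can also be used as a numerical device: replacing $F$ by the empirical distribution and taking a small $\varepsilon$ gives a finite-difference approximation of the influence function. The sketch below (function names and the example data are illustrative) does this for an arbitrary functional defined on a discrete distribution; for the mean functional the approximation equals $y - \mu(F)$ exactly.

```python
import numpy as np

def numerical_influence(theta, support, probs, y, eps=1e-6):
    """Finite-difference approximation of IF(y, theta; F).

    theta   : functional, taking (support points, probabilities) -> float
    support : (m,) support points of a discrete distribution F
    probs   : (m,) probabilities of F on these support points
    y       : contamination point
    eps     : contamination fraction epsilon
    """
    support_eps = np.append(support, y)
    probs_eps = np.append((1.0 - eps) * probs, eps)   # F_eps = (1 - eps) F + eps delta_y
    return (theta(support_eps, probs_eps) - theta(support, probs)) / eps

# example: the mean functional, whose influence function is y - mean(F)
mean_functional = lambda s, p: float(np.sum(p * s))
sample = np.array([1.0, 2.0, 4.0, 10.0])
weights = np.full(sample.size, 1.0 / sample.size)     # empirical distribution F_hat
approx = numerical_influence(mean_functional, sample, weights, y=7.0)
# approx equals 7.0 - sample.mean() = 2.75 (exactly, since the mean is linear in F)
```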
Appendix Ia: Derivation of the influence function of (2.6)
In order to derive the influence function of (2.6), we first derive the influence functions of $\mu_m$, $\mu_r$, $p_m$,
$q_r$, and the optimal cut-off point $t_0$. After that we derive the influence function of $\mu_{\hat{t}_0} = p_m\mu_m + q_r t_0$. Assuming that $F$ is
differentiable, the influence function of $\mu_m$ can be obtained by taking the derivative of

$$\mu_m(F_\varepsilon) = \frac{\int_a^{t(F_\varepsilon)} y\,dF_\varepsilon(y)}{\int_a^{t(F_\varepsilon)} dF_\varepsilon(y)} = \frac{(1-\varepsilon)\int_a^{t(F_\varepsilon)} y\,dF(y) + \varepsilon\int_a^{t(F_\varepsilon)} y\,d\delta(y)}{(1-\varepsilon)\int_a^{t(F_\varepsilon)} dF(y) + \varepsilon\int_a^{t(F_\varepsilon)} d\delta(y)}$$

with respect to $\varepsilon$ at $\varepsilon = 0$. This gives

$$IF(y,\mu_m;F) = \frac{\big[-p_m\mu_m + t_0 f(t_0)\,IF(y,t_0;F) + y\,a_{[y\le t_0]}\big] - \mu_m\big[-p_m + f(t_0)\,IF(y,t_0;F) + a_{[y\le t_0]}\big]}{p_m}$$
$$= \frac{a_{[y\le t_0]}}{p_m}(y-\mu_m) + \frac{f(t_0)}{p_m}(t_0-\mu_m)\,IF(y,t_0;F), \quad\text{where } f(t) = \partial F(t)/\partial t.$$
We note that the influence functions of $\mu_r$, $p_m$, and $q_r$ can be derived similarly. Namely, by taking the
derivative of

$$\mu_r(F_\varepsilon) = \frac{\int_{t(F_\varepsilon)}^{b} y\,dF_\varepsilon(y)}{\int_{t(F_\varepsilon)}^{b} dF_\varepsilon(y)},\qquad p_m(F_\varepsilon) = \int_a^{t(F_\varepsilon)} dF_\varepsilon(y),\qquad q_r(F_\varepsilon) = \int_{t(F_\varepsilon)}^{b} dF_\varepsilon(y)$$

with respect to $\varepsilon$ at $\varepsilon = 0$, respectively, we obtain the desired influence functions:

$$IF(y,\mu_r;F) = \frac{a_{[y>t_0]}}{q_r}(y-\mu_r) - \frac{f(t_0)}{q_r}(t_0-\mu_r)\,IF(y,t_0;F)$$
$$IF(y,p_m;F) = (a_{[y\le t_0]} - p_m) + f(t_0)\,IF(y,t_0;F)$$
$$IF(y,q_r;F) = (a_{[y>t_0]} - q_r) - f(t_0)\,IF(y,t_0;F)$$
The influence function of $t_0$ is obtained by differentiating both sides of

$$\frac{p_m(F_\varepsilon)}{n}\big[t_0(F_\varepsilon) - \mu_m(F_\varepsilon)\big] - q_r(F_\varepsilon)\big[\mu_r(F_\varepsilon) - t_0(F_\varepsilon)\big] = 0$$

with respect to $\varepsilon$ at $\varepsilon = 0$:

$$\frac{IF(y,p_m;F)}{n}\big[t_0-\mu_m\big] + \frac{p_m}{n}\big[IF(y,t_0;F) - IF(y,\mu_m;F)\big] - IF(y,q_r;F)\big[\mu_r-t_0\big] - q_r\big[IF(y,\mu_r;F) - IF(y,t_0;F)\big] = 0.$$

Inserting the influence functions of $\mu_m$, $\mu_r$, $p_m$, $q_r$ into this expression and after a few calculations we
obtain

$$IF(y,t_0;F) = \frac{a_{[y\le t_0]} + n\,a_{[y>t_0]}}{p_m + n\,q_r}\,(y-t_0).$$

Eventually, the influence function of $\mu_{\hat{t}_0}$ is obtained by inserting the obtained influence functions into
$IF(y,\mu_{\hat{t}_0};F) = p_m IF(y,\mu_m;F) + q_r IF(y,t_0;F) + IF(y,p_m;F)\,\mu_m + IF(y,q_r;F)\,t_0$. After a few calculations, we
obtain the desired result.

Note that $p_m IF(y,\mu_m;F) + IF(y,p_m;F)\,\mu_m + IF(y,q_r;F)\,t_0 = a_{[y\le t_0]}\,y + a_{[y>t_0]}\,t_0 - \mu_{t_0}$, which can be
recognised as the influence function of $\mu_{t_0}$.
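For reference, the pieces derived above can be assembled numerically. The sketch below (function name illustrative) evaluates $IF(y,\mu_{\hat{t}_0};F) = a_{[y\le t_0]}y + a_{[y>t_0]}t_0 - \mu_{t_0} + q_r\,IF(y,t_0;F)$ for given population quantities $p_m$, $q_r$, $\mu_m$ and $t_0$; the closed-form expression given in section 2 is not reproduced here.

```python
import numpy as np

def if_one_sided_censored_mean(y, t0, p_m, q_r, mu_m, n):
    """Sketch: influence function of the one-sided adaptively censored mean,
    assembled from the pieces derived in appendix Ia.

    y    : point(s) at which the influence function is evaluated
    t0   : optimal cut-off point
    p_m  : F(t0), probability mass below the cut-off
    q_r  : 1 - F(t0), probability mass above the cut-off
    mu_m : mean of F below the cut-off
    n    : sample size
    """
    y = np.asarray(y, dtype=float)
    a_in = np.where(y <= t0, 1.0, 0.0)                # a_[y <= t0]
    a_out = 1.0 - a_in                                # a_[y > t0]
    mu_t0 = p_m * mu_m + q_r * t0                     # censored mean with known cut-off
    if_t0 = (a_in + n * a_out) / (p_m + n * q_r) * (y - t0)   # IF(y, t0; F)
    # known-cut-off part plus the contribution of the estimated cut-off
    return a_in * y + a_out * t0 - mu_t0 + q_r * if_t0
```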
Appendix Ib: Derivation of the influence function of (3.6)
In order to derive the influence function of (3.6), i.e. of $\mu_{s,t} = q_l s_0 + p_m\mu_m + q_r t_0$, we start with some
preliminary results. First note that

$$p_m(s,t) = \int_s^t dF(y) = F(t) - F(s),\qquad q_l(s) = \int_{-\infty}^{s} dF(y) = F(s),\qquad q_r(t) = \int_t^{\infty} dF(y) = 1 - F(t),$$

$$\mu_m(s,t) = \frac{\int_s^t y\,dF(y)}{\int_s^t dF(y)},\qquad \mu_l(s) = \frac{\int_{-\infty}^{s} y\,dF(y)}{\int_{-\infty}^{s} dF(y)},\qquad \mu_r(t) = \frac{\int_t^{\infty} y\,dF(y)}{\int_t^{\infty} dF(y)}.$$
Assuming that the distribution function $F$ has density function $f$, the following derivatives are easily
derived:

$$\frac{\partial p_m(s,t)}{\partial s} = -f(s),\qquad \frac{\partial p_m(s,t)}{\partial t} = f(t),\qquad \frac{\partial q_l(s)}{\partial s} = f(s),\qquad \frac{\partial q_r(t)}{\partial t} = -f(t),$$

$$\frac{\partial\mu_m(s,t)}{\partial s} = -\frac{f(s)}{p_m}(s-\mu_m),\qquad \frac{\partial\mu_m(s,t)}{\partial t} = \frac{f(t)}{p_m}(t-\mu_m),$$

$$\frac{\partial\mu_l(s)}{\partial s} = -\frac{f(s)}{q_l}(\mu_l-s),\qquad \frac{\partial\mu_r(t)}{\partial t} = \frac{f(t)}{q_r}(\mu_r-t).$$
Finally, one may derive the influence functions of these parameters in a similar way as for the parameters that
were defined for the one-sided censored estimator, see appendix Ia:

$$IF(y,p_m;F) = (a_{[s_0<y\le t_0]} - p_m) + f(t_0)\,IF(y,t_0;F) - f(s_0)\,IF(y,s_0;F)$$
$$IF(y,q_l;F) = (a_{[y\le s_0]} - q_l) + f(s_0)\,IF(y,s_0;F)$$
$$IF(y,q_r;F) = (a_{[y>t_0]} - q_r) - f(t_0)\,IF(y,t_0;F)$$
$$IF(y,\mu_m;F) = (y-\mu_m)\frac{a_{[s_0<y\le t_0]}}{p_m} + \frac{f(t_0)}{p_m}(t_0-\mu_m)\,IF(y,t_0;F) - \frac{f(s_0)}{p_m}(s_0-\mu_m)\,IF(y,s_0;F)$$
$$IF(y,\mu_l;F) = (y-\mu_l)\frac{a_{[y\le s_0]}}{q_l} + \frac{f(s_0)}{q_l}(s_0-\mu_l)\,IF(y,s_0;F)$$
$$IF(y,\mu_r;F) = (y-\mu_r)\frac{a_{[y>t_0]}}{q_r} - \frac{f(t_0)}{q_r}(t_0-\mu_r)\,IF(y,t_0;F).$$
Using these preliminary results, we derive the influence functions of $s_0$ and $t_0$ by solving simultaneously
(for reasons of notational convenience, we have shortened the notation):

$$\frac{IF(p_m)}{n}(t-\mu_m) + \frac{p_m}{n}\big(IF(t)-IF(\mu_m)\big) + \frac{IF(q_l)}{n}(t-s) + \frac{q_l}{n}\big(IF(t)-IF(s)\big) - IF(q_r)(\mu_r-t) - q_r\big(IF(\mu_r)-IF(t)\big) = 0$$

$$\frac{IF(p_m)}{n}(\mu_m-s) + \frac{p_m}{n}\big(IF(\mu_m)-IF(s)\big) + \frac{IF(q_r)}{n}(t-s) + \frac{q_r}{n}\big(IF(t)-IF(s)\big) - IF(q_l)(s-\mu_l) - q_l\big(IF(s)-IF(\mu_l)\big) = 0.$$

Inserting the involved influence functions for $p_m$, $\mu_m$, $q_l$, $\mu_l$, $q_r$, and $\mu_r$ into these equations we
obtain after a little algebra

$$IF(y,(s_0,t_0);F) = \begin{pmatrix} IF(y,s_0;F)\\ IF(y,t_0;F)\end{pmatrix} = J_0^{-1}A(y)\begin{pmatrix} y-s_0\\ y-t_0\end{pmatrix},$$

where $J_0 = J(s_0,t_0)$ and

$$A(y) = \begin{pmatrix} -\dfrac{a_{[y\le s_0]}}{n} & \dfrac{a_{[y\le s_0]}}{n} + \dfrac{a_{[s_0<y\le t_0]}}{n} + a_{[y>t_0]}\\[2ex] a_{[y\le s_0]} + \dfrac{a_{[s_0<y\le t_0]}}{n} + \dfrac{a_{[y>t_0]}}{n} & -\dfrac{a_{[y>t_0]}}{n}\end{pmatrix}.$$

Note that $A(y)$ can be written as $A(y) = a_{[y\le s_0]}A_l + a_{[s_0<y\le t_0]}A_m + a_{[y>t_0]}A_r$, where

$$A_l = \begin{pmatrix} -n^{-1} & n^{-1}\\ 1 & 0\end{pmatrix},\qquad A_m = \begin{pmatrix} 0 & n^{-1}\\ n^{-1} & 0\end{pmatrix},\qquad\text{and}\qquad A_r = \begin{pmatrix} 0 & 1\\ n^{-1} & -n^{-1}\end{pmatrix}.$$
We are now ready to derive the influence function of (3.6), i.e. of $\mu_{s,t} = q_l s_0 + p_m\mu_m + q_r t_0$. To that purpose,
observe the following decomposition:

$$IF(y,\mu_{s,t};F) = q_l\,IF(y,s_0;F) + IF(y,q_l;F)\,s_0 + p_m\,IF(y,\mu_m;F) + IF(y,p_m;F)\,\mu_m + q_r\,IF(y,t_0;F) + IF(y,q_r;F)\,t_0,$$

of which

$$IF(y,q_l;F)\,s_0 + p_m\,IF(y,\mu_m;F) + IF(y,p_m;F)\,\mu_m + IF(y,q_r;F)\,t_0 = \begin{cases} s_0-\mu_{s,t} & \text{if } y\le s_0,\\ y-\mu_{s,t} & \text{if } s_0<y\le t_0,\\ t_0-\mu_{s,t} & \text{if } y>t_0.\end{cases}$$

This can be recognised as the influence function of the double-sided censored estimator with known cut-off values $(s_0,t_0)$. The remaining two terms can be written as

$$q_l\,IF(y,s_0;F) + q_r\,IF(y,t_0;F) = (q_l\;\;q_r)\,J_0^{-1}A(y)\begin{pmatrix} y-s_0\\ y-t_0\end{pmatrix}.$$

Collecting terms we arrive at the following influence function for $\mu_{s,t} = q_l s_0 + p_m\mu_m + q_r t_0$:

$$IF(y,\mu_{s,t};F) = a_{[y\le s_0]}\left[(s_0-\mu_{s,t}) + (q_l\;\;q_r)\,J_0^{-1}A_l\begin{pmatrix} y-s_0\\ y-t_0\end{pmatrix}\right] + a_{[s_0<y\le t_0]}\left[(y-\mu_{s,t}) + (q_l\;\;q_r)\,J_0^{-1}A_m\begin{pmatrix} y-s_0\\ y-t_0\end{pmatrix}\right] + a_{[y>t_0]}\left[(t_0-\mu_{s,t}) + (q_l\;\;q_r)\,J_0^{-1}A_r\begin{pmatrix} y-s_0\\ y-t_0\end{pmatrix}\right].$$
Appendix II: An influence function for stratified designs
For stratified designs we have to adapt the definition of the influence function. Let $Y_{h1},\ldots,Y_{hn_h}$ be $n_h$
independent identically distributed random variables with values in IR according to some unknown
distribution function $F_h$, $h = 1,\ldots,L$. Let $\theta_h(F_1,\ldots,F_L)$ be the functional representation of an estimate $\hat{\theta}_h$,
i.e. $\hat{\theta}_h = \theta_h(\hat{F}_1,\ldots,\hat{F}_L)$, where $\hat{F}_h$ denotes the empirical distribution function of the sample $y_{h1},\ldots,y_{hn_h}$. We
define the influence function of $\hat{\theta}_h = \theta_h(\hat{F}_1,\ldots,\hat{F}_L)$ with respect to the $h'$-th stratum as

$$IF_{h'}(y,\theta_h;F_1,\ldots,F_L) = \lim_{\varepsilon\to 0}\frac{\theta_h(F_1,\ldots,F_{h'}^{(\varepsilon)},\ldots,F_L) - \theta_h(F_1,\ldots,F_{h'},\ldots,F_L)}{\varepsilon}$$

with $F_{h'}^{(\varepsilon)} = (1-\varepsilon)F_{h'} + \varepsilon\delta_y$. In view of (4.1), we are typically interested in a functional of the form

$$\theta_{st}(F_1,\ldots,F_L) = \sum_{h=1}^{L}\frac{N_h}{N}\,\theta_h(F_1,\ldots,F_L).$$

This gives the following influence function for $\theta_{st}(F_1,\ldots,F_L)$ with respect to the $h'$-th stratum:

$$IF_{h'}(y,\theta_{st};F_1,\ldots,F_L) = \sum_{h=1}^{L}\frac{N_h}{N}\,IF_{h'}(y,\theta_h;F_1,\ldots,F_L).$$
If $G_h$ is 'near' $F_h$, a first order von Mises expansion is given by

$$\theta_{st}(G_1,\ldots,G_L) = \theta_{st}(F_1,\ldots,F_L) + \sum_{h=1}^{L}\int IF_h(y,\theta_{st};F_1,\ldots,F_L)\,d(G_h-F_h)(y) + \text{remainder},$$

compare Hampel et al. (1986, Chapter 2). This suggests that the asymptotic variance of $\hat{\theta}_{st} = \theta_{st}(\hat{F}_1,\ldots,\hat{F}_L)$
can be obtained as

$$Var(\hat{\theta}_{st}) \approx \sum_{h=1}^{L}\frac{1}{n_h N_h}\sum_{j=1}^{N_h}\big[IF_h(y_{hj},\theta_{st};F_1,\ldots,F_L)\big]^2.$$
Appendix IIa. Deriving the influence function of (4.10)
In this appendix we derive the influence function of (4.10) with respect to the $h'$-th stratum. Once these
influence functions are derived, it is an easy matter to approximate the variance of (4.10). We start with
some preliminary results. Let

$$p_h = \int_{-\infty}^{t_h} dF_h \quad\text{with } t_h = t_h(F_1,\ldots,F_L),\text{ so } p_h = p_h(F_1,\ldots,F_L),$$
$$q_h = \int_{t_h}^{\infty} dF_h \quad\text{with } t_h = t_h(F_1,\ldots,F_L),\text{ so } q_h = q_h(F_1,\ldots,F_L),$$
$$\mu_{hm} = \frac{1}{p_h}\int_{-\infty}^{t_h} y\,dF_h \quad\text{with } t_h = t_h(F_1,\ldots,F_L),\text{ so } \mu_{hm} = \mu_{hm}(F_1,\ldots,F_L),$$

and

$$\mu_{hr} = \frac{1}{q_h}\int_{t_h}^{\infty} y\,dF_h \quad\text{with } t_h = t_h(F_1,\ldots,F_L),\text{ so } \mu_{hr} = \mu_{hr}(F_1,\ldots,F_L).$$
Along the lines of appendix Ia it can be shown that

$$IF_{h'}(y,p_h;F_1,\ldots,F_L) = \begin{cases}(a_{[y\le t_h]} - p_h) + f_h(t_h)\,IF_{h'}(y,t_h;F_1,\ldots,F_L) & \text{if } h' = h,\\ f_h(t_h)\,IF_{h'}(y,t_h;F_1,\ldots,F_L) & \text{if } h' \neq h,\end{cases}$$

$$IF_{h'}(y,q_h;F_1,\ldots,F_L) = \begin{cases}(a_{[y>t_h]} - q_h) - f_h(t_h)\,IF_{h'}(y,t_h;F_1,\ldots,F_L) & \text{if } h' = h,\\ -f_h(t_h)\,IF_{h'}(y,t_h;F_1,\ldots,F_L) & \text{if } h' \neq h,\end{cases}$$

$$p_h\,IF_{h'}(y,\mu_{hm};F_1,\ldots,F_L) = \begin{cases}a_{[y\le t_h]}(y-\mu_{hm}) + f_h(t_h)(t_h-\mu_{hm})\,IF_{h'}(y,t_h;F_1,\ldots,F_L) & \text{if } h' = h,\\ f_h(t_h)(t_h-\mu_{hm})\,IF_{h'}(y,t_h;F_1,\ldots,F_L) & \text{if } h' \neq h,\end{cases}$$

$$q_h\,IF_{h'}(y,\mu_{hr};F_1,\ldots,F_L) = \begin{cases}a_{[y>t_h]}(y-\mu_{hr}) - f_h(t_h)(t_h-\mu_{hr})\,IF_{h'}(y,t_h;F_1,\ldots,F_L) & \text{if } h' = h,\\ -f_h(t_h)(t_h-\mu_{hr})\,IF_{h'}(y,t_h;F_1,\ldots,F_L) & \text{if } h' \neq h.\end{cases}$$
At the end of this appendix we will show that

$$IF_{h'}(y,t_{cut};F_1,\ldots,F_L) = J_0^{-1}B_{h'}(y),$$

where

$$J_0 = J(t_{cut}) = \begin{pmatrix}\dfrac{N_1p_1}{n_1} & & 0\\ & \ddots & \\ 0 & & \dfrac{N_Lp_L}{n_L}\end{pmatrix} + \begin{pmatrix} N_1q_1 & \cdots & N_Lq_L\\ \vdots & & \vdots\\ N_1q_1 & \cdots & N_Lq_L\end{pmatrix}$$

and

$$B_{h'}(y) = \begin{pmatrix}0\\ \vdots\\ \dfrac{N_{h'}}{n_{h'}}\big[a_{[y\le t_{h'}]}(y-t_{h'}) - p_{h'}(\mu_{h'm}-t_{h'})\big]\\ \vdots\\ 0\end{pmatrix} + N_{h'}\big[a_{[y>t_{h'}]}(y-t_{h'}) - q_{h'}(\mu_{h'r}-t_{h'})\big]\begin{pmatrix}1\\ \vdots\\ 1\end{pmatrix},$$

where the non-zero component of the first vector is its $h'$-th component.
It follows that the influence function of $\theta_h(F_1,\ldots,F_L) = p_h\mu_{hm} + q_h t_h$ with respect to the $h'$-th stratum can
be elaborated as

$$IF_{h'}(y,\theta_h;F_1,\ldots,F_L) = IF_{h'}(y,p_h;F_1,\ldots,F_L)\,\mu_{hm} + p_h\,IF_{h'}(y,\mu_{hm};F_1,\ldots,F_L) + IF_{h'}(y,q_h;F_1,\ldots,F_L)\,t_h + q_h\,IF_{h'}(y,t_h;F_1,\ldots,F_L).$$

Inserting the involved influence functions, we obtain for the first three terms:

$$IF_{h'}(y,p_h;F_1,\ldots,F_L)\,\mu_{hm} + p_h\,IF_{h'}(y,\mu_{hm};F_1,\ldots,F_L) + IF_{h'}(y,q_h;F_1,\ldots,F_L)\,t_h = \begin{cases}a_{[y\le t_h]}\,y + a_{[y>t_h]}\,t_h - (p_h\mu_{hm} + q_h t_h) & \text{if } h' = h,\\ 0 & \text{if } h' \neq h.\end{cases}$$

The fourth term can be obtained from $IF_{h'}(y,t_{cut};F_1,\ldots,F_L) = J_0^{-1}B_{h'}(y)$. Finally, we obtain that the
influence function of the functional form of (4.10) with respect to the $h'$-th stratum can be written as

$$IF_{h'}(y,\theta_{st};F_1,\ldots,F_L) = \sum_{h=1}^{L}\frac{N_h}{N}\,IF_{h'}(y,\theta_h;F_1,\ldots,F_L) = \frac{N_{h'}}{N}\big[(a_{[y\le t_{h'}]}\,y + a_{[y>t_{h'}]}\,t_{h'}) - (p_{h'}\mu_{h'm} + q_{h'}t_{h'})\big] + q_r^t J_0^{-1}B_{h'}(y).$$

Here, an expression for $q_r^t J_0^{-1}$ can be obtained by means of (4.7).
It is interesting to note that if there are no outliers in the $h'$-th stratum, then $B_{h'}(y)$ simplifies
considerably:

$$B_{h'}(y) = \begin{pmatrix}0\\ \vdots\\ \dfrac{N_{h'}}{n_{h'}}(y-\bar{Y}_{h'})\\ \vdots\\ 0\end{pmatrix},$$

where $\bar{Y}_{h'}$ is the population mean of the $h'$-th stratum. Moreover, since the $h'$-th component of $q_r$ is
zero, it is easily seen that $q_r^t J_0^{-1}B_{h'}(y) = (1+\gamma)^{-1}q_r^t P_m^{-1}D_n B_{h'}(y) = 0$. So, if the $h'$-th stratum contains no
outliers, the influence function of the functional form of (4.10) with respect to this stratum becomes

$$IF_{h'}(y,\theta_{st};F_1,\ldots,F_L) = \frac{N_{h'}}{N}(y-\bar{Y}_{h'}),$$

which can be recognised as the influence function of the ordinary stratified estimator with respect to this
stratum.
It remains to show that $IF_{h'}(y,t_{cut};F_1,\ldots,F_L) = J_0^{-1}B_{h'}(y)$. Now the functional $t_h = t_h(F_1,\ldots,F_L)$ is
implicitly defined as

$$\frac{N_h p_h}{n_h}(t_h-\mu_{hm}) - \sum_{l=1}^{L} N_l q_l(\mu_{lr}-t_l) = 0,\qquad h = 1,\ldots,L.$$

Taking the influence function of both sides of these defining equations with respect to the $h'$-th stratum,
$h' = 1,\ldots,L$, gives

$$\frac{N_h}{n_h}\big[IF_{h'}(y,p_h;F_1,\ldots,F_L)\,(t_h-\mu_{hm}) + p_h\big(IF_{h'}(y,t_h;F_1,\ldots,F_L) - IF_{h'}(y,\mu_{hm};F_1,\ldots,F_L)\big)\big] - \sum_{l=1}^{L} N_l\big[IF_{h'}(y,q_l;F_1,\ldots,F_L)\,(\mu_{lr}-t_l) + q_l\big(IF_{h'}(y,\mu_{lr};F_1,\ldots,F_L) - IF_{h'}(y,t_l;F_1,\ldots,F_L)\big)\big] = 0.$$

The term within the first brackets at the left-hand side equals

$$\begin{cases}a_{[y\le t_h]}(t_h-y) - p_h(t_h-\mu_{hm}) + p_h\,IF_{h'}(y,t_h;F_1,\ldots,F_L) & \text{if } h' = h,\\ p_h\,IF_{h'}(y,t_h;F_1,\ldots,F_L) & \text{if } h' \neq h.\end{cases}$$

The second term at the left-hand side equals

$$\sum_{l=1}^{L} N_l\big[IF_{h'}(y,q_l;F_1,\ldots,F_L)\,(\mu_{lr}-t_l) + q_l\big(IF_{h'}(y,\mu_{lr};F_1,\ldots,F_L) - IF_{h'}(y,t_l;F_1,\ldots,F_L)\big)\big]$$
$$= \sum_{l\neq h'} N_l\big[IF_{h'}(y,q_l;F_1,\ldots,F_L)\,(\mu_{lr}-t_l) + q_l\big(IF_{h'}(y,\mu_{lr};F_1,\ldots,F_L) - IF_{h'}(y,t_l;F_1,\ldots,F_L)\big)\big]$$
$$\quad + N_{h'}\big[IF_{h'}(y,q_{h'};F_1,\ldots,F_L)\,(\mu_{h'r}-t_{h'}) + q_{h'}\big(IF_{h'}(y,\mu_{h'r};F_1,\ldots,F_L) - IF_{h'}(y,t_{h'};F_1,\ldots,F_L)\big)\big]$$
$$= -\sum_{l\neq h'} N_l q_l\,IF_{h'}(y,t_l;F_1,\ldots,F_L) + N_{h'}\big[a_{[y>t_{h'}]}(y-t_{h'}) - q_{h'}(\mu_{h'r}-t_{h'}) - q_{h'}IF_{h'}(y,t_{h'};F_1,\ldots,F_L)\big]$$
$$= -\sum_{l=1}^{L} N_l q_l\,IF_{h'}(y,t_l;F_1,\ldots,F_L) + N_{h'}\big[a_{[y>t_{h'}]}(y-t_{h'}) - q_{h'}(\mu_{h'r}-t_{h'})\big].$$

Denoting

$$IF_{h'}(y,t_{cut};F_1,\ldots,F_L) = \begin{pmatrix}IF_{h'}(y,t_1;F_1,\ldots,F_L)\\ \vdots\\ IF_{h'}(y,t_L;F_1,\ldots,F_L)\end{pmatrix}$$

and collecting terms, we arrive at the following system of equations: $J_0\,IF_{h'}(y,t_{cut};F_1,\ldots,F_L) = B_{h'}(y)$.
Noting that $J_0$ is non-singular (see Section 4) we obtain the desired result.
References
Boonstra, H-J (2002), Variance Estimation for Adjusted Tables, Internal Report, Statistics Netherlands,
Heerlen.
Béguin, C. and B. Hulliger (2002), Develop and Evaluate New Methods for Statistical Outlier
Detection and Outlier Robust Multivariate Imputation, Euredit Workpackage x.2.
Chambers, R.L. (1986), Outlier Robust Finite Population Estimation, Journal of the American
Statistical Association, 81, pp. 1063-1069.
Chambers, R.L. and P. Kokic (2000), Winsorization for Identifying and Treating Outliers in Business
Surveys. In The Second International Conference on Establishment Surveys, American
Statistical Association, Buffalo, New York.
Clarke, R.G. (1995), Winsorization Methods in Sample Surveys, Master Thesis, Department of
Statistics, Australian National University.
Cochran, W.G. (1977), Sampling Techniques, John Wiley and Sons, New York.
Duchesne, P. (1999), Robust Calibration Estimators, Survey Methodology, 25, pp. 43-56.
Ernst, L.R. (1980), Comparison of Estimators of the Mean which Adjust for Large Observations,
Sankhya, The Indian Journal of Statistics, Series C, 42, pp. 1-16.
Fuller, W.A. (1991), Simple Estimators for the Mean of Skewed Populations, Statistica Sinica, 1,
pp. 137-158.
Gale, D. and H. Nikaidô (1965), The Jacobian Matrix and Global Univalence of Mappings,
Mathematische Annalen, 159, pp. 81-93.
Gwet, J-P. and L-P. Rivest (1992), Outlier Resistant Alternatives to the Ratio Estimator, Journal of
the American Statistical Association, 87, pp. 1174-1182.
Hampel, F.R., E.M. Ronchetti, P.J. Rousseeuw, and W.A. Stahel (1986), Robust Statistics: the
Approach Based on Influence Functions, Wiley, New York.
Hidiroglou, M.A., and K.P. Srinath (1981), Some Estimators of a Population Total Containing Large
Units, Journal of the American Statistical Association, 76, pp. 690-695.
Huber, P.J. (1981), Robust Statistics, John Wiley and Sons, New York.
Hulliger, B. (1995), Outlier Robust Horvitz-Thompson Estimators, Survey Methodology, 21, pp. 79-87.
Hulliger, B. (1999), Simple and Robust Estimators for Sampling. In Proceedings of the Section on
Survey Research Methods, American Statistical Association, pp. 54-63.
Kokic, P.N. and P.A. Bell (1994), Optimal Winsorizing Cutoffs for a Stratified Finite Population
Estimator, Journal of Official Statistics, 10, pp. 419-435.
Lee, H. (1995), Outliers in Business Surveys. In Business Survey Methods, pp. 503-526, John Wiley
and Sons.
Rao, C.R. (1973), Linear Statistical Inference and Its Applications, John Wiley and Sons, New York.
Särndal, C-E., B. Swensson, and J. Wretman (1992), Model Assisted Survey Sampling, Springer-Verlag,
New York.
Searls, D.T. (1966), The Estimator for a Large Population Mean which Reduces the Effect of Large
True Observations, Journal of the American Statistical Association, 61, pp. 1200-1205.
Smeets, M. (2001), A Simulation Study for Censoring Estimators, Internal Report, Statistics
Netherlands, Heerlen.