ASYMPTOTICS IN FINITE POPULATION SN1PLING
by
pra,nab Kumar Sen
Department of Biostatistics
University of North Carolina at Chapel Hill
Institute of Statistics Mimeo Series No. 1483
March 1985
ASYMPTOTICS IN FINITE POPULATION SAMPLING
Pranab Kumar Sen
1.
Introduction
The theory of (objective or probabilistic) sampling from a finite population
plays a fundamental role in statistical inference in sample surveys.
Indeed,
in practice, mostly, one encounters a set or collection of a finite number
(say, N) of objects or units comprising a population, and, .on
th~
basis of a
subset of these units, called a sample (drawn in an objective manner), the
task is to draw (valid) statistical conclusions on (various characteristics of)
the population.
The population size, N, though finite, needs not be small, and
the sample size, say n, though presumably less than N, needs not be very samll
compared to N (Le., the sampling fract·ion nlN needs not be very samll).
sur~ey
In
sampling, the sampling frame defines the units and the size of the
population unambiguously.
It also reconstructs a population having an un-
countable (or infinite) number of natural units interms of a finite population
by redefining suitable sampling units.
Thus, given the sampling frame and units,
one may like to draw inference on the population through an objective sampling
scheme.
In some other cases, though the units are clearly defined, the size of
the population may not be known in advance, and one may therefore like to
estimate the populatIon size (along with its other characteristics), through
objective sampling schemes.
In either case, usually sampling is made without
replacement (generally, leading to relatively samller margins of sampling
fluctations), though, for sampling with repalcement, the theory is of
relatively simpler in form.
Again, in either with or without replacement
schemes, the different units in the population may all have the common probability
for inclusion in the sample (leading to equal probability or simple random
sampling), or, they may be stratified into some subsets, for each one of which
simple random sampling may be adopted.
In the extreme case, the units in the
population, depending on their sizes or some other characteristics, may have
possibly different probabilities for inclusion in the sample (i.e., varying
probability sampling).
There may be other variations in a sampling scheme
(such as double sampling, interpenetrating sampling, successive sampling, etc.)
However, all of these cases are characterized by an objective sampling procedure defined by a probability law governing the sampling distribution of
suitable statistics based on the sample units.
When N is small, such a sampling distribution (of a
s~atistic)
be studied, mostly, by direct enumeration of all possible cases.
may
However, as
N increases, this enumerational process, generally, becomes prohibitively
laborious.
On the other hand, in survey sampling and in other practical
situations, usually, N is large and n/N may not be very small.
In such a
case, there may be a profound need to examine the generally anticipated and
applicable large sample approximations for the sampling distributions (and
related probability inequalities) with a view to prescribing them in actual
applications.
Our main interest is centered in these asymptotics in finite
population sampling.
Naturally, the asymptotic theory depends on the sampling
design, and, for diverse sampling schemes, diverse techniques have been
employed to achieve the general goals.
It is intended to provide here a
general account of these technqiues along with the related asymptotic theory.
In line with the
gen~ral
objectives of the Handbook, mostly, the derivations
will be replaced by motivations, and emphasis will be laid on the applications
oriented theory only.
In Section 2, we start with the asymptotics in simple random sampling
(with and without replacement) schemes.
For general U-statistics, containing
the sample mean and variances as special cases, asymptotic normality and
related results are presented there.
Some asymptotics on probability inequalities
in simple random sampling (SRS) are then considered in Section 3.
Asymptotics
on jackknifing in finite population sampling (SRS) are presented in Section 4.
Capture-mark-recapture (CMR) techniques and asymptotic results on the estimation
of the size of a finite population are considered in Section 5.
Asymptotic
results on sampling with varying probabilities (along with the allied coupon
collector problem) are presented in Section 6.
In this context, some limit
theorems arising in the occupancy problem are also treated briefly.
The con-
eluding section deals with successive sub-sampling with varying probabilities,
and the relevant asymptotic theory is discussed. there.
2.
Asymptotics in SRS
Let the N units with values a1' •.•
'~
constitute the finite population.
In a sample of size n (2N), drawn without replacement, the observation vector
~n=(~' ••
~=(a1, .•.
' ,Xn ) is a (random) subset of
,aN)' governed by the basic
probability law:
p{xl=a. , ... ,X =a. } = N-[n],
1.
for every l2i, ; ..• ;i
n
(2.1)
1.n
n
1
- In]
In] -1
In]
where N
=(N
)
and N =N •.. (N-n+l), for
~N,
n~N(N[O]=l).
Based on X , we may be interested in the estimation of the
-n
population mean
A.
-""N
-1 N
=N
(2.2)
L:.1. = 1a.1
and the E2pulation variance
2
0;"1
n
= (N-l)
-1
N
- 2
E. l[a.-a ] ,
1=
1
among other characteristics of the population.
[viz. , Nandi and Sen (1963)] are given by
X
n
= n -Ln
L:.
respectively.
~=
IX.
~
and
s
2
n
=
(2.3)
n
The optimal sample estimators
•
-1- n
(n-l) L:.
~=l
-
(X. - X )
~
2
(2.4)
n
Both these estimators are special cases of U-statistics, which
may be introduced as follows.
For a symmetric kernel g(X}, ••. ,X ) of degree
m
parameter
eN
= e (A_
~~ ) = N- [mJ
m(~I),
we define a (population)
E <. <
< N g ( a. , ••• ,a. ) ,
I -~l ••• <'~
~l
~
(2.5)
and the corresponding sample function, viz.,
U
n
= n- lrn]
1:
.
1'::'~l
<.
0
•
<
.
~rn
g (X. , ••• ,X.
5..n
~l
(2.6)
~rn
termed a U-statistic, is an optimal unbiased estimator of 0 o We may note that
N
1
2
2
2
for m=l, g(x)=X, Un=X , 8 =a n , and, for m=2, g(x,y)~(x-y) , Un=sn' 0 =cr o
n
N
N n
In fact, as g(.) is assumed to be symmetric in its
and (2.6), we may take i<i",o<i
-
m(~l)
arguments, in (2.5)
<~ (and l<i,"'<i
<n) and replace N-[m] (and
m-
m-
n-[m]) by (N)-l (and (n)-l), where (p)=p!/(q!(p-q) 1).
m
m
q
Note tnat the Xi are not independent random variables'(r.v.).
they are symmetric dependent rov.'s.
For any
given~,
Nevertheless,
the (exact) sampling
N
n
distribution of U may be obtained by direct enumeration of all possible ( )
n
samples of size n
from~.
Tnis process, obviously, becomes prohibitively
laborious for large N (and n).
As such, there is a genuine need to provide
suitable approximations to the large sample distribution, when N and n are
both large, though
~=n/N
(the sampling fraction) needS not be very SMall.
In this context, the permutational central limit theorems (peLT) playa vital
role.
For the particular case of X , Madow (1948) initiated the use of peLT
n
in finite population sampling, and, since then, this has been an active area
of fruitful research.
For general U , the asymptotic normality result in
n
SRS has been studied by
~andi
and Sen (1963), with further generalizations due
to Sen (1972), Krewski (1978), Majumdar and Sen (197&), and others.
In an asymptotic setup, we conceive of a sequence {A } of populations and
-n
a sequence {n} of sample sizes, such that as N increases,
:tN=n/N-+a :05t<1
Though, theoretically, the asymptotic theory is justified for N indefinitely
large, in practice, the asymptotic approximations workout quite well, for N
even moderately large.
For every N and
h:O~h~m,
let
(2.7)
g (N) ( a . ,..., a
~l
h
r~
where the summation
{l, ... ,N}\{il,··.,i }·
h
-
l:..
'h,
N
-[h]
=N
~
* g( a . ,..., a. )
) -_ ( N-h) - [m-h] ENh
~l
Note that
r1 . <
'::'1
~
-
(Nt
and gm -g.
i ...
<' <n {gh
> 0
l;l,N
go(NL
-eN
(N)
~L~
r
=0)
for h = 0,1 , ••• , m (where "O,N·
lim inf
~l+l' .•
extends over all distinct
}2
~
1
·' i m over the set
Let then
(a. , •• ., a . )
1
(2 • 8 )
1m
2
- eN '
(2 .9)
For the asymptotl.·c theory, we assume that
lim sup
and
~
-
l;m,N
Then, as in Nandi and Sen (1963), we have, for every
<
( 2.10)
00
•
n~m,
Var (U ) = E [U -8 ) 2
n
n
IS
2 -1 -1 -1 -1 2
m (n -N )l;l,N + O«n -N ) )
(2.11)
Let us now assumE: that as N increases,
(2.12)
Also, let [k) denote the largest integer 2k, and let
YN(t)
= YN(N-1 [Nt]) =
where for t<m/N, we let YN(t)=O.
every N (~m).
-
~
[Nt] (U[NtJ - 8 )/{m(Nr;1,N) } , t e: [0,1],
N
(2.13)
Then YN={YN(t), 0~t2l} is well defined for
Finally, let WO={Wo(t), O<t~l} be a Brownian bridge, i.e.,
WO(t) is Gaussian with EWo(t)=O, V02 t<1 and EWo(s)Wo(t)=s~t-st, V s,te:[O,l],
where aAb=min(a,b).
Then, we have the following general result, discussed in
detail in Sen (1981, Section 3.5):
Under (2.10) and (2.12), as N increases, the stochastic process Y
N
converges in distribution to W°.
An
immediate corollary to this basic result is the following:
Under (2.10 and (2.12), as n increases, satisfying (2.7),
1
n
It
Z
_~
(Un-G N) I {m Z;~i.~} - N(O, l~).
(2.14 )
is also interesting to note that this weak convergence (of {Y } to Wo)
N
provides a very simple proof for the asymptotic normality result, when n is
itself a positive integer-valued r.v.
Suppose that {V } is a sequence of nonN
negative integer valued r.v. 's, such that as N increases,
-1
N v
D
N
+
B:
O<B<l.
(2.15)
Then, under (2.10) and (2.12) ,
1
N~( U
V
N
-
eN
-~
,I'
-"CO,S
) / {ml,;l,N }
-1 -1).
(2.16)
This last result is useful in the case where the sample size n is determined
by some other considerations, so that it may be stochastic in nature.
-
-1
Note that for m=l and g(x)=x, Sl,N=(l-N
(estimable) parameter.
Knowledge of
~l,N
)a
2
N·
-
In general, sl,N is an
is useful in providing a confidence
interval for eN' based on (2.14) or (2.15).
The following'estimator, due to
Nandi and Sen (1963), is a variant form of the usual jackknifed estimator.
For each i(=l, .•. n), let
v~:i
be the U-statistic based on (Xl"",Xi_l,Xi+l'
... , X ), and 1 et
n
V
-(n-l)V C)
1 , i=l, ... ,n.
n
n-1
(2.17)
,=nV
n,1
Let then
2
-1 n
2
(2.18)
l:. l(V .-V) •
1=
n,1 n
_2 2
)
Note that for m=l and g(x)=x,Un,1.=X.,l<i<n,so
that s n =s n , defined by (2.4 ,
1 __
Sn = (n-l)
while, in general, we have (d. Nandi and Sen (1963» under (2.10) and (2.12),
(2.19)
The asymptotic results considered here extend easily to the case of more
than one V-statistic.
Futher, we have studied, so far. the case of sampling
without replacement .. In the case of sampling with replacement, we have
~
•..• ,
X independent and identically distributed (i.i.d.) r.v., where
n
-1
p{~=ai}= N , for i=l, ... , N.
(2.20)
As such, the classical central limit theorems and weak convergence results for
V-statistics [viz., Hoeffding (1948), Miller and Sen (1972)], remain applicable.
In particular, in such a case, for (2.14),1-a has to be replaced by 1, and. in
(2.16), B(l-B) has to be replaced by B alone; (2.19) follows from Sen (1960a).
e-
3. Scree PrOOability and r-rnent Inequalities for SFS.
In SRS with :replacenent, the sanple observatioos are independent and iden-
tically distributed randan variables, so that the usual probability and rrarent
inequalities holding for sempling fran an infinite populaticn also remain valid
in this case.
Ql
the other hand, in SFS without replacenent, the semple obser-
vaticns are no lcnger independent ( but, exchangeable ) randan variables. In
(2.11) and (2.14), we have observed that the dependence in SRS without replaC5IEl1t leads to a smaller variance for the sanple rrean or,in general, for
U-statistics. '!his feature is generally shared by a general class of statistics
and the related inequality is temed the Hoeffding inequality.
let Xl' ••• 'Xn be a semple in SRS without replacerrent fran a finite populaticn, and let Y , ••• 'Y be a sanple in SRS with replacerrent fran the sarre
l
n
pq:>ulaticn. '1l1en, for any cxnvex and ccntinuous functicn <P (x) , we have
E<p (XI+" .+X ) < E<P (Y +••• +Y ) •
I
n n
(3.1)
We may :refer to Hoeffding (1963) for a sinple proof of (3.1). Rosen (1967)
has extended the inequality in (3.l) for certain functicns other than ccnvex,
ccntinuous ep t.} and also for nore general syrmetric sanpling plans \vhi.ch
include the SRS with replacerrent as a Particular case. A nore general result
in this direction is due to Karlin (1974). In his setup, <P (.) needs not be a
function of the sum of the Xi (or Yi) •
let <P ( , , ••• ,X ) be a functicn, symretric
n
in its n a.:rgum:mts, such that
<P(a,a,x3 , ••• ,xn } + <P<b,b,xy .. .,x )
n
~
2 <P(a,b,x , ••• ,x ) ,
3
n
(3.2)
for all a,b,~, ••• ,x • For all such ep C.) and any symretric sanpling plan
n
E<P(Xl'."'X )
n
<
-
E<P(Yl, ... ,Y ) ,
n
S',
(3.3)
where the X.~ are in SRS without replaoerrent and the Y.~~,
in the sanpling plan "1-.
'll1e:re are no:re general inequalities of this nature in Karlin (1974), and they
should be of ccnsiderable theoretical interest. In passing, we may remark that
the ccnditim in (3.2) may not , in general, hold for functims of V-statistics.
'Ib illustrate this point, let us ccnsider the special case of the sanple variance
when n
=
2.
~re,
2
2
- ON)
(~(xl-"2)
'Ihus, whenever
= cr~
U2
2
, we have
~(a-b)2
, in general,
(Xl-~)2/2,
=
so that EU
2
<p(a,a)
= <p(b,b)
2cr~ , 2<p(a,b)
>
'I.ben, for
<P(x ,"2) =
l
2 22
4
= ON and <p(a,b) =(~(a-b) -aN) •
E[~(Xl-~)2]
with a positive probability, and hence,
sane picture holds for n
']he
•
> <p(a,a) + <p(b,b). Since
~(Xl-~)2 exceeds 2cr~
(3.2) does not hold.
~
=
~
2. Nevertheless, for ccnvex
functicns of U-statistics, we have sate s.ilYple m:::m:mt inequalities [due to
Hoeffding (1963)]. COnsider (as in sectim 2) a kemel g(Xl, ... ,X ) of degree
m
m( > 1) , and for every n > m, define k
n
*
Un
let
= k n-1 r.~
1
1=
=
[n/m] and let
g(x(.1-l)m+l'''.'X'lltl ) .
(3.4)
Yn be the sigma-field generated by the ordered oollectim of Xl' ••• 'Xn
so that we have U
n
= E I Un* IJi:n J,
for any ccnvex functic:n
E[ <P(U )
n
J
~
<p
'
and, as a result, by the Jensen inequality,
(for whidl the expectatic:n in (3.5) exists),
E[ <P(U* ) ] •
n
(3.5)
Ch the other hand, for U* , the inequality in (3.1) is directly applicable,
n
so that we have for every oontinoous and ccnvex
E[<P(Un )J -< £[<P(
<p ,
r.~
19(Y("1-l)m+l'." ,Yolltl)!k:n )J , lrj n
1=
_> m ,
(3.6)
where the Y. are defined as in (3.1). For the right hand side of (3.6), the
1
usual m:::m:mt inequalities are applicable, and these are therefore adaptable
for U in SRS without replaoem:mt too.
n
.
In SRS without replaoonent , the reverse martingale property of U-statistics
(and hence, sanple rreans) has been established by Sen U970}, and this enables
c:ne to derive other probability inequalities, whidl will be briefly discussed
here. In passing we may also note that by virtue of this reverse martingale
prcperty, for any ocnvex
<p , {<p
(U ) ,m < n < N}
n
--
has the reverse sub-martingale
prcperty, so that suitable m:::m:mt inequalities may also be based m this fact.
We define V
n
= Var(Un )
as in (2.11) and let Vn* = Vn - Vn +1 ' for n > m.
e-
Also, let {~1k': m}
be a noodecreasing seqtalCE of positive nurrbers. '!hen,
we have the following
r cf.
W1enever for
SatE
Sen(1970)]:
r ~ 1, E IUn - eN Ir exists (for n ~ rn), for every t > 0
andrn~n~n' ~N,
I
max ' ~ Uk - eN
p{ <k<
n
n
I
...n' +l(c.-c.
r r l)E Iu.-e Ir },(3.7)
>t} < t -r{cr E IU -eN Ir +L.
N
n
n
~=n
~
~~
K.
so that) in particular,
< ' c.
p{ <max
n k n
. K.
-
~
have
Illel1, 1 -> t}
-k
< t- 2 {c2v + Ez:'-l c'2:v~ }
n n
~=n
~ ~
-
(3.8)
,
and
(3.9)
For the particular case of sarrple rreans (or sums) i. e., kernels of degree 1,
sore related inequalities have also been studied by 5erfling (1974). In this
ccntext, the following inequality fdue to Sen (1979b)] is worth nenticning.
let {~i' 1 ~ i ~ N1 N ~ l}
be a triangular array of real nurrbers
satisfying the nonnalizing oonstraints:
N
Ei=l ~i = 0
Also, let q
N
and
2
Ei=l dNi
= 1 •
(3.10)
= { q(t):O < t < l} be a oontinuous, normegative, U-shaped and
square integrable ftmcticn inside I = fO ,1L Finally, let
take an each pentUltatian of (1, ••• ,N) with the
\XI[1l0l1
9=
(Ql'''· ,QN )
probability (N!) -1.
'!hen
(3.11)
Clearly, in SR3 withaIt replacerrent, (3.11) may be used to provide a simulta-
neous (in k: 1<k<N) ccnfidenCE band for
eN by d100sing q in an awropriate way.
For a related inequality ( eJq)loiting the 4th m::rrent but not the inherent
.
,
v
/.
martmgale structure) we may refer to Hajek and Sidak (1967 ,p.18S) :
k
max
I-~i=l~.':
I t} ~ Cn/N) [ l<i<N
max 2
-4
-3
P { l<k<n
dNi + 3n/N]t (l-n/N) (1 +
where
~
... 0 as N ...
--
~
00
EN),
(3.12)
•
Generally, (3.7) with r = 4 (and rn=l) provides a better bound than (3.12). t'e
may obtain even better bounds by exploiting the weak calverge.nce results in
secticn 2 , for large values of N. As in
sen
(1972), we consider t.'le case of
general V-statistics I with the same notaticns as in Secticn 2 ], so that the
case of sanp1e neans (or sums ) can be obtained as a particular cne. Note that
by virtue of the weak convergence result, stated after (2.13), we have for
eve!:".! t >
1~~
a
and
p{rn
~
nIN,:,
n :
< n
kll\ - eN
=p{o<~<alwO(u)1
where wO = {~f (tl, O,:,t<l}
(5+1) -\'1(5), 5
~a ,
a (0 < a <ll ,
where
I ~ tIn[N -e1 ,N
~t}
J1/2 }
,
(3.13)
is a B~an bridge
w =
{W(t),t
Noting that t.f (5/ (5+1»
=
~ a} is a standard B~an rroticn
process an [0 ,oo}, we may rewrite the right hand side of (3.13) as
pro <5:<a/ (1-a)
I (u+l) -~(u) I
>
(3.14)
t}.
An upper bound for (3.14) is given by
P{o~cx,l (u+1)-l.yHull ?.
= 2
t}
L~l (_1)k+1exp(_2k2 t 2 ).
(3.15)
For small values of a (as is usually the case encountered in practice), we It'ay
get a better bound :
I (1+u) -1"l'1(u) I?. t}
sup
prO ~ u ~ a/{l-a)
~4 p{ t'l(a/(l-a»
where
~
t}
=
,:, P{o ,:,
U
sup
~ a/Cl-a) Iw(u) I ~ t}
1
4[ 1 - eIl(tCl/a - 1)>i ) J ,
ell (.) is the standard nonnal d. f.
(3.16)
In particular, for kernels of degree 1,
(3.16) may be CO'!pared to (3.12), and , as 1 - cI>(x) converges to 0 exponentially,
as x
-+00
,
usually (3.16) perfonns llUlch better than (3.12). '!he sane conc1usioo
holds for tlle cx:npariscn between «3.8) and (3.16) { or (3.9) and (3.15).
~
conclude this section with the remark that for noderate values of N, the
equality sign in (3.13) may , generally, be replaced by a less than or equality
sign, so that we may have a ccnservative property for small values of N.
4.
Jackknifing in Finite Populaticn Sanpling.
In SRS or other sanpling plans, for ratio, regressioo or other estimators,
'jackknifing' was mainly introduced to serve a dual pw:pose : To reduce the bias
of estimators (which are typically of the noo-linear fonn) and to provide an
efficient (and asyrrptotically nonnally distributed) estimator of the sanpling
variance of the (jackJcnifed ) estimator. In the
SaI1E
setup as in sectioo 2, for
a general estimator T = T(X , ••• ,X) (cootaining U as a special case) , we may
n
n
n
l
define the pseudo values
. = nT
T
-
n
n,~
- (n-l} T (i ) , i=l, ... ,n, as in (2.17). '!hen
n-l
the jackknifed estinator is defined by
*
Tn
= n -1 (Tn, 1
+ ••• + Tn,n ) ,
(4.1)
and the (Tukey fonn of the ) jackknifed variance estimator is given
( as in
(2.18» by
Sn2
=
(n-l}
-1
~
l..
1 (T
. - T
* )2
(4.2)
•
n
To rrotivate the jackknifed estimator, we may start with a possibly biased estimator
~=
n,~
T for whim we may have
n
ET =
n
eN + n
-1
a (N} + n
l
-2
~
(N) + . . . . ,
(4.3)
\'tlere the a. (N) are real nurrbers depending possibly 00 the J:X)pulatian size N
J
and the set ~~ • Using n-l for n in (4.3) for each T~~i and (4.1), we obtain
that under (4.3),
E'I'n* =
a (N)/n (n-l) + ....
eN -
2
'Ihus, the bias of T
n
=
eN + O(n-2) •
(4.4)
is reduced fran 0 (n-1) to that of O(n-2) for T* •
n
In
additioo to this irrportant feature of 'bias reduction' , the variance estimator
s2 also plays a very iIrportant role in drawing statistical ccnclusioos on eN •
n
.
Since in this chapter, we are primarily exncemed with the
~otics
in finite
populaticn sanpling, we shall mainly restrict ourselves to the discussioo of the
large semple pl:q)erties of T* and S2
n
n
; hopefully, in Sate other d1apt.er(s) , there
will be carple.nentaJ:Y discussioos en other aspects of jackknifing •
Keeping in mind the ratio, regression and other estimators (which are all
expressible as functicns of
sate
U-statistics 1, we cx:ncei\Ie of a general esti-
mator T of the fonn T = h CU ) where h C.) is a srrooth ftmetien and U ia a vector
n
n
_n
~n
of U-statistics, defined as in secticn 2. Also, we keep in mind the C01diticnal
(pennutaticnal) distributicn generated by the n! equally likely peII'llUtaticns of
Xl' ••• ,Xn am:ng t.he.mSelves, and define
'3i:n
as in after (3.4). 'll1en, it follCMS
fran the basic results in Majurrrlar and
sen
(1978) that
T*n
= T + (n-l)E[ (T -T 1)
n
n n-
S2
n
= n (n-l)Var[ (T - T 1)
n
n-
I'X:n
\:J n
1"1'J"n" ] , 'd n
] ,
> m ,
(4.5)
> m •
(4.6)
'Ihus, for both the jackknifed estimator T* and the variance estimator s2 , the
n
n
inherent penrutaticnal distributicnal structure provides ·the access for the
necessaxy m:xlificaticns. 'Ibis theoretical justification for jackknifing has
been elaborately studied in
sen
(1977).
To fix the notations, we let
ll.. ~N
EO , and, in adlition to (2.10)
"",n
(in
a matrix setup), we asS'Lll'lE that
i
Ell 2(~, ... ,Xm) II
4
<
ex>
;
m = max
CnJ.' •• .,mp )
where g_ (.) stands for the vector of kernels of degrees
(4.7)
,
TIl..L , •••
,mP , respectively.
further, we asS'Lll'lE that h (:;) has botmded secx:nd order (partial) derivatives
(with respect to
~
) in sore neighbourhood of
1:N
and h
(~)
is finite. Finally,
let us define
cr2
Nn
= EI (Tn* -
eN ) 2J , n
> n
-
0
and ass'Lll'lE that there exists a sequence
n
cr~ -
[(N-n) / (N-l)]
cr~
-+
,where n (> m) is finite,
0
{cr~}
of positive mmrers, SUdl that
0 as n increases, where
~ cr~
> O.
Now, parallel to that in (2.13), we consider a stochastic process
o<
t < l}
~~(t)
(4.8)
-
(4.9)
Y = {YN (t) ;
N
by letting
=YN(N-l[Nt]} =
where, for t <
nv'N,
INtJ(T~Nt]
- eN }/crN ,
~~t~l,
(4.10)
we carplete the definiticn of YN(t), by letting YN(t) = O.
Further, as in after (2.l3), we define a BI."CMnian bridge
vf
= {vf (t); 0 < t ~ lJ.
'!hen, we have the following result I viz., Majurrrlar and Sen (1978)]:
For the jackknifed estimator, tmder the assured regularity ccnditions,
Y converges in distributicn (or law) to
N
vf ,
as N increases.
(4.11)
Further, under the
sanE
regularity ccnditicns,
2
2
Sn - ON strengly ccnverges to
o as
n increases; this streng convergence is in the sense that for every
and
6 (> O), there exists a positive integer n
~
p{n
0-
<
I~ I S~
-
O~ I
>
£
}
a
£
(
> 0)
= n (£,6), such that
a
< 6
~
N
nO •
(4.12)
-
'Ihis strong convergence result enables us to replace in (4.10) ON by S [Nt] ,
* , t > 0,
for eveJY t > 0 • 'rhus, if we denote such a studentized process by YN(t)
then, we ccnclude that for every n >0 ,
{Y;(t);t e: [n,l]}
converges in law to {vf (t};t
In particular, it folla.vs that for any (fixed) a :
[n,l]}, as N increases. (4.13)
E:
°
< a ~ 1, if n/N
-+
a , as
N increases, then
n ~ (T
1
*
- eN) Is
n
Further, i f
v
N
be ncnnegative integer valued randan variable, such that
N-lv ccnverges in probability to
N
* -
N~ (Tv
1
(4.14)
is asyrrptotically noD'la1 (0, 1- a) •
n
eN )/S
a ( 0 < a ~l), then
is asynptotically nonnal (0, (l-a)/a ).
v
(4.15)
N
N
'!he last two results are very useful in setting up a confidence interval for
the Paraneter
o: eN = eO (specified).
eN or to test for a null hypothesis
H
As a sinple illustration, consider a typical ratio-estimator
of the fonn:
. u(j)
-l<;,n
(X)
. 1 2
Tn = u(l);U(2)
n
n
' n
= n ~i=l gj i ' J=, ,
(4.16)
where the functions gl (.) and g2 (.) may be of quite general fonn. In fact, we
may even consider sare U-statistics for U(1) and U(2) (of degrees
n
a setting, T
n
n
> 1). In such
is not generally an unbiased estimator of the popu1aticn paraneter
eN = ~l) 1~2} , though the u~j) may unbiasedly estimate the
Typically, the bias of T
n
~j) , j=1,2.
is of the fonn in (4.3), and hence, jackknifing
reduces the bias to the order n
-2 • Further, here h(a,b) = alb , so that
(a 2/aa2 )h(a,b) = 0 , (a 2laaab)h (a,b) = -b-2 and (a 2lab 2 )h (a,b)=2b-~(a,b) •
Consequently, whenever
~2)
is strictly positive and finite , for finite
eN'
the regularity conditions are all satisfied, and hence, (4.11) through (4.15)
hold. For
satE
specific cases, we may refer to Majurrrlar and
sen
(1978). 'Ihe
basic advantage of using (4.11), (4.12) and (4.13) , instead of (4.14) or (4.15),
is that these asyrrptotics are readily adoptable for sequential testing and
estimation procedures. Further, the asyrrptotic inequalities discussed in the
preceeding secticn also remain applicable for the jackknifed estimators. In
Particular, (3.13) through (3.16) also hold when we replace the U-statistics
and their variances by the T* and the 'I\Jkey estimator of their variances.
k
Finally, the results are easily extendable to the case where the T are q-vectors,
n
for scree q
~
1.
In that case, instead of (4.11) or (4.l3),
~
would have a
tied-dCW'l Brownian sheet approxinatien (in law), and instead of (4.12), we
would have the strang convergence of the matrix of jackknifed variance covariances.
For (4.14) and (4.15), we would have an analogous result involving a nultivariate nonnal distributien.
5. Estimaticn of Populatien Size: Asynptotics
'n1e estimatien of the total size of a pc:p.ilatien ( of nd:>ile individuals,
such as the nurcDer of fish in a lake etc.) is of great :iIrportance in a variety
of biological, envi.rarental and ecological studies. Of the nethods available
for obtaining infonnatien about the size of such pc:p.ilatiens, the enes based en
c~ure,
marking, release and
re~ture
Petersen (1896), have been extensively
(CMRR) of individuals, originated by
studied and adapted in practice. 'n1e
Petersen nethod is a a..u-sanple experinent and am::>unts to marking (or tagging)
a sanple of a given nurrber of individuals fran a closed pcpulatien of \mknCW'l
size (N) and then retuming it into the
~tien.
'n1e prcporticn of marked
individuals appearing in the seoc:nd sanple estimates the proporticn marked in
the populati.cn, providing in tum, the estimate of the pc:p.ilation size N.
Sdmabel (1938) censidered a nulti-sanple extensicn of the Petersen nethod ,
where each sanple captured camencing fran the seccnd is examined for ma.IKed
rrerrbers and then every rrerrber of the sartt>le is given another mark before being
returned to the populatien. For this nethod, the carputatiens are simple,
successive estimates enable the field worker to see his nethod as the work
progresses,and the nethod can l:::e adapted for a wide range of capture C01diticns.
For the statistical fonnulation of the QffiR procedure, we use the follaving
notaticns. Let N = total populaticn size ( finite and unknavn), k = nurrber of
-
semples ( k > 2), n. = size of the ith semple, i > 1, m. = mmiJer of marked
-
~
~
individuals in n. , i=l, ••• ,k, u. = n. - m. , i=l, ••• ,k, and H. = number of
~
~
~
~
~
marked individuals in the pcpulaticn just l:::efo:re the ith semple is drawn (Le. ,
Mi =
l:~:i
r,~. +
n. - m =
u j ), i=l, ••• ,k. Ccnventianally, we let
~
J=
KKK
N.
~
1 (n. - m. ).
J
J
r\
= u l = 0 and
~+1
=
a:nditicnal distribution of m. , given
NcM, the
~
and n. , is given by
(.) ~
N.
L.~ ~
~
(m.
~
N-I>1.
N
I M.,n.) = (.)
~ (
~ )/t),
'In. 'n.-m.
n.
~
~
~
~
~
i=2, ... ,k ,
(5.1)
~
so that the (partial) likelihood function is
(5.2)
lJote that
_ -Ck-l){ k
- N
IT.~= 2
= N-
(k-l) { k
IT.
~=l
so that
lil~-l
1k:M,
is
~
according as ( 1- N-'+l)
1
~
is
k
n
j=l
(l-N-ln.)
J
(5.3)
(5.3) provides the soluticn for the IPaXirnum likelihood esti.m3.tor (HLE) of N.
For the Petersen schaiE (i.e., k
lilLN-l is
~
!n2J
so that [n n
l 2
1
=
= 2)
according as
,.,
1~2
, (5.3) :reduces to
N is
~
!n2 '
nl n2
is the MLE of N. For k.: 3, in general, (5.3) needs an
iterative soluticn for locating the
1>1[£
of N.
that based en ~i) , the
Note
,.,
MLE of N is given by N. = [n. M.,Im.] , for i = 2, ••• ,k.
~
~
~
~
interest to study the relationship
j==2, ••• ,k, when k > 3.
(5.4)
l:::et~
Before doing so,
v.'e
It is of natural
,.,
the MLE N ( fran (5.3»
,.,
and the N. ,
J
may note that by virtue of (5.1),
I H.1. ,n.1.
P( m. = 0
1.
N-M.
N
) = (
l.l!l 1 > 0 , for every i =2, ... ,k, so that the r1LE
n.
"n.
1.
A
1.
N do not have finite m::::rrents of any positive order. To eliminate this drawback,
i
we may proceed as in 01.apnan (1951) and calsider the m:xli.fied MIE
v
N. = (n.+l) (M.+l)!(m.+l) - 1 , for i = 2, ... ,k •
(5.5)
1.
1.
1.
1.
A
v
Asymptotically (as N -+ 00), both N. and N. behave identically, and hence, this
1.
m:xlification is well reccmrended.
1.
Using the normal approximation to the
hypergec:rretric distribution , ooe readily d::>tains fran (5.5) that
_~
N
2
A
(N - N ) is asyrrptotically normal (0, y «(X1'(X2) ) ,
2
whenever for scree 0 < (Xl' (X2
y
2 (a,b) = (I-a) (l-b)!ab
.::. 1, nl/N
~
-+
(Xl and n2/N
[(2-a-b)!(a+b)]
and where the equality sign in (5.7) holds when
For the case of k
2:.
2 ,
-+
(5.6)
(X2 ' as N
~
0 < a,b
-+00
1,
,
where
(5.7)
a = b •
3, a little nore delicate treatnent is needed for the
study of the asynptotic prc:perties of the MIE's as well as their interrelatioos.
Using sare martingale characterizaticns, such asyrrptotic studies have been made
by
sen
and
sen
(1981) and
sen (1982a,b).
First, it follCMS fran
sen
and Sen
A
(1981) that a very close approximation N* to the actual MLE N [ in (5.3)] is
given by the solution :
k
k
A
N* = n:s=2 Nsms!<N*-Ms ) (N*-n )]![[s=2 ms!(N*-Ms ) (N*-n )] ,
s
s
where the HLE N. are defined as in after (5.4).
1.
~
(5.8)
other approximations,
listed in Seber (1973), are given by
N
= [ [k 2 ~ m !
s= "~s s
(N -
M )]! [ [k 2 m !
s
s=
s
(N--
M )] ,
s
(5.9)
and
::¥
k
N = [[
'"
k
N m ]![ [s=2 m ] •
s=2 s s
s
(5.10)
out well when the n. ,2~j~k are all equal or N-ln. are all small, while,
J
J
(5.10) is quite suitable, when in addition, the N-~. ,i=2, ... ,k are all small.
N- works
1-
For both (5.8) and (5.9), an interative solutioo works out very well, and has
been discussed in Sen and Sen (1981). Schumacher and Eschneyer (1943) ccnsidered
another estimator (pertaining to the same sche.Ire):
•
N
m M J/11:ks=2
-k
'"
~- 2 N
=[
s=
555
(5.11)
J,
I!1 1"1
55
which is also an weighted average of the Petersen estimators.
sen
and
sen
(1981)
considered an altemative estimator
N
I ~2
=
Ns Uns/Ms)J/[ ~2 (ms ,l't-1s )]
(5.12)
•
If we let
n. = Na. (0 < a.
~
~
< 1) and
-
~
8. =
~
n.i 1(1- a.}, i=l, ••• ,k ,
J=
(5.13)
J
'" as well as the a.wroximate MLE N* , we have [ viz.,
then, for the MLE N
sen
and
Sen (1981)] :
N-~ ( N* - N) asyrrptotically nonnal
[~S=2
cr*2 =
a s (l-
8s-1)/~s
(0, cr*2 ) ,
(5.13)
(5.14)
]-1.
Parallel results for the other estimators are
N-~(N -
~2
N) asynptotically nOl:mal(O,
I:~2as(1-as} Cl-8s_1)/~s_1][
= [
N ~C N -
N) asyrrptotically nonnal (0, cr
_k
*
= I ~~2
8s a s (1-as ) (1-8s _ 1 }8s _ 1J/[
:: 2
cr
*
~s
_1
=1
= 1- B5-l '
N-
= (E
k
s=
]-1
) ,
k
Ls=2 asCl- 8s - 1 )]
[Lk 2{(Y +1:. a.y.}2 a (I-a )y Cl-y )}J/[Ek 2a y2 ]2
s=
s J>s J J
5
5
5
5
s= S 5
N-~ (
-2
asCl-~s_l)/Bs_l
2
(5.16)
,
'2
asymptotically nomal CO, cr ) ,
•
;2 =
cr
(5.15)
Bk* = 1;
k
+ Lj=S+l
a j , s=2, ••• ,k-l,
N ~ (N - N)
YS
ikS=2
::2
==
_1
02 } ,
N)
2 a)
5
It follONS fran
for s
= 2, ••• ,k
;
asynptotically nonnal (0,
-1
k
[E
sen
s=
(5.17)
'
cr2
),
-1
2
2(y
+ E.> a./y·) a (I-a )y (l-y ) J.
5
J 5 J J
5
5
5
5
and
sen
(1981) that
- cr,;
cr , ~,
(5.18)
are all greater than or
(2<s<k) are all equal, while, in the
s
-other three cases, an approximate equality sign holds when the a are all small.
equal to
cr* , where
(J
= cr* iff the a
5
NUl'lerical cx:rtpariscn of these asynptotic variances reveals that over the entire
darain of variation of the as ' none of the estimators
N,N
and
N is
unifonnly
better than the others. In (5.1) and (5.2), we have considered the so called
sanpling without replacerrent scherre. If
wB
draw the 2nd, ••• ,kt.h sarrples with
replacarent, we need to replace the hypergec:rtetric distributioos in (5.1}-(5.2)
by the correspcnding binanial distributicns, and this will lead to scree
sinplificaticns in the folltU.1lae for the asynptotic variances.
In many situations when the' n. are ve:ry small ccrrpared to N, the m. are
J
J
also ve:ry small (may even be equal to 0 with a positive prd:>ability). '!his
may push up the variability of the estimators ccnsidered earlier. For this
reasen, often, an inverse sartFling scherre is recamended. In this setup, at
the sth stage , the sample units are drawn ene by one , until a preassigned
number m of the marked units
s
~ar,
so that the semple size n
variable, while m is fixed in advance, for s = 2, ••• ,.k.
s
s
is a randan
For this inverse
sarrpling scheItE, parallel to (5.1), we have
(i)
N
-1 Mi
N-Mi
L~ (n. I K,m.) = (n I)
(l)(
){(M.-m.+l}/(N-n.+l}}
~
~
~
~
'n.'rn.'n.-m.
~
~
~
~
~
= {ro. (
~
H
~
N-r~
i) (
}}/{n. ( N)} , i=2, ••• ,k,
ro. n.-m.
~
n.
~
~
~
e-
~
(5.19)
~
and (5.2) can be rrodified accordingly.
Note that (5.3) and (5.4) are not
affected , so that the r-II.E remains the sane. It follows fran Bailey (1951)
that
N.= (M.+l)n./ro. - 1
~
~
~
~
the exact variance of
on letting
-1: "
~
N 2(N -N)
2
~2
, i=2, ••• ,k are unbiased estimators of N. Note that
is equal to
(nl~+l) (N+l) (N-nl)~ (n l +2),
so that
= a *n = a * •a N , we have parallel to (5.6) - (5. 7) that
l l
l 1
*
.
*
asynptotically nomal (0, (1-<l1) (1-a )/a a ).
l
l l
(5.20)
Note that in (5.20), a * plays the sane role as <l2 in (5.6)-(5.7). With a
1
similar rrodificaticn for the other ms ' the results considered earlier for
the direct sanpling schare all go through for the inverse sarrp1ing schene too
( when N is large)
~
the main advantage of this inverse sarrpling scherre is that
the estimates have finite rrr::m:mts of positive orders, although, the arrcAlnt of
sanpling ( Le, n +••• ~) is not predetennined (}Jut , is a rancbn variable).
2
Ir.verse sanpling scl1eIres are tlle precursors of: 'secruential sC!!Fling ta<J.2ing
ocnsidered by Clapnan (l9~2), GcxXJman (1953), Darrcch (l9S8) and oth?rs. Darling
andItilbins (1967) and
(l968) have studied
relateO proolens rn
SOlE
arising in sequential sarrpling tagging for the estimation of
stCf'Ping ti.Ires
~ulatien
the
S~l
size N , and the asyrrptotic theory plays a vital role in this
ccntext. Lack of stod'lastic indepenaence of tile randcr.\ variables at successive
stages of drawing and nenstatiooarity of their marginal distributirns call for
a ncnatandarc:l awroach for a rigorous sttXiy of the asynptotic properties of the
M::.E of N in a multi-stage
or sequential saripling procedure. Using a suitable
martingale dlaraeterizaticn, this asynptotic theory has been developec1 in
sen
(1982&,b}, and is presented below.
Inili.viduals are drawn randar.ly cne
neJ:t drawing is made. let
1\ be
f:r.j
one, ma.rkeG and released before the
th.e nurrber of marl-:.ed individuals in the popu-
laticn just before the ktl'1 dra\'!al, for k.:: 1. 'ihus, rIa
f\4'k ' for every k ~ 1, and ti k+l
= f\
X,.( is equal to 1 or 0 according as the
I
Xl' ••• '''i-I)
= N-II"': (r-:-
1, r\+l
~
+ 1 - ~ , k .:: 1 , \1.'1Cre , for every k (~l) ,
kt.~
drawal yields a r.arked individual or
not. Now, tl1e ccnditi.cnal probability functirn for
f k (Xl(
= ~ = 0, M:2=
~\) l-X;~
"i< '
qiven Xl' ••• '~-l i!';
, k > 1, so that at the nth stage, the
(partial) likelihcxx1 functicn is given by
Ln(U)
= ~2
fk(~
IXl' •••
'~~-l) = N-(n-l) ~2{l\~(N _ ~)l-~}.
(5.2l)
Note that
(a/aN) lCXJ L (N)=
n
~2 (1 - ~)/(N- ~)
-
(n-l)!N
(5.22)
•
'r.'1e sumnands in (S.22) are neither independent nor identically distributed
randall variables.
~~verth.eless,
thery lead to a sinple martingale-difference
structure en \<lhi.ch the asynptotic theory has been huilt in • '.L1le MLE
NSn
of
H , based en 'L (N), is a solution of (5.22) (equated to 0), and cne is interested
n
"
in the ~11Tptotic behaviour of the partial sequence { N
Sn
: n ~ n*}, "mere n* is
large , and the sequence is suitably nOIl'\alized. For this purpose, we clefine
(5.23)
and, for every N,n, such that n
[Nex],for sene a >0, we define
k : ~* (N) ~ tZ * (N) } , 0 .: t ~ 1 •
n
= max{
n (t)
=
(5.24)
'Ihen, for each (N,n) and every e: :0 < e: < 1, we may cx:nsider a stochastic
*e:
*
*
=
~~b
process
{~m (t)
= N-~ (N~Sn (t)
~-Jn (t)
,e: ~ t .: 1l , by letting
- N ) (e
a
- a-I)
~
,e: ~ t
< 1.
(5.25)
Further, defining the standard Brownian noticn process W = {W(t);t e: [O,l]} as
in before (3.14), we let w*e:
following:
*e:
= {w* (t) = t -~(t),
For every e: : 0 < e: < 1,
*e:
~
cx:nverges in law to W
, whenever n
= [Na]
such that n -Iv
n
~
N
~
~
=
- N ) is asyrrptotically nonnal (0, (e
Furt.~er, if {v}
n
svn -
(N
-.
as N increases,
A direct consequence of (5.26) is that whenever n
~r~ ( NSn
-
e: < t < I} • '!hen, we have the
for sene a
> O.
(5.26)
INQC.], for sene ex >0,
a - a-I) -1) •
(5.27)
be any sequence of positive integer valued randan variables,
-+
1, in prOOability, as n increases, then, we have
m asynptotically nonnal
(0, (e
a
- ex - 1)
~
).
(5.28)
are nav in a position to carpare (5.6) and l5.27), where we put a = ex + a •
1
2
By virtue of (5. 71, the asynptotic variance in (5.6) is a mi.ninun
"men
= a2•
CL
I
Cai"paring this minimum value with (5.27), we cx:nc1ude that the asynptotic
relative efficiency (A.R.E.) of the two-sanple Petersen estimator (for
with respect to the sequential estimator
E(P,S)
As
a
=
2
a
a /{ (2-a)2( e
so that for small values of a , the
~
Sn
a increases, E(P,S) also increases, and in fact, for
exceeds 1 and it can be quite large when
practical situaticns,
= a2)
(5.29)
Petersen estimator is about 50% efficient cc.npared to N
as
l
is given by
- a - I ) }.
goes to 0, (5.29) converges to 1/2,
a
•
en
a~
the other hand,
.7657, (5.29)
a is close to 2. F-awever, in all
a is generally quite small, and hence, the sequential
estimator can be re<XJma1ded with full ccnfidence. Fran the operaticnal point
of view, often, sequential schE!l!ES are not very practical, and hence, the
Petersen estimator may be used •
In the CXl'ltext of sequential estimatien of the total size of a finite popu-
1atien, the fol1CMing urn node1 arises typically. SUppose that an urn CXl'ltains
an unknC7N1'l nurrber N of white balls and no others. We repeatedly draw a ball at
randan, cbserve its colour and replace it by a black ball, so that before each
draw, there are N balls in the urn. let W be the nunber of white balls observed
n
Note that ~ is nendecreasing in k , ~ ~ k, for every
in the first n drawals.
k
~
1 and W = 0, W = 1.
o
1
t
Note that t
whenever t
= inf{ n: n
c
c
c
For every c > 0, CCI'lSider a stopping variable
> (c+1)W
-
}.
n
(5.30)
can take en cnly the values [(ot-.l)k], for k =1,2, ••• , and W
t
= rn
c
= [m(c+1)]. In this situatien, ene is not only interested in the
"
study of the asyrrptotic prcperties of the ru N
. ' but also of the standardized
St
c
fonn of the stopping variable t • SamJe1 (1968) made sare exnjectures , and
c
general results in this directien are due to Sen (l982b).
We may note that W
n
= ~1n-1
+ Wn
where
,
n is equal to 1 or 0 according as
W
the ball appearing at the nth draw is white or not, for n ~ 1; \\TO=-WO=O. For
every K (0 < K <co) and N , we exnsider a stidlastic pl:'CXl:!SS
[0, K]}
~
=
{~(t),
t e:
by letting
(
~~(t) = l.r~
1
W[Nt] - N(l-(1-N- ) [Nt]», t e: [O,K] ,
where [s] denotes the largest integer CXl'ltained in s. Also, let
t e: (O,K]}
(5.31)
Z = {Z(t),
be a Gaussian Pl:'CXl:!SS with 0 drift and covariance function
EZ(s)Z(t) = e-t { 1 - (l+s)e'!hen, as N increases,
s
} , for 0 ~ s ~ t ~ K.
(5.32)
~ converges in law to Z. Since (1-N-1) n is close to e -n/N,
this suggests that a CXl'lvenient estimator of N (based on n draws) is given by
the solutien (N* ) of the equaticn : W = N(l-e-n/N), and the asyrrptotic pro-
n
n
perties of this estimator can then be studied by incorporating the CXl'lvergence
of
~~
to Z (in law ). For the associated stopping ti.rre, we define now
1*
=
where 0 < a < b < 1 •
[a,b]
Also, for every rn
rn
= (1
E
1* , we define t* as the soluticn of the equaticn
_~
- e
(5.33)
*
)/tm
m
,
m
e: 1*.
(5.34)
tbte that mt~ ~ 1 for evezy m e: [0,1], ti
as m mJveS fran 1 to
o.
=0
and
t~
ncnotalically goes to
For evezy N , we ccnsider a stochastic process Y
N
Q)
=
{YN(m) , m e: 1* } by letting
Tt.....
~II
= inf{
n( >1):
Note that W (=
n
-
-
~~)
(5.30). Let then Y
> W} and YN(m) = l.r~( T
- Nt * ), rn e: 1*.
(5.35)
Nm
n
rn
depends en N as well and m plays the role of (l+e) -1 in
1m
=
covariance function
-t*
EY(m)Y(m ' ) = e m·
'Ihen, as N increases,
{Y(m),m e: I*}
{l-(l+t~,)e
be a Gaussian process en 1* with 0 drift and
-t*
-t*
-t*
m'}/{(rn-e rn) (m' - em')} , m2:,m'.
(5.36)
Y converges in law to Y. 'lhis ccnvergence result, in tum,
N
provides the asyrrptotic nonnality of YN(m) for evezy fixed m ( e: 1*) as well as
for any sequence {m } of positive randan variables for which m
n
n
~
m ( e: 1*) , in
prti:>ability, as n increases. For m vezy close to 1 (i.e., c in (5.30) very close
to 0), Poisson awroxiroatiens for
T
Nm
, suggested by 5amJel (1968) ,works out well.
In the two or nulti-sanple capture-recapture nodel, and, IOOre critically,
in the sequential tagging scherre, there are certain basic assunpticns whim may
not always match the practical awlocaticns. For exarrple, effects of migraticn
need to be taken into account when sanpling is ccnducted over a period of ti.rre
and new individuals may enter into the schem::! as well as
SClTe
existing cnes may
exit. Also, the catchability of an individual in the tagging schen'e may depend
en sane other characteristics. l-breover, cnce caught, an individual may develop
sane trap-shyness or trap-addictiens, so that at the subsequent stage (s), the
capture-prcbabilities are affected. Fann(l971) ccnsidered t..l-te asyrrptotic normality
in a capture-recapture problem when catchability is affected by the tagging
procedure. seber (l973) studied the robustness of 01RR procedures against possible
departures fran these basic hcrcogeneity asstmpticns. For sane CMRR nodels allCMing
sane relaxaticns of these basic assmptions, large sanple theory has been neatly
develq;led in lbsen(l979). 'I11e basic problem with this develc:prent is that there
are so many unknown pararreters involved in the final structure that fran the
statistical inferential point of view, there is little encouragement in their
possible adcptials.
e
6. sanpling with Varying Prd:>abilities : hJynptotics
Ha."1sen and Il\.uWitz (1943) initiated the use of unequal selecticn prooabilities
leading to
IiDJ:e
efficient estimators of the populaticn total. If N and n stand
for the nt1n'ber of units in the p:pulatioo and semple , respectively, and if Yl ' .
• • , Y and y l' ••• , yn denote the values of these mits in the populatioo and semple
N
raspective1y, then, cne may ccnsider the fo11CMi.ng sanp1ing 'oli.th :r.ep1acerrent
GcheIre. let P = (PI' ••• ,P ) be positive nurcters which are nonnalized in such a
N
rre~ure
way that PI+ ••• +PN = 1. 'J.Ypical1y, ooe may CCI"lsider a
the ith unit in the population and set P. =
1
Si of the size of
S./(1:I~
IS.), for i=l,. .. ,N.
1
1=
1
NoN,
corresponding to the san-p1e entries y l' •.• ,Yn' the associated P' s are denoted by
PI' ••• , Pn ' respectively.
Here, semp1ing is made with .rep1ac::::errent and the j t.:'1
mit in the population is drawn with the prd>abi1it'l P. , for j=l, ••• ,N. 'D1en,
J
the hansen-Hon"itz estimator of the population total Y = Y1 + ••• +Y ; is
1
-1
YHH = n
(Y1/P1 + ••• + Yn/Pn ) •
(6.1)
A
'D1is estimator is unbiased and its sanpling varianoe is given by
A
Var(Y )
HH
= n -1 {L.1~,,2
1 _./p.
1=
1
1
=
(2n)
-1
_2
- y- }
2
(6.2)
E ·..J· ". P. P . { Y. /p. - Y. /p . }
1~1rJ~~~ 1 J
1
1
J J
Ue may further note that
2
S_L"U = [n(n-1)]
-1 n
!U1fi
A
E'=l( y,lp. - YI1U
1
1
1
=.
)2
= [2n2 (n-1l] -1 E '..J'< {y./p. - y./p. }2
1~1rJ_n 1 1
J J
(6.3)
is an unbiased estimator of var(¥HH)' Sinoe sarrp1ing is made with .rep1acer:ent and
the y./p. are independent with rrean Y and varianoe
1
1
E~1=1
-lIP.
-.;.
1
1
(=
aN~"Ut..T'
U"'l
standard large sanp1e tileOry is adq>tab1e to verify ti'.at as n increases,
2
2
nSnHH/ONHH
n~ (
ccnverges to ene, in prcba.1)i1ity ,
A
YHH - Y ) is asyrrptotical1y nonnal (0,
so that, by (6.4)
~
n~(YHH - Y)/SnHH
p}~1H
(6.4)
),
(6.5)
is asynptotical1y nonnal (0, 1) •
(6.6)
(6.5),
say),
'n1e situatioo beocItes quite different when sarrpling is made without replaoenent.
In ooe hand, ooe has generally m::>re efficient
estimators~ 00
the other hand, the
e
exact theory becx.mas so a::rrplicated that me is narurally inclined to rely m::>stly
en the asynptotics. To enccrrpass diverse sanpling plans (without replaoenents) ,
'tJe
identify
~
H, ... ,lJ} of natural integers and
populatioo with the set N =
denote the sanple by s • A sanpling design may then be defined by the prd:>abilities
£
s ,
associated with all possible sarrples. In particular,
n.1. =
pU
£
p (s), s
'tJe
let
s} = r{all S corh..
......
;~;~.} p (s) , i=l, ••• ,N.
a .••u ..L.Llg 1.
•
(6.7)
'!hese are tenred the first order inclusioo prcbabilities. Similarly, the se<:Xrld
order inclusioo probabilities are defined as
n1.J
..
= p{
i,j
£
s}
= r{all s cantain" mg ('1.,J')} p(s), i1j=1, ••• ,N.
(6.8)
'!he classical HorVitz-'IhCllpS011 (1952) estimator of the populatioo total Y is
then expressible as
'"
Y
HT
= L,1.£S (y1.,/n1..) •
(6.9)
Various properties of this estimator are discussed in
sate
other chapters, and,
hence, we shall not repeat the discussioo here. l'1e shall mainly
the asynptotic theory.
'"
var( YHrr)
~-!1en
= L.N1=1
CCl10entrate 00
'!he sanpling variance of this unbiased estimator of Y is
-12
(n. -l)~ + Ll<'~'<~T (n .. /n.n. -l)Y.Y. •
1.
1.
_1.rJ-:"
1.J
1. J
(6.10)
1. )
the nl.lrCber of units (n) in the sarrple s is fixed, an alternative expression
for the variance in (6.10), due to Sen (1953) and Yates and Grundy (1953), is
2
' '<N (n.n, - n .. 1( Y./n, - Y,/n,) •
(6.11)
1~1.<J_
1. J
1.J
1. 1.
J
J
It is clear that if the Y. are all ( exactly or closely) prqx>rtional to the
1:
1.
corresponding
n. , then (6.11) is ( exactly or closely) equal to 0
1.
advocates the choice of the
00
that count,
~
this point
n. as prqx>rtienal to the size of the units, and
1.
'prd:>abili~ E9P9rtienal
to size' (ws) sanpling is quite a
reasooable optioo.
NcM, in the ccntext of sarrpling (without replacenentl with varying prcbabilities,
various sarrpling designs have been ccnsidered by
various wol:kers.
Am:ng these,
rejeetive s~lin2 may be defined as in H.3:jek (1964) as sarrpling with replaoerrent
e
with drawing prcbabilities
(11"'.'~
at eadl drm..., ccnditicned en the require-
nent that all drawn units are distinct. 'lbe (1i are positive nurtbers adding upto
1. As soon as me cbtains a replicaticn, one rejects the whole partially built up
semple and starts eatpletely new. In this scheIre, the inclusicn prd:>abilities
iT. can re carputed as in Hii'jek (1964), in tems of the (1. • A related sanpling
J . '
J.
plan, known as the Samford-Dlllbin sanpling, is defined in a similar manner,
where the first unit in the sanple is drawn fran the pq:>ulaticn with the proba(1)
a.
bilities
J.
-1
.
= niT. , J.=l, •••
J.
,N,
and in the subsequent. (n-l) draws, cne
considers the drawing prcbabilities as ex ~*) = (1 iT. (l-iT. )-1, i=l, ••• ,N where (1 is
J.
J.
J.
N
(*)
so selected that ~i=l a
= L Here also, a sanple is accepted only if all
i
selected units are distinct. For roth these schenes, a rejecticn of the
accumulating semple is made when at any intenrediate stage, a repetition occurs.
In a successive
PI' ••• , PN
'
s~~
plan cne draws units one by cne with drm...mg probabilities
and, if a replication occurs at any draw, that particular one is
rejected, and the drawing is ccntinued in this manner tmtil one has the prefixed
mmber n of. distinct units in the sanple • Following Posen (l972,1974) , let II , •
••,I
n
be the indices in the (randan) order in whidl they
~ar
in the sanple
of size n , and let
f:. (r ,n) = Probability that item r is incltrled in the sarrple of size
n drawn accprding to the successive sanpling plan,
(6.12)
for r= 1, ••• ,N. For this scherre, the HOl:vitz-'Iharpscn estimator in (6.9) reduces
to the following:
YHT
=
In passing,
~=l
(6.13)
YI./f:.(Ii,n) •
J.
~ may
remark that if Pl=, •• =P
n/N, so that (6.13) is given by N(
N
= N-
l
, the f:.(r,n) all redu::es to
n-lL~=lYI.)
, and hence, relates to the usual
J.
equal prOOability sarrpling scherre (without replacenemt). In the nore general
case where the Pi are not all equal, the
f1(r,n) can be OOtained as in Ibsen
(1972) in tenns of a set of inclusion probabilities.
~ver,
these expressions
( as given below) are generally quite CCIlplicated and call for asyrrptotic
ccnsideratioos. For every n.( >1) and rl, ••• ,r : 1 < r ". ••• ". r < N, let
n- l
n.{rrn 2 Pr [1- L~=ll
P
]-l}.
(6.14)
p(rl ,···, r n 1 = Prl"K=
Jr
k
j
'llleJ1 ,
6(r,n) =
rj=l { L(j) P(rl, ••• ,r )},
n
where the sumnatioo
(6.15)
L (j) extends over all penrutatioos of (rl' ••• , r ) over
n
(l, •••• ,N} , subject to the constraint that r. = r
J
~
j=l, ••• ,n, for r =l, •••
,~.
'Ihe varying probability stIucture and the ccrrplicatioos underlying the
6 (r,n) in (6.14)-(6.151 intrcxiuoe certain CCIlplicatioos in the study of the
asyrrptotic distributioo theory of the HoIVitz-'Iharpson estimator ( or other
estimators available in the literature). Ibsen (1970,1972) CCIlSidered an
alternative approad1 (through the ootzE?1 collector's problem) and provided
sare deeper results in this ccntext. To illustrate this approadl, we first
consider a ooupcn oollector problem. Let
(6.16)
be a sequence of coupcn collector's situations, where the
L~-1 P~.l· = 1 , \;f
n\.1['['()ers, the P . are positive and
NJ
(double) sequence { J
, k .: l}
Nk
J-
~~J
~j
and P
are real
Uj
N > 1. Consider also a
-
of (rcM-Wi.se) independent and identically
distributed randan variables, where, for each N ( .: 1), k .: 1,
J
p{
(6.17)
Nk = s } = PHS' for s=l, ••• ,N •
Let then
.
~=
Vr~
{
~
/6 (JNk,n)
(6.18)
1&
0,
= inf{
otherwise ;
n : nurrber of distinct JNl'oo.,JNn
Note that for each N ( .: 1), the
v~
are
k > 1.
= m }, m .:,1.
(6.l9)
positive in~r-valued randan variables.
'!hen, lbsen(1970,1972) has shown that on identifying {Yl,oo .,Y : Pl' ••• ,P } with
N
N
~~
in (6.16) ,
YHT
iJ
, say,
Z
nV
Nn
(6.20)
where
~
stands for tile equality in distributioos. NoN, for a coupcn collector
~~ in (6.l6) , ~ = L:~l ~
situatiCl1
~
i8 temed the Dams sum after n co~,
~
for n > 1. 'Ihus, oorrespcnding to the situatiCl1
another situatirn
* =
~'1n
in (6.16) , if we ccnsider
{ (~1/6 (l,n) 'Pl'n) , ••• , (~~6 (H,n) ,PNl:.l ) }, and, as in
(a_.,p.,.) Nith (Y.,P.), j=l, •• .,S, then, foragivenn,
NJ .~J
) )
z\)
in (6.20) is the berlUs surn after \)Nn ooupons in the oollector's situatirn
~
before, identify
~ Nn
•
':J1US, the asynptotic nonnality of (ran&:rnly stopped) bonus surrs ( for the redured
caJPCI'l oollector's situatiCl1) provides the sarre result fqr the Eorvitz-'IhCIlPson
estimator. A similar treatnent holds for many ot:1.er related estimators in
successive sanpling with varying probabilities (\'!i.thout replacarent). Towards
this goal,
CI1 the
\~
may note blat as in PDse.n (1972), under sare regularity oonditiCl1S
a_. as \lell as the Pp
'
-~~
~~
6 (s ,n)
,
as N increases,
+
= 1 -exp{ -Pl., t en) }
:~s
where the functiCl1 t (.) = {t (x), x
N- x =
0
(i,r~),
~ O}, is
H
I:k=l exp{ -t(x)P!'lk}
s=l, ... ,N,
(6.21)
defined inplicitly by
(6.22)
, x .:. 0 ,
and therefore, depends CI1 PI~l' ••• ,PI-iN ). Given this asyrrptotic relation, we
may write for evelY s (=1, ••• ,N) ,
a,*
r~s
= a. /6(s,n) = U - exp{-P t(n)}) -1a.., + o(N-~ ).
illS
....s
Ns
(6.23)
It also follCMS fran Rosen (1970) that under the sarre regularity CCI'lditions ,
\)Nn/t (n)
whenever
S\.ITI
n~
(6.24)
CCI'lverges in probability to CI1e,
is bounded
Cf.iay
fran 0 and 1. Calsequently, if
.
*
\~
define the bonus
*
k
for the reduced coupoo. collector's situation by I1~ = I:i=l'\u . ' k ~ 1,
then , we need to verify that
cally nonnal., and (ii) n
_Yz
(i) the nonnalized versirn of
max{
*
111m -
*
~t{n)
I :I
*
N!.
~Jnt(n)
k/t{n) -11 .:.
o}
is asyrrptotiCCI'lverges
in prOOability to 0 ( the later CCI'lditiCl1 is knCMn in the literature as the
Anscarbe (1952) 'unifonn CCI1tinuity in probability' CCI1ditiCl1). A stronger
result,Yhich ensures both. (i) and (ii), :relates to the \\eak CCI'lvergence of the
.
*
*
part~al sequence {~-Jnl( - m,.~
* en) )} ~ ;
)/{var (~t
k < t (n) } , and , this has
been established by Sen (.l979al through a martingale approach. For sinplicity of
presentaticn,
~
ca1Sider the case of
original a::>upcn a::>llector's situaticn
t.l-)e
(and the sarre result CCI1tinueS to hold for the reduced situaticn too). let us
denote by
<P
Nn
=
~~l ~-Js [
2
dNn
=
1:s=1 ~s exp{-nPNS}[l - exp{-nPNS }J -
H
1 - exp{ -nPNS }],
n .: 0,
(6.25)
2
N
n( ~s=l ~SPHSexp{-nPNS })
2
,
n > O.
(6.26)
'!hen, under the usual(lbsen-) regu1aruty a::nditions, it follCMS that
~
= De en) whenever
nIN is bounded away frc:rn 0 and
(6.27)
00
Further, under the sane regularity conditicns,
(B
-
Nn
<P
)/'1-41 is asyrrptotically nonna! (O,l).
Nn
(6.28)
In fact, if ~ CCI1sider any finite nUI'YiJer, say, q,of the sanple sizes, Le.,
n1'''. ,n ,where the n. all satisfy the calditioo that 0 < n./N < 00 ,for
q
J
J
j=l, ••• ,q, then, (6.28) readily extends to the multinonna! case. Further, if
~
define ~ = { ~-l (t), t
WN{t)
-!.:
= N
Z
(
e: T }, where T = [O,K] for Sate finite K , and
~.J[NtJ - <P lHNt ] ) , 0
.=:. t .=:. K ,
(6.29)
then, it follCMS frc:rn Sen (l979a) that ~ converges in law to a Gaussian functioo
00
T , and this ensures the tightness of
v~
as well ( so that the MScarbe
conditicn holds). In passing, we may :remark that if the a p
_"s
are all ncnnegative,
the bcnus stm\ ~ is then ncndecreasing in n, so that, ~ may define
UN (t) = min { k : ~.: t } , for every t .: O.
(6. 30)
'Ihen, UN(t) is terned the waitinst tilre to cbtain the bonus stm\ t in the
a::>llector's situatioo
p{ UN(t) > x}
~-J.
~
Note that, by definiticn,
= p{ ~J[x] < t } , for all
x, t
(6.31)
> O.
'l11erefore, the asynptotic distributicn of the nonna!ized versial of the waiting
tilre can readily be d:ltained frc:rn (6.28), and, noreover, the
result al
WN
~ak
ccnvergenoe
also yields a parallel result for a similar stochastic process
CalStructed frc:rn the UN (t) •
a .
•
Note that by (6. 7), (6. 8), (6.14) and (6.1s),
,l~
'IT
= II (r ,n) for every r =1, •••
r
, ~le for evezy r ~ 5 (=1, ••• ,N) ,
'lT
= L1<~j~ {L(ij}
rs
P( r 1 , ••• ,rn )},
(6.32)
lij 1 extends over all peD'llltatians of (rl' • • • ,rn) over
where the sumnati.cn
(1, ... ,N) subject to the ccnstraints that r. = r and r. =
5,
)
~
for
i~
j=l, ... ,no
Further, the expressicn for the variance in (6.101 can be rewritten as
N
N
_2
N
(6.33)
L.~=1 L.)=1 'IT ~J
.. Y.Y./{'IT.'IT.) - y- where Y = L. 1Y' ,
~)
~)
~=
~
so that the expressiau; for the 'IT.. and 'IT. may be incorporated to evaluate
~J
(6.33). '!his,
~ver,
.
~
is quite eatp1icated ( in view of (6.14), (6.15) and
(6.32», and therefore,
proceed to obtain sinpler expressicns. We may note
\E
that for the reduced coupcn collector' 5 situatioo n* and for n replaced by t (n) ,
N
\E
have Parallel to (6.26)
*2
N
~
*2
}
Ls=l ~sexp{-t(n)PNS [1- exp{-t(n)PNS }J
'\i{n) =
*
N
- ten) [ Ls=l ~~sPNS exp{-t(n)PNS })
I:~1 ~
=
where,
\E
exp{-t(n)P1.Js}/[1- exp{-t(n)PNS }]
L~l YSP~lS exp{-t(n)PNS }
- ten)!
2
/(1-exp{-t(n)PNS})]2,
may note that the PNs are all specified nUll'bers, so that by (6.22),
t (n) is a knov.n quantity. As a result,
is rounded
CMcrj
A
( YIlT - Y
fran
\>Je
*
)/~(n)
---1~j
increases and n/N
° (and is finite too ),
(6.35 )
is asynptotically nOl:mal (0,1).
)
denote by PNL
~~
obtain that as
Further, if we take the semple observaticns as y. (=
= Po.,· , j=l, ••• ,n,
\>Je
y~_
~j
), j=l, ••• ,n and
may set
1'1)
U,~) = n-l~ 1 y~ exp{-t(n)p . } /[ 1 - exp{-t(n)p,,!. } ]2
UNn = n
VNn
\hen,
I~J
)=)
L'4l1
(2)
(6.34)
-1 n
Lj=l YjPNjexp{-t(n)PNj} /[1 - exp{-t(n)PNj } ]
= U~)
- t(n) [
U~~) ]2.
it fol1CMS that as n increases,
1, so that in (6.35),
<1~t(n)
(6.36)
1 )
2
*
VNr/~(n} cc:nve~sJin
may also be replaced by
V~
•
,
(6.37)
(6.38)
prc:babilityJ to
Besides the sarrpling strategies ccnsidered so far, there are
considered
sate
others
by other WOrk2rs. Am:Jng these, nention may he made of one approach
prop::>sed by Rao, Hartley and Cochran (1962). '!hey suggested that the p::>pulatioo
of N units be first divided randarly into n sub-pcpulaticns of p:redetennined
sizes N'l' ••• ' N , respeetively.
n
~!ithin
each sub-porxliatioo (or rather, group),
using a coovenient varying prOOability se12ctioo, a Horvitz-'Iharpscn esti.zrator
is used to estimate the group total, and the sum of these estimates then taken
as an estimator of the population total. H1el'1 n, the
nUrl~r
of groups , is large,
and the groupings are made ranchnly, the asyrrptotic theory remains applicable
under quite general cooditicns.
~ver,
them is sate arbitrariness in this
division of N units into suDsets of N , .•• ,N units \'tUch may JTlake th:i.s ProceCu:re
l
n
rather unappealing to practicians. For sare fw:ther '-vork in this directirn,
may refer to Icrewski (1978). Systcrietic procedures ( randan or
ordere<~)
\oJ':
for
sanpling .....i .th varying prcbabilities were also crnsidered by MadcM (1949),
Hartley (1966) and Rao and Hartley (1962), am:ng others. Sc:rre of these
are cllscusseC. in detail in
Sate
o~s
other chapters of this volune, and hence, we
shall not elaborate on their related asynptotics. Besides the systenatic procedures,
there are otl'ter procedures due to Narain (1951),
Grundy (19:>3), Sen (1353) , and
for snall values of n ( viz. n
ot.~rs.
= 2,
~·li.dzuno
(1952) , Yates and
Ibst of these proced:..lres work out y;ell
3 or 4), and as n increases, trE p:":'OCedu:res
lJecare prohibitively et.mbrous. Ne may refer to Brever and Hanif (1983) for
Cietailed discussiClls of these procedures when n neeC:L not be large.
SatE
Ha~ver,
as regards tile asyrrptotic t.'1eory is concerned, a lot of work remains to be
accx::l'i'plished.
7. SUccessive Sub-saI'll'ling \vi.th Varying PrOOabilities : Asynptotics.
SUb-sanplin2 or multi-staCJ!!
s~lin2
is oftf>.n adopted in practice and hac:;
a great variety of applicatioo.B in survey sarrpling. 'These are elaborated in
SatE
other cilapterB of thiB volurre. 'I'lPical1y, we may ccnsicer a finite pop.11ation of
N units with variate values ~l'''' ''1~
,
respectively. Ccnsider a successive
sanpling sdlerre where items are sanpled me after the other (without replaCEITEnt)
in such a
Wcrj
that at ead1 draw the prOOability of drawing item 5 is prqx>rticnal
to a nUltiJer P
if item 5 has not already appeared in the earlier drawals, for
Ns
s=1, ••• ,N, where PNl ' ••• ,PNN are a set of positive nurrbers, adding upto 1 • \'E
like to ccnsider a nulti-stage extensicn of this sanpling sdlE!flE • Here, each of
the N items in the populaticn ( called the prim?!¥ units) is crnposed of a nUITber
of smaller units (sub-units), and it may be nore econcmic to select first a sanple
of n primal::y units, and then to use sub-sanples of sub-units in eadl of these
selected primary units. Suppose that the 5th primary unit has M sub-units with
-
5
variate values b . , j=l, ••• ,M ,so that a, = b 1 + ••• + b
,for 5 = 1, ••• ,
sM5
5)
5
~s
5
N. For each 5, we c<noeive of a set { pO. , 1<;<1>1} of positive nUltiJers (such
5)
M
"..J'_
5
that Lj=~ P~j = 1) and ccnsider a successive sanpling sher'!E (without replaoerrent),
where In
5
~~s
(out of M ) sub-units are dlosen. 'n1en, as in (6.13), an estimator of
5
can be franed, for each of the n selected primary units. Finally, these
estimates can be carbided as in (6.13) to yield the estimator of the total
~n
+ ••• +
~
•
similar way. 'Ihis
'me
~
=
procedure can be extended to the nulti-stage case in a
may be terned the sucoessive sub-s?pling with
SdlerrE
~ing
prOOabilities (without replaCEITEnt) or SSSVPWR. To study the asynptotic theory,
first, we may note b'1at an Horvitz-'lhCJ'!'P5Cl1 estimator of a,
M
'"
~~s
M
is
5
= Lj=l ~~j b sj / ~~(j,ms) ,
where the b . are defined as before,
5)
(7.1)
w* . is equal to 1 or 0 according as the
5)
jth sub-unit in the 5th primal::y unit belongs to the sub-sanple of size m or
-
5
not, j=l, ••• ,M and
s
* (j ,ms)
~s
is the probability that the jth sub-unit belongs to
the sub-sarrple of Ins sub-units fran the 5th primary unit, 1 ~ j
~
M ' s=l, ••• ,N.
s
carbining (6.13) and (7.1), we may CCI1sider the natural estimator
'"
N
'"
~(HT) = L~l ~s ~-ls/~(s,n}
=
l~
~~
*
*
Ls=l Lj=l ~ wsj bsj/[~(s,n}~s(j,ms)] ,
(7.2)
~s
where
is equal to 1 or 0 according as the sth primary unit is in the sanple of
n pri.rnary tmits fran the population, s=l, ... ,N, and the inclusien prcbabilities
Ll(s,n) are degined as in (6.12). Ibte that for each (selected) primary unit s,
"
for the estimator a.
in (7.1), ene may use the theory discussed in sectien 6.
:-JS
'Ihi.s, hCJWlever, leads to a nultitude of stewing nmDers and thereby intn:xiuoes
ccrcplications in a direct extension of the Ibsen awroac.'1 to SSVJ?l:I'JR. A n'Ore
sinple approach (based en sene martingale oonstructiens) has been worked out. in
sen (1980),
and we IT'ay present the basic asynptotic theory.as follONS.
OUr pri.rnary interest is to present the asyrrptotic theory of the estimator
~U(h"?) in (7.2). In this oontext, as in earlier sections, ,~ alIa..., N to increase.
As N
-+
00,
we assure that n, the pririiary sanple size, also increases, in sud1 a
way that n/!.J is bounded
~.;;ay
fran 0 and
00
,
,role the m
s
(Le., the sub-sanple
sizes) for the selected primary units mayor may not be large. For this situatien,
t.~
asynptotic theory rests heavily en the structure of the primary unit sanpling,
and , we may also allow the sanpling schene for the sub-units to be rather arbitrary
(not necessarily a SSVPWR), while we assurre that b'1e primary units are sanpled in
acoordance with a SSVPWR scllene. A seCX!1d situatien may arise where the nurrber of
primary units (Le., 1'1 ) is fixed or divided in to a fixed nUIYiJer of strata, and
within each strata a semple of seoondary units is drawn aa:::ording to a SSI.TWF. schene.
':ids situation, hor.,.oever, is congruent to the stratified sanpling sdlerte under
SSVPWR , for which t;he theory in Section 6 extends readily. Hence, we shall not
enter into the detailed discussions on this second scherre.
With the notations introduced before, we set
0"
a1'Js
= E (~s)
2
~ = ~::l ~s =
In order that ~
=~ ,
"
O'Ns = Var (~s) , for s=l, ••• ,LJ :
and
E(
~HET)
) •
(7.4)
=~ ,
for every s. othel:wise, the bias
be negligible. Also, for every N, we oonsider a nondecreasing ftm.ctioo t_
~
< i~} by letting
(7.3)
it is therefore preferred to have unbiased estimators at
the sub-unit stage, so that ~~s
o<x
•
nON
may not
=
{t.,,(x):
.~
e
N- x
=
l::1
exp{-Pl~S~~(x)}
(7.5)
, x e: (0, N) •
let then
2
N
0
2
{
= Ls=l
[~1s] exp -PNS
0Nn
N
~/n)
}
[1 - exp
2
{
} -1
-PNS~-J(n) ]
-1
+ Ls=l (\~s ( 1 - exp{ -PNS~;(n)}]
-
~~(n) ( L~l 'i.~SPi-lsexp{-Pr~s ~~(n) }/(
1 - exp{
a.
Finally, we asSl.lltE t."'lat the sub-unit estimators
cooditicn, nane1y, that for every
l~N E [(~s
-
~~s) 2 r ( I~~s
-
n
(7.6)
satisfy a LinCeberg-type
0,
>
c;~s I
i.\lS
-PNS~l(n)}) J2.
>
~~
)]
0 , as H
-+-
-+-
(7.7)
00
'llie other regularity cooditions are, of course, the crnpatibi1ity of the prabal:>i1ities
PIJ1"" ,Pl~ and b'le sizes ~1"'.'~ ( in the sense that for each sequence the
ratio of the rnaxi.:rrnJm to the minimum entry is asyrrptotica11y finite). 'llien, we
have the following:
'"
( ~(HT) -
AN0
}/0Nn
(7.8)
is asyrcptotica11y normal (0, 1 ).
~
Actually, parallel to (6.29), we may ccnsider a stochastic PI'OCESS
c..::,t <1} (where c
~~~T)
~N (t)
> 0), by letting
=
-~
N
"(t)
(AN
(H?)
is the estimator in (7.2) based on the sanp1e size n
primary sarrple), t e: Ic,l]. 'Ihen, the PI'OCESS
~N
= {~(t):
0
- 1l11) , '>Jhere
=
[Nt]
( for the
converges in 1a\y to a Gaussian
'llie proofs of these results are based on sore aslmptotic
functicn en [c,l] •
theory for an extended c0SZ!: collector's probleM, where in (6.16) through (6.19),
the real (ncn-stochastic ) e1errents a.
•
~-Js
' s=l, ••• ,U.
Note
~1at
NS
are replaced by suitable randan variables
For details of these deve1q:rrents,
'o,lE!
may refer to
sen
in the above deve1<:prent, apart fran t.l-te tll1iform integrability
CQ'lditirn in (7.7), ,..J e have not inposed any restricticn on the estimates
'U1us, we are
(1980).
a11~
to make the sub-sanple sizes m
s
"
~~.
arbitrary, subject to the
ccnditioo that (7.7) holds. In this context, we nay note that if these m are
s
2
also chosen to be large, then the 0N , defined by (7.3), ,...i11 be small, so that
•s
in (7.6), the secx:>nd sum on the right hand side will be of SMaller order of
magnitude (eatpared to the first sum), and hence, in (7.8),
0Nn may be replaced
1<
by ~ , defined by (6.34), where the Y are to be replared ~ the ~ • In this
s
limiting case,
\~
observe therefore that sub-sa!ipling
does not lead to any sig-
nificant increase of the variance ( carpared to SSVPWR for the primary units);
although in many practical prdJlens, suO-sanpling is rrore suitable, Decause it
Q:leS
not presuppose the knowledge of the values of the pr:irnary units
{~lS}
a eatplete census for these may be much ITDre expensive than the estimates
and
'"
{~
}
based en a handful of su.:runits.
So far, ~ have considered sanpling without replaeem:::nt. In SSVP sanpling
with repla<::er!'ent, the theory of successive VP sanpling ~. . . i th replacerrent, discussed
in the beginning of Section 6, readily extends. In (6.1), instead of the primary
'" , derived frau the respect.ive su'Junits y. , we need to use their estimates y.
J
J
sanples. As in (7.6), this v:ill result in an increaded variability due to the
individual variances of the seccnd-stage estirc:ators.
~ver,
with replacemant
strategy yields siIrplifications in the tJ:eatnent of the relevant asynptotic theory ,
and (6.5) and (5.6) 00th extend to this sulJ-5al!Pling scha:e without any difficulty.
In ccnclusien, ~ nay remark tilat in finite populaticn sanpling, the usual
treatrcent for the asynptotic theory (valid for independent randan variables) may
not be directly applicable. But, in ITDst of these situations, by appeal to either
sore appropriate permutation structures ( for equal pro.'Jability saripling) or to
sate
martingale theory (for VP sanpling as
~ll
), the asynptotic theory has been
established under quite general regularity conditions. rihese provide theoretical
justifications of the asynptotic nonnali ty of different estimators (under diverse
•
sanpling scherres) wt..en
particular, for
t.~
~tiIL1al
sanple size (s) mayor may not be ncn-stochastic. In
allocation based en pilot data, often,
~
end up Nith sanple
sizes being ran<km (positiV2 integer valued) variables. In such a case, the
asynptotic results an tl1e stochastics processes referred to earlier are useful.
'It~e
results are also useful for quasi-sequential or repeated significance testing
problem:; in finite populatien scmpling •
e
leferenoes
Amab, R. (1979). en strategies of sanp1ing finite pqm].aticns en successive occasicns
with varying probabilities. Sankhya, ser.C 41 , 140-154.
en samford's procedure of unequal probability
sanp1ing without rep1acenent. J. Arter. Statist. .Mso. 71., 912-918.
Asok, C. and Sukhat:Ite, B.V. (1976).
Brewer, K.R.W. and Banif, H. (1983). ~lin2 with Un~ PrOOabi1ities. Lecture
Notes in Statistics 15 , Springer-Verlag, New Yo •
O1apnan, D.G. (1951). SCIre prcperties of the hypergearetric distributicn with
app1icaticns to zoological census. Univ. calif. Publ- Statist. !., 131-160.
O1apnan, D.G. (1952). Inverse, multiple and sequential sanp1e censuses. Bi<::lretrics
.!!, 286-306.
O1audhuri, A. (1975). SCIre prcperties of estimators based·on sanp1ing scherres with
varying prcbabi1ities. Austral. J. Statist. 17, 22-28.
O1audhuri, A. (1976). On the moice of sarrpling strategies. calcutta Statist.Asso.
Bull. 25, 119-128.
Codlran, W.G. (1977).
~lin2 Ted1ni~s
(3rd Ed.). Jdm Wiley, New York.
Cochran, W.G., Hartley, H.G. and Rao, J .N.K. (1962). On a si..rrp1e procedure of
unequal probability sanp1ing without rep1acerrent. J. Roy. Statist. soc.
sere B 24 , 482-491.
Connack, R.M. (1968). '!he statistics of capture-recapture rrethOOs. O::ean2'
rm. Biol-Ann. lev. 6, 455-506.
Darling, D.A. and Ibbbins, H. (1967). Finding the size of a finite population.
Ann. r1atll. Statist. 38, 1392-1398.
Darroch, J.N. (1958). '!he nultip1e recapture census,I: estimation of closed
pcp.liation. Biaretrika 45 , 343-359.
Das, A.C. (1951). On t\\U phase sanp1ing and sanp1ing with vaJ:}'ing probabilities.
Bull. Internat. Statist. Inst. 33, 105-112.
D..u:bin, J. (1953). SCIre results in sanp1ing when units are selected with unequal
probabilities. J. ~. Statist. SOC. sere B15, 262-269.
Fann, A. (1~71). .Mynptotic nonna1ity in a capture-recapture problem when
catdlabi1ity is affected by tlle tagging pI'C>Oedure. Tech. Rep. TRITA-MAT-1971-10,
Rgy. Inst. Tech., Stockholm, ~ .
Goodman, L.A. (1953}. sequential sanp1ing tagging for populaticn size problems.
Ann. Hath. Statist. ~, 56-69.
H£jek, J. (1~59). ~ti.ImJm strategy and other problems in probability saI!p1ing.
Cas?pis Pro P~stovani 1-1atemat~ 84, 387-423.
Hci'jek, J. (1964). Asynptotic tlleOl.)" of rejective sanp1ing with varying probabilities
fran a finite populaticn. Ann. t-Bth. Statist. ~, 1491-1523.
Htijek, J. (1974). Asynptotic theories of sanp1ing with vcuying probabilities wit.~out
replacement. Proc. Pra~ cenfer. ~. Statist. ( Ed. J. Hajek) ,pp.127-138.
Hljek, J. (1981). ~lin~ ~ a Finite P~aticn. !B:rce1 Dekker, New York.
".
V.
"-
HaJek, J. and S~dak, Z. (1967). 'Iheo~ of Rank Tests. Academic PJ;ess, New York.
Hanif, M. and Bl:'eWer, K.R.W. (1980). Q1 unequal probability sanp1ing without
replacement: A review. Internat. Statist. Rev. 48, 317-335.
11ESen, M.H. and HIJ.n.litz, W.H. (1943). en the theory of sanp1ing fran a finite
pq:>ulatioo. Ann.r--..ath. Statist. 14, 333-362.
Hansen, M.H., Hurwitz, W.N. and Madc:M, \-I.G. (1953). Sarrple
'lheo!J'. Jclm Wiley, New York.
Sl1r'IJe¥
rEthods and
Hartley, H.O. (1966). Systematic sanpling with unequal pJ'rl)ability and without
replaoenEnt. J. ArtEr. Statist. Asoo • .il, 739-748.
Eart1ey, Il.O. and Rae, J .N.I{. (1962). Sarcpling with unequal pJ'rl)abilities and
without rep1acemant. .Ann. Math. Statist. 33, 350-374.
Hoeffding, w. (1948). en a class of statistics with asynptotically nomal distribution. Ann. Math. Statist • .!2., 293-325.
Hoeffding, W. (1963). PrciJability inequalities for suns of bounded randan variables.
J. Arter. Statist. Asso. 58, 13-30.
Horvitz, D.G. and 'Iharpoon, D.J. (1952). A generalizatirn of sarrp1ing wib'1out
replaoern:mt fran a finite universe. J. ArtEr. Statist. AB50. £, 663-685.
Karlin, S. (1974). Inequalities for symretric sempling plans, I. J'nn. Statist.
~ , 1065-1094.
Krewski, D. (1978). Jackknifing trstatistics in finite populatiCtlS. Comun.Statist.
Theor. ~t.h. A 7 1-12.
Lahiri, D.B. (1951). 1-. xrethod for selectioo providing unbiased estimates. Bull.
Intemat. Statist. Inst. 33, 133-140.
Madc:M, W.G. (1949). Ch the theory of systematic sanpling, II. .Ann. b1ath. Statist.
20, 333-354.
~Jajumiar,
H. and Sen, P.K. (1978). Invariance principles for jackknifingtrRtatistics
for finite population sarcpling and satE app1icatioos. carmun. Statist.'!heor.
~. £ ' 1007-1025.
I,l1dzuno, H. (1952). 01 the sarrp1ing system with prd>ability proportionate to sun
of sizes. Ann. Inst. Statist. r1ath. 3, 99-107.
Miller, R.G., Jr. and Sen, P.K. (1972). ¥eak ccnvergence of U-statistics and VCl1
Mises' differentiable statistical functioos. Ann. Math. Statist. 43, 31-41.
Murthy, }Ol.N. (1967). sanpling
~ry
and MeelOOs. Statist. Pub. Soc., calcutta.
H.K. and sen, P.K. (1963). <Xl the prq:lerties of U-statistics ~.1m the
dJservatians are not independent. Part '&'0 : Unbiased estimation of the
paraneters of a :finite ~atian. Calcutta Statist. ABoo. Bull. 12, 125-148.
~~andi,
Narain, ReD. (1951). en sanpling without rep1.acelient with varyinq probabilities.
J. Indian Soc. Agricul. Statist. ~, 16~-175.
Pathak, P .K. (1967a). Asynptotic efficiency of tea Raj's Rtrate<:n', I. Sankhya ,
Ser. A 29, 203-298.
Pathak, P.K. (1967D). Asynptotic efficiency of ~ Raj's strategy, II. Sankhya,
Ser. A 29 , 29~-304.
Petersen, C.G.J. (1896). 'Ihe yearly imnigratioo of yOtmg plaice into the
LiJnfjord fran the Gennan sea. lep. Danish. BioI. Sta. ~, 1-48.
Raj, Des (1950). serre estiI'!lators in sarip1ing with varying prcbabilities without
repla.cefTeIlt. J. mer. Statist. Asoo. ~, 269-284.
Hao, C.R.
977). Sare prti,)lems of semple surveys. Sankhya, ser.C 39, 128-139.
a.
Rae, J.N.K. (1961). Q'l sar'llling ,rith varying prd'::aQilities and with replacerrent
in subsarrpling designs. J. Indian Soc.!::E'i. Soc. 13, 211-217.
Rao, J .i.J .K. (1960). Q1 the c::x:npari.soo of sanpling with and without replacerrent.
!:ev. Intemat. Btatist. Inst. 34, 125-138.
~sen,
B. (1967).
Ibsen, B. (1970).
1!,
en an
en the
inequality of Hoeffding. Ann. 12th. Statist.
coup::.n collector's waiting
~,
382-332.
tiIre.Ann.Matll.~tatist.
1952-19G9.
:tbsen, B. (1972a) •Asynptotic theory for successive sanpling with varying probabilities
without replaoerrent, I • Ann. l-'Jath. Statist. §., 373-397.
B. (1972b) .Asyriptotic theory for successive sarpling ".'i.th varying prof::abilities
without replacerrent, II. Ann. ~Jath. Statist. 43, 748-776.
t{osen,
~sen,
B. (1974). Asyrptotic theory for ~s Raj estimators. Proc. Prague Confer.
(ed. J. F.ajek) , W.313-330.
-
As~otic Statist.
105en, B. (1979). A tool for derivatioo of asynptotic results for sanpling a'1d
occupancy problerrs, illustrated by its applicatioo to SOTE general capturerecapture sanpling prcblerrs. Froc. 2nd Pra~ Confer. AsyIy:totic Statist.
(~. P. ~..andl and M. Hus'kova), pp.67-BO.
samuel, E. (1!:*68). Sequential rnaxi.mum likeli.~ood estirnatioo of the size of a
population. Ann. l'iath. Statist. 1.9,1057-1068.
Sdmabel, Z.E. (1938). 'J.11e estimation of the total fish pop..llatioo of a lake. h-er.
Hath. Ivbn. 45, 348-352.
seber, G.A.F. (l~3). me estimation of animal abtmaance, Griffin: I..a1don.
f.en, A.R. (1953).
en
the estimate of the variance in sanpling
~i. Statist. ~, 119-127.
~lith
varying proba-
bilities. J. Indian Soc.
ana Sen, P.K. (1981). SCimabel type esti.Mators for closed populaticns
witn ITIUltip1e ICIarldngs. SaT'lk11~a, Ger. B 43, 68-80.
Sen, A.R.
Sen, P.K. (l960a). en serre ccnvergence prorerties of P-statistics. calcutta Statist.
Asso. Bull. !Q., 1-19.
Sen, P .K. (1960b). On the est.irTlatirn of population size Dy the capture-recapture
rrethod. Calcutta Statist. Asso. Bull. 10 , 91-110.
Sen, P.~'. (1970). '!he Hajek.-Renyi inequality for sarrrpling fran a finite populaticn.
Sankhya, sere A. 32 , 181-188.
Sen, P .K. (1972). Finite populatioo sanpling and weak ccnvergo..nce to a Brc:Mnian
bridge. Sankhya, sere A 34, 35-90.
Sen, P.K. (1977). Sore invariance principles relating to jack}:nifing and their role in
sequential analysis. Ann. Statist. 5, 315-329.
Sen, P.K. (197~a) • Invariance principles for t.l'1e coupcn collector's prab1eJ'T\
martingale approach. Ann. Statist. 7, 372-380.
A
Weal~ ccnvergence of scree quantile processes arising in
progressively censored tests. Ann. Statist. 7, 414-431.
Sen, P.K. (1979b).
Sen, P.K. (1980). Limit theorems for an extended coupcn collector's problem
cu"1<':
for successive sub-sanpling with varying prcrabilities. Calcutta Statist.
lIsso. Bull. 29, 113-132.
Sen, P.K. (1981).
tial Ncn
tries: InvariancePrineip1esand Statistical
Inference. Jdm Wi ey, New York.
sen,
P.K. (1982a). 0:1 the asyr'ptotic norna1ity in sequential sarrp1ing tagging.
sex.A 44, 352-363.
~lya,
sen,
sen,
P.K. (19820). A renewal theorem for an urn rrodel. Ann. PrOOabi1ity 10, 838-843.
P.K. (198J). en pe.rTIU.ltatiana1 central limit theorems for general l"1UJ_tivariate
li.o.ear rank statistics. ~ank:lya, Ser. A 45, 141-149.
serf1ing, R.J. (1974). Probability ine:jUalities for the
replacement. lnn. Statist. ~, 39-48.
SUM
in sanp1ing without
Yates, F. and. Grundy, P.r-I. (1953). Se1ecticn without rep1acerrent fI'a'l within strata
with probabi1it~1 proportional to size . •:T. Ib'j. Stati!'lt. Soc. ser.B 15, 253-261.
© Copyright 2026 Paperzz