ON SAMPLING WITH REPLACEMENT:
AN AXIOMATIC APPROACH
by
RICHARD CONRAD TAEUBER
Institute of Statistics
Mimeo Series No. 299
October, 1961
TABLE OF CONTENTS
                                                                  Page
1.0  INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . .    1
2.0  REVIEW OF LITERATURE . . . . . . . . . . . . . . . . . . . .    6
3.0  ON THE BASIC CRITERIA FOR A THEORY OF SAMPLING . . . . . . .   21
     3.1  Components of the Sampling Problem . . . . . . . . . .   21
     3.2  On the Question of Sampling with or without Replacement  25
     3.3  The Applicability of Traditional Estimation Criteria .   29
     3.4  Criteria for Estimators from Finite Populations . . . .   39
4.0  THE GENERAL CLASSES OF LINEAR ESTIMATORS FOR SAMPLING WITH
     REPLACEMENT . . . . . . . . . . . . . . . . . . . . . . . .   43
     4.1  Introductory Remarks . . . . . . . . . . . . . . . . .   43
     4.2  Probability System and Notation . . . . . . . . . . . .   46
     4.3  Some Combinatorial Considerations . . . . . . . . . . .   50
     4.4  Class One Estimators . . . . . . . . . . . . . . . . .   53
     4.5  Class Two Estimators . . . . . . . . . . . . . . . . .   56
     4.6  Class Three Estimators . . . . . . . . . . . . . . . .   63
     4.7  Class Four Estimators . . . . . . . . . . . . . . . . .   70
     4.8  Class Five Estimators . . . . . . . . . . . . . . . . .   76
     4.9  Class Six Estimators . . . . . . . . . . . . . . . . .   85
     4.10 Class Seven Estimators . . . . . . . . . . . . . . . .   90
     4.11 Summary of Numerical Examples . . . . . . . . . . . . .   92
5.0  SOME ADDITIONAL COMMENTS ON THE ESTIMATORS . . . . . . . . .   94
6.0  SUMMARY . . . . . . . . . . . . . . . . . . . . . . . . . .   99
     6.1  Summary and Conclusions . . . . . . . . . . . . . . . .   99
     6.2  Suggestions for Future Research . . . . . . . . . . . .  104
7.0  LIST OF REFERENCES . . . . . . . . . . . . . . . . . . . . .  106
8.0  APPENDIX A. THE DISTRIBUTION OF THE NUMBER OF DISTINCT UNITS
     IN THE SAMPLE . . . . . . . . . . . . . . . . . . . . . . .  111
     8.1  Equal Selection Probability Case . . . . . . . . . . .  111
     8.2  Arbitrary Selection Probability Case . . . . . . . . .  115
9.0  APPENDIX B. A STATISTICAL THEORY OF COMMUNISM . . . . . . .  117
1.0 INTRODUCTION
Sample survey procedures are among the most valuable and powerful
tools at the command of a statistician.
Improperly or carelessly used, they could be exceedingly dangerous.
To get useful results from a
sample survey, one must heed the importance of the logical planning of
all steps in an investigation.
Not only is there a problem of how best
to select the sample, but also there are the problems of how to obtain
an estimate of the desired population value and what measure of reliability to attach to that estimate.
To do all this makes it inevitable
that certain assumptions regarding the unknown population will be necessary.
It is here that finite population sampling differs considerably
from procedures for drawing samples from an infinite population.
When
sampling from finite populations, the only assumptions made concern such
things as existence, identifiability and availability of the sampling
units and the probability construct chosen for the selection of the sampling units.
Sample survey theory makes no assumptions concerning the
abstract distribution of the variables (characteristics) under study.
Even after the unknown number of centuries that man has been
drawing samples and acting on the information that they provide him,
there still has not been developed a general theory with practical
applicability which will universally indicate to the sampler a "best"
(in some sense) system for selecting the sampling units to be observed
and at the same time indicate a "best" estimating procedure by which to
glean the information provided by the sample.
(The term "best" will
usually connote minimum variance, but this is not the only requirement
for bestness which one might impose.
For certain restrictive particular
cases "best" systems and estimators are known, e.g., the arithmetic mean
of the observations is the minimum variance estimator when sampling
without replacement and with equal selection probabilities.)
However,
with some reflection, it is easy to see why no such perfect theory has
been developed for sample surveys, for the method that any given sampler
adopts is very dependent on the nature of the material which is available or can be obtained, and the assumptions necessary to utilize that
material to which he can gain access.
In spite of this absence of a
"general theory" for sample surveys, some progress has been made in the
formulation of improved sampling systems and estimating procedures which
will give "better" results.
With, no doubt, centuries of application, the study of the theory
behind sample surveys (at least that which was published) dates from
1713 and the appearance of Bernoulli's Ars Conjectandi.
In the two
centuries following the appearance of this work, little was published by
anyone other than Poisson and Lexis.
Beginning in 1916 many authors
have published various considerations of aspects of sample survey theory, especially as applied to the drawing and evaluating of samples from
finite populations.
Modern developments in the field of sampling finite populations are
usually said to stem from the paper by Neyman (1934) which was the last
major paper to give much consideration to purposive selection of the
sample units, as contrasted with probabilistic selection, and which
pointed the way to more "scientific" lines of development of the art.
Nine years later Hansen and Hurwitz (1943) stimulated the consideration
of drawing samples with unequal or arbitrary selection probabilities by
using the idea of probability proportional to size.
Midzuno (1950) formalized
the general approach of arbitrary probabilities by introducing
the concept of a probability field into such studies.
As he said:
"there is no need of equal probability for every element when we construct the probability field, isn't it?"
It was not until 1952 that the first attempt at formulating
general classes of estimators for samples from finite populations was
published by Horvitz and Thompson (1952).
But Horvitz and Thompson did
not recognize the deductive approach of their own work, and so merely
stated three of the possible classes of estimators.
It remained for Koop (1957) to formalize the approach to the
formation of classes of linear estimators.
The formation of seven
classes of linear estimators, for the case of sampling with unequal
(general) selection probabilities and without replacement of the sampled
units, was based on three axioms which are descriptions of physical
realities.
This approach to the formulation of classes of estimators,
based on the way things actually happen with the associated probabilities,
would seem, for finite populations, much more fundamental than
one based on classical estimation criteria.
In fact, the notion that
there are criteria for which one can develop classes of estimators is
not germane to sample surveys.
In sample survey theory, one first
develops classes of estimators, then applies criteria, such as unbiasedness or minimum variance, to attempt a determination of bestness within
each class.
Another problem which has been under discussion recently in the
literature of sample surveys is the question of whether one should sample with or without replacement.
It is argued that with equal total
sample size, there is no question but that one should sample without replacement by virtue of the fact that the variance of the mean is smaller.
But, when cost is figured in as a consideration in the decision
process, then the comparison clouds, for the costs of sampling with replacement depend on the number of distinct units in the sample, rather
than the total sample size.
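The two sides of this comparison can be made concrete with a small calculation. The sketch below assumes equal selection probabilities and uses the standard textbook results: the variance of the sample mean is sigma^2/n with replacement and (sigma^2/n)(N-n)/(N-1) without replacement, and the expected number of distinct units in n draws with replacement is N[1 - (1 - 1/N)^n]. The population values and the function names are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def population_variance(values):
    """Population variance (divisor N) of a finite list of values."""
    N = len(values)
    mean = Fraction(sum(values), N)
    return sum((Fraction(v) - mean) ** 2 for v in values) / N


def var_mean_with_replacement(values, n):
    """Variance of the sample mean for n equal-probability draws with replacement."""
    return population_variance(values) / n


def var_mean_without_replacement(values, n):
    """Variance of the sample mean for n equal-probability draws without replacement."""
    N = len(values)
    return population_variance(values) / n * Fraction(N - n, N - 1)


def expected_distinct_units(N, n):
    """Expected number of distinct units in n equal-probability draws with replacement."""
    return N * (1 - (1 - Fraction(1, N)) ** n)


# Hypothetical population of N = 6 units and a sample of n = 3 draws.
values = [1, 2, 3, 4, 5, 6]
n = 3
print("variance with replacement:   ", var_mean_with_replacement(values, n))
print("variance without replacement:", var_mean_without_replacement(values, n))
print("expected distinct units:     ", expected_distinct_units(len(values), n))
```

For equal total sample size the without-replacement variance is the smaller of the two, but the with-replacement sample is expected to contain fewer than n distinct units, which is the quantity on which its cost depends.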
Hidden by considerations such as the question of whether to sample
with or without replacement, the development of newer and fancier estimators for specialized situations, the extension of the sampling plan to
more and more stages, the more and more theoretical discussions of some
of the technical problems which arise in actual samples, etc., is an
almost complete lack of discussion of principles governing the choice of
estimators to use on samples from finite populations.
Although the
basic principles of unbiasedness and minimum variance, which are directly applicable to samples from finite populations, appeared in the literature in the early nineteenth century, little has been developed since
in the way of criteria specifically applicable to the problem of determining
optimum estimators (in some sense) when the population under
study is finite.
In this, and many other aspects of sample survey theory
(samples from finite populations) the tendency seems to have been to
assume that the criteria developed for infinite populations will merely
transfer to finite populations. In some cases they may, but for the
most part they do not without adding unwarranted assumptions about the
nature of the population or the sample.
What this dissertation proposes to do, then, is:
(1) To discuss, in a preliminary manner, the applicability of the
    classical estimation criteria to samples from finite populations, and to suggest some possible criteria which might be
    utilized in evaluating possible estimators for use on samples
    from finite populations.
(2) To examine the question of sampling with replacement versus sampling without replacement to see what conclusions might be
    reached, or have been reached, or to see if such a comparison
    can properly be made in the first place.
(3) Using an axiomatic approach, to examine the problem of formulating classes of estimators for samples drawn from finite
    populations, with arbitrary selection probabilities, and with
    replacement of each sampling unit before the next unit is
    drawn.
It can be noted that the first two objectives are somewhat
interrelated.
The third objective, the use of the axiomatic approach, does not depend
on the results of the first objective.
However, this approach to the
formulation of classes of estimators is further justified by the results
of the first objective.
2.0 REVIEW OF LITERATURE
Man from time immemorial has engaged in the use of sampling
techniques to base decisions on partial knowledge of the situation.
He
has judged the opinions of many by talking with friends or advisors; he
has condemned or praised a whole nation or race of people after but a
five or ten day visit; he has pushed aside a bowl of hot soup or tepid
mush after swallowing one spoonful; et cetera, et cetera and so forth.
In the case of the soup or the mush, the universe (bowlful) is undoubtedly sufficiently homogeneous that such a sample would lead to valid
inferences.
But for the other examples cited (in fact for most of the
sampling that is done by man, either unconsciously or deliberately),
there is great danger that false and misleading inferences will be
drawn if complete objectivity in the formulation of the goals and procedures of the inquiry and in the collection and analysis of the data
is lacking.
Eventually people began to want to formalize the methodology behind
obtaining some of these sample estimates.
Some sort of formal procedure
was needed to obtain measures of central tendency and an indication of
their validity, based on a subset of an entire population.
The first
known formal approach to study the theory of sampling was that of
Bernoulli in his monumental study Ars Conjectandi, which appeared in
1713.
A century later Poisson gave indications of the theory that would
result from the introduction of stratification into the sampling procedure.
Subsequently Lexis systematized the work of his predecessors and
added the beginnings of the theory of sampling clusters of elements.
Also, it can be noted that the germinal ideas of the analysis of
variance techniques are to be found in Lexis' works.
Sir Arthur Bowley (1926) summarized the adaptation of the works of
Bernoulli and Poisson to sampling from finite populations.
Bowley was
also one of the first to apply the representative method (purposive selection) in practice, and included in this paper a discussion of the
theory involved.
This paper undoubtedly marked the high-water-mark for
purposive selection (as contrasted to random selection, or attempts
thereat) because the major papers subsequent to this one seemed to assume random selection, or to condemn purposive selection.
Bowley himself
later made the comment, when discussing the paper by Neyman (1934),
that he thought his 1926 paper had "damned it (purposive selection) with
very faint praise".
In the decade immediately preceding Bowley's paper,¹ the theory of
sampling finite populations with equal selection probabilities and without replacement began to develop in earnest.
Isserlis (1916 and 1918),
Edgeworth (1918), Tschuprow (1923) and Neyman (writing under the name
J. Splawa-Neyman) (1925) derived formulae for the first four moments of
the sample mean.
Mortara (1917) developed a formula for the standard
error of the mean.
Neyman, in addition to giving formulae for the first
four moments of the sample mean, gave formulae for the first two moments
of the sampling variance.
Due to inaccessibility, formidable notational
systems, or other reasons, none of these papers stimulated wide growth
in the field of sample survey theory.
¹ For a very interesting discourse on part of the history of the
development of sampling theories and practice in the five decades preceding Bowley's paper see the article by You Poh Seng (1951).
Tschuprow, in that same 1923 paper, developed the principles of the
theory of the optimum allocation of units in stratified sampling.
In fact, Zarkovic (1956), in his article on the history of sampling methods
in Russia, gives the impression that had the works of Tschuprow been
more accessible, and had they had a system of notation which was easier
to understand, they might be the monumental works being cited in this
chapter.
Zarkovic refers to an earlier Russian work which mentions that
Tschuprow, in 1900 in a report "On Sampling Methods", dealt only with
probability samples (Western reliance was then on purposive selection)
and developed the basic theory of surveys.
Also several of Tschuprow's
other works, especially those in connection with the Russian census
circa 1913, where many of the techniques were applied, were quite suggestive of techniques and theoretical developments which were "derived"
much later in the more familiar Western literature.
In fact, if Zarkovic is right, Russian statisticians were in the
forefront of the development of sample survey theory and techniques up
to the time of the death of Lenin.
This was due, undoubtedly, to the
fact that
"These Russian statisticians watched the development of
statistical theory all over the world, they published
translations of the most important foreign contributions
and they reviewed for their readers all important results,
whatever country provided them. This keen activity supplied the base from which they sought solutions to their
own practical problems." (1956, p. 336)
He goes on to say, though, that in the years after the death of Lenin,
political considerations became increasingly important in Russian
statistical effort, and less reliance was placed on theory in the practical application of "statistical" techniques.
(For an illustration of
this non-reliance on theory in the Communist world, see Section 9.0).
There are indications, however, that, at least in Russia, a reliance on
theory is again emerging [Yezhov (1957)].
An impetus to sample survey development, following the above
mentioned papers and the fundamental statistical contributions of
Pearson, Fisher, and others, resulted from the paper by Neyman (1934)
entitled "On Two Different Aspects of the Representative Method".
Modern developments in the field of sample survey theory can be said to
have begun with this paper.
Several new concepts (i.e., new to most
researchers; some had been anticipated in earlier articles which had not
received as much attention) were introduced and discussed, such as:
(i)   optimum use of resources in sample surveys,
(ii)  criteria for the choice of the sampling unit,
(iii) use of preliminary inquiry for improving the design of the
      survey, and
(iv)  optimum allocation for assigning units to different strata
      subject to the restriction that the sample shall have a
      fixed total number of sampled units.
Neyman also discussed the advantages of random over purposive selection
of units, and also the advantages of using stratified sampling, going
so far as to make the statement that the only recommended method of
sampling is stratified random sampling.
The next major paper to appear was that by Hansen and Hurwitz
(1943).
Faced with the situation where the sampling unit and the ultimate
unit of analysis are not identical, they examined the question of
sampling with unequal selection probabilities.
In situations where the
sampling units are aggregates of ultimate units (i.e., clusters) limitations on resources may prohibit the effort needed to group the ultimate
units into clusters of equal size by artificial methods.
These authors
noticed that if one sampled units with replacement using probabilities
$p_i$ exactly proportional to the values $y_i$ of their aggregate characteristics (i.e., $p_i = y_i/T$, where $T$ is the population total), the mean of
these aggregate values, each weighted by the reciprocal of its respective selection probability, has a sampling variance of zero, since each
$y_i/p_i \equiv T$.
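The zero-variance property can be checked by exact enumeration on a small population. The following is a minimal sketch, not Hansen and Hurwitz's own computation: it enumerates every ordered sample of size n drawn with replacement, forms the mean of the ratios y_i/p_i, and accumulates its variance about the total T (the estimator is unbiased, so the mean squared deviation about T is the variance). The population values and the function name are hypothetical.

```python
from fractions import Fraction
from itertools import product


def pps_estimator_variance(y, p, n):
    """Exact variance of (1/n) * sum(y_i / p_i) over all ordered samples
    of size n drawn with replacement with selection probabilities p."""
    total = sum(y)
    variance = Fraction(0)
    for sample in product(range(len(y)), repeat=n):
        prob = Fraction(1)
        for i in sample:
            prob *= p[i]
        estimate = sum(Fraction(y[i]) / p[i] for i in sample) / n
        variance += prob * (estimate - total) ** 2
    return variance


# Hypothetical population of four aggregate values with total T = 20.
y = [2, 3, 5, 10]
T = sum(y)
p_proportional = [Fraction(v, T) for v in y]   # p_i = y_i / T
p_equal = [Fraction(1, len(y))] * len(y)       # equal probabilities

print("variance, p_i proportional to y_i:", pps_estimator_variance(y, p_proportional, 2))
print("variance, equal probabilities:    ", pps_estimator_variance(y, p_equal, 2))
```

In practice the y_i are unknown, which is why the text turns next to probabilities proportional to some known measure of size.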
These considerations led Hansen and Hurwitz to consider selecting
sampling units with probabilities proportional to some measure of size
so as to reduce the sampling variance of the estimator.
The scheme that
they proposed was essentially a stratified two-stage scheme, selecting
one primary unit per stratum at the first stage with probabilities proportional to some measure of size, and at
the second stage selecting the
elements in each selected primary unit with equal probabilities and without replacement.
An unweighted estimator was shown to be unbiased and
to have a smaller variance than if the sampling plan was based on equal
first-stage selection probabilities.
The appearance of this article by Hansen and Hurwitz stimulated
attempts to generalize the approach, both in terms of estimators
involving varying probabilities (rather than being restricted to equal
selection probabilities as had most previous studies) and in terms of
selecting more than one first-stage unit per stratum so that the sampling variance of the estimator could be calculated.
Not all of these
papers will be mentioned here as they are not of immediate relevance to
this dissertation.
Sukhatme and Narain (1951) outlined a scheme where the primary
sampling units (p.s.u.'s) were selected with replacement and with
probabilities proportional to their sizes as measured by the number of
sub-units in each primary unit.
Then the second stage units were selected
without replacement and with equal selection probabilities.
They
presented the theory, and also compared the efficiency of the following
two schemes:
(A) select a random sample of $m n_i$ sub-units from the i-th primary
    unit, where $m$ is an integer and $n_i$ denotes the
    number of times the i-th primary unit appears in the sample,
    $\sum n_i = n$, and
(B) select $n_i$ independent random sub-samples of $m$ sub-units
    from the i-th primary unit.
The variances of the sample means are respectively:
\[ V_A = \frac{1}{n}\left[\sigma_b^2 + \sum_{i=1}^{N}\frac{M_i - m}{m M_i}\,p_i\sigma_i^2 - (n-1)\sum_{i=1}^{N}\frac{p_i^2\sigma_i^2}{M_i}\right] \]
\[ V_B = \frac{1}{n}\left[\sigma_b^2 + \sum_{i=1}^{N}\frac{M_i - m}{m M_i}\,p_i\sigma_i^2\right] \]
where $N$ = number of p.s.u.'s in the stratum;
$M_i$ = number of sub-units in the i-th p.s.u.;
$\sigma_b^2$ = the between p.s.u. variance;
$\sigma_i^2$ = the variance within the i-th p.s.u.
Thus in their plan (A) that part of the variance attributable to subsampling is reduced by a fraction of the order of
\[ \frac{m(n-1)}{(\bar{M}-m)N}, \qquad \text{where } \bar{M} = \sum_{i=1}^{N} M_i/N, \]
which, it may be noted, is very nearly equal to the over-all sampling
fraction.
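The size of this reduction can be illustrated numerically. The sketch below assumes the subsampling term of plan (B) is (1/n) sum_i [(M_i - m)/(m M_i)] p_i sigma_i^2 and that plan (A) subtracts ((n-1)/n) sum_i p_i^2 sigma_i^2 / M_i from it, with p_i = M_i / sum(M_i); these are taken from the variance expressions as read here. The population figures and function names are hypothetical. With equal-sized primary units the relative reduction equals m(n-1)/((M-m)N) exactly.

```python
from fractions import Fraction


def subsampling_variance_b(M, p, sigma2, m, n):
    """Subsampling part of the variance under plan (B)."""
    return sum(Fraction(Mi - m, m * Mi) * pi * s2
               for Mi, pi, s2 in zip(M, p, sigma2)) / n


def subsampling_variance_a(M, p, sigma2, m, n):
    """Subsampling part of the variance under plan (A): plan (B) less a
    term arising because repeated primary units are subsampled jointly."""
    reduction = Fraction(n - 1, n) * sum(pi ** 2 * s2 / Mi
                                         for Mi, pi, s2 in zip(M, p, sigma2))
    return subsampling_variance_b(M, p, sigma2, m, n) - reduction


# Hypothetical stratum: N = 10 primary units of M_i = 20 sub-units each,
# arbitrary within-unit variances, m = 2 sub-units per draw, n = 4 draws.
M = [20] * 10
p = [Fraction(Mi, sum(M)) for Mi in M]
sigma2 = list(range(1, 11))
m, n = 2, 4

v_b = subsampling_variance_b(M, p, sigma2, m, n)
v_a = subsampling_variance_a(M, p, sigma2, m, n)
relative_reduction = (v_b - v_a) / v_b
M_bar = Fraction(sum(M), len(M))
print("relative reduction:      ", relative_reduction)
print("m(n-1)/((M_bar-m)N):     ", Fraction(m * (n - 1)) / ((M_bar - m) * len(M)))
print("over-all sampling fraction:", Fraction(m * n, sum(M)))
```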
The estimates of the between and within variances are as follows:
for case (A):
\[ \hat{\sigma}_w^2 = \sum_{i=1}^{v}\sum_{j=1}^{m n_i}(y_{ij}-\bar{y}_i)^2 \big/ (nm-v) \]
\[ \hat{\sigma}_b^2 = \sum_{i=1}^{v}\frac{n_i(\bar{y}_i-\bar{y})^2}{n-1} - \frac{\hat{\sigma}_w^2}{m}\left[\frac{E(v)-1}{n-1}\right], \]
where $v$ = the number of distinct p.s.u.'s in the sample;
for case (B):
the estimates come straight from analysis of variance
considerations, since the sub-samples are drawn independently.
Wilks (1960) raised an objection to the above scheme by noting that
it is conceivable that $m n_i$ could exceed $M_i$; thus the above method
could require observation of more than the total number of secondary
units available.
Wilks suggested that one let $N_i = M_i$ = the number of elements in
the i-th p.s.u. (and consider a reasonable approximation where the
$N_i$ are rounded to the nearest integral multiple of $m$).
Then one is to
draw a sample of $n$ p.s.u.'s in a manner such that a sample of $a_i m$
sub-units is drawn from the i-th p.s.u.,
where the $a_i$ ($i = 1, 2, \ldots, k$) are random variables having the hypergeometric distribution.
This scheme may be regarded as one in which sampling is done without
replacement at both stages, whereas the scheme proposed by Sukhatme and
Narain involves sampling with replacement at the first stage and without
replacement at the second stage.
Wilks suggests that the estimator for the mean be
\[ \bar{y} = \frac{1}{n}\sum_{i=1}^{k} a_i\bar{y}_i = \frac{1}{mn}\sum_{i=1}^{k} m a_i\bar{y}_i, \]
which is self-weighting.
The expression for the estimate of the variance
of this mean is, unfortunately, rather complicated and is given by
Wilks (1960, p. 246).
In the early 1950's many articles appeared which incorporated unequal selection probabilities into the formulation of estimators.
The
majority of these were by Indian authors, and have not received much
attention in this country.
The article that is probably the best known
of the ones that appeared in this interval is that by Horvitz and
Thompson (1952).
In their "Generalization of Sampling Without Replacement
from a Finite Universe" they formulated three classes of linear
estimators for the population total with coefficients for class one dependent on the presence or absence of a unit in the sample, for class
two dependent on the order of draw and for class three dependent on the
particular sample involved.
This article was the first to incorporate
the ideas of what was subsequently formalized as the axiomatic approach
to the formation of classes of estimators, although they did not explore
the logical consequences of this formulation.
These requirements on the
coefficients for the classes will be seen later to be the same as our
class two, class one and class three respectively.
They determined coefficients for each class by imposing (a) the
condition of unbiasedness (that the expected value of the estimator be
equal to the total); and (b) that the coefficients so determined shall
be independent of the properties of the population.
The authors themselves
were aware that they were indicating only three of the possible
classes of linear estimators of the total when sampling a finite population.
It was subsequently shown that there were in fact seven classes
of linear estimators for sampling a finite population with unequal
(general) selection probabilities and without replacement.
It will be
seen in Section 4.0 that these same seven classes can readily be adapted
to the case where the sampling is done with replacement.
Horvitz and Thompson themselves indicate that they did not consider
the general solution of determining a "best linear unbiased" estimator
for the total of a finite population sampled with arbitrary probabilities and without replacement.
Godambe (1955) considered this question
and demonstrated that a uniformly minimum variance unbiased estimator
for the total or mean of a finite population does not exist.
Godambe also put forward a "unified theory of sampling from a
finite population".
He developed a system of notation to indicate the
element by the unit selected on a particular draw and the sequence of
units preceding the individual unit selected (i.e., the particular sample involved). He also defined symbolically a system of probabilities
to handle this case, and proposed a "general" estimator which can be
seen to belong to class seven among the classes developed axiomatically
in this dissertation.
Koop (1957) recognized the systematic approach to the development
of classes of estimators implicit in the works of Horvitz and Thompson
and Godambe, but not directly recognized by those authors.
He posited
three axioms, axioms which are descriptions of physical realities, and
then, in a systematic fashion, derived seven classes of linear estimators.
This approach will be discussed more fully in Section 4.0, and the axiomatic approach applied to the problem of determining classes of estimators for a system of sampling where the probabilities are arbitrary and
the sampling is done with replacement of each unit before the next is
drawn.
In their article, Des Raj and Khamis (1958) made a comparison
between the arithmetic mean of the distinct units observed in the sample
when it is drawn with replacement, and the arithmetic mean of the totality of observed units including repetitions.
They assumed equal selection
probabilities and made the comparison for both the case when the
sample size is fixed and the number of distinct units is random, and the
case when the number of distinct units is prespecified.
Basu (1958), in his article "On Sampling with and without Replacement", written independently about the same time, made the same comparison as did Des Raj and
Khamis, but not by an analytic method as they did.
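The comparison made in these articles can be reproduced by exact enumeration for a small population. The following is only an illustrative sketch under equal selection probabilities with a fixed number of draws; it is not the analytic method of Des Raj and Khamis, and the population values and function name are hypothetical. Both means are unbiased for the population mean in this setting, so each variance is accumulated as a mean squared deviation about that mean.

```python
from fractions import Fraction
from itertools import product


def compare_means(values, n):
    """Exact variances of (a) the mean of all n draws, repetitions included,
    and (b) the mean of the distinct units, for equal-probability sampling
    with replacement from a finite population."""
    N = len(values)
    prob = Fraction(1, N ** n)              # every ordered sample is equally likely
    pop_mean = Fraction(sum(values), N)
    var_all = Fraction(0)
    var_distinct = Fraction(0)
    for sample in product(range(N), repeat=n):
        mean_all = Fraction(sum(values[i] for i in sample), n)
        distinct = set(sample)
        mean_distinct = Fraction(sum(values[i] for i in distinct), len(distinct))
        var_all += prob * (mean_all - pop_mean) ** 2
        var_distinct += prob * (mean_distinct - pop_mean) ** 2
    return var_all, var_distinct


# Hypothetical population of N = 4 units, n = 3 draws.
var_all, var_distinct = compare_means([1, 2, 4, 9], 3)
print("variance, mean of all draws:     ", var_all)
print("variance, mean of distinct units:", var_distinct)
```

For this population the mean of the distinct units has the smaller variance, which is the direction of the result discussed in the articles cited.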
Roy and Chakravarti (1960) acknowledged the researches of Godambe
(1955), Des Raj and Khamis (1958) and Basu (1958) and said they were
going further, obtaining an "admissible" estimator, together with a
"complete class of estimators" for a very general scheme of sampling.
This very general scheme which they propose has some exotic properties;
however, it appears that they have induced generality by deliberately
leaving some details unspecified.
Their estimator can be shown to
belong to class two.
Godambe (1960) also demonstrated the "admissibility" of an estimator which is algebraically equivalent to that proposed by Roy and
Chakravarti when the same restrictions are imposed.
Godambe's estimator
is the same as given for the class two estimator in Section 4.5.
This article will also be discussed at length later, in Section 5.0.
Since this dissertation is discussing principally sampling with
replacement, the following additional recent articles are of interest.
Nanjamma et al. (1959) propose a scheme of sampling with replacement
which leads to an unbiased estimator.
Their scheme is to select one
unit with probability proportional to some auxiliary variable x, replace
it and then select the rest of the sample units from the whole population with equal probability with replacement at each draw.
For this selection procedure the ratio estimator, $\hat{R} = \bar{y}/\bar{x}$,
is shown to be unbiased for estimating the population ratio R.
The sampling variance
and an unbiased estimator thereof would be different from those in the
case of sampling with equal probabilities without replacement of the
units.
The variance estimator they give as:
\[ \hat{V}(\hat{R}) = \hat{R}^2 - \frac{\sum_{i=1}^{v} n_i(n_i-1)y_i^2 + \sum_{i \neq j} n_i n_j y_i y_j}{n(n-1)\,\bar{x}\,\bar{X}}, \]
where $\bar{X}$ is the known population value.
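Both claims, the unbiasedness of the ratio estimator and of its variance estimator, can be verified by exact enumeration on a small population. The sketch below makes two assumptions that should be read as such: the first unit is drawn with probability x_i/X and the remaining n-1 units with equal probability and with replacement, and the population value in the denominator of the variance estimator is the population mean of x. The cross-product sum over pairs of distinct draws is the same quantity as the numerator written in terms of the multiplicities n_i. The data and function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def samples_with_probabilities(x, n):
    """All ordered samples: first draw with probability proportional to x,
    remaining n - 1 draws with equal probability, all with replacement."""
    N, X = len(x), sum(x)
    for sample in product(range(N), repeat=n):
        yield sample, Fraction(x[sample[0]], X) * Fraction(1, N ** (n - 1))


def ratio_estimate(x, y, sample):
    """Ratio of the sample total of y to the sample total of x."""
    return Fraction(sum(y[i] for i in sample), sum(x[i] for i in sample))


def variance_estimate(x, y, sample):
    """R_hat^2 minus an unbiased estimate of R^2 built from cross-products."""
    n, N = len(sample), len(x)
    x_bar = Fraction(sum(x[i] for i in sample), n)
    X_bar = Fraction(sum(x), N)     # assumed: known population mean of x
    cross = sum(y[sample[k]] * y[sample[l]]
                for k in range(n) for l in range(n) if k != l)
    return ratio_estimate(x, y, sample) ** 2 - Fraction(cross, n * (n - 1)) / (x_bar * X_bar)


# Hypothetical population of N = 4 units, n = 2 draws.
x = [1, 2, 3, 4]
y = [2, 3, 7, 8]
R = Fraction(sum(y), sum(x))
expected_R_hat = sum(p * ratio_estimate(x, y, s) for s, p in samples_with_probabilities(x, 2))
true_variance = sum(p * (ratio_estimate(x, y, s) - R) ** 2 for s, p in samples_with_probabilities(x, 2))
expected_V_hat = sum(p * variance_estimate(x, y, s) for s, p in samples_with_probabilities(x, 2))
print("R:", R, " E[R_hat]:", expected_R_hat)
print("V(R_hat):", true_variance, " E[V_hat]:", expected_V_hat)
```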
They also mention another
modification of the probability proportional to size (pps) with replacement scheme which has the first unit selected with probability proportional to the size of the x-characteristic (ppx), replaced, and then the
remaining (n-1) units selected with replacement with ppz, where z is
another indicator of size.
The estimator for this case is algebraically
the same as that of the usual biased ratio estimator in the case of complete pps with replacement sampling, to wit:
[displayed equation not recoverable from the source]
which is unbiased for the ppx-ppz scheme by virtue of the new probability system.
Stevens (1958) postulated a scheme whereby sampling with replacement could be made equivalent to sampling without replacement, thus
taking advantage of the simpler probabilistic manipulations.
He showed
that sampling without replacement with pps can be achieved if the sampling units are grouped with reference to size.
Then when the same unit
is selected a second (or more) time, it is substituted by another unit
of the same size chosen at random.
The estimate of the population total
is then formally the same as when sampling is done with replacement.
Des Raj (1958) compared the efficiency of an estimator for the case
of sampling with probability proportional to size and with replacement
with the efficiency of some alternative methods such as:
simple average
(simple random sampling), ratio, regression, proportionate allocation
stratified and optimum allocation stratified sampling.
Zarkovic (1960),
in making essentially the same comparison, added difference estimates
and dropped the optimum allocation stratified sampling estimate.
One final aspect of the literature apropos to this dissertation is
the relative absence of any consideration of estimation criteria (other
than unbiasedness and minimum variance) applicable directly to samples
from finite populations.
By this is meant the absence of criteria which
do not depend on artificial devices such as letting the size of the
finite population approach infinity.
For instance, Madow (1948) claims
that under very broad conditions the usual theorems concerning the
limiting distributions of estimates hold for estimates based on samples
taken from finite populations, at random without replacement.
He also
states that under the same conditions, the same conclusions are true for
samples drawn with replacement, if the approach to infinity by the size
of the "finite" universe is within the limitations imposed by "condition
w". In his paper, Madow "proves" that the limiting distribution of the
mean is normal "provided only that as the universe increases in size,
the higher moments do not increase too rapidly as compared with the
variance, and that for sufficiently large sizes of sample and population
the ratio of n to N is bounded away from one."
Another frequently used conceptual device [see, for instance,
Cochran (1946), Des Raj (1958)] is to make the assumption that the
finite population itself is a random sample from an infinite superpopulation, making the sample a second-stage sample.
Using "consistency" as an illustration, this being a universally
accepted desirable criterion for any estimator, very few authors use a
definition applicable to samples from finite populations.
For instance, in the textbooks on sample survey theory:
Yates (1953) does not bother
to give a definition; Cochran (1953), Hansen, Hurwitz and Madow (1953),
and Sukhatme (1953) all give the "infinite" definition involving convergence in probability.
Cochran (p. 13) does actually give a "finite"
definition of consistency, but in the next paragraph he returns to the
convergence in probability definition saying that "the idea of consistency does not play an important part in the subsequent exposition."
Only Deming (1960) uses a suitable definition, although he does not
state it explicitly but refers to Fisher's paper "On the Mathematical
Foundations of Theoretical Statistics" (1921).
He does make the statement,
though, that "the bias of this estimate is inconsistent, i.e., the
bias if any does not diminish to zero as $n_i$ approaches $N_i$" (p. 320).
This whole question of estimation criteria for samples from finite
populations is discussed in Sections 3.3 and 3.4.
3.0 ON THE BASIC CRITERIA FOR A THEORY OF SAMPLING
3.1 Components of the Sampling Problem
It has been said that sample survey theory is easy because it deals
mainly with the estimation of means or totals and the variance of these
estimates.
This statement is made in spite of the multitude of problems
which can beset a sampler in real life situations, in spite of a bewildering maze of formulae which can be present for a very involved
multi-stage survey; and also in spite of complex formulas and difficult
terminology which often confuse the practitioner in the field and those
trying to glean some knowledge from the report of the survey.
The two
conflicting viewpoints arising in the above situations would seem to be
explained if the first is attributed to a non-sampler who is looking at
sampling from the broad spectrum of the traditional approach to estimation and the second to the practicing sampler who sees the
multitude of problems that arise when actually conducting a survey.
The resolution of these viewpoints would be very difficult, since
many aspects of the traditional (infinite) approach to estimation do not
hold when applied to the sampling of a finite population.
(By the term
finite is meant a size below that which might be categorized as "indefinitely large", for which the infinite theory would hold, at least
approximately.)
In the study of the theory behind various aspects of sampling a
conglomeration of problems may be encountered: one can select the sample systematically, purposefully or probabilistically; one can have an
unrestricted sample or one can stratify, or use clusters, chunks or
quotas; one can have equal, unequal, arbitrary or judgement probabilities or probabilities proportional to certain measures of size; one
can have a single-stage or multiple-stage sample; one can use mean-per-sampling-unit, regression, ratio or more elaborate estimators; one can
use biased (deliberately or accidentally) or unbiased estimators; one
can study the effects of response and non-response errors; and so forth.
But behind all these related or unrelated aspects of sampling there are
five components of any sampling plan, the first three of which, at least,
must be specified prior to any theoretical or empirical investigation.
First and foremost there must be a well defined UNIVERSE; a
universe which consists of the totality of ultimate units of analysis
about which information is desired and which is invariant under further
considerations of the particular sampling investigation being carried
out. For the universe one must next specify the FRAME, i.e., a description (e.g., by maps) and/or listing of all sampling units (each containing one or more units of analysis) which comprise the universe or a
sufficient portion thereof, if the sampling operation is planned in
several stages, to conduct the survey. For a full discussion of the
concept of the "frame" see Deming (1960, ch. 3).
This dissertation will be concerned with a single universe from
which the units (i.e., the ultimate units of interest) can be selected
in one stage. Thus the universe under consideration can be said to be
simple. Correspondingly the frame is simply a list.
Given the universe and the frame, next define a PROBABILITY SYSTEM²
for the possible selection of every unit revealed by the frame [see Koop
(1960)]. When the frame is a list, as above, the probability system
will be defined by a single sequence of non-negative numbers which sum
to unity. For more complex frames (those which show the universe in
separate portions and in which the units are in some hierarchal or
nested order) the probability system will be correspondingly complex
and will consist of a sequence of probabilities specific to each unit or
subdivision of the frame (strata, first stage units, second stage units,
etc.). For a geometrical representation of a probability system see
Feller (1957, p. 118 ff).
Then the SAMPLING PROCEDURE comes operationally from the selection
probability system and is the scheme for determining which particular
units are to be drawn for the sample.
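As an illustration only, the list-frame case just described can be sketched in Python. The unit labels, the probabilities and the function names below are hypothetical and appear nowhere in the text; the sketch merely checks that a probability system is a sequence of non-negative numbers summing to unity and then carries out a with-replacement sampling procedure under it.

```python
import random

def validate_probability_system(p, tol=1e-9):
    """Return True if p is a sequence of non-negative numbers summing to unity."""
    return all(pi >= 0 for pi in p) and abs(sum(p) - 1.0) <= tol

def draw_with_replacement(frame, p, n, rng=None):
    """Draw n units from a list frame, independently, with per-draw probabilities p."""
    if len(frame) != len(p):
        raise ValueError("one selection probability is needed per listed unit")
    if not validate_probability_system(p):
        raise ValueError("selection probabilities must be non-negative and sum to 1")
    rng = rng or random.Random()
    return rng.choices(frame, weights=p, k=n)

frame = ["U1", "U2", "U3", "U4"]   # hypothetical unit labels
p = [0.1, 0.2, 0.3, 0.4]           # hypothetical selection probabilities
sample = draw_with_replacement(frame, p, n=5, rng=random.Random(1961))
```

Because the draws are independent, the same unit may appear more than once in `sample`; changing `p` changes the probability system and hence, in the sense of this section, the whole problem.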
And also, for every logical combination of a specific frame and a
specific probability system, there is a specific problem of determining
an ESTIMATOR; the problem of selecting the arithmetical procedure of
estimation which will "optimally" (in some sense of the word) give the
information desired from the survey in the first place, i.e., the estimates of the population values of the characteristics under observation.
² The use of the word "system" follows the usage of Carmichael
(1937) who states "A set of objects, with the associated rule or rules
of combination, is called a system, or, more explicitly, a mathematical
system." Thus the use of the term system is intended to connote not
only the simple $P_i$ values but also any applicable associated rules of
combination necessary for full specification.
Schematically the directions of influence between the five components can be represented by the following diagram:

UNIVERSE ---> FRAME ---> PROBABILITY SYSTEM ---> SAMPLING OPERATION ---> ESTIMATOR

[Any further arrows in the original diagram are not legible in this copy.]
Given the frame and the probability system, one may be able to get
an "optimum" sampling plan and an "optimum" estimator.
Vary either the
frame or the probability system, or both, and the problem of getting the
sample and estimates, or comparing various methods for obtaining the sample and estimates, changes.
Problems of choice within the last two components, i.e., the sampling operation and the estimation process, constitute most sampling research, and are the source of the statements
that the study of sample survey theory is rather difficult and frequently involved in algebraic complications.
But all five components must
be spelled out in detail for any individual survey.
Further, the first
three components must be specified accurately and completely, for no
amount of refinement or elaborateness in the last two can overcome defects in the first three, e.g., definition or delineation of the frame
or selection probabilities.
It might be noted here that the formulation of these five components has ignored several other parts of any sampling problem, as
important as the five given, but which depend on the individuals
planning the survey, and not on the process itself. These non-probabilistic problems are involved with the mechanical process of
accumulating the sample data, and would include requirements that the
objectives of the study are well defined, that the appropriate questionnaire is designed to obtain the desired information in a manner which
can be used, that the answers obtained are to the questions on the survey questionnaire as designed, and not as interpreted by the interviewer, and that the units actually interviewed are the units selected by
the sampler designing the survey.
The neglect of these ideas, fundamental to the study of sample
survey theory, is a great source of confusion and difficulty in much of
the research into comparisons of sampling methodologies done thus far.
3.2 On the Question of Sampling with or without Replacement
As an example of the application of the principles discussed in the
preceding section, the statement can be made that there is no valid direct comparison between sampling with replacement and sampling without
replacement. A comparison between the two is possible, but only on an
indirect, total (or multiple) basis. That is, since completely different probability systems are involved, two complete sampling plans must
be run, with a final judgement as to which is better depending on comparisons of end results for items such as variances and costs involved.
With equal sample sizes, and no consideration of cost, there
is agreement that sampling without replacement is better than sampling
with replacement (using the mean of the sample units as the estimator)
by virtue of a smaller variance, i.e., $(N-n)\sigma^2/nN$ versus $\sigma^2/n$.
However, when cost is considered, the conclusions are not clearcut, for the
cost of the with-replacement sample is dependent, not on the total
sample size, but on the number of distinct units included in the sample.
The problem of making comparisons between the two then involves the distribution of
$v$ (the number of distinct units when sampling with replacement), which is discussed in Section 8.0.
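To give a feeling for the quantity on which the cost of a with-replacement sample depends, the following Python sketch computes the exact distribution of the number of distinct units for the equal-probability case only. It is an illustration under stated assumptions, not the derivation of Section 8.0: the function names and the values N = 10, n = 5 are hypothetical, and the recursion simply tracks, draw by draw, whether a draw repeats a unit already seen or reveals a new one. The closed form $E(v) = N[1 - (1 - 1/N)^n]$ is the standard result for equal probabilities.

```python
from fractions import Fraction

def distinct_units_distribution(N, n):
    """Exact P(v = d), d = 0..N, after n equal-probability draws with replacement."""
    dist = [Fraction(0)] * (N + 1)
    dist[0] = Fraction(1)                  # before any draw, no unit has been seen
    for _ in range(n):
        new = [Fraction(0)] * (N + 1)
        for d, pr in enumerate(dist):
            if pr == 0:
                continue
            new[d] += pr * Fraction(d, N)              # draw repeats a seen unit
            if d < N:
                new[d + 1] += pr * Fraction(N - d, N)  # draw reveals a new unit
        dist = new
    return dist

def expected_distinct(N, n):
    """Closed form E(v) = N[1 - (1 - 1/N)^n] for equal selection probabilities."""
    return N * (1 - Fraction(N - 1, N) ** n)

dist = distinct_units_distribution(N=10, n=5)
mean_v = sum(d * pr for d, pr in enumerate(dist))
```

The mean of the enumerated distribution agrees with the closed form, so a cost function that is linear in the number of distinct units can be evaluated from `expected_distinct` alone.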
Apropos of this discussion of with versus without replacement, two
articles already mentioned in Section 2.0 will now be discussed briefly:
that by Des Raj and Khamis (1958) and that by Basu (1958).
Des Raj and Khamis compare the arithmetic mean of the distinct
units in the sample,
$$\bar{y}_v = \frac{1}{v}\sum_{i=1}^{v} y_i ,$$
with the arithmetic mean of the totality of observed units,
$$\bar{y}_n = \frac{1}{n}\sum_{i=1}^{v} k_i y_i ,$$
where $k_i$ is the number of times the i-th unit appears in the sample.
For the two cases that they
examine, the applicability of their results is restricted by assuming
equal selection probabilities ($P_i = 1/N$).
For their case A (n fixed, v a random variable) they then have a
neat algebraic inequality to show that $V(\bar{y}_v) \le V(\bar{y}_n)$.
[The displayed inequality is illegible in this copy.]
Thus for the restrictive case of sampling with replacement and with
equal selection probabilities, Des Raj and Khamis have shown that the
arithmetic mean of the distinct unit characteristic values in the sample
has smaller variance than the arithmetic mean of the totality of observed variate values. (Actually the strict inequality only holds for
$n > 2$, but for $n = 1$ no estimate of the variance is possible.)
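The comparison can be checked on a small case by brute force. The sketch below is an illustration only, with a hypothetical three-unit universe and hypothetical function names: it enumerates every equally likely ordered sample drawn with replacement and computes the exact variance of each mean. Both means are unbiased under equal selection probabilities, so each variance is taken about the population mean.

```python
from fractions import Fraction
from itertools import product

def exact_variances(y, n):
    """Exact variances of the mean of all observed units and of the mean of
    the distinct units, over all N**n equally likely ordered samples."""
    N = len(y)
    samples = list(product(range(N), repeat=n))
    prob = Fraction(1, len(samples))
    pop_mean = Fraction(sum(y), N)
    var_all = Fraction(0)
    var_distinct = Fraction(0)
    for s in samples:
        mean_all = Fraction(sum(y[i] for i in s), n)
        distinct = set(s)
        mean_distinct = Fraction(sum(y[i] for i in distinct), len(distinct))
        var_all += prob * (mean_all - pop_mean) ** 2
        var_distinct += prob * (mean_distinct - pop_mean) ** 2
    return var_all, var_distinct

y = [1, 2, 4]                                  # hypothetical characteristic values
var_all_3, var_distinct_3 = exact_variances(y, n=3)
var_all_2, var_distinct_2 = exact_variances(y, n=2)
```

For n = 3 the mean of the distinct units has the strictly smaller variance; for n = 2 the two means coincide on every sample, so the variances are equal.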
Basu (1958), in an article entitled "On Sampling with and without
Replacement", attempted the same comparison that Des Raj and Khamis made,
utilizing an "indirect proof" of the inequality
$$E\left(\frac{N-v}{N-1}\cdot\frac{\sigma^2}{v}\right) < \frac{\sigma^2}{n} \qquad (n > 1).$$
(Note that had Basu used a definition for $\sigma^2$ which used N-1 as a divisor
rather than N, the left-hand expression would have simplified considerably from the standpoint of taking expectations.) The proof of this
inequality is not apparent. For the case of equal selection probabilities, the conclusion of his argument runs as follows (with notational
changes to correspond with the above):
"Since $\bar{y}_n$ is an unbiased estimator of $\bar{Y}$, it follows at once
that $\bar{y}_v$ is also unbiased. It also follows that, for any convex
(downwards) loss function, $\bar{y}_v$ has a uniformly better risk function than $\bar{y}_n$. In particular $V(\bar{y}_v) \le V(\bar{y}_n)$, the sign of the
equality holding only when n = 1. Thus the inequality is proved.
We may note in passing that T (the vector of distinct observations) is a sufficient statistic here although not a complete
one. No uniformly best unbiased estimator of $\bar{Y}$ exists."
(1958, p. 290).
Basu's argument for the general case of arbitrary probabilities also
rests on the idea of sufficiency and he claims that the same inequality
holds. But the concept of sufficiency is not relevant for finite populations (see Section 3.3.3), so where does the argument rest? Whether
or not the inequality does hold in fact, merely stating an intuitive
belief does not constitute proof. It can be conceded that the vector of
distinct observations does, in a physical sense, contain all the information in the sample, but with selection probabilities and possible observational weights necessary for estimation known in advance and independent of the characteristics under study, or determined by counting the
appearances of the units, a mere statement of sufficiency does not
constitute a proof, unless one is redefining sufficiency.
From the above arguments, should one be restricted to sampling
without replacement and forget entirely sampling with replacement? This
question has not been answered. This is not the question actually
attacked by any of the authors, or what was actually proved in the one
case. The question of whether one should sample with or without replacement, as does the question of the numerical structure of the selection probability construct, arises in the consideration of the
probability system to be used in a given survey problem. The decision
may be made on the basis of choice, or may be dictated by external circumstances, but once specified cannot be altered without changing the
entire problem. And it is a decision which must be made before one proceeds to the steps of selecting an "optimum" sampling procedure or an
"optimum" estimator.
It is undoubtedly for these reasons that the various authors who
consider the question of sampling with or without replacement start out
saying this is the comparison they are making, but actually make a comparison between using an estimator based on the totality of observed
units and one based on just the distinct units drawn in the sample of n,
both for the case of sampling with replacement. As said earlier, a
comparison is possible, but only by duplicating the entire sampling plan
and then comparing end results, remembering that the costs involved are
behind every step of every comparison.
3.3 The Applicability of Traditional Estimation Criteria
From the above arguments, then, there are five components of any
sampling plan, all of which are essential to the estimate which is
finally obtained.
Of these five components
Universe,
Frame,
Probability System,
Sampling Operation, and
Estimator
the first three must be completely specified before any problem involving the last two can be discussed.
The problems involved in obtaining
an "optimum" sampling plan for any given situation, subject to considerations such as costs, expediency, etc., will not be discussed in this
dissertation.
When one comes to the position of deciding on an estimator to be
used to arrive at an estimate of the desired mean or total (or other
population value), one can choose from within a variety of specific
estimators for a given situation.
However, behind this choice of a
specific estimator lies the problem of determining which one is "best"
for the purposes at hand, or even deciding what criteria should be used
in resolving the question of bestness. Neyman (1952, p. 158) made the
following comment along these lines:
"While there is likely to be general agreement as to the
desirability of using the best, or at least a satisfactory,
method of making assertions regarding $\theta$, there may be difficulty in explaining exactly what properties a method of
estimation should possess in order to qualify as the 'best'
or as 'satisfactory'. And without having such an exact
explanation, without knowing exactly what we are looking
for, it is obviously hopeless to expect that we shall
ever find it. If it were possible to devise a method
of using the values of the observable random variables
to predict exactly and without fail the value of the
estimated parameter, then there would be universal agreement that the method in question is the best imaginable.
However, it is obvious that, barring some very artificial
examples, such a method does not exist and we have to put
up with unavoidable errors."
With this "unavoidable error" thus present in any estimate, what criteria are to be used for determining the choice of estimator?
This
question is particularly appropriate to the problem of estimation based
on a sample from a finite population.
The traditional, or classical,
approach to this question of criteria for estimators has been based on
concepts developed for and largely applicable to infinite populations,
and samples therefrom.
Fisher's magnum opus on estimation (1925) posited that:
"Any body of numerical observations . . . may be interpreted as a random sample of some infinite hypothetical
population of possible values. Problems of estimation
arise when we know, or are willing to assume, the form
of the frequency distribution of the population, as a
mathematical function involving one or more unknown
parameters, and wish to estimate the values of these
parameters by means of the observational record available. A statistic may be defined as a function of the
observations designed as an estimate of any such parameter. The primary qualifications of satisfactory statistics may most readily be seen by their behaviour
when derived from large samples." (p. 701)
From this beginning, then, the criteria for determining "bestness" in
estimators have been developed as if all estimators that might be questioned are based on samples that came from an infinite population.
But what of the problem of estimation based on samples from finite
populations?
Fisher makes the statement that estimation problems arise
when one knows or is willing to assume the form of the frequency distribution of the population.
However, in sample survey theory little or
no attention is paid to the abstract distribution of the characteristics
under observation (abstract distribution meaning, for each characteristic, a summarization by histographic methods to indicate the proportions
of units contained between arbitrarily chosen bounds for the measure of
the characteristic under consideration). For infinite populations the
abstract distribution is identified with a frequency distribution, but
the frequency distribution concept does not yield operational probabilities for sampling purposes, i.e., probabilities of the form f(x)dx
are not very realistic as selection probabilities. To impute the frequency distribution approach, a classicist would use randomization concepts, where there is no discrimination against or preferential treatment for a unit on other than probabilistic considerations.
The problem of estimation in sample surveys is to determine the
method of weighting the sample observations (this being dependent on the
method of selection of the units that comprise the sample, and the known
selection probability system) to produce the "best" estimate of the
desired population value.
What really occurs in sample surveys is this. There is a universe
of units, $U_i$ (i = 1, 2, ..., N), each of which has associated with it a
vector of characteristics, say $Y_i = (Y_{1i}, Y_{2i}, \ldots, Y_{hi})$. One must note
that "i" is not necessarily a simple index, but may be an extended index
with a number of subscripts sufficient to identify the unit in the hierarchal structure of the frame, however complicated it may be. If one
desires to examine the j-th characteristic, $Y_{ji}$ (which will hereafter
be denoted by $X_i$), then a set of units is drawn from among the $U_i$ according to the probabilities prescribed by the system. Then a function
of the characteristics observed for the units included in the sample is
calculated to estimate the mean or total for that particular characteristic for the finite population under study, i.e., calculate
$f(X_i) = \hat{\bar{X}}$ or $\hat{T}$.
Also, to compare alternative estimation procedures, or to "evaluate" the
estimate that this process yields, one may compute a "variance", a function of the form
$f(X_i - \bar{X})^2$, which can be used as a measure of the precision or as a "bestness" indicator.
In the traditional approach to determining the optimality of this,
or a chosen, estimator, one would like it to possess those properties
traditionally held desirable: unbiasedness, consistency, sufficiency and
efficiency. Let us now examine, within the framework given earlier for sampling from
a finite population, each of these concepts in turn to see to what
extent they can be applied to estimates for population values for finite
universes.
3.3.1 Unbiasedness. This is probably the most universally recognized attribute for an estimator. Unbiasedness is concerned with the
distribution of an estimate, and requires that the distribution be
"centered" on the population value (parameter), i.e., that the expectation of the estimator is equal to the population value being estimated.
It should be noted here that the concept of expectation must be
modified to be applied to finite populations. Essentially it can be regarded as an averaging over all possible samples, i.e., "the mean of the
distribution of the estimates X, each X being calculated by the rules
contained in the sampling procedure for all the possible samples that
one can draw by applying the procedure to a given frame" [Deming
(1960)]. One can express this as
$$E(\hat{\theta}) = \sum_{s=1}^{S} \pi_s \hat{\theta}_s ,$$
where
$\hat{\theta}_s$ denotes the value of the estimator calculated from the s-th
sample; and
$\pi_s$ denotes the probability of selecting the s-th sample.
The expectation of an individual unit would be expressed by
$$E(X) = \sum_{i=1}^{N} P_i X_i ,$$
where
$X_i$ is the measurement of the characteristic under study; and
$P_i$ is the probability of selecting the i-th unit on a given draw.
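The averaging over all possible samples can be carried out literally for a very small universe. The following Python sketch is an illustration under assumptions of its own: the three values, the unequal selection probabilities and the two estimators are hypothetical, and the second estimator (each observation divided by N times its selection probability) is used only as a familiar example of a weighting that removes the bias. Sampling is with replacement, so the probability of an ordered sample is the product of its per-draw probabilities.

```python
from fractions import Fraction
from itertools import product

def expectation(estimator, x, p, n):
    """Sum over all ordered samples s of (probability of s) * (estimate from s),
    for n independent draws with replacement and per-draw probabilities p."""
    N = len(x)
    total = Fraction(0)
    for s in product(range(N), repeat=n):
        pi_s = Fraction(1)
        for i in s:
            pi_s *= p[i]                   # probability of this ordered sample
        total += pi_s * estimator(s, x, p)
    return total

def plain_mean(s, x, p):
    """Unweighted mean of the observed values."""
    return Fraction(sum(x[i] for i in s), len(s))

def weighted_mean(s, x, p):
    """Mean of the observed values, each divided by N * P_i (illustrative weighting)."""
    N = len(x)
    return sum(Fraction(x[i]) / (N * p[i]) for i in s) / len(s)

x = [2, 4, 9]                                          # hypothetical X_i
p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]   # hypothetical P_i
pop_mean = Fraction(sum(x), len(x))
```

With these values the unweighted mean has expectation 23/6 against a population mean of 5, while the weighted mean has expectation exactly 5.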
This criterion of unbiasedness certainly can be applied to an estimator based on a sample from a finite population. However, it must fall
into the category of a potentially desired attribute rather than a
universally required one since:
a) if the standard error of the estimator is large, the fact that
   it is unbiased is rather incidental;
b) it is possible that a biased estimator will give a more precise
   estimate, i.e., have a smaller mean square error. The decision
   as to whether or not to require unbiasedness in this situation
   must rest on a consideration of the total error, which arises
   from bias and sampling variation together. In general, however, one should not use a biased estimator unless an upper
   bound can be computed for the bias from known properties of the
   universe in question.
To further cloud the issue, there may be some problems in which unbiasedness of the estimate might be more important than a smaller error,
if, say, large amounts of money, or even life, might be lost on a wrong
decision.
3.3.2 Consistency. The criterion of consistency is less stringent
than that of unbiasedness in that it requires unbiasedness "in the limit".
The traditional and universally accepted definition of a consistent estimator can be stated as:
$f(\mathbf{x}) \xrightarrow{p} \theta$, or $\Pr[\,|f(\mathbf{x}) - \theta| > \varepsilon\,] < \delta$ for $n > N(\varepsilon, \delta)$.
This is the definition that is used or cited almost universally in the
books on sampling. However, the concept of convergence in probability
leaves something to be desired when one thinks of a finite (rather than
indefinitely large) population. Fisher (1956, p. 145), in fact, makes
the comment in his latest book that "the asymptotic definition is satisfied by any statistic whatsoever applied to a finite sample".
For a definition of consistency which applies to samples from
finite populations, it would be best to use Fisher's 1921 definition:
"Consistency.--A statistic satisfies the criterion of consistency if, when it is calculated from the whole population,
it is equal to the required parameter." (p. 310)
This definition is very satisfactory for sample survey theory, and with
this definition, the criterion of consistency is certainly a reasonable
one to require for any estimator.
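Fisher's 1921 definition lends itself to a direct check: apply the statistic to the whole population and compare with the required value. The Python sketch below is an illustration only; the universe and both estimators are hypothetical. The second estimator, with divisor n + 1, has a bias that shrinks as n grows without bound, yet it fails the finite definition because it does not reproduce the population mean when n = N.

```python
from fractions import Fraction

def is_fisher_consistent(estimator, population, parameter):
    """True if the estimator, calculated from the whole population,
    equals the required population value."""
    return estimator(population) == parameter(population)

def mean(values):
    return Fraction(sum(values), len(values))

def shrunk_mean(values):
    # divisor n + 1 instead of n: a hypothetical estimator used for contrast
    return Fraction(sum(values), len(values) + 1)

population = [3, 5, 7, 9]   # hypothetical finite universe
```

Here `is_fisher_consistent(mean, population, mean)` holds, while the same check for `shrunk_mean` fails.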
3.3.3 Sufficiency. Sufficiency, at least in the traditional sense,
requires that the whole of the relevant information (not the current
popular usage of "information") available in the sample will be contained in, or utilized by, the estimator(s) which is (are) computed. It
was in this general sense that Fisher first defined sufficiency in 1921,
i.e.
"Sufficiency.--A statistic satisfies the criterion of sufficiency when no other statistic which can be calculated from
the same sample provides any additional information as to
the value of the parameter to be estimated." (p. 310)
or
". . . sufficiency, which latter requires that the whole of
the relevant information supplied by a sample shall be contained in the statistics calculated." (p. 367).
From these first general statements a more formal definition has come
into universal usage, this definition being, as given in Fraser (1958,
p. 218):
"We have the definition:
A statistic t(x) is a sufficient statistic, if, given
the value of t(x), the conditional distribution is
independent of the parameters.
Evaluating conditional distributions can often be
tedious. Fortunately we have a criterion that avoids this:
A statistic t(x) is a sufficient statistic if and
only if the probability or density function can be
factored, [. . .] into two parts, one dependent only on the statistic
and the parameters, the second independent of the
parameters."
Sufficiency, then, for the infinite population case is definitely to be
aimed at, although not always obtainable. For a finite population, however, one cannot admit this concept as being relevant in view of the
considerations set forth below.
In a special sense every sample of any size whatsoever is sufficient for estimating the desired population value. Firstly, surveys are
interested in estimating means, totals, ratios or other functions of the
measurable characteristics revealed by the ultimate units. These population values (which may, only by convention, be termed parameters) are
logically separate from their respective selection probabilities as revealed by the probability system. Secondly, since probabilities enter into
sampling only in the process of selecting the units to be included in
the sample, and not with the characteristics to be measured, the conditional distribution of the sampled characteristic values from any size
sample, depending only on these selection probabilities and the sampling
procedure, is independent of the population value being estimated. Thus
the concept of sufficiency is not relevant, at least in the context of
the universally accepted definition as quoted above from Fraser. (One
might note that this logical separateness of the population values and
the selection probabilities is an essential feature of sample survey
theory. Without this separateness no sampling operation would be possible and therefore there would be no meaningful theories of sampling.)
Reference might be made to the original definition given by
Fisher, but that is too general and subject to the same type of criticism as just given against the more complete (complex) definition.
The one place that the traditional definition of sufficiency might
fit would be for the case when sampling from a finite population with
replacement where the vector of ni-values (with ni denoting the number
of times that the i-th unit appears in the sample) would be sufficient
for estimating the probabilities of selection of the sampling units.
However, these are known or assumed at the start of the sampling procedure, and have no bearing on the characteristics carried by these units
which are being measured.
Thus one must conclude that the notion of sufficiency has little
meaningful application to estimators based on samples from finite populations. Actually Basu (1958) is the only author to argue strongly for
sufficiency; other authors have been silent on the question since (presumably), as indicated, its logical basis is rather insecure for finite
populations.
3.3.4 Efficiency and Minimum Variance. Efficiency seems to be one
of those concepts for which every author has his own definition, the
common denominator of which seems to be a connection with the idea of
minimum variance; hence they will be discussed together. The one set of
definitions which is directly akin to the problem at hand of choosing
criteria for estimators from finite populations is that an efficient
estimator is that one from among several satisfying a set of other criteria which has minimum variance. That one is taken to be the most
efficient, and the relative efficiency of the other estimators is measured by the ratio, less than unity, of the minimum variance to their
variance. This notion, as indicated, extends directly to the problem
of estimating from a sample taken from a finite population, since it is
entirely logical that if there is a choice of estimators, the obvious
selection would be the one with the smallest spread or variance.
The other group of definitions centers around asymptotic ideas, and
is one place where Fisher and Neyman agree, to wit:
"Here, again, I agree unreservedly with Fisher that, when
several consistent estimates of the same parameter are
available, all tending to be normally distributed, the
one with the smallest variance is preferable to others."
--Neyman (1952, p. 188)
The dependence on asymptotic normality rules out this definition. In
the case of a single universe of N units, the most that n can tend
to, when sampling with replacement, is N, which cannot yield a normally distributed estimator.
While discussing minimum variance, a digression might be made from
the main stream of thought to set the historical record straight with the
following quote from Neyman (1952, p. 227):
"Laplace himself studied certain problems on the assumption
that the loss due to an error in estimation is directly
proportional to the absolute value of the error. On the
other hand, Gauss noticed that various results became
more elegant if the loss is assumed to be proportional to
the square of the error committed so that [. . .]
Upon reflecting on the general nature of errors of measurements, in particular on the possibility of systematic
errors, Gauss found it necessary to impose on the estimate
$F_n(X_n)$ another condition, that of unbiasedness, expressed
by the identity, [. . .]
It will be seen that the two conditions, one of the
unbiasedness of $F_n(X_n)$ and the other of minimum
expected loss measured by the square of the error, formulate the now familiar problem of best unbiased estimates.
All this was reported to the Königliche Societät der
Wissenschaften in Göttingen on February 15, 1821, and
subsequently published in Latin."
Of the classical criteria of estimation, these two, which antedate almost
all statistical theory, are about the only ones that apply to finite
population problems.
Further, as shown by Das (1951), Godambe (1955) and others, minimum
variance estimators may not exist in an estimable form, because the coefficients or weights for the observations necessary to produce a minimum variance estimator may be enmeshed with the other variate values.
3.4 Criteria for Estimators from Finite Populations
With all this, then, where does this subject stand? What criteria
can or should be applied to determine which estimator is optimum when
the sample is from a finite population? Judging from the frequency of
mention in the literature, there would seem to be the following:
1. Consistency: The chosen estimator should be consistent, not
in the sense of convergence in probability to the population value being
estimated, but in the sense of:
Definition: A statistic satisfies the criterion of consistency
if, when it is calculated from the whole population, it is equal to the required population value.
The more restrictive condition of unbiasedness could well be listed as a
universal criterion, if it were not for the fact that there may be situations where an estimate with a disappearing bias will better meet
other criteria for "bestness". If there is no bias, then consistency is
assured. If a bias is to be allowed to be present, however, one should
be able to determine an upper limit for that bias in terms of some characteristics of the sample or population.
2. Minimum variance or minimum mean square error: In the case of
an unbiased estimator, these two are the same, but in the general case
they are related in the following manner: $\mathrm{MSE} = V + (\mathrm{Bias})^2$. Except in
the case where there are compelling reasons for ignoring this criterion,
it is certainly evident that one would want an estimator which gave as
narrow a spread to the estimates of the population value as possible.
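The relation between mean square error, variance and bias can be verified exactly on a small case. The sketch below is illustrative only and rests on assumptions of its own: equal-probability sampling with replacement, a hypothetical three-unit universe, and a deliberately biased estimator (the sample total divided by n + 1) chosen purely so that the bias term is not zero.

```python
from fractions import Fraction
from itertools import product

def moments(estimator, x, n):
    """Exact bias, variance and mean square error over all N**n equally
    likely ordered samples, the target being the population mean."""
    N = len(x)
    target = Fraction(sum(x), N)
    estimates = [estimator([x[i] for i in s]) for s in product(range(N), repeat=n)]
    prob = Fraction(1, len(estimates))
    expect = sum(prob * e for e in estimates)
    variance = sum(prob * (e - expect) ** 2 for e in estimates)
    mse = sum(prob * (e - target) ** 2 for e in estimates)
    return expect - target, variance, mse

def shrunk_mean(values):
    return Fraction(sum(values), len(values) + 1)   # deliberately biased

bias, variance, mse = moments(shrunk_mean, [1, 2, 4], n=2)
```

The three quantities are computed independently of one another, so the identity is a genuine check rather than a restatement.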
It would seem that these are the two major criteria, at least from
among those based on probabilistic considerations. They have their relative importance determined by the given particular situation at hand.
Both are to be desired, but there may be situations where one or the
other of them is an overriding consideration to the detriment of the
other, for example, where unbiasedness is of such importance that variance is taken at face value rather than considered a restriction. Or,
as indicated earlier, a minimum variance estimator might not exist in an
estimable form, although one might then modify "minimum" to choose the
estimator with the smaller variance.
There are two other criteria which might be mentioned, although
they are of lesser rank than consistency and minimum mean square error,
and not based on probabilistic considerations. These are:
3. Cogredience (Independence of scale): To satisfy this criterion,
our estimate $f(\mathbf{x})$ must have the property $f(c\,\mathbf{x}) = c\,f(\mathbf{x})$. For example,
if two people are estimating lengths from the same observations, one
measuring in feet and the other in meters, we would like them to get
equivalent estimates, expressed in feet and meters respectively. (This
really should be taken care of in interviewer instructions.)
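The feet-and-meters example can be written out as a small check. This is a sketch only: the lengths, the function names and the second estimator are hypothetical, and the check covers one sample and one scale factor rather than proving the property in general.

```python
from fractions import Fraction

def is_cogredient(estimator, x, c):
    """Check f(c x) == c f(x) for one sample x and one scale factor c."""
    return estimator([c * xi for xi in x]) == c * estimator(x)

def mean(values):
    return sum(values) / len(values)

def mean_plus_one(values):
    return mean(values) + 1      # fails: the added constant does not rescale

feet = [Fraction(3), Fraction(6), Fraction(12)]   # hypothetical lengths in feet
to_meters = Fraction(3048, 10000)                 # 1 ft = 0.3048 m exactly
```

The sample mean passes the check, whereas `mean_plus_one` gives the two observers estimates that are not equivalent after conversion.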
4. Ease of Computation: It would seem desirable, all other things
being equal, that an estimate be easy to compute.
The more complicated
the form of an estimator, the more expensive it is to produce estimates
and the more time it may take to get results which can be acted on.
With the advent of the large computers, though, this objection may be
disappearing. Also along these lines, if past history continues, the
process of adapting techniques so that they can be handled on the computers may well indicate that, with further work, simplifications and
short cuts can be developed and approximations found which would serve
most purposes.
The third criterion mentioned would seem to be essential, although,
as mentioned, it should be required before the consideration of estimation problems. Ease of computation should possibly be considered a
desideratum, rather than a criterion, but that is a matter of semantics;
certainly it is not as dominant a criterion as either of the first two
mentioned.
Thus there are two major criteria to apply to the problem of selecting an estimator from among a class of estimators for samples from a
finite population: that it be consistent and that it have a minimum
mean square error.
4.0 THE GENERAL CLASSES OF LINEAR ESTIMATORS
FOR SAMPLING WITH REPLACEMENT
4.1 Introductory Remarks
The first formal approach to the problem of determining classes of
estimators for samples from finite populations was that of Horvitz and
Thompson (1952).
They formulated three classes of linear estimators
for the population total for a scheme of sampling from a finite population with arbitrary probabilities and without replacement. These classes
were formed by having coefficients dependent on the order of draw, the
presence or absence of a unit in the sample, and on the particular sample drawn, respectively.
However, they did not formalize their ideas
for establishing classes of estimators, and thus did not pursue them
further.
Godambe (1955) formulated what he called a "unified theory" of
sampling from finite populations. This theory, actually a generalized
basic theory, was not axiomatic in nature, although Godambe apparently
recognized some essence of formality in his approach and that of Horvitz
and Thompson (but he too failed to formalize the deductive process).
For his theory Godambe did posit a generalized notational system which
could cover both probability systems where the units are drawn with or
without replacement; however, this system is not an operational system
for determining probabilities. It will be seen that Godambe's most
general estimator would fall into class seven under the axiomatic approach presented subsequently.
Realizing that one must have some definite set of rules for establishment of the classes when formulating groups or classes of estimators
for samples from finite populations, Koop (1957) developed an axiomatic
approach.
This axiomatic approach, with axioms based on the physical
realities of sample formation, i.e., the way things actually happen,
would seem much more basic for establishing classes of estimators for
samples from finite populations than attempts at utilizing the classical
"infinite" estimation criteria such as unbiasedness, sufficiency, admissibility, completeness, etc.
In fact, the notion that there are crite-
ria for which one can develop classes of estimators is not germane to
sample surveys.
In sample survey theory, the classes of estimators are
developed first, e.g., by axiomatic methods, and then criteria such as
unbiasedness or minimum mean square error are applied to various estimators in each class to attempt a determination of bestness within that
class.
The generality of the axiomatic approach is also of considerable
theoretical advantage because it provides the basis for determining the
optimum probabilities in any defined sense.
For sampling from finite populations, axioms, to be useful, must
be based on physical realities, since sample survey theory is operational in a physical sense.
These axioms, as postulated by Koop, are
"three features inherent in the nature of the process of selection" of
the sample. They are stated as follows:
"(i) the order of appearance of the elements,
(ii) the presence or absence of any given element (in the sample)
which is a member of the population (or universe), and
(iii) the set of elements composing the sample considered as one
of the total number possible (in repeated sampling according
to the given probability system)." (1957, p. 25)
These three possible features, or combinations thereof, which are inherent in the selection process and therefore sampling procedure, supply
the basis for the deductive construction of seven general classes of
estimators.
The seven result from taking the axioms singly, two at a
time, and, most generally, all three together.
Koop derived the classes of estimators for estimating the total of a
finite population when sampling with arbitrary probabilities and without
replacement.
This thesis will consider the case of sampling from a fi-
nite population with arbitrary probabilities, but with replacement of
each sampling unit preceding the drawing of another unit.
These estimators of the population total (note that the choice
between discussing the total or the mean is completely arbitrary) for a
characteristic under study will be listed and discussed in Sections
4.4 through 4.10 inclusive. For each class of estimator, weights or coefficients will be determined which (a) satisfy the criterion of unbiasedness, (b) are independent of the properties of the population, i.e., of
the measurable characteristic(s) under observation in the sample, and
(c) are positive.
In connection with requirement (b), Koop has shown that, for the
general classes of estimators, minimum variance estimators do exist, but
the weights for such estimators are enmeshed with the variate values of
the characteristics of the sampling units.
Thus, although theoretically
existent, such weights are non-estimable; hence for all practical purposes, minimum variance estimators do not exist.
For this reason this
study will restrict consideration to weights which are independent of
the values of the characteristics under study.
4.2 Probability System and Notation
4.2.1 The Probability System. Consider a population of N sampling units, $U_1, U_2, \ldots, U_N$. Associated with each of these units is
a vector of measurable characteristics, say $Y_i = (Y_{1i}, Y_{2i}, \ldots, Y_{hi})$.
A sample of n of these N units is to be drawn in a manner which is
completely specified before the sampling procedure begins, and from observing certain of the vector characteristics of the sample units it is
desired to estimate the aggregate of these characteristics pertaining to
the universe under consideration.
For drawing the sample it is given
that the probabilities of selection at any given draw are arbitrary
(arbitrary in the sense that they can assume discretionary non-negative
values, not necessarily equal) with the sole restriction that when
summed over all units in the universe they sum to one.
Also for the system under consideration, each unit is required to
be returned to the universe after it is drawn and measured, and before
the next unit is drawn. The case where the sampling is done without
replacement of the units has already been mentioned above. The most
general case where the units may or may not be replaced, depending on
some arbitrary or systematic method of determination, or where they may
be replaced in clumps after a certain number have been drawn, or some
such chaotic situation will not be discussed, for fairly obvious reasons.
Within this framework, then, the following probabilities are to be
considered, with attendant notation.
For an explanation of notation see
Section 4.2.2.
$p_i$ -- the probability of drawing the i-th unit on any given draw. These
$p_i$ values will be constant for all draws. The only restrictions
on the values of the $p_i$ are that $0 < p_i < 1$ and that
$p_1 + p_2 + \cdots + p_N = 1$. Allowing either equality in the bounds on the $p_i$
effectively reduces the size of the universe and thus will not be
considered.
$p_i^* = 1 - (1 - p_i)^n$ = the probability that the i-th unit will appear,
any number of times, in a sample of n units drawn with replacement. $\sum_{i=1}^{N} p_i^* = E(v)$, where v is the number of distinct units
among the n units drawn (see also Section 8.0).
$P_s = \prod_{i \in s_v} p_i^{n_i}$ = the probability of obtaining a given particular
sample, with $\sum n_i = n$.
$P_{s'} = \dfrac{n!}{\prod_{i \in s_v} n_i!} \prod_{i \in s_v} p_i^{n_i}$ = the probability of obtaining a specific combination of units, disregarding order of draw, but with the same
number of appearances of each unit in the sample (i.e., a constant
$n_i$ vector, $n_i \ge 0$). This would be the sum of $n! / \prod_{i \in s_v} n_i!$ $P_s$-terms.
$P_{s_v} = \sum_{P(n|v)} \dfrac{n!}{\prod_{i \in s_v} n_i!} \prod_{i \in s_v} p_i^{n_i}$ = the probability of obtaining a given
distinct sample, that is the set of samples with the same set of
distinct units. The $n_i$ vector is disregarded other than whether the
elements are non-zero. This would be the sum of $\Delta^v 0^n$ $P_s$-terms.
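The statement that the inclusion probabilities sum to the expected number of distinct units can be verified by direct enumeration. The sketch below is illustrative only: the selection probabilities are assumed (they are the ones used in the numerical example of Section 4.5.5), and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumed selection probabilities p_i for a four-unit universe.
P = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]


def inclusion_probs(p, n):
    """p_i* = 1 - (1 - p_i)^n for each unit."""
    return [1 - (1 - pi) ** n for pi in p]


def expected_distinct(p, n):
    """E(v), found by enumerating all N^n particular samples."""
    total = Fraction(0)
    for sample in product(range(len(p)), repeat=n):
        prob = Fraction(1)
        for i in sample:
            prob *= p[i]  # P_s is the product of the draw probabilities
        total += prob * len(set(sample))  # v = number of distinct units
    return total


# The sum of the p_i* equals E(v) exactly.
assert sum(inclusion_probs(P, 3)) == expected_distinct(P, 3)
```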
4.2.2 Notation and Definitions.
Definitions:
Particular sample -- a given individual sample, i.e., the ordered array
of units resulting from the n draws comprising the sample. They
will be $S = N^n$ in number.
Distinct sample -- a sample containing a set of v distinct units, disregarding the number of times each unit appears. A distinct sample
is the set of particular samples with the same $\lambda_i$ vector, where
$\lambda_i = 1$ if the i-th unit appears in the sample any number of times,
$= 0$ for nonappearance. The $n_i$ vector is disregarded other than
whether its elements are non-zero. There will be
$S' = \sum_{v=1}^{n} S_v = \sum_{v=1}^{n} C_v^N$ distinct samples. For each distinct sample of v distinct
units there are $\Delta^v 0^n$ particular samples. [Ref. Riordan (1958,
p. 91).]
Indices (Subscripts):
i -- the unit index for the universe. i = 1, 2, ..., N.
s -- refers to a particular sample, or is the index for summation
over all particular samples. s = 1, 2, ..., S.
$s_v$ -- refers to a distinct sample, or is the index for summation
over all distinct samples. $s_v$ = 1, 2, ..., S'.
t -- the index denoting the order of draw for the sample units.
t = 1, 2, ..., n.
v -- an index for summation over the different possible numbers
of distinct units. It also denotes the number of distinct
units among the n units in the sample ($1 \le v \le n$).
Letters:
n -- the sample size.
$n_i$ -- the number of times the i-th unit appears in the sample;
$\sum_{i \in s} n_i = n$.
S -- refers to particular samples, $N^n$ in number.
S' -- denotes the total number of distinct samples, $\sum_{v=1}^{n} C_v^N$ in
number.
$S_v$ -- denotes the number of distinct samples of size v, i.e., the
samples of n with v distinct units, $C_v^N$ in number for a
given v.
z -- (with a subscript) will be used as a characteristic random
variable to denote appearance of a unit according to the
specific subscript assigned.
$\Delta^v 0^n$ -- the differences of zero notation. $\Delta$ is the finite difference
operator: $\Delta u_n = u_{n+1} - u_n$. Thus the differences of zero
would be
$$\Delta^2 0^n = (2^n - 1^n) - (1^n - 0^n) = 2^n - 2(1^n) + 0^n$$
$$\Delta^3 0^n = [(3^n - 2^n) - (2^n - 1^n)] - [(2^n - 1^n) - (1^n - 0^n)]$$
For additional discussion see Whittaker and Robinson (1944)
or Riordan (1958). See also Section 4.3(3). Tables of $\Delta^v 0^n$
were given by Stevens (1937) and were reprinted by Fisher and
Yates (1949, table 22).
P(n|v) -- denotes the v-part partitions of n, that is, all sets of nonzero values for $n_i$ ($i \in s_v$) such that $\sum_{i \in s_v} n_i = n$. The full
(proper) partitional notation, as given by Chrystal (1900,
p. 556), would be $P(n \,|\, v \,|\, \le n-(v-1))$, i.e., the partitions of
n into v parts no one of which exceeds n - (v - 1), but the
shorter form will be used.
$i \in s_v$ -- those i's (units) contained in a distinct sample.
$s \supset i$ ($s_v \supset i$) -- those samples (distinct samples) which contain the i-th unit.
4.3 Some Combinatorial Considerations
The following are some combinatorial considerations concerning a
sample of size n drawn from a finite population of size N with replacement.
(1) The total number of possible samples is $S = N^n$, since each
unit drawn is replaced prior to the drawing of the next unit.
(2) The total number of distinct samples, i.e., samples containing
different sets of v ($1 \le v \le n$) distinct units, is
$$S' = \sum_{v=1}^{n} S_v = \sum_{v=1}^{n} C_v^N = C_1^N + \cdots + C_n^N$$
since there are $C_v^N$ combinations of v units from a total of N.
(3) The total number of samples of size n which will contain v
distinct units, i.e., the number of ways of putting n different objects
into v different cells, with no cell (among the v) empty, is given by
Riordan (1958, p. 91) as:
$$\Delta^v 0^n = v!\, S(n, v)$$
where S(n, v) is the Stirling number of the second kind. This could
also be written:
$$\Delta^v 0^n = \sum_{j=0}^{v} (-1)^j C_j^v (v - j)^n$$
from which it follows that
$$\Delta^v 0^n = 1 \ \text{for } v = 1, \qquad = n! \ \text{for } v = n, \qquad = 0 \ \text{for } v > n.$$
(4) From paragraphs (2) and (3) another expression for the total
number of possible samples would be:
$$N^n = \sum_{v=1}^{n} C_v^N \Delta^v 0^n.$$
(5) Consider all those samples of size n, each containing v distinct units; the total number which contain the i-th unit is
$C_{v-1}^{N-1} \Delta^v 0^n$.
(6) Thus the total number of particular samples containing the
i-th unit is
$$\sum_{v=1}^{n} C_{v-1}^{N-1} \Delta^v 0^n.$$
(7) If one is given that the sample of n contains v distinct
units, and that one of those units is the i-th unit, there might be some
interest in the number of those $\Delta^v 0^n$ samples which contain the i-th
unit a given number of times.
[Diagrams: the i-th unit occupying r of the n draws, the v-1 other distinct units occupying the remaining n-r draws.]
The number which contain i
once is: $C_1^n \Delta^{v-1} 0^{n-1}$,
twice is: $C_2^n \Delta^{v-1} 0^{n-2}$,
...
n-(v-1) times is: $C_{n-(v-1)}^{n} \Delta^{v-1} 0^{v-1}$.
(8) It follows from (7) that the total number of times that the
i-th unit will appear in the samples of size n with v distinct units
is
$$\sum_{r=1}^{n-(v-1)} r\, C_r^n \Delta^{v-1} 0^{n-r}.$$
From this, the total number of times that the i-th unit will appear in
all $N^n$ samples (i.e., v = 1, 2, ..., n) will be:
$$I = n + \sum_{v=2}^{n} C_{v-1}^{N-1} \sum_{r=1}^{n-(v-1)} r\, C_r^n \Delta^{v-1} 0^{n-r}.$$
I, as a total quantity, can be derived much more simply by noting that
the number of appearances of a particular unit, say the i-th, among the
$n N^n$ units which appear in the $N^n$ samples is symmetric in the N units,
and thus
$$I = n N^n / N = n N^{n-1}.$$
This approach, however, does not provide any information concerning the
component structure of I.
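These counts lend themselves to a brute-force check for small N and n. The following sketch is an illustration under stated assumptions rather than part of the original argument: `diff_of_zero` uses the standard inclusion-exclusion expression for the differences of zero, and `check_counts` is a hypothetical helper that compares the counts of paragraphs (4), (6), and (8) with a full enumeration.

```python
from itertools import product
from math import comb


def diff_of_zero(v, n):
    """Delta^v 0^n: ordered samples of n draws using each of v given units."""
    return sum((-1) ** j * comb(v, j) * (v - j) ** n for j in range(v + 1))


def check_counts(N, n):
    """Compare the counts of Section 4.3 with a complete enumeration."""
    samples = list(product(range(N), repeat=n))  # all N^n particular samples
    # (4): N^n expressed through distinct samples and differences of zero.
    total = sum(comb(N, v) * diff_of_zero(v, n) for v in range(1, n + 1))
    # (6): particular samples that contain one given unit (unit 0 here).
    containing = sum(1 for s in samples if 0 in s)
    by_formula = sum(
        comb(N - 1, v - 1) * diff_of_zero(v, n) for v in range(1, n + 1)
    )
    # (8): total number of appearances of that unit over all samples.
    appearances = sum(s.count(0) for s in samples)
    return len(samples), total, containing, by_formula, appearances


# N = 4, n = 3: 64 samples, 37 contain a given unit, which appears 48 times.
assert check_counts(4, 3) == (64, 64, 37, 37, 48)
```

For N = 4 and n = 3 the number of appearances agrees with $n N^{n-1} = 48$.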
4.4 Class One Estimator
4.4.1 The Estimator (of the universe total). The class one estimator, with weights dependent solely on the order of appearance, is given
by
$$T_1 = \sum_{t=1}^{n} a_t x_{i_t} \tag{4.4-1}$$
where $a_t$ (t = 1, 2, ..., n) is the weight attached to the element
selected at the t-th draw and $x_{i_t}$ is the value of the characteristic
measured on the i-th unit observed on the t-th draw.
4.4.2 Number of Weights. The total number of weights is n, one
for each draw.
4.4.3 Determination of the Weights. The first step in the determination of weights is to determine the expectation of $T_1$, as follows:
$$E(T_1) = E\Big(\sum_{t=1}^{n} a_t x_{i_t}\Big) = \sum_{t=1}^{n} a_t E(x_{i_t}) = \sum_{i=1}^{N} x_i p_i \sum_{t=1}^{n} a_t .$$
For $T_1$ to be unbiased $E(T_1)$ must identically equal $\sum_{i=1}^{N} x_i$, i.e.,
$$\sum_{i=1}^{N} x_i p_i \sum_{t=1}^{n} a_t \equiv \sum_{i=1}^{N} x_i$$
which requires that
$$p_i \sum_{t=1}^{n} a_t = 1, \quad \text{for } i = 1, 2, \ldots, N. \tag{4.4-2}$$
This condition effectively says that for $T_1$ to exist as an estimator,
all the $p_i$ must be equal, i.e.,
$$p_i = p = 1/N .$$
Hence the $a_t$ exist only when the $p_i$ are equal and not in the general
case. In this situation, a solution is
$$a_t = N/n, \quad t = 1, 2, \ldots, n \tag{4.4-3}$$
so that
$$T_1 = \sum_{t=1}^{n} \frac{N}{n} x_{i_t} . \tag{4.4-4}$$
This is a well known estimator which is readily seen to be unbiased.
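The dependence of (4.4-2) on equal selection probabilities can be seen in a small calculation. This is a hedged sketch: the values of x and the unequal probabilities are assumed for illustration (they match Case A of Section 4.5.5), and `expected_t1` is a hypothetical helper that evaluates $E(T_1)$ for the weights $a_t = N/n$.

```python
from fractions import Fraction


def expected_t1(x, p):
    """E(T_1) with a_t = N/n, i.e. N * sum_i p_i x_i (the a_t sum to N)."""
    N = len(x)
    return N * sum(pi * xi for pi, xi in zip(p, x))


x = [3, 4, 8, 5]  # assumed unit values; the true total is 20
equal = [Fraction(1, 4)] * 4
unequal = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]

# Unbiased under equal probabilities, biased otherwise.
assert expected_t1(x, equal) == sum(x)
assert expected_t1(x, unequal) != sum(x)
```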
4.4.4 Variance of $T_1$. The variance of $T_1$, for the case when probabilities are equal and when $a_t = N/n$, is
$$V(T_1) = \frac{N^2}{n^2}\, V\Big(\sum_{t=1}^{n} x_{i_t}\Big).$$
Because of independence of the draws, one from another,
$$V(T_1) = \frac{N^2}{n^2}\, n \sigma^2 \tag{4.4-5}$$
which can be estimated by
(4.4-6)
4.5 Class Two Estimator
4.5.1 The Estimator. The class two estimator, with weights dependent solely on the presence or absence of a given element in the sample,
is given by
$$T_2 = \sum_{i \in s_v} \beta_i x_i \tag{4.5-1}$$
where $\beta_i$ (i = 1, 2, ..., N) is the weight attached to the i-th element
whenever it appears in the sample, and where $i \in s_v$ denotes summation
over the distinct units in the sample ($v \le n$).
4.5.2 Number of Weights. The total number of weights is N, one
for each sampling unit.
4.5.3 Determination of the Weights. Since the weights, the $\beta_i$,
are attached whenever the i-th element appears in the sample, and summation is over the distinct units, to determine the $\beta_i$ (4.5-1) must be
rewritten as
$$T_2 = \sum_{i=1}^{N} \beta_i x_i z_i \tag{4.5-2}$$
where $z_i$ is a characteristic random variable for which
$z_i = 1$ when the i-th unit appears in the sample, irrespective
of the number of times it appears,
$z_i = 0$ when the i-th unit does not appear in the sample, and
where, see (8-10), $E(z_i) = p_i^*$.
Then:
$$E(T_2) = \sum_{i=1}^{N} \beta_i x_i E(z_i) = \sum_{i=1}^{N} \beta_i x_i p_i^* .$$
For unbiasedness
$$\sum_{i=1}^{N} x_i \equiv \sum_{i=1}^{N} \beta_i p_i^* x_i$$
which imposes the requirement that
$$\beta_i p_i^* = 1 \quad \text{for all } i.$$
Therefore, for unbiasedness, it is found that the weights are uniquely
determined as
$$\beta_i = 1 / p_i^* . \tag{4.5-3}$$
Thus the unbiased linear estimator for class two is
$$T_2 = \sum_{i \in s_v} \frac{x_i}{p_i^*} \tag{4.5-4}$$
where $p_i^* = 1 - (1 - p_i)^n$
= the probability that the i-th unit appears in a sample of
n units drawn with replacement.
This is the analogue of the Horvitz-Thompson estimator. It also has
been propounded, for the case of equal selection probabilities, by
Godambe (1955) and Roy and Chakravarti (1958).
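4.5.4 Variance of $T_2$. The variance of $T_2$ is given by
$$V(T_2) = V\Big(\sum_{i=1}^{N} \beta_i x_i z_i\Big).$$
Substituting (8-12) and (8-13) into this expression, and also substituting for the $\beta_i$, yields
$$V(T_2) = \sum_{i=1}^{N} \frac{x_i^2\, p_i^* q_i^*}{p_i^{*2}} - \sum_{i \ne j}^{N} \frac{q_i^* q_j^* - q_{ij}^*}{p_i^* p_j^*}\, x_i x_j$$
where $p_i^* = 1 - (1 - p_i)^n$, $q_i^* = 1 - p_i^* = (1 - p_i)^n$, and $q_{ij}^* = (1 - p_i - p_j)^n$.
This can be written more concisely as
$$V(T_2) = \sum_{i=1}^{N} \frac{1 - p_i^*}{p_i^*}\, x_i^2 - \sum_{i \ne j}^{N} \frac{q_i^* q_j^* - q_{ij}^*}{p_i^* p_j^*}\, x_i x_j . \tag{4.5-5}$$
An estimate of $V(T_2)$ can be obtained as
$$\hat{V}(T_2) = \sum_{i \in s_v} \frac{1 - p_i^*}{p_i^{*2}}\, x_i^2 - \sum_{i \ne j \in s_v} \frac{q_i^* q_j^* - q_{ij}^*}{p_i^* p_j^* p_{ij}^*}\, x_i x_j . \tag{4.5-6}$$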
The functional similarity between this and the Horvitz-Thompson variance
formula is readily apparent.
4.5.5 Numerical Example. To illustrate the procedures involved in
Section 4.5, consider the following examples, based on all possible
samples of size n = 3 from two simple four-unit populations.

Unit:                        A         B         C         D
p_i                         1/2       1/4       1/8       1/8
p_i* = 1-(1-p_i)^3        .8750     .5781     .3301     .3301
Case A:  x_i                  3         4         8         5
         x_i/p_i*        3.4286    6.9192   24.2351   15.1469
Case B:  x_i                  8         5         4         3
         x_i/p_i*        9.1429    8.6490   12.1175    9.0882

When setting up Case A, the numerical values were assigned to the units
at random.
It was also deemed advisable to examine the situation where
the probabilities are somewhat proportional to the size of the units,
thus the numerical values were reassigned to the letter-units to produce
this situation as Case B.
When drawing samples of size 3 (n=3) with replacement, the following distinct samples are possible:
A:   AAA(64)
B:   BBB(8)
C:   CCC(1)
D:   DDD(1)
AB:  AAB(32), ABA(32), BAA(32), BBA(16), BAB(16), ABB(16)
AC:  AAC(16), ACA(16), CAA(16), CCA(4), CAC(4), ACC(4)
AD:  AAD(16), ADA(16), DAA(16), DDA(4), DAD(4), ADD(4)
BC:  BBC(4), BCB(4), CBB(4), CCB(2), CBC(2), BCC(2)
BD:  BBD(4), BDB(4), DBB(4), DDB(2), DBD(2), BDD(2)
CD:  CCD(1), CDC(1), DCC(1), DDC(1), DCD(1), CDD(1)
ABC: ABC(8), ACB(8), BAC(8), BCA(8), CAB(8), CBA(8)
ABD: ABD(8), ADB(8), BAD(8), BDA(8), DAB(8), DBA(8)
ACD: ACD(4), ADC(4), CAD(4), CDA(4), DAC(4), DCA(4)
BCD: BCD(2), BDC(2), CBD(2), CDB(2), DBC(2), DCB(2).
The number following each sample, when divided by 512, is the probability of obtaining that particular sample ($P_s$).
This example, with N = 4, n = 3, produces the following results for
the class two estimator:

                         T_2 = sum of x_i/p_i* over i in s_v
s_v     512 P_{s_v}         Case A       Case B
A            64             3.4286       9.1429
B             8             6.9192       8.6490
C             1            24.2351      12.1175
D             1            15.1469       9.0882
AB          144            10.3478      17.7919
AC           60            27.6637      21.2604
AD           60            18.5755      18.2311
BC           18            31.1543      20.7665
BD           18            22.0661      17.7372
CD            6            39.3820      21.2057
ABC          48            34.5829      29.9094
ABD          48            25.4947      26.8801
ACD          24            42.8106      30.3486
BCD          12            46.3012      29.8547

From this then,
for Case A (3-4-8-5): 512 E(T2) = 10239.6540 or E(T2) = 19.9993 ;
for Case B (8-5-4-3): 512 E(T2) = 10239.8865 or E(T2) = 19.9998 ;
i.e., the estimator is unbiased since T = 20.0000.
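The small departures from 20 in these figures come from the four-decimal rounding of the tabled values. A sketch of the same enumeration in exact rational arithmetic is given here as an illustration only; the dictionary layout and the function names are assumptions, not part of the original computation.

```python
from fractions import Fraction
from itertools import product

P = {"A": Fraction(1, 2), "B": Fraction(1, 4),
     "C": Fraction(1, 8), "D": Fraction(1, 8)}
CASE_A = {"A": 3, "B": 4, "C": 8, "D": 5}
CASE_B = {"A": 8, "B": 5, "C": 4, "D": 3}


def inclusion_prob(p, n):
    """p_i* = 1 - (1 - p_i)^n."""
    return 1 - (1 - p) ** n


def t2(sample, x, p, n):
    """Class two estimate: sum of x_i / p_i* over the distinct units."""
    return sum(Fraction(x[u]) / inclusion_prob(p[u], n) for u in set(sample))


def expected_t2(x, p, n):
    """E(T_2) over all N^n particular samples drawn with replacement."""
    total = Fraction(0)
    for sample in product(p, repeat=n):
        prob = Fraction(1)
        for u in sample:
            prob *= p[u]
        total += prob * t2(sample, x, p, n)
    return total


# Both cases give exactly the population total of 20.
assert expected_t2(CASE_A, P, 3) == 20
assert expected_t2(CASE_B, P, 3) == 20
```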
Using (4.5-5), the variances for these examples can be determined
as follows:

Unit:                       A          B          C          D
p_i*                      .8750      .5781      .3301      .3301
q_i*                      .1250      .4219      .6699      .6699
q_i*/p_i*                 .1429      .7298     2.0294     2.0294
x_iA^2                       9         16         64         25
x_iB^2                      64         25         16          9
x_iA^2 q_i*/p_i*         1.2861    11.6768   129.8816    50.7350     A: 193.5797
x_iB^2 q_i*/p_i*         9.1456    18.2450    32.4704    18.2646     B:  78.1256

Pair:                            AB       AC       AD       BC       BD       CD
q_i* q_j*                      .0527    .0837    .0837    .2826    .2826    .4488
q_ij* = (1-p_i-p_j)^n          .0156    .0527    .0527    .2441    .2441    .4219
q_i* q_j* - q_ij*              .0371    .0310    .0310    .0385    .0385    .0269
p_i* p_j*                      .5058    .2888    .2888    .1908    .1908    .1090
(q_i*q_j* - q_ij*)/p_i*p_j*    .0733    .1073    .1073    .2018    .2018    .2468
x_i x_j (A)                      12       24       15       32       20       40
x_i x_j (B)                      40       32       24       20       15       12

Sum of x_i x_j (q_i*q_j* - q_ij*)/p_i*p_j* over the six pairs:
A: 25.4299   x 2 = 50.8598
B: 18.9654   x 2 = 37.9308

Thus, for Case A:
V(T2)A = 193.5795 - 50.8598 = 142.7197 ,
and, for Case B:
V(T2)B = 78.1256 - 37.9308 = 40.1948.
When computed directly from the possible sample estimates,
the following results are obtained:
Case A: V(T2)A = 142.6741
Case B: V(T2)B = 40.1744
The slight discrepancies are due to rounding off errors. Further, using
(4.5-6), the following estimates of the variance are obtained for the
various possible samples listed above:

Sample        Case A          Case B
A                  not estimable
B                  not estimable
C                  not estimable
D                  not estimable
AB           19.8801         36.0524
AC          389.4929        101.5662
AD          151.7662         60.3442
BC          396.5720        119.2456
BD          163.2139         78.8808
CD          513.0051        143.4502
ABC         390.8153        116.4865
ABD         159.4966         77.9345
ACD         505.6374        141.2127
BCD         505.4355        156.3205
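The agreement between the variance formula (4.5-5) and the variance of all possible sample estimates can be confirmed without rounding. The sketch below is illustrative: it reads (4.5-5) as the sum of $x_i^2 q_i^*/p_i^*$ less the sum over ordered pairs $i \ne j$ of $x_i x_j (q_i^* q_j^* - q_{ij}^*)/(p_i^* p_j^*)$, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

P = {"A": Fraction(1, 2), "B": Fraction(1, 4),
     "C": Fraction(1, 8), "D": Fraction(1, 8)}
CASE_A = {"A": 3, "B": 4, "C": 8, "D": 5}
CASE_B = {"A": 8, "B": 5, "C": 4, "D": 3}


def variance_formula(x, p, n):
    """V(T_2) from (4.5-5), summing the cross term over ordered pairs."""
    q = {u: (1 - p[u]) ** n for u in p}        # q_i* = (1 - p_i)^n
    pstar = {u: 1 - q[u] for u in p}           # p_i* = 1 - q_i*
    v = sum(Fraction(x[u]) ** 2 * q[u] / pstar[u] for u in p)
    for i in p:
        for j in p:
            if i != j:
                qij = (1 - p[i] - p[j]) ** n   # neither i nor j is drawn
                v -= x[i] * x[j] * (q[i] * q[j] - qij) / (pstar[i] * pstar[j])
    return v


def variance_direct(x, p, n):
    """V(T_2) as the variance of the estimate over all N^n samples."""
    pstar = {u: 1 - (1 - p[u]) ** n for u in p}
    mean = second = Fraction(0)
    for sample in product(p, repeat=n):
        prob = Fraction(1)
        for u in sample:
            prob *= p[u]
        est = sum(Fraction(x[u]) / pstar[u] for u in set(sample))
        mean += prob * est
        second += prob * est ** 2
    return second - mean ** 2


# In exact arithmetic the two computations coincide.
assert variance_formula(CASE_A, P, 3) == variance_direct(CASE_A, P, 3)
assert variance_formula(CASE_B, P, 3) == variance_direct(CASE_B, P, 3)
```

Converted to floating point, the exact values lie close to the rounded figures quoted in the text for both cases.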
4.6 Class Three Estimator
4.6.1 The Estimator. The class three estimator, with weights
dependent solely on the distinct sample drawn, is given by
$$T_3 = \gamma_{s_v} \sum_{i \in s_v} x_i \tag{4.6-1}$$
where $\gamma_{s_v}$ ($s_v = 1, 2, \ldots, S'$) is the weight attached to the $s_v$-th
distinct sample whenever it appears. S' is the number of distinct samples and $i \in s_v$ again denotes summation over the distinct units in the
sample.
4.6.2 Number of Weights. The total number of weights is
$S' = \sum_{v=1}^{n} C_v^N$, with there being $C_v^N$ different sets of v distinct units
in the sample of n.
4.6.3 Determination of the Weights. Imposing the criterion of
unbiasedness says
$$E(T_3) = \sum_{s} P_s\, \gamma_{s_v} \sum_{i \in s_v} x_i \equiv \sum_{i=1}^{N} x_i \tag{4.6-2}$$
where $P_s$ denotes the probability of obtaining the s-th sample. (4.6-2)
can be rewritten as
$$\sum_{i \in U} x_i \sum_{s \supset i} \gamma_{s_v} P_s \equiv \sum_{i=1}^{N} x_i$$
where $i \in U$ denotes all i in the universe and $s \supset i$ denotes those
samples containing the i-th unit. For $T_3$ to be unbiased requires that
$$\sum_{s \supset i} \gamma_{s_v} P_s = 1 \quad \text{for all } i.$$
This expression can be rewritten as
$$\sum_{s \supset i} \gamma_{s_v} P_s = \sum_{v=1}^{n} \sum_{s_v \supset i} \sum_{s \in s_v} \gamma_{s_v} P_s = \sum_{v=1}^{n} \sum_{s_v \supset i} \gamma_{s_v} \sum_{s \in s_v} P_s$$
where, in the triple sum, the first summation is over the possible
values of v. The second summation is over those distinct sets with v
units which contain the i-th unit. The third summation (with index
$s \in s_v$) groups the particular samples (sets of n, ordered by draw) into
distinct sets (those with distinct sets of v units). The third sum will
group $\Delta^v 0^n$ terms together, one for each particular sample within the
distinct sample, and one can readily observe that $\sum_{s \in s_v} P_s = P_{s_v}$. Thus
the requirement for unbiasedness becomes
$$\sum_{v=1}^{n} \sum_{s_v \supset i} \gamma_{s_v} P_{s_v} = 1 , \tag{4.6-3}$$
from which a solution for the $\gamma_{s_v}$ is
$$\gamma_{s_v} = \frac{1}{n\, C_{v-1}^{N-1} P_{s_v}} \tag{4.6-4}$$
since
$$\sum_{v=1}^{n} \sum_{s_v \supset i} \frac{1}{n\, C_{v-1}^{N-1}} = \sum_{v=1}^{n} \frac{C_{v-1}^{N-1}}{n\, C_{v-1}^{N-1}} = \sum_{v=1}^{n} \frac{1}{n} = 1 .$$
Thus, from (4.6-4),
$$T_3 = \frac{1}{n\, C_{v-1}^{N-1} P_{s_v}} \sum_{i \in s_v} x_i . \tag{4.6-5}$$
Note that a more general solution would be
$$\gamma_{s_v} = c_{s_v} / P_{s_v}$$
with the restriction that $\sum_{v=1}^{n} \sum_{s_v \supset i} c_{s_v} = 1$. One could thus obtain
additional solutions for the $\gamma$'s by suitable manipulation of the c's.
From the requirement for unbiasedness, (4.6-3), it can be seen that
one can obtain the class two estimator for the restrictive case of equal
selection probabilities. Assuming that the $\gamma_{s_v}$ are equal over all $s_v$,
(4.6-3) becomes
$$\gamma \sum_{v=1}^{n} \sum_{s_v \supset i} P_{s_v} = 1$$
or, since $\sum_{v=1}^{n} \sum_{s_v \supset i} P_{s_v} = p^*$ = the inclusion probability for the i-th
unit,
$$\gamma = \frac{1}{p^*} \quad \text{and} \quad T_3 \text{ (equal)} = \frac{1}{p^*} \sum_{i \in s_v} x_i .$$
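4.6.4 Variance of $T_3$. One can determine the variance of $T_3$ as
follows:
$$V(T_3) = \sum_{v=1}^{n} \sum_{s_v=1}^{S_v} P_{s_v} \Big(\gamma_{s_v} \sum_{i \in s_v} x_i\Big)^2 - T^2 . \tag{4.6-6}$$
This can be estimated by
$$\hat{V}(T_3) = T_3^2 - \widehat{T^2} .$$
To obtain an estimator for $V(T_3)$, an unbiased estimator, $\widehat{T^2}$, must be
found for
$$T^2 = \sum_{i=1}^{N} x_i^2 + \sum_{i \ne j}^{N} x_i x_j .$$
The simplest unbiased estimator of $T^2$ is given by
$$\widehat{T^2} = \frac{\sum_{i \in s_v} x_i^2}{n\, C_{v-1}^{N-1} P_{s_v}} + \frac{\sum_{i \ne j \in s_v} x_i x_j}{(n-1)\, C_{v-2}^{N-2} P'_{s_v}}$$
where
$$P'_{s_v} = \frac{P_{s_v}}{\sum_{v=2}^{n} \sum_{s_v=1}^{S_v} P_{s_v}} = \frac{P_{s_v}}{1 - \sum_{i=1}^{N} p_i^n} .$$
Note that at least two different units ($v \ge 2$) are required to estimate the cross product term. The above can readily be shown to be unbiased as follows:
$$E\Big(\frac{\sum_{i \in s_v} x_i^2}{n\, C_{v-1}^{N-1} P_{s_v}}\Big) = \sum_{v=1}^{n} \sum_{s_v=1}^{S_v} P_{s_v} \Big(\frac{\sum_{i \in s_v} x_i^2}{n\, C_{v-1}^{N-1} P_{s_v}}\Big) = \sum_{i=1}^{N} x_i^2 \sum_{v=1}^{n} \sum_{s_v \supset i} \frac{1}{n\, C_{v-1}^{N-1}} = \sum_{i=1}^{N} x_i^2 ,$$
$$E\Big(\frac{\sum_{i \ne j} x_i x_j}{(n-1)\, C_{v-2}^{N-2} P'_{s_v}}\Big) = \sum_{v=2}^{n} \sum_{s_v=1}^{S_v} P'_{s_v} \Big(\frac{\sum_{i \ne j} x_i x_j}{(n-1)\, C_{v-2}^{N-2} P'_{s_v}}\Big) = \sum_{i \ne j \in U} x_i x_j \sum_{v=2}^{n} \sum_{s_v \supset i,j} \frac{1}{(n-1)\, C_{v-2}^{N-2}} = \sum_{i \ne j}^{N} x_i x_j .$$
Thus:
$$\hat{V}(T_3) = \Big(\frac{\sum_{i \in s_v} x_i}{n\, C_{v-1}^{N-1} P_{s_v}}\Big)^2 - \frac{\sum_{i \in s_v} x_i^2}{n\, C_{v-1}^{N-1} P_{s_v}} - \frac{\sum_{i \ne j \in s_v} x_i x_j}{(n-1)\, C_{v-2}^{N-2} P'_{s_v}} . \tag{4.6-7}$$
It may be noted that this estimator can be negative for certain samples.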
4.6.5 Numerical Example. To illustrate the techniques of Section
4.6, the examples of Section 4.5.5 can be used. Since distinct samples
are again involved, the following results are readily obtainable for
$$T_3 = \frac{1}{n\, C_{v-1}^{N-1} P_{s_v}} \sum_{i \in s_v} x_i . \tag{4.6-5}$$

           (1)            (2)              (3)               T_3 = 512 (3)/(1)(2)
s_v    512 P_{s_v}   n C(N-1,v-1)    sum of x_i, i in s_v
                                       A       B             A             B
A          64             3            3       8           8.0000       21.3333
B           8             3            4       5          85.3333      106.6667
C           1             3            8       4        1365.3333      682.6667
D           1             3            5       3         853.3333      512.0000
AB        144             9            7      13           2.7654        5.1358
AC         60             9           11      12          10.4296       11.3778
AD         60             9            8      11           7.5852       10.4296
BC         18             9           12       9          37.9259       28.4444
BD         18             9            9       8          28.4444       25.2840
CD          6             9           13       7         123.2593       66.3704
ABC        48             9           15      17          17.7777       20.1481
ABD        48             9           12      16          14.2222       18.9630
ACD        24             9           16      15          37.9259       35.5556
BCD        12             9           17      12          80.5926       56.8889

From this then,
for Case A (3-4-8-5): 512 E(T3) = 10239.9878 or E(T3)A = 20.0000,
for Case B (8-5-4-3): 512 E(T3) = 10239.9983 or E(T3)B = 20.0000,
i.e., the estimator is unbiased.
Using (4.6-6) (which is the same computationally as the variance
of all possible sample estimates), the variance of T3, for these examples, is:
V(T3)A = 5331.8290 ,
V(T3)B = 1601.6463 .
Further, using (4.6-7), the various samples delineated above for which
v = 2 or 3 produce estimates of the variance of the total as follows:

Sample         Case A          Case B
AB           - 38.7291       -130.4508
AC           -135.6382       -179.9976
AD           - 84.2017       -135.6382
BC            406.8677        192.8370
BD            192.8370        166.8239
CD          11429.0031       3291.9930
ABC            99.6849       -138.2485
ABD          - 71.4258       -116.9902
ACD           485.2026        432.7377
BCD          4318.3894       2141.5599
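The class three computations can be retraced by enumeration. The sketch below is an illustration under stated assumptions: it takes the weights of (4.6-4), and it reads the variance estimator (4.6-7) as $T_3^2$ less the estimate of $T^2$, with the cross-product term summed over ordered pairs and $P'_{s_v}$ taken as $P_{s_v}$ divided by $1 - \sum p_i^n$. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb

P = {"A": Fraction(1, 2), "B": Fraction(1, 4),
     "C": Fraction(1, 8), "D": Fraction(1, 8)}
CASE_A = {"A": 3, "B": 4, "C": 8, "D": 5}
CASE_B = {"A": 8, "B": 5, "C": 4, "D": 3}


def distinct_sample_probs(p, n):
    """Map each distinct sample (a frozenset of units) to P_{s_v}."""
    probs = {}
    for sample in product(p, repeat=n):
        prob = Fraction(1)
        for u in sample:
            prob *= p[u]
        key = frozenset(sample)
        probs[key] = probs.get(key, Fraction(0)) + prob
    return probs


def t3(units, x, p_sv, N, n):
    """T_3 = sum of x_i over the distinct units / (n C(N-1, v-1) P_{s_v})."""
    v = len(units)
    return sum(x[u] for u in units) / (n * comb(N - 1, v - 1) * p_sv)


def expected_t3(x, p, n):
    """E(T_3) taken over all distinct samples."""
    probs = distinct_sample_probs(p, n)
    return sum(pr * t3(s, x, pr, len(p), n) for s, pr in probs.items())


def var_estimate_t3(units, x, p, n):
    """Assumed reading of (4.6-7); needs at least two distinct units."""
    N, v = len(p), len(units)
    if v < 2:
        raise ValueError("at least two distinct units are required")
    p_sv = distinct_sample_probs(p, n)[frozenset(units)]
    p_cond = p_sv / (1 - sum(pi ** n for pi in p.values()))  # P'_{s_v}
    squares = sum(x[u] ** 2 for u in units) / (n * comb(N - 1, v - 1) * p_sv)
    cross = sum(x[i] * x[j] for i in units for j in units if i != j)
    cross = cross / ((n - 1) * comb(N - 2, v - 2) * p_cond)
    return t3(units, x, p_sv, N, n) ** 2 - squares - cross


# The estimator is exactly unbiased for both cases.
assert expected_t3(CASE_A, P, 3) == 20
assert expected_t3(CASE_B, P, 3) == 20
```

Under this reading, `var_estimate_t3(frozenset("AB"), CASE_A, P, 3)` evaluates to approximately -38.729, in line with the tabled value for sample AB, Case A.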
4.7 Class Four Estimator
4.7.1 The Estimator. The class four estimator, with weights
dependent on both the presence or absence of a unit and the order of
appearance of the units, is given by
$$T_4 = \sum_{t=1}^{n} \delta_{it} x_{i_t} \tag{4.7-1}$$
where $\delta_{it}$ (i = 1, 2, ..., N; t = 1, 2, ..., n) is the weight attached
to the i-th element whenever it appears at the t-th draw.
4.7.2 Number of Weights. The total number of weights is Nn (N
weights at each of n draws).
4.7.3 Determination of the Weights. Since the weights, $\delta_{it}$, are
attached depending on the appearance of the i-th element on a particular
draw, as for the class two estimator, the estimator can be rewritten introducing the characteristic random variable $z_{it}$. Thus (4.7-1) becomes:
$$T_4 = \sum_{i=1}^{N} \sum_{t=1}^{n} \delta_{it} x_i z_{it} \tag{4.7-2}$$
where
$z_{it} = 1$ if the i-th element appears at the t-th draw,
$z_{it} = 0$ if the i-th element does not appear at the t-th draw,
and
$$E(z_{it}) = p_i$$
since the individual draws are independent.
Taking expectations,
$$E(T_4) = \sum_{i=1}^{N} x_i \sum_{t=1}^{n} \delta_{it} p_i .$$
Imposing the criterion of unbiasedness, i.e., requiring that
$E(T_4) = \sum_{i=1}^{N} x_i$, means that the $\delta_{it}$ can be determined by setting
$$\sum_{t=1}^{n} \delta_{it} p_i = 1 \quad \text{for } i = 1, 2, \ldots, N.$$
The obvious solution for this is to set
$$\delta_{it} = \frac{1}{n p_i} \tag{4.7-3}$$
which weights hold for all i, and are independent of the order of the
draw (the t's). This yields the familiar
$$T_4 = \frac{1}{n} \sum_{t=1}^{n} \frac{x_{i_t}}{p_{i_t}} . \tag{4.7-4}$$
A more general solution might be
$$\delta_{it} = \omega_t / p_i \quad \text{where} \quad \sum_{t=1}^{n} \omega_t = 1 ,$$
but it is well known that the variance of a linear function with arbitrary weights is minimized when the weights are equal, i.e., when
$\omega_t = 1/n$ for all t.
When the selection probabilities are equal it is readily seen that
the class four estimator reduces to the class one estimator (4.4-3).
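4.7.4 Variance of $T_4$. To determine the variance of $T_4$, set
$$V(T_4) = V\Big(\sum_{i=1}^{N} \sum_{t=1}^{n} \frac{1}{n p_i} x_i z_{it}\Big) = \sum_{i=1}^{N} \sum_{t=1}^{n} \frac{x_i^2}{n^2 p_i^2} V(z_{it}) + \sum_{i \ne j}^{N} \sum_{t=1}^{n} \frac{x_i x_j}{n^2 p_i p_j} \mathrm{Cov}(z_{it}, z_{jt}) . \tag{4.7-5}$$
Note that the terms involving $\mathrm{Cov}(z_{it}, z_{it'})$ and $\mathrm{Cov}(z_{it}, z_{jt'})$ disappear by virtue of the independence of the draws, one from another, so
that the terms involving the t-th and t'-th draws have zero covariance.
Now, from multinomial theory for a single draw,
$$V(z_{it}) = p_i q_i \quad \text{where} \quad q_i = 1 - p_i$$
and
$$\mathrm{Cov}(z_{it}, z_{jt}) = - p_i p_j .$$
Substituting these into (4.7-5) produces
$$V(T_4) = \sum_{i=1}^{N} \sum_{t=1}^{n} \frac{x_i^2}{n^2 p_i^2}\, p_i q_i - \sum_{i \ne j}^{N} \sum_{t=1}^{n} \frac{x_i x_j}{n^2 p_i p_j}\, p_i p_j$$
$$= \sum_{i=1}^{N} \frac{x_i^2}{n} \Big(\frac{1 - p_i}{p_i}\Big) - \frac{1}{n} \sum_{i \ne j}^{N} x_i x_j$$
$$= \sum_{i=1}^{N} \frac{x_i^2}{n p_i} - \frac{1}{n} \Big(\sum_{i=1}^{N} x_i\Big)^2$$
$$= \sum_{i=1}^{N} \frac{x_i^2}{n p_i} - \frac{T^2}{n}$$
$$= \frac{1}{n} \sum_{i=1}^{N} p_i \Big(\frac{x_i}{p_i} - T\Big)^2 . \tag{4.7-6}$$
One can estimate this variance by using
$$\hat{V}(T_4) = \frac{1}{n(n-1)} \sum_{t=1}^{n} \Big(\frac{x_{i_t}}{p_{i_t}} - \bar{T}\Big)^2 \tag{4.7-7}$$
where
$$\bar{T} = \frac{1}{n} \sum_{t=1}^{n} \frac{x_{i_t}}{p_{i_t}} .$$
It is to be noted that (4.7-7) always produces positive estimates of
the variance which is a definite interpretational attribute.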
4.7.5 Numerical Example. The class four estimator depends on summation over the units of the sample as they appear, and not just the
distinct units observed. Thus, in using the four-unit population from
Section 4.5.5 as an example, the interest is in the groups of samples
that have the same units the same number of times. Order is not important, so the samples and results for this case can be grouped as follows:
$$T_4 = \frac{1}{n} \sum_{t=1}^{n} \frac{x_{i_t}}{p_{i_t}} \tag{4.7-4}$$

Unit:                   A      B      C      D
x_i/p_i   Case A        6     16     64     40
          Case B       16     20     32     24

                        sum of x_i/p_i, i in s             T_4
Sample   512 P_{s'}        A        B                A           B
AAA          64            18       48             6.0000     16.0000
AAB          96            28       52             9.3333     17.3333
AAC          48            76       64            25.3333     21.3333
AAD          48            52       56            17.3333     18.6667
ABB          48            38       56            12.6667     18.6667
ABC          48            86       68            28.6667     22.6667
ABD          48            62       60            20.6667     20.0000
ACC          12           134       80            44.6667     26.6667
ACD          24           110       72            36.6667     24.0000
ADD          12            86       64            28.6667     21.3333
BBB           8            48       60            16.0000     20.0000
BBC          12            96       72            32.0000     24.0000
BBD          12            72       64            24.0000     21.3333
BCC           6           144       84            48.0000     28.0000
BCD          12           120       76            40.0000     25.3333
BDD           6            96       68            32.0000     22.6667
CCC           1           192       96            64.0000     32.0000
CCD           3           168       88            56.0000     29.3333
CDD           3           144       80            48.0000     26.6667
DDD           1           120       72            40.0000     24.0000

Thus, for these examples:
for Case A (3-4-8-5): 512 E(T4) = 10240.0000 or E(T4)A = 20.0000 ,
for Case B (8-5-4-3): 512 E(T4) = 10239.9994 or E(T4)B = 20.0000 ,
i.e., the estimator is unbiased.
Using (4.7-6), the variance of T4, for these examples, is
V(T4)A = 131.3333 ,
V(T4)B = 9.3333.
When computed directly from the sample estimates, the results are:
V(T4)A = 131.2813 ,
V(T4)B = 9.3333.
Further, using (4.7-7), the following estimates of the variance are
obtained for the various possible samples listed above:

Sample        Case A          Case B
AAA                not estimable
AAB          11.1111          1.7778
AAC         373.7778         28.4444
AAD         128.4444          7.1111
ABB          11.1111          1.7778
ABC         320.4444         23.1111
ABD         101.7778          5.3333
ACC         373.7778         28.4444
ACD         283.1111         21.3333
ADD         128.4444          7.1111
BBB                not estimable
BBC         256.0000         16.0000
BBD          64.0000          1.7778
BCC         256.0000         16.0000
BCD         192.0000         12.4444
BDD          64.0000          1.7778
CCC                not estimable
CCD          64.0000          7.1111
CDD          64.0000          7.1111
DDD                not estimable
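The class four results can likewise be retraced exactly. The following is a sketch rather than the original computation: it assumes the estimator $T_4 = (1/n) \sum_t x_{i_t}/p_{i_t}$ of (4.7-4), the variance in the form $(1/n) \sum_i p_i (x_i/p_i - T)^2$, and the sample-based estimate of (4.7-7); all function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

P = {"A": Fraction(1, 2), "B": Fraction(1, 4),
     "C": Fraction(1, 8), "D": Fraction(1, 8)}
CASE_A = {"A": 3, "B": 4, "C": 8, "D": 5}
CASE_B = {"A": 8, "B": 5, "C": 4, "D": 3}


def t4(sample, x, p):
    """Class four estimate: mean of x_i / p_i over the n draws."""
    return sum(Fraction(x[u]) / p[u] for u in sample) / len(sample)


def variance_t4(x, p, n):
    """V(T_4) = (1/n) * sum_i p_i (x_i / p_i - T)^2."""
    total = sum(x.values())
    return sum(p[u] * (Fraction(x[u]) / p[u] - total) ** 2 for u in p) / n


def var_estimate_t4(sample, x, p):
    """Sample-based estimate of V(T_4), following (4.7-7)."""
    n = len(sample)
    ratios = [Fraction(x[u]) / p[u] for u in sample]
    mean = sum(ratios) / n
    return sum((r - mean) ** 2 for r in ratios) / (n * (n - 1))


def expectation(stat, x, p, n):
    """Expected value of stat(sample, x, p) over all N^n ordered samples."""
    total = Fraction(0)
    for sample in product(p, repeat=n):
        prob = Fraction(1)
        for u in sample:
            prob *= p[u]
        total += prob * stat(sample, x, p)
    return total


# T_4 is unbiased, and the variance estimate is unbiased for V(T_4).
assert expectation(t4, CASE_A, P, 3) == 20
assert expectation(var_estimate_t4, CASE_A, P, 3) == variance_t4(CASE_A, P, 3)
```

In exact arithmetic the variances are 394/3 and 28/3, which agree with 131.3333 and 9.3333.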
4.8 Class Five Estimator
4.8.1 The Estimator. The class five estimator, with weights
dependent on the presence or absence of a particular unit in the distinct sample drawn, is given by:
$$T_5 = \sum_{i \in s_v} \epsilon_{s_v i}\, x_i \tag{4.8-1}$$
where $\epsilon_{s_v i}$ (i = 1, 2, ..., N; $s_v$ = 1, 2, ..., S') is the weight attached to the i-th element whenever it appears in the $s_v$-th sample. Summation again is over distinct units.
4.8.2 Number of Weights. The total number of weights is
$\sum_{v=1}^{n} v\, C_v^N = N \sum_{v=1}^{n} C_{v-1}^{N-1}$, with $v\, C_v^N$ corresponding to the situation where
there are $C_v^N$ combinations of v distinct units from among the N, and
a weight is attached to each of the v units in the distinct sample.
Alternatively, there are $C_{v-1}^{N-1}$ samples with v distinct units which
contain a given specific unit, say the j-th, and N such units.
4.8.3 Determination of the Weights. As with the class three estimator, to determine weights for the class five estimator which satisfy the criterion of unbiasedness, expectation must be taken over all possible samples. This leads to equations of the following form:
$$E(T_5) = \sum_{s_v} P_{s_v} \sum_{i \in s_v} e_{s_v i}\, x_i$$
which, in turn, stipulate that for the estimator to be unbiased one must determine a set of weights satisfying
$$\sum_{i=1}^{N} x_i \sum_{v=1}^{n} \sum_{s_v \supset i} P_{s_v}\, e_{s_v i} \equiv \sum_{i=1}^{N} x_i . \tag{4.8-2}$$
The solution of this equation, or the determination of a set of values which satisfy it, is a problem in combinatorial number theory. As a special case of the class five estimator, if the subscript "i" is suppressed, one can determine the weights directly from the identity (4.8-2).
Thus
$$\sum_{v=1}^{n} \sum_{s_v \supset i} P_{s_v}\, e_{s_v}(i) = 1$$
must hold in order that
$$T_5 = \sum_{i \in s_v} e_{s_v}(i)\, x_i$$
be unbiased. This, of course, is the same criterion as obtained for the class three estimator (4.6-3). This yields as a general solution $e_{s_v}(i) = c_{s_v}(i)/P_{s_v}$ with the restriction that $\sum_{v=1}^{n} \sum_{s_v \supset i} c_{s_v}(i) = 1$, or yields a specific solution
V·
v
e·v(i)
= ~ p.v
~=ir
·
Another solution is the estimator given by Basu (1958), which belongs to this class for certain special values of the c's. Consider
$$\sum_{v=1}^{n} \sum_{s_v \supset i} c_{s_v}(i) = 1, \qquad i = 1, 2, \ldots, N. \tag{4.8-3}$$
The c-coefficients relate to the possible samples of size n containing v = 1, 2, ..., n distinct units. Also it is only meaningful to determine them in the context of probability values relating to samples of size n. The right hand side of (4.8-3) will result in multinomial probabilities relating to samples of size n-1. Now multiply both sides of (4.8-3) by $p_i$, yielding
$$p_i \sum_{v=1}^{n} \sum_{s_v \supset i} c_{s_v}(i) = p_i \,(p_1 + \cdots + p_N)^{n-1}, \quad \text{for all } i,$$
$$\phantom{p_i \sum_{v=1}^{n} \sum_{s_v \supset i} c_{s_v}(i)} = p_i \sum_{v=1}^{n-1} \sum_{s_v} \sum_{P(n-1|v)} \frac{(n-1)!}{\prod_j n_j!} \prod_j p_j^{n_j} .$$
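Choose the following solutions to this equation:
$$p_i\, c_{s_1}(i) = p_i \left[\, p_i^{\,n-1} \right]$$
(i = 1, 2, ..., N; v = 1, the i-th unit),
$$p_i\, c_{s_2}(i) = p_i \left[ \sum_{P(n-1|2)} \frac{(n-1)!}{n_i!\, n_j!}\, p_i^{n_i} p_j^{n_j} + p_j^{\,n-1} \right]$$
(i = 1, 2, ..., N; v = 2, say units i, j),
and similarly for v = 3 (say units i, j, k) and v = 4 (say units i, j, k, h), or, in general,
$$p_i\, c_{s_v}(i) = p_i \left[ \sum_{P(n-1|v)} \frac{(n-1)!}{n_i!\, n_{j_1}! \cdots n_{j_{v-1}}!}\, p_i^{n_i} p_{j_1}^{n_{j_1}} \cdots p_{j_{v-1}}^{n_{j_{v-1}}} + \sum_{P(n-1|v-1)} \frac{(n-1)!}{n_{j_1}!\, n_{j_2}! \cdots n_{j_{v-1}}!}\, p_{j_1}^{n_{j_1}} \cdots p_{j_{v-1}}^{n_{j_{v-1}}} \right] \tag{4.8-4}$$
(i = 1, 2, ..., N; $2 \le v \le n-1$, say units i, $j_1, j_2, \ldots, j_{v-1}$).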
The solutions listed above hold simply because the sum of the multinomial probabilities in the square brackets for any given i, and for all sets of distinct v's, add to $\left(\sum_{i=1}^{N} p_i\right)^{n-1} = 1$. Of course, in the light of the above demonstration, $p_i$ need not be multiplied to both sides of the condition of unbiasedness, (4.8-3), but this device helps in the choice of the probability functions in relation to the sets of possible distinct samples of size n.
It will be seen that the sum of the multinomial probabilities in square brackets on the right hand side of each equation, when multiplied by $p_i$, is the probability of selecting the i-th unit to complete a given collection of v distinct units. Thus the coefficients can be determined as
$$e_{s_v}(i) = \frac{c_{s_v}(i)}{P_{s_v}} = \frac{[\;\;]}{P_{s_v}} \tag{4.8-5}$$
which yields the estimator
$$T_5' = \sum_{i \in s_v} \frac{p_i\,[\;\;]}{P_{s_v}} \cdot \frac{x_i}{p_i} \tag{4.8-6}$$
where [ ] denotes the term inside square brackets in (4.8-4).
This estimator, when divided by N, is equivalent to the estimator obtained by Basu for estimating the population mean.
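Also, the estimator given by Des Raj and Khamis (1958) belongs to this class and is a special case for equal selection probabilities, i.e., $p_i = 1/N$ for all i. In this situation
$$\frac{p_i\,[\;\;]}{P_{s_v}} = \frac{(1/N)\left(\Delta^v 0^{n-1} + \Delta^{v-1} 0^{n-1}\right) N^{-(n-1)}}{\Delta^v 0^n\, N^{-n}} = \frac{\Delta^v 0^{n-1} + \Delta^{v-1} 0^{n-1}}{\Delta^v 0^n}$$
using the "differences of zero" notation, as explained in Section 4.3(3), rather than the summation notation. Further, from the definition of $\Delta^v 0^n$ given in Section 4.2, it can be readily shown by induction that
$$\Delta^v 0^n = v \left( \Delta^v 0^{n-1} + \Delta^{v-1} 0^{n-1} \right)$$
so that
$$\frac{p_i\,[\;\;]}{P_{s_v}} = \frac{1}{v} .$$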
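The reduction in the equal probability case rests on a standard recurrence for the differences of zero, $\Delta^v 0^n = v(\Delta^v 0^{n-1} + \Delta^{v-1} 0^{n-1})$. A small numerical check is sketched below; it takes $\Delta^v 0^n = \sum_k (-1)^{v-k}\binom{v}{k} k^n$ as the definition (an assumption here, since Section 4.2 is not reproduced), and the function name `diff_zero` is illustrative.

```python
from math import comb

def diff_zero(v, n):
    """Delta^v 0^n by the alternating-sum formula: the number of ordered
    sequences of n draws that cover exactly v given units."""
    return sum((-1) ** (v - k) * comb(v, k) * k ** n for k in range(v + 1))

def recurrence_holds(v, n):
    """Check Delta^v 0^n == v * (Delta^v 0^(n-1) + Delta^(v-1) 0^(n-1))."""
    return diff_zero(v, n) == v * (diff_zero(v, n - 1) + diff_zero(v - 1, n - 1))

print(diff_zero(2, 3), all(recurrence_holds(v, n) for n in range(2, 8) for v in range(1, n + 1)))
```

With the recurrence in hand, the ratio $(\Delta^v 0^{n-1} + \Delta^{v-1} 0^{n-1})/\Delta^v 0^n$ is simply 1/v, which is what makes the equal probability weights so simple.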
Another special case can be obtained by stipulating that $e_{s_v i} = \theta_i$ for all $s_v \supset i$. This situation produces a requirement for unbiasedness which is identical to that of the class two estimator. An alternate derivation would be:
$$\theta_i \sum_{v=1}^{n} \sum_{s_v \supset i} P_{s_v} = \theta_i P_i^* = 1,$$
since $\sum_{v=1}^{n} \sum_{s_v \supset i} P_{s_v}$ is the probability that the sample includes the i-th unit, and so equals $P_i^*$. Thus, as for (4.5-3), $\theta_i = 1/P_i^*$.
4.8.4 Variance of $T_5$.
In very general form, the variance of $T_5$ will be
$$V(T_5) = \sum_{v=1}^{n} \sum_{s_v} P_{s_v} \Big( \sum_{i \in s_v} e_{s_v i}\, x_i \Big)^2 - T^2 \tag{4.8-7}$$
which can be estimated by
i~S X~
- [n P v
sv
eN-i
v-
(4.8-8)
using the simplest unbiased estimator of $T^2$ as given in Section 4.6.4. Again negative estimates of the variance are possible.
4.8.5 Numerical Example. The coefficients for the class five estimator depend on the appearance of a particular unit in a given distinct sample. Using (4.8-6) to illustrate this class, the coefficients for the distinct units within each sample are determined from (4.8-5) after evaluating (4.8-4) to determine the [ ]-term. This term is dependent on the selection probabilities, the sample size n, and the number of distinct units v. For samples of size n = 3, this term is:
for v = 1: $[\;\;] = p_i^2$,
for v = 2: $[\;\;] = 2 p_i p_j + p_j^2$,
for v = 3: $[\;\;] = 2 p_j p_k$,
and is applied to the coefficient for the i-th unit in the distinct sample.
Sample   512 P_{s_v}   Estimator T_5'                 Case A     Case B
A             64       2A                              6.0000    16.0000
B              8       4B                             16.0000    20.0000
C              1       8C                             64.0000    32.0000
D              1       8D                             40.0000    24.0000
AB           144       (10/9)A + (16/9)B              10.4444    17.7778
AC            60       (6/5)A + (16/5)C               29.2000    22.4000
AD            60       (6/5)A + (16/5)D               19.6000    19.2000
BC            18       (20/9)B + (32/9)C              37.3333    25.3333
BD            18       (20/9)B + (32/9)D              26.6667    21.7778
CD             6       4C + 4D                        52.0000    28.0000
ABC           48       (2/3)A + (4/3)B + (8/3)C       28.6667    22.6667
ABD           48       (2/3)A + (4/3)B + (8/3)D       20.6667    20.0000
ACD           24       (2/3)A + (8/3)C + (8/3)D       36.6667    24.0000
BCD           12       (4/3)B + (8/3)C + (8/3)D       40.0000    25.3333
Thus, for these examples:

for Case A (3-4-8-5): $512\,E(T_5') = 10239.9976$, or $E(T_5')_A = 20.0000$,
for Case B (8-5-4-3): $512\,E(T_5') = 10240.0042$, or $E(T_5')_B = 20.0000$,

i.e., the estimator is unbiased.
The variance of $T_5'$, computed directly from the distribution of estimates, is
$V(T_5')_A = 118.5349$, $V(T_5')_B = 8.3961$.
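The class five example can be reproduced by enumeration. The sketch below rests on one reading of the bracketed term: for unit i in a distinct sample $s_v$, it is taken to be the probability that n-1 draws yield exactly the set $s_v$, or exactly $s_v$ without unit i. For n = 3 this reading gives $p_i^2$, $2p_ip_j + p_j^2$ and $2p_jp_k$, matching the three cases listed in the numerical example. All function and variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product

P = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]  # units A, B, C, D
CASE_A = [3, 4, 8, 5]
CASE_B = [8, 5, 4, 3]

def distinct_set_probs(p, draws):
    """Probability that `draws` draws with replacement yield exactly each
    set of distinct units (keyed by frozenset of unit indices)."""
    probs = {}
    for seq in product(range(len(p)), repeat=draws):
        prob = Fraction(1)
        for u in seq:
            prob *= p[u]
        key = frozenset(seq)
        probs[key] = probs.get(key, Fraction(0)) + prob
    return probs

def class_five_moments(p, x, n):
    """Exact mean and variance of sum_i bracket_i * x_i / P_s over the
    distribution of distinct samples (assumed reading of the weights)."""
    p_full = distinct_set_probs(p, n)
    p_prev = distinct_set_probs(p, n - 1)
    mean = Fraction(0)
    second = Fraction(0)
    for s, prob_s in p_full.items():
        est = Fraction(0)
        for i in s:
            bracket = p_prev.get(s, Fraction(0)) + p_prev.get(s - {i}, Fraction(0))
            est += bracket * x[i] / prob_s
        mean += prob_s * est
        second += prob_s * est * est
    return mean, second - mean ** 2

mean5_a, var5_a = class_five_moments(P, CASE_A, 3)
mean5_b, var5_b = class_five_moments(P, CASE_B, 3)
print(float(mean5_a), float(var5_a))
print(float(mean5_b), float(var5_b))
```

Exact rational arithmetic is used so that the small discrepancies in the hand-computed expectations (10239.9976 and 10240.0042 in place of 10240) can be attributed to rounding in the tabulated estimates rather than to bias.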
4.9 Class Six Estimator
4.9.1 The Estimator. The class six estimator, with weights dependent on both the order of appearance of the units and the particular sample involved, is given as
$$T_6 = \phi_s \sum_{t=1}^{n} x_{it} \tag{4.9-1}$$
where $\phi_s$ (s = 1, 2, ..., S) is the weight attached to the s-th sample whose elements appear in a specified order, and $x_{it}$ is the characteristic value observed on the i-th unit at the t-th draw.
4.9.2 Number of Weights. The total number of weights is $S = N^n$ since for this case where attention is paid to the ordering of the elements within the sample, there will be a separate weight for each sample.
4.9.3 Determination of the Weights. Taking the expectation of (4.9-1) over all possible samples yields
$$E(T_6) = \sum_{s=1}^{S} P_s\, \phi_s \sum_{t=1}^{n} x_{it}$$
where summation is over all i appearing in s, including repetitions. Thus
$$E(T_6) = \sum_{s=1}^{S} P_s\, \phi_s \sum_{i \in s} x_i = \sum_{i=1}^{N} x_i \,{\sum_{s \ni i}}' P_s\, \phi_s$$
with $\sum'$ denoting summation over all appearances of the i-th unit, I in number as derived in Section 4.3(8). Imposing the condition of unbiasedness, i.e., setting
$$\sum_{i=1}^{N} x_i \,{\sum_{s \ni i}}' P_s\, \phi_s \equiv \sum_{i=1}^{N} x_i$$
requires that
$${\sum_{s \ni i}}' P_s\, \phi_s = 1 \quad \text{for all } i. \tag{4.9-2}$$
A set of weights which satisfies this requirement would be
$$\phi_s = \frac{1}{P_s\, I} \tag{4.9-3}$$
where $I = n N^{n-1}$ is the number of times that the i-th unit appears in the S samples and is developed in Section 4.3(8). This yields, then, as an estimator of the population total
$$T_6 = \frac{1}{P_s\, I} \sum_{t=1}^{n} x_{it} = \frac{1}{P_s\, n N^{n-1}} \sum_{t=1}^{n} x_{it} = \frac{N}{n H_s} \sum_{t=1}^{n} x_{it} \tag{4.9-4}$$
where
$$H_s = \prod_{i \in s_v} (N p_i)^{n_i} . \tag{4.9-5}$$
For the case where the selection probabilities are equal, i.e., $p_i = 1/N$, then $H_s = 1$ and $\phi_s = N/n$, so that, for the equal probability case, $T_6$ reduces to the familiar form:
$$T_6 = \frac{N}{n} \sum_{t=1}^{n} x_{it} .$$
This, it will be recalled, is the estimator obtained in class one for the restrictive case, i.e., equal selection probabilities, for which the class one estimator did exist.
Further, if one sets $\phi_s = \phi_{s_v}$ for all $s \in s_v$, then the class three estimator can be derived, since the requirement for unbiasedness would be
$$\sum_{s=1}^{S} P_s\, \phi_{s_v} \sum_{i \in s} x_i \equiv T,$$
$$\sum_{i \in U} x_i \sum_{v=1}^{n} \sum_{s_v \supset i} \phi_{s_v} \sum_{s \in s_v} P_s \equiv T,$$
$$\sum_{i=1}^{N} x_i \sum_{v=1}^{n} \sum_{s_v \supset i} \phi_{s_v}\, P_{s_v} \equiv T,$$
or
$$\sum_{v=1}^{n} \sum_{s_v \supset i} \phi_{s_v}\, P_{s_v} = 1$$
which is the same as (4.6-3), the unbiasedness requirement for class three.
4.9.4 Variance of $T_6$. The variance of $T_6$ can be determined quite simply as follows:
$$V(T_6) = \sum_{s=1}^{S} P_s \Big( \phi_s \sum_{t=1}^{n} x_{it} \Big)^2 - T^2 . \tag{4.9-6}$$
Further expansion of this expression would become involved, for, with summation over all units drawn including repetitions, some of the cross-product terms are, in fact, squares. An estimator for the variance of $T_6$ would be
E
=
where
X~
ie:s
T6 · -.[ P I
s
s
2
+
where I is the total number of appearances of the i-th unit in all possible samples and L is the total number of times the (i,j)-cross-product occurs in all possible samples. That this is unbiased follows directly from the expectation methods used in this section, and along the lines used to prove $\hat{T}^2$ is unbiased in Section 4.6.4.
4.9.5 Numerical Example. The weights for the class six estimator depend on the particular sample. For brevity in the listing, it can be noted that under the assumption that the selection probabilities are constant over all draws, the probability of obtaining a particular sample depends on the units drawn, and not on the particular order in which they are drawn. Thus samples having the same units the same number of times can be lumped together, as in the discussion of the class four estimator. Again using the four-unit population of Section 4.5.5, the estimates produced by
$$T_6 = \frac{1}{P_s\, I} \sum_{i \in s} x_i \tag{4.9-4}$$
would be:
                                                Sum of x_i          T_6
Sample   512 P_s f   512 P_s   P_s I = P_s n N^(n-1)    A     B     Case A      Case B
AAA          64         64            6                  9    24      1.5000      4.0000
AAB          96         32            3                 10    21      3.3333      7.0000
AAC          48         16            3/2               14    20      9.3333     13.3333
AAD          48         16            3/2               11    19      7.3333     12.6667
ABB          48         16            3/2               11    18      7.3333     12.0000
ABC          48          8            3/4               15    17     20.0000     22.6667
ABD          48          8            3/4               12    16     16.0000     21.3333
ACC          12          4            3/8               19    16     50.6667     42.6667
ACD          24          4            3/8               16    15     42.6667     40.0000
ADD          12          4            3/8               13    14     34.6667     37.3333
BBB           8          8            3/4               12    15     16.0000     20.0000
BBC          12          4            3/8               16    14     42.6667     37.3333
BBD          12          4            3/8               13    13     34.6667     34.6667
BCC           6          2            3/16              20    13    106.6667     69.3333
BCD          12          2            3/16              17    12     90.6667     64.0000
BDD           6          2            3/16              14    11     74.6667     58.6667
CCC           1          1            3/32              24    12    256.0000    128.0000
CCD           3          1            3/32              21    11    224.0000    117.3333
CDD           3          1            3/32              18    10    192.0000    106.6667
DDD           1          1            3/32              15     9    160.0000     96.0000
Thus, for these examples:

for Case A (3-4-8-5): $512\,E(T_6) = 10239.9952$, or $E(T_6)_A = 20.0000$,
for Case B (8-5-4-3): $512\,E(T_6) = 10240.0000$, or $E(T_6)_B = 20.0000$,

i.e., the estimator is unbiased. Using (4.9-6), the variance of $T_6$, for these examples, is
$V(T_6)_A = 1009.9484$, $V(T_6)_B = 354.6458$.
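As a cross-check on the class six figures, the estimator $T_6 = \sum_t x_{it} / (P_s I)$ with $I = nN^{n-1}$ can be evaluated over every ordered sample. The following is only a sketch: the variable and function names are illustrative, and the population is the four-unit example with selection probabilities 1/2, 1/4, 1/8, 1/8.

```python
from fractions import Fraction
from itertools import product

P = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]  # units A, B, C, D
CASE_A = [3, 4, 8, 5]
CASE_B = [8, 5, 4, 3]

def class_six_moments(p, x, n):
    """Exact mean and variance of T6 = (sum of x over the draws) / (P_s * I),
    where P_s is the probability of the ordered sample and I = n * N**(n-1)."""
    N = len(p)
    appearances = n * N ** (n - 1)  # I: appearances of a given unit in all samples
    mean = Fraction(0)
    second = Fraction(0)
    for seq in product(range(N), repeat=n):
        prob = Fraction(1)
        for u in seq:
            prob *= p[u]
        est = Fraction(sum(x[u] for u in seq)) / (prob * appearances)
        mean += prob * est
        second += prob * est * est
    return mean, second - mean ** 2

mean6_a, var6_a = class_six_moments(P, CASE_A, 3)
mean6_b, var6_b = class_six_moments(P, CASE_B, 3)
print(float(mean6_a), float(var6_a))
print(float(mean6_b), float(var6_b))
```

The enumeration treats each of the 64 ordered samples separately, so no lumping of samples with the same units is needed; the lumped table is recovered by summing over orderings.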
4.10 Class Seven Estimator
4.10.1 The Estimator. The class seven estimator, the most general class of estimators with weights dependent on the order of draw, the presence or absence of a unit, and the particular sample involved, is given by:
$$T_7 = \sum_{t=1}^{n} \psi_{sit}\, x_{it} \tag{4.10-1}$$
where $\psi_{sit}$ (t = 1, 2, ..., n; i = 1, 2, ..., N; s = 1, 2, ..., S) is the weight attached to the i-th unit appearing at the t-th draw in the s-th sample (whose elements, of course, appear in a specified order).
4.10.2 Number of Weights. The total number of weights is $n N^n$: n for each of the $N^n$ samples, since the $\psi$'s depend on the sample, unit and order of draw.
4.10.3 Determination of the Weights. In a manner similar to that used in class six, the restrictions for unbiasedness can be derived along the following lines:
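$$E(T_7) = \sum_{s=1}^{S} P_s \sum_{i \in s} \psi_{sit}\, x_{it} = \sum_{i=1}^{N} x_i \,{\sum_{s \ni i}}' \psi_{sit}\, P_s$$
with $\sum'$ having meaning as in Section 4.9.3. Thus, to produce unbiasedness the weights must satisfy
$$\sum_{i=1}^{N} x_i \,{\sum_{s \ni i}}' \psi_{sit}\, P_s \equiv \sum_{i=1}^{N} x_i ,$$
that is
$${\sum_{s \ni i}}' \psi_{sit}\, P_s = 1 \quad \text{for all } i. \tag{4.10-2}$$
A general solution to (4.10-2) would be
$$\psi_{sit} = \frac{c_{sit}}{P_s} \tag{4.10-3}$$
where the $c_{sit}$ satisfy the restriction that ${\sum_{s \ni i}}' c_{sit} = 1$ for every i. A more specific solution would again involve the use of combinatorial number theory.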
It can readily be seen that the class seven estimator is the most general class, since by suitable suppression of the subscripts on the $\psi_{sit}$, one can reach any of the other classes of estimators. By requiring equality of the $\psi_{sit}$ for all i, t, the unbiasedness requirement (4.10-2) becomes
$${\sum_{s \ni i}}' \psi_s\, P_s = 1 \tag{4.10-4}$$
which is the same as that for class six (4.9-2). From there one can move to classes three, two and one. By suppressing t, and setting $\psi_{si} = \psi_{s_v i}$, one moves to class five, and from there to classes three and two. Finally, by suppressing s, $\psi_{sit} = \psi_{it}$ and class four is obtained; from which class one can be reached for the case of equal selection probabilities.
4.10.4 Variance of $T_7$. Again the general expression is the easiest to manipulate for whatever purpose might be at hand, and thus
$$V(T_7) = \sum_{s=1}^{S} P_s \Big( \sum_{t=1}^{n} \psi_{sit}\, x_{it} \Big)^2 - T^2 \tag{4.10-5}$$
assuming the $\psi_{sit}$ have been determined to produce an unbiased estimator. An estimator for the variance of $T_7$ would be
$$\hat{V}(T_7) = \Big( \sum_{t=1}^{n} \psi_{sit}\, x_{it} \Big)^2 - \hat{T}^2 \tag{4.10-6}$$
where $\hat{T}^2$ is as given in Section 4.9.4.
4.11 Summary of Numerical Examples
In illustrating several of the various estimators derived in this dissertation, two numerical examples were used. These two four-unit populations used the same numerical values, but in the second example, Case B, the numerical values were assigned to the units so as to provide selection probabilities at least somewhat proportional to size.
The two populations were:

unit                         A      B      C      D
selection probability       1/2    1/4    1/8    1/8
numerical value, Case A      3      4      8      5
numerical value, Case B      8      5      4      3

For all classes of estimators studied, Case B provided better (in
the sense of smaller) results in terms of the variance of the estimator,
the range of the estimates of that variance, and the range of the estimates of the population total.
Further, for both cases used as examples,
the estimator given in Class Five as (4.8-5) had the smallest variance
among the unbiased estimators for which variances were determined.
For comparative purposes the results can be summarized as follows:

Case A (random assignment of numerical values to units):

                         Range of estimates
Class   Variance    Totals            Variances
  2       142.7     3.4 to 46.3       19.9 to 513.0
  3      5331.8     2.8 to 1365.3     -135.6 to 11429.0
  4       131.3     6.0 to 64.0       11.1 to 373.8
  5       118.5     6.0 to 64.0
  6      1009.9     1.5 to 256.0

Case B (probabilities somewhat proportional to size):

                         Range of estimates
Class   Variance    Totals            Variances
  2        40.2     8.6 to 30.3       36.0 to 156.3
  3      1601.6     5.1 to 682.7      -180.0 to 3293.0
  4         9.3     16.0 to 32.0      1.8 to 28.4
  5         8.4     16.0 to 32.0
  6       354.6     4.0 to 128.0

5.0 SOME ADDITIONAL COMMENTS ON THE ESTIMATORS
The reader of this dissertation has undoubtedly noticed that some
of the weights given for the various classes of estimators are rather
formidable in appearance, especially if one is thinking about the computational aspects of producing numerical results.
The advent of the large high-speed computers should help negate any reluctance to use a non-self-weighting (i.e., self-weighting meaning equal simple weights) design with "complicated" weights. Another approach to this problem has been proposed by Murthy and Sethi (1961). Starting from the premise that the effort required to produce the multipliers used in the estimator may be prohibitive where a non-self-weighting design is used in a large scale survey, they propose a technique to substitute for the multipliers a very small number of multipliers called "randomized rounded-off multipliers", substituted by a suitable randomizing process, thus reducing the computational burden. They suggest a procedure for determining the values of the randomized rounded-off multipliers which minimizes the increase in the variance of the estimator.
Another item which might be a cause of concern is the possibility
that some of the estimators could have negative estimates of the sample
variance.
In regard to this problem, Koop (1957, ch. 6) gives a very complete discussion of the possibility of and interpretation of negative estimates of the sampling variance, and these remarks will not be repeated here. Also to be noted is that among the various estimators proposed in the various classes, only those estimators proposed in classes one and four have variance estimators which always produce a positive estimate of the variance. The variance estimators in the other classes may or may not produce a positive estimate, depending on the particular sample involved.
Having formulated seven classes of estimators for the case of
sampling with replacement, the question might now be raised as to
whether any of the estimators can be eliminated from further consideration by virtue of an estimator from another class having a consistently
smaller variance.
For the general case such comparisons between the variances will involve comparisons between quadratic forms which involve both the variate values of the characteristic(s) under study and the $p_i$-vector, with the $p_i$ values arbitrary subject only to the restriction that $\sum_{i=1}^{N} p_i = 1$.
In general, to get an answer to this question, one would have to be very specific, for the direction of the inequalities, from preliminary considerations, would seem to involve the specific values of N and n under consideration, and also the specific probability vector, $(p_i)$ (or at least its structure), applicable to the problem at hand. Given all these specifications, it would seem that inequalities should exist but imposing such restrictions does not yield a general answer to the question of "bestness" of any of the estimators posited earlier.
For the restrictive case of equal selection probabilities, the
class one estimator can be eliminated from further consideration.
Des Raj and Khamis (1958) have shown that this estimator, which is the arithmetic mean of the total sample for the case of equal selection probabilities, has a larger variance than the arithmetic mean of the distinct units observed when sampling with equal selection probabilities and with replacement. This arithmetic mean of the distinct units (i.e., an estimator with weights N/v) belongs to class five rather than class one.
One additional comparison (inequality) has been "proved" in the literature and it is worthy of comment. Godambe (1960) shows that the estimator given in Section 4.5 as the class two estimator (the sum over the distinct units of the x's divided by their inclusion probabilities) has smaller variance than any member of a class corresponding to class five for some population. This follows, Godambe says, from the following argument: Define a linear estimator $e_s$ as
$$e_s = \sum_{i \in s} \beta_{si}\, x_i$$
where the summation is over the distinct units in the sample. "It is again clear that all the known linear estimates must be particular cases of $e_s$" says Godambe. If $e_s$ is to be unbiased, then
$$\sum_{s \supset i} \beta_{si}\, P_s = 1 \quad \text{for all } i.$$
And if $e_s$ is unbiased, its variance is given by
$$V(e_s) = \sum_{i=1}^{N} x_i^2 \sum_{s \supset i} \beta_{si}^2\, P_s + \sum_{i \ne j} x_i x_j \sum_{s \supset i,j} \beta_{si}\beta_{sj}\, P_s - T^2 .$$
Godambe then proves that setting $\beta_{si} = 1/p(i)$ yields an admissible estimate. Here $p(i)$ denotes the probability that the i-th unit is included in the sample ($= P_i^*$ with replacement). This is done, for $i_0 \in s$, by supposing that
$$x_i = 1 \ \text{ for } i = i_0, \qquad x_i = 0 \ \text{ for } i \ne i_0 .$$
For these assumptions,
$$V(e') = V\Big( \sum_{i \in s_v} \beta'_{si}\, x_i \Big) = \sum_{s \supset i_0} \beta'^{\,2}_{s i_0}\, P_s - 1 ,$$
so that the excess of $V(e')$ over the variance obtained with $\beta_{si} = 1/p(i)$ is
$$\sum_{s \supset i_0} \Big( \beta'_{s i_0} - \frac{1}{p(i_0)} \Big)^2 P_s .^{3}$$
His argument runs that since $\big( \beta'_{s i_0} - 1/p(i_0) \big)^2$ is positive with the two components inside the brackets assumed unequal, and with $P_s$ also always positive, the estimator with $\beta_{si} = 1/p(i)$ is at least as good as any other estimator in the class of unbiased estimators for some special population.
The derivation of this inequality rests on the assumption that all elements in a population are zero except the $i_0$-th, which takes the value one. The logical justification for the use of this estimator solely on the basis of its merit from this peculiar restrictive case would seem to be rather shaky.

3 The article, as printed, omitted $P_s$ from this equation. This was corrected on the basis of private correspondence with Dr. Godambe.
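The algebra behind the argument for the one-unit population can be checked numerically. The sketch below assumes the setting described in the text: only unit $i_0$ has a non-zero value (equal to one), the competing weights are written as $\beta' = c_s/P_s$ with the $c_s$ summing to one (so the estimator is unbiased), and the reference estimator uses $\beta = 1/p(i_0)$. The sample probabilities are those of the distinct samples containing unit A in the four-unit example; the particular $c_s$ values are arbitrary illustrations.

```python
from fractions import Fraction

def variance_gap(sample_probs, c):
    """For a population that is 1 on unit i0 and 0 elsewhere, return
    (V(e') - V(e with beta = 1/p(i0)), sum_s (beta' - 1/p(i0))**2 * P_s).
    sample_probs: probabilities of the samples containing i0.
    c: unbiasedness constants with sum(c) == 1, so that beta' = c / P_s."""
    inclusion = sum(sample_probs)                       # p(i0)
    betas = [ci / ps for ci, ps in zip(c, sample_probs)]
    var_prime = sum(b * b * ps for b, ps in zip(betas, sample_probs)) - 1
    var_reference = 1 / inclusion - 1
    squared_gap = sum((b - 1 / inclusion) ** 2 * ps for b, ps in zip(betas, sample_probs))
    return var_prime - var_reference, squared_gap

# Distinct samples containing unit A for n = 3: A, AB, AC, AD, ABC, ABD, ACD.
probs_with_a = [Fraction(k, 512) for k in (64, 144, 60, 60, 48, 48, 24)]
weights_c = [Fraction(k, 28) for k in range(1, 8)]      # arbitrary, sums to 1
difference, gap = variance_gap(probs_with_a, weights_c)
print(difference, gap)
```

The two returned quantities coincide for any choice of the $c_s$, which is the identity the argument relies on; it says nothing about populations other than this special one.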
Special attention might be called to the effect of the $H_s$-factor in class six. This factor will have the effect of helping correct for a disproportionate number of units in the sample from among those with large probabilities or those with small probabilities (assuming selection probabilities somewhat indicative of size). If a disproportionate preponderance of the smaller (probability) units are drawn, then $H_s$ will be numerically small, and being in the denominator will tend to increase the estimate of the total (or mean) and will correspondingly increase the variance. Consider the situation where a few units received special probabilistic consideration by virtue of their large size, with the bulk of the units being smaller, and having equal probabilities among themselves. Then, if the sample drawn included only the smaller units, the value of $H_s = \prod_{i \in s_v} (N p_i)^{n_i}$ [(4.9-5)] would be less than one and the estimate of the total or mean would be inflated to counteract the absence of a "representative" number of the larger units. Also the estimate of the variance of the total would be inflated to give a truer picture than that given by the essentially equal smaller units.
By the same token, if the sample as drawn included a disproportionate number of the larger units, then the $H_s$-factor would be numerically large (> 1) and the estimate of the total would be deflated, as would the estimate of the variance of the estimator. All in all, it would seem to be a useful inclusion in an estimator.
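The behaviour of the $H_s$-factor can be illustrated on the four-unit population of the numerical examples. This is a minimal sketch using $H_s = \prod (N p_i)^{n_i}$ and $T_6 = N \sum_t x_{it} / (n H_s)$; the dictionary layout and function names are assumptions made for the illustration.

```python
from collections import Counter
from fractions import Fraction

P = {"A": Fraction(1, 2), "B": Fraction(1, 4), "C": Fraction(1, 8), "D": Fraction(1, 8)}
CASE_A = {"A": 3, "B": 4, "C": 8, "D": 5}

def h_factor(sample, p):
    """H_s = product over distinct units of (N * p_i) ** n_i, where n_i is
    the number of times unit i appears in the sample."""
    N = len(p)
    h = Fraction(1)
    for unit, count in Counter(sample).items():
        h *= (N * p[unit]) ** count
    return h

def class_six_estimate(sample, p, x):
    """T6 = N * (sum of x over the draws) / (n * H_s)."""
    n = len(sample)
    return Fraction(len(p), n) * sum(x[u] for u in sample) / h_factor(sample, p)

for sample in ("DDD", "ABD", "AAA"):
    print(sample, h_factor(sample, P), float(class_six_estimate(sample, P, CASE_A)))
```

A sample made up only of the low-probability unit D has $H_s = 1/8$ and a strongly inflated estimate, while a sample made up only of the high-probability unit A has $H_s = 8$ and a deflated one.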
6.0 SUMMARY
6.1 Summary and Conclusions
There is more to the estimation of unknown population values than
the making and recording of observations.
Nor is one helped much by
merely taking a large number of observations.
All too often, as a result of insufficient consideration of the basic components of a sampling plan, badly biased sample results have been put forth as reliable simply because the number of units in the sample involved was numerically large.
One must note that a large sample is not necessarily a good sample, but
it is nearly always an expensive sample.
Relegating cost considerations to the background, but not ignoring
them, it has been seen that a sampling plan has five major components:
1. A UNIVERSE: the totality of ultimate units of analysis about which information is desired.
2. The FRAME: a delineation of the sampling units (which may consist of one or more units of analysis).
3. A PROBABILITY SYSTEM: a set of numbers, one for each sampling unit, with values restricted to the range $0 < p_i < 1$ and with their sum over all sampling units in the universe restricted to one, which are in one-to-one correspondence with the particular frame involved. These selection probabilities must be operationally realizable.
4. A SAMPLING PROCEDURE: a scheme which comes operationally from the probability system for determining which particular units constitute the sample.
5. An ESTIMATION PROCEDURE: the result of the logical combination of the observations (obtained through the frame) and the probability system, and also involved with the sampling procedure, for arriving at the desired estimates of the population values of the characteristics under observation.
The first three of these must be completely specified prior to any consideration of the last two. And for each change in the specification of the frame and the probability system, the problem of obtaining an "optimum" sampling procedure and an "optimum" estimator changes.
One can note that both the frame and the probability system can be either simple or complex. In the discussion of estimators in Section 4.0, consideration was restricted to one stage sampling, so that there was a simple frame and simple probability system; however, that does not affect the generality of the above formulation. The frame and the probability system, whether simple, different for each stage of a multi-stage plan, varying over time, etc., still must be specified before one can consider problems of selection of a sampling procedure or an estimator.
Also as a result of this formulation, the statement has been made that a direct comparison between sampling with replacement and sampling without replacement does not have any logical justification. The authors who have considered this question apparently came to the same conclusion, although they did not state it explicitly, because the comparisons actually made were between estimating on the basis of the distinct units and on the basis of the totality of units when sampling with replacement, rather than the stated with versus without comparison.
In addition to considering the non-human components of a sampling plan, consideration also has been given to criteria to be applied in helping determine which of a choice of estimators is optimum. In the literature on sample survey techniques, the criteria that have been used have been, for the most part, those developed for infinite populations and applied to samples from finite populations with the expectation that the degree of relevance is still fairly high. It was seen that the concepts of sufficiency and efficiency (defined in terms of minimum variance for an asymptotically normally distributed estimator) are usually meaningless when applied to samples from finite populations. Asymptotic normality cannot be achieved without resorting to an argument that the size of the fixed finite universe be allowed to approach infinity. Regarding sufficiency, the argument follows principally from the fact that, in the process of sampling, probabilities enter the problem only in connection with the selection of the units to be included, and not in connection with the characteristics of those units which are the objectives of the investigation.
Further, the concept of consistency, when the traditional definition based on convergence in probability is used, does not apply to finite samples from finite populations for the same reasons as given above. However, if one goes back to the first definition given for consistency, when Fisher promulgated the beginnings of estimation theory (1921), there is obtained a definition which seems to be perfectly suitable for finite populations. It is the following:
"A statistic satisfies the criterion of consistency if, when it is calculated from the whole population, it is equal to the required population value."
The two oldest estimation criteria, unbiasedness and minimum variance, which were formulated by Gauss in the early 1800's, are still applicable, both to the infinite and finite populations; however, they are possibly too restrictive to be general criteria.
Thus, in the way of major estimation criteria to be applied to the
problem of selecting an optimum estimator to be based on the observations of a finite sample from a finite population, one can require:
(1) that the estimator be consistent, and,
(2) that it have a minimum mean square error
where mean square error equals the sum of the variance and the square of
the bias.
Many might consider the desideratum to be that the estimator
be unbiased and have minimum variance, but, for generality, a better (in
some sense) estimator may be obtained if consideration is given to estimators which are consistent, i.e., have a disappearing bias, and consequently might have a minimum mean square error, if such minimum is
obtainable.
If there is no bias present in the desired estimator, the
two sets of criteria are identical.
And finally this dissertation has applied the axiomatic approach to
the case of sampling with arbitrary selection probabilities and with replacement of the sampling units before another unit is drawn.
It has
been seen that the use of axioms in the process of formulation of
classes of estimators has produced seven classes of linear unbiased
estimators of the population total, with the weights independent of the
unit characteristics (thus prohibiting imposition of a minimum variance
criterion).
Within each class, a condition has been derived from the criterion of unbiasedness, and possible solutions to that equation have been proposed. To tie the various classes of estimators together it may be noted that from the condition of unbiasedness on the class seven estimator every other estimator can be derived by suitable suppression (assumption of equality, e.g., $r_j = r$ when j is suppressed) of the subscripts which denote conditions on the weights. From class six, one can go to classes one, three and two; and so forth. The possible directions of movement are indicated by the arrows in the following diagram:
In considering the variances of the estimators given in Sections 4.4 to 4.10, the class one estimator has been shown to be inferior to an estimator belonging to class five, to wit:
$$V\Big( \frac{1}{v} \sum_{i \in s_v} x_i \Big) < V\Big( \frac{1}{n} \sum_{t=1}^{n} x_{it} \Big) \quad \text{for } n > 2 .$$
However, as class one is so restricted, this comparison is also restricted to the case of equal selection probabilities. No such statement can be made concerning the general case of arbitrary selection probabilities.
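The comparison between the mean of all draws and the mean of the distinct units, under equal selection probabilities, can be illustrated by enumeration. The sketch below uses the Case A values only as convenient test data (the equal-probability setting is an assumption of this illustration, not one of the numerical cases above), and the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

def mean_variances(x, n):
    """Exact variances of the mean over all n draws and of the mean over
    the distinct units, for equal-probability sampling with replacement."""
    N = len(x)
    prob = Fraction(1, N ** n)
    moments = {"all": [Fraction(0), Fraction(0)], "distinct": [Fraction(0), Fraction(0)]}
    for seq in product(range(N), repeat=n):
        units = set(seq)
        estimates = {
            "all": Fraction(sum(x[u] for u in seq), n),
            "distinct": Fraction(sum(x[u] for u in units), len(units)),
        }
        for key, est in estimates.items():
            moments[key][0] += prob * est
            moments[key][1] += prob * est * est
    return {key: m2 - m1 ** 2 for key, (m1, m2) in moments.items()}

var_n3 = mean_variances([3, 4, 8, 5], 3)
var_n2 = mean_variances([3, 4, 8, 5], 2)
print(float(var_n3["distinct"]), float(var_n3["all"]))
print(float(var_n2["distinct"]), float(var_n2["all"]))
```

For n = 2 the two estimators coincide on every sample, so the variances are equal, which is consistent with the inequality being strict only for n > 2.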
Otherwise, the choice of which of the various estimators to use
will depend on the specific circumstances of the sample to be drawn,
including the choice of the probability system, and possible outside
considerations which will dictate the combination of restrictions to be
applied to the choice of weights for the selected units.
6.2 Recommendations for Future Research
A major objective of this dissertation has been to raise the point
of view that the whole area of sample survey theory needs a theory of
estimation or a set of estimation criteria derived for and applied to
finite samples from finite populations.
The field of sample survey theory should not have to rely on ready-made concepts derived for infinite populations, which, when applied to finite populations, have to rely on ideas such as letting both the sample and the population approach an infinite size.
If such a theory is developed, it will of necessity mean more
emphasis on combinatorial theory in the study of and development of
sample survey theory.
Another area of possible additional research would be the intermediate area between sampling with replacement, which was a subject of this dissertation, and sampling without replacement, which was the subject of the dissertation by Koop (1957). One might have a situation where replacement of one or more units occurred simultaneously after a given number of units has been drawn without replacement. Or one might postulate a sampling scheme where the decision as to whether to replace a given unit is arbitrary (e.g., the unit might die before replacement could be effected) or is determined in a systematic or probabilistic manner.
With the advent of bigger and faster computers, empirical sampling might be done to investigate the relative efficiencies for the estimators proposed here for the case of sampling with replacement. Another topic for investigation along these lines would be the possible dependence of the relative efficiencies on the structure of the selection probability vector.
And finally, this dissertation dealt mostly with unbiased linear estimators. With the large computers for use in computation, it is undoubtedly desirable to modify the "best linear unbiased" criterion to include consideration of estimators that are nonlinear and consistent, but would have a smaller mean square error than the best linear unbiased estimators.
7.0 LIST OF REFERENCES
Basu, D. 1958. On sampling with and without replacement. Sankhya 20: 287-294.
Bowley, A. L. 1926. Measurement of' the precision attained in sampling.
Bull. Inst. Inter. Stat. Tome XXII, I-ere Livraison: (1)-(62).
Carmichael, R. D. 1937. Introduction to the Theory of' Groups of' Finite
Order, Dover Publications, Inc., New York (reprinted 1956).
Chrystal, G. 1900. Algebra, Part II.
York (reprinted 1961).
Dover Publications, Inc., New
Cochran, W. G. 1946. Relative accuracy of' systematic and stratif'ied
random samples for a certain class of' populations. Ann. Math.
Stat. 17: 164-177.
Cochran, W. G.
New York.
1953.
Sampling Techniques.
John Wiley and Sons, Inc.,
Das, A. C. 1951. On two-phase sampling and sampling with varying probabilities. Bull. Inst. Inter. Stat. 33: 105-112.
Deming, W. E. 1960. Sample Design in Business Research. John Wiley
and Sons, Inc., New York.
Edgeworth, F. Y. 1918. On the value of a mean as calculated from a
sample. J. Roy. Stat. Soc. 81: 624-632.
Feller, W. 1957. An Introduction to Probability Theory and its Applications, Vol. I, 2nd edn. John Wiley and Sons, Inc., New York.
Fisher, R. A. 1921. On the mathematical foundations of theoretical
statistics. Phil. Trans. Roy. Soc. London Ser. A 222: 309-368.
Fisher, R. A. 1925. Theory of statistical estimation. Proc. Cambridge
Phil. Soc. 22: 700-725.
Fisher, R. A. 1956. Statistical Methods and Scientific Inference.
Hafner Publishing Co., New York.
Fisher, R. A., and Yates, F. 1949. Statistical Tables, 3rd edn. Oliver
and Boyd, Ltd., London.
Fraser, D. A. S. 1958. Statistics: An Introduction. John Wiley and
Sons, Inc., New York.
Godambe, V. P. 1955. A unified theory of sampling from finite populations. J. Roy. Stat. Soc. Ser. B 17: 269-278.
Godambe, V. P. 1960. An admissible estimate for any sampling design.
Sankhya 22: 285-288.
Hansen, M. H. and Hurwitz, W. N. 1943. On the theory of sampling from
finite populations. Ann. Math. Stat. 14: 333-362.
Hansen, M. H., Hurwitz, W. N. and Madow, W. G. 1953. Sample Survey
Methods and Theory, Vol. II. John Wiley and Sons, Inc., New York.
Horvitz, D. G. and Thompson, D. J. 1952. A generalization of sampling
without replacement from a finite universe. J. Am. Stat. Assoc.
47: 663-685.
Isserlis, L. 1916. On the conditions under which the "probable errors"
of frequency distributions have a real significance. Proc. Roy.
Soc. (London) Ser. A 92: 23-41.
Isserlis, L. 1918. On the value of a mean as calculated from a sample.
J. Roy. Stat. Soc. 81: 75-81.
Koop, J. C. 1957. Contributions to the general theory of sampling finite populations without replacement and with unequal probabilities.
Unpublished Ph.D. Thesis, North Carolina State College, Raleigh
(University Microfilms, Ann Arbor).
Koop, J. C. 1960. On theoretical questions underlying the technique of
replicated or interpenetrating samples. Proc. Social Stat. Sect.,
Am. Stat. Assoc. 1960: 196-205.
Madow, W. G. 1948. On the limiting distribution of estimates based on
samples from finite universes. Ann. Math. Stat. 19: 535-545.
Midzuno, H. 1950. An outline of the theory of sampling systems. Ann.
Inst. Stat. Math. 1: 149-156.
Mortara, G. 1917. Elementi di statistica. Appunti sulle lexioni di
statistica methodologica dettate nel R. Instituto Superiore di
studi comerciali di Roma. Rome. p. 356. As cited by Tschuprow
(1923).
Murthy, M. N. and Sethi, V. K. 1961. Randomized rounded-off multipliers in sampling theory. J. Am. Stat. Assoc. 56: 328-334.
Nanjamma, N. S., Murthy, M. N. and Sethi, V. K. 1959. Some sampling
systems providing unbiased ratio estimators. Sankhya 21: 299-314.
Neyman, J. 1934. On two different aspects of the representative method: the method of stratified sampling and the method of purposive
selection. J. Roy. Stat. Soc. 97: 558-606.
Neyman, J. 1952. Lectures and Conferences on Mathematical Statistics
and Probability. Graduate School, U. S. Dept. Agr., Washington,
D. C.
Raj, Des. 1958. On the relative accuracy of some sampling techniques.
J. Am. Stat. Assoc. 53: 98-101.
Raj, Des and Khamis, S. H. 1958. Some remarks on sampling with replacement. Ann. Math. Stat. 29: 550-557.
Riordan, J. 1958. An Introduction to Combinatorial Analysis. John
Wiley and Sons, Inc., New York.
Roy, J. and Chakravarti, I. M. 1960. Estimating the mean of a finite
population. Ann. Math. Stat. 31: 392-398.
Seng, Y. P. 1951. Historical survey of the development of sampling
theories and practice. J. Roy. Stat. Soc. Ser. A 114: 214-231.
Splawa-Neyman, J. 1925. Contributions to the theory of small samples
drawn from a finite population. Biometrika 17: 472-479.
Stevens, W. L. 1937. Significance of grouping. Ann. Eugenics 8: 57-69.
Stevens, W. L. 1958. Sampling without replacement with probability proportional to size. J. Roy. Stat. Soc. Ser. B 20: 393-397.
Sukhatme, P. V. 1953. Sampling Theory of Surveys with Applications.
The Indian Society of Agricultural Statistics, New Delhi, India,
and The Iowa State College Press, Ames, Iowa.
Sukhatme, P. V. and Narain, R. D. 1952. Sampling with replacement. J.
Indian Soc. Agr. Stat. 4: 42-49.
Tschuprow, A. A. 1923. On the mathematical expectation of the moments
of frequency distributions in the case of correlated observations.
Metron 2(3): 461-493 and 2(4): 646-683.
Whittaker, E. and Robinson, G. 1944. The Calculus of Observations, 4th
edn. Blackie and Son, Ltd., London.
Wilks, S. S. 1960. A two-stage scheme for sampling without replacement. Bull. Inst. Inter. Stat. 21(2): 241-248.
Wu-min. 1958. Two ways of compiling statistics. Jenmin jih pao.
Peking, China. April 29: 1, 4.
Yates, F. 1953. Sampling Methods for Censuses and Surveys, 2nd edn.
Hafner Publishing Co., New York.
Yezhov, A. 1957. Soviet Statistics. Foreign Language Publishing
House, Moscow.
Zarkovic, S. S. 1956. Note on the history of sampling methods in
Russia. J. Roy. Stat. Soc. Ser. A 119: 336-338.
Zarkovic, S. S. 1960. On the efficiency of sampling with varying probabilities and the selection of units with replacement. Metrika
3: 53-60.
APPENDICES
8.0
APPENDIX A
THE DISTRIBUTION OF THE NUMBER OF DISTINCT UNITS IN THE SAMPLE
Let v denote the number of distinct units appearing in a sample
of size n drawn from a finite population of size N with replacement
of each unit drawn preceding the next draw. Then it is readily apparent
that v is a random variable ($1 \le v \le n$) with a distribution dependent
on n and N. Although not all the results of this appendix are used
in the body of this dissertation, the use of generating functions in
this field is of interest.
8.1 Equal Selection Probability Case
Let the probabilities of selection be equal for each of the N
units (i.e., $P_i = P = 1/N$); then an analogy may be drawn between the
distribution of v and that of the number of empty cells when r balls
are randomly distributed among n cells. This classic "occupancy problem"
yields the following formula, as given by Feller (1957, p. 92), for
the probability of having m cells empty when placing r objects into
n cells:
\[ P[m] = \Pr(m \text{ cells empty}) = \binom{n}{m} \sum_{s=0}^{n-m} (-1)^{s} \binom{n-m}{s} \left( 1 - \frac{m+s}{n} \right)^{r} \tag{8-1} \]
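To apply this formula to the distribution of v, note that
\[ \Pr(v \text{ distinct units in } n \text{ draws}) = \Pr(N - v \text{ units not drawn, or "empty"}). \]
Setting, in (8-1),
n = N, m = N - v, r = n
(the occupancy symbols on the left, the present notation on the right)
and reversing the order of summation (replacing s by v - s) gives:
\[ p(v) = \binom{N}{N-v} \sum_{s=0}^{v} (-1)^{v-s} \binom{v}{v-s} \left( 1 - \frac{(N-v) + (v-s)}{N} \right)^{n} \tag{8-2} \]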
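As a numerical check on the equal-probability distribution, the following minimal sketch evaluates the alternating sum term by term, with the factor $1 - [(N-v)+(v-s)]/N$ raised to the n-th power, and compares the result with a direct enumeration of all $N^n$ ordered samples. The function names `p_distinct` and `p_distinct_brute` and the small values N = 4, n = 3 are assumptions chosen for the illustration.

```python
from fractions import Fraction
from itertools import product
from math import comb

def p_distinct(v, n, N):
    """Pr(v distinct units in n equal-probability draws with replacement)."""
    total = Fraction(0)
    for s in range(v + 1):
        empty_fraction = Fraction((N - v) + (v - s), N)
        total += (-1) ** (v - s) * comb(v, v - s) * (1 - empty_fraction) ** n
    return comb(N, N - v) * total

def p_distinct_brute(v, n, N):
    """Same probability by enumerating all N**n equally likely ordered samples."""
    hits = sum(1 for sample in product(range(N), repeat=n)
               if len(set(sample)) == v)
    return Fraction(hits, N ** n)

N, n = 4, 3
dist = {v: p_distinct(v, n, N) for v in range(1, min(n, N) + 1)}
for v, prob in dist.items():
    print(v, prob, prob == p_distinct_brute(v, n, N))
# prints: 1 1/16 True, 2 9/16 True, 3 3/8 True (one triple per line)
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the comparison with the enumeration is an equality rather than a floating-point approximation.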
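Using the "differences of zero" notation, (8-2) may be written in a
more elegant form as
\[ p(v) = \binom{N}{v} N^{-n} \Delta^{v} 0^{n} \tag{8-3} \]
where $\Delta$ is the usual finite difference operator with unit increment and
$\Delta^{v} 0^{n}$ denotes $\Delta^{v} x^{n}$ evaluated at $x = 0$.
From (8-3) the probability generating function of v can be obtained
as
\[ P_v(t) = N^{-n} \sum_{s=0}^{N} \binom{N}{s} t^{s} \Delta^{s} 0^{n} = N^{-n} (1 + t\Delta)^{N} 0^{n} \tag{8-4} \]
Note that $\Delta^{s} 0^{n} = 0$ for s = 0 and for s > n.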
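The "differences of zero" can be tabulated directly from the standard expansion $\Delta^{s} 0^{n} = \sum_{k=0}^{s} (-1)^{s-k} \binom{s}{k} k^{n}$. The sketch below, offered as an illustration only, does so for N = 4 and n = 3, rebuilds the distribution as $\binom{N}{v} \Delta^{v} 0^{n} / N^{n}$, and evaluates the resulting probability generating function at t = 1. The function names are hypothetical, and n is assumed to be at least 1.

```python
from fractions import Fraction
from math import comb

def diff_zero(s, n):
    """Delta^s 0^n via the binomial expansion of the s-th forward difference."""
    return sum((-1) ** (s - k) * comb(s, k) * k ** n for k in range(s + 1))

def p_distinct(v, n, N):
    """Equal-probability distribution of v written with differences of zero."""
    return Fraction(comb(N, v) * diff_zero(v, n), N ** n)

def pgf(t, n, N):
    """Probability generating function: sum over s of p(s) * t**s."""
    return sum(p_distinct(s, n, N) * t ** s for s in range(N + 1))

N, n = 4, 3
table = [diff_zero(s, n) for s in range(N + 1)]
print(table)          # [0, 1, 6, 6, 0]: zero at s = 0 and for s > n
print(pgf(1, n, N))   # 1, since the probabilities sum to one
```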
Further, the factorial generating function is readily obtained from
the probability generating function by substituting (1 + t) for t in
(8-4), to wit
\[ F_v(t) = N^{-n} \left[ 1 + (1+t)\Delta \right]^{N} 0^{n} = N^{-n} (E + t\Delta)^{N} 0^{n} \tag{8-5} \]
where $E = 1 + \Delta$ is the usual increment operator with unit increments.
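Using $F_v(t)$, (8-5), one can readily compute $E(v)$ and $E[v(v-1)]$,
and from these the mean and variance of v are easily obtained as follows:
\[ E(v) = \left. \frac{d F_v(t)}{dt} \right|_{t=0} = N^{-(n-1)} \left[ N^{n} - (N-1)^{n} \right] \tag{8-6} \]
Since the variance of v is
\[ V(v) = E[v(v-1)] + E(v) - [E(v)]^{2} \tag{8-7} \]
first determine:
\[ E[v(v-1)] = \left. \frac{d^{2} F_v(t)}{dt^{2}} \right|_{t=0} = N^{-n} N(N-1) \left[ N^{n} - 2(N-1)^{n} + (N-2)^{n} \right] \tag{8-8} \]
and then, by substituting this in (8-7), obtain
\begin{align*}
V(v) &= N^{-n} N(N-1) \left[ N^{n} - 2(N-1)^{n} + (N-2)^{n} \right] + N^{-(n-1)} \left[ N^{n} - (N-1)^{n} \right] \\
     &\quad - N^{-2(n-1)} \left[ N^{2n} - 2N^{n}(N-1)^{n} + (N-1)^{2n} \right] \\
     &= N^{-n} N(N-1)^{n} + N^{-n} N(N-1)(N-2)^{n} - N^{-2(n-1)} (N-1)^{2n} \\
     &= N \left( \frac{N-1}{N} \right)^{n} - N^{2} \left( \frac{N-1}{N} \right)^{2n} + N(N-1) \left( \frac{N-2}{N} \right)^{n}. \tag{8-9}
\end{align*}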
Also, $E(v^{2})$ can be seen to be:
\begin{align*}
E(v^{2}) &= E[v(v-1)] + E(v) \\
         &= N^{-(n-2)} \left[ N^{n} - 2(N-1)^{n} + (N-2)^{n} \right] + N^{-(n-1)} \left[ (N-1)^{n} - (N-2)^{n} \right]. \tag{8-10}
\end{align*}
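The closed forms for the equal-probability case can be checked against moments computed directly from the distribution of v. The sketch below assumes the forms $E(v) = N[1 - ((N-1)/N)^{n}]$, $V(v) = N((N-1)/N)^{n} - N^{2}((N-1)/N)^{2n} + N(N-1)((N-2)/N)^{n}$ and $E(v^{2}) = N^{-(n-2)}[N^{n} - 2(N-1)^{n} + (N-2)^{n}] + N^{-(n-1)}[(N-1)^{n} - (N-2)^{n}]$. The function names and the values N = 4, n = 3 are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def p_distinct(v, n, N):
    """Equal-probability distribution of the number of distinct units."""
    inner = sum((-1) ** (v - s) * comb(v, s) * Fraction(s, N) ** n
                for s in range(v + 1))
    return comb(N, v) * inner

def mean_closed(n, N):
    return N * (1 - Fraction(N - 1, N) ** n)

def var_closed(n, N):
    a = Fraction(N - 1, N) ** n
    b = Fraction(N - 2, N) ** n
    return N * a - N * N * a * a + N * (N - 1) * b

def second_moment_closed(n, N):
    first = Fraction(N ** n - 2 * (N - 1) ** n + (N - 2) ** n, N ** (n - 2))
    second = Fraction((N - 1) ** n - (N - 2) ** n, N ** (n - 1))
    return first + second

N, n = 4, 3
support = range(1, min(n, N) + 1)
mean_direct = sum(v * p_distinct(v, n, N) for v in support)
second_direct = sum(v * v * p_distinct(v, n, N) for v in support)
var_direct = second_direct - mean_direct ** 2
print(mean_direct, var_direct, second_direct)   # 37/16 87/256 91/16
```

For these values the direct and closed-form results agree exactly, which is a useful guard against sign or exponent slips when the formulas are transcribed.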
8.2 Arbitrary Selection Probability Case
With arbitrary, or unequal, selection probabilities, the analogy
with the "occupancy problem" disappears, and the distribution of v
becomes rather messy. One can, however, obtain expressions for the mean
and variance of v without first obtaining the distribution of v.
Define the characteristic random variable
\[ z_i = \begin{cases} 1 & \text{if the i-th unit is drawn, regardless of the number of times it appears in the sample,} \\ 0 & \text{if the i-th unit is not drawn.} \end{cases} \]
Also denote the probability of the i-th unit being drawn on any given
draw as $p_i$, with $\sum_{i=1}^{N} p_i = 1$.
Then, on n draws with replacement, the probability that the i-th
unit is not drawn, i.e., that $z_i$ takes the value zero, is:
\[ \Pr(z_i = 0) = q_i^{n}, \quad \text{where } q_i = 1 - p_i, \]
from which it follows that
\[ \Pr(z_i = 1) = 1 - q_i^{n}. \]
Thus the expectation of $z_i$ is seen to be
\[ E(z_i) = (1)(1 - q_i^{n}) + (0)(q_i^{n}) = 1 - q_i^{n}. \tag{8-11} \]
Now since the number of distinct units equals the sum over all units
in the population of the characteristic random variable, then
\[ E(v) = \sum_{i=1}^{N} E(z_i) = \sum_{i=1}^{N} (1 - q_i^{n}) = N - \sum_{i=1}^{N} (1 - p_i)^{n}. \tag{8-12} \]
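The expectation $E(v) = N - \sum_{i} (1 - p_i)^{n}$ can be verified for a small case by enumerating every ordered sample together with its probability. The sketch below does this for a hypothetical selection probability vector (1/2, 1/3, 1/6) and n = 3 draws; the vector and the function names are assumptions made for the illustration.

```python
from fractions import Fraction
from itertools import product

def expected_distinct(p, n):
    """E(v) = N - sum_i (1 - p_i)**n for n draws with replacement."""
    return len(p) - sum((1 - p_i) ** n for p_i in p)

def expected_distinct_brute(p, n):
    """E(v) by summing (number of distinct units) * Pr(sample) over all samples."""
    total = Fraction(0)
    for sample in product(range(len(p)), repeat=n):
        prob = Fraction(1)
        for i in sample:
            prob *= p[i]               # draws are independent
        total += prob * len(set(sample))
    return total

p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
n = 3
print(expected_distinct(p, n), expected_distinct_brute(p, n))   # 2 2
```

The enumeration grows as $N^{n}$ and is only practical for very small cases, whereas the closed form costs N operations.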
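This approach also yields
\[ V(v) = V\left( \sum_{i=1}^{N} z_i \right) = \sum_{i=1}^{N} V(z_i) + \sum_{i \ne j}^{N} \mathrm{Cov}(z_i, z_j). \]
Now:
\[ V(z_i) = (1 - q_i^{n}) - (1 - q_i^{n})^{2} = q_i^{n} (1 - q_i^{n}) \tag{8-13} \]
and
\begin{align*}
\mathrm{Cov}(z_i, z_j) &= E(z_i z_j) - E(z_i) E(z_j) \\
                       &= (1 - q_i^{n} - q_j^{n} + q_{ij}^{n}) - (1 - q_i^{n})(1 - q_j^{n}) \\
                       &= -(q_i^{n} q_j^{n} - q_{ij}^{n}), \tag{8-14}
\end{align*}
where $q_{ij} = 1 - p_i - p_j$,
so that, using (8-13) and (8-14), the variance of v is
\[ V(v) = \sum_{i=1}^{N} q_i^{n} - \left( \sum_{i=1}^{N} q_i^{n} \right)^{2} + \sum_{i \ne j}^{N} q_{ij}^{n}. \tag{8-15} \]
These results have also been derived by Basu (1958), but in a very
compressed form.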
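A similar hedged check applies to the variance. With $q_i = 1 - p_i$ and $q_{ij} = 1 - p_i - p_j$, the sketch below evaluates $\sum_i q_i^{n} - (\sum_i q_i^{n})^{2} + \sum_{i \ne j} q_{ij}^{n}$, the last sum running over ordered pairs, and compares it with the variance obtained by enumerating all ordered samples. The probability vector (1/2, 1/3, 1/6), n = 3 and the function names are again assumptions for the illustration.

```python
from fractions import Fraction
from itertools import product

def var_distinct(p, n):
    """V(v) from the characteristic-variable argument (ordered pairs i != j)."""
    N = len(p)
    s1 = sum((1 - p_i) ** n for p_i in p)
    cross = sum((1 - p[i] - p[j]) ** n
                for i in range(N) for j in range(N) if i != j)
    return s1 - s1 ** 2 + cross

def var_distinct_brute(p, n):
    """V(v) by enumerating every ordered sample with its probability."""
    first = second = Fraction(0)
    for sample in product(range(len(p)), repeat=n):
        prob = Fraction(1)
        for i in sample:
            prob *= p[i]
        v = len(set(sample))
        first += prob * v
        second += prob * v * v
    return second - first ** 2

p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
n = 3
print(var_distinct(p, n), var_distinct_brute(p, n))   # 1/3 1/3
```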
9.0
APPENDIX B
A STATISTICAL THEORY OF COMMUNISM
The following is a translation of an article, Wu-min (1958), which
appeared in Jenmin jih pao, the official party newspaper in Communist
China, on April 29, 1958. The government was, at the time, having
considerable difficulty explaining to the world the discrepancy between
the actual production figures for some crops, and the stated objectives
of the five-year plan then in effect.
It is reproduced as an illustration of the reaction that can occur
when scientific principles do not produce politically desired results.
The moral, however, is not that samplers should pay sole attention to
statistical theory and methodology at the expense of political considerations when formulating the problem, but that the desideratum is samplers who observe considerations of the subject under study and the
national goals which may be involved, and still retain complete objectivity in compiling, analysing and reporting the sample data.
Two Ways of Compiling Statistics
In reading the report on "Speeding up production by using statistics in Ho-Pei Province," we see that there are two ways of compiling
and using statistics. One is static and isolated, the other is imposing
and integrated. Statisticians in the past, under metaphysical philosophy,
adhered too closely to regulations and forms, claiming that statistical
workers should assume an extremely detached and cool attitude.
But at the height of our national leap forward in agricultural and
industrial production we cannot stand still; we must march forward with
the mass of the people. The State Statistical Bureau has made a thorough
investigation of past policies and found the following shortcomings:
1. Too much stress on textbooks, report forms, neglecting political
responsibility, observing the rules to each title and letter.
Doing nothing beyond this. A new and improved system of statistics
introduced in Ho-Pei Province has been used with highly effective results.
Statistical workers of the old school, visiting Ho-Pei, have doubted
these results because the new methods cannot be found in their textbooks.
The value of new methods and experience must be judged by their
contributions to the national welfare and socialist construction. We
must be materialistic and follow the principle of actuality. Most of
the Chinese texts on statistical methods were translated or compiled
from foreign books. No books have yet been written with creative genius
based upon actual experience in China. Therefore government agencies
dealing with statistics and colleges giving statistical courses should
accept the responsibility of accumulating experience in China and compiling our own textbooks.
2. Too much emphasis on official formulae, which disregard theories
and politics; seeking only concrete figures, forgetting the spirit
of the times. Statisticians should first learn the statistical theory
of Marxism and Leninism, then respond to the demand that China be guided
by those principles and establish its own method of statistics.
3. Too much mystification and self-consciousness among statisticians, who insist that this work must be done only by specialists.
Thus, they depend only upon their own workers, having no confidence in
other people, and refuse guidance by the government or the party. We
must cooperate with the local population and participate in their production efforts.
This is the lesson demonstrated by the experiment in
Ho-Pei Province.
4. Too much reliance on official rules and procedures; seeking
only figures, disregarding people. Too much preoccupation with writing
reports and filling forms, to the neglect of positive, creative and progressive work.
Statistical workers in Ho-Pei Province have adopted
entirely different methods, which can be summarized under the following
three points:
1) Related their statistical records with the major activities
of the party and the productive labor of the people. This
makes statistics the motive power and guiding force in the
national production leap forward.
2) Maintained political consciousness and guiding principles,
without holding too rigidly to absolute figures, prescribed
procedures and forms, which would waste time.
3) Relied on local authorities and the mass of the people to
bring results and overcome obstacles. In this way Ho-Pei
statistics are based upon actual conditions and the accuracy of sources can be guaranteed.
The new statistical methods in Ho-Pei have created new experience, new
trends and directives in statistics. It is worthwhile, therefore, to
recommend these methods to all statistical workers in China, and hope
that they will pay special attention to adapting concrete methods to
suit local conditions and purposes.
INSTITUTE OF STATISTICS
NORTH CAROLINA STATE COLLEGE
(Mimeo Series available for distribution at cost)
265. Eicker, Friedhelm. Consistency of parameter-estimates in a linear time-series model. October, 1960.
266. Eicker, Friedhelm. A necessary and sufficient condition for consistency of the LS estimates in linear regression. October,
1960.
267. Smith, W. L. On some general renewal theorems for nonidentically distributed variables. October, 1960.
268. Duncan, D. B. Bayes rules for a common multiple comparisons problem and related Student-t problems. November,
1960.
269. Bose, R. C. Theorems in the additive theory of numbers. November, 1960.
270. Cooper, Dale and D. D. Mason. Available soil moisture as a stochastic process. December, 1960.
271. Eicker, Friedhelm. Central limit theorem and consistency in linear regression. December, 1960.
272. Rigney, Jackson A. The cooperative organization in wildlife statistics. Presented at the 14th Annual Meeting, Southeastern Association of Game and Fish Commissioners, Biloxi, Mississippi, October 23-26, 1960. Published in Mimeo Series,
January, 1961.
273. Schutzenberger, M. P. On the definition of a certain class of automata. January, 1961.
274. Roy, S. N. and J. N. Shrivastava. Inference on treatment effects and design of experiments in relation to such inferences.
January, 1961.
275. Ray-Chaudhuri, D. K. An algorithm for a minimum cover of an abstract complex. February, 1961.
276. Lehman, E. H., Jr. and R. L. Anderson. Estimation of the scale parameter in the Weibull distribution using samples
censored by time and by number of failures. March, 1961.
277. Hotelling, Harold. The behavior of some standard statistical tests under non-standard conditions. February, 1961.
278. Foata, Dominique. On the construction of Bose-Chaudhuri matrices with help of Abelian group characters. February,
1961.
279. Eicker, Friedhelm. Central limit theorem for sums over sets of random variables. February, 1961.
280. Bland, R. P. A minimum average risk solution for the problem of choosing the largest mean. March, 1961.
281. Williams, J. S., S. N. Roy and C. C. Cockerham. An evaluation of the worth of some selected indices. May, 1961.
282. Roy, S. N. and R. Gnanadesikan. Equality of two dispersion matrices against alternatives of intermediate specificity.
April, 1961.
283. Schutzenberger, M. P. On the recurrence of patterns. April, 1961.
284. Bose, R. C. and I. M. Chakravarti. A coding problem arising in the transmission of numerical data. April, 1961.
285. Patel, M. S. Investigations on factorial designs. May, 1961.
286. Bishir, J. W. Two problems in the theory of stochastic branching processes. May, 1961.
287. Konsler, T. R. A quantitative analysis of the growth and regrowth of a forage crop. May, 1961.
288. Zaki, R. M. and R. L. Anderson. Applications of linear programming techniques to some problems of production planning over time. May, 1961.
289. Schutzenberger, M. P. A remark on finite transducers. June, 1961.
290. Schutzenberger, M. P. On the equation $a^{2+n} = b^{2+m} c^{2+p}$ in a free group. June, 1961.
291. Schutzenberger, M. P. On a special class of recurrent events. June, 1961.
292. Bhattacharya, P. K. Some properties of the least square estimator in regression analysis when the 'independent' variables
are stochastic. June, 1961.
293. Murthy, V. K. On the general renewal process. June, 1961.
294. Ray-Chaudhuri, D. K. Application of geometry of quadrics of constructing PBIB designs. June, 1961.
295. Bose, R. C. Ternary error correcting codes and fractionally replicated designs. May, 1961.
296. Koop, J. C. Contributions to the general theory of sampling finite populations without replacement and with unequal
probabilities. September, 1961.
297. Foradori, G. T. Some non-response sampling theory for two stage designs. Ph.D. Thesis. November, 1961.
298. Mallios, W. S. Some aspects of linear regression systems. Ph.D. Thesis. November, 1961.
299. Taeuber, R. C. On sampling with replacement: an axiomatic approach. Ph.D. Thesis. November, 1961.
300. Gross, A. J. On the construction of burst error correcting codes. August, 1961.
301. Srivastava, J. N. Contribution to the construction and analysis of designs. August, 1961.
302. Hoeffding, Wassily. The strong laws of large numbers for u-statistics. August, 1961.
303. Roy, S. N. Some recent results in normal multivariate confidence bounds. August, 1961.
304. Roy, S. N. Some remarks on normal multivariate analysis of variance. August, 1961.
305. Smith, W. L. A necessary and sufficient condition for the convergence of the renewal density. August, 1961.
306. Smith, W. L. A note on characteristic functions which vanish identically in an interval. September, 1961.
307. Fukushima, Kozo. A comparison of sequential tests for the Poisson parameter. September, 1961.
308. Hall, W. J. Some sequential analogs of Stein's two-stage test. September, 1961.
309. Bhattacharya, P. K. Use of concomitant measurements in the design and analysis of experiments. November, 1961.