Koop, J. C. (1961). "Contributions to the general theory of sampling finite populations without replacement and with unequal probabilities."

CONTRIBUTIONS TO THE GENERAL THEORY OF SAMPLING FINITE POPULATIONS
WITHOUT REPLACEMENT AND WITH UNEQUAL PROBABILITIES

by

JOHN CLEMENT KOOP

Institute of Statistics
Mimeo Series No. 296
August 1957
POSTSCRIPT
This work, now published in the Mimeo Series of the Institute of Statistics partly in response to requests for copies, is an approved thesis for a doctor's degree which was submitted to the Faculty of North Carolina State College in August 1957.

J. C. Koop
Raleigh, North Carolina
September, 1961
TABLE OF CONTENTS

I. INTRODUCTION
   1.1 General remarks
   1.2 Statement of the problem
   1.3 Review of literature
   1.4 Theory treated
   1.5 Notation

II. DEFINITION OF THE PROBABILITY SYSTEM

III. ENUMERATION AND CONSTRUCTION OF ESTIMATORS
   3.1 The axioms of sample formation
   3.2 The seven general linear estimators
   3.3 The enumeration of weights
   3.4 General comments

IV. UNBIASED ESTIMATORS AND THE PROBLEM OF MINIMUM VARIANCE ESTIMATORS IN EACH CLASS
   4.1 Introductory statement
   4.2 Class one estimators
   4.3 Class two estimators
   4.4 Class three estimators
   4.5 Class four estimators
   4.6 Class five estimators
   4.7 Class six estimators
   4.8 Class seven estimators
   4.9 Summary

V. FURTHER NOTES ON UNBIASED ESTIMATORS
   5.1 General notes
   5.2 Comparison of estimators
   5.3 Multiple comparison of efficiency

VI. THE PROBLEM OF NEGATIVE ESTIMATES OF VARIANCE
   6.1 Introductory remarks
   6.2 The case for class two estimators
   6.3 The case for V(T_3)
   6.4 An example for V(T_3) illustrative of an unwise choice of selection probabilities
   6.5 A further example for V(T_3)
   6.6 Some interpretations of negative estimates of variance for T_3
   6.7 Estimators in class four
   6.8 Estimators in class five and class seven
   6.9 The case of V(T_6)
   6.10 The existence of probability systems for which V(T_2), V(T_3) and V(T_5) are always positive

VII. A NOTE ON OPTIMUM PROBABILITIES
   7.1 Introductory note
   7.2 A simple example
   7.3 A formal statement of the problem of optimum probabilities in multivariate sampling with general cost functions

VIII. SUMMARY AND CONCLUSIONS
   8.1 Summary
   8.2 Conclusions
   8.3 Suggestions for future work

LITERATURE CITED
I. INTRODUCTION

1.1 General Remarks
The historical basis of modern sampling theory can be found in the works of Bernoulli, Poisson and Lexis. However, it was not until the appearance of Neyman's (1934) paper, based on his work in Poland, that the theory of statistical sampling, as we know it today, came into prominence. In this paper the advantages of random versus purposive sampling, the principle of unbiasedness and minimum variance (due to Gauss but popularized by Markoff) and the theory of optimum allocation in stratified one-stage sampling were discussed.1/

The next paper having a direct bearing on the subject of this thesis was by Hansen and Hurwitz (1943).2/ In what amounts to a stratified two-stage sampling scheme they selected one primary unit per stratum at the first stage with probabilities proportionate to some measure of size, and at the second stage the elements in each primary unit were selected with equal probabilities and without replacement. They showed the resulting estimator to be unbiased.
Sampling with unequal probabilities arose partly because in practice the ultimate unit of analysis and the unit of sampling are not
identical.
If unequal clusters of ultimate units are sampled, then
obviously the sampling variance, on the basis of equal probability
sampling, can be expected to be larger than in the case when they
1/ Subsequently Thompson (1952) pointed out that these results were anticipated by A. Tschuprow in a paper published in two parts in Metron in 1923.

2/ Other papers of importance to the body of sampling theory in the intervening period, i.e., 1934-1943, are mentioned in the bibliography of this paper. I have omitted direct reference to them only because they have less bearing on this work.
are all equal. However, practical limitations (in the way of resources, both human and monetary) may set a limit as to the way existing aggregates of ultimate units (e.g., farms in a minor civil division or houses found in a city block) may be arranged to have, as far as possible, equally sized clusters in terms of the ultimate units considered.
It is well known that if clusters, treated as sampling units, are
selected with probabilities proportionate to their aggregate characteristics, and with replacement, the sampling variance of the mean of
these aggregates, each weighted by the reciprocal of its respective
selection probability, is zero.
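The zero-variance property just described is easy to verify numerically: if each cluster's selection probability is made exactly proportional to its aggregate characteristic, every weighted observation reproduces the population total. A minimal sketch (the cluster aggregates below are illustrative, not from the thesis):

```python
# Numerical check of the zero-variance property described above: if each
# cluster's selection probability (with replacement) is exactly proportional
# to its aggregate characteristic, every weighted observation equals the
# population total.  The cluster aggregates below are illustrative.
x = [12.0, 5.0, 30.0, 3.0]          # aggregate characteristic of each cluster
T = sum(x)                          # population total
p = [xi / T for xi in x]            # selection probability proportional to x_i

# Whatever cluster is drawn, x_i / p_i reproduces the total exactly,
# so the estimator has zero sampling variance.
estimates = [xi / pi for xi, pi in zip(x, p)]
assert all(abs(e - T) < 1e-9 for e in estimates)
print(estimates)
```

Of course this requires knowing every aggregate in advance, which would make sampling unnecessary; this is precisely why a correlated, known measure of size is used as a proxy in practice.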
Further, in most practical situations,
the aggregate characteristic of a cluster is highly correlated with
the number of ultimate units (or size of the cluster) which is often
known in advance.
All these considerations prompted Hansen and Hurwitz to select
units with probabilities proportionate to some measure of size in order
to reduce the variance of their estimator over that which would have
been obtained on the basis of equal probabilities of selection.
However, because only one first-stage unit was selected per stratum, it was not possible to calculate the sampling variance of the estimator.
The special theories of Midzuno (1950) and Horvitz and Thompson
(1951, 1952) were attempts to generalize this approach in the selection
of units.
About this time, and later, various statisticians from India took up the problems of unequal probability sampling and proposed various sorts of estimators.
Among them, Narain (1951) arrived at the same type of linear estimator of the population total as proposed by Horvitz and Thompson; Lahiri (1951), by a different approach, obtained essentially the same estimator as Midzuno's, which, in functional form, retains the character of a ratio estimate; Das (1951) proposed an entirely different linear estimator. In a subsequent section all this work will be reviewed.
In any situation once the clusters (first-stage units) have been
sampled with unequal probabilities (in the sense explained) the ultimate units (second-stage units) in each selected cluster can be sampled
with equal or unequal probabilities.
If there is a hierarchy of units,
one nested within the other, then in a multi-stage procedure, sampling
at one or more stages may be with equal or unequal probabilities, and
with or without replacement.
In this thesis we shall not be concerned
with sampling beyond the first stage.
Once results for the first stage
are obtained the extension of the theory to cover two or more stages is
almost immediate.
1.2 Statement of the Problem
In this thesis we shall be concerned with sampling a finite population without replacement of the units sampled after each draw and
with unequal probabilities of selection.
To be specific, the solution of the following questions, or their discussion, will be attempted as a contribution to the special theory (in the sense which will be explained in the next section) already initiated by Horvitz and Thompson (1952).
(i) During the past seven years various estimators alluded to above have been proposed. Can they all be placed in a logical scheme starting with a few simple characteristics (let us say axioms) on the way a sample is formed? Also can general linear estimators and a most general linear estimator (from which it may be possible to derive all known estimators) be formulated on the basis of these axioms?
(ii) Can minimum variance estimators in the sense of Gauss and Markoff (Neyman, 1934, 1952) be determined for each general estimator?
(iii) From certain known results in the theory for infinite populations (which will be discussed) there are indications that these minimum variance estimators cannot be used because they may have weights which may not be independent of the properties of the population. Attempts will be made to approximate or simulate them in all such cases.
(iv) Are there any more unbiased estimators still undiscovered, and if so what are their properties?
(v) The difficulties in comparing the efficiencies of known unbiased estimators will be explored.
(vi) The conditions under which certain known unbiased estimators yield negative estimates of variance will be determined. Also, whether or not probability systems exist for certain classes of estimators which will always yield positive estimates of variance will be explored.
(vii) An attempt to solve the problem of optimum probabilities for a given estimator will be made by way of an example. The problem in a general way will be formulated taking into consideration factors of cost in the sampling of more than one characteristic, i.e., multivariate sampling.
1.3 Review of Literature
First and foremost it is instructive to review the work on equal probability sampling. The theory of sampling finite populations with equal probabilities and without replacement of the elements selected after each draw took shape with Isserlis' (1916) paper, "On the Conditions Under Which the 'Probable Errors' of Frequency Distributions Have a Real Significance." Among other results he obtained the first four moments of the mean. It appears that this paper did not come into prominence among statisticians concerned with the problems of sampling because of its very title. Isserlis (1918) reproduced these results in another paper more suggestive of its contents, and Edgeworth (1918), noting his work, derived his results in a novel way.
A Russian statistician, Tschuprow (1923), reported the same results. He also gave the formula for the optimum allocation of units in stratified sampling. He was aware of Isserlis' work, and in a footnote to his 1923 paper he also referred to the work of an Italian statistician, G. Mortara (1917), in which, to use his own words, "the special formula for the standard error of the average in the case of 'unreplaced tickets'" was given. He further explained that his results were obtained about the same time as Isserlis' and were subsequently published in a Scandinavian journal (1918). In this paper he considered the population to be changing at each draw, and he obtained the first four moments for the mean as a special case of his theory when the elements are not replaced after each draw.
Neyman (1925),1/ unaware of previous work, obtained the same formulas for the first four moments of the mean and the first two

1/ He then wrote under the name of J. Splawa-Neyman.
moments of the sampling variance.
His method of derivation of these
moments was much more direct than either Isserlis' or Tschuprow's and
without the complexity of notation and algebraic involvement which was
characteristic of the latter's work.
Indeed it seems that the work of
Tschuprow was overlooked among writers in the English language for this
reason.
Church (1926) obtained the third and fourth moments of the sampling variance of the mean. It is of interest to note that he started out to derive the third moment by first determining the uncorrected third moment as

$$\frac{1}{{}^{M}C_{N}} \sum_{2} \left\{ \left( \frac{\sum_{1} x}{N} \right)^{3} \right\},$$

where $\sum_1$ represents summation over the N values of the sample and $\sum_2$ represents the summation over the ${}^{M}C_{N}$ samples. (There were M elements in his universe, and N in the sample.) It will be noted that $1/{}^{M}C_{N}$ is the probability of obtaining a given sample of N. Hence this uncorrected moment is merely the expectation of the expression in curled brackets. In the algebra associated with the derivation of expectations in unequal probability sampling we perform similar operations.
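Church's device of averaging over all ${}^{M}C_{N}$ equally likely samples is exactly an expectation operation, which can be checked by brute-force enumeration. A small sketch with invented values (M = 5, N = 2):

```python
from itertools import combinations

# Church's averaging over all M C N equally likely samples is an
# expectation.  Illustrative universe of M = 5 values, samples of N = 2
# drawn with equal probabilities and without replacement.
universe = [1.0, 4.0, 2.0, 7.0, 3.0]
N = 2
samples = list(combinations(universe, N))      # all M C N samples

# Uncorrected third moment of the mean: sum of (sample mean)^3 over all
# samples, each weighted by its probability 1 / (M C N).
mu3_raw = sum((sum(s) / N) ** 3 for s in samples) / len(samples)

# The same operation with the first power recovers the population mean,
# i.e., the sample mean is unbiased under this scheme.
mean_of_means = sum(sum(s) / N for s in samples) / len(samples)
assert abs(mean_of_means - sum(universe) / len(universe)) < 1e-9
print(mu3_raw)
```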
Carver (1930) also obtained the first four moments of sample aggregate values by precisely the same approach as Church. If one reads a little deeper into Carver's work it appears that he may have recognized that the sample aggregate weighted by ${}^{N}C_{n}/{}^{N-1}C_{n-1}$ provides an unbiased estimator of the population total (and indeed the best linear unbiased estimator). This statement may appear a little naive because in the context of equal probability sampling the weight in question reduces to N/n, but in the context of unequal probability sampling it assumes significance as we shall see later.
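Both facts about Carver's weight can be confirmed directly, since $\binom{N}{n}/\binom{N-1}{n-1} = N/n$ and each element appears in exactly $\binom{N-1}{n-1}$ of the $\binom{N}{n}$ samples. A quick check with illustrative values:

```python
from itertools import combinations
from math import comb

# Carver's weight  NCn / (N-1)C(n-1)  for the sample aggregate reduces to
# N/n, and the weighted aggregate is unbiased for the population total
# under equal-probability sampling without replacement.  Values illustrative.
N, n = 6, 2
w = comb(N, n) / comb(N - 1, n - 1)
assert w == N / n                   # the weight is just N/n

x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]
samples = list(combinations(x, n))  # all NCn equally likely samples
expectation = sum(w * sum(s) for s in samples) / len(samples)
assert abs(expectation - sum(x)) < 1e-9
print(w, expectation)
```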
The work of Hansen and Hurwitz (1943) has already been referred to in some detail.

Midzuno (1950) extended the Hansen-Hurwitz approach to sampling a combination of n elements with a probability proportionate to some measure of size of the combination. From a formula given in his subsequent paper (1952), and also as noted by Horvitz and Thompson (1952) and Sen (1952), this probability is equal to the total probability of selecting the first unit with a probability proportionate to a measure of size of the first unit and the remainder with equal probabilities and without replacement.
Lahiri (1951) was concerned with the bias in the ratio estimate, and he set out to find an estimator which, while retaining the character of a ratio estimate, was also to be free from bias. The usual ratio estimate of the total for single-stage sampling is

$$\left( \sum_{i=1}^{n} x_i \Big/ \sum_{i=1}^{n} y_i \right) T_y,$$

where the y's are the auxiliary characteristics which in many situations are known, $T_y$ is their total, and the x's are the characteristics under study of the n elements sampled. He saw that if a sampling scheme could be devised in which the probability of obtaining a given group of n elements is proportional to the total measure of their corresponding auxiliary characteristics, then the estimate would be unbiased. He devised such a scheme. It must be said his approach was much more direct than that of Midzuno, of whose work he was unaware, and he arrived at his sampling scheme, as he confesses, by geometrical intuition.
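The unbiasedness Lahiri saw can be verified by enumerating all possible samples: if $P(s) \propto \sum_{i \in s} y_i$, the factor that biases the ratio estimate cancels exactly. A sketch with invented x and y values:

```python
from itertools import combinations

# Sketch of the Lahiri-Midzuno idea: draw the whole sample s of n elements
# with probability proportional to its auxiliary total sum(y_i, i in s);
# the ratio estimate (sum x / sum y) * T_y is then exactly unbiased.
# The x and y values below are illustrative.
x = [2.0, 5.0, 3.0, 8.0]            # characteristic under study
y = [1.0, 2.0, 1.0, 4.0]            # known auxiliary characteristic
Ty = sum(y)
n = 2

idx = list(combinations(range(len(x)), n))
weights = [sum(y[i] for i in s) for s in idx]
probs = [w / sum(weights) for w in weights]   # P(s) proportional to y-total of s

estimates = [(sum(x[i] for i in s) / sum(y[i] for i in s)) * Ty for s in idx]
expectation = sum(P * e for P, e in zip(probs, estimates))
assert abs(expectation - sum(x)) < 1e-9       # unbiased for the x-total
print(expectation)
```

The cancellation is visible algebraically: each sample's probability carries the same $\sum y$ that appears in the estimate's denominator.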
Sen (1952) worked out the consequences of the ideas contained in Midzuno's 1950 paper, and among other results he obtained independently the same type of ratio estimate as Lahiri's in the case of sampling without replacement and in two stages. He was concerned mostly with samples of size n = 2 (Sen et al., 1954).
The foregoing account will show that there was a unity of theme in the works of Midzuno, Lahiri, and Sen. In their investigations they assumed from the beginning that the probability with which the sample should be drawn should be some function of known measurable characteristics related to the elements under study. Subsequently it will be shown that their estimator is a particular case of a more general estimator formulated by Horvitz and Thompson (1952).
A more general approach to the problem of sampling finite populations with unequal probabilities and without replacement of elements was formulated by Horvitz and Thompson (1951), and subsequently this paper was published in full in 1952. In their theory the probabilities of selection were completely arbitrary at each draw. As we shall see later, the generality of this approach is of considerable theoretical advantage because it provides the basis for determining the optimum probabilities in any defined sense. They formulated three classes of linear estimators of the population total with coefficients for each class depending on the order of draw, the presence or absence of an element in a sample, and the particular sample in question. Subsequently the general estimators corresponding to each of these features of sample formation will typify what will be designated as class one, class two and class three respectively.
However, the logical consequences of these ideas were not explored except in respect of the estimator with coefficients depending on the appearance or nonappearance of the elements in the sample. These coefficients were determined (a) by imposing the condition of unbiasedness, i.e., requiring the expected value of the estimator to be equal to the quantity under estimation (the total in this case); and (b) by requiring that they shall be independent of the properties of the population, i.e., insisting further that the expected value of the estimator shall be identically equal to the quantity under estimation. In the sequel the distinction between conditions (a) and (b) will become more evident. The sampling variance of this estimator was given as well as its estimator. The extension of the theory for the case of two-stage sampling was also given. They also attempted to determine those values of the initial selection probabilities that would result in a low variance on the basis of information on an auxiliary correlated variable known for each element of the universe.
Yates and Grundy (1953) noted that the Horvitz-Thompson estimator
of variance can assume negative values.
They proposed another unbiased
estimator believed to be less often negative.
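The Horvitz-Thompson estimator and the Yates-Grundy form of its variance estimator can be sketched numerically for n = 2. The probability system below (a second draw that renormalizes the first-draw probabilities) is only one assumed illustration, and the x and p values are invented; the inclusion probabilities $\pi_i$ and $\pi_{ij}$ are obtained by enumerating all ordered samples.

```python
from itertools import permutations

# Sketch of the Horvitz-Thompson estimator and the Yates-Grundy variance
# estimator for n = 2.  The probability system is an assumed illustration
# (second draw renormalizes the first-draw probabilities p), and the x
# values are invented; pi_i and pi_ij come from enumerating ordered samples.
x = [2.0, 5.0, 3.0, 8.0]
p = [0.1, 0.2, 0.3, 0.4]
N, n = len(x), 2

P_order = {}                               # probability of each ordered sample
for i, j in permutations(range(N), n):
    P_order[(i, j)] = p[i] * p[j] / (1.0 - p[i])

pi = [sum(P for s, P in P_order.items() if i in s) for i in range(N)]
pi2 = {(i, j): P_order[(i, j)] + P_order[(j, i)]
       for i in range(N) for j in range(i + 1, N)}

def ht(i, j):
    """Horvitz-Thompson estimate of the total for the distinct sample {i, j}."""
    return x[i] / pi[i] + x[j] / pi[j]

# Unbiasedness: the expectation over all distinct samples is the x-total.
expectation = sum(P * ht(i, j) for (i, j), P in pi2.items())
assert abs(expectation - sum(x)) < 1e-9

def yg_var(i, j):
    """Yates-Grundy estimate of variance for the sample {i, j} (n = 2)."""
    return ((pi[i] * pi[j] - pi2[(i, j)]) / pi2[(i, j)]) \
           * (x[i] / pi[i] - x[j] / pi[j]) ** 2
print(expectation, yg_var(0, 1))
```

The Yates-Grundy form is negative only when some $\pi_i \pi_j < \pi_{ij}$, which is the sense in which it is "less often negative" than the original Horvitz-Thompson variance estimator.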
Narain (1951) among other questions considered the estimation of the mean value of a population of clustered elements by sampling in two stages, sampling first the cluster (treated as the primary unit) with unequal probabilities and without replacement, and then a constant number of elements in each cluster at the second stage with equal probabilities. He chose as his estimator the unweighted mean of the characteristic of the elements under study, and for sampling primary units of size two he determined from the condition of unbiasedness the expression for the initial probabilities. These turned out to be complicated functions of the cluster sizes. To circumvent this difficulty he proposed the selection of groups of primary units, which he termed "group sampling," from among all possible combinations with certain predetermined probabilities for each group or sample. In order that the estimator may be unbiased, he chose probabilities of including a given unit proportional to the product of the sample size and cluster size. He wrote down the expressions for these probabilities for the case of samples of two units from a universe of four. Narain's estimator is of the same type as Horvitz and Thompson's estimator described earlier. The sampling variance of his estimator is also of the same type as the one obtained by Horvitz and Thompson for their two-stage case.
Das (1951) also proposed an unbiased estimator (for the case of single-stage sampling) which was entirely different from any described in the foregoing account. The weights attached to each observed characteristic were dependent partly on the order of the draw and partly on the element appearing at the draw. He also obtained the expression for the sampling variance and its estimator, which can assume negative values.

Des Raj (1956) proposed two new unbiased estimators with coefficients which depend partly on the element itself and partly on the order of the draw. In the same paper he proposed a third, but he was apparently unaware that it was essentially the same as Das'. The estimates of variance for these estimators can take negative values except for a special form of one of his estimators. He further gave the extensions of the theory for multi-stage sampling.
Earlier (1954) he had explored the special form of the Midzuno-Lahiri-Sen estimator, and he stated that it was the only unbiased estimator in the class. Subsequently it will appear that this statement is not true. He also gave the appropriate theory for the multi-stage and two-phase extensions of this estimator, but in regard to the former problem in a more straightforward manner than Sen. The estimators of the variance in each case discussed in this 1954 paper can also be negative. The conditions under which the estimator of variance shall be positive even for single-stage sampling have not been explored.
Godambe (1955) put forward a unified theory of sampling finite populations. He defined symbolically a system of probabilities for the selection of each element which are functions specified by the following arguments: the individual selected, the particular draw, and the sequence of elements preceding the individual element selected. For example, probabilities associated with drawing any element in a stratified sampling scheme can be expressed with a unity of notation. On the basis of these probabilities the total probability of any sequence of elements appearing can be uniquely determined. What may perhaps be termed a weakness in the generality of approach manifests itself here; the question as to what is the total number of possible samples, given that there are N distinct elements in the universe, which are themselves the units of sampling, is not clear. Clearly the answer to this question will depend on how the elements are split up into strata and whether, when sampling, they are, or are not, replaced. Ignoring the problem of stratification, let us enumerate the number of possible samples appropriate to the following situations when n drawings are made:
(i) The elements are not replaced after each draw.

(ii) For the first $r_1 - 1$ draws the elements are replaced, but at the $r_1$th draw the element drawn is not replaced; next, at the $(r_1+1)$th and up to the $(r_1+r_2-1)$th draw the elements are replaced, but at the $(r_1+r_2)$th draw the element drawn is not replaced; and so on up to the last set of $r_k$ draws. In all there are $\sum_{q=1}^{k} r_q = n$ draws and $1 \leq r_q \leq n$.

(iii) The elements are replaced after each draw.
A "chaotic" situation can also arise if" prior to any draw, after
deciding not to replace, we reverse our decision and put back some or
all of the elements previously removed.
-e
Conceptually, a theory of
sampling taking into account such type of "erratic" behavior on the
part of the sampler is possible.
In the climate of a unified theory
such a suggestion is admissible, even though the appropriate situation
However, in view of its nature, we shall not
would occur infrequently.
discuss this question further.
In situation (i) there are only $N(N-1)\cdots(N-n+1)$ possible samples; in (ii) there are $N^{r_1}(N-1)^{r_2}\cdots(N-k+1)^{r_k}$ possible samples; and in (iii) there are $N^n$ possible samples. In regard to (iii) there may be cases when we desire, say, n distinct units in the sample, and the drawing may proceed up to the point when n distinct units are obtained. In this case it is possible for n > N. Sukhatme and Narain (1952) have discussed the point in relation to certain multi-stage designs. The same remarks apply to situation (ii).
I have enlarged on this topic since it helps to bring into relief the special theories possible within the total field of a unified theory. In this thesis I shall limit myself to situation (i), i.e., where the elements are not replaced after each draw.
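The three counts are easy to evaluate concretely. The values of N and n below, and the run lengths $r_q$ for situation (ii), are illustrative assumptions:

```python
from math import prod

# Counts of possible ordered samples in the three situations, for
# illustrative N and n.  The run lengths r_q for situation (ii) are an
# assumed example with r_1 + r_2 = n.
N, n = 5, 3

count_i = prod(N - k for k in range(n))    # (i): N(N-1)...(N-n+1)
count_iii = N ** n                         # (iii): N^n, replacement throughout

r = [2, 1]                                 # (ii): k = 2 sets of draws
assert sum(r) == n
count_ii = prod((N - q) ** r[q] for q in range(len(r)))   # N^{r_1}(N-1)^{r_2}
print(count_i, count_ii, count_iii)
```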
To continue with Godambe's ideas, it must be said that the general estimator which he proposed is referable to any of the above situations relating to the question of replacement and nonreplacement. He proves that in such a class of estimators there is no unique minimum variance estimator, i.e., one having coefficients which are entirely independent of the properties of the population.

In my view his proof is not convincing for the following reasons. Prior to minimizing the variance function subject to N restraints (stemming from the condition of unbiasedness) he has eliminated from one part of his variance function the unknown coefficients or weights (which are the subject of study) by the use of the equations of restraint. Therefore the position is that he may not have been studying the variance function of his most general type of linear estimator, but some other modified function. Also, if the classical method of Lagrange is strictly adhered to, i.e., simply augmenting the function under study without in any way tampering with its form, then a different set of equations may be obtained.
1.4 Theory Treated

In the special theory which is the concern of this thesis, I will not follow the above procedure of minimization for the reasons already given.

In the title of the thesis the words "general theory" appear. Here there are semantic difficulties in how this theory should be termed. I have referred to it in the body of the thesis merely as "the special theory," to be logically consistent with the words "a unified theory" already appropriated, and I believe rightly so, by Godambe. It is to be interpreted as general partly in the sense that the numerical values of the probabilities to be used in any practical situation are discretionary, and partly in the sense that the system of ideas on the basis of which the estimators are constructed is relevant to any scheme of sample formation where the elements, as units of sampling, are drawn one at a time. I stress this because it is possible to construct a theory of sampling--somewhat "odd" but nevertheless rational--where groups or clusters not necessarily having the same number of elements are simultaneously sampled. Effectively, therefore, "sample size" differs in the drawing of each group. It is possible to construct an unbiased estimator of the total and an unbiased estimate of the sampling variance. It will be noted that such a theory is not encompassed by the present unified theory.
1.5 Notation

I have followed the standard notation of denoting an element by $u_m$, with the subscript denoting the particular element in question. Further, a sub-subscript n attached to a subscript m, such as in $u_{m_n}$, indicates that the mth element appears at the nth draw.

Next, $x_i$ denotes the value of a characteristic of $u_i$; and sometimes a sub-subscript t attached to i, such as in $x_{i_t}$, indicates that $u_i$, which bears the characteristic $x_i$, appears at the tth draw.

In regard to arbitrary probabilities, I have followed the notation of Horvitz and Thompson, and of Des Raj (1956), by writing "p" in a lower case letter with the subscript denoting the element and the sub-subscript denoting the order of the draw. The letters attached to p as superscripts indicate the elements which have already appeared.
For example, $p^{st}_{i_3}$ denotes the probability of drawing $u_i$ at the third draw after $u_s$ and $u_t$ have appeared at the first and second draws.

The unconditional probability for drawing a given element at a particular draw is denoted by a capital "P" with the subscript denoting the element and the sub-subscript the order of the draw under consideration. Thus $P_{i_t}$ denotes the probability of drawing $u_i$ at the tth draw. A similar notation $P_{i_t,\,j_w}$ denotes the unconditional probability of drawing $u_i$ and $u_j$ at the tth and wth draws respectively, the tth draw preceding the wth; also a comma separates $i_t$ and $j_w$, which appear on the same line.

Horvitz and Thompson's notation for the probability of including $u_i$ in a sample of n is $P(u_i)$; I have changed this simply to $P_i$, to be notationally consistent with the result

$$\sum_{t=1}^{n} P_{i_t} = P_i,$$

the meaning of which will be apparent. Incidentally, Thompson (1952) used $P_i$ in his thesis. Similarly, for their $P(u_i, u_j)$, which denotes the probability that $u_i$ and $u_j$ appear in a sample of n, I have written simply $P_{ij}$, the subscripts i and j appearing in the same line. This again is notationally consistent with the result

$$\sum_{t,w} P_{i_t,\,j_w} = P_{ij}.$$
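The identity $\sum_{t=1}^{n} P_{i_t} = P_i$ can be checked by brute-force enumeration under any concrete probability system. The two-draw system below (a second draw that renormalizes the first-draw probabilities) and its p values are assumed purely for illustration:

```python
from itertools import permutations

# Numerical check of the identity  sum over t of P_{i_t} = P_i  under an
# assumed two-draw probability system: the second draw renormalizes the
# first-draw probabilities p over the remaining elements.
p = [0.1, 0.2, 0.3, 0.4]                  # illustrative first-draw probabilities
N, n = len(p), 2

# P_draw[i][t]: unconditional probability that u_i appears at draw t.
P_draw = [[0.0] * n for _ in range(N)]
for i, j in permutations(range(N), n):
    prob = p[i] * p[j] / (1.0 - p[i])      # ordered sample: u_i first, u_j second
    P_draw[i][0] += prob
    P_draw[j][1] += prob

# P_i: probability of inclusion of u_i in the sample of n draws.
P_incl = [sum(P_draw[i]) for i in range(N)]
assert abs(sum(P_incl) - n) < 1e-9         # inclusion probabilities sum to n
print(P_incl)
```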
Cansado (1955) developed a system of notation in which the u's always appear in the representation of arbitrary probabilities and unconditional probabilities. However, his formulas require more space and, typographically speaking, more labor in composition. The omission of the u's saves much space. Also, $P_{ij}$ is essentially Singh's (1954) notation for Horvitz and Thompson's $P(u_i, u_j)$, and $P_{i_t}$ is Des Raj's (1956) notation for unconditional probabilities.
The lower case letter s identifies a given sample of n elements in the context used. Thus $P_s$ represents the total probability of a given sample, s, of n elements appearing, and $P_{s(o)}$ represents the joint probability of a given sample of n elements appearing in a specified order. Also there is notational consistency in the result

$$\sum_{o} P_{s(o)} = P_s,$$

where the summation is taken over all the n! possible ways in which the n elements can appear.
The capital letter M indicates the total number of samples in all possible orders of appearance and S the total number of distinct samples.

Regarding summation symbols, $\sum_{i \in s}$ denotes summation over all n elements which are included in sample s, a typical element being $u_i$. Also, $\sum_{s \supset i}$ denotes summation over all samples which include $u_i$. Symbols such as $\sum_{i<j \in s}$ and $\sum_{s \supset i,j}$ bear similar meanings.

Seven Greek letters, $\alpha, \beta, \gamma, \delta, \theta, \phi, \psi$, which denote weights to be attached to the seven different classes of estimators, have subscripts and sub-subscripts whose meaning will be indicated when they are used.
The population total of a characteristic in question is denoted by T. However, this T with a numerical subscript, such as $T_2$, represents a class of estimators indicated by the numeral, each of which is an estimator of this population total.

The symbols not mentioned here, except those in standard usage, are explained in each particular context.
II. DEFINITION OF THE PROBABILITY SYSTEM

Consider a population of N elements, $u_1, u_2, \ldots, u_N$. Each element has a certain number of measurable characteristics. By taking a sample in a way to be specified, and observing their characteristics, it is proposed to estimate the aggregate of each of these measurable characteristics pertaining to the population. It is given that the probabilities of selection at each draw are arbitrary in the sense that the choice of the numerical values which they may assume is discretionary. Later we shall see how these probabilities may be chosen. Also the reason for leaving them arbitrary will reveal itself in the course of the discussion.
The notion of arbitrary probabilities of selection for each draw in sampling finite populations is due to Horvitz and Thompson (1952), and the definition of the probability system which follows is implicit in their 1952 paper. The later elaborations in the display of this idea and notation (the need for which was noted by the two authors) were made independently by Singh (1954), Cansado (1955) and Des Raj (1956).
Prior to the first draw let the probability of selection of the ith element be $p_{i_1}$ (i = 1, 2, ..., N), where

$$p_{i_1} \geq 0 \quad \text{and} \quad \sum_{i=1}^{N} p_{i_1} = 1.$$
Again, prior to the second draw, when one element has already been removed, let the probability of selecting the ith element at the second draw be either $p^1_{i_2}$ or $p^2_{i_2}$ or ... or $p^N_{i_2}$, depending on which of the N elements has been removed at the first draw, where

$$\sum_{i(\neq t)=1}^{N} p^t_{i_2} = 1, \qquad t = 1, 2, \ldots, N.$$

The superscript t in the symbol $p^t_{i_2}$ indicates the element which has been removed from the universe. Thus there are N possible sets of selection probabilities, speaking in an operational sense, or N possible sets of probability distributions, speaking in an abstract sense.
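The N sets of second-draw probabilities can be realized concretely. The renormalization used below is only one admissible choice (the system leaves the numerical values entirely arbitrary, subject to each set summing to one), and the first-draw probabilities are illustrative:

```python
# Sketch of the N sets of second-draw selection probabilities p^t_{i_2}.
# The renormalization below is just one admissible choice; the system
# leaves these values arbitrary subject to each set summing to 1.
p1 = [0.1, 0.2, 0.3, 0.4]            # first-draw probabilities (illustrative)
N = len(p1)

# p2[t][i]: probability of u_i at the second draw given u_t was removed.
p2 = {}
for t in range(N):
    p2[t] = {i: p1[i] / (1.0 - p1[t]) for i in range(N) if i != t}

# Each of the N conditional distributions sums to one.
for t in range(N):
    assert abs(sum(p2[t].values()) - 1.0) < 1e-9
print(len(p2))                        # N possible sets, one per removed element
```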
Finally, prior to the nth draw, let there be ${}^{N}C_{n-1}$ sets of selection probabilities for the remaining N-n+1 elements, depending on which of the n-1 elements have been selected at the preceding n-1 draws; symbolically these ${}^{N}C_{n-1}$ sets of selection probabilities may be written as

$$p^{i_1 i_2 \cdots i_{n-1}}_{i_n},$$

where

$$\sum_{i_n} p^{i_1 i_2 \cdots i_{n-1}}_{i_n} = 1$$

and $u_{i_1}, u_{i_2}, \ldots, u_{i_{n-1}}$ have already been drawn prior to $i_n$.

The above probability system defines the way in which the sample is to be drawn. It is abstract, and we may say that the frame which describes the elements, and the technique of sampling by which these elements, as units of sampling, are selected with probabilities prescribed by the system, are its real counterparts.
Thus associated with drawing a sample of size $n$ there are in all

$$D = 1 + {}^{N}C_{1} + {}^{N}C_{2} + \cdots + {}^{N}C_{n-1}$$

possible sets of selection probabilities to be defined prior to each draw. In practice, of course, only $n$ sets out of the $D$ possible sets are used. We shall hereafter speak of these $D$ sets of selection probabilities (or probability distributions) as the probability system, or sometimes simply as the system when there is no danger of ambiguity.
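The count $D$ above, together with the count $G$ of selection probabilities derived in the next paragraph, is a simple combinatorial sum. A minimal Python sketch (the function names are mine, and the values of $N$ and $n$ are assumed for illustration) computes both directly:

```python
from math import comb

def num_probability_sets(N, n):
    # D = 1 + NC1 + ... + NC(n-1): one set prior to the first draw, and
    # C(N, r) sets prior to draw r+1, one for each possible set of r
    # already-removed elements.
    return 1 + sum(comb(N, r) for r in range(1, n))

def num_selection_probabilities(N, n):
    # G = sum_{r=0}^{n-1} (N - r) * C(N, r): each set defined prior to
    # draw r+1 contains N - r selection probabilities.
    return sum((N - r) * comb(N, r) for r in range(n))

D = num_probability_sets(5, 3)          # 1 + 5 + 10 = 16
G = num_selection_probabilities(5, 3)   # 5 + 4*5 + 3*10 = 55
```

For $N=5$ and $n=3$ this gives $D = 16$ and $G = 55$.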
It will be noted that in all there are

$$G = N + (N-1)\,{}^{N}C_{1} + (N-2)\,{}^{N}C_{2} + \cdots + (N-n+1)\,{}^{N}C_{n-1} = \sum_{r=0}^{n-1} (N-r)\,{}^{N}C_{r}$$

arbitrary selection probabilities in the system when $n$ elements are selected without replacement. Obviously there is an infinite number of probability systems, including the one where the selection probabilities are equal at each draw.
Subsequently it will appear that some are better than others in the sense that for the estimation of a given characteristic, a given estimator (irrespective of whatever properties it may already possess), using a given probability system, will have smaller variance than when an alternative system is used. One topic of research in the theory of unequal probability sampling is the search for such systems.
Given the above probability system, the a priori probabilities for the selection of a given element at a given draw, for the selection of a group of elements in a specified order, etc., are needed for the development of the theory and can all be evaluated.

The various formulas on probabilities given below are contained in the articles of Singh (1954), Cansado (1955) and Des Raj (1956), and those of a less involved nature in Horvitz and Thompson (1952), Narain (1951), and in Sukhatme's book (1953). They are reproduced here in order that the discussion may be coherent.
Let a sample of size $n$ be selected (without replacement) according to the above probability system, and let $x_{i_1}, x_{j_2}, \ldots, x_{m_n}$ be the values of a given characteristic observed on the $i$th, $j$th, $\ldots$, $m$th selected elements (or sample units) respectively, according to the order indicated in the sub-subscripts; i.e., element $u_i$ bearing the characteristic $x_i$ is drawn first, $u_j$ bearing the characteristic $x_j$ is drawn second, and so on. Denoting by $P_{it}$ the probability of $u_i$ being selected at the $t$th draw, the expressions for these unconditional probabilities are as follows:
$$P_{i2} = \sum_{j(\neq i)=1}^{N} p_{j1}\, p_{i2}^{j} \qquad \text{(here there are } {}^{N-1}C_{1} \text{ terms under summation),}$$

$$P_{i3} = \sum_{s,t(\neq i)} p_{s1}\, p_{t2}^{s}\, p_{i3}^{st} \qquad \text{(here there are } 2!\,{}^{N-1}C_{2} \text{ terms under summation),}$$

and finally

$$P_{in} = \sum p_{j1}\, p_{k2}^{j} \cdots p_{in}^{jk\ldots t} \qquad \text{(here there are } (n-1)!\,{}^{N-1}C_{n-1} \text{ terms under the summation sign).}$$
In the expression for $P_{in}$ above, the symbols $j,k,\ldots,t$ appearing as superscripts in $p_{in}^{jk\ldots t}$ denote the $n-1$ elements $u_j, u_k, \ldots, u_t$ which are drawn prior to $u_i$, the element drawn at the $n$th draw. If we sum these unconditional probabilities we have the probability of $u_i$ appearing in a sample of $n$. Denoting this probability by $P_i$ we have

$$P_i = \sum_{t=1}^{n} P_{it} \; .$$

Further

$$\sum_{i=1}^{N} P_{it} = 1 , \qquad \text{for } t = 1,2,\ldots,n \; .$$
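The identities $\sum_i P_{it} = 1$ and $P_i = \sum_t P_{it}$ can be checked by complete enumeration for a small population. The sketch below assumes one concrete probability system — the remaining $p$'s are renormalized after each removal — purely as an example; the theory above allows the conditional probabilities to be arbitrary:

```python
from itertools import permutations

# Assumed example values for the initial probabilities p_{i1}
p = [0.1, 0.2, 0.3, 0.4]
N, n = 4, 3

def prob_of_order(order):
    # Probability of drawing the elements of `order` in that order,
    # renormalizing the remaining p's after each removal.
    prob, remaining = 1.0, 1.0
    for i in order:
        prob *= p[i] / remaining
        remaining -= p[i]
    return prob

# P_{it}: unconditional probability that element i is selected at draw t
P = [[sum(prob_of_order(o) for o in permutations(range(N), t + 1) if o[t] == i)
      for t in range(n)] for i in range(N)]

draw_totals = [sum(P[i][t] for i in range(N)) for t in range(n)]  # each equals 1
P_incl = [sum(P[i]) for i in range(N)]                            # P_i = sum_t P_{it}
```

Summing the inclusion probabilities $P_i$ over all elements gives $n$, as it must.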
If $P_{it,jw}$ denotes the probability that the $i$th element is selected at the $t$th draw and the $j$th element at the $w$th draw $(w > t)$, then

$$P_{it,jw} = \sum p_{a1} \cdots p_{it}^{ab\ldots e} \cdots p_{jw}^{ab\ldots e\,i\,f\ldots lm} \; .$$

In the above expression there are $(w-2)!\,{}^{N-2}C_{w-2}$ terms under the summation sign. The symbols appearing as superscripts in $p_{jw}^{ab\ldots e\,i\,f\ldots lm}$ denote the $t-1$ elements $u_a, u_b, \ldots, u_e$ which were drawn prior to $u_i$, the one element $u_i$ which appeared at the $t$th draw, and the $w-(t+1)$ elements $u_f, \ldots, u_m$ which appeared after the $t$th but preceding the $w$th draw. Also we will find

$$\sum_{j(\neq i)=1}^{N} P_{it,jw} = P_{it} ,$$

and since

$$\sum_{i=1}^{N} P_{it} = 1 \qquad \text{for all } t ,$$

we have

$$\sum_{i=1}^{N} \sum_{j(\neq i)=1}^{N} P_{it,jw} = 1 \; .$$
Denoting by $P_{ij}$ the probability of including the $i$th and the $j$th elements in a sample of size $n$, it will be found that

$$P_{ij} = \sum_{t<w} \left( P_{it,jw} + P_{jt,iw} \right) ,$$

and further

$$\sum_{j(\neq i)=1}^{N} P_{ij} = (n-1) P_i \; .$$

Further, if $P_{s(o)}$ denotes the probability that a given sample of elements $u_a, u_b, \ldots, u_k, u_m$ appear in the order indicated by the sub-subscripts, then

$$P_{s(o)} = p_{a1}\, p_{b2}^{a} \cdots p_{mn}^{ab\ldots k} ,$$

and the total probability of obtaining $s$ irrespective of order is given by

$$P_s = \sum_{(o)} P_{s(o)} ,$$

where the summation is taken over all possible orders of appearance of $u_a, \ldots, u_m$, there being $n!$ terms in all.
Taking into account order it will be evident that there are $n!\,{}^{N}C_{n} = S$ samples of $n$ elements, each set of $n$ having probability $P_{s(o)}$ of appearing, which are not necessarily the same even for samples composed of the same elements. Also it will be evident that there are only $^{N}C_{n} = S'$ distinct samples, each distinct set of $n$ elements having a total probability $P_s$ of appearing. These results have a bearing on what is to follow.

Finally, it will be apparent that

$$\sum_{s=1}^{S'} P_s = 1$$

and

$$P_{it} = \sum_{s(o) \ni i_t} P_{s(o)} ,$$

where $s(o) \ni i_t$ means all samples which include $u_i$ when it appears at the $t$th draw.
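The assertions about the sample space — $S = n!\,{}^{N}C_{n}$ ordered samples with probabilities $P_{s(o)}$ summing to unity, and $S' = {}^{N}C_{n}$ distinct samples with total probabilities $P_s$ — can likewise be verified by enumeration. The renormalizing system below is an assumed example:

```python
from itertools import combinations, permutations
from math import comb, factorial

p = [0.1, 0.2, 0.3, 0.4]   # assumed initial selection probabilities
N, n = 4, 2

def prob_of_order(order):
    # One assumed concrete system: renormalize the remaining p's
    # after each removal.
    prob, remaining = 1.0, 1.0
    for i in order:
        prob *= p[i] / remaining
        remaining -= p[i]
    return prob

ordered = list(permutations(range(N), n))          # the S ordered samples
S = len(ordered)
total = sum(prob_of_order(o) for o in ordered)     # should equal 1

# P_s: total probability of each of the S' distinct samples
P_s = {c: sum(prob_of_order(o) for o in permutations(c))
       for c in combinations(range(N), n)}
```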
III. ENUMERATION AND CONSTRUCTION OF ESTIMATORS

3.1 The Axioms of Sample Formation

In drawing a sample according to the probability system defined above (i.e., one element at a time and without replacement from the finite population of N), three features inherent in the nature of the process of selection are evident. They are as follows:
(i) the order of appearance of the elements,

(ii) the presence or absence of any given element (in the sample) which is a member of the population (or universe), and

(iii) the set of elements composing the sample considered as one of the total number possible (in repeated sampling according to the given probability system).
In regard to (ii) it may be pointed out that if an element is assigned an arbitrary probability of zero at each of the $n$ draws, then it can never appear in a sample. The statements at (i), (ii), (iii), of course, are perfectly general and apply to sampling from any finite or infinite population, one element at a time, and with or without replacement. Further, as their veracity is self-evident, we may designate the three of them as axioms.

These features, inherent in the process of selection (and as a result sample formation), supply the bases for the construction of estimators. Thus we may say that the approach here in constructing estimators is a deductive one.
3.2 The Seven General Estimators

Let

$$T = \sum_{i=1}^{N} x_i ,$$

where $x_i$ is the measure of a certain characteristic of $u_i$ $(i=1,2,\ldots,N)$, be the population total to be estimated on the basis of the sample $x_{i_1}, x_{j_2}, \ldots, x_{m_n}$.

Considering the order of appearance of the elements we have the estimator

$$T_1 = \sum_{t=1}^{n} \alpha_t x_t ,$$

where $\alpha_t$ $(t=1,2,\ldots,n)$ is the weight to be attached to the element selected at the $t$th draw. (For convenience the subscripts identifying elements have been dropped from the $x$'s.)
Considering the presence or absence of an element in the sample, we have

$$T_2 = \sum_{i \in s} \beta_i x_i ,$$

where $\beta_i$ $(i=1,2,\ldots,N)$ is the weight to be attached to the $i$th element whenever it appears in the sample.

Considering the sample obtained as one of the set of all possible distinct samples we have

$$T_3 = \gamma_s \sum_{i \in s} x_i ,$$

where $\sum_{i \in s}$ means summation over all elements included in the $s$th sample, and where $\gamma_s$ $(s=1,2,\ldots,S')$ is the weight to be attached to the $s$th sample whenever it is selected. It will be recalled that there are only $S' = {}^{N}C_{n}$ distinct samples of $n$ elements.
Further, four other estimators are possible, taking into consideration two or all of the features at a time. Thus we have

$$T_4 = \sum_{t=1}^{n} \delta_{it} x_t ,$$

where $\delta_{it}$ $(i=1,2,\ldots,N;\ t=1,2,\ldots,n)$ is the weight to be attached to the $i$th element whenever it appears at the $t$th draw. Next,

$$T_5 = \sum_{i \in s} \theta_{is} x_i ,$$

where $\theta_{is}$ $(i=1,2,\ldots,N;\ s=1,2,\ldots,S')$ is the weight to be attached to the $i$th element whenever it appears in the $s$th sample, and where $\sum_{i \in s}$ means summation over all $i$ elements (there are $n$ of them) included in $s$, considered without regard to the order of appearance of the elements.
Next,

$$T_6 = \varepsilon_s \sum_{i \in s} x_i ,$$

where $\varepsilon_s$ $(s=1,2,\ldots,S)$ is the weight to be attached to the $s$th sample whose elements appear in a specified order, and where $\sum_{i \in s}$ means summation over those elements in the ordered sample $s$. It may be recalled that there are $S = n!\,{}^{N}C_{n}$ samples, taking into consideration order of appearance, each of which has a probability $P_{s(o)}$ of occurring.
Finally,

$$T_7 = \sum_{t=1}^{n} \varphi_{ts} x_t ,$$

where $\varphi_{ts}$ $(t=1,2,\ldots,n;\ s=1,2,\ldots,S)$ is the weight to be attached to the $i$th element appearing at the $t$th draw in the $s$th sample (whose elements, of course, appear in a specified order).
Thus there are seven different estimators of the population total. Each estimator, which is a linear function of the characteristics of the observed elements of the population, constitutes a class in the sense that for any given probability system there exists a set or sets of weights which can be determined from conditions arising out of the application of the criterion that the estimator shall be unbiased. In each class there will be as many sets of weights as there are linear functions which satisfy the condition of unbiasedness, and we shall attempt to choose (in each class) the one which has minimum variance in the sense of Gauss and Markoff; i.e., the best linear unbiased estimator for the class in question. For reasons which will appear in the subsequent discussion, only these seven classes of estimators are possible and no others. In the next paragraph we shall enumerate the total number of weights in each class.
3.3 The Enumeration of Weights
In class one and class two it will be evident that there are
!!. and
!
weights respectively.
As there are NCn distinct samples there
will be NC weights in class three, one applicable to each sample.
n
Regarding class four there will be Nn weights as each element has a
different weight at each draw.
In class five there are
~-lC 1
n-
weights, since each element appears in only N-1C 1 of the NC
nn
29
distinct samples.
In enumerating the weights in class six we first
note that the order of appearance of the elements is taken into consideration so that for any given set of !!. elements there are n! ways in which
samples can be formed; thus there will be n,NC weights for this class.
n
For class seven we note first that there are nJNC n samples considering
order; since each element has a weight depending on its order, there
will be in all n.n,N Cn weights.
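The weight counts for the seven classes can be tabulated mechanically; the following sketch is my own summary of the counts derived above, with $N$ and $n$ assumed for illustration:

```python
from math import comb, factorial

def weight_counts(N, n):
    # Number of weights in each of the seven classes.
    distinct = comb(N, n)              # S' distinct samples
    ordered = factorial(n) * distinct  # S ordered samples
    return {1: n,                      # one weight per draw
            2: N,                      # one weight per element
            3: distinct,               # one weight per distinct sample
            4: N * n,                  # element-by-draw weights
            5: N * comb(N - 1, n - 1), # element-by-sample weights
            6: ordered,                # one weight per ordered sample
            7: n * ordered}            # draw-by-ordered-sample weights

counts = weight_counts(5, 2)
```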
3.4 General Comments

The estimators $T_1$, $T_2$, and $T_3$, in their most general form, which specify classes one, two, and three, were proposed by Horvitz and Thompson (1951, 1952).1/ Earlier Midzuno (1950) proposed an estimator which will be shown to belong to class three. Independently, Lahiri (1951) gave a procedure for sampling a collection of units with probability proportional to the sum of their sizes, and his estimator, like Midzuno's, retained the character of a ratio estimate. Narain (1951) also independently proposed an estimator which belongs to class two, and his approach in arriving at it was essentially the same as Horvitz and Thompson's. Das (1951) also arrived at an estimator which on examination will be found to belong to class four. Sen (1952) elaborated on a particular form of Midzuno's estimator. All his estimators will be shown to belong to class three. Godambe (1955) formulated a unified theory, and his most general estimator, in the context of this special theory for a finite population when the units are replaced after each draw, is logically equivalent to $T_7$. Des Raj (1956) proposed two estimators which will be shown to belong to class four.

Regarding Godambe's way of arriving at his most general type of linear estimator, it must be said that he did so after recognizing, as I also do, the deductive approach implicit in Horvitz and Thompson's work. However, he did not posit the three features of sample formation as axioms.

Finally we note that the following statement, which appears on page 668 of Horvitz and Thompson's paper, is pregnant with suggestions:

We have indicated above only three of the possible subclasses of linear estimators of T when sampling a finite universe without replacement.

The application of these axioms has resulted in four more general ones, including Godambe's.

1/ They used the term "subclass." I have used the term class for reasons which will become apparent in Chapter IV.
IV. UNBIASED ESTIMATORS AND THE PROBLEM OF MINIMUM VARIANCE ESTIMATORS IN EACH CLASS

4.1 Introductory Statement

Having defined the estimators, we shall now determine the weights which give unbiased estimators in each class. As stated above, in each class there will be as many sets of weights as there are linear functions which satisfy the condition of unbiasedness. In each class we shall attempt to determine the linear function which has minimum variance, i.e., the best linear unbiased estimator.
Before we proceed it is pertinent to remark that the criteria of unbiasedness and minimum variance, when applied to populations nonstationary in the stochastic sense, do not always yield unbiased estimators which can be computed from sample data. Briefly, the following example illustrates this proposition. Consider an infinite population; it is desired to estimate the mean value of a certain character of this population, which is continuously changing during the period when observations are taken. One element is drawn randomly at a certain epoch of time, another at a succeeding epoch, and so on. Altogether $n$ stochastically independent elements are drawn. At the $t$th epoch, when $x_t$ is the character observed, the population has a certain mean $\mu_t$ and a certain variance $\sigma_t^2$ $(t=1,2,\ldots,n)$, which are both unknown. Let $\mu$ be the mean value of the character (during the period the observations are taken) which it is desired to estimate. Consider a linear function of the $x$'s,
$$F = \sum_{t=1}^{n} \lambda_t x_t ,$$

to estimate $\mu$. We shall determine the weights $\lambda_t$ such that

$$E(F) = \mu \quad \text{and} \quad V(F) = \min \; .$$

It can easily be shown that 1/

$$F = \mu \, \frac{\sum_{t=1}^{n} \mu_t x_t / \sigma_t^2}{\sum_{t=1}^{n} \mu_t^2 / \sigma_t^2}$$

and

$$V(F) = \frac{\mu^2}{\sum_{t=1}^{n} \mu_t^2 / \sigma_t^2} \; .$$

Thus clearly the sample values alone are not sufficient to obtain an unbiased estimate of $\mu$. We can, however, obtain some estimate of $\mu$, e.g., $\bar{x} = \sum x / n$, but we would not know how precise it would be. One may therefore ask, does a best linear unbiased estimator independent of the properties of the population $(\mu, \mu_t, \sigma_t^2,$ etc.$)$ exist? The answer is certainly no to this question. Formally it does exist in some sense, although from a practical point of view it is useless.
When the population is stationary, i.e., when all the respective moments relating to each epoch are equal, which implies that $\mu_1 = \cdots = \mu_n = \mu$ and $\sigma_1^2 = \cdots = \sigma_n^2 = \sigma^2$, then we have the well-known classic results:

$$F = (1/n) \sum x \quad \text{and} \quad V(F) = \sigma^2 / n \; .$$

1/ When we consider the estimation of a mean value with data from samples of size one each coming from $n$ infinite stationary populations with different means and variances, the same results are also obtained. But it will be recognized that the problems are not logically equivalent.
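The weights of $F$ and the formula for $V(F)$ can be exhibited numerically; the epoch means and variances below are assumed example values (in practice they are unknown, which is precisely the difficulty discussed above):

```python
mu = 10.0                      # mean to be estimated (assumed known here
                               # only to exhibit the weights)
mu_t = [9.0, 10.0, 12.0]       # assumed epoch means
var_t = [1.0, 4.0, 2.0]        # assumed epoch variances

denom = sum(m * m / v for m, v in zip(mu_t, var_t))
weights = [mu * (m / v) / denom for m, v in zip(mu_t, var_t)]  # lambda_t

E_F = sum(w * m for w, m in zip(weights, mu_t))      # = mu (unbiasedness)
V_F = sum(w * w * v for w, v in zip(weights, var_t)) # = mu^2 / denom
```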
Logically, therefore, these results could not have been possible without the formal existence of $F$. There is nothing new in the above formulas, but in the theory of estimation it appears that the underlying implications have little or no significance. However, in the theory of sampling finite populations (in the present context) there are somewhat analogous results which are of interest in a number of ways. In what follows we shall comment on these types of results when they appear.
4.2 Class One Estimators

Let us consider the general estimator in class one; i.e.,

$$T_1 = \sum_{t=1}^{n} \alpha_t x_t \; .$$

First it is pertinent to inquire whether weights can be obtained for this estimator which are independent of the properties of the population. We have

$$E(T_1) = \sum_{t=1}^{n} \alpha_t E(x_t) \; .$$

To satisfy the condition of unbiasedness we must have

$$\sum_{i=1}^{N} x_i \sum_{t=1}^{n} \alpha_t P_{it} = T = \sum_{i=1}^{N} x_i \; ,$$

and for the $\alpha$'s to be independent of the properties of the population we must insist on the condition of identity, i.e.,
$$\sum_{i=1}^{N} x_i \sum_{t=1}^{n} \alpha_t P_{it} \equiv \sum_{i=1}^{N} x_i \; ,$$

which leads to the conditions

$$\sum_{t=1}^{n} \alpha_t P_{it} = 1 , \qquad \text{for } i=1,2,\ldots,N \; .$$

These conditions are clearly impossible to satisfy, since $n$ $(< N)$ unknowns cannot simultaneously satisfy $N$ equations. Thus this estimator cannot have weights which are independent of the properties of the population, except in a special case which we shall derive as follows. Initially we shall attempt to find weights without insisting on the condition of identity, i.e., requiring only $E(T_1) = T$, and also such that $V(T_1) = \min$. Next, from results arising out of this consideration, we shall show that the best linear unbiased estimator exists with weights independent of the population values when the selection probabilities are made equal at each draw.
We have 1/

$$V(T_1) = E\left[ T_1 - E(T_1) \right]^2 = E\left[ \sum_{t=1}^{n} \alpha_t \left( x_{i_t} - \sum_{i=1}^{N} P_{it} x_i \right) \right]^2$$

$$= \sum_{t=1}^{n} \alpha_t^2 \, E\left( x_{i_t} - \sum_{i=1}^{N} P_{it} x_i \right)^2 + 2 \sum_{t<w} \alpha_t \alpha_w \, E\left( x_{i_t} - \sum_{i=1}^{N} P_{it} x_i \right)\left( x_{j_w} - \sum_{j=1}^{N} P_{jw} x_j \right) .$$

1/ I have reintroduced the subscripts $i$ identifying the elements in the $x$'s, as this is necessary for the sake of clarity in defining the moments.
Defining the following moments 2/

$$\mu_1(t) = E(x_{i_t}) = \sum_{i=1}^{N} P_{it} x_i , \qquad \text{for } t=1,2,\ldots,n ,$$

$$\mu_2(t) = E\left[ x_{i_t} - \mu_1(t) \right]^2 = \sum_{i=1}^{N} P_{it} \left[ x_i - \mu_1(t) \right]^2 ,$$

$$\mu_{11}(t,w) = E\left[ x_{i_t} - \mu_1(t) \right]\left[ x_{j_w} - \mu_1(w) \right] = \sum_{i=1}^{N} \sum_{j(\neq i)=1}^{N} P_{it,jw} \left[ x_i - \mu_1(t) \right]\left[ x_j - \mu_1(w) \right] ,$$

which are analogues of those in equal probability sampling, we find

$$V(T_1) = \sum_{t=1}^{n} \alpha_t^2 \mu_2(t) + 2 \sum_{t<w} \alpha_t \alpha_w \mu_{11}(t,w) ,$$

and the condition of unbiasedness becomes

$$T = \sum_{t=1}^{n} \alpha_t \mu_1(t) \; .$$

To determine the $\alpha$'s which make $T_1$ a minimum variance estimator we set up the function

2/ The definitions of higher moments and product moments follow on the same lines.
36
where
~
is an undetermined Lagrange multiplier, and we solve the fol-
lowing equations:
5H/5..\
=0
'" 2,.(t~2(t) + 2 ~
'"'w'"1.1(t,w) + 2'"1.(t),
(t=1,2, ... ,n)
w(~t)=l
together with the condition of unbiasedness.
equations and n+l unknowns (together with
solve for the
~),
There are n+l independent
and therefore we can
They are as follows
~Js.
•
•
•
e
•
•
~'"1.1(n,1) + ,.(2~1(n,2)
-e
"'1.'"1. (1)
If
~
+ ,.(2~ (2)
+ ~
+ ,.(n~(n)
'"1. (n)
,., 0
... T
represents the men product-moment and moment matrix i
•
•
~ll(n,l), '"1.1(n,2) e., ~2(n)
,.( the column vector (~"",.(n)' ~o the row vector
~(l)""''"1.(n)
and 0 the null column vector, we have concisely the equations in matrix
notation as
[~J [f H~] ,
and thus
[f]= [~] -1[%1
37
Clearly the .... rs will be functions of the moments and product-moments,
say ft' each multiplied by
1.
Thus symbol.ically
Hence
('"." J:1. ma):,rix
r~Citation
V(T )
l
t=
T2.f~r
J
where f = [f1' f 2 , ••• ,fn is a row vector"
When the selection probabilities at each draw are made equal, i.e.,

$$p_{i1} = 1/N , \qquad \text{for } i=1,2,\ldots,N , \quad \text{etc.,}$$

we find

$$P_{it} = 1/N \qquad \text{and} \qquad P_{it,jw} = \frac{1}{N(N-1)} ,$$

so that

$$\mu_1(t) = \mu = (1/N) \sum_{i=1}^{N} x_i , \qquad \mu_2(t) = \sigma^2 = (1/N) \sum_{i=1}^{N} (x_i - \mu)^2 , \qquad \mu_{11}(t,w) = -\frac{\sigma^2}{N-1} \; .$$

Under these circumstances (of equal selection probabilities) the above results simplify to the following:

$$\alpha_t = N/n , \qquad \text{for } t=1,2,\ldots,n ,$$

so that

$$T_1 = (N/n) \sum_{t=1}^{n} x_t ,$$

and after some algebra we will find

$$V(T_1) = \frac{N^2 \sigma^2}{n} \cdot \frac{N-n}{N-1} \; ,$$

which are familiar results, the formula for the variance being due to Isserlis (1916).
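The equal-probability results, including Isserlis's variance formula, can be confirmed by enumerating every ordered sample of a small assumed population:

```python
from itertools import permutations

x = [3.0, 7.0, 8.0, 12.0]   # assumed small population of values
N, n = 4, 2
T = sum(x)
mu = T / N
sigma2 = sum((xi - mu) ** 2 for xi in x) / N

# With equal selection probabilities every ordered sample without
# replacement is equally likely, so expectations are plain averages.
samples = list(permutations(range(N), n))
ests = [(N / n) * sum(x[i] for i in s) for s in samples]
E_T1 = sum(ests) / len(ests)
V_T1 = sum((e - E_T1) ** 2 for e in ests) / len(ests)
V_isserlis = (N * N * sigma2 / n) * (N - n) / (N - 1)
```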
Before we pass on we make the following observations:

(a) When the selection probabilities are unequal, the populations of elements existing prior to each draw, as evidenced by the values of the moment coefficients, are non-stationary in the stochastic sense, and they become stationary only when the selection probabilities are equal.

(b) Evidently this is the analogue of the infinite population case noted earlier, and we note again that the property of stationarity confers advantages in that under these particular circumstances of sampling (the main feature being one element drawn at a time) unbiased estimation becomes possible.
39
It may be of interest to write down the general estimator for
the case when n-2.
The equations for solving
~, ~2
and A are as
follows:
r jJ.2 (1)
~l(l,2)
'_ ~ (1)
~l(l,2)
'1.(1)]
~(2)
~(2)
~(2)
0
and we find
Til
l(n=2)
I:;
1
- I
~I
I
0
· r T0
I
I
1_
J
T
1(n=2) ] = T~
It may be noted that EfT
One may ask under what circumstances the estimator T
-e
1(n=2)
peets of being useful.
has pros-
Suppose it is lmown (or assumed) that the
characteristic! is related to l by the approximate relationship x
=c
and that the y-values are known for all!! elements in the population.
Then we will have
and
and
where the rooments and product-moments, on the right-hand side of the
above three equations refer to those of y.. computed by the formulas
given on page
35.
from those for !.)
(The suffix l is added merely to distinguish them
y,
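The bordered system above can be solved numerically by Cramer's rule; the moment values below are assumed purely for illustration (in the simulated estimator discussed next they would be computed from the known $y$-values):

```python
# Assumed example moments of a hypothetical probability system
m2_1, m2_2, m11 = 4.0, 5.0, -1.5    # mu_2(1), mu_2(2), mu_11(1,2)
m1_1, m1_2 = 7.0, 7.5               # mu_1(1), mu_1(2)
T = 30.0                            # population total

def det3(M):
    # Determinant of a 3x3 matrix by cofactor expansion
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
          - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
          + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

A = [[m2_1, m11, m1_1],
     [m11, m2_2, m1_2],
     [m1_1, m1_2, 0.0]]
b = [0.0, 0.0, T]

D = det3(A)
sol = []
for k in range(3):
    Ak = [row[:] for row in A]      # replace column k by b (Cramer)
    for r in range(3):
        Ak[r][k] = b[r]
    sol.append(det3(Ak) / D)
a1, a2, lam = sol

unbiasedness = a1 * m1_1 + a2 * m1_2    # equals T by the third equation
```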
Also we note

$$T = c \sum_{i=1}^{N} y_i = c T_y \; .$$

Substituting for $T$ and the moments of $x$ with these values in the formula for $T''_{1(n=2)}$, we get exactly the same type of estimator, since the $c$'s in the numerator and denominator cancel out. We may write this estimator as

$$T'_{1(n=2)} = \alpha_1(y) x_1 + \alpha_2(y) x_2 ,$$

where $\alpha_1(y)$ and $\alpha_2(y)$ are weights known in terms of the $y$'s, and their functional form can be identified in $T''_{1(n=2)}$. We define this to be a simulated minimum variance estimator. An unbiased estimate of $V\left[ T'_{1(n=2)} \right]$ can be found if we can find unbiased estimates for $\mu_2(1)$, $\mu_2(2)$ and $\mu_{11}(1,2)$.

We will find unbiased estimators in the general case, i.e., for $\mu_2(t)$ and $\mu_{11}(t,w)$.
Now we note before proceeding further that $x_{i_t}$ is the characteristic of the $i$th element appearing at the $t$th draw and $x_{j_w}$ is the characteristic of the $j$th element appearing at the $w$th draw. Thus $\mu_2(t)$ can be rewritten as

$$\mu_2(t) = \sum_{i=1}^{N} P_{it} x_i^2 - \left( \sum_{i=1}^{N} P_{it} x_i \right)^2 ,$$

and

$$\hat{\mu}_2(t) = x_i^2 (1 - P_{it}) - \frac{P_{it} P_{jt}}{P_{it,jw}} \, x_i x_j$$

is an unbiased estimator of $\mu_2(t)$. We note that only the $i$th and the $j$th elements enter into the expression for the estimator. Also $P_{jt}$, a coefficient of $x_i x_j$, is the unconditional probability of the $j$th element appearing at the $t$th draw, and is not the same as $P_{jw}$. Considered as a quadratic form $\hat{\mu}_2(t)$ is not positive definite, and therefore we may sometimes have negative estimates of $\mu_2(t)$. Similarly we find

$$\mu_{11}(t,w) = \sum_{i=1}^{N} \sum_{j(\neq i)=1}^{N} P_{it,jw} x_i x_j - \mu_1(t) \mu_1(w) \; .$$

We note that

$$\sum_{j=1}^{N} P_{jw} \left( P_{jt} x_j^2 \right) = \sum_{i=1}^{N} P_{it} \left( P_{iw} x_i^2 \right) ,$$

so that $P_{jt} x_j^2$ (computed from the element at the $w$th draw) and $P_{iw} x_i^2$ (computed from the element at the $t$th draw) are unbiased estimates of the same quantity,
and therefore

$$c_1 \left[ x_i x_j \left( 1 - \frac{P_{it} P_{jw}}{P_{it,jw}} \right) - P_{iw} x_i^2 \right] + c_2 \left[ x_i x_j \left( 1 - \frac{P_{it} P_{jw}}{P_{it,jw}} \right) - P_{jt} x_j^2 \right]$$

is an unbiased estimate of $\mu_{11}(t,w)$ whenever $c_1 + c_2 = 1$. Perhaps it is simpler to choose $c_1 = c_2 = 1/2$, so that another unbiased estimate of $\mu_{11}(t,w)$ is

$$\hat{\mu}_{11}(t,w) = x_i x_j \left( 1 - \frac{P_{it} P_{jw}}{P_{it,jw}} \right) - (1/2) \left( P_{jt} x_j^2 + P_{iw} x_i^2 \right) .$$
Hence, for the case $n=2$, where we recall that the $i$th and the $j$th elements have appeared at the first and second draws respectively (for the sake of brevity they have been written $x_1$ and $x_2$ previously), we find

$$\hat{V}\left[ T'_{1(n=2)} \right] = \alpha_1^2(y) \left[ x_i^2 (1 - P_{i1}) - \frac{P_{i1} P_{j1}}{P_{i1,j2}} \, x_i x_j \right] + \alpha_2^2(y) \left[ x_j^2 (1 - P_{j2}) - \frac{P_{i2} P_{j2}}{P_{i1,j2}} \, x_i x_j \right]$$

$$+ \; 2 \alpha_1(y) \alpha_2(y) \left[ x_i x_j \left( 1 - \frac{P_{i1} P_{j2}}{P_{i1,j2}} \right) - (1/2) \left( P_{j1} x_j^2 + P_{i2} x_i^2 \right) \right] .$$

I have discussed the case of $n=2$ because, by and large, it is the most usual case considered in unequal probability sampling. The arguments for the case of $n$ elements follow along the same lines.
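The unbiasedness of $\hat{\mu}_2(t)$ and $\hat{\mu}_{11}(t,w)$ can be verified by complete enumeration for $n=2$; the sketch below assumes a renormalizing probability system with example values:

```python
# Assumed example values
p = [0.2, 0.3, 0.5]    # initial selection probabilities p_{i1}
x = [2.0, 5.0, 9.0]
N = 3

def P12(i, j):
    # P_{i1,j2}: element i at the first draw, j at the second,
    # renormalizing the remaining p's after the first removal
    return p[i] * p[j] / (1.0 - p[i])

pairs = [(i, j) for i in range(N) for j in range(N) if j != i]
P1 = p[:]                                          # P_{i1}
P2 = [sum(P12(j, i) for j in range(N) if j != i)   # P_{i2}
      for i in range(N)]

mu1_1 = sum(P1[i] * x[i] for i in range(N))
mu1_2 = sum(P2[i] * x[i] for i in range(N))
mu2_1 = sum(P1[i] * (x[i] - mu1_1) ** 2 for i in range(N))
mu11 = sum(P12(i, j) * (x[i] - mu1_1) * (x[j] - mu1_2) for i, j in pairs)

def mu2_1_hat(i, j):
    # Estimator of mu_2(1) from the pair (i first, j second)
    return x[i] ** 2 * (1 - P1[i]) - x[i] * x[j] * P1[i] * P1[j] / P12(i, j)

def mu11_hat(i, j):
    # Estimator of mu_11(1,2) with c1 = c2 = 1/2
    return (x[i] * x[j] * (1 - P1[i] * P2[j] / P12(i, j))
            - 0.5 * (P1[j] * x[j] ** 2 + P2[i] * x[i] ** 2))

E_mu2_1 = sum(P12(i, j) * mu2_1_hat(i, j) for i, j in pairs)
E_mu11 = sum(P12(i, j) * mu11_hat(i, j) for i, j in pairs)
```

Both expectations reproduce the true moments, as the text asserts.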
Of course, some technical difficulties will be experienced in constructing the estimator $T'_1$, given symbolically as

$$T'_1 = \sum_{t=1}^{n} \alpha_t(y) x_t ,$$

since in the calculation of the $\alpha_t(y)$'s the first step will be to calculate the $n(n+1)/2$ moments and product moments of the $y$'s plus the $n$ first moments. The next step will be to solve for the $\alpha_t(y)$'s according to the set of $n+1$ simultaneous equations given on page 36. Equally, before all this is done, the computation of the unconditional probabilities will be laborious.

Before we leave this topic we note that the $\alpha_t(y)$'s are approximations of the true $\alpha_t$'s. They will represent the true values only when $x$ is directly proportional to $y$. Thus $T'_1$ will not be free from bias. However, this is the price we pay for attempting to construct an estimator having the appearance of a minimum variance estimator, in the hope that the property of minimum variance will be conserved. If the assumed relationship holds for a large number of elements we have reason to believe that this property will be conserved.
4.3 Class Two Estimators

The general estimator in class two is given by

$$T_2 = \sum_{i \in s} \beta_i x_i \; .$$

We recall that $\beta_i$ is the weight to be attached to the $i$th element whenever it appears in the sample. The weights which are independent of the properties of the population are well known (Horvitz and Thompson, 1952), but those which are not independent have not been considered. We know from the case of $T'_1$ that such an estimator has possibilities of becoming useful whenever each of the characteristics of the population currently under investigation is related to some other known characteristic. The following device has been known for a long time among writers on probability, and it appears to have been first used by Isserlis (1916).

Let $z_i$ be a characteristic random variable such that $z_i = 1$ whenever element $u_i$ appears in the sample, and $z_i = 0$ whenever element $u_i$ does not appear in the sample.
Thus we find

$$E(z_i^A) = 1^A \cdot P_i + 0^A \cdot (1 - P_i) = P_i ,$$

and

$$E(z_i^A z_j^B) = 1^A \cdot 1^B \,[\text{probability that both } u_i \text{ and } u_j \text{ appear in the sample}]$$
$$+ \; 0^A \cdot 0^B \,[\text{probability that } u_i \text{ and } u_j \text{ do not appear in the sample}]$$
$$+ \; 0^A \cdot 1^B \,[\text{probability that } u_j \text{ appears but } u_i \text{ does not appear}]$$
$$+ \; 1^A \cdot 0^B \,[\text{probability that } u_i \text{ appears but } u_j \text{ does not appear}]$$
$$= P_{ij} , \qquad \text{for all } A, B = 1,2,3,\ldots \text{ and } i \neq j \; .$$

We only need the above expectations for cases when $A = 1,2$ and $B = 1$, but for the determination of higher moments the more general cases, as presented here, are needed.
Hence

$$V(z_i) = P_i - P_i^2 = P_i (1 - P_i) \qquad \text{and} \qquad \operatorname{Cov}(z_i, z_j) = P_{ij} - P_i P_j \; .$$

Now the estimator can be rewritten as

$$T_2 = \sum_{i=1}^{N} z_i \beta_i x_i = \sum_{i \in s} \beta_i x_i ,$$

since for the remaining $N-n$ elements which do not appear in the sample the $z$'s are all zero. We have

$$E(T_2) = \sum_{i=1}^{N} E(z_i) \beta_i x_i = \sum_{i=1}^{N} P_i \beta_i x_i \; .$$

For the estimator to be unbiased, as well as for the $\beta$'s to be independent of the properties of the population, we insist that

$$\sum_{i=1}^{N} P_i \beta_i x_i \equiv \sum_{i=1}^{N} x_i ,$$

so that we must have

$$P_i \beta_i = 1 \quad \text{for all } i , \qquad \text{and thus} \qquad \beta_i = 1/P_i \; .$$
We note that there is one and only one set of such weights. Therefore, as noted by Horvitz and Thompson,

$$T_2 = \sum_{i \in s} (x_i / P_i)$$

is the only type of unbiased linear estimator possible in this class, and in this very special sense it is the best, since it is the only estimator which we can choose. Next we have

$$V(T_2) = \sum_{i=1}^{N} \beta_i^2 x_i^2 P_i (1 - P_i) + 2 \sum_{i<j} \beta_i \beta_j x_i x_j (P_{ij} - P_i P_j) \; .$$

We note that when we put $\beta_i = 1/P_i$, Horvitz and Thompson's formula is immediately obtained; i.e.,

$$V(T_2) = \sum_{i=1}^{N} \frac{1 - P_i}{P_i} \, x_i^2 + 2 \sum_{i<j} \frac{P_{ij} - P_i P_j}{P_i P_j} \, x_i x_j \; .$$

Obviously one cannot obtain an estimator such that $V(T_2)$ is a minimum subject to the formal condition $E(T_2) \equiv T$, since, as pointed out above, this condition leads to one and only one estimator to choose from.
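The estimator $T_2 = \sum_{i \in s} x_i/P_i$ and Horvitz and Thompson's variance formula can be checked by complete enumeration; the renormalizing probability system below is an assumed example:

```python
from itertools import permutations

p = [0.2, 0.3, 0.5]        # assumed initial selection probabilities
x = [2.0, 5.0, 9.0]
N, n = 3, 2
T = sum(x)

def prob_of_order(o):
    prob, rem = 1.0, 1.0
    for i in o:
        prob *= p[i] / rem
        rem -= p[i]
    return prob

orders = list(permutations(range(N), n))
# P_i: probability that element i appears in the sample
P = [sum(prob_of_order(o) for o in orders if i in o) for i in range(N)]
# P_ij: probability that both i and j appear
Pij = {(i, j): sum(prob_of_order(o) for o in orders if i in o and j in o)
       for i in range(N) for j in range(i + 1, N)}

ht = {o: sum(x[i] / P[i] for i in o) for o in orders}   # T_2 per sample
E_T2 = sum(prob_of_order(o) * ht[o] for o in orders)
V_enum = sum(prob_of_order(o) * (ht[o] - T) ** 2 for o in orders)
V_formula = (sum(x[i] ** 2 * (1 - P[i]) / P[i] for i in range(N))
             + 2 * sum(x[i] * x[j] * (Pij[(i, j)] - P[i] * P[j]) / (P[i] * P[j])
                       for i in range(N) for j in range(i + 1, N)))
```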
We proceed to find the minimum variance estimator which, as will be seen subsequently, will not be independent of the properties of the population. For this we minimize $V(T_2)$ subject to the restriction $E(T_2) = T$. Setting

$$H = V(T_2) + 2\lambda \left[ \sum_{i=1}^{N} P_i \beta_i x_i - T \right] ,$$

we solve the equations

$$\partial H / \partial \beta_i = 0 = 2 \beta_i x_i^2 P_i (1 - P_i) + 2 \sum_{j(\neq i)=1}^{N} \beta_j x_i x_j (P_{ij} - P_i P_j) + 2 \lambda x_i P_i , \qquad \text{for } i=1,2,\ldots,N ,$$

together with

$$\sum_{i=1}^{N} P_i \beta_i x_i = T \; .$$

The above $N$ equations reduce to

$$\beta_i x_i + \sum_{j(\neq i)=1}^{N} \frac{P_{ij}}{P_i} \, \beta_j x_j + \lambda = T \; .$$

These, with the restricting equation, constitute a system of $N+1$ independent equations with $N+1$ unknowns. Writing them in full we have, in matrix notation,

$$\begin{bmatrix} 1 & P_{12}/P_1 & \cdots & P_{1N}/P_1 & 1 \\ P_{21}/P_2 & 1 & \cdots & P_{2N}/P_2 & 1 \\ \vdots & & & & \vdots \\ P_{N1}/P_N & P_{N2}/P_N & \cdots & 1 & 1 \\ P_1 & P_2 & \cdots & P_N & 0 \end{bmatrix} \begin{bmatrix} \beta_1 x_1 \\ \beta_2 x_2 \\ \vdots \\ \beta_N x_N \\ \lambda \end{bmatrix} = \begin{bmatrix} T \\ T \\ \vdots \\ T \\ T \end{bmatrix} .$$
Hence the solution of these equations will be of the form

$$\beta_i x_i = T f_i ,$$

where $f_i$ is purely a function of the probabilities; to be specific, it is the ratio of the determinant of the matrix on the left-hand side of the equations, with 1's substituted in the $i$th column, to the same determinant. To illustrate, when $N=3$ and we have samples of $n=2$, the weights $\beta_1$, $\beta_2$ and $\beta_3$ are obtained by evaluating these determinant ratios in terms of $P_1$, $P_2$, $P_3$ and $P_{12}$, $P_{13}$, $P_{23}$.

Again there are possibilities of using the estimator when we know that $x$ is related to some characteristic $y$, say, approximately by the relationship $x = cy$. In this situation the $\beta$'s become known, and we find

$$\beta_i = \frac{T_y f_i}{y_i} \; .$$
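The $(N+1)$-equation system can be solved numerically; the sketch below (Gauss-Jordan elimination, with an assumed renormalizing system for $N=3$, $n=2$) checks only that the system is solvable and that the restriction $\sum_i P_i \beta_i x_i = T$ is satisfied by the solution:

```python
from itertools import permutations

p = [0.2, 0.3, 0.5]   # assumed initial selection probabilities
N, n = 3, 2
T = 30.0              # assumed total; it enters the system linearly

def prob_of_order(o):
    prob, rem = 1.0, 1.0
    for i in o:
        prob *= p[i] / rem
        rem -= p[i]
    return prob

orders = list(permutations(range(N), n))
P = [sum(prob_of_order(o) for o in orders if i in o) for i in range(N)]
Pij = {frozenset((i, j)): sum(prob_of_order(o) for o in orders
                              if i in o and j in o)
       for i in range(N) for j in range(i + 1, N)}

# Unknowns u_i = beta_i * x_i, plus lambda in the last column
A = [[1.0 if i == j else Pij[frozenset((i, j))] / P[i] for j in range(N)] + [1.0]
     for i in range(N)]
A.append(P[:] + [0.0])
b = [T] * (N + 1)

def solve(A, b):
    # Gauss-Jordan elimination with partial pivoting
    m = len(b)
    M = [row[:] + [bv] for row, bv in zip(A, b)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(m):
            if r != c and M[r][c] != 0.0:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * bb for a, bb in zip(M[r], M[c])]
    return [M[r][m] / M[r][r] for r in range(m)]

sol = solve(A, b)
u, lam = sol[:N], sol[N]
restriction = sum(P[i] * u[i] for i in range(N))   # should equal T
```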
The labor of computing from these probability functions will of course be considerable. It may be noted that the estimator assumes the form of a weighted ratio estimate

$$T''_2 = T_y \sum_{i=1}^{n} f_i \, \frac{x_i}{y_i} \; .$$

When the probabilities of selection are equal, then $f_i = 1/n$, and this estimator reduces to

$$T_y \, \frac{1}{n} \sum_{i=1}^{n} \frac{x_i}{y_i} \; ,$$

a well known ratio estimate. The variance of $T''_2$ is given by

$$V(T''_2) = T_y^2 \left[ \sum_{i=1}^{N} \frac{f_i^2}{y_i^2} \, x_i^2 P_i (1 - P_i) + 2 \sum_{i<j} \frac{f_i f_j}{y_i y_j} \, x_i x_j (P_{ij} - P_i P_j) \right] ;$$

we have merely substituted the expression for the $\beta$'s in $V(T_2)$. The unbiased estimate of $V(T''_2)$ is given by

$$\hat{V}(T''_2) = T_y^2 \left[ \sum_{i=1}^{n} \frac{f_i^2}{y_i^2} \, x_i^2 (1 - P_i) + 2 \sum_{i<j \in s} \frac{f_i f_j}{y_i y_j} \, x_i x_j \, \frac{P_{ij} - P_i P_j}{P_{ij}} \right] .$$

These results appear to be mainly of theoretical interest and, of course, conditional upon the appropriateness of the functional relationship assumed to hold. In practice the formulas may be unwieldy for use except for sample sizes of $n = 2$ from populations of $N = 3$ or $4$.
Lastly we note that this estimator, which has the appearance of a minimum variance estimator without strictly having its properties, will certainly not be free from bias, since the $\beta$'s used are not the true $\beta$'s. But it is likely that the property of minimum variance will be conserved to some extent, as in the case of $T'_1$. Again we pay a price for simulation.
4.4 Class Three Estimators

The estimators in class three will now be considered. We recall that for

$$T_3 = \gamma_s \sum_{i \in s} x_i$$

there are $S' = {}^{N}C_{n}$ weights, one for each distinct sample. For $T_3$ to be unbiased and for the $\gamma$'s to be independent of the properties of the population we must have

$$\sum_{s=1}^{S'} P_s \gamma_s \sum_{i \in s} x_i \equiv \sum_{i=1}^{N} x_i , \qquad \text{i.e.,} \qquad \sum_{i=1}^{N} x_i \sum_{s \ni i} P_s \gamma_s \equiv \sum_{i=1}^{N} x_i ,$$

where $\sum_{s \ni i}$ indicates summation over all those products $P_s \gamma_s$ which are the coefficients of aggregates of sample values which include the element $u_i$. There are $^{N-1}C_{n-1}$ such product terms. Thus we must have

$$\sum_{s \ni i} P_s \gamma_s = 1 , \qquad \text{for } i=1,2,\ldots,N \; .$$

There will be many sets of solutions to these $N$ equations. Now if for each $i$ the weight is the same for every sample containing $u_i$, say $\gamma_s = \gamma_i$ for all $s \ni i$, then the condition of unbiasedness becomes

$$\gamma_i \sum_{s \ni i} P_s = \gamma_i P_i = 1 , \qquad \text{i.e.,} \qquad \gamma_i = 1/P_i ,$$
and this leads to the unbiased estimator in class two proposed by Horvitz and Thompson (1951, 1952). We note that one set of solutions is given by

$$\gamma_s = \frac{1}{P_s \, {}^{N-1}C_{n-1}} \qquad \text{for all } s \ni i \text{ and all } i \; .$$

Hence one subclass of unbiased estimators is given by

$$T'_3 = \frac{1}{P_s \, {}^{N-1}C_{n-1}} \sum_{i \in s} x_i ,$$

which includes Midzuno's (1950), Lahiri's (1951) and Sen's (1952) estimators for the case of one-stage sampling. Des Raj (1954) stated that this is the only unbiased estimator of the aggregate of the population of the particular form which has been defined in this thesis as class three. Clearly this is one among many.
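The unbiasedness of $T'_3$ holds for any probability system and can be confirmed by enumeration; the renormalizing system below is an assumed example:

```python
from itertools import combinations, permutations
from math import comb

p = [0.1, 0.2, 0.3, 0.4]   # assumed initial selection probabilities
x = [2.0, 4.0, 5.0, 9.0]
N, n = 4, 2
T = sum(x)

def prob_of_order(o):
    prob, rem = 1.0, 1.0
    for i in o:
        prob *= p[i] / rem
        rem -= p[i]
    return prob

samples = list(combinations(range(N), n))
P_s = {s: sum(prob_of_order(o) for o in permutations(s)) for s in samples}

c = comb(N - 1, n - 1)
t3 = {s: sum(x[i] for i in s) / (P_s[s] * c) for s in samples}   # T'_3
E_T3 = sum(P_s[s] * t3[s] for s in samples)
```

The factors $P_s$ cancel on taking the expectation, and each $x_i$ appears in exactly $^{N-1}C_{n-1}$ samples, so $E(T'_3) = T$ exactly.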
Next we have

$$V(T_3) = \sum_{s=1}^{S'} P_s \left( \gamma_s \sum_{i \in s} x_i \right)^2 - T^2 \; .$$

We shall determine the $\gamma$'s which minimize $V(T_3)$ subject to the $N$ restrictions on the unbiasedness of the estimate given above. Then the best linear unbiased estimate will be obtained. For this we set
$$H = \sum_{s=1}^{S'} P_s \gamma_s^2 \left( \sum_{i \in s} x_i \right)^2 - \left[ \sum_{i=1}^{N} x_i \sum_{s \ni i} P_s \gamma_s \right]^2 + 2 \sum_{i=1}^{N} \lambda_i \left[ \sum_{s \ni i} P_s \gamma_s - 1 \right] ,$$

where the $\lambda$'s are the Lagrange multipliers, and solve the $S'$ equations

$$\partial H / \partial \gamma_s = 0 , \qquad \text{for all } s ,$$

together with the $N$ equations representing the restrictions. Substituting unity for $\sum_{s \ni i} P_s \gamma_s$, which is the coefficient of each $x_i$ when the expression in square braces is rewritten, the above equation simplifies to

$$P_s \gamma_s \left( \sum_{i \in s} x_i \right)^2 + P_s \sum_{i \in s} \lambda_i = T P_s \sum_{i \in s} x_i \qquad \text{for all } s ,$$

or

$$\gamma_s \left( \sum_{i \in s} x_i \right)^2 + \sum_{i \in s} \lambda_i = T \sum_{i \in s} x_i , \qquad s = 1,2,\ldots,S' \; .$$

The restricting equations, which we have already used to simplify the above equations, are

$$\sum_{s \ni i} P_s \gamma_s = 1 , \qquad \text{for } i=1,2,\ldots,N \; .$$

There are $({}^{N}C_{n} + N)$ independent equations and the same number of unknowns (including the $N$ $\lambda$'s), and therefore a unique solution for these unknowns exists. This solution will involve all the $x$'s. Technically speaking, the algebraic solution of these linear simultaneous equations will be cumbersome. Of course, the solution can be written in matrix notation, but in this connection the symbolism will be barren. Therefore let us simply say that ${}_0\gamma_s$, for $s=1,2,\ldots,S'$, and ${}_0\lambda_i$, for $i=1,2,\ldots,N$, represent the values (weights) obtained as a result of solving the
53
above (NCn + N) equations.
Then the minimum variance estimator T,3 will
be symbolically represented by
Til::
3
r
1:: x
0 S
i€s i
and its variance by
S'
2 I
'2.2
= s=l
1:: Ps or s I 1:: Xi)
\i€s!
V(T")
3
- r
·
Symbolically the estimate of this variance will be

$$v(T_3'') = {}_0\gamma_s^2\Big(\sum_{i\in s}x_i\Big)^2 - \Big\{\sum_{i\in s}x_i^2\big/{}^{N-1}C_{n-1} + 2\sum_{i<j\in s}x_ix_j\big/{}^{N-2}C_{n-2}\Big\}\big/P_s,$$

where $s$ is the particular sample in question and $_0\gamma_s$ is its corresponding weight. The expectation of the expression in braces, multiplied by $1/P_s$, is equal to $T^2$. This is easy to show if we note that, when summing $\sum_{i\in s}x_i^2$ over all $s$, each $x_i^2$ occurs $^{N-1}C_{n-1}$ times, and when summing $\sum_{i<j\in s}x_ix_j$ over all $s$, $x_i$ and $x_j$, when $i<j$, occur together $^{N-2}C_{n-2}$ times.
Clearly $v(T_3'')$ can assume negative values. Also, $V(T_3'')$ can be given in terms of the $_0\lambda_i$'s.
If we multiply each of the set of $S'$ simultaneous equations in which $P_s$ is present by $\gamma_s$ and add, we will find

$$\sum_{s=1}^{S'}P_s\gamma_s^2\Big(\sum_{i\in s}x_i\Big)^2 + \sum_{s=1}^{S'}P_s\gamma_s\sum_{i\in s}\lambda_i = T\sum_{s=1}^{S'}P_s\gamma_s\sum_{i\in s}x_i = T^2,$$

on substituting the restricting equations; i.e., $V(T_3'') + T^2 + \sum_{i=1}^{N}\lambda_i = T^2$, and therefore

$$V(T_3'') = -\sum_{i=1}^{N}{}_0\lambda_i,$$

where $_0\lambda_i$, for all $i$, is the solution given above.
If there is some approximate relationship between $x_i$ and a characteristic $y_i$, known for all $i$, such as $x_i = cy_i$, then we can obtain a simulated minimum variance estimator by substituting the $y$'s for the $x$'s in the above system of simultaneous equations and solving for the $\lambda$'s and the weights. Thus for any sample $s$ we will have a simulated value $_0\gamma_s'$ of the true $_0\gamma_s$. The closeness of each of these values to the true value will of course depend on the degree of closeness of the relationship $x = cy$.
This estimator will then be

$$T_3''' = {}_0\gamma_s'\sum_{i\in s}x_i,$$

and the estimate of its sampling variance will be obtained simply by substituting $_0\gamma_s'$ for $_0\gamma_s$ in the expression for $v(T_3'')$. For samples of size $n = 2$ and $N = 4, 5, 6, 7$ and $8$, the number of unknowns involved in the solution of the linear simultaneous equations will be 10, 15, 21, 28 and 36 respectively. The calculations involved will also be very simple. The only operations needed are sums of combinations of numbers taken $n\,(=2)$ at a time and their corresponding squares, and the total for $X$. In solving the equations the constant $c^2$ can be absorbed into the $\gamma$'s.
Among the three simulated minimum variance estimators presented so far, $T_1'''$, $T_2'''$ and $T_3'''$, the last one will be the least cumbersome to handle in regard to practical applications.
4.5 Class Four Estimators

The general estimator in class four is given by

$$T_4 = \sum_{t=1}^{n}\delta_{i_t}x_{i_t},$$

where we recall that $\delta_{i_t}$ $(i=1,2,\ldots,N;\ t=1,2,\ldots,n)$ is the weight to be attached to the $i$th element whenever it appears at the $t$th draw.
There are two approaches to deriving the expected value of $T_4$. In one approach we use the unconditional probabilities, i.e., the probabilities of the elements appearing at specified draws; in the other we use the probability distributions at each draw, and in this sense the derivation rests much more on first principles, although it is much more cumbersome. The former approach is also geared to the use of the characteristic random variable, but the latter is not.
We shall first find $E(T_4)$ using unconditional probabilities. Let $z_{it}$ be a characteristic random variable such that $z_{it} = 1$ whenever the $i$th element appears at the $t$th draw, and $z_{it} = 0$ whenever the $i$th element does not appear at the $t$th draw.

1/ It is easy to solve such equations by electronic machines.

Then we may write $T_4$ as

$$T_4 = \sum_{t=1}^{n}\sum_{i=1}^{N}\delta_{it}\,x_iz_{it}.$$
We note again that the $x$'s are mere numbers. To specify a characteristic of element $u_i$ fully we should write $x_{i_t}$, but here the sub-subscript $t$ has been suppressed in view of the presence of $z_{it}$. We have

$$E(T_4) = \sum_{t=1}^{n}\sum_{i=1}^{N}\delta_{it}\,x_i\,E(z_{it}).$$

Now

$$E(z_{it}) = 1\cdot[\text{Probability that } u_i \text{ appears at the } t\text{th draw}] + 0\cdot[\text{Probability that } u_i \text{ does not appear at the } t\text{th draw}] = P_{i_t},$$

for $t=1,2,\ldots,n$, so that

$$E(T_4) = \sum_{i=1}^{N}x_i\sum_{t=1}^{n}P_{i_t}\,\delta_{i_t}.$$
For $T_4$ to be unbiased we must have

$$E(T_4) = \sum_{i=1}^{N}x_i\sum_{t=1}^{n}P_{i_t}\delta_{i_t} = \sum_{i=1}^{N}x_i,$$

and for the $\delta$'s to be independent of the unknown $x$'s we further insist on the condition of identity, i.e.,

$$\sum_{i=1}^{N}x_i\sum_{t=1}^{n}P_{i_t}\delta_{i_t} \equiv \sum_{i=1}^{N}x_i.$$

Hence,

$$\sum_{t=1}^{n}P_{i_t}\delta_{i_t} = 1, \qquad \text{for } i=1,2,\ldots,N.$$
We can find many sets of solutions to these equations. One set of solutions is given by

$$\delta_{i_t} = c_t/P_{i_t}, \qquad \text{for } i=1,2,\ldots,N,$$

provided $\sum_{t=1}^{n}c_t = 1$. Thus we have

$$T_4' = \sum_{t=1}^{n}c_t\,(x_{i_t}/P_{i_t}).$$

This is the estimator proposed by Des Raj (1956). It appears to be simpler to take $c_1 = \cdots = c_n = 1/n$, so that one estimator in this subclass will be

$$(1/n)\sum_{t=1}^{n}(x_{i_t}/P_{i_t}).$$
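For $n = 2$ this subclass estimator can be checked numerically. The sketch below is my own; the population values and the initial draw probabilities are assumptions. It computes the unconditional probabilities $P_{i_1}$ and $P_{i_2}$ for two draws without replacement and verifies unbiasedness by enumerating every ordered sample:

```python
from itertools import permutations
import numpy as np

x = np.array([3.0, 1.0, 4.0, 1.5])    # assumed population values
p = np.array([0.1, 0.2, 0.3, 0.4])    # assumed probabilities at the first draw
N = len(x)
T = x.sum()

# Unconditional appearance probabilities: P_i1 = p_i, and for the second
# draw P_i2 = sum_{j != i} p_j * p_i / (1 - p_j).
P1 = p.copy()
P2 = np.array([sum(p[j] * p[i] / (1 - p[j]) for j in range(N) if j != i)
               for i in range(N)])

expectation = 0.0
for i, j in permutations(range(N), 2):
    prob = p[i] * p[j] / (1 - p[i])            # P(u_i first, u_j second)
    est = 0.5 * (x[i] / P1[i] + x[j] / P2[j])  # the choice c_1 = c_2 = 1/2
    expectation += prob * est
# expectation equals T, whatever the p's, as the identity condition requires
```

Note that $P_1$ and $P_2$ each sum to one over $i$, confirming that they are genuine probability distributions for the respective draws.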
Let us attempt to find the minimum variance estimator. We have

$$V(T_4) = \sum_{i=1}^{N}\sum_{t=1}^{n}\delta_{i_t}^2x_i^2V(z_{i_t}) + 2\sum_{i=j,\,t<w}\delta_{i_t}\delta_{i_w}x_i^2\,\mathrm{Cov}(z_{i_t},z_{i_w}) + 2\sum_{i<j,\,t=w}\delta_{i_t}\delta_{j_t}x_ix_j\,\mathrm{Cov}(z_{i_t},z_{j_t}) + 2\sum_{i<j,\,t<w}\delta_{i_t}\delta_{j_w}x_ix_j\,\mathrm{Cov}(z_{i_t},z_{j_w}).$$
When $i=j$, $t<w$,

$$\mathrm{Cov}(z_{i_t},z_{i_w}) = E(z_{i_t}z_{i_w}) - E(z_{i_t})E(z_{i_w}).$$

Now $E(z_{i_t}z_{i_w}) = 0$, since the $i$th unit cannot appear at the $w$th draw when it has already appeared at the $t$th. Therefore

$$\mathrm{Cov}(z_{i_t},z_{i_w}) = 0 - P_{i_t}P_{i_w} = -P_{i_t}P_{i_w}.$$

Similarly, when $i<j$, $t=w$, we find

$$\mathrm{Cov}(z_{i_t},z_{j_t}) = -P_{i_t}P_{j_t}.$$

Finally, when $i<j$, $t<w$,

$$\mathrm{Cov}(z_{i_t},z_{j_w}) = E(z_{i_t}z_{j_w}) - E(z_{i_t})E(z_{j_w}).$$

Hence, with these results, $V(T_4)$ is known.
We shall now determine the $\delta$'s subject to the restrictions on unbiasedness of $T_4$. For this we set up a function

$$H = V(T_4) + 2\sum_{i=1}^{N}\lambda_i\Big(\sum_{t=1}^{n}P_{i_t}\delta_{i_t} - 1\Big),$$

where the $\lambda$'s are the undetermined Lagrange multipliers, and solve the following equations:
$$\partial H/\partial\delta_{i_t} = 2\delta_{i_t}x_i^2V(z_{i_t}) + 2x_i^2\sum_{w(\neq t)}\delta_{i_w}\mathrm{Cov}(z_{i_t},z_{i_w}) + 2x_i\sum_{j(\neq i)}\sum_{w}\delta_{j_w}x_j\,\mathrm{Cov}(z_{i_t},z_{j_w}) + 2\lambda_iP_{i_t} = 0,$$

for $i=1,2,\ldots,N$ and $t=1,2,\ldots,n$, together with the $N$ restricting equations. Substituting for the variance and covariance expressions and rearranging terms, we will find a system of linear equations in the $\delta$'s and $\lambda$'s, for all $i$ and $t$. The restricting equations are

$$\sum_{t=1}^{n}P_{i_t}\delta_{i_t} = 1, \qquad \text{for } i=1,2,\ldots,N.$$
There are $Nn + N$ independent equations and the same number of unknowns (including the $N$ $\lambda$'s). Clearly the solution for the $\delta$'s will involve the unknown $x$'s. Thus, at least formally, a minimum variance estimator $T_4''$ exists. If (as before) we attempt to approximate these unknown values of the $x$'s by a function of known values of a related characteristic $y$, then it will be possible to obtain simulated values of the $\delta$'s and thus to have an estimator (as before) having the appearance of a minimum variance estimator. However, in view of the complexity of the solutions for the $\delta$'s, we will not write the actual expressions for this estimator and its variance.

Next we shall determine the expected value of $T_4$ using the probabilities relevant to each draw. We recall that the probability system defines classes of probabilities which are somewhat analogous to conditional probabilities. Here we argue from first principles. Suppose element $u_i$ appears at the first draw and $u_j$ at the second. Let $x_{j_2}$ represent the character observed on $u_j$. Then

$$E(x_{j_2}) = E\big[E(x_{j_2}\mid u_i \text{ appeared at the first draw})\big].$$

We can find $E(x_{j_2}\mid u_i)$ since we have defined $[P^i_{j_2}]$; next we can find the expectation of the resulting expression, since we have defined $[P_{i_1}]$. Thus

$$E(x_{j_2}) = E_1\sum_{j(\neq i)=1}^{N}P^i_{j_2}\,x_j.$$
Now for three or more draws the above notation for expectations applicable to each draw becomes cumbersome. Thus if $x_{k_3}$ is the character observed on $u_k$ at the third draw, then

$$E(x_{k_3}) = E\Big\{E\big[E(x_{k_3}\mid u_k \text{ appeared at the 3rd draw, } u_i \text{ at the 1st and } u_j \text{ at the 2nd draw})\big]\Big\}.$$

For this reason we denote the above operations of taking expectations simply by

$$E_1E_2(x_{j_2}) \qquad \text{and} \qquad E_1E_2E_3(x_{k_3}),$$

where the symbols $E_1$, $E_2$ and $E_3$ refer to the operation of taking expectations applicable to the first, second and third draws respectively.
Generally, for the expectation of the variable appearing at the $n$th draw we can write

$$E_1E_2\cdots E_n(x_{m_n}),$$

meaning that the operation is carried out in $n$ steps using probability distributions relevant to each of the $n$ draws. Symbolically $E \equiv E_1E_2\cdots E_n$. (These remarks apply equally to functions of the variables involved.)
Thus we have

$$E(T_4) = E\Big(\sum_{t=1}^{n}\delta_{i_t}x_{i_t}\Big) = E_1(\delta_{i_1}x_{i_1}) + E_1E_2(\delta_{j_2}x_{j_2}) + \cdots + E_1E_2\cdots E_n(\delta_{m_n}x_{m_n}).$$

Now

$$E_1E_2(\delta_{j_2}x_{j_2}) = E_1\Big(\sum_{j(\neq i)=1}^{N}P^i_{j_2}\,\delta_{j_2}x_j\Big) = \sum_{i=1}^{N}P_{i_1}\sum_{j(\neq i)=1}^{N}P^i_{j_2}\,\delta_{j_2}x_j,$$

and so on up to

$$E_1E_2\cdots E_n(\delta_{m_n}x_{m_n}) = \sum_{i=1}^{N}P_{i_1}\sum_{j(\neq i)=1}^{N}P^i_{j_2}\cdots\sum_{m(\neq l,\ldots,\neq i)=1}^{N}P^{ij\ldots l}_{m_n}\,\delta_{m_n}x_m.$$

When we arrange terms we will find, as before,

$$E(T_4) = \sum_{t=1}^{n}\sum_{i=1}^{N}\delta_{i_t}x_iP_{i_t} = \sum_{i=1}^{N}x_i\sum_{t=1}^{n}P_{i_t}\,\delta_{i_t},$$

and the same condition of unbiasedness, i.e.,

$$\sum_{t=1}^{n}P_{i_t}\,\delta_{i_t} = 1, \qquad \text{for all } i,$$

follows.
Further, if for any $i$, $\delta_{i_t} = \delta_i$ for $t=1,2,\ldots,n$, then the condition of unbiasedness becomes

$$\delta_i\sum_{t=1}^{n}P_{i_t} = 1, \qquad \text{for all } i,$$

i.e., $\delta_iP_i = 1$, or $\delta_i = 1/P_i$, for all $i$, where $P_i = \sum_{t=1}^{n}P_{i_t}$, and this leads to the unbiased estimator in class two.
We shall now obtain another unbiased estimator in class four. Consider the expressions for the expectations of the quantities $x_{i_1}/P_{i_1}$, $x_{j_2}/(N-1)P_{i_1}P^i_{j_2}$, and so on, weighted by constants $c_1, c_2, \ldots, c_n$ which will be chosen to satisfy the condition of unbiasedness. Denote the resulting estimator by $T_4'''$. Substituting these $\delta$'s in the relevant expressions for the above expectations and adding them, we obtain, after some algebra,

$$E(T_4''') = c_1\sum_{i=1}^{N}x_i + [c_2/(N-1)]\sum_{i=1}^{N}\sum_{j(\neq i)=1}^{N}x_j + \cdots + [c_n/(N-1)(N-2)\cdots(N-n+1)]\sum_{i=1}^{N}\cdots\sum_{m(\neq l,\ldots,\neq i)=1}^{N}x_m = [c_1 + c_2 + \cdots + c_n]\sum_{i=1}^{N}x_i.$$

Hence,

$$T_4''' = [c_1/P_{i_1}]x_{i_1} + [c_2/(N-1)P_{i_1}P^i_{j_2}]x_{j_2} + \cdots$$

is an unbiased estimator of $T$, provided $\sum_{t=1}^{n}c_t = 1$.
The estimator $T_4'''$ was first obtained by Das (1951), but the probabilities he used at the second and subsequent draws were based on the initial probabilities $p_i$ $(i=1,2,\ldots,N)$. He gave the variance of this estimator and the unbiased estimator of this variance when $c_t = 1/n$ for all $t$, and he noted also that the values of the $c$'s which minimized $V(T_4''')$ were functions of all the parameters of the population. Later, Des Raj (1956) reported the same estimator but using arbitrary probabilities at each draw.
In the same paper Des Raj also reported another unbiased estimator, which will now be shown to belong to class four. He first wrote

$$t_1 = x_{i_1}/P_{i_1}, \qquad t_2 = x_{i_1} + x_{j_2}/P^i_{j_2}, \qquad t_3 = x_{i_1} + x_{j_2} + x_{k_3}/P^{ij}_{k_3},$$

and so on, and stated that the unbiased estimator was

$$T_4'''' = c_1t_1 + c_2t_2 + \cdots + c_nt_n,$$

where $\sum_{t=1}^{n}c_t = 1$. Now this estimator can be rewritten as

$$T_4'''' = [(c_1/P_{i_1}) + c_2 + \cdots + c_n]x_{i_1} + [(c_2/P^i_{j_2}) + c_3 + \cdots + c_n]x_{j_2} + \cdots$$
To prove that it belongs to class four we need only show that, when

$$\delta_{i_1} = (c_1/P_{i_1}) + c_2 + \cdots + c_n, \qquad \text{for } i=1,2,\ldots,N,$$
$$\delta_{j_2} = (c_2/P^i_{j_2}) + c_3 + \cdots + c_n, \qquad \text{for } j=1,2,\ldots,N, \text{ but } j\neq i,$$
$$\cdots$$
$$\delta_{m_n} = c_n/P^{ij\ldots l}_{m_n}, \qquad \text{for } m=1,2,\ldots,N, \text{ but } m\neq l\neq\cdots\neq i,$$

then

$$E(T_4'''') = \sum_{i=1}^{N}x_i = T, \qquad \text{provided } \sum_{t=1}^{n}c_t = 1.$$

We have, after substituting for the $\delta$'s,

$$E(T_4'''') = E_1\big[(c_1/P_{i_1}) + c_2 + \cdots + c_n\big]x_{i_1} + E_1E_2\big[(c_2/P^i_{j_2}) + c_3 + \cdots + c_n\big]x_{j_2} + \cdots + E_1E_2\cdots E_n\big(c_n/P^{ij\ldots l}_{m_n}\big)x_{m_n}.$$
Now

$$E_1\big[(c_1/P_{i_1}) + c_2 + \cdots + c_n\big]x_{i_1} = E_1\big[(c_1/P_{i_1})x_{i_1}\big] + [c_2 + \cdots + c_n]E_1(x_{i_1}) = c_1\sum_{i=1}^{N}x_i + [c_2 + \cdots + c_n]E_1(x_{i_1}),$$

$$E_1E_2\big[(c_2/P^i_{j_2}) + c_3 + \cdots + c_n\big]x_{j_2} = c_2\sum_{i=1}^{N}x_i - c_2E_1(x_{i_1}) + (c_3 + \cdots + c_n)E_1E_2(x_{j_2}),$$

$$E_1E_2E_3\big[(c_3/P^{ij}_{k_3})x_{k_3}\big] + (c_4 + \cdots + c_n)E_1E_2E_3(x_{k_3}) = c_3\sum_{i=1}^{N}x_i - c_3E_1(x_{i_1}) - c_3E_1E_2(x_{j_2}) + (c_4 + \cdots + c_n)E_1E_2E_3(x_{k_3}),$$

and finally

$$E_1E_2\cdots E_n\big(c_n/P^{ij\ldots l}_{m_n}\big)x_{m_n} = c_n\sum_{i=1}^{N}x_i - c_nE_1(x_{i_1}) - c_nE_1E_2(x_{j_2}) - \cdots - c_nE_1E_2\cdots E_{n-1}(x_{m_{n-1}}).$$

Collecting all these expectations together, we find

$$E(T_4'''') = [c_1 + c_2 + \cdots + c_n]\sum_{i=1}^{N}x_i = \Big(\sum_{t=1}^{n}c_t\Big)T.$$
Hence, provided $\sum_{t=1}^{n}c_t = 1$, $E(T_4'''') = T$. One estimator in this subclass, for which $c_1 = c_2 = \cdots = c_n = 1/n$, has an estimated variance which is always positive (Des Raj, 1956).
4.6 Class Five Estimators

Next we consider the general estimator in class five given by

$$T_5 = \sum_{i\in s}\theta_{s_i}x_i,$$

where $\theta_{s_i}$ is the weight to be attached to the $i$th element whenever it appears in the $s$th sample. We recall that there are altogether $N\cdot{}^{N-1}C_{n-1} = n\cdot{}^{N}C_n$ weights. We shall determine these weights such that $E(T_5) = T$, and further we shall attempt to find the best linear unbiased estimator, i.e., an estimator with weights such that $V(T_5)$ is a minimum subject to the restrictions on unbiasedness. We have

$$E(T_5) = \sum_{s=1}^{S'}P_s\sum_{i\in s}\theta_{s_i}x_i = \sum_{i=1}^{N}x_i\sum_{s\supset i}P_s\theta_{s_i}.$$
Hence, for the estimator to be unbiased we must have

$$\sum_{s\supset i}P_s\theta_{s_i} = 1, \qquad \text{for all } i.$$

One set of solutions is given by

$$\theta_{s_i} = 1\big/{}^{N-1}C_{n-1}P_s, \qquad \text{for all } i \text{ and all } s\supset i,$$

which leads to an estimator belonging also to class three. Further, if for any $i$, $\theta_{s_i} = \theta_i$ for all $s\supset i$, we will have

$$\theta_i\sum_{s\supset i}P_s = 1, \qquad \text{i.e.,} \qquad \theta_iP_i = 1, \qquad \text{or} \qquad \theta_i = 1/P_i,$$

and this leads to the estimator given by Horvitz and Thompson.
We have

$$V(T_5) = \sum_{s=1}^{S'}P_s\Big(\sum_{i\in s}\theta_{s_i}x_i\Big)^2 - T^2.$$
Next, to find the minimum variance estimator, we construct the function

$$H = V(T_5) + 2\sum_{i=1}^{N}\lambda_i\Big(\sum_{s\supset i}P_s\theta_{s_i} - 1\Big) = V(T_5) + 2\sum_{i=1}^{N}\lambda_i\sum_{s\supset i}P_s\theta_{s_i} - 2\sum_{i=1}^{N}\lambda_i,$$

where the $\lambda_i$'s are undetermined Lagrange multipliers, and attempt to solve the $n\,{}^{N}C_n$ equations

$$\partial H/\partial\theta_{s_i} = 0, \qquad \text{for } s=1,2,\ldots,S' \text{ and for all } i \text{ in each } s,$$

together with the $N$ restricting equations. Substituting unity for $\sum_{s\supset i}P_s\theta_{s_i}$, which is the coefficient of each $x_i$ when the expression in braces is rewritten, the above equations reduce to

$$P_sx_i\sum_{j\in s}x_j\theta_{s_j} + P_s\lambda_i = TP_sx_i, \qquad \text{for } s=1,2,\ldots,S' \text{ and all } i \text{ in each } s,$$

or

$$x_i\sum_{j\in s}x_j\theta_{s_j} + \lambda_i = Tx_i.$$

The restricting equations which we have used to simplify the above equations are

$$\sum_{s\supset i}P_s\theta_{s_i} = 1, \qquad \text{for } i=1,2,\ldots,N.$$

There are $n\,{}^{N}C_n + N$ independent equations and the same number of unknowns (including the $N$ $\lambda$'s), and therefore a unique solution exists. Again the algebraic solution of this system of linear simultaneous equations is cumbersome. Therefore let us simply say that $_0\theta_{s_i}$, for $s=1,2,\ldots,S'$ and all $i$ in each $s$, and $_0\lambda_i$, for $i=1,2,\ldots,N$, represent the values obtained as a result of solving the above $n\,{}^{N}C_n + N$ equations. The minimum variance estimator $T_5''$ will then be symbolically represented by

$$T_5'' = \sum_{i\in s}{}_0\theta_{s_i}\,x_i.$$
Let us determine the expression for its variance. Multiplying the last equation above by $P_s\theta_{s_i}$, then summing over all $i$ in each $s$ and then over all $s$, we find

$$\sum_{s=1}^{S'}P_s\Big(\sum_{i\in s}x_i\theta_{s_i}\Big)^2 + \sum_{s=1}^{S'}P_s\sum_{i\in s}\lambda_i\theta_{s_i} = T\sum_{s=1}^{S'}P_s\sum_{i\in s}x_i\theta_{s_i}.$$

This reduces to

$$\sum_{s=1}^{S'}P_s\Big(\sum_{i\in s}x_i\theta_{s_i}\Big)^2 + \sum_{i=1}^{N}\lambda_i = T\sum_{i=1}^{N}x_i\sum_{s\supset i}P_s\theta_{s_i};$$

substituting the $N$ conditions of unbiasedness, we find

$$V(T_5'') + \sum_{i=1}^{N}\lambda_i = 0.$$
Now taking the set of equations

$$x_i\sum_{j\in s}x_j\theta_{s_j} + \lambda_i = Tx_i,$$

and summing over all $i$ in each $s$ and then over all $s$, we will find

$$\sum_{s=1}^{S'}\Big(\sum_{i\in s}x_i\Big)\Big(\sum_{i\in s}x_i\theta_{s_i}\Big) + {}^{N-1}C_{n-1}\sum_{i=1}^{N}\lambda_i = {}^{N-1}C_{n-1}\,T^2,$$

i.e.,

$$\sum_{i=1}^{N}\lambda_i = T^2 - \big(1/{}^{N-1}C_{n-1}\big)\sum_{s=1}^{S'}\Big(\sum_{i\in s}x_i\Big)\Big(\sum_{i\in s}x_i\theta_{s_i}\Big).$$

Eliminating $\sum\lambda_i$ between the two equations concerned, and writing $_0\theta_{s_i}$ for $\theta_{s_i}$, we will find

$$V(T_5'') = \big(1/{}^{N-1}C_{n-1}\big)\sum_{s=1}^{S'}\Big(\sum_{i\in s}x_i\Big)\Big(\sum_{i\in s}x_i\,{}_0\theta_{s_i}\Big) - T^2.$$
Clearly the estimate of $V(T_5'')$ can assume negative values. That $V(T_5'')$ is the minimum variance of $T_5$ can be easily verified. When we set $_0\theta_{s_i} = N/n$ for all $i$ in each $s$, the resulting expression for $V(T_5'')$ reduces to

$$(N^2/n)\big[1 - (n-1)/(N-1)\big](1/N)\sum_{i=1}^{N}(x_i - \bar{x})^2, \qquad \text{where } \bar{x} = (1/N)\sum_{i=1}^{N}x_i,$$

the variance of the best linear unbiased estimate of the total in the case of sampling with equal probability and without replacement.
Again, if there is some approximate relationship between $x_i$ and a characteristic $y_i$, known for all $i$, such as $x_i = cy_i$, we can find a simulated minimum variance estimator by exactly the same procedures as for the estimator in class three. Thus if $_0\theta_{s_i}'$ is the simulated value of $_0\theta_{s_i}$, then this estimator will be

$$T_5''' = \sum_{i\in s}{}_0\theta_{s_i}'\,x_i,$$

and the estimate of its variance can be obtained by inserting $_0\theta_{s_i}'$ for $_0\theta_{s_i}$ in the expression for the estimate of $V(T_5'')$. However, the number of equations for obtaining the $_0\theta_{s_i}'$ for each $s$ increases. For example, for samples of size $n = 2$ and $N = 4$ and $5$, the number of equations involved will be 16 and 25, respectively.
4.7 Class Six Estimators

We consider now the estimators in class six, the general form being given by

$$T_6 = \phi_s\sum_{i\in s}x_i.$$

We recall that there are in all $n!\,{}^NC_n = S$ possible samples, taking into account the order of appearance of the elements, and therefore $S$ weights. For $T_6$ to be an unbiased estimate whatever the values of the $x$'s, we must have

$$E(T_6) = \sum_{s=1}^{S}P_{s(o)}\phi_s\sum_{i\in s}x_i = \sum_{i=1}^{N}x_i\sum_{s\supset i}\phi_sP_{s(o)} = \sum_{i=1}^{N}x_i,$$

1/ Still, with electronic machines the solution of such equations is no problem.
and therefore we must have

$$\sum_{s\supset i}\phi_sP_{s(o)} = 1, \qquad \text{for all } i.$$

We recall that $P_{s(o)} = P_{i_1}P^i_{j_2}\cdots P^{ij\ldots l}_{m_n}$, and when summed for all possible $n!$ permutations of $i, j, \ldots, m$,

$$\sum P_{s(o)} = P_s.$$

Suppose $\phi_s$ is the same for each possible ordering of a set of $n$ elements $u_i, u_j, \ldots, u_m$ which always includes $u_i$; this implies that the order of appearance of the elements is immaterial. Then the condition of unbiasedness becomes

$$\sum_{s\supset i}\phi_sP_s = 1, \qquad \text{for all } i.$$
This leads to the unbiased estimator in class five. From this step we can obtain the condition leading to the estimator in class three. Next, if each $\phi_s = \phi_{i_t}$ for every $s$ which contains $i$ at the $t$th draw, then we will have

$$\sum_{t=1}^{n}\phi_{i_t}\sum_{s\supset i_t}P_{s(o)} = 1, \qquad \text{i.e.,} \qquad \sum_{t=1}^{n}\phi_{i_t}P_{i_t} = 1, \qquad \text{for all } i,$$

and this leads to the estimator in class four. Further, if $\phi_s = \phi_i$ for all $s\supset i$, we will have $\phi_iP_i = 1$, and this, as considered above, leads to the unbiased estimator in class two.
One set of solutions satisfying the above set of $N$ equations is

$$\phi_s = (N/n)\big/\big(n!\,{}^NC_n\,P_{s(o)}\big).$$

Hence, one unbiased estimator is

$$T_6' = (N/n)\sum_{i\in s}x_i\big/\big(n!\,{}^NC_n\,P_{s(o)}\big).$$

One feature of this new estimator is that only those selection probabilities of the probability distributions used at each draw enter into its expression. This contrasts with other known estimators, excepting $T_4'''$ and $T_4''''$, where other probability distributions in the system, having no actual bearing on the formation of the sample, enter into the expression. From a computational point of view this estimator appears to enjoy considerable advantage over all others proposed so far.
Lastly, we note that when the selection probabilities are such that the expression in the denominator for $T_6'$ is close to 1, then the estimator is effectively like the one used in the equal probability sampling case. Indeed, when the probabilities at each draw are equal, the numerical value of $n!\,{}^NC_n\,P_{s(o)}$ is 1.
The procedure for finding the minimum variance estimator in class six is along the same lines as for the one in class three. First we have

$$V(T_6) = \sum_{s=1}^{S}P_{s(o)}\phi_s^2\Big(\sum_{i\in s}x_i\Big)^2 - T^2.$$

The equations for solving the $\phi$'s for all $s$, and the $\lambda_i$ $(i=1,2,\ldots,N)$, the Lagrange multipliers, are

$$\phi_s\Big(\sum_{i\in s}x_i\Big)^2 + \sum_{i\in s}\lambda_i = T\sum_{i\in s}x_i, \qquad s=1,2,\ldots,S,$$

and

$$\sum_{s\supset i}P_{s(o)}\phi_s = 1, \qquad i=1,2,\ldots,N.$$

There are $n!\,{}^NC_n + N$ independent equations, and therefore we can solve for the $\phi$'s and the $\lambda$'s. If $_0\phi_s$, for all $s$, and $_0\lambda_i$, for all $i$, are the solutions, then the minimum variance estimator will be given by

$$T_6'' = {}_0\phi_s\sum_{i\in s}x_i,$$

and its variance by

$$V(T_6'') = \sum_{s=1}^{S}P_{s(o)}\,{}_0\phi_s^2\Big(\sum_{i\in s}x_i\Big)^2 - T^2.$$
The unbiased estimate of $V(T_6'')$ is obtained as before; it can assume negative values. Here also the simulated minimum variance estimator can be worked out along the same lines as for the one in class three. For $n = 2$ and $N = 4, 5$, or more, the same number of equations will be involved in the solution of the $\phi$'s and $\lambda$'s as in the case of the class-five estimator; but for $n > 2$, the number of equations will necessarily be more. The estimate of the variance of the simulated minimum variance estimator, for any given $s$, can be obtained by substituting the simulated value $_0\phi_s'$ in the expression for the estimate of $V(T_6'')$. It also can assume negative values.
4.8 Class Seven Estimators

The general estimator in class seven is given by

$$T_7 = \sum_{i\in s}\psi_{s_{i_t}}x_i,$$

where it will be recalled that $\psi_{s_{i_t}}$ is the weight to be attached to the $i$th element appearing at the $t$th draw in the $s$th sample (whose elements appear in a specified order). The summation is only over the elements $i$ appearing in $s$. Again $s$ is used to identify the individual samples of size $n$, and the sub-subscript $t$ in the weight is used to indicate the weight appropriate at the $t$th draw. There are in all $n!\,{}^NC_n$ possible samples and $n\cdot n!\,{}^NC_n$ weights to be determined.
We have

$$E(T_7) = \sum_{s=1}^{S}P_{s(o)}\sum_{i\in s}\psi_{s_{i_t}}x_i = \sum_{i=1}^{N}x_i\sum_{s\supset i}P_{s(o)}\psi_{s_{i_t}}.$$

For $T_7$ to be unbiased, and at the same time for the weights to be independent of the $x$'s, we must have

$$\sum_{s\supset i}P_{s(o)}\psi_{s_{i_t}} = 1, \qquad \text{for all } i.$$
Clearly one set of solutions will be given by

$$\psi_{s_{i_t}} = (N/n)\big/\big(n!\,{}^NC_n\,P_{s(o)}\big), \qquad \text{for all } i\in s,$$

and this leads to the estimator in class six. Similarly, from the above condition of unbiasedness we can derive conditions leading to unbiased estimators in classes five, four, three and two, and, when all selection probabilities are equal, also class one.
The determination of the minimum variance estimator for class seven proceeds on the same lines as for class five. We set up the function

$$H = V(T_7) + 2\sum_{i=1}^{N}\lambda_i\Big(\sum_{s\supset i}P_{s(o)}\psi_{s_{i_t}} - 1\Big),$$

where the $\lambda_i$'s are the Lagrange multipliers specific to each of the $N$ restrictions stemming from the condition of unbiasedness, and solve the equations $\partial H/\partial\psi_{s_{i_t}} = 0$, which reduce, as before, to

$$P_{s(o)}x_i\sum_{j\in s}\psi_{s_{j_t}}x_j + P_{s(o)}\lambda_i = TP_{s(o)}x_i, \qquad \text{for } s=1,2,\ldots,S \text{ and for all } i \text{ in each } s,$$

together with the $N$ independent restricting equations. These $n\cdot n!\,{}^NC_n + N$ equations are all independent, and therefore a unique solution exists. Hence, a minimum value of $V(T_7)$ exists. Let the solution be symbolically represented as $_0\psi_{s_{i_t}}$, for all $s$ and for all $i$ in each $s$, and $_0\lambda_i$, for all $i$. Thus, the minimum variance estimator $T_7''$ is given by

$$T_7'' = \sum_{i\in s}{}_0\psi_{s_{i_t}}x_i,$$
and its variance by

$$V(T_7'') = \big(1/n!\,{}^{N-1}C_{n-1}\big)\sum_{s=1}^{S}\Big(\sum_{i\in s}x_i\Big)\Big(\sum_{i\in s}x_i\,{}_0\psi_{s_{i_t}}\Big) - T^2,$$

the unbiased estimate of which is

$$v(T_7'') = \Big[\big(1/n!\,{}^{N-1}C_{n-1}\big)\Big(\sum_{i\in s}x_i\Big)\Big(\sum_{i\in s}x_i\,{}_0\psi_{s_{i_t}}\Big) - \Big(\sum_{i\in s}x_i^2\Big)\big/n!\,{}^{N-1}C_{n-1} - 2\Big(\sum_{i<j\in s}x_ix_j\Big)\big/n!\,{}^{N-2}C_{n-2}\Big]\Big/P_{s(o)},$$

for any given sample $s$. $v(T_7'')$ can assume negative values.
Needless to say, as in the case of the previous six minimum variance estimators, we cannot use $T_7''$ unless auxiliary information is available. For example, if there is an approximate relationship between $x$ and $y$, such as $x_i = cy_i$ for all $u_i$ in the universe, we can construct a simulated minimum variance estimator on exactly the same lines as for the estimator, say, in class five. In this case there will be more simulated values to determine than in all the previous cases.
4.9 Summary

In summary we find that in each of the seven classes unbiased estimators exist; in the case of class one such an estimator exists only when the selection probabilities are equal. Also, all unbiased estimators reduce to the one in class one when all selection probabilities are equal at each draw. All known unbiased estimators have been classified. These estimators have weights which are functions of the probabilities. An estimator, believed to be new, has been derived (from the condition of unbiasedness)1/ in class six. So far it has not been possible to derive analogous types of unbiased estimators in class five and seven any different from those already derived. From the condition of unbiasedness in class seven every estimator can be derived. From six, every estimator except seven can be derived. From five, estimators in classes three, two and one can be derived. From four, estimators in classes two and one can be derived. From three, two and one can be derived, and finally from two, one can be derived (when the selection probabilities are all equal at each draw).

Further, minimum variance estimators exist in each class, though with weights not independent of the properties of the population. From a severely practical point of view they are useless, but they provide the means for the construction of other estimators. Thus estimators with simulated weights, obtained by the use of auxiliary information, called simulated minimum variance estimators, have been obtained; those in classes one, two and four appear cumbersome, but those in the remaining classes are much easier to construct. Both formal and simulated variances and their corresponding estimates have been obtained.

1/ It can be derived also intuitively.
How closely the simulated minimum variance approaches the true minimum variance will depend on the exactitude of the relationship between the characteristic under study and the characteristic chosen to determine the simulated weights.
V. FURTHER NOTES ON UNBIASED ESTIMATORS

5.1 General Notes

In Chapter IV unbiased estimators in each class, with weights which are explicit functions of the probabilities, were derived. In class one the best linear unbiased estimator was derived as a special case when all probabilities are equal.
Considering these special types of estimators, we have a unique one in class two (given on page 46), a sub-class in class three (given on page 51), three sub-classes in class four (given on pages 57, 63 and 64, respectively), and a sub-class in class six (given on page 75). Thus 12 interclass comparisons are possible among the estimators in these four classes; and within class four, three intraclass (or inter-sub-class) comparisons are possible. Alternatively, one might ask which of these six estimators has the least variance, given the probability system and sample size.

5.2 Comparison of Estimators
From what follows it will appear that we are only able to state definitely that the variance of the unbiased estimator in class three is always less than that in class six. Regarding the 14 remaining comparisons, we are, in the present state of knowledge, only able to say that the relative magnitudes of the variances in question are dependent on the probability system.
Let us compare the special estimators in class two and class three. We recall that these estimators are given by

$$T_2' = \sum_{i\in s}x_i/P_i \qquad \text{and} \qquad T_3' = \sum_{i\in s}x_i\big/{}^{N-1}C_{n-1}P_s.$$

Their variances are respectively

$$V(T_2') = \sum_{i=1}^{N}(x_i^2/P_i) + 2\sum_{i<j}x_ix_j(P_{ij}/P_iP_j) - T^2$$

and

$$V(T_3') = \big(1/{}^{N-1}C_{n-1}\big)^2\Big[\sum_{i=1}^{N}x_i^2\sum_{s\supset i}(1/P_s) + 2\sum_{i<j}x_ix_j\sum_{s\supset i,j}(1/P_s)\Big] - T^2,$$

where $\sum_{s\supset i,j}$ denotes summation over all samples which contain $u_i$ and $u_j$.
Now $T_3'$ is more efficient than $T_2'$ if $V(T_2') - V(T_3') > 0$. We find

$$V(T_3') - V(T_2') = \sum_{i=1}^{N}x_i^2\Big[\big(1/{}^{N-1}C_{n-1}\big)^2\sum_{s\supset i}(1/P_s) - 1/P_i\Big] + 2\sum_{i<j}x_ix_j\Big[\big(1/{}^{N-1}C_{n-1}\big)^2\sum_{s\supset i,j}(1/P_s) - P_{ij}/P_iP_j\Big].$$
Before proceeding to simplify the terms in brackets in the expression above, with a view to obtaining more meaningful quantities, we note that the total number of samples containing $u_i$ is $^{N-1}C_{n-1}$, and therefore the total number of $P_s$ with $s$ which include $i$ is $^{N-1}C_{n-1}$. Similarly, the total number of $P_s$ with $s$ which include $i$ and $j$ is $^{N-2}C_{n-2}$. Clearly, therefore, $\sum_{s\supset i}(1/P_s)\big/{}^{N-1}C_{n-1}$ is the reciprocal of the harmonic mean of all $P_s$ which include $i$, and also $\sum_{s\supset i,j}(1/P_s)\big/{}^{N-2}C_{n-2}$ is the reciprocal of the harmonic mean of all $P_s$ which include $i$ and $j$. Let us denote these two harmonic means by $HP_i$ and $HP_{ij}$, respectively. Further, we can write

$$P_i = {}^{N-1}C_{n-1}\,AP_i \qquad \text{and} \qquad P_{ij} = {}^{N-2}C_{n-2}\,AP_{ij},$$

where $AP_i$ and $AP_{ij}$ denote the arithmetic means of all $P_s$ which include $i$, and of all $P_s$ which include $i$ and $j$, respectively. Substituting these expressions into the formula for $V(T_3') - V(T_2')$ we have

$$V(T_3') - V(T_2') = \big(1/{}^{N-1}C_{n-1}\big)\sum_{i=1}^{N}x_i^2\big[(1/HP_i) - (1/AP_i)\big] + 2\big({}^{N-2}C_{n-2}\big/({}^{N-1}C_{n-1})^2\big)\sum_{i<j}x_ix_j\big[(1/HP_{ij}) - AP_{ij}/AP_iAP_j\big].$$
Now the harmonic mean is always less than the arithmetic mean (Hardy et al., 1934), and therefore, for any given $i$, $(1/HP_i) - (1/AP_i)$ is positive. However, given a probability system, for some pairs of $i$ and $j$, $(1/HP_{ij}) - AP_{ij}/AP_iAP_j$ may be positive, and for others negative or zero. Treating the expression for $V(T_3') - V(T_2')$ as a quadratic form, we can write out the $N$ conditions under which it will be positive definite, but the algebra is quite cumbersome. We can definitely state, however, that for some probability systems it will be positive definite and for others indefinite. This is because there are $G - D > N$ values of arbitrary probabilities which we are free to choose in order to make the quadratic form either positive definite or indefinite. In this particular comparison it can never be negative definite, because the special harmonic-arithmetic inequality given above is always positive. Thus for some probability systems $T_2'$ will be more efficient than $T_3'$, and for others vice versa. What these systems are is another matter.
The examination as to whether $V(T_3') - V(T_2')$ is positive definite, for any particular probability system, will involve the examination of all inequalities (resulting from the expansion of all principal minors of the determinant of the matrix) made up from the various kinds of harmonic and arithmetic means described above. Certainly the mathematical problems here are very complex. Further, such an examination may not lead to any meaningful interpretation. Similar conclusions will be arrived at, and similar difficulties encountered, for other comparisons in respect to the entire set of probability systems, or any particular one thereof, except for the case discussed below.
Next let us compare the unbiased estimator in class six with that in class three. We have $V(T_3')$ as before, and

$$V(T_6') = \big(1/n!\,{}^{N-1}C_{n-1}\big)^2\sum_{s=1}^{S'}\Big(\sum_{i\in s}x_i\Big)^2\sum_{(o)\in s}\big(1/P_{s(o)}\big) - T^2,$$

where $\sum_{(o)\in s}$ denotes summation of the reciprocal of $P_{s(o)}$ (which is given by $P_{i_1}P^i_{j_2}\cdots P^{ij\ldots l}_{m_n}$), for any given sample $s$ made up of elements $u_i, u_j, \ldots, u_l, u_m$, over all the $n!$ possible orders of appearance of the elements. Now $\sum_{(o)\in s}(1/P_{s(o)})\big/n!$ is the reciprocal of the harmonic mean of all $P_{s(o)}$'s for a given $s$. Let us denote this quantity by $HP_{s(o)}$. Similarly, we have

$$P_s = n!\,AP_{s(o)},$$

where $AP_{s(o)}$ is the arithmetic mean of all $P_{s(o)}$'s for a given $s$. Thus we find

$$V(T_6') - V(T_3') = \sum_{s=1}^{S'}\Big(\sum_{i\in s}x_i\Big)^2\big[(1/HP_{s(o)}) - (1/AP_{s(o)})\big]\Big/n!\,\big({}^{N-1}C_{n-1}\big)^2.$$

Since $HP_{s(o)} < AP_{s(o)}$ for all $s$, $V(T_6') - V(T_3') > 0$, whatever the values of the $x$'s and, of course, whatever the probability system. Hence $T_3'$ is always more efficient than $T_6'$, a very pleasing result.
In the discussion so far I have omitted the expressions for the variance of the three unbiased estimators in class four. They are all given by Des Raj (1956). Also, one of them is found in Das' (1951) paper. The expressions are very complicated, and the problem of comparing their variances, for any given probability system, appears to be intractable, as in the case of $T_2'$ and $T_3'$.

5.3 Multiple Comparison of Efficiency

Finally, if a multiple comparison of efficiency is attempted, we note that we can always find a probability system for which, for example, a given strict ordering of the variances of the six estimators obtains, provided

$$G - D = \sum_{r=2}^{n-1}(r-1)\,{}^NC_r + n\,{}^NC_n - 1 \geq 5N,$$

since there are five times the number of conditions to be satisfied here; the second inequality is true for $2 \leq n < N$, when $N \geq 7$.1/
We note that $V(T_3')$ must always be less than $V(T_6')$ in this system of inequalities. The order of magnitude of the variances in the inequality can be permuted except in respect to this restriction, and for each permissible permutation there corresponds a probability system. Thus, except for the cases noted, we conclude that for some probability systems the magnitudes of the variances will be in some unknown order, and for others in a different order, but still unknown. In other words, the order of efficiency of the six estimators is dependent on the probability system. What the probability system is in relation to a given order of efficiency is something we are unable to answer at present.

1/ When N = 3 and n = 2, we find G − D = 5, so that the multiple inequality of six members (variances) will not hold. But for a simple inequality a probability system which meets the requirement can certainly be found. Other exceptional cases for 3 < N ≤ 6 and for the relevant values of n can easily be worked out.
VI. THE PROBLEM OF NEGATIVE ESTIMATES OF VARIANCE

6.1 Introductory Remarks

In sampling with unequal probabilities, negative estimates of variance have been found for almost all estimators. It appears that such estimates have been rejected because of the lack of meaningful interpretations to be given in each specific situation. On purely logical grounds, once we have accepted the premises behind a theory, we cannot reject its conclusions (provided of course they are validly deduced). One difficulty here is that there is a tendency in our thinking, perhaps unconsciously, to equate a quantity such as $s^2$ (the square of the standard error) to the estimator of a variance, and when the numerical result is negative we find it unsatisfactory: partly because, in contrast to equal probability sampling, we are always used to finding positive results, and partly because its square root is not meaningful in the context of current statistical theories used for the purpose of determining confidence intervals and tests of significance.

In what follows, attempts will be made to interpret negative estimates of variance in certain classes of estimators. However, as we shall soon see, not all classes of estimators even allow the easy discovery, in meaningful terms, of the conditions under which their estimates of variance are positive. The main source of this difficulty lies in the complexity of the probability functions which are enmeshed in the expressions for these estimators.
6.2 The Case for Class Two Estimators

Regarding class two, Horvitz and Thompson's estimator of the variance is given by

    V̂(T2) = Σ_{i=1}^{n} [(1 - P_i)/P_i²]·x_i² + 2 Σ_{i<j} [(P_ij - P_i·P_j)/(P_i·P_j·P_ij)]·x_i·x_j ,

and Yates and Grundy's (1953) version of it by

    V̂(T2) = Σ_{i=1}^{n} (x_i²/P_i²) Σ_{j(≠i)=1}^{n} [(P_i·P_j - P_ij)/P_ij] + 2 Σ_{i<j} [(P_ij - P_i·P_j)/(P_i·P_j·P_ij)]·x_i·x_j .
Both these estimators can lead to negative estimates for certain samples, except in the case of the latter estimator for two special situations noted by Sen (1953). For each estimator we can, regarding its expression as a quadratic form, write down the conditions under which, for a given sample, it will be positive definite. However, it is not easy to interpret these conditions in a meaningful way. In this connection, Sen in the same paper advocated taking zero as the estimate whenever this estimator yields a negative value. As a practical measure it is certainly appropriate, but we do not know what meaning it carries. For a given sample, the interpretation to be given to these sorts of conditions appears to be a basis for interpreting negative in relation to positive estimates of variances, as we shall see from the case of the unbiased estimator in class three.
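As a concrete illustration of the two forms above, the following sketch (not part of the thesis) evaluates both for an invented N = 3, n = 2 design with inclusion probabilities P = (0.9, 0.6, 0.5) and joint probabilities P12 = 0.5, P13 = 0.4, P23 = 0.1; for the sample consisting of the last two units, the Horvitz-Thompson form is negative while the Yates-Grundy form is positive.

```python
from itertools import combinations

def ht_variance_estimate(x, p, pij):
    """Horvitz-Thompson form: sum_i (1 - P_i)/P_i^2 * x_i^2
       + 2 sum_{i<j} (P_ij - P_i P_j)/(P_i P_j P_ij) * x_i x_j."""
    n = len(x)
    v = sum((1 - p[i]) / p[i] ** 2 * x[i] ** 2 for i in range(n))
    v += 2 * sum((pij[i, j] - p[i] * p[j]) / (p[i] * p[j] * pij[i, j]) * x[i] * x[j]
                 for i, j in combinations(range(n), 2))
    return v

def yg_variance_estimate(x, p, pij):
    """Yates-Grundy form: sum_{i<j} (P_i P_j - P_ij)/P_ij * (x_i/P_i - x_j/P_j)^2."""
    return sum((p[i] * p[j] - pij[i, j]) / pij[i, j]
               * (x[i] / p[i] - x[j] / p[j]) ** 2
               for i, j in combinations(range(len(x)), 2))

# Sample made up of the units with P = 0.6 and P = 0.5, joint P = 0.1:
x, p, pij = [1.0, 1.0], [0.6, 0.5], {(0, 1): 0.1}
# ht_variance_estimate(x, p, pij) is about -10.22 (negative),
# while yg_variance_estimate(x, p, pij) is about 0.22 (positive).
```

The design and data are hypothetical; they merely show that the two algebraically different forms can disagree in sign for the same sample.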
6.3 The Case for V̂(T3)

This unbiased estimator in class three is given by

    T3 = Σ_{i∈s} x_i / (N-1Cn-1 · P_s) .

Its estimate of variance is

    V̂(T3) = [1/(N-1Cn-1·P_s)] { [1/(N-1Cn-1·P_s) - 1] Σ_{i∈s} x_i² + 2 [1/(N-1Cn-1·P_s) - 1 - (N-n)/(n-1)] Σ_{i<j∈s} x_i·x_j } ,

and it can be shown that V̂(T3) > 0 when P_s < 1/NCn. To prove this result we establish the following lemma.
Lemma. The quadratic form

    Q = a Σ_{i=1}^{n} y_i² + 2(a-b) Σ_{i<j} y_i·y_j ,

where b > 0, is positive definite if, and only if, a > [(n-1)/n]·b.

To prove this we note that the n × n matrix of this quadratic form is

    | a     a-b   ...   a-b |
    | a-b   a     ...   a-b |
    | ...   ...   ...   ... |
    | a-b   a-b   ...   a   | .

The necessary and sufficient condition for Q to be positive definite is that the leading principal minors

    a ,   | a     a-b |       | a     a-b   a-b |
          | a-b   a   | ,     | a-b   a     a-b | ,   and so on,
                              | a-b   a-b   a   |

should all be positive. Since b > 0, the second inequality reduces to a > (1/2)·b, the third to a > (2/3)·b, and finally the nth inequality to a > [(n-1)/n]·b. The necessary and sufficient condition therefore reduces to

    a > [(n-1)/n]·b .

When a = [(n-1)/n]·b, Q = 0 when every y_i equals Σ y_i / n.
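The minor computation behind the lemma can be checked numerically. For this matrix the k × k leading principal minor works out to b^(k-1)·(k·a - (k-1)·b), a closed form not stated in the thesis but immediate from the matrix structure; the sketch below (function names are mine) uses it to confirm the threshold a = [(n-1)/n]·b.

```python
def leading_minor(a, b, k):
    """k x k leading principal minor of the matrix with a on the diagonal
    and (a - b) elsewhere, in closed form: b**(k-1) * (k*a - (k-1)*b)."""
    return b ** (k - 1) * (k * a - (k - 1) * b)

def is_positive_definite(a, b, n):
    """Lemma's criterion: every leading principal minor must be positive."""
    return all(leading_minor(a, b, k) > 0 for k in range(1, n + 1))

# The threshold a = ((n-1)/n) * b separates the two cases, as the lemma states:
# with n = 5 and b = 1, is_positive_definite(0.81, 1.0, 5) is True
# while is_positive_definite(0.79, 1.0, 5) is False.
```

For a quick sanity check of the closed form itself: with a = 2, b = 1, the 3 × 3 matrix [[2,1,1],[1,2,1],[1,1,2]] has determinant 4, which the formula reproduces.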
To apply this lemma we note that

    a = 1/(N-1Cn-1·P_s) - 1   and   b = (N-n)/(n-1) > 0

in the formula for V̂(T3), disregarding the constant term outside the square braces. Hence V̂(T3) is positive definite if and only if

    1/(N-1Cn-1·P_s) - 1 > [(n-1)/n]·[(N-n)/(n-1)] ,  i.e.,
    1/(N-1Cn-1·P_s) > N/n ,  i.e.,
    P_s < 1/NCn .
If for any s it happens that P_s = 1/NCn, then by applying the lemma, or even by substituting 1/NCn for P_s, it can be easily shown that

    V̂(T3) = N² [Σ_{i∈s}(x_i - x̄)²/n(n-1)] (1 - n/N) > 0 .

We note that 1/NCn is the probability of drawing any sample of size n when all selection probabilities are equal. Also this quantity is the arithmetic average of all the P_s. Thus the practical upshot of this result is that all samples with high probabilities of appearing (in the sense of being greater than 1/NCn) will have negative estimates of variance. This also means that even if the selection probabilities, which make up each P_s, are optimum (in the sense that V(T3) is a minimum for these values, whatever they may be) the above condition still holds.
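The class-three rule above reduces to a sign test on P_s. A minimal sketch (function names are mine; the numbers in the comments come from the worked example of section 6.4, where N = 4 and n = 2):

```python
from math import comb

def t3_estimate(xs, N, n, Ps):
    """T3 = (sum of sampled x's) / (C(N-1, n-1) * P_s)."""
    return sum(xs) / (comb(N - 1, n - 1) * Ps)

def t3_variance_estimate(xs, N, n, Ps):
    """V-hat(T3) in the quadratic-form version used with the lemma."""
    c = comb(N - 1, n - 1)
    a = 1 / (c * Ps) - 1
    b = (N - n) / (n - 1)
    sum_sq = sum(x * x for x in xs)
    cross = sum(xs[i] * xs[j] for i in range(len(xs)) for j in range(i + 1, len(xs)))
    return (1 / (c * Ps)) * (a * sum_sq + 2 * (a - b) * cross)

# Sample (3,4) of section 6.4: x = (9, 6), P_s = 14/60 > 1/C(4,2) = 1/6,
# so the estimate of variance must be negative; it comes out to -170 40/49,
# while the estimate of the total is 21 3/7.
```

The sign of the result is governed entirely by whether P_s exceeds 1/NCn, exactly as the text concludes.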
6.4 An Example for V̂(T3)

Sometimes samples with high probabilities of appearing may also yield estimates which are near the true value, as the following example illustrates. We have the following data for a population of four units:

    i    x    y
    1    4    2
    2    5    4
    3    9    6
    4    6    8

It is desired to estimate the total for the x-population by sampling two units. Suppose we know y but not x, and y is in some way related to x. Now in general it is extremely difficult to obtain the optimum probabilities (in the sense explained above) with which the x-units are to be selected. But for the estimator which we shall use, T3, it is not difficult to determine the optimum P_s. This optimum P_s is given by

    P_s = Σ_{i∈s} x_i / (N-1Cn-1 · T) ,

where T is the total of the x-population. The proof is as follows.
Here it is of interest to use the Cauchy inequality. Now V(T3) is a minimum, for variations in P_s, if

    Σ_{s=1}^{S'} (Σ_{i∈s} x_i)² / P_s

is a minimum, where S' = NCn is the number of possible samples. This can easily be seen from the formula for V(T3). But by the Cauchy inequality

    [Σ_{s=1}^{S'} Σ_{i∈s} x_i]² = [Σ_{s=1}^{S'} ((Σ_{i∈s} x_i)/√P_s)·√P_s]² ≤ [Σ_{s=1}^{S'} (Σ_{i∈s} x_i)²/P_s]·[Σ_{s=1}^{S'} P_s] ,

and Σ_s P_s = 1. Equality is only attained when (Σ_{i∈s} x_i)/√P_s ÷ √P_s is a constant for all s, that is, when

    Σ_{i∈s} x_i = A·P_s   for all s,

where A is some constant which can be determined by summing over all s. We do this and find

    Σ_{s=1}^{S'} Σ_{i∈s} x_i = N-1Cn-1 · T = A ,

and thus

    P_s = Σ_{i∈s} x_i / (N-1Cn-1 · T) .
Des Raj (1954) also stated this result, but he did not appear to be aware that it was an optimum for the P_s. One set of selection probabilities appropriate to this situation is to select the first unit with probability proportional to the size of the x-character and the remaining n-1 units with equal probabilities and without replacement. For these values of P_s, V(T3) = 0. This result is not in itself of value, but it provides the basis for determining the near-optimum P_s.
Coming back to our example, we must select the first unit with probability proportional to the x-value and the remaining unit with equal probabilities. In our case we do not know the x's, but we know the corresponding y values, which are in some way related to the x's. Hence all we can do is to take

    P_s = Σ_{i∈s} y_i / (N-1Cn-1 · T_y) .

Again we have to "unscramble" this probability to obtain the "scrambled" selection probabilities in terms of the known y's in order to select the sample. This means that we select the first unit with probability proportional to its y-measure and the remaining unit with equal probability. Then the formula for the estimate is

    T3 = Σ_{i∈s} x_i / (N-1Cn-1 · P_s) = (Σ_{i∈s} x_i / Σ_{i∈s} y_i) · T_y ,

which is Lahiri's (1951) ratio estimate.*
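Lahiri's selection device can be sketched as an acceptance-rejection scheme (this implementation is mine, not the thesis'): draw a candidate sample uniformly, then accept it with probability proportional to its y-total, so that P_s ∝ Σ_{i∈s} y_i without any "unscrambling."

```python
import random

def lahiri_sample(y, n, rng):
    """Return an n-subset of unit indices, selected with probability
    proportional to its y-total, by acceptance-rejection."""
    units = list(range(len(y)))
    bound = sum(sorted(y, reverse=True)[:n])  # no sample total exceeds this
    while True:
        s = tuple(sorted(rng.sample(units, n)))
        # Accept with probability (sample y-total) / bound.
        if rng.random() * bound <= sum(y[i] for i in s):
            return s

# With y = (2, 4, 6, 8) as in the example, the sample {3, 4}
# (indices 2, 3) should appear with long-run frequency 14/60.
```

The uniform proposal over subsets combined with the size-proportional acceptance step yields exactly the P_s used above.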
In our example, the possible samples, the P_s values in terms of y, the estimates, and the estimated variances are as follows, where for n = 2

    V̂[T3(n=2)] = (T3)² - [T3/(x_i + x_j)]·(x_i² + x_j² + 6·x_i·x_j) .

    Sample     P_s       T3(n=2)     V̂[T3(n=2)]
    1,2        6/60      30          363 1/3
    1,3        8/60      32 1/2      273 3/4
    1,4        10/60     20          8
    2,3        10/60     28          32
    2,4        12/60     18 1/3      -65 5/9
    3,4        14/60     21 3/7      -170 40/49
The total for x is 24. The sample (3,4), which yields an estimate of 21 3/7, the closest to the true value, has a very large negative estimate of variance. The samples (1,4) and (2,3), which have the same probabilities of selection, yield estimates which deviate equally from the true value, but the estimates of their variances are different. The samples (1,2) and (1,3), for which each corresponding P_s < 1/NCn, have very large positive estimates of variance. The true variance is V[T3(n=2)] = 26.53. As in equal probability sampling, a low estimate of variance does not necessarily imply that the estimate is always closer to its true value than one having a high estimate of variance. But in this example it will be noted that accuracy (meaning absolute deviation from true values) is associated with low estimates of variance and high probabilities of selecting the sample in question.

* In his sampling procedure there is a direct method for selecting the samples with probability proportional to Σ_{i∈s} y_i, and P_s need not be "unscrambled."
6.5 A Further Example for V̂(T3), Illustrative of an Unwise Choice of Selection Probabilities

It is possible to manufacture an example in which samples with high or low probabilities of selection yield estimates deviating greatly from their true values, by suitably manipulating the probabilities. Consider the same data again for x, but this time the auxiliary information is not so "good" (in the sense that there is no high positive relationship between x and y, in turn leading to a much higher variance than in 6.4).
    i    x    y
    1    4    4
    2    5    3
    3    9    1
    4    6    2

    Sample     P_s      T3(n=2)     V̂[T3(n=2)]
    1,2        7/30     12 6/7      -64 34/49
    1,3        5/30     26          50
    1,4        6/30     16 2/3      -48 8/9
    2,3        4/30     35          285
    2,4        5/30     22          2
    3,4        3/30     50          1030

In the selection of samples of size n = 2, as in the first example, the first unit is selected with probability proportional to the new y-measure and without replacement, and the second with equal probability. The results are given above; the true variance is V[T3(n=2)] = 124.79.
The samples with the highest and lowest probabilities of selection have yielded the worst estimates. The cause of this state of affairs is to be found in the selection probabilities; we have associated large values of x with small values of y and based our selection probabilities on these y's. Also we have used what are believed to be optimum probabilities for the selection of each sample, but their use has not been of much help, as evidenced by V[T3(n=2)] = 124.79; with the previous set of y's it was 26.53.

6.6 Some Interpretations of Negative Estimates of Variance for T3
In view of the above examples, we know that when T3 is to be used as an estimator, negative estimates of variance are likely to occur whatever the probability system, except for those systems where the selection probabilities are such that each P_s = 1/NCn. The question we have arrived at is this: how are negative estimates of variance to be treated? The above examples seem to give a clue to the answer. When the selection probabilities are derived from the near-optimum probabilities (optimum in the sense of P_s being optimum), then there appears to be meaning in using them as indicators of the precision attained in sampling. We have not introduced any quantity like s² to be equated to the estimate of the variance, partly because in a sampling theory of this kind it is unnecessary (at least in the writer's view)* and partly because in the present state of knowledge we are unable to give a meaningful interpretation to imaginary numbers in relation to the theory of statistical sampling. If there is basis for belief that the selection probabilities are not appropriate (in the sense that the P_s which are derived from them are not near the optimum P_s), then there appears to be no way out of the difficulty. Indeed, a negative estimate may sometimes mean that the selection probabilities may not be appropriate, as revealed by our second example. This remark may be at least as meaningful as the assertion, necessitated by practical circumstances, that its estimate is zero.

* When we are concerned with very small samples, as here, there is no basis for considering s², since even if we do, we cannot use it for the purpose of determining confidence intervals in view of the fact that we do not have the advantages of large sample theory at our disposal.
6.7 Estimators in Class Four

There are three sub-classes of unbiased estimators in class four: T4, proposed by Das (1951), and T4' and T4'', by Des Raj (1956). Little can be said about the estimators V̂(T4') and V̂(T4'') as quadratic forms, in view of their complexity. Generally they all yield negative estimates of variance. However, Des Raj has shown that for a special form of T4'', given in his notation as

    t̄ = (1/n) Σ_{i=1}^{n} t_i

(where the t's are as defined on page 64), the estimate of the variance is

    V̂(t̄) = [1/n(n-1)] Σ_{i=1}^{n} (t_i - t̄)² ,

and is therefore always positive. This property is partly a consequence of the result that Cov(t_a, t_b) = 0 for a ≠ b, so that E(t_a·t_b) = T², which leads to the result that an unbiased estimate of T² is t_a·t_b.
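The always-positive property of this special form can be seen in a two-line sketch (illustrative; here the t_i stand for any draw-wise unbiased estimates of the total, as in Des Raj's construction):

```python
def des_raj_mean(t):
    """Mean of the draw-wise estimates t_1, ..., t_n together with its
    estimate of variance, sum (t_i - tbar)^2 / (n (n - 1)), which is a
    sum of squares and therefore can never be negative."""
    n = len(t)
    tbar = sum(t) / n
    v = sum((ti - tbar) ** 2 for ti in t) / (n * (n - 1))
    return tbar, v

# des_raj_mean([10.0, 14.0]) gives (12.0, 4.0); whatever the probability
# system producing the t_i, the second component cannot go below zero.
```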
6.8 Estimators in Class Five and Class Seven

There are no especial types of unbiased estimators for class five and class seven. Let us consider an unbiased estimator for class five. It will be given by, say,

    T5 = Σ_{i=1}^{n} Q_si·x_i ,

where the Q's satisfy the condition of unbiasedness, i.e.,

    Σ_{s∋i} P_s·Q_si = 1   for all i.

The estimate of the variance of T5 will be

    V̂(T5) = (Σ_{i=1}^{n} Q_si·x_i)² - (1/P_s)·[Σ_{i∈s} x_i²/N-1Cn-1 + 2 Σ_{i<j∈s} x_i·x_j/N-2Cn-2] .

We see that V̂(T5) can be negative. Here also the conditions under which it will be positive are hard to determine. Similar formulas and remarks apply to any unbiased estimator in class seven.
The special type of unbiased estimator in class six is given by

    T6 = Σ_{i∈s} x_i / (n!·N-1Cn-1·P_s(o)) ,

with estimated variance, which is unbiased, given by

    V̂(T6) = (T6)² - [1/(n!·N-1Cn-1·P_s(o))]·(Σ_{i∈s} x_i² + 2[(N-1)/(n-1)] Σ_{i<j∈s} x_i·x_j) .

Applying the lemma established earlier, we will find that V̂(T6) is positive definite if

    P_s(o) < 1/[N(N-1)···(N-n+1)] ,

or if

    (1/N)·(1/(N-1)) ··· (1/(N-n+1)) > p_i·p_j|i ··· .

It will be noted that (1/N)·(1/(N-1)) ··· (1/(N-n+1)) is the probability of obtaining a given sample in a given order when all probabilities are equal, as is also the p-product on the right-hand side of the inequality, but when the selection probabilities at each draw are unequal.

When P_s(o) = 1/[N(N-1)···(N-n+1)] we will find

    V̂(T6) = N²·[Σ_{i∈s}(x_i - x̄)²/n(n-1)]·(1 - n/N) ,

where x̄ = Σ_{i∈s} x_i/n. There is one distinctive feature about this estimator. The same set of elements drawn in a different order will give rise to a different estimate of the total and a different estimate of the variance, despite the fact that the sample aggregate Σ_{i∈s} x_i remains the same. Thus the same set of elements may sometimes have a positive estimate of variance and sometimes a negative one.

By the use of the Cauchy inequality we can, as in the case of T3, show that the optimum value of P_s(o) is

    P_s(o) = Σ_{i∈s} x_i / (n!·N-1Cn-1·T) .

In view of the similarity of conditions with respect to T3, negative estimates of variance here may also be interpreted as described in 6.6.
6.10 The Existence of Probability Systems for Which V̂(T2), V̂(T3) and V̂(T6) are Always Positive

One might ask whether for each class of unbiased estimator there are probability systems for which the estimate of the variance for every possible sample is positive.

We already know the answer for class one. The system where the elements are sampled with equal probability and without replacement yields estimates of variance which are always positive.

In class two there is only one unbiased estimator, and Sen has noted the two special cases in which the estimate of variance is positive: (i) when the first unit is selected with an arbitrary probability and the remaining n-1 with equal probability at each draw and without replacement, and (ii), applicable only for the case of two elements, when the first unit is selected with an arbitrary probability and the second with probabilities proportional to the probability masses of the remaining elements. (The latter statement is often phrased as "probability proportional to size.") We will show that there are probability systems for which the Horvitz-Thompson estimator of variance, or Yates and Grundy's version of it, is positive for every possible sample.
These estimators can be written in a general way as

    V̂ = Σ_{i∈s} q_i·x_i² + 2 Σ_{i<j∈s} q_ij·x_i·x_j ,

where the q_i and q_ij are functions of the selection probabilities, specific to the sample in question and specific to the type of estimator. Now there are only NCn distinct samples possible, and therefore there are only NCn distinct quadratic forms. With each estimator there are n conditions to be satisfied for its corresponding quadratic form to be positive definite. Thus in all there are at most n·NCn conditions to be satisfied, perhaps not all distinct. Each set of n conditions is that the principal minors of V̂ should all be positive, and the conditions are therefore in the form of inequalities. The members of these inequalities are made up of the q's and therefore of all the selection probabilities in the system.

There are in all

    G = Σ_{r=1}^{n} r·NCr

arbitrary probabilities in the entire system. These arbitrary probabilities are subject to

    D = 1 + NC1 + NC2 + ··· + NCn-1

restrictions of the type Σ p = 1. Viewing the arbitrary probabilities as quantities which we are free to choose (subject of course to the restrictions mentioned above), we have altogether

    G - D = Σ_{r=2}^{n-1} (r-1)·NCr + n·NCn - 1   (for n ≥ 3)

of them. Now we have noted above that, for the NCn quadratic forms (and therefore the estimators) to be positive definite, n·NCn conditions, at most, must be satisfied. Clearly we can choose the various p's such that these conditions are all satisfied, since

    G - D > n·NCn   for n ≥ 3.
For the case n = 2, G - D = N² - N - 1, and the exact number of conditions (regarding the principal minors) to be satisfied is

    N + NC2 = N(N+1)/2 .

Here also G - D > N(N+1)/2. This proves the above assertion. Thus the two probability systems on which Sen's two cases rest are part of a larger group of systems, all of which yield positive estimates of variance for every possible sample. The identification of these systems is another matter; certainly the task is not an easy one.
Regarding class three, we have already shown that when P_s happens to be equal to 1/NCn, then V̂(T3) > 0. Now consider the case when

    P_s = 1/NCn   for all s.

There are NCn such equations and G - D arbitrary probabilities which we are free to choose. Since G - D > NCn, there is a group of systems (of which the equal probability case is just one) for which every possible V̂(T3) > 0, but only arising from the exact condition P_s = 1/NCn for all s.
It is difficult to establish the proposition for the estimators T4' and T4'', and for the unbiased estimators of class six and class seven. For an unbiased estimator T5 in class five, where there are NCn distinct quadratic forms corresponding to each estimator, the proof is exactly as given above for the class two estimator.

Des Raj's special estimator t̄ (when c_i = 1/n) is an example of an estimator accommodating itself to any system. Whatever the probability system, the estimate of its variance is always positive.
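The counting argument of this section can be checked directly (a sketch; the function names are mine). G is the total number of selection probabilities in the system, D the number of normalization restrictions, and their difference matches the closed form quoted above:

```python
from math import comb

def g_minus_d(N, n):
    """Free probabilities: G = sum_{r=1}^{n} r*C(N,r) selection
    probabilities minus D = sum_{r=0}^{n-1} C(N,r) restrictions."""
    G = sum(r * comb(N, r) for r in range(1, n + 1))
    D = sum(comb(N, r) for r in range(n))
    return G - D

def closed_form(N, n):
    """sum_{r=2}^{n-1} (r-1)*C(N,r) + n*C(N,n) - 1."""
    return sum((r - 1) * comb(N, r) for r in range(2, n)) + n * comb(N, n) - 1
```

For N = 7, n = 3 both give 125, comfortably above the n·NCn = 105 conditions; for N = 3, n = 2 the count is only 5, the exceptional case noted in the footnote to section 5.3.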
VII. A NOTE ON OPTIMUM PROBABILITIES

7.1 Introductory Note

In the foregoing account we have been speaking about arbitrary selection probabilities in a most general way. We mentioned in section 1.3 that optimum values of these probabilities may be chosen in any defined sense.

The notion of optimum probabilities is due to Hansen and Hurwitz (1949). They spoke of it to mean the set of selection probabilities that minimized "the variance for a fixed cost of obtaining sample results," or alternatively "the cost for a fixed sampling error."
7.2 A Simple Illustration

Leaving aside the question of costs, which is admittedly unrealistic, we can determine, at least theoretically, the set of probabilities that will minimize the variance of any estimator, say T_a, having a variance V(T_a). For this we set up a function H consisting of V(T_a) together with undetermined-multiplier terms for the restrictions, and solve the equations

    ∂H/∂p = 0

for all p, together with the entire set of restrictions, one of which is symbolized by Σ p = 1. The actual solutions are not easy in practice. Further, we can only proceed if we have some knowledge of the x's through some related characteristic y.

The following example, relating to Horvitz and Thompson's estimator for the case N = 3 and n = 2, with probabilities for the selection of the second unit dependent on the initial probabilities, illustrates the procedure. [The expression for V(T), involving terms in p_i/(1 - p_j), is omitted.] Let us assume we know about the x's by an approximate relationship x = cy. To find the optimum p's we set

    H = V(T) + λ(p_1 + p_2 + p_3 - 1) ,

where λ is the undetermined Lagrange multiplier, and put ∂H/∂p_1 = 0.
We have two similar equations corresponding to setting ∂H/∂p_2 = 0 = ∂H/∂p_3. There are four equations and four unknowns (including λ) and therefore, theoretically at least, they can be solved. Technically, of course, the solution is not easy. We may solve them by Newton's method.

Let the function obtained by eliminating λ between ∂H/∂p_1 = 0 and ∂H/∂p_2 = 0, and then writing p_3 = 1 - (p_1 + p_2), be denoted by f_12(p_1, p_2) = 0. Also let the function obtained by eliminating λ between ∂H/∂p_2 = 0 and ∂H/∂p_3 = 0, and then writing p_3 = 1 - (p_1 + p_2), be denoted by f_23(p_1, p_2) = 0. These two equations involve p_1 and p_2 only. If (p_1', p_2') is an approximate solution sufficiently close to the true solution (p_1, p_2), then we will have, by Taylor's theorem,

    f_12(p_1', p_2') + (p_1 - p_1')·∂f_12/∂p_1 + (p_2 - p_2')·∂f_12/∂p_2 = 0

and

    f_23(p_1', p_2') + (p_1 - p_1')·∂f_23/∂p_1 + (p_2 - p_2')·∂f_23/∂p_2 = 0 ,

the derivatives being evaluated at (p_1', p_2') and second and higher order derivatives being neglected, as they will be insignificant compared with those of the first order. Thus we can solve for p_1 and p_2 from the above simultaneous equations, which are linear in p_1, p_2. Using these values again, we can obtain an improved solution. A prolonged iterative procedure, however, will not be necessary if the right set of values can be guessed. We may start by guessing that

    p_i' = y_i/(y_1 + y_2 + y_3) .

Obviously this is about the right order of magnitude. When we have obtained sets of (p_1, p_2) which do not differ appreciably, then we may stop and take p_3 as given by

    p_3 = 1 - (p_1 + p_2) .

These will be the initial probabilities of selection which will make V(T_a) a minimum.
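The iteration just described can be sketched generically (this code is mine; f12 and f23 stand in for the eliminated-multiplier equations, which in the thesis depend on V(T) — any pair of smooth functions will do, with the derivatives taken numerically):

```python
def newton_step(f12, f23, p1, p2, h=1e-6):
    """One Newton iteration: linearize f12 = 0 and f23 = 0 at (p1, p2),
    as in the Taylor expansions above, and solve the 2 x 2 linear system."""
    def value_and_partials(f):
        f0 = f(p1, p2)
        return f0, (f(p1 + h, p2) - f0) / h, (f(p1, p2 + h) - f0) / h
    a0, a1, a2 = value_and_partials(f12)
    b0, b1, b2 = value_and_partials(f23)
    det = a1 * b2 - a2 * b1
    # Solve a1*d1 + a2*d2 = -a0 and b1*d1 + b2*d2 = -b0 by Cramer's rule.
    d1 = (-a0 * b2 + a2 * b0) / det
    d2 = (-a1 * b0 + a0 * b1) / det
    return p1 + d1, p2 + d2
```

Starting from the guess p_i' = y_i/Σy and repeating newton_step until successive (p_1, p_2) agree gives the optimum initial probabilities; p_3 then follows as 1 - (p_1 + p_2).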
7.3 Optimum Probabilities with Costs in Multivariate Sampling

When we take cost into consideration, the problem can be formulated most generally as follows. Let C_i be the cost of sampling the i-th element (i = 1, 2, ..., N), whatever the form of the cost function. For example, C_i may be a function of the population, of the size of the physical area if the unit of sampling is a given delineated area, etc. The expected cost will be given by

    E = Σ_{i=1}^{N} C_i·P_i ,

where P_i is the probability that the i-th element is included in the sample. Suppose we wish to use some unbiased estimator T_a belonging to one of the seven classes for the estimation of k characters. Let the allowable amount of variance fixed for the g-th character be F_g. Then we set up the function

    H = E + Σ_{g=1}^{k} λ_g·[V(gT_a) - F_g] + Σ λ·(Σ p - 1) ,

where the λ's are the undetermined Lagrange multipliers, and solve the resulting simultaneous equations

    ∂H/∂p = 0

for all selection probabilities p, together with the equations

    V(gT_a) = F_g ,

and the entire set of equations symbolized by Σ p = 1. When V(gT_a) is differentiated with respect to the p's, functions of the gx's will appear, and therefore we must have auxiliary information about each gx through some other variable, say gy, for all g, to solve these equations. The optimum values of the p's obtained will thus be optimum for the multivariate sampling of k characters.
VIII. SUMMARY AND CONCLUSIONS

8.1 Summary

An examination of a unified theory of sampling for a finite universe propounded by Godambe, relevant only to the case when the elements, as units of sampling, or in clusters of non-overlapping elements, are drawn one at a time, shows that it encompasses several special fields, each depending on whether or not the units are replaced after each draw. When the sampling is one stage the following situations are conceivable: (i) when the elements are not replaced after each draw, (ii) when they are occasionally replaced, (iii) when they are always replaced, and (iv) a "chaotic" situation when some or all of the elements, none of which have been previously replaced, are replaced prior to a given draw. Theories of sampling specific to each situation can be developed. In this thesis the special theory of sampling appropriate to situation (i) has been constructed. There appears to be a need for such a theory for this particular field, because during the past seven years various estimators have been proposed and the logical connections between them in the wider perspective of such a theory have not been explored. A deductive approach is necessary for such a task.

As a prerequisite, the most general probability system appropriate to this situation has been defined in Chapter II, and expressions for all a priori probabilities needed for the development of the theory have been given.

In Chapter III, the three axioms of sample formation have been stated. Seven general linear estimators were constructed on the basis of these axioms.
T1 has weights which are based on the order of appearance, or of drawing, of the elements; T2 on the presence or absence of a given element in the sample; T3 on the set of elements composing the sample; T4 on the appearance of a given element at a given draw; T5 on the given element and the particular sample in which it appears; T6 on the set of elements appearing in a specified order; and finally T7 on the element, the order of its draw, and the particular sample in which it appears. Each typifies a class of estimator.

From the condition of unbiasedness, special estimators were determined in each class. One new special unbiased estimator was obtained in class six. Also, minimum variance estimators were obtained in each class. However, the weights of each of these minimum variance estimators were not independent of the properties of the population; i.e., they were functions of the x-values.

By the use of auxiliary information, in the form of an approximate relationship, x = cy, presumed to hold and known for all y's in the universe, all weights in each class were simulated. In practice these weights will be obtained by solving certain systems of linear simultaneous equations specific to each class. Thus seven simulated minimum variance estimators were obtained.

The efficiencies of six known special types of unbiased estimators (T2, T3, T4, T4', T4'', T6) were compared. It was found that V(T6) > V(T3) whatever the probability system. In regard to the fourteen remaining comparisons, the relative efficiencies depend on the probability system.

On the question of multiple comparisons of efficiency, in the present state of knowledge we can only say that the order of efficiency depends on the kind of probability system used. This means that for some systems it will be in one order and for others in another order.

The problem of negative estimates of variance was discussed. Some meaningful interpretations were given for the estimators T3 and T6. For others, such kinds of interpretations could not be found, in view of the refractory nature of the inequalities emerging from their respective quadratic forms.

The problem of optimum probabilities was discussed for a simple case and a solution given by way of an example. Also, the problem of optimum probabilities in multivariate sampling, taking into account costs, was formulated and the necessary equations given.
8.2 Conclusions

Some of the statements in section 8.1 are in the form of conclusions. Before we proceed to other conclusions it should be noted that in any study dealing with the problems of unequal probability sampling it is of the utmost importance to define the probability system in sufficient detail, for the following reasons:

(a) It is the key to the problem of relationships between the various classes of estimators.

(b) It determines whether or not one kind of estimator has a larger variance than another, except for the one comparison noted above.

Regarding the estimators, the most general estimator T7 includes all others as special cases. Neither T5 nor T7 yields any distinct special forms. Also, the order of generality decreases as we proceed downwards from T7 to T1 in the following sense. From the condition of unbiasedness of T7 we can derive all other estimators. Next, from the condition of unbiasedness of T6 we can derive all others except T7, and so on down to T1, one special form of which is the best linear unbiased estimator in the case of equal probabilities of selection.

A new special unbiased estimator, T6, belonging to class six, was found. Of those known, it is the simplest to compute. Unfortunately it is less efficient than T3. However, when the probabilities of selection at each draw are not too unequal it will be almost as efficient as T3.

The simulated minimum variance estimators, like the other unbiased estimators with few exceptions, can have negative estimates of variance. Regarding the estimates of variance for T3 and T6, even before they are computed we can determine whether the outcome will be positive or negative, simply by computing the probability, P_s or P_s(o), as the case may be, and determining whether P_s > 1/NCn or P_s(o) > 1/[N(N-1)···(N-n+1)]. We note that P_s(o) is far easier to compute than P_s, which involves the summation of n! products. When there is an excess the estimate in both cases will be negative; otherwise it will be positive. The most probable samples have negative estimates of variance. In the situation where the probabilities of the appearance of each of the possible samples are optimum or near-optimum, negative estimates of variance appear to be associated with estimates which are closer to the true population value under study. The first example discussed shows this. When there is an "unwise" or misconceived choice of selection probabilities anything can happen; this is also evident on purely theoretical grounds.
No meaningful interpretation of negative estimates of variance of T2, T4, T4', T4'', in all their special forms, appears to be possible in the present state of knowledge. The cause for this has already been stated above.

The exploration of all remaining special fields, already indicated at length in Chapter I, encompassed by the present unified theory will certainly be of interest. What Sen (1952) and Durbin (1953) were doing when they attempted comparisons of efficiency when sampling with, as against sampling without, replacement can be termed inter-field comparisons. Many more such comparisons between estimators in the different fields (and of course within each field) are possible. The possibility of interesting and fruitful conclusions arising from such studies cannot be excluded.
fll'e note again thf.:.t the axioms of s1mple formation, posited here,
can produce all the pcssJ.ble estimators, in each field, iil Ell their
ramifications.
The identification of mere probability
s~rstems
which yi'31d posi-
tive estimates of variance [o'r all pcss::'blo sDmples, when
used, should be of interest since
•
i~
'I'~
or
T~
...
is
appears that only these t1'10
special estimators have this possibility.
On a more practical level, with all needed information available,
it is of importance to::ompare the efficiencies af the seven simulated
minimum variance
estima'~ors
of this special theory.
With the use of
electronic computing machines, the solution of the set of linear
117
sirnultaneous equations for the simulated weights specific to each
estimator, and other calculations involved (including the eva1u8.ti.on
of probabilities) should not be difficult.
We have only considered the simplest approximate relationship x = cy to obtain the simulated minimum variance estimators. Other functional forms, either algebraic or even transcendental, can be used, provided they are appropriate. Again there is no problem of computation for each estimator if electronic machines are used. Also, for any probability system considered above, the empirical distribution of the estimated variance would be available, and it would be of interest to see what kinds of functional relationships result in distributions with wide spread or with narrow spread. Equally, if we know the simulated weights, we would be in a position to obtain the distribution of the estimated variance. Such studies would throw light on the appropriateness of the functional relationship used.
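As a minimal sketch of checking such a relationship, and not a procedure the thesis prescribes, the constant c in x = cy can be fitted by least squares through the origin and the spread of the residuals examined; the data points below are invented purely for illustration.

```python
# Invented (x, y) observations for illustration only.
y = [10.0, 20.0, 30.0, 40.0]
x = [ 2.1,  3.9,  6.2,  7.8]

# Least-squares slope through the origin: c = sum(x*y) / sum(y*y).
c = sum(xi * yi for xi, yi in zip(x, y)) / sum(yi * yi for yi in y)

# The residual spread indicates how appropriate the form x = cy is:
# a wide spread suggests another functional form should be tried.
residuals = [xi - c * yi for xi, yi in zip(x, y)]
spread = (sum(r * r for r in residuals) / len(residuals)) ** 0.5
```

A different functional form would be fitted and judged in the same way, by comparing residual spreads.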
LITERATURE CITED
Cansado, E. 1955. Sampling without replacement from finite populations. Proceedings of the International Statistical Institute, Rio de Janeiro (1955) (to appear in Trabajos de Estadistica).
Carver, H. C. 1930. Fundamentals of the theory of sampling. Ann. Math. Stat. 1(1): 101-121.
Church, A. E. R. 1926. On the means and squared standard deviations of small samples from any population. Biometrika 18: 321-394.
Das, A. C. 1951. On two-phase sampling and sampling with varying probabilities. Bull. Inter. Stat. Inst. 33(2): 105-112.
Des Raj. 1954. Ratio estimation in sampling with equal and unequal probabilities. Jour. Ind. Soc. Agric. Stat. 6(2): 127-138.
Des Raj. 1956. Some estimators in sampling with varying probabilities without replacement. Jour. Amer. Stat. Assoc. 51: 269-284.
Durbin, J. 1953. Some results in sampling theory when the units are selected with unequal probabilities. Jour. Roy. Stat. Soc., B 15: 262-269.
Edgeworth, F. Y. 1918. On the value of a mean as calculated from a sample. Jour. Roy. Stat. Soc. 81: 624-632.
Godambe, V. P. 1955. A unified theory of sampling from finite populations. Jour. Roy. Stat. Soc., B 17(2): 269-278.
Hansen, M. H., and Hurwitz, W. N. 1943. On the theory of sampling from finite populations. Ann. Math. Stat. 14(4): 333-362.
Hansen, M. H., and Hurwitz, W. N. 1949. On the determination of optimum probabilities in sampling. Ann. Math. Stat. 20(3): 426-432.
Hardy, G. H., Littlewood, J. E., and Polya, G. 1934. Inequalities. Cambridge University Press, Cambridge. pp. 143-145.
Horvitz, D. G., and Thompson, D. J. 1951. A generalization of sampling without replacement from a finite universe. Abstract. Ann. Math. Stat. 22(2): 315.
Horvitz, D. G., and Thompson, D. J. 1952. A generalization of sampling without replacement from a finite universe. Jour. Amer. Stat. Assoc. 47: 663-685.
Isserlis, L. 1916. On the conditions under which the "probable errors" of frequency distributions have a real significance. Proc. Roy. Soc. Lond. A, 21: 23-41.
Isserlis, L. 1918. On the value of a mean as calculated from a sample. Jour. Roy. Stat. Soc. 81: 75-81.
Lahiri, D. B. 1951. A method of sample selection providing unbiased ratio estimates. Bull. Inter. Stat. Inst. 33(2): 133-140.
Midzuno, H. 1950. An outline of the theory of sampling systems. Ann. Inst. Stat. Math. (Japan) 1(2): 149-156.
Midzuno, H. 1952. On the sampling system with probability proportionate to sum of sizes. Ann. Inst. Stat. Math. (Japan) 3(2): 99-107.
Mortara, G. 1917*. Elementi di statistica. Appunti sulle lezioni di statistica metodologica dettate nel R. Istituto Superiore di Studi Commerciali di Roma. Rome. p. 356.
Narain, R. D. 1951. On sampling without replacement with varying probabilities. Jour. Ind. Soc. Agric. Stat. 3(2): 169-174.
Neyman, J. 1934. On two different aspects of the representative method: the method of stratified sampling and the method of purposive selection. Jour. Roy. Stat. Soc. 97: 558-606.
Neyman, J. 1952. Lectures and conferences on mathematical statistics and probability. U. S. Dept. Agric., Washington, D. C. pp. 221-228.
Sen, A. R. 1952. Further developments of the theory and application of primary sampling units with special reference to the North Carolina agricultural population. Ph.D. thesis, North Carolina State College, Raleigh.
Sen, A. R. 1953. On the estimate of the variance in sampling with varying probabilities. Jour. Ind. Soc. Agric. Stat. 5(2): 119-127.
Sen, A. R., Anderson, R. L., and Finkner, A. L. 1954. A comparison of stratified two-stage sampling systems. Jour. Amer. Stat. Assoc. 49: 539-558.
Singh, D. 1954. On efficiency of the sampling with varying probabilities without replacement. Jour. Ind. Soc. Agric. Stat. 6(1): 48-57.
Splawa-Neyman, J. 1925. Contributions to the theory of small samples drawn from a finite population. Biometrika 17: 472-479.
Sukhatme, P. V. 1953. Sampling theory of surveys with applications. The Indian Society of Agricultural Statistics, New Delhi, India, and The Iowa State College Press, Ames, Iowa. pp. 60-72.
Sukhatme, P. V., and Narain, R. D. 1952. Sampling with replacement. Jour. Ind. Soc. Agric. Stat. 4(1): 42-49.
Thompson, D. J. 1952. A theory of sampling finite universes with arbitrary probabilities. Unpublished Ph.D. thesis, Iowa State College, Ames, Iowa.
Tschuprow, A. 1918*. Zur Theorie der Stabilität statistischer Reihen. Skand. Akt. 1: 219.
Tschuprow, A. 1923. On the mathematical expectation of the moments of frequency distributions in the case of correlated observations. Metron 2(3): 461-493; 2(4): 646-683.
Yates, F., and Grundy, P. M. 1953. Selection without replacement from within strata with probability proportional to size. Jour. Roy. Stat. Soc., B 15(2): 253-261.
The two references marked with an asterisk are, at present, not accessible to me. As mentioned in Chapter I, they were given by Tschuprow himself in his 1923 paper.
INSTITUTE OF STATISTICS
NORTH CAROLINA STATE COLLEGE
(Mimeo Series available for distribution at cost)
265. Eicker, Friedhelm. Consistency of parameter-estimates in a linear time-series model. October, 1960.
266. Eicker, Friedhelm. A necessary and sufficient condition for consistency of the LS estimates in linear regression. October, 1960.
267. Smith, W. L. On some general renewal theorems for nonidentically distributed variables. October, 1960.
268. Duncan, D. B. Bayes rules for a common multiple comparisons problem and related Student-t problems. November,
1960.
269. Bose, R. C. Theorems in the additive theory of numbers. November, 1960.
270. Cooper, Dale and D. D. Mason. Available soil moisture as a stochastic process. December, 1960.
271. Eicker, Friedhelm. Central limit theorem and consistency in linear regression. December, 1960.
272. Rigney, Jackson A. The cooperative organization in wildlife statistics. Presented at the 14th Annual Meeting, Southeastern Association of Game and Fish Commissioners, Biloxi, Mississippi, October 23-26, 1960. Published in Mimeo Series,
January, 1961.
273. Schutzenberger, M. P. On the definition of a certain class of automata. January, 1961.
274. Roy, S. N. and J. N. Srivastava. Inference on treatment effects and design of experiments in relation to such inferences. January, 1961.
275. Ray-Chaudhuri, D. K. An algorithm for a minimum cover of an abstract complex. February, 1961.
276. Lehman, E. H., Jr. and R. L. Anderson. Estimation of the scale parameter in the Weibull distribution using samples censored by time and by number of failures. March, 1961.
277. Hotelling, Harold. The behavior of some standard statistical tests under non-standard conditions. February, 1961.
278. Foata, Dominique. On the construction of Bose-Chaudhuri matrices with help of Abelian group characters. February, 1961.
279. Eicker, Friedhelm. Central limit theorem for sums over sets of random variables. February, 1961.
280. Bland, R. P. A minimum average risk solution for the problem of choosing the largest mean. March, 1961.
281. Williams, J. S., S. N. Roy and C. C. Cockerham. An evaluation of the worth of some selected indices. May, 1961.
282. Roy, S. N. and R. Gnanadesikan. Equality of two dispersion matrices against alternatives of intermediate specificity.
April, 1961.
283. Schutzenberger, M. P. On the recurrence of patterns. April, 1961.
284. Bose, R. C. and I. M. Chakravarti. A coding problem arising in the transmission of numerical data. April, 1961.
285. Patel, M. S. Investigations on factorial designs. May, 1961.
286. Bishir, J. W. Two problems in the theory of stochastic branching processes. May, 1961.
287. Konsler, T. R. A quantitative analysis of the growth and regrowth of a forage crop. May, 1961.
288. Zaki, R. M. and R. L. Anderson. Applications of linear programming techniques to some problems of production planning over time. May, 1961.
289. Schutzenberger, M. P. A remark on finite transducers. June, 1961.
290. Schutzenberger, M. P. On the equation a^(2+n) = b^(2+m)c^(2+p) in a free group. June, 1961.
291. Schutzenberger, M. P. On a special class of recurrent events. June, 1961.
292. Bhattacharya, P. K. Some properties of the least square estimator in regression analysis when the 'independent' variables
are stochastic. June, 1961.
293. Murthy, V. K. On the general renewal process. June, 1961.
294. Ray-Chaudhuri, D. K. Application of the geometry of quadrics for constructing PBIB designs. June, 1961.
295. Bose, R. C. Ternary error correcting codes and fractionally replicated designs. May, 1961.
296. Koop, J. C. Contributions to the general theory of sampling finite populations without replacement and with unequal
probabilities. September, 1961.
297. Foradori, G. T. Some non-response sampling theory for two stage designs. Ph.D. Thesis. November, 1961.
298. Mallios, W. S. Some aspects of linear regression systems. Ph.D. Thesis. November, 1961.
299. Taeuber, R. C. On sampling with replacement: an axiomatic approach. Ph.D. Thesis. November, 1961.
300. Gross, A. J. On the construction of burst error correcting codes. August, 1961.
301. Srivastava, J. N. Contribution to the construction and analysis of designs. August, 1961.
302. Hoeffding, Wassily. The strong laws of large numbers for u-statistics. August, 1961.
303. Roy, S. N. Some recent results in normal multivariate confidence bounds. August, 1961.
304. Roy, S. N. Some remarks on normal multivariate analysis of variance. August, 1961.
305. Smith, W. L. A necessary and sufficient condition for the convergence of the renewal density. August, 1961.
306. Smith, W. L. A note on characteristic functions which vanish identically in an interval. September, 1961.
307. Fukushima, Kozo. A comparison of sequential tests for the Poisson parameter. September, 1961.
308. Hall, W. J. Some sequential analogs of Stein's two-stage test. September, 1961.
309. Bhattacharya, P. K. Use of concomitant measurements in the design and analysis of experiments. November, 1961.