Finkner, A. L. Further investigations on the theory and application of sampling for scarcity items.

Gertrude M. Cox

FURTHER INVESTIGATION ON THE THEORY AND
APPLICATION OF SAMPLING FOR SCARCITY ITEMS

By
A. L. Finkner

Institute of Statistics
Mimeo Series # 30
For limited distribution
TABLE OF CONTENTS

CHAPTER I
THE PROBLEM

Section
1.1 Introduction
1.2 Review of Literature
1.3 Notation

CHAPTER II
IDENTITY OF THE SAMPLING UNITS KNOWN IN ADVANCE
OF THE FIELD SAMPLING PROCESS

Section
2.1 Relative Efficiency of Samples from the Selective and Original
    Populations for Various Methods of Sampling and Estimation
    2.1.1 Previous Results
    2.1.2 General Theory
    2.1.3 Simple Random Sample with a Mean per Sampling Unit Estimate
    2.1.4 Simple Random Sample with a Ratio Estimate
    2.1.5 Simple Random Sample with a Regression Estimate
    2.1.6 Stratified Random Sample with a Mean per Sampling Unit
          Estimate and Proportional Allocation
    2.1.7 Stratified Random Sample with a Mean per Sampling Unit
          Estimate and Optimum Allocation
2.2 Cost Functions

CHAPTER III
IDENTITY OF THE SAMPLING UNITS NOT KNOWN IN ADVANCE
OF THE FIELD SAMPLING PROCESS

Section
3.1 Introduction
3.2 General Theory
    3.2.1 Simple Random Sample with a Mean per Sampling Unit Estimate
    3.2.2 Stratified Random Sample with a Mean per Sampling Unit Estimate
3.3 Empirical Investigation of E(1/n')
3.4 Relative Efficiency of the Selective Sample Compared with
    Samples from the Selective and Original Populations, Using
    the Same Sampling Rates
    3.4.1 Simple Random Sample with a Mean per Sampling Unit Estimate
    3.4.2 Simple Random Sample with a Ratio Estimate
    3.4.3 Simple Random Sample with a Regression Estimate
    3.4.4 Stratified Random Sample with a Mean per Sampling Unit Estimate
3.5 Cost Functions

CHAPTER IV
AREA SAMPLING

Section
4.1 Investigation of the Geographic Distribution of Some Scarcity Items
4.2 Sub-sampling
    4.2.1 Introduction and Notation
    4.2.2 Unbiased Estimates Derived from PSU Totals
    4.2.3 Other Estimates
4.3 Pre-listing

CHAPTER V
COMBINATIONS OF MAIL, LIST AND AREA SAMPLING

Section
5.1 General Considerations
5.2 An Example Using Commercial Peach Orchards

CHAPTER VI
SUMMARY

Section
6.1 Summary of Results
6.2 Conclusions
6.3 Suggestions for Further Research

REFERENCES
LIST OF TABLES

Table 1   Table of Notation.

Table 2   Hypothetical Values of p, d, r and the Sampling Rate to be
          Substituted into the Equation for C/Co.

Table 3   Values of the Relative Cost of Sampling the Selective
          Population to Sampling the Original Population when
          Different Values for the Variables d, p, r and the Sampling
          Rate are Considered.

Table 4   Characteristics of Selective Samples of n Drawn Without
          Replacement from a Population of N Elements in Which
          There are pN Eligibles and qN Non-eligibles.

Table 5   Estimates of E(1/n') and its Standard Error from a
          Number of Varying Sized Samples Drawn at Random from a
          Hypothetical Population and a Comparison of the Estimate
          of E(1/n') with 1/pn.

Table 6   Means and Variances of Several Populations of Scarcity
          Items Located in Two Geographically Concentrated Areas.

Table 7   Observed and Computed Frequency Distributions of
          Commercial Peach Orchards in the Sandhills Area of
          North Carolina.

Table 8   Observed and Computed Frequency Distributions of Fruit
          Orchards in Allegan County, Michigan.

Table 9   Observed and Computed Frequency Distributions of Apple
          Orchards in Allegan County, Michigan.

Table 10  Observed and Computed Frequency Distributions of Peach
          Orchards in Allegan County, Michigan.

Table 11  Observed and Computed Frequency Distributions of Pear
          Orchards in Allegan County, Michigan.

Table 12  Observed and Computed Frequency Distributions of Grape
          Vineyards in Allegan County, Michigan.

Table 13  Notation for Sub-sampling Comparisons.

Table 14  Number of Orchards and Segments per Stratum for
          Twelve Different Methods of Sampling.

Table 15  Size of Sample Required to Attain Ten Per Cent
          Accuracy at the 95 Per Cent Level in Estimating
          Expected Production of Peaches in the Sandhills
          Area by Various Methods of Combining Stratified
          Sampling, Mail Questionnaire and Large Farm Lists.
LIST OF FIGURES

Figure 1  Graph of the Influence of Each Variable on the Ratio of
          C/Co Averaged Over All Other Variables.

Figure 2  Influence of the "Life Expectancy" of a Complete
          Enumeration on the Relative Costs (C/Co) of Two
          Procedures for Given Values of the Proportion of
          Eligibles, the Sampling Rate, and the Relative
          Costs of Enumerating an Eligible and a Non-eligible.

Figure 3  Influence of the Relative Cost of Enumerating an
          Eligible and a Non-eligible on the Relative Costs
          (C/Co) of Two Procedures for Given Values of the
          "Life Expectancy" of a Complete Enumeration, the
          Sampling Rate and the Proportion of Eligibles.

Figure 4  Influence of the Proportion of Eligibles on the
          Relative Costs (C/Co) of Two Procedures for Given
          Values of the "Life Expectancy" of a Complete
          Enumeration, the Sampling Rate and the Relative
          Costs of Enumerating an Eligible and a Non-eligible.

Figure 5  Influence of the Sampling Rate on the Relative Costs
          (C/Co) of Two Procedures for Given Values of the
          "Life Expectancy" of a Complete Enumeration, the
          Proportion of Eligibles and the Relative Costs of
          Enumerating an Eligible and a Non-eligible.
CHAPTER I
THE PROBLEM

1.1 Introduction

Sampling procedures and sample surveys have been classified in a number of different ways depending upon the basis for classification. For example, a sample may be classified on the basis of the method of selection of the sampling unit (purposive, random, systematic); the restrictions placed upon the method of selection (simple, stratified, sub-sampling); the basic material available (list, area, quota); the method of obtaining the information from the respondent (mail, telephone, personal visitation); or the type of population sampled, to name a few. Still another, which is used by the Bureau of Agricultural Economics, is the classification of samples into general purpose and special purpose samples. A general purpose survey, as defined by the B.A.E., usually involves a large population and is usually concerned with the estimation of universe means or totals of a relatively large number of items from the sample data. Of course in such a survey, not all items will be estimated with equal precision. The items usually estimated more accurately are those which are approximately symmetrically distributed and occur with a relatively high frequency in the population. Sarle [12] states that a general purpose sample is usually most satisfactory with three types of farms, "General", "Livestock" and "All-Other Crop" farms, and in some states, "Dairy" and "Poultry" farms. Commercial fruit and nut, vegetable, poultry, dairy, turkey and other items are not represented adequately in a general purpose survey. The writer has designated the latter class of items as scarcity items. It is usually these items which require "special purpose" sampling. At the present time most "special purpose" surveys involve the use of a ...
The frequency of occurrence and perhaps to a lesser extent the degree of concentration of an agricultural item can be considered to be continuous depending upon the population. That is, for a number of given populations, 1/ the percentage of eligibles can vary from 0 to 100%. There is therefore no clear cut line of demarcation for determining a scarcity item and there is no simple method for determining when a special purpose sample should be designed.

What criteria should therefore be established for classifying an item as a scarcity item requiring a special purpose survey? Sarle [12] states,

"Whether special sampling is needed depends upon the characteristics of the population being sampled. Relatively small populations of a few hundred or less, populations that have a highly skewed distribution where a small proportion of the population accounts for a high proportion of the total production or volume, or where different segments (either by size of operation or geographic location) react differently to economic and other forces affecting production, usually require 'special purpose' sampling of some kind rather than 'general purpose' sampling."

Translating these general rules for classification into a clear cut objective method for determining the type of sample to use is highly desirable. Any objective method of discrimination between alternative sampling procedures must be based on the general principle of selecting the sampling procedure that will give the greatest amount of information for a given cost, or for equal accuracy, will cost the least.

1/ For example consider the per cent of farms growing tobacco for harvest in three different populations according to the 1945 Census of Agriculture. In the United States, 0%; in North Carolina, 52% and in Wilson County, N.C., 9~.
It is apparent that in order to satisfy the conditions of the general principle, it is necessary to
(1) determine the relative efficiencies of alternative sampling procedures, and
(2) set up realistic cost functions for these alternative procedures.

The relative efficiencies and costs can be approached theoretically, empirically or both. In this study, the problem will be attacked from the theoretical angle first and the theoretical results tested where actual data are available.

As mentioned previously, sampling procedures can be classified on the basis of the materials available for designing the sample. The investigations of the problem of scarcity sampling fall rather naturally into a study where lists are available and where area sampling materials are available. With respect to list sampling, two cases are involved.
Case I. The eligibles in the original population are identifiable before the sampling unit is actually contacted. The non-eligibles can thus be eliminated during the planning of the survey and the original population reduced to a sub-population consisting of eligibles only. This sub-population will be referred to throughout this thesis as the selective population. A sample of given size or for a given sampling rate can then be compared for the two populations (original and selective) in order to ascertain the gain in accuracy due to the elimination of the non-eligibles. The number of eligibles in a sample from the selective population is not a random variable.

It might appear at first glance that Case I is an unrealistic situation and that if you knew the eligibles in advance you most assuredly would sample only from them. However when relative costs are considered, the problem becomes a realistic one. For example, if the gain in efficiency of a sample from the selective population over a sample from the original population is greater than the increased cost necessary to determine the eligibles in the original population, then it is worthwhile to establish the identity of the sampling units in advance and sample from the selective population.
Case II. The problem can be considered from the standpoint that the sampling units are not identifiable until they are contacted by the sampling process. In these cases a gain in efficiency is realized by eliminating the non-eligibles from the sample and estimating the mean of the selective population based only on the eligibles in the sample. Such a sample will be designated as a selective sample. Hence the size of a selective sample is a random variable and must be considered as such in the estimation of the mean and variance. Case II can be applied when the survey covers more than one item while Case I must necessarily be limited to one item.

Area sampling poses problems somewhat different from those confronted in the list sample, and in general is more complex. However the basic principle of greater accuracy for a given cost still applies. These situations will be considered in detail later.
1.2 Review of Literature

Very little information on the specific subject of scarcity sampling is available in the literature. Houseman [7] provided the stimulus for this study. In a survey conducted in Florida in which the section was the sampling unit, Houseman noted that a relatively large proportion of the sections in the sample did not contain farms. This observation led him to consider the gain in statistical efficiency which would result, on the average, from the elimination of those sections which contained no farms from the population. Houseman's investigation fell into the category of Case I and he showed the theoretical gain expected when a simple random sample, a mean per sampling unit expansion 1/ and the same sampling rate in the two populations is considered. Houseman's work has been extended in this study to other sampling procedures and methods of estimation.
The variance of the mean of a simple random sample with a mean per sampling unit estimate is given by $\sigma^2/n$ if the finite population correction factor is ignored. In Case II, where the number of eligibles in the sample is a random variable, the variance of the mean would be given by $\sigma^2 E(1/n')$ where $n'$ is the size of the selective sample. Stephan [13] has derived the expected values of a positive hyper-geometric variable in sampling without replacement from a finite population. Stephan's work also has been extended in this study to other sampling procedures and methods of estimation.
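The quantity $E(1/n')$ can be evaluated exactly for small cases. The following is a minimal Python sketch, assuming the selective sample comes from a simple random sample of n units drawn without replacement from N units of which M = pN are eligible, and that samples containing no eligibles are excluded (the "positive" hypergeometric case). The function name and the numerical values are illustrative assumptions, not taken from the thesis.

```python
from fractions import Fraction
from math import comb


def expected_inverse_eligibles(N: int, M: int, n: int) -> Fraction:
    """Exact E(1/n') where n' is the number of eligibles in a simple random
    sample of n from N units (M eligible), conditional on n' >= 1."""
    lo = max(1, n - (N - M))   # smallest positive count that can occur
    hi = min(n, M)             # largest count that can occur
    # Unnormalised hypergeometric weights for each possible value of n'.
    weights = {k: comb(M, k) * comb(N - M, n - k) for k in range(lo, hi + 1)}
    total = sum(weights.values())
    return sum(Fraction(w, k) for k, w in weights.items()) / total


# Hypothetical population: N = 1000 units, p = 0.10, and a 5 per cent sample.
N, M, n = 1000, 100, 50
p = M / N
exact = float(expected_inverse_eligibles(N, M, n))
print(f"E(1/n') = {exact:.4f}   compared with 1/(pn) = {1 / (p * n):.4f}")
```

Because $n'$ varies from sample to sample, $E(1/n')$ generally differs from $1/pn$, which is why the two are compared here.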
1/ A number of estimates of the population mean or total can be made from sample data. One of the simplest methods of expansion is a mean per sampling unit estimate. To utilize this method in estimating the population total, the mean of the sampling units in the sample is multiplied by the total number of sampling units in the population.
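As a small illustration of the mean per sampling unit expansion just described, the sketch below multiplies a sample mean by the number of sampling units in the population. The function name and the sample values are hypothetical.

```python
def mean_per_unit_total(sample_values, population_size):
    """Mean per sampling unit estimate of a population total: t = N * ybar."""
    n = len(sample_values)
    if n == 0:
        raise ValueError("sample must contain at least one sampling unit")
    return population_size * sum(sample_values) / n


# Hypothetical sample of 5 units from a population of N = 200 units;
# non-eligible units are recorded as zero.
sample = [0, 0, 12, 0, 8]
print(mean_per_unit_total(sample, 200))
```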
Holmes and Handy [6] have reported procedures for estimating commercial truck crop production in areas surrounding large metropolitan districts. The author [3] has shown that area sampling for commercial peach production in the Sandhills area of North Carolina does not give acceptable accuracy unless basic variances are known. General purpose area sample surveys made by the B.A.E. in 1947 and 1948 indicated that scarcity items were estimated with a lower degree of accuracy than items widely distributed. In North Carolina, a general purpose area survey of 1,000 farms (.3 per cent sample) to establish buying and selling habits of North Carolina farmers disclosed that only two peach growers, four apple growers, six turkey growers and six pecan growers happened to fall in the sample [10]. Each of these categories was supplemented with a special purpose survey to obtain more observations.
1.3 Notation

Standard notation has been used in so far as possible throughout this paper. For purposes of ready reference these notations are listed in tabular form in Table 1.

Table 1. Table of Notation

                                                      Symbol Associated with
Definition                                    Original        Selective       Selective
                                              population      population      sample

Number of sampling units in population        $N$             $pN$            $N$
Proportion of eligibles in population         $p$             $1$             $p$
Proportion of non-eligibles in population     $q$             $0$             $q$
Population mean                               $\mu_o$         $\mu$           $\mu$
Variance of an individual observation         $\sigma_o^2$ 1/ $\sigma^2$ 2/   $\sigma^2$
Value of an observation (= 0 for
  non-eligibles)                              $y_o$           $y$             $y$
Number of sampling units in sample when
  sampling rate is the same                   $n$             $pn$            $n'$
Sample mean                                   $\bar y_o$      $\bar y$        $\bar y'$
Estimated population total                    $t_o$           $t$             $t'$
Population total                              $T$             $T$             $T$
Variance of the sample mean                   $V_{\bar y_o}$  $V_{\bar y}$    $V_{\bar y'}$
Variance of the estimated population total    $V_{t_o}$       $V_t$           $V_{t'}$
(Coefficient of Variation)$^2$ of an
  individual observation
  (e.g., $\sigma_o^2/\mu_o^2$)                $(CV)_o^2$      $(CV)^2$        $(CV)^2$
Estimate of the variance of the sample mean   $s_{\bar y_o}^2$ $s_{\bar y}^2$ $s_{\bar y'}^2$
Estimate of the variance of the estimated
  population total                            $s_{t_o}^2$     $s_t^2$         $s_{t'}^2$
Population correlation coefficient            $\rho_o$        $\rho$          $\rho$
Cost of enumerating a sample                  $C_o$           $C$ 3/          $C'$

Other Notation

Sampling rate = $n/N$
Cost of enumerating an eligible = $C_p$
Cost of contacting a non-eligible = $C_q$
The ratio of $C_p$ to $C_q$ ($C_p/C_q$) = $r$
The life expectancy of a complete enumeration in terms of the number of periodic surveys for which basic information will remain valid = $d$
The relative efficiency of one method of sampling (1) with respect to another (2) is defined as $V_{t1}/V_{t2}$ = R.E. 4/

1/ Some writers prefer to define $\sigma_o^2$ in a finite population as $\sum_{i=1}^{N}(y_i-\mu_o)^2/N$ while others use $\sigma_o^2 = \sum_{i=1}^{N}(y_i-\mu_o)^2/(N-1)$. In the estimation of $V_{\bar y_o}$, the appropriate finite population correction factor is applied in each case so that the two definitions give identical results. This is illustrated for simple random sampling and a mean per sampling unit estimate.

2/ ... while in the discussion in Chapter III it is defined as ... and similarly for ...

3/ Includes the cost of identifying the eligibles.

4/ According to Cochran [2], the relative efficiency of method 2 to method 1 is defined as the inverse ratio of their variances. Usually equal sized samples are assumed for both methods although this assumption is not necessary. If the finite population correction factor is negligible, and equal sample size and simple random sampling are assumed, R.E. gives the relative size of sample that must be taken to give the same precision for the estimated total. Suppose that the sample for method 1 is increased in size from $n$ to $hn$. Then the variance of the total of method 1 becomes $N^2\sigma^2/hn$ or $V_{t1}/h$. Now if $h$ is chosen so that $V_{t1}/h = V_{t2}$, then $h$ = R.E.
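The interpretation of R.E. as a sample size multiplier can be checked numerically. The sketch below is illustrative only: it assumes simple random sampling with the finite population correction ignored, and the variances and sample sizes are hypothetical values chosen for the example.

```python
def variance_of_total(N, sigma2, n):
    """Variance of an expanded total, N^2 * sigma^2 / n (fpc ignored)."""
    return N**2 * sigma2 / n


def relative_efficiency(v_t1, v_t2):
    """R.E. = V_t1 / V_t2, the ratio of the two variances of the totals."""
    return v_t1 / v_t2


N, n = 1000, 100
v_t1 = variance_of_total(N, 40.0, n)   # method 1 (hypothetical unit variance 40)
v_t2 = 250000.0                        # method 2 (hypothetical variance of total)

h = relative_efficiency(v_t1, v_t2)
# Enlarging the method 1 sample from n to h*n brings its variance down to V_t2.
v_t1_enlarged = variance_of_total(N, 40.0, h * n)
print(h, v_t1_enlarged)
```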
CHAPTER II
IDENTITY OF THE SAMPLING UNITS KNOWN IN ADVANCE
OF THE FIELD SAMPLING PROCESS

2.1 Relative Efficiency of Samples from the Selective and Original Populations for Various Methods of Sampling and Estimation

2.1.1 Previous Results: The relative efficiency of a sample total $t$ from the selective population with respect to a sample total $t_o$ from the original population when the sampling rate is the same and the finite population correction factor is ignored, is given by Houseman [7] as

$$\text{R.E.} = \frac{(1-q)(CV)_o^2}{(1-q)(CV)_o^2 - q}. \tag{1}$$

A further relationship established by Houseman is given as

$$(CV)^2 + 1 = p\left[(CV)_o^2 + 1\right]. \tag{2}$$

Since Houseman did not present derivations of these results, they have been developed in this thesis so that the procedure can be applied in obtaining similar results for other methods of sampling and estimation.
2.1.2 General Theory: Let $\mu_o$ and $\sigma_o^2$ be the mean and variance of an individual observation in the original population and $\mu$ and $\sigma^2$ be the mean and variance of an individual observation in the selective population. As footnoted previously, $\sigma^2$ is defined as $\sum_{i=1}^{pN}(y_i-\mu)^2/pN$ in order to simplify algebraic manipulations. Then the basic relationship between $\sigma_o^2$ and $\sigma^2$ is given by

$$p\sigma^2 = \sigma_o^2 - \frac{q}{p}\mu_o^2. \tag{3}$$

Proof:

$$\sigma_o^2 = \frac{\sum_{i=1}^{N}(y_i-\mu_o)^2}{N} = \frac{\sum y^2 - (\sum y)^2/N}{N}, \qquad \sigma^2 = \frac{\sum_{i=1}^{pN}(y_i-\mu)^2}{pN} = \frac{\sum y^2 - (\sum y)^2/pN}{pN}.$$

Since the non-eligibles contribute only zeros, $\sum y$ and $\sum y^2$ are the same in both populations, and $\sum y = N\mu_o$. Hence

$$pN\sigma^2 = N\sigma_o^2 + \frac{(\sum y)^2}{N} - \frac{(\sum y)^2}{pN} = N\sigma_o^2 + N\mu_o^2\left(1 - \frac{1}{p}\right)$$

and

$$p\sigma^2 = \sigma_o^2 - \frac{q}{p}\mu_o^2.$$
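Relationships (2) and (3) are easy to confirm on a small constructed population. The following sketch uses hypothetical values (four eligibles and twelve non-eligibles, so p = 0.25) and the divisor-N definition of variance adopted above; the helper name is an assumption made for the illustration.

```python
def population_moments(values):
    """Mean and variance of a finite population, variance with divisor N."""
    m = len(values)
    mean = sum(values) / m
    var = sum((y - mean) ** 2 for y in values) / m
    return mean, var


eligibles = [12.0, 30.0, 7.0, 51.0]      # selective population (hypothetical)
original = eligibles + [0.0] * 12        # original population: zeros added

p = len(eligibles) / len(original)
q = 1.0 - p
mu, sigma2 = population_moments(eligibles)
mu_o, sigma2_o = population_moments(original)

# Equation (3): p * sigma^2 = sigma_o^2 - (q/p) * mu_o^2
lhs_3 = p * sigma2
rhs_3 = sigma2_o - (q / p) * mu_o**2

# Equation (2): (CV)^2 + 1 = p * [(CV)_o^2 + 1]
lhs_2 = sigma2 / mu**2 + 1.0
rhs_2 = p * (sigma2_o / mu_o**2 + 1.0)

print(lhs_3, rhs_3, lhs_2, rhs_2)
```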
2.1.3 Simple Random Sample with a Mean per Sampling Unit Estimate: (i) The relative efficiency of a sample total $t$ with respect to $t_o$, ignoring the finite population correction factor 1/ and considering the same sampling rate, can be written as

$$\text{R.E.} = \frac{N^2\sigma_o^2/n}{(pN)^2\sigma^2/pn} = \frac{\sigma_o^2}{p\sigma^2}.$$

Substituting the value for $p\sigma^2$ from (3),

$$\text{R.E.} = \frac{\sigma_o^2}{\sigma_o^2 - \dfrac{q}{p}\mu_o^2} = \frac{(CV)_o^2}{(CV)_o^2 - \dfrac{q}{p}} = \frac{(1-q)(CV)_o^2}{(1-q)(CV)_o^2 - q},$$

which is the result (1) given by Houseman. This may be rewritten as follows:

$$\text{R.E.} = \frac{p(CV)_o^2}{p(CV)_o^2 - q} = \frac{1}{1 - \dfrac{q}{p(CV)_o^2}}. \tag{4}$$

Equation (2) may also be rewritten as

$$\frac{(CV)^2}{p(CV)_o^2} + \frac{q}{p(CV)_o^2} = 1. \tag{5}$$

Since $0 < \dfrac{q}{p(CV)_o^2} < 1$, it is always more efficient to sample from the selective population when $p < 1$.

1/ The effect of including the finite population correction factor is to multiply the relative efficiency (R.E.) by $\dfrac{pN-1}{pN-p}$. Since $p$ is less than 1, R.E. will be slightly overestimated. However the difference between 1 and the fraction, $\dfrac{pN-1}{pN-p}$, will be negligible for most populations to be sampled.
(ii) A further gain in efficiency from sampling the selective population is realized if the sample sizes are the same in samples from both populations. Further, this gain will be of the order of $1/p$ since the finite population correction factor is ignored. The relative efficiency of a sample total $t$ with respect to a sample total $t_o$ when the sample sizes are equal is given by

$$\text{R.E.} = \frac{1}{p}\left[\frac{(CV)_o^2}{(CV)_o^2 - \dfrac{q}{p}}\right] = \frac{(CV)_o^2}{p(CV)_o^2 - q}. \tag{6}$$
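For a mean per sampling unit estimate under simple random sampling, the two relative efficiencies depend only on $p$ and $(CV)_o^2$. The sketch below evaluates the same-rate form and the equal-sample-size form; the function names and the input values ($p$ = 0.25, $(CV)_o^2$ = 4.9104) are hypothetical choices for illustration, and the finite population correction is ignored as in the text.

```python
def re_same_rate(cv2_o, p):
    """R.E. for equal sampling rates: p*CVo^2 / (p*CVo^2 - q), fpc ignored."""
    q = 1.0 - p
    denom = p * cv2_o - q          # equals (CV)^2 of the selective population
    if denom <= 0:
        raise ValueError("p*(CV)_o^2 must exceed q for a non-degenerate population")
    return p * cv2_o / denom


def re_equal_size(cv2_o, p):
    """R.E. for equal sample sizes: the same-rate value divided by p."""
    return re_same_rate(cv2_o, p) / p


# Hypothetical values: one unit in four is eligible.
print(re_same_rate(4.9104, 0.25), re_equal_size(4.9104, 0.25))
```

Both values exceed 1 whenever $p < 1$, in line with the conclusion drawn from (5).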
2.1.4 Simple Random Sample with a Ratio Estimate: 1/ (i) If the same sampling rate is used, there appears to be no particular benefit from sampling the selective population when a ratio estimate is contemplated. Before discussing this point in detail it might be well to review the characteristics of a ratio estimate. This estimate from a simple random sample is given by

$$t_o = \frac{\bar y_o}{\bar x_o}\,N\mu_{xo} = \frac{\bar y_o}{\bar x_o}\,X_o. \tag{7}$$

According to Cochran [1], the ratio estimate is a "best unbiased linear estimate" if two conditions are satisfied:
(a) the relation between y and x is a straight line through the origin, and
(b) the variance about this line is proportional to x.

If condition (a) is not satisfied, Hasel [5] has shown that the estimate is biased, although in large samples the bias is usually negligible relative to the sampling error. Expressed in terms of expectation, the bias arises from the fact that

$$E\left(\frac{\bar y_o}{\bar x_o}\right) \neq \frac{E(\bar y_o)}{E(\bar x_o)}. \tag{8}$$

The mean square error is made up of two components:
(a) the sampling error, and
(b) the bias term squared.

This can also be expressed as

$$\text{M.S.E.} = V(\bar y_o/\bar x_o) + \text{Bias}^2 = V(\bar y_o/\bar x_o) + \left[E\left(\frac{\bar y_o}{\bar x_o}\right) - \frac{E(\bar y_o)}{E(\bar x_o)}\right]^2. \tag{9}$$

The bias term is generally small and is usually ignored when $n$ is reasonably large.

1/ The ratio and regression estimates consider only those cases where a zero "x" value (x being the supplementary variable) is always associated with a zero "y" value and where no non-zero "x" values are eliminated. This is a realistic assumption in many cases. For example in estimating corn acreage on a ratio to crop land, corn acreage would be the "y" variable and crop land the "x" variable. A zero "x" value (no crop land) would of course be associated with a zero "y" value.
An approximate formula for the variance of a sample ratio ignoring the finite population correction factor is given by

$$V(\bar y_o/\bar x_o) \approx \frac{1}{n}\left(\frac{\mu_{yo}}{\mu_{xo}}\right)^2\left(\frac{\sigma_{yo}^2}{\mu_{yo}^2} + \frac{\sigma_{xo}^2}{\mu_{xo}^2} - \frac{2\sigma_{yxo}}{\mu_{yo}\mu_{xo}}\right) \tag{10}$$

where $\sigma_{yxo}$ will stand for $\rho_o\sigma_{yo}\sigma_{xo}$. Equation (10), when multiplied by $N^2\mu_{xo}^2$, is an approximate expression for the variance of the sample total ($t_o$) and is utilized in this investigation for comparing samples from the selective population with those from the original population. If a ratio estimate is used, and the sampling rate is the same in both populations, the approximate formula for the variance of a sample ratio (10) gives identical results for both populations, i.e.,

$$\text{R.E.} = 1.00. \tag{11}$$
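Proof:

$$\text{R.E.} = \frac{V_{t_o}}{V_t} = \frac{\dfrac{N^2\mu_{xo}^2}{n}\left(\dfrac{\mu_{yo}}{\mu_{xo}}\right)^2\left(\dfrac{\sigma_{yo}^2}{\mu_{yo}^2} + \dfrac{\sigma_{xo}^2}{\mu_{xo}^2} - \dfrac{2\sigma_{yxo}}{\mu_{yo}\mu_{xo}}\right)}{\dfrac{(pN)^2\mu_x^2}{pn}\left(\dfrac{\mu_y}{\mu_x}\right)^2\left(\dfrac{\sigma_y^2}{\mu_y^2} + \dfrac{\sigma_x^2}{\mu_x^2} - \dfrac{2\sigma_{yx}}{\mu_y\mu_x}\right)}.$$

Now $\mu_y = \mu_{yo}/p$, $\mu_x = \mu_{xo}/p$,

$$p\sigma_y^2 = \sigma_{yo}^2 - \mu_{yo}^2\frac{q}{p} \quad \text{from (3)},$$

$$p\sigma_x^2 = \sigma_{xo}^2 - \mu_{xo}^2\frac{q}{p} \quad \text{from (3)},$$

and, by the same argument,

$$p\sigma_{yx} = \sigma_{yxo} - \mu_{yo}\mu_{xo}\frac{q}{p}.$$

Each relative term of the selective population is therefore $p$ times the corresponding original term less $q/p$; for example, $\sigma_y^2/\mu_y^2 = p\left[(CV)_{yo}^2 - q/p\right]$. Substituting in the denominator and cancelling $N^2\mu_{yo}^2/n$,

$$\text{R.E.} = \frac{(CV)_{yo}^2 + (CV)_{xo}^2 - 2\rho_o(CV)_{yo}(CV)_{xo}}{(CV)_{yo}^2 - \dfrac{q}{p} + (CV)_{xo}^2 - \dfrac{q}{p} - 2\rho_o(CV)_{yo}(CV)_{xo} + \dfrac{2q}{p}} = 1.$$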
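The result R.E. = 1 can be checked numerically by evaluating the approximate variance (10), expanded to a total, in both populations. This is a sketch on hypothetical data: four eligibles with paired x and y values, twelve non-eligibles with zeros in both variables, and sample sizes n and pn. The helper names are assumptions made for the illustration.

```python
def moments(xs, ys):
    """Population means, variances (divisor = number of units) and covariance."""
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    vx = sum((x - mx) ** 2 for x in xs) / m
    vy = sum((y - my) ** 2 for y in ys) / m
    cxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / m
    return mx, my, vx, vy, cxy


def approx_var_ratio_total(xs, ys, n):
    """Equation (10) multiplied by (N * mu_x)^2, fpc ignored."""
    N = len(xs)
    mx, my, vx, vy, cxy = moments(xs, ys)
    rel = vy / my**2 + vx / mx**2 - 2 * cxy / (mx * my)
    return (N * mx) ** 2 * (my / mx) ** 2 * rel / n


# Hypothetical populations (p = 0.25).
x_sel, y_sel = [40.0, 60.0, 25.0, 90.0], [12.0, 30.0, 7.0, 51.0]
x_orig, y_orig = x_sel + [0.0] * 12, y_sel + [0.0] * 12

n = 8                                          # sample from the original population
p = len(x_sel) / len(x_orig)
v_to = approx_var_ratio_total(x_orig, y_orig, n)
v_t = approx_var_ratio_total(x_sel, y_sel, p * n)   # same sampling rate: pn units
relative_efficiency = v_to / v_t
print(relative_efficiency)
```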
Although it has been indicated previously that the bias is usually negligible, it should be pointed out that the bias for a ratio estimate using sample data from the selective population will not be the same bias encountered when the estimate uses sample data from the original population. That is,

$$E\left(\frac{\bar y_o}{\bar x_o}\right) \neq E\left(\frac{\bar y}{\bar x}\right). \tag{12}$$

The differences will be expressed symbolically in Chapter III since further theory, developed in that chapter, is needed.

(ii) Again a further gain in efficiency of the order of $1/p$ is expected when equal sample sizes are considered in both populations. This is

$$\text{R.E.} = \frac{1}{p}. \tag{13}$$
2.1.5 Simple Random Sample with a Regression Estimate (see the footnote to Section 2.1.4): (i) Another well known estimate which takes advantage of any correlation between the variable to be estimated and any supplementary variable for which population data are available is the regression estimate. This estimate of the population total is given by

$$t_o = N\left[\bar y_o + b(\mu_{xo} - \bar x_o)\right], \tag{14}$$

where $b$ is the least squares estimate of the regression coefficient $\beta$ for y on x.

Cochran [1] has shown that the average variance of the sample mean for a regression estimate is

$$V(\bar y_o) = \frac{\sigma_{yo}^2(1-\rho_o^2)}{n}. \tag{15}$$

This relationship assumes that
(a) the true regression is linear,
(b) the deviations from regression have a constant variance, and
(c) N is infinite.

However if $n$ is large enough so that the terms in $1/n$ are negligible, Cochran [1] has shown that formula (15) holds even if the true regression is not linear and the residual variance depends upon x. When the finite size of the population is considered, the regression estimate is slightly biased but this bias is unimportant in most practical cases. The effect on the variance is approximately to multiply it by the factor $(N-n)/N$. Recognizing the assumptions as specified by Cochran, the following formula is used for comparing the relative efficiency of samples from the selective population to those from the original population when a regression estimate is contemplated:

$$V_{t_o} = \frac{N^2(1-\rho_o^2)\sigma_{yo}^2}{n}. \tag{16}$$

When the sampling rate is the same in both populations and the finite population correction factor is ignored, the relative efficiency of a sample drawn from the selective population to one drawn from the original population is as follows:

$$\text{R.E.} = \frac{1 - \dfrac{q}{p(CV)_{xo}^2}}{1 - \dfrac{q}{p(CV)_{xo}^2}\left\{\dfrac{(CV)_{yo}^2 + (CV)_{xo}^2 - 2\rho_o(CV)_{yo}(CV)_{xo}}{(CV)_{yo}^2(1-\rho_o^2)}\right\}}. \tag{17}$$
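In the proof of (17) the following expression is needed:

$$\rho^2 = \frac{(p\rho_o\sigma_{yo}\sigma_{xo} - q\mu_{yo}\mu_{xo})^2}{(p\sigma_{yo}^2 - q\mu_{yo}^2)(p\sigma_{xo}^2 - q\mu_{xo}^2)}. \tag{18}$$

This supplemental proof of (18) will be considered first.

Proof: Since

$$pN\rho\sigma_y\sigma_x = \sum^{pN} yx - pN\mu_y\mu_x = \sum^{pN} yx - \frac{N}{p}\mu_{yo}\mu_{xo},$$

$$N\rho_o\sigma_{yo}\sigma_{xo} = \sum^{N} y_ox_o - N\mu_{yo}\mu_{xo},$$

and

$$\sum^{N} y_ox_o = \sum^{pN} yx,$$

it follows that

$$pN\rho\sigma_y\sigma_x + \frac{N}{p}\mu_{yo}\mu_{xo} = N\rho_o\sigma_{yo}\sigma_{xo} + N\mu_{yo}\mu_{xo}.$$

Hence

$$p\rho\sigma_y\sigma_x = \rho_o\sigma_{yo}\sigma_{xo} + \mu_{yo}\mu_{xo} - \frac{\mu_{yo}\mu_{xo}}{p}$$

and

$$\rho^2 = \frac{(p\rho_o\sigma_{yo}\sigma_{xo} - q\mu_{yo}\mu_{xo})^2}{p^2(p\sigma_y^2)(p\sigma_x^2)} = \frac{(p\rho_o\sigma_{yo}\sigma_{xo} - q\mu_{yo}\mu_{xo})^2}{(p\sigma_{yo}^2 - q\mu_{yo}^2)(p\sigma_{xo}^2 - q\mu_{xo}^2)}.$$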
Proof of (17):

$$\text{R.E.} = \frac{V_{t_o}}{V_t} = \frac{\dfrac{N^2}{n}(1-\rho_o^2)\sigma_{yo}^2}{\dfrac{(pN)^2}{pn}(1-\rho^2)\sigma_y^2} = \frac{(1-\rho_o^2)\sigma_{yo}^2}{p\sigma_y^2 - p\sigma_y^2\rho^2}.$$

Substituting from (3) and (18),

$$\text{R.E.} = \frac{(1-\rho_o^2)\,\sigma_{yo}^2\,p\,(p\sigma_{xo}^2 - q\mu_{xo}^2)}{\left\{(p\sigma_{yo}^2 - q\mu_{yo}^2)(p\sigma_{xo}^2 - q\mu_{xo}^2) - (p\rho_o\sigma_{yo}\sigma_{xo} - q\mu_{yo}\mu_{xo})^2\right\}}.$$

Now consider only the term of the denominator in curled brackets, which is

$$p^2\sigma_{yo}^2\sigma_{xo}^2 - pq\sigma_{yo}^2\mu_{xo}^2 - pq\mu_{yo}^2\sigma_{xo}^2 + q^2\mu_{yo}^2\mu_{xo}^2 - p^2\rho_o^2\sigma_{yo}^2\sigma_{xo}^2 + 2pq\rho_o\sigma_{yo}\sigma_{xo}\mu_{yo}\mu_{xo} - q^2\mu_{yo}^2\mu_{xo}^2$$

$$= p^2\sigma_{yo}^2\sigma_{xo}^2(1-\rho_o^2) - pq\,\mu_{yo}^2\mu_{xo}^2\left[(CV)_{yo}^2 + (CV)_{xo}^2 - 2\rho_o(CV)_{yo}(CV)_{xo}\right].$$

Hence

$$\text{R.E.} = \frac{(1-\rho_o^2)(p\sigma_{xo}^2 - q\mu_{xo}^2)}{p\sigma_{xo}^2(1-\rho_o^2) - q\mu_{xo}^2\,\dfrac{(CV)_{yo}^2 + (CV)_{xo}^2 - 2\rho_o(CV)_{yo}(CV)_{xo}}{(CV)_{yo}^2}}.$$

If numerator and denominator are divided by $(1-\rho_o^2)\,p\sigma_{xo}^2$,

$$\text{R.E.} = \frac{1 - \dfrac{q}{p(CV)_{xo}^2}}{1 - \dfrac{q}{p(CV)_{xo}^2}\left\{\dfrac{(CV)_{yo}^2 + (CV)_{xo}^2 - 2\rho_o(CV)_{yo}(CV)_{xo}}{(CV)_{yo}^2(1-\rho_o^2)}\right\}}.$$

It has been shown in (5) that $\dfrac{q}{p(CV)_{xo}^2}$ must be a positive quantity less than 1. Therefore if the term in the denominator of (17), $\left[(CV)_{yo}^2 + (CV)_{xo}^2 - 2\rho_o(CV)_{yo}(CV)_{xo}\right]$, is always greater than or equal to $(CV)_{yo}^2(1-\rho_o^2)$, the relative efficiency $\geq 1$.

Proof:

$$(CV)_{yo}^2 + (CV)_{xo}^2 - 2\rho_o(CV)_{yo}(CV)_{xo} \geq (CV)_{yo}^2(1-\rho_o^2),$$

$$\rho_o^2(CV)_{yo}^2 - 2\rho_o(CV)_{yo}(CV)_{xo} + (CV)_{xo}^2 \geq 0,$$

$$\left(\rho_o(CV)_{yo} - (CV)_{xo}\right)^2 \geq 0.$$

Hence the relative efficiency is equal to 1 if $\rho_o(CV)_{yo} = (CV)_{xo}$ and the relative efficiency is greater than 1 if $\rho_o(CV)_{yo} \neq (CV)_{xo}$.
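As a numerical cross-check of (17), the sketch below computes the relative efficiency two ways on a hypothetical population: once from the closed form in terms of $(CV)_{yo}$, $(CV)_{xo}$, $\rho_o$ and $p$, and once directly as the ratio of the two regression variances of the totals. The data and helper names are assumptions made for the illustration; the finite population correction is ignored.

```python
from math import sqrt


def moments(xs, ys):
    """Population means, variances (divisor = number of units) and covariance."""
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    vx = sum((x - mx) ** 2 for x in xs) / m
    vy = sum((y - my) ** 2 for y in ys) / m
    cxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / m
    return mx, my, vx, vy, cxy


def re_regression_formula(cv2_y, cv2_x, rho_o, p):
    """Closed form (17), written with original-population quantities."""
    q = 1.0 - p
    k = q / (p * cv2_x)
    bracket = cv2_y + cv2_x - 2.0 * rho_o * sqrt(cv2_y * cv2_x)
    return (1.0 - k) / (1.0 - k * bracket / (cv2_y * (1.0 - rho_o**2)))


# Hypothetical populations: 4 eligibles, 12 non-eligibles (p = 0.25).
x_sel, y_sel = [40.0, 60.0, 25.0, 90.0], [12.0, 30.0, 7.0, 51.0]
x_orig, y_orig = x_sel + [0.0] * 12, y_sel + [0.0] * 12
p = len(x_sel) / len(x_orig)

mxo, myo, vxo, vyo, cxyo = moments(x_orig, y_orig)
_, _, vx, vy, cxy = moments(x_sel, y_sel)
rho_o = cxyo / sqrt(vxo * vyo)
rho2 = cxy**2 / (vx * vy)

re_formula = re_regression_formula(vyo / myo**2, vxo / mxo**2, rho_o, p)
# Direct ratio V_to / V_t with sample sizes n and pn (n cancels).
re_direct = (1.0 - rho_o**2) * vyo / (p * (1.0 - rho2) * vy)
print(re_formula, re_direct)
```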
(ii) Again the relative efficiency for equal sample sizes is $1/p$ times that for the same sampling rate.

$$\text{R.E.} = \frac{1 - \dfrac{q}{p(CV)_{xo}^2}}{p - \dfrac{q}{(CV)_{xo}^2}\left\{\dfrac{(CV)_{yo}^2 + (CV)_{xo}^2 - 2\rho_o(CV)_{yo}(CV)_{xo}}{(CV)_{yo}^2(1-\rho_o^2)}\right\}}. \tag{19}$$
2.1.6 Stratified Random Sample with a Mean per Sampling Unit Estimate and Proportional Allocation: (i) In stratified random sampling as in simple random sampling, a number of different methods of estimation can be used. In addition, with stratified sampling there are a number of different ways of allocating the sample to the various strata. One common method is known as proportional allocation. By this method the proportion of the total sample allocated to a stratum is equal to $N_j/N$ where $N_j$ is the total sampling units in the jth stratum and $N$ is defined as usual.

When the following restrictions are imposed upon the samples:
(a) stratification,
(b) mean per sampling unit estimate,
(c) proportional allocation ($n_j = nN_j/N$), and
(d) the same sampling rate,
the relative efficiency of a sample from the selective population with respect to a sample from the original population ignoring the finite population correction factor is given by

$$\text{R.E.} = \frac{1}{1 - \dfrac{\sum N_j\mu_{oj}^2\,q_j/p_j}{\sum N_j\sigma_{oj}^2}}. \tag{20}$$

Proof: With proportional allocation the jth stratum of the selective population contains $p_jN_j$ units of which $p_jn_j$ are sampled, so that, applying (3) within each stratum,

$$\text{R.E.} = \frac{V_{t_o}}{V_t} = \frac{\dfrac{N}{n}\sum N_j\sigma_{oj}^2}{\dfrac{N}{n}\sum N_jp_j\sigma_j^2} = \frac{\sum N_j\sigma_{oj}^2}{\sum N_j\sigma_{oj}^2 - \sum N_j\mu_{oj}^2\,q_j/p_j} = \frac{1}{1 - \dfrac{\sum N_j\mu_{oj}^2\,q_j/p_j}{\sum N_j\sigma_{oj}^2}} > 1.$$

(ii) When the size of sample is made the same in both populations, the effect on the relative efficiency is again to multiply (20) by $1/p$. This is

$$\text{R.E.} = \frac{1}{p - p\,\dfrac{\sum N_j\mu_{oj}^2\,q_j/p_j}{\sum N_j\sigma_{oj}^2}}. \tag{21}$$

It might be noted at this point that even though the total sample size is equal there is no reason to expect that the sample sizes within strata will be equal unless the proportion of eligibles is constant from one stratum to another.
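The stratified result for proportional allocation can be evaluated directly from stratum-level quantities of the original population. The following sketch assumes two hypothetical strata described by $(N_j, p_j, \mu_{oj}, \sigma_{oj}^2)$; the function names and values are illustrative assumptions, and the values were chosen so that $\sigma_{oj}^2$ exceeds $(q_j/p_j)\mu_{oj}^2$, as relationship (3) requires.

```python
def re_proportional(strata):
    """R.E. (same sampling rate, proportional allocation, fpc ignored).

    strata: list of (N_j, p_j, mu_oj, sigma2_oj) for the original population.
    """
    num = sum(N_j * mu_oj**2 * (1.0 - p_j) / p_j for N_j, p_j, mu_oj, _ in strata)
    den = sum(N_j * s2_oj for N_j, _, _, s2_oj in strata)
    return 1.0 / (1.0 - num / den)


def re_proportional_equal_size(strata):
    """Equal total sample size: the same-rate value divided by the overall p."""
    N = sum(N_j for N_j, _, _, _ in strata)
    p = sum(N_j * p_j for N_j, p_j, _, _ in strata) / N
    return re_proportional(strata) / p


# Hypothetical strata: (N_j, p_j, mu_oj, sigma2_oj)
strata = [(400, 0.5, 10.0, 150.0), (600, 0.1, 2.0, 60.0)]
print(re_proportional(strata), re_proportional_equal_size(strata))
```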
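2.1.7 Stratified Random Sample with a Mean per Sampling Unit Estimate and Optimum Allocation: (i) Another method of allocation used when possible is known as optimum allocation. This method was developed by Neyman [11] and insures minimum variance when it can be utilized. It requires a knowledge of the within stratum variances and allocation is then made as follows:

$$n_j = \frac{nN_j\sigma_{oj}}{\sum N_j\sigma_{oj}}. \tag{22}$$

The relative efficiency of samples from the selective and original populations under these conditions, when the overall sampling rate is the same and the finite population correction factor is ignored, is

$$\text{R.E.} = \frac{p\left(\sum N_j\sigma_{oj}\right)^2}{\left(\sum p_jN_j\sigma_j\right)^2}. \tag{23}$$

Proof:

$$\text{R.E.} = \frac{V_{t_o}}{V_t} = \frac{\left(\sum N_j\sigma_{oj}\right)^2/n}{\left(\sum p_jN_j\sigma_j\right)^2/pn}.$$

Hence

$$\text{R.E.} = \frac{p\left(\sum N_j\sigma_{oj}\right)^2}{\left(\sum p_jN_j\sigma_j\right)^2} = \frac{p\left(\sum N_j\sigma_{oj}\right)^2}{\left(\sum N_j\sqrt{p_j\sigma_{oj}^2 - q_j\mu_{oj}^2}\right)^2},$$

where the second form follows from (3) applied within each stratum.

(ii) When the same sample size is considered in both populations and the finite population correction factor is ignored, the relative efficiency becomes

$$\text{R.E.} = \frac{\left(\sum N_j\sigma_{oj}\right)^2}{\left(\sum p_jN_j\sigma_j\right)^2}. \tag{24}$$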
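Optimum allocation and its effect on the comparison can be sketched in a few lines. The example below assumes the standard rule of allocating in proportion to $N_j\sigma_j$ within each population, a sample of n from the original population and pn from the selective population, and no finite population correction. The strata and helper names are hypothetical; the selective standard deviations were chosen to agree with the original ones through relationship (3).

```python
from math import sqrt


def neyman_allocation(n, sizes, sds):
    """Allocate n in proportion to N_j * sigma_j, as in equation (22)."""
    weights = [N_j * s_j for N_j, s_j in zip(sizes, sds)]
    total = sum(weights)
    return [n * w / total for w in weights]


def variance_of_total(sizes, sds, allocation):
    """Stratified mean-per-unit total, fpc ignored: sum of N_j^2 sigma_j^2 / n_j."""
    return sum((N_j * s_j) ** 2 / n_j
               for N_j, s_j, n_j in zip(sizes, sds, allocation))


# Hypothetical strata of the original population and their selective counterparts.
sizes_o, sd_o = [400, 600], [sqrt(150.0), sqrt(60.0)]
sizes_sel, sd_sel = [200.0, 60.0], [10.0, sqrt(240.0)]

n = 100
p = sum(sizes_sel) / sum(sizes_o)              # overall proportion of eligibles

alloc_o = neyman_allocation(n, sizes_o, sd_o)
alloc_sel = neyman_allocation(p * n, sizes_sel, sd_sel)   # same overall rate

re_optimum = (variance_of_total(sizes_o, sd_o, alloc_o)
              / variance_of_total(sizes_sel, sd_sel, alloc_sel))
print(alloc_o, alloc_sel, re_optimum)
```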
2.2
Cost Functions
Many roasonable types of cost functions can bo established depending
uDon tho assumptions
m~te~
The difficulty arises in attempting to construct
a cost function which will be general onough to holQ under most situations
and yet not bo so cOMplex as to bo unwioldy.
One comparison of genoralizod
cost functions will bo r.1ade horein.
Cost functions must not only bo constructed for tho sampling of the
seloctivQ population but for tho original population as woll in ordor to
- 23 -
compare relative efficiencies on a basis of both cost and IJrecision.
Let $C_o$ be the cost of enumerating a sample of n out of the original population
(N) and $V_{t_o}$ be the variance of the estimated total, $t_o$. Then let C be the
cost of identifying the eligibles in the original population plus the cost
of enumerating a sample of pn out of the selective population and $V_t$ be the
variance of the estimated total, t. Then if $C \cdot V_t < C_o \cdot V_{t_o}$, the practice of
identifying the eligibles in advance and taking only a sample of the
eligibles will be more efficient and should be adopted. This inequality
may be written as

$$ \frac{V_{t_o}}{V_t} > \frac{C}{C_o}. \qquad (25) $$

Now $V_{t_o}/V_t$ = R.E., and has been given previously for different methods of
sampling and estimation. Therefore if R.E. > $C/C_o$, an attempt should be made to
eliminate the non-eligibles from the original population.
If $C_o$ is the cost of enumerating a sample of n from the original
population of N, the cost function might be written as

$$ E(C_o) = pnC_p + qnC_q \qquad (26) $$

where $C_p$ is the cost of enumerating an eligible and $C_q$ the cost of
contacting a non-eligible. The travel costs involving $C_p$ and $C_q$ presumably
would be about equal but the interview time spent with an eligible will be
greater than that spent with a non-eligible.
The average cost per survey, C, of determining eligibility and
enumerating subsequent samples from the selective population only, might
be represented as follows:

$$ C = \frac{(d-1)\,pnC_p + pNC_p + qNC_q + \sum_{i=1}^{d} Z_i}{d} \qquad (27) $$

where it is assumed that a complete enumeration will be made the first
time to establish a list of eligibles, d is the number of surveys 7/ for
which the list is applicable without another complete enumeration, $C_p$
and $C_q$ are defined as in (26), and $Z_i$ 8/ is the cost of a periodic follow-up
which is made to keep the list up to date.

6/ $E(C_o)$ is the average value of $C_o$ taken over all possible samples. The
expectation is appropriate in this case as the estimated values of p and q
will be random variables from sample to sample.
7/ Hereafter this will be referred to as the "life expectancy" of the
complete enumeration.
8/ $Z_i$ is summed from 1 to d rather than 2 to d as some follow-up may be
necessary the first year to insure that the list of eligibles is complete.

If $C_p = C_q$ and $\sum Z_i = Z = 0$, the cost function for C would reduce to

$$ C = \frac{C_p\,[(d-1)\,pn + N]}{d}. \qquad (28) $$

In a relatively stable population such as commercial fruit growers, Z may
be equal to zero, i.e., a complete enumeration to determine the
commercial growers may require no follow-ups for a number of years.
Substituting the general expression for $C_o$ and C into (25) gives

$$ \text{R.E.} > \frac{pC_p\,(dn - n + N) + qNC_q + \sum_{i=1}^{d} Z_i}{d\,(pnC_p + qnC_q)}. \qquad (29) $$

If inequality (29) is fulfilled, then the selective population should be
identified and sampled.
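
As a rough illustration of the cost functions (26) and (27), the sketch below evaluates both for one set of inputs. The function names and the numerical values are hypothetical; only the two formulas themselves come from the text.

```python
def original_sample_cost(n, p, cost_eligible, cost_noneligible):
    """Expected cost of one sample of n from the original population, as in (26)."""
    q = 1.0 - p
    return p * n * cost_eligible + q * n * cost_noneligible


def average_cost_per_survey(d, n, N, p, cost_eligible, cost_noneligible, followups=()):
    """Average cost per survey over d surveys, as in (27).

    One complete enumeration of all N units is followed by (d - 1) samples of pn
    eligibles; `followups` holds the periodic list-maintenance costs Z_i.
    """
    q = 1.0 - p
    total = ((d - 1) * p * n * cost_eligible
             + p * N * cost_eligible
             + q * N * cost_noneligible
             + sum(followups))
    return total / d


# Hypothetical inputs: equal unit costs and no follow-up costs, so (27) reduces to (28).
C = average_cost_per_survey(d=5, n=100, N=1000, p=0.2,
                            cost_eligible=1.0, cost_noneligible=1.0)
C_o = original_sample_cost(n=100, p=0.2, cost_eligible=1.0, cost_noneligible=1.0)
print(C, C_o)
```

With equal unit costs and no follow-ups the first function returns $C_p[(d-1)pn + N]/d$, which is the reduced form (28).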
The behavior of the cost function relationship is illustrated with
some hypothetical yet reasonable values for the variables in the function.
Let $C_p/C_q = r$, let n/N be $1/\alpha$ (sampling fraction) and let $\sum Z_i = 0$.
Inequality (29) then becomes

$$ \text{R.E.} > \frac{\alpha\,pr\left(\dfrac{d-1}{\alpha} + 1 + q\right)}{d\,(pr + q)}. \qquad (30) $$

The four variables for which different values can be substituted are
p, d, r and $1/\alpha$. The values chosen for illustrative purposes are given in
Table 2.

Table 2. Hypothetical Values of p, d, r and $1/\alpha$ to be Substituted into
the Equation $C/C_o = \alpha\,pr\left(\frac{d-1}{\alpha} + 1 + q\right) \big/ \, d\,(pr + q)$.

   p      d      r      1/α
  .2      1     1.0     1/2
  .5      3     1.5     1/10
  .8      5     2.0     1/20
                        1/100
These values seem reasonable for obtaining some idea about the basic
behavior of the relationship. Although in general, the cases in question
will have a p value closer to .2 than to .8, the values .2, .5 and .8
fairly well cover the range of p between 0 and 1. Similarly, it is
difficult to conceive a situation where a basic enumeration would provide
a list that would be accurate for more than five periodic surveys. The r
value, i.e., the ratio between the cost of enumerating an eligible and
the cost of enumerating a non-eligible, would rarely exceed 2 unless the
schedule is quite long, and in most cases would be closer to 1 than 2.
As for the sampling rates, a fraction of 1/2 would rarely be used unless
the population is extremely small, but sampling rates of one in ten, one
in twenty and one in 100 are not uncommon. The values of $C/C_o$ computed
from all possible combinations of values of p, d, r and $1/\alpha$ given above
are presented in Table 3. A portion of these data are presented later in
Figures 1 through 5.
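
The grid of values described above can be regenerated with a short sketch. It assumes that the right-hand side of inequality (30) reads $\alpha\,pr\,((d-1)/\alpha + 1 + q) / (d\,(pr+q))$, which is the form the tabulated values follow; the function and variable names are illustrative only.

```python
from itertools import product


def relative_cost(p, d, r, alpha):
    """Right-hand side of inequality (30): the cost ratio C/C_o.

    alpha is the reciprocal of the sampling fraction, so a rate of 1 in 20
    corresponds to alpha = 20.
    """
    q = 1.0 - p
    return alpha * p * r * ((d - 1) / alpha + 1 + q) / (d * (p * r + q))


P_VALUES = (0.2, 0.5, 0.8)
D_VALUES = (1, 3, 5)
R_VALUES = (1.0, 1.5, 2.0)
ALPHA_VALUES = (2, 10, 20, 100)

# One entry per combination of the hypothetical values in Table 2.
cost_table = {
    (p, d, r, alpha): relative_cost(p, d, r, alpha)
    for p, d, r, alpha in product(P_VALUES, D_VALUES, R_VALUES, ALPHA_VALUES)
}

for alpha in ALPHA_VALUES:
    row = [f"{cost_table[(0.2, d, 1.0, alpha)]:.2f}" for d in D_VALUES]
    print(f"p=.2, r=1.0, 1/alpha=1/{alpha}:", "  ".join(row))
```

Printing the other values of p and r in the same way gives the remaining columns of the grid.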
An examination of the results in Table 3 indicates that a change in
the variables d and $1/\alpha$ has the greatest effect on the ratio $C/C_o$. An
increase in d causes a rapid decrease in $C/C_o$. As the life expectancy of
a complete enumeration is lengthened, the more favorable the situation will
be towards a complete enumeration followed by a sampling of the selective
population. A decrease in p causes a decrease in the ratio $C/C_o$, and
since scarcity items will tend to have a low value of p, this situation is
favorable to the establishment of the selective population. However the
effect of decreasing p is not as strong in decreasing the ratio as is an
increase in d. Changing the ratio of $C_p$ to $C_q$ over the range considered
has less influence on the ratio $C/C_o$ than the other two factors mentioned
previously. This is not surprising since the range is small. In practical
situations the value of r would usually tend to be closer to 1, so that
this factor operates slowly in favor of sampling the selective population.
Table 3. Values of the Relative Cost of Sampling the Selective Population to Sampling the
Original Population when Different Values for the Variables d, p, r and 1/α are
Considered.

                       p = .2                    p = .5                    p = .8
  1/α      r      d=1     d=3     d=5       d=1     d=3     d=5       d=1     d=3     d=5

  1/2     1.0     .72     .37     .30      1.50     .83     .70      1.92    1.17    1.02
          1.5     .98     .51     .41      1.80    1.00     .84      2.06    1.26    1.10
          2.0    1.20     .62     .51      2.00    1.11     .93      2.13    1.30    1.14

  1/10    1.0    3.60    1.33     .88      7.50    2.83    1.90      9.60    3.73    2.56
          1.5    4.91    1.82    1.20      9.00    3.40    2.28     10.30    4.00    2.74
          2.0    6.00    2.22    1.47     10.00    3.77    2.53     10.65    4.15    2.84

  1/20    1.0    7.20    2.53    1.60     15.00    5.33    3.40     19.20    6.93    4.48
          1.5    9.82    3.45    2.18     18.00    6.40    4.08     20.60    7.43    4.80
          2.0   12.00    4.22    2.67     20.00    7.11    4.53     21.30    7.70    4.98

  1/100   1.0   36.00   12.13    7.36     75.00   25.33   15.40     96.00   32.53   19.84
          1.5   49.10   16.55   10.04     90.00   30.40   18.48    103.00   34.86   21.26
          2.0   60.00   20.22   12.27    100.00   33.78   20.53    106.50   36.14   22.04
The sampling rate however works in the opposite direction. As the
sampling fraction decreases, the ratio $C/C_o$ increases rapidly. At very
small sampling rates, this factor would tend to overbalance the other
three in favor of sampling the original population, as seen from Table 3.
For example, p = .2, d = 5, r = 1 and $1/\alpha$ = 1/100 is the situation most
favorable for sampling the selective population with respect to values of
p, d and r and least favorable for values of $1/\alpha$. In this case the ratio
$C/C_o$ is equal to 7.36. Therefore the relative efficiency of the estimates
from the two populations must exceed 7.36 in order to recommend the practice
of complete enumeration to determine eligibility followed by a sample
of the selective population. To make this recommendation the coefficient
of variation of a simple random sample from the original population must be
less than 215 per cent (1).
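
The 215 per cent figure can be checked with a short calculation. The sketch assumes that equation (1), which is not reproduced in this part of the thesis, has the form $\text{R.E.} = 1 / \left(1 - q / (p\,(CV)_o^2)\right)$; that form is an assumption, chosen because it is what equation (41) reduces to when $E(1/n') = 1/pn$. The function name is illustrative.

```python
import math


def max_cv_for_required_efficiency(p, required_re):
    """Largest (CV)_o for which R.E. = 1 / (1 - q / (p * CV^2)) still exceeds required_re.

    Assumed form of equation (1). Solving R.E. > required_re for CV gives
    CV^2 < q / (p * (1 - 1 / required_re)).
    """
    q = 1.0 - p
    return math.sqrt(q / (p * (1.0 - 1.0 / required_re)))


# Case discussed in the text: p = .2 and a required relative efficiency of 7.36.
cv_limit = max_cv_for_required_efficiency(0.2, 7.36)
print(f"(CV)_o must be below {100 * cv_limit:.0f} per cent")
```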
The basic relationship may be discerned more easily from Figures 1
through 5. Figure 1 is a graph of the function of d when $C/C_o$ is averaged
over all other variables. Similarly the ratios of $C/C_o$ for p, r and $1/\alpha$
are plotted. The averages in themselves have little meaning from the
standpoint of magnitude. The graphs are presented mainly to indicate the
direction in which each variable operates and the relative influence. In
Figures 2 through 5, each variable is plotted separately for combinations
of extreme values for the other variables.
From Figure 1 and Table 3, it is apparent that for a sampling rate of 1/2,
the ratio $C/C_o$ is very low and for situations where such a high sampling
rate is applicable (small populations) the practice of a complete enumeration
and subsequent sampling of the selective population should probably be advised
in most circumstances. The reverse would be true in considering a sampling
rate of 1 in 100. The two sampling rates considered in Figures 2, 3 and 4
are 1 in 10 and 1 in 20.

[Figure 1. Graph of the Influence of Each Variable on the Ratio of C/C_o
Averaged Over All Other Variables.]

[Figure 2. Influence of the "Life Expectancy" of a Complete Enumeration on
the Relative Costs (C/C_o) of Two Procedures for Given Values of
the Proportion of Eligibles, the Sampling Rate and the Relative
Costs of Enumerating an Eligible and a Non-eligible.
Horizontal axis: Life Expectancy (d).]

[Figure 3. Influence of the Relative Cost of Enumerating an Eligible and a
Non-eligible on the Relative Costs (C/C_o) of Two Procedures for
Given Values of the "Life Expectancy" of a Complete Enumeration,
the Sampling Rate and the Proportion of Eligibles.
Horizontal axis: Relative Cost of Enumerating an Eligible and a Non-eligible (r).]

[Figure 4. Influence of the Proportion of Eligibles on the Relative Costs
(C/C_o) of Two Procedures for Given Values of the "Life Expectancy"
of a Complete Enumeration, the Sampling Rate and the Relative Costs
of Enumerating an Eligible and a Non-eligible.
Horizontal axis: Proportion of Eligibles in the Original Population (p).]

[Figure 5. Influence of the Sampling Rate on the Relative Costs (C/C_o) of
Two Procedures for Given Values of the "Life Expectancy" of a Complete
Enumeration, the Proportion of Eligibles and the Relative Costs of
Enumerating an Eligible and a Non-eligible.
Horizontal axis: Sampling Rate (1/α).]
CHAPTER III
IDENTITY OF THE SAMPLING UNITS NOT KNOWN IN
ADVANCE OF THE FIELD SAMPLING PROCESS
3.1 Introduction
The results in Chapter II are predicated on the premise that from a
comparison of the relative efficiencies and corresponding cost function, a
decision would be made as to whether each sampling unit in the original
population should be identified prior to sampling and the non-eligibles
eliminated. In this case the selective population would be known
completely and subsequent samples would be drawn only from it. There are
situations, however, in which the cost function as set up in Chapter II
would advise against a complete identification of the sampling units in
advance, or cases where it would be impossible to do so. In those cases
a lesser gain in efficiency may be realized by sampling the original
population but eliminating the non-eligibles from the sample. The mean of
the sample, $\bar{y}' = \sum_{i=1}^{n'} y_i / n'$, would then be based entirely upon the eligibles
and is an unbiased estimate of the mean of the selective population,
$\mu = \sum_{i=1}^{pN} y_i / pN$. 9/ The number of eligibles, n', that occur in any given
sample of the original population is now a random variable, and must be
considered as such in deriving the expected values of the mean and variance.

9/ The population values of $y_i$ are summed from 1 to pN, and the sample
values of $y_i$ are summed from 1 to n'. It should be made clear however that
this notation does not mean that the sample values are the first n' values
of the population. The $y_i$ values in the sample are in reality the eligible
(non-zero) values that arise in a randomly selected sample of n sampling
units out of a population of N sampling units. This situation can also be
regarded as a sample of n' (eligible sampling units) out of a selective
population of pN eligible elements. In the consideration of the selective
sample, if the proportion of eligibles and the size of sample is such that
it is possible to get a sample composed entirely of non-eligibles, this
sample must be discarded and a new one drawn, as samples that do not
contain any eligibles are inadmissible.
Stephan [13] has given the following result for simple random sampling
from an infinite population:

$$ V_{\bar{y}'} = \sigma^2\,E\!\left(\frac{1}{n'}\right). $$

He further considered the mean and variance of $1/n'$ for an infinite
population and for sampling from a finite population without replacement.
However he did not indicate the finite population correction factor for
$V_{\bar{y}'}$ when sampling from a finite population without replacement, and no
results were presented for other than simple random sampling. These latter
points are considered in this thesis.
The characteristics of selective samples of n' drawn without replacement
from a finite population with pN eligibles are presented in Table 4.
3.2 General Theory
3.2.1 Simple Random Sample with a Mean per Sampling Unit Estimate:
Let $\bar{y}'_{n'}$ be the mean of the eligible members of a sample which has
exactly n' eligible elements; that is

$$ \bar{y}'_{n'} = \sum_{i=1}^{n'} y_i / n' \qquad (31) $$

where the n' $y_i$'s can be regarded as a random sample from the pN eligibles
in the population, with the remaining k elements (k = n - n' = non-eligibles)
being drawn from the qN non-eligibles in the population. Hence it is
seen that there are

$$ f_{n'} = \binom{pN}{n'}\binom{qN}{k} $$

possible samples having exactly n' eligibles out of n. In the sum of those
$f_{n'}$ samples, each eligible $y_i$ in the population will be represented
$n' f_{n'} / pN$ times.

Table 4. Characteristics of Selective Samples of n' Drawn without
Replacement from a Population of N Elements in which There
are pN Eligibles and qN Non-eligibles.

  n'      k = n - n'   Number of possible samples, $f_{n'}$     Number of times each eligible $y_i$ is
                                                              represented in the total possible samples

  n       0            $\binom{pN}{n}\binom{qN}{0}$             $\binom{pN-1}{n-1}\binom{qN}{0}$
  n-1     1            $\binom{pN}{n-1}\binom{qN}{1}$           $\binom{pN-1}{n-2}\binom{qN}{1}$
  n-2     2            $\binom{pN}{n-2}\binom{qN}{2}$           $\binom{pN-1}{n-3}\binom{qN}{2}$
  ...     ...          ...                                      ...
  n-k     k            $\binom{pN}{n-k}\binom{qN}{k}$           $\binom{pN-1}{n-k-1}\binom{qN}{k}$
  ...     ...          ...                                      ...
  2       n-2          $\binom{pN}{2}\binom{qN}{n-2}$           $\binom{pN-1}{1}\binom{qN}{n-2}$
  1       n-1          $\binom{pN}{1}\binom{qN}{n-1}$           $\binom{pN-1}{0}\binom{qN}{n-1}$

  Total                $\sum f_{n'} = \binom{N}{n} - \binom{qN}{n}$     $\binom{N-1}{n-1}$

No non-eligible $y_i$ is admissible in the selective sample; consequently
samples which contain no eligibles are inadmissible.
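Hence

$$ \sum_{j=1}^{f_{n'}} \bar{y}'_{n'j} = \frac{1}{n'} \cdot \frac{n' f_{n'}}{pN} \sum_{i=1}^{pN} y_i = f_{n'}\,\mu $$

where $\sum_{j=1}^{f_{n'}} \bar{y}'_{n'j}$ is the sum of the means of the $f_{n'}$ possible samples having
exactly n' eligibles. The expected value of $\bar{y}'_{n'}$ is given by

$$ E(\bar{y}'_{n'}) = \frac{1}{f_{n'}} \sum_{j=1}^{f_{n'}} \bar{y}'_{n'j} = \mu. $$

The expected value of any sample mean $\bar{y}'$ is given by

$$ E(\bar{y}') = \sum_{n'=1}^{n} P_{n'}\,E(\bar{y}'_{n'}) $$

where $P_{n'}$ is the proportion of possible samples which have exactly n' (n' ≥ 1)
eligibles, that is $P_{n'} = f_{n'} \big/ \sum_{n'=1}^{n} f_{n'}$. Hence

$$ E(\bar{y}') = \frac{\mu \sum_{n'=1}^{n} f_{n'}}{\sum_{n'=1}^{n} f_{n'}} = \mu. $$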
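
The counting argument above can be checked by brute force on a very small population. The sketch below is illustrative only: the three eligible values, the four non-eligibles and the sample size are hypothetical. It enumerates every sample of n units, discards the inadmissible ones, and compares the counts with $f_{n'} = \binom{pN}{n'}\binom{qN}{k}$ and the average of the admissible means with $\mu$.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

# Hypothetical population: 3 eligibles with these y values and 4 non-eligibles.
eligible_values = [3, 5, 10]
num_noneligible = 4
n = 3

population = [(True, y) for y in eligible_values] + [(False, 0)] * num_noneligible

means_by_count = {}  # n' -> means of all samples with exactly n' eligibles
for sample in combinations(range(len(population)), n):
    ys = [population[i][1] for i in sample if population[i][0]]
    if not ys:
        continue  # a sample with no eligibles is inadmissible
    means_by_count.setdefault(len(ys), []).append(Fraction(sum(ys), len(ys)))

mu = Fraction(sum(eligible_values), len(eligible_values))
sample_counts = {k: len(v) for k, v in means_by_count.items()}
all_means = [m for means in means_by_count.values() for m in means]
expected_mean = sum(all_means) / len(all_means)

print(sample_counts, expected_mean, mu)
```

Exact fractions are used so that the comparison with $\mu$ is not affected by rounding.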
It might be noted at this point that under the conditions of eliminating
the non-eligibles in the sample, the estimate of the total for the
population from sample data only is biased upwards. This arises from the
fact that the estimate of p is biased upwards when samples that do not
contain any eligibles are inadmissible. Therefore, even though $\bar{y}'$ is an
unbiased estimate of $\mu$, p must be known in order to make an unbiased
estimate of the total for the population. In many practical cases, p may
be known (or very well estimated) for the population but the individual
sampling units are not identifiable until contacted.
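The variance of the mean of the eligibles $\bar{y}'$ from a random sample of
n out of a population of N is given by

$$ V_{\bar{y}'} = \sigma^2 \left[ E\!\left(\frac{1}{n'}\right) - \frac{1}{pN} \right], \qquad \sigma^2 = \frac{\sum_{i=1}^{pN} y_i^2 - \left(\sum_{i=1}^{pN} y_i\right)^2 / pN}{pN - 1}. $$

Proof:

$$ V_{\bar{y}'} = E(\bar{y}' - \mu)^2 = E(\bar{y}'^2) - \mu^2. $$

In general for any sample of n,

$$ E(\bar{y}'^2) = \sum_{n'=1}^{n} P_{n'}\,E(\bar{y}'^2_{n'}). $$

For a given n',

$$ \bar{y}'^2_{n'} = \frac{1}{n'^2} \left[ \sum_{i=1}^{n'} y_i^2 + 2 \sum_{i<j} y_i y_j \right]. $$

Now

$$ E(y_i^2) = \frac{\sum_{i=1}^{pN} y_i^2}{pN}, \qquad E(y_i y_j) = \frac{2 \sum_{i<j} y_i y_j}{pN\,(pN-1)}. $$

Hence

$$ E(\bar{y}'^2_{n'}) = \frac{1}{n'} \cdot \frac{\sum_{i=1}^{pN} y_i^2}{pN} + \frac{n'-1}{n'} \cdot \frac{2 \sum_{i<j} y_i y_j}{pN\,(pN-1)}. $$

Now $2 \sum_{i<j} y_i y_j = \left(\sum y_i\right)^2 - \sum y_i^2$ and $\sum_{n'} P_{n'} / n' = E(1/n')$, so that

$$ E(\bar{y}'^2) = \frac{\sum y_i^2}{pN}\,E\!\left(\frac{1}{n'}\right) + \frac{\left(\sum y_i\right)^2 - \sum y_i^2}{pN\,(pN-1)} \left[ 1 - E\!\left(\frac{1}{n'}\right) \right]. $$

Hence

$$ E(\bar{y}' - \mu)^2 = \frac{\sum y_i^2}{pN - 1} \left[ E\!\left(\frac{1}{n'}\right) - \frac{1}{pN} \right] - \frac{\left(\sum y_i\right)^2}{pN\,(pN-1)} \left[ E\!\left(\frac{1}{n'}\right) - \frac{1}{pN} \right] $$

$$ = \frac{\sum y_i^2 - \left(\sum y_i\right)^2 / pN}{pN - 1} \left[ E\!\left(\frac{1}{n'}\right) - \frac{1}{pN} \right]. $$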
Using sample data, $s^2 \left( \frac{1}{n'} - \frac{1}{pN} \right)$ is an unbiased estimate of
$\sigma^2 \left[ E\!\left(\frac{1}{n'}\right) - \frac{1}{pN} \right]$ provided that n' ≥ 2. If this restriction is made,
$\bar{y}'$ (the estimate of $\mu$) is still unbiased, as it has been shown that
$\bar{y}'_{n'}$ is an unbiased estimate of $\mu$ for all n'. This would require [in proof of
(34)] a summation from n' = 2 to n' = n.
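
The variance result can be verified exactly on a small example. The sketch below assumes the formula $V_{\bar{y}'} = \sigma^2 [E(1/n') - 1/pN]$ with $\sigma^2$ computed with divisor pN - 1; the population values are hypothetical and the helper name is illustrative. It enumerates all admissible samples and compares the directly computed variance of $\bar{y}'$ with the formula.

```python
from fractions import Fraction
from itertools import combinations


def admissible_sample_means(eligible_values, num_noneligible, n):
    """Return (number of eligibles, mean of eligibles) for every admissible sample."""
    units = [(True, y) for y in eligible_values] + [(False, 0)] * num_noneligible
    results = []
    for sample in combinations(range(len(units)), n):
        ys = [units[i][1] for i in sample if units[i][0]]
        if ys:  # samples without eligibles are inadmissible
            results.append((len(ys), Fraction(sum(ys), len(ys))))
    return results


# Hypothetical population: pN = 3 eligibles, qN = 4 non-eligibles, samples of n = 3.
eligible_values = [3, 5, 10]
results = admissible_sample_means(eligible_values, num_noneligible=4, n=3)

pN = len(eligible_values)
mu = Fraction(sum(eligible_values), pN)
sigma_sq = (sum(y * y for y in eligible_values) - Fraction(sum(eligible_values) ** 2, pN)) / (pN - 1)

expected_inverse = sum(Fraction(1, k) for k, _ in results) / len(results)
direct_variance = sum((m - mu) ** 2 for _, m in results) / len(results)
formula_variance = sigma_sq * (expected_inverse - Fraction(1, pN))

print(direct_variance, formula_variance)
```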
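3.2.2 Stratified Random Sample with a Mean per Sampling Unit Estimate:
In considering the drawing of a selective stratified random sample, one
further restriction is made. To be an admissible sample, each stratum
must be represented by at least one eligible. The total number of possible
samples under this restriction is

$$ \prod_{j=1}^{G} \left[ \binom{N_j}{n_j} - \binom{q_j N_j}{n_j} \right] $$

where $N_j$ is the number of sampling units and $q_j N_j$ the number of non-eligible
sampling units in the population in the jth stratum, and $n_j$ is that part of
the original sample (both eligibles and non-eligibles) allocated to the
jth stratum. The estimate $\bar{y}' = \frac{1}{pN} \sum_{j=1}^{G} p_j N_j\,\bar{y}'_j$ is an unbiased estimate of $\mu$.
(This also assumes that $p_j$ and of course $N_j$ are known.)

Proof:

$$ \mu = \frac{1}{pN} \sum_{j=1}^{G} p_j N_j\,\mu_j. $$

Since within each stratum the sampling is simple random, $\bar{y}'_j$ is an
unbiased estimate of $\mu_j$.

The variance of $\bar{y}'$ is

$$ V_{\bar{y}'} = \frac{1}{(pN)^2} \sum_{j=1}^{G} (p_j N_j)^2\,\sigma_j^2 \left[ E\!\left(\frac{1}{n'_j}\right) - \frac{1}{p_j N_j} \right] $$

where $n'_j$ is the number of eligibles in the sample from the jth stratum.

Proof:

$$ (\bar{y}' - \mu) = \frac{1}{pN} \sum_j p_j N_j\,(\bar{y}'_j - \mu_j) $$

$$ V_{\bar{y}'} = \frac{1}{(pN)^2} \left[ \sum_j (p_j N_j)^2\,E(\bar{y}'_j - \mu_j)^2 + \sum_{j \neq m} p_j N_j\,p_m N_m\,E(\bar{y}'_j - \mu_j)(\bar{y}'_m - \mu_m) \right] $$

for j ≠ m. From the proof of (35),

$$ E(\bar{y}'_j - \mu_j)^2 = \sigma_j^2 \left[ E\!\left(\frac{1}{n'_j}\right) - \frac{1}{p_j N_j} \right]. $$

Now the sample taken within any stratum is independent of the sample
taken within any other stratum. Hence $E(\bar{y}'_j - \mu_j)(\bar{y}'_m - \mu_m) = 0$.
Therefore

$$ V_{\bar{y}'} = \frac{1}{(pN)^2} \sum_{j=1}^{G} (p_j N_j)^2\,\sigma_j^2 \left[ E\!\left(\frac{1}{n'_j}\right) - \frac{1}{p_j N_j} \right]. $$

For the estimate of $V_{\bar{y}'}$ from sample data, substitute $s_j^2$ for $\sigma_j^2$ and
one over the number of eligibles which actually turned up in the sample
in each stratum for $E(1/n'_j)$. This estimate is unbiased assuming that
every stratum must be represented by at least two eligible observations.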
3.3 Empirical Investigation of $E(1/n')$
It would be worthwhile to determine the behavior of $E(1/n')$ in
reasonably large samples. Stephan [13] has given the expression for
$E(1/n')$ in terms of a factorial series. However in large samples, even
with the use of Stirling's approximation, the computations are laborious.
It was, therefore, decided to investigate the nature of $E(1/n')$
empirically. These calculations were made with the aid of International
Business Machines.
A deck of 1,000 cards with random numbers in the first 50 columns
was constructed. 10/ In 100 random cards from the deck, a 1 was punched
in column 51. On the other 900 cards, a zero was punched in column 51.
Similarly in column 52, 200 random cards received a 1 and 800 a zero; in
column 53, 300 were punched 1 and 700 zero; in column 54, 400 were punched
1 and 600 zero; and in column 55, 500 random cards received a value of 1
and 500 a value of zero. Then the 1,000-card deck (population) was
thoroughly mixed and 100 cards selected at random from the deck. The
number of cards with an eligible (1 in a given column) was tabulated for
each of the columns 51 through 55. In all, 50 such samples were drawn. To
obtain estimates for p above .5, the zeros were considered as eligibles to
complete the range of p from .1 to .9. Therefore, estimates of n' for p =
.1 and .9, .2 and .8, .3 and .7, .4 and .6, and .5 and .5 are not
independent but it is not felt this invalidates the inferences. Similarly, to
obtain samples of 200, two samples of 100 were combined so that 25 samples
were available. This is not strictly sampling without replacement since
any one card has a chance of being in the sample of 200 twice. Again
this should not invalidate the results. Finally, samples of 500 were
constructed by combining five of the samples of 100. There were 10 of
these samples. The average estimated values of $E(1/n')$,
their estimated standard errors and coefficients of variation are shown in
Table 5 along with a comparison of the estimated $E(1/n')$ and $1/pn$. The
standard errors are all so large that the confidence limits about the
average estimates include the value of $1/pn$. However a rather definite
pattern emerges. The estimate of $E(1/n')$ is always larger than $1/pn$ with
the exception of p = .3 and one value for p = .5 for the average of the
samples of size 100; p = .3 and one value for p = .5 for the average of the
samples of size 200; and p = .2, p = .3 and one value for p = .5 for the
average of the samples of size 500.

10/ R. J. Monroe and J. S. Hunter, both of the Institute of Statistics,
designed the process for obtaining the data from punched cards.
Table 5. Estimates of $E(1/n')$ and its Standard Error from a Number of
Varying Sized Samples Drawn at Random from a Hypothetical
Population and a Comparison of the Estimate of $E(1/n')$ with $1/pn$.

  Sample   No. of                          Estimate of    Estimate of          Standard        C.V.
  size     samples    p     1/pn           E(1/n')        E(1/n') - 1/pn       error           (%)

  100      50        .1    .100000        .107543         .007543             .029771         27.68
  100      50        .2    .050000        .051253         .001253             .010440         20.37
  100      50        .3    .033333        .033261        -.000072             .004178         12.56
  100      50        .4    .025000        .025402         .000402             .003517         13.84
  100      50        .5    .020000        .020393         .000393             .002918         14.18
  100      50        .5    .020000        .020042         .000042             .002089         10.42
  100      50        .6    .016667        .016804         .000137             .001438          8.56
  100      50        .7    .014286        .014425         .000139             .000733          5.08
  100      50        .8    .012500        .012581         .000081             .000666          5.29
  100      50        .9    .011111        .011139         .000028             .000430          3.86

  200      25        .1    .0500000       .0516235        .0016235            .0105909        20.53
  200      25        .2    .0250000       .0250092        .0000092            .00319897       12.79
  200      25        .3    .0166667       .0165253       -.0001414            .00153905        9.31
  200      25        .4    .0125000       .0125555        .0000555            .00101996        8.12
  200      25        .5    .0100000       .0101424        .0001424            .000792116       7.81
  200      25        .5    .01000000      .00997081      -.00002919           .000754012       7.57
  200      25        .6    .00833333      .00836182       .00002849           .000413073       4.94
  200      25        .7    .00714286      .00720486       .00006200           .000281260       3.90
  200      25        .8    .00625000      .00627961       .00002961           .000199930       3.18
  200      25        .9    .00555556      .00556493       .00000937           .000137170       2.46

  500      10        .1    .0200000       .0202114        .0002114            .00305147       15.10
  500      10        .2    .01000000      .00995935      -.00004065           .00112141       11.26
  500      10        .3    .00666667      .00658018      -.00008649           .000412426       6.27
  500      10        .4    .00500000      .00500233       .00000233           .000236293       4.72
  500      10        .5    .00400000      .00404310       .00004310           .000211056       5.22
  500      10        .5    .00400000      .00397428      -.00002572           .000177006       4.47
  500      10        .6    .00333333      .00333951       .00000618           .000102035       3.06
  500      10        .7    .00285714      .00287958       .00002244           .0000774120      2.69
  500      10        .8    .00250000      .00251108       .00001108           .0000677868      2.70
  500      10        .9    .00222222      .00222527       .00000305           .0000377221      1.70
With the exceptions previously noted, it appears that

$$ \frac{1}{pn} < E\!\left(\frac{1}{n'}\right) < \frac{1}{pn - 1}. $$

Although these empirical results are scanty, with samples and populations
of this size, the differences between $1/pn$ and $E(1/n')$ are negligible and
can be disregarded for comparison purposes in most practical surveys.
Further there is evidence that the values converge as p approaches 1, which
is reasonable. Obviously when p = 1, the two are identical. The standard
errors also decrease as p increases but at a faster rate than the means
decrease, as indicated by the descending order of the coefficients of
variation as p is increased.
If $1/pn$ is used as the value of $E(1/n')$, the variance of a sample
total will be slightly underestimated; so to be on the safe side, one
might use $1/(pn - 1)$.
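
The card experiment estimated $E(1/n')$ by repeated sampling. As a cross-check, the same quantity can be computed exactly for a finite population by summing over the counts $f_{n'}$ of Table 4. The sketch below is not the procedure used in the thesis; it assumes sampling without replacement with all-non-eligible samples excluded, and the function name is illustrative.

```python
from fractions import Fraction
from math import comb


def expected_inverse_eligibles(N, n, num_eligible):
    """Exact E(1/n') over admissible samples (n' >= 1) of n units drawn without
    replacement from N units, of which num_eligible are eligible."""
    num_noneligible = N - num_eligible
    total_samples = 0
    weighted_sum = Fraction(0)
    for k in range(1, min(n, num_eligible) + 1):
        # f_k = number of samples with exactly k eligibles (zero if impossible)
        f_k = comb(num_eligible, k) * comb(num_noneligible, n - k)
        total_samples += f_k
        weighted_sum += Fraction(f_k, k)
    return float(weighted_sum / total_samples)


# Same setting as the card deck: N = 1,000 cards and samples of n = 100.
for p in (0.1, 0.3, 0.5, 0.9):
    num_eligible = round(p * 1000)
    e_inv = expected_inverse_eligibles(1000, 100, num_eligible)
    print(f"p={p}: 1/pn={1 / (p * 100):.6f}  E(1/n')={e_inv:.6f}")
```

Integer binomial coefficients and exact fractions are used so that no precision is lost before the final conversion to a float.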
3.4 Relative Efficiency of the Selective Sample Compared with Samples
from the Selective and Original Populations, Using the Same Sampling
Rates.
3.4.1 Simple Random Sample with a Mean per Sampling Unit Estimate:
The relative efficiency of a sample of pn from the selective population to
a selective sample of n' (n' eligibles out of a sample of n) is given by

$$ \text{R.E.} = \frac{\sigma^2 \left[ E\!\left(\frac{1}{n'}\right) - \frac{1}{pN} \right]}{\sigma^2 \left[ \frac{1}{pn} - \frac{1}{pN} \right]} = \frac{E\!\left(\frac{1}{n'}\right) - \frac{1}{pN}}{\frac{1}{pn} - \frac{1}{pN}}. \qquad (39) $$

The efficiency of a selective sample of n' eligibles can also be
compared with the efficiency of a sample of n from the original population
in which the non-eligibles are retained in the sample. In order to simplify
the expression for the relative efficiency, $\sigma_o^2$ is defined, as in Chapter II,
as $\sigma_o^2 = \sum_{i=1}^{N} (y_{oi} - \mu_o)^2 / N$, yet the finite population correction factor applied
is $\frac{N-n}{N}$ rather than the correct factor $\frac{N-n}{N-1}$. There will be little difference
in practical situations if N is large enough so that $\frac{1}{N}$ is about equal to
$\frac{1}{N-1}$. Under these assumptions, the relative efficiency is given by

$$ \text{R.E.} = \frac{(CV)_o^2 \left( \frac{1}{n} - \frac{1}{N} \right)}{\left[ p\,(CV)_o^2 - q \right] \left[ E\!\left(\frac{1}{n'}\right) - \frac{1}{pN} \right]}. \qquad (40) $$

Derivation:
Let $V_{t_o}$ be the estimated variance of the total for the sample from the
original population and $V_{t'}$ be the estimated variance of the total for the
selective sample. Then

$$ \text{R.E.} = \frac{V_{t_o}}{V_{t'}} = \frac{N^2\,\sigma_o^2 \left( \frac{1}{n} - \frac{1}{N} \right)}{(pN)^2\,\sigma^2 \left[ E\!\left(\frac{1}{n'}\right) - \frac{1}{pN} \right]}. $$

If the finite population correction factor can be ignored, (40)
reduces to

$$ \text{R.E.} = \frac{1}{pn\,E\!\left(\frac{1}{n'}\right) \left[ 1 - \dfrac{q}{p\,(CV)_o^2} \right]}. \qquad (41) $$

If $E(1/n') = 1/pn$, which would occur if the eligibles were known in
advance, (41) would revert to (1). It was shown that the conditions of
equation (1) always resulted in a gain in statistical efficiency.
Therefore if $E(1/n') < 1/pn$, a gain in statistical efficiency is always
evidenced. If $E(1/n') > 1/pn$, as it appears from the empirical results in section 3.3,
a gain may still result, depending upon the population
values of p, $E(1/n')$ and $(CV)_o^2$.
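
A small numerical sketch may help in reading equations (40) and (41). It assumes (40) reads $\text{R.E.} = (CV)_o^2 (1/n - 1/N) / \{[p(CV)_o^2 - q][E(1/n') - 1/pN]\}$ and that (41) is the same expression with the finite population correction dropped. The function name and the inputs are hypothetical.

```python
def re_selective_sample_vs_original(p, cv_o_squared, n, e_inv, N=None):
    """Relative efficiency of a selective sample to a sample from the original
    population for a mean per sampling unit estimate.

    With N given, the finite population correction is kept (assumed form of (40));
    with N omitted it is ignored (assumed form of (41)). e_inv stands for E(1/n').
    """
    q = 1.0 - p
    if N is None:
        return 1.0 / (p * n * e_inv * (1.0 - q / (p * cv_o_squared)))
    numerator = cv_o_squared * (1.0 / n - 1.0 / N)
    denominator = (p * cv_o_squared - q) * (e_inv - 1.0 / (p * N))
    return numerator / denominator


# Hypothetical inputs: p = .2, (CV)_o^2 = 9, n = 100, N = 1,000.
p, cv_o_squared, n, N = 0.2, 9.0, 100, 1000

# If the eligibles were known in advance, E(1/n') would equal 1/pn.
known_in_advance = re_selective_sample_vs_original(p, cv_o_squared, n, 1.0 / (p * n))
# A slightly larger E(1/n'), as the empirical results suggest, lowers the gain.
with_fpc = re_selective_sample_vs_original(p, cv_o_squared, n, 0.0513, N=N)
print(known_in_advance, with_fpc)
```

When $E(1/n') = 1/pn$ both forms give the same value, $p(CV)_o^2 / (p(CV)_o^2 - q)$, whether or not the finite population correction is retained.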
3.4.2 Simple Random Sample with a Ratio Estimate: In comparing a
selective sample with samples from the original and selective populations,
there is little difference among the three comparisons if a ratio estimate
is used and the sampling rate is the same. If the fact that 1/N is not
equal to 1/(N-1) is ignored, the relative efficiency of a selective sample
to that of a sample from the original population is approximately

$$ \text{R.E.} = \frac{\frac{1}{n} - \frac{1}{N}}{p\,E\!\left(\frac{1}{n'}\right) - \frac{1}{N}}. \qquad (42) $$

Proof:

$$ \text{R.E.} = \frac{V_{t_o}}{V_{t'}}. $$

Now $\mu_{y'}$, $\sigma^2_{y'}$ and $\sigma^2_{x'}$ are equal to $\mu_y$, $\sigma^2_y$ and $\sigma^2_x$ respectively from (34)
and (35). Hence from (11)

$$ \text{R.E.} = \frac{\frac{1}{n} - \frac{1}{N}}{p \left[ E\!\left(\frac{1}{n'}\right) - \frac{1}{pN} \right]} = \frac{\frac{1}{n} - \frac{1}{N}}{p\,E\!\left(\frac{1}{n'}\right) - \frac{1}{N}}. $$

Similarly the relative efficiency of a sample from the selective
population to a selective sample from the original population when the
sampling rates are the same is approximately

$$ \text{R.E.} = \frac{E\!\left(\frac{1}{n'}\right) - \frac{1}{pN}}{\frac{1}{pn} - \frac{1}{pN}}. \qquad (43) $$

Proof: By (34) and (35) the ratio of the two variances reduces to

$$ \text{R.E.} = \frac{E\!\left(\frac{1}{n'}\right) - \frac{1}{pN}}{\frac{1}{pn} - \frac{1}{pN}}. $$

Thus, both relative efficiencies (42) and (43) depend upon the
relation of $E(1/n')$ and $1/pn$. If the two were equal, then both relative
efficiencies would be 1. From section 3.3 it appears that $E(1/n') > 1/pn$,
so that (42) is slightly less than 1 while (43) is slightly
greater than 1. However, with reasonably large sample sizes, the relative
efficiencies would probably not differ greatly from 1, so that there
would be little to choose on the basis of respective variances.
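
The behavior of (42) and (43) can be illustrated numerically. The sketch assumes (42) reads $(1/n - 1/N) / (p\,E(1/n') - 1/N)$ and (43) reads $(E(1/n') - 1/pN) / (1/pn - 1/pN)$; the function names are illustrative, and the value used for $E(1/n')$ is the Table 5 estimate for n = 100 and p = .2 drawn from the 1,000-card population.

```python
def re_ratio_selective_sample_vs_original(p, n, N, e_inv):
    """Assumed form of (42): selective sample relative to a sample from the
    original population, ratio estimate. e_inv stands for E(1/n')."""
    return (1.0 / n - 1.0 / N) / (p * e_inv - 1.0 / N)


def re_ratio_selective_population_vs_selective_sample(p, n, N, e_inv):
    """Assumed form of (43): sample from the selective population relative to
    a selective sample, ratio estimate."""
    return (e_inv - 1.0 / (p * N)) / (1.0 / (p * n) - 1.0 / (p * N))


p, n, N = 0.2, 100, 1000
e_inv = 0.051253  # estimate of E(1/n') from Table 5 for n = 100, p = .2

re_42 = re_ratio_selective_sample_vs_original(p, n, N, e_inv)
re_43 = re_ratio_selective_population_vs_selective_sample(p, n, N, e_inv)
print(f"(42) = {re_42:.4f}, (43) = {re_43:.4f}, product = {re_42 * re_43:.4f}")
```

Under these forms the product of the two relative efficiencies is 1 for any value of $E(1/n')$, which is one way of seeing that a ratio estimate from the selective population and one from the original population have about the same variance at the same sampling rate.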
In section 2.1.4 it was indicated that the biases in a ratio estimate
using sample data from the original and selective populations would be
different. Similarly, the bias in a ratio estimate using data from the
selective sample will be different from either of the other two, i.e.,

$$ E\!\left(\frac{\bar{y}_o}{\bar{x}_o}\right) \neq E\!\left(\frac{\bar{y}'}{\bar{x}'}\right) \neq E\!\left(\frac{\bar{y}}{\bar{x}}\right). $$

If the sampling rate is the same for each of the three procedures, the
difference in the bias may be expressed as in (45), (46) and (47). (Also if
the relation of y to x is a straight line through the origin, then all
expectations are equal. Since $\frac{E(\bar{y}_o)}{E(\bar{x}_o)} = \frac{E(\bar{y}')}{E(\bar{x}')} = \frac{E(\bar{y})}{E(\bar{x})}$, the differences in the
expectations of the ratios are equal to the differences in the biases.) In
considering the difference in the bias between the samples including the
zeros and those excluding the zeros, a ratio of 0/0 is considered to be
0; then

$$ E\!\left(\frac{\bar{y}_o}{\bar{x}_o}\right) - E\!\left(\frac{\bar{y}'}{\bar{x}'}\right) = -\,\frac{\binom{qN}{n}}{\binom{N}{n}}\;E\!\left(\frac{\bar{y}'}{\bar{x}'}\right). \qquad (45) $$

Proof: Of the $\binom{N}{n}$ possible samples from the original population, $\binom{qN}{n}$
contain no eligibles. These contribute nothing to the expectation, since all
these $\bar{y}_o$ and $\bar{x}_o$ values are 0, and their ratio is taken to be 0. Now for a
given n', the zeros retained in the sample do not alter the ratio, so that
$\bar{y}_o / \bar{x}_o = \bar{y}' / \bar{x}'$ in each of the remaining $\sum f_{n'} = \binom{N}{n} - \binom{qN}{n}$ samples.
Hence

$$ E\!\left(\frac{\bar{y}_o}{\bar{x}_o}\right) = \frac{\binom{N}{n} - \binom{qN}{n}}{\binom{N}{n}}\;E\!\left(\frac{\bar{y}'}{\bar{x}'}\right) $$

and

$$ E\!\left(\frac{\bar{y}_o}{\bar{x}_o}\right) - E\!\left(\frac{\bar{y}'}{\bar{x}'}\right) = -\,\frac{\binom{qN}{n}}{\binom{N}{n}}\;E\!\left(\frac{\bar{y}'}{\bar{x}'}\right). $$
Although the other two differences do not work out as neatly as (45),
the difference between the bias of a ratio estimate made from a selective
sample and a sample from the selective population is given by

[equation (46) illegible in the source]

Similarly the difference between the bias of a ratio estimate from a
sample from the selective population and a sample from the original
population is

[equation (47) illegible in the source]
3.4.3 Simple Random Sample with a Regression Estimate: A regression
estimate of the population total, T, can also be made from the data
obtained in a selective sample, and can be compared with regression estimates
from samples drawn from the original and selective populations. The
relative efficiency of a sample from the selective population to a
selective sample is

$$ \text{R.E.} = \frac{E\!\left(\frac{1}{n'}\right) - \frac{1}{pN}}{\frac{1}{pn} - \frac{1}{pN}}. \qquad (48) $$

Proof:

$$ \text{R.E.} = \frac{V_{t'}}{V_t}. $$

From (34) and (35), $\mu_{y'}$, $\sigma^2_{y'}$ and $\rho'$ are equal to $\mu_y$, $\sigma^2_y$ and $\rho$ respectively.
Hence

$$ \text{R.E.} = \frac{E\!\left(\frac{1}{n'}\right) - \frac{1}{pN}}{\frac{1}{pn} - \frac{1}{pN}}. $$

Again the relative efficiency depends entirely upon the relation of
$E(1/n')$ to $1/pn$. From the empirical results in section 3.3, expression
(48) will probably be slightly greater than 1.
The relative efficiency of a selective sample to a sample from the
original population is given as

[equation (49) and its proof illegible in the source]

The relative efficiency here depends not only on the relation of
$E(1/n')$ to $1/pn$ but also on the relation of $\rho_o (CV)_{y_o}$ to $(CV)_{x_o}$. (See
section 2.1.5). If $\rho_o (CV)_{y_o}$ is quite different from $(CV)_{x_o}$ it is
reasonable to expect a gain in efficiency of the selective sample over
a sample from the original population.
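3.4.4 Stratified Random Sample with a Mean per Sampling Unit Estimate: (i) The relative efficiency of a stratified random sample drawn from the selective population with respect to a selective sample is

$$\mathrm{R.E.} = \frac{\sum_{j=1}^{G} (p_jN_j)^2 \sigma_j^2 \left[E\left(\dfrac{1}{n_j'}\right) - \dfrac{1}{p_jN_j}\right]}{\sum_{j=1}^{G} (p_jN_j)^2 \sigma_j^2 \left[\dfrac{1}{p_jn_j} - \dfrac{1}{p_jN_j}\right]} \tag{50}$$

If $\sigma_j^2 = \sigma^2$, the relative efficiency reduces to

$$\mathrm{R.E.} = \frac{\sum_{j=1}^{G} (p_jN_j)^2 \left[E\left(\dfrac{1}{n_j'}\right) - \dfrac{1}{p_jN_j}\right]}{\sum_{j=1}^{G} (p_jN_j)^2 \left[\dfrac{1}{p_jn_j} - \dfrac{1}{p_jN_j}\right]}.$$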
(ii) The relative efficiency of a selective sample with respect to a sample from the original population is

$$\mathrm{R.E.} = \frac{\sum_{j=1}^{G} N_j^2 \sigma_{oj}^2 \left[\dfrac{1}{n_j} - \dfrac{1}{N_j}\right]}{\sum_{j=1}^{G} N_j^2 \sigma_{oj}^2 \left[p_jE\left(\dfrac{1}{n_j'}\right) - \dfrac{1}{N_j}\right] - \sum_{j=1}^{G} N_j^2 \mu_{oj}^2 q_j \left[E\left(\dfrac{1}{n_j'}\right) - \dfrac{1}{p_jN_j}\right]} \tag{52}$$

Proof: Now

$$p_j\sigma_j^2 = \sigma_{oj}^2 - \frac{q_j}{p_j}\mu_{oj}^2 .$$

$\sigma_{oj}^2$ is defined as $\sum_{i=1}^{N_j}(y_{ij} - \mu_{oj})^2 / N_j$, and consequently the appropriate finite population correction factor to apply in the calculation of $V_t$ is $(N_j - n_j)/(N_j - 1)$. For algebraic convenience the f.p.c. used was $(N_j - n_j)/N_j$. Therefore the error is $(N_j - 1)/N_j$ and could be corrected by multiplying $V_t$ by the reciprocal, or $N_j/(N_j - 1)$. The effect of this error is to slightly underestimate $V_t$, but in most populations requiring sampling, the departure of $N_j/(N_j - 1)$ from 1 is negligible. Substituting for $p_j\sigma_j^2$ in the variance of the selective sample gives (52).
The numerator and the first term of the denominator are the same with the exception of the term $1/n_j$ in the numerator and $p_jE(1/n_j')$ in the denominator. If $1/n_j = p_jE(1/n_j')$, the formula becomes that of the relative efficiency of a sample from the selective population to a sample from the original population.
The second term of the denominator,

$$\sum_{j=1}^{G} N_j^2 \mu_{oj}^2 q_j \left[E\left(\frac{1}{n_j'}\right) - \frac{1}{p_jN_j}\right],$$

is always a positive quantity and always subtracts from the first term. Although $p_jE(1/n_j')$ is probably $> 1/n_j$, if

$$\sum_{j=1}^{G} N_j^2 \mu_{oj}^2 q_j \left[E\left(\frac{1}{n_j'}\right) - \frac{1}{p_jN_j}\right] > \sum_{j=1}^{G} N_j^2 \sigma_{oj}^2 \left[p_jE\left(\frac{1}{n_j'}\right) - \frac{1}{n_j}\right] \tag{53}$$

there will be a gain in statistical efficiency from using the selective sample. In most situations, inequality (53) will be satisfied.

3.5 Cost Functions
In considering the cost of obtaining sample information under alternative procedures, three comparisons can be made. These are

(i) $C_o$ versus $C$
(ii) $C'$ versus $C$
(iii) $C'$ versus $C_o$

where $C$ is the cost of sample data from the selective population, $C'$ is the cost of sample data from the selective sample and $C_o$ is the cost of sample data from the original population.

Cost functions for $C$ and $C_o$ have been considered previously in Section 2.2. These functions would hold in the comparisons needed in this section also. There remains to be established a cost function for $C'$. In so far as gathering the information is concerned (and this is the only cost we have considered), $C'$ should be identical with $C_o$. Therefore a comparison of the relative efficiency of $t'$ and $t_o$ on the basis of both cost and precision reduces to a comparison of the variances, since the costs are equal. In comparing the relative efficiency per unit cost of $t'$ and $t$ as estimates of the population total, the cost function for $C_o$ can be substituted for $C'$. For example, if

$$\frac{V_{t'}}{V_t} > \frac{C}{C_o},$$

the practice of a complete enumeration followed by a sampling of the selective population in subsequent surveys is to be preferred over sampling the original population and discarding the zeros from the sample.
CHAPTER IV
AREA SAMPLING

4.1 Investigation of the Geographic Distribution of Some Scarcity Items
In practice, a situation encountered frequently in estimating scarcity items is that no lists of any kind are available. The alternatives to consider in these cases are (i) an area sample and (ii) the development of a list of eligibles. Most area samples can be classified in the category of "general purpose" samples and in general have a low efficiency in estimating scarcity items. Some general rules can be given for improving the estimate, but again an objective method for selecting alternatives to satisfy the principle of greater precision at a given cost is highly desirable. For example, if the scarcity item to be estimated is concentrated geographically, the area population can be delineated to include only those Minor Civil Divisions or Counties which contain the item. It has been shown [3] that this procedure of including only townships which contained commercial peach orchards in the Sandhills area of North Carolina did not give acceptable accuracy with an area sample unless optimum allocation was employed. In general, the within stratum variances necessary to make this allocation will not be known in advance.
The basic distribution of eligible sampling units in a geographically concentrated area was investigated for several populations. It was believed that this distribution resembled the contagious type of distribution, i.e., one eligible sampling unit tended to be geographically associated with another. To test this premise, the fit of a contagious distribution was compared to the fit of the binomial and the Poisson distributions. These three distributions were fitted to data from several populations in two different geographic areas. One area considered was those townships in the Sandhills of North Carolina having at least one commercial peach orchard. 13/ The scarcity item considered here was a commercial peach grower. The other area was Allegan County, Michigan, and the items considered were fruit growers, apple growers, peach growers, pear growers and the owners of grape vineyards. 14/ In both areas, an attempt was made to maintain a segment size of five farms each, and yet maintain a reasonably identifiable segment. In order to meet the latter criterion, a little variation was allowed in the number of farms per segment, which accounts for the fact that some segments show as many as 7 orchards. The means and variances of the number of eligibles per segment are shown in Table 6.
In estimating the parameters for the binomial and the Poisson distributions, the average number of farms per segment was taken as 5 even though a few segments had more than 5 farms, as indicated previously. The contagious distribution considered is also known as the Polya-Eggenberger, the negative binomial or the binomial with the negative index. The discussion found in Kendall [8] was followed in estimating the parameters, p and c, for the contagious distribution. The observed values of $\mu$ and $\sigma^2$ are substituted into the following simultaneous equations and these equations solved for p and c.
13/ The data on commercial peach orchards in the Sandhills area of North Carolina were collected under the supervision of Frank Parker and R. P. Handy of the North Carolina Cooperative Crop Reporting Service, which is maintained jointly by the United States Department of Agriculture and the North Carolina Department of Agriculture.

14/ The Michigan data were made available by the Michigan Cooperative Crop Reporting Service. The Allegan County Survey was conducted by C. J. Borum and R. V. Norman of the Michigan office and C. D. Palmer and G. D. Simpson of the Washington office of B.A.E.
Table 6. Means and Variances of Several Populations of Scarcity Items Located in Two Geographically Concentrated Areas.

Area                       Population                   mean      variance   variance/mean
Sandhills of N.C.          Commercial Peach Orchards    .24855     .60891       2.45
Allegan County, Michigan   Fruit Orchards               .44315    1.13704       2.55
                           Apple Orchards               .37842     .90405       2.39
                           Peach Orchards               .31535     .69782       2.21
                           Pear Orchards                .30539     .68905       2.26
                           Grape Vineyards              .07884     .09760       1.23
The probabilities of 0, 1, 2, ... successes, according to Kendall, are given by

$$\left(\frac{c}{c+1}\right)^{p},\quad \left(\frac{c}{c+1}\right)^{p}\frac{p}{c+1},\quad \left(\frac{c}{c+1}\right)^{p}\frac{p(p+1)}{2!\,(c+1)^{2}},\ \ldots \tag{56}$$

The sum of these probabilities for a fixed size of segment will not equal 1. For example, the sum of the probabilities for 0, 1, 2, 3, 4, 5 and 6 orchards per segment is only .9983 for the Sandhills data. For such a truncated distribution it is necessary to divide each of the probabilities in (56) by the sum of the probabilities. These adjusted probabilities, when multiplied by the total number of segments in the universe, give the computed frequencies for the contagious distribution. The computed frequencies for the three distributions are compared with the observed frequencies for each population in Tables 7 through 12.
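The fitting procedure described above can be sketched in a few lines. This is an illustration, not the author's worksheet: it assumes the usual moment relations for the distribution in (56), namely mean = p/c and variance = p(c + 1)/c^2, and it takes the Sandhills mean as 257 orchards in 1,034 segments with the variance .60891 from Table 6. All function names are hypothetical.

```python
def fit_contagious(mean, var):
    """Moment estimates of p and c, assuming mean = p/c and var = p(c+1)/c**2."""
    c = mean / (var - mean)
    p = mean * c
    return p, c


def contagious_probs(p, c, max_k):
    """Probabilities of 0..max_k successes from (56), via the term-to-term ratio."""
    probs = [(c / (c + 1)) ** p]
    for k in range(1, max_k + 1):
        probs.append(probs[-1] * (p + k - 1) / (k * (c + 1)))
    return probs


def computed_frequencies(mean, var, max_k, n_segments):
    """Truncated sum of (56) and the adjusted frequencies per segment count."""
    p, c = fit_contagious(mean, var)
    probs = contagious_probs(p, c, max_k)
    total = sum(probs)
    return total, [n_segments * q / total for q in probs]


if __name__ == "__main__":
    total, freqs = computed_frequencies(257 / 1034, 0.60891, 6, 1034)
    print(round(total, 4), [round(f, 1) for f in freqs])
```

Under these assumptions the truncated sum agrees with the .9983 quoted in the text, and the adjusted frequencies fall within about one unit of the contagious column of Table 7.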
Table 7. Observed and Computed Frequency Distributions of Commercial Peach Orchards in the Sandhills Area of North Carolina

Number of orchards                           Computed
per segment          Observed    Contagious   Binomial   Poisson
      0                 895          889         803        808
      1                  81           90         208        199
      2                  26           31          22         25
      3                  14           13           1          2
      4                  10            6           0          0
      5                   6            3           0          0
      6                   2            2           0          0
    Total              1034         1034        1034       1034

Table 8. Observed and Computed Frequency Distributions of Fruit Orchards in Allegan County, Michigan

Number of orchards                           Computed
per segment          Observed    Contagious   Binomial   Poisson
      0                 951          925         760        776
      1                 121          160         367        342
      2                  55           62          71         75
      3                  38           29           7         11
      4                  20           15           0          1
      5                  13            8           0          0
      6                   5            4           0          0
      7                   2            2           0          0
    Total              1205         1205        1205       1205
Table 9. Observed and Computed Frequency Distributions of Apple Orchards in Allegan County, Michigan

Number of orchards                           Computed
per segment          Observed    Contagious   Binomial   Poisson
      0                 970          952         815        827
      1                 126          150         332        311
      2                  43           56          54         59
      3                  39           25           4          7
      4                  12           12           0          1
      5                  12            6           0          0
      6                   2            3           0          0
      7                   1            1           0          0
    Total              1205         1205        1205       1205

Table 10. Observed and Computed Frequency Distributions of Peach Orchards in Allegan County, Michigan

Number of orchards                           Computed
per segment          Observed    Contagious   Binomial   Poisson
      0                 996          982         871        881
      1                 116          140         292        276
      2                  47           48          39         43
      3                  22           20           3          5
      4                  17            9           0          0
      5                   6            4           0          0
      6                   1            2           0          0
    Total              1205         1205        1205       1205
Table 11. Observed and Computed Frequency Distributions of Pear Orchards in Allegan County, Michigan

Number of orchards                           Computed
per segment          Observed    Contagious   Binomial   Poisson
      0                1002          991         881        889
      1                 119          134         285        270
      2                  33           46          37         41
      3                  30           19           2          4
      4                  14            9           0          1
      5                   5            4           0          0
      6                   2            2           0          0
    Total              1205         1205        1205       1205

Table 12. Observed and Computed Frequency Distributions of Grape Vineyards in Allegan County, Michigan

Number of vineyards                          Computed
per segment          Observed    Contagious   Binomial   Poisson
      0                1125         1123        1113       1114
      1                  65           72          89         87
      2                  15            9           3          4
      3                   0            1           0          0
      4                   0            0           0          0
      5                   0            0           0          0
      6                   0            0           0          0
    Total              1205         1205        1205       1205
The tables show clearly that the contagious distribution is much superior to either the binomial or the Poisson in describing the observed frequency distribution for all populations considered.
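One simple way to put a number on "much superior" is to total the absolute differences between observed and computed frequencies. The sketch below does this for the Sandhills data, using the Table 7 frequencies as they are read here; the choice of summary measure is an assumption made for illustration and is not the author's criterion.

```python
# Table 7 frequencies for 0..6 orchards per segment (as read from the table).
observed = [895, 81, 26, 14, 10, 6, 2]
computed = {
    "contagious": [889, 90, 31, 13, 6, 3, 2],
    "binomial": [803, 208, 22, 1, 0, 0, 0],
    "poisson": [808, 199, 25, 2, 0, 0, 0],
}


def total_abs_deviation(obs, fit):
    """Sum over cells of |observed - computed|."""
    return sum(abs(o - f) for o, f in zip(obs, fit))


deviations = {name: total_abs_deviation(observed, fit)
              for name, fit in computed.items()}

if __name__ == "__main__":
    for name, dev in deviations.items():
        print(name, dev)
```

On this measure the contagious fit misplaces far fewer segments than either the binomial or the Poisson fit.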
4.2 Sub-sampling

4.2.1 Introduction and Notation: A comparison of a sample from the selective population with one from the original population, when a sub-sampling approach is contemplated, fits more logically into a discussion of an area sample, since most area samples are based on some sub-sampling procedure. Sub-sampling implies that the population to be sampled is divided into a number of large units which are generally known as primary sampling units (psu's). Each of these psu's is further sub-divided into smaller elements known as sub-sampling units. In practice a sample of psu's is selected from the population of psu's and a sample of sub-units is then selected from each psu in the sample. Either selection may be made at random or systematically, with equal probability or probability proportional to some measure of size. Sub-sampling is sometimes referred to as "incomplete stratification", since the psu's can be considered as strata, and samples are selected from only part of the strata. If every psu selected were enumerated completely, this type of sampling would revert to simple random sampling.

The notation presented in tabular form in Table 13 is needed for making sub-sampling comparisons among the selective sample and samples from the selective and original populations.

We assume that the proportion of eligibles per psu is known in advance so that in any of the comparisons each psu will be considered to contain at least one eligible sub-sampling unit. In sampling for most agricultural or socio-economic items the psu is taken as the county or the Minor Civil Division (M.C.D., e.g., township). For areas this large, Census information usually is available to indicate whether the psu contained any
Table 13. Notation for Sub-sampling Comparisons

Definition                                  Symbol
                                            Original      Selective     Selective
                                            population    population    sample
Number of eligible pauls in population
Number of eligible pauls in sample
Proportion of eligible sub-units in i th psu
Number of sub-units in i th psu
N
n
n
n
1
:PiNi
Number of sub-units in the sample in i th pau
Value of jth sub-unit in i th psu
Sample Dean of sub-units in i th psu
_
lIT
Pimi
7ij
n
e.g•• Yio
i
= j=l
~ Yij/mi
Population nenn of sulJ-unitsin 1th psu
M
i
=
~
Yij/Mi
~,u/io
j=l
Sanple nean of psu totals
o.g..
n
e.g..
~
i=l
Ni
/-tofn ::: y
S
...Y
s
-Ys
-Ys
PopUlation mean of psu totals
e.g.~
N
i:
Mi/'''ti N == /o.(/s
i=l
.
Variance botween sub-units in i th psu
Variance between psu totals
eligibles, at least for the census years, and those psu's which contained no eligibles could be removed from the original population. However, if the psu were made smaller than a township, it is doubtful that previous information would be available for determining whether or not the psu contained an eligible. A selective sample is not considered to be admissible unless, from every psu selected, at least one eligible sub-sampling unit is also selected. A similar assumption was made for stratified random sampling in section 3.4.4.
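4.2.2 Unbiased Estimates Derived from PSU Totals: If the psu's are selected with equal probability, even though they may vary greatly in size, an unbiased estimate can be made based on psu totals. This estimate is

$$t_o = \frac{N}{n}\sum_{i=1}^{n} M_i\bar{y}_{io} \tag{57}$$

Proof:

$$E(t_o) = \frac{N}{n}\,E\left(\sum_{i=1}^{n} M_i\bar{y}_{io}\right) = \frac{N}{n}\,E\left(\sum_{i=1}^{n} M_i\mu_{io}\right),$$

where E indicates an average over the $\binom{N}{n}$ possible samples of n. In these $\binom{N}{n}$ samples, there will be $n\binom{N}{n}$ values of $M_i\mu_{io}$, so that each of the N values of $M_i\mu_{io}$ in the population will be represented $\frac{n}{N}\binom{N}{n}$ times. Hence

$$E(t_o) = \frac{N}{n}\cdot\frac{n}{N}\sum_{i=1}^{N} M_i\mu_{io} = T.$$

If the variation in size of psu's is great, this estimate will usually have a large variance, since the between component of the variance reflects not only variations among psu means on a sub-unit basis but also variations in the number of sub-units per psu. The variance of the estimated total is expressed as

$$V_{t_o} = \frac{N}{n}\sum_{i=1}^{N} M_i^2\sigma_{io}^2\left(\frac{1}{m_i} - \frac{1}{M_i}\right) + N^2\sigma_s^2\left(\frac{1}{n} - \frac{1}{N}\right) \tag{58}$$

Proof:

$$(t_o - T) = \frac{N}{n}\sum_{i=1}^{n} M_i\bar{y}_{io} - \sum_{i=1}^{N} M_i\mu_{io}.$$

Add and subtract $\frac{N}{n}\sum_{i=1}^{n} M_i\mu_{io}$:

$$(t_o - T) = \frac{N}{n}\sum_{i=1}^{n} M_i(\bar{y}_{io} - \mu_{io}) + N(\bar{y}_s - \mu_s),$$

$$V_{t_o} = E(t_o - T)^2 = \frac{N}{n}\sum_{i=1}^{N} M_i^2\sigma_{io}^2\left(\frac{1}{m_i} - \frac{1}{M_i}\right) + N^2\sigma_s^2\left(\frac{1}{n} - \frac{1}{N}\right).$$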
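For a sample from the selective population, an unbiased estimate of the population total based on psu totals is given as

$$t = \frac{N}{n}\sum_{i=1}^{n} p_iM_i\bar{y}_{i} \tag{59}$$

and its variance is given as

$$V_t = \frac{N}{n}\sum_{i=1}^{N} p_i^2M_i^2\sigma_i^2\left(\frac{1}{p_im_i} - \frac{1}{p_iM_i}\right) + N^2\sigma_s^2\left(\frac{1}{n} - \frac{1}{N}\right) \tag{60}$$

The reasoning followed in deriving expressions (57) and (58) can be used in deriving (59) and (60). For the selective sample, an unbiased estimate of the population total based on psu totals is given by

$$t' = \frac{N}{n}\sum_{i=1}^{n} p_iM_i\bar{y}_{i}' \tag{61}$$

Proof: $E(\bar{y}_i') = \mu_i$. Hence, from the proof of (57), $E(t') = T$.

The variance of $t'$ is

$$V_{t'} = \frac{N}{n}\sum_{i=1}^{N} p_i^2M_i^2\sigma_i^2\left[E\left(\frac{1}{m_i'}\right) - \frac{1}{p_iM_i}\right] + N^2\sigma_s^2\left(\frac{1}{n} - \frac{1}{N}\right) \tag{62}$$

Proof: From (35), the within portion of $V_{t'}$ will be given as

$$\frac{N}{n}\sum_{i=1}^{N} p_i^2M_i^2\sigma_i^2\left[E\left(\frac{1}{m_i'}\right) - \frac{1}{p_iM_i}\right]$$

and the between portion is exactly the same as for (58) and (60).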
Since no non-eligible psu's are admissible, the psu population totals will be identical for the selective and original populations. Hence the second term on the right in expressions (58), (60) and (62) (the "between psu component") is identical for the selective sample and samples from the selective and original populations. A comparison of the efficiency of the estimates based on the three samples is reduced to a comparison of the "within components" for each.

In comparing (60) with (62), the only difference is in the relation between $E(1/m_i')$ and $1/p_im_i$. From section 3.3 it appears that $E(1/m_i')$ is slightly larger than $1/p_im_i$; hence, (60) would be slightly less than (62) and $t$ a little more efficient than $t'$.

In comparing (58) with (60), the "within component" of (60) will always be less than the "within component" of (58). This can be shown by relationship (3), where

$$p_i\sigma_i^2 = \sigma_{io}^2 - \frac{q_i}{p_i}\mu_{io}^2 .$$

Hence a sample from the selective population will always be more efficient than a sample from the original population.

In the comparison of (58) and (62), there are two relations to consider: the relation between $E(1/m_i')$ and $1/p_im_i$, and relationship (3). From section 3.3, the relation between $E(1/m_i')$ and $1/p_im_i$ probably operates slightly in favor of the sample from the original population. However, the second relationship operates in favor of the selective sample, and in most cases the selective sample will be more efficient, i.e., the first term (within component) of (62) will be less than the first term of (58) by approximately the amount

$$\frac{N}{n}\sum_{i=1}^{N} M_i^2\,\frac{q_i}{p_i}\,\mu_{io}^2\left(\frac{1}{m_i} - \frac{1}{M_i}\right).$$
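The effect of relationship (3) on the within components can be checked for a single psu. The sketch below is an illustration under stated assumptions: the per-psu within terms are taken as $M_i^2\sigma_{io}^2(1/m_i - 1/M_i)$ for the original population and $(p_iM_i)^2\sigma_i^2(1/p_im_i - 1/p_iM_i)$ for the selective population, with variances defined on the divisor $M_i$, and the numbers are hypothetical.

```python
def original_moments(p, mu, sigma2):
    """Mean and variance per sub-unit in the original population of one psu,
    given the selective-population mean mu and variance sigma2."""
    q = 1.0 - p
    mu_o = p * mu
    sigma2_o = p * sigma2 + (q / p) * mu_o ** 2   # relationship (3) rearranged
    return mu_o, sigma2_o


def within_term_original(M, m, sigma2_o):
    """Per-psu within term assumed for the original population."""
    return M ** 2 * sigma2_o * (1.0 / m - 1.0 / M)


def within_term_selective(M, m, p, sigma2):
    """Per-psu within term assumed for the selective population."""
    return (p * M) ** 2 * sigma2 * (1.0 / (p * m) - 1.0 / (p * M))


if __name__ == "__main__":
    # Hypothetical psu: 100 sub-units, 20 sampled, one quarter eligible.
    M, m, p, mu, sigma2 = 100, 20, 0.25, 8.0, 4.0
    mu_o, sigma2_o = original_moments(p, mu, sigma2)
    gap = within_term_original(M, m, sigma2_o) - within_term_selective(M, m, p, sigma2)
    print(gap, M ** 2 * ((1 - p) / p) * mu_o ** 2 * (1.0 / m - 1.0 / M))
```

The two printed values coincide: the saving in the within term equals $M_i^2 (q_i/p_i)\mu_{io}^2(1/m_i - 1/M_i)$, which is never negative.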
4.2.3 Other Estimates: (i) Thus far, the discussion has been confined to unbiased estimates of the population total based on psu totals. Cochran [2] has shown that, although these estimates are unbiased, they are relatively inefficient. Another estimate which should be considered is based upon an expansion of an unweighted sub-unit mean. As in 4.2.2, the psu's are assumed to be selected with equal probability, with either an equal number or a constant proportion of sub-units selected per psu. The estimate of the population total is

$$t = \left(\sum_{i=1}^{N} M_i\right)\sum_{i=1}^{n}\bar{y}_{io}\Big/ n .$$

This estimate is biased by the following amount:

$$\mathrm{Bias} = \left(\sum_{i=1}^{N} M_i\right)\left\{\sum_{i=1}^{N}\mu_{io}\Big/ N - \sum_{i=1}^{N} M_i\mu_{io}\Big/\sum_{i=1}^{N} M_i\right\},$$

but in many instances the bias is small and the variance is much lower than for $t_o$ (58).
(ii) Another estimate, proposed by Hansen and Hurwitz [4], may yield the lowest variance of the estimates considered here. This estimate is

$$t = \left(\sum_{i=1}^{N} M_i\right)\bar{y}_{io} .$$

In their procedure, one psu is selected with probability proportional to the size of the psu, i.e., the number of sub-units per psu. Within the selected psu, a sampling rate is calculated based on the number of sub-sampling units desired. The estimate is unbiased, for in repeated sampling the i th psu will appear with a frequency $M_i/\sum_{i=1}^{N} M_i$, so that

$$E(t) = \sum_{i=1}^{N}\frac{M_i}{\sum M_i}\left(\sum M_i\right)\mu_{io} = \sum_{i=1}^{N} M_i\mu_{io} = T .$$

(iii) Comparisons based on elimination of non-eligibles for these estimates will not be considered here.
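The contrast between estimates (i) and (ii) can be illustrated by computing their expectations for a small hypothetical population. The estimator forms used here are assumptions: (i) is taken as the total number of sub-units times the unweighted mean of the psu means, and (ii) as the total number of sub-units times the mean of the one psu drawn with probability proportional to size.

```python
def population_total(sizes, means):
    """T = sum of M_i * mu_i over all psu's."""
    return sum(M_i * mu_i for M_i, mu_i in zip(sizes, means))


def expected_unweighted_mean_estimate(sizes, means):
    """Expectation of estimate (i) under equal-probability selection of psu's."""
    return sum(sizes) * sum(means) / len(sizes)


def expected_pps_estimate(sizes, means):
    """Expectation of estimate (ii): psu i is drawn with probability M_i / M."""
    M = sum(sizes)
    return sum((M_i / M) * (M * mu_i) for M_i, mu_i in zip(sizes, means))


if __name__ == "__main__":
    sizes = [10, 30, 60]        # hypothetical numbers of sub-units per psu
    means = [2.0, 4.0, 5.0]     # hypothetical psu means per sub-unit
    T = population_total(sizes, means)
    print(expected_unweighted_mean_estimate(sizes, means) - T)   # bias of (i)
    print(expected_pps_estimate(sizes, means) - T)               # bias of (ii)
```

When the larger psu's also have the larger means, as in this example, estimate (i) falls short of the total, while estimate (ii) has no bias.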
4.3 Pre-listing

The true size of a primary sampling unit is often unknown. Usually information, such as Census data, is available for a previous date and will give a good indication of the size, but no changes since the Census are recorded. In such cases, it is not uncommon to select one primary sampling unit using arbitrary probabilities which will approximate the probabilities proportional to the exact size. After the primary sampling unit has been chosen, the true size of this psu often is determined by counting during the survey. This procedure is known as pre-listing.

Pre-listing has also been used in scarcity sampling. For some surveys with which the writer is acquainted, the procedure is to select one county or township using probabilities proportional to the number of sub-units per psu. Within the selected psu, a number of sub-units is selected, usually systematically from a random start. Master Sample maps are used, but a number of Master Sample segments 15/ are combined to form the sub-sampling unit. The size of the sub-unit may vary from survey to survey but usually is made large enough to encompass a minimum expected number of eligibles. This minimum number is calculated on the basis of the best available information on the proportion of eligibles in the selected psu. A third stage of sampling may then be employed if all the eligibles in the sub-sampling unit can not be enumerated. Each sub-sampling unit is pre-listed to determine the number and identity of the eligibles within the area. From the eligibles, the predetermined numbers desired are selected at random and enumerated. The eligibles might be regarded as the sub-sub-units.

If the item to be estimated is concentrated geographically, and the proportion of eligibles varies widely from one psu to another, it might be wise to make each psu a stratum and select the combined Master Sample segments from each stratum. In either case, the estimate is unbiased.

15/ Master Sample segments are small areas of land delineated for the Master Sample of Agriculture. Each county in the United States was divided into such areas, which average approximately 2.5 square miles in area and contain from four to eight farms on the average. The area and the number of farms per area may vary widely in different locations. These basic area materials were designed by the Bureau of Agricultural Economics, the Bureau of the Census and the Iowa State College Statistical Laboratory, cooperating. For a description of the development and design of the Master Sample Area Materials see King and Jessen [9].
When three stage sampling is employed, the estimate of the population total is

$$t = \frac{M}{m}\sum_{j=1}^{m} F_{ij}\bar{y}_{ij},$$

where

m = the number of sub-units (combined Master Sample segments) in the sample in the i th psu.
$F_{ij}$ = the number of eligibles listed in the j th sub-unit of the i th psu.
f = constant number of eligibles to be selected per sample sub-unit.
$y_{ijk}$ = value of the k th eligible in the j th sub-unit of the i th psu.
$\bar{y}_{ij}$ = sample mean of the f eligibles selected in the j th sub-unit of the i th psu.
Other notation needed for this discussion follows:

$\mu_{ij}$ = population mean of the j th sub-unit of the i th psu on a per eligible basis.
$M_i$ = number of sub-units in the i th psu ($M = \sum_{i=1}^{N} M_i$).
$P_i = M_i/M$ = probability of drawing the i th psu in the sample.
$F_i$ = number of eligibles in the i th psu.
$T_i$ = population total of the i th psu.
$\mu_i$ = population mean of the i th psu on a per eligible basis.
T = population total.
$\mu$ = population mean on a per eligible basis.
F = number of eligibles in the population.
$\sigma_{is}^2$ = the variance within the i th psu on a sub-unit basis.

Now $t = \frac{M}{m}\sum_{j=1}^{m} F_{ij}\bar{y}_{ij}$ is an unbiased estimate of T.

Proof:

$$E(t) = \sum_{i=1}^{N} P_i\,\frac{M}{M_i}\sum_{j=1}^{M_i} F_{ij}\mu_{ij} = \sum_{i=1}^{N}\sum_{j=1}^{M_i} F_{ij}\mu_{ij} = \sum_{i=1}^{N}\sum_{j=1}^{M_i} T_{ij} = T .$$
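The variance of t is given by

$$V_t = \frac{M}{m}\sum_{i=1}^{N}\sum_{j=1}^{M_i} F_{ij}^2\sigma_{ij}^2\left(\frac{1}{f} - \frac{1}{F_{ij}}\right) + M\sum_{i=1}^{N} M_i\sigma_{is}^2\left(\frac{1}{m} - \frac{1}{M_i}\right) + \sum_{i=1}^{N}\frac{M_i}{M}\left(\frac{M}{M_i}T_i - T\right)^2,$$

where $\sigma_{ij}^2$ is the variance between eligibles within the j th sub-unit of the i th psu.

Proof: The error of estimate is

$$(t - T) = \frac{M}{m}\sum_{j=1}^{m} F_{ij}\bar{y}_{ij} - F\mu .$$

This can be broken down into its various components (between eligibles in sub-units, between sub-units in psu, and between psu's) as follows:

$$(t - T) = \frac{M}{m}\sum_{j=1}^{m} F_{ij}(\bar{y}_{ij} - \mu_{ij}) + \left(\frac{M}{m}\sum_{j=1}^{m} F_{ij}\mu_{ij} - \frac{M}{M_i}T_i\right) + \left(\frac{M}{M_i}T_i - T\right).$$

Now in squaring (t - T) and taking its expectation, each component must be multiplied by a factor of $M_i/M$, since this was the probability for selection of the i th psu. Then $E(t - T)^2 = V_t$ as given above.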
CHAPTER V
COMBINATIONS OF MAIL, LIST AND AREA SAMPLING

5.1 General Considerations
If a careful study of the foregoing presentation is made, it is apparent that a successful practical solution to the problem of scarcity sampling must incorporate the features of several designs. The distribution of some scarcity items, as shown in section 4.1, can be fairly well described as a contagious distribution. In other words, wherever an eligible is found, it is usually geographically associated with a number of other eligibles. This fact, plus the fact that some sort of a list, even though antiquated and incomplete, is usually available, suggests a combination of mail inquiry, incomplete list and area enumeration. This basic idea has been used by the Washington State Field Office of Agricultural Estimates, BAE, in an attempt to obtain information concerning commercial apple growers. A similar plan is suggested here and is partially tested with data obtained from a complete enumeration in 1946 of a commercial peach area located in the Sandhills area of North Carolina, previously described in footnote 13.
The first step in this procedure is to obtain the most complete and up-to-date list available. Mail questionnaires would then be sent to a sample or perhaps to all of the names listed, as the populations would usually be small. The practice of successive mail inquiries could be adopted if response to the first request were light. In order to utilize this method, the respondent must locate accurately his farm headquarters geographically, either by description or on an enclosed map. As the returns come into the office, the eligibles' headquarters are spotted on a single large map of the area and the Master Sample segment in which each headquarters falls is identified. By use of Census or other information, the geographical area to be covered is delineated. Two basic strata or sub-populations are thus established: (i) those segments in which respondents are located and (ii) those segments in the population which contain no respondents. Further stratification may be made on the basis of county or township if practicable. If the population under consideration is distributed in the manner described in section 4.1, the average number of eligibles per segment will be much greater in stratum (i) than in stratum (ii), although the number of segments in (ii) will probably be greater than in (i). A differential sampling rate (optimum allocation would be preferred if variances can be estimated) can then be applied in the two strata. In effect, the field enumeration of the eligibles in the segments selected adjusts for two biases: (i) the bias due to non-response and (ii) the bias due to incompleteness of the list. A third bias, which is not measured, is the bias which may result from an inaccurate delineation of the area of the population. The estimate of the population total, using this scheme, would be

$$t = m\bar{y}_m + \sum_j N_{1j}\bar{y}_{1j} + \sum_j N_{2j}\bar{y}_{2j}$$
where

m = the number of respondents to the mail questionnaire.
$\bar{y}_m$ = the mean of the respondents.
$N_{1j}$ = the number of segments in the j th stratum having respondents.
$\bar{y}_{1j}$ = the mean of the segments in the j th stratum having respondents but exclusive of those respondents.
$N_{2j}$ = the number of segments in the j th stratum having no respondents.
$\bar{y}_{2j}$ = the mean of the segments in the j th stratum having no respondents.

The variance will be that of an ordinary stratified random sample. The first term in the estimate of T can be considered as a population of respondents and hence has no variance associated with it. The second terms are merely the summation of stratum estimates. The variance of the estimated total, $V_t$, ignoring the finite population correction factor, can then be expressed as

$$V_t = \sum_j \frac{N_{1j}^2\sigma_{1j}^2}{n_{1j}} + \sum_j \frac{N_{2j}^2\sigma_{2j}^2}{n_{2j}} \tag{68}$$

where $n_{1j}$ and $n_{2j}$ are the numbers of segments sampled and $\sigma_{1j}^2$ and $\sigma_{2j}^2$ the variances between segments in the respective strata.
Further refinements might increase the precision of the estimate, or decrease the cost of the survey for equal accuracy. For example, the eligibles whose values are extremely large may be singled out for complete enumeration, and if they can be located geographically, the segments in which they fall could also be placed in sub-population (i). If information were available on a township basis, those townships with small numbers of eligibles might be removed from the population. If the precision could be materially increased and the magnitude of the bias estimated, one might be willing to accept a small bias in order to increase the efficiency or reduce the cost. Of course, if the segments in which the eligibles occur are identifiable, the efficiency can be increased greatly. If a complete list is available, especially with supplementary information, sampling from this list would be adopted.
5.2 An Example Using Commercial Peach Orchards

The basic scheme and several variations were tested using the data from the 1946 complete enumeration of the commercial peach orchards in the Sandhills area of North Carolina, and a subsequent mail inquiry made later in that same year. The estimated costs of the several methods were compared holding the accuracy constant. It should be pointed out that these cost estimates are for obtaining the information only and do not include the cost of processing the data. It was assumed that ten per cent accuracy at the 95 per cent level was acceptable.

The complete enumeration disclosed 257 commercial peach orchards in the Sandhills section of North Carolina having an expected production of 1,561,212 bushels. The headquarters of each of these orchards were located on a map of the area. They fell in 23 townships which had a total of 1,034 Master Sample segments.
In September 1946, information was obtained concerning actual production by means of a mailed inquiry and an enumerative followup of a sample of non-respondents. Only 37 growers responded to the first request for information, so a second inquiry was mailed to the remaining 220 growers. The response to the second inquiry was 47 growers, giving a total of 84 usable schedules from the mailed inquiry. The headquarters of these 84 orchards fell into 72 different segments, which became sub-population (i). If a response to the mailed inquiry was obtained from every known eligible in a township, this township was eliminated from the population. Each sub-population was further stratified, giving a total of 6 strata. For this purpose, townships in Anson, Richmond and Scotland counties were placed in one stratum, those in Moore and Hoke in a second and Montgomery in a third.

Three variations of this basic plan were considered. The large farms (those having more than 10,000 trees) were considered separately; the townships having only one commercial orchard were eliminated, thus giving a biased estimate; and the eligible segments were considered to be identifiable. These methods and the basic population data are presented in Table 14. For the purposes of presentation, the population responding to the mail inquiry and the large farm non-respondents are each considered as separate strata.
Table 14. Number of Orchards and Segments per Stratum for Twelve Different Methods of Sampling.
Method
Stratum
No. of segments
No. of orchards
y
Mail
12
Large FarI!l
9
6
15
22
1
2
3
4
5
15
22
6
04
10
14
9
40
24
22
46
a/ Biased estimate due to the elimination of townships having only one orchard.
b/ Eligible segments are assumed to be identifiable before the field sampling process.
Since basic variances were available, the principle of optimum allocation (see section 2.1.7) was followed in determining the size of sample required to attain ten per cent accuracy at the 95 per cent level. 16/ Although the calculations do not allow for incompleteness of the list, it is felt the comparisons would indicate which approach with this population would be most fruitful.

16/ The formula used to compute the size of sample required was taken from Cochran [2]. This formula is only applicable in those cases where optimum allocation is to be used.

$$n = \frac{\left(\sum N_j\sigma_j\right)^2}{d^2/t^2 + \sum N_j\sigma_j^2},$$

where t is Student's "t" and d = T/10. The $n_j$ are then computed by the formula

$$n_j = n\,\frac{N_j\sigma_j}{\sum N_j\sigma_j} .$$

In many instances, optimum allocation specified that more segments were to be enumerated in a particular stratum than there were segments in the population in that stratum. In those cases the particular stratum in which this occurs must be enumerated completely and optimum allocation applied to the remaining strata in the population.
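The sample-size formula and the rule for strata that must be enumerated completely can be sketched as a short routine. This is an illustration rather than the computation actually used: it assumes that a completely enumerated stratum contributes no variance, so that the same formula is simply re-applied to the remaining strata with an unchanged target d/t. The function name and the test figures are hypothetical.

```python
def optimum_allocation(sizes, sigmas, d, t):
    """Allocate segments to strata by n_j = n * N_j*sigma_j / sum(N_j*sigma_j).

    sizes:  N_j, number of segments in each stratum
    sigmas: sigma_j, standard deviation between segments in each stratum
    d, t:   permissible error in the total and the Student's t multiplier
    Returns a list of (unrounded) n_j; strata whose allocation would exceed
    N_j are taken completely and the rest are re-allocated.
    """
    target = (d / t) ** 2                 # variance allowed for the estimate
    alloc = [None] * len(sizes)
    remaining = list(range(len(sizes)))
    while remaining:
        s1 = sum(sizes[j] * sigmas[j] for j in remaining)
        s2 = sum(sizes[j] * sigmas[j] ** 2 for j in remaining)
        n = s1 ** 2 / (target + s2)
        trial = {j: n * sizes[j] * sigmas[j] / s1 for j in remaining}
        over = [j for j in remaining if trial[j] > sizes[j]]
        if not over:
            for j in remaining:
                alloc[j] = trial[j]
            break
        for j in over:                    # enumerate these strata completely
            alloc[j] = float(sizes[j])
            remaining.remove(j)
    return alloc


if __name__ == "__main__":
    # Hypothetical two-stratum case: a large uniform stratum and a small variable one.
    print(optimum_allocation([1000, 20], [1.0, 50.0], d=50.0, t=1.0))
```

In the hypothetical case the small, highly variable stratum is taken in full, and the sample in the large stratum is then set so that its variance alone meets the target.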
The costs were computed on the following basis:

Cost of one completed mail questionnaire                     $ .15 per schedule
Cost of one mailed inquiry not returned                        .10 per schedule
Cost of enumeration of an eligible by personal visitation     4.00 per schedule
Cost of determining eligibles                                  1.25 per segment.

The cost of the first two items would be constant for all methods considered and is computed as follows:

 37 x .15 =  5.55
220 x .10 = 22.00
 47 x .15 =  7.05
173 x .10 = 17.30
            -----
            52.90
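The cost basis above amounts to a simple linear cost function: a fixed mail cost, plus a cost per segment for determining eligibles, plus a cost per orchard enumerated by personal visitation. The Python sketch below is an illustration only; the function and constant names are hypothetical. The fixed mail cost is taken as the printed total of 52.90 (the four printed line items themselves add to 51.90), since that is the figure that reproduces the estimated cost printed for method 1 in Table 15 (621 segments, 137 orchards visited).

```python
# Unit costs as printed in the text (dollars).
COST_PER_SEGMENT = 1.25   # determining eligibles, per segment
COST_PER_VISIT = 4.00     # enumeration of one eligible by personal visitation
FIXED_MAIL_COST = 52.90   # printed total for the two mail items (assumed constant)

def survey_cost(segments, orchards_visited):
    """Estimated cost in dollars for a method needing the given field work."""
    total = (FIXED_MAIL_COST
             + COST_PER_SEGMENT * segments
             + COST_PER_VISIT * orchards_visited)
    return round(total, 2)

# Method 1 of Table 15: 621 segments, 137 orchards enumerated by personal visitation.
print(survey_cost(621, 137))  # 1377.15
```

Because the per-segment charge is incurred for every segment screened, a method that reaches about the same number of orchards through fewer segments is cheaper under this function.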
The data are summarized in Table 15. The number of orchards shown in column 4 is the expected number over a large number of samples and not those which might arise in any one survey.

Although methods 9 through 12 were inserted for comparison purposes, the information necessary to utilize them will rarely be available in practice. A selection of one of the other eight methods would narrow down to either 7 or 8 if a biased estimate is acceptable, or to 3 or 4 if an unbiased estimate must be used. In this particular case, the bias is of the order of 0.0 per cent and would be completely acceptable rather than an unbiased estimate.
Table 15. Size of Sample Required to Attain Ten Per Cent Accuracy at the 95 Per Cent Level in Estimating Expected Production of Peaches in the Sandhills Area by Various Methods of Combining Stratified Sampling, Mail Questionnaire and Large Farm Lists.

          Number of    Number of     Expected      Total orchards   Required    Expected      Estimated
          orchards     large         number of     enumerated by    number of   number of     cost
Method    responding   orchards      orchards in   personal         segments    farms to be   (dollars)
          by mail      (> 10,000     area sample   visitation                   contacted
                       trees)

   1         84           -             137            137             621         3105        1,377.15
   2         84           -             136            136             617         3085        1,368.15
   3         84          10              98            108             353         1765          926.15
   4         84          10              98            108             281         1405          836.15
   5         84           -             131            131             414         2070        1,094.40
   6         84           -             132            132             414         2070        1,098.40
   7         84          10              86             96             239         1195          735.65
   8         84          10              88             98             180          900          669.90
   9         84           -             138            138              81          405          706.15
  10         84           -             111            111              68          340          581.90
  11         84          10              59             69              29          145          365.15
  12         84          10              56             66              27          135          350.65

From the table, it appears that a large farm list is more helpful than additional stratification. The effect of additional stratification is more pronounced when the large farm list is used than when it is not used. The decreased cost stems from the smaller number of segments to contact, even though about the same number of orchards are expected in the segments to be contacted in both instances.

Again it should be emphasized that these figures probably represent a maximum efficiency from two points of view: (i) in practice, variances usually will not be available for optimum allocation, and (ii) no estimate can be made regarding the completeness of the list at this date.
CHAPTER VI

6.1 Summary of Results

(1) The efficiencies of various methods of sampling and estimation were compared on the basis of variance alone and on the basis of variance and cost, considering one type of cost function, for samples from the two following types of populations:

(i) a sample of n from an original population consisting of N elements, pN of which have a value for the characteristic in question and qN of which have a zero value,

(ii) a sample of n [same size as (i)] or a sample of pn [same sampling rate as (i)] selected from a population consisting of pN eligibles.
(2) The efficiencies of most of the same methods of sampling and estimation were compared on the basis of variance and cost for samples from populations (i) and (ii) with

(iii) a selective sample, in which a sample of n is selected from the original population N but the non-eligibles are discarded from the sample in order to estimate the mean of the selective population (ii). In this case the number of eligibles which occur in any given sample is a random variable.

(3) Data from a complete enumeration of fruit orchards in Allegan County, Michigan, and of commercial peach orchards in the Sandhills of North Carolina indicated that the distribution of orchards per segment was described reasonably well by a contagious distribution.
(4) Expressions for the unbiased estimates of the total based on primary sampling unit totals and expressions for corresponding variances were derived for the selective sample and samples from the original and selective populations when a sub-sampling procedure involving unequal sized primary sampling units is to be utilized.

(5) A method combining the use of the mail questionnaire, large farm lists and area sampling was suggested. Approximate costs for obtaining ten per cent accuracy in estimating commercial peach production in the Sandhills area were compared for various procedures.
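Item (3) above reports that orchards per segment followed a contagious distribution. One common first check for contagion, which is not necessarily the check used in this study, is the ratio of the sample variance to the sample mean of the counts per segment: a Poisson distribution gives a ratio near 1, while contagious (clustered) distributions give ratios well above 1. The sketch below is a minimal Python illustration; the function name and the counts are hypothetical and are not data from the Sandhills or Allegan County enumerations.

```python
from statistics import mean, variance

def dispersion_index(counts):
    """Ratio of sample variance to sample mean for counts per segment."""
    return variance(counts) / mean(counts)

# Hypothetical counts of orchards in 16 segments: mostly empty, a few clusters.
segment_counts = [0, 0, 0, 0, 0, 0, 1, 0, 5, 0, 0, 3, 0, 0, 7, 0]

index = dispersion_index(segment_counts)
print(index > 1)  # True: the counts are more clustered than a Poisson model suggests
```

A ratio well above 1 only indicates clustering; choosing among particular contagious distributions would require fitting them to the observed frequency table.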
6.2 Conclusions

(1) Provided that

(i) general lists which are complete and up-to-date are available,

(ii) realistic cost functions such as proposed in section 2.2 can be established, and

(iii) estimates of the basic variance in the original population and estimates of the variables in the cost function can be made,

then the various methods of sampling and estimation can be compared for the three types of samples listed below, and an objective selection of one can be made on the basis of maximum accuracy for a given cost or minimum cost for fixed accuracy:

(i) a sample drawn from the original population in which the non-eligibles are retained in both the population and the sample,

(ii) a sample drawn from the selective population (eligibles only), assuming that the eligibles are identifiable before the sampling process, and

(iii) a sample drawn from the original population in which the non-eligibles are eliminated from the sample selected, assuming that eligibles are identifiable when contacted.
(2) If an antiquated or incomplete list is available, a method utilizing a combination of mail inquiry, list sampling, and area sampling as proposed in Chapter V may be used to reduce costs in obtaining accurate estimates.

(3) For a limited amount of data concerning fruit trees, in which an area is divided into segments having equal numbers of farms, the distribution of eligibles per segment seems to follow a contagious distribution.

(4) If no lists of any kind are available and none can be obtained, an area sample must be used. The most feasible procedure in this case appears to be the selection of a sample of large sampling units which are formed by the combination of several (e.g., four to six) Master Sample segments. These large sampling units should then be pre-listed with all eligibles taken within the sampling unit, or a constant number of eligibles if all of them cannot be enumerated.

6.3 Suggestions for Further Research

The research reported herein does not constitute a complete coverage of the problem of scarcity sampling. Some of the points which should be extended will be indicated. In addition, the investigation has raised new queries which might profitably be considered in future research. These will be mentioned also.
(1) Scarcity sampling involving the multi-stage or sub-sampling process has been discussed briefly in this manuscript. Comparisons among the selective sample and samples from the original and selective populations need to be considered for a number of estimates other than the unbiased estimate based on psu totals. Those estimates which involve the selection of the psu with probability proportional to some measure of size should be studied.

(2) Information which covers a wide range of population sizes and sample sizes is desirable for studying the nature and behavior of the expected value of the reciprocal of a positive hypergeometric variable. Two approaches might be considered. One would involve the construction of artificial populations and the selection of different sized samples from those populations, similar to the process described in section 3.3. The second approach would be to determine the values for E(1/n') theoretically. Although Stephan [13] has given the theory for obtaining these values, the process will be laborious unless his recursion formula can be adapted to IBM procedures or other mechanical computers. Some effort should be directed toward the development of such a computing procedure.
(3) More realistic yet workable cost functions must be established. The cost functions presented here appear reasonable but they need to be tested in actual surveys. Further, these cost functions are of the simplest type and will need to be revised as additional restrictions are placed upon the sampling process. The costs of actual surveys should be the basis for establishing the cost functions.
(4) More information is needed on the characteristics and properties of scarcity items themselves. Do all scarcity items tend to follow a contagious distribution by area segment? When the "life expectancy" of a complete enumeration is only one year, what is the persistence of the distribution in successive years? Some speculative answers can be given to these and other similar questions, but better answers should result from objective information.

(5) It appears to me that suggestion (4) above can be answered only when population data are available to the sampling statistician on a much wider scope than is now the case. Many sample surveys are undertaken each year, but usually with a limited budget and drawn so that the relative accuracy of alternative designs and estimates rarely can be assessed. In many cases, especially with the use of sampling with probability proportional to size, the accuracy of the estimate can only be approximated. It is my feeling that more rapid progress could be attained, with a subsequent saving of time and money, if complete enumerations were made for the purpose of obtaining basic population data similar to those data collected in the Sandhills of North Carolina and Allegan County, Michigan.

A valuable source of data would be U.S. Census information if it could be made available on a per farm basis for research purposes. In addition, such information would furnish basic lists of individuals having scarcity items.

With additional universe information, the methods described herein could be tested more completely. Further, this added information might suggest new methods of estimating scarcity items. The methods reported here are not new: they have been used by practical sampling statisticians for some time. However, they have been used with little objective knowledge of their efficiency in terms of either cost or variance with respect to alternative procedures. This dissertation has given some theoretical comparisons of efficiency, but the results are at present limited in application by the lack of information on costs and the lack of information about the characteristics of various types of scarcity populations.
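Suggestion (2) above asks for values of E(1/n'), the expected reciprocal of the number of eligibles n' in a sample of n drawn from N elements of which pN are eligible, given that at least one eligible is drawn. For modest N these values can be obtained by direct summation over the hypergeometric probabilities. The Python sketch below does this; it is an illustration under stated assumptions, it does not implement Stephan's recursion formula [13], and the function name and the numerical inputs are hypothetical.

```python
from math import comb

def expected_reciprocal(N, K, n):
    """E(1/n') for n' hypergeometric (N elements, K eligibles, sample of n),
    conditional on n' >= 1 (the "positive" hypergeometric variable)."""
    total = comb(N, n)                 # number of possible samples
    p_zero = comb(N - K, n) / total    # chance the sample contains no eligibles
    weighted = 0.0
    for k in range(1, min(K, n) + 1):
        # Probability of exactly k eligibles in the sample, times 1/k.
        weighted += comb(K, k) * comb(N - K, n - k) / total / k
    return weighted / (1.0 - p_zero)

# Hypothetical population: N = 10 elements, 3 eligibles, samples of size 2.
print(expected_reciprocal(10, 3, 2))
```

For this small case the three possible outcomes (0, 1, or 2 eligibles) occur in 21, 21, and 3 of the 45 possible samples, so the conditional expectation is (21 + 3/2)/24 = 0.9375, which the function reproduces up to floating-point rounding. The artificial-population approach described in section 3.3 could be checked against such exact values.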
(1) Cochran, W. G. "Sampling Theory When the Sampling Units are of Unequal Sizes". Jour. Amer. Stat. Assoc. 37:199-212, 1942.

(2) Cochran, W. G. Sample Survey Techniques. Inst. of Stat. Mimeo Series #7, prepared by Inst. of Stat., N.C. State College and the Bur. of Agri. Econ., USDA, Cooperating, Raleigh, N.C., 1948.

(3) Finkner, A. L. "Methods of Sampling for Estimating Commercial Peach Production in North Carolina". N.C. State College Exp. Sta. Tech. Bul. 91, 1950.

(4) Hansen, M. H. and Hurwitz, W. N. "On the Theory of Sampling from Finite Populations". Ann. Math. Stat. 14:333-362, 1943.

(5) Hasel, A. A. "Estimation of Volume of Timber Stands by Strip Sampling". Ann. Math. Stat. 13:179-206, 1942.

(6) Holmes, Irvin and Handy, R. P. Designing a Sample of Vegetable Growers. Restricted preliminary mimeo report of the Bur. of Agri. Econ., USDA, 1949.

(7) Jessen, R. J. and Houseman, E. E. "Statistical Investigations of Farm Sample Surveys Taken in Iowa, Florida and California". "Part II: Florida" by E. E. Houseman. Iowa State College Res. Bul. 329, 1944.

(8) Kendall, M. G. The Advanced Theory of Statistics. Vol. 1. Charles Griffin and Co., Ltd., London, 1946-47.

(9) King, A. J. and Jessen, R. J. "The Master Sample of Agriculture". Jour. Amer. Stat. Assoc. 40:38-56, 1945.

(10) McVay, F. E. and Tucker, ... Agricultural Prices. (A Report of Progress to Jan. 1, 1950). Inst. of Stat. Mimeo Series ...

(11) Neyman, Jerzy. "On the Two Different Aspects of the Representative Method: The Method of Stratified Sampling and the Method of Purposive Selection". Jour. Roy. Stat. Soc. 97:558-606, 1934.

(12) Sarle, C. F. Proposed Research Program of Agricultural Estimates. Memorandum to Members of Agricultural Estimates Technical Committee and Visiting Field Statisticians. September 10, 1940.

(13) Stephan, F. F. "The Expected Value and the Variance of the Reciprocal and Other Negative Powers of a Positive Bernoullian Variate". Ann. Math. Stat. 16:50-62, 1945.