Pasternack, Bernard S. (1956). "Comparison of alternative measures of size in the construction of area sampling frames." (A.M.S. Progress Report 20-2)

Report of Progress of Cooperative Project
of
The Institute of Statistics, North Carolina State College
The Agricultural Marketing Service
United States Department of Agriculture
February, 1956 - July, 1956
Progress Report No. 20 - 2
Mimeo Series #153

Comparison of Alternative Measures of Size in the
Construction of Area Sampling Frames
by
Bernard S. Pasternack
TABLE OF CONTENTS
Chapter
I. FRAME CONSTRUCTION AND AREA SAMPLING
    1.1 Introduction
    1.2 The Area Method of Sampling
    1.3 Construction of an Area Sampling Frame
    1.4 Construction of a Frame to Equalize the "Size" of Area Sampling Units Using a Given "Measure of Size"
    1.5 A Technique for Selecting a Simple Random Sample of Area Sampling Units
    1.6 Statement of the Problem
II. PRESENT STATUS OF RESEARCH ON THE EFFECT AND CONTROL OF VARIATION IN SIZE
    2.1 Introduction
    2.2 The Effect of Variation in Size
    2.3 The Control of Variation in Size
III. METHODS OF COMPARISON
    3.1 Introduction
    3.2 Proposed Technique
IV. CALCULATIONS AND RESULTS
    4.1 The Cotton Study
    4.2 The Winston-Salem Study
V. SUMMARY AND CONCLUSIONS
APPENDIX A - SAMPLING FRAMES
APPENDIX B - DERIVATION OF EXPECTED MEAN SQUARES
BIBLIOGRAPHY
LIST OF TABLES
Table
1. CONSTRUCTION OF AN AREA SAMPLING FRAME
2. COMPARISON OF ALTERNATIVE MEASURES OF SIZE: (INOD)cc vs. (INOF)cc IN THE COTTON STUDY
3. COMPARISON OF ALTERNATIVE MEASURES OF SIZE: (INOD)cc vs. (INOD)ms IN THE COTTON STUDY
4. COMPARISON OF ALTERNATIVE MEASURES OF SIZE: (INOD)cc vs. (INOF)ms IN THE COTTON STUDY
5. OPEN COUNTRY STRATIFICATION: THE WINSTON-SALEM STUDY
6. COMPARISON OF ALTERNATIVE MEASURES OF SIZE: (INOD)cc vs. (INOD)ms IN THE WINSTON-SALEM STUDY
7. SAMPLING FRAMES FOR ALTERNATIVE MEASURES OF SIZE: COTTON STUDY
8. SAMPLING FRAMES FOR ALTERNATIVE MEASURES OF SIZE: WINSTON-SALEM STUDY
Chapter I
FRAME CONSTRUCTION AND AREA SAMPLING
1.1 Introduction
In attempting to construct area sampling frames with sampling units as close to equal size as possible, we are often faced with the problem of choosing from among various sources of information a specific "measure of size" to be used for the assignment of sampling units to the area segments which comprise the area universe. In general, each sampling unit consists of a cluster of elements and the size of the unit is defined to be the number of elements which it contains. Since the procedure for establishing an area sampling frame is based upon an "indicator" of size rather than on the actual size of these area segments, the variation in size among sampling units will depend upon the relationship of the "measure of size" we adopt to the actual size.

It is obvious that variations in size of the sampling units (or clusters) may have a serious effect on the variance of the estimates of totals (or means per cluster), although moderate variation in size will usually have a relatively small effect on the variance of estimates of averages per element or ratios, Hansen and Hurwitz (1953).

The purpose of this thesis is to explore the relative merits of alternative measures of size in the construction of open country area sampling frames using data collected in North Carolina. However, before discussing the problem of the thesis in detail, a brief summarization of some of the fundamental aspects of frame construction and area sampling will be given. Perhaps this will provide a clearer insight into the significant role assumed by the chosen measure of size in producing variation in actual size between sampling units with consequently increased variability of the sampling estimates of totals. An excellent account of frame construction for area samples is given by Jessen and Thompson (1953).
In order to define the word frame in its restricted statistical sense, we will first offer an explanation of two commonly used words which are basic to sampling: universe and population. A finite set of sampling units defines the concept of a universe. The corresponding set of measures generated by the universe when an attribute or measurable characteristic is attached to each sampling unit defines the concept of a population. A frame is then defined to be a system which unambiguously determines all sampling units which comprise the universe under consideration.

Often, sampling statisticians are confronted with a number of different universes which are troublesome to sample due to the presence of one or both of the following circumstances: (i) the total number of elements in the universe is unknown (note that a sampling unit can consist of either one or a cluster of elements to be observed) or (ii) no specific designation scheme or frame exists by which to identify each element (or sampling unit). For example, it is known that, say in Union County, North Carolina, there exists a number of farms. Suppose it was desired to examine certain farm characteristics as they exist currently by means of a sample. The obvious first step would be to get a list of the farms in Union County and then to draw a random sample from the list following established procedures. However, a current list is not available. Hence there is no frame for farms in Union County, and therefore, a random sample of farms cannot be directly selected. This situation, in general, exists for the universes of persons, dwelling units, business establishments, etc.
1.2 The Area Method of Sampling
In order to circumvent this troublesome aspect, a procedure known as the "area method of sampling" has been developed so that probability samples of such universes can be drawn simply, yet with accuracy. As an illustration of the technique of area sampling consider the following example. It is desired to select for study a probability sample from the universe of farms located in the open country portion of Union County, or in other words, the area of land exclusive of towns and cities. No list of these farms exists and only an approximate estimate of their total number (based on the most recent census figures) is available. A frame for this kind of universe does not exist because the farms are not identifiable in advance and their exact total number is unknown.

It is possible, nevertheless, to devise another universe of sampling units consisting of area segments, covering the open country portion of Union County, and arbitrarily constructed by the use of identifiable geographical boundaries such as roads, streams and railroads. Some method has to be employed to associate each farm with one and only one area segment or sampling unit. After this has been achieved, the number of farms or some other measure associated with any area segment can be regarded as a characteristic of that particular sampling unit. A probability sample of farms can thus be drawn by selecting a sample of area segments where the probability of choosing any farm will be equal to the probability of selecting the area segment with which it is associated. Even though the exact number and locations of farms are unknown, it is possible, by the use of this new universe of area segments, to obtain, with known probabilities, a sample of farms. This example serves to illustrate how a universe unsuitable for statistical treatment can be transformed, by means of the area approach, into a new universe for which a frame actually exists.
1.3 Construction of an Area Sampling Frame
By regarding each area segment as the actual sampling unit, in the example cited above, a frame of, say N = 395, sampling units is obtained which includes all the farms in the universe of inquiry (open country portion of Union County). Each of these area segments can be numbered from 1 to 395 for identification purposes. Identification of selected sampling units is expedited, however, if the open country portion of Union County is divided or partitioned into a number of large areas to be called divisions. Within each of these divisions every area segment can be numbered beginning with one until all have been enumerated. Thus if the open country portion of Union County is partitioned into, say 7 divisions, each area segment can be easily identified, first by its division number and second by its number within the division.

In the area universe presented above the area segment was arbitrarily chosen to be the sampling unit. Hence this universe consisting of 395 area segments forms a suitable frame for sampling since (i) the total number of sampling units is known and (ii) each can be uniquely determined by some simple procedure of numbering. Although a random sample could be adequately drawn from this area universe by means of the frame which has been constructed as above, such a universe may possess the undesirable property of having a large unit to unit variability for the measurable characteristic under examination. The problem of constructing the area sampling frame should also be governed by the extent to which a low sampling variance can be obtained.

If the number of farms actually varies considerably from area segment to area segment, the variance of the estimate of, say, the total number of farms in the open country portion of Union County will be quite high. On the other hand, it should be evident that the more equal each area segment is in actual size (i.e. number of farms), the lower will be the variance of this estimate. In fact, if area segments or sampling units could be constructed so that each segment contained an equal number of farms, the variance of the estimate of the total number of farms in the open country portion of Union County would reduce to zero. This is practically impossible without perfect advance knowledge, in which case there would be no need to estimate the total number of farms, but it serves to demonstrate the point in question.
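
The dependence of the variance on the inequality of segment sizes can be illustrated numerically. The sketch below is not part of the original report. It assumes simple random sampling of n segments without replacement from the N segments of the frame and uses the standard textbook expression N^2 (1 - n/N) S^2 / n for the variance of the expanded total, where S^2 is the variance of the segment sizes with divisor N - 1. The farm counts and the function name are hypothetical.

```python
from statistics import variance

def variance_of_estimated_total(sizes, n):
    """Variance of the expansion estimate N * (sample mean) of the total
    number of farms, under simple random sampling of n segments without
    replacement from the N segments whose farm counts are `sizes`."""
    N = len(sizes)
    s2 = variance(sizes)                 # divisor N - 1
    return N ** 2 * (1 - n / N) * s2 / n

# Hypothetical farm counts for two frames that cover the same 40 farms.
unequal_segments = [1, 2, 3, 4, 5, 5, 6, 14]
equal_segments = [5, 5, 5, 5, 5, 5, 5, 5]

v_unequal = variance_of_estimated_total(unequal_segments, n=4)
v_equal = variance_of_estimated_total(equal_segments, n=4)
print(v_unequal, v_equal)
```

With perfectly equal segments the variance vanishes, which is the limiting case described above; any spread in the counts makes it positive.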
1.4 Construction of a Frame to Equalize the "Size" of Area Sampling Units Using a Given "Measure of Size"
Construction of easily identifiable area segments or sampling units can only be achieved to a limited extent if a one-one correspondence between area segments and sampling units is required. If the originally specified area segments (designated as count units hereafter) can be assigned one or more sampling units by means of information available concerning their relative sizes, considerably greater "size equalization" of sampling units is possible. At this stage it becomes apparent that the problem of frame construction revolves about the choice of a technique or an appropriate measure of size for each area segment (count unit) which will enable the ultimate sampling units of the area universe to be as closely equal in size as possible. When alternative measures of size are available to aid in the accomplishment of this goal, a decision has to be made as to which one will be used.

In North Carolina, area sampling frames were constructed in the Department of Experimental Statistics at North Carolina State College, until recently, by means of the information provided by the Master Sample materials. These were prepared by the Statistical Laboratory of Iowa State College in cooperation with the Bureau of Agricultural Economics and the Bureau of the Census during the period 1943-1944. The Master Sample materials¹ consist of small areas or count units covering the entire United States. Each of these count units has associated with it an indicated number of farms (INOF) and an indicated number of dwellings (INOD). These indicated numbers were obtained (for the open country) by an actual count of the culture shown on maps prepared by the various State Highway Commissions. These counts of INOF and INOD thus provide a measure of size for each of the delineated count units.

¹ For a discussion of the development of the Master Sample of Agriculture, see King and Jessen (1945).
On the basis of this information an area sampling frame for the open country portion of Union County can be constructed by assigning to each count unit an appropriate number of sampling units based, say, on the Indicated Number of Farms as given by the Master Sample materials, i.e. (INOF)ms. The total number of sampling units which would be assigned to the open country portion of Union County would be arrived at by dividing the current census figures for number of farms in the area under consideration, by the expected average size of sampling unit desired. The nearest integral number would represent the total number of sampling units to assign to the entire area universe. The subsequent assignment (discussed below) of these sampling units to each count unit is then based upon the specific measure of size used in the construction of the area sampling frame. Hence, it is to be observed that the measure of size serves as the indicator of the distribution or density of the sampling units among the count units for the area universe frame to be constructed such that the sampling units are approximately equal in actual size.

The Master Sample materials consisting of area segments will always remain complete whatever changes may occur in the course of time, although the accuracy of the supplementary information provided by the map counts of the number of farms and dwelling units will naturally become progressively worse. It should be pointed out that in a simple random area sample the estimate of a total will always be unbiased. However, if the degree of relationship between the actual sizes and the size measures as provided by the Master Sample map count data decreases over time, the sampling variance of the estimated total for a given characteristic related to size, obtained from samples of the same size repeatedly over time, can be expected to increase. It appears reasonable to believe that after a certain period of time has elapsed extensive revision of the Master Sample materials will be necessary in certain parts of the country where further use of the information provided by the Master Sample might result in unduly large variances of sample estimates, Yates (1949). In the state of North Carolina such a revision² has recently been made and these new materials (to be called Current) serve as the basis for choosing a measure of size for the construction of area sampling frames within the state. These new materials consist of a more convenient and more easily identifiable set of count units with supplementary current map count information on the number of farms and the number of dwelling units within each count unit.

² By the Survey Operations Unit, Institute of Statistics, North Carolina State College, Raleigh, North Carolina.

1.5 A Technique for Selecting a Simple Random Sample of Area Sampling Units
Methods of constructing a universe of areas with "equalized size", under a given measure of size, exist by which much of the labor of actually constructing an area universe of segments can be eliminated without any loss of precision in the final results. Consider the following method used for Union County where the total number of sampling units in the open country is to be 2,164, a figure arrived at through the use of current census information. If (INOD)cc (i.e. Indicated Number of Dwellings based on Current Materials) is adopted as the measure of size for the construction of the area sampling frame, the assignment of an integral number of sampling units (to be denoted as su's) to any count unit will be based on the rule that the cumulative total of su's assigned through any count unit will be equal to the integer closest to cumulative (INOD)cc divided by the average "indicated size" of a su. The average indicated size of a sampling unit for the area universe is defined to be total (INOD)cc / total no. of su's, which in the present situation equals 6063/2164 = 2.8018.
The frame consists of 2,164 sampling units. A random number between 1 and 2164 is drawn. An indication of the count unit in which the sampling unit lies is determined by multiplying the selected random number by the average indicated size of a sampling unit (in this case 2.8018). For example, if the random number chosen was 423 then (423)(2.8018) = 1185 to the nearest integer, which indicates that the su belongs to count unit 2-2 (see table 1). The number of sampling units assigned through that count unit and the count unit immediately preceding it is then computed by dividing column (3), in table 1, by the average indicated size of the sampling unit, 2.8018, and then recording the value obtained to the nearest whole number in column (4). For the example, through count unit 2-2 we have 1188/2.8018 = 424, and through the preceding count unit 2-1 we have 1180/2.8018 = 421. Therefore 3 su's are assigned to count unit 2-2. The number of sampling units assigned to the count unit is then recorded in column (5) by taking the difference between the number of sampling units assigned through the selected count unit and the number of sampling units assigned through the count unit immediately preceding it. Thus the computations essential for the assignment of su's to the count units are confined to a subset determined by the random selection. Note that the first three columns are filled out completely and constitute the materials used repeatedly for different samples. Computations for columns (4) and (5) depend upon the sample drawn.

Once the count unit in which the selected sampling unit lies is obtained, it is divided into a number of segments exactly equal to the number of sampling units assigned to it, 3 for the example. Each sampling unit is then ordered according to some predetermined rule or technique such that, theoretically, in any repeated sampling process its number would remain the same.
Table 1
CONSTRUCTION OF AN AREA SAMPLING FRAME

    (1)        (2)       (3)           (4)              (5)
Count Unit     INOD    Cum. INOD   su's assigned    su's assigned
                                      through            to
   1- 1          7          7
   1- 2         18         25
   1- 3         10         35
   ...         ...        ...
   1-79         14       1151
   2- 1         29       1180          421
   2- 2          8       1188          424               3
   2- 3         10       1198
   ...         ...        ...
   2-39         27       1646
   ...         ...        ...
   7- 1         15       5217
   7- 2         18       5235
   7- 3         22       5257
   ...         ...        ...
   7-62          7       6063
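
The selection rule of this section can be sketched in code. The following is an illustration only, not the procedure as it was carried out by hand: it holds just the rows of Table 1 that are printed, so the omitted count units are simply absent, and the names `TABLE_1`, `TOTAL_SU` and `locate_sampling_unit` are hypothetical. It uses the unrounded ratio 6063/2164 rather than 2.8018, which gives the same integers in the worked example.

```python
from bisect import bisect_left

# Printed rows of Table 1 only: (count unit, cumulative INOD).
# Rows elided in the table are not represented here.
TABLE_1 = [
    ("1-1", 7), ("1-2", 25), ("1-3", 35), ("1-79", 1151),
    ("2-1", 1180), ("2-2", 1188), ("2-3", 1198), ("2-39", 1646),
    ("7-1", 5217), ("7-2", 5235), ("7-3", 5257), ("7-62", 6063),
]
TOTAL_SU = 2164  # total number of sampling units in the frame

def locate_sampling_unit(r, table=TABLE_1, total_su=TOTAL_SU):
    """Return (count unit, su's through preceding unit, su's through this
    unit, su's assigned to this unit) for the random number r."""
    labels = [label for label, _ in table]
    cum = [c for _, c in table]
    avg = cum[-1] / total_su              # average indicated size, about 2.8018

    def through(k):
        # Column (4): su's assigned through count unit k (0 before the first).
        return round(cum[k] / avg) if k >= 0 else 0

    # First guess: the count unit whose cumulative INOD first reaches r * avg.
    i = min(bisect_left(cum, r * avg), len(cum) - 1)
    # Rounding in column (4) can move a boundary by one unit; correct for it.
    while r > through(i):
        i += 1
    while r <= through(i - 1):
        i -= 1
    return labels[i], through(i - 1), through(i), through(i) - through(i - 1)

print(locate_sampling_unit(423))
```

For the random number 423 this returns count unit 2-2 with 421 su's assigned through the preceding unit, 424 through the unit itself, and hence 3 su's assigned to it, in agreement with the worked example. Only the two rows adjacent to the selected unit are used, which is the labor-saving point of the method.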
There are a large number of situations arising in area sampling where it is not practicable or desirable to designate the ultimate sampling unit as a specific area. Suppose, as in the example, that the selected sampling unit belongs to a count unit to which three sampling units have been assigned. The rule states that the count unit should be segmented into three parts of approximately an equal number of indicated dwellings. Suppose, however, that suitable roads, streams or other identifiable lines do not exist for use as boundaries within the count unit for the construction of sampling units of about equal indicated size. In this event, greater inequalities in the sizes of the final sampling units may have to be accepted, or else a different kind of sampling unit may have to be chosen such as one which consists of a systematic numbering of the households that actually exist in the count unit. In other words, the three sampling units could be designated in the following manner:

Sample Unit No.     Elements (households) Contained
     (1)            1, 4, 7, 10, ...
     (2)            2, 5, 8, 11, ...
     (3)            3, 6, 9, 12, ...

where the numbering of the households is fixed by some rule such as "the most northeast household is number 1 and others follow in a clockwise direction". When the households are not situated on a perimeter the rule required for numbering may be more complex but it should be such that each household's number is unambiguously determined. If the sampling unit selected was No. 3 in this particular count unit the field investigator would locate the count unit area, proceed to the northeast corner and list the households, taking observations on the 3rd, 6th, 9th, etc., until the count unit area had been completely canvassed.

1.6
Statement of the Problem
At this point we will restate that the objective of this thesis is to compare alternative measures of size in the construction of area sampling frames in North Carolina with specific attention given to the measures of size referred to earlier, viz., (INOF)ms, (INOD)ms, (INOF)cc, and (INOD)cc. The choice of a criterion of comparison has been constrained to the property which has also been discussed previously, viz.: the effect of these alternative measures of size on the variance of the sample estimates.
Chapter II
PRESENT STATUS OF RESEARCH ON THE EFFECT AND
CONTROL OF VARIATION IN SIZE
2.1 Introduction
The discussion thus far has been limited to sampling procedures wherein the sampling units are not individual elements, but rather clusters of elements. Often, however, cluster sampling may be inefficient due to similarity among elements within clusters. This, in effect, is saying that a high positive intra-cluster correlation exists. Cost considerations also make cluster sampling prohibitive at times. In such situations it is logical to take measurements on only a sample of elements from each cluster rather than to enumerate it completely.

In order to present a generalized treatment of a sampling estimate and its variance consider a two-stage stratified random sampling scheme where the population is divided into, say, $S$ strata. The clusters of elements (households or farms as in the earlier illustration) form the primary sampling units (i.e. psu's) in the area universe. In each stratum there are $M_i$ psu's ($i = 1, 2, \ldots, S$) and in the $j$th psu of the $i$th stratum there are $N_{ij}$ households ($j = 1, 2, \ldots, M_i$). The sampling procedure in each stratum will be carried out in two stages. At the first stage a predetermined number of primary sampling units are sampled in each stratum and then at the second stage, also, a predetermined number of households are sampled in each selected primary sampling unit. In the $i$th stratum a sample of $m_i$ out of $M_i$ psu's are selected at random at the first stage, with equal probability at each selection and without replacement. In each psu selected, all households are listed and $n_{ij}$ out of the $N_{ij}$ listed households are selected at random with equal probability and without replacement and then enumerated.

Let $x_{ijk}$ be a variate representing the measure or value attached to any characteristic of the $k$th household ($k = 1, 2, \ldots, n_{ij}$) of the $j$th primary unit ($j = 1, 2, \ldots, m_i$) in the $i$th stratum ($i = 1, 2, \ldots, S$). The total, $X$, for any given characteristic is given by

\[
X = \sum_{i=1}^{S} \sum_{j=1}^{M_i} \sum_{k=1}^{N_{ij}} x_{ijk}
\]

and the best linear unbiased estimate, $X'$, of $X$ is

\[
X' = \sum_{i=1}^{S} \frac{M_i}{m_i} \sum_{j=1}^{m_i} \frac{N_{ij}}{n_{ij}} \sum_{k=1}^{n_{ij}} x_{ijk}
\]

$M_i/m_i$ and $N_{ij}/n_{ij}$ are known as first-stage and second-stage expansion factors, respectively, and their reciprocals are referred to as sampling fractions.
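
As an illustration of how the two expansion factors enter the estimate of a total, the sketch below accumulates, stratum by stratum, the quantity (M_i / m_i) times the sum over sampled psu's of (N_ij / n_ij) times the sum of the sampled household values. It is not taken from the report; the data layout, the function name `estimate_total` and the numbers are hypothetical.

```python
def estimate_total(strata):
    """Two-stage stratified expansion estimate of a total.

    `strata` is a list of dicts, one per stratum, with keys
      "M"    : number of psu's in the stratum (M_i)
      "psus" : list of (N_ij, observations) for the m_i sampled psu's,
               where observations holds the n_ij sampled household values.
    """
    total = 0.0
    for stratum in strata:
        M_i = stratum["M"]
        sampled = stratum["psus"]
        m_i = len(sampled)
        psu_sum = 0.0
        for N_ij, obs in sampled:
            n_ij = len(obs)
            psu_sum += (N_ij / n_ij) * sum(obs)   # second-stage expansion
        total += (M_i / m_i) * psu_sum            # first-stage expansion
    return total

# Hypothetical sample: two strata, household values are counts of some item.
sample = [
    {"M": 10, "psus": [(4, [1, 3]), (6, [2, 2, 2])]},
    {"M": 5, "psus": [(3, [1, 1, 1])]},
]
x_prime = estimate_total(sample)
print(x_prime)
```

In the first stratum the two sampled psu's expand to 8 and 12, and their sum is multiplied by 10/2; in the second the single psu expands to 3 and is multiplied by 5/1.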
In order to obtain the variance of an estimate in two-stage sampling it is important to note that the process of taking expectations must be distinguished over the two stages of sampling. That is to say, it will be necessary to have two sets of "expected values", one of which corresponds to the sampling of primary units and the other which will arise in the sampling of secondary units within a particular primary unit. The variance of the estimate, $X'$, on the premise that $n_{ij}$ is a number predetermined by some rule for each of the $M_i$ primary units is [Cf. Deming (1950)]

[expression for $V(X')$, with the accompanying definitions of $\sigma_i^2$ and $\sigma_{ij}^2$, illegible in source]

For the case in which there are no strata so that $m$ primary units are drawn from a universe consisting of $M$ primary units, and then random subsamples are drawn consisting of $n_j$ secondary units from the primary unit that was drawn at the $j$th draw, the variance of $X'$ becomes

[expression for $V(X')$ illegible in source]
2.2 The Effect of Variation in Size
Hansen and Hurwitz (1943) in their classical paper "On the Theory of Sampling From Finite Populations" present an elegant analysis of the effect of change in size of first-stage units on the variance of the mean of a two-stage sample consisting of $m$ first-stage units and $n$ second-stage units per first-stage unit. Their analysis is based on the fact that they are able to express the variance of a mean as follows:

\[
V(\bar{x}') = \frac{\sigma^2}{mn}\left[1 - \frac{n(m-1)}{N(M-1)} + \rho_1\left\{\frac{M-m}{M-1}\cdot\frac{n(N-1)}{N} - \frac{N-n}{N}\right\}\right]
\]

where $\rho_1$ represents the intra-class correlation coefficient within first-stage units of size $N$,

\[
\sigma^2 = \frac{1}{MN}\sum_{j=1}^{M}\sum_{k=1}^{N}(x_{jk} - \mu)^2
\]

and

\[
\rho_1 = \frac{\sum_{j=1}^{M}\sum_{k \neq k'}(x_{jk} - \mu)(x_{jk'} - \mu)}{MN(N-1)\,\sigma^2}
\]

If the first-stage units are combined to give $M/C$ new first-stage units with $CN$ second-stage units each, the variance of the mean (denoted now by $\bar{x}''$) of a two-stage sample of size $mn$ will be given by

\[
V(\bar{x}'') = \frac{\sigma^2}{mn}\left[1 - \frac{n(m-1)}{N(M-C)} + \rho_2\left\{\frac{M-Cm}{M-C}\cdot\frac{n(NC-1)}{NC} - \frac{NC-n}{NC}\right\}\right]
\]

where $\rho_2$ will now represent the intra-class correlation coefficient within first-stage units of size $NC$. The difference between the two variances can be expressed as

\[
V(\bar{x}') - V(\bar{x}'') = \frac{\sigma^2}{mn}\left[\frac{n(m-1)(C-1)}{N(M-1)(M-C)} + \rho_1\alpha_1 - \rho_2\alpha_2\right]
\]

where

\[
\alpha_1 = \frac{M-m}{M-1}\cdot\frac{n(N-1)}{N} - \frac{N-n}{N}, \qquad
\alpha_2 = \frac{M-Cm}{M-C}\cdot\frac{n(NC-1)}{NC} - \frac{NC-n}{NC}
\]

Now since

\[
\alpha_1 - \alpha_2 = \frac{n(C-1)(m-1)(MN-1)}{N(M-1)(M-C)} > 0
\]

and

\[
\frac{n(m-1)(C-1)}{N(M-1)(M-C)} > 0
\]

the conclusion is that

\[
V(\bar{x}') - V(\bar{x}'') > 0
\]

whenever $\rho_1 > \rho_2$ provided both $\rho_1$ and $\rho_2$ are positive.
In other words, a gain in precision is brought about by enlarging first-stage units whenever the intra-class correlation is positive and decreases as the size of the first-stage unit increases. It also follows that the smaller the value of $\rho_2$ the larger is the gain, so that by choosing for consolidation those first-stage units which are as different as possible the gain can be increased. Note, though, that practical considerations put a limit on the size to which the first-stage units can be increased since cost of subsampling increases with larger and larger areas. Hence the increase in precision is to be weighed against the increase in cost. As an example, Sukhatme (1953) states that in crop surveys, the variance is decreased when an administrative circle comprising a group of villages is used in place of a village as the first-stage unit of sampling, but practical considerations of cost and administrative convenience favor the use of the village.

If cost were no consideration the enlargement of first-stage units could proceed to a point of elimination of the use of first-stage units altogether and the second-stage units would be selected independently from the whole population.
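
The effect of consolidating first-stage units can be checked numerically. The sketch below is an illustration under stated assumptions, not a computation from the report: it assumes the variance of the mean has the form (sigma^2 / mn) [1 - n(m-1) / (N(M-1)) + rho * alpha], with alpha = (M-m)/(M-1) * n(N-1)/N - (N-n)/N, and it obtains the consolidated case by substituting M/C for M and CN for N. All parameter values and function names are hypothetical.

```python
def alpha(M, N, m, n):
    """Coefficient of the intra-class correlation in the variance of the mean."""
    return (M - m) / (M - 1) * n * (N - 1) / N - (N - n) / N

def variance_of_mean(sigma2, rho, M, N, m, n):
    """Variance of the mean of a two-stage sample of m first-stage units
    (out of M, each of size N) with n second-stage units drawn from each."""
    return sigma2 / (m * n) * (
        1 - n * (m - 1) / (N * (M - 1)) + rho * alpha(M, N, m, n)
    )

# Hypothetical frame: 100 first-stage units of 10 elements; sample 10 x 5.
M, N, m, n, C = 100, 10, 10, 5, 2
sigma2, rho1, rho2 = 1.0, 0.10, 0.06   # rho2 < rho1 after consolidation

v_original = variance_of_mean(sigma2, rho1, M, N, m, n)
# Consolidate C units at a time: M/C units, each of size C*N.
v_combined = variance_of_mean(sigma2, rho2, M / C, C * N, m, n)

# Closed form for alpha_1 - alpha_2, for comparison with the direct difference.
alpha_gap = alpha(M, N, m, n) - alpha(M / C, C * N, m, n)
closed_form = n * (C - 1) * (m - 1) * (M * N - 1) / (N * (M - 1) * (M - C))

print(v_original, v_combined, alpha_gap, closed_form)
```

With these values the consolidated frame gives the smaller variance, and the direct difference of the two alpha coefficients agrees with the closed-form expression.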
In the more general situation of two-stage stratified sampling where it is assumed that the parameters of the frame $M_i$, $N_{ij}$, $\sigma_i^2$ and $\sigma_{ij}^2$ for each of the characteristics are fixed, the Hansen and Hurwitz development for examining the effect of variation in size of first-stage units cannot be extended along the same lines previously discussed. Very recently, however, J. C. Koop (1955) devised an ingenious approach to the development of the variance of an estimate in two-stage stratified sampling which is based upon an alternative two-stage sampling formulation. The variance formula he derives has certain advantageous properties with respect to frame construction which will subsequently be pointed out.

The specific values which $M_i$ and $N_{ij}$ assume in each stratum depend on the method of frame construction or the measure of size which is used. Usually the sampling system is such that $M_i$ is always known, but $N_{ij}$ is known only for those psu's which are selected at the first stage of sampling. When $m_i$ and $n_{ij}$ are fixed in advance, it can be seen that the variance of the estimate of a total or mean will depend on the population values of these parameters with respect to a given frame.
In the technique for drawing a simple random sample of psu's illustrated at the end of the first chapter, the assignment of a number of psu's to a particular count unit or area segment depended solely upon the measure of size used in the construction of the area sampling frame. If an area universe is conceived of where each stratum is composed of a finite set of identifiable area segments, the choice of a measure of size can be regarded as defining the allocation of primary units to each area segment. Each measure of size, in a sense, generates a configuration of primary units on to the area universe. If at a given point of time $\sum_{j=1}^{M_i} N_{ij}$ is constant for each stratum, then specific to each measure of size in the construction of the area sampling frame, $M_i$, $N_{ij}$, $\sigma_i^2$ and $\sigma_{ij}^2$ will assume certain values. For the alternative measures of size to be compared in this thesis, the alternative frames which will be constructed will differ from each other in that each time a new measure of size is introduced a partial or total alteration of primary units will occur as some may be made larger and others smaller in size.
Consider a frame in which $M_i$ is also held constant. If another frame is constructed still keeping $M_i$ constant, but now rearranging the second-stage units, the $\sigma_i^2$'s will change and the $\sigma_{ij}^2$'s will also change except for those primary units which are unaltered. The resulting configuration of primary units under the new frame does not change the mean value of $N_{ij}$ but its variance changes depending upon the extent to which its frequency distribution is altered. Given that $M_i$ is fixed in advance, a frame must exist for which the weighted sum of the $\sigma_i^2$'s and $\sigma_{ij}^2$'s will assume the lowest possible value for a given characteristic, assuming the practical limitations imposed by the relevant measures of size available for the construction of the frame.
For two-stage stratified sampling, Koop's derivation of the variance of the estimate of a total is based upon the postulate that a predetermined number of second-stage units are to be selected from each of the primary units selected at the first stage, independent of the actual sample of primary units. In other words, the psu that is selected first will have $n_1$ su's drawn from it, the psu selected second will have $n_2$ su's drawn from it, etc. This differs from the Hansen and Hurwitz premise which, to repeat, is that the $n_j$ secondary units chosen from the primary unit that was drawn on the $j$th draw is a number predetermined by some rule for each of the $M_i$ primary units. The variance of the estimate of a total that Koop derives under his alternative two-stage sampling formulation is

[expression for $V(X')$ illegible in source]

where

\[
\bar{n}_i = \frac{m_i}{\sum_{j=1}^{m_i} 1/n_{ij}}
\]

and is therefore the harmonic mean of the sizes of the second-stage samples chosen from each of the $m_i$ selected first-stage units in the stratum. As Koop observes, this result is of practical significance in the sense that, in any given stratum, once the prescribed sample sizes for the number of second-stage units, to be taken from each of the selected first-stage units, are determined (either by optimum allocation theory or by other considerations), it is only necessary to take from each primary unit a uniform second-stage sample of a size almost equal to the harmonic mean of the separate second-stage sample sizes, to achieve approximately the same degree of precision. The harmonic mean, $\bar{n}_i$, will not in general assume integer values and hence the variance of a given estimate for any stratum will be a little greater or a little less than that for the case which would have resulted, if second-stage samples of prescribed sizes had been taken, depending upon whether the nearest integer chosen is above or below the harmonic mean.
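
The replacement of unequal second-stage sample sizes by a uniform size near their harmonic mean can be illustrated with a short computation. This is a sketch with hypothetical sample sizes; the function name is an assumption of the example, and exact fractions are used so that no rounding enters before the final step.

```python
from fractions import Fraction

def harmonic_mean(sample_sizes):
    """Harmonic mean m / sum(1 / n_j) of the second-stage sample sizes."""
    return Fraction(len(sample_sizes)) / sum(Fraction(1, n) for n in sample_sizes)

# Hypothetical prescribed second-stage sample sizes in one stratum.
prescribed_a = [2, 3, 6]
prescribed_b = [4, 5, 8]

h_a = harmonic_mean(prescribed_a)   # 3 / (1/2 + 1/3 + 1/6)
h_b = harmonic_mean(prescribed_b)   # 3 / (1/4 + 1/5 + 1/8)
uniform_b = round(h_b)              # nearest integer uniform sample size

print(h_a, h_b, float(h_b), uniform_b)
```

In the first case the harmonic mean is the integer 3. In the second it is 120/23, a little above 5, so a uniform second-stage sample of 5 lies slightly below the harmonic mean.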
In order to investigate the effect of the variation in size of psu's in any given stratum, Koop recast his variance formula along the lines adopted by Hansen and Hurwitz for the case of psu's of equal size discussed earlier. He has shown that the variation in size of psu's contributes both directly, through the term $N_{ij}$, and indirectly, through covariation with the characteristics under study, to the sampling error. The resulting formula, for any given estimate of a total, contains the intra-class correlation between the variates under study and certain measures of variation and covariation involving also the $N_{ij}$'s. In essence it is, to a certain extent, a further generalization of the Hansen and Hurwitz result to the case where the psu's are of unequal size. Under the imposed alternative Hansen and Hurwitz two-stage sampling formulation, a similar expression could not be obtained since the analysis becomes possible only when the predetermined second-stage sample size for each of the $M_i$ primary sampling units is constant.
2.3 The Control of Variation in Size
As mentioned previously, the control of variation in the size of a sampling unit may be of much
greater importance for the problem of obtaining an estimate of the aggregate of some characteristic
for the population on the basis of a sample than when estimating an average per element, percentage,
or other ratio from a sample. Hansen and Hurwitz (1953) have investigated various methods of reducing
the effect of variation in size of sampling units on the variance of the estimate of a total.
Probably the most obvious method of reducing the contribution of the variation in size of su's to the
variance of a simple unbiased estimate of a total with cluster sampling is to define su's that have a
small amount of variation in size. That is, if there are adequate resources available, such as
detailed maps and aerial photographs plus supplementary count information, a measure of size can be
developed which would project a particular configuration of su's onto the area universe so as to
considerably limit their variation in size. The Master Sample materials afford such measures of size,
and they are being used to construct area sampling frames by methods similar to that illustrated in
Chapter I.
Selection with probability proportional to size (pps) will often aid in the control of variation in
size of su's. When single-stage sampling is done by pps, an unbiased estimate of the total, $x'$, is
obtained by multiplying the characteristic total for each su by the reciprocal of its probability of
inclusion in the sample, and then adding the results for the selected units. In other words, for a
sample of m su's drawn with replacement,

$$x' = \frac{A}{m}\sum_{i=1}^{m}\frac{X_i}{A_i}$$

where $X_i$ is the total for the ith su in the sample, $A_i$ is the measure of size of the ith su in
the sample and $A$ is the aggregate measure of size for the population.
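A minimal sketch of this estimator is given below, assuming the with-replacement form $(A/m)\sum X_i/A_i$. The frame values and the function name `pps_total_estimate` are hypothetical and serve only to illustrate the calculation.

```python
import random

def pps_total_estimate(sample_totals, sample_sizes, aggregate_size):
    """Estimate a population total from m su's drawn pps with replacement:
    (A / m) * sum(X_i / A_i)."""
    m = len(sample_totals)
    return (aggregate_size / m) * sum(
        x / a for x, a in zip(sample_totals, sample_sizes)
    )

# Hypothetical frame: measures of size A_i and characteristic totals X_i.
sizes = [1, 2, 3, 4]
totals = [10.0, 20.0, 30.0, 40.0]   # exactly proportional to size in this example
A = sum(sizes)

rng = random.Random(0)
drawn = rng.choices(range(len(sizes)), weights=sizes, k=3)  # pps, with replacement
estimate = pps_total_estimate(
    [totals[i] for i in drawn], [sizes[i] for i in drawn], A
)
print(estimate)  # matches the true total (100) whenever X_i is proportional to A_i
```

The example shows why pps selection controls the effect of size: to the extent that the characteristic totals are proportional to the measure of size, every possible sample yields nearly the same estimate.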
If fairly detailed maps and other supplementary materials are available, subsampling of whole compact
segments is feasible. In this procedure an indicated average size of a segment is decided upon and
then the measure of size of each psu is defined in terms of the number of area segments or subsampling
units into which the psu will be divided. The actual procedure can be carried out in the same manner
as illustrated at the end of Chapter I, or, if the indicated average size is, say, 3.5 and the
estimate of the number of elements in a particular sampling unit is 40, then 40/3.5, or approximately
11, segments or su's can be assigned to the psu. A whole number of segments will always have to be
assigned to a psu rather than a number involving a fraction. The number of segments into which the
psu's are divided become the measures of size which serve as the basis for the sample selection.
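The assignment of a whole number of segments can be sketched as follows. The rounding rule (nearest whole number, with at least one segment per psu) is an assumption made for this illustration, and `segments_assigned` is a hypothetical helper name.

```python
import math

def segments_assigned(estimated_elements, indicated_average_size):
    """Number of segments (su's) assigned to a psu: the estimated number of
    elements divided by the indicated average size, rounded to the nearest
    whole number and never less than one (assumed rule)."""
    raw = estimated_elements / indicated_average_size
    return max(1, math.floor(raw + 0.5))

print(segments_assigned(40, 3.5))  # 40 / 3.5 = 11.43, so 11 segments
```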
If an area sampling frame is constructed in which there is a large variation in size of psu's, another
method of reducing the contribution of variation in size to the variance of unbiased estimates of
totals is to stratify the psu's into a number of size groups before the sample is drawn. Hence psu's
can be selected by use of varying sampling fractions in the different size groups. The subsampling
process can then proceed, perhaps, by using a uniform over-all sampling fraction.
Other methods which can be put to use in controlling the effect of variation in size include the
technique of applying ratio or other regression estimates, assuming that other aspects of optimum
design for estimating ratios have been approximated. Also, it is important to bear in mind that the
presence of a few sampling units of very extreme size in the population might contribute significantly
to the sampling variance of an estimate. In such situations precautions should be adopted in order to
reduce or eliminate their possible effects (e.g., large institutional populations such as prisons,
hospitals and hotels can be treated as a special class in sampling, if such populations are to be
included in the population that is being sampled. Such institutions can be identified in an area and
separate sampling within them can be provided for.)
For two-stage stratified sampling, J. C. Koop (1955) has derived some further results which throw
light on the nature of the effect of variation in size of primary sampling units when shifting from a
given frame to one which is believed to be more efficient due to a reduction in variation of size.
These results are again an aftermath of the variance formula he developed based on his alternative
two-stage sampling formulation. He considers a frame in which the sizes of the psu's are unequal and a
corresponding frame in which the psu's are equal in size. In both these frames the total number of
psu's and su's are assumed constant. In this situation the variance of any given characteristic
specific to the latter frame is less than that specific to the former, if a certain set of
inequalities on the intra-class correlation coefficients of the characteristic in question are
simultaneously satisfied for the frame in which the psu's are equal. Thus if a set of these parameters
for the former frame are known or can be estimated from a sample, inequalities can be obtained
indicating the limits between which the intra-class correlation coefficients of the latter frame
should lie for the estimate in question to have a smaller variance. The minimum sample size required
such that the estimates of the parameters in the former frame will be sufficiently precise to justify
any conclusions suggested by the set of inequalities has not as yet been investigated. Further work
has to be done in that direction, but in any event, it has been demonstrated that mere reduction in
variation in size of psu's may not result in a more efficient sampling estimate if the corresponding
changes in the essential intra-class correlations do not satisfy certain conditions.
CHAPTER III
METHODS OF COMPARISON
3.1 Introduction
In the summer of 1955 the Institute of Statistics at Raleigh, North Carolina, conducted a simple
random area sampling survey in the open country portion of eleven Southern Piedmont counties in North
Carolina in order to arrive at estimates of various agricultural characteristics, particularly total
cotton acreage. Also, in 1955 a two-stage stratified area sample survey was conducted by the Survey
Operations Unit of the Institute of Statistics in order to investigate various household
characteristics including buying practices, particularly for those households which had members that
had shopped at least once in the city of Winston-Salem in the year prior to interview. These two
surveys will be referred to, respectively, as the Cotton study and the Winston-Salem study. The
materials used in the construction of the area sampling frames for both of these studies were the
heretofore mentioned Current materials. It was thought that the Master Sample materials developed in
1943-1944 (from 1937-1938 culture in North Carolina, for example) were decidedly out of date.
John Monroe of the Institute of Statistics has summarized the characteristics of these materials as
follows:¹
(1) abandonment of township boundaries as count unit boundaries. All count units are bounded only
by county lines, roads, streams, railroads and city limits. (2) divisions within counties formed
by the major road network, rather than the township as used in the Master Sample of Agriculture
material. In most counties, this delineation renders a pie-shaped effect. (3) delineation of the
unincorporated places defined in the 1950 Census. Enumeration district maps for those places were
obtained from the Bureau of the Census, enabling the delineation of the area on the highway maps
and the exclusion of those areas from the open country count. Unincorporated areas not defined by
the Census (those places under 1,000 in population) are in the open country portions. (4) aerial
photo count in congested areas around cities. Collection of photos for such areas is made as
requests for samples are received. Use of this technique improves the accuracy of the highway maps
count considerably. (5) city and town maps are made as the cities are drawn in samples.
The specific measure of size which was used for both of these studies was (INOD)cc.
¹ The Institute of Statistics, A Record of Research: III.
In order to compare (INOD)cc against (INOF)cc and to compare (INOD)cc against both (INOD)ms and
(INOF)ms a procedure was developed which made use of these two probability area samples. In essence,
it involves the computing of weights which are used to calculate estimates of observations for a
corresponding sample had it been drawn under an alternative measure of size. These estimates are based
on the actual observations recorded for the characteristics measured in the chosen sample. The
explicit underlying assumptions are (i) for a specific comparison the area universe to be considered
remains the same for both measures of size and (ii) if the same size probability sample had been drawn
from the alternative measure of size frame, exactly the same group of count units would have been
represented although the actual configuration of the sampling units may have been different. The first
of these assumptions can be approximately satisfied by defining our initial area universe in such a
manner as to exclude, as nearly as possible, such portions for which neither Current nor Master Sample
materials are available. Once an approximate coincidence of the two area universes is established so
that effectively there exists but one such universe, the second assumption is intuitively reasonable.
Hence the initial problem is to estimate the measures which would have been observed for each sampling
unit had the area sampling frame been constructed with reference to the alternative measure of size.
After this has been accomplished particular quantities can be determined, such as estimated sampling
variances, which would upon comparison best reflect the relative efficiency of one measure of size to
another.
3.2 Proposed Technique
The initial problem was resolved by considering this approach. Suppose a su belonging to some
particular count unit is randomly drawn. Assume, further, that this count unit can be partitioned into
the number of su's assigned to it such that the measures (for any specific characteristic) are the
same for each of the su's within the count unit. If this type of partitioning of a count unit can be
done for all count units, an unbiased estimate of the measure (for any characteristic) of a su
belonging to any count unit and drawn from a frame constructed under an alternative measure of size
can be obtained. The notation to be used below should not be confused with a somewhat similar notation
used previously. $M_i$ and $N_i$ as used here are defined specifically for the derivation given below.
If the number of su's assigned to the ith count unit is denoted by $N_i$ for the measure of size
actually used in the selection of the sample, and if the number of su's assigned to the same ith count
unit is denoted by $M_i$ for the alternative measure of size, then $(N_i/M_i)x_i$ is an unbiased
estimate, where $x_i$ represents the actual observed value of the randomly chosen su which belongs to
the ith count unit. If, as rarely occurs, two or more of the selected su's belong to the ith count
unit, the estimate for each of these su's for the sample based on the alternative measure of size is
$(N_i/M_i)\left(\sum_{j=1}^{n_i} x_{ij}/n_i\right)$, where $n_i$ denotes the number of selected su's
which fall into the ith count unit.
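The weighting step described above can be sketched as follows. The data layout, the count unit labels and the function name `alternative_frame_estimates` are hypothetical; only the weight $N_i/M_i$ and the averaging of su's that share a count unit come from the text.

```python
from collections import defaultdict

def alternative_frame_estimates(observed, n_actual, m_alternative):
    """observed: list of (count_unit, x) pairs, one per selected su.
    n_actual[u]: su's assigned to count unit u under the measure actually used (N_i).
    m_alternative[u]: su's assigned to u under the alternative measure (M_i).
    Returns one estimated observation per selected su: (N_i / M_i) * mean of the
    observed values of the selected su's falling in the same count unit."""
    by_unit = defaultdict(list)
    for unit, x in observed:
        by_unit[unit].append(x)
    estimates = []
    for unit, _ in observed:
        mean_x = sum(by_unit[unit]) / len(by_unit[unit])
        estimates.append(n_actual[unit] / m_alternative[unit] * mean_x)
    return estimates

# Hypothetical sample: one su in count unit "a", two su's in count unit "b".
observed = [("a", 8.0), ("b", 3.0), ("b", 5.0)]
n_actual = {"a": 4, "b": 6}
m_alternative = {"a": 2, "b": 3}
print(alternative_frame_estimates(observed, n_actual, m_alternative))
```

Note that su's sharing a count unit all receive the same estimated value, which is one way of seeing why the within count unit variation is lost under this procedure.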
However, such a partitioning would be impossible to accomplish in practice and consequently a bias is
introduced which tends to make the variance of the sampling estimate of a mean or total smaller for
the alternative measure of size when the procedure for estimating the observations is as described
above. This can be seen as follows:
Consider a large area that is partitioned into k area segments which are called count units. Also,
assume that two sets of information or "measures of size" ($N_i$ and $M_i$) are available for
assigning a fixed number of su's, say N, to the set of k count units. Schematically,

Count Unit      I ($N_i$)      II ($M_i$)
    1             $N_1$          $M_1$
    2             $N_2$          $M_2$
   ...             ...            ...
    k             $N_k$          $M_k$
  Total            $N$            $M$
Let $x_{ij}$ be the measure for any characteristic of the jth su lying in the ith count unit under
scheme I. Let $y_{ij}$ be the measure of the same characteristic for the jth su contained in the ith
count unit under scheme II. Note that

$$X_i = \sum_{j=1}^{N_i} x_{ij} = Y_i = \sum_{j=1}^{M_i} y_{ij}$$

where $X_i$ and $Y_i$ denote the total of the characteristic for the ith count unit under schemes I
and II respectively. We can write

$$\sigma_x^2 = \frac{1}{N}\sum_{i=1}^{k}\sum_{j=1}^{N_i}(x_{ij}-\bar{x})^2 \qquad (1)$$

Similarly,

$$\sigma_y^2 = \frac{1}{M}\sum_{i=1}^{k}\sum_{j=1}^{M_i}(y_{ij}-\bar{y})^2 \qquad (2)$$

with $\bar{x}$ and $\bar{y}$ the means per su under the respective schemes. If a simple random sample
is drawn, scheme I would be preferred to scheme II if $\sigma_x^2 < \sigma_y^2$. Note that (1) can be
written as

$$\sigma_x^2 = \sum_{i=1}^{k}\frac{N_i}{N}\,\sigma_{w_i x}^2 + \sigma_{bx}^2$$

and (2) can be written as

$$\sigma_y^2 = \sum_{i=1}^{k}\frac{M_i}{M}\,\sigma_{w_i y}^2 + \sigma_{by}^2$$

If the $y_{ij}$'s are estimated by the weighting procedure described earlier, a bias will be
introduced in the estimate of $\sigma_y^2$ since the within count unit component of variance
$\sigma_{w_i y}^2$ is underestimated. This bias can be regarded as insignificant if the between count
unit component of variance dominates the expression. These results can be extended in a similar
fashion to a two-stage stratified area sample.
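The direction of this bias can be illustrated numerically. The sketch below assumes the usual decomposition of a population variance into a weighted within count unit component and a between count unit component; the data and function names are hypothetical.

```python
def pop_variance(values):
    """Population variance (divisor equal to the number of values)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def decompose(units):
    """units: list of lists, the su measures within each count unit.
    Returns (within, between) components whose sum is the overall variance."""
    n_total = sum(len(u) for u in units)
    grand_mean = sum(sum(u) for u in units) / n_total
    within = sum(len(u) / n_total * pop_variance(u) for u in units)
    between = sum(
        len(u) / n_total * (sum(u) / len(u) - grand_mean) ** 2 for u in units
    )
    return within, between

# Hypothetical su measures under scheme II for two count units.
true_y = [[2.0, 6.0], [10.0, 10.0, 16.0]]
# The weighting procedure gives every su in a count unit the same value,
# here taken as the count unit mean.
weighted_y = [[sum(u) / len(u)] * len(u) for u in true_y]

print(decompose(true_y))      # within component is positive
print(decompose(weighted_y))  # within component vanishes, between is unchanged
```

In this example the between component is identical in both cases, so the understatement of the variance equals the whole within count unit component.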
In the comparison of measures of size using the new materials as opposed to the Master Sample
materials certain minor difficulties occur. Since, as previously stated, the count unit boundaries
under the Current materials have been considerably revised, a correspondence of count units for the
Master Sample materials and the Current materials was not readily available. In order to obtain such a
correspondence it was necessary to obtain equivalent area segments on both maps, which at times
required the combining of certain count units on either the Master Sample map or the Current map, or
both. In the Cotton study, where (INOD)cc was the measure of size used in the construction of the area
sampling frame, parts of count units were occasionally combined on the Master Sample map to assure a
precise equivalence with count units on the Current map.
For the Winston-Salem study, the area segments for the alternative measure of size, (INOD)ms, were so
chosen as to coincide with the area segments (not necessarily count units) selected using (INOD)cc as
the measure of size. In other words, if a count unit was delineated into a smaller segment containing
the selected su, the matched area on the Master Sample map was made to correspond to the delineated
segment rather than to the count unit on the Current map. In such cases where an alteration of a count
unit was made, the number of su's assigned to this, in a sense, redefined count unit was computed by
dividing the total (INOD)ms in the particular redefined count unit by the indicated average size of a
sampling unit under (INOD)ms.
CHAPTER IV
CALCULATIONS AND RESULTS
4.1 The Cotton Study
The area universe is the open country portion of Crop Reporting District No. 8, an area with eleven
counties in the Southern Piedmont area of North Carolina where cotton is by far the predominant crop.
The populations under investigation for the comparison of alternative measures of size are:
(1) Number of Milk Cows
(2) Number of Beef Cattle
(3) Number of Hogs and Pigs
(4) Number of Cotton Fields
(5) Cotton Acreage
(6) Number of Corn Fields
(7) Corn Acreage
(8) Number of Wheat Fields
(9) Wheat Acreage
The sampling unit consists of an area segment having an expectation of four cotton fields. A random
sample of 125 su's was selected on the basis of the estimated number of cotton fields in the eleven
counties considered. No stratification was used and the observations were recorded on the basis of the
"closed segment" approach. This method consists of taking observations on tracts of land which are
confined within the boundaries of the selected area segment. In the "farm headquarters" approach
observations are taken on those tracts of land which are owned by farmers who have their headquarters
in the chosen area segment. These tracts of land are thus not confined within the boundaries of the
selected sampling unit.
In order to reconcile the Master Sample materials with the Current materials, the 3 su's which fell in
Mecklenburg County were excluded from the comparisons as were 2 others in Richmond County and 1 in
Cabarrus County. These su's were eliminated in the process of redefining the area universes so as to
be approximately the same for both the old and new materials. All remaining 119 su's were used in the
comparison of (INOD)cc vs. (INOD)ms and (INOD)cc vs. (INOF)ms, but out of these only 53 were used for
the comparison of (INOD)cc vs. (INOF)cc. This was due to the fact that the (INOF)cc measure of size
was available for just Union and Cleveland counties at the time of this study.
The variance of a mean is given by $\frac{N-n}{N-1}\cdot\frac{\sigma^2}{n}$ for a simple random
sample. The statistics

$$s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2 \quad\text{and}\quad s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar{y})^2$$

were computed for the 9 characteristics under examination. The expected value of $s^2$ is, of course,
$\frac{N}{N-1}\sigma^2$. The relative efficiency of (INOD)cc was then calculated for each of the
comparisons. The results of (INOD)cc vs. (INOF)cc, (INOD)cc vs. (INOD)ms, and (INOD)cc vs. (INOF)ms
are given in Tables 2, 3, and 4, respectively.
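The calculation can be sketched as follows. The text does not spell out the formula for relative efficiency; the form $100\,s_y^2/s_x^2$ used here is inferred from the tabulated values (for example the Number of Milk Cows entry of Table 2) and should be read as an assumption. Function names are illustrative only.

```python
def sample_variance(values):
    """s^2 with divisor n - 1, as computed for each characteristic."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def variance_of_mean(sigma2, n, N):
    """Variance of the mean of a simple random sample of n from N units."""
    return (N - n) / (N - 1) * sigma2 / n

def relative_efficiency(s2_x, s2_y):
    """Relative efficiency in per cent of the measure of size behind s2_x
    (assumed form: 100 * s_y^2 / s_x^2; values above 100 favour it)."""
    return 100.0 * s2_y / s2_x

# Number of Milk Cows, Table 2: s_x^2 = 350.6821, s_y^2 = 186.0112.
print(round(relative_efficiency(350.6821, 186.0112), 3))
```

Under this reading a value below 100 indicates that the alternative measure of size gave the smaller estimated variance for that characteristic.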
Table 2
COMPARISON OF ALTERNATIVE MEASURES OF SIZE: THE COTTON STUDY (n = 53)

                               (INOD)cc      (INOF)cc      Relative
Characteristics                 s_x^2         s_y^2       Efficiency
Number of Milk Cows            350.6821      186.0112       53.043
Number of Beef Cattle          569.1517      924.9067      162.506
Number of Hogs and Pigs        145.1299      239.6976      165.161
Number of Cotton Fields          4.4528        5.6654      127.232
Cotton Acreage                 105.4097      109.1335      103.533
Number of Corn Fields            3.8483        6.2487      162.376
Corn Acreage                    55.2850       82.2569      148.787
Number of Wheat Fields           1.8229        3.1630      173.515
Wheat Acreage                  123.2241      148.3140      120.361
Table 3
COMPARISON OF ALTERNATIVE MEASURES OF SIZE: THE COTTON STUDY (n = 119)

                               (INOD)cc      (INOD)ms      Relative
Characteristics                 s_x^2         s_y^2       Efficiency
Number of Milk Cows            320.9819      373.7680      116.445
Number of Beef Cattle          541.8302      610.8725      112.742
Number of Hogs and Pigs        930.1684      888.7839       95.551
Number of Cotton Fields          4.7754        3.9805       83.354
Cotton Acreage                 111.8288       71.4165       63.862
Number of Corn Fields           24.0786       35.7756      148.578
Corn Acreage                   258.1610      295.8185      114.587
Number of Wheat Fields           5.5266        4.7000       85.043
Wheat Acreage                  154.7136      112.3320       72.606
Table 4
COMPARISON OF ALTERNATIVE MEASURES OF SIZE: THE COTTON STUDY (n = 119)

                               (INOD)cc      (INOF)ms      Relative
Characteristics                 s_x^2         s_y^2       Efficiency
Number of Milk Cows            320.9819      245.4246       76.461
Number of Beef Cattle          541.8302     2373.6506      438.080
Number of Hogs and Pigs        930.1684     1832.3745      196.994
Number of Cotton Fields          4.7754        5.7675      120.775
Cotton Acreage                 111.8288       81.5724       72.944
Number of Corn Fields           24.0786       36.1355      150.073
Corn Acreage                   258.1610      274.1246      106.184
Number of Wheat Fields           5.5266        3.8709       70.041
Wheat Acreage                  154.7136      199.4962      128.945
The results of Table 2 and Table 4 tend to indicate that, in general, for an all purpose sample of
agricultural characteristics using the closed segment approach, (INOD) is a better measure of size for
the construction of an area sampling frame than (INOF). This is particularly evident for (INOD)cc vs.
(INOF)cc, although there is only a slight gain in efficiency for the important characteristic of
cotton acreage. For (INOD)cc vs. (INOF)ms there even appears to be a loss of efficiency in the
estimation of cotton acreage.
Table 3 suggests that there is apparently not much gain in efficiency, if any, when (INOD)cc is the
measure of size as opposed to (INOD)ms. In fact in this comparison there is, perhaps, a substantial
loss in efficiency in the estimation of the cotton acreage. It should be borne in mind, however, that
there is a bias operating in favor of the Master Sample materials although the assumption has been
that it is of relatively small magnitude.
4.2 The Winston-Salem Study
The area universe covers the open country portion of 16 counties in the Northwest part of North
Carolina. The populations under investigation for the comparison of (INOD)cc vs. (INOD)ms are:
(1) Number of persons under 12 years of age
(2) Number of persons 12-18 years of age
(3) Number of persons 18 years of age and older
(4) Number of households receiving a newspaper daily
(5) Number of households having a radio
(6) Number of households having a television set
(7) Number of households with income less than $3,000
(8) Number of households that shopped in Winston-Salem
(9) Number of white households
(10) Number of households
The psu consists of an area segment having an expectation of six households. The universe was
stratified into 35 strata and $M_i = 500$ psu's were assigned to each of them. Every psu is composed
of $N_{ij} = 2$ su's, and a two-stage stratified sample consisting of 105 su's was selected. This was
accomplished by selecting $m_i = 3$ psu's from each stratum and $n_{ij} = 1$ su from each psu. The
scheme of stratification and the number of psu's assigned to each county is presented in Table 5. Map
count information for both (INOD)cc and (INOD)ms is also included.
In the process of redefining the area universe such that it would be approximately the same for the
available information on both measures of size, Forsyth and Guilford counties were excluded from the
comparison. Hence, 21 su's were eliminated from the study. Fourteen other su's distributed among the
remaining counties were also removed from the study for the same reason. Thus the comparison of the
two measures of size was based on a sample of 70 su's. This necessitated the introduction of a small
bias in the estimate of the sampling variance since $m_i$ was not equal to 3 for every stratum
included in the sample.
The variance of the estimate of a total for a two-stage stratified sample is (see page 12)
which, for the sampling design for the Winston-Salem study, reduces to
1~ 2}
M
+ M ~ 40'ij
J=l
or
;-
-
S
m ~
- ...
.; ="1
The statistic
2
sb
U!-3
n:X
S
'7l
III
~
i=l
~
B b.
1.
2
0'.
1.
+
~J
Table 5
OPEN COUNTRY STRATIFICATION: THE WINSTON-SALEM STUDY

              No. of    psu's assigned    Map Count    (INOD)cc    Map Count    (INOD)ms
County        Strata    (500/Stratum)     (INOD)cc     per psu     (INOD)ms     per psu
Forsyth        4            2000            12374        6.19        2402         1.20
Davie          1             500             2742        5.48        1969         3.94
Rowan          3            1500             7483        4.99        4044         2.70
Davidson       3            1500             8024        5.35        4564         3.04
Randolph       3            1500             6891        4.59        3402         2.23
Guilford       3            1500             8267        5.51        4580         3.05
Rockingham     3            1500             6402        4.27        4260         2.84
Stokes         1.64          820             4342        5.30        2978         3.63
Surry          2.36         1180             6337        5.37        4224         3.58
Yadkin         1.48          740             4218        5.70        3297         4.46
Iredell        2.52         1260             6512        5.17        3962         3.14
Alexander      1             500             2827        5.65        1693         3.39
Wilkes         3            1500             7983        5.32        4803         3.20
Alleghany       .56          280             1807        6.45        1552         5.54
Ashe           1.44          720             3695        5.13        2599         3.61
Watauga        1             500             3191        6.38        2483         4.97
where

$$s_{b_i x}^2 = \frac{\sum_{j=1}^{m_i} n_{ij}\,(\bar{x}_{ij}-\bar{x}_i)^2}{m_i-1}$$

was computed for each of the characteristics under examination for both measures of size (i.e.,
$s_{bx}^2$ and $s_{by}^2$ were computed). $s_b^2$ represents the between primary unit within stratum
mean square from an analysis of variance point of view.
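As an illustration, the between primary unit within stratum mean square for a single stratum can be computed as sketched below. The observations are hypothetical and the function name `between_psu_mean_square` is not from the text; the formula is the sum over selected psu's of $n_{ij}(\bar{x}_{ij}-\bar{x}_i)^2$ divided by $m_i - 1$.

```python
def between_psu_mean_square(psu_samples):
    """psu_samples: list of lists, the second-stage observations taken from
    each of the m_i selected psu's of one stratum.
    Returns sum_j n_ij * (mean_ij - stratum_mean)^2 / (m_i - 1)."""
    m = len(psu_samples)
    n_total = sum(len(s) for s in psu_samples)
    stratum_mean = sum(sum(s) for s in psu_samples) / n_total
    between_ss = sum(
        len(s) * (sum(s) / len(s) - stratum_mean) ** 2 for s in psu_samples
    )
    return between_ss / (m - 1)

# Hypothetical stratum with three selected psu's and one su observed in each,
# matching the m_i = 3, n_ij = 1 design of the Winston-Salem study.
print(between_psu_mean_square([[2.0], [4.0], [9.0]]))
```

With one su per psu the statistic reduces to the ordinary sample variance of the three observed values in the stratum.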
The expected value of $s_{b_i}^2$ is, in general (under the Koop two-stage sampling formulation),
~
~
ij
1
1\
~
2
a. j
1
j-l Nij-l
+
Where
and
Where
-
~-
This result is derived in Appendix B. In the Winston-Salem study (which satisfies both the Hansen and
Hurwitz and Koop sampling formulations since $n_{ij} = 1$ and is constant) this reduces to
S
M
1 ';--, 2
2 ~
M
2
>. aij
E
sb·~ u..,1 ai
+ -M -i-1. ~'r- &.l.
j-l
or
S
.~
JC4
M
u-1
nai2 + a
i-l ~T"',..
Wi
since when the Nij IS are ecpal, say to Hi"
B.
S
4l~M:rM
...... M
•
i-l -
ai.
j.!.
~
2
0i
N.
as explained in Appendix B. In this case, $N_i = 2$ and hence the above formula.
Thus the statistic $s_b^2$ was thought to be appropriate for the calculations of the relative
efficiency of (INOD)cc vs. (INOD)ms since for the Winston-Salem study $m_i = 3$. The results of this
comparison are given in Table 6.
Table 6
COMPARISON OF ALTERNATIVE MEASURES OF SIZE: THE WINSTON-SALEM STUDY (n = 70)

                                                    (INOD)cc     (INOD)ms     Relative
Characteristics                                      s_bx^2       s_by^2     Efficiency
Number of persons under 12 years of age              11.1706      15.2599     136.6077
Number of persons 12-18 years of age                  3.4325       4.1119     119.7932
Number of persons 18 years of age and older          15.3254      38.9644     254.2472
Number of households receiving newspaper daily        1.8690       4.9046     262.4184
Number of households having a radio                   4.0992       7.8179     190.7177
Number of households having a television set          1.9167       2.5675     133.9542
Number of households with income less than $3,000     1.5556       4.2919     275.9000
Number of households that shopped in Winston-Salem    2.2500       4.2529     189.0178
Number of white households                            2.6984       6.8185     252.6868
The results of Table 6 clearly tend to indicate that for various household characteristics (INOD)cc is
a superior measure of size for constructing an area sampling frame as opposed to (INOD)ms. Almost all
of the 10 characteristics under study evidence a large gain in efficiency with the frame constructed
on the basis of (INOD)cc.
CHAPTER V
SUMMARY AND CONCLUSIONS
The objective of this thesis is to compare alternative measures of size in the construction of area
sampling frames in North Carolina. The specific measures of size under examination are (INOD)cc and
(INOF)cc (referring respectively to indicated number of dwellings and indicated number of farms as
obtained from counts of current highway maps) and (INOD)ms and (INOF)ms (referring to counts provided
by Master Sample materials). The criterion of comparison has been constrained to the relative effect
of these alternative measures of size on the variance of sample estimates. In order to compare
(INOD)cc against (INOF)cc and to compare (INOD)cc against both (INOD)ms and (INOF)ms a procedure was
developed which made use of two probability area samples. The first of these was a study of
agricultural characteristics in a sample of closed segments in the 8th Crop Reporting District of
North Carolina and is referred to as the Cotton study. The second was a trade area survey of
households in 16 counties surrounding Winston-Salem, North Carolina and is referred to as the
Winston-Salem study. Both of these studies were drawn from frames constructed on the basis of
(INOD)cc. The procedure, in essence, involves the computing of weights which are used to calculate
estimates of observations for a corresponding (matched) sample had it been drawn under an alternative
measure of size. These estimates are based on the actual observations recorded for the characteristics
measured in the selected sample.
In the Cotton study the results of the comparison of alternative measures of size indicate that, in
general, for an all purpose sample of agricultural characteristics, (INOD) is a better measure of size
for the construction of an area sampling frame than (INOF). However, there is apparently little gain
in efficiency, if any, when (INOD)cc is the measure of size as opposed to (INOD)ms. In fact in this
comparison there is, perhaps, a substantial loss in efficiency for (INOD)cc in the estimation of total
cotton acreage. On the other hand, the results of the Winston-Salem study clearly suggest that for
various household characteristics (INOD)cc is a superior measure of size for constructing an area
sampling frame as opposed to (INOD)ms.
The inconclusive results of the Cotton study with respect to (INOD)cc vs. (INOD)ms raise the question
of whether or not the (INOD) measure of size is, in fact, an adequate source of information for
developing a frame which is supposed to contain sampling units of approximately equal size with
respect to number of cotton fields, particularly when the "closed segment" approach is used. It is
quite possible that the assumption of a high degree of association between cotton fields and indicated
number of dwellings is not justified. In any event, if some such relationship does exist it may be
more beneficial to compare the alternative measures of size under the "farm headquarters" approach
since the relationship between the measure of size and the sampling unit would be more meaningful in
that situation.
This difficulty does not appear to the same extent in the Winston-Salem study since (INOD) can surely
be expected to be an appropriate measure of size for constructing a frame "to equalize" the size of
sampling units with respect to households. The results of this study lead to less tenuous conclusions.
(INOD)cc demonstrated a large gain in efficiency for almost all of the 10 characteristics under
examination.
The construction of sampling frames is of fundamental importance in sampling surveys and further
investigations for determining the most suitable set of information available, with respect to a
specific type of sampling inquiry, in the development of a frame should certainly be encouraged.
Recent advances in the theory of frame construction should also be more fully explored, and when
possible large scale sampling surveys should incorporate into the sampling design frames that
inherently provide a basis for the comparison of alternative measures of size.
APPENDIX A
Table 7
SAMPLING FRAMES FOR ALTERNATIVE MEASURES OF SIZE:
COTTON STUDY
Area Segment
4f
( INOD) 00
1
2
3
13
7
2
14
4
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
7
1
1~
10
4
8
3
6
1
2
5
1
20
5
10
3
5
3
4
4
16
9
9
23
9
13
8
36
4
4
4
37
38
7
40
5
3
34
35
39
8
Sampling Units Assigned
(INOF)oo
(INOD)ms
11
15
7
8
2
14
4
11
6
5
2.
2.
8
3
16
10
4
9
2
7
1
2
5
2
8
4
13
5
5
4
5
5
8
3
13
13
4
11
3
8
2
3
2
3
6
6
7
8
7
4
6
8
17
10
10
22
9
12
9
12
12
33
12
10
8
8
5
5
7
5
5
8
6
4
6
5
4
8
7
7
(INOF)ms
15
9
2
8
6
5
9
3
13
13
5
12
3
9
2-
3
2
3
7
g
7
6
6
7
12
7
16
16
21
7
13
10.0
6
6
5
6
8
7
- 37 Area Segment
Sampling Units Assigned
#
(INOD )00
(IHOF')oo
(INOD)ms
(INOF)ms
41
3
21
fl45
~
1
21
9
14
3
19
10
12
5
7
10
2
22
10
10
4
6
10
11
8
11
11
1-
42
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
fY7
88
5
6
11
11
10
2
6
3
6
10
9
12
11
4
12
6
4
14
9
18
5
11
8
8
9
7
2
36
19
3
6
8
5
5
2
6
6
8
2
4
3
4
5
2
2
2
2
3
3
3
5
2
8
4
11
10
6
4
11
7
22
6
12
9
12
11
3
1
19
16
11
3
9
8
4
3
6
7
3
3
4
3
3
7
2
3
4
3
3
3
10
1
6
4
9
10
6
6
12
7
23
6
12
10
11
10
3
1
22
15
5
1
5
6
5
3
6
7
1
4
5
4
3
8
3
4
4
4
3
4
- 38 Area. Segment
Sampling Units Assigned
:JF
(INOD)oo
89
90
91
92
93
94
95
96
97
98
10
4
4
14
4
3
2
2
5
9
6
9
99
100
101
22
102
20
103
104
20
105
106
101
108
109
110
111
112
113
114
115
116
U7
118
119
4
15
3
5
3
1
5
5
5
(INOF)oo
(INOD)ms
11
3
8
25
6
7
7
6
22
5
:3
:3
6
3
4
11
9
12
7
10
10
12
4
5
5
2
22
15
5
4
)~
3
3
3
5
5
2
2
5
4
4
7
5
2
2
2
2
2
(INOF)ms
4
7
2
5
5
5
2
2
4
6
3
3
3
2
4
1
1
4
4
5
4
Table 8
SAMPLING FRAMES FOR ALTERNATIVE MEASURES OF SIZE:
WINSTON-SALEM STUDY
Area Segment;
4,~
1
2
3
4
5
6
7
8
9
10
11
J2
13
14
15
16
17
18
19
20
21
22
suts Assigned
( INOD) 00 ( I NOD)me
3
5
5
1
2
3
1
~
6
1
3
4
4.
4
2
3
2
2
4
2
4
3
23
2
24
25
26
27
3
2
5
3
4
28
29
30
31
32
33
34
35
3
2
7
5
3
2
6
1
7
9
2
6
7
5
4
2
9
3
1
6
1
5
2
2
2
3
7
4
3
3
4
5
3
3
1
2
3
1
1
2
3
6
Area Segment
41-
36
37
38
39
40
41
42
43
J.l4
L6
46
47
L~8
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
SU's Assigned
(INOD)oo (INOD)ms
3
1
1
1
4
1
1
1
1
1
2
2
1
1
2
1
1
4
2
2
2
5
3
4
1
5
2
4
2
2
2
5
4
3
2
3
8
2
3
3
1
2
4
1
3
2
3
7
4
2
2
6
4
6
2
3
3
4
4
4
2
4
7
2
3
2
1
1
4
2
APPENDIX B
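In the analysis of variance the following algebraic identity for the i-th stratum
is well known. The notation has been kept consistent with that used in section 2.1.

$$\sum_{j=1}^{m_i} \sum_{k=1}^{n_{ij}} (x_{ijk} - \bar{x}_i)^2 = \sum_{j=1}^{m_i} \sum_{k=1}^{n_{ij}} (x_{ijk} - \bar{x}_{ij})^2 + \sum_{j=1}^{m_i} n_{ij} (\bar{x}_{ij} - \bar{x}_i)^2 = W_i + B_i$$

where $\bar{x}_i = \frac{1}{n_i} \sum_{j=1}^{m_i} \sum_{k=1}^{n_{ij}} x_{ijk}$, with $n_i = \sum_{j=1}^{m_i} n_{ij}$, and $\bar{x}_{ij} = \frac{1}{n_{ij}} \sum_{k=1}^{n_{ij}} x_{ijk}$.

When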
all the primary units have the same number of second-stage units,
say $N_{ij} = N_i$, it will be seen that ~ 1J,00i .. O"i from the following considerations:
"Nij
III
'"
LJ x. 'k ..
k=l
since
1.J
~'i .N.
J
1.
..
111
We have

$$B_i = \sum_{j=1}^{m_i} n_{ij} (\bar{x}_{ij} - \bar{x}_i)^2$$

For convenience the subscript $i$ will be dropped in the derivation of $E(B_i)$.
We have

$$B = \sum_{j=1}^{m} n_j (\bar{x}_j - \bar{x})^2 = \sum_{j=1}^{m} n_j \bar{x}_j^2 - n \bar{x}^2 , \quad \text{where } n = \sum_{j=1}^{m} n_j$$
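The decomposition of the total sum of squares into within-unit and between-unit parts, and the shortcut form of the between-unit part, can be checked numerically. The following is a minimal sketch, not part of the original report: the function name `decompose` and the data are made up for illustration, and exact rational arithmetic is used so that the equalities hold without rounding.

```python
from fractions import Fraction

def decompose(groups):
    """Sums of squares for one stratum.

    groups: one list of observations per sampled primary unit (hypothetical data).
    Returns (total, within, between, shortcut), where shortcut is
    sum(n_j * mean_j**2) - n * grand_mean**2.
    """
    n = sum(len(g) for g in groups)                      # n = sum of the n_j
    grand_mean = Fraction(sum(sum(g) for g in groups), n)
    means = [Fraction(sum(g), len(g)) for g in groups]   # unit means
    total = sum((x - grand_mean) ** 2 for g in groups for x in g)
    within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    shortcut = sum(len(g) * m ** 2 for g, m in zip(groups, means)) - n * grand_mean ** 2
    return total, within, between, shortcut

# Made-up observations for three primary units of unequal size.
sample = [[3, 5, 4], [7, 9], [2, 2, 6, 6]]
total, within, between, shortcut = decompose(sample)
print(total == within + between, between == shortcut)
```

Both comparisons hold for any non-empty groups, since the identities are algebraic and do not depend on the sampling scheme.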
therefore,

$$E\left( \sum_{k=1}^{n_j} x_{jk} \right)^2 = n_j^2 \, E(\bar{x}_j^2) = n_j^2 \left[ \frac{\sigma_j^2}{n_j} \cdot \frac{N_j - n_j}{N_j - 1} + \mu_j^2 \right] = n_j \sigma_j^2 \, \frac{N_j - n_j}{N_j - 1} + n_j^2 \mu_j^2$$
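The expectation of the squared sample total can be verified by enumerating every possible sample. The sketch below is an illustration only and rests on two assumptions not spelled out at this point in the text: the $n_j$ second-stage units are drawn by simple random sampling without replacement, and $\sigma_j^2$ is the population variance with divisor $N_j$ (which is what the factor $N_j - 1$ in the formula implies). The function names and the population values are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def expected_square_of_total(population, n):
    """Exact E[(sum of the sample)^2] over all equally likely samples of size n."""
    samples = list(combinations(population, n))
    return Fraction(sum(sum(s) ** 2 for s in samples), len(samples))

def formula(population, n):
    """n * sigma^2 * (N - n)/(N - 1) + n^2 * mu^2, with sigma^2 using divisor N."""
    N = len(population)
    mu = Fraction(sum(population), N)
    sigma2 = sum((x - mu) ** 2 for x in population) / N
    return n * sigma2 * Fraction(N - n, N - 1) + n ** 2 * mu ** 2

# Made-up values for the second-stage units of one primary unit.
population = [1, 4, 4, 7, 10]
for n in range(1, len(population)):
    print(n, expected_square_of_total(population, n) == formula(population, n))
```

Enumeration is practical only for very small populations; it is used here because it gives the expectation exactly rather than by simulation.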
Thus
.
$-\mu^2$ (where it must be remembered that $N_{ij} = N_j$ and $\mu_{ij} = \mu_j$ in view of the fact that the
subscript $i$ has been dropped)
SOl
and on the basis of Koop's sampling formulation, where the $n_j$ are fixed in advance
for each of the psu's sampled, we find
.I~ nn_l f.
lP7q
P:J
l
If!. l}]
+
M-l
(where
0
jJ.
2
ClI
o~
IJ.:L
and
~
..
~J.. )
i.e.
with the subscript i included.
To complete the problem, it can similarly be shown that

$$E(W_i \mid j = 1, 2, \ldots, m_i) = \sum_{j=1}^{m_i} (n_{ij} - 1) \, \sigma_{ij}^2 \, \frac{N_{ij}}{N_{ij} - 1}$$
::~a:
m)('J . l:r~ i C1~J
1
E[E(WilJ-l.2, ....
1 j-l
Ilfi
1 :2 2 !~L
((
II
.. Mi
j=l 0ij Nij-I L!.ni-mi)J
Ni~lJ
(i
ij
0=1
N
("ij -1)
l.
~
BIBLIOGRAPHY
Deming, W. E. 1950. Some Theory of Sampling. John Wiley and Sons, Inc., New York.
Hansen, M. H., and Hurwitz, W. N. 1943. On the theory of sampling from finite
populations. Ann. Math. Stat. 14:333-362.
Hansen, M. H., Hurwitz, W. N., and Madow, W. G. 1953. Sample Survey Methods and
Theory: Vol. 1. John Wiley and Sons, Inc., New York.
Jessen, R. J., and Thompson, D. J. 1953. Design and Analysis of Surveys, Chapter 8.
Dittoed Notes. Iowa State College, Ames, Iowa.
King, A. J., and Jessen, R. J. 1945. The master sample of agriculture. Jour.
Amer. Stat. Assn. 40:38.
Koop, J. C. 1955. Sample Survey of Labor Force in Rangoon. A Study in Methods.
Union Government Printing and Stationery, Rangoon, Burma.
Sukhatme, P. V. 1953. Sampling Theory of Surveys With Applications. Iowa State
College Press, Ames, Iowa.
Yates, F. 1949. Sampling Methods for Censuses and Surveys. Charles Griffin and
Co., London.