Boyd, J.N.Investigation of area sub-stratification and the efficiency of the Minor Civil Division as a primary sampling unit in estimating agricultural characteristics in North Carolina."

Gertrude M. Cox
INVESTIGATION OF AREA SUB-STRATIFICATION .AND THE EFFICIENCY
OF THE MINOR CIVIL DIVISION AS A PRIMARY SAMPLING
UNIT IN ESTIMATING AGRICULTURAL CH.AR.ACTERISTICS
IN NORTH C.AROLIlifA
:By
Joe Nelson :Boyd
•
TABLE OF CONTENTS
•
Introduction
1
~ea
3
Sub-stratification
Investigation of the Efficiency of the Minor Civil Division
(M.C.D.) as a Primary Sampling Unit and Master Sample
Segments as the Sub-sampling Units
16
Method of Stratification
16
Basic Sampling Plan
17
Calculations and Results
17
Unbiased Estimate
17
Ratio Estimate
18
Regression Estimate
19
Appendix
28
References
36
LIST OF T.A:BL1IlS lIND MAPS
Table I.
12
Table II. nias of the Estimate in Per Cent for Area
Sub-stratifioation with 20 Strata
13
Table III. Comparison of the Coefficients of Variation and
the RelaUve Efficienoies of Different Hethods
Using 10 Strata If.hore Ono County is Selected
with P.P.S. from Each Stratum
15
Comparison of the :Bet\'teen Primary Sampling Unit
Component of the (Coefficient of Variation)2
for Approximately 200 Strata of M.C.D.'s Using
Different Methods of Estimation
21
Table IV.
•
Comparison of the Coefficients of Variation and
the Relative Efficiencies of Different Methods
Using 20 Strata, Where One County is Seleoted
with P.P.S. from Each Stratum
~stimate
Table V.
:Bias of the Ratio
in Per Cent
Table VI.
Comparison of the 13etween Primary Sampling Hnit
Component of the (Coefficient of Variation)
for Stratification by Current Census Information
with that for Past Census Information Using the
Unbiased Estimate
Table VII. Comparison of the (Coefficient of Variation)2
of 20 Strata When the County is the Primary
Sampling Unit and 20 und 197 Strata Whore the
M.C.D. is the Primary Sampling Unit
Map I.
Map II.
22
23
25
Types of Farming Aroas in North Carolina with
County 30undaries Presorved
4
Dasic 10 and 20 Strata by County
5
INTRODUCTION
Tho work reported in this thesis is n
reported by Lillinn II.. Madow
co~tinuntion
of n study
[lJ ,.
Mrs. MndQl;iT used a t\'!o stngo sampling plan whore the county was tho
primnry sl'lr.l];lling unit and the Master Sample segments wore tho secondnry
I,.
sampling units.
if
If a simple unbiased estimate were used, it was deter-
mined that 20 counties were not sufficient to estimate most items with an
accuracy of a 5
PCI'
cent coefficient of variation und even samples of 40
counties would not often yield estimates within a 5 per cent coefficient
of variationo
As cost coneiderations usually limit most surveys to 20 counties or
ono-fifth the total number in North Carolina, Mrs. Madow proceeded to
investigate the accuracy of ostimates other than the simple unbiasod
estimate.
These other estimates were
(1)
ratio to estimated numbor of farms at time of survey
(2)
ratio to same characteristic at a previous census
(3)
regression estimate.
Little improvement was found by using the first estimate, but it was found
that 20 counties were adequate for most items when the second was used.
The third showed only slight gains over the second.
The chief objection
to those estimates is that they arc more laborious to compute and in
estimates (2) and (3) compnr2.ble data nay not be available at an earlier
Census date.
Since cost considerations usually limit most surveys to 20
co~~ties
and since ostimates (2) and (3) require information which may not be
:JJ
Refer to King nnc1 Jessen
[2J •
available. how can estimates based on 20 counties be improved?
The
research on this subject' in this thesis was divided into two parts.
I.
Investigation of the use of area sub-stratificntion where tho
Minor Civil Division (MeD) is the unit of sub-stratification.
II.
Investigation of the efficiency of the MCD as a primary
s~Jling
unit and Master Srurrple segnents as tho sub-sampling units rather than the
county as a primary unit and Master Sample segments as sub-snnpling units
as used previously.
In connection with II above, the
1.
followill{~
conparisons were made:
The rntio to 1945 nUMber of farms estimate with an unbiased
estimate and the bias of the ratio estimate.
2.
A regression estimate with a ratio to 1945
n~~bor
of farms
estimate.
3. A purely geographic sub-stratification with four sub-stratifications by socia-economic charactoristics.
4. A sub-stratification by current census information with that
of past census information.
5.
A stratification of MCDl s in 200 strata with that of 20 strata
where tho MCD is the primary
s~ling
unit.
The sub-stratification items used in these investigations were:
1.
days worked off farms in 1940
2.
value of products sold in 1940
3.
average niles from all weather roaa.s in 1945
4.
farns operated by tenants in 1945.
The items to be estimated were:
1.
tobacco acreage in 1945
2.
cotton
3.
days worked off farms in 1945.
acreD{~c
in 1945
.AREA S'UB-STIiATIFICATION
Tho purpose of area sub-stratification is to adjust the szmple taken
fran tho prinary s~)ling unit (a county, in this part of the study) in such
a
so as to make it representative of the stratum from which it was
w~
selectod rather
tl~
representative of tho prinary sanpling unit.
In order to use area sub-stratification, the populntion nust be capable
of being divided into prinary snopling units whioh in turn flust bo capablo
of being dividoc1. into sub-areas.
Also these sub-areas flust bo accompanied
by certain Census information, such as on total farms in 1940 and on substratification items.
In this design the state of North Carolina is the
population, the county is the prima~J sampling unit and the MCD (township)
is tho sub-aroa.
Tho MCD will bo used only to dotornine tho sub-strata
l'li thin tho primary geographic stratum and the prinaI'jr s[lI.'l]?ling unit, as
Master Sanple segments will be the sub-sampling units within the sub-strata
of tho priflary
s~~ling
unitse
For purposes of comparison the same 10 and 20 major
rwore used as in Ll
J-
(soe Maps 1 and 2).
~co€raphic
strata
The design assumes that one
county is selected from each primary stratum with probability proportionate
to the 1940 numbor of farfls (p.p.s.).
stratum was allocated to one of
~lO
Each MOD within a given primary
sub-strata on tho basis of one of the
stratification itens listed previously.
prinary stratum contained
~9proxinately
accorc1ing to tho 1940 Census._
allocated to ono of tho
~lO
These two sub-strata within the
the
SaT:10
number of farms listed
Ronce onch l-1CD in a given county
WD,s
also
sub-stratn on tho bnsis of tho figures obtained
to divide the primary stratuT.1 into two sub-strata, but the two sub-strata
within a givon county would not be oxpected to hnvc the anne number of
farms.
Hence this design would have n different
s~)ling
rate for tho two
e
l.~ypes
Map
e
,
of Farming Areas in North Carolina
I .~
Pc-Z""
¥J:-ll"/'" '"
~
.". ..' .'
"'.
•...•
,n
,· ,. "-.wt··
•
- (.1i
C """"".
h :',,
'-, /
:
.
. '....
":
,.
~
.. '"
I
'.
IIT!'- i--I -"- r·1--r····~··~ . / <... .·....::._.~f~\
r~ L/.. ·
"-··..:····xt «.~
,' . : : : ;- .
.' '
~~'
..•.:c
.:'" :.
- , : .... ,., ... ,<...-
\'"rrr.:':
. . '
I. ... .. .. .. . . .
:---
.Ul·
. '
":y B;,-
./;. .
_: ~/.". T<,
Northern Tidewater
Southern Tidewater
North Central Coastal Plain
Central Coastal Plain
Upper. Coastal Plain
Sandlnlls
".
rr .' ;
y....
\r\l
LO\'1er Piedmont
Northern Piedmont
Central Piedmont
Foothills
North Mountain .
Southern Mounto.1n
.
~ .. '
• .
..'
. . <III
.:.
i
;
'.. '"''
/
~,
V
.x.!
\~ _~.\
\
".>.. . . ., . .
~
"1 • ....n \~
..
I:! .
11
~ ..•~
.~
'\\
\\
". ./
I
~.~~i
I...... ',..'i~,··,;-PU-7:J-1/
:. r¥::..A
':-
., ..
.' ..'
....
'. ..... "'-.._
.
f.
::or
\ _..
-
\
-~-L/ '~
VB.
VI.
VII.
VIlA.
VIII.
VIIIA.
'---;.~'--
.\
lL...-.
I.
. II.
III.
IV.
V.
VA.
e-
•
~/~ -, .
r
.
(:
.~
,.f"·)
'
•. A
~
e
It
Basie strata
Map 2.
'A"'! tn - (""";.
\AL~d~"1----
i(
,.J!..............
,WATAUC~\.
. . . . . j ~pv ~"-I·
p.\.
~
,SURRY
L&'~IJ
rt
--_-1
f-:!-,'TCH'LL\~\:~OW'LL-;--"
• ",h.
\
"
.
/")
",
•
~•
./"":
.,./ -:;;w":
r:--.... -.
_.J
oov~
/.'
~
\
pAl!KSOi'lJ',
\
------.;"CRAHAM
:J'
,,/·.:;;;;;rO'RSON
.
--,'.
",."
,,_'
,POLK,
,
__
MACON
\.
T ...
'",ROKEE
t'l.. ')
tf,'/'
'\
""-,
1
,_.-=,__ ,
'~. ~.'7
________'<~,.
I
I.
•
/ -'"tAY',
I..
V
,./,~.
< 7'
\.
. . . .","_~_
. ~-
i
'
/
,,<>-
.
,._'
'~),
·
.
1 t~1
/.
'
tr
.'\
I~'
CATAWbA\
yfl
I
I'
'
\
CINe""""
'\;._.
.""-J" • .r-"
\
.
\
11.' \
~u
.
\
'=-A='7'""""':
~,TANL"
~
,
"<\.~
1 ~I
V"' \t/ (Y
....-rJ.r-f!'
..JtJ.J!--
\
i
. ./
ASH
\,
.v.
'I
,
;;
L_.-_L'-f __
,
,.
l
"
C"-q;)...
"'-?
-?'>
,~o ~,1'~\~~~'~:>--<~"~\
"li~ " . "\
(.J...-;n.,.'
"01
"",. "'~" . ~~
/
"-<-,:'\
~L;r
.
&?"
'\'-
I ;v
i
""".-<",
ON
I
?5<
Ir
~
.-:;J,
(
"'RRELL
.
\.,....../
/.
1- '
-,//fY
·_U,;B~RL.NO
fs''~MPSON'
--
)Y / . _ .
A,...........
/0
\C
/ ..,
,;J'
"
_ ......../ "
,,-.:-:('
LSNOIR
•
\'
'
"ONSLO-;:;---'
J/---K. 0
~ ~
"J-~
,
~OLUM';;}.-""",,-,-
Y
/'
-(..........
..JY Pr
;1'
~' /)-'ROBESO".::,,~'
/,"p. \'
tII
'v'" yO
,'-'"
:'-/'
n.'S
1>''-
'\.
).
(
0
f\
\
''-'1, <"
'D
'
'R'
,
'\
EO
.
,,.,/
•
• ..
J
\
' . r7"
~
···)7.
11-,
!
'
"'p.
\. V~Am"Rtr
I ..
\
,Cc
\I
·---·':·'-..·~E-::;'·
...1
''\.,
I!
')l'"7£ND'R
~0~"
~
'-~"'\.'
."
-"
O"'<:s' ........ ~L
. or
.
~
\.
~,j--\
6'v
I pI"
""jj',A5HIN'>1
\
,
-?, ~,- -. QI:' '\~.
),
:".i,\
\ .~ '"0
':--.,.
'
4'.4.
~,~.~"O" \ _t,~·!c.L·/~';:""'''''' M""r;.... ) __ 1../
I
, y h~\
/,'1'
\.
·K
'Y"B'AU'ORT"
7 r"
'/)1 (!J / ·7'WAY·NE ~\,f:
H
'Y~'~\'
'\
/
..
MONTGoMERS'~','~'i
,.~ .
r-=-~"
j'
\...
li
N
,.-,)~.)
)
/.
'Ie""""Ii'''; jiP.
/
,\~
fA ! °O~
~oo;;, ,/~,
i
I' AN s,a,.-",\!
~!;),'.
.Y
,-pr;.v;
\
,ON,
r)
• '-.
ki'RANKL';;-'"
, I~
, . J ' " '",
J
I RANOOLPH~'-_
I CHATHA"-"I7·
\.f
1/
pr ..c
'l';;~/u;;-,...-'i"':--./ .... I' .-.
("10."-
.L.__,I;i
r
'
AMANe ",\>;'Nc', OURLAf1
CU'LFORO
·N'O'RTHAMPrON~--7-'.;-ATrS-T"~-C'~
'J::jAlIf'"A0
t·n.
I
I /'
I
.-"'=. .
j
n,
• CASTON-'
')
/'/
l.
CL'VELANO ~
- - - -.
._--.1.'_
:
I
.
L_,."DL:,4 I
I!
\OAVIOSON 1
'>;,-\,f'
/
\.
"
'''',
.~~)
';R~
.-1
.1..
OAVW'''I;
IRED
I
'
Ij--'Y;;OKiN"');C'ORSYTH
......·r'" 'mL~'-
.0° '-."
I
STok"E.s·-ROCKiN·G~M--·-"
,,_..
..
f CASWELL-r'F/PERSON'lGAArWi'LLEfVANCEt;NARRiN-r.:::::."::::-
I
I
.-".0~
A"'''' '\-..........,.:,,~O'"
0"
~{1 nr ':'\,I
•
A
... f
. . ~ ~ ICLfxAND
\ . 1. ,OIJ~
1 ~, n
'\'
,,~~ }
'-. . . . . '
)
...
J\iCe-;:'1
..
I
WILKES
1<- :\. .",.
•
1-<tJ\~:P~
I"'
"'"
e
..
\HANOVE
j.JBRUNSWICK
'\.f
q
'"
sub-strata within each county selected.
Master Sample segments were tho
sUb-sanpling units within each sub-stratun of MODIs within tho county
se1ectecl.
For purposes of illustration tho following
eX~)lo
is given.
We will
f
aSsUT-le that the first stratification iten, "Dnys Worked Off Farns in 1940t~
was used.
Tho MODIs in prinary stratun IX3 wore put in an arrqy by tho
avorage of this iten for ench MOD
ano.
split into t\10 sub-strata with
approximatoly equal nunber of farns in oach sub-stratun.
thnt a primary stratun
Mitchell (3).
m has 3 counties, Alleghany (1), Avery (2) !'.ncl
The pertinent data follow.
Prinnry
Oounty
MOD lid thin
county
1940 nunber
of farns
Str~'"tun
IXJ3
Ounu1atod 1940
nunber of farms
Jl.vere.go number
of clays 1tlorked
off farms in 1940
6
2
4
1
212
265
117
347
225
212
477
594
941
1166
1.5
7.3
9.4
10.6
15.4
3
5
3
3
7
8
227
384
203
101
94
1393
1777
1980
2081
2175
16.8
19.4
21.4
29.2
29.7
2
2
3
2
1
4
5
9
6
7
341
289
150
170
138
2516
2805
2955
3125
3263
43.0
53.9
54.1
55.4
56.0
3
3
3
2
2
5
6
2
3
2
472
288
171
170
176
3735
4023
4194
4364
4540
62.6
63.4
65.0
67.5
87.7
2
3
3
7
4
1
213
102
286
4753
4055
5141
114.5
116.6
126.0
1
1
1
1
2
1
1
3
3
e
Lot us assune
1
- 7 -
I t can bo soen that the figure for average number of days \-Iorked off farms
...
.
necessary to divido the primary stratum into two sub-strata with approximate1y equal number of farms is somaihero
bO~ioen
43 and 54.
It should be
noted that this figure will vary from primary stratun to prinary stratun
for each of the four stratification items used.
One county is now selected
from this prir:mry stratum with probability proportionato to the 1940
nuobor of farms (p.p.e.) and this is dono by simply cumUlating the nuobor
of
far~
1attor
by county and solocting a random number betweon 1 and 5141. tho
bei~~
tho total number of farms for this particular primary stratum.
This techniquo is shown below.
County
19~10
number of farms
Cunu1atod 1940 number of farms
1
1690
1690
2
1504
3274
3
1867
5141
If 1807 is tho random number selected, county 2 is the srurrp10 county.
This
county has 36% of its farms in sub-stratum I and 64% of its farms in substratum II.
Let the within stratum sampling rate be lit, ancl the \'li thin
ceunty saopling rate bo l/c, where cit is tho ratio of the number of farms
in the saT.Iplo county Mel. tho number of farms in tho primary stratum from
which the county was selected.
Thus tho sampling rate in Sub-stratum I
will be 25/18c and the sampling rate in Sub-strntum II will be 25/32c and
36
25
+
thorofare the sampling rate within the oounty is
64
25
100
180
= l/oe It OM. bo seon that two sa.rn:Pling ratos are used
100
320
within eaoh county. The actual nunber of sub-strata usod within oach
primary stratum can bo greator than two provicl.ceL each primary unit has an
adequate representation of each sub-stratum.
sub-strata have boon used.
In this thesis, only two
The following subscript notation will be used in the remainder of this
thesis:
Subscript
Number of subdivision in
population
sClJ!11?le
Subdivision
cc.
cc.th stratum
cx.i
i th county in
<t.ij
cc.ijs
G
G
Ret
1
jth substrE1tum in (cc.i) county
2
2
sth Master Sanple uni t in (etij)
substratum
lvhij
7!b.ij
CL th
X will refer to the characteristic
of X.
stratun
bei~~
i
estimated and X to the estimate
P will refer to the total number of farms in 1940.
For example
~ij
is the total of a particular characteristic being estimated in the jth substratun of tho i
th
county in tho cc.th stratum.
If it is desired to indicate
the total SUITIT10d over an inner subscript, we shall indicate this by a dot.
For example, ~.j is the sum over all counties in the jth sub-stratum of the
ex. th stratum,;
The population total to be estimated can bo expressed as
X
G
=
H
t:
,£0.
2
'£
M
t:1:Lij X
a?l 1=1 j=l s=l
cx.ijs
A similar expression for this estimate of X is
(2)
where t
cc.
ta. can bo
Xl
=
G 1
2
~i' t
'E J X
'E
'E
etijs
ar=l tcz jt::l s=l
is the sampling rate within the cc. th stratum.
e~q>rossecl
as
- 9 -
~ij can bo found fron (3) to be
t
The eX"pocted value of X is
E Xt
(5)
::
SUbstituting tcc,
,
EX
=Dcc,ij
t
j s
CG i
P
1
EEEE
n
~
P
cc,i,"
Pcc,ij/~ij Pcc,.j into
= E E ~~.
~
Po:,.j
Pex.i
cc, i j s
Pcc,ij
Pa,
cdjs
M
CGJ.' j
cc,'
CG
X
(5), wo obtain
Xcc,ijs which cnn be expressed, by
multiplying ancL clivicling by Pa.0 us
E X'
= !:
a.
Pa, E ~
i j
P
.
C:"J
Pcc,
Xetij
Pcc,J.OJ
•
If we denote tho double sur.tt1ation by Ra.(A) , the average of the adjusted
ratios within tho ~th stratum, wo obtain
E X' = 1: Fa.
I\x. (A)
a.
R (A) = ~
a.
i
1'a.1
P
ex.
.x
where 11.
:: E X
a.ij
s a.ijs
IpcGij :: ...9'
4..1
P
cx;ij
... 10 ....
Since X
=t
ex.
=t
"'Acx.
)
Pc:c.Ra, and E fC·I = t Pa, (
Rcx.(A)
•
Hence the estimate X' is biased.
all the primary strata.
This bias is the sum of tho biasos over
See reforence [3] for a moro complete discussion
of the preceding theory.
11
The moan squaro error for area suO-stratification can be expressed as .
where
2
{]a.ij
= t:s
(
,
X
... X
a.ijs
cx.ij
!
I
2
1Mcx.ij
and is the variDnce between Master
Sample segments within tho sub-strata within tho county selected.
Pcx.ij
Also
=pcx.ij/~ij·
Thus tho mean square error is composed of three components (from left
to right):
(i) tho l)etween Master Srum)le segments component,
between county within primary strata component and
(ii) the
(ii1) the bias com-
ponent. Previous studios have indicated that (ii) is the largest component
with the importance of (i) del1encl!ng on the sampling rate used.
The bias
component (iii) was expected to bo negligible, but it turned out to be
if
This is tho sruno formula givon in
[3 J with minor
changes in notation.
- 11 ...
relatively largo for somo items
st~iiod
in this thesis.
In the discussion which fo~lows. coefficients of variation for three
!
methods of estimation with no
sub-stratific~tion
and for four methous of
stratification with area sub-stratification will be compared.
Tablo 1
prosonts these sevan coefficients of variation and the accompanying
relative efficiencies for three items whon the state was divided into 20
strata.
The results for no sub-stratification were previously reported
in [lJ • For estimating f1Days ";orkecl Off Farms ll all four methocls for
sub-stratification were less efficient than the last two with no sub-stratification.
Strangely enough the smallest relative efficioncy of the four
sub-stratification items was "Days Worked Off Farms". the socio-economic
characteristic to be estimated.
It is believed that the roason for this
is the shift from farm to city during tho yonrs 1940 to 1945 when the war
inclustries \'!ero in full swing.
For "Tobacco Jl.crengoll only tho second
method with sub-stratification was moro officient than the unbiasod
estimate.
The small relative efficiency for the fourth sub-stratification
method is not in accordance with the opinion that per cent tenancy and
tobacco acreage are highly correlD.ted.
For "Cotton Acroage lt the first and
fourth methods with sub-stratification wore slightly more officient
tl~
the
unbiased estimate.
The bias component contributed a considerable amount to the mean square
error in many cases as shown in Table 2.
It is believed that the reason
for tho large bias in the case of tfDays Worked Off Farms n is the shift of
farm labor to factory labor (l.uring the wnr years.
Tho large biases in
estiIJ'l.ating ttTobacco Acreage It and "Cotton Acreagc lt , \'Then the last two substratification items were used, were probably attributable to a fairly
consistent positive oorro1ation betweon p~ij/Pai (the ratio of the number of
- 12 -
Table 1.
Comparison of the Coefficients of Variation aLid the Relative
Efficiencies 11 of Different Methods Using 20 Strata, v!hcro
One County is Solected with p.p.s. from Each Stratum.
:Days worked t
Tobacco
:
Cotton
off faro
;
acrenge
:
acreage
:; in 1945
:
in 1945
:
in 1945
:~.V.:R.E. in %:C.V.:R.E. in %:O.V.:R.E. in
,.
•
••
••
•
~
TYlJe of estimate
:
e
J,
No area sub-stratification
Unbiased
Ratio to 1945 nUMber of farms
Ratio to 1940 characteristic
Area sub-stratification
· .
: .152:
: .143:
:.114 :
100
113
170
.
:.086 :
:.000 :
: .020:
100
95
1450
···
:.135:·
100
99
:.136:
: .033: 1655
·•
·
Number of d~s worked off farm
in 1940
Value of products sold in 1940
Average miles from all weather
roacls in 1945
: .151:
Farms operated by tonants in
1945
: .146:
:
91
91
:.077 :
97
:.073 :
107
102
=.001:
.09
100
: .092:
60
iJ
:.130:
:.136:
100
98
:.146:
:
:.133:
06
103
Relative officiency (R.E.) of two different nethods, according to
Cochran [4] 1 is defined as tho inverse ratio of their variancos. For
exam:CJ1e under IIDO¥s Worked Off Farms in 1945 1l tho ro1ativo efficiency in
per cent of the ratio to 1945 number of farms estimate to the unbiased
estimate is
V
2
• 100
X2
(11152)2
V1
(.143)
=- • 100 = --"':"2
x2
• 100 :: 113%
%
- 13-
Table 2.
:Bias of the Estimate in Per Cent of 1945 Characteristic for
Area Sub-stratification with 20 Strata.
·
·
1945 Estimate
Sub-stratification item: Days worked
off farms • Tobacco acreage
Number of days worked
off farms in 1940
9.48
Value of products sold :
in 1940
3.20
·
··
1.77
Cotton acreage
·
·
.19
a
:
Average miles from all
weather roads in 1945
4.63
Farms operated by
tenants in 1945
5.84
:
.45
··
3.23
··
4.18
.86
2.24
:
·
·
3.54
- 14 -
farms in the sUb-stratum to the number in the county), and Ra.ij (the mean
characteristic per farm for the ~ub-stratum) within each sub-stratum.
Hansen and Hurwitz point out that if there were no correlo.tion,. R would
a
equal Ra.(A); hence, there would be no bias [3
J.
Table 3 shows a comparison of area sub-stratification with the unbiased
estimate when 10 strata are used.
For 1fDays Worked Off Farms" only the
second sub-stratification method was less efficient than the unbiased
estimate.
For IITobacco Acreage" only the third sub-stratification method
was as efficient as the unbiased estimate.
very low relative efficiency.
The fourth method again had a
For "Ootton Acreage lt the first and fourth
methods were more efficient than the unbiased estimate.
In the case of both 10 and 20 strata there is no evidence that any of
the 4 sub-stratification methods used in this investigation is consistently
more efficient than the unbiased. estimate.
Hence it would appear inodvis-
able to recommend area sub-stratificction for tho following two reasons:
(1)
The ostimate using area sub-stratification is generally biased,
often by a large amount.
Further the magnitude of this bias can not be
estimated in an nctual survey.
.
(ii).
when area
There is a certain a."21ount of extra computing in d.rawing a sample
sub~stratification is
used..
- 15 -
Table 3.
Corrrparison of the Coefficients of Variation and the Relative
Efficiencios of Different Designs Using 10 Strata Where One
County is Selected with p.p.s. from each Stratum.
Days worked.
Tobacco:
Cotton
off farm
acreage
acreage
in 1945
:
in 1945
:
in 1945
~C.V.:R.E. in %:C.V.:R.. E.. in %:C.V.:R.:m. in
:
•
:
..
No area sub-stratification :
Type of estimate
·
Unbiased
Area sub-stratification
·
:.232 :
Number of days worked off
farm in 1940
Value of proCl.ucts sold in 1940..245:
Average miles from all weather
:
roads in 1945
:.232:
Farms operated by tenants in :
1945
:.231:
·
·· ·
~
~
· .
100
100
:.122:
100
105
···
·:.129:
90
: .. 212:
102
84
95
100
· .
:.227:
89
64
: .198:
117
94
105
106
:.133:
•
: .122:
•
:.152:
·
·
·
.-
·
%
... 16 ...
INVESTIGATION OF THE EFFICIENCY OF THE MnmR CIVIL DIVISION (M.C.D.)
.AS A PRIM.ARY S.AMPLi~G UlHT lHTH· }~TER SAMPLE SEGMENTS .AS THE
SUB-SAMPLING mUTS'.
Methods of Stratific9-tion
The state of North Carolina was first divided into 12 major geographic
strata according to type of farming areas.
Map 1.
These 12 strata are shO\m in
It was possible to improve upon these original 12 strata by following
M.. C.D. boundaries rather than county boundaries and then using the M.C.D. as
the primary sampling unit.
For tho purely geographic stratification, contiguous M.C.D. 's were
combined within each major geographic stratification in such a way that
each stratum contained approxinately 5 M.C.D. f S Met 1400 farms.
This re-
sulted in obtaining approximately 200 strata within the 12 major geographic
l/
strata.
For the other four types
of stratification the M.C.D.'s within
each of the 12 major geographic strata were put into an array by the
nagnitude of the type of stratification and combined in such a manner thnt
each stratum contained approximately 1400 farms.
It ''fns believed that the
size of snople with one M.C.D. drawn from each of the aOO-odd strnta would
be approximately the sarne as with one COU,.Tlty drawn frof:1 each of ao strata
(sub-sampling rate assumed to be equal).
Daro and Swain counties were
y
omitted from the population leaving 98 counties and 941 11.C.D.IS.
It was not possible to obtain exactly the sarne number of strata for each
of the several types of stratification used in this investigation. This
number varied from 194 to 197. but will be referred to os approximately 200
in the text.
If an M.C.D. had less than 100 farms in 1940 it liaS combined 't1Tith an
adjacent M.C.D. so that each of the 941 combined M.C.D.l s had at loast
100 farms in 1940. There were a total of 1018 original M.C.D.'s in 1945.
... 17 ...
Basic Sampling Plan
In an actual survey, from each of the approxim4tely 200 strata one
M.O.D. \'1oulel 'be selected with probability proportional to the 1940 num'ber
of farr.ts, anet Master Sanple segments then chosen systematically from eo.ch
of the selected
M.O.D~ts.
Oalculo.tions o.nd Results
The total to be estimated can be expressed as
th
is the total of the cho.racteristic in the s
Master Sample
th
segment in the i th M.e.D. in the ex. stratum, Mex.i is the total number of
where X.
a.J.S
Master Sample segments in the i th M.O.D. of the ex.th stratum, Hex. is the total
number of M.O.D.' s in the ex.th stratum, and G is the total num'ber of strata.
We shall consider three different estimates of
(i)
~
in this investigation.
Unbiased Estinate Tho unbiased estimate can be expressed as
f
(9)
X
1
ma.i f
!:
!:
X
t 0.=1 1=1 s=l
ais
1
G
= -!:
where! is the over-all so..":1pling rate Mel ma.i is the nu.n'ber of segments in
the sample for the i th M.O.D. of the a,th stratu.n.
!l~i
Thus in equation form
t P
== P Ck..
cx.i
fbi'
,,'here P is the total mmber of fo.rms in tho a. th stratum in 1940 and P • is
ex.
M
th
th
the total num'ber of farms in tho i
county of the ex. stratun in 1940.
- 18 -
The variance of tho unbiased ostinnte, as derived in tho appendix is.
(10)
!h
G
2
O'X'
=cc.=1
~
~
1=1
.
;_
+
\'lhere
r.';
xct.i =
Pcc.1
na,i
P
•T
·
t
-
cc.
Ip
G
2 Ba, po,!
-
~
(Xcc.1
! a, i=l
cc.=1
P
cc.
l~
M -m
a.i
a.1
2
-1
!
O'a,i i
i
M0,1 - 1
_!
-x- cc.)2l.!
•
..J
1945 characteristic in the i th M.C.n. of the a,th stratum
1940 number of farns in tho i th M.C.r. of the cc.~ stratum
1945 characteristic in the a th stratun
1940 nunber of farns in tho
a
2
o,i
= variance within
the i
th
a, th
stratun
M.C.D. of the a th stratum.
Only tho soconcl lJart of (10) has boen oonputed in this investigation.
This
part is called the "bott-ieen primary sanpl1ng unit cooponent fl of the total
varianco.
(ii)
Ratio Estinate
To estinato! we can also usc the ratio estinato.
which nay be slightly biased.
I
X
r
= Y ..
This estimate is
Xl
-
y'
where Xl is an unbiasod estinato of the population total for sone specified
characteristic and is given by
1
(12)
Xl
G
~
t ar=-1
=-
_
ttx.i
X cc.i
- 19 -
f
and Y and Yare tho population total and snqplc total of an auxiliary
f
~
variato that is correlatod with X •
An approximation of the variance of the ratio estimate is given by
the oquntion
2 N 2
Xl =X
a
r
.
Thus tho coefficiont of variation of tho ratio estimato is approximatod by
2 a
I
XY
I
XY
The bins of the ratio estimate is approximated by
(15)
X
2
- (C.V.y - C.V·Xlyf) •
Y
The derivations of the bias, variance and (C.V.)2 of the ratio estimate are
presented in the appendix.
(iii) Regression Estimate
total,
At
The estimation equation for a population
for the regression estimate can be expressed as
t
(y - y)
I
1
whore X and Yare unbiasecl estimates of the popUlation total.
The variance of the regression estimate as derived in the appendix, is
-20-
Table 4 presents the between primary sampling unit component of tho
(C.V.)2 for three difforont itens ostimated for 1945.
parisons havo boon mode:
estimation,
Two sets of con-
(i) comparison of throe different mothods of
(ii) compnrison of five differont types of stratification.
Little difference can be observed between the unbiasod and ratio ostinatos.
As the unbiased estimate is oasier to
ratio estimate.
over the
co~ute,
it is reconmendod over tho
In all cases the regression estimate showed slight gains
~biased
estimate.
worth the adied costs of
It is not believed that this slight gain is
co~uting
the regression coefficient.
The most
important conclusion that can be dr<:lWn from Table 4 is that the geographic
stratification is better than tho othor four typos considered in all casos
for the unbiased estinate.
In an actual survey, geographic stratification
is not only easier to uso from a computational point of view but also
information on the stratification itoms is not nocessary.
Table 5 shows tho bias of the ratio estimate in per cent.
The largest
bias obtained was approxinately 1/2 of 1 per cent; so for practical purposes
this bias is negligible and can be neglected in tho conparison of tho ratio
with the unbiased estimate.
In comparing the stratification by current Census information with
that of past Census infornation, it was found that thero is an improvement
in using current information over past infornation.
Information in 1940
and 1945 was available for only two of the stratification items.
The
results apl)ear in Table 6. For liDDY'S ''larked. Off Fams tr the stratification
.
2
by current Census inforT.1ation gavo D. highor (C. V.) than for :post Census
infornation.
,e
This
~~
be oXDlained by the fact that this characteristic
\1aS noro variable in 1945 than in 1940.
e
It
Table 4.
e
•
Conpnrison of tho Bet\'lCen Prinary Snopling Unit COE]?onent of the (Coefficient of
Vnriation)
2
for
11
A?proxinate~ 200
Strata of M.C.D.f S Using Different Methods of
Estimation.
Estinate of
Tobacco acreage in 1945 : Cotton acreage in 1945
:Days worked off farns in 1945
•
• Ratio to :
Ratio to :
Estinato Used
• Ratio to :
Un- :1945 nunbor:Rogres-: Un- : 1945 nunber :Regres- : Un- :1945 nunbcr :Regres:biased: of farns : sian :biased: of fams : sian :biased: of farns : sion
•
•
•
•.
•.
:
•
••
.
..
TY:Jeof stratification:
•
•
•
•
•
...
~
··
Geographic
: .645
Days worked off farns :
in 1940
: .931
Value 'of products sold:
in 1940
:1.016
Average niles fran all:
weather roads in 1945:1.046
Fams .6peraterl by
tenants in 1945
:1.016
11
.
AlJn (C. V.)
2
·
··
~
:
·:
..•
:
~
.722
.933
.940
·:•
1.009
:
1.030
·
··•
··
·
··
.·
·•
.595
.881
.917
.963
··
··..
·
: 1.150:
·: 3.147:·•
·: 1.328:·
·:• 3.196:·
.973 : 2.895:
3
in this table have beon nultip1iect by 10 •
·
·
·
1.323
··
····
·
: 1.132 : 4.920:
·:• 3.092 ·•: 5.616:·•
··: 1.271 ··: 6.083:··
3.171
:
: 3.130 : 6.426:
6.071
••
6..048
2.940
: 2.865 : 6.141:
5.608
••
5.467
1.329
3.139
··
·
5.620
5.135
5.674
·
···•
·••
·.·••
4.910
I
5.040
....
N
I
5.632
... 22 ...
Table 5. Bias of Ratio Estimate in Per Cent
1945 Estimate
:Tobacco acroago:Cotton acreage:
in 1945
in 1945
Days worked
off farm in 1945
Typo of stratification
"
Geographic
Days worked off farms
in 1940
Value of products sold
in 1940
Average miles from all
weather roads in 1945
Farms operated by
tenants in 1945
.064
.412
.024
.016
.027
.308
.020
.024
.023
..030
... 23 ...
Table 6.
Comparison of the :Betweon Primary Sampling Un1 t Component of the
(Coefficient of Variation)2 for Stratification by Current Census
Information with that 'for Past Census Information Using the
Unbiased. Estimate_
1945 Estimato
Cotton •• Days '"JOrkcd
off farms
acreage •
Typo of stratification
Tobacoo
aoreage
Days worked off farms in 1940
.000931
.003147
.005616
Days worked off farms in 1945
.000877
..003044
.005913
Value of produots sold in 1940
.001016
0001329
.006083
.001274
.005728
Value of products sold in 1945
.
.001104
·
.•
... 24 ..
For the efficiency of the county and the M.O.D. as a primary sampling
unit, several considerations must be made in comparing estimates based on
our sampling plan with estimates obtained by
MadO\'l
[ll.
These considera-
tions include gains due to stratification, cost, and size of primary sampline
unit.
~te
assume, as Mado\-l does, that the between primary sampling unit
variance is the dominant term in the total varianco.
It is also reasonable
to assume that the between segment within county variance is at least as
large as the between segment within M.O.D. variance.
This becomes clear
when the scatter of selected Master Sample segments within the county is
compared with the scatter of selected segments within the MoO.D.
It can be seen from Table 7 that the between component is about t\1/ice
as large for tho M.O.D. with 20 strata as for the county ltTith 20 strata.
The
difficulty in mcl{ing this comparison is that the within variances are not
known, and no consideration is made as to the relative costs of oach method.
It would seem that the county is more efficient than the M.O.D. using 20
strata.
The reasoning follows:
If the nunber of sub~sarnples is the same in each case, it is
(i)
believed that the costs ,,,auld be practically the same.
The reasoning horo
is that the betweon primary sampling un! t travel would be about the same
since 20 are visitocl in each case.
The between sub-unit travel within tho
primary sampling unit would slightly favor tho M.O.D., but this would not
bo very great as the same nUl!1ber of sub-units must bo visi teel.
(ii)
The within variance for M.O.D.I S would bo reduced by two factors:
(a)
elimi~ation
(b)
tho finite population correction would reduco the variance
of the
be~lcen
M.O.D. variation within county
nore for the M.O.D., as the sampling rate would. be higher.
However if we assume that tho
bO~1/oen
component is dominant as was assumed
e
e
..
Table 7.
e
~
Cooparison of the (Coefficient of Vnriation)2 of 20 Strata where tho·County is tho Primary
Sampling Unit
a~d
20 and 197 Strata where the M.C.D. is the Primary Sampling Unit.lI
(Coefficient of Variation)2
.•
M.C.D.
14.C.D.
:% Gain of 197 M.C.D.
County
•
·•
•
•
Q/
if
~
E1
·
·
·
•
•
.
197
20
over 20 M.C.D.
20
Number of strata ·
•
·
: Unbi as cd. Ratio ·
; Unbi asod ; Ratio : Unbiased : Ratio : Unbiased : Ratio
Estinate used
•
••
:
•
•
••
·
••
·
·
·
·
•
•
5
..
6
7
4
2
3
Estir.1ated i ten
1
•
•
•
•
·
·
·• a
·
·
·
To'bacco acreage
· .0058 : .0061 : .0+27 : .0125 :• .000645 ·: .000722 :· 49 ·: 42
·
Cotton acreage
·• .0182 : .0184 : .0324 · .0263 : .001150 : .001329 ·: 65 ·· 49
:
28
.0574 : .004920 : .005620 ·
..0596
: .0205 :
23
Days worked off farm: .0231
;
P.S.U.
~;
t
t
.t
t
t
·
1Z
t
·
Only the between primary sampling unit cOI:l]?onent is considered..
1"-:
The figures in co1UI:1Ils (1) and (2) are those obtained by Mrs. Madow Ll.J t where 20 strata wore used
and tho county was tho IJrinary sar.ll)ling unit.
.
;H S&'lO strata as used in (1) and. (2) but with the Minor Civil Division as the prioary san,pling unit.
These figures are those obtained using 197 strata where the Minor Civil Division is the primary sonpling
unit.
§} Gain in per cent for 197 M.C.D. over 20 MoC.D. was calculatod by:
122
iO (C.V20) .... (0.V197)
- - - - - - - - - - x 100
W.
V
1
10 (C.V20)
2
S?
)
- 26-
in
fl] ,
it is not believed that the reductions mentioned in (a) and (b)
would compensate for the differences between columns 1 and 3 and columns 2
and 4.
To cor.tparo the method whore one county is selected from each of 20
strata with tho method where one M.C.D. is selected fron each of 197 strata
is even More difficult without information on the within variances and
relative costs.
If an equal nwnber of sub-units are selected, the sanpling
rate can be assUT.led to be the sane.
Thus the within variance for tho
M.C.D.'s would be reduced by only tho factor (ii) (a) above.
should be loss thc.'U1 that for the wi thin county.
However it
Thus it lriould soon
reasonable for the 197 strata to have a snaller total variance than the
20 strata.
As it is noro costly to onUMerate segments in 197 M.O.D.'s
than for the sane nUMbor of sogments in 20 counties, it is not possible to
nake a definite statement as to tho relative efficiencies of these two
methods.
However it seens roasonable that this conparison night be in
favor of 197 strata.
In order to calculate the relative gains due to stratification where
the size of the primary sar.tpling unit is the sane, we
and 4 with columns 5 and 6 in Table 7.
comp~re
columns 3
The per cont gain of 197 strata
over 20 strata where the M.O.D. was used as the primary SaMpling wlit is
indicated in columns 7 and 8.
Considerable gains are indicated on all
items for both the sinple unbiased estimate and tho ratio estimato.
Greater
gains were obtained for the sinp1e unbiased estimate than for tho ratio
estimate.
Increased stratification was nore effective than the use of a more
corrplicated estimation cquation, neglecting costs.
At the present timc research is being done on the within M.O.D. and
count~
contribution to the over-all variances.
if.hen these calculations
are corrplete, it will be possible to nake more definite statenonts about
- 27-
the efficiency of the M.C.D. and the county as a prinary sanpling unit.
Li ttle data are available on costs at present, but it is hoped to obtain
such data in the near futuro •
•
-28-
APPENDIX
Pertinent notation for this thesis is presented in tabular form below.
State Stratun
M.C.D. in
stratum Saopling unit ill M,C.D.
Oji
Subscript
Nunbor in population
N~ber
in sanple
Total nunber of farms
in 1940
1
G
H
2
1
G
1
P
P
P
P .
xex.
X
X
X'
Xl
Xl
ex.
Total of agricultural
characteristics in
1945
X
Estimate of total
Xl
Total of characteristics in 1945
Total number of
farms in 1940
a.is
a.i
C:LS
o.i
a.is
a.i
OJ
ex.is
/'~,
N
X
X
0.1
Mean per sa.rnpling
Uhit
=~
=
X
,ex.i
Ma.1
Other Notation
t
= sampling
rate
11
Proof that Xl is unbiased estimate of X.
rl
E Xl
=E
G
1
In
tcc,i X
I -t a=l
t
t
i=l scl . cx.is
I
I
'--
--I
II
1
-J
This proof is presontod in reference
G
f'
1
E , ~
t a.=l ex. i=l
=-
t
[5] .
L
- 29 -
whero Ecx. is the expected value for tho a.th stratum and Ftx.i is the expected
value of ~is within the county selocted fron the cx. th stratum.
1
Mo.!
S:l
Now, ,weti Xla.is = Meti
T,l
=
Xetis = Xa.i
Then,
n
~a.i
s=l
_
l
itx. I
i!
-'
r..J
To get
Xa.i
or the 1945 Doan por 1940 nunber of farns. rOJ!lcnber that
t
t p
ex. M
Dex.i p - 'a.i
a.i
Thus,
n
a.i
-X
a.i
t Pex.
=-M
p
a!
C(,i
=
,..;
.X
ex.i
=tP X
a. ai
Thorefore,
G
1
Ai
=i:P
i:EX
a.=1 ex. i=l a. ai
,,,
Now,
N
:In X
"1:!. cx.1
..
where 1'0,1
Pa.
Hex.PitV
= i=l
i: ..L X
Ii
ex.i
ex.
/OJ
::: X
a
= the probability of
f
selecting the i th county from the a,th
stratum, using probability proportional to the 1940 number of farns.
Thus,
- 30 ...,
Derivation of Variance of Unbiased Eatinate.
1
1
~
,
We moy wri te X
=
G
J
!: X ,
a.=l a.
n
llihero X' = _ !:~C(,1 X'
a.
t 1=1 s=l ex.is
Then
G
E Xt
!: E X'
w::l
a,
==
t
and if tho sanple is selected independently fron each stratun. we have
2
aX'
=
G
2
~
a.=l
a,
Xct.
2
Thus, we neecl to evaluato oX'.
,lXa,t = E r.L. xa,r :
(1) • • •
:By the definition of varianco, we have
.E Xt
~
J'
2 = E E
r ,Xl _ E Xl] 2
a, ex.i _ a,
ex.
\10 nay wri to
(2)
[x
ai _ a.
t
E
II
...
E
XI]
ai a
2
,
since
E \- X' . a.l !_, a,
]J 1
Ct..
X'
cx,
l_J r:re
__.ex.i
X' - E X' ]
a
a _
.-1
:: r E X ' - E X'
L 0.1 a.
ex. - i
•
t
...
,_. cx,
• E 1 rl· x
a,
E i
a
x'l
::.
0._
constant X rEi
La,
Fron (1) and (2). we then have
::]1
ex.
11
(I) + E
ex,
(II)
This proof is presented in reference [5 ]
•
xa,t
...
E 1
a,
t
x0._
'/!
= 0
- 31 -
(r),
To evaluate E
ex.
we calculate
Xl -1" 2 :::
,_. Xl .. m
E
ai
-
a
a
J.:Iai
-=-
-·t
Ilhi
Ma.i -
2 ma.i
M-1
2
• C1a.i ,
a.i
where
Thus,
E
a.
(I)::: E
a.
2-'
f
_l
m M.-m
0:.J.
a.1 C1
t 2 0.,1 M _ 1 a.i
'-
a.1
I
.J.
To evaluate
()
a
E
'VIe
II
,
a. I
-
::: E
I
E
I
X - E X
ai 0.,
a,
--I 2
-
I'
I
note that
t
IV
EX eX =p X
a.
ex.
a a.
Also
and
Thus,
•
.~
r·
II
.,..I!
~X -X
! ai
2
_pNXJ
(II) --'l;1lpX
....cx, La. cx,i
a. ex.
a
E
p
::: p2 to, 0,1
(X
a 1=1 p
Ct.1
a
Thus, we have, finally
-1
_p2 E
- a. a
<f'J
2
.. X)
a
...
1\1:
(:(,1
M
... m
eti
-
0"
-
ai
.. 1
2
- 32 -
where
~ = r- :a
.
CGi
11
Derivation of the Variance of Rhe Ratio Estimate.
The ratio estimate of X is
Xl == Y •
r
Xl
-I
Y
where Xl is an unbiasecl estimate of the population
total of a characteristic ancl Y and Y' are the population total ancl sample
total of an auxiliary variate that is correlated with X.
The variance of the ratio estimate can be approximated as follows.
)
r
Xl == Y
r
:X+LlX
•
I
'oJ'here X ==X+LlX
\Y+LlY I
Y' ==Y+LlY
This can be written as
X~ = X(1
X>X (1
I
Lly)-l
11
+and expancling the las t
I \
Ll X \
+i
X
Ll
+-
1
of order
\
-z
etc.
y
X) .r'
X
term
Y
Ll
1 - -
\
Y.
+
Y
(Ll Y) n )
aaa
r
and eliminating terms
the expression can be written as
6X
[._X
YJ
Ll
-+1--
...
Y
Since
,
J'\'
EX='
X,
r
Ll X Ll YJ
[_ --:;- - - - : ; _
t
neglecting the term in (6 X) (6 Y).
j}
This proof is presented in reference [4 ] •
... 33 ...
Also
"
E
2 2E (A- X;... -Ay)2 = X2
tV
(X .... EX') ';i X
r
r.
2
O'y'
=-X2
r
y
2
• 2 r0 O'X'
(C .. V.X)
X
+ -
y2
2 0X'Y'
... --=-==XX
11
O·
Derivation of tho Bias of the Ratio Estimate:
TIle estination equation as shown previously is
Xl
,
X =Y .r
y'
Xl
X'
EX'
:f ---:IDYl
Since E -
yl
the approximate bias can be founcl by expanding -
y'
infinite series about E Xl and EY'.
Thus
IX'
)
E;;--;=E (;;-
X
X
X'
--
Y
This can be expressed as
I
E
I\ yl
X
I
"
= E
y' ... E y'
( X'
I
1
E
(
\
il
EY
I
i
i
I
)
\ ... -
Y
X
+
\ E Y' J
Therefore
1
E yl
(
\1
41
X
... E yl + E y'
("\
cE
\
Xl
I
y
J
yl ... E y'
1 ...
\
EY
I
(y
f
,
... E Y )
+
(EY')
Xl
X
- ... yl
Y
.....
This proof is presented in reference [5] •
2
2
...
... )
X
... y
in an
-34-
Since E X' = X and E y'
I.'
X'
X)
\
yl
Y
\E---
Y, the first term vanishes and
e
(E X'
=-
(yl y2
\
EY'») + (E Xl
(y. y3
Ey.)2 'J - ....
I
The first term is the covariance and can be oxpressed as -
2 ,
a X' y'
2
Y
..
Adding and subtracting - - - - - - - - - • tho second term may be written as
y3
Xl _ E XI)(y' _ E
E
(
.
.
y.)2)
y3
y.)2
EX' E(y' _ E
+-------y3
1
The first term is of ordor
2
y
and can be discarded.
Tho socond can be written as
2
X ayt
An approximation to the bias cnn then bo written as
(
XI
X.)'
Y
Y
E I- - -
2
=-
aX ' y '
,
y2
X ay '
+y3
11
Derivation of the Varinnce for the Regression Estimate:
The estimation equation for a population total, X, for the regression
•
estimate can be expressed
Xl
Lr
11
= X' -
°x.'y'
2
0y'
B$
( Y'-)
Y
where XI and Y1 are unbiased estimates of
This proof is presented in reference [5] •
-35-
the population total and thus E Xl
2
so oX'
Lr
= E (Air - X)2 and
I
=.
E Xl
Lr
= X and
E yl :: y
substituting the following results
2
(yl - y)
1
and taking the expected value,the
J
following is found
2
+ 0y'
2
oX'
Lr
;.
•
2
:::0:,-
X
which is oqual to
- 36 REFERENCES
(1)
Madow. Lillian H.
nOn Sampling for State Agricultural Estimates in
Carolinalf Journal of tho American Statistical
Association. Vo1Uflo 45. M:o.rch 1950.
l~orth
(2)
'(3)
•
Hansen. Morris H. anel Hurwitz. \fi1liam N. liOn tho Theory of Sar.lJ.)ling
from Finite Popu1ations. 1f The Annals of Mathomatical Statistics. Volume XIV. :nro. 4. Pago 333.
December 1943.
"Sam:P10 Survoy Techniquos. 1I Mimeographed by the
Department of Statistics at North Carolina State
Co11ego. Juno 1948.
(4)
Cochran, H. G.
(5)
Madowt Lillian H.
•
•
King. A. J. and Jessen, R. J. "The Master Sample of Agriculturo. tJ
Journal of the American Statistical Association.
Volume 40. l'~arch 1945.
If On Sampling for State Agricultural Estimates in
North Carolina. lI Progress Roport, North Carolina
State College. February 1949•