QUOTA FULFILMENT IN FINITE POPULATIONS
N. L. Johnson
Institute of Statistics
Mimeo Series No. 361
May, 1963
UNIVERSITY OF NORTH CAROLINA
Department of Statistics
Chapel Hill, N. C.
Grant No. AF-AFOSR-62-148
In choosing a sample it can happen that extra expense
is involved if selection has to be made from a restricted
section ('stratum') of a population. If a sample containing
specified numbers of individuals from each stratum
is required, it may then be more economical to sample in
two stages, the first sample being chosen at random from
the whole population, and the second arranged to complete
the sample so that it has the desired constitution. An
earlier paper discussed the choice of size of the first sample,
assuming that the population size is effectively infinite.
This report extends the results to populations
of finite size.
This research was supported by the Mathematics Division of the Air
Force Office of Scientific Research.
1. Introduction.

Johnson [1] has discussed the choice of size ($N$) of an initial random
sample from a population divided into $k$ strata $\pi_1, \pi_2, \ldots, \pi_k$ when the final
aim is to obtain a sample containing specified numbers $m_1, m_2, \ldots, m_k$ respectively
of individuals from these $k$ strata ($\sum_{i=1}^{k} m_i = m_0$). In [1] it was assumed
that the strata sizes were effectively infinite (though in known ratios). In the
present paper the same problem will be discussed for the case when the numbers of
individuals in the strata $\pi_1, \pi_2, \ldots, \pi_k$ are known to be $M_1, M_2, \ldots, M_k$
respectively ($\sum_{i=1}^{k} M_i = M_0$).

2. Optimum Size of Initial Sample.
As in [1], $c$ will denote the cost per individual of an unrestricted
random (without replacement) sample; $c_i$ will denote the cost per individual of
a random (without replacement) sample from stratum $\pi_i$; and $c_i'$ will denote the
value (if any) of each excess individual beyond the quota requirement ($m_i$) from
stratum $\pi_i$.

Then, using the same approach as in [1], consider the expected net cost
of increasing the size of the first (unrestricted) sample from $N$ to $(N + 1)$. If
the first sample (of size $N$) contains $n_1, n_2, \ldots, n_k$ individuals from $\pi_1, \pi_2, \ldots, \pi_k$
respectively (with, of course, $\sum_{i=1}^{k} n_i = N$), then the conditional increase in the
expected net cost arising from the extra individual increasing the
sample size from $N$ to $N + 1$ is

$$c - (M_0 - N)^{-1}\Big[\sum_{n_i < m_i} c_i\,(M_i - n_i) + \sum_{n_i \ge m_i} c_i'\,(M_i - n_i)\Big]. \tag{1}$$
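The (unconditional) increase in expected net cost is obtained by taking
expected values with respect to the $n_i$, giving

$$\Delta C = c - (M_0 - N)^{-1}\sum_{i=1}^{k}\Big\{c_i\big[M_i - E(n_i \mid n_i < m_i)\big]\Pr[n_i < m_i] + c_i'\big[M_i - E(n_i \mid n_i \ge m_i)\big]\Pr[n_i \ge m_i]\Big\} \tag{2}$$

where

$$\Pr[n_i < m_i] = \sum_{n_i=0}^{m_i-1}\binom{M_i}{n_i}\binom{M_0 - M_i}{N - n_i}\Big/\binom{M_0}{N} = P(M_0, N, M_i, m_i - 1)$$

using the notation of Lieberman and Owen [2]. Also

$$E(n_i \mid n_i < m_i)\Pr[n_i < m_i] = \sum_{n_i=0}^{m_i-1} n_i\binom{M_i}{n_i}\binom{M_0 - M_i}{N - n_i}\Big/\binom{M_0}{N} = \frac{N M_i}{M_0}\,P(M_0 - 1, N - 1, M_i - 1, m_i - 2)$$

and

$$E(n_i \mid n_i < m_i)\Pr[n_i < m_i] + E(n_i \mid n_i \ge m_i)\Pr[n_i \ge m_i] = E(n_i) = N M_i / M_0.$$

Hence equation (2) can be written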
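$$\Delta C = c - (M_0 - N)^{-1}\sum_{i=1}^{k}\Big[c_i\big\{M_i P(M_0, N, M_i, m_i - 1) - (N M_i/M_0)P(M_0 - 1, N - 1, M_i - 1, m_i - 2)\big\} + c_i'\big\{M_i[1 - P(M_0, N, M_i, m_i - 1)] - (N M_i/M_0)[1 - P(M_0 - 1, N - 1, M_i - 1, m_i - 2)]\big\}\Big]$$

$$= c - M_0^{-1}\Big[\sum_{i=1}^{k} M_i c_i' + (1 - N/M_0)^{-1}\sum_{i=1}^{k} M_i (c_i - c_i')\big\{P(M_0, N, M_i, m_i - 1) - (N/M_0)P(M_0 - 1, N - 1, M_i - 1, m_i - 2)\big\}\Big]. \tag{3}$$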
This can be further simplified by noting that

$$P(M_0, N, M_i, m_i - 1) - (N/M_0)P(M_0 - 1, N - 1, M_i - 1, m_i - 2) = \frac{M_0 - N}{M_0}\,P(M_0 - 1, N, M_i - 1, m_i - 1) = (1 - N/M_0)\,P(M_0 - 1, N, M_i - 1, m_i - 1). \tag{4}$$
(This formula can also be established by considerations of probability.)
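Relation (4) can also be checked numerically. The sketch below is only an illustration: the helper names `hyper_cdf` and `identity_4_holds` are hypothetical, and the argument order of `hyper_cdf` is assumed to follow the $P(M_0, N, M_i, x)$ usage in the text (population size, sample size, stratum size, upper limit). Exact rational arithmetic is used so that both sides can be compared for equality.

```python
from fractions import Fraction
from math import comb


def hyper_cdf(pop, draws, marked, x):
    """Pr[X <= x] when X counts marked items in `draws` draws made
    without replacement from `pop` items, `marked` of which are marked.
    (Hypothetical helper; argument order assumed to match P(M0, N, Mi, x).)"""
    lo = max(0, draws - (pop - marked))   # smallest attainable count
    hi = min(x, marked, draws)            # largest count not exceeding x
    total = sum(comb(marked, j) * comb(pop - marked, draws - j)
                for j in range(lo, hi + 1))
    return Fraction(total, comb(pop, draws))


def identity_4_holds(M0, N, Mi, mi):
    """Compare the two sides of relation (4) exactly (needs 1 <= N < M0)."""
    lhs = (hyper_cdf(M0, N, Mi, mi - 1)
           - Fraction(N, M0) * hyper_cdf(M0 - 1, N - 1, Mi - 1, mi - 2))
    rhs = (1 - Fraction(N, M0)) * hyper_cdf(M0 - 1, N, Mi - 1, mi - 1)
    return lhs == rhs


if __name__ == "__main__":
    # Illustrative values only: population 100, stratum of 40, quota 12.
    print(identity_4_holds(100, 30, 40, 12))
```

The check mirrors the probabilistic argument: both sides are proportional to the chance that the $(N+1)$th individual falls in stratum $\pi_i$ while the quota for that stratum is still unfilled.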
Using this relationship, the following expression for the increase in expected
net cost is obtained:

$$\Delta C = c - M_0^{-1}\sum_{i=1}^{k} M_i c_i' - M_0^{-1}\sum_{i=1}^{k} M_i (c_i - c_i')\,P(M_0 - 1, N, M_i - 1, m_i - 1). \tag{5}$$
For small values of $N$, $\Delta C$ is negative. As $N$ increases, so does $\Delta C$. The optimal
value of $N$ is the least value of $N$ for which $\Delta C > 0$. This will, approximately,
be the solution of the equation

$$c - M_0^{-1}\sum_{i=1}^{k} M_i c_i' = M_0^{-1}\sum_{i=1}^{k} M_i (c_i - c_i')\,P(M_0 - 1, N, M_i - 1, m_i - 1). \tag{6}$$

Two special cases of (6) will now be enumerated.
If $c_i/c = d$ and $c_i'/c = d'$ for all $i$, then (6) becomes

$$(1 - d')/(d - d') = M_0^{-1}\sum_{i=1}^{k} M_i\,P(M_0 - 1, N, M_i - 1, m_i - 1). \tag{7}$$

If, further, $M_i = M_0/k$, $m_i = m_0/k$ for all $i$, then

$$(1 - d')/(d - d') = P(M_0 - 1, N, (M_0/k) - 1, (m_0/k) - 1). \tag{8}$$
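The rule "least $N$ for which $\Delta C > 0$" can be applied directly by evaluating the expression for $\Delta C$ in terms of $P(M_0 - 1, N, M_i - 1, m_i - 1)$. The following is a minimal sketch under stated assumptions: the function names are hypothetical, costs are passed as integers or `Fraction` values so that the sign of $\Delta C$ is decided exactly, and the search simply scans $N = 0, 1, 2, \ldots$ in order.

```python
from fractions import Fraction
from math import comb


def hyper_cdf(pop, draws, marked, x):
    """Pr[X <= x] for a hypergeometric count (hypothetical helper;
    argument order assumed to match P(M0, N, Mi, x) in the text)."""
    lo = max(0, draws - (pop - marked))
    hi = min(x, marked, draws)
    total = sum(comb(marked, j) * comb(pop - marked, draws - j)
                for j in range(lo, hi + 1))
    return Fraction(total, comb(pop, draws))


def delta_c(N, M, m, c, cs, cv):
    """Increase in expected net cost when the first sample grows from
    N to N + 1.  M: strata sizes, m: quotas, c: unrestricted unit cost,
    cs: within-stratum unit costs c_i, cv: excess values c_i'."""
    M0 = sum(M)
    total = sum(Mi * civ + Mi * (ci - civ) * hyper_cdf(M0 - 1, N, Mi - 1, mi - 1)
                for Mi, mi, ci, civ in zip(M, m, cs, cv))
    return c - total / M0


def optimal_initial_size(M, m, c, cs, cv):
    """Least N (0 <= N < M0) with delta_c > 0; returns M0 if none exists."""
    M0 = sum(M)
    for N in range(M0):
        if delta_c(N, M, m, c, cs, cv) > 0:
            return N
    return M0


if __name__ == "__main__":
    # Two equal strata of 50 with quotas of 25, d = 2 and d' = 0.
    print(optimal_initial_size([50, 50], [25, 25], 1, [2, 2], [0, 0]))
    # Unequal strata (illustrative values, not taken from the paper).
    print(optimal_initial_size([30, 70], [10, 20], 1,
                               [Fraction(3, 2), 2], [Fraction(1, 2), 0]))
```

With equal strata and common ratios $d$, $d'$ the same search reduces to comparing $P(M_0 - 1, N, (M_0/k) - 1, (m_0/k) - 1)$ with $(1 - d')/(d - d')$, as in (8).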
A few optimum values of $N$ obtained from (8) are given in Table 1.

TABLE 1: Optimum Values of Initial Sample Size

                                       d
  k     m_0    M_0     d'      1.25   1.5   2.0   2.5   3.0

  2     50     100     0.9      53    55    57    58    59
                       0.7      50    52    54    55    56
                       0.25     47    49    51    53    54
                       0.0      46    48    50    52    53

  5     50     100     0.9      56    60    64    66    67
                       0.7      49    54    58    60    62
                       0.25     44    48    52    55    56
                       0.0      42    46    50    53    55

  10    50     100     0.9      59    65    70    73    74
                       0.7      49    55    61    65    67
                       0.25     40    46    53    57    59
                       0.0      37    44    50    54    57

These were obtained using [2]. They do not cover the same range of values of
$m_0$, $d$ and $d'$ as Table 1 of [1]. However, the following approximate analysis
shows how usefully accurate values of $N$ can be obtained using the tables given
in [1].
3. Some Approximations

To obtain an approximate explicit expression for $N$, use may be made of Wise's [3]
approximation to the hypergeometric function

$$P(M, n, M', m) \approx I_h(n - m, m + 1) \tag{9}$$

where $h = 1 - (M' - \tfrac{1}{2}m)(M - \tfrac{1}{2}n + \tfrac{1}{2})^{-1}$ and

$$I_p(r, s) = [B(r, s)]^{-1}\int_0^p t^{r-1}(1 - t)^{s-1}\,dt$$

is the incomplete Beta function ratio.
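Since $I_h(n - m, m + 1)$ with $h = 1 - w$ is the probability that a binomial variable with parameters $n$ and $w$ does not exceed $m$, the approximation can be tried out with elementary sums. The sketch below assumes the reading $w = (M' - \tfrac{1}{2}m)/(M - \tfrac{1}{2}n + \tfrac{1}{2})$ for the binomial parameter; the function names are hypothetical and the numerical case is chosen only for illustration.

```python
from math import comb


def hyper_cdf(pop, draws, marked, x):
    """Exact Pr[X <= x] for a hypergeometric count (hypothetical helper)."""
    lo = max(0, draws - (pop - marked))
    hi = min(x, marked, draws)
    total = sum(comb(marked, j) * comb(pop - marked, draws - j)
                for j in range(lo, hi + 1))
    return total / comb(pop, draws)


def binom_cdf(n, p, x):
    """Pr[X <= x] for X binomial (n, p); this equals I_{1-p}(n - x, x + 1)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j)
               for j in range(0, min(x, n) + 1))


def wise_cdf(pop, draws, marked, x):
    """Binomial form of the approximation, with parameter w = 1 - h.
    The expression for w is an assumed reading of the definition of h."""
    w = (marked - 0.5 * x) / (pop - 0.5 * draws + 0.5)
    return binom_cdf(draws, w, x)


if __name__ == "__main__":
    # Case arising from two equal strata: M0 = 100, m0 = 50, N = 50.
    exact = hyper_cdf(99, 50, 49, 24)
    approx = wise_cdf(99, 50, 49, 24)
    print(exact, approx, abs(exact - approx))
```

Note that the binomial parameter depends on the upper limit $m$ as well as on the population, which is what lets a binomial sum follow the narrower hypergeometric distribution.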
Using the approximation (9) in (8) leads to the equation

$$1 - I_{1-h}\Big(\frac{m_0}{k},\; N - \frac{m_0}{k} + 1\Big) = \frac{1 - d'}{d - d'} \tag{10}$$

where

$$1 - h = \Big[\frac{M_0}{k} - \frac{m_0}{2k} - \frac{1}{2}\Big]\Big[M_0 - \frac{1}{2} - \frac{1}{2}N\Big]^{-1}.$$
Equation (10) can be expressed in the form

$$\Pr[x < m_0/k] = \frac{1 - d'}{d - d'} \tag{11}$$

where $x$ is a binomial variable with parameters $N$ and

$$\omega(M_0, N, m_0, k) = \Big[\frac{M_0}{k} - \frac{m_0}{2k} - \frac{1}{2}\Big]\Big[M_0 - \frac{1}{2}N - \frac{1}{2}\Big]^{-1}. \tag{12}$$

Equation (11) differs from equation (8) in [1] (which does not use an
approximation) only in that $\omega$ replaces $k^{-1}$ as one binomial parameter. Note
that if $N = m_0 + k - 1$ then the two equations lead to identical results, so
that when in [1] it is found that

Optimal size of initial sample = (Size of final sample) + (Number of Strata) - 1

the same size of initial sample will approximately be optimal whatever be the size,
$M_0$, of the population.

Approximating the binomial distribution, in turn, by a normal distribution,
the following approximate formula for $N$ is obtained:

$$\frac{m_0}{k} - \frac{1}{2} - N\omega = \lambda\sqrt{N\omega(1 - \omega)} \tag{13}$$

where $\lambda$ is defined by

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\lambda} e^{-\frac{1}{2}u^2}\,du = \frac{1 - d'}{d - d'}$$

and $\omega$ is defined by (12).
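The normal-approximation equation can be solved numerically for a real-valued $N$. The sketch below is an illustration under stated assumptions: `omega` implements the binomial parameter of (12), `solve_normal_approx` is a hypothetical name, and simple bisection is used on the assumption that the two sides cross exactly once between $N = 1$ and $N = M_0$ (and that $0 < \omega < 1$ throughout that range). Two properties are easy to confirm with it: $\omega = 1/k$ when $N = m_0 + k - 1$, and for $\lambda = 0$ the root agrees with the closed form $(m_0 - \tfrac{1}{2}k)(M_0 - \tfrac{1}{2})/(M_0 - \tfrac{3}{4}k)$ obtained by setting $N\omega = m_0/k - \tfrac{1}{2}$.

```python
from statistics import NormalDist


def omega(M0, N, m0, k):
    """Binomial parameter of equation (12)."""
    return (M0 / k - m0 / (2 * k) - 0.5) / (M0 - 0.5 * N - 0.5)


def solve_normal_approx(M0, m0, k, ratio, tol=1e-9):
    """Real N solving m0/k - 1/2 - N*w = lam * sqrt(N*w*(1 - w)),
    where lam is the normal deviate with lower-tail area `ratio`,
    i.e. ratio = (1 - d') / (d - d').  Bisection on [1, M0]; assumes
    a single sign change there (hypothetical helper, not from the paper)."""
    lam = NormalDist().inv_cdf(ratio)

    def gap(N):
        w = omega(M0, N, m0, k)
        return m0 / k - 0.5 - N * w - lam * (N * w * (1 - w)) ** 0.5

    lo, hi = 1.0, float(M0)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:
            lo = mid      # root lies above mid
        else:
            hi = mid      # root lies at or below mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    M0, m0, k = 100, 50, 2
    print(omega(M0, m0 + k - 1, m0, k))               # equals 1/k
    print(solve_normal_approx(M0, m0, k, 0.5))        # lambda = 0 case
    print((m0 - 0.5 * k) * (M0 - 0.5) / (M0 - 0.75 * k))
    print(solve_normal_approx(M0, m0, k, 0.1 / 0.35)) # d = 1.25, d' = 0.9
```

The real-valued root would normally be rounded up, since the optimal $N$ is the least integer for which the increase in expected net cost is positive.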
Writing this equation explicitly in terms of N it can be arranged in the
form
Neglecting terms of order $m_0/M_0$ and assuming that $N$ will be roughly of
order $m_0$, (14) can be written

$$m_0 - \tfrac{1}{2}k - N \approx \lambda\Big(1 - \frac{m_0}{2M_0}\Big)\sqrt{N(k - 1)}. \tag{15}$$

This is the same as equation (9) in [1], with $\lambda$ replaced by $\lambda(1 - m_0/2M_0)$,
suggesting that the approximation might be appropriate.
The two conditions

(a) if $N \approx m_0 + k - 1$, $N$ does not vary much with changes in $M_0$
(see (12) above), and

(b) if $\lambda = 0$, $N \approx (m_0 - \tfrac{1}{2}k)(M_0 - \tfrac{1}{2})(M_0 - \tfrac{3}{4}k)^{-1}$
(see (13) above)

suggest the slightly modified form
So a first approximation to $N$ can be obtained from the infinite population value
with $(1 - d')/(d - d')$ replaced by the nominal value

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\lambda(1 - m_0/2M_0)} e^{-\frac{1}{2}u^2}\,du.$$
This value can then be adjusted by adding
Some explicit values of this correction, for different values of $m_0/M_0$, are
shown in Table 2.

TABLE 2: Corrective additions to approximate optimum value of N

  m_0/M_0     Correction

  1/2         $0.375k - 0.25 + 0.094\lambda^2(k-1)$
  1/3         $0.25k - 0.17 + 0.070\lambda^2(k-1)$
  1/4         $0.19k - 0.125 + 0.055\lambda^2(k-1)$
  1/5         $0.15k - 0.1 + 0.045\lambda^2(k-1)$
  1/6         $0.125k - 0.08 + 0.038\lambda^2(k-1)$
  1/8         $0.094k - 0.06 + 0.029\lambda^2(k-1)$
  1/10        $0.075k - 0.05 + 0.024\lambda^2(k-1)$
Nominal values of $(1-d')/(d-d')$ corresponding to actual values of $(1-d')/(d-d')$
and various values of the sampling fraction $m_0/M_0$ are shown in Table 3.

TABLE 3: Nominal values of (1-d')/(d-d')

  Actual value of                      m_0/M_0 =
  (1-d')/(d-d')       0.6     0.5     0.4     0.3     0.2     0.1

  0.05                0.13    0.11    0.09    0.08    0.07    0.06
  0.1                 0.19    0.17    0.15    0.14    0.13    0.11
  0.15                0.23    0.22    0.20    0.19    0.18    0.16
  0.2                 0.27    0.26    0.25    0.24    0.23    0.21
  0.25                0.32    0.30    0.29    0.28    0.27    0.26
  0.3                 0.36    0.35    0.34    0.33    0.32    0.31
  0.35                0.39    0.39    0.38    0.37    0.37    0.36
  0.4                 0.43    0.42    0.42    0.41    0.41    0.40
  0.45                0.47    0.46    0.46    0.46    0.46    0.45
  0.5                 0.50    0.50    0.50    0.50    0.50    0.50

[For $(1-d')/(d-d')$ greater than 0.5, enter the table with $1 - (1-d')/(d-d')$
and use 1 - (tabulated nominal value of $(1-d')/(d-d')$).]
As an example of the use of this table, suppose $d = 3$, $d' = 0.5$ and
$m_0/M_0 = 0.4$. Then $(1-d')/(d-d') = 0.2$ but we use the nominal value 0.25. This
could be done, for example, by interpolating in Table 1 of [1] with
$d = 3$, $d' = 1/3$, for which $(1-d')/(d-d') = 0.25$.
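A nominal value for any actual value and sampling fraction can be computed from normal tables rather than read from a printed table. The sketch below assumes that the nominal value is obtained by multiplying the normal deviate $\lambda$ by $(1 - m_0/2M_0)$, which is the reading of equation (15) adopted here; the function name is hypothetical. Because the printed table is rounded to two decimals, small differences in the last digit are to be expected.

```python
from statistics import NormalDist


def nominal_ratio(actual_ratio, sampling_fraction):
    """Nominal (1 - d')/(d - d') for entering infinite-population tables.

    Assumes the rule lambda -> lambda * (1 - m0 / (2 * M0)), where lambda
    is the normal deviate whose lower-tail area is the actual ratio.
    """
    standard_normal = NormalDist()
    lam = standard_normal.inv_cdf(actual_ratio)
    return standard_normal.cdf(lam * (1 - 0.5 * sampling_fraction))


if __name__ == "__main__":
    # Worked example from the text: d = 3, d' = 0.5, m0/M0 = 0.4.
    d, d_excess, fraction = 3.0, 0.5, 0.4
    actual = (1 - d_excess) / (d - d_excess)
    print(actual, nominal_ratio(actual, fraction))
```

For actual values above 0.5 the symmetry of the normal distribution gives the complement rule quoted with the table, so the same function can be used without modification.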
4. Comparison of Expected Net Costs
The expected net cost is
Since
equation (17) can be written
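$$C = N\Big(c - M_0^{-1}\sum_{i=1}^{k} M_i c_i'\Big) + \sum_{i=1}^{k} m_i c_i' + \sum_{i=1}^{k}(c_i - c_i')\Big[m_i P(M_0, N, M_i, m_i - 1) - \frac{N M_i}{M_0}P(M_0 - 1, N - 1, M_i - 1, m_i - 2)\Big].$$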
Using (4) this can be expressed as
$$C = N\Big(c - M_0^{-1}\sum_{i=1}^{k} M_i c_i'\Big) + \sum_{i=1}^{k} m_i c_i' + \sum_{i=1}^{k}(c_i - c_i')\Big[M_i(1 - N/M_0)P(M_0 - 1, N, M_i - 1, m_i - 1) - (M_i - m_i)P(M_0, N, M_i, m_i - 1)\Big].$$
If $N$ satisfies (6) then

$$C = M_0\Big(c - M_0^{-1}\sum_{i=1}^{k} M_i c_i'\Big) + \sum_{i=1}^{k} m_i c_i' - \sum_{i=1}^{k}(M_i - m_i)(c_i - c_i')\,P(M_0, N, M_i, m_i - 1). \tag{18}$$
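The expected net cost can also be computed by direct enumeration, which gives an independent check that its first difference in $N$ is the increase $\Delta C$ of equation (5). The sketch below rests on an assumed reading of the cost model of Section 2: the first sample costs $c$ per individual, each shortfall below a quota is bought from the stratum at $c_i$, and each excess individual is credited at $c_i'$. All function names are hypothetical and the numerical values are illustrative only.

```python
from fractions import Fraction
from math import comb


def hyper_pmf(pop, draws, marked, j):
    """Pr[X = j] for a hypergeometric count; zero outside the support."""
    if j < 0 or j > marked or draws - j < 0 or draws - j > pop - marked:
        return Fraction(0)
    return Fraction(comb(marked, j) * comb(pop - marked, draws - j),
                    comb(pop, draws))


def hyper_cdf(pop, draws, marked, x):
    """Pr[X <= x], summed term by term."""
    return sum((hyper_pmf(pop, draws, marked, j) for j in range(0, x + 1)),
               Fraction(0))


def expected_net_cost(N, M, m, c, cs, cv):
    """Expected net cost of a first sample of size N (assumed cost model)."""
    M0 = sum(M)
    total = Fraction(N) * c
    for Mi, mi, ci, civ in zip(M, m, cs, cv):
        for j in range(0, min(Mi, N) + 1):
            p = hyper_pmf(M0, N, Mi, j)
            if j < mi:
                total += ci * (mi - j) * p   # shortfall bought from stratum
            else:
                total -= civ * (j - mi) * p  # credit for excess individuals
    return total


def delta_c(N, M, m, c, cs, cv):
    """Increase in expected net cost as given by equation (5)."""
    M0 = sum(M)
    total = sum(Mi * civ + Mi * (ci - civ) * hyper_cdf(M0 - 1, N, Mi - 1, mi - 1)
                for Mi, mi, ci, civ in zip(M, m, cs, cv))
    return c - Fraction(total) / M0


if __name__ == "__main__":
    M, m = [6, 10], [2, 3]
    c, cs, cv = 1, [Fraction(3, 2), 2], [Fraction(1, 2), 0]
    for N in range(0, 6):
        step = expected_net_cost(N + 1, M, m, c, cs, cv) - expected_net_cost(N, M, m, c, cs, cv)
        print(N, step, delta_c(N, M, m, c, cs, cv))
```

Since the two columns printed agree, the value of $N$ minimizing the enumerated cost is the least $N$ at which the increase becomes positive.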
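If $c_i/c = d$ and $c_i'/c = d'$, (18) becomes

$$C = c\Big[M_0(1 - d') + m_0 d' - (d - d')\sum_{i=1}^{k}(M_i - m_i)\,P(M_0, N, M_i, m_i - 1)\Big]. \tag{19}$$

If, further, $M_i = M_0/k$ and $m_i = m_0/k$ then we have

$$C = c\Big[M_0(1 - d') + m_0 d' - (d - d')(M_0 - m_0)\,P\Big(M_0, N, \frac{M_0}{k}, \frac{m_0}{k} - 1\Big)\Big]. \tag{20}$$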
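The approximation (11) to $P(M_0, N, M_0/k, m_0/k - 1)$ is the same as the approximation
(10) to $P(M_0 - 1, N, M_0/k - 1, m_0/k - 1)$, but with $\omega$ increased by approximately

$$\frac{M_0 - M_0/k}{(M_0 - \tfrac{1}{2}N)^2} = \frac{M_0(k - 1)}{k(M_0 - \tfrac{1}{2}N)^2}.$$

So

$$P\Big(M_0, N, \frac{M_0}{k}, \frac{m_0}{k} - 1\Big) \approx \frac{1 - d'}{d - d'} - \frac{M_0(k - 1)}{k(M_0 - \tfrac{1}{2}N)^2}\cdot\frac{N!}{(\tfrac{m_0}{k} - 1)!\,(N - \tfrac{m_0}{k})!}\;\omega^{m_0/k - 1}(1 - \omega)^{N - m_0/k}$$

with $\omega$ given by (12).

Inserting this approximation in (20) we find

$$C \approx c\Big[m_0 + (d - d')(M_0 - m_0)\,\frac{M_0(k - 1)}{k(M_0 - \tfrac{1}{2}N)^2}\cdot\frac{N!}{(\tfrac{m_0}{k} - 1)!\,(N - \tfrac{m_0}{k})!}\;\omega^{m_0/k - 1}(1 - \omega)^{N - m_0/k}\Big]. \tag{21}$$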
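The ratio of $C$ to the cost of sampling separately from each stratum ($m_0 c d$) is

$$\frac{1}{d} + \Big(1 - \frac{d'}{d}\Big)\Big(1 - \frac{m_0}{M_0}\Big)\Big(1 - \frac{N}{2M_0}\Big)^{-2}\frac{k - 1}{k}\cdot\frac{1}{m_0}\cdot\frac{N!}{(\tfrac{m_0}{k} - 1)!\,(N - \tfrac{m_0}{k})!}\;\omega^{m_0/k - 1}(1 - \omega)^{N - m_0/k}. \tag{22}$$

When $M_0$ is large this becomes approximately

$$\frac{1}{d} + \Big(1 - \frac{d'}{d}\Big)\Big(\frac{k - 1}{k}\Big)\binom{N}{m_0/k}\Big(\frac{1}{k}\Big)^{m_0/k}\Big(\frac{k - 1}{k}\Big)^{N - m_0/k}$$

which agrees with (12) in [1].

Very roughly, it appears that the excess of the ratio over $1/d$ is inversely
proportional to (initial sample size)$^{1/2}$ for given sets of values of $k$, $d'/d$
and $m_0$.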
REFERENCES

[1] Johnson, N. L., "Optimal sampling for quota fulfilment", Biometrika, 44
(1957), 518-523.

[2] Lieberman, G. J. and Owen, D. B., Tables of the Hypergeometric Probability
Distribution, Stanford University Press, 1961.

[3] Wise, M. E., "A quickly convergent expansion for cumulative hypergeometric
probabilities, direct and inverse", Biometrika, 41 (1954), 317-329.