Efficient Allocation of a Multi-Purpose Sample
Author(s): Leslie Kish
Source: Econometrica, Vol. 29, No. 3 (Jul., 1961), pp. 363-385
Published by: The Econometric Society
Stable URL: http://www.jstor.org/stable/1909637 .
Accessed: 05/06/2014 16:28
The Econometric Society is collaborating with JSTOR to digitize, preserve and extend access to Econometrica.
http://www.jstor.org
This content downloaded from 193.0.118.39 on Thu, 5 Jun 2014 16:28:20 PM
All use subject to JSTOR Terms and Conditions
Econometrica, Vol. 29, No. 3 (July 1961)
EFFICIENT ALLOCATION OF A MULTI-PURPOSE SAMPLE1
BY LESLIE KISH
SUMMARY
WHEN LISTING addresses within sample blocks for surveys, the field interviewer
hastily assigns economic ratings of L, M, and H (for low, medium,
and high) to dwellings. The means and the standard deviations differ greatly
among these strata with regard to socioeconomic characteristics; hence they
may be used for allocating different sampling rates to decrease the variance
of sample means and totals. Disproportionate sampling rates bring gains in
precision for the means of skewed financial items and for estimates based
on higher economic subclasses, but they bring losses in estimating most
proportions.
The many diverse purposes of economic surveys lead to conflicting allocations. To facilitate rational decision among them we developed condensed
ways of analyzing and presenting data. The tables display for many variables
the relative precision of several allocation schemes, including the optimum
one. Standard statistics are extended to provide estimators for subclasses
from stratified samples. Then these are used to investigate optimal and
other allocations for the subclasses.
1. THE PROCEDURES OF RATING AND SELECTION
The process of obtaining the socioeconomic ratings was simple. The field
procedures called for listing the dwellings located within selected sample
blocks or segments. At the same time, the interviewer also wrote one of the
letters L, M, or H next to each of the dwelling addresses on the listing sheet,
indicating the interviewer's guess of low, medium, or high income for its
occupants. The instructions asked the interviewers to rate the dwelling M
if probably occupied by a family whose income was between $3000 and $6000
per year (or $60 to $120 per week), L if lower, and H if higher. Later we
raised these limits to $5000 and $10,000 per year ($100 to $200 per week)
and encouraged the use of MH and ML for doubtful cases. This addition to
the duties of listing was easy and inexpensive. We emphasized that we wanted
only hasty personal judgments, that we expected frequent deviations from
the normal and "typical," and that despite this the results were statistically useful.
In the survey of 1954, the source of our empirical results, the sampling
1 This paper was read at the Annual Meeting of the American Statistical Association
in September, 1954 in Montreal. The final revision was supported by research grant
G-7571 from the National Science Foundation.
rates were 1, 2, and 4 in 20,000, respectively, for the three strata. These
numbers-1, 2, and 4-were the "loading factors" of strata 1, 2, and 3,
respectively. The office procedure consisted of selecting a sample of 4 in
20,000 and then eliminating from the sample half of the dwellings rated M
and three-fourths of the dwellings rated L. The proportions in the three
strata were W1 = .38, W2 = .53, and W3 = .09. In the analysis, the responses
were weighted (on machine cards) in inverse proportion to the loading
factors. We investigated the possible uses of these strata for increasing
efficiency. The "efficiencies" of different allocations in the three strata give
estimates of the relative numbers of interviews needed for the same precision
or variance (approximately, see Section 6).
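The selection and weighting just described can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the variable names are hypothetical, the stratum proportions and loading factors are those quoted above, and the "expected share of interviews" ignores nonresponse and other field losses.

```python
# Stratum proportions (L, M, H) and loading factors as quoted in the text.
W = {"L": 0.38, "M": 0.53, "H": 0.09}
loading = {"L": 1, "M": 2, "H": 4}
base_rate = 1 / 20000  # sampling rate per unit of loading factor

# Sampling rate in each stratum: loading factor times the base rate.
rates = {h: loading[h] * base_rate for h in W}

# Fraction of the initial 4-in-20,000 selection retained in each stratum
# (three-fourths of L and half of M are eliminated).
retained = {h: loading[h] / max(loading.values()) for h in W}

# Analysis weights, in inverse proportion to the loading factors.
weights = {h: 1 / loading[h] for h in W}

# Expected share of the retained sample falling in each stratum
# (an idealized figure: nonresponse is ignored).
total = sum(W[h] * loading[h] for h in W)
sample_share = {h: W[h] * loading[h] / total for h in W}
```

Under these assumptions stratum H, with 9 per cent of the dwellings, supplies about 20 per cent of the sample.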
Like most "eye-estimates," these ratings are mere guesses and far from
perfect, but also far from trivial. Here we should compare briefly this procedure for stratification with some other possible sources of information.
On the bases of modest investigations of some alternatives (to which we should
welcome additions) we think that: (a) The elaboration of the exterior rating
would not be worthwhile. (b) If the costs and problems of double sampling
by means of a brief interview inside the dwelling could be met, the variance
per interview could be reduced considerably. (c) Stratification by make of
car is not nearly so good as by dwelling, because too many rich people have
cheap cars and vice versa. (d) There are no national lists that are both convenient and inclusive enough for this purpose. (e) Only a small proportion
(under 10 per cent) of high income people live in identifiable high income
areas, such as Census tracts.
Columns 3, 4, and 5 of Table I present mean values of the three strata
based on L, M, and H ratings, numbered as strata 1, 2, and 3. Large differences among the strata appear for mean incomes and mean liquid assets and
for many other items, e.g., for the proportion with yearly disposable incomes under $1000, the proportion with college degrees, the proportion in
professional occupations, and the proportion of nonwhites. For example,
the proportion of units having incomes of $10,000 or more is about 1, 5, and
24 per cent in strata 1, 2, and 3, respectively. Even larger differences appear in
subsequent tables dealing with characteristics of selected subclasses. These
large differences show that ratings do indeed distinguish three rather
distinct groups. (The "pure" ratings actually had even greater differences,
as explained in Section 6(e).)2
2 The punctilious reader should be warned that he cannot check against 100 per
cent the percentages in columns 3, 4, and 5. In Groups C', E and F several intermediate
subclasses (mostly of type 1) have been omitted to save space without sacrificing
important results. From Group D the "not ascertained" category is missing. The
computation of medians involved proportions very near but not necessarily precisely
at the median.
2. DEVICES FOR MULTI-PURPOSE DESIGN
The differences among the strata means would be important for comparing
proportionate stratified sampling with unstratified sampling, since the gains
of the former depend on the variances of means among strata. But here we
are more interested in comparing the efficiencies of different allocations in
stratified sampling and these depend on the relative values of the standard
deviations within the strata.
Optimum allocation occurs when the sampling rates, hence the "loading
factors," are made proportionate to the standard deviations (Sh) within the
strata. Assuming a basic sampling rate of r for the first stratum, optimum
allocation is reached when the sampling rates in the other strata are r ko2
and r ko3, where ko2 = S2/S1 and ko3 = S3/S1. Actually, we used estimates s1, s2,
and s3 (as defined in Section 8). The "optimum loading factors" ko2 and ko3
(Table I, columns 6 and 7) are the ratios of the standard deviations within
the second and third strata, s2 and s3, to the standard deviation, s1, within
the first stratum. For example, for mean liquid assets the standard deviations
within the three strata were $1788, $4063, and $8927 and these lead to the
ratios 2.27 and 4.99. Important differences among standard deviations are
indicated by values of ko2 and ko3 which deviate greatly from unity. Generally,
these important differences are found for those characteristics for which the
means also vary widely.
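The arithmetic behind the optimum loading factors is simple enough to sketch in Python. The function name is hypothetical; the inputs are the three standard deviations for mean liquid assets quoted above.

```python
def optimum_loading_factors(sds):
    """Ratios of each stratum's standard deviation to that of stratum 1."""
    first = sds[0]
    return [s / first for s in sds]

# Standard deviations of liquid assets within strata 1, 2, 3 (dollars).
liquid_asset_sds = [1788, 4063, 8927]
ko = optimum_loading_factors(liquid_asset_sds)
print([round(k, 2) for k in ko])  # [1.0, 2.27, 4.99]
```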
We investigated the efficiencies of various allocations of the number of
interviews necessary for fixed variance. In columns 9-10 of Table I the
"efficiencies" of two different allocations (1: 1.5 : 3 and 1: 2: 4) are given
relative to the efficiency of a proportionate stratified sample, which appears
as the unit basis of comparison in column 8. The headings on columns 8-10
denote the loading factors, k2 and k3, for strata 2 and 3 relative to the sampling rate in the first stratum. Column 11 gives the efficiency of the optimum
allocation relative to that of a proportionate stratified sample. The optimum
is reached when the "loading factors" are taken in proportion to the standard
deviations, that is, when k2 = ko2 and k3 = ko3.
For an illustration, consider mean liquid assets. The optimum is reached
when the loading factors are in the ratio of 1:2.27:4.99 (columns 6 and 7).
The efficiency of the optimum loading is 1.30. That is, with proportionate
sampling it would take 30 per cent more interviews than with the optimum
design to obtain a fixed variance. The efficiency of the 1:1.5 :3 loading is
1.22. Hence, the advantage of that loading is equivalent to taking 22 per
cent more interviews with proportionate loading. The 1:2:4 loading at 1.29
obtains almost the same gains as the optimum.
Consider median liquid assets as another example. The variance of the
median is computed as a function of the variance of the proportion that
falls below the median [3, Vol. I, p. 448]. These proportions are .70, .48, and
.26 in the three strata. The standard deviation of a proportion is $\sqrt{PQ}$, and
this is $\sqrt{.70 \times .30} = .46$ for s1 in the first stratum. Then
$s_2 = \sqrt{.48 \times .52} = .50$ and $s_3 = \sqrt{.26 \times .74} = .44$ result in the
ratios (to .46) of 1.09 and 0.96 in columns 6 and 7. The optimum loading factors
are 1:1.09:0.96. Thus, the
optimum loading is close to proportionate sampling. The moderate 1: 1.5 :3
loading is 91 per cent efficient while the 1 :2 :4 loading is at 85 per cent
efficiency. That is, a proportionate sample of 850 interviews will give the
same precision as 1000 at the 1:2:4 loading.
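The figures in this example can be reproduced with a short Python sketch. It rests on an assumption that is not spelled out at this point in the text: that, for a fixed number of interviews and neglecting finite population corrections, the variance of the weighted mean is proportional to (sum of Wh Sh^2 / kh) times (sum of Wh kh). The function name is hypothetical; the stratum proportions W are those reported in Section 1.

```python
from math import sqrt

W = [0.38, 0.53, 0.09]  # stratum proportions reported in Section 1


def relative_efficiency(W, S, k):
    """Efficiency of loading k relative to proportionate sampling.

    Assumes variance for a fixed sample size is proportional to
    sum(W*S^2/k) * sum(W*k); finite population corrections are ignored.
    """
    var_prop = sum(w * s ** 2 for w, s in zip(W, S))
    var_k = sum(w * s ** 2 / kh for w, s, kh in zip(W, S, k)) * sum(
        w * kh for w, kh in zip(W, k)
    )
    return var_prop / var_k


# Proportions below the median of liquid assets in strata 1, 2, 3.
p = [0.70, 0.48, 0.26]
S = [sqrt(ph * (1 - ph)) for ph in p]   # sqrt(PQ) in each stratum
ratios = [s / S[0] for s in S]          # about 1, 1.09, 0.96

eff_moderate = relative_efficiency(W, S, [1, 1.5, 3])  # about 0.91
eff_heavy = relative_efficiency(W, S, [1, 2, 4])       # about 0.85
eff_opt = relative_efficiency(W, S, S)                 # about 1.00
```

Passing the standard deviations themselves as the loading gives the efficiency of the optimum, which here is indistinguishable from proportionate sampling.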
What we just noted about liquid assets is also true for income, for personal
debt, and for other financial items: while estimating the means often calls
for a disproportionate allocation (here about 1 :2 :4), the estimation of
medians calls for proportionate allocation (1: 1: 1). This situation, due to
the relatively small effect of differences among the strata (in the proportions
below the median), holds for most items.
Each of the lines in Table I presents a different characteristic in the
sample. Together, they represent a larger number of calculations and the
still greater number of results produced by the survey. Each statistic produces a different optimum and different efficiencies for any particular
allocation scheme. How is one to summarize this large variety of conflicting
data? Furthermore, this variety arises along two dimensions, because each
of these many characteristics is presented not only for the entire sample,
but also for many different subclasses. For example, the mean income
and the proportion of families buying cars are two of the characteristics, and
these are presented for several occupation classes, several education classes,
several income classes, etc. The broad and divergent aims of these "Surveys
of Consumer Finances" may be seen in the issues of the Federal Reserve
Bulletins where the results appeared annually from 1946 through 1959.
The July, 1950 issue describes the methods used in these surveys. Our
computations were based on the 1954 survey of about 3000 interviews.
The purposes of these surveys were many and subject to frequent change.
To provide empirical bases for decisions on the sample design was the
aim of this research into sampling efficiencies. Our methods and results
can aid in the design of other economic surveys which are also multipurpose in character.
Our analysis and discussion must transcend the standard methods of
optimum allocation in order to permit the treatment of a large variety of
different variables. To do this we resorted to several devices for unifying,
simplifying, and summarizing the presentation of the data. We found
enough regularity among types of items and subclasses to improve considerably our guesses of design parameters by extrapolating from the tabled
results to other similar items. The reader too may find some of these data
useful for his designs.
Asterisks call attention to loading schemes which are within five percentage points of the efficiency of the optimum. Since the optimum points are
not sensitive, frequently two allocation schemes possess efficiencies within
five per cent of the optimum.
[Figure 1: scatter plot of the optimum loading factors, with ko2 on the horizontal axis (0.5 to 3.5) and ko3 on the vertical axis (up to 10).]
FIGURE 1.-Optimum "Loading Factors" ko2 plotted against ko3. (The function
ko3 = ko2^2 is also shown.)
To simplify the process of organizing the data, we have sorted the variables
into seven "types" such that the relative efficiencies for each scheme of
loading are roughly the same for all variables designated by a type. These
efficiencies depend on the relative values of the standard deviations within
the three strata, hence on the values of ko2 and ko3. We begin by designating
as type 1 the estimates for which ko2 = ko3 = 1 approximately, and for
which the optimum allocation is therefore the proportionate one. Type 4
represents items for which ko2 = 2 and ko3 = 4, approximately. The
optimum allocation for this type is 1:2:4, which represents an efficiency
around 1.23 over proportionate sampling. Types 2 and 3 are intermediate
between 1 and 4, while for type 5 the optimum allocation is more extreme
than that of 1:2:4. On the contrary, 0 and 00 represent types for which ko2
and ko3 are both less than 1 and optimum allocation occurs with less than
proportionate sampling for the higher strata.
Criteria for these types were derived from efficiencies (relative to proportionate) of the 1:2:4 allocation (column 10) and the optimum allocation
(column 11). Illustrations in the following sections will clarify them. Their
usefulness depends on some regularity in the relationships among the standard
deviations within the three strata, hence among the optimal loading factors
ko2 and ko3. Actually, these appear to be in geometrical ratio to each other,
roughly but generally. That is, if S2i/S1i = ko2i for the ith characteristic,
then roughly S3i/S2i = ko3i/ko2i = ko2i also. Hence, ko3i = ko2i^2, and the
distribution of points against this line can be seen on Figure 1-as well as in
the ko2 and ko3 columns of Tables I-IV. This relationship appears for means
and for aggregates, in the entire sample and in subclasses. It seems to persist
for values of ko2i both greater and less than one. The relationship is rough but
still useful, because moderate departures from optimal allocations lose little
efficiency. This empirical relationship of c : c ko2 : c ko2^2 among the standard
deviations must depend on the particular rating system used. But for other
rating systems I would still guess a relationship of c : c ko2 : c(ko2)^g3 : c(ko2)^g4 :
c(ko2)^g5 with the g's constant over variables and 1 < g3 < g4 < g5, etc.
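A small Python sketch may make the geometric relationship concrete. The function name and the default exponents (0, 1, 2) are illustrative assumptions; the check uses the loading factors for mean liquid assets quoted in Section 2 (ko2 = 2.27, ko3 = 4.99).

```python
def geometric_profile(ko2, exponents=(0, 1, 2)):
    """Relative standard deviations ko2**g for each stratum (taking c = 1)."""
    return [ko2 ** g for g in exponents]


profile = geometric_profile(2.27)   # strata 1, 2, 3
predicted_ko3 = profile[2]          # ko2 squared
observed_ko3 = 4.99                 # value reported for mean liquid assets
relative_gap = abs(predicted_ko3 - observed_ko3) / observed_ko3
```

For this item the squared factor (about 5.15) lies within a few per cent of the observed ko3. Other exponents can be passed to mimic the more general relationship guessed at for other rating systems.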
3. ESTIMATES BASED ON THE ENTIRE SAMPLE
Table I presents computations of efficiencies based on the entire sample
for seven groups of means. They are equally valid for the corresponding
aggregates. In this table all the means are proportions except those in
Group A, which are the means of three dollar distributions, each rather
skewed and concentrated in the higher economic strata. The optima represent gains of 20 per cent to 30 per cent over proportionate sampling, and
they are near the loading scheme of 1:2:4. These statistics belong to type 4.
In Group B we examine the efficiencies for estimating the corresponding
medians. These calculations actually deal with the proportions which fall
below the median value. The medians possess efficiencies different from the
means and similar to the proportions of Group G.
The attitudinal items of Group G represent many more similar computations. Most of these and the large majority of all survey items-including
TABLE I
RELATIVE EFFICIENCIES OF SIX DIFFERENT ALLOCATIONS FOR CHARACTERISTICS OF ENTIRE POPULATION

                                    Mean or proportion      Optimum       Efficiency related to
                                        in stratum          loading     proportionate sample for
                                                             factor   different "loadings": (1:k2:k3)
Characteristic                Type       1       2       3    ko2    ko3  1:1:1  1:1.5:3   1:2:4   Opt.
(1)                            (2)     (3)     (4)     (5)    (6)    (7)    (8)      (9)    (10)   (11)

A. Means of Dollar Values (means in dollars)
Mean Disposable Income           4    2882    4085    6904   1.42   3.62     1     1.22*   1.21*   1.23
Mean Liquid Assets               4     745    1922    4819   2.27   4.99     1     1.22    1.29*   1.30
Mean Personal Debt               4     561     434     303   2.09   4.16     1     1.20*   1.23*   1.23

B. Median of Dollar Values (proportions below the median)
Median Total Income              1     .70     .48     .32   1.09   1.02     1*     .92     .85    1.00
Median Liquid Assets             1     .70     .48     .26   1.09    .96     1*     .91     .85    1.00
Median Personal Debt             1     .66     .49     .44    .98    .87     1*     .89     .82    1.00

C. Income Classes
under $1000                      0     .15     .07     .03    .72    .52     1*     .82     .72    1.04
$1000-$1999                      0     .18     .11     .07    .81    .64     1*     .85     .75    1.02
$2000-$2999                      0     .17     .13     .07    .88    .68     1*     .86     .78    1.01
$3000-$3999                      0     .20     .14     .14    .85    .85     1*     .87     .78    1.00
$4000-$4999                      1     .15     .17     .11   1.06    .86     1*     .90     .83    1.00
$5000-$7499                      2     .12     .26     .22   1.38   1.29     1*     .97     .93    1.02
$7500-$9999                      3    .022    .068     .12   1.72   2.19     1     1.05*   1.04*   1.07
$10,000 and over                 4    .009    .052     .24   2.29   4.45     1     1.21*   1.24*   1.24

C'. Liquid Assets Classes
None                             0     .44     .17     .07    .75    .52     1*     .83     .73    1.03
$5000-$9999                      3     .02     .06     .13   1.64   2.31     1     1.06*   1.05*   1.08
$10,000-$24,999                  3     .01     .03     .09   1.82   2.95     1     1.12*   1.12*   1.13
$25,000 and over                 4    .002    .006     .04   1.97   4.91     1     1.29    1.31*   1.32

D. Social Classes
Education:
  Grammar School                 0     .50     .30     .18    .92    .77     1*     .88     .79    1.00
  High School                    1     .37     .48     .34   1.04    .98     1*     .91     .84    1.00
  College, no degree             2    .044     .11     .19   1.53   1.91     1     1.02*   1.00*   1.05
  College, degree                3    .025    .086     .27   1.78   2.82     1     1.11*   1.10*   1.12
Occupation:
  Profess. & Semi-Pro.           3    .030    .083    .192   1.62   2.31     1     1.06*   1.05*   1.08
  Self Employed                  2    .046    .087    .159   1.35   1.75     1*    1.01*    .96    1.03
  Managers                       3    .019    .051    .129   1.62   2.46     1     1.08*   1.05*   1.09
  Clerical and Sales             2    .060    .143    .169   1.48   1.58     1*     .99*    .97    1.03
  Skilled and Semi-Sk.           1    .355    .308    .132    .96    .71     1*     .88     .80    1.01
  Unskilled and Service          0    .176    .075    .047    .69    .55     1*     .82     .72    1.04
  Miscellaneous                  1    .076    .060    .057    .90    .88     1*     .88     .80    1.00
  Farmers                        1    .076    .096    .017   1.11    .48     1*     .89     .82    1.03
  Unemployed                    00    .066    .022    .027    .58    .65     1      .81     .69    1.07
  Retired                        1    .080    .064    .065    .90    .91     1*     .89     .80    1.00
Color:
  Nonwhite                      00     .21     .05     .02    .55    .36     1      .78     .66    1.11

E. Buying in Dollar Classes
Bought selected durables in classes:
  (1) None                       1     .59     .55     .56   1.01   1.01     1*     .91     .84    1.00
  (2) 1-99                       1    .046    .050    .044   1.04    .98     1*     .91     .84    1.00
  (3) 200-299                    1    .087    .079    .080    .96    .97     1*     .90     .82    1.00
  (4) 1000 +                     2    .014    .029    .047   1.43   1.79     1*    1.01*    .98    1.04
House additions and repairs:
  (5) None                       1     .68     .59     .58   1.06   1.06     1*     .92     .85    1.00
  (6) 1-49                       0    .082    .066    .028    .90    .61     1*     .86     .78    1.01
  (7) 200-299                    1    .028    .040    .049   1.18   1.29     1*     .95     .90    1.01
  (8) 1000 +                     2    .025    .042    .067   1.28   1.59     1*     .99*    .94    1.02

F. Buying Categories
Bought Car?
  (1) yes                        1     .22     .24     .22   1.03   1.00     1*     .91     .84    1.00
  (2) new                        2     .06     .10     .15   1.24   1.47     1*     .97*    .92    1.02
  (3) used                       1     .15     .14    .067    .96    .69     1*     .88     .80    1.01
Expected to buy car?
  (4) yes                        1     .12     .16     .21   1.12   1.26     1*     .94     .88    1.01
Bought house?
  (5) yes                        1    .035    .050    .050   1.20   1.19     1*     .94     .89    1.01
Bought T.V.?
  (6) for cash                   2    .041    .075    .095   1.33   1.48     1*     .98*    .94    1.02
  (7) installment                0    .091    .075    .042    .91    .70     1*     .87     .79    1.01

G. Attitudes
Better Off Last Year
  (1) yes                        1     .32     .38     .43   1.04   1.07     1*     .92     .85    1.00
  (2) same                       1     .33     .31     .30    .98    .98     1*     .90     .83    1.00
  (3) no                         1     .33     .29     .25    .97    .91     1*     .89     .82    1.00
Good Time To Buy
  (4) good                       1     .32     .36     .46   1.03   1.07     1*     .92     .85    1.00
  (5) pro-con                    1    .091     .12     .11   1.12   1.10     1*     .93     .87    1.00
  (6) poor                       1     .45     .37     .29    .97    .91     1*     .90     .82    1.00

The "Standard" (S1 = S2 = S3)    1       1       1       1   1.00   1.00     1*     .91     .83    1.00

The basis of the computations is ko2 = s2/s1 and ko3 = s3/s1; these are the "optimum loading factors" of columns 6
and 7.
The "loading factors" k2 and k3 are the ratios of the sampling rates in strata 2 and 3 to the sampling rate in stratum 1.
The efficiencies of five disproportionate allocations are given as ratios of the efficiency of proportionate sampling.
"Efficiency" denotes the precision (inverse of the variance) per sample size.
* Denotes allocations which are within five percentage points of the optimum.
the computations of medians-have this in common: the proportions for
the three strata lie between 15 and 85 per cent. This still permits a great
deal of variation among the strata-variation which may be of statistical
and economic significance. Nevertheless, the standard deviations $\sqrt{PQ}$ lie
between .36 and .50; and the ko2 and ko3 are usually well within .80 and 1.20.
These items can be well approximated by the "standard" of type 1 for which
S1 = S2 = S3; hence ko2 = ko3 = 1. Thus, for this majority of items, the
optimum is near 1: 1: 1. The slight loading of 1: 1:2 is at 96 per cent efficiency; the moderate loading of 1 :1.5 :3 is at 91 per cent, but a heavy
loading of 1:2 :4 is down to 83 per cent efficiency.
In Groups E and F we have some categories relating to buying behavior
and buying intentions. These are also mostly of type 1 with optima at the
proportionate allocation, but some categories, such as new car buying and
large purchases, are of type 2. For these the optimum would be reached
with some moderate loading; but the efficiency of the optimum is always
within 1.05 (i.e., between 1 and 1.05) relative to proportionate allocation.
Furthermore, the 1 :2 :4 loading is within 0.90 efficiency relative to proportionate allocation-and this limit distinguishes it from type 1.
Groups C and D present characteristics which form scales strongly
associated with economic status. The items range from type 00 to type 4,
ascending from the lower to the higher classes on the social and economic
scale. Type 3 items, contrasted to type 2, have relative efficiency over 1.00
with the 1 :2 :4 loading and over 1.05 with their optimum; but both the
1:2 :4 and the optimum loading are under 1.15 relative efficiency and this
separates them from type 4, for which the gains from such oversampling
are greater than 1.15.
At the lower end of the socioeconomic scale, several items of type 0 occur.
For these items the optimum calls for slightly higher sampling rates in the
lower strata. They (unlike type 1) have less than .80 efficiency with the
1:2:4 loading, relative to proportionate sampling; this has almost the efficiency of the optimum (within 1.05). There are only a few extreme items of
type 00. For these the efficiency of the optimum, with "reverse" loading,
would be greater than 1.05, and the 1 :2 :4 loading is less than .70 efficient
-both efficiencies being relative to proportionate sampling and distinguishing type 00 from type 0. Estimates of the proportions of the unemployed and
of nonwhites are in this category. Apparently these ratings, based on the
quality of dwellings, segregate the nonwhites more distinctly than any
other underprivileged socioeconomic group-whether based on income,
assets, or buying.
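The criteria for the types can be gathered into one rule, sketched here in Python. This is only one reading of the limits stated in the text: the function name is hypothetical, the handling of values exactly at a limit is an assumption, the 1.35 limit for type 5 is the one given in Section 4, and a few tabled items that sit on a boundary are typed differently in the tables.

```python
def classify_type(eff_124, eff_opt):
    """Type label from two efficiencies relative to proportionate sampling.

    eff_124: efficiency of the 1:2:4 loading; eff_opt: efficiency of the
    optimum. Boundary handling is an assumption of this sketch.
    """
    if eff_124 < 0.80:
        # Lower strata deserve relatively more sampling: types 0 and 00.
        if eff_opt > 1.05 and eff_124 < 0.70:
            return "00"
        return "0"
    if eff_opt <= 1.05:
        # Near-proportionate optimum: type 1, or type 2 if 1:2:4 loses little.
        return "1" if eff_124 < 0.90 else "2"
    if eff_opt <= 1.15:
        return "3"
    return "4" if eff_opt <= 1.35 else "5"


# The "standard" with equal standard deviations: 1:2:4 at .83, optimum 1.00.
standard_type = classify_type(0.83, 1.00)
```

The rule is meant for sorting items quickly, not for reproducing every entry of the tables.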
4. ESTIMATES OF MEANS FOR SUBCLASSES
When dealing with different characteristics for various subclasses one
encounters redoubled complexity: the surveys present estimates for many
characteristics, and each of these for many different subclasses.
The subclasses-sometimes called "domains"-cut across the strata.
Despite slight theoretical difficulties, the ordinary sample mean $\bar{y}_c$ for the
subclass c is a good ratio estimator (see [4] and [6, pp. 297-305]). Furthermore, the estimator of its variance is
$$\mathrm{var}(\bar{y}_c) = \frac{1}{n^{*2}} \sum_h \frac{n_h}{k_h^2} \left[ \bar{n}_{ch} s_{ch}^2 + \bar{n}_{ch}(1 - \bar{n}_{ch})(\bar{y}_{ch} - \bar{y}_c)^2 \right].$$
Here $\bar{n}_{ch}$ is the proportion of subclass c members in stratum h. The brackets
contain the variance per element in the stratum, which consists of two
terms. The first term is $s_{ch}^2$, the variance per element within the subclass,
multiplied by the proportion of subclass members. The second term expresses
the variances among the stratum means for the subclass. The entire bracketed
expression plays the same role in allocation as the usual variance per
element (see Section 9). Compared to (b) of Section 8, we have here neglected
the trivial factors of (1 - fh) and nh/(nh - 1).
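The two-term variance per element described above can be sketched in Python as follows. The function and argument names are hypothetical, and the numerical inputs are invented purely for illustration; they are not survey results.

```python
def per_element_variance(n_bar, s2, ybar_ch, ybar_c):
    """Variance per element in a stratum for a subclass mean.

    n_bar: proportion of subclass members in the stratum.
    s2: variance per element within the subclass in the stratum.
    ybar_ch: subclass mean in the stratum; ybar_c: overall subclass mean.
    Returns the (within, between) components and their sum.
    """
    within = n_bar * s2
    between = n_bar * (1 - n_bar) * (ybar_ch - ybar_c) ** 2
    return within, between, within + between


# Illustrative (made-up) values for a binomial item: 19.2 per cent of the
# stratum belongs to the subclass, with a proportion of .50 in the stratum
# against .45 for the subclass as a whole.
within, between, total = per_element_variance(0.192, 0.5 * 0.5, 0.50, 0.45)
```

With these illustrative numbers the between-means component is under one per cent of the within component.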
Many of our variables are binomial, and for these the second term is
usually small compared to the first, and may be neglected. Furthermore,
for these variables, $s_{ch} = \sqrt{\bar{y}_{ch}(1 - \bar{y}_{ch})}$ and these usually do not vary much
among the strata. Insofar as one can assume negligible between-stratum
components as well as similar $s_{ch}$ for all strata, one can treat the standard
deviation in the stratum as proportional to $\sqrt{\bar{n}_{ch}}$. Then the efficiency of an
allocation scheme depends only on the proportion of subclass members in the
various strata. These are the assumptions behind the "standards" shown
for several subclasses in Table II. We found these standards useful for
many variables, just as we found in Table I that many variables-especially
in Groups E, F, and G-behaved very much like the "standard" type 1.
For many variables, especially proportions, we found results (in terms of
ko2 and ko3) rather similar to the "standards." For example, the standard
for the domain of professionals had ko2 = 1.67 and ko3 = 2.54 (see Table II).
We compared these with many actual computations (not shown here). For
estimating the proportion of those who thought it was a "good time to buy,"
these factors were ko2 = 1.66 and ko3 = 2.54; for estimating the proportions
who said that they were "making more money than in the previous year,"
the factors were ko2 = 1.63 and ko3 = 2.34.
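The "standard" for a subclass can be approximated from the proportions of subclass members in the three strata. The Python sketch below assumes, as a reading of the assumptions stated above, that the standard deviation per element in a stratum is proportional to the square root of that proportion. The function name is hypothetical; the inputs are the proportions of professionals in strata 1, 2, and 3 from Group D of Table I.

```python
from math import sqrt


def standard_loading_factors(n_bar):
    """ko2 and ko3 when the standard deviation is proportional to sqrt(n_bar)."""
    return [sqrt(p / n_bar[0]) for p in n_bar[1:]]


# Proportions of professionals in strata 1, 2, 3 (Table I, Group D).
professionals = [0.030, 0.083, 0.192]
ko2, ko3 = standard_loading_factors(professionals)
```

The results, roughly 1.66 and 2.53, agree with the tabled standard of 1.67 and 2.54 up to the rounding of the published proportions.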
For many subclasses the $\bar{n}_{ch}$ are relatively constant from stratum to
stratum. These are subclasses which were not strongly distinguished by the
economic ratings, e.g., age classes, geographic or city size classes, attitudinal
classes. For this large variety of subclasses the "standard" is similar to that
for the entire population (ko2 = 1 and ko3 = 1) and no separate results need
be presented for them.
The situation is different for subclasses which were distinguished by the
TABLE II
"STANDARDS" FOR COMPARING THE RELATIVE EFFICIENCIES OF SOME SUBCLASSES
(Computed by assuming $k_{o2} = \sqrt{\bar{n}_{c2}}/\sqrt{\bar{n}_{c1}}$ and $k_{o3} = \sqrt{\bar{n}_{c3}}/\sqrt{\bar{n}_{c1}}$, where $\bar{n}_{ch}$ is the proportion
of subclass members in stratum h.)

                                      Optimum       Efficiency related to proportionate
                                      loading      sample for different "loadings":
                                      factor                 (1:k2:k3)
"Subclass"                    Type    ko2    ko3  1:1:1  1:1.5:3   1:2:4   Opt.
(1)                            (2)    (3)    (4)    (5)      (6)     (7)    (8)

Entire Population                1   1.00   1.00     1*     .91     .83    1.00

H. Income Classes
  Under $1000                    0    .69    .49     1*     .82     .71    1.05
  $1000-$1999                    0    .77    .60     1*     .84     .74    1.02
  $2000-$2999                    0    .86    .64     1*     .86     .77    1.01
  $3000-$3999                    0    .82    .82     1*     .86     .77    1.01
  $4000-$4999                    1   1.08    .84     1*     .90     .84    1.00
  $5000-$7499                    2   1.52   1.37     1*     .98     .95    1.03
  $7500-$9999                    3   1.76   2.31     1     1.06*   1.05*   1.08
  $10,000 +                      4   2.34   5.09     1     1.26*   1.29*   1.30
  ($7500 +)                      4   1.95   3.39     1     1.15*   1.16*   1.16

I. Occupation Classes
  Professional and Semi-Pro.     3   1.67   2.54     1     1.08*   1.07*   1.09
  Self Employed                  2   1.38   1.87     1*    1.02*    .98    1.04
  Managers                       3   1.64   2.61     1     1.09*   1.08*   1.10
  Clerical and Sales             2   1.55   1.68     1*    1.01*    .98    1.04
  Skilled and Semi-Skilled       0    .93    .61     1*     .87     .79    1.01
  Unskilled and Service         00    .65    .52     1      .81     .70    1.06
  Miscellaneous                  0    .89    .87     1      .88     .79    1.00
  Unemployed                    00    .57    .64     1      .80     .69    1.08
  Retired                        1    .89    .90     1*     .88     .80    1.00
economic ratings and Table II is devoted to these. The results show that the
standards range from types 00 to 4 as we ascend the scale of economic class.
Hence, for estimating proportions based on the blue collar workers, or on
lower income groups, it is most efficient to use proportionate sampling (or
even less) for the higher strata; but for estimating proportions based on the
higher income or occupational groups, a higher loading gives moderate
gains.
For some characteristics, however, the above assumptions do not hold
well. Hence their statistical characteristics (in terms of the ko2 and ko3)
depart from the "standard." These departures are usually in predictable
directions; for low economic activity the factors often are depressed, while
for high economic activity the factors are increased. For example, for the
subclass having incomes above $7500 the "standard" had factors of ko2 =
1.95 and ko3 = 3.39 (see Table II). For estimating the proportion within
this income subclass of those who possess liquid assets of over $7500 we
actually found the factors increased to ko2 = 2.78 and ko3 = 6.19 (computations not shown).
The largest departures from the standards, and the largest gains for highly
loaded samples, were found for the kinds of items illustrated in Tables III
and IV.
TABLE III
RELATIVE EFFICIENCIES OF THE MEAN LIQUID ASSETS HOLDINGS FOR OCCUPATION SUBCLASSES

                                      Mean for stratum      Optimum       Efficiency related to
                                        (in dollars)        loading     proportionate sample for
                                                             factor   different "loadings": (1:k2:k3)
Occupation subclass           Type       1       2       3    ko2    ko3  1:1:1  1:1.5:3   1:2:4   Opt.
(1)                            (2)     (3)     (4)     (5)    (6)    (7)    (8)      (9)    (10)   (11)

J. Occupation
Professional and Semi-Pro.       5     797    2151    4145   2.74   7.28     1     1.37    1.42    1.48
Self Employed                    5    1721    3224    9852   2.55   7.74     1     1.43    1.48    1.56
Managers                         5    1113    3267    7312   2.44   9.06     1     1.53    1.60    1.76
Clerical and Sales               4    1668    1734    2847   2.00   3.41     1     1.15*   1.16*   1.16
Skilled and Semi-Skilled         1     654    1238    1895   1.51   1.23     1*     .97     .94    1.03
Unskilled and Service            0     586    1071     812    .93    .28     1*     .85     .77    1.05
Miscellaneous                    4     480    1802    4176   1.91   4.80     1     1.29*   1.31*   1.32
Retired                          3     905    2442    5736   1.43   2.88     1     1.07*   1.05*   1.07
TABLE IV
RELATIVE EFFICIENCIES OF THE AGGREGATES LIQUID ASSETS HOLDINGS FOR OCCUPATION SUBCLASSES

                                      Optimum       Efficiency related to proportionate
                                      loading      sample for different "loadings":
                                      factor                 (1:k2:k3)
Occupation subclass           Type    ko2    ko3  1:1:1  1:1.5:3   1:2:4   Opt.
(1)                            (2)    (3)    (4)    (5)      (6)     (7)    (8)

K. Occupation
Professional and Semi-Pro.       5   3.96  10.33     1     1.39    1.46    1.57
Self Employed                    5   3.05   9.38     1     1.45    1.52    1.64
Managers                         5   5.14  18.14     1     1.55    1.65    1.96
Clerical and Sales               3   1.88   3.19     1     1.13*   1.14*   1.15
Skilled and Semi-Skilled         1   1.56   1.31     1*     .98     .95    1.04
Unskilled and Service            0    .96    .36     1*     .86     .78    1.04
Miscellaneous                    4   2.34   5.61     1     1.30*   1.34*   1.35
Retired                          4   1.69   3.34     1     1.16*   1.16*   1.17
These tables deal with the skewed distributions of "dollar variables" pertaining to subclasses that represent economic classes with differing dollar
shares. These are important types of estimates in many surveys. Table III
deals with mean liquid asset holdings and Table IV with the corresponding
totals. The higher economic subclasses are of type 5, denoting relative
efficiencies above 1.35 for the optimum compared to proportionate sampling (this limit distinguishes type 5 from type 4). Although the optimum is
even more disproportionate, the 1:2:4 loading achieves most of the gains.
But even a moderate overloading of, say, 1:1.5:3 obtains a large portion
of the possible gains and in turn loses less for the less extreme items.
The larger values (than indicated by the standard) of \(k_{o2}\) and \(k_{o3}\) for high
economic activity are due to coincidence in the higher strata of both larger
proportions, \(\bar{n}_{ch}\), of elements with higher activity and of higher variances,
\(s_{ch}^2\), per element. On the whole, the results show a rather consistent pattern
in the rough additivity of these two factors, and this is useful for extrapolating
the empirical results to other characteristics in the same survey and for
planning future surveys.
5. ESTIMATES OF AGGREGATES FOR SUBCLASSES
Estimating aggregates for subclasses may be important, e.g., they can
be used to estimate the "share" of the entire sample accounted for by the
subclass, such as an occupation subclass or an income category.
When dealing with the entire sample (as in Table I), the standard deviations are equally relevant for allocating observations whether estimating
aggregates or means. When dealing with subclasses, however, the best allocation for estimating the aggregate (Table IV) is not necessarily the same as
that for estimating the mean. This may be judged from the formula for
the variance of aggregates (from equation (3b) of Section 8, but neglecting
the trivial factor \((1 - f_h)\, n_h/(n_h - 1)\)):
\[
\operatorname{var}(y'_c) = \frac{N}{r} \sum_h^H \frac{W_h}{k_h} \left[ \bar{n}_{ch} s_{ch}^2 + \bar{n}_{ch} (1 - \bar{n}_{ch})\, \bar{y}_{ch}^2 \right].
\]
Comparing the variance per element within the brackets with that for the
mean, we see that the first term is the same, but the second term differs. The
second term is here a function of the value of \(\bar{y}_{ch}\), the mean per subclass
element within the stratum (while in the variance of the mean it depends on
the deviations of the means). Now the second term is generally neither small
nor similar from stratum to stratum. We found (as expected) roughly the
following relationships among the strata:
For dollar items the subclass means, \(\bar{y}_{ch}\), were roughly proportional to
the standard deviations, \(s_{ch}\); that is, the coefficients of variation, \(s_{ch}/\bar{y}_{ch}\),
were approximately constant among the strata. Usually \((1 - \bar{n}_{ch})\) is unimportant, and then the second term is roughly proportional to the first. Hence,
the allocations for estimating the means would hold also roughly for estimating the aggregates. Actually, the factors \(k_{o2}\) and \(k_{o3}\), and hence the optimum
allocations, diverge somewhat more for aggregates than for the mean
(compare Tables III and IV). This is due largely to the "damping" effects
of the relative constancy of the second term in the variances for the means.
For estimating aggregates of binomials the second term generally had
little disturbing effect on the allocations. For proportions near 0.5 it was
proportional to the first term, since \(\bar{y}_{ch}^2 = .25\) and \(s_{ch}^2 = \bar{y}_{ch}(1 - \bar{y}_{ch}) = .25\),
also. For small proportions the second term is relatively small; e.g., for
\(\bar{y}_{ch} = .1\) the \(\bar{y}_{ch}^2 = .01\) while \(s_{ch}^2 = .09\).
6. SIMPLIFICATIONS INTRODUCED INTO THE ANALYSIS
To be able to deal with and present a large body of complex data, we
simplified the treatment in several ways, while keeping the basic problems
in sight. But to apply these results to a specific sample design, one should
consider several aspects neglected in the analysis.
(a) The effects of the complexities of the cluster design have been disregarded. Hence, the actual losses and gains of the total variances are less than
the computed variations in efficiencies. For complex designs one can usually
consider the effects of clustering as additional components: great for
large clusters, such as counties in a national sample, and less for samples
thoroughly spread over the population (e.g., in a city). In either case, if
one keeps the number of clusters constant, then the computed relative
efficiencies indicate the ratios by which the numbers of interviews need to
be increased or decreased.
(b) Our computations were based on the results of 3000 cases, and the
results for subclasses are particularly subject to sampling errors. But since
the efficiencies of allocation vary but little for moderate departures in the
values of the factors \(k_{o2}\) and \(k_{o3}\), precision is unnecessary. The kind of
stability that was needed came from the computations for the great variety of
items comprising the survey. Moreover, for ease of computation we neglected
the factors \((1 - f_h)\) and \(n_h/(n_h - 1)\).
(c) We consider only estimates in which there is complete compensation,
with the inverse weighting of estimates, for the differences of selection
rates. But in the discussion of efficiencies we neglected the increased costs
of a weighted sample as against a proportionate sample. The increase in the
costs of analysis, of sampling, of planning, of bookkeeping is not negligible,
and one could for the same cost buy more interviews so as to increase the
precision of the proportionate sample.
Furthermore, if the sampling rates were kept constant, "rating" the
dwellings would be unnecessary. The cost of these ratings was, however,
small in our situation. We also disregarded possible differences among the
strata in cost per interview. These differences were smaller than we could
measure and the differences in \(\sqrt{C_h}\) (see formula 12) would not affect the
allocations by more than a per cent or two.
Increasing the divergence among sampling rates increases also the total
numbers of addresses that need to be listed in one form or another, and this
results in increased listing costs. Roughly the listing of a dwelling had a
cost of $0.50 against $20.00 for interviewing and coding, a ratio of 1:40.
The proportion of "used" dwellings in the 1:2:4 allocation was .09/1 + .53/2 +
.38/4 = .45; hence, we had to list 1/.45 = 2.2 dwellings for each dwelling
used. Thus the cost of a 1:2:4 allocation was (40 + 2.2)/(40 + 1) = 1.03
times as great per dwelling as for proportionate sampling. This factor can be
applied to reduce the relative efficiencies shown in the columns for the
1:2:4 loading.
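The listing-cost arithmetic above can be retraced in a few lines. This is only a sketch of the calculation quoted in the text; the function name `listing_cost_factor` and its argument names are illustrative assumptions, not part of the original analysis.

```python
# Sample shares by stratum under the 1:2:4 loading, and unit costs, as quoted in the text.
sample_shares = [0.09, 0.53, 0.38]
loadings = [1, 2, 4]
list_cost, interview_cost = 0.50, 20.00  # dollars per dwelling


def listing_cost_factor(shares, k, list_cost, interview_cost):
    """Return (fraction of listed dwellings used, listings per used dwelling,
    cost per used dwelling relative to proportionate sampling)."""
    used_fraction = sum(s / kh for s, kh in zip(shares, k))
    listed_per_used = 1.0 / used_fraction
    ratio = interview_cost / list_cost  # interviewing cost in units of one listing
    # Proportionate sampling lists exactly one dwelling per dwelling used.
    factor = (ratio + listed_per_used) / (ratio + 1.0)
    return used_fraction, listed_per_used, factor


used, listed, factor = listing_cost_factor(sample_shares, loadings, list_cost, interview_cost)
print(round(used, 2), round(listed, 1), round(factor, 2))  # 0.45 2.2 1.03
```

Dividing an efficiency from the 1:2:4 column by this factor gives the efficiency net of listing costs.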
(d) The above factor was just about cancelled by an equal correction in
the other direction for the actual sampling procedures (as described in Section
1). This procedure did not have the actual population weights (\(W_h\)) that a
true stratified sample requires. It can be regarded usefully as a procedure of
"double sampling for stratification" [2, pp. 268-271]. In this, the variance
is approximately \((1/n) \sum W_h s_h^2 + (1/n') \sum W_h (\bar{y}_h - \bar{y})^2\), where \(n'\) and \(n\) are
the sample sizes for the first and second phases.
An allocation of 1:2:4 would utilize a first phase of \(n' = n/.45\) listings,
thus the second term would enter with the factor \((.45/n)\). But for an allocation of 1:1:1 only \(n' = n\) listings would be needed and the second term
would enter with the factor \((1/n)\). This difference was found to make the
1:2:4 allocation, compared to the proportionate 1:1:1, more efficient
actually than shown in the tables by an effect of 1.02, on the average. This
effect ranged mostly from 1.00 to 1.04, with one extreme at 1.06; these were
all based on the entire sample. For variates based on subclasses, this effect
will be smaller.
(e) Actually, the data in Tables I-IV understate somewhat the efficiencies
of actual interviewer ratings for types 3, 4, and 5. They represent the entire
sample selected with sampling rates in the proportions of 1:2:4 for the
three strata, because we wanted to judge the performance of the ratings in
the sample as actually used. However, to judge how well the interviewer
ratings could perform if completely utilized, we call attention to some
important divergences of the three strata (1, 2, 3) from the interviewers'
actual ratings (L, M, H). First, 22 per cent of the respondents were in the
"open country," where compact segments were the last stage sampling
units and where no ratings were assigned to the dwellings. Secondly, in
about 7 per cent of the cases, the interviewers' ratings were mixed (LM or
MH). Thirdly, about 3 per cent of the cases were placed into higher or lower
strata than indicated by the ratings. We used this flexibility to adjust the
sample sizes to lessen field problems. Because of these departures of the
actual strata from the interviewers' ratings, the data in Tables I-IV actually
understate somewhat the true efficacy of the ratings. To evaluate these
effects we also investigated on a limited basis the 68 per cent of the sample
in which the allocations were truly based on interviewers' ratings. Most of
the results, as expected, were close to those for the full sample. The chief
departures came for items of types 3-5, for which the advantage of the optimum over the proportionate was considerable. These are illustrated in
Table V and can be compared with the same variables in Table I. The
"pure" strata of Table V are seen to achieve optima from 5 to 20 per cent
higher than the "impure" strata of Table I. The reader can estimate, for
all proportions in the tables, \(s_h^2 = p_h(1 - p_h)\). For the three important
dollar items the \(s_h\) are shown in Table VI.

TABLE V
RELATIVE EFFICIENCIES WHEN STRATA FOLLOW STRICTLY THE INTERVIEWERS' RATINGS
(Based on 68 per cent of entire sample. These results should be compared with similar
characteristics in Table I based on the entire sample.)
Columns (3)-(5): mean or proportion in stratum. Columns (6)-(7): optimum loading factor. Columns (8)-(11): efficiency related to proportionate sample for different "loadings" (1 : k2 : k3).

| Characteristic (1) | Type (2) | Stratum 1 (3) | Stratum 2 (4) | Stratum 3 (5) | ko2 (6) | ko3 (7) | 1:1:1 (8) | 1:1.5:3 (9) | 1:2:4 (10) | Opt. (11) |
|---|---|---|---|---|---|---|---|---|---|---|
| Mean Disposable Income | 5 | 2811 | 4174 | 7974 | 1.62 | 4.67 | 1 | 1.32* | 1.32* | 1.36 |
| Mean Liquid Assets | 5 | 659 | 1824 | 5789 | 2.00 | 5.87 | 1 | 1.38 | 1.41* | 1.45 |
| Mean Personal Debt | 5 | 308 | 381 | 659 | 1.53 | 4.96 | 1 | 1.36 | 1.37* | 1.42 |
| Income Classes | | | | | | | | | | |
| $5000-$7499 | 1 | .11 | .28 | .23 | 1.43 | 1.33 | 1* | .98* | .94 | 1.03 |
| $7500-$9999 | 4 | .008 | .07 | .12 | 2.73 | 3.46 | 1 | 1.12 | 1.16* | 1.20 |
| $10,000 and over | 5 | .004 | .05 | .31 | 3.31 | 7.16 | 1 | 1.30 | 1.37 | 1.43 |
| Occupation | | | | | | | | | | |
| Professional and Semi-Pro. | 3 | .02 | .09 | .23 | 1.93 | 2.79 | 1 | 1.10* | 1.11* | 1.12 |
| Self Employed | 3 | .03 | .09 | .19 | 1.59 | 2.16 | 1 | 1.05* | 1.03* | 1.07 |
| Managers | 3 | .02 | .06 | .14 | 1.68 | 2.51 | 1 | 1.08* | 1.07* | 1.09 |
| Liquid Assets | | | | | | | | | | |
| $5000-$9999 | 3 | .02 | .07 | .16 | 1.81 | 2.72 | 1 | 1.10* | 1.10* | 1.11 |
| $10,000-$24,999 | 4 | .008 | .03 | .14 | 1.89 | 3.76 | 1 | 1.19* | 1.20* | 1.20 |
| $25,000 and over | 5 | .002 | .004 | .05 | 1.32 | 4.94 | 1 | 1.39 | 1.40 | 1.49 |

TABLE VI

| | Entire sample: stratum 1 | Entire sample: stratum 2 | Entire sample: stratum 3 | "Pure" 68 per cent: L | "Pure" 68 per cent: M | "Pure" 68 per cent: H |
|---|---|---|---|---|---|---|
| Mean Disposable Income | 1,939 | 2,755 | 7,012 | 1,725 | 2,794 | 8,056 |
| Mean Liquid Assets | 1,788 | 4,063 | 8,927 | 1,716 | 3,429 | 10,065 |
| Mean Personal Debt | 535 | 1,121 | 2,226 | 543 | 833 | 2,693 |
7. CONCLUSIONS
The choice of the allocation scheme is easier if an item or a small set of
similar items can be designated realistically as the prime objective of the
survey. But many surveys, like ours, have many conflicting objectives. To
guide the choice of design we were compelled to present the consequences of
different designs for the entire spectrum of survey variables. Our objective
was to make this vast task feasible and comprehensible.
For many survey statistics we obtained estimates of the optimum allocation and of the relative efficiencies for several allocation schemes. These
generally differ, and a brief summary may be useful.
Proportionate sampling seems to be near the optimum for the majority of
estimates dealing with proportions. But disproportionate sampling results
in moderate to large gains for some important kinds of estimates, such as:
(a) Means of financial variables with skewed distributions, such as income
and liquid assets; (b) Proportions denoting high economic status or high
economic activity; (c) Proportions based on subclasses standing high on
the socioeconomic scale; (d) Financial variables based on subclasses high
on the socioeconomic scale.
For the first three, the gains over proportionate sampling were below 40
per cent; for the last, they went up to 100 per cent. For all these items disproportionate sampling rates, perhaps in the 1:2:4 ratios actually used,
are close to the optimum. On the other hand, such disproportionate sampling
is 15 to 20 per cent less efficient than proportionate sampling for the majority
of proportions estimated in the survey. The allocation of 1:1.5:3 would
cut these losses to about 10 per cent and still achieve most of the gains
possible for the other items.
Increasing the precision is generally not equally important for all items
in a survey, and the problem should be stated in terms of the "costs" of
specified errors for each of the estimates in a decision function. In this
formulation, as in many practical situations, the requirements cannot
reasonably be met by the statistical analyst. Nevertheless, we might raise
ever so briefly the question of the relative importance of errors for different
estimates.
In comparing a characteristic for two different economic subclasses, the
variance of the comparison is often dominated by the variance of the subclass "higher" on the economic scale, because the "higher" subclasses are
both more variable and smaller. They are made smaller because they
represent the tails of skewed distributions and economic activities and
have importance well beyond their proportions. Increasing the efficiency of
estimators based on these relatively small subclasses of higher economic
status and activity favors some disproportion in sampling rates. A moderate
disproportion may produce most of the gains for these items and result in
only small losses for the others. However, one should also consider the increased costs of sampling and analysis before departing from the simplicity
of proportionate sampling.
8. SUBCLASSES FROM STRATIFIED SAMPLES
For a population of \(N = \sum_h^H N_h\) elements divided into \(H\) strata the
population mean
\[
\bar{Y} = Y/N = (1/N) \sum_h^H \sum_i^{N_h} Y_{hi} = (1/N) \sum_h^H N_h \bar{Y}_h = \sum_h^H W_h \bar{Y}_h
\]
is usually estimated by
\[
\bar{y} = \sum_h^H W_h \bar{y}_h = \frac{1}{n^*} \sum_h^H \frac{1}{k_h} \sum_i^{n_h} y_{hi} = \frac{y^*}{n^*},
\]
where \(W_h = N_h/N\) is the proportion of elements in the hth stratum and
\(\sum_h^H W_h = 1\); \(f_h = n_h/N_h = r k_h\) is the selection rate within the hth stratum;
\(r\) is some basic selection rate; \(k_h\) is the "loading factor" in stratum h (usually
some convenient number; for example, in the survey which provided our
empirical results, \(r\) was near 1/20,000 and the loading factors for the three
strata were \(k_1 = 1\), \(k_2 = 2\), and \(k_3 = 4\)); \(n^* = Nr = \sum_h^H n_h/k_h\) is the
sample "size" weighted inversely to the "loading factors"; and \(y^* = \sum_h^H \sum_i^{n_h} y_{hi}/k_h\) is the sample total for a variable in which each element is
weighted inversely to the loading factor. If the \(n_h\) elements within
each stratum are selected by simple random sampling, the variance of the
sample mean is estimated by
\[
\operatorname{var}(\bar{y}) = \sum_h^H (1 - f_h) \frac{W_h^2 s_h^2}{n_h} = \frac{1}{n^*} \sum_h^H (1 - f_h) \frac{W_h}{k_h} s_h^2, \tag{1}
\]
where \(s_h^2 = (n_h - 1)^{-1} \sum_i^{n_h} (y_{hi} - \bar{y}_h)^2\) is the estimated variance of elements
within the hth stratum. For a binomial, this can be computed simply
as equal to \(\bar{y}_h (1 - \bar{y}_h)\, n_h/(n_h - 1)\).
The variance of the estimated aggregate, \(y' = N\bar{y} = y^*/r\), can be estimated by
\[
\operatorname{var}(y') = \sum_h^H (1 - f_h) \frac{N_h^2 s_h^2}{n_h} = \frac{N}{r} \sum_h^H (1 - f_h) \frac{W_h}{k_h} s_h^2. \tag{2}
\]
The estimators of the mean and aggregate are unbiased with minimum variance. Also (1) and (2) give unbiased estimators of the population values of the
variances, because \(s_h^2\) is the unbiased estimator of \(S_h^2 = (N_h - 1)^{-1} \sum_i^{N_h} (Y_{hi} - \bar{Y}_h)^2\).
These formulas can be found in sampling textbooks [1, 3, 6].
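As a minimal sketch of estimators (1) and (2), assuming simple random sampling within strata, the following computes the stratified mean, the aggregate, and their variance estimates, and checks that the weighted form y*/n* gives the same mean. The function name, the data, and the values of r and k are hypothetical, chosen only for illustration.

```python
from statistics import mean, variance


def stratified_estimates(samples, N_h):
    """samples: one list of observations per stratum; N_h: stratum population sizes.
    Returns (mean, var of mean per (1), aggregate, var of aggregate per (2))."""
    N = sum(N_h)
    y_bar = var_mean = 0.0
    for y, Nh in zip(samples, N_h):
        nh = len(y)
        Wh, fh = Nh / N, nh / Nh
        y_bar += Wh * mean(y)
        # statistics.variance uses the divisor n_h - 1, matching s_h^2 in the text
        var_mean += (1 - fh) * Wh**2 * variance(y) / nh
    return y_bar, var_mean, N * y_bar, N**2 * var_mean


# Hypothetical example: basic rate r = 0.01 with loadings 1:2:4
r, k = 0.01, [1, 2, 4]
samples = [[1.0, 2.0, 3.0, 4.0],
           [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0],
           [5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 35.0, 40.0]]
N_h = [len(y) / (r * kh) for y, kh in zip(samples, k)]  # chosen so that f_h = r * k_h

y_bar, var_mean, y_total, var_total = stratified_estimates(samples, N_h)

# Equivalent weighted form: each element weighted inversely to its loading factor
y_star = sum(sum(y) / kh for y, kh in zip(samples, k))
n_star = sum(len(y) / kh for y, kh in zip(samples, k))
assert abs(y_bar - y_star / n_star) < 1e-9
```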
Very often estimates are made not only for the entire population covered
by the survey sample, but also for various subclasses thereof. If each stratum
belongs entirely to only one of the subclasses, then the standard formulas
are adequate. But we need new formulas if the subclasses cut across strata,
so that the elements of a stratum belong to more than one subclass.
The variance for subclasses in simple random samples is derived by
Hansen, Hurwitz, and Madow [3, Vol. II, pp. 114-117]; and Yates [6] gives
the results of this section without derivation. More recently, Durbin [2]
gave derivations for stratified samples, and Hartley [4] presented them in
greater detail and generality. But our brief derivation is convenient and
useful both for presenting our empirical results and for the formulation of
the efficiencies of the next section.
Suppose that of the \(N\) elements in the population \(N_c\) are members of the
specified subclass c, \(N_{ch}\) being members among the \(N_h\) elements of the hth
stratum. In the sample of \(n_h\) random elements we find \(n_{ch}\) belonging to the
cth subclass. The fact that \(n_{ch}\) is a random variable gives rise to the
special problems of a subclass. These problems can be handled easily by
defining for the subclass in the sample (and with analogous definitions in
capital letters in the population):
\[
y_{chi} = \begin{cases} y_{hi} & \text{for the } n_{ch} \text{ members of the subclass,} \\ 0 & \text{for the } n_h - n_{ch} \text{ nonmembers of the subclass.} \end{cases}
\]
We shall also use later the special "counting variate":
\[
n_{chi} = \begin{cases} 1 & \text{for the } n_{ch} \text{ members of the subclass,} \\ 0 & \text{for the } n_h - n_{ch} \text{ nonmembers of the subclass.} \end{cases}
\]
Also useful will be \(\bar{n}_{ch} = n_{ch}/n_h\), the sample proportion of subclass members
in the hth stratum, and the stratum mean within the subclass,
\[
\bar{y}_{ch} = \frac{y_{ch}}{n_{ch}} = \frac{1}{n_{ch}} \sum_i^{n_{ch}} y_{chi} = \frac{1}{n_{ch}} \sum_i^{n_h} y_{chi}.
\]
The redefinition of the variable \(y_{chi}\) does not disturb the unbiased nature
of estimators of aggregates such as \(y_c^*\) and \(y'_c\). The variance of the estimator
of the subclass aggregate, \(y'_c = y_c^*/r\), where \(y_c^* = \sum_h^H \sum_i^{n_h} y_{chi}/k_h\), can be computed with
equation (2) by using the entire sample and including \((n_h - n_{ch})\) zero values
for \(y_{chi}\). But we can gain ease and insight by using only the subclass sample.
First, note that because \((n_h - n_{ch})\) values of \(y_{chi}\) and \(y_{chi}^2\) are zero,
\[
(n_h - 1)\, s_h^2 = \sum_i^{n_h} y_{hi}^2 - n_h \bar{y}_h^2
\]
becomes for the variate \(y_{chi}\)
\[
\sum_i^{n_{ch}} y_{chi}^2 - \frac{n_{ch}^2}{n_h}\, \bar{y}_{ch}^2 = (n_{ch} - 1)\, s_{ch}^2 + n_{ch} (1 - \bar{n}_{ch})\, \bar{y}_{ch}^2 .
\]
This definition of \(s_{ch}^2\) for the subclass resembles the definition of \(s_h^2\). Substituting for \(s_h^2\) in (2) gives
\[
\operatorname{var}(y'_c) = \sum_h^H (1 - f_h) \frac{N_h^2}{n_h - 1} \left[ \frac{n_{ch} - 1}{n_h}\, s_{ch}^2 + \bar{n}_{ch} (1 - \bar{n}_{ch})\, \bar{y}_{ch}^2 \right]. \tag{3}
\]
This is the unbiased estimator (as \(s_h^2\) is of \(S_h^2\)) of \(\operatorname{Var}(y'_c)\), whose population
values are given when there are capital letters inside the brackets. By using
\(s_{ch}^2 = \sum_i^{n_{ch}} (y_{chi} - \bar{y}_{ch})^2 / n_{ch}\) we get a simpler expression:
\[
\operatorname{var}(y'_c) = \sum_h^H (1 - f_h) \frac{N_h^2}{n_h - 1} \left[ \bar{n}_{ch} s_{ch}^2 + \bar{n}_{ch} (1 - \bar{n}_{ch})\, \bar{y}_{ch}^2 \right]. \tag{3a}
\]
By using \(\bar{N}_{ch} = N_{ch}/N_h\) above we can compare it to the variance, \(\sum_h^H (1 - f_h)\, N_{ch}^2 S_{ch}^2 / n_{ch}\), of a
sample designed specifically for the subclass and see that
the relative variance, \(S_{ch}^2 / \bar{Y}_{ch}^2\), is increased approximately by \((1 - \bar{N}_{ch})\).
For small subclass proportions this increase approaches 1, and it can have
a great effect.
A useful computation form is
\[
\operatorname{var}(y'_c) = \frac{N}{r} \sum_h^H \frac{W_h}{k_h} (1 - f_h) \frac{n_h}{n_h - 1} \left[ \bar{n}_{ch} s_{ch}^2 + \bar{n}_{ch} (1 - \bar{n}_{ch})\, \bar{y}_{ch}^2 \right]. \tag{3b}
\]
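The step from the zero-filled variate to the bracketed terms of (3a) and (3b) can be checked numerically. The sketch below uses hypothetical values for one stratum; it verifies only the within-stratum identity that the variance of the zero-filled variate (divisor n_h - 1) equals n_h/(n_h - 1) times the bracketed quantity, with the subclass variance taken with divisor n_ch.

```python
from statistics import mean, variance, pvariance

y_sub = [3.0, 5.0, 10.0]   # hypothetical values for the n_ch subclass members
n_h = 8                     # hypothetical stratum sample size
y_c = y_sub + [0.0] * (n_h - len(y_sub))   # zero-filled variate: 0 for nonmembers

n_bar = len(y_sub) / n_h    # sample proportion of subclass members in the stratum

# Left side: s_h^2 computed from all n_h values, zeros included
lhs = variance(y_c)

# Right side: bracket of (3a), with s_ch^2 using divisor n_ch (pvariance),
# multiplied by n_h / (n_h - 1)
bracket = n_bar * pvariance(y_sub) + n_bar * (1 - n_bar) * mean(y_sub) ** 2
rhs = n_h / (n_h - 1) * bracket

assert abs(lhs - rhs) < 1e-9
```

The second term of the bracket is what a sample designed specifically for the subclass would not carry; it grows with the square of the subclass mean.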
The mean \(\bar{Y}_c\) \((= Y_c/N_c)\) for a subclass is usually and conveniently estimated
by the ordinary (but weighted) sample mean
\[
\bar{y}_c = \frac{y'_c}{n'_c} = \frac{y_c^*}{n_c^*} = \frac{\sum_h^H \sum_i^{n_h} y_{chi}/k_h}{\sum_h^H n_{ch}/k_h}. \tag{4}
\]
This is the "combined ratio estimator" in a stratified random sample.
As a ratio of two random variables it has a technical bias which decreases
with increasing sample size. These properties are given in sampling texts,
as is the usual estimator of the variance:
\[
\operatorname{var}(\bar{y}_c) = \frac{1}{n_c'^2} \left[ \operatorname{var}(y'_c) + \bar{y}_c^2 \operatorname{var}(n'_c) - 2 \bar{y}_c \operatorname{cov}(y'_c, n'_c) \right]. \tag{5}
\]
The variance of \(y'_c\) was given above. The variance of \(n'_c\) is the special
case of a binomial variate for (2) in which \(s_h^2 = \bar{n}_{ch} (1 - \bar{n}_{ch})\, n_h/(n_h - 1)\),
and so
\[
\operatorname{var}(n'_c) = \sum_h^H (1 - f_h) \frac{N_h^2}{n_h - 1} \left[ \bar{n}_{ch} (1 - \bar{n}_{ch}) \right]. \tag{6}
\]
Just as we earlier treated \(s_h^2\), we similarly treat the covariance equivalent
\[
(n_h - 1)\, s_{hyn} = \sum_i^{n_h} y_{hi} n_{hi} - n_h \bar{y}_h \bar{n}_h ,
\]
which becomes for the variates \(y_{chi}\) and \(n_{chi}\)
\[
n_{ch} \bar{y}_{ch} - n_{ch} \bar{y}_{ch} \bar{n}_{ch} = n_h \bar{n}_{ch} (1 - \bar{n}_{ch})\, \bar{y}_{ch} .
\]
Then substituting \(s_{hyn}\) for \(s_h^2\) in (2) we get
\[
\operatorname{cov}(y'_c, n'_c) = \sum_h^H (1 - f_h) \frac{N_h^2}{n_h - 1} \left[ \bar{n}_{ch} (1 - \bar{n}_{ch})\, \bar{y}_{ch} \right]. \tag{7}
\]
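Upon substituting the results of (3a), (6), and (7) into (5), we obtain within
each stratum
\[
\bar{n}_{ch} s_{ch}^2 + \bar{n}_{ch} (1 - \bar{n}_{ch})\, \bar{y}_{ch}^2 + \bar{n}_{ch} (1 - \bar{n}_{ch})\, \bar{y}_c^2 - 2 \bar{n}_{ch} (1 - \bar{n}_{ch})\, \bar{y}_c \bar{y}_{ch} = \bar{n}_{ch} s_{ch}^2 + \bar{n}_{ch} (1 - \bar{n}_{ch}) (\bar{y}_{ch} - \bar{y}_c)^2 ,
\]
and these terms, multiplied by the common factors and summed over all
strata, give
\[
\operatorname{var}(\bar{y}_c) = \frac{1}{n_c'^2} \sum_h^H (1 - f_h) \frac{N_h^2}{n_h - 1} \left[ \bar{n}_{ch} s_{ch}^2 + \bar{n}_{ch} (1 - \bar{n}_{ch}) (\bar{y}_{ch} - \bar{y}_c)^2 \right] \tag{8}
\]
and, using \(\bar{n}_c = n'_c/N = n_c^*/n^*\), the proportion of subclass members in the
sample,
\[
\operatorname{var}(\bar{y}_c) = \frac{1}{\bar{n}_c^2} \sum_h^H (1 - f_h) \frac{W_h^2}{n_h - 1} \left\{ \bar{n}_{ch} s_{ch}^2 + \bar{n}_{ch} (1 - \bar{n}_{ch}) (\bar{y}_{ch} - \bar{y}_c)^2 \right\} \tag{8a}
\]
so that
\[
\operatorname{var}(\bar{y}_c) = \frac{1}{\bar{n}_c^2 n^*} \sum_h^H \frac{W_h}{k_h} (1 - f_h) \frac{n_h}{n_h - 1} \left\{ \bar{n}_{ch} s_{ch}^2 + \bar{n}_{ch} (1 - \bar{n}_{ch}) (\bar{y}_{ch} - \bar{y}_c)^2 \right\}. \tag{8b}
\]
We can compare this with a sample designed specifically for the subclass and
find that the variance per element is increased by approximately \((1 - \bar{n}_{ch}) (\bar{y}_{ch} - \bar{y}_c)^2\).
Thus for small subclasses (\(\bar{n}_{ch}\) small) the effect on the variance of
the mean is to wipe out the gains, \((\bar{y}_{ch} - \bar{y}_c)^2\), that proportionate stratification would yield. But the additional gains from optimum allocation will
tend to persist.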
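The following sketch computes the weighted subclass mean (4) and its variance in the form (8), and confirms on hypothetical data that the same value results from assembling (3a), (6) and (7) through (5). The function names and the data are illustrative assumptions; the subclass variance within each stratum is taken with divisor n_ch, as in (3a).

```python
from statistics import mean, pvariance


def subclass_mean_and_variance(strata):
    """strata: list of (N_h, n_h, y_sub), y_sub holding the subclass members' values.
    Returns the weighted subclass mean (4) and its variance estimate per (8)."""
    n_c = sum(N / n * len(y) for N, n, y in strata)   # estimated subclass size
    y_c = sum(N / n * sum(y) for N, n, y in strata)   # estimated subclass aggregate
    ybar_c = y_c / n_c
    total = 0.0
    for N, n, y in strata:
        p = len(y) / n                                # proportion of members in stratum sample
        per_element = p * pvariance(y) + p * (1 - p) * (mean(y) - ybar_c) ** 2
        total += (1 - n / N) * N**2 / (n - 1) * per_element
    return ybar_c, total / n_c**2


def variance_via_equation_5(strata):
    """Same variance assembled from (3a), (6) and (7) through (5)."""
    n_c = sum(N / n * len(y) for N, n, y in strata)
    ybar_c = sum(N / n * sum(y) for N, n, y in strata) / n_c
    var_y = var_n = cov = 0.0
    for N, n, y in strata:
        p, m = len(y) / n, mean(y)
        common = (1 - n / N) * N**2 / (n - 1)
        var_y += common * (p * pvariance(y) + p * (1 - p) * m**2)
        var_n += common * p * (1 - p)
        cov += common * p * (1 - p) * m
    return (var_y + ybar_c**2 * var_n - 2 * ybar_c * cov) / n_c**2


# Hypothetical data: (N_h, n_h, values of subclass members found in the stratum sample)
strata = [(400, 8, [3.0, 5.0]),
          (400, 16, [4.0, 9.0, 11.0, 6.0]),
          (200, 16, [20.0, 35.0, 15.0, 40.0, 10.0])]

ybar_c, var_8 = subclass_mean_and_variance(strata)
var_5 = variance_via_equation_5(strata)
assert abs(var_8 - var_5) <= 1e-9 * abs(var_8)
```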
We should add here that the variance of the difference of two subclasses
(c and d) involves the two variances and their covariance. This introduces into
the term in braces the quantity \(\bar{n}_{ch} \bar{n}_{dh} (\bar{y}_{ch} - \bar{y}_c)(\bar{y}_{dh} - \bar{y}_d)\) for the mean and
\(\bar{n}_{ch} \bar{n}_{dh} \bar{y}_{ch} \bar{y}_{dh}\) for the total [see 6, p. 301]. These will be comparatively small
for the more usual subclasses, where \(\bar{n}_{ch}\) is small compared to \((1 - \bar{n}_{ch})\).
9. THE ECONOMICS OF DIFFERENT ALLOCATIONS
Any of the four variances (of means or aggregates, for the entire sample
or for a subclass) may be expressed in a convenient common form:
\[
\operatorname{var}(u) = A \sum_h^H (1 - f_h) \frac{W_h}{k_h} B_h^2 , \tag{9}
\]
where \(B_h^2\) is the within-stratum variance per element. This generalization
of the variance permits the easy extension of allocation formulas from the
entire sample to subclasses. The estimates \(b_h^2\) of \(B_h^2\) are \(s_h^2\) in (1) and (2) for
the entire sample and are the quantities in brackets in (3) for the aggregate
and in (8a) for the mean of a subclass. In estimating aggregates, \(A = N/r\);
in estimating means, \(A = 1/Nr = 1/n^*\) for the entire sample and \(A = N/n_c'^2 r = 1/\bar{n}_c^2 n^*\)
for subclasses.
In comparing two sets of allocations, a and b, let \(r_a\) and \(r_b\) and \(k_{ah}\) and \(k_{bh}\)
denote, respectively, the basic sampling rates and the sets of loading factors.
That is, \(f_{ah} = r_a k_{ah}\) and \(f_{bh} = r_b k_{bh}\).
We may obtain, for any of the four
types of estimators, the ratio of the variances of the two allocations as
\[
\frac{\text{variance}(a)}{\text{variance}(b)} = \frac{r_b}{r_a} \cdot \frac{\sum_h^H (1 - f_{ah})\, W_h B_h^2 / k_{ah}}{\sum_h^H (1 - f_{bh})\, W_h B_h^2 / k_{bh}} . \tag{10}
\]
Denoting by \(C_h\) the cost per element in the hth stratum, we can compare
the variances of the two sample designs subject to the condition of fixed
\(\sum_h^H C_h n_h = rN \sum_h^H C_h W_h k_h\). The inverse of the ratio of variances for fixed
cost, which we can call "relative economy" of allocation b to allocation a,
becomes
\[
\frac{\text{variance}(a)}{\text{variance}(b)} = \frac{\sum_h^H C_h W_h k_{ah}}{\sum_h^H C_h W_h k_{bh}} \cdot \frac{\sum_h^H (1 - f_{ah})\, W_h B_h^2 / k_{ah}}{\sum_h^H (1 - f_{bh})\, W_h B_h^2 / k_{bh}} . \tag{11}
\]
If we neglect differences (often small) among the \(C_h\) and compare variances
for fixed \(\sum n_h\), then (11) without the factors \(C_h\) measures the "relative
efficiency" of allocation b to a.
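A minimal sketch of the relative economy (11), neglecting the factors (1 - f_h), may help in reading the efficiency columns of the tables. The function name is an illustrative assumption. The B_h used below are the "pure" standard deviations for Mean Liquid Assets from Table VI; the stratum weights W_h are NOT given in the text and are assumed here for illustration only (roughly the sample shares .09, .53, .38 of Section 6(c) divided by the loadings 1, 2, 4 and renormalized).

```python
def relative_economy(W, B, k_a, k_b, C=None):
    """Variance(a) / variance(b) at fixed total cost, per (11), with (1 - f_h) neglected.
    With C omitted (equal costs) this is the relative efficiency of b to a."""
    C = C if C is not None else [1.0] * len(W)
    cost_a = sum(c * w * k for c, w, k in zip(C, W, k_a))
    cost_b = sum(c * w * k for c, w, k in zip(C, W, k_b))
    var_a = sum(w * b**2 / k for w, b, k in zip(W, B, k_a))
    var_b = sum(w * b**2 / k for w, b, k in zip(W, B, k_b))
    return (cost_a / cost_b) * (var_a / var_b)


W = [0.20, 0.59, 0.21]            # assumed stratum weights (illustrative only)
B = [1716.0, 3429.0, 10065.0]     # Table VI, "pure" 68 per cent, Mean Liquid Assets

proportionate = [1, 1, 1]
eff_124 = relative_economy(W, B, proportionate, [1, 2, 4])
eff_1153 = relative_economy(W, B, proportionate, [1, 1.5, 3])
print(round(eff_124, 2), round(eff_1153, 2))
```

Values above 1 mean that the loaded allocation gives a smaller variance than proportionate sampling for the same number of interviews; because the weights are assumed, the printed figures should not be expected to match the tables exactly.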
The "optimum allocation" of the \(n_h\), giving minimum variance for fixed
\(\sum_h^H C_h n_h\), is obtained in sampling texts with Lagrange multipliers for the
entire sample. We extend this to subclasses by using the \(B_h\) in place of the
\(S_h\) and obtain the optimum
\[
\frac{n_h}{\sum_h^H n_h} = \frac{N_h B_h / \sqrt{C_h}}{\sum_h^H N_h B_h / \sqrt{C_h}} . \tag{12}
\]
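Since the optimum makes n_h proportional to N_h B_h / sqrt(C_h), the sampling rates n_h / N_h, and hence the loading factors, are proportional to B_h / sqrt(C_h). The sketch below (function name assumed for illustration) applies this with equal costs to the "pure" standard deviations of Table VI; the resulting factors agree with the k_o2 and k_o3 reported for the same three items in Table V.

```python
from math import sqrt


def optimum_loading_factors(B, C=None):
    """Loading factors proportional to B_h / sqrt(C_h), scaled so that k_1 = 1."""
    C = C if C is not None else [1.0] * len(B)
    rates = [b / sqrt(c) for b, c in zip(B, C)]
    return [x / rates[0] for x in rates]


# Standard deviations for strata L, M, H from Table VI ("pure" 68 per cent)
table_vi_pure = {
    "Mean Disposable Income": [1725, 2794, 8056],
    "Mean Liquid Assets": [1716, 3429, 10065],
    "Mean Personal Debt": [543, 833, 2693],
}

for item, s in table_vi_pure.items():
    k = optimum_loading_factors(s)
    print(item, round(k[1], 2), round(k[2], 2))
# Mean Disposable Income 1.62 4.67
# Mean Liquid Assets 2.0 5.87
# Mean Personal Debt 1.53 4.96
```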
In our data, as is often the case, the \(f_h\) are so small that the factor \((1 - f_h)\)
can be neglected. The \(C_h\) can also be dropped because the differences among
them are slight. In computing the allocations and "efficiencies" for this
study we used these two simplifying conditions.
For many binomial items (proportions) in the survey, the \(B_h\) are also
relatively constant. In these cases the optimum is proportionate allocation,
and compared to it the relative economy of allocation b can be shown to be
simply \(\left[ \left( \sum W_h k_{bh} \right) \left( \sum W_h / k_{bh} \right) \right]^{-1}\). This is always less than one. Hence,
for \(B_h\) and \(C_h\) constant, all departures from proportionate sampling involve
some loss. This loss is shown in our results for many items of "type 1."
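The size of this loss is easy to evaluate for any set of loadings. The sketch below uses assumed stratum weights (not given in the text; roughly the sample shares of Section 6(c) divided by the loadings and renormalized), so the results are illustrative only. The function name is likewise an assumption.

```python
def economy_constant_B(W, k):
    """Relative economy of loadings k against proportionate sampling
    when B_h and C_h are constant: 1 / [(sum W_h k_h)(sum W_h / k_h)]."""
    return 1.0 / (sum(w * kh for w, kh in zip(W, k)) * sum(w / kh for w, kh in zip(W, k)))


W = [0.20, 0.59, 0.21]   # assumed stratum weights (illustrative only)

loss_124 = 1.0 - economy_constant_B(W, [1, 2, 4])
loss_1153 = 1.0 - economy_constant_B(W, [1, 1.5, 3])
print(round(loss_124, 2), round(loss_1153, 2))
```

Under these assumed weights the 1:2:4 loading loses somewhat under a fifth of the efficiency of proportionate sampling and the 1:1.5:3 loading noticeably less, which is in line with the losses described in the conclusions. Equal loadings give an economy of exactly one.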
Survey Research Center, University of Michigan
REFERENCES
[1] COCHRAN, W. G.: Sampling Techniques, New York: John Wiley and Sons, 1953.
[2] DURBIN, J.: "Sampling Theory for Estimates Based on Fewer Individuals Than
the Number Selected," Bulletin of the International Statistical Institute, Vol. 36,
Part 3, pp. 113-119, 1956.
[3] HANSEN, M. H., W. N. HURWITZ, AND W. G. MADOW: Sample Survey Methods and
Theory, New York: John Wiley and Sons, 1953, Vols. I and II.
[4] HARTLEY, H. O.: "Analytic Studies of Survey Data," contribution to a volume
in honor of Corrado Gini, University of Rome: Istituto di Statistica. (In
preparation.)
[5] KATONA, G., L. KISH, J. B. LANSING, AND J. K. DENT: "Methods of the Survey of
Consumer Finances," Federal Reserve Bulletin, 36 (1950), pp. 795-809.
[6] YATES, FRANK: Sampling Methods for Censuses and Surveys, London: Charles
Griffin and Company, 2nd edition, 1953.