EFFICIENT ALLOCATION OF A MULTI-PURPOSE SAMPLE¹

BY LESLIE KISH

Source: Econometrica, Vol. 29, No. 3 (July 1961), pp. 363-385. Published by: The Econometric Society. Stable URL: http://www.jstor.org/stable/1909637

SUMMARY

WHEN LISTING addresses within sample blocks for surveys the field interviewer hastily assigns economic ratings of L, M, and H (for low, medium, and high) to dwellings. The means and the standard deviations differ greatly among these strata with regard to socioeconomic characteristics; hence they may be used for allocating different sampling rates to decrease the variance of sample means and totals. Disproportionate sampling rates bring gains in precision for the means of skewed financial items and for estimates based on higher economic subclasses, but they bring losses in estimating most proportions. The many diverse purposes of economic surveys lead to conflicting allocations. To facilitate rational decisions among them, we developed condensed ways of analyzing and presenting data.
The tables display for many variables the relative precision of several allocation schemes, including the optimum one. Standard statistics are extended to provide estimators for subclasses from stratified samples. Then these are used to investigate optimal and other allocations for the subclasses.

¹ This paper was read at the Annual Meeting of the American Statistical Association in September, 1954 in Montreal. The final revision was supported by research grant G-7571 from the National Science Foundation.

1. THE PROCEDURES OF RATING AND SELECTION

The process of obtaining the socioeconomic ratings was simple. The field procedures called for listing the dwellings located within selected sample blocks or segments. At the same time, the interviewer also wrote one of the letters L, M, or H next to each of the dwelling addresses on the listing sheet, indicating the interviewer's guess of low, medium, or high income for its occupants. The instructions asked the interviewers to rate the dwelling M if probably occupied by a family whose income was between $3000 and $6000 per year (or $60 to $120 per week), L if lower, and H if higher. Later we raised these limits to $5000 and $10,000 per year ($100 to $200 per week) and encouraged the use of MH and ML for doubtful cases.

This addition to the duties of listing was easy and inexpensive. We emphasized that we wanted only hasty personal judgments, that we expected frequent deviations from the normal and "typical," and that despite this the results were statistically useful.

In the survey of 1954, the source of our empirical results, the sampling rates were 1, 2, and 4 in 20,000, respectively, for the three strata. These numbers (1, 2, and 4) were the "loading factors" of strata 1, 2, and 3, respectively.
The office procedure consisted of selecting a sample of 4 in 20,000 and then eliminating from the sample half of the dwellings rated M and three-fourths of the dwellings rated L. The proportions in the three strata were \(W_1 = .38\), \(W_2 = .53\), and \(W_3 = .09\). In the analysis, the responses were weighted (on machine cards) in inverse proportion to the loading factors.

We investigated the possible uses of these strata for increasing efficiency. The "efficiencies" of different allocations in the three strata give estimates of the relative numbers of interviews needed for the same precision or variance (approximately; see Section 6). Like most "eye-estimates," these ratings are mere guesses and far from perfect, but also far from trivial.

Here we should compare briefly this procedure for stratification with some other possible sources of information. On the basis of modest investigations of some alternatives (to which we should welcome additions) we think that:
(a) The elaboration of the exterior rating would not be worthwhile.
(b) If the costs and problems of double sampling by means of a brief interview inside the dwelling could be met, the variance per interview could be reduced considerably.
(c) Stratification by make of car is not nearly so good as by dwelling, because too many rich people have cheap cars and vice versa.
(d) There are no national lists that are both convenient and inclusive enough for this purpose.
(e) Only a small proportion (under 10 per cent) of high income people live in identifiable high income areas, such as Census tracts.

Columns 3, 4, and 5 of Table I present mean values of the three strata based on L, M, and H ratings, numbered as strata 1, 2, and 3. Large differences among the strata appear for mean incomes and mean liquid assets and for many other items, e.g., for the proportion with yearly disposable incomes under $1000, the proportion with college degrees, the proportion in professional occupations, and the proportion of nonwhites.
For example, the proportion of units having incomes of $10,000 or more is about 1, 2, and 24 per cent in strata 1, 2, and 3, respectively. Even larger differences appear in subsequent tables dealing with characteristics of selected subclasses. These large differences show that ratings do indeed distinguish three rather distinct groups. (The "pure" ratings actually had even greater differences, as explained in Section 6(e).)²

² The punctilious reader should be warned that he cannot check against 100 per cent the percentages in columns 3, 4, and 5. In Groups C', E, and F several intermediate subclasses (mostly of type 1) have been omitted to save space without sacrificing important results. From Group D the "not ascertained" category is missing. The computation of medians involved proportions very near but not necessarily precisely at the median.

2. DEVICES FOR MULTI-PURPOSE DESIGN

The differences among the strata means would be important for comparing proportionate stratified sampling with unstratified sampling, since the gains of the former depend on the variances of means among strata. But here we are more interested in comparing the efficiencies of different allocations in stratified sampling, and these depend on the relative values of the standard deviations within the strata. Optimum allocation occurs when the sampling rates, hence the "loading factors," are made proportionate to the standard deviations (\(S_h\)) within the strata. Assuming a basic sampling rate of \(r\) for the first stratum, optimum allocation is reached when the sampling rates in the other strata are \(r k_{o2}\) and \(r k_{o3}\), where \(k_{o2} = S_2/S_1\) and \(k_{o3} = S_3/S_1\). Actually, we used estimates \(s_1\), \(s_2\), and \(s_3\) (as defined in Section 8).
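The allocation rule just stated can be checked with a short derivation. This is only a sketch under assumptions that this section does not spell out: simple random sampling within strata, a fixed total number of interviews \(n\), equal cost per interview in every stratum, and negligible finite population corrections. Here \(W_h\) is the share of stratum \(h\) in the population and \(k_h\) its loading factor, with \(k_1 = 1\).

\[
n_h = n\,\frac{W_h k_h}{\sum_j W_j k_j},
\qquad
\operatorname{var}(\bar{y}) = \sum_h \frac{W_h^2 S_h^2}{n_h}
= \frac{1}{n}\Big(\sum_h \frac{W_h S_h^2}{k_h}\Big)\Big(\sum_h W_h k_h\Big).
\]

By the Cauchy-Schwarz inequality the product of the two sums is at least \(\big(\sum_h W_h S_h\big)^2\), with equality exactly when \(k_h\) is proportional to \(S_h\). Setting every \(k_h = 1\) gives the proportionate sample, so under these assumptions the efficiency of a loading relative to proportionate sampling, and its value at the optimum, would be

\[
E(k) = \frac{\sum_h W_h S_h^2}{\Big(\sum_h W_h S_h^2 / k_h\Big)\Big(\sum_h W_h k_h\Big)},
\qquad
E_{\text{opt}} = \frac{\sum_h W_h S_h^2}{\Big(\sum_h W_h S_h\Big)^2}.
\]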
The "optimum loading factors" \(k_{o2}\) and \(k_{o3}\) (Table I, columns 6 and 7) are the ratios of the standard deviations within the second and third strata, \(s_2\) and \(s_3\), to the standard deviation, \(s_1\), within the first stratum. For example, for mean liquid assets the standard deviations within the three strata were $1788, $4063, and $8927, and these lead to the ratios 2.27 and 4.99. Important differences among standard deviations are indicated by values of \(k_{o2}\) and \(k_{o3}\) which deviate greatly from unity. Generally, these important differences are found for those characteristics for which the means also vary widely.

We investigated the efficiencies of various allocations in terms of the number of interviews necessary for fixed variance. In columns 9-10 of Table I the "efficiencies" of two different allocations (1:1.5:3 and 1:2:4) are given relative to the efficiency of a proportionate stratified sample, which appears as the unit basis of comparison in column 8. The headings on columns 8-10 denote the loading factors, \(k_2\) and \(k_3\), for strata 2 and 3 relative to the sampling rate in the first stratum. Column 11 gives the efficiency of the optimum allocation relative to that of a proportionate stratified sample. The optimum is reached when the "loading factors" are taken in proportion to the standard deviations, that is, when \(k_2 = k_{o2}\) and \(k_3 = k_{o3}\).

For an illustration, consider mean liquid assets. The optimum is reached when the loading factors are in the ratio of 1:2.27:4.99 (columns 6 and 7). The efficiency of the optimum loading is 1.30. That is, with proportionate sampling it would take 30 per cent more interviews than with the optimum design to obtain a fixed variance. The efficiency of the 1:1.5:3 loading is 1.22. Hence, the advantage of that loading is equivalent to taking 22 per cent more interviews with proportionate loading. The 1:2:4 loading at 1.29 obtains almost the same gains as the optimum.

Consider median liquid assets as another example.
The variance of the median is computed as a function of the variance of the proportion that falls below the median [3, Vol. I, p. 448]. These proportions are .70, .48, and .26 in the three strata. The standard deviation of a proportion is \(\sqrt{PQ}\), and this is \(\sqrt{.70 \times .30} = .46\) for \(s_1\) in the first stratum. Then \(s_2 = \sqrt{.48 \times .52} = .50\) and \(s_3 = \sqrt{.26 \times .74} = .44\) result in the ratios (to .46) of 1.09 and 0.96 in columns 6 and 7. The optimum loading factors are 1:1.09:0.96. Thus, the optimum loading is close to proportionate sampling. The moderate 1:1.5:3 loading is 91 per cent efficient while the 1:2:4 loading is at 85 per cent efficiency. That is, a proportionate sample of 850 interviews will give the same precision as 1000 at the 1:2:4 loading.

What we just noted about liquid assets is also true for income, for personal debt, and for other financial items: while estimating the means often calls for a disproportionate allocation (here about 1:2:4), the estimation of medians calls for proportionate allocation (1:1:1). This situation, due to the relatively small effect of differences among the strata (in the proportions below the median), holds for most items.

Each of the lines in Table I presents a different characteristic in the sample. Together, they represent a larger number of calculations and the still greater number of results produced by the survey. Each statistic produces a different optimum and different efficiencies for any particular allocation scheme. How is one to summarize this large variety of conflicting data? Furthermore, this variety arises along two dimensions, because each of these many characteristics is presented not only for the entire sample, but also for many different subclasses.
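The median liquid assets example can be retraced numerically. The sketch below is not the paper's own computing procedure: the function name `relative_efficiency` is hypothetical, and the formula assumes simple random sampling within strata, equal cost per interview, and no finite population correction. The stratum shares are the \(W_h\) reported in Section 1.

```python
from math import sqrt

# Stratum shares W1, W2, W3 as reported in Section 1.
W = [0.38, 0.53, 0.09]


def relative_efficiency(sds, loadings, weights=W):
    """Efficiency of sampling rates proportional to `loadings`, relative to a
    proportionate sample with the same total number of interviews.

    Assumption of this sketch: var is proportional to
    (sum W*S^2/k) * (sum W*k), with the finite population correction ignored.
    """
    proportionate = sum(w * s ** 2 for w, s in zip(weights, sds))
    variance_part = sum(w * s ** 2 / k for w, s, k in zip(weights, sds, loadings))
    size_part = sum(w * k for w, k in zip(weights, loadings))
    return proportionate / (variance_part * size_part)


# Proportions falling below the median of liquid assets in strata 1, 2, 3.
p_below = [0.70, 0.48, 0.26]

# Standard deviation of a proportion is sqrt(P * Q) in each stratum.
sds = [sqrt(p * (1 - p)) for p in p_below]

# Optimum loading factors: ratios of the standard deviations to stratum 1.
k_opt = [s / sds[0] for s in sds]

eff_moderate = relative_efficiency(sds, [1, 1.5, 3])
eff_heavy = relative_efficiency(sds, [1, 2, 4])
eff_opt = relative_efficiency(sds, k_opt)

print("optimum loading factors:", [round(k, 2) for k in k_opt])
print("1:1.5:3 efficiency:", round(eff_moderate, 2))
print("1:2:4 efficiency:", round(eff_heavy, 2))
print("optimum efficiency:", round(eff_opt, 2))
# Interviews under proportionate sampling matching 1000 at the 1:2:4 loading.
print("equivalent proportionate interviews:", round(1000 * eff_heavy))
```

Under these assumptions the rounded results agree with the figures quoted above for the median: factors of 1.09 and 0.96, efficiencies of .91 and .85, and roughly 850 proportionate interviews for every 1000 at the 1:2:4 loading.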
For example, the mean income and the proportion of families buying cars are two of the characteristics, and these are presented for several occupation classes, several education classes, several income classes, etc.

The broad and divergent aims of these "Surveys of Consumer Finances" may be seen in the issues of the Federal Reserve Bulletins where the results appeared annually from 1946 through 1959. The July, 1950 issue describes the methods used in these surveys. Our computations were based on the 1954 survey of about 3000 interviews. The purposes of these surveys were many and subject to frequent change. To provide empirical bases for decisions on the sample design was the aim of this research into sampling efficiencies. Our methods and results can aid in the design of other economic surveys which are also multipurpose in character.

Our analysis and discussion must transcend the standard methods of optimum allocation in order to permit the treatment of a large variety of different variables. To do this we resorted to several devices for unifying, simplifying, and summarizing the presentation of the data. We found enough regularity among types of items and subclasses to improve considerably our guesses of design parameters by extrapolating from the tabled results to other similar items. The reader too may find some of these data useful for his designs.

Asterisks call attention to loading schemes which are within five percentage points of the efficiency of the optimum. Since the efficiencies are not sensitive to moderate departures from the optimum, frequently two allocation schemes possess efficiencies within five per cent of the optimum.

FIGURE 1. Optimum "loading factors": \(k_{o2}\) (horizontal axis, 0 to 3.5) plotted against \(k_{o3}\) (vertical axis, 0 to 10). (The function \(k_{o3} = k_{o2}^2\) is also shown.)
To simplify the process of organizing the data, we have sorted the variables into seven "types" such that the relative efficiencies for each scheme of loading are roughly the same for all variables designated by a type. These efficiencies depend on the relative values of the standard deviations within the three strata, hence on the values of \(k_{o2}\) and \(k_{o3}\). We begin by designating as type 1 the estimates for which \(k_{o2} = k_{o3} = 1\) approximately, and for which the optimum allocation is therefore the proportionate one. Type 4 represents items for which \(k_{o2} = 2\) and \(k_{o3} = 4\), approximately. The optimum allocation for this type is 1:2:4, which represents an efficiency around 1.23 over proportionate sampling. Types 2 and 3 are intermediate between 1 and 4, while for type 5 the optimum allocation is more extreme than that of 1:2:4. In contrast, 0 and 00 represent types for which \(k_{o2}\) and \(k_{o3}\) are both less than 1 and optimum allocation occurs with less than proportionate sampling for the higher strata.

Criteria for these types were derived from efficiencies (relative to proportionate) of the 1:2:4 allocation (column 10) and the optimum allocation (column 11). Illustrations in the following sections will clarify them. Their usefulness depends on some regularity in the relationships among the standard deviations within the three strata, hence among the optimal loading factors \(k_{o2}\) and \(k_{o3}\). Actually, these appear to be in geometrical ratio to each other, roughly but generally. That is, if \(S_{2i}/S_{1i} = k_{o2i}\) for the ith characteristic, then roughly \(S_{3i}/S_{2i} = k_{o3i}/k_{o2i} = k_{o2i}\) also. Hence, \(k_{o3i} = k_{o2i}^2\), and the distribution of points against this line can be seen on Figure 1, as well as in the \(k_{o2}\) and \(k_{o3}\) columns of Tables I-IV. This relationship appears for means and for aggregates, in the entire sample and in subclasses.
It seems to persist for values of \(k_{o2i}\) both greater and less than one. The relationship is rough but still useful, because moderate departures from optimal allocations lose little efficiency. This empirical relationship of \(c : c k_{o2} : c k_{o2}^2\) among the standard deviations must depend on the particular rating system used. But for other rating systems I would still guess a relationship of \(c : c k_{o2} : c (k_{o2})^{g_3} : c (k_{o2})^{g_4} : c (k_{o2})^{g_5}\) with the g's constant over variables and \(1 < g_3 < g_4 < g_5\), etc.

3. ESTIMATES BASED ON THE ENTIRE SAMPLE

Table I presents computations of efficiencies based on the entire sample for seven groups of means. They are equally valid for the corresponding aggregates. In this table all the means are proportions except those in Group A, which are the means of three dollar distributions, each rather skewed and concentrated in the higher economic strata. The optima represent gains of 20 per cent to 30 per cent over proportionate sampling, and they are near the loading scheme of 1:2:4. These statistics belong to type 4.

In Group B we examine the efficiencies for estimating the corresponding medians. These calculations actually deal with the proportions which fall below the median value. The medians possess efficiencies different from the means and similar to the proportions of Group G.

The attitudinal items of Group G represent many more similar computations. Most of these, and the large majority of all survey items, including

TABLE I
RELATIVE EFFICIENCIES OF SIX DIFFERENT ALLOCATIONS FOR CHARACTERISTICS OF ENTIRE POPULATION

Type Characteristic (1) $25,000 and over D. Social Classes Education: Grammar School High School College, no degree College, degree Occupation: Profess. & Semi-Pro. Self Employed Managers Clerical and Sales Skilled and Semi-Sk.
Unskilled and Service Miscellaneous Farmers Unemployed Retired Color: Nonwhite Efficiency related to proportionate sample for different "loadings": (I: ks: ks) loading factor 1:2:4 Opt. (10) (11) 1 2 3 Ass Ass 1:1:1 1:1.5:3 (3) (4) (5) (6) (7) (8) (9) 2882 4085 6904 745 1922 4819 561 434 303 Proportions Below the Median 1 .70 .44 .32 1 .70 .26 .48 .48 .49 1 .66 1.42 2.27 2.09 3.62 4.99 4.16 1 1 1 1.09 1.09 .98 1.02 .96 .87 1* 1* 1* 0 0 0 0 1 2 3 4 .15 .18 .17 .20 .15 .12 .022 .009 .07 .11 .13 .14 .17 .26 .068 .052 .03 .07 .07 .14 .11 .22 .12 .24 .72 .81 .88 .85 1.06 1.38 1.72 2.29 .52 .64 .68 .85 .86 1.29 2.19 4.45 1* 1* 1* 1* 1* 1 1 0 3 3 .44 .02 .01 .17 .06 .03 .07 .13 .09 .75 1.64 1.82 .52 2.31 2.95 1* 1 1 4 .002 .006 .04 1.97 4.91 1 .73 .83 1.06* 1.05 * 1.12* 1.12* 1.31 * 1.29 0 1 2 3 .50 .37 .044 .025 .30 .48 .11 .086 .18 .34 .19 .27 .92 1.04 1.53 1.78 .77 .98 1.91 2.82 1* 1* 1 1 .79 1.00 .88 .84 1.00 .91 1.02* 1.00* 1.05 1.11* 1.10* 1.12 3 2 3 2 1 0 1 1 00 1 .030 .046 .019 .060 .355 .176 .076 .076 .066 .080 .083 .087 .051 .143 .308 .075 .060 .096 .022 .064 .192 .159 .129 .169 .132 .047 .057 .017 .027 .065 1.62 1.35 1.62 1.48 .96 .69 .90 1.11 .58 .90 2.31 1.75 2.46 1.58 .71 .55 .88 .48 .65 .91 1 1* 1 1* 1* 1* 1* 1* 1 1* 1.06* 1.05* 1.08 1.01 * .96 1.03 1.08* 1.05* 1.09 .99* .97 1.03 .80 1.01 .88 .72 1.04 .82 .80 1.00 .88 .82 1.03 .89 .69 1.07 .81 .80 1.00 .89 00 .21 .05 .02 .55 .36 1 (2) A. Means of Dollar Values Mean Disposable Income Mean Liquid Assets Mean Personal Debt B. Median of Dollar Values Median Total Income Median Liquid Assets Median Personal Debt C. Income Classes under $1000 $1000-$1999 $2000-$2999 $3000-$3999 $4000-$4999 $5000-$7499 $7500-$9999 $10,000 and over C'. Liquid Assets Classes None $5000-$9999 $10,000-$24,999 Mean Mean or orrooOptimum. 
proportion in stratum Means in Dollars 4 4 4 )1* 1.22* 1.21 * 1.23 1.22 1.29* 1.30 1.20* 1.23* 1.23 .92 .91 .89 .85 .85 .82 1.00 1.00 1.00 .72 1.04 .82 .75 1.02 .85 .78 1.01 .86 .78 1.00 .87 .83 1.00 .90 .93 1.02 .97 1.05* 1.04* 1.07 1.21 * 1.24* 1.24 This content downloaded from 193.0.118.39 on Thu, 5 Jun 2014 16:28:20 PM All use subject to JSTOR Terms and Conditions .78 .66 1.03 1.08 1.13 1.32 1.11 370 LESLIE KISH TABLE Characteristic I, CONTINUED Mean or proportion Meanstraproprtumn in stratum Type Efficiency related to proportionate sample for different "loadings": (1 k, : k3) Optimum loading factor 1 2 3 kO2 k03 (3) (4) (5) (6) (7) (8) 1 .59 1 1 .046 1 .087 2 .014 .55 .050 .079 .029 .56 .044 .080 .047 1.01 1.04 .96 1.43 1.01 .98 .97 1.79 1* 1* 1* 1* 1 0 1 2 .68 .082 .028 .025 .59 .066 .040 .042 .58 .028 .049 .067 1.06 .90 1.18 1.28 1.06 .61 1.29 1.59 1 2 1 .22 .06 .15 .24 .10 .14 .22 .15 .067 1.03 1.24 .96 (4) yes 1 .12 .16 .21 (5) yes 1 .035 .050 (6) for cash (7) installment 2 .041 0 (1) 1:2:4 Opt. (10) (1 1) .91 .91 .90 1.01* .84 .84 .82 .98 1.00 1.00 1.00 1.04 1* 1* 1* 1* .92 .86 .95 .99* .85 .78 .90 .94 1.00 1.01 1.01 1.02 1.00 1.47 .69 1* 1* 1* .91 .97* .88 .84 .92 .80 1.00 1.02 1.01 1.12 1.26 1* .94 .88 1.01 .050 1.20 1.19 1* .94 .89 1.01 .075 .095 1.33 1.48 1* .98* .94 1.02 .091 .075 .042 .91 .70 1* .87 .79 1.01 1 1 1 1 1 1 .32 .33 .33 .32 .091 .45 .38 .31 .29 .36 .12 .37 .43 .30 .25 .46 .11 .29 1.04 .98 .97 1.03 1.12 .97 1.07 .98 .91 1.07 1.10 .91 1* 1* 1* 1* 1* 1* .92 .90 .89 .92 .93 .90 .85 .83 .82 .85 .87 .82 1.00 1.00 1.00 1.00 1.00 1.00 1 1 1 1 1.00 1.00 1* .91 .83 1.00 (2) E. Buying in Dollar Classes Bought (1) None selected (2) 1-99 durables(3) 200-299 in (4) 1000 + classes House (5) None ad(6) 1-49 ditions (7) 200-299 and (8) 1000 + repairs F. Buying Categories Bought (1) yes Car? (2) new (3) used 1:1 :1 1:1.5:3 (9) Expected to buy car? Bought house? Bought T.V.? G. 
Attitudes: Better Off Last Year: (1) yes, (2) same, (3) no. Good Time To Buy: (4) good, (5) pro-con, (6) poor. The "Standard": assumed \(s_1 = s_2 = s_3\).

The basis of the computations is \(k_{o2} = s_2/s_1\) and \(k_{o3} = s_3/s_1\); these are the "optimum loading factors" of columns 6 and 7. The "loading factors" \(k_2\) and \(k_3\) are the ratios of the sampling rates in strata 2 and 3 to the sampling rate in stratum 1. The efficiencies of five disproportionate allocations are given as ratios of the efficiency of proportionate sampling. "Efficiency" denotes the precision (inverse of the variance) per sample size.
* Denotes allocations which are within five percentage points of the optimum.

the computations of medians, have this in common: the proportions for the three strata lie between 15 and 85 per cent. This still permits a great deal of variation among the strata, variation which may be of statistical and economic significance. Nevertheless, the standard deviations \(\sqrt{PQ}\) lie between .36 and .50, and \(k_{o2}\) and \(k_{o3}\) are usually well within .80 and 1.20. These items can be well approximated by the "standard" of type 1 for which \(s_1 = s_2 = s_3\); hence \(k_{o2} = k_{o3} = 1\). Thus, for this majority of items, the optimum is near 1:1:1. The slight loading of 1:1:2 is at 96 per cent efficiency; the moderate loading of 1:1.5:3 is at 91 per cent, but a heavy loading of 1:2:4 is down to 83 per cent efficiency.

In Groups E and F we have some categories relating to buying behavior and buying intentions. These are also mostly of type 1 with optima at the proportionate allocation, but some categories, such as new car buying and large purchases, are of type 2. For these the optimum would be reached with some moderate loading; but the efficiency of the optimum is always within 1.05 (i.e., between 1 and 1.05) relative to proportionate allocation.
Furthermore, the 1:2:4 loading retains at least 0.90 efficiency relative to proportionate allocation, and this limit distinguishes type 2 from type 1.

Groups C and D present characteristics which form scales strongly associated with economic status. The items range from type 00 to type 4, ascending from the lower to the higher classes on the social and economic scale. Type 3 items, contrasted to type 2, have relative efficiency over 1.00 with the 1:2:4 loading and over 1.05 with their optimum; but both the 1:2:4 and the optimum loading are under 1.15 relative efficiency, and this separates them from type 4, for which the gains from such oversampling are greater than 1.15.

At the lower end of the socioeconomic scale, several items of type 0 occur. For these items the optimum calls for slightly higher sampling rates in the lower strata. They (unlike type 1) have less than .80 efficiency with the 1:2:4 loading, relative to proportionate sampling, which itself has almost the efficiency of the optimum (the optimum being within 1.05).

There are only a few extreme items of type 00. For these the efficiency of the optimum, with "reverse" loading, would be greater than 1.05, and the 1:2:4 loading is less than .70 efficient, both efficiencies being relative to proportionate sampling and distinguishing type 00 from type 0. Estimates of the proportions of the unemployed and of nonwhites are in this category. Apparently these ratings, based on the quality of dwellings, segregate the nonwhites more distinctly than any other underprivileged socioeconomic group, whether based on income, assets, or buying.

4. ESTIMATES OF MEANS FOR SUBCLASSES

When dealing with different characteristics for various subclasses one encounters redoubled complexity: the surveys present estimates for many characteristics, and each of these for many different subclasses.
The subclasses, sometimes called "domains," cut across the strata. Despite slight theoretical difficulties, the ordinary sample mean \(\bar{y}_c\) for the subclass c is a good ratio estimator (see [4] and [6, pp. 297-305]). Furthermore, the estimator of its variance is

\[
\operatorname{var}(\bar{y}_c) = \frac{1}{\bar{m}_c^2} \sum_h \frac{W_h^2}{n_h} \left[ \bar{m}_{ch} s_{ch}^2 + \bar{m}_{ch}(1 - \bar{m}_{ch})(\bar{y}_{ch} - \bar{y}_c)^2 \right].
\]

Here \(\bar{m}_{ch}\) is the proportion of subclass c members in stratum h, and \(\bar{m}_c = \sum_h W_h \bar{m}_{ch}\) is the corresponding proportion over all strata. The brackets contain the variance per element in the stratum, which consists of two terms. The first term is \(s_{ch}^2\), the variance per element within the subclass, multiplied by the proportion of subclass members. The second term expresses the variances among the stratum means for the subclass. The entire bracketed expression plays the same role in allocation as the usual variance per element (see Section 9). Compared to (b) of Section 8, we have here neglected the trivial factors of \((1 - f_h)\) and \(n_h/(n_h - 1)\).

Many of our variables are binomial, and for these the second term is usually small compared to the first, and may be neglected. Furthermore, for these variables, \(s_{ch}^2 = \bar{y}_{ch}(1 - \bar{y}_{ch})\), and these usually do not vary much among the strata. Insofar as one can assume negligible between-stratum components as well as similar \(s_{ch}\) for all strata, one can treat the standard deviation in the stratum as proportional to \(\sqrt{\bar{m}_{ch}}\). Then the efficiency of an allocation scheme depends only on the proportion of subclass members in the various strata.

These are the assumptions behind the "standards" shown for several subclasses in Table II. We found these standards useful for many variables, just as we found in Table I that many variables, especially in Groups E, F, and G, behaved very much like the "standard" type 1. For many variables, especially proportions, we found results (in terms of \(k_{o2}\) and \(k_{o3}\)) rather similar to the "standards."
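The "standard" for a subclass can be sketched in a few lines. The function name `standard_loading_factors` is hypothetical, and the sketch relies on the simplification described above: with negligible between-stratum terms and similar within-subclass variances, the per-element variance in a stratum is proportional to the subclass proportion, so the standard deviation is proportional to its square root. The input values are the proportions of professionals in strata 1, 2, and 3 as read from Table I.

```python
from math import sqrt


def standard_loading_factors(subclass_proportions):
    """Return optimum loading factors relative to stratum 1.

    Assumes (as in the "standards") that the per-element standard deviation
    in each stratum is proportional to the square root of the proportion of
    subclass members in that stratum.
    """
    first = subclass_proportions[0]
    return [sqrt(m / first) for m in subclass_proportions]


# Proportions of professional and semi-professional units in strata 1, 2, 3
# (Table I, Group D, as rounded in the table).
professionals = [0.030, 0.083, 0.192]

k = standard_loading_factors(professionals)
print("k_o2 = %.2f, k_o3 = %.2f" % (k[1], k[2]))
```

Because the tabled proportions are rounded, the result should be expected to agree with the Table II entries for professionals (1.67 and 2.54) only to within about a unit in the second decimal.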
For example, the standard for the domain of professionals had \(k_{o2} = 1.67\) and \(k_{o3} = 2.54\) (see Table II). We compared these with many actual computations (not shown here). For estimating the proportion of those who thought it was a "good time to buy," these factors were \(k_{o2} = 1.66\) and \(k_{o3} = 2.54\); for estimating the proportions who said that they were "making more money than in the previous year," the factors were \(k_{o2} = 1.63\) and \(k_{o3} = 2.34\).

For many subclasses the \(\bar{m}_{ch}\) are relatively constant from stratum to stratum. These are subclasses which were not strongly distinguished by the economic ratings, e.g., age classes, geographic or city size classes, attitudinal classes. For this large variety of subclasses the "standard" is similar to that for the entire population (\(k_{o2} = 1\) and \(k_{o3} = 1\)) and no separate results need be presented for them.

The situation is different for subclasses which were distinguished by the economic ratings, and Table II is devoted to these. The results show that the standards range from types 00 to 4 as we ascend the scale of economic class. Hence, for estimating proportions based on the blue collar workers, or on lower income groups, it is most efficient to use proportionate sampling (or even less) for the higher strata; but for estimating proportions based on the higher income or occupational groups, a higher loading gives moderate gains.

TABLE II
"STANDARDS" FOR COMPARING THE RELATIVE EFFICIENCIES OF SOME SUBCLASSES
(Computed by assuming \(k_{o2} = \sqrt{\bar{m}_{c2}/\bar{m}_{c1}}\) and \(k_{o3} = \sqrt{\bar{m}_{c3}/\bar{m}_{c1}}\), where \(\bar{m}_{ch}\) is the proportion of subclass members in stratum h.)

                                      Optimum         Efficiency related to proportionate sample
                                   loading factor     for different "loadings" (1 : k2 : k3)
"Subclass"                  Type    ko2     ko3      1:1:1   1:1.5:3   1:2:4    Opt.
(1)                         (2)     (3)     (4)      (5)     (6)       (7)      (8)

Entire Population           1       1.00    1.00     1*      .91       .83      1.00

H. Income Classes
Under $1000                 0        .69     .49     1*      .82       .71      1.05
$1000-$1999                 0        .77     .60     1*      .84       .74      1.02
$2000-$2999                 0        .86     .64     1*      .86       .77      1.01
$3000-$3999                 0        .82     .82     1*      .86       .77      1.01
$4000-$4999                 1       1.08     .84     1*      .90       .84      1.00
$5000-$7499                 2       1.52    1.37     1*      .98       .95      1.03
$7500-$9999                 3       1.76    2.31     1      1.06*     1.05*     1.08
$10,000 +                   4       2.34    5.09     1      1.26*     1.29*     1.30
($7500 +)                   4       1.95    3.39     1      1.15*     1.16*     1.16

I. Occupation Classes
Professional and Semi-Pro.  3       1.67    2.54     1      1.08*     1.07*     1.09
Self Employed               2       1.38    1.87     1*     1.02*      .98      1.04
Managers                    3       1.64    2.61     1      1.09*     1.08*     1.10
Clerical and Sales          2       1.55    1.68     1*     1.01*      .98      1.04
Skilled and Semi-Skilled    0        .93     .61     1*      .87       .79      1.01
Unskilled and Service       00       .65     .52     1       .81       .70      1.06
Miscellaneous               0        .89     .87     1       .88       .79      1.00
Unemployed                  00       .57     .64     1       .80       .69      1.08
Retired                     1        .89     .90     1*      .88       .80      1.00

For some characteristics, however, the above assumptions do not hold well. Hence their statistical characteristics (in terms of \(k_{o2}\) and \(k_{o3}\)) depart from the "standard." These departures are usually in predictable directions; for low economic activity the factors often are depressed, while for high economic activity the factors are increased. For example, for the subclass having incomes above $7500 the "standard" had factors of \(k_{o2} = 1.95\) and \(k_{o3} = 3.39\) (see Table II). For estimating the proportion within this income subclass of those who possess liquid assets of over $7500 we actually found the factors increased to \(k_{o2} = 2.78\) and \(k_{o3} = 6.19\) (computations not shown). The largest departures from the standards, and the largest gains for highly loaded samples, were found for the kinds of items illustrated in Tables III and IV.

TABLE III
RELATIVE EFFICIENCIES OF THE MEAN LIQUID ASSETS HOLDINGS FOR OCCUPATION SUBCLASSES

                                    Mean for stratum        Optimum        Efficiency related to proportionate sample
                                     (in dollars)        loading factor    for different "loadings" (1 : k2 : k3)
Occupation subclass         Type    1      2      3       ko2     ko3      1:1:1   1:1.5:3   1:2:4    Opt.
(1)                         (2)     (3)    (4)    (5)     (6)     (7)      (8)     (9)       (10)     (11)

J. Occupation
Professional and Semi-Pro.  5       797    2151   4145    2.74    7.28     1      1.37      1.42      1.48
Self Employed               5       1721   3224   9852    2.55    7.74     1      1.43      1.48      1.56
Managers                    5       1113   3267   7312    2.44    9.06     1      1.53      1.60      1.76
Clerical and Sales          4       1668   1734   2847    2.00    3.41     1      1.15*     1.16*     1.16
Skilled and Semi-Skilled    1       654    1238   1895    1.51    1.23     1*      .97       .94      1.03
Unskilled and Service       0       586    1071   812      .93     .28     1*      .85       .77      1.05
Miscellaneous               4       480    1802   4176    1.91    4.80     1      1.29*     1.31*     1.32
Retired                     3       905    2442   5736    1.43    2.88     1      1.07*     1.05*     1.07

TABLE IV
RELATIVE EFFICIENCIES OF THE AGGREGATES LIQUID ASSETS HOLDINGS FOR OCCUPATION SUBCLASSES

                                      Optimum         Efficiency related to proportionate sample
                                   loading factor     for different "loadings" (1 : k2 : k3)
Occupation subclass         Type    ko2     ko3      1:1:1   1:1.5:3   1:2:4    Opt.
(1)                         (2)     (3)     (4)      (5)     (6)       (7)      (8)

K. Occupation
Professional and Semi-Pro.  5       3.96    10.33    1      1.39      1.46      1.57
Self Employed               5       3.05     9.38    1      1.45      1.52      1.64
Managers                    5       5.14    18.14    1      1.55      1.65      1.96
Clerical and Sales          3       1.88     3.19    1      1.13*     1.14*     1.15
Skilled and Semi-Skilled    1       1.56     1.31    1*      .98       .95      1.04
Unskilled and Service       0        .96      .36    1*      .86       .78      1.04
Miscellaneous               4       2.34     5.61    1      1.30*     1.34*     1.35
Retired                     4       1.69     3.34    1      1.16*     1.16*     1.17

These tables deal with the skewed distributions of "dollar variables" pertaining to subclasses that represent economic classes with differing dollar shares. These are important types of estimates in many surveys. Table III deals with mean liquid asset holdings and Table IV with the corresponding totals. The higher economic subclasses are of type 5, denoting relative efficiencies above 1.35 for the optimum compared to proportionate sampling, and this limit distinguishes type 5 from 4.
Although the optimum is even more disproportionate, the 1:2:4 loading achieves most of the gains. But even a moderate overloading of, say, 1:1.5:3 obtains a large portion of the possible gains and in turn loses less for the less extreme items.

The larger values (than indicated by the standard) of \(k_{o2}\) and \(k_{o3}\) for high economic activity are due to the coincidence in the higher strata of both larger proportions, \(\bar n_{ch}\), of elements with higher activity and of higher variances, \(s_{ch}^2\), per element. On the whole, the results show a rather consistent pattern in the rough additivity of these two factors, and this is useful for extrapolating the empirical results to other characteristics in the same survey and for planning future surveys.

5. ESTIMATES OF AGGREGATES FOR SUBCLASSES

Estimating aggregates for subclasses may be important; e.g., they can be used to estimate the "share" of the entire sample accounted for by the subclass, such as an occupation subclass or an income category. When dealing with the entire sample (as in Table I), the standard deviations are equally relevant for allocating observations whether estimating aggregates or means. When dealing with subclasses, however, the best allocation for estimating the aggregate (Table IV) is not necessarily the same as that for estimating the mean. This may be judged from the formula for the variance of aggregates (from equation (3b) of Section 8, but neglecting the trivial factor \((1-f_h)\,n_h/(n_h-1)\)):

\[
\operatorname{var}(y'_c) = \frac{N}{r}\sum_{h=1}^{H}\frac{W_h}{k_h}\left[\bar n_{ch}\,s_{ch}^2 + \bar n_{ch}(1-\bar n_{ch})\,\bar y_{ch}^2\right].
\]

Comparing the variance per element within the brackets with that for the mean, we see that the first term is the same, but the second term differs. The second term is here a function of the value of \(\bar y_{ch}\), the mean per subclass element within the stratum (while in the variance of the mean it depends on the deviations of the means). Now the second term is generally neither small nor similar from stratum to stratum.
We found (as expected) roughly the following relationships among the strata: For dollar items the subclass means, \(\bar y_{ch}\), were roughly proportional to the standard deviations, \(s_{ch}\); that is, the coefficients of variation, \(s_{ch}/\bar y_{ch}\), were approximately constant among the strata. Usually \((1-\bar n_{ch})\) is unimportant, and then the second term is roughly proportional to the first. Hence, the allocations for estimating the means would hold also roughly for estimating the aggregates. Actually, the factors \(k_{o2}\) and \(k_{o3}\), and hence the optimum allocations, diverge somewhat more for aggregates than for the mean (compare Tables III and IV). This is due largely to the "damping" effects of the relative constancy of the second term in the variances for the means.

For estimating aggregates of binomials the second term generally had little disturbing effect on the allocations. For proportions near 0.5 it was \(\approx .25\), proportional to the first term, since \(\bar y_{ch}^2 \approx .25\) and \(s_{ch}^2 = \bar y_{ch}(1-\bar y_{ch}) \approx .25\) also. For small proportions the second term is relatively small; e.g., for \(\bar y_{ch} = .1\) the \(\bar y_{ch}^2 = .01\) while \(s_{ch}^2 = .09\).

6. SIMPLIFICATIONS INTRODUCED INTO THE ANALYSIS

To be able to deal with and present a large body of complex data, we simplified the treatment in several ways, while keeping the basic problems in sight. But to apply these results to a specific sample design, one should consider several aspects neglected in the analysis.

(a) The effects of the complexities of the cluster design have been disregarded. Hence, the actual losses and gains of the total variances are less than the computed variations in efficiencies. For complex designs one can usually consider the effects of clustering as additional components: great for large clusters, such as counties in a national sample, and less for samples thoroughly spread over the population (e.g., in a city).
In either case, if one keeps the number of clusters constant, then the computed relative efficiencies indicate the ratios by which the numbers of interviews need to be increased or decreased.

(b) Our computations were based on the results of 3000 cases, and the results for subclasses are particularly subject to sampling errors. But since the efficiencies of allocation vary but little for moderate departures in the values of the factors \(k_{o2}\) and \(k_{o3}\), precision is unnecessary. The kind of stability that was needed came from the computations for the great variety of items comprising the survey. Moreover, for ease of computation we neglected the factors \((1-f_h)\) and \(n_h/(n_h-1)\).

(c) We consider only estimates in which there is complete compensation, with the inverse weighting of estimates, for the differences of selection rates. But in the discussion of efficiencies we neglected the increased costs of a weighted sample as against a proportionate sample. The increase in the costs of analysis, of sampling, of planning, of bookkeeping is not negligible, and one could for the same cost buy more interviews so as to increase the precision of the proportionate sample.

Furthermore, if the sampling rates were kept constant, "rating" the dwellings would be unnecessary. The cost of these ratings was, however, small in our situation.

We also disregarded possible differences among the strata in cost per interview. These differences were smaller than we could measure and the differences in \(\sqrt{C_h}\) (see formula (12)) would not affect the allocations by more than a per cent or two.

Increasing the divergence among sampling rates increases also the total numbers of addresses that need to be listed in one form or another, and this results in increased listing costs.
Roughly, the listing of a dwelling had a cost of $0.50 against $20.00 for interviewing and coding, a ratio of 1:40. The proportion of "used" dwellings in the 1:2:4 allocation was .09/1 + .53/2 + .38/4 = .45; hence, we had to list 1/.45 = 2.2 dwellings for each dwelling used. Thus the cost per dwelling of a 1:2:4 allocation was greater than that of proportionate sampling by a factor of (40 + 2.2)/(40 + 1) = 1.03. This factor can be applied to reduce the relative efficiencies shown in the columns for the 1:2:4 loading.

(d) The above factor was just about cancelled by an equal correction in the other direction for the actual sampling procedures (as described in Section 1). This procedure did not have the actual population weights (\(W_h\)) that a true stratified sample requires. It can be regarded usefully as a procedure of "double sampling for stratification" [2, pp. 268-271]. In this, the variance is approximately

\[
\frac{1}{n}\sum_{h} W_h S_h^2 + \frac{1}{n'}\sum_{h} W_h(\bar Y_h - \bar Y)^2,
\]

where \(n'\) and \(n\) are the sample sizes for the first and second phases. An allocation of 1:2:4 would utilize a first phase of \(n' = n/.45\) listings; thus the second term would enter with the factor \((.45/n)\). But for an allocation of 1:1:1 only \(n' = n\) listings would be needed and the second term would enter with the factor \((1/n)\). This difference was found to make the 1:2:4 allocation, compared to the proportionate 1:1:1, actually more efficient than shown in the tables by an effect of 1.02, on the average. This effect ranged mostly from 1.00 to 1.04, with one extreme at 1.06; these were all based on the entire sample. For variates based on subclasses, this effect will be smaller.

(e) Actually, the data in Tables I-IV understate somewhat the efficiencies of actual interviewer ratings for types 3, 4, and 5. They represent the entire sample selected with sampling rates in the proportions of 1:2:4 for the three strata, because we wanted to judge the performance of the ratings in the sample as actually used.
However, to judge how well the interviewer ratings could perform if completely utilized, we call attention to some important divergences of the three strata (1, 2, 3) from the interviewers' actual ratings (L, M, H). First, 22 per cent of the respondents were in the "open country," where compact segments were the last stage sampling units and where no ratings were assigned to the dwellings. Secondly, in about 7 per cent of the cases, the interviewers' ratings were mixed (LM or MH). Thirdly, about 3 per cent of the cases were placed into higher or lower strata than indicated by the ratings. We used this flexibility to adjust the sample sizes to lessen field problems.

Because of these departures of the actual strata from the interviewers' ratings, the data in Tables I-IV actually understate somewhat the true efficacy of the ratings. To evaluate these effects we also investigated on a limited basis the 68 per cent of the sample in which the allocations were truly based on interviewers' ratings. Most of the results, as expected, were close to those for the full sample. The chief departures came for items of types 3-5, for which the advantage of the optimum over the proportionate was considerable. These are illustrated in Table V and can be compared with the same variables in Table I. The "pure" strata of Table V are seen to achieve optima from 5 to 20 per cent higher than the "impure" strata of Table I. The reader can estimate, for all proportions in the tables, \(s_h^2 = p_h(1-p_h)\). For the three important dollar items the \(s_{ch}\) are shown in Table VI.

TABLE V
RELATIVE EFFICIENCIES WHEN STRATA FOLLOW STRICTLY THE INTERVIEWERS' RATINGS
(Based on 68 per cent of entire sample. These results should be compared with similar characteristics in Table I based on the entire sample.)

                                     Mean or proportion      Optimum           Efficiency related to proportionate sample
                                     in stratum              loading factor    for different "loadings" (1 : k_2 : k_3)
Characteristic                Type   1       2       3       k_o2    k_o3      1:1:1   1:1.5:3   1:2:4   Opt.
(1)                           (2)    (3)     (4)     (5)     (6)     (7)       (8)     (9)       (10)    (11)
Mean Disposable Income        5      2811    4174    7974    1.62    4.67      1       1.32*     1.32*   1.36
Mean Liquid Assets            5      659     1824    5789    2.00    5.87      1       1.38      1.41*   1.45
Mean Personal Debt            5      308     381     659     1.53    4.96      1       1.36      1.37*   1.42
Income Classes
  $5000-$7499                 1      .11     .28     .23     1.43    1.33      1*      .98*      .94     1.03
  $7500-$9999                 4      .008    .07     .12     2.73    3.46      1       1.12      1.16*   1.20
  $10,000 and over            5      .004    .05     .31     3.31    7.16      1       1.30      1.37    1.43
Occupation
  Professional and Semi-Pro.  3      .02     .09     .23     1.93    2.79      1       1.10*     1.11*   1.12
  Self Employed               3      .03     .09     .14     1.59    2.16      1       1.05*     1.03*   1.07
  Managers                    3      .02     .06     .19     1.68    2.51      1       1.08*     1.07*   1.09
Liquid Assets
  $5000-$9999                 3      .02     .07     .16     1.81    2.72      1       1.10*     1.10*   1.11
  $10,000-$24,999             4      .008    .03     .14     1.89    3.76      1       1.19*     1.20*   1.20
  $25,000 and over            5      .002    .004    .05     1.32    4.94      1       1.39      1.40    1.49

TABLE VI

                          For the entire sample       For the "pure" 68 per cent
                          (strata)
                          1       2       3           L       M       H
Mean Disposable Income    1,939   2,755   7,012       1,725   2,794   8,056
Mean Liquid Assets        1,788   4,063   8,927       1,716   3,429   10,065
Mean Personal Debt        535     1,121   2,226       543     833     2,693

7. CONCLUSIONS

The choice of the allocation scheme is easier if an item or a small set of similar items can be designated realistically as the prime objective of the survey. But many surveys, like ours, have many conflicting objectives. To guide the choice of design we were compelled to present the consequences of different designs for the entire spectrum of survey variables. Our objective was to make this vast task feasible and comprehensible.
For many survey statistics we obtained estimates of the optimum allocation and of the relative efficiencies for several allocation schemes. These generally differ, and a brief summary may be useful.

Proportionate sampling seems to be near the optimum for the majority of estimates dealing with proportions. But disproportionate sampling results in moderate to large gains for some important kinds of estimates, such as:
(a) Means of financial variables with skewed distributions, such as income and liquid assets;
(b) Proportions denoting high economic status or high economic activity;
(c) Proportions based on subclasses standing high on the socioeconomic scale;
(d) Financial variables based on subclasses high on the socioeconomic scale.
For the first three, the gains over proportionate sampling were below 40 per cent; for the last, they went up to 100 per cent. For all these items disproportionate sampling rates, perhaps in the 1:2:4 ratios actually used, are close to the optimum. On the other hand, such disproportionate sampling is 15 to 20 per cent less efficient than proportionate sampling for the majority of proportions estimated in the survey. The allocation of 1:1.5:3 would cut these losses to about 10 per cent and still achieve most of the gains possible for the other items.

Increasing the precision is generally not equally important for all items in a survey, and the problem should be stated in terms of the "costs" of specified errors for each of the estimates in a decision function. In this formulation, as in many practical situations, the requirements cannot reasonably be met by the statistical analyst. Nevertheless, we might raise ever so briefly the question of the relative importance of errors for different estimates.
In comparing a characteristic for two different economic subclasses, the variance of the comparison is often dominated by the variance of the subclass "higher" on the economic scale, because the "higher" subclasses are both more variable and smaller. They are made smaller because they represent the tails of skewed distributions and economic activities and have importance well beyond their proportions. Increasing the efficiency of estimators based on these relatively small subclasses of higher economic status and activity favors some disproportion in sampling rates. A moderate disproportion may produce most of the gains for these items and result in only small losses for the others. However, one should also consider the increased costs of sampling and analysis before departing from the simplicity of proportionate sampling.

8. SUBCLASSES FROM STRATIFIED SAMPLES

For a population of \(N\) elements divided into \(H\) strata the population mean

\[
\bar Y = \frac{Y}{N} = \frac{1}{N}\sum_{h=1}^{H}\sum_{i=1}^{N_h} Y_{hi} = \frac{1}{N}\sum_{h=1}^{H} N_h \bar Y_h = \sum_{h=1}^{H} W_h \bar Y_h
\]

is usually estimated by

\[
\bar y = \sum_{h=1}^{H} W_h \bar y_h = \frac{1}{n^*}\sum_{h=1}^{H}\frac{n_h}{k_h}\,\bar y_h = \frac{1}{n^*}\sum_{h=1}^{H}\frac{y_h}{k_h} = \frac{y^*}{n^*},
\]

where \(W_h = N_h/N\) is the proportion of elements in the \(h\)th stratum, with \(\sum_h W_h = 1\); \(f_h = n_h/N_h = r k_h\) is the selection rate within the \(h\)th stratum; \(r\) is some basic selection rate; \(k_h\) is the "loading factor" in stratum \(h\) (usually some convenient number; for example, in the survey which provided our empirical results, \(r\) was near 1/20,000 and the loading factors for the three strata were \(k_1 = 1\), \(k_2 = 2\), and \(k_3 = 4\)); \(n^* = Nr = \sum_h n_h/k_h\) is the sample "size" weighted inversely to the "loading factors"; and \(y^* = \sum_h \sum_i y_{hi}/k_h\) is the sample total for a variable in which each element is weighted inversely to the loading factor.
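As a minimal numerical sketch of the estimator above (the data, the function names, and the three-stratum layout are hypothetical and chosen only for illustration), the weighted ratio y*/n* can be computed directly from the sampled values and their loading factors, and checked against the stratified form that uses W_h = (n_h/k_h)/n*:

```python
def weighted_sample_mean(strata):
    """Return y*/n* for strata given as (loading factor k_h, list of y_hi)."""
    y_star = sum(sum(values) / k for k, values in strata)  # y* = sum_h y_h / k_h
    n_star = sum(len(values) / k for k, values in strata)  # n* = sum_h n_h / k_h
    return y_star / n_star


def stratified_mean(strata):
    """Return sum_h W_h * ybar_h, with W_h = (n_h / k_h) / n*."""
    n_star = sum(len(values) / k for k, values in strata)
    return sum(
        (len(values) / k / n_star) * (sum(values) / len(values))
        for k, values in strata
    )


# Hypothetical data: three strata with loading factors 1, 2 and 4.
strata = [
    (1, [2.0, 4.0]),
    (2, [6.0, 8.0, 10.0, 12.0]),
    (4, [20.0, 20.0, 20.0, 20.0]),
]

print(weighted_sample_mean(strata))
print(stratified_mean(strata))
```

With these made-up values both forms give a mean of about 8.8, which is the point of the identity: weighting each element inversely to its loading factor reproduces the stratum weights.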
If the \(n_h\) elements within each stratum are selected by simple random sampling, the variance of the sample mean is estimated by

\[
\operatorname{var}(\bar y) = \sum_{h=1}^{H}(1-f_h)\,W_h^2\,\frac{s_h^2}{n_h} = \frac{1}{n^*}\sum_{h=1}^{H}(1-f_h)\,\frac{W_h}{k_h}\,s_h^2, \tag{1}
\]

where \(s_h^2 = (n_h-1)^{-1}\sum_{i=1}^{n_h}(y_{hi}-\bar y_h)^2\) is the estimated variance of elements within the \(h\)th stratum. For a binomial, this can be computed simply as equal to \(\bar y_h(1-\bar y_h)\,n_h/(n_h-1)\).

The variance of the estimated aggregate, \(y' = N\bar y = y^*/r\), can be estimated by

\[
\operatorname{var}(y') = N^2\operatorname{var}(\bar y) = \sum_{h=1}^{H}(1-f_h)\,N_h^2\,\frac{s_h^2}{n_h}. \tag{2}
\]

The estimators of the mean and aggregate are unbiased with minimum variance. Also (1) and (2) give unbiased estimators of the population values of the variances, because \(s_h^2\) is the unbiased estimator of \(S_h^2 = (N_h-1)^{-1}\sum_{i=1}^{N_h}(Y_{hi}-\bar Y_h)^2\). These formulas can be found in sampling textbooks [1, 3, 6].

Very often estimates are made not only for the entire population covered by the survey sample, but also for various subclasses thereof. If each stratum belongs entirely to only one of the subclasses, then the standard formulas are adequate. But we need new formulas if the subclasses cut across strata, so that the elements of a stratum belong to more than one subclass. The variance for subclasses in simple random samples is derived by Hansen, Hurwitz, and Madow [3, Vol. II, pp. 114-117]; and Yates [6] gives the results of this section without derivation. More recently, Durbin [2] gave derivations for stratified samples, and Hartley [4] presented them in greater detail and generality. But our brief derivation is convenient and useful both for presenting our empirical results and for the formulation of the efficiencies of the next section.

Suppose that of the \(N\) elements in the population \(N_c\) are members of the specified subclass \(c\), \(N_{ch}\) being members among the \(N_h\) elements of the \(h\)th stratum. In the sample of \(n_h\) random elements we find \(n_{ch}\) belonging to the \(c\)th subclass.
The fact that \(n_{ch}\) is a random variable gives rise to the special problems of a subclass. These problems can be handled easily by defining for the subclass in the sample (and with analogous definitions in capital letters in the population):

\[
y_{chi} = \begin{cases} y_{hi} & \text{for the } n_{ch} \text{ members of the subclass,} \\ 0 & \text{for the } n_h - n_{ch} \text{ nonmembers of the subclass.} \end{cases}
\]

We shall also use later the special "counting variate":

\[
n_{chi} = \begin{cases} 1 & \text{for the } n_{ch} \text{ members of the subclass,} \\ 0 & \text{for the } n_h - n_{ch} \text{ nonmembers of the subclass.} \end{cases}
\]

Also useful will be \(\bar n_{ch} = n_{ch}/n_h\), the sample proportion of subclass members in the \(h\)th stratum, and the stratum mean within the subclass,

\[
\bar y_{ch} = \frac{y_{ch}}{n_{ch}} = \frac{1}{n_{ch}}\sum_{i=1}^{n_{ch}} y_{chi} = \frac{1}{n_{ch}}\sum_{i=1}^{n_h} y_{chi}.
\]

The redefinition of the variable \(y_{chi}\) does not disturb the unbiased nature of estimators of aggregates such as \(y^*_c\) and \(y'_c\). The variance of the estimator of the subclass aggregate, \(y'_c = \sum_h y_{ch}/f_h = y^*_c/r\), can be computed with equation (2) by using the entire sample and including \((n_h - n_{ch})\) zero values for \(y_{chi}\). But we can gain ease and insight by using only the subclass sample. First, note that because \((n_h - n_{ch})\) values of \(y_{chi}\) and \(y_{chi}^2\) are zero,

\[
(n_h-1)\,s_h^2 = \sum_{i=1}^{n_h} y_{hi}^2 - \frac{y_h^2}{n_h}
\]

becomes for the variate \(y_{chi}\)

\[
\sum_{i=1}^{n_{ch}} y_{chi}^2 - \frac{y_{ch}^2}{n_h} = \left(\sum_{i=1}^{n_{ch}} y_{chi}^2 - \frac{y_{ch}^2}{n_{ch}}\right) + \left(\frac{y_{ch}^2}{n_{ch}} - \frac{y_{ch}^2}{n_h}\right) = (n_{ch}-1)\,s_{ch}^2 + n_{ch}(1-\bar n_{ch})\,\bar y_{ch}^2.
\]

This definition of \(s_{ch}^2\) for the subclass resembles the definition of \(s_h^2\). Substituting for \(s_h^2\) in (2) gives

\[
\operatorname{var}(y'_c) = \sum_{h=1}^{H}(1-f_h)\,\frac{N_h^2}{n_h}\,\frac{1}{n_h-1}\left[(n_{ch}-1)\,s_{ch}^2 + n_{ch}(1-\bar n_{ch})\,\bar y_{ch}^2\right]. \tag{3}
\]

This is the unbiased estimator (as \(s_h^2\) is of \(S_h^2\)) of \(\operatorname{Var}(y'_c)\), whose population values are given when there are capital letters inside the brackets.
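The identity behind equation (3) can be checked numerically. The sketch below uses made-up values and hypothetical function names: it computes (n_h - 1)s_h^2 once from the zero-filled variate over the whole stratum sample and once from the subclass members alone, and then evaluates equation (3) both ways.

```python
from statistics import variance


def zero_filled_ss(y_sub, n_h):
    """(n_h - 1) * s_h^2 for the variate that is 0 for the n_h - n_ch nonmembers."""
    filled = list(y_sub) + [0.0] * (n_h - len(y_sub))
    return (n_h - 1) * variance(filled)


def subclass_ss(y_sub, n_h):
    """(n_ch - 1) * s_ch^2 + n_ch * (1 - nbar_ch) * ybar_ch^2, subclass members only."""
    n_ch = len(y_sub)
    nbar_ch = n_ch / n_h
    ybar_ch = sum(y_sub) / n_ch
    s2_ch = variance(y_sub)  # divisor n_ch - 1, so at least two members are needed
    return (n_ch - 1) * s2_ch + n_ch * (1 - nbar_ch) * ybar_ch ** 2


def var_subclass_aggregate(strata, ss=subclass_ss):
    """Equation (3); strata holds (N_h, n_h, subclass values) per stratum."""
    total = 0.0
    for N_h, n_h, y_sub in strata:
        f_h = n_h / N_h
        total += (1 - f_h) * N_h ** 2 / n_h * ss(y_sub, n_h) / (n_h - 1)
    return total


# Hypothetical strata: population size, sample size, values of subclass members.
strata = [
    (1000, 10, [3.0, 5.0, 10.0]),
    (500, 8, [7.0, 9.0]),
]

print(zero_filled_ss([3.0, 5.0, 10.0], 10), subclass_ss([3.0, 5.0, 10.0], 10))
print(var_subclass_aggregate(strata, zero_filled_ss), var_subclass_aggregate(strata))
```

For the first stratum both sums of squares equal 101.6 up to rounding, so the whole-sample computation with zeros and the subclass-only computation lead to the same variance estimate.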
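By using \(s_{ch}^2 = \sum_{i=1}^{n_{ch}}(y_{chi}-\bar y_{ch})^2/n_{ch}\) we get a simpler expression:

\[
\operatorname{var}(y'_c) = \sum_{h=1}^{H}(1-f_h)\,\frac{N_h^2}{n_h-1}\left[\bar n_{ch}\,s_{ch}^2 + \bar n_{ch}(1-\bar n_{ch})\,\bar y_{ch}^2\right]. \tag{3a}
\]

By using \(\bar N_{ch} = N_{ch}/N_h\) above we can compare it to the variance, \(\sum_h (1-f_h)\,N_{ch}^2\,s_{ch}^2/n_{ch}\), of a sample designed specifically for the subclass and see that the relative variance, \(S_{ch}^2/\bar Y_{ch}^2\), is increased approximately by \((1-\bar N_{ch})\). For small subclass proportions this increase approaches 1, and it can have a great effect. A useful computation form is

\[
\operatorname{var}(y'_c) = \frac{N}{r}\sum_{h=1}^{H}\frac{W_h}{k_h}\,(1-f_h)\,\frac{n_h}{n_h-1}\left[\bar n_{ch}\,s_{ch}^2 + \bar n_{ch}(1-\bar n_{ch})\,\bar y_{ch}^2\right]. \tag{3b}
\]

The mean \(\bar Y_c\) \((= Y_c/N_c)\) for a subclass is usually and conveniently estimated by the ordinary (but weighted) sample mean

\[
\bar y_c = \frac{y'_c}{n'_c} = \frac{y^*_c}{n^*_c} = \frac{\sum_{h=1}^{H} y_{ch}/k_h}{\sum_{h=1}^{H} n_{ch}/k_h}. \tag{4}
\]

This is the "combined ratio estimator" in a stratified random sample. As a ratio of two random variables it has a technical bias which decreases with increasing sample size. These properties are given in sampling texts, as is the usual estimator of the variance:

\[
\operatorname{var}(\bar y_c) = \frac{1}{n_c'^{\,2}}\left[\operatorname{var}(y'_c) + \bar y_c^2\operatorname{var}(n'_c) - 2\,\bar y_c\operatorname{cov}(y'_c, n'_c)\right]. \tag{5}
\]

The variance of \(y'_c\) was given above.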
The variance of \(n'_c\) is the special case of a binomial variate for (2) in which \(s_h^2 = \bar n_{ch}(1-\bar n_{ch})\,n_h/(n_h-1)\), and so

\[
\operatorname{var}(n'_c) = \sum_{h=1}^{H}(1-f_h)\,\frac{N_h^2}{n_h-1}\left[\bar n_{ch}(1-\bar n_{ch})\right]. \tag{6}
\]

Just as we earlier treated \(s_h^2\), we similarly treat the covariance equivalent \(s_{hyn}\), for which

\[
(n_h-1)\,s_{hyn} = \sum_{i=1}^{n_h} y_{chi}\,n_{chi} - \frac{y_{ch}\,n_{ch}}{n_h} = y_{ch} - y_{ch}\,\bar n_{ch} = n_h\,\bar n_{ch}(1-\bar n_{ch})\,\bar y_{ch}.
\]

Then substituting \(s_{hyn}\) for \(s_h^2\) in (2) we get

\[
\operatorname{cov}(y'_c, n'_c) = \sum_{h=1}^{H}(1-f_h)\,\frac{N_h^2}{n_h-1}\left[\bar n_{ch}(1-\bar n_{ch})\,\bar y_{ch}\right]. \tag{7}
\]

Upon substituting the results of (3a), (6), and (7) into (5), we obtain within each stratum

\[
\bar n_{ch}\,s_{ch}^2 + \bar n_{ch}(1-\bar n_{ch})\,\bar y_{ch}^2 + \bar n_{ch}(1-\bar n_{ch})\,\bar y_c^2 - 2\,\bar n_{ch}(1-\bar n_{ch})\,\bar y_c\,\bar y_{ch} = \bar n_{ch}\,s_{ch}^2 + \bar n_{ch}(1-\bar n_{ch})(\bar y_{ch}-\bar y_c)^2,
\]

and these terms, multiplied by the common factors and summed over all strata, give

\[
\operatorname{var}(\bar y_c) = \frac{1}{n_c'^{\,2}}\sum_{h=1}^{H}(1-f_h)\,\frac{N_h^2}{n_h-1}\left[\bar n_{ch}\,s_{ch}^2 + \bar n_{ch}(1-\bar n_{ch})(\bar y_{ch}-\bar y_c)^2\right] \tag{8}
\]

and, using \(\bar n_c = n'_c/N = n^*_c/n^*\), the proportion of subclass members in the sample,

\[
\operatorname{var}(\bar y_c) = \frac{1}{\bar n_c^2}\sum_{h=1}^{H}(1-f_h)\,\frac{W_h^2}{n_h-1}\left\{\bar n_{ch}\,s_{ch}^2 + \bar n_{ch}(1-\bar n_{ch})(\bar y_{ch}-\bar y_c)^2\right\}, \tag{8a}
\]

so that

\[
\operatorname{var}(\bar y_c) = \frac{1}{\bar n_c^2\,n^*}\sum_{h=1}^{H}\frac{W_h}{k_h}\,(1-f_h)\,\frac{n_h}{n_h-1}\left\{\bar n_{ch}\,s_{ch}^2 + \bar n_{ch}(1-\bar n_{ch})(\bar y_{ch}-\bar y_c)^2\right\}. \tag{8b}
\]

We can compare this with a sample designed specifically for the subclass and find that the variance per element is increased by approximately \((1-\bar n_{ch})(\bar y_{ch}-\bar y_c)^2\). Thus for small subclasses (\(\bar n_{ch}\) small) the effect on the variance of the mean is to wipe out the gains, \((\bar y_{ch}-\bar y_c)^2\), that proportionate stratification would yield. But the additional gains from optimum allocation will tend to persist.

We should add here that the variance of the difference of two subclasses (\(c\) and \(d\)) involves the two variances and their covariance. This introduces into the term in braces the quantity \(\bar n_{ch}\,\bar n_{dh}(\bar y_{ch}-\bar y_c)(\bar y_{dh}-\bar y_d)\) for the mean and \(\bar n_{ch}\,\bar n_{dh}\,\bar y_{ch}\,\bar y_{dh}\) for the total [see 6, p. 301].
These will be comparatively small for the more usual subclasses, where \(\bar n_{ch}\) is small compared to \((1-\bar n_{ch})\).

9. THE ECONOMICS OF DIFFERENT ALLOCATIONS

Any of the four variances (of means or aggregates, for the entire sample or for a subclass) may be expressed in a convenient common form:

\[
\operatorname{var}(u) = A\sum_{h=1}^{H}(1-f_h)\,\frac{N_h}{N k_h}\,B_h^2 = A\sum_{h=1}^{H}(1-f_h)\,\frac{W_h}{k_h}\,B_h^2, \tag{9}
\]

where \(B_h^2\) is the within-stratum variance per element. This generalization of the variance permits the easy extension of allocation formulas from the entire sample to subclasses. The estimates \(b_h^2\) of \(B_h^2\) are \(s_h^2\) in (1) and (2) for the entire sample and are the quantities in brackets in (3) for the aggregate and in (8a) for the mean of a subclass. In estimating aggregates, \(A = N/r\); in estimating means, \(A = 1/n^* = 1/Nr\) for the entire sample and \(A = 1/(\bar n_c^2\,n^*) = N/(n_c'^{\,2}\,r)\) for subclasses.

In comparing two sets of allocations, \(a\) and \(b\), let \(r_a\) and \(r_b\) and \(k_{ah}\) and \(k_{bh}\) denote, respectively, the basic sampling rates and the sets of loading factors. That is, \(f_{ah} = r_a k_{ah}\) and \(f_{bh} = r_b k_{bh}\). We may obtain, for any of the four types of estimators, the ratio of the variances of the two allocations as

\[
\frac{\text{variance}(a)}{\text{variance}(b)} = \frac{r_b}{r_a}\cdot\frac{\sum_{h=1}^{H} W_h\,(1-f_{ah})\,B_h^2/k_{ah}}{\sum_{h=1}^{H} W_h\,(1-f_{bh})\,B_h^2/k_{bh}}. \tag{10}
\]

Denoting by \(C_h\) the cost per element in the \(h\)th stratum, we can compare the variances of the two sample designs subject to the condition of fixed \(\sum_h C_h n_h = rN\sum_h C_h W_h k_h\). The inverse of the ratio of variances for fixed cost, which we can call the "relative economy" of allocation \(b\) to allocation \(a\), becomes

\[
\frac{\text{variance}(a)}{\text{variance}(b)} = \frac{\left(\sum_{h=1}^{H} C_h W_h k_{ah}\right)\sum_{h=1}^{H}(1-f_{ah})\,W_h B_h^2/k_{ah}}{\left(\sum_{h=1}^{H} C_h W_h k_{bh}\right)\sum_{h=1}^{H}(1-f_{bh})\,W_h B_h^2/k_{bh}}. \tag{11}
\]

If we neglect differences (often small) among the \(C_h\) and compare variances for fixed \(\sum_h n_h\), then (11) without the factors \(C_h\) measures the "relative efficiency" of allocation \(b\) to \(a\).
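To illustrate equation (11), the following sketch evaluates the relative economy of one set of loading factors against another while neglecting the factors (1 - f_h). The function name is hypothetical. The stratum weights W = (.38, .53, .09) are an assumption read off the listing computation in Section 6(c), taking .09, .53, and .38 to be the shares of strata 3, 2, and 1; equal B_h^2 are assumed purely for illustration.

```python
def relative_economy(W, B2, k_a, k_b, C=None):
    """Equation (11) with (1 - f_h) neglected: variance(a) / variance(b) at fixed cost.

    W   : stratum weights W_h
    B2  : within-stratum variances per element, B_h^2
    k_a : loading factors of the reference allocation a
    k_b : loading factors of the allocation b being evaluated
    C   : cost per element by stratum (all equal to 1 if omitted)
    """
    if C is None:
        C = [1.0] * len(W)
    cost_a = sum(c * w * k for c, w, k in zip(C, W, k_a))
    cost_b = sum(c * w * k for c, w, k in zip(C, W, k_b))
    var_a = sum(w * b2 / k for w, b2, k in zip(W, B2, k_a))
    var_b = sum(w * b2 / k for w, b2, k in zip(W, B2, k_b))
    return (cost_a * var_a) / (cost_b * var_b)


# Assumed weights for strata 1, 2, 3 and equal variances (illustration only).
W = [0.38, 0.53, 0.09]
B2 = [1.0, 1.0, 1.0]
proportionate = [1.0, 1.0, 1.0]

print(relative_economy(W, B2, proportionate, [1.0, 1.5, 3.0]))
print(relative_economy(W, B2, proportionate, [1.0, 2.0, 4.0]))
```

Under these assumptions the two ratios come out near .91 and .83, in line with the losses of about 10 per cent and 15 to 20 per cent reported in Section 7 for most proportions.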
The "optimum allocation" of the \(n_h\), giving minimum variance for fixed \(\sum_h C_h n_h\), is obtained in sampling texts with Lagrange multipliers for the entire sample. We extend this to subclasses by using the \(B_h\) in place of the \(S_h\) and obtain the optimum

\[
\frac{n_h}{\sum_{h=1}^{H} n_h} = \frac{N_h B_h/\sqrt{C_h}}{\sum_{h=1}^{H} N_h B_h/\sqrt{C_h}}. \tag{12}
\]

In our data, as is often the case, the \(f_h\) are so small that the factor \((1-f_h)\) can be neglected. The \(C_h\) can also be dropped because the differences among them are slight. In computing the allocations and "efficiencies" for this study we used these two simplifying conditions.

For many binomial items (proportions) in the survey, the \(B_h\) are also relatively constant. In these cases the optimum is proportionate allocation, and compared to it the relative economy of allocation \(b\) can be shown to be simply \(\left[\left(\sum_h W_h k_{bh}\right)\left(\sum_h W_h/k_{bh}\right)\right]^{-1}\). This never exceeds one, and it is less than one whenever the \(k_{bh}\) differ among strata. Hence, for \(B_h\) and \(C_h\) constant, all departures from proportionate sampling involve some loss. This loss is shown in our results for many items of "type 1."

Survey Research Center, University of Michigan

REFERENCES

[1] COCHRAN, W. G.: Sampling Techniques, New York: John Wiley and Sons, 1953.
[2] DURBIN, J.: "Sampling Theory for Estimates Based on Fewer Individuals Than the Number Selected," Bulletin of the International Statistical Institute, Vol. 36, Part 3, pp. 113-119, 1956.
[3] HANSEN, M. H., W. N. HURWITZ, AND W. G. MADOW: Sample Survey Methods and Theory, New York: John Wiley and Sons, 1953, Vols. I and II.
[4] HARTLEY, H. O.: "Analytic Studies of Survey Data," contribution to a volume in honor of Corrado Gini, University of Rome: Istituto di Statistica. (In preparation.)
[5] KATONA, G., L. KISH, J. B. LANSING, AND J. K. DENT: "Methods of the Survey of Consumer Finances," Federal Reserve Bulletin, 36 (1950), pp. 795-809.
[6] YATES, FRANK: Sampling Methods for Censuses and Surveys, London: Charles Griffin and Company, 2nd edition, 1953.