The Use of Sample Overlap Methods in the Consumer Price Index Area Redesign

William Johnson, Steve Paben, John Schilp
Bureau of Labor Statistics, 2 Massachusetts Ave., NE, Room 3655, Washington, DC 20212

Abstract

Approximately every ten years the Consumer Price Index (CPI) selects a new area sample. Because of the costs associated with opening and closing field offices, it is highly desirable to maximize the overlap between the old and new areas sampled in the redesign. This paper outlines the overlap maximization methods investigated for the selection of non-certainty Core Based Statistical Areas in the next CPI area redesign. Two methods were investigated: the Perkins method and an optimal linear programming method proposed by Ernst (1986). The results from these methods were compared with those from a sample selected independently.

Key Words: Sample Redesign, Overlap Maximization, Linear Programming

1. Introduction

The Consumer Price Index (CPI) is a measure of the average change over time in the prices of consumer goods and services. The first stage of sampling for the CPI consists of selecting geographic areas that are meant to cover the urban US population. Every 10 years the CPI reselects the geographic areas, or primary sampling units (PSUs), to make sure the survey accurately reflects shifts in the American population observed in the latest decennial census. The CPI is currently in the midst of updating its samples based on the 2010 Census. In previous updates, a sample overlap procedure was used in selecting the geographic areas. A sample overlap procedure increases the likelihood of retaining current geographic areas and thus reduces the number of new areas introduced with each area redesign. This decreases collection costs, which include the closing of offices and the hiring and training of new staff for offices in a different geographic area.
In other words, overlap sampling allows the CPI to retain PSUs from the prior design while maintaining the requirements of probability sampling.

We considered two sample overlap procedures: Perkins (1970) and Ernst (1986). Historically, the Perkins method was used for sample overlap purposes. It is a heuristic procedure, whereas the Ernst procedure uses linear programming. The two procedures rest on different assumptions, but because it is an optimization, the Ernst procedure should yield a higher expected overlap. Additionally, the population and other demographic characteristics of the continually overlapped PSUs have changed over time. Therefore, we also investigated selecting the new area sample independently. An independent sample also serves as a baseline for a cost-benefit analysis comparing the expected number of overlaps from the Perkins and Ernst methods.

Following the results of each decennial Census, the Office of Management and Budget (OMB) releases a new set of area definitions. The changes can range from the addition or deletion of a single county from an area definition to the division of an old area definition into multiple new areas. There were substantial changes to the area definitions first released in 2003, which were based on the 2000 Census. Additionally, a whole new set of terminology and concepts was introduced. This is the current Core Based Statistical Area (CBSA) concept, which introduced micropolitan areas; for the CPI, the micropolitan areas are akin to the current smaller metropolitan areas. The CPI never implemented an area selection based on the 2000 Census because of budgetary issues, which have since been resolved. Therefore, the CPI faces two decades of area definitional changes. In later sections, PSU definition changes are discussed within the context of the Perkins and Ernst methods.

There are currently three types of PSUs in the CPI: certainty metropolitan, non-certainty metropolitan, and micropolitan areas.
The largest metropolitan areas are selected with certainty. Since certainty areas will definitely be included in the new sample, there is no reason to attempt to overlap them. We also do not attempt to overlap the micropolitan areas. For the micropolitan areas, the issue is having enough unique renters for the CPI Housing survey: each micropolitan area must have enough renters for two samples of six panels each over the course of the decade between area redesigns. Thus, we concluded that applying an overlap maximization procedure is not appropriate for the micropolitan areas. Therefore, only non-certainty metropolitan areas are eligible for the overlap procedures.

In Sections 2 and 3, we describe the Perkins and Ernst methods, respectively. In Section 4, we compare the results of the two methods to the expected overlap of an area sample drawn independently. In Section 5, we present some concluding thoughts and issues for further study.

2. Perkins Method

In the past we have used a modification of the Perkins method for overlap maximization to increase the amount of sample overlap. The modification is needed because PSU definitions, as well as the strata, change between area sample designs.

Let $I_j$, $j = 1, \ldots, J$, denote the $J$ old strata that intersect a new stratum $S$. Perkins' method requires that it first be determined from which $I_j$ the new PSU is to be selected. To do this, we first calculate a probability $y_j$ for each $I_j$ by summing the new selection probabilities $\pi_i$ over all PSUs in $I_j$. Then the selection among the $I_j$ is made with probability proportional to $y_j$.
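This first step can be sketched in Python as follows. It is a minimal illustration rather than the production procedure: the PSU names, the probabilities, and the helper names `subgroup_probabilities` and `select_subgroup` are hypothetical, and $I_j$ is taken to mean the PSUs of the new stratum that came from old stratum $j$.

```python
import random

def subgroup_probabilities(new_probs, old_stratum_of):
    """Return y_j: the sum of new probabilities pi_i over the PSUs in I_j."""
    y = {}
    for psu, pi in new_probs.items():
        j = old_stratum_of[psu]
        y[j] = y.get(j, 0.0) + pi
    return y

def select_subgroup(y, rng):
    """Draw one old stratum j with probability proportional to y_j."""
    strata = list(y)
    return rng.choices(strata, weights=[y[j] for j in strata], k=1)[0]

# Hypothetical new stratum S: three PSUs that came from two old strata.
new_probs = {"PSU a": 0.5, "PSU b": 0.3, "PSU c": 0.2}
old_stratum_of = {"PSU a": "old 1", "PSU b": "old 1", "PSU c": "old 2"}

y = subgroup_probabilities(new_probs, old_stratum_of)
chosen = select_subgroup(y, random.Random(2010))  # seeded only for repeatability
print(y, chosen)
```

Because the new probabilities in a stratum sum to 1, the $y_j$ also sum to 1 and can be used directly as selection probabilities for the subgroups.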
Once a subgroup $I_j$ has been selected, the new sample PSU for $S$ is selected from the set of PSUs in $I_j$ in the following manner, where $\pi_i$ is the new selection probability of PSU $i$ and $p_i$ is its selection probability in the old design. If a PSU $k$ in $I_j$ was selected in the old sample, the new sample PSU for $S$ is selected from among the PSUs in $I_j$ with the following probabilities, conditional on $j$ and $k$:

$$\pi_{k \mid jk} = \min\left\{ \frac{\pi_k}{y_j\, p_k},\; 1 \right\} \qquad (2.1)$$

$$\pi_{i \mid jk} = \left( 1 - \min\left\{ \frac{\pi_k}{y_j\, p_k},\; 1 \right\} \right) \frac{\max\{\pi_i - y_j\, p_i,\; 0\}}{\sum_{\ell \in I_j} \max\{\pi_\ell - y_j\, p_\ell,\; 0\}}, \quad i \neq k \qquad (2.2)$$

If none of the PSUs in $I_j$ was selected in the old sample, the new sample PSU for $S$ is selected from among the units $i$ in $I_j$ with probability proportional to

$$\max\{\pi_i - y_j\, p_i,\; 0\} \qquad (2.3)$$

The Perkins method of overlap maximization works entirely with the set of PSU definitions used for the old sample, because it is only in the set of old PSU definitions that the concept of an overlap PSU is unambiguously defined. This is not necessarily the same concept of overlap used for counting the expected number of overlap PSUs based on new PSU definitions.

The modification to Perkins' method occurs after the formulas have been applied to produce conditional probabilities for PSUs based on old PSU definitions. The probability of an old PSU is apportioned to the counties making up the old PSU in proportion to the share of the PSU's population contained in each county. The counties are then combined to form PSUs based on the new PSU definitions, and the probability of a new PSU is the sum of the probabilities of the individual counties within it. Perkins' method is relatively simple to implement but will not yield an optimal overlap.

3. Ernst Method

Overlap maximization starts with the equation

$$P(p) = \sum_{S} P(p \mid S)\, P(S) \qquad (3.1)$$

where $p$ is a PSU and $S$ is a possible old sample. The unconditional probability $P(p)$ can be calculated as the population of $p$ at $t=2$ divided by the stratum population at $t=2$. This allows freedom in assigning values to the conditional probabilities $P(p \mid S)$ as long as the above equation holds for every PSU $p$. These values can be assigned such that the conditional probability of selecting a currently sampled area is increased.
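The requirement in (3.1), that conditioning on the old sample must leave the unconditional probabilities unchanged, can be checked numerically for the Perkins conditional probabilities of Section 2. The sketch below assumes the reading of (2.1) to (2.3) in which the retained PSU $k$ is kept with probability $\min\{\pi_k/(y_j p_k), 1\}$ and the remaining mass is spread in proportion to $\max\{\pi_i - y_j p_i, 0\}$. The function names and the three-PSU data are hypothetical and serve only as an illustration.

```python
def perkins_conditional(pi, p, y_j, k=None):
    """Conditional new-selection probabilities for the PSUs in one subgroup I_j.

    pi, p : dicts mapping PSU -> new / old selection probability
    y_j   : sum of pi over the PSUs in I_j
    k     : PSU of I_j that was in the old sample, or None if there was none
    """
    excess = {i: max(pi[i] - y_j * p[i], 0.0) for i in pi}
    if k is None:
        total = sum(excess.values())
        return {i: excess[i] / total for i in pi}            # (2.3), normalized
    retain = min(pi[k] / (y_j * p[k]), 1.0)                  # (2.1)
    total = sum(v for i, v in excess.items() if i != k)
    cond = {i: (1.0 - retain) * excess[i] / total if total > 0 else 0.0
            for i in pi if i != k}                           # (2.2)
    cond[k] = retain
    return cond

def unconditional(pi, p, y_j):
    """Average the conditional probabilities over all old-sample outcomes."""
    out = {i: 0.0 for i in pi}
    for k in pi:                         # outcome: PSU k was the old sample PSU
        if p[k] > 0:
            for i, c in perkins_conditional(pi, p, y_j, k).items():
                out[i] += p[k] * c
    none_prob = 1.0 - sum(p.values())    # outcome: old sample PSU is outside I_j
    if none_prob > 1e-12:
        for i, c in perkins_conditional(pi, p, y_j, None).items():
            out[i] += none_prob * c
    return {i: y_j * v for i, v in out.items()}  # times P(subgroup j is chosen)

# Hypothetical subgroup I_j with three PSUs.
pi = {"a": 0.30, "b": 0.20, "c": 0.10}   # new probabilities within the new stratum
p = {"a": 0.40, "b": 0.10, "c": 0.20}    # old probabilities within old stratum j
y_j = sum(pi.values())

result = unconditional(pi, p, y_j)
print(result)
```

In this illustration the averaged probabilities agree with the original $\pi_i$, while a PSU whose new probability has not fallen relative to $y_j p_k$ (here "a" and "b") is retained with certainty whenever it was in the old sample.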
The Ernst (1986) method uses linear programming to determine the set of conditional probabilities that maximizes the expected unconditional number of current PSUs that will be reselected. Like all sample overlap procedures, the Ernst (1986) procedure does not alter the unconditional (or independent) probabilities of selecting PSUs in the new sample. Each stratum in the new design represents a separate overlap problem. The linear program is:

Maximize

$$\sum_{i=1}^{r} \sum_{j=1}^{u_i} \sum_{k=1}^{n} c_{ijk}\, x_{ijk} \qquad (3.2)$$

subject to

$$\sum_{i=1}^{r} \sum_{j=1}^{u_i} x_{ijk} = \pi_k, \quad k = 1, \ldots, n \qquad (3.3)$$

$$\sum_{k=1}^{n} x_{ijk} = p_{ij}\, y_i, \quad i = 1, \ldots, r, \;\; j = 1, \ldots, u_i \qquad (3.4)$$

$$\sum_{i=1}^{r} y_i = 1 \qquad (3.5)$$

where all $x_{ijk}$ and $y_i$ are nonnegative and:

$i = 1, \ldots, r$, where $r$ = the number of old strata that have at least one PSU in the new stratum,
$j = 1, \ldots, u_i$, where $u_i$ = the number of possible outcomes for the set of PSUs in the $i$th old stratum,
$k = 1, \ldots, n$, where $n$ = the number of PSUs in the new stratum,
$\pi_k$ = the probability of selection for a PSU in the new stratum,
$p_{ij}$ = the probability of selection for a PSU in the old stratum (that is, of outcome $j$ in old stratum $i$),
$y_i$ = the probability that the selection comes from the $i$th old stratum,
$x_{ijk}$ = the joint probability that the $i$th old stratum is chosen, that the $j$th possible outcome within the $i$th old stratum that intersects with the new stratum is chosen, and that the $k$th PSU is chosen in the new stratum (these joint probabilities are the variables over which the objective is maximized),
$c_{ijk}$ = a constant that equals 1, 0, or the current probability of selection for each new PSU. This matrix is essentially a list of all the possible outcomes given the intersection between the PSUs in the old strata and the new stratum.

Here the objective (3.2) is the unconditional expected number of PSUs from the current sample that are also selected for the new sample. To see this, note that the $x_{ijk}$ give the probability of every possible combination of old PSU and new PSU being selected, and $c_{ijk}$ is the expected number of overlap PSUs in the event that the $i$th old stratum is chosen, that the $j$th possible outcome within the $i$th old stratum that intersects with the new stratum is chosen, and that the $k$th PSU is chosen in the new stratum. Thus $x_{ijk}\, c_{ijk}$ is the expected number of overlaps contributed by a particular event.
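The construction of the $c_{ijk}$ constants and the evaluation of the objective can be sketched in Python. The sketch assumes the simple situation of the example in Section 3.1, where each old stratum contributes exactly one PSU to the new stratum and has two outcomes ($j=1$: that PSU was in the old sample, $j=2$: it was not). The function names are hypothetical; the old probabilities and the $x_{ijk}$ values are those reported in Section 3.1.

```python
def build_c(old_probs):
    """Return c[(i, j)] as a list over k for a new stratum whose k-th PSU is
    the only PSU contributed by the k-th old stratum (an assumption)."""
    n = len(old_probs)
    c = {}
    for i in range(n):
        for j in (1, 2):
            row = []
            for k in range(n):
                if k == i:
                    # Outcome j tells us whether PSU k was in the old sample.
                    row.append(1.0 if j == 1 else 0.0)
                else:
                    # Status of PSU k is unknown: use its old probability.
                    row.append(old_probs[k])
            c[(i + 1, j)] = row
    return c

def objective(c, x):
    """Sum of c_ijk * x_ijk over all i, j, k."""
    return sum(c[key][k] * x[key][k] for key in c for k in range(len(c[key])))

# Old selection probabilities for Orlando, Naples, and Key West (Section 3.1).
old_probs = [0.7334, 0.1173, 0.0359]
c = build_c(old_probs)

# Solution values x_ijk as reported in Section 3.1.
x = {
    (1, 1): [0.2225, 0.0, 0.0],
    (1, 2): [0.0, 0.0495, 0.0314],
    (2, 1): [0.0, 0.0817, 0.0],
    (2, 2): [0.6149, 0.0, 0.0],
    (3, 1): [0.0, 0.0, 0.0],
    (3, 2): [0.0, 0.0, 0.0],
}

value = objective(c, x)
print(round(value, 4))
```

Under these assumptions the objective evaluates to roughly 0.76 expected overlap PSUs for the example stratum.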
Constraint (3.3) guarantees that the unconditional probability of selection, $\pi_k$, of a new PSU $k$ is unchanged. Equivalently, the sum over all old PSUs $ij$ that could be selected of the joint probabilities that new PSU $k$ is selected and old PSU $ij$ was previously selected must equal the unconditional probability of new PSU $k$ being selected. Constraint (3.4) says that, summing over all possible new PSUs that could be selected, the previous probability of selection of the PSU $ij$ is preserved. The factor of $y_i$ is present because multiple old strata may intersect a single new stratum. Constraint (3.5) says that the selected PSU must come from one of the old strata that intersect the new stratum.

The procedure is a three-stage process. First, determine all possible sets of outcomes for the old strata and old PSUs that intersect with the new stratum, and determine the corresponding selection probabilities. This step includes creating the $c_{ijk}$ matrix described above. Second, obtain an optimal set of $x_{ijk}$ (also described above) by solving the linear programming problem. Finally, select a set of new PSUs in the new stratum conditional on the entire set of old sample PSUs that are in the new stratum. This is done as follows. The joint probability that new PSU $k$ is selected, old stratum $i$ is selected, and PSU $j$ within old stratum $i$ was selected ($x_{ijk}$) equals the conditional probability of new PSU $k$ being selected given old stratum $i$ and old PSU $j$ within $i$, multiplied by the probability of selecting old stratum $i$ and old PSU $j$ within stratum $i$.
In other words,

$$x_{ijk} = P(k \mid i, j)\; p_{ij}\, y_i$$

Thus, the conditional probability of selecting new PSU $k$ for a specific old stratum and PSU within the old stratum is

$$P(k \mid i, j) = \frac{x_{ijk}}{p_{ij}\, y_i}$$

Finally, we weight by $y_i$ and sum over all of the old strata that intersect the new stratum (here $j_i$ represents the PSU within old stratum $i$ that was previously selected):

$$P(k \mid j_1, \ldots, j_r) = \sum_{i=1}^{r} \frac{x_{i j_i k}}{p_{i j_i}} \qquad (3.6)$$

For the Ernst method, area definitional changes are assumed to form a one-to-one correspondence between the PSUs in the old and new designs. That is, each new area corresponds to one and only one old area, and each old area to exactly one new area. For example, in the 1990 Census-based area design, Greenville-Spartanburg-Anderson, SC was considered one metropolitan area. This metropolitan area is now considered to be four separate CBSAs: three metropolitan areas (Greenville, SC; Spartanburg, SC; and Anderson, SC) and one micropolitan area (Gaffney, SC). Only the CBSA that makes up the majority of the population from the old design is matched using this approach. The other three CBSAs have "dummy areas" created with an old selection probability of zero.

It should be noted that it is possible to consider partial sample overlaps with the Ernst method, in which case all four areas in the example above would be considered potential partial sample overlaps. However, this greatly increases the number of possible outcomes for the Ernst method and therefore the size and complexity of the problem. Therefore, we decided not to pursue the partial sample overlap approach with the Ernst method.

3.1 An Example

In this example, which illustrates the method above, we examine one stratum S in the new design. This represents a separate overlap problem and uses current data. The example stratum S contains 3 CBSAs that come from 3 initial strata. Old and new probabilities are based on population relative importance.
CBSA                        Initial stratum    Initial probability of selection    New probability
                                               within initial stratum              (πk)
Orlando-Kissimmee, FL       B338               .7334                               .8394
Naples-Marco Island, FL     B360               .1173                               .1290
Key West, FL                C328               .0359                               .0306

Here, k = 1, 2, 3 correspond to Orlando, Naples, and Key West, respectively. While each initial stratum contains numerous other previous PSUs, we are concerned only with the PSUs in the intersection of the initial strata and the current stratum S, and with their initial probabilities. The new probabilities, denoted πk, necessarily sum to 1 for k = 1 to n.

The cijk matrix is below, where i indexes the initial strata and j runs from 1 to ui, where ui is the number of possible outcomes for the set of PSUs in the ith old stratum. In this example ui is coincidentally 2 for all initial strata. When the initial stratum is B338, j = 1 when Orlando, FL was previously selected and j = 2 when another PSU in B338 was previously selected.

(i,j)     k=1       k=2       k=3
(1,1)     1         .1173     .0359
(1,2)     0         .1173     .0359
(2,1)     .7334     1         .0359
(2,2)     .7334     0         .0359
(3,1)     .7334     .1173     1
(3,2)     .7334     .1173     0

The solution to the optimization problem is the set of xijk, which maximizes the objective function:

k     (1,1)      (1,2)      (2,1)      (2,2)      (3,1)     (3,2)
1     0.2225     0          0          0.6149     0         0
2     0          0.0495     0.0817     0          0         0
3     0          0.0314     0          0          0         0

Pij is defined here:

          B338      B360      C328
j=1       .7335     .1173     .0359
j=2       .2665     .8827     .9641

Finally, the optimal set of conditional probabilities is below. Each entry is the sum over the initial strata i of xijk / Pij, with j set to the outcome ji observed for stratum i. For example, for (j1, j2, j3) = (1, 1, 1) and k = 1, the only nonzero term is 0.2225/.7335, or about .303, which matches the tabulated .3034 up to rounding of the displayed values.

j1    j2    j3     k=1       k=2       k=3
1     1     1      .3034     .6966     0
1     1     2      .3034     .6966     0
1     2     1      1         0         0
1     2     2      1         0         0
2     1     1      0         .8824     .1176
2     1     2      0         .8824     .1176
2     2     1      .6966     .1858     .1176
2     2     2      .6966     .1858     .1176

Since no PSU in the intersection was previously selected in our example, we necessarily have ji = 2 for all initial strata i = 1 to 3, and we take the bottom row of the table above for our conditional probabilities. Suppose instead that Orlando had previously been selected to represent initial stratum 1. In that case j1 = 1 in the table above, while j2 and j3 remain 2, since their PSUs were not previously selected to represent their initial strata.
We would take the 4th row for the conditional probabilities.

4. Comparing sample overlap results to independent selection

Independent selection is the sampling procedure that does not use overlap maximization in any way. After stratification, the total population of each PSU is divided by its stratum population to arrive at each PSU's selection probability. This is known as probability proportional to size (PPS) sampling, where the size measure is total population as provided by the decennial Census. With this PPS procedure, PSUs with relatively large populations have a higher probability of selection than relatively small PSUs. There is no conditioning on the previous sample in independent selection.

To determine which of these methods works best, the expected sample overlap is calculated as follows. First, an overlap PSU is defined by the 1 to 1 matching method described in Section 3. In 1 to 1 matching, each newly defined PSU corresponds to one and only one PSU in the previous frame. Each of these previous PSUs is clearly defined as either an overlap PSU (in the previous sample) or a non-overlap PSU (not in the previous sample). In some cases a new PSU may match to a "dummy area", in which case it is necessarily not an overlap PSU. We use this matching for all three methods under consideration to determine which new PSUs are considered overlap PSUs and which are not. Second, this determination is used to find the overlap probabilities, which sum to the expected value of overlap.

Stratum       PSU       Overlap PSU    PPS Probability    Overlap Probability
Stratum 1     PSU a     *              .75                .75
              PSU b                    .15                0
              PSU c                    .10                0
Stratum 2     PSU d     *              .5                 .5
              PSU e                    .5                 0

The Expected Value of Overlap is the Sum of Overlap Probability = 1.25

To determine the expected overlap, 1 to 1 matching was used after the maximized probabilities were calculated for each of these three methods.
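The expected-overlap calculation in the small table above can be written out as a short Python sketch. The data structure and the function name `expected_overlap_count` are illustrative choices; PSU a and PSU d are taken to be the overlap PSUs, as their nonzero overlap probabilities indicate.

```python
# Each PSU: (name, selection probability, True if it is an overlap PSU).
strata = {
    "Stratum 1": [("PSU a", 0.75, True), ("PSU b", 0.15, False), ("PSU c", 0.10, False)],
    "Stratum 2": [("PSU d", 0.50, True), ("PSU e", 0.50, False)],
}

def expected_overlap_count(strata):
    """Sum the selection probabilities of the overlap PSUs over all strata."""
    return sum(prob
               for psus in strata.values()
               for _name, prob, is_overlap in psus
               if is_overlap)

total = expected_overlap_count(strata)
print(total)  # 1.25
```

The same function applies to any of the three methods: only the selection probabilities change (PPS probabilities for independent selection, conditional probabilities for the Perkins and Ernst methods), while the overlap flags come from the 1 to 1 matching.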
To restate the matching conventions: the Perkins method used partial matching in its overlap maximization procedure, the Ernst (1986) method used 1 to 1 matching in its procedure, and independent selection did not use any overlap definition in its procedure; 1 to 1 matching was then used for all methods to find the expected overlap. The expected sample overlap, broken down by Census Division, for independent selection and the two overlap maximization methods for the 58 non-self-representing PSUs is as follows:

Division       PSUs in Design    Independent Selection    Perkins    Ernst (1986)
Division 1     2                 0.4529                   0.8635     1.0172
Division 2     4                 0.6929                   1.2570     1.6678
Division 3     8                 2.8817                   4.0453     4.8377
Division 4     4                 0.7950                   1.2120     2.0597
Division 5     14                3.1574                   3.7433     6.9912
Division 6     6                 0.7833                   1.0577     2.1836
Division 7     8                 2.0929                   2.6480     4.3588
Division 8     6                 1.4000                   2.4980     3.2277
Division 9     6                 0.9154                   1.0247     2.2353
Total          58                13.17                    18.349     28.59

5. Conclusion

Because of the strong desire to reduce the start-up and shut-down costs of implementing the sample, the CPI has chosen the Ernst (1986) method of overlap maximization. The Ernst (1986) method provides the highest expected overlap: 28.59 of the 58 PSUs, compared with 18.349 for the Perkins method of overlap maximization and 13.17 for independent selection.

Disclaimer: Any opinions expressed in this paper are those of the authors and do not constitute policy of the Bureau of Labor Statistics.

References

Ernst, L.R. (1986). Maximizing the overlap between surveys when information is incomplete. European Journal of Operational Research, Vol. 27, 192-200.

Ernst, L.R., Izsak, Y., and Paben, S.P. (2004). Use of Overlap Maximization in the Redesign of the National Compensation Survey. In JSM Proceedings, Survey Research Methods Section. Alexandria, VA: American Statistical Association.

Perkins, W. (1970). 1970 CPS Redesign: Proposed Method for Deriving Sample PSU Selection Probabilities Within 1970 NSR Strata. Memo to Joseph Waksberg, U.S. Bureau of the Census.