PDF file for Estimating Price Relatives for the U.S. Consumer Price Index When Priced Items are Restratified

ESTIMATING PRICE RELATIVES FOR THE U.S. CONSUMER PRICE INDEX
WHEN PRICED ITEMS ARE RESTRATIFIED
Mary Lee Fuxa, U.S. Bureau of Labor Statistics
2 Massachusetts Ave., N.E., Room 3655, Washington, D.C. 20212
Any opinions expressed in this paper are those of the author
and do not constitute policy of the Bureau of Labor
Statistics.
Key Words: Item Strata, Price Relative,
Quote-Level Weights
1. Introduction
Approximately every 10 years the fixed market basket
of commodities and services supporting the Consumer
Price Index is updated in a process referred to as a
revision of the index. Samples of specific items in the
fixed market basket are selected within a newly
designed, stratified sample of geographic areas on a
continuing rotation basis. The prices of the specific
items are quoted every month by representatives of the
Bureau of Labor Statistics.
Commodities and services have 70% relative
importance by weight in the index, and shelter has 30%.
For the January 1998 revised index, the stratification of
the items in the fixed market basket for commodities
and services was extensively redefined. For the 36
geographic areas that were not in the sample for the
December 1997 index, new samples of specific items
were selected for the 183 item strata. For the 51
geographic areas continuing in the sample: 24% of the
item strata were designated for new sampling; 33% did
not change in definition; 43% changed but were not
designated for new sampling due to budgetary
constraints.
This paper documents the adjustment in the monthly
estimation of the index that is necessary for the 43%
item strata until new samples are selected and rotated in
for the 51 geographic areas.
2. Method of Sampling and Formulation
of the Index and its Components
The geographic areas are referred to as the primary
sampling units (PSU’s). The PSU’s were selected to
represent 38 index areas. A PSU is classified by the
region it is in as Northeast, South, Midwest, or West.
It is also classified by its population size as selfrepresenting or non-self-representing.
A selfrepresenting PSU is large enough to be its own index
area.
An example is Tampa, Florida, and its
surrounding area. For each region, there is an index
area composed of non-self-representing PSU’s. The
index area is divided into geographic strata, and a single
PSU is selected from each geographic stratum. An
example of a PSU selected to represent the index area
for the South is Ocala, Florida, and its surrounding area.
Indexes are computed at the item-stratum, index-area
level each month. The indexes are weighted and
aggregated to produce the all-items U.S. index.
The sample of priced items for a combination of an
item stratum and index area is composed of replicate
samples. A self-representing PSU has 2 or 4 replicate
samples selected for it. A non-self-representing PSU is
one of the replicate samples for its index area, or a
component of a replicate sample.
Indexes are
computed at the replicate level each month for the
purpose of estimating index variance.
For the first stage of sampling for an item stratum
within a PSU, or replicate of the PSU, the item stratum
is divided into item categories referred to as entry-level
items (ELI’s). An example is the item stratum Cakes,
Cupcakes, and Cookies. It is composed of the ELI
Cakes and Cupcakes (Excluding Frozen) and the ELI
Cookies.
One or more ELI’s are selected with
probability proportional to the expenditure they
represent.
At each stage of sampling, multiple
selections of a sample unit are possible. A single
selection of an ELI is referred to as an ELI hit.
Each ELI has a corresponding sampling frame for
outlets in the PSU that sell the ELI. A bakery in the
food court of a shopping mall is an example of an outlet
that sells the ELI Cookies. More than one ELI may
share the same sampling frame. In the second stage of
sampling, two or more outlets from a frame are selected
with probability proportional to the expenditure they
represent. A single selection of an outlet is referred to
as an outlet hit. The outlets selected from a frame are
matched to the selected ELI’s that correspond to the
frame.
For a given combination of an ELI hit and an outlet hit,
a unique item within the outlet is selected with
probability proportional to the expenditure it represents.
An example of a unique item for the ELI Cookies is a
The expected value of the numerator (or denominator)
in formula 2 equals the total updated expenditure for
the item stratum in the PSU during month t (or t-1),
where updating reflects the change in price of items
purchased but not the change in quantity of items
purchased.
1-lb. bag of chewy-style chocolate-chip cookies with
walnuts, of a particular brand name. A single selection
of a unique item is referred to as a quote. For each
quote, the price of the unique item is quoted each
month.
An index I t , 0 , reflecting price change from month 0
to the current month t , is calculated at the item-stratum
replicate level by multiplying the index for the previous
month I (t −1), 0 by the estimate of price change between
3. Bridging of Samples for the 1998 Revised Index
and Adjustments in Price-Relative Estimation
For the 51 PSU’s continuing in the sample for the
January 1998 revised index, the sample of quotes were
mapped to codes for the newly defined item strata and
ELI’s. Sample sizes were tested at the national level
against the optimal sample sizes determined for the
1998 sample design. A sample for an item stratum was
defined as insufficient if the ratio of actual size to
optimal size was less than .50.
t and t-1 . This one-month price change is referred to
as the price relative Rt , (t −1) .
(1)
I t , 0 = I (t −1) , 0 Rt , (t −1) =
t
∏R
k =1
k , ( k −1)
Based on this test, 24% of the item strata had
insufficient samples and were designated for new
sample selection. For the 76% item strata that had
sufficient samples mapping to them, a process referred
to as quote-level bridging was developed. The process
involves: (1) coding of the quotes to retain their
association with both the old and the new ELI
definitions; (2) adjusting the quote-level weight for pt
For a replicate of a PSU that is self-representing, the
estimator for the price relative Rt , (t −1) is:
(2)


pt  
1




w
∑ ∑  n n P  a p  
( ELI hit =1) ( quote =1) 
a  c ,i , j 
 c c,i , j 
ELI hit, quote
n
n
c


 
1
 wa pt −1  

∑
∑
 n n P 
p a  c,i , j 
( ELI hit =1) ( quote =1) 
 c c,i , j 
 ELI hit, quote
n
nc
Where:
n
= sample size for ELI’s in the
item stratum
nc
= sample size for outlets in the
and pt −1 shown in formula 2.
Samples were bridged for 33% of the item strata where
there was no change in the item-stratum or ELI
definitions. For estimation, no adjustment of the quotelevel weights is necessary for these item strata.
Samples were also bridged for 43% of the item strata
where there were changes in definition for the itemstratum and/or ELI’s. Since the quotes in a bridged
sample were not selected based on sampling stages
determined by the new definitions, the expected sample
proportion associated with a quote for a sampling stage,
given the new definition, would not equal the original
probability of selection. Thus, the expected value of
the numerator in formula 2 would not equal the updated
total expenditure for the item stratum in the PSU during
month t .
frame corresponding to ELI c
Pc ,i , j
= probability that ELI c and
outlet i and unique item j are
selected given the item stratum
and PSU

p 
 wa t 
p a  c ,i , j

The following formulation explains these concepts
more explicitly. Formula 3 is the basis for the current
computation of the numerator of Rt , (t −1) . It can be
= measurement of expenditure
in month a for unique item j
in outlet i for ELI c which
has been updated by the ratio
of the price in month t to the
price in month a
shown that formula 3 is equal to the numerator of
formula 2.
2
It is a random variable, such that:
(3)

rc

∑
∑

n (rc nc ) Pc,i , j
(c =1) ( quote =1) 

N
rc nc

 
 wa p t  

p a  c ,i , j 

 quote
N c N c ,i
∑∑ r
i =1 j =1
c ,i , j
= rc nc
Where:
Where:
N =
number of ELI’s in the item stratum
rc =
number of sample hits for ELI c .
It is a random variable, such that:
Nc
= number of outlets in the
frame corresponding to
ELI c
N c ,i =
number of unique items
N
∑r
c=1
Note:
c
= n
in outlet
(rc nc )
*
rc nc
, the factor
is used which has a value from
0 to rc nc . It is computed each month,
The probability of selection associated with a quote is:
Where:
c =
i =
j =
event that ELI
c
event that outlet
(4)
monthly computation of the factor (rc nc ) is based on
quotes coded for the old ELI, not the new, and the
expected sample proportion for the unique item equals
the probability of selection for the unique item given
the old ELI. Also, the factor rc / n does not change
and is based on the old ELI. Thus, estimation is no
different for a bridged sample than a newly selected
sample, but the supporting sampling structure and
coding system are different.
*
is selected
i
is selected
event that unique item
selected
j
is
The expected sample proportion for ELI c is:
r 
E  c  = Pr (c )
n
Levels 2, 3, and 4 involve item strata where the old
ELI’s in one or more old item strata may be split apart
and/or recombined into new ELI’s. The newly defined
ELI’s and the unchanged ELI’s are grouped into new
item strata. An example is based on 2 old item strata:
(1) White Bread, composed of the old ELI White Bread;
and (2) Other Breads, Rolls, Biscuits, and Muffins,
composed of 2 old ELI’s: (a) Bread Other Than White
and (b) Rolls, Biscuits, Muffins (Excluding Frozen). A
new item stratum Bread is composed of the new ELI
Bread, and a second new item stratum Fresh Biscuits,
Rolls, and Muffins is composed of the new ELI Fresh
Biscuits, Rolls, and Muffins. For estimation, a poststratification method is used. Substrata within a new
(5)
The expected sample proportion for a unique item given
ELI c is selected is:
(6)
 rc,i , j

| c  = Pr (i | c) Pr (j | c, i)
E 
 rc nc 
Where:
c
Level 1, referred to as the 1987 method, involves item
strata where the old ELI’s within an item stratum may
be split apart and/or recombined into new ELI’s.
However, the definition of the item stratum does not
change. An example is the item stratum Rice, Pasta,
and Cornmeal. The old ELI Rice and the old ELI
Macaroni, Similar Products, and Cornmeal were
combined into the new ELI Rice, Pasta, and Cornmeal.
For estimation, a simple technique is used that was
developed for the January 1987 revised index. The
and its value depends on the number of
usable quotes for that month and other
contingencies (such as the duplication
of certain quotes to compensate for the
subsampling and dropping of quotes).
Pc,i,j = Pr (c) Pr (i | c) Pr (j | c, i)
ELI
The adjustment of the quote-level weights for the 43%
item strata is broken down into four levels of
complexity. The bridged sample for an item stratum
and PSU, or replicate of the PSU, is assigned to the
lowest level of complexity that it qualifies for.
In practice, an adjustment is made in
formula 3. Instead of
i for
rc ,i , j = number of quotes for unique
item j in outlet i for ELI c .
3
item stratum are defined and, for the numerator (or
denominator) of the price relative, the estimate of total
updated expenditure in month t (or t-1) is the sum of
the substratum estimates of updated expenditure in
month t (or t-1). For the new item stratum Bread, the
first substratum is White Bread and the second
substratum is Bread Other Than White. Formula 7
shows how formula 3 is adjusted for the poststratification method.
For level 2, referred to as stage-1 poststratification, a
substratum is defined as the intersection of the old ELI
and the new item stratum. In formula 7, each
intersection c is its own substratum. For a bridged
sample to qualify for estimation at level 2, each of the
substrata for the new item stratum must be represented
by at least one bridged quote. The probability of
selection for intersection c given substratum s is equal
to 1.00 and is incorporated into Ps ,c ,i , j . The sample
proportion for intersection c given substratum s,
rs ,c / n s , is equal to 1.00 . If the outlet sample for the
(7)

rs ,c

∑
∑∑
*

s =1 c =1 qt =1  n s (m s ,c ) Ps ,c ,i , j

N
N s ms , c
Where:
N


 w p t 

 a p a 

s
,
c
,
i
,
j

qt
old ELI is split apart when quotes for the old ELI are
bridged to the new item stratum, then the probability of
selection for an outlet given intersection c is adjusted
and the probability is incorporated into Ps ,c ,i , j .
(
Finally, the monthly computation of the factor m s ,c
= number of substrata s
= intersection of the old ELI
and the new item stratum
Ns
= number of intersections c
in substratum
ns
rs ,c
For level 3, referred to as stage-2 poststratification, a
substratum is defined as the intersection of the old item
stratum and the new item stratum. For a bridged
sample to qualify for estimation at level 3, each of the
substrata for the new item stratum must be represented
by at least one bridged quote. The method of
estimation is the same as it is for level 2, with the
following exceptions: (1) A substratum s may contain
more than one intersection c . (2) The probability of
selection for intersection c is conditioned on substratum
s and may be less than 1.00 . (3) The sample proportion
for intersection c, rs , c / n s , is conditioned on
s
= number of old-ELI sample hits
bridged to substratum
s
= number of sample hits for
intersection
c,
Ns
∑r
c =1
s ,c
such that:
= ns
substratum s and may be less than 1.00 .
Note: If intersection c is
smaller than the old
ELI, then
rs ,c
For level 4, referred to as stage-2 poststratification with
adjustment, a substratum is defined the same as it is for
level 3. However, at least one substratum is not
represented by at least one bridged quote. The method
of estimation is the same as it is for level 3 with the
addition of an adjustment factor in the quote-level
weight to account for missing substrata. This factor
equals the total expenditure at time b for the new item
stratum divided by the sum of the expenditures at time
b for the substrata having a bridged sample. (Time b is
between month a, the sample-rotation reference month
for the PSU, and January 1998.) The adjustment factor
is important when estimating a higher level price
relative where the summation in the numerator (or
denominator) crosses over replicates and/or geographic
strata.
may
equal 0 . Otherwise,
it will equal the original
number of sample hits
for the old ELI.
m s ,c =
number of quotes bridged to
intersection
Ps ,c ,i , j =
*
is based on quotes coded for intersection c, and, as a
result, the sample proportion for a unique item is
conditioned on intersection c .
in the new item stratum
c
)
c
c
and outlet i and unique item j
are selected given substratum s
probability that intersection
4
The number for replicates C and D is less than the
number of samples shown in Table 1 since: (1) A and
B replicate samples were not newly selected for 3 item
strata and, as a result, control and test relatives could
not be paired; (2) only nonimputed price relatives are
used. An 8-month price relative is defined to be
imputed if no quotes in its sample are usable in any
month from January 1998 through August 1998.
The adjustment of the quote-level weights for the 43%
item strata is being implemented in 2 stages.
Implementation of stage 1 was completed before the
production of the January 1998 revised index. In stage
1, the level-1 adjustments were made for the qualifying
bridged samples and the level-2 adjustments were made
for the samples qualifying for levels 2, 3, and 4. In
stage 2, the level-3 and level-4 adjustments will be
made.
Plot 1 shows no notable difference in the mean 8-month
price relative between the levels of adjustment or
between the replicates.
Likewise, no notable
divergence of the set of test means from the set of
control means is indicated. The divergence that appears
at level 3 is fairly negligible relative to the standard
deviations at level 3 in Plot 2, which is examined
below. These results could indicate that estimates from
the reweighted bridged samples are as adequate as the
estimates from the newly selected samples. However,
such a conclusion is premature given the limited time
frame and geographic representation of the analysis.
4. Empirical Findings
An analysis of the bridged samples and adjustment of
quote-level weights was conducted for the PSU
consisting of New York City. It is self-representing
and has 4 replicate samples selected for each item
stratum. For 180 of the 183 item strata, 2 replicate
samples (A and B) were newly selected based on the
new item-stratum and ELI definitions. The third and
fourth replicate samples (C and D) were bridged. For
the remaining 3 item strata, all 4 replicate samples were
bridged.
Plot 2 shows the standard deviation of the 8-month
price relatives for each level of adjustment for each
replicate A, B, C, and D. At level 0, a divergence of the
set of test standard deviations from the set of control
standard deviations is not obvious. If there is a
divergence, it could indicate a sample-rotation effect.
At level 2, a notable divergence does exist. This
divergence could indicate that the stage-1
poststratification method results in an estimate with
lower variance when each substratum for a new item
stratum is represented by at least one bridged quote.
However, the divergence may fade over time. Also, it
may not occur in other geographic areas, or it may
depend on the particular groups of items in the analysis,
or it may be due to a bridging effect other than
reweighting.
The analysis is based on the stage-1 implementation of
the adjustment of the quote-level weights. For stage 1,
the bridged samples assigned to the 3rd and 4th levels of
adjustment are combined and coded as level 3. For
these bridged samples, the level-2 adjustments were
made with no adjustment factor added to account for
missing substrata.
At the time of the analysis, 8-month price relatives,
reflecting price change from December 1997 to August
1998, were available. Price relatives estimated from the
replicate A and B newly selected samples (month a =
May 1995) are compared against price relatives
estimated from the replicate C and D bridged samples
(month a = May 1994). The term control relative is
defined as an A or B price relative, and the term test
relative is defined as a C or D price relative. A control
relative for an item stratum is coded the same level of
adjustment as its paired test relative for the same item
stratum, where relatives for replicates A and C are
paired and relatives for replicates B and D are paired.
5. Conclusion
The poststratification method described in this paper
provides a flexible tool when the reclassification of
sampled items is necessary. Further study of both the
immediate and long-term effects on the consumer price
index and its component indexes is warranted.
Table 1 shows how the C and D replicate samples break
down for the 183 item strata. The average number of
active quotes per sample is 4.9, where active is defined
as a combination of an ELI and outlet that was still
existing in January 1998.
Plot 1 shows the mean 8-month price relative for each
level of adjustment for each replicate A, B, C, and D.
The number of price relatives coded for each level of
adjustment is listed by replicate under Plots 1 and 2.
5
Table 1
6. References
Type of C or D
Replicate Sample
Newly Selected
Bridged
Level of Adjustment
0 No Change
1 1987 Method
2 Stage-1 Poststrat.
3 Stage-1 Poststrat.
No Adjustment Factor
Designated as Bridged
but No Quotes to Bridge
All
Bureau of Labor Statistics. BLS Handbook of Methods,
September 1992, pp. 190-191.
Casady, Robert J., Alan H. Dorfman, and Richard L. Valliant.
“Sample Design and Estimation for the CPI,” BLS Note, July
2, 1996.
Lane, Walter F. “Changing the Item Structure of the
Consumer Price Index,” Monthly Labor Review, December
1996, pp. 18-25.
Plot 1:
MEAN 8-MONTH RELATIVE FOR REPLICATES A, B, C, D BY LEVEL OF ADJUSTMENT *
Plot 2:
STANDARD DEVIATION FOR REPLICATES A, B, C, D BY LEVEL OF ADJUSTMENT *
Number of Relatives:
A
B
C
D
*
46
43
46
43
13
13
13
13
13
15
13
15
23
27
23
27
178
52
56
100
A control relative is coded the same level of adjustment as its paired test relative, where
relatives for replicates A and C are paired and relatives for replicates B and D are paired.
6
Number of
Samples &
Proportion
90
.24
118
32
55
57
.32
.09
.15
.16
14
366
.04
1.00