Quantitative Marketing and Economics, 3, 71–93, 2005.
© 2005 Springer Science + Business Media, Inc. Printed in The United States.
Estimating Discrete Joint Probability Distributions
for Demographic Characteristics at the Store Level
Given Store Level Marginal Distributions
and a City-Wide Joint Distribution
CHARLES J. ROMEO∗
Economist, Antitrust Division, U.S. Department of Justice, Washington, D.C., 20530
E-mail: [email protected]
Abstract. This paper provides a solution to the problem of estimating a joint distribution using the associated
marginal distributions and a related joint distribution. The particular application we have in mind is estimating joint
distributions of demographic characteristics corresponding to market areas for individual retail stores. Marginal
distributions are generally available at the census tract level, but joint distributions are only available for Metropolitan Statistical Areas which are generally much larger than the market for a single retail store. Joint distributions
over demographics are an important input into mixed logit demand models for aggregate data. Market shares that
vary systematically with demographics are essential for relieving the restrictions imposed by the Independence
from Irrelevant Alternatives property of the logit model.
We approach this problem by formulating a parametric function that incorporates both the city-wide joint
distributional information and marginal information specific to the retail store’s market area. To estimate the
function, we form moment conditions equating the moments of the parametric function to observed data, and we
input these into a GMM objective.
In one of our illustrations we use four marginal demographic distributions from each of eight stores in Dominick’s
Finer Foods data archive to estimate a four dimensional joint distribution for each store. Our results show that our
GMM approach produces estimated joint distributions that differ substantially from the product of marginal distributions and emit marginals that closely match the observed marginal distributions. Mixed logit demand estimates
are also presented which show the estimates to be sensitive to the formulation of the demographics distribution.
Key words. mixed logit, discrete joint probability distributions, generalized method of moments
JEL Classifications: C51, C81
1. Introduction
The advantage of the mixed logit model for aggregate data pioneered by Berry (1994)
and Berry, Levinsohn, and Pakes (1995) (henceforth BLP) is that it allows one to solve
for the primitives of a flexible differentiated products model using only aggregate data on
prices, quantities sold, and product characteristics. Heterogeneity is introduced by interacting randomly generated consumer tastes with the characteristics of products in a logit
demand function. The BLP paper has been followed by a steady stream of papers in the
economics and marketing literatures, as it offers the possibility of flexible inference with
readily available data.1
∗ The views expressed are not purported to reflect those of the United States Department of Justice.
However, the flexibility engendered by mixing the logit model does not come about
magically. Generating elasticities relieved of the restrictions imposed by the Independence from Irrelevant Alternatives (IIA) property of the logit model generally requires
information beyond just aggregate data on product characteristics. BLP recognized this
in their seminal paper by introducing demographic data on income in addition to normal random variates to represent individual types. Extensions produced since BLP have
pushed in the direction of incorporating additional demographic information into the model. Nevo (2001) and Davis (1998) introduced draws
from a joint distribution of demographic information into the second stage of the demand
hierarchy. Berry, Levinsohn, and Pakes (2003) and Petrin (2002) incorporated moment
conditions based on consumer survey data into the GMM objective to improve the fit
of certain aspects of the model. Dube (2002) and Hendel (1999) present multiple discrete choice models that completely mix micro with aggregate data, while Chintagunta
and Dube (2003) present a BLP type model in which they integrate household level purchase data with store level market share data to improve the estimates of both the mean
response and the heterogeneity distribution over what could be obtained with a single data
source.
A difficulty that one sometimes faces with these models is in obtaining a joint distribution
of demographics that matches the contours of the market for the products under study. The
particular application we have in mind for this paper is the Dominick’s store level data
available on the University of Chicago, Graduate School of Business web site.2 This data
archive contains as many as 400 weeks of store level observations on a myriad of supermarket
products. In addition, a file of store level demographic distributions is available that provides
a snapshot of the characteristics of the households and the local economy for each of the 89
Dominick’s stores. However, all the distributions in the demographic file are marginals, and
this limits their usefulness for mixing with BLP class models. One is either limited to drawing
a single demographic characteristic, as BLP did in their original work, or to forming store
level joint distributions as a product of marginals and hoping that the difference between
joint distributions approximated in this manner and the true store level joint distributions is
empirically unimportant.3
This limitation is not specific to the Dominick’s data. Marginal distributional information is available for numerous demographic variables at the census tract level while joint
distributions can only be formed for a few variables. Consequently, the potential is there for
researchers to face this limitation whenever the focus is on modeling demand at the retail
outlet level and the market area for the outlet’s goods is a subset of the census tracts in the
Metropolitan Statistical Area (MSA).4
1 Kadiyali, Sudhir, and Rao (2001) provide a survey.
2 http://gsbwww.uchicago.edu/research/mkt/MMProject/DFF/DFFHomePage.html.
3 Meza and Sudhir (2003) appear to take this approach.
4 At the MSA or Primary MSA (PMSA) level, joint distributional information is readily available for a wide
variety of variables from the Current Population Survey web site (http://www.bls.census.gov/cps/ads/sdata.htm).
The innovation offered by this paper is to develop a Generalized Method of
Moments (GMM) approach for consistently estimating discrete store level joint distributions
by combining discrete store level marginal distributions with information from a discrete
joint distribution for the same set of variables from the associated MSA. The essence of our
approach is to use the available store level and MSA level information to form an initial
estimate of the joint distribution of interest that contains all of the elements of variation
of the true store level distribution. For example, suppose we are interested in the joint
distribution of income and race for an individual store. To form an initial estimate of this
joint distribution, we could specify a parametric function that incorporates the MSA level
joint distribution over income and race to capture joint variation in these two variables
that is not specific to the individual retail outlet, and the store level marginal distributions
for income and race to capture information specific to the individual store. This function
varies over both income and race, and is specific to each individual retail outlet. Moment
conditions are then formed that equate moments formulated using this parametric function
to observed moments.
Previous researchers have faced this problem in other contexts and two previous solutions
have been offered. To our knowledge the first solution is attributed to Deming and Stephan
(1940). Their method, Iterative Proportional Fitting (IPF), was used to estimate internal
cells of a two-way contingency table for the total census population. The inputs they used
were a two-way contingency table for the same variables generated from a five percent
census sample and marginal frequencies for the total census population. The approach
is iterative in that row frequencies are matched first, then column frequencies. Matching
column frequencies alters the row frequencies and vice-versa, so each is then updated in
second and subsequent iterations until convergence is achieved. The objective function
underlying IPF is a constrained minimum chi-square. This is an intrinsically statistical
objective, but, to our knowledge, the statistical properties of the IPF estimator have never
been developed.5 We do not develop them here as that is outside the scope of this research.
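The alternating row and column updates described above can be sketched in a few lines of Python. This is a minimal illustration for a two-way table, not Deming and Stephan's original procedure; the function name `ipf_two_way`, the stopping tolerance, and the numbers in the example are hypothetical.

```python
def ipf_two_way(seed, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Rescale a positive two-way table until its margins match the targets.

    seed        : list of rows holding the starting joint proportions
    row_targets : desired row margins (summing to one)
    col_targets : desired column margins (summing to one)
    """
    table = [row[:] for row in seed]              # work on a copy
    n_rows, n_cols = len(table), len(table[0])
    for _ in range(max_iter):
        # Step 1: match the row margins.
        for j in range(n_rows):
            scale = row_targets[j] / sum(table[j])
            table[j] = [cell * scale for cell in table[j]]
        # Step 2: match the column margins; this disturbs the rows slightly.
        for k in range(n_cols):
            col_sum = sum(table[j][k] for j in range(n_rows))
            scale = col_targets[k] / col_sum
            for j in range(n_rows):
                table[j][k] *= scale
        # Stop once the row margins are again close to their targets.
        gap = max(abs(sum(table[j]) - row_targets[j]) for j in range(n_rows))
        if gap < tol:
            break
    return table


# Hypothetical inputs: a seed table and the margins it should be fitted to.
seed = [[0.30, 0.20],
        [0.10, 0.40]]
fitted = ipf_two_way(seed, row_targets=[0.6, 0.4], col_targets=[0.35, 0.65])
```

Because each step only multiplies whole rows or whole columns by a constant, the fitted table keeps the odds ratio of the seed table while taking its margins from the targets.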
More recently, Putler, Kalyanam, and Hodges (1996) have offered a Bayesian solution.
Their interest is in estimating joint distributions over demographics to improve the targeting
of marketing efforts. They too use MSA level Census data to, in their case, provide a prior
joint distribution and they use smaller area marginal distributions as data inputs to the
posterior. They form a posterior distribution over “free cells,” i.e., those not constrained by
the marginal information. In comparison to our moment based approach, the Putler et al.
Bayesian approach has the advantage of incorporating the structural likelihood information
which should improve the fit of the model to the data. On the other hand, the number of
parameters to be estimated grows rapidly as the number of free cells increases in both
the number of cells in a given dimension and the number of dimensions. This limits the
feasibility of the Putler et al. approach to estimating joint distributions with relatively few
free cells.
5 This was an active area of research from the 1940s until at least the early 1960s. Researchers offered
a variety of modifications to Deming and Stephan’s IPF algorithm (Stephan, 1942; Smith, 1947; Friedlander,
1961), but the focus was generally on providing a more efficient algorithm to reduce computational costs. In the
days when computational power was defined by pencil and paper, statistical distributions that were not feasible
to calculate may have been perceived as too esoteric to invest in describing. The most recent discussion of IPF
that we have found is in Bishop, Fienberg, and Holland (1975) which also does not contain any discussion of
the statistical properties of this estimator.
Two illustrations are provided. In the first illustration, we use an example from Putler et al.
to facilitate a comparison of the three approaches to solving this problem: IPF, Bayesian, and
GMM. This example shows that all three methods produce similar estimates of the joint
distribution. In the second illustration, we estimate a four-dimensional joint distribution
over demographics for each of eight Dominick’s stores using only the GMM approach.
Our results show that the model produces an excellent fit to the moment conditions, and
that the estimated joint distributions produce marginal distributions that are generally the
same as the observed marginals to at least two decimal places. In addition, we evaluate the
empirical importance of the demographic distribution formulation in a mixed logit demand
model. To do this, we generate two sets of estimates for an equilibrium mixed logit demand
and supply system for bath tissue data from these eight Dominick’s stores. For the first
estimates, we draw the demographics from a joint distribution formulated as a product
of marginal distributions, while for the second estimates we draw demographics from the
joint distribution estimated by GMM. The results show substantial, though generally not
statistically significant, differences between the estimates.
The remainder of this paper is organized as follows. Section 2 contains the methodology for initializing and estimating store level discrete joint distributions. This section
is developed in five parts. In part one, we formulate the parametric function and moment conditions for a two-by-two discrete probability distribution. In part two, we formulate the parametric function for the general case, while part three contains the associated moment conditions. Differences in model parameterization for the GMM, IPF,
and Bayesian approaches to inference are discussed in part four, and part five discusses GMM estimation. Section 3 contains the two illustrations, and Section 4 contains
conclusions.
2. Formulating and estimating a discrete joint distribution
2.1. The two-by-two case
Suppose we observe joint probabilities for a city-wide area, and marginal probabilities for
the market area associated with a retail outlet within the city as in Table 1.

Table 1. A city-wide joint probability distribution with associated
retail outlet marginal distributions for the two-by-two case.

            City-wide joint distribution      Retail outlet marginal distributions
                       z2
   z1             1          2                     z1           z2
   1             c11        c12                    d1·          d·1
   2             c21        c22                    d2·          d·2

subject to: c_jk ≥ 0, Σ_{j,k} c_jk = 1; d_·j, d_k· ≥ 0, Σ_j d_·j = 1, Σ_k d_k· = 1.

Our goal is to use this information to parameterize a joint probability distribution for the
retail outlet. Let p(θ) = (p11(θ), p12(θ), p21(θ), p22(θ)) be the unknown joint probabilities
formulated in terms of observed data and unknown parameters θ. Suppose now we take log
odds transformations of the data in Table 1 and incorporate this information into a logistic
distribution for p(θ) as follows,

\[
p_{jk}(\theta) = \frac{\exp\left( A_{jk} \ln\frac{c_{jk}}{c_{22}} + \beta_1 I_{[j=1]} \ln\frac{d_{j\cdot}}{d_{2\cdot}} + \beta_2 I_{[k=1]} \ln\frac{d_{\cdot k}}{d_{\cdot 2}} \right)}
{1 + \sum_{(s,r) \neq (2,2)} \exp\left( A_{sr} \ln\frac{c_{sr}}{c_{22}} + \beta_1 I_{[s=1]} \ln\frac{d_{s\cdot}}{d_{2\cdot}} + \beta_2 I_{[r=1]} \ln\frac{d_{\cdot r}}{d_{\cdot 2}} \right)}. \tag{1}
\]
The log odds transformations are formed by dividing each element of the distributions in
Table 1 by the last element in that distribution and then taking logs. Since, by construction,
ln(c22/c22) = ln(d2·/d2·) = ln(d·2/d·2) = 0, the specifications for p12(θ), p21(θ), p22(θ)
each have fewer terms than that for p11(θ). Define A = [A_jr], j, r = 1, 2, and θ = (A, βj, j =
1, 2). The I[·] are indicator functions equal to one if the condition in the brackets is satisfied.
The advantage of specifying a parametric form for p(θ) is that it contains all the elements
of variation of the true unknown joint distribution. The ln(c_jk/c22) terms reflect
the joint variation at the city level, while the ln(d_j·/d2·) and ln(d·k/d·2) terms incorporate
information specific to the retail outlet into p(θ) that adjusts the city level joint variation.
Using a parametric form for p(θ) enables us to fit the moment conditions with fewer
parameters than required for either the IPF or Bayesian approaches. In addition, the number
of parameters to be estimated will grow more slowly with problem size than for either of
these other approaches.
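To make the two-by-two parametric form concrete, the following sketch evaluates equation (1) for every cell. It is only an illustration under stated assumptions: the function name `p_two_by_two` and the example probabilities are hypothetical, and Python's zero-based index 1 plays the role of the "last" category (subscript 2) in the text.

```python
import math


def p_two_by_two(c, d_row, d_col, A, beta1, beta2):
    """Evaluate equation (1) for every cell of a two-by-two table.

    c      : city-wide joint probabilities, c[j][k]
    d_row  : outlet marginal for z1, (d_1., d_2.)
    d_col  : outlet marginal for z2, (d_.1, d_.2)
    A      : 2x2 weights on the city-wide log odds (A[1][1] never matters)
    """
    weights = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        for k in range(2):
            x = math.log(c[j][k] / c[1][1])       # city-wide log odds
            y1 = math.log(d_row[j] / d_row[1])    # zero for the last z1 category
            y2 = math.log(d_col[k] / d_col[1])    # zero for the last z2 category
            weights[j][k] = math.exp(A[j][k] * x + beta1 * y1 + beta2 * y2)
    total = sum(sum(row) for row in weights)      # the (2,2) weight is exp(0) = 1
    return [[w / total for w in row] for row in weights]


# Hypothetical inputs.
c = [[0.30, 0.20],
     [0.10, 0.40]]
d_row, d_col = [0.6, 0.4], [0.35, 0.65]
ones = [[1.0, 1.0], [1.0, 1.0]]

start = p_two_by_two(c, d_row, d_col, ones, 1.0, 1.0)   # A = [1], both betas = 1
same = p_two_by_two(c, d_row, d_col, ones, 0.0, 0.0)    # betas = 0: city-wide table
```

With every element of A equal to one and both β equal to zero, the function returns the city-wide table itself, which is a convenient check that the log odds terms are wired up as in (1).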
We form three sets of moment conditions and an associated GMM objective to consistently
estimate θ. The first set of conditions is formed as the difference between the estimated
and observed marginals.6

\[
p_{j\cdot}(\theta) - d_{j\cdot} = 0, \quad j = 1, 2, \qquad
p_{\cdot r}(\theta) - d_{\cdot r} = 0, \quad r = 1, 2. \tag{2}
\]

Given the adding up constraints on the marginal distributions, only two of the four
moment conditions above are independent. Using these four moment conditions alone will
only enable us to consistently estimate β1 and β2 with A set equal to [1].7 To estimate parameters
in A we introduce a second condition relating the city-wide and retail outlet covariances.

\[
\mathrm{cov}(z_1, z_2; \theta) - \text{city-wide } \mathrm{cov}(z_1, z_2) = 0, \tag{3}
\]

where

\[
\mathrm{cov}(z_1, z_2; \theta) = \sum_{j,k=1,2} (z_{1j} - E[z_1; \theta])(z_{2k} - E[z_2; \theta])\, p_{jk}(\theta),
\]

and

\[
E[z_1; \theta] = (1/2) \sum_{j} z_{1j}\, p_{j\cdot}(\theta),
\]

and E[z2; θ] is similarly formulated. Covariance is not a very meaningful measure in discrete distributions. We chose to use a condition based on covariance discrepancies because covariances
are the simplest moments that are formulated using bivariate distributional information.
This condition penalizes differences in the estimated and city-wide bivariate distributions,
p_jk(θ) and the c_jk respectively. Including it improved the model fit in both our illustrations.
The model in (1) contains five parameters, since A22 does not enter the model. For purposes of
parsimony and identification, we structure A as the product of two 2 × 1 vectors α1 and α2,
such that A = α1 ∗ α2′, and set one of these vectors to a column of ones.8
6 It is a slight abuse of notation to set the following moment conditions exactly to zero. Rather, the GMM
objective will make the discrepancies between the estimated and observed moments as small as possible. We
extend our notation to remedy this abuse in Section 2.3.
For the third set of conditions we specify the joint distribution for each cell as the product
of a conditional distribution derived from (1) and a retail outlet marginal distribution. Taking
the difference between two formulations of each cell provides the moment conditions.
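Specifically, form moment conditions as

\[
\begin{aligned}
& d_{j\cdot}\, p_{jr}(\theta)/p_{j\cdot}(\theta) - d_{\cdot r}\, p_{jr}(\theta)/p_{\cdot r}(\theta) \\
&\quad = d_{j\cdot}\, p_{\cdot r}(\theta) - d_{\cdot r}\, p_{j\cdot}(\theta) = 0, \qquad j, r = 1, 2.
\end{aligned} \tag{4}
\]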
As the second line of (4) shows, this condition simplifies to a difference of products of
marginal distributions of d and p. There is a condition in the form of (4) for each cell in
the joint probability distribution, and this penalizes the model for any discrepancies in the
joint probabilities. Since this condition relies on the same sample moments as (2), it does
not increase the number of parameters in θ that can be identified, but, as our illustrations
show, including this condition improves model fit.
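The three sets of conditions for the two-by-two case can be stacked into one vector and squared, as sketched below. This is a hedged illustration rather than the code behind the paper's results: the function names are hypothetical, the support points z1 = z2 = (1, 2) are an assumption, and the covariance uses the usual expectation Σ_j z_1j p_j·(θ) with no additional scaling factor.

```python
def two_by_two_moments(p, c, d_row, d_col, z1=(1.0, 2.0), z2=(1.0, 2.0)):
    """Stack conditions (2), (3) and (4) for a candidate 2x2 table p."""
    p_row = [p[0][0] + p[0][1], p[1][0] + p[1][1]]    # p_j.
    p_col = [p[0][0] + p[1][0], p[0][1] + p[1][1]]    # p_.r

    def cov(table):
        # Covariance of (z1, z2) under a 2x2 probability table.
        row = [sum(table[j]) for j in range(2)]
        col = [table[0][k] + table[1][k] for k in range(2)]
        e1 = sum(z1[j] * row[j] for j in range(2))
        e2 = sum(z2[k] * col[k] for k in range(2))
        return sum((z1[j] - e1) * (z2[k] - e2) * table[j][k]
                   for j in range(2) for k in range(2))

    marginal = ([p_row[j] - d_row[j] for j in range(2)]
                + [p_col[r] - d_col[r] for r in range(2)])        # condition (2)
    covariance = [cov(p) - cov(c)]                                # condition (3)
    cross = [d_row[j] * p_col[r] - d_col[r] * p_row[j]
             for j in range(2) for r in range(2)]                 # (4), second line
    return marginal + covariance + cross


def gmm_objective(moments):
    """Sum of squared moment discrepancies."""
    return sum(m * m for m in moments)


# Hypothetical check: if the outlet marginals equal the city-wide margins,
# the city-wide table itself satisfies every condition.
c = [[0.30, 0.20],
     [0.10, 0.40]]
at_city = two_by_two_moments(c, c, d_row=[0.5, 0.5], d_col=[0.4, 0.6])
shifted = two_by_two_moments(c, c, d_row=[0.6, 0.4], d_col=[0.35, 0.65])
```

The stacked vector has nine entries in this case: four marginal conditions, one covariance condition, and four conditions of type (4).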
7 For problems larger than 2×2, enough independent marginal moment conditions are available to make elements
in A estimable with these conditions alone, at least in principle. In one of our illustrations, however, estimating
the βs and elements in A with just these conditions produced estimated joint distributions that were very close
to joint distributions formulated as a product of marginal distributions.
8 In the 2 × 2 case we still have four parameters and only three independent moment conditions if A is structured
this way, and hence we still could not estimate either α vector. This is only a problem for this particular case.
2.2. The general case: Formulating an initial estimate of the store level joint probabilities
Let $z = (z_1, \ldots, z_J)$ be a discrete random vector. For each $z_j$, $j = 1, \ldots, J$, let
$m_j = (m_{j1}, \ldots, m_{jk_j})$ be a vector of $k_j < \infty$ support points, and let
$m = \{(m_{1\ell_1}, m_{2\ell_2}, \ldots, m_{J\ell_J}) : \ell_j = 1, \ldots, k_j,\ j = 1, \ldots, J\}$ denote the set of all $J$ dimensional
support points for $z$. Indexing stores by $s = 1, \ldots, S$, our goal is to consistently
estimate the true joint probabilities $P^0_z(m \mid s) = [P^0_z(m_{1\ell_1}, m_{2\ell_2}, \ldots, m_{J\ell_J} \mid s)]_{(k_J \cdots k_2) \times k_1}$
(joint over the random vector $z$) for each $s$, given known joint probability $C_z(m) =
[C_z(m_{1\ell_1}, m_{2\ell_2}, \ldots, m_{J\ell_J})]_{(k_J \cdots k_2) \times k_1}$ for the city as a whole, and known marginal distributions $D_{z_j}(m_j \mid s) = [D_{z_j}(m_{j\ell_j} \mid s)]_{k_j \times 1}$ (marginal with respect to the $z_j$).
Our first step in estimating $P^0_z(m \mid s)$ is to use the available information to parameterize
a function, say $P_z(m \mid s; \theta)$, that contains all of the elements of variation in $P^0_z(m \mid s)$. To
formulate $P_z(m \mid s; \theta)$ we generate the necessary data inputs from log odds transformations
of our known city-wide and store level distributions. From the city-wide distribution we
generate the data

\[
x(m_{1\ell_1}, \ldots, m_{J\ell_J}) = \ln \frac{C_z(m_{1\ell_1}, \ldots, m_{J\ell_J})}{C_z(m_{1k_1}, \ldots, m_{Jk_J})},
\quad \ell_j = 1, \ldots, k_j,\ j = 1, \ldots, J, \tag{5}
\]

while we use the store level marginals to provide data

\[
y(m_{j\ell_j} \mid s) = \ln \frac{D_{z_j}(m_{j\ell_j} \mid s)}{D_{z_j}(m_{jk_j} \mid s)},
\quad \ell_j = 1, \ldots, k_j,\ j = 1, \ldots, J, \tag{6}
\]

for each $s$. It is useful to organize the matrix $x = [x(m_{1\ell_1}, \ldots, m_{J\ell_J})]$ so that it is of
dimension $(k_J k_{J-1} \cdots k_2) \times k_1$, as we are going to organize the corresponding matrix of
unknown parameters $A$ conformably with $x$. Specifically, define the set of parameter vectors
$\{\alpha_1, \ldots, \alpha_J\}$ such that $\alpha_j$ is $k_j \times 1$ for each $j$, and formulate the matrix $A$ of parameters as
$A = \alpha_J \otimes \cdots \otimes \alpha_3 \otimes (\alpha_2 \alpha_1')$. $A$ has the same dimensions as $x$, and $A(m_{1\ell_1}, \ldots, m_{J\ell_J}) =
\alpha_{1\ell_1} \alpha_{2\ell_2} \cdots \alpha_{J\ell_J}$. In general, $\{\alpha_j \overset{\text{set}}{=} \iota_{k_j},\ j = 1, \ldots, J,\ j \neq r\}$, where $\iota_{k_j}$ is a $k_j$ vector
of ones, so that only one $\alpha$ vector is estimated. The choice of which vector to allow to vary
freely, $\alpha_r$, depends in part on the number of linear and covariance constraints available, and
in part on model fit criteria.
Use (5) and (6) to formulate $P_z(m \mid s; \theta) = [P_z(m_{1\ell_1}, m_{2\ell_2}, \ldots, m_{J\ell_J} \mid s; \theta)]_{(k_J \cdots k_2) \times k_1}$ as
a logistic distribution having elements

\[
P_z(m_{1\ell_1}, \ldots, m_{J\ell_J} \mid s; \theta) =
\frac{\exp\big(A(m_{1\ell_1}, \ldots, m_{J\ell_J})\, x(m_{1\ell_1}, \ldots, m_{J\ell_J}) + \beta_1 y(m_{1\ell_1} \mid s) + \cdots + \beta_J y(m_{J\ell_J} \mid s)\big)}
{\sum_{\ell_1, \ldots, \ell_J} \exp\big(A(m_{1\ell_1}, \ldots, m_{J\ell_J})\, x(m_{1\ell_1}, \ldots, m_{J\ell_J}) + \beta_1 y(m_{1\ell_1} \mid s) + \cdots + \beta_J y(m_{J\ell_J} \mid s)\big)},
\quad \ell_j = 1, \ldots, k_j,\ j = 1, \ldots, J, \tag{7}
\]

where the $\beta_j$ are scalars and $\theta = (\alpha_r, \beta_1, \ldots, \beta_J)$. Equation (7) contains all
the elements of variation contained in the unknown store level distributions $P^0_z(m \mid s)$:
$x(m_{1\ell_1}, \ldots, m_{J\ell_J})$ allows for variation among the $(z_1, \ldots, z_J)$ unconditional on $s$, while
each of the $y(\cdot \mid s)$ allows for adjustments to the city-wide distribution for a particular $z_j$
conditional on $s$.
Finally, setting $\alpha_r = \iota_{k_r}$, and setting $\beta_j = 1$ for all $j$, gives us an initial estimate of
$P^0_z(m \mid s)$. To improve upon this estimate, we specify a set of moment conditions having
$P^0_z(m \mid s)$ as their unique solution. Choosing θ to minimize the GMM criterion formed from
these moment conditions provides a consistent estimate of $P^0_z(m \mid s)$.
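The general logistic form in (7) and the initial estimate can be sketched for an arbitrary number of dimensions as follows. This is an illustration under stated assumptions, not the author's implementation: the function names and the dictionary representation of the joint table are hypothetical, and the inputs reuse the prior (South Dakota) and actual (Sioux Falls) proportions reported in Table 2, with the Sioux Falls marginals obtained by summing the actual column.

```python
import itertools
import math


def store_joint(city, marginals, alphas, betas):
    """Evaluate equation (7) on every support point.

    city      : dict mapping index tuples (l_1, ..., l_J) to city-wide probabilities
    marginals : list of J lists; marginals[j][l] is the store level marginal of z_j
    alphas    : list of J lists; the element of A at a cell is the product of alphas[j][l_j]
    betas     : list of J scalars
    """
    dims = [len(d) for d in marginals]
    last = tuple(k - 1 for k in dims)                  # reference cell (k_1, ..., k_J)
    weights = {}
    for cell in itertools.product(*(range(k) for k in dims)):
        x = math.log(city[cell] / city[last])                       # equation (5)
        a = math.prod(alphas[j][l] for j, l in enumerate(cell))     # element of A
        y = sum(betas[j] * math.log(marginals[j][l] / marginals[j][-1])
                for j, l in enumerate(cell))                        # equation (6)
        weights[cell] = math.exp(a * x + y)
    total = sum(weights.values())
    return {cell: w / total for cell, w in weights.items()}


def initial_estimate(city, marginals):
    """Initial estimate: every alpha is a vector of ones and every beta equals one."""
    alphas = [[1.0] * len(d) for d in marginals]
    betas = [1.0] * len(marginals)
    return store_joint(city, marginals, alphas, betas)


# Cells ordered as in Table 2: (housing, married, children), last index fastest.
cells = list(itertools.product(range(2), repeat=3))
prior = [0.2704, 0.0368, 0.0379, 0.0576, 0.1538, 0.0255, 0.2266, 0.1914]
actual = [0.2252, 0.0418, 0.0413, 0.0461, 0.1378, 0.0283, 0.2276, 0.2520]
city = dict(zip(cells, prior))
marginals = [[sum(p for cell, p in zip(cells, actual) if cell[j] == l)
              for l in range(2)] for j in range(3)]

start = initial_estimate(city, marginals)
```

Keeping the table in a dictionary keyed by index tuples avoids committing to the $(k_J \cdots k_2) \times k_1$ matrix layout, which matters only for how A and x are arranged, not for the value of any cell.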
2.3. Moment conditions
Using (7), specify marginal probabilities as

\[
P_{z_r}(m_{r\ell_r} \mid s; \theta) = \sum_{\ell_1, \ldots, \ell_{r-1}, \ell_{r+1}, \ldots, \ell_J} P_z(m_{1\ell_1}, \ldots, m_{J\ell_J} \mid s; \theta). \tag{8}
\]

Form the following three sets of moment conditions for each $s$:

(M1) $P_{z_j}(m_{j\ell_j} \mid s; \theta) - D_{z_j}(m_{j\ell_j} \mid s) = \delta_{j\ell_j}$, $\ell_j = 1, \ldots, k_j$, $j = 1, \ldots, J$;
(M2) $\mathrm{Cov}(z_j, z_r \mid s; \theta) - \text{city-wide } \mathrm{Cov}(z_j, z_r) = \eta_{jr}$, each $j, r$, $j \neq r$;
(M3) $P_{z_g}(m_{g\ell_g} \mid s; \theta)\, D_{z_r}(m_{r\ell_r} \mid s) - P_{z_r}(m_{r\ell_r} \mid s; \theta)\, D_{z_g}(m_{g\ell_g} \mid s) = \nu^{rg}_{\ell_1, \ldots, \ell_J}$,
     $\ell_r = 1, \ldots, k_r$, $\ell_g = 1, \ldots, k_g$, $r, g = 1, \ldots, J$, $r < g$.

Condition (M1) is formed as the difference between the estimated and observed marginal
distributions at each point of support. There are $k_j$ moment conditions constraining the
marginals for each $j$. The difference between the estimated and observed marginals is
assumed to equal an error $\delta_{j\ell_j}$ having the properties $E[\delta_{j\ell_j}] = 0$, and $\mathrm{Var}[\delta_{j\ell_j}] < \infty$, at
each $\ell_j$, all $j$.
Moment condition (M2) imposes covariance assumptions in the estimation. There are
$\binom{J}{2} = J(J-1)/2$ of these conditions available. We assume the difference between the
estimated and observed covariance to equal an error $\eta_{jr}$, having the properties $E[\eta_{jr}] = 0$,
and $\mathrm{Var}[\eta_{jr}] < \infty$, all $j, r$.
As shown in (4) above, condition (M3) is equivalent to formulating the joint distribution
$P_z(m \mid s; \theta)$ two different ways at each point of support $m$, with each formulation mixing a
different estimated conditional distribution with an observed store level marginal. There are
$k_1 k_2 \cdots k_J$ conditions (M3) corresponding to the same number of points of support of
$P_z(m \mid s; \theta)$ for each $(r, g)$ pair, and there are $\binom{J}{2} = J(J-1)/2$ $(r, g)$ pairs. The difference
in the two formulations of the joint moments is assumed to be equal to an error $\nu^{rg}_{\ell_1, \ldots, \ell_J}$ that
has the properties $E[\nu^{rg}_{\ell_1, \ldots, \ell_J}] = 0$, and $\mathrm{Var}[\nu^{rg}_{\ell_1, \ldots, \ell_J}] < \infty$, at each $\ell_1, \ldots, \ell_J$, all $r, g$.
2.4. Differences in parameterization of the GMM, IPF, and Bayesian approaches to inference
In the general case, there are $P_J = \prod_{j=1}^{J} k_j - 1$ independent cells in the joint probability
distribution and $S_J = \sum_{j=1}^{J} k_j - J$ independent marginal relationships. In addition, we
add $J(J-1)/2$ covariance conditions yielding a total of $S_J + J(J-1)/2$ independent
constraints for identifying θ. Identification requires the number of free parameters in θ to
be less than or equal to the number of independent constraints. In general, our approach
requires many fewer free parameters than there are independent constraints to achieve a
good model fit. We associate one β parameter with each store level marginal, and allow one
α vector, the $r$th, to be free for a total of $J + k_r$ free parameters.
Underlying the IPF estimator is a chi-square criterion that minimizes the difference
between the unknown probabilities $P^0_z(m \mid s)$ and the observed city-wide joint distribution
$C_z(m)$. This criterion is then subject to the $S_J$ marginal constraints each interacted with
an unknown Lagrangian multiplier. Estimation of the Lagrangian multipliers is the focus
of the Iterative Proportional Fitting algorithm. Then, as Deming and Stephan (1940) show,
using the estimated Lagrangian multipliers one can infer estimates for all the free cells
in $P^0_z(m \mid s)$. In general, the number of Lagrangian multipliers, $S_J$, exceeds the $J + k_r$ parameters
estimated using our GMM approach, but not substantially so.
Putler, Kalyanam, and Hodges (1996) take a Bayesian approach. They use the city-wide
joint distribution to form a Dirichlet prior and specify a multinomial likelihood over the
store level joint distribution. They do not parameterize the probabilities in the multinomial
distribution and, as such, this leaves them with a posterior distribution over $DF = P_J - S_J$
free parameters. Since DF grows quickly in both the number of cells in any given dimension
and in the number of dimensions, the computational cost of posterior inference grows more
quickly than for either the parametric GMM approach we propose, or the IPF approach. The
Putler et al. approach can be extended by specifying parametric functions for the Dirichlet
prior probabilities and for the probabilities in the likelihood to reduce the dimensionality
of the estimation problem. This will, however, complicate the structure of the posterior as
conjugacy of the prior and the likelihood will be lost by introducing a parametric form for
Pz (m | s).
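The counts discussed in this section are simple functions of the support sizes, and a small helper makes the comparison across approaches easy to tabulate. The function name `parameter_counts` and the four-dimensional support sizes in the second call are hypothetical; the first call uses the three binary variables of the stain resistant carpeting illustration.

```python
import math


def parameter_counts(k, r):
    """Counts from Section 2.4 for support sizes k = (k_1, ..., k_J).

    r is the zero-based index of the alpha vector left free in the GMM approach.
    """
    J = len(k)
    independent_cells = math.prod(k) - 1            # P_J
    independent_marginals = sum(k) - J              # S_J
    covariance_conditions = J * (J - 1) // 2
    return {
        "P_J": independent_cells,
        "S_J": independent_marginals,
        "independent_constraints": independent_marginals + covariance_conditions,
        "gmm_free_parameters": J + k[r],
        "ipf_multipliers": independent_marginals,
        "bayes_free_cells": independent_cells - independent_marginals,   # DF
    }


binary_case = parameter_counts((2, 2, 2), r=0)      # three binary variables
larger_case = parameter_counts((5, 4, 3, 3), r=0)   # hypothetical support sizes
```

In the hypothetical four-dimensional case the number of free cells in the Bayesian approach is an order of magnitude larger than either the number of IPF multipliers or the number of GMM parameters, which is the growth pattern the text describes.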
2.5. Estimation
To form a GMM objective function, we input the moment conditions in one long vector.
We chose this formulation because there is a different number of moments associated with
each set of moment conditions, and for moment condition (M1), the number of moments
varies with each marginal distribution. Hence there is no natural way to allow the moment
conditions to freely correlate. Define

\[
\begin{aligned}
\Delta(\theta) &= [\delta_{11}(\theta), \delta_{12}(\theta), \ldots, \delta_{1k_1}(\theta), \ldots, \delta_{Jk_J}(\theta)]', \\
H(\theta) &= [\eta_{12}(\theta), \eta_{13}(\theta), \ldots, \eta_{1J}(\theta), \ldots, \eta_{J-1,J}(\theta)]', \\
V(\theta) &= [\nu^{1,2}_{1,\ldots,1}(\theta), \nu^{1,2}_{1,\ldots,2}(\theta), \ldots, \nu^{1,2}_{k_1,\ldots,k_J}(\theta), \ldots, \nu^{J-1,J}_{k_1,\ldots,k_J}(\theta)]',
\end{aligned}
\]

and define $T(\theta) = [\Delta(\theta)', H(\theta)', V(\theta)']'$, where vector $T(\theta)$ has length $\sum_{j=1}^{J} k_j + J(J-1)/2 + (k_1 k_2 \cdots k_J) J(J-1)/2$. Now specify the objective function as

\[
\hat\theta = \arg\min_{\theta} \{T(\theta)' T(\theta)\}. \tag{9}
\]

Estimates of θ from (9) have asymptotic normal distribution $\hat\theta \sim N(\theta, \sigma^2 (G'G)^{-1})$,
where $\sigma^2 = \mathrm{Var}[T(\hat\theta)]$, and $G = \partial T(\hat\theta)/\partial\hat\theta$. In addition, $P_z(m \mid s; \theta)$ is asymptotically normally distributed with mean $P_z(m \mid s; \hat\theta)$ and variance $\sigma^2 H (G'G)^{-1} H'$, where
$H = \partial P_z(m \mid s; \hat\theta)/\partial\hat\theta$.
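A small numerical sketch may help fix ideas about minimizing T(θ)′T(θ). The text does not say which optimizer was used, so the derivative-free compass search below is a stand-in chosen only because it needs nothing beyond the standard library. The example is restricted to the two-by-two case with every element of A set to one and only the marginal conditions in T(θ); all function names and input numbers are hypothetical.

```python
import math


def model(c, d_row, d_col, beta):
    """Two-by-two logistic form with every element of A set to one."""
    w = [[math.exp(math.log(c[j][k] / c[1][1])
                   + beta[0] * math.log(d_row[j] / d_row[1])
                   + beta[1] * math.log(d_col[k] / d_col[1]))
          for k in range(2)] for j in range(2)]
    total = sum(map(sum, w))
    return [[v / total for v in row] for row in w]


def objective(beta, c, d_row, d_col):
    """T(theta)'T(theta) built from the marginal moment conditions only."""
    p = model(c, d_row, d_col, beta)
    t = [sum(p[j]) - d_row[j] for j in range(2)]
    t += [p[0][k] + p[1][k] - d_col[k] for k in range(2)]
    return sum(v * v for v in t)


def compass_search(f, start, step=1.0, tol=1e-10, max_iter=100000):
    """Try +/- step along each coordinate; halve the step when nothing improves."""
    x, fx = list(start), f(start)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = x[:]
                trial[i] += delta
                ft = f(trial)
                if ft < fx:
                    x, fx, improved = trial, ft, True
        if not improved:
            step /= 2.0
    return x, fx


# Hypothetical inputs; the search starts from beta = (1, 1), the initial estimate.
c = [[0.30, 0.20], [0.10, 0.40]]
d_row, d_col = [0.6, 0.4], [0.35, 0.65]
beta_hat, value = compass_search(lambda b: objective(b, c, d_row, d_col), [1.0, 1.0])
p_hat = model(c, d_row, d_col, beta_hat)
```

In this restricted setting the two β parameters only rescale the first row and first column of the city-wide table, so the fitted table reproduces the outlet marginals while keeping the city-wide odds ratio. A gradient-based routine would normally be preferred for larger problems.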
3. Illustrations
We present two illustrations. The first uses data and results from an illustration presented
in Putler et al. The authors present joint distributions estimated three ways: as a product
of marginals, using IPF, and conducting posterior inference. To these results we add a
column obtained using our GMM approach. The second illustration uses demographic data
from eight Dominick’s stores and from the Chicago PMSA. We use the GMM approach to
estimate a joint distribution over demographics for each of these stores, and present model
fit statistics. We then determine if the formulation of the joint demographic distribution has
empirically important effects on the results of an equilibrium mixed logit demand-supply
model for bath tissue consumption. Two sets of results are generated. For the first set of
results we draw individual types from a joint demographic distribution formulated from a
product of marginals, while for the second, we take draws from our GMM estimate of the
joint demographic distribution.
3.1. Targeting the market for stain resistant carpeting
As discussed in Putler et al., the target market for stain resistant carpeting is married couples
who are homeowners with young children living at home. To estimate a joint distribution
for these three variables for Sioux Falls, South Dakota, the authors use marginals for Sioux
Falls, and a prior joint distribution that corresponds to the whole state of South Dakota. The
variables are each coded into binary categories: (renter, homeowner), (married, unmarried),
(children under 18, no children under 18). The true distribution for Sioux Falls is available
for evaluating goodness of fit.
These data are presented in Table 2, along with estimates of the joint distribution derived
four different ways: independence estimate, IPF, posterior mode, GMM mode. As the table
shows, the independence estimate is a poor approximation to the true joint distribution. In
fact, this estimate is considerably worse than using the prior distribution for the entire state
of South Dakota to represent the distribution of these variables in Sioux Falls. On the other
hand, the IPF, posterior, and GMM results each produce substantial improvements over
both the independence estimate and the prior. All three approaches produce very similar
estimates of the joint distribution, and all three are found to be statistically indistinguishable
from the true joint distribution by a χ2 goodness of fit test at the one percent significance
level.9

Table 2. Estimates of joint probabilities for stain-resistant carpeting direct mail campaign.

Cell descriptor(a)
(housing, married,   Actual          Prior           Independence                 Posterior          GMM
children <18)        proportion(a)   proportion(a)   estimate(a)     IPF(a)       mode(a)            estimate
(rent, no, no)       0.2252          0.2704          0.0969          0.2215       0.2219 (0.0055)    0.2216 (0.0033)
(rent, no, yes)      0.0418          0.0368          0.0565          0.0369       0.0366 (0.0037)    0.0368 (0.0012)
(rent, yes, no)      0.0413          0.0379          0.1269          0.0336       0.0333 (0.0035)    0.0337 (0.0014)
(rent, yes, yes)     0.0461          0.0576          0.0740          0.0625       0.0622 (0.0046)    0.0625 (0.0020)
(own, no, no)        0.1378          0.1538          0.1767          0.1453       0.1453 (0.0055)    0.1451 (0.0039)
(own, no, yes)       0.0283          0.0255          0.1030          0.0295       0.0292 (0.0037)    0.0294 (0.0013)
(own, yes, no)       0.2276          0.2266          0.2313          0.2316       0.2315 (0.0053)    0.2317 (0.0048)
(own, yes, yes)      0.2520          0.1914          0.1348          0.2394       0.2400 (0.0061)    0.2392 (0.0036)
χ2 goodness of
fit measure(b)       –               66.65           847.07          16.6         16.5               16.53

(a) Source: Table 5, Putler et al. (1996).
(b) All χ2 are based on 2083 observations.
To produce the GMM results we formulated A = αr ⊗(αm ∗ αa ) , with r, m, a representing
the housing status, marital status, and child status dimensions respectively, and we tested
which 2 × 1 α vectors to estimate and which ones to set equal to ı, a vector of ones. Since
we have three independent moment conditions on the marginal distributions and three more
covariance conditions, in principle we can identify up to six parameters in θ. In practice we
found that the χ² goodness of fit statistic was smallest with A = [1]. Estimating any
of the α vectors produced a small increase in the χ² statistic. We also tried estimating the
model using just the marginal moment conditions (M1), and including only conditions (M2)
or only (M3) in addition to (M1). We found that excluding conditions of type (M2) and/or
(M3) caused a slight deterioration in the fit. For example, estimating the three β parameters
using only moment conditions (M1) produced a χ² statistic equal to 16.54, up from the
value of 16.53 obtained using all three sets of moment conditions.
9 Putler et al. also include posterior mean estimates that yield χ² statistics as small as 16.1.
3.2. Dominick’s data
As stated in the introduction, the motivation behind this research is to estimate joint distributions corresponding to a subset of variables in the store level marginal distributions
provided in the Dominick’s data archive. These estimated joint distributions will then be
used as our source of individual types in a mixed logit model for aggregate data. Data for
the Chicago PMSA from the March 1996 Current Population Survey is used to provide the
city-wide joint distribution for our GMM procedure.
To begin, we extracted demographic data for eight stores from the Dominick’s data
archive that were reasonably representative of the total population of Dominick’s stores.
We limit attention to eight stores to keep sample size within the range of computational
feasibility for mixed logit estimation. Three criteria were used to choose stores. First we
wanted to have at least two stores from each of the three pricing regimes that Dominick’s
employs. To this end we chose two low price, three medium price, and three high price
stores, while the population of Dominick’s stores contains 9.4, 64.7, and 25.9 percent of
low, medium, and high price stores respectively. Second, the stores were chosen to exhibit
substantial variation over our four variables of interest. Third, we closely matched the means
and correlations of our sample and the Dominick’s population for these variables in order
to produce a sample that is representative of the Dominick’s population. The variables we
chose are income, number of persons in a household, race, and number of units in a housing
structure.
Two pieces of information are provided regarding the income distribution for the market
area surrounding each Dominick’s store: the log median income, and the standard deviation
of income. The researcher is left to guess what continuous probability distribution these
variables parameterize. It is straightforward to show that the log median of a lognormal
distribution is the mean of the associated normal distribution, hence we inferred that a
lognormal was used.10 However, since the standard deviations provided appear to be for a
lognormal, we are left with a normal mean and a lognormal standard deviation. To make these
two statistics coherent with one another, we solved for the lognormal mean, E[x], using the
relationship $\mu = \ln(E[x]^2) - 0.5\,\ln(E[x]^2 + \lambda^2)$ (see e.g. Greene, 1997, p. 71), where µ is the normal
mean and λ² is the lognormal variance.11 With the income distribution parameterized,
we index it by i, and discretize it into 17 adjacent intervals, (in 000’s): [0,10), [10,20),
[20,30), [30,40), [40,50), [50,60), [60,70), [70,80), [80,90), [90,100),[100,125), [125,150),
[150,175), [175,200), [200,300), [300,400), [400,∞).
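The two steps just described, recovering the lognormal mean by bisection and then discretizing the fitted distribution into the 17 intervals, can be sketched as follows. This is a minimal illustration and not the original implementation: the function names are hypothetical, income is assumed to be measured in dollars (so interval edges in thousands are multiplied by 1000), and the standard deviation passed in the usage line is a made-up value.

```python
import math

# Interval edges in thousands; the last interval [400, inf) is open-ended.
EDGES = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100,
         125, 150, 175, 200, 300, 400]


def lognormal_mean(mu, lam):
    """Solve mu = ln(m^2) - 0.5*ln(m^2 + lam^2) for the lognormal mean m by bisection."""
    def gap(m):
        return math.log(m * m) - 0.5 * math.log(m * m + lam * lam) - mu

    # gap is increasing in m, gap(exp(mu)) <= 0 and gap(exp(mu) + lam) >= 0.
    lo, hi = math.exp(mu), math.exp(mu) + lam
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def discretize_income(mu, lam):
    """Return the 17 interval probabilities implied by ln(median) = mu and std. dev. lam > 0."""
    m = lognormal_mean(mu, lam)
    sigma = math.sqrt(2.0 * (math.log(m) - mu))  # from E[x] = exp(mu + sigma^2 / 2)
    cdf = [0.0]
    cdf += [normal_cdf((math.log(1000.0 * e) - mu) / sigma) for e in EDGES[1:]]
    cdf.append(1.0)
    return [upper - lower for lower, upper in zip(cdf[:-1], cdf[1:])]


# Usage: mu is store 18's ln(median income) from Table 3; lam = 30000 is hypothetical.
PROBS = discretize_income(10.392, 30000.0)
```

The relationship also has a closed-form solution, since it is a quadratic in the squared mean; bisection is used here only because the text reports that approach.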
For number of persons in household, n, the Dominick’s data provides a distribution with
four points of support: 1 person, 2 persons, 3 or 4 persons, and 5 or more persons. For race,
indexed by r , the data provides the percentage of nonwhites. The percentage of detached
houses, u, is our housing units variable. Corresponding variables and the city-wide joint
distribution were readily available for the Chicago PMSA from the March 1996 Current
Population Survey.
10 Specifically, if x has a lognormal distribution with parameters µ and σ², such that E[x] = exp(µ + σ²/2),
Var[x] ≡ λ² = exp(2µ + σ²)(exp(σ²) − 1), and median(x) = γ, then if y = ln(x), y ∼ N(µ, σ²) and
ln(γ) = µ.
11 We used a bisection algorithm to solve this relationship for E[x].
Table 3 contains the raw data for our eight store sample, and descriptive statistics to
compare our eight stores with the whole population of Dominick’s stores and the Chicago
PMSA. As the table shows, the means for our eight store sample match the Dominick’s
population quite closely for all variables except the proportion of nonwhites. Our sample contains
nearly five percentage points more nonwhites than the Dominick’s population. In comparison with
the March 1996 CPS for the Chicago PMSA, the Dominick’s market areas have similar means
for ln(median income) and proportion of nonwhites, but have fewer persons per household and
fewer detached houses.
A sample correlation comparison of our sample with the Dominick’s population is contained in Table 4. While less closely matched than the means, the correlations all have the
same signs, and are similar in magnitude.
In this example, J = 4, ki = 17, kn = 4, kr = 2, and ku = 2. This yields ki + kn + kr +
ku = 25 marginal distribution moment conditions of type (M1), J (J − 1)/2 = 6 covariance
conditions of type (M2), and ki ∗ kn ∗ kr ∗ ku = 272 moment conditions of type (M3) for
each pair of variables, and there are J (J − 1)/2 = 6 variable pairs. Together, these yield a
total of 1663 moment conditions, 31 of which provide identifying information about θ.
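The moment-condition counts above follow from simple arithmetic, reproduced in the short sketch below. The variable names are illustrative, and reading the 31 identifying conditions as the sum of the (M1) and (M2) counts is an interpretation of the text.

```python
from math import comb, prod

# Points of support for income (i), persons (n), race (r), and housing units (u).
K = {"i": 17, "n": 4, "r": 2, "u": 2}
J = len(K)

n_m1 = sum(K.values())        # marginal conditions (M1): 17 + 4 + 2 + 2
n_pairs = comb(J, 2)          # J(J - 1)/2 variable pairs
n_m2 = n_pairs                # one covariance condition (M2) per pair
n_cells = prod(K.values())    # cells in the joint distribution
n_m3 = n_cells * n_pairs      # cell conditions (M3), one set per pair

total = n_m1 + n_m2 + n_m3
identifying = n_m1 + n_m2     # assumed reading of "31 of which provide identifying information"

print(n_m1, n_m2, n_cells, n_m3, total, identifying)
```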
A separate joint distribution is estimated for the marketing area corresponding to each of
our eight stores, and three criteria are used to gauge model fit for each store: the Euclidean
distance of the estimated joint distribution from a joint distribution formed from a product of
marginal distributions, the weighted average Euclidean distance of all moment conditions
from zero, and the GMM function value. The first criterion enables us to gauge the impact
of excluding moment conditions (M2) and/or (M3) and of the parameterization of A on
our ability to estimate a joint distribution that differs substantially from a joint distribution
formed under the assumption of independence. To form the second criterion we first evaluate
the Euclidean distance of each set of moment conditions from zero, and then use the weighted
average of these distances as a model fit metric.12 In forming this measure we incorporate
both included and excluded moment conditions. So, for example, if we estimate the model
without moment conditions of type (M3) this will enable us to determine if a substantial
deterioration in fit occurred as the (M3) moment conditions still get included in the metric.
The third criterion, the GMM function value, provides a fit metric that is affected only by
the included moment conditions. This metric is less reliable as it can show improvements
even if overall fit, as measured by the Euclidean distance metric, has deteriorated.
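The second criterion can be sketched in a few lines. The sketch assumes the residuals of each moment type are available as plain lists and uses each type's share of the total number of conditions as its weight, as footnote 12 describes; the function name and the example residuals are hypothetical.

```python
import math


def weighted_average_distance(moment_sets):
    """moment_sets maps a label such as "M1" to the residuals of that set of moment conditions."""
    total = sum(len(g) for g in moment_sets.values())
    average = 0.0
    for g in moment_sets.values():
        norm = math.sqrt(sum(v * v for v in g))   # Euclidean distance of this set from zero
        average += (len(g) / total) * norm        # weight = share of all moment conditions
    return average


# Hypothetical residuals, chosen only so the result is easy to check by hand.
EXAMPLE = {"M1": [3.0, 4.0], "M2": [0.0, 0.0]}
EXAMPLE_DISTANCE = weighted_average_distance(EXAMPLE)
```

Because excluded moment conditions can still be evaluated at the estimated parameters, the same function applies whether or not a given set entered the GMM objective.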
To parameterize θ, we tested letting each of the α vectors in A be free individually
and in various combinations. These tests showed that allowing αn to vary freely produced
marginally better results than allowing αr or αu to be free. Allowing αi to vary freely
caused problems with inverting (G′G), as did allowing the α’s to vary freely in any combination. In addition, we estimated the model with different combinations of the moment
conditions included and with A = [1]. Table 5 contains the results of tests of various moment and parameter configurations for one of our Dominick’s stores (Store # 111). Results
for other stores were similar.
The results in Column 1 include all three sets of moment conditions in the model and
allow αn to vary freely. This combination produces the best fit statistics. The estimated
12 The proportions of moment conditions of each type are used as the weights.
Table 3. The eight store sample, with a comparison of descriptive statistics for the eight stores and the Dominick’s population.

                        ln(median Persons/household                                           Detached  Low       Medium    High
Store #                 income)   1         2         3 or 4    5+        Mean^a    Nonwhite  house     priced    priced    priced
111                     10.138    0.261     0.271     0.311     0.157     2.833     0.995     0.312     0         0         1
93                      10.367    0.379     0.315     0.234     0.072     2.259     0.393     0.156     0         0         1
18                      10.392    0.269     0.327     0.307     0.098     2.583     0.090     0.611     0         1         0
102                     10.494    0.261     0.290     0.310     0.139     2.760     0.179     0.682     0         1         0
59                      10.715    0.154     0.297     0.408     0.141     3.020     0.036     0.798     1         0         0
80                      10.910    0.227     0.331     0.342     0.100     2.687     0.074     0.593     1         0         0
121                     10.950    0.251     0.343     0.314     0.092     2.587     0.102     0.536     0         1         0
109                     11.233    0.148     0.338     0.400     0.114     2.909     0.089     0.786     0         0         1
8 store sample
  Mean                  10.650    0.244     0.314     0.328     0.114     2.705     0.245     0.559     0.250     0.375     0.375
  (std. dev.)           (0.365)   (0.073)   (0.026)   (0.056)   (0.029)   (0.236)   (0.323)   (0.224)   (0.463)   (0.518)   (0.518)
Dominick’s population
  Mean                  10.620    0.243     0.310     0.331     0.116     2.717     0.199     0.550     0.375     0.647     0.259
  (std. dev.)           (0.280)   (0.080)   (0.031)   (0.058)   (0.031)   (0.263)   (0.186)   (0.208)   (0.294)   (0.481)   (0.441)
Chicago PMSA
  Mean                  10.897    0.097     0.208     0.389     0.306     3.713     0.236     0.648     –         –         –
  (std. dev.)                                                             (1.715)   (0.424)   (0.478)

a Mean persons/household is calculated by indexing 1, 2, 3 or 4, and 5+ persons with 1, 2, 3.5, and 6 respectively.
Table 4. A comparison of sample correlations between the eight store sample and the Dominick’s
population.

                      ln(median income)   Mean persons/hh   Detached house   Nonwhites
ln(median income)     –                   0.353             0.685            −0.730
Mean persons/hh       0.303               –                 0.609            −0.157
Detached house        0.624               0.736             –                −0.653
Nonwhites             −0.686              −0.060            −0.689           –

Upper triangle: Dominick’s population. Lower triangle: 8 store sample.

Table 5. Effect of model specification on fit criteria for Dominick’s store 111.

                                                        (1)       (2)       (3)       (4)       (5)        (6)
Moment conditions included and parameterization of A
  Marginal conditions (M1) included                     Yes       Yes       Yes       Yes       Yes        Yes
  Covariance conditions (M2) included                   Yes       Yes       Yes       No        No         No
  Individual cell conditions (M3) included              Yes       No        Yes       Yes       Yes        No
  αn free                                               Yes       Yes       No        No        Yes        Yes
Fit criteria
  Distance: Pz(u, r, n, i | s; θ) − P(u)P(r)P(n)P(i)    0.1034    0.1113    0.1370    0.1411    0.0012     0.0019
  Weighted average distance of
  all moment conditions from 0                          0.0197    0.0218    0.0646    0.0655    0.0015     0.0015
  GMM function value                                    0.0015    0.0013    0.0083    0.0069    3.365e-6   3.168e-6
joint distribution is substantially different from a product of marginal distributions, and
the weighted average distance of all moment conditions from 0 is the smallest of the first
four columns. In Columns 2–4, we exclude moment conditions (M3), fix αn = ı4, and
both exclude conditions (M2) and fix αn = ı4, respectively. Each of these changes produces a
deterioration in fit as measured by the weighted average distance metric, and some changes
in the distance of the joint distribution from a product of marginals. In Columns 5 and 6 we
exclude the covariance conditions (M2) and let αn vary freely. This produces large
changes in the estimated distribution, which moves much closer to a product of independent
marginals.
The important point to take from the results in Columns 5 and 6 is that one has to
be wary when estimating an α vector without including covariance moment conditions.
The GMM and weighted average Euclidean distance criteria both show that the model in
these two columns fits better than in any of the previous columns. The reality, however,
Table 6. Summary measures of model fit: weighted average Euclidean distances and GMM objective
function values for a model including all three types of moment conditions and letting αn vary freely.

                Weighted average                       Distance between estimated
                Euclidean distance     GMM objective   and independence distributions
Store number    Initial     Final      function value  Euclidean    Max absolute
111             0.0814      0.01971    0.0015          0.1034       0.0596
93              0.0810      0.0152     0.0010          0.0810       0.0400
18              0.0398      0.0098     0.0004          0.0088       0.0323
102             0.0475      0.0108     0.0006          0.0542       0.0195
59              0.0556      0.0155     0.0015          0.0594       0.0251
80              0.0665      0.0186     0.0014          0.0587       0.0206
121             0.0687      0.0174     0.0012          0.0212       0.0210
109             0.0867      0.0134     0.0014          0.0646       0.0255
is that the GMM objective is optimized by placing very little weight on the city-wide joint
distribution. Each estimated element of αn is less than 0.01 in absolute value and, as such, yields a
joint distribution that is very close to a product of marginals.
Table 6 includes summary measures of model fit for all eight Dominick’s stores for the
model including all three sets of moment conditions and with θ = (αn , β j , j = u, r, n, i).13
The results indicate that the model fits the moment conditions well for all eight stores, and
it estimates joint distributions that differ substantially from joints formed as a product of
marginal distributions. In addition to the three fit measures discussed above, we include the
maximum absolute distance between the estimated and independence joint distributions.
This is done to give the reader a better feel for how far the estimated model is
from a joint distribution formed as a product of marginals. More specifically, these joint
probability distributions each contain 272 cells. Hence, the average probability associated
with each cell is 1/272 = 0.0037. The Euclidean distance is an order of magnitude larger
than this value for seven of eight stores and the maximum absolute distance is an order of
magnitude larger for all eight stores, thereby indicating the differences from independence
to be substantial.
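The two distance measures discussed above can be illustrated on a small joint distribution. The sketch below is an assumption-laden illustration rather than the author's code: it stores the joint as a dictionary keyed by tuples of category labels, builds the product of its own marginals, and reports the Euclidean and maximum absolute distances between the two. The function name and the 2 × 2 example are hypothetical.

```python
import math
from itertools import product


def independence_distance(joint):
    """joint maps tuples of category labels to probabilities.

    Returns (Euclidean distance, max absolute distance) between the joint
    and the product of its marginal distributions.
    """
    dims = len(next(iter(joint)))
    marginals = [{} for _ in range(dims)]
    for cell, p in joint.items():
        for d, level in enumerate(cell):
            marginals[d][level] = marginals[d].get(level, 0.0) + p

    diffs = []
    for cell in product(*(sorted(m) for m in marginals)):
        indep = math.prod(marginals[d][level] for d, level in enumerate(cell))
        diffs.append(joint.get(cell, 0.0) - indep)

    euclidean = math.sqrt(sum(x * x for x in diffs))
    max_abs = max(abs(x) for x in diffs)
    return euclidean, max_abs


# Hypothetical 2 x 2 joint with positive dependence between the two variables.
EXAMPLE_JOINT = {
    ("attached", "nonwhite"): 0.4, ("attached", "white"): 0.1,
    ("detached", "nonwhite"): 0.1, ("detached", "white"): 0.4,
}
EUCLIDEAN, MAX_ABS = independence_distance(EXAMPLE_JOINT)
```

In this example every marginal probability is 0.5, so each independence cell is 0.25 and each cell differs from it by 0.15 in absolute value.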
Tables 7a and 7b provide the estimated joint probability distribution and the observed
and estimated marginal distribution for stores 18 and 111 respectively. These tables are
provided to show that the estimated marginals closely match the observed marginals, and
that the estimated joint distributions for each store are substantially different. The two
stores serve very different demographic populations. Store 18’s market population is 91
percent white, 61 percent of whom live in detached houses. For store 111, 99.5 percent of
its market population is nonwhite, and only 31 percent live in detached homes. There are
also substantial differences in the income distributions, and store 111 has a higher average
number of persons per household.
13 We estimated this model using GAUSS on a 2GHz Pentium 4. Total estimation time for all eight stores was
3.2 seconds.
Table 7a. Estimated joint and marginal probability distributions for store 18.

Store level marginal distributions

Income interval midpoints (in 000’s)
                  5      15     25     35     45     55     65     75     85     95     112.5  137.5  162.5  187.5  250    350    450
Income: observed  0.017  0.174  0.250  0.202  0.135  0.084  0.052  0.032  0.020  0.012  0.014  0.005  0.002  0.001  0.001  0.000  0.000
        estimated 0.029  0.185  0.240  0.211  0.112  0.083  0.061  0.028  0.014  0.007  0.027  0.003  0.001  0.000  0.000  0.000  0.000

Persons (1, 2, 3 or 4, 5+):  observed   0.269  0.327  0.307  0.098
                             estimated  0.264  0.325  0.307  0.104
Race (white, nonwhite):      observed   0.910  0.090
                             estimated  0.907  0.093
Units (attach, detach):      observed   0.611  0.389
                             estimated  0.612  0.388

[Joint distribution panel: 16 rows (units × race × persons) by 17 income interval columns; cell values not recoverable from the extracted text.]

Notes: 0.000 entries imply probability element’s value is in interval [10^−8, 10^−3), while a 0 entry implies element is strictly less than 10^−8. Standard errors not reported
to reduce clutter.
Table 7b. Estimated joint and marginal probability distributions for store 111.

Store level marginal distributions

Income interval midpoints (in 000’s)
                  5      15     25     35     45     55     65     75     85     95     112.5  137.5  162.5  187.5  250    350    450
Income: observed  0.081  0.281  0.239  0.153  0.093  0.056  0.034  0.021  0.014  0.009  0.011  0.004  0.002  0.001  0.001  0.000  0.000
        estimated 0.115  0.273  0.261  0.149  0.115  0.024  0.025  0.016  0.006  0.004  0.010  0.001  0.000  0.000  0.000  0.000  0.000

Persons (1, 2, 3 or 4, 5+):  observed   0.261  0.271  0.311  0.157
                             estimated  0.261  0.267  0.304  0.169
Race (white, nonwhite):      observed   0.005  0.995
                             estimated  0.039  0.961
Units (attach, detach):      observed   0.312  0.688
                             estimated  0.316  0.684

[Joint distribution panel: 16 rows (units × race × persons) by 17 income interval columns; cell values not recoverable from the extracted text.]

Notes: 0.000 entries imply probability element’s value is in interval [10^−8, 10^−3), while a 0 entry implies element is strictly less than 10^−8. Standard errors not reported
to reduce clutter.
The observed and estimated marginals match to the second decimal place in most cases.
The estimated race distribution for store 111 misses the observed distribution by a full three
percent, but this is because of the extreme skewing of the distribution toward nonwhites.
The model does a much better job of matching the somewhat less skewed race distribution
of store 18.
3.3. A study of the effect of formulation of the joint demographic distribution
on the results of an equilibrium model for demand and supply
We estimate an equilibrium demand-supply model for bathroom tissue using one year of
weekly data from each of the eight Dominick’s stores; the demand model is mixed logit
and a static Bertrand-Nash equilibrium condition is used to generate the supply function.
Two versions of the model are run, each using a different estimate of the joint demographic
distribution for each of the eight stores. In the first version, the joint demographic distribution
is estimated as a product of marginals; in the second version it is estimated using GMM.14
3.3.1. Demand model. We use a random coefficients specification to represent the conditional indirect utility of consumption, ci jmt , for consumer i from product j purchased from
store m in week t, yielding
$$U(c_{ijmt}; d) = x_j^a \bar{\theta}^a + x_j^b \theta_{im}^b - p_{jmt}\alpha_{im} + \xi_j + \xi_{jmt} + \varepsilon_{ijmt},$$
$$i = 1, \ldots, N, \quad j = 0, \ldots, J_{mt}, \quad m = 1, \ldots, M, \quad t = 1, \ldots, T, \tag{9}$$
where for each product j we observe characteristics x j = (x aj , x bj ), and prices, p jmt . We
decompose x j into subvectors x a and x b to highlight the point that we restrict x a to enter
only the mean, while x b is allowed to influence both the mean and the random coefficients.
In addition, the x’s are only subscripted by “ j” because all product characteristics other than
price remain constant across stores and throughout the time period. Different products may
be available over time or across stores, but the characteristics of products with a particular
UPC number do not change.15 Examples of product characteristics are the color of the tissue,
the size of the roll (single or double), the ply of the paper (1- or 2-ply), the lotion content
of the paper (with or without lotion), and the scent of the paper (scented or scent free). ξ j is
the mean valuation of the unobserved (by the econometrician) product characteristic across
all of the stores in our dataset, and ξ jmt is the store-week specific deviation from that
mean. Following Nevo (2001), we use brand dummy variables to control for the ξ j , leaving
the ξ jmt as our error terms. We expect that consumers and firms take the characteristics
of all Jmt products into consideration when making decisions, and hence the ξ jmt will be
14 We sketch out the demand model here, but do not present the supply function. A structural supply function
is incorporated to provide additional structure that improves the precision of the demand model estimates.
Supply function estimates, however, are not substantially affected by the choice of demographic distribution.
Hence its development is ancillary to our main focus. The supply function is developed in detail in Romeo and
Sullivan (2004).
15 In each store-week we observe between 28 and 42 bath tissue UPCs.
correlated with the prices of all products available in store m in week t. We define alternative
j = 0 as the outside good. Since we do not have detailed information about this alternative
we retain the intercept and normalize other elements of x, p0mt , ξ0 , ξ0mt to equal 0.
To control for the effect of demographic differences on bath tissue choices in each store’s
market area, we specify the vector of consumer taste parameters $(\theta_{im}^b, \alpha_{im})$ as a function of
store area demographics and a random normal component as in
$$(\theta_{im}^b, \alpha_{im}) = (\bar{\theta}^b, \bar{\alpha}) + \Pi a_{im} + \Upsilon \nu_{im}, \quad i = 1, \ldots, N, \quad m = 1, \ldots, M, \tag{10}$$
where each a_im draw is an L × 1 vector with joint probability distribution P̂ over its L elements, and Π is a matrix of unknown parameters. P̂ is the estimated joint demographic distribution,
ϒ is specified as a diagonal matrix of unknown random coefficient parameters, and the ν_im
are draws from N(0, I).
Finally, the εi jmt are unobserved buyer attributes that are assumed to follow a type 1
extreme value distribution that is independent across individuals, products, and time periods.
In addition, aim , νim , ξ jmt , and εi jmt are assumed to be mutually independent.
Aggregate demand shares are obtained by assuming that individuals make utility maximizing choices in their consumption of bath tissue, and integrating the εi jmt , aim , and νim
over the appropriate regions. Integration over ε yields a logit distribution. a and ν are integrated numerically by drawing 200 samples from P̂ and N(0,I) for each store. As discussed
in BLP, Nevo (2001), and Romeo and Sullivan (2004), integration of a and ν is embedded
in a contraction mapping for determining mean utility.
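The numerical integration step can be sketched as follows: draw consumer types from the discrete joint distribution and from a standard normal, then average the logit choice probabilities over the draws. This is a simplified illustration, not the estimation code used in the paper. The function names, the `deviation` callback, and the normalization of the outside good's utility to zero are all assumptions made for the example, and the contraction mapping for mean utility is omitted.

```python
import math
import random


def draw_types(joint, n_coefs, n_draws=200, seed=0):
    """Draw demographic cells from the discrete joint and N(0, I) shocks for the random coefficients."""
    rng = random.Random(seed)
    cells = list(joint)
    weights = [joint[c] for c in cells]
    demo = rng.choices(cells, weights=weights, k=n_draws)
    nu = [[rng.gauss(0.0, 1.0) for _ in range(n_coefs)] for _ in range(n_draws)]
    return list(zip(demo, nu))


def simulated_shares(mean_utils, deviation, draws):
    """Average logit choice probabilities over the draws.

    mean_utils[j] is product j's mean utility; deviation(j, a, nu) is the
    consumer-specific part of utility. The outside good's utility is set to 0.
    """
    n_products = len(mean_utils)
    shares = [0.0] * n_products
    for a, nu in draws:
        u = [mean_utils[j] + deviation(j, a, nu) for j in range(n_products)]
        top = max(0.0, max(u))                      # guard against overflow in exp
        expu = [math.exp(x - top) for x in u]
        denom = math.exp(-top) + sum(expu)          # exp(-top) is the outside good
        for j in range(n_products):
            shares[j] += expu[j] / denom / len(draws)
    return shares


# Usage with a hypothetical two-cell joint distribution and no consumer heterogeneity.
DRAWS = draw_types({("white", "1"): 0.7, ("nonwhite", "2"): 0.3}, n_coefs=2)
SHARES = simulated_shares([0.0, 0.0], lambda j, a, nu: 0.0, DRAWS)
```

With zero mean utilities and no heterogeneity, the two inside goods and the outside good are equally attractive, so each inside share equals one third.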
3.3.2. Demand model results. Table 8 contains estimates of the product characteristic
parameters for the demand model.16 The table contains two columns of results. In the
first column, results incorporate draws taken from demographic distributions formulated as
products of marginal distributions, while the second column results incorporate draws taken
from the demographic distributions estimated using GMM; 200 draws from each store are
used in both cases.
The results show the choice of demographic distribution to produce results that are
substantially different, though the differences are not generally statistically significant.
For example, the price coefficient is two units larger in absolute value when the GMM
estimated distribution is used. This difference produces own- and cross-elasticity estimates
(not reported here) that are 5–10 percent larger for the model using the GMM estimated
demographic distributions.
Finally, the objective function value indicates that the model using the GMM estimated
joint distributions fits the data better. The overidentifying conditions are not rejected at the
5 percent level for this model, while they are rejected at the 5 percent level for the model
with draws from the product of the marginals.
16 The demand model also includes month and brand dummies. Instrumental variables issues are discussed in
Romeo and Sullivan.
Table 8. Mixed logit results for the Dominick’s bath tissue data.

                           Individual types sampled from joint distribution formulated as
Variables                  Product of marginals      GMM estimate
Mean parameters
  2-ply                    −0.802† (0.141)           −0.915† (0.155)
  Plus                     −2.459∗∗ (1.307)          −2.575∗ (1.114)
  Double                   5.362† (0.695)            5.981† (0.566)
  Color                    0.336 (0.236)             0.181 (0.211)
  Sale                     1.340 (1.215)             1.756 (1.234)
  Price                    −25.906† (4.955)          −27.826† (4.852)
Random coefficient standard errors (ϒ)
  2-ply                    0.254† (0.067)            0.447† (0.088)
  Plus                     0.146 (2.920)             0.094 (2.041)
  Double                   0.369† (0.093)            0.348† (0.087)
  Color                    1.518† (0.268)            1.346† (0.473)
  Sale                     0.027† (0.007)            0.020† (0.006)
  Price                    0.015 (0.012)             0.021 (0.018)
Demographic interactions
  Double ∗ persons/hh      0.434 (0.371)             0.358 (0.473)
  Color ∗ nonwhite         −4.045† (0.440)           −3.650† (0.473)
  Price ∗ nonwhite         −0.140 (1.031)            −0.623 (1.289)
  Price ∗ persons/hh       −1.367† (0.338)           −1.455† (0.438)
  Price ∗ income           −0.723 (1.334)            −0.400 (0.960)
Sample size                13878                     13878
GMM function value: q      49.256                    44.319
P(χ²_33 > q)               0.020                     0.057

(Standard errors in parentheses)
Two tailed test significance levels: ∗∗ Significant at the 10% level; ∗ Significant
at the 5% level; † Significant at the 1% level.
4. Conclusions
We develop a new approach to the problem of estimating a joint distribution from its
associated marginal distributions and auxiliary information. This problem has been studied
in different contexts for more than 60 years. Our GMM approach is the first fully parametric
solution to this problem. This approach is competitive with the previously developed IPF
and Bayesian solutions in terms of model fit, and it generally requires fewer parameters than
either previous approach. The algorithm is straightforward to program and convergence of
the GMM criterion is virtually instantaneous even for the relatively large joint distributions
modeled in our second illustration.
We test the empirical importance of the formulation of the joint distribution for demographics in a mixed logit demand model. Results obtained using draws from a joint
distribution formulated as a product of marginal distributions are compared with results obtained drawing from a joint distribution estimated by our GMM procedure. The estimates
show substantial though generally not statistically significant differences.
Acknowledgments
We thank Peter Rossi and two anonymous referees for comments. Their insights greatly
improved the quality of this paper. The author is responsible for all remaining errors.
References
Berry, S. (1994). “Estimating Discrete-Choice models of Product Differentiation,” RAND Journal of Economics
25(2), 242–262.
Berry, S., J. Levinsohn, and A. Pakes. (1995). “Automobile Prices in Market Equilibrium,” Econometrica 63(4),
841–890.
Berry, S., J. Levinsohn, and A. Pakes. (2003). “Differentiated Products Demand Systems from a Combination of
Micro and Macro Data: The New Car Market,” Journal of Political Economy 112(1), 68–104.
Bishop, Y.M.M., S.E. Fienberg, and P.W. Holland. (1975). Discrete Multivariate Analysis: Theory and Practice.
Cambridge, MA. The MIT Press.
Chintagunta, P. and J.-P. Dube. (2003). Estimating an SKU-Level Brand Choice Model Combining Household
Panel Data and Store Data. Mimeo, University of Chicago, Graduate School of Business.
Davis, P. (1998). Spatial Competition in Retail Markets. Mimeo, MIT Sloan School of Management, Cambridge,
MA.
Deming, W.E. and F.F. Stephan. (1940). “On a Least Squares Adjustment of a Sampled Frequency Table when
the Expected Marginal Totals are Known,” Annals of Mathematical Statistics XI, 427–444.
Dube, J.-P. (2002). Product Differentiation and Mergers in the Carbonated Soft Drink Industry. Mimeo, University
of Chicago, Graduate School of Business.
Friedlander, D. (1961). “A Technique for Estimating a Contingency Table, Given the Marginal Totals and Some
Supplementary Data,” Journal of the Royal Statistical Society Series A 124, 412–420.
Greene, W.H. (1997). Econometric Analysis, 3rd edition. Upper Saddle River, New Jersey: Prentice Hall.
Hendel, I. (1999). “Estimating Multiple-Discrete Choice Models: An Application to Computerization Returns,”
Review of Economic Studies 66, 423–446.
Kadiyali, V., K. Sudhir, and V.R. Rao. (2001). “Structural Analysis of Competitive Behavior: New Empirical
Industrial Organization Methods in Marketing,” International Journal of Research in Marketing 18, 161–
186.
Meza, S. and K. Sudhir. (2003). The Role of Strategic Pricing by Retailers in the Success of Store Brands. Mimeo,
Stern School of Business, New York University.
Nevo, A. (2001). “Measuring Market Power in the Ready-to-Eat Cereal Industry,” Econometrica 69(2), 307–
342.
Petrin, A. (2002). “Quantifying the Benefits of New Products: The Case of the Minivan,” Journal of Political
Economy 110, 705–729.
Putler, D. S., K. Kalyanam, and J.S. Hodges. (1996). “A Bayesian Approach for Estimating Target Market Potential
with Limited Geodemographic Information,” Journal of Marketing Research, XXXIII, 134–149.
Romeo, C.J. and M.W. Sullivan. (2004). “Controlling for Temporary Promotions in a Differentiated Products
Model of Consumer Demand,” EAG Working Paper 10–04, Economic Analysis Group, U. S. Department of
Justice. http://papers.ssrn.com/sol3/papers.cfm?abstract_id=569866
Smith, J. H. (1947). “Estimation of Linear Function of Cell Proportions,” Annals of Mathematical Statistics XVIII,
231–254.
Stephan, F. F. (1942). “An Iterative Method of Adjusting Sample Frequency Tables when Expected Marginal Totals
are Known,” Annals of Mathematical Statistics XIII, 166–178.