Demand estimation with unobserved choice set heterogeneity

Demand estimation with unobserved
choice set heterogeneity∗
Gregory S. Crawford†
Rachel Griffith‡
Alessandro Iaria§
April 22, 2016
Abstract
[Preliminary and incomplete: please do not cite without authors’ consent]
We present a method to estimate demand parameters in the presence of unobserved choice set
heterogeneity. We describe the estimation bias that arises when the econometrician mistakenly
imputes to individuals supersets of their true but unobserved choice sets. We propose procedures
related to the Chamberlain (1980) Fixed Effect estimator that allow for unobserved and heterogeneous choice sets, and rely on assumptions about the stability or evolution of choice sets over at
least two choice occasions. We present specification tests and illustrate the ideas using householdlevel scanner data in the market for ketchup in the UK. Our results show that accounting for
unobserved choice set heterogeneity is critical for accurately estimating preferences and bounding
price elasticities of demand.
∗ We would like to thank Clément de Chaisemartin, Xavier D’Haultfœuille, xx, xx, and seminar participants at xx,
xx, and ECARES for helpful comments. Griffith gratefully acknowledges financial support from the European Research
Council (ERC) under ERC-2009-AdG grant agreement number 249529, and the Economic and Social Research Council
(ESRC) under the Centre for the Microeconomic Analysis of Public Policy (CPP), grant number RES-544-28-0001.
Iaria gratefully acknowledges financial support from the research grant Labex Ecodec: Investissements d’Avenir (ANR11-IDEX-0003/Labex Ecodec/ANR-11-LABX-0047).
† Dept. of Economics, University of Zürich and CEPR, [email protected]
‡ University of Manchester and IFS, [email protected]
§ CREST (Paris) and IFS, [email protected]
1
1
Introduction
The application of revealed preference arguments to estimate demand and recover preference parameters is one of the primary applications of empirical economics (Samuelson (1938), Blundell (2005)
and Varian (2006)). Traditional approaches rely on complete knowledge of the choice set from which
decisions makers are choosing, yet, there are many reasons why choice sets may be restricted and heterogenous across consumers (see, inter alia, Bernheim and Rangel (2009), DellaVigna (2009), Koszegi
and Rabin (2007), Hausman (2008), Spiegler (2011), Eliaz and Spiegler (2011), Manski (1977), Manzini
and Mariotti (2014), Masatlioglu et al. (2012), Reutskaja et al. (2011)). Researchers and policy makers
are interested in studying the empirical relevance and consequences of choice set heterogeneity. The
recent applied literature has sought to accommodate choice set heterogeneity by bringing in additional
information on their distribution in the form of further data or functional form restrictions (Goeree
(2008) and Van Nierop et al. (2010)), De los Santos et al. (2012)). However, we rarely have good
information on true choice sets, or specific knowledge of the process of choice set formation, and even
when we believe we are able to model the process, there is likely to be uncertainty about the precise
functional form.
Our contribution in this paper is to propose an econometric estimator that allows us to consistently estimate preference parameters, and to bound market shares and functions of market shares
such as demand elasticities, when consumers face heterogeneous choice sets that are unobserved to the
econometrician. Our proposed solution is related to Chamberlain (1980) and McFadden (1978), who
show that we can identify preferences using subsets of products known to belong to the true choice
set. We use this intuition and show that it can be extended to exploit observed purchase decisions to
construct “sufficient sets” of products that lie within consumers’ true but unobserved choice sets. We
require some minimal restrictions to preference heterogeneity, and assumptions about the stability or
evolution of unobserved choice sets over at least two choice occasions. These are considerably weaker
than requiring complete information about choice sets or the process by which they are determined.
They allow us to identify preference parameters in the presence of very general unobserved heterogeneity in both choice sets and preference parameters, and to test the validity of parametric models
of choice set formation, such as estimated in Goeree (2008).
2
These sufficient sets, coupled with the convenient functional form of the logit model, allow us
to “difference out” consumers’ true, unobserved-to-the-econometrician, choice sets. Importantly, we
allow for unobserved preference heterogeneity in the form of mixed logit with latent classes, individualproduct specific fixed-effects, or the nested logit, and any choice set formation mechanism that is
compatible with these forms of heterogeneity. The choice set formation mechanism can itself be a
function of all systematic utilities, including the product characteristics and the preference parameters,
which includes the choice set formation mechanisms specified in Chiang et al. (1998), De los Santos et
al. (2012), Draganska and Klapper (2011), Draganska et al. (2009), Goeree (2008), and Van Nierop et
al. (2010). In common with these papers, we can not allow for choice set formation processes that are
driven by the idiosyncratic individual-time varying shocks to demand. We give a number of specific
examples of “sufficient sets,” but our proposition is more general, and there could be many more
possible sufficient sets that are better suited to specific economic environments.
Estimators of demand parameters will be biased if the econometrician mistakenly includes in
consumers’ choice sets unavailable products that would have had a positive purchase probability
if available. We show that, in models with logit errors, unobserved choice set heterogeneity can
be characterized as an individual-product specific fixed effect. As a consequence, the Fixed-Effect
logit (FE logit), developed by Chamberlain (1980) to “difference out” individual-product specific
unobservables, naturally accommodates unobserved choice set heterogeneity.
The Fixed Effect logit relies on the assumption of choice set stability across at least two choice occasions and identifies preference parameters from changes in choices. This means that only preference
parameters on time varying characteristics can be identified. The remaining part of the systematic
utilities, the individual-product specific fixed effects, are differenced out. This prevents us from obtaining market shares and other potentially interesting functions of preference parameters, such as
elasticities. We show how to by-pass this lack of identification by adding restrictions on the form of
unobserved preference heterogeneity. We suggest an estimator that uses individual-specific purchase
histories to provide a sufficient set that allows us to identify all the preference parameters in the
presence of unobserved and stable choice sets. We call this model the Full-Purchase-History logit
(FPH logit). A variant of this model makes the alternative assumption that the unobserved choice
sets may be increasing over time, as for example would be the case with experience goods. We call this
model the Past-Purchase-History logit (PPH logit). Instead of comparing an individual’s decisions
3
over time we can alternatively compare two individuals at the same point in time. This approach may
be applicable if considering similar individuals making a decision in a common economic environment;
for example two 14 year old boys who own an XBOX ONE and are shopping online for video games.
We call this model the Inter-Personal logit (IP logit).
These models are complementary. The FE logit is robust to preference heterogeneity in the form of
individual-product specific fixed effects, requires choice set stability but does not allow us to identify
all elements of preferences that are needed to calculate elasticities. The FPH and IP also require choice
set stability, but by restricting the forms of unobserved preference heterogeneity allowed, they enable
us to identify all elements of consumer preferences. The PPH makes an alternative assumption on
the evolution of choice sets and also restricts the forms of unobserved heterogeneity allowed. Different
choice environments favor the assumptions underlying different sufficient sets and estimation methods,
and we provide examples where each could be relevant. We also propose tests of the choice set stability
and preference heterogeneity assumptions underlying the alternative sufficient sets.
We are often interested not only in the preference parameters, but in market shares and other
functions of parameters, for example, willingness to pay, elasticities, welfare or in undertaking counterfactual analysis. While we can point identify preference parameters, and so willingness to pay,
without further information or restrictions on the true but unobserved choice sets, we are not able to
pin down purchase probabilities and hence market shares and elasticities. We can however “bound”
them if we observe both a sufficient set, that is a subset of the true choice set, and a superset that
encompasses the true (unobserved) choice set.
Our estimator is complementary to existing empirical methods that deal with choice set heterogeneity by combining additional data and assumptions about their distribution in the population. Our
method allows us to estimate preferences when additional information on choice sets is not available
or not reliable, and to test the functional form assumptions embedded in any proposed model of choice
set formation.
This paper is related to multiple literatures in economics and marketing.
The closest paper to ours is Lu (2016). Lu (2016) proposes a method for the estimation of demand
when choice sets are unobserved that does not require the specification of a choice set formation
model. Lu (2016)’s key insight is that an individual’s true choice probability must be bounded above
by a choice probability computed on a subset of the true choice set, and below by a choice probability
4
computed on a superset of the true choice set. This allows him to set identify preference parameters by
making assumptions, or by having additional data, on supersets and subsets of the unobserved choice
sets faced by each individual in each choice situation. In contrast, our method relies on the ability of
the econometrician to observe each profile of preferences making at least two choices from the same
choice set (or a choice set that changes in predictable ways), without requiring any knowledge on what
the individual-specific unobserved choice set actually is. In principle, Lu (2016)’s method does not
require any specific assumption on the distribution of the unobserved portion of preferences, while our
method relies on having a logit component (i.e., multinomial logit, nested logit, FE logit, or mixed
logit with latent classes). However, in the implementation of the estimator, both in the simulations
and in the empirical application, Lu (2016) also assumes a multinomial logit model of preferences. On
the practical side, Lu (2016)’s method may face numerical complexities when the number of preference
parameters is larger than a few, while ours can be implemented like a standard discrete choice model,
both in terms of coding and numerical complexity. In conclusion, we see Lu (2016)’s and our method
as complements: if one has reliable information on tight supersets and subsets of the true choice sets
faced by individuals in each choice situation, then Lu (2016)’s estimator may be preferred because
more robust in terms of unobserved preferences, while when such information is not available or not
reliable, then our estimator can be favored.
There is a relevant literature that seeks to provide theoretical foundations for consumer choice
set variations based on limited consumer attention. Eliaz and Spiegler (2011) introduce boundedly
rational consumers that use screening criteria to reduce the number of “relevant” alternatives among
which they must choose and explore how firms can use marketing techniques to manipulate these sets.
Masatlioglu et al. (2012) allow for limited consumer attention in a more general demand environment
and explore whether and when preferences can be separately identified from attention (or consideration) when both preferences and consideration are non-stochastic, while choices and choice sets are
observed. If, for example, a consumer’s choice x changes when a product y is removed from the choice
set, then it must be that both y must have been considered (“Revealed Attention”) and that she
prefers x over y (“Revealed Preference”). Manzini and Mariotti (2014) observe that the deterministic
nature of the model proposed by Masatlioglu et al. (2012) may be at odds with stochastic choice data,
and extend the framework to allow for stochastic consideration. They show that this enables the
separate identification of preferences from attention with stochastic choice data. From classic decision
5
theory (Debreu (1959)) it is known that information on both prices and income is necessary to pin
down households’ budget constraints and ultimely choice sets, and it is relatively rare that we have
data with good information on both prices and incomes (standard budget surveys measure income
but not prices, while transaction level purchase data have information about prices, but rarely include
detailed information about households’ income).
There is a large number of empirical papers in economics and marketing based on both aggregate
and individual-level datasets (leading examples are Berry et al. (1995) and Nevo (2001), but see also
Ackerberg et al. (2007) and Train (2009) for recent surveys). Researchers typically assume that all
consumers have access to the same choice set, consisting of all the products in the market (Berry et
al. (1995)) or the highest market share products (Nevo (2001)). Identification of preferences relies on
revealed preference arguments (see Samuelson (1938), Varian (2006) and for more recent discussions
Koszegi and Rabin (2007) and Hausman (2008)), which require that consumers consider all relevant
alternatives from the choice set. This literature seeks to empirically accommodate choice set heterogeneity by modelling the choice set formation process in the estimation of preferences. Some papers
in this literature have information directly about consumers’ choice sets, while others model them as
a function of relevant observables. Conlon and Mortimer (2013) exploit periodic information about
product availability from vending machines to demonstrate the importance of accounting for stockouts on estimating preferences and the profit consequences of stock-outs. However, more common
is the estimation of models of choice set formation. For example, papers by Chiang et al. (1998),
Draganska and Klapper (2011), Draganska et al. (2009), Goeree (2008), and Van Nierop et al. (2010),
all building on Manski (1977), specify models of probabilistic consideration sets: a consumer’s unconditional purchase probability for product j is the sum across all possible choice sets including j of the
purchase probability for j given that choice set times the probability of that choice set. In Chiang
et al. (1998), the prior distribution of each choice set containing j is equal. Goeree (2008) models
the probability of a product being in a choice set as a function of advertising; identification relies
on differential exposure to advertising media across consumer demographics coupled with differential
advertising strategies across firms. The explicit estimation of a choice set formation model requires
additional information with respect to usual choice data. Whenever this additional information, i.e.
more detailed data and knowledge of the appropriate choice set formation model, is not available, our
proposed approach is more appropriate. Indeed, our proposed method exclusively relies on standard
6
choice data and does not require the estimation of choice set formation models. Moreover, and differently from existing methods, our estimators do not suffer from any curse of dimensionality in the
number of products sold in the market.
The rest of the paper is structured as follows. In section 2 we describe the problems that arise
from unobserved choice set heterogeneity. We show that estimators of demand parameters that
mistakenly impute to consumers supersets of their true choice sets will be biased if the products
mistakenly added would have positive purchase probability when available. In section 3 we illustrate
why existing methods may not easily address the problems introduced by unobserved choice sets and
derive alternative estimators that can accommodate unobserved choice set heterogeneity. In section
4 we discuss the economic intuition and specification tests that would allow the econometrician to
choose between alternative estimators. Section 5 discusses how we can use the parameter estimates to
obtain bounds on predicted choice probabilities and elasticities. In section 6 we illustrate our methods
in practice using household-level scanner data on the market for ketchup in the UK. In section 7 we
provide some concluding remarks. The appendices provide additional detail, auxiliary findings and
further examples.
2
Bias from Unobserved Choice Set Heterogeneity
Unobserved choice set heterogeneity introduces a bias in demand estimation. Applying revealed
preference arguments to supersets of choice sets, which include unavailable products that would have
positive purchase probabilities if available, will yield biased estimates of preference parameters since
the model would be forced to rationalise observed choices that are inconsistent with consumer’s true
preferences. In this case, we can write a consumer’s probability of choosing a product from their
true choice set as the probability they choose from a superset divided by the sum of the probabilities
she would choose any of the products in her specific choice set. This latter probability induces an
estimation bias whenever not accounted for, and such a bias can be substantial. In this section we
introduce the primitives of the choice environment and formalize the mechanism by which unobserved
choice sets, if not properly handled, are likely to introduce an estimation bias in demand.
7
2.1
Preference Fundamentals
Let there be i = 1, . . . , I types of decision makers and for each type t = 1, . . . , T different choice opportunities. For example, this could be one individual observed making several consecutive independent
decisions, or several observationally equivalent individuals each simultaneously making an independent purchase decision. For both kinds of data, denote i’s sequence of choices by Yi = (Yi1 , . . . , YiT ).
For simplicity, we assume to observe exactly T choice situations for each consumer type i, but this is
not important for our results.
We consider a situation in which consumer type i is matched, in choice situation t, to choice set
?
. We are interested in the estimation consequences of mistakenly imputing to the pair (i, t) a
CSit
∗
⊆ St .
superset, St , that includes options that were not considered by i in choice situation t, i.e. CSit
For notational convenience, we assume each i to face the same St , but this is inessential. An example
of a commonly used superset from the empirical literature on demand estimation is to include the
union of all alternatives available to any consumer in a certain time period. Another example would
be to include all products above a given market share, or a given number of products with the largest
market shares.
Denote by × the cartesian product and let the superset of choice sequences be S = ×Tt=1 St . By
construction, any observed choice sequence Yi must belong to the set of possible choice sequences
?
CS ?i = ×Tt=1 CSit
.
Let preferences be defined by a parameter vector θ and the probability with which i is matched
to a given set of possible choice sequences, CS ?i = c, be given by Pr [ CS ?i = c| γ], with γ a vector of
parameters governing this process.
Given θ and a specific match with a set of possible choice sequences, each consumer type i is
observed to make a sequence of choices Yi . We assume that the conditional indirect utility of alternative
j in choice situation t for consumer type i is
Uijt = Vijt (Xit , θ) + ijt ,
where Xit = [Xi1t , . . . , Xijt , . . . , XiJt ] is the matrix of (i, t)-specific observable characteristics, while
ijt is a portion of i’s utility that is unobserved to the econometrician. Note that the function Vijt has
8
an i subscript, indicating that so far we have not made any assumptions on the form of unobserved
consumer heterogeneity.
The probability that i is matched to the set of possible choice sequences CS ?i = c and makes a
sequence of choices Yi = j is then
Pr [ Yi = j, CS ?i = c| θ, γ]
= Pr [ Yi = j| CS ?i = c, θ] Pr [ CS ?i = c| γ] .
(2.1)
Equation (2.1) is quite general and captures two different features of behavior typical of demand
in most product markets. The first term, Pr [ Yi = j| CS ?i = c, θ], embodies consumer preferences for
products given the choice sets they are matched to. The second term, Pr [ CS ?i = c| γ], embodies
the likelihood of consumers to be matched to their (unobserved) set of possible choice sequences,
CS ?i = c. This process could be a function of consumers’ decisions, for example due to consumer
search or due to stategic decisions by firms. These factors are represented by the parameter vector
γ. In principle, γ could include some or all of the parameters that are in θ. We make the following
important assumption:
Assumption 1. Pr [ Yi = j| CS ?i = c, θ] from (2.1) belongs to the logit family for any c.
Note that, compatibly with Assumption 1, Vijt (Xit , θ) may accomodate flexible forms of unobserved preference heterogeneity such as nested logit, mixed logit, or multinomial logit with individualalternative specific fixed effects. In addition, and consistently to models that describe individuals being
matched to choice sets according to consumer search and/or stategic decisions by firms, Pr [ CS ?i = c| γ]
can take any form and be a function of any element of Pr [ Yi = j| CS ?i = c, θ].
Assumption 1 is ubiquitous in empirical work both among papers with general demand estimation
purposes, such as (Berry et al., 1995, p864-868) and (Berry et al., 2004, p76), and among papers
that precisely focus on demand estimation with unobserved choice sets, such as (Draganska and
Klapper, 2011, p660), (Draganska et al., 2009, p110), and (Goeree, 2008, p1025). Indeed, any model
of individual choice behavior based on the logit family (including mixed logit) relies on Assumption
1. Extending the methods introduced in this paper to demands without any logit component is a
promising topic for further research (e.g., probit).
Under Assumption 1, Pr [ CS ?i = c| γ] can be ignored in the estimation of θ. This is so since with
Assumption 1 we assume to know the functional form for the first term in (2.1). Thus, even if γ
9
contains elements of θ, failing to account for the matching of consumers to choice sets only impacts
the efficiency of our estimator of θ, not its consistency.
2.2
The Bias Formula
Given Assumption 1, the conditional probability of i selecting choice sequence Yi = j, given their set
of possible choice sequences, CS ?i = c = ×Tt=1 ct , is:
Pr [ Yi = j| CS ?i = c, θ] =
T
Y
exp (Vijt (Xit , θ))
.
m∈CS ? =ct exp (Vimt (Xit , θ))
P
t=1
(2.2)
it
By contrast, the typical application specifies the conditional probability of i choosing Yi from a
superset, S = s = ×Tt=1 st :
Pr [ Yi = j| S = s, θ] =
T
Y
exp (Vijt (Xit , θ))
,
m∈St =st exp (Vimt (Xit , θ))
P
t=1
(2.3)
where the difference between (2.2) and (2.3) lies in the summations in their denominators. Even given
Assumption 1, if consumer type i makes decisions from the set of possible choice sequences CS ?i = c,
a likelihood function derived from (2.3) will generally lead to an inconsistent estimator of θ.
In what follows we illustrate why. If i were to choose from S = s, the superset of choice sequences,
then the probability of selecting choice sequence Yi = j ∈ CS ?i = c ⊆ S = s would be:
Pr [ Yi = j| S = s, θ]
= Pr [ Yi = j| CS ?i = c, θ] Pr [ Yi ∈ CS ?i = c| S = s, θ]
(2.4)
T
Y
P
? =c exp (Vimt (Xit , θ))
exp (Vijt (Xit , θ))
m∈CSit
t
P
P
=
,
exp
(V
(X
,
θ))
?
imt
it
m∈CS =ct
r∈St =st exp (Virt (Xit , θ))
t=1
it
where Pr [ Yi ∈ CS ?i = c| S = s, θ] is the probability that i’s choice sequence will belong to CS ?i = c
whenever she selects from the superset of choice sequences S = s. Pr [ Yi ∈ CS ?i = c| S = s, θ] is a
measure of the importance, in terms of i’s preferences, of the choice sequences in CS ?i = c relative to
those in the superset S = s.
10
Re-arranging (2.4), we can express (2.2) in terms of the superset of choice sequences, S = s:
Pr [ Yi = j| CS ?i = c, θ]
= Pr [ Yi = j| S = s, θ]
Pr [ Yi ∈
CS ?i
1
= c| S = s, θ]
T
Y
exp (V
(Xit , θ) − ln(Pr [ Yi ∈ CS ?i = c| S = s, θ]))
X
exp (Vimt (Xit , θ))
t=1
m∈St =st
!!
P
? =c exp (Vimt (Xit , θ))
m∈CSit
t
P
T exp Vijt (Xit , θ) − ln
Y
r∈St =st exp (Virt (Xit , θ))
X
.
=
exp (Vimt (Xit , θ))
t=1
=
ijt
(2.5)
m∈St =st
The final two lines of (2.5) show how, if ijt is i.i.d. Type I Extreme Value, the model conditional on CS ?i = c can be written as a choice probability from S = s augmented by a term whose
omission may cause a bias. The term ln(Pr [ Yi ∈ CS ?i = c| S = s, θ]) is a function of the same systematic utilities that enter the numerator and denominator and will generally cause a bias unless
Pr [ Yi ∈ CS ?i = c| S = s, θ] = 1, i.e. there will be bias unless the probability of selecting one of the
choice sequences not in CS ?i = c equals 0 if i faced the superset of choice sequences S = s. While
Pr [ Yi ∈ CS ?i = c| S = s, θ] is never exactly 1 in a logit model (or any random utility model with unbounded support for ijt ), continuity suggests that if we mistakenly attribute to consumer type i’s set
of possible choice sequences a choice sequence that she does not like, then the expected bias should
be small. However, if we mistakenly attribute a choice sequence that she really does like, but did not
select because it was not included in CS ?i = c, then the bias is expected to be large.
11
2.3
The Size of the Bias
Table 2.1: The size of bias in Universal Logit
Bias (SD)
RMSE
Increasing product differentiation, removing first best
2
(V1 − V2 )/V1 ' 10%, σX
= 1.75
-0.379
0.379
(0.016)
2
(V1 − V2 )/V1 ' 30%, σX
= 2.5
-0.471
0.471
(0.012)
2
(V1 − V2 )/V1 ' 50%, σX
= 3.5
-0.578
0.579
(0.012)
2
(V1 − V2 )/V1 ' 70%, σX
=6
-0.771
0.771
(0.009)
Increasing share of individuals with constrained choice set
100% have full choice set
0.005
0.032
(0.032)
90% have full choice set
-0.223
0.224
(0.021)
70% have full choice set
-0.525
0.526
(0.013)
50% have full choice set
-0.719
0.719
(0.007)
Increasing share of products randomly removed from choice set
5 out of 5 available
0.005
0.032
(0.032)
4 out of 5 available
-0.525
0.526
(0.013)
3 out of 5 available
-0.719
0.719
(0.007)
2 out of 5 avialable
-1.139
1.139
(0.003)
3
Correcting the Bias from Unobserved Choice Set Heterogeneity
Chamberlain (1980)’s Fixed-Effect logit (FE logit) allows us to consistently estimate the preference
parameters associated to time-varying regressors, which we show in section 3.2. In section 3.3, building
on the insights from the FE logit model, we propose a broad class of models, of which the FE logit
is a member, that relies on individual-specific observed purchase behavior to construct “sufficient
sets” that difference out true but unobserved choice sets. In general, to obtain predicted purchase
probabilities and their functions, such as elasticities, we need to be able to identify the whole vector
12
θ. The FE logit does not usually allow for the identification of the entire vector θ. However, in section
3.5 we present specific examples of “sufficient sets” that, at the cost of some minimal additional
restrictions on preference heterogeneity, allow us both to difference out consumers’ unobserved choice
sets and to identify the entire preference vector θ. However, before turning to our proposed solutions,
we first discuss why conventional methods based on alternative specific constants, random coefficients,
and variants of Manski (1977) are unlikely to fully deal with the problem.
3.1
Why Some Existing Methods Do Not Solve the Problem
Several methods routinely implemented in demand estimation do not generally avoid bias from unobserved choice set heterogeneity. To help see this denote the bias term that enters the numerator
of equation (2.5) as ln (πi ) = − ln(Pr [ Yi ∈ CS ?i = c| S = s, θ]). Econometricians are used to thinking that most choice-based sampling issues in logit models can be addressed by adding a full set of
alternative-specific constants (see Lerman and Manski (1981) and Bierlaire et al. (2008)). This is not
the case for our problem, because ln (πi ) is an individual -specific term that only affects the numerator
of the logit formula.
We cannot treat the term causing the bias, ln(πi ), as a random coefficient, because the estimation of
models with random coefficients, such as the mixed logit model, relies on an independence assumption
between the distribution of random coefficients and the observables of the model (see (Berry et al.,
1995, p865), (Berry et al., 2004, p76), (McFadden et al., 2000, p447), and (Nevo, 2001, p314)). In this
situation, as is made clear by equation (2.5), by construction the distribution of ln(πi ) is a function
of the Xit . Furthermore, we cannot treat the ln(πi )’s as unobserved fixed effects and estimate them
with individual-specific dummy variables, because of the incidental parameters problem.
The most popular approach used in the literature is to jointly model both choice set formation and
the purchase decision given a choice set. As originally discussed by Manski (1977), the unconditional
probability of i selecting choice sequence j can be written as:
Pr[Yi = j|θ, γ] =
X
Pr[Yi = j|CS ?i = c, θ] Pr[CS ?i = c|γ],
(3.1)
c∈Ci?
where Ci? is the collection of sets of possible choice sequences to which consumer type i can be
matched. By having information on the matching process between consumer types and choice sets,
13
we can integrate out unobserved choice set heterogeneity, as it is routinely done with unobserved
preference heterogeneity. However, there are several instances in which the approach proposed by
Manski (1977) may not fully solve the problem.
One instance is if we do not have precise information on the correct functional form, which is
necessary in order to implement these models in practice. Another, that is perhaps less obvious, is
that a new problem that is analogous to choice set misspecification may arise in a model that tries
to integrate out choice sets. This applies to the collection of choice sets over which expectations are
taken, Ci? , since such a support can easily be misspecified. Even when Pr[Yi = j|θ, γ] is correctly
specified, in practice its estimation may prove difficult. Indeed, this model suffers from a curse of
dimensionality related to the number of elements in Ci? , which grows exponentially in the number of
products J sold in the market. We illustrate instances of support misspecification in the context of
three popular papers that rely on Manski (1977)’s approach.
First consider Goeree (2008). In our notation, equation (3) in Goeree (2008) can be written as:
Pr[Yit = jt |θ, γ] =
X
c∈C j
"
#
Y
Y
exp (Vijt t (Xit , θ))
P
φilt (γ)
(1 − φikt (γ)) ,
r∈c exp (Virt (Xit , θ))
l∈c
(3.2)
k∈c
/
where C j is the collection of all the choice sets that include product j. This specification relies on:
exp (Vijt t (Xit , θ))
and
r∈c exp (Virt (Xit , θ))
?
• our Assumption 1, with Pr[Yit = jt |CSit
= c, θ] = P
• the assumption that consideration of each product is independent from the consideration of the
Q
Q
?
other products: Prit [CSit
= c|γ] = l∈c φilt (γ) k∈c
/ (1 − φikt (γ)).
Even given this second assumption, in Goeree’s application the total number of PCs is still very
large (2112), so the non-parametric estimation of all the φ’s is not feasible. Consequently, Goeree
further assumes that
φilt (γ) =
exp (Wilt (γ))
.
1 + exp (Wilt (γ))
(3.3)
This implies that every c ∈ C j will have a strictly positive probability in the distribution of choice
?
sets for each (i, t), so that Prit [CSit
= c|γ] > 0 for every (i, t). However, it may be that for some (i, t)
?
combination Prit [CSit
= c|γ] = 0 for some c, i.e. c is not in the support of the choice set distribution
14
to which individual i can be matched to in period t:
?
Prit [CSit
= c|γ] =

Y
Y

 φ (γ)

(1 − φikt (γ))
ilt

l∈c
j?
if c ∈ Cit
(3.4)
k∈c
/



0
j
if c ∈ C \
j?
Cit
,
j?
where Cit
is the collection of choice sets to which individual i can possibly be matched to in period
j?
is typically unobserved and heterogeneous across (i, t) combinations, model (3.2) and
t. Since Cit
(3.3) will suffer from support misspecification whenever there exists even one observation where the
true collection of unobserved choice sets to which an individual can actually be matched is restricted,
j?
⊂ C j . With support misspecification, any estimator of model
i.e. ∃ (i, j, t) combination such that Cit
(3.2) and (3.3) will be inconsistent.
To be clear, computing expectations over the power set of the universal set, as with C j in (3.2),
would not be a problem if we could afford to estimate a truly flexible specification for φ that was able to
?
accomodate Prit [CSit
= c|γ] = 0 whenever necessary. The problem arises because we are not usually
able to estimate a truly flexible model for φ, and we need to make additional assumptions along the
lines of (3.3). Taken together, C j in (3.2) and (3.3) may introduce bias due to the potential inclusion
of infeasible choice sets. At the very least, the approach we develop will be a useful complement to
allow testing of the functional form assumptions made with such an approach, but more generally we
provide a more robust approach to estimating demand parameters in a context where choice sets are
unobserved and heterogenous.
Two other papers that take a similar approach to Goeree (2008) are Van Nierop et al. (2010) and
?
Draganska and Klapper (2011). Van Nierop et al. (2010) assume that the model for Prit [CSit
= c|γ]
is a J-dimensional multivariate normal distribution, which again cannot be 0 for any c. In the
corresponding equation to (3.2), they compute expectations over C j , associating positive mass to
j?
every c ∈ C j rather than only to each c ∈ Cit
⊂ C j (see their equation (8)). Draganska and Klapper
?
(2011) assume Prit [CSit
= c|γ] to be a multinomial logit model with choice set C j , and compute
expectations over C j (see their equation (5)), where again the multinomial logit model cannot be
exactly 0 for any c ∈ C j .
15
3.2
Fixed Effect Logit
As shown in section 2.2, a misspecified model with unobserved choice set heterogeneity is equivalent
to a model in which we impute to each i a superset S = s and we have a (further) additive unobserved
individual-specific fixed effect in the numerator of the logit formula, which is correlated to all other
components of the model. This bias term, see equation (2.5), can be written as an individual-specific
effect, ln (πi ) = − ln(Pr [ Yi ∈ CS ?i = c| S = s, θ]), which suffers from the incidental parameter problem.
Chamberlain (1980)’s FE logit was developed to difference out individual-alternative specific fixed
effects, and so will also naturally address the problem of unobserved choice sets.
Chamberlain (1980) takes a conditional likelihood approach, where “the key idea is to base the
likelihood function on the conditional distrubtion of the data, conditioning on a set of sufficient
statics for the incidental parameters.” The conditional log likelihood function does not depend on the
incidental parameters so maximum likelihood yields consistent estimators. Chamberlain (1980)’s FE
logit is a special example of a conditional logit model, which we show in the next section.
In general, to obtain predicted purchase probabilities and their functions, such as elasticities, we
need to be able to identify the whole vector of preference parameters θ. The FE logit does not usually
allow for the identification of the entire vector θ, but only for those elements of θ associated to timevarying observables, such as prices. However, at the cost of some minimal additional restrictions on
preference heterogeneity, we can construct alternative sufficient sets that allow us both to difference
out consumers’ unobserved choice sets and to identify the entire preference vector θ. In this section
we present an example of such a sufficient set, whose associated conditional logit model we call Full
Purchase History logit.
3.3
Sufficient Sets
Conditional logit models are obtained from conditioning the original logit model on some statistic or
function of observables. In the case of the FE logit, such a statistic is the random set P(Yi ), whose
exact realization depends on i’s observed choice sequence Yi . In what follows, we propose a class of
random sets with the property that the associated conditional logit models will be independent of
the true but unobserved choice sets. Because these random sets act as sufficient statistics for the
unobserved choice sets, we call them sufficient sets.
Consider the following condition, which we will use to evaluate possible solution methods:
16
Condition 1.
Given any choice sequence Yi ∈ CS ?i , there exists a correspondence f such that
f (Yi ) ⊆ CS ?i .
Condition 1 states that we can use i’s observed choice sequence Yi = j to construct a set of
choice sequences f (Yi = j) = r that is always included in their unobserved set of possible choice
sequences CS ?i = c. In what follows, we show that if Assumption 1 and Condition 1 hold, then f (Yi )
is a sufficient statistic for CS ?i . Afterward, we will present alternative economic assumptions on the
?
stability or evolution of CSit
across the T choice situations that lead to f ’s that satisfy Condition 1
and imply the following proposition.
Proposition 1.
If Assumption 1 and Condition 1 hold, for every consumer type i and choice se-
quence j such that f (Yi = j) = r:
QT
exp(Vijt (Xit , θ))
.
QT
k∈f (k)=r
t=1 exp(Vikt (Xit , θ))
t=1
Pr[Yi = j|f (Yi ) = r, θ] = P
Then, θ can be consistently estimated by Maximum Likelihood from Pr[Yi = j|f (Yi ) = r, θ].
17
Proof.
Pr[Yi = j|f (Yi ) = r, θ]
=
Pr[f (Yi ) = r, Yi = j, CS ?i = c|θ, γ]
Pr[f (Yi ) = r, CS ?i = c|θ, γ]
=
Pr[f (Yi ) = r|Yi = j, θ] Pr[Yi = j|CS ?i = c, θ] Pr[CS ?i = c|γ]
Pr[f (Yi ) = r|CS ?i = c, θ] Pr[CS ?i = c|γ]
Pr[f (Yi ) = r|Yi = j, θ] Pr[Yi = j|CS ?i = c, θ]
= X
Pr[f (Yi ) = r|θ, Yi = k] Pr[Yi = k|CS ?i = c, θ]
(3.5)
k∈U
T
Y
exp(Vijt (Xit , θ))
v∈CS ? =ct exp(Vivt (Xit , θ))
P
t=1
=
X
it
T
Y
exp(Vikt (Xit , θ))
v∈CS ? =ct exp(Vivt (Xit , θ))
P
k∈f (k)=r t=1
it
QT
exp(Vijt (Xit , θ)
QT
k∈f (k)=r
t=1 exp(Vikt (Xit , θ))
=P
t=1
The first two equalities follow from applying Bayes’s rule. In the third equality, U is the universal
set of all choice sequences. The fourth equality follows from Pr[f (Yi ) = r|Yi = k, θ] being 1 for any
k ∈ f (k) = r and 0 otherwise. This is an extreme case of McFadden (1978)’s Uniform Conditioning
Property. Here this property holds since, conditional on a realization of Yi , Yi = k, f (Yi ) is not a
random set: f (k) is either r with probability one, or different from r with probability one. In the
P
last equality, v∈CS ? =ct exp(Vivt (Xit , θ)) cancels out. In conclusion, consistency of the Maximum
it
Likelihood estimator of θ follows from McFadden (1978). Proposition 1 implies that, in the presence of unobserved choice set heterogeneity, parameters θ can
be consistently estimated by Maximum Likelihood on the basis of the conditional logit model Pr[Yi =
j|f (Yi ) = r, θ]. Whenever CS ?i = c is observable, the econometrician can easily detect appropriate
?
subsets of CSit
= ct for any (i, t) combination, and rely on McFadden (1978) to consistently estimate
θ. However, whenever CS ?i = c is unobservable, the econometrician needs to be careful in constructing
the set of choice sequences f (Yi = j). Indeed, if f (Yi = j) is not a subset of CS ?i = c, then Condition
18
1 does not hold and we can use our earlier argument to show inconsistency of any estimator based
on Pr[Yi = j|f (Yi ) = r, θ]: the econometrician is mistakenly enlarging i’s set of potential choice
sequences.
3.4
Unobserved Consumer Heterogeneity
Proposition 1 is compatible with unobserved preference heterogeneity in the form of mixed logit
models with discrete mixtures or latent classes. For example, suppose that each i has individualspecific preferences βi , which are distributed according to p ( βi = b| CS ?i = c, θ) = p ( βi = b| θ) in the
population. It is typical in the literature to assume that the distribution of random coefficients is
independent of choice sets, CS ?i = c. When we condition a mixed logit choice probability on a subset
of the true choice set f (Yi ) = r ⊆ CS ?i = c, as in our Proposition 1, we obtain:
Pr [ Yi = j| f (Yi ) = r, CS ?i = c, θ]
= Pr [ Yi = j| f (Yi ) = r, θ]
(3.6)
Z
=
Pr [ Yi = j, βi = b| f (Yi ) = r, θ] db
b
Distr. of pref. cond. on obs. choices
Z
=
b
Pr [ Yi = j| f (Yi ) = r, βi = b]
|
{z
}
}|
{
z
p ( βi = b| f (Yi ) = r, θ)
db.
Our result for each logit of the mixture
It is common in the literature to make assumptions about the unconditional distribution of random
coefficients in the population, p ( βi = b| θ), but these usually do not restrict in convenient ways the
resulting distribution conditional on the endogenous observed choices (e.g., starting from an unconditional normal, there is no reason to believe that the conditional will still be normal). Usually, this
translates into as many distributions (of unknown form) to be estimated as the number of realizations
of the sufficient set f (Yi ) = r. Intuitively, here the identification problem would be to recover the
distribution of βi among those individuals whose observed sequence of choices gives rise to f (Yi ) = r.
There are two cases to consider.
19
In the first case, assume that there are a large enough number of consumers for each realization of
the sufficient set f (Yi ) = r. The simplest model of unobserved preference heterogeneity compatible
with (3.6) would be a model in which each f (Yi ) = r represents a potentially different class of
preferences to be estimated. All individuals with f (Yi ) = r have preferences βi = br with probability
1, with as many br ’s to be estimated as observed values of f (Yi ) = r, r = 1, . . . , R.
We could generalise that to a mixed logit model with a discrete distribution of random coefficients,
where βi has a discrete support with q = 1, . . . , Q values and each consumer belonging to group
f (Yi ) = r has preferences drawn from a categorical distribution p ( βi = bq | f (Yi ) = r, θ) = prq , q =
1, . . . , Q. Preferences would take values [b1 , . . . , bq , . . . , bQ ] with probability masses [p1 , . . . , pq , . . . , pQ ],
r
0
r
r
r
where pq = p1q , . . . , prq , . . . , pR
q , and p = p1 , . . . , pq , . . . , pQ is the distribution of preferences among
those individuals whose observed sequence of choices give rise to f (Yi ) = r. Depending on how many
individuals are in each f (Yi ) = r, we could either fix the points of the support [b1 , . . . , bq , . . . , bQ ] and
only estimate their distribution [p1 , . . . , pq , . . . , pQ ], or alternatively estimate both [b1 , . . . , bq , . . . , bQ ]
and [p1 , . . . , pq , . . . , pQ ]. A third possibility would be to make some more parametric assumption about
p ( βi = b| f (Yi ) = r, θ), for example that for every f (Yi ) = r, p ( βi = b| θr ) is distributed normal with
parameters θr and estimate θr , r = 1, . . . , R, using simulation methods.
In the second case, assume that there are too few individuals for each f (Yi ) = r, r = 1, . . . , R. In
the extreme, there may be as many different r’s as individuals. Then none of the methods described
would work, since one would always have more parameters to estimate than observations. In general,
this relates to the non-parametric identification of p ( βi = b| f (Yi ) = r, θ). This problem is current
work in progress by D’Haultfœuille and Iaria.
In both cases, once we obtain estimates pb ( βi = b| f (Yi ) = r, θ), r = 1, . . . , R, we can compute the
estimated unconditional distribution of random coefficients in the population, pb ( βi = b| θ). Indeed,
by Bayes’s rule:
pb ( βi = b| θ) =
R
X
c [f (Yi ) = r] ,
pb ( βi = b| f (Yi ) = r, θ) Pr
(3.7)
r=1
c [f (Yi ) = r] is the observed share of choice sequences that give rise to sufficient set f (Yi ) = r.
where Pr
20
3.5
Applications of Sufficient Sets
?
In this section, we discuss examples of assumptions on the stability or evolution of CSit
across the T
choice situations that lead to examples of sufficient sets that satisfy Condition 1 and that can be used
to estimate demand parameters. These are examples and may be more or less appropriate in different
economic environments, and these are not the only possibilities that would satisfy Condition 1.
3.5.1
Full Purchase History Logit
Consider a situation in which i indexes individuals who make several purchase decisions t = 1, . . . , T
at different points in time. As in the case of the FE logit, for every individual i, unobserved choice
?
sets are stable across the T choice situations (but potentially different across i’s): CSit
= CSi? , for
every i and t. To simplify exposition, we assume that choice sets must be stable across all choice
situations of each individual, but this is not essential. This method works whenever we observe at
least two purchase decisions by the same individual from the same choice set. More in general, i’s full
choice sequence of length T can be divided into sub-sequences of length of at least two. Then, the
assumption of stable choice sets implies fixed choice sets within each sub-sequence, but potentially
different choice sets between the different sub-sequences of individual i.
With this kind of dataset, the econometrician typically observes individual i’s sequence of purchase
decisions Yi = (Yi1 = j1 , . . . , YiT = jT ). As we saw in the example in section 3.2, the FE logit is
obtained by sufficient set fF E (Yi ) = P(Yi ): the set of all possible permutations of choice sequence Yi =
(Yi1 , . . . , YiT ). Note that, even if not explicitly, Chamberlain (1980) relies on the assumption of choice
set stability. Interestingly, choice set stability also implies that P(Yi ) ⊆ CS ?i , with the consequence
that, by Proposition 1, the FE logit will also accommodate unobserved choice set heterogeneity.
The FE logit was designed to identify only those preference parameters associated to time-varying
regressors, so that any individual-alternative specific fixed effect would be differenced out. However,
there can be situations in which the econometrician is not worried about fixed effects, does not observe
choice sets, and still needs an estimate of the whole vector θ, for example to evaluate the elasticity of
demand with respect to price. In such situations, the following sufficient set may be more appropriate
than fF E (Yi ) = P(Yi ).
The assumption of stable choice sets allows for another sufficient set that enables the estimation of
ST
?
all the elements of θ. Specifically fF P H (Yi ) = (Hi )T , where Hi = t=1 {Yit } ⊆ CSit
. The sufficient
21
set fF P H (Yi ) imputes to each choice situation t the collection of all the products that individual i
ever purchased in any of the T choice situations. We call the model implied by fF P H (Yi ) the Full
Purchase History logit, or FPH logit.
The FPH logit is an extreme example of a model that allows to fully identify preference parameters
θ and to difference out unobserved choice sets. It is extreme, in the sense that full identification of
θ is achieved at the cost of ruling out unobserved preference heterogeneity, as in a multinomial logit
model. Such a strong assumption is indeed not necessary for the full identification of θ. In the next
two sections, we discuss milder restrictions on the extent of preference heterogeneity that can be
accomodated compatibly with Proposition 1 and the full identification of θ: the nested logit model
and the mixed logit model with latent classes.
3.5.2
Experience Goods: Past Purchase History Logit (PPH Logit)
In this variant of our method, we allow for the choice sets of each individual i to expand over subsequent
?
?
choice situations t = 1, . . . , T : CSit
⊆ CSit+1
. Then our proposed sufficient set is fP P H (Yi ) =
St
?
×Tt=1 Hit , where Hit = b=1 {Yib } ⊆ CSit
. The sufficient set fP P H (Yi ) imputes to each choice situation
t the collection of products that individual i ever purchased in any past choice situation up to t. We call
the model implied by fP P H (Yi ) the Past Purchase History logit, or PPH logit. Symmetrically, we can
allow for the choice sets of each individual i to shrink over subsequent choice situations t = 1, . . . , T .
Up to some data re-ordering, the case of shrinking choice sets can be treated identically to the case
of expanding choice sets: if choice sets are shrinking over subsequent choice situations from t = 1
to t = T , then by re-sorting the choice situations in reverse chronological order, from the last choice
situation to the first, choice sets will be expanding over subsequent choice situations from t = T to
St
?
t = 1. Then fP P H (Yi ) = ×Tt=1 Hit , where Hit = b=T {Yib } ⊆ CSit
. Importantly, this sufficient set
can either deal with ever expanding or ever shrinking choice sets, but not with both within the same
sub-sequence of choices.
3.5.3
Inter-Personal Logit (IP Logit)
Focusing on the cross-sectional variation of the data, different choice situations, indexed by t, can be
interpreted as several individuals of a certain consumer type, indexed by i, making each a separate
purchase decision at a specific point in time from an identical choice set. In this context, the as-
22
sumption of identical choice sets among the different T individuals of the same consumer type is the
?
assumption of stable choice sets: CSit
= CSi? , for every i and t.
With cross-sectional data, the econometrician typically observes one purchase decision for each of
the T individuals of consumer type i, Yi = (Yi1 = j1 , . . . , YiT = jT ), and the observable characteristics
of the chosen products, (Xij1 1 , . . . , XijT T ). The method we propose in this section relies on the
assumption that the observable characteristics of any product j, Xijt , be the same for each of the T
individuals of consumer type i, Xijt = Xijt0 = Xij for any i, j, and t 6= t0 . For example, if individual t
and t0 of consumer type i are observed purchasing, respectively, product j at price pjt and product k
at price pkt0 , then our method will rely on the assumption that each individual could have purchased
the product bought by the other exactly at the same price, i.e. pjt = pjt0 = pj and pkt = pkt0 = pk .
Similarly, this method assumes that all the individuals of consumer type i are observationally identical
in terms of their demographics.
The assumption of an identical choice set across the T individuals of the same consumer type i gives
ST
?
rise to a sufficient set similar in spirit to fF P H (Yi ): fIP (Yi ) = (Hi )T , where Hi = t=1 {Yit } ⊆ CSit
.
The sufficient set fIP (Yi ) imputes to each individual t the collection of all the products purchased by
any of the T individuals of consumer type i. Since this sufficient set entails comparisons of purchase
behaviors between individuals, rather than within individual, we call the model implied by fIP (Yi )
the Inter-Personal logit, or IP logit.
3.6
A Practical Example
A simple example illustrates the different sufficient sets and their implied choice probabilities, as well
as how each can accommodate different economic environments. Suppose there are three products,
j = 1, 2, and 3, and two choice situations, t = 1 and 2. Consider a consumer type i with true but
?
?
unobserved choice set containing all products in both choice situations, CSi1
= CSi2
= {1, 2, 3}. In
this case, CS ?i = {1, 2, 3} × {1, 2, 3} = {(1, 1) , (2, 2) , (3, 3) , (1, 2) , (2, 1) , (1, 3) , (3, 1) , (2, 3) , (3, 2)}.
Assume consumer type i’s indirect utility of choosing product j in choice situation t is Uijt = δj +
0 0
Xijt β + ijt , where θ = [δ1 , δ2 , δ3 , β ] . Finally, i’s observed choice sequence is j = 1 in the first choice
situation and j = 3 in the second, so that Yi = (1, 3).
• Fixed-Effect logit: The sufficient set that gives rise to the FE Logit is made of all permutations
of the observed choice sequence, P(Yi ). Since Yi = (1, 3), fF E (1, 3) = P(1, 3) = {(1, 3), (3, 1)}.
23
Given this, the contribution to the likelihood function for consumer type i is:
Pr[Yi = (1, 3)|β, {(1, 3), (3, 1)}] =
exp(Xi11 β) exp(Xi32 β)
.
exp(Xi11 β) exp(Xi32 β) + exp(Xi31 β) exp(Xi12 β)
• Full Purchase History and Inter-Personal logit: In both the FPH and the IP logits, the sufficient
set is made of the choice sequences compatible with products j = 1 and j = 3 being available
S2
in both choice situations, Hi × Hi , where Hi = t=1 {Yit }. Thus, fF P H (1, 3) = Hi × Hi =
{(1, 1), (3, 3), (1, 3), (3, 1)}. In this case, i’s contribution to the likelihood function of the FPH
logit is:
Pr[Yi = (1, 3)|θ, {(1, 1), (1, 3), (3, 1), (3, 3)}]
exp(δ1 + Xi11 β) exp(δ3 + Xi32 β)
(j1 ,j2 )∈{(1,1),(1,3),(3,1),(3,3)} exp(δj1 + Xij1 1 β) exp(δj2 + Xij2 2 β)
=P
=
exp(δ1 + Xi11 β)
exp(δ3 + Xi32 β)
×
.
exp(δ1 + Xi11 β) + exp(δ3 + Xi31 β) exp(δ1 + Xi12 β) + exp(δ3 + Xi32 β)
Note that, in the case of the IP logit, it is assumed that each individual t faces the same
observable characteristics, Xi11 = Xi12 = Xi1 and Xi31 = Xi32 = Xi3 . This implies that i’s
contribution to the likelihood function of the IP logit further simplifies to:
Pr[Yi = (1, 3)|θ, {(1, 1), (1, 3), (3, 1), (3, 3)}]
=
exp(δ1 + Xi1 β)
exp(δ3 + Xi3 β)
×
.
exp(δ1 + Xi1 β) + exp(δ3 + Xi3 β) exp(δ1 + Xi1 β) + exp(δ3 + Xi3 β)
Even if the FPH and the IP logit can be analytically expressed in a similar way, the economic
assumptions they rely on are very different. In the FPH logit there is a same individual making
several purchase decisions at different points in time, while in the IP logit there are different
individuals of the same type making each a separate purchase decision at a specific point in
time.
• Past Purchase History logit: In the PPH logit, the sufficient set—at any point in time—includes
all choice sequences consistent with those products purchased up to that choice situation, Hi1 ×
24
Hi2 , where Hit =
St
b=1
{Yib }. Thus, fP P H (1, 3) = Hi1 × Hi2 = {(1, 1), (1, 3)}. In this case,
consumer type i’s contribution to the likelihood function is:
Pr[Yi = (1, 3)|θ, {(1, 1), (1, 3)}]
=
exp(δ1 + Xi11 β) exp(δ3 + Xi32 β)
exp(δ1 + Xi11 β) exp(δ1 + Xi12 β) + exp(δ1 + Xi11 β) exp(δ3 + Xi32 β)
=1×
exp(δ3 + Xi32 β)
.
exp(δ1 + Xi12 β) + exp(δ3 + Xi32 β)
Note that the FE logit with linear systematic utilities does not allow for the identification of all the
preference parameters in θ: those parameters associated with regressors that only vary in (i, j), but
not in t, will not be identified. In other words, any (i, j)-specific fixed effect will be differenced out.
In the current example, the FE logit drops the alternative-specific constants δ1 , δ2 , and δ3 ; only β is
identified. Both in the FPH and in the PPH logit, consumer type i’s choice sequence does not convey
any information about δ2 . In this sense—and identically to the standard case of observable choice
sets—only those alternative-specific constants corresponding to products with positive market shares
can be identified.
With the exception of the FE logit, each of the logit models we propose can be represented as the
product of simpler logit models. In the current example, this can be seen by moving from the first
to the second equality in both the FPH logit and the PPH logit. The same cannot be done in the
FE logit. In general, one can switch the order of multiplication and summation in logit model (3.5),
only when the sufficient set f can be defined as the cartesian product of T sets: f = ×Tt=1 ft . Among
the sufficient sets we propose, fF P H , fP P H , and fIP can be defined as the cartesian products of T
sets (either Hi or Hit ), while P(Yi ) cannot. The possibility of factoring a large logit model defined
over choice sequences into smaller logit models each defined over a single choice situation represents
a great computational advantage.
25
3.7
How well do the estimators perform
Table 3.1: Increasing the share of individuals with restricted choice set
100%
90%
70%
50%
Universal Logit
Bias (SD) RMSE
0.005
0.032
(0.032)
-0.223
0.224
(0.021)
-0.525
0.526
(0.013)
-0.719
0.719
(0.007)
True Logit
Bias (SD) RMSE
0.005
0.032
(0.032)
0.003
0.029
(0.028)
0.005
0.028
(0.028)
0.007
0.028
(0.027)
FPH Logit
Bias (SD) RMSE
0.011
0.032
(0.030)
0.008
0.028
(0.027)
0.012
0.031
(0.029)
0.014
0.030
(0.027)
FE Logit
Bias (SD) RMSE
0.024
0.088
(0.085)
0.015
0.081
(0,080)
0.011
0.088
(0.087)
0.018
0.080
(0.078)
For each of the four cases of choice set heterogeneity (i.e., rows), twenty datasets are generated. The degree of choice
set heterogeneity increases from the first to the fourth row as the share of individuals with restricted choice sets passes
from 0 to 50. #U = 5. Restricted choice sets are defined as CSi = U \ {ji }, where ji is randomly (uniformly) selected
for each individual. Each individual faces 10 choice situations. It is always assumed that individuals have stable choice
sets (i.e., choice sets do not change across choice situations). In each dataset, there are 1000 individuals. The mixed
logit model is estimated by simulated maximum likelihood with 1000 shifted and shuffled Halton draws per individual.
To speed up computations, the FE Logit is estimated by sampling at random (uniformly), for each individual, 5000
permutations of the observed sequence of choices (see D’Haultfuille and Iaria, 2016).
Table 3.2: Increasing the severity of choice set restriction
Number of products
in choice set (out of 5)
5
4
3
2
Universal Logit
Bias (SD) RMSE
0.005
0.032
(0.032)
-0.525
0.526
(0.013)
-0.719
0.719
(0.007)
-1.139
1.139
(0.003)
True Logit
Bias (SD) RMSE
0.005
0.032
(0.032)
0.005
0.028
(0.028)
0.013
0.030
(0.027)
0.007
0.028
(0.027)
FPH Logit
Bias (SD) RMSE
0.011
0.032
(0.030)
0.012
0.031
(0.029)
0.020
0.034
(0.027)
0.009
0.029
(0.027)
FE Logit
Bias (SD) RMSE
0.024
0.088
(0.085)
0.011
0.088
(0.087)
0.023
0.071
(0.068)
0.010
0.068
(0.067)
For each of the four cases of choice set heterogeneity (i.e., rows), twenty datasets are generated. The degree of choice set
heterogeneity increases from the first to the fourth row as the number of alternatives not included in CSi passes from 0
to 3. #U = 5. Restricted choice sets are defined asCSi = U \ {ji , ki , hi }, where ji , ki , and hi are randomly (uniformly)
selected for each individual. Each individual faces 10 choice situations. It is always assumed that individuals have
stable choice sets (i.e., choice sets do not change across choice situations). In each dataset, there are 1000 individuals.
The mixed logit model is estimated by simulated maximum likelihood with 1000 shifted and shuffled Halton draws
per individual. To speed up computations, the FE Logit is estimated by sampling at random (uniformly), for each
individual, 5000 permutations of the observed sequence of choices (see D’Haultfuille and Iaria, 2016).
26
3.8
Discussion
We close this subsection with two important practical comments. First, note that we can combine
any of our time-series sufficient sets with the cross-sectional (Inter-Personal) set. For example, if we
had confidence that the purchases of different individuals within each cross-section spanned the set
of products available at that specific point in time, then we could use that information to limit the
amount of time-series “extrapolation” between different cross-sections.
As we saw in the above example, a drawback of the FE logit is that it does not allow the identification of the parameters of time-invariant regressors. Consequently, from a FE logit one cannot
estimate price elasticities, for example. By contrast, the other sufficient sets we propose allow one
to estimate all the elements of θ. Moreover, the FE Logit is both computationally more complex to
implement and, as we will see in the next section, less efficient whenever there is not any unobserved
preference heterogeneity among consumer types. The additional computational challenges raised by
the FE logit follow from the impossibility of factoring P(Yi ) into a cartesian product of T sets. For a
discussion about these computational issues and a description of the solution we use in our application,
see D’Haultfœuille and Iaria (2016). However, despite these drawbacks, only the FE logit is robust
against (i, j)-specific fixed effects.
As we will discuss in the next section, the various ways of constructing f (Yi ) lead to more or less
robust and/or efficient estimators along the lines of Hausman and McFadden (1984) and can be used
to form specification tests.
4
Choosing the sufficient set
4.1
Specification Tests
We discuss some approaches to testing the maintained assumptions in our empirical approach, in
particular: (a) the length of the sequence of choice situations for which choice sets are stable, ever
expand, or ever shrink; and (b) unobserved individual-product-specific preference heterogeneity.
This section proposes a theoretical result that enables us to “rank” Maximum Likelihood Estimators (MLE) of logit models with different sufficient sets in terms of their efficiency, and to use that
to form specification tests of our assumptions. Roughly, the MLE of a logit model with sufficient set
27
fL is more efficient than the MLE of a logit model with sufficient set fZ ⊆ fL . Moreover, this result
can be applied recursively, so that if two subsets of fL are available, say fZ and fXZ with fXZ ⊆ fZ ,
then the efficiency rank of the three MLEs will be clear: fL fZ fXZ .
Using this result, we then propose Hausman–like specification tests between models based on
different sufficient sets, i.e. different ways of constructing the correspondence f . These comparisons
enable us to test for some forms of choice set stability and preference heterogeneity.
Proposition 2.
Assume that Assumption 1 holds, and that sufficient sets fL and fZ satisfy Con-
dition 1 such that fZ (Yi ) ⊆ fL (Yi ), Yi ∈ CS ?i = c and i = 1, . . . , I. Define lL (θ) and lZ (θ) as the
log–likelihood functions corresponding to the logit models conditional on fL (Yi ) and on fZ (Yi ), with
θbL and θbZ the corresponding Maximum Likelihood Estimators (MLEs). Then the following results
hold:
1. The log–likelihood function lL (θ) can be written as lL (θ) = lZ (θ) + l4 (θ).
2. Provided that θ is identified in l4 (θ), so that θb4 is a well–defined MLE, then:
(a) θbZ and θb4 are asymptotically independent, and
(b) θbL is more efficient than θbZ .
3. Given result (2), then:
(a) All Hausman–like tests based on pairwise estimator comparisons among θbL , θbZ , and θb4
are equivalent,
(b) the Likelihood Ratio statistic LR = 2[lZ (θbZ )+l4 (θb4 )−lL (θbL )] is asymptotically equivalent
to the Hausman statistic comparing θbL and θbZ , and
(c) Var θbL − θbZ = Var θbZ − Var θbL .
Proof. See Appendix B.
The Likelihood Ratio statistic, LR, proposed in result (3b) allows us to compare different models derived from alternative assumptions. It consists of the difference between an unrestricted log–
likelihood function, lZ θbZ + l4 θb4 , and a restricted one, lL θbL . Even though LR requires the
28
computation of a third estimator, θb4 , it is simpler to implement than other Hausman–like statistics based on quadratic forms. For instance, the statistic LR is always non–negative, by–passing
the practical inconvenience of some estimated covariance matrices, which fail to be positive definite.
Furthermore, and differently from other Hausman–like statistics, LR makes very transparent the computation of the degrees of freedom of the corresponding χ2 distribution: they equal the number of
parameters in θbL . Result (3c) is of practical convenience, since it implies that the computation of
Var θbL − θbZ , necessary for classical Hausman–like statistics, can proceed as in the standard case in
which one of the compared estimators is fully efficient under the null hypothesis, even though no such
efficiency assumption is required here. Proposition 2 hinges on the Factorization Theorem proposed
by Paul Ruud, for more details see Ruud (1984) and Hausman and Ruud (1987).
The various sufficient sets introduced in section (3.3) rely on the following economic assumptions:
• fF E : Choice set stability across T choice situations and the possibility of having unobserved
preference heterogeneity in the form of (i, j)–specific fixed effects.
• fF P H : Choice set stability across T choice situations and no unobserved preference heterogeneity.
• fP P H : Choice set evolution in the form of entry–but–no–exit or exit–but–no–entry across T
choice situations and no unobserved preference heterogeneity.
Proposition 2 can be used to construct statistics useful to test for these economic assumptions
by comparing the estimates obtained from models based on different sufficient sets f ’s. The first
possibility is to compare fF E , fF P H , and fP P H for choice sequences of constant length T . In fact, for
choice sequences of a given length T , it can be shown that fF E (Yi ) ⊆ fF P H (Yi ) and that fP P H (Yi ) ⊆
fF P H (Yi ) for any Yi ∈ CS ?i = c. Given Proposition 2, these relationships among sufficient sets allow
us to construct Hasmann–like tests for the assumptions of choice set stability and for preference
heterogeneity.
The second possibility for constructing test statistics is to fix a specific f , say fF E , and to compare
choice sequences with some of their sub–sequences: for example, the sequence 1, 2, . . . , T L can be split
into two mutually exclusive sub–sequences 1, 2, . . . , T Z and T Z + 1, . . . , T L , and this gives rise to
different fF E ’s, fFZE and fFLE such that fFZE (Yi ) ⊆ fFLE (Yi ) for any Yi ∈ CS ?i = c. The same holds for
29
both fF P H and fP P H . This method of making comparisons allows us to test for choice set stability
in several alternative ways, but it does not enable us to test for preference homogeneity.
In appendix C we discuss each of these Hausman–like test statistics in more detail.
5
Bounding Choice Probabilities and Elasticities
We are often interested in functions of parameters, for example, willingness to pay, elasticities, welfare
or analysis of counterfactuals, such as evaluating the effects of a change in tax policy or a merger
between manufacturers.
We can obtain point estimates of preference parameters and so, for example, willingness to pay.
However, without further information or restrictions on the true choice sets, we are not able to pointestimate consumer type-specific choice probabilities and hence average choice probabilities. Still,
under some weak assumptions, we can ”bound” them. Also, given the convenient relationship between
consumer type-specific elasticities and consumer type-specific choice probabilities in logit models, we
can as well bound consumer type-specific elasticities and their averages.
For expositional simplicity, suppose that indirect utilities are linear in parameters:
Vijt (Xit , θ) = Xijt θ = δj + Xijt β.
(5.1)
Denote by Yit = jt whether product jt is chosen by consumer type i in choice situation t, and by
ft (Yi ) = rit whether i’s sufficient set collects products rit in choice situation t.
?
?
Note that ft (Yi ) ⊆ CSit
and that CSit
⊆ Sit . In what follows we use these conditions to bound
the true but unobserved denominator of the logit choice probabilities for any Ximt , δ, and β:
X
m∈ft (Yi )=rit
exp (δm + Ximt β) ≤
X
X
exp (δm + Ximt β) ≤
? =c
m∈CSit
it
exp (δm + Ximt β) . (5.2)
m∈Sit =sit
?
S
CS
?
For brevity, denote P rijt
(θ) = Pr[Yit = jt |Sit = sit , θ], P rijt
(θ) = Pr[Yit = jt |CSit
= cit , θ], and
f
P rijt
(θ) = Pr[Yit = jt |ft (Yi ) = rit , θ]. It follows from (5.2) that for any jt ∈ ft (Yi ) = rit :
?
f
S
CS
P rijt
(θ) ≤ P rijt
(θ) ≤ P rijt
(θ).
30
(5.3)
f
(θ) = Pr[Yit = jt |ft (Yi ) = rit , θ] takes the usual logit form whenever jt ∈ rit ,
Observe that P rijt
f
but that it equals zero whenever jt ∈
/ rit . Hence, for those jt ∈ sit but jt ∈
/ rit , P rijt
(θ) will not be
?
CS
?
a valid upper bound for P rijt
(θ): even if jt ∈
/ rit , it can still be the case that jt ∈ CSit
= cit and
?
CS
so that P rijt
(θ) > 0. Similarly, among the jt ∈ sit that jt ∈
/ rit , there can be some jt ∈
/ cit . But
?
S
CS
S
for those jt ∈ sit that jt ∈
/ cit , P rijt
(θ) > P rijt
(θ) = 0: P rijt
(θ) will not be a valid lower bound
?
?
CS
CS
for P rijt
(θ). It is then unclear how to bound P rijt
(θ) for those jt ∈ sit but jt ∈
/ rit . However, it
is always possible to construct bounds for the probability with which i would purchase jt if indeed jt
?
were to be added to their true but unobserved choice set, CSit
∪ {jt } = cit ∪ {jt }:
CS
P rijt
?
∪j
exp (δjt + Xijt t β)
.
m∈cit ∪{jt } exp (δm + Ximt β)
?
(θ) = Pr [Yit = jt |CSit
∪ {jt } = cit ∪ {jt }, θ] = P
S∪j
f ∪j
S∪j
CS
S
By defining P rijt
(θ) and P rijt
(θ) analogously, note that P rijt
(θ) = P rijt
(θ), P rijt
?
f ∪j
f
S∪j
CS
CS
P rijt
(θ), and P rijt
(θ) = P rijt
(θ) for any jt ∈ rit , while P rijt
(θ) ≤ P rijt
?
∪j
(5.4)
?
CS
(θ) and P rijt
∪j
?
(θ) =
∪j
(θ) ≤
f ∪j
P rijt
(θ) for any jt ∈
/ rit . Using these facts, we can then complement condition (5.3) for those jt ∈
/ rit
and propose choice probability bounds for all (i, j, t) combinations:
S∪j
CS
P rijt
(θ) ≤ P rijt
?
∪j
f ∪j
(θ) ≤ P rijt
(θ).
(5.5)
Condition (5.5) can be used to construct bounds for functions of consumer type choice probabilities,
such as average choice probabilities or elasticities. The average choice probability of product jt for a
certain group of consumer types i = 1, . . . , It can be bounded by:
It−1
It
X
i=1
S∪j
P rijt
(θ) ≤ It−1
It
X
CS
P rijt
i=1
?
∪j
(θ) ≤ It−1
It
X
i=1
31
f ∪j
P rijt
(θ).
(5.6)
With logit demand and linear indirect utilities (5.1), consumer type i’s own- and cross-price elasticities are:
CS
jj
(Xit , θ) = βp pjt t (1 − P rijt
ξit
?
∪j
(θ))



exp(δjt + Xijt t β)

X
,
exp(δm + Ximt β) 


= βp pjt t 1 −

m∈cit ∪{jt }
(5.7)
CS
jk
(Xit , θ) = βp pkt t P rikt
ξit
?
∪j
(θ)



= −βp pkt t 



exp(δkt + Xikt t β)

X
,
exp(δm + Ximt β) 
m∈cit ∪{jt }
where pjt t is jt ’s price in choice situation t and βp is the price coefficient. As (5.7) makes clear,
even though we may have a consistent estimator of δ = [δ1 , . . . , δj , . . . , δJ ] and β, we still do not know
CS
?
the exact CSit
∪ {jt } = cit ∪ {jt } for each i and t, and thus the true P rijt
?
∪j
?
(θ), ∀jt ∈ CSit
∪ {jt } =
cit ∪ {jt }.
Given (5.7), (5.5), and βp < 0, we obtain the following bounds on the elasticities for any jt , kt ,
Xijt , δ, and β:
f ∪j
βp pjt t (1 − P rijt
(θ))
|
{z
}
Lower (in abs. value) Bound
S∪j
−βp pkt t P rikt
(θ)
|
{z
Lower Bound
jj
S∪j
≤ ξit
(Xit , θ) ≤ βp pjt t (1 − P rijt
(θ))
|
{z
}
Upper (in abs value) Bound
≤
jk
ξit
(Xit , θ)
}
≤
(5.8)
f ∪j
−βp pkt P rikt
(θ)
|
{z
Upper Bound
}
We construct confidence intervals for these identification regions following Imbens and Manski
(2004). Details are provided in Appendix A.
32
Figure 6.1: Market shares
Notes: xxxxx
6
Empirical Illustration
In this section we provide an empirical illustration of the problem that arises from unobserved heterogeneous choice sets and potential solutions. We choose a situation in which we observe a change
in the choice set that consumer types face, so that we can investigate the consequent bias in standard
methods and the working of our proposed estimators.
We consider demand for ketchup in plastic bottles in the UK over the period 2002–2009. The UK
ketchup market is an ideal application to study the problem of demand estimation in the presence of
heterogeneous unobserved choice sets. Heinz is the dominant brand, in 2002 it had a market share
of approximately 60%. Supermarket own brands made up the rest of the market. Heinz introduced
the top down package in July 2003. The top down format was popular and gradually became the
dominant variety, see Figure 6.1.
As we saw in equation (2.5) and the following discussion, the estimation bias is expected to be
particularly large whenever the econometrician mistakenly imputes to consumer types products they
like, but did not purchase because not in their true but unobserved choice set. In this sense, the UK
33
Figure 6.2: Mean number of products in sufficient sets
Notes: xxxxx
ketchup market is an ideal situation because we observe a product, the top down variety, which a
substantial share of households clearly prefer (we know this because we observe them purchasing the
top down variety when both are available), but which we know is not in their choice set prior to its
introduction into the market.
The ketchup market has been studied in a number of papers in the literature, all relying on
household-level panel data of the type we also use. Allenby and Rossi (1998) and Chiang et al.
(1998) are early applications that estimate models with preference heterogeneity, while Pesendorfer
(2002) explores the optimal timing of sales. Villas-Boas and Zhao (2005) model both manufacturer
and retailer behavior to measure the extent of manufacturer competition, the nature of retailermanufacturer interactions, and the consequences of retailer product-category pricing. Pancras and
Sudhir (2007) analyze optimal one-to-one couponing strategy for a major customer data intermediary
(CDI). The US and UK markets differ slightly in the relative importance of store own brand ketchup
products.
6.1
Data
We use UK household-level purchase data from the Kantar Worldpanel. Households record purchases
using a scanner in the home. Households typically remain in the panel for around 2 years. For more
information on the data see Leicester and Oldfield (2009) and Griffith and O’Connell (2009). We
estimate demand for ketchup using information on purchases over the period 2002–2009. These are
households that purchase ketchup regularly. We consider 39 products. These are all products in
plastic bottles that we observe being sold.
We define sufficient sets as in section (3.3). In what follows, we present results for the Full Purchase
History (FPH) logit, defined in section (3.5.1), and combined with the the Inter-Personal logit, defined
in section (3.5.3). In addition, we also implement the Fixed Effects logit, discussed in section (3.5.1),
to estimate the marginal utility of income (the coefficient on price).
34
6.2
Demand estimates
Table 6.1: Demand estimates
price
Universal
FPH
FPH-IP
FE
FE-IP
-0.478
(0.021)
-1.372
(0.044)
-1.431
(0.047)
-1.625
(0.063)
-1.527
(0.068)
Universal
on FE sample
-0.491
(0.027)
-0.152
(0.002)
0.013
(0.017)
0.219
(0.018)
0.061
(0.027)
0.359
(0.022)
-0.803
(0.055)
0.415
(0.028)
0.574
(0.031)
-1.287
(0.053)
-0.610
(0.028)
-0.432
(0.025)
-0.125
(0.023)
1534221
39339
2410
9.3
1
31
39
39
39
-0.012
(0.004)
0.583
(0.052)
0.364
(0.040)
0.710
(0.053)
0.812
(0.056)
1.104
(0.136)
1.034
(0.057)
1.243
(0.069)
0.246
(0.112)
0.135
(0.073)
-0.073
(0.063)
0.642
(0.061)
90937
29160
2365
6.5
1
24
2.4
2
3
-0.010
(0.004)
0.593
(0.061)
0.389
(0.042)
0.758
(0.055)
0.872
(0.059)
1.489
(0.166)
1.240
(0.061)
1.332
(0.074)
0.247
(0.116)
0.128
(0.076)
-0.008
(0.066)
0.779
(0.066)
76773
27537
2291
6.3
1
24
2.4
2
3
17021
20187
2048
5.0
1
20
2.3
2
3
-0.150
(0.002)
0.132
(0.022)
0.321
(0.024)
0.191
(0.034)
0.377
(0.029)
-1.340
(0.097)
0.454
(0.035)
0.665
(0.041)
-1.390
(0.078)
-0.446
(0.038)
-0.304
(0.035)
0.080
(0.031)
911781
23379
2181
5.6
1
21
39
39
39
SD price
distance
topdown
ss700g
ss900g
ss1kg
ss11kg
ss12kg
ss13kg
Morr
Sain
Tesc
Hein
N
it
i
average sequences
min sequences
max sequences
average j
minimum j
maximum j
35
20051
23379
2181
5.6
1
21
2.3
2
3
Mixlogit
-1.125
(0.034)
1.333
(0.023)
-0.152
(0.002)
0.396
(0.017)
0.479
(0.018)
0.222
(0.026)
0.828
(0.022)
-0.038
(0.056)
0.525
(0.027)
0.946
(0.031)
-0.973
(0.051)
-0.412
(0.028)
-0.441
(0.026)
0.382
(0.023)
1629108
41772
2416
9.9
1
32
39
39
39
6.3
Elasticity bounds
In what follows, we illustrate the concepts of market share and elasticity in the case of panel data, in
which i represents a household and t a choice situation in a specific time period. With a slight change
of notation, the same can be done in the case of cross-sectional data.
Figure 6.3 shows the own-price elasticities for all 39 products.
Figure 6.3: Own price elasticities for all products in October 2007
7
Conclusion
In this paper we have addressed the consequences of unobserved choice set heterogeneity on demand
estimation. We show that it generally causes an estimation bias. We propose a solution method that
allows us to consistently estimate demand parameters and to bound demand elasticities. The solution
is based on logit demand and on identifying “sufficient sets” of choices based on restrictions that need
to be determined by the characteristics of the economic environment.
The solutions can be applied to both cross-section and panel data. We illustrate the method
using an application to ketchup demand. The point estimate of the marginal utility of income (the
coefficient on price) is biased (towards zero) when we use the superset of products as the assumed
homogenous choice set (a common approach taken in applied demand estimation). The elasticities
obtained by using the superset estimates often lie outside of the bounds that we obtain by using our
solution method.
This matters for strategy and policy applications.
36
Extending the methods introduced here to non-logit demand, whether in general or in response
to the endogenous matching of individuals to choice sets (e.g., due to customer search and/or firms’
product choice decisions) is a promising venue for further research.
Bierlaire et al. (2008) extended estimation on choice-based samples to “block-additive GEV” models, however, these do not include the widely used Nested logit model. Extending our methods to
accommodate Nested logit and/or Random Coefficient logit is a valuable topic for future research.
Fox (2007) describes an estimation method on choice subsets that does not assume logit demand,
but instead relies on knowing rank orderings of (subsets of) consumers’ products.
37
Appendices
A
Confidence Intervals for Elasticity Bounds
jk
(Xit , θ), although the
For notational simplicity, we limit our discussion to a single elasticity term ξit
same ideas can be extended to the collection of all elasticities. Refer to the upper and lower bounds
jk
jk
jk
of ξit
(Xit , θ) in (5.8) as to ξit
(Xit , θ) and ξit
(Xit , θ), respectively. Denote the elasticity bounds of
h
i0
jk
jk
jk
jk
(Xit , θ) = ξit
(Xit , θ) by the 2 × 1 vector B ξit
(Xit , θ) and the corresponding elasξit
(Xit , θ), ξit
jk
b we can estimate
ticity interval from (5.8) by IN ξit
(Xit , θ) . Then, given Xit and our consistent θ,
jk
jk
b . We derive the corresponding 100 (1 − α)
the elasticity bounds B ξit
(Xit , θ) by B ξit
(Xit , θ)
percent confidence interval CI1−α from condition:
inf
jk
jk
(Xit ,θ))
∈IN (ξit
ξit
n
h
io
jk
lim Pr ξit
∈ CI1−α ≥ 1 − α.
I→∞
(A.1)
√ d
Since our estimator is consistent and asymptotically normal, i.e., θb I → N (θ, Vθ ), by the deltamethod:
0 
jk
jk
∂B ξit
∂B
ξ
(X
,
θ)
(X
,
θ)
it
it
it
d


jk
Vθ
I −→ N B ξit
(Xit , θ) ,
.
0
0
∂θ
∂θ

jk
b
B ξit
(Xit , θ)
√
(A.2)
jk
b as to Σ jk . It
Refer to the 2 × 2 asymptotic variance–covariance matrix of B ξit
(Xit , θ)
B (ξit )
follows that, whenever ft (Yi ) ∪ {jt } = rit ∪ {jt } is a strict subset of Sit ∪ {jt } = sit ∪ {jt }, so that for
jk
jk
(Xit , θ) < ξit
(Xit , θ), condition (A.1) is satisfied by:
any Xit and θ, ξit
"
CI1−α
#
r
r
jk
b − q1−α Σ 11 jk , ξ jk (Xit , θ)
b + q1−α Σ 22 jk ,
= ξit
(Xit , θ)
B (ξit ) it
B (ξit )
th
where q1−α is the (1 − α)
(A.3)
quantile of the standard normal distribution.
jk
In the extreme case in which ft (Yi ) ∪ {jt } = rit ∪ {jt } = Sit ∪ {jt } = sit ∪ {jt }, ξit
(Xit , θ) =
jk
jk
ξit
(Xit , θ) for any Xit and θ, and (A.3) is invalid. This is due to a discontinuity at ξit
(Xit , θ) =
jk
(Xit , θ), since in that case the coverage of the interval is only 100 (1 − 2α) % rather than the
ξit
1
nominal 100 (1 − α) %. (See Imbens and Manski (2004) for a modification of (A.3) that overcomes
this problem.) However, note that (a) both ft (Yi ) ∪ {jt } = rit ∪ {jt } and Sit ∪ {jt } = sit ∪ {jt }
are always perfectly observed by the econometrician, so that the appropriate CI1−α can always be
implemented and (b) in our empirical application ft (Yi ) ∪ {jt } ⊂ Sit ∪ {jt } for every i and t.
B
Proof of Proposition 2
B.1
Proof of Result (1)
From Proposition 1, we can re-write for every i the probability of the observed choice sequence j given
fZ (Yi ) = z ⊆ fL (Yi ) = l as:
QT
Pr [Yi = j|, fL (Yi ) = l, θ]
exp(Vijt (Xit , θ))
QT
k∈fL (k)=l
t=1 exp(Vikt (Xit , θ))
=P
t=1
= Pr [Yi = j|fZ (Yi ) = z, θ]
Pr [Yi = j|fL (Yi ) = l, θ]
Pr [Yi = j|fZ (Yi ) = z, θ]
P
QT
exp(Vijt (Xit , θ))
q∈fZ (q)=z
t=1 exp(Viqt (Xit , θ))
=P
,
QT
P
QT
q∈fZ (q)=z
t=1 exp(Viqt (Xit , θ))
k∈fL (k)=l
t=1 exp(Vikt (Xit , θ))
QT
t=1
= Pr [Yi = j|fZ (Yi ) = z, θ] Pr [Yi ∈ fZ (Yi ) = z|fL (Yi ) = l, θ]
where Pr [Yi ∈ fZ (Yi ) = z|fL (Yi ) = l, θ] is the probability that a choice sequence belongs to the
“smaller” set z relative to the “larger” set l.
By multiplying Pr [Yi = j|fL (Yi ) = l, θ] across all
consumer types i’s and by taking the logarithm of this likelihood function, result (1) follows with
PI
l4 (θ) = i=1 ln (Pr [Yi ∈ fZ (Yi ) = z|fL (Yi ) = l, θ]).
B.2
Proof of Result (2)
Given result (1) of Proposition 2, results (2a) and (2b) follow from the Factorization Theorem of
(Ruud, 1984, result (1), p.24).
2
B.3
Proof of Result (3)
Given result (1) of Proposition 2, result (3a) follows from the Factorization Theorem by (Ruud, 1984,
result (3), p.24), while result (3b) follows from (Ruud, 1984, pp.28-9).
Result (3c) can be proved as follows. (Ruud, 1984, result 2, p.24) shows that θbL is asymptotically
−1
−1
equivalent to Var θbL Var θbZ
θbZ + Var θbL Var θb4
θb4 . This implies that Cov θbL , θbZ =
−1
−1
b
b
b
b
Cov Var θL Var θZ
θZ , θZ = Var θbL Var θbZ
Var θbZ = Var θbL , where the first
equality follows from result (2a). Consequently, Var θbL − θbZ = Var θbL +Var θbZ −2Cov θbL , θbZ =
Var θbZ − Var θbL . C
Testing Procedures
The various sufficient sets introduced in section (3.3) rely on the following economic assumptions:
• fF E : Choice set stability across T choice situations and the possibility of having unobserved
preference heterogeneity in the form of (i, j)–specific fixed effects.
• fF P H : Choice set stability across T choice situations and no unobserved preference heterogeneity.
• fP P H : Choice set evolution in the form of entry–but–no–exit or exit–but–no–entry across T
choice situations and no unobserved preference heterogeneity.
There are two possibilities for making comparisons across models based on different sufficient sets
f ’s. The first possibility is to compare fF E , fF P H , and fP P H for choice sequences of constant length
T . The second possibility is to fix a specific f , say fF E , and to compare choice sequences with some
of their sub–sequences: for example, the sequence 1, 2, . . . , T L can be split into two mutually exclusive
sub–sequences 1, 2, . . . , T Z and T Z + 1, . . . , T L , and this gives rise to different fF E ’s, fFZE and fFLE
such that fFZE (Yi ) ⊆ fFLE (Yi ) for any Yi ∈ CS ?i = c. We will discuss each testing possibility in turn.
C.1
Comparisons of Different f ’s with Constant T
For choice sequences of a given length T , it can be shown that fF E (Yi ) ⊆ fF P H (Yi ) and that
fP P H (Yi ) ⊆ fF P H (Yi ) for any Yi ∈ CS ?i = c. These can be seen in our previous example from section
3
(D). Suppose Yi = (1, 3). Then fF E (1, 3) = P (1, 3) = {(1, 3) , (3, 1)}, fF P H (1, 3) = {1, 3} × {1, 3} =
{(1, 1) , (3, 3) , (1, 3) , (3, 1)}, and fP P H (1, 3) = {1} × {1, 3} = {(1, 1) , (1, 3)}. Note that there is no
clear “inclusion” relationship between fF E (Yi ) and fP P H (Yi ).
Given the results from Proposition 2, the above relationships among sufficient sets lead to two
possible classes of tests. The first is about choice set stability and the second about preference
heterogeneity.
C.2
Choice Set Stability
fF P H and fP P H are both based on the same assumption of absence of unobserved preference heterogeneity. However, they rely on different assumptions regarding the evolution of choice sets across
choice situations: fF P H assumes that unobserved choice sets do not change along the whole choice
sequence, while fP P H allows for the entry of new alternatives in the unobserved choice while comparing choice situation t to t + 1. On the one hand, if unobserved choice sets were stable, then both
f ’s would give rise to consistent estimators θbF P H and θbP P H , but result (2b) from Proposition 2 tells
us that θbF P H would be more efficient than θbP P H . On the other hand, if unobserved choice sets were
subject to entry–but–no–exit of alternatives, then only θbP P H would be consistent. It follows that,
under the maintained assumption of no unobserved preference heterogeneity, a test for H0 : (choice
h
i
set stability in 1, 2, . . . , T ) is LR = 2 lP P H θbP P H + l4 θb4 − lF P H θbF P H .
C.3
Preference Homogeneity
The sufficient sets fF P H and fF E are both based on the same assumption of unobserved choice set
stability in 1, 2, . . . , T . However, they rely on different assumptions regarding unobserved preference
heterogeneity: fF P H assumes that there is no unobserved preference heterogeneity, while fF E allows
for (i, j)–specific fixed effects. On the one hand, if there were no unobserved preference heterogeneity,
then both f ’s would give rise to consistent estimators θbF P H and θbF E , but Proposition 2 tells us that
θbF P H would be more efficient than θbF E . On the other hand, if unobserved preference heterogeneity
were present in a form encompassed by (i, j)–specific fixed effects, then only θbF E would be consistent.
It follows that, under the maintained assumption of choice set stability in 1, 2, . . . , T , a test for H0 :
h
i
(preference homogeneity) is LR = 2 lF E θbF E + l4 θb4 − lF P H θbF P H .
4
C.4
Comparisons of Same f with Different Choice Sub–Sequences
It is always possible to split choice sequences of length 1, 2, . . . , T L into two (or more) mutually
exclusive sub–sequences 1, 2, . . . , T Z and T Z + 1, . . . , T L . Then fFZE (Yi ) ⊆ fFLE (Yi ) for any Yi ∈
CS ?i = c. The same holds for both fF P H and fP P H . This method of making comparisons allows us to
test for choice set stability in several alternative ways, but it does not enable us to test for preference
homogeneity.
C.5
Choice Set Stability: fF E Example
In this section we will show with an example why fFZE (Yi ) ⊆ fFLE (Yi ) for any Yi ∈ CS ?i = c and,
afterward, we will discuss how to use this fact to construct tests of choice set stability.
Suppose J = 5, T L = 4, and that consumer type i is observed to make the choice sequence Yi =
(j1 , j2 , j3 , j4 ) = (3, 5, 5, 4).1 By considering the observed choice sequence “at once,” Yi = (3, 5, 5, 4)
can be re–ordered in 12 different choice sequences.2 Collect these sequences into the set fFLE (Yi ) = l.
Assume that Vijt (Xit , θ) = δij + Xijt β. Then, i’s likelihood contribution given fFLE (Yi ) = l is:
Pr Yi = (3, 5, 5, 4)| fFLE (Yi ) = l, β
Xexp ((Xi31 + Xi52 + Xi53 + Xi44 ) β)
.
exp ((Xij1 1 + Xij2 2 + Xij3 3 + Xij4 4 ) β)
=
(C.1)
L (j ,j ,j ,j )=l
(j1 ,j2 ,j3 ,j4 )∈fF
E 1 2 3 4
Differently, by splitting i’s observed choice sequence into two mutually exclusive pairs of choices
Yi1 = (3, 5) and Yi3 = (5, 4), we get fFZE (Yi1 ) = {(3, 5) , (5, 3)} and fFZE (Yi3 ) = {(5, 4) , (4, 5)}. Then,
i’s likelihood contribution given fFZE (Yi1 ) = z1 and fFZE (Yi3 ) = z3 is:
Pr Yi = (3, 5, 5, 4)| fFZE (Yi1 ) = z1 , fFZE (Yi3 ) = z3 , β
=
exp ((Xi31 + Xi52 ) β)
×
exp ((Xi31 + Xi52 ) β) + exp ((Xi51 + Xi32 ) β)
×
exp ((Xi53 + Xi44 ) β)
.
exp ((Xi53 + Xi44 ) β) + exp ((Xi43 + Xi54 ) β)
1 Alternative
(C.2)
one in the first choice situation, alternative three in the second choice situation, etc.
sequences are: (3, 5, 5, 4), (5, 3, 5, 4), (5, 5, 3, 4), (5, 5, 4, 3), (4, 3, 5, 5), (3, 4, 5, 5), (3, 5, 4, 5), (5, 3, 4, 5),
(5, 4, 3, 5), (5, 4, 5, 3), (4, 5, 3, 5), and (4, 5, 5, 3).
2 These
5
By multiplying the binomial logits in (C.2), we get:
Pri Yi = (3, 5, 5, 4)| fFZE (Yi ) = z, β =
Xexp ((Xi31 + Xi52 + Xi53 + Xi44 ) β)
,
exp ((Xij1 1 + Xij2 2 + Xij3 3 + Xij4 4 ) β)
(C.3)
Z (j ,j ,j ,j )=z
(j1 ,j2 ,j3 ,j4 )∈fF
E 1 2 3 4
where fFZE (Yi ) = z collects sequences: (3, 5, 5, 4), (3, 5, 4, 5), (5, 3, 5, 4), and (5, 3, 4, 5). Consequently fFZE (Yi ) = z ⊆ fFLE (Yi ) = l. In this example, fFZE only uses information about 4 of the
12 possible choice sequences in fFLE . This implies that if unobserved choice sets were stable, then
estimator βbFLE would be more efficient than βbFZE .
Moreover, the FE logit estimated on choice sub–sequences may “discard” some choice situations:
in the current example of sub–sequences of length two, whenever jt = jt+1 in Yit = (jt , jt+1 ), then
“fragment” Yit of Yi will not be used in estimation. For example, if i were observed to choose the
sequence Yi = (3, 4, 5, 5), then only Yi1 = (3, 4) would contribute to the likelihood function lFZ E (β),
while lFL E (β) would still use the whole sequence Yi = (3, 4, 5, 5). More precisely, if Yi = (3, 4, 5, 5) were
observed, then fFLE (3, 4, 5, 5) = fFLE (3, 5, 5, 4) = l would still contain the same 12 choice sequences,
while model (C.2) would collapse to:
Pr Yi = (3, 4, 5, 5)| fFZE (Yi1 ) = h1 , fFZE (Yi3 ) = h3 , β
=
exp ((Xi31 + Xi42 ) β)
×
exp ((Xi31 + Xi42 ) β) + exp ((Xi41 + Xi32 ) β)
×
exp ((Xi53 + Xi54 ) β)
exp ((Xi53 + Xi54 ) β)
=
exp ((Xi31 + Xi42 + Xi53 + Xi54 ) β)
exp ((Xi31 + Xi42 + Xi53 + Xi54 ) β) + exp ((Xi41 + Xi32 + Xi53 + Xi54 ) β)
(C.4)
= Pr Yi = (3, 4, 5, 5)| fFZE (Yi ) = h, β ,
which is also equivalent to Pr Yi1 = (3, 4)| fFZE (Yi1 ) = h1 , β . In this case, then, fFZE (Yi1 ) = h1 ⊆
fFZE (Yi ) = z ⊆ fFLE (Yi ) = l. By Proposition 2, we can rank the corresponding estimators in terms
6
of their relative efficiency. As a consequence, by splitting up choice sequences into mutually exclusive
sub–sequences, we can face also this further loss of efficiency.
Model (C.1) requires stronger assumptions than model (C.3) for its consistent estimation. Con?
= ct , t = 1, 2, 3, 4.
sistent estimation of model (C.1) requires that alternatives {3, 4, 5} ⊆ CSit
?
However, consistent estimation of model (C.3) only requires that {3, 5} ⊆ CSit
= ct , t = 1, 2 and that
?
?
?
{4, 5} ⊆ CSit
= ct , t = 3, 4. In this example, if 4 ∈
/ CSit
= ct , t = 1 or 2, or 3 ∈
/ CSit
= ct , t = 3 or 4,
then estimation of model (C.1) would not be consistent, while estimation of model (C.3) would.
These differences in consistency and relative efficiency suggest a Hausman–like test for unobserved
?
choice set stability. If {3, 4, 5} ⊆ CSit
= ct , t = 1, 2, 3, 4, then estimation of both model (C.1) and
model (C.3) would be consistent. However, estimation of model (C.1) would be more efficient than
?
?
estimation of model (C.3). If 4 ∈
/ CSit
= ct , t = 1 or 2 or 3 ∈
/ CSit
= ct , t = 3 or 4, then only
estimation of model (C.3) would be consistent. It follows that, under the maintained assumption of
unobserved preference heterogeneity in a form encompassed by (i, j)–specific fixed effects, a test for
h
i
H0 : (choice set stability in 1, 2, 3 and 4) is LR = 2 lFZ E βbPZP H + l4 βb4 − lFL E βbFLE .
D
A Simple Example
We present a simple example that demonstrates the consequences of unobserved choice set heterogeneity.
Consider a choice environment where a population of consumers, i = 1, . . . , N , selects among one
inside and one outside good, j ∈ S = {0, 1}. Assume preferences are given by
Ui1 = Xi θ + i1
Ui0 = i0 ,
with Xi = Xi1 − Xi0 and ij ∼ Type I Extreme Value.
Let CSi? indicate consumer i’s true choice set, which can either include both options, CSi? = S =
{0, 1}, or the outside option only, CSi? = {0}. In other words, some consumers have access to the
7
inside product while others do not. Let Yi indicate whether consumer i buys the inside product. Then:




1




if Ui1 ≥ Ui0 , and CSi? = {0, 1}



Yi =
Ui1 < Ui0 and CSi? = {0, 1}, or


0
if







CSi? = {0}.
(D.1)
Define ρ as the proportion of the population that faces both options, i.e. Pr [CSi? = {0, 1}] = ρ,
while (1 − ρ) is the proportion that does not have access to the inside product. Let pi (Xi , θ) measure
the choice probability that i would buy the inside product conditional on it being in her choice set,
pi (Xi , θ) = Pr [Ui1 > Ui0 |CSi? = {0, 1}, Xi , θ]
=
(D.2)
e Xi θ
.
1 + e Xi θ
The previous expression relies on the matching of consumers to choice sets being independent of
the distribution of ij , an important assumption that we discuss in detail in the following section.
Then, the unconditional probability that i buys the inside good is
Pr [Yi = 1|Xi , θ] = Pr [Ui1 > Ui0 |CSi? = {0, 1}, Xi , θ] Pr [CSi? = {0, 1}]
(D.3)
= pi (Xi , θ)ρ.
For individuals with CSi? = {0, 1}, the average marginal effect with respect to Xik (averaged over the
distribution of Xi ) is,
EX
∂ Pr [Yi = 1|CSi? = {0, 1}, Xi , θ]
∂pi (Xi , θ)
= EX
∂Xik
∂Xik
(D.4)
= EX [pi (Xi , θ) (1 − pi (Xi , θ))] θk .
By contrast, for individuals with CSi? = {0}, Pr [Yi = 1|CSi? = {0} , Xi , θ] = 0 and the average
marginal effect with respect to Xik is therefore zero,
EX
∂ Pr [Yi = 1|CSi? = {0}, Xi , θ]
= 0.
∂Xik
8
(D.5)
Consequently, the average marginal effect with respect to Xik , averaged over the distribution of both
Xi and CSi? is
EX,CS ?
∂ Pr [Yi = 1, CSi? |Xi , θ]
∂ Pr [Yi = 1|CSi? = {0, 1}, Xi , θ]
= EX
Pr [CSi? = {0, 1}]
∂Xik
∂Xik
(D.6)
= EX [pi (Xi , θ) (1 − pi (Xi , θ))] θk ρ.
As we will see later more in general, estimation of the structural parameters θ from a “standard”
binary logit that ignores choice set heterogeneity will not typically be possible. With an inconsistent
b it is unclear what kind of object one would recover when plugging θb in expression (D.4).
estimator θ,
As a consequence, any counterfactual simulation involving manipulations of the structural parameters
θ will not be reliable. However, even if the structural parameters θ cannot be consistently estimated
by standard methods, in some very special cases, for instance when Xi is normally distributed in the
current example, a consistent estimator of the reduced form parameter (D.6) can be directly obtained
by a linear regression of Yi on Xi (see (Wooldridge, 2010, Section 15.6), summarizing Stoker (1986)).
Note that such estimated average marginal effects will overstate the average marginal effects for those
with CSi? = {0} and understate the average marginal effects for those with CSi? = {0, 1}:
EX [pi (Xi , θ) (1 − pi (Xi , θ))] θk > EX [pi (Xi , θ) (1 − pi (Xi , θ))] θk ρ > 0.
(D.7)
Even in those rare cases in which the average marginal effects (D.6) can be directly estimated, their
practical usefulness may be limited: the range of counterfactuals that one could reliably simulate by
only knowing (D.6), rather than θ and ρ, are limited to situations in which there will be no changes in
the unobserved distribution of choice sets, i.e. when ρ is not affected by the counterfactual behaviors.
For example, if a merger entailed a change in the marketing strategies or the distribution network for
some of the products sold by the merging firms, then only knowing (D.6) may be insufficient for a
realistic evaluation, in the sense that ρmerger may differ from ρ. In order to overcome this problem,
we propose a consistent estimator of the structural preference parameters θ, which then will enable
us to compute the elasticity bounds in (D.7). Since these elasticity bounds are consistent with any
value of ρ, they will be informative also in the evaluation of those counterfactuals in which ρ could
potentially be affected.
9
References
Ackerberg, D., L. Benkard, S. Berry, and A. Pakes, “Econometric Tools for Analyzing Market
Outcomes,” in J. J. Heckman and E. Leamer, eds., Handbook of Econometrics, North-Holland, 2007,
chapter 63, pp. 4171–4276.
Allenby, Greg M and Peter E Rossi, “Marketing models of consumer heterogeneity,” Journal of
Econometrics, 1998, 89 (1), 57–78.
Bernheim, B Douglas and Antonio Rangel, “Beyond Revealed Preference: Choice Theoretic
Foundations for Behavioral Welfare Economics,” Quarterly Journal of Economics, 2009, pp. 51–
104.
Berry, Steven, James Levinsohn, and Ariel Pakes, “Automobile prices in market equilibrium,”
Econometrica, 1995, pp. 841–890.
,
, and
, “Differentiated Products Demand Systems from a Combination of Micro and Macro
Data: The New Car Market,” Journal of Political Economy, 2004, pp. 68–105.
Bierlaire, Michel, Denis Bolduc, and Daniel McFadden, “The estimation of generalized extreme value models from choice-based samples,” Transportation Research Part B: Methodological,
2008, 42 (4), 381–394.
Blundell, Richard, “How revealing is revealed preference?,” Journal of the European Economic
Association, 2005, 3 (2-2), 211–235.
Chamberlain, Gary, “Analysis of Covariance with Qualitative Data,” The Review of Economic
Studies, 1980, 47, 225–238.
Chiang, Jeongwen, Siddhartha Chib, and Chakravarthi Narasimhan, “Markov chain Monte
Carlo and models of consideration set and parameter heterogeneity,” Journal of Econometrics, 1998,
89 (1), 223–248.
Conlon, Christopher T and Julie Holland Mortimer, “Demand estimation under incomplete
product availability,” American Economic Journal-Microeconomics, 2013, 5 (4), 1–30.
10
Debreu, Gerard, “Theory of value. Cowles Foundation monograph 17,” New Haven and London,
1959.
DellaVigna, S., “Psychology and Economics: Evidence from the Field,” Journal of Economic Literature, 2009, 47 (2), 315–372.
D’Haultfœuille, Xavier and Alessandro Iaria, “A Convenient Method for the Estimation of the
Multinomial Logit Model with Fixed Effects,” Economics Letters, 2016, 141, 77–79.
Draganska, M., M. Mazzeo, and K. Seim, “Beyond Plain Vanilla: Modeling Joint Product
Assortment and Pricing Decisions,” Quantitative Marketing and Economics, 2009, 7, 105–146.
Draganska, Michaela and Daniel Klapper, “Choice set heterogeneity and the role of advertising:
An analysis with micro and macro data,” Journal of Marketing Research, 2011, 48 (4), 653–669.
Eliaz, Kfir and Ran Spiegler, “Consideration sets and competitive marketing,” The Review of
Economic Studies, 2011, 78 (1), 235–262.
Fox, Jeremy T, “Semiparametric estimation of multinomial discrete-choice models using a subset of
choices,” The RAND Journal of Economics, 2007, 38 (4), 1002–1019.
Goeree, Michelle Sovinsky, “Limited Information and Advertising in the U.S. Personal Computer
Industry,” Econometrica, 2008, 76 (5), 1017–1074.
Griffith, Rachel and Martin O’Connell, “The Use of Scanner Data for Research into Nutrition,”
Fiscal Studies, 2009, 30, 339–365.
Hausman, Daniel, “Mindless or Mindful Economics: A Methodological Evaluation,” in Andrew
Caplin and Andrew Schotter, eds., The Foundations of Positive and Normative Economics: A
Handbook, Oxford University Press, 2008, pp. 125–155.
Hausman, Jerry A and Paul A Ruud, “Specifying and testing econometric models for rankordered data,” Journal of econometrics, 1987, 34 (1), 83–104.
Hausman, Jerry and Daniel McFadden, “Specification tests for the multinomial logit model,”
Econometrica, 1984, pp. 1219–1240.
11
Imbens, Guido W and Charles F Manski, “Confidence intervals for partially identified parameters,” Econometrica, 2004, pp. 1845–1857.
Koszegi, B. and M. Rabin, “Mistakes in Choice-Based Welfare Analysis,” American Economic
Review, 2007, 97 (2), 477–481.
Leicester, Andrew and Zoe Oldfield, “An analysis of consumer panel data,” IFS Working Papers
W09/09, 2009.
Lerman, S. and C. Manski, “On the Use of Simulated Frequencies to Approximate Choice Probabilities,” in C. Manski and D. McFadden, eds., Structural Analysis of Discrete Data with Econometric
Applications, MIT Press, 1981.
los Santos, Babur De, Ali Hortaçsu, and Matthijs R Wildenbeest, “Testing models of
consumer search using data on web browsing and purchasing behavior,” American Economic Review,
2012, 102 (6), 2955–2980.
Lu, Zhentong, “Estimating Multinomial Choice Models with Unobserved Choice Sets,” 2016. Working paper, University of Wisconsin-Madison.
Manski, Charles F, “The structure of random utility models,” Theory and decision, 1977, 8 (3),
229–254.
Manzini, Paola and Marco Mariotti, “Stochastic choice and consideration sets,” Econometrica,
2014, 82 (3), 1153–1176.
Masatlioglu, Yusufcan, Daisuke Nakajima, and Erkut Y Ozbay, “Revealed attention,” American Economic Review, 2012, pp. 2183–2205.
McFadden, Daniel, “Modeling the Choice of Residential Location,” in A. Karlqvist, L. Lundqvist,
F. Snickars, and J. Weibull, eds., Spatial Interaction Theory and Planning Models, Vol. 1, NorthHolland, 1978, pp. 75–96.
, Kenneth Train et al., “Mixed MNL models for discrete response,” Journal of applied Econometrics, 2000, 15 (5), 447–470.
12
Nevo, Aviv, “Measuring Market Power in the Ready-To-Eat Cereal Industry,” Econometrica, 2001,
69 (2), 307–342.
Nierop, Erjen Van, Bart Bronnenberg, Richard Paap, Michel Wedel, and Philip Hans
Franses, “Retrieving unobserved consideration sets from household panel data,” Journal of Marketing Research, 2010, 47 (1), 63–74.
Pancras, Joseph and K Sudhir, “Optimal marketing strategies for a customer data intermediary,”
Journal of Marketing Research, 2007, 44 (4), 560–578.
Pesendorfer, Martin, “Retail sales: A study of pricing behavior in supermarkets,” The Journal of
Business, 2002, 75 (1), 33–66.
Reutskaja, Elena, Rosemarie Nagel, Colin F Camerer, and Antonio Rangel, “Search dynamics in consumer choice under time pressure: An eye-tracking study,” The American Economic
Review, 2011, 101 (2), 900–926.
Ruud, Paul A, “Tests of specification in econometrics,” Econometric Reviews, 1984, 3 (2), 211–242.
Samuelson, P.A., “A Note on the Pure Theory of Consumer’s Behviour,” Economica, 1938, 5 (17),
61–71.
Spiegler, Ran, Bounded Rationality and Industrial Organization, Oxford University Press, 2011.
Stoker, Thomas M, “Consistent estimation of scaled coefficients,” Econometrica, 1986, pp. 1461–
1481.
Train, K., Discrete Choice Methods with Simulation, Cambridge University Press, 2009.
Varian, H., “Revealed Preference,” in M. Szenberg, L. Ramrattan, and A.A. Gottesman, eds.,
Samuelsonian Economic and the Twenty-First Centur, New York: Oxford University Press, 2006,
chapter 99-115.
Villas-Boas, J Miguel and Ying Zhao, “Retailer, manufacturers, and individual consumers: Modeling the supply side in the ketchup marketplace,” Journal of Marketing Research, 2005, 42 (1),
83–95.
13
Wooldridge, Jeffrey, Econometric Analysis of Cross-Section and Panel Data, 2nd ed., MIT Press,
2010.
14