Estimating Demand Parameters with Choice Set Misspecification∗

Gregory S. Crawford†  Rachel Griffith‡  Alessandro Iaria§

December 6, 2015

Abstract

[Preliminary and incomplete: please do not cite without authors’ consent]

We describe methods to estimate demand parameters in the presence of choice set misspecification due to unobserved individual choice sets. We show that a consumer’s probability of making a choice from her true choice set can be written as the probability she makes the same choice from a universal choice set, an assumption commonly made in the empirical literature, adjusted by a bias term that depends on the probability that she would choose any of the elements in her true choice set. We show how to estimate models that avoid this bias under Logit demand using “sufficient sets,” instances where (observationally identical) individuals make at least two choices from the same unobserved choice set. Examples of sufficient sets include those obtained from a variation of the Fixed-Effect Logit approach due to Chamberlain (1980), those built from observed individual-specific purchase histories, and those that assume common unobserved choice sets across individual purchases in a given retailer and time period. We illustrate these ideas using household-level scanner data for the UK ketchup market, exploiting variation in choice sets due to the introduction of a new product variety and to heterogeneous product offerings across stores. Our results show that accounting for choice set misspecification is critical for accurately estimating own- and cross-price elasticities of demand.

∗ Griffith gratefully acknowledges financial support from the European Research Council (ERC) under ERC-2009-AdG grant agreement number 249529, and the Economic and Social Research Council (ESRC) under the Centre for the Microeconomic Analysis of Public Policy (CPP), grant number RES-544-28-0001.
† Dept. of Economics, University of Zürich and CEPR
‡ University of Manchester and IFS
§ CREST (Paris) and IFS

1 Introduction

Demand estimation is one of the primary applications of empirical economics. Recent advances in estimating flexible distributions of preferences in settings where consumers make discrete choices, coupled with the many business strategy and policy applications that require information about consumer demand, have caused an explosion of applications based on both aggregate and individual-level datasets.[1] In the vast majority of these applications, the researcher assumes consumers all have access to the same, universal, choice set given by all the available products offered in the market (Berry, Levinsohn and Pakes (1995), xxx).[2][3] This is a strong assumption that recent research has sought either to relax, connecting consumer “consideration sets” to limited or rational consumer (in)attention (Masatlioglu, Nakajima and Ozbay (2012), Manzini and Mariotti (2014), Matejka and McKay (2015)), or to empirically measure and/or accommodate (Goeree (2008), Van Nierop, Bronnenberg, Paap, Wedel and Franses (2010)).[4]

Accurately measuring consumers’ choice sets is important: if a consumer is assumed to have passed over an alternative that in fact was not in her choice set, inferences about her preferences will be mistaken. This, in turn, will feed into researchers’ estimates of consumer preference parameters and the conclusions that rely on them, including estimated elasticities and the consequences of changes in the economic environment, for example the effects of a change in tax policy or a merger between manufacturers.

In this paper, we demonstrate that choice set misspecification introduces a bias in demand estimation and propose solution methods for consistently estimating preferences in this environment.
We allow consumers’ choice sets to be smaller than the universal choice set, heterogeneous across consumers, and unobserved to the econometrician. Our solution methods rely on identifying “sufficient sets” of consumer choices, instances where (observationally identical) individuals make at least two choices from the same unobserved choice set, and on the assumption that consumer demand for products given their (unobserved) choice set is Logit. The Logit is among the most commonly estimated specifications in the applied literature; moreover, Matejka and McKay (2015) recently showed that the optimal information-processing strategy taken by consumers facing payoff uncertainty in discrete-choice environments results in probabilistic choices that follow a Logit model.[5]

[1] See Ackerberg, Benkard, Berry and Pakes (2007) and Train (2009) for recent surveys.
[2] Also frequent is the assumption that consumers have the same choice set, but this set is restricted to only those products with market shares above a given threshold. Gandhi, Lu and Shi (2013) relax this assumption by introducing errors in measured market shares and find that including low-share goods in a retail grocery setting doubles estimated price elasticities.
[3] Would like to cite another top-5 paper that estimated on a universal choice set. Ideas?
[4] We survey the insights of these literatures later in the introduction.

We begin by showing that a consumer’s probability of selecting a product from her individual-specific choice set can be written as the probability based on the (conventional) universal choice set divided by the sum of the probabilities that she would choose any of the products in her choice set. This latter probability induces bias relative to the conventional model that assumes universal choice sets.
If the consumer’s imputed choice set mistakenly includes products that were not available to her, but that she would most prefer to purchase when faced with the universal choice set, bias from ignoring unobserved choice sets could be substantial.

Fortunately, in the case of discrete choice models with Logit demand, convenient solutions are available. We propose three. Each relies on identifying “sufficient sets” of choices that both (1) contain the observed sequence chosen by the consumer(s) and (2) must lie within the consumers’ true unobserved sequence of choice sets. Each also relies on the convenient functional form of the Logit model to “difference out” the consumer’s true, unobserved-to-the-econometrician, choice sets and to identify and estimate preferences based on variation in price and other product characteristics across time among the products in the sufficient sets (McFadden (1978)).

The three solutions differ in the type of variation available to the researcher and in their choice of sufficient sets. The first two rely on household-specific panel data. First, we show that methods developed by Chamberlain (1980) to address unobserved individual preference heterogeneity, conventionally called the Fixed-Effect Logit (FE Logit), also naturally accommodate choice set misspecification. This is intuitive, as the FE Logit estimates preference parameters based only on changes in choices over time among the set of products a consumer ultimately selects. Because a product must have been in a consumer’s choice set to ultimately be chosen, any additional products not in the choice set do not enter the estimation, nor do any products that were in the consumer’s choice set but were not chosen. Second, we show that individual-specific purchase histories also provide a sufficient sequence of products that can be used to estimate preferences. We call this estimator the Purchase-History Logit (PH Logit). The final solution is available in cross-sectional applications.
If one can identify subsets of individuals in the data with identical preferences who make choices from a common, but unobserved, choice set at a given point in time, then one can take the union of the products they choose as a sufficient set. We call this estimator the Cross-Section Logit (XS Logit).

[5] When consumers have equal prior beliefs about payoffs across choices, demand is strictly Logit; when they have heterogeneous priors across products, demand is a generalized Logit, with consumers’ prior beliefs also influencing choices.

The different methods we propose are complementary. Among the panel-data methods, the FE Logit is robust against both incorrect choice set imputations and individual-product-specific fixed effects, but does not allow the identification of time-invariant elements of the indirect utility frequently needed in applied empirical work. On the other hand, the PH Logit is not robust against individual fixed effects, but allows the identification of time-invariant elements of the indirect utility necessary to compute elasticities. Similarly, the XS Logit relies on the relatively strong assumption of preference homogeneity across (observationally identical) individuals, but does not demand household-level panel data, which could be difficult to acquire. We show how one can combine results from the conventional Logit, the FE Logit, and the PH Logit to test for both choice set misspecification and individual fixed effects.[6] As is common in fixed-effects estimation, these solutions represent a trade-off between the greater efficiency but risk of parameter bias offered by estimation on the models with stronger assumptions (universal choice set, no fixed effects) and the lesser efficiency but robustness against parameter bias offered by estimation with weaker assumptions (sufficient sequences based on observed choices, fixed effects).[7]

We demonstrate these ideas both analytically and in an application to the demand for ketchup in the UK.
Ketchup demand is an attractive application as unobserved heterogeneity in consumers’ choice sets can be driven either by the introduction of a new product or by heterogeneity in product availability due to consumers’ store choice decisions, both of which are empirically important during our sample period.[8]

[6] What testing can we do with the XS Logit?
[7] Both the FE Logit and the PH Logit rely on a subset of the full data, as consumers for whom choices never change over time drop out of estimation and, even among consumers that do contribute to estimation, only the variation across ultimately-chosen products is used to identify preferences.
[8] The key new product introduction in our sample period is that of so-called Top-Down ketchup varieties, first by the market leader Heinz, and then later by other brands as well as store brands. As is common in grocery markets around the world, ketchup sales consist of both branded products that are available across different retail chains (dominated by Heinz in our data) and stores’ “own labels,” which are only available within each retail chain.

This paper is related to multiple literatures in economics and marketing. The first seeks to provide theoretical foundations for consumer consideration sets based on limited consumer attention. Eliaz and Spiegler (2011) introduce boundedly rational consumers that use screening criteria to reduce the number of “relevant” alternatives among which they must choose (defining consumers’ “consideration sets”) and explore how firms can use marketing techniques to manipulate these sets. Masatlioglu et al. (2012) allow for limited consumer attention in a more general demand environment and explore whether and when preferences can be separately identified from attention (or consideration) when both choices and consideration sets are observed.
If, for example, a consumer’s choice (x) changes when a product (y) is removed from consideration, then it must be both that y was considered (“Revealed Attention”) and that she prefers x over y (“Revealed Preference”). Manzini and Mariotti (2014) extend this framework to allow for probabilistic choice and show that this enhances the separate identification of preferences from attention.

In a related approach, Matejka and McKay (2015) allow for rational consumer inattention. They assume consumers have imperfect information about payoffs, but, before choosing, may gain information at a cost. Using information theory to determine consumers’ optimal information-processing strategy, they show that it results in probabilistic choices that follow a generalized Logit form: the probability a consumer selects product j depends on her prior beliefs about that product, the true payoff it would provide, and a parameter that scales with the cost of information. While the former literature assumes that consumers only select among a subset of the available alternatives, the Matejka and McKay approach assumes that consumers know about the possibility of all choices, but may be unaware of their exact characteristics (payoffs). Either assumption can rationalize the empirical approach we take in this paper.

A second literature, with branches in both economics and marketing, seeks to accommodate consideration sets in the estimation of preferences, using either data on consumers’ actual consideration sets or a model of consideration set formation. In the marketing literature, Roberts and Lattin (1991) was the first empirical paper to develop and estimate a model of consideration set formation.
The literature since has generalized the set of choice environments, econometric models, and datasets considered, with many examples in top marketing journals.[9] An important challenge facing most of this literature is separately identifying consideration set formation from preferences given consideration sets. As noted in a survey by Roberts and Lattin (1997), researchers working without explicit measures of choice sets “cannot address whether the consideration stage of their model corresponds to a cognitive stage of consideration or if it is just a statistical artifact of the data.” Recent papers by Van Nierop et al. (2010) and Draganska and Klapper (2011) represent the frontier of the marketing literature, both in estimating flexible econometric models and in doing so on both consideration set and choice data.

[9] Cite Marketing literature here.

Within the economics literature, Chiang, Chib and Narasimhan (1998) was the first paper to account for both choice set and parameter heterogeneity. While econometrically innovative, it too relied only on choice data and suffers from the same identification problem. Goeree (2008) addressed the identification issue in the absence of consideration set data in an indirect, but clever way. She begins by formulating a model of choice set formation based on advertising exposure, with a conventional choice model given a consumer’s consideration set. She then identifies the model based on the differential exposure to advertising media across consumer demographics coupled with differential advertising strategies across media by computer manufacturers.[10] She finds that accounting for consideration is important, with price elasticities twice as high as in a model that ignores it.
Unlike all of these papers (in both literatures), we accommodate the problem of unobserved choice sets not by formulating a model of choice set formation, but by identifying a sufficient subset of consumers’ unobserved choice sets which can be used to recover preferences under Logit demand regardless of their unobserved consideration sets.

A third literature demonstrates the usefulness of Logit demand for estimating preferences on subsets of true choice sets.[11] The seminal contribution in this literature is McFadden (1978). There he acknowledges that specifying and/or estimating Logit demand when consumers’ (known) choice sets are extremely large can be computationally demanding, but shows that when preferences are Logit, an econometrician can always estimate them based on a fixed or random subsample of those choices.[12] Thus choice set reduction can be accommodated in Logit models. The Logit functional form here is critically important, as conditioning on a subset of choices allows the denominator in the original Logit choice probability and the denominator in the Logit choice probability of the choice subset to cancel, a solution method on which we also rely in this paper.[13]

[10] Thus, controlling for preferences, if young, wealthy men read car magazines and Dell advertises in such magazines, the fact that such men are more likely to purchase a Dell computer is evidence of it being more likely to be in their consideration set.
[11] See also Fox (2007) for an estimation method on choice subsets that doesn’t assume Logit demand, but instead relies on known rank orderings of (subsets of) consumers’ alternatives.
[12] This is easiest when the econometrician samples each consumer’s observed choice and then other choices at random (i.e., uniform matching probabilities across choices), but for efficiency reasons, she may instead decide to implement non-uniform matching probabilities. In that case, consistency of the estimator is achieved by accounting for the matching probabilities in the likelihood function. See Train, McFadden and Ben-Akiva (1987) for an application of non-uniform matching probabilities.
[13] While Bierlaire, Bolduc and McFadden (2008) extend estimation on choice-based samples to “block-additive GEV” models, these do not include the widely used Nested Logit Model. Extending the methods we introduce here to accommodate Nested Logit and/or Random Coefficient preferences is a valuable topic for future research.

Finally, our paper is necessarily related to other papers that estimate the demand for ketchup in the context of addressing other questions of interest in economics or marketing. Allenby and Rossi (1998) is an early application of estimating models of preference heterogeneity, while Pesendorfer (2002) is among the first papers to explore the optimal timing of sales. Villas-Boas and Zhao (2005) model both manufacturer and retailer behavior to measure the extent of manufacturer competition, the nature of retailer-manufacturer interactions, and the consequences of retailer product-category pricing. Pancras and Sudhir (2007) analyze the optimal one-to-one couponing strategy for a major customer data intermediary (CDI). While the US and UK markets differ in the relative importance of alternative ketchup brands (except for Heinz), all these papers rely on household-level panel data of the type we also use here.[14]

The closest paper to our work is that of Lu (2014).[15] He acknowledges the same issues regarding the impact of choice set heterogeneity and endogeneity on estimating preferences and also relies on information on the product(s) a consumer actually chooses, but proposes a very different solution based on bounds. In many choice situations, the researcher has some information about what choices must be in every consumer’s choice set.
If so, then the probability that a consumer selects her chosen product from the union of this essential set and the chosen product provides an upper bound on her true choice probability; similarly, the probability that a consumer selects a product from the universal choice set provides a lower bound on that probability. Lu develops an estimator based only on these bounds and applies it to household grocery scanner panel data. Lu’s model is more flexible in its specification of choice probabilities and in its ability to exploit cross-sectional variation, but relies on bounds that may prove uninformative in certain applications, while ours specifies a parametric (Logit) choice model and relies on across-time, across-choice variation that may not be rich enough in certain applications. The approaches are therefore complementary.

The balance of this paper proceeds as follows. Section xxx...

[14] Indeed, all but Villas-Boas and Zhao appear to use the same Nielsen scanner panel data, collected between 1986 and 1988 for Springfield, Missouri.
[15] Add also a discussion of Conlon and Mortimer (2013).

2 A Simple Example

In this section, we present a simple example that demonstrates the consequences of choice set misspecification. Consider a choice environment where a population of consumers, $i = 1, \dots, N$, selects among one inside and one outside good, $j \in U = \{0, 1\}$. In what follows, we will call $U$ the “universal choice set.” Assume preferences are given by

\[
u_{i1} = x_i'\beta + \epsilon_{i1}, \qquad u_{i0} = \epsilon_{i0},
\]

with $x_i \equiv x_{i1} - x_{i0}$ and $\epsilon_{ij} \sim$ Type I Extreme Value. Let $CS_i^\star$ indicate consumer $i$’s choice set, which can either be the universal choice set, $CS_i^\star = U = \{0, 1\}$, or the outside good only, $CS_i^\star = \{0\}$. In other words, some consumers have access to the inside good while others do not. Let $Y_i$ indicate whether consumer $i$ buys the inside good. Then:

\[
Y_i =
\begin{cases}
1 & \text{if } u_{i1} \ge u_{i0} \text{ and } CS_i^\star = \{0, 1\}, \\
0 & \text{if } u_{i1} < u_{i0} \text{ and } CS_i^\star = \{0, 1\}, \text{ or if } CS_i^\star = \{0\}.
\end{cases}
\tag{1}
\]

Suppose that a share $\rho$ of the population has the universal choice set, i.e. $\Pr[CS_i^\star = \{0, 1\}] = \rho$, while the remaining share $(1 - \rho)$ does not have access to the inside good. Let $p_i(x_i, \beta)$ measure the choice probability purely due to preferences: the probability that $i$ would buy the inside good conditional on it being available to her, i.e.

\[
p_i(x_i, \beta) = \Pr(u_{i1} > u_{i0} \mid CS_i^\star = \{0, 1\}, x_i, \beta) = \frac{e^{x_i'\beta}}{1 + e^{x_i'\beta}}.
\tag{2}
\]

Assume the matching of consumers to choice sets is independent of the distribution of $\epsilon_{ij}$. Then, the unconditional probability that $i$ buys the inside good is:

\[
\Pr(Y_i = 1 \mid x_i, \beta) = \Pr(u_{i1} > u_{i0} \mid CS_i^\star = \{0, 1\}, x_i, \beta)\, \Pr[CS_i^\star = \{0, 1\}] = p_i(x_i, \beta)\, \rho.
\tag{3}
\]

For those $i$’s with $CS_i^\star = \{0, 1\}$, the average marginal effect with respect to $x_{ik}$ (averaged over the distribution of $x_i$) is:

\[
E_x\!\left[\frac{\partial \Pr(Y_i = 1 \mid CS_i^\star = \{0, 1\}, x_i, \beta)}{\partial x_{ik}}\right]
= E_x\!\left[\frac{\partial p_i(x_i, \beta)}{\partial x_{ik}}\right]
= E_x\!\left[p_i(x_i, \beta)\,(1 - p_i(x_i, \beta))\right] \beta_k.
\tag{4}
\]

By contrast, for those $i$’s with $CS_i^\star = \{0\}$, $\Pr(Y_i = 1 \mid CS_i^\star = \{0\}, x_i, \beta) = 0$ and the average marginal effect with respect to $x_{ik}$ is zero:

\[
E_x\!\left[\frac{\partial \Pr(Y_i = 1 \mid CS_i^\star = \{0\}, x_i, \beta)}{\partial x_{ik}}\right] = 0.
\tag{5}
\]

Consequently, the observed average marginal effect with respect to $x_{ik}$, averaged over the distribution of both $x_i$ and $CS_i^\star$, is:

\[
E_{x, CS^\star}\!\left[\frac{\partial \Pr(Y_i = 1, CS_i^\star \mid x_i, \beta)}{\partial x_{ik}}\right]
= E_x\!\left[\frac{\partial \Pr(Y_i = 1 \mid CS_i^\star = \{0, 1\}, x_i, \beta)}{\partial x_{ik}}\right] \Pr[CS_i^\star = \{0, 1\}]
= E_x\!\left[p_i(x_i, \beta)\,(1 - p_i(x_i, \beta))\right] \beta_k\, \rho.
\tag{6}
\]

When $x_i$ is normally distributed, a consistent estimator of (6) can be obtained by a linear regression of $Y_i$ on $x_i$ (see Wooldridge (2010, Section 15.6)). However, with unobserved choice set heterogeneity, the observed average marginal effects are the result of two things: preferences and limited availability of the inside good. Even though we are typically interested in estimating only the portion of the average marginal effects due to preferences, unfortunately our estimator will pick up a mixture of the two. Thus, whenever choice sets are heterogeneous (i.e. $\rho < 1$) and unobserved, i.e. we cannot “condition” on the true $CS_i^\star$, the estimated average marginal effect, asymptotically converging to (6), will overstate the average marginal effects for those with $CS_i^\star = \{0\}$ and understate the average marginal effects for those with $CS_i^\star = \{0, 1\}$ (taking $\beta_k > 0$ and $0 < \rho < 1$ for concreteness):

\[
E_x\!\left[p_i(x_i, \beta)\,(1 - p_i(x_i, \beta))\right] \beta_k
> E_x\!\left[p_i(x_i, \beta)\,(1 - p_i(x_i, \beta))\right] \beta_k\, \rho
> 0.
\tag{7}
\]

The insights from this binary example extend to more complex multinomial models, although in those models signing the direction of the bias may prove impractical. We move on to multinomial models in the next section.

3 Choice Set Misspecification: Bias and Its Correction

In this section we demonstrate how choice set misspecification introduces a particular form of bias into demand estimation and show how, in discrete choice models with Logit errors, the Fixed-Effect Logit (FE Logit), Purchase-History Logit (PH Logit), and Cross-Section Logit (XS Logit) provide alternative solutions to this problem.

3.1 Bias from Choice Set Misspecification

Preference fundamentals

Suppose that there are $i = 1, \dots, I$ individuals who are observed to make $t = 1, \dots, T$ choices, $Y_i = (Y_{i1}, \dots, Y_{iT})$, where $Y_i$ is $i$’s sequence of choices.[16] During any choice situation $t$, we define the union of all alternatives available to any consumer to be the universal choice set in period $t$, $U_t = \{1, \dots, J_t\}$. Denote by $\times$ the Cartesian product and define $\mathcal{U} \equiv \times_{t=1}^{T} U_t$. However, because of various constraints, we assume that every individual is matched to their own choice set $CS_{it}^\star \subseteq U_t$ in each choice situation $t$. We let consumer $i$’s sequence of choice sets be given by $CS_i^\star = (CS_{i1}^\star, \dots, CS_{iT}^\star)$. By construction, any observed $Y_i$ must belong to the set of possible choice sequences $\mathcal{CS}_i^\star \equiv \times_{t=1}^{T} CS_{it}^\star$. Given preferences $\theta$ and a specific match with a sequence of choice sets, each individual makes a sequence of choices $Y_i$.
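To make the notation concrete, the following minimal sketch builds the sets of choice sequences as Cartesian products of per-period choice sets. The product labels, the two-period horizon, and the helper name `sequence_set` are hypothetical and chosen only for illustration; in practice the true choice sets are unobserved.

```python
from itertools import product


def sequence_set(per_period_sets):
    """Return the Cartesian product of per-period choice sets as a set of tuples.

    Each tuple is one possible sequence of choices (one alternative per period).
    """
    return set(product(*per_period_sets))


# Hypothetical example: T = 2 periods, three products available in the market.
universal = [{1, 2, 3}, {1, 2, 3}]   # U_1, U_2
true_sets = [{1, 2}, {2, 3}]         # CS*_i1, CS*_i2 (unobserved in practice)

U_seq = sequence_set(universal)      # all sequences over the universal choice sets
CS_seq = sequence_set(true_sets)     # sequences actually feasible for consumer i

observed = (1, 3)                    # Y_i = (Y_i1, Y_i2), one observed sequence

print(len(U_seq), len(CS_seq))
print(observed in CS_seq, CS_seq <= U_seq)
```

Any observed sequence must lie in the feasible set, which is itself a subset of the universal set of sequences; the estimation problem arises because only `observed` and `universal` are available to the econometrician.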
We assume that the conditional indirect utility of alternative $j$ in choice situation $t$ for individual $i$ is

\[
u_{ijt} = V_{ijt}(X, \theta) + \varepsilon_{ijt},
\]

where $X$ are regressors and $\varepsilon_{ijt}$ is an unobserved portion of utility.

[16] For notational convenience, we assume each $i$ makes exactly $T$ choices, but this assumption is inessential for our results.

The probability that individual $i$ makes the sequence of choices $Y_i = j$ and is matched to the set of possible choice sequences $\mathcal{CS}_i^\star = s$ is:

\[
\Pr[Y_i = j, \mathcal{CS}_i^\star = s \mid \theta, \gamma] = \Pr[Y_i = j \mid \mathcal{CS}_i^\star = s, \theta]\; \Pr[\mathcal{CS}_i^\star = s \mid \gamma],
\tag{8}
\]

where $\Pr[\mathcal{CS}_i^\star = s \mid \gamma]$ is the probability with which individual $i$ is matched to the set of sequences $\mathcal{CS}_i^\star = s$ and $\gamma$ is a vector of parameters governing it.

Equation (8) is quite general and captures two different features of behavior in a typical product market. The first term, $\Pr[Y_i = j \mid \mathcal{CS}_i^\star = s, \theta]$, embodies consumer preferences for products given their sequence of choice sets, $CS_i^\star$. These preference parameters are given by the parameter vector $\theta$. The second term, $\Pr[\mathcal{CS}_i^\star = s \mid \gamma]$, embodies the matching of consumers to their (unobserved) sequence of choice sets, $CS_i^\star$. This matching could be a function of consumer preference parameters, e.g. due to consumer search, and/or of firms’ cost parameters, e.g. due to product assortment decisions by retailers. The parameters governing it are denoted by the parameter vector $\gamma$. In principle, $\gamma$ could include some or all of the same parameters that are in $\theta$.

We make the following important assumption:

• [A1] Assume that every $\varepsilon_{ijt}$ is identically distributed Type I Extreme Value independently of the specific sequence of choice sets to which $i$ is matched. Thus $\Pr[Y_i = j \mid \mathcal{CS}_i^\star = s, \theta]$ is Logit for any $s$.

Under [A1], the matching of consumers to choice sets embodied in the second term in (8), $\Pr[\mathcal{CS}_i^\star = s \mid \gamma]$, does not affect the consistent estimation of $\theta$. This is because with [A1] we have assumed we have the correct functional form for the first term in (8).
Thus, even if $\gamma$ contains elements of $\theta$, failing to account for the matching of consumers to choice sets only impacts the efficiency of our estimate of $\theta$, not its consistency.

While [A1] is a strong assumption, we note that it is commonly made in empirical work. Indeed, any Logit model of individual choice behavior that ignores choice set misspecification is itself making Assumption [A1] for the universal choice set, $\mathcal{U}$. Extending the methods introduced here to non-Logit demand, whether in general or in response to the endogenous matching of individuals to choice sets, e.g. due to customer search and/or firms’ product choice decisions, is a promising topic for further research.

Bias from Choice Set Misspecification

Given Assumption [A1], the conditional probability of individual $i$ making the sequence of choices $Y_i = j$ given their sequence of choice sets, $\mathcal{CS}_i^\star$, is:

\[
\Pr[Y_i = j \mid \mathcal{CS}_i^\star = s, \theta] = \prod_{t=1}^{T} \frac{\exp(V_{ijt}(X, \theta))}{\sum_{k \in CS_{it}^\star} \exp(V_{ikt}(X, \theta))}.
\tag{9}
\]

By contrast, the typical applied researcher will ignore choice set heterogeneity and instead specify the conditional probability of $i$ choosing $Y_i$ from the universal choice set, $\mathcal{U}$:

\[
\Pr[Y_i = j \mid \mathcal{U}, \theta] = \prod_{t=1}^{T} \frac{\exp(V_{ijt}(X, \theta))}{\sum_{k \in U_t} \exp(V_{ikt}(X, \theta))}.
\tag{10}
\]

While equations (9) and (10) are superficially similar, the difference in the set over which the denominator of the Logit probability is summed ($CS_{it}^\star$ in (9), $U_t$ in (10)) is critically important. If individuals make choices from sequences of choice sets strictly included in $\mathcal{U}$ [i.e., when the true model is (9)], a likelihood function derived from (10) will generally lead to an inconsistent estimate of $\theta$. The following formula illustrates why.
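If individual $i$ were to choose from the sequence of universal choice sets $\mathcal{U}$, then the probability of making the sequence of choices $Y_i = j \in \mathcal{CS}_i^\star = s \subseteq \mathcal{U}$ would be:

\[
\begin{aligned}
\Pr[Y_i = j \mid \mathcal{U}, \theta]
&= \Pr[Y_i = j \mid \mathcal{CS}_i^\star = s, \theta]\; \Pr[Y_i \in \mathcal{CS}_i^\star = s \mid \mathcal{U}, \theta] \\
&= \prod_{t=1}^{T} \frac{\exp(V_{ijt}(X, \theta))}{\sum_{k \in s_t} \exp(V_{ikt}(X, \theta))} \cdot \frac{\sum_{k \in s_t} \exp(V_{ikt}(X, \theta))}{\sum_{r \in U_t} \exp(V_{irt}(X, \theta))},
\end{aligned}
\tag{11}
\]

where $\Pr[Y_i \in \mathcal{CS}_i^\star = s \mid \mathcal{U}, \theta]$ is the probability that $i$’s sequence of choices will belong to $\mathcal{CS}_i^\star = s$ whenever she chooses from the universal set of sequences $\mathcal{U} \equiv \times_{t=1}^{T} U_t$. $\Pr[Y_i \in \mathcal{CS}_i^\star = s \mid \mathcal{U}, \theta]$ is a measure of the importance, in terms of $i$’s preferences, of the sequences of alternatives “feasible” in $\mathcal{CS}_i^\star = s$ relative to the “complete” set $\mathcal{U}$.

Re-arranging (11), one can express the true model (9) in terms of the universal set of sequences $\mathcal{U}$:

\[
\begin{aligned}
\Pr[Y_i = j \mid \mathcal{CS}_i^\star = s, \theta]
&= \overbrace{\Pr[Y_i = j \mid \mathcal{U}, \theta]}^{\text{incorrect imputation}} \cdot \frac{1}{\Pr[Y_i \in \mathcal{CS}_i^\star = s \mid \mathcal{U}, \theta]} \\
&= \prod_{t=1}^{T} \frac{\exp\!\left(V_{ijt}(X, \theta) - \ln \Pr[Y_{it} \in s_t \mid U_t, \theta]\right)}{\sum_{k \in U_t} \exp(V_{ikt}(X, \theta))} \\
&= \prod_{t=1}^{T} \frac{\exp\!\left(V_{ijt}(X, \theta) - \ln\!\left(\dfrac{\sum_{k \in s_t} \exp(V_{ikt}(X, \theta))}{\sum_{r \in U_t} \exp(V_{irt}(X, \theta))}\right)\right)}{\sum_{k \in U_t} \exp(V_{ikt}(X, \theta))}.
\end{aligned}
\tag{12}
\]

Here $\Pr[Y_{it} \in s_t \mid U_t, \theta]$ denotes the period-$t$ factor of $\Pr[Y_i \in \mathcal{CS}_i^\star = s \mid \mathcal{U}, \theta]$, which the final line writes out explicitly. The final line shows how, if $\varepsilon_{ijt}$ is i.i.d. Type I Extreme Value, the true model can be written as a choice probability from the sequence of universal choice sets augmented by a term whose omission may cause a bias. This term, the log of the probability of choosing within the true choice set, is a function of the same systematic utilities that enter the numerator and the denominator and will generally cause a bias unless $\Pr[Y_i \in \mathcal{CS}_i^\star = s \mid \mathcal{U}, \theta] = 1$, i.e. there will be bias unless the probability of selecting one of the sequences not in $\mathcal{CS}_i^\star = s$ equals 0 when facing the full set of sequences $\mathcal{U}$. While $\Pr[Y_i \in \mathcal{CS}_i^\star = s \mid \mathcal{U}, \theta]$ is never exactly 1 in a Logit model (or any random utility model with unbounded support for $\varepsilon_{ijt}$), continuity suggests that if one mistakenly attributes to individual $i$ alternatives she does not like, then the expected bias should be small.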
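The decomposition in (11) and the augmented form in (12) can be checked numerically in a single choice situation. The sketch below is illustrative only: the utility values, the four-product universal set, and the helper names `logit_prob`, `prob_in_set`, and `bias_term` are assumptions made for the example, not quantities from the paper.

```python
import math


def logit_prob(j, choice_set, V):
    """Logit probability of alternative j within choice_set, given utilities V."""
    denom = sum(math.exp(V[k]) for k in choice_set)
    return math.exp(V[j]) / denom


def prob_in_set(s, U, V):
    """Probability that a choice made from U falls in the subset s."""
    return sum(math.exp(V[k]) for k in s) / sum(math.exp(V[k]) for k in U)


def bias_term(s, U, V):
    """Extra numerator term in (12): minus the log of prob_in_set."""
    return -math.log(prob_in_set(s, U, V))


# Hypothetical systematic utilities for one period.
V = {1: 0.5, 2: -0.2, 3: 1.0, 4: 0.1}
U_t = {1, 2, 3, 4}   # universal choice set
s_t = {1, 2}         # true (unobserved) choice set
j = 1                # chosen alternative

p_true = logit_prob(j, s_t, V)   # true model, as in (9)
p_univ = logit_prob(j, U_t, V)   # misspecified model, as in (10)

# Augmented universal-set probability, as in the last line of (12).
p_aug = math.exp(V[j] + bias_term(s_t, U_t, V)) / sum(math.exp(V[k]) for k in U_t)

# Same setting, but the wrongly included product 3 is now strongly preferred.
V_liked = dict(V)
V_liked[3] = 3.0

print(p_true, p_univ, p_aug)
print(bias_term(s_t, U_t, V), bias_term(s_t, U_t, V_liked))
```

In this sketch `p_aug` reproduces `p_true`, while `p_univ` understates it, and raising the utility of a wrongly included product raises the omitted term.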
However, if one mistakenly attributes to individual $i$ alternatives she really likes and did not choose only because they were not included in $\mathcal{CS}_i^\star = s$, then the bias may be expected to be large.

Choice set reduction versus choice set enlargement

The current bias may seem surprising given the classic result by McFadden (1978) regarding estimation of Logit models on a subset of choices. There he showed that, with Type I Extreme Value errors and all individuals making choices from $\mathcal{U}$, one may, for computational reasons, estimate the model parameters from $\Pr[Y_i = j \mid D_i = s, \theta]$ rather than from $\Pr[Y_i = j \mid \mathcal{U}, \theta]$, $\forall i$. This is an exercise in choice set reduction. In performing this task, the econometrician designs a sampling process that matches individuals to choice sets smaller than the universal set.[17] In such cases, McFadden (1978) shows that the econometrician can consistently estimate $\theta$ from $\Pr[Y_i = j \mid D_i = s, \theta]$ as long as the “matching” between $i$’s and $D_i$’s satisfies a Uniform Conditioning Property.[18] Thus, choice set reduction is a useful estimation strategy in Logit models.

[17] For example, the econometrician could set the choice set for $i$ in estimation, $D_i$, to be the product chosen by $i$, $j_i$, and $N$ other products chosen with equal probability from the remaining alternatives in $\mathcal{U}$.

In our case, however, we are dealing with the opposite case of choice set enlargement. In other words, we are considering cases where the researcher inappropriately assumes all consumers have access to the sequence of universal choice sets, $\mathcal{U}$, when in fact they only choose among a subset of choices, $\mathcal{CS}_i^\star$. Equation (12) shows that, even if one assumes a Logit model, the presence of the additional term in the numerator, minus the log of the probability that $i$’s choice falls in her true choice set when she faces the universal one, will generally cause a bias. Indeed, this term is always positive, and it is larger the smaller is the true choice set, $\mathcal{CS}_i^\star = s$, relative to the universal choice set, $\mathcal{U}$.
Moreover, the magnitude of this term depends on preferences: the more preferred are the extra alternatives mistakenly included in the choice set (i.e., the higher is $\sum_{r \in U_t \setminus s_t} \exp(V_{irt}(X, \theta))$, $t = 1, \dots, T$), the larger is the bias.

3.2 Correcting Choice Set Misspecification Bias

In this section, we illustrate our proposed solution to the problem of choice set misspecification. We then discuss three special cases of our proposed solution: the Fixed-Effect Logit (FE Logit), the Purchase-History Logit (PH Logit), and the Cross-Section Logit (XS Logit). To conclude, we briefly discuss why other existing methods may not solve the problem of choice set misspecification.

3.2.1 General Solution: Sufficient Sets

The essence of our proposed solution is to identify “sufficient” sets of choices that both (1) contain the observed choices made by consumers and (2) must lie within the consumers’ true unobserved choice sets.[19] We show how, when one has a sufficient set, the convenient functional form of the Logit model allows us to “difference out” consumers’ true, unobserved-to-the-econometrician, choice sets and to identify and estimate preferences based on variation in price and other product characteristics among the products in the sufficient set.

[18] While the example in the previous footnote would do so, for efficiency reasons the econometrician may decide to implement non-uniform matching probabilities between $i$’s and $D_i$’s. In that case, consistency of the estimator is achieved by accounting for the matching probabilities (designed by the econometrician) in the likelihood function. See Train et al. (1987) for an application of non-uniform matching probabilities.
[19] As our method works in both cross-sectional and panel data environments, for expositional convenience we refer to a comparison of consumers’ (plural) choice sets. In a panel data context, these “consumers” would actually represent a single consumer at different points of time.
To show this, consider the following condition, which we will use to evaluate possible solution methods:

• [C1] Given any sequence of choices $Y_i \in CS_i^\star$, there exists a set of sequences of choices, $f$, such that $f(Y_i) \subseteq CS_i^\star$.

Condition [C1] states that one can use the observed set of choices $Y_i$ to construct a set of choices $f(Y_i)$ that is always included in the unobserved set of possible choices $CS_i^\star$. As the classic result by McFadden (1978) suggests, reducing choice sets is not a problem in Logit models. In the same spirit, we now show that if conditions [A1] and [C1] hold, then $f(Y_i)$ is a sufficient statistic (sufficient set) for $CS_i^\star$. Afterwards, we will propose three different sufficient conditions on $CS_i^\star$ that lead to four examples of functions $f$ that satisfy condition [C1] and imply the following proposition.

Proposition 1. If [A1] and [C1] hold, for every individual $i$ and set of choices $j$:

\[
\Pr[Y_i = j \mid f(Y_i) = r, \theta] = \frac{\prod_{t=1}^{T} \exp(V_{ijt}(X, \theta))}{\sum_{k \in f(k) = r} \prod_{t=1}^{T} \exp(V_{ikt}(X, \theta))}.
\]

Then, $\theta$ can be consistently estimated by Maximum Likelihood from $\Pr[Y_i = j \mid f(Y_i) = r, \theta]$.

Proposition 1 can be proved as follows:

\begin{align*}
\Pr[Y_i = j \mid f(Y_i) = r, \theta]
&= \frac{\Pr[f(Y_i) = r, Y_i = j \mid \theta]}{\Pr[f(Y_i) = r \mid \theta]}
 = \frac{\Pr[f(Y_i) = r \mid Y_i = j, \theta] \, \Pr[Y_i = j \mid \theta]}{\Pr[f(Y_i) = r \mid \theta]} \\
&= \frac{\Pr[f(Y_i) = r \mid Y_i = j, \theta] \, \Pr[Y_i = j \mid CS_i^\star = s, \theta]}{\Pr[f(Y_i) = r \mid CS_i^\star = s, \theta]} \\
&= \frac{\Pr[f(Y_i) = r \mid Y_i = j, \theta] \, \Pr[Y_i = j \mid CS_i^\star = s, \theta]}{\sum_{k \in U} \underbrace{\Pr[f(Y_i) = r \mid \theta, Y_i = k]}_{\text{Equals 1 if } k \in r,\ 0 \text{ else}} \Pr[Y_i = k \mid CS_i^\star = s, \theta]} \\
&= \frac{\prod_{t=1}^{T} \dfrac{\exp(V_{ijt}(X, \theta))}{\sum_{v \in s_t} \exp(V_{ivt}(X, \theta))}}{\sum_{k \in f(k) = r} \prod_{t=1}^{T} \dfrac{\exp(V_{ikt}(X, \theta))}{\sum_{v \in s_t} \exp(V_{ivt}(X, \theta))}} \\
&= \frac{\prod_{t=1}^{T} \exp(V_{ijt}(X, \theta))}{\sum_{k \in f(k) = r} \prod_{t=1}^{T} \exp(V_{ikt}(X, \theta))} \tag{13}
\end{align*}

Then, consistency of the Maximum Likelihood estimator of $\theta$ follows from McFadden (1978).
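The cancellation behind Proposition 1 can be checked numerically. The sketch below is only an illustration under assumed values: the utilities `V`, the candidate true choice sets `small` and `large`, and all function names are hypothetical and do not come from the paper. It computes the conditional probability once from the "true" choice sets and once from the right-hand side of the proposition, which never uses them.

```python
import math

def sequence_prob(seq, V, choice_sets):
    """Logit probability of a choice sequence when period t's true choice set is choice_sets[t]."""
    prob = 1.0
    for t, j in enumerate(seq):
        denom = sum(math.exp(V[k][t]) for k in choice_sets[t])
        prob *= math.exp(V[j][t]) / denom
    return prob

def conditional_prob_true_sets(seq, r, V, choice_sets):
    """Pr[Y = seq | f(Y) = r], computed using the (normally unobserved) true choice sets."""
    total = sum(sequence_prob(k, V, choice_sets) for k in r)
    return sequence_prob(seq, V, choice_sets) / total

def conditional_prob_prop1(seq, r, V):
    """Right-hand side of Proposition 1: only utilities of sequences in r are used."""
    def weight(s):
        return math.exp(sum(V[j][t] for t, j in enumerate(s)))
    return weight(seq) / sum(weight(k) for k in r)

# Hypothetical systematic utilities V[j][t] for J = 5 products and T = 2 periods.
V = {1: [0.5, 0.1], 2: [0.0, 0.3], 3: [-0.2, 0.8], 4: [1.0, -0.5], 5: [0.4, 0.4]}
r = [(1, 3), (3, 1)]                   # sufficient set: permutations of the observed sequence
small = [{1, 2, 3}, {1, 2, 3}]         # one candidate true choice set
large = [{1, 3, 4, 5}, {1, 3, 4, 5}]   # another candidate that also contains r

p_small = conditional_prob_true_sets((1, 3), r, V, small)
p_large = conditional_prob_true_sets((1, 3), r, V, large)
p_prop1 = conditional_prob_prop1((1, 3), r, V)
print(p_small, p_large, p_prop1)
```

All three printed values coincide, which is the sense in which the true choice set is "differenced out" once one conditions on a set of sequences contained in it.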
The first equality follows from repeatedly applying Bayes's rule to $\Pr[Y_i = j, f(Y_i) = r \mid CS_i^\star = s, \theta]$. The third equality, which expands the denominator over $k \in U$, relies on $\Pr[f(Y_i) = r \mid \theta, Y_i = k]$ being 1 for any $k \in f(k) = r$ and 0 otherwise, so that only the sequences in $r$ survive in the sum. This is an extreme case of McFadden (1978)'s uniform conditioning property. In our context this property holds since, conditional on a realization of $Y_i$, $Y_i = k$, $f(Y_i)$ is not a random variable (i.e., $f(k)$ is either $r$ with probability one, or different from $r$ with probability one). The last equality is the most important as, in it, $\sum_{v \in s_t} \exp(V_{ivt})$ cancels out of both numerator and denominator. Thus the likelihood of $Y_i = j$ given a sufficient sequence, $f(Y_i) = r$, can be evaluated by looking only at the characteristics of the choices in $f(Y_i) = r$, and not at the unobserved-to-the-econometrician individual-specific choice sets, $s_t$.

Proposition 1 implies that, in the presence of unobserved choice set heterogeneity, the parameters $\theta$ can be consistently estimated by Maximum Likelihood on the basis of the conditional Logit model $\Pr[Y_i = j \mid f(Y_i) = r, \theta]$. Whenever $CS_i^\star$ is observable, one can easily detect appropriate subsets of $CS_{it}^\star$ for each $(i, t)$ combination, and exploit McFadden (1978)'s result to consistently estimate $\theta$. However, whenever $CS_i^\star$ is unobservable, one needs to be careful in constructing the set $f(Y_i)$. Indeed, if $f(Y_i)$ is not a subset of $CS_i^\star$, then condition [C1] does not hold and one can use our earlier argument to show inconsistency of any estimator based on $\Pr[Y_i = j \mid f(Y_i) = r, \theta]$, i.e., one is mistakenly enlarging $(i, t)$'s choice set.

3.2.2 Specific Solutions: Fixed-Effect, Purchase History, and Cross-Section Logits

Introduction

There are, however, reasonable assumptions one can make on $CS_i^\star$ that lead to examples of $f$ that satisfy condition [C1]. We propose three such assumptions that yield four such sets of sequences of choices.

• Stable choice sets (Panel Data). With panel data, assume that during the sequence of choice situations from $\underline{T}$ to $\overline{T}$, $CS_{it}^\star = CS_{i\overline{T}}^\star$. If so, there are two sets of choices that can be constructed:

  – Fixed-Effect Logit: For each individual $i$, let $P(Y_i)$ index the set of all possible permutations of the observed sequence of choices, $Y_i = (Y_{i1}, \dots, Y_{iT})$. For example, suppose that $CS_{i1}^\star = CS_{i2}^\star = \{1, 2, 3\}$ and that $Y_i = (1, 3)$. Then $P(Y_i) \equiv \{(1, 3), (3, 1)\}$. Since $CS_i^\star \equiv \{1, 2, 3\} \times \{1, 2, 3\} = \{(1, 1), (2, 2), (3, 3), (1, 2), (2, 1), (1, 3), (3, 1), (2, 3), (3, 2)\}$, it is obvious that $P(Y_i) \subseteq CS_i^\star$. This is the classic Fixed-Effect (FE) Logit proposed by Chamberlain (1980), and the sufficient sequence used in estimation is $f_{FE}(Y_i) \equiv P(Y_i)$.

  – Full Purchase-History Logit: Consider all the products purchased by household $i$ between $\underline{T}$ and $\overline{T}$, denoted by $H_i = \bigcup_{t = \underline{T}}^{\overline{T}} \{Y_{it}\}$. Since it relies on the products purchased by $i$ over the whole of its time period, we call this the Full Purchase-History Logit, or FPH Logit. We define the sufficient set of sequences of choices in this case to be $f_{FPH}(Y_i) \equiv \times_{t=1}^{T} H_i$. For example, suppose again that $CS_{i1}^\star = CS_{i2}^\star = \{1, 2, 3\}$ and that $Y_i = (1, 3)$. Then $f_{FPH}(Y_i) = \{1, 3\} \times \{1, 3\} = \{(1, 1), (3, 3), (1, 3), (3, 1)\}$. Since $CS_i^\star$ is the same as in the Fixed-Effect Logit above, it is clear that $f_{FPH}(Y_i) \subseteq CS_i^\star$.

• Stable choice sets (Cross-Section Data). With cross-section data, consider a single time period, $t$, whose subscript we omit for convenience. Suppose that we can identify a subset of individuals, $i \in I$, who have the same $x_{ij}$ and shop in the same fascia, $f$, in period $t$, i.e., $CS_{it}^\star = CS_{i't}^\star = CS_{ft}^\star$, $\forall i, i' \in I$. If so, then one can construct a sufficient set of choices:

  – Inter-Personal Logit: Consider all the products purchased by the households in $I$, denoted $f_{XS}(Y)$, where $Y \equiv \{Y_1, Y_2, \dots, Y_i, \dots, Y_I\}$ is the $1 \times I$ vector of choices made by the $I$ consumers.
Since it relies on the set of products purchased across individuals, we call this the Inter-Personal Logit. For example, suppose that there are three individuals, $i = 1, 2, 3$, each of whom shops at fascia $f$ in period $t$. Further suppose that these three consumers purchased goods 3, 2, and 5. Then the sufficient set of choices in this case is $f_{XS}(Y) = \{2, 3, 5\}$.20 Since all three consumers purchased at the same store, $f$, in the same period, $t$, it is clear that $f_{XS}(Y) \subseteq CS_{ft}^\star$.

• Entry, no exit. Assume that during the sequence of choice situations from $\underline{T}$ to $\overline{T}$, $CS_{it}^\star \subseteq CS_{it+1}^\star$. Then:

  – Past Purchase-History Logit: In any period, $t$, consider the products purchased by household $i$ between $\underline{T}$ and $t$, denoted by $H_{it} = \bigcup_{\tau = \underline{T}}^{t} \{Y_{i\tau}\}$. Since it relies on all the past products purchased by $i$, we call this the Past Purchase-History Logit, or PPH Logit. We define the sufficient set of sequences of choices in this case to be $f_{PPH}(Y_i) \equiv \times_{t=1}^{T} H_{it}$. For example, suppose again that $CS_{i1}^\star = CS_{i2}^\star = \{1, 2, 3\}$ and that $Y_i = (1, 3)$. Then $f_{PPH}(Y_i) = \{1\} \times \{1, 3\} = \{(1, 1), (1, 3)\}$, and again it is clear that $f_{PPH}(Y_i) \subseteq CS_i^\star$.

Note that, in general, $f_{FE}(Y_i) \subseteq f_{FPH}(Y_i) \subseteq CS_i^\star$ and $f_{PPH}(Y_i) \subseteq f_{FPH}(Y_i) \subseteq CS_i^\star$, but that no clear relationship exists between $f_{FE}(Y_i)$ and $f_{PPH}(Y_i)$. A drawback of the FE Logit is that it does not allow the identification of the parameters of time-invariant regressors. Consequently, from a FE Logit one cannot estimate price elasticities, for example, since these depend on predicted choice probabilities and hence on the full parameter vector. By contrast, the PH Logit allows one to estimate all the parameters of individuals' indirect utilities. Moreover, as will become clear later, the PH Logit is much simpler to implement than the FE Logit, and is more efficient than the FE Logit if there is no unobserved preference heterogeneity among individuals.

20 Note that even if we had five consumers that purchased goods 3, 2, 5, 2, and 5, $f_{XS}(Y)$ would still be $\{2, 3, 5\}$.
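The four constructions above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function names `f_fe`, `f_fph`, `f_pph`, and `f_xs` are hypothetical labels for the sets $f_{FE}$, $f_{FPH}$, $f_{PPH}$, and $f_{XS}$, and a choice sequence is represented as a plain tuple of product identifiers.

```python
from itertools import permutations, product

def f_fe(y):
    """Fixed-Effect Logit: all distinct permutations of the observed sequence."""
    return set(permutations(y))

def f_fph(y):
    """Full Purchase-History Logit: every sequence built from products ever purchased."""
    history = sorted(set(y))
    return set(product(history, repeat=len(y)))

def f_pph(y):
    """Past Purchase-History Logit: in period t, only products purchased up to and including t."""
    histories = [sorted(set(y[: t + 1])) for t in range(len(y))]
    return set(product(*histories))

def f_xs(choices):
    """Inter-Personal Logit: products bought by at least one consumer in the group."""
    return set(choices)

y = (1, 3)  # the observed sequence used in the examples above
print(sorted(f_fe(y)))    # [(1, 3), (3, 1)]
print(sorted(f_fph(y)))   # [(1, 1), (1, 3), (3, 1), (3, 3)]
print(sorted(f_pph(y)))   # [(1, 1), (1, 3)]
print(sorted(f_xs([3, 2, 5, 2, 5])))  # [2, 3, 5]
```

The set inclusions stated in the text can be read off directly: `f_fe(y) <= f_fph(y)` and `f_pph(y) <= f_fph(y)` both hold, while neither of `f_fe(y)` and `f_pph(y)` contains the other in this example.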
However, these advantages do not come for free: the PH Logit is not robust against any form of unobserved preference heterogeneity across individuals, while the FE Logit controls for individual-alternative-specific fixed effects. As we will discuss in the next sections, the various ways of constructing $f(Y_i)$ lead to more or less robust and/or efficient estimators along the lines of Hausman and McFadden (1984).

Discussion: Alternative Solution Methods

Note that "standard" methods routinely implemented by applied researchers in differentiated product demand estimation do not necessarily solve choice set misspecification due to unobserved choice sets and endogenous matching. For brevity, call $\ln(\pi_i) \equiv -\ln(\Pr[Y_i \in CS_i^\star = s \mid U, \theta])$.

• Since Lerman and Manski (1981), econometricians have been accustomed to thinking that most choice-based sampling issues in Logit models can be addressed by adding a full set of alternative-specific constants.21 This is not the case for our problem: $\ln(\pi_i)$ is an individual-specific term that only affects the numerator of the Logit formula.

• Empirical researchers typically do not observe the $CS_i^\star$ and usually know little about the matching process, thus they cannot "compute" the bias term, $\ln(\pi_i)$.

• As in the standard case of unobserved individual heterogeneity, one cannot simply estimate the $\ln(\pi_i)$'s, e.g., with individual-specific dummy variables, because of the incidental parameters problem.

• One cannot treat the $\pi_i$'s as "random coefficients" and implement a random effects estimator since, by construction, they are correlated with all the $X$'s.

21 See Bierlaire, Bolduc, and McFadden (2008) for the latest substantial contribution to this strand of literature.

3.3 Elasticity Bounds

This section shows that, in general, our method only allows one to "bound" elasticities rather than to point identify them. The identification issue arises for two reasons: (1) choice sets
are unobservable and (2) our estimator does not estimate the distribution of choice sets in the population, but only the preference parameters. Assume a multinomial logit model with linear indirect utilities; then the true elasticities we wish to estimate are:

\begin{align*}
E_{it}^{jj}(X_{it}, \theta) &= \theta_{price} \, price_{jt} \left( 1 - \frac{\exp(X_{ijt}\theta)}{\sum_{k \in CS_{it}^\star} \exp(X_{ikt}\theta)} \right) \\
E_{it}^{jl}(X_{it}, \theta) &= -\theta_{price} \, price_{lt} \, \frac{\exp(X_{ilt}\theta)}{\sum_{k \in CS_{it}^\star} \exp(X_{ikt}\theta)}. \tag{14}
\end{align*}

As (14) makes clear, even though we may have a consistent estimator of $\theta$, we still do not know the exact $CS_{it}^\star$ for each $i$. However, our method naturally leads to bounds for the own- and cross-price elasticities in (14). On the one hand, our method specifies $f_t(Y_i) \subseteq CS_{it}^\star$; on the other, by definition, $CS_{it}^\star \subseteq U_t$. These inclusions allow us to bound the denominator of the "true" logit choice probability for any $X_{it}$ and $\theta$:

\[
\sum_{k \in f_t(Y_i)} \exp(X_{ikt}\theta) \;\le\; \sum_{k \in CS_{it}^\star} \exp(X_{ikt}\theta) \;\le\; \sum_{k \in U_t} \exp(X_{ikt}\theta). \tag{15}
\]

Given (14), (15), and assuming that $\theta_{price} < 0$, one gets the following elasticity intervals for any $\theta$ and $X_{it}$:

\begin{align*}
\theta_{price} \, price_{jt} \left( 1 - \frac{\exp(X_{ijt}\theta)}{\sum_{k \in U_t} \exp(X_{ikt}\theta)} \right)
&\le E_{it}^{jj}(X_{it}, \theta) \le
\theta_{price} \, price_{jt} \left( 1 - \frac{\exp(X_{ijt}\theta)}{\sum_{k \in f_t(Y_i)} \exp(X_{ikt}\theta)} \right) \\
-\theta_{price} \, price_{lt} \, \frac{\exp(X_{ilt}\theta)}{\sum_{k \in U_t} \exp(X_{ikt}\theta)}
&\le E_{it}^{jl}(X_{it}, \theta) \le
-\theta_{price} \, price_{lt} \, \frac{\exp(X_{ilt}\theta)}{\sum_{k \in f_t(Y_i)} \exp(X_{ikt}\theta)}. \tag{16}
\end{align*}

We construct confidence intervals for these identification regions following Imbens & Manski (2004). For notational simplicity, we limit our discussion to a single elasticity term $E_{it}^{jl}(X_{it}, \theta)$, although the same ideas can be extended to the collection of all elasticities. Refer to the upper and lower bounds of $E_{it}^{jl}(X_{it}, \theta)$ in (16) as $\overline{E}_{it}^{jl}(X_{it}, \theta)$ and $\underline{E}_{it}^{jl}(X_{it}, \theta)$, respectively. Denote the elasticity bounds of $E_{it}^{jl}(X_{it}, \theta)$ by the $2 \times 1$ vector $BE_{it}^{jl}(X_{it}, \theta) = \left[ \underline{E}_{it}^{jl}(X_{it}, \theta), \overline{E}_{it}^{jl}(X_{it}, \theta) \right]'$ and the corresponding elasticity interval from (16) by $IE_{it}^{jl}(X_{it}, \theta)$.
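The bounds in (16) can be illustrated with a small numerical sketch. Everything below is assumed for illustration only: the utilities `v` (standing in for $X_{ikt}\theta$), the prices, the value of `theta_price`, and the three sets are hypothetical, and the function names are not from the paper. The "true" choice set is included purely to check that the true elasticity falls between the two bounds, which in practice cannot be done because that set is unobserved.

```python
import math

def logit_share(j, v, choice_set):
    """Logit probability of alternative j within choice_set, with utilities v[k] = X_k * theta."""
    return math.exp(v[j]) / sum(math.exp(v[k]) for k in choice_set)

def own_price_elasticity(j, v, price, theta_price, choice_set):
    """Own-price elasticity as in (14), evaluated on a given choice set."""
    return theta_price * price[j] * (1.0 - logit_share(j, v, choice_set))

def cross_price_elasticity(l, v, price, theta_price, choice_set):
    """Cross-price elasticity with respect to the price of l, as in (14)."""
    return -theta_price * price[l] * logit_share(l, v, choice_set)

def bounds(elasticity_fn, alt, v, price, theta_price, sufficient_set, universal_set):
    """With theta_price < 0, the universal set gives the lower bound and f_t(Y_i) the upper bound."""
    lower = elasticity_fn(alt, v, price, theta_price, universal_set)
    upper = elasticity_fn(alt, v, price, theta_price, sufficient_set)
    return lower, upper

# Hypothetical inputs.
v = {1: 0.2, 2: -0.1, 3: 0.5, 4: 0.0, 5: 0.3}
price = {1: 1.5, 2: 1.0, 3: 2.0, 4: 1.2, 5: 1.8}
theta_price = -1.2
universal_set = {1, 2, 3, 4, 5}
true_set = {1, 2, 3}          # unobserved in practice
sufficient_set = {1, 3}       # f_t(Y_i), a subset of the true set

own_lo, own_hi = bounds(own_price_elasticity, 1, v, price, theta_price, sufficient_set, universal_set)
own_true = own_price_elasticity(1, v, price, theta_price, true_set)
cross_lo, cross_hi = bounds(cross_price_elasticity, 3, v, price, theta_price, sufficient_set, universal_set)
cross_true = cross_price_elasticity(3, v, price, theta_price, true_set)
print(own_lo, own_true, own_hi)
print(cross_lo, cross_true, cross_hi)
```

The interval shrinks as the sufficient set approaches the universal set, and collapses to a point when the two coincide.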
Then, given $X_{it}$ and our estimator $\hat{\theta}$, we can estimate the elasticity bounds $BE_{it}^{jl}(X_{it}, \theta)$ by $BE_{it}^{jl}(X_{it}, \hat{\theta})$. We derive the corresponding $100(1 - \alpha)$ percent confidence interval $CI_{1-\alpha}$ from the condition:

\[
\inf_{E_{it}^{jl} \in IE_{it}^{jl}(X_{it}, \theta)} \; \lim_{I \to \infty} \Pr\left[ E_{it}^{jl} \in CI_{1-\alpha} \right] \ge 1 - \alpha. \tag{17}
\]

Since our estimator is consistent and asymptotically normal, i.e., $\sqrt{I}\,(\hat{\theta} - \theta) \xrightarrow{d} N(0, V_\theta)$, by the delta-method:

\[
\sqrt{I} \left( BE_{it}^{jl}(X_{it}, \hat{\theta}) - BE_{it}^{jl}(X_{it}, \theta) \right) \xrightarrow{d} N\left( 0, \; \frac{\partial BE_{it}^{jl}(X_{it}, \theta)}{\partial \theta'} \, V_\theta \, \frac{\partial BE_{it}^{jl}(X_{it}, \theta)}{\partial \theta'}^{\prime} \right). \tag{18}
\]

Refer to the $2 \times 2$ asymptotic variance-covariance matrix of $BE_{it}^{jl}(X_{it}, \hat{\theta})$ as $\Sigma_{BE_{it}^{jl}}$. It follows that, whenever $f_t(Y_i)$ is a strict subset of $U_t$ (so that for any $X_{it}$ and $\theta$, $\underline{E}_{it}^{jl}(X_{it}, \theta) < \overline{E}_{it}^{jl}(X_{it}, \theta)$), condition (17) is satisfied by:22

\[
CI_{1-\alpha} = \left[ \underline{E}_{it}^{jl}(X_{it}, \hat{\theta}) - q_{1-\alpha} \sqrt{\Sigma^{11}_{BE_{it}^{jl}}}, \;\; \overline{E}_{it}^{jl}(X_{it}, \hat{\theta}) + q_{1-\alpha} \sqrt{\Sigma^{22}_{BE_{it}^{jl}}} \right], \tag{19}
\]

where $q_{1-\alpha}$ is the $(1 - \alpha)$th quantile of a standard normal distribution.

22 In the extreme case in which $f_t(Y_i) = U_t$, $\underline{E}_{it}^{jl}(X_{it}, \theta) = \overline{E}_{it}^{jl}(X_{it}, \theta)$ for any $X_{it}$ and $\theta$, and (19) is invalid. This is due to a discontinuity at $\underline{E}_{it}^{jl}(X_{it}, \theta) = \overline{E}_{it}^{jl}(X_{it}, \theta)$, since in this case the coverage of the interval is only $100(1 - 2\alpha)\%$ rather than the nominal $100(1 - \alpha)\%$. See Imbens and Manski (2004) for a modification of (19) that overcomes this problem. Importantly, note that (a) both $f_t(Y_i)$ and $U_t$ are always perfectly observed by the econometrician, so that the appropriate $CI_{1-\alpha}$ can always be implemented and (b) in our empirical application $f_t(Y_i) \subset U_t$ for every $i$ and $t$.

Figure 1: Market shares
Notes: xxxxx

4 Empirical Application

4.1 Data

We use UK household-level purchase data from the Kantar Worldpanel. Households record purchases using a scanner in the home. Households typically remain in the panel for around 2 years. For more information on the data see Leicester and Oldfield (2009) and Griffith and O'Connell (2009).
We consider demand for ketchup by 2312 households over the period 2002-2012.23 The UK ketchup market is an ideal place to study the problem of demand estimation in the presence of heterogeneous unobserved choice sets. Heinz is the dominant brand; in 2002 it had a XX% market share. Supermarket own brands made up the rest of the market. Heinz introduced the top-down package in DDMMYY. This gradually became the dominant variety. See Figure 1. This is an ideal situation because we observe products which households clearly prefer, but which we know are not in their choice set on some choice occasions.

23 I think we might want to use data to 2008, cleaner, we aren't currently using own brand top down entry. It's surprising the Heinz market share goes to zero in 2009, need to check this, I think is because we've removed glass bottles.

Figure 2: Mean number of options in consideration set
Notes: xxxxx

We define three sufficient sequences:

• $U$: all $j$ products, the universal choice set
• $PH_{it}$: products that the household has purchased in the past
  – $PHa_{it}$: products that the household has purchased in the past and where the product is available in at least one fascia on that day
  – $PHb_{it}$: products that the household has purchased in the past and where the product is available in the household's chosen fascia on that day

Figure 2 shows the mean (over households) number of products in these different sufficient sequences.

4.2 Empirical Specification

Denote households by $i$, individual choice occasions (shopping trips) by $it$, and options (products) by $j$. The indirect utility household $i$ obtains from purchasing product $j$ at time $t$ is

\[
v_{itj} = \beta_1 p_{tj} + \beta_2 d_{ij} + \sum_k \beta_{3k} x_{kj} + \epsilon_{itj} \tag{20}
\]

where $p$ is price, $d$ is distance, and $x$ are product characteristics (brand, size, topdown). Denote the choice outcome as $y_{itj} = 1$ if $i$ chose $j$ on $t$ and the sufficient sequence as $f_{PH_{it}}$.
Individual predicted purchase probabilities are

\begin{align*}
Pr_{ijt}^{f_{PH_{it}}} &\equiv P\left[ y_{itj} = 1 \mid f_{PH_{it}} = r \right] \tag{21} \\
&= \frac{\exp\left( \hat{\beta}_1 p_{tj} + \hat{\beta}_2 d_{ij} + \sum_k \hat{\beta}_{3k} x_{kj} \right)}{\sum_{l \in f_{PH_{it}}} \exp\left( \hat{\beta}_1 p_{tl} + \hat{\beta}_2 d_{il} + \sum_k \hat{\beta}_{3k} x_{kl} \right)} \tag{22}
\end{align*}

Predicted market shares are

\[
s_{jt}^{f_{PH}} = n_{jt}^{-1} \sum_i P\left[ y_{itj} = 1 \mid f_{PH_{it}} = r \right] P\left[ j \in f_{PH_{it}} = r \right] \tag{23}
\]

Market elasticities are

\begin{align*}
\eta_{jjt} &= \frac{\partial s_{jt}}{\partial p_{jt}} \frac{p_{jt}}{s_{jt}} = \beta_1 p_{jt} (1 - s_{jt}) \tag{24} \\
\eta_{jkt} &= \frac{\partial s_{jt}}{\partial p_{kt}} \frac{p_{kt}}{s_{jt}} = -\beta_1 p_{kt} s_{kt} \tag{25}
\end{align*}

with derivatives

\begin{align*}
\frac{\partial s_{jt}}{\partial p_{jt}} &= \beta_1 s_{jt} (1 - s_{jt}) \tag{26} \\
\frac{\partial s_{jt}}{\partial p_{kt}} &= -\beta_1 s_{jt} s_{kt} \tag{27}
\end{align*}

4.3 Results

                 universal   Purchase History   in market   in fascia
price              -0.591        -1.025           -1.366      -1.431
                   (0.017)       (0.024)          (0.024)     (0.027)
distance           -0.167        -0.054           -0.055
                   (0.002)       (0.003)          (0.003)
topdown             0.810         2.999            1.797       1.815
                   (0.016)       (0.029)          (0.032)     (0.034)
size 600g          -2.479         0.327            0.503       0.079
                   (0.135)       (0.200)          (0.202)     (0.245)
size 700g           0.445         0.272            0.328       0.339
                   (0.017)       (0.021)          (0.021)     (0.022)
size 900g           0.277         0.673            0.835       0.832
                   (0.024)       (0.028)          (0.028)     (0.030)
size 1kg            0.967         1.048            1.080       1.160
                   (0.020)       (0.029)          (0.030)     (0.031)
size 1.1kg         -1.270        -1.838            0.687       0.797
                   (0.103)       (0.110)          (0.151)     (0.161)
size 1.2kg          0.694         0.915            1.533       1.581
                   (0.025)       (0.031)          (0.032)     (0.035)
size 1.3kg          1.400         1.996            1.470       1.607
                   (0.028)       (0.040)          (0.039)     (0.042)
Morr               -0.511         0.849            0.212       0.393
                   (0.058)       (0.068)          (0.067)     (0.080)
Sain               -0.315         0.433            0.164       0.449
                   (0.031)       (0.043)          (0.043)     (0.064)
Tesc               -0.320         0.872            0.459       0.509
                   (0.027)       (0.036)          (0.037)     (0.052)
Hein                0.148        -0.681            0.003       0.079
                   (0.025)       (0.035)          (0.038)     (0.046)
N                 1700178        218820           161343      109173
(it)                36174         36174            36174       36174
(i)                  2334          2334             2334        2334
number of options (j) in households choice set
mean                   47             6                4           3
minimum                47             2                2           2
maximum                47            23               17           8

Chamberlain FE
                      T=2           T=8
price               -1.72         -1.57
                    (0.06)        (0.03)
(i)                  2334          2334
choice sequences     9931          7620

4.4 Specification Testing/Robustness

5 Conclusion

[This section remains to be completed.]

References

Ackerberg, D., L. Benkard, S. Berry, and A. Pakes, “Econometric Tools for Analyzing Market Outcomes,” in J. J. Heckman and E.
Leamer, eds., Handbook of Econometrics, North-Holland, 2007, chapter 63, pp. 4171–4276. Allenby, Greg M and Peter E Rossi, “Marketing models of consumer heterogeneity,” Journal of Econometrics, 1998, 89 (1), 57–78. Berry, Steven, James Levinsohn, and Ariel Pakes, “Automobile Prices in Market Equilibrium,” Econometrica, 1995, 63 (4), 841–890. Bierlaire, Michel, Denis Bolduc, and Daniel McFadden, “The estimation of generalized extreme value models from choice-based samples,” Transportation Research Part B: Methodological, 2008, 42 (4), 381–394. Chamberlain, Gary, “Analysis of Covariance with Qualitative Data,” The Review of Economic Studies, 1980, 47, 225–238. Chiang, Jeongwen, Siddhartha Chib, and Chakravarthi Narasimhan, “Markov chain Monte Carlo and models of consideration set and parameter heterogeneity,” Journal of Econometrics, 1998, 89 (1), 223–248. Conlon, Christopher T and Julie Holland Mortimer, “Demand estimation under incomplete product availability,” American Economic Journal-Microeconomics, 2013, 5 (4), 1–30. Draganska, Michaela and Daniel Klapper, “Choice set heterogeneity and the role of advertising: An analysis with micro and macro data,” Journal of Marketing Research, 2011, 48 (4), 653–669. Eliaz, Kfir and Ran Spiegler, “Consideration sets and competitive marketing,” The Review of Economic Studies, 2011, 78 (1), 235–262. Fox, Jeremy T, “Semiparametric estimation of multinomial discrete-choice models using a subset of choices,” The RAND Journal of Economics, 2007, 38 (4), 1002–1019. Gandhi, Amit, Zhentong Lu, and Xiaoxia Shi, “Estimating Demand for Differentiated Products with Error in Market Shares,” 2013. Working paper, University of Wisconsin-Madison. 27 Goeree, Michelle Sovinsky, “Limited Information and Advertising in the U.S. Personal Computer Industry,” Econometrica, 2008, 76 (5), 1017–1074. Griffith, Rachel and Martin O’Connell, “The Use of Scanner Data for Research into Nutrition,” Fiscal Studies, 2009, 30, 339–365. 
Hausman, Jerry and Daniel McFadden, “Specification tests for the multinomial logit model,” Econometrica, 1984, pp. 1219–1240. Leicester, Andrew and Zoe Oldfield, “An analysis of consumer panel data,” IFS Working Papers W09/09, 2009. Lerman, S. and C. Manski, “On the Use of Simulated Frequencies to Approximate Choice Probabilities,” in C. Manski and D. McFadden, eds., Structural Analysis of Discrete Data with Econometric Applications, MIT Press, 1981. Lu, Zhentong, “A Moment Inequality Approach to Estimating Multinomial Choice Models with Unobserved Consideration Sets,” 2014. Working paper, University of Wisconsin-Madison. Manzini, Paola and Marco Mariotti, “Stochastic choice and consideration sets,” Econometrica, 2014, 82 (3), 1153–1176. Masatlioglu, Yusufcan, Daisuke Nakajima, and Erkut Y Ozbay, “Revealed attention,” American Economic Review, 2012, pp. 2183–2205. Matejka, Filip and Alisdair McKay, “Rational inattention to discrete choices: A new foundation for the multinomial logit model,” American Economic Review, 2015, 105 (1), 272–98. McFadden, Daniel, “Modeling the Choice of Residential Location,” in A. Karlqvist, L. Lundqvist, F. Snickars, and J. Weibull, eds., Spatial Interaction Theory and Planning Models, Vol. 1, North-Holland, 1978, pp. 75–96. Nierop, Erjen Van, Bart Bronnenberg, Richard Paap, Michel Wedel, and Philip Hans Franses, “Retrieving unobserved consideration sets from household panel data,” Journal of Marketing Research, 2010, 47 (1), 63–74. 28 Pancras, Joseph and K Sudhir, “Optimal marketing strategies for a customer data intermediary,” Journal of Marketing Research, 2007, 44 (4), 560–578. Pesendorfer, Martin, “Retail sales: A study of pricing behavior in supermarkets*,” The Journal of Business, 2002, 75 (1), 33–66. Roberts, John H and James M Lattin, “Development and testing of a model of consideration set composition,” Journal of Marketing Research, 1991, pp. 429–440. 
Roberts, John H and James M Lattin, “Consideration: Review of research and prospects for future insights,” Journal of Marketing Research, 1997, pp. 406–410.

Train, K., Discrete Choice Methods with Simulation, Cambridge University Press, 2009.

Train, Kenneth E, Daniel L McFadden, and Moshe Ben-Akiva, “The demand for local telephone service: A fully discrete model of residential calling patterns and service choices,” The RAND Journal of Economics, 1987, pp. 109–123.

Villas-Boas, J Miguel and Ying Zhao, “Retailer, manufacturers, and individual consumers: Modeling the supply side in the ketchup marketplace,” Journal of Marketing Research, 2005, 42 (1), 83–95.

Wooldridge, Jeffrey, Econometric Analysis of Cross-Section and Panel Data, 2nd ed., MIT Press, 2010.

A Details of Proposed Solutions

A.1 Fixed-Effect Logit: Details

Our first proposed solution results from assuming stable choice sets over time and setting $f_{FE}(Y_i) \equiv P(Y_i)$. Chamberlain (1980) introduced this model in the context of time-invariant individual-product-specific heterogeneity. For example, let $u_{ijt} = V_{ijt}(X, \theta) + \varepsilon_{ijt} = \delta_{ij} + X_{ijt}\beta + \varepsilon_{ijt}$, where $\delta_{ij}$ is individual $i$'s heterogeneity parameter for alternative $j$. For all models other than the linear probability model, estimation of choice probabilities requires estimating $\theta = (\delta_{11}, \dots, \delta_{1J}, \delta_{21}, \dots, \delta_{2J}, \dots, \delta_{I1}, \dots, \delta_{IJ}, \beta)$, i.e., both $\beta$ and the $\delta_{ij}$'s.25 That this estimation fails to yield consistent estimates of $\beta$ for $T$ fixed and $I \to \infty$ is known as the incidental parameters problem. The essence of the problem is that for fixed $T$ and $I \to \infty$, the number of parameters grows with the sample size, preventing consistency.
Chamberlain (1980) showed that if the econometrician can observe individual $i$ making at least two different choices from the same choice set, then relating the change in her choice to the change in the characteristics across the choice situations allows one to "difference out" the unobserved heterogeneity in a manner similar to the linear fixed effect models.26 Proposition 1, when paired with $f_{FE}(Y_i) \equiv P(Y_i)$, results in Chamberlain's FE Logit model. Our insight is to recognize that unobserved choice sets, if constant across time, can be treated as individual heterogeneity.

25 In the linear probability model, one can use fixed effects estimation to difference the data, eliminate the $\delta_{ij}$ from the estimated model, and consistently estimate $\beta$ using only across-time variation in the data. The nonlinearity of choice probabilities in systematic utilities for the majority of discrete-choice models (e.g., Probit and Logit) prevents this technique from being used more widely.

26 We provide an example of this below.

As for all applications of the FE Logit, we require each $i$ to make at least two choices from the same choice set. If choice sets evolve across choice situations, one needs to be careful to compare, for each $i$, only choice situations with common choice sets. For example, if $CS_{i1}^\star = CS_{i2}^\star$ and $CS_{i3}^\star = CS_{i4}^\star$, but $CS_{i2}^\star \ne CS_{i4}^\star$, then one can only compare choice situations $t = 1$ with $t = 2$ and $t = 3$ with $t = 4$. Violations of this assumption result in violations of the last equality in (13) and would no longer yield consistent estimates of $\theta$. The assumption of constant choice sets is testable via Hausman tests, something we discuss further in Section xxx below.

To illustrate the functioning of this solution method, assume for simplicity that there are only two choice situations per individual. The dependent variable in each time period is given by the vector of dummy variables indicating if $i$ chose $j$ in $t$, $y_{it} = (y_{i1t}, \dots, y_{ijt}, \dots, y_{iJt})$, where $y_{ijt} = 1$ if $i$ chose $j$ in $t$ and 0 otherwise. In this (simplest) case, the Chamberlinian sufficient statistic is $y_{i1} + y_{i2}$. Further assume that in the universal choice set there are 5 alternatives (i.e., $J = 5$), and that $CS_{i1}^\star = CS_{i2}^\star = s$ is unobserved (and constant between the two choice situations), but $i$ is observed choosing the sequence $j = 3$ in $t = 1$ and $j = 5$ in $t = 2$.

The Chamberlain fix is to examine the probability of seeing the observed choice sequence $Y_i = (3, 5)$ relative to the universe of sequences involving 3 in one period and 5 in the other, $P(Y_i) \equiv \{(3, 5), (5, 3)\}$. This conditional probability is given by $\Pr[Y_i = (3, 5) \mid CS_i^\star = (s, s), y_{i1} + y_{i2} = (0, 0, 1, 0, 1), \theta]$ or, equivalently, by $\Pr[Y_i = (3, 5) \mid \{(3, 5), (5, 3)\}, \theta]$. Let the systematic utilities be given by $V_{ijt}(X, \theta) = \delta_{ij} + X_{ijt}\beta$.27 Then:

\[
\Pr[Y_i = (3, 5) \mid CS_i^\star = (s, s), y_{i1} + y_{i2} = (0, 0, 1, 0, 1), \theta] = \Pr[Y_i = (3, 5) \mid \theta, \{(3, 5), (5, 3)\}].
\]

\begin{align*}
&= \frac{\Pr[Y_i = (3, 5) \mid CS_i^\star = (s, s), \theta]}{\Pr[Y_i = (3, 5) \mid CS_i^\star = (s, s), \theta] + \Pr[Y_i = (5, 3) \mid CS_i^\star = (s, s), \theta]} \\
&= \frac{\Pr[Y_{i1} = 3 \mid CS_{i1}^\star = s, \theta] \Pr[Y_{i2} = 5 \mid CS_{i2}^\star = s, \theta]}{\Pr[Y_{i1} = 3 \mid CS_{i1}^\star = s, \theta] \Pr[Y_{i2} = 5 \mid CS_{i2}^\star = s, \theta] + \Pr[Y_{i1} = 5 \mid CS_{i1}^\star = s, \theta] \Pr[Y_{i2} = 3 \mid CS_{i2}^\star = s, \theta]} \\
&= \frac{\dfrac{\exp(V_{i31})}{\sum_{j \in s} \exp(V_{ij1})} \dfrac{\exp(V_{i52})}{\sum_{j \in s} \exp(V_{ij2})}}{\dfrac{\exp(V_{i31})}{\sum_{j \in s} \exp(V_{ij1})} \dfrac{\exp(V_{i52})}{\sum_{j \in s} \exp(V_{ij2})} + \dfrac{\exp(V_{i51})}{\sum_{j \in s} \exp(V_{ij1})} \dfrac{\exp(V_{i32})}{\sum_{j \in s} \exp(V_{ij2})}} \\
&= \frac{\exp(\delta_{i3} + X_{i31}\beta) \exp(\delta_{i5} + X_{i52}\beta)}{\exp(\delta_{i3} + X_{i31}\beta) \exp(\delta_{i5} + X_{i52}\beta) + \exp(\delta_{i5} + X_{i51}\beta) \exp(\delta_{i3} + X_{i32}\beta)} \\
&= \frac{\exp(X_{i31}\beta) \exp(X_{i52}\beta)}{\exp(X_{i31}\beta) \exp(X_{i52}\beta) + \exp(X_{i51}\beta) \exp(X_{i32}\beta)} \tag{28}
\end{align*}

where, as in the general case in (13), the unobserved choice sets, $\sum_{j \in s} \exp(V_{ij1}) \sum_{j \in s} \exp(V_{ij2})$, cancel when going from the third to the fourth equality, and the individual-specific effects, $\exp(\delta_{i3}) \exp(\delta_{i5})$, cancel when going from the fourth to the fifth equality.
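The two cancellations in (28) can be verified numerically. The sketch below assumes, purely for illustration, a single scalar characteristic per product and period (stored in `x`), a scalar `beta`, and arbitrary fixed effects `delta`; none of these values or names come from the paper. It compares the conditional probability computed from the full model, including fixed effects and a "true" choice set, with the last line of (28).

```python
import math

def logit_prob(j, t, delta, x, beta, choice_set):
    """Logit probability that product j is chosen in period t from choice_set."""
    def v(k):
        return delta[k] + beta * x[k][t]
    return math.exp(v(j)) / sum(math.exp(v(k)) for k in choice_set)

def fe_conditional_full(delta, x, beta, s):
    """Pr[Y = (3, 5) | {(3, 5), (5, 3)}] from the full model with fixed effects and true set s."""
    p35 = logit_prob(3, 0, delta, x, beta, s) * logit_prob(5, 1, delta, x, beta, s)
    p53 = logit_prob(5, 0, delta, x, beta, s) * logit_prob(3, 1, delta, x, beta, s)
    return p35 / (p35 + p53)

def fe_conditional_closed_form(x, beta):
    """Last line of (28): depends only on beta and the characteristics of products 3 and 5."""
    a = math.exp(beta * x[3][0]) * math.exp(beta * x[5][1])
    b = math.exp(beta * x[5][0]) * math.exp(beta * x[3][1])
    return a / (a + b)

# Hypothetical data: x[j][t] is a scalar characteristic (e.g., price) of product j in period t.
x = {1: [1.0, 1.1], 2: [0.8, 0.9], 3: [1.5, 1.2], 4: [2.0, 2.0], 5: [1.3, 1.6]}
beta = -0.7
delta_a = {1: 0.0, 2: 0.5, 3: 1.0, 4: -0.3, 5: 0.2}
delta_b = {1: 2.0, 2: -1.0, 3: -0.5, 4: 0.7, 5: 1.5}

full_a = fe_conditional_full(delta_a, x, beta, {2, 3, 5})
full_b = fe_conditional_full(delta_b, x, beta, {1, 3, 4, 5})
closed = fe_conditional_closed_form(x, beta)
print(full_a, full_b, closed)
```

Changing either the fixed effects or the true choice set leaves the conditional probability unchanged, so only `beta` is identified from this likelihood contribution.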
As such, one can estimate $\beta$ from variation only in the price and product characteristics in products 3 and 5.

27 The term $\delta_{ij}$ encapsulates all the time-invariant preference heterogeneity of individual $i$ for alternative $j$.

Beyond the specific example in (28), for each individual $i$, one can derive choice probabilities conditional on $\sum_t y_{it} = c_i$, where $c_i$ is the vector counting the number of times across the $t$ periods that $i$ selected each of the $J$ products.28 Within our framework, $c_i$ is the required sufficient statistic for $i$. In this framework, one can only use in estimation those observed sequences of choices that are compatible with more than one permutation of the observed sequence of choices. If the observed sequence of choices is not compatible with any other permutation of choices, then (28) will equal one, and the observation will not contribute to the likelihood function. For example, those individuals who never switch products cannot be used in estimation, as there is only one possible sequence associated with that permutation: the sequence of choosing the same product in each period. Lee (2002) provides further details about how to construct likelihood functions and similar objects for this class of models. We present further results associated with estimating a FE Logit on relatively large choice sets in relatively long panels in Appendix 5.

A.2 Purchase-History Logits: Details

Our second proposed solution results from assuming that choice sets do not "shrink" over time and setting $f(Y_i) = f_{FPH}(Y_i)$ or $f_{PPH}(Y_i)$. To illustrate the functioning of this solution method, assume for simplicity that there are three choice situations per individual. $y_{ijt}$ and $y_{it}$ are defined as in the previous subsection, as is the universal choice set, with $J = 5$. Suppose $i$'s choice set, $CS_i^\star$, is $s_1$ in period 1, $s_2$ in period 2, and $s_3$ in period 3. Per our definitions in the last section, it must be that $s_1 \subseteq s_2 \subseteq s_3$.
Thus $CS_{it}^\star$ could be constant or growing across periods, but not shrinking. Suppose $i$ is observed choosing the sequence $j = 3$ in $t = 1$, $j = 5$ in $t = 2$, and $j = 1$ in $t = 3$.

The Past Purchase History fix is to examine the probability of seeing the observed choice sequence, $Y_i = (3, 5, 1)$, relative to the universe of sequences that have 3 in the first period, 3 or 5 in the second period, and 1, 3, or 5 in the third period. Thus $f_{PPH}(Y_i) = \{3\} \times \{3, 5\} \times \{1, 3, 5\}$. Let the systematic utilities be given by $V_{ijt}(X, \theta) = \delta_j + X_{ijt}\beta$.29 Then:

28 Recall $c_i = (0, 0, 1, 0, 1)$ in our example above.

29 Note that we have dropped the possibility of individual-product-specific fixed effects, $\delta_{ij}$, in favor of simple product dummies, $\delta_j$.

\begin{align*}
&\Pr[Y_i = (3, 5, 1) \mid CS_i^\star = (s_1, s_2, s_3), f_{PPH}(Y_i) = \{3\} \times \{3, 5\} \times \{1, 3, 5\}, \theta] \\
&= \frac{\Pr[Y_i = (3, 5, 1)]}{\Pr[Y_i = (3, 5, 1)] + \Pr[Y_i = (3, 5, 3)] + \Pr[Y_i = (3, 5, 5)] + \Pr[Y_i = (3, 3, 1)] + \Pr[Y_i = (3, 3, 3)] + \Pr[Y_i = (3, 3, 5)]} \\
&= \frac{\Pr[Y_{i1} = 3]}{\Pr[Y_{i1} = 3]} \times \frac{\Pr[Y_{i2} = 5]}{\Pr[Y_{i2} = 3] + \Pr[Y_{i2} = 5]} \times \frac{\Pr[Y_{i3} = 1]}{\Pr[Y_{i3} = 1] + \Pr[Y_{i3} = 3] + \Pr[Y_{i3} = 5]} \\
&= 1 \times \frac{\dfrac{\exp(V_{i52})}{\sum_{j \in s_2} \exp(V_{ij2})}}{\dfrac{\exp(V_{i32})}{\sum_{j \in s_2} \exp(V_{ij2})} + \dfrac{\exp(V_{i52})}{\sum_{j \in s_2} \exp(V_{ij2})}} \times \frac{\dfrac{\exp(V_{i13})}{\sum_{j \in s_3} \exp(V_{ij3})}}{\dfrac{\exp(V_{i13})}{\sum_{j \in s_3} \exp(V_{ij3})} + \dfrac{\exp(V_{i33})}{\sum_{j \in s_3} \exp(V_{ij3})} + \dfrac{\exp(V_{i53})}{\sum_{j \in s_3} \exp(V_{ij3})}} \\
&= \frac{\exp(\delta_5 + X_{i52}\beta)}{\exp(\delta_3 + X_{i32}\beta) + \exp(\delta_5 + X_{i52}\beta)} \times \frac{\exp(\delta_1 + X_{i13}\beta)}{\exp(\delta_1 + X_{i13}\beta) + \exp(\delta_3 + X_{i33}\beta) + \exp(\delta_5 + X_{i53}\beta)} \tag{29}
\end{align*}

where the dependence of the probabilities on $(CS_i^\star = (s_1, s_2, s_3), \theta)$ has been omitted for notational convenience and the unobserved choice sets cancel when going from the third to the fourth equality. As for the FE Logit, one can estimate $\beta$ from variation only in the price and product characteristics in products 1, 3, and 5. More generally, the PPH Logit estimates Logit choice probabilities, but only among those sequences of choices that would be possible given the purchase history of consumer $i$.
Within our framework, the sequence of choice sets defined by the augmentation of any new product is the required sufficient statistic. Unlike in the FE Logit, the PH Logit allows for the estimation of product-specific constants, $\delta_j$. This is important for calculating predicted choice probabilities, elasticities, and other economic objects that depend on the full parameter vector, $\theta$. This benefit comes at the cost of not permitting individual-product-specific effects, $\delta_{ij}$, as these would (still) not be identified due to the incidental parameters problem.

Note also that the (assumed) independence of $\varepsilon_{ijt}$ within individual across time meant that it was possible to split the probability statement about full choice sequences in the first equation into a sequence of period-specific probability statements in the second equation. Whenever $f(Y_i)$ can be expressed as the cartesian product of choice sets, as in the case here and for $f_{PPH}(Y_i)$ more generally, then $\Pr[Y_i = j \mid f(Y_i) = r, \theta]$ simplifies with respect to the general equation, Equation (13). For example, in the case of $f(Y_i) = f_{PPH}(Y_i)$:

\begin{align*}
\Pr[Y_i = j \mid f_{PPH}(Y_i) = r, \theta]
&= \frac{\prod_{t=1}^{T} \exp(V_{ijt}(X, \theta))}{\sum_{k \in f_{PPH}(k) = r} \prod_{t=1}^{T} \exp(V_{ikt}(X, \theta))} \\
&= \prod_{t=1}^{T} \frac{\exp(V_{ijt}(X, \theta))}{\sum_{k \in r_t} \exp(V_{ikt}(X, \theta))}. \tag{30}
\end{align*}

The second equality follows from switching the order of summation and multiplication. An example of this is implicit in moving from the first to the second equation in Equation (31). Note that (30) holds for the PH Logit, but not for the FE Logit, since $P(Y_i)$ cannot be expressed as the cartesian product of choice sets. As we will discuss later, the possibility of factoring choice sets as in (30) represents a great computational advantage for the PH Logit relative to the FE Logit, as the former is a product of small Logits rather than a unique Logit with a large choice set.
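The factorization in (30) can be checked directly on the three-period example. The following sketch is illustrative only: the utilities `V` and the function names are hypothetical. It evaluates the "one large Logit over sequences" form and the "product of small per-period Logits" form and confirms that they agree when the sufficient set is a cartesian product.

```python
import math
from itertools import product

def pph_sequence_form(seq, r_sets, V):
    """First line of (30): one Logit over all sequences in the cartesian product of r_sets."""
    def weight(s):
        return math.prod(math.exp(V[j][t]) for t, j in enumerate(s))
    return weight(seq) / sum(weight(k) for k in product(*r_sets))

def pph_period_form(seq, r_sets, V):
    """Second line of (30): a product of small per-period Logits."""
    return math.prod(
        math.exp(V[j][t]) / sum(math.exp(V[k][t]) for k in r_sets[t])
        for t, j in enumerate(seq)
    )

# Hypothetical utilities V[j][t] for the products purchased in the example, T = 3 periods.
V = {1: [0.1, 0.4, 0.9], 3: [0.7, 0.2, -0.1], 5: [0.3, 0.6, 0.5]}
seq = (3, 5, 1)
r_sets = [[3], [3, 5], [1, 3, 5]]   # f_PPH(Y_i) = {3} x {3, 5} x {1, 3, 5}

p_sequences = pph_sequence_form(seq, r_sets, V)
p_periods = pph_period_form(seq, r_sets, V)
print(p_sequences, p_periods)
```

The sequence form sums over 1 x 2 x 3 = 6 sequences here, and that count grows multiplicatively with the number of periods, whereas the per-period form only ever sums within each period's small set. This is the computational advantage the text refers to.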
A.3 Inter-Personal Logit: Details

Consider a single time period, $t$, whose subscript we omit for convenience.[30] Suppose preferences are given by

$$
u_{ij} = \delta_j + x_{ij}'\beta + \varepsilon_{ij}.
$$

Further suppose that we can identify a subset of individuals, $i \in I$, who (a) have the same $x_{ij}$ (e.g., they have the same demographic variables) and (b) shop in the same fascia, $f$, in period $t$. The identifying assumption is that the set of choices facing each $i$ in $f$ in $t$, $CS_{ft}^{\star}$, is the same (though not necessarily observed by us). What is observed by us is the set of choices, $j \in CS_{ft}^{\star}$, such that at least one of the $i$ bought $j$. Call this set our sufficient “set,” $f_{XS}(Y)$, where $Y \equiv \{Y_1, Y_2, \ldots, Y_i, \ldots, Y_I\}$ is the $1 \times I$ vector of choices made by the $I$ consumers.

To illustrate the functioning of this solution method, assume for simplicity that there are three individuals, $i = 1, 2, 3$, each of whom shops at fascia $f$ in period $t$. Implicit in this assumption is that the choice set is the same for all individuals, something we should be a little careful of if stores differ in size within fascia. Suppose these three consumers purchased goods 3, 2, and 5. Then $f(Y) = \{2, 3, 5\}$.[31] The cross-sectional (inter-personal) fix is to examine the probability of seeing each individual's observed choice, $Y_1 = 3$, $Y_2 = 2$, and $Y_3 = 5$, relative to the choice set defined by the union of those choices across individuals, $\{2, 3, 5\}$. Then:

$$
\begin{aligned}
&\Pr\left[Y = (3,2,5) \mid CS^{\star} = s_{ft},\ f_{XS}(Y) = \{2,3,5\},\ \theta\right] \\
&= \frac{\Pr[Y_1 = 3]}{\Pr[Y_1 = 2] + \Pr[Y_1 = 3] + \Pr[Y_1 = 5]} \times \frac{\Pr[Y_2 = 2]}{\Pr[Y_2 = 2] + \Pr[Y_2 = 3] + \Pr[Y_2 = 5]} \times \frac{\Pr[Y_3 = 5]}{\Pr[Y_3 = 2] + \Pr[Y_3 = 3] + \Pr[Y_3 = 5]} \\
&= \frac{\Pr[Y_1 = 3] \times \Pr[Y_2 = 2] \times \Pr[Y_3 = 5]}{\left(\Pr[Y_i = 2] + \Pr[Y_i = 3] + \Pr[Y_i = 5]\right)^{3}} \qquad \text{if } x_{1j} = x_{2j} = x_{3j} = x_{ij} \\
&= \frac{\dfrac{\exp(V_{13})}{\sum_{j \in s_{ft}} \exp(V_{1j})} \times \dfrac{\exp(V_{22})}{\sum_{j \in s_{ft}} \exp(V_{2j})} \times \dfrac{\exp(V_{35})}{\sum_{j \in s_{ft}} \exp(V_{3j})}}{\left(\dfrac{\exp(V_{i2})}{\sum_{j \in s_{ft}} \exp(V_{ij})} + \dfrac{\exp(V_{i3})}{\sum_{j \in s_{ft}} \exp(V_{ij})} + \dfrac{\exp(V_{i5})}{\sum_{j \in s_{ft}} \exp(V_{ij})}\right)^{3}} \\
&= \frac{\dfrac{\exp(V_{i3})}{\sum_{j \in s_{ft}} \exp(V_{ij})} \times \dfrac{\exp(V_{i2})}{\sum_{j \in s_{ft}} \exp(V_{ij})} \times \dfrac{\exp(V_{i5})}{\sum_{j \in s_{ft}} \exp(V_{ij})}}{\left(\dfrac{\exp(V_{i2})}{\sum_{j \in s_{ft}} \exp(V_{ij})} + \dfrac{\exp(V_{i3})}{\sum_{j \in s_{ft}} \exp(V_{ij})} + \dfrac{\exp(V_{i5})}{\sum_{j \in s_{ft}} \exp(V_{ij})}\right)^{3}} \qquad \text{if } x_{1j} = x_{2j} = x_{3j} = x_{ij} \\
&= \frac{\exp(\delta_3 + x_{i3}'\beta) \times \exp(\delta_2 + x_{i2}'\beta) \times \exp(\delta_5 + x_{i5}'\beta)}{\left(\exp(\delta_2 + x_{i2}'\beta) + \exp(\delta_3 + x_{i3}'\beta) + \exp(\delta_5 + x_{i5}'\beta)\right)^{3}}
\end{aligned}
\tag{31}
$$

where the need for $x_{1j} = x_{2j} = x_{3j} = x_{ij}$ is apparent in moving from the first to the second and from the third to the fourth equalities, in order to get cancellation of the denominator terms that depend on the elements of the true (unobserved) choice set, $s_{ft}$. The denominator is raised to the third power because each of the three consumers contributes the same sum once the $x_{ij}$ coincide. As in the Fixed Effect and Purchase History Logits, one can estimate $\beta$ from variation only in the price and product characteristics of products 2, 3, and 5. As in the PH Logit (but not the FE Logit), the XS Logit allows for estimation of the product-specific constants, $\delta_j$, and thus we can calculate predicted probabilities, elasticities, etc.

[30] The notation in this section must be made comparable to the previous two sub-sections (gsc to do).
[31] Note that even if we had five consumers who purchased goods 3, 2, 5, 2, and 5, $f(Y)$ would still be $\{2, 3, 5\}$.

B Implementation Issues in FE Logit

B.1 Full Sequences versus Subsequences

In this subsection, we discuss estimation of the FE Logit model using an individual's entire sequence of choice situations versus estimation using mutually exclusive portions of the entire sequence of choice situations. For example, assume one observes individual $i$ over $T = 6$ choice situations.
Define $i$'s observed sequence of choice situations by $j_i = (j_1, j_2, j_3, j_4, j_5, j_6)$.[32] By estimating the FE Logit on the entire sequence of choice situations, we mean that for individual $i$ we would use sequence $j_i$ at “once,” as in Chamberlain (1980). By estimating the FE Logit on mutually exclusive portions of the entire sequence of choice situations, we mean that for individual $i$ we would split sequence $j_i$ into parts. For instance, one could split $j_i$ into the mutually exclusive sequences $j_{i1} = (j_1, j_2, j_3)$ and $j_{i4} = (j_4, j_5, j_6)$, or into $j_{i1} = (j_1, j_2)$, $j_{i3} = (j_3, j_4)$, and $j_{i5} = (j_5, j_6)$. In the discussion that follows, we illustrate how these different options have different computational and econometric properties. For expositional purposes, we discuss only two extreme cases: the use of an individual's entire sequence of choice situations versus the use of mutually exclusive pairs of an individual's sequence of choice situations. For brevity, we call the first strategy the sequence FE Logit and the second strategy the pairs FE Logit.

[32] Alternative $j_1$ was chosen in the first choice situation, alternative $j_2$ in the second choice situation, etc.

Computational Burden

Let $\sum_{t=1}^{T} y_{it} = c_i = (c_{i1}, c_{i2}, \ldots, c_{iJ})$ be the distribution of observed choices for $i$ over $T$ choice situations. Let $c_i / T$ be the vector of individual-specific “market shares” for each choice $j$ over $i$'s $T$ choice situations. In general, the sequence FE Logit model has mainly been used in binary models ($J = 2$) with a small number of choice situations ($T \leq 5$). One of the reasons may be related to the growing “size” of the model with respect to $J$ and $T$. In a sequence FE Logit model with $J$ alternatives and $T$ shopping trips, each $i$ has

$$
\frac{T!}{c_{i1}!\, c_{i2}! \cdots c_{iJ}!}
\tag{32}
$$

addends in the denominator of the Logit formula. Expression (32) grows factorially in the number of choice situations, but also depends heavily on the observed behavior of $i$.
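To give a sense of the magnitudes implied by (32), the following sketch evaluates the multinomial coefficient for a few count vectors and cross-checks one case by enumerating the distinct orderings of a short sequence. The function names and the example count vectors are hypothetical, introduced here only for illustration.

```python
from math import factorial
from itertools import permutations

def n_addends(c):
    """Multinomial coefficient T! / (c_1! ... c_J!) for a vector of choice counts c."""
    total = factorial(sum(c))
    for c_j in c:
        total //= factorial(c_j)   # each partial quotient is an exact integer
    return total

def distinct_orderings(choices):
    """All distinct sequences obtainable by reordering the observed choices."""
    return set(permutations(choices))

# A hypothetical individual with T = 4 and counts (0, 0, 1, 1, 2) over J = 5 alternatives.
count_small = n_addends([0, 0, 1, 1, 2])
enumerated = len(distinct_orderings((3, 4, 5, 5)))

# T = 10 choices, all on the same alternative versus all on different alternatives.
count_concentrated = n_addends([10, 0, 0])
count_dispersed = n_addends([1] * 10)

print(count_small, enumerated, count_concentrated, count_dispersed)
```

The concentrated case has a single addend while the fully dispersed case has 10! addends, which illustrates how strongly the size of the denominator depends on observed behavior rather than on T alone.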
Regardless of $J$, the fewer alternatives chosen by $i$ across $T$, the smaller (32) will be. Even though this is an entirely empirical question, one can imagine that larger $J$ may be more likely associated with individuals choosing many different alternatives (i.e., larger (32)). In applications with very “dispersed” observed choices across the $J$ alternatives, this can translate into up to $T!$ addends in the denominator of the Logit formula per individual. For example, with $T = 10$ there may be over 3.5 million addends.

Within the sequence FE Logit, we propose to bypass this computational burden by exploiting the classic result by McFadden (1978) mentioned earlier. This idea is useful for general implementations of Chamberlain's (1980) estimator, beyond its specific use proposed in our context. As can be seen from equation (28), in Chamberlain's Logit formulae, the IIA property holds with respect to sequences of alternatives across choice situations rather than to single alternatives. The addends in the denominator represent the different “sequences of alternatives” the individual can choose from, i.e., her choice set in this framework. We can therefore rely on McFadden's results to sample random subsets of these sequences in our estimation.

By contrast, the dimensionality of the pairs FE Logit is not affected by $J$ and $T$. Regardless of $J$ or $T$, it will always be the product of up to $T/2$ binomial Logits.

Econometric Issues

In addition to issues of computational burden, there are important econometric differences between the sequence and the pairs FE Logits. Suppose $J = 5$, $T = 4$, and that $i$ is observed choosing the sequence of alternatives $j_i = (j_1, j_2, j_3, j_4) = (3, 5, 5, 4)$.[33] The sequence FE Logit has sufficient statistic $c_i = (0, 0, 1, 1, 2)$. The maximum number of addends an individual can have in the denominator of the sequence FE Logit is $T! = 4! = 24$.
However, (32) implies that $i$'s observed combination of choices can only be arranged in 12 different sequences of choices.[34] Collect these sequences into the set $S_i$. Then, $i$'s likelihood contribution in the sequence FE Logit is:

$$
\Pr\nolimits_i\left[(3,5,5,4) \mid S_i\right] = \frac{\exp\{\beta(x_{31} + x_{52} + x_{53} + x_{44})\}}{\sum_{(j_1, j_2, j_3, j_4) \in S_i} \exp\{\beta(x_{j_1 1} + x_{j_2 2} + x_{j_3 3} + x_{j_4 4})\}}.
\tag{33}
$$

[33] Alternative three in the first choice situation, alternative five in the second choice situation, etc.

By contrast, the pairs FE Logit splits $i$'s observed choices into mutually exclusive pairs of choice situations: $j_{i1} = (j_1, j_2) = (3, 5)$ and $j_{i3} = (j_3, j_4) = (5, 4)$. Then, $i$'s likelihood contribution in the pairs FE Logit is:

$$
\Pr\nolimits_i\left[(3,5,5,4)\right] = \frac{\exp\{\beta(x_{31} + x_{52})\}}{\exp\{\beta(x_{31} + x_{52})\} + \exp\{\beta(x_{51} + x_{32})\}} \cdot \frac{\exp\{\beta(x_{53} + x_{44})\}}{\exp\{\beta(x_{53} + x_{44})\} + \exp\{\beta(x_{43} + x_{54})\}}.
\tag{34}
$$

Estimation of model (33) is more efficient than estimation of model (34). Indeed, multiplying the binomial Logits in (34) yields:

$$
\Pr\nolimits_i\left[(3,5,5,4) \mid RS_i\right] = \frac{\exp\{\beta(x_{31} + x_{52} + x_{53} + x_{44})\}}{\sum_{(j_1, j_2, j_3, j_4) \in RS_i} \exp\{\beta(x_{j_1 1} + x_{j_2 2} + x_{j_3 3} + x_{j_4 4})\}},
\tag{35}
$$

where $RS_i \subseteq S_i$ collects the sequences $(3, 5, 5, 4)$, $(3, 5, 4, 5)$, $(5, 3, 5, 4)$, and $(5, 3, 4, 5)$. In this example, the pairs FE Logit only uses information about 4 of the 12 sequences belonging to $S_i$. Furthermore, the pairs FE Logit discards some shopping trips: whenever $j_t = j_{t+1}$ in $j_{it} = (j_t, j_{t+1})$, the “fragment” $j_{it}$ of $j_i$ will not be used in estimation. For example, if $i$ is observed choosing the sequence of alternatives $j_i = (3, 4, 5, 5)$, then only $j_{i1} = (3, 4)$ will be used in the estimation of the pairs FE Logit (while the sequence FE Logit would still be equivalent to (33) above).[35]

Model (33) requires stronger assumptions than model (34) for its consistent estimation. Consistent estimation of model (33) requires that alternatives $\{3, 4, 5\} \subseteq CS_{it}$, $t = 1, 2, 3, 4$.
However, consistent estimation of model (34) only requires that $\{3, 5\} \subseteq CS_{it}$, $t = 1, 2$, and that $\{4, 5\} \subseteq CS_{it}$, $t = 3, 4$. In this example, if $4 \notin CS_{it}$ for $t = 1$ or $2$, or $3 \notin CS_{it}$ for $t = 3$ or $4$, then estimation of model (33) would not be consistent, while estimation of model (34) would. This difference in consistency results suggests a Hausman test for examining whether unobserved choice sets (or matching functions) change over time. If $\{3, 4, 5\} \subseteq CS_{it}$, $t = 1, 2, 3, 4$, then estimation of both model (33) and model (34) is consistent, but estimation of model (33) is more efficient. If $4 \notin CS_{it}$ for $t = 1$ or $2$, or $3 \notin CS_{it}$ for $t = 3$ or $4$, then only estimation of model (34) is consistent.

[34] These sequences are: (3, 5, 5, 4), (5, 3, 5, 4), (5, 5, 3, 4), (5, 5, 4, 3), (4, 3, 5, 5), (3, 4, 5, 5), (3, 5, 4, 5), (5, 3, 4, 5), (5, 4, 3, 5), (5, 4, 5, 3), (4, 5, 3, 5), and (4, 5, 5, 3).
[35] We do not use the sequence $j_{i2} = (4, 5)$ as $j_2$ was used in the formation of $j_{i1} = (3, 4)$. Thus we could use either $j_{i1} = (3, 4)$ or $j_{i2} = (4, 5)$, but not both.

TABLE 1. Increasing the share of individuals with restricted $CS_i$

                                     Universal Logit        True Logit           PH Logit             FE Logit
Data Generating Process    Param.    Bias   RMSE   SD      Bias   RMSE   SD     Bias   RMSE   SD     Bias   RMSE   SD
100% U                     α = −2   -0.07   0.10  0.07    -0.07   0.10  0.07   -0.08   0.10  0.07   -0.38   0.62  0.48
                           β = 2     0.07   0.10  0.06     0.07   0.10  0.06    0.08   0.10  0.06    0.41   0.68  0.54
90% U, 10% U \ {j_i}       α = −2    1.41   1.41  0.03    -0.07   0.09  0.06   -0.07   0.09  0.06   -0.33   0.50  0.37
                           β = 2    -1.41   1.41  0.03     0.07   0.09  0.06    0.07   0.09  0.06    0.36   0.55  0.41
70% U, 30% U \ {j_i}       α = −2    1.65   1.65  0.01    -0.05   0.07  0.05   -0.06   0.08  0.05   -0.15   0.54  0.52
                           β = 2    -1.65   1.65  0.01     0.05   0.07  0.05    0.06   0.08  0.05    0.16   0.56  0.54
50% U, 50% U \ {j_i}       α = −2    1.73   1.73  0.01    -0.05   0.07  0.05   -0.05   0.08  0.06   -0.17   0.54  0.51
                           β = 2    -1.73   1.73  0.01     0.05   0.07  0.05    0.05   0.08  0.06    0.18   0.55  0.52

Notes: For each of the four cases of choice set heterogeneity (i.e., rows), ten datasets are generated. The degree of choice set heterogeneity increases from the first to the fourth row as the share of individuals with restricted choice sets passes from 0% to 50%. #U = 5. The restricted choice sets are defined as $CS_i = U \setminus \{j_i\}$, where $j_i$ is randomly selected for each individual. Each individual faces 10 choice situations. It is always assumed that individuals have stable choice sets (i.e., choice sets do not change across choice situations). In each dataset, there are 1000 individuals.

TABLE 2. Increasing the severity of $CS_i$ restriction

                                         Universal Logit        True Logit           PH Logit             FE Logit
Data Generating Process        Param.    Bias   RMSE   SD      Bias   RMSE   SD     Bias   RMSE   SD     Bias   RMSE   SD
100% U                         α = −2   -0.07   0.10  0.07    -0.07   0.10  0.07   -0.08   0.10  0.07   -0.38   0.62  0.48
                               β = 2     0.07   0.10  0.06     0.07   0.10  0.06    0.08   0.10  0.06    0.41   0.68  0.54
70% U, 30% U \ {j_i}           α = −2    1.65   1.65  0.01    -0.05   0.07  0.05   -0.06   0.08  0.05   -0.15   0.54  0.52
                               β = 2    -1.65   1.65  0.01     0.05   0.07  0.05    0.06   0.08  0.05    0.16   0.56  0.54
70% U, 30% U \ {j_i, k_i}      α = −2    1.78   1.78  0.01    -0.06   0.10  0.08   -0.07   0.11  0.08   -0.71   1.27  0.52
                               β = 2    -1.78   1.78  0.01     0.06   0.10  0.08    0.07   0.11  0.08    0.75   1.33  0.54
70% U, 30% U \ {j_i, k_i, h_i} α = −2    1.85   1.85  0.01    -0.06   0.10  0.09   -0.06   0.11  0.09   -0.12   0.56  0.55
                               β = 2    -1.85   1.85  0.01     0.05   0.10  0.09    0.06   0.11  0.09    0.12   0.56  0.55

Notes: For each of the four cases of choice set heterogeneity (i.e., rows), ten datasets are generated. The degree of choice set heterogeneity increases from the first to the fourth row as the number of alternatives not included in $CS_i$ passes from 0 to 3. #U = 5. The restricted choice sets are defined as $CS_i = U \setminus \{j_i, k_i, h_i\}$, where $j_i$, $k_i$, and $h_i$ are randomly selected for each individual. Each individual faces 10 choice situations. It is always assumed that individuals have stable choice sets (i.e., choice sets do not change across choice situations). In each dataset, there are 1000 individuals.

C Original Example

[To be expanded, just main idea. gsc: I've kept this until we sign off on the Logit implementation of this example in Section 2.]

Imagine a simple binary choice model in which individual $i$ is observed choosing or not some alternative 1:

$$
Y_i =
\begin{cases}
1 & \text{if } u_i \geq 0 \text{ and } CS_i^{\star} = \{1, 0\} \\
0 & \text{if } \left(u_i < 0 \text{ and } CS_i^{\star} = \{1, 0\}\right) \text{ or } CS_i^{\star} = \{0\}.
\end{cases}
$$

Individual $i$'s difference in the indirect utilities of alternatives 1 and 0 is $u_i \equiv u_{i1} - u_{i0} = \beta X_i - \varepsilon_i$, with $\beta X_i \equiv \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \cdots + \beta_K x_{iK}$, while $i$'s probability of $u_i \geq 0$ is $\Pr[u_i \geq 0 \mid \beta, X_i] = G(\beta X_i)$. The probability that any individual $i$ is matched to choice set $CS_i^{\star} = \{1, 0\}$ is $\Pr[CS_i^{\star} = \{1, 0\}] = p$, while the probability of being matched to $CS_i^{\star} = \{0\}$ is $\Pr[CS_i^{\star} = \{0\}] = 1 - p$. Assuming that the matching process between individuals and choice sets is independent of $\varepsilon_i$, $\Pr[Y_i = 1 \mid \beta, X_i] = F(\beta X_i) = G(\beta X_i)\, p$, while $\Pr[Y_i = 0 \mid \beta, X_i] = 1 - F(\beta X_i) = [1 - G(\beta X_i)]\, p + [1 - p] = 1 - G(\beta X_i)\, p$. The derivatives of $F$ and $G$ are, respectively, $f$ and $g$.

For those $i$'s with $CS_i^{\star} = \{1, 0\}$, the average marginal effect with respect to $x_{ik}$ (averaged across the distribution of $X_i$) is:

$$
E_X\left[\left.\frac{\partial \Pr[Y_i = 1 \mid \beta, X_i]}{\partial x_{ik}} \,\right|\, CS_i^{\star} = \{1, 0\}\right] = E_X\left[g(\beta X_i)\,\beta_k\right].
\tag{36}
$$

For those $i$'s with $CS_i^{\star} = \{0\}$, the average marginal effect with respect to $x_{ik}$ (averaged across the distribution of $X_i$) is:

$$
E_X\left[\left.\frac{\partial \Pr[Y_i = 1 \mid \beta, X_i]}{\partial x_{ik}} \,\right|\, CS_i^{\star} = \{0\}\right] = 0.
\tag{37}
$$

By contrast, the average marginal effect with respect to $x_{ik}$ averaged both across the distribution of $X_i$ and the distribution of $CS_i^{\star}$ is:

$$
E_{X, CS^{\star}}\left[\frac{\partial \Pr[Y_i = 1 \mid \beta, X_i]}{\partial x_{ik}}\right] = E_X\left[f(\beta X_i)\,\beta_k\right] = E_X\left[g(\beta X_i)\,\beta_k\right] p.
\tag{38}
$$
By assuming that $X_i$ is distributed multivariate normal, one can obtain a consistent estimate of (38), whatever the true $G$ and $p$, by a linear regression of $Y_i$ on $X_i$:[36] specifically, the estimated coefficient $\alpha_k$ of the linear regression $Y_i = \alpha_0 + \alpha_1 x_{i1} + \cdots + \alpha_k x_{ik} + \cdots + \alpha_K x_{iK} + e_i$. Since $E_X[g(\beta X_i)\,\beta_k] > E_X[g(\beta X_i)\,\beta_k]\, p > 0$ whenever choice sets are heterogeneous (i.e., $p < 1$) and unobserved (i.e., one cannot “condition” on the true $CS_i^{\star}$ or, in other words, cannot estimate the regression only for those with $CS_i^{\star} = \{1, 0\}$), the estimated average effects (38) will overstate the average marginal effects for those with $CS_i^{\star} = \{0\}$, (37), and will understate the average marginal effects for those with $CS_i^{\star} = \{1, 0\}$, (36).

[36] See the discussion on page 579 of Wooldridge (2010).
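The attenuation described in this appendix can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the text: a single standard normal regressor ($K = 1$), a logistic $G$, and hypothetical values $\beta = 1$ and $p = 0.6$. It compares the OLS slope from regressing $Y_i$ on $x_i$ with Monte Carlo averages of the right-hand sides of (36) and (38).

```python
import math
import random

def simulate(n=100_000, beta=1.0, p=0.6, seed=0):
    """Simulate the binary example with unobserved choice sets; return
    (OLS slope, average marginal effect as in (36), average marginal effect as in (38))."""
    rng = random.Random(seed)
    xs, ys, g_vals = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                    # single normal regressor (assumption)
        G = 1.0 / (1.0 + math.exp(-beta * x))      # logistic G (assumption)
        has_alt_1 = rng.random() < p               # CS* = {1, 0} with probability p
        y = 1 if (has_alt_1 and rng.random() < G) else 0
        xs.append(x)
        ys.append(y)
        g_vals.append(G * (1.0 - G))               # logistic density g evaluated at beta * x
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    ols_slope = cov_xy / var_x
    ame_unrestricted = beta * sum(g_vals) / n      # right-hand side of (36)
    ame_mixed = p * ame_unrestricted               # right-hand side of (38)
    return ols_slope, ame_unrestricted, ame_mixed

ols_slope, ame_unrestricted, ame_mixed = simulate()
print(ols_slope, ame_mixed, ame_unrestricted)
```

In this setup the OLS slope tracks the mixed effect in (38) and lies between zero, the effect in (37), and the effect in (36), in line with the ordering stated above.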