Demand estimation with unobserved choice set heterogeneity∗

Gregory S. Crawford†, Rachel Griffith‡, Alessandro Iaria§

April 22, 2016

Abstract

[Preliminary and incomplete: please do not cite without authors’ consent]

We present a method to estimate demand parameters in the presence of unobserved choice set heterogeneity. We describe the estimation bias that arises when the econometrician mistakenly imputes to individuals supersets of their true but unobserved choice sets. We propose procedures related to the Chamberlain (1980) Fixed Effect estimator that allow for unobserved and heterogeneous choice sets, and rely on assumptions about the stability or evolution of choice sets over at least two choice occasions. We present specification tests and illustrate the ideas using household-level scanner data in the market for ketchup in the UK. Our results show that accounting for unobserved choice set heterogeneity is critical for accurately estimating preferences and bounding price elasticities of demand.

∗ We would like to thank Clément de Chaisemartin, Xavier D’Haultfœuille, xx, xx, and seminar participants at xx, xx, and ECARES for helpful comments. Griffith gratefully acknowledges financial support from the European Research Council (ERC) under ERC-2009-AdG grant agreement number 249529, and the Economic and Social Research Council (ESRC) under the Centre for the Microeconomic Analysis of Public Policy (CPP), grant number RES-544-28-0001. Iaria gratefully acknowledges financial support from the research grant Labex Ecodec: Investissements d’Avenir (ANR11-IDEX-0003/Labex Ecodec/ANR-11-LABX-0047).
† Dept. of Economics, University of Zürich and CEPR, [email protected]
‡ University of Manchester and IFS, [email protected]
§ CREST (Paris) and IFS, [email protected]

1 Introduction

Using revealed preference arguments to estimate demand and recover preference parameters is one of the primary tasks of empirical economics (Samuelson (1938), Blundell (2005) and Varian (2006)). Traditional approaches rely on complete knowledge of the choice set from which decision makers are choosing, yet there are many reasons why choice sets may be restricted and heterogeneous across consumers (see, inter alia, Bernheim and Rangel (2009), DellaVigna (2009), Koszegi and Rabin (2007), Hausman (2008), Spiegler (2011), Eliaz and Spiegler (2011), Manski (1977), Manzini and Mariotti (2014), Masatlioglu et al. (2012), Reutskaja et al. (2011)). Researchers and policy makers are interested in studying the empirical relevance and consequences of choice set heterogeneity.

The recent applied literature has sought to accommodate choice set heterogeneity by bringing in additional information on the distribution of choice sets, in the form of further data or functional form restrictions (Goeree (2008), Van Nierop et al. (2010), and De los Santos et al. (2012)). However, we rarely have good information on true choice sets, or specific knowledge of the process of choice set formation, and even when we believe we are able to model the process, there is likely to be uncertainty about the precise functional form.

Our contribution in this paper is to propose an econometric estimator that allows us to consistently estimate preference parameters, and to bound market shares and functions of market shares such as demand elasticities, when consumers face heterogeneous choice sets that are unobserved to the econometrician. Our proposed solution is related to Chamberlain (1980) and McFadden (1978), who show that we can identify preferences using subsets of products known to belong to the true choice set.
We use this intuition and show that it can be extended to exploit observed purchase decisions to construct “sufficient sets” of products that lie within consumers’ true but unobserved choice sets. We require some minimal restrictions on preference heterogeneity, and assumptions about the stability or evolution of unobserved choice sets over at least two choice occasions. These are considerably weaker than requiring complete information about choice sets or the process by which they are determined. They allow us to identify preference parameters in the presence of very general unobserved heterogeneity in both choice sets and preference parameters, and to test the validity of parametric models of choice set formation, such as the one estimated in Goeree (2008).

These sufficient sets, coupled with the convenient functional form of the logit model, allow us to “difference out” consumers’ true, unobserved-to-the-econometrician, choice sets. Importantly, we allow for unobserved preference heterogeneity in the form of mixed logit with latent classes, individual-product specific fixed effects, or the nested logit, and any choice set formation mechanism that is compatible with these forms of heterogeneity. The choice set formation mechanism can itself be a function of all systematic utilities, including the product characteristics and the preference parameters. This covers the choice set formation mechanisms specified in Chiang et al. (1998), De los Santos et al. (2012), Draganska and Klapper (2011), Draganska et al. (2009), Goeree (2008), and Van Nierop et al. (2010). In common with these papers, we cannot allow for choice set formation processes that are driven by the idiosyncratic individual-time varying shocks to demand. We give a number of specific examples of “sufficient sets,” but our proposition is more general, and there could be many more possible sufficient sets that are better suited to specific economic environments.
Estimators of demand parameters will be biased if the econometrician mistakenly includes in consumers’ choice sets unavailable products that would have had a positive purchase probability if available. We show that, in models with logit errors, unobserved choice set heterogeneity can be characterized as an individual-product specific fixed effect. As a consequence, the Fixed-Effect logit (FE logit), developed by Chamberlain (1980) to “difference out” individual-product specific unobservables, naturally accommodates unobserved choice set heterogeneity.

The FE logit relies on the assumption of choice set stability across at least two choice occasions and identifies preference parameters from changes in choices. This means that only preference parameters on time-varying characteristics can be identified. The remaining part of the systematic utilities, the individual-product specific fixed effects, is differenced out. This prevents us from obtaining market shares and other potentially interesting functions of preference parameters, such as elasticities.

We show how to bypass this lack of identification by adding restrictions on the form of unobserved preference heterogeneity. We suggest an estimator that uses individual-specific purchase histories to provide a sufficient set that allows us to identify all the preference parameters in the presence of unobserved and stable choice sets. We call this model the Full-Purchase-History logit (FPH logit). A variant of this model makes the alternative assumption that the unobserved choice sets may be increasing over time, as would be the case, for example, with experience goods. We call this model the Past-Purchase-History logit (PPH logit). Instead of comparing an individual’s decisions over time, we can alternatively compare two individuals at the same point in time.
This approach may be applicable when similar individuals make a decision in a common economic environment; for example, two 14-year-old boys who own an XBOX ONE and are shopping online for video games. We call this model the Inter-Personal logit (IP logit).

These models are complementary. The FE logit is robust to preference heterogeneity in the form of individual-product specific fixed effects and requires choice set stability, but does not allow us to identify all elements of preferences that are needed to calculate elasticities. The FPH and IP logits also require choice set stability, but by restricting the forms of unobserved preference heterogeneity allowed, they enable us to identify all elements of consumer preferences. The PPH logit makes an alternative assumption on the evolution of choice sets and also restricts the forms of unobserved heterogeneity allowed. Different choice environments favor the assumptions underlying different sufficient sets and estimation methods, and we provide examples where each could be relevant. We also propose tests of the choice set stability and preference heterogeneity assumptions underlying the alternative sufficient sets.

We are often interested not only in the preference parameters, but also in market shares and other functions of parameters, for example willingness to pay, elasticities, welfare, or counterfactual analysis. While we can point identify preference parameters, and so willingness to pay, without further information or restrictions on the true but unobserved choice sets, we are not able to pin down purchase probabilities and hence market shares and elasticities. We can however “bound” them if we observe both a sufficient set, that is a subset of the true choice set, and a superset that encompasses the true (unobserved) choice set.
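The bounding logic can be illustrated with a minimal sketch. It assumes a plain multinomial logit with known systematic utilities; the product labels, utility values and the three sets below are hypothetical and chosen only for illustration. Because the logit denominator grows as products are added, the probability computed on a subset of the true choice set lies above the true probability, and the one computed on a superset lies below it.

```python
import math

def logit_prob(j, choice_set, v):
    """Multinomial logit probability of product j within choice_set, given utilities v."""
    denom = sum(math.exp(v[m]) for m in choice_set)
    return math.exp(v[j]) / denom

# Hypothetical systematic utilities for five products (illustrative values only).
v = {"a": 1.0, "b": 0.5, "c": 0.0, "d": -0.5, "e": -1.0}

sufficient_set = {"a", "b"}            # assumed known subset of the true choice set
true_set = {"a", "b", "d"}             # unobserved by the econometrician in practice
superset = {"a", "b", "c", "d", "e"}   # e.g. every product sold in the market

upper = logit_prob("a", sufficient_set, v)  # bound from the sufficient set
truth = logit_prob("a", true_set, v)        # infeasible: needs the true choice set
lower = logit_prob("a", superset, v)        # bound from the superset

print(f"lower={lower:.3f} <= truth={truth:.3f} <= upper={upper:.3f}")
```

The tighter the sufficient set and the superset are around the true choice set, the narrower the interval between `lower` and `upper`.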
Our estimator is complementary to existing empirical methods that deal with choice set heterogeneity by combining additional data and assumptions about the distribution of choice sets in the population. Our method allows us to estimate preferences when additional information on choice sets is not available or not reliable, and to test the functional form assumptions embedded in any proposed model of choice set formation.

This paper is related to multiple literatures in economics and marketing. The closest paper to ours is Lu (2016). Lu (2016) proposes a method for the estimation of demand when choice sets are unobserved that does not require the specification of a choice set formation model. Lu (2016)’s key insight is that an individual’s true choice probability must be bounded above by a choice probability computed on a subset of the true choice set, and below by a choice probability computed on a superset of the true choice set. This allows him to set identify preference parameters by making assumptions, or by having additional data, on supersets and subsets of the unobserved choice sets faced by each individual in each choice situation. In contrast, our method relies on the ability of the econometrician to observe each profile of preferences making at least two choices from the same choice set (or a choice set that changes in predictable ways), without requiring any knowledge of what the individual-specific unobserved choice set actually is. In principle, Lu (2016)’s method does not require any specific assumption on the distribution of the unobserved portion of preferences, while our method relies on having a logit component (i.e., multinomial logit, nested logit, FE logit, or mixed logit with latent classes). However, in the implementation of the estimator, both in the simulations and in the empirical application, Lu (2016) also assumes a multinomial logit model of preferences.
On the practical side, Lu (2016)’s method may face numerical complexities when there are more than a few preference parameters, while ours can be implemented like a standard discrete choice model, both in terms of coding and numerical complexity. In conclusion, we see Lu (2016)’s method and ours as complements: if one has reliable information on tight supersets and subsets of the true choice sets faced by individuals in each choice situation, then Lu (2016)’s estimator may be preferred because it is more robust in terms of unobserved preferences; when such information is not available or not reliable, our estimator can be favored.

There is a relevant literature that seeks to provide theoretical foundations for consumer choice set variations based on limited consumer attention. Eliaz and Spiegler (2011) introduce boundedly rational consumers who use screening criteria to reduce the number of “relevant” alternatives among which they must choose, and explore how firms can use marketing techniques to manipulate these sets. Masatlioglu et al. (2012) allow for limited consumer attention in a more general demand environment and explore whether and when preferences can be separately identified from attention (or consideration) when both preferences and consideration are non-stochastic, while choices and choice sets are observed. If, for example, a consumer’s choice x changes when a product y is removed from the choice set, then it must be both that y was considered (“Revealed Attention”) and that she prefers x over y (“Revealed Preference”). Manzini and Mariotti (2014) observe that the deterministic nature of the model proposed by Masatlioglu et al. (2012) may be at odds with stochastic choice data, and extend the framework to allow for stochastic consideration. They show that this enables the separate identification of preferences from attention with stochastic choice data.
From classic decision theory (Debreu (1959)) it is known that information on both prices and income is necessary to pin down households’ budget constraints and ultimately choice sets, and it is relatively rare that we have data with good information on both prices and incomes (standard budget surveys measure income but not prices, while transaction-level purchase data have information about prices, but rarely include detailed information about households’ income).

There is a large number of empirical papers in economics and marketing based on both aggregate and individual-level datasets (leading examples are Berry et al. (1995) and Nevo (2001), but see also Ackerberg et al. (2007) and Train (2009) for recent surveys). Researchers typically assume that all consumers have access to the same choice set, consisting of all the products in the market (Berry et al. (1995)) or the highest market share products (Nevo (2001)). Identification of preferences relies on revealed preference arguments (see Samuelson (1938), Varian (2006) and, for more recent discussions, Koszegi and Rabin (2007) and Hausman (2008)), which require that consumers consider all relevant alternatives from the choice set.

A related empirical literature seeks to accommodate choice set heterogeneity by modelling the choice set formation process in the estimation of preferences. Some papers in this literature have information directly about consumers’ choice sets, while others model them as a function of relevant observables. Conlon and Mortimer (2013) exploit periodic information about product availability from vending machines to demonstrate the importance of accounting for stock-outs when estimating preferences and the profit consequences of stock-outs. However, more common is the estimation of models of choice set formation. For example, papers by Chiang et al. (1998), Draganska and Klapper (2011), Draganska et al. (2009), Goeree (2008), and Van Nierop et al. (2010), all building on Manski (1977), specify models of probabilistic consideration sets: a consumer’s unconditional purchase probability for product j is the sum, across all possible choice sets including j, of the purchase probability for j given that choice set times the probability of that choice set. In Chiang et al. (1998), each choice set containing j has equal prior probability. Goeree (2008) models the probability of a product being in a choice set as a function of advertising; identification relies on differential exposure to advertising media across consumer demographics coupled with differential advertising strategies across firms.

The explicit estimation of a choice set formation model requires information beyond the usual choice data. Whenever this additional information, i.e. more detailed data and knowledge of the appropriate choice set formation model, is not available, our proposed approach is more appropriate. Indeed, our proposed method relies exclusively on standard choice data and does not require the estimation of choice set formation models. Moreover, and differently from existing methods, our estimators do not suffer from any curse of dimensionality in the number of products sold in the market.

The rest of the paper is structured as follows. In section 2 we describe the problems that arise from unobserved choice set heterogeneity. We show that estimators of demand parameters that mistakenly impute to consumers supersets of their true choice sets will be biased if the products mistakenly added would have positive purchase probability when available. In section 3 we illustrate why existing methods may not easily address the problems introduced by unobserved choice sets and derive alternative estimators that can accommodate unobserved choice set heterogeneity. In section 4 we discuss the economic intuition and specification tests that would allow the econometrician to choose between alternative estimators.
Section 5 discusses how we can use the parameter estimates to obtain bounds on predicted choice probabilities and elasticities. In section 6 we illustrate our methods in practice using household-level scanner data on the market for ketchup in the UK. In section 7 we provide some concluding remarks. The appendices provide additional detail, auxiliary findings and further examples.

2 Bias from Unobserved Choice Set Heterogeneity

Unobserved choice set heterogeneity introduces a bias in demand estimation. Applying revealed preference arguments to supersets of choice sets, which include unavailable products that would have positive purchase probabilities if available, will yield biased estimates of preference parameters, since the model would be forced to rationalise observed choices that are inconsistent with consumers’ true preferences. In this case, we can write a consumer’s probability of choosing a product from her true choice set as the probability that she chooses it from a superset divided by the sum of the probabilities that she would choose any of the products in her specific choice set. This latter probability induces an estimation bias whenever it is not accounted for, and such a bias can be substantial.

In this section we introduce the primitives of the choice environment and formalize the mechanism by which unobserved choice sets, if not properly handled, are likely to introduce an estimation bias in demand.

2.1 Preference Fundamentals

Let there be $I$ types of decision makers, indexed by $i = 1, \dots, I$, and for each type $T$ different choice opportunities, indexed by $t = 1, \dots, T$. For example, this could be one individual observed making several consecutive independent decisions, or several observationally equivalent individuals each simultaneously making an independent purchase decision. For both kinds of data, denote $i$’s sequence of choices by $Y_i = (Y_{i1}, \dots, Y_{iT})$.
For simplicity, we assume that we observe exactly $T$ choice situations for each consumer type $i$, but this is not important for our results.

We consider a situation in which consumer type $i$ is matched, in choice situation $t$, to choice set $CS_{it}^{\star}$. We are interested in the estimation consequences of mistakenly imputing to the pair $(i, t)$ a superset, $S_t$, that includes options that were not considered by $i$ in choice situation $t$, i.e. $CS_{it}^{\star} \subseteq S_t$. For notational convenience, we assume each $i$ to face the same $S_t$, but this is inessential. An example of a commonly used superset from the empirical literature on demand estimation is the union of all alternatives available to any consumer in a certain time period. Another example would be to include all products above a given market share, or a given number of products with the largest market shares.

Denote by $\times$ the Cartesian product and let the superset of choice sequences be $S = \times_{t=1}^{T} S_t$. By construction, any observed choice sequence $Y_i$ must belong to the set of possible choice sequences $CS_i^{\star} = \times_{t=1}^{T} CS_{it}^{\star}$.

Let preferences be defined by a parameter vector $\theta$ and the probability with which $i$ is matched to a given set of possible choice sequences, $CS_i^{\star} = c$, be given by $\Pr[CS_i^{\star} = c \mid \gamma]$, with $\gamma$ a vector of parameters governing this process. Given $\theta$ and a specific match with a set of possible choice sequences, each consumer type $i$ is observed to make a sequence of choices $Y_i$. We assume that the conditional indirect utility of alternative $j$ in choice situation $t$ for consumer type $i$ is

\[ U_{ijt} = V_{ijt}(X_{it}, \theta) + \epsilon_{ijt}, \]

where $X_{it} = [X_{i1t}, \dots, X_{ijt}, \dots, X_{iJt}]$ is the matrix of $(i, t)$-specific observable characteristics, while $\epsilon_{ijt}$ is a portion of $i$’s utility that is unobserved to the econometrician. Note that the function $V_{ijt}$ has an $i$ subscript, indicating that so far we have not made any assumptions on the form of unobserved consumer heterogeneity.
The probability that $i$ is matched to the set of possible choice sequences $CS_i^{\star} = c$ and makes a sequence of choices $Y_i = j$ is then

\[ \Pr[Y_i = j, CS_i^{\star} = c \mid \theta, \gamma] = \Pr[Y_i = j \mid CS_i^{\star} = c, \theta] \, \Pr[CS_i^{\star} = c \mid \gamma]. \tag{2.1} \]

Equation (2.1) is quite general and captures two different features of behavior typical of demand in most product markets. The first term, $\Pr[Y_i = j \mid CS_i^{\star} = c, \theta]$, embodies consumer preferences for products given the choice sets they are matched to. The second term, $\Pr[CS_i^{\star} = c \mid \gamma]$, embodies the likelihood that consumers are matched to their (unobserved) set of possible choice sequences, $CS_i^{\star} = c$. This process could be a function of consumers’ decisions, for example due to consumer search, or of strategic decisions by firms. These factors are represented by the parameter vector $\gamma$. In principle, $\gamma$ could include some or all of the parameters that are in $\theta$.

We make the following important assumption:

Assumption 1. $\Pr[Y_i = j \mid CS_i^{\star} = c, \theta]$ from (2.1) belongs to the logit family for any $c$.

Note that, compatibly with Assumption 1, $V_{ijt}(X_{it}, \theta)$ may accommodate flexible forms of unobserved preference heterogeneity such as nested logit, mixed logit, or multinomial logit with individual-alternative specific fixed effects. In addition, and consistent with models that describe individuals being matched to choice sets according to consumer search and/or strategic decisions by firms, $\Pr[CS_i^{\star} = c \mid \gamma]$ can take any form and be a function of any element of $\Pr[Y_i = j \mid CS_i^{\star} = c, \theta]$.

Assumption 1 is ubiquitous in empirical work, both among papers with general demand estimation purposes, such as Berry et al. (1995, pp. 864-868) and Berry et al. (2004, p. 76), and among papers that focus specifically on demand estimation with unobserved choice sets, such as Draganska and Klapper (2011, p. 660), Draganska et al. (2009, p. 110), and Goeree (2008, p. 1025).
Indeed, any model of individual choice behavior based on the logit family (including mixed logit) relies on Assumption 1. Extending the methods introduced in this paper to demands without any logit component (e.g., probit) is a promising topic for further research.

Under Assumption 1, $\Pr[CS_i^{\star} = c \mid \gamma]$ can be ignored in the estimation of $\theta$. This is because Assumption 1 fixes the functional form of the first term in (2.1). Thus, even if $\gamma$ contains elements of $\theta$, failing to account for the matching of consumers to choice sets only impacts the efficiency of our estimator of $\theta$, not its consistency.

2.2 The Bias Formula

Given Assumption 1, the conditional probability of $i$ selecting choice sequence $Y_i = j$, given their set of possible choice sequences, $CS_i^{\star} = c = \times_{t=1}^{T} c_t$, is:

\[ \Pr[Y_i = j \mid CS_i^{\star} = c, \theta] = \prod_{t=1}^{T} \frac{\exp(V_{ijt}(X_{it}, \theta))}{\sum_{m \in CS_{it}^{\star} = c_t} \exp(V_{imt}(X_{it}, \theta))}. \tag{2.2} \]

By contrast, the typical application specifies the conditional probability of $i$ choosing $Y_i$ from a superset, $S = s = \times_{t=1}^{T} s_t$:

\[ \Pr[Y_i = j \mid S = s, \theta] = \prod_{t=1}^{T} \frac{\exp(V_{ijt}(X_{it}, \theta))}{\sum_{m \in S_t = s_t} \exp(V_{imt}(X_{it}, \theta))}, \tag{2.3} \]

where the difference between (2.2) and (2.3) lies in the summations in their denominators. Even given Assumption 1, if consumer type $i$ makes decisions from the set of possible choice sequences $CS_i^{\star} = c$, a likelihood function derived from (2.3) will generally lead to an inconsistent estimator of $\theta$. In what follows we illustrate why.

If $i$ were to choose from $S = s$, the superset of choice sequences, then the probability of selecting choice sequence $Y_i = j \in CS_i^{\star} = c \subseteq S = s$ would be:

\[
\begin{aligned}
\Pr[Y_i = j \mid S = s, \theta] &= \Pr[Y_i = j \mid CS_i^{\star} = c, \theta] \, \Pr[Y_i \in CS_i^{\star} = c \mid S = s, \theta] \\
&= \prod_{t=1}^{T} \frac{\exp(V_{ijt}(X_{it}, \theta))}{\sum_{m \in CS_{it}^{\star} = c_t} \exp(V_{imt}(X_{it}, \theta))} \cdot \frac{\sum_{m \in CS_{it}^{\star} = c_t} \exp(V_{imt}(X_{it}, \theta))}{\sum_{r \in S_t = s_t} \exp(V_{irt}(X_{it}, \theta))},
\end{aligned}
\tag{2.4}
\]

where $\Pr[Y_i \in CS_i^{\star} = c \mid S = s, \theta]$ is the probability that $i$’s choice sequence will belong to $CS_i^{\star} = c$ whenever she selects from the superset of choice sequences $S = s$. $\Pr[Y_i \in CS_i^{\star} = c \mid S = s, \theta]$ is a measure of the importance, in terms of $i$’s preferences, of the choice sequences in $CS_i^{\star} = c$ relative to those in the superset $S = s$.

Re-arranging (2.4), we can express (2.2) in terms of the superset of choice sequences, $S = s$:

\[
\begin{aligned}
\Pr[Y_i = j \mid CS_i^{\star} = c, \theta] &= \Pr[Y_i = j \mid S = s, \theta] \, \frac{1}{\Pr[Y_i \in CS_i^{\star} = c \mid S = s, \theta]} \\
&= \prod_{t=1}^{T} \frac{\exp\left(V_{ijt}(X_{it}, \theta) - \ln\left(\Pr[Y_i \in CS_i^{\star} = c \mid S = s, \theta]\right)\right)}{\sum_{m \in S_t = s_t} \exp(V_{imt}(X_{it}, \theta))} \\
&= \prod_{t=1}^{T} \frac{\exp\left(V_{ijt}(X_{it}, \theta) - \ln\left(\frac{\sum_{m \in CS_{it}^{\star} = c_t} \exp(V_{imt}(X_{it}, \theta))}{\sum_{r \in S_t = s_t} \exp(V_{irt}(X_{it}, \theta))}\right)\right)}{\sum_{m \in S_t = s_t} \exp(V_{imt}(X_{it}, \theta))}.
\end{aligned}
\tag{2.5}
\]

The final two lines of (2.5) show how, if $\epsilon_{ijt}$ is i.i.d. Type I Extreme Value, the model conditional on $CS_i^{\star} = c$ can be written as a choice probability from $S = s$ augmented by a term whose omission may cause a bias. The term $\ln(\Pr[Y_i \in CS_i^{\star} = c \mid S = s, \theta])$ is a function of the same systematic utilities that enter the numerator and denominator and will generally cause a bias unless $\Pr[Y_i \in CS_i^{\star} = c \mid S = s, \theta] = 1$, i.e. there will be bias unless the probability of selecting one of the choice sequences not in $CS_i^{\star} = c$ equals 0 if $i$ faced the superset of choice sequences $S = s$.

While $\Pr[Y_i \in CS_i^{\star} = c \mid S = s, \theta]$ is never exactly 1 in a logit model (or any random utility model with unbounded support for $\epsilon_{ijt}$), continuity suggests that if we mistakenly attribute to consumer type $i$’s set of possible choice sequences a choice sequence that she does not like, then the expected bias should be small. However, if we mistakenly attribute a choice sequence that she really does like, but did not select because it was not included in $CS_i^{\star} = c$, then the bias is expected to be large.
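The decomposition in (2.4) and the utility shift in (2.5) can be checked numerically. The following is a minimal sketch for a single choice situation (T = 1), assuming a plain multinomial logit; the utility values, the sets and the helper names `logit_prob` and `bias_term` are hypothetical and serve only as an illustration.

```python
import math

def logit_prob(j, choice_set, v):
    """Logit probability of alternative j within choice_set, given utilities v."""
    denom = sum(math.exp(v[m]) for m in choice_set)
    return math.exp(v[j]) / denom

def bias_term(true_set, superset, v):
    """-ln Pr[Y in true_set | superset]: the shift added to utility in (2.5)."""
    p_in_true = sum(logit_prob(m, superset, v) for m in true_set)
    return -math.log(p_in_true)

# Hypothetical systematic utilities for one choice situation (T = 1).
v = {1: 1.2, 2: 0.4, 3: -0.3, 4: 0.9}
true_set = [1, 2, 3]      # CS*: what the consumer actually faces
superset = [1, 2, 3, 4]   # S: what the econometrician imputes
j = 2

# Identity (2.4): Pr[j | S] = Pr[j | CS*] * Pr[Y in CS* | S].
p_in_true = sum(logit_prob(m, superset, v) for m in true_set)
lhs_24 = logit_prob(j, superset, v)
rhs_24 = logit_prob(j, true_set, v) * p_in_true

# Identity (2.5): shifting the numerator utility recovers Pr[j | CS*].
ln_pi = bias_term(true_set, superset, v)
p_25 = math.exp(v[j] + ln_pi) / sum(math.exp(v[m]) for m in superset)

# Continuity argument: the shift is small when the wrongly imputed product is
# disliked and large when it is liked.
bias_disliked = bias_term(true_set, superset, {**v, 4: -5.0})
bias_liked = bias_term(true_set, superset, {**v, 4: 3.0})
print(lhs_24, rhs_24, p_25, bias_disliked, bias_liked)
```

In this sketch the shift `ln_pi` depends on the same utilities as the choice probability itself, which is why omitting it acts like an omitted variable correlated with the regressors.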
2.3 The Size of the Bias

Table 2.1: The size of bias in Universal Logit

                                                   Bias      (SD)      RMSE
Increasing product differentiation, removing first best
  (V1 − V2)/V1 ≈ 10%, σ²_X = 1.75                 -0.379    (0.016)   0.379
  (V1 − V2)/V1 ≈ 30%, σ²_X = 2.5                  -0.471    (0.012)   0.471
  (V1 − V2)/V1 ≈ 50%, σ²_X = 3.5                  -0.578    (0.012)   0.579
  (V1 − V2)/V1 ≈ 70%, σ²_X = 6                    -0.771    (0.009)   0.771
Increasing share of individuals with constrained choice set
  100% have full choice set                         0.005    (0.032)   0.032
  90% have full choice set                         -0.223    (0.021)   0.224
  70% have full choice set                         -0.525    (0.013)   0.526
  50% have full choice set                         -0.719    (0.007)   0.719
Increasing share of products randomly removed from choice set
  5 out of 5 available                              0.005    (0.032)   0.032
  4 out of 5 available                             -0.525    (0.013)   0.526
  3 out of 5 available                             -0.719    (0.007)   0.719
  2 out of 5 available                             -1.139    (0.003)   1.139

3 Correcting the Bias from Unobserved Choice Set Heterogeneity

Chamberlain (1980)’s Fixed-Effect logit (FE logit) allows us to consistently estimate the preference parameters associated with time-varying regressors, as we show in section 3.2. In section 3.3, building on the insights from the FE logit model, we propose a broad class of models, of which the FE logit is a member, that relies on individual-specific observed purchase behavior to construct “sufficient sets” that difference out true but unobserved choice sets. In general, to obtain predicted purchase probabilities and their functions, such as elasticities, we need to be able to identify the whole vector $\theta$. The FE logit does not usually allow for the identification of the entire vector $\theta$. However, in section 3.5 we present specific examples of “sufficient sets” that, at the cost of some minimal additional restrictions on preference heterogeneity, allow us both to difference out consumers’ unobserved choice sets and to identify the entire preference vector $\theta$.
However, before turning to our proposed solutions, we first discuss why conventional methods based on alternative-specific constants, random coefficients, and variants of Manski (1977) are unlikely to fully deal with the problem.

3.1 Why Some Existing Methods Do Not Solve the Problem

Several methods routinely implemented in demand estimation do not generally avoid bias from unobserved choice set heterogeneity. To help see this, denote the bias term that enters the numerator of equation (2.5) as $\ln(\pi_i) = -\ln(\Pr[Y_i \in CS_i^{\star} = c \mid S = s, \theta])$.

Econometricians are used to thinking that most choice-based sampling issues in logit models can be addressed by adding a full set of alternative-specific constants (see Lerman and Manski (1981) and Bierlaire et al. (2008)). This is not the case for our problem, because $\ln(\pi_i)$ is an individual-specific term that only affects the numerator of the logit formula.

We cannot treat the term causing the bias, $\ln(\pi_i)$, as a random coefficient, because the estimation of models with random coefficients, such as the mixed logit model, relies on an independence assumption between the distribution of random coefficients and the observables of the model (see Berry et al. (1995, p. 865), Berry et al. (2004, p. 76), McFadden et al. (2000, p. 447), and Nevo (2001, p. 314)). In this situation, as is made clear by equation (2.5), the distribution of $\ln(\pi_i)$ is by construction a function of the $X_{it}$. Furthermore, we cannot treat the $\ln(\pi_i)$’s as unobserved fixed effects and estimate them with individual-specific dummy variables, because of the incidental parameters problem.

The most popular approach used in the literature is to jointly model both choice set formation and the purchase decision given a choice set. As originally discussed by Manski (1977), the unconditional probability of $i$ selecting choice sequence $j$ can be written as:

\[ \Pr[Y_i = j \mid \theta, \gamma] = \sum_{c \in C_i^{\star}} \Pr[Y_i = j \mid CS_i^{\star} = c, \theta] \, \Pr[CS_i^{\star} = c \mid \gamma], \tag{3.1} \]

where $C_i^{\star}$ is the collection of sets of possible choice sequences to which consumer type $i$ can be matched. By having information on the matching process between consumer types and choice sets, we can integrate out unobserved choice set heterogeneity, as is routinely done with unobserved preference heterogeneity.

However, there are several instances in which the approach proposed by Manski (1977) may not fully solve the problem. One instance is when we do not have precise information on the correct functional form, which is necessary in order to implement these models in practice. Another, perhaps less obvious, is that a new problem analogous to choice set misspecification may arise in a model that tries to integrate out choice sets. This applies to the collection of choice sets over which expectations are taken, $C_i^{\star}$, since such a support can easily be misspecified. Even when $\Pr[Y_i = j \mid \theta, \gamma]$ is correctly specified, in practice its estimation may prove difficult. Indeed, this model suffers from a curse of dimensionality related to the number of elements in $C_i^{\star}$, which grows exponentially in the number of products $J$ sold in the market.

We illustrate instances of support misspecification in the context of three popular papers that rely on Manski (1977)’s approach. First consider Goeree (2008). In our notation, equation (3) in Goeree (2008) can be written as:

\[ \Pr[Y_{it} = j_t \mid \theta, \gamma] = \sum_{c \in C^{j}} \left[ \frac{\exp(V_{i j_t t}(X_{it}, \theta))}{\sum_{r \in c} \exp(V_{irt}(X_{it}, \theta))} \prod_{l \in c} \phi_{ilt}(\gamma) \prod_{k \notin c} (1 - \phi_{ikt}(\gamma)) \right], \tag{3.2} \]

where $C^{j}$ is the collection of all the choice sets that include product $j$. This specification relies on:

• our Assumption 1, with $\Pr[Y_{it} = j_t \mid CS_{it}^{\star} = c, \theta] = \frac{\exp(V_{i j_t t}(X_{it}, \theta))}{\sum_{r \in c} \exp(V_{irt}(X_{it}, \theta))}$, and

• the assumption that consideration of each product is independent from the consideration of the other products: $\Pr_{it}[CS_{it}^{\star} = c \mid \gamma] = \prod_{l \in c} \phi_{ilt}(\gamma) \prod_{k \notin c} (1 - \phi_{ikt}(\gamma))$.
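To make the structure of (3.2) concrete, the following is a minimal sketch that evaluates it by brute-force enumeration for three products in one choice situation. It combines the two assumptions listed above: logit choice given a set, and independent consideration. The utilities, the consideration probabilities `phi` and all function names are hypothetical. Unlike typical applications, the sketch has no always-considered outside option, so the probability of considering nothing is tracked separately.

```python
import math
from itertools import combinations

def logit_prob(j, choice_set, v):
    """Logit probability of product j given the considered set."""
    denom = sum(math.exp(v[m]) for m in choice_set)
    return math.exp(v[j]) / denom

def consideration_prob(c, phi):
    """Pr[CS = c] when each product k is considered independently with prob phi[k]."""
    p = 1.0
    for k, phi_k in phi.items():
        p *= phi_k if k in c else (1.0 - phi_k)
    return p

def sets_containing(j, products):
    """Enumerate C^j: every subset of products that includes j."""
    others = [k for k in products if k != j]
    for size in range(len(others) + 1):
        for extra in combinations(others, size):
            yield frozenset((j,) + extra)

def unconditional_prob(j, v, phi):
    """Sum over c in C^j of Pr[j | c] * Pr[c], as in (3.2)."""
    return sum(logit_prob(j, c, v) * consideration_prob(c, phi)
               for c in sets_containing(j, list(v)))

# Hypothetical utilities and consideration probabilities for three products.
v = {1: 1.0, 2: 0.5, 3: -0.2}
phi = {1: 0.9, 2: 0.6, 3: 0.3}

probs = {j: unconditional_prob(j, v, phi) for j in v}
p_empty = consideration_prob(frozenset(), phi)   # nothing considered
n_sets = len(list(sets_containing(1, list(v))))  # size of C^j, i.e. 2^(J-1)
print(probs, p_empty, n_sets)
```

The enumeration visits 2^(J-1) sets per product, which is the source of the curse of dimensionality mentioned in the text; with more than a few dozen products this brute-force sum is no longer feasible.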
Even given this second assumption, in Goeree’s application the total number of PCs is still very large (2112), so the non-parametric estimation of all the $\phi$’s is not feasible. Consequently, Goeree further assumes that

\[ \phi_{ilt}(\gamma) = \frac{\exp(W_{ilt}(\gamma))}{1 + \exp(W_{ilt}(\gamma))}. \tag{3.3} \]

This implies that every $c \in \mathcal{C}^j$ will have a strictly positive probability in the distribution of choice sets for each (i, t), so that $\Pr_{it}[CS_{it}^\star = c \mid \gamma] > 0$ for every (i, t). However, it may be that for some (i, t) combination $\Pr_{it}[CS_{it}^\star = c \mid \gamma] = 0$ for some c, i.e. c is not in the support of the choice set distribution to which individual i can be matched in period t:

\[ \Pr\nolimits_{it}[CS_{it}^\star = c \mid \gamma] = \begin{cases} \prod_{l \in c} \phi_{ilt}(\gamma) \prod_{k \notin c} \left(1 - \phi_{ikt}(\gamma)\right) & \text{if } c \in \mathcal{C}_{it}^{j\star} \\ 0 & \text{if } c \in \mathcal{C}^j \setminus \mathcal{C}_{it}^{j\star}, \end{cases} \tag{3.4} \]

where $\mathcal{C}_{it}^{j\star}$ is the collection of choice sets to which individual i can possibly be matched in period t. Since $\mathcal{C}_{it}^{j\star}$ is typically unobserved and heterogeneous across (i, t) combinations, model (3.2) and (3.3) will suffer from support misspecification whenever there exists even one observation where the true collection of unobserved choice sets to which an individual can actually be matched is restricted, i.e. there exists an (i, j, t) combination such that $\mathcal{C}_{it}^{j\star} \subset \mathcal{C}^j$. With support misspecification, any estimator of model (3.2) and (3.3) will be inconsistent.

To be clear, computing expectations over the power set of the universal set, as with $\mathcal{C}^j$ in (3.2), would not be a problem if we could afford to estimate a truly flexible specification for $\phi$ that was able to accommodate $\Pr_{it}[CS_{it}^\star = c \mid \gamma] = 0$ whenever necessary. The problem arises because we are not usually able to estimate a truly flexible model for $\phi$, and we need to make additional assumptions along the lines of (3.3). Taken together, $\mathcal{C}^j$ in (3.2) and (3.3) may introduce bias due to the potential inclusion of infeasible choice sets.
At the very least, the approach we develop will be a useful complement to allow testing of the functional form assumptions made with such an approach, but more generally we provide a more robust approach to estimating demand parameters in a context where choice sets are unobserved and heterogeneous.

Two other papers that take a similar approach to Goeree (2008) are Van Nierop et al. (2010) and Draganska and Klapper (2011). Van Nierop et al. (2010) assume that the model for $\Pr_{it}[CS_{it}^\star = c \mid \gamma]$ is a J-dimensional multivariate normal distribution, which again cannot be 0 for any c. In the equation corresponding to (3.2), they compute expectations over $\mathcal{C}^j$, associating positive mass to every $c \in \mathcal{C}^j$ rather than only to each $c \in \mathcal{C}_{it}^{j\star} \subset \mathcal{C}^j$ (see their equation (8)). Draganska and Klapper (2011) assume $\Pr_{it}[CS_{it}^\star = c \mid \gamma]$ to be a multinomial logit model with choice set $\mathcal{C}^j$, and compute expectations over $\mathcal{C}^j$ (see their equation (5)), where again the multinomial logit model cannot be exactly 0 for any $c \in \mathcal{C}^j$.

3.2 Fixed Effect Logit

As shown in section 2.2, a misspecified model with unobserved choice set heterogeneity is equivalent to a model in which we impute to each i a superset S = s and we have a (further) additive unobserved individual-specific fixed effect in the numerator of the logit formula, which is correlated with all other components of the model. This bias term, see equation (2.5), can be written as an individual-specific effect, $\ln(\pi_i) = -\ln(\Pr[Y_i \in CS_i^\star = c \mid S = s, \theta])$, which suffers from the incidental parameter problem. Chamberlain (1980)’s FE logit was developed to difference out individual-alternative specific fixed effects, and so will also naturally address the problem of unobserved choice sets.
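The differencing can be seen in the smallest possible case. As a sketch only, take two products, T = 2, and linear utilities with a hypothetical individual-product fixed effect $\alpha_{ij}$ (notation introduced here for illustration), so that $V_{ijt} = \alpha_{ij} + X_{ijt}\beta$. Conditioning the observed sequence (1, 2) on the set of its permutations gives

\[ \Pr[Y_i = (1,2) \mid \{(1,2),(2,1)\}] = \frac{e^{\alpha_{i1} + X_{i11}\beta} \, e^{\alpha_{i2} + X_{i22}\beta}}{e^{\alpha_{i1} + X_{i11}\beta} \, e^{\alpha_{i2} + X_{i22}\beta} + e^{\alpha_{i2} + X_{i21}\beta} \, e^{\alpha_{i1} + X_{i12}\beta}} = \frac{e^{(X_{i11} + X_{i22})\beta}}{e^{(X_{i11} + X_{i22})\beta} + e^{(X_{i21} + X_{i12})\beta}}. \]

Every sequence in the conditioning set contains each product the same number of times, so $e^{\alpha_{i1} + \alpha_{i2}}$ multiplies the numerator and each term of the denominator and cancels. Any multiplicative term that is common to all sequences in the conditioning set drops out by the same algebra.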
Chamberlain (1980) takes a conditional likelihood approach, where “the key idea is to base the likelihood function on the conditional distribution of the data, conditioning on a set of sufficient statistics for the incidental parameters.” The conditional log likelihood function does not depend on the incidental parameters, so maximum likelihood yields consistent estimators. Chamberlain (1980)’s FE logit is a special case of a conditional logit model, as we show in the next section.

In general, to obtain predicted purchase probabilities and their functions, such as elasticities, we need to be able to identify the whole vector of preference parameters $\theta$. The FE logit does not usually allow for the identification of the entire vector $\theta$, but only of those elements of $\theta$ associated with time-varying observables, such as prices. However, at the cost of some minimal additional restrictions on preference heterogeneity, we can construct alternative sufficient sets that allow us both to difference out consumers’ unobserved choice sets and to identify the entire preference vector $\theta$. In this section we present an example of such a sufficient set, whose associated conditional logit model we call Full Purchase History logit.

3.3 Sufficient Sets

Conditional logit models are obtained from conditioning the original logit model on some statistic or function of observables. In the case of the FE logit, such a statistic is the random set $\mathcal{P}(Y_i)$, whose exact realization depends on i’s observed choice sequence $Y_i$. In what follows, we propose a class of random sets with the property that the associated conditional logit models will be independent of the true but unobserved choice sets. Because these random sets act as sufficient statistics for the unobserved choice sets, we call them sufficient sets. Consider the following condition, which we will use to evaluate possible solution methods:

Condition 1.
Given any choice sequence $Y_i \in CS_i^\star$, there exists a correspondence f such that $f(Y_i) \subseteq CS_i^\star$.

Condition 1 states that we can use i’s observed choice sequence $Y_i = j$ to construct a set of choice sequences $f(Y_i = j) = r$ that is always included in their unobserved set of possible choice sequences $CS_i^\star = c$. In what follows, we show that if Assumption 1 and Condition 1 hold, then $f(Y_i)$ is a sufficient statistic for $CS_i^\star$. Afterward, we will present alternative economic assumptions on the stability or evolution of $CS_{it}^\star$ across the T choice situations that lead to f’s that satisfy Condition 1 and imply the following proposition.

Proposition 1. If Assumption 1 and Condition 1 hold, for every consumer type i and choice sequence j such that $f(Y_i = j) = r$:

\[ \Pr[Y_i = j \mid f(Y_i) = r, \theta] = \frac{\prod_{t=1}^{T} \exp(V_{ijt}(X_{it}, \theta))}{\sum_{k \in f(k) = r} \prod_{t=1}^{T} \exp(V_{ikt}(X_{it}, \theta))}. \]

Then, $\theta$ can be consistently estimated by Maximum Likelihood from $\Pr[Y_i = j \mid f(Y_i) = r, \theta]$.

Proof.

\begin{align}
\Pr[Y_i = j \mid f(Y_i) = r, \theta]
&= \frac{\Pr[f(Y_i) = r, Y_i = j, CS_i^\star = c \mid \theta, \gamma]}{\Pr[f(Y_i) = r, CS_i^\star = c \mid \theta, \gamma]} \notag \\
&= \frac{\Pr[f(Y_i) = r \mid Y_i = j, \theta] \, \Pr[Y_i = j \mid CS_i^\star = c, \theta] \, \Pr[CS_i^\star = c \mid \gamma]}{\Pr[f(Y_i) = r \mid CS_i^\star = c, \theta] \, \Pr[CS_i^\star = c \mid \gamma]} \notag \\
&= \frac{\Pr[f(Y_i) = r \mid Y_i = j, \theta] \, \Pr[Y_i = j \mid CS_i^\star = c, \theta]}{\sum_{k \in U} \Pr[f(Y_i) = r \mid \theta, Y_i = k] \, \Pr[Y_i = k \mid CS_i^\star = c, \theta]} \notag \\
&= \frac{\prod_{t=1}^{T} \dfrac{\exp(V_{ijt}(X_{it}, \theta))}{\sum_{v \in CS_{it}^\star = c_t} \exp(V_{ivt}(X_{it}, \theta))}}{\sum_{k \in f(k) = r} \prod_{t=1}^{T} \dfrac{\exp(V_{ikt}(X_{it}, \theta))}{\sum_{v \in CS_{it}^\star = c_t} \exp(V_{ivt}(X_{it}, \theta))}} \notag \\
&= \frac{\prod_{t=1}^{T} \exp(V_{ijt}(X_{it}, \theta))}{\sum_{k \in f(k) = r} \prod_{t=1}^{T} \exp(V_{ikt}(X_{it}, \theta))} \tag{3.5}
\end{align}

The first two equalities follow from applying Bayes’s rule. In the third equality, U is the universal set of all choice sequences. The fourth equality follows from $\Pr[f(Y_i) = r \mid Y_i = k, \theta]$ being 1 for any $k \in f(k) = r$ and 0 otherwise. This is an extreme case of McFadden (1978)’s Uniform Conditioning Property.
Here this property holds since, conditional on a realization of $Y_i$, $Y_i = k$, $f(Y_i)$ is not a random set: $f(k)$ is either r with probability one, or different from r with probability one. In the last equality, $\sum_{v \in CS_{it}^\star = c_t} \exp(V_{ivt}(X_{it}, \theta))$ cancels out. In conclusion, consistency of the Maximum Likelihood estimator of $\theta$ follows from McFadden (1978).

Proposition 1 implies that, in the presence of unobserved choice set heterogeneity, parameters $\theta$ can be consistently estimated by Maximum Likelihood on the basis of the conditional logit model $\Pr[Y_i = j \mid f(Y_i) = r, \theta]$. Whenever $CS_i^\star = c$ is observable, the econometrician can easily detect appropriate subsets of $CS_{it}^\star = c_t$ for any (i, t) combination, and rely on McFadden (1978) to consistently estimate $\theta$. However, whenever $CS_i^\star = c$ is unobservable, the econometrician needs to be careful in constructing the set of choice sequences $f(Y_i = j)$. Indeed, if $f(Y_i = j)$ is not a subset of $CS_i^\star = c$, then Condition 1 does not hold and we can use our earlier argument to show inconsistency of any estimator based on $\Pr[Y_i = j \mid f(Y_i) = r, \theta]$: the econometrician is mistakenly enlarging i’s set of potential choice sequences.

3.4 Unobserved Consumer Heterogeneity

Proposition 1 is compatible with unobserved preference heterogeneity in the form of mixed logit models with discrete mixtures or latent classes. For example, suppose that each i has individual-specific preferences $\beta_i$, which are distributed according to $p(\beta_i = b \mid CS_i^\star = c, \theta) = p(\beta_i = b \mid \theta)$ in the population. It is typical in the literature to assume that the distribution of random coefficients is independent of choice sets, $CS_i^\star = c$. When we condition a mixed logit choice probability on a subset of the true choice set $f(Y_i) = r \subseteq CS_i^\star = c$, as in our Proposition 1, we obtain:

\begin{align}
\Pr[Y_i = j \mid f(Y_i) = r, CS_i^\star = c, \theta]
&= \Pr[Y_i = j \mid f(Y_i) = r, \theta] \tag{3.6} \\
&= \int_b \Pr[Y_i = j, \beta_i = b \mid f(Y_i) = r, \theta] \, db \notag \\
&= \int_b \underbrace{\Pr[Y_i = j \mid f(Y_i) = r, \beta_i = b]}_{\text{Our result for each logit of the mixture}} \; \overbrace{p(\beta_i = b \mid f(Y_i) = r, \theta)}^{\text{Distr. of pref. cond. on obs. choices}} \, db. \notag
\end{align}

It is common in the literature to make assumptions about the unconditional distribution of random coefficients in the population, $p(\beta_i = b \mid \theta)$, but these usually do not restrict in convenient ways the resulting distribution conditional on the endogenous observed choices (e.g., starting from an unconditional normal, there is no reason to believe that the conditional will still be normal). Usually, this translates into as many distributions (of unknown form) to be estimated as the number of realizations of the sufficient set $f(Y_i) = r$. Intuitively, here the identification problem would be to recover the distribution of $\beta_i$ among those individuals whose observed sequence of choices gives rise to $f(Y_i) = r$. There are two cases to consider.

In the first case, assume that there is a large enough number of consumers for each realization of the sufficient set $f(Y_i) = r$. The simplest model of unobserved preference heterogeneity compatible with (3.6) would be a model in which each $f(Y_i) = r$ represents a potentially different class of preferences to be estimated. All individuals with $f(Y_i) = r$ have preferences $\beta_i = b_r$ with probability 1, with as many $b_r$’s to be estimated as observed values of $f(Y_i) = r$, $r = 1, \dots, R$. We could generalise this to a mixed logit model with a discrete distribution of random coefficients, where $\beta_i$ has a discrete support with $q = 1, \dots, Q$ values and each consumer belonging to group $f(Y_i) = r$ has preferences drawn from a categorical distribution $p(\beta_i = b_q \mid f(Y_i) = r, \theta) = p_q^r$, $q = 1, \dots, Q$. Preferences would take values $[b_1, \dots, b_q, \dots, b_Q]$ with probability masses $[p_1, \dots, p_q, \dots, p_Q]$, where $p_q = (p_q^1, \dots, p_q^r, \dots, p_q^R)'$, and $p^r = (p_1^r, \dots, p_q^r, \dots, p_Q^r)$ is the distribution of preferences among those individuals whose observed sequence of choices gives rise to $f(Y_i) = r$. Depending on how many individuals are in each $f(Y_i) = r$, we could either fix the points of the support $[b_1, \dots, b_q, \dots, b_Q]$ and only estimate their distribution $[p_1, \dots, p_q, \dots, p_Q]$, or alternatively estimate both $[b_1, \dots, b_q, \dots, b_Q]$ and $[p_1, \dots, p_q, \dots, p_Q]$. A third possibility would be to make a more parametric assumption about $p(\beta_i = b \mid f(Y_i) = r, \theta)$, for example that for every $f(Y_i) = r$, $p(\beta_i = b \mid \theta_r)$ is normal with parameters $\theta_r$, and to estimate $\theta_r$, $r = 1, \dots, R$, using simulation methods.

In the second case, assume that there are too few individuals for each $f(Y_i) = r$, $r = 1, \dots, R$. In the extreme, there may be as many different r’s as individuals. Then none of the methods described would work, since one would always have more parameters to estimate than observations. In general, this relates to the non-parametric identification of $p(\beta_i = b \mid f(Y_i) = r, \theta)$. This problem is current work in progress by D’Haultfœuille and Iaria.

In both cases, once we obtain estimates $\hat{p}(\beta_i = b \mid f(Y_i) = r, \theta)$, $r = 1, \dots, R$, we can compute the estimated unconditional distribution of random coefficients in the population, $\hat{p}(\beta_i = b \mid \theta)$. Indeed, by the law of total probability:

\[ \hat{p}(\beta_i = b \mid \theta) = \sum_{r=1}^{R} \hat{p}(\beta_i = b \mid f(Y_i) = r, \theta) \, \widehat{\Pr}[f(Y_i) = r], \tag{3.7} \]

where $\widehat{\Pr}[f(Y_i) = r]$ is the observed share of choice sequences that give rise to sufficient set $f(Y_i) = r$.

3.5 Applications of Sufficient Sets

In this section, we discuss examples of assumptions on the stability or evolution of $CS_{it}^\star$ across the T choice situations that lead to examples of sufficient sets that satisfy Condition 1 and that can be used to estimate demand parameters.
These are examples and may be more or less appropriate in different economic environments, and these are not the only possibilities that would satisfy Condition 1.

3.5.1 Full Purchase History Logit

Consider a situation in which i indexes individuals who make several purchase decisions $t = 1, \dots, T$ at different points in time. As in the case of the FE logit, for every individual i, unobserved choice sets are stable across the T choice situations (but potentially different across i’s): $CS_{it}^\star = CS_i^\star$, for every i and t. To simplify exposition, we assume that choice sets must be stable across all choice situations of each individual, but this is not essential. This method works whenever we observe at least two purchase decisions by the same individual from the same choice set. More generally, i’s full choice sequence of length T can be divided into sub-sequences of length at least two. Then, the assumption of stable choice sets implies fixed choice sets within each sub-sequence, but potentially different choice sets between the different sub-sequences of individual i.

With this kind of dataset, the econometrician typically observes individual i’s sequence of purchase decisions $Y_i = (Y_{i1} = j_1, \dots, Y_{iT} = j_T)$. As we saw in section 3.2, the FE logit is obtained by sufficient set $f_{FE}(Y_i) = \mathcal{P}(Y_i)$: the set of all possible permutations of choice sequence $Y_i = (Y_{i1}, \dots, Y_{iT})$. Note that, even if not explicitly, Chamberlain (1980) relies on the assumption of choice set stability. Interestingly, choice set stability also implies that $\mathcal{P}(Y_i) \subseteq CS_i^\star$, with the consequence that, by Proposition 1, the FE logit will also accommodate unobserved choice set heterogeneity. The FE logit was designed to identify only those preference parameters associated with time-varying regressors, so that any individual-alternative specific fixed effect would be differenced out.
However, there can be situations in which the econometrician is not worried about fixed effects, does not observe choice sets, and still needs an estimate of the whole vector $\theta$, for example to evaluate the elasticity of demand with respect to price. In such situations, the following sufficient set may be more appropriate than $f_{FE}(Y_i) = \mathcal{P}(Y_i)$.

The assumption of stable choice sets allows for another sufficient set that enables the estimation of all the elements of $\theta$. Specifically, $f_{FPH}(Y_i) = (H_i)^T$, where $H_i = \bigcup_{t=1}^{T} \{Y_{it}\} \subseteq CS_{it}^\star$. The sufficient set $f_{FPH}(Y_i)$ imputes to each choice situation t the collection of all the products that individual i ever purchased in any of the T choice situations. We call the model implied by $f_{FPH}(Y_i)$ the Full Purchase History logit, or FPH logit.

The FPH logit is an extreme example of a model that allows one to fully identify preference parameters $\theta$ and to difference out unobserved choice sets. It is extreme in the sense that full identification of $\theta$ is achieved at the cost of ruling out unobserved preference heterogeneity, as in a multinomial logit model. Such a strong assumption is indeed not necessary for the full identification of $\theta$. In the next two sections, we discuss milder restrictions on the extent of preference heterogeneity that can be accommodated compatibly with Proposition 1 and the full identification of $\theta$: the nested logit model and the mixed logit model with latent classes.

3.5.2 Experience Goods: Past Purchase History Logit (PPH Logit)

In this variant of our method, we allow for the choice sets of each individual i to expand over subsequent choice situations $t = 1, \dots, T$: $CS_{it}^\star \subseteq CS_{it+1}^\star$. Then our proposed sufficient set is $f_{PPH}(Y_i) = \times_{t=1}^{T} H_{it}$, where $H_{it} = \bigcup_{b=1}^{t} \{Y_{ib}\} \subseteq CS_{it}^\star$. The sufficient set $f_{PPH}(Y_i)$ imputes to each choice situation t the collection of products that individual i ever purchased in any past choice situation up to t.
We call the model implied by $f_{PPH}(Y_i)$ the Past Purchase History logit, or PPH logit.

Symmetrically, we can allow for the choice sets of each individual i to shrink over subsequent choice situations $t = 1, \dots, T$. Up to some data re-ordering, the case of shrinking choice sets can be treated identically to the case of expanding choice sets: if choice sets are shrinking over subsequent choice situations from t = 1 to t = T, then by re-sorting the choice situations in reverse chronological order, from the last choice situation to the first, choice sets will be expanding over subsequent choice situations from t = T to t = 1. Then $f_{PPH}(Y_i) = \times_{t=1}^{T} H_{it}$, where $H_{it} = \bigcup_{b=t}^{T} \{Y_{ib}\} \subseteq CS_{it}^\star$. Importantly, this sufficient set can deal with either ever expanding or ever shrinking choice sets, but not with both within the same sub-sequence of choices.

3.5.3 Inter-Personal Logit (IP Logit)

Focusing on the cross-sectional variation of the data, different choice situations, indexed by t, can be interpreted as several individuals of a certain consumer type, indexed by i, each making a separate purchase decision at a specific point in time from an identical choice set. In this context, the assumption of identical choice sets among the different T individuals of the same consumer type is the assumption of stable choice sets: $CS_{it}^\star = CS_i^\star$, for every i and t.

With cross-sectional data, the econometrician typically observes one purchase decision for each of the T individuals of consumer type i, $Y_i = (Y_{i1} = j_1, \dots, Y_{iT} = j_T)$, and the observable characteristics of the chosen products, $(X_{ij_1 1}, \dots, X_{ij_T T})$. The method we propose in this section relies on the assumption that the observable characteristics of any product j, $X_{ijt}$, be the same for each of the T individuals of consumer type i, $X_{ijt} = X_{ijt'} = X_{ij}$ for any i, j, and $t \neq t'$.
For example, if individuals t and t' of consumer type i are observed purchasing, respectively, product j at price $p_{jt}$ and product k at price $p_{kt'}$, then our method will rely on the assumption that each individual could have purchased the product bought by the other at exactly the same price, i.e. $p_{jt} = p_{jt'} = p_j$ and $p_{kt} = p_{kt'} = p_k$. Similarly, this method assumes that all the individuals of consumer type i are observationally identical in terms of their demographics.

The assumption of an identical choice set across the T individuals of the same consumer type i gives rise to a sufficient set similar in spirit to $f_{FPH}(Y_i)$: $f_{IP}(Y_i) = (H_i)^T$, where $H_i = \bigcup_{t=1}^{T} \{Y_{it}\} \subseteq CS_{it}^\star$. The sufficient set $f_{IP}(Y_i)$ imputes to each individual t the collection of all the products purchased by any of the T individuals of consumer type i. Since this sufficient set entails comparisons of purchase behaviors between individuals, rather than within individual, we call the model implied by $f_{IP}(Y_i)$ the Inter-Personal logit, or IP logit.

3.6 A Practical Example

A simple example illustrates the different sufficient sets and their implied choice probabilities, as well as how each can accommodate different economic environments. Suppose there are three products, j = 1, 2, and 3, and two choice situations, t = 1 and 2. Consider a consumer type i with true but unobserved choice set containing all products in both choice situations, $CS_{i1}^\star = CS_{i2}^\star = \{1, 2, 3\}$. In this case, $CS_i^\star = \{1, 2, 3\} \times \{1, 2, 3\} = \{(1,1), (2,2), (3,3), (1,2), (2,1), (1,3), (3,1), (2,3), (3,2)\}$. Assume consumer type i’s indirect utility of choosing product j in choice situation t is $U_{ijt} = \delta_j + X_{ijt}\beta + \epsilon_{ijt}$, where $\theta = [\delta_1, \delta_2, \delta_3, \beta']'$. Finally, i’s observed choice sequence is j = 1 in the first choice situation and j = 3 in the second, so that $Y_i = (1, 3)$.
• Fixed-Effect logit: The sufficient set that gives rise to the FE logit is made of all permutations of the observed choice sequence, $\mathcal{P}(Y_i)$. Since $Y_i = (1, 3)$, $f_{FE}(1, 3) = \mathcal{P}(1, 3) = \{(1, 3), (3, 1)\}$. Given this, the contribution to the likelihood function for consumer type i is:

\[ \Pr[Y_i = (1,3) \mid \beta, \{(1,3),(3,1)\}] = \frac{\exp(X_{i11}\beta) \exp(X_{i32}\beta)}{\exp(X_{i11}\beta) \exp(X_{i32}\beta) + \exp(X_{i31}\beta) \exp(X_{i12}\beta)}. \]

• Full Purchase History and Inter-Personal logit: In both the FPH and the IP logits, the sufficient set is made of the choice sequences compatible with products j = 1 and j = 3 being available in both choice situations, $H_i \times H_i$, where $H_i = \bigcup_{t=1}^{2} \{Y_{it}\}$. Thus, $f_{FPH}(1, 3) = H_i \times H_i = \{(1,1), (3,3), (1,3), (3,1)\}$. In this case, i’s contribution to the likelihood function of the FPH logit is:

\begin{align*}
\Pr[Y_i = (1,3) \mid \theta, \{(1,1),(1,3),(3,1),(3,3)\}]
&= \frac{\exp(\delta_1 + X_{i11}\beta) \exp(\delta_3 + X_{i32}\beta)}{\sum_{(j_1, j_2) \in \{(1,1),(1,3),(3,1),(3,3)\}} \exp(\delta_{j_1} + X_{ij_1 1}\beta) \exp(\delta_{j_2} + X_{ij_2 2}\beta)} \\
&= \frac{\exp(\delta_1 + X_{i11}\beta)}{\exp(\delta_1 + X_{i11}\beta) + \exp(\delta_3 + X_{i31}\beta)} \times \frac{\exp(\delta_3 + X_{i32}\beta)}{\exp(\delta_1 + X_{i12}\beta) + \exp(\delta_3 + X_{i32}\beta)}.
\end{align*}

Note that, in the case of the IP logit, it is assumed that each individual t faces the same observable characteristics, $X_{i11} = X_{i12} = X_{i1}$ and $X_{i31} = X_{i32} = X_{i3}$. This implies that i’s contribution to the likelihood function of the IP logit further simplifies to:

\[ \Pr[Y_i = (1,3) \mid \theta, \{(1,1),(1,3),(3,1),(3,3)\}] = \frac{\exp(\delta_1 + X_{i1}\beta)}{\exp(\delta_1 + X_{i1}\beta) + \exp(\delta_3 + X_{i3}\beta)} \times \frac{\exp(\delta_3 + X_{i3}\beta)}{\exp(\delta_1 + X_{i1}\beta) + \exp(\delta_3 + X_{i3}\beta)}. \]

Even if the FPH and the IP logit can be analytically expressed in a similar way, the economic assumptions they rely on are very different. In the FPH logit the same individual makes several purchase decisions at different points in time, while in the IP logit different individuals of the same type each make a separate purchase decision at a specific point in time.
• Past Purchase History logit: In the PPH logit, the sufficient set, at any point in time, includes all choice sequences consistent with those products purchased up to that choice situation, $H_{i1} \times H_{i2}$, where $H_{it} = \bigcup_{b=1}^{t} \{Y_{ib}\}$. Thus, $f_{PPH}(1, 3) = H_{i1} \times H_{i2} = \{(1,1), (1,3)\}$. In this case, consumer type i’s contribution to the likelihood function is:

\begin{align*}
\Pr[Y_i = (1,3) \mid \theta, \{(1,1),(1,3)\}]
&= \frac{\exp(\delta_1 + X_{i11}\beta) \exp(\delta_3 + X_{i32}\beta)}{\exp(\delta_1 + X_{i11}\beta) \exp(\delta_1 + X_{i12}\beta) + \exp(\delta_1 + X_{i11}\beta) \exp(\delta_3 + X_{i32}\beta)} \\
&= 1 \times \frac{\exp(\delta_3 + X_{i32}\beta)}{\exp(\delta_1 + X_{i12}\beta) + \exp(\delta_3 + X_{i32}\beta)}.
\end{align*}

Note that the FE logit with linear systematic utilities does not allow for the identification of all the preference parameters in $\theta$: those parameters associated with regressors that only vary in (i, j), but not in t, will not be identified. In other words, any (i, j)-specific fixed effect will be differenced out. In the current example, the FE logit drops the alternative-specific constants $\delta_1$, $\delta_2$, and $\delta_3$; only $\beta$ is identified. In both the FPH and the PPH logit, consumer type i’s choice sequence does not convey any information about $\delta_2$. In this sense, and identically to the standard case of observable choice sets, only those alternative-specific constants corresponding to products with positive market shares can be identified.

With the exception of the FE logit, each of the logit models we propose can be represented as the product of simpler logit models. In the current example, this can be seen by moving from the first to the second equality in both the FPH logit and the PPH logit. The same cannot be done in the FE logit. In general, one can switch the order of multiplication and summation in logit model (3.5) only when the sufficient set f can be defined as the Cartesian product of T sets: $f = \times_{t=1}^{T} f_t$. Among the sufficient sets we propose, $f_{FPH}$, $f_{PPH}$, and $f_{IP}$ can be defined as the Cartesian products of T sets (either $H_i$ or $H_{it}$), while $\mathcal{P}(Y_i)$ cannot.
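The factorization in the example can be checked numerically. The Python sketch below builds the per-period sets $H_i$ and $H_{it}$ for the observed sequence (1, 3), evaluates the conditional logit probability both by brute force over the whole sufficient set and as a product of per-period logits, and confirms that the permutation set of the FE logit is not a Cartesian product. The utility values in `v` and all function names are hypothetical and chosen only for illustration.

```python
from itertools import permutations, product
from math import exp, isclose


def fph_period_sets(y):
    """FPH: every period gets H_i, the set of all products ever purchased."""
    h = set(y)
    return [h] * len(y)


def pph_period_sets(y):
    """PPH: period t gets H_it, the products purchased up to and including t."""
    return [set(y[: t + 1]) for t in range(len(y))]


def is_cartesian(sequences):
    """True if the set of sequences equals the product of its per-period projections."""
    length = len(next(iter(sequences)))
    projections = [{seq[t] for seq in sequences} for t in range(length)]
    return set(product(*projections)) == set(sequences)


def seq_weight(seq, v):
    """prod_t exp(V_{i,seq_t,t}), with v[t][j] the systematic utility of j in period t."""
    w = 1.0
    for t, j in enumerate(seq):
        w *= exp(v[t][j])
    return w


def prob_bruteforce(y, sequences, v):
    """Conditional logit over choice sequences, summing over the whole sufficient set."""
    return seq_weight(y, v) / sum(seq_weight(k, v) for k in sequences)


def prob_factored(y, period_sets, v):
    """Product over periods of single-period logits restricted to each period's set."""
    p = 1.0
    for t, j in enumerate(y):
        p *= exp(v[t][j]) / sum(exp(v[t][k]) for k in period_sets[t])
    return p


y = (1, 3)
# Hypothetical values of delta_j + X_ijt * beta for t = 1, 2 and j = 1, 2, 3.
v = [{1: 0.2, 2: 0.0, 3: -0.1}, {1: 0.4, 2: 0.1, 3: 0.3}]

f_fph = set(product(*fph_period_sets(y)))
f_pph = set(product(*pph_period_sets(y)))
f_fe = set(permutations(y))

same_fph = isclose(prob_bruteforce(y, f_fph, v), prob_factored(y, fph_period_sets(y), v))
same_pph = isclose(prob_bruteforce(y, f_pph, v), prob_factored(y, pph_period_sets(y), v))
fe_is_product = is_cartesian(f_fe)
```

The brute-force sum has as many terms as there are sequences in the sufficient set, whereas the factored form only needs one small logit per choice situation.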
The possibility of factoring a large logit model defined over choice sequences into smaller logit models each defined over a single choice situation represents a great computational advantage.

3.7 How well do the estimators perform

Table 3.1: Increasing the share of individuals with restricted choice set

        Universal Logit         True Logit              FPH Logit               FE Logit
        Bias (SD)       RMSE    Bias (SD)       RMSE    Bias (SD)       RMSE    Bias (SD)       RMSE
100%     0.005 (0.032)  0.032   0.005 (0.032)   0.032   0.011 (0.030)   0.032   0.024 (0.085)   0.088
90%     -0.223 (0.021)  0.224   0.003 (0.028)   0.029   0.008 (0.027)   0.028   0.015 (0.080)   0.081
70%     -0.525 (0.013)  0.526   0.005 (0.028)   0.028   0.012 (0.029)   0.031   0.011 (0.087)   0.088
50%     -0.719 (0.007)  0.719   0.007 (0.027)   0.028   0.014 (0.027)   0.030   0.018 (0.078)   0.080

For each of the four cases of choice set heterogeneity (i.e., rows), twenty datasets are generated. The degree of choice set heterogeneity increases from the first to the fourth row as the share of individuals with restricted choice sets passes from 0 to 50. $\#U = 5$. Restricted choice sets are defined as $CS_i = U \setminus \{j_i\}$, where $j_i$ is randomly (uniformly) selected for each individual. Each individual faces 10 choice situations. It is always assumed that individuals have stable choice sets (i.e., choice sets do not change across choice situations). In each dataset, there are 1000 individuals. The mixed logit model is estimated by simulated maximum likelihood with 1000 shifted and shuffled Halton draws per individual. To speed up computations, the FE Logit is estimated by sampling at random (uniformly), for each individual, 5000 permutations of the observed sequence of choices (see D’Haultfœuille and Iaria, 2016).
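The qualitative pattern in the table can be reproduced with a very small simulation. The Python sketch below is not the design used for the tables: it assumes a single coefficient `beta` with no alternative-specific constants, standard normal covariates, and every individual restricted to a stable random choice set of two out of five products. It then compares a logit that imputes the universal set with an FPH logit built from each individual's purchase history. All names and parameter values are assumptions made for illustration.

```python
import random
from math import exp


def simulate(n_ind, n_periods, n_products, set_size, beta, rng):
    """Each individual draws a stable choice set and makes n_periods logit choices from it."""
    products = list(range(n_products))
    data = []
    for _ in range(n_ind):
        cs = rng.sample(products, set_size)  # true choice set, unobserved by the econometrician
        xs, ys = [], []
        for _ in range(n_periods):
            x = [rng.gauss(0.0, 1.0) for _ in products]
            weights = [exp(beta * x[j]) for j in cs]
            ys.append(rng.choices(cs, weights=weights)[0])
            xs.append(x)
        data.append((xs, ys))
    return data


def logit_mle(observations, tol=1e-10, max_iter=50):
    """Newton-Raphson for a one-parameter logit; each observation is (x over imputed set, chosen position)."""
    beta = 0.0
    for _ in range(max_iter):
        grad = hess = 0.0
        for x, pos in observations:
            w = [exp(beta * xk) for xk in x]
            s = sum(w)
            m1 = sum(wk * xk for wk, xk in zip(w, x)) / s
            m2 = sum(wk * xk * xk for wk, xk in zip(w, x)) / s
            grad += x[pos] - m1
            hess -= m2 - m1 * m1
        step = grad / hess
        beta -= step
        if abs(step) < tol:
            break
    return beta


def universal_obs(data):
    """Impute the universal set of products to every choice situation."""
    return [(x, y) for xs, ys in data for x, y in zip(xs, ys)]


def fph_obs(data):
    """Impute H_i, the set of products the individual ever purchased, to every choice situation."""
    obs = []
    for xs, ys in data:
        h = sorted(set(ys))
        if len(h) < 2:
            continue  # a one-product history contributes a constant to the likelihood
        for x, y in zip(xs, ys):
            obs.append(([x[j] for j in h], h.index(y)))
    return obs


rng = random.Random(0)
true_beta = 1.0
data = simulate(n_ind=1000, n_periods=10, n_products=5, set_size=2, beta=true_beta, rng=rng)

beta_universal = logit_mle(universal_obs(data))
beta_fph = logit_mle(fph_obs(data))
```

In this stripped-down design the universal logit estimate is pulled toward zero, because products that were never available look as if they were rejected, while the FPH estimate stays close to `true_beta`.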
Table 3.2: Increasing the severity of choice set restriction

Number of products in choice set (out of 5)

        Universal Logit         True Logit              FPH Logit               FE Logit
        Bias (SD)       RMSE    Bias (SD)       RMSE    Bias (SD)       RMSE    Bias (SD)       RMSE
5        0.005 (0.032)  0.032   0.005 (0.032)   0.032   0.011 (0.030)   0.032   0.024 (0.085)   0.088
4       -0.525 (0.013)  0.526   0.005 (0.028)   0.028   0.012 (0.029)   0.031   0.011 (0.087)   0.088
3       -0.719 (0.007)  0.719   0.013 (0.027)   0.030   0.020 (0.027)   0.034   0.023 (0.068)   0.071
2       -1.139 (0.003)  1.139   0.007 (0.027)   0.028   0.009 (0.027)   0.029   0.010 (0.067)   0.068

For each of the four cases of choice set heterogeneity (i.e., rows), twenty datasets are generated. The degree of choice set heterogeneity increases from the first to the fourth row as the number of alternatives not included in $CS_i$ passes from 0 to 3. $\#U = 5$. Restricted choice sets are defined as $CS_i = U \setminus \{j_i, k_i, h_i\}$, where $j_i$, $k_i$, and $h_i$ are randomly (uniformly) selected for each individual. Each individual faces 10 choice situations. It is always assumed that individuals have stable choice sets (i.e., choice sets do not change across choice situations). In each dataset, there are 1000 individuals. The mixed logit model is estimated by simulated maximum likelihood with 1000 shifted and shuffled Halton draws per individual. To speed up computations, the FE Logit is estimated by sampling at random (uniformly), for each individual, 5000 permutations of the observed sequence of choices (see D’Haultfœuille and Iaria, 2016).

3.8 Discussion

We close this subsection with two important practical comments. First, note that we can combine any of our time-series sufficient sets with the cross-sectional (Inter-Personal) set. For example, if we had confidence that the purchases of different individuals within each cross-section spanned the set of products available at that specific point in time, then we could use that information to limit the amount of time-series “extrapolation” between different cross-sections.
Second, as we saw in the above example, a drawback of the FE logit is that it does not allow the identification of the parameters of time-invariant regressors. Consequently, from a FE logit one cannot estimate price elasticities, for example. By contrast, the other sufficient sets we propose allow one to estimate all the elements of $\theta$. Moreover, the FE logit is both computationally more complex to implement and, as we will see in the next section, less efficient whenever there is no unobserved preference heterogeneity among consumer types. The additional computational challenges raised by the FE logit follow from the impossibility of factoring $\mathcal{P}(Y_i)$ into a Cartesian product of T sets. For a discussion of these computational issues and a description of the solution we use in our application, see D’Haultfœuille and Iaria (2016). However, despite these drawbacks, only the FE logit is robust against (i, j)-specific fixed effects. As we will discuss in the next section, the various ways of constructing $f(Y_i)$ lead to more or less robust and/or efficient estimators along the lines of Hausman and McFadden (1984) and can be used to form specification tests.

4 Choosing the sufficient set

4.1 Specification Tests

We discuss some approaches to testing the maintained assumptions in our empirical approach, in particular: (a) the length of the sequence of choice situations for which choice sets are stable, ever expand, or ever shrink; and (b) unobserved individual-product-specific preference heterogeneity.

This section proposes a theoretical result that enables us to “rank” Maximum Likelihood Estimators (MLE) of logit models with different sufficient sets in terms of their efficiency, and to use that to form specification tests of our assumptions. Roughly, the MLE of a logit model with sufficient set $f_L$ is more efficient than the MLE of a logit model with sufficient set $f_Z \subseteq f_L$.
Moreover, this result can be applied recursively, so that if two subsets of $f_L$ are available, say $f_Z$ and $f_{XZ}$ with $f_{XZ} \subseteq f_Z$, then the efficiency rank of the three MLEs will be clear: $f_L \succeq f_Z \succeq f_{XZ}$. Using this result, we then propose Hausman-like specification tests between models based on different sufficient sets, i.e. different ways of constructing the correspondence f. These comparisons enable us to test for some forms of choice set stability and preference heterogeneity.

Proposition 2. Assume that Assumption 1 holds, and that sufficient sets $f_L$ and $f_Z$ satisfy Condition 1 such that $f_Z(Y_i) \subseteq f_L(Y_i)$, $Y_i \in CS_i^\star = c$ and $i = 1, \dots, I$. Define $l_L(\theta)$ and $l_Z(\theta)$ as the log-likelihood functions corresponding to the logit models conditional on $f_L(Y_i)$ and on $f_Z(Y_i)$, with $\hat{\theta}_L$ and $\hat{\theta}_Z$ the corresponding Maximum Likelihood Estimators (MLEs). Then the following results hold:

1. The log-likelihood function $l_L(\theta)$ can be written as $l_L(\theta) = l_Z(\theta) + l_\Delta(\theta)$.

2. Provided that $\theta$ is identified in $l_\Delta(\theta)$, so that $\hat{\theta}_\Delta$ is a well-defined MLE, then:
   (a) $\hat{\theta}_Z$ and $\hat{\theta}_\Delta$ are asymptotically independent, and
   (b) $\hat{\theta}_L$ is more efficient than $\hat{\theta}_Z$.

3. Given result (2), then:
   (a) all Hausman-like tests based on pairwise estimator comparisons among $\hat{\theta}_L$, $\hat{\theta}_Z$, and $\hat{\theta}_\Delta$ are equivalent,
   (b) the Likelihood Ratio statistic $LR = 2[l_Z(\hat{\theta}_Z) + l_\Delta(\hat{\theta}_\Delta) - l_L(\hat{\theta}_L)]$ is asymptotically equivalent to the Hausman statistic comparing $\hat{\theta}_L$ and $\hat{\theta}_Z$, and
   (c) $\mathrm{Var}(\hat{\theta}_L - \hat{\theta}_Z) = \mathrm{Var}(\hat{\theta}_Z) - \mathrm{Var}(\hat{\theta}_L)$.

Proof. See Appendix B.

The Likelihood Ratio statistic, LR, proposed in result (3b) allows us to compare different models derived from alternative assumptions. It consists of the difference between an unrestricted log-likelihood function, $l_Z(\hat{\theta}_Z) + l_\Delta(\hat{\theta}_\Delta)$, and a restricted one, $l_L(\hat{\theta}_L)$. Even though LR requires the computation of a third estimator, $\hat{\theta}_\Delta$, it is simpler to implement than other Hausman-like statistics based on quadratic forms.
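Computing the statistic in result (3b) only requires three maximized log-likelihood values. The Python sketch below is a minimal illustration with hypothetical numbers; `chi2_sf` is a small helper written here for integer degrees of freedom (it uses the standard recurrence for the regularized upper incomplete gamma function) and is not part of any library. The degrees of freedom are taken to be the number of parameters in $\hat{\theta}_L$.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    # Closed-form starting values: df = 1 (a = 1/2) or df = 2 (a = 1).
    if df % 2 == 1:
        a, q = 0.5, erfc(sqrt(z))
    else:
        a, q = 1.0, exp(-z)
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while a < df / 2.0:
        q += z ** a * exp(-z) / gamma(a + 1.0)
        a += 1.0
    return q


def lr_test(ll_z, ll_delta, ll_l, n_params):
    """LR = 2 [ l_Z(theta_Z) + l_Delta(theta_Delta) - l_L(theta_L) ] and its p-value."""
    lr = 2.0 * (ll_z + ll_delta - ll_l)
    return lr, chi2_sf(lr, n_params)


# Hypothetical maximized log-likelihoods and number of parameters in theta_L.
lr, p_value = lr_test(ll_z=-1520.3, ll_delta=-804.1, ll_l=-2326.9, n_params=3)
```

Because $l_L(\theta) = l_Z(\theta) + l_\Delta(\theta)$, maximizing the two pieces separately can never give a lower total than maximizing their sum, so `lr` is non-negative whenever the three inputs come from correctly maximized likelihoods.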
For instance, the statistic $LR$ is always non-negative, by-passing the practical inconvenience of some estimated covariance matrices, which fail to be positive definite. Furthermore, and differently from other Hausman-like statistics, $LR$ makes very transparent the computation of the degrees of freedom of the corresponding $\chi^2$ distribution: they equal the number of parameters in $\hat{\theta}_L$. Result (3c) is of practical convenience, since it implies that the computation of $\mathrm{Var}(\hat{\theta}_L - \hat{\theta}_Z)$, necessary for classical Hausman-like statistics, can proceed as in the standard case in which one of the compared estimators is fully efficient under the null hypothesis, even though no such efficiency assumption is required here. Proposition 2 hinges on the Factorization Theorem proposed by Paul Ruud; for more details see Ruud (1984) and Hausman and Ruud (1987).

The various sufficient sets introduced in section (3.3) rely on the following economic assumptions:

• $f_{FE}$: Choice set stability across $T$ choice situations and the possibility of having unobserved preference heterogeneity in the form of $(i, j)$-specific fixed effects.

• $f_{FPH}$: Choice set stability across $T$ choice situations and no unobserved preference heterogeneity.

• $f_{PPH}$: Choice set evolution in the form of entry-but-no-exit or exit-but-no-entry across $T$ choice situations and no unobserved preference heterogeneity.

Proposition 2 can be used to construct statistics useful to test for these economic assumptions by comparing the estimates obtained from models based on different sufficient sets $f$. The first possibility is to compare $f_{FE}$, $f_{FPH}$, and $f_{PPH}$ for choice sequences of constant length $T$. In fact, for choice sequences of a given length $T$, it can be shown that $f_{FE}(Y_i) \subseteq f_{FPH}(Y_i)$ and that $f_{PPH}(Y_i) \subseteq f_{FPH}(Y_i)$ for any $Y_i \in CS^{\star}_i = c$.
Given Proposition 2, these relationships among sufficient sets allow us to construct Hausman-like tests for the assumptions of choice set stability and for preference heterogeneity. The second possibility for constructing test statistics is to fix a specific $f$, say $f_{FE}$, and to compare choice sequences with some of their sub-sequences: for example, the sequence $1, 2, \dots, T^L$ can be split into two mutually exclusive sub-sequences $1, 2, \dots, T^Z$ and $T^Z + 1, \dots, T^L$, and this gives rise to different $f_{FE}$'s, $f^Z_{FE}$ and $f^L_{FE}$, such that $f^Z_{FE}(Y_i) \subseteq f^L_{FE}(Y_i)$ for any $Y_i \in CS^{\star}_i = c$. The same holds for both $f_{FPH}$ and $f_{PPH}$. This method of making comparisons allows us to test for choice set stability in several alternative ways, but it does not enable us to test for preference homogeneity. In Appendix C we discuss each of these Hausman-like test statistics in more detail.

5 Bounding Choice Probabilities and Elasticities

We are often interested in functions of parameters, for example, willingness to pay, elasticities, welfare or analysis of counterfactuals, such as evaluating the effects of a change in tax policy or a merger between manufacturers. We can obtain point estimates of preference parameters and so, for example, willingness to pay. However, without further information or restrictions on the true choice sets, we are not able to point-estimate consumer type-specific choice probabilities and hence average choice probabilities. Still, under some weak assumptions, we can “bound” them. Also, given the convenient relationship between consumer type-specific elasticities and consumer type-specific choice probabilities in logit models, we can as well bound consumer type-specific elasticities and their averages.

For expositional simplicity, suppose that indirect utilities are linear in parameters:

$$V_{ijt}(X_{it}, \theta) = X_{ijt}\theta = \delta_j + X_{ijt}\beta. \tag{5.1}$$

Denote by $Y_{it} = j_t$ whether product $j_t$ is chosen by consumer type $i$ in choice situation $t$, and by $f_t(Y_i) = r_{it}$ whether $i$'s sufficient set collects products $r_{it}$ in choice situation $t$. Note that $f_t(Y_i) \subseteq CS^{\star}_{it}$ and that $CS^{\star}_{it} \subseteq S_{it}$. In what follows we use these conditions to bound the true but unobserved denominator of the logit choice probabilities for any $X_{imt}$, $\delta$, and $\beta$:

$$\sum_{m \in f_t(Y_i) = r_{it}} \exp(\delta_m + X_{imt}\beta) \;\le\; \sum_{m \in CS^{\star}_{it} = c_{it}} \exp(\delta_m + X_{imt}\beta) \;\le\; \sum_{m \in S_{it} = s_{it}} \exp(\delta_m + X_{imt}\beta). \tag{5.2}$$

For brevity, denote $Pr^{S}_{ijt}(\theta) = \Pr[Y_{it} = j_t \mid S_{it} = s_{it}, \theta]$, $Pr^{CS^{\star}}_{ijt}(\theta) = \Pr[Y_{it} = j_t \mid CS^{\star}_{it} = c_{it}, \theta]$, and $Pr^{f}_{ijt}(\theta) = \Pr[Y_{it} = j_t \mid f_t(Y_i) = r_{it}, \theta]$. It follows from (5.2) that for any $j_t \in f_t(Y_i) = r_{it}$:

$$Pr^{S}_{ijt}(\theta) \;\le\; Pr^{CS^{\star}}_{ijt}(\theta) \;\le\; Pr^{f}_{ijt}(\theta). \tag{5.3}$$

Observe that $Pr^{f}_{ijt}(\theta) = \Pr[Y_{it} = j_t \mid f_t(Y_i) = r_{it}, \theta]$ takes the usual logit form whenever $j_t \in r_{it}$, but that it equals zero whenever $j_t \notin r_{it}$. Hence, for those $j_t \in s_{it}$ but $j_t \notin r_{it}$, $Pr^{f}_{ijt}(\theta)$ will not be a valid upper bound for $Pr^{CS^{\star}}_{ijt}(\theta)$: even if $j_t \notin r_{it}$, it can still be the case that $j_t \in CS^{\star}_{it} = c_{it}$, so that $Pr^{CS^{\star}}_{ijt}(\theta) > 0$. Similarly, among the $j_t \in s_{it}$ with $j_t \notin r_{it}$, there can be some $j_t \notin c_{it}$. But for those $j_t \in s_{it}$ with $j_t \notin c_{it}$, $Pr^{S}_{ijt}(\theta) > Pr^{CS^{\star}}_{ijt}(\theta) = 0$: $Pr^{S}_{ijt}(\theta)$ will not be a valid lower bound for $Pr^{CS^{\star}}_{ijt}(\theta)$. It is then unclear how to bound $Pr^{CS^{\star}}_{ijt}(\theta)$ for those $j_t \in s_{it}$ but $j_t \notin r_{it}$. However, it is always possible to construct bounds for the probability with which $i$ would purchase $j_t$ if indeed $j_t$ were to be added to their true but unobserved choice set, $CS^{\star}_{it} \cup \{j_t\} = c_{it} \cup \{j_t\}$:

$$Pr^{CS^{\star} \cup j}_{ijt}(\theta) = \Pr[Y_{it} = j_t \mid CS^{\star}_{it} \cup \{j_t\} = c_{it} \cup \{j_t\}, \theta] = \frac{\exp(\delta_{j_t} + X_{ij_t t}\beta)}{\sum_{m \in c_{it} \cup \{j_t\}} \exp(\delta_m + X_{imt}\beta)}. \tag{5.4}$$

By defining $Pr^{S \cup j}_{ijt}(\theta)$ and $Pr^{f \cup j}_{ijt}(\theta)$ analogously, note that $Pr^{S \cup j}_{ijt}(\theta) = Pr^{S}_{ijt}(\theta)$, $Pr^{CS^{\star} \cup j}_{ijt}(\theta) = Pr^{CS^{\star}}_{ijt}(\theta)$, and $Pr^{f \cup j}_{ijt}(\theta) = Pr^{f}_{ijt}(\theta)$ for any $j_t \in r_{it}$, while $Pr^{S \cup j}_{ijt}(\theta) \le Pr^{CS^{\star} \cup j}_{ijt}(\theta)$ and $Pr^{CS^{\star} \cup j}_{ijt}(\theta) \le Pr^{f \cup j}_{ijt}(\theta)$ for any $j_t \notin r_{it}$. Using these facts, we can then complement condition (5.3) for those $j_t \notin r_{it}$ and propose choice probability bounds for all $(i, j, t)$ combinations:

$$Pr^{S \cup j}_{ijt}(\theta) \;\le\; Pr^{CS^{\star} \cup j}_{ijt}(\theta) \;\le\; Pr^{f \cup j}_{ijt}(\theta). \tag{5.5}$$

Condition (5.5) can be used to construct bounds for functions of consumer type choice probabilities, such as average choice probabilities or elasticities. The average choice probability of product $j_t$ for a certain group of consumer types $i = 1, \dots, I_t$ can be bounded by:

$$I_t^{-1} \sum_{i=1}^{I_t} Pr^{S \cup j}_{ijt}(\theta) \;\le\; I_t^{-1} \sum_{i=1}^{I_t} Pr^{CS^{\star} \cup j}_{ijt}(\theta) \;\le\; I_t^{-1} \sum_{i=1}^{I_t} Pr^{f \cup j}_{ijt}(\theta). \tag{5.6}$$

With logit demand and linear indirect utilities (5.1), consumer type $i$'s own- and cross-price elasticities are:

$$\xi^{jj}_{it}(X_{it}, \theta) = \beta_p\, p_{j_t t} \left(1 - Pr^{CS^{\star} \cup j}_{ijt}(\theta)\right) = \beta_p\, p_{j_t t} \left(1 - \frac{\exp(\delta_{j_t} + X_{ij_t t}\beta)}{\sum_{m \in c_{it} \cup \{j_t\}} \exp(\delta_m + X_{imt}\beta)}\right),$$

$$\xi^{jk}_{it}(X_{it}, \theta) = -\beta_p\, p_{k_t t}\, Pr^{CS^{\star} \cup j}_{ikt}(\theta) = -\beta_p\, p_{k_t t}\, \frac{\exp(\delta_{k_t} + X_{ik_t t}\beta)}{\sum_{m \in c_{it} \cup \{j_t\}} \exp(\delta_m + X_{imt}\beta)}, \tag{5.7}$$

where $p_{j_t t}$ is $j_t$'s price in choice situation $t$ and $\beta_p$ is the price coefficient. As (5.7) makes clear, even though we may have a consistent estimator of $\delta = [\delta_1, \dots, \delta_j, \dots, \delta_J]$ and $\beta$, we still do not know the exact $CS^{\star}_{it} \cup \{j_t\} = c_{it} \cup \{j_t\}$ for each $i$ and $t$, and thus the true $Pr^{CS^{\star} \cup j}_{ijt}(\theta)$, $\forall j_t \in CS^{\star}_{it} \cup \{j_t\} = c_{it} \cup \{j_t\}$. Given (5.7), (5.5), and $\beta_p < 0$, we obtain the following bounds on the elasticities for any $j_t$, $k_t$, $X_{ijt}$, $\delta$, and $\beta$:

$$\underbrace{\beta_p\, p_{j_t t} \left(1 - Pr^{f \cup j}_{ijt}(\theta)\right)}_{\text{Lower (in abs. value) Bound}} \;\le\; \xi^{jj}_{it}(X_{it}, \theta) \;\le\; \underbrace{\beta_p\, p_{j_t t} \left(1 - Pr^{S \cup j}_{ijt}(\theta)\right)}_{\text{Upper (in abs. value) Bound}}$$

$$\underbrace{-\beta_p\, p_{k_t t}\, Pr^{S \cup j}_{ikt}(\theta)}_{\text{Lower Bound}} \;\le\; \xi^{jk}_{it}(X_{it}, \theta) \;\le\; \underbrace{-\beta_p\, p_{k_t t}\, Pr^{f \cup j}_{ikt}(\theta)}_{\text{Upper Bound}} \tag{5.8}$$

We construct confidence intervals for these identification regions following Imbens and Manski (2004). Details are provided in Appendix A.
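As an illustration of how the bounds in (5.5) and (5.8) could be evaluated once $\delta$ and $\beta$ have been estimated, the following is a minimal Python sketch. The function names (`prob_in_set`, `elasticity_bounds`), the array layout, and the restriction that product `k` belongs to the sufficient set are assumptions made for the example; they are not part of the paper's implementation. The own-price interval is reported as a signed interval, so its first element is the upper bound in absolute value.

```python
import numpy as np

def prob_in_set(v, base_set, added, target):
    """Logit probability of `target` when the choice set is base_set ∪ {added}.

    v[m] is the estimated indirect utility delta_m + X_m * beta of product m.
    """
    members = sorted(set(base_set) | {added})
    if target not in members:
        raise ValueError("target must belong to base_set ∪ {added}")
    return float(np.exp(v[target]) / np.exp(v[members]).sum())

def elasticity_bounds(v, prices, beta_p, sufficient_set, superset, j, k):
    """Signed intervals for the own-price (j) and cross-price (j, k) elasticities.

    The sufficient set gives the smallest denominator (largest probability),
    the superset gives the largest denominator (smallest probability).
    """
    assert beta_p < 0 and set(sufficient_set) <= set(superset)
    pj_hi = prob_in_set(v, sufficient_set, j, j)   # Pr^{f ∪ j}
    pj_lo = prob_in_set(v, superset, j, j)         # Pr^{S ∪ j}
    own = (beta_p * prices[j] * (1 - pj_lo),       # most negative value
           beta_p * prices[j] * (1 - pj_hi))       # least negative value
    pk_hi = prob_in_set(v, sufficient_set, j, k)
    pk_lo = prob_in_set(v, superset, j, k)
    cross = (-beta_p * prices[k] * pk_lo, -beta_p * prices[k] * pk_hi)
    return own, cross

# Hypothetical inputs: four products with equal utilities and prices.
v = np.zeros(4)
prices = np.full(4, 2.0)
own, cross = elasticity_bounds(v, prices, beta_p=-1.0,
                               sufficient_set={0, 1}, superset={0, 1, 2, 3},
                               j=0, k=1)
```

The intervals widen as the sufficient set shrinks relative to the superset, which is the sense in which the bounds are uninformative about the exact choice set but remain valid for any choice set between the two.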
Figure 6.1: Market shares

6 Empirical Illustration

In this section we provide an empirical illustration of the problem that arises from unobserved heterogeneous choice sets and potential solutions. We choose a situation in which we observe a change in the choice set that consumer types face, so that we can investigate the consequent bias in standard methods and the working of our proposed estimators.

We consider demand for ketchup in plastic bottles in the UK over the period 2002–2009. The UK ketchup market is an ideal application to study the problem of demand estimation in the presence of heterogeneous unobserved choice sets. Heinz is the dominant brand; in 2002 it had a market share of approximately 60%. Supermarket own brands made up the rest of the market. Heinz introduced the top down package in July 2003. The top down format was popular and gradually became the dominant variety (see Figure 6.1).

As we saw in equation (2.5) and the following discussion, the estimation bias is expected to be particularly large whenever the econometrician mistakenly imputes to consumer types products they like, but did not purchase because these products were not in their true but unobserved choice set. In this sense, the UK ketchup market is an ideal situation because we observe a product, the top down variety, which a substantial share of households clearly prefer (we know this because we observe them purchasing the top down variety when both are available), but which we know is not in their choice set prior to its introduction into the market.

Figure 6.2: Mean number of products in sufficient sets

The ketchup market has been studied in a number of papers in the literature, all relying on household-level panel data of the type we also use. Allenby and Rossi (1998) and Chiang et al. (1998) are early applications that estimate models with preference heterogeneity, while Pesendorfer (2002) explores the optimal timing of sales.
Villas-Boas and Zhao (2005) model both manufacturer and retailer behavior to measure the extent of manufacturer competition, the nature of retailer-manufacturer interactions, and the consequences of retailer product-category pricing. Pancras and Sudhir (2007) analyze optimal one-to-one couponing strategy for a major customer data intermediary (CDI). The US and UK markets differ slightly in the relative importance of store own brand ketchup products.

6.1 Data

We use UK household-level purchase data from the Kantar Worldpanel. Households record purchases using a scanner in the home. Households typically remain in the panel for around 2 years. For more information on the data see Leicester and Oldfield (2009) and Griffith and O’Connell (2009).

We estimate demand for ketchup using information on purchases over the period 2002–2009. The households in our sample are those that purchase ketchup regularly. We consider 39 products. These are all products in plastic bottles that we observe being sold. We define sufficient sets as in section (3.3). In what follows, we present results for the Full Purchase History (FPH) logit, defined in section (3.5.1), and combined with the Inter-Personal logit, defined in section (3.5.3). In addition, we also implement the Fixed Effects logit, discussed in section (3.5.1), to estimate the marginal utility of income (the coefficient on price).
6.2 Demand estimates

Table 6.1: Demand estimates

                   Universal        FPH              FPH-IP           FE               FE-IP            Universal on FE sample   Mixlogit
price              -0.478 (0.021)   -1.372 (0.044)   -1.431 (0.047)   -1.625 (0.063)   -1.527 (0.068)   -0.491 (0.027)           -1.125 (0.034)
SD price                                                                                                                          1.333 (0.023)
distance           -0.152 (0.002)   -0.012 (0.004)   -0.010 (0.004)                                     -0.150 (0.002)           -0.152 (0.002)
topdown             0.013 (0.017)    0.583 (0.052)    0.593 (0.061)                                      0.132 (0.022)            0.396 (0.017)
ss700g              0.219 (0.018)    0.364 (0.040)    0.389 (0.042)                                      0.321 (0.024)            0.479 (0.018)
ss900g              0.061 (0.027)    0.710 (0.053)    0.758 (0.055)                                      0.191 (0.034)            0.222 (0.026)
ss1kg               0.359 (0.022)    0.812 (0.056)    0.872 (0.059)                                      0.377 (0.029)            0.828 (0.022)
ss11kg             -0.803 (0.055)    1.104 (0.136)    1.489 (0.166)                                     -1.340 (0.097)           -0.038 (0.056)
ss12kg              0.415 (0.028)    1.034 (0.057)    1.240 (0.061)                                      0.454 (0.035)            0.525 (0.027)
ss13kg              0.574 (0.031)    1.243 (0.069)    1.332 (0.074)                                      0.665 (0.041)            0.946 (0.031)
Morr               -1.287 (0.053)    0.246 (0.112)    0.247 (0.116)                                     -1.390 (0.078)           -0.973 (0.051)
Sain               -0.610 (0.028)    0.135 (0.073)    0.128 (0.076)                                     -0.446 (0.038)           -0.412 (0.028)
Tesc               -0.432 (0.025)   -0.073 (0.063)   -0.008 (0.066)                                     -0.304 (0.035)           -0.441 (0.026)
Hein               -0.125 (0.023)    0.642 (0.061)    0.779 (0.066)                                      0.080 (0.031)            0.382 (0.023)
N                   1534221          90937            76773            20051            17021            911781                   1629108
it                  39339            29160            27537            23379            20187            23379                    41772
i                   2410             2365             2291             2181             2048             2181                     2416
average sequences   9.3              6.5              6.3              5.6              5.0              5.6                      9.9
min sequences       1                1                1                1                1                1                        1
max sequences       31               24               24               21               20               21                       32
average j           39               2.4              2.4              2.3              2.3              39                       39
minimum j           39               2                2                2                2                39                       39
maximum j           39               3                3                3                3                39                       39

6.3 Elasticity bounds

In what follows, we illustrate the concepts of market share and elasticity in the case of panel data, in which $i$ represents a household and $t$ a choice situation in a specific time period. With a slight change of notation, the same can be done in the case of cross-sectional data. Figure 6.3 shows the own-price elasticities for all 39 products.
Figure 6.3: Own price elasticities for all products in October 2007

7 Conclusion

In this paper we have addressed the consequences of unobserved choice set heterogeneity on demand estimation. We show that it generally causes an estimation bias. We propose a solution method that allows us to consistently estimate demand parameters and to bound demand elasticities. The solution is based on logit demand and on identifying “sufficient sets” of choices based on restrictions that need to be determined by the characteristics of the economic environment. The solutions can be applied to both cross-section and panel data.

We illustrate the method using an application to ketchup demand. The point estimate of the marginal utility of income (the coefficient on price) is biased (towards zero) when we use the superset of products as the assumed homogeneous choice set (a common approach taken in applied demand estimation). The elasticities obtained by using the superset estimates often lie outside of the bounds that we obtain by using our solution method. This matters for strategy and policy applications.

Extending the methods introduced here to non-logit demand, whether in general or in response to the endogenous matching of individuals to choice sets (e.g., due to customer search and/or firms’ product choice decisions), is a promising avenue for further research. Bierlaire et al. (2008) extended estimation on choice-based samples to “block-additive GEV” models; however, these do not include the widely used Nested logit model. Extending our methods to accommodate Nested logit and/or Random Coefficient logit is a valuable topic for future research. Fox (2007) describes an estimation method on choice subsets that does not assume logit demand, but instead relies on knowing rank orderings of (subsets of) consumers’ products.
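Appendices

A Confidence Intervals for Elasticity Bounds

For notational simplicity, we limit our discussion to a single elasticity term $\xi^{jk}_{it}(X_{it}, \theta)$, although the same ideas can be extended to the collection of all elasticities. Refer to the upper and lower bounds of $\xi^{jk}_{it}(X_{it}, \theta)$ in (5.8) as $\overline{\xi}^{jk}_{it}(X_{it}, \theta)$ and $\underline{\xi}^{jk}_{it}(X_{it}, \theta)$, respectively. Denote the elasticity bounds of $\xi^{jk}_{it}(X_{it}, \theta)$ by the $2 \times 1$ vector $B(\xi^{jk}_{it}(X_{it}, \theta)) = [\underline{\xi}^{jk}_{it}(X_{it}, \theta), \overline{\xi}^{jk}_{it}(X_{it}, \theta)]'$ and the corresponding elasticity interval from (5.8) by $IN(\xi^{jk}_{it}(X_{it}, \theta))$. Then, given $X_{it}$ and our consistent $\hat{\theta}$, we can estimate the elasticity bounds $B(\xi^{jk}_{it}(X_{it}, \theta))$ by $B(\xi^{jk}_{it}(X_{it}, \hat{\theta}))$. We derive the corresponding $100(1 - \alpha)$ percent confidence interval $CI_{1-\alpha}$ from the condition:

$$\inf_{\xi^{jk}_{it} \in IN(\xi^{jk}_{it}(X_{it}, \theta))} \left\{ \lim_{I \to \infty} \Pr\left[\xi^{jk}_{it} \in CI_{1-\alpha}\right] \right\} \ge 1 - \alpha. \tag{A.1}$$

Since our estimator is consistent and asymptotically normal, i.e., $\sqrt{I}(\hat{\theta} - \theta) \xrightarrow{d} N(0, V_{\theta})$, by the delta method:

$$\sqrt{I}\left[B(\xi^{jk}_{it}(X_{it}, \hat{\theta})) - B(\xi^{jk}_{it}(X_{it}, \theta))\right] \xrightarrow{d} N\left(0,\; \frac{\partial B(\xi^{jk}_{it}(X_{it}, \theta))}{\partial \theta'}\, V_{\theta} \left(\frac{\partial B(\xi^{jk}_{it}(X_{it}, \theta))}{\partial \theta'}\right)'\right). \tag{A.2}$$

Refer to the $2 \times 2$ asymptotic variance-covariance matrix of $B(\xi^{jk}_{it}(X_{it}, \hat{\theta}))$ as $\Sigma_{B(\xi^{jk}_{it})}$. It follows that, whenever $f_t(Y_i) \cup \{j_t\} = r_{it} \cup \{j_t\}$ is a strict subset of $S_{it} \cup \{j_t\} = s_{it} \cup \{j_t\}$, so that for any $X_{it}$ and $\theta$, $\underline{\xi}^{jk}_{it}(X_{it}, \theta) < \overline{\xi}^{jk}_{it}(X_{it}, \theta)$, condition (A.1) is satisfied by:

$$CI_{1-\alpha} = \left[\underline{\xi}^{jk}_{it}(X_{it}, \hat{\theta}) - q_{1-\alpha}\sqrt{\Sigma^{11}_{B(\xi^{jk}_{it})}},\;\; \overline{\xi}^{jk}_{it}(X_{it}, \hat{\theta}) + q_{1-\alpha}\sqrt{\Sigma^{22}_{B(\xi^{jk}_{it})}}\right], \tag{A.3}$$

where $q_{1-\alpha}$ is the $(1 - \alpha)^{th}$ quantile of the standard normal distribution.

In the extreme case in which $f_t(Y_i) \cup \{j_t\} = r_{it} \cup \{j_t\} = S_{it} \cup \{j_t\} = s_{it} \cup \{j_t\}$, $\underline{\xi}^{jk}_{it}(X_{it}, \theta) = \overline{\xi}^{jk}_{it}(X_{it}, \theta)$ for any $X_{it}$ and $\theta$, and (A.3) is invalid. This is due to a discontinuity at $\underline{\xi}^{jk}_{it}(X_{it}, \theta) = \overline{\xi}^{jk}_{it}(X_{it}, \theta)$, since in that case the coverage of the interval is only $100(1 - 2\alpha)\%$ rather than the nominal $100(1 - \alpha)\%$.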
(See Imbens and Manski (2004) for a modification of (A.3) that overcomes this problem.) However, note that (a) both $f_t(Y_i) \cup \{j_t\} = r_{it} \cup \{j_t\}$ and $S_{it} \cup \{j_t\} = s_{it} \cup \{j_t\}$ are always perfectly observed by the econometrician, so that the appropriate $CI_{1-\alpha}$ can always be implemented and (b) in our empirical application $f_t(Y_i) \cup \{j_t\} \subset S_{it} \cup \{j_t\}$ for every $i$ and $t$.

B Proof of Proposition 2

B.1 Proof of Result (1)

From Proposition 1, we can re-write for every $i$ the probability of the observed choice sequence $j$ given $f_Z(Y_i) = z \subseteq f_L(Y_i) = l$ as:

$$\Pr[Y_i = j \mid f_L(Y_i) = l, \theta] = \frac{\prod_{t=1}^{T} \exp(V_{ijt}(X_{it}, \theta))}{\sum_{k \in f_L(k) = l} \prod_{t=1}^{T} \exp(V_{ikt}(X_{it}, \theta))}$$

$$= \Pr[Y_i = j \mid f_Z(Y_i) = z, \theta]\; \frac{\Pr[Y_i = j \mid f_L(Y_i) = l, \theta]}{\Pr[Y_i = j \mid f_Z(Y_i) = z, \theta]}$$

$$= \frac{\prod_{t=1}^{T} \exp(V_{ijt}(X_{it}, \theta))}{\sum_{q \in f_Z(q) = z} \prod_{t=1}^{T} \exp(V_{iqt}(X_{it}, \theta))} \cdot \frac{\sum_{q \in f_Z(q) = z} \prod_{t=1}^{T} \exp(V_{iqt}(X_{it}, \theta))}{\sum_{k \in f_L(k) = l} \prod_{t=1}^{T} \exp(V_{ikt}(X_{it}, \theta))}$$

$$= \Pr[Y_i = j \mid f_Z(Y_i) = z, \theta]\; \Pr[Y_i \in f_Z(Y_i) = z \mid f_L(Y_i) = l, \theta],$$

where $\Pr[Y_i \in f_Z(Y_i) = z \mid f_L(Y_i) = l, \theta]$ is the probability that a choice sequence belongs to the “smaller” set $z$ relative to the “larger” set $l$. By multiplying $\Pr[Y_i = j \mid f_L(Y_i) = l, \theta]$ across all consumer types $i$ and by taking the logarithm of this likelihood function, result (1) follows with $l_{\Delta}(\theta) = \sum_{i=1}^{I} \ln\left(\Pr[Y_i \in f_Z(Y_i) = z \mid f_L(Y_i) = l, \theta]\right)$.

B.2 Proof of Result (2)

Given result (1) of Proposition 2, results (2a) and (2b) follow from the Factorization Theorem of Ruud (1984, result (1), p. 24).

B.3 Proof of Result (3)

Given result (1) of Proposition 2, result (3a) follows from the Factorization Theorem by Ruud (1984, result (3), p. 24), while result (3b) follows from Ruud (1984, pp. 28-9).

Result (3c) can be proved as follows. Ruud (1984, result 2, p. 24) shows that $\hat{\theta}_L$ is asymptotically equivalent to $\mathrm{Var}(\hat{\theta}_L)\,\mathrm{Var}(\hat{\theta}_Z)^{-1}\hat{\theta}_Z + \mathrm{Var}(\hat{\theta}_L)\,\mathrm{Var}(\hat{\theta}_{\Delta})^{-1}\hat{\theta}_{\Delta}$. This implies that $\mathrm{Cov}(\hat{\theta}_L, \hat{\theta}_Z) = \mathrm{Cov}(\mathrm{Var}(\hat{\theta}_L)\,\mathrm{Var}(\hat{\theta}_Z)^{-1}\hat{\theta}_Z, \hat{\theta}_Z) = \mathrm{Var}(\hat{\theta}_L)\,\mathrm{Var}(\hat{\theta}_Z)^{-1}\,\mathrm{Var}(\hat{\theta}_Z) = \mathrm{Var}(\hat{\theta}_L)$, where the first equality follows from result (2a). Consequently, $\mathrm{Var}(\hat{\theta}_L - \hat{\theta}_Z) = \mathrm{Var}(\hat{\theta}_L) + \mathrm{Var}(\hat{\theta}_Z) - 2\,\mathrm{Cov}(\hat{\theta}_L, \hat{\theta}_Z) = \mathrm{Var}(\hat{\theta}_Z) - \mathrm{Var}(\hat{\theta}_L)$.

C Testing Procedures

The various sufficient sets introduced in section (3.3) rely on the following economic assumptions:

• $f_{FE}$: Choice set stability across $T$ choice situations and the possibility of having unobserved preference heterogeneity in the form of $(i, j)$-specific fixed effects.

• $f_{FPH}$: Choice set stability across $T$ choice situations and no unobserved preference heterogeneity.

• $f_{PPH}$: Choice set evolution in the form of entry-but-no-exit or exit-but-no-entry across $T$ choice situations and no unobserved preference heterogeneity.

There are two possibilities for making comparisons across models based on different sufficient sets $f$. The first possibility is to compare $f_{FE}$, $f_{FPH}$, and $f_{PPH}$ for choice sequences of constant length $T$. The second possibility is to fix a specific $f$, say $f_{FE}$, and to compare choice sequences with some of their sub-sequences: for example, the sequence $1, 2, \dots, T^L$ can be split into two mutually exclusive sub-sequences $1, 2, \dots, T^Z$ and $T^Z + 1, \dots, T^L$, and this gives rise to different $f_{FE}$'s, $f^Z_{FE}$ and $f^L_{FE}$, such that $f^Z_{FE}(Y_i) \subseteq f^L_{FE}(Y_i)$ for any $Y_i \in CS^{\star}_i = c$. We will discuss each testing possibility in turn.

C.1 Comparisons of Different f’s with Constant T

For choice sequences of a given length $T$, it can be shown that $f_{FE}(Y_i) \subseteq f_{FPH}(Y_i)$ and that $f_{PPH}(Y_i) \subseteq f_{FPH}(Y_i)$ for any $Y_i \in CS^{\star}_i = c$. These can be seen in our previous example from section 3 (D). Suppose $Y_i = (1, 3)$. Then $f_{FE}(1, 3) = \mathcal{P}(1, 3) = \{(1, 3), (3, 1)\}$, $f_{FPH}(1, 3) = \{1, 3\} \times \{1, 3\} = \{(1, 1), (3, 3), (1, 3), (3, 1)\}$, and $f_{PPH}(1, 3) = \{1\} \times \{1, 3\} = \{(1, 1), (1, 3)\}$. Note that there is no clear “inclusion” relationship between $f_{FE}(Y_i)$ and $f_{PPH}(Y_i)$.
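The inclusion relationships in this example can be checked by enumeration. The Python sketch below builds the three sufficient sets for an observed sequence; the constructions are inferred from the worked example with $Y_i = (1, 3)$ (re-orderings for FE, a Cartesian power of the purchased products for FPH, and products purchased up to each choice situation for PPH under entry-but-no-exit), and the function names are hypothetical.

```python
from itertools import permutations, product

def f_fe(y):
    """All distinct re-orderings of the observed choice sequence."""
    return set(permutations(y))

def f_fph(y):
    """Cartesian product, over the T situations, of all products ever purchased."""
    return set(product(set(y), repeat=len(y)))

def f_pph(y):
    """Cartesian product of the products purchased up to and including each t
    (the entry-but-no-exit case, as read off the example in the text)."""
    return set(product(*[set(y[: t + 1]) for t in range(len(y))]))

y = (1, 3)
fe, fph, pph = f_fe(y), f_fph(y), f_pph(y)
print(sorted(fe), sorted(fph), sorted(pph))
print(fe <= fph, pph <= fph, fe <= pph, pph <= fe)
```

For longer sequences the same functions can be reused to inspect how quickly the FPH set grows relative to the FE and PPH sets.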
Given the results from Proposition 2, the above relationships among sufficient sets lead to two possible classes of tests. The first is about choice set stability and the second about preference heterogeneity.

C.2 Choice Set Stability

$f_{FPH}$ and $f_{PPH}$ are both based on the same assumption of absence of unobserved preference heterogeneity. However, they rely on different assumptions regarding the evolution of choice sets across choice situations: $f_{FPH}$ assumes that unobserved choice sets do not change along the whole choice sequence, while $f_{PPH}$ allows for the entry of new alternatives into the unobserved choice set between choice situations $t$ and $t + 1$. On the one hand, if unobserved choice sets were stable, then both $f$'s would give rise to consistent estimators $\hat{\theta}_{FPH}$ and $\hat{\theta}_{PPH}$, but result (2b) from Proposition 2 tells us that $\hat{\theta}_{FPH}$ would be more efficient than $\hat{\theta}_{PPH}$. On the other hand, if unobserved choice sets were subject to entry-but-no-exit of alternatives, then only $\hat{\theta}_{PPH}$ would be consistent. It follows that, under the maintained assumption of no unobserved preference heterogeneity, a test for $H_0$: (choice set stability in $1, 2, \dots, T$) is $LR = 2\left[l_{PPH}(\hat{\theta}_{PPH}) + l_{\Delta}(\hat{\theta}_{\Delta}) - l_{FPH}(\hat{\theta}_{FPH})\right]$.

C.3 Preference Homogeneity

The sufficient sets $f_{FPH}$ and $f_{FE}$ are both based on the same assumption of unobserved choice set stability in $1, 2, \dots, T$. However, they rely on different assumptions regarding unobserved preference heterogeneity: $f_{FPH}$ assumes that there is no unobserved preference heterogeneity, while $f_{FE}$ allows for $(i, j)$-specific fixed effects. On the one hand, if there were no unobserved preference heterogeneity, then both $f$'s would give rise to consistent estimators $\hat{\theta}_{FPH}$ and $\hat{\theta}_{FE}$, but Proposition 2 tells us that $\hat{\theta}_{FPH}$ would be more efficient than $\hat{\theta}_{FE}$. On the other hand, if unobserved preference heterogeneity were present in a form encompassed by $(i, j)$-specific fixed effects, then only $\hat{\theta}_{FE}$ would be consistent.
It follows that, under the maintained assumption of choice set stability in $1, 2, \dots, T$, a test for $H_0$: (preference homogeneity) is $LR = 2\left[l_{FE}(\hat{\theta}_{FE}) + l_{\Delta}(\hat{\theta}_{\Delta}) - l_{FPH}(\hat{\theta}_{FPH})\right]$.

C.4 Comparisons of Same f with Different Choice Sub-Sequences

It is always possible to split choice sequences $1, 2, \dots, T^L$ into two (or more) mutually exclusive sub-sequences $1, 2, \dots, T^Z$ and $T^Z + 1, \dots, T^L$. Then $f^Z_{FE}(Y_i) \subseteq f^L_{FE}(Y_i)$ for any $Y_i \in CS^{\star}_i = c$. The same holds for both $f_{FPH}$ and $f_{PPH}$. This method of making comparisons allows us to test for choice set stability in several alternative ways, but it does not enable us to test for preference homogeneity.

C.5 Choice Set Stability: $f_{FE}$ Example

In this section we will show with an example why $f^Z_{FE}(Y_i) \subseteq f^L_{FE}(Y_i)$ for any $Y_i \in CS^{\star}_i = c$ and, afterward, we will discuss how to use this fact to construct tests of choice set stability. Suppose $J = 5$, $T^L = 4$, and that consumer type $i$ is observed to make the choice sequence $Y_i = (j_1, j_2, j_3, j_4) = (3, 5, 5, 4)$, that is, alternative three in the first choice situation, alternative five in the second choice situation, etc. By considering the observed choice sequence “at once,” $Y_i = (3, 5, 5, 4)$ can be re-ordered in 12 different choice sequences: (3, 5, 5, 4), (5, 3, 5, 4), (5, 5, 3, 4), (5, 5, 4, 3), (4, 3, 5, 5), (3, 4, 5, 5), (3, 5, 4, 5), (5, 3, 4, 5), (5, 4, 3, 5), (5, 4, 5, 3), (4, 5, 3, 5), and (4, 5, 5, 3). Collect these sequences into the set $f^L_{FE}(Y_i) = l$. Assume that $V_{ijt}(X_{it}, \theta) = \delta_{ij} + X_{ijt}\beta$. Then, $i$'s likelihood contribution given $f^L_{FE}(Y_i) = l$ is:

$$\Pr\left[Y_i = (3, 5, 5, 4) \mid f^L_{FE}(Y_i) = l, \beta\right] = \frac{\exp\left((X_{i31} + X_{i52} + X_{i53} + X_{i44})\beta\right)}{\sum_{(j_1, j_2, j_3, j_4) \in f^L_{FE}(j_1, j_2, j_3, j_4) = l} \exp\left((X_{ij_1 1} + X_{ij_2 2} + X_{ij_3 3} + X_{ij_4 4})\beta\right)}. \tag{C.1}$$

Differently, by splitting $i$'s observed choice sequence into two mutually exclusive pairs of choices $Y_{i1} = (3, 5)$ and $Y_{i3} = (5, 4)$, we get $f^Z_{FE}(Y_{i1}) = \{(3, 5), (5, 3)\}$ and $f^Z_{FE}(Y_{i3}) = \{(5, 4), (4, 5)\}$. Then, $i$'s likelihood contribution given $f^Z_{FE}(Y_{i1}) = z_1$ and $f^Z_{FE}(Y_{i3}) = z_3$ is:

$$\Pr\left[Y_i = (3, 5, 5, 4) \mid f^Z_{FE}(Y_{i1}) = z_1, f^Z_{FE}(Y_{i3}) = z_3, \beta\right] = \frac{\exp\left((X_{i31} + X_{i52})\beta\right)}{\exp\left((X_{i31} + X_{i52})\beta\right) + \exp\left((X_{i51} + X_{i32})\beta\right)} \times \frac{\exp\left((X_{i53} + X_{i44})\beta\right)}{\exp\left((X_{i53} + X_{i44})\beta\right) + \exp\left((X_{i43} + X_{i54})\beta\right)}. \tag{C.2}$$

By multiplying the binomial logits in (C.2), we get:

$$\Pr\left[Y_i = (3, 5, 5, 4) \mid f^Z_{FE}(Y_i) = z, \beta\right] = \frac{\exp\left((X_{i31} + X_{i52} + X_{i53} + X_{i44})\beta\right)}{\sum_{(j_1, j_2, j_3, j_4) \in f^Z_{FE}(j_1, j_2, j_3, j_4) = z} \exp\left((X_{ij_1 1} + X_{ij_2 2} + X_{ij_3 3} + X_{ij_4 4})\beta\right)}, \tag{C.3}$$

where $f^Z_{FE}(Y_i) = z$ collects sequences: (3, 5, 5, 4), (3, 5, 4, 5), (5, 3, 5, 4), and (5, 3, 4, 5). Consequently $f^Z_{FE}(Y_i) = z \subseteq f^L_{FE}(Y_i) = l$.

In this example, $f^Z_{FE}$ only uses information about 4 of the 12 possible choice sequences in $f^L_{FE}$. This implies that if unobserved choice sets were stable, then estimator $\hat{\beta}^L_{FE}$ would be more efficient than $\hat{\beta}^Z_{FE}$. Moreover, the FE logit estimated on choice sub-sequences may “discard” some choice situations: in the current example of sub-sequences of length two, whenever $j_t = j_{t+1}$ in $Y_{it} = (j_t, j_{t+1})$, then “fragment” $Y_{it}$ of $Y_i$ will not be used in estimation. For example, if $i$ were observed to choose the sequence $Y_i = (3, 4, 5, 5)$, then only $Y_{i1} = (3, 4)$ would contribute to the likelihood function $l^Z_{FE}(\beta)$, while $l^L_{FE}(\beta)$ would still use the whole sequence $Y_i = (3, 4, 5, 5)$.
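The equivalence between the product of binomial logits in (C.2) and the single ratio in (C.3) can be verified numerically. The following Python sketch does so under illustrative assumptions: a scalar covariate stored in a hypothetical array `x[t, j]` drawn at random, an arbitrary value of `beta`, and helper names (`score`, `fe_prob`, `split_fe_prob`, `f_z`) that are not taken from the paper.

```python
from itertools import permutations, product
import numpy as np

def score(x, seq, beta):
    """exp of the summed index x[t, j_t] * beta along a choice sequence."""
    return np.exp(beta * sum(x[t, j] for t, j in enumerate(seq)))

def fe_prob(x, y, beta):
    """FE logit contribution: observed sequence against all its distinct re-orderings, as in (C.1)."""
    return score(x, y, beta) / sum(score(x, s, beta) for s in set(permutations(y)))

def split_fe_prob(x, y, beta, cut):
    """Product of FE logit contributions on y[:cut] and y[cut:], as in (C.2)."""
    return fe_prob(x[:cut], y[:cut], beta) * fe_prob(x[cut:], y[cut:], beta)

def f_z(y, cut):
    """Sequences obtained by re-ordering each sub-sequence separately."""
    first, second = set(permutations(y[:cut])), set(permutations(y[cut:]))
    return {a + b for a, b in product(first, second)}

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6))   # hypothetical covariates; products indexed 0..5
beta = -1.2                   # arbitrary illustrative coefficient
y = (3, 5, 5, 4)

c2 = split_fe_prob(x, y, beta, cut=2)
c3 = score(x, y, beta) / sum(score(x, s, beta) for s in f_z(y, 2))
print(len(set(permutations(y))), len(f_z(y, 2)), np.isclose(c2, c3))
```

Calling `split_fe_prob(x, (3, 4, 5, 5), beta, cut=2)` illustrates the discarded fragment: the second factor equals one, so only the first pair of choices affects the value.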
More precisely, if $Y_i = (3, 4, 5, 5)$ were observed, then $f^L_{FE}(3, 4, 5, 5) = f^L_{FE}(3, 5, 5, 4) = l$ would still contain the same 12 choice sequences, while model (C.2) would collapse to:

$$\Pr\left[Y_i = (3, 4, 5, 5) \mid f^Z_{FE}(Y_{i1}) = h_1, f^Z_{FE}(Y_{i3}) = h_3, \beta\right] = \frac{\exp\left((X_{i31} + X_{i42})\beta\right)}{\exp\left((X_{i31} + X_{i42})\beta\right) + \exp\left((X_{i41} + X_{i32})\beta\right)} \times \frac{\exp\left((X_{i53} + X_{i54})\beta\right)}{\exp\left((X_{i53} + X_{i54})\beta\right)}$$

$$= \frac{\exp\left((X_{i31} + X_{i42} + X_{i53} + X_{i54})\beta\right)}{\exp\left((X_{i31} + X_{i42} + X_{i53} + X_{i54})\beta\right) + \exp\left((X_{i41} + X_{i32} + X_{i53} + X_{i54})\beta\right)}$$

$$= \Pr\left[Y_i = (3, 4, 5, 5) \mid f^Z_{FE}(Y_i) = h, \beta\right], \tag{C.4}$$

which is also equivalent to $\Pr\left[Y_{i1} = (3, 4) \mid f^Z_{FE}(Y_{i1}) = h_1, \beta\right]$. In this case, then, $f^Z_{FE}(Y_{i1}) = h_1 \subseteq f^Z_{FE}(Y_i) = z \subseteq f^L_{FE}(Y_i) = l$. By Proposition 2, we can rank the corresponding estimators in terms of their relative efficiency. As a consequence, by splitting up choice sequences into mutually exclusive sub-sequences, we may also incur this further loss of efficiency.

Model (C.1) requires stronger assumptions than model (C.3) for its consistent estimation. Consistent estimation of model (C.1) requires that alternatives $\{3, 4, 5\} \subseteq CS^{\star}_{it} = c_t$, $t = 1, 2, 3, 4$. However, consistent estimation of model (C.3) only requires that $\{3, 5\} \subseteq CS^{\star}_{it} = c_t$, $t = 1, 2$ and that $\{4, 5\} \subseteq CS^{\star}_{it} = c_t$, $t = 3, 4$. In this example, if $4 \notin CS^{\star}_{it} = c_t$, $t = 1$ or $2$, or $3 \notin CS^{\star}_{it} = c_t$, $t = 3$ or $4$, then estimation of model (C.1) would not be consistent, while estimation of model (C.3) would.

These differences in consistency and relative efficiency suggest a Hausman-like test for unobserved choice set stability. If $\{3, 4, 5\} \subseteq CS^{\star}_{it} = c_t$, $t = 1, 2, 3, 4$, then estimation of both model (C.1) and model (C.3) would be consistent. However, estimation of model (C.1) would be more efficient than estimation of model (C.3). If $4 \notin CS^{\star}_{it} = c_t$, $t = 1$ or $2$, or $3 \notin CS^{\star}_{it} = c_t$, $t = 3$ or $4$, then only estimation of model (C.3) would be consistent.
It follows that, under the maintained assumption of unobserved preference heterogeneity in a form encompassed by $(i, j)$-specific fixed effects, a test for $H_0$: (choice set stability in 1, 2, 3 and 4) is $LR = 2\left[l^Z_{FE}(\hat{\beta}^Z_{FE}) + l_{\Delta}(\hat{\beta}_{\Delta}) - l^L_{FE}(\hat{\beta}^L_{FE})\right]$.

D A Simple Example

We present a simple example that demonstrates the consequences of unobserved choice set heterogeneity. Consider a choice environment where a population of consumers, $i = 1, \dots, N$, selects among one inside and one outside good, $j \in S = \{0, 1\}$. Assume preferences are given by

$$U_{i1} = X_i\theta + \epsilon_{i1}, \qquad U_{i0} = \epsilon_{i0},$$

with $X_i = X_{i1} - X_{i0}$ and $\epsilon_{ij} \sim$ Type I Extreme Value.

Let $CS^{\star}_i$ indicate consumer $i$'s true choice set, which can either include both options, $CS^{\star}_i = S = \{0, 1\}$, or the outside option only, $CS^{\star}_i = \{0\}$. In other words, some consumers have access to the inside product while others do not. Let $Y_i$ indicate whether consumer $i$ buys the inside product. Then:

$$Y_i = \begin{cases} 1 & \text{if } U_{i1} \ge U_{i0} \text{ and } CS^{\star}_i = \{0, 1\}, \\ 0 & \text{if } U_{i1} < U_{i0} \text{ and } CS^{\star}_i = \{0, 1\}, \text{ or } CS^{\star}_i = \{0\}. \end{cases} \tag{D.1}$$

Define $\rho$ as the proportion of the population that faces both options, i.e. $\Pr[CS^{\star}_i = \{0, 1\}] = \rho$, while $(1 - \rho)$ is the proportion that does not have access to the inside product. Let $p_i(X_i, \theta)$ measure the choice probability that $i$ would buy the inside product conditional on it being in her choice set,

$$p_i(X_i, \theta) = \Pr[U_{i1} > U_{i0} \mid CS^{\star}_i = \{0, 1\}, X_i, \theta] = \frac{e^{X_i\theta}}{1 + e^{X_i\theta}}. \tag{D.2}$$

The previous expression relies on the matching of consumers to choice sets being independent of the distribution of $\epsilon_{ij}$, an important assumption that we discuss in detail in the following section. Then, the unconditional probability that $i$ buys the inside good is

$$\Pr[Y_i = 1 \mid X_i, \theta] = \Pr[U_{i1} > U_{i0} \mid CS^{\star}_i = \{0, 1\}, X_i, \theta]\, \Pr[CS^{\star}_i = \{0, 1\}] = p_i(X_i, \theta)\rho. \tag{D.3}$$

For individuals with $CS^{\star}_i = \{0, 1\}$, the average marginal effect with respect to $X_{ik}$ (averaged over the distribution of $X_i$) is

$$E_X\left[\frac{\partial \Pr[Y_i = 1 \mid CS^{\star}_i = \{0, 1\}, X_i, \theta]}{\partial X_{ik}}\right] = E_X\left[\frac{\partial p_i(X_i, \theta)}{\partial X_{ik}}\right] = E_X\left[p_i(X_i, \theta)(1 - p_i(X_i, \theta))\right]\theta_k. \tag{D.4}$$

By contrast, for individuals with $CS^{\star}_i = \{0\}$, $\Pr[Y_i = 1 \mid CS^{\star}_i = \{0\}, X_i, \theta] = 0$ and the average marginal effect with respect to $X_{ik}$ is therefore zero,

$$E_X\left[\frac{\partial \Pr[Y_i = 1 \mid CS^{\star}_i = \{0\}, X_i, \theta]}{\partial X_{ik}}\right] = 0. \tag{D.5}$$

Consequently, the average marginal effect with respect to $X_{ik}$, averaged over the distribution of both $X_i$ and $CS^{\star}_i$, is

$$E_{X, CS^{\star}}\left[\frac{\partial \Pr[Y_i = 1, CS^{\star}_i \mid X_i, \theta]}{\partial X_{ik}}\right] = E_X\left[\frac{\partial \Pr[Y_i = 1 \mid CS^{\star}_i = \{0, 1\}, X_i, \theta]}{\partial X_{ik}}\right] \Pr[CS^{\star}_i = \{0, 1\}] = E_X\left[p_i(X_i, \theta)(1 - p_i(X_i, \theta))\right]\theta_k\rho. \tag{D.6}$$

As we will see later more in general, estimation of the structural parameters $\theta$ from a “standard” binary logit that ignores choice set heterogeneity will not typically be possible. With an inconsistent estimator $\hat{\theta}$, it is unclear what kind of object one would recover when plugging $\hat{\theta}$ in expression (D.4). As a consequence, any counterfactual simulation involving manipulations of the structural parameters $\theta$ will not be reliable.

However, even if the structural parameters $\theta$ cannot be consistently estimated by standard methods, in some very special cases, for instance when $X_i$ is normally distributed in the current example, a consistent estimator of the reduced form parameter (D.6) can be directly obtained by a linear regression of $Y_i$ on $X_i$ (see Wooldridge (2010, Section 15.6), summarizing Stoker (1986)). Note that such estimated average marginal effects will overstate the average marginal effects for those with $CS^{\star}_i = \{0\}$ and understate the average marginal effects for those with $CS^{\star}_i = \{0, 1\}$:

$$E_X\left[p_i(X_i, \theta)(1 - p_i(X_i, \theta))\right]\theta_k > E_X\left[p_i(X_i, \theta)(1 - p_i(X_i, \theta))\right]\theta_k\rho > 0. \tag{D.7}$$

Even in those rare cases in which the average marginal effects (D.6) can be directly estimated, their practical usefulness may be limited: the range of counterfactuals that one could reliably simulate by only knowing (D.6), rather than $\theta$ and $\rho$, is limited to situations in which there will be no changes in the unobserved distribution of choice sets, i.e. when $\rho$ is not affected by the counterfactual behaviors. For example, if a merger entailed a change in the marketing strategies or the distribution network for some of the products sold by the merging firms, then only knowing (D.6) may be insufficient for a realistic evaluation, in the sense that $\rho_{merger}$ may differ from $\rho$.

In order to overcome this problem, we propose a consistent estimator of the structural preference parameters $\theta$, which then will enable us to compute the elasticity bounds in (D.7). Since these elasticity bounds are consistent with any value of $\rho$, they will be informative also in the evaluation of those counterfactuals in which $\rho$ could potentially be affected.

References

Ackerberg, D., L. Benkard, S. Berry, and A. Pakes, “Econometric Tools for Analyzing Market Outcomes,” in J. J. Heckman and E. Leamer, eds., Handbook of Econometrics, North-Holland, 2007, chapter 63, pp. 4171–4276.
Allenby, Greg M and Peter E Rossi, “Marketing models of consumer heterogeneity,” Journal of Econometrics, 1998, 89 (1), 57–78.
Bernheim, B Douglas and Antonio Rangel, “Beyond Revealed Preference: Choice Theoretic Foundations for Behavioral Welfare Economics,” Quarterly Journal of Economics, 2009, pp. 51–104.
Berry, Steven, James Levinsohn, and Ariel Pakes, “Automobile prices in market equilibrium,” Econometrica, 1995, pp. 841–890.
Berry, Steven, James Levinsohn, and Ariel Pakes, “Differentiated Products Demand Systems from a Combination of Micro and Macro Data: The New Car Market,” Journal of Political Economy, 2004, pp. 68–105.
Bierlaire, Michel, Denis Bolduc, and Daniel McFadden, “The estimation of generalized extreme value models from choice-based samples,” Transportation Research Part B: Methodological, 2008, 42 (4), 381–394.
Blundell, Richard, “How revealing is revealed preference?,” Journal of the European Economic Association, 2005, 3 (2-2), 211–235.
Chamberlain, Gary, “Analysis of Covariance with Qualitative Data,” The Review of Economic Studies, 1980, 47, 225–238.
Chiang, Jeongwen, Siddhartha Chib, and Chakravarthi Narasimhan, “Markov chain Monte Carlo and models of consideration set and parameter heterogeneity,” Journal of Econometrics, 1998, 89 (1), 223–248.
Conlon, Christopher T and Julie Holland Mortimer, “Demand estimation under incomplete product availability,” American Economic Journal-Microeconomics, 2013, 5 (4), 1–30.
Debreu, Gerard, “Theory of value. Cowles Foundation monograph 17,” New Haven and London, 1959.
DellaVigna, S., “Psychology and Economics: Evidence from the Field,” Journal of Economic Literature, 2009, 47 (2), 315–372.
D’Haultfœuille, Xavier and Alessandro Iaria, “A Convenient Method for the Estimation of the Multinomial Logit Model with Fixed Effects,” Economics Letters, 2016, 141, 77–79.
Draganska, M., M. Mazzeo, and K. Seim, “Beyond Plain Vanilla: Modeling Joint Product Assortment and Pricing Decisions,” Quantitative Marketing and Economics, 2009, 7, 105–146.
Draganska, Michaela and Daniel Klapper, “Choice set heterogeneity and the role of advertising: An analysis with micro and macro data,” Journal of Marketing Research, 2011, 48 (4), 653–669.
Eliaz, Kfir and Ran Spiegler, “Consideration sets and competitive marketing,” The Review of Economic Studies, 2011, 78 (1), 235–262.
Fox, Jeremy T, “Semiparametric estimation of multinomial discrete-choice models using a subset of choices,” The RAND Journal of Economics, 2007, 38 (4), 1002–1019.
Goeree, Michelle Sovinsky, “Limited Information and Advertising in the U.S. Personal Computer Industry,” Econometrica, 2008, 76 (5), 1017–1074.
Griffith, Rachel and Martin O’Connell, “The Use of Scanner Data for Research into Nutrition,” Fiscal Studies, 2009, 30, 339–365.
Hausman, Daniel, “Mindless or Mindful Economics: A Methodological Evaluation,” in Andrew Caplin and Andrew Schotter, eds., The Foundations of Positive and Normative Economics: A Handbook, Oxford University Press, 2008, pp. 125–155.
Hausman, Jerry A and Paul A Ruud, “Specifying and testing econometric models for rank-ordered data,” Journal of Econometrics, 1987, 34 (1), 83–104.
Hausman, Jerry and Daniel McFadden, “Specification tests for the multinomial logit model,” Econometrica, 1984, pp. 1219–1240.
Imbens, Guido W and Charles F Manski, “Confidence intervals for partially identified parameters,” Econometrica, 2004, pp. 1845–1857.
Koszegi, B. and M. Rabin, “Mistakes in Choice-Based Welfare Analysis,” American Economic Review, 2007, 97 (2), 477–481.
Leicester, Andrew and Zoe Oldfield, “An analysis of consumer panel data,” IFS Working Papers W09/09, 2009.
Lerman, S. and C. Manski, “On the Use of Simulated Frequencies to Approximate Choice Probabilities,” in C. Manski and D. McFadden, eds., Structural Analysis of Discrete Data with Econometric Applications, MIT Press, 1981.
De los Santos, Babur, Ali Hortaçsu, and Matthijs R Wildenbeest, “Testing models of consumer search using data on web browsing and purchasing behavior,” American Economic Review, 2012, 102 (6), 2955–2980.
Lu, Zhentong, “Estimating Multinomial Choice Models with Unobserved Choice Sets,” 2016. Working paper, University of Wisconsin-Madison.
Manski, Charles F, “The structure of random utility models,” Theory and Decision, 1977, 8 (3), 229–254.
Manzini, Paola and Marco Mariotti, “Stochastic choice and consideration sets,” Econometrica, 2014, 82 (3), 1153–1176.
Masatlioglu, Yusufcan, Daisuke Nakajima, and Erkut Y Ozbay, “Revealed attention,” American Economic Review, 2012, pp. 2183–2205.
McFadden, Daniel, “Modeling the Choice of Residential Location,” in A. Karlqvist, L. Lundqvist, F. Snickars, and J. Weibull, eds., Spatial Interaction Theory and Planning Models, Vol. 1, North-Holland, 1978, pp. 75–96.
McFadden, Daniel and Kenneth Train, “Mixed MNL models for discrete response,” Journal of Applied Econometrics, 2000, 15 (5), 447–470.
Nevo, Aviv, “Measuring Market Power in the Ready-To-Eat Cereal Industry,” Econometrica, 2001, 69 (2), 307–342.
Van Nierop, Erjen, Bart Bronnenberg, Richard Paap, Michel Wedel, and Philip Hans Franses, “Retrieving unobserved consideration sets from household panel data,” Journal of Marketing Research, 2010, 47 (1), 63–74.
Pancras, Joseph and K Sudhir, “Optimal marketing strategies for a customer data intermediary,” Journal of Marketing Research, 2007, 44 (4), 560–578.
Pesendorfer, Martin, “Retail sales: A study of pricing behavior in supermarkets,” The Journal of Business, 2002, 75 (1), 33–66.
Reutskaja, Elena, Rosemarie Nagel, Colin F Camerer, and Antonio Rangel, “Search dynamics in consumer choice under time pressure: An eye-tracking study,” The American Economic Review, 2011, 101 (2), 900–926.
Ruud, Paul A, “Tests of specification in econometrics,” Econometric Reviews, 1984, 3 (2), 211–242.
Samuelson, P.A., “A Note on the Pure Theory of Consumer’s Behaviour,” Economica, 1938, 5 (17), 61–71.
Spiegler, Ran, Bounded Rationality and Industrial Organization, Oxford University Press, 2011.
Stoker, Thomas M, “Consistent estimation of scaled coefficients,” Econometrica, 1986, pp. 1461–1481.
Train, K., Discrete Choice Methods with Simulation, Cambridge University Press, 2009.
Varian, H., “Revealed Preference,” in M. Szenberg, L. Ramrattan, and A.A. Gottesman, eds., Samuelsonian Economics and the Twenty-First Century, New York: Oxford University Press, 2006, pp. 99–115.
Villas-Boas, J Miguel and Ying Zhao, “Retailer, manufacturers, and individual consumers: Modeling the supply side in the ketchup marketplace,” Journal of Marketing Research, 2005, 42 (1), 83–95.
Wooldridge, Jeffrey, Econometric Analysis of Cross-Section and Panel Data, 2nd ed., MIT Press, 2010.
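
Supplementary numerical illustration of (D.4) to (D.7). The attenuation of the average marginal effect by ρ, and the ability of a linear regression to recover (D.6) when the regressor is normally distributed, can be checked by simulation. The sketch below is not part of the original analysis. It assumes a single standard normal regressor, choice sets drawn independently of the regressor, and illustrative values for the intercept, slope, ρ, and sample size; the function names `simulate` and `naive_logit` are hypothetical. It also fits a standard binary logit that ignores choice set heterogeneity, using a hand-written Newton-Raphson loop, to show that its slope need not match the true one.

```python
import numpy as np


def simulate(n=200_000, theta0=0.5, theta1=1.0, rho=0.6, seed=0):
    """Binary choice where only a share rho of individuals can choose option 1.

    All parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)                       # X_i ~ N(0, 1)
    p = 1.0 / (1.0 + np.exp(-(theta0 + theta1 * x)))  # logit probability p_i
    full = rng.random(n) < rho                        # CS_i* = {0, 1} with prob. rho
    # Y_i = 1 requires option 1 to be in the choice set and to be chosen.
    y = (full & (rng.random(n) < p)).astype(float)
    return x, p, y


def naive_logit(x, y, iters=25):
    """Binary logit MLE by Newton-Raphson, ignoring choice set heterogeneity."""
    z = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(iters):
        q = 1.0 / (1.0 + np.exp(-(z @ beta)))
        grad = z.T @ (y - q)                          # score
        hess = (z * (q * (1.0 - q))[:, None]).T @ z   # negative Hessian
        beta = beta + np.linalg.solve(hess, grad)
    return beta


theta1, rho = 1.0, 0.6
x, p, y = simulate(theta1=theta1, rho=rho)

ame_full = np.mean(p * (1.0 - p)) * theta1   # sample analogue of (D.4)
ame_pop = rho * ame_full                     # sample analogue of (D.6)
ols_slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)  # regression of Y on X
naive_slope = naive_logit(x, y)[1]           # slope from the misspecified logit

print(f"AME, full choice set (D.4): {ame_full:.4f}")
print(f"AME, population (D.6):      {ame_pop:.4f}")
print(f"OLS slope of Y on X:        {ols_slope:.4f}")
print(f"Naive logit slope:          {naive_slope:.4f} (true slope {theta1})")
```

Under these assumptions the OLS slope should be close to the sample analogue of (D.6), which lies strictly between zero and the sample analogue of (D.4), as in (D.7). The naive logit slope illustrates why plugging an inconsistent estimator into (D.4) gives an object that is hard to interpret.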