How Many Choice Contexts are There?

Annie Liang*

June 25, 2016

Abstract

Recent evidence suggests that many choice datasets reflect maximization of several context-dependent orderings, rather than a single context-independent ordering. This paper asks whether it is possible to identify the number of context-dependent orderings maximized in the data from choice observations alone (i.e., without explicit labeling of contexts). I study a model in which a decision-maker imperfectly maximizes a set of context-dependent preferences, and propose an approach for recovering the number of context-dependent preferences. The main results show that exact recovery of the number of context-dependent orderings is feasible with probability exponentially close to 1 (in the quantity of data) using the proposed approach.

*Microsoft Research and University of Pennsylvania. Email: [email protected]. I am especially grateful to Jerry Green. I would also like to thank Emily Berger, Gabriel Carroll, Drew Fudenberg, Ben Golub, David Laibson, Eric Maskin, Jose Montiel Olea, Gleb Romanyuk, Andrei Shleifer, Ran Shorrer, Tomasz Strzalecki, and Yufei Zhao for useful comments and discussions.

1 Introduction

Let X be a finite set of choice alternatives, and suppose that a decision-maker (DM) chooses from many subsets A ⊆ X. In the classical approach, the analyst assumes that the DM maximizes a single ordering over X in each of these choices. But there is now ample evidence that many choice datasets reflect maximization of several context-dependent orderings, rather than a single context-independent ordering. For example, risk preferences over insurance plans can vary depending on the domain of insurance (Einav, Finkelstein, Pascu & Cullen 2012), and preferences over consumer goods can vary depending on ordering effects and decoys (Kahneman & Tversky 2000, Huber, Payne & Puto 1982). Moreover, many choice datasets are aggregated over a population of DMs with different preferences (Crawford & Pendakur 2012). [Footnote 1: In this last example of a population of DMs, we should think of each context as representing a class of DMs.]

Despite evidence that choice is context-dependent in many settings, context-independence remains the standard assumption, in part due to the difficulty of discerning choice contexts. Descriptions of choice settings are not always available, and when they are, it may not be apparent which aspects are relevant to choice. For example, if individuals maximize the same preferences over health insurance and dental insurance, but a different preference for their 401(k) investments, the analyst need not know that these are the relevant groupings. This paper asks whether it is possible to identify the number of contexts from choice observations alone.

The number of context-dependent preferences that are maximized in the data is important because it informs the appropriateness of reduction to a single context-independent ordering. This is relevant for welfare assessment: identification of context-dependency can tell us whether inconsistencies in choice data reflect error (and are hence uninformative about the DM's true preferences), or reflect rational maximization (and hence should be considered for welfare analysis). The number of contexts is also relevant to parameter estimation: it tells us whether it makes sense to estimate preference parameters on the full dataset, or whether improvement can be attained by conditioning on a relevant subsample.
Finally, the number of contexts is relevant for prediction of future choices: it tells us whether predictive power is maximized by inferring a single ordering for prediction, or can be improved by inferring several context-dependent orderings.

Towards this goal, I study a model in which the classical setting is enriched by a set of choice contexts F (this follows the approach of Bernheim & Rangel (2009) and Rubinstein & Salant (2008)), and the DM possesses a set of context-dependent preferences {≻_f}_{f∈F}. I assume that the DM imperfectly maximizes these preferences, so that given choice set A and context f, the DM chooses the f-optimal alternative in A with probability 1 − p, and errs with probability p. [Footnote 2: The DM's choice rule is thus a stochastic mapping from choice problems (A, f) to distributions over A.] No parametric restrictions are imposed on the shape of error. The analyst wants to know the number of choice contexts |F|, but observes only the DM's choices and the choice sets they were selected from. The question of interest is whether the analyst can recover the number of contexts |F| given this information alone.

The intrinsic challenge in this problem is the multiplicity of mappings from choice data to context-dependent orderings. At one extreme, we can preserve the single-ordering model by identifying a "best-fit" ordering for the data (e.g., as proposed by Houtman & Maks (1985)). At another extreme, Kalai, Rubinstein & Spiegler (2002) suggest identifying the smallest number of orderings such that every observation is consistent with maximization of some ordering. These approaches promise disciplined elicitation of the number of contexts, but the former is inappropriate if the number of contexts is greater than one, and the latter is inappropriate if the probability of error is positive (since errors will be mistaken for preference heterogeneity). This paper thus suggests a new, intermediate solution that linearly trades off the fit of the model to the data (as measured by the number of unexplained observations) and the number of orderings used. The tradeoff between these two goals is mediated by a constant λ, where the Houtman & Maks (1985) and Kalai, Rubinstein & Spiegler (2002) solutions correspond to special cases of λ.

The main results show that the proposed approach recovers the correct number of contexts with probability converging to 1 in the quantity of data. Specifically, if the choice problems that the DM is presented with are drawn from a distribution that satisfies certain conditions, then there is an "optimal" set of choices for λ such that the number of inferred contexts converges to |F| as the quantity of data is taken to infinity. The set of choices for λ is determined by three features: the true number of contexts, the probability p that the DM errs in each choice, and the "differentiation" in the choice implications of the DM's context-dependent orderings.

Section 5 develops and discusses these results for the special case in which the distribution over choice sets and contexts is uniform, and Section 6.1 generalizes to distributions satisfying a condition of "limit preference differentiation." Section 6.2 extends the approach and the recovery results to DMs with continuous preferences over a non-finite set X. Section 7 discusses limitations of the main recovery results: while we can recover the number of contexts, it is not in general possible to recover the context-dependent orderings themselves.
I discuss possible extensions of the approach to recover further details of preference. Section 8 is the literature review, and Section 9 concludes.

2 Choice Model

Let X be a finite set of choice alternatives, with A ⊆ X denoting a typical choice set, and 2^X denoting the set of possible choice sets. Following Rubinstein & Salant (2008) and Bernheim & Rangel (2009), let F be a finite set of K choice contexts. A choice problem is a pair (A, f) ∈ 2^X × F consisting of a choice set A and a choice context f. I assume that each choice context f cues a (distinct) strict preference relation ≻_f, which the DM imperfectly maximizes. Formally, the DM's choice rule is a map P from choice problems (A, f) to distributions over the elements of A. I will use x_{A,f} to denote the ≻_f-maximal element in A (the alternative that a perfectly maximizing DM would choose). The constant

p := min{p′ : P(A, f)[x_{A,f}] ≥ 1 − p′ for all (A, f)}

is a uniform upper bound on the DM's probability of error, and will figure importantly in the main results.

Remark 2.1. Special cases of this model include the following: (a) p = 0, K = 1 returns perfect maximization of a single ordering. (b) p = 0, K > 1 returns a model closely related to ideas proposed in Rubinstein & Salant (2008) and Bernheim & Rangel (2009). If, moreover, F ⊆ 2^X, so that the set of contexts partitions the set of choice sets, then we have the multiple preference model considered in Kalai, Rubinstein & Spiegler (2002).

A total of n choice problems are repeatedly sampled from a distribution π over the set of unique choice problems 2^X × F. In each choice problem, the DM chooses according to his choice rule P. This induces the realizations

D* = {(x, (A, f)) | (A, f) ∈ A},

where A ⊆ 2^X × F is the realized set of choice problems, and (x, (A, f)) denotes choice of x given choice problem (A, f). Instead of observing D*, the analyst observes

D = {(x, A) | (x, (A, f)) ∈ D*},

which includes information about the DM's choices and choice sets, but not the contexts. Throughout, I will refer to a pair (x, A) as a choice observation. The question of interest is the following: when, and under what conditions, can an analyst recover the number of contexts K := |F| from the observed dataset of choices D?

3 Example

Let us first consider a simple example. Suppose that a DM makes repeated choices between alternatives in X = {x1, x2, x3, x4}. His choices depend on the weather; in particular, on whether it is a rainy (R) or sunny (S) day. On rainy days, with probability 1 − p < 1 he maximizes x1 ≻_R x2 ≻_R x3 ≻_R x4, and otherwise he trembles uniformly over the other alternatives. On sunny days, with probability 1 − p < 1 he maximizes x4 ≻_S x3 ≻_S x2 ≻_S x1, and otherwise he trembles uniformly over the other alternatives. The analyst does not know that rain and sun are relevant to the DM's choices, and does not know which choices were made on rainy or sunny days. If he observes only choice sets and realized choices, can he infer that the DM was influenced by two choice contexts?
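To make this data-generating process concrete, the following is a minimal simulation sketch of the two-context DM just described. The tremble probability, the sample size, and the uniform sampling of menus and contexts are illustrative assumptions for this sketch only; as in the model, the analyst's dataset records the choice and the menu but never the context.

```python
# Minimal simulation of the rainy/sunny decision-maker from the example above.
# The error probability p and the number of draws are illustrative assumptions.
import itertools
import random

X = ["x1", "x2", "x3", "x4"]
ORDERINGS = {
    "R": ["x1", "x2", "x3", "x4"],   # x1 >_R x2 >_R x3 >_R x4
    "S": ["x4", "x3", "x2", "x1"],   # x4 >_S x3 >_S x2 >_S x1
}
p = 0.1  # tremble probability (assumed for illustration)

def choose(menu, context, rng):
    """Return the context-optimal element with prob. 1 - p; otherwise tremble uniformly."""
    best = min(menu, key=ORDERINGS[context].index)  # highest-ranked available alternative
    if rng.random() < p and len(menu) > 1:
        return rng.choice([x for x in menu if x != best])
    return best

rng = random.Random(0)
# Sample choice problems (A, f) uniformly over non-singleton menus and the two contexts.
menus = [list(A) for r in range(2, 5) for A in itertools.combinations(X, r)]
data = []  # the analyst observes only (choice, menu); the context f is discarded
for _ in range(50):
    A, f = rng.choice(menus), rng.choice(["R", "S"])
    data.append((choose(A, f, rng), tuple(A)))
```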
To fix ideas, let us consider a particular realization of the data, in which the DM is presented with every (nonempty and non-singleton) choice problem in 2^X × {R, S}, and selects x_{A,f} (the ≻_f-maximal element of A) in every choice problem (A, f), except in the following two choices:

(x2, ({x1, x2, x3}, R)) and (x3, ({x1, x3, x4}, S)).   (1)

[Footnote 3: In the former, the DM should have chosen x1, the ≻_R-maximal element of {x1, x2, x3}, but instead chose x2; in the latter, the DM should have chosen x4, the ≻_S-maximal element of {x1, x3, x4}, but instead chose x3.]

Then there are 18 total observations, eight of which are consistent with maximization of ≻_R, eight of which are consistent with maximization of ≻_S, and two of which are not consistent with maximization of either.

There are many possible rationalizations of this data. For example, the analyst can assign the single ordering ≻_R to the DM, explaining eight observations and interpreting the remaining observations as choice error. This is consistent with the proposal of Houtman & Maks (1985), which determines the largest subset of choice observations that is consistent with a single ordering. [Footnote 4: This solution is not unique. For example, ≻_S also explains eight observations.] The approach, however, fails to identify the context-dependency in the DM's choice behavior. At another extreme, the analyst can perfectly describe the DM's preferences using the set {≻_R, ≻_S, ≻_C}, where x2 ≻_C x3 ≻_C x4 ≻_C x1. This is consistent with the proposal of Kalai, Rubinstein & Spiegler (2002): the smallest set of orderings such that every observation is consistent with maximization of some ordering. This approach, however, over-fits the observed data, interpreting the error in (1) as genuine heterogeneity in preference.

The present paper proposes a solution intermediate to those described above. The idea is to identify the fewest orderings that explain as many observations in the data as possible. Intuitively, two orderings can explain almost all of the observations, while decreasing to a single ordering is costly (leaving eight additional observations unexplained), and adding one additional ordering is not very useful (explaining only two additional observations). Formally, let ∆_{D,k} be the minimal number of choice observations that must be left unexplained if we use k orderings to rationalize the data, noting from the above observations that ∆_{D,1} = 10, ∆_{D,2} = 2, and ∆_{D,3} = 0. Define

K*_λ = argmin_{k ∈ Z+} (k + λ∆_{D,k}),

where λ is any constant in the interval (1/8, 1/2). Then K*_λ = 2, recovering the "correct" number of contexts.

Where did the choice of λ ∈ (1/8, 1/2) come from, and how special is recovery of K*_λ = 2 to the particular (constructed) dataset we considered? Theorem 1 generalizes the ideas here, providing conditions and choices of λ for which recovery obtains for most datasets, where "most" is quantified by a probability of realization that approaches 1 as the number of observations increases.

4 Proposed Approach

I will now describe the approach in more general terms. Let D be any (multi-)set of choice observations, and suppose we wish to rationalize these observations using a set of preference orderings Λ. Define

∆_{D,Λ} := #{(x, A) ∈ D : x is not ≻-maximal in A for any ≻ ∈ Λ}

to be the number of observations in D that are not consistent with maximization of any ordering in Λ. I will refer to ∆_{D,Λ} as the implied choice error.
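As a concrete illustration, here is a minimal sketch of how the implied choice error ∆_{D,Λ} can be computed for a candidate set of orderings. Representing each ordering as a list from best to worst, and each observation as a (choice, menu) pair, are assumptions of this sketch rather than notation from the paper.

```python
# Sketch: the implied choice error Delta_{D,Lambda} for a candidate set of orderings.
# Each ordering must rank every alternative that appears in some menu.
def implied_choice_error(data, orderings):
    """Count observations (x, A) in which x is not maximal in A under any ordering."""
    def is_maximal(x, menu, order):
        rank = {alt: i for i, alt in enumerate(order)}  # smaller index = more preferred
        return all(rank[x] <= rank[y] for y in menu)
    return sum(
        1 for x, A in data
        if not any(is_maximal(x, A, order) for order in orderings)
    )

# Three hypothetical observations; only the first is inconsistent with x1 > x2 > x3 > x4.
D = [("x2", {"x1", "x2", "x3"}), ("x1", {"x1", "x2"}), ("x3", {"x3", "x4"})]
print(implied_choice_error(D, [["x1", "x2", "x3", "x4"]]))  # -> 1
```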
Let

∆_{D,k} := min_{|Λ|=k} ∆_{D,Λ}

denote the smallest implied choice error that is obtainable by use of some set of k orderings; naturally, ∆_{D,k} is decreasing in k. Following Kalai, Rubinstein & Spiegler (2002), say that choice data D is k-rationalizable if ∆_{D,k} = 0.

Remark 4.1. For every dataset D, there exists a constant L ≤ min{|D|, |X|} such that D is k-rationalizable for every k ≥ L.

I now propose an estimate for the number of contexts:

Definition 4.1. For every λ ∈ R+, define

K*_λ = argmin_{k ∈ Z+} (k + λ∆_{D,k}).   (2)

Remark 4.2. In an abuse of notation, I will refer to K*_λ as a singleton, although this solution may not be unique.

The constant λ arbitrates between the two goals of minimizing the number of orderings and minimizing the number of implied choice errors. Intuitively, 1/λ is the "cost" of each ordering, so that an ordering is attributed to the DM if and only if it explains at least 1/λ observations that would otherwise be interpreted as choice error. As λ varies over (0, ∞), the solution to (2) traces out the optimal trade-off curve between the quantity of choice errors and the number of orderings. Notice in particular that as λ → 0, errors become approximately free (relative to the cost of orderings), so that the analyst prefers to adopt a unique ordering for the agent and interpret the remaining observations as error. For large choices of λ, new orderings are approximately free (relative to the cost of error), so the analyst prefers to use as many orderings as necessary to eliminate choice error. At the edges, any choice of λ < 1/∆_{D,1} returns the Houtman & Maks (1985) solution, and any choice of λ > 1 returns the Kalai, Rubinstein & Spiegler (2002) solution.

Observation 1. (a) K*_λ = 1 for every λ < 1/∆_{D,1}. (b) K*_λ = L for every λ > 1, where L is the smallest integer such that the data is L-rationalizable.

How should the analyst choose λ, and under what conditions on the data-generating process does the proposed approach recover the "true" number of orderings maximized by the agent?

5 Main Result

For clarity of exposition, I will first take the sampling distribution π to be uniform over 2^X × F (this section), and subsequently allow for general sampling distributions in Section 6.

5.1 No-error benchmark (p = 0)

Suppose first that p = 0, so that the DM always chooses the ≻_f-maximal alternative given choice problem (A, f). The following example illustrates that recovery of the number of contexts K can be infeasible even in this idealized benchmark. Let X = {x1, x2, x3} and suppose that there are three contexts: rainy (R), sunny (S), and cloudy (C). The DM's preferences are {≻_R, ≻_S, ≻_C}, where

x1 ≻_R x2 ≻_R x3
x1 ≻_S x3 ≻_S x2
x3 ≻_C x2 ≻_C x1

Notice that every choice observation consistent with maximization of some ordering in {≻_R, ≻_S, ≻_C} is also consistent with maximization of an ordering in {≻1, ≻2}, where

x1 ≻1 x2 ≻1 x3
x3 ≻2 x2 ≻2 x1

The set of orderings {≻1, ≻2} thus achieves the same choice error as {≻_R, ≻_S, ≻_C}, but uses strictly fewer orderings, and is thus preferred under the criterion in (2). This illustrates that recovery of the number of contexts K using (2) requires sufficient "diversity" in the choice implications of the DM's context-dependent preferences. In what follows, I define one such notion of choice diversity. First, a preliminary definition:

Definition 5.1. Say that choice problems A = {(A1, f1), . . . , (AK, fK)} are in K-violation of IIA if

1. x_{Ai,fi} ≠ x_{Aj,fj} whenever i ≠ j, and
2. x_{Ai,fi} ∈ ∩_{j=1}^K Aj for every i.

The first condition requires that every choice problem in A has a distinct optimal choice, and the second condition requires that each of these K alternatives is available in every Ai. [Footnote 5: Notice that every pair of choice problems from A constitutes a (standard) violation of IIA.] An immediate implication is that the set of choice observations {(x_{Ai,fi}, Ai)}_{i=1}^K cannot be rationalized using fewer than K orderings. A check of these two conditions is sketched below.
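The following is a small sketch of that check. Representing each choice problem by the pair (menu, optimal alternative) is an assumption of the sketch; the contexts themselves are not needed to verify the two conditions.

```python
# Sketch: check whether K choice problems are in K-violation of IIA (Definition 5.1).
# Each problem is given as (menu, optimum), where optimum is the f-maximal element x_{A,f}.
def in_K_violation_of_IIA(problems):
    menus = [set(menu) for menu, _ in problems]
    optima = [optimum for _, optimum in problems]
    distinct_optima = len(set(optima)) == len(optima)     # condition 1: distinct optima
    common = set.intersection(*menus)                     # alternatives present in every menu
    jointly_available = all(x in common for x in optima)  # condition 2: each optimum everywhere
    return distinct_optima and jointly_available

# The problems ({x1,x2,x3}, R) and ({x1,x2,x3}, S) from Section 3 (optima x1 and x3)
# form a 2-violation of IIA:
print(in_K_violation_of_IIA([({"x1", "x2", "x3"}, "x1"),
                             ({"x1", "x2", "x3"}, "x3")]))  # -> True
```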
Definition 5.2. The differentiation parameter d is the largest integer such that 2^X × F admits d non-overlapping subsets, denoted A1, . . . , Ad, where for every i ∈ {1, . . . , d},

1. |Ai| = K, and
2. Ai is in K-violation of IIA.

That is, the set of choice problems 2^X × F includes at least d (non-overlapping) sets of K choice problems, each of which is in K-violation of IIA.

Example 5.1. The differentiation parameter of {≻_R, ≻_S, ≻_C} is 0, since there do not exist any sets {(A1, f1), (A2, f2), (A3, f3)} in 3-violation of IIA.

Example 5.2. Consider X = {x1, . . . , xn} and {≻_f, ≻_f′}, where

x1 ≻_f x2 ≻_f · · · ≻_f xn
xn ≻_f′ xn−1 ≻_f′ · · · ≻_f′ x1

The differentiation parameter of {≻_f, ≻_f′} is 2^n − n − 1, since every set {(A, f), (A, f′)} with |A| > 1 is in 2-violation of IIA.

Then, if d > 0, so that there is at least one set of choice problems A in K-violation of IIA, we can choose λ > 1 to return the correct number of contexts (this is the Kalai, Rubinstein & Spiegler (2002) solution).

Claim 1. Suppose p = 0 and d > 0. Then, for any λ > 1, Pr(K*_λ = K) → 1 as n → ∞.

The proof is obvious and omitted. [Footnote 6: In fact, two stronger statements can be made. First, consider any realization of choice problems A ⊇ 2^X × F; then, for every λ > 1, it holds that K*_λ = K. Second, the assumption that d > 0 is not necessary; it is sufficient that the set of observations {(x_{A,f}, A) : A ∈ 2^X, f ∈ F} is K-rationalizable, and not k-rationalizable for any k < K.]

5.2 Error (p > 0)

When the DM imperfectly maximizes, however, λ > 1 will not always be the optimal choice for recovery of K. In particular, the presence of error complicates inference in two ways.

First, error may generate choices that cannot be rationalized using the same ordering, but which should not be mistaken as representing genuine heterogeneity in preference. For example, consider a decision-maker who has the single preference ordering ≻ and errs with some small but positive probability p. Then a perfect rationalization of n observations is likely to require many more than a single ordering, even while the single ordering explains "almost all" observations. Recall our previous intuition that the criterion in (2) prices each new ordering at 1/λ observations. In view of this, the current example suggests that λ must be small enough (1/λ large enough) that an ordering which solely explains error will not be attributed to the DM. The precise notion of "small enough" will depend on the probability of error p: the larger p is, the smaller λ must be.

Second, allowing for error may "undo" genuine heterogeneity, allowing the imperfectly maximized dataset to be explained using fewer orderings than the perfectly maximized dataset. Consider for example a decision-maker with the following two preferences:

x1 ≻1 x2 ≻1 · · · ≻1 x99 ≻1 x100
x1 ≻2 x2 ≻2 · · · ≻2 x100 ≻2 x99

Most choice observations that result from maximizing ≻2 can be rationalized using ≻1, and vice versa. Thus, if 1/λ is large, the solution to (2) will return a single ordering, interpreting the remaining observations as error. This example suggests that λ must be large enough (1/λ small enough) that real heterogeneity in preference is preserved, even in the presence of error.
The restrictiveness of this constraint depends again on p, as well as on the degree of "diversity" in the choice implications of the context-dependent preferences: the smaller p is, and the more differentiated the context-dependent orderings are, the smaller λ can be.

Collecting these ideas, the main theorem provides a set of sequences (λn)n≥1 given which the proposed approach recovers the true number of contexts for most realized datasets, where "most" is quantified by a probability converging to 1.

Theorem 1. For any sequence (λn)n≥1, where each

λn ∈ ( K·2^{|X|} / (d(1−p)^K n), 1/(pn) ),

we have Pr(K*_{λn} = K) → 1 as n → ∞. Moreover, the rate of convergence is exponential in n.

I provide a brief idea of the proof below, and defer the details to the appendix. The key idea is to identify every dataset with an undirected (hyper)graph [Footnote 7: A hypergraph is a generalization of a graph in which edges may connect more than two vertices.] in the following way: nodes represent choice observations, and there is an edge between a set of observations if and only if these observations are not consistent with maximization of any single ordering. The key observation in the proof is that a dataset is k-rationalizable if and only if the corresponding graph is k-colorable. [Footnote 8: A k-coloring of a graph is a partition of its vertex set V into k color classes such that no edge in E is monochromatic. A graph is k-colorable if it admits a k-coloring.] This equivalence is shown by taking each color class to represent consistency with a distinct ordering. Thus, the problem in (2) can be re-interpreted as trading off the number of colors k against the number of nodes that must be removed for the remaining graph to be k-colorable.

Fix any (multi-)set of choice problems A and consider the dataset that would be generated under p = 0. Since, by construction, this dataset is generated by perfect maximization of K orderings, the corresponding (hyper)graph must admit a K-coloring. Moreover, since every set of observations in K-violation of IIA constitutes a complete K-partite subgraph, and the data includes at least d such sets, the corresponding graph includes at least d complete K-partite subgraphs, so it cannot be colored by fewer than K colors. The differentiation parameter d captures the "robustness" of this observation to small perturbations of the graph.

Now introduce choice error. Each node (observation) is affected with probability p, following which its edges are changed. If p is sufficiently small, then there will not be enough nodes in error to justify use of a new ordering. And if d is large enough, so that the number of complete K-partite subgraphs in the original graph is sufficiently high, then any coloring of the graph with fewer than K colors will be very incomplete. Thus, most nodes in the perturbed graph, which corresponds to the realized data, can be partitioned into K color classes, and no fewer.
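The coloring reformulation above is what the proofs work with; for very small problems, however, the estimator in (2) can also be computed directly from its definition. The following brute-force sketch assumes that X is small enough to enumerate every linear ordering; it is meant only to illustrate the definitions of ∆_{D,k} and K*_λ, not as a practical algorithm.

```python
# Brute-force sketch of Delta_{D,k} and of the estimator K*_lambda in (2).
# Orderings are represented as permutations of X from best to worst (an assumption
# of this sketch). The search is exponential in |X| and k, so only tiny examples run.
import itertools

def delta_k(data, X, k):
    """Smallest implied choice error attainable with k orderings."""
    def consistent_with(order):
        rank = {alt: i for i, alt in enumerate(order)}
        return {i for i, (x, A) in enumerate(data) if all(rank[x] <= rank[y] for y in A)}
    coverage = [consistent_with(order) for order in itertools.permutations(X)]
    best_covered = 0
    for combo in itertools.combinations(coverage, k):
        best_covered = max(best_covered, len(set().union(*combo)))
    return len(data) - best_covered

def K_lambda(data, X, lam, k_max=None):
    """argmin_k  k + lam * Delta_{D,k}  over k = 1, ..., k_max."""
    k_max = k_max or len(X)
    return min(range(1, k_max + 1), key=lambda k: k + lam * delta_k(data, X, k))

# Example: three observations, the third of which conflicts with the first.
X = ("x1", "x2", "x3", "x4")
D = [("x1", ("x1", "x2")), ("x4", ("x3", "x4")), ("x2", ("x1", "x2", "x3"))]
print(delta_k(D, X, 1), K_lambda(D, X, lam=1.5))  # -> 1 2
```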
6 Generalizations

6.1 Relaxing the assumption on π

Section 5 focused on the case in which π is uniform over choice problems. The recovery properties of the proposed approach are not special to this distribution, and can in some cases be improved using others. First, let us generalize the previous notion of a differentiation parameter in the following way.

Definition 6.1. For every set of choice problems A, define d(A) to be the largest integer d′ such that A admits d′ non-overlapping subsets, denoted A1, . . . , Ad′, such that for every i ∈ {1, . . . , d′},

1. |Ai| = K, and
2. Ai is in K-violation of IIA.

Taking A = 2^X × F returns the definition from Section 5.1. Then:

Proposition 1. Let π be any distribution with the property that

(1/n) ∫_{ {A : |A| = n} } d(A) dπ(A) → δ   as n → ∞,   (3)

where δ > 0. Then, for any sequence (λn)n≥1, where each λn ∈ ( 1/(δ(1−p)^K n), 1/(pn) ),

Pr(K*_{λn} = K) → 1   as n → ∞.

The condition in (3) says that the differentiation parameter of the realized set of choice problems A grows, in expectation, at rate δn as the number of observations is taken to infinity. Thus, given sufficiently many observations, there are approximately δn sets of choice problems in K-violation of IIA. This allows us to distinguish the various orderings. Notice that the special case in which π is uniform satisfies the condition in (3) with δ = d/(K·2^{|X|}). In general, the larger δ is, the better the recovery properties. For example, we can improve upon the main result in Section 5 if π samples only from choice sets for which the different context-dependent orderings disagree; of course, this presumes that the analyst has prior knowledge about the orderings.

6.2 Continuous utility functions

So far, we have considered a DM whose preferences are linear orderings over a discrete set X. I will now show that the results extend in a simple and natural way to the case in which his preferences are a set of continuous utility functions {u_f}_{f∈F}, u_f : X → R, where (X, τ) is a topological space. Specifically, let us suppose that the DM is presented with compact choice sets A ⊆ X, and chooses according to a stochastic choice rule P, which takes choice problems (A, f) into distributions over A. [Footnote 9: Every P(A, f) is a Borel-measurable distribution over X with support contained in A.] The sampling distribution π is a (Borel-measurable) distribution over choice problems (A, f). [Footnote 10: Take the topology over choice problems to be the product topology of τ and the discrete topology on F.] The sampling distribution π and stochastic choice rule P induce

D* = {(x, (A, f)) | (A, f) ∈ A},

and again we assume that the analyst observes only

D = {(x, A) | (x, (A, f)) ∈ D*}.   (4)

The analogue of the proposed approach then defines

∆̄_{D,k} = min_{|U|=k} #{(x, A) ∈ D : x ≠ argmax_{x′∈A} u(x′) for every u ∈ U},

and the proposed solution is:

Definition 6.2. For every λ ∈ R+, define

K̄*_λ ∈ argmin_{k ∈ Z+} (k + λ∆̄_{D,k}).   (5)

For the corresponding statement of the recovery result, define

x_{A,f} = argmax_{x ∈ A} u_f(x)

to be the choice that a perfectly maximizing DM makes given choice problem (A, f), and define

p = min{p′ : P(A, f)[x_{A,f}] ≥ 1 − p′ for all (A, f)}

to be a (uniform) upper bound on the probability of error. Take d(A) to be the differentiation parameter defined in Section 6.1. The following statement is a corollary to Proposition 1:

Corollary 1. Let π be any (Borel-measurable) distribution with the property that

(1/n) ∫_{ {A : |A| = n} } d(A) dπ(A) → α   as n → ∞.   (6)

Then, for any sequence (λn)n≥1, where each λn ∈ ( 1/(α(1−p)^K n), 1/(pn) ),

Pr(K̄*_{λn} = K) → 1   as n → ∞.

Remark 6.1. Observe that, as before, we impose no parametric assumptions on the distribution of error. This result shows that parametric assumptions are not required for recovery, but it fails to utilize properties of error that are especially natural in the continuous setting and that may allow for stronger results.
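For intuition, here is a minimal sketch of the utility-based implied choice error. To keep it runnable, the menus are finite sets of real numbers standing in for compact choice sets, and the two single-peaked utility functions (with peaks at 2 and 8) are purely illustrative assumptions; the chosen alternatives 3, 2, and 8 echo the discretization example discussed next.

```python
# Sketch of the utility-based implied choice error from Section 6.2, over finite menus.
def implied_choice_error_u(data, utilities):
    """Count observations (x, A) in which x is not the u-maximizer over A for any u."""
    return sum(1 for x, A in data if not any(max(A, key=u) == x for u in utilities))

def u_low(x):   # single-peaked utility with peak at 2 (illustrative assumption)
    return -abs(x - 2.0)

def u_high(x):  # single-peaked utility with peak at 8 (illustrative assumption)
    return -abs(x - 8.0)

data = [(3.0, (0.0, 3.0, 4.0)), (2.0, (1.0, 2.0, 4.0)), (8.0, (0.0, 8.0, 10.0))]
print(implied_choice_error_u(data, [u_low, u_high]))  # -> 0: the two utilities explain all three
```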
The key observation is that any choice data of the nature described in (4) can be mapped into discrete choice data, where we reduce X to the finite set of alternatives that are chosen in some choice observation. For example, suppose X = R, and the observed choices are (3, [0, 4]), (2, [1, 4]), (8, [0, 10]). Then, labelling '3' as x1, '2' as x2, and '8' as x3, we can redefine the set of choice alternatives as {x1, x2, x3}, and the choice data as (x1, {x1, x2}), (x2, {x1, x2}), (x3, {x1, x2, x3}). This yields a dataset of the nature studied previously. It remains to show that the new problem posed in (5) is equivalent to the original problem posed in (2), given the above transformation of the continuous data. This is shown in Lemma 4 in the appendix.

7 Can We Recover More?

7.1 Limitations

Sections 5 and 6 provide conditions under which the problem in (2) recovers the correct number of contexts with high probability. Is it possible to recover the context-dependent orderings themselves? The following proposition reveals that the answer is almost always no, even in the idealized p = 0 benchmark.

Definition 7.1. Say that the set of orderings Λ is identifiable if there exists some dataset D such that ∆_{D,Λ} = 0 and, moreover, ∆_{D,Λ′} > 0 for every Λ′ ≠ Λ with |Λ′| ≤ |Λ|.

This weak definition of identifiability says that there exists at least one dataset D such that Λ is the unique set of |Λ| or fewer orderings that perfectly explains every observation in D.

Proposition 2. No set Λ with |Λ| ≥ 3 is identifiable. If Λ = {≻1, ≻2}, then Λ is identifiable if and only if the ≻1-maximal alternative is the ≻2-minimal alternative, and vice versa. Every singleton set Λ = {≻} is identifiable.

This proposition does not rely on specific features of the proposed approach (beyond favoring rationalizations that use fewer orderings), but highlights a basic challenge of inferring multiple preferences from choice data. Indeed, not only is it generally impossible to recover the exact set of preferences {≻_f}_{f∈F}, but it is not always possible to recover the choice implications of these orderings either. To be more precise, let us first define a weaker notion of identifiability:

Definition 7.2. Say that sets of orderings Λ and Λ′ are choice-equivalent if an observation (x, A) is consistent with maximization of some ordering in Λ if and only if it is also consistent with maximization of some ordering in Λ′. Say that the set of orderings Λ is choice-identifiable if there exists some dataset D such that ∆_{D,Λ} = 0 and, moreover, if ∆_{D,Λ′} = 0 for some other Λ′ satisfying |Λ′| ≤ |Λ|, then Λ and Λ′ are choice-equivalent.

To give an example, the set of orderings {≻1, ≻2} with

x1 ≻1 x2 ≻1 x3
x2 ≻2 x1 ≻2 x3

and the set of orderings {≻′1, ≻′2} with

x1 ≻′1 x2 ≻′1 x3
x2 ≻′2 x3 ≻′2 x1

are choice-equivalent, since every observation that can be rationalized by either ≻1 or ≻2 can also be rationalized by either ≻′1 or ≻′2. The weaker notion of choice-identifiability says that even though we may not be able to recover the exact set of preferences {≻_f}_{f∈F}, we can recover the choice implications of this set, in the form of all observations (x, A) that are consistent with maximization of some ≻_f. The negative example below illustrates that choice-identifiability is not guaranteed for all sets of orderings.

Example 7.1. Let X = {x1, x2, x3, x4, x5}.
Define Λ = {≻1, ≻2} to satisfy

x1 ≻1 x2 ≻1 x3 ≻1 x4 ≻1 x5
x2 ≻2 x1 ≻2 x3 ≻2 x4 ≻2 x5

and the set of orderings Λ′ = {≻′1, ≻′2} to satisfy

x1 ≻′1 x2 ≻′1 x3 ≻′1 x4 ≻′1 x5
x5 ≻′2 x4 ≻′2 x3 ≻′2 x2 ≻′2 x1

Then, every choice observation that is consistent with maximization of some ordering in {≻1, ≻2} is also consistent with maximization of some ordering in {≻1, ≻′2}, so that if ∆_{D,Λ} = 0, then also ∆_{D,Λ′} = 0. But Λ and Λ′ are not choice-equivalent; therefore, Λ is not choice-identifiable.

7.2 Possibilities

The example above highlights the (general) phenomenon that, across sets with a fixed number of orderings, there is variation in the "richness" of choice implications. Since the approach in (2) penalizes all sets consisting of the same number of orderings equally, it is biased towards elicitation of sets with richer choice implications. That is, if the decision-maker's context-dependent preferences are many but similar, the proposed approach will incorrectly interpret the data using orderings that are fewer but "more different."

This example suggests an alternative penalty on the "complexity" of the choice model that counts not the number of orderings, but rather the number of choice implications. For example, we might extend the approach in (2) to loss functions of the form

f(Λ) + λ∆_{D,Λ},   where   f(Λ) = #{(x, A) : x is ≻-maximal in A for some ≻ ∈ Λ}.

This brief discussion illustrates that the approach of trading off choice errors and model complexity (reminiscent of statistical regularization) can be extended beyond the particular loss function studied in this paper. Alternative specifications may prove productive towards recovery of further features of preference.

8 Related Literature

8.1 Identifying preferences from choice data

This paper builds on a literature that seeks to nonparametrically identify preferences from choice data. Most directly, it extends ideas in Kalai, Rubinstein & Spiegler (2002), which defines a set of orderings {≻i}_{i=1}^L as a rationalization by multiple rationales if every observed choice is ≻i-maximal in the available choice set for some i = 1, 2, . . . , L. Using the notation of Section 2, any set of orderings Λ with choice error ∆_{D,Λ} = 0 is a rationalization by multiple rationales of the dataset D. This set of orderings may not, however, correspond to a best multiple-ordering rationalization of the data as defined in (2). In particular, I suggest that the analyst may prefer an imperfect rationalization of the data using some K < L orderings to a perfect rationalization of the data using L orderings. The key conceptual difference is that Kalai, Rubinstein & Spiegler (2002) is agnostic towards the degree of evidence for orderings, whereas the approach in this paper insists on sufficient evidence for each ordering in order to separate error from preference variation.

Other nonparametric approaches for preference identification from choice data include Houtman & Maks (1985), which has been discussed extensively throughout this paper, Famulari (1995), and Varian (1982). There is also a separate literature on identification of preferences under various parametric assumptions on the distribution of error (e.g., Quandt (1956), McFadden & Richter (1970), and Train (1986)). Finally, Crawford & Pendakur (2012) and Dean & Martin (2010) respectively apply the approaches of Kalai, Rubinstein & Spiegler (2002) and Houtman & Maks (1985) to real data. The former is particularly relevant to this paper.
The authors analyze a dataset of 500 Danish households and their choices between six different kinds of milk, discovering that five utility functions are sufficient to explain all of the data. However, the authors point out that the fifth utility function explains only 8 out of 500 observations, and subsequently drop this utility function in many of their later analyses. The solution proposed in the present paper is precisely designed to explain why this might be a reasonable action, and to provide theoretical justification for where to set the bar.

8.2 Testing rationality

A long literature asks how close choice data is to perfect rationalization by a single preference ordering. This question was originally asked by Afriat (1967) for choices over price vectors and consumption bundles, and subsequently studied also by Varian (1982), Houtman & Maks (1985), Gross (1989), and Apesteguia & Ballester (2012), among others. In view of this literature, a goal of this paper can be viewed as distinguishing between the case in which choice data fails perfect rationalization by a single preference ordering because of random error, and the case in which it fails because of multiplicity in preference. In both cases, most of the inconsistency measures suggested above will be large, since a single ordering fails to explain the data well. The proposed solution K*_λ offers a way to distinguish between these two cases: in the case of approximately perfect rationalization of multiple preferences, K*_λ will be large while ∆_{D,K*_λ} will be small, whereas imperfect rationalization of a single preference will return K*_λ = 1 with a large value of ∆_{D,1}.

8.3 Multi-self decision making

The model of choice that I consider throughout is most closely related to Rubinstein & Salant (2008) and Bernheim & Rangel (2009). These papers extend the standard model to include a set F of contexts (called frames in Rubinstein & Salant (2008) and ancillary conditions in Bernheim & Rangel (2009)). A choice problem (called an extended choice problem in Rubinstein & Salant (2008) and a generalized choice situation in Bernheim & Rangel (2009)) is defined as a pair (A, f), where A ⊆ X is a choice set and f ∈ F is a context. An extended choice function c assigns to every extended choice problem (A, f) an element of A. I consider an extension of this model that allows for choice error, so that c(A, f) is stochastic. The goal of the present paper differs rather significantly from these earlier papers, raising the new question of whether it is possible to recover the number of contexts in F using choice data.

Additionally, there are several recent contributions to the literature on multi-self decision-making (e.g., Fudenberg & Levine (2006), Rubinstein & Salant (2006), Manzini & Mariotti (2009), and Ambrus & Rozen (2013)). Among these, Ambrus & Rozen (2013) is most related, and shows (among other results) that without prior restriction on the number of selves involved in a decision, many multi-self models have no testable implications. Although the set of choice models considered in Ambrus & Rozen (2013) is different from the set of choice models considered in my paper [Footnote 11: Ambrus & Rozen (2013) study multi-self models in which every self is active in every decision, and choice is determined through maximization of a choice-set-independent aggregation rule over selves. In contrast, I study multi-self models in which every self acts as a "dictator" in a subset of choices.], their lesson that restricting the number of selves is important for constraining the available degrees of freedom holds in the setting considered in this paper as well, and motivates in part the suggested criterion in (2).
9 Conclusion

Inference of context-dependent preferences from choice data is a challenging problem, due to the lack of explicit data on choice contexts and an inherent indeterminacy of multiple-preference models (see, e.g., Proposition 2). This paper suggests that we may nevertheless be able to uncover the number of choice contexts from choice behavior alone. The proposed strategy seeks the smallest number of orderings that explains the greatest number of choice observations. Working in a model of context-dependent preferences based on Rubinstein & Salant (2008) and Bernheim & Rangel (2009), I show that, with probability exponentially close to 1, the proposed approach is able to recover the true number of context-dependent preferences. This provides an alternative to existing approaches, which deliver either a single "best-fit" ordering or multiple "perfect-fit" orderings.

A Preliminaries

In what follows, I collect definitions and results that will be used in the proofs of Theorem 1 and Proposition 1. First, observe that K is the unique solution to the problem in (2) given data D if and only if

K + λ∆_{D,K} < k + λ∆_{D,k}   for all k ∈ Z+, k ≠ K.   (7)

It will be useful to rewrite this as ∆_{D,K} − ∆_{D,k} < (k − K)/λ for every such k.

Fix a number of observations n. The sampling distribution π over choice problems and the DM's stochastic choice rule P induce a distribution νn over datasets of size n, where

νn(D) = νn({(x_k, A_k)}_{k=1}^n) = ∏_{k=1}^n [ Σ_{f_k ∈ F} π(A_k, f_k) P(x_k | (A_k, f_k)) ].

To simplify notation, I will frequently write νn(D) = ν(x, A) = π(A)P(x|A), using π(A) := ∏_{k=1}^n π(A_k, f_k) to mean the probability that the DM sees choice problems A = {(A_k, f_k)}_{k=1}^n, and P(x|A) := ∏_{k=1}^n P(x_k | (A_k, f_k)) to mean the probability that the DM chooses alternatives x = (x_k)_{k=1}^n, given the choice problems in A. The probability that K*_λ = K is then the measure assigned by νn to the set of datasets D satisfying (7).

To characterize this probability, it is useful to recast the problem in the following way. Let g : D ↦ G_D be a map that identifies every dataset D with a (hyper)graph G_D = (V_D, E_D), where V_D = {1, . . . , n} indexes choice observations, and E_D consists of every set T ⊆ V_D such that (1) the observations {(x_i, A_i)}_{i∈T} are not 1-rationalizable, and (2) every proper subset of {(x_i, A_i)}_{i∈T} is 1-rationalizable. These concepts are related to our problem as follows.

Claim 2. D is k-rationalizable ⟺ G_D is k-colorable.

Proof. Take each color class to represent consistency with a distinct ordering, and the equivalence follows directly.

This claim further implies that ∆_{D,k} is equal to the minimum number of vertices that must be removed from G_D for the remaining graph to be k-colorable. From here on, I will refer to the vertices of G_D and the observations they represent interchangeably. Additionally, I will frequently refer to the pushforward measure

g∗(νn)(G) = νn({D : D ∈ g⁻¹(G)}),

which is the distribution over graphs induced by the measure νn and the map g.

We will need a final set of definitions. Say that observation i is perfectly maximized if x_i is ≻_{f_i}-maximal in A_i. For every set of choice problems A, let D*_A denote the data that would have been generated if every observation is perfectly
For every set of choice problems A, let DA denote the data that would have been generated if every observation is perfectly ∗ = g (D ∗ ) to be the corresponding graph. maximized, and define GDA A Observation 2. (a) For any set of choice problems A, the perfect maximization ∗ is K-colorable. (b) Let A = 2X × F ; then, the graph GD ∗ contains at graph GDA A least d disjoint K-partite subgraphs. ∗ , and part (b) follows directly from the Part (a) follows from the definition of DA definition of the differentiation parameter. B Proof of Theorem 1 The desired result follows from two lemmas. Lemma 1. There exists a constant c1 > 0 (uniform across n) such that g∗ (νn )({D : k + λ∆D,k > K + λ∆D,K ∀ k > K}) ≥ 1 − e−c1 n ∀n Proof. Fix any set of choice problems A = {(Ak , fk )}nk=1 , and let D = {(xk , Ak )}nk=1 describe the observed data. Suppose towards contradiction that there exists an integer k > K such that ∆D,K − ∆D,k > (k − K)/λ. (8) ∆D,K > 1/λ. (9) Observe that this implies since ∆D,k ≥ 0 and k − K ≥ 1. Let X ∼ Bin(n, p) be the number of observations ∗ , in error. Removing each of these X nodes from GD results in a subgraph of GDA which we know from Obs. 2 is K-colorable. Thus, ∆D,K ≤ X, and it follows from (9) that Pr(X ≤ 1/λ) is an upper bound on the probability that an integer k > K exists such that (8) obtains. Since by assumption λ1 < pn, it follows from Hoeffding’s Inequality that 2 1 2(1/λ − pn)2 Pr(X ≤ 1/λ) = Pr(X − pn ≤ 1/λ − pn) ≤ exp − = e−2( nλ −p) n n This inequality holds independently of the set A, so we have X g∗ (νn )({D : k + λ∆D,k > K + λ∆D,K ∀ k > K}) ≥ 1 − π(A) Pr(X ≤ 1/λ) |A|=n ≥1− X 2 1 π(A) 1 − e−2( nλ −p) n |A|=n 1 2 ≥ 1 − e−2( nλ −p) 19 n Let c1 := 2 1 nλ 2 − p > 0, and the desired inequality follows. Lemma 2. There exists a constant c2 > 0 (uniform across n) such that g∗ (νn )({D : k + λ∆D,k > K + λ∆D,K ∀ k < K}) ≥ 1 − e−c2 n K2|X| Proof. Fix any set A = {(Ak , fk )}nk=1 of choice problems, and let D = {(xk , Ak )}nk=1 describe the observed data. Suppose towards contradiction that there exists an integer k < K such that ∆D,k − ∆D,K < (K − k)/λ (10) Fix any choice problem (A, f ). In any given observation, the probability that (A, f ) K2|X| , is sampled and xA,f is chosen is (1 − p)/(K2|X| ). By assumption, λ > d(1−p)n 1 so that dλ < (1−p)n . Using Hoeffding’s Inequality, the probability that (A, f ) is K2|X| sampled and xA,f is selected at least 1/(dλ) times is therefore at least ! ! 1 1 (1 − p)n 2 1−p 2 1 − exp −2 /n = 1 − exp −2n − − dλ ndλ K2|X| K2|X| The probability that every choice problem (A, f ) is sampled and perfectly maximized at least 1/(dλ) times is weakly more than " 1 − exp −2n 1 1−p − ndλ K2|X| 2 !#K2|X| ∗ contains at least d Conditional on this event, it follows from Obs 2 that GDA 1/λ disjoint complete K-partite graphs, so that (11) 1 dλ = ∆D,k − ∆D,K > (K − k)/λ for every k < K. This directly contradicts (10), so that the probability in (11) is a lower bound on the probability that no k < K satisfies (10). Let c2 = 2 1−p 1 2 ndλ − K2 , and the desired inequality follows. |X| Combining Lemmas 1 and 2, the probability that there does not exist any integer k 6= K satisfying (7) is at least 1 − e−c1 n 1 − e−c2 n K2|X| −→ 1 as n → ∞. Moreover, this convergence is 1 − O(e−cn ) for c = min(c1 , c2 ). 20 C Proof of Proposition 1 Lemma 1 follows as proven for Theorem 1. Lemma 2 is replaced with the following: Lemma 3. For every n ≥ 1, g∗ (νn )({D : k + λ∆D,k > K + λ∆D,K ∀ k < K}) → 1 as n → ∞. Proof. Fix any set of choice problems A = {(Ak , fk )}nk=1 . 
Suppose towards contradiction that there exists some integer k < K such that ∆D,k − ∆D,K < (K − k)/λ. ∗ includes at Write qn for the probability that d(A) = δ, which implies that GDA least δn complete K-partite graphs. By assumption, qn → 1 as n → ∞. Condition ∗ using i = 1, . . . , βn. on this event, and label the complete K-partite graphs in GDA Define indicator variables Xi , i = 1, . . . , δn to take value 1 if every observation in P the i-th complete K-partite graph is perfectly maximized. Now let X = δn i=1 Xi . K K Notice that every Xi ∼ Ber((1 − p) ), and also E(X) = βn(1 − p) . Since λ1 < δ(1 − p)K n by assumption, it follows from Hoeffding’s inequality that Pr(X < 1/λ) = Pr X − δn(1 − p)K < 1/λ − δn(1 − p)K −2(1/λ − δn(1 − p)K )2 ≤ exp δn 2 ! 1 − (1 − p)K = exp −2n δnλ Thus, the probability that GD contains qn 1 λ complete K-partite graphs is at least 2 !! 1 K 1 − exp −2n − (1 − p) . δnλ 2 1 Let c3 := 2 δnλ − (1 − p)K . Since this inequality holds independently of the set A, we have that X g∗ (νn )({D : k + λ∆D,k > K + λ∆D,K ∀ k > K}) ≥ 1 − π(A) Pr(X ≤ 1/λ) |A|=n ≥ X π(A)qn (1 − e−c3 n ) |A|=n ≥ qn (1 − e−c3 n ) Since qn (1 − e−c3 n ) → 1 as n → 1, the desired statement follows. 21 D Proof of Corollary 1 Fix any dataset D = A = {(xi , Ai )}ni=1 . Collect the chosen alternatives in the set X = {xi }ni=1 , and identify every choice set Ai ⊆ X with a new choice set T Ai = X Ai that includes only the choices from Ai that are also available in X. Define D = {(xi , Ai )}ni=1 . Lemma 4. For every λ ∈ R+ , min k + λ∆D,k = min k + λ∆D,k k∈Z+ k∈Z+ Proof. I will show that ∆D,Λ = δ for some set Λ = {j }kj=1 if and only if ∆D,U = δ for some set U = {uj }kj=1 . Fix any set of k orderings Λ = {j }kj=1 , each defined on X, and define δ := ∆D,Λ . Every j admits representation via a utility function uj : X → R. Moreover, we can extend each uj to a continuous function uj on X satisfying maxx∈X uj (x) = maxx∈X uj (x). Then, xj = argmaxx∈A uj (x) if and only if xj is -maximal in A. This implies ( ) ∆D,{uj } = # (x, A) ∈ D : x 6= argmax uj (x0 ) for all j = 1, . . . , k = δ, x0 ∈A so that min k + λ∆D,k ≥ min k + λ∆D,k . k∈Z+ k∈Z+ (12) In the other direction, fix a set of k continuous functions U = {uj }kj=1 , each defined on X, and define δ = ∆D,U . Define for every uj an ordering j on X that satisfies x j x0 iff uj (x) > uj (x0 ). Then, xj = argmaxx∈A uj (x) if and only if xj is -maximal in A, so that ∆D,{j } = #{(x, A) ∈ D : x is not j -maximal in A for any j = 1, . . . , k} = δ. Thus, min k + λ∆D,k ≤ min k + λ∆D,k . k∈Z+ k∈Z+ Combine this with (12) and we are done. It follows from this lemma that K ∗λ = Kλ∗ for every choice of λ. Moreover, it is easy to see that d(A) = d(A) and p = p. Thus, the problem posed in Section 6.2 can be mapped directly into a corresponding problem involving choice over a discrete set, and we can apply Proposition 1. 22 E Proof of Proposition 2 It will be useful to identify every ordering on X with an ordinal vector r such that r (x) > r (x0 ) iff x x0 . First, I will show the following: Claim 3. If there exist orderings 1 , 2 ∈ Λ such that argmaxx∈X r1 (x) 6= argminx∈X r2 (x) (13) then Λ is not identifiable. Proof. Consider any set of orderings Λ for which there exist 1 , 2 ∈ Λ such that x1 := argmax r1 (x) 6= argmin r2 (x) := x2 , x∈X (14) x∈X Fix any D such that ∆D,Λ = 0. I will show by construction that there exists a set Λ0 with |Λ0 | ≤ |Λ| such that also ∆D,Λ0 = 0. 
Define a new ordering 02 to agree with 2 everywhere, except that it ranks x1 last.12 Let Λ0 = Λ − {2 } + {02 }, where the operators denote set addition and subtraction. I will now show that ∆D,Λ0 = 0. Suppose towards contradiction that there is some choice observation (x, A) ∈ D such that x not -maximal in A for any ∈ Λ0 . By assumption, x is -maximal in A for some ∈ Λ. If 6=2 , then also ∈ Λ0 , which immediately yields a contradiction. So it must be that x is 2 -maximal in A. Now, there are two possibilities: if x 6= x1 , then x must also be 02 -maximal in A, and we are done. If instead x = x1 , then although x is not 02 -maximal, it is 1 -maximal in A by its definition in (14). So we are again done. Since every set of three or more orderings satisfies the condition in (13), it immediately follows from Claim 3 that every Λ with |Λ| ≥ 3 is not identifiable. Now let us consider Λ = {1 , 2 }. If (13) is satisfied, it again follows from Claim 3 that Λ is not identifiable. Suppose otherwise, and define D = {(x, A) : x is 1 -maximal in A, A ∈ 2X } ∪ {(x, A) : x is 2 -maximal in A, A ∈ 2X }. Clearly there is no singleton set Λ0 such that ∆D,Λ0 = 0. Suppose towards contradiction that there exists some Λ0 = {01 , 02 } 6= Λ such that ∆D,Λ0 = 0. Suppose 12 That is, for every x, x0 6= x2 , x 02 x0 if and only if x 2 x0 , and for every x 6= x2 , x 02 x2 . 23 w.l.o.g. that 01 6=,13 and index the alternatives such that x1 = argmax r1 (x) x∈X x2 = argmax r2 (x) x∈X That is, x1 is ranked first according to 1 and x2 is ranked first according to 2 . First observe that since (x1 , X), (x2 , X) ∈ D, necessarily x1 is highest ranked in 01 and x2 is highest ranked in 02 , or vice versa. Suppose the former w.l.o.g. Since 01 6=1 , there exist alternatives xi , xj such that xi 1 xj and xj 01 xi ; that is, xi is higher ranked than xj under 1 but not under 01 . Let A := {x : xi 1 x} be the set of all alternatives that 1 ranks weakly lower than xi . Notice that x2 , xj ∈ A. Since xj ∈ A, and xj 01 xi by assumption, xi is not 01 -maximal in A. Moreover, since x2 ∈ A, and x2 is 02 -maximal in X, xi is also not 02 -maximal in A. But then (xi , A) is not consistent with maximization of either ordering in Λ0 , yielding the desired contradiction. Finally, every singleton set Λ = {} is trivially identifiable, taking D = (x, A) : is -maximal in A, A ∈ 2X . 13 Otherwise, 02 6=2 and the remainder of the proof is correspondingly mirrored. 24 References Afriat, Sidney N. 1967. “The Construction of a Utility Function from Expenditure Data.” International Economic Review 8(1):67–77. Ambrus, Attila & Kareen Rozen. 2013. “Rationalizing Choice with Multi-Self Models.” Economic Journal . Apesteguia, Jose & Miguel A. Ballester. 2012. “A Measure of Rationality and Welfare.” Working Paper. Bernheim, B. Douglas & Antonio Rangel. 2009. “Beyond Revealed Preference: Choice Theoretic Foundations for Behavioral Welfare Economics.” Quarterly Journal of Economics . Crawford, Ian & Krishna Pendakur. 2012. “How Many Types Are There?” Economic Journal . Dean, Mark & Daniel Martin. 2010. “How Consistent are your Choice Data?” Working Paper. Einav, Liran, Amy Finkelstein, Iuliana Pascu & Mark Cullen. 2012. “How General are Risk Preferences? Choices under Uncertainty in Different Domains.” American Economic Review . Famulari, Melissa. 1995. “A Household-Based, Nonparametric Test of Demand Theory.” Review of Economics and Statistics 77:372–383. Fudenberg, Drew & David Levine. 2006. 
“A Dual-Self Model of Impulse Control.” American Economic Review 96(5):1449–1476.

Gross, Andrew. 1989. Determining the Number of Violators of the Weak Axiom. Technical report, University of Wisconsin–Milwaukee.

Houtman, Martijn & J.A.H. Maks. 1985. “Determining all Maximal Data Subsets Consistent with Revealed Preference.” Kwantitatieve Methoden 19:89–104.

Huber, Joel, John Payne & Christopher Puto. 1982. “Adding Asymmetrically Dominated Alternatives: Violations of Regularity and the Similarity Hypothesis.” Journal of Consumer Research.

Kahneman, Daniel & Amos Tversky. 2000. Choices, Values, and Frames. The Press Syndicate of the University of Cambridge.

Kalai, Gil, Ariel Rubinstein & Ran Spiegler. 2002. “Rationalizing Choice Functions by Multiple Rationales.” Econometrica 70(6):2481–2488.

Manzini, Paola & Marco Mariotti. 2009. “Categorize Then Choose: Boundedly Rational Choice and Welfare.”

McFadden, Daniel & M.K. Richter. 1970. “Revealed Stochastic Preference.”

Quandt, Richard E. 1956. “A Probabilistic Theory of Consumer Behavior.” Quarterly Journal of Economics 70:507–536.

Rubinstein, Ariel & Yuval Salant. 2006. “A Model of Choice from Lists.” Theoretical Economics 3(17).

Rubinstein, Ariel & Yuval Salant. 2008. “(A,f): Choice with Frames.” Review of Economic Studies.

Train, Kenneth. 1986. Qualitative Choice Analysis. Cambridge University Press.

Varian, Hal R. 1982. “The Nonparametric Approach to Demand Analysis.” Econometrica 50(4):945–973.