How Many Choice Contexts are There?

Annie Liang∗
June 25, 2016
Abstract
Recent evidence suggests that many choice datasets reflect maximization of
several context-dependent orderings, rather than a single context-independent
ordering. This paper asks whether it is possible to identify the number of
context-dependent orderings maximized in the data from choice observations
alone (i.e. without explicit labeling of contexts). I study a model in which a
decision-maker imperfectly maximizes a set of context-dependent preferences,
and propose an approach for recovering the number of context-dependent preferences. The main results show that exact recovery of the number of context-dependent orderings is feasible with probability exponentially close to 1 (in
quantity of data) using the proposed approach.
1 Introduction
Let X be a finite set of choice alternatives, and suppose that a decision-maker
(DM) chooses from many subsets A ⊆ X. In the classical approach, the analyst
assumes that the DM maximizes a single ordering over X in each of these choices.
But there is now ample evidence that many choice datasets reflect maximization
of several context-dependent orderings, rather than a single context-independent
ordering. For example, risk preferences over insurance plans can vary depending on
the domain of insurance (Einav, Finkelstein, Pascu & Cullen 2012), and preferences
over consumer goods can vary depending on ordering effects and decoys (Kahneman
& Tversky 2000, Huber, Payne & Puto 1982). Moreover, many choice datasets
are aggregated over a population of DMs with different preferences (Crawford &
Pendakur 2012).1
∗ Microsoft Research and University of Pennsylvania. Email: [email protected]. I am
especially grateful to Jerry Green. I would also like to thank Emily Berger, Gabriel Carroll, Drew
Fudenberg, Ben Golub, David Laibson, Eric Maskin, Jose Montiel Olea, Gleb Romanyuk, Andrei
Shleifer, Ran Shorrer, Tomasz Strzalecki, and Yufei Zhao for useful comments and discussions.
1 In the last example of a population of DMs, we should think of each context as representing
a class of DMs.
Despite evidence that choice is context-dependent in many settings, context-independence remains the standard assumption, in part due to the difficulty of
discerning choice contexts. Descriptions of choice settings are not always available,
and when they are, it may not be apparent which aspects are relevant to choice.
For example, if individuals maximize the same preferences over health insurance
and dental insurance, but a different preference for their 401K investments, the
analyst need not know that these are the relevant groupings.
This paper asks whether it is possible to identify the number of contexts from
choice observations alone. The number of context-dependent preferences that are
maximized in the data is important, because it informs the appropriateness of
reduction to a single context-independent ordering. This is relevant for welfare
assessment: identification of context-dependency can tell us whether inconsistencies
in choice data reflect error (and are hence uninformative towards the DM’s true
preferences), or reflect rational maximization (and hence, should be considered for
welfare analysis). The number of contexts is also relevant to parameter estimation:
it tells us whether it makes sense to estimate preference parameters on the full
dataset, or whether improvement can be attained by conditioning on a relevant
subsample. Finally, the number of contexts is relevant for prediction of future
choices: it tells us whether predictive power is maximized by inferring a single
ordering for prediction, or can be improved by inferring several context-dependent
orderings.
Towards this goal, I study a model in which the classical setting is enriched
by a set of choice contexts F (this follows the approach of Bernheim & Rangel
(2009) and Rubinstein & Salant (2008)), and the DM possesses a set of context-dependent preferences $\{\succ_f\}_{f \in F}$. I assume that the DM imperfectly maximizes
these preferences, so that given choice set A and context f, the DM chooses the
$\succ_f$-optimal alternative in A with probability 1 − p, and errs with probability p.2
No parametric restrictions are imposed on the shape of error. The analyst wants
to know the number of choice contexts |F |, but observes only the DM’s choices and
the choice sets they were selected from. The question of interest is whether the
analyst can recover the number of contexts |F | given this information alone.
The intrinsic challenge in this problem is the multiplicity of mappings from
choice data to context-dependent orderings. At one extreme, we can preserve the
single ordering model by identifying a “best-fit” ordering to the data (e.g. as
proposed by Houtman & Maks (1985)). At another extreme, Kalai, Rubinstein &
Spiegler (2002) suggest identifying the smallest number of orderings such that every
observation is consistent with maximization of some ordering. These approaches
promise disciplined elicitation of the number of contexts, but the former is inappropriate if the number of contexts is greater than one, and the latter is inappropriate
if the probability of error is positive (since errors will be mistaken for preference
heterogeneity).
2 The DM’s choice rule is thus a stochastic mapping from choice problems (A, f) to distributions
over A.
This paper thus suggests a new, intermediate, solution that linearly trades off
the fit of the model to the data (as measured via the number of unexplained observations) and the number of orderings used. The tradeoff between these two
goals is mediated via a constant λ, where the Houtman & Maks (1985) and Kalai,
Rubinstein & Spiegler (2002) solutions correspond to special cases of λ.
The main results show that the proposed approach recovers the correct number
of contexts with probability converging to 1 in the quantity of data. Specifically, if
the choice problems that the DM is presented with are drawn from a distribution
that satisfies certain conditions, then there is an “optimal” set of choices for λ such
that the number of inferred contexts converges to |F | as the quantity of data is
taken to infinity. The set of choices for λ is determined by three features: the
true number of contexts, the probability p that the DM errs in each choice, and the
“differentiation” in choice implications of the DM’s context-dependent orderings.
Section 5 develops and discusses these results for the special case in which the
distribution over choice sets and contexts is uniform, and Section 6.1 generalizes
to distributions satisfying a condition of “limit preference differentiation.” Section
6.2 extends the approach and recovery results for DMs with continuous preferences
over a non-finite set X.
Section 7 discusses limitations of the main recovery results: while we can recover the number of contexts, it is not in general possible to recover the context-dependent orderings themselves. I discuss possible extensions of the approach to
recover further details of preference. Section 8 is the literature review, and Section
9 concludes.
2 Choice Model
Let X be a finite set of choice alternatives, with A ⊆ X denoting a typical choice
set, and $2^X$ denoting the set of possible choice sets. Following Rubinstein & Salant
(2008) and Bernheim & Rangel (2009), let F be a finite set of K choice contexts. A
choice problem is a pair $(A, f) \in 2^X \times F$ consisting of a choice set A and a choice
context f.
I assume that each choice context f cues a (distinct) strict preference relation
$\succ_f$, which the DM imperfectly maximizes. Formally, the DM’s choice rule is a
map P from choice problems (A, f) to distributions over the elements of A. I will
use $x_{A,f}$ to denote the $\succ_f$-maximal element in A (the alternative that a perfectly
maximizing DM would choose). The constant
$$p := \min\{p' : P(A, f)[x_{A,f}] \ge 1 - p' \;\; \forall\, (A, f)\}$$
is a uniform upper bound on the DM’s probability of error, and will figure importantly into the main results.
Remark 2.1. Special cases of this model include the following:
(a) p = 0, K = 1 returns perfect maximization of a single ordering.
(b) p = 0, K > 1 returns a model closely related to ideas proposed in Rubinstein
& Salant (2008) and Bernheim & Rangel (2009). If moreover, $F \subseteq 2^X$, so
that the set of contexts partitions the set of choice sets, then we have the
multiple preference model considered in Kalai, Rubinstein & Spiegler (2002).
A total of n choice problems are repeatedly sampled from a distribution π over the
set of unique choice problems $2^X \times F$. In each choice problem, the DM chooses
according to his choice rule P. This induces the realizations
$$D^* = \{(x, (A, f)) \mid (A, f) \in \mathcal{A}\},$$
where $\mathcal{A} \subseteq 2^X \times F$ is the realized set of choice problems, and (x, (A, f)) denotes
choice of x given choice problem (A, f). Instead of observing $D^*$, the analyst
observes
$$D = \{(x, A) \mid (x, (A, f)) \in D^*\},$$
which includes information about the DM’s choices and choice sets, but not the
contexts. Throughout, I will refer to a pair (x, A) as a choice observation.
The question of interest is the following: when, and under what conditions, can
an analyst recover the number of contexts K := |F| from the observed dataset of
choices D?
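The data-generating process above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: choice problems are drawn uniformly, singleton choice sets are excluded, and errors are uniform trembles over the non-optimal alternatives (the model itself places no parametric restriction on error). The function names `choice_sets`, `best`, and `simulate` are hypothetical and not part of the paper.

```python
import random
from itertools import combinations

def choice_sets(X):
    """All subsets of X with at least two alternatives (assumption: no singletons)."""
    return [frozenset(c) for r in range(2, len(X) + 1) for c in combinations(X, r)]

def best(A, ordering):
    """Maximal element of A under `ordering` (a tuple listing the best alternative first)."""
    return min(A, key=ordering.index)

def simulate(X, orderings, n, p, rng):
    """Draw n choice problems uniformly and return the analyst's dataset D of (x, A) pairs."""
    sets = choice_sets(X)
    contexts = list(orderings)
    D = []
    for _ in range(n):
        A = rng.choice(sets)
        f = rng.choice(contexts)            # the context is drawn but never recorded
        x = best(A, orderings[f])
        if rng.random() < p:                # error: uniform tremble (an assumption)
            x = rng.choice(sorted(A - {x}))
        D.append((x, A))
    return D

# Hypothetical usage: two opposed orderings over four alternatives.
X = ("x1", "x2", "x3", "x4")
orderings = {"R": X, "S": X[::-1]}
D = simulate(X, orderings, n=200, p=0.1, rng=random.Random(0))
```

The returned list plays the role of D: each entry records a choice and a choice set, while the context that generated it is discarded.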
3 Example
Let us first consider a simple example. Suppose that a DM makes repeated choices
between alternatives in X = {x1 , x2 , x3 , x4 }. His choices depend on the weather;
in particular, on whether it is a rainy (R) or sunny (S) day. On rainy days, with
probability 1 − p < 1 he maximizes
$$x_1 \succ_R x_2 \succ_R x_3 \succ_R x_4,$$
and otherwise he trembles uniformly over the other alternatives. On sunny days,
with probability 1 − p < 1 he maximizes
$$x_4 \succ_S x_3 \succ_S x_2 \succ_S x_1$$
and otherwise he trembles uniformly over the other alternatives.
The analyst does not know that rain and sun are relevant to the DM’s choices,
and does not know which choices were made on rainy or sunny days. If he observes
only choice sets and realized choices, can he infer that the DM was influenced by
two choice contexts?
To fix ideas, let us consider a particular realization of the data, in which the
DM is presented with every (nonempty and non-singleton) choice problem in $2^X \times \{R, S\}$, and selects $x_{A,f}$ (the $\succ_f$-maximal element of A) in every choice problem
(A, f), except in the following two choices:
$$(x_2, (\{x_1, x_2, x_3\}, R)) \quad \text{and} \quad (x_3, (\{x_1, x_3, x_4\}, S)).^{3} \tag{1}$$
Then, there are 18 total observations, eight of which are consistent with maximization of $\succ_R$, eight of which are consistent with maximization of $\succ_S$, and two which
are not consistent with maximization of either.
There are many possible rationalizations of this data. For example, the analyst
can assign the single ordering $\succ_R$ to the DM, explaining eight observations, and
interpreting the remaining observations as choice error. This is consistent with the
proposal of Houtman & Maks (1985), which determines the largest subset of choice
observations that are consistent with a single ordering.4 The approach, however,
fails to identify the context-dependency in the DM’s choice behaviors.
At another extreme, the analyst can perfectly describe the DM’s preferences
using the set $\{\succ_R, \succ_S, \succ_C\}$, where
$$x_2 \succ_C x_3 \succ_C x_4 \succ_C x_1.$$
This is consistent with the proposal of Kalai, Rubinstein & Spiegler (2002): the
smallest set of orderings such that every observation is consistent with maximization
of some ordering. This approach, however, over-fits the observed data, interpreting
the error in (1) as genuine heterogeneity in preference.
The present paper proposes a solution intermediate to those described above.
The idea is to identify the fewest number of orderings that can explain as many
observations in the data as possible. Intuitively, two orderings can explain almost
all of the observations, while decreasing to a single ordering is costly (leaving eight
additional observations unexplained), and adding one additional ordering is not
very useful (explaining only two additional observations). Formally, let $\Delta_{D,k}$ be
the minimal number of choice observations that must be left unexplained if we
use k orderings to rationalize the data, noting from the above observations that
$\Delta_{D,1} = 10$, $\Delta_{D,2} = 2$, and $\Delta_{D,3} = 0$. Define
$$K^*_\lambda = \operatorname*{argmin}_{k \in \mathbb{Z}_+} \; k + \lambda \Delta_{D,k},$$
where λ is any constant in the interval $(\tfrac{1}{8}, \tfrac{1}{2})$. Then, $K^*_\lambda = 2$, recovering the
“correct” number of contexts.
3 In the former, the DM should have chosen $x_1$, the $\succ_R$-maximal element of $\{x_1, x_2, x_3\}$, but
instead chose $x_2$, and in the latter, the DM should have chosen $x_4$, the $\succ_S$-maximal element of
$\{x_1, x_3, x_4\}$, but instead chose $x_3$.
4 This solution is not unique. For example $\succ_S$ also explains eight observations.
Where did the choice of $\lambda \in (\tfrac{1}{8}, \tfrac{1}{2})$ come from, and how special is recovery of $K^*_\lambda = 2$
to the particular (constructed) dataset we considered? Theorem 1 generalizes the
ideas here, providing conditions and choices of λ for which recovery obtains for most
datasets, where “most” is quantified by a probability of realization that approaches
1 as the number of observations increases.
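For this particular dataset, the interval can be read off by comparing the objective at neighbouring values of k. The following sketch uses only the increments stated in the example: moving from two orderings to one leaves eight additional observations unexplained, and moving from two orderings to three explains two additional observations. Since three orderings already leave nothing unexplained, any k > 3 only adds cost and need not be checked.

```latex
\begin{align*}
2 + \lambda\Delta_{D,2} < 1 + \lambda\Delta_{D,1}
  &\iff \lambda\,(\Delta_{D,1} - \Delta_{D,2}) > 1
  \iff 8\lambda > 1
  \iff \lambda > \tfrac{1}{8},\\
2 + \lambda\Delta_{D,2} < 3 + \lambda\Delta_{D,3}
  &\iff \lambda\,(\Delta_{D,2} - \Delta_{D,3}) < 1
  \iff 2\lambda < 1
  \iff \lambda < \tfrac{1}{2}.
\end{align*}
```

Both inequalities are strict, so k = 2 is the unique minimizer exactly when λ lies strictly between 1/8 and 1/2; at either endpoint the objective is tied with a neighbouring k.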
4 Proposed Approach
I will now describe the approach in more general terms. Let D be any (multi-)set
of choice observations, and suppose we wish to rationalize these observations using
a set of preference orderings Λ. Define
$$\Delta_{D,\Lambda} := \#\{(x, A) \in D : x \text{ is not } \succ\text{-maximal in } A \text{ for any } \succ \text{ in } \Lambda\}$$
to be the number of observations in D that are not consistent with maximization
of any ordering in Λ. I will refer to $\Delta_{D,\Lambda}$ as the implied choice error. Let
$$\Delta_{D,k} := \min_{|\Lambda| = k} \Delta_{D,\Lambda}$$
denote the smallest implied choice error that is obtainable by use of some set of
k orderings; naturally, $\Delta_{D,k}$ is decreasing in k. Following Kalai, Rubinstein &
Spiegler (2002), say that choice data D is k-rationalizable if $\Delta_{D,k} = 0$.
Remark 4.1. For every dataset D, there exists a constant L ≤ min{|D|, |X|} such
that D is k-rationalizable for every k ≥ L.
I now propose an estimate for the number of contexts:
Definition 4.1. For every $\lambda \in \mathbb{R}_+$, define
$$K^*_\lambda = \operatorname*{argmin}_{k \in \mathbb{Z}_+} \; k + \lambda \Delta_{D,k}. \tag{2}$$
Remark 4.2. In an abuse of notation, I will refer to $K^*_\lambda$ as a singleton, although
this solution may not be unique.
The constant λ arbitrates between the two goals of minimizing the number of
orderings and minimizing the number of implied choice errors. Intuitively, $1/\lambda$ is the
“cost” of each ordering, so that an ordering is attributed to the DM if and only
if it explains at least $1/\lambda$ observations that would otherwise be interpreted as choice
error. As λ varies over (0, ∞), the solution to (2) traces out the optimal trade-off
curve between quantity of choice errors and number of orderings.
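The definitions above can be made concrete with a brute-force sketch. It enumerates every set of k orderings, so it is only feasible for very small X, and it is not an algorithm proposed in the paper. The function names and the toy dataset are hypothetical; ties in the objective are broken in favour of the smaller k, which is one possible reading of the abuse of notation in Remark 4.2.

```python
from itertools import combinations, permutations

def is_maximal(x, A, ordering):
    """True if x is the best element of A under `ordering` (best alternative first)."""
    return min(A, key=ordering.index) == x

def implied_error(D, orderings):
    """Delta_{D,Lambda}: observations not explained by any ordering in the set."""
    return sum(1 for x, A in D if not any(is_maximal(x, A, o) for o in orderings))

def min_error(D, X, k):
    """Delta_{D,k}: smallest implied error over all sets of k orderings (brute force)."""
    all_orderings = list(permutations(X))
    return min(implied_error(D, L) for L in combinations(all_orderings, k))

def estimate_K(D, X, lam):
    """Smallest k minimizing k + lam * Delta_{D,k}; stops once the error reaches zero."""
    best_k, best_val = 1, float("inf")
    for k in range(1, len(X) + 1):          # |X| orderings always suffice (Remark 4.1)
        delta = min_error(D, X, k)
        val = k + lam * delta
        if val < best_val:
            best_k, best_val = k, val
        if delta == 0:
            break
    return best_k

# Hypothetical toy data: two opposed orderings, each problem seen twice, plus one error.
X = ("a", "b", "c")
ab, ac, bc, abc = (frozenset(s) for s in ("ab", "ac", "bc", "abc"))
D = 2 * [("a", ab), ("a", ac), ("b", bc), ("a", abc),   # maximizing a > b > c
         ("b", ab), ("c", ac), ("c", bc), ("c", abc)]   # maximizing c > b > a
D.append(("b", abc))                                     # one erroneous choice
```

On this toy dataset the minimal errors for k = 1, 2, 3 are 9, 1 and 0, so an intermediate value such as λ = 0.5 selects two orderings, a very small λ selects one, and any λ > 1 selects three.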
Notice in particular that as λ → 0, errors become approximately free (relative
to the cost of orderings), so that the analyst prefers to adopt a unique ordering for
the agent and interpret the remaining observations as error. For large choices of λ,
new orderings are approximately free (relative to the cost of error), so the analyst
prefers to use as many orderings as necessary to eliminate choice error. At the
edges, the choice of any $\lambda < 1/\Delta_{D,1}$ returns the Houtman & Maks (1985) solution, and
the choice of any λ > 1 returns the Kalai, Rubinstein & Spiegler (2002) solution.
Observation 1. (a) $K^*_\lambda = 1$ for every $\lambda < 1/\Delta_{D,1}$.
(b) $K^*_\lambda = L$ for every λ > 1, where L is the smallest integer such that the data is
L-rationalizable.
How should the analyst choose λ, and under what conditions on the data-generating process do we find the proposed approach to recover the “true” number
of orderings maximized by the agent?
5 Main Result
For clarity of exposition, I will first take the sampling distribution π to be uniform
over $2^X \times F$ (this section), and subsequently allow for general sampling distributions
in Section 6.
5.1 No error benchmark (p = 0)
Suppose first that p = 0, so that the DM always chooses the $\succ_f$-maximal alternative given choice problem (A, f). The following example illustrates that recovery
of the number of contexts K can be infeasible even in this idealized benchmark.
Let $X = \{x_1, x_2, x_3\}$ and suppose that there are three contexts: rainy (R),
sunny (S), and cloudy (C). The DM’s preferences are $\{\succ_R, \succ_S, \succ_C\}$, where
$$x_1 \succ_R x_2 \succ_R x_3$$
$$x_1 \succ_S x_3 \succ_S x_2$$
$$x_3 \succ_C x_2 \succ_C x_1$$
Notice that every choice observation consistent with maximization of some ordering
in $\{\succ_R, \succ_S, \succ_C\}$ is also consistent with maximization of an ordering in $\{\succ_1, \succ_2\}$,
where
$$x_1 \succ_1 x_2 \succ_1 x_3$$
$$x_3 \succ_2 x_2 \succ_2 x_1$$
The set of orderings $\{\succ_1, \succ_2\}$ thus achieves the same choice error as $\{\succ_R, \succ_S, \succ_C\}$,
but uses strictly fewer orderings, and is thus preferred under the criterion in (2).
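The claim in this example can be checked by enumeration. The sketch below lists, for each set of orderings, every observation (x, A) with |A| ≥ 2 that some ordering in the set rationalizes, and compares the two collections. It reads the sunny-day ordering as x1 ≻ x3 ≻ x2, and the helper name `implications` is hypothetical.

```python
from itertools import combinations

def implications(orderings, X):
    """All observations (x, A), |A| >= 2, where x is maximal in A for some ordering."""
    sets = [frozenset(c) for r in range(2, len(X) + 1) for c in combinations(X, r)]
    return {(min(A, key=o.index), A) for A in sets for o in orderings}

X = ("x1", "x2", "x3")
three = [("x1", "x2", "x3"),   # rainy
         ("x1", "x3", "x2"),   # sunny (assumed reading of the ordering)
         ("x3", "x2", "x1")]   # cloudy
two = [("x1", "x2", "x3"), ("x3", "x2", "x1")]

same_implications = implications(three, X) == implications(two, X)
```

Here `same_implications` evaluates to `True`: the sunny-day ordering never selects an alternative that one of the other two orderings would not also select, so it is invisible in perfectly maximized data.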
This illustrates that recovery of the number of contexts K using (2) requires sufficient “diversity” in the choice implications of the DM’s context-dependent preferences. Below, I define one such notion of choice diversity. First, a preliminary
definition:
Definition 5.1. Say that choice problems $\mathcal{A} = \{(A_1, f_1), \dots, (A_K, f_K)\}$ are in K-violation of IIA if
1. $x_{A_i,f_i} \neq x_{A_j,f_j}$ whenever $i \neq j$, and
2. $x_{A_i,f_i} \in \bigcap_{j=1}^{K} A_j$ for every i.
The first condition requires that every choice problem in $\mathcal{A}$ has a distinct optimal
choice, and the second condition requires that each of these K alternatives is available from every $A_i$.5 An immediate implication is that the set of choice observations
$\{(x_{A_i,f_i}, (A_i, f_i))\}_{i=1}^{K}$ cannot be rationalized using fewer than K orderings.
Definition 5.2. The differentiation parameter d is the largest integer d such that
$2^X \times F$ admits d non-overlapping subsets, denoted $\mathcal{A}_1, \dots, \mathcal{A}_d$, where for every
$i \in \{1, \dots, d\}$,
1. $|\mathcal{A}_i| = K$, and
2. $\mathcal{A}_i$ is in K-violation of IIA.
That is, the set of choice problems $2^X \times F$ includes at least d (non-overlapping)
sets of K choice problems, each of which is in K-violation of IIA.
Example 5.1. The differentiation parameter of $\{\succ_R, \succ_S, \succ_C\}$ is 0, since there do
not exist any sets $\{(A_1, f_1), (A_2, f_2), (A_3, f_3)\}$ in 3-violation of IIA.
Example 5.2. Consider $X = \{x_1, \dots, x_n\}$ and $\{\succ_f, \succ_{f'}\}$ where
$$x_1 \succ_f x_2 \succ_f \cdots \succ_f x_n$$
$$x_n \succ_{f'} x_{n-1} \succ_{f'} \cdots \succ_{f'} x_1$$
The differentiation parameter of $\{\succ_f, \succ_{f'}\}$ is $2^n - n - 1$, since every set $\{(A, f), (A, f')\}$
with |A| > 1 is in 2-violation of IIA.
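Definition 5.1 translates directly into a short check, sketched below together with the count from Example 5.2. The names `in_K_violation` and `count_pairs` are hypothetical, alternatives are represented by the integers 1, …, n, and the context label "g" stands in for f′. The count covers the pairs constructed in the example; it does not search over all possible families of non-overlapping sets.

```python
from itertools import combinations

def best(A, ordering):
    """Maximal element of A under `ordering` (best alternative first)."""
    return min(A, key=ordering.index)

def in_K_violation(problems, orderings):
    """Definition 5.1: optimal choices are distinct and each is available in every set."""
    optimal = [best(A, orderings[f]) for A, f in problems]
    common = frozenset.intersection(*(A for A, _ in problems))
    return len(set(optimal)) == len(problems) and all(x in common for x in optimal)

def count_pairs(n):
    """Number of pairs {(A, f), (A, g)} with |A| > 1 that are in 2-violation of IIA."""
    X = tuple(range(1, n + 1))
    orderings = {"f": X, "g": X[::-1]}      # "g" plays the role of f' in Example 5.2
    sets = [frozenset(c) for r in range(2, n + 1) for c in combinations(X, r)]
    return sum(in_K_violation([(A, "f"), (A, "g")], orderings) for A in sets)
```

For instance, `count_pairs(4)` returns 11 and `count_pairs(5)` returns 26, matching $2^n - n - 1$.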
Then, if d > 0, so that there is at least one set of choice problems A in K-violation,
we can choose λ > 1 to return the correct number of contexts (this is the Kalai,
Rubinstein & Spiegler (2002) solution).
Claim 1. Suppose p = 0 and d > 0. Then, for any λ > 1,
$$\Pr(K^*_\lambda = K) \to 1 \quad \text{as } n \to \infty.$$
The proof is obvious and omitted.6
5 Notice that every pair of choice problems from $\mathcal{A}$ constitutes a (standard) violation of IIA.
6 In fact, two stronger statements can be made:
5.2 Error (p > 0)
But when the DM imperfectly maximizes, λ > 1 will not always be the optimal
choice for recovery of K. In particular, the presence of error complicates inference
in two ways. First, error may generate choices that cannot be rationalized using
the same ordering, but which should not be mistaken as representing genuine heterogeneity in preference. For example, consider a decision maker who has the single
preference ordering $\succ$, and errs with some small but positive probability p. Then,
a perfect rationalization of n observations is likely to require many more than a
single context, even while the single ordering $\succ$ explains “almost all” observations.
Recall our previous intuition that the criterion in (2) prices each new ordering
at $1/\lambda$ observations. In view of this, the current example suggests that λ must be
small enough ($1/\lambda$ large enough) that an ordering which solely explains error will not
be attributed to the DM. The precise notion of “small enough” will depend on the
probability of error p: the larger p is, the smaller λ must be.
Second, allowing for error may “undo” genuine heterogeneity, allowing the imperfectly maximized dataset to be explained using fewer orderings than the perfectly
maximized dataset. Consider for example a decision maker with the following two
preferences:
$$x_1 \succ_1 x_2 \succ_1 \cdots \succ_1 x_{99} \succ_1 x_{100}$$
$$x_1 \succ_2 x_2 \succ_2 \cdots \succ_2 x_{100} \succ_2 x_{99}$$
Most choice observations that result from maximizing $\succ_2$ can be rationalized using
$\succ_1$, and vice versa. Thus, if $1/\lambda$ is large, the solution to (2) will return a single
ordering, interpreting the remaining observations as error. This example suggests
that λ must be large enough ($1/\lambda$ small enough) that real heterogeneity in preference
is preserved, even in the presence of error. The restrictiveness of this constraint
depends again on p, as well as on the degree of “diversity” in the choice implications
of the context-dependent preferences: the smaller p is, and the more differentiated
the context-dependent orderings are, the smaller λ can be.
Collecting these ideas, the main theorem provides a set of sequences (λn )n≥1
given which the proposed approach recovers the true number of contexts for most
realized datasets, where “most” is quantified by a probability converging to 1.
• Consider any realization of choice problems $\mathcal{A} \supset 2^X \times F$. Then, for every λ > 1, it holds
that $K^*_\lambda = K$.
• The assumption that d > 0 is not necessary; it is sufficient that the set of observations
$\{(x_{A,f}, A) : A \in 2^X, f \in F\}$
is K-rationalizable, and not k-rationalizable for any k < K.
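Theorem 1. For any sequence $(\lambda_n)_{n \ge 1}$, where each $\lambda_n$ lies in the interval with endpoints
$$\frac{K\, 2^{|X|}}{d\,(1-p)^K\, n} \quad \text{and} \quad \frac{1}{pn},$$
$$\Pr\left(K^*_{\lambda_n} = K\right) \to 1 \quad \text{as } n \to \infty.$$
Moreover, the rate of convergence is exponential in n.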
I provide a brief idea of the proof below, and defer the details to the appendix.
The key idea is to identify every dataset with an undirected (hyper)graph7 in the
following way: nodes represent choice observations, and there is an edge between a
set of observations if and only if these observations are not consistent with maximization of any single ordering. The key observation in the proof is that a dataset is
k-rationalizable if and only if the corresponding graph is k-colorable.8 This equivalence is shown by taking each color class to represent consistency with a distinct
ordering. Thus, the problem in (2) can be re-interpreted as finding the smallest
number of colors k such that the greatest number of nodes are k-colorable.
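The equivalence between rationalizability and colorability can be illustrated on tiny datasets. The sketch below is not the construction used in the proof. It rests on the standard fact that a set of observations is consistent with a single strict ordering exactly when the revealed relation "x was chosen over y" has no cycle, and it then searches over all color assignments by brute force. The function names are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter, CycleError
from itertools import product

def consistent(observations):
    """True if one strict ordering rationalizes every (x, A) in `observations`."""
    preds = {}
    for x, A in observations:
        for y in A:
            if y != x:
                preds.setdefault(y, set()).add(x)   # x is revealed preferred to y
    try:
        list(TopologicalSorter(preds).static_order())   # raises on a preference cycle
        return True
    except CycleError:
        return False

def k_colorable(D, k):
    """True if D can be split into k color classes, each consistent with one ordering."""
    return any(
        all(consistent([o for o, c in zip(D, colors) if c == j]) for j in range(k))
        for colors in product(range(k), repeat=len(D))
    )

# Hypothetical toy data: the first two observations contradict each other.
D = [("a", frozenset("ab")), ("b", frozenset("ab")), ("a", frozenset("abc"))]
```

Here `k_colorable(D, 1)` is `False` and `k_colorable(D, 2)` is `True`, so this toy dataset is 2-rationalizable but not 1-rationalizable. The search is exponential in the number of observations and is meant only to make the correspondence concrete.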
Fix any (multi-)set of choice problems A and consider the dataset that would
be generated under p = 0. Since by construction, this dataset is generated by
perfect maximization of K orderings, the corresponding (hyper-)graph must admit
a K-coloring. Moreover, since every set of observations in a K-violation of IIA
constitutes a complete K-partite subgraph, and the data includes at least d such
sets, the corresponding graph includes at least d complete K-partite subgraphs.
So it cannot be colored by fewer than K colors. The differentiation parameter d
captures the “robustness” of this observation to small perturbations in the graph.
Now introduce choice error. Each node (observation) is affected with probability
p, following which its edges are changed. But if p is sufficiently small, then there will
not be enough nodes in error to justify use of a new ordering. And if d is large
enough, so that the number of complete K-partite subgraphs in the original graph
is sufficiently high, then any coloring of the graph with fewer than K colors will be
very incomplete. Thus, in the perturbed graph, which corresponds to the realized
data, most nodes can be partitioned into K colors, and no
fewer.
6 Generalizations
6.1 Relaxing assumption on π
Section 5 focused on the case in which π is uniform over choice problems. The
recovery properties of the proposed approach are not special to this distribution,
and can in some cases be improved using others. First, let us generalize the previous
notion of a differentiation parameter in the following way.
7 A hypergraph is a generalization of a graph in which edges may connect more than two vertices.
8 A k-coloring of a graph is a partition of its vertex set V into k color classes such that no edge
in E is monochromatic. A graph is k-colorable if it admits a k-coloring.
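Definition 6.1. For every set of choice problems $\mathcal{A} = \{(A_1, f_1), \dots, (A_K, f_K)\}$, define $d(\mathcal{A})$ to be the largest integer d such that $\mathcal{A}$ admits d non-overlapping subsets,
denoted $\mathcal{A}_1, \dots, \mathcal{A}_d$, such that for every $i \in \{1, \dots, d\}$,
1. $|\mathcal{A}_i| = K$, and
2. $\mathcal{A}_i$ is in K-violation of IIA.
Taking $\mathcal{A} = 2^X \times F$ returns the definition from Section 5.1. Then:
Proposition 1. Let π be any distribution with the property that
$$\int_{\{\mathcal{A} \,:\, |\mathcal{A}| = n\}} d(\mathcal{A})\, d\pi(\mathcal{A}) \to \delta \quad \text{as } n \to \infty \tag{3}$$
where δ > 0. Then, for any sequence $(\lambda_n)_{n \ge 1}$, where each $\lambda_n$ lies in the interval with endpoints
$$\frac{1}{\delta\,(1-p)^K\, n} \quad \text{and} \quad \frac{1}{pn},$$
$$\Pr\left(K^*_{\lambda_n} = K\right) \to 1 \quad \text{as } n \to \infty.$$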
The condition in (3) says that the differentiation parameter of the realized set
of choice problems $\mathcal{A}$ converges to δ as the number of observations is taken to
infinity. Thus, given sufficiently many observations, there are approximately δn
sets of choice problems in K-violation of IIA. This allows us to distinguish the
various orderings.
Notice that the special case in which π is uniform satisfies the condition in (3)
with $\delta = \frac{d}{K 2^{|X|}}$. In general, the larger δ is, the better the recovery properties. For
example, we can improve upon the main result in Section 5 if π samples only from
choice sets for which the different context-dependent orderings disagree; of course,
this presumes that the analyst has prior knowledge about the orderings.
6.2 Continuous utility functions
So far, we have considered a DM whose preferences are linear orderings over a
discrete set X. I will show now that the results extend in a simple and natural
way to the case in which his preferences are a set of continuous utility functions
$\{u_f\}_{f \in F}$, $u_f : X \to \mathbb{R}$, where (X, τ) is a topological space.
Specifically, let us suppose that the DM is presented with compact choice sets
A ⊆ X, and chooses according to a stochastic choice rule P, which takes choice
problems (A, f) into distributions over A.9 The sampling distribution π is a (Borel-measurable) distribution over choice problems (A, f).10 The sampling distribution
π and stochastic choice rule P induce
$$D^* = \{(x, (A, f)) \mid (A, f) \in \mathcal{A}\},$$
and again we assume that the analyst observes only
$$D = \{(x, A) \mid (x, (A, f)) \in D^*\}. \tag{4}$$
9 Every P(A, f) is a Borel-measurable distribution over X with support contained in A.
10 Take the topology over choice problems to be the product topology of τ and the discrete
topology on F.
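The analogue of the proposed approach then defines
$$\Delta_{D,k} = \min_{|U| = k} \#\Big\{(x, A) \in D : x \neq \operatorname*{argmax}_{x' \in A} u(x') \text{ for any } u \in U\Big\}$$
and the proposed solution is:
Definition 6.2. For every $\lambda \in \mathbb{R}_+$, define
$$K^*_\lambda \in \operatorname*{argmin}_{k \in \mathbb{Z}_+} \; k + \lambda \Delta_{D,k}. \tag{5}$$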
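For the corresponding statement of the recovery result, define
$$x_{A,f} = \operatorname*{argmax}_{x \in A} u_f(x)$$
to be the choice that a perfectly maximizing DM makes given choice problem (A, f),
and define
$$p = \min\{p' : P(A, f)[x_{A,f}] \ge 1 - p' \;\; \forall\, (A, f)\}$$
to be a (uniform) upper bound on the probability of error. Take $d(\mathcal{A})$ to be the differentiation parameter defined in Section 6.1. The following statement is a corollary
to Proposition 1:
Corollary 1. Let π be any (Borel-measurable) distribution with the property that
$$\int_{\{\mathcal{A} \,:\, |\mathcal{A}| = n\}} d(\mathcal{A})\, d\pi(\mathcal{A}) \to \alpha \quad \text{as } n \to \infty \tag{6}$$
Then, for any sequence $(\lambda_n)_{n \ge 1}$, where each $\lambda_n$ lies in the interval with endpoints
$$\frac{1}{\alpha\,(1-p)^K\, n} \quad \text{and} \quad \frac{1}{pn},$$
$$\Pr\left(K^*_{\lambda_n} = K\right) \to 1 \quad \text{as } n \to \infty.$$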
Remark 6.1. Observe that, as before, we impose no parametric assumptions on the
distribution of error. This result shows that parametric assumptions are not required
for recovery, but fails to utilize properties of error that are especially natural in the
continuous setting, and may allow for stronger results.
The key observation is that any choice data of the nature described in (4) can be
mapped into discrete choice data, where we reduce X to the finite set of alternatives
that are chosen in some choice observation. For example, suppose $X = \mathbb{R}$, and the
observed choices are
$$(3, [0, 4]), \quad (2, [1, 4]), \quad (8, [0, 10]).$$
Then, labelling ‘3’ as $x_1$, ‘2’ as $x_2$, and ‘8’ as $x_3$, we can redefine the set of choice
alternatives as $X = \{x_1, x_2, x_3\}$, and the choice data as
$$(x_1, \{x_1, x_2\}), \quad (x_2, \{x_1, x_2\}), \quad (x_3, \{x_1, x_2, x_3\}).$$
This yields a dataset of the nature studied previously. It remains to show that the
new problem posed in (5) is equivalent to the original problem posed in (2), given
the above transformation of the continuous data. This is shown in Lemma 4 in the
appendix.
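The reduction from continuous to discrete data can be sketched as follows, assuming (as in the example) that X = R and every choice set is a closed interval given by its endpoints. The function name `discretize` and the labelling by order of first appearance are illustrative choices, not taken from the paper.

```python
def discretize(observations):
    """Map (choice, (lo, hi)) pairs to discrete observations over the chosen points."""
    chosen = []
    for x, _ in observations:
        if x not in chosen:
            chosen.append(x)                      # label in order of first appearance
    label = {x: f"x{i}" for i, x in enumerate(chosen, start=1)}
    return [
        (label[x], frozenset(label[y] for y in chosen if lo <= y <= hi))
        for x, (lo, hi) in observations
    ]

data = [(3, (0, 4)), (2, (1, 4)), (8, (0, 10))]
discrete = discretize(data)
```

Applied to the three observations above, this yields x1 chosen from {x1, x2}, x2 chosen from {x1, x2}, and x3 chosen from {x1, x2, x3}, which is the discrete dataset displayed in the text.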
7 Can we Recover More?
7.1 Limitations
Section 5 provides conditions under which the problem in (2) recovers the correct
number of contexts with high probability. Is it possible to recover the context-dependent orderings themselves? The following proposition reveals that the answer
is almost always no, even in the idealized p = 0 benchmark.
Definition 7.1. Say that the set of orderings Λ is identifiable if there exists some
dataset D such that
$$\Delta_{D,\Lambda} = 0,$$
and moreover,
$$\Delta_{D,\Lambda'} > 0$$
for every $\Lambda' \neq \Lambda$ with $|\Lambda'| \le |\Lambda|$.
This weak definition of identifiability says that there exists at least one dataset
D such that Λ is the unique set of |Λ| or fewer orderings that perfectly explain
every observation in D.
Proposition 2. No set Λ with |Λ| ≥ 3 is identifiable. If $\Lambda = \{\succ_1, \succ_2\}$, then Λ is
identifiable if and only if the $\succ_1$-maximal alternative is the $\succ_2$-minimal alternative,
and vice versa. Every singleton set $\Lambda = \{\succ\}$ is identifiable.
This proposition does not rely on specific features of the proposed approach
(beyond favoring the use of fewer orderings), but highlights a basic challenge of
inferring multiple preferences from choice data. Indeed, not only is it generally
impossible to recover the exact set of preferences $\{\succ_f\}_{f \in F}$, but it is not always
possible to recover the choice implications of these orderings either. To be more
precise, let us first define a weaker notion of identifiability:
Definition 7.2. Say that sets of orderings Λ and Λ′ are choice-equivalent if an
observation (x, A) is consistent with maximization of some ordering in Λ if and
only if it is also consistent with maximization of some ordering in Λ′. Say that
the set of orderings Λ is choice-identifiable if there exists some dataset D such that
$$\Delta_{D,\Lambda} = 0,$$
and moreover, if
$$\Delta_{D,\Lambda'} = 0,$$
for some other Λ′ satisfying $|\Lambda'| \le |\Lambda|$, then Λ and Λ′ are choice-equivalent.
To give a few examples, the set of orderings $\{\succ_1, \succ_2\}$ with
$$x_1 \succ_1 x_2 \succ_1 x_3$$
$$x_2 \succ_2 x_1 \succ_2 x_3$$
and the set of orderings $\{\succ'_1, \succ'_2\}$ with
$$x_1 \succ'_1 x_2 \succ'_1 x_3$$
$$x_2 \succ'_2 x_3 \succ'_2 x_1$$
are choice-equivalent, since every observation that can be rationalized by either $\succ_1$
or $\succ_2$ can also be rationalized by either $\succ'_1$ or $\succ'_2$. The weaker notion of choice-identifiability says that even though we may not be able to recover the exact set
of preferences $\{\succ_f\}_{f \in F}$, we can recover the choice implications of this set, in the
form of all observations (x, A) that are consistent with maximization of some $\succ_f$.
The negative example below illustrates that choice-identifiability is not guaranteed for all sets of orderings.
Example 7.1. Let $X = \{x_1, x_2, x_3, x_4, x_5\}$. Define $\Lambda = \{\succ_1, \succ_2\}$ to satisfy
$$x_1 \succ_1 x_2 \succ_1 x_3 \succ_1 x_4 \succ_1 x_5$$
$$x_2 \succ_2 x_1 \succ_2 x_3 \succ_2 x_4 \succ_2 x_5$$
and the set of orderings $\Lambda' = \{\succ'_1, \succ'_2\}$ to satisfy
$$x_1 \succ'_1 x_2 \succ'_1 x_3 \succ'_1 x_4 \succ'_1 x_5$$
$$x_5 \succ'_2 x_4 \succ'_2 x_3 \succ'_2 x_2 \succ'_2 x_1$$
Then, every choice observation that is consistent with maximization of some ordering in $\{\succ_1, \succ_2\}$ is also consistent with maximization of some ordering in $\{\succ'_1, \succ'_2\}$,
so that if $\Delta_{D,\Lambda} = 0$, then also $\Delta_{D,\Lambda'} = 0$. But Λ and Λ′ are not choice-equivalent;
therefore, Λ is not choice-identifiable.
7.2 Possibilities
The example above highlights the (general) phenomenon that across sets with a
fixed number of orderings, there is variation in the “richness” of choice implication. Since the approach in (2) penalizes all sets consisting of the same number of
orderings equally, it is biased towards elicitation of sets with richer choice implications. That is, if the decision maker’s context-dependent preferences are many but
similar, the proposed approach will incorrectly interpret the data using orderings
that are fewer but “more different”. This example suggests an alternative penalty
on the “complexity” of the choice model that counts not the number of orderings,
but rather the number of choice implications. For example, we might extend the
approach in (2) to loss functions of the form
$$f(\Lambda) + \lambda \Delta_{D,k},$$
where $f(\Lambda) = \#\{(x, A) : x \text{ is } \succ\text{-maximal in } A \text{ for some } \succ \in \Lambda\}$. This brief discussion illustrates that the approach of trading off choice errors and model-complexity
(reminiscent of statistical regularization) can be extended beyond the particular
loss function studied in this paper. Alternative specifications may prove productive
towards recovery of further features of preference.
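To make the idea of counting choice implications concrete, the following is a minimal sketch that evaluates $f(\Lambda)$ by brute force for the two three-alternative sets of orderings from the choice-equivalence example above. The function names and the representation of an ordering as a tuple (best alternative first) are illustrative assumptions, and the count is taken over all nonempty choice sets $A \subseteq X$, which is also an assumption about the intended domain.

```python
from itertools import combinations

def nonempty_subsets(X):
    """All nonempty choice sets A contained in X, as frozensets."""
    return [frozenset(c) for r in range(1, len(X) + 1) for c in combinations(X, r)]

def maximal(order, A):
    """The alternative in A ranked highest by `order` (a tuple, best first)."""
    return min(A, key=order.index)

def choice_implications(orderings, X):
    """Observations (x, A) such that x is maximal in A for some ordering in the set."""
    return {(maximal(o, A), A) for o in orderings for A in nonempty_subsets(X)}

X = ("x1", "x2", "x3")
similar = [("x1", "x2", "x3"), ("x2", "x1", "x3")]    # orderings differ only at the top
different = [("x1", "x2", "x3"), ("x2", "x3", "x1")]  # orderings differ more widely

f_similar = len(choice_implications(similar, X))
f_different = len(choice_implications(different, X))
print(f_similar, f_different)  # 9 10
```

Both sets contain two orderings, so a penalty on the number of orderings treats them identically, while a penalty on $f(\Lambda)$ would charge the second set slightly more.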
8 Related Literature
8.1 Identifying preferences from choice data
This paper builds on a literature that seeks to nonparametrically identify preferences from choice data. Most directly, it extends ideas in Kalai, Rubinstein & Spiegler (2002), which defines a set of orderings $\{\succ_i\}_{i=1}^{L}$ as a rationalization by multiple rationales if every observed choice is $\succ_i$-maximal in the available choice set for some $i = 1, 2, \ldots, L$. Using the notation of Section 2, any set of orderings Λ
with choice error ∆D,Λ = 0 is a rationalization by multiple rationales of the dataset
D. This set of orderings may not, however, correspond to a best multiple-ordering
rationalization of the data as defined in (2). In particular, I suggest that the analyst
may prefer an imperfect rationalization of the data using some K < L orderings
to a perfect rationalization of the data using L orderings. The key conceptual difference is that Kalai, Rubinstein & Spiegler (2002) is agnostic towards the degree
of evidence for orderings, whereas the approach in this paper insists on sufficient
evidence for each ordering in order to separate error from preference variation.
Other nonparametric approaches for preference identification from choice data
include Houtman & Maks (1985), which has been discussed extensively throughout
this paper, Famulari (1995) and Varian (1982). There is also a separate literature on identification of preferences under various parametric assumptions on the
distribution of error (e.g. Quandt (1956), McFadden & Richter (1970), and Train
(1986)).
Finally, Crawford & Pendakur (2012) and Dean & Martin (2010) respectively
apply the approaches of Kalai, Rubinstein & Spiegler (2002) and Houtman & Maks
(1985) to real data. The former is particularly relevant to this paper. The authors
analyze a dataset of 500 Danish households and their choices between six different
kinds of milk, discovering that five utility functions are sufficient to explain all of
the data. However, the authors point out that the fifth utility function explains
only 8 out of 500 observations, and subsequently drop this utility function in many
of their later analyses. The solution proposed in the present paper is precisely
designed to explain why this might be a reasonable action, and provide theoretical
justifications for where to set the bar.
8.2 Testing rationality
A long literature asks how close choice data is to perfect rationalization by a single preference ordering. This question was originally asked by
Afriat (1967) for choices over price vectors and consumption bundles, and subsequently studied also by Varian (1982), Houtman & Maks (1985), Gross (1989), and
Apesteguia & Ballester (2012), among others.
In view of this literature, a goal of this paper can be viewed as distinguishing
between the case in which choice data doesn’t satisfy perfect rationalization of a
single preference ordering because of random error, and the case in which choice
data doesn’t satisfy perfect rationalization of a single preference ordering because
of multiplicity in preference. In both cases, most of the inconsistency measures
suggested above will be large, since a single ordering fails to explain the data well.
The proposed solution $K^*_\lambda$ offers a way to distinguish between these two cases: under approximately perfect rationalization by multiple preferences, $K^*_\lambda$ will be large and $\Delta_{D,K^*_\lambda}$ will be small, whereas imperfect rationalization by a single preference will return $K^*_\lambda = 1$ with a large value of $\Delta_{D,1}$.
8.3 Multi-self decision making
The model of choice that I consider throughout is most closely related to Rubinstein
& Salant (2008) and Bernheim & Rangel (2009). These papers extend the standard
model to include a set F of contexts (called frames in Rubinstein & Salant (2008)
and ancillary conditions in Bernheim & Rangel (2009)). A choice problem (called
extended choice problem in Rubinstein & Salant (2008) and generalized choice situation in Bernheim & Rangel (2009)) is defined as a pair (A, f ) where A ⊆ X is a
choice set and f ∈ F is a context. An extended choice function c assigns to every
extended choice problem (A, f ) an element of A. I consider an extension of this
model to allow for choice error, so that c(A, f ) is stochastic. The goal of the present
paper differs rather significantly from these earlier papers, raising the new question
of whether it is possible to recover the number of contexts in F using choice data.
Additionally, there are several recent contributions to the literature on multi-self decision-making (e.g. Fudenberg & Levine (2006), Rubinstein & Salant (2006),
Manzini & Mariotti (2009), and Ambrus & Rozen (2013)). Among these, Ambrus
& Rozen (2013) is most related, and shows (among other results) that without
prior restriction on the number of selves involved in a decision, many multi-self
models have no testable implications. Although the set of choice models considered
in Ambrus & Rozen (2013) is different from the set of choice models considered
in my paper,11 their lesson that restricting the number of selves is important for
constraining the available degrees of freedom holds in the setting considered in this
paper as well, and motivates in part the suggested criterion in (2).
9 Conclusion
Inference of context-dependent preferences from choice data is a challenging problem, due to lack of explicit data on choice contexts, and an inherent indeterminacy
of multiple-preference models (see, e.g., Proposition 2). This paper suggests that
we may nevertheless be able to uncover the number of choice contexts from choice
behavior alone. The proposed strategy seeks the smallest number of orderings
that explains the greatest number of choice observations. Working in a model of
context-dependent preferences based on Rubinstein & Salant (2008) and Bernheim
& Rangel (2009), I show that with probability exponentially close to 1, the proposed approach is able to recover the true number of context-dependent preferences.
This provides an alternative to existing approaches, which deliver either a single
“best-fit” ordering or multiple “perfect-fit” orderings.
11 Ambrus & Rozen (2013) study multi-self models in which every self is active in every decision,
and choice is determined through maximization of a choice-set independent aggregation rule over
selves. In contrast, I study multi-self models in which every self acts as a “dictator” in a subset of
choices.
A Preliminaries
Below, I collect definitions and results that will be used in the proofs of Theorem 1 and Proposition 1.
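First, observe that K is the unique solution to the minimization problem in (2) given data D if and only if
$$K + \lambda \Delta_{D,K} < k + \lambda \Delta_{D,k} \quad \forall\, k \in \mathbb{Z}_+,\ k \neq K. \tag{7}$$
It will be useful to rewrite this as
$$\Delta_{D,K} - \Delta_{D,k} < (k - K)/\lambda.$$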
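The criterion can be illustrated on a toy dataset. The sketch below computes $\Delta_{D,k}$ by exhaustive search over all sets of $k$ orderings and then selects the $k$ that minimizes $k + \lambda \Delta_{D,k}$. The function names, the tuple representation of orderings (best first), the cap `k_max` on the search, and the dataset itself are all illustrative assumptions; exhaustive search is only feasible for very small $X$.

```python
from itertools import combinations, permutations

def is_maximal(order, x, A):
    """True if x is the highest-ranked element of A under `order` (best first)."""
    return min(A, key=order.index) == x

def choice_error(D, orderings):
    """Number of observations (x, A) that are not maximal under any ordering in the set."""
    return sum(1 for x, A in D if not any(is_maximal(o, x, A) for o in orderings))

def min_choice_error(D, X, k):
    """Delta_{D,k}: smallest choice error over all sets of k distinct orderings of X."""
    return min(choice_error(D, L) for L in combinations(permutations(X), k))

def best_number_of_orderings(D, X, lam, k_max):
    """Minimize k + lam * Delta_{D,k} over k = 1..k_max (first minimizer on ties)."""
    scores = {k: k + lam * min_choice_error(D, X, k) for k in range(1, k_max + 1)}
    return min(scores, key=scores.get), scores

X = ("a", "b", "c")
# Hypothetical data: two frequent choices from X and one isolated choice of "c".
D = [("a", X)] * 4 + [("b", X)] * 4 + [("c", X)]
K_star, scores = best_number_of_orderings(D, X, lam=0.5, k_max=4)
print(K_star, scores)  # 2 {1: 3.5, 2: 2.5, 3: 3.0, 4: 4.0}
```

With $\lambda = 0.5$, a third ordering that explains a single observation is not worth its unit cost, so the isolated choice is attributed to error rather than to an additional context.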
Fix a number of observations n. The sampling distribution π over choice problems and the DM's stochastic choice rule P induce a distribution $\nu_n$ over datasets of size n, where
$$\nu_n(D) = \nu(\{(x_k, A_k)\}_{k=1}^{n}) = \sum_{f_k \in F} \left[ \prod_{k=1}^{n} \pi(A_k, f_k)\, P(x_k \mid (A_k, f_k)) \right].$$
To simplify notation, I will frequently write
$$\nu_n(D) = \nu(x, A) = \pi(A) P(x \mid A),$$
using $\pi(A) := \prod_{k=1}^{n} \pi(A_k, f_k)$ to mean the probability that the DM sees choice problems $A = \{(A_k, f_k)\}_{k=1}^{n}$, and $P(x \mid A) = \prod_{k=1}^{n} P(x_k \mid (A_k, f_k))$ to mean the probability that the DM chooses alternatives $x = (x_k)_{k=1}^{n}$, given choice problems in A. The probability that $K^*_\lambda = K$ is then the measure assigned by $\nu_n$ to the set of datasets D satisfying (7). To characterize this probability, it is useful to recast the problem in the following way.
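Let $g : D \mapsto G_D$ be a map that identifies every dataset D with a (hyper-)graph $G_D = (V_D, E_D)$, where $V_D = \{1, \ldots, n\}$ indexes choice observations, and $E_D$ consists of every set $T \subseteq V_D$ such that: (1) the observations $\{(x_i, A_i)\}_{i \in T}$ are not 1-rationalizable, and (2) every proper subset of $\{(x_i, A_i)\}_{i \in T}$ is 1-rationalizable. That is, the edges of $G_D$ are the minimal sets of observations that cannot be explained by a single ordering, and a coloring of $G_D$ is proper if no edge is monochromatic. These concepts are related to our problem as follows.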
Claim 2. D is k-rationalizable ⇐⇒ GD is k-colorable.
Proof. Take each color class to represent consistency with a distinct ordering, and
the equivalence follows directly.
This claim further implies that ∆D,k is equal to the minimum number of vertices
that must be removed from GD for the graph to be k-colorable. From here on, I
will refer to the vertices of GD and the observations they represent interchangeably.
Additionally, I will frequently refer to the pushforward measure
$$g_*(\nu_n)(G) = \nu_n(\{D : D \in g^{-1}(G)\}),$$
which is the distribution over graphs induced by the measure $\nu_n$ and map g.
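The equivalence in Claim 2 can be checked by brute force on a small dataset. The sketch below takes the edges of $G_D$ to be the minimal sets of observations that fail to be 1-rationalizable (the reading under which a proper coloring has no monochromatic edge), and compares k-colorability of the resulting hypergraph with a direct test of k-rationalizability. All function names and the toy dataset are illustrative assumptions, and the exhaustive enumeration is only practical for a handful of observations.

```python
from itertools import combinations, permutations, product

def one_rationalizable(obs, X):
    """True if a single ordering of X makes every observation (x, A) in obs maximal."""
    return any(all(min(A, key=o.index) == x for x, A in obs) for o in permutations(X))

def hyperedges(D, X):
    """Minimal index sets T whose observations are not 1-rationalizable."""
    edges = []
    for r in range(2, len(D) + 1):
        for T in combinations(range(len(D)), r):
            if one_rationalizable([D[i] for i in T], X):
                continue
            # Minimality: dropping any one observation restores 1-rationalizability.
            if all(one_rationalizable([D[i] for i in T if i != j], X) for j in T):
                edges.append(set(T))
    return edges

def k_colorable(n, edges, k):
    """True if vertices 0..n-1 can be k-colored with no monochromatic edge."""
    return any(all(len({colors[i] for i in e}) > 1 for e in edges)
               for colors in product(range(k), repeat=n))

def k_rationalizable(D, X, k):
    """True if D can be split into k classes, each explained by a single ordering."""
    n = len(D)
    return any(all(one_rationalizable([D[i] for i in range(n) if colors[i] == c], X)
                   for c in range(k))
               for colors in product(range(k), repeat=n))

X = ("a", "b", "c")
D = [("a", X), ("b", X), ("c", X), ("a", ("a", "b"))]  # hypothetical observations
edges = hyperedges(D, X)
for k in (1, 2, 3):
    print(k, k_colorable(len(D), edges, k), k_rationalizable(D, X, k))
```

In this example the first three observations conflict pairwise, so they form a triangle in $G_D$, and both tests first succeed at $k = 3$.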
We will need a final set of definitions. Say that observation i is perfectly maximized if $x_i$ is $\succ_{f_i}$-maximal in $A_i$. For every set of choice problems A, let $D^*_A$ denote the data that would have been generated if every observation were perfectly maximized, and define $G_{D^*_A} = g(D^*_A)$ to be the corresponding graph.
Observation 2. (a) For any set of choice problems A, the perfect maximization graph $G_{D^*_A}$ is K-colorable. (b) Let $A = 2^X \times F$; then, the graph $G_{D^*_A}$ contains at least d disjoint K-partite subgraphs.
Part (a) follows from the definition of $D^*_A$, and part (b) follows directly from the definition of the differentiation parameter.
B Proof of Theorem 1
The desired result follows from two lemmas.
Lemma 1. There exists a constant $c_1 > 0$ (uniform across n) such that
$$g_*(\nu_n)(\{D : k + \lambda \Delta_{D,k} > K + \lambda \Delta_{D,K}\ \forall\, k > K\}) \ge 1 - e^{-c_1 n} \quad \forall\, n.$$
Proof. Fix any set of choice problems $A = \{(A_k, f_k)\}_{k=1}^{n}$, and let $D = \{(x_k, A_k)\}_{k=1}^{n}$ describe the observed data. Suppose towards contradiction that there exists an integer $k > K$ such that
$$\Delta_{D,K} - \Delta_{D,k} > (k - K)/\lambda. \tag{8}$$
Observe that this implies
$$\Delta_{D,K} > 1/\lambda, \tag{9}$$
since $\Delta_{D,k} \ge 0$ and $k - K \ge 1$. Let $X \sim \mathrm{Bin}(n, p)$ be the number of observations in error. Removing each of these X nodes from $G_D$ results in a subgraph of $G_{D^*_A}$, which we know from Obs. 2 is K-colorable. Thus, $\Delta_{D,K} \le X$, and it follows from (9) that $\Pr(X \le 1/\lambda)$ is an upper bound on the probability that an integer $k > K$ exists such that (8) obtains.
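Since by assumption $1/\lambda < pn$, it follows from Hoeffding's Inequality that
$$\Pr(X \le 1/\lambda) = \Pr(X - pn \le 1/\lambda - pn) \le \exp\left(-\frac{2(1/\lambda - pn)^2}{n}\right) = e^{-2\left(\frac{1}{n\lambda} - p\right)^2 n}.$$
This inequality holds independently of the set A, so we have
$$\begin{aligned}
g_*(\nu_n)(\{D : k + \lambda \Delta_{D,k} > K + \lambda \Delta_{D,K}\ \forall\, k > K\}) &\ge 1 - \sum_{|A| = n} \pi(A) \Pr(X \le 1/\lambda) \\
&\ge 1 - \sum_{|A| = n} \pi(A)\, e^{-2\left(\frac{1}{n\lambda} - p\right)^2 n} \\
&\ge 1 - e^{-2\left(\frac{1}{n\lambda} - p\right)^2 n}.
\end{aligned}$$
Let $c_1 := 2\left(\frac{1}{n\lambda} - p\right)^2 > 0$, and the desired inequality follows.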
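The Hoeffding step can be sanity-checked numerically. For $X \sim \mathrm{Bin}(n, p)$ and a threshold $t < pn$, the sketch below compares the exact lower-tail probability $\Pr(X \le t)$ with the bound $\exp(-2(t - pn)^2 / n)$ used in the proof, taking $t = 1/\lambda$. The parameter values and function names are illustrative assumptions, not values from the paper.

```python
import math

def binom_lower_tail(t, n, p):
    """Exact Pr(X <= t) for X ~ Bin(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j)
               for j in range(0, math.floor(t) + 1))

def hoeffding_lower_tail_bound(t, n, p):
    """Hoeffding upper bound on Pr(X <= t), valid when t < p * n."""
    if t >= p * n:
        raise ValueError("the bound is only applied for thresholds below the mean p*n")
    return math.exp(-2 * (t - p * n) ** 2 / n)

n, p, lam = 200, 0.1, 0.1          # hypothetical values with 1/lam = 10 < p*n = 20
t = 1 / lam
exact = binom_lower_tail(t, n, p)
bound = hoeffding_lower_tail_bound(t, n, p)
print(exact, bound)
```

The exact tail is far below the bound here, which is typical: Hoeffding's inequality is loose for small p but suffices for an exponential rate in n.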
Lemma 2. There exists a constant $c_2 > 0$ (uniform across n) such that
$$g_*(\nu_n)(\{D : k + \lambda \Delta_{D,k} > K + \lambda \Delta_{D,K}\ \forall\, k < K\}) \ge \left(1 - e^{-c_2 n}\right)^{K 2^{|X|}}.$$
Proof. Fix any set $A = \{(A_k, f_k)\}_{k=1}^{n}$ of choice problems, and let $D = \{(x_k, A_k)\}_{k=1}^{n}$ describe the observed data. Suppose towards contradiction that there exists an integer $k < K$ such that
$$\Delta_{D,k} - \Delta_{D,K} < (K - k)/\lambda. \tag{10}$$
Fix any choice problem $(A, f)$. In any given observation, the probability that $(A, f)$ is sampled and $x_{A,f}$ is chosen is $(1 - p)/(K 2^{|X|})$. By assumption, $\lambda > \frac{K 2^{|X|}}{d(1 - p)n}$, so that $\frac{1}{d\lambda} < \frac{(1 - p)n}{K 2^{|X|}}$. Using Hoeffding's Inequality, the probability that $(A, f)$ is sampled and $x_{A,f}$ is selected at least $1/(d\lambda)$ times is therefore at least
$$1 - \exp\left(-2\left(\frac{1}{d\lambda} - \frac{(1 - p)n}{K 2^{|X|}}\right)^2 \Big/ n\right) = 1 - \exp\left(-2n\left(\frac{1}{n d \lambda} - \frac{1 - p}{K 2^{|X|}}\right)^2\right).$$
The probability that every choice problem $(A, f)$ is sampled and perfectly maximized at least $1/(d\lambda)$ times is weakly more than
$$\left[1 - \exp\left(-2n\left(\frac{1}{n d \lambda} - \frac{1 - p}{K 2^{|X|}}\right)^2\right)\right]^{K 2^{|X|}}. \tag{11}$$
Conditional on this event, it follows from Obs. 2 that $G_{D^*_A}$ contains at least $d \cdot \frac{1}{d\lambda} = \frac{1}{\lambda}$ disjoint complete K-partite graphs, so that
$$\Delta_{D,k} - \Delta_{D,K} > (K - k)/\lambda$$
for every $k < K$. This directly contradicts (10), so that the probability in (11) is a lower bound on the probability that no $k < K$ satisfies (10). Let $c_2 = 2\left(\frac{1}{n d \lambda} - \frac{1 - p}{K 2^{|X|}}\right)^2$, and the desired inequality follows.
Combining Lemmas 1 and 2, the probability that K is the unique solution to (2), so that (7) holds for every integer $k \neq K$, is at least
$$\left(1 - e^{-c_1 n}\right)\left(1 - e^{-c_2 n}\right)^{K 2^{|X|}} \longrightarrow 1 \text{ as } n \to \infty.$$
Moreover, this convergence is $1 - O(e^{-cn})$ for $c = \min(c_1, c_2)$.
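As a worked illustration of how the combined bound behaves, the sketch below evaluates $(1 - e^{-c_1 n})(1 - e^{-c_2 n})^{K 2^{|X|}}$ using the constants $c_1$ and $c_2$ from the two lemmas. The parameter values are hypothetical, and $\lambda$ is scaled as $1/n$ so that $n\lambda$ (and hence $c_1$ and $c_2$) stays fixed as n grows; the function name and the range check are assumptions made for this example.

```python
import math

def theorem1_lower_bound(n, lam, p, d, K, size_X):
    """Lower bound on Pr(K*_lambda = K) obtained by combining Lemmas 1 and 2."""
    m = K * 2 ** size_X  # K * 2^{|X|}, the exponent appearing in Lemma 2
    # Range of lambda assumed in the two proofs.
    if not (1 / lam < p * n and lam > m / (d * (1 - p) * n)):
        raise ValueError("lambda is outside the range assumed in the lemmas")
    c1 = 2 * (1 / (n * lam) - p) ** 2
    c2 = 2 * (1 / (n * d * lam) - (1 - p) / m) ** 2
    return (1 - math.exp(-c1 * n)) * (1 - math.exp(-c2 * n)) ** m

# Hypothetical parameters: 3 alternatives, 2 contexts, 10% error rate, d = 1.
params = dict(p=0.1, d=1, K=2, size_X=3)
bound_small = theorem1_lower_bound(n=2000, lam=0.02, **params)
bound_large = theorem1_lower_bound(n=4000, lam=0.01, **params)
print(bound_small, bound_large)
```

Doubling n while holding $n\lambda$ fixed moves the bound noticeably closer to 1, consistent with the exponential rate stated above.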
C Proof of Proposition 1
Lemma 1 follows as proven for Theorem 1. Lemma 2 is replaced with the following:
Lemma 3. For every $n \ge 1$,
$$g_*(\nu_n)(\{D : k + \lambda \Delta_{D,k} > K + \lambda \Delta_{D,K}\ \forall\, k < K\}) \to 1 \text{ as } n \to \infty.$$
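Proof. Fix any set of choice problems $A = \{(A_k, f_k)\}_{k=1}^{n}$. Suppose towards contradiction that there exists some integer $k < K$ such that
$$\Delta_{D,k} - \Delta_{D,K} < (K - k)/\lambda.$$
Write $q_n$ for the probability that $d(A) = \delta$, which implies that $G_{D^*_A}$ includes at least $\delta n$ complete K-partite graphs. By assumption, $q_n \to 1$ as $n \to \infty$. Condition on this event, and label the complete K-partite graphs in $G_{D^*_A}$ using $i = 1, \ldots, \delta n$. Define indicator variables $X_i$, $i = 1, \ldots, \delta n$, to take value 1 if every observation in the i-th complete K-partite graph is perfectly maximized. Now let $X = \sum_{i=1}^{\delta n} X_i$. Notice that every $X_i \sim \mathrm{Ber}((1 - p)^K)$, and also $E(X) = \delta n (1 - p)^K$. Since $1/\lambda < \delta (1 - p)^K n$ by assumption, it follows from Hoeffding's inequality that
$$\begin{aligned}
\Pr(X < 1/\lambda) &= \Pr\left(X - \delta n (1 - p)^K < 1/\lambda - \delta n (1 - p)^K\right) \\
&\le \exp\left(\frac{-2\left(1/\lambda - \delta n (1 - p)^K\right)^2}{\delta n}\right) \\
&= \exp\left(-2 \delta n \left(\frac{1}{\delta n \lambda} - (1 - p)^K\right)^2\right).
\end{aligned}$$
Thus, the probability that $G_D$ contains at least $1/\lambda$ complete K-partite graphs is at least
$$q_n \left(1 - \exp\left(-2 \delta n \left(\frac{1}{\delta n \lambda} - (1 - p)^K\right)^2\right)\right).$$
Let $c_3 := 2\delta\left(\frac{1}{\delta n \lambda} - (1 - p)^K\right)^2$. Since this inequality holds independently of the set A, we have that
$$\begin{aligned}
g_*(\nu_n)(\{D : k + \lambda \Delta_{D,k} > K + \lambda \Delta_{D,K}\ \forall\, k < K\}) &\ge \sum_{|A| = n} \pi(A)\, q_n \left(1 - \Pr(X < 1/\lambda)\right) \\
&\ge \sum_{|A| = n} \pi(A)\, q_n \left(1 - e^{-c_3 n}\right) \\
&\ge q_n \left(1 - e^{-c_3 n}\right).
\end{aligned}$$
Since $q_n (1 - e^{-c_3 n}) \to 1$ as $n \to \infty$, the desired statement follows.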
D Proof of Corollary 1
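Fix any dataset $D = \{(x_i, A_i)\}_{i=1}^{n}$. Collect the chosen alternatives in the set $\bar{X} = \{x_i\}_{i=1}^{n}$, and identify every choice set $A_i \subseteq X$ with a new choice set $\bar{A}_i = \bar{X} \cap A_i$ that includes only the choices from $A_i$ that are also available in $\bar{X}$. Define $\bar{D} = \{(x_i, \bar{A}_i)\}_{i=1}^{n}$.
Lemma 4. For every $\lambda \in \mathbb{R}_+$,
$$\min_{k \in \mathbb{Z}_+} k + \lambda \Delta_{\bar{D},k} = \min_{k \in \mathbb{Z}_+} k + \lambda \Delta_{D,k}.$$
Proof. I will show that $\Delta_{\bar{D},\Lambda} = \delta$ for some set $\Lambda = \{\succ_j\}_{j=1}^{k}$ if and only if $\Delta_{D,U} = \delta$ for some set $U = \{u_j\}_{j=1}^{k}$.
Fix any set of k orderings $\Lambda = \{\succ_j\}_{j=1}^{k}$, each defined on $\bar{X}$, and define $\delta := \Delta_{\bar{D},\Lambda}$. Every $\succ_j$ admits representation via a utility function $\bar{u}_j : \bar{X} \to \mathbb{R}$. Moreover, we can extend each $\bar{u}_j$ to a continuous function $u_j$ on $X$ satisfying $\max_{x \in X} u_j(x) = \max_{x \in \bar{X}} \bar{u}_j(x)$. Then, $x_j = \operatorname{argmax}_{x \in A} u_j(x)$ if and only if $x_j$ is $\succ_j$-maximal in $\bar{A}$. This implies
$$\Delta_{D,\{u_j\}} = \#\left\{(x, A) \in D : x \neq \operatorname*{argmax}_{x' \in A} u_j(x') \text{ for all } j = 1, \ldots, k\right\} = \delta,$$
so that
$$\min_{k \in \mathbb{Z}_+} k + \lambda \Delta_{\bar{D},k} \ge \min_{k \in \mathbb{Z}_+} k + \lambda \Delta_{D,k}. \tag{12}$$
In the other direction, fix a set of k continuous functions $U = \{u_j\}_{j=1}^{k}$, each defined on $X$, and define $\delta = \Delta_{D,U}$. Define for every $u_j$ an ordering $\succ_j$ on $\bar{X}$ that satisfies
$$x \succ_j x' \text{ iff } u_j(x) > u_j(x').$$
Then, $x_j = \operatorname{argmax}_{x \in A} u_j(x)$ if and only if $x_j$ is $\succ_j$-maximal in $\bar{A}$, so that
$$\Delta_{\bar{D},\{\succ_j\}} = \#\{(x, \bar{A}) \in \bar{D} : x \text{ is not } \succ_j\text{-maximal in } \bar{A} \text{ for any } j = 1, \ldots, k\} = \delta.$$
Thus,
$$\min_{k \in \mathbb{Z}_+} k + \lambda \Delta_{\bar{D},k} \le \min_{k \in \mathbb{Z}_+} k + \lambda \Delta_{D,k}.$$
Combine this with (12) and we are done.
It follows from this lemma that $\bar{K}^*_\lambda = K^*_\lambda$ for every choice of λ. Moreover, it is easy to see that
$$d(\bar{A}) = d(A) \text{ and } \bar{p} = p.$$
Thus, the problem posed in Section 6.2 can be mapped directly into a corresponding problem involving choice over a discrete set, and we can apply Proposition 1.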
E Proof of Proposition 2
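It will be useful to identify every ordering $\succ$ on X with an ordinal vector $r_\succ$ such that
$$r_\succ(x) > r_\succ(x') \text{ iff } x \succ x'.$$
First, I will show the following:
Claim 3. If there exist orderings $\succ_1, \succ_2 \in \Lambda$ such that
$$\operatorname*{argmax}_{x \in X} r_{\succ_1}(x) \neq \operatorname*{argmin}_{x \in X} r_{\succ_2}(x) \tag{13}$$
then Λ is not identifiable.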
Proof. Consider any set of orderings Λ for which there exist $\succ_1, \succ_2 \in \Lambda$ such that
$$x_1 := \operatorname*{argmax}_{x \in X} r_{\succ_1}(x) \neq \operatorname*{argmin}_{x \in X} r_{\succ_2}(x) =: x_2. \tag{14}$$
Fix any D such that $\Delta_{D,\Lambda} = 0$. I will show by construction that there exists a set $\Lambda'$ with $|\Lambda'| \le |\Lambda|$ such that also $\Delta_{D,\Lambda'} = 0$.
Define a new ordering $\succ'_2$ to agree with $\succ_2$ everywhere, except that it ranks $x_1$ last.12 Let $\Lambda' = \Lambda - \{\succ_2\} + \{\succ'_2\}$, where the operators denote set addition and subtraction. I will now show that $\Delta_{D,\Lambda'} = 0$.
Suppose towards contradiction that there is some choice observation $(x, A) \in D$ such that x is not $\succ$-maximal in A for any $\succ \in \Lambda'$. By assumption, x is $\succ$-maximal in A for some $\succ \in \Lambda$. If $\succ \neq \succ_2$, then also $\succ \in \Lambda'$, which immediately yields a contradiction.
So it must be that x is $\succ_2$-maximal in A. Now, there are two possibilities: if $x \neq x_1$, then x must also be $\succ'_2$-maximal in A, and we are done. If instead $x = x_1$, then although x is not $\succ'_2$-maximal, it is $\succ_1$-maximal in A by its definition in (14). So we are again done.
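The construction in this proof can be checked mechanically on a small example. The sketch below builds the modified ordering by moving the top alternative of the first ordering to the bottom of the second, and then verifies that every observation rationalized by the original pair is still rationalized by the modified pair. Orderings are represented as tuples (best first); the function names and the three-alternative example are illustrative assumptions.

```python
from itertools import combinations

def choice_implications(orderings, X):
    """Observations (x, A) such that x is maximal in A for some ordering in the set."""
    subsets = [frozenset(c) for r in range(1, len(X) + 1) for c in combinations(X, r)]
    return {(min(A, key=o.index), A) for o in orderings for A in subsets}

def rank_last(order, x):
    """Copy of `order` that agrees with it everywhere except that x is ranked last."""
    return tuple(y for y in order if y != x) + (x,)

X = ("x1", "x2", "x3")
o1 = ("x1", "x2", "x3")
o2 = ("x2", "x1", "x3")

top_of_o1 = o1[0]                     # plays the role of x_1 in (14)
condition_13 = top_of_o1 != o2[-1]    # top of o1 is not already last in o2
o2_prime = rank_last(o2, top_of_o1)

original = choice_implications([o1, o2], X)
modified = choice_implications([o1, o2_prime], X)
covered = original <= modified        # every original implication is preserved
print(condition_13, o2_prime, covered)
```

Here the modified pair is a different set of two orderings that rationalizes every dataset the original pair rationalizes, which is exactly what prevents identification.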
Since every set of three or more orderings satisfies the condition in (13), it immediately follows from Claim 3 that every Λ with |Λ| ≥ 3 is not identifiable.
Now let us consider $\Lambda = \{\succ_1, \succ_2\}$. If (13) is satisfied, it again follows from Claim 3 that Λ is not identifiable. Suppose otherwise, and define
$$D = \{(x, A) : x \text{ is } \succ_1\text{-maximal in } A,\ A \in 2^X\} \cup \{(x, A) : x \text{ is } \succ_2\text{-maximal in } A,\ A \in 2^X\}.$$
Clearly there is no singleton set $\Lambda'$ such that $\Delta_{D,\Lambda'} = 0$. Suppose towards contradiction that there exists some $\Lambda' = \{\succ'_1, \succ'_2\} \neq \Lambda$ such that $\Delta_{D,\Lambda'} = 0$. Suppose
12 That is, for every $x, x' \neq x_1$, $x \succ'_2 x'$ if and only if $x \succ_2 x'$, and for every $x \neq x_1$, $x \succ'_2 x_1$.
w.l.o.g. that $\succ'_1 \neq \succ_1$,13 and index the alternatives such that
$$x_1 = \operatorname*{argmax}_{x \in X} r_{\succ_1}(x)$$
$$x_2 = \operatorname*{argmax}_{x \in X} r_{\succ_2}(x)$$
That is, $x_1$ is ranked first according to $\succ_1$ and $x_2$ is ranked first according to $\succ_2$. First observe that since $(x_1, X), (x_2, X) \in D$, necessarily $x_1$ is highest ranked in $\succ'_1$ and $x_2$ is highest ranked in $\succ'_2$, or vice versa. Suppose the former w.l.o.g. Since $\succ'_1 \neq \succ_1$, there exist alternatives $x_i, x_j$ such that
$$x_i \succ_1 x_j \text{ and } x_j \succ'_1 x_i;$$
that is, $x_i$ is higher ranked than $x_j$ under $\succ_1$ but not under $\succ'_1$. Let
$$A := \{x : x_i \succeq_1 x\}$$
be the set of all alternatives that $\succ_1$ ranks weakly lower than $x_i$. Notice that $x_2, x_j \in A$. Since $x_j \in A$, and $x_j \succ'_1 x_i$ by assumption, $x_i$ is not $\succ'_1$-maximal in A. Moreover, since $x_2 \in A$, and $x_2$ is $\succ'_2$-maximal in X, $x_i$ is also not $\succ'_2$-maximal in A. But then $(x_i, A)$ is not consistent with maximization of either ordering in $\Lambda'$, yielding the desired contradiction.
Finally, every singleton set $\Lambda = \{\succ\}$ is trivially identifiable, taking
$$D = \{(x, A) : x \text{ is } \succ\text{-maximal in } A,\ A \in 2^X\}.$$
13 Otherwise, $\succ'_2 \neq \succ_2$ and the remainder of the proof is correspondingly mirrored.
References
Afriat, Sidney N. 1967. “The Construction of a Utility Function from Expenditure
Data.” International Economic Review 8(1):67–77.
Ambrus, Attila & Kareen Rozen. 2013. “Rationalizing Choice with Multi-Self Models.” Economic Journal.
Apesteguia, Jose & Miguel A. Ballester. 2012. “A Measure of Rationality and
Welfare.” Working Paper.
Bernheim, B. Douglas & Antonio Rangel. 2009. “Beyond Revealed Preference:
Choice Theoretic Foundations for Behavioral Welfare Economics.” Quarterly
Journal of Economics.
Crawford, Ian & Krishna Pendakur. 2012. “How Many Types Are There?” Economic Journal.
Dean, Mark & Daniel Martin. 2010. “How Consistent are your Choice Data?”
Working Paper.
Einav, Liran, Amy Finkelstein, Iuliana Pascu & Mark Cullen. 2012. “How General are Risk Preferences? Choices under Uncertainty in Different Domains.”
American Economic Review.
Famulari, Melissa. 1995. “A Household-Based, Nonparametric Test of Demand
Theory.” Review of Economics and Statistics 77:372–383.
Fudenberg, Drew & David Levine. 2006. “A Dual-Self Model of Impulse Control.”
American Economic Review 96(5):1449–1476.
Gross, Andrew. 1989. Determining the number of violators of the weak axiom.
Technical report University of Wisconsin–Milwaukee.
Houtman, Martijn & J.A.H. Maks. 1985. “Determining all Maximal Data Subsets
Consistent with Revealed Preference.” Kwantitatieve Methoden 19:89–104.
Huber, Joel, John Payne & Christopher Puto. 1982. “Adding Asymmetrically Dominated Alternatives: Violations of Regularity and the Similarity Hypothesis.”
Journal of Consumer Research.
Kahneman, Daniel & Amos Tversky. 2000. Choices, Values, and Frames. The Press
Syndicate of the University of Cambridge.
Kalai, Gil, Ariel Rubinstein & Ran Spiegler. 2002. “Rationalizing Choice Functions
by Multiple Rationales.” Econometrica 70(6):2481–2488.
Manzini, Paola & Marco Mariotti. 2009. “Categorize Then Choose: Boundedly
Rational Choice and Welfare.”
McFadden, Daniel & M.K. Richter. 1970. “Revealed Stochastic Preference.”
Quandt, Richard E. 1956. “A Probabilistic Theory of Consumer Behavior.”
Quarterly Journal of Economics 70:507–536.
Rubinstein, Ariel & Yuval Salant. 2006. “A Model of Choice from Lists.” Theoretical
Economics 1(1):3–17.
Rubinstein, Ariel & Yuval Salant. 2008. “(A,f): Choice with Frames.” The Review
of Economic Studies.
Train, Kenneth. 1986. Qualitative Choice Analysis. Cambridge University Press.
Varian, Hal R. 1982. “The Nonparametric Approach to Demand Analysis.” Econometrica 50(4):945–73.