Two stage boundedly rational choice procedures: Theory and

Two stage boundedly rational choice procedures:
Theory and experimental evidence.
Paola Manzini
Marco Mariotti
Queen Mary, University of London and IZA
Queen Mary, University of London
This version: December 2007
Abstract
We study and test a class of boundedly rational models of decision making
which rely on sequential eliminative heuristics. We formalize two sequential decision procedures, both inspired by plausible models popular among psychologists and
marketing scientists. However we follow a standard `revealed preference' economic
approach by fully characterizing these procedures by few, simple and testable conditions on observed choice. Then we test the models (as well as the standard utility
maximization model) with experimental data. We nd that the large majority of
individuals behave in a way consistent with one of our procedures, and inconsistent
with the utility maximization model.
Corresponding address: Department of Economics, Queen Mary, University of London, Mile End
Road, London E1 4NS, United Kingdom. Part of this work was carried out while Manzini and Mariotti
were visiting the University of Trento. Funding was provided by the ESRC under grant RES-000-220866. We wish to thank both institutions for their support. We are also grateful to Gianni de Fraja,
John Hey, Jonathan Leland, Michele Lombardi, Luigi Mittone, Daniel Read, Ludovic Renou, Karl Schlag,
Nick Vriend, Quan Wen and seminar audiences at the Universities of Amsterdam, Brescia, EUI, Helsinki,
Leicester, LUISS, Malaga, Manchester, St. Andrews, Siena, Zurich and at the ESA 2007 and the APET
2007 conferences for insightful discussions and comments, as well as to the tireless sta of the CEEL lab
in Trento, in particular to Marco Tecilla, for superb programming support. Any error is our own.
1
1
Introduction
In the standard model of decision making, choice behavior is described as the outcome
of the maximization of some binary relation, possibly summarized by a utility function.
Yet, observed choice behavior is often incompatible with this model.1 The main goal of
this paper is to test both the standard model and a class of boundedly rational models of decision making which rely on sequential eliminative heuristics. We formalize two
sequential decision procedures, both inspired by plausible models popular among psychologists and marketing scientists. However we follow a standard `revealed preference'
economic approach by fully characterizing these procedures by few, simple and testable
conditions on observed choice data. Then we test these models, as well as the standard
utility maximization model, with experimental data.
We nd that the large majority of individuals behave in a way consistent with one of
our procedures, and inconsistent with the utility maximization model. Our theoretical
results also allow us to trace the observed departures from maximization to either of two
elementary forms of inconsistency: this helps guide the search for an alternative model.
For the moment let us refer to our procedures as Procedure I and Procedure II.2 For an
informal example of these procedures, suppose a decision-maker (DM) has to choose which
of three wines to drink with his dinner at a restaurant: a moderately priced Australian
(A), an expensive French Bordeaux (B), or a cheap Italian Chianti (C). Procedure I
describes considerations of the following kind. DM looks rst at the origin: he thinks B is
best among European wines, so he prefers B to C, but has never tasted new world wines,
so cannot compare a to either B or C. When he is undecided, DM chooses according to
price, and C is cheaper than A which is cheaper than B. This generates the following
`irrational' behavior: if only A and B are available DM chooses A; if only B and C are
available he chooses B; nally if only A and C are available he chooses C. DM exhibits
pairwise cyclical choices.
Procedure II captures considerations of the following kind. When all wines are avail1
See e.g. Roelofsma and Read [30], Tversky [42], and Waite [45] who nd evidence of pairwise cycles
of choice. The evidence presented in this paper points to further violations of rational behavior.
2
Procedure I was rst introduced in Manzini and Mariotti [22].
2
able, the DM, who is Italian, perceives B as a `right' compromise choice over the `extreme'
choices A (too exotic) and C (too familiar).3 In other words, when viewed as a group
against B, the options A and C share a characteristic, that of appearing extreme, compared to the alternative option. When only binary choices are available, DM looks instead
straight at price. This generates the following additional type of irrational behavior: B
is selected when all three wines are available, but C is selected (based on price) in both
binary contests in which it appears. DM exhibits a strong form of menu dependence.
We shall see below that the two types of `rationality failure' just illustrated are an
exhaustive taxonomy also for general choices.
Formally, in both procedures, DM implements a two-stage algorithm to arrive at a
nal choice: two rationales (asymmetric and possibly incomplete binary relations) are
used sequentially to eliminate alternatives from the choice set. The di erence lies in the
way the rst rationale operates. In procedure I, the rst rationale simply ranks (some of)
the alternatives. In procedure II, the rst rationale ranks instead sets of alternatives. The
interpretation is that the decision-maker categorizes alternatives in groups, and compares
groups rst. In general there may be more than one criterion to categorize alternatives,
so groups might overlap. When two groups are disjoint, they may be related by the rst
rationale. In this case, the entire `losing' group is eliminated. For both procedures, the
second rationale is used to eliminate further alternatives and to single out a choice.
Surprisingly, Procedure II is fully characterized by a single `revealed preference' property, which we call Weak WARP (Theorem 3). Weak WARP is a simple weakening of the
standard Weak Axiom of Revealed Preference (Samuelson [35]). Weak WARP adds to
WARP the clauses between brackets in the following de nition: if x is directly revealed
preferred to y [both in pairwise contests and in the presence of a `menu' of other alternatives], then y cannot be directly revealed preferred to x [in the presence of a smaller
menu]. This characterization is the main theoretical contribution of our work.
Procedure I can be similarly characterized, as we have shown in Manzini and Mariotti
[22]. The choice data can be generated by procedure I if and only if they satisfy, in
3
This particular example is just an application of the well-documented `compromise e ect' (Simonson
[39]) whereby brands improve market share when they are seen as the intermediate rather than an extreme
option in the choice set.
3
addition to Weak WARP, another standard revealed preference property called Expansion.
Expansion says that if x is chosen from two sets of alternatives, then x is chosen when
the sets are merged. Expansion and Weak WARP together still constitute a weakening
of WARP, which implies both properties.
The nature of these procedures can be much better understood in the light of a simple
but crucial result on choice functions. Any failure of full rationality in choice (that is, a
violation of WARP) can be ultimately reduced to one (or both) of just two categories,
illustrated in the previous wine examples: pairwise inconsistent choice and `Condorcet
inconsistent' choice. Condorcet inconsistency captures a speci c type of `menu-e ect' in
choice: an alternative is chosen over each of a number of other alternatives in pairwise
choice, yet it is no longer chosen when all these alternatives are grouped together.4 On the
other hand, pairwise inconsistent choice involves only pairwise comparisons and therefore
does not incorporate any menu-e ect. It thus captures a conceptually separate violation
of full rationality, namely that the pairwise comparisons do not allow the observer to
construct a preference relation with some minimal consistency property. Procedure I can
address pairwise cyclical choice, but not violations of Condorcet inconsistency (which is
a necessary condition for choices generated by that procedure). Procedure II can explain
both types of irrational behavior. Needless to say, even Procedure II is not vacuous as it
can be tested by Weak WARP.
After the theoretical analysis, we put all the decision models discussed to the experimental test. To do so we elicit the choice function of experimental subjects out of all
possible subsets of a given initial set of alternatives. In many decision theory experiments
only pairwise choices (or preferences) are elicited. However, the taxonomy result discussed
above illustrates the importance of observing decision behavior on larger sets in order to
trace the sources of full rationality violations.
In our experiment we use as alternatives time sequences of monetary rewards. Our
data show that WARP is violated by a majority of subjects. One of our main contributions
on the experimental front is an enquiry into the speci c nature of the violations of full
4
Obviously there are other speci cations of what constitutes menu-dependence, which for example
could be `dynamic' (dependence on previous choices or status quo: see e.g. Masatlioglu and Ok [25],
Houy [17] and Koszegi and Rabin [19]). See Sen [38] for a general discussion of menu-dependence.
4
rationality, based on our taxonomy result. In our context, more than 15% pairwise cyclical choices were observed. Nonetheless these violations of full rationality were strongly
associated with violations of Expansion. The consequence of this fact is that Procedure I
in this case does not make a signi cant improvement of the standard maximization model
(of course, Procedure I may be more useful in other contexts and for other purposes5 ).
Our main experimental nding is that Procedure II yields a step change in explanatory
power in the present case. The majority of violations of WARP are associated with
Condorcet inconsistency, so that any model that describes well these experimental data
will need to incorporate this type of menu e ect. Indeed, the large majority of subjects
satis es Weak WARP in all their choices. Note well: this is not simply a consequence of
the fact that Weak WARP, being a weaker axiom, is able to bring in most of the observed
choices. We demonstrate this by using Selten's Measure of Predictive Success, which
takes into account not only the `hit rate' of a model (proportion of correctly predicted
observations), but also its `parsimony' (inversely measured by the proportion of possible
cases in principle compatible with the model). Using our data, Procedure II scores higher
than both competitors also in terms of Selten's Measure. We present our theoretical
results in section 2, while the experiment is presented and analysed in section 3, and
section 4 conludes.
2
Theory
2.1
Preliminaries
In this section we develop and recall some theoretical results which are necessary to design
our experiment.
Let X be a nite set of alternatives and let
5
2X . A choice function on
is a
In Manzini and Mariotti [23] we study Procedure I with speci c decision criteria and obtain a model
of time preferences with signi cantly higher explanatory power than discounting models for pairwise
choices. In general, it is a consequence of our theoretical results that when only pairwise choices are the
object of interest, or when menu-e ects are not likely to play a major role, Procedure I is as a good a
model as Procedure II.
5
function
:
! X, such that
we make on the domain
(S) 2 S for all S 2 . The only additional assumptions6
are that all pairs and triples of alternatives are included in the
domain, that is, for all x; y; z 2 X:
(i) fx; yg 2 ;
(ii) fx; y; zg 2
.
For a binary relation
2X
X denote the
maximal elements of a set S 2
by
max (S; ), that is:
max (S; ) = fx 2 Sj @y 2 S for which y
xg
De nition 1 A choice function is fully rational if there exists a complete order
on X
such that f (S)g = max (S; ) for all S 2 .
As is well-known,7 in the present context the fully rational choice functions are exactly
those that satisfy Samuelson's [35] Weak Axiom of Revealed Preference (WARP), de ned
below:
WARP: If x =
(S), y 2 S and x 2 T then y 6=
(T ).
WARP says that if an alternative is directly revealed preferred to another, the latter
alternative can never be directly revealed preferred to the former (revealed preference is
an asymmetric relation).
2.2
A taxonomy of irrationality
Failures of full rationality may mix together more than one elementary form of inconsistency. To reduce lack of full rationality to its basic building blocks (to be studied in the
experiment), we consider the two violations of WARP discussed in the introduction, menu
dependence and pairwise inconsistency. The latter category involves exclusively choices
between pairs of alternatives, while the former category involves choices from larger sets.
6
The niteness assumption on the domain could be easily dispensed with, and replaced by a well-
behavedness assumption on the sets in
, guaranteeing that complete and transitive relations on them
have a maximal element. This would just complicate notation so we stick with the nite case in the text.
7
See e.g. Moulin [28] or Suzumura [41]
6
The following property captures the elementary form of menu independence:
Condorcet consistency: If x =
x=
(fx; zg) for all z 2 Sn fxg for some S 2
, then
(S).
Condorcet consistency says that if the same alternative is chosen in pairwise contests
against any other alternative in a set, then this alternative will be chosen from the set.
Let
x =
denote the base relation of a choice function , that is x
y if and only if
(fx; yg). Each of the following properties captures elementary forms of pairwise
consistency:
Base transitivity:
Base acyclicity:
Base intervality:
either x
z or w
is transitive.
is acyclic.
satis es the intervality condition:8 If x
y and w
z then
y.
We note rst that all these conditions are equivalent (the simple proof of the following
result is available from the authors):
Proposition 1 A choice function satis es base intervality, if and only if it satis es base
transitivity, if and only if it satis es base acyclicity.
The equivalence of all the base conditions makes it legitimate to speak simply of
Pairwise Consistency when the three conditions are met. The following is our basic
classi cation result:9
Theorem 1 A choice function on satis es WARP if and only if satis es both Condorcet
consistency and Pairwise Consistency.
Proof: It is obvious that a choice function that violates Condorcet consistency also
violates WARP. Suppose it violates base transitivity. Then there exist x; y; z 2 X for
8
Introduced in Fishburn [10].
9
This result is similar but much stronger than Proposition 1 in Manzini and Mariotti [22].
7
which x
y, y
z, and z
x, and by the domain assumption (ii) WARP is
contradicted on fx; y; zg.
For the converse implication, suppose that
S; T 2
(S) 6= y =
be such that x =
base transitivity: we show that then
violates WARP, and let in particular
(T ), x; y 2 T \ S. Suppose that
satis es
must violate Condorcet consistency. By base
transitivity there exist base-maximal elements in S and T . Since
must be asymmetric,
these elements are unique. So there are unique s 2 S and t 2 T such that
s=
(fs; zg) for all z 2 Sn fsg
t=
(ft; zg) for all z 2 T n ftg
If s 6= x, then Condorcet consistency is violated on S. If s = x, then in particular
x=
2.3
2.3.1
(fx; yg). So y 6= t and Condorcet consistency is violated on T .10
Two sequential models of decision making
Rational Shortlist Methods
We rst explain our Procedure I, introduced in Manzini and Mariotti [22]. From now on
we shall call any asymmetric binary relation on X a rationale.
De nition 2 A choice function
if there exists an ordered pair (
is a Rational Shortlist Method (RSM) if and only
1;
2)
of rationales such that:
f (S)g = max (max (S;
In that case we say that (
1;
2)
1) ;
2)
for all S 2
sequentially rationalize .
So the choice from each S can be represented as if the decision maker went through two
sequential rounds of elimination of alternatives. In the rst round he makes a `shortlist' by
retaining only the elements which are maximal according to rationale
1.
In the second
round, he retains only the element which is maximal according to rationale
10
2:
that
Note that this method of proof would yield a similar conclusion on in nite domains, provided pairwise
choices were suitably continuous.
8
element is his choice. Note that, crucially, the rationales and the sequence in which they
are applied are invariant with respect to the choice set. RSM's are in the vein of several
`noncompensatory' sequential eliminative heuristics11 promoted by psychologists (such as
the Elimination by Aspects model by Tversky [43], or the `Fast and frugal heuristics'
by Gigerenzer and Todd [12]) and marketing scientist (such as `greedoid based' choice
algorithms, Yee, Dahan, Hauser and Orlin [46]).
RSM's are characterized by two properties, one of which is a weakening of WARP,
and the other is the classical Expansion axiom:
Weak WARP: For all R; S 2
y 6=
: If fx; yg
R
S and x =
(S) 6= y then
(R).
Expansion: Let fSi g be a class of sets such that Si 2
x=
(fx; yg) =
(Si ) for all i then x =
for all i and [i Si 2
. If
([i Si ).
Weak WARP says that if you choose steamed salmon over steak tartare when they
are the only available choices, and you also choose steamed salmon from a large menu
including steak tartare, then you cannot choose steak tartare from a small menu including
steamed salmon. In other words, if adding a large number of alternatives to the menu
does not overturn a revealed preference, then adding just a subset of those alternatives
cannot overturn the revealed preference either. Expansion says that if steamed salmon is
chosen in each of a series of menus, then it is also chosen when all the menus are merged.
While these are reasonable conditions, in this paper we treat them merely as tractable
`revealed preference' type of properties which are easy to check in an experiment that
provides choice data.
The following characterization result follows easily by adapting the argument in the
proof of Theorem 1 in Manzini and Mariotti [22].
Theorem 2 Suppose the domain
is closed under set union. Then a choice function on
is an RSM if and only if it satis es Weak WARP and Expansion.
11
The adjective `noncompensatory' refers to the fact that while several criteria are used, they cannot
be used to `compensate' each other: unlike the arguments of a utility function, there is no trade-o in
the mind of the decision maker between one criterion and the next.
9
2.3.2
Rational Shortlist Methods by Categorisation
Next, we consider `Procedure II' from the Introduction. The basic idea is that decision
makers may go through an initial `classi cation' stage in which they organise the alternatives into groups, and decide to discard obviously inferior groups. This appears very
natural as soon as the set of alternatives exhibits some complexity. For example, a saver
seeking to invest a lump-sum might immediately discard investments in the stock market
and focus on xed income activities. Or, a prospective diner out may immediately discard
the group of Italian restaurants. Indeed related ideas have been considered in psychology,
where categorization is a widely studied concept in the analysis of human reasoning.12
In marketing science, the well-known `compromise e ect' (Simonson [39]) can also be
captured by the procedure, as illustrated in the wine example in the Introduction.
De nition 3 A rationale by categorization on X is an asymmetric relation
on 2X
satisfying the following two properties:
(i) R \ S = ? whenever R
(ii) jR [ Sj > 2 whenever R
S
S
The interpretation of a rationale by categorization
is that alternatives may be
organized in categories by some criterion. More than one criterion may be relevant, hence
two categories are not necessarily disjoint (think of categorizing a group of people by race
and religion). However, the decision-maker can only compare disjoint categories (condition
(i)). Moreover, what the relation compares are genuine groups: degenerate comparisons
between singletons are not allowed (condition (ii)). For example, with alternatives such as
those used in the experiment, constant streams of monetary payments can form a category
`against' increasing streams, and increasing three-period streams could form a category
against increasing two-period streams.13
12
13
See e.g. Medin and Aguilar [24].
Rubinstein [31] (and more recently [32]) pioneered in economic theory the analysis of similarity con-
siderations in decision-making, which may be related to the idea of categorization (considering alternatives
in the same category as similar).
10
De nition 4 Given a rationale by categorization
and S 2 , the
maximal set on
S is given by:
max (S; ) = fx 2 Sj for no R0 ; R00
S it is the case that R0
R00 and x 2 R00 g
We are now ready for our main de nition:
De nition 5 A choice function
exists a rationale by categorization
is an RSM by categorization if and only if there
1
and a rationale
f (S)g = max (max (S;
1) ;
2)
2
such that:
for all S 2
So, the decision maker looks rst at group rankings, and eliminates any group which
is dominated by another group. Then, she decides among the remaining alternatives on
the basis of the second rationale. For example, if the choice set is comprised of two nonincreasing streams of money and two increasing streams, the decision maker may rst
select the group of non-increasing streams and then select within that group. When this
procedure leads to a single chosen alternative for each choice set, the resulting choice
function is an RSM by categorization.
Below we make some observations which highlight key di erences between the two
sequential procedures we have introduced.
Remark 1 Condorcet consistency is a necessary condition for an RSM. However, there
are choice functions which violate Condorcet consistency and yet are RSMs by categorization. For instance, take the following choice function, with the base relation as indicated
in
gure 1 (where an arrow going from x to y indicates that x is selected in pairwise
choice between x and y): X = fa; b; c; dg,
(fa; c; dg) =
(X) =
(fa; b; cg) =
(fb; c; dg) = b,
(fa; b; dg) = a. Condorcet consistency is violated, since a is chosen
in pairwise comparisons over each of the other alternatives but is not chosen from the
grand set, nor from fa; b; cg. However, this choice function is an RSM by categorization
with fbg
1
fa; cg, and
2
coinciding with the base relation.
11
a
a
d
b
d
b
c
c
Figure 1: Base relation
Figure 2: Base relation
for remarks 1 and 2
for remark 3
Remark 2 Consider the following dual property to Condorcet consistency: an alternative
is not chosen in a set in which it is never chosen in any pairwise choice. Formally, if
x 6=
(fx; zg) for all z 2 Sn fxg and S 2
, then x 6=
(S). It is easy to show that this
property too is a necessary condition for an RSM. However, there exist choice functions
that violate it and yet are RSMs by categorization. Consider the choice function with the
same base relation as the one in gure 1, where
b,
(fa; b; dg) = a and
categorization by fbg
1
(X) =
(fa; b; cg) =
(fb; c; dg) =
(fa; c; dg) = d. It violates the property and is an RSM by
fa; cg, fdg
1
fa; cg and
2
coinciding with the base relation.
Remark 3 An RSM by categorization is not a vacuous notion: there are choice functions
which are not RSMs by categorization. As an example, let X = fa; b; c; dg and let
(fa; b; cg) = a,
(fa; b; dg) =
(fb; c; dg) = b and
(fa; c; dg) = c, with the base relation
as in gure 2. Then, since b is chosen in fa; b; dg, there must be R0
that a is eliminated before it can eliminate b. Then
in view of a =
(X) =
1
R00 , with a 2 R00 , so
cannot be an RSM by categorization
(X).
The choice function in the last example violates Weak WARP. As we show next, this
property alone characterizes an RSM by categorization. To ease notation, xing the choice
function , we de ne the upper and lower contour sets of an alternative on a set S 2
U p (x; S) = fy 2 Xjy
xg \ S
Lo (x; S) = fy 2 Xjx
yg \ S
and
12
as
respectively.
Theorem 3 A choice function is an RSM by categorization if and only if it satis es Weak
WARP.
Proof: Necessity. Suppose that
and
2.
Suppose x =
that y =
is an RSM by categorization with rationales
(fx; yg) and x =
(R) with x 2 R
1
(S) with y 2 S. Now suppose by contradiction
S. This means that x must be eliminated in the rst
round of elimination in R: if not, then either x would eliminate y in the second round, or
(fx; yg) = x would contradict the assumption that
particular there exist R0 ; R00
contradicts x =
1
1
1
and
R00 and x 2 R00 . Since R0 ; R00
2.
In
S this
(S).
Su ciency. De ne: x
De ne: R
R, such that R0
is rationalized by
2
y if and only if x =
S if and only if there exists T 2
(fx; yg).
2
is obviously asymmetric.
such that
R = f (T )g [ Lo ( (T ) ; T )
and
S = U p ( (T ) ; T ) 6= ?
1
is also obviously asymmetric and note that R \ S = ? whenever R and S are related
by
1.
Now let S 2
and let x =
Suppose rst that y
2
(S). We show that x is not eliminated in either round.
x for some y 2 S . Then by construction
fxg [ Lo (x; S)
1
U p (x; S)
and y is eliminated in the rst round.
Next, suppose by contradiction that x is eliminated in the rst round. Then there
exists R0 ; R00
S with R0
1
R00 and x 2 R00 . De ne R = R0 [ R00 . By construction of
it must be
R0 = f (R)g [ Lo ( (R) ; R)
and
R00 = U p ( (R) ; R)
13
1
(Notice that here a separate assumption of closure under union of the domain is not
needed). This means that
x=
Together with x =
x=
(fx; (R)g)
(S) (and noting that R = R0 [ R00
(S) (if R0 [ R00 = S) or Weak WARP (if R0 [ R00
S) this contradicts either
S). It remains to note that y is
eliminated either in the rst round or in the second round for all y 6= x. If y =
then y is eliminated in the rst round. If x =
(fx; yg),
(fx; yg) then y is eliminated in the second
round since as we have seen x survives the rst round.
3
Experiment
3.1
Experimental Design
The experiment was carried out at the Computable and Experimental Economics Laboratory at the University of Trento, in Italy. We ran a total of 13 sessions between July
2005 and February 2006. Participants were recruited through bulletin board advertising
from the student population of the University of Trento. Male and female participants
took part in each experimental session in roughly equal proportions. The experiment
was computerised, and each participant was seated at an individual computer station,
using separators so that subjects could not see the choices made by other participants.
Experimental sessions lasted an average of around 26 minutes, of which an average of
18 minutes of e ective play, with the shortest session lasting approximately 16 minutes
and the longest around 37 minutes. We considered two treatments, one in which subjects
received only a C
=5 showup fee (a total 56 subjects in 4 sessions), and one with payments
based on choice, where as we explain more in detail below an additional C
=48 were made
available to each subject (a total of 102 subjects in 9 sessions).14 We will refer to these two
treatments as HYP (for hypothetical) and PAY (for paid), respectively.15 At the begin14
The show up fee alone, for an average of less than thirty minute long experimental session, was higher
than the hourly pay on campus, which was C
=8. At the time of the experiments the exchange rate of the
Euro was approximately C
=1=$1.2=$0.7.
15
Distinguishing by treatment, sessions lasted an average of about 28 minutes for the PAY treatment,
of which an average of just above 19 minutes of e ective play; and an average of around 22 minutes for
14
ning of the experiment subjects read instructions on their monitor, while an experimenter
read the instructions aloud to the participants.16 In each treatment, each experimental
subject was presented with 23 di erent screens. Each screen asked the subject to choose
the preferred one among a set of alternative remuneration plans in installments to be
received staggered over a time horizon of nine months, each consisting of C
=48 overall.
Instructions were the same in both treatments, bar for one sentence, which in the HYP
treatment clari ed that choices were purely hypothetical, so that the only payment to be
received would be the show up fee; whereas for the PAY treatment it was explained that
at the end of the experiment one screen would be selected at random, and the preferred
plan for that screen would be delivered to the subjects.17
Choices were based on two sets - depending on the number of installments - of four
plans each, namely an increasing (I), a decreasing (D), a constant (K) and a jump (J)
series of payments, over either two or three installments, as shown below. Though in both
cases payments extended over nine months, because of the di erent number of installments
we abuse terminology and refer to `two-period' and `three-period' sequences rather than
two/three-installment sequences:
Two period sequences
I2
in three months
in nine months
D2
16 32
32 16
Three period sequences
K2 J2
I3
D3
K3 J3
24
8
24
16
8
in three months
16 16
16
8
in six months
24 8
16
32
in nine months
24
8
40
Table 1: the base remuneration plans
Each subject had to choose the preferred plan from each possible subset of plans within
each group (making up 11 choices per group). In addition, in a 23rd question subjects were
asked to choose between the three period sequences SJ=(8,32,8) and SI=(24,8,16). This
the HYP treatment, of which an average of about 16 minutes of e ective play.
16
17
See the appendix for the translation of the original instructions (in Italian).
The experimental lab has a long tradition, so there was no issue of (mis)trust in receiving delayed
payments. At the time of writing all subjects have been paid.
15
was needed to address an issue for a di erent experiment, which we discuss elsewhere.18
The use of payment plans as alternatives stems from our desire to present the experimental subjects with alternatives of a `richer' description than mere money amounts
(sequences can be classi ed by their `shapes'), but which at the same time are objectively
expressed in money terms, so as to reduce di cult to control emotive connotations (which
might happen, for example, by using food items).
Figure 3 displays sample screenshots of the choices subject had to make. The participants made their choice by clicking with their mouse on the button corresponding to
the preferred remuneration plan. Once made, each choice had to be con rmed, so as to
minimize the possibility of errors. Both the order in which the twenty-three questions
appeared on screen and the position of each option on the screen was randomized.
Figure 3: Sample screenshots
An unusual feature of the experiment is that we elicited the entire choice functions
with domain over all subsets for each of the two grand sets X2 = fI2; D2; K2; J2g and
X3 = fI3; D3; K3; J3g. This enables us to check whether or not the axioms discussed in
section 2 hold. In particular, we can assess (i) what the main reason is for the failure of
18
See Manzini, Mariotti and Mittone [23]
16
full rationality (violation of Pairwise Consistency or violation of Condorcet consistency),
and (ii) what proportion of choice functions can be rationalized in the standard way, what
proportion is an RSM and what proportion is an RSM by categorization.
3.2
3.2.1
Experimental results I: Evaluating the models
Choosing at Random?
We begin by noting that, whatever the subjects are doing, they are certainly not pressing
buttons at random.19 Since we are eliciting the entire choice functions from universal
sets with four alternatives, with a uniform probability distribution on each choice set, the
probability of observing even only two subjects with the same choice is e ectively zero for
all practical purposes. In fact, as there are a possible 26
34
4 = 20; 736 choice functions
on each universal set, the probability of any given choice function being picked by two
subjects is (20; 736)
2
= 2:325 7
10 9 . On the contrary for both treatments and for
both universal sets X2 and X3 we nd almost half of the subjects with the same modal
choice function. For illustration we report the frequency distributions of the observed
choice functions only graphically in Figure 4 (we omit labels for legibility).20
3.2.2
Which Type of Rationality Failure?
Before turning to the evaluation of the models,21 we check which, if any, of the two
distinct elementary failures of full rationality highlighted in Theorem 1 in section 2.2 is
more prevalent. To this e ect we begin by reporting aggregate data for the violations
of Pairwise and Condorcet Consistency for each of the choice functions elicited (i.e. the
choice functions from 2X2 and from 2X3 for each treatment):
19
Purely random choice is an important benchmark. Within consumer choice, the idea was rst ad-
vanced by Becker [4] and it is used for example as the alternative hypothesis in the popular Bronars [6]
index of power for nonparametric revealed preference tests. See Andreoni and Harbaugh [2] for a recent
discussion of this issue.
20
The corresponding data are reported in tables2 and 3 in the technical appendix available online at
http://www.qmul.ac.uk/ ugte172/twosimiltechnical and data appendix.pdf.
21
All the exact statistical analysis has been carried out usting StatXact, v.7. For a comprehensive
treatment of exact and other methods in categorical data analysis see Agresti [1].
17
50%
PAY
HYP
45%
40%
Percent
35%
30%
25%
20%
15%
10%
5%
choice functions, 2-period sequences
50%
PAY
choice functions, 2-period sequences
HYP
45%
40%
Percent
35%
30%
25%
20%
15%
10%
5%
choice functions, 3-period sequences
choice functions, 3-period sequences
Figure 4: Frequency distributions of choice pro les by treatment and sequence length
18
PAY
HYP
2 periods
3 periods
2 periods
3 periods
#
#
#
#
%
%
%
%
Condorcet Consistency
39 38.2
30 28.4
14 25
12 21.4
Pairwise Consistency
12 11.8
7
3
3
6.9
5.4
5.4
Table 2: Axiom violations in the two treatments.
Since we are mainly interested in the decisions of individual subjects over all their
choices, we report below a summary of overall violations of Condorcet Consistency and
Pairwise Consistency by experimental subject:
PAY
#
%
Condorcet Consistency 51 50
Pairwise Consistency
HYP
#
%
22 39.3
17 16.7 4
7.1
Table 3: Overall violations of PC and CC.
From table 3 it emerges that failures of Condorcet Consistency are substantially more
frequent than violations of Pairwise Consistency. This di erence is statistically significant, regardless of treatment. In fact, the McNemar test of the hypothesis that the
proportions of subjects violating Condorcet Consistency is the same as the proportion of
subjects violating Pairwise Consistency yields exact p-values of 0:009 in the case of the
PAY treatment, and of 0:001 in the case of the HYP treatment. If we then look at the
di erences in the proportion of violations of each of the two axioms across treatments,
the fall in the proportion of violations when moving from the PAY to the HYP treatment
is not statistically signi cant: Fisher test's exact mid-p values are 0:110 for Condorcet
Consistency and 0:470 for Pairwise Consistency.
19
3.2.3
How Well Do the Models Describe the Data?
Next, we turn to the three models examined in section 2.3, and we study the violations of
the axioms which characterize those models.22 Recall that one model is the full rationality
model (characterized by WARP), the other is the RSM model (characterized by Weak
WARP and Expansion) and the third is the RSM by categorization model (characterized
by Weak WARP). Again we start by looking at data for each choice function, reported in
table 4.
PAY
HYP
2 periods
3 periods
2 periods
3 periods
#
#
#
%
#
%
10.7
5
8.9
%
%
Weak WARP
22 21.6
12 11.7
6
Expansion
39 38.2
30 29.4
14 25
13 23.2
WARP
43 42.2
30 29.4
16 28.6
13 23.2
Table 4: Violations of the axioms used for rationalizability.
Table 4 shows that WARP is violated quite considerably in the PAY treatment, and
less so - though still substantially - in the HYP treatment. Expansion is violated slightly
less often than WARP overall, but much more often than Weak WARP.
Overall violations by each individual, regardless of choice set, are reported in table
5. The proportion of subjects violating each axiom falls when moving from the PAY to
the HYP treatment. Of these di erences, those concerning WARP and Weak WARP
are statistically signi cant, while for Expansion this is not the case (Fisher test's exact
mid-p values are 0:110 for Expansion, 0:042 for WARP and 0:022 for Weak WARP). Like
Table 4, Table 5 also suggests similar rates of violation for Expansion and WARP (50%
22
In our experiment we use a small universal set of alternatives. Evidence for choice from budget sets
includes Fevrier and Visser [9], Mattei [27] and especially Sippel [40], who nd substantial violations of
the Generalized Axiom of Revealed Preferences in choices out of budget sets. However, Andreoni and
Harbaugh [2] argue that most of these violations are `small' on the basis of Afriat's e ciency index.
Indeed Harbaugh, Krause and Berry [15] and especially Andreoni and Miller [3] nd that subjects have
choices consistent with GARP in experiments with budget sets.
20
and 52:9%), and substantially lower rates for Weak WARP compared to either of the
other axioms (28:4%). Within treatment, however, the only meaningful comparison in
the di erence of proportions is between failures of Expansion and Weak WARP, which
are the only two independent axioms.23 Here the hypothesis of equality in the proportion
of subjects violating the two axiom is rejected (McNemar's exact p-value is 0:002 in the
PAY treatment and 0:041 in the HYP treatment).
PAY
#
%
HYP
#
%
Expansion
51 50
22 39.3
Weak WARP
29 28.4
8
WARP
54 52.9
22 39.3
14.3
Table 5: Overall axiom violations.
Table 5 con rms that WARP, i.e. the full rationality model, does not describe the
data well, especially in the PAY treatment where less than half of the subjects t the
model.
Consider now RSMs. The crosstabulation of violations of the two axioms characterizing it is reported in table 6. Interestingly, in both treatments, no individual who satis es
Expansion violates Weak WARP. That is, the (large) number of Expansion violators is
not joined by another separate group of Weak WARP violators in order to determine
the RSM violators. The RSM violators are simply counted by Expansion violators (of
which some are also Weak WARP violators). So although Weak WARP and Expansion
are logically independent properties, in our experimental sample they are not statistically
independent. This also shows that, quite remarkably, although RSM is a much weaker
notion than full rationality, in our experimental sample it does not do a substantially
better job than the fully rational model at explaining the data.
23
For comparisons between the proportion of violations of other pairs of axioms it is not possible to
rely on a McNemar test, as violations of either Expansion or Weak WARP imply violations of WARP
(i.e. the relevant contingency table would have structural zeroes). We defer tackling of this issue to our
discussion of the relative performance of alternative theories further below.
21
PAY
HYP
Expansion
Expansion
X
#
Weak WARP
X
%
X
#
%
#
%
#
%
29 28.4
0
0
8
14.3
0
0
22 21.6
51 50
14 25
Weak WARP
34 60.7
X
Table 6: Violations of Weak WARP and Expansion
RSMs improve only marginally on order maximization in their ability to explain the
data for the PAY treatment (decreasing the violations from 52:9% in the case of WARP
to 50% in the case of RSM), and they are as bad in the HYP treatment. This is due
to the fact that binary cycles, which violate full rationality but not RSM, are not a very
relevant phenomenon here, unlike menu e ects (in the sense of violations of Condorcet
Consistency), which cannot accommodated by either of these two theories.
Finally, we turn to Weak WARP and the model of RSM by categorization. From
Table 5 we can see that Weak WARP is satis ed by just below 72% of the subjects in the
PAY treatment and just below 86% of the subjects in the HYP treatment.
In summary then:
PAY
#
Full rationality
%
48 47.1
HYP
#
%
34 60.7
Rational Shortlist Method 51 50
34 60.7
RSM by categorization
48 85.7
73 71.6
Table 7: Explanatory power of competing theories.
The three models are nested, that is
Full Rationality ) Rational Shortlist Method ) RSM by categorization
22
In order to compare the incremental `explanatory' power in each more general theory in
a meaningful way, we have to take into account this nestedness. We do so in the next
section.
3.2.4
Comparing Models with Selten's Measure of Predictive Success
In order to compare the models we use Selten's Measure of Predictive Success (Selten [36]).
This measure was speci cally designed to evaluate `area theories' like the ones considered
in this paper, namely theories that exclude deterministically a subset of the possible
outcomes. The measure takes into account not only the `descriptive power' of the model
(measured by the proportion of `hits', the observed outcomes consistent with the model),
but also its `parsimony'. The lower the proportion of theoretically possible outcomes
consistent with the model, the more parsimonious the model. In our speci c case, one
possible criticism of our `revealed preference' type of tests might be that the experiment
does not have enough power to reject Weak WARP even if it happened to be the wrong
hypothesis. For example, if each universal set consisted of only three, instead than of
four, alternatives, Weak WARP could never be violated. But the observed 100% hit rate
should be interpreted as a failure of the experimental design rather than a validation of
the model of RSM by categorization. By introducing the `parsimony' element, Selten's
measure would pick up this problem. More precisely, the measure, denoted s, is expressed
as
s=r
a
where r is the descriptive power (number of actually observed outcomes compatible with
the model divided by the number of possible outcomes) and a is the `relative area' of the
model, namely the number of outcomes in principle compatible with the model divided
by the number of all possible outcomes. In the hypothetical example of a universal set of
three alternatives, s = 0 for the model of RSM by categorization
In our experiment, we observed two choice functions for each subject. So the number
of all logically possible observations of choice behavior for each subject was (20; 736)2 .
A `hit' consists of the subject not violating the axioms (characterizing a speci c model)
in either of the two choice functions. Thus the values of r for the various models are in
23
Table 7. In order to compute a for a model we have to compute the proportion of choice
functions compatible with the set of axioms characterizing that model. Let's consider the
Full Rationality model rst. For each universal set of alternatives, the number of choice
functions satisfying WARP is simply the number of strict orderings on a set of cardinality
four, that is 4! = 24. Therefore for each individual there are only 242 possible patterns of
behavior in the experiment compatible with the Full Rationality model. The area of the
Full Rationality model is:
aF R =
24
20; 736
2
0
The Full Rationality model is thus a beautifully parsimonious model whose Selten's Measure of Predictive Success is entirely determined by its descriptive power. From Table 7
above we thus have, for the PAY and HYP treatments, the following values of Selten's
Measure for this model:
sFP R
AY
0:471
sFHYR P
0:607
The RSM model, as we have seen, does not improve signi cantly in this experiment on
the Full Rationality model even in terms of sheer descriptive power. So a fortiori it is not
an interesting competitor to the latter in terms of overall predictive success. Therefore
we move to the computation of the values for the model of RSM by categorization.
We need to compute rst the number of choice functions that satisfy Weak WARP. This
can be done by considering four possible exhaustive and mutually exclusive con gurations,
depending on whether the choice from the grand set,
(X), is selected in pairwise choice
over all three other alternatives (Con guration A), over exactly two other alternatives
(Con guration B), over exactly one other alternative (Con guration C), or over none of
the other alternatives (Con guration D).
How many possible cases of Con guration A are there in which Weak WARP is satised? The choices for all sets of cardinality three which include
(S) are forced, as
(S)
must be chosen from them in order not to violate Weak WARP. So, the four possible
choices of
(S) can be combined with the three possible choices from the set of cardinal-
ity three not including
(S), and with the 23 possible choice combinations from the three
24
binary sets that do not include
(S). All in all, we have 4
3
23 = 96 possible choice
functions compatible with Weak WARP in this con guration.
For Con guration B, we have rst of all that the four possible choices for
be combined with three possible choices for the alternative which is chosen over
binary comparison. For each of these combinations, there are 3
(S) can
(S) in
22 choice combinations
from the sets of cardinality three (since only two choices compatible with Weak WARP
can be made from the two sets of cardinality three that include
(S) and the alternative
that `beats' it in binary comparison), together with the 23 choice combinations from the
binary sets that do not include
(S). All in all, we have 4
3
(3
22 )
23 = 1; 152
possible choice functions compatible with Weak WARP in this con guration.
Reasoning along similar lines leads to the count of 3; 456 possible choice functions
compatible with Weak WARP in con guration C, and to 2; 592 in Con guration D.
Adding up, in total there are 7; 296 possible choice functions compatible with Weak
WARP when the grand set has cardinality four, and thus (7; 296)2 possible types of choice
behavior compatible with Weak WARP in our experiment. This leads to the area value
a
RSM C
=
7; 296
20; 736
2
= 0:123
Our model of RSM by categorization is as expected less parsimonious than the Full
Rationality model. However let's consider the values for Selten's Measure in the two
treatments:
C
sRSM
= 0:716
P AY
0:123 = 0:593
C
sRSM
= 0:857
HY P
0:123 = 0:734
We conclude that the predictive success of the model of RSM by categorization is superior
in both treatments to that of the Full Rationality model.
3.2.5
Summary and Comment.
The general indication we draw from the data is that a model addressing the lack of full
rationality in the choice function must be able to explain menu-e ects in the form of
Condorcet inconsistency.
25
This indication is con rmed in the analysis of the three models we have studied in
this paper. Neither the full rationality nor the RSM model are compatible with menu
e ects of the Condorcet inconsistency type, and indeed they both fail at explaining the
data. The RSM model performs only marginally better than the full rationality model.
The proportion of successes in explaining behavior is not increased signi cantly when
weakening WARP to the combination of Expansion and Weak WARP.
The model of RSM by categorization is compatible with Condorcet inconsistency, and
it is successful. There is a signi cant leap in the proportion of successes in explaining
behavior when weakening WARP to Weak WARP. The resulting model can explain 50%
more data compared to the other two models, namely over 70% in one treatment and over
85% in the other treatment.
In addition, the loss of parsimony of the model of RSM by categorization is much
smaller than the increase in descriptive power, as evidenced by its superiority on the
basis of Selten's Measure of Predictive Success.
3.3
Experimental results II: Other considerations
The data allow a plethora of additional considerations - due to space limitations we cannot
analyze them all in this paper, and limit ourselves to highlighting just a few. Two clear
patterns concerning violations of the axioms that emerge from both the aggregate data
tables and the individual choice data tables are the following:
1. People are more consistent for longer sequences. For each axiom considered, the
proportion of choices or subjects violating it falls as sequence length increases, irrespective of treatment.
2. People are more consistent if they are not paid. For each axiom considered, the
proportion of choices or subjects violating it falls when incentives are removed, i.e.
when passing from the PAY to the HYP treatment, regardless of sequence length.
In addition, by looking at the crosstabulation of violations of each axiom by sequence
length, we can measure, for each axiom, the proportion of subjects failing to satisfy it for
26
at least one choice function. Crosstabulations of this sort allow us to test for each axiom
the following:
within each treatment: (i) the statistical signi cance of the fall in the proportions
of violations when going from shorter to longer sequences, and (ii) whether or not
violations observed for di erent sequence length are associated.
across treatments: whether, controlling for sequence length, the proportion of violations depends on treatment, i.e. whether elicitation of choices by incentive compatible means in the PAY treatment results in a di erent proportion of subjects
violating each axiom as compared to the HYP treatment.
We summarize our main ndings in table 8 below.24 Notationwise,
and
2
3
refer to
the proportions of subjects violating an axiom in choices involving two and three period
sequences, respectively, for any given treatment. In addition, for any given sequence
length,
P AY
and
refer to the proportions of subjects violating an axiom in the
HY P
PAY and HYP treatment, respectively.
Within treatment
PAY
2
>
3
random errors
2
>
3
P AY
random errors
2 periods
X
X
X
PC
X
Weak WARP
EXP
HYP
X
CC
Across treatment
HY P
3 periods
X (10%)
X
(10%)
X (10%)
X
X
WARP
>
X
X
Table 8: Comparisons of proportions and association.
In the leftmost part of table 8 (under the heading `within treatment') we report (i)
whether or not
2
is statistically larger than
3,
and (ii) whether violation of an axiom
for shorter sequences makes it any more likely that the subject violates the same axiom
24
The detailed derivation of this summary is reported in Appendix C in the technical appendix available
online at http://www.qmul.ac.uk/ ugte172/twosimiltechnical and data appendix.pdf.
27
when choosing out of longer sequences, too. If this is not the case, one may assume that
di erences in the proportions of violations across sequence length are due to the subjects
making mistakes independently from one another - in table 8 this lack of association is
referred to by the shorthand `random errors'. In each column, we use a tick (X) when
the relevant statistic is such that the heading in the table `holds', and a cross ( ) when
the heading in the column `fails'. So for point (i), a tick indicates that
larger than
3,
2
is statistically
while a cross indicates that the di erence in proportions is not statistically
signi cant.25 Regarding (ii) instead we use a tick to indicate that indeed di erences may
be just random, and a cross when this is not the case.26
The rightmost part of the table instead reports whether, for any given sequence length,
the fall in the proportion of subjects violating a given axiom when moving from the PAY
to the HYP treatment is statistically signi cant (which we denote by a tick X) or not
(which we denote by a cross
25
).27
To be precise, the null hypothesis of the test for (i) is that the proportion of violations is the same
regardless of sequence length against a one sided alternative that the proportion of violations for two
period sequences is larger than for three period sequences. Then the tick refers to the null being rejected.
To test this hypothesis we rely on McNemar' statistic.
26
To be precise, the null hypothesis of the test is for lack of association between rows and columns
in the cross-tabulation (i.e. the odds ratio is equal to 1). If this hypothesis is rejected, then rows and
columns are associated, i.e. a subject is much more likely to violate the axiom in choice among three
period sequences when he has done so in choice out of two period sequences too. In the table we abuse
terminology for the sake of clarity, so that a tick corresponds to a rejection of the null hypothesis, while
a cross stands for failure to reject. We base this test on Fisher's statistic.
Note that tests (i) and (ii) are independent, in the sense that a high p-value in the McNemar test
does not necessarily imply a high p-value of the Fisher test, and viceversa. For instance, in the table
long
short
yes
no
yes
2
8
no
1
4
based on the Mc Nemar test one rejects the null of equality of proportion whereas
based on the Fisher test one fails to reject the null of lack of association between rows and columns.
27
To be precise, the null hypothesis of the test is for equality in the proportion of subjects violating
a given axiom across the PAY and HYP populations, based on the Fisher test (i.e. the odds ratio for
the table with treatments against violation is equal to unity). If this hypothesis is rejected (against the
one sided alternative that the proportion of violations in the PAY treatment is larger than in the HYP
28
There is no clear pattern of association across sequence length for the violations of
each axiom (`random error' columns). Broadly, di erences in choice behavior between two
and three period sequences are more pronounced in the PAY than in the HYP treatment
(`
2
>
3'
columns). Moreover choice behavior for three period sequences does not di er
much between the two treatments, whereas it does for choices over two period sequences
(`
P AY
>
HY P '
columns). However, a quick inspection of choice behavior by subject
(table 7) shows that for all of the three models analyzed, their ability to explain the
data increases in the HYP treatment as compared to the PAY treatment: when moving
from the PAY to the HYP treatment, the percentage of subjects whose choice function
is an RSM increases by 10:7%, the percentage of subjects whose choice function can be
rationalized in the standard way increases by 13:6%, and the percentage of subjects whose
choice function is an RSM by categorisation increases by 14:1%. For the latter two notions
of rationality these increments are statistically signi cant,28 and `just' not signi cant for
rational shortlist methods.29
Comment. Our data show a very clear pattern whereby monetary incentives to
elicit choices which are the expression of `true' preferences have the e ect of producing
less `rational' behavior. Providing a rigorous explanation for this phenomenon would go
beyond the scope of this paper and the bounds of economics. Still, this seems to open
a di erent angle to the discussion on the role of monetary incentives in experiments. In
the economics literature this generally revolves around whether or not monetary incentives are necessary to elicit `true' preferences or the `best' outcome (see e.g. Camerer
and Hogarth [7], Hertwig and Ortmann [16], Read [29] and Harrison and Rutstr•om [8]).
However, we note that an empirical regularity in experiments is that subjects are upset
treatment), then the two proportions are statistically di erent. In the table we abuse notation for the
sake of expositional clarity, so that a tick corresponds to a rejection of the null hypothesis, while a cross
stands for failure to reject.
28
In comparing proportions of violations in the PAY and HYP treatments, Fisher test's exact mid-p
values are 0:042 for WARP, i.e. standard rationalisability; and 0:022 for Weak WARP, i.e. RSM by
categorisation.
29
The exact mid-p value from the Fisher test is is 0:101.
29
when confronted with their own inconsistencies.30 One might argue that the absence or
presence of monetary incentives constitutes a change of (experimental) `frame', so that
what matters is not the composition of the set from which the choice is going to be made,
rather the objects it includes and whether or not monetary incentives for choice exist. In
other words, the choice set when any alternative once chosen is then going to be experienced is a di erent object from a choice set with the same set of available alternative but
where choice itself is just a thought experiment. In addition, there may be other `external'
relevant dimensions to the problem, such as the decision makers' values, motivations and
so on, which might in uence choice.31 Based on these considerations, we advance the
tentative hypothesis that our results, too, support the position that incentive compatible
elicitation of preference is necessary to elicit preferences that are closer to those that a
decision maker would display in a real life choice situation. Where choices are only hypothetical in nature, as in the HYP treatment, the decision maker's main concern is that
of being consistent, resulting in less frequent violations of the axioms. In this sense, our
results invite caution in the use of introspection for testing the `plausibility' of competing
axioms of choice.
3.4
Experimental results III: `La donna e' mobile'
When looking at violations of the axioms across sexes, a pattern emerges: within each
treatment the proportion of women that violate the axioms is higher than men, albeit
with one exception. In addition, the pattern we highlighted in the previous section whereby with the removal of monetary incentives for choice the proportion of violations
of our axioms decreases - persists regardless of sex, as shown in Table 9.
The statistical signi cance of these di erences, however, is in general weak. In particular:
1. Within treatment:
30
This is based on the casual evidence that generally emerges in de-brie ng discussions, although there
is psychological literature that deals with the e ect of a ective states on decisions, see e.g. Luce, Bettman
and Payne [21].
31
This point has been made very clearly by Sen [37].
30
PAY
F
#
HYP
M
%
#
F
%
#
M
%
#
%
CC
27 57.4% 24 43.6% 14 48.3%
8
29.6%
Expansion
27 57.4% 24 43.6% 14 48.3%
8
29.6%
17 36.2% 12 21.8%
5
17.2%
3
11.1%
7
3
10.3%
1
3.7%
27 57.4% 27 49.1% 14 48.3%
8
29.6%
Weak WARP
PC
WARP
14.9% 10 18.2%
Table 9: Violations of the axioms by sex.
The di erence in proportions of men and women violating Condorcet Consistency and Expansion is statistically signi cant at 10% con dence level in both
the PAY and the HYP treatments;32
The di erence in proportions of men and women violating Pairwise Consistency
is not signi cant in either treatment;33
The di erence in proportions of men and women violating Weak WARP is
statistically signi cant at 10% con dence level in the PAY treatment and not
statistically signi cant in the HYP treatment;34
The di erence in proportions of men and women violating WARP is not statistically signi cant in the PAY treatment and statistically signi cant at 10%
con dence level in the HYP treatment.35
2. across treatments:
For female participants, the only di erence in the proportions of subjects violating a given axiom across treatments which is statistically signi cant is for
32
The mid-p value for the Fisher test is 0:087 for the PAY treatment and 0:084 for the HYP treatment.
33
The mid-p value for the Fisher test is 0:336 for the PAY treatment and 0:199 for the HYP treatment.
34
The mid-p value for the Fisher test is 0:059 for the PAY treatment and 0:27 for the HYP treatment.
35
The mid-p value for the Fisher test is 0:21 for the PAY treatment and 0:084 for the HYP treatment.
31
Weak WARP, for which the Fisher test returns a mid-p value of 0:042;36
For male participants the only di erences in proportions which are statistically
signi cant are for Pairwise Consistency, for which the Fisher test yields a pvalue of 0:037; and WARP, for which the mid-p value from the Fisher test is
0:051.37
The analysis above shows also that there are substantial di erences (though not always
statistically signi cant) in the proportion of women and men whose choices conform to
either Full rationality (i.e. their choices satisfy WARP) or RSM by categorization (i.e.
their choices satisfy Weak WARP). To check di erences in the sexes as to the explanatory
power of RSM we present cross-tabulations of Expansion and Weak WARP by sex in
tables 10 and 11.
PAY
Females
Males
Expansion
Expansion
X
#
Weak WARP
%
17 36.2%
X
X
#
%
#
0
0%
12 21.8%
10 21.3% 20 42.6%
%
#
%
0
0%
Weak WARP
12 21.8% 31 56.4% X
Table 10: RSM by sex in the PAY treatment.
When considering RSMs, then, the di erence across sexes is quite substantial in both
treatments (around 13% in the PAY treatment and just short of 20% in the HYP treatment), and it is also statistically signi cant at 10% con dence level for both treatments.38
Finally, di erences across treatments by sex are not statistically signi cant.39
36
The mid-p value for the Fisher test is equal to 0:224 for Condorcet Consistency, WARP and Expan-
sion, and equal to 0:30 for Pairwise Consistency.
37
The other mid-p values for the Fisher test are equal to 0:117 for both Condorcet Consistency and
Expansion, and to 0:129 for Weak WARP.
38
The mid-p values for the Fisher test of the di erence in the proportion of men and women satisfying
RSM is equal to 0:087 for the PAY treatment and 0:084 for the HYP treatment.
39
The mid-p values for the Fisher test of the di erence in the proportion of subjects satisfying RSM
32
HYP
Females
Males
Expansion
Expansion
X
X
#
%
#
%
#
%
#
%
5
17.2%
0
0%
3
11.1%
0
0%
9
31.1% 15 51.7%
5
18.5% 19 70.4% X
Weak WARP
X
Weak WARP
Table 11: RSM by sex in the HYP treatment.
In summary then:
PAY
F
#
HYP
M
%
#
F
%
#
%
M
#
%
Rationalizability
20 42.6
28 50.9
15 51.7
19 70.4
Rational Shortlist Methods
20 42.6
31 56.4
15 51.7
19 70.4
RSM by categorization
30 63.8
43 78.2
24 82.8
24 88.9
Table 12: Explanatory power of competing theories across sexes
The model of RSM by categorization works better than the other two regardless of
sex. Across sexes, there are di erences:
the proportion of men whose choices are rationalizable is higher than the proportion
of women in the HYP treatment, but not in the PAY treatment;40
the proportion of men whose choices are RSM is statistically signi cantly higher
than the proportion of women in both the PAY and HYP treatment;41
in the HYP and PAY treatments is equal to 0:228 for Female participants and equal to 0:117 for Male
participants.
40
The Fisher test yields an exact mid-p value equal to 0.215 for the PAY treatment and to 0:084 for
the HYP treatment (i.e. for the latter the di erence in proportions is signi cant at 10% con dence level).
41
The Fisher test yields an exact mid-p value equal to 0:086 for the PAY treatment and to 0:084 for
the HYP treatment (i.e. the di erence in proportions is signi cant at 10% con dence level).
33
the proportion of men whose choices are an RSM by categorization is higher than
the proportion of women in the PAY but not in the HYP treatment.42
It is very hard to explain any of these di erences in behavior within a purely economic
framework. We leave further analysis to scholars in other elds.
4
Concluding remarks
We hope to have made several points with this paper:
Methodology. The broad thrust of the paper has been to show concretely that the
standard revealed preference methodology can be successfully used to study (experimentally or otherwise) `psychological' choice procedures. This methodological point has also
been recently argued by Rubinstein and Salant ([34] and [33]), Bernheim and Rangel ([5]),
Manzini and Mariotti ([22]), and it is implicit in the approach of works such as Masatlioglu and Ok ([25], [26]). Note well: we are not claiming that only direct choice data
are relevant for economics (as argued for example by Gul and Pesendorfer [14]), but that
even these data alone can be extremely helpful in discriminating between the standard
choice model and models of boundedly rational choice, and between di erent models of
boundedly rational choice.
`Revealed preference' facts. Whether one believes our models or not, the experiment
has established quite clearly two series of facts. First, it has highlighted the importance
of a precise type of menu e ect with the options made available to the subjects. Indeed,
most violations of the Weak Axiom of Revealed Preference in our experiment are due
to the Condorcet inconsistency phenomenon rather than to pairwise inconsistency (as
we have shown, these two forms of inconsistency are exhaustive). This fact could only
be discovered because we elicited entire choice functions from the subjects, rather than
asking them to express their (binary) preferences.
Secondly, the experiment has demonstrated that while one step in weakening the
Weak Axiom of Revealed Preference (to the combination of Weak WARP and Expansion)
42
The Fisher test yields an exact mid-p value equal to 0:059 for the PAY treatment (i.e. statistical
signi cance is at 10% con dence level) and to 0:272 for the HYP treatment.
34
does not capture a signi cant new portion of observed behaviours, a second step in this
weakening (to Weak WARP alone) is dramatically more e ective.
Models of choice. As a consequence of the facts above, we now know what a good
model has to do in order to explain choices of the type we have considered. The utility
maximization model and the pure Rational Shortlist Method model ([22]) cannot explain
well the choice data elicited in our experiment because they can't address menu e ects.
Our new model of an RSM by categorization instead performs de nitely better (not only in
terms of brute number of observations explained but also in terms of Selten's [36] Measure
of Predictive Success). This model has two main attractions. First, its ability to capture in
a simple way menu e ects. And, second, its easy testability by means of choice data (being
equivalent to a single choice axiom), without the need to search directly for all possible
psychological entities (`categories' and `rationales') that appear in its description and that
might t the data. As a parsimonious way to describe choices, that is as a convenient `as
if ' device, according to our data this is a better model than, say, the utility maximisation
model; categories and rationales are better descriptive tools than utility. So in terms of a
pure `revealed preference' approach we could stop at that. But we would ultimately like
to argue also for the substantive validity of RSMs by categorization as a model of choice,
in the sense that they describe cognitive operations that real decisions makers actually
use. Obviously many more (choice or non-choice) data would be needed for a thorough
validation also in this sense, since other (yet to be formulated) psychological mechanisms
may be compatible with Weak WARP. Nevertheless we believe some ground has been
covered already, as the model passes a rst screening of introspective plausibility and of
psychological support. A minimal claim is that it does so at least as well as competing
models.
References
[1] Agresti, A. (2002) Categorical Data Analysis, Wiley Series in Probability and Statistics, Second Edition, John Wiley and Sons.
35
[2] Andreoni, J. and W. T. Harbaugh (2006) \Power Indices for Revealed Preferences
Tests", mimeo.
[3] Andreoni, J. and J. Miller (2002) \Giving According to GARP: An Experimental
Tests of the Consistency of Preferences for Altruism", Econometrica, 70: 737-753.
[4] Becker, G. S. (1962) \Irrational Behavior in Economic Theory", Journal of Political
Economy, 70: 1-13.
[5] Bernheim, D.B. and A. Rangel (2007) \Toward Choice-Theoretic Foundations for
Behavioral Welfare Economics", American Economic Review (Papers & Proceedings),
97: 464-470.
[6] Bronars, S. G. (1987) \The Power of Nonparametric Tests of Preference Maximization", Econometrica, 55: 693-698.
[7] Camerer, C. F. and R. Hogarth (1999) \The e ects of nancial incentives in experiments: A review and capital-labor-production framework", Journal of Risk and
Uncertainty, 19: 7-42.
[8] Harrison, G. and E. Rutstr•om \Experimental Evidence on the Existence of Hypothetical Bias in Value Elicitation Methods", in C. Plott and V. Smith, eds., Handbook
of Experimental Economics Results. New York: Elsevier, forthcoming.
[9] Fevrier, P. and M. Visser (2004) \A Study of Consumer Behavior Using Laboratory
Data", Experimental Economics, 7: 93-114.
[10] Fishburn, P. C. (1970) Utility theory for decision making, Wiley, New York.
[11] Frederick, S., G. F. Loewenstein and T. O'Donoghue (2002) \Time discounting and
time preferences: a critical review", Journal of Economic Literature, 40: 351-401.
[12] Gigerenzer, G., P. Todd, and the ABC Research Group (1999) Simple Heuristics
That Make Us Smart. New York: Oxford University Press..
[13] Gul, F. and W. Pesendorfer (2001) \Temptation and Self-Control", Econometrica,
69: 1403-1435.
36
[14] Gul, F. and W. Pesendorfer (2005) \Mindless economics", mimeo Princeton University.
[15] Harbaugh, W. T., K. Krause and T. Berry (2001) \GARP for Kids: On the Development of Rational Choice Behavior", American Economic Review, 91: 1539-1545.
[16] Hertwig, R. and A. Ortmann (2001) `Experimental Practices in Economics: A Challenge for Psychologists?' [target article], Behavioral and Brain Sciences, 24: 383-403.
[17] Houy, N. (2006) \Choice functions with States of Mind", forthcoming in Theory and
Decisions.
[18] Kalai, G., A. Rubinstein and R. Spiegler (2002) \Rationalizing Choice Functions by
Multiple Rationales", Econometrica, 70: 2481-2488.
[19] Koszegi, B. and M. Rabin (2006) \A Model of Reference Dependent Preferences",
Quarterly Journal of Economics, 121: 1133-1166.
[20] List, J. and C Gallet (2001) \What Experimental Protocol In uence Disparities Between Actual and Hypothetical Stated Values?", Environmental and Resource Economics, 20: 241-254.
[21] Luce, M., J. Bettman, and J. Payne (1997) \Choice Processing in Emotionally Difcult Decisions", Journal of Experimental Psychology: Learning, Memory and Cognition, 23:384-405
[22] Manzini, P. and M. Mariotti (2007) \Sequentially Rationalizable Choice", American
Economic Review, 97: 1824-1839.
[23] Manzini, P., M. Mariotti and L. Mittone (2006) \Choosing monetary sequences:
theory and experimental evidence", CEEL Working paper 1-06.
[24] Medin, D.L. and Aguilar, C. (1999) \Categorization", in R.A. Wilson and F.C. Keil
(eds) MIT Encyclopedia of the Cognitive Sciences, MIT Press, Cambridge, Ma.
[25] Masatlioglu, Y. and E. Ok (2005) \Rational Choice with Status-quo Bias", Journal
of Economic Theory 121: 1-29.
37
[26] Masatlioglu, Y. and E. Ok (2006) \Reference-Dependent Procedural Decision Making", working paper, NYU.
[27] Mattei, A. (2000) \Full-Scale Real Tests of Consumer Behavior using Expenditure
Data", Journal of Economic Behavior and Organization, 43: 487-497.
[28] Moulin, H. (1985) \Choice Functions Over a Finite Set: A Summary" Social Choice
and Welfare, 2, 147-160.
[29] Read, D. (2005) \Monetary incentives, what are they good for?", Journal of Economic Methodology, 12: 265-276.
[30] Roelofsma, P. H. and D. Read, 2000 \Intransitive Intertemporal Choice", Journal of
Behavioral Decision Making, 13: 161-177.
[31] Rubinstein, A., 1988, \Similarity and Decision Making Under Risk (Is there a utility
theory resolution to the Allais Paradox?)", Journal of Economic Theory, 46: 145-153.
[32] Rubinstein, A., 2003, \Is it `Economics and Psychology' ? The case of hyperbolic
discounting", International Economic Review, 44: 1207-1216.
[33] Rubinstein, A. and Y. Salant (2006) \Two Comments on the Principle of Revealed Preference", mimeo, Tel Aviv University, http://arielrubinstein.tau.ac.il/ papers/frames.pdf.
[34] Rubinstein, A. and Y. Salant (2007) \(A; f ) Choice with Frames", mimeo, Tel Aviv
University, http://arielrubinstein.tau.ac.il/papers/Af.pdf.
[35] Samuelson, P. (1938) \A Note on the Pure Theory of Consumer's Behavior", Economica 5 (1): 61-71.
[36] Selten, R. (1991)\Properties of a measure of predictive success", Mathematical Social
Sciences, 21: 153-167.
[37] Sen, A. (1993) \Internal consistency of choice", Econometrica, 61: 495-521.
[38] Sen, A. (1997) \Maximization and the Act of Choice", Econometrica, 65, 745-779.
38
[39] Simonson, I. (1989) \Choice Based on Reasons: The Case of Attraction and Compromise E ects", Journal of Consumer Research, 16: 158-74.
[40] Sippel, R. (1997) \An Experiment on the Pure Theory of Consumer's Behavior",
Economic Journal, 107: 1431-1444.
[41] Suzumura, K. (1983) Rational Choice, Collective Decisions, and Social Welfare, Cambridge University Press, Cambridge U.K.
[42] Tversky, A. (1969) \Intransitivity of Preferences", Psychological Review, 76: 31-48.
[43] Tversky, A. (1972) \Elimination By Aspects: A Theory of Choice", Psychological
Review, 79: 281-299.
[44] Varian, H. (2005) \Revealed Preference", forthcoming in Samulesonian Economics
and the 21st century, Michael Szenberg (Ed.).
[45] Waite, T. (2001) \Intransitive preferences in hoarding gray jays (Perisoreus canadensis)", Behavioral Ecology and Sociobiology, 50: 116 - 121.
[46] Yee, M., E. Dahan, J. R. Hauser and J. Orlin (2007) \Greedoid-Based NonCompensatory Inference", Marketing Science, 26: 532-549.
39
A
Appendix: Instructions
Please note: you are not allowed to communicate with the other participants for the entire
duration of the experiment.
The instructions are the same for all you. You are taking part in an experiment to
study intertemporal preferences. The project is nanced by the ESRC.
Shortly you will see on your screen a series of displays. Each display contains various
remuneration plans worth the same total amount of C
=48 each, staggered in three, six and
nine months installments. For every display you will have to select the plan that you
prefer, clicking on the button with the letter corresponding to the chosen plan. (HYP:
These remuneration plans are purely hypothetical. At the end of the experiment you'll be
given a participation fee of C
=5) (PAY: At the end of the experiment one of the displays
will be drawn at random and your remuneration will be made according to the plan you
have chosen in that display).
In order to familiarize yourself with the way the plans will be presented onscreen, we
shall now give you a completely hypothetical example, based on a C
=7 total remuneration.
Plan A
How much
When
=3
C
in one year
=1
C
in two years
=1
C
in three years
=2
C
in four years
Plan B
How much
When
=1
C
in one year
=2
C
in two years
=3
C
in three years
C1
=
in four years
In this example plan A yields C
=7 in total in installments of C
=3, C
=1, C
=1 and C
=2 in a
year, two years, three years and four years from now, respectively, while plan B yields C
=7
in total in installments of C
=1, C
=2, C
=3 and C
=1 in a year, two years, three years and four
years from now, respectively.
40