Two stage boundedly rational choice procedures: Theory and experimental evidence. Paola Manzini Marco Mariotti Queen Mary, University of London and IZA Queen Mary, University of London This version: December 2007 Abstract We study and test a class of boundedly rational models of decision making which rely on sequential eliminative heuristics. We formalize two sequential decision procedures, both inspired by plausible models popular among psychologists and marketing scientists. However we follow a standard `revealed preference' economic approach by fully characterizing these procedures by few, simple and testable conditions on observed choice. Then we test the models (as well as the standard utility maximization model) with experimental data. We nd that the large majority of individuals behave in a way consistent with one of our procedures, and inconsistent with the utility maximization model. Corresponding address: Department of Economics, Queen Mary, University of London, Mile End Road, London E1 4NS, United Kingdom. Part of this work was carried out while Manzini and Mariotti were visiting the University of Trento. Funding was provided by the ESRC under grant RES-000-220866. We wish to thank both institutions for their support. We are also grateful to Gianni de Fraja, John Hey, Jonathan Leland, Michele Lombardi, Luigi Mittone, Daniel Read, Ludovic Renou, Karl Schlag, Nick Vriend, Quan Wen and seminar audiences at the Universities of Amsterdam, Brescia, EUI, Helsinki, Leicester, LUISS, Malaga, Manchester, St. Andrews, Siena, Zurich and at the ESA 2007 and the APET 2007 conferences for insightful discussions and comments, as well as to the tireless sta of the CEEL lab in Trento, in particular to Marco Tecilla, for superb programming support. Any error is our own. 1 1 Introduction In the standard model of decision making, choice behavior is described as the outcome of the maximization of some binary relation, possibly summarized by a utility function. Yet, observed choice behavior is often incompatible with this model.1 The main goal of this paper is to test both the standard model and a class of boundedly rational models of decision making which rely on sequential eliminative heuristics. We formalize two sequential decision procedures, both inspired by plausible models popular among psychologists and marketing scientists. However we follow a standard `revealed preference' economic approach by fully characterizing these procedures by few, simple and testable conditions on observed choice data. Then we test these models, as well as the standard utility maximization model, with experimental data. We nd that the large majority of individuals behave in a way consistent with one of our procedures, and inconsistent with the utility maximization model. Our theoretical results also allow us to trace the observed departures from maximization to either of two elementary forms of inconsistency: this helps guide the search for an alternative model. For the moment let us refer to our procedures as Procedure I and Procedure II.2 For an informal example of these procedures, suppose a decision-maker (DM) has to choose which of three wines to drink with his dinner at a restaurant: a moderately priced Australian (A), an expensive French Bordeaux (B), or a cheap Italian Chianti (C). Procedure I describes considerations of the following kind. DM looks rst at the origin: he thinks B is best among European wines, so he prefers B to C, but has never tasted new world wines, so cannot compare a to either B or C. When he is undecided, DM chooses according to price, and C is cheaper than A which is cheaper than B. This generates the following `irrational' behavior: if only A and B are available DM chooses A; if only B and C are available he chooses B; nally if only A and C are available he chooses C. DM exhibits pairwise cyclical choices. Procedure II captures considerations of the following kind. When all wines are avail1 See e.g. Roelofsma and Read [30], Tversky [42], and Waite [45] who nd evidence of pairwise cycles of choice. The evidence presented in this paper points to further violations of rational behavior. 2 Procedure I was rst introduced in Manzini and Mariotti [22]. 2 able, the DM, who is Italian, perceives B as a `right' compromise choice over the `extreme' choices A (too exotic) and C (too familiar).3 In other words, when viewed as a group against B, the options A and C share a characteristic, that of appearing extreme, compared to the alternative option. When only binary choices are available, DM looks instead straight at price. This generates the following additional type of irrational behavior: B is selected when all three wines are available, but C is selected (based on price) in both binary contests in which it appears. DM exhibits a strong form of menu dependence. We shall see below that the two types of `rationality failure' just illustrated are an exhaustive taxonomy also for general choices. Formally, in both procedures, DM implements a two-stage algorithm to arrive at a nal choice: two rationales (asymmetric and possibly incomplete binary relations) are used sequentially to eliminate alternatives from the choice set. The di erence lies in the way the rst rationale operates. In procedure I, the rst rationale simply ranks (some of) the alternatives. In procedure II, the rst rationale ranks instead sets of alternatives. The interpretation is that the decision-maker categorizes alternatives in groups, and compares groups rst. In general there may be more than one criterion to categorize alternatives, so groups might overlap. When two groups are disjoint, they may be related by the rst rationale. In this case, the entire `losing' group is eliminated. For both procedures, the second rationale is used to eliminate further alternatives and to single out a choice. Surprisingly, Procedure II is fully characterized by a single `revealed preference' property, which we call Weak WARP (Theorem 3). Weak WARP is a simple weakening of the standard Weak Axiom of Revealed Preference (Samuelson [35]). Weak WARP adds to WARP the clauses between brackets in the following de nition: if x is directly revealed preferred to y [both in pairwise contests and in the presence of a `menu' of other alternatives], then y cannot be directly revealed preferred to x [in the presence of a smaller menu]. This characterization is the main theoretical contribution of our work. Procedure I can be similarly characterized, as we have shown in Manzini and Mariotti [22]. The choice data can be generated by procedure I if and only if they satisfy, in 3 This particular example is just an application of the well-documented `compromise e ect' (Simonson [39]) whereby brands improve market share when they are seen as the intermediate rather than an extreme option in the choice set. 3 addition to Weak WARP, another standard revealed preference property called Expansion. Expansion says that if x is chosen from two sets of alternatives, then x is chosen when the sets are merged. Expansion and Weak WARP together still constitute a weakening of WARP, which implies both properties. The nature of these procedures can be much better understood in the light of a simple but crucial result on choice functions. Any failure of full rationality in choice (that is, a violation of WARP) can be ultimately reduced to one (or both) of just two categories, illustrated in the previous wine examples: pairwise inconsistent choice and `Condorcet inconsistent' choice. Condorcet inconsistency captures a speci c type of `menu-e ect' in choice: an alternative is chosen over each of a number of other alternatives in pairwise choice, yet it is no longer chosen when all these alternatives are grouped together.4 On the other hand, pairwise inconsistent choice involves only pairwise comparisons and therefore does not incorporate any menu-e ect. It thus captures a conceptually separate violation of full rationality, namely that the pairwise comparisons do not allow the observer to construct a preference relation with some minimal consistency property. Procedure I can address pairwise cyclical choice, but not violations of Condorcet inconsistency (which is a necessary condition for choices generated by that procedure). Procedure II can explain both types of irrational behavior. Needless to say, even Procedure II is not vacuous as it can be tested by Weak WARP. After the theoretical analysis, we put all the decision models discussed to the experimental test. To do so we elicit the choice function of experimental subjects out of all possible subsets of a given initial set of alternatives. In many decision theory experiments only pairwise choices (or preferences) are elicited. However, the taxonomy result discussed above illustrates the importance of observing decision behavior on larger sets in order to trace the sources of full rationality violations. In our experiment we use as alternatives time sequences of monetary rewards. Our data show that WARP is violated by a majority of subjects. One of our main contributions on the experimental front is an enquiry into the speci c nature of the violations of full 4 Obviously there are other speci cations of what constitutes menu-dependence, which for example could be `dynamic' (dependence on previous choices or status quo: see e.g. Masatlioglu and Ok [25], Houy [17] and Koszegi and Rabin [19]). See Sen [38] for a general discussion of menu-dependence. 4 rationality, based on our taxonomy result. In our context, more than 15% pairwise cyclical choices were observed. Nonetheless these violations of full rationality were strongly associated with violations of Expansion. The consequence of this fact is that Procedure I in this case does not make a signi cant improvement of the standard maximization model (of course, Procedure I may be more useful in other contexts and for other purposes5 ). Our main experimental nding is that Procedure II yields a step change in explanatory power in the present case. The majority of violations of WARP are associated with Condorcet inconsistency, so that any model that describes well these experimental data will need to incorporate this type of menu e ect. Indeed, the large majority of subjects satis es Weak WARP in all their choices. Note well: this is not simply a consequence of the fact that Weak WARP, being a weaker axiom, is able to bring in most of the observed choices. We demonstrate this by using Selten's Measure of Predictive Success, which takes into account not only the `hit rate' of a model (proportion of correctly predicted observations), but also its `parsimony' (inversely measured by the proportion of possible cases in principle compatible with the model). Using our data, Procedure II scores higher than both competitors also in terms of Selten's Measure. We present our theoretical results in section 2, while the experiment is presented and analysed in section 3, and section 4 conludes. 2 Theory 2.1 Preliminaries In this section we develop and recall some theoretical results which are necessary to design our experiment. Let X be a nite set of alternatives and let 5 2X . A choice function on is a In Manzini and Mariotti [23] we study Procedure I with speci c decision criteria and obtain a model of time preferences with signi cantly higher explanatory power than discounting models for pairwise choices. In general, it is a consequence of our theoretical results that when only pairwise choices are the object of interest, or when menu-e ects are not likely to play a major role, Procedure I is as a good a model as Procedure II. 5 function : ! X, such that we make on the domain (S) 2 S for all S 2 . The only additional assumptions6 are that all pairs and triples of alternatives are included in the domain, that is, for all x; y; z 2 X: (i) fx; yg 2 ; (ii) fx; y; zg 2 . For a binary relation 2X X denote the maximal elements of a set S 2 by max (S; ), that is: max (S; ) = fx 2 Sj @y 2 S for which y xg De nition 1 A choice function is fully rational if there exists a complete order on X such that f (S)g = max (S; ) for all S 2 . As is well-known,7 in the present context the fully rational choice functions are exactly those that satisfy Samuelson's [35] Weak Axiom of Revealed Preference (WARP), de ned below: WARP: If x = (S), y 2 S and x 2 T then y 6= (T ). WARP says that if an alternative is directly revealed preferred to another, the latter alternative can never be directly revealed preferred to the former (revealed preference is an asymmetric relation). 2.2 A taxonomy of irrationality Failures of full rationality may mix together more than one elementary form of inconsistency. To reduce lack of full rationality to its basic building blocks (to be studied in the experiment), we consider the two violations of WARP discussed in the introduction, menu dependence and pairwise inconsistency. The latter category involves exclusively choices between pairs of alternatives, while the former category involves choices from larger sets. 6 The niteness assumption on the domain could be easily dispensed with, and replaced by a well- behavedness assumption on the sets in , guaranteeing that complete and transitive relations on them have a maximal element. This would just complicate notation so we stick with the nite case in the text. 7 See e.g. Moulin [28] or Suzumura [41] 6 The following property captures the elementary form of menu independence: Condorcet consistency: If x = x= (fx; zg) for all z 2 Sn fxg for some S 2 , then (S). Condorcet consistency says that if the same alternative is chosen in pairwise contests against any other alternative in a set, then this alternative will be chosen from the set. Let x = denote the base relation of a choice function , that is x y if and only if (fx; yg). Each of the following properties captures elementary forms of pairwise consistency: Base transitivity: Base acyclicity: Base intervality: either x z or w is transitive. is acyclic. satis es the intervality condition:8 If x y and w z then y. We note rst that all these conditions are equivalent (the simple proof of the following result is available from the authors): Proposition 1 A choice function satis es base intervality, if and only if it satis es base transitivity, if and only if it satis es base acyclicity. The equivalence of all the base conditions makes it legitimate to speak simply of Pairwise Consistency when the three conditions are met. The following is our basic classi cation result:9 Theorem 1 A choice function on satis es WARP if and only if satis es both Condorcet consistency and Pairwise Consistency. Proof: It is obvious that a choice function that violates Condorcet consistency also violates WARP. Suppose it violates base transitivity. Then there exist x; y; z 2 X for 8 Introduced in Fishburn [10]. 9 This result is similar but much stronger than Proposition 1 in Manzini and Mariotti [22]. 7 which x y, y z, and z x, and by the domain assumption (ii) WARP is contradicted on fx; y; zg. For the converse implication, suppose that S; T 2 (S) 6= y = be such that x = base transitivity: we show that then violates WARP, and let in particular (T ), x; y 2 T \ S. Suppose that satis es must violate Condorcet consistency. By base transitivity there exist base-maximal elements in S and T . Since must be asymmetric, these elements are unique. So there are unique s 2 S and t 2 T such that s= (fs; zg) for all z 2 Sn fsg t= (ft; zg) for all z 2 T n ftg If s 6= x, then Condorcet consistency is violated on S. If s = x, then in particular x= 2.3 2.3.1 (fx; yg). So y 6= t and Condorcet consistency is violated on T .10 Two sequential models of decision making Rational Shortlist Methods We rst explain our Procedure I, introduced in Manzini and Mariotti [22]. From now on we shall call any asymmetric binary relation on X a rationale. De nition 2 A choice function if there exists an ordered pair ( is a Rational Shortlist Method (RSM) if and only 1; 2) of rationales such that: f (S)g = max (max (S; In that case we say that ( 1; 2) 1) ; 2) for all S 2 sequentially rationalize . So the choice from each S can be represented as if the decision maker went through two sequential rounds of elimination of alternatives. In the rst round he makes a `shortlist' by retaining only the elements which are maximal according to rationale 1. In the second round, he retains only the element which is maximal according to rationale 10 2: that Note that this method of proof would yield a similar conclusion on in nite domains, provided pairwise choices were suitably continuous. 8 element is his choice. Note that, crucially, the rationales and the sequence in which they are applied are invariant with respect to the choice set. RSM's are in the vein of several `noncompensatory' sequential eliminative heuristics11 promoted by psychologists (such as the Elimination by Aspects model by Tversky [43], or the `Fast and frugal heuristics' by Gigerenzer and Todd [12]) and marketing scientist (such as `greedoid based' choice algorithms, Yee, Dahan, Hauser and Orlin [46]). RSM's are characterized by two properties, one of which is a weakening of WARP, and the other is the classical Expansion axiom: Weak WARP: For all R; S 2 y 6= : If fx; yg R S and x = (S) 6= y then (R). Expansion: Let fSi g be a class of sets such that Si 2 x= (fx; yg) = (Si ) for all i then x = for all i and [i Si 2 . If ([i Si ). Weak WARP says that if you choose steamed salmon over steak tartare when they are the only available choices, and you also choose steamed salmon from a large menu including steak tartare, then you cannot choose steak tartare from a small menu including steamed salmon. In other words, if adding a large number of alternatives to the menu does not overturn a revealed preference, then adding just a subset of those alternatives cannot overturn the revealed preference either. Expansion says that if steamed salmon is chosen in each of a series of menus, then it is also chosen when all the menus are merged. While these are reasonable conditions, in this paper we treat them merely as tractable `revealed preference' type of properties which are easy to check in an experiment that provides choice data. The following characterization result follows easily by adapting the argument in the proof of Theorem 1 in Manzini and Mariotti [22]. Theorem 2 Suppose the domain is closed under set union. Then a choice function on is an RSM if and only if it satis es Weak WARP and Expansion. 11 The adjective `noncompensatory' refers to the fact that while several criteria are used, they cannot be used to `compensate' each other: unlike the arguments of a utility function, there is no trade-o in the mind of the decision maker between one criterion and the next. 9 2.3.2 Rational Shortlist Methods by Categorisation Next, we consider `Procedure II' from the Introduction. The basic idea is that decision makers may go through an initial `classi cation' stage in which they organise the alternatives into groups, and decide to discard obviously inferior groups. This appears very natural as soon as the set of alternatives exhibits some complexity. For example, a saver seeking to invest a lump-sum might immediately discard investments in the stock market and focus on xed income activities. Or, a prospective diner out may immediately discard the group of Italian restaurants. Indeed related ideas have been considered in psychology, where categorization is a widely studied concept in the analysis of human reasoning.12 In marketing science, the well-known `compromise e ect' (Simonson [39]) can also be captured by the procedure, as illustrated in the wine example in the Introduction. De nition 3 A rationale by categorization on X is an asymmetric relation on 2X satisfying the following two properties: (i) R \ S = ? whenever R (ii) jR [ Sj > 2 whenever R S S The interpretation of a rationale by categorization is that alternatives may be organized in categories by some criterion. More than one criterion may be relevant, hence two categories are not necessarily disjoint (think of categorizing a group of people by race and religion). However, the decision-maker can only compare disjoint categories (condition (i)). Moreover, what the relation compares are genuine groups: degenerate comparisons between singletons are not allowed (condition (ii)). For example, with alternatives such as those used in the experiment, constant streams of monetary payments can form a category `against' increasing streams, and increasing three-period streams could form a category against increasing two-period streams.13 12 13 See e.g. Medin and Aguilar [24]. Rubinstein [31] (and more recently [32]) pioneered in economic theory the analysis of similarity con- siderations in decision-making, which may be related to the idea of categorization (considering alternatives in the same category as similar). 10 De nition 4 Given a rationale by categorization and S 2 , the maximal set on S is given by: max (S; ) = fx 2 Sj for no R0 ; R00 S it is the case that R0 R00 and x 2 R00 g We are now ready for our main de nition: De nition 5 A choice function exists a rationale by categorization is an RSM by categorization if and only if there 1 and a rationale f (S)g = max (max (S; 1) ; 2) 2 such that: for all S 2 So, the decision maker looks rst at group rankings, and eliminates any group which is dominated by another group. Then, she decides among the remaining alternatives on the basis of the second rationale. For example, if the choice set is comprised of two nonincreasing streams of money and two increasing streams, the decision maker may rst select the group of non-increasing streams and then select within that group. When this procedure leads to a single chosen alternative for each choice set, the resulting choice function is an RSM by categorization. Below we make some observations which highlight key di erences between the two sequential procedures we have introduced. Remark 1 Condorcet consistency is a necessary condition for an RSM. However, there are choice functions which violate Condorcet consistency and yet are RSMs by categorization. For instance, take the following choice function, with the base relation as indicated in gure 1 (where an arrow going from x to y indicates that x is selected in pairwise choice between x and y): X = fa; b; c; dg, (fa; c; dg) = (X) = (fa; b; cg) = (fb; c; dg) = b, (fa; b; dg) = a. Condorcet consistency is violated, since a is chosen in pairwise comparisons over each of the other alternatives but is not chosen from the grand set, nor from fa; b; cg. However, this choice function is an RSM by categorization with fbg 1 fa; cg, and 2 coinciding with the base relation. 11 a a d b d b c c Figure 1: Base relation Figure 2: Base relation for remarks 1 and 2 for remark 3 Remark 2 Consider the following dual property to Condorcet consistency: an alternative is not chosen in a set in which it is never chosen in any pairwise choice. Formally, if x 6= (fx; zg) for all z 2 Sn fxg and S 2 , then x 6= (S). It is easy to show that this property too is a necessary condition for an RSM. However, there exist choice functions that violate it and yet are RSMs by categorization. Consider the choice function with the same base relation as the one in gure 1, where b, (fa; b; dg) = a and categorization by fbg 1 (X) = (fa; b; cg) = (fb; c; dg) = (fa; c; dg) = d. It violates the property and is an RSM by fa; cg, fdg 1 fa; cg and 2 coinciding with the base relation. Remark 3 An RSM by categorization is not a vacuous notion: there are choice functions which are not RSMs by categorization. As an example, let X = fa; b; c; dg and let (fa; b; cg) = a, (fa; b; dg) = (fb; c; dg) = b and (fa; c; dg) = c, with the base relation as in gure 2. Then, since b is chosen in fa; b; dg, there must be R0 that a is eliminated before it can eliminate b. Then in view of a = (X) = 1 R00 , with a 2 R00 , so cannot be an RSM by categorization (X). The choice function in the last example violates Weak WARP. As we show next, this property alone characterizes an RSM by categorization. To ease notation, xing the choice function , we de ne the upper and lower contour sets of an alternative on a set S 2 U p (x; S) = fy 2 Xjy xg \ S Lo (x; S) = fy 2 Xjx yg \ S and 12 as respectively. Theorem 3 A choice function is an RSM by categorization if and only if it satis es Weak WARP. Proof: Necessity. Suppose that and 2. Suppose x = that y = is an RSM by categorization with rationales (fx; yg) and x = (R) with x 2 R 1 (S) with y 2 S. Now suppose by contradiction S. This means that x must be eliminated in the rst round of elimination in R: if not, then either x would eliminate y in the second round, or (fx; yg) = x would contradict the assumption that particular there exist R0 ; R00 contradicts x = 1 1 1 and R00 and x 2 R00 . Since R0 ; R00 2. In S this (S). Su ciency. De ne: x De ne: R R, such that R0 is rationalized by 2 y if and only if x = S if and only if there exists T 2 (fx; yg). 2 is obviously asymmetric. such that R = f (T )g [ Lo ( (T ) ; T ) and S = U p ( (T ) ; T ) 6= ? 1 is also obviously asymmetric and note that R \ S = ? whenever R and S are related by 1. Now let S 2 and let x = Suppose rst that y 2 (S). We show that x is not eliminated in either round. x for some y 2 S . Then by construction fxg [ Lo (x; S) 1 U p (x; S) and y is eliminated in the rst round. Next, suppose by contradiction that x is eliminated in the rst round. Then there exists R0 ; R00 S with R0 1 R00 and x 2 R00 . De ne R = R0 [ R00 . By construction of it must be R0 = f (R)g [ Lo ( (R) ; R) and R00 = U p ( (R) ; R) 13 1 (Notice that here a separate assumption of closure under union of the domain is not needed). This means that x= Together with x = x= (fx; (R)g) (S) (and noting that R = R0 [ R00 (S) (if R0 [ R00 = S) or Weak WARP (if R0 [ R00 S) this contradicts either S). It remains to note that y is eliminated either in the rst round or in the second round for all y 6= x. If y = then y is eliminated in the rst round. If x = (fx; yg), (fx; yg) then y is eliminated in the second round since as we have seen x survives the rst round. 3 Experiment 3.1 Experimental Design The experiment was carried out at the Computable and Experimental Economics Laboratory at the University of Trento, in Italy. We ran a total of 13 sessions between July 2005 and February 2006. Participants were recruited through bulletin board advertising from the student population of the University of Trento. Male and female participants took part in each experimental session in roughly equal proportions. The experiment was computerised, and each participant was seated at an individual computer station, using separators so that subjects could not see the choices made by other participants. Experimental sessions lasted an average of around 26 minutes, of which an average of 18 minutes of e ective play, with the shortest session lasting approximately 16 minutes and the longest around 37 minutes. We considered two treatments, one in which subjects received only a C =5 showup fee (a total 56 subjects in 4 sessions), and one with payments based on choice, where as we explain more in detail below an additional C =48 were made available to each subject (a total of 102 subjects in 9 sessions).14 We will refer to these two treatments as HYP (for hypothetical) and PAY (for paid), respectively.15 At the begin14 The show up fee alone, for an average of less than thirty minute long experimental session, was higher than the hourly pay on campus, which was C =8. At the time of the experiments the exchange rate of the Euro was approximately C =1=$1.2=$0.7. 15 Distinguishing by treatment, sessions lasted an average of about 28 minutes for the PAY treatment, of which an average of just above 19 minutes of e ective play; and an average of around 22 minutes for 14 ning of the experiment subjects read instructions on their monitor, while an experimenter read the instructions aloud to the participants.16 In each treatment, each experimental subject was presented with 23 di erent screens. Each screen asked the subject to choose the preferred one among a set of alternative remuneration plans in installments to be received staggered over a time horizon of nine months, each consisting of C =48 overall. Instructions were the same in both treatments, bar for one sentence, which in the HYP treatment clari ed that choices were purely hypothetical, so that the only payment to be received would be the show up fee; whereas for the PAY treatment it was explained that at the end of the experiment one screen would be selected at random, and the preferred plan for that screen would be delivered to the subjects.17 Choices were based on two sets - depending on the number of installments - of four plans each, namely an increasing (I), a decreasing (D), a constant (K) and a jump (J) series of payments, over either two or three installments, as shown below. Though in both cases payments extended over nine months, because of the di erent number of installments we abuse terminology and refer to `two-period' and `three-period' sequences rather than two/three-installment sequences: Two period sequences I2 in three months in nine months D2 16 32 32 16 Three period sequences K2 J2 I3 D3 K3 J3 24 8 24 16 8 in three months 16 16 16 8 in six months 24 8 16 32 in nine months 24 8 40 Table 1: the base remuneration plans Each subject had to choose the preferred plan from each possible subset of plans within each group (making up 11 choices per group). In addition, in a 23rd question subjects were asked to choose between the three period sequences SJ=(8,32,8) and SI=(24,8,16). This the HYP treatment, of which an average of about 16 minutes of e ective play. 16 17 See the appendix for the translation of the original instructions (in Italian). The experimental lab has a long tradition, so there was no issue of (mis)trust in receiving delayed payments. At the time of writing all subjects have been paid. 15 was needed to address an issue for a di erent experiment, which we discuss elsewhere.18 The use of payment plans as alternatives stems from our desire to present the experimental subjects with alternatives of a `richer' description than mere money amounts (sequences can be classi ed by their `shapes'), but which at the same time are objectively expressed in money terms, so as to reduce di cult to control emotive connotations (which might happen, for example, by using food items). Figure 3 displays sample screenshots of the choices subject had to make. The participants made their choice by clicking with their mouse on the button corresponding to the preferred remuneration plan. Once made, each choice had to be con rmed, so as to minimize the possibility of errors. Both the order in which the twenty-three questions appeared on screen and the position of each option on the screen was randomized. Figure 3: Sample screenshots An unusual feature of the experiment is that we elicited the entire choice functions with domain over all subsets for each of the two grand sets X2 = fI2; D2; K2; J2g and X3 = fI3; D3; K3; J3g. This enables us to check whether or not the axioms discussed in section 2 hold. In particular, we can assess (i) what the main reason is for the failure of 18 See Manzini, Mariotti and Mittone [23] 16 full rationality (violation of Pairwise Consistency or violation of Condorcet consistency), and (ii) what proportion of choice functions can be rationalized in the standard way, what proportion is an RSM and what proportion is an RSM by categorization. 3.2 3.2.1 Experimental results I: Evaluating the models Choosing at Random? We begin by noting that, whatever the subjects are doing, they are certainly not pressing buttons at random.19 Since we are eliciting the entire choice functions from universal sets with four alternatives, with a uniform probability distribution on each choice set, the probability of observing even only two subjects with the same choice is e ectively zero for all practical purposes. In fact, as there are a possible 26 34 4 = 20; 736 choice functions on each universal set, the probability of any given choice function being picked by two subjects is (20; 736) 2 = 2:325 7 10 9 . On the contrary for both treatments and for both universal sets X2 and X3 we nd almost half of the subjects with the same modal choice function. For illustration we report the frequency distributions of the observed choice functions only graphically in Figure 4 (we omit labels for legibility).20 3.2.2 Which Type of Rationality Failure? Before turning to the evaluation of the models,21 we check which, if any, of the two distinct elementary failures of full rationality highlighted in Theorem 1 in section 2.2 is more prevalent. To this e ect we begin by reporting aggregate data for the violations of Pairwise and Condorcet Consistency for each of the choice functions elicited (i.e. the choice functions from 2X2 and from 2X3 for each treatment): 19 Purely random choice is an important benchmark. Within consumer choice, the idea was rst ad- vanced by Becker [4] and it is used for example as the alternative hypothesis in the popular Bronars [6] index of power for nonparametric revealed preference tests. See Andreoni and Harbaugh [2] for a recent discussion of this issue. 20 The corresponding data are reported in tables2 and 3 in the technical appendix available online at http://www.qmul.ac.uk/ ugte172/twosimiltechnical and data appendix.pdf. 21 All the exact statistical analysis has been carried out usting StatXact, v.7. For a comprehensive treatment of exact and other methods in categorical data analysis see Agresti [1]. 17 50% PAY HYP 45% 40% Percent 35% 30% 25% 20% 15% 10% 5% choice functions, 2-period sequences 50% PAY choice functions, 2-period sequences HYP 45% 40% Percent 35% 30% 25% 20% 15% 10% 5% choice functions, 3-period sequences choice functions, 3-period sequences Figure 4: Frequency distributions of choice pro les by treatment and sequence length 18 PAY HYP 2 periods 3 periods 2 periods 3 periods # # # # % % % % Condorcet Consistency 39 38.2 30 28.4 14 25 12 21.4 Pairwise Consistency 12 11.8 7 3 3 6.9 5.4 5.4 Table 2: Axiom violations in the two treatments. Since we are mainly interested in the decisions of individual subjects over all their choices, we report below a summary of overall violations of Condorcet Consistency and Pairwise Consistency by experimental subject: PAY # % Condorcet Consistency 51 50 Pairwise Consistency HYP # % 22 39.3 17 16.7 4 7.1 Table 3: Overall violations of PC and CC. From table 3 it emerges that failures of Condorcet Consistency are substantially more frequent than violations of Pairwise Consistency. This di erence is statistically significant, regardless of treatment. In fact, the McNemar test of the hypothesis that the proportions of subjects violating Condorcet Consistency is the same as the proportion of subjects violating Pairwise Consistency yields exact p-values of 0:009 in the case of the PAY treatment, and of 0:001 in the case of the HYP treatment. If we then look at the di erences in the proportion of violations of each of the two axioms across treatments, the fall in the proportion of violations when moving from the PAY to the HYP treatment is not statistically signi cant: Fisher test's exact mid-p values are 0:110 for Condorcet Consistency and 0:470 for Pairwise Consistency. 19 3.2.3 How Well Do the Models Describe the Data? Next, we turn to the three models examined in section 2.3, and we study the violations of the axioms which characterize those models.22 Recall that one model is the full rationality model (characterized by WARP), the other is the RSM model (characterized by Weak WARP and Expansion) and the third is the RSM by categorization model (characterized by Weak WARP). Again we start by looking at data for each choice function, reported in table 4. PAY HYP 2 periods 3 periods 2 periods 3 periods # # # % # % 10.7 5 8.9 % % Weak WARP 22 21.6 12 11.7 6 Expansion 39 38.2 30 29.4 14 25 13 23.2 WARP 43 42.2 30 29.4 16 28.6 13 23.2 Table 4: Violations of the axioms used for rationalizability. Table 4 shows that WARP is violated quite considerably in the PAY treatment, and less so - though still substantially - in the HYP treatment. Expansion is violated slightly less often than WARP overall, but much more often than Weak WARP. Overall violations by each individual, regardless of choice set, are reported in table 5. The proportion of subjects violating each axiom falls when moving from the PAY to the HYP treatment. Of these di erences, those concerning WARP and Weak WARP are statistically signi cant, while for Expansion this is not the case (Fisher test's exact mid-p values are 0:110 for Expansion, 0:042 for WARP and 0:022 for Weak WARP). Like Table 4, Table 5 also suggests similar rates of violation for Expansion and WARP (50% 22 In our experiment we use a small universal set of alternatives. Evidence for choice from budget sets includes Fevrier and Visser [9], Mattei [27] and especially Sippel [40], who nd substantial violations of the Generalized Axiom of Revealed Preferences in choices out of budget sets. However, Andreoni and Harbaugh [2] argue that most of these violations are `small' on the basis of Afriat's e ciency index. Indeed Harbaugh, Krause and Berry [15] and especially Andreoni and Miller [3] nd that subjects have choices consistent with GARP in experiments with budget sets. 20 and 52:9%), and substantially lower rates for Weak WARP compared to either of the other axioms (28:4%). Within treatment, however, the only meaningful comparison in the di erence of proportions is between failures of Expansion and Weak WARP, which are the only two independent axioms.23 Here the hypothesis of equality in the proportion of subjects violating the two axiom is rejected (McNemar's exact p-value is 0:002 in the PAY treatment and 0:041 in the HYP treatment). PAY # % HYP # % Expansion 51 50 22 39.3 Weak WARP 29 28.4 8 WARP 54 52.9 22 39.3 14.3 Table 5: Overall axiom violations. Table 5 con rms that WARP, i.e. the full rationality model, does not describe the data well, especially in the PAY treatment where less than half of the subjects t the model. Consider now RSMs. The crosstabulation of violations of the two axioms characterizing it is reported in table 6. Interestingly, in both treatments, no individual who satis es Expansion violates Weak WARP. That is, the (large) number of Expansion violators is not joined by another separate group of Weak WARP violators in order to determine the RSM violators. The RSM violators are simply counted by Expansion violators (of which some are also Weak WARP violators). So although Weak WARP and Expansion are logically independent properties, in our experimental sample they are not statistically independent. This also shows that, quite remarkably, although RSM is a much weaker notion than full rationality, in our experimental sample it does not do a substantially better job than the fully rational model at explaining the data. 23 For comparisons between the proportion of violations of other pairs of axioms it is not possible to rely on a McNemar test, as violations of either Expansion or Weak WARP imply violations of WARP (i.e. the relevant contingency table would have structural zeroes). We defer tackling of this issue to our discussion of the relative performance of alternative theories further below. 21 PAY HYP Expansion Expansion X # Weak WARP X % X # % # % # % 29 28.4 0 0 8 14.3 0 0 22 21.6 51 50 14 25 Weak WARP 34 60.7 X Table 6: Violations of Weak WARP and Expansion RSMs improve only marginally on order maximization in their ability to explain the data for the PAY treatment (decreasing the violations from 52:9% in the case of WARP to 50% in the case of RSM), and they are as bad in the HYP treatment. This is due to the fact that binary cycles, which violate full rationality but not RSM, are not a very relevant phenomenon here, unlike menu e ects (in the sense of violations of Condorcet Consistency), which cannot accommodated by either of these two theories. Finally, we turn to Weak WARP and the model of RSM by categorization. From Table 5 we can see that Weak WARP is satis ed by just below 72% of the subjects in the PAY treatment and just below 86% of the subjects in the HYP treatment. In summary then: PAY # Full rationality % 48 47.1 HYP # % 34 60.7 Rational Shortlist Method 51 50 34 60.7 RSM by categorization 48 85.7 73 71.6 Table 7: Explanatory power of competing theories. The three models are nested, that is Full Rationality ) Rational Shortlist Method ) RSM by categorization 22 In order to compare the incremental `explanatory' power in each more general theory in a meaningful way, we have to take into account this nestedness. We do so in the next section. 3.2.4 Comparing Models with Selten's Measure of Predictive Success In order to compare the models we use Selten's Measure of Predictive Success (Selten [36]). This measure was speci cally designed to evaluate `area theories' like the ones considered in this paper, namely theories that exclude deterministically a subset of the possible outcomes. The measure takes into account not only the `descriptive power' of the model (measured by the proportion of `hits', the observed outcomes consistent with the model), but also its `parsimony'. The lower the proportion of theoretically possible outcomes consistent with the model, the more parsimonious the model. In our speci c case, one possible criticism of our `revealed preference' type of tests might be that the experiment does not have enough power to reject Weak WARP even if it happened to be the wrong hypothesis. For example, if each universal set consisted of only three, instead than of four, alternatives, Weak WARP could never be violated. But the observed 100% hit rate should be interpreted as a failure of the experimental design rather than a validation of the model of RSM by categorization. By introducing the `parsimony' element, Selten's measure would pick up this problem. More precisely, the measure, denoted s, is expressed as s=r a where r is the descriptive power (number of actually observed outcomes compatible with the model divided by the number of possible outcomes) and a is the `relative area' of the model, namely the number of outcomes in principle compatible with the model divided by the number of all possible outcomes. In the hypothetical example of a universal set of three alternatives, s = 0 for the model of RSM by categorization In our experiment, we observed two choice functions for each subject. So the number of all logically possible observations of choice behavior for each subject was (20; 736)2 . A `hit' consists of the subject not violating the axioms (characterizing a speci c model) in either of the two choice functions. Thus the values of r for the various models are in 23 Table 7. In order to compute a for a model we have to compute the proportion of choice functions compatible with the set of axioms characterizing that model. Let's consider the Full Rationality model rst. For each universal set of alternatives, the number of choice functions satisfying WARP is simply the number of strict orderings on a set of cardinality four, that is 4! = 24. Therefore for each individual there are only 242 possible patterns of behavior in the experiment compatible with the Full Rationality model. The area of the Full Rationality model is: aF R = 24 20; 736 2 0 The Full Rationality model is thus a beautifully parsimonious model whose Selten's Measure of Predictive Success is entirely determined by its descriptive power. From Table 7 above we thus have, for the PAY and HYP treatments, the following values of Selten's Measure for this model: sFP R AY 0:471 sFHYR P 0:607 The RSM model, as we have seen, does not improve signi cantly in this experiment on the Full Rationality model even in terms of sheer descriptive power. So a fortiori it is not an interesting competitor to the latter in terms of overall predictive success. Therefore we move to the computation of the values for the model of RSM by categorization. We need to compute rst the number of choice functions that satisfy Weak WARP. This can be done by considering four possible exhaustive and mutually exclusive con gurations, depending on whether the choice from the grand set, (X), is selected in pairwise choice over all three other alternatives (Con guration A), over exactly two other alternatives (Con guration B), over exactly one other alternative (Con guration C), or over none of the other alternatives (Con guration D). How many possible cases of Con guration A are there in which Weak WARP is satised? The choices for all sets of cardinality three which include (S) are forced, as (S) must be chosen from them in order not to violate Weak WARP. So, the four possible choices of (S) can be combined with the three possible choices from the set of cardinal- ity three not including (S), and with the 23 possible choice combinations from the three 24 binary sets that do not include (S). All in all, we have 4 3 23 = 96 possible choice functions compatible with Weak WARP in this con guration. For Con guration B, we have rst of all that the four possible choices for be combined with three possible choices for the alternative which is chosen over binary comparison. For each of these combinations, there are 3 (S) can (S) in 22 choice combinations from the sets of cardinality three (since only two choices compatible with Weak WARP can be made from the two sets of cardinality three that include (S) and the alternative that `beats' it in binary comparison), together with the 23 choice combinations from the binary sets that do not include (S). All in all, we have 4 3 (3 22 ) 23 = 1; 152 possible choice functions compatible with Weak WARP in this con guration. Reasoning along similar lines leads to the count of 3; 456 possible choice functions compatible with Weak WARP in con guration C, and to 2; 592 in Con guration D. Adding up, in total there are 7; 296 possible choice functions compatible with Weak WARP when the grand set has cardinality four, and thus (7; 296)2 possible types of choice behavior compatible with Weak WARP in our experiment. This leads to the area value a RSM C = 7; 296 20; 736 2 = 0:123 Our model of RSM by categorization is as expected less parsimonious than the Full Rationality model. However let's consider the values for Selten's Measure in the two treatments: C sRSM = 0:716 P AY 0:123 = 0:593 C sRSM = 0:857 HY P 0:123 = 0:734 We conclude that the predictive success of the model of RSM by categorization is superior in both treatments to that of the Full Rationality model. 3.2.5 Summary and Comment. The general indication we draw from the data is that a model addressing the lack of full rationality in the choice function must be able to explain menu-e ects in the form of Condorcet inconsistency. 25 This indication is con rmed in the analysis of the three models we have studied in this paper. Neither the full rationality nor the RSM model are compatible with menu e ects of the Condorcet inconsistency type, and indeed they both fail at explaining the data. The RSM model performs only marginally better than the full rationality model. The proportion of successes in explaining behavior is not increased signi cantly when weakening WARP to the combination of Expansion and Weak WARP. The model of RSM by categorization is compatible with Condorcet inconsistency, and it is successful. There is a signi cant leap in the proportion of successes in explaining behavior when weakening WARP to Weak WARP. The resulting model can explain 50% more data compared to the other two models, namely over 70% in one treatment and over 85% in the other treatment. In addition, the loss of parsimony of the model of RSM by categorization is much smaller than the increase in descriptive power, as evidenced by its superiority on the basis of Selten's Measure of Predictive Success. 3.3 Experimental results II: Other considerations The data allow a plethora of additional considerations - due to space limitations we cannot analyze them all in this paper, and limit ourselves to highlighting just a few. Two clear patterns concerning violations of the axioms that emerge from both the aggregate data tables and the individual choice data tables are the following: 1. People are more consistent for longer sequences. For each axiom considered, the proportion of choices or subjects violating it falls as sequence length increases, irrespective of treatment. 2. People are more consistent if they are not paid. For each axiom considered, the proportion of choices or subjects violating it falls when incentives are removed, i.e. when passing from the PAY to the HYP treatment, regardless of sequence length. In addition, by looking at the crosstabulation of violations of each axiom by sequence length, we can measure, for each axiom, the proportion of subjects failing to satisfy it for 26 at least one choice function. Crosstabulations of this sort allow us to test for each axiom the following: within each treatment: (i) the statistical signi cance of the fall in the proportions of violations when going from shorter to longer sequences, and (ii) whether or not violations observed for di erent sequence length are associated. across treatments: whether, controlling for sequence length, the proportion of violations depends on treatment, i.e. whether elicitation of choices by incentive compatible means in the PAY treatment results in a di erent proportion of subjects violating each axiom as compared to the HYP treatment. We summarize our main ndings in table 8 below.24 Notationwise, and 2 3 refer to the proportions of subjects violating an axiom in choices involving two and three period sequences, respectively, for any given treatment. In addition, for any given sequence length, P AY and refer to the proportions of subjects violating an axiom in the HY P PAY and HYP treatment, respectively. Within treatment PAY 2 > 3 random errors 2 > 3 P AY random errors 2 periods X X X PC X Weak WARP EXP HYP X CC Across treatment HY P 3 periods X (10%) X (10%) X (10%) X X WARP > X X Table 8: Comparisons of proportions and association. In the leftmost part of table 8 (under the heading `within treatment') we report (i) whether or not 2 is statistically larger than 3, and (ii) whether violation of an axiom for shorter sequences makes it any more likely that the subject violates the same axiom 24 The detailed derivation of this summary is reported in Appendix C in the technical appendix available online at http://www.qmul.ac.uk/ ugte172/twosimiltechnical and data appendix.pdf. 27 when choosing out of longer sequences, too. If this is not the case, one may assume that di erences in the proportions of violations across sequence length are due to the subjects making mistakes independently from one another - in table 8 this lack of association is referred to by the shorthand `random errors'. In each column, we use a tick (X) when the relevant statistic is such that the heading in the table `holds', and a cross ( ) when the heading in the column `fails'. So for point (i), a tick indicates that larger than 3, 2 is statistically while a cross indicates that the di erence in proportions is not statistically signi cant.25 Regarding (ii) instead we use a tick to indicate that indeed di erences may be just random, and a cross when this is not the case.26 The rightmost part of the table instead reports whether, for any given sequence length, the fall in the proportion of subjects violating a given axiom when moving from the PAY to the HYP treatment is statistically signi cant (which we denote by a tick X) or not (which we denote by a cross 25 ).27 To be precise, the null hypothesis of the test for (i) is that the proportion of violations is the same regardless of sequence length against a one sided alternative that the proportion of violations for two period sequences is larger than for three period sequences. Then the tick refers to the null being rejected. To test this hypothesis we rely on McNemar' statistic. 26 To be precise, the null hypothesis of the test is for lack of association between rows and columns in the cross-tabulation (i.e. the odds ratio is equal to 1). If this hypothesis is rejected, then rows and columns are associated, i.e. a subject is much more likely to violate the axiom in choice among three period sequences when he has done so in choice out of two period sequences too. In the table we abuse terminology for the sake of clarity, so that a tick corresponds to a rejection of the null hypothesis, while a cross stands for failure to reject. We base this test on Fisher's statistic. Note that tests (i) and (ii) are independent, in the sense that a high p-value in the McNemar test does not necessarily imply a high p-value of the Fisher test, and viceversa. For instance, in the table long short yes no yes 2 8 no 1 4 based on the Mc Nemar test one rejects the null of equality of proportion whereas based on the Fisher test one fails to reject the null of lack of association between rows and columns. 27 To be precise, the null hypothesis of the test is for equality in the proportion of subjects violating a given axiom across the PAY and HYP populations, based on the Fisher test (i.e. the odds ratio for the table with treatments against violation is equal to unity). If this hypothesis is rejected (against the one sided alternative that the proportion of violations in the PAY treatment is larger than in the HYP 28 There is no clear pattern of association across sequence length for the violations of each axiom (`random error' columns). Broadly, di erences in choice behavior between two and three period sequences are more pronounced in the PAY than in the HYP treatment (` 2 > 3' columns). Moreover choice behavior for three period sequences does not di er much between the two treatments, whereas it does for choices over two period sequences (` P AY > HY P ' columns). However, a quick inspection of choice behavior by subject (table 7) shows that for all of the three models analyzed, their ability to explain the data increases in the HYP treatment as compared to the PAY treatment: when moving from the PAY to the HYP treatment, the percentage of subjects whose choice function is an RSM increases by 10:7%, the percentage of subjects whose choice function can be rationalized in the standard way increases by 13:6%, and the percentage of subjects whose choice function is an RSM by categorisation increases by 14:1%. For the latter two notions of rationality these increments are statistically signi cant,28 and `just' not signi cant for rational shortlist methods.29 Comment. Our data show a very clear pattern whereby monetary incentives to elicit choices which are the expression of `true' preferences have the e ect of producing less `rational' behavior. Providing a rigorous explanation for this phenomenon would go beyond the scope of this paper and the bounds of economics. Still, this seems to open a di erent angle to the discussion on the role of monetary incentives in experiments. In the economics literature this generally revolves around whether or not monetary incentives are necessary to elicit `true' preferences or the `best' outcome (see e.g. Camerer and Hogarth [7], Hertwig and Ortmann [16], Read [29] and Harrison and Rutstr•om [8]). However, we note that an empirical regularity in experiments is that subjects are upset treatment), then the two proportions are statistically di erent. In the table we abuse notation for the sake of expositional clarity, so that a tick corresponds to a rejection of the null hypothesis, while a cross stands for failure to reject. 28 In comparing proportions of violations in the PAY and HYP treatments, Fisher test's exact mid-p values are 0:042 for WARP, i.e. standard rationalisability; and 0:022 for Weak WARP, i.e. RSM by categorisation. 29 The exact mid-p value from the Fisher test is is 0:101. 29 when confronted with their own inconsistencies.30 One might argue that the absence or presence of monetary incentives constitutes a change of (experimental) `frame', so that what matters is not the composition of the set from which the choice is going to be made, rather the objects it includes and whether or not monetary incentives for choice exist. In other words, the choice set when any alternative once chosen is then going to be experienced is a di erent object from a choice set with the same set of available alternative but where choice itself is just a thought experiment. In addition, there may be other `external' relevant dimensions to the problem, such as the decision makers' values, motivations and so on, which might in uence choice.31 Based on these considerations, we advance the tentative hypothesis that our results, too, support the position that incentive compatible elicitation of preference is necessary to elicit preferences that are closer to those that a decision maker would display in a real life choice situation. Where choices are only hypothetical in nature, as in the HYP treatment, the decision maker's main concern is that of being consistent, resulting in less frequent violations of the axioms. In this sense, our results invite caution in the use of introspection for testing the `plausibility' of competing axioms of choice. 3.4 Experimental results III: `La donna e' mobile' When looking at violations of the axioms across sexes, a pattern emerges: within each treatment the proportion of women that violate the axioms is higher than men, albeit with one exception. In addition, the pattern we highlighted in the previous section whereby with the removal of monetary incentives for choice the proportion of violations of our axioms decreases - persists regardless of sex, as shown in Table 9. The statistical signi cance of these di erences, however, is in general weak. In particular: 1. Within treatment: 30 This is based on the casual evidence that generally emerges in de-brie ng discussions, although there is psychological literature that deals with the e ect of a ective states on decisions, see e.g. Luce, Bettman and Payne [21]. 31 This point has been made very clearly by Sen [37]. 30 PAY F # HYP M % # F % # M % # % CC 27 57.4% 24 43.6% 14 48.3% 8 29.6% Expansion 27 57.4% 24 43.6% 14 48.3% 8 29.6% 17 36.2% 12 21.8% 5 17.2% 3 11.1% 7 3 10.3% 1 3.7% 27 57.4% 27 49.1% 14 48.3% 8 29.6% Weak WARP PC WARP 14.9% 10 18.2% Table 9: Violations of the axioms by sex. The di erence in proportions of men and women violating Condorcet Consistency and Expansion is statistically signi cant at 10% con dence level in both the PAY and the HYP treatments;32 The di erence in proportions of men and women violating Pairwise Consistency is not signi cant in either treatment;33 The di erence in proportions of men and women violating Weak WARP is statistically signi cant at 10% con dence level in the PAY treatment and not statistically signi cant in the HYP treatment;34 The di erence in proportions of men and women violating WARP is not statistically signi cant in the PAY treatment and statistically signi cant at 10% con dence level in the HYP treatment.35 2. across treatments: For female participants, the only di erence in the proportions of subjects violating a given axiom across treatments which is statistically signi cant is for 32 The mid-p value for the Fisher test is 0:087 for the PAY treatment and 0:084 for the HYP treatment. 33 The mid-p value for the Fisher test is 0:336 for the PAY treatment and 0:199 for the HYP treatment. 34 The mid-p value for the Fisher test is 0:059 for the PAY treatment and 0:27 for the HYP treatment. 35 The mid-p value for the Fisher test is 0:21 for the PAY treatment and 0:084 for the HYP treatment. 31 Weak WARP, for which the Fisher test returns a mid-p value of 0:042;36 For male participants the only di erences in proportions which are statistically signi cant are for Pairwise Consistency, for which the Fisher test yields a pvalue of 0:037; and WARP, for which the mid-p value from the Fisher test is 0:051.37 The analysis above shows also that there are substantial di erences (though not always statistically signi cant) in the proportion of women and men whose choices conform to either Full rationality (i.e. their choices satisfy WARP) or RSM by categorization (i.e. their choices satisfy Weak WARP). To check di erences in the sexes as to the explanatory power of RSM we present cross-tabulations of Expansion and Weak WARP by sex in tables 10 and 11. PAY Females Males Expansion Expansion X # Weak WARP % 17 36.2% X X # % # 0 0% 12 21.8% 10 21.3% 20 42.6% % # % 0 0% Weak WARP 12 21.8% 31 56.4% X Table 10: RSM by sex in the PAY treatment. When considering RSMs, then, the di erence across sexes is quite substantial in both treatments (around 13% in the PAY treatment and just short of 20% in the HYP treatment), and it is also statistically signi cant at 10% con dence level for both treatments.38 Finally, di erences across treatments by sex are not statistically signi cant.39 36 The mid-p value for the Fisher test is equal to 0:224 for Condorcet Consistency, WARP and Expan- sion, and equal to 0:30 for Pairwise Consistency. 37 The other mid-p values for the Fisher test are equal to 0:117 for both Condorcet Consistency and Expansion, and to 0:129 for Weak WARP. 38 The mid-p values for the Fisher test of the di erence in the proportion of men and women satisfying RSM is equal to 0:087 for the PAY treatment and 0:084 for the HYP treatment. 39 The mid-p values for the Fisher test of the di erence in the proportion of subjects satisfying RSM 32 HYP Females Males Expansion Expansion X X # % # % # % # % 5 17.2% 0 0% 3 11.1% 0 0% 9 31.1% 15 51.7% 5 18.5% 19 70.4% X Weak WARP X Weak WARP Table 11: RSM by sex in the HYP treatment. In summary then: PAY F # HYP M % # F % # % M # % Rationalizability 20 42.6 28 50.9 15 51.7 19 70.4 Rational Shortlist Methods 20 42.6 31 56.4 15 51.7 19 70.4 RSM by categorization 30 63.8 43 78.2 24 82.8 24 88.9 Table 12: Explanatory power of competing theories across sexes The model of RSM by categorization works better than the other two regardless of sex. Across sexes, there are di erences: the proportion of men whose choices are rationalizable is higher than the proportion of women in the HYP treatment, but not in the PAY treatment;40 the proportion of men whose choices are RSM is statistically signi cantly higher than the proportion of women in both the PAY and HYP treatment;41 in the HYP and PAY treatments is equal to 0:228 for Female participants and equal to 0:117 for Male participants. 40 The Fisher test yields an exact mid-p value equal to 0.215 for the PAY treatment and to 0:084 for the HYP treatment (i.e. for the latter the di erence in proportions is signi cant at 10% con dence level). 41 The Fisher test yields an exact mid-p value equal to 0:086 for the PAY treatment and to 0:084 for the HYP treatment (i.e. the di erence in proportions is signi cant at 10% con dence level). 33 the proportion of men whose choices are an RSM by categorization is higher than the proportion of women in the PAY but not in the HYP treatment.42 It is very hard to explain any of these di erences in behavior within a purely economic framework. We leave further analysis to scholars in other elds. 4 Concluding remarks We hope to have made several points with this paper: Methodology. The broad thrust of the paper has been to show concretely that the standard revealed preference methodology can be successfully used to study (experimentally or otherwise) `psychological' choice procedures. This methodological point has also been recently argued by Rubinstein and Salant ([34] and [33]), Bernheim and Rangel ([5]), Manzini and Mariotti ([22]), and it is implicit in the approach of works such as Masatlioglu and Ok ([25], [26]). Note well: we are not claiming that only direct choice data are relevant for economics (as argued for example by Gul and Pesendorfer [14]), but that even these data alone can be extremely helpful in discriminating between the standard choice model and models of boundedly rational choice, and between di erent models of boundedly rational choice. `Revealed preference' facts. Whether one believes our models or not, the experiment has established quite clearly two series of facts. First, it has highlighted the importance of a precise type of menu e ect with the options made available to the subjects. Indeed, most violations of the Weak Axiom of Revealed Preference in our experiment are due to the Condorcet inconsistency phenomenon rather than to pairwise inconsistency (as we have shown, these two forms of inconsistency are exhaustive). This fact could only be discovered because we elicited entire choice functions from the subjects, rather than asking them to express their (binary) preferences. Secondly, the experiment has demonstrated that while one step in weakening the Weak Axiom of Revealed Preference (to the combination of Weak WARP and Expansion) 42 The Fisher test yields an exact mid-p value equal to 0:059 for the PAY treatment (i.e. statistical signi cance is at 10% con dence level) and to 0:272 for the HYP treatment. 34 does not capture a signi cant new portion of observed behaviours, a second step in this weakening (to Weak WARP alone) is dramatically more e ective. Models of choice. As a consequence of the facts above, we now know what a good model has to do in order to explain choices of the type we have considered. The utility maximization model and the pure Rational Shortlist Method model ([22]) cannot explain well the choice data elicited in our experiment because they can't address menu e ects. Our new model of an RSM by categorization instead performs de nitely better (not only in terms of brute number of observations explained but also in terms of Selten's [36] Measure of Predictive Success). This model has two main attractions. First, its ability to capture in a simple way menu e ects. And, second, its easy testability by means of choice data (being equivalent to a single choice axiom), without the need to search directly for all possible psychological entities (`categories' and `rationales') that appear in its description and that might t the data. As a parsimonious way to describe choices, that is as a convenient `as if ' device, according to our data this is a better model than, say, the utility maximisation model; categories and rationales are better descriptive tools than utility. So in terms of a pure `revealed preference' approach we could stop at that. But we would ultimately like to argue also for the substantive validity of RSMs by categorization as a model of choice, in the sense that they describe cognitive operations that real decisions makers actually use. Obviously many more (choice or non-choice) data would be needed for a thorough validation also in this sense, since other (yet to be formulated) psychological mechanisms may be compatible with Weak WARP. Nevertheless we believe some ground has been covered already, as the model passes a rst screening of introspective plausibility and of psychological support. A minimal claim is that it does so at least as well as competing models. References [1] Agresti, A. (2002) Categorical Data Analysis, Wiley Series in Probability and Statistics, Second Edition, John Wiley and Sons. 35 [2] Andreoni, J. and W. T. Harbaugh (2006) \Power Indices for Revealed Preferences Tests", mimeo. [3] Andreoni, J. and J. Miller (2002) \Giving According to GARP: An Experimental Tests of the Consistency of Preferences for Altruism", Econometrica, 70: 737-753. [4] Becker, G. S. (1962) \Irrational Behavior in Economic Theory", Journal of Political Economy, 70: 1-13. [5] Bernheim, D.B. and A. Rangel (2007) \Toward Choice-Theoretic Foundations for Behavioral Welfare Economics", American Economic Review (Papers & Proceedings), 97: 464-470. [6] Bronars, S. G. (1987) \The Power of Nonparametric Tests of Preference Maximization", Econometrica, 55: 693-698. [7] Camerer, C. F. and R. Hogarth (1999) \The e ects of nancial incentives in experiments: A review and capital-labor-production framework", Journal of Risk and Uncertainty, 19: 7-42. [8] Harrison, G. and E. Rutstr•om \Experimental Evidence on the Existence of Hypothetical Bias in Value Elicitation Methods", in C. Plott and V. Smith, eds., Handbook of Experimental Economics Results. New York: Elsevier, forthcoming. [9] Fevrier, P. and M. Visser (2004) \A Study of Consumer Behavior Using Laboratory Data", Experimental Economics, 7: 93-114. [10] Fishburn, P. C. (1970) Utility theory for decision making, Wiley, New York. [11] Frederick, S., G. F. Loewenstein and T. O'Donoghue (2002) \Time discounting and time preferences: a critical review", Journal of Economic Literature, 40: 351-401. [12] Gigerenzer, G., P. Todd, and the ABC Research Group (1999) Simple Heuristics That Make Us Smart. New York: Oxford University Press.. [13] Gul, F. and W. Pesendorfer (2001) \Temptation and Self-Control", Econometrica, 69: 1403-1435. 36 [14] Gul, F. and W. Pesendorfer (2005) \Mindless economics", mimeo Princeton University. [15] Harbaugh, W. T., K. Krause and T. Berry (2001) \GARP for Kids: On the Development of Rational Choice Behavior", American Economic Review, 91: 1539-1545. [16] Hertwig, R. and A. Ortmann (2001) `Experimental Practices in Economics: A Challenge for Psychologists?' [target article], Behavioral and Brain Sciences, 24: 383-403. [17] Houy, N. (2006) \Choice functions with States of Mind", forthcoming in Theory and Decisions. [18] Kalai, G., A. Rubinstein and R. Spiegler (2002) \Rationalizing Choice Functions by Multiple Rationales", Econometrica, 70: 2481-2488. [19] Koszegi, B. and M. Rabin (2006) \A Model of Reference Dependent Preferences", Quarterly Journal of Economics, 121: 1133-1166. [20] List, J. and C Gallet (2001) \What Experimental Protocol In uence Disparities Between Actual and Hypothetical Stated Values?", Environmental and Resource Economics, 20: 241-254. [21] Luce, M., J. Bettman, and J. Payne (1997) \Choice Processing in Emotionally Difcult Decisions", Journal of Experimental Psychology: Learning, Memory and Cognition, 23:384-405 [22] Manzini, P. and M. Mariotti (2007) \Sequentially Rationalizable Choice", American Economic Review, 97: 1824-1839. [23] Manzini, P., M. Mariotti and L. Mittone (2006) \Choosing monetary sequences: theory and experimental evidence", CEEL Working paper 1-06. [24] Medin, D.L. and Aguilar, C. (1999) \Categorization", in R.A. Wilson and F.C. Keil (eds) MIT Encyclopedia of the Cognitive Sciences, MIT Press, Cambridge, Ma. [25] Masatlioglu, Y. and E. Ok (2005) \Rational Choice with Status-quo Bias", Journal of Economic Theory 121: 1-29. 37 [26] Masatlioglu, Y. and E. Ok (2006) \Reference-Dependent Procedural Decision Making", working paper, NYU. [27] Mattei, A. (2000) \Full-Scale Real Tests of Consumer Behavior using Expenditure Data", Journal of Economic Behavior and Organization, 43: 487-497. [28] Moulin, H. (1985) \Choice Functions Over a Finite Set: A Summary" Social Choice and Welfare, 2, 147-160. [29] Read, D. (2005) \Monetary incentives, what are they good for?", Journal of Economic Methodology, 12: 265-276. [30] Roelofsma, P. H. and D. Read, 2000 \Intransitive Intertemporal Choice", Journal of Behavioral Decision Making, 13: 161-177. [31] Rubinstein, A., 1988, \Similarity and Decision Making Under Risk (Is there a utility theory resolution to the Allais Paradox?)", Journal of Economic Theory, 46: 145-153. [32] Rubinstein, A., 2003, \Is it `Economics and Psychology' ? The case of hyperbolic discounting", International Economic Review, 44: 1207-1216. [33] Rubinstein, A. and Y. Salant (2006) \Two Comments on the Principle of Revealed Preference", mimeo, Tel Aviv University, http://arielrubinstein.tau.ac.il/ papers/frames.pdf. [34] Rubinstein, A. and Y. Salant (2007) \(A; f ) Choice with Frames", mimeo, Tel Aviv University, http://arielrubinstein.tau.ac.il/papers/Af.pdf. [35] Samuelson, P. (1938) \A Note on the Pure Theory of Consumer's Behavior", Economica 5 (1): 61-71. [36] Selten, R. (1991)\Properties of a measure of predictive success", Mathematical Social Sciences, 21: 153-167. [37] Sen, A. (1993) \Internal consistency of choice", Econometrica, 61: 495-521. [38] Sen, A. (1997) \Maximization and the Act of Choice", Econometrica, 65, 745-779. 38 [39] Simonson, I. (1989) \Choice Based on Reasons: The Case of Attraction and Compromise E ects", Journal of Consumer Research, 16: 158-74. [40] Sippel, R. (1997) \An Experiment on the Pure Theory of Consumer's Behavior", Economic Journal, 107: 1431-1444. [41] Suzumura, K. (1983) Rational Choice, Collective Decisions, and Social Welfare, Cambridge University Press, Cambridge U.K. [42] Tversky, A. (1969) \Intransitivity of Preferences", Psychological Review, 76: 31-48. [43] Tversky, A. (1972) \Elimination By Aspects: A Theory of Choice", Psychological Review, 79: 281-299. [44] Varian, H. (2005) \Revealed Preference", forthcoming in Samulesonian Economics and the 21st century, Michael Szenberg (Ed.). [45] Waite, T. (2001) \Intransitive preferences in hoarding gray jays (Perisoreus canadensis)", Behavioral Ecology and Sociobiology, 50: 116 - 121. [46] Yee, M., E. Dahan, J. R. Hauser and J. Orlin (2007) \Greedoid-Based NonCompensatory Inference", Marketing Science, 26: 532-549. 39 A Appendix: Instructions Please note: you are not allowed to communicate with the other participants for the entire duration of the experiment. The instructions are the same for all you. You are taking part in an experiment to study intertemporal preferences. The project is nanced by the ESRC. Shortly you will see on your screen a series of displays. Each display contains various remuneration plans worth the same total amount of C =48 each, staggered in three, six and nine months installments. For every display you will have to select the plan that you prefer, clicking on the button with the letter corresponding to the chosen plan. (HYP: These remuneration plans are purely hypothetical. At the end of the experiment you'll be given a participation fee of C =5) (PAY: At the end of the experiment one of the displays will be drawn at random and your remuneration will be made according to the plan you have chosen in that display). In order to familiarize yourself with the way the plans will be presented onscreen, we shall now give you a completely hypothetical example, based on a C =7 total remuneration. Plan A How much When =3 C in one year =1 C in two years =1 C in three years =2 C in four years Plan B How much When =1 C in one year =2 C in two years =3 C in three years C1 = in four years In this example plan A yields C =7 in total in installments of C =3, C =1, C =1 and C =2 in a year, two years, three years and four years from now, respectively, while plan B yields C =7 in total in installments of C =1, C =2, C =3 and C =1 in a year, two years, three years and four years from now, respectively. 40
© Copyright 2026 Paperzz