‘Stochastically More Risk Averse:’ A Contextual Theory of Stochastic Discrete Choice Under Risk

Nathaniel T. Wilcox
Department of Economics, University of Houston, Houston, TX 77204-5019
[email protected]

Abstract

I introduce the relation “stochastically more risk averse.” Stochastic choice under risk has overwhelmingly been modeled by way of what decision theory calls a strong or strict utility model; in microeconometrics these are also known as homoscedastic latent variable models. In particular, a parametric preference functional difference, representing the difference between the values of two lotteries (as specified by expected utility, rank-dependent expected utility, cumulative prospect theory and so forth), is embedded within a cumulative distribution function, such as the standard normal or logistic, to represent choice probabilities. Then a likelihood function is constructed and estimated. I show that there is a deep conceptual difficulty with this practice: Estimated money utility function parameters that are meant to represent an agent’s degree of risk aversion in the senses of Pratt (1964), and hence should order agents according to the relation “more risk averse,” have no necessary correspondence to the relation “stochastically more risk averse” within such strong or strict utility implementations of the preference functionals. I introduce a new stochastic choice model, called “contextual utility,” that remedies this unsatisfactory situation. Contextual utility is based on the psychologically plausible notion that the “outcome context” of each lottery choice pair creates a subjective utility range that mediates the perception of value differences between the lotteries in that pair. From a microeconometric viewpoint, contextual utility is simply a heteroscedastic latent variable model in which latent error variance is proportional to the range of outcome utilities in lottery pairs.
Empirical evidence is presented showing that contextual utility explains and predicts discrete choice under risk better than strong or strict utility models or random preference models.

I thank John Hey for generously making his data available to myself and others, and I thank Soo Hong Chew and Glenn Harrison for conversation and commentary during this work. I also thank the U.S. National Science Foundation for research support under grant SES 0350565.

Let c = (z_{1c}, z_{2c}, z_{3c}) be a vector of three money outcomes such that z_{1c} < z_{2c} < z_{3c}, called an outcome context or, more simply, a context. Let S_m be a discrete probability distribution (s_{m1}, s_{m2}, s_{m3}) on any vector (z_1, z_2, z_3); then a three-outcome lottery S_{mc} on context c is the discrete probability distribution S_m on the outcome vector (z_{1c}, z_{2c}, z_{3c}). It is sometimes convenient to view lotteries as cumulative distribution functions S_{mc}(z) ≡ Σ_{i : z_{ic} ≤ z} s_{mi}. A pair mc is a set {S_{mc}, T_{mc}} of two lotteries on a common context c. Let E(S_{mc}) be the expected value of S_{mc}: a mean-preserving spread or MPS pair is one where E(S_{mc}) = E(T_{mc}), s_{m2} + s_{m3} > t_{m2} + t_{m3} and s_{m3} < t_{m3}. Let Ω_mps be the set of all MPS pairs. Since any risk-averse expected utility maximizing agent n will prefer S_{mc} to T_{mc} ∀ mc ∈ Ω_mps (Rothschild and Stiglitz 1970), we can call S_{mc} the “relatively safe choice” in pairs mc ∈ Ω_mps, at least from an expected utility perspective.

Suppose that, for empirical reasons, we are convinced that agents choose probabilistically rather than deterministically.¹ Then, let P^n_{mc} be the probability that agent n chooses S_{mc} from pair mc, which we regard as an empirical primitive. What might it mean for agent a to be probabilistically more risk-averse than agent b? Consider this proposal for a probabilistic empirical primitive:

Stochastically More Risk-Averse (SMRA).
Agent a is stochastically more risk averse than agent b, written a ≻_smra b, iff P^a_{mc} > P^b_{mc} ∀ mc ∈ Ω_mps.

¹ By this, I mean that choice probabilities are usually different from 0 or 1 (or 0.5 in the case of indifference). A large number of experiments that involve multiple trials of identical pairs document substantial choice switching between trials of identical pairs, even at very short time intervals of a few minutes with no intervening change in wealth or any other obvious decision-relevant variable (Tversky 1969; Camerer 1989; Starmer and Sugden 1989; Hey and Orme 1994; Ballinger and Wilcox 1997; Loomes and Sugden 1998; Hey 2001).

That is, a stochastically more risk averse agent makes the relatively safe choice from any MPS pair with higher likelihood than a less stochastically risk-averse agent.² If mean proportions of relatively safe choices from MPS pairs vary significantly across subjects in an experiment, and we call this heterogeneous risk aversion, then we probably have something like SMRA in mind.

SMRA is not, however, an implication of theoretical definitions of the relation “more risk averse” (or MRA) inherited from Pratt (1964) for expected utility (or EU) maximizers, nor of Chew, Karni and Safra’s (1987) later generalization of it for rank-dependent expected utility (or RDEU) maximizers, when these are combined with stochastic discrete choice models commonly used for estimation of risk attitudes. I show this, and offer an alternative stochastic choice model I call contextual utility that does make SMRA an implication of MRA. I then present estimates showing that contextual utility generally describes data and predicts new choices as well as or better than other common stochastic choice models, when these are combined with EU and RDEU theories of choice under risk, in the well-known Hey and Orme (1994) data set.
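The two definitions above are easy to state operationally. The sketch below implements the MPS conditions from the text and the SMRA comparison; the choice probabilities for agents a and b are hypothetical illustrative numbers, not estimates from any experiment.

```python
# A minimal sketch of the SMRA relation. is_mps implements the conditions
# in the text (equal means, s2 + s3 > t2 + t3, s3 < t3); the probability
# tables for agents a and b below are hypothetical.

def is_mps(S, T, z):
    """True iff {S, T} on context z is a mean-preserving spread (MPS) pair."""
    ES = sum(p * x for p, x in zip(S, z))
    ET = sum(p * x for p, x in zip(T, z))
    return abs(ES - ET) < 1e-12 and S[1] + S[2] > T[1] + T[2] and S[2] < T[2]

def smra(P_a, P_b, mps_pairs):
    """True iff agent a is stochastically more risk averse than agent b:
    P_a[mc] > P_b[mc] for every MPS pair mc."""
    return all(P_a[mc] > P_b[mc] for mc in mps_pairs)

# An MPS pair from Hey (2001) on the context (0, 50, 100):
print(is_mps((0, 7/8, 1/8), (3/8, 1/8, 4/8), (0, 50, 100)))  # True

# Hypothetical safe-choice probabilities on two MPS pairs:
P_a = {"pair1": 0.80, "pair2": 0.70}
P_b = {"pair1": 0.65, "pair2": 0.55}
print(smra(P_a, P_b, ["pair1", "pair2"]))  # True
```

Note that SMRA is a pointwise requirement: a single MPS pair on which the inequality reverses breaks the relation.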
Thus we can have our theoretical cake and empirically eat it too: A stochastic choice model like contextual utility, one that makes SMRA and MRA congruent with one another, is also an empirically better model of stochastic choices from lottery pairs.

As a relational preference concept, MRA is ubiquitous in professional economic discourse. Figure 1 illustrates this for the years 1977 to 2001, comparing counts of articles³ with either of the phrases “more risk-averse” or “greater risk aversion” in their text to counts of articles with either of the phrases “more substitutable” or “greater elasticity of substitution” in their text. The latter is clearly a central relational concept of economics. The comparative levels of the two trend lines are somewhat arbitrary, since searches for different text strings would affect the levels.

² We may, if we wish, fancy it up by making distinctions between local and global versions of the primitive, taking account, for instance, of the possibility that risk aversion may change with current wealth in some manner. I will not delve into this here: Formally, regard my discussion as limited to what Cox and Sadiraj (2006) call the “expected utility of income” model (and the rank-dependent expected utility of income model as well).

³ The years 1977 to 2001 were chosen so that a broad range of economics journals were searchable in JSTOR for all years in that range; only such journals are included for these counts. Some journals withhold JSTOR availability for the most recent five years, so I chose 2001 as the last widely available year. A small number of important journals either do not have JSTOR availability back to 1977 (e.g. the Journal of Labor Economics) or did not exist back to 1977 (e.g. the Journal of Economic Perspectives) and so are not included. Three important business journals are also included for these article counts (the Journal of Business, the Journal of Finance and Management Science).
Moreover, instances of the “substitutability strings” include articles about technologies rather than preferences. The comparative trends are more meaningful: MRA gained on “more substitutable” as a relational concept over the quarter century from 1977 to 2001. Figure 1 also shows that although citations⁴ to Pratt (1964) peaked in the early 1980s, they have continued at a fairly constant rate of about thirty-five to forty a year over the last two decades and show no sign of decrease. Probably, very few economics articles show such steady and long-lived influence. There can be little doubt that MRA is a central relational preference concept of economics.

Estimation and interpretation of structural parameters of theories of discrete choice under risk is becoming widespread. Estimation requires the adoption of some stochastic choice model, and we need to know how to interpret structural parameters of theories of choice under risk when those parameters are estimated with any particular stochastic choice model. It seems desirable that structural parameters associated with the concept of MRA should have a theoretically appealing empirical meaning like SMRA, particularly given MRA’s ubiquity in economic discourse. The contextual utility stochastic model presented here delivers that interpretation: Commonly used alternatives do not. I establish the latter negative point first, and then show the former. Then I show that models based on contextual utility explain (and especially predict) risky choices better than models based on strong utility, strict utility and random preference stochastic models in the well-known Hey and Orme (1994) data set.

⁴ Here, I searched the ISI Science and Social Science Citation indices, which allow searching from 1977 to 2006. All fields (not just economics) are included for these citation counts in Figure 1: The temporal pattern of citations is extremely similar when limited to citations in economics journals.
1. Strong and strict utility

To connect deterministic theory to probabilistic primitives, let the structure of choice under risk be defined as a function V of lotteries and a vector of structural parameters β^n such that

(1.1)  V(S_{mc} | β^n) − V(T_{mc} | β^n) ≥ 0 ⇔ P^n_{mc} ≥ 0.5.⁵

I will frequently refer to V(S_{mc} | β^n) − V(T_{mc} | β^n) as the V-distance between S_{mc} and T_{mc}. Equation (1.1) equates the deterministic primitive S_{mc} ≽^n T_{mc}, or “S_{mc} is weakly preferred to T_{mc} by n,” with the probabilistic primitive “n is at least as likely to choose S_{mc} from mc as T_{mc},” a common approach since Edwards (1954).⁶ Since structure maps pairs into a set of possible probabilities rather than a single unique probability, structure alone underdetermines choice probabilities. Stochastic models combine with structure to remedy this.

⁵ This restricts attention to transitive structures; this would be an unsatisfactory equation for a nontransitive theory.

For example, expected utility (EU) with the constant relative risk aversion (CRRA) utility of money u^n(z) = (1 − ρ^n)^{−1} z^{1−ρ^n} is the structure V(S_{mc} | ρ^n) = (1 − ρ^n)^{−1} Σ_{i=1}^{3} s_{mi} z_{ic}^{1−ρ^n}, so that

(1.2)  (1 − ρ^n)^{−1} Σ_{i=1}^{3} (s_{mi} − t_{mi}) z_{ic}^{1−ρ^n} ≥ 0 ⇔ P^n_{mc} ≥ 0.5,

where β^n = ρ^n is the sole structural parameter, called agent n’s coefficient of relative risk aversion. Call this the CRRA EU structure.

To estimate ρ^n in (1.2), or more generally β^n in (1.1), we need a stochastic model. Many empirical economists instinctively use a homoscedastic latent variable model to complete the relationship between the structural left side of (1.1) and the choice probability P^n_{mc} on the right side of (1.1). Let y^n_{mc} = 1 if agent n chooses S_{mc} from pair mc (y^n_{mc} = 0 otherwise). In general, such models assume that there is an underlying but unobserved
continuous random latent variable y^{n*}_{mc} such that y^n_{mc} = 1 ⇔ y^{n*}_{mc} ≥ 0; then we have P^n_{mc} = Pr(y^{n*}_{mc} ≥ 0). Here, the latent variable is

(1.3)  y^{n*}_{mc} = V(S_{mc} | β^n) − V(T_{mc} | β^n) − σε,

where ε is a mean zero random variable with some standard variance and c.d.f. H(x) such that H(0) = 0.5 and H(x) = 1 − H(−x), usually assumed to be the standard normal or logistic c.d.f. The resulting homoscedastic latent variable model of P^n_{mc} is then

(1.4)  P^n_{mc} = H([V(S_{mc} | β^n) − V(T_{mc} | β^n)] / σ).

The modifier “homoscedastic” refers to the fact that σ is assumed to be constant across both pairs mc and agents n, a common assumption made when estimating these models. In decision theory, (1.4) is a strong utility model (Debreu 1958; Block and Marschak 1960; Luce and Suppes 1965). Letting L be the set of all lotteries, choice probabilities follow strong utility if there is a function µ^n : L → ℝ and an increasing function φ^n : ℝ → [0,1], with φ^n(0) = 0.5 and φ^n(x) = 1 − φ^n(−x), such that P^n_{mc} = φ^n(µ^n(S_{mc}) − µ^n(T_{mc})). Clearly, (1.4) is a strong utility model where φ^n(x) = H(x/σ) and µ^n(S_{mc}) = V(S_{mc} | β^n). In equation (1.3), σε is regarded as computational, perceptual or evaluative noise in the decision maker’s apprehension of the V-distance V(S_{mc} | β^n) − V(T_{mc} | β^n), with σ being proportional to the standard deviation of this noise. Microeconometric doctrine usually views σε as a perturbation to V-distance observed by agents but not observed by the econometrician.

⁶ There are alternative stochastic choice models under which this innocent-sounding equation is incorrect, such as Machina (1985); but existing evidence offers little support for these alternatives (Hey and Carbone 1995).
In either case, as σ approaches zero, choice probabilities converge on either zero or one, depending on the sign of the V-distance; put differently, the observed choice becomes increasingly likely to express the underlying preference direction of the structure. In keeping with common (but not universal) practice and parlance in the experimental literature, substitute λ for 1/σ, and call λ the precision parameter, writing

(1.5)  P^n_{mc} = H(λ[V(S_{mc} | β^n) − V(T_{mc} | β^n)]).

Strict utility is a special case of strong utility where the function µ^n is additionally restricted to be positive-valued and choice probabilities can be written as

(1.6)  P^n_{mc} = µ^n(S_{mc}) / [µ^n(S_{mc}) + µ^n(T_{mc})].

Provided V is also positive-valued, we may define µ^n(S_{mc}) ≡ V(S_{mc} | β^n)^λ and rewrite (1.6) as

(1.7)  P^n_{mc} = Λ(λ[ln V(S_{mc} | β^n) − ln V(T_{mc} | β^n)]),

where Λ(x) = (1 + e^{−x})^{−1} is the logistic c.d.f. So strict utility can be viewed as a special case of strong utility where ln(V) replaces V in (1.1) and H is the logistic c.d.f.

One may produce a variety of strong utility implementations of any structure, and that variety is well-represented in the estimation literature. The utility function for money u(z) in both EU and RDEU structures is only unique up to an affine transformation, and in the literature we see various transformations used for estimation from experimental choice data. In the case of the CRRA utility function, call u_b(z | ρ) = (1 − ρ)^{−1} z^{1−ρ} the basic transformation and u_p(z | ρ) = z^{1−ρ} the power transformation. In lottery choice experiments with many distinct contexts c, let z_1 < z_2 < … < z_K be the k = 1, 2, …, K money outcomes from which all contexts c are created.
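The strong utility model (1.5) and its strict utility special case (1.7) can be sketched numerically. The illustration below assumes the basic CRRA transformation and the logistic c.d.f. as H; the lottery pair, context and parameter values are hypothetical.

```python
import math

def logistic(x):
    # Λ(x) = 1 / (1 + e^(-x)), used as H throughout.
    return 1.0 / (1.0 + math.exp(-x))

def crra_eu(probs, z, rho):
    # CRRA EU value under the basic transformation u_b(z|ρ) = z^(1-ρ)/(1-ρ).
    return sum(p * x ** (1 - rho) / (1 - rho) for p, x in zip(probs, z))

def strong_utility_prob(S, T, z, rho, lam):
    # Equation (1.5): P = H(λ[V(S) - V(T)]).
    return logistic(lam * (crra_eu(S, z, rho) - crra_eu(T, z, rho)))

def strict_utility_prob(S, T, z, rho, lam):
    # Equation (1.7): P = Λ(λ[ln V(S) - ln V(T)]); requires V > 0.
    return logistic(lam * (math.log(crra_eu(S, z, rho)) - math.log(crra_eu(T, z, rho))))

# Hypothetical example: an MPS pair on context (0, 50, 100) with ρ = 0.5, λ = 1.
S, T, z = (0, 7/8, 1/8), (3/8, 1/8, 4/8), (0, 50, 100)
p_strong = strong_utility_prob(S, T, z, rho=0.5, lam=1.0)
p_strict = strict_utility_prob(S, T, z, rho=0.5, lam=1.0)
# A risk-averse agent (ρ > 0) chooses the relatively safe lottery with
# probability greater than 0.5 under both models:
print(p_strong > 0.5, p_strict > 0.5)  # True True
```

As λ grows the probability in either model approaches 1 when the V-distance is positive, mirroring the σ → 0 limit described above.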
Other transformations we might use for strong utility estimation of ρ, given data from such an experiment, are u_ur(z | ρ) = (z^{1−ρ} − z_1^{1−ρ}) / (z_K^{1−ρ} − z_1^{1−ρ}), the unit range CRRA transformation (so called because u_ur(z_1 | ρ) = 0 and u_ur(z_K | ρ) = 1), and u_mu(z | ρ) = (z^{1−ρ} − z_1^{1−ρ}) / (z_2^{1−ρ} − z_1^{1−ρ}), the minimum unit CRRA transformation (so called because u_mu(z_1 | ρ) = 0 and u_mu(z_2 | ρ) = 1). Let ∆_τ V_{mc}(ρ) be shorthand for the CRRA EU V-distance between the lotteries in pair mc, as computed using transformation τ, for use in a strong utility model; that is, let

(1.8)  ∆_τ V_{mc}(ρ) ≡ Σ_{i=1}^{3} (s_{mi} − t_{mi}) u_τ(z_{ic} | ρ),  τ ∈ {b, p, ur, mu}.

Also, let ∆_ln V_{mc}(ρ) be the strict utility (logarithmic) CRRA EU V-distance, defined as

(1.9)  ∆_ln V_{mc}(ρ) ≡ ln(Σ_{i=1}^{3} s_{mi} z_{ic}^{1−ρ}) − ln(Σ_{i=1}^{3} t_{mi} z_{ic}^{1−ρ}).

With some terminological abuse, say that this V-distance uses the log transformation. Equations (1.8) and (1.9) now give a family of strong and strict CRRA EU models of P^n_{mc}, namely

(1.10)  P^n_{mc} = H(λ ∆_τ V_{mc}(ρ^n)),  τ ∈ {b, p, ur, mu, ln},

where for strict utility H must be the logistic c.d.f. For consistency, I use the logistic c.d.f. as H in all estimations reported subsequently.

Note that homoscedasticity with respect to agents n is not an essential feature of either strong or strict utility models. However, homoscedasticity with respect to pairs mc is a central characteristic of strong and strict utility models: Without it, they do not imply strong stochastic transitivity, one of the chief observable implications of strong and strict utility that distinguish them from other stochastic models (Block and Marschak 1960; Luce and Suppes 1965).

2. Critique of fully homoscedastic strong and strict utility

Pratt (1964) originally developed a definition of what it means to say that EU-maximizing agent a is “more risk averse” (or MRA) than EU-maximizing agent b, which we may write as a ≻_mra b.
Because such definitions of “more risk averse than” are couched in terms of comparative statics in the neighborhood of indifference, they do not necessarily imply anything about changes in V-distance away from the neighborhood of indifference; therefore, a ≻_mra b ⇏ a ≻_smra b in strong or strict utility models. Since V-distance is arbitrary in affine structures, a definition of “more risk averse than” will of necessity be couched in terms of what happens in the neighborhood of indifference, where the arbitrariness of affine transformations is locally irrelevant. This extended quote from Chew, Karni and Safra (1987, p. 374), with notation modified to fit mine, illustrates this, and is definitive of MRA for both EU and RDEU:

To compare the attitudes toward risk of two preference relations ≽^a and ≽^b on D_J [a set of probability distributions on an interval J ⊂ ℝ], we define T_{mc} ∈ D_J to differ from S_{mc} ∈ D_J by a simple compensated spread from the viewpoint of ≽^b, if and only if T_{mc} ∼^b S_{mc} and ∃ z_0 ∈ J such that T_{mc}(z) ≥ S_{mc}(z) for all z < z_0 and T_{mc}(z) ≤ S_{mc}(z) for all z ≥ z_0.

DEFINITION 5: A preference relation ≽^a ∈ G [the set of preference relations representable by Gateaux-differentiable RDEU functionals V] is said to be more risk averse than ≽^b ∈ G if S_{mc} ≽^a T_{mc} for every T_{mc}, S_{mc} ∈ D_J such that T_{mc} differs from S_{mc} by a simple compensated spread from the point of view of ≽^b ….

THEOREM 1. The following conditions on a pair of Gateaux differentiable [RDEU] preference functionals V^b and V^a on D_J with respective utility functions u^b and u^a and probability transformation functions w^b and w^a are equivalent: …

(ii) w^a and u^a are concave transformations of w^b and u^b, respectively.

(iii) If T_{mc} differs from S_{mc} by a simple compensated spread from the point of view of ≽^b [implying that V^b(T_{mc}) = V^b(S_{mc})] then V^a(T_{mc}) ≤ V^a(S_{mc}).
In terms of CRRA EU V-distance, Chew, Karni and Safra’s definition of a simple compensated spread means that if T_{mc} is a simple compensated spread of S_{mc} from the viewpoint of agent b with ρ = ρ^b, we have ∆_τ V_{mc}(ρ^b) = 0. The results (ii) and (iii) of Theorem 1 then imply that ∆_τ V_{mc}(ρ^a) > 0 ∀ ρ^a > ρ^b. However, this does not imply that ∂∆_τ V_{mc}(ρ)/∂ρ > 0 ∀ ρ > ρ^b. MRA is about how indifference becomes preference when we substitute any “more risk averse” utility function (or probability weighting function) for the less risk averse one that generated indifference; it is definitely not about monotone increasing V-distance with greater risk aversion. But the latter is required for a ≻_mra b ⇒ a ≻_smra b in strong and strict utility models if precision λ is independent of structural risk aversion ρ, that is, if evaluative noise is homoscedastic with respect to agents. Since strong utility specifies no necessary relationship between ρ and λ, then, strong utility and MRA do not imply SMRA.

Moreover, MRA cannot imply SMRA across all contexts in any strong or strict utility model that assumes homoscedasticity with respect to agents (constant λ). Graphs illustrate this by way of a simple counterexample, based on Hey’s (2001) four-outcome experimental design (similar four-outcome designs are common, e.g. Hey and Orme 1994 and Harrison and Rutström 2005). The design employs the four outcomes 0, 50, 100 and 150 U.K. pounds. Those four outcomes generate four distinct three-outcome contexts on which all experimental lottery pairs are defined. Label these contexts by their omitted outcome, e.g. the context c = −0 is (50,100,150), while the context c = −150 is (0,50,100). Here, the CRRA unit range transformation is u_ur(z | ρ) = (z^{1−ρ} − 0^{1−ρ}) / (150^{1−ρ} − 0^{1−ρ}) = (z/150)^{1−ρ}, while the minimum unit transformation is u_mu(z | ρ) = (z^{1−ρ} − 0^{1−ρ}) / (50^{1−ρ} − 0^{1−ρ}) = (z/50)^{1−ρ}.
Figure 2 shows the behavior of λ_τ ∆_τ V_{1,−150}(ρ) for the five transformations τ defined previously, computed for the MPS pair 1 on context c = −150 (the context (0,50,100)), given by S_{1,−150} = (0, 7/8, 1/8) and T_{1,−150} = (3/8, 1/8, 4/8), taken from Hey (2001), as ρ varies over the interval [0, 0.99]. To sketch Figure 2, a constant value of λ_τ has been chosen for each transformation to make λ_τ ∆_τ V_{1,−150}(ρ) reach a common maximum of 10 for ρ ∈ [0, 0.99], for easy visual comparisons; the important point is that λ_τ is held constant to draw each graph of λ_τ ∆_τ V_{1,−150}(ρ), for each transformation τ. The graphs thus reveal the behavior of strong and strict utility models in which λ is independent of ρ or, what is the same thing, where λ is assumed to be constant across agents n (homoscedasticity with respect to agents): If λ_τ ∆_τ V_{1,−150}(ρ) is not monotone increasing on [0, 0.99], then MRA cannot imply SMRA under transformation τ, given homoscedasticity with respect to agents.

Notice that the basic, power and log transformations are not monotonically increasing in ρ, though the nonmonotonicity for the basic transformation is mild for this particular pair. The implication is that a strong utility CRRA EU model with λ constant across agents cannot predict SMRA from MRA defined in terms of ρ, using either the basic, power or log transformations, for MPS pair 1 on context c = −150. However, the unit range and minimum unit transformations can do this, at least for this MPS pair on this context. Unfortunately, this property of the unit range and minimum unit transformations disappears as soon as we consider a context that does not share the minimum outcome z_1 = 0 used to define those transformations. Consider now MPS pair 1 on context c = −0 (the context (50,100,150)), given by S_{1,−0} = (0, 7/8, 1/8) and T_{1,−0} = (3/8, 1/8, 4/8), also taken from Hey (2001).
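The nonmonotonicity at issue can be checked directly. The minimal sketch below (assuming only the unit range transformation and the pair just defined) evaluates ∆_ur V_{1,−0}(ρ) on a grid over [0, 0.99] and confirms that the V-distance rises and then falls, so a constant λ cannot make it monotone in ρ.

```python
def delta_ur_V(S, T, z, rho, z_max=150.0):
    # Unit range CRRA transformation for Hey's (2001) outcome set, whose
    # global minimum outcome is 0: u_ur(z|ρ) = (z/150)^(1-ρ).
    u = lambda x: (x / z_max) ** (1 - rho)
    return sum((s - t) * u(x) for s, t, x in zip(S, T, z))

# MPS pair 1 on context c = -0, i.e. (50, 100, 150), from Hey (2001):
S, T, z = (0, 7/8, 1/8), (3/8, 1/8, 4/8), (50, 100, 150)

grid = [i / 100 for i in range(100)]          # ρ ∈ {0.00, 0.01, ..., 0.99}
values = [delta_ur_V(S, T, z, rho) for rho in grid]

rises = any(b > a for a, b in zip(values, values[1:]))
falls = any(b < a for a, b in zip(values, values[1:]))
print(rises and falls)  # True: the V-distance is nonmonotone in ρ on [0, 0.99]
```

Since the V-distance peaks at an interior ρ and declines thereafter, two agents ordered by MRA need not be ordered by SMRA on this pair when λ is held constant.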
Figure 3 shows λ_τ ∆_τ V_{1,−0}(ρ) for MPS pair 1 on context c = −0, for all five transformations: All produce a severely nonmonotonic relationship between V-distance and ρ, given constant precision λ. In essence, none of the five transformations can predict SMRA from MRA in ρ for all MPS pairs on a four-outcome vector, if λ is constant across agents. The conclusion is clear: If we want to explain heterogeneity of safe lottery choices by means of a strong or strict utility model solely in terms of heterogeneity of a parameter like ρ, and we use any of the five transformations discussed above, we cannot uniformly succeed if we assume that precision λ is constant across subjects with varying ρ. Homoscedasticity with respect to agents is unacceptable if we want a ≻_mra b ⇒ a ≻_smra b in a strong or strict utility CRRA EU model across multiple contexts.

The fact that the minimum unit and unit range transformations make MRA and SMRA congruent on a context that shares the minimum outcome used to define them gives a clue to one escape from these difficulties: Why not use a context-dependent transformation? This approach abandons homoscedasticity with respect to pairs mc; hence it is not a strong or strict utility model. This is in fact the approach I eventually argue for: The contextual utility model introduced subsequently assumes heteroscedasticity with respect to outcome contexts, of a form suggested by a context-specific unit range transformation. This can be given both good psychological motivation and firm grounding in terms of Pratt’s (1964) main theorem. First, however, it is worth considering whether another apparent solution, allowing heteroscedasticity with respect to agents, is adequate on its own.
3. Empirical and theoretical problems with an “agent heteroscedasticity” solution

Figure 4 shows two concave CRRA utility functions, for two agents a and b: a square-root (ρ^b = 0.5) CRRA utility function, and a “near-log” (ρ^a = 0.99) CRRA utility function; clearly a ≻_mra b in the senses of Pratt (1964) and Chew, Karni and Safra (1987). Consider the MPS pair S_{2,−0} = (0, 1, 0) and T_{2,−0} = (1/2, 0, 1/2): a sure 100 versus even chances of either 50 or 150. For a ≻_mra b to imply a ≻_smra b under strong utility, we need it to be the case that the V-distance between S_{2,−0} and T_{2,−0}, given ρ^b = 0.5, is less than the same V-distance given ρ^a = 0.99, and Figure 4 has been drawn this way. If this were not true, then the less risk-averse utility function would imply a higher probability of choosing S_{2,−0} under strong utility. But the two CRRA utility functions in Figure 4 are not just any CRRA utility functions with ρ^b = 0.5 and ρ^a = 0.99: A specific transformation has been chosen that equates the first derivatives of the utility functions at the intermediate outcome 100. If this had not been done, the figure would not “nest” the V-distance with ρ^b = 0.5 inside of the V-distance with ρ^a = 0.99, and nothing could be read off about relative V-distance between S_{2,−0} and T_{2,−0} for the two utility functions.⁷ The local risk aversion measure −u″(z)/u′(z) may be thought of as measuring concavity relative to “a common unit first derivative” at z by virtue of its mathematical form, which is what Figure 4 does visually.

Reflection on Figure 4 suggests that for a ≻_mra b ⇒ a ≻_smra b on some collection of contexts, we might succeed with a transformation of u(z) that equates first derivatives u′(z) of all utility functions at some fixed value, at some z sufficiently greater than z̄_1 ≡ max_c {z_{1c}}, the maximum of the minimum outcomes found in any of several contexts c for which we have choice data.
If this is not done, utility functions become arbitrarily flat over one or more contexts as the coefficient of relative risk aversion gets large. This in turn implies that all V-distances between lotteries on such contexts approach zero as the coefficient of relative risk aversion gets large and, hence, that all strong or strict utility choice probabilities approach 0.5 for sufficiently great relative risk aversion on such contexts. The first derivative of the CRRA basic transformation is u_b′(z) = z^{−ρ}; so multiplying the basic transformation by z^ρ, or (what is the same thing) e^{αρ} where α = ln(z) for some z > z̄_1, makes all CRRA utility functions have a common unit slope at z = e^α. Therefore, call u_sc(z | ρ) = e^{αρ} z^{1−ρ}/(1 − ρ) an SMRA-compatible CRRA transformation.

⁷ The figure also chooses a “zero” (really, the c in c + d·u(z)) for each utility function such that u(100) = 100 for both utility functions, but this is just for ease of visual comparison, since this choice disappears from any strong utility V-distance. The choice of d to match first derivatives at z = 100 is the important move.

If we compute CRRA EU V-distance using the SMRA-compatible transformation, the term e^{αρ} may be factored out of the V-distance. The analogue of equation (1.10) then becomes

(3.1)  P^n_{mc} = H(λ e^{αρ^n} ∆_b V_{mc}(ρ^n)).

This is a latent variable model with error that is heteroscedastic with respect to agents n: This is easily seen by writing λ^n ≡ λ e^{αρ^n} ≡ 1/σ^n. This form yields ln(λ^n) ≡ ln(λ) + αρ^n, suggesting that if we estimate something similar to equation (1.10) using the basic CRRA transformation, one subject at a time, getting estimates λ̂^n and ρ̂^n for each subject n, we should find a linear relationship between ln(λ̂^n) and ρ̂^n across subjects n.
If this linear relationship has a slope α significantly greater than ln(z), then a ≻_mra b ⇒ a ≻_smra b in the population from which subjects are drawn, on all contexts found in the experiment that generated the data.

Hey’s (2001) remarkable data set contains 500 lottery choices per subject, for fifty-three subjects. This allows one to estimate model parameters separately for each subject n with relative precision. The specification of choice probabilities actually used for this estimation adds a few small features to equation (1.10) to take care of certain peripheral matters. It is

(3.2)  P^n_{mc} = (1 − δ_{mc}){(1 − ω^n) H(λ^n ∆_b V_{mc}(ρ^n)) + ω^n/2} + δ_{mc}(1 − ω^n/2).

As is the case with several existing experimental data sets, Hey’s experiment contains a small number of pairs mc in which S_{mc} first-order stochastically dominates T_{mc}. Letting Ω_fosd be the set of all such FOSD pairs, let δ_{mc} = 1 ∀ mc ∈ Ω_fosd (δ_{mc} = 0 otherwise). It is well-known that strong and strict utility do a poor job of predicting the rareness of violations of FOSD (Loomes and Sugden 1998). Equation (3.2) takes account of this, yielding P^n_{mc} = 1 − ω^n/2 ∀ mc ∈ Ω_fosd. The new parameter ω^n is a tremble probability. Some randomness of observed choice has been thought to arise from attention lapses or simple responding mistakes that are independent of pairs mc, and adding a tremble probability as above takes account of this. It also provides a convenient way to model the low probability of FOSD violations. Moffatt and Peters (2001) show that there are significantly positive tremble probabilities even in data from experiments where there are no FOSD pairs, such as Hey and Orme (1994). The specification in equation (3.2) follows Moffatt and Peters in assuming that “tremble events” occur with probability ω^n independent of mc and, in the event of a tremble, that choices of S_{mc} or T_{mc} from pair mc are equiprobable.
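The tremble-augmented specification is straightforward to compute. A minimal sketch (taking H to be the logistic c.d.f., with hypothetical values for ω, λ and the V-distance):

```python
import math

def choice_prob(delta_mc, omega, lam, dV):
    """Equation (3.2): P = (1-δ)[(1-ω)H(λΔV) + ω/2] + δ(1 - ω/2),
    where H is the logistic c.d.f. and δ = 1 flags an FOSD pair."""
    H = 1.0 / (1.0 + math.exp(-lam * dV))
    return (1 - delta_mc) * ((1 - omega) * H + omega / 2) + delta_mc * (1 - omega / 2)

# Hypothetical parameter values:
omega, lam = 0.04, 2.0

# On an FOSD pair (δ = 1) the model predicts 1 - ω/2 regardless of ΔV:
print(choice_prob(1, omega, lam, dV=-5.0))  # ≈ 0.98

# On a non-FOSD pair the tremble mixes H with a fair coin flip:
print(choice_prob(0, omega, lam, dV=0.0))   # ≈ 0.5 at indifference
```

The tremble also bounds non-FOSD choice probabilities away from 0 and 1, since even an arbitrarily large positive V-distance yields at most 1 − ω/2.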
Figure 5 plots maximum likelihood estimates ln(λ̂^n) and ρ̂^n from equation (3.2) for each of Hey’s (2001) fifty-three subjects, along with a robust regression line between them.⁸ It does appear to be a remarkably linear relationship, as suggested by the heteroscedastic form ln(λ^n) ≡ ln(λ) + αρ^n. The robust regression coefficient is α̂ = 4.24, with a 90% confidence interval [4.11, 4.38]. There are sixteen MPS pairs in Hey’s experimental design: Simple computational methods may be used to find the minimum value z > z̄_1 ≡ max_c {z_{1c}} = 50 for Hey’s MPS pairs such that any α > ln(z) allows MRA to imply SMRA for all those pairs.⁹ This minimum value is about z = 59.7, so we need α > ln(59.7) = 4.09 for a ≻_mra b ⇒ a ≻_smra b. As can be seen, the point estimate α̂ = 4.24 does indeed allow it, and the 90% confidence interval for α̂ just allows one to reject (at 5%) the directional null α ≤ 4.09 in favor of the alternative that a ≻_mra b ⇒ a ≻_smra b amongst Hey’s subjects for all MPS pairs in all of the experiment’s contexts. Results are similar in all essential ways if, instead, CRRA RDEU is estimated using Prelec’s (1998) one-parameter probability weighting function w(q) = exp(−[−ln(q)]^{γ^n}).

⁸ The robust regression technique used is due to Yohai (1987), and deals with outliers in both the dependent and independent variables. This is appropriate here, since both are estimates.

⁹ This is done by a simple grid search. Starting at α = ln(z̄_1) = ln(50), α is incremented in small steps until e^{αρ} ∆_b V_{mc}(ρ) is monotonically increasing in ρ (this is also checked by computational methods at each value of α) for all MPS pairs mc in Hey’s design.

There is, however, an obvious theoretical difficulty with this “agent heteroscedasticity” solution.
Suppose that immediately upon completion of Hey’s (2001) experiment, we had estimated the relationship in Figure 5 and then invited Hey’s fifty-three subjects back for another experimental session in which they made choices from lottery pairs on a new fifth context c* = (100,150,200). If we believe that the relationship shown in Figure 5 is “context-free,” that is, that this relationship between relative risk aversion and precision is a fixed feature of this population of subjects independent of any outcome context they might face, then a ≻_mra b ⇏ a ≻_smra b on the new context c*. This is clear because the estimated upper confidence limit on α̂, 4.38, implies a value of z = 79.8 < 100 = z_{1c*} (since e^{4.38} = 79.8). Therefore, as argued earlier, λ e^{4.38ρ} ∆_b V_{mc*}(ρ) will converge to zero for sufficiently large ρ and so cannot be monotonically increasing in ρ: α = 4.38 does not produce this monotonicity on the new context c*.

It should be clear that this problem is entirely general. Estimate or choose any finite value of α you wish, on whatever data you like: You can always specify a new context c* with z_{1c*} > e^α, and a ≻_mra b ⇏ a ≻_smra b on that new context, given that value of α. Heteroscedasticity with respect to agents is not, in the final analysis, a satisfactory way to make a ≻_mra b imply a ≻_smra b. Although the transformation u_sc(z | ρ) = e^{αρ} z^{1−ρ}/(1 − ρ) is “SMRA compatible” on a given collection of contexts for a suitable choice of α, it cannot be SMRA compatible on all contexts with arbitrarily large values of z_{1c}. Of course one could make α “context-dependent,” but this is tantamount to abandoning strong and strict utility, since there will then be heteroscedasticity with respect to contexts. If this is what we must do in order to make a ≻_mra b imply a ≻_smra b, it is better to approach it in a more basic theoretical way. This is what the contextual utility model does.
4. Contextual utility

Psychological motivation for contextual heteroscedasticity comes from signal detection and stimulus discrimination experiments. In this literature, stimulus categorization errors are known to increase with the subjective range taken by the stimulus or signal. For instance, Pollack (1953) and Hartman (1954) presented subjects with a number of tones equally spaced over some range of tones. The range of tones used was varied between subjects, but specific target tones were encountered by all subjects. The results show that confusion of target tones is more common when the range of tones encountered is wider. Such observations gave rise to models of stimulus categorization and discrimination error predicting that classification error variance increases with the subjective range of the stimulus (Parducci 1965, 1974; Holland 1968; Gravetter and Lockhead 1973). In fact, a rough proportionality between subjective stimulus range and the standard deviation of latent error seemed descriptive of much data from categorization experiments, though some of the formal models allowed for some deviation from this (e.g. Holland 1968 and Gravetter and Lockhead 1973). In categorization experiments, where stimuli are presented one at a time and subjects' task is to assign each stimulus to a category, the subjective range of the stimulus was usually taken to be determined (after a period of adaptation) by the whole range of stimuli presented over the course of the experiment, what one might call the "global context" of the stimulus. The contextual utility model borrows from this categorization literature the idea that the standard deviation of evaluative noise is proportional to the subjective range of stimuli.
However, being a model of choice from lottery pairs rather than a model of categorization of singly presented stimuli, it assumes that choice pairs create their own "local context," or idiosyncratic subjective stimulus range, in the form of the range of outcome utilities found in the pair. We may think of agents as perceiving lottery value on context c relative to the range of possible lottery values on context c. This subjective stimulus range is assumed to be proportional to the difference between the values of the best and worst possible lotteries on context c. Letting $V(z|\beta_n)$ be subject n's value of a degenerate lottery that pays z with certainty, the subjective stimulus range for subject n on any context c is assumed proportional to $V(z_{3c}|\beta_n) - V(z_{1c}|\beta_n)$. The contextual utility model for lottery choice pairs is the assumption that evaluative noise is proportional to this subjective stimulus range. For non-FOSD pairs on a three-outcome context, and ignoring trembles, contextual utility choice probabilities are:

(4.1)   $P^n_{mc} = H\!\left(\lambda_n \, \dfrac{V(S_{mc}|\beta_n) - V(T_{mc}|\beta_n)}{V(z_{3c}|\beta_n) - V(z_{1c}|\beta_n)}\right)$.

Under expected utility, this may be rewritten as

(4.2)   $P^n_{mc} = H\!\left(\lambda_n \, \dfrac{(s_{m1}-t_{m1})u^n(z_{1c}) + (s_{m2}-t_{m2})u^n(z_{2c}) + (s_{m3}-t_{m3})u^n(z_{3c})}{u^n(z_{3c}) - u^n(z_{1c})}\right)$.

Noting that $(s_{m1}-t_{m1}) = -(s_{m2}-t_{m2}) - (s_{m3}-t_{m3})$, equation (4.2) may be rewritten as

(4.3)   $P^n_{mc} = H\!\left(\lambda_n\left[\dfrac{(s_{m2}-t_{m2})[u^n(z_{2c})-u^n(z_{1c})]}{u^n(z_{3c})-u^n(z_{1c})} + (s_{m3}-t_{m3})\right]\right) = H\!\left(\lambda_n\left[(s_{m2}-t_{m2})\,u^n_{urc}(z_{2c}) + (s_{m3}-t_{m3})\right]\right)$,

where the transformation $\tau = urc$ is a unit range transformation on each context c, that is, an affine transformation making $u^n(z_{1c}) = 0$ and $u^n(z_{3c}) = 1$ in each context c for each subject n. That is, $u^n_{urc}(z) \equiv [u^n(z) - u^n(z_{1c})]/[u^n(z_{3c}) - u^n(z_{1c})]$.
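Forms (4.2) and (4.3) are algebraically identical, which can be checked directly. A minimal sketch, assuming a logistic H, a CRRA utility function, and a hypothetical MPS pair (none of these particular values come from the text):

```python
import math

def H(x):
    """Logistic c.d.f., one common choice for H."""
    return 1.0 / (1.0 + math.exp(-x))

def p_contextual(u, ctx, s, t, lam):
    """Contextual utility choice probability, form (4.2): the EU difference
    divided by the utility range u(z3c) - u(z1c) of the pair's context."""
    dv = sum((si - ti) * u(z) for si, ti, z in zip(s, t, ctx))
    return H(lam * dv / (u(ctx[2]) - u(ctx[0])))

def p_unit_range(u, ctx, s, t, lam):
    """Equivalent form (4.3) via the contextual unit range transformation."""
    u_urc = lambda z: (u(z) - u(ctx[0])) / (u(ctx[2]) - u(ctx[0]))
    return H(lam * ((s[1] - t[1]) * u_urc(ctx[1]) + (s[2] - t[2])))

# Hypothetical example: CRRA utility and an MPS pair on context (10, 20, 30).
u = lambda z: z ** 0.5 / 0.5          # CRRA with rho = 0.5
ctx, s, t = (10.0, 20.0, 30.0), (0.2, 0.6, 0.2), (0.4, 0.2, 0.4)
p1, p2 = p_contextual(u, ctx, s, t, 2.0), p_unit_range(u, ctx, s, t, 2.0)
```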
This is another interpretation of contextual utility: It employs a "contextual unit range transformation" of the utility function. Suppose that pair mc in equation (4.3) is an MPS pair. Pratt's (1964, p. 128) Theorem 1 quickly shows that the contextual unit range transformation guarantees that $a \succeq_{mra} b \Rightarrow a \succeq_{smra} b$ on every three-outcome context, as suggested by Figure 2. Here I quote the relevant parts of Pratt's theorem, with notation modified to fit mine:

Theorem 1: Let $r^n(z)$ [equal to $-u''(z)/u'(z)$ for utility function $u^n$] be the local risk aversion…corresponding to the utility function $u^n$, n = a, b. Then the following conditions are equivalent, in either the strong form (indicated in brackets), or the weak form (with the bracketed material omitted). (a) $r^a(z) \ge r^b(z)$ for all z [and > for at least one z in every interval]… (e) $\dfrac{u^a(z_{3c}) - u^a(z_{2c})}{u^a(z_{2c}) - u^a(z_{1c})} \le [<]\ \dfrac{u^b(z_{3c}) - u^b(z_{2c})}{u^b(z_{2c}) - u^b(z_{1c})}$ for all $z_{1c}, z_{2c}, z_{3c}$ with $z_{1c} < z_{2c} < z_{3c}$.

Adding unity to both sides of (e) in the form $[u^n(z_{2c}) - u^n(z_{1c})]/[u^n(z_{2c}) - u^n(z_{1c})]$, and then taking reciprocals of both sides, we get the following equivalence from the theorem:

(4.4)   $r^a(z) > r^b(z)\ \forall z \in [z_{1c}, z_{3c}] \;\Leftrightarrow\; \dfrac{u^a(z_{2c}) - u^a(z_{1c})}{u^a(z_{3c}) - u^a(z_{1c})} > \dfrac{u^b(z_{2c}) - u^b(z_{1c})}{u^b(z_{3c}) - u^b(z_{1c})}\ \forall z_{2c} \in (z_{1c}, z_{3c})$.

Since $u^n_{urc}(z_{2c}) = [u^n(z_{2c}) - u^n(z_{1c})]/[u^n(z_{3c}) - u^n(z_{1c})]$, equation (4.4) shows that

(4.5)   $r^a(z) > r^b(z)\ \forall z \in [z_{1c}, z_{3c}] \;\Leftrightarrow\; u^a_{urc}(z_{2c}) > u^b_{urc}(z_{2c})\ \forall z_{2c} \in (z_{1c}, z_{3c})$.

Equation (4.3) shows that the EU V-distance between $S_{mc}$ and $T_{mc}$ under the contextual unit range transformation is $(s_{m2} - t_{m2})u^n_{urc}(z_{2c}) + (s_{m3} - t_{m3})$, which is obviously increasing in $u^n_{urc}(z_{2c})$ for all MPS pairs since $(s_{m2} - t_{m2}) > 0$ in these pairs. Therefore, by equation (4.5), it is increasing in local risk aversion $r^n(z)$ as well.
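The equivalence (4.5) is easy to verify numerically for parametric utilities. A sketch assuming two hypothetical CRRA agents, where the agent with the larger coefficient has uniformly greater local risk aversion $r(z) = \rho/z$:

```python
def u_urc(u, z, z1, z3):
    """Contextual unit range transformation of utility u on context [z1, z3]."""
    return (u(z) - u(z1)) / (u(z3) - u(z1))

# Two hypothetical CRRA agents: a is globally more risk averse than b.
u_a = lambda z: z ** (1 - 0.8) / (1 - 0.8)   # r_a(z) = 0.8 / z
u_b = lambda z: z ** (1 - 0.3) / (1 - 0.3)   # r_b(z) = 0.3 / z

z1, z3 = 10.0, 30.0
interior = [z1 + 0.2 * k for k in range(1, 100)]   # interior outcomes z2c

# Pratt's equivalence (4.5): the more risk averse agent has uniformly higher
# unit-range utility at every interior outcome of the context.
more_concave_higher = all(u_urc(u_a, z, z1, z3) > u_urc(u_b, z, z1, z3)
                          for z in interior)
```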
Therefore, the contextual utility model implies what strong and strict utility cannot: that $a \succeq_{mra} b \Rightarrow a \succeq_{smra} b$ on all three-outcome contexts when λ is constant across agents. We should not necessarily expect homoscedasticity with respect to agents; it is quite possible that some agents are noisier decision makers than others, so that λ varies across agents. Another way to state the previous result, allowing for the possibility that precision varies across agents, is as follows (the proof is obvious and so omitted):

Proposition: Consider two agents such that $\lambda_a \ge \lambda_b$ and $r^a(z) \ge r^b(z)$ for all z [and > for at least one z on every context]. Then under the contextual utility model, $a \succeq_{smra} b$. Put differently, $a \succeq_{mra} b$ and $\lambda_a \ge \lambda_b \Rightarrow a \succeq_{smra} b$ under the contextual utility model.

Contextual utility disentangles variations in risk attitude from variations in precision because its form implies that precision does unique descriptive work. This is easily seen by noting that $u^n_{urc}(z_{2c}) \to 1$ as $r^n(z) \to \infty$. Therefore, as local risk aversion becomes infinite, equation (4.3) with agent-dependent precision $\lambda_n$ becomes

(4.6)   $P^n_{mc} = H(\lambda_n[s_{m2} + s_{m3} - t_{m2} - t_{m3}])$.

Since $s_{m2} + s_{m3} - t_{m2} - t_{m3}$ is finite, contextual utility does not imply that $P^n_{mc} \to 1$ as $r^n(z) \to \infty$ for all MPS pairs; it just implies that $P^n_{mc}$ is increasing in $r^n(z)$ for given $\lambda_n$. This may seem peculiar at first, but from an econometric viewpoint it implies that precision $\lambda_n$ is not simply a nuisance parameter, but rather plays a unique descriptive role in the model: Extreme choice probabilities, that is, those very close to zero or unity, necessarily require appropriate variations in $\lambda_n$ for their explanation (with typical choices of H such as the Standard Normal or Logistic); variation in $r^n(z)$ alone will not explain all extreme choice probabilities under contextual utility.
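The bounded limit in (4.6) can be seen numerically: the choice probability rises with risk aversion but stays below $H(\lambda_n[s_{m2}+s_{m3}-t_{m2}-t_{m3}])$. A sketch with a logistic H and a hypothetical CRRA agent and MPS pair (all particular values are assumptions for illustration):

```python
import math

H = lambda x: 1.0 / (1.0 + math.exp(-x))   # logistic choice of H

def p_contextual_crra(rho, ctx, s, t, lam):
    """Contextual utility MPS choice probability under CRRA EU, form (4.3)."""
    u = lambda z: z ** (1 - rho) / (1 - rho)
    u2 = (u(ctx[1]) - u(ctx[0])) / (u(ctx[2]) - u(ctx[0]))   # unit-range u(z2c)
    return H(lam * ((s[1] - t[1]) * u2 + (s[2] - t[2])))

ctx, s, t, lam = (10.0, 20.0, 30.0), (0.2, 0.6, 0.2), (0.4, 0.2, 0.4), 2.0
probs = [p_contextual_crra(r, ctx, s, t, lam) for r in (0.5, 2.0, 5.0, 10.0, 20.0)]

# P rises with risk aversion but is capped by the limit (4.6): here
# H(lam * (s2 + s3 - t2 - t3)) = H(2 * 0.2), which is well below one.
limit = H(lam * (s[1] + s[2] - t[1] - t[2]))
```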
When combined with either a constant absolute risk aversion (or CARA) utility function $u(z) = -e^{-rz}$ or with a CRRA utility function, contextual utility implies some theoretically attractive invariance properties of choice probabilities. Call $c + k = (z_{1c}+k, z_{2c}+k, z_{3c}+k)$ and $kc = (kz_{1c}, kz_{2c}, kz_{3c})$ additive and proportional shifts of context $c = (z_{1c}, z_{2c}, z_{3c})$, respectively. Since $u(z+k) = -e^{-r(z+k)} = -e^{-rk}e^{-rz}$ for CARA, the contextual unit range transformation implies that

(4.7)   $u^n_{ur,c+k}(z+k) \equiv \dfrac{-e^{-rk}e^{-rz} + e^{-rk}e^{-rz_{1c}}}{-e^{-rk}e^{-rz_{3c}} + e^{-rk}e^{-rz_{1c}}} \equiv \dfrac{-e^{-rz} + e^{-rz_{1c}}}{-e^{-rz_{3c}} + e^{-rz_{1c}}} \equiv u^n_{urc}(z)$.

Equation (4.3) then implies that $P^n_{m,c+k} \equiv P^n_{mc}$: That is, contextual utility and CARA EU together imply that choice probabilities are invariant to an additive context shift. Similarly, since $u(kz) = (kz)^{1-\rho} = k^{1-\rho}z^{1-\rho}$ for CRRA, we have

(4.8)   $u^n_{ur,kc}(kz) \equiv \dfrac{k^{1-\rho}z^{1-\rho} - k^{1-\rho}z_{1c}^{1-\rho}}{k^{1-\rho}z_{3c}^{1-\rho} - k^{1-\rho}z_{1c}^{1-\rho}} \equiv \dfrac{z^{1-\rho} - z_{1c}^{1-\rho}}{z_{3c}^{1-\rho} - z_{1c}^{1-\rho}} \equiv u^n_{urc}(z)$.

Equation (4.3) then implies that $P^n_{m,kc} \equiv P^n_{mc}$: That is, contextual utility and CRRA EU together imply that choice probabilities in pairs are invariant to a proportional context shift. These invariance properties of choice probabilities under contextual utility echo what is well-known about structural CARA and CRRA preferences, namely that their preference directions are invariant to additive and proportional context shifts, respectively. Strict utility shares these two shift invariance properties with contextual utility, but strong utility does not.¹⁰
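Both invariance properties can be confirmed numerically. A sketch with hypothetical CARA and CRRA coefficients and an illustrative MPS pair (the particular parameter values and shifts are assumptions, chosen only for the demonstration):

```python
import math

H = lambda x: 1.0 / (1.0 + math.exp(-x))   # logistic choice of H

def p_ctx(u, ctx, s, t, lam=2.0):
    """Contextual utility choice probability (4.1) under EU."""
    dv = sum((si - ti) * u(z) for si, ti, z in zip(s, t, ctx))
    return H(lam * dv / (u(ctx[2]) - u(ctx[0])))

s, t = (0.2, 0.6, 0.2), (0.4, 0.2, 0.4)   # hypothetical MPS probability vectors
ctx = (10.0, 20.0, 30.0)

# CARA: invariant to an additive shift of the context (equation 4.7).
u_cara = lambda z: -math.exp(-0.05 * z)
shifted = tuple(z + 15.0 for z in ctx)
cara_gap = abs(p_ctx(u_cara, ctx, s, t) - p_ctx(u_cara, shifted, s, t))

# CRRA: invariant to a proportional shift of the context (equation 4.8).
u_crra = lambda z: z ** (1 - 0.5)
scaled = tuple(3.0 * z for z in ctx)
crra_gap = abs(p_ctx(u_crra, ctx, s, t) - p_ctx(u_crra, scaled, s, t))
```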
5. An Empirical Comparison of Contextual Utility and its Competitors

So far, I have been concerned with the theoretical coherence of structural and stochastic definitions of the relation "more risk averse." The question naturally arises: Which stochastic model, when combined with EU or RDEU, actually explains and predicts binary choice under risk best? To answer this question, we need a suitable data set. Strong utility and contextual utility are simple reparameterizations of one another for any one context. Therefore, data from any experiment where subjects make choices on a single context, such as Loomes and Sugden's (1998) experiment, are not suitable. The experiment of Hey and Orme (1994), hereafter HO, is suitable, since all subjects make choices from pairs defined on four distinct contexts. The HO experiment has another desirable design feature that is unique among experiments with multiple contexts: The same pairs of probability distributions are used to construct the lottery pairs on all four of its contexts. Therefore, when models fail to explain choices across multiple contexts in the HO data, we cannot attribute this to other differences in lottery pairs across contexts: The failure must be due to a failure of the model to generalize across contexts. The HO experiment builds lotteries from four equally spaced non-negative outcomes including zero, in increments of 10 UK pounds. Letting "1" represent 10 UK pounds, the experiment uses the outcome vector (0,1,2,3).¹⁰ Exactly one-fourth of observed lottery choices in the HO data are choices from pairs on each of the four possible three-outcome contexts from this vector. In keeping with past notation, these four contexts are denoted by c ∈ {−0, −1, −2, −3}, where −0 = (1,2,3), −1 = (0,2,3), −2 = (0,1,3) and −3 = (0,1,2). HO estimate a variety of structures combined with strong utility, and estimate these models individually, that is, each structure is estimated separately for each subject. Additionally, for all structures that specify a utility function on outcomes u(z), HO take a nonparametric approach to the utility function. Letting $u^n_z \equiv u^n(z)$, HO set $u^n_0 = 0$ and λ = 1, and estimate $u^n_1$, $u^n_2$ and $u^n_3$ directly, allowing the utility function u(z) to take on arbitrary shapes across the outcome vector (0,1,2,3). (Given the latent variable form of strong utility models and the affine transformation properties of u(z), just three of the five parameters λ, $u^n_0$, $u^n_1$, $u^n_2$ and $u^n_3$ are identified.) HO found that estimated utility functions overwhelmingly fall into two classes: concave utility functions, and inflected utility functions that are concave over the context (0,1,2) but convex over the context (1,2,3). The latter class is quite common, accounting for thirty to forty percent of subjects (depending on the structure estimated). Because of this, I follow HO in avoiding a simple parametric functional form such as CARA or CRRA that forces concavity or convexity across the entire outcome vector (0,1,2,3), instead adopting their nonparametric treatment of utility functions in strong, strict and contextual utility models.

¹⁰ Contextual utility and strong utility EU models share a different property which might be called the "phantom common ratio effect" or PCRE, while strict utility and random preference (discussed below) EU models do not have this property. See Loomes (2005, pp. 303-305) for a lucid discussion of the manner in which strong utility EU produces the PCRE (the reasoning is identical for contextual utility) and why random preference EU does not. But because of strict utility's difference-of-logarithms form, common ratio changes drop out of that difference, so that strict utility produces no PCRE: Strict utility EU produces a choice probability invariance to common ratio changes that mimics random preference EU.
However, I will instead set $u^n_0 = 0$ and $u^n_1 = 1$, and estimate λ, $u^n_2$ and $u^n_3$. To account for any reliable heterogeneity of subject behavior, however, I part company with HO's individual estimation and use a random parameters approach for estimating and comparing models, as done by Loomes, Moffatt and Sugden (2002) and Moffatt (2005). Individual estimation does avoid distributional assumptions made by random parameters methods, and seems well-behaved at HO's sample sizes when we confine attention to "in-sample" comparisons of model fit, as HO did. However, Monte Carlo analysis shows that with finite samples of the HO size, comparisons of the out-of-sample predictive performance of models can be extremely misleading with individual estimation and prediction (Wilcox 2007). I want to address both the in-sample fit and out-of-sample predictive performance of models, so this issue matters here. This is why I choose a random parameters estimation approach. The HO experiment allowed subjects to express indifference between lotteries. HO model this with an added "threshold of discrimination" parameter within a strong utility model. An alternative parameter-free approach, and the one I take here, treats indifference in a manner suggested by decision theory, where the indifference relation $S_{mc} \sim_n T_{mc}$ is defined as the intersection of two weak preference relations, i.e. "$S_{mc} \succeq_n T_{mc} \cap T_{mc} \succeq_n S_{mc}$." This suggests treating indifference responses as two responses in the likelihood function, one of $S_{mc}$ being chosen from mc and another of $T_{mc}$ being chosen from mc, but dividing that total log likelihood by two since it is really based on just one independent observation. Formally, the definite choice of $S_{mc}$ adds $\ln(P_{mc})$ to the total log likelihood; the definite choice of $T_{mc}$ adds $\ln(1 - P_{mc})$ to that total; and indifference adds $[\ln(P_{mc}) + \ln(1 - P_{mc})]/2$ to that total. See also Papke and Wooldridge (1996) and Andersen et al.
(2007) for related justifications of this approach. So far I have confined my attention to strong, strict and contextual utility. However, another stochastic modeling approach, known as the random preference model, appears in both contemporary experimental and theoretical work on discrete choice under risk (Loomes and Sugden 1995, 1998; Carbone 1997; Loomes, Moffatt and Sugden 2002; Gul and Pesendorfer 2006). Econometrically, the random preference model views stochastic choice as arising from randomness of structural parameters. We think of each agent n as having an urn filled with structural parameter vectors $\beta_n$. Following Carbone (1997), for instance, we may set $u^n_0 = 0$ and $u^n_1 = 1$ and think of EU with random preferences as a situation where $\beta_n = (u^n_2, u^n_3)$, with $u^n_2 \ge 1$ and $u^n_3 \ge u^n_2$, and where each subject n has an urn filled with such utility vectors $(u^n_2, u^n_3)$. At each trial of any pair mc, the subject draws one of these utility vectors from her urn (with replacement) and uses it to calculate both $V(S_{mc}|\beta_n)$ and $V(T_{mc}|\beta_n)$ without error, choosing $S_{mc}$ iff $V(S_{mc}|\beta_n) \ge V(T_{mc}|\beta_n)$. Let $F_\beta(x|\alpha_n)$ be the joint c.d.f. of β in agent n's "random preference urn," conditioned on some vector $\alpha_n$ of parameters determining the shape of the distribution $F_\beta$. Then under random preferences,

(5.1)   $P^n_{mc} = \Pr\!\left(V(S_{mc}|\beta_n) - V(T_{mc}|\beta_n) \ge 0 \,\middle|\, F_\beta(x|\alpha_n)\right)$.

Put differently, if $B_{mc} = \{\beta \,|\, V(S_{mc}|\beta) - V(T_{mc}|\beta) \ge 0\}$, then $P^n_{mc} = \Pr(\beta_n \in B_{mc})$.¹¹ Loomes, Moffatt and Sugden (2002) show how to specify a random preferences RDEU model for estimation when lottery pairs are defined on a single context. However, econometric specification of a random preference RDEU model across multiple contexts is more difficult.
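Equation (5.1) can be approximated by simulating draws from the urn. A sketch under stated assumptions: the urn distribution below is a hypothetical uniform one, chosen only to satisfy $u^n_2 \ge 1$ and $u^n_3 \ge u^n_2$, not any estimated $F_\beta$, and the MPS pair is illustrative.

```python
import random

def rp_choice_prob(draw_beta, idx, s, t, n_draws=20000, seed=7):
    """Monte Carlo estimate of the random preference probability (5.1):
    the fraction of utility-vector draws under which EU ranks S over T.
    idx maps a context's three outcomes to positions in the utility vector."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_draws):
        u = draw_beta(rng)
        v_s = sum(p * u[i] for p, i in zip(s, idx))
        v_t = sum(p * u[i] for p, i in zip(t, idx))
        wins += v_s >= v_t
    return wins / n_draws

# Hypothetical urn with u0 = 0 and u1 = 1: u2 uniform on [1, 3], and
# u3 = u2 plus an independent uniform [0, 2] increment.
def draw_beta(rng):
    u2 = 1.0 + 2.0 * rng.random()
    return (0.0, 1.0, u2, u2 + 2.0 * rng.random())

# An MPS pair on HO context -3 = (0, 1, 2); here S beats T exactly when the
# drawn u2 <= 2, so the true choice probability is one half.
p_safe = rp_choice_prob(draw_beta, (0, 1, 2), (0.2, 0.6, 0.2), (0.4, 0.2, 0.4))
```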
Building both on Loomes, Moffatt and Sugden and on certain insights of Carbone (1997), Appendix A shows how this may be done for the three contexts −0, −2 and −3, but also shows why further extending random preferences to cover the other context −1 is not a transparent exercise for the RDEU structure. For this reason, I confine all of my estimations to choices in the HO data on the contexts −0, −2 and −3. It is simple to specify strong, strict or contextual utility models for both EU and RDEU for all four contexts; but then we could not compare them to random preferences. Appendix B gives details of the random parameters econometric specifications used for strong utility, strict utility, contextual utility and random preference versions of both EU and RDEU structures, for each of the HO contexts −0, −2 and −3. I perform two different kinds of comparisons of model fit. The first kind (very common in this literature) are "in-sample fit comparisons": Models are estimated on all three of the HO contexts (−0, −2 and −3), and the resulting log likelihoods for each model across all three contexts are compared. The second kind of comparison, which is rare in this literature but well-known generally, compares the "out-of-sample" fit of models, that is, their predictive performance. For these comparisons, models are estimated on just the two HO contexts −2 and −3, that is, contexts (0,1,3) and (0,1,2), and these estimates are used to predict choice probabilities and calculate log likelihoods of observed choices on HO context −0, that is, context (1,2,3).

¹¹ For simplicity's sake, assume that parameter vectors producing indifference in any pair mc have zero measure for all n, so that the sets $B_{mc}$ and $B'_{mc} = \{\beta \,|\, V(S_{mc}|\beta) - V(T_{mc}|\beta) > 0\}$ have equal measure. See Gul and Pesendorfer (2006) for a formal discussion of this assumption.
This is something more than a simple out-of-sample prediction, which could simply be a prediction to new choices made on the same contexts used for estimation: It is additionally an "out-of-context" prediction. This particular kind of out-of-sample fit comparison may be quite difficult in the HO data. Relatively safe choices are the norm in contexts −2 and −3 of the HO data: The mean proportion of safe choices made by HO subjects in these contexts is 0.764, and at the individual level this proportion exceeds ½ for seventy of the eighty subjects. But relatively risky choices are the norm in context −0 of the HO data: The mean proportion of safe choices there is just 0.379, and falls short of ½ for fifty-eight of the eighty subjects. Recall that we cannot attribute this to differences in the probability vectors that make up the lottery pairs in the different HO contexts, since the HO experiment holds these constant across contexts. This out-of-sample prediction task is going to be difficult: From largely safe choices in the "estimation contexts" −2 and −3, the models need to predict largely risky choices in the "prediction context" −0. Table 1 displays both the in-sample and out-of-sample log likelihoods for the eight models. The top four rows are EU models, and the bottom four rows are RDEU models; for each structure, the four rows show results for strong utility, strict utility, contextual utility and random preferences. The first column shows total in-sample log likelihoods, and the second column shows total out-of-sample log likelihoods. Contextual utility always produces the highest log likelihood, whether it is combined with EU or RDEU, and whether we look at in-sample or out-of-sample log likelihoods (though the log likelihood advantage of contextual utility is most pronounced in the out-of-sample comparisons).
Buschena and Zilberman (2000) and Loomes, Moffatt and Sugden (2002) point out that the best-fitting stochastic model may depend on the structure estimated, and offer empirical illustrations of this point. As a general econometric point, I heartily agree with them. Yet in Table 1 contextual utility is the best stochastic model regardless of whether we view the matter from the perspective of the EU or RDEU structures, or from the perspective of in-context or out-of-context fit. It is fair to say that in the last quarter century, decision theory has been relatively dominated by structural theory innovation rather than stochastic model innovation. Table 1 has something to say about this. Let us first examine the in-sample fit column. Holding stochastic models constant, the maximum improvement in log likelihood associated with moving from EU to RDEU is 142.02 (with strict utility), and the improvement is 106.64 for the best-fitting stochastic model (contextual utility). Holding structures constant instead, the maximum improvement in log likelihood associated with changing the stochastic model is 151.48 (with the EU structure, switching from strict to contextual utility), but this is atypical: Omitting strict utility models, which have an unusually poor in-sample fit, the maximum improvement is 51.28 (with the EU structure, switching from strong to contextual utility) and is usually no more than half that. Therefore, except for the especially poor strict utility fits, in-sample comparisons make choices of stochastic models appear to be a sideshow relative to choices of structure. This appearance is reversed when we look at out-of-sample comparisons, that is, predictive power. Looking now at the out-of-sample fit column, notice first that under strict utility, RDEU actually fits worse than EU does. We should be getting the impression by now, however, that strict utility is an unusually poor performer, so let us henceforth omit it from consideration.
Among the remaining three stochastic models, the maximum out-of-sample fit improvement associated with switching from EU to RDEU is 21.19 (for contextual utility). Holding structures constant instead, the maximum out-of-sample fit difference between the stochastic models (again omitting strict utility) is 113.39 (for the RDEU structure, switching from strong to contextual utility). In the realm of out-of-sample prediction, then, structure plays Rosencrantz and Guildenstern to the stochastic model Hamlet. There will be honest differences of opinion as to whether decision theory should preoccupy itself with explanation or prediction, but it seems that those who have more interest in the latter may want to think more about stochastic models, as repeatedly urged by Hey (Hey and Orme 1994; Hey 2001; Hey 2005). Table 2 reports the results of a more formal comparison between the stochastic models, conditional on each structure. Let $\tilde{D}_n$ denote the difference between the estimated log likelihoods (in-sample or out-of-sample) from a pair of models, for subject n. Vuong (1989) provides an asymptotic justification for treating a z-score based on the $\tilde{D}_n$ as following a normal distribution under the hypothesis that two non-nested models are equally good. The statistic is computed as $z = \sum_{n=1}^{N} \tilde{D}_n / (\tilde{s}_D \sqrt{N})$, where $\tilde{s}_D$ is the sample standard deviation of the $\tilde{D}_n$ across subjects n (calculated without the usual adjustment for a degree of freedom) and N is the number of subjects. Table 2 reports these z-statistics, and associated p-values against the null of equally good fit, with a one-tailed alternative that the directionally better fit is significantly better. While contextual utility is always directionally better than its competitors, no convincingly significant ordering of the stochastic models emerges from the in-sample comparisons shown in the left half of Table 2, though strict utility is clearly significantly worse than the other three stochastic models.
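The z statistic just described can be computed as follows. A minimal sketch: the per-subject log likelihoods below are hypothetical placeholders for illustration only, not values from Table 2.

```python
import math

def vuong_z(ll_model_a, ll_model_b):
    """Vuong-style z statistic as described in the text: per-subject
    log-likelihood differences are summed and scaled by their sample
    standard deviation (no degrees-of-freedom adjustment) times sqrt(N)."""
    d = [a - b for a, b in zip(ll_model_a, ll_model_b)]
    n = len(d)
    mean = sum(d) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in d) / n)   # ddof = 0
    return sum(d) / (sd * math.sqrt(n))

# Hypothetical per-subject log likelihoods for two competing stochastic models.
ll_a = [-40.2, -38.5, -41.0, -39.9, -37.8]
ll_b = [-41.0, -39.1, -41.5, -40.3, -38.9]
z = vuong_z(ll_a, ll_b)   # a positive value favors model a
```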
Contextual utility shines, though, in the out-of-sample fit comparisons in the right half of Table 2, regardless of whether the structure is EU or RDEU, where it beats the other three stochastic models with strong significance.

6. Conclusions

A great deal of the scholarly conversation about decision under risk has been about structure, but there has been resurgent interest in the stochastic part of decision under risk. This has been driven both by theoretical questions and by empirical findings. Theoretically, some or all of what passes for "an anomaly" relative to some structure (usually EU) can sometimes be attributed to stochastic models rather than to the structure in question. This is an old point, stretching back at least to Becker, DeGroot and Marschak's (1963a, 1963b) observation that violations of betweenness are precluded by some stochastic versions of EU (random preferences) but predicted by other stochastic versions of EU (strong utility). But this general concern has been resurrected by many writers; Loomes (2005), Gul and Pesendorfer (2006) and Blavatskyy (2007) are just three relatively recent (but very different) examples. Empirically, we know from a (now quite large) body of experimental evidence that the stochastic part of decision under risk is surprisingly large, in the sense that retest reliabilities show that choice probabilities are quite distant from zero or one. Additionally, a "new structural econometrics" has emerged over the last decade in which structures like EU, RDEU and cumulative prospect theory are married to some stochastic model for the purpose of estimating structural parameters. My study is not inspired by any worry that choice anomalies might be attributable to stochastic reasons, though I appreciate this point. Rather, I ask what the empirical meaning of estimated structural parameters is in the new structural econometrics of risky choice.
I find that structural parameters meant to represent the degree of risk aversion in Pratt's (1964) structural sense of "more risk averse" cannot imply a theoretically attractive definition of the relation "stochastically more risk averse" across all choice contexts, when estimation uses strong or strict utility models. I conclude that strong and strict utility are deeply flawed stochastic models for the econometrics of discrete choice under risk. The contextual utility model eliminates this flaw without introducing extra parameters. Empirically, at least in the Hey and Orme (1994) data, its in-sample performance is as good as that of strong utility, strict utility and random preferences; and its out-of-sample predictive performance is significantly the best of this particular collection of stochastic models. As a closing thought, contextual utility has novel implications for strength of incentives in experimental designs. It suggests that the marginal utility improvement from taking risky action X rather than risky action Y is only a part of what governs the strength of perceived incentives: The other important part is the subjective range of available utility levels. This implies that, in general, scaling up payoffs in experiments may be a relatively ineffective means of achieving payoff dominance. If instead we can change a design so as to shrink the subjective range of available utilities, holding marginal utility improvements constant, this may be a more effective way of improving subjective payoff dominance in a design.

References

Aitchison, J. (1963). Inverse Distributions and Independent Gamma-Distributed Products of Random Variables. Biometrika 50, 505-508.

Andersen, S., Harrison, G. W., Lau, M. I., and Rutstrom, E. E. (2007). Eliciting Risk and Time Preferences. Working paper 5-24, University of Central Florida Department of Economics.

Becker, G. M., DeGroot, M. H., and Marschak, J. (1963a). Stochastic Models of Choice Behavior.
Behavioral Science 8, 41-55.

Becker, G. M., DeGroot, M. H., and Marschak, J. (1963b). An Experimental Study of Some Stochastic Models for Wagers. Behavioral Science 8, 199-202.

Blavatskyy, P. R. (2007). Stochastic Choice Under Risk. Journal of Risk and Uncertainty (forthcoming).

Block, H. D. and Marschak, J. (1960). Random Orderings and Stochastic Theories of Responses. In I. Olkin et al. (Eds.), Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling (pp. 97-132). Stanford, CA: Stanford University Press.

Buschena, D. E. and Zilberman, D. (2000). Generalized Expected Utility, Heteroscedastic Error, and Path Dependence in Risky Choice. Journal of Risk and Uncertainty 20, 67-88.

Camerer, C. (1989). An Experimental Test of Several Generalized Expected Utility Theories. Journal of Risk and Uncertainty 2, 61-104.

Carbone, E. (1997). Investigation of Stochastic Preference Theory Using Experimental Data. Economics Letters 57, 305-311.

Chew, S. H., Karni, E. and Safra, Z. (1987). Risk Aversion in the Theory of Expected Utility with Rank-Dependent Preferences. Journal of Economic Theory 42, 370-381.

Cox, J. C. and Sadiraj, V. (2006). Small- and Large-Stakes Risk Aversion: Implications of Concavity Calibration for Decision Theory. Games and Economic Behavior 56, 45-60.

Debreu, G. (1958). Stochastic Choice and Cardinal Utility. Econometrica 26, 440-444.

Edwards, W. (1954). A Theory of Decision Making. Psychological Bulletin 51, 380-417.

Gravetter, F. and Lockhead, G. R. (1973). Criterial Range as a Frame of Reference for Stimulus Judgment. Psychological Review 80, 203-216.

Gul, F. and Pesendorfer, W. (2006). Random Expected Utility. Econometrica 74, 121-146.

Harrison, G. W. and Rutström, E. E. (2005). Expected Utility Theory and Prospect Theory: One Wedding and a Decent Funeral. Working paper 05-18, Department of Economics, College of Business Administration, University of Central Florida.

Hartman, E. B. (1954).
The Influence of Practice and Pitch-Distance between Tones on the Absolute Identification of Pitch. American Journal of Psychology 67, 1-14.

Hey, J. D. (1995). Experimental Investigations of Errors in Decision Making Under Risk. European Economic Review 39, 633-640.

Hey, J. D. (2001). Does Repetition Improve Consistency? Experimental Economics 4, 5-54.

Hey, J. D. and Carbone, E. (1995). Stochastic Choice with Deterministic Preferences: An Experimental Investigation. Economics Letters 47, 161-167.

Hey, J. D. and Orme, C. (1994). Investigating Parsimonious Generalizations of Expected Utility Theory Using Experimental Data. Econometrica 62, 1291-1329.

Hutchinson, T. P. and Lai, C. D. (1990). Continuous Bivariate Distributions, Emphasizing Applications. Adelaide, Australia: Rumsby Scientific Publishers.

Judd, K. L. (1998). Numerical Methods in Economics. Cambridge, MA: MIT Press.

Loomes, G. (2005). Modeling the Stochastic Component of Behaviour in Experiments: Some Issues for the Interpretation of Data. Experimental Economics 8, 301-323.

Loomes, G. and Sugden, R. (1995). Incorporating a Stochastic Element into Decision Theories. European Economic Review 39, 641-648.

Loomes, G. and Sugden, R. (1998). Testing Different Stochastic Specifications of Risky Choice. Economica 65, 581-598.

Loomes, G., Moffatt, P. and Sugden, R. (2002). A Microeconometric Test of Alternative Stochastic Theories of Risky Choice. Journal of Risk and Uncertainty 24, 103-130.

Luce, R. D. and Suppes, P. (1965). Preference, Utility and Subjective Probability. In R. D. Luce, R. R. Bush and E. Galanter (Eds.), Handbook of Mathematical Psychology, Vol. III (pp. 249-410). New York: Wiley.

Machina, M. (1985). Stochastic Choice Functions Generated from Deterministic Preferences over Lotteries. Economic Journal 95, 575-594.

McKay, A. T. (1934). Sampling from Batches. Supplement to the Journal of the Royal Statistical Society 1, 207-216.

Moffatt, P. (2005).
Stochastic Choice and the Allocation of Cognitive Effort. Experimental Economics 8, 369-388. Moffatt, P. and Peters, S. (2001). Testing for the Presence of a Tremble in Economics Experiments. Experimental Economics 4, 221-228. Papke, L. E., and Wooldridge, J. M. (1996). Econometric Methods for Fractional Response Variables with an Application to 401(K) Plan Participation Rates. Journal of Applied Econometrics 11, 619-632. Parducci, A. (1965). Category Judgment: A range-frequency model. Psychological Review 72, 407-418. Parducci, A. (1974). Contextual Effects: A Range-Frequency Analysis. In E. C. Carterette & M. P. Friedman, eds., Handbook of Perception (Vol 2). New York: Academic Press. Pollack, I. (1953). The Information of Elementary Auditory Displays, II. Journal of the Accoustical Society of America 25, 765-769. Pratt, J. W. (1964). Risk Aversion in the Small and in the Large. Econometrica 32, 122-136. Prelec, D. (1998). The Probability Weighting Function. Econometrica 66, 497-527. Rothschild, M. and Stiglitz, J. E. (1970). Increasing Risk I: A Definition. Journal of Economic Theory 2, 225-243. Starmer, C. and Sugden, R. (1989). Probability and Juxtaposition Effects: An Experimental Investigation of the Common Ratio Effect. Journal of Risk and Uncertainty 2, 159-78. Vuong, Q. (1989). Likelihood Ratio Tests for Model Selection and Non-Nested Hypotheses. Econometrica 57, 307–333. Wilcox, N. (2007). Predicting risky choices out-of-context: A critical stochastic modeling primer and Monte Carlo study. University of Houston Department of Economics Working Paper, available at http://www.class.uh.edu/econ/faculty/nwilcox/papers/predicting_risky_choices_may2007.pdf Wooldridge, J. M. (2002). Econometric Analysis of Cross Section and Panel Data. Cambridge MA: MIT Press. 
Appendix A: Random Preference RDEU Across Multiple Contexts

Since EU is a special case of RDEU, I begin with the harder case of a random preferences specification for RDEU; it will imply a specification for EU. The specification generalizes that of Loomes, Moffatt and Sugden (2002) for a single context, and builds on insights of Carbone (1997) concerning random preference EU models across multiple contexts. Like Loomes, Moffatt and Sugden, I simplify the problem by assuming that any parameters of the probability transformation function w(q) of RDEU are nonstochastic, that is, that the only structural parameters that vary in a subject's "random preference urn" are her outcome utilities.

Rank-dependent expected utility or RDEU, with Prelec's (1998) one-parameter form for the probability weighting function, that is w(q | γ^n) = exp(−[−ln(q)]^{γ^n}) ∀ q ∈ (0,1), w(0 | γ^n) = 0 and w(1 | γ^n) = 1, is the structure

(A.1) V(S_mc | u^n(z), γ^n) = Σ_{i=1}^{3} [w(Σ_{j≥i} s_mj | γ^n) − w(Σ_{j>i} s_mj | γ^n)] u^n(z_ic).

Substituting this into equation (5.1), we have

(A.2) P_mc^n = Pr( Σ_{i=1}^{3} W_mi(γ^n) u^n(z_ic) ≥ 0 | F_u(x | α^n) ), where

W_mi(γ^n) = w(Σ_{j≥i} s_mj | γ^n) − w(Σ_{j>i} s_mj | γ^n) − w(Σ_{j≥i} t_mj | γ^n) + w(Σ_{j>i} t_mj | γ^n).

Since W_m1(γ^n) ≡ −W_m2(γ^n) − W_m3(γ^n), and assuming strict monotonicity of utility in outcomes so that we may divide the inequality in (A.2) through by u^n(z_2c) − u^n(z_1c), equation (5.3) becomes

(A.3) P_mc^n = Pr( W_m2(γ^n) + W_m3(γ^n) u_muc^n(z_3c) ≥ 0 | F_u(x | α^n) ), where

u_muc^n(z_3c) ≡ [u^n(z_3c) − u^n(z_1c)] / [u^n(z_2c) − u^n(z_1c)],
W_m2(γ^n) = w(s_m2 + s_m3 | γ^n) − w(s_m3 | γ^n) − [w(t_m2 + t_m3 | γ^n) − w(t_m3 | γ^n)], and
W_m3(γ^n) = w(s_m3 | γ^n) − w(t_m3 | γ^n).

Notice that u_muc^n(z_3c) > 1 is a "contextual minimum unit transformation" of the utility of the maximum outcome z_3c in context c.
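As a concrete check on (A.1), here is a minimal Python sketch of Prelec's weighting function and the resulting RDEU value of a three-outcome lottery. The function names are mine; at γ = 1 the weighting function is the identity, so the structure collapses to expected utility, and the decision weights always sum to one by telescoping.

```python
import math

def prelec_w(q, gamma):
    """Prelec (1998) one-parameter probability weighting function."""
    if q <= 0.0:
        return 0.0
    if q >= 1.0:
        return 1.0
    return math.exp(-((-math.log(q)) ** gamma))

def rdeu_value(probs, utils, gamma):
    """RDEU value of a lottery as in equation (A.1): the decision weight
    of outcome i is w(sum_{j>=i} p_j) - w(sum_{j>i} p_j), with outcomes
    indexed from worst to best."""
    value = 0.0
    for i in range(len(probs)):
        upper = sum(probs[i:])       # sum_{j>=i} p_j
        strict = sum(probs[i + 1:])  # sum_{j>i} p_j
        weight = prelec_w(upper, gamma) - prelec_w(strict, gamma)
        value += weight * utils[i]
    return value
```

With γ = 1, `rdeu_value((0.5, 0, 0.5), (0, 1, 2), 1.0)` reduces to the lottery's expected utility of 1.0.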
Let v_c^n ≡ u_muc^n(z_3c) − 1; according to equation (A.3) and the random preference model, v_c^n ∈ R+ is an underlying random variable containing all choice-relevant stochastic information about the agent's random utility function u^n(z) for choices on context c. Let G_vc(x | α_c^n) be the c.d.f. of v_c^n. Adopt the convention that lottery names in pairs are always chosen so that t_m3 − s_m3 > 0,12 as is true under the previous definition of MPS pairs, so that −W_m3(γ^n) = w(t_m3 | γ^n) − w(s_m3 | γ^n) > 0 for all pairs. We can then rewrite (A.3) to make the change of random variables described above explicit:

(A.4) P_mc^n = Pr( v_c^n ≤ −[W_m2(γ^n) + W_m3(γ^n)] / W_m3(γ^n) | G_vc(x | α_c^n) ), or

P_mc^n = G_vc( [w(s_m2 + s_m3 | γ^n) − w(t_m2 + t_m3 | γ^n)] / [w(t_m3 | γ^n) − w(s_m3 | γ^n)] | α_c^n ).

With equation (A.4), we have arrived where Loomes, Moffatt and Sugden (2002) left things. Choosing some c.d.f. on R+ for G_vc, such as the Lognormal or Gamma distribution, construction of a likelihood function from (A.4) and choice data is straightforward for one context, and this is the kind of experimental data Loomes, Moffatt and Sugden had. But when contexts vary in a data set, the method quickly becomes intractable except for special cases. I now work out one such special case.

Consider an experiment such as Hey and Orme (1994) that uses pairs on the four three-outcome contexts generated by four equally spaced monetary outcomes including zero, and let (0,1,2,3) denote these outcomes.

12 If t_m3 − s_m3 = 0, either T_mc and S_mc are identical, or mc is an FOSD pair. In this latter case, random preferences imply a zero probability of choosing the dominated lottery, or with a small tremble probability, an ω^n/2 probability of choosing the dominated lottery, which is how Loomes, Moffatt and Sugden (2002) and I would model that probability.

The random utility vector for the four outcomes can be
expressed in terms of just two random utilities by an appropriate transformation; the minimum unit transformation gives the vector as (0, 1, u_2^n, u_3^n), where u_2^n > 1 and u_3^n > u_2^n. Let g_1^n ≡ u_2^n − 1 ∈ R+ and g_2^n ≡ u_3^n − u_2^n ∈ R+ be two underlying random variables generating the two random utilities as u_2^n = 1 + g_1^n and u_3^n = 1 + g_1^n + g_2^n. We immediately have v_{−3}^n = g_1^n on context c = −3 (that is, the context (0,1,2)), and v_{−2}^n = g_1^n + g_2^n on context c = −2 (that is, the context (0,1,3)). Applying the contextual minimum unit transformation on context c = −0 (that is, the context (1,2,3)), u_{mu,−0}^n(3) = (u_3^n − 1)/(u_2^n − 1) = 1 + (u_3^n − u_2^n)/(u_2^n − 1) = 1 + g_2^n/g_1^n. Therefore, we have v_{−0}^n = g_2^n/g_1^n on context −0.

With the random variables v_{−0}^n, v_{−2}^n and v_{−3}^n expressed in terms of the two underlying random variables g_1^n and g_2^n, we need a joint distribution of g_1^n and g_2^n such that g_1^n, g_1^n + g_2^n and g_2^n/g_1^n all have tractable parametric distributions. The only choice I am aware of is two independent gamma variates, each with the gamma distribution's c.d.f. G(x | φ, κ), with identical "scale" parameter κ^n but possibly different "shape" parameters φ_1^n and φ_2^n. Under this choice, v_{−3}^n = g_1^n will have c.d.f. G(x | φ_1^n, κ^n), v_{−2}^n = g_1^n + g_2^n will have c.d.f. G(x | φ_1^n + φ_2^n, κ^n), and v_{−0}^n = g_2^n/g_1^n will have the c.d.f. B′(x | φ_2^n, φ_1^n) of the "beta-prime" distribution on R+, also called a "beta distribution of the second kind" (Aitchison 1963).13 These assumptions imply a joint distribution of u_2^n − 1 and u_3^n − 1 known as "McKay's bivariate gamma distribution," with correlation coefficient √(φ_1^n/(φ_1^n + φ_2^n)) between u_2^n and u_3^n (McKay 1934; Hutchinson and Lai 1990).

13 The ratio relationship here is a generalization of the well-known fact that the ratio of independent chi-square variates, each divided by its degrees of freedom, follows an F distribution. Chi-square variates are gamma variates with common scale parameter κ = 2.
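The distributional claims above are easy to verify by simulation. A small Monte Carlo sketch (the shape and scale values are illustrative, not estimates): with independent g_1 ~ Gamma(φ_1, κ) and g_2 ~ Gamma(φ_2, κ), the sum g_1 + g_2 behaves as a Gamma(φ_1 + φ_2, κ) variate with mean (φ_1 + φ_2)κ, and the ratio g_2/g_1 as a beta-prime(φ_2, φ_1) variate with mean φ_2/(φ_1 − 1) for φ_1 > 1:

```python
import numpy as np

# Independent gammas with a common scale, as in the text.
# phi1, phi2, kappa are illustrative values, not estimates from the paper.
rng = np.random.default_rng(0)
phi1, phi2, kappa = 3.0, 2.0, 1.5
n = 400_000
g1 = rng.gamma(phi1, kappa, n)
g2 = rng.gamma(phi2, kappa, n)

mean_sum = (g1 + g2).mean()    # Gamma(phi1 + phi2, kappa) mean: 5 * 1.5 = 7.5
mean_ratio = (g2 / g1).mean()  # beta-prime(phi2, phi1) mean: 2/(3 - 1) = 1.0
# Correlation of u2 = 1 + g1 and u3 = 1 + g1 + g2: sqrt(phi1/(phi1 + phi2))
corr = np.corrcoef(1 + g1, 1 + g1 + g2)[0, 1]
```

The simulated correlation lands near √(3/5) ≈ 0.775, the McKay bivariate gamma correlation for these shapes.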
In fact, a beta-prime variate can be transformed into an F variate: If x is a beta-prime variate with parameters a and b, then bx/a is an F variate with degrees of freedom 2a and 2b. This is convenient because almost all statistics software packages contain thorough call routines for F variates, but not necessarily any call routines for beta-prime variates.

An acquaintance with the literature on estimation of random utility models may make these assumptions seem very special and unnecessary. They are very special, but this is because theories of risk preferences over money outcomes are very special relative to the kinds of preferences that typically get treated in that literature. Consider the classic example of transportation choice well known from Domencich and McFadden (1975). Certainly we expect the value of time and money to be correlated across the population of commuters. But for a single commuter making a specific choice between car and bus on a specific morning, we do not require a specific relationship between the disutility of commuting time and the marginal utility of income she happens to "draw" from her random utility urn on that particular morning. This gives us fairly wide latitude when we choose a distribution for the unobserved parts of her utilities of various commuting alternatives. This is definitely not true of any specific trial of her choice from a lottery pair mc. The spirit of random preferences is that every preference ordering drawn from the urn obeys all properties of the preference structure (Loomes and Sugden 1995). We demand, for instance, that she "draw" a vector of outcome utilities that respects monotonicity in z; this implies that the joint distribution of u_2^n and u_3^n must have the property that u_3^n ≥ u_2^n ≥ 1. Moreover, the assumptions we make about the v_c^n must be probabilistically consistent across pair contexts.
Choosing a joint distribution of u_2^n and u_3^n immediately implies exact commitments regarding the distribution of any and all functions of u_2^n and u_3^n. The issue does not arise in a data set where subjects make choices from pairs on just one context, as in Loomes, Moffatt and Sugden (2002): In this simplest of cases, any distribution of v_c^n on R+, including the Lognormal choice they make, is a wholly legitimate hypothesis. But as soon as each subject makes choices from pairs on several different overlapping contexts, staying true to the demands of random preferences is much more exacting. Unless we can specify a joint distribution of g_1^n and g_2^n that implies it, we are not entitled (for instance) to assume that v_c^n follows Lognormal distributions in all of three overlapping contexts for a single subject.14 Our choice of a joint distribution for v_{−2}^n and v_{−3}^n has exact and inescapable implications for the distribution of v_{−0}^n. Carbone (1997) correctly saw this in her random preference treatment of the EU structure. Under these circumstances, a cagey choice of the joint distribution of g_1^n and g_2^n is necessary.

The random preference model can be quite limiting in practical applications. For instance, notice that I have not specified a distribution of v_{−1}^n and hence have no method for applying equation (5.5) to pairs on the context (0,2,3). Fully a quarter of the data from experiments such as Hey and Orme (1994) are on that context. On the context −1 = (0,2,3), v_{−1}^n = g_2^n/(1 + g_1^n). Is there a choice of the joint distribution of g_1^n and g_2^n that gives us convenient parametric distributions of g_1^n, g_1^n + g_2^n, g_2^n/g_1^n and g_2^n/(1 + g_1^n), so that those four distributions collectively depend on just three parameters? I can find no answer to this question.
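On a single context, equation (A.4) is nonetheless simple to evaluate. Under the EU special case (γ^n = 1, so w(q) = q), the threshold in (A.4) reduces to (s_m2 + s_m3 − t_m2 − t_m3)/(t_m3 − s_m3). A minimal sketch, assuming v_c follows a Gamma distribution with integer shape so that its c.d.f. has a closed (Erlang) form; the pair and the distribution parameters are illustrative, not estimates:

```python
import math

def erlang_cdf(x, k, scale):
    """C.d.f. of a Gamma(k, scale) variate with integer shape k."""
    if x <= 0.0:
        return 0.0
    lam = x / scale
    return 1.0 - math.exp(-lam) * sum(lam ** i / math.factorial(i) for i in range(k))

def rp_eu_choice_prob(s, t, shape, scale):
    """P(choose S over T) from equation (A.4) under EU (w(q) = q),
    with v_c ~ Gamma(shape, scale) on this context. s and t are the
    probability vectors over the ordered outcomes (z1, z2, z3)."""
    threshold = (s[1] + s[2] - t[1] - t[2]) / (t[2] - s[2])
    return erlang_cdf(threshold, shape, scale)

# MPS pair on context (0, 1, 2): S is the sure middle outcome,
# T a mean-preserving 50/50 gamble over the extremes.
p = rp_eu_choice_prob((0.0, 1.0, 0.0), (0.5, 0.0, 0.5), 2, 0.5)
```

Here the threshold equals 1, and p = G(1 | 2, 0.5) ≈ 0.594: most utility draws from this urn are risk averse enough to take the safe lottery.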
This is why I limit myself here to the three-fourths of these data sets that are choices from the contexts (0,1,2), (0,1,3) and (1,2,3): These are the contexts to which the "independent gamma model" of random preferences developed above can be applied. There are no similar practical modeling constraints on RDEU strict, strong or contextual utility models (a considerable practical point in their favor); these models are easily applied to choices on any context.

14 Although ratios of lognormal variates are lognormal, there is no similarly simple parametric family for sums of lognormal variates. The independent gammas with common scale are the only workable choice I am aware of.

Appendix B: Estimation Procedures, Specifications of Random Parameters Distributions, and Specifications of Underlying Choice Probabilities

The Hey and Orme (1994) or HO experiment contains no FOSD pairs; therefore, we need no special specification to account for low rates of violation of FOSD. However, Moffatt and Peters (2001) found evidence of nonzero tremble probabilities using the HO data, so I include a tremble probability in all models. Working with Hey's (2001) still larger data set, which contains 125 observations on each of four contexts, I estimated subject-specific trembles ω^n separately on all four contexts, and found no significant correlation of these subject-specific estimates of ω^n across contexts. This suggests that there is little or no reliable between-subjects variance in tremble probabilities (that is, that ω^n = ω for all n), and I will henceforth assume that this is true of the population in all cases. Under these assumptions, likelihood functions are in all instances built from "full" probabilities P_mc^fn that contain this constant tremble, that is, P_mc^fn = (1 − ω)P_mc^n + ω/2. The discussion here concentrates wholly on the specifications of P_mc^n and their distribution across subjects. Let H denote a particular pairing of a structure and a stochastic model.
For instance, H = (EU,Strong) is expected utility with the strong utility stochastic model. Supposing that H is the true data-generating process or DGP for all subjects in the sample, let ψ_H^n = (β_H^n, α_H^n) denote subject n's true parameter vector governing her choices from pairs. Here, β_H^n is the structural parameter vector: It contains utilities of outcomes u_2^n and u_3^n whenever H is a strong, strict or contextual utility model, and also the weighting function parameter γ^n whenever H is an RDEU structure. By contrast, α_H^n is the stochastic parameter vector that governs the shape and/or variance of distributions determining choice probabilities: This is λ^n in strict, strong or contextual utility models, and is the vector (φ_1^n, φ_2^n, κ^n) in random preference models.

Let J_H(ψ_H | θ_H) denote the joint c.d.f. governing the distribution of ψ_H in the population from which subjects are sampled, where θ_H are parameters governing J_H. Let θ_H* be the true value of θ_H in the population. We want an estimate θ̃_H of θ_H*: This is what is meant by random parameters estimation. It requires that we choose a reasonable and tractable form for J_H that appears to characterize the main features of the joint distribution of parameter vectors ψ_H in the sample. The procedure followed for doing this is detailed below for the (EU,Strong) model; then the procedure is outlined more generally; and finally the exact specifications of all models are laid out in detail.

At the level of an individual subject, without trembles and suppressing the subject superscript n, the (EU,Strong) model is

(B.1) P_mc = Λ( λ[(s_m1 − t_m1)u_1c + (s_m2 − t_m2)u_2c + (s_m3 − t_m3)u_3c] ),

where Λ(x) = [1 + exp(−x)]^{−1} is the Logistic c.d.f. (which will be consistently employed as the function H(x) for the strong, strict and contextual utility models). Equation (B.1) introduces the notation u_ic = u(z_ic).
In terms of the underlying utility parameters u_2 and u_3 to be estimated, this implies the following:

(B.2) (u_1c, u_2c, u_3c) = (1, u_2, u_3) for pairs on context c = −0 = (1,2,3);
(0, 1, u_3) for pairs on context c = −2 = (0,1,3); and
(0, 1, u_2) for pairs on context c = −3 = (0,1,2).

I begin by estimating the parameters of a simplified version of equation (B.1) individually, for sixty-eight of HO's eighty subjects,15 using all 150 observations of choices on the contexts −0, −2 and −3 combined. This initial subject-by-subject estimation gives an initial impression of what the joint distribution of parameter vectors ψ = (u_2, u_3, λ) looks like across subjects, and of how we might choose J(ψ | θ) to represent that distribution. At this initial step, ω is not estimated, but rather assumed constant across subjects and equal to 0.04.16 Estimation of ω is undertaken later in the random parameters estimation. Therefore, I begin by estimating u_2, u_3 and λ, using the "full" probabilities

(B.3) P_mc^f = (1 − ω)P_mc + ω/2 = 0.96 Λ( λ[(s_m1 − t_m1)u_1c + (s_m2 − t_m2)u_2c + (s_m3 − t_m3)u_3c] ) + 0.02,

which temporarily fixes ω at 0.04, for each subject. The log likelihood function for subject n is

(B.4) LL^n(u_2, u_3, λ) = Σ_mc [ y_mc^n ln(P_mc^f) + (1 − y_mc^n) ln(1 − P_mc^f) ],

with the specification of P_mc^f in equation (B.3). This is maximized for each subject n, yielding initial estimates ψ̃^n = (ũ_2^n, ũ_3^n, λ̃^n) for each subject n. Figure 6 graphs ln(ũ_2^n − 1), ln(ũ_3^n − 1) and ln(λ̃^n) against their first principal component, which accounts for about 69 percent of their collective variance.17 The figure also shows regression lines on the first principal component. The Pearson correlation between ln(ũ_2^n − 1) and ln(ũ_3^n − 1) is fairly high (0.848).
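The subject-level step in equations (B.3) and (B.4) can be sketched as follows. The function names are mine, and the tremble is fixed at ω = 0.04 as in the text; an optimizer would then maximize `log_likelihood` over (u_2, u_3, λ) for each subject.

```python
import math

def logistic(x):
    """Logistic c.d.f., the function Lambda of equation (B.1)."""
    return 1.0 / (1.0 + math.exp(-x))

def full_prob(s, t, u, lam, omega=0.04):
    """'Full' choice probability of equation (B.3): the tremble-mixed
    (EU, Strong) probability of choosing S over T on one pair, where
    u = (u1c, u2c, u3c) are the context utilities and lam is precision."""
    diff = sum((si - ti) * ui for si, ti, ui in zip(s, t, u))
    return (1.0 - omega) * logistic(lam * diff) + omega / 2.0

def log_likelihood(choices, u, lam, omega=0.04):
    """Subject log likelihood of equation (B.4); each element of
    `choices` is (s, t, y) with y = 1 if S was chosen."""
    ll = 0.0
    for s, t, y in choices:
        p = full_prob(s, t, u, lam, omega)
        ll += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return ll
```

Note the tremble bounds every full probability inside [ω/2, 1 − ω/2], which keeps the log likelihood finite even for dominated-looking choices.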
Given that these are both estimates and hence contain some pure sampling error, it appears that an assumption of perfect correlation between them in the underlying population may not do too much violence to truth. Therefore, I make this assumption about the joint distribution of ψ in the population. While ln(λ̃^n) does appear to share limited variance with ln(ũ_2^n − 1) and ln(ũ_3^n − 1) (Pearson correlations of −0.22 and −0.45, respectively), it obviously either has independent variance of its own or is estimated with relatively low precision. These observations suggest modeling the joint distribution J(ψ | θ) of ψ = (u_2, u_3, λ, ω) as being generated by two independent standard normal deviates x_u and x_λ, as follows:

(B.5) u_2(x_u, θ) = 1 + exp(a_2 + b_2 x_u),
u_3(x_u, θ) = 1 + exp(a_3 + b_3 x_u),
λ(x_u, x_λ, θ) = exp(a_λ + b_λ x_u + c_λ x_λ), and
ω a constant,

where (θ, ω) = (a_2, b_2, a_3, b_3, a_λ, b_λ, c_λ, ω) are parameters to be estimated.

15 There are twelve subjects in the HO data with few or no choices of the riskier lottery in any pair. They can ultimately be included in random parameters estimations, but at this initial stage of individual estimation it is either not useful (due to poor identification) or simply not possible to estimate models for these subjects.
16 Estimation of ω is a nuisance at the individual level. Trembles are rare enough that individual estimates of ω are typically zero. Even when estimates are nonzero, the addition of an extra parameter to estimate increases the noisiness of the remaining estimates and hides the pattern of variance and covariance of these parameters that we wish to see at this step.
17 Two hugely obvious outliers have been removed for both the principal components extraction and the graph.
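Because u_2 and u_3 in (B.5) are driven by the same deviate x_u, the probability of a monotonicity violation (u_2 > u_3) has a closed form: u_2 > u_3 exactly when a_2 + b_2 x_u > a_3 + b_3 x_u, which (when b_3 > b_2) is a lower tail event for x_u. A sketch using the final (EU,Strong) estimates reported in Table 3:

```python
import math

def std_normal_cdf(x):
    """Standard normal c.d.f. via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def violation_prob(a2, b2, a3, b3):
    """P(u2 > u3) under (B.5): with b3 > b2, u2 > u3 iff
    x_u < -(a3 - a2)/(b3 - b2) for x_u ~ N(0, 1)."""
    return std_normal_cdf(-(a3 - a2) / (b3 - b2))

# Final (EU,Strong) estimates from Table 3: a2, b2, a3, b3.
p = violation_prob(-1.28, 0.514, -0.653, 0.657)
```

With those estimates the implied violation probability is on the order of 10^{−6}, inside the 10^{−5} ceiling imposed in the estimations.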
Then the (EU,Strong) model, conditional on x_u, x_λ and (θ, ω), becomes

(B.6) P_mc^f(x_u, x_λ, θ, ω) = (1 − ω) Λ( λ(x_u, x_λ, θ)[(s_m1 − t_m1)u_1c + (s_m2 − t_m2)u_2c + (s_m3 − t_m3)u_3c] ) + ω/2,

where u_1c = 1 if c = −0 and u_1c = 0 otherwise; u_2c = u_2(x_u, θ) if c = −0 and u_2c = 1 otherwise; and u_3c = u_2(x_u, θ) if c = −3 and u_3c = u_3(x_u, θ) otherwise. Now, estimate θ* by maximizing the following random parameters log likelihood function in θ:

(B.7) LL(θ, ω) = Σ_n ln( ∫∫ Π_mc P_mc^f(x_u, x_λ, θ, ω)^{y_mc^n} [1 − P_mc^f(x_u, x_λ, θ, ω)]^{1 − y_mc^n} dΦ(x_u) dΦ(x_λ) ),

where Φ is the standard normal c.d.f., P_mc^f(x_u, x_λ, θ, ω) is as given in equation (B.6), and the product runs over subject n's pairs on the contexts −0, −2 and −3.18 This is a difficult maximization problem, but fortunately the linear regression lines in Figure 6 may provide reasonable starting values for the parameter vector θ. That is, initial estimates of the a and b coefficients in θ are the intercepts and slopes from the linear regressions of ln(ũ_2^n − 1), ln(ũ_3^n − 1) and ln(λ̃^n) on their first principal component; and the root mean squared error of the regression of ln(λ̃^n) on the first principal component provides an initial estimate of c_λ.

Table 3 shows the results of maximizing (B.7) in (θ, ω). As can be seen, the initial parameter estimates are good starting values, though a couple of the final estimates are significantly different from the initial estimates (judging from the asymptotic standard errors of the final estimates). These estimates produce the log likelihood in the first column of the top row of Table 1. Note that wherever b_2 ≠ b_3, sufficiently large (or small) values of the underlying standard normal deviate x_u imply a violation of monotonicity (that is, u_2 > u_3). Rather than imposing b_2 = b_3 as a constraint on the estimations, I impose the weaker constraint (a_3 − a_2)/(b_3 − b_2) > 4.2649, making the estimated population fraction of such violations no larger than 10^{−5}.
This constraint does not bind for the estimates shown in Table 3. Generally, it rarely binds, and is never close to significantly binding, for any of the strong, strict or contextual utility estimations done here. Recall that the nonparametric treatment of the utility of outcomes avoids a fixed risk attitude across the outcome vector (0,1,2,3), as would be implied by a parametric form such as CARA or CRRA utility. The estimates shown in Table 3 imply a population in which about sixty-eight percent of subjects have a weakly concave utility function, with the remaining thirty-two percent having an inflected "concave then convex" utility function. This is very similar to the results of Hey and Orme's (1994) individual estimations: That is, the random parameters estimation used here produces estimated sample heterogeneity much like that suggested by individual estimation.

A very similar process was used to select and estimate random parameters characterizations of heterogeneity for all models. In each case, a model's parameter vector ψ^n is first estimated separately for each subject with a fixed tremble probability ω = 0.04. Let these estimates be ψ̃^n. The correlation matrix of the parameters is then computed, and the vectors ψ̃^n are also subjected to a principal components analysis, with particular attention to the first principal component.

18 Such integrations must be performed numerically in some manner for estimation. I use Gauss-Hermite quadratures, which are practical up to two or three integrals; for integrals of higher dimension, simulated maximum likelihood is usually more practical. Judd (1998) and Train (2003) are good sources for these methods.
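The Gauss-Hermite quadratures of footnote 18 rest on the change of variables x = √2·t, which converts an expectation against the standard normal density into the Gauss-Hermite weight exp(−t²). A minimal sketch using NumPy's built-in nodes and weights (the function names are mine):

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def gauss_hermite_expectation(f, nodes=14):
    """Approximate E[f(x)] for x ~ N(0,1):
    E[f(x)] = (1/sqrt(pi)) * sum_i w_i * f(sqrt(2) * t_i)."""
    t, w = hermgauss(nodes)
    return (w * f(np.sqrt(2.0) * t)).sum() / np.sqrt(np.pi)

def double_gh_expectation(f, nodes=14):
    """Nested two-dimensional version, E[f(xu, xl)] for independent
    standard normals, as needed for the double integral in (B.7)."""
    t, w = hermgauss(nodes)
    x = np.sqrt(2.0) * t
    total = 0.0
    for wi, xi in zip(w, x):
        total += wi * (w * f(xi, x)).sum() / np.sqrt(np.pi)
    return total / np.sqrt(np.pi)
```

With 14 nodes per dimension, the rule is exact for polynomial integrands up to degree 27 and extremely accurate for the smooth integrands of (B.7); for instance E[exp(x/2)] = exp(1/8) is recovered essentially to machine precision.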
As with the detailed example of the (EU,Strong) model, all models with utility parameters u_2^n and u_3^n (that is, all strong, strict or contextual utility models) yield quite high Pearson correlations between ln(ũ_2^n − 1) and ln(ũ_3^n − 1) across subjects, and heavy loadings of these on first principal components of estimated parameter vectors ψ̃^n. Therefore, the population distributions J(ψ | θ), where ψ = (u_2, u_3, λ) (strong, strict and contextual EU models) or ψ = (u_2, u_3, γ, λ) (strong, strict and contextual RDEU models), are in all cases modeled as having a perfect correlation between ln(u_2^n − 1) and ln(u_3^n − 1), generated by an underlying standard normal deviate x_u.

Let RP denote random preference models. Quite similarly, individual estimations of the two RP models, with Gamma distribution shape parameters φ_1^n and φ_2^n, yield quite high Pearson correlations between ln(φ̃_1^n) and ln(φ̃_2^n) across subjects, and heavy loadings of these on first principal components of estimated parameter vectors ψ̃^n. Therefore, the joint distributions J(ψ | θ), where ψ = (φ_1, φ_2, κ) in the (EU,RP) model or ψ = (γ, φ_1, φ_2, κ) in the (RDEU,RP) model, are both assumed to have a perfect correlation between ln(φ_1^n) and ln(φ_2^n) in the population, generated by an underlying standard normal deviate x_φ.

In all cases, all other model parameters are characterized as possibly partaking of some of the variance represented by x_u (in strong, strict or contextual utility models) or x_φ (in random preference models), but also as having independent variance represented by an independent standard normal variate. In essence, all correlations between model parameters are represented as arising from a single underlying first principal component (x_u or x_φ), which in all cases accounts for two-thirds (frequently more) of the shared variance of parameters in ψ̃^n according to the principal components analyses.
The correlation is assumed to be a perfect one for ln(u_2^n − 1) and ln(u_3^n − 1) (in non-random-preference models) or ln(φ_1^n) and ln(φ_2^n) (in random preference models), since this seems very nearly characteristic of all individual model estimates; but aside from ω, other model parameters are given their own independent variance, since their correlations with ln(u_2^n − 1) and ln(u_3^n − 1) are always weaker than that observed between ln(u_2^n − 1) and ln(u_3^n − 1) (and similarly for ln(φ_1^n) and ln(φ_2^n) in random preference models).

The following equation systems show the characterization for all models, where any subset of x_u, x_φ, x_λ, x_κ and x_γ found in each characterization are jointly independent standard normal variates. Tremble probabilities ω are modeled as constant in the population, for reasons discussed in section 6.1, and so there are no equations below for ω. The systems represent the choice probabilities P_mc, but (1 − ω)P_mc + ω/2 is used to build likelihood functions allowing for trembles. As in the text and Appendix A, Λ, G and B′ are the Logistic, Gamma and Beta-prime cumulative distribution functions, respectively. In strong, strict and contextual utility models, the relationship between the u_ic terms in the expressions for P_mc, and the functions u_2(x_u, θ) and u_3(x_u, θ), is always as shown below equation (B.6).

(EU,Strong), (EU,Strict) and (EU,Contextual) models:

P_mc(x_u, x_λ, θ) = Λ( λ(x_u, x_λ, θ)[f(s_m1 u_1c + s_m2 u_2c + s_m3 u_3c) − f(t_m1 u_1c + t_m2 u_2c + t_m3 u_3c)] / d_c(x_u, θ) ),

where f(x) = ln(x) in strict utility, and f(x) = x in strong and contextual utility;
d_c(x_u, θ) = u_2(x_u, θ) for c = −3, u_3(x_u, θ) for c = −2, and u_3(x_u, θ) − 1 for c = −0 in contextual utility, and d_c(x_u, θ) ≡ 1 in strong and strict utility; and
u_2(x_u, θ) = 1 + exp(a_2 + b_2 x_u), u_3(x_u, θ) = 1 + exp(a_3 + b_3 x_u), and λ(x_u, x_λ, θ) = exp(a_λ + b_λ x_u + c_λ x_λ), where θ = (a_2, b_2, a_3, b_3, a_λ, b_λ, c_λ).
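The three EU stochastic models just displayed differ only in f and in the divisor d_c. A sketch that makes the comparison concrete; the utilities and precision value are illustrative, and d_c is computed as the utility range max(u) − min(u), which reproduces the three context-specific divisors above:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def choice_prob(s, t, u, lam, model):
    """P(choose S over T) for the (EU,Strong), (EU,Strict) and
    (EU,Contextual) specifications. u = (u1c, u2c, u3c) are the context
    utilities; the contextual model divides the EU difference by the
    range of outcome utilities in the context, max(u) - min(u)."""
    vs = sum(p * ui for p, ui in zip(s, u))
    vt = sum(p * ui for p, ui in zip(t, u))
    diff = math.log(vs) - math.log(vt) if model == "strict" else vs - vt
    if model == "contextual":
        diff /= max(u) - min(u)
    return logistic(lam * diff)
```

On a context whose utility range exceeds one, the contextual probability sits strictly between the strong utility probability and 1/2, which is exactly the heteroscedasticity the contextual model is built to deliver.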
(EU,RP) model:

P_mc(x_φ, x_κ, θ) = G( R_m | φ_1(x_φ, θ), κ(x_φ, x_κ, θ) ) for c = −3,
G( R_m | φ_1(x_φ, θ) + φ_2(x_φ, θ), κ(x_φ, x_κ, θ) ) for c = −2, and
B′( R_m | φ_2(x_φ, θ), φ_1(x_φ, θ) ) for c = −0,

where R_m = (s_m2 + s_m3 − t_m2 − t_m3)/(t_m3 − s_m3); and
φ_1(x_φ, θ) = exp(a_1 + b_1 x_φ), φ_2(x_φ, θ) = exp(a_2 + b_2 x_φ), and κ(x_φ, x_κ, θ) = exp(a_κ + b_κ x_φ + c_κ x_κ), where θ = (a_1, b_1, a_2, b_2, a_κ, b_κ, c_κ).

(RDEU,Strong), (RDEU,Strict) and (RDEU,CU) models:

P_mc(x_u, x_λ, x_γ, θ) = Λ( λ(x_u, x_λ, θ)[f(ws_m1(x_u, x_γ, θ)u_1c + ws_m2(x_u, x_γ, θ)u_2c + ws_m3(x_u, x_γ, θ)u_3c) − f(wt_m1(x_u, x_γ, θ)u_1c + wt_m2(x_u, x_γ, θ)u_2c + wt_m3(x_u, x_γ, θ)u_3c)] / d_c(x_u, θ) ),

where f(x) = ln(x) in strict utility, and f(x) = x in strong and contextual utility;
d_c(x_u, θ) = u_2(x_u, θ) for c = −3, u_3(x_u, θ) for c = −2, and u_3(x_u, θ) − 1 for c = −0 in contextual utility, and d_c(x_u, θ) ≡ 1 in strong and strict utility; and
ws_mi(x_u, x_γ, θ) = w( Σ_{j≥i} s_mj | γ(x_u, x_γ, θ) ) − w( Σ_{j>i} s_mj | γ(x_u, x_γ, θ) ) and wt_mi(x_u, x_γ, θ) = w( Σ_{j≥i} t_mj | γ(x_u, x_γ, θ) ) − w( Σ_{j>i} t_mj | γ(x_u, x_γ, θ) ), where w(q | γ(x_u, x_γ, θ)) = exp( −[−ln(q)]^{γ(x_u, x_γ, θ)} ); and
u_2(x_u, θ) = 1 + exp(a_2 + b_2 x_u), u_3(x_u, θ) = 1 + exp(a_3 + b_3 x_u), γ(x_u, x_γ, θ) = exp(a_γ + b_γ x_u + c_γ x_γ), and λ(x_u, x_λ, θ) = exp(a_λ + b_λ x_u + c_λ x_λ), where θ = (a_2, b_2, a_3, b_3, a_γ, b_γ, c_γ, a_λ, b_λ, c_λ).
(RDEU,RP) model:

P_mc(x_φ, x_κ, x_γ, θ) = G( WR_m(x_φ, x_γ, θ) | φ_1(x_φ, θ), κ(x_φ, x_κ, θ) ) for c = −3,
G( WR_m(x_φ, x_γ, θ) | φ_1(x_φ, θ) + φ_2(x_φ, θ), κ(x_φ, x_κ, θ) ) for c = −2, and
B′( WR_m(x_φ, x_γ, θ) | φ_2(x_φ, θ), φ_1(x_φ, θ) ) for c = −0,

where WR_m(x_φ, x_γ, θ) = [w(s_m2 + s_m3 | γ(x_φ, x_γ, θ)) − w(t_m2 + t_m3 | γ(x_φ, x_γ, θ))] / [w(t_m3 | γ(x_φ, x_γ, θ)) − w(s_m3 | γ(x_φ, x_γ, θ))] and w(q | γ(x_φ, x_γ, θ)) = exp( −[−ln(q)]^{γ(x_φ, x_γ, θ)} ); and
φ_1(x_φ, θ) = exp(a_1 + b_1 x_φ), φ_2(x_φ, θ) = exp(a_2 + b_2 x_φ), γ(x_φ, x_γ, θ) = exp(a_γ + b_γ x_φ + c_γ x_γ), and κ(x_φ, x_κ, θ) = exp(a_κ + b_κ x_φ + c_κ x_κ), where θ = (a_1, b_1, a_2, b_2, a_γ, b_γ, c_γ, a_κ, b_κ, c_κ).

For models with the EU structure, a likelihood function nearly identical to equation (B.7) is maximized in θ; for instance, for the (EU,RP) model, simply replace P_mc(x_u, x_λ, θ) with P_mc(x_φ, x_κ, θ), and replace dΦ(x_u)dΦ(x_λ) with dΦ(x_φ)dΦ(x_κ). For models with the RDEU structure, a third integration appears, since these models allow for independent variance in γ (the Prelec weighting function parameter) through the addition of a third standard normal variate x_γ. In all cases, the integrations are carried out by Gauss-Hermite quadrature. For models with the EU structure, where there are two nested integrations, 14 nodes are used for each nested quadrature of an integral. For models with the RDEU structure, 10 nodes are used for each nested quadrature. In all cases, starting values for these numerical maximizations are computed in the manner described for the (EU,Strong) model: Parameters in ψ̃^n are regressed on their first principal component, and the intercepts and slopes of these regressions are the starting values for the a and b coefficients in the models, while the root mean squared errors of these regressions are the starting values for the c coefficients found in the equations for λ, κ and/or γ.

Table 1.
Log likelihoods of random parameters characterizations of the models in the Hey and Orme sample

                                   Estimated on all three contexts:     Estimated on contexts (0,1,2) and (0,1,3):
Structure  Stochastic Model        log likelihood on all three          log likelihood on context (1,2,3)
                                   contexts (in-sample fit)             (out-of-sample fit)
EU         Strong Utility          −5311.44                             −2409.38
EU         Strict Utility          −5448.50                             −2373.12
EU         Contextual Utility      −5297.08                             −2302.55
EU         Random Preferences      −5348.36                             −2356.60
RDEU       Strong Utility          −5207.81                             −2394.75
RDEU       Strict Utility          −5306.48                             −2450.41
RDEU       Contextual Utility      −5190.43                             −2281.36
RDEU       Random Preferences      −5218.00                             −2335.55

Table 2. Vuong (1989) non-nested tests between model pairs, by structure, stochastic model and in-sample versus out-of-sample fit.

EU Structure

Estimated on all three contexts, comparing fit on all three contexts (in-sample fit comparison):
                     Random Prefs.           Strong Utility           Strict Utility
Contextual Utility   z = 1.723, p = 0.042    z = 0.703, p = 0.241     z = 6.067, p < 0.0001
Random Prefs.                                z = −1.574, p = 0.058    z = 5.419, p < 0.0001
Strong Utility                                                        z = 5.961, p < 0.0001

Estimated on contexts (0,1,2) and (0,1,3), comparing fit on context (1,2,3) (out-of-sample fit comparison):
                     Random Prefs.           Strong Utility           Strict Utility
Contextual Utility   z = 4.387, p < 0.0001   z = 3.044, p = 0.0012    z = 2.739, p = 0.0031
Random Prefs.                                z = 1.639, p = 0.051     z = 1.422, p = 0.078
Strong Utility                                                        z = 0.028, p = 0.49

RDEU Structure

Estimated on all three contexts, comparing fit on all three contexts (in-sample fit comparison):
                     Random Prefs.           Strong Utility           Strict Utility
Contextual Utility   z = 0.981, p = 0.163    z = 0.877, p = 0.190     z = 4.352, p < 0.0001
Random Prefs.                                z = −0.44, p = 0.330     z = 3.808, p < 0.0001
Strong Utility                                                        z = 5.973, p < 0.0001

Estimated on contexts (0,1,2) and (0,1,3), comparing fit on context (1,2,3) (out-of-sample fit comparison):
                     Random Prefs.           Strong Utility           Strict Utility
Contextual Utility   z = 3.879, p < 0.0001   z = 3.304, p = 0.0005    z = 5.978, p < 0.0001
Random Prefs.                                z = 1.652, p = 0.049     z = 3.831, p < 0.0001
Strong Utility                                                        z = 3.918, p < 0.0001

Notes: Positive z means the row stochastic model fits better than the column stochastic model.

Table 3. Random parameters estimates of the (EU,Strong) model, using choice data from the contexts (0,1,2), (0,1,3) and (1,2,3) of the Hey and Orme (1994) sample.

Structural and stochastic parameter models: u_2 = 1 + exp(a_2 + b_2 x_u); u_3 = 1 + exp(a_3 + b_3 x_u); λ = exp(a_λ + b_λ x_u + c_λ x_λ); ω constant.

Distributional   Initial    Final      Asymptotic       Asymptotic
parameter        estimate   estimate   standard error   t-statistic
a_2              −1.2       −1.28      0.0411           −31.0
b_2              0.57       0.514      0.0311           16.5
a_3              −0.51      −0.653     0.0329           −16.9
b_3              0.63       0.657      0.0316           20.8
a_λ              3.2        3.39       0.101            33.8
b_λ              −0.49      −0.658     0.124            −5.32
c_λ              0.66       0.584      0.0571           10.2
ω                0.04       0.0446     0.0105           4.26

Log likelihood = −5311.44

Notes: x_u and x_λ are independent standard normal variates. Standard errors are calculated using the "sandwich estimator" (Wooldridge 2002), treating all of each subject's choices as a single "super-observation," that is, using degrees of freedom equal to the number of subjects rather than the number of subjects times the number of choices made.

Figure 1. Citations of Pratt (1964), 1977–2006 (Science and Social Science Citation Indices), and articles with text references to risk aversion and substitutability relations, 1977–2001 (JSTOR Economics journals and selected Business journals). [Line graph by year of the number of articles citing Pratt (1964), JSTOR articles with text references to "more risk averse" or "greater risk aversion," and JSTOR articles with text references to "more substitutable" or "greater elasticity of substitution."]

Figure 2.
Behavior of five CRRA EU V-distances, using various CRRA transformations, with homoscedastic precision: MPS pair 1 on context (0,50,100), from Hey (2001). [V-distance plotted against the coefficient of relative risk aversion for the basic, power, unit range, minimum unit and log transformations.]

Figure 3. Behavior of five CRRA EU V-distances, using various CRRA transformations, with homoscedastic precision: MPS pair 1 on context (50,100,150), from Hey (2001). [Same transformations and axes as Figure 2.]

Figure 4. The agent heteroscedasticity solution for making MRA imply SMRA. [Utility and expected utility transformations τ^a and τ^b of two CRRA utility functions, a square root (ρ^b = 0.50) function and a "near-log" (ρ^a = 0.99) function, chosen to match slopes at the central outcome of the outcome context (50,100,150); the plot shows the common EU of sure 100 and the differences Δτ^a V^a (ρ^a = 0.99) and Δτ^b V^b (ρ^b = 0.5).]

Figure 5. Relationship between risk aversion and precision in Hey (2001), CRRA EU estimation. [Natural logarithm of precision plotted against the coefficient of relative risk aversion for individual estimates, with a robust regression line (slope = 4.24).]

Figure 6. Shared variance of initial individual parameter estimates in the (EU,Strong) model. [ln(λ), ln(u_3 − 1) and ln(u_2 − 1) plotted against the first principal component of their variance, with regression lines.]