
‘Stochastically More Risk Averse:’
A Contextual Theory of
Stochastic Discrete Choice Under Risk
by
Nathaniel T. Wilcox
Department of Economics
University of Houston
Houston, TX 77204-5019
[email protected]
Abstract
I introduce the relation “stochastically more risk averse.” Stochastic choice under risk has
overwhelmingly been modeled by way of what decision theory calls strong or strict utility
models, which in microeconometrics are also known as homoscedastic latent variable models. In
particular, a parametric preference functional difference, representing the difference between the
value of two lotteries (as specified by expected utility, rank-dependent expected utility,
cumulative prospect theory and so forth), is embedded within a cumulative distribution function,
such as the Standard Normal or Logistic, to represent choice probabilities. Then a likelihood
function is constructed and estimated. I show that there is a deep conceptual difficulty with this
practice: Estimated money utility function parameters that are meant to represent an agent’s
degree of risk aversion in the senses of Pratt (1964), and hence should order agents according to
the relation “more risk averse,” have no necessary correspondence to the relation “stochastically
more risk averse” within such strong or strict utility implementations of the preference
functionals. I introduce a new stochastic choice model, called “contextual utility,” that remedies
this unsatisfactory situation. Contextual utility is based on the psychologically plausible notion
that the “outcome context” of each lottery choice pair creates a subjective utility range that
mediates the perception of value differences between the lotteries in that pair. From a
microeconometric viewpoint, contextual utility is simply a heteroscedastic latent variable model
where latent error variance is proportional to the range of outcome utilities in lottery pairs.
Empirical evidence is presented showing that contextual utility explains and predicts discrete
choice under risk better than strong or strict utility models or random preference models.
I thank John Hey for generously making his data available to me and others, and I thank Soo
Hong Chew and Glenn Harrison for conversation and commentary during this work. I also thank
the U.S. National Science Foundation for research support under grant SES 0350565.
Let c = ( z1c , z 2c , z3c ) be a vector of three money outcomes such that z1c < z 2c < z3c , called an
outcome context or, more simply, a context. Let S m be a discrete probability distribution
( sm1 , sm 2 , sm3 ) on any vector ( z1 , z 2 , z3 ) ; then a three-outcome lottery S mc on context c is the
discrete probability distribution S m on outcome vector ( z1c , z2 c , z3c ) . It is sometimes convenient
to view lotteries as cumulative distribution functions S_{mc}(z) ≡ ∑_{i : z_{ic} ≤ z} s_{mi}. A pair mc is a set
{S mc , Tmc } of two lotteries on a common context c. Let E ( S mc ) be the expected value of S mc : a
mean-preserving spread or MPS pair is one where E ( S mc ) = E (Tmc ) , sm 2 + sm3 > tm 2 + t m3 and
sm3 < t m3 . Let Ω mps be the set of all MPS pairs. Since any risk-averse expected utility maximizing
agent n will prefer S mc to Tmc ∀ mc ∈ Ω mps (Rothschild and Stiglitz 1970), we can call S mc the
“relatively safe choice” in pairs mc ∈ Ω mps , at least from an expected utility perspective.
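For concreteness, the MPS conditions above can be checked mechanically. The following is a minimal Python sketch, with hypothetical helper names (`expected_value`, `is_mps_pair`), applied to Hey's (2001) MPS pair 1 on context (0, 50, 100) discussed later in the text:

```python
from fractions import Fraction as F

def expected_value(probs, context):
    # E(S_mc): probability-weighted sum of the context's outcomes.
    return sum(p * z for p, z in zip(probs, context))

def is_mps_pair(s, t, context):
    # {S, T} is a mean-preserving spread (MPS) pair iff the means are equal,
    # s_m2 + s_m3 > t_m2 + t_m3, and s_m3 < t_m3.
    return (expected_value(s, context) == expected_value(t, context)
            and s[1] + s[2] > t[1] + t[2]
            and s[2] < t[2])

S = (F(0), F(7, 8), F(1, 8))
T = (F(3, 8), F(1, 8), F(4, 8))
print(is_mps_pair(S, T, (0, 50, 100)))  # -> True: S is the relatively safe choice
```

Exact rational arithmetic (`Fraction`) makes the equal-means check exact rather than subject to floating-point error.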
Suppose that, for empirical reasons, we are convinced that agents choose probabilistically
rather than deterministically.1 Then, let Pmcn be the probability that agent n chooses S mc from pair
mc, which we regard as an empirical primitive. What might it mean for agent a to be
probabilistically more risk-averse than agent b? Consider this proposal for a probabilistic
empirical primitive:
Stochastically More Risk-Averse (SMRA). Agent a is stochastically more risk averse than
agent b, written a ≻_{smra} b, iff P_{mc}^{a} > P_{mc}^{b} ∀ mc ∈ Ω_{mps}.
1. By this, I mean that choice probabilities are usually different from 0 or 1 (or 0.5 in the case of indifference). A
large number of experiments that involve multiple trials of identical pairs document substantial choice switching
between trials of identical pairs, even at very short time intervals of a few minutes with no intervening change in
wealth or any other obvious decision-relevant variable (Tversky 1969; Camerer 1989, Starmer and Sugden 1989,
Hey and Orme 1994, Ballinger and Wilcox 1997, Loomes and Sugden 1998, Hey 2001).
That is, a stochastically more risk averse agent makes the relatively safe choice from any MPS
pair with higher likelihood than a less stochastically risk-averse agent.2 If mean proportions of
relatively safe choices from MPS pairs vary significantly across subjects in an experiment, and
we call this heterogeneous risk aversion, then we probably have something like SMRA in mind.
SMRA is not, however, an implication of theoretical definitions of the relation “more risk
averse” (or MRA) inherited from Pratt (1964) for expected utility (or EU) maximizers, nor of
Chew, Karni and Safra’s (1987) later generalization of it for rank-dependent expected utility (or
RDEU) maximizers, when these are combined with stochastic discrete choice models commonly
used for estimation of risk attitudes. I show this, and offer an alternative stochastic choice model
I call contextual utility that does make SMRA an implication of MRA. I then present estimates
showing that contextual utility generally describes data and predicts new choices as well or better
than other common stochastic choice models, when these are combined with EU and RDEU
theories of choice under risk, in the well-known Hey and Orme (1994) data set. Thus we can
have our theoretical cake and empirically eat it too: A stochastic choice model like contextual
utility, that makes SMRA and MRA congruent with one another, is also an empirically better
model of stochastic choices from lottery pairs.
As a relational preference concept, MRA is ubiquitous in professional economic discourse.
Figure 1 illustrates this for the years 1977 to 2001, comparing counts of articles3 with either of
the phrases “more risk-averse” or “greater risk aversion” in their text to counts of articles with
2. We may, if we wish, fancy it up by making distinctions between local and global versions of the primitive—taking
account, for instance, of the possibility that risk aversion may change with current wealth in some manner. I will not
delve into this here: Formally, regard my discussion as limited to what Cox and Sadiraj (2006) call the “expected
utility of income” model (and the rank-dependent expected utility of income as well).
3. The years 1977 to 2001 were chosen so that a broad range of economics journals were searchable in JSTOR for all
years in that range—only such journals are included for these counts. Some journals withhold JSTOR availability
for the most recent five years, so I chose 2001 as the last widely available year. A small number of important
journals either do not have JSTOR availability back to 1977 (e.g. the Journal of Labor Economics) or did not exist
back to 1977 (e.g. the Journal of Economic Perspectives) and so are not included. Three important business journals
are also included for these article counts (the Journal of Business, the Journal of Finance and Management Science).
either of the phrases “more substitutable” or “greater elasticity of substitution” in their text. The
latter is clearly a central relational concept of economics. The comparative levels of the two
trend lines are somewhat arbitrary, since searches for different text strings would affect the
levels. Moreover, instances of the “substitutability strings” include articles about technologies
rather than preferences. The comparative trends are more meaningful. MRA gained on “more
substitutable” as a relational concept over the quarter century from 1977 to 2001. Figure 1 also
shows that although citations4 to Pratt (1964) peaked in the early 1980s, they have continued at a
fairly constant rate of about thirty-five to forty a year over the last two decades and show no sign
of decrease. Probably, very few economics articles show such steady and long-lived influence.
There can be little doubt that MRA is a central relational preference concept of economics.
Estimation and interpretation of structural parameters of theories of discrete choice under risk
are becoming widespread. Estimation requires the adoption of some stochastic choice model, and
we need to know how to interpret structural parameters of theories of choice under risk when
those models are estimated with any particular stochastic choice model. It seems desirable that
structural parameters associated with the concept of MRA should have a theoretically appealing
empirical meaning like SMRA, particularly given MRA’s ubiquity in economic discourse. The
contextual utility stochastic model presented here delivers that interpretation: Commonly used
alternatives do not. I establish the latter negative point first, and then show the former. Then I
show that models based on contextual utility explain (and especially predict) risky choices better
than models based on strong utility, strict utility and random preference stochastic models in the
well-known Hey and Orme (1994) data set.
4. Here, I searched the ISI Science and Social Science Citation indices, which allow searching from 1977 to 2006.
All fields (not just economics) are included for these citation counts in Figure 1: The temporal pattern of citations is
extremely similar when limited to citations in economics journals.
1. Strong and strict utility
To connect deterministic theory to probabilistic primitives, let the structure of choice under
risk be defined as a function V of lotteries and a vector of structural parameters β n such that
(1.1) V(S_{mc} | β_n) − V(T_{mc} | β_n) ≥ 0 ⇔ P_{mc}^{n} ≥ 0.5.⁵
I will frequently refer to V(S_{mc} | β_n) − V(T_{mc} | β_n) as the V-distance between S_{mc} and T_{mc}.
Equation (1.1) equates the deterministic primitive S_{mc} ≿_n T_{mc} or “S_{mc} is weakly preferred to T_{mc}
by n” with the probabilistic primitive “n is at least as likely to choose S_{mc} from mc as T_{mc},” a
common approach since Edwards (1954).6 Since structure maps pairs into a set of possible
probabilities rather than a single unique probability, structure alone underdetermines choice
probabilities. Stochastic models combine with structure to remedy this.
For example, expected utility (EU) with the constant relative risk aversion (CRRA) utility of
money u_n(z) = (1 − ρ_n)^{−1} z^{1−ρ_n} is the structure V(S_{mc} | ρ_n) = (1 − ρ_n)^{−1} ∑_{i=1}^{3} s_{mi} z_{ic}^{1−ρ_n}, so that

(1.2) (1 − ρ_n)^{−1} ∑_{i=1}^{3} (s_{mi} − t_{mi}) z_{ic}^{1−ρ_n} ≥ 0 ⇔ P_{mc}^{n} ≥ 0.5,
where β n = ρ n is the sole structural parameter, called agent n’s coefficient of relative risk
aversion. Call this the CRRA EU structure. To estimate ρ n in (1.2), or more generally β n in
(1.1), we need a stochastic model. Many empirical economists instinctively use a homoscedastic
latent variable model to complete the relationship between the structural left side of (1.1) and the
choice probability P_{mc}^{n} on the right side of (1.1). Let y_{mc}^{n} = 1 if agent n chooses S_{mc} from pair mc
(y_{mc}^{n} = 0 otherwise). In general, such models assume that there is an underlying but unobserved
5. This restricts attention to transitive structures; this would be an unsatisfactory equation for a nontransitive theory.
6. There are alternative stochastic choice models under which this innocent-sounding equation is incorrect, such as
Machina (1985); but existing evidence offers little support for these alternatives (Hey and Carbone 1995).
continuous random latent variable y_{mc}^{n*} such that y_{mc}^{n} = 1 ⇔ y_{mc}^{n*} ≥ 0; then we have
P_{mc}^{n} = Pr(y_{mc}^{n*} ≥ 0). Here, the latent variable is

(1.3) y_{mc}^{n*} = V(S_{mc} | β_n) − V(T_{mc} | β_n) − σε,
where ε is a mean zero random variable with some standard variance and c.d.f. H (x) such that
H (0) = 0.5 and H ( x) = 1 − H (− x) , usually assumed to be the standard normal or logistic c.d.f.
The resulting homoscedastic latent variable model of Pmcn is then
(1.4) P_{mc}^{n} = H([V(S_{mc} | β_n) − V(T_{mc} | β_n)] / σ).
The modifier “homoscedastic” refers to the fact that σ is assumed to be constant across both
pairs mc and agents n, a common assumption made when estimating these models.
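As a minimal sketch (not anyone's estimation code), the homoscedastic model (1.4) can be written with H taken to be the Logistic c.d.f. and the basic CRRA transformation (so ρ ≠ 1); the function names are hypothetical:

```python
import math

def logistic_cdf(x):
    return 1.0 / (1.0 + math.exp(-x))

def crra_eu(probs, context, rho):
    # Basic CRRA transformation u(z) = z**(1 - rho) / (1 - rho), rho != 1.
    return sum(p * z ** (1.0 - rho) / (1.0 - rho) for p, z in zip(probs, context))

def strong_utility_prob(s, t, context, rho, sigma):
    # Equation (1.4): H of the V-distance divided by a constant sigma.
    v_dist = crra_eu(s, context, rho) - crra_eu(t, context, rho)
    return logistic_cdf(v_dist / sigma)

# MPS pair 1 on context (0, 50, 100): a risk-averse rho pushes the
# probability of the relatively safe choice above one half.
s, t = (0.0, 7/8, 1/8), (3/8, 1/8, 4/8)
print(strong_utility_prob(s, t, (0.0, 50.0, 100.0), 0.5, 1.0) > 0.5)  # -> True
```

Note that the model is binary-symmetric: the probabilities of choosing S and of choosing T from the same pair sum to one.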
In decision theory, (1.4) is a strong utility model (Debreu 1958; Block and Marschak 1960;
Luce and Suppes 1965). Letting L be the set of all lotteries, choice probabilities follow strong
utility if there is a function µ n :L→R and an increasing function ϕ n :R →[0,1], with ϕ n (0) = 0.5
and ϕ n ( x) = 1 − ϕ n (− x) , such that Pmcn = ϕ n (µ n ( S mc ) − µ n (Tmc ) ) . Clearly, (1.4) is a strong utility
model where ϕ n ( x) = H ( x / σ ) and µ n ( S mc ) = V ( S mc | β n ) . In equation (1.3), σε is regarded as
computational, perceptual or evaluative noise in the decision maker’s apprehension of the V-distance V(S_{mc} | β_n) − V(T_{mc} | β_n), with σ being proportional to the standard deviation of this
noise. Microeconometric doctrine usually views σε as a perturbation to V-distance observed by
agents but not observed by the econometrician. In either case, as σ approaches zero, choice
probabilities converge on either zero or one, depending on the sign of the V-distance; put
differently, the observed choice becomes increasingly likely to express the underlying preference
direction of the structure. In keeping with common (but not universal) practice and parlance in
the experimental literature, substitute λ for 1 / σ , and call λ the precision parameter, writing
(1.5) P_{mc}^{n} = H(λ[V(S_{mc} | β_n) − V(T_{mc} | β_n)]).
Strict utility is a special case of strong utility where the function µ n is additionally restricted
to be positive-valued and choice probabilities can be written as
(1.6) P_{mc}^{n} = µ_n(S_{mc}) / [µ_n(S_{mc}) + µ_n(T_{mc})].
Provided V is also positive-valued, we may define µ_n(S_{mc}) ≡ V(S_{mc} | β_n)^{λ}, and rewrite (1.6) as

(1.7) P_{mc}^{n} = Λ(λ(ln[V(S_{mc} | β_n)] − ln[V(T_{mc} | β_n)])),
where Λ ( x) = (1 + e − x ) −1 is the Logistic c.d.f. So strict utility can be viewed as a special case of
strong utility where ln(V) replaces V in (1.1) and H is the Logistic c.d.f.
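The equivalence of (1.6) and (1.7) is easy to confirm numerically; the two function names below are hypothetical, and the V values are arbitrary positive numbers:

```python
import math

def strict_utility_ratio(vs, vt, lam):
    # Equation (1.6) with mu(S) = V(S)**lam (V positive-valued).
    return vs ** lam / (vs ** lam + vt ** lam)

def strict_utility_logit(vs, vt, lam):
    # Equation (1.7): Logistic c.d.f. of lam * (ln V(S) - ln V(T)).
    x = lam * (math.log(vs) - math.log(vt))
    return 1.0 / (1.0 + math.exp(-x))

vs, vt, lam = 7.2, 5.9, 1.8
print(abs(strict_utility_ratio(vs, vt, lam) - strict_utility_logit(vs, vt, lam)) < 1e-12)  # -> True
```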
One may produce a variety of strong utility implementations of any structure, and that variety
is well-represented in the estimation literature. The utility function for money u(z) in both EU
and RDEU structures is only unique up to an affine transformation, and in the literature we see
various transformations used for estimation from experimental choice data. In the case of the
CRRA utility function, call u_b(z | ρ) = (1 − ρ)^{−1} z^{1−ρ} the basic transformation and u_p(z | ρ) = z^{1−ρ}
the power transformation. In lottery choice experiments with many distinct contexts c, let
z1 < z 2 < ... < z K be the k = 1, 2,…, K money outcomes from which all contexts c are created.
Other transformations we might use for strong utility estimation of ρ, given data from such an
experiment, are u_{ur}(z | ρ) = (z^{1−ρ} − z_1^{1−ρ}) / (z_K^{1−ρ} − z_1^{1−ρ}), the unit range CRRA transformation (so
called because u_{ur}(z_1 | ρ) = 0 and u_{ur}(z_K | ρ) = 1), and u_{mu}(z | ρ) = (z^{1−ρ} − z_1^{1−ρ}) / (z_2^{1−ρ} − z_1^{1−ρ}), the
minimum unit CRRA transformation (so called because u_{mu}(z_1 | ρ) = 0 and u_{mu}(z_2 | ρ) = 1).
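The four non-logarithmic transformations can be collected in one hypothetical helper; the default outcome set {0, 50, 100, 150} below is Hey's (2001), used later in the text:

```python
def crra_u(z, rho, tau, z_min=0.0, z_2=50.0, z_max=150.0):
    # The four affine CRRA transformations, tau in {'b', 'p', 'ur', 'mu'}, rho != 1.
    pw = lambda x: x ** (1.0 - rho)
    if tau == 'b':    # basic: z**(1-rho) / (1-rho)
        return pw(z) / (1.0 - rho)
    if tau == 'p':    # power: z**(1-rho)
        return pw(z)
    if tau == 'ur':   # unit range: u(z_min) = 0 and u(z_max) = 1
        return (pw(z) - pw(z_min)) / (pw(z_max) - pw(z_min))
    if tau == 'mu':   # minimum unit: u(z_min) = 0 and u(z_2) = 1
        return (pw(z) - pw(z_min)) / (pw(z_2) - pw(z_min))
    raise ValueError("unknown transformation " + tau)

print(crra_u(150.0, 0.5, 'ur'), crra_u(50.0, 0.5, 'mu'))  # -> 1.0 1.0
```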
Let Δ_τ V_{mc}(ρ) be shorthand for the CRRA EU V-distance between the lotteries in pair mc, as
computed using transformation τ, for use in a strong utility model; that is, let

(1.8) Δ_τ V_{mc}(ρ) ≡ ∑_{i=1}^{3} (s_{mi} − t_{mi}) u_τ(z_{ic} | ρ), τ ∈ {b,p,ur,mu}.
Also, let Δ_{ln} V_{mc}(ρ) be the strict utility (logarithmic) CRRA EU V-distance, defined as

(1.9) Δ_{ln} V_{mc}(ρ) ≡ ln(∑_{i=1}^{3} s_{mi} z_{ic}^{1−ρ}) − ln(∑_{i=1}^{3} t_{mi} z_{ic}^{1−ρ}).
With some terminological abuse, say that this V-distance uses the log transformation. Equations
(1.8) and (1.9) now give a family of strong and strict CRRA EU models of Pmcn , namely
(1.10) P_{mc}^{n} = H(λ Δ_τ V_{mc}(ρ_n)), τ ∈ {b,p,ur,mu,ln},
where for strict utility H must be the Logistic c.d.f. For consistency, I use the Logistic c.d.f. as H
in all estimations reported subsequently.
Note that homoscedasticity with respect to agents n is not an essential feature of either strong
or strict utility models. However, homoscedasticity with respect to pairs mc is a central
characteristic of strong and strict utility models: Without it, they do not imply strong stochastic
transitivity, one of the chief observable implications of strong and strict utility that distinguish
them from other stochastic models (Block and Marschak 1960; Luce and Suppes 1965).
2. Critique of fully homoscedastic strong and strict utility
Pratt (1964) originally developed a definition of what it means to say that EU-maximizing
agent a is “more risk averse” (or MRA) than EU-maximizing agent b, which we may write as
a ≻_{mra} b. Because such definitions of “more risk averse than” are couched in terms of comparative
statics in the neighborhood of indifference, they do not necessarily imply anything about
changes in V-distance away from the neighborhood of indifference; therefore, a ≻_{mra} b ⇏ a ≻_{smra} b in
strong or strict utility models. Since V-distance is arbitrary in affine structures, a definition of
“more risk averse than” will of necessity be couched in terms of what happens in the
neighborhood of indifference, where the arbitrariness of affine transformations is locally
irrelevant. This extended quote from Chew, Karni and Safra (1987, p. 374), with notation
modified to fit mine, illustrates this and is definitive of MRA for both EU and RDEU:
To compare the attitudes toward risk of two preference relations ≿^{a} and ≿^{b} on D_J [a set
of probability distributions on an interval J ⊂ R], we define T_{mc} ∈ D_J to differ from S_{mc} ∈ D_J
by a simple compensated spread from the viewpoint of ≿^{b}, if and only if T_{mc} ~^{b} S_{mc} and ∃ z_0
∈ J such that T_{mc}(z) ≥ S_{mc}(z) for all z < z_0 and T_{mc}(z) ≤ S_{mc}(z) for all z ≥ z_0.

DEFINITION 5: A preference relation ≿^{a} ∈ G [the set of preference relations
representable by Gateaux-differentiable RDEU functionals V] is said to be more risk averse
than ≿^{b} ∈ G if S_{mc} ≿^{a} T_{mc} for every T_{mc}, S_{mc} ∈ D_J such that T_{mc} differs from S_{mc} by a
simple compensated spread from the point of view of ≿^{b}….

THEOREM 1. The following conditions on a pair of Gateaux differentiable [RDEU]
preference functionals V^{b} and V^{a} on D_J with respective utility functions u^{b} and u^{a} and
probability transformation functions w^{b} and w^{a} are equivalent:…

(ii) w^{a} and u^{a} are concave transformations of w^{b} and u^{b}, respectively.

(iii) If T_{mc} differs from S_{mc} by a simple compensated spread from the point of view of ≿^{b}
[implying that V^{b}(T_{mc}) = V^{b}(S_{mc})] then V^{a}(T_{mc}) ≤ V^{a}(S_{mc}).
In terms of CRRA EU V-distance, Chew, Karni and Safra’s definition of a simple
compensated spread means that if Tmc is a simple compensated spread of Smc from the viewpoint
of agent b with ρ = ρ b , we have ∆τ Vmc ( ρ b ) = 0. The results (ii) and (iii) of Theorem 1 then
imply that ∆τ Vmc ( ρ a ) > 0 ∀ ρ a > ρ b . However, this does not imply that ∂∆τ Vmc ( ρ ) / ∂ρ > 0 ∀
ρ > ρ b . MRA is about how indifference becomes preference when we substitute any “more risk
averse” utility function (or probability weighting function) for the less risk averse one that
generated indifference; it is definitely not about monotone increasing V-distance with greater risk
aversion. But the latter is required for a ≻_{mra} b ⇒ a ≻_{smra} b in strong and strict utility models if
precision λ is independent of structural risk aversion ρ, that is, if evaluative noise is
homoscedastic with respect to agents. Since strong utility specifies no necessary relationship
between ρ and λ, strong utility and MRA together do not imply SMRA.
Moreover, MRA cannot imply SMRA across all contexts in any strong or strict utility model
that assumes homoscedasticity with respect to agents (constant λ). Graphs illustrate this by way
of a simple counterexample, based on Hey’s (2001) four-outcome experimental design (similar
four-outcome designs are common, e.g. Hey and Orme 1994 and Harrison and Rutström 2005).
The design employs the four outcomes 0, 50, 100 and 150 U.K. pounds. Those four outcomes
generate four distinct three-outcome contexts on which all experimental lottery pairs are defined.
Label these contexts by their omitted outcome, e.g. the context c = −0 is (50,100,150), while the
context c = −150 is (0,50,100). Here, the CRRA unit range transformation is
u_{ur}(z | ρ) = (z^{1−ρ} − 0^{1−ρ}) / (150^{1−ρ} − 0^{1−ρ}) = (z/150)^{1−ρ}, while the minimum unit transformation is
u_{mu}(z | ρ) = (z^{1−ρ} − 0^{1−ρ}) / (50^{1−ρ} − 0^{1−ρ}) = (z/50)^{1−ρ}.
Figure 2 shows the behavior of λτ ∆τ V1, −150 ( ρ ) for the five transformations τ defined
previously, computed for the MPS pair 1 on context c = −150 (the context (0,50,100)), given by
S1, −150 = (0,7 / 8,1 / 8) and T1, −150 = (3 / 8,1 / 8,4 / 8) , taken from Hey (2001), as ρ varies over the
interval [0,0.99]. To sketch Figure 2, a constant value of λτ has been chosen for each
transformation to make λτ ∆τ V1, −150 ( ρ ) reach a common maximum of 10 for ρ ∈ [0,0.99], for easy
visual comparisons; the important point is that λτ is held constant to draw each graph of
λτ ∆τ V1, −150 ( ρ ) , for each transformation τ. The graphs thus reveal the behavior of strong and strict
utility models in which λ is independent of ρ or, what is the same thing, where λ is assumed to
be constant across agents n (homoscedasticity with respect to agents): If λτ ∆τ V1, −150 ( ρ ) is not
monotone increasing on [0,0.99], then MRA cannot imply SMRA under transformation τ, given
homoscedasticity with respect to agents. Notice that the basic, power and log transformations are
not monotonically increasing in ρ, though the nonmonotonicity for the basic transformation is
mild for this particular pair. The implication is that a strong utility CRRA EU model with λ
constant across agents cannot predict SMRA from MRA defined in terms of ρ, using either the
basic, power or log transformations, for MPS pair 1 on context c = −150. However, the unit
range and minimum unit transformations can do this, at least for this MPS pair on this context.
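The qualitative point of Figure 2 can be reproduced numerically. The sketch below checks whether the V-distance for this pair is monotonically increasing on a ρ grid under the power and unit range transformations (a fixed positive λ_τ cannot change monotonicity, so it is dropped); the helper names are mine:

```python
def v_distance(rho, u):
    # CRRA EU V-distance for MPS pair 1 on context (0, 50, 100), from Hey (2001).
    s, t, zs = (0.0, 7/8, 1/8), (3/8, 1/8, 4/8), (0.0, 50.0, 100.0)
    return sum((si - ti) * u(z, rho) for si, ti, z in zip(s, t, zs))

u_power = lambda z, rho: z ** (1.0 - rho)
u_unit_range = lambda z, rho: (z / 150.0) ** (1.0 - rho)  # z_min = 0, z_max = 150

def monotone_increasing(u, grid):
    vals = [v_distance(r, u) for r in grid]
    return all(b > a for a, b in zip(vals, vals[1:]))

grid = [i / 100.0 for i in range(100)]  # rho in [0, 0.99]
print(monotone_increasing(u_power, grid))       # -> False
print(monotone_increasing(u_unit_range, grid))  # -> True
```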
Unfortunately, this property of the unit range and minimum unit transformations disappears
as soon as we consider a context that does not share the minimum outcome z1 = 0 used to define
those transformations. Consider now MPS pair 1 on context c = −0 (the context (50,100,150)),
given by S1, −0 = (0,7 / 8,1 / 8) and T1, −0 = (3 / 8,1 / 8,4 / 8) , also taken from Hey (2001). Figure 3
shows λτ ∆τ V1, −0 ( ρ ) for MPS pair 1 on context c = −0, for all five transformations: All produce a
severely nonmonotonic relationship between V-distance and ρ, given constant precision λ. In
essence, none of the five transformations can predict SMRA from MRA in ρ for all MPS pairs
on a four-outcome vector—if λ is constant across agents. The conclusion is clear: If we want to
explain heterogeneity of safe lottery choices by means of a strong or strict utility model solely in
terms of heterogeneity of a parameter like ρ, and we use any of the five transformations
discussed above, we cannot uniformly succeed if we assume that precision λ is constant across
subjects with varying ρ. Homoscedasticity with respect to agents is unacceptable if we want
a ≻_{mra} b ⇒ a ≻_{smra} b in a strong or strict utility CRRA EU model across multiple contexts.
The fact that the minimum unit and unit range transformations make MRA and SMRA
congruent on a context that shares the minimum outcome used to define them gives a clue to one
escape from these difficulties: Why not use a context-dependent transformation? This approach
abandons homoscedasticity with respect to pairs mc; hence it is not a strong or strict utility
model. This is in fact the approach I eventually argue for: The contextual utility model
introduced subsequently assumes heteroscedasticity with respect to outcome contexts of a form
suggested by a context-specific unit range transformation. This can be given both good
psychological motivation and firm grounding in terms of Pratt’s (1964) main theorem. First,
however, it is worth considering whether another apparent solution—allowing heteroscedasticity
with respect to agents—is adequate on its own.
3. Empirical and theoretical problems with an “agent heteroscedasticity” solution
Figure 4 shows two concave CRRA utility functions, for two agents a and b: a square-root
(ρ_b = 0.5) CRRA utility function, and a “near-log” (ρ_a = 0.99) CRRA utility function; clearly
a ≻_{mra} b in the senses of Pratt (1964) and Chew, Karni and Safra (1987). Consider the MPS pair
S_{2,−0} = (0, 1, 0) and T_{2,−0} = (1/2, 0, 1/2), a sure 100 versus even chances of either 50 or 150. For
a ≻_{mra} b to imply a ≻_{smra} b under strong utility, we need it to be the case that the V-distance between
S_{2,−0} and T_{2,−0}, given ρ_b = 0.5, is less than the same V-distance given ρ_a = 0.99, and Figure 4
has been drawn this way. If this were not true, then the less risk-averse utility function would
imply a higher probability of choosing S_{2,−0} under strong utility. But the two CRRA utility
functions in Figure 4 are not just any CRRA utility functions with ρ b = 0.5 and ρ a = 0.99: A
specific transformation has been chosen that equates first derivatives of utility functions at the
intermediate outcome 100. If this had not been done, the figure would not “nest” the V-distance
with ρ b = 0.5 inside of the V-distance with ρ a = 0.99, and nothing could be read off about
relative V-distance between S 2, −0 and T2, −0 for the two utility functions.7 The local risk aversion
measure − u ′′( z ) / u′( z ) may be thought of as measuring concavity relative to “a common unit
first derivative” at z by virtue of its mathematical form, which is what Figure 4 does visually.
Reflection on Figure 4 suggests that for a ≻_{mra} b ⇒ a ≻_{smra} b on some collection of contexts, we
might succeed with a transformation of u(z) that equates first derivatives u′(z) of all utility
functions at some fixed value, at some z̄ sufficiently greater than z̄_1 ≡ max_c {z_{1c}}, the maximum
of the minimum outcomes found in any of several contexts c for which we have choice data. If
this is not done, utility functions become arbitrarily flat over one or more contexts as the
coefficient of relative risk aversion gets large. This in turn implies that all V-distances between
lotteries on such contexts approach zero as the coefficients of relative risk aversion get large and,
hence, that all strong or strict utility choice probabilities approach 0.5 for sufficiently great
relative risk aversion on such contexts. The first derivative of the CRRA basic transformation is
u_b′(z) = z^{−ρ}; so multiplying the basic transformation by z̄^{ρ}, or (what is the same thing) e^{αρ}
where α = ln(z̄) for some z̄ > z̄_1, makes all CRRA utility functions have a common unit slope at
z̄ = e^{α}. Therefore, call u_{sc}(z | ρ) = e^{αρ} z^{1−ρ}/(1 − ρ) an SMRA-compatible CRRA
transformation.
7. The figure also chooses a “zero” (really, the c in c + d·u(z)) for each utility function such that u(100) = 100 for both
utility functions, but this is just for ease of visual comparison, since this choice disappears from any strong utility V-distance. The choice of d to match first derivatives at z = 100 is the important move.
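The construction works because the first derivative of u_sc is e^{αρ} z^{−ρ}, which equals 1 at z̄ = e^{α} for every ρ. A minimal numerical check, with α = ln(100) chosen purely for illustration:

```python
import math

def u_sc_prime(z, rho, alpha):
    # d/dz of u_sc(z | rho) = exp(alpha * rho) * z**(1 - rho) / (1 - rho).
    return math.exp(alpha * rho) * z ** (-rho)

alpha = math.log(100.0)  # common-slope point z = e**alpha = 100
for rho in (0.1, 0.5, 0.9, 2.0):
    assert abs(u_sc_prime(100.0, rho, alpha) - 1.0) < 1e-9
print("unit slope at z = 100 for every rho checked")
```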
If we compute CRRA EU V-distance using the SMRA-compatible transformation, the term
e^{αρ} may be factored out of the V-distance. The analogue of equation (1.10) then becomes

(3.1) P_{mc}^{n} = H(λ e^{αρ_n} Δ_b V_{mc}(ρ_n)).
This is a latent variable model with heteroscedastic error with respect to agents n: this is easily
seen by writing λ_n ≡ λ e^{αρ_n} ≡ 1/σ_n. This form yields ln(λ_n) ≡ ln(λ) + αρ_n, suggesting that if we
estimate something similar to equation (1.10) using the basic CRRA transformation, one subject
at a time, getting estimates λ̂_n and ρ̂_n for each subject n, we should find a linear relationship
between ln(λ̂_n) and ρ̂_n across subjects n. If this linear relationship has a slope α significantly
greater than ln(z̄), then a ≻_{mra} b ⇒ a ≻_{smra} b in the population from which subjects are drawn on all
contexts found in the experiment that generated the data.
Hey’s (2001) remarkable data set contains 500 lottery choices per subject, for fifty-three
subjects. This allows one to estimate model parameters separately for each subject n with relative
precision. The specification of choice probabilities actually used for this estimation adds a few
small features to equation (1.10) to take care of certain peripheral matters. It is
(3.2) P_{mc}^{n} = (1 − δ_{mc}){(1 − ω_n) H(λ_n Δ_b V_{mc}(ρ_n)) + ω_n/2} + δ_{mc}(1 − ω_n/2).
As is the case with several existing experimental data sets, Hey’s experiment contains a small
number of pairs mc in which S_{mc} first-order stochastically dominates T_{mc}. Letting Ω_{fosd} be the set
of all such FOSD pairs, let δ_{mc} = 1 ∀ mc ∈ Ω_{fosd} (δ_{mc} = 0 otherwise). It is well-known that strong
and strict utility do a poor job of predicting the rareness of violations of FOSD (Loomes and
Sugden 1998). Equation (3.2) takes account of this, yielding P_{mc}^{n} = 1 − ω_n/2 ∀ mc ∈ Ω_{fosd}. The
new parameter ω n is a tremble probability. Some randomness of observed choice has been
thought to arise from attention lapses or simple responding mistakes that are independent of pairs
mc, and adding a tremble probability as above takes account of this. It also provides a convenient
way to model the low probability of FOSD violations. Moffatt and Peters (2001) show that there
are significantly positive tremble probabilities even in data from experiments where there are no
FOSD pairs, such as Hey and Orme (1994). The specification in equation (3.2) follows Moffatt
and Peters in assuming that “tremble events” occur with probability ω_n independent of mc and,
in the event of a tremble, that choices of Smc or Tmc from pair mc are equiprobable.
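Specification (3.2) can be sketched as one function; `choice_prob` is a hypothetical name, and the Logistic c.d.f. again stands in for H:

```python
import math

def choice_prob(v_dist, lam, omega, fosd_pair):
    # Equation (3.2): strong utility with precision lam, tremble probability
    # omega, and near-compliance with first-order stochastic dominance (FOSD).
    if fosd_pair:                         # delta_mc = 1: P = 1 - omega / 2
        return 1.0 - omega / 2.0
    h = 1.0 / (1.0 + math.exp(-lam * v_dist))
    return (1.0 - omega) * h + omega / 2.0

# A tremble shrinks probabilities toward 0.5 without changing their direction:
print(choice_prob(2.0, 1.0, 0.0, False))  # no tremble
print(choice_prob(2.0, 1.0, 0.1, False))  # closer to 0.5, still above it
print(choice_prob(0.0, 1.0, 0.1, True))   # FOSD pair -> 0.95
```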
Figure 5 plots maximum likelihood estimates ln(λ̂_n) and ρ̂_n from equation (3.2) for each of
Hey’s (2001) fifty-three subjects, along with a robust regression line between them.8 It does
appear to be a remarkably linear relationship, as suggested by the heteroscedastic form
ln(λ_n) ≡ ln(λ) + αρ_n. The robust regression coefficient is α̂ = 4.24, with a 90% confidence
interval [4.11, 4.38]. There are sixteen MPS pairs in Hey’s experimental design: simple
computational methods may be used to find the minimum value z̄ > z̄_1 ≡ max_c {z_{1c}} = 50 for Hey’s
MPS pairs such that any α > ln(z̄) allows MRA to imply SMRA for all those pairs.9 This
minimum value is about z̄ = 59.7, so we need α > ln(59.7) = 4.09 for a ≻_{mra} b ⇒ a ≻_{smra} b. As can be
seen, the point estimate α̂ = 4.24 does indeed allow it, and the 90% confidence interval for α̂ just
allows one to reject (at 5%) the directional null α ≤ 4.09 in favor of the alternative that
a ≻_{mra} b ⇒ a ≻_{smra} b amongst Hey’s subjects for all MPS pairs in all of the experiment’s contexts.
Results are similar in all essential ways if, instead, CRRA RDEU is estimated using Prelec’s
(1998) one-parameter probability weighting function w(q) = exp(−[−ln(q)]^{γ_n}).
8. The robust regression technique used is due to Yohai (1987), and deals with outliers in both the dependent and
independent variable. This is appropriate here, since both are estimates.
9. This is done by a simple grid search. Starting at α = ln(z̄_1) = ln(50), α is incremented in small steps until
e^{αρ_n} Δ_b V_{mc}(ρ_n) is monotonically increasing in ρ (this is also checked by computational methods at each value of α)
for all MPS pairs mc in Hey’s design.
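The grid search in footnote 9 can be sketched as follows. Only the two MPS pairs quoted in the text are used (not Hey's full sixteen-pair design), so the threshold found here need not equal the paper's 4.09; the function names, grid and step size are mine:

```python
import math

def delta_v_basic(rho, pair):
    # Basic-transformation CRRA EU V-distance for one pair (rho != 1).
    s, t, zs = pair
    return sum((si - ti) * z ** (1.0 - rho) / (1.0 - rho)
               for si, ti, z in zip(s, t, zs))

def smallest_alpha(pairs, rho_grid, start, step=0.001):
    # Footnote 9's idea: raise alpha until exp(alpha * rho) * Delta_b V(rho)
    # is increasing across the whole grid for every pair.
    def ok(a):
        for pair in pairs:
            vals = [math.exp(a * r) * delta_v_basic(r, pair) for r in rho_grid]
            if any(b <= x for x, b in zip(vals, vals[1:])):
                return False
        return True
    alpha = start
    while not ok(alpha):
        alpha += step
    return alpha

# Only the two MPS pairs quoted in the text:
pair_m150 = ((0.0, 7/8, 1/8), (3/8, 1/8, 4/8), (0.0, 50.0, 100.0))
pair_m0 = ((0.0, 7/8, 1/8), (3/8, 1/8, 4/8), (50.0, 100.0, 150.0))
grid = [i / 100.0 for i in range(100)]
print(smallest_alpha([pair_m150, pair_m0], grid, start=math.log(50.0)))
```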
There is, however, an obvious theoretical difficulty with this “agent heteroscedasticity” solution. Suppose that immediately upon completion of Hey’s (2001) experiment, we had estimated the relationship in Figure 5 and then invited Hey’s fifty-three subjects back for another experimental session in which they made choices from lottery pairs on a new fifth context c* = (100,150,200). If we believe that the relationship shown in Figure 5 is “context-free,” that is, that this relationship between relative risk aversion and precision is a fixed feature of this population of subjects independent of any outcome context they might face, then a ≿_mra b ⇏ a ≿_smra b on the new context c*. This is clear because the estimated upper confidence limit on α̂, 4.38, implies a value of z = 79.8 < 100 = z_1c* (since e^4.38 = 79.8). Therefore, as argued earlier, λe^{4.38ρ}Δ_bV_mc*(ρ) will converge to zero for sufficiently large ρ and so cannot be monotonically increasing in ρ: α = 4.38 does not produce this monotonicity on the new context c*.
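This convergence is easy to verify numerically. The sketch below uses a hypothetical MPS pair on c* = (100,150,200) of my own construction (not a pair from Hey’s design) with CRRA expected utility and α = 4.38:

```python
import math

def u(z, rho):
    # CRRA utility (rho > 1 throughout this check)
    return z ** (1.0 - rho) / (1.0 - rho)

def delta_v(rho):
    # Hypothetical MPS pair on c* = (100, 150, 200):
    # safe lottery pays 150 for certain; risky is 50-50 between 100 and 200.
    return u(150, rho) - 0.5 * (u(100, rho) + u(200, rho))

alpha = 4.38   # so e^alpha = 79.8 < 100 = z1 on the new context
vals = [math.exp(alpha * rho) * delta_v(rho) for rho in (2, 5, 10, 20, 40)]
# The scaled V-difference shrinks toward zero at large rho instead of increasing.
```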
It should be clear that this problem is entirely general. Estimate or choose any finite value of α you wish, on whatever data you like: You can always specify a new context c* with z_1c* > e^α, and a ≿_mra b ⇏ a ≿_smra b on that new context, given that value of α. Heteroscedasticity with respect to agents is not, in the final analysis, a satisfactory way to make a ≿_mra b ⇒ a ≿_smra b. Although the transformation u_sc(z | ρ) = e^{αρ}z^{1−ρ}/(1 − ρ) is “SMRA compatible” on a given collection of contexts for a suitable choice of α, it cannot be SMRA compatible on all contexts with arbitrarily large values of z_1c. Of course one could make α “context-dependent,” but this is tantamount to abandoning strong and strict utility, since there will then be heteroscedasticity with respect to contexts. If this is what we must do in order to make a ≿_mra b ⇒ a ≿_smra b, it is better to approach it in a more basic theoretical way. This is what the contextual utility model does.
4. Contextual utility
Psychological motivation for contextual heteroscedasticity comes from signal detection and
stimulus discrimination experiments. In this literature, stimulus categorization errors are known
to increase with the subjective range taken by the stimulus or signal. For instance, Pollack (1953)
and Hartman (1954) presented subjects with a number of tones equally spaced over some range
of tones. The range of tones used is varied between subjects, but specific target tones are
encountered by all subjects. The results show that confusion of target tones is more common
when the range of tones encountered is wider.
Such observations gave rise to models of stimulus categorization and discrimination error
predicting that classification error variance increases with the subjective range of the stimulus
(Parducci 1965, 1974; Holland 1968; Gravetter and Lockhead 1973). In fact, a rough
proportionality between subjective stimulus range and the standard deviation of latent error
seemed descriptive of much data from categorization experiments, though some of the formal
models allowed for some deviation from this (e.g. Holland 1968 and Gravetter and Lockhead
1973). In categorization experiments, where stimuli are presented one at a time and subjects’ task
is to assign the stimulus to a category, the subjective range of the stimulus was usually taken to
be determined (after a period of adaptation) by the whole range of stimuli presented over the
course of the experiment—what one might call the “global context” of the stimulus.
The contextual utility model borrows the idea that the standard deviation of evaluative noise
is proportional to the subjective range of stimuli from this literature on categorization. However,
being a model of choice from lottery pairs rather than a model of categorization of singly presented stimuli, it assumes that choice pairs create their own “local context” or idiosyncratic subjective stimulus range, in the form of the range of outcome utilities found in the pair. We may
think of agents as perceiving lottery value on context c relative to the range of possible lottery
values on context c. This subjective stimulus range is assumed to be proportional to the
difference between the values of the best and worst possible lotteries on context c. Letting V(z | β_n) be subject n’s value of a degenerate lottery that pays z with certainty, the subjective stimulus range for subject n on any context c is assumed proportional to V(z_3c | β_n) − V(z_1c | β_n).
The contextual utility model for lottery choice pairs is the assumption that evaluative noise is proportional to this subjective stimulus range. For non-FOSD pairs on a three-outcome context, and ignoring trembles, contextual utility choice probabilities are:

(4.1)  P^n_mc = H( λ_n [V(S_mc | β_n) − V(T_mc | β_n)] / [V(z_3c | β_n) − V(z_1c | β_n)] ).
Under expected utility, this may be rewritten as

(4.2)  P^n_mc = H( λ [(s_m1 − t_m1)u^n(z_1c) + (s_m2 − t_m2)u^n(z_2c) + (s_m3 − t_m3)u^n(z_3c)] / [u^n(z_3c) − u^n(z_1c)] ).
Noting that (s_m1 − t_m1) = −(s_m2 − t_m2) − (s_m3 − t_m3), equation (4.2) may be rewritten as

(4.3)  P^n_mc = H( λ [ (s_m2 − t_m2)[u^n(z_2c) − u^n(z_1c)] / [u^n(z_3c) − u^n(z_1c)] + (s_m3 − t_m3) ] )
             = H( λ [(s_m2 − t_m2)u^n_urc(z_2c) + (s_m3 − t_m3)] ),

where the transformation τ = urc is a unit range transformation on each context c—that is, an affine transformation making u^n(z_1c) = 0 and u^n(z_3c) = 1 in each context c for each subject n. That is, u^n_urc(z) ≡ [u^n(z) − u^n(z_1c)] / [u^n(z_3c) − u^n(z_1c)]. This is another interpretation of contextual utility: It employs a “contextual unit range transformation” of the utility function.
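Equations (4.2) and (4.3) can be checked against one another numerically. The sketch below is my own illustration: the logistic c.d.f. is one typical choice of H, and the probability vectors and utilities are arbitrary.

```python
import math

H = lambda x: 1.0 / (1.0 + math.exp(-x))   # logistic c.d.f. (an assumed choice of H)

def prob_42(s, t, u, lam):
    """Choice probability in the form of (4.2); u = utilities of (z1c, z2c, z3c)."""
    ev_diff = sum((si - ti) * ui for si, ti, ui in zip(s, t, u))
    return H(lam * ev_diff / (u[2] - u[0]))

def prob_43(s, t, u, lam):
    """The same probability in the unit-range form of (4.3)."""
    u_ur = [(ui - u[0]) / (u[2] - u[0]) for ui in u]   # contextual unit range
    return H(lam * ((s[1] - t[1]) * u_ur[1] + (s[2] - t[2])))

s = (0.0, 1.0, 0.0)    # safe lottery: z2c for certain
t = (0.5, 0.0, 0.5)    # risky lottery: 50-50 between z1c and z3c
u = (0.2, 0.8, 1.0)    # arbitrary outcome utilities on the context
p1, p2 = prob_42(s, t, u, lam=2.0), prob_43(s, t, u, lam=2.0)
```

The two forms agree for any affine transformation of u, which is exactly the point of the contextual unit range transformation.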
Suppose that pair mc in equation (4.3) is an MPS pair. Pratt’s (1964, p. 128) Theorem 1 quickly shows that the contextual unit range transformation guarantees that a ≿_mra b ⇒ a ≿_smra b on every three-outcome context, as suggested by Figure 2. Here I quote the relevant parts of Pratt’s theorem, with notation modified to fit mine:

Theorem 1: Let r^n(z) [equal to −u″(z)/u′(z) for utility function u^n] be the local risk aversion…corresponding to the utility function u^n, n = a,b. Then the following conditions are equivalent, in either the strong form (indicated in brackets), or the weak form (with the bracketed material omitted).

(a)  r^a(z) ≥ r^b(z) for all z [and > for at least one z in every interval]…

(e)  [u^a(z_3c) − u^a(z_2c)] / [u^a(z_2c) − u^a(z_1c)] ≤ [<] [u^b(z_3c) − u^b(z_2c)] / [u^b(z_2c) − u^b(z_1c)] for all z_1c, z_2c, z_3c with z_1c < z_2c < z_3c.

Adding unity to both sides of (e) in the form [u^n(z_2c) − u^n(z_1c)] / [u^n(z_2c) − u^n(z_1c)], and then taking reciprocals of both sides, we get the following equivalence from the theorem:

(4.4)  r^a(z) > r^b(z) ∀ z ∈ [z_1c, z_3c] ⇔ [u^a(z_2c) − u^a(z_1c)] / [u^a(z_3c) − u^a(z_1c)] > [u^b(z_2c) − u^b(z_1c)] / [u^b(z_3c) − u^b(z_1c)] ∀ z_2c ∈ (z_1c, z_3c).

Since u^n_urc(z_2c) = [u^n(z_2c) − u^n(z_1c)] / [u^n(z_3c) − u^n(z_1c)], equation (4.4) shows that

(4.5)  r^a(z) > r^b(z) ∀ z ∈ [z_1c, z_3c] ⇔ u^a_urc(z_2c) > u^b_urc(z_2c) ∀ z_2c ∈ (z_1c, z_3c).
Equation (4.3) shows that the EU V-distance between S_mc and T_mc under the contextual unit range transformation is (s_m2 − t_m2)u^n_urc(z_2c) + (s_m3 − t_m3), which is obviously increasing in u^n_urc(z_2c) for all MPS pairs since (s_m2 − t_m2) > 0 in these pairs. Therefore, by equation (4.5), it is increasing in local risk aversion r^n(z) as well. Therefore, the contextual utility model implies what strong and strict utility cannot: That a ≿_mra b ⇒ a ≿_smra b on all three-outcome contexts when λ is constant across agents.
We should not necessarily expect homoscedasticity with respect to agents; it is quite possible
that some agents are noisier decision makers than others so that λ varies across agents. Another
way to state the previous result, allowing for the possibility that precision varies across agents, is
as follows (the proof is obvious and so omitted):
Proposition: Consider two agents such that λ_a ≥ λ_b and r^a(z) ≥ r^b(z) for all z [and > for at least one z on every context]. Then under the contextual utility model, a ≿_smra b. Put differently, a ≿_mra b and λ_a ≥ λ_b ⇒ a ≿_smra b under the contextual utility model.
Contextual utility disentangles variations in risk attitude from variations in precision because its form implies that precision does unique descriptive work. This is easily seen by noting that u^n_urc(z_2c) → 1 as r^n(z) → ∞. Therefore, as local risk aversion becomes infinite, equation (4.3) with agent-dependent precision λ_n becomes

(4.6)  P^n_mc = H( λ_n [s_m2 + s_m3 − t_m2 − t_m3] ).

Since s_m2 + s_m3 − t_m2 − t_m3 is obviously finite, contextual utility does not imply that P^n_mc → 1 as r^n(z) → ∞ for all MPS pairs; it just implies that P^n_mc is increasing in r^n(z) for given λ_n. This may seem peculiar at first, but from an econometric viewpoint it implies that precision λ_n is not simply a nuisance parameter, but rather plays a unique descriptive role in the model: Extreme choice probabilities, that is those very close to zero or unity, necessarily require appropriate variations in λ_n for their explanation (with typical choices of H such as the Standard Normal or Logistic); variation in r^n(z) alone will not explain all extreme choice probabilities under contextual utility.
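The bounded-probability point can be made concrete with a small numerical sketch (logistic H and all numbers are assumed for illustration): as u^n_urc(z_2c) → 1, the choice probability rises only to H(λ_n[s_m2 + s_m3 − t_m2 − t_m3]), strictly below one.

```python
import math

H = lambda x: 1.0 / (1.0 + math.exp(-x))    # logistic c.d.f., an assumed choice

s, t, lam = (0.0, 1.0, 0.0), (0.5, 0.0, 0.5), 2.0   # safe vs. risky MPS pair
probs = [H(lam * ((s[1] - t[1]) * u2 + (s[2] - t[2])))
         for u2 in (0.9, 0.99, 0.999, 0.9999)]      # u_urc(z2c) -> 1
limit = H(lam * (s[1] + s[2] - t[1] - t[2]))        # here H(2 * 0.5) = H(1)
# probs increase toward limit (about 0.731), which stays strictly below one.
```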
When combined with either a constant absolute risk aversion (or CARA) utility function u(z) = −e^{−rz} or with a CRRA utility function, contextual utility implies some theoretically attractive invariance properties of choice probabilities. Call c + k = (z_1c + k, z_2c + k, z_3c + k) and kc = (kz_1c, kz_2c, kz_3c) additive and proportional shifts of context c = (z_1c, z_2c, z_3c), respectively. Since u(z + k) = −e^{−r(z+k)} = −e^{−rk}e^{−rz} for CARA, the contextual unit range transformation implies that

(4.7)  u^n_ur,c+k(z + k) ≡ (−e^{−rk}e^{−rz} + e^{−rk}e^{−rz_1c}) / (−e^{−rk}e^{−rz_3c} + e^{−rk}e^{−rz_1c}) ≡ (−e^{−rz} + e^{−rz_1c}) / (−e^{−rz_3c} + e^{−rz_1c}) ≡ u^n_urc(z).

Equation (4.3) then implies that P^n_m,c+k ≡ P^n_mc: That is, contextual utility and CARA EU together imply that choice probabilities are invariant to an additive context shift. Similarly, since u(kz) = (kz)^{1−ρ} = k^{1−ρ}z^{1−ρ} for CRRA, we have

(4.8)  u^n_ur,kc(kz) ≡ (k^{1−ρ}z^{1−ρ} − k^{1−ρ}z_1c^{1−ρ}) / (k^{1−ρ}z_3c^{1−ρ} − k^{1−ρ}z_1c^{1−ρ}) ≡ (z^{1−ρ} − z_1c^{1−ρ}) / (z_3c^{1−ρ} − z_1c^{1−ρ}) ≡ u^n_urc(z).

Equation (4.3) then implies that P^n_m,kc ≡ P^n_mc: That is, contextual utility and CRRA EU together imply that choice probabilities in pairs are invariant to a proportional context shift. These invariance properties of choice probabilities under contextual utility echo what is well-known about structural CARA and CRRA preferences, namely that their preference directions are invariant to additive and proportional context shifts, respectively. Strict utility shares these two shift invariance properties with contextual utility, but strong utility does not.¹⁰
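Both invariance properties can be verified numerically. Since only the unit-range utility u^n_urc(z_2c) enters (4.3), it suffices to check that this quantity is unchanged by the relevant context shift; the parameter values below are my own illustrative choices.

```python
import math

def unit_range_mid(u1, u2, u3):
    """u_urc(z2c): the only utility quantity entering equation (4.3)."""
    return (u2 - u1) / (u3 - u1)

ctx = (10.0, 30.0, 60.0)                     # an arbitrary context (z1c, z2c, z3c)

# CARA: an additive shift c + k leaves the unit-range utility unchanged (4.7).
r, k = 0.05, 25.0
cara = lambda z: -math.exp(-r * z)
assert abs(unit_range_mid(*[cara(z) for z in ctx])
           - unit_range_mid(*[cara(z + k) for z in ctx])) < 1e-12

# CRRA: a proportional shift kc leaves it unchanged (4.8).
rho, m = 0.7, 3.0
crra = lambda z: z ** (1.0 - rho) / (1.0 - rho)
assert abs(unit_range_mid(*[crra(z) for z in ctx])
           - unit_range_mid(*[crra(m * z) for z in ctx])) < 1e-12
```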
5. An Empirical Comparison of Contextual Utility and its Competitors
So far, I have been concerned with the theoretical coherence of structural and stochastic
definitions of the relation “more risk averse.” The question naturally arises: Which stochastic
model, when combined with EU or RDEU, actually explains and predicts binary choice under
risk best? To answer this question, we need a suitable data set. Strong utility and contextual
utility are simple reparameterizations of one another for any one context. Therefore, data from
any experiment where subjects make choices on a single context, such as Loomes and Sugden’s
(1998) experiment, are not suitable. The experiment of Hey and Orme (1994), hereafter HO, is
suitable since all subjects make choices from pairs defined on four distinct contexts.
The HO experiment has another desirable design feature that is unique among experiments
with multiple contexts: The same pairs of probability distributions are used to construct the
lottery pairs on all four of its contexts. Therefore, when models fail to explain choices across
multiple contexts in the HO data, we cannot attribute this to other differences in lottery pairs
across contexts: The failure must be due to the model’s generalizability across contexts.
The HO experiment builds lotteries from four equally spaced non-negative outcomes including zero, in increments of 10 UK pounds. Letting “1” represent 10 UK pounds, the experiment uses the outcome vector (0,1,2,3). Exactly one-fourth of observed lottery choices in the HO data are choices from pairs on each of the four possible three-outcome contexts from this vector. In keeping with past notation, these four contexts are denoted by c ∈ {−0,−1,−2,−3}, where −0 = (1,2,3), −1 = (0,2,3), −2 = (0,1,3) and −3 = (0,1,2).

¹⁰ Contextual utility and strong utility EU models share a different property which might be called the “phantom common ratio effect” or PCRE, while strict utility and random preference (discussed below) EU models do not have this property. See Loomes (2005, pp. 303-305) for a lucid discussion of the manner in which strong utility EU produces the PCRE (the reasoning is identical for contextual utility) and why random preference EU does not. But because of strict utility’s difference-of-logarithms form, common ratio changes drop out of that difference, so that strict utility produces no PCRE: Strict utility EU produces a choice probability invariance to common ratio changes that mimics random preference EU.
HO estimate a variety of structures combined with strong utility, and estimate these models individually—that is, each structure is estimated separately for each subject. Additionally, for all structures that specify a utility function on outcomes u(z), HO take a nonparametric approach to the utility function. Letting u^n_z ≡ u^n(z), HO set u^n_0 = 0 and λ = 1, and estimate u^n_1, u^n_2 and u^n_3 directly, allowing the utility function u(z) to take on arbitrary shapes across the outcome vector (0,1,2,3). (Given the latent variable form of strong utility models and the affine transformation properties of u(z), just three of the five parameters λ, u^n_0, u^n_1, u^n_2 and u^n_3 are identified.) HO found that estimated utility functions overwhelmingly fall into two classes: Concave utility functions, and inflected utility functions that are concave over the context (0,1,2) but convex over the context (1,2,3). The latter class is quite common, accounting for thirty to forty percent of subjects (depending on the structure estimated). Because of this, I follow HO in avoiding a simple parametric functional form such as CARA or CRRA that forces concavity or convexity across the entire outcome vector (0,1,2,3), instead adopting their nonparametric treatment of utility functions in strong, strict and contextual utility models. However, I will instead set u^n_0 = 0 and u^n_1 = 1, and estimate λ, u^n_2 and u^n_3.
To account for any reliable heterogeneity of subject behavior, however, I part company with
HO’s individual estimation and use a random parameters approach for estimating and comparing
models, as done by Loomes, Moffatt and Sugden (2002) and Moffatt (2005). Individual
estimation does avoid distributional assumptions made by random parameters methods, and
seems well-behaved at HO’s sample sizes when we confine attention to “in-sample” comparisons
of model fit, as HO did. However, Monte Carlo analysis shows that with finite samples of the
HO size, comparisons of the out-of-sample predictive performance of models can be extremely
misleading with individual estimation and prediction (Wilcox 2007). I want to address both the
in-sample fit and out-of-sample predictive performance of models, so this issue matters here.
This is why I choose a random parameters estimation approach.
The HO experiment allowed subjects to express indifference between lotteries. HO model this with an added “threshold of discrimination” parameter within a strong utility model. An alternative parameter-free approach, and the one I take here, treats indifference in a manner suggested by decision theory, where the indifference relation S_mc ~^n T_mc is defined as the intersection of two weak preference relations, i.e. “S_mc ≿^n T_mc and T_mc ≿^n S_mc.” This suggests treating indifference responses as two responses in the likelihood function—one of S_mc being chosen from mc, and another of T_mc being chosen from mc—but dividing that total log likelihood by two since it is really based on just one independent observation. Formally, the definite choice of S_mc adds ln(P_mc) to the total log likelihood; the definite choice of T_mc adds ln(1 − P_mc) to that total; and indifference adds [ln(P_mc) + ln(1 − P_mc)]/2 to that total. See also Papke and Wooldridge (1996) and Andersen et al. (2007) for related justifications of this approach.
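In code, this treatment of indifference amounts to the following likelihood contribution per observation (an illustrative sketch; the response labels are mine):

```python
import math

def loglik_contribution(response, p_mc):
    """Log-likelihood contribution of one observed response in pair mc,
    where p_mc is the model's probability that S_mc is chosen."""
    if response == "S":                # definite choice of S_mc
        return math.log(p_mc)
    if response == "T":                # definite choice of T_mc
        return math.log(1.0 - p_mc)
    if response == "indifferent":      # S_mc ~ T_mc: half of each definite response
        return 0.5 * (math.log(p_mc) + math.log(1.0 - p_mc))
    raise ValueError(f"unknown response {response!r}")

# A tiny example: three definite responses and one indifference at p_mc = 0.7.
total = sum(loglik_contribution(r, 0.7) for r in ("S", "S", "T", "indifferent"))
```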
So far I have confined my attention to strong, strict and contextual utility. However, another
stochastic modeling approach, known as the random preference model, appears in both
contemporary experimental and theoretical work on discrete choice under risk (Loomes and
Sugden 1995, 1998; Carbone 1997; Loomes, Moffatt and Sugden 2002; Gul and Pesendorfer
2006). Econometrically, the random preference model views stochastic choice as arising from
randomness of structural parameters. We think of each agent n as having an urn filled with structural parameter vectors β_n. Following Carbone (1997) for instance, we may set u^n_0 = 0 and u^n_1 = 1 and think of EU with random preferences as a situation where β_n = (u^n_2, u^n_3), with u^n_2 ≥ 1 and u^n_3 ≥ u^n_2, and where each subject n has an urn filled with such utility vectors (u^n_2, u^n_3). At each trial of any pair mc, the subject draws one of these utility vectors from her urn (with replacement) and uses it to calculate both V(S_mc | β_n) and V(T_mc | β_n) without error, choosing S_mc iff V(S_mc | β_n) ≥ V(T_mc | β_n). Let F_β(x | α_n) be the joint c.d.f. of β in agent n’s “random preference urn,” conditioned on some vector α_n of parameters determining the shape of the distribution F_β. Then under random preferences,

(5.1)  P^n_mc = Pr( V(S_mc | β_n) − V(T_mc | β_n) ≥ 0 | F_β(x | α_n) ).

Put differently, if B_mc = {β | V(S_mc | β) − V(T_mc | β) ≥ 0}, then P^n_mc = Pr(β_n ∈ B_mc).¹¹

Loomes, Moffatt and Sugden (2002) show how to specify a random preferences RDEU model for estimation when lottery pairs are defined on a single context. However, econometric specification of a random preference RDEU model across multiple contexts is more difficult. Building both on Loomes, Moffatt and Sugden and certain insights of Carbone (1997), Appendix A shows how this may be done for the three contexts −0, −2 and −3, but also shows why further extending random preferences to cover the other context −1 is not a transparent exercise for the RDEU structure. For this reason, I confine all of my estimations to choices in the HO data on the contexts −0, −2 and −3. It is simple to specify strong, strict or contextual utility models for both EU and RDEU for all four contexts; but then we could not compare them to random preferences.

¹¹ For simplicity’s sake, assume that parameter vectors producing indifference in any pair mc have zero measure for all n, so that the sets B_mc and B′_mc = {β | V(S_mc | β) − V(T_mc | β) > 0} have equal measure. See Gul and Pesendorfer (2006) for a formal discussion of this assumption.
Appendix B gives details of the random parameters econometric specifications used for
strong utility, strict utility, contextual utility and random preference versions of both EU and
RDEU structures, for each of the HO contexts −0, −2 and −3. I perform two different kinds of
comparisons of model fit. The first kind (very common in this literature) are “in-sample fit
comparisons.” Models are estimated on all three of the HO contexts (−0, −2 and −3) and the
resulting log likelihoods for each model across all three contexts are compared.
The second kind of comparison, which is rare in this literature but well-known generally,
compares the “out-of-sample” fit of models—that is, their predictive performance. For these
comparisons, models are estimated on just the two HO contexts −2 and −3, that is contexts
(0,1,3) and (0,1,2), and these estimates are used to predict choice probabilities and calculate log
likelihoods of observed choices on HO context −0, that is context (1,2,3). This is something
more than a simple out-of-sample prediction, which could simply be a prediction to new choices
made on the same contexts used for estimation: It is additionally an “out-of-context” prediction.
This particular kind of out-of-sample fit comparison may be quite difficult in the HO data.
Relatively safe choices are the norm in contexts −2 and −3 of the HO data: The mean proportion
of safe choices made by HO subjects in these contexts is 0.764, and at the individual level this
proportion exceeds ½ for seventy of the eighty subjects. But relatively risky choices are the norm
in context −0 of the HO data: The mean proportion of safe choices there is just 0.379, and falls
short of ½ for fifty-eight of the eighty subjects. Recall that we cannot attribute this to differences
in the probability vectors that make up the lottery pairs in the different HO contexts, since the
HO experiment holds this constant across contexts. This out-of-sample prediction task is going to
be difficult: From largely safe choices in the “estimation contexts” −2 and −3, the models need to
predict largely risky choices in the “prediction context” −0.
Table 1 displays both the in-sample and out-of-sample log likelihoods for the eight models.
The top four rows are EU models, and the bottom four rows are RDEU models; for each
structure, the four rows show results for strong utility, strict utility, contextual utility and random
preferences. The first column shows total in-sample log likelihoods, and the second column
shows total out-of-sample log likelihoods. Contextual utility always produces the highest log
likelihood, whether it is combined with EU or RDEU, and whether we look at in-sample or out-of-sample log likelihoods (though the log likelihood advantage of contextual utility is most
pronounced in the out-of-sample comparisons). Buschena and Zilberman (2000) and Loomes,
Moffatt and Sugden (2002) point out that the best-fitting stochastic model may depend on the
structure estimated and offer empirical illustrations of this point. As a general econometric point,
I heartily agree with them. Yet in Table 1 contextual utility is the best stochastic model
regardless of whether we view the matter from the perspective of the EU or RDEU structures, or
from the perspective of in-context or out-of-context fit.
It is fair to say that in the last quarter century, decision theory has been relatively dominated
by structural theory innovation rather than stochastic model innovation. Table 1 has something to
say about this. Let us first examine the in-sample fit column. Holding stochastic models constant,
the maximum improvement in log likelihood associated with moving from the EU to the RDEU
is 142.02 (with strict utility), and the improvement is 106.64 for the best-fitting stochastic model
(contextual utility). Holding structures constant instead, the maximum improvement in log
likelihood associated with changing the stochastic model is 151.48 (with the EU structure,
switching from strict to contextual utility), but this is atypical: Omitting strict utility models,
which have an unusually poor in-sample fit, the maximum improvement is 51.28 (with the EU
structure, switching from strong to contextual utility) and is usually no more than half that.
Therefore, except for the especially poor strict utility fits, in-sample comparisons make choices
of stochastic models appear to be a sideshow relative to choices of structure.
This appearance is reversed when we look at out-of-sample comparisons—that is, predictive
power. Looking now at the out-of-sample fit column, notice first that under strict utility, RDEU
actually fits worse than EU does. We should be getting the impression by now, however, that
strict utility is an unusually poor performer, so let us henceforth omit it from consideration.
Among the remaining three stochastic models, the maximum out-of-sample fit improvement
associated with switching from EU to RDEU is 21.19 (for contextual utility). Holding structures
constant instead, the maximum out-of-sample fit difference between the stochastic models (again
omitting strict utility) is 113.39 (for the RDEU structure, switching from strong to contextual
utility). In the realm of out-of-sample prediction, then, structure plays Rosencrantz and
Guildenstern to the stochastic model Hamlet. There will be honest differences of opinion as to
whether decision theory should preoccupy itself with explanation or prediction, but it seems that
those who have more interest in the latter may want to think more about stochastic models, as
repeatedly urged by Hey (Hey and Orme 1994; Hey 2001; Hey 2005).
Table 2 reports the results of a more formal comparison between the stochastic models, conditional on each structure. Let D̃_n denote the difference between the estimated log likelihoods (in-sample or out-of-sample) from a pair of models, for subject n. Vuong (1989) provides an asymptotic justification for treating a z-score based on the D̃_n as following a normal distribution under the hypothesis that two non-nested models are equally good. The statistic is computed as z = Σ_{n=1}^{N} D̃_n / (s̃_D √N), where s̃_D is the sample standard deviation of the D̃_n across subjects n (calculated without the usual adjustment for a degree of freedom) and N is the number of subjects. Table 2 reports these z-statistics, and associated p-values against the null of equally
good fit, with a one-tailed alternative that the directionally better fit is significantly better. While
contextual utility is always directionally better than its competitors, no convincingly significant
ordering of the stochastic models emerges from the in-sample comparisons shown in the left half
of Table 2, though strict utility is clearly significantly worse than the other three stochastic
models. Contextual utility shines, though, in the out-of-sample fit comparisons in the right half
of Table 2, regardless of whether the structure is EU or RDEU, where it beats the other three
stochastic models with strong significance.
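The z-statistic just described is straightforward to compute; the sketch below uses hypothetical per-subject log likelihoods for two non-nested models (positive z favors the first model):

```python
import math

def vuong_z(loglik_a, loglik_b):
    """Vuong-style z-score from per-subject log-likelihood differences
    D_n = loglik_a[n] - loglik_b[n], using the 1/N (no degree-of-freedom
    adjustment) convention for the standard deviation, as in the text."""
    d = [a - b for a, b in zip(loglik_a, loglik_b)]
    n = len(d)
    mean = sum(d) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in d) / n)
    return sum(d) / (sd * math.sqrt(n))

# Hypothetical per-subject log likelihoods:
z = vuong_z([-10.2, -9.8, -11.0, -10.5], [-10.9, -10.1, -11.4, -11.2])
```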
6. Conclusions
A great deal of the scholarly conversation about decision under risk has been about structure,
but there has been resurgent interest in the stochastic part of decision under risk. This has been
driven both by theoretical questions and empirical findings. Theoretically, some or all of what
passes for “an anomaly” relative to some structure (usually EU) can sometimes be attributed to
stochastic models rather than the structure in question. This is an old point, stretching back at
least to Becker, DeGroot and Marschak’s (1963a,1963b) observation that violations of
betweenness are precluded by some stochastic versions of EU (random preferences) but
predicted by other stochastic versions of EU (strong utility). But this general concern has been
resurrected by many writers; Loomes (2005), Gul and Pesendorfer (2006) and Blavatskyy (2007)
are just three relatively recent (but very different) examples. Empirically, we know from a (now
quite large) body of experimental evidence that the stochastic part of decision under risk is
surprisingly large, in the sense that retest reliabilities show that choice probabilities are quite
distant from zero or one. Additionally, a “new structural econometrics” has emerged over the last
decade in which structures like EU, RDEU and cumulative prospect theory are married to some
stochastic model for the purpose of estimating structural parameters.
My study is not inspired by any worry that choice anomalies might be attributable to
stochastic reasons, though I appreciate this point. Rather, I ask what the empirical meaning of estimated structural parameters is in the new structural econometrics of risky choice. I find that
structural parameters meant to represent the degree of risk aversion in Pratt’s (1964) structural
sense of “more risk averse” cannot imply a theoretically attractive definition of the relation
“stochastically more risk averse” across all choice contexts, when estimation uses strong or strict
utility models. I conclude that strong and strict utility are deeply flawed stochastic models for the
econometrics of discrete choice under risk. The contextual utility model eliminates this flaw
without introducing extra parameters. Empirically, at least in the Hey and Orme (1994) data, its
in-sample performance is as good as that of strong utility, strict utility and random preferences;
and its out-of-sample predictive performance is significantly the best of this particular collection
of stochastic models.
As a closing thought, contextual utility has novel implications for strength of incentives in
experimental designs. It suggests that the marginal utility improvement from taking risky action
X rather than risky action Y is only a part of what governs the strength of perceived incentives:
The other important part is the subjective range of available utility levels. This implies that, in
general, scaling up payoffs in experiments may be a relatively ineffective means of achieving
payoff dominance. If instead we can change a design so as to shrink the subjective range of
available utilities, holding marginal utility improvements constant, this may be a more effective
way of improving subjective payoff dominance in a design.
References
Aitchison, J. (1963). Inverse Distributions and Independent Gamma-Distributed Products of
Random Variables. Biometrika 50, 505-508.
Andersen, S., Harrison, G. W., Lau, M. I., and Rutstrom, E. E. (2007). Eliciting Risk and Time
Preferences. Working paper 5-24, University of Central Florida Department of Economics.
Becker, G. M., DeGroot, M. H. and Marschak, J. (1963a). Stochastic Models of Choice Behavior. Behavioral Science 8, 41-55.
Becker, G. M., DeGroot, M. H. and Marschak, J. (1963b). An Experimental Study of Some
Stochastic Models for Wagers. Behavioral Science 8, 199-202.
Blavatskyy, P. R. (2007). Stochastic Choice Under Risk. Journal of Risk and Uncertainty
(forthcoming).
Block, H. D. and Marschak, J. (1960). Random Orderings and Stochastic Theories of Responses.
In I. Olkin et al. (Eds.), Contributions to Probability and Statistics: Essays in Honor of
Harold Hotelling (pp. 97-132). Stanford CA: Stanford University Press.
Buschena, D. E. and Zilberman, D. (2000). Generalized Expected Utility, Heteroscedastic Error,
and Path Dependence in Risky Choice. Journal of Risk and Uncertainty 20, 67-88.
Camerer, C. (1989). An Experimental Test of Several Generalized Expected Utility Theories.
Journal of Risk and Uncertainty 2, 61-104.
Carbone, E. (1997). Investigation of Stochastic Preference Theory Using Experimental Data.
Economics Letters 57, 305-311.
Chew, S. H., Karni, E. and Safra, Z. (1987). Risk Aversion in the Theory of Expected Utility
with Rank-Dependent Preferences. Journal of Economic Theory 42, 370-81.
Cox, J. C., and Sadiraj, V. (2006) Small- and large-stakes risk aversion: Implications of
concavity calibration for decision theory. Games and Economic Behavior 56, 45-60.
Debreu, G. (1958). Stochastic Choice and Cardinal Utility. Econometrica 26, 440-444.
Edwards, W. (1954). A Theory of Decision Making. Psychological Bulletin 51, 380-417.
Gravetter, F., and Lockhead, G. R. (1973). Criterial Range as a Frame of Reference for Stimulus
Judgment. Psychological Review 80, 203-216.
Gul, F., and Pesendorfer, W. (2006). Random Expected Utility. Econometrica 74, 121-146.
Harrison, G. W., and Rutström, E. E. (2005). Expected Utility Theory and Prospect Theory: One
Wedding and a Decent Funeral. Working paper 05-18, Department of Economics, College of
Business Administration, University of Central Florida.
Hartman, E. B. (1954). The Influence of Practice and Pitch-Distance between Tones on the
Absolute Identification of Pitch. American Journal of Psychology 67, 1-14.
Hey, J. D. (1995). Experimental Investigations of Errors in Decision Making Under Risk.
European Economic Review 39, 633-640.
Hey, J. D. (2001). Does Repetition Improve Consistency? Experimental Economics 4, 5-54.
Hey, J. D. and Carbone, E. (1995). Stochastic Choice with Deterministic Preferences: An
Experimental Investigation. Economics Letters 47, 161–167.
Hey, J. D. and Orme, C. (1994). Investigating Parsimonious Generalizations of Expected Utility
Theory Using Experimental Data. Econometrica 62, 1291-1329.
Hutchinson, T. P. and Lai, C. D. (1990). Continuous Bivariate Distributions, Emphasizing
Applications. Adelaide, Australia: Rumsby Scientific Publishers.
Judd, K. L. (1998). Numerical Methods in Economics. Cambridge, MA: MIT Press.
Loomes, G. (2005). Modeling the Stochastic Component of Behaviour in Experiments: Some
Issues for the Interpretation of Data. Experimental Economics 8, 301-323.
Loomes, G. and Sugden, R. (1995). Incorporating a Stochastic Element into Decision Theories.
European Economic Review 39, 641-648.
Loomes, G. and Sugden, R. (1998). Testing Different Stochastic Specifications of Risky Choice.
Economica 65, 581-598.
Loomes, G., Moffatt, P. and Sugden, R. (2002). A Microeconometric Test of Alternative
Stochastic Theories of Risky Choice. Journal of Risk and Uncertainty 24, 103-130.
Luce, R. D. and Suppes, P. (1965). Preference, Utility and Subjective Probability. In R. D. Luce,
R. R. Bush and E. Galanter (Eds.), Handbook of Mathematical Psychology, Vol. III (pp. 249-410). New York: Wiley.
Machina, M. (1985). Stochastic Choice Functions Generated from Deterministic Preferences
over Lotteries. Economic Journal 95, 575-594.
McKay, A. T. (1934). Sampling from Batches. Supplement to the Journal of the Royal Statistical
Society 1, 207-216.
Moffatt, P. (2005). Stochastic Choice and the Allocation of Cognitive Effort. Experimental
Economics 8, 369-388.
Moffatt, P. and Peters, S. (2001). Testing for the Presence of a Tremble in Economics
Experiments. Experimental Economics 4, 221-228.
Papke, L. E., and Wooldridge, J. M. (1996). Econometric Methods for Fractional Response
Variables with an Application to 401(K) Plan Participation Rates. Journal of Applied
Econometrics 11, 619-632.
Parducci, A. (1965). Category Judgment: A range-frequency model. Psychological Review 72,
407-418.
Parducci, A. (1974). Contextual Effects: A Range-Frequency Analysis. In E. C. Carterette & M.
P. Friedman, eds., Handbook of Perception (Vol 2). New York: Academic Press.
Pollack, I. (1953). The Information of Elementary Auditory Displays, II. Journal of the
Acoustical Society of America 25, 765-769.
Pratt, J. W. (1964). Risk Aversion in the Small and in the Large. Econometrica 32, 122-136.
Prelec, D. (1998). The Probability Weighting Function. Econometrica 66, 497-527.
Rothschild, M. and Stiglitz, J. E. (1970). Increasing Risk I: A Definition. Journal of Economic
Theory 2, 225-243.
Starmer, C. and Sugden, R. (1989). Probability and Juxtaposition Effects: An Experimental
Investigation of the Common Ratio Effect. Journal of Risk and Uncertainty 2, 159-78.
Vuong, Q. (1989). Likelihood Ratio Tests for Model Selection and Non-Nested Hypotheses.
Econometrica 57, 307–333.
Wilcox, N. (2007). Predicting risky choices out-of-context: A critical stochastic modeling primer
and Monte Carlo study. University of Houston Department of Economics Working Paper,
available at
http://www.class.uh.edu/econ/faculty/nwilcox/papers/predicting_risky_choices_may2007.pdf
Wooldridge, J. M. (2002). Econometric Analysis of Cross Section and Panel Data. Cambridge
MA: MIT Press.
Appendix A: Random Preference RDEU Across Multiple Contexts
Since EU is a special case of RDEU, I begin with the harder case of a random preferences
specification for RDEU; it will imply a specification for EU. The specification generalizes that
of Loomes, Moffatt and Sugden (2002) for a single context, and builds on insights of Carbone
(1997) concerning random preference EU models across multiple contexts. Like Loomes,
Moffatt and Sugden, I simplify the problem by assuming that any parameters of the probability
transformation function w(q) of RDEU are nonstochastic, that is, that the only structural
parameters that vary in a subject’s “random preference urn” are her outcome utilities.
Rank-dependent expected utility or RDEU, with Prelec's (1998) one-parameter form for the
probability weighting function, that is, w(q | γ^n) = exp(−[−ln(q)]^{γ^n}) ∀ q ∈ (0,1), w(0 | γ^n) = 0
and w(1 | γ^n) = 1, is the structure

(A.1)  V(S_mc | u^n(z), γ^n) = Σ_{i=1}^{3} [w(Σ_{j≥i} s_mj | γ^n) − w(Σ_{j>i} s_mj | γ^n)] u^n(z_ic).
Substituting this into equation (5.1), we have

(A.2)  P_mc^n = Pr( Σ_{i=1}^{3} W_mi(γ^n) u^n(z_ic) ≥ 0 | F_u(x | α^n) ), where

W_mi(γ^n) = w(Σ_{j≥i} s_mj | γ^n) − w(Σ_{j>i} s_mj | γ^n) − w(Σ_{j≥i} t_mj | γ^n) + w(Σ_{j>i} t_mj | γ^n).
Since W_m1(γ^n) ≡ −W_m2(γ^n) − W_m3(γ^n), and assuming strict monotonicity of utility in outcomes so
that we may divide the inequality in (A.2) through by u^n(z_2c) − u^n(z_1c), equation (5.3) becomes

(A.3)  P_mc^n = Pr( W_m2(γ^n) + W_m3(γ^n) u_muc^n(z_3c) ≥ 0 | F_u(x | α^n) ), where

u_muc^n(z_3c) ≡ [u^n(z_3c) − u^n(z_1c)] / [u^n(z_2c) − u^n(z_1c)],
W_m2(γ^n) = w(s_m2 + s_m3 | γ^n) − w(s_m3 | γ^n) − [w(t_m2 + t_m3 | γ^n) − w(t_m3 | γ^n)], and
W_m3(γ^n) = w(s_m3 | γ^n) − w(t_m3 | γ^n).
Notice that u_muc^n(z_3c) > 1 is a "contextual minimum unit transformation" of the utility of the
maximum outcome z_3c in context c. Let v_c^n ≡ u_muc^n(z_3c) − 1; according to equation (A.3) and the
random preference model, v_c^n ∈ R+ is an underlying random variable containing all choice-relevant
stochastic information about the agent's random utility function u^n(z) for choices on
context c. Let G_vc(x | α_c^n) be the c.d.f. of v_c^n. Adopt the convention that lottery names in pairs are
always chosen so that t_m3 − s_m3 > 0,¹² as is true under the previous definition of MPS pairs, so
that −W_m3(γ^n) = w(t_m3 | γ^n) − w(s_m3 | γ^n) > 0 for all pairs. We can then rewrite (A.3) to make the
change of random variables described above explicit:
(A.4)  P_mc^n = Pr( v_c^n ≤ −[W_m2(γ^n) + W_m3(γ^n)] / W_m3(γ^n) | G_vc(x | α_c^n) ), or

P_mc^n = G_vc( [w(s_m2 + s_m3 | γ^n) − w(t_m2 + t_m3 | γ^n)] / [w(t_m3 | γ^n) − w(s_m3 | γ^n)] | α_c^n ).
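As a numerical sanity check on this change of variables, both expressions for the threshold in (A.4) can be evaluated under Prelec weighting and compared; a minimal sketch in Python, where the lottery probability triples and the value of γ are hypothetical illustrations:

```python
import math

def prelec_w(q, gamma):
    """Prelec (1998) one-parameter probability weighting function."""
    if q <= 0.0:
        return 0.0
    if q >= 1.0:
        return 1.0
    return math.exp(-((-math.log(q)) ** gamma))

def threshold_from_W(s, t, gamma):
    """Threshold −[W_m2 + W_m3]/W_m3 from the inequality form of (A.4).
    s and t are probability triples (on z1, z2, z3) for lotteries S and T."""
    w = lambda q: prelec_w(q, gamma)
    W2 = w(s[1] + s[2]) - w(s[2]) - (w(t[1] + t[2]) - w(t[2]))
    W3 = w(s[2]) - w(t[2])
    return -(W2 + W3) / W3

def threshold_direct(s, t, gamma):
    """Closed-form threshold in the second line of (A.4)."""
    w = lambda q: prelec_w(q, gamma)
    return (w(s[1] + s[2]) - w(t[1] + t[2])) / (w(t[2]) - w(s[2]))

# Hypothetical pair obeying the convention t_m3 − s_m3 > 0:
s = (0.5, 0.4, 0.1)
t = (0.6, 0.1, 0.3)
gamma = 0.7
# the two thresholds agree up to rounding error
```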
With equation (A.4), we have arrived where Loomes, Moffatt and Sugden (2002) left things.
Choosing some c.d.f. on R+ for G_vc, such as the Lognormal or Gamma distribution, construction
of a likelihood function from (A.4) and choice data is straightforward for one context, and this is
the kind of experimental data Loomes, Moffatt and Sugden had. But when contexts vary in a data
set, the method quickly becomes intractable except for special cases. I now work out one such
special case. Consider an experiment such as Hey and Orme (1994) that uses pairs on the four
three-outcome contexts generated by four equally spaced monetary outcomes including zero, and
let (0,1,2,3) denote these outcomes. The random utility vector for the four outcomes can be
expressed in terms of just two random utilities by an appropriate transformation; the minimum
unit transformation gives the vector as (0, 1, u_2^n, u_3^n), where u_2^n > 1 and u_3^n > u_2^n. Let g_1^n ≡ u_2^n − 1 ∈
R+ and g_2^n ≡ u_3^n − u_2^n ∈ R+ be two underlying random variables generating the two random
utilities as u_2^n = 1 + g_1^n and u_3^n = 1 + g_1^n + g_2^n. We immediately have v_−3^n = g_1^n on context c = −3
(that is, the context (0,1,2)), and v_−2^n = g_1^n + g_2^n on context c = −2 (that is, the context (0,1,3)).
Applying the contextual minimum unit transformation on context c = −0 (that is, the context
(1,2,3)), u_mu,−0^n(3) = (u_3^n − 1)/(u_2^n − 1) = 1 + (u_3^n − u_2^n)/(u_2^n − 1) = 1 + g_2^n/g_1^n. Therefore, we have
v_−0^n = g_2^n/g_1^n on context −0.

¹² If t_m3 − s_m3 = 0, either T_mc and S_mc are identical, or mc is an FOSD pair. In the latter case, random preferences imply
a zero probability of choosing the dominated lottery, or, with a small tremble probability, an ω^n/2 probability of
choosing the dominated lottery, which is how Loomes, Moffatt and Sugden (2002) and I would model that
probability.
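The three identities relating v_−3^n, v_−2^n and v_−0^n to g_1^n and g_2^n can be checked directly; a minimal sketch, with hypothetical utility draws:

```python
def context_v(u2, u3):
    """Underlying random variables v_c implied by the normalized utility
    vector (0, 1, u2, u3) on the three contexts used in Appendix A."""
    g1, g2 = u2 - 1.0, u3 - u2
    v_minus3 = g1            # context (0,1,2): v = u2 − 1
    v_minus2 = g1 + g2       # context (0,1,3): v = u3 − 1
    v_minus0 = g2 / g1       # context (1,2,3): v = (u3 − u2)/(u2 − 1)
    return v_minus3, v_minus2, v_minus0

# Hypothetical draws with u3 > u2 > 1:
u2, u3 = 1.6, 2.5
v3, v2, v0 = context_v(u2, u3)

# Check against the contextual minimum unit transformation on (1,2,3):
u_mu0 = (u3 - 1.0) / (u2 - 1.0)   # u_mu,−0(3); equals 1 + v0 up to rounding
```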
With the random variables v_−0^n, v_−2^n and v_−3^n expressed in terms of the two underlying random
variables g_1^n and g_2^n, we need a joint distribution of g_1^n and g_2^n such that g_1^n, g_1^n + g_2^n and g_2^n/g_1^n all
have tractable parametric distributions. The only choice I am aware of is two independent
gamma variates, each with the gamma distribution's c.d.f. G(x | φ, κ), with identical "scale"
parameter κ^n but possibly different "shape" parameters φ_1^n and φ_2^n. Under this choice, v_−3^n = g_1^n
will have c.d.f. G(x | φ_1^n, κ^n), v_−2^n = g_1^n + g_2^n will have c.d.f. G(x | φ_1^n + φ_2^n, κ^n), and v_−0^n = g_2^n/g_1^n will
have the c.d.f. B′(x | φ_2^n, φ_1^n) of the "beta-prime" distribution on R+, also called a "beta
distribution of the second kind" (Aitchison 1963).¹³ These assumptions imply a joint distribution
of u_2^n − 1 and u_3^n − 1 known as "McKay's bivariate gamma distribution," with a correlation
coefficient of √(φ_1^n/(φ_1^n + φ_2^n)) between u_2^n and u_3^n (McKay 1934; Hutchinson and Lai 1990).
¹³ The ratio relationship here is a generalization of the well-known fact that the ratio of independent chi-square
variates follows an F distribution. Chi-square variates are gamma variates with common scale parameter κ = 2. In
fact, a beta-prime variate can be transformed into an F variate: If x is a beta-prime variate with parameters a and b,
then bx/a is an F variate with degrees of freedom 2a and 2b. This is convenient because almost all statistics software
packages contain thorough call routines for F variates, but not necessarily any call routines for beta-prime variates.
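A Monte Carlo sketch of the independent-gamma construction (the parameter values below are hypothetical): the correlation of u_2 and u_3 works out to the square root of φ_1/(φ_1 + φ_2), and the ratio g_2/g_1 is a beta-prime(φ_2, φ_1) variate with mean φ_2/(φ_1 − 1):

```python
import math
import random

def draw_utilities(phi1, phi2, kappa, n, seed=7):
    """Independent gammas g1 ~ G(phi1, kappa), g2 ~ G(phi2, kappa) and the
    implied random utilities u2 = 1 + g1, u3 = 1 + g1 + g2."""
    rng = random.Random(seed)
    g1 = [rng.gammavariate(phi1, kappa) for _ in range(n)]
    g2 = [rng.gammavariate(phi2, kappa) for _ in range(n)]
    u2 = [1.0 + a for a in g1]
    u3 = [1.0 + a + b for a, b in zip(g1, g2)]
    return g1, g2, u2, u3

def corr(x, y):
    """Pearson correlation of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

phi1, phi2, kappa = 3.0, 2.0, 1.0
g1, g2, u2, u3 = draw_utilities(phi1, phi2, kappa, 200_000)

r = corr(u2, u3)                      # ≈ sqrt(phi1/(phi1+phi2))
v0 = [b / a for a, b in zip(g1, g2)]  # beta-prime(phi2, phi1) variates
m = sum(v0) / len(v0)                 # ≈ phi2/(phi1 − 1)
```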
An acquaintance with the literature on estimation of random utility models may make these
assumptions seem very special and unnecessary. They are very special, but this is because
theories of risk preferences over money outcomes are very special relative to the kinds of
preferences that typically get treated in that literature. Consider the classic example of
transportation choice well-known from Domencich and McFadden (1975). Certainly we expect
the value of time and money to be correlated across the population of commuters. But for a
single commuter making a specific choice between car and bus on a specific morning, we do not
require a specific relationship between the disutility of commuting time and the marginal utility
of income she happens to “draw” from her random utility urn on that particular morning. This
gives us fairly wide latitude when we choose a distribution for the unobserved parts of her
utilities of various commuting alternatives.
This is definitely not true of any specific trial of her choice from a lottery pair mc. The spirit
of random preferences is that every preference ordering drawn from the urn obeys all properties
of the preference structure (Loomes and Sugden 1995). We demand, for instance, that she
"draw" a vector of outcome utilities that respects monotonicity in z; this implies that the joint
distribution of u_2^n and u_3^n must have the property that u_3^n ≥ u_2^n ≥ 1. Moreover, the assumptions
we make about the v_c^n must be probabilistically consistent across pair contexts. Choosing a joint
distribution of u_2^n and u_3^n immediately implies exact commitments regarding the distribution of
any and all functions of u_2^n and u_3^n. The issue does not arise in a data set where subjects make
choices from pairs on just one context, as in Loomes, Moffatt and Sugden (2002): In this
simplest of cases, any distribution of v_c^n on R+, including the Lognormal choice they make, is a
wholly legitimate hypothesis. But as soon as each subject makes choices from pairs on several
different overlapping contexts, staying true to the demands of random preferences is much more
exacting. Unless we can specify a joint distribution of g_1^n and g_2^n that implies it, we are not
entitled (for instance) to assume that v_c^n follows Lognormal distributions in all three
overlapping contexts for a single subject.¹⁴ Our choice of a joint distribution for v_−2^n and v_−3^n has
exact and inescapable implications for the distribution of v_−0^n. Carbone (1997) correctly saw this
in her random preference treatment of the EU structure. Under these circumstances, a cagey
choice of the joint distribution of g_1^n and g_2^n is necessary.
The random preference model can be quite limiting in practical applications. For instance,
notice that I have not specified a distribution of v_−1^n, and hence have no method for applying
equation (5.5) to pairs on the context (0,2,3). Fully a quarter of the data from experiments such
as Hey and Orme (1994) are on that context. On the context −1 = (0,2,3), v_−1^n = g_2^n/(1 + g_1^n). Is
there a choice of the joint distribution of g_1^n and g_2^n that gives us convenient parametric
distributions of g_1^n, g_1^n + g_2^n, g_2^n/g_1^n and g_2^n/(1 + g_1^n), so that those four distributions collectively
depend on just three parameters? I can find no answer to this question. This is why I limit myself
here to the three-fourths of these data sets that are choices from the contexts (0,1,2), (0,1,3) and
(1,2,3): These are the contexts that the “independent gamma model” of random preferences
developed above can be applied to. There are no similar practical modeling constraints on RDEU
strict, strong or contextual utility models (a considerable practical point in their favor); these
models are easily applied to choices on any context.
¹⁴ Although ratios of lognormal variates are lognormal, there is no similarly simple parametric family for sums of
lognormal variates. The independent gammas with common scale are the only workable choice I am aware of.
Appendix B: Estimation Procedures, Specifications of Random Parameters Distributions,
and Specifications of Underlying Choice Probabilities
The Hey and Orme (1994) or HO experiment contains no FOSD pairs; therefore, we need no
special specification to account for low rates of violation of FOSD. However, Moffatt and Peters
(2001) found evidence of nonzero tremble probabilities using the HO data, so I include a tremble
probability in all models. Working with Hey's (2001) still larger data set, which contains 125
observations on each of four contexts, I estimated subject-specific trembles ω^n separately on all
four contexts, and find that there is no significant correlation of these subject-specific estimates
of ω^n across contexts. This suggests that there is little or no reliable between-subjects variance
true of the population in all cases. Under these assumptions, likelihood functions are in all
instances built from “full” probabilities Pmcfn that contain this constant tremble, that is
Pmcfn = (1 − ω ) Pmcn + ω / 2 . The discussion here concentrates wholly on the specifications of Pmcn and
their distribution across subjects.
Let H denote a particular pairing of a structure and a stochastic model. For instance, H =
(EU,Strong) is expected utility with the strong utility stochastic model. Supposing that H is the
true data-generating process or DGP for all subjects in the sample, let ψ_H^n = (β_H^n, α_H^n) denote
subject n's true parameter vector governing her choices from pairs. Here, β_H^n is the structural
parameter vector: It contains the utilities of outcomes u_2^n and u_3^n whenever H is a strong, strict or
contextual utility model, and also the weighting function parameter γ^n whenever H is an RDEU
structure. By contrast, α_H^n is the stochastic parameter vector that governs the shape and/or
variance of distributions determining choice probabilities: This is λ^n in strict, strong or
contextual utility models, and is the vector (φ_1^n, φ_2^n, κ^n) in random preference models.
Let J_H(ψ_H | θ_H) denote the joint c.d.f. governing the distribution of ψ_H in the population
from which subjects are sampled, where θ_H are parameters governing J_H. Let θ_H* be the true
value of θ_H in the population. We want an estimate θ̃_H of θ_H*: This is what is meant by random
parameters estimation. It requires that we choose a reasonable and tractable form for J_H that
appears to characterize the main features of the joint distribution of parameter vectors ψ_H in the
sample. The procedure followed for doing this is detailed below for the (EU,Strong) model; then,
the procedure is outlined more generally; and finally the exact specifications of all models are
laid out in detail.
At the level of an individual subject, without trembles and suppressing the subject
superscript n, the (EU,Strong) model is

(B.1)  P_mc = Λ( λ[(s_m1 − t_m1)u_1c + (s_m2 − t_m2)u_2c + (s_m3 − t_m3)u_3c] ),

where Λ(x) = [1 + exp(−x)]^−1 is the Logistic c.d.f. (which will be consistently employed as the
function H(x) for the strong, strict and contextual utility models). Equation B.1 introduces the
notation u_ic = u(z_ic). In terms of the underlying utility parameters u_2 and u_3 to be estimated, this
implies the following:

(B.2)  (u_1c, u_2c, u_3c) = (1, u_2, u_3) for pairs on context c = −0 = (1,2,3);
                            (0, 1, u_3) for pairs on context c = −2 = (0,1,3); and
                            (0, 1, u_2) for pairs on context c = −3 = (0,1,2).
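Equations B.1 and B.2 translate into a few lines of code; a minimal sketch, where the lottery pair and parameter values are hypothetical:

```python
import math

def logistic(x):
    """Logistic c.d.f. Λ(x) = [1 + exp(−x)]^−1."""
    return 1.0 / (1.0 + math.exp(-x))

def p_eu_strong(s, t, u, lam):
    """(EU,Strong) probability of choosing lottery S over T in a pair, as in
    equation B.1: the logistic of lambda times the EU difference. s and t are
    probability triples over the context's outcomes; u is the utility triple
    (u1c, u2c, u3c) for that context."""
    diff = sum((si - ti) * ui for si, ti, ui in zip(s, t, u))
    return logistic(lam * diff)

# Hypothetical pair on context −3 = (0,1,2), so (u1c, u2c, u3c) = (0, 1, u2):
u2 = 1.8
u = (0.0, 1.0, u2)
s = (0.2, 0.6, 0.2)   # "safer" lottery
t = (0.4, 0.1, 0.5)   # "riskier" lottery
lam = 3.0
p = p_eu_strong(s, t, u, lam)
# swapping the lotteries gives the complementary probability
```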
I begin by estimating the parameters of a simplified version of equation B.1 individually, for
sixty-eight of HO's eighty subjects,¹⁵ using all 150 observations of choices on the contexts −0,
−2, and −3 combined. This initial subject-by-subject estimation gives an initial impression of
what the joint distribution of parameter vectors ψ = (u_2, u_3, λ) looks like across subjects, and of
how we might choose J(ψ | θ) to represent that distribution. At this initial step, ω is not
estimated, but rather assumed to be constant across subjects and equal to 0.04.¹⁶ Estimation of ω is
undertaken later in the random parameters estimation.
Therefore, I begin by estimating u_2, u_3 and λ, using the "full" probabilities

(B.3)  P_mc^f = (1 − ω)P_mc + ω/2 = 0.96 Λ( λ[(s_m1 − t_m1)u_1c + (s_m2 − t_m2)u_2c + (s_m3 − t_m3)u_3c] ) + 0.02,
which temporarily fixes ω at 0.04, for each subject. The log likelihood function for subject n is

(B.4)  LL^n(u_2, u_3, λ) = Σ_mc [ y_mc^n ln(P_mc^f) + (1 − y_mc^n) ln(1 − P_mc^f) ],

with the specification of P_mc^f in equation B.3. This is maximized for each subject n, yielding
initial estimates ψ̃^n = (ũ_2^n, ũ_3^n, λ̃^n) for each subject. Figure 6 graphs ln(ũ_2^n − 1), ln(ũ_3^n − 1) and
ln(λ̃^n) against their first principal component, which accounts for about 69 percent of their
collective variance.¹⁷ The figure also shows regression lines on the first principal component.
The Pearson correlation between ln(ũ_2^n − 1) and ln(ũ_3^n − 1) is fairly high (0.848). Given that these
are both estimates and hence contain some pure sampling error, it appears that an assumption of
perfect correlation between them in the underlying population may not do too much violence to
¹⁵ There are twelve subjects in the HO data with few or no choices of the riskier lottery in any pair. They can
ultimately be included in random parameters estimations, but at this initial stage of individual estimation it is either
not useful (due to poor identification) or simply not possible to estimate models for these subjects.
¹⁶ Estimation of ω is a nuisance at the individual level. Trembles are rare enough that individual estimates of ω are
typically zero. Even when estimates are nonzero, the addition of an extra parameter to estimate
increases the noisiness of the remaining estimates and hides the pattern of variance and covariance of these
parameters that we wish to see at this step.
¹⁷ Two hugely obvious outliers have been removed for both the principal components extraction and the graph.
truth. Therefore, I make this assumption about the joint distribution of ψ in the population.
While ln(λ̃^n) does appear to share limited variance with ln(ũ_2^n − 1) and ln(ũ_3^n − 1) (Pearson
correlations of −0.22 and −0.45, respectively), it obviously either has independent variance of its
own or is estimated with relatively low precision.
These observations suggest modeling the joint distribution J(ψ | θ) of ψ = (u_2, u_3, λ, ω) as
being generated by two independent standard normal deviates x_u and x_λ, as follows:

(B.5)  u_2(x_u, θ) = 1 + exp(a_2 + b_2 x_u), u_3(x_u, θ) = 1 + exp(a_3 + b_3 x_u),
       λ(x_u, x_λ, θ) = exp(a_λ + b_λ x_u + c_λ x_λ), and ω a constant,

where (θ, ω) = (a_2, b_2, a_3, b_3, a_λ, b_λ, c_λ, ω) are parameters to be estimated.
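A sketch of how B.5 generates a subject's parameter vector; the θ values below are hypothetical, chosen with b_2 = b_3 and a_3 > a_2 so that every draw respects monotonicity:

```python
import math
import random

def draw_parameters(theta, rng):
    """Draw (u2, u3, lambda) from the random parameters scheme in B.5,
    given theta = (a2, b2, a3, b3, a_lam, b_lam, c_lam)."""
    a2, b2, a3, b3, al, bl, cl = theta
    xu = rng.gauss(0.0, 1.0)   # shared deviate: perfect correlation of the u's
    xl = rng.gauss(0.0, 1.0)   # independent variance for lambda
    u2 = 1.0 + math.exp(a2 + b2 * xu)
    u3 = 1.0 + math.exp(a3 + b3 * xu)
    lam = math.exp(al + bl * xu + cl * xl)
    return u2, u3, lam

# Illustrative (hypothetical) parameter values:
theta = (-0.5, 0.6, 0.3, 0.6, 1.0, -0.4, 0.5)
rng = random.Random(11)
draws = [draw_parameters(theta, rng) for _ in range(1000)]
# every draw satisfies u3 > u2 > 1 and lam > 0
```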
Then the (EU,Strong) model, conditional on x_u, x_λ and (θ, ω), becomes

(B.6)  P_mc^f(x_u, x_λ, θ, ω) = (1 − ω)Λ( λ(x_u, x_λ, θ)[(s_m1 − t_m1)u_1c + (s_m2 − t_m2)u_2c + (s_m3 − t_m3)u_3c] ) + ω/2,

where u_1c = 1 if c = −0, u_1c = 0 otherwise;
u_2c = u_2(x_u, θ) if c = −0, u_2c = 1 otherwise; and
u_3c = u_2(x_u, θ) if c = −3, u_3c = u_3(x_u, θ) otherwise.
Now, estimate θ* by maximizing the following random parameters log likelihood function in (θ, ω):

(B.7)  LL(θ, ω) = Σ_n ln( ∫∫ [ Π_mc P_mc^f(x_u, x_λ, θ, ω)^{y_mc^n} (1 − P_mc^f(x_u, x_λ, θ, ω))^{1 − y_mc^n} ] dΦ(x_u) dΦ(x_λ) ),
where Φ is the standard normal c.d.f. and P_mc^f(x_u, x_λ, θ, ω) is as given in equation B.6.¹⁸ This is a
difficult maximization problem, but fortunately the linear regression lines in Figure 6 may
provide reasonable starting values for the parameter vector θ. That is, initial estimates of the a
and b coefficients in θ are the intercepts and slopes from the linear regressions of ln(ũ_2^n − 1),
ln(ũ_3^n − 1) and ln(λ̃^n) on their first principal component; and the root mean squared error of the
regression of ln(λ̃^n) on the first principal component provides an initial estimate of c_λ.
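The double integral against two standard normals in B.7 can also be approximated by simulation rather than quadrature (the route footnote 18 suggests for higher dimensions); a minimal sketch, exercised on a test integrand whose value is known by symmetry:

```python
import math
import random

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def mc_normal_expectation(f, n=100_000, seed=3):
    """Approximate an expectation with respect to two independent standard
    normal deviates, as in the double integral of B.7, by Monte Carlo."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        xu = rng.gauss(0.0, 1.0)
        xl = rng.gauss(0.0, 1.0)
        total += f(xu, xl)
    return total / n

# Known-answer check: E[Λ(x_u)] = 1/2, by symmetry of the logistic c.d.f.
# and of the standard normal density.
approx = mc_normal_expectation(lambda xu, xl: logistic(xu))
```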
Table 3 shows the results of maximizing B.7 in (θ, ω). As can be seen, the initial parameter
estimates are good starting values, though a couple of the final estimates are significantly
different from the initial estimates (judging from the asymptotic standard errors of the final
estimates). These estimates produce the log likelihood in the first column of the top row of Table
1. Note that wherever b̃_2 ≠ b̃_3, sufficiently large (or small) values of the underlying standard normal
deviate x_u imply a violation of monotonicity (that is, u_2 > u_3). Rather than imposing b_2 = b_3 as a
constraint on the estimations, I impose the weaker constraint (a_2 − a_3)/(b_3 − b_2) > 4.2649,
making the estimated population fraction of such violations no larger than 10^−5. This constraint
does not bind for the estimates shown in Table 3. Generally, it rarely binds, and is never close to
significantly binding, for any of the strong, strict or contextual utility estimations done here.
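The bound 10^−5 corresponds to the standard normal quantile 4.2649: the mass a standard normal deviate places beyond that point is about 10^−5, which a one-line computation with the complementary error function confirms:

```python
import math

def std_normal_cdf(z):
    """Standard normal c.d.f. via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

# Tail mass beyond 4.2649 standard deviations: about 1e−5.
p = std_normal_cdf(-4.2649)
```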
Recall that the nonparametric treatment of the utility of outcomes avoids a fixed risk attitude
across the outcome vector (0,1,2,3), as would be implied by a parametric form such as CARA or
CRRA utility. The estimates shown in Table 3 imply a population in which about sixty-eight
percent of subjects have a weakly concave utility function, with the remaining thirty-two percent
having an inflected "concave then convex" utility function. This is very similar to the results of
Hey and Orme's (1994) individual estimations: That is, the random parameters estimation used
here produces estimated sample heterogeneity much like that suggested by individual estimation.

¹⁸ Such integrations must be performed numerically in some manner for estimation. I use Gauss-Hermite quadratures,
which are practical up to two or three integrals; for integrals of higher dimension, simulated maximum likelihood is
usually more practical. Judd (1998) and Train (2003) are good sources for these methods.
A very similar process was used to select and estimate random parameters characterizations
of heterogeneity for all models. In each case, a model's parameter vector ψ^n is first estimated
separately for each subject with a fixed tremble probability ω = 0.04. Let these estimates be ψ̃^n.
The correlation matrix of the parameters is then computed, and the vectors ψ̃^n are also subjected
to a principal components analysis, with particular attention to the first principal component. As
with the detailed example of the (EU,Strong) model, all models with utility parameters u_2^n and
u_3^n (that is, all strong, strict or contextual utility models) yield quite high Pearson correlations
between ln(ũ_2^n − 1) and ln(ũ_3^n − 1) across subjects, and heavy loadings of these on the first principal
components of estimated parameter vectors ψ̃^n. Therefore, the population distributions J(ψ | θ),
where ψ = (u_2, u_3, λ) (strong, strict and contextual EU models) or ψ = (u_2, u_3, γ, λ) (strong, strict
and contextual RDEU models), are in all cases modeled as having a perfect correlation between
ln(u_2^n − 1) and ln(u_3^n − 1), generated by an underlying standard normal deviate x_u.
Let RP denote random preference models. Quite similarly, individual estimations of the two
RP models, with Gamma distribution shape parameters φ_1^n and φ_2^n, yield quite high Pearson
correlations between ln(φ̃_1^n) and ln(φ̃_2^n) across subjects, and heavy loadings of these on the first
principal components of estimated parameter vectors ψ̃^n. Therefore, the joint distributions
J(ψ | θ), where ψ = (φ_1, φ_2, κ) in the (EU,RP) model or ψ = (γ, φ_1, φ_2, κ) in the (RDEU,RP) model, are
both assumed to have a perfect correlation between ln(φ_1^n) and ln(φ_2^n) in the population,
generated by an underlying standard normal deviate x_φ.
In all cases, all other model parameters are characterized as possibly partaking of some of the
variance represented by x_u (in strong, strict or contextual utility models) or x_φ (in random
preference models), but also as having independent variance represented by an independent
standard normal variate. In essence, all correlations between model parameters are represented as
arising from a single underlying first principal component (x_u or x_φ), which in all cases accounts
for two-thirds (frequently more) of the shared variance of parameters in ψ̃^n according to the
principal components analyses. The correlation is assumed to be a perfect one for ln(u_2^n − 1) and
ln(u_3^n − 1) (in non-random-preference models) or ln(φ_1^n) and ln(φ_2^n) (in random preference
models), since this seems very nearly characteristic of all individual model estimates; but aside
from ω, other model parameters are given their own independent variance, since their
correlations with ln(u_2^n − 1) and ln(u_3^n − 1) are always weaker than that observed between
ln(u_2^n − 1) and ln(u_3^n − 1) (and similarly for ln(φ_1^n) and ln(φ_2^n) in random preference models).
The following equation systems show the characterization for all models, where any subset
of x_u, x_φ, x_λ, x_κ and x_γ found in each characterization are jointly independent standard normal
variates. Tremble probabilities ω are modeled as constant in the population, for reasons
discussed in section 6.1, and so there are no equations below for ω. The systems represent the
choice probabilities P_mc, but (1 − ω)P_mc + ω/2 is used to build likelihood functions allowing for
trembles. As in the text and Appendix A, Λ, G and B′ are the Logistic, Gamma and Beta-prime
cumulative distribution functions, respectively. In strong, strict and contextual utility models, the
relationship between the u_ic terms in the expressions for P_mc and the functions u_2(x_u, θ) and
u_3(x_u, θ) is always as shown below equation B.6.
(EU,Strong), (EU,Strict) and (EU,Contextual) models: P_mc(x_u, x_λ, θ) =

Λ( λ(x_u, x_λ, θ)[f(s_m1 u_1c + s_m2 u_2c + s_m3 u_3c) − f(t_m1 u_1c + t_m2 u_2c + t_m3 u_3c)] / d_c(x_u, θ) ),

where f(x) = ln(x) in strict utility, and f(x) = x in strong and contextual utility;
d_c(x_u, θ) = u_2(x_u, θ) for c = −3, u_3(x_u, θ) for c = −2, and u_3(x_u, θ) − 1 for c = −0 in contextual
utility, and d_c(x_u, θ) ≡ 1 in strong and strict utility; and
u_2(x_u, θ) = 1 + exp(a_2 + b_2 x_u), u_3(x_u, θ) = 1 + exp(a_3 + b_3 x_u), and
λ(x_u, x_λ, θ) = exp(a_λ + b_λ x_u + c_λ x_λ), where θ = (a_2, b_2, a_3, b_3, a_λ, b_λ, c_λ).
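The contextual utility divisor d_c listed above is simply the range of outcome utilities in each context; a minimal sketch verifying this under the normalizations below equation B.6 (the utility values are hypothetical):

```python
def context_utils(u2, u3, context):
    """Utility triple (u1c, u2c, u3c) for each context, as below equation B.6.
    Contexts are labeled as strings: '-3' = (0,1,2), '-2' = (0,1,3),
    '-0' = (1,2,3)."""
    if context == "-3":
        return (0.0, 1.0, u2)   # outcomes (0,1,2)
    if context == "-2":
        return (0.0, 1.0, u3)   # outcomes (0,1,3)
    if context == "-0":
        return (1.0, u2, u3)    # outcomes (1,2,3)
    raise ValueError(context)

def d_contextual(u2, u3, context):
    """d_c equals the range (max − min) of outcome utilities in the context,
    assuming u3 > u2 > 1."""
    utils = context_utils(u2, u3, context)
    return max(utils) - min(utils)

u2, u3 = 1.7, 2.9
# d_c matches the stated values: u2 for −3, u3 for −2, u3 − 1 for −0.
```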
(EU,RP) model: P_mc(x_φ, x_κ, θ) = G( R_m | φ_1(x_φ, θ), κ(x_φ, x_κ, θ) ) for c = −3,
G( R_m | φ_1(x_φ, θ) + φ_2(x_φ, θ), κ(x_φ, x_κ, θ) ) for c = −2, and B′( R_m | φ_2(x_φ, θ), φ_1(x_φ, θ) ) for c = −0,

where R_m = (s_m2 + s_m3 − t_m2 − t_m3) / (t_m3 − s_m3); and

φ_1(x_φ, θ) = exp(a_1 + b_1 x_φ), φ_2(x_φ, θ) = exp(a_2 + b_2 x_φ), and κ(x_φ, x_κ, θ) = exp(a_κ + b_κ x_φ + c_κ x_κ),
where θ = (a_1, b_1, a_2, b_2, a_κ, b_κ, c_κ).
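Because the Gamma c.d.f. G is absent from most languages' standard math libraries, the (EU,RP) probabilities for contexts −3 and −2 can be computed from a short series expansion of the regularized incomplete gamma function; a sketch, where the lottery probabilities and parameter values are hypothetical and the `p_eu_rp` helper is illustrative rather than from the paper (the −0 case would additionally need the beta-prime c.d.f.):

```python
import math

def gamma_cdf(x, shape, scale):
    """Gamma c.d.f. G(x | shape, scale) via the standard series expansion of
    the regularized lower incomplete gamma function (no scipy dependency)."""
    if x <= 0.0:
        return 0.0
    z = x / scale
    term = math.exp(shape * math.log(z) - z - math.lgamma(shape + 1.0))
    total = term
    k = 0
    while term > 1e-16 * total and k < 10_000:
        k += 1
        term *= z / (shape + k)
        total += term
    return min(total, 1.0)

def p_eu_rp(s, t, phi1, phi2, kappa, context):
    """(EU,RP) choice probability sketch for contexts −3 and −2."""
    Rm = (s[1] + s[2] - t[1] - t[2]) / (t[2] - s[2])
    if Rm <= 0.0:
        return 0.0
    if context == "-3":
        return gamma_cdf(Rm, phi1, kappa)
    if context == "-2":
        return gamma_cdf(Rm, phi1 + phi2, kappa)
    raise ValueError(context)

# Sanity check: with shape 1, the Gamma c.d.f. is exponential, 1 − e^{−x/κ}.
p3 = p_eu_rp((0.2, 0.6, 0.2), (0.4, 0.1, 0.5), 2.0, 1.5, 0.8, "-3")
```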
(RDEU,Strong), (RDEU,Strict) and (RDEU,Contextual) models: P_mc(x_u, x_λ, x_γ, θ) =

Λ( λ(x_u, x_λ, θ)[f(ws_m1(x_u, x_γ, θ)u_1c + ws_m2(x_u, x_γ, θ)u_2c + ws_m3(x_u, x_γ, θ)u_3c) −
f(wt_m1(x_u, x_γ, θ)u_1c + wt_m2(x_u, x_γ, θ)u_2c + wt_m3(x_u, x_γ, θ)u_3c)] / d_c(x_u, θ) ),

where f(x) = ln(x) in strict utility, and f(x) = x in strong and contextual utility;
d_c(x_u, θ) = u_2(x_u, θ) for c = −3, u_3(x_u, θ) for c = −2, and u_3(x_u, θ) − 1 for c = −0 in contextual
utility, and d_c(x_u, θ) ≡ 1 in strong and strict utility; and

ws_mi(x_u, x_γ, θ) = w( Σ_{j≥i} s_mj | γ(x_u, x_γ, θ) ) − w( Σ_{j>i} s_mj | γ(x_u, x_γ, θ) ) and
wt_mi(x_u, x_γ, θ) = w( Σ_{j≥i} t_mj | γ(x_u, x_γ, θ) ) − w( Σ_{j>i} t_mj | γ(x_u, x_γ, θ) ), where

w(q | γ(x_u, x_γ, θ)) = exp( −[−ln(q)]^{γ(x_u, x_γ, θ)} ); and

u_2(x_u, θ) = 1 + exp(a_2 + b_2 x_u), u_3(x_u, θ) = 1 + exp(a_3 + b_3 x_u), γ(x_u, x_γ, θ) = exp(a_γ + b_γ x_u + c_γ x_γ),
and λ(x_u, x_λ, θ) = exp(a_λ + b_λ x_u + c_λ x_λ), where θ = (a_2, b_2, a_3, b_3, a_γ, b_γ, c_γ, a_λ, b_λ, c_λ).
(RDEU,RP) model: P_mc(x_φ, x_κ, x_γ, θ) = G( WR_m(x_φ, x_γ, θ) | φ_1(x_φ, θ), κ(x_φ, x_κ, θ) ) for c = −3,
G( WR_m(x_φ, x_γ, θ) | φ_1(x_φ, θ) + φ_2(x_φ, θ), κ(x_φ, x_κ, θ) ) for c = −2, and
B′( WR_m(x_φ, x_γ, θ) | φ_2(x_φ, θ), φ_1(x_φ, θ) ) for c = −0;

where WR_m(x_φ, x_γ, θ) = [w(s_m2 + s_m3 | γ(x_φ, x_γ, θ)) − w(t_m2 + t_m3 | γ(x_φ, x_γ, θ))] /
[w(t_m3 | γ(x_φ, x_γ, θ)) − w(s_m3 | γ(x_φ, x_γ, θ))] and

w(q | γ(x_φ, x_γ, θ)) = exp( −[−ln(q)]^{γ(x_φ, x_γ, θ)} ); and

φ_1(x_φ, θ) = exp(a_1 + b_1 x_φ), φ_2(x_φ, θ) = exp(a_2 + b_2 x_φ), γ(x_φ, x_γ, θ) = exp(a_γ + b_γ x_φ + c_γ x_γ), and
κ(x_φ, x_κ, θ) = exp(a_κ + b_κ x_φ + c_κ x_κ), where θ = (a_1, b_1, a_2, b_2, a_γ, b_γ, c_γ, a_κ, b_κ, c_κ).
For models with the EU structure, a likelihood function nearly identical to equation B.7 is
maximized in θ; for instance, for the (EU,RP) model simply replace P_mc(x_u, x_λ, θ) with
P_mc(x_φ, x_κ, θ), and replace dΦ(x_u)dΦ(x_λ) with dΦ(x_φ)dΦ(x_κ). For models with the RDEU
structure, a third integration appears, since these models allow for independent variance in γ (the
Prelec weighting function parameter) through the addition of a third standard normal variate x_γ.
In all cases, the integrations are carried out by Gauss-Hermite quadrature. For models with the EU
structure, where there are two nested integrations, 14 nodes are used for each nested quadrature
of an integral. For models with the RDEU structure, 10 nodes are used for each nested
quadrature. In all cases, starting values for these numerical maximizations are computed in the
manner described for the (EU,Strong) model: Parameters in ψ̃^n are regressed on their first
principal component, and the intercepts and slopes of these regressions are the starting values for
the a and b coefficients in the models, while the root mean squared errors of these regressions are
the starting values for the c coefficients found in the equations for λ, κ and/or γ.
Table 1. Log likelihoods of random parameters characterizations of the models in the Hey and Orme sample

                                       Estimated on all three contexts:   Estimated on contexts (0,1,2) and (0,1,3):
                                       Log likelihood on all three        Log likelihood on context (1,2,3)
Structure   Stochastic Model           contexts (in-sample fit)           (out-of-sample fit)
EU          Strong Utility             −5311.44                           −2409.38
            Strict Utility             −5448.50                           −2373.12
            Contextual Utility         −5297.08                           −2302.55
            Random Preferences         −5348.36                           −2356.60
RDEU        Strong Utility             −5207.81                           −2394.75
            Strict Utility             −5306.48                           −2450.41
            Contextual Utility         −5190.43                           −2281.36
            Random Preferences         −5218.00                           −2335.55
Table 2. Vuong (1989) non-nested tests between model pairs,
by structure, stochastic model and in-sample versus out-of-sample fit.

EU Structure

Estimated on all three contexts, and comparing fit on all three contexts (in-sample fit comparison):

                     Random Prefs.   Strong Utility   Strict Utility
Contextual Utility   z = 1.723       z = 0.703        z = 6.067
                     p = 0.042       p = 0.241        p < 0.0001
Random Prefs.                        z = −1.574       z = 5.419
                                     p = 0.058        p < 0.0001
Strong Utility                                        z = 5.961
                                                      p < 0.0001

Estimated on contexts (0,1,2) and (0,1,3), and comparing fit on context (1,2,3) (out-of-sample fit comparison):

                     Random Prefs.   Strong Utility   Strict Utility
Contextual Utility   z = 4.387       z = 3.044        z = 2.739
                     p < 0.0001      p = 0.0012       p = 0.0031
Random Prefs.                        z = 1.639        z = 1.422
                                     p = 0.051        p = 0.078
Strong Utility                                        z = 0.028
                                                      p = 0.49

RDEU Structure

Estimated on all three contexts, and comparing fit on all three contexts (in-sample fit comparison):

                     Random Prefs.   Strong Utility   Strict Utility
Contextual Utility   z = 0.981       z = 0.877        z = 4.352
                     p = 0.163       p = 0.190        p < 0.0001
Random Prefs.                        z = −0.44        z = 3.831
                                     p = 0.330        p < 0.0001
Strong Utility                                        z = 5.973
                                                      p < 0.0001

Estimated on contexts (0,1,2) and (0,1,3), and comparing fit on context (1,2,3) (out-of-sample fit comparison):

                     Random Prefs.   Strong Utility   Strict Utility
Contextual Utility   z = 3.879       z = 3.304        z = 5.978
                     p < 0.0001      p = 0.0005       p < 0.0001
Random Prefs.                        z = 1.652        z = 3.808
                                     p = 0.049        p < 0.0001
Strong Utility                                        z = 3.918
                                                      p < 0.0001

Notes: Positive z means the row stochastic model fits better than the column stochastic model.
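The Vuong statistic can be computed from the per-observation log-likelihood contributions of the two competing models. The sketch below (the function name `vuong_z` is hypothetical) shows the basic form of the statistic; as a simplification it treats each contribution as independent rather than clustering choices by subject:

```python
import numpy as np

def vuong_z(ll_a, ll_b):
    """Vuong (1989) non-nested test from per-observation log-likelihood
    contributions ll_a and ll_b of two competing models.

    z = sum(d) / (sqrt(n) * sd(d)), where d_i = ll_a_i - ll_b_i.
    Positive z favors model A; under the null of equal fit the
    statistic is asymptotically standard normal.
    """
    d = np.asarray(ll_a, dtype=float) - np.asarray(ll_b, dtype=float)
    n = d.size
    return d.sum() / (np.sqrt(n) * d.std())
```

By construction the statistic is antisymmetric: swapping the two models flips the sign of z, which is why only one triangle of each comparison matrix needs to be reported.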
Table 3. Random parameters estimates of the (EU,Strong) model, using choice data from the contexts (0,1,2), (0,1,3) and (1,2,3) of the
Hey and Orme (1994) sample.

Structural and stochastic parameter models:
u2 = 1 + exp(a2 + b2 xu)
u3 = 1 + exp(a3 + b3 xu)
λ = exp(aλ + bλ xu + cλ xλ)
ω constant

Distributional   Initial    Final      Asymptotic       Asymptotic
parameter        estimate   estimate   standard error   t-statistic
a2               −1.2       −1.28      0.0411           −31.0
b2               0.57       0.514      0.0311           16.5
a3               −0.51      −0.653     0.0329           −16.9
b3               0.63       0.657      0.0316           20.8
aλ               3.2        3.39       0.101            33.8
bλ               −0.49      −0.658     0.124            −5.32
cλ               0.66       0.584      0.0571           10.2
ω                0.04       0.0446     0.0105           4.26

Log likelihood = −5311.44

Notes: xu and xλ are independent standard normal variates. Standard errors are calculated using the “sandwich estimator” (Wooldridge
2002) and treating all of each subject’s choices as a single “super-observation,” that is, using degrees of freedom equal to the number
of subjects rather than the number of subjects times the number of choices made.
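The clustered sandwich covariance described in the notes can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the split of the score into per-choice contributions, and the interface are illustrative, not the paper's code:

```python
import numpy as np

def cluster_sandwich_cov(scores, hessian_inv, cluster_ids):
    """Cluster-robust 'sandwich' covariance A^{-1} B A^{-1}.

    scores      : (n_obs, k) per-choice score (gradient) contributions
    hessian_inv : (k, k) inverse negative Hessian at the estimate, A^{-1}
    cluster_ids : length-n_obs subject labels; each subject's choices are
                  summed into one 'super-observation' before taking the
                  outer product, so the 'meat' B reflects subject-level
                  rather than choice-level degrees of freedom
    """
    scores = np.asarray(scores, dtype=float)
    ids = np.asarray(cluster_ids)
    k = scores.shape[1]
    meat = np.zeros((k, k))
    for cid in np.unique(ids):
        g = scores[ids == cid].sum(axis=0)  # subject-level score
        meat += np.outer(g, g)
    return hessian_inv @ meat @ hessian_inv
```

Standard errors are then the square roots of the diagonal of the returned matrix; aggregating within subjects before the outer product is what distinguishes this from the ordinary (choice-level) sandwich estimator.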
Figure 1. Citations of Pratt (1964), 1977-2006 (Science and Social Science Citation Indices),
and articles with text references to risk aversion and substitutability relations, 1977-2001
(JSTOR Economics journals and selected Business journals)

[Line graph. Horizontal axis: year (1977 to 2005); vertical axis: number of articles or citations (0 to 350). Three series: articles citing Pratt (1964); JSTOR articles with text references to "more risk averse" or "greater risk aversion"; and JSTOR articles with text references to "more substitutable" or "greater elasticity of substitution".]
Figure 2. Behavior of five CRRA EU V-distances, using various CRRA transformations, with
homoscedastic precision: MPS pair 1 on context (0,50,100), from Hey (2001)

[Line graph. Horizontal axis: coefficient of relative risk aversion (0 to 0.9); vertical axis: V-distance (0 to 12). Five series: basic, power, unit range, minimum unit, and log transformations.]
Figure 3. Behavior of five CRRA EU V-distances, using various CRRA transformations, with
homoscedastic precision: MPS pair 1 on context (50,100,150), from Hey (2001)

[Line graph. Horizontal axis: coefficient of relative risk aversion (0 to 0.9); vertical axis: V-distance (0 to 12). Five series: basic, power, unit range, minimum unit, and log transformations.]
Figure 4. The agent heteroscedasticity solution for making MRA imply SMRA

[Graph of utility and expected utility (−20 to 180) against outcomes (0 to 150). Shows transformations τa and τb of the square root (ρb = 0.50) and "near-log" (ρa = 0.99) CRRA utility functions, chosen to match slopes at the central outcome of the outcome context (50,100,150); also marked are the common EU of a sure 100 and the value differences Δτa V(ρa = 0.99) and Δτb V(ρb = 0.5).]
Figure 5. Relationship between risk aversion and precision in Hey (2001), CRRA EU
estimation.

[Scatter plot. Horizontal axis: coefficient of relative risk aversion (−0.5 to 3.5); vertical axis: natural logarithm of precision (−4 to 12). Shows individual estimates and a robust regression line (slope = 4.24).]
Figure 6. Shared variance of initial individual parameter estimates in the (EU,Strong) model

[Scatter plot. Horizontal axis: first principal component of variance (−3 to 3); vertical axis: natural logarithm of precision and utility parameter estimates (−4 to 6). Three series: ln(λ), ln(u3 − 1), and ln(u2 − 1).]