Hierarchical maximum likelihood parameter
estimation for cumulative prospect theory:
Improving the reliability of individual risk
parameter estimates
Ryan O. Murphy, Robert H.W. ten Brincke
ETH Risk Center – Working Paper Series
ETH-RC-14-005
The ETH Risk Center, established at ETH Zurich (Switzerland) in 2011, aims to develop cross-disciplinary approaches to integrative risk management. The center combines competences from the natural, engineering, social, economic and political sciences. By integrating modeling and simulation efforts with empirical and experimental methods, the Center helps societies to better manage risk.
More information can be found at: http://www.riskcenter.ethz.ch/.
Abstract
Individual risk preferences can be identified by using decision models with tuned parameters that maximally fit a set of risky choices made by a decision maker. A goal of this model fitting procedure is to
isolate parameters that correspond to stable risk preferences. These preferences can be modeled as
an individual difference, indicating a particular decision maker’s tastes and willingness to tolerate risk.
Using hierarchical statistical methods we show significant improvements in the reliability of individual risk preference parameters over other common estimation methods. This hierarchical procedure uses population-level information (in addition to an individual’s choices) to break ties (or near-ties) in the fit quality for sets of possible risk preference parameters. By breaking these statistical “ties” in a sensible way,
researchers can avoid overfitting choice data and thus better measure individual differences in people’s
risk preferences.
Keywords: Prospect theory, Risk preference, Decision making under risk, Hierarchical parameter estimation, Maximum likelihood
Classifications: JEL Codes: D81, D03
URL: http://web.sg.ethz.ch/ethz risk center wps/ETH-RC-14-005
Ryan O. Murphy∗, Robert H.W. ten Brincke
Chair of Decision Theory and Behavioral Game Theory
Swiss Federal Institute of Technology Zürich (ETHZ)
Clausiusstrasse 50, 8006 Zurich, Switzerland
+41 44 632 02 75
∗ Corresponding author.
Email addresses: [email protected] (Ryan O. Murphy),
[email protected] (Robert H.W. ten Brincke)
Working paper
April 16, 2014
1. Introduction
People must often make choices among a number of different options for which the outcomes are not certain. Epistemic limitations can be precisely quantified using probabilities, and given a well-defined set of options, each option with its respective probability of fruition, the elements of risky decision making are established (Knight, 1932; Luce and Raiffa, 1957). Expected value maximization stands as the normative solution to risky choice problems, but people’s choices do not always conform to this optimization principle. Rather, decision makers (DMs) reveal different preferences for risk, sometimes forgoing an option with a higher expectation in lieu of an option with lower variance (thus indicating risk aversion); in some cases DMs reveal the opposite preference, indicating risk seeking. Behavioral theories of risky decision making (Kahneman and Tversky, 1979; Tversky and Kahneman, 1992; Kahneman and Tversky, 2000) have been developed to isolate and highlight the order in these choice patterns and provide psychological insights into the underlying structure of DMs’ preferences. This paper is about measuring those subjective preferences, and in particular about finding a statistical estimation procedure that can increase the reliability of those measures, thus better capturing risk preferences at the individual level.
1.1. Example of a risky choice
A binary lottery (i.e., a gamble or a prospect) is a common tool for
studying risky decision making, so common in fact that they have been called
the fruit flies of decision research (Lopes, 1983). In these simple risky choices,
DMs are presented with monetary outcomes xi , each associated with an
explicit probability pi . Consider for example the lottery in Table 1, offering
a DM the choice between option A, a payoff of $1 with probability .62 or
$83 with .38; or option B, a payoff of $37 with probability .41 or $24 with
probability .59. The decision maker is called upon to choose either option A
or option B and then the lottery is played out via a random process for real
consequences.
1.2. EV maximization and descriptive individual level modeling
The normative solution to the decision problem in Table 1 is straightforward. The goal of maximizing long-term monetary expectations is satisfied
            Option A                         Option B
    $1   with probability .62        $37   with probability .41
    $83  with probability .38        $24   with probability .59
Table 1: An example of a simple, one-shot risky decision. The choice is simply whether to select option A or option B. The expected value of option A is higher ($32.16) than that of option B ($29.33), but option A contains the possibility of an outcome with a relatively small value ($1). As it turns out, the majority of DMs prefer option B, even though it has a smaller expected value.
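The expected values quoted in Table 1 can be checked with a few lines of arithmetic; a minimal sketch (the function name is ours):

```python
# Expected value of a lottery option, given its outcomes and probabilities.
def expected_value(outcomes, probs):
    return sum(x * p for x, p in zip(outcomes, probs))

ev_a = expected_value([1, 83], [0.62, 0.38])   # option A
ev_b = expected_value([37, 24], [0.41, 0.59])  # option B
print(round(ev_a, 2), round(ev_b, 2))  # 32.16 29.33
```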
by selecting the option with the highest expected value. Although this decision policy has desirable properties, it is often not an accurate description of what people prefer or of what they choose when presented with risky options.
As can be anticipated, different people have different tastes, and this heterogeneity includes preferences for risk as well. For example, the majority of
incentivized DMs select option B from the lottery shown above, indicating,
perhaps, that these DMs have some degree of risk aversion. However, the magnitude of this risk aversion is still unknown; it cannot be estimated from only one choice resulting from a binary lottery. This limitation has led researchers to use larger sets of binary lotteries where DMs make many independent choices, and from the overall pattern of choices, researchers can
draw inferences about the DM’s risk preferences.
Risk preferences have often been modeled at the aggregate level, by fitting parameters to the most frequently selected options across many individuals, as opposed to modeling preferences at the individual level (see Kahneman and Tversky,
1979; Tversky and Kahneman, 1992). This aggregate approach is a useful first step in that it helps establish stylized facts about human decision
making and further facilitates the identification of systematicity in the deviations from EV maximization. However, actual risk preferences exist at
the individual level, and thus improving the resolution of measurement and
modeling is a natural development in this research field. Substantial work
exists along these lines already for both the measurement of risk preferences
as well as their application (e.g., Gonzalez and Wu, 1999; Holt and Laury, 2002;
Abdellaoui, 2000; Fehr-Duda et al., 2006). The purpose of this paper is to
add to that literature by developing and evaluating a hierarchical estimation
method that improves the reliability of parameter estimates of risk preferences at the individual level. Moreover, we want to do so with a measurement tool that is readily usable and broadly applicable as a measure of individual
risk preferences.
1.3. Three necessary components for measuring risk preferences
There are three elementary components in measuring risk preferences
and these parts serve as the foundation for developing behavioral models of
risky choice. These three components are: lotteries, models, and statistical
estimation/fitting procedures. We explain each of these components below
in general terms, and then provide detailed examples and discussion of the
elements in the subsequent sections of the paper.
1.3.1. Lotteries
Choices among options reveal DMs’ preferences (Samuelson, 1938; Varian, 2006). Lotteries are used to elicit and record risky decisions, and these
resulting choice data serve as the input for the decision models and parameter
estimation procedures. Here we focus on choices from binary lotteries (such
as Stott, 2006; Rieskamp, 2008; Nilsson et al., 2011), but the elicitation of
certainty equivalence values is another well-established method for quantifying risk preferences (see for example Zeisberger et al., 2010). Binary lotteries
are arguably the simplest way for subjects to make choices and thus elicit
risk preferences. One reason to use binary lotteries is that people appear to have difficulty assessing a lottery’s certainty equivalent (Lichtenstein and Slovic, 1971), although this simplicity comes at the cost of requiring a
DM to make many binary choices to achieve the same degree of estimation
fidelity that other methods purport to (e.g., the three decision risk measure
from Tanaka et al., 2010).
1.3.2. Decision model (non-expected utility theory)
Models may reflect the general structure (e.g., stylized facts) of DMs’ aggregate preferences by invoking a latent construct like utility. These models
also typically have free parameters that can be tuned to improve the accuracy with which they capture different choice patterns. An individual’s choices
from the lotteries are fit to a model by tuning these parameters and thus
identifying differences in revealed preferences and summarizing the pattern
of choices (with as much accuracy as possible) by using particular parameter
values. These parameters can, for example, establish not just the existence of
risk aversion, but can quantify the degree of this particular preference. This
is useful in characterizing an individual decision maker and isolating cognitive mechanisms that underlie behavior (e.g., correlating risk preferences
with other variables, including measures of physiological processes such as
in Figner and Murphy, 2010; Schulte-Mecklenbeck et al., 2014) as well as
tuning a model to make predictions about what this DM will choose when
presented with different options. Here we focus on the non-expected utility
model of cumulative prospect theory (Kahneman and Tversky, 1979; Tversky and Kahneman, 1992; Starmer, 2000) as our decision model. Prospect
theory is arguably the most important and influential descriptive model of
risky choice to date and it has been used extensively in research related to
model fitting and risky decision making.
1.3.3. Parameter estimation procedure
Given a set of risky choices and a particular decision model, the best
fitting parameter(s) can be identified via a statistical estimation procedure.
A perfectly fitting parameterized model would exactly reproduce all of a DM’s choices (i.e., perfect correspondence). Obviously this is a lofty
goal and it is almost never observed in practice, as there almost certainly is
measurement error in a set of observed choices.
A decision model, with best-fitting parameters, can be evaluated in several different ways. One approach to evaluating the goodness of fit would be to determine how many correct predictions the parameterized model makes of a DM’s choices. For example, for a set of risky choices from binary lotteries, researchers may find that 60% of the choices are consistent with expected value maximization (EV maximization is the normative decision policy and trivially is a zero-parameter decision model). If researchers used a concave power function to transform values into utilities, this single-parameter model may increase to 70% consistency with the selected choices. Although this model scoring method is straightforward and intuitive, it has some stark shortcomings (as we will show later in the paper). Other, more sophisticated methods, like maximum likelihood estimation, have been developed that allow for a more nuanced approach to evaluating the fit of a choice model (see for example Harless and Camerer, 1994; Hey and Orme, 1994). All of these different evaluation methods can also include both in-sample and out-of-sample tests, which can be useful to diagnose and mitigate the overfitting of choice data.
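The “fraction of correct predictions” score described above can be sketched as follows; the data layout and function names are illustrative, not from the paper:

```python
# Score a decision model by the fraction of binary choices it reproduces.
# An option is a list of (outcome, probability) pairs; `model_value`
# assigns each option a single number (e.g., expected value, the
# zero-parameter baseline, or a parameterized utility).
def fraction_explained(choices, model_value):
    hits = 0
    for option_a, option_b, chosen in choices:
        predicted = 'A' if model_value(option_a) >= model_value(option_b) else 'B'
        hits += (predicted == chosen)
    return hits / len(choices)

ev = lambda option: sum(x * p for x, p in option)
# One choice: the Table 1 lottery, where the DM picked option B.
choices = [([(1, 0.62), (83, 0.38)], [(37, 0.41), (24, 0.59)], 'B')]
print(fraction_explained(choices, ev))  # 0.0 -- EV maximization predicts A
```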
The interrelationship between lotteries, a decision model, and an estimation procedure, is shown in Figure 1. Lotteries provide stimuli and the
resulting choices are the input for the models; a decision model and an estimation procedure tune parameters to maximize correspondence between the
[Figure 1 schematic: six binary lotteries (e.g., 32 with probability .78 / 99 with .22 vs. 39 with .32 / 56 with .68) provide the input choices at time 1 and time 2; the Prospect Theory model plus a parameter estimation method yield week 1 risk preference parameters (α, λ, δ, γ) and week 2 parameters, which are compared for reliability, with in-sample (summarization) and out-of-sample (prediction) evaluation paths.]
Figure 1: On the left we find three choices for each of the binary lotteries that serve as input
for the model. Using an estimation method we fit the model’s risk preference
parameters to the individual risky choices at time 1. These parameters, to some
extent, summarize the choices made in time 1 by using the parameters in the
model and reproduce the choices (depending on the quality of the fit). The
out-of-sample predictive capacity of the model can be evaluated by comparing
the predictions against actual choices at time 2. The reliability of the estimates
is evaluated by comparing the time 1 parameter estimates to estimates obtained
using the same estimation method on the time 2 data. This experimental design
holds the lotteries and choices consistent, but varies the estimation procedures;
moreover this is done in a test-retest design which allows the computation and
comparison of both in-sample and out-of-sample statistics.
model and the actual choices from the decision maker facing the lotteries.
This process of eliciting choices can be repeated again using the same lotteries and the same experimental subjects. This test-retest design allows for
both the development of in-sample parameter fitting (i.e., statistical explanation or summarization), as well as out-of-sample parameter fitting (i.e.,
prediction). Moreover, the test-retest reliability of the different parameter
estimates can be computed as a correlation, as two sets of parameters exist
for each decision maker. This is a powerful experimental design as it can
accommodate both fitting and prediction, and further can also diagnose instances of overfitting which can undermine psychological interpretations (see
Gigerenzer and Gaissmaier, 2011).
1.4. Overfitting and bounding parameters
Estimation methods for multi-parameter models may be prone to overfitting, adjusting to noise instead of real risk preferences (Roberts and Pashler, 2000). This can sometimes be observed when parameter values
emerge that are highly atypical and extreme. A common solution to this
problem is to set boundaries and limit the range of parameter values that
are potentially estimated. Boundaries prevent extreme parameter values in
the case of weak evidence and thus reduce overfitting, but on the downside
they negate the possibility of reflecting extreme preferences altogether, even
though these preferences may be real. Boundaries are also defined arbitrarily and may create serious estimation problems due to parameter interdependence. For example, setting a boundary at the lower range on one parameter
may radically change the estimate of another parameter (e.g., restricting the
range of probability distortion may unduly influence estimates of loss aversion) for one particular subject. The fact that a functional (mis)specification
can affect other parameter estimates (see for example Nilsson et al., 2011)
and that interaction effects between parameters have been found (Zeisberger
et al., 2010) indicates that there are multiple ways in which prospect theory
can account for preferences. This in turn may mean that the boundaries chosen for one or more parameters can affect other parameter estimates as well whenever an estimate runs up against an
imposed boundary. To circumvent the pitfalls of arbitrary parameter boundaries, we use a hierarchical estimation method based on Farrell and Ludwig
(2008) without such boundaries. At its core, this method uses estimates of
risk preferences of the whole sample to inform estimates of risk preferences at
the individual level. In this paper we address to what degree an estimation
method combining group level information with individual level information
can more reliably represent individual risk preferences, rather than using
either individual or aggregate information exclusively.
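The core idea just described, using sample-level information to discipline individual estimates, can be sketched as a penalized objective. This is a hedged illustration of the approach after Farrell and Ludwig (2008), not the paper’s implementation; the normal population distribution, its parameters, and all names are our illustrative assumptions:

```python
import math

# Log-density of a normal distribution, used as the population-level term.
def normal_log_pdf(x, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

# Individual log-likelihood plus the log-density of the candidate parameter
# under an assumed population distribution: weakly identified individual
# estimates get pulled toward typical values instead of extremes.
def hierarchical_objective(log_likelihood, alpha, pop_mu=0.9, pop_sigma=0.3):
    return log_likelihood(alpha) + normal_log_pdf(alpha, pop_mu, pop_sigma)

# Two candidate alphas that fit one DM's choices equally well (a "tie"):
flat_ll = lambda a: -50.0
better = max([0.9, 3.0], key=lambda a: hierarchical_objective(flat_ll, a))
print(better)  # 0.9 -- population information breaks the tie sensibly
```

Without the population term, the two candidates above are indistinguishable; with it, the typical value wins, which is exactly the tie-breaking behavior the abstract describes.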
1.5. Justification for using hierarchical estimation methods
The ultimate goal of this hierarchical estimation procedure is to obtain reliable estimates of individual risk preferences that can be used to make better predictions about risky choice behavior than other estimation methods allow. This is not modeling as a means only to maximize in-sample fit, but rather to maximize out-of-sample correspondence; moreover, these methods are a way to gain insight into what actually motivates people as they cogitate about risky options and make choices when confronting irreducible uncertainty. Ideally, the parameters should not just summarize choices, but should also capture some of the psychological mechanisms that underlie risky decision making.
1.6. Other applications of hierarchical estimation methods
Other researchers have applied hierarchical estimation methods to risky
choice modeling. Nilsson et al. (2011) have applied a Bayesian hierarchical
parameter estimation model to simple risky choice data. The hierarchical
procedure outperformed maximum likelihood estimation in a parameter recovery test. The authors however did not test out-of-sample predictions nor
the retest reliability of their parameter estimates. Wetzels et al. (2010) applied Bayesian hierarchical parameter estimation in a learning model and
found it was more robust with regard to extreme estimates and misunderstanding of task dynamics. Scheibehenne and Pachur (2013) applied Bayesian
hierarchical parameter estimation to risky choice data using a TAX model
(Birnbaum and Chavez, 1997) but reported no improvement in parameter
stability.
1.7. Other research examining parameter stability
There exists some research on estimated parameter stability. Glöckner
and Pachur (2012) tested the reliability of parameter estimates using a nonhierarchical estimation procedure. The results show that individual risk
preferences are generally stable and that the individual parameter values
outperform aggregate values in terms of prediction. The authors concluded
that the reliability of parameters suffered when extra model parameters were
added. This is contrasted against the work of Fehr-Duda and Epper (2012)
who conclude, based on experimental data and a literature review, that those
additional parameters (related to the probability weighting function) are necessary to reliably capture individual risk preferences. Zeisberger et al. (2010)
also tested individual parameter reliability using a different type of risky
choice (certainty equivalent vs. binary choice). They found significant differences in parameter value estimates over time but did not use a hierarchical
estimation procedure.
1.8. Structure of the rest of this paper
In this paper we focus on evaluating different statistical estimation procedures for fitting individual choice data to risky decision models. To this
end, we hold constant the set of lotteries DMs made choices with, and the
functional form of the risky choice model. We then fit the same set of choice
data using a variety of estimation methods, and then contrast the results
to differentiate their efficiency. We also compute the test-retest reliability
of parameter estimations that resulted from different estimation procedures.
This broad approach allows us to evaluate different estimation methods and
draw substantiated conclusions about fit quality as well as diagnose overfitting. We conclude with recommendations for estimating risk preferences
at the individual level and briefly describe an online tool for measuring and
estimating risk preferences that researchers are invited to use as part of their
own research programs.
2. Stimuli and Experimental design
2.1. Participants
One hundred eighty-five participants from the subject pool at the Max
Planck Institute for Human Development in Berlin volunteered to participate
in two sessions (referred to hereafter as time 1 and time 2) that were administered approximately two weeks apart. After both sessions were conducted,
complete data from both sessions were available for 142 participants; these
subjects with complete datasets are retained for subsequent analysis. The
experiment was incentive compatible and was conducted without deception.
The data used here are the choice only results from the Schulte-Mecklenbeck
et al. (2014) experiment and we are indebted to these authors for their generosity in sharing their data.
2.2. Stimuli
In each session of the experiment individuals made choices from a set of
91 simple binary option lotteries. Each option has two possible outcomes
between -100 and 100 that occur with known probabilities that sum to one.
There were four types of lotteries for which examples are shown in Table
2. The four types are: gains only; lotteries with losses only; mixed lotteries
with both gains and losses; and mixed-zero lotteries with one gain and one
loss and zero (status quo) as the alternative outcome. The first three types
were included to cover the spectrum of risky decisions while the mixed-zero
type allows for measuring loss aversion separately from risk aversion (Rabin,
2000; Wakker, 2005). The same 91 lotteries were used in both test-retest
sessions of this research. The set was compiled of items used by Rieskamp
(2008), Gaechter et al. (2007) and Holt and Laury (2002); these items are
also used by Schulte-Mecklenbeck et al. (2014). Thirty-five lotteries are gain
only, 25 are loss only, 25 are mixed, and 6 are mixed-zero. All of the lotteries
are listed in Appendix A.
Lottery   Option A                          Option B                          Type
1         32 with probability .78,          39 with probability .32,          gain
          99 with probability .22           56 with probability .68
2         -24 with probability .99,         -15 with probability .44,         loss
          -13 with probability .01          -62 with probability .56
3         58 with probability .48,          96 with probability .60,          mixed
          -5 with probability .52           -40 with probability .40
4         -30 with probability .50,         0 for sure                        mixed-zero
          60 with probability .50
Table 2: Examples of binary lotteries from each of the four types.
2.3. Procedure
Participants received extensive instructions regarding the experiment at
the beginning of the first session. All participants received 10 EUR as a guaranteed payment for participation in the research, and could earn more based
on their choices. Participants then worked through several examples of risky
choices to familiarize themselves with the interface of the MouselabWeb software (Willemsen and Johnson, 2011, 2014) that was used to administer the
study. While the same 91 lotteries were used for both experimental sessions,
the item order was randomized. Additionally, the order of the outcomes (top-bottom), the order of the options (A or B), and the orientation (A above B, or A left of B) were randomized and stored during the first session. The exact opposite spatial representation was used in the second session, to mitigate order and presentation effects.
Incentive compatibility was implemented by randomly selecting one lottery at the end of each experimental session and paying the participant according to their choice on the selected item when it was played out with
the stated probabilities. An exchange rate of 10:1 was used between experimental values and payments for choices. Thus, participants earned a fixed
payment of 10 EUR plus one tenth of the outcome of one randomly selected
lottery for each completed experimental session. As a result participants
earned about 30 EUR (approximately 40 USD) on average for participating
in both sessions.
3. Model specification
We use cumulative prospect theory (Tversky and Kahneman, 1992) to
model risk preferences. A two-outcome lottery L (see footnote 1) is valued in utility u(·)
as the sum of its components in which the monetary reward xi is weighted
by a value function v(·) and the associated probability pi by a probability
weighting function w(·). This is shown in Equation 1.
u(L) = v(x1)w(p1) + v(x2)(1 − w(p1))        (1)
Cumulative prospect theory has many possible mathematical specifications. These different functional specifications have been tested and reported
in the literature (see for example Stott, 2006) and the functional forms and
parameterizations we use here are justifiable given the preponderance of findings.
3.1. Value function
In general power functions have been empirically shown to fit choice data
better than many other functional forms at the individual level (Stott, 2006).
Stevens (1957) also cites experimental evidence showing the merit of power
functions in general for modeling psychological processes. Here we use a
power value-function as displayed in Equation 2.
1. Outcome x1, with its associated probability p1, is the outcome with the highest value for lotteries in the positive domain and the outcome with the lowest value for lotteries in the negative domain. The order is irrelevant when lotteries consist of uniform gains or losses; in mixed lotteries, however, it matters. This ordering ensures that a cumulative distribution function is applied to the outcomes systematically. See Tversky and Kahneman (1992) for details.
[Figure 2 plots: left panel, v(x) (subjective value) against x (objective value); right panel, w(p) (subjective probability) against p (objective probability).]
Figure 2: This figure shows a typical value function (left) and probability weighting function (right) from prospect theory. The parameters used to plot the solid lines are from
Tversky and Kahneman (1992) and reflect the empirical findings at the aggregate
level. The dashed lines represent risk-neutral preferences. The solid line for the
value function is concave over gains and convex over losses, and further exhibits
a “kink” at the reference point (in this case the origin) consistent with loss
aversion. The probability weighting function overweights small probabilities
and underweights large probabilities. It is worth noting that at the individual
level, the plots can differ significantly from these aggregate level plots.
v(x) = x^α            if x ≥ 0  (α > 0)
v(x) = −λ(−x)^α       if x < 0  (α > 0, λ > 0)        (2)
The α-parameter controls the curvature of the value function. If the value
of α is below one, there are diminishing marginal returns in the domain of
gains. If the value of α is one, the function is linear and consistent with riskneutrality (i.e., EV maximizing) in decision-making. For α values above one,
there are increasing marginal returns for positive values of x. This pattern
reverses in the domain of losses (values of x less than 0), which is referred to
as the reflection effect.
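Equation 2 translates directly into code. A minimal sketch; the parameter values α = 0.88 and λ = 2.25 are the aggregate-level estimates of Tversky and Kahneman (1992), used here only for illustration:

```python
# Power value function (Equation 2): curvature alpha, loss aversion lam.
def v(x, alpha=0.88, lam=2.25):
    if x >= 0:
        return x ** alpha
    return -lam * (-x) ** alpha

# With lam > 1, a loss looms larger than a gain of the same magnitude:
print(v(10))   # subjective value of gaining 10
print(v(-10))  # subjective value of losing 10 (2.25 times as extreme)
```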
For this paper we use the same value function parameter α for both gains
and losses, and our reasoning is as follows. First, this is parsimonious. Second, when using a power function with different parameters for gains and
losses, loss aversion can no longer be defined as the magnitude of the kink at
the reference point (Köbberling and Wakker, 2005) but would instead need
to be defined contingently over the whole curve.2 This complicates interpretation and undermines clear separability between utility curvature and loss
aversion, and further may cause modeling problems for mixed lotteries that
include an outcome of zero. Problems of induced correlation between the loss
aversion parameter and the curvature of the value function in the negative
domain have furthermore been reported when using αgain ≠ αloss in a power
utility-model (Nilsson et al., 2011).
3.2. Probability weighting function
To capture subjective probability, we use Prelec’s functional specification
(Prelec, 1998; see Figure 5). Prelec’s two-parameter probability weighting
function can accommodate both inverse S-shaped as well as S-shaped weightings and has been shown to fit individual data well (Fehr-Duda and Epper,
2012; Gonzalez and Wu, 1999). Its specification can be found in Equation 3.
w(p) = exp(−δ(−ln(p))^γ),        δ > 0; γ > 0        (3)
The γ parameter controls the curvature of the weighting function. The
psychological interpretation of the curvature of the function is a diminishing sensitivity away from the end points: both 0 and 1 serve as necessary
boundaries and the further from these edges, the less sensitive individuals are
to changes in probability (Tversky and Kahneman, 1992). The δ-parameter
controls the general elevation of the probability weighting function. It is an
index of how attractive lotteries are considered to be (Gonzalez and Wu,
1999) in general and it corresponds to how optimistic (or pessimistic) an
individual decision maker is.
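Equations 1–3 can be combined into a short valuation routine for a two-outcome lottery; a sketch with illustrative parameter values (the α, λ, γ, δ shown are not estimates from this paper):

```python
import math

# Prelec probability weighting function (Equation 3).
def w(p, delta=1.0, gamma=0.65):
    return math.exp(-delta * (-math.log(p)) ** gamma)

# Power value function (Equation 2).
def v(x, alpha=0.88, lam=2.25):
    return x ** alpha if x >= 0 else -lam * (-x) ** alpha

# Lottery utility (Equation 1); x1 must be ordered as in footnote 1.
def u(lottery):
    (x1, p1), (x2, _p2) = lottery
    return v(x1) * w(p1) + v(x2) * (1 - w(p1))

# With delta = 1 the weighting curve crosses the diagonal near p = 1/e:
print(round(w(1 / math.e), 3))  # 0.368, i.e. w(p) is approximately p here
```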
The use of Prelec’s two-parameter weighting function over the original
specification used in cumulative prospect theory (Tversky and Kahneman,
1992) requires additional explanation. In the original specification the point
of intersection with the diagonal changes simultaneously with the shape of
the weighting function, whereas in Prelec’s specification, the line always intersects at the same point if δ is kept constant.3 However, changing the
² It is possible to use α_gain ≠ α_loss by using an exponential utility function (Köbberling and Wakker, 2005).
³ That point is 1/e ≈ 0.37 for δ = 1, which is consistent with some empirical evidence (Gonzalez and Wu, 1999; Camerer and Ho, 1994). The (aggregate) point of intersection is sometimes also found to be closer to 0.5 (Fehr-Duda and Epper, 2012), which is
point of intersection and curvature simultaneously induces a negative correlation with the value function parameter α, because both parameters capture similar characteristics; it also does not allow for a wide variety of individual preferences (see Fehr-Duda and Epper, 2012). Furthermore, the original
specification of the weighting function is not monotonic for γ < 0.279 (Ingersoll, 2008). While that low value is generally not in the range of reported
aggregate parameters (Camerer and Ho, 1994), this non-monotonicity may
become relevant when estimating individual preferences, where there is considerably more heterogeneity.
4. Estimation methods
Individual risk preferences are captured by obtaining a set of parameters
that best fit the observed choices implemented via a particular choice model.
In this section we explain the workings of the simple direct maximization of
the explained fraction (MXF) of choices, of standard maximum likelihood estimation (MLE) and of hierarchical maximum likelihood estimation (HML).
4.1. Direct maximization of explained fraction of choices (MXF)
Arguably the most straightforward way to capture an individual’s preferences is by finding the set of parameters that maximizes the fraction of
explained choices made over all the lotteries considered. By explained choices
we mean the lotteries for which the utility for the chosen option exceeds that
of the alternative option, given some decision model and parameter value(s).
The term “explained” does not imply having established some deeper causal
insights into the decision making process, but is used here simply to refer to
statistical explanation (e.g., a summarization or reproducibility) of choices
given tuned parameters and a decision model. The goal of this MXF method
is to find the set of parameters M = {α, λ, δ, γ} that maximizes the fraction
of explained choices from an individual DM.
M̂ = arg max_M (1/N) Σ_{i=1}^N c(y_i | M)   (4)
consistent with Karmarkar's single-parameter weighting function (Karmarkar, 1979). A two-parameter extension of that function is provided by Goldstein and Einhorn (1987), which appears to fit about as well as both Prelec's and Gonzalez and Wu's functions (Gonzalez and Wu, 1999).
By c(·) we denote the choice itself, as in Equation 5, in which u(·) is given by Equation 1. It returns 1 if the utility given the parameters is consistent with the actual choice of the subject, and 0 otherwise.

c(y_i | M) = 1 if A is chosen and u(A) ≥ u(B), or B is chosen and u(A) ≤ u(B); 0 otherwise   (5)
This is a discrete problem because we use a finite number of binary lotteries. Solution(s) can be found using a grid search (i.e., parameter sweep) approach, combined with a fine-tuning algorithm that uses the best results of the grid search as a starting point and then refines the estimate as needed, improving the precision of the parameter values. Note that this process can be computationally intense for high-resolution estimates, since the full combination of the four parameters over their full ranges has to be taken into consideration.
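The grid-search procedure for MXF can be sketched as follows. The choice encoding and the one-parameter toy utility model in the usage example are hypothetical, chosen only to make the sketch self-contained:

```python
import itertools

def explained_fraction(params, choices, utility):
    """Fraction of binary choices consistent with the model (Equations 4-5).

    choices: list of (option_a, option_b, chose_a) tuples;
    utility(option, params) -> subjective utility under the model.
    """
    hits = 0
    for option_a, option_b, chose_a in choices:
        ua, ub = utility(option_a, params), utility(option_b, params)
        # A choice counts as "explained" when the chosen option's utility
        # is at least as high as that of the alternative.
        if (chose_a and ua >= ub) or (not chose_a and ub >= ua):
            hits += 1
    return hits / len(choices)

def grid_search_mxf(choices, utility, grids):
    """Coarse sweep over the Cartesian product of the parameter grids.

    In practice the best cell would then seed a local fine-tuning
    algorithm to refine the estimate.
    """
    return max(itertools.product(*grids),
               key=lambda p: explained_fraction(p, choices, utility))

# Toy usage: one-parameter power utility over sure gains (hypothetical data).
choices = [(100.0, 50.0, True), (10.0, 40.0, False)]
utility = lambda x, p: x ** p[0]
best = grid_search_mxf(choices, utility, [[0.4, 0.6, 0.8, 1.0]])
print(explained_fraction(best, choices, utility))  # 1.0
```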
The MXF estimation method has some advantages. The explained fraction of choices is a simple metric that is directly related to the actual choices made by a decision maker. It is easy to explain, and it is robust to the presence of a few aberrant choices. This quality prevents the estimates from being unduly affected by a single decision, measurement error, or overfitting.
An important shortcoming of MXF as an estimation method is that the
characteristics of the explained lotteries are not taken into consideration when
fitting the choice data. Consider the lottery at the beginning of the paper
(Table 1), in which the expected value for option A is higher than that for B,
but just barely. Consider a decision maker whom we suspect to be generally risk-neutral; an EV-maximizing policy would then dictate selecting option A over option B. But for argument's sake, suppose we observe the DM select option B; how much evidence does this provide against our
conjecture of risk neutrality? Hardly any, one can conclude, given the small
difference in utility between the options. Now consider the same lottery, but
the payoff for A is $483 instead of $83. Picking B over A is now a huge
mistake for a risk-neutral decision maker and such an observation would
yield stronger evidence that would force one to revise a conjecture about
the decision maker's risk neutrality. MXF as an estimation method does not distinguish between these two instances, as its simplistic right/wrong criterion lacks the nuance to integrate this additional information about options, choices, and the strength of evidence.
Another limitation of MXF is that it results in numerous ties for best-fitting parameters; many parameter sets yield exactly the same fraction of explained choices. For several subjects the same fraction of choices is explained by very different parameters. For one subject, just over seventy percent of choices were explained by parameters ranging from 0.6 to 1.25 for the utility curvature, 1.5 to 6 (and possibly beyond) for loss aversion, 0.44 to 1.3 for the elevation of the weighting function, and 0.8 to 2.5 for its curvature. We find such ties for various constellations of parameters (albeit somewhat smaller ones) even for subjects for whom we can explain up to ninety percent of all choices.
Two other estimation methods (MLE and HML) avoid these problems, and we turn our attention to them next.
4.2. Maximum Likelihood Estimation (MLE)
A strict application of cumulative prospect theory dictates that even a
trivial difference in utility leads the DM to always choose the option with the
highest utility. However, even in early experiments with repeated lotteries,
Mosteller and Nogee (1951) found that this is not the case with real decision
makers. Choices were instead partially stochastic, with the probability of
choosing the generally more favored option increasing as the utility difference
between the options increased. This idea of random utility maximization has
been developed by Luce (1959) and others (e.g. Harless and Camerer, 1994;
Hey and Orme, 1994). In this tradition, we use a logistic function that
specifies the probability of picking one option, depending on each option’s
utility. This formulation is displayed in Equation 6. A logistic function fits the data from Mosteller and Nogee (1951) well, for example, and has been shown to perform well with behavioral data generally (Stott, 2006).
p(A ≻ B) = 1 / (1 + e^((u(B) − u(A))/ϕ)),   ϕ ≥ 0   (6)
The parameter ϕ is an index of the sensitivity to differences in utility. Higher values of ϕ diminish the importance of the difference in utility, as illustrated in Figure 3. It is important to note that this parameter operates on the absolute difference in the utilities assigned to the two options. Two individuals with different risk attitudes but equal sensitivity will almost certainly end up
Figure 3: This figure shows two example logistic functions. Consider a lottery for which option B has a utility of 10 while option A's utility is plotted on the x-axis. Given the utility of B, the probability of picking A over B is plotted on the y-axis. The option with the higher utility is more likely to be selected in general. When both options' utilities are equal, the probability of choosing each option is equivalent. The sensitivity of a DM is reflected in the parameter ϕ. A perfectly discriminating (and perfectly consistent) DM would have a step function as ϕ → 0; less perfect discrimination on behalf of DMs is captured by larger ϕ values.
with different ϕ parameters, simply because the differences in the options' utilities are also determined by the risk attitude. While the parameter is useful to help fit an individual's choice data, one cannot compare values of ϕ across individuals unless both the lotteries and the individuals' risk attitudes are identical. The interpretation of the sensitivity parameter is therefore often ambiguous, and it should be considered simply an aid to the estimation method.
In maximum likelihood estimation the objective is to maximize the likelihood of the observed choices given a set of parameters. The likelihood is expressed using the choice function above, yielding a stochastic specification.
We use the notation M = {α, λ, δ, γ, ϕ}, where y_i denotes the choice for the i-th lottery.

M̂ = arg max_M ∏_{i=1}^N c(y_i | M)   (7)
By c(·) we denote the choice itself, where p(A ≻ B) is given by the logistic choice function in Equation 6.

c(y_i | M) = p(A ≻ B) if A is chosen (y_i = 0); 1 − p(A ≻ B) if B is chosen (y_i = 1)   (8)
This approach overcomes some of the major shortcomings of the methods discussed above. MLE takes the utility difference between the two options into account and therefore does not merely count the choices predicted correctly, but also measures the quality of the fit for each lottery. Ironically, because it is the utility difference that drives the fit (not a count of correct predictions), the resulting best-fitting parameters may explain fewer choices than under the MXF method. This is because MLE tolerates several smaller prediction mistakes instead of fewer, but very large, mistakes. The fact that
the quality of the fit drives the parameter estimates may be beneficial for
predictions, especially when the lotteries have not been carefully chosen and
large mistakes have large consequences. Because the resulting fit (likelihood) has a smoother (but still relatively lumpy) surface, finding robust solutions is computationally less intense than with MXF.
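The stochastic choice rule and the likelihood it induces (Equations 6 to 8) can be sketched as follows. This is a minimal illustration, using the parameterization in which larger ϕ means a noisier decision maker, consistent with the discussion above:

```python
import math

def choice_prob_A(uA, uB, phi):
    """Probability of choosing A over B (Equation 6).

    In this parameterization larger phi means less discrimination between
    utilities; as phi -> 0 the rule approaches a deterministic step function.
    """
    return 1.0 / (1.0 + math.exp((uB - uA) / phi))

def neg_log_likelihood(choices, utilities, phi):
    """Negative log-likelihood of a sequence of binary choices (Equation 8).

    choices: 0 if A was chosen, 1 if B was chosen;
    utilities: matching list of (uA, uB) pairs under some parameter set.
    """
    nll = 0.0
    for y, (uA, uB) in zip(choices, utilities):
        pA = choice_prob_A(uA, uB, phi)
        nll -= math.log(pA if y == 0 else 1.0 - pA)
    return nll

# Equal utilities: either option is chosen with probability one half.
print(choice_prob_A(10.0, 10.0, 1.0))  # 0.5
# Choosing the higher-utility option is more likely, so it costs less NLL.
print(neg_log_likelihood([0], [(12.0, 10.0)], 1.0)
      < neg_log_likelihood([1], [(12.0, 10.0)], 1.0))  # True
```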
On the downside, MLE is not highly robust to aberrant choices. If by
confusion or accident the DM chooses an atypical option for a lottery that is
greatly out of line with her other choices, the resulting parameters may be
disproportionately affected by this single anomalous choice. While MLE does
take into account the quality of the fit with respect to utility discrepancies,
the fitting procedure has the simple goal of finding the parameter set that
generates the best overall fit (even if the differences in fit among parameter
sets are trivially divergent). This approach ignores the fact that there may be
different parameter sets that fit choices virtually the same. Consider subject
A in Figure 8, which is the resulting distribution of parameter estimates when
we change one single choice and reestimate the MLE parameters. Especially
for loss aversion and the elevation of the probability weighting function we
find that only a single choice can make the difference between very different
parameter values.
The MLE method solves the tie-breaking problem of MXF, but does so
by ascribing a great deal of importance to potentially trivial differences in fit quality. Radically different combinations of parameters may be identified by the procedure as nearly equivalently good, and the winning parameter set may differ from the other "almost-as-good" sets by the slightest of margins in fit quality. This process of selecting the winning parameter set from among the space of plausible combinations ignores how reflective the sets may be of the DM's actual preferences. Moreover, the ill-behaved structure of the parameter space makes it a challenge to find reliable results, as different solutions can be nearly identical in fit quality yet be located in radically different regimes of the parameter space. A minor change in the lotteries, or just one different choice, may shift the resulting parameter estimates substantially. Furthermore, the MLE procedure does not consider the psychological interpretation of the parameters: it may select a parameter set that attributes out-of-the-ordinary preferences to a DM with only very little evidence to distinguish them from more typical preferences.
4.3. Hierarchical Maximum Likelihood Estimation (HML)
The HML estimation procedure developed here is based on Farrell and
Ludwig (2008). It is a two-step procedure that is explained below.
Step 1. In the first step the likelihood of occurrence for each of the four
model parameters in the population as a whole is estimated. These likelihoods of occurrence are captured using probability density distributions and
reflect how likely (or conversely, out of the ordinary) a particular parameter
value is in the population given everyone’s choices. These density distributions are found by solving the integral in Equation 9. For each of the four
cumulative prospect theory parameters we use a lognormal density distribution, as this distribution has only positive values, is positively skewed, has
only two distribution parameters, and is not too computationally intense.
Because the sensitivity parameter ϕ is not independent of the other parameters, and because it has no clear psychological interpretation, we do not estimate a distribution of occurrence for sensitivity, but instead take the value
we find for the aggregate data with classical maximum likelihood estimation
for this step.⁴ We use the notation M = {α, λ, δ, γ, ϕ}, P_α = {µ_α, σ_α}, P_λ = {µ_λ, σ_λ}, P_δ = {µ_δ, σ_δ}, P_γ = {µ_γ, σ_γ}, and P = {P_α, P_λ, P_δ, P_γ}; S and I denote the number of participants and the number of lotteries, respectively.

⁴ It may be that a different approach with regard to the sensitivity parameter leads to better performance. We cannot judge this without data from a third re-test, as it requires estimation in a first session, calibration of the method using a second session, and true out-of-sample testing of this calibration in a third session. This may be interesting but is beyond the scope of the current paper.

Figure 4: The two steps of the hierarchical maximum likelihood estimation procedure. In the first step we fit the data to four probability density distributions (α curvature, λ loss aversion, δ elevation, γ shape). In the second step we obtain the actual individual estimates. A priori, in the absence of individual choice data, the small square has the highest likelihood, which is at the modal value of each parameter distribution. The individual choices are balanced against the likelihood of the occurrence of a parameter to obtain the individual parameter estimates, depicted by the large squares.

P̂ = arg max_P ∏_{s=1}^S ∫∫∫∫ [∏_{i=1}^N c(y_{s,i} | M)] ln(α|P_α) ln(λ|P_λ) ln(δ|P_δ) ln(γ|P_γ) dα dλ dδ dγ   (9)
This first step can be explained using a simplified example which is a
discrete version of Equation 9. For each of the four risk preference parameters let us pick a set of values that covers a sufficiently large interval, for
example {0.5, 1, ..., 5}. We then take all parameter combinations (Cartesian
product) of these four sets, i.e. α = 0.5, λ = 0.5, δ = 0.5, γ = 0.5, then
α = 0.5, λ = 0.5, δ = 0.5, γ = 1.0, and so forth. Each of these combinations has a certain likelihood of explaining a subject’s observed choices,
displayed within the square brackets in Equation 9. However, not all of the parameter values in these combinations are equally supported by the data. It could, for example, be that the combination shown above containing γ = 0.5 is 10 times more likely to explain the observed choices than the one with γ = 1.0. How likely each parameter value is relative to other values is precisely what we intend to measure. The fact that one set of parameters is associated with a higher likelihood of explaining the observed choices (and is therefore more strongly supported by the data) is reflected by adjusting the four log-normal density functions of the model parameters. Equation 9 achieves this because the likelihood that a set of parameter values explains the observed choices is multiplied by the weight put on that set through the density functions. The density functions, and the product thereof, denoted by ln(·) within the integrals, are therefore a way to measure the relative weight of particular parameter values. The maximization is done over all participants simultaneously, under the constraint that each ln(·) remains a proper probability density. Applying this step to our data leads to the density distributions displayed in Figure 4.
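The inner four-dimensional integral of Equation 9 for a single subject can be approximated by Monte Carlo integration, as described later in this section. The following is a simplified sketch; the sampling box and the stand-in choice-likelihood function are illustrative assumptions:

```python
import math
import random

def lognorm_pdf(x, mu, sigma):
    """Density of the lognormal distribution used for each CPT parameter."""
    return math.exp(-(math.log(x) - mu) ** 2 / (2 * sigma ** 2)) \
        / (x * sigma * math.sqrt(2 * math.pi))

def subject_integral_mc(choice_lik, pop, n=2500, hi=5.0):
    """Monte Carlo proxy for one subject's integral in Equation 9.

    choice_lik(alpha, lam, delta, gamma): likelihood of the subject's
    observed choices (the bracketed product in Equation 9); pop maps each
    parameter name to its (mu, sigma) pair. Sampling is uniform on (0, hi]^4.
    """
    total = 0.0
    for _ in range(n):
        a, l, d, g = (random.uniform(1e-9, hi) for _ in range(4))
        density = (lognorm_pdf(a, *pop["alpha"]) * lognorm_pdf(l, *pop["lambda"])
                   * lognorm_pdf(d, *pop["delta"]) * lognorm_pdf(g, *pop["gamma"]))
        total += choice_lik(a, l, d, g) * density
    # Mean integrand value times the volume of the sampling box.
    return total / n * hi ** 4

# Sanity check with a constant choice likelihood: the integral then equals
# the lognormal probability mass inside the box, which is close to 1.
random.seed(0)
pop = {k: (0.0, 0.5) for k in ("alpha", "lambda", "delta", "gamma")}
estimate = subject_integral_mc(lambda *m: 1.0, pop)
print(0.2 < estimate < 2.0)  # True
```

As in the paper, the precise value of the integral matters less than the population-density parameters that maximize it; averaging repeated runs reduces the Monte Carlo noise.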
Step 2. In this step we estimate individual parameters as in standard maximum likelihood estimation, but now weigh the parameters by their likelihood of occurrence as given by the density functions obtained in step 1. Both steps and the resulting distributions are illustrated in Figure 4. This second step of the procedure is described in Equation 10, in which M_i denotes M as above for subject i. The resulting parameters are driven not only by the individual's decision data, but also by how
likely it is that such a parameter combination occurs in the population.
"N
#
Y
di = arg max
cα ) ln(λ|P
cλ ) ln(δ|P
cδ ) ln(γ|P
cγ )
M
c(yi |Mi ) ln(α|P
(10)
Mi
This HML procedure is motivated by the principle that extreme conclusions require extreme evidence. In the first step of the HML procedure, one simultaneously extracts density distributions of all parameter values using all
choice data from the population of DMs. In the second step, the likelihood
of a particular parameter value is determined by how well it fits the observed choices from an individual, and at the same time, it is weighted by
the likelihood of observing that particular parameter value in the population
distribution (defined by the density function obtained in step 1). The parameter set with the most likely combination of both of those elements is selected.
The weaker the evidence is, the more parameter estimates are “pulled” to
the center of the density distributions. HML therefore counters maximum
likelihood estimation’s tendency to fit parameters to extreme choices by requiring stronger evidence to establish extreme parameter estimates. Thus,
if a DM makes very consistent choices, the population parameter densities
will have very little effect on the estimates. Conversely, if a DM has greater
inconsistency in his choices, the population parameters will have more pull
on the individual estimates. In the extreme, if a DM has wildly inconsistent
choices, the best parameter estimates for the DM will simply be those for
the population average.
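In code, the step-2 objective (Equation 10, in negative-log form) is simply the individual's negative log-likelihood plus a penalty derived from the population densities. The dictionary-based interface below is an illustrative assumption, not the authors' implementation:

```python
import math

def neg_log_lognorm(x, mu, sigma):
    """Negative log of the lognormal density fitted in step 1."""
    return math.log(x * sigma * math.sqrt(2.0 * math.pi)) \
        + (math.log(x) - mu) ** 2 / (2.0 * sigma ** 2)

def hml_objective(params, subject_nll, pop):
    """Negative log of Equation 10, to be minimized over params.

    subject_nll(params): the individual's negative log-likelihood (as in MLE);
    pop: fitted (mu, sigma) pair per parameter from step 1. Parameter values
    that are rare in the population worsen the objective, "pulling" weakly
    identified estimates toward the population mode.
    """
    penalty = sum(neg_log_lognorm(params[name], *pop[name])
                  for name in ("alpha", "lambda", "delta", "gamma"))
    return subject_nll(params) + penalty

# With totally uninformative choice data (constant likelihood), the objective
# is minimized at the mode of each population density, exp(mu - sigma**2).
pop = {k: (0.0, 0.5) for k in ("alpha", "lambda", "delta", "gamma")}
flat = lambda params: 0.0
mode = {k: math.exp(-0.25) for k in pop}
other = {k: 3.0 for k in pop}
print(hml_objective(mode, flat, pop) < hml_objective(other, flat, pop))  # True
```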
This is illustrated in Figure 5, which compares the estimates for four representative participants under MLE (points) and HML (squares). Subject 1 is the same subject as in Figure 8. We find that the HML estimates differ from the MLE estimates, but generally remain within the 95% confidence intervals. This is the statistical tie-breaking mechanism we referred to in the previous section. It is also possible that the differences in estimates are large and fall outside the univariate MLE confidence intervals (displayed by lines), as for subject 2. For the other two participants the HML estimates are virtually identical, irrespective of the confidence intervals. These differences are largely driven by how consistently a DM's choices support particular parameter values.
Figure 5: Parameter estimates for four participants, with each row a subject and each column a parameter (α, γ, δ, λ). The lines are approximate univariate 95% confidence intervals under MLE for each of the four model parameters (sensitivity not included). The round markers indicate the best estimates from the maximum likelihood estimation procedure; the square markers indicate their hierarchical counterparts. The first subject is subject A in Figure 8. For participants one and two we observe large differences in parameter estimates, partially explained by the large confidence intervals; for participants three and four the results are virtually identical.

The estimation method has been implemented in both MATLAB and C. Parameter estimates in the second step were found using the simplex algorithm of Nelder and Mead (1965), initialized with multiple starting points to avoid local minima. Negative log-likelihood was used for computational stability when possible. For the first step of the hierarchical method one cannot use the negative log-likelihood for the whole equation due to the presence of the integral, so it was used only for the product across participants. Solving the volume integral is computationally intense because of the high dimensionality and the low probability values. Quadrature methods were found to be prohibitively slow. Because the
integral-maximizing parameters are what is of central importance, and not
the precise value of the integral, we used Monte Carlo integration with 2,500
(uniform random) values for each dimension as a proxy. We repeated this
twenty times and used the average of these estimates as the final distribution
parameters. A possible benefit of HML is that the step 1 distribution appears
to be fairly stable. With a sufficiently diverse population it may therefore be
possible to simply take the distributions found in previous studies, such as
this one. The parameter values can be found in the appendix.
5. Results

5.1. Aggregate results
The aggregate data can be analyzed to establish stylized facts and general
behavioral tendencies. In this analysis, all 12,922 choices (142 participants ×
91 items) are modeled as if they were made by a single DM and one set of risk
preference parameters is estimated. This preference set is estimated using
each of the three different estimation methods discussed before (MXF, MLE,
HLM). For MXF, four parameters are estimated that maximize the fraction
of explained choices; for the other two methods, five parameters (the fifth
value is the sensitivity parameter ϕ) are estimated for each method.
The results of the three estimation procedures, including confidence intervals, can be found in Table 3. On the aggregate, we observe behavior consistent with curvature parameters implying diminishing sensitivity to marginal returns; α values less than 1 are estimated for all of the methods, namely 0.59 for MXF and 0.76 for both MLE and HML. Choice behavior is also consistent with some loss aversion (λ > 1), which reflects the notion that "losses loom larger than gains" (Kahneman and Tversky, 1979). The highest loss-aversion value we find for any of the estimation methods is 1.12, which is much lower than the value of 2.25 reported in the seminal cumulative prospect theory paper (Tversky and Kahneman, 1992). Part of the explanation for this lower value (indicating less loss aversion) might be that, because it is verboten to take money from participants in the laboratory, "losses" were bounded to reductions in the participants' show-up payment. The probability weighting function has the characteristic inverse S-shape that intersects the diagonal at around p = 0.44 for MLE and HML (and stays above it for MXF). Similar values for all of the estimation methods are found in the second experimental session. The largest differences between the two experimental test-retest sessions are seen in the probability weighting parameters.
                 α              λ              δ              γ              ϕ
Time 1  MXF      0.59           1.11           0.71           0.90           –
        MLE      0.76 (±0.03)   1.12 (±0.04)   0.93 (±0.03)   0.63 (±0.02)   0.25 (±0.04)
        HML      0.76 (±0.02)   1.12 (±0.03)   0.93 (±0.02)   0.63 (±0.02)   0.26 (±0.03)
Time 2  MXF      0.58           1.11           0.71           0.86           –
        MLE      0.78 (±0.03)   1.18 (±0.04)   0.91 (±0.03)   0.65 (±0.02)   0.23 (±0.03)
        HML      0.78 (±0.02)   1.18 (±0.03)   0.91 (±0.02)   0.65 (±0.02)   0.23 (±0.03)

Table 3: Median parameter values for the aggregate data found using all three estimation methods. Approximate univariate 95% confidence intervals were constructed using log-likelihood ratio statistics for the MLE and HML methods.
With the choice function and obtained value of ϕ we can convert the
parameter estimates of the first experimental session for MLE (and HML,
since its aggregate estimates are virtually identical) into a prediction of the
probability of picking option B over option A for each lottery in the second
experimental session. This is then compared to the fraction of subjects picking option B over option A. On the aggregate, cumulative prospect theory’s
predictions appear to perform well. The correlation between the predicted
and observed fractions is 0.93.
5.2. Individual risk preference estimates
When benchmarking the explained fraction and fit of individual parameters against their estimated aggregate median counterparts, we find that individual parameters provide clear benefits, especially in terms of the fit. Having established that individual parameters add value, a
more direct comparison between MXF, MLE and HML is warranted. Particularly for HML it is important to determine to what degree its parameter
estimates sacrifice in-sample performance, which is the performance of the
individual parameters of the first session within that same session. Individual risk preference parameters were estimated for each subject using all three
estimation methods. Fits are expressed in negative log-likelihood (lower is
better) and are directly comparable because the hierarchical component is
excluded in the evaluation of the likelihood value. The mean and standard
deviation of the subject’s fits are displayed in Table 4. As to be expected, the
log-likelihood is lowest for MLE because that method attempts to find the
set of parameters with the best possible fit. HML is not far behind, however.
No likelihood is given for MXF, as that method does not use the sensitivity
parameter. We also determined the explained fraction of choices for each
individual. The means and standard deviations are also printed in Table 4.
At 0.82, the explained fraction of choices is highest for MXF, because its objective is to maximize that fraction. MLE and HML follow at about 0.76. Thus, the percentage of choices that can be explained at time 1 by MLE estimates but not by HML estimates is small to none. For some participants the HML estimates even explain a larger fraction of choices at time 1. While this seems counterintuitive, it is the fit (likelihood) that ultimately drives the MLE parameter estimates, and not the fraction
of explained choices. For most participants the HML estimates are different from the MLE estimates, but the differences are inconsequential for the
percentage of explained choices.
Figure 6: The predicted fraction of choices at time 2 when using MLE parameters from
time 1 (x-axis) and HML parameters from time 1 (y-axis). Each point represents
one DM. MLE performs better for a subject whose point appears in the lower
right region, HML performs better when it appears in the upper left region.
Figure 7: The goodness of fit at time 2 when using MLE parameters from time 1 (x-axis)
and HML parameters from time 1 (y-axis). Each point represents one DM. MLE
performs better for a DM whose point appears in the lower right region, HML
performs better when it appears in the upper left region.
                   Explained (Time 1)   Predicted (Time 2)
Fraction  MXF      0.82 (0.07)          0.74 (0.10)
          MLE      0.76 (0.09)          0.73 (0.09)
          HML      0.76 (0.08)          0.73 (0.10)
Fit       MLE      83.73 (21.15)        103.24 (26.66)
          HML      86.37 (21.81)        98.97 (23.66)

Table 4: Mean and standard deviation (in parentheses) for the three estimation methods. "Explained" refers to time 1 statistics using time 1 estimates; "predicted" refers to time 2 statistics using parameters from time 1. The explained fraction of choices is about the same across methods, while the fit at time 2 is better for HML.
So far we have established that the HML estimates do not hurt the fraction of choices accounted for, even though they have a somewhat lower fit within-sample; the next question is whether HML is worth the effort in terms of out-of-sample prediction. The estimates of the individual parameters of the
first session were used to predict choices in the second session. In Table 4 we
have listed simple statistics for the fit (again comparable by not including the
hierarchical component) and the explained fractions of all three estimation
methods for the second session. All three methods explain approximately the
same fraction of choices. The large advantage MXF had over the other two
methods has disappeared. It is interesting to compare this with the prediction that participants make the same choices in both sessions. That method
correctly predicts .71 on average, but its application is of course limited to
situations in which the lotteries are identical. Figure 6 shows the fraction
of correctly predicted lotteries and Figure 7 compares the fit for MLE and
HML parameters. Judging by the fraction of lotteries explained, it is not
immediately obvious which method is better. A more sensitive measure to
compare the three methods is the goodness-of-fit. When comparing HML
to MLE, we find that HML’s out-of-sample fit is better on average and its
negative log likelihood is lower for 100 out of 142 participants. Using a paired
sign rank test for medians we find that the fit for HML is significantly lower
than for MLE (p < 0.001). The results are much clearer when examining the
fit in the figure too, which shows many points on the diagonal and a large
number below it (a lower negative log likelihood is better). We can conclude
that HML sacrificed part of the fit at time 1, for an improved fit at time 2.
Examination of the differences in parameter estimates for those participants for whom HML outperforms MLE reveals higher mean MLE estimates for the individual probability weighting function parameters. To verify that HML did not work merely because MLE returned extremely high parameter estimates for the weighting function, we applied the simple solution of imposing (somewhat arbitrarily chosen) bounds on the MLE estimation method (0.2 ≤ α ≤ 2, 0.2 ≤ λ ≤ 5, 0.2 ≤ δ ≤ 2, 0.2 ≤ γ ≤ 1). This did not
change the results, as HML still outperformed MLE. In many instances HML
is also an improvement over MLE for seemingly plausible MLE estimates, so
HML’s effect is not limited to outliers or extreme parameter values. Interestingly, some of the cases in which MLE outperforms HML are those in which
low extreme values emerge that are consistent with choices in the second
experimental session, such as loss seeking (less weight on losses) or weighting functions that are nearly binary step functions. This lack of sensitivity
highlights one of the potential downsides of HML.
5.3. Parameter reliability
A key issue is the reliability of the risk parameter estimates. This
viewpoint is consistent with treating risk preference as a trait: a stable
characteristic of an individual decision maker that is amenable to
psychological interpretation of the parameters. Reliability can be assessed
by applying all three estimation procedures independently to the data from
the two experimental sessions. Large changes in parameter values are a
problem and an indication that we are mostly measuring and modeling noise
instead of real individual risk preferences. At the individual level we can
examine the test-retest correlations of the parameters from the three
estimation methods, displayed in Table 5. The parameter correlations for MXF
indicate some reliability, with correlations of approximately 0.40, except
for δ, which approaches zero. MLE provides more reliable estimates for δ
than MXF, but performs worse overall. The test-retest reliability of MLE can
be improved for three out of four parameters by applying admittedly arbitrary
bounds. The highest test-retest correlations are obtained using HML, which
yields the most reliable estimates for all parameters.
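A test-retest correlation such as those in Table 5 is simply the Pearson correlation between the two sessions' estimates of one parameter across subjects. A minimal sketch, with made-up estimates for five subjects (the real values come from fitting each session independently):

```python
# Hypothetical session-1 and session-2 estimates of one parameter (e.g., gamma)
# for five subjects; illustrative numbers only.
s1 = [0.55, 0.70, 0.42, 0.90, 0.61]
s2 = [0.58, 0.66, 0.50, 0.85, 0.70]

def pearson(x, y):
    # Plain Pearson correlation coefficient.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

r = pearson(s1, s2)
print(round(r, 3))
```

A high r means subjects keep their relative standing on the parameter across sessions, which is the sense of "reliability" used here.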
Another form of parameter reliability concerns the change in parameter
estimates resulting from small changes in the choice input. One way of
establishing this form of reliability is to temporarily change a subject's
choice on one of the lotteries and then reestimate the parameters; this is
repeated for each of the 91 lotteries separately. MXF's and HML's estimates
are robust, whereas MLE's estimates are less so. In Figure 8 the results of
reestimation after perturbing one choice are plotted for two subjects using
both MLE and HML. A single different choice can yield large changes in MLE
parameter estimates, and this contributes to unreliable parameter estimates.
This is an undesirable property of a measurement method, as minor
inattention, confusion, or a mistake can lead to very different conclusions
about a person's risk preferences. The robustness of HML estimates is a
general finding among the subjects in our sample. Moreover, the fragility of
the MLE estimates quickly becomes worse as the number of aberrant choices
increases by only a few more.

                   rα      rλ      rδ      rγ
  MXF             0.39    0.40    0.05    0.39
  MLE             0.23    0.36    0.19    0.06
  MLE (bounded)   0.28    0.54    0.10    0.45
  HML             0.43    0.53    0.44    0.66

Table 5: Test-retest correlation coefficients (r values) for parameter values
obtained by applying each estimation procedure to both sessions
independently. The use of parameter bounds (0.2 ≤ α ≤ 2, 0.2 ≤ λ ≤ 5,
0.2 ≤ δ ≤ 2, 0.2 ≤ γ ≤ 1), indicated by the bounded condition, improves the
test-retest correlations for MLE. The highest test-retest correlations are
obtained with HML.
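The single-choice perturbation check can be sketched as follows. Here `estimate` is a stand-in for the real MLE or HML fitting routine (it just returns the observed choice proportion so the loop is runnable); the choice vector is hypothetical:

```python
def estimate(choices):
    # Stand-in for the real parameter-estimation routine; returns the
    # proportion of A-choices instead of fitted CPT parameters.
    return sum(choices) / len(choices)

choices = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]  # hypothetical binary choices (A=1, B=0)
baseline = estimate(choices)

perturbed = []
for i in range(len(choices)):
    flipped = list(choices)
    flipped[i] = 1 - flipped[i]          # temporarily flip one choice
    perturbed.append(estimate(flipped))  # re-estimate and record the result

spread = max(perturbed) - min(perturbed)
print(baseline, round(spread, 2))
```

In the real analysis the histogram of the re-estimated parameters (as in Figure 8) shows how far a single flipped choice can move each estimate; a tight, unimodal spread around the baseline indicates a robust method.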
[Figure 8 about here: histograms of re-estimated parameter values (α, λ, δ,
γ; number of results out of 91 single-choice perturbations) in four panels:
Subject A − MLE, Subject A − HML, Subject B − MLE, Subject B − HML.]

Figure 8: Parameter estimates for two sample subjects when a single choice
is changed for each of the 91 items. The black line corresponds to the best
estimate of the original choices. The bimodal pattern we observe for loss
aversion and the weighting function for subject A under MLE is not a
desirable feature: a momentary lapse of attention or a minor mistake may
result in a very different parameter combination. HML produces more reliable
estimates, in the sense of being less sensitive to one aberrant choice,
compared with MLE.
6. Discussion
In this paper we illustrated the merits of a hierarchical maximum likelihood
(HML) procedure for estimating individual risk parameters. The HML estimates
did reduce (albeit trivially) the in-sample performance, but the
out-of-sample fit was significantly better than that of MLE and the
explained fraction was as high as that of any other method. The test-retest
reliability of the parameter estimates was furthermore higher for HML than
for any other method. We conclude that HML offers advantages over
maximization of the explained fraction and over maximum likelihood
estimation, and is the preferred estimation method for this choice model.
The use of population information in establishing individual estimates can
improve both the out-of-sample fit and the reliability of the parameter
estimates.
Some caveats are in order, however. The benefits of hierarchical modeling
may, for example, disappear when more choice data are available. In cases
where many choices have been observed, the signal can more easily be teased
apart from the noise, and hierarchical methods may become unwarranted. The
benefits may also depend on the set of lotteries used. For lotteries
specifically designed for the tested model, which allow only a narrow range
of potential parameter values, the choices alone may be sufficient to
reliably estimate individual risk preference parameters, and hence the MXF
estimates may be more reliable than in our current results. As researchers,
however, we do not always control the set of lottery items, especially
outside of the laboratory. Furthermore, cumulative prospect theory, in the
parameterization we have chosen here, is but one model. The possibility
exists that other models fit the choice data better and that the resulting
parameter estimates are highly robust over time; in that case there may be
no added value in hierarchical parameter estimation.
In this paper we have not discussed alternative hierarchical estimation
methods. Bayesian models have gained attention in the literature recently
(Zyphur and Oswald, 2013). The frequentist approach is used here because it
requires the addition of only one term to the likelihood function, given
that the hierarchical distribution is known; our data indicate that this
distribution is fairly stable in a population. Nevertheless, a Bayesian
hierarchical estimation method may outperform the frequentist method
outlined here. Additionally, the hierarchical method we have applied is of a
relatively simple nature. It is possible to modify the extent to which the
hierarchical method weights the individual risk preference estimates, and
there may be more sophisticated ways to do so. One way is to fit a parameter
that moderates the weight of the hierarchical component by maximizing the
fit in the retest. Determining the advantages of such a method requires an
experiment with three sessions: one to obtain parameter estimates, another
to calibrate the moderating parameter, and a third to verify that it
outperforms the other methods. This may be of both theoretical and practical
interest, but it is beyond the scope of this paper.
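The frequentist idea of "one extra term in the likelihood" can be sketched as follows: the objective minimized for each individual is the choice negative log likelihood plus the negative log density of that individual's parameters under the population distribution. The hyperparameters, the two-parameter setup, and the untransformed normal densities below are all illustrative simplifications, not the paper's exact specification:

```python
import math

def normal_logpdf(x, mu, sigma):
    # Log density of a normal distribution.
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def hml_objective(nll_individual, params, hyper):
    """Negative penalized log likelihood: the individual's choice NLL plus
    a penalty equal to minus the log density of the parameters under the
    population (hierarchical) distribution."""
    penalty = -sum(normal_logpdf(p, mu, sd) for p, (mu, sd) in zip(params, hyper))
    return nll_individual + penalty

# Illustrative hyperparameters (mu, sigma) for two parameters, and two
# candidate parameter sets that fit the choices equally well (same NLL = 50).
hyper = [(0.88, 0.2), (1.5, 0.8)]
near = hml_objective(50.0, [0.9, 1.6], hyper)   # typical of the population
far = hml_objective(50.0, [2.0, 4.5], hyper)    # extreme relative to it
print(near < far)  # the population term breaks the tie toward typical values
```

This is exactly the tie-breaking described in the abstract: when two parameter constellations fit an individual's choices (nearly) equally well, the one more plausible in the population wins.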
One major take-away from the current paper is that the estimation method can
greatly affect the inferences one makes about individual risk preferences,
particularly in multi-parameter choice models. A number of different
parameter combinations may produce virtually the same fit. When the
interpretation of individual parameter values is used to make predictions
about what an individual is like (e.g., psychological traits) or what she
will do in the future (out-of-sample prediction), it is important to realize
that other constellations of parameters may be virtually equally plausible
in terms of fit but lead to vastly different conclusions about individual
differences and predictions about behavior. Moreover, using more
sophisticated estimation methods can help extract more signal from a set of
noisy choices and thus support better measurement of innate risk
preferences.
7. Online implementation
Implementing the measurement and estimation methods delineated in this paper
requires some investment of time, effort, and technical expertise; this may
act as a barrier to the wider adoption of the measure and method. One can
imagine researchers who are interested in establishing reliable measures of
risk preferences at the individual level, even though these risk preferences
are not of central interest to their research programs. Nonetheless, these
individual differences are worth some attention, as they may covary with
other variables or psychological processes under scrutiny. To facilitate
individual-level risk measurement, we have developed an online version of
the risky decision task used in this research. The URL
http://vlab.ethz.ch/prospects contains a landing page, followed by the 91
binary lotteries (see the Appendix), presented in randomized order, to
elicit a decision maker's choices over well-defined lotteries. The DM is
shown each of the binary lotteries one at a time, and the computer records
each choice as well as its response time. After the DM has registered all of
her choices, the individual's parameters are estimated (using the HML
method) and the results are displayed (and also emailed to the researcher).
Further, researchers have full access to the raw data should they wish to
run additional analyses or modeling on the choice data from the individual
decision makers. This configuration can be used in a behavioral laboratory
with internet-connected computers, and most naturally as a pre- or
post-test, in sequence with other research tasks.
Acknowledgments
The authors wish to thank Michael Schulte-Mecklenbeck, Thorsten Pachur, and
Ralph Hertwig for the use of the experimental data analyzed in this paper.
These data are a subset from a large and thorough risky choice study
conducted at the Max Planck Institute. Thanks also to Helga Fehr-Duda and
Thomas Epper for helpful comments and suggestions. This work was funded in
part by the ETH Risk Center CHIRP Research Project. Any errors are the
responsibility of the authors.
References
Abdellaoui, M., 2000. Parameter-free elicitation of utility and probability
weighting functions. Management Science 46 (11), 1497–1512.
Birnbaum, M. H., Chavez, A., 1997. Tests of theories of decision making:
Violations of branch independence and distribution independence. Organizational Behavior and Human Decision Processes 71, 161–194.
Camerer, C., Ho, T.-H., 1994. Violations of the betweenness axiom and nonlinearity in probability. Journal of Risk and Uncertainty 8, 167–196.
Farrell, S., Ludwig, C., 2008. Bayesian and maximum likelihood estimation
of hierarchical response times. Psychonomic Bulletin and Review 15 (6),
1209–1217.
Fehr-Duda, H., de Gennaro, M., Schubert, R., 2006. Gender, financial risk,
and probability weights. Theory and Decision 60 (2-3), 283–313.
Fehr-Duda, H., Epper, T., 2012. Probability and risk: Foundations and economic implications of probability-dependent risk preferences. Annual Review of Economics 4, 567–593.
Figner, B., Murphy, R. O., 2010. Using skin conductance in judgment and
decision making research. New York, NY: Psychology Press, pp. 163–184.
Gaechter, S., Johnson, E. J., Herrmann, A., 2007. Individual-level loss aversion in risky and riskless choice. Working Paper, University of Nottingham.
Gigerenzer, G., Gaissmaier, W., 2011. Heuristic decision making. Annual
review of psychology 62, 451–482.
Glöckner, A., Pachur, T., 2012. Cognitive models of risky choice: Parameter
stability and predictive accuracy of prospect theory. Cognition 123, 21–32.
Goldstein, W., Einhorn, H., 1987. Expression theory and the preference reversal phenomena. Psychological Review 94 (2), 236–254.
Gonzalez, R., Wu, G., 1999. On the shape of the probability weighting function. Cognitive Psychology 38, 129–166.
Harless, D. W., Camerer, C. F., 1994. The predictive utility of generalized
expected utility theories. Econometrica 62 (6), 1251–1290.
Hey, J. D., Orme, C., 1994. Investigating generalizations of the expected
utility theory using experimental data. Econometrica 62 (6), 1291–1326.
Holt, C. A., Laury, S. K., 2002. Risk aversion and incentive effects. The
American Economic Review 92 (5), 1644–1655.
Ingersoll, J., 2008. Non-monotonicity of the Tversky-Kahneman probability-weighting function: A cautionary note. European Financial Management
14 (3), 385–390.
Kahneman, D., Tversky, A., 1979. Prospect theory: An analysis of decision
under risk. Econometrica 47, 263–291.
Kahneman, D., Tversky, A. (Eds.), 2000. Choices, values and frames. Cambridge University Press, Cambridge, UK, pp. 1–16.
Karmarkar, U., 1979. Subjectively weighted utility and the Allais paradox.
Organizational Behavior and Human Performance 24, 67–72.
Köbberling, V., Wakker, P., 2005. An index of loss aversion. Journal of Economic Theory 122, 119–131.
Lichtenstein, S., Slovic, P., 1971. Reversals of preference between bids and
choices in gambling decisions. Journal of Experimental Psychology 89 (1),
46–55.
Lopes, L. L., 1983. Some thoughts on the psychological concept of risk. Journal
of Experimental Psychology: Human Perception and Performance 9, 137–
144.
Luce, R. D., 1959. Individual choice behavior: A theoretical analysis. New
York, NY: Wiley.
Mosteller, F., Nogee, P., 1951. An experimental measurement of utility. The
Journal of Political Economy 59, 371–404.
Nelder, J. A., Mead, R., 1965. A simplex method for function minimization.
Computer Journal 7, 308–313.
Nilsson, H., Rieskamp, J., Wagenmakers, E.-J., 2011. Hierarchical Bayesian
parameter estimation for cumulative prospect theory. Journal of Mathematical Psychology 55, 84–93.
Prelec, D., 1998. The probability weighting function. Econometrica 66, 497–
527.
Rabin, M., 2000. Risk aversion and expected-utility theory: A calibration
theorem. Econometrica 68, 1281–1293.
Rieskamp, J., 2008. The probabilistic nature of preferential choice. Journal
of Experimental Psychology: Learning, Memory and Cognition 34 (6),
1446–1465.
Roberts, S., Pashler, H., 2000. How persuasive is a good fit? A comment on
theory testing. Psychological Review 107 (2), 358–367.
Scheibehenne, B., Pachur, T., 2013. Hierarchical Bayesian modeling: Does
it improve parameter stability? In: Knauff, M., Pauen, M., Sebanz, N.,
Wachsmuth, I. (Eds.), Proceedings of the 35th Annual Conference of the
Cognitive Science Society. Austin TX: Cognitive Science Society, pp. 1277–
1282.
Schulte-Mecklenbeck, M., Pachur, T., Murphy, R. O., Hertwig, R., 2014.
Does prospect theory capture psychological processes? Working Paper,
MPI Berlin.
Starmer, C., 2000. Developments in non-expected utility theory: The
hunt for a descriptive theory of choice under risk. Journal of Economic
Literature 38, 332–382.
Stevens, S. S., 1957. On the psychophysical law. The Psychological Review
64 (3), 153–181.
Stott, H., 2006. Cumulative prospect theory’s functional menagerie. Journal
of Risk and Uncertainty 32, 101–130.
Tanaka, T., Camerer, C. F., Nguyen, Q., 2010. Risk and time preferences:
Linking experimental and household survey data from Vietnam. American
Economic Review 100 (1), 557–571.
Tversky, A., Kahneman, D., 1992. Advances in prospect theory: Cumulative
representations of uncertainty. Journal of Risk and Uncertainty 5, 297–323.
Wakker, P., 2005. Formalizing reference dependence and initial wealth in
rabin’s calibration theorem. Working paper, Erasmus University, available
at http://people.few.eur.nl/wakker/pdf/calibcsocty05.pdf.
Wetzels, R., Vandekerckhove, J., Tuerlinckx, F., Wagenmakers, E.-J., 2010.
Bayesian parameter estimation in the expectancy valence model of the
Iowa gambling task. Journal of Mathematical Psychology 54, 14–27.
Willemsen, M., Johnson, E., 2014.
URL http://www.mouselabweb.org
Willemsen, M. C., Johnson, E. J., 2011. Visiting the decision factory: Observing cognition with mouselabweb and other information acquisition methods. A handbook of process tracing methods for decision research: A critical review and users guide, 21–42.
Zeisberger, S., Vrecko, D., Langer, T., 2010. Measuring the time stability of
prospect theory preferences. Theory and Decision 72, 359–386.
Zyphur, M. J., Oswald, F. L., 2013. Bayesian probability and statistics in
management research: A new horizon. Journal of Management 39 (1),
5–13.
Appendix A: Lotteries

Item  p(A1)    A1  p(A2)    A2  p(B1)    B1  p(B2)    B2
  1    .34     24   .66     59   .42     47   .58     64
  2    .88     79   .12     82   .20     57   .80     94
  3    .74     62   .26      0   .44     23   .56     31
  4    .05     56   .95     72   .95     68   .05     95
  5    .25     84   .75     43   .43      7   .57     97
  6    .28      7   .72     74   .71     55   .29     63
  7    .09     56   .91     19   .76     13   .24     90
  8    .63     41   .37     18   .98     56   .02      8
  9    .88     72   .12     29   .39     67   .61     63
 10    .61     37   .39     50   .60      6   .40     45
 11    .08     54   .92     31   .15     44   .85     29
 12    .92     63   .08      5   .63     43   .37     53
 13    .78     32   .22     99   .32     39   .68     56
 14    .16     66   .84     23   .79     15   .21     29
 15    .12     52   .88     73   .98     92   .02     19
 16    .29     88   .71     78   .29     53   .71     91
 17    .31     39   .69     51   .84     16   .16     91
 18    .17     70   .83     65   .35    100   .65     50
 19    .91     80   .09     19   .64     37   .36     65
 20    .09     83   .91     67   .48     77   .52      6
 21    .44     14   .56     72   .21      9   .79     31
 22    .68     41   .32     65   .85    100   .15      2
 23    .38     40   .62     55   .14     26   .86     96
 24    .62      1   .38     83   .41     37   .59     24
 25    .49     15   .51     50   .94     64   .06     14
 26    .16    -15   .84    -67   .72    -56   .28    -83
 27    .13    -19   .87    -56   .70    -32   .30    -37
 28    .29    -67   .71    -28   .05    -46   .95    -44
 29    .82    -40   .18    -90   .17    -46   .83    -64
 30    .29    -25   .71    -86   .76    -38   .24    -99
 31    .60    -46   .40    -21   .42    -99   .58    -37
 32    .48    -15   .52    -91   .28    -48   .72    -74
 33    .53    -93   .47    -26   .80    -52   .20    -93
 34    .49     -1   .51    -54   .77    -33   .23    -30
 35    .99    -24   .01    -13   .44    -15   .56    -62
 36    .79    -67   .21    -37   .46      0   .54    -97
 37    .56    -58   .44    -80   .86    -58   .14    -97
 38    .63    -96   .37    -38   .17    -12   .83    -69
 39    .59    -55   .41    -77   .47    -30   .53    -61
 40    .13    -29   .87    -76   .55   -100   .45    -28
 41    .84    -57   .16    -90   .25    -63   .75    -30
 42    .86    -29   .14    -30   .26    -17   .74    -43
 43    .66     -8   .34    -95   .93    -42   .07    -30
 44    .39    -35   .61    -72   .76    -57   .24    -28
 45    .51    -26   .49    -76   .77    -48   .23    -34
 46    .73    -73   .27    -54   .17    -42   .83    -70
 47    .49    -66   .51    -92   .78    -97   .22    -34
 48    .56     -9   .44    -56   .64    -15   .36    -80
 49    .96    -61   .04    -56   .34     -7   .66    -63
 50    .56     -4   .44    -80   .04    -46   .96    -58
 51    .43    -91   .57     63   .27    -83   .73     24
 52    .06    -82   .94     54   .91     38   .09    -73
 53    .79    -70   .21     98   .65    -85   .35     93
 54    .37     -8   .63     52   .87     23   .13    -39
 55    .61     96   .39    -67   .50     71   .50    -26
 56    .43    -47   .57     63   .02    -69   .98     14
 57    .39    -70   .61     19   .30      8   .70    -37
 58    .59   -100   .41     81   .47    -73   .53     15
 59    .92    -73   .08     96   .11     16   .89    -48
 60    .89    -31   .11     27   .36     26   .64    -48
 61    .86    -39   .14     83   .80      8   .20    -88
 62    .74     77   .26    -23   .67     75   .33     -7
 63    .91    -33   .09     28   .27      9   .73    -67
 64    .93     75   .07    -90   .87     96   .13    -89
 65    .99     67   .01     -3   .68     74   .32     -2
 66    .48     58   .52     -5   .40    -40   .60     96
 67    .07    -55   .93     95   .48    -13   .52     99
 68    .97    -51   .03     30   .68    -89   .32     46
 69    .86    -26   .14     82   .60    -39   .40     31
 70    .88    -90   .12     88   .80    -86   .20     14
 71    .87    -78   .13     45   .88    -69   .12     83
 72    .96     17   .04    -48   .49    -60   .51     84
 73    .38    -49   .62      2   .22     19   .78    -18
 74    .28    -59   .72     96   .04     -4   .96     63
 75    .50     98   .50    -24   .14    -76   .86     46
 76    .50    -20   .50     60   .50      0   .50      0
 77    .50    -30   .50     60   .50      0   .50      0
 78    .50    -40   .50     60   .50      0   .50      0
 79    .50    -50   .50     60   .50      0   .50      0
 80    .50    -60   .50     60   .50      0   .50      0
 81    .50    -70   .50     60   .50      0   .50      0
 82    .10     40   .90     32   .10     77   .90      2
 83    .20     40   .80     32   .20     77   .80      2
 84    .30     40   .70     32   .30     77   .70      2
 85    .40     40   .60     32   .40     77   .60      2
 86    .50     40   .50     32   .50     77   .50      2
 87    .60     40   .40     32   .60     77   .40      2
 88    .70     40   .30     32   .70     77   .30      2
 89    .80     40   .20     32   .80     77   .20      2
 90    .90     40   .10     32   .90     77   .10      2
 91   1.00     40   .00     32  1.00     77   .00      2

Table 6: Lotteries used in the experiment. Each row is one lottery. The first
column is the item number. Columns 2–5 describe option A, which offers
outcome A1 with probability p(A1) and outcome A2 with probability p(A2).
Columns 6–9 describe option B in the same format.
Appendix B: Hierarchical Distribution Parameters

               µα     σα     µλ     σλ     µδ     σδ     µγ     σγ
  Session 1  -0.27   0.18   0.00   0.62  -0.02   0.39  -0.31   0.59
  Session 2  -0.22   0.18   0.01   0.67  -0.00   0.38  -0.34   0.67

Table 7: Estimates for the density distribution parameters of both
experimental sessions. Both were obtained using Monte Carlo integration with
2,500 random values on each dimension.
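The Monte Carlo integration mentioned in the Table 7 caption amounts to averaging an integrand over random draws. A minimal sketch with an illustrative one-dimensional integrand (the target function, bounds, and use of a Gaussian are assumptions for the example, not the paper's actual integrand):

```python
import math
import random

random.seed(7)

# Illustrative unnormalized density; its true integral over the real
# line is sqrt(pi), so we can check the Monte Carlo estimate against it.
def density(x):
    return math.exp(-x * x)

a, b = -5.0, 5.0   # integration bounds wide enough to capture the mass
n = 2500           # same number of random values per dimension as in Table 7

# Monte Carlo estimate: (b - a) times the mean of the integrand at
# uniform random points in [a, b].
samples = [random.uniform(a, b) for _ in range(n)]
estimate = (b - a) * sum(density(x) for x in samples) / n
print(estimate)  # close to sqrt(pi) ≈ 1.77
```

With 2,500 draws the standard error of this estimate is small relative to the integral, which is the practical justification for the sample size used in Table 7.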