Hierarchical maximum likelihood parameter estimation for cumulative prospect theory: Improving the reliability of individual risk parameter estimates

Ryan O. Murphy, Robert H.W. ten Brincke

ETH Risk Center – Working Paper Series ETH-RC-14-005

The ETH Risk Center, established at ETH Zurich (Switzerland) in 2011, aims to develop cross-disciplinary approaches to integrative risk management. The center combines competences from the natural, engineering, social, economic and political sciences. By integrating modeling and simulation efforts with empirical and experimental methods, the Center helps societies to better manage risk. More information can be found at: http://www.riskcenter.ethz.ch/.

Abstract

Individual risk preferences can be identified by using decision models with tuned parameters that maximally fit a set of risky choices made by a decision maker. A goal of this model fitting procedure is to isolate parameters that correspond to stable risk preferences. These preferences can be modeled as an individual difference, indicating a particular decision maker's tastes and willingness to tolerate risk. Using hierarchical statistical methods we show significant improvements in the reliability of individual risk preference parameters over other common estimation methods. This hierarchical procedure uses population-level information (in addition to an individual's choices) to break ties (or near-ties) in the fit quality for sets of possible risk preference parameters. By breaking these statistical "ties" in a sensible way, researchers can avoid overfitting choice data and thus better measure individual differences in people's risk preferences.
Keywords: Prospect theory, Risk preference, Decision making under risk, Hierarchical parameter estimation, Maximum likelihood

JEL Codes: D81, D03

URL: http://web.sg.ethz.ch/ethz risk center wps/ETH-RC-14-005

Chair of Decision Theory and Behavioral Game Theory, Swiss Federal Institute of Technology Zürich (ETHZ), Clausiusstrasse 50, 8006 Zurich, Switzerland, +41 44 632 02 75

Corresponding author: Ryan O. Murphy ([email protected]); Robert H.W. ten Brincke ([email protected])

Working paper, April 16, 2014
1. Introduction

People must often make choices among a number of different options for which the outcomes are not certain. Epistemic limitations can be precisely quantified using probabilities, and given a well-defined set of options, each option with its respective probability of fruition, the elements of risky decision making are established (Knight, 1932; Luce and Raiffa, 1957). Expected value maximization stands as the normative solution to risky choice problems, but people's choices do not always conform to this optimization principle. Rather, decision makers (DMs) reveal different preferences for risk, sometimes forgoing an option with a higher expectation in favor of an option with lower variance (thus indicating risk aversion); in some cases DMs reveal the opposite preference, thus indicating risk seeking. Behavioral theories of risky decision making (Kahneman and Tversky, 1979; Tversky and Kahneman, 1992; Kahneman and Tversky, 2000) have been developed to isolate and highlight the order in these choice patterns and to provide psychological insights into the underlying structure of DMs' preferences. This paper is about measuring those subjective preferences, and in particular about finding a statistical estimation procedure that can increase the reliability of those measures, thus better capturing risk preferences, and doing so at the individual level.

1.1. Example of a risky choice

A binary lottery (i.e., a gamble or a prospect) is a common tool for studying risky decision making, so common in fact that lotteries have been called the fruit flies of decision research (Lopes, 1983). In these simple risky choices, DMs are presented with monetary outcomes x_i, each associated with an explicit probability p_i. Consider for example the lottery in Table 1, offering a DM the choice between option A, a payoff of $1 with probability .62 or $83 with probability .38; and option B, a payoff of $37 with probability .41 or $24 with probability .59.
The decision maker is called upon to choose either option A or option B, and then the lottery is played out via a random process for real consequences.

Option A: $1 with probability .62; $83 with probability .38
Option B: $37 with probability .41; $24 with probability .59

Table 1: An example of a simple, one-shot risky decision. The choice is simply whether to select option A or option B. The expected value of option A is higher ($32.16) than that of option B ($29.33), but option A contains the possibility of an outcome with a relatively small value ($1). As it turns out, the majority of DMs prefer option B, even though it has a smaller expected value.

1.2. EV maximization and descriptive individual-level modeling

The normative solution to the decision problem in Table 1 is straightforward. The goal of maximizing long-term monetary expectations is satisfied by selecting the option with the highest expected value. Although this decision policy has desirable properties, it is often not an accurate description of what people prefer nor of what they choose when presented with risky options. As can be anticipated, different people have different tastes, and this heterogeneity includes preferences for risk as well. For example, the majority of incentivized DMs select option B from the lottery shown above, indicating, perhaps, that these DMs have some degree of risk aversion. However, the magnitude of this risk aversion is still unknown and it cannot be estimated from only one choice resulting from a binary lottery. This limitation has led researchers to use larger sets of binary lotteries where DMs make many independent choices, and from the overall pattern of choices, researchers can draw inferences about the DM's risk preferences.
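As a concrete illustration, the expected values reported in Table 1 can be reproduced with a few lines of code (a minimal sketch; the function name `expected_value` is ours, and the outcomes and probabilities are those of the example lottery):

```python
# Expected value of a lottery: sum of outcomes weighted by their probabilities.
def expected_value(outcomes, probabilities):
    return sum(x * p for x, p in zip(outcomes, probabilities))

ev_a = expected_value([1, 83], [0.62, 0.38])   # option A
ev_b = expected_value([37, 24], [0.41, 0.59])  # option B
print(round(ev_a, 2), round(ev_b, 2))  # 32.16 29.33
```

Despite option A's higher expectation, most DMs choose option B, which is the behavioral regularity the rest of the paper models.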
Risk preferences have often been modeled at the aggregate level, by fitting parameters to the most frequently selected option across many individuals, as opposed to modeling preferences at the individual level (see Kahneman and Tversky, 1979; Tversky and Kahneman, 1992). This aggregate approach is a useful first step in that it helps establish stylized facts about human decision making and further facilitates the identification of systematicity in the deviations from EV maximization. However, actual risk preferences exist at the individual level, and thus improving the resolution of measurement and modeling is a natural development in this research field. Substantial work exists along these lines already, for both the measurement of risk preferences as well as their application (e.g., Gonzalez and Wu, 1999; Holt and Laury, 2002; Abdellaoui, 2000; Fehr-Duda et al., 2006). The purpose of this paper is to add to that literature by developing and evaluating a hierarchical estimation method that improves the reliability of parameter estimates of risk preferences at the individual level. Moreover, we want to do so with a measurement tool that is readily usable and broadly applicable as a measure of individual risk preferences.

1.3. Three necessary components for measuring risk preferences

There are three elementary components in measuring risk preferences, and these parts serve as the foundation for developing behavioral models of risky choice. These three components are: lotteries, models, and statistical estimation/fitting procedures. We explain each of these components below in general terms, and then provide detailed examples and discussion of the elements in the subsequent sections of the paper.

1.3.1. Lotteries

Choices among options reveal DMs' preferences (Samuelson, 1938; Varian, 2006).
Lotteries are used to elicit and record risky decisions, and these resulting choice data serve as the input for the decision models and parameter estimation procedures. Here we focus on choices from binary lotteries (as in Stott, 2006; Rieskamp, 2008; Nilsson et al., 2011), but the elicitation of certainty equivalence values is another well-established method for quantifying risk preferences (see for example Zeisberger et al., 2010). Binary lotteries are arguably the simplest way for subjects to make choices and thus reveal risk preferences. One reason to use binary lotteries is that people appear to have problems with assessing a lottery's certainty equivalent (Lichtenstein and Slovic, 1971), although this simplicity comes at the cost of requiring a DM to make many binary choices to achieve the same degree of estimation fidelity that other methods purport to provide (e.g., the three-decision risk measure from Tanaka et al., 2010).

1.3.2. Decision model (non-expected utility theory)

Models may reflect the general structure (e.g., stylized facts) of DMs' aggregate preferences by invoking a latent construct like utility. These models also typically have free parameters that can be tuned to improve the accuracy with which they capture different choice patterns. An individual's choices from the lotteries are fit to a model by tuning these parameters, thus identifying differences in revealed preferences and summarizing the pattern of choices (with as much accuracy as possible) by using particular parameter values. These parameters can, for example, establish not just the existence of risk aversion, but can quantify the degree of this particular preference.
This is useful in characterizing an individual decision maker and isolating cognitive mechanisms that underlie behavior (e.g., correlating risk preferences with other variables, including measures of physiological processes, as in Figner and Murphy, 2010; Schulte-Mecklenbeck et al., 2014), as well as in tuning a model to make predictions about what this DM will choose when presented with different options. Here we focus on the non-expected utility model of cumulative prospect theory (Kahneman and Tversky, 1979; Tversky and Kahneman, 1992; Starmer, 2000) as our decision model. Prospect theory is arguably the most important and influential descriptive model of risky choice to date, and it has been used extensively in research related to model fitting and risky decision making.

1.3.3. Parameter estimation procedure

Given a set of risky choices and a particular decision model, the best fitting parameter(s) can be identified via a statistical estimation procedure. A perfectly fitting parameterized model would exactly reproduce all of a DM's choices correctly (i.e., perfect correspondence). Obviously this is a lofty goal and it is almost never achieved in practice, as there almost certainly is measurement error in a set of observed choices. A decision model, with best fitting parameters, can be evaluated in several different ways. One approach to evaluating the goodness of fit would be to determine how many correct predictions the parameterized model makes of a DM's choices. For example, for a set of risky choices from binary lotteries, researchers may find that 60% of the choices are consistent with expected value maximization (EV maximization is the normative decision policy and, trivially, is a zero-parameter decision model).
If researchers used a concave power function to transform values into utilities, this single-parameter model may increase to 70% consistency with the selected choices. Although this model scoring method is straightforward and intuitive, it has some stark shortcomings (as we will show later in the paper). Other, more sophisticated methods, like maximum likelihood estimation, have been developed that allow for a more nuanced approach to evaluating the fit of a choice model (see for example Harless and Camerer, 1994; Hey and Orme, 1994). All of these different evaluation methods can also include both in-sample and out-of-sample tests, which can be useful to diagnose and mitigate the overfitting of choice data.

The interrelationship between lotteries, a decision model, and an estimation procedure is shown in Figure 1. Lotteries provide stimuli and the resulting choices are the input for the models; a decision model and an estimation procedure tune parameters to maximize correspondence between the model and the actual choices from the decision maker facing the lotteries.

[Figure 1 depicts example binary lotteries with choices recorded at time 1 and time 2 feeding into the prospect theory model and parameter estimation, yielding week 1 and week 2 parameters (α, λ, δ, γ), with in-sample summarization, out-of-sample prediction, and reliability comparisons.]

Figure 1: On the left we find three choices for each of the binary lotteries that serve as input for the model. Using an estimation method we fit the model's risk preference parameters to the individual risky choices at time 1. These parameters, to some extent, summarize the choices made at time 1 and can reproduce those choices (depending on the quality of the fit). The out-of-sample predictive capacity of the model can be evaluated by comparing the predictions against actual choices at time 2.
The reliability of the estimates is evaluated by comparing the time 1 parameter estimates to estimates obtained using the same estimation method on the time 2 data. This experimental design holds the lotteries and choices constant, but varies the estimation procedures; moreover, this is done in a test-retest design which allows the computation and comparison of both in-sample and out-of-sample statistics.

This process of eliciting choices can be repeated using the same lotteries and the same experimental subjects. This test-retest design allows for both in-sample parameter fitting (i.e., statistical explanation or summarization) and out-of-sample parameter fitting (i.e., prediction). Moreover, the test-retest reliability of the different parameter estimates can be computed as a correlation, as two sets of parameters exist for each decision maker. This is a powerful experimental design as it can accommodate both fitting and prediction, and it can further diagnose instances of overfitting, which can undermine psychological interpretations (see Gigerenzer and Gaissmaier, 2011).

1.4. Overfitting and bounding parameters

Estimation methods for multi-parameter models may be prone to overfitting and, in doing so, adjust to noise instead of real risk preferences (Roberts and Pashler, 2000). This can sometimes be observed when parameter values emerge that are highly atypical and extreme. A common solution to this problem is to set boundaries and limit the range of parameter values that can be estimated.
Boundaries prevent extreme parameter values in the case of weak evidence and thus reduce overfitting, but on the downside they negate the possibility of reflecting extreme preferences altogether, even though these preferences may be real. Boundaries are also defined arbitrarily and may create serious estimation problems due to parameter interdependence. For example, setting a boundary at the lower range of one parameter may radically change the estimate of another parameter (e.g., restricting the range of probability distortion may unduly influence estimates of loss aversion) for a particular subject. The fact that a functional (mis)specification can affect other parameter estimates (see for example Nilsson et al., 2011), and that interaction effects between parameters have been found (Zeisberger et al., 2010), indicates that there are multiple ways in which prospect theory can account for preferences. This in turn may mean that choosing different boundaries for one or more parameters may affect the estimates of other parameters as well, if a parameter estimate runs up against an imposed boundary.

To circumvent the pitfalls of arbitrary parameter boundaries, we use a hierarchical estimation method based on Farrell and Ludwig (2008) without such boundaries. At its core, this method uses estimates of risk preferences of the whole sample to inform estimates of risk preferences at the individual level. In this paper we address to what degree an estimation method combining group-level information with individual-level information can more reliably represent individual risk preferences than using either individual or aggregate information exclusively.
1.5. Justification for using hierarchical estimation methods

The ultimate goal of this hierarchical estimation procedure is to obtain reliable estimates of individual risk preferences that can be used to make better predictions about risky choice behavior, as compared to other estimation methods. This is not modeling as a means to only maximize in-sample fit, but rather to maximize out-of-sample correspondence; moreover, these methods are a way to gain insights into what people are actually motivated by as they cogitate about risky options and make choices when confronting irreducible uncertainty. Ideally, the parameters should not only summarize choices, but also capture some of the psychological mechanisms that underlie risky decision making.

1.6. Other applications of hierarchical estimation methods

Other researchers have applied hierarchical estimation methods to risky choice modeling. Nilsson et al. (2011) applied a Bayesian hierarchical parameter estimation model to simple risky choice data. The hierarchical procedure outperformed maximum likelihood estimation in a parameter recovery test. The authors, however, did not test out-of-sample predictions nor the retest reliability of their parameter estimates. Wetzels et al. (2010) applied Bayesian hierarchical parameter estimation in a learning model and found it was more robust with regard to extreme estimates and misunderstanding of task dynamics. Scheibehenne and Pachur (2013) applied Bayesian hierarchical parameter estimation to risky choice data using a TAX model (Birnbaum and Chavez, 1997) but reported no improvement in parameter stability.

1.7. Other research examining parameter stability

There exists some research on estimated parameter stability. Glöckner and Pachur (2012) tested the reliability of parameter estimates using a non-hierarchical estimation procedure.
The results show that individual risk preferences are generally stable and that individual parameter values outperform aggregate values in terms of prediction. The authors concluded that the reliability of parameters suffered when extra model parameters were added. This contrasts with the work of Fehr-Duda and Epper (2012), who conclude, based on experimental data and a literature review, that those additional parameters (related to the probability weighting function) are necessary to reliably capture individual risk preferences. Zeisberger et al. (2010) also tested individual parameter reliability, using a different type of risky choice elicitation (certainty equivalents vs. binary choice). They found significant differences in parameter value estimates over time but did not use a hierarchical estimation procedure.

1.8. Structure of the rest of this paper

In this paper we focus on evaluating different statistical estimation procedures for fitting individual choice data to risky decision models. To this end, we hold constant the set of lotteries DMs made choices with, and the functional form of the risky choice model. We then fit the same set of choice data using a variety of estimation methods, and contrast the results to differentiate their efficiency. We also compute the test-retest reliability of the parameter estimates that result from the different estimation procedures. This broad approach allows us to evaluate different estimation methods and make substantiated conclusions about fit quality, as well as diagnose overfitting. We conclude with recommendations for estimating risk preferences at the individual level and briefly describe an online tool for measuring and estimating risk preferences that researchers are invited to use as part of their own research programs.

2. Stimuli and Experimental design
2.1. Participants

One hundred eighty-five participants from the subject pool at the Max Planck Institute for Human Development in Berlin volunteered to participate in two sessions (referred to hereafter as time 1 and time 2) that were administered approximately two weeks apart. After both sessions were conducted, complete data from both sessions were available for 142 participants; these subjects with complete datasets are retained for the subsequent analysis. The experiment was incentive compatible and was conducted without deception. The data used here are the choice-only results from the Schulte-Mecklenbeck et al. (2014) experiment, and we are indebted to these authors for their generosity in sharing their data.

2.2. Stimuli

In each session of the experiment individuals made choices from a set of 91 simple binary-option lotteries. Each option has two possible outcomes between -100 and 100 that occur with known probabilities that sum to one. There were four types of lotteries, for which examples are shown in Table 2. The four types are: lotteries with gains only; lotteries with losses only; mixed lotteries with both gains and losses; and mixed-zero lotteries with one gain and one loss and zero (the status quo) as the alternative outcome. The first three types were included to cover the spectrum of risky decisions, while the mixed-zero type allows for measuring loss aversion separately from risk aversion (Rabin, 2000; Wakker, 2005). The same 91 lotteries were used in both test-retest sessions of this research. The set was compiled of items used by Rieskamp (2008), Gaechter et al. (2007) and Holt and Laury (2002); these items were also used by Schulte-Mecklenbeck et al. (2014). Thirty-five lotteries are gain only, 25 are loss only, 25 are mixed, and 6 are mixed-zero. All of the lotteries are listed in Appendix A.
Lottery 1 (gain): Option A: 32 with probability .78, 99 with probability .22; Option B: 39 with probability .32, 56 with probability .68
Lottery 2 (loss): Option A: -24 with probability .99, -13 with probability .01; Option B: -15 with probability .44, -62 with probability .56
Lottery 3 (mixed): Option A: 58 with probability .48, -5 with probability .52; Option B: 96 with probability .60, -40 with probability .40
Lottery 4 (mixed-zero): Option A: -30 with probability .50, 60 with probability .50; Option B: 0 for sure

Table 2: Examples of binary lotteries from each of the four types.

2.3. Procedure

Participants received extensive instructions regarding the experiment at the beginning of the first session. All participants received 10 EUR as a guaranteed payment for participation in the research, and could earn more based on their choices. Participants then worked through several examples of risky choices to familiarize themselves with the interface of the MouselabWeb software (Willemsen and Johnson, 2011, 2014) that was used to administer the study. While the same 91 lotteries were used for both experimental sessions, the item order was randomized. Additionally, the order of the outcomes (top-bottom), the order of the options (A or B), and the orientation (A above B, A left of B) were randomized and stored during the first session. The exact opposite spatial representation was used in the second session, to mitigate order and presentation effects.

Incentive compatibility was implemented by randomly selecting one lottery at the end of each experimental session and paying the participant according to their choice on the selected item when it was played out with the stated probabilities. An exchange rate of 10:1 was used between experimental values and payments for choices. Thus, participants earned a fixed payment of 10 EUR plus one tenth of the outcome of one randomly selected lottery for each completed experimental session. As a result, participants earned about 30 EUR (approximately 40 USD) on average for participating in both sessions.
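The payment rule can be sketched in a few lines (an illustration of our reading of the scheme; the function name is ours, and we assume negative lottery outcomes reduce the variable part of the pay at the same 10:1 rate):

```python
# Per-session payment: 10 EUR fixed fee plus one tenth of the outcome of
# one randomly selected lottery (outcomes range from -100 to 100).
def session_payment(selected_outcome, fixed_fee=10.0, exchange_rate=10.0):
    return fixed_fee + selected_outcome / exchange_rate

pay_win = session_payment(60)    # a +60 outcome pays 16.0 EUR
pay_loss = session_payment(-30)  # a -30 outcome pays 7.0 EUR
```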
3. Model specification

We use cumulative prospect theory (Tversky and Kahneman, 1992) to model risk preferences. A two-outcome lottery[1] L is valued in utility u(·) as the sum of its components, in which each monetary reward x_i is weighted by a value function v(·) and the associated probability p_i by a probability weighting function w(·). This is shown in Equation 1.

u(L) = v(x1)w(p1) + v(x2)(1 − w(p1))    (1)

Cumulative prospect theory has many possible mathematical specifications. These different functional specifications have been tested and reported in the literature (see for example Stott, 2006), and the functional forms and parameterizations we use here are justifiable given the preponderance of findings.

3.1. Value function

In general, power functions have been empirically shown to fit choice data at the individual level better than many other functional forms (Stott, 2006). Stevens (1957) also cites experimental evidence showing the merit of power functions for modeling psychological processes in general. Here we use a power value function as displayed in Equation 2.

[1] Outcome x1, with its associated probability p1, is the outcome with the highest value for lotteries in the positive domain and the outcome with the lowest value for lotteries in the negative domain. The order is irrelevant when lotteries consist of uniform gains or losses; in mixed lotteries, however, it matters. This ordering ensures that a cumulative distribution function is applied to the options systematically. See Tversky and Kahneman (1992) for details.

[Figure 2 plots a typical value function v(x) (subjective value against objective value x) and a typical probability weighting function w(p) (subjective probability against objective probability p).]

Figure 2: This figure shows a typical value function (left) and probability weighting function (right) from prospect theory.
The parameters used to plot the solid lines are from Tversky and Kahneman (1992) and reflect the empirical findings at the aggregate level. The dashed lines represent risk-neutral preferences. The solid line for the value function is concave over gains and convex over losses, and further exhibits a "kink" at the reference point (in this case the origin) consistent with loss aversion. The probability weighting function overweights small probabilities and underweights large probabilities. It is worth noting that at the individual level, the plots can differ significantly from these aggregate-level plots.

v(x) = x^α           if x ≥ 0, with α > 0
v(x) = −λ(−x)^α      if x < 0, with α > 0 and λ > 0    (2)

The α-parameter controls the curvature of the value function. If the value of α is below one, there are diminishing marginal returns in the domain of gains. If the value of α is one, the function is linear and consistent with risk neutrality (i.e., EV maximizing) in decision making. For α values above one, there are increasing marginal returns for positive values of x. This pattern reverses in the domain of losses (values of x less than 0), which is referred to as the reflection effect.

For this paper we use the same value function parameter α for both gains and losses, and our reasoning is as follows. First, this is parsimonious. Second, when using a power function with different parameters for gains and losses, loss aversion can no longer be defined as the magnitude of the kink at the reference point (Köbberling and Wakker, 2005) but would instead need to be defined contingently over the whole curve.[2] This complicates interpretation and undermines clear separability between utility curvature and loss aversion, and further may cause modeling problems for mixed lotteries that include an outcome of zero.
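As a sketch, the value function in Equation 2 is straightforward to implement (the default parameter values, α = 0.88 and λ = 2.25, are the aggregate estimates reported by Tversky and Kahneman, 1992, used here purely for illustration):

```python
# Power value function (Equation 2): curvature alpha, loss aversion lam.
def value(x, alpha=0.88, lam=2.25):
    if x >= 0:
        return x ** alpha          # concave over gains when alpha < 1
    return -lam * (-x) ** alpha    # convex over losses, scaled by loss aversion
```

With these aggregate parameters a loss of 10 is weighted 2.25 times as heavily as a gain of 10, which is the "kink" at the reference point discussed above.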
Problems of induced correlation between the loss aversion parameter and the curvature of the value function in the negative domain have furthermore been reported when using αgain ≠ αloss in a power utility model (Nilsson et al., 2011).

[2] It is possible to use αgain ≠ αloss by using an exponential utility function (Köbberling and Wakker, 2005).

3.2. Probability weighting function

To capture subjective probability, we use Prelec's functional specification (Prelec, 1998; see Figure 5). Prelec's two-parameter probability weighting function can accommodate both inverse S-shaped as well as S-shaped weightings and has been shown to fit individual data well (Fehr-Duda and Epper, 2012; Gonzalez and Wu, 1999). Its specification is given in Equation 3.

w(p) = exp(−δ(−ln(p))^γ),    δ > 0, γ > 0    (3)

The γ-parameter controls the curvature of the weighting function. The psychological interpretation of the curvature of the function is a diminishing sensitivity away from the end points: both 0 and 1 serve as necessary boundaries, and the further from these edges, the less sensitive individuals are to changes in probability (Tversky and Kahneman, 1992). The δ-parameter controls the general elevation of the probability weighting function. It is an index of how attractive lotteries are considered to be in general (Gonzalez and Wu, 1999), and it corresponds to how optimistic (or pessimistic) an individual decision maker is.

The use of Prelec's two-parameter weighting function over the original specification used in cumulative prospect theory (Tversky and Kahneman, 1992) requires additional explanation. In the original specification the point of intersection with the diagonal changes simultaneously with the shape of the weighting function, whereas in Prelec's specification, the line always intersects at the same point if δ is kept constant.[3]
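A minimal sketch of Equation 3 makes these properties easy to check numerically (the parameter values in the comments are illustrative, not estimates from this paper):

```python
from math import exp, log, e

# Prelec two-parameter probability weighting function (Equation 3).
def prelec_weight(p, delta, gamma):
    return exp(-delta * (-log(p)) ** gamma)

# For delta = 1 the function crosses the diagonal at p = 1/e, whatever gamma is;
# delta = gamma = 1 reduces to identity weighting w(p) = p.
w_fixed = prelec_weight(1 / e, 1.0, 0.65)   # equals 1/e
w_small = prelec_weight(0.01, 1.0, 0.65)    # > 0.01: small p is overweighted
```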
point of intersection and curvature simultaneously induces a negative correlation with the value function parameter α, because both parameters capture similar characteristics, and also does not allow for a wide variety of individual preferences (see Fehr-Duda and Epper, 2012). Furthermore, the original specification of the weighting function is not monotonic for γ < 0.279 (Ingersoll, 2008). While that low value is generally not in the range of reported aggregate parameters (Camerer and Ho, 1994), this non-monotonicity may become relevant when estimating individual preferences, where there is considerably more heterogeneity.

4. Estimation methods

Individual risk preferences are captured by obtaining a set of parameters that best fit the observed choices, implemented via a particular choice model. In this section we explain the workings of the simple direct maximization of the explained fraction (MXF) of choices, of standard maximum likelihood estimation (MLE), and of hierarchical maximum likelihood estimation (HML).

4.1. Direct maximization of explained fraction of choices (MXF)

Arguably the most straightforward way to capture an individual’s preferences is by finding the set of parameters that maximizes the fraction of explained choices made over all the lotteries considered. By explained choices we mean the lotteries for which the utility for the chosen option exceeds that of the alternative option, given some decision model and parameter value(s).

3 That point is 1/e ≈ 0.37 for δ = 1, which is consistent with some empirical evidence (Gonzalez and Wu, 1999; Camerer and Ho, 1994). The (aggregate) point of intersection is sometimes also found to be closer to 0.5 (Fehr-Duda and Epper, 2012), which is
consistent with Karmarkar’s single-parameter weighting function (Karmarkar, 1979). A two-parameter extension of that function is provided by Goldstein and Einhorn (1987), which appears to fit about equally well as both Prelec’s and Gonzalez and Wu’s functions (Gonzalez and Wu, 1999).

The term “explained” does not imply having established some deeper causal insight into the decision-making process, but is used here simply to refer to statistical explanation (e.g., a summarization or reproducibility) of choices given tuned parameters and a decision model. The goal of this MXF method is to find the set of parameters M = {α, λ, δ, γ} that maximizes the fraction of explained choices from an individual DM.

M̂ = arg max_M (1/N) Σ_{i=1}^{N} c(y_i | M)        (4)

By c(·) we denote the choice itself as in Equation 5, in which u(·) is given by Equation 1. It returns 1 if the utility given the parameters is consistent with the actual choice of the subject and 0 otherwise.

c(y_i | M) = 1 if A is chosen and u(A) ≥ u(B), or B is chosen and u(A) ≤ u(B);
           = 0 otherwise.        (5)

This is a discrete problem because we use a finite number of binary lotteries. Solution(s) can be found using a grid search (i.e., parameter sweep) approach, combined with a fine-tuning algorithm that uses the best results of the grid search as a starting point and then refines the estimate as needed, improving the precision of the parameter values. Note that this process can be computationally intense for high-resolution estimates, since the full combination of four parameters over their full range has to be taken into consideration. The MXF estimation method has some advantages. Finding the parameters that maximize the explained fraction of choices is a simple metric that is directly related to the actual choices from a decision maker.
It is easy to explain, and it is also robust to the presence of a few aberrant choices. This quality prevents the estimates from being unduly affected by a single decision, measurement error, or overfitting. An important shortcoming of MXF as an estimation method is that the characteristics of the explained lotteries are not taken into consideration when fitting the choice data. Consider the lottery at the beginning of the paper (Table 1), in which the expected value for option A is higher than that for B, but just barely. Consider a decision maker whom we suspect to be generally risk-neutral; the EV-maximizing policy would therefore dictate selecting option A over option B. But for argument’s sake, suppose we observe the DM select option B; how much evidence does this provide against our conjecture of risk neutrality? Hardly any, one can conclude, given the small difference in utility between the options. Now consider the same lottery, but with a payoff for A of $483 instead of $83. Picking B over A is now a huge mistake for a risk-neutral decision maker, and such an observation would yield stronger evidence that would force one to revise a conjecture about the decision maker’s risk neutrality. MXF as an estimation method does not distinguish between these two instances, as its simplistic right/wrong criterion lacks the nuance to integrate this additional information about options, choices, and the strength of evidence. Another limitation of MXF is that it results in numerous ties for best-fitting parameters. In other words, many parameter sets result in exactly the same fraction of choices being explained. For several subjects the same fraction of choices is explained using very different parameters.
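The MXF criterion (Equations 4 and 5) and its tendency toward ties can be sketched concretely. The lotteries and the utility u(·) below are hypothetical stand-ins (Equation 1 and Table 1 are not reproduced in this excerpt); lotteries are coded (p, hi, lo) over gains:

```python
import math

def v(x, alpha):                          # value function for gains (Equation 2)
    return x ** alpha

def w(p, delta, gamma):                   # Prelec weighting (Equation 3)
    return math.exp(-delta * (-math.log(p)) ** gamma) if 0.0 < p < 1.0 else p

def u(lottery, params):
    """Stand-in CPT utility of a two-outcome gain lottery (p, hi, lo)."""
    p, hi, lo = lottery
    alpha, delta, gamma = params
    wp = w(p, delta, gamma)
    return wp * v(hi, alpha) + (1.0 - wp) * v(lo, alpha)

def explained_fraction(choices, params):
    """Equation 4: fraction of choices consistent with the model, each
    choice scored 0/1 as in Equation 5."""
    hits = sum(1 for a, b, chose_a in choices
               if (chose_a and u(a, params) >= u(b, params))
               or (not chose_a and u(b, params) >= u(a, params)))
    return hits / len(choices)

# Coarse grid search over alpha only (delta and gamma held at 1 for brevity);
# the full method sweeps all four parameters and fine-tunes the best cell.
choices = [((0.5, 100, 0), (1.0, 40, 40), True),    # chose the risky option
           ((0.2, 200, 0), (1.0, 50, 50), False)]   # chose the sure option
best_fraction, best_alpha = max((explained_fraction(choices, (a, 1.0, 1.0)), a)
                                for a in [0.4, 0.6, 0.8, 1.0])
```

Even with these two toy choices, α = 0.8 and α = 1.0 explain exactly the same fraction (a tie of the kind just described), and a barely-wrong choice would enter Equation 5 with the same 0/1 weight as a badly-wrong one.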
For one subject, just over seventy percent of choices were explained by parameters that ranged from 0.6 to 1.25 for the utility curvature, 1.5 to 6 (and possibly beyond) for loss aversion, 0.44 to 1.3 for the elevation of the weighting function, and 0.8 to 2.5 for its curvature. We find such ties for various constellations of parameters (albeit somewhat smaller) even for subjects for whom we can explain up to ninety percent of all choices. Two other estimation methods (MLE and HML) avoid these problems, and we turn our attention to them next.

4.2. Maximum Likelihood Estimation (MLE)

A strict application of cumulative prospect theory dictates that even a trivial difference in utility leads the DM to always choose the option with the highest utility. However, even in early experiments with repeated lotteries, Mosteller and Nogee (1951) found that this is not the case with real decision makers. Choices were instead partially stochastic, with the probability of choosing the generally more favored option increasing as the utility difference between the options increased. This idea of random utility maximization has been developed by Luce (1959) and others (e.g., Harless and Camerer, 1994; Hey and Orme, 1994). In this tradition, we use a logistic function that specifies the probability of picking one option, depending on each option’s utility. This formulation is displayed in Equation 6. A logistic function fits the data from Mosteller and Nogee (1951) well, for example, and has been shown generally to perform well with behavioral data (Stott, 2006).

p(A ≻ B) = 1 / (1 + e^{(u(B) − u(A))/ϕ}),        ϕ > 0        (6)

The parameter ϕ is an index of the sensitivity to differences in utility. Higher values for ϕ diminish the importance of the difference in utility, as is illustrated in Figure 3. It is important to note that this parameter operates on the absolute difference in utility assigned to both options.
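A sketch of Equation 6 (illustrative code; the exponent is written here as (u(B) − u(A))/ϕ so that, as described in Figure 3, the rule approaches a step function as ϕ → 0 and a coin flip as ϕ grows):

```python
import math

# Illustrative sketch of the logistic choice rule (Equation 6).
def p_a_over_b(u_a, u_b, phi):
    """Probability of choosing A over B given the two utilities."""
    return 1.0 / (1.0 + math.exp((u_b - u_a) / phi))
```

For u(A) = 11 and u(B) = 10, a very sensitive DM (ϕ = 0.1) picks A with probability ≈ 0.99995, while a noisy one (ϕ = 5) does so with probability only ≈ 0.55; with equal utilities the probability is exactly 0.5 for any ϕ.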
Two individuals with different risk attitudes but equal sensitivity will almost certainly end up with different ϕ parameters, simply because the differences in the options’ utilities are also determined by the risk attitude. While the parameter is useful to help fit an individual’s choice data, one cannot compare the values of ϕ across individuals unless both the lotteries and the individuals’ risk attitudes are identical. The interpretation of the sensitivity parameter is therefore often ambiguous, and it should be considered simply an aid to the estimation method.

Figure 3: This figure shows two example logistic functions (ϕ = 1 and ϕ = 2; x-axis: utility of A from 0 to 20, y-axis: probability of picking A over B). Consider a lottery for which option B has a utility of 10 while option A’s utility is plotted on the x-axis. Given the utility of B, the probability of picking A over B is plotted on the y-axis. The option with the higher utility is more likely to be selected in general. In the instance when both options’ utilities are equal, the probability of choosing each option is equivalent. The sensitivity of a DM is reflected in the parameter ϕ. A perfectly discriminating (and perfectly consistent) DM would have a step function as ϕ → 0; less perfect discrimination on behalf of DMs is captured by larger ϕ values.

In maximum likelihood estimation the goal is to maximize the likelihood of observing the outcome, i.e., the observed choices, given a set of parameters. The likelihood is expressed using the choice function above, yielding a stochastic specification. We use the notation M = {α, λ, δ, γ, ϕ}, where y_i denotes the choice for the i-th lottery.
M̂ = arg max_M Π_{i=1}^{N} c(y_i | M)        (7)

By c(·) we denote the choice itself, where p(A ≻ B) is given by the logistic choice function in Equation 6.

c(y_i | M) = p(A ≻ B)       if A is chosen (y_i = 0);
           = 1 − p(A ≻ B)   if B is chosen (y_i = 1).        (8)

This approach overcomes some of the major shortcomings of the other methods discussed above. MLE takes the utility difference between the two options into account and therefore fits not only the number of choices predicted correctly, but also the quality of the fit for each lottery. Ironically, because it is the utility difference that drives the fit (not a count of correct predictions), the resulting best-fitting parameters may explain fewer choices than the MXF method. This is because MLE tolerates several smaller prediction mistakes instead of fewer, but very large, mistakes. The fact that the quality of the fit drives the parameter estimates may be beneficial for predictions, especially when the lotteries have not been carefully chosen and large mistakes have large consequences. Because the resulting fit (likelihood) has a smoother (but still relatively lumpy) surface, finding robust solutions is computationally less intense than with MXF. On the downside, MLE is not highly robust to aberrant choices. If by confusion or accident the DM chooses an atypical option for a lottery that is greatly out of line with her other choices, the resulting parameters may be disproportionately affected by this single anomalous choice. While MLE does take into account the quality of the fit with respect to utility discrepancies, the fitting procedure has the simple goal of finding the parameter set that generates the best overall fit (even if the differences in fit among parameter sets are trivially small). This approach ignores the fact that there may be different parameter sets that fit the choices virtually equally well.
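The fitting machinery of Equations 7 and 8 can be sketched with hypothetical observations (the utilities below are stand-ins for Equation 1, and only a single parameter α is searched over a grid, for brevity; a real fit would optimize all five parameters):

```python
import math

# Illustrative sketch of Equations 7-8 (not the authors' code): maximize the
# product of logistic choice probabilities over hypothetical observations.
def p_a_over_b(u_a, u_b, phi=1.0):
    return 1.0 / (1.0 + math.exp((u_b - u_a) / phi))

def likelihood(choices, alpha, phi=1.0):
    """prod_i c(y_i | M), with c as in Equation 8 (y = 0 means A was chosen)."""
    total = 1.0
    for (p, hi), sure, y in choices:
        u_a = p * hi ** alpha          # stand-in utility of the risky option A
        u_b = sure ** alpha            # stand-in utility of the sure option B
        p_a = p_a_over_b(u_a, u_b, phi)
        total *= p_a if y == 0 else 1.0 - p_a
    return total

# Hypothetical data: ((p, hi), sure_amount, observed_choice)
choices = [((0.5, 100), 40, 0), ((0.5, 100), 45, 0), ((0.2, 200), 50, 1)]
grid = [i / 100.0 for i in range(50, 151)]
alpha_hat = max(grid, key=lambda a: likelihood(choices, a))
```

In practice one maximizes the log of this product for numerical stability, as noted later in the implementation details.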
Consider subject A in Figure 8, which shows the resulting distributions of parameter estimates when we change one single choice and re-estimate the MLE parameters. Especially for loss aversion and the elevation of the probability weighting function, we find that a single choice can make the difference between very different parameter values. The MLE method solves the tie-breaking problem of MXF, but does so by ascribing a great deal of importance to potentially trivial differences in fit quality. Radically different combinations of parameters may be identified by the procedure as nearly equivalently good, and the winning parameter set may emerge while differing from the other “almost-as-good” sets by only the slightest of margins in fit quality. This process of selecting the winning parameter set from among the space of plausible combinations ignores how realistic the sets may be, or how reflective they are of the DM’s actual preferences. Moreover, the ill-behaved structure of the parameter space makes it a challenge to find reliable results, as different solutions can be nearly identical in fit quality but located in radically different regimes of the parameter space. A minor change in the lotteries, or just one different choice, may shift the resulting parameter estimates substantially. Furthermore, the MLE procedure does not consider the psychological interpretation of the parameters; it may therefore select a parameter set that attributes out-of-the-ordinary preferences to the DM, with only very little evidence to distinguish it from more typical preferences.

4.3. Hierarchical Maximum Likelihood Estimation (HML)

The HML estimation procedure developed here is based on Farrell and Ludwig (2008). It is a two-step procedure that is explained below.

Step 1.
In the first step, the likelihood of occurrence of each of the four model parameters in the population as a whole is estimated. These likelihoods of occurrence are captured using probability density distributions and reflect how likely (or conversely, how out of the ordinary) a particular parameter value is in the population, given everyone’s choices. These density distributions are found by solving the integral in Equation 9. For each of the four cumulative prospect theory parameters we use a log-normal density distribution, as this distribution has only positive values, is positively skewed, has only two distribution parameters, and is not too computationally intense. Because the sensitivity parameter ϕ is not independent of the other parameters, and because it has no clear psychological interpretation, we do not estimate a distribution of occurrence for sensitivity, but instead take the value we find for the aggregate data with classical maximum likelihood estimation for this step.4 We use the notation M = {α, λ, δ, γ, ϕ}, P_α = {µ_α, σ_α},

4 It may be that a different approach with regard to the sensitivity parameter leads to better performance. We cannot judge this without data from a third re-test, as it requires

Figure 4: The two steps of the hierarchical maximum likelihood estimation procedure (four density panels per step, one per parameter: α (curvature), λ (loss aversion), δ (elevation), and γ (shape), each plotted over parameter values). In the first step we fit the data to four probability density distributions. In the second step we obtain the actual individual estimates. A priori, and in the absence of individual choice data, the small square has the highest likelihood, which is at the modal value of each parameter distribution.
The individual choices are balanced against the likelihood of the occurrence of a parameter to obtain the individual parameter estimates, depicted by the big squares.

estimation in a first session, calibration of the method using a second session, and true out-of-sample testing of this calibration in a third session. This may be interesting but is beyond the scope of the current paper.

P_λ = {µ_λ, σ_λ}, P_δ = {µ_δ, σ_δ}, P_γ = {µ_γ, σ_γ}, and P = {P_α, P_λ, P_δ, P_γ}; S and N denote the number of participants and the number of lotteries, respectively.

P̂ = arg max_P Π_{s=1}^{S} ∫∫∫∫ [ Π_{i=1}^{N} c(y_{s,i} | M) ] ln(α|P_α) ln(λ|P_λ) ln(δ|P_δ) ln(γ|P_γ) dα dλ dδ dγ        (9)

This first step can be explained using a simplified example, which is a discrete version of Equation 9. For each of the four risk preference parameters, let us pick a set of values that covers a sufficiently large interval, for example {0.5, 1, ..., 5}. We then take all parameter combinations (the Cartesian product) of these four sets, i.e., α = 0.5, λ = 0.5, δ = 0.5, γ = 0.5, then α = 0.5, λ = 0.5, δ = 0.5, γ = 1.0, and so forth. Each of these combinations has a certain likelihood of explaining a subject’s observed choices, displayed within the square brackets in Equation 9. However, not all of the parameter values in these combinations are equally supported by the data. It could for example be that the combination shown above that contains γ = 0.5 is 10 times more likely to explain the observed choices than the one with γ = 1.0. How likely each parameter value is in relation to another value is precisely what we intend to measure. The fact that one set of parameters is associated with a higher likelihood of explaining observed choices (and therefore is more strongly supported by the data) can be reflected by changing the four log-normal density functions for each of the model parameters.
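The discrete example can be sketched directly. The subject's choice likelihood (the bracketed product in Equation 9) is stubbed out here with an arbitrary positive function; everything else, the grid, the Cartesian product, and the log-normal weights, follows the text:

```python
import math
from itertools import product

def lognormal_pdf(x, mu, sigma):
    """Log-normal density ln(x | mu, sigma): positive support, skewed."""
    return (math.exp(-(math.log(x) - mu) ** 2 / (2.0 * sigma ** 2))
            / (x * sigma * math.sqrt(2.0 * math.pi)))

def choice_likelihood(alpha, lam, delta, gamma):
    """Stub for prod_i c(y_i | M): any positive function of the four
    parameters serves the sketch (here peaked near plausible values)."""
    return math.exp(-((alpha - 0.8) ** 2 + (lam - 1.2) ** 2
                      + (delta - 0.9) ** 2 + (gamma - 0.6) ** 2))

def discrete_integral(P, grid=(0.5, 1.0, 1.5, 2.0)):
    """Discrete version of the inner integral of Equation 9 for one subject:
    sum the choice likelihood over the Cartesian product of the grid,
    weighted by the four log-normal densities with parameters P."""
    total = 0.0
    for a, l, d, g in product(grid, repeat=4):
        weight = (lognormal_pdf(a, *P['alpha']) * lognormal_pdf(l, *P['lambda'])
                  * lognormal_pdf(d, *P['delta']) * lognormal_pdf(g, *P['gamma']))
        total += choice_likelihood(a, l, d, g) * weight
    return total

# Density parameters whose mass sits near the data are rewarded:
near = {k: (0.0, 0.5) for k in ('alpha', 'lambda', 'delta', 'gamma')}
far = {k: (3.0, 0.5) for k in ('alpha', 'lambda', 'delta', 'gamma')}
```

Step 1 maximizes the product of such values across all subjects with respect to the density parameters P; here discrete_integral(near) exceeds discrete_integral(far), because {µ = 0, σ = 0.5} puts its mass where the stub likelihood is high.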
Equation 9 achieves this by multiplying the likelihood that a set of parameter values explains the observed choices by the weight put on that set of values by the density functions. The density functions, and the product thereof, denoted by ln(·) within the integrals, are therefore a way to measure the relative weight of certain parameter values. The maximization is done over all participants simultaneously, under the constraint that each of the four functions remains a proper probability density. Applying this step to our data leads to the density distributions displayed in Figure 4.

Step 2. In this step we estimate individual parameters as in standard maximum likelihood estimation, but now weigh parameter sets by the likelihood of the occurrence of their values, as given by the density functions obtained in step 1. Those distributions, and both steps, are illustrated in Figure 4. This second step of the procedure is described in Equation 10, in which M_i denotes M as above for subject i. The resulting parameters are not only driven by the individual’s decision data, but also by how likely it is that such a parameter combination occurs in the population.

M̂_i = arg max_{M_i} [ Π_{i=1}^{N} c(y_i | M_i) ] ln(α|P̂_α) ln(λ|P̂_λ) ln(δ|P̂_δ) ln(γ|P̂_γ)        (10)

This HML procedure is motivated by the principle that extreme conclusions require extreme evidence. In the first step of the HML procedure, one simultaneously extracts density distributions of all parameter values using all choice data from the population of DMs.
In the second step, the likelihood of a particular parameter value is determined by how well it fits the observed choices of an individual and, at the same time, is weighted by the likelihood of observing that particular parameter value in the population (defined by the density functions obtained in step 1). The parameter set with the most likely combination of both of these elements is selected. The weaker the evidence, the more the parameter estimates are “pulled” toward the center of the density distributions. HML therefore counters maximum likelihood estimation’s tendency to fit parameters to extreme choices by requiring stronger evidence to establish extreme parameter estimates. Thus, if a DM makes very consistent choices, the population parameter densities will have very little effect on the estimates. Conversely, if a DM shows greater inconsistency in his choices, the population parameters will have more pull on the individual estimates. In the extreme, if a DM makes wildly inconsistent choices, the best parameter estimates for the DM will simply be those of the population average. This is illustrated in Figure 5, which compares the estimates for four representative participants under MLE (points) and HML (squares). Subject 1 is the same subject as in Figure 8. We find that HML estimates differ from MLE estimates, but are generally still within the 95% confidence intervals. This is the statistical tie-breaking mechanism we referred to in the previous section. It is also possible for the differences in estimates to be large and fall outside the univariate MLE confidence intervals (displayed as lines), as for subject 2. For the other two participants HML’s estimates are virtually identical, irrespective of the confidence interval. This is largely driven by how consistently a DM’s choices support particular parameter values. The estimation method has been implemented in both MATLAB and C.
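The pull toward the population can be seen in a deliberately simplified, one-dimensional sketch of Equation 10 (illustrative code: only α is free, the individual's choice likelihood is a stub peaking at an extreme α = 1.6 whose "strength" stands in for how consistent the DM's choices are, and the population density parameters are invented):

```python
import math

def lognormal_pdf(x, mu, sigma):
    return (math.exp(-(math.log(x) - mu) ** 2 / (2.0 * sigma ** 2))
            / (x * sigma * math.sqrt(2.0 * math.pi)))

def individual_likelihood(alpha, strength):
    """Stub for prod_i c(y_i | M): evidence for an extreme alpha = 1.6,
    sharper (more consistent choices) for larger strength."""
    return math.exp(-strength * (alpha - 1.6) ** 2)

def hml_estimate(strength, mu=math.log(0.8), sigma=0.3):
    """Equation 10 in one dimension: maximize likelihood times density."""
    grid = [i / 100.0 for i in range(30, 251)]       # alpha in [0.30, 2.50]
    return max(grid, key=lambda a: individual_likelihood(a, strength)
                                   * lognormal_pdf(a, mu, sigma))

weak_evidence = hml_estimate(strength=1.0)       # pulled toward the population
strong_evidence = hml_estimate(strength=100.0)   # stays near alpha = 1.6
```

With weak evidence the estimate is shrunk most of the way toward the population mode (about 0.73 under these invented density parameters), while consistent choices keep it near 1.6: extreme conclusions require extreme evidence.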
Parameter estimates in the second step were found using the simplex algorithm of Nelder and Mead (1965), initialized with multiple starting points to avoid local minima. The negative log-likelihood was used for computational stability where possible. For the first step of the hierarchical method, one cannot use the negative log-likelihood for the whole equation due to the presence of the integral, so the negative log-likelihood was used only for the product across participants. Solving the volume integral is computationally intense because of the high dimensionality and the low probability values. Quadrature methods were found to be prohibitively slow. Because the integral-maximizing parameters are what is of central importance, and not the precise value of the integral, we used Monte Carlo integration with 2,500 (uniform random) values for each dimension as a proxy. We repeated this twenty times and used the average of these estimates as the final distribution parameters. A possible benefit of HML is that the step 1 distribution appears to be fairly stable. With a sufficiently diverse population it may therefore be possible to simply take the distributions found in previous studies, such as this one. The parameter values can be found in the appendix.

Figure 5: Parameter estimates for four participants, with each row a subject and each column a parameter (α, λ, δ, γ). The lines are approximate univariate 95% confidence intervals for MLE for each of the four model parameters (sensitivity not included). The round markers indicate the best estimate from the maximum likelihood estimation procedure. The square marker indicates its hierarchical counterpart. The first subject is subject A in Figure 8. For participants one and two, we observe large differences in parameter estimates, partially explained by the large confidence intervals. For participants three and four we find that the results are virtually identical.
5. Results

5.1. Aggregate results

The aggregate data can be analyzed to establish stylized facts and general behavioral tendencies. In this analysis, all 12,922 choices (142 participants × 91 items) are modeled as if they were made by a single DM, and one set of risk preference parameters is estimated. This preference set is estimated using each of the three estimation methods discussed before (MXF, MLE, HML). For MXF, four parameters are estimated that maximize the fraction of explained choices; for the other two methods, five parameters (the fifth being the sensitivity parameter ϕ) are estimated. The results of the three estimation procedures, including confidence intervals, can be found in Table 3. On the aggregate, we observe behavior consistent with curvature parameters implying diminishing sensitivity to marginal returns; α values less than 1 are estimated for all of the methods, namely 0.59 for MXF and 0.76 for both MLE and HML. Choice behavior is also consistent with some loss aversion (λ > 1), which reflects the notion that “losses loom larger than gains” (Kahneman and Tversky, 1979). The highest loss-aversion value we find for any of the estimation methods is 1.12, which is much lower than the value of 2.25 reported in the seminal cumulative prospect theory paper (Tversky and Kahneman, 1992). Part of the explanation for a lower value (indicating less loss aversion) might be that it is verboten to take money from participants in the laboratory, and thus “losses” were bounded to being reductions in participants’ show-up payment. The probability weighting function has the characteristic inverse S-shape that intersects the diagonal at around p = 0.44 for MLE and HML (and stays above it for MXF).
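The crossover point follows in closed form from Equation 3: w(p) = p implies −ln p = δ(−ln p)^γ, hence p = exp(−δ^{1/(1−γ)}). A quick check with the aggregate Table 3 estimates (illustrative code):

```python
import math

# Fixed point of Prelec's weighting function, w(p) = p (valid for gamma != 1).
def crossover(delta, gamma):
    return math.exp(-delta ** (1.0 / (1.0 - gamma)))

p_agg = crossover(delta=0.93, gamma=0.63)   # aggregate MLE/HML values, time 1
```

This gives p ≈ 0.44, matching the intersection reported above, and δ = 1 recovers 1/e ≈ 0.37 for any γ, as noted in footnote 3.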
Similar values for all of the estimation methods are found in the second experimental session. The largest differences between the two experimental test-retest sessions are seen in the probability weighting parameters.

            α              λ              δ              γ              ϕ
Time 1
  MXF       0.59           1.11           0.71           0.90           –
  MLE       0.76 (±0.03)   1.12 (±0.04)   0.93 (±0.03)   0.63 (±0.02)   0.25 (±0.04)
  HML       0.76 (±0.02)   1.12 (±0.03)   0.93 (±0.02)   0.63 (±0.02)   0.26 (±0.03)
Time 2
  MXF       0.58           1.11           0.71           0.86           –
  MLE       0.78 (±0.03)   1.18 (±0.04)   0.91 (±0.03)   0.65 (±0.02)   0.23 (±0.03)
  HML       0.78 (±0.02)   1.18 (±0.03)   0.91 (±0.02)   0.65 (±0.02)   0.23 (±0.03)

Table 3: Median parameter values for the aggregate data found using all three estimation methods. Approximate univariate 95% confidence intervals were constructed using log-likelihood ratio statistics for the MLE and HML methods; MXF yields no likelihood and hence no intervals.

With the choice function and the obtained value of ϕ, we can convert the parameter estimates of the first experimental session for MLE (and HML, since its aggregate estimates are virtually identical) into a prediction of the probability of picking option B over option A for each lottery in the second experimental session. This is then compared to the fraction of subjects picking option B over option A. On the aggregate, cumulative prospect theory’s predictions appear to perform well. The correlation between the predicted and observed fractions is 0.93.

5.2. Individual risk preference estimates

When benchmarking the explained fraction and fit of individual parameters against their estimated aggregate median counterparts, we find that individual parameters provide clear benefits, especially in terms of fit. Having established that individual parameters add value, a more direct comparison between MXF, MLE and HML is warranted.
Particularly for HML, it is important to determine to what degree its parameter estimates sacrifice in-sample performance, that is, the performance of the individual parameters of the first session within that same session. Individual risk preference parameters were estimated for each subject using all three estimation methods. Fits are expressed in negative log-likelihood (lower is better) and are directly comparable because the hierarchical component is excluded in the evaluation of the likelihood value. The means and standard deviations of the subjects’ fits are displayed in Table 4. As is to be expected, the negative log-likelihood is lowest for MLE, because that method attempts to find the set of parameters with the best possible fit. HML is not far behind, however. No likelihood is given for MXF, as that method does not use the sensitivity parameter. We also determined the explained fraction of choices for each individual. The means and standard deviations are also shown in Table 4. The explained fraction of choices is highest for MXF, at 0.82, because its goal function is to maximize that fraction. MLE and HML follow at about 0.76. Thus, the percentage of choices that can be explained at time 1 by MLE estimates but not by HML estimates is small to none. For some participants the HML estimates explain a larger fraction of choices at time 1. While this seems counterintuitive, it is the fit (likelihood) that ultimately drives the MLE parameter estimates, not the fraction of explained choices. For most participants the HML estimates are different from the MLE estimates, but the differences are inconsequential for the percentage of explained choices.

Figure 6: The predicted fraction of choices at time 2 when using MLE parameters from time 1 (x-axis) and HML parameters from time 1 (y-axis). Each point represents one DM.
MLE performs better for a subject whose point appears in the lower right region; HML performs better when it appears in the upper left region.

Figure 7: The goodness of fit at time 2 when using MLE parameters from time 1 (x-axis) and HML parameters from time 1 (y-axis), both expressed as negative log-likelihoods. Each point represents one DM. Since lower values are better here, MLE performs better for a DM whose point appears in the upper left region; HML performs better when it appears in the lower right region.

            Explained (Time 1)    Predicted (Time 2)
Fraction
  MXF       0.82 (0.07)           0.74 (0.10)
  MLE       0.76 (0.09)           0.73 (0.09)
  HML       0.76 (0.08)           0.73 (0.10)
Fit
  MLE       83.73 (21.15)         103.24 (26.66)
  HML       86.37 (21.81)         98.97 (23.66)

Table 4: Means and standard deviations (in parentheses) for the three estimation methods. “Explained” refers to time 1 statistics given that time 1 estimates are used; “predicted” refers to time 2 statistics using parameters from time 1. The explained fraction of choices is about the same across methods, while the fit at time 2 is better for HML. No fit is reported for MXF, as it yields no likelihood.

So far we have established that the HML estimates do not hurt the fraction of choices explained, even though they have a somewhat worse fit in-sample; the next question is whether HML is worth the effort in terms of prediction (out-of-sample). The estimates of the individual parameters from the first session were used to predict choices in the second session. In Table 4 we list simple statistics for the fit (again comparable, as the hierarchical component is not included) and the explained fractions of all three estimation methods for the second session. All three methods explain approximately the same fraction of choices. The large advantage MXF had over the other two methods has disappeared.
It is interesting to compare this with the prediction that participants simply make the same choices in both sessions. That method correctly predicts 0.71 of choices on average, but its application is of course limited to situations in which the lotteries are identical. Figure 6 shows the fraction of correctly predicted lotteries and Figure 7 compares the fit for MLE and HML parameters. Judging by the fraction of lotteries explained, it is not immediately obvious which method is better. A more sensitive measure for comparing the methods is the goodness of fit. When comparing HML to MLE, we find that HML’s out-of-sample fit is better on average, and its negative log-likelihood is lower for 100 out of 142 participants. Using a paired signed-rank test for medians, we find that the fit for HML is significantly lower (i.e., better) than for MLE (p < 0.001). The results are much clearer when examining the fit in the figure too, which shows many points on the diagonal and a large number below it (a lower negative log-likelihood is better). We can conclude that HML sacrificed part of the fit at time 1 for an improved fit at time 2.

Examination of the differences in parameter estimates for those participants for whom HML outperforms MLE reveals higher mean MLE estimates for the individual probability weighting function parameters. To verify that HML did not outperform merely because MLE returned extremely high parameter estimates for the weighting function, we applied the simple solution of imposing (somewhat arbitrarily chosen) bounds on the MLE estimation method (0.2 ≤ α ≤ 2, 0.2 ≤ λ ≤ 5, 0.2 ≤ δ ≤ 2, 0.2 ≤ γ ≤ 1). This did not change the results, as HML still outperformed MLE. In many instances HML is also an improvement over MLE for seemingly plausible MLE estimates, so HML’s effect is not limited to outliers or extreme parameter values.
Interestingly, some of the cases in which MLE outperforms HML are those in which extreme values emerge that are consistent with choices in the second experimental session, such as loss seeking (less weight on losses) or weighting functions that are nearly binary step functions. This lack of sensitivity to genuinely extreme preferences highlights one of the potential downsides of HML.

5.3. Parameter reliability

A key issue is the reliability of the risk parameter estimates. This viewpoint is consistent with treating risk preferences as a trait: a stable characteristic of an individual decision maker, amenable to psychological interpretation of the parameters. Reliability can be assessed by applying all three estimation procedures independently to the same data from the two experimental sessions. Large changes in parameter values are a problem and an indication that we are mostly measuring/modeling noise instead of real individual risk preferences. At the individual level we can examine the test-retest correlations of the parameters from the three estimation methods, displayed in Table 5. The parameter correlations for MXF indicate some reliability, with correlations of approximately 0.40, except for δ, which approaches zero. MLE provides more reliable estimates for δ than MXF, but performs worse overall. The test-retest reliability of MLE can be improved for three of the four parameters by applying admittedly arbitrary bounds. The highest test-retest correlations are obtained using HML, which yields the most reliable estimates for all parameters.

Another form of parameter reliability is the sensitivity of estimates to small changes in choice input. One way of establishing this form of reliability is to temporarily change a subject's choice on one lottery, after which the parameters are re-estimated. This is repeated for each of the 91 lotteries separately.
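The perturbation procedure just described is straightforward to sketch: flip one choice at a time, re-estimate, and inspect the spread of the resulting estimates. The `estimate` argument below is a placeholder for any of the estimators (MXF, MLE, or HML); only the flip-and-refit loop is shown.

```python
def perturbation_check(choices, estimate):
    """For each lottery item, flip that one binary choice (1 = A, 0 = B),
    re-estimate the parameters, and collect the results. A tight cluster of
    estimates indicates robustness; a spread-out or bimodal pattern (as in
    Figure 8 for MLE) indicates fragility."""
    results = []
    for i in range(len(choices)):
        perturbed = list(choices)
        perturbed[i] = 1 - perturbed[i]  # flip choice i only
        results.append(estimate(perturbed))
    return results
```

For the 91-item set this yields 91 re-estimates per subject and estimator, which can then be plotted as the histograms in Figure 8.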
       MXF    MLE    MLE (bounded)   HML
r_α    0.39   0.23   0.28            0.43
r_λ    0.40   0.36   0.54            0.53
r_δ    0.05   0.19   0.10            0.44
r_γ    0.39   0.06   0.45            0.66

Table 5: Test-retest correlation coefficients (r values) for parameter values obtained by applying each estimation procedure to both sessions independently. The use of parameter bounds (0.2 ≤ α ≤ 2, 0.2 ≤ λ ≤ 5, 0.2 ≤ δ ≤ 2, 0.2 ≤ γ ≤ 1), indicated by the bounded condition, improves the test-retest correlation for MLE. The highest test-retest correlations are obtained with HML.

MXF's and HML's estimates are robust to such perturbations, whereas MLE's estimates are less so. Figure 8 plots the results of re-estimation after perturbing one choice, for two subjects, using both MLE and HML. A single different choice can yield large changes in MLE parameter estimates, and this contributes to unreliable parameters. This is an undesirable property of a measurement method, as minor inattention, confusion, or a mistake can lead to very different conclusions about a person's risk preferences. The high robustness of HML estimates is a general finding among subjects in our sample. Moreover, the fragility of the MLE estimates quickly worsens as the number of aberrant choices increases by even a few.

[Figure 8: for Subjects A and B, histograms of the re-estimated values of α, λ, δ, and γ, one row of panels per subject and method (MLE and HML).]

Figure 8: The parameter outcomes of re-estimation for two sample subjects when a single choice is changed, once for each of the 91 items. The black line corresponds to the best estimate of the original choices.
The bimodal pattern we observe for loss aversion and the weighting function for Subject A is not a desirable feature: a momentary lapse of attention or a minor mistake may result in very different parameter combinations. HML produces more reliable estimates, in the sense of being less sensitive to one aberrant choice, than MLE.

6. Discussion

In this paper we illustrated the merits of a hierarchical maximum likelihood (HML) procedure for estimating individual risk parameters. The HML estimates reduced in-sample performance only trivially, while the out-of-sample fit was significantly better than MLE's and the explained fraction was as high as that of any other method. The test-retest reliability of the parameter estimates was furthermore higher for HML than for any other method. We conclude that HML offers advantages over maximization of the explained fraction and over plain maximum likelihood estimation, and is the preferred estimation method for this choice model. The use of population information in establishing individual estimates can improve both the out-of-sample fit and the reliability of the parameter estimates.

Some caveats are in order, however. The benefits of hierarchical modeling may, for example, disappear when more choice data are available. In cases where many choices have been observed, the signal can more easily be teased apart from the noise and hierarchical methods may become unwarranted. The benefits may also depend on the set of lotteries used. For lotteries specifically designed for the tested model, allowing only a narrow range of potential parameter values, the choices alone may be sufficient to reliably estimate individual risk preference parameters, and hence the MXF estimates may be more reliable than in our current results.
As researchers, however, we do not always control the set of lottery items, especially outside the laboratory. Furthermore, cumulative prospect theory, in the parameterization we have chosen here, is but one model. Other models may fit the choice data better and yield parameter estimates that are highly robust over time; in that case there may be no added value in hierarchical parameter estimation.

In this paper we have not discussed alternative hierarchical estimation methods. Bayesian models have gained attention in the literature recently (Zyphur and Oswald, 2013). The frequentist approach is used here because it requires the addition of only one term to the likelihood function, given that the hierarchical distribution is known. Our data indicate that this distribution is fairly stable in a population. Nevertheless, a Bayesian hierarchical estimation method may outperform the frequentist method outlined here. Additionally, the hierarchical method we have applied is of a relatively simple nature. It is possible to modify the extent to which the hierarchical component weights the individual risk preference estimates, and there may be more sophisticated ways to do so. One way is to fit a parameter that moderates the weight of the hierarchical component by maximizing the fit in the retest. Determining the advantages of such a method requires an experiment with three sessions: one to obtain parameter estimates, a second to calibrate the weighting parameter, and a third to verify that it outperforms the other methods. This may be of both theoretical and practical interest, but it is beyond the scope of this paper.

One major takeaway from the current paper is that the estimation method can greatly affect the inferences one makes about individual risk preferences, particularly in multi-parameter choice models. A number of different parameter combinations may produce virtually the same fit result.
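The one-term addition to the likelihood function described above can be made concrete: the hierarchical objective is the individual negative log likelihood plus the negative log density of the candidate parameters under the population distribution. The sketch below assumes independent normal densities over log-transformed parameters; the actual density and transformation used in the paper are not specified here and may differ.

```python
import math

def normal_logpdf(x, mu, sigma):
    """Log density of a univariate normal distribution."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

def hierarchical_nll(params, individual_nll, pop_mu, pop_sigma):
    """Individual NLL plus a population penalty: parameter vectors far from
    the population mode are disfavored, which breaks fit ties (or near-ties)
    between candidate parameter sets toward population-typical values."""
    penalty = -sum(normal_logpdf(math.log(p), m, s)
                   for p, m, s in zip(params, pop_mu, pop_sigma))
    return individual_nll(params) + penalty
```

Minimizing this objective, for instance with the Nelder-Mead simplex method the paper cites, yields hierarchically regularized individual estimates; setting the penalty to zero recovers plain MLE.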
When the interpretation of individual parameter values is used to make predictions about what an individual is like (e.g., psychological traits) or what she will do in the future (out-of-sample prediction), it is important to realize that other constellations of parameters may be virtually equally plausible in terms of fit, yet lead to vastly different conclusions about individual differences and predictions about behavior. Moreover, using more sophisticated estimation methods can help extract more signal from a set of noisy choices and thus support better measurement of innate risk preferences.

7. Online implementation

Implementing the measurement and estimation methods delineated in this paper requires some investment of time, effort, and technical expertise; this may act as a barrier to the measure's and method's wider adoption. One can imagine researchers interested in establishing reliable measures of risk preferences at the individual level, even when these risk preferences are not of central interest to their research programs. Nonetheless, knowing these individual differences is worth some attention, as they may covary with other variables or psychological processes under scrutiny. To facilitate individual-level risk measurement, we have developed an online version of the risky decision task used in this research. The URL http://vlab.ethz.ch/prospects contains a landing page, followed by the 91 binary lotteries (see the Appendix) presented in a randomized order to a decision maker to elicit their choices and preferences over well-defined lotteries. The DM is shown each of the binary lotteries one at a time, and the computer records their choice as well as their response time.
After the DM has registered all of her choices, the individual's parameters are estimated (using the HML method) and the results are displayed (and emailed to the researcher as well). Further, researchers have full access to the raw data should they wish to run additional analyses or modeling on the choice data from individual decision makers. This configuration can be used in a behavioral laboratory with internet-connected computers, and most naturally as a pre- or post-test, in sequence with other research tasks.

Acknowledgments

The authors wish to thank Michael Schulte-Mecklenbeck, Thorsten Pachur, and Ralph Hertwig for the use of the experimental data analyzed in this paper. These data are a subset from a large and thorough risky choice study conducted at the Max Planck Institute. Thanks also to Helga Fehr-Duda and Thomas Epper for helpful comments and suggestions. This work was funded in part by the ETH Risk Center CHIRP Research Project. Any errors are the responsibility of the authors.

References

Abdellaoui, M., 2000. Parameter-free elicitation of utility and probability weighting functions. Management Science 46 (11), 1497–1512.

Birnbaum, M. H., Chavez, A., 1997. Tests of theories of decision making: Violations of branch independence and distribution independence. Organizational Behavior and Human Decision Processes 71, 161–194.

Camerer, C., Ho, T.-H., 1994. Violations of the betweenness axiom and nonlinearity in probability. Journal of Risk and Uncertainty 8, 167–196.

Farrell, S., Ludwig, C., 2008. Bayesian and maximum likelihood estimation of hierarchical response times. Psychonomic Bulletin and Review 15 (6), 1209–1217.

Fehr-Duda, H., de Gennaro, M., Schubert, R., 2006. Gender, financial risk, and probability weights.
Theory and Decision 60 (2-3), 283–313.

Fehr-Duda, H., Epper, T., 2012. Probability and risk: Foundations and economic implications of probability-dependent risk preferences. Annual Review of Economics 4, 567–593.

Figner, B., Murphy, R. O., 2010. Using skin conductance in judgment and decision making research. New York, NY: Psychology Press, pp. 163–184.

Gaechter, S., Johnson, E. J., Hermann, A., 2007. Individual-level loss aversion in risky and riskless choice. Working Paper, University of Nottingham.

Gigerenzer, G., Gaissmaier, W., 2011. Heuristic decision making. Annual Review of Psychology 62, 451–482.

Glöckner, A., Pachur, T., 2012. Cognitive models of risky choice: Parameter stability and predictive accuracy of prospect theory. Cognition 123, 21–32.

Goldstein, W., Einhorn, H., 1987. Expression theory and the preference reversal phenomena. Psychological Review 94 (2), 236–254.

Gonzalez, R., Wu, G., 1999. On the shape of the probability weighting function. Cognitive Psychology 38, 129–166.

Harless, D. W., Camerer, C. F., 1994. The predictive utility of generalized expected utility theories. Econometrica 62 (6), 1251–1290.

Hey, J. D., Orme, C., 1994. Investigating generalizations of the expected utility theory using experimental data. Econometrica 62 (6), 1291–1326.

Holt, C. A., Laury, S. K., 2002. Risk aversion and incentive effects. The American Economic Review 92 (5), 1644–1655.

Ingersoll, J., 2008. Non-monotonicity of the Tversky-Kahneman probability-weighting function: A cautionary note. European Financial Management 14 (3), 385–390.

Kahneman, D., Tversky, A., 1979. Prospect theory: An analysis of decision under risk. Econometrica 47, 263–291.

Kahneman, D., Tversky, A. (Eds.), 2000. Choices, values and frames. Cambridge University Press, Cambridge, UK, pp. 1–16.

Karmarkar, U., 1979.
Subjectively weighted utility and the Allais paradox. Organizational Behavior and Human Performance 24, 67–72.

Köbberling, V., Wakker, P., 2005. An index of loss aversion. Journal of Economic Theory 122, 119–131.

Lichtenstein, S., Slovic, P., 1971. Reversals of preference between bids and choices in gambling decisions. Journal of Experimental Psychology 89 (1), 46–55.

Lopes, L. L., 1983. Some thoughts on the psychological concept of risk. Journal of Experimental Psychology: Human Perception and Performance 9, 137–144.

Luce, R. D., 1959. Individual choice behavior: A theoretical analysis. New York, NY: Wiley.

Mosteller, F., Nogee, P., 1951. An experimental measurement of utility. The Journal of Political Economy 59, 371–404.

Nelder, J. A., Mead, R., 1965. A simplex method for function minimization. Computer Journal 7, 308–313.

Nilsson, H., Rieskamp, J., Wagenmakers, E.-J., 2011. Hierarchical Bayesian parameter estimation for cumulative prospect theory. Journal of Mathematical Psychology 55, 84–93.

Prelec, D., 1998. The probability weighting function. Econometrica 66, 497–527.

Rabin, M., 2000. Risk aversion and expected-utility theory: A calibration theorem. Econometrica 68, 1281–1293.

Rieskamp, J., 2008. The probabilistic nature of preferential choice. Journal of Experimental Psychology: Learning, Memory and Cognition 34 (6), 1446–1465.

Roberts, S., Pashler, H., 2000. How persuasive is a good fit? A comment on theory testing. Psychological Review 107 (2), 358–367.

Scheibehenne, B., Pachur, T., 2013. Hierarchical Bayesian modeling: Does it improve parameter stability? In: Knauff, M., Pauen, M., Sebanz, N., Wachsmuth, I. (Eds.), Proceedings of the 35th Annual Conference of the Cognitive Science Society. Austin, TX: Cognitive Science Society, pp. 1277–1282.

Schulte-Mecklenbeck, M., Pachur, T., Murphy, R. O., Hertwig, R., 2014.
Does prospect theory capture psychological processes? Working Paper, MPI Berlin.

Starmer, C., 2000. Developments in non-expected utility theory: The hunt for a descriptive theory of choice under risk. Journal of Economic Literature 38, 332–382.

Stevens, S. S., 1957. On the psychophysical law. The Psychological Review 64 (3), 153–181.

Stott, H., 2006. Cumulative prospect theory's functional menagerie. Journal of Risk and Uncertainty 32, 101–130.

Tanaka, T., Camerer, C. F., Nguyen, Q., 2010. Risk and time preferences: Linking experimental and household survey data from Vietnam. American Economic Review 100 (1), 557–571.

Tversky, A., Kahneman, D., 1992. Advances in prospect theory: Cumulative representations of uncertainty. Journal of Risk and Uncertainty 5, 297–323.

Wakker, P., 2005. Formalizing reference dependence and initial wealth in Rabin's calibration theorem. Working paper, Erasmus University, available at http://people.few.eur.nl/wakker/pdf/calibcsocty05.pdf.

Wetzels, R., Vandekerckhove, J., Tuerlinckx, F., Wagenmakers, E.-J., 2010. Bayesian parameter estimation in the expectancy valence model of the Iowa gambling task. Journal of Mathematical Psychology 54, 14–27.

Willemsen, M., Johnson, E., 2014. URL http://www.mouselabweb.org

Willemsen, M. C., Johnson, E. J., 2011. Visiting the decision factory: Observing cognition with MouselabWEB and other information acquisition methods. A handbook of process tracing methods for decision research: A critical review and user's guide, 21–42.

Zeisberger, S., Vrecko, D., Langer, T., 2010. Measuring the time stability of prospect theory preferences. Theory and Decision 72, 359–386.

Zyphur, M. J., Oswald, F. L., 2013. Bayesian probability and statistics in management research: A new horizon. Journal of Management 39 (1), 5–13.
Appendix A: Lotteries

[Table 6: The 91 lotteries used in the experiment. Each row is one lottery. The first column is the item number. Columns 2-5 describe option A, in the form of a p(A1) probability of outcome A1 and a p(A2) probability of outcome A2. Columns 6-9 describe option B in the same format. The numeric entries of the two-page table are garbled in this copy and are not reproduced here.]

Appendix B: Hierarchical Distribution Parameters

            μ_α    σ_α    μ_λ    σ_λ    μ_δ    σ_δ    μ_γ    σ_γ
Session 1  -0.27   0.18   0.00   0.62  -0.02   0.39  -0.31   0.59
Session 2  -0.22   0.18   0.01   0.67  -0.00   0.38  -0.34   0.67

Table 7: Estimates for the density distribution parameters of both experimental sessions. Both were obtained using Monte Carlo integration with 2,500 random values on each dimension.