RISK AVERSION IN GAME SHOWS

Steffen Andersen, Glenn W. Harrison, Morten I. Lau and E. Elisabet Rutström

ABSTRACT

We review the use of behavior from television game shows to infer risk attitudes. These shows provide evidence when contestants are making decisions over very large stakes, and in a replicated, structured way. Inferences are generally confounded by the subjective assessment of skill in some games, and the dynamic nature of the task in most games. We consider the game shows Card Sharks, Jeopardy!, Lingo, and finally Deal Or No Deal. We provide a detailed case study of the analyses of Deal Or No Deal, since it is suitable for inference about risk attitudes and has attracted considerable attention.

Risk Aversion in Experiments
Research in Experimental Economics, Volume 12, 361–406
Copyright © 2008 by Emerald Group Publishing Limited
All rights of reproduction in any form reserved
ISSN: 0193-2306/doi:10.1016/S0193-2306(08)00008-2

Observed behavior on television game shows constitutes a controlled natural experiment that has been used to estimate risk attitudes. Contestants are presented with well-defined choices where the stakes are real and sizeable, and the tasks are repeated in the same manner from contestant to contestant. We review behavior in these games, with an eye to inferring risk attitudes. We describe the types of assumptions needed to evaluate behavior, and propose a general method for estimating the parameters of structural models of choice behavior for these games. We illustrate with a detailed case study of behavior in the U.S. version of Deal Or No Deal (DOND).

In Section 1 we review the existing literature in this area that is focused on risk attitudes, starting with Gertner (1993) and the Card Sharks program. We then review the analysis of behavior on Jeopardy! by Metrick (1995) and on Lingo by Beetsma and Schotman (2001).1 In Section 2 we turn to a detailed case study of the DOND program, which has generated an explosion of analyses trying to estimate large-stakes risk aversion. We explain the basic rules of the game, which is shown with some variations in many countries. We then review complementary laboratory experiments that correspond to the rules of the naturally occurring game show. Finally, we discuss alternative modeling strategies employed in the related DOND literature.

Section 3 proposes a general method for estimating choice models in the stochastic dynamic programming environment that most of these game shows employ. We resolve the "curse of dimensionality" in this setting by using randomization methods and certain simplifications to the forward-looking strategies adopted. We discuss the ability of our approach to closely approximate the fully dynamic path that agents might adopt. We illustrate the application of the method using data from the U.S. version of DOND, and estimate a simple structural model of expected utility theory choice behavior. The manner in which our method can be extended to other models is also discussed. Finally, in Section 4 we identify several weaknesses of game show data, and how they might be addressed. We stress the complementary use of natural experiments, such as game shows, and laboratory experiments.

1. PREVIOUS LITERATURE

1.1. Card Sharks

The game show Card Sharks provided an opportunity for Gertner (1993) to examine dynamic choice under uncertainty involving substantial gains and losses.
Two key features of the show allowed him to examine the hypothesis of asset integration: each contestant's stake accumulates from round to round within a game, and the fact that some contestants come back for repeat plays after winning substantial amounts.

The game involves each contestant deciding in a given round whether to bet that the next card drawn from a deck will be higher or lower than some "face card" on display. Fig. 1 provides a rough idea of the layout of the "Money Cards" board before any face cards are shown. Fig. 2 provides a representation of the board from a computerized laboratory implementation2 of Card Sharks. In Fig. 2 the subject has a face card with a 3, and is about to enter the first bet.

Fig. 1. Money Cards Board in Card Sharks.

Fig. 2. Money Cards Board from Lab Version of Card Sharks.

Cards are drawn without replacement from a standard 52-card deck, with no Jokers and with Aces high. Contestants decide on the relative value of the next card, and then on an amount to bet that their choice is correct. If they are correct their stake increments by the amount bet; if they are incorrect their stake is reduced by the amount bet; and if the new card is the same as the face card there is no change in the stake. Every contestant starts off with an initial stake of $200, and bets could be made in $50 increments up to the available stake. After three rounds in the first, bottom "row" of cards, they move to the second, middle "row" and receive an additional $200 (or $400 in some versions). If the stake goes to zero in the first row, contestants go straight to the second row and receive the new stake; otherwise, the additional stake is added to what remains from row one. The second row includes three choices, just as in the first row. After these three choices, and if the stakes have not dropped to zero, they can play the final bet. In this case they have to bet at least one-half of their stake, but otherwise the betting works the same way. One feature of the game is that contestants sometimes have the option to switch face cards in the hope of getting one that is easier to win against.3

The show aired in the United States in two major versions. The first, between April 1978 and October 1981, was on NBC and had Jim Perry as the host. The second, between January 1986 and March 1989, was on CBS and had Bob Eubanks as the host.4 The maximum prize was $28,800 on the NBC version and $32,000 on the CBS version, and would be won if the contestant correctly bet the maximum amount in every round. This only occurred once. Using official inflation calculators,5 these maximum prizes convert into 2006 dollars of between $89,138 and $63,936 for the NBC version, and between $58,920 and $52,077 for the CBS version.

These stakes are actually quite modest in relation to contemporary game shows in the United States, such as DOND described below, which typically has a maximal stake of $1,000,000. Of course, maximal stakes can be misleading, since Card Sharks and DOND are both "long shot" lotteries. Average earnings in the CBS version used by Gertner (1993) were $4,677, which converts to between $8,611 and $7,611 in 2006, whereas average earnings in DOND have been $131,943 for the sample we report later (excluding a handful of special shows with significantly higher prizes).
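To make these betting odds concrete, the following sketch computes the probabilities of winning, losing, and tying a higher/lower call against a given face card. It is our illustration, not code from any of the studies reviewed: it treats the rest of the deck as fresh, so the conditioning on previously revealed cards that actual play requires is omitted for brevity.

```python
RANKS = range(2, 15)  # 2 through 14, with Aces high (14)

def call_odds(face: int, call: str) -> tuple[float, float, float]:
    """Return (p_win, p_lose, p_tie) for betting that the next card is
    'higher' or 'lower' than the face card, from the 51 cards left once
    the face card is on display (3 cards of the face rank can still tie)."""
    higher = sum(4 for rank in RANKS if rank > face)
    lower = sum(4 for rank in RANKS if rank < face)
    tie = 3
    total = higher + lower + tie  # 51 cards remain
    win, lose = (higher, lower) if call == "higher" else (lower, higher)
    return win / total, lose / total, tie / total

print(call_odds(2, "higher"))  # ~ (0.941, 0.000, 0.059): a "sure thing"
print(call_odds(8, "higher"))  # ~ (0.471, 0.471, 0.059): no edge either way
```

The two printed cases show why a face card of 2 (or, symmetrically, an Ace) cannot lose, while a face card of 8 offers no edge in either direction, a fact that matters for the empirical analyses discussed below.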
1.1.1. Estimates of Risk Attitudes

The analysis of Gertner (1993) assumes a Constant Absolute Risk Aversion (CARA) utility function, since he did not have information on household wealth and viewed that as necessary to estimate a Constant Relative Risk Aversion (CRRA) utility function. We return to the issue of household wealth later.

Gertner (1993) presents several empirical analyses. He initially (p. 511) focuses on the last round, and uses the optimal "investment" formula

  b = (\ln p_{win} - \ln p_{lose}) / (2a)

where the probabilities of winning and losing the bet b are defined by p_{win} and p_{lose}, and the utility function is U(W) = -\exp(-aW) for wealth W.6 From observed bets he infers a.

There are several potential problems with this approach. First, there is an obvious sample selection problem from only looking at the last round, although this is not a major issue since relatively few contestants go bankrupt (less than 3%). Second, there is the serious problem of censoring at bets of 50% or 100% of the stake. Gertner (1993, p. 510) is well aware of the issue, and indeed motivates several analytical approaches to these data by a desire to avoid it:

  Regression estimates of absolute risk aversion are sensitive to the distribution assumptions one makes to handle the censoring created by the constraints that a contestant must bet no more than her stake and at least half of her stake in the final round. Therefore, I develop two methods to estimate a lower bound on the level of risk aversion that do not rely on assumptions about the error distribution.

The first method he uses is simply to assume that the censored responses are in fact the optimal responses. The 50% bets are assumed to be optimal bets, when in fact the contestant might wish to bet less (but cannot, due to the final-round betting rules); thus inferences from these responses will be biased towards showing less risk aversion than there might actually be. Conversely, the 100% bets are assumed to come from risk-neutral contestants, when in fact those contestants might be risk loving; thus inferences from these responses will be biased towards showing more risk aversion than there might actually be. Two wrongs do not make a right, although one does encounter such claims in empirical work. Of course, this approach still relies on exactly the same sort of assumptions about the interpretation of behavior, although not formalized in terms of an error distribution. And it is not apparent that the estimates will be lower bounds, since this censoring issue biases inferences in either direction. The average estimate of ARA to emerge is 0.000310, with a standard error of 0.000017, but it is not clear how one should interpret this estimate since it could be an overestimate or an underestimate.

The second approach is a novel and early application of simulation methods, which we develop in greater detail below. A computer simulates optimal play by a risk-neutral agent playing the entire game 10 million times, recognizing that the cards are drawn without replacement. The computer does not appear to recognize the possibility of switching cards, but that is not central to the methodological point. The average return from this virtual lottery (VL) is $6,987, with a standard deviation of $10,843. It is not apparent that the lottery would have a Gaussian distribution of returns, but that can be allowed for in a more complete numerical analysis, as we show later, and is again not central to the main methodological point.
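This kind of simulation can be reproduced in outline. The sketch below is our reconstruction, not Gertner's code: it plays the seven bets of the Money Cards board under a risk-neutral strategy (bet everything when the odds favor the call, the minimum otherwise, and at least half the stake in the final bet), drawing without replacement. Card switching, the $50 increment restriction on the half-stake bet, and the exact row-to-row face card rules are simplified, so the simulated mean will differ somewhat from Gertner's $6,987.

```python
import random

def play_money_cards(rng: random.Random, min_bet: int = 50) -> int:
    """One stylized risk-neutral play of the Money Cards board: three bets
    on a $200 stake, a $200 top-up for three more bets, then a final bet
    of at least half the stake. Cards are drawn without replacement."""
    deck = [rank for rank in range(2, 15) for _ in range(4)]  # Aces high
    rng.shuffle(deck)
    face = deck.pop()
    stake = 200
    for bet_number in range(7):
        if bet_number == 3:        # move to the second row: $200 top-up
            stake += 200
        if stake == 0:             # bankrupt: wait out the row
            face = deck.pop()
            continue
        higher = sum(1 for card in deck if card > face)
        lower = sum(1 for card in deck if card < face)
        call_higher = higher >= lower
        edge = higher != lower
        if bet_number == 6:        # final bet: at least half the stake
            bet = stake if edge else -(-stake // 2)  # ceil(stake / 2)
        else:
            bet = stake if edge else min(min_bet, stake)
        card = deck.pop()
        if card != face:           # a tie simply returns the bet
            stake += bet if (card > face) == call_higher else -bet
        face = card
    return stake

rng = random.Random(1993)
payoffs = [play_money_cards(rng) for _ in range(100_000)]
print(sum(payoffs) / len(payoffs))  # mean payoff of the risk-neutral strategy
```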
The next step is to compare this distribution with the observed distribution of earnings, which had an average of $4,677 with a standard deviation of $4,258, and use a revealed preference argument to infer what risk attitudes must have been in play for this to have been the outcome instead of the VL:

  A second approach is to compare the sample distribution of outcomes with the distribution of outcomes if a contestant plays the optimal strategy for a risk-neutral contestant. One can solve for the coefficient of absolute risk aversion that would make an individual indifferent between the two distributions. By revealed preference, an "average" contestant prefers the actual distribution to the expected-value maximizing strategy, so this is an estimate of the lower bound of constant absolute risk aversion (pp. 511/512).

This approach is worth considering in more depth, because it suggests estimation strategies for a wide class of stochastic dynamic programming problems, which we develop in Section 3. This exact method will not work once one moves beyond special cases such as risk neutrality, where outcomes and behavior in later rounds have no effect on optimal behavior in earlier rounds. But we will see that an extension of the method does generalize. The comparison proposed here generates a lower bound on the ARA, rather than a precise estimate, since we know that an agent with an even higher ARA would also implicitly choose the observed distribution over the virtual risk-neutral distribution. Obviously, if one could generate VL distributions for a wide range of ARA values, it would be possible to refine this estimation step and select the ARA that maximizes the likelihood of the data. This is, in fact, exactly what we propose later as a general method for estimating risk attitudes in such settings. The ARA bound derived from this approach is 0.0000711, less than one-fourth of the estimate from the first method.

Gertner (1993, p. 512) concludes that

  The "Card Sharks" data indicate a level of risk aversion higher than most existing estimates. Contestants do not seem to behave in a risk-loving and enthusiastic way because they are on television, because anything they win is gravy, or because the producers of the show encourage excessive risk-taking. I think this helps lend credence to the potential importance and wider applicability of the anomalous results I document below.

His first method does not provide any basis for these claims, since risk loving is explicitly assumed away. His second method does indicate that the average player behaves as if risk averse, but there are no standard errors on that bound. Thus, one simply cannot say that it is statistically significant evidence of risk aversion.

1.1.2. EUT Anomalies

The second broad set of empirical analyses by Gertner (1993) considers a regression model of bets in the final round, and shows some alleged violations of EUT. The model is a two-limit tobit specification, recognizing that bets at 50% and 100% may be censored. However, most of the settings in which contestants might rationally bet 50% or 100% are dropped. Bets with a face card of 2 or an Ace are dropped, since they are sure things in the sense that the optimal bet cannot result in a loss (the bet is simply returned if the same card is then turned up). Similarly, bets with a face card of 8 are dropped, since contestants almost always bet the minimum. These deletions amount to 258 of the 844 observations, which is not a trivial sub-sample.
The regression model includes several explanatory variables. The central ones are cash and stake. The variable cash is the accumulated earnings by the contestant to that point over all repetitions of the game, so it includes previous plays of the game for "champions," as well as earnings accumulated in rounds 1–6 of the current game. The variable stake is the accumulated earnings in the current game, so it excludes earnings from previous games. One might expect the correlation of stake and cash to be positive and high, since the average number of times the game is played in these data is 1.85 (= 844/457). Additional explanatory variables include a dummy for new players that are in their first game; the ratio of cash to the number of times the contestant has played the whole game (the ratio is 0 for new players); the value of any cars that have been won, given by the stated sticker price of the car; and dummy variables for each of the possible face card pairs (in this game a 3 is essentially the same as a King, a 4 the same as a Queen, etc.). The stake variable is included as an interaction with these face dummies, which are also included by themselves.7 The model is estimated both with and without a multiplicative heteroskedasticity correction, and the estimates with the correction are preferred. Card-counters are ignored when inferring probabilities of a win, and this seems reasonable as a first approximation.

Gertner (1993, Section VI) draws two striking conclusions from this model. The first is that stake is statistically significant in its interactions with the face cards. The second is that the cash variable is not significant. The first result is said to be inconsistent with EUT since earnings in this show are small in relation to wealth, and

  The desired dollar bet should depend upon the stakes only to the extent that the stakes impact final wealth. Thus, risky decisions on "Card Sharks" are inconsistent with individuals maximizing a utility function over just final wealth. If one assumes that utility depends only on wealth, estimates of zero on card intercepts and significant coefficients on the stake variable imply that outside wealth is close to zero. Since this does not hold, one must reject utility depending only on final wealth (p. 517).

This conclusion bears close examination. First, there is a substantial debate as to whether EUT has to be defined over final wealth, whatever that is, or can be defined just over outcomes in the choice task before the contestant (e.g., see Cox and Sadiraj (2006) and Harrison, Lau, and Rutström (2007) for references to the historical literature). So even if one concludes that the stake matters, this is not fatal for specifications of EUT defined over prizes, as clearly recognized by Gertner (1993, p. 519) in his reference to Markowitz (1952). Second, the deletion of all extreme bets likely leads to a significant understatement of uncertainty about coefficient estimates. Third, the regression does not correct for panel effects, and these could be significant since the variables cash and stake are correlated with the individual.8 Hence their coefficient estimates might be picking up other, unobservable effects that are individual-specific.

The second result is also said to be inconsistent with EUT, in conjunction with the first result. The logic is that stake and cash should have an equal effect on terminal wealth, if one assumes perfect asset integration and that utility is defined over terminal wealth.
But one has a significant effect on bets, and the other does not. Since the assumptions that utility is defined over terminal wealth and that asset integration is perfect are implicitly maintained by Gertner (1993, p. 517ff.), he concludes that EUT is falsified. However, one can include terminal wealth as an argument of utility without also assuming perfect asset integration (e.g., Cox & Sadiraj, 2006). This is also recognized explicitly by Gertner (1993, p. 519), who considers the possibility that "contestants have multi-attribute utility functions, so that they care about something in addition to wealth."9 Thus, if one accepts the statistical caveats about samples and specifications for now, these results point to the rejection of a particular, prominent version of EUT, but they do not imply that all popular versions of EUT are invalid.

1.2. Jeopardy!

In the game show Jeopardy! there is a subgame referred to as Final Jeopardy. At this point, three contestants have cash earnings from the initial rounds. The skill component of the game consists of hearing some text read out by the host, at which point the contestants jump in to state the question that the text provides the answer to.10 In Final Jeopardy the contestants are told the general subject matter for the task, and then have to privately and simultaneously state a wager amount from their accumulated points. They can wager any amount up to their earned endowment at that point, and are rewarded with even odds: if they are correct they get that wager amount added, but if they are incorrect they have that amount deducted. The winner of the show is the contestant with the most cash after this final stage. The winner gets to keep the earnings and come back the following day to try and continue as champion.

In general, these wagers are affected by the risk attitudes of contestants. But they are also affected by their subjective beliefs about their own skill level relative to the other two contestants, and by what they think the other contestants will do. So this game cannot be fully analyzed without making some game-theoretic assumptions.

Jeopardy! was first aired in the United States in 1964, and continued until 1975. A brief season returned between 1978 and 1979, and then the modern era began in 1984 and continues to this day. The format changes have been relatively small, particularly during the modern era. The data used by Metrick (1995) come from shows broadcast between October 1989 and January 1992, and reflect more than 1,150 decisions.

Metrick (1995) examines behavior in Final Jeopardy in two stages.11 The first stage considers the subset of shows in which one contestant is so far ahead in cash that the bet only reveals risk attitudes and beliefs about own skill. In such "runaway games" there exist wagers that will ensure victory, although there might have been some rationale prior to September 2003 for someone to bet an amount that could lead to a loss. Until then, the champion had to retire after five wins, so if one had enough confidence in one's skill at answering such questions, one might rationally bet more than was needed to ensure victory. After September 2003 the rules changed, so the champion stays on until defeated.

In the runaway games Metrick (1995, p. 244) uses the same formula that Gertner (1993) used for CARA utility functions. The only major difference is that the probability of winning in Jeopardy!
is not known objectively to the observer.12 His solution is to substitute the observed fraction of correct answers, akin to a rational expectations assumption, and then solve for the CARA parameter a that accounts for the observed bets. The result is an estimate of a equal to 0.000066, with a standard error of 0.000056. Thus, there is slight evidence of risk aversion, but it is not statistically significant, leading Metrick (1995, p. 245) to conclude that these contestants behaved in a risk-neutral manner.

The second stage of the analysis considers subsamples in which two players have accumulated scores that are sufficiently close that they have to take beliefs about each other into account, but where there is a distant third contestant who can be effectively ignored. Metrick (1995) cuts this Gordian knot of strategic considerations by assuming that contestants view themselves as betting against contestants whose behavior can be characterized by their observed empirical frequencies. He does not use these data to make inferences about risk attitudes.

1.3. Lingo

The underlying game in Lingo involves a team of two people guessing a hidden five-letter word. Fig. 3 illustrates one such game from the U.S. version. The team is told the first letter of the word, and can then just state words. If incorrect, the words that are tried are used to reveal letters in the correct word, if there are any.

Fig. 3. The Word Puzzle in Lingo.

To take the example in Fig. 3, the true word was STALL. So the initial S was shown. The team suggested SAINT and was informed (by light grey coloring) that A and T are present in the correct word. The team is not told the order of the letters A and T in the correct word. The team then suggested STAKE, and was informed that the T and A were in the right place (by grey coloring) and that no other letters were in the correct word. The team then tried STAIR, SEATS, and finally STALL. Most teams are able to guess the correct word within five rounds.

The game occurs in two stages. In the first stage, one team of two plays against another team for several of these Lingo word-guessing games. The couple with the most money then goes on to the second stage, which is the one of interest for measuring risk attitudes because it is non-interactive. So the winning couple comes into the main task with a certain earned endowment (which could be augmented by an unrelated game called "jackpot"). The team also comes in with some knowledge of its own ability to solve these word-guessing puzzles. In the Dutch data used by Beetsma and Schotman (2001), spanning 979 games, the frequency distribution of the number of rounds (1–5) needed to solve the puzzle in the final stage was 0.14, 0.32, 0.23, 0.13, and 0.081, respectively, with the word going unsolved with frequency 0.089.

Every round that the couple requires to guess the word means that they have to pick one ball from an urn affecting their payoffs, as described below. If they do not solve the word puzzle, they have to pick six balls. These balls determine if the team goes "bust" or "survives" something called the Lingo Board in that round. An example of the Lingo Board is shown in Fig. 4, from Beetsma and Schotman (2001, Fig. 3).13 There are 35 balls in the urn numbered from 1 to 35, plus one "golden ball." If the golden ball is picked then the team wins the cash prize for that round and gets a free pass to the next round. If one of the numbered balls is picked, then the fate of the team depends on the current state of the Lingo Board.
The team goes "bust" if they get a row, column, or diagonal of X's, akin to the parlor game noughts and crosses. So solving the word puzzle in fewer moves is good, since it means that fewer balls have to be drawn from the urn, and hence that the survival probability is higher. In the example in Fig. 4, drawing a 5 would be fatal, drawing an 11 would not be, and drawing a 1 would not be if a 2 or 8 had not been previously drawn.

Fig. 4. Example of a Lingo Board.

If the team survives a round it gets a cash prize, and is asked if it wants to keep going or stop. This lasts for five rounds. So apart from the skill part of the game, guessing the words, this is the only choice the team makes. This is therefore a "stop-go" problem, in which the team balances current earnings against the lottery of continuing and either earning more cash or going bust. If the team chooses to continue, the stake doubles; if the golden ball had been drawn it is replaced in the urn. If the team goes bust it takes home nothing. Teams can play the game up to three times, then retire from the show.

Risk attitudes are involved when the team has to balance the current earnings with the lottery of continuing. That lottery depends on subjective beliefs about the skill level of the team, the state of the Lingo Board at that point, and the perception of the probabilities of drawing a "fatal" number or the golden ball. In many respects, apart from the skill factor and the relative symmetry of prizes, this game is remarkably like DOND, as we see later.

Beetsma and Schotman (2001) evaluate data from 979 finals. Each final lasts several rounds, so the sample of binary stop/continue decisions is larger, and constitutes a panel. Average earnings in this final round in their sample are 4,106 Dutch guilders (ƒ), with potential earnings, given the initial stakes brought into the final, of around ƒ15,136. The average exchange rate in 1997, which is around when these data were generated, was $0.514 per guilder, so these stakes are around $2,110 on average, and up to roughly $7,780. These are not life-changing prizes, like the top prizes in DOND, but are clearly substantial in relation to most lab experiments.

Beetsma and Schotman (2001, Section 4) show that the stop/continue decisions have a simple monotonic structure if one assumes CRRA or CARA utility. Since the odds of surviving never get better with more rounds, if it is optimal to stop in one round then it will always be optimal to stop in any later round. This property does not necessarily hold for other utility functions. But for these utility functions, which are still an important class, one can calculate a threshold survival probability p_i^* for any round i such that the team should stop if the actual survival probability falls below it. This threshold probability does depend on the utility function and its parameter values, but in a closed-form fashion that can be easily evaluated within a maximum-likelihood routine.14

Each team can play the game three times before it has to retire as a champion. The specification of the problem clearly recognizes the option value in the first game of coming back to play the game a second or third time, and then the option value in the second game of coming back to play a third time. The certainty-equivalent of these option values depends, of course, on the risk attitudes of the team.
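The threshold structure can be illustrated with a deliberately stripped-down round: ignore the golden ball, the prizes already banked, and the option value of replaying the game, and suppose continuing simply doubles the current stake m with survival probability p and forfeits everything otherwise. Under CRRA utility the stop/continue threshold is then independent of the stake, which conveys the flavor of the closed-form property noted above. The sketch below is our simplification, not Beetsma and Schotman's specification (theirs incorporates baseline wealth and the option values just discussed).

```python
def crra_stop_threshold(r: float) -> float:
    """Threshold survival probability p* under u(m) = m**(1 - r) / (1 - r)
    with r < 1: continue iff p * u(2m) >= u(m), i.e. iff
    p >= u(m) / u(2m) = 2**(r - 1), independent of the stake m."""
    if not r < 1:
        raise ValueError("this simple derivation assumes r < 1")
    return 2 ** (r - 1)

for r in (0.0, 0.42, 0.9):
    print(f"r = {r:4}  p* = {crra_stop_threshold(r):.3f}")
# r = 0.0  -> p* = 0.500 (risk neutral: continue at better-than-even odds)
# r = 0.42 -> p* = 0.669 (the no-baseline-wealth point estimate below)
# r = 0.9  -> p* = 0.933 (more concavity raises the threshold)
```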
The estimation procedure, however, "black boxes" these option values to collapse the estimation problem down to a static one: they are free parameters to be estimated along with the parameter of the utility function. Thus, they are not constrained by the expected returns and risk of future games, the functional form of utility, and the specific parameter values being evaluated in the maximum-likelihood routine. Beetsma and Schotman (2001, p. 839) do check that the option value in the first game exceeds the option value in the second game, but (a) they only examine point estimates, and make no claim that this difference is statistically significant,15 and (b) there is no check that the absolute values of these option values are consistent with the utility function and parameter values. In addition, there is no mention of any corrections for the fact that each team makes several decisions, and that errors for that team are likely correlated.

With these qualifications, the estimate of the CRRA parameter is 0.42, with a standard error of 0.05, if one assumes that utility is defined only over the monetary prizes. It rises to 6.99, with a standard error of 0.72, if one assumes a baseline wealth level of ƒ50,000, which is the preferred estimate. Each of these estimates is significantly different from 0, implying rejection of risk neutrality in favor of risk aversion. The CARA specification generates comparable estimates.

One extension is to allow for probability weighting on the actual survival probability p_i in round i. The weighting occurs in the manner of original Prospect Theory, due to Kahneman and Tversky (1979), and not in the rank-dependent manner of Quiggin (1982, 1993) and Cumulative Prospect Theory. One apparent inconsistency is that the actual survival probabilities are assumed to be weighted subjectively, but the threshold survival probabilities p_i^* are not (see their Eq. (18), p. 843). The results show that estimates of the degree of concavity of the utility function increase substantially, and that contestants systematically overweight the actual survival probability. We return to some of the issues of structural estimation of models assuming decision weights, in a rank-dependent manner, in the discussion of DOND and Andersen, Harrison, Lau, and Rutström (2006a, 2006b).

2. DEAL OR NO DEAL

2.1. The Game Show as a Natural Experiment

The basic version of DOND is the same across all countries. We explain the general rules by focusing on the version shown in the United States, and then consider variants found in other countries. The show confronts the contestant with a sequential series of choices over lotteries, and poses a simple binary decision: whether to play the (implicit) lottery or take some deterministic cash offer.

A contestant is picked from the studio audience. They are told that a known list of monetary prizes, ranging from $0.01 up to $1,000,000, has been placed in 26 suitcases.16 Each suitcase is carried onstage by an attractive female model, and has a number from 1 to 26 associated with it. The contestant is informed that the money has been put in the suitcases by an independent third party, and in fact it is common that any unopened cases at the end of play are opened so that the audience can see that all prizes were in play. Fig. 5 shows how the prizes are displayed to the subject at the beginning of the game. The contestant starts by picking one suitcase that will be "his" case.
In round 1, the contestant must pick 6 of the remaining 25 cases to be opened, so that their prizes can be displayed. Fig. 6 shows how the display changes after the contestant picks the first case: here the contestant unfortunately picked the case containing the $300,000 prize. A good round for a contestant occurs if the opened prizes are low, since the odds then increase that his own case holds one of the higher prizes.

Fig. 5. Opening Display of Prizes in TV Game Show Deal or No Deal.

Fig. 6. Prizes Available After One Case Has Been Opened.

At the end of each round the host is phoned by a "banker" who makes a deterministic cash offer to the contestant. In one of the first American shows (12/21/2005) the host made a point of saying clearly that "I don't know what's in the suitcases, the banker doesn't, and the models don't." The initial offer in early rounds is typically low in comparison to expected offers in later rounds. We use an empirical offer function later, but the qualitative trend is quite clear: the bank offer starts out at roughly 10% of the expected value of the unopened cases, and increments by about 10% of that expected value in each round. This trend is significant, and serves to keep all but extremely risk-averse contestants in the game for several rounds. For this reason, it is clear that the case that the contestant "owns" has an option value in future rounds.

In round 2, the contestant must pick five cases to open, and then there is another bank offer to consider. In succeeding rounds, 3–10, the contestant must open 4, 3, 2, 1, 1, 1, 1, and 1 cases, respectively. At the end of round 9, there are only two unopened cases, one of which is the contestant's case. In round 9 the decision is a relatively simple one from an analyst's perspective: either take the non-stochastic cash offer or take the lottery with a 50% chance of either of the two remaining unopened prizes. We could assume some latent utility function, and estimate parameters for that function that best explain observed binary choices. Unfortunately, relatively few contestants get to this stage, having accepted offers in earlier rounds. In our data, only 9% of contestants reach that point. More serious than the smaller sample size, one naturally expects that risk attitudes would affect who survives to this round. Thus, there would be a serious sample attrition bias if one just studied choices in later rounds.

The bank offer gets richer and richer over time, ceteris paribus the random realizations of opened cases. In other words, if each unopened case truly has the same subjective probability of holding any remaining prize, there is a positive expected return to staying in the game for more and more rounds. A risk-averse subject who might be just willing to accept the bank offer, if the offer were not expected to get better and better, would choose to continue to another round, since the expected improvement in the bank offer provides some compensation for the additional risk of going into another round. Thus, to evaluate the parameters of some latent utility function given observed choices in earlier rounds, we have to mentally play out all possible future paths that the contestant faces.17 Specifically, we have to play out those paths assuming the values for the parameters of the likelihood function, since they affect when the contestant will decide to "deal" with the banker, and hence the expected utility of the compound lottery.
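Before turning to numerical methods, the offer dynamics just described can be made concrete. The sketch below simulates one stylized game using the standard U.S. prize board and the round-by-round opening counts given above; the offer rule, 10% of the expected value of the unopened cases per round elapsed, is a caricature for illustration only, since the formal analysis uses an empirical offer function.

```python
import random

US_PRIZES = [0.01, 1, 5, 10, 25, 50, 75, 100, 200, 300, 400, 500, 750,
             1_000, 5_000, 10_000, 25_000, 50_000, 75_000, 100_000,
             200_000, 300_000, 400_000, 500_000, 750_000, 1_000_000]

CASES_TO_OPEN = [6, 5, 4, 3, 2, 1, 1, 1, 1]   # rounds 1 through 9

def expected_value(unopened: list[float]) -> float:
    """Each unopened case is equally likely to hold each remaining prize."""
    return sum(unopened) / len(unopened)

rng = random.Random(42)
cases = US_PRIZES.copy()
rng.shuffle(cases)
own, others = cases[0], cases[1:]             # the contestant keeps one case
for round_number, k in enumerate(CASES_TO_OPEN, start=1):
    del others[:k]                            # open k cases (pre-shuffled)
    unopened = others + [own]
    offer = 0.10 * round_number * expected_value(unopened)  # stylized ramp
    print(f"round {round_number}: {len(unopened)} unopened, "
          f"EV = {expected_value(unopened):,.0f}, offer = {offer:,.0f}")
```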
Playing out these paths corresponds to procedures developed in the finance literature to price path-dependent derivative securities using Monte Carlo simulation (e.g., Campbell, Lo, & MacKinlay, 1997, Section 9.4). We discuss general numerical methods for this type of analysis later.

Saying "no deal" in early rounds provides one with the option of being offered a better deal in the future, ceteris paribus the expected value of the unopened prizes in future rounds. Since the process of opening cases is a martingale process, even though the contestant gets to pick the cases to be opened, the expected value of the unopened cases in any future round equals the current expected value. This implies, given the exogenous bank offers (as a function of expected value), that the dollar value of the offer will get richer and richer as time progresses. Thus, bank offers themselves will be a submartingale process.

In the U.S. version the contestants are joined after the first round by several family members or friends, who offer suggestions and generally add to the entertainment value. But the contestant makes the decisions. For example, in the very first show a lady was offered $138,000, and her hyperactive husband repeatedly screamed out "no deal!" She calmly responded, "At home, you do make the decisions. But … we're not at home!" She turned the deal down, as it happens, and went on to take an offer of only $25,000 two rounds later.

Our sample consists of 141 contestants recorded between December 19, 2005 and May 6, 2007. This sample includes 6 contestants that participated in special versions, for ratings purposes, in which the top prize was increased from $1 million to $2 million, $3 million, $4 million, $5 million or $6 million.18 The biggest winner on the show so far has been Michelle Falco, who was lucky enough to be on the September 22, 2006 show with a top prize of $6 million. Her penultimate offer was $502,000 when the 3 unopened prizes were $10, $750,000 and $1 million, which have an expected value of $583,337. She declined the offer, and opened the $10 case, resulting in an offer of $808,000 when the expected value of the two remaining prizes was $875,000. She declined the offer, and ended up with $750,000 in her case.

In other countries there are several variations. In some cases there are fewer prizes, and fewer rounds. In the United Kingdom there are only 22 monetary prizes, ranging from 1p up to £250,000, and only 7 rounds. In round 1 the contestant must pick 5 boxes, and then in each round until round 6 the contestant has to open 3 boxes per round. So there can be a considerable swing from round to round in the expected value of unopened boxes, compared to the last few rounds of the U.S. version. At the end of round 6 there are only 2 unopened boxes, one of which is the contestant's box. Some versions substitute the option of switching the contestant's box for an unopened box, instead of a bank offer. This is particularly common in the French and Italian versions, and relatively rare in other versions. Things become much more complex in those versions in which the bank offer in any round is statistically informative about the prize in the contestant's case. In that case the contestant has to make some correction for this possibility, and also consider the strategic behavior of the banker's offer. Bombardini and Trebbi (2005) offer clear evidence that this occurs in the Italian version of the show, but there is no evidence that it occurs in the U.K. version.
The Australian version offers several additional options at the end of the normal game, called Chance, SuperCase, and Double Or Nothing. In many cases they are used as "entertainment filler," for games that otherwise would finish before the allotted 30 minutes. It has been argued, most notably by Mulino, Scheelings, Brooks, and Faff (2006), that these options should rationally change behavior in earlier rounds, since they provide some uncertain "insurance" against saying "deal" earlier rather than later.

2.2. Comparable Laboratory Experiments

We also implemented laboratory versions of the DOND game, to complement the natural experimental data from the game shows.19 The instructions were provided by hand and read out to subjects, to ensure that every subject took some time to digest them. As far as possible, they rely on screen shots of the software interface that the subjects were to use to enter their choices. The opening page for the common practice session in the lab, shown in Fig. 7, provides the subject with basic information about the task before them, such as how many boxes there were and how many boxes needed to be opened in any round.20 In the default setup the subject was given the same frame as in the Australian and U.S. game shows: this version has more prizes (26 instead of 22) and more rounds (9 instead of 6) than the U.K. version.

Fig. 7. Opening Screen Shot for Laboratory Experiment.

After clicking on the "Begin" box, the lab subject was given the main interface, shown in Fig. 8. This provided the basic information for the DOND task. The presentation of prizes was patterned after the displays used on the actual game shows. The prizes are shown in the same nominal denomination as the Australian daytime game show, and the subject was told that an exchange rate of 1,000:1 would be used to convert earnings in the DOND task into cash payments at the end of the session. Thus, the top cash prize the subject could earn was $200 in this version.

Fig. 8. Prize Distribution and Display for Laboratory Experiment.

The subject was asked to click on a box to select "his box," and then round 1 began. In the instructions we illustrated a subject picking box #26, and then six boxes, so that at the end of round 1 he was presented with a deal from the banker, shown in Fig. 9. The prizes that had been opened in round 1 were "shaded" on the display, just as they are in the game show display. The subject is then asked to accept $4,000 or continue. When the game ends the DOND task earnings are converted to cash using the exchange rate, and the experimenter is prompted to come over and record those earnings. Each subject played at their own pace after the instructions were read aloud.

Fig. 9. Typical Bank Offer in Laboratory Experiment.

One important feature of the experimental instructions was to explain how bank offers would be made. The instructions explained the concept of the expected value of unopened prizes, using several worked numerical examples in simple cases. Then subjects were told that the bank offer would be a fraction of that expected value, with the fractions increasing over the rounds as displayed in Fig. 10 (which plots the bank offer as a fraction of the expected value of the unopened cases against the round number). This display was generated from Australian game show data available at the time. We literally used the parameters defining the function shown in Fig. 10 when calculating offers in the experiment, rounding to the nearest dollar.

Fig. 10. Information on Bank Offers in Laboratory Experiment.
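The lab offer rule is then straightforward to implement. In the sketch below the round fractions are hypothetical placeholders standing in for the estimated Australian path plotted in Fig. 10, and the 1,000:1 exchange rate is the one quoted above.

```python
# Hypothetical round fractions rising towards 1, standing in for the
# estimated Australian offer path plotted in Fig. 10.
OFFER_FRACTION = {1: 0.10, 2: 0.20, 3: 0.30, 4: 0.40, 5: 0.50,
                  6: 0.60, 7: 0.70, 8: 0.80, 9: 0.90}

EXCHANGE_RATE = 1_000   # task dollars per cash dollar paid out

def lab_bank_offer(unopened_prizes: list[float], round_number: int) -> int:
    """Offer = round fraction x expected value, rounded to the nearest dollar."""
    ev = sum(unopened_prizes) / len(unopened_prizes)
    return round(OFFER_FRACTION[round_number] * ev)

def cash_payment(task_earnings: float) -> float:
    """Convert task earnings to cash: the $200,000 top prize pays $200."""
    return task_earnings / EXCHANGE_RATE
```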
The subjects for our laboratory experiments were recruited from the general student population of the University of Central Florida in 2006.21 We have information on 676 choices made by 89 subjects. We estimate the same models for the lab data as for the U.S. game show data. We are not particularly interested in obtaining the same quantitative estimates per se, since the samples, stakes, and context differ in obvious ways. Instead our interest is in whether we obtain the same qualitative results: is the lab reliable in terms of the qualitative inferences one draws from it? Our null hypothesis is that the lab results are the same as the naturally occurring results. If we reject this hypothesis one could infer that we have just not run the right lab experiments in some respect, and we have some sympathy for that view. On the other hand, we have implemented our lab experiments in exactly the manner that we would normally do as lab experimenters. So we are definitely able to draw conclusions in this domain about the reliability of conventional lab tests compared to comparable tests using naturally occurring data. These conclusions would then speak to the questions raised by Harrison and List (2004) and Levitt and List (2007) about the reliability of lab experiments.

2.3. Other Analyses of Deal or No Deal

A large literature on DOND has evolved quickly.22 Appendix B in the working paper version documents in detail the modeling strategies adopted in the DOND literature, and their similarities and differences to the approach we propose.23 In general, three types of empirical strategies have been employed to model observed DOND behavior.

The first empirical strategy is the calculation of CRRA bounds at which a given subject is indifferent between one choice and another. These bounds can be calculated for each subject and each choice, so they have the advantage of not assuming that each subject has the same risk preferences, just that they use the same functional form. The studies differ in terms of how they use these bounds, as discussed briefly below. The use of bounds such as these is familiar from the laboratory experimental literature on risk aversion: see Holt and Laury (2002), Harrison, Johnson, McInnes, and Rutström (2005), and Harrison, Lau, Rutström, and Sullivan (2005) for discussion of how one can then use interval regression methods to analyze them. The limitation of this approach, discussed in Harrison and Rutström (2008, Section 2.1), is that it is difficult to go beyond the CRRA or other one-parameter families, and in particular to examine other components of choice under uncertainty (such as more flexible utility functions, probability weighting or loss aversion).24 Post, van den Assem, Baltussen, and Thaler (2006) use CRRA bounds in their analysis, and the approach has been employed in various forms by others, as noted below.

The second empirical strategy is the examination of specific choices that provide "trip wire" tests of certain propositions of EUT, or provide qualitative indicators of preferences. For example, decisions made in the very last rounds often confront the contestant with the expected value of the unopened prizes, and allow one to identify those who are risk loving or risk averse directly.
The limitation of this approach is that these choices are subject to sample selection bias, since risk attitudes and other preferences presumably played some role in whether the contestant reached these critical junctures. Moreover, they provide limited information at best, and do not allow one to define a metric for errors. If we posit some stochastic error specification for choices, as is now common, then one has no way of knowing if these specific choices are the result of such errors or a manifestation of latent preferences. Blavatskyy and Pogrebna (2006) illustrate the sustained use of this type of empirical strategy, which is also used by other studies in some respects.

The third empirical strategy is to propose a latent decision process and estimate the structural parameters of that process using maximum likelihood. This is the approach we favor, since it allows one to examine structural issues rather than rely on ad hoc proxies for underlying preferences. Harrison and Rutström (2008, Section 2.2) discuss the general methodological advantages of this approach.

3. A GENERAL ESTIMATION STRATEGY

The DOND game is a dynamic stochastic task in which the contestant has to make choices in one round that generally entail consideration of future consequences. The same is true of the other game shows used for estimation of risk attitudes. In Card Sharks the level of bets in one round generally affects the scale of bets available in future rounds, including bankruptcy, so for plausible preference structures one should take this effect into account when deciding on current bets. Indeed, as explained earlier, one of the empirical strategies employed by Gertner (1993) can be viewed as a precursor to our general method. In Lingo the stop/continue structure, where a certain amount of money is being compared to a virtual money lottery, is evident. We propose a general estimation strategy for such environments, and apply it to DOND. The strategy uses randomization to break the general "curse of dimensionality" that is evident if one considers this general class of dynamic programming problems (Rust, 1997).

3.1. Basic Intuition

The basic logic of our approach can be explained from the data and simulations shown in Table 1. We restrict attention here to the first 75 contestants that participated in the standard version of the television game with a top prize of $1 million, to facilitate comparison of dollar amounts. There are nine rounds in which the banker makes an offer, and in round 10 the contestant simply opens his case. Only 7 contestants, or 9% of the sample of 75, continued to round 10, with most accepting the banker's offer in rounds 6, 7, 8, and 9. The average offer is shown in column 4. We stress that this offer is stochastic from the perspective of the sample as a whole, even if it is non-stochastic to the specific contestant in that round. Thus, to see the logic of our approach from the perspective of the individual decision-maker, think of the offer as a non-stochastic number, using the average values shown as a proximate indicator of the value of that number in a particular instance.

In round 1 the contestant might consider up to nine VLs. He might look ahead one round and contemplate the outcomes he would get if he turned down the offer in round 1 and accepted the offer in round 2. This VL, realized in virtual round 2 in the contestant's thought experiment, would generate an average payoff of $31,141 with a standard deviation of $23,655. The top panel of Fig. 11
shows the simulated distribution of this particular lottery. The payoff distributions of these VLs are highly skewed, so the standard deviation may be slightly misleading if one thinks of these as Gaussian distributions. However, we use the standard deviation only as a pedagogic indicator of the uncertainty of the payoff in the VL: in our formal analysis we consider the complete distribution of the VL in a non-parametric manner.

Table 1. Virtual Lotteries for US Deal or No Deal Game Show.

Panel A: decision rounds, with VLs realized in rounds 2-6

Round  Active      Deal!  Avg. Offer  Round 2            Round 3            Round 4            Round 5             Round 6
1      75 (100%)   0      $16,180     $31,141 ($23,655)  $53,757 ($45,996)  $73,043 ($66,387)  $97,275 ($107,877)  $104,793 ($102,246)
2      75 (100%)   0      $33,453                        $53,535 ($46,177)  $72,588 ($66,399)  $96,887 ($108,086)  $104,369 ($102,222)
3      75 (100%)   0      $54,376                                           $73,274 ($65,697)  $97,683 ($107,302)  $105,117 ($101,271)
4      75 (100%)   1      $75,841                                                              $99,895 ($108,629)  $107,290 ($101,954)
5      74 (99%)    5      $103,188                                                                                 $111,964 ($106,137)
6      69 (92%)    16     $112,818
7      53 (71%)    20     $119,746
8      33 (44%)    16     $107,779
9      17 (23%)    10     $79,363
10     7 (9%)

Panel B: VLs realized in rounds 7-10

Round  Round 7              Round 8              Round 9              Round 10
1      $120,176 ($121,655)  $131,165 ($154,443)  $136,325 ($176,425)  $136,281 ($258,856)
2      $119,890 ($121,492)  $130,408 ($133,239)  $135,877 ($175,278)  $135,721 ($257,049)
3      $120,767 ($120,430)  $131,563 ($153,058)  $136,867 ($173,810)  $136,636 ($255,660)
4      $123,050 ($120,900)  $134,307 ($154,091)  $139,511 ($174,702)  $139,504 ($257,219)
5      $128,613 ($126,097)  $140,275 ($160,553)  $145,710 ($180,783)  $145,757 ($266,303)
6      $128,266 ($124,945)  $139,774 ($159,324)  $145,348 ($180,593)  $145,301 ($266,781)
7                           $136,720 ($154,973)  $142,020 ($170,118)  $142,323 ($246,044)
8                                                $116,249 ($157,005)  $116,020 ($223,979)
9                                                                     $53,929 ($113,721)

Note: Data drawn from observations of contestants on the U.S. game show, plus the authors' simulations of virtual lotteries, as explained in the text. Cells show the mean of the VL with its standard deviation in parentheses; a blank cell indicates that the VL is not defined from that round's perspective.

Fig. 11. Two Virtual Lottery Distributions in Round 1. (Top panel: the VL if "No Deal" in round 1 and then "Deal" in round 2; bottom panel: the VL if "No Deal" in rounds 1 and 2 and then "Deal" in round 3. The horizontal axes show the prize value, the vertical axes the density.)

In round 1 the contestant can also consider what would happen if he turned down the offers in rounds 1 and 2, and accepted the offer in round 3. This VL would generate, from the perspective of round 1, an average payoff of $53,757 with a standard deviation of $45,996. The bottom panel of Fig. 11 shows the simulated distribution of this particular VL. Compared to the VL in which the contestant said "No Deal" in round 1 and "Deal" in round 2, shown above it in Fig. 11, it gives less weight to the smallest prizes and greater weight to higher prizes. Similarly for each of the other VLs shown. The VL for the final round 10 is simply the implied lottery over the final two unopened cases, since in this round the contestant would have said "No Deal" to all bank offers.

The forward-looking contestant in round 1 is assumed to behave as if he maximizes the expected utility of accepting the current offer or continuing. The expected utility of continuing, in turn, is given by simply evaluating each of the nine VLs shown in the first row of Table 1. The average payoff increases steadily, but so does the standard deviation of payoffs, so this evaluation requires knowledge of the utility function of the contestant.
Given that utility function, the contestant is assumed to behave as if they evaluate the expected utility of each of the nine VLs. Thus, we calculate nine expected utility numbers, conditional on the specification of the parameters of the assumed utility function and the VLs that each subject faces in their round 1 choices. In round 1, the subject then simply compares the maximum of these nine expected utility numbers to the utility of the non-stochastic offer in round 1. If that maximum exceeds the utility of the offer, he turns down the offer; otherwise he accepts it. In round 2, a similar process occurs.

One critical feature of our VL simulations is that they are conditioned on the actual outcomes that each contestant has faced in prior rounds. Thus, if a (real) contestant has tragically opened up the six top prizes in round 1, that contestant would not see VLs such as the ones in Table 1 for round 2. They would be conditioned on that player's history in round 1. We report here averages over all players and all simulations. We undertake 100,000 simulations for each player in each round, so as to condition on their history.25

This example can also be used to illustrate how our maximum-likelihood estimation procedure works. Assume some specific utility function and some parameter values for that utility function, with all prizes scaled by the maximum possible at the outset of the game. The utility of the non-stochastic bank offer in round R is then directly evaluated. Similarly, the VLs in each round R can then be evaluated.26 They are represented numerically as 100-point discrete approximations, with 100 prizes and 100 probabilities associated with those prizes. Thus, by implicitly picking a VL over an offer, it is as if the subject is taking a draw from this 100-point distribution of prizes. In fact, they are playing out the DOND game, but this representation as a VL draw is formally identical. The evaluation of these VLs generates v(R) expected utilities, where v(1) = 9, v(2) = 8, …, v(9) = 1, as shown in Table 1. The maximum expected utility of these v(R) in a given round R is then compared to the utility of the offer, and the likelihood evaluated in the usual manner. We present a formal statement of the latent EUT process leading to a likelihood defined over parameters and the observed choices, and then discuss how this intuition changes when we assume alternative, non-EUT processes.

3.2. Formal Specification

We assume that utility is defined over money m using the popular CRRA function

  u(m) = m^{1-r} / (1-r)   (1)

where r is the utility function parameter to be estimated. In this case r ≠ 1 is the RRA coefficient, and u(m) = ln(m) for r = 1. With this parameterization r = 0 denotes risk-neutral behavior, r > 0 denotes risk aversion, and r < 0 denotes risk loving. We review one extension to this simple CRRA model later, but for immediate purposes it is desirable to have a simple specification of the utility function in order to focus on the estimation methodology.27

Probabilities for each outcome k, p_k, are those that are induced by the task, so expected utility is simply the probability-weighted utility of each outcome in each lottery. There were 100 outcomes in each VL i, so

  EU_i = \sum_{k=1}^{100} p_k u_k   (2)

Of course, we can view the bank offer as being a degenerate lottery. A simple stochastic specification was used to specify likelihoods conditional on the model.
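Eqs. (1) and (2) translate directly into code. The sketch below is a minimal rendition with our own function names; prizes are assumed to have already been scaled into (0, 1] by the maximum prize, as described above.

```python
from math import log

def crra_utility(m: float, r: float) -> float:
    """Eq. (1): u(m) = m**(1 - r) / (1 - r), with u(m) = ln(m) at r = 1."""
    return log(m) if r == 1 else m ** (1 - r) / (1 - r)

def expected_utility(prizes: list[float], probs: list[float], r: float) -> float:
    """Eq. (2): probability-weighted utility over the 100-point VL support."""
    return sum(p * crra_utility(m, r) for m, p in zip(prizes, probs))
```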
The EU for each lottery pair was calculated for a candidate estimate of the utility function parameters, and the index

  ∇EU = (EU_BO - EU_L) / μ   (3)

is calculated, where EU_L is the expected utility of the lottery in the task, EU_BO the expected utility of the degenerate lottery given by the bank offer, and μ a Fechner noise parameter following Hey and Orme (1994).28 The index ∇EU is then used to define the cumulative probability of the observed choice to "Deal" using the cumulative standard normal distribution function:

  G(∇EU) = Φ(∇EU)   (4)

This provides a simple stochastic link between the latent economic model and observed choices.29 The likelihood, conditional on the EUT model being true and the use of the CRRA utility function, depends on the estimates of r and μ given the above specification and the observed choices. The conditional log-likelihood is

  ln L^{EUT}(r, μ; y) = \sum_i [ (ln G(∇EU) | y_i = 1) + (ln(1 - G(∇EU)) | y_i = 0) ]   (5)

where y_i = 1(0) denotes the choice of "Deal" ("No Deal") in task i.

We extend this standard formulation to include forward-looking behavior by redefining the lottery that the contestant faces. One such VL reflects the possible outcomes if the subject always says "No Deal" until the end of the game and receives the prize in his case. We call this a virtual lottery since it need not happen; it does happen in some fraction of cases, and it could happen for any subject. Similarly, we can substitute other VLs reflecting other possible choices by the contestant.

Just before deciding whether to accept the bank offer in round 1, what if the contestant behaves as if the following simulation were repeated G times: play out the remaining eight rounds, picking cases at random, until all but two cases are unopened. Since this is the last round in which one would receive a bank offer, calculate the expected value of the remaining two cases. Then multiply that expected value by the fraction that the bank is expected to use in round 9 to calculate the offer. Pick that fraction from a prior over the average offer fraction, recognizing that the offer fraction is stochastic.

The end result of this simulation is a sequence of G virtual bank offers in round 9, viewed from the perspective of round 1. This sequence then defines the VL to be used for a contestant in round 1 whose horizon is the last round in which the bank will make an offer. Each of the G bank offers in this virtual simulation occurs with probability 1/G, by construction. To keep things numerically manageable, we can then take a 100-point discrete approximation of this lottery, which will typically consist of G distinct real values, where one would like G to be relatively large (we use G = 100,000). This simulation is conditional on the six cases that the subject has already selected at the end of round 1. Thus, the lottery reflects the historical fact of the six specific cases that this contestant has already opened.

The same process can be repeated for a VL that only involves looking forward to the expected offer in round 8. And for VLs that only involve looking forward to rounds 7, 6, 5, 4, 3, and 2, respectively. Table 1 illustrates the outcome of such calculations. The contestant can be viewed as having a set of nine VLs to compare, each of which entails saying "No Deal" in round 1. The different VLs imply different choices in future rounds, but the same response in round 1. To decide whether to accept the deal in round 1, we assume that the subject simply compares the maximum EU over these nine VLs with the utility of the deterministic offer in round 1.
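This thought experiment translates almost line for line into a simulation routine. The sketch below is a minimal rendition under an assumed discrete prior over offer fractions (the argument `offer_fractions` is a hypothetical stand-in for the empirical offer function), using equiprobable quantiles for the 100-point approximation.

```python
import random

def virtual_lottery(unopened: list[float], opens_per_round: list[int],
                    offer_fractions: list[float], g: int = 100_000,
                    points: int = 100, seed: int = 1) -> list[float]:
    """Simulate G virtual bank offers at a chosen horizon round, conditional
    on this contestant's history (prizes already revealed are simply absent
    from `unopened`; his own case is one of the remainder and is never
    opened), then collapse the draws into `points` equiprobable quantiles,
    each carrying probability 1/points."""
    rng = random.Random(seed)
    offers = []
    for _ in range(g):
        pool = unopened.copy()
        rng.shuffle(pool)
        for k in opens_per_round:      # cases opened in each virtual round
            del pool[:k]
        ev = sum(pool) / len(pool)     # expected value of what remains
        offers.append(ev * rng.choice(offer_fractions))  # stochastic fraction
    offers.sort()
    step = g // points
    return [offers[i * step + step // 2] for i in range(points)]
```

For a contestant at the end of round 1 whose horizon is the round 9 offer, `opens_per_round` would be [5, 4, 3, 2, 1, 1, 1, 1] and `unopened` would hold the 20 prizes not yet revealed, leaving the two cases of the final round.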
To calculate the EU of each VL and the utility of the offer one needs to know the parameters of the utility function, but these are just nine EU evaluations and one utility evaluation. These evaluations can be undertaken within a likelihood function evaluator, given candidate values of the parameters of the utility function. The same process can be repeated in round 2, generating another set of eight VLs to be compared to the actual bank offer in round 2. This simulation would not involve opening as many cases, but the logic is the same. Similarly for rounds 3–9. Thus, for each of rounds 1–9, we can compare the utility of the actual bank offer with the maximum EU of the VLs for that round, which in turn reflects the EU of receiving a bank offer in future rounds in the underlying game. In addition, there exists a VL in which the subject says ‘‘No Deal’’ in every round. This is the VL that we view as being realized in round 10 in Table 1.

There are several significant advantages of this VL approach. First, since the round associated with the highest expected utility is not the same for all contestants, due to heterogeneity in risk attitudes, it is of interest to estimate the length of this horizon. Because a contestant with a short horizon behaves in essentially the same manner as a contestant with a longer horizon, merely substituting different VLs into the latent EUT calculus, it is easy to test hypotheses about restrictions on the horizon generated by more myopic behavior. Second, one can specify mixture models of different horizons, and let the data determine what fraction of the sample employs which horizon. Third, the approach generalizes to any known offer function, not just the ones assumed here and in Table 1. Thus, it is not as specific to the DOND task as it might initially appear. This is important if one views DOND as a canonical task for examining fundamental methodological aspects of dynamic choice behavior. Those methods should not exploit the specific structure of DOND, unless there is no loss in generality. In fact, other versions of DOND can be used to illustrate the flexibility of this approach, since they sometimes employ ‘‘follow on’’ games that can simply be folded into the VL simulation. Finally, and not least, this approach imposes virtually no numerical burden on the maximum-likelihood optimization part of the numerical estimation stage: all that the likelihood function evaluator sees in a given round is a non-stochastic bank offer, a handful of (virtual) lotteries to compare it to given certain proposed parameter values for the latent choice model, and the actual decision of the contestant to accept the offer or not. This parsimony makes it easy to examine non-CRRA and non-EUT specifications of the latent dynamic choice process, as illustrated in Andersen et al. (2006a, 2006b).

All estimates allow for the possibility of correlation between responses by the same subject, so the standard errors on estimates are corrected for the possibility that the responses are clustered for the same subject. The use of clustering to allow for ‘‘panel effects’’ from unobserved individual effects is common in the statistical survey literature.30 In addition, we consider allowances for random effects from unobserved individual heterogeneity31 after estimating the initial model that assumes that all subjects have the same preferences for risk.
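To fix ideas on the estimation stage, here is a minimal sketch of the conditional log-likelihood in Eqs. (3)–(5). It assumes the VLs have been pre-simulated as equally weighted 100-point arrays, as described above; the data structures and names are illustrative, and in practice the standard errors would additionally be corrected for clustering by contestant.

```python
import numpy as np
from scipy.stats import norm

def crra(m, r):
    """CRRA utility, with the log form at r = 1."""
    return np.log(m) if np.isclose(r, 1.0) else m ** (1.0 - r) / (1.0 - r)

def neg_log_likelihood(params, offers, vl_sets, deals):
    """Eqs. (3)-(5): offers[i] is the bank offer in task i, vl_sets[i] the
    list of pre-simulated 100-point VLs available in that round, and
    deals[i] = 1 if the contestant said 'Deal'. Returns -lnL for a minimizer."""
    r, mu = params
    ll = 0.0
    for offer, vls, deal in zip(offers, vl_sets, deals):
        eu_bo = crra(offer, r)                        # degenerate offer lottery
        eu_l = max(crra(vl, r).mean() for vl in vls)  # best 'No Deal' VL
        p_deal = norm.cdf((eu_bo - eu_l) / mu)        # Eqs. (3)-(4)
        p_deal = min(max(p_deal, 1e-12), 1.0 - 1e-12) # guard the logs
        ll += np.log(p_deal) if deal == 1 else np.log(1.0 - p_deal)
    return -ll
```

Handing `neg_log_likelihood` to a generic optimizer such as `scipy.optimize.minimize` then yields the point estimates of (r, μ).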
3.3. Estimates from Behavior on the Game Show

We estimate the CRRA coefficient to be 0.18 with a standard error of 0.030, implying a 95% confidence interval between 0.12 and 0.24. So this provides evidence of moderate risk aversion over this large domain. The noise parameter μ is estimated to be 0.077, with a standard error of 0.015.

Based on the estimated risk coefficients we can calculate the future round for which each contestant had the highest expected utility, seen from the perspective of the round when each decision is made. Fig. 12 displays histograms of these implied maximum EU rounds for each round-specific decision.

[Fig. 12. Evaluation Horizon by Round. Panels 1–9 show, for each decision round, the frequency with which each future round provided the maximum EU.]

For example, when contestants are in round 1 making a decision over ‘‘Deal’’ or ‘‘No Deal’’ we see that there is a strong mode for future round 9 as being the round with the maximum EU, given the estimated risk coefficients. The prominence of round 9 remains across all rounds where contestants are faced with a ‘‘Deal’’ or ‘‘No Deal’’ choice, although we can see that in rounds 5–7 there is a slight increase in the frequency with which earlier rounds provide the maximum EU for some contestants. The expected utilities for other VLs may well have generated the same binary decision, but the VL for round 9 was the one that appeared to be used since it was greater than the others in terms of expected utility.

We assume in the above analysis that all contestants can and do evaluate the expected utility for all VLs defined as the EU of bank offers in future rounds. Nevertheless, it is possible that some, perhaps all, contestants used a more myopic approach and evaluated EU over much shorter horizons. It is a simple matter to examine the effects of constraining the horizon over which the contestant is assumed to evaluate options. If one assumes that choices in each round were based on a comparison of the bank offer and the expected outcome from the terminal round, ignoring the possibility that the maximum EU may be found in an intervening round, then the CRRA estimate becomes 0.12, with a 95% confidence interval between 0.10 and 0.15. We cannot reject the hypothesis that subjects behave as if they are less risk averse when they are assumed to look only to the terminal round and to ignore the intervening bank offers. If one instead assumes that choices in each round were based on a myopic horizon, in which the contestant just considers the distribution of likely offers in the very next round, the CRRA estimate becomes 0.22, with a 95% confidence interval between 0.18 and 0.42. Thus, we obtain results that are similar to those obtained when we allow subjects to consider all horizons, although the estimates are biased and imply greater risk aversion than the unconstrained estimates. The estimated noise parameter increases to 0.12, with a standard error of 0.043. Overall, the estimates assuming myopia are statistically significantly different from the unconstrained estimates, even if the estimates of risk attitudes are substantively similar.

Our specification of alternative evaluation horizons does not lead to a nested hypothesis test of parameter restrictions, so a formal test of the differences in these estimates requires a non-nested hypothesis test.
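The horizon restrictions just compared amount to nothing more than changing which VLs enter the max-EU comparison in each round. A minimal sketch, assuming a hypothetical mapping `vls_by_target` from a future round to its pre-simulated VL, and reading the ‘‘terminal’’ option as the play-to-the-end lottery (round 10 in Table 1):

```python
def candidate_vls(vls_by_target, current_round, horizon):
    """Select the VLs the contestant is allowed to evaluate in current_round.

    'all'      -- every future round: the unconstrained model
    'terminal' -- only the play-to-the-end lottery (round 10 in Table 1)
    'myopic'   -- only the distribution of outcomes one round ahead
    """
    if horizon == "all":
        return [vls_by_target[h] for h in range(current_round + 1, 11)]
    if horizon == "terminal":
        return [vls_by_target[10]]
    if horizon == "myopic":
        return [vls_by_target[current_round + 1]]
    raise ValueError(f"unknown horizon: {horizon}")
```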
We use the popular Vuong (1989) procedure, even though it rests on some strong assumptions, discussed in Harrison and Rutström (2005). We find that we can reject the hypothesis that the evaluation horizon is only the terminal horizon with a p-value of 0.026, and also reject the hypothesis that the evaluation horizon is myopic with a p-value of less than 0.0001.

Finally, we can consider the validity of the CRRA assumption in this setting, by allowing RRA to vary with prizes. One natural candidate utility function to replace (1) is the Hyperbolic Absolute Risk Aversion (HARA) function of Merton (1971). We use a specification of HARA32 given in Gollier (2001):

U(y) = ζ(η + y/γ)^(1−γ),  γ ≠ 0   (1′)

where the parameter ζ can be set to 1 for estimation purposes without loss of generality. This function is defined over the domain of y such that η + y/γ > 0. The first-order derivative with respect to income is

U′(y) = (ζ(1−γ)/γ)(η + y/γ)^(−γ)

which is positive if and only if ζ(1−γ)/γ > 0 over the given domain of y. The second-order derivative is

U″(y) = −(ζ(1−γ)/γ)(η + y/γ)^(−γ−1) < 0

which is negative over the given domain of y. Hence it is not possible to specify risk-loving behavior with this specification when non-satiation is assumed. This is not a particularly serious restriction for a model of aggregate behavior in DOND. With this specification ARA is 1/(η + y/γ), so the inverse of ARA is linear in income; RRA is y/(η + y/γ), which can either increase or decrease with income, depending on the sign of η. Relative risk aversion is independent of income and equal to γ when η = 0.

Using the HARA utility function, we estimate η to be 0.30, with a standard error of 0.070 and a 95% confidence interval between 0.15 and 0.43. Thus, we can easily reject the assumption of CRRA over this domain. We estimate γ to be 0.992, with a standard error of 0.001. Evaluating RRA over various prize levels reveals an interesting pattern: RRA is virtually 0 for all prize levels up to around $10,000, when it becomes 0.03, indicating very slight risk aversion. It then increases sharply as prize levels increase. At $100,000 RRA is 0.24, at $250,000 it is 0.44, at $500,000 it is 0.61, at $750,000 it is 0.70, and finally at $1 million it is 0.75. Thus, we observe striking evidence of risk neutrality for small stakes, at least within the context of this task, and risk aversion for large stakes.

If contestants are constrained to only consider the options available to them in the next round, roughly the same estimates of risk attitudes obtain, even if one can again statistically reject this implicit restriction. RRA is again overestimated, reaching 0.39 for prizes of $100,000, 0.61 for prizes of $250,000, and 0.86 for prizes of $1 million. On the other hand, assuming that contestants only evaluate the terminal option leads to much lower estimates of risk aversion, consistent with the findings assuming CRRA. In this case there is virtually no evidence of risk aversion at any prize level up to $1 million, which is clearly implausible a priori.
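As an arithmetic check, the RRA pattern reported above can be reproduced directly from the two point estimates, since RRA(y) = y/(η + y/γ) with prizes scaled by the $1 million top prize; the small gaps relative to the reported values reflect rounding of the published estimates.

```python
eta, gamma = 0.30, 0.992   # HARA point estimates from the text

def rra(y):
    """Relative risk aversion under HARA: y / (eta + y/gamma)."""
    return y / (eta + y / gamma)

for dollars in (10_000, 100_000, 250_000, 500_000, 750_000, 1_000_000):
    print(f"${dollars:>9,}: RRA = {rra(dollars / 1_000_000):.2f}")
# prints 0.03, 0.25, 0.45, 0.62, 0.71, 0.76 -- close to the reported
# 0.03, 0.24, 0.44, 0.61, 0.70, 0.75
```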
3.4. Approximation to the Fully Dynamic Path

Our VL approach makes one simplifying assumption which dramatically enhances its ability to handle complicated sequences of choices, but which can lead to a bias in the resulting estimates of risk attitudes. To illustrate, consider the contestant in round 8, facing three unopened prizes and having to open one prize if he declines the bank offer in round 8. Call these prizes X, Y, and Z. There are three combinations of prizes that could remain after opening one prize. Our approach to the VL, from the perspective of the round 8 decision, evaluates the payoffs that confront the contestant for each of these three combinations if he ‘‘mentally locks himself into saying Deal (D) in round 9 and then gets the stochastic offer given the unopened prizes’’ or if he ‘‘mentally locks himself into saying No Deal (ND) in round 9 and then opens one more prize.’’ The former is the VL associated with the strategy of saying ND in round 8 and D in round 9, and the latter is the VL associated with the strategy of saying ND in round 8 and ND again in round 9. We compare the EU of these two VLs as seen from round 8, and pick the larger as representing the EU from saying ND in round 8. Finally, we compare this EU to the utility from saying D in round 8, since the offer in round 8 is known and deterministic.

The simplification comes from the fact that we do not evaluate the utility function separately in each of the possible virtual round 9 decisions. A complete enumeration of each possible path would undertake three paired comparisons. Consider the three possible outcomes:

- If prize X had been opened, we would have Y and Z unopened coming into virtual round 9. This would generate a distribution of offers in virtual round 9 (it is a distribution since the expected offer, as a percent of the EV of unopened prizes, is stochastic as viewed from round 8). It would also generate two outcomes if the contestant said ND: either he opens Y or he opens Z. A complete enumeration in this case should evaluate the EU of saying D and compare it to the EU of a 50–50 mix of Y and Z.
- If prize Y had been opened, we would have X and Z unopened coming into virtual round 9. A complete enumeration should evaluate the EU of saying D and compare it to the EU of a 50–50 mix of X and Z.
- If prize Z had been opened, we would have X and Y unopened coming into virtual round 9. A complete enumeration should evaluate the EU of saying D and compare it to the EU of a 50–50 mix of X and Y.

Instead of these three paired comparisons in virtual round 9, our approach collapses all of the offers from saying D in virtual round 9 into one VL, and all of the final prize earnings from saying ND in virtual round 9 into another single VL.

Our approach can be viewed as a valid solution to the dynamic problem the contestant faces if one accepts the restriction on the set of control strategies considered by the contestant. This restriction could be justified on behavioral grounds, since it does reduce the computational burden if in fact the contestant was using a process such as ours to evaluate the path. On the other hand, economists typically view the adoption of the optimal path as an ‘‘as if’’ prediction, in which case this behavioral justification would not apply. Or our approach may just be viewed as one way to descriptively model the forward-looking behavior of contestants, which is one of the key features of the analysis of the DOND game show. Just as we have alternative ways of modeling static choice under uncertainty, we can have alternative ways of modeling dynamic choice under uncertainty. At some point it would be valuable to test these alternative models against each other, but that does not have to be the first priority in trying to understand DOND behavior.
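A numerical sketch of this round-8 example may help fix ideas. The prize values, CRRA coefficient, and normal offer-fraction draws below are all illustrative assumptions; the point is only the mechanical difference between collapsing the virtual round-9 outcomes into one pair of lotteries and keeping the three paired comparisons separate.

```python
import numpy as np

def crra(m, r):
    return m ** (1.0 - r) / (1.0 - r)

rng = np.random.default_rng(1)
X, Y, Z = 0.01, 0.25, 1.00        # scaled prizes left in round 8 (hypothetical)
r, G = 0.2, 10_000
pairs = [(Y, Z), (X, Z), (X, Y)]  # pair remaining after opening X, Y, or Z

# Virtual round-9 offers in each state: stochastic fraction times the pair's EV.
offer_draws = {pair: np.clip(rng.normal(0.90, 0.10, G) * sum(pair) / 2, 1e-6, None)
               for pair in pairs}

# Collapsed approach: pool all virtual offers into one lottery and all final
# prizes into another, then take the larger of the two EUs.
all_offers = np.concatenate([offer_draws[pair] for pair in pairs])
all_finals = np.array([p for pair in pairs for p in pair])
eu_collapsed = max(crra(all_offers, r).mean(), crra(all_finals, r).mean())

# Complete enumeration: Deal-vs-No-Deal max within each virtual round-9
# state, averaged over the three equally likely states.
eu_full = np.mean([max(crra(offer_draws[pair], r).mean(),
                       crra(np.array(pair), r).mean())
                   for pair in pairs])

# The mean of per-state maxima always weakly dominates the max over the
# pooled lotteries, so the collapsed approach understates the EU of ND.
assert eu_full >= eu_collapsed
```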
It is possible to extend our general VL approach to take these possibilities into account, since one could keep track of all three pairs of VLs in the above complete enumeration, rather than collapsing them down to just one pair of VLs. Refer to this complete enumeration as VL*. From the perspective of the contestant, we know that EU(VL*) ≥ EU(VL), since VL* contains VL as a special case. We can therefore identify the implication of using VL instead of VL* for our inferences about risk attitudes, again considering the contestant in round 8 for ease of exposition, and assuming that the contestant actually undertakes the full enumeration reflected in VL*. Specifically, we will understate the EU of saying ND in round 8. This means that our ML estimation procedure would be biased toward finding less risk aversion than there actually is. To see this, assume some trial value of a CRRA risk aversion parameter. There are three possible cases, taking strict inequalities to be able to state matters crisply:

1. If this trial parameter r generates EU(VL*) > EU(VL) > U(D) then the VL approach would make the same qualitative inference as the VL* approach, but would understate the likelihood of that observation. This understatement comes from the implication that EU(VL*) − U(D) > EU(VL) − U(D), and it is this difference that determines the probability of the observed choice (after some adjustment for a stochastic error).

2. If this trial parameter r generates EU(VL) < EU(VL*) < U(D) then the VL approach would again make the same qualitative inference as the VL* approach, but would overstate the likelihood of that observation. This overstatement comes from the implication in this case that EU(VL) − U(D) < EU(VL*) − U(D).

3. If this trial parameter r generates EU(VL*) > U(D) > EU(VL), then the VL approach would lead us to predict that the subject would make the D decision, whereas the VL* approach would lead us to predict that the subject would make the ND decision. If we assume that the subject is actually motivated by VL*, and we incorrectly use VL, we would observe a choice of ND and would be led to lower our trial parameter r to better explain the observed choice; lowering r would make the subject less risk averse, and more likely to reject the D decision under VL. But we should not have lowered the parameter r; we should just have calculated the EU of the ND choice using VL* instead of VL.

Note that one cannot just tabulate the incidence of these three cases at the final ML estimate of r, and check to see if the vast bulk of choices fall into case #1 or case #2, since that estimate would have been adjusted to avoid case #3 if possible. And there is no presumption that the bias of the likelihood estimation in case #1 is just offset by the bias in case #2. So the bias from case #3 would lead us to expect that risk aversion would be underestimated, but the secondary effects from cases #1 and #2 should also be taken into account. Of course, if the contestant does not undertake full enumeration, and instead behaves consistently with the logic of our VL model, there is no bias at all in our estimates.
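A small numeric illustration of case 3, with values chosen purely to produce the ordering EU(VL*) > U(D) > EU(VL) and a Fechner noise of 0.05:

```python
from scipy.stats import norm

# Hypothetical values exhibiting case 3: EU(VL*) > U(D) > EU(VL).
eu_star, u_deal, eu_vl, mu = 0.60, 0.55, 0.50, 0.05

p_deal_under_vl = norm.cdf((u_deal - eu_vl) / mu)      # about 0.84: predicts Deal
p_deal_under_star = norm.cdf((u_deal - eu_star) / mu)  # about 0.16: predicts No Deal
print(round(p_deal_under_vl, 2), round(p_deal_under_star, 2))

# If the contestant (assumed to be motivated by the complete enumeration)
# says "No Deal", the VL likelihood of that choice is 1 - 0.84 = 0.16,
# versus 0.84 under VL*; the optimizer then lowers r to rationalize the
# choice, which is the downward bias in risk aversion described above.
```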
The only way to evaluate the extent of the bias is to undertake the complete enumeration required by VL* and compare the results to the approximation obtained with VL. We have done this for the game show data in the United States, starting with behavior in round 6. By skipping behavior in rounds 1–5 we only drop 15 out of 141 subjects, and undertaking the complete enumeration from earlier rounds is computationally intensive. For the complete enumeration we employ a 19-point approximation of the empirical distribution of bank offers in each round; in the VL approach we sampled 100,000 times from those distributions as part of the VL simulations. We then estimate the CRRA model using VL*, estimate the same model for the same behavior using VL, and compare results. We find that the inferred CRRA coefficient increases as we use VL*, as expected a priori, but by a very small amount. Specifically, we estimate CRRA to be 0.366 if we use VL* and 0.345 if we use VL, and the 95% confidence intervals comfortably overlap (they are 0.25 and 0.48 for the VL* approach, and 0.25 and 0.44 for the VL approach). The log-likelihood is −212.54824 under the VL approach and −211.27711 under the VL* approach, consistent with the VL* approach providing a better fit, but only a marginally better one. Thus, we can claim that our VL approach provides an excellent approximation to the fully dynamic solution. It is worth stressing that the issue of which estimate is the correct one depends on the assumptions made about contestant behavior. If one assumes that contestants in fact use strategies such as those embodied in VL, then using VL* would actually overstate true risk aversion, albeit by a trivial amount.

3.5. Estimates from Behavior in the Laboratory

The lab results indicate a CRRA coefficient of 0.45 and a 95% confidence interval between 0.38 and 0.52, comparable to results obtained using more familiar risk elicitation procedures due to Holt and Laury (2002) on the same subject pool. When we restrict the estimation model to only use the terminal period we again infer a much lower degree of risk aversion, consistent with risk neutrality; the CRRA coefficient is estimated to be −0.02 with a 95% confidence interval between −0.07 and 0.03. Constraining the estimation model to only consider prospects one period ahead leads to higher inferred risk aversion; the CRRA coefficient is estimated to be 0.48 with a 95% confidence interval between 0.41 and 0.55.

4. CONCLUSIONS

Game shows offer obvious advantages for the estimation of risk attitudes, not the least being the use of large stakes. Our review of analyses of these data reveals a steady progression of sophistication in the structural estimation of models of choice under uncertainty. Most of these shows, however, put the contestant into a dynamic decision-making environment, so one cannot simply (and reliably) use static models of choice. Using DOND as a detailed case study, we considered a general estimation methodology for such shows in which randomization of the potential outcomes allows us to break the curse of dimensionality that comes from recognizing the dynamic elements of the task environment.

The DOND paradigm is important for several reasons, and is more general than it might at first seem. It incorporates many of the dynamic, forward-looking decision processes that strike one as a natural counterpart to a wide range of fundamental economic decisions in the field. The ‘‘option value’’ of saying ‘‘No Deal’’ has clear parallels to the financial literature on stock market pricing, as well as to many investment decisions that have future consequences (so-called real options). There is no frictionless market ready to price these options, so familiar arbitrage conditions for equilibrium valuation play no immediate role, and one must worry about how the individual makes these decisions.
The game show offers a natural experiment, with virtually all of the major components replicated carefully from show to show, and even from country to country. The only sense in which DOND is restrictive is that it requires the contestant to make a binary ‘‘stop/go’’ decision. This is already a rich domain, as illustrated by several prominent examples: the evaluation of replacement strategies for capital equipment (Rust, 1987) and the closure of nuclear power plants (Rothwell & Rust, 1997). But it would be valuable to extend the choice variable to be non-binary, as in Card Sharks, where the contestant has a bet level to decide in each round as well as some binary decision (whether to switch the face card). Although some progress has been made on this problem, reviewed in Rust (1994), the range of applications has not been wide (e.g., Rust & Rothwell, 1995). Moreover, none of these applications has considered risk attitudes, let alone associated concepts such as loss aversion or probability weighting. Thus, the detailed analysis of choice behavior in environments such as Card Sharks should provide a rich test case for many broader applications.

These game shows provide a particularly fertile environment in which to test extensions to standard EUT models, as well as alternatives to EUT models of risk attitudes. Elsewhere, we have discussed applications that consider rank-dependent models such as RDU, and sign-dependent models such as CPT (Andersen et al., 2006a, 2006b). These applications, using the VL approach and U.K. data, have demonstrated the sensitivity of inferences to the manner in which key concepts are operationalized. Andersen et al. (2006a) find striking evidence of probability weighting, which is interesting since the DOND game has symmetric probabilities on each case. Using natural reference points to define contestant-specific gains or losses, they find no evidence of loss aversion. Of course, that inference depends on having identified the right reference point, but CPT is generally silent on that specification issue when it is not obvious from the frame. Andersen et al. (2006b) illustrate the application of alternative ‘‘dual-criteria’’ models of choice from psychology, built to account for lab behavior with long-shot, asymmetric lotteries such as one finds in DOND. No doubt many other specifications will be considered. Within the EUT framework, Andersen et al. (2006a) demonstrate the importance of allowing for asset integration. When utility is assumed to be defined over prizes plus some outside wealth measure,33 behavior is well characterized by a CRRA specification; but when it is assumed to be defined over prizes only, behavior is better characterized by a non-CRRA specification with increasing RRA over prizes.

There are three major weaknesses of game shows. The first is that one cannot change the rules of the game or the information that contestants receive, as one can in a laboratory experiment. Thus, the experimenter only gets to watch and learn, since natural experiments are, as described by Harrison and List (2004), serendipity observed. However, it is a simple matter to design laboratory experiments that match the qualitative task domains in the game show, even if one cannot hope to match the game show's stakes (e.g., Tenorio & Cason, 2002; Healy & Noussair, 2004; Andersen et al., 2006b; Post, van den Assem, Baltussen, & Thaler, 2006). Once this has been done, exogenous treatments can be imposed and studied.
If behavior in the default version of the game can be calibrated to behavior in a lab environment, then one has some basis for being interested in the behavioral effects of treatments in the lab.

The second major weakness of game shows is the concern that the sample might have been selected by some latent process correlated with the behavior of interest to the analyst: the classic sample selection problem. Most analyses of game shows are aware of this, and discuss the procedures by which contestants get to participate. At the very least, it is clear that the demographic diversity is wider than that found in the convenience samples of the lab. We believe that controlled lab experiments can provide guidance on the extent of sample selection into these tasks, and that the issue is a much more general one.

The third major weakness of game shows is the lack of information on observable characteristics, and hence the inability to use that information to examine heterogeneity of behavior. It is possible to observe some information about the contestant, since there is normally some pre-game banter that can be used to identify sex, approximate age, marital status, and ethnicity. But the general solution here is to employ econometric methods that allow one to correct for possible heterogeneity at the level of the individual, even if one cannot condition on observable characteristics of the individual. Until then, one either pools over subjects under the assumption that they have the same preferences, as we have done; makes restrictive assumptions that allow one to identify bounds for a given contestant, and then provides contestant-specific estimates (e.g., Post et al., 2006); or pays more attention to statistical methods that allow for unobserved heterogeneity. One such method is to allow for random coefficients of each structural model, to represent an underlying variation in preferences across the sample (e.g., Train, 2003, Chapter 6; De Roos & Sarafidis, 2006; Botti et al., 2006). This is quite different from allowing for standard errors on the pooled coefficient, as we have done. Another method is to allow for finite mixtures of alternative structural models, recognizing that some choices or subjects may be better characterized in this domain by one latent decision-making process and that others may be better characterized by some other process (e.g., Harrison & Rutström, 2005). These methods are not necessarily alternatives, but they each demand relatively large data sets and considerable attention to statistical detail.

NOTES

1. Behavior on Who Wants To Be A Millionaire has been carefully evaluated by Hartley, Lanot, and Walker (2005), but this game involves a large number of options and alternatives that necessitate some strong assumptions before one can pin down risk attitudes rigorously. We focus on games in which risk attitudes are relatively easier to identify.
2. These experiments are from unpublished research by the authors.
3. In the earliest versions of the show this option only applied to the first card in the first row. In later versions it applied to the first card in each row. Finally, in the last major version it applied to any card in any row, but only one card per row could be switched.
4. Two further American versions were broadcast. One was a syndicated version in the 1986/1987 season, with Bill Rafferty as host. Another was a brief syndicated version in 2001.
A British version, called Play Your Cards Right, aired in the 1980s and again in the 1990s. A German version called Bube Dame Hörig, and a Swedish version called Lagt Kort Ligger, have also been broadcast. Card Sharks re-runs remain relatively popular on the American Game Show Network, a cable station.
5. Available at http://data.bls.gov/cgi-bin/cpicalc.pl
6. Let the expected utility of the bet b be p_win U(b) + p_lose U(−b). The first-order condition for a maximum over b is then p_win U′(b) − p_lose U′(−b) = 0. Since U′(b) = exp(−αb) and U′(−b) = exp(−α(−b)) = exp(αb), substitution and simple manipulation yield the formula.
7. In addition, a variable given by stake²/2000 is included by itself to account for possible nonlinearities.
8. Gertner (1993, p. 512): ‘‘I treat each bet as a single observation, ignoring any contestant-specific effects.’’
9. He rejects this hypothesis, for reasons not important here.
10. For example, in a game aired on 9/16/2004, the category was ‘‘Speaking in Tongues.’’ The $800 text was ‘‘A 1996 Oakland School Board decision made many aware of this term for African-American English.’’ Uber-champion Ken Jennings correctly responded, ‘‘What be Ebonics?’’
11. Nalebuff (1990, p. 182) proposed the idea of the analysis, and the use of empirical responses to avoid formal analysis of the strategic aspects of the game.
12. One formal difference is that the first-order condition underlying that formula assumes an interior solution, and the decision-maker in runaway games has to ensure that he does not bet so much that he could fall below the highest possible score of his rival. Since this constraint did not bind in the 110 data points available, it can be glossed over.
13. The Lingo Board in the U.S. version is larger, and there are more balls in the urn, with implications for the probabilities needed to infer risk attitudes.
14. Their Eq. (12) shows the formula for the general case, and Eqs. (5) and (8) for the special final-round cases assuming CRRA or CARA. There is no statement that this is actually evaluated within the maximum-likelihood evaluator, but pni is not listed as a parameter to be estimated separately from the utility function parameter, so this is presumably what was done.
15. The point estimates for the CRRA function (their Table 6, p. 837) are generally around ƒ1,800 and ƒ1,500, with standard errors of roughly ƒ200 on each. Similar results obtain for the CARA function (their Table 7, p. 839). So these differences are not obviously significant at standard critical levels.
16. A handful of special shows, such as season finales and season openers, have higher stakes of up to $6 million. Our later statistical analysis includes these data, and adjusts the stakes accordingly.
17. Or make some a priori judgments about the bounded rationality of contestants. For example, one could assume that contestants only look forward one or two rounds, or that they completely ignore bank offers.
18. Other top prizes were increased as well. For example, in the final show of the first season, the top five prizes were changed from $200k, $300k, $400k, $500k, and $1m to $300k, $400k, $500k, $2.5m, and $5m, respectively.
19. The instructions are available in Appendix A of the working paper version, available online at http://www.bus.ucf.edu/wp/
20. The screen shots provided in the instructions and computer interface were much larger, and easier to read. Baltussen, Post, and van den Assem (2006) also conducted laboratory experiments patterned on DOND.
They used instructions which were literally taken from the instructions given to participants in the Dutch DOND game show, with some introductory text from the experimenters explaining the exchange rate between the experimental game show earnings and take-home payoffs. Their approach has the advantage of using the wording of instructions used in the field. Our objective was to implement a laboratory experiment based on the DOND task, clearly referencing the game show as a natural counterpart to the lab experiment. But we wanted to use instructions over which we had complete control. We wanted subjects to know exactly what bank offer function was going to be used. In our view the two types of DOND laboratory experiments complement each other, in the same sense in which lab experiments, field experiments, and natural experiments are complementary (see Harrison & List, 2004).
21. Virtually all subjects indicated that they had seen the U.S. version of the game show, which was a major ratings hit on network television in five episodes screened daily at prime time just prior to Christmas in 2005. Our experiments were conducted about a month after the return of the show in the U.S., following the 2006 Olympic Games.
22. The literature has already generated a lengthy lead article in the Wall Street Journal (January 12, 2006, p. A1) and National Public Radio interviews in the U.S. with researchers Thaler and Post on the programs Day to Day (http://www.npr.org/templates/story/story.php?storyId=5243893) and All Things Considered (http://www.npr.org/templates/story/story.php?storyId=5244516) on March 3, 2006.
23. Appendix B is available in the working paper version, available online at http://www.bus.ucf.edu/wp/
24. Abdellaoui, Barrios, and Wakker (2007, p. 363) offer a one-parameter version of the Expo-Power function which exhibits non-constant RRA for empirically plausible parameter values. It does impose some restrictions on the variations in RRA compared to the two-parameter EP function, but is valuable as a parsimonious way to estimate non-CRRA specifications, and could be used for ‘‘bounds analyses’’ such as these.
25. If bank offers were a deterministic and known function of the expected value of unopened prizes, we would not need anything like 100,000 simulations for later rounds. For the last few rounds of a full game, in which the bank offer is relatively predictable, the use of this many simulations is a numerically costless redundancy.
26. There is no need to know risk attitudes, or other preferences, when the distributions of the virtual lotteries are generated by simulation. But there is definitely a need to know these preferences when the virtual lotteries are evaluated. Keeping these computational steps separate is essential for computational efficiency, and is the same procedurally as pre-generating ‘‘smart’’ Halton sequences of uniform deviates for later, repeated use within a maximum-simulated likelihood evaluator (e.g., Train, 2003, p. 224ff.).
27. It is possible to extend the analysis by allowing the core parameter r to be a function of observable characteristics. Or one could view the CRRA coefficient as a random coefficient reflecting a subject-specific random effect u, so that one would estimate r = r_0 + u instead. This is what De Roos and Sarafidis (2006) do for their core parameters, implicitly assuming that the mean of u is zero and estimating the standard deviation of u. Our approach is just to estimate r_0.
28. Harless and Camerer (1994), Hey and Orme (1994), and Loomes and Sugden (1995) provided the first wave of empirical studies including some formal stochastic specification in the version of EUT tested. There are several species of ‘‘errors’’ in use, reviewed by Hey (1995, 2002), Loomes and Sugden (1995), Ballinger and Wilcox (1997), and Loomes, Moffatt, and Sugden (2002). Some place the error at the final choice between one lottery or the other after the subject has decided deterministically which one has the higher expected utility; some place the error earlier, on the comparison of preferences leading to the choice; and some place the error even earlier, on the determination of the expected utility of each lottery.
29. De Roos and Sarafidis (2006) assume a random effects term v for each individual and add it to the latent index defining the probability of choosing deal. This is the same thing as changing our specification (4) to G(∇EU) = Φ(∇EU + v), and adding the standard deviation of v as a parameter to be estimated (the mean of v is assumed to be 0).
30. Clustering commonly arises in national field surveys from the fact that physically proximate households are often sampled to save time and money, but it can also arise from more homely sampling procedures. For example, Williams (2000, p. 645) notes that it could arise from dental studies that ‘‘collect data on each tooth surface for each of several teeth from a set of patients’’ or ‘‘repeated measurements or recurrent events observed on the same person.’’ The procedures for allowing for clustering allow heteroskedasticity between and within clusters, as well as autocorrelation within clusters. They are closely related to the ‘‘generalized estimating equations’’ approach to panel estimation in epidemiology (see Liang & Zeger, 1986), and generalize the ‘‘robust standard errors’’ approach popular in econometrics (see Rogers, 1993). Wooldridge (2003) reviews some issues in the use of clustering for panel effects, noting that significant inferential problems may arise with small numbers of panels.
31. In the DOND literature, De Roos and Sarafidis (2006) demonstrate that alternative ways of correcting for unobserved individual heterogeneity (random effects or random coefficients) generally provide similar estimates, but that they are quite different from estimates that ignore that heterogeneity. Botti, Conte, DiCagno, and D'Ippoliti (2006) also consider unobserved individual heterogeneity, and show that it is statistically significant in their models (which ignore dynamic features of the game).
32. Gollier (2001, p. 25) refers to this as a Harmonic Absolute Risk Aversion function, rather than the Hyperbolic Absolute Risk Aversion of Merton (1971, p. 389).
33. This estimated measure might be interpreted as wealth, or as some function of wealth in the spirit of Cox and Sadiraj (2006).

ACKNOWLEDGMENTS

Harrison and Rutström thank the U.S. National Science Foundation for research support under grants NSF/IIS 9817518, NSF/HSD 0527675, and NSF/SES 0616746. We are grateful to Andrew Theophilopoulos for artwork.

REFERENCES

Abdellaoui, M., Barrios, C., & Wakker, P. P. (2007). Reconciling introspective utility with revealed preference: Experimental arguments based on prospect theory. Journal of Econometrics, 138, 356–378.
Andersen, S., Harrison, G. W., Lau, M. I., & Rutström, E. E. (2006a). Dynamic choice behavior in a natural experiment. Working Paper 06–10, Department of Economics, College of Business Administration, University of Central Florida.
Andersen, S., Harrison, G. W., Lau, M. I., & Rutström, E. E. (2006b). Dual criteria decisions. Working Paper 06–11, Department of Economics, College of Business Administration, University of Central Florida.
Ballinger, T. P., & Wilcox, N. T. (1997). Decisions, error and heterogeneity. Economic Journal, 107, 1090–1105.
Baltussen, G., Post, T., & van den Assem, M. (2006). Stakes, prior outcomes and distress in risky choice: An experimental study based on Deal or No Deal. Working Paper, Department of Finance, Erasmus School of Economics, Erasmus University.
Beetsma, R. M. W. J., & Schotman, P. C. (2001). Measuring risk attitudes in a natural experiment: Data from the television game show Lingo. Economic Journal, 111, 821–848.
Blavatskyy, P., & Pogrebna, G. (2006). Testing the predictions of decision theories in a natural experiment when half a million is at stake. Working Paper 291, Institute for Empirical Research in Economics, University of Zurich.
Bombardini, M., & Trebbi, F. (2005). Risk aversion and expected utility theory: A field experiment with large and small stakes. Working Paper 05–20, Department of Economics, University of British Columbia.
Botti, F., Conte, A., DiCagno, D., & D'Ippoliti, C. (2006). Risk attitude in real decision problems. Unpublished manuscript, LUISS Guido Carli, Rome.
Campbell, J. Y., Lo, A. W., & MacKinlay, A. C. (1997). The econometrics of financial markets. Princeton: Princeton University Press.
Cox, J. C., & Sadiraj, V. (2006). Small- and large-stakes risk aversion: Implications of concavity calibration for decision theory. Games and Economic Behavior, 56(1), 45–60.
De Roos, N., & Sarafidis, Y. (2006). Decision making under risk in Deal or No Deal. Working Paper, School of Economics and Political Science, University of Sydney.
Gertner, R. (1993). Game shows and economic behavior: Risk-taking on Card Sharks. Quarterly Journal of Economics, 108(2), 507–521.
Gollier, C. (2001). The economics of risk and time. Cambridge, MA: MIT Press.
Harless, D. W., & Camerer, C. F. (1994). The predictive utility of generalized expected utility theories. Econometrica, 62(6), 1251–1289.
Harrison, G. W., Johnson, E., McInnes, M. M., & Rutström, E. E. (2005). Risk aversion and incentive effects: Comment. American Economic Review, 95(3), 897–901.
Harrison, G. W., Lau, M. I., & Rutström, E. E. (2007). Estimating risk attitudes in Denmark: A field experiment. Scandinavian Journal of Economics, 109(2), 341–368.
Harrison, G. W., Lau, M. I., Rutström, E. E., & Sullivan, M. B. (2005). Eliciting risk and time preferences using field experiments: Some methodological issues. In: J. Carpenter, G. W. Harrison & J. A. List (Eds), Field experiments in economics (Vol. 10). Greenwich, CT: JAI Press, Research in Experimental Economics.
Harrison, G. W., & List, J. A. (2004). Field experiments. Journal of Economic Literature, 42(4), 1013–1059.
Harrison, G. W., & Rutström, E. E. (2005). Expected utility theory and prospect theory: One wedding and a decent funeral. Working Paper 05–18, Department of Economics, College of Business Administration, University of Central Florida; Experimental Economics, forthcoming.
Harrison, G. W., & Rutström, E. E. (2008). Risk aversion in the laboratory. In: J. C. Cox & G. W. Harrison (Eds), Risk aversion in experiments (Vol. 12). Bingley, UK: Emerald, Research in Experimental Economics.
Hartley, R., Lanot, G., & Walker, I. (2005). Who really wants to be a Millionaire? Estimates of risk aversion from gameshow data. Working Paper, Department of Economics, University of Warwick.
Healy, P., & Noussair, C. (2004). Bidding behavior in the Price Is Right game: An experimental study. Journal of Economic Behavior and Organization, 54, 231–247.
Hey, J. (1995). Experimental investigations of errors in decision making under risk. European Economic Review, 39, 633–640.
Hey, J. D. (2002). Experimental economics and the theory of decision making under uncertainty. Geneva Papers on Risk and Insurance Theory, 27(1), 5–21.
Hey, J. D., & Orme, C. (1994). Investigating generalizations of expected utility theory using experimental data. Econometrica, 62(6), 1291–1326.
Holt, C. A., & Laury, S. K. (2002). Risk aversion and incentive effects. American Economic Review, 92(5), 1644–1655.
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47, 263–291.
Levitt, S. D., & List, J. A. (2007). What do laboratory experiments measuring social preferences reveal about the real world? Journal of Economic Perspectives, 21(2), 153–174.
Liang, K.-Y., & Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73, 13–22.
Loomes, G., Moffatt, P. G., & Sugden, R. (2002). A microeconometric test of alternative stochastic theories of risky choice. Journal of Risk and Uncertainty, 24(2), 103–130.
Loomes, G., & Sugden, R. (1995). Incorporating a stochastic element into decision theories. European Economic Review, 39, 641–648.
Markowitz, H. (1952). The utility of wealth. Journal of Political Economy, 60, 151–158.
Merton, R. C. (1971). Optimum consumption and portfolio rules in a continuous-time model. Journal of Economic Theory, 3, 373–413.
Metrick, A. (1995). A natural experiment in ‘Jeopardy!’. American Economic Review, 85(1), 240–253.
Mulino, D., Scheelings, R., Brooks, R., & Faff, R. (2006). An empirical investigation of risk aversion and framing effects in the Australian version of Deal Or No Deal. Working Paper, Department of Economics, Monash University.
Nalebuff, B. (1990). Puzzles: Slot machines, zomepirac, squash, and more. Journal of Economic Perspectives, 4(1), 179–187.
Post, T., van den Assem, M., Baltussen, G., & Thaler, R. (2006). Deal or no deal? Decision making under risk in a large-payoff game show. Working Paper, Department of Finance, Erasmus School of Economics, Erasmus University; American Economic Review, forthcoming.
Quiggin, J. (1982). A theory of anticipated utility. Journal of Economic Behavior and Organization, 3(4), 323–343.
Quiggin, J. (1993). Generalized expected utility theory: The rank-dependent model. Norwell, MA: Kluwer Academic.
Rogers, W. H. (1993). Regression standard errors in clustered samples. Stata Technical Bulletin, 13, 19–23.
Rothwell, G., & Rust, J. (1997). On the optimal lifetime of nuclear power plants. Journal of Business and Economic Statistics, 15(2), 195–208.
Rust, J. (1987). Optimal replacement of GMC bus engines: An empirical model of Harold Zurcher. Econometrica, 55, 999–1033.
Rust, J. (1994). Structural estimation of Markov decision processes. In: D. McFadden & R. Engle (Eds), Handbook of econometrics (Vol. 4). Amsterdam, NL: North-Holland.
Rust, J. (1997). Using randomization to break the curse of dimensionality. Econometrica, 65(3), 487–516.
Rust, J., & Rothwell, G. (1995). Optimal response to a shift in regulatory regime: The case of the US nuclear power industry. Journal of Applied Econometrics, 10, S75–S118.
Tenorio, R., & Cason, T. (2002). To spin or not to spin? Natural and laboratory experiments from The Price is Right. Economic Journal, 112, 170–195.
Train, K. E. (2003). Discrete choice methods with simulation. New York, NY: Cambridge University Press.
Vuong, Q. H. (1989). Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica, 57(2), 307–333.
Williams, R. L. (2000). A note on robust variance estimation for cluster-correlated data. Biometrics, 56, 645–646.
Wooldridge, J. (2003). Cluster-sample methods in applied econometrics. American Economic Review (Papers and Proceedings), 93, 133–138.