
RISK AVERSION IN GAME SHOWS
Steffen Andersen, Glenn W. Harrison, Morten I. Lau
and E. Elisabet Rutström
ABSTRACT
We review the use of behavior from television game shows to infer risk
attitudes. These shows provide evidence when contestants are making
decisions over very large stakes, and in a replicated, structured way.
Inferences are generally confounded by the subjective assessment of skill
in some games, and the dynamic nature of the task in most games. We
consider the game shows Card Sharks, Jeopardy!, Lingo, and finally
Deal Or No Deal. We provide a detailed case study of the analyses of
Deal Or No Deal, since it is suitable for inference about risk attitudes and
has attracted considerable attention.
Observed behavior on television game shows constitutes a controlled
natural experiment that has been used to estimate risk attitudes. Contestants
are presented with well-defined choices where the stakes are real and
sizeable, and the tasks are repeated in the same manner from contestant to
contestant. We review behavior in these games, with an eye to inferring risk
attitudes. We describe the types of assumptions needed to evaluate behavior,
and propose a general method for estimating the parameters of structural
models of choice behavior for these games. We illustrate with a detailed case
study of behavior in the U.S. version of Deal Or No Deal (DOND).
Risk Aversion in Experiments
Research in Experimental Economics, Volume 12, 361–406
Copyright © 2008 by Emerald Group Publishing Limited
All rights of reproduction in any form reserved
ISSN: 0193-2306/doi:10.1016/S0193-2306(08)00008-2
In Section 1 we review the existing literature in this area that is focused
on risk attitudes, starting with Gertner (1993) and the Card Sharks program.
We then review the analysis of behavior on Jeopardy! by Metrick (1995)
and on Lingo by Beetsma and Schotman (2001).1 In Section 2 we turn to
a detailed case study of the DOND program that has generated an explosion of analyses trying to estimate large-stakes risk aversion. We explain
the basic rules of the game, which is shown with some variations in
many countries. We then review complementary laboratory experiments
that correspond to the rules of the naturally occurring game show. Finally,
we discuss alternative modeling strategies employed in related DOND
literature.
Section 3 proposes a general method for estimating choice models in the
stochastic dynamic programming environment that most of these game
shows employ. We resolve the ‘‘curse of dimensionality’’ in this setting by
using randomization methods and certain simplifications to the forward-looking strategies adopted. We discuss the ability of our approach to closely
approximate the fully dynamic path that agents might adopt. We illustrate
the application of the method using data from the U.S. version of DOND,
and estimate a simple structural model of expected utility theory choice
behavior. The manner in which our method can be extended to other models
is also discussed.
Finally, in Section 4 we identify several weaknesses of game show data,
and how they might be addressed. We stress the complementary use of
natural experiments, such as game shows, and laboratory experiments.
1. PREVIOUS LITERATURE
1.1. Card Sharks
The game show Card Sharks provided an opportunity for Gertner (1993) to
examine dynamic choice under uncertainty involving substantial gains and
losses. Two key features of the show allowed him to examine the hypothesis
of asset integration: each contestant’s stake accumulates from round to round within a game, and some contestants come back for repeat plays after winning substantial amounts.
The game involves each contestant deciding in a given round whether to
bet that the next card drawn from a deck will be higher or lower than
some ‘‘face card’’ on display. Fig. 1 provides a rough idea of the layout
of the ‘‘Money Cards’’ board before any face cards are shown. Fig. 2
provides a representation of the board from a computerized laboratory
implementation2 of Card Sharks. In Fig. 2 the subject has a face card with a
3, and is about to enter the first bet.

Fig. 1. Money Cards Board in Card Sharks.

Fig. 2. Money Cards Board from Lab Version of Card Sharks.
Cards are drawn without replacement from a standard 52-card deck, with
no Jokers and with Aces high. Contestants decide on the relative value of
the next card, and then on an amount to bet that their choice is correct. If
they are correct their stake increments by the amount bet, if they are
incorrect their stake is reduced by the amount bet, and if the new card is the
same as the face card there is no change in the stake. Every contestant starts
off with an initial stake of $200, and bets could be in increments of $50 of
the available stake. After three rounds in the first, bottom ‘‘row’’ of cards,
they move to the second, middle ‘‘row’’ and receive an additional $200 (or
$400 in some versions). If the stake goes to zero in the first row, contestants
go straight to the second row and receive the new stake; otherwise, the
additional stake is added to what remains from row one. The second row
includes three choices, just as in the first row. After these three choices, and
if the stakes have not dropped to zero, they can play the final bet. In this
case they have to bet at least one-half of their stake, but otherwise the
betting works the same way. One feature of the game is that contestants
sometimes have the option to switch face cards in the hope of getting one
that is easier to win against.3
The show aired in the United States in two major versions. The first,
between April 1978 and October 1981, was on NBC and had Jim Perry as
the host. The second, between January 1986 and March 1989, was on CBS
and had Bob Eubanks as the host.4
The maximum prize was $28,800 on the NBC version and $32,000 on the
CBS version, and would be won if the contestant correctly bet the maximum
amount in every round. This only occurred once. Using official inflation calculators,5 this converts to between $89,138 and $63,936 in 2006 dollars for the NBC version, and between $58,920 and $52,077 for the CBS version.
These stakes are actually quite modest in relation to contemporary game
shows in the United States, such as DOND described below, which typically
has a maximal stake of $1,000,000. Of course, maximal stakes can be
misleading, since Card Sharks and DOND are both ‘‘long shot’’ lotteries.
Average earnings in the CBS version used by Gertner (1993) were $4,677,
which converts to between $8,611 and $7,611 in 2006, whereas average
earnings in DOND have been $131,943 for the sample we report later
(excluding a handful of special shows with significantly higher prizes).
1.1.1. Estimates of Risk Attitudes
The analysis of Gertner (1993) assumes a Constant Absolute Risk Aversion
(CARA) utility function, since he did not have information on household
wealth and viewed that as necessary to estimate a Constant Relative Risk
Aversion (CRRA) utility function. We return to the issue of household
wealth later.
Gertner (1993) presents several empirical analyses. He initially (p. 511)
focuses on the last round, and uses the optimal ‘‘investment’’ formula
$$ b = \frac{\ln(p_{win}) - \ln(p_{lose})}{2a} $$

where the probabilities of winning and losing the bet b are defined by $p_{win}$ and $p_{lose}$, and the utility function is

$$ U(W) = -\exp(-aW) $$
for wealth W.6 From observed bets he infers a. There are several potential
problems with this approach. First, there is an obvious sample selection
problem from only looking at the last round, although this is not a major
issue since relatively few contestants go bankrupt (less than 3%).
Second, there is the serious problem of censoring at bets of 50% or 100%
of the stake. Gertner (1993, p. 510) is well aware of the issue, and indeed
motivates several analytical approaches to these data by a desire to avoid it:
Regression estimates of absolute risk aversion are sensitive to the distribution
assumptions one makes to handle the censoring created by the constraints that a
contestant must bet no more than her stake and at least half of her stake in the final
round. Therefore, I develop two methods to estimate a lower bound on the level of risk
aversion that do not rely on assumptions about the error distribution.
The first method he uses is just to assume that the censored responses are in
fact the optimal response. The 50% bets are assumed to be optimal bets,
when in fact the contestant might wish to bet less (but cannot due to the
final-round betting rules); thus inferences from these responses will be biased
towards showing less risk aversion than there might actually be. Conversely,
the 100% bets are assumed to be risk neutral, when in fact those contestants might be risk loving; thus inferences from these responses will be biased towards showing
more risk aversion than there might actually be. Two wrongs do not make a
right, although one does encounter such claims in empirical work. Of
course, this approach still relies on exactly the same sort of assumptions
about the interpretation of behavior, although not formalized in terms of an
error distribution. And it is not apparent that the estimates will be lower
bounds, since this censoring issue biases inferences in either direction. The
average estimate of ARA to emerge is 0.000310, with a standard error of
0.000017, but it is not clear how one should interpret this estimate since it
could be an overestimate or an underestimate.
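To make the first method concrete, the optimal-bet formula can be inverted to recover a from an observed interior bet. Here is a minimal sketch in Python; the bet size and the win/loss probabilities are hypothetical, chosen only to illustrate the order of magnitude involved.

```python
# Inverting Gertner's CARA optimal-bet formula, b = [ln(p_win) - ln(p_lose)]/(2a),
# to recover the absolute risk aversion coefficient a from an observed bet.
# The inputs below are hypothetical, for illustration only.
import math

def implied_cara(bet: float, p_win: float, p_lose: float) -> float:
    """a = [ln(p_win) - ln(p_lose)] / (2 * bet), for an interior (uncensored) bet."""
    return (math.log(p_win) - math.log(p_lose)) / (2.0 * bet)

# A contestant facing a 0.7/0.3 gamble who bets $1,500 behaves as if
# a is roughly 0.00028, the order of magnitude of the estimate reported above.
print(implied_cara(bet=1_500.0, p_win=0.7, p_lose=0.3))
```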
The second approach is a novel and early application of simulation
methods, which we will develop in greater detail below. A computer
simulates optimal play by a risk-neutral agent playing the entire game
10 million times, recognizing that the cards are drawn without replacement.
The computer does not appear to recognize the possibility of switching
cards, but that is not central to the methodological point. The average
return from this virtual lottery (VL) is $6,987 with a standard deviation of
$10,843. It is not apparent that the lottery would have a Gaussian
distribution of returns, but that can be allowed for in a more complete
numerical analysis as we show later, and is again not central to the main
methodological point.
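The structure of Gertner’s simulation is easy to reproduce. The sketch below is a stylized version under assumptions the text leaves open: the revealed card becomes the next face card, ties return the bet, a $50 minimum bet applies, and there is no face-card switching. The printed mean and standard deviation can then be compared to the $6,987 and $10,843 reported above.

```python
# A stylized Monte Carlo of optimal risk-neutral play in Money Cards: bet the
# full stake when the called direction is strictly favorable, the minimum
# otherwise. Rules from the text: $200 start, $200 more entering row two,
# bets in $50 steps, final bet of at least half the stake.
import math
import random
import statistics

RANKS = list(range(2, 15)) * 4  # ranks 2..14 (aces high), four suits

def play_one_game(rng: random.Random) -> int:
    deck = RANKS[:]
    rng.shuffle(deck)
    face = deck.pop()
    stake = 200
    for i in range(7):                        # three bets per row, then the big bet
        if i == 3:                            # moving up to the second row
            stake = stake + 200 if stake > 0 else 200
        if stake == 0:
            continue                          # bankrupt: no further bets this row
        p_hi = sum(c > face for c in deck) / len(deck)
        p_lo = sum(c < face for c in deck) / len(deck)
        call_hi = p_hi >= p_lo                # call the more likely direction
        p_win, p_lose = max(p_hi, p_lo), min(p_hi, p_lo)
        min_bet = math.ceil(stake / 100) * 50 if i == 6 else 50
        bet = stake if p_win > p_lose else min(min_bet, stake)  # risk-neutral rule
        card = deck.pop()
        if card != face:                      # a tie returns the bet
            stake += bet if (card > face) == call_hi else -bet
        face = card                           # revealed card becomes the new face
    return stake

rng = random.Random(123)
earnings = [play_one_game(rng) for _ in range(100_000)]
print(statistics.mean(earnings), statistics.pstdev(earnings))
```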
The next step is to compare this distribution with the observed
distribution of earnings, which was an average of $4,677 with a standard
deviation of $4,258, and use a revealed preference argument to infer what
risk attitudes must have been in play for this to have been the outcome
instead of the VL:
A second approach is to compare the sample distribution of outcomes with the
distribution of outcomes if a contestant plays the optimal strategy for a risk-neutral
contestant. One can solve for the coefficient of absolute risk aversion that would make
an individual indifferent between the two distributions. By revealed preference, an
‘‘average’’ contestant prefers the actual distribution to the expected-value maximizing
strategy, so this is an estimate of the lower bound of constant absolute risk aversion
(pp. 511/512).
This approach is worth considering in more depth, because it suggests
estimation strategies for a wide class of stochastic dynamic programming
problems which we develop in Section 3. This exact method will not work
once one moves beyond special cases such as risk neutrality, where outcomes
and behavior in later rounds have no effect on optimal behavior in earlier
rounds. But we will see that an extension of the method does generalize.
The comparison proposed here generates a lower bound on the ARA,
rather than a precise estimate, since we know that an agent with an even
higher ARA would also implicitly choose the observed distribution over the
virtual RN distribution. Obviously, if one could generate VL distributions
for a wide range of ARA values, it would be possible to refine this
estimation step and select the ARA that maximizes the likelihood of the
data. This is, in fact, exactly what we propose later as a general method for
estimating risk attitudes in such settings. The ARA bound derived from this
approach is 0.0000711, less than one-fourth of the estimate from the first
method.
Gertner (1993, p. 512) concludes that
The ‘‘Card Sharks’’ data indicate a level of risk aversion higher than most existing
estimates. Contestants do not seem to behave in a risk-loving and enthusiastic way because
they are on television, because anything they win is gravy, or because the producers of the
show encourage excessive risk-taking. I think this helps lend credence to the potential
importance and wider applicability of the anomalous results I document below.
His first method does not provide any basis for these claims, since risk
loving is explicitly assumed away. His second method does indicate that the
average player behaves as if risk averse, but there are no standard errors on
that bound. Thus, one simply cannot say that it is statistically significant
evidence of risk aversion.
1.1.2. EUT Anomalies
The second broad set of empirical analyses by Gertner (1993) considers a
regression model of bets in the final round, and shows some alleged
violations of EUT. The model is a two-limit tobit specification, recognizing
that bets at 50% and 100% may be censored. However, most of the settings
in which contestants might rationally bet 50% or 100% are dropped. Bets
with a face card of 2 or an Ace are dropped since they are sure things in the
sense that the optimal bet cannot result in a loss (the bet is simply returned if
the same card is then turned up). Similarly, bets with a face card of 8 are
dropped, since contestants almost always bet the minimum. These deletions
amount to 258 of the 844 observations, which is not a trivial sub-sample.
The regression model includes several explanatory variables. The central
ones are cash and stake. Variable cash is the accumulated earnings by
the contestant to that point over all repetitions of the game. So this
includes previous plays of the game for ‘‘champions,’’ as well as earnings
accumulated in rounds 1–6 of the current game. Variable stake is the
accumulated earnings in the current game, so it excludes earnings from
previous games. One might expect the correlation of stake and cash to be
positive and high, since the average number of times the game is played in
these data is 1.85 (= 844/457). Additional explanatory variables include a
dummy for new players that are in their first game; the ratio of cash to the
number of times the contestant has played the whole game (the ratio is 0 for
new players); the value of any cars that have been won, given by the stated
sticker price of the car; and dummy variables for each of the possible face
card pairs (in this game a 3 is essentially the same as a King, a 4 the same as
a Queen, etc). The stake variable is included as an interaction with these face
dummies, which are also included by themselves.7 The model is estimated
with or without a multiplicative heteroskedasticity correction, and the latter estimates are preferred. Card-counters are ignored when inferring probabilities
of a win, and this seems reasonable as a first approximation.
Gertner (1993, Section VI) draws two striking conclusions from this
model. The first is that stake is statistically significant in its interactions with
the face cards. The second is that the cash variable is not significant. The
first result is said to be inconsistent with EUT since earnings in this show are
small in relation to wealth, and
The desired dollar bet should depend upon the stakes only to the extent that the stakes
impact final wealth. Thus, risky decisions on ‘‘Card Sharks’’ are inconsistent with
individuals maximizing a utility function over just final wealth. If one assumes that
utility depends only on wealth, estimates of zero on card intercepts and significant
coefficients on the stake variable imply that outside wealth is close to zero. Since this
does not hold, one must reject utility depending only on final wealth (p. 517).
This conclusion bears close examination. First, there is a substantial debate
as to whether EUT has to be defined over final wealth, whatever that is, or
can be defined just over outcomes in the choice task before the contestant
(e.g., see Cox and Sadiraj (2006) and Harrison, Lau, and Rutström (2007)
for references to the historical literature). So even if one concludes that the
stake matters, this is not fatal for specifications of EUT defined over prizes,
as clearly recognized by Gertner (1993, p. 519) in his reference to Markowitz
(1952). Second, the deletion of all extreme bets likely leads to a significant
understatement of uncertainty about coefficient estimates. Third, the
regression does not correct for panel effects, and these could be significant
since the variables cash and stake are correlated with the individual.8 Hence
their coefficient estimates might be picking up other, unobservable effects
that are individual-specific.
The second result is also said to be inconsistent with EUT, in conjunction
with the first result. The logic is that stake and cash should have an equal
effect on terminal wealth, if one assumes perfect asset integration and that
utility is defined over terminal wealth. But one has a significant effect on
bets, and the other does not. Since the assumption that utility is defined over
terminal wealth and that asset integration is perfect are implicitly
maintained by Gertner (1993, p. 517ff.), he concludes that EUT is falsified.
However, one can include terminal wealth as an argument of utility without
also assuming perfect asset integration (e.g., Cox & Sadiraj, 2006). This is
also recognized explicitly by Gertner (1993, p. 519), who considers the
possibility that ‘‘contestants have multi-attribute utility functions, so that
they care about something in addition to wealth.’’9 Thus, if one accepts the
statistical caveats about samples and specifications for now, these results
point to the rejection of a particular, prominent version of EUT, but they do
not imply that all popular versions of EUT are invalid.
1.2. Jeopardy!
In the game show Jeopardy! there is a subgame referred to as Final
Jeopardy. At this point, three contestants have cash earnings from the initial
rounds. The skill component of the game consists of hearing some text read
out by the host, at which point the contestants jump in to state the question
that the text provides the answer to.10 In Final Jeopardy the contestants are
told the general subject matter for the task, and then have to privately and
simultaneously state a wager amount from their accumulated points. They
can wager any amount up to their earned endowment at that point, and are
rewarded with even odds: if they are correct they get that wager amount
added, but if they are incorrect they have that amount deducted. The winner
of the show is the contestant with the most cash after this final stage. The
winner gets to keep the earnings and come back the following day to try and
continue as champion.
In general, these wagers are affected by the risk attitudes of contestants.
But they are also affected by their subjective beliefs about their own skill
level relative to the other two contestants, and by what they think the other
contestants will do. So this game cannot be fully analyzed without making
some game-theoretic assumptions.
Jeopardy! was first aired in the United States in 1964, and continued until
1975. A brief season returned between 1978 and 1979, and then the modern
era began in 1984 and continues to this day. The format changes have been
relatively small, particularly during the modern era. The data used by
Metrick (1995) comes from shows broadcast between October 1989 and
January 1992, and reflects more than 1,150 decisions.
Metrick (1995) examines behavior in Final Jeopardy in two stages.11 The
first stage considers the subset of shows in which one contestant is so far
ahead in cash that the bet only reveals risk attitudes and beliefs about own
skill. In such ‘‘runaway games’’ there exist wagers that will ensure victory,
although there might be some rationale prior to September 2003 for
someone to bet an amount that could lead to a loss. Until then, the
champion had to retire after five wins, so if one had enough confidence in
one’s skill at answering such questions, one might rationally bet more than
was needed to ensure victory. After September 2003 the rules changed, so
the champion stays on until defeated.
In the runaway games Metrick (1995, p. 244) uses the same formula that
Gertner (1993) used for CARA utility functions. The only major difference
is that the probability of winning in Jeopardy! is not known objectively to
the observer.12 His solution is to substitute the observed fraction of correct
answers, akin to a rational expectations assumption, and then solve for the
CARA parameter a that accounts for the observed bets. The result is an
estimate of a equal to 0.000066 with a standard error of 0.000056. Thus,
there is slight evidence of risk aversion, but it is not statistically significant,
leading Metrick (1995, p. 245) to conclude that these contestants behaved in
a risk-neutral manner.
The second stage of the analysis considers subsamples in which two
players have accumulated scores that are sufficiently close that they have to
take beliefs about the other into account, but where there is a distant third
contestant who can be effectively ignored. Metrick (1995) cuts this Gordian
knot of strategic considerations by assuming that contestants view
themselves as betting against contestants whose behavior can be characterized by their observed empirical frequencies. He does not use these data to
make inferences about risk attitudes.
1.3. Lingo
The underlying game in Lingo involves a team of two people guessing a
hidden five-letter word. Fig. 3 illustrates one such game from the U.S.
version. The team is told the first letter of the word, and can then just state
words. If incorrect, the words that are tried are used to reveal letters in the
correct word if there are any. To take the example in Fig. 3, the true word
was STALL. So the initial S was shown. The team suggested SAINT and is
informed (by light grey coloring) that A and T are present in the correct
word. The team is not told the order of the letters A and T in the correct
word. The team then suggested STAKE, and was informed that the T and
A were in the right place (by grey coloring) and that no other letters were
in the correct word. The team then tried STAIR, SEATS, and finally
STALL. Most teams are able to guess the correct word in five rounds.

Fig. 3. The Word Puzzle in Lingo.
The game occurs in two stages. In the first stage, one team of two plays
against another team for several of these Lingo word-guessing games. The
couple with the most money then goes on to the second stage, which is
the one of interest for measuring risk attitudes because it is non-interactive.
So the winning couple comes into the main task with a certain earned
endowment (which could be augmented by an unrelated game called
‘‘jackpot’’). The team also comes in with some knowledge of its own ability
to solve these word-guessing puzzles.
In the Dutch data used by Beetsma and Schotman (2001), spanning 979
games, the frequency distribution of the number of solutions across rounds
1–5 in the final stage was 0.14, 0.32, 0.23, 0.13, and 0.081, respectively, with the remaining 0.089 of puzzles going unsolved. Every round that the couple requires to guess the word means
that they have to pick one ball from an urn affecting their payoffs, as
described below. If they do not solve the word puzzle, they have to pick six
balls. These balls determine if the team goes ‘‘bust’’ or ‘‘survives’’ something
called the Lingo Board in that round. An example of the Lingo Board is
shown in Fig. 4, from Beetsma and Schotman (2001, Fig. 3).13 There are 35
balls in the urn numbered from 1 to 35, plus one ‘‘golden ball.’’ If the golden
ball is picked then the team wins the cash prize for that round and gets a free
pass to the next round. If one of the numbered balls is picked, then the fate
of the team depends on the current state of the Lingo Board. The team goes
‘‘bust’’ if they get a row, column, or diagonal of X’s, akin to the parlor game
noughts and crosses. So solving the word puzzle in fewer moves is good,
since it means that fewer balls have to be drawn from the urn, and hence
that the survival probability is higher. In the example from Fig. 4, drawing a
5 would be fatal, drawing an 11 would not be, and drawing a 1 would not be
if a 2 or 8 had not been previously drawn.
If the team survives a round it gets a cash prize, and is asked if they want to
keep going or stop. This lasts for five rounds. So apart from the skill part of
the game, guessing the words, this is the only choice the team makes. This is
therefore a ‘‘stop-go’’ problem, in which the team balances current earnings
with the lottery of continuing and either earning more cash or going bust. If
the team chooses to continue the stake doubles; if the golden ball had been
drawn it is replaced in the urn. If the team goes bust it takes home nothing.
Teams can play the game up to three times, then retire from the show.
Fig. 4. Example of a Lingo Board.
Risk attitudes are involved when the team has to balance the current
earnings with the lottery of continuing. That lottery depends on subjective
beliefs about the skill level of the team, the state of the Lingo Board at that
point, and the perception of the probabilities of drawing a ‘‘fatal’’ number or
the golden ball. In many respects, apart from the skill factor and the relative
symmetry of prizes, this game is remarkably like DOND, as we see later.
Beetsma and Schotman (2001) evaluate data from 979 finals. Each final
lasts several rounds, so the sample of binary stop/continue decisions is
larger, and constitutes a panel. Average earnings in this final round in their
sample are 4,106 Dutch guilders (f), with potential earnings, given the initial stakes brought into the final, of around f 15,136. The average exchange rate in 1997, which is around when these data were from, was $0.514 per guilder, so these stakes are around $2,110 on average, and up to
roughly $7,780. These are not life-changing prizes, like the top prizes in
DOND, but are clearly substantial in relation to most lab experiments.
Beetsma and Schotman (2001, Section 4) show that the stop/continue
decisions have a simple monotonic structure if one assumes CRRA or
CARA utility. Since the odds of surviving never get better with more
rounds, if it is optimal to stop in one round then it will always be optimal to
stop in any later round. This property does not necessarily hold for other
utility functions. But for these utility functions, which are still an important
class, one can calculate a threshold survival probability $p^*_i$ for any round i
such that the team should stop if the actual survival probability falls below
it. This threshold probability does depend on the utility function and
parameter values for it, but in a closed-form fashion that can be easily
evaluated within a maximum-likelihood routine.14
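To see where such a threshold comes from, consider a single round stripped of everything else: ignore the golden ball and all option values, and suppose the stake W doubles on survival and is lost on a bust. The sketch below works under those stated simplifications; it is not the estimator in Beetsma and Schotman (2001).

```python
# With u(W) = p * u(2W) at indifference and CRRA utility u(m) = m**(1-r)/(1-r)
# for r < 1, the threshold is p* = 2**(r - 1), independent of the stake W --
# one reason CRRA keeps the monotonic stop rule so tractable.

def threshold_survival_prob(r: float) -> float:
    """Stop whenever the actual survival probability falls below this value."""
    assert r < 1.0, "the u(0) = 0 normalization used here requires r < 1"
    return 2.0 ** (r - 1.0)

for r in (0.0, 0.42, 0.8):          # 0.42 is their prize-only CRRA estimate
    print(r, round(threshold_survival_prob(r), 3))
```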
Each team can play the game three times before it has to retire as a
champion. The specification of the problem clearly recognizes the option
value in the first game of coming back to play the game a second or third
time, and then the option value in the second game of coming back to play a
third time. The certainty-equivalent of these option values depends, of
course, on the risk attitudes of the team. But the estimation procedure
‘‘black boxes’’ these option values to collapse the estimation problem down
to a static one: they are free parameters to be estimated along with the
parameter of the utility function. Thus, they are not constrained by the
expected returns and risk of future games, the functional form of utility, and
the specific parameters values being evaluated in the maximum-likelihood
routine. Beetsma and Schotman (2001, p. 839) do clearly check that the
option value in the first game exceeds the option value in the second game,
but (a) they only examine point estimates, and make no claim that this
difference is statistically significant,15 and (b) there is no check that the
absolute values of these option values are consistent with the utility function
and parameter values. In addition, there is no mention of any corrections for
the fact that each team makes several decisions, and that errors for that
team are likely correlated.
With these qualifications, the estimate of the CRRA parameter is 0.42,
with a standard error of 0.05, if one assumes that utility is only defined over
the monetary prizes. It rises to 6.99, with a standard error of 0.72, if one
assumes a baseline wealth level of f 50,000, which is the preferred estimate.
Each of these estimates is significantly different from 0, implying rejection of
risk neutrality in favor of risk aversion. The CARA specification generates
comparable estimates.
One extension is to allow for probability weighting on the actual survival
probability $p_i$ in round i. The weighting occurs in the manner of original
Prospect Theory, due to Kahneman and Tversky (1979), and not in the
rank-dependent manner of Quiggin (1982, 1993) and Cumulative Prospect
Theory. One apparent inconsistency is that the actual survival probabilities
are assumed to be weighted subjectively, but the threshold survival
probabilities $p^*_i$ are not, which seems odd (see their Eq. (18), p. 843). The
results show that estimates of the degree of concavity of the utility function
increase substantially, and that contestants systematically overweight the
actual survival probability. We return to some of the issues of structural
estimation of models assuming decision weights, in a rank-dependent
manner, in the discussion of DOND and Andersen, Harrison, Lau, and
Rutström (2006a, 2006b).
2. DEAL OR NO DEAL
2.1. The Game Show as a Natural Experiment
The basic version of DOND is the same across all countries. We explain the
general rules by focusing on the version shown in the United States, and
then consider variants found in other countries.
The show confronts the contestant with a sequential series of choices over
lotteries, and asks a simple binary decision: whether to play the (implicit)
lottery or take some deterministic cash offer. A contestant is picked from the
studio audience. They are told that a known list of monetary prizes, ranging
from $0.01 up to $1,000,000, has been placed in 26 suitcases.16 Each suitcase
is carried onstage by an attractive female model, and has a number from 1 to
26 associated with it. The contestant is informed that the money has been
put in the suitcase by an independent third party, and in fact it is common
that any unopened cases at the end of play are opened so that the audience
can see that all prizes were in play. Fig. 5 shows how the prizes are displayed
to the subject at the beginning of the game.

Fig. 5. Opening Display of Prizes in TV Game Show Deal or No Deal.
The contestant starts by picking one suitcase that will be ‘‘his’’ case. In
round 1, the contestant must pick 6 of the remaining 25 cases to be opened,
so that their prizes can be displayed. Fig. 6 shows how the display changes
after the contestant picks the first case: in this case the contestant
unfortunately picked the case containing the $300,000 prize. A good round
for a contestant occurs if the opened prizes are low, and hence the odds
increase that his case holds the higher prizes. At the end of each round the
host is phoned by a ‘‘banker’’ who makes a deterministic cash offer to the
contestant. In one of the first American shows (12/21/2005) the host made a
point of saying clearly that ‘‘I don’t know what’s in the suitcases, the banker
doesn’t, and the models don’t.’’

Fig. 6. Prizes Available After One Case Has Been Opened.
The initial offer in early rounds is typically low in comparison to expected
offers in later rounds. We use an empirical offer function later, but the
qualitative trend is quite clear: the bank offer starts out at roughly 10% of
the expected value of the unopened cases, and increments by about 10% of
that expected value for each round. This trend is significant, and serves to
keep all but extremely risk-averse contestants in the game for several
rounds. For this reason, it is clear that the case that the contestant ‘‘owns’’
has an option value in future rounds.
In round 2, the contestant must pick five cases to open, and then there is
another bank offer to consider. In succeeding rounds, 3–10, the contestant
must open 4, 3, 2, 1, 1, 1, 1, and 1 cases, respectively. At the end of round 9,
there are only two unopened cases, one of which is the contestant’s case.
In round 9 the decision is a relatively simple one from an analyst’s
perspective: either take the non-stochastic cash offer or take the lottery with
a 50% chance of either of the two remaining unopened prizes. We could
assume some latent utility function, and estimate parameters for that
function that best explains observed binary choices. Unfortunately,
relatively few contestants get to this stage, having accepted offers in earlier
rounds. In our data, only 9% of contestants reach that point. More serious
than the smaller sample size is that one naturally expects risk attitudes to
affect those surviving to this round. Thus, there would be a serious sample
attrition bias if one just studied choices in later rounds.
The bank offer gets richer and richer over time, ceteris paribus the random
realizations of opened cases. In other words, if each unopened case truly has
the same subjective probability of having any remaining prize, there is a
positive expected return to staying in the game for more and more rounds.
A risk-averse subject that might be just willing to accept the bank offer, if the
offer were not expected to get better and better, would choose to continue to
another round since the expected improvement in the bank offer provides
some compensation for the additional risk of going into another round.
Thus, to evaluate the parameters of some latent utility function given
observed choices in earlier rounds, we have to mentally play out all possible
future paths that the contestant faces.17 Specifically, we have to play out
those paths assuming the values for the parameters of the likelihood
function, since they affect when the contestant will decide to ‘‘deal’’ with the
banker, and hence the expected utility of the compound lottery. This
corresponds to procedures developed in the finance literature to price path-dependent derivative securities using Monte Carlo simulation (e.g., Campbell,
Lo, & MacKinlay, 1997, Section 9.4). We discuss general numerical methods
for this type of analysis later.
Saying ‘‘no deal’’ in early rounds provides one with the option of being
offered a better deal in the future, ceteris paribus the expected value of the
unopened prizes in future rounds. Since the process of opening cases is a
martingale process, even if the contestant gets to pick the cases to be opened,
it has a constant future expected value in any given round equal to the
current expected value. This implies, given the exogenous bank offers (as a
function of expected value), that the dollar value of the offer will get richer
and richer as time progresses. Thus, bank offers themselves will be a submartingale process.
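The martingale property is easy to verify numerically: opening cases uniformly at random leaves the expected value of the unopened set unchanged on average. The sketch below uses the familiar U.S. prize list; treat the exact amounts as illustrative, since any prize list makes the same point.

```python
# A quick numerical check that random case-opening is a martingale: the mean
# of the unopened prizes after round 1 equals, on average, the opening mean.
import random
import statistics

PRIZES = [0.01, 1, 5, 10, 25, 50, 75, 100, 200, 300, 400, 500, 750, 1_000,
          5_000, 10_000, 25_000, 50_000, 75_000, 100_000, 200_000, 300_000,
          400_000, 500_000, 750_000, 1_000_000]   # illustrative U.S. board

rng = random.Random(7)
post_round1_means = []
for _ in range(50_000):
    cases = PRIZES[:]
    rng.shuffle(cases)
    post_round1_means.append(statistics.mean(cases[6:]))   # six cases opened

print(round(statistics.mean(PRIZES)))              # EV before any case opens
print(round(statistics.mean(post_round1_means)))   # essentially the same
```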
In the U.S. version the contestants are joined after the first round by several
family members or friends, who offer suggestions and generally add to the
entertainment value. But the contestant makes the decisions. For example, in
the very first show a lady was offered $138,000, and her hyperactive husband
repeatedly screamed out ‘‘no deal!’’ She calmly responded, ‘‘At home, you do
make the decisions. But … we’re not at home!’’ She turned the deal down, as
it happens, and went on to take an offer of only $25,000 two rounds later.
Our sample consists of 141 contestants recorded between December 19,
2005 and May 6, 2007. This sample includes 6 contestants that participated in
special versions, for ratings purposes, in which the top prize was increased
from $1 million to $2 million, $3 million, $4 million, $5 million or $6
million.18 The biggest winner on the show so far has been Michelle Falco,
who was lucky enough to be on the September 22, 2006 show with a top prize
of $6 million. Her penultimate offer was $502,000 when the 3 unopened prizes
were $10, $750,000 and $1 million, which have an expected value of $583,337.
She declined the offer, and opened the $10 case, resulting in an offer of
$808,000 when the expected value of the two remaining prizes was $875,000.
She declined the offer, and ended up with $750,000 in her case.
In other countries there are several variations. In some cases there are fewer
prizes, and fewer rounds. In the United Kingdom there are only 22 monetary
prizes, ranging from 1p up to £250,000, and only 7 rounds. In round 1 the
contestant must pick 5 boxes, and then in each round until round 6 the
contestant has to open 3 boxes per round. So there can be a considerable
swing from round to round in the expected value of unopened boxes,
compared to the last few rounds of the U.S. version. At the end of round 6
there are only 2 unopened boxes, one of which is the contestant’s box.
Some versions substitute the option of switching the contestant’s box for
an unopened box, instead of a bank offer. This is particularly common in
the French and Italian versions, and relatively rare in other versions.
Things become much more complex in those versions in which the bank
offer in any round is statistically informative about the prize in the
contestant’s case. In that case the contestant has to make some correction
for this possibility, and also consider the strategic behavior of the banker’s
offer. Bombardini and Trebbi (2005) offer clear evidence that this occurs in
the Italian version of the show, but there is no evidence that it occurs in the
U.K. version.
The Australian version offers several additional options at the end of the
normal game, called Chance, SuperCase, and Double Or Nothing. In many
cases they are used as ‘‘entertainment filler,’’ for games that otherwise would
finish before the allotted 30 min. It has been argued, most notably by
Mulino, Scheelings, Brooks, and Faff (2006), that these options should
rationally change behavior in earlier rounds, since they provide some
uncertain ‘‘insurance’’ against saying ‘‘deal’’ earlier rather than later.
2.2. Comparable Laboratory Experiments
We also implemented laboratory versions of the DOND game, to
complement the natural experimental data from the game shows.19 The
instructions were provided by hand and read out to subjects to ensure that
every subject took some time to digest them. As far as possible, they rely on
screen shots of the software interface that the subjects were to use to enter
their choices. The opening page for the common practice session in the lab,
shown in Fig. 7, provides the subject with basic information about the task
before them, such as how many boxes there were and how many boxes
needed to be opened in any round.20 In the default setup the subject was
given the same frame as in the Australian and U.S. game shows: this version
has more prizes (26 instead of 22) and more rounds (9 instead of 6) than the
U.K. version.

Fig. 7. Opening Screen Shot for Laboratory Experiment.
After clicking on the ‘‘Begin’’ box, the lab subject was given the main
interface, shown in Fig. 8. This provided the basic information for the
DOND task. The presentation of prizes was patterned after the displays
used on the actual game shows. The prizes are shown in the same nominal
denomination as the Australian daytime game show, and the subject was told
that an exchange rate of 1,000:1 would be used to convert earnings in the
DOND task into cash payments at the end of the session. Thus, the top cash
prize the subject could earn was $200 in this version.

Fig. 8. Prize Distribution and Display for Laboratory Experiment.
The subject was asked to click on a box to select ‘‘his box,’’ and then
round 1 began. In the instructions we illustrated a subject picking box #26,
and then six boxes, so that at the end of round 1 he was presented with a
deal from the banker, shown in Fig. 9. The prizes that had been opened in
round 1 were ‘‘shaded’’ on the display, just as they are in the game show
display. The subject is then asked to accept $4,000 or continue. When the
game ends the DOND task earnings are converted to cash using the
exchange rate, and the experimenter is prompted to come over and record
those earnings. Each subject played at their own pace after the instructions
were read aloud.

Fig. 9. Typical Bank Offer in Laboratory Experiment.
One important feature of the experimental instructions was to explain
how bank offers would be made. The instructions explained the concept of
the expected value of unopened prizes, using several worked numerical
examples in simple cases. Then subjects were told that the bank offer would
be a fraction of that expected value, with the fractions increasing over the
rounds as displayed in Fig. 10. This display was generated from Australian
game show data available at the time. We literally used the parameters
defining the function shown in Fig. 10 when calculating offers in the
experiment, rounding to the nearest dollar.

Fig. 10. Information on Bank Offers in Laboratory Experiment. (The figure plots the bank offer as a fraction of the expected value of unopened cases against rounds 1–9, with the fraction rising over the rounds.)
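Under these rules a bank offer is just a round-specific fraction of the expected value of the unopened prizes, rounded to the nearest dollar. The sketch below illustrates the calculation; the fraction schedule is a hypothetical stand-in for the estimated Australian parameters, which Fig. 10 displays but the text does not tabulate.

```python
# Lab bank offer: a round-specific fraction of the expected value of the
# unopened prizes, rounded to the nearest dollar. The fractions are
# illustrative only, not the parameters actually used in the experiment.
from statistics import mean

OFFER_FRACTION = {1: 0.15, 2: 0.25, 3: 0.35, 4: 0.45, 5: 0.55,
                  6: 0.65, 7: 0.75, 8: 0.85, 9: 0.95}     # hypothetical

def bank_offer(unopened_prizes: list[float], game_round: int) -> int:
    return round(OFFER_FRACTION[game_round] * mean(unopened_prizes))

print(bank_offer([10, 7_500, 50_000, 200_000], game_round=5))  # -> 35408
```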
The subjects for our laboratory experiments were recruited from the
general student population of the University of Central Florida in 2006.21
We have information on 676 choices made by 89 subjects.
We estimate the same models for the lab data as for the U.S. game show
data. We are not particularly interested in getting the same quantitative
estimates per se, since the samples, stakes, and context differ in obvious ways.
Instead our interest is whether we obtain the same qualitative results: is the
lab reliable in terms of the qualitative inferences one draws from it? Our null
hypothesis is that the lab results are the same as the naturally occurring
results. If we reject this hypothesis one could infer that we have just not run
the right lab experiments in some respect, and we have some sympathy for
that view. On the other hand, we have implemented our lab experiments in
exactly the manner that we would normally do as lab experimenters. So we
are definitely able to draw conclusions in this domain about the reliability of
conventional lab tests compared to comparable tests using naturally
occurring data. These conclusions would then speak to the questions raised
by Harrison and List (2004) and Levitt and List (2007) about the reliability
of lab experiments.
2.3. Other Analyses of Deal or No Deal
A large literature on DOND has evolved quickly.22 Appendix B in the
working paper version documents in detail the modeling strategies adopted
in the DOND literature, and similarities and differences to the approach we
propose.23 In general, three types of empirical strategies have been employed
to model observed DOND behavior.
The first empirical strategy is the calculation of CRRA bounds at which a
given subject is indifferent between one choice and another. These bounds
can be calculated for each subject and each choice, so they have the
advantage of not assuming that each subject has the same risk preferences,
just that they use the same functional form. The studies differ in terms of
how they use these bounds, as discussed briefly below. The use of bounds
such as these is familiar from the laboratory experimental literature on risk
aversion: see Holt and Laury (2002), Harrison, Johnson, McInnes, and
Rutström (2005), and Harrison, Lau, Rutström, and Sullivan (2005) for
discussion of how one can then use interval regression methods to analyze
them. The limitation of this approach, discussed in Harrison and Rutström
(2008, Section 2.1), is that it is difficult to go beyond the CRRA or other
one-parameter families, and in particular to examine other components of
choice under uncertainty (such as more flexible utility functions, preference
weighting or loss aversion).24 Post, van den Assem, Baltussen, and Thaler
(2006) use CRRA bounds in their analysis, and it has been employed in
various forms by others as noted below.
The second empirical strategy is the examination of specific choices that
provide ‘‘trip wire’’ tests of certain propositions of EUT, or provide
qualitative indicators of preferences. For example, decisions made in the
very last rounds often confront the contestant with the expected value of the
unopened prizes, and allow one to identify those who are risk loving or risk
averse directly. The limitation of this approach is that these choices are
subject to sample selection bias, since risk attitudes and other preferences
presumably played some role in whether the contestant reached these critical
junctures. Moreover, they provide limited information at best, and do not
allow one to define a metric for errors. If we posit some stochastic error
specification for choices, as is now common, then one has no way of
knowing if these specific choices are the result of such errors or a
manifestation of latent preferences. Blavatskyy and Pogrebna (2006)
illustrate the sustained use of this type of empirical strategy, which is also
used by other studies in some respects.
The third empirical strategy is to propose a latent decision process and
estimate the structural parameters of that process using maximum
likelihood. This is the approach we favor, since it allows one to examine
structural issues rather than rely on ad hoc proxies for underlying
preferences. Harrison and Rutström (2008, Section 2.2) discuss the general
methodological advantages of this approach.
3. A GENERAL ESTIMATION STRATEGY
The DOND game is a dynamic stochastic task in which the contestant has to
make choices in one round that generally entail consideration of future
consequences. The same is true of the other game shows used for estimation
of risk attitudes. In Card Sharks the level of bets in one round generally
affects the scale of bets available in future rounds, including bankruptcy, so
for plausible preference structures one should take this effect into account
when deciding on current bets. Indeed, as explained earlier, one of the
empirical strategies employed by Gertner (1993) can be viewed as a precursor
to our general method. In Lingo the stop/continue structure, where a certain
amount of money is being compared to a virtual money lottery, is evident.
We propose a general estimation strategy for such environments, and apply it
to DOND. The strategy uses randomization to break the general ‘‘curse of
dimensionality’’ that is evident if one considers this general class of dynamic
programming problems (Rust, 1997).
3.1. Basic Intuition
The basic logic of our approach can be explained from the data and
simulations shown in Table 1. We restrict attention here to the first 75
contestants that participated in the standard version of the television game
with a top prize of $1 million, to facilitate comparison of dollar amounts.
There are nine rounds in which the banker makes an offer, and in round 10
the contestant simply opens his case. Only 7 contestants, or 9% of the sample
of 75, continued to round 10, with most accepting the banker’s offer in rounds
6, 7, 8, and 9. The average offer is shown in column 4. We stress that this
offer is stochastic from the perspective of the sample as a whole, even if it is
non-stochastic to the specific contestant in that round. Thus, to see the logic
of our approach from the perspective of the individual decision-maker, think
of the offer as a non-stochastic number, using the average values shown as a
proximate indicator of the value of that number in a particular instance.
In round 1 the contestant might consider up to nine VLs. He might
look ahead one round and contemplate the outcomes he would get if
he turned down the offer in round 1 and accepted the offer in round 2. This
VL, realized in virtual round 2 in the contestant’s thought experiment,
would generate an average payoff of $31,141 with a standard deviation of
$23,655. The top panel of Fig. 11 shows the simulated distribution of this
particular lottery. The distributions of payoffs to these VLs are highly
skewed, so the standard deviation may be slightly misleading if one thinks of
these as Gaussian distributions. However, we just use the standard deviation
as one pedagogic indicator of the uncertainty of the payoff in the VL: in our
formal analysis we consider the complete distribution of the VL in a nonparametric manner.
Table 1. Virtual Lotteries for US Deal or No Deal Game Show.

Round   Active Contestants   Deal!   Average Offer
  1        75 (100%)            0        $16,180
  2        75 (100%)            0        $33,453
  3        75 (100%)            0        $54,376
  4        75 (100%)            1        $75,841
  5        74  (99%)            5       $103,188
  6        69  (92%)           16       $112,818
  7        53  (71%)           20       $119,746
  8        33  (44%)           16       $107,779
  9        17  (23%)           10        $79,363
 10         7   (9%)            –              –

Looking at virtual lottery realized in round … (mean, with standard deviation in parentheses):

From round 1: round 2 $31,141 ($23,655); round 3 $53,757 ($45,996); round 4 $73,043 ($66,387); round 5 $97,275 ($107,877); round 6 $104,793 ($102,246); round 7 $120,176 ($121,655); round 8 $131,165 ($154,443); round 9 $136,325 ($176,425); round 10 $136,281 ($258,856).
From round 2: round 3 $53,535 ($46,177); round 4 $72,588 ($66,399); round 5 $96,887 ($108,086); round 6 $104,369 ($102,222); round 7 $119,890 ($121,492); round 8 $130,408 ($133,239); round 9 $135,877 ($175,278); round 10 $135,721 ($257,049).
From round 3: round 4 $73,274 ($65,697); round 5 $97,683 ($107,302); round 6 $105,117 ($101,271); round 7 $120,767 ($120,430); round 8 $131,563 ($153,058); round 9 $136,867 ($173,810); round 10 $136,636 ($255,660).
From round 4: round 5 $99,895 ($108,629); round 6 $107,290 ($101,954); round 7 $123,050 ($120,900); round 8 $134,307 ($154,091); round 9 $139,511 ($174,702); round 10 $139,504 ($257,219).
From round 5: round 6 $111,964 ($106,137); round 7 $128,613 ($126,097); round 8 $140,275 ($160,553); round 9 $145,710 ($180,783); round 10 $145,757 ($266,303).
From round 6: round 7 $128,266 ($124,945); round 8 $139,774 ($159,324); round 9 $145,348 ($180,593); round 10 $145,301 ($266,781).
From round 7: round 8 $136,720 ($154,973); round 9 $142,020 ($170,118); round 10 $142,323 ($246,044).
From round 8: round 9 $116,249 ($157,005); round 10 $116,020 ($223,979).
From round 9: round 10 $53,929 ($113,721).

Note: Data drawn from observations of contestants on the U.S. game show, plus the authors’ simulations of virtual lotteries as explained in the text.
Fig. 11. Two Virtual Lottery Distributions in Round 1. (Top panel: density of the VL if No Deal in round 1 and then Deal in round 2; bottom panel: density of the VL if No Deal in rounds 1 and 2 and then Deal in round 3. Horizontal axis: prize value, $0–$200,000.)
In round 1 the contestant can also consider what would happen if he turned
down offers in rounds 1 and 2, and accepted the offer in round 3. This VL
would generate, from the perspective of round 1, an average payoff of $53,757
with a standard deviation of $45,996. The bottom panel of Fig. 11 shows the
simulated distribution of this particular VL. Compared to the VL in which the
contestant said ‘‘No Deal’’ in round 1 and ‘‘Deal’’ in round 2, shown above it
in Fig. 11, it gives less weight to the smallest prizes and greater weight to
higher prizes. Similarly for each of the other VLs shown. The VL for the final
Round 10 is simply the implied lottery over the final two unopened cases, since
in this round the contestant would have said ‘‘No Deal’’ to all bank offers.
The forward-looking contestant in round 1 is assumed to behave as if he
maximizes the expected utility of accepting the current offer or continuing.
The expected utility of continuing, in turn, is given by simply evaluating
each of the nine VLs shown in the first row of Table 1. The average payoff
increases steadily, but so does the standard deviation of payoffs, so this
evaluation requires knowledge of the utility function of the contestant.
Given that utility function, the contestant is assumed to behave as if they
evaluate the expected utility of each of the nine VLs. Thus, we calculate nine
expected utility numbers, conditional on the specification of the parameters
of the assumed utility function and the VLs that each subject faces in their
round 1 choices. In round 1, the subject then simply compares the maximum
of these nine expected utility numbers to the utility of the non-stochastic
offer in round 1. If that maximum exceeds the utility of the offer, he turns
down the offer; otherwise he accepts it.
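Stated in code, this round-1 decision rule is short. In the minimal sketch below, the virtual lotteries and the square-root utility in the usage lines are toy placeholders, not the estimated model; each virtual lottery is a list of (prize, probability) pairs produced by the simulations.

```python
# The decision rule: accept the bank offer only if its utility weakly exceeds
# the best expected utility over the available virtual lotteries.
import math
from typing import Callable, List, Tuple

VirtualLottery = List[Tuple[float, float]]      # (prize, probability) pairs

def expected_utility(vl: VirtualLottery, u: Callable[[float], float]) -> float:
    return sum(p * u(z) for z, p in vl)

def says_deal(offer: float, vls: List[VirtualLottery],
              u: Callable[[float], float]) -> bool:
    """Accept the offer unless some virtual lottery has higher expected utility."""
    return u(offer) >= max(expected_utility(vl, u) for vl in vls)

# Illustration with two toy virtual lotteries and square-root utility:
vls = [[(10_000, 0.5), (60_000, 0.5)], [(1_000, 0.5), (120_000, 0.5)]]
print(says_deal(30_000.0, vls, u=math.sqrt))    # False: keep playing
```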
In round 2, a similar process occurs. One critical feature of our VL
simulations is that they are conditioned on the actual outcomes that each
contestant has faced in prior rounds. Thus, if a (real) contestant has
tragically opened up the six top prizes in round 1, that contestant would not
see VLs such as the ones in Table 1 for round 2. They would be conditioned
on that player’s history in round 1. We report here averages over all players
and all simulations. We undertake 100,000 simulations for each player in
each round, so as to condition on their history.25
This example can also be used to illustrate how our maximum-likelihood
estimation procedure works. Assume some specific utility function and some
parameter values for that utility function, with all prizes scaled by the
maximum possible at the outset of the game. The utility of the non-stochastic bank offer in round R is then directly evaluated. Similarly, the
VLs in each round R can then be evaluated.26 They are represented
numerically as 100-point discrete approximations, with 100 prizes and 100
probabilities associated with those prizes. Thus, by implicitly picking a VL
over an offer, it is as if the subject is taking a draw from this 100-point
distribution of prizes. In fact, they are playing out the DOND game, but this
representation as a VL draw is formally identical. The evaluation of these
VLs generates v(R) expected utilities, where v(1) = 9, v(2) = 8, …, v(9) = 1
as shown in Table 1. The maximum expected utility of these v(R) in a given
round R is then compared to the utility of the offer, and the likelihood
evaluated in the usual manner.
We present a formal statement of the latent EUT process leading to a
likelihood defined over parameters and the observed choices, and then discuss
how this intuition changes when we assume alternative, non-EUT processes.
3.2. Formal Specification
We assume that utility is defined over money m using the popular CRRA
function
$$ u(m) = \frac{m^{1-r}}{1-r} \qquad (1) $$
where r is the utility function parameter to be estimated. In this case $r \neq 1$ is the RRA coefficient, and u(m) = ln(m) for r = 1. With this parameterization r = 0 denotes risk-neutral behavior, r > 0 denotes risk aversion, and r < 0
denotes risk loving. We review one extension to this simple CRRA model
later, but for immediate purposes it is desirable to have a simple specification
of the utility function in order to focus on the estimation methodology.27
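As a quick check of this taxonomy, the certainty equivalent implied by Eq. (1) falls below the expected value of a lottery exactly when r > 0. The lottery amounts in the sketch below are hypothetical.

```python
# For a 50/50 lottery over $100 and $900 (EV = $500), the certainty
# equivalent u^{-1}(EU) is below $500 for r > 0, equal at r = 0, above for r < 0.
import math

def u(m: float, r: float) -> float:
    """CRRA utility of Eq. (1), with the logarithmic form at r = 1."""
    return math.log(m) if abs(r - 1.0) < 1e-9 else m ** (1.0 - r) / (1.0 - r)

def u_inv(v: float, r: float) -> float:
    return math.exp(v) if abs(r - 1.0) < 1e-9 else (v * (1.0 - r)) ** (1.0 / (1.0 - r))

for r in (-0.5, 0.0, 0.5):
    eu = 0.5 * u(100.0, r) + 0.5 * u(900.0, r)
    print(r, round(u_inv(eu, r)))    # certainty equivalents: 581, 500, 400
```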
Probabilities for each outcome k, $p_k$, are those that are induced by the task, so expected utility is simply the probability-weighted utility of each outcome in each lottery. There were 100 outcomes in each VL i, so

$$ EU_i = \sum_{k=1,100} [p_k \cdot u_k] \qquad (2) $$
Of course, we can view the bank offer as being a degenerate lottery.
A simple stochastic specification was used to specify likelihoods
conditional on the model. The EU for each lottery pair was calculated for
a candidate estimate of the utility function parameters, and the index
$$ \nabla EU = \frac{EU_{BO} - EU_L}{\mu} \qquad (3) $$
is calculated, where $EU_L$ is the lottery in the task, $EU_{BO}$ the degenerate lottery given by the bank offer, and $\mu$ a Fechner noise parameter following Hey and Orme (1994).28 The index $\nabla EU$ is then used to define the cumulative probability of the observed choice to ‘‘Deal’’ using the cumulative standard normal distribution function:
$$ G(\nabla EU) = \Phi(\nabla EU) \qquad (4) $$
This provides a simple stochastic link between the latent economic model
and observed choices.29 The likelihood, conditional on the EUT model being
true and the use of the CRRA utility function, depends on the estimate of r
and $\mu$ given the above specification and the observed choices. The
conditional log-likelihood is
\[
\ln L_{EUT}(r, \mu; y) = \sum_i \big[ \left(\ln G(\nabla EU) \mid y_i = 1\right) + \left(\ln (1 - G(\nabla EU)) \mid y_i = 0\right) \big] \qquad (5)
\]
where y_i = 1 (0) denotes the choice of ‘‘Deal’’ (‘‘No Deal’’) in task i.
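Pulling Eqs. (2)–(5) together, the sketch below evaluates the conditional log-likelihood for one candidate pair (r, μ). Each task is assumed to supply the scaled bank offer, the set of 100-point virtual lotteries for that round, and the observed choice; the data structure and names are ours, and the maximum over virtual lotteries anticipates the forward-looking extension described next.

```python
import numpy as np
from scipy.stats import norm

def log_likelihood(r, mu, tasks):
    """Conditional log-likelihood of the EUT + CRRA model (Eqs. 2-5).

    tasks: iterable of (bank_offer, virtual_lotteries, y), where
      bank_offer        - scaled offer in (0, 1]
      virtual_lotteries - list of (prizes, probs) 100-point lotteries
      y                 - 1 if 'Deal' was chosen, 0 if 'No Deal'."""
    def u(m):
        return np.log(m) if r == 1.0 else m ** (1.0 - r) / (1.0 - r)

    ll = 0.0
    for bank_offer, virtual_lotteries, y in tasks:
        eu_bo = u(bank_offer)                        # degenerate lottery: the offer
        eu_l = max(np.sum(probs * u(prizes))         # Eq. (2) for each VL;
                   for prizes, probs in virtual_lotteries)  # contestant uses the max
        index = (eu_bo - eu_l) / mu                  # Eq. (3): Fechner index
        p_deal = norm.cdf(index)                     # Eq. (4): probit link
        ll += np.log(p_deal) if y == 1 else np.log(1.0 - p_deal)  # Eq. (5)
    return ll
```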
We extend this standard formulation to include forward-looking behavior
by redefining the lottery that the contestant faces. One such VL reflects the
possible outcomes if the subject always says ‘‘No Deal’’ until the end of the
game and receives his prize. We call this a VL since it need not happen; it
does happen in some fraction of cases, and it could happen for any subject.
Similarly, we can substitute other VLs reflecting other possible choices by
the contestant. Just before deciding whether to accept the bank offer in
round 1, what if the contestant behaves as if the following simulation were
repeated G times:
Play out the remaining eight rounds and pick cases at random until all but two cases are
opened. Since this is the last round in which one would receive a bank offer, calculate the
expected value of the remaining two cases. Then multiply that expected value by the
fraction that the bank is expected to use in round 9 to calculate the offer. Pick that fraction
from a prior as to the average offer fraction, recognizing that the offer fraction is stochastic.
The end result of this simulation is a sequence of G virtual bank offers in
round 9, viewed from the perspective of round 1. This sequence then defines
the VL to be used for a contestant in round 1 whose horizon is the last
round in which the bank will make an offer. Each of the G bank offers in
this virtual simulation occurs with probability 1/G, by construction. To keep
things numerically manageable, we can then take a 100-point discrete
approximation of this lottery, which will typically consist of G distinct real
values, where one would like G to be relatively large (we use G = 100,000).
This simulation is conditional on the six cases that the subject has already
selected at the end of round 1. Thus, the lottery reflects the historical fact of
the six specific cases that this contestant has already opened.
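A sketch of this simulation for a contestant at the end of round 1 follows. Since prize locations are exchangeable from the contestant's perspective, opening cases at random until two remain is equivalent to drawing the two survivors directly, which the code exploits; the truncated normal prior on the offer fraction is a placeholder assumption, not the empirical offer function.

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate_round9_offers(remaining_prizes, G=100_000,
                           frac_mean=0.9, frac_sd=0.05):
    """Virtual bank offers in round 9, viewed from the end of round 1.

    remaining_prizes: prize values still unopened after the contestant's
    round-1 picks (the contestant's own case is among them; by
    exchangeability the two survivors are a uniform random pair)."""
    prizes = np.asarray(remaining_prizes, dtype=float)
    offers = np.empty(G)
    for g in range(G):
        survivors = rng.choice(prizes, size=2, replace=False)  # last two cases
        ev = survivors.mean()                            # their expected value
        frac = max(rng.normal(frac_mean, frac_sd), 0.0)  # stochastic offer fraction
        offers[g] = frac * ev                            # virtual round-9 offer
    return offers
```

Each of the G offers occurs with probability 1/G, and the result can then be compressed to the 100-point approximation sketched earlier.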
The same process can be repeated for a VL that only involves looking
forward to the expected offer in round 8. And for a VL that only involves
looking forward to rounds 7, 6, 5, 4, 3, and 2, respectively. Table 1 illustrates
the outcome of such calculations. The contestant can be viewed as having a
set of nine VLs to compare, each of which entails saying ‘‘No Deal’’ in
round 1. The different VLs imply different choices in future rounds, but the
same response in round 1.
To decide whether to accept the deal in round 1, we assume that the
subject simply compares the maximum EU over these nine VLs with the
utility of the deterministic offer in round 1. To calculate EU and utility of
the offer one needs to know the parameters of the utility function, but these
are just nine EU evaluations and one utility evaluation. These evaluations
can be undertaken within a likelihood function evaluator, given candidate
values of the parameters of the utility function.
The same process can be repeated in round 2, generating another set of
eight VLs to be compared to the actual bank offer in round 2. This
simulation would not involve opening as many cases, but the logic is the
same. Similarly for rounds 3–9. Thus, for each of round 1–9, we can
compare the utility of the actual bank offer with the maximum EU of the
VLs for that round, which in turn reflects the EU of receiving a bank offer in
future rounds in the underlying game.
In addition, there exists a VL in which the subject says ‘‘No Deal’’ in every
round. This is the VL that we view as being realized in round 10 in Table 1.
There are several significant advantages of this VL approach. First, since
the round associated with the highest expected utility is not the same for all
contestants due to heterogeneity in risk attitudes, it is of interest to estimate
the length of this horizon. Since we can directly see that the contestant who
has a short horizon behaves in essentially the same manner as the contestant
who has a longer horizon, and just substitutes different VLs into their latent
EUT calculus, it is easy to test hypotheses about restrictions on the horizon
generated by more myopic behavior. Second, one can specify mixture
models of different horizons, and let the data determine what fraction of the
sample employs which horizon. Third, the approach generalizes for any
known offer function, not just the ones assumed here and in Table 1. Thus,
it is not as specific to the DOND task as it might initially appear. This is
important if one views DOND as a canonical task for examining
fundamental methodological aspects of dynamic choice behavior. Those
methods should not exploit the specific structure of DOND, unless there is
no loss in generality. In fact, other versions of DOND can be used to
illustrate the flexibility of this approach, since they sometimes employ
‘‘follow on’’ games that can simply be folded into the VL simulation.
Finally, and not least, this approach imposes virtually no numerical burden
on the maximum-likelihood optimization part of the numerical estimation
stage: all that the likelihood function evaluator sees in a given round is a
non-stochastic bank offer, a handful of (virtual) lotteries to compare it
to given certain proposed parameter values for the latent choice model, and
the actual decision of the contestant to accept the offer or not. This
parsimony makes it easy to examine non-CRRA and non-EUT specifications of the latent dynamic choice process, illustrated in Andersen et al.
(2006a, 2006b).
All estimates allow for the possibility of correlation between responses by
the same subject, so the standard errors on estimates are corrected for the
possibility that the responses are clustered for the same subject. The use of
clustering to allow for ‘‘panel effects’’ from unobserved individual effects is
common in the statistical survey literature.30 In addition, we consider
allowances for random effects from unobserved individual heterogeneity31
after estimating the initial model that assumes that all subjects have the
same preferences for risk.
3.3. Estimates from Behavior on the Game Show
We estimate the CRRA coefficient to be 0.18 with a standard error of 0.030,
implying a 95% confidence interval between 0.12 and 0.24. So this provides
evidence of moderate risk aversion over this large domain. The noise
parameter μ is estimated to be 0.077, with a standard error of 0.015.
Based on the estimated risk coefficients we can calculate the future round
for which each contestant had the highest expected utility, seen from the
perspective of the round when each decision is made. Fig. 12 displays
histograms of these implied maximum EU rounds for each round-specific
decision. For example, when contestants are in round 1 making a decision
over ‘‘Deal’’ or ‘‘No Deal’’ we see that there is a strong mode for future
round 9 as being the round with the maximum EU, given the estimated risk
coefficients. The prominence of round 9 remains across all rounds where
contestants are faced with a ‘‘Deal’’ or ‘‘No Deal’’ choice, although we can
see that in rounds 5–7 there is a slight increase in the frequency with which
earlier rounds provide the maximum EU for some contestants. The expected
utilities for other VLs may well have generated the same binary decision, but
the VL for round 9 was the one that appeared to be used, since it had the
greatest expected utility.

[Fig. 12. Evaluation Horizon by Round. Nine panels, one for each decision
round 1–9, each showing a histogram of the future round (1–10) that
provided the maximum expected utility; the vertical axis is frequency
(0–150) and the horizontal axis is the future round used.]
We assume in the above analysis that all contestants can and do evaluate
the expected utility for all VLs defined as the EU of bank offers in future
rounds. Nevertheless, it is possible that some, perhaps all, contestants used a
more myopic approach and evaluated EU over much shorter horizons. It is
a simple matter to examine the effects of constraining the horizon over
which the contestant is assumed to evaluate options. If one assumed that
choices in each round were based on a comparison of the bank offer and the
expected outcome from the terminal round, ignoring the possibility that the
maximum EU may be found for an intervening round, then the CRRA
estimate becomes 0.12, with a 95% confidence interval between 0.10 and
0.15. We cannot reject the hypothesis that subjects behave as if they are less
risk averse if they are only assumed to look to the terminal round, and
ignore the intervening bank offers. If one instead assumes that choices in
each round were based on a myopic horizon, in which the contestant just
considers the distribution of likely offers in the very next round, the CRRA
estimate becomes 0.22, with a 95% confidence interval between 0.18 and
0.42. Thus, we obtain results that are similar to those obtained when we
allow subjects to consider all horizons, although the estimates are biased
and imply greater risk aversion than the unconstrained estimates. The
estimated noise parameter increases to 0.12, with a standard error of 0.043.
Overall, the estimates assuming myopia are statistically significantly
different from the unconstrained estimates, even if the estimates of risk
attitudes are substantively similar.
Our specification of alternative evaluation horizons does not lead to a
nested hypothesis test of parameter restrictions, so a formal test of the
differences in these estimates requires a non-nested hypothesis test. We use
the popular Vuong (1989) procedure, even though it has some strong
assumptions discussed in Harrison and Rutström (2005). We find that we
can reject the hypothesis that the evaluation horizon is only the terminal
horizon with a p-value of 0.026, and also reject the hypothesis that the
evaluation horizon is myopic with a p-value of less than 0.0001.
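For readers unfamiliar with the mechanics, here is a minimal sketch of the basic Vuong (1989) statistic, computed from the per-observation log-likelihood contributions of two non-nested models evaluated at their respective ML estimates. The names are ours, and this simple version omits any adjustment for clustered responses.

```python
import numpy as np
from scipy.stats import norm

def vuong_test(ll_a, ll_b):
    """Basic Vuong (1989) z-statistic from per-observation log-likelihood
    contributions of two non-nested models. Positive and significant
    favors model A; negative and significant favors model B."""
    d = np.asarray(ll_a) - np.asarray(ll_b)    # pointwise LR contributions
    n = d.size
    z = np.sqrt(n) * d.mean() / d.std()        # standardized mean difference
    p = 2.0 * (1.0 - norm.cdf(abs(z)))         # two-sided p-value
    return z, p
```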
Finally, we can consider the validity of the CRRA assumption in this
setting, by allowing for varying RRA with prizes. One natural candidate
utility function to replace (1) is the Hyperbolic Absolute Risk Aversion
(HARA) function of Merton (1971). We use a specification of HARA32
given in Gollier (2001):
\[
U(y) = \zeta \left( \eta + \frac{y}{\gamma} \right)^{1-\gamma}, \quad \gamma \neq 0 \qquad (1')
\]
where the parameter ζ can be set to 1 for estimation purposes without loss of
generality. This function is defined over the domain of y such that
η + y/γ > 0. The first-order derivative with respect to income is
\[
U'(y) = \frac{\zeta (1-\gamma)}{\gamma} \left( \eta + \frac{y}{\gamma} \right)^{-\gamma}
\]
which is positive if and only if ζ(1 − γ)/γ > 0 for the given domain of y. The
second-order derivative is
\[
U''(y) = -\frac{\zeta (1-\gamma)}{\gamma} \left( \eta + \frac{y}{\gamma} \right)^{-\gamma - 1} < 0
\]
which is negative for the given domain of y. Hence it is not possible to
specify risk-loving behavior with this specification when non-satiation is
assumed. This is not a particularly serious restriction for a model of
aggregate behavior in DOND. With this specification ARA is 1/(η + y/γ), so
the inverse of ARA is linear in income; RRA is y/(η + y/γ), which can both
increase and decrease with income. Relative risk aversion is independent of
income and equal to γ when η = 0.
Using the HARA utility function, we estimate η to be 0.30, with a standard
error of 0.070 and a 95% confidence interval between 0.15 and 0.43. Thus, we
can easily reject the assumption of CRRA over this domain. We estimate γ to be
0.992, with a standard error of 0.001. Evaluating RRA over various prize levels
reveals an interesting pattern: RRA is virtually 0 for all prize levels up to around
$10,000, when it becomes 0.03, indicating very slight risk aversion. It then
increases sharply as prize levels increase. At $100,000 RRA is 0.24, at $250,000
it is 0.44, at $500,000 it is 0.61, at $750,000 it is 0.70, and finally at $1 million it is
0.75. Thus, we observe striking evidence of risk neutrality for small stakes, at
least within the context of this task, and risk aversion for large stakes.
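The quoted pattern follows directly from RRA(y) = y/(η + y/γ) with prizes expressed as fractions of the $1 million maximum; the sketch below evaluates it at the point estimates η = 0.30 and γ = 0.992, matching the figures in the text to within about 0.01, with the residual differences presumably reflecting rounding of the published estimates.

```python
def hara_rra(y, eta=0.30, gamma=0.992):
    """Relative risk aversion under HARA: RRA(y) = y / (eta + y / gamma),
    with y the prize expressed as a fraction of the $1 million maximum."""
    return y / (eta + y / gamma)

for dollars in (10_000, 100_000, 250_000, 500_000, 750_000, 1_000_000):
    y = dollars / 1_000_000
    print(f"${dollars:>9,}: RRA = {hara_rra(y):.2f}")
# Prints approximately 0.03, 0.25, 0.45, 0.62, 0.71, 0.76
```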
If contestants are constrained to only consider the options available to
them in the next round, roughly the same estimates of risk attitudes obtain,
even if one can again statistically reject this implicit restriction. RRA is
again overestimated, reaching 0.39 for prizes of $100,000, 0.61 for prizes of
$250,000, and 0.86 for prizes of $1 million. On the other hand, assuming
that contestants only evaluate the terminal option leads to much lower
estimates of risk aversion, consistent with the findings assuming CRRA. In
this case there is virtually no evidence of risk aversion at any prize level up
to $1 million, which is clearly implausible a priori.
3.4. Approximation to the Fully Dynamic Path
Our VL approach makes one simplifying assumption which dramatically
enhances its ability to handle complicated sequences of choices, but which
can lead to a bias in the resulting estimates of risk attitudes. To illustrate,
consider the contestant in round 8, facing three unopened prizes and having
to open one prize if he declines the bank offer in round 8. Call these prizes
X, Y, and Z. There are three combinations of prizes that could remain after
opening one prize. Our approach to the VL, from the perspective of the
round 8 decision, evaluates the payoffs that confront the contestant for
each of these three combinations if he ‘‘mentally locks himself into saying
Deal (D) in round 9 and then gets the stochastic offer given the unopened
prizes’’ or if he ‘‘mentally locks himself into saying No Deal (ND) in
round 9 and then opens 1 more prize.’’ The former is the VL associated with
the strategy of saying ND in round 8 and D in round 9, and the latter is the
VL associated with the strategy of saying ND in round 8 and ND again in
round 9. We compare the EU of these two VL as seen from round 8, and
pick the largest as representing the EU from saying ND in round 8. Finally,
we compare this EU to the U from saying D in round 8, since the offer in
round 8 is known and deterministic.
The simplification comes from the fact that we do not evaluate the utility
function in each of the possible virtual round 9 decisions. A complete
enumeration of each possible path would undertake three paired comparisons. Consider the three possible outcomes:
If prize X had been opened we would have Y and Z unopened coming
into virtual round 9. This would generate a distribution of offers in virtual
round 9 (it is a distribution since the expected offer as a percent of the EV
of unopened prizes is stochastic as viewed from round 8). It would also
generate two outcomes if the contestant said ND: either he opens Y or he
opens Z. A complete enumeration in this case should evaluate the EU of
saying D and compare it to the EU of a 50–50 mix of Y and Z.
If prize Y had been opened we would have X and Z unopened coming
into virtual round 9. A complete enumeration should evaluate the EU of
saying D and compare it to the EU of a 50–50 mix of X and Z.
If prize Z had been opened we would have X and Y unopened coming
into virtual round 9. A complete enumeration should evaluate the EU of
saying D and compare it to the EU of a 50–50 mix of X and Y.
Instead of these three paired comparisons in virtual round 9, our approach
collapses all of the offers from saying D in virtual round 9 into one VL, and
all of the final prize earnings from saying ND in virtual round 9 into another
single VL.
Our approach can be viewed as a valid solution to the dynamic problem
the contestant faces if one accepts the restriction in the set of control
strategies considered by the contestant. This restriction could be justified on
behavioral grounds, since it does reduce the computational burden if in fact
the contestant was using a process such as we use to evaluate the path. On
the other hand, economists typically view the adoption of the optimal path
as an ‘‘as if’’ prediction, in which case this behavioral justification would not
apply. Or our approach may just be viewed as one way to descriptively
model the forward-looking behavior of contestants, which is one of the key
features of the analysis of the DOND game show. Just as we have
alternative ways of modeling static choice under uncertainty, we can have
alternative ways of modeling dynamic choice under uncertainty. At some
point it would be valuable to test these alternative models against each
other, but that does not have to be the first priority in trying to understand
DOND behavior.
It is possible to extend our general VL approach to take into account
these possibilities, since one could keep track of all three pairs of VL in the
above complete enumeration, rather than collapsing it down to just one pair
of VL. Refer to this complete enumeration as VL*. From the perspective of
the contestant, we know that EU(VL*) ≥ EU(VL), since VL* contains
VL as a special case.
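A sketch of this inequality for the round 8 example follows: our reading of the full enumeration VL* averages the optimal virtual round 9 choice across the three branches, while the collapsed VL takes the better of the two pooled lotteries. The CRRA coefficient and the offer-fraction prior are placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def eu(prizes, r=0.2):
    """Expected CRRA utility of an equiprobable lottery (prizes in (0, 1])."""
    p = np.asarray(prizes, dtype=float)
    return np.mean(p ** (1.0 - r) / (1.0 - r))

def round8_comparison(x, y, z, n_draws=10_000, frac_mean=0.95, frac_sd=0.05):
    """Contrast EU(VL*) (full enumeration) with EU(VL) (collapsed) in round 8."""
    fracs = rng.normal(frac_mean, frac_sd, size=n_draws).clip(min=0.0)
    branch_eus, deal_pool, nodeal_pool = [], [], []
    for pair in ((y, z), (x, z), (x, y)):       # prize opened: x, y, z respectively
        offers = fracs * np.mean(pair)          # virtual round 9 offers, this branch
        eu_d = eu(offers)                       # lock into 'Deal' in round 9
        eu_nd = eu(pair)                        # 'No Deal': 50-50 mix of the pair
        branch_eus.append(max(eu_d, eu_nd))     # VL*: choose optimally per branch
        deal_pool.append(offers)
        nodeal_pool.extend(pair)
    eu_star = np.mean(branch_eus)               # EU(VL*)
    eu_vl = max(eu(np.concatenate(deal_pool)), eu(nodeal_pool))  # EU(VL)
    return eu_star, eu_vl                       # EU(VL*) >= EU(VL) by construction

print(round8_comparison(x=0.01, y=0.4, z=1.0))
```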
We can therefore identify the implication of using VL instead of VL* for
our inferences about risk attitudes, again considering the contestant in
round 8 for ease of exposition, and assuming that the contestant actually
undertakes full enumeration as reflected in VL*. Specifically, we will
understate the EU of saying ND in round 8. This means that our ML
estimation procedure would be biased toward finding less risk aversion than
there actually is. To see this, assume some trial value of a CRRA risk
aversion parameter. There are three possible cases, taking strict inequalities
to be able to state matters crisply:
1. If this trial parameter r generates EU(VL*) > EU(VL) > U(D), then
the VL approach would make the same qualitative inference as the
VL* approach, but would understate the likelihood of that observation. This understatement comes from the implication that
EU(VL*) − U(D) > EU(VL) − U(D), and it is this difference that determines the probability of the observed choice (after some adjustment for a
stochastic error).
2. If this trial parameter r generates EU(VL) < EU(VL*) < U(D), then the
VL approach would again make the same qualitative inference as the
VL* approach, but would overstate the likelihood of that observation.
This overstatement comes from the implication in this case that
EU(VL) − U(D) < EU(VL*) − U(D).
3. If this trial parameter r generates EU(VL*) > U(D) > EU(VL), then the
VL approach would lead us to predict that the subject would make the D
decision, whereas the VL* approach would lead us to predict that the
subject would make the ND decision. If we assume that the subject is
actually motivated by VL*, and we incorrectly use VL, we would observe
a choice of ND and would be led to lower our trial parameter r to better
explain the observed choice; lowering r would make the subject less risk
averse, and more likely to reject the D decision under VL. But we should
not have lowered the parameter r; we should just have calculated the EU
of the ND choice using VL* instead of VL.
Note that one cannot just tabulate the incidence of these three cases at the
final ML estimate of r, and check to see if the vast bulk of choices fall into
case #1 or case #2, since that estimate would have been adjusted to avoid
case #3 if possible. And there is no presumption that the bias of the
likelihood estimation in case #1 is just offset by the bias in case #2. So the
bias from case #3 would lead us to expect that risk aversion would be
underestimated, but the secondary effects from cases #1 and #2 should also
be taken into account. Of course, if the contestant does not undertake full
enumeration, and instead behaves consistently with the logic of our VL
model, there is no bias at all in our estimates.
The only way to evaluate the extent of the bias is to undertake the
complete enumeration required by VL* and compare it to the approximation
obtained with VL. We have done this for the game show data in the United
States, starting with behavior in round 6. By skipping behavior in rounds 1–5
we only drop 15 out of 141 subjects, and undertaking the complete
enumeration from earlier rounds is computationally intensive. We employ a
19-point approximation of the empirical distribution of bank offers in each
round; in the VL approach we sampled 100,000 times from those
distributions as part of the VL simulations. We then estimate the CRRA
model using VL*, and estimate the same model for the same behavior using
VL, and compare results. We find that the inferred CRRA coefficient
increases as we use VL*, as expected a priori, but by a very small amount.
Specifically, we estimate CRRA to be 0.366 if we use VL* and 0.345 if we
use VL, and the 95% confidence intervals comfortably overlap (they
are 0.25 and 0.48 for the VL* approach, and 0.25 and 0.44 for the VL
approach). The log-likelihood under the VL approach is −212.54824, and it
is −211.27711 under the VL* approach, consistent with the VL* approach
providing a better fit, but only a marginally better fit. Thus, we can claim
that our VL approach provides an excellent approximation to the fully
dynamic solution.
It is worth stressing that the issue of which estimate is the correct one
depends on the assumptions made about contestant behavior. If one assumes
that contestants in fact use strategies such as those embodied in VL, then using
VL* would actually overstate true risk aversion, albeit by a trivial amount.
3.5. Estimates from Behavior in the Laboratory
The lab results indicate a CRRA coefficient of 0.45 and a 95% confidence
interval between 0.38 and 0.52, comparable to results obtained using more
familiar risk elicitation procedures due to Holt and Laury (2002) on the
same subject pool. When we restrict the estimation model to only use the
terminal period we again infer a much lower degree of risk aversion,
consistent with risk neutrality; the CRRA coefficient is estimated to be
0.02 with a 95% confidence interval between 0.07 and 0.03. Constraining
the estimation model to only consider prospects one period ahead leads to
higher inferred risk aversion; the CRRA coefficient is estimated to be 0.48
with a 95% confidence interval between 0.41 and 0.55.
4. CONCLUSIONS
Game shows offer obvious advantages for the estimation of risk attitudes,
not the least being the use of large stakes. Our review of analyses of these
data reveals a steady progression of sophistication in terms of the structural
estimation of models of choice under uncertainty. Most of these shows,
however, put the contestant into a dynamic decision-making environment,
so one cannot simply (and reliably) use static models of choice. Using
DOND as a detailed case study, we considered a general estimation
methodology for such shows in which randomization of the potential
outcomes allows us to break the curse of dimensionality that comes from
recognizing these dynamic elements of the task environment.
The DOND paradigm is important for several reasons, and more general
than it might at first seem. It incorporates many of the dynamic, forward-looking decision processes that strike one as a natural counterpart to a wide
range of fundamental economic decisions in the field. The ‘‘option value’’ of
saying ‘‘No Deal’’ has clear parallels to the financial literature on stock
market pricing, as well as to many investment decisions that have future
consequences (so-called real options). There is no frictionless market ready to
price these options, so familiar arbitrage conditions for equilibrium valuation
play no immediate role, and one must worry about how the individual makes
these decisions. The game show offers a natural experiment, with virtually all
of the major components replicated carefully from show to show, and even
from country to country.
The only sense in which DOND is restrictive is that it requires that the
contestant make a binary ‘‘stop/go’’ decision. This is already a rich domain,
as illustrated by several prominent examples: the evaluation of replacement
strategy of capital equipment (Rust, 1987) and the closure of nuclear power
plants (Rothwell & Rust, 1997). But it would be valuable to extend the choice
variable to be non-binary, such as in Card Sharks, where the contestant must
choose a bet level in each round as well as make a binary decision
(whether to switch the face card). Although some progress has been made
on this problem, reviewed in Rust (1994), the range of applications has not
been wide (e.g., Rust & Rothwell, 1995). Moreover, none of these have
considered risk attitudes, let alone associated concepts such as loss aversion
or probability weighting. Thus, the detailed analysis of choice behavior in
environments such as Card Sharks should provide a rich test case for many
broader applications.
These game shows provide a particularly fertile environment to test
extensions to standard EUT models, as well as alternatives to EUT models of
risk attitudes. Elsewhere, we have discussed applications that consider rank-dependent models such as RDU, and sign-dependent models such as CPT
(Andersen et al., 2006a, 2006b). These applications, using the VL approach
and U.K. data, have demonstrated the sensitivity of inferences to the manner
in which key concepts are operationalized. Andersen et al. (2006a) find
striking evidence of probability weighting, which is interesting since the
DOND game has symmetric probabilities on each case. Using natural
reference points to define contestant-specific gains or losses, they find no
evidence of loss aversion. Of course, that inference depends on having
identified the right reference point, but CPT is generally silent on that
specification issue when it is not obvious from the frame. Andersen
et al. (2006b) illustrate the application of alternative ‘‘dual-criteria’’ models
of choice from psychology, built to account for lab behavior with long shot,
asymmetric lotteries such as one finds in DOND. No doubt many other
specifications will be considered. Within the EUT framework, Andersen
et al. (2006a) demonstrate the importance of allowing for asset integration.
When utility is assumed to be defined over prizes plus some outside wealth
measure,33 behavior is well characterized by a CRRA specification; but when
it is assumed to be defined over prizes only, behavior is better characterized
by a non-CRRA specification with increasing RRA over prizes.
There are three major weaknesses of game shows. The first is that one
cannot change the rules of the game or the information that contestants
receive, much as one can in a laboratory experiment. Thus, the experimenter
only gets to watch and learn, since natural experiments are, as described by
Harrison and List (2004), serendipity observed. However, it is a simple
matter to design laboratory experiments that match the qualitative task
domains in the game show, even if one cannot hope to have stakes to match
the game show (e.g., Tenorio & Cason, 2002; Healy & Noussair, 2004;
Andersen et al., 2006b; and Post, van den Assem, Baltussen, & Thaler,
2006). Once this has been done, exogenous treatments can be imposed and
studied. If behavior in the default version of the game can be calibrated to
behavior in a lab environment, then one has some basis for being interested
in the behavioral effects of treatments in the lab.
The second major weakness of game shows is the concern that the sample
might have been selected by some latent process correlated with the
behavior of interest to the analyst: the classic sample selection problem.
Most analyses of game shows are aware of this, and discuss the procedures
by which contestants get to participate. At the very least, it is clear that the
demographic diversity is wider than found in the convenience samples of
the lab. We believe that controlled lab experiments can provide guidance on
the extent of sample selection into these tasks, and that the issue is a much
more general one.
The third major weakness of game shows is the lack of information on
observable characteristics, and hence the inability to use that information to
examine heterogeneity of behavior. It is possible to observe some information
from the contestant, since there is normally some pre-game banter that can
be used to identify sex, approximate age, marital status, and ethnicity. But
the general solution here is to employ econometric methods that allow one
to correct for possible heterogeneity at the level of the individual, even if one
cannot condition on observable characteristics of the individual. Until then,
one either pools over subjects under the assumption that they have the same
preferences, as we have done; makes restrictive assumptions that allow one to
identify bounds for a given contestant, but then provides contestant-specific
estimates (e.g., Post et al., 2006); or pays more attention to statistical
methods that allow for unobserved heterogeneity. One such method is to
allow for random coefficients of each structural model to represent an
underlying variation in preferences across the sample (e.g., Train, 2003,
Chapter 6; De Roos & Sarafidis, 2006; and Botti et al., 2006). This is quite
different from allowing for standard errors in the pooled coefficient, as we
have done. Another method is to allow for finite mixtures of alternative
structural models, recognizing that some choices or subjects may be better
characterized in this domain by one latent decision-making process and that
others may be better characterized by some other process (e.g., Harrison &
Rutström, 2005). These methods are not necessarily alternatives, but they
each demand relatively large data sets and considerable attention to
statistical detail.
NOTES
1. Behavior on Who Wants To Be A Millionaire has been carefully evaluated by
Hartley, Lanot, and Walker (2005), but this game involves a large number of options
and alternatives that necessitate some strong assumptions before one can pin down
risk attitudes rigorously. We focus on games in which risk attitudes are relatively
easier to identify.
2. These experiments are from unpublished research by the authors.
3. In the earliest versions of the show this option only applied to the first card in
the first row. Then it applied to the first card in each row in later versions. Finally, in
the last major version it applied to any card in any row, but only one card per row
could be switched.
4. Two further American versions were broadcast. One was a syndicated version
in the 1986/1987 season, with Bill Rafferty as host. Another was a brief syndicated
version in 2001. A British version, called Play Your Cards Right, aired in the 1980s
and again in the 1990s. A German version called Bube Dame Hörig, and a Swedish
version called Lagt Kort Ligger, have also been broadcast. Card Sharks re-runs
remain relatively popular on the American Game Show Network, a cable station.
5. Available at http://data.bls.gov/cgi-bin/cpicalc.pl
6. Let the expected utility of the bet b be p_win U(b) + p_lose U(−b). The first-order condition for a maximum over b is then p_win U′(b) − p_lose U′(−b) = 0. Since U′(b) = exp(−ab) and U′(−b) = exp(−a(−b)) = exp(ab), substitution and simple manipulation yield the formula.
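Under the CARA specification in this note, the manipulation is a one-line rearrangement of the first-order condition (this reconstruction of the intermediate step is ours):
\[
p_{\mathrm{win}} e^{-ab} = p_{\mathrm{lose}} e^{ab}
\;\Longrightarrow\;
e^{2ab} = \frac{p_{\mathrm{win}}}{p_{\mathrm{lose}}}
\;\Longrightarrow\;
b^{*} = \frac{1}{2a} \ln\!\left(\frac{p_{\mathrm{win}}}{p_{\mathrm{lose}}}\right)
\]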
7. In addition, a variable given by stake²/2000 is included by itself to account for
possible nonlinearities.
8. Gertner (1993, p. 512): ‘‘I treat each bet as a single observation, ignoring any
contestant-specific effects.’’
9. He rejects this hypothesis, for reasons not important here.
10. For example, in a game aired on 9/16/2004, the category was ‘‘Speaking in
Tongues.’’ The $800 text was ‘‘A 1996 Oakland School Board decision made many
aware of this term for African-American English.’’ Uber-champion Ken Jennings
correctly responded, ‘‘What be Ebonics?’’
11. Nalebuff (1990, p. 182) proposed the idea of the analysis, and the use of
empirical responses to avoid formal analysis of the strategic aspects of the game.
12. One formal difference is that the first order condition underlying that
formula assumes an interior solution, and the decision-maker in runaway games has
to ensure that he does not bet so much that he could fall below the highest possible
score of his rival. Since this constraint did not bind in the 110 data points available,
it can be glossed over.
13. The Lingo Board in the U.S. version is larger, and there are more balls in the
urn, with implications for the probabilities needed to infer risk attitudes.
14. Their Eq. (12) shows the formula for the general case, and Eqs. (5) and (8) for
the special final-round cases assuming CRRA or CARA. There is no statement that
this is actually evaluated within the maximum-likelihood evaluator, but pni is not
listed as a parameter to be estimated separately from the utility function parameter,
so this is presumably what was done.
15. The point estimates for the CRRA function (their Table 6, p. 837) are
generally around ƒ1,800 and ƒ1,500, with standard errors of roughly ƒ200 on each.
Similar results obtain for the CARA function (their Table 7, p. 839). So these
differences are not obviously significant at standard critical levels.
16. A handful of special shows, such as season finales and season openers, have
higher stakes up to $6 million. Our later statistical analysis includes these data, and
adjusts the stakes accordingly.
17. Or make some a priori judgments about the bounded rationality of
contestants. For example, one could assume that contestants only look forward
one or two rounds, or that they completely ignore bank offers.
18. Other top prizes were increased as well. For example, in the final show of the
first season, the top five prizes were changed from $200k, $300k, $400k, $500k, and
$1m to $300k, $400k, $500k, $2.5m, and $5m, respectively.
19. The instructions are available in Appendix A of the working paper version,
available online at http://www.bus.ucf.edu/wp/
20. The screen shots provided in the instructions and computer interface were
much larger, and easier to read. Baltussen, Post, and van den Assem (2006) also
conducted laboratory experiments patterned on DOND. They used instructions
which were literally taken from the instructions given to participants in the Dutch
DOND game show, with some introductory text from the experimenters explaining
the exchange rate between the experimental game show earnings and take home
payoffs. Their approach has the advantage of using the wording of instructions used
in the field. Our objective was to implement a laboratory experiment based on the
DOND task, and clearly referencing it as a natural counterpart to the lab
experiment. But we wanted to use instructions which we had complete control over.
We wanted subjects to know exactly what bank offer function was going to be used.
In our view the two types of DOND laboratory experiments complement each other,
in the same sense in which lab experiments, field experiments, and natural
experiments are complementary (see Harrison & List, 2004).
21. Virtually all subjects indicated that they had seen the U.S. version of the game
show, which was a major ratings hit on network television in five episodes screened daily
at prime time just prior to Christmas in 2005. Our experiments were conducted about a
month after the return of the show in the U.S., following the 2006 Olympic Games.
22. The literature has already generated a lengthy lead article in the Wall Street
Journal (January 12, 2006, p. A1) and National Public Radio interviews in the U.S.
with researchers Thaler and Post on the programs Day to Day (http://www.npr.org/
templates/story/story.php?storyId=5243893) and All Things Considered (http://
www.npr.org/templates/story/story.php?storyId=5244516) on March 3, 2006.
23. Appendix B is available in the working paper version, available online at
http://www.bus.ucf.edu/wp/
24. Abdellaoui, Barrios, and Wakker (2007, p. 363) offer a one-parameter version of
the Expo-Power function which exhibits non-constant RRA for empirically plausible
parameter values. It does impose some restrictions on the variations in RRA compared
to the two-parameter EP function, but is valuable as a parsimonious way to estimate
non-CRRA specifications, and could be used for ‘‘bounds analyses’’ such as these.
25. If bank offers were a deterministic and known function of the expected value
of unopened prizes, we would not need anything like 100,000 simulations for later
rounds. For the last few rounds of a full game, in which the bank offer is relatively
predictable, the use of this many simulations is a numerically costless redundancy.
26. There is no need to know risk attitudes, or other preferences, when the
distributions of the virtual lotteries are generated by simulation. But there is
definitely a need to know these preferences when the virtual lotteries are evaluated.
Keeping these computational steps separate is essential for computational efficiency,
and is the same procedurally as pre-generating ‘‘smart’’ Halton sequences of uniform
deviates for later, repeated use within a maximum-simulated likelihood evaluator
(e.g., Train, 2003, p. 224ff.).
27. It is possible to extend the analysis by allowing the core parameter r to be a
function of observable characteristics. Or one could view the CRRA coefficient as a
random coefficient reflecting a subject-specific random effect u, so that one would
estimate r̂ = r̂₀ + u instead. This is what De Roos and Sarafidis (2006) do for their
core parameters, implicitly assuming that the mean of u is zero and estimating the
standard deviation of u. Our approach is just to estimate r̂₀.
28. Harless and Camerer (1994), Hey and Orme (1994), and Loomes and Sugden
(1995) provided the first wave of empirical studies including some formal stochastic
specification in the version of EUT tested. There are several species of ‘‘errors’’ in
use, reviewed by Hey (1995, 2002), Loomes and Sugden (1995), Ballinger and Wilcox
(1997), and Loomes, Moffatt, and Sugden (2002). Some place the error at the final
choice between one lottery or the other after the subject has decided deterministically
which one has the higher expected utility; some place the error earlier, on the
comparison of preferences leading to the choice; and some place the error even
earlier, on the determination of the expected utility of each lottery.
29. De Roos and Sarafidis (2006) assume a random effects term v for each
individual and add it to the latent index defining the probability of choosing deal.
This is the same thing as changing our specification (4) to G(∇EU) = Φ(∇EU) + v,
and adding the standard deviation of v as a parameter to be estimated (the mean of v
is assumed to be 0).
30. Clustering commonly arises in national field surveys from the fact that
physically proximate households are often sampled to save time and money, but it can
also arise from more homely sampling procedures. For example, Williams (2000, p.
645) notes that it could arise from dental studies that ‘‘collect data on each tooth
surface for each of several teeth from a set of patients’’ or ‘‘repeated measurements or
recurrent events observed on the same person.’’ The procedures for allowing for
clustering allow heteroskedasticity between and within clusters, as well as autocorrelation within clusters. They are closely related to the ‘‘generalized estimating equations’’
approach to panel estimation in epidemiology (see Liang & Zeger, 1986), and
generalize the ‘‘robust standard errors’’ approach popular in econometrics (see Rogers,
1993). Wooldridge (2003) reviews some issues in the use of clustering for panel effects,
noting that significant inferential problems may arise with small numbers of panels.
31. In the DOND literature, De Roos and Sarafidis (2006) demonstrate that
alternative ways of correcting for unobserved individual heterogeneity (random effects
or random coefficients) generally provide similar estimates, but that they are quite
different from estimates that ignore that heterogeneity. Botti, Conte, DiCagno, and
D’Ippoliti (2006) also consider unobserved individual heterogeneity, and show that it
is statistically significant in their models (which ignore dynamic features of the game).
32. Gollier (2001, p. 25) refers to this as a Harmonic Absolute Risk Aversion,
rather than the Hyperbolic Absolute Risk Aversion of Merton (1971, p. 389).
33. This estimated measure might be interpreted as wealth, or as some function of
wealth in the spirit of Cox and Sadiraj (2006).
ACKNOWLEDGMENTS
Harrison and Rutström thank the U.S. National Science Foundation for
research support under grants NSF/IIS 9817518, NSF/HSD 0527675, and
NSF/SES 0616746. We are grateful to Andrew Theophilopoulos for
artwork.
REFERENCES
Abdellaoui, M., Barrios, C., & Wakker, P. P. (2007). Reconciling introspective utility with
revealed preference: Experimental arguments based on prospect theory. Journal of
Econometrics, 138, 356–378.
Andersen, S., Harrison, G. W., Lau, M. I., & Rutström, E. E. (2006a). Dynamic choice behavior
in a natural experiment. Working Paper 06–10, Department of Economics, College of
Business Administration, University of Central Florida.
Andersen, S., Harrison, G. W., Lau, M. I., & Rutström, E. E. (2006b). Dual criteria decisions.
Working Paper 06–11, Department of Economics, College of Business Administration,
University of Central Florida.
Ballinger, T. P., & Wilcox, N. T. (1997). Decisions, error and heterogeneity. Economic Journal,
107, 1090–1105.
Baltussen, G., Post, T., & van den Assem, M. (2006). Stakes, prior outcomes and distress in risky
choice: An experimental study based on Deal or No Deal. Working Paper, Department of
Finance, Erasmus School of Economics, Erasmus University.
Beetsma, R. M. W. J., & Schotman, P. C. (2001). Measuring risk attitudes in a
natural experiment: Data from the television game show Lingo. Economic Journal,
111, 821–848.
Blavatskyy, P., & Pogrebna, G. (2006). Testing the predictions of decision theories in a natural
experiment when half a million is at stake. Working Paper 291, Institute for Empirical
Research in Economics, University of Zurich.
Bombardini, M., & Trebbi, F. (2005). Risk aversion and expected utility theory: A field
experiment with large and small stakes. Working Paper 05–20, Department of
Economics, University of British Columbia.
Botti, F., Conte, A., DiCagno, D., & D’Ippoliti, C. (2006). Risk attitude in real decision
problems. Unpublished Manuscript, LUISS Guido Carli, Rome.
Campbell, J. Y., Lo, A. W., & MacKinlay, A. C. (1997). The econometrics of financial markets.
Princeton: Princeton University Press.
Cox, J. C., & Sadiraj, V. (2006). Small- and large-stakes risk aversion: Implications of concavity
calibration for decision theory. Games and Economic Behavior, 56(1), 45–60.
De Roos, N., & Sarafidis, Y. (2006). Decision making under risk in deal or no deal. Working
Paper, School of Economics and Political Science, University of Sydney.
Gertner, R. (1993). Game shows and economic behavior: Risk-taking on Card Sharks.
Quarterly Journal of Economics, 108(2), 507–521.
Gollier, C. (2001). The economics of risk and time. Cambridge, MA: MIT Press.
Harless, D. W., & Camerer, C. F. (1994). The predictive utility of generalized expected utility
theories. Econometrica, 62(6), 1251–1289.
Harrison, G. W., Johnson, E., McInnes, M. M., & Rutström, E. E. (2005). Risk aversion and
incentive effects: Comment. American Economic Review, 95(3), 897–901.
Harrison, G. W., Lau, M. I., & Rutström, E. E. (2007). Estimating risk attitudes in Denmark:
A field experiment. Scandinavian Journal of Economics, 109(2), 341–368.
Harrison, G. W., Lau, M. I., Rutström, E. E., & Sullivan, M. B. (2005). Eliciting risk and time
preferences using field experiments: Some methodological issues. In: J. Carpenter,
G. W. Harrison & J. A. List (Eds), Field Experiments in Economics (Vol. 10). Greenwich,
CT: JAI Press, Research in Experimental Economics.
Harrison, G. W., & List, J. A. (2004). Field experiments. Journal of Economic Literature, 42(4),
1013–1059.
Harrison, G. W., & Rutström, E. E. (2005). Expected utility theory and prospect theory: One
wedding and a decent funeral. Working Paper 05–18, Department of Economics, College
of Business Administration, University of Central Florida; Experimental Economics,
forthcoming.
Harrison, G. W., & Rutström, E. E. (2008). Risk aversion in the Laboratory. In: J. C. Cox &
G. W. Harrison (Eds), Risk aversion in experiments (Vol. 12). Bingley, UK: Emerald,
Research in Experimental Economics.
Hartley, R., Lanot, G., & Walker, I. (2005). Who really wants to be a Millionaire? Estimates of
risk aversion from gameshow data. Working Paper, Department of Economics,
University of Warwick.
Healy, P., & Noussair, C. (2004). Bidding behavior in the Price Is Right Game: An experimental
study. Journal of Economic Behavior and Organization, 54, 231–247.
Hey, J. (1995). Experimental investigations of errors in decision making under risk. European
Economic Review, 39, 633–640.
Hey, J. D. (2002). Experimental economics and the theory of decision making under
uncertainty. Geneva Papers on Risk and Insurance Theory, 27(1), 5–21.
Hey, J. D., & Orme, C. (1994). Investigating generalizations of expected utility theory using
experimental data. Econometrica, 62(6), 1291–1326.
Holt, C. A., & Laury, S. K. (2002). Risk aversion and incentive effects. American Economic
Review, 92(5), 1644–1655.
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk.
Econometrica, 47, 263–291.
Levitt, S. D., & List, J. A. (2007). What do laboratory experiments measuring social preferences
reveal about the real world? Journal of Economic Perspectives, 21(2), 153–174.
Liang, K.-Y., & Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models.
Biometrika, 73, 13–22.
Loomes, G., Moffatt, P. G., & Sugden, R. (2002). A microeconometric test of alternative
stochastic theories of risky choice. Journal of Risk and Uncertainty, 24(2), 103–130.
Loomes, G., & Sugden, R. (1995). Incorporating a stochastic element into decision theories.
European Economic Review, 39, 641–648.
Markowitz, H. (1952). The utility of wealth. Journal of Political Economy, 60, 151–158.
Merton, R. C. (1971). Optimum consumption and portfolio rules in a continuous-time model.
Journal of Economic Theory, 3, 373–413.
Metrick, A. (1995). A natural experiment in ‘Jeopardy!’. American Economic Review, 85(1),
240–253.
Mulino, D., Scheelings, R., Brooks, R., & Faff, R. (2006). An empirical investigation of risk
aversion and framing effects in the Australian version of Deal Or No Deal. Working
Paper, Department of Economics, Monash University.
Nalebuff, B. (1990). Puzzles: Slot machines, zomepirac, squash, and more. Journal of Economic
Perspectives, 4(1), 179–187.
Post, T., van den Assem, M., Baltussen, G., & Thaler, R. (2006). Deal or no deal? Decision
making under risk in a large-payoff game show. Working Paper, Department of Finance,
Erasmus School of Economics, Erasmus University; American Economic Review,
forthcoming.
Quiggin, J. (1982). A theory of anticipated utility. Journal of Economic Behavior and
Organization, 3(4), 323–343.
Quiggin, J. (1993). Generalized expected utility theory: The rank-dependent model. Norwell, MA:
Kluwer Academic.
Rogers, W. H. (1993). Regression standard errors in clustered samples. Stata Technical Bulletin,
13, 19–23.
Rothwell, G., & Rust, J. (1997). On the optimal lifetime of nuclear power plants. Journal of
Business and Economic Statistics, 15(2), 195–208.
Rust, J. (1987). Optimal replacement of GMC bus engines: An empirical model of Harold
Zurcher. Econometrica, 55, 999–1033.
Rust, J. (1994). Structural estimation of Markov decision processes. In: D. McFadden &
R. Engle (Eds), Handbook of econometrics (Vol. 4). Amsterdam, NL: North-Holland.
Rust, J. (1997). Using randomization to break the curse of dimensionality. Econometrica, 65(3),
487–516.
Rust, J., & Rothwell, G. (1995). Optimal response to a shift in regulatory regime: The case of
the US Nuclear Power Industry. Journal of Applied Econometrics, 10, S75–S118.
Tenorio, R., & Cason, T. (2002). To spin or not to spin? Natural and laboratory experiments
from The Price is Right. Economic Journal, 112, 170–195.
Train, K. E. (2003). Discrete choice methods with simulation. New York, NY: Cambridge
University Press.
Vuong, Q. H. (1989). Likelihood ratio tests for model selection and non-nested hypotheses.
Econometrica, 57(2), 307–333.
Williams, R. L. (2000). A note on robust variance estimation for cluster-correlated data.
Biometrics, 56, 645–646.
Wooldridge, J. (2003). Cluster-sample methods in applied econometrics. American Economic
Review (Papers and Proceedings), 93, 133–138.