Priors and Desires: a Model of Optimism,
Pessimism, and Cognitive Dissonance
Guy Mayraz
October 29, 2014
Abstract
This paper offers a model of optimism, pessimism, and cognitive dissonance. Beliefs—and consequently choices—depend not only on relevant
information, but also on what makes the decision maker better off. In an
associated experiment, subjects who stood to gain from an increase in the
price of a financial asset predicted higher prices than subjects who stood to
gain from a decrease in price. Consistent with the model, better information
resulted in a smaller bias, but incentives for accuracy made no difference.
JEL classification: D01,D03,D80,D81,D83,D84.
Keywords: wishful thinking, cognitive dissonance, reference-dependent beliefs, reference-dependent preferences.
1 Introduction
Whenever people think of a possible state of affairs, they evaluate it on two dimensions: good or bad and true or false. Normatively, the two are independent:
wanting something to be true doesn’t make it any more or less likely. Nevertheless,
when people consider the evidence they are very much aware what conclusion they
want it to support: they want the evidence about their investments to indicate low
risks and high returns, the evidence about their health to suggest they have little to
worry about, the evidence about their actions to imply that they did the right thing,
and so on.
If there is only one way of interpreting the evidence, it makes no difference
how people feel about it. But if the evidence is open to multiple interpretations,
what people want it to mean can affect what they take it to mean. Optimists are
more likely to believe something is true if it makes them better off, and pessimists
are more likely to believe something is true if it makes them worse off. This paper
offers a formal model of optimism and pessimism, and reports on an experimental
test of some of the most important predictions of the model.
What makes optimists and pessimists different is not the way they make choices,
but the systematic link between their beliefs and their interests (what they have at
stake). The model represents these stakes by a reference mapping 𝑟 from states to
utility values and the resulting beliefs by a reference-dependent probability measure 𝜋𝑟 . With the help of some simplifying assumptions it is possible to obtain
a tractable representation for 𝜋. The formula involves a probability measure 𝑝,
representing beliefs under conditions of indifference, and a parameter 𝜓, which
determines how the agent’s interests affect her beliefs. The expression for the log
of the relative probability (odds-ratio) between two states 𝑎 and 𝑏 is as follows:
\[
\log \frac{\pi_r(a)}{\pi_r(b)} = \log \frac{p(a)}{p(b)} + \psi\,[r(a) - r(b)]. \tag{1}
\]
The first term on the RHS of Equation 1 represents what the subjective odds-ratio between the two states would have been if the agent were indifferent between
the two states, so that 𝑟(𝑎) = 𝑟(𝑏). The second term represents the bias. 𝜓 is a
parameter which varies from one person to another. If 𝜓 = 0 the bias term is
always zero. A zero 𝜓 therefore represents realism: whatever the agent has at
stake has no effect on her beliefs. If 𝜓 > 0 the bias term has the same sign as the
difference in the reference stakes between the two states, biasing beliefs towards the
state in which the agent is better off. A positive 𝜓 therefore represents optimism.
Finally, if 𝜓 < 0 beliefs are biased towards the state that makes the agent worse
off. A negative 𝜓 represents pessimism. In analogy with constant relative risk
aversion, 𝜓 is the coefficient of relative optimism. The further it is from zero,
the stronger the bias.
Holding 𝜓 constant, the magnitude of the bias term is an increasing function
of the reference stakes 𝑟(𝑎) − 𝑟(𝑏): the more a person wants something to be true,
the stronger the pull of her optimism or pessimism. The strength of the evidence
(represented by the first term in Equation 1) provides resistance. If, for example,
there is strong evidence in favor of 𝑎, this term will be highly positive, and the bias
term would have to be strongly negative for the agent to believe that 𝑏 is true.
Optimism and pessimism bias beliefs whenever a person has something at
stake. One example is holding an investment in a financial asset, which causes
an optimist to underestimate risks and overestimate returns, with the opposite for
pessimists. However, ‘having something at stake’ is a much broader concept than
that, and could also be something like a person’s inherent interest in her health
(causing optimists to be overly sanguine about their health, and pessimists to worry
too much), in her ability (optimists being overconfident about abilities that matter
to their future payoff), and even in morality (an optimist who cares about behaving
morally is biased to believe her past actions were morally justified).
Biased beliefs affect subsequent choices: underestimating risks reduces the
demand for insurance, and makes further investment more likely; overestimating
ability causes selection into competition, and the pursuit of conflict over compromise; believing previous actions were moral makes it more likely that the person
repeats them. Some of the most interesting implications are in dynamic choice,
where today’s choice acts as the reference stakes for tomorrow’s beliefs, and thereby
influences tomorrow’s choices.
The belief patterns predicted by optimism have been observed in many areas
of economics,1 consistent with the idea that wishful thinking is a pervasive bias
that affects decisions large and small. However, while this evidence is certainly
suggestive, it is nearly always possible to find alternative explanations for each particular finding.2
1 Babcock and Loewenstein (1997) link the low frequency of pretrial bargains to a tendency by both parties to believe that they would win if the case ends up in court. Olsen (1997) finds evidence for optimistically biased beliefs among professional investment managers. Camerer and Lovallo (1999) link excess entry into competitive markets to overconfidence over relative ability. Malmendier and Tate (2008) argue that managerial overconfidence is responsible for corporate investment distortions. Cowgill et al. (2009) find optimistic bias in corporate prediction markets. Park and Santos-Pinto (2010) provide field evidence for overconfidence in tournaments. Hoffman (2011a) and Hoffman (2011b) find that truck drivers are optimistically biased about their productivity (and hence their pay), resulting in an inefficient failure to switch jobs.
The economics evidence is much less extensive, but includes a wide variety of
situations.3 Theoretical applications extend into additional areas.4
This paper offers a descriptive model of optimism and pessimism, but it does
not explain why they exist in the first place. Economists who tackle this question
assume that biased beliefs have to be somehow utility maximizing. The main idea
is that people have preferences not only over the outcome they obtain, but also
over their anticipatory beliefs when the outcome is still uncertain (Akerlof and
Dickens, 1982; Carrillo and Mariotti, 2000; Caplin and Leahy, 2001; Benabou
and Tirole, 2002; Brunnermeier and Parker, 2005). Since optimism leads to a
first-order improvement in anticipatory beliefs with only a second-order loss in
outcomes, the utility maximizing level of optimism bias is positive (Brunnermeier
and Parker, 2005).5
The practical implications of this idea depend on whether people are able to bias their beliefs intentionally: without some such ability, it has no observable implications. If people can choose their coefficient of relative optimism, we would obtain a preponderance of optimists over
pessimists, for which there is good evidence.6 If people can choose their beliefs
directly we would expect a second optimistic bias that is determined on a case-by-case
basis. Since the cost of the bias varies with the importance of subsequent decisions, the case-by-case bias should be inversely proportional to the importance
of subsequent decisions. This makes it very different from the optimism bias in
Equation 1, where the underlying bias (represented by the value of 𝜓) has nothing
to do with the importance of subsequent decisions.
2 For example, in Babcock and Loewenstein (1997) subjects in the role of plaintiff came to expect higher penalties than subjects in the role of defendant, even though both groups of subjects were exposed to the same case materials. However, subjects had to argue their side with the other party, which may have caused them to focus their reading on arguments favoring their case. Their beliefs could thus have arisen from a failure to correct for this selective attention, rather than from a general wishful thinking bias.
3 Babcock and Loewenstein (1997) find that parties in negotiations are affected by wishful thinking, resulting in an inefficient failure to reach agreement. Camerer and Lovallo (1999) link excess entry into competitive markets to overconfidence over relative ability. Malmendier and Tate (2008) argue that managerial overconfidence is responsible for corporate investment distortions. Cowgill et al. (2009) find optimistic bias in corporate prediction markets. Mullainathan and Washington (2009) find that voting for a candidate results in more positive views about the candidate. Park and Santos-Pinto (2010) provide field evidence for overconfidence in tournaments. Hoffman (2011a) and Hoffman (2011b) find that truck drivers are optimistically biased about their productivity (and hence their pay), resulting in an inefficient failure to switch jobs.
4 For example, credit markets (De Meza and Southey, 1996), banking (Manove and Padilla, 1999), corporate finance (Heaton, 2002), search (Dubra, 2004), savings (Brunnermeier and Parker, 2005), insurance (Sandroni and Squintani, 2007), price discrimination (Eliaz and Spiegler, 2008), incentives in organizations (Santos-Pinto, 2008), and financial contracting (Landier and Thesmar, 2009). Studies of overconfidence over the accuracy of signals are excluded from this list.
5 Biased beliefs may also be instrumental in counteracting the impact of other behavioural biases (Benabou and Tirole, 2002; Compte and Postlewaite, 2004).
6 See, for example, Sharot (2011).
This difference is particularly important when a lot depends on the decision at
hand, which is precisely the sort of situation economists care most about. It would
be irrational to choose much of a bias in this situation, but Equation 1 has nothing
to do with rational choice. If there is a great deal of uncertainty, and the agent
arrives at the decision with a significant existing investment, the model predicts a
large bias regardless of the consequences.
The terms ‘optimism’ and ‘pessimism’ can also refer to a tendency to expect
good (bad) outcomes whatever one does (and regardless of one’s current interests). This very different notion of optimism and pessimism is briefly mentioned
by Quiggin (1982), discussed more broadly in Hey (1984), and further developed
in a couple of recent papers (Bracha and Brown, 2012; Dillenberger et al., 2014).
As noted already by Quiggin (1982) it can be difficult to distinguish from standard
risk preferences (or preferences over ambiguity, if probabilities are only distorted
in ambiguous situations), and is perhaps best seen as an alternative interpretation
of such preferences. Indeed, Bracha and Brown (2012) show that their model of
optimism is the mirror image of the variational preferences model of ambiguity
aversion (Maccheroni et al., 2006), and Dillenberger et al. (2014) show an equivalence between their own model and standard subjective expected utility maximization with a more convex (concave) utility function.
The empirical part of the paper uses a simple lab experiment to test for the
predictions of wishful thinking. Subjects in the experiment observed a chart of
historical wheat prices,7 and their one and only task was to predict what the price
would be at some future time point. There was random assignment into two treatment groups: Farmers, whose payoff was increasing in the future price of wheat,
and Bakers, whose payoff was decreasing in this price. Subjects in both groups
also received a performance bonus as a function of the accuracy of their prediction.
7 Charts were adapted from real asset price data, though not specifically wheat prices.
Wishful thinking predicts bias whenever decision makers have a stake in what
the state of the world is. Farmers gain from high prices, and their beliefs should
therefore be biased upward as compared to what they would otherwise be. The opposite is true for Bakers. Given the random allocation, there should be a systematic
difference in beliefs between the two groups, with Farmers expecting higher prices
than Bakers.
The statistic used to identify a systematic difference in beliefs between the two
groups was the difference between the average predictions of Farmers and Bakers.
The prediction bonus formula was designed so that truthful reporting maximizes
subjective expected payoff. As long as decision makers are risk-neutral over small
amounts of money, the difference in predictions should provide an unbiased estimate of the difference in beliefs. Risk-averse subjects may, however, seek to
intentionally hedge their predictions, so as to smooth their payoff across different
states. Such hedging would result in Farmers under-reporting their true prediction,
and an opposite bias for Bakers. Consequently, the estimated difference in beliefs
between the two treatment groups may be biased downward.
The null hypothesis was defined as a non-positive difference in beliefs between
Farmers and Bakers. Hedging could plausibly have resulted in a failure to reject the
null when the true difference in beliefs is positive. There were no corresponding
reasons to expect a false positive result. The actual observation was a positive
and statistically measurable difference in predictions between Farmers and Bakers
(𝑝 < 0.0002).
This result demonstrates that wishful thinking can indeed affect at least some
subjective judgments of likelihood. However, many economically important decisions involve much higher stakes, and it is not clear whether wishful thinking
would remain significant if its cost were large. According to the Priors and
Desires model, the magnitude of the bias is independent of the importance of subsequent decisions, so the answer is yes. According to self-deception models such
as Brunnermeier and Parker (2005), the magnitude of the bias is decreasing in its
cost, so the answer is no.
Differentiating between these two modelling approaches requires the ability to
manipulate the incentives for holding accurate beliefs. The design of the experiment afforded a simple way to do so, by varying the scale of the accuracy bonus:
the larger the potential bonus, the more subjects had to lose from holding biased
beliefs. If wishful thinking is strategic (as in self-deception), the magnitude of the
bias should decrease in the scale of the accuracy bonus. If wishful thinking is non-strategic
(as in the Priors and Desires model), there should be no change in the
magnitude of the bias as the scale of the accuracy bonus is increased.
Converting this intuition into a formal test requires quantitative predictions.
‘No change’ is a testable hypothesis, but ‘decreasing with the scale of accuracy
bonus’ is not. Consequently, testing the hypothesis that wishful thinking is strategic
made it necessary to focus on some particular strategic model. The best known
such model is Optimal Expectations (Brunnermeier and Parker, 2005). Agents in
this model have preferences over anticipated consumption, and choose beliefs in
order to maximize their subjective expected utility. The constraint is that, once
chosen, beliefs govern future choices and change only as the result of Bayesian updating. Agents therefore trade off the gain from anticipating a high payoff against
the cost of a lower realized bonus: the more favorable they believe the future price
to be, the higher is their anticipatory utility, but the lower the prediction bonus
they can expect to receive. Increasing the scale of the accuracy bonus increases
the cost of biased beliefs and reduces the optimal level of bias. Assuming risk-neutrality over small stakes, the quantitative prediction is that the magnitude of
the bias would be inversely proportional to the scale of the accuracy bonus (Section 3.3.2).
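The two competing quantitative predictions can be written down in a few lines. This is only a sketch of the comparative statics stated above: the function name, the baseline bias of 1.0, and the five-fold scale are illustrative assumptions, and the inverse-proportionality rule is taken from the text rather than derived here.

```python
def predicted_bias(baseline_bias, bonus_scale, strategic):
    """Predicted bias when the accuracy bonus is multiplied by bonus_scale.

    strategic=True:  bias inversely proportional to the bonus scale
                     (the Optimal Expectations prediction under risk-neutrality).
    strategic=False: bias unchanged (the Priors and Desires prediction).
    """
    if bonus_scale <= 0:
        raise ValueError("bonus_scale must be positive")
    return baseline_bias / bonus_scale if strategic else baseline_bias


# Hypothetical baseline bias of 1.0 (arbitrary units), bonus scaled up five-fold.
strategic_bias = predicted_bias(1.0, 5.0, strategic=True)
non_strategic_bias = predicted_bias(1.0, 5.0, strategic=False)
```

Under these assumptions the strategic prediction falls to one fifth of the baseline, while the non-strategic prediction stays at the baseline.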
Different sessions were run with different levels of accuracy bonus. The scale
of the bonus was increased fivefold, with the maximum bonus amount varying
from £1 to £5. Results showed no decrease in the magnitude of the bias, consistent with the prediction of non-strategic models. This result is statistically measurable: the prediction of the Optimal Expectations model was formally rejected
(𝑝 < 0.0140), while that of non-strategic models was not (𝑝 < 0.4026). The experiment, therefore, corroborates wishful thinking in its non-strategic version. This, of
course, is the version with the most far-reaching implications, implying that wishful thinking affects any and all decisions based on subjective judgment, whatever
the cost to the decision maker.8
Both types of models also predict that the magnitude of the bias increases
in the degree of subjective uncertainty and in what subjects have at stake in the
quantity that they form expectations over (the sensitivity of the final payoff to the
day 100 price in the context of the present experiment). Testing
these predictions cannot provide a further test of which model is correct, but it
can provide some further assurance that the experiment is sensitive enough for the
main conclusions to be trusted.
8 Consistent with this result, Hoffman (2011a) and Hoffman (2011b) find substantial overconfidence in the trucking industry, and show that it is highly costly to workers. Prediction accuracy is not reduced by adding monetary incentives.
In order to make a test of the comparative statics of subjective uncertainty
possible, subjects were asked to provide a confidence level together with their prediction. Confidence was provided on a 1-10 scale, calibrated with the help of
examples provided as part of the instructions (Figure 3). By averaging the confidence reports across subjects, it was possible to obtain an estimate of the amount
of subjective uncertainty in different charts. This made it possible to test the prediction that the bias in high subjective uncertainty charts is greater. Results were
consistent with this prediction (Figure 5), and the null hypothesis that the magnitude of the bias is at least as high in low subjective uncertainty charts was rejected
(𝑝 < 0.0142). A robustness test using a different measure of uncertainty yielded
comparable results.
Due to insufficient data, a test of the comparative statics of the stakes was
inconclusive. Two sessions were run with half the stakes, and the estimated bias
was roughly half what it was in the baseline sessions. However, the null hypothesis
that the bias is the same could not be rejected.9
One concern with interpreting the results of the experiment is that subjects may
have felt the task of predicting the day 100 price is impossible, and that they may
as well choose whichever number they want to be true. Since Farmers gain from
high prices and Bakers gain from low prices, Farmers would choose high guesses,
and Bakers would choose low ones. If this explanation is correct, we would expect subjects who are generally confident in their predictions to be less biased than
less confident subjects. Similarly, we would expect subjects who generally believe
prices in financial markets are predictable to be less biased than subjects who do
not think prices can be predicted. I tested the first prediction by defining a subject’s confidence level by the average confidence rating in her predictions across
all charts. I tested the second prediction by asking subjects in the post-experiment
questionnaire whether they believe that prices in financial markets are generally
predictable. In both cases I obtained just the opposite result: subjects who believe prices are predictable and relatively confident subjects are more biased than
those who are less confident. These results suggest that this concern is misplaced.
Moreover, they support the view that over-confidence is a manifestation of wishful thinking, and that the degree of wishful thinking bias is a stable individual
characteristic, such as the parameter 𝜓 in Equation 1.
9 The comparative statics of the stakes have been studied before in a different but related context. In a study of self-deception, Mijović-Prelec and Prelec (2010) found a larger bias when stakes were higher (the ‘Anticipation Bonus’ treatment) as compared with lower bonus (the ‘Classification Bonus’ treatment).
The remainder of the paper is organized as follows. Section 2 presents the model. Section 3 describes the
experiment in detail. Section 3.3 develops the predictions of the Optimal Expectations (Brunnermeier and Parker, 2005) and Priors and Desires (Mayraz, 2011) models.
Section 3.4 describes how the data were analyzed. Section 4 presents the results,
and Section 5 concludes.
2 Model
The model is intended specifically for decisions under subjective uncertainty. As
Knight (1921) famously noted, this is a very large class of decision problems.10
The judgment of probability that decision makers are required to make in such
situations offers a plausible route for optimism and pessimism to affect beliefs.
Moreover, since reasonable people make different probability judgments, optimism and pessimism can affect beliefs without this being obvious either to the
decision maker herself or to outside observers.
Subjective uncertainty is represented using a set of states, each of which denotes some possible state of affairs. The decision maker is assumed to have well-defined probabilities over the set of states. More formally, uncertainty is defined
over a measurable space (𝑆, Σ), where 𝑆 is the set of states, and Σ is a 𝜎-algebra
of subsets of 𝑆 called events. The range of possible beliefs is represented by the
set Δ of all 𝜎-additive probability measures over (𝑆, Σ). Many decision problems
can be modeled using a finite set of states, in which case Δ is simply the set of all
possible probability distributions.
Although we are principally concerned with choices under uncertainty, the decision maker’s world is assumed to contain a source of objective uncertainty (risk).
This makes it possible to identify the utility function independently of subjective
beliefs by observing the decision maker’s choices over objective lotteries. In the
following I assume that the utility function is known.
The particular outcomes that the decision maker obtains in different states play
no role in the model: all that matters is the utility value associated with these outcomes. This makes it possible to use mappings from states to utility values (payoff-functions) to represent the decision maker’s reference stakes and the choices that
she has available.
The model can now be formally described for the case of a single decision
problem. Let 𝐹 = {𝑓 ∶ 𝑆 → ℝ} denote the set of all payoff-functions. At 𝑡 = 0
the decision maker starts out with a reference 𝑟 ∈ 𝐹; at 𝑡 = 1 she makes a choice 𝑐
from a choice set 𝐶 ⊆ 𝐹; at 𝑡 = 2 uncertainty is resolved: some particular state 𝑠∗
is revealed to be the true state, and the decision maker obtains the outcome 𝑐(𝑠∗).
10 “Business decisions, for example, deal with situations which are far too unique, generally speaking, for any sort of statistical tabulation to have any value for guidance. The conception of an objectively measurable probability or chance is simply inapplicable.” (III.VII.47); “Yet it is true, and the fact can hardly be overemphasized, that a judgment of probability is actually made in such cases.” (III.VII.40).
At this point it is worth reviewing how a standard subjective expected utility
maximizer would behave in this setting. Such a decision maker would choose 𝑐
to maximize subjective expected utility according to some probability measure
𝑞 ∈ Δ. The reference 𝑟 would play no role in her decision.
Decision makers in the present model also maximize subjective expected utility, but their beliefs are reference-dependent. Each decision maker is thus characterized by a distortion mapping 𝜋 ∶ 𝐹 → Δ, which associates each possible
reference with a probability measure over the set of states. If the reference is 𝑟 the
decision maker chooses 𝑐 to maximize subjective expected utility according to 𝜋𝑟 .
The standard model with reference-independent beliefs corresponds to the special
case where 𝜋 is a constant mapping.
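As an illustration of this choice rule, the following sketch assumes a finite state space and takes the distortion mapping to have the exponential form implied by Equation 1, with 𝜋𝑟(𝑠) proportional to 𝑝(𝑠) exp(𝜓𝑟(𝑠)). The states, payoffs, and function names are hypothetical; they only show how the reference can change which payoff-function is chosen while the choice rule itself stays the same.

```python
import math


def distorted_beliefs(p, r, psi):
    """Return pi_r, with pi_r(s) proportional to p(s) * exp(psi * r(s))."""
    weights = {s: p[s] * math.exp(psi * r[s]) for s in p}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


def choose(choice_set, p, r, psi):
    """Name of the payoff-function that maximizes expected utility under pi_r."""
    beliefs = distorted_beliefs(p, r, psi)

    def expected_utility(payoffs):
        return sum(beliefs[s] * payoffs[s] for s in beliefs)

    return max(choice_set, key=lambda name: expected_utility(choice_set[name]))


# Hypothetical example: two equally likely states and an existing stake
# (the reference r) that pays off only in the "up" state.
p = {"up": 0.5, "down": 0.5}
r = {"up": 1.0, "down": 0.0}
choice_set = {
    "risky": {"up": 1.0, "down": -1.2},
    "safe": {"up": 0.0, "down": 0.0},
}

realist_choice = choose(choice_set, p, r, psi=0.0)
optimist_choice = choose(choice_set, p, r, psi=math.log(2))
```

With these made-up numbers the realist (𝜓 = 0) declines the risky option, while the optimist (𝜓 = log 2) accepts it. Replacing `r` by a constant mapping makes the two decision makers choose identically, which is the standard model as a special case.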
Simplifying assumptions (Appendix A) yield a tractable representation for 𝜋.
Decision makers are characterized by the combination of (i) a probability measure
𝑝 ∈ Δ, which represents the decision maker’s beliefs in the special case that she is
indifferent between all states, and (ii) a real-valued parameter 𝜓, which determines
how beliefs are distorted away from 𝑝 as a function of what she has at stake. Let
𝑟 ∈ 𝐹 denote the decision maker’s reference, let 𝑎 and 𝑏 denote any two states,
and suppose that 𝑝(𝑏) > 0. The log odds-ratio between the two states is given by
the following expression:
\[
\log \frac{\pi_r(a)}{\pi_r(b)} = \log \frac{p(a)}{p(b)} + \psi\,[r(a) - r(b)]. \tag{2}
\]
In order to understand this expression, note first that if 𝑟(𝑎) = 𝑟(𝑏) the second
term on the RHS drops out, and 𝜋𝑟 (𝑎)/𝜋𝑟 (𝑏) = 𝑝(𝑎)/𝑝(𝑏). The probability measure
𝑝 therefore represents the beliefs of a disinterested observer for whom every state
is as good as any other.
Suppose now that the decision maker is not indifferent, and 𝑟(𝑎) − 𝑟(𝑏) > 0.
If 𝜓 = 0 this makes no difference. A decision maker with 𝜓 = 0 is therefore a
realist, and holds the same beliefs regardless of her interests. If 𝜓 > 0 the entire
term is positive, pushing beliefs towards 𝑎. Such a decision maker is an optimist.
Finally, if 𝜓 < 0 the distortion term is negative, pushing the decision maker’s
beliefs towards 𝑏. Such a decision maker is a pessimist.
The parameter 𝜓 determines not only whether the decision maker is an optimist or pessimist, but also the strength of her bias. In analogy with risk aversion,
𝜓 is the coefficient of relative optimism. Note that 𝜓 is defined relative to a given
utility function representation. Any rescaling of the utility function has to be accompanied by an inverse rescaling of 𝜓.11
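The reason for the inverse rescaling can be read off Equation 2. In the short derivation below, 𝑟′ denotes the reference expressed in rescaled utility units; the symbols 𝛼 and 𝛽 for the scale and shift are notation introduced here for illustration, chosen to avoid a clash with the states 𝑎 and 𝑏.

```latex
\begin{align*}
r'(s) &= \alpha\, r(s) + \beta, \qquad \alpha > 0, \\
\psi'\,[r'(a) - r'(b)] &= \psi'\alpha\,[r(a) - r(b)], \\
\psi'\alpha\,[r(a) - r(b)] &= \psi\,[r(a) - r(b)]
  \quad\text{for all } a, b
  \quad\text{when}\quad \psi' = \psi/\alpha .
\end{align*}
```

The shift 𝛽 cancels in the difference, so only the scale 𝛼 matters: the bias term in Equation 2, and hence the distorted beliefs, are unchanged when 𝜓′ = 𝜓/𝛼.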
There are a number of other useful expressions for 𝜋. The expression for the
probability of a state 𝑠 is
\[
\pi_r(s) \propto p(s)\, e^{\psi r(s)}. \tag{3}
\]
In log terms it becomes the equation for a line, with 𝜓 as the slope:
\[
\log \pi_r(s) = \log p(s) + \psi r(s) + C. \tag{4}
\]
The most general expression is the following, where 𝐴 is any event:
\[
\pi_r(A) \propto \int_A e^{\psi r}\, dp. \tag{5}
\]
Consider again Equation 2, and define 𝛿 = 𝑟(𝑎) − 𝑟(𝑏) as a measure of the
decision maker’s stake in 𝑎 rather than 𝑏 obtaining. Other things being equal, the
bias scales with 𝛿. As an illustration, suppose 𝑝(𝑎) = 𝑝(𝑏) = 1/2, and consider an
optimist with 𝜓 = log 2. The odds ratio 𝜋𝑟 (𝑎)/𝜋𝑟 (𝑏) with 𝛿 = 1 would then be 2,
corresponding to 𝜋𝑟 (𝑎) = 2/3 and 𝜋𝑟 (𝑏) = 1/3. If 𝛿 = 2 the odds-ratio would be
4, corresponding to 𝜋𝑟 (𝑎) = 4/5 and 𝜋𝑟 (𝑏) = 1/5. Pessimism is the mirror image
of optimism. For example, a pessimist with 𝛿 = 2 and 𝜓 = − log 2 would have an
odds ratio of 1/4, corresponding to 𝜋𝑟 (𝑎) = 1/5 and 𝜋𝑟 (𝑏) = 4/5.
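The arithmetic in these examples can be reproduced with a short sketch. It assumes, as the examples implicitly do, that 𝑎 and 𝑏 are the only two states, so that 𝜋𝑟(𝑎) + 𝜋𝑟(𝑏) = 1; the function name is illustrative.

```python
import math


def biased_probability(p_a, p_b, psi, delta):
    """pi_r(a) for two states, computed from the log odds-ratio in Equation 2.

    delta = r(a) - r(b) is the stake in a rather than b, in utility units.
    """
    log_odds = math.log(p_a / p_b) + psi * delta
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)


# The three cases from the text, with p(a) = p(b) = 1/2.
optimist_small_stake = biased_probability(0.5, 0.5, math.log(2), delta=1)
optimist_large_stake = biased_probability(0.5, 0.5, math.log(2), delta=2)
pessimist_large_stake = biased_probability(0.5, 0.5, -math.log(2), delta=2)
```

The three values come out at 2/3, 4/5, and 1/5 (up to floating-point error), matching the odds ratios of 2, 4, and 1/4 in the text.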
The magnitude of the bias is also dependent on the weight of the evidence,
represented by the indifference beliefs 𝑝. In the above examples the odds-ratio is
𝑝(𝑎)/𝑝(𝑏) = 1, corresponding to equal evidence on both sides. The bias is much less
if the evidence leans strongly on one side. Suppose, for example, that the evidence
favors 𝑏 with 𝑝(𝑎)/𝑝(𝑏) = 1/4, corresponding to 𝑝(𝑎) = 1/5 and 𝑝(𝑏) = 4/5. With
𝛿 = 1 and 𝜓 = log 2, the odds-ratio would double to 1/2, raising the probability
of the good state only to 1/3. A coefficient of relative optimism larger than log 4
would be necessary for 𝜋𝑟 (𝑎) to rise above 1/2.
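The threshold in the last sentence follows directly from Equation 2, again assuming that 𝑎 and 𝑏 are the only two states and writing 𝛿 = 𝑟(𝑎) − 𝑟(𝑏):

```latex
\begin{align*}
\pi_r(a) > \tfrac{1}{2}
  &\iff \log\frac{\pi_r(a)}{\pi_r(b)} > 0 \\
  &\iff \log\frac{p(a)}{p(b)} + \psi\delta > 0 \\
  &\iff \psi\delta > \log\frac{p(b)}{p(a)} = \log 4
  \quad\text{when}\quad \frac{p(a)}{p(b)} = \tfrac{1}{4}.
\end{align*}
```

With 𝛿 = 1 this gives 𝜓 > log 4. More generally, the stronger the evidence against 𝑎, the larger the product 𝜓𝛿 needed to overturn it.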
11 If we replace the utility function 𝑢 by a different utility function 𝑢′ related to 𝑢 by a positive affine transformation (𝑢′ = 𝑎𝑢 + 𝑏 where 𝑎 > 0) we need to replace 𝜓 by a different coefficient of relative optimism 𝜓′ = 𝜓/𝑎.
These comparative statics are readily seen in the case of a normal distribution
with linear reference stakes. Consider the effect of holding a financial asset on
the beliefs of a risk neutral investor. States represent the price of the asset, and
the initial stakes are a linear function of the state (the slope corresponding to the
size of the investment). As the following proposition shows, if beliefs are given by
a normal distribution over asset prices, the effect of the bias is to shift the entire
distribution, the shift being proportional (i) to the size of the investment, and (ii)
to the variance of the distribution:
Proposition 1 (normal distribution). Let 𝑃 and Π𝑟 denote the cumulative probability distribution functions that correspond to 𝑝 and 𝜋𝑟, and let 𝜓 denote the decision
maker’s coefficient of relative optimism. Suppose 𝑆 = ℝ, 𝑃 ∼ 𝒩(𝜇, 𝜎²) for some
𝜇, 𝜎 ∈ ℝ and 𝑟(𝑠) = 𝑎𝑠 + 𝑏 for some 𝑎, 𝑏 ∈ ℝ. Then Π𝑟 ∼ 𝒩(𝜇 + 𝜓𝑎𝜎², 𝜎²).
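Proof. By Equation 5 and the assumption that 𝑃 ∼ 𝒩(𝜇, 𝜎²),
\[
\begin{aligned}
\Pi_r(s) &\propto \int_{-\infty}^{s} e^{\psi r(t)}\, dp(t)
  = \int_{-\infty}^{s} e^{\psi(at+b)}\, \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(t-\mu)^2}{2\sigma^2}}\, dt \\
&= e^{\psi b} \int_{-\infty}^{s} \frac{1}{\sqrt{2\pi}\,\sigma}\,
  e^{-\frac{(t-(\mu+\psi a\sigma^2))^2 \,-\, 2\mu\psi a\sigma^2 \,-\, \psi^2 a^2 \sigma^4}{2\sigma^2}}\, dt \\
&\propto \int_{-\infty}^{s} \frac{1}{\sqrt{2\pi}\,\sigma}\,
  e^{-\frac{(t-(\mu+\psi a\sigma^2))^2}{2\sigma^2}}\, dt,
\end{aligned}
\]
which is the cumulative distribution function of 𝒩(𝜇 + 𝜓𝑎𝜎², 𝜎²).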
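Proposition 1 can also be checked numerically. The sketch below tilts a normal density by exp(𝜓𝑟(𝑠)) on a grid and computes the mean and variance of the result, to be compared with 𝜇 + 𝜓𝑎𝜎² and 𝜎². The parameter values and the grid settings are arbitrary choices made for illustration, not values from the paper.

```python
import math


def tilted_normal_moments(mu, sigma, psi, a, b, half_width=12.0, n=20001):
    """Mean and variance of the density proportional to
    exp(psi * (a*s + b)) * exp(-(s - mu)^2 / (2 sigma^2)),
    approximated by a Riemann sum on a uniform grid around mu.

    The normalizing constant of the normal density is omitted because it
    cancels when the weights are normalized.
    """
    lo = mu - half_width * sigma
    step = 2.0 * half_width * sigma / (n - 1)
    grid = [lo + i * step for i in range(n)]
    weights = [
        math.exp(psi * (a * s + b) - (s - mu) ** 2 / (2.0 * sigma ** 2))
        for s in grid
    ]
    total = sum(weights)
    mean = sum(w * s for w, s in zip(weights, grid)) / total
    var = sum(w * (s - mean) ** 2 for w, s in zip(weights, grid)) / total
    return mean, var


# Arbitrary illustrative parameters.
mu, sigma, psi, a, b = 100.0, 5.0, 0.02, 3.0, 1.0
mean, var = tilted_normal_moments(mu, sigma, psi, a, b)
predicted_mean = mu + psi * a * sigma ** 2  # shift stated in Proposition 1
```

With these parameters the predicted shift is 𝜓𝑎𝜎² = 1.5. The computed mean should agree with `predicted_mean` and the computed variance with 𝜎² = 25, up to numerical error.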
Optimism and pessimism are psychologically very far from Bayesian rationality. Nevertheless, it is actually possible to interpret the equations of the model
as Bayesian updating. According to this interpretation, optimists and pessimists
believe that Nature has selected the state of the world with their interests in mind,
making their interests a valuable source of information as to Nature’s choice. The
Bayes Rule analogue of Equation 2 is
\[
\log \frac{p(a \mid e)}{p(b \mid e)} = \log \frac{p(a)}{p(b)} + [\log p(e \mid a) - \log p(e \mid b)], \tag{6}
\]
where the reference stakes play the role of the evidence 𝑒, and the log likelihood in a
state 𝑠 is given by log 𝑝(𝑒|𝑠) = 𝜓𝑟(𝑠). Decision makers start with their indifference
beliefs, observe their reference interests 𝑟, and use Bayesian inference to update
their beliefs.12
12 It is also possible to give the optimism and pessimism models of Hey (1984), Bracha and Brown (2012), and Dillenberger et al. (2014) a Bayesian interpretation. However, in those models the information is not in the reference stakes, but in the choices that optimists and pessimists make. Moreover, making a choice is not merely informative about the state of the world, but actually changes it.
In order to identify the parameters of the model, we need to observe the decision maker’s beliefs with two different reference stakes. Once we identify 𝜓 we
can determine how a change of reference would alter beliefs. Consider parents
whose child is to be allocated randomly to one of two schools. The parents want
their child to be allocated to the better school, but they do not know which school
is better. States correspond to the identity of the better school. Ex-ante, the parents do not know the allocation, and have the same stake in both states. Since their
reference is constant, their beliefs are represented by the indifference probability
measure 𝑝. Once they learn which school their child would attend, their reference
changes to a higher utility in the state in which that school is better. According to
Equation 2, this change in reference causes a change in the parents’ beliefs. Optimistic (pessimistic) parents come to think more (less) highly of the school their
child has been allocated to, with the size of this effect being a function of their
utility function and of 𝜓. Such a change in beliefs in the absence of a change in
relevant information is an example of cognitive dissonance.
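The school example can be sketched numerically using the Bayesian reading given earlier, in which the distorted belief in a state is proportional to the indifference belief times exp(𝜓𝑟(𝑠)). The function name `distort_beliefs`, the utility values, and the values of 𝜓 below are illustrative assumptions, not quantities taken from the paper.

```python
import math

def distort_beliefs(prior, stakes, psi):
    """Update indifference beliefs `prior`, treating the reference stakes as evidence.

    The log likelihood of state s is psi * stakes[s], so the posterior is
    proportional to prior[s] * exp(psi * stakes[s]).
    """
    weights = [p * math.exp(psi * r) for p, r in zip(prior, stakes)]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical example: state 0 = "school A is better", state 1 = "school B is better".
prior = [0.5, 0.5]    # indifference beliefs before the allocation (assumed)
stakes = [1.0, 0.0]   # child allocated to school A (assumed utility units)

optimist = distort_beliefs(prior, stakes, psi=1.0)    # psi > 0: thinks more highly of school A
pessimist = distort_beliefs(prior, stakes, psi=-1.0)  # psi < 0: thinks less highly of school A
```

With a constant reference (equal stakes in both states) the function returns the indifference beliefs unchanged, which matches the ex-ante situation of the parents.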
Further assumptions are required if we want to be able to say whether the resulting beliefs are too high or too low, or even how they compare with the beliefs
of other equally informed individuals. One useful assumption is the following:
IPC (interpersonal comparability) The indifference beliefs of all equally informed
individuals coincide.
According to this assumption, differences in subjective beliefs are completely
determined by information: the normatively relevant information we normally
think of as information, and the normatively irrelevant “information” represented
by the reference stakes. Consider a group of equally informed parents, whose children were allocated to the two schools. Using this assumption we can say that optimistic parents whose child was allocated to the first school will be more positive
about that school than optimistic parents whose child was allocated to the second
school. In the absence of some such assumption we could only make a statistical
prediction about the distribution of beliefs in the two populations.13
Suppose two optimists find themselves in a conflict, in which their interests are
opposed. By similar reasoning, optimism creates a gap in the two sides’ evaluation
of the situation, with each party more positive about her own chances than the
other party’s valuation of her chances. This gap in beliefs can effectively destroy
13
This example is analogous to the experiment described in Section 3. The statistical inference
is only valid if the allocation is independent of ex-ante beliefs.
the chances of an efficient compromise.14
In many applications it is appropriate to go further, and assume a weak form
of rational expectations:
WRE (weak rational-expectations) Indifferent individuals have rational expectations.
This assumption makes it possible to say that optimistic parents overestimate
the true quality of the school their child was allocated to, that optimistic parties to a
conflict overestimate their true chances, etc.
Consider an individual who (naturally enough) does not want to fall ill. An
optimist would overestimate the likelihood of health, and consequently have an
inefficiently high reservation price for insurance, skip health checks, and make few
preparations in case she does fall ill. A pessimist would overestimate the likelihood
of disease, would have an inefficiently low reservation price for insurance, have
inefficiently many health checks, and spend too much time planning for disaster to
strike.
The model applies in an exactly analogous way whenever people prefer one
state to another, whatever the reason. If, for example, people have social preferences, and want their friends to be healthy, they would be biased not only about
their own health, but also about that of their friends (though if they don’t care as
much about their friends’ health as they do about their own, they wouldn’t be as
biased about their friends’ health).15 If a person cares about morality, she would
be biased about the morality of her past actions and of her situation in life. If she
is rich, she would be biased to believe that her wealth is deserved.
3 Experiment
Section 3.1 describes the implementation and protocol, Section 3.2 describes the specifics
of the belief elicitation procedure, and Section 3.3 derives the predictions of the standard model, Optimal Expectations, and Priors and Desires.
Finally, Section 3.4 describes how the data was analyzed.
14
See Loewenstein et al. (1993), Babcock et al. (1995), and Babcock and Loewenstein (1997)
for related evidence from pretrial bargaining and other disputes.
15
See Weinstein (1989) for findings consistent with this.
3.1 Implementation and protocol
The experiment was conducted at the Centre for Experimental Social Science
(CESS) at Nuffield College, University of Oxford. The subject pool consisted of
Oxford students who registered on the CESS website for participation in experiments. Business, finance, and economics students were excluded. A week before
each session students meeting the sample restrictions received an email inviting
them to participate in an experiment that would require one hour of their time.
Further details were given on-site prior to the experiment itself. Registration was
via an online form, allowing students to select one of several sessions, up to an
upper limit of 14 students per session. Taking no-shows into account, sessions
consisted of between 10 and 13 students. Altogether, 145 students took part in the
experiment, of whom 57 were male and 88 female. The median age was 22.
Sessions were conducted in the afternoon over a total of six days. There were
12 sessions altogether. Half the sessions consisted of Farmers, and half of Bakers.
The order of sessions was randomized in order to prevent any consistent relationship between the time of day in which a session was held, and the role given to the
subjects who took part in that session.
After subjects were seated, they were each given a copy of the instructions,
which they were able to refer to until the experiment ended. The instructions were
also read aloud, and there was an opportunity for subjects to ask questions. The
experiment itself consisted of 13 periods, the first of which was a training period,
and the remaining 12 were earning periods. A given set of 13 charts was used
throughout the experiment. One of these 13 charts was reserved for the training
period, and the other 12 charts were used for the earning periods (Figure 2). The
order of presentation was randomized independently between subjects. At the end
of the experiment, each subject had one earning period chosen at random, and was
paid in accordance with the payoff in that period.
The experiment was conducted in a computer lab, and was programmed using
z-Tree (Fischbacher, 2007). Figure 1 shows an example of the interface. In each
period subjects were shown a chart of wheat prices, and were asked to predict the
price of wheat at some future date. Subjects were thus put in a somewhat similar
position to speculators who ignore fundamental information, and predict future asset prices on the basis of historical price charts.16 In order to maximize the realism
of the task, prices were adapted from real financial markets. The specific source
was historical stock prices, scaled and shifted to fit into a uniform range. Charts
16
Traders refer to the use of historical price charts in making buy and sell decisions as Technical
Analysis (Murphy, 1999; Edwards and Magee, 2010).
were selected to include a variety of situations. Time was standardized across
charts, so that all charts had space for prices going from day 0 to day 100. Subjects were only shown prices up to an earlier date, and the task was to predict what
the price of wheat would be at day 100. The price range was also standardized, so
that prices were always between £4,000 and £16,000.
After submitting their prediction, subjects were presented with a waiting screen
until all other subjects had also made their prediction. There was therefore little
or no incentive for speed. The transition to the next period only occurred after
all the subjects in the room had submitted their prediction. A brief questionnaire
was administered following the final period of the experiment. After all subjects
completed the questionnaire, subjects were informed of their earnings, and were
called to receive their payment.
Farmers were instructed that the price of wheat varies between £4,000 and
£16,000, that it had cost them £4,000 to grow the wheat, and that they would
be selling their wheat for the price that would obtain at day 100. Their notional
profit was therefore between zero and £12,000, depending on the day 100 price.
The payoff at the end of the experiment consisted of three parts: an unconditional
£4 participation fee, profit from the sale of the wheat, and a prediction accuracy
bonus. In the baseline sessions subjects received £1 in real money for each £1,000
of notional profit, and could earn up to an extra £1 from making a good prediction.
The prediction procedure and bonus formula are explained in detail in Section 3.2.
Bakers were told that they make bread, which they would sell for a known price
of £16,000, and that in order to make the bread they would be buying wheat at the
price that would obtain at day 100. The range of notional profit was therefore the
same as that of Farmers, and all other particulars were also the same. The one
difference was that Farmers gained from high wheat prices, whereas Bakers
gained from low prices.
Sessions differed in the scale of the accuracy bonus and in the stakes (the degree
to which payoff depended on the price level at day 100). In the baseline sessions
the maximum obtainable bonus was £1, and the amount received for each £1,000
of notional profit was also £1. Sessions were also conducted with a bonus level of
£2 and £5, and with stakes of 50 pence for each £1,000 of notional profit.17 Table 1
lists the number of sessions in each condition.
17
In sessions with lower stakes, subjects received an additional £3, so that the average payoff
was the same as in the baseline sessions.
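The payoff structure described above can be summarized in a short sketch. The function name `payoff` and its argument names are hypothetical; the fee, cost, selling price, and top-up amounts are the ones stated in the text and in footnote 17, and the accuracy bonus is taken as given rather than computed.

```python
def payoff(role, price, stakes=1.0, bonus=0.0):
    """Real-money payoff in pounds for the chosen earning period (a sketch).

    role   : "farmer" or "baker"
    price  : day 100 wheat price in notional pounds, between 4000 and 16000
    stakes : pounds paid per 1,000 pounds of notional profit (1.0 or 0.5)
    bonus  : prediction accuracy bonus in pounds, computed elsewhere
    """
    if not 4000 <= price <= 16000:
        raise ValueError("price outside the standardized range")
    if role == "farmer":
        notional_profit = price - 4000    # grew the wheat for 4,000, sells at the day 100 price
    elif role == "baker":
        notional_profit = 16000 - price   # sells bread for 16,000, buys wheat at the day 100 price
    else:
        raise ValueError("unknown role")
    participation_fee = 4.0
    top_up = 3.0 if stakes == 0.5 else 0.0  # extra 3 pounds in the lower-stakes sessions
    return participation_fee + top_up + stakes * notional_profit / 1000 + bonus
```

At the midpoint price of £10,000 both roles earn the same amount, and the top-up leaves the lower-stakes payoff at that price equal to the baseline one.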
Table 1: The number of sessions for each combination of bonus scale and stakes.

bonus (a)   stakes (b)   sessions (c)   subjects
1           1            4              49 (25 Farmers, 24 Bakers)
2           1            2              26 (13 Farmers, 13 Bakers)
5           1            4              44 (23 Farmers, 21 Bakers)
1           0.5          2              26 (12 Farmers, 14 Bakers)

(a) The amount in pounds subjects received for an optimal prediction of the day 100 price. The
larger it was, the more subjects had to gain from holding accurate beliefs. The bonus for less
good predictions was scaled accordingly.
(b) The amount in pounds subjects received for each £1,000 of notional profit. The larger the
stakes, the more subjects had to gain from the day 100 price being high (if they were
Farmers), or low (if they were Bakers).
(c) Sessions were conducted in pairs: one for Farmers and the other for Bakers.
3.2 The belief elicitation procedure
The belief elicitation procedure was designed with two goals in mind. The first
was to make it possible to test for the presence or absence of wishful thinking bias,
namely a systematic difference in beliefs between Farmers and Bakers. The second
was to obtain a measure of the degree of subjective uncertainty in the predictions
subjects make. This was important both for testing whether the magnitude of the
bias is greater in charts with more subjective uncertainty, and for testing whether
more confident individuals are also more biased.
In each period subjects were asked to report two numbers: a prediction and a
confidence level. The prediction represented the expected day 100 price, and could
be any number in the range of possible prices. The confidence level represented
the (inverse of) the uncertainty in the prediction, and was reported on a 1-10 scale.
In order to give meaning to the 1-10 confidence scale, the instructions included
visual examples of distributions with different prediction and confidence levels
(Figure 3). The distributions were the weighted average of a normal distribution
and a uniform one, with almost all the weight given to the normal. The prediction
corresponded to the mean of the normal distribution, and the confidence level was
inversely proportional to its standard deviation. The density corresponding to a
prediction of 𝑚 ∈ [4000, 16000] and confidence level 𝑟 ∈ [1, 10] was
$$q(x) = (1-\epsilon)\,\mathcal{N}\!\left(x \mid m, (\lambda r)^{-2}\right) + \epsilon \tag{7}$$
where 𝒩 (⋅|𝜇, 𝜎 2 ) represents a normal distribution with a given mean and variance,
𝜆 is a scale parameter, translating the 1-10 confidence scale into the scale of prices,
and 𝜖 is the weight given to the uniform component. The effect of the latter was to
ensure that the density was bounded below by 𝜖, including at prices far from the
prediction.
The scoring rule was logarithmic: subjects whose prediction and confidence
level corresponded to a density 𝑞 received a bonus given by
𝑏(𝑥) = 𝛼 log (𝑞(𝑥)/𝜖)
(8)
where 𝑥 is the true day 100 price, and 𝛼 is a parameter which determines the maximum bonus level.18 As 𝑞 ≥ 𝜖 (Equation 7), the bonus was positive for all possible
predictions. The value of 𝛼 was calibrated for the maximum bonus level in the
session (Table 1).
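The density in Equation 7 and the bonus in Equation 8 can be sketched as follows. The values of 𝜆 and 𝜖 are not reported in this section, so the constants `LAMBDA` and `EPSILON` are purely illustrative assumptions. The calibration in `alpha_for`, which sets 𝛼 so that a perfect prediction reported at confidence 10 earns the maximum bonus, is also an assumption about what "calibrated" means here.

```python
import math

LAMBDA = 1 / 2000   # assumed scale: confidence 10 corresponds to a standard deviation of 200
EPSILON = 1e-6      # assumed weight of the uniform component (the floor of the density)

def density(x, m, r, lam=LAMBDA, eps=EPSILON):
    """Equation 7: normal with mean m and standard deviation 1/(lam*r), plus the floor eps."""
    sd = 1.0 / (lam * r)
    normal = math.exp(-0.5 * ((x - m) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
    return (1 - eps) * normal + eps

def alpha_for(max_bonus, lam=LAMBDA, eps=EPSILON):
    """Assumed calibration: a perfect prediction at confidence 10 earns exactly max_bonus."""
    return max_bonus / math.log(density(0.0, 0.0, 10, lam, eps) / eps)

def bonus(x_true, m, r, max_bonus=1.0):
    """Equation 8: logarithmic score of the reported density at the realized price."""
    return alpha_for(max_bonus) * math.log(density(x_true, m, r) / EPSILON)
```

Because the density never falls below 𝜖, the bonus never falls below zero, and a confident prediction that turns out to be wrong earns less than a cautious one with the same mean.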
To see under what conditions the scoring rule is incentive compatible, let 𝑃
denote the probability measure representing the subject’s true beliefs, and suppose the subject reports a prediction 𝑚 and a confidence level 𝑟. The subjective
expectation of the bonus is given by the following expression:
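$$
\begin{aligned}
\mathbb{E}_P[b(x)] &= \int p(x)\,\alpha \log\frac{q(x)}{\epsilon}\,dx
= \alpha\left(\int p(x)\log\frac{q(x)}{p(x)}\,dx + \int p(x)\log p(x)\,dx - \log\epsilon\right) \\
&= \alpha\big(-D_{KL}(P\|Q) - H(P) - \log\epsilon\big)
\end{aligned}
\tag{9}
$$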
where 𝐷KL (𝑃 ||𝑄) is the Kullback-Leibler divergence (KL-divergence or relative
entropy) between 𝑃 and 𝑄, and 𝐻(𝑃 ) is the entropy of 𝑃 . Maximizing the expected bonus with respect to 𝑄 is thus equivalent to minimizing the KL-divergence
𝐷KL (𝑃 ||𝑄). According to a standard result, 𝐷KL (𝑃 ||𝑄) ≥ 0 for all 𝑄, and is minimized if 𝑄 = 𝑃 .19
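A small numerical check illustrates the incentive compatibility argument. As a simplifying assumption, the sketch ignores the uniform component (𝜖 → 0) and takes both the true beliefs and the report to be normal, in which case the expected log score has a closed form. The function name and the belief parameters are hypothetical.

```python
import math

def expected_log_score(mu, sigma, m, s):
    """E_P[log q(x)] when true beliefs are N(mu, sigma^2) and the report is N(m, s^2)."""
    return -0.5 * math.log(2 * math.pi * s ** 2) - (sigma ** 2 + (mu - m) ** 2) / (2 * s ** 2)

mu, sigma = 10000.0, 500.0                        # assumed true beliefs
means = [9000.0 + 100.0 * i for i in range(21)]   # candidate reported means
sds = [100.0 * i for i in range(1, 21)]           # candidate reported standard deviations

# Grid search for the report that maximizes the expected score.
best_m, best_s = max(
    ((m, s) for m in means for s in sds),
    key=lambda report: expected_log_score(mu, sigma, *report),
)
```

On this grid the maximum is attained at the truthful report, where the reported mean and standard deviation equal those of the true beliefs.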
The scoring rule works best if subjects are risk neutral and beliefs are well approximated by a density in the family described by Equation 7. The scoring rule
should then successfully elicit the prediction and confidence level for each subject
in each chart, making it possible to identify the difference in beliefs between Farmers and Bakers, the average subjective uncertainty in each chart, and the average
confidence for each subject.
18
The logarithmic scoring rule was introduced in Good (1952). See Gneiting and Raftery (2007)
for a recent discussion and comparison to other scoring rules.
19
This result, known as Gibbs' inequality, follows directly from the fact that log 𝑥 is a concave
function (Cover and Thomas, 1991). The instructions explained that the expected bonus is maximized by reporting a prediction and confidence level that reflect the subject’s beliefs about the day
100 price. The bonus formula itself was included in a footnote.
One potential difficulty is hedging.20 Consider a risk-averse Farmer. Her profit
is increasing in the price, and she would therefore prefer to receive the bonus in
states in which the price is relatively low. Consequently, she could increase her
subjective expected utility by reporting a lower number than her true beliefs. By a
similar logic, a risk-averse Baker would be better-off by reporting a higher number.
The result would be a downward bias in the estimated difference in beliefs between
Farmers and Bakers.
A second potential problem is the possibility that the beliefs of some subjects
are bi-modal, or otherwise not well approximated by a density in the family described by Equation 7. This could make it harder for subjects to see what prediction would maximize their payoff, making predictions within each group more
variable than they would be otherwise. This increase in variance would translate into more noise in the estimated difference in beliefs between the two groups,
though it should not result in bias.
3.3 Predictions
This section develops the predictions of the standard model, the Priors and Desires
model, and the Optimal Expectations model. The following timing framework
is used: at 𝑡 = 0 subjects observe a price chart and form their beliefs over the
day 100 price; at 𝑡 = 1 they report their prediction and confidence level, and
consume anticipatory utility; at 𝑡 = 2 the day 100 price is revealed, and payoffs are
realized. Subjects are assumed to be risk neutral. Beliefs about the day 100 price
are represented by a distribution from the family described by Equation 7. Given
these assumptions, the prediction made at 𝑡 = 1 coincides with the 𝑡 = 1 beliefs.
3.3.1 The standard model
Subjects are allocated into the Farmer and Baker roles randomly. The 𝑡 = 0 beliefs
of Farmers and Bakers are therefore drawn from the same distribution.
Since the prediction coincides with the 𝑡 = 1 beliefs, and as no new information
is observed between 𝑡 = 0 and 𝑡 = 1, it follows that predictions are also drawn
from the same distribution. Consequently, there is no systematic difference in
predictions between Farmers and Bakers.
20
Blanco et al. (2008) find evidence of hedging in belief reporting when opportunities are transparent and incentives are strong. Armantier and Treich (2010) discuss hedging in probability elicitation.
3.3.2 Optimal Expectations
Optimal expectations agents choose their prior beliefs in order to maximize their
discounted subjective expected utility, where each period’s instantaneous utility
includes anticipatory utility as well as standard consumption utility.
The payoff in the experiment is realized at 𝑡 = 2, and consists of two components: the profit and the accuracy bonus. The profit is a function of the true price,
while the bonus depends on the accuracy of the 𝑡 = 1 beliefs. Anticipatory utility is proportional to the expected value of the profit and bonus, with expectations
computed using the 𝑡 = 1 beliefs. The more optimistic those beliefs are, the higher
is anticipatory utility, but the less accurate the prediction is likely to prove. The
𝑡 = 0 decision maker choosing her 𝑡 = 1 beliefs therefore faces a trade-off: more
bias increases the anticipatory utility experienced at 𝑡 = 1, but lowers the expected
value of the 𝑡 = 2 consumption utility.
Let 𝑃 and 𝑄 denote the probability distributions representing the 𝑡 = 0 and
𝑡 = 1 beliefs respectively. At 𝑡 = 0 the agent maximizes a weighted sum of the
𝑡 = 1 anticipatory utility and 𝑡 = 2 realized payoff. Let 𝜂 denote the weight
given to anticipatory utility, so that the weight given to the realized payoff is 1 − 𝜂.
Letting 𝑥 denote the true day 100 price, the profit can be written as 𝜙𝜅𝑥 + 𝑙, where
𝑙 is a constant, 𝜅 represents the stakes (the absolute value of the slope
relating the profit to the day 100 price), and 𝜙 denotes the direction, with 𝜙 = 1
for Farmers and 𝜙 = −1 for Bakers. I denote the bonus by 𝑏(𝑥), where 𝑏 is defined
by Equation 8. The 𝑡 = 0 maximand can thus be written as follows:
𝑊 = 𝜂𝔼𝑄 [𝜙𝜅𝑥 + 𝑏(𝑥)] + (1 − 𝜂)𝔼𝑃 [𝜙𝜅𝑥 + 𝑏(𝑥)] + 𝑙
(10)
In order to derive the comparative statics of the bias in closed form I make a
couple of simplifying assumptions. First, I assume that 𝑃 and 𝑄 are normal:
𝑃 = 𝒩(𝜇₀, 𝜎₀²), and 𝑄 = 𝒩(𝜇₁, 𝜎₁²). Second, I assume that only the mean of
𝑄 is subject to bias, i.e. 𝜎₁ = 𝜎₀ = 𝜎. Given these assumptions and using Equation 9, we can rewrite Equation 10 as follows:
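$$
\begin{aligned}
W &= \eta\,\mathbb{E}_Q[\phi\kappa x + b(x)] + (1-\eta)\,\mathbb{E}_P[\phi\kappa x + b(x)] + l \\
&= \eta\big(\phi\kappa\mu_1 - \alpha H(Q) - \alpha D_{KL}(Q\|Q) - \alpha\log\epsilon\big) \\
&\quad + (1-\eta)\big(\phi\kappa\mu_0 - \alpha D_{KL}(P\|Q) - \alpha H(P) - \alpha\log\epsilon\big) + l \\
&= \eta\big(\phi\kappa\mu_1 - \alpha H(Q)\big) - (1-\eta)\,\alpha D_{KL}(P\|Q) + C
\end{aligned}
\tag{11}
$$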
where 𝐶 collects factors that are independent of 𝑄. The two terms that depend on
𝑄 represent, respectively, the gain in anticipatory utility from adopting optimistic
beliefs, and the cost in expected realized payoff of adopting such beliefs. The
gain term has two components. The first represents the anticipated profit, and is
proportional to 𝜇1 = 𝔼𝑄 [𝑥]. The second represents the anticipated bonus, and
is decreasing in the degree of uncertainty in 𝑄, measured by its entropy 𝐻(𝑄).
The gain term is thus larger the more favorable is the expected day 100 price, and
the more certain the subject is about her prediction. The cost term represents the
reduction in expected bonus due to the bias in the prediction that follows from the
bias in the 𝑡 = 1 beliefs, and is proportional to the Kullback-Leibler divergence
between the 𝑡 = 0 beliefs 𝑃 and the 𝑡 = 1 beliefs 𝑄. Thus, if the subject cared only
about the realized payoff she would choose not to bias her beliefs at all (𝑄 = 𝑃 ).
If, instead, she cared only about her 𝑡 = 1 instantaneous utility, she would choose
to believe that the most favorable price would be realized,21 and would further
choose to assign this prediction as little subjective uncertainty as possible.
If 𝜂 is sufficiently large, the optimal choice of 𝜇1 would be an extreme value in
the favorable direction. Otherwise, the optimal value of 𝜇1 would be at an internal
point, where 𝜕𝑊 /𝜕𝜇1 = 0. Since we do not observe subjects making extreme predictions I assume that 𝜂 is small enough that the optimal value of 𝜇1 is at an internal
point. Using the standard formula for the KL-divergence between two normal distributions (Johnson and Sinanovic, 2001), and noting that 𝐻(𝑄) is independent of
𝜇1 , the derivative can be written as follows:
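$$
\begin{aligned}
\frac{\partial W}{\partial \mu_1} &= \eta\phi\kappa - \eta\alpha\frac{\partial H(Q)}{\partial \mu_1} - (1-\eta)\,\alpha\frac{\partial D_{KL}(P\|Q)}{\partial \mu_1} \\
&= \eta\phi\kappa - (1-\eta)\,\alpha\frac{\mu_1-\mu_0}{\sigma^2}
\end{aligned}
\tag{12}
$$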
Setting the derivative to zero and solving for 𝜇1 we obtain the following expression
for the bias:
$$\mu_1 - \mu_0 = \phi\left(\frac{\eta}{1-\eta}\right)\left(\frac{\kappa\sigma^2}{\alpha}\right) \tag{13}$$
where 𝜅 represents the stakes, or the degree to which the profit is dependent on the
value of the day 100 price, 𝜎 2 represents the degree of subjective uncertainty, and
𝛼 represents the scale of the accuracy bonus, or the cost of holding biased beliefs.
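The step from the first-order condition to the expression for the bias can be spelled out as follows. The only ingredient not written out in the text is the closed form of the KL-divergence between two normal distributions with a common variance, which is a standard result; everything else uses the symbols already defined in this section.

```latex
\begin{align*}
D_{KL}(P\|Q) &= \frac{(\mu_1-\mu_0)^2}{2\sigma^2}
  &&\Longrightarrow&
  \frac{\partial D_{KL}(P\|Q)}{\partial\mu_1} &= \frac{\mu_1-\mu_0}{\sigma^2}, \\
\eta\phi\kappa &= (1-\eta)\,\alpha\,\frac{\mu_1-\mu_0}{\sigma^2}
  &&\Longrightarrow&
  \mu_1-\mu_0 &= \phi\,\frac{\eta}{1-\eta}\cdot\frac{\kappa\sigma^2}{\alpha}.
\end{align*}
```

Since the objective is a concave quadratic in 𝜇1, the stationary point is a maximum whenever it lies inside the range of admissible prices.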
Equation 13 describes the bias in the beliefs of one particular individual. The
21
That is, the highest possible price of £16,000 if a Farmer, and the lowest possible price of
£4,000 if a Baker.
prediction for the average bias in the population of subjects in the same role is
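$$\mathbb{E}[\mu_1 - \mu_0] = \mathbb{E}[\mu_1] - \mathbb{E}[\mu_0] = \phi\,\mathbb{E}\!\left[\frac{\eta}{1-\eta}\right]\left(\frac{\kappa\sigma^2}{\alpha}\right) \tag{14}$$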
where I allow for the possibility that 𝜂 varies between individuals, but assume that
it is independent of 𝜎 2 (because of the random assignment 𝜂 is independent of 𝜅 and
𝛼). Finally, it also follows from the random allocation that the undistorted beliefs
of Farmers and Bakers are drawn from the same distribution, and that in particular
𝔼𝜇0 is the same in both groups. The expected difference in beliefs between the two
groups is therefore given by
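$$b_{\text{optimal expectations}} = 2\,\mathbb{E}\!\left[\frac{\eta}{1-\eta}\right]\left(\frac{\kappa\sigma^2}{\alpha}\right) \propto \frac{\kappa\sigma^2}{\alpha} \tag{15}$$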
Optimal Expectations thus implies a systematic difference in beliefs between Farmers and Bakers that is proportional to the stakes and to the degree of subjective
uncertainty, and inversely proportional to the cost of getting beliefs wrong.
3.3.3 Priors and Desires
The payoff-function in the experiment is the mapping linking the subject’s payoff
to the day 100 price.22 Using the same notation as in Section 3.3.2, the payoff-function is given by 𝑟(𝑥) = 𝜙𝜅𝑥 + 𝑙, where 𝑥 is the day 100 price, 𝜅 represents the
stakes, or the slope relating payoff to the day 100 price, and 𝜙 denotes the direction,
with 𝜙 = 1 for Farmers and 𝜙 = −1 for Bakers. Suppose, as in Section 3.3.2, that
undistorted beliefs are given by a normal distribution 𝑃 = 𝒩 (𝜇0 , 𝜎 2 ). According
to Proposition 1 the distorted probability measure is given by 𝑄 = 𝒩 (𝜇1 , 𝜎 2 ),
where
𝜇1 − 𝜇0 = 𝜙𝜓𝜅𝜎 2
(16)
This equation describes the bias in the beliefs of some particular individual, and
is the Priors and Desires analogue of Equation 13. By analogy with Section 3.3.2,
the expected difference in beliefs between Farmers and Bakers is
𝑏priors and desires = 2𝔼[𝜓]𝜅𝜎 2 ∝ 𝜅𝜎 2
(17)
Comparing this result to Equation 15, we see that—as with Optimal Expectations—
the magnitude of the bias is proportional to the stakes 𝜅 and the degree of subjective
22
In principle, it should be the payoff in utility terms, but I am assuming throughout this section
that subjects are risk neutral over small amounts of money.
uncertainty 𝜎 2 . However, whereas in Optimal Expectations the magnitude of the
bias is inversely proportional to the cost of getting beliefs wrong 𝛼, the magnitude
of the bias in Equation 17 is independent of 𝛼.
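The contrast between the two models can be made concrete with a short sketch of Equations 15 and 17. All parameter values below are hypothetical and chosen only to illustrate the comparative statics; they are not estimates from the experiment.

```python
def gap_optimal_expectations(mean_eta_ratio, kappa, sigma2, alpha):
    """Equation 15: expected Farmer-Baker gap, with mean_eta_ratio = E[eta / (1 - eta)]."""
    return 2 * mean_eta_ratio * kappa * sigma2 / alpha

def gap_priors_and_desires(mean_psi, kappa, sigma2):
    """Equation 17: expected Farmer-Baker gap, which does not involve alpha."""
    return 2 * mean_psi * kappa * sigma2

# Hypothetical parameter values (assumptions, not estimates).
kappa, sigma2 = 1.0, 250_000.0
for alpha in (1.0, 2.0, 5.0):
    oe = gap_optimal_expectations(0.001, kappa, sigma2, alpha)
    pd = gap_priors_and_desires(0.001, kappa, sigma2)
    print(f"alpha={alpha}: optimal expectations {oe:.0f}, priors and desires {pd:.0f}")
```

Doubling 𝛼 halves the Optimal Expectations gap and leaves the Priors and Desires gap unchanged, while both gaps scale linearly with 𝜅 and 𝜎².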
3.4 Analysis
As noted in Section 3.2, hedging could lead to a downward bias in estimating the
difference in beliefs between Farmers and Bakers. In order to minimize this risk, a
questionnaire was administered after the experiment itself was concluded, in which
subjects were asked whether they always reported their best guesses, or whether
they sometimes reported a higher or lower number. Out of a total of 145 students
who took part in the experiment, 132 claimed to have always reported their best
guess, and 13 admitted to an intentional bias in their predictions. Observations
from these 13 subjects were excluded from the main analysis.
The raw data from the experiment consist of the predictions and confidence
levels reported by individual subjects in individual charts. The primary goal in
analyzing the data was to determine whether predictions were affected by wishful
thinking. Let 𝑦𝑖𝑗 denote the prediction made by subject 𝑖 in chart 𝑗, and let 𝑡𝑖 ∈
{1, −1} denote whether subject 𝑖 is a Farmer or a Baker. We want to know whether
𝑦𝑖𝑗 is systematically higher if 𝑡𝑖 = 1. In order to answer this question formally I
used the following regression model:
$$y_{ij} = 0.5\,\beta t_i + \sum_j \gamma_j d_j + \epsilon_{ij} \tag{18}$$
where 𝑑𝑗 is a dummy for chart 𝑗, and 𝜖𝑖𝑗 is the error term. The value of 𝛽 represents
the contribution of wishful thinking. The null hypothesis is that 𝛽 ≤ 0.
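The point estimate of 𝛽 in this regression can be sketched without a regression library. By the Frisch-Waugh-Lovell theorem, partialling out the chart dummies amounts to demeaning both the prediction and the regressor 0.5𝑡𝑖 within each chart. The function name, the tuple layout, and the synthetic data are hypothetical, and the sketch does not compute the clustered standard errors used in the paper.

```python
from collections import defaultdict

def estimate_beta(observations):
    """OLS estimate of beta in y_ij = 0.5*beta*t_i + sum_j gamma_j*d_j + e_ij.

    `observations` is a list of (subject, chart, t, y) tuples, with t = +1 for
    Farmers and t = -1 for Bakers. Chart dummies are partialled out by
    demeaning y and 0.5*t within each chart.
    """
    by_chart = defaultdict(list)
    for _, chart, t, y in observations:
        by_chart[chart].append((0.5 * t, y))
    num = den = 0.0
    for rows in by_chart.values():
        x_bar = sum(x for x, _ in rows) / len(rows)
        y_bar = sum(y for _, y in rows) / len(rows)
        num += sum((x - x_bar) * (y - y_bar) for x, y in rows)
        den += sum((x - x_bar) ** 2 for x, _ in rows)
    return num / den

# Synthetic, noise-free data (hypothetical): two Farmers, two Bakers, three charts.
true_beta = 400.0
chart_levels = {"A": 9000.0, "B": 10000.0, "C": 11000.0}
roles = {1: 1, 2: 1, 3: -1, 4: -1}
data = [(i, j, t, level + 0.5 * true_beta * t)
        for i, t in roles.items() for j, level in chart_levels.items()]
beta_hat = estimate_beta(data)
```

With the 0.5 scaling and 𝑡𝑖 ∈ {1, −1}, the coefficient is directly the Farmer-minus-Baker difference in predictions, which is why the noise-free example recovers `true_beta`.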
The second goal in analyzing the data was to investigate the comparative statics of the bias. This required estimating the bias separately in different subsamples
of interest. Let 𝐾 denote a partition of the sample, indexed by 𝑘, and let 𝑐𝑖𝑗𝑘 denote
a dummy for whether the prediction of subject 𝑖 in chart 𝑗 belongs to subsample
𝑘. Assuming wishful thinking is the only systematic source of difference in predictions between subjects, we can generalize Equation 18 as follows:
$$y_{ij} = 0.5 \sum_{k \in K} \beta_k c_{ijk} t_i + \sum_j \gamma_j d_j + \epsilon_{ij} \tag{19}$$
In this equation 𝛽𝑘 represents the average difference in predictions between Farmers and Bakers in class 𝑘, and can be used to define formal comparative statics
hypotheses.
Unobserved factors may result in a correlation in the predictions made by the
same subject in different charts, so that 𝜖𝑖𝑗 may be correlated with 𝜖𝑖𝑘 for 𝑗 ≠ 𝑘.
In order to allow for this possibility, standard errors are clustered by subject in all
regressions.
4 Results
This section presents the results of the experiment, starting with the overall difference in predictions between Farmers and Bakers, and continuing with the comparative statics of the bias. Parameter estimates and statistical test results are presented in summary form in Table 3. Figures 4 and 5 provide a graphical illustration
of the results.
4.1 Wishful thinking bias
The overall magnitude of the wishful thinking bias corresponds to the systematic
difference in predictions between Farmers and Bakers across the entire sample,
represented by the value of 𝛽 in Equation 18. The estimate for this number is
£452, measured with a robust standard error of £123. The null-hypothesis that it
is non-positive is rejected with a 𝑝-value of 0.0002.
This estimate excludes observations from the 13 subjects who admitted in the
post experiment questionnaire to biased reporting of their beliefs (Section 3.4).
If these subjects are nonetheless included, the estimate goes down to £390. This
difference is consistent with the prediction that risk-averse Farmers (Bakers) would
intentionally understate (overstate) their estimates of the day 100 price.
The observed difference in predictions between Farmers and Bakers can be
explained by wishful thinking, but not by ego-utility or by a cognitive bias.
4.2 Incentives for accuracy
Self deception predicts that the magnitude of the bias would be decreasing in the
incentives for accuracy, while Priors and Desires predicts that it would remain the
same. In order to determine whether higher incentives for accuracy result in lower
bias, Equation 19 was used to estimate the difference in beliefs between Farmers
and Bakers separately in sessions with different levels of accuracy bonus (Table 1).
The results in Table 3 are that the estimated bias is actually greater in sessions
with a higher bonus level, the point estimates being 298, 560, and 645, respectively.
On the face of it, these results are consistent with neither type of model. Formal
testing, however, reveals that the apparent increase in the magnitude of the bias
may well be random (𝑝 < 0.4026). The data is, therefore, consistent with the
prediction of Priors and Desires that the magnitude of the bias would be invariant
to changes in the incentives for holding accurate beliefs.
The same is not true, however, for Optimal Expectations. The prediction of this
model is that the magnitude of the bias would be inversely proportional to the scale
of the accuracy bonus (Section 3.3.2). That is, the bias in £2 bonus sessions should
be half the size of the bias in £1 bonus sessions, and the bias in £5 bonus sessions
should be one fifth the size. This prediction is rejected by the data (𝑝 < 0.0140).23
The first panel of Figure 5 shows these results graphically. Though the point
estimates are increasing in the maximum level of the accuracy bonus, a horizontal
line can be comfortably fitted within the confidence intervals. The same
is not true, however, for a hyperbolic curve.
4.3 Subjective uncertainty
According to both Optimal Expectations (Section 3.3.2) and Priors and Desires
(Section 3.3.3), the magnitude of the bias should be increasing in the degree of
subjective uncertainty. In order to test this prediction, I divided the 12 charts used
in the paying periods into two equal sized groups by the degree of subjective uncertainty in the chart, and used Equation 19 to estimate the bias separately in the two
subsamples.24 I used two different measures of subjective uncertainty. The first
was based on the confidence ratings that subjects provided: charts were classified
into the high (low) subjective uncertainty group if the mean (across all subjects) of
the confidence rating for the chart was below (above) median. The second measure
of uncertainty was the within group variance of predictions: charts were classified
into the high (low) subjective uncertainty group if the within group variance of predictions for that chart was above (below) median. In practice, the two measures
resulted in nearly identical classifications.
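The first classification rule is a median split, which can be sketched as follows. The function name and the confidence ratings in the example are hypothetical; only the rule itself (below-median mean confidence means high subjective uncertainty) comes from the text.

```python
from statistics import median

def high_uncertainty_charts(mean_confidence):
    """Return the charts whose mean confidence rating is below the median across charts."""
    cutoff = median(mean_confidence.values())
    return {chart for chart, conf in mean_confidence.items() if conf < cutoff}

# Hypothetical mean confidence ratings for four charts (the experiment used 12).
ratings = {"A": 4.0, "B": 6.5, "C": 5.0, "D": 7.0}
high = high_uncertainty_charts(ratings)
low = set(ratings) - high
```

The second rule is the same split applied to the within group variance of predictions, with the direction reversed, so that above-median variance marks a chart as high uncertainty.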
Depending on the measure used, the estimated bias was 635 or 677 in the group
of high subjective uncertainty charts, and 269 or 227 in the low subjective uncertainty group. The null hypothesis—that the magnitude of the bias in high subjective uncertainty charts would be less than or equal to the magnitude of the bias in
23
This is a joint hypothesis test. The hypothesis that the magnitude of the bias in £5 sessions
is one fifth that of £1 sessions is rejected with a 𝑝-value of 0.0069.
24
Each subsample consists of observations from all subjects, but in only half the charts.
low subjective uncertainty charts—was rejected with a 𝑝-value of 0.0142 when using the first classification method, and a 𝑝-value of 0.0034 when using the second
(Table 3).
These results support the qualitative prediction that the magnitude of the bias
is increasing in the degree of subjective uncertainty. Given that the qualitative
prediction of the two models fits the data, it is interesting to try and test the specific functional form predicted by the two models. The quantitative prediction is
that the magnitude of the bias is linear in the variance of subjective uncertainty.
The following equation should thus prove to be a better model of the data than
Equation 18:
$$y_{ij} = 0.5\,\beta' \sigma_j^2 t_i + \sum_j \gamma_j d_j + \epsilon_{ij} \tag{20}$$
In this equation the 0.5𝛽𝑡𝑖 term in Equation 18 is replaced by 0.5𝛽 ′ 𝜎𝑗2 𝑡𝑖 , where 𝜎𝑗2
is the variance of subjective uncertainty in chart 𝑗.
Testing this quantitative prediction requires a good proxy for the variance of
subjective uncertainty. Using the above measures of subjective uncertainty, we
can identify 𝜎𝑗2 either with the square of the inverse mean confidence rating in
chart 𝑗, or with the mean within group prediction variance for chart 𝑗.25 Table 2
shows the resulting regression fit when estimating the two equations using both
proxies for the variance of subjective uncertainty, as well as the results of fitting a
model which includes both the 0.5𝛽𝑡𝑖 term of Equation 18 and the 0.5𝛽 ′ 𝜎𝑗2 𝑡𝑖 term of
Equation 20. The results show that Equation 20 indeed provides a better fit to the
data, consistent with the prediction that the magnitude of the bias is linear in the
degree of subjective uncertainty.
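The two proxies for 𝜎𝑗2 described above can be sketched as follows. This is a minimal illustration under assumed data layouts (one row per subject, one column per chart, and a boolean group indicator), and the function name `variance_proxies` is hypothetical.

```python
import numpy as np

def variance_proxies(confidence, prediction, group):
    """Return the two per-chart proxies for the variance of subjective uncertainty.

    confidence, prediction: arrays of shape (subjects, charts).
    group: boolean array of length subjects (assumed to mark one of the two groups).
    """
    # Proxy 1: square of the inverse mean confidence rating in each chart.
    proxy_conf = (1.0 / confidence.mean(axis=0)) ** 2
    # Proxy 2: prediction variance within each group, averaged over the two groups.
    within = [prediction[group == g].var(axis=0, ddof=1) for g in (False, True)]
    proxy_var = np.mean(within, axis=0)
    return proxy_conf, proxy_var

# Tiny made-up example: 4 subjects, 2 charts.
confidence = np.array([[2.0, 4.0]] * 4)
prediction = np.array([[1.0, 0.0], [3.0, 0.0], [10.0, 5.0], [14.0, 5.0]])
group = np.array([False, False, True, True])
proxy_conf, proxy_var = variance_proxies(confidence, prediction, group)
```

In this toy example the second chart has higher confidence and no within-group dispersion, so both proxies rank it as the less uncertain chart.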
The same results can also be seen graphically in the second and third panels of
Figure 5. Panel 2 plots the estimated wishful thinking bias in the 12 charts against
the mean prediction confidence in the chart, and panel 3 plots the same data against
the within group prediction variance. In both panels a curve is fitted to the data
using Equation 20.
4.4
Stakes
Optimal Expectations and Priors and Desires also predict that the magnitude of the
bias is increasing in the stake subjects have in what the day 100 price would be.
Payoff depends on the day 100 price via the notional profit, which is linear in the
25
This assumes a representative agent approximation.
Table 2: Testing whether the magnitude of the bias increases with the variance
of subjective uncertainty. Column 1 fits a model in which the bias is independent
of subjective uncertainty (Equation 18). Column 2 and 4 fit a model in which the
magnitude of the bias is linear in the variance (Equation 20). Columns 3 and 5 fit
a model which allows for both regressors. Method 1 and method 2 refer to the two
proxies for subjective uncertainty (Section 4.3). The 𝑡𝑖 𝜎𝑗2 variable is normalized
to have the same standard deviation as 𝑡𝑖 , so that the regression coefficients are
comparable in size. Robust standard errors are in parentheses. The regression R2
is computed after netting out the contribution of the chart dummies. Statistical
significance indicators: *** 𝑝 < 0.01, ** 𝑝 < 0.05, * 𝑝 < 0.1.
                     (1)        (2)        (3)        (4)        (5)
                                method 1   method 1   method 2   method 2
t_i                  452***                -473*                 -458*
                     (122)                 (259)                 (272)
t_i sigma_j^2                   503***     955***     497***     945***
                                (129)      (309)      (130)      (319)
R^2                  0.0181     0.0218     0.0230     0.0224     0.0237
day 100 price with a slope of 1. The amount of money received for each £1,000 of
notional profit was £1 in 10 sessions and 50p in the remaining 2 sessions (Table 1).
I estimated the magnitude of the bias separately in these two subsamples (Equation 19). The magnitude of the bias was 260 in the low stakes subsample, and 495
in the standard stakes subsample. These results are consistent with the prediction
that the magnitude of the bias is linear in the stakes: the hypothesis of linearity cannot be rejected (𝑝 = 0.9668). The modest
variance in the stakes between sessions was, unfortunately, insufficient to produce
statistically significant results, and the hypothesis that the bias is not any smaller
in the low stakes subsample could not be rejected (𝑝 = 0.2313). See also Table 3
and panel 4 of Figure 5.
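The linearity prediction can be checked with simple arithmetic on the point estimates reported above. The sketch below only restates those two numbers and is not a statistical test; the variable names are introduced for illustration.

```python
# Point estimates reported in the text.
bias_standard = 495.0  # sessions paying £1 per £1,000 of notional profit
bias_low = 260.0       # sessions paying 50p per £1,000 of notional profit

# If the bias is exactly linear in the stakes, halving the stakes halves the bias.
stake_ratio = 0.50 / 1.00
predicted_low = stake_ratio * bias_standard  # 247.5
gap = bias_low - predicted_low               # 12.5
```

The observed low stakes estimate differs from the linear prediction by 12.5, which is small relative to the size of the estimates themselves.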
4.5
Over-confidence
Section 4.1 demonstrates the existence of a systematic difference in predictions
between Farmers and Bakers. This difference in predictions is interpreted as evidence of wishful thinking bias affecting subjects’ judgment about the day 100
price. A key assumption is that subjects believe they have better than random odds
of making a good prediction, so it is in their interest to report their true beliefs. If
this assumption is not true, subjects could very well choose whichever prediction
they enjoy making, without having to worry about losing the prediction bonus.
As long as subjects prefer reporting a price that would benefit them, we could observe a systematic difference in predictions between Farmers and Bakers that has
nothing to do with wishful thinking.
If this alternative explanation is correct, we would expect subjects who lack
confidence in their predictions to be more biased than confident subjects, since
such subjects have less to lose from biasing their prediction. Similarly, we would
expect subjects who generally believe prices in financial markets are unpredictable
to be more biased than subjects who believe prices can be predicted.
In order to test the first prediction I defined a proxy for a subject’s confidence
by the average prediction confidence for that subject across all charts. I then split
the sample into more and less confident subjects, and estimated the bias separately
in the two subsamples. In order to test the second prediction I included a question
in the post experiment questionnaire about the predictability of prices in financial
markets, and divided subjects into two groups by whether they thought prices can
generally be predicted. The bias was then estimated separately in the two subsamples.26
The result was just the opposite: relatively confident subjects, and subjects who believe prices
are predictable, were more biased than their counterparts. Specifically, the estimated bias among
relatively confident subjects was 628, compared with 276 among less confident subjects. The
hypothesis that more confident subjects are less biased was rejected at the 10 percent level, with
a 𝑝-value of 0.0732. Similarly, the estimated bias among subjects who believed prices in financial
markets to be generally predictable was 613, as compared with 292 among subjects who believed
prices cannot be predicted. The hypothesis that subjects who believe prices to be
predictable are less biased was rejected with a 𝑝-value of 0.0997.
By and large, therefore, subjects believe they have at least some ability to predict the day 100 price, and the stronger this belief is, the more biased they are. This
result is consistent with the wishful thinking interpretation, and further suggests
that over-confidence is a manifestation of wishful thinking, and that the degree of
wishful thinking bias is a stable individual characteristic.27
26
The question was “We are interested in what people believe about financial markets. How
predictable are the movements of prices in financial markets in your opinion?” The possible choices
were: “Prices can be predicted to a significant extent”, “Prices can rarely be predicted”, and “The
idea that prices can be predicted is an illusion”. The first choice was defined as yes, and the other
two as no. The distribution of answers was 66, 58, and 8, respectively.
27
This explains why individuals with more than average wishful thinking bias also tend to be
over-confident. The tendency to be more or less biased can be identified with the coefficient of
relative optimism in the Priors and Desires model.
4.6
Gender
Though the psychology evidence is mixed (Lundeberg et al., 2000), certain behavioral differences
between men and women, such as a propensity to overtrade among men, have been interpreted as
evidence of gender differences in confidence (Barber and Odean, 2001). Subjects in the experiment
included 62 percent females and 38 percent males, and there was therefore sufficient variation to
test for gender differences in wishful thinking. The estimated bias is 411 for males and 477 for
females, and the hypothesis of no difference cannot be rejected (𝑝 = 0.7956).
5
Conclusion
This paper describes an experimental test of wishful thinking bias in predictions
of asset prices. Subjects received an accuracy bonus for their predictions of the
future price of an asset, and an unconditional payment that was either increasing
or decreasing in this price. Both groups of subjects had the same information,
and faced the same incentives for accuracy. Nevertheless, and despite incentives
for hedging, subjects in the group benefiting from high prices predicted systematically higher
prices than subjects in the group benefiting from low prices. These
results are consistent with wishful thinking, and cannot be accounted for by such
alternative explanations as ego-utility or cognitive bias.
By varying the scale of the accuracy bonus it was possible to test whether the
magnitude of the bias decreases with the incentives to hold accurate beliefs. No
such decrease was found, and the prediction of Optimal Expectations (Brunnermeier and Parker, 2005),
that the magnitude of the bias is inversely proportional to
the incentives for accuracy, was formally rejected. This result is hard to square
with strategic models of wishful thinking, but is consistent with Priors and Desires. The
implication is that wishful thinking can significantly affect beliefs even
if the costs are high.
Other comparative statics results include good evidence that wishful thinking
bias is stronger when subjective uncertainty is high, evidence that over-confidence
and wishful thinking bias go together, and some evidence of greater bias when
payoff is more strongly dependent on the state of the world.
Taken together, these results suggest that any and all subjective beliefs are affected by wishful
thinking bias, and that the bias may well be sufficiently strong
to materially affect economically important decisions. High stakes decisions in
financial markets are a case in point, as they involve probability assessments in
situations characterized by high stakes and high subjective uncertainty, both of
which are conducive to the presence of an economically significant bias.
In interpreting this conclusion, it is important to bear in mind that decision
makers in high stakes situations have an incentive to invest in quality information
in order to reduce the uncertainty in their beliefs. Since the strength of the bias
depends on the degree of subjective uncertainty, quality information will not only
reduce the variance in beliefs, but would also (perhaps unintentionally) reduce the
magnitude of the bias. The degree to which wishful thinking is likely to affect
high stakes decisions is therefore dependent on decision makers’ ability to reduce
uncertainty before making their choices.
One way to assess the degree of uncertainty is to examine the beliefs of informed
experts. In many important decision making environments (financial markets, corporate decision making, politics, war) informed experts commonly disagree. The
failure of experts to come to anything approaching consensus suggests the existence of a substantial level of irreducible uncertainty. When that is the case, there
is evidently significant potential for wishful thinking to materially affect decisions.
The present paper describes one particular experiment on one particular group
of subjects. While the main conclusions are strongly statistically significant, it
would clearly be important to see whether the results can be replicated by other
researchers and in other decision making environments. Another important limitation in interpreting the results of the experiment is the limited range of theories
under consideration. While I am not aware of any other non ad hoc theory that
can explain the results of the experiment, it is important to emphasize that if such
a theory were to be offered, it may significantly change the interpretation of the
experiment’s results.
A
Axiomatic foundation
This appendix provides an axiomatic foundation for the representation of 𝜋 in Equations 2–
5. In the following definitions 𝑟 and 𝑟′ stand for any reference stakes, 𝑎 for any real number, and 𝐸 for any event. The first definition states the properties we want 𝜋 to satisfy,
and the second describes the logit formula. The theorem says that the two definitions are
equivalent.
Definition 1. 𝜋 ∶ 𝐹 → Δ is a well-behaved distortion if the following conditions are
satisfied:
A1 (absolute continuity) 𝜋𝑟′ (𝐸) = 0 ⟺ 𝜋𝑟 (𝐸) = 0.
A2 (consequentialism) If 𝑟 = 𝑟′ over a non-null28 event 𝐸 then 𝜋𝑟′ (⋅|𝐸) = 𝜋𝑟 (⋅|𝐸).
A3 (shift-invariance) If 𝑟′ = 𝑟 + 𝑎 then 𝜋𝑟′ = 𝜋𝑟 .
A4 (prize-continuity) If 𝑟𝑛 → 𝑟 then 𝜋𝑟𝑛 (𝐸) → 𝜋𝑟 (𝐸).
These properties should be understood as simplifying assumptions, the purpose of
which is to obtain as simple as possible a representation, while retaining the ability to
represent the phenomena we wish to model. Absolute Continuity defines the scope of belief
distortion as the set of events that the agent is uncertain about. Consequentialism requires
that beliefs (and therefore any consequences for choices) depend only on the reference
stakes in states that are consistent with the available evidence. Beliefs conditional on an
event 𝐸 cannot depend on the reference stakes in states not in 𝐸.
The idea behind Shift Invariance is that different reference stakes may result in different beliefs only if they differ in what a person wants to be true, or how strongly she feels
about it. Shift-Invariance embodies this idea on the assumption that the agent only wants
something to be true if her reference stakes yield a higher utility if it is true, and that equal
differences in utility correspond to equal degrees of ‘wanting to be true’.
Definition 2 (Logit distortion). 𝜋 ∶ 𝐹 → Δ is a logit distortion if there exists a probability measure 𝑝 (the indifference measure), and a real-number 𝜓 (the coefficient of relative
optimism), such that for any reference 𝑟 ∈ 𝐹 and any event 𝐴,
\[ \pi_r(A) \propto \int_A e^{\psi r} \, \mathrm{d}p. \tag{21} \]
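On a finite state space the integral in Equation 21 reduces to a weighted sum, which makes the logit distortion easy to compute. The following is a minimal sketch under that finite-state assumption. The function name `logit_distortion` and the example numbers are illustrative and do not come from the paper.

```python
import numpy as np

def logit_distortion(p, r, psi):
    """Distorted beliefs pi_r(s) proportional to p(s) * exp(psi * r(s)).

    p: indifference measure over states, r: reference stakes per state,
    psi: coefficient of relative optimism.
    """
    z = psi * np.asarray(r, dtype=float)
    # Subtracting z.max() avoids overflow and cancels in the normalization.
    w = np.asarray(p, dtype=float) * np.exp(z - z.max())
    return w / w.sum()

p = np.array([0.5, 0.5])   # two equally likely states under the indifference measure
r = np.array([0.0, 1.0])   # the second state pays more
optimist = logit_distortion(p, r, psi=1.0)    # shifts weight to the good state
realist = logit_distortion(p, r, psi=0.0)     # psi = 0 leaves beliefs at p
pessimist = logit_distortion(p, r, psi=-1.0)  # shifts weight to the bad state
```

With these inputs a positive 𝜓 raises the probability of the high-payoff state above one half, 𝜓 = 0 returns 𝑝 itself, and a negative 𝜓 mirrors the optimist's beliefs.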
It is easy to see that every logit distortion is well-behaved. The opposite requires the
technical assumption that there exist at least three events with positive probability:29
Definition 3 (Minimally complex distortion). 𝜋 ∶ 𝐹 → Δ is minimally complex if there
exist three disjoint events 𝐴, 𝐵, and 𝐶, and reference stakes 𝑟 such that 𝜋𝑟 (𝐴), 𝜋𝑟 (𝐵), and
𝜋𝑟 (𝐶) are all positive.
Theorem 1 (Representation theorem). A minimally complex distortion is a logit-distortion
if and only if it is well-behaved.
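The easy direction of the theorem can be illustrated numerically. The sketch below checks, for one arbitrary three-state example, that a finite-state logit distortion satisfies Shift-Invariance (A3) and Consequentialism (A2). It is a spot check under assumed inputs, not a proof, and the helper names are hypothetical.

```python
import numpy as np

def logit_distortion(p, r, psi):
    """pi_r(s) proportional to p(s) * exp(psi * r(s)) on a finite state space."""
    z = psi * np.asarray(r, dtype=float)
    w = np.asarray(p, dtype=float) * np.exp(z - z.max())
    return w / w.sum()

def conditional(pi, event):
    """Beliefs conditional on an event given as a boolean mask over states."""
    return pi[event] / pi[event].sum()

p = np.array([0.2, 0.3, 0.5])
psi = 0.8
r = np.array([1.0, -2.0, 0.5])

# A3: adding a constant to the reference stakes leaves beliefs unchanged.
shift_ok = np.allclose(logit_distortion(p, r, psi),
                       logit_distortion(p, r + 3.0, psi))

# A2: r_alt agrees with r on E = {state 0, state 1} and differs elsewhere,
# so beliefs conditional on E must coincide.
r_alt = np.array([1.0, -2.0, 7.0])
E = np.array([True, True, False])
conseq_ok = np.allclose(conditional(logit_distortion(p, r, psi), E),
                        conditional(logit_distortion(p, r_alt, psi), E))
```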
28
That is, both 𝜋𝑟 (𝐸) > 0 and 𝜋𝑟′ (𝐸) > 0. Absolute Continuity ensures that these two requirements coincide.
29
If there are only two disjoint events, there exist well-behaved distortions that are not logit
distortions.
A.1
Intermediate representation results
In this section I prove the theorem for the special case where there are only finitely many
events. That is, I assume that there exists a finite partition 𝒮 of the state-space, such that
Σ is the algebra generated by 𝒮 . In addition, I prove a sequence of partial representation
results requiring only a subset of the assumptions. In order to state the necessary and
sufficient conditions for these representations I define a new property, Indifference, which
is related to Shift Invariance, but is considerably weaker:
A3’ (Indifference). 𝜋𝑟 = 𝜋𝑟′ if both 𝑟 and 𝑟′ are constant payoff-functions.
Note that unlike Shift Invariance, Indifference does not require the set of payoffs to have
cardinal (or even ordinal) meaning.
Lemma 1. Suppose that there exists a finite partition 𝒮 of the state-space, such that Σ is
the algebra generated by 𝒮 , and that 𝜋 is minimally complex, then:
1. Absolute Continuity is a necessary and sufficient condition for there to exist a probability distribution 𝑝 ∈ Δ and a function ℎ ∶ 𝐹 × 𝒮 → ℝ+ , such that for any
reference 𝑟 and any event 𝐴 ∈ 𝒮 ,
𝜋𝑟 (𝐴) ∝ 𝑝(𝐴) ℎ𝑟 (𝐴).
(22)
2. Assume Absolute Continuity. Consequentialism is a necessary and sufficient condition for there to exist a probability distribution 𝑝 ∈ Δ, and a mapping 𝜇 ∶ 𝒮 ×𝑋 →
ℝ+ , such that for any reference 𝑟 and any event 𝐴 ∈ 𝒮 ,
𝜋𝑟 (𝐴) ∝ 𝑝(𝐴) 𝜇𝐴 (𝑟(𝐴)).
(23)
3. Assume Absolute Continuity and Consequentialism. Indifference is a necessary
and sufficient condition for there to exist a probability distribution 𝑝 ∈ Δ, and
a mapping 𝜈 ∶ 𝑋 → ℝ+ , such that for any reference 𝑟 and any event 𝐴 ∈ 𝒮 ,
𝜋𝑟 (𝐴) ∝ 𝑝(𝐴) 𝜈(𝑟(𝐴)).
(24)
4. Assume Absolute Continuity and Consequentialism. Shift-Invariance and Prize-Continuity are
necessary and sufficient conditions for there to exist a probability
distribution 𝑝 ∈ Δ, and a parameter 𝜓 ∈ ℝ, such that for any reference 𝑟 and any
event 𝐴 ∈ 𝒮 ,
\[ \pi_r(A) \propto p(A)\, e^{\psi r(A)}. \tag{25} \]
Note that while the representation in Equations 22–25 is defined with respect to events
in 𝒮 , the implication for general events is straightforward.30 The following simple example
demonstrates that Minimal Complexity is a necessary assumption. Let 𝒮 = {𝐴, 𝐵}, let
$\pi_r(A) \propto p(A)\,(1 + (r(A) - r(B))^2)$ and $\pi_r(B) \propto p(B)$. This distortion is well-behaved
(Definition 1), but it cannot even be given the representation of Equation 23, let alone that
of a logit distortion (Definition 2).
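One way to see why this two-event example cannot be written as in Equation 23 is to note that Equation 23 makes the log-odds of 𝐴 against 𝐵 additively separable in 𝑟(𝐴) and 𝑟(𝐵). The sketch below checks that the example violates this separability while remaining shift-invariant. The function name and the chosen payoff values are illustrative assumptions.

```python
import math

def odds_two_event(p_a, r_a, r_b):
    """Odds pi_r(A)/pi_r(B) in the two-event example, with p(B) = 1 - p(A)."""
    return (p_a / (1.0 - p_a)) * (1.0 + (r_a - r_b) ** 2)

p_a = 0.5

# Shift-invariance: only the difference r(A) - r(B) matters.
shift_invariant = odds_two_event(p_a, 3.0, 1.0) == odds_two_event(p_a, 13.0, 11.0)

# Under Equation 23 the log-odds would be g(r(A)) - h(r(B)) for some g and h, so
# L(0,0) + L(1,1) would have to equal L(0,1) + L(1,0).
L = lambda a, b: math.log(odds_two_event(p_a, a, b))
lhs = L(0.0, 0.0) + L(1.0, 1.0)  # 0 when p(A) = 0.5
rhs = L(0.0, 1.0) + L(1.0, 0.0)  # 2 * log(2) when p(A) = 0.5
```

The two sides differ, so no choice of 𝜇𝐴 and 𝜇𝐵 can reproduce these odds.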
A.2
Completing the proof
This section concludes the proof of Theorem 1 for the general case. The first step is to
generalize Equation 25 to any reference and any constant-payoff events:
Lemma 2. Suppose 𝜋 ∶ 𝐹 → Δ is a minimally complex well-behaved distortion, then
there exist a probability measure 𝑝 and a parameter 𝜓 ∈ ℝ, such that for any reference 𝑟
and any events 𝐴 and 𝐵 such that 𝑝(𝐵) > 0 and 𝑟 is constant on 𝐴 and on 𝐵,
\[ \frac{\pi_r(A)}{\pi_r(B)} = \frac{p(A)\, e^{\psi r(A)}}{p(B)\, e^{\psi r(B)}}. \tag{26} \]
Theorem 1 for references that are simple payoff-functions is an immediate corollary.31 The
following claim is a little more general, allowing for functions that are almost everywhere
simple:
Definition 4. A payoff-function 𝑟 ∈ 𝐹 is almost everywhere simple if there exist a
payoff-function 𝑔 ∈ 𝐹 and an event 𝐸 such that 𝑟 obtains only finitely many values on 𝐸
and 𝜋𝑔 (𝐸) = 1.
Corollary 1. Theorem 1 holds when restricted to payoff-functions that are almost everywhere simple.
The remaining case involves functions which are not almost everywhere simple. If such
payoff-functions exist, there must also exist an infinite sequence of non-null events {𝐴𝑛 }𝑛∈ℕ .
But then, as long as 𝜓 ≠ 0 and the set of feasible payoffs is unbounded, it is possible to
construct a reference 𝑟 such that $\lim_{n\to\infty} \pi_r(A_n)/\pi_r(A_1) = \infty$. But this implies that $\pi_r(A_1) = 0$, in
contradiction to Absolute Continuity. Hence, if 𝜓 ≠ 0 the set of feasible payoffs must be
bounded.
Lemma 3. Suppose 𝜋 ∶ 𝐹 → Δ is a minimally complex well-behaved distortion, and that
there exists a reference 𝑟 that is not almost everywhere simple. Then there exists an upper bound
𝑀 ∈ ℝ such that for any feasible payoff-value 𝑥, $e^{\psi x} \le M$.
30
Any event in Σ is the finite union of events in 𝒮 .
31
A payoff-function 𝑟 is simple if 𝑟(𝑆) is finite.
Lemma 3 ensures that 𝑒𝜓𝑋 is bounded from above. If it is also bounded from below, a
limit argument based on simple payoff-functions can be used to extend the claim further:
Lemma 4. Suppose 𝜋 ∶ 𝐹 → Δ is a minimally complex well-behaved distortion. Then
there exist a probability measure 𝑝 and a parameter 𝜓 ∈ ℝ, such that for any events 𝐴
and 𝐵 for which 𝑝(𝐵) > 0, and any reference 𝑟 for which there exists a number 𝑚 > 0 such
that $e^{\psi r(s)} \ge m$ for all $s \in A \cup B$,
\[ \frac{\pi_r(A)}{\pi_r(B)} = \frac{\int_A e^{\psi r} \, \mathrm{d}p}{\int_B e^{\psi r} \, \mathrm{d}p}. \tag{27} \]
The final step in the proof of Theorem 1 uses a limit argument whereby a general event
𝐴 is approached by events of the form $A_n = \{s \in A : e^{\psi r(s)} \ge 2^{-n}\}$, and Lemma 4 is
applied on each of these events separately.
A.3
Proofs
Lemma 1
In all the four parts of Lemma 1 the proof that the requirements are necessary is trivial. I
thus prove only that the requirements are sufficient:
Part 1. Let $a$ denote some arbitrary constant payoff-function. Define $p = \pi_a$, and let
$\mathcal{S}^* = \{A \in \mathcal{S} : p(A) > 0\}$. Define $h_r(A) = \pi_r(A)/p(A)$ for $A \in \mathcal{S}^*$ and $h_r(A) = 0$ for
$A \notin \mathcal{S}^*$. For $A \in \mathcal{S}^*$ the claim follows from the definition of $h_r$. By Absolute Continuity
$p(A) = 0 \Rightarrow \pi_r(A) = 0$, and hence the claim holds also for $A \notin \mathcal{S}^*$.
Part 2. Let $A \in \mathcal{S}^*$ and $x \in X$, and let $r(A, x)$ be the payoff-function mapping $A$ to $x$ and all
states outside $A$ to $a$. Let $E_1, \dots, E_n$ denote the other events in $\mathcal{S}^*$. By Minimal Complexity
and Absolute Continuity $\mathcal{S}^*$ includes at least two events other than $A$. $r(A, x)$ and the
constant payoff-function $a$ agree on $E_i$ and $E_j$ for all $i$ and $j$. Hence, by Consequentialism
with $E = E_i \cup E_j$,
\[ \frac{\pi_{r(A,x)}(E_i)}{\pi_{r(A,x)}(E_j)} = \frac{p(E_i)}{p(E_j)}. \]
Thus,
\[ 1 - \pi_{r(A,x)}(A) = \sum_i \pi_{r(A,x)}(E_i) = \frac{\pi_{r(A,x)}(E_j)}{p(E_j)} \sum_i p(E_i) = \frac{\pi_{r(A,x)}(E_j)}{p(E_j)} \,(1 - p(A)). \tag{28} \]
Define $\mu_A(x) = \left( \frac{1 - p(A)}{p(A)} \right) \left( \frac{\pi_{r(A,x)}(A)}{1 - \pi_{r(A,x)}(A)} \right)$. By Equation 28,
\[ p(A)\,\mu_A(r(A)) = (1 - p(A)) \, \frac{\pi_{r(A,r(A))}(A)}{1 - \pi_{r(A,r(A))}(A)} = p(E_j) \, \frac{\pi_{r(A,r(A))}(A)}{\pi_{r(A,r(A))}(E_j)}. \tag{29} \]
Let $r$ be any payoff-function, and let $A$ and $B$ be any two events in $\mathcal{S}^*$. Let $r'$ be a
payoff-function that coincides with $r$ on $A$ and $B$, and with $a$ elsewhere, and let $C$ be any third
event in $\mathcal{S}^*$. Inserting $E_j = C$ in Equation 29 we obtain that
\[ \frac{\pi_r(A)}{\pi_r(B)} = \frac{\pi_{r'}(A)}{\pi_{r'}(B)} = \frac{\pi_{r'}(A)/\pi_{r'}(C)}{\pi_{r'}(B)/\pi_{r'}(C)} = \frac{p(C)\,\pi_{r(A,r(A))}(A)/\pi_{r(A,r(A))}(C)}{p(C)\,\pi_{r(B,r(B))}(B)/\pi_{r(B,r(B))}(C)} = \frac{p(A)\,\mu_A(r(A))}{p(B)\,\mu_B(r(B))} \tag{30} \]
where the first and third steps follow from Consequentialism, and the final step from
Equation 29. Since Equation 30 holds for all $A, B \in \mathcal{S}^*$ it follows that Equation 23
holds for any event $A \in \mathcal{S}^*$. For an event $A \notin \mathcal{S}^*$, define $\mu_A(x) = 1$ for all $x$. Since
$\pi_r(A) = p(A) = 0$ for $A \notin \mathcal{S}^*$, Equation 23 holds however $\mu_A$ is defined. Combining these
results, Equation 23 holds for any reference $r$ and any event $A \in \mathcal{S}$.
Part 3. Let $A^* \in \mathcal{S}^*$ be some event. Define the mapping $\nu : X \to \mathbb{R}_+$ by $\nu(x) = \mu_{A^*}(x)$.
For $x \in X$ let $x$ denote also the constant payoff-function yielding the payoff $x$ in all states.
Inserting $r = x$ and $B = A^*$ in Equation 30 we obtain that for all $A \in \mathcal{S}^*$ and $x \in X$,
\[ \frac{\pi_x(A)}{\pi_x(A^*)} = \frac{p(A)\,\mu_A(x)}{p(A^*)\,\nu(x)}. \]
Since $x$ is a constant payoff-function it follows from Indifference that
$\pi_x = \pi_a = p$. Hence, $\mu_A(x) = \nu(x)$. Thus, $\pi_r(A) \propto p(A)\,\nu(r(A))$ for all $A \in \mathcal{S}^*$. Finally,
this is also trivially true for $A \notin \mathcal{S}^*$, since $\pi_r(A) = p(A) = 0$ for $A \notin \mathcal{S}^*$.
Part 4. Let $A, B \in \mathcal{S}^*$ be two events, and let $x$ and $y$ be real numbers such that $x$, $y$, and
$x + y$ are in $X$. Define the payoff-functions $f_x$ and $g_{x,y}$ as follows: $f_x(s) = x$ for $s \in A$
and $f_x(s) = 0$ for $s \notin A$, and $g_{x,y} = f_x + y$. By Shift-Invariance, $\pi_{g_{x,y}} = \pi_{f_x}$, and in
particular
\[ \frac{\pi_{g_{x,y}}(A)}{\pi_{g_{x,y}}(B)} = \frac{\pi_{f_x}(A)}{\pi_{f_x}(B)}. \]
By Equation 24 it follows that
\[ \frac{\nu(x+y)}{\nu(y)} = \frac{\nu(x)}{\nu(0)}. \]
Hence, defining $\sigma(x) = \log\left( \frac{\nu(x)}{\nu(0)} \right)$
we obtain that $\sigma$ is linear, i.e. for all $x$ and $y$, $\sigma(x + y) = \sigma(x) + \sigma(y)$.
For $m \in \mathbb{N}$ let $y = mx$. By induction we obtain that $\sigma(mx) = m\sigma(x)$. Similarly, for
$n \in \mathbb{N}$ let $y = x/n$ to obtain that $\sigma(x) = \sigma(ny) = n\sigma(y)$, and hence $\sigma(x/n) = \sigma(x)/n$. Let
$y = -x$ to obtain that $\sigma(-x) = -\sigma(x)$. Combining these results, and defining $\psi = \sigma(1)$,
we obtain that for any rational number $q \in X$, $\sigma(q) = \psi q$, and so $\nu(q) = \nu(0) e^{\psi q}$. Let
now $x \in X$ be any feasible payoff-value, and let $\{q_n\}_{n \in \mathbb{N}}$ be a sequence of rational feasible
payoff-values converging to $x$. By Prize-Continuity $\pi_{f_{q_n}} \to \pi_{f_x}$, which given Equation 24
implies that $\nu(q_n) \to \nu(x)$. By the result for rational numbers, $\nu(q_n) = \nu(0) e^{\psi q_n}$, and hence
$\nu(q_n) \to \nu(0) e^{\psi x}$. Thus, $\nu(x)$ and $\nu(0) e^{\psi x}$ are both the limit of the same sequence of real
numbers, and so $\nu(x) = \nu(0) e^{\psi x}$. Finally, since Equation 24 is invariant to multiplying $\nu$
by a positive number, we obtain that Equation 25 holds for all $x \in \mathbb{R}$.
Lemma 2
Proof. Let $a \in F$ denote some constant payoff-function, and define $p = \pi_a$. By Minimal
Complexity and Absolute Continuity there exists a finite partition $\mathcal{S}$ of the state-space
consisting of at least three events, such that $\pi_r(A) > 0$ for any $r \in F$ and $A \in \mathcal{S}$. Let
$\Sigma(\mathcal{S}) \subseteq \Sigma$ denote the algebra generated by $\mathcal{S}$, and let $R(\mathcal{S}) \subseteq F$ denote the set of
$\Sigma(\mathcal{S})$-measurable payoff-functions. By Lemma 1 there exist a probability measure $p_{\mathcal{S}}$ over
$(S, \Sigma(\mathcal{S}))$ and a parameter $\psi_{\mathcal{S}} \in \mathbb{R}$ such that Equation 25 holds for any payoff-function
$r \in R(\mathcal{S})$ and any event $A \in \mathcal{S}$. In particular $a \in R(\mathcal{S})$ (any constant payoff-function
is), and hence for any $A \in \mathcal{S}$, $p(A) = \pi_a(A) \propto p_{\mathcal{S}}(A) e^{\psi_{\mathcal{S}} a}$. Thus, $p(A) = p_{\mathcal{S}}(A)$ for any event
$A \in \mathcal{S}$, and hence also for any event $A \in \Sigma(\mathcal{S})$. Define $\psi = \psi_{\mathcal{S}}$. It follows that for any
payoff-function $r \in R(\mathcal{S})$ and any event $A \in \mathcal{S}$, $\pi_r(A) \propto p(A) e^{\psi r(A)}$.32
Let now $A$ and $B$ denote any events such that $p(B) > 0$, and let $r$ be any payoff-function
that is constant on $A$ and on $B$. I need to show that
$\frac{\pi_r(A)}{\pi_r(B)} = \frac{p(A)}{p(B)} e^{\psi (r(A) - r(B))}$. To simplify notation let
$\delta_r(A, B) = \log \frac{\pi_r(A)}{\pi_r(B)} - \log \frac{p(A)}{p(B)}$. With this notation I need to prove that $\delta_r(A, B) = \psi (r(A) - r(B))$.
Let $E_1, E_2, \dots, E_n$ denote the events in $\mathcal{S}$. Without limiting generality suppose $A \cap E_1$ is
not null. Define a payoff-function $g \in F$ by $g = r(A)$ on $A \cap E_1$ and $g = r(B)$ elsewhere,
and a payoff-function $h \in R(\mathcal{S})$ by $h = r(A)$ on $E_1$ and $h = r(B)$ elsewhere. With
these definitions, $\delta_r(A, B) = \delta_r(A \cap E_1, B) = \delta_g(A \cap E_1, B) = \delta_g(A \cap E_1, B \cup E_2) =
\delta_g(A \cap E_1, E_2) = \delta_h(A \cap E_1, E_2) = \delta_h(E_1, E_2) = \psi (r(A) - r(B))$, where the last step
uses the fact that $h$ is in $R(\mathcal{S})$, and the other steps use Consequentialism and the fact that
by Shift-Invariance $\pi_{r(A)} = \pi_{r(B)} = p$.
Corollary 1
Proof. The proof that a logit distortion is well-behaved is trivial. I thus prove only that if
$\pi$ is well-behaved then it is a logit distortion. The conditions of Lemma 2 are met. Let $p$
and $\psi$ be parameters for which the claim in Lemma 2 holds. Suppose $r$ is a.e. simple. Then
there exists a finite set of disjoint events $\{E_1, \dots, E_n\}$ such that $r$ is constant on each of
these events, and for some payoff-function $g$, $\pi_g(\cup_i E_i) = 1$. By Absolute Continuity also
$\pi_r(\cup_i E_i) = 1$, and so $\pi_r(A \cap \cup_i E_i) = \pi_r(A)$. Given that the events are disjoint it follows
that $\pi_r(A) = \sum_i \pi_r(A \cap E_i)$. Using Lemma 2 we obtain that $\pi_r(A) \propto \sum_i p(A \cap E_i) e^{\psi r(A \cap E_i)}$.
By Absolute Continuity $p(S \setminus \cup_i E_i) = 0$, and hence $\int_A e^{\psi r} \, \mathrm{d}p = \sum_i p(A \cap E_i) e^{\psi r(A \cap E_i)}$.
Combining these observations we obtain that $\pi_r(A) \propto \int_A e^{\psi r} \, \mathrm{d}p$.
32
Note that $p = \pi_a$ is a probability measure over all the events in $\Sigma$, not just the events in $\Sigma(\mathcal{S})$.
Lemma 3
Proof. The case of $\psi = 0$ is trivial. Henceforth I assume $\psi \ne 0$. By Corollary 1 there exist
a probability measure $p$ and a parameter $\psi$ such that Equation 5 holds for any reference
$r$ that is almost everywhere simple. If there exists a reference $r$ that is not almost everywhere
simple then there exists an infinite sequence $\{A_n\}_{n \in \mathbb{N}}$ of disjoint non-null events.33
I need to prove that in this case there exists a number $M \in \mathbb{R}$ such that $e^{\psi x} \le M$ for
all $x \in X$. Suppose otherwise. Then it is possible to choose from $X$ a sequence $\{x_n\}_{n \in \mathbb{N}}$,
s.t. for all $n$, $p(A_n) e^{\psi x_n} \ge p(A_1) e^{\psi x_1}$. Define a reference $r$ by $r(A_n) = x_n$ for $n \in \mathbb{N}$,
and $r(s) = x_1$ outside $\cup_n A_n$. For $n \in \mathbb{N}$ define also a simple payoff-function $r_n$ by
$r_n(A_n) = x_n$ and $r_n(s) = x_1$ for $s \notin A_n$. By construction $r$ and $r_n$ agree on $A_1$ and $A_n$.
Thus, for all $N \in \mathbb{N}$,
\[ 1 \ge \sum_{n \le N} \pi_r(A_n) = \pi_r(A_1) \sum_{n \le N} \frac{\pi_r(A_n)}{\pi_r(A_1)} = \pi_r(A_1) \sum_{n \le N} \frac{\pi_{r_n}(A_n)}{\pi_{r_n}(A_1)} = \pi_r(A_1) \sum_{n \le N} \frac{p(A_n) e^{\psi x_n}}{p(A_1) e^{\psi x_1}} \ge \pi_r(A_1) \sum_{n \le N} 1 = N \pi_r(A_1), \]
where the second equality follows from Consequentialism, and the third from Corollary 1.
Letting $N \to \infty$ we obtain that $\pi_r(A_1) = 0$, in contradiction to the assumption that $A_1$ is not null.
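The contradiction in this proof can be illustrated with a concrete, purely hypothetical choice of events and payoffs. Assume $p(A_n) = 2^{-n}$ and pick $x_n = n \log 2 / \psi$, so that every term $p(A_n) e^{\psi x_n}$ equals 1. The sketch below then computes the largest value $\pi_r(A_1)$ could take given the first $N$ events.

```python
import math

def first_event_bound(N, psi=1.0):
    """Upper bound on pi_r(A_1) implied by the first N events.

    Assumed example: p(A_n) = 2**-n and x_n = n*log(2)/psi, so each
    unnormalized weight p(A_n) * exp(psi * x_n) equals 1.
    """
    weights = [2.0 ** -n * math.exp(psi * (n * math.log(2.0) / psi))
               for n in range(1, N + 1)]
    # pi_r(A_1) * sum_n weights[n] / weights[0] <= 1, hence the bound below.
    return weights[0] / sum(weights)

bound_10 = first_event_bound(10)
bound_100 = first_event_bound(100)
```

The bound equals $1/N$ up to rounding, so it vanishes as $N$ grows, which is exactly why unbounded payoffs are incompatible with $A_1$ being non-null.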
Lemma 4
Proof. By Corollary 1 there exist a probability measure $p$ and a parameter $\psi$ such that
Equation 27 holds for any reference $r$ that is almost everywhere simple. I show that the
claim holds with the same $p$ and $\psi$ also for a reference $r$ that is not almost everywhere simple. If
such payoff-functions exist then by Lemma 3 there exists a number $M$, such that $e^{\psi x} \le M$ for
all $x \in X$. Assume first that $\psi \ne 0$. For any $n \in \mathbb{N}$ divide the interval $[m, M]$ into $2^n$
non-overlapping intervals of length $\frac{M - m}{2^n}$. For any state $s$ let $I_n(s)$ denote the interval to which
$e^{\psi r(s)}$ belongs, and let $I_n^{\min}(s)$ denote its lower endpoint. Define a simple payoff-function $r_n$
by $r_n(s) = \frac{\log I_n^{\min}(s)}{\psi}$. With this definition $e^{\psi r(s)} - \frac{M - m}{2^n} \le e^{\psi r_n(s)} \le e^{\psi r(s)}$ for all $s$, and
so $e^{\psi r_n(s)} \nearrow e^{\psi r(s)}$ for all $s$. Moreover, since $e^{\psi x}$ is a monotonic continuous function also
$r_n \to r$. Thus,
\[ \frac{\pi_r(A)}{\pi_r(B)} = \frac{\lim \pi_{r_n}(A)}{\lim \pi_{r_n}(B)} = \lim \frac{\pi_{r_n}(A)}{\pi_{r_n}(B)} = \lim \frac{\int_A e^{\psi r_n} \, \mathrm{d}p}{\int_B e^{\psi r_n} \, \mathrm{d}p} = \frac{\lim \int_A e^{\psi r_n} \, \mathrm{d}p}{\lim \int_B e^{\psi r_n} \, \mathrm{d}p} = \frac{\int_A e^{\psi r} \, \mathrm{d}p}{\int_B e^{\psi r} \, \mathrm{d}p} \tag{31} \]
where the first step follows from Prize-Continuity, the second and fourth since $p(B) > 0$
and $e^{\psi r_n(s)} \in [m, M]$ on $A \cup B$, the third from Corollary 1, and the fifth from the monotone
convergence theorem.
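The approximation from below used in this proof can be illustrated numerically. The sketch below works on a finite sample of states, which is an assumption made for the example (the proof itself concerns general payoff-functions). It rounds $e^{\psi r}$ down to the lower endpoint of one of $2^n$ equal cells of $[m, M]$ and tracks the resulting integrals. The function name `lower_step` and all numbers are hypothetical.

```python
import numpy as np

def lower_step(v, m, M, n):
    """Round each value in [m, M] down to the lower endpoint of its cell,
    where [m, M] is divided into 2**n cells of equal width."""
    width = (M - m) / 2 ** n
    # The value v == M is assigned to the last cell.
    k = np.minimum(np.floor((v - m) / width), 2 ** n - 1)
    return m + k * width

rng = np.random.default_rng(1)
psi = 0.7
r = rng.uniform(-1.0, 2.0, 1000)   # hypothetical reference stakes
p = np.full(r.size, 1.0 / r.size)  # uniform measure on the sampled states
v = np.exp(psi * r)
m, M = float(v.min()), float(v.max())

# Integral of exp(psi * r_n) under p for n = 1, ..., 10, where
# r_n = log(lower_step(v, m, M, n)) / psi is the simple payoff-function.
integrals = [float(np.sum(p * lower_step(v, m, M, n))) for n in range(1, 11)]
target = float(np.sum(p * v))
```

The integrals increase with $n$ and approach `target` from below, with an error of at most $(M - m)/2^n$, mirroring the monotone convergence step.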
33
If $r$ has infinitely many atoms these atoms can form the sequence. Otherwise, let $E$ denote
the event outside the set of atoms (if any). $E$ cannot be null, or else $r$ is almost everywhere simple.
Since $r$ has no atoms on $E$ it follows that there exists a value $y$ (the median of $r$ on $E$) such that
$p(\{s \in E : r(s) \le y\}) = \frac{p(E)}{2}$. Thus $E$ includes two non-null events on which $r$ has no atoms: $E(y)$
and $E \setminus E(y)$. This process can be repeated recursively, where in the $n$'th stage $E$ is split into $2^n$
disjoint non-null events. An infinite sequence of disjoint non-null events can therefore be formed.
Theorem 1
Proof. I prove that if $\pi$ is a well-behaved distortion then it is a logit distortion. The opposite
direction is trivial. By Lemma 4 there exist a probability measure $p$ and a parameter $\psi$
such that Equation 27 holds for any events $A$ and $B$ for which $p(B) > 0$ and any reference $r$
for which there exists a number $m > 0$ such that $e^{\psi r} \ge m$ on $A \cup B$. To complete the proof I
need to show that Equation 27 holds even if no such number $m$ exists. Let $r$ be any
payoff-function and let $A$ and $B$ be any events such that $p(B) > 0$. If $r$ is almost everywhere
simple the claim follows from Corollary 1. Otherwise, by Lemma 3 there exists a number
$M \in \mathbb{R}$ such that $e^{\psi x} \le M$ for all $x \in X$. For $n \in \mathbb{N}$ let $A_n = \{s \in A : e^{\psi r(s)} \ge 2^{-n}\}$, and
similarly define $B_n$. By construction $\lim_{n\to\infty} A \setminus A_n = \emptyset$ and similarly $\lim_{n\to\infty} B \setminus B_n = \emptyset$.
Moreover, since $p(B) > 0$ there exists $n_0 \in \mathbb{N}$ such that for all $n \ge n_0$, $p(B_n) > 0$. The
conditions for Lemma 4 therefore hold for $A_n$ and $B_n$ for all $n \ge n_0$. Combining these
observations we obtain that
\[ \frac{\pi_r(A)}{\pi_r(B)} = \lim_{n\to\infty} \frac{\pi_r(A_n)}{\pi_r(B_n)} = \lim_{n\to\infty} \frac{\int_{A_n} e^{\psi r} \, \mathrm{d}p}{\int_{B_n} e^{\psi r} \, \mathrm{d}p} = \frac{\lim_{n\to\infty} \int_{A_n} e^{\psi r} \, \mathrm{d}p}{\lim_{n\to\infty} \int_{B_n} e^{\psi r} \, \mathrm{d}p} = \frac{\int_A e^{\psi r} \, \mathrm{d}p}{\int_B e^{\psi r} \, \mathrm{d}p} \tag{32} \]
where step 3 holds since the integrals are bounded from below and above: (i) $e^{\psi x} \le M$
for all $x \in X$, so the integrals are bounded from above by $M$, and (ii) $p(B_{n_0}) > 0$ and
$e^{\psi r} \ge 2^{-n_0}$ on $B_{n_0}$, and hence there exists some $m > 0$ such that for all $n \ge n_0$,
$\int_{B_n} e^{\psi r} \, \mathrm{d}p \ge \int_{B_{n_0}} e^{\psi r} \, \mathrm{d}p \ge 2^{-n_0} p(B_{n_0}) \ge m$.
References
Akerlof, G. and Dickens, W. (1982). “The Economic Consequences of Cognitive
Dissonance”. In: American Economic Review 72.3, pp. 307–319.
Armantier, O. and Treich, N. (2010). Eliciting Beliefs: Proper Scoring Rules, Incentives, Stakes and Hedging. TSE Working Papers. Toulouse School of Economics.
Babcock, L. and Loewenstein, G. (1997). “Explaining Bargaining Impasse: The
Role of Self-Serving Biases”. In: Journal of Economic Perspectives 11.1, pp. 109–
126.
Babcock, L., Loewenstein, G., Issacharoff, S., and Camerer, C. (1995). “Biased
Judgments of Fairness in Bargaining”. In: American Economic Review 85,
pp. 1337–1343.
Babcock, L., Wang, X., and Loewenstein, G. (1996). “Choosing the wrong pond:
Social comparisons in negotiations that reflect a self-serving bias”. In: Quarterly Journal of Economics 111.1, p. 1.
Barber, B. and Odean, T. (2001). “Boys Will be Boys: Gender, Overconfidence,
and Common Stock Investment”. In: Quarterly Journal of Economics 116.1,
pp. 261–292.
Benabou, R. and Tirole, J. (2002). “Self-confidence and personal motivation”. In:
Quarterly Journal of Economics 117.3, pp. 871–915.
Blanco, M., Engelmann, D., Koch, A., and Normann, H. (2008). “Belief elicitation in experiments: is there a hedging problem?” In: Experimental Economics,
pp. 1–27.
Bracha, A. and Brown, D. (2012). “Affective decision making: A theory of optimism bias”. In: Games and Economic Behavior.
Brunnermeier, M. and Parker, J. (2005). “Optimal Expectations”. In: American
Economic Review 95.4, pp. 1092–1118.
Camerer, C. and Lovallo, D. (1999). “Overconfidence and excess entry: An experimental approach”. In: American Economic Review 89.1, pp. 306–318.
Caplin, A. and Leahy, J. (2001). “Psychological Expected Utility Theory and Anticipatory Feelings”. In: Quarterly Journal of Economics 116.1, pp. 55–80.
Carrillo, J. and Mariotti, T. (2000). “Strategic Ignorance as a Self-Disciplining
Device”. In: Review of Economic Studies 67.3, pp. 529–544.
Compte, O. and Postlewaite, A. (2004). “Confidence-Enhanced Performance”. In:
American Economic Review 94.5, pp. 1536–1557.
Cover, T. and Thomas, J. (1991). Elements of information theory. Vol. 1. Wiley
Online Library.
Cowgill, B., Wolfers, J., and Zitzewitz, E. (2009). Using Prediction
Markets to Track Information Flows: Evidence from Google. Mimeo.
De Meza, D. and Southey, C. (1996). “The Borrower’s Curse: Optimism, Finance
and Entrepreneurship”. In: Economic Journal, pp. 375–386.
Dillenberger, D., Postlewaite, A., and Rozen, K. (2014). “Optimism and Pessimism
with Expected Utility”.
Dubra, J. (2004). “Optimism and overconfidence in search”. In: Review of Economic Dynamics 7.1, pp. 198–218.
Edwards, R. and Magee, J. (2010). Technical analysis of stock trends. Snowball
Publishing. ISBN: 1607962233.
Eliaz, K. and Spiegler, R. (2008). “Consumer optimism and price discrimination”.
In: Theoretical Economics 3.4, pp. 459–497.
Fischbacher, U. (2007). “z-Tree: Zurich toolbox for ready-made economic experiments”. In: Experimental Economics 10.2, pp. 171–178.
Gneiting, T. and Raftery, A. (2007). “Strictly proper scoring rules, prediction,
and estimation”. In: Journal of the American Statistical Association 102.477,
pp. 359–378.
Good, I. (1952). “Rational decisions”. In: Journal of the Royal Statistical Society.
Series B (Methodological), pp. 107–114.
Heaton, J. (2002). “Managerial Optimism and Corporate Finance”. In: Financial
Management 31.2, pp. 33–45.
Hey, J. (1984). “The economics of optimism and pessimism”. In: Kyklos 37.2,
pp. 181–205.
Hoffman, M. (2011a). Learning, Persistent Overconfidence, and the Impact of
Training Contracts. Mimeo.
Hoffman, M. (2011b). Overconfidence at Work and the Evolution of Beliefs: Evidence from a Field Experiment. Mimeo.
Johnson, D. H. and Sinanovic, S. (2001). “Symmetrizing the Kullback-Leibler distance”. In: IEEE Transactions on Information Theory 1.1, pp. 1–10.
Knight, F. (1921). Risk, Uncertainty and Profit. Houghton Mifflin.
Landier, A. and Thesmar, D. (2009). “Financial contracting with optimistic entrepreneurs”. In: Review of Financial Studies 22.1, pp. 117–150.
Loewenstein, G., Issacharoff, S., Camerer, C., and Babcock, L. (1993). “Self-serving
assessments of fairness and pretrial bargaining”. In: Journal of Legal Studies
22.1, pp. 135–159.
Lundeberg, M., Fox, P., Brown, A., and Elbedour, S. (2000). “Cultural influences
on confidence: Country and gender”. In: Journal of Educational Psychology
92.1, p. 152.
Maccheroni, F., Marinacci, M., and Rustichini, A. (2006). “Ambiguity Aversion,
Robustness, and the Variational Representation of Preferences”. In: Econometrica 74.6, pp. 1447–1498.
Malmendier, U. and Tate, G. (2008). “Who makes acquisitions? CEO overconfidence and the market’s reaction”. In: Journal of Financial Economics 89.1,
pp. 20–43.
Manove, M. and Padilla, A. (1999). “Banking (conservatively) with optimists”. In:
The RAND Journal of Economics, pp. 324–350.
Mayraz, G. (2011). Priors and Desires—A Model of Payoff-Dependent Beliefs.
CEP Discussion Paper 1047. Centre for Economic Performance, London School
of Economics.
Mijović-Prelec, D. and Prelec, D. (2010). “Self-deception as self-signalling: a model
and experimental evidence”. In: Philosophical Transactions of the Royal Society B: Biological Sciences 365.1538, p. 227.
Mullainathan, S. and Washington, E. (2009). “Sticking With your Vote: Cognitive
Dissonance and Political Attitudes”. In: American Economic Journal: Applied
Economics 1.1, pp. 86–111.
Murphy, J. J. (1999). Technical analysis of the financial markets. New York Institute of Finance.
Olsen, R. (1997). “Desirability bias among professional investment managers: Some
evidence from experts”. In: Journal of Behavioral Decision Making 10.1, pp. 65–72.
Park, Y. and Santos-Pinto, L. (2010). “Overconfidence in tournaments: evidence
from the field”. In: Theory and Decision 69.1, pp. 143–166.
Quiggin, J. (1982). “A theory of anticipated utility”. In: Journal of Economic Behavior & Organization 3.4, pp. 323–343. ISSN: 0167-2681.
Sandroni, A. and Squintani, F. (2007). “Overconfidence, insurance, and paternalism”. In: American Economic Review 97.5, pp. 1994–2004.
Santos-Pinto, L. (2008). “Positive Self-image and Incentives in Organisations”. In:
Economic Journal 118.531, pp. 1315–1332.
Sharot, T. (2011). The Optimism Bias. Random House.
Weinstein, N. (1989). “Optimistic biases about personal risks”. In: Science 246.4935,
pp. 1232–1233.
Figure 1: The interface of the Farmers treatment with a maximum accuracy bonus
of £5. The interface of the Bakers treatment was similar, except: (a) the first three
lines were: “You have a buyer for £16,000 worth of bread from your bakery. At
day 100 you will get the money from the order, and will have to use some of it to
buy wheat at the market. Your profit is whatever you would have left after paying
for the wheat.”, and (b) instead of an arrow on the chart pointing to £4,000 with
the label “Wheat production costs”, there was an arrow pointing to £16,000 with
the label “The price you would get for your bread”.
Figure 2: The charts used in the 12 earning periods. The 𝑥-axis represents time,
ranging from day 0 to day 100, and the 𝑦-axis represents price, ranging from £4,000
to £16,000. The data for the charts were adapted from historical equity price data,
shifted and scaled to fit into a uniform range. Figure 1 shows how these charts
were presented to subjects.
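The caption does not say how the historical series were shifted and scaled. As a minimal sketch, assuming a plain min-max (affine) rescaling onto the £4,000 to £16,000 axis range, the transformation could look like the following. The function name `rescale_to_range` and the sample prices are hypothetical, and the original charts need not span the full range.

```python
def rescale_to_range(prices, lo=4000.0, hi=16000.0):
    """Affinely map a price series so its minimum lands on lo and its maximum on hi.

    Hypothetical helper: the paper only says the data were shifted and scaled.
    """
    p_min, p_max = min(prices), max(prices)
    if p_max == p_min:
        raise ValueError("series is constant; cannot rescale")
    scale = (hi - lo) / (p_max - p_min)          # the "scale" step
    return [lo + (p - p_min) * scale for p in prices]  # the "shift" step


# Made-up equity prices, for illustration only.
raw = [52.0, 55.5, 49.0, 61.0, 58.0]
scaled = rescale_to_range(raw)
```

An affine map preserves the shape of the chart (relative rises and falls), which is presumably the property that matters when subjects read a trend from it.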
[Figure 3 panels: four density plots over Price (in £), from 4,000 to 16,000, titled “£6,400 confidence level 3”, “£10,000 confidence level 1”, “£10,000 confidence level 10”, and “£12,600 confidence level 5.5”.]
Figure 3: The examples of distributions used in the instructions. Each distribution
is characterized by a prediction and a confidence level. These examples were used
in explaining the prediction elicitation procedure. They were particularly useful in
establishing a reference for the 1-10 scale that was used in reporting the subject’s
confidence in her prediction.
Table 3: Wishful thinking bias and comparative statics. The table reports the estimated bias in different sub-samples and statistical tests of related hypotheses.

Sample                              Estimated bias [a]     Observations [b]

All subjects                        452*** (s.e. 123)      1584 (132)
  negative?                         𝑝 < 0.0002

Cost of bias
Accuracy bonus: low (£1)            298** (s.e. 164)       816 (68)
Accuracy bonus: medium (£2)         569** (s.e. 328)       300 (25)
Accuracy bonus: high (£5)           645*** (s.e. 210)      468 (39)
  low = medium = high?              𝑝 < 0.4026
  low = 2 ⋅ medium = 5 ⋅ high?      𝑝 < 0.0140 [c]

Degree of subjective uncertainty
Chart uncertainty: low              269** (s.e. 127)       792 (66)
Chart uncertainty: high             635*** (s.e. 166)      792 (66)
  low > high?                       𝑝 < 0.0142
Within chart variance: low          227** (s.e. 113)       792 (66)
Within chart variance: high         677*** (s.e. 175)      792 (66)
  low > high?                       𝑝 < 0.0034

Stakes in the value of the day 100 price
Stakes: low (50p)                   260 (s.e. 289)         288 (24)
Stakes: standard (£1)               495*** (s.e. 135)      1296 (108)
  standard ≤ 2 ⋅ low?               𝑝 < 0.2313 [d]
  standard = 2 ⋅ low?               𝑝 < 0.9668

Confidence in ability to predict prices
Average confidence: low             276* (s.e. 174)        792 (66)
Average confidence: high            628*** (s.e. 169)      792 (66)
  low > high?                       𝑝 < 0.0732
Prices predictable? no              292** (s.e. 174)       792 (66)
Prices predictable? yes             613*** (s.e. 174)      792 (66)
  no > yes?                         𝑝 < 0.0997

Demographics
Males                               411** (s.e. 187)       600 (50)
Females                             477*** (s.e. 166)      984 (82)
  same?                             𝑝 < 0.7956

[a] Robust standard errors in parentheses. Statistical significance indicators: *** 𝑝 < 0.01, ** 𝑝 < 0.05, * 𝑝 < 0.1.
[b] An individual observation refers to the prediction of a given subject in a given chart. Clustering is by subjects. The number of clusters is in parentheses.
[c] If the regression is restricted to the sessions with standard stakes, the test 𝑝-values are 0.5094 and 0.0171, respectively.
[d] If the regression is restricted to the sessions with a low maximum bonus, the test 𝑝-values are 0.7620 and 0.4269, respectively.
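To see how an estimate and its standard error in Table 3 relate to a reported 𝑝-value, the sketch below computes a one-sided 𝑝-value for the full-sample bias. It rests on two assumptions that this section does not confirm: that the “negative ?” row is a one-sided test against a non-positive bias, and that a standard normal reference distribution is an adequate approximation (a 𝑡 distribution based on the number of clusters would give a slightly larger value). The function name `one_sided_p` is hypothetical.

```python
import math


def one_sided_p(estimate, std_err):
    """Upper-tail probability P(Z >= estimate / std_err) for standard normal Z.

    Normal approximation only; an assumption, not the paper's stated procedure.
    """
    z = estimate / std_err
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Full-sample row of Table 3: estimated bias 452 with robust s.e. 123.
p_all = one_sided_p(452.0, 123.0)
print(f"z = {452.0 / 123.0:.2f}, one-sided p = {p_all:.5f}")
```

Under this approximation the full-sample estimate yields a value below the reported bound of 0.0002.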
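[Figure 4 panels: Farmers and Bakers. The 𝑥-axis of each histogram is marked at 7000, the group mean (10102 for Farmers, 9650 for Bakers), and 13000.]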
Figure 4: Histograms of the mean predictions made by Farmers and Bakers. A
normal distribution curve was fitted to each histogram. The mean predictions were
10102 and 9650, respectively. 16 of the 20 subjects making the highest (lowest)
mean predictions were Farmers (Bakers).
[Figure 5: four panels, each with the treatment effect on the 𝑦-axis. The surviving 𝑥-axis labels are “Maximum accuracy bonus (£)”, “Mean prediction confidence in chart”, “Prediction variance in chart (within treatment)”, and “Payoff for each £1,000 of notional profit”. The surviving panel titles are “Prediction Confidence”, “Stakes”, “Strength of interests”, and “Prediction Variance”.]
Figure 5: The comparative statics of wishful thinking bias. The panels show a
95 percent confidence interval for the difference in predictions between Farmers and
Bakers (the treatment effect) in different subsamples. The first panel shows the
comparative statics of the cost of holding wrong beliefs, represented by the maximum accuracy bonus. The solid hyperbolic line represents the best fit for the
Optimal Expectations model, and the dashed horizontal line that of Priors and Desires. The second panel plots the bias in each chart against the mean confidence
in predictions for that chart. The curve is fitted to the inverse of the square of
the mean confidence level. The third panel plots the bias in each chart against the
mean within-group prediction variance. The dashed line is a linear fit through the
origin. Finally, the fourth panel shows the comparative statics of the stakes, with the
𝑥-axis representing the amount in pounds that a subject receives for each £1,000
of notional profit. The dashed line is a linear fit through the origin.
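The fitted lines described in this caption (a line through the origin, and a curve proportional to the inverse square of confidence) are all one-parameter fits of the form y ≈ c · g(x). As a minimal sketch, assuming ordinary unweighted least squares with no intercept, the coefficient has the closed form c = Σ g(x)·y / Σ g(x)². The caption does not state the estimation details, so the weighting and the absence of an intercept are assumptions; the function name `fit_scale` and all numbers below are hypothetical and are not the experimental data.

```python
def fit_scale(xs, ys, basis):
    """Least-squares coefficient c in y ≈ c * basis(x), with no intercept.

    Minimising sum((y - c*g)^2) over c gives c = sum(g*y) / sum(g*g).
    """
    gs = [basis(x) for x in xs]
    denom = sum(g * g for g in gs)
    if denom == 0:
        raise ValueError("basis is zero at every x; coefficient undefined")
    return sum(g * y for g, y in zip(gs, ys)) / denom


# Made-up numbers for illustration only.
stakes = [0.5, 1.0]        # payoff per £1,000 of notional profit
effects = [250.0, 500.0]   # treatment effect at each stake level
slope = fit_scale(stakes, effects, lambda x: x)  # line through the origin

confidence = [5.0, 10.0]   # mean prediction confidence in a chart
bias = [800.0, 200.0]      # bias in that chart
coef = fit_scale(confidence, bias, lambda x: 1.0 / x**2)  # c / confidence^2
```

Passing `lambda x: 1.0 / x` as the basis would give a hyperbolic fit of the kind the caption attributes to the Optimal Expectations model, again under the same no-intercept assumption.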