Adapting de Finetti's proper scoring
rules for Eliciting Bayesian Beliefs
to Prospect Theory
Theo Offerman Joep Sonnemans Gijs van de Kuilen Peter P. Wakker
July 6, 2006
IAREP, Paris
Topic: Our chance estimates of France to become
world-champion.
F: France will win. not-F: Italy.
2
Imagine following bet:
You choose 0 r 1, as you like.
We call r your reported probability of France.
You receive
F
not-F
1 – (1– r)2
1 – r2
What r should be chosen?
First assume EV.
After some algebra:
p true subjective probability;
optimizing p(1 – (1– r)2) + (1–p) (1–r2);
1st order condition 2p(1–r) – 2r(1–p) = 0;
r = p.
Optimal r = your true subjective probability of
France winning.
Easy in algebraic sense.
Conceptually: !!! Wow !!!
de Finetti (1962) and Brier (1950) were the
first neuro-economists!
3
4
"Bayesian truth serum" (Prelec, Science, 2005).
Superior to elicitations through preferences .
Superior to elicitations through indifferences ~
(BDM).
Widely used: Hanson (Nature, 2002), Prelec (Science 2005). In
accounting (Wright 1988), Bayesian statistics (Savage 1971),
business (Stael von Holstein 1972), education (Echternacht
1972), medicine (Spiegelhalter 1986), psychology (Liberman &
Tversky 1993; McClelland & Bolger 1994), experimental
economics (Nyarko & Schotter 2002).
We want to introduce these very nice features
into prospect theory.
Survey
Part I. Deriving r from theories:
expected value;
expected utility;
nonexpected utility for probabilities;
nonexpected utility for ambiguity.
Part II. Deriving theories from observed r.
In particular: Derive beliefs/ambiguity
attitudes. Will be surprisingly easy.
Proper scoring rules <==> Prospect theory:
Mutual benefits.
Part III. Implementation in an experiment.
5
6
Part I. Deriving r from Theories (EV, and
then 3 deviations).
7
Let us assume that you strongly believe in
France (Zidane …)
Your "true" subj. prob.(F) = 0.75.
EV:
Then your optimal rF = 0.75.
8
R(p) 1
rEU
rEV
0.75
0.69
0.61
(a) expected value (EV);
rnonEU (b) expected utility with U(x) =
x (EU);
0.50
rnonEUA (c) nonexpected utility for
nonEU
0.25
Reported probability R(p) = rF
as function of true probability
p, under:
known probabilities, with U(x)
= x0.5 and with w(p) as
common;
EU
EV
0
0
0.25
0.50
0.75
p
rnonEUA: nonexpected utility for
unknown probabilities
1 ("Ambiguity").
next p.
go to p. 11,
Example EU
go to p. 15,
Example nonEU
go to p. 19,
Example nonEUA
9
So far we assumed EV (as does everyone using
proper scoring rules, but as no-one does in
modern decision theory...)
Deviation 1 from EV: EU with U nonlinear
Now optimize
pU(1 – (1– r)2) + (1 – p)U(1 – r2)
r =
10
p
U´(1–r2)
p + (1–p)
U´(1 – (1–r)2)
Reversed (and explicit) expression:
p =
r
U´(1–r2)
r + (1–r)
U´(1– (1–r)2)
11
How bet on France? [Expected Utility].
EV: rEV = 0.75.
Expected utility, U(x) = x:
rEU = 0.69.
You now bet less on France. Closer to safety
(Winkler & Murphy 1970).
go to p. 8, with
figure of R(p)
12
Deviation 2 from EV: nonexpected utility for
probabilities (Allais 1953, Machina 1982, Kahneman &
Tversky 1979, Quiggin 1982, Schmeidler 1989, Gilboa 1987,
Gilboa & Schmeidler 1989, Gul 1991, Levy-Garboua 2001, Luce
& Fishburn 1991, Tversky & Kahneman 1992, Birnbaum 2005)
For two-gain prospects, virtually all those
theories are as follows:
For r 0.5, nonEU(r) =
w(p)U(1 – (1–r)2) + (1–w(p))U(1–r2).
r < 0.5, symmetry; soit!
Different treatment of highest and lowest
outcome: "rank-dependence."
13
w(p)
1
.51
1/3
0
1/3
2/3
1
p
Figure. The common weighting function w.
w(p) = exp(–(–ln(p))) for = 0.65.
w(1/3) 1/3;
w(2/3) .51
14
Now
r =
w(p)
U´(1–r2)
w(p) + (1–w(p))
U´(1– (1–r)2)
Reversed (explicit) expression:
r
w –1
p =
U´(1–r2)
r + (1–r)
U´(1– (1–r)2)
(
)
15
How bet on France now? [nonEU with probabilities].
EV: rEV = 0.75.
EU: rEU = 0.69.
Nonexpected utility, U(x) = x,
w(p) = exp(–(–ln(p))0.65).
rnonEU = 0.61.
You bet even less on France. Again closer to
safety.
go to p. 8, with
figure of R(p)
Deviations were at level of behavior so far, not of beliefs. Now for something different; more fundamental.
3rd violation of EV: Ambiguity (unknown
probabilities; belief/decision-attitude? Yet to be
settled).
No objective data on probabilities.
How deal with unknown probabilities?
Have to give up Bayesian beliefs descriptively.
According to some even normatively.
16
Instead of additive beliefs p = P(F),
nonadditive beliefs B(F) (Dempster&Shafer,
Tversky&Koehler, etc.)
All currently existing decision models:
For r 0.5, nonEU(r) =
w(B(F))U(1 – (1–r)2) + (1–w(B(F)))U(1–r2).
Don't recognize? Is just '92 prospect theory =
Schmeidler (1989)! Write W(F) = w(B(F)).
Can always write B(F) = w–1(W(F)).
For binary gambles: Pfanzagl 1959; Luce ('00 Chapter
3); Ghirardato & Marinacci ('01, "biseparable").
17
18
rF =
w(B(F))
U´(1–r2)
w(B(F)) + (1–w(B(F)))
U´(1– (1–r)2)
Reversed (explicit) expression:
r
B(F) = w –1
U´(1–r2)
r + (1–r)
U´(1– (1–r)2)
(
)
19
How bet on France now? [Ambiguity, nonEUA].
rEV = 0.75.
rEU = 0.69.
rnonEU = 0.61 (under plausible assumptions).
Similarly,
rnonEUA = 0.52.
r's are close to always saying fifty-fifty.
"Belief" component B(F) = w–1(W) = 0.62.
go to p. 8, with
figure of R(p)
20
B(F): ambiguity attitude /=/ beliefs??
Before entering that debate, first:
How measure B(F)?
Our contribution: through proper scoring rules
with "risk correction."
Part II. Deriving Theoretical Things from
Empirical Observations of r
We reconsider reversed explicit expressions:
r
p = w –1
U´(1–r2)
r + (1–r)
U´(1– (1–r)2)
)
(
(
B(F) = w –1
r
U´(1–r2)
r + (1–r)
U´(1– (1–r)2)
)
Corollary. p = B(F) if related to the same r!!
21
Example (participant 25)
22
stock 20, CSM
certificates
dealing in sugar
and bakeryingredients.
Reported
probability:
r = 0.75
91 91
For objective probability p=0.70, also
reports r = 0.75.
Conclusion: B(elief) of ending in bar is 0.70!
We simply measure the R(p) curves, and
use their inverses: is risk correction.
Our proposal takes the best of several worlds!
Need not measure U,W, and w.
Get "canonical probability" without measuring
indifferences (BDM …; Holt 2006).
Calibration without needing many repeated
observations.
Do all that with no more than simple properscoring-rule questions.
23
Mutual benefits
prospect theory <==> proper scoring rules
We bring insights of modern prospect theory to
proper scoring rules. Make them empirically
more realistic. (EV is not credible in 2006 …)
We bring insights of proper scoring rules to
prospect theory. Make B very easy to measure
and analyze.
Directly implementable empirically. We did so in
an experiment, and found plausible results.
24
25
Part III. Experimental Test of
Our Correction Method
26
Method
Participants. N = 93 students.
Procedure. Computarized in lab. Groups of
15/16 each. 4 practice questions.
Stimuli 1. First we did proper scoring rule
for unknown probabilities. 72 in total.
For each stock two small intervals, and, third,
their union. Thus, we test for additivity.
27
Stimuli 2. Known probabilities:
Two 10-sided dies thrown.
Yield random nr. between 01 and 100.
Event F: nr. 75 (etc.).
Done for all probabilities j/20.
28
Motivating subjects. Real incentives.
Two treatments.
1. All-pay. Points paid for all questions.
6 points = €1.
Average earning €15.05.
2. One-pay (random-lottery system).
One question, randomly selected afterwards,
played for real. 1 point = €20. Average
earning: €15.30.
29
Results
30
1
0.9
Average
correction
curves.
0.8
Corrected probability
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
0.000
0.100
0.200
0.300
0.400
0.500
0.600
0.700
0.800
0.900
1.000
Reported probability
ONE (rho = 0.70)
ALL (rho = 1.14)
45°
31
F(ρ )
1
0.9
Individual
corrections
0.8
0.7
0.6
0.5
treatment
one
0.4
0.3
0.2
treatment
all
0.1
0
-2.0
-1.5
-1.0
-0.5
0.0
0.5
1.0
1.5
ρ
32
Figure 9.1. Empirical density of additivity bias for the two treatments
Fig. b. Treatment t=ALL
Fig. a. Treatment t=ONE
uncorrected
160
160
140
140
120
120
100
100
80
corrected
80
60
60
40
40
20
20
0
0.6
0.4
0.2
uncorrected
0
0.2
0.4
0.6
corrected
0
0.6
0.4
0.2
0
0.2
0.4
0.6
For each interval [(j2.5)/100, (j+2.5)/100] of length 0.05 around j/100, we counted the number of additivity biases
in the interval, aggregated over 32 stocks and 89 individuals, for both treatments. With risk-correction, there were
65 additivity biases between 0.375 and 0.425 in the treatment t=ONE, and without risk-correction there were 95
such; etc.
Summary and Conclusion
Modern decision theories: proper scoring
rules are heavily biased.
We correct for those biases, with benefits
for proper-scoring rule community and for
nonEU community.
Experiment: correction improves quality;
reduces deviations from ("rational"?)
Bayesian beliefs.
Do not remove all deviations from Bayesian
beliefs. Beliefs seem to be genuinely
nonadditive/nonBayesian/sensitive-toambiguity.
33
© Copyright 2026 Paperzz