
SUBJECTIVE PROBABILITY, CONFIDENCE,
AND BAYESIAN UPDATING
Igor Kopylov∗
Institute for Mathematical Behavioral Sciences, and
Department of Economics, University of California, Irvine, CA 92697
Abstract
I derive a unique subjective probabilistic belief p and Bayesian updating
for this belief from ambiguity averse preferences. To do so, I assume an
exogenous information set ∆ of possible probabilistic scenarios on the state
space S. Every uncertain prospect f is evaluated via a mixture of the
unique subjective belief p with the least favorable scenario for f in the set
∆. The weight of p in this mixture is also unique and reflects a degree of
subjective confidence in p. I use the well-known axioms—Order, Continuity,
Monotonicity, and Independence—with the last two conditions adjusted for
the exogenous ambiguity in the set ∆. Bayesian updating for the belief
p contingent on any non-null event E is derived from a weak version of
dynamic consistency.
1 Introduction
The theory of subjective probability seeks to represent individual beliefs by additive
probability measures derived from preferences over uncertain prospects. Contributors to this theory—most notably, Ramsey [37], de Finetti [10], Savage [39],
and Machina–Schmeidler [32]—have obtained such representations for different
decision contexts and modes of choice behavior.
Yet all standard models of subjective probability contradict Ellsberg-type behaviors where agents persistently choose to bet on some event A rather than B,
but also on the complement Ac = S \ A rather than Bc = S \ B. For example, this
betting pattern is common when the events A and Ac are known to be equally
likely, but the objective probabilities of B and B c are unknown or vague.1 The
general reluctance to bet on events with vague probabilities is called ambiguity aversion.

∗ I thank Mark Machina, Larry Epstein, Peter Wakker, and RUD participants for their comments.

1 In the three-color version of the Ellsberg Paradox, the probabilities of A = {r} and Ac = {g, b} are known to be 1/3 and 2/3 respectively, and the probabilities of B = {g} and Bc = {r, b} belong to [0, 2/3] and [1/3, 1] respectively.
Ambiguity can be defined exogenously when the decision maker is given an
information set ∆ that contains the objective probability law p0 , and no other
constraints on p0 are specified. The exogenous specification of ∆ is natural in
many experimental and theoretical settings. Starting from Becker and Brownson
[2], there have been at least forty empirical studies (see the table in Oechssler
and Roomets [36, p.3]) in Ellsberg-style frameworks where objects (e.g. balls,
cards) were drawn at random from compositions (e.g. urns, decks) that were not
fully disclosed to the subjects. Such partial information can be easily translated
into information sets, which are usually discrete (see Chew, Miao, and Zhong
[8]). In the theoretical literature, information sets have been used in models of unambiguous events (Epstein and Zhang [16, p.269]), smooth ambiguity (Klibanoff,
Marinacci, and Mukerji [28, p.1860]), objective rationality (Gilboa, Maccheroni,
Marinacci, and Schmeidler [22]), robust Bayesian analysis (reviewed by Berger
[3]), and other applications.
This paper models agents who reveal a unique probabilistic belief p together
with aversion towards ambiguity in the information set ∆. Even though the
belief p does not represent betting preferences, it can still be uniquely derived
from choices for which ambiguity is suitably restricted. The degree of subjective
confidence in p is revealed through such choices as well. Moreover, a weak form
of dynamic consistency establishes Bayesian updating for the subjective belief p.
I use Anscombe–Aumann’s [1] decision framework where preferences ≽ are
defined over acts f : S → L that map states of the world s ∈ S into the set L
of lotteries (objective distributions over deterministic outcomes). My first result
(Theorem 1) characterizes the utility
U(f) = (1 − ε) ∫ u(f(s)) dp + ε min_{q∈cl(co ∆)} ∫ u(f(s)) dq,    (1.1)
where ε ∈ [0, 1], p ∈ cl(co ∆) is a probability measure in the closed convex hull of
the information set, and u is a vNM expected utility index over lotteries. This representation is a special case of Gilboa–Schmeidler’s [23] maxmin expected utility
(MEU)
U(f) = min_{q∈Π} ∫ u(f(s)) dq,    (1.2)
where the set of priors Π = (1 − ε){p} + εcl(co ∆) has the added parametric
structure of epsilon contamination.
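On a finite state space, representation (1.1) is easy to evaluate numerically. The following sketch (Python; the three-color information set, the belief p, the weight ε, and the utility profile are illustrative assumptions, not quantities derived in the paper) computes U(f) both as the ε-mixture in (1.1) and as the maxmin form (1.2) over the contaminated set Π, and checks that the two agree.

```python
import numpy as np

# Three-color urn: states (r, b, g). The information set is discretized as the
# measures with q_r = 1/3 and q_b + q_g = 2/3 (a hypothetical grid).
grid = np.linspace(0.0, 2.0 / 3.0, 201)
Delta = np.array([[1.0 / 3.0, pb, 2.0 / 3.0 - pb] for pb in grid])

p = np.array([1.0, 1.0, 1.0]) / 3.0   # subjective belief (assumed)
eps = 0.5                             # contamination weight (assumed)
uf = np.array([0.0, 0.0, 1.0])        # utility profile of a bet on the color g

# Representation (1.1): mixture of the belief p with the least favorable scenario.
U_mix = (1 - eps) * (p @ uf) + eps * np.min(Delta @ uf)

# Representation (1.2): maxmin expected utility over Pi = (1 - eps){p} + eps*Delta.
Pi = (1 - eps) * p + eps * Delta
U_meu = np.min(Pi @ uf)

print(U_mix, U_meu)                   # both equal 1/6 for the bet on g
assert abs(U_mix - U_meu) < 1e-9
```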
The decision maker as portrayed by (1.1) evaluates every act f via the ε-mixture of the least favorable scenario in the set ∆ with her subjective belief
p ∈ ∆. To interpret this belief, let f (p) be the lottery induced by f via p. Then
p is the only probability measure that satisfies

f(p) ≽ (≻) g(p)  ⇒  f ≽ (≻) g
for all acts f and g such that f is more secure than g with respect to the exogenous
ambiguity.2 On the other hand, if g is more secure than f , then g ≻ f may hold
together with the comparison f (p) ≻ g(p). In this case, ambiguity concerns can
overcome the decision maker’s confidence in p. The weight 1−ε can be interpreted
in terms of this subjective confidence (see Theorems 2 and 3 below).
Representation (1.1) is characterized via the familiar list of axioms—Order,
Continuity, Monotonicity, and Independence—where the last two conditions are
adjusted for the exogenous ambiguity in the information set ∆. Monotonicity
requires f ≽ g if f (q) ≽ g(q) for all q ∈ ∆. Independence imposes separability
f ≽ g  ⇒  αf + (1 − α)h ≽ αg + (1 − α)h,
but only when αf + (1 − α)h hedges ambiguity better than αg + (1 − α)h. (The
comparative notion of hedging is defined in terms of ∆ as well.)
1.1 Dynamic Consistency and Bayesian Updating
It is well-known (e.g. Ghirardato [18]) that in the standard expected utility model,
the Bayesian updating of subjective beliefs is equivalent to dynamic consistency.
This condition requires that given any non-null event E,
f Eh ≽ gEh  ⇔  f ≽E g    (1.3)
where the preference f ≽E g is revealed after the event E occurs, and f Eh ≽ gEh
is the ex ante preference between composite acts f Eh and gEh that share the common payoffs h(s) if E does not occur. Dynamic consistency is important because
it facilitates backward induction methods in a broad range of applications in game theory, finance, macroeconomics, etc.
There is no easy compromise between dynamic consistency and ambiguity
aversion in the multiple priors model. Various approaches include
• finding a subrelation of the ranking ≽ that is dynamically consistent, such
as the unambiguous preferences ≽∗ defined by Ghirardato, Maccheroni, and
Marinacci (GMM) [19];
• imposing rectangularity on the set of priors Π that guarantees dynamic
consistency for a given event E and its complement E c , or more generally,
a given filtration (Epstein and Schneider [14]);
• updating priors in a menu-dependent way (Hanany and Klibanoff [25]);

• changing the subjective perception of decision problems (Halevy and Feltkamp [24]).

2 The comparative notion of “more secure than” is defined in terms of the information set ∆ and is represented by the MEU function V(f) = min_{q∈cl(co ∆)} ∫ u(f(s)) dq.
My model adds several points to this literature. First, I formulate a restricted
version of dynamic consistency that is necessary and sufficient (Theorem 4) for
the subjective belief p to be updated via the Bayesian formula
(p|E)(A) = p(A ∩ E) / p(E)
when the exogenous information set ∆ is updated prior by prior via the same rule
∆|E = {q|E : q ∈ ∆}.
The conditional preference ≽E is represented by
UE(f) = (1 − λ) ∫ u(f(s)) d(p|E) + λ min_{q∈cl(co ∆|E)} ∫ u(f(s)) dq    (1.4)
where the parameter λ ∈ [0, 1] need not equal ε.
Formally, I impose condition (1.3) only when f Eh is less secure than gEh, but
f is more secure than g on the event E. Thus, I identify a subrelation of ≽ that
is dynamically consistent on the given E in the Bayesian case.3
Note that the updated set ΠE = ε(∆|E)+(1−ε){p|E} in the MEU representation for ≽E cannot be derived by updating priors from the set Π = ε∆+(1−ε){p},
even if the equality ε = λ is assumed. For example, recall the three-color Ellsberg Paradox, where S = {r, b, g}, and the decision maker is told only that the objective probability of the state r is 1/3,

∆ = {(pr, pb, pg) ∈ R3+ : pr = 1/3 and pg + pb = 2/3}.

Let E = {r, b}, p = (1/3, 1/3, 1/3), and ε = λ = 1/2. Then

Π|E = {(pr, 1 − pr) : pr ∈ [2/5, 2/3]}
ΠE = {(pr, 1 − pr) : pr ∈ [5/12, 3/4]}.
Neither of the two sets belongs to the other. This example illustrates that updating
the multiple priors model may depend on the information set and the parameter
ε rather than just the preference ≽.
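The comparison of Π|E and ΠE can also be verified numerically. The sketch below (Python; the grid over ∆ is an assumption made only for the computation) recovers the two intervals of conditional probabilities of r and confirms that neither set contains the other.

```python
import numpy as np

# Discretize Delta = {(1/3, pb, 2/3 - pb)} and set p and eps = lambda = 1/2.
grid = np.linspace(0.0, 2.0 / 3.0, 2001)
Delta = np.array([[1.0 / 3.0, pb, 2.0 / 3.0 - pb] for pb in grid])
p = np.array([1.0, 1.0, 1.0]) / 3.0
eps = lam = 0.5
E = np.array([1.0, 1.0, 0.0])                 # indicator of E = {r, b}

def update(q, E):
    """Bayesian update q|E of a measure q on a finite state space."""
    qE = q * E
    return qE / qE.sum()

# Pi|E: update each prior in Pi = (1 - eps){p} + eps*Delta prior by prior.
Pi = (1 - eps) * p + eps * Delta
Pi_given_E = np.array([update(q, E) for q in Pi])

# Pi_E: contaminate p|E with the prior-by-prior update of Delta.
Delta_given_E = np.array([update(q, E) for q in Delta])
Pi_E = (1 - lam) * update(p, E) + lam * Delta_given_E

# Compare the ranges of the conditional probability of r.
lo1, hi1 = Pi_given_E[:, 0].min(), Pi_given_E[:, 0].max()   # [2/5, 2/3]
lo2, hi2 = Pi_E[:, 0].min(), Pi_E[:, 0].max()               # [5/12, 3/4]
print((lo1, hi1), (lo2, hi2))
assert lo1 < lo2 and hi1 < hi2      # neither interval contains the other
```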
Finally, the well-defined subjective beliefs and Bayesian updating make it possible to characterize

• the common prior assumption in dynamic MEU models;
• parametric MEU representations with rectangular sets of priors.

3 Unlike GMM’s unambiguous preferences, this subrelation depends on the event E, the information set ∆, and the payoffs h obtained on the complement of E.
Rectangularity guarantees dynamic consistency on a given partition of the state
space and backward induction for a given filtration (Epstein and Schneider [14]).
See Section 5.2 for more details.
1.2 Epsilon-Contamination in the Literature
The epsilon contamination structure (1.1) has many applications in statistics,
decision theory, and economics. Following Hodges and Lehmann [26], this functional form has been used in robust Bayesian analysis (e.g. Berger and Berliner
[4], Moreno and Cano [33]). In this analysis, the parameter ε is commonly interpreted as the amount of error that is deemed possible for the prior p. This
interpretation differs from mine because it uses ε to describe the imprecision of a
priori knowledge rather than the effect of this imprecision on decision making.
Ellsberg [12, p.663–669] suggests the functional form (1.1) as an ad hoc explanation for his paradox. He describes p as an ‘estimated distribution, which
reflects all [subjective] judgements of the relative likelihoods of distributions, including judgements of equal likelihoods,’ and the parameter 1−ε (ρ in his notation)
as a degree of the subjective ‘confidence in the best estimates of likelihood’. In
an early experimental study, Becker and Brownson [2] find some evidence that
people put a constant weight ε on the least favorable probabilistic scenario in the
information set. (This study estimates the average weight ε to be 0.768.) Other
experiments (reviewed by Camerer and Weber [5]) produce mixed results.
An important application of representation (1.1) is obtained when the decision
maker has no information about objective probabilities, and hence, ∆ equals the
set P of all probability measures on the state space S. In this case of complete
ignorance, representation (1.1) takes the form
U(f) = (1 − ε) ∫ u(f(s)) dp + ε min_{s∈S} u(f(s)).    (1.5)
This representation has been applied to model asset pricing (Epstein and Wang [15]), search (Nishimura and Ozaki [34]), and insurance (Carlier, Dana, and Shahdi [6]). Note that in the case of complete ignorance, the subjective probability measure p represents the betting preferences over all events A ≠ S. Thus, if the state
space S is sufficiently rich, the probability measure p can be elicited from the
comparative likelihood relation as in de Finetti [10] or Kopylov [30].
Gilboa [21] and Jaffray [27] axiomatize a counterpart of representation (1.5) for
choice of objective lotteries where expected utility is mixed with the worst possible outcome. Eichberger and Kelsey [11], Nishimura and Ozaki [35] study epsilon
contamination (1.5) in Anscombe-Aumann’s framework. Both models assume
complete ignorance and provide no behavioral meaning for p and ε. (Nishimura
and Ozaki take the parameter ε as a primitive rather than deriving it from preference.) Gajdos, Hayashi, Tallon, and Vergnaud [17] derive epsilon contamination
in a different framework that includes variable information sets ∆ and incorporates them into objects of choice—it is assumed that the decision maker ranks
act-information pairs of the form (f, ∆). These authors interpret the parameter
ε as a degree of imprecision aversion, which is common for all information sets.
The probability measure p in their model is uniquely determined by ∆ and hence,
does not depend on preference (is not subjective). All of the above models impose lists of axioms quite different from those adopted here.
Kopylov [29] formulates a representation result which is proven similarly to
Theorem 1, but has distinct primitives. Instead of taking ∆ as exogenous, this set
can be derived endogenously from an incomplete binary relation ≽∗ that represents
choices that the decision maker is willing to make immediately without waiting
for the objective probability law p0 to become clear. The endogenous definition
of ∆ has its own practical limitations, as it requires an additional decision stage
after the true probability law has been announced. The axioms and interpretation
for representation (1.1) in the choice deferral model are distinct from the exogenous case as well. My characterizations of Bayesian updating (Theorem 4) and
the comparative confidence result (Theorem 3) do not have any counterparts in
Kopylov [29] at all.
2 Preliminaries
Adopt a version of Anscombe–Aumann’s [1] decision framework. Given are a set
X of outcomes, a set S of states of nature, and an algebra Σ ⊂ 2S of events. Based
on these primitives, define
• the set L = {l, . . . } of all lotteries—probability measures on X with finite
support,
• the set U of all expected utility functions on L,
• the set P = {p, q, . . . } of all finitely additive probability measures on (S, Σ)
with the weak∗ topology,4
• the set H = {f, g, . . . } of all acts—Σ-measurable functions f : S → L that
have a finite range in L.
Endow the set H with a natural mixture operation: for any f, g ∈ H and α ∈ [0, 1],
let αf + (1 − α)g be an act such that for all s ∈ S,
[αf + (1 − α)g](s) = αf (s) + (1 − α)g(s).
4 If S is finite, then this topology is Euclidean. In general, a sequence {qn} in P converges to q ∈ P in the weak∗ topology if for any event A ∈ Σ, the sequence {qn(A)} converges to q(A).
Identify constant acts with the corresponding lotteries l ∈ L. Given any acts
f, g ∈ H and any event A ∈ Σ, define a composite act f Ag by

(f Ag)(s) = f(s) if s ∈ A,  and  (f Ag)(s) = g(s) if s ∉ A.
Interpret any act f ∈ H as a physical action that yields the lottery f (s) after
the state s is observed. This lottery is then resolved via an objective randomizing
device like a fair coin or a roulette wheel.
For any act f ∈ H and any probability measure q ∈ P, let
f(q) = Σ_{l∈L} l · q({s : f(s) = l}).
This mixture is well-defined because f has finite range. Say that the lottery f (q)
is induced by f via q.
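Because every act has finite range, the induced lottery f(q) is a finite mixture and can be computed directly. The minimal Python sketch below (the states, outcomes, and numbers are hypothetical) represents lotteries as dictionaries and aggregates weights state by state, which is equivalent to grouping the states with identical lotteries as in the definition above.

```python
# A lottery is a dict {outcome: probability}; an act maps states to lotteries.
f = {"s1": {"win": 1.0},
     "s2": {"win": 0.5, "lose": 0.5},
     "s3": {"lose": 1.0}}
q = {"s1": 0.2, "s2": 0.3, "s3": 0.5}   # a probabilistic scenario on the states

def induced_lottery(f, q):
    """Return f(q): mix the lotteries f(s) with the weights q(s)."""
    out = {}
    for s, lottery in f.items():
        for outcome, prob in lottery.items():
            out[outcome] = out.get(outcome, 0.0) + q[s] * prob
    return out

print(induced_lottery(f, q))   # {'win': 0.35, 'lose': 0.65} (up to float rounding)
```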
Suppose that there is an objective probability distribution p0 ∈ P on the
state space, but the decision maker does not know p0 precisely. To describe this
ambiguity, take a non-empty information set ∆ ⊂ P as another primitive of the
model. Assume that the decision maker knows only that p0 belongs to the set ∆.
For example, ∆ can be specified as follows.
(i) Let ∆ = P. This case captures complete ignorance as the decision maker is
assumed to know nothing about p0 .
(ii) Let
∆ = {q ∈ P : q(A) = p0 (A) for all A ∈ Γ}
if objective probabilities p0 (A) are known only for events A in a subclass
Γ ⊂ Σ. Epstein and Zhang [16, p. 269] use this specification to motivate
their model of unambiguous events. This structure also accommodates applications (e.g. Moreno and Cano [33]) where only the median or some other
percentiles of the true probability law are known.
(iii) Let
∆ = {q ∈ P : q(A) ≥ ν(A) for all A ∈ Σ}
if the decision maker knows only some lower bounds ν(A) for objective probabilities p0 (A). (Note that imposing an upper bound on p0 (A) is equivalent
to imposing a lower bound on p0 (Ac ).) This specification accommodates
various empirical settings (such as Becker and Brownson [2], Curley and
Yates [9]) that have been used to test individual response to ambiguity. In
these settings, subjects were confronted with bets on events for which only
intervals of possible probabilities were given.
(iv) Let ∆ be a parametric class of probability measures (e.g. uniform, normal,
Poisson, binomial etc.) on a suitable state space S. This structure is common in robust Bayesian analysis (e.g. Berger and Berliner [4]). In this case,
the set ∆ is often not even convex. While the lack of convexity (and closedness) is convenient for the extramathematical interpretation of ∆, it is not
essential for my model, where ∆ can be freely replaced with its closed convex
hull cl(co ∆) without changing the utility representations or the behavioral
axioms.
3 Model
Let a binary relation ≽ be the decision maker’s weak preference over H. Note that
the information set ∆ is not a component of objects of choice, and the preference
≽ is defined over acts rather than over pairs of acts and information.
Axiom 1 (Order). ≽ is complete and transitive.
Axiom 2 (Continuity). For any acts f, g, h ∈ H, the sets
{α ∈ [0, 1] : αf + (1 − α)g ≽ h} and
{α ∈ [0, 1] : αf + (1 − α)g ≼ h}
are closed in [0, 1].
Given any act f ∈ H, let
L(f, ∆) = {l ∈ L : f(q) ≽ l for all q ∈ ∆}.
Then every lottery l ∈ L(f, ∆) is inferior to all lotteries f (q) induced by f via
probabilistic scenarios q ∈ ∆, including the objective probability law p0 ∈ ∆.
This objective dominance motivates
Axiom 3 (∆-Monotonicity). f ≽ l for all acts f ∈ H and lotteries l ∈ L(f, ∆).
Note that ∆-Monotonicity implies ∆′ -Monotonicity for any ∆ ⊂ ∆′ ⊂ P.
Recall that the standard Independence axiom requires
f ≽ g  ⇒  αf + (1 − α)h ≽ αg + (1 − α)h
for all acts f, g, h ∈ H and α ∈ [0, 1]. Similarly to the Ellsberg Paradox, the
decision maker may violate this separability if she perceives the mixture αg + (1 −
α)h to “hedge” ambiguity better than αf + (1 − α)h. The information set ∆ can
provide a formal meaning to such hedging.
Given any two acts f, g ∈ H, say that f is more secure or less secure than g
if L(f, ∆) ⊃ L(g, ∆) or L(f, ∆) ⊂ L(g, ∆) respectively.
Axiom 4 (∆-Independence). For all α ∈ [0, 1], acts f, g, h ∈ H and lotteries
l ∈ L such that αf + (1 − α)h is more secure than αf + (1 − α)l, but αg + (1 − α)h
is less secure than αg + (1 − α)l,
f ≽ g  ⇒  αf + (1 − α)h ≽ αg + (1 − α)h.    (3.1)
If h = l, then ∆-Independence becomes Gilboa–Schmeidler’s C-Independence
f ≽ g  ⇒  αf + (1 − α)l ≽ αg + (1 − α)l.
More broadly, it assumes that the preference αf + (1 − α)l ≽ αg + (1 − α)l should
be preserved when the constant act l in these mixtures is replaced by any act h
such that αf + (1 − α)h is more secure than αf + (1 − α)l, but αg + (1 − α)h is
less secure than αg + (1 − α)l.
To motivate this separability, let u ∈ U be an expected utility representation
for the ranking of lotteries l ∈ L. The existence of u is guaranteed by Order,
Continuity, and C-Independence. For any act f ∈ H, define its security level
M(f) = min_{q∈cl(∆)} u(f(q))
to be the expected utility of f under the least favorable probabilistic scenario in
the closure of ∆. Note that f is more secure than g if and only if M (f ) ≥ M (g).
Define the security premium 5 of any mixture αf + (1 − α)g as
SP (α, f, g) = M (αf + (1 − α)g) − [αM (f ) + (1 − α)M (g)] ≥ 0.
This premium is non-negative because M is concave. It will be shown below
(Lemma 6) that for all α ∈ [0, 1] and acts f, g, h ∈ H,
SP (α, f, h) ≥ SP (α, g, h)
if and only if there exists l ∈ L such that αf + (1 − α)h is more secure than
αf + (1 − α)l, but αg + (1 − α)h is less secure than αg + (1 − α)l. In this case,
the mixture αf + (1 − α)h can be interpreted to hedge ambiguity better than
αg + (1 − α)h because the security premium SP(α, f, h) is greater than or equal to SP(α, g, h). In this interpretation, ambiguity aversion should motivate
separability (3.1) in ∆-Independence.
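The security level M and the security premium SP are straightforward to compute on a finite example. The sketch below (Python; the discretized three-color information set and the utility profiles are hypothetical) evaluates M for three bets and verifies that the security premium of a mixture is non-negative, as the concavity of M requires.

```python
import numpy as np

# Discretized information set for the three-color urn (an assumption).
grid = np.linspace(0.0, 2.0 / 3.0, 201)
Delta = np.array([[1.0 / 3.0, pb, 2.0 / 3.0 - pb] for pb in grid])

def M(uf):
    """Security level: expected utility under the least favorable scenario."""
    return np.min(Delta @ uf)

def SP(alpha, uf, ug):
    """Security premium of the mixture alpha*f + (1 - alpha)*g."""
    return M(alpha * uf + (1 - alpha) * ug) - (alpha * M(uf) + (1 - alpha) * M(ug))

uf = np.array([1.0, 0.0, 0.0])   # utility profile of a bet on r
ug = np.array([0.0, 1.0, 0.0])   # utility profile of a bet on b
uh = np.array([0.0, 0.0, 1.0])   # utility profile of a bet on g

print(M(uf), M(ug), M(uh))       # 1/3, 0, 0
print(SP(0.5, ug, uh))           # 1/3: mixing the bets on b and g hedges ambiguity
assert SP(0.5, ug, uh) >= 0 and SP(0.5, uf, uh) >= 0   # concavity of M
```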
Note that for all mixtures αf + (1 − α)g, Uncertainty Aversion
f ≽ g  ⇒  αf + (1 − α)g ≽ g
is also implied by ∆-Independence (together with Order and Continuity) because
SP (α, f, g) ≥ 0 = SP (α, g, g).
5 This definition is analogous to the notions of security premium in Kopylov [29] and risk premium for monetary gambles in Kreps [31, p.74].
Say that ≽ is extremely cautious if f (q) ≽ f for all f ∈ H and q ∈ ∆. The
extremely cautious decision maker can choose an ambiguous act f over a constant
lottery l ∈ L only if f (q) ≽ l for all q ∈ ∆. Otherwise, she should avoid ambiguity
and choose l over f because l ≻ f (q) ≽ f for some q ∈ ∆.
Theorem 1. Axioms 1–4 hold if and only if ≽ is represented by
U(f) = (1 − ε) u(f(p)) + ε min_{q∈cl(∆)} u(f(q)),    (3.2)
where ε ∈ [0, 1], p ∈ cl(co ∆), and u ∈ U .
Moreover, if ≽ is not extremely cautious, then
(i) it has another representation (3.2) with components ε′ ∈ [0, 1], p′ ∈ P, and
u′ ∈ U, if and only if ε′ = ε, p′ = p, and u′ = αu + β for some α > 0 and
β ∈ R;
(ii) for all f, g ∈ H such that f is more secure than g,
f(p) ≻ g(p)  ⇒  f ≻ g
f(p) ≽ g(p)  ⇒  f ≽ g,    (3.3)
and p is the only probability measure in P that satisfies this condition.
If ε = 0 and ∆ is not a singleton, then the decision maker does not know
objective probabilities precisely, but still maximizes expected utility
U (f ) = u(f (p))
with respect to her subjective belief p ∈ cl(co ∆).
In general, representation (3.2) evaluates every act f via the ε-mixture of the
least favorable belief in the set cl(∆) and the belief p ∈ cl(co ∆) that is common
for all acts.6 This representation can be written in the MEU form
U(f) = min_{q∈Π} u(f(q)),    (3.4)

where the set of priors

Π = (1 − ε){p} + ε cl(co ∆)    (3.5)
is the epsilon contamination of the probability measure p ∈ cl(co ∆) by the closed
convex hull of the information set ∆. Therefore, Theorem 1 refines the multiple
priors model and delivers the added structure (3.5) for the set Π. Note that the exogenous information set ∆ is necessary for this refinement. If ∆ can be picked arbitrarily, then any Π has structure (3.5), most trivially for ε = 1 and ∆ = Π.

6 Equivalently, p can be mixed with the least favorable belief for the act f in cl(co ∆) because min_{q∈cl(∆)} u(f(q)) = min_{q∈cl(co ∆)} u(f(q)).
The subjective belief p manifests itself directly through the rankings (3.3) and
is uniquely determined by these rankings (except for the case of extreme caution).
More precisely, p is the only probability measure in the entire simplex P such that
the comparison of the induced lotteries f (p) ≻ (≽)g(p) represents all the rankings
f ≻ (≽)g when f is more secure than g.
In contrast to the standard models of subjective probability, the use of the belief p is restricted in a way that allows for Ellsberg-type ambiguity aversion.
The rankings f (p) ≻ g(p) and g ≻ f may coexist whenever g is more secure
than f . In particular, the decision maker may prefer to bet on event A rather
than B even if she believes that A is more likely than B, but finds the bet on B
more secure. Intuitively, she may exhibit these preference reversals because she
is not fully confident in her assessment of probabilities p. The parameter 1 − ε in
representation (3.2) is then interpretable as the subjective degree of confidence in
the belief p. Formally, this parameter can be associated with several behavioral
patterns that I discuss next.
3.1 Confidence and Unambiguous Preferences
For any preference ≽, define its unambiguous part ≽∗ over acts f, g ∈ H via
f ≽∗ g  ⇔  αf + (1 − α)h ≽ αg + (1 − α)h for all α ∈ [0, 1] and h ∈ H.
This definition is proposed by GMM to distinguish comparisons that hold for
all probabilistic scenarios in the subjective set of priors Π. According to my
representation (3.2), the unambiguous preference f ≽∗ g may hold even if it is
objectively possible that g(p0 ) ≻ f (p0 ), but the decision maker is sufficiently
confident in her subjective belief p to overcome such ambiguity concerns.
Consider two preferences ≽1 and ≽2 with common non-singleton information
set ∆ ⊂ P.
Theorem 2. If preferences ≽1 and ≽2 satisfy Axioms 1–4, then the following statements are equivalent:

(i) for all f, g ∈ H,

f ≽∗1 g  ⇒  f ≽∗2 g,    (3.6)

(ii) for all acts f ∈ H and lotteries l ∈ L,

f ≽1 l  ⇒  f ≽2 l,    (3.7)

(iii) ≽1 and ≽2 have representations (3.2) with tuples (u, p1, ε1) and (u, p2, ε2) such that ε2 ≤ ε1 = 1, or ε2 ≤ ε1 < 1 and p2 ∈ ((ε1 − ε2)/(1 − ε2)) cl(co ∆) + ((1 − ε1)/(1 − ε2)) {p1}.
Condition (3.7) is the comparative definition of ambiguity aversion used by
Epstein [13] and Ghirardato and Marinacci [20]. Either condition (3.6) or (3.7)
is equivalent to the inclusion Π2 ⊂ Π1 for the sets of priors Πi in the multiple
priors representations (3.4) for the preferences ≽i . Statement (iii) rewrites this
inclusion for the ε-contamination case when Π1 = ε1 ∆ + (1 − ε1 ){p1 } and Π2 =
ε2 ∆ + (1 − ε2 ){p2 }. The inequality ε2 ≤ ε1 is derived here together with the
requirement that the subjective beliefs p1 and p2 diverge within a limited range
determined by ε1 , ε2 and ∆. Therefore, each of the behavioral conditions (3.6)
and (3.7) is sufficient, but not necessary for ε2 ≤ ε1 .
To make the characterization more precise, assume that ∆ is not an interval,
that is, ∆ contains at least three linearly independent scenarios q, q′, q′′ ∈ P such that any vanishing linear combination αq + α′q′ + α′′q′′ = 0 must have zero coefficients α = α′ = α′′ = 0. For any act f ∈ H, let

Ri(f) = {l ∈ L : f ≽∗i l or l ≽∗i f}

be the set of all lotteries l such that the choice between f and l is unambiguous according to the ranking ≽∗i.
Theorem 3. Suppose that ∆ is not an interval, and preferences ≽1 and ≽2 satisfy
Axioms 1–4. Then there exists an act f ∈ H such that
R1(f) ⊊ R2(f)    (3.8)

if and only if ≽1 and ≽2 have utility representations (3.2) with u1 = u2 and ε1 > ε2.
This result does not impose any constraints on the beliefs p1 and p2 except for
the regular inclusion p1 , p2 ∈ cl(co ∆). The comparison ε1 > ε2 is characterized in
terms of unambiguous preferences between acts f and lotteries l. If R1 (f ) is a strict
subset of R2(f) for some f ∈ H, then ≽2 can compare f unambiguously with
more lotteries than ≽1 . This suggests that there is more subjective confidence in
the belief p2 than in p1 , which translates into the strict inequality 1 − ε2 > 1 − ε1 .
To characterize the inequality ε1 ≤ ε2 , one can negate (3.8) and require that
for all f ∈ H, R1 (f ) is not a proper subset of R2 (f ).
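The role of ε in Theorems 2 and 3 can be illustrated numerically. Using the interval characterization of unambiguous comparisons from GMM [19] (restated as (A.15)–(A.16) in the Appendix), the sketch below (Python; the common belief, the weights ε1 > ε2, and the bet are assumptions chosen for illustration) computes, for each agent, the interval of utility levels at which a lottery cannot be compared unambiguously with a bet on an ambiguous event, and checks the strict inclusion R1(f) ⊊ R2(f).

```python
import numpy as np

# Discretized three-color information set and a common belief p (assumptions).
grid = np.linspace(0.0, 2.0 / 3.0, 201)
Delta = np.array([[1.0 / 3.0, pb, 2.0 / 3.0 - pb] for pb in grid])
p = np.array([1.0, 1.0, 1.0]) / 3.0
uf = np.array([0.0, 1.0, 0.0])          # a bet on the ambiguous event {b}

def ambiguous_interval(eps):
    """Utility levels u(l) at which l cannot be compared unambiguously with f:
    [eps*M(f) + (1-eps)*u(f(p)), eps*M*(f) + (1-eps)*u(f(p))], as in (A.16)."""
    lo = eps * np.min(Delta @ uf) + (1 - eps) * (p @ uf)
    hi = eps * np.max(Delta @ uf) + (1 - eps) * (p @ uf)
    return lo, hi

lo1, hi1 = ambiguous_interval(0.6)      # agent 1: lower confidence (eps1 = 0.6)
lo2, hi2 = ambiguous_interval(0.2)      # agent 2: higher confidence (eps2 = 0.2)
print((lo1, hi1), (lo2, hi2))
# Agent 2's interval is strictly inside agent 1's, so R1(f) is a strict subset
# of R2(f): more confidence in p yields more unambiguous comparisons.
assert lo1 < lo2 and hi2 < hi1
```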
4 Bayesian Updating
Upon observing an event E ∈ Σ, the decision maker should update both the
information set ∆ and her subjective belief p in representation (3.2). To make
the Bayesian rule well-defined for both ∆ and p, assume that lEl′ ≻ l′ for some
lotteries l, l′ ∈ L. Call such events non-null.
Accordingly, say that E is null if l′ ≽ lEl′ for all l, l′ ∈ L. Even if E is null,
the strict preference l′ ≻ lEl′ can still hold for representation (3.2) with p(E) = 0
and maxq∈∆ q(E) > 0. Yet p cannot be updated via the Bayesian rule in this case.
For every q ∈ P such that q(E) > 0, let q|E be the Bayesian update of q
conditional on E, that is,
(q|E)(A) = q(A ∩ E) / q(E)  for all events A ∈ Σ.
Let ≽E be the decision maker’s weak preference over acts in H after she observes the event E. Then she should conclude that the objective probability law
p0 ∈ ∆ satisfies p0 (E) > 0, and p0 |E belongs to the set
∆|E = {q|E : q ∈ ∆, q(E) > 0}.
If this set is empty, then q(E) = 0 for all q ∈ ∆, and E is null by ∆-Monotonicity.
For any f ∈ H, let
L(f, ∆|E) = {l ∈ L : f(q) ≽ l for all q ∈ ∆|E}.
If L(f, ∆|E) ⊃ L(g, ∆|E), say that the act f ∈ H is more secure than g ∈ H on
the event E.
Impose Order, Continuity, ∆|E-Monotonicity,7 and ∆|E-Independence on the
preference ≽E . Under the conditions of Theorem 1, ≽ and ≽E reveal subjective
beliefs p ∈ cl(co ∆) and pE ∈ cl(co ∆|E) respectively, but pE need not equal p|E
or be the result of any other specific updating formula.
To characterize the Bayesian rule for subjective beliefs pE , consider
Axiom 5 (∆-Dynamic Consistency). For all f, g, h ∈ H such that f Eh is less
secure than gEh, but f is more secure than g on E,
f Eh ≽ gEh  ⇒  f ≽E g.
Indeed, conditions (3.3) imply that for all f, g, h ∈ H, if f is more secure than
g on E, then
f (pE ) ≽ g(pE ) ⇒ f ≽E g,
and if f Eh is less secure than gEh, then
f Eh ≽ gEh  ⇒  (f Eh)(p) ≽ (gEh)(p)  ⇒  f(p|E) ≽ g(p|E).
If pE = p|E, then ∆-Dynamic Consistency follows.
7 Note that ∆|E-Monotonicity implies consequentialism: f ∼E f Eh for all acts f, h ∈ H because q(E) = 1 and f(q) = (f Eh)(q) for all q ∈ ∆|E. Thus ≽E can be viewed as a preference over contingent acts that map E into L.
Theorem 4. Suppose that E is a non-null event, and preferences ≽ and ≽E satisfy
Axioms 1–4 with information sets ∆ and ∆|E respectively. Then ∆-Dynamic
Consistency holds if and only if ≽ and ≽E are represented by
U(f) = (1 − ε) u(f(p)) + ε min_{q∈cl(∆)} u(f(q))
UE(f) = (1 − λ) u(f(p|E)) + λ min_{q∈cl(∆|E)} u(f(q))    (4.1)
where ε, λ ∈ [0, 1], u ∈ U , and p ∈ cl(co ∆) is such that p(E) > 0.
Moreover, if ≽ is not extremely cautious, then representations (4.1) are unique
up to a positive linear transformation of u.
In representations (4.1), both the information set ∆ and the subjective belief p
are updated via the Bayesian rule. This rule is imposed exogenously on the information set ∆, but the subjective belief p and its updates are derived endogenously
from the observable preferences ≽ and ≽E .
In general, the parameters λ and ε are not related to each other, which suggests
that the decision maker’s confidence in her subjective belief can change arbitrarily
with the arrival of new information. Unfortunately, Theorems 2 and 3 cannot
be applied directly to compare the parameters λ and ε for preferences ≽1 =≽
and ≽2 =≽E because the corresponding information sets ∆ and ∆|E are distinct.
Moreover, despite the tight mathematical connection, the geometric sizes of these
sets do not have any stable proportion.8 This makes it impossible to compare the confidence parameters by comparing the sets of priors ε∆ + (1 − ε)p and λ(∆|E) + (1 − λ)p|E for general ∆.
Yet one can still use Theorems 2 and 3 via the following construction. Let
S = S1 × S2 , and every state s = (s1 , s2 ) be determined by two objectively
unrelated sources of uncertainty. Suppose that ∆ = ∆1 × ∆2 , so that each q ∈ ∆
is the Fubini product of two measures q1 ∈ ∆1 on S1 and q2 ∈ ∆2 on S2 . Suppose
that there is some ambiguity on S2 so that ∆2 is not a singleton.
Let H2 be the set of all acts measurable with respect to the algebra of events
Σ2 on S2 . Then for any non-null event E = E1 × S2 that is expressed exclusively
in terms of the first component s1 ∈ S1 , the resolution of the event E preserves
any q ∈ ∆ on S2 . Thus one can compare ε and λ by applying Theorem 2 or 3
to the rankings ≽ and ≽E restricted to H2 because the sets ∆|E and ∆ coincide
when restricted to S2 .
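This product construction is easy to check on a small example. In the Python sketch below (the binary sources and the sets ∆1 and ∆2 are hypothetical), conditioning any Fubini product q1 ⊗ q2 on an event E = E1 × S2 leaves its marginal on S2 unchanged, so ∆|E and ∆ coincide when restricted to S2.

```python
import numpy as np

# S = S1 x S2 with S1 = {a1, a2}, S2 = {b1, b2}; states ordered as
# (a1, b1), (a1, b2), (a2, b1), (a2, b2). The scenario sets are hypothetical.
Delta1 = [np.array([0.5, 0.5]), np.array([0.7, 0.3])]        # scenarios on S1
Delta2 = [np.array([0.2, 0.8]), np.array([0.6, 0.4])]        # scenarios on S2
Delta = [np.outer(q1, q2).ravel() for q1 in Delta1 for q2 in Delta2]

E = np.array([1.0, 1.0, 0.0, 0.0])      # E = E1 x S2 with E1 = {a1}

def update(q, E):
    qE = q * E
    return qE / qE.sum()

def marginal_S2(q):
    """Marginal of a measure on S1 x S2 over the second coordinate."""
    return np.array([q[0] + q[2], q[1] + q[3]])

for q in Delta:
    assert np.allclose(marginal_S2(q), marginal_S2(update(q, E)))
print("conditioning on E = E1 x S2 preserves every marginal on S2")
```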
8 For example, take S = {1, 2, 3, 4} and ∆ = {q, q′} such that

q = (α², (1 − α)α, 1 − α, 0)  and  q′ = ((1 − α)α, α², 1 − α, 0)

for some small α > 0. Then ∆|{1, 4} consists of just one point (1, 0, 0, 0), but ∆|{1, 2} consists of two points (α, 1 − α, 0, 0) and (1 − α, α, 0, 0). Thus conditioning on the event E = {1, 4} reduces the geometric size of ∆ to zero, but conditioning on E′ = {1, 2} expands the size of ∆ from an arbitrarily small number √2(α − 2α²) to √2(1 − 2α), which is close to √2 for small α.
5 Discussion

5.1 Non-Bayesian Updating of the Information Sets
My model makes it possible to combine the Bayesian updating of subjective beliefs with non-Bayesian procedures for some information sets ∆ and events E.
Say that the event E is surprising if inf_{q∈∆} q(E) = 0, that is, probability scenarios in ∆ allow E to have an arbitrarily small probability. The name reflects the fact that E occurs with arbitrarily small probability under some scenarios in ∆. It is plausible that the information set ∆ is not updated on the
event E via the prior-by-prior Bayesian rule because some scenarios q ∈ ∆ and
their updates q|E can be viewed as impossible if E occurs.
Let Γ ⊂ P be the exogenous updated information set on the event E. For
example,
Γ = {q|E : q ∈ ∆, q(E) > θ}
for some threshold θ > 0.
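Such a threshold update is simple to implement when ∆ is finite. The sketch below (Python; the scenarios and the threshold θ are illustrative assumptions) discards the scenarios under which E was deemed too unlikely and updates the remaining ones prior by prior.

```python
import numpy as np

def threshold_update(Delta, E, theta):
    """Gamma = {q|E : q in Delta, q(E) > theta}: discard the scenarios under
    which E was too unlikely, and update the remaining ones prior by prior."""
    Gamma = []
    for q in Delta:
        qE = q @ E
        if qE > theta:
            Gamma.append((q * E) / qE)
    return np.array(Gamma)

# A small information set in which some scenarios make E very unlikely.
Delta = np.array([[0.01, 0.09, 0.90],
                  [0.30, 0.30, 0.40],
                  [0.60, 0.20, 0.20]])
E = np.array([1.0, 1.0, 0.0])            # E = the first two states
print(threshold_update(Delta, E, theta=0.2))
# Only the last two scenarios survive; each is renormalized on E.
```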
Proposition 5. Suppose that E is a non-null surprising event, and preferences ≽
and ≽E satisfy Axioms 1–4 with information sets ∆ and its update Γ respectively.
Then ∆-Dynamic Consistency holds if and only if ≽ and ≽E are represented by
U(f) = (1 − ε) u(f(p)) + ε min_{q∈cl(∆)} u(f(q))
UE(f) = (1 − λ) u(f(p|E)) + λ min_{q∈cl(Γ)} u(f(q))    (5.1)
where ε, λ ∈ [0, 1], u ∈ U , p ∈ cl(co ∆) is such that p(E) > 0, and p|E ∈ cl(co Γ).
Moreover, if ≽ and ≽E are not extremely cautious, then representations (5.1)
are unique up to a positive linear transformation of u.
Representation (5.1) combines the Bayesian update p|E of the subjective belief
p with an arbitrary exogenous updated information set Γ such that p|E ∈ cl(co Γ).
If this inclusion does not hold, then ∆-Dynamic Consistency should be violated as
well. However, the freedom in choosing the update Γ may be an embarrassment
of riches. Obtaining some endogenous structure for Γ is another possible research
problem.
5.2 Rectangular Extensions
The ε-contamination structure together with the Bayesian updating of p can generate parametric MEU representations with a rectangular set of priors. Given any
partition of S into non-null events π = {E1 , . . . , En }, a set Π is called rectangular
if

Π = ∪_{q,q1,...,qn∈Π} { Σ_{i=1}^{n} q(Ei)(qi|Ei) }.
Given the partition π, my results can be used to obtain representations
U(f) = (1 − ε) ∫ u(f(s)) dp + ε min_{q∈cl(co ∆)} ∫ u(f(s)) dq  for all π-measurable f,

UEi(g) = (1 − λi) ∫ u(g(s)) d(p|Ei) + λi min_{q∈cl(co ∆|Ei)} ∫ u(g(s)) dq.
For any act f ∈ H, let fˆ be such that fˆ(s) = li for s ∈ Ei, where UEi(li) = UEi(f). By Dynamic Consistency (Axiom 4 in Epstein and Schneider [14]), for any f ∈ H,

U(f) = U(fˆ) = (1 − ε) ∫ u(fˆ(s)) dp + ε min_{q∈cl(co ∆)} ∫ u(fˆ(s)) dq.
This is an MEU representation with the set of priors

Π = ∪_{q,q1,...,qn∈cl(co ∆)} { Σ_{i=1}^{n} ((1 − ε)p(Ei) + εq(Ei)) [(1 − λi)(p|Ei) + λi(qi|Ei)] }    (5.2)

written entirely in terms of the subjective probability belief p, the information set ∆, and the parameters ε and λ1, . . . , λn. By construction, p ∈ Π. If ε = λi for all i, then (5.2) becomes a one-parameter model.
For example, consider the Ellsberg three-color setting where S = {r, b, g},

∆ = {(pr, pb, pg) ∈ R3+ : pr = 1/3, pg + pb = 2/3},

E = {r, b}, p = (1/3, 1/3, 1/3), and ε = λ. The corresponding rectangular set Π is the convex hull of the distributions

q1 = (5/6)(5/12, 7/12, 0) + (1/6)(0, 0, 1) = (1/72)(25, 35, 12)
q2 = (5/6)(3/4, 1/4, 0) + (1/6)(0, 0, 1) = (1/72)(45, 15, 12)
q3 = (1/3)(5/12, 7/12, 0) + (2/3)(0, 0, 1) = (1/72)(10, 14, 48)
q4 = (1/3)(3/4, 1/4, 0) + (2/3)(0, 0, 1) = (1/72)(18, 6, 48).
5.3 Elicitation of Beliefs and Parameter ε
In general, one can elicit p and ε from the preferences ≽ as follows.
Take any payoffs x ≻ y, such as x = $100 and y = $0. For any γ ∈ [0, 1], let
lγ be a lottery that delivers x and y with probabilities γ and 1 − γ respectively.
For any event A ∈ Σ, let
π(A) = max{γ ∈ [0, 1] : xAy ≽ lγ} = min_{q∈Π} q(A)
π∗(A) = max{γ ∈ [0, 1] : lγ ∈ L(xAy, ∆)} = min_{q∈cl(∆)} q(A).
Here π(A) measures the decision maker’s willingness to bet on the event A and
π∗ (A) specifies her security level for this bet.
By ∆-Monotonicity, π∗ (A) ≤ π(A). Consider three possible cases.
(i) For all A ∈ Σ, π(A) + π(Ac) = 1. Then ≽ is represented by expected utility with p = π, and ε ∈ [0, 1] is arbitrary.

(ii) There is A ∈ Σ such that π∗(A) + π∗(Ac) = π(A) + π(Ac) < 1. Take ε = 1 and arbitrary p ∈ ∆. In this case, ≽ is extremely cautious.

(iii) There is A ∈ Σ such that π∗(A) + π∗(Ac) < π(A) + π(Ac) < 1. By (3.5),

π(A) = (1 − ε)p(A) + επ∗(A)
π(Ac) = (1 − ε)(1 − p(A)) + επ∗(Ac).

Summing the two equations gives

ε = (1 − π(A) − π(Ac)) / (1 − π∗(A) − π∗(Ac)).

For all events B ∈ Σ, take

p(B) = (π(B) − επ∗(B)) / (1 − ε).

In this case, ≽ is not extremely cautious, and both ε < 1 and p are determined uniquely.
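The elicitation in case (iii) can be checked numerically. In the Python sketch below, the betting probabilities π and π∗ are generated from an assumed parameterization (the three-color urn with p = (1/3, 1/3, 1/3) and ε = 1/2), and the two formulas above recover ε and p(A) for the ambiguous event A = {g}.

```python
import numpy as np

# Generate betting probabilities from an assumed parameterization: the
# three-color urn with p = (1/3, 1/3, 1/3), eps = 1/2, and ambiguous A = {g}.
grid = np.linspace(0.0, 2.0 / 3.0, 2001)
Delta = np.array([[1.0 / 3.0, pb, 2.0 / 3.0 - pb] for pb in grid])
p_true = np.array([1.0, 1.0, 1.0]) / 3.0
eps_true = 0.5
Pi = (1 - eps_true) * p_true + eps_true * Delta

A = np.array([0.0, 0.0, 1.0])            # indicator of A = {g}
Ac = 1.0 - A

def pi(B):
    return np.min(Pi @ B)                # willingness to bet on B

def pi_star(B):
    return np.min(Delta @ B)             # security level of the bet on B

# Case (iii): pi_star(A) + pi_star(Ac) < pi(A) + pi(Ac) < 1.
eps = (1 - pi(A) - pi(Ac)) / (1 - pi_star(A) - pi_star(Ac))
p_A = (pi(A) - eps * pi_star(A)) / (1 - eps)

print(eps, p_A)                          # recovers eps = 1/2 and p(A) = 1/3
assert abs(eps - eps_true) < 1e-9 and abs(p_A - p_true @ A) < 1e-9
```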
5.4 Other Extensions
The above representation results can be flipped to characterize a utility representation
U(f) = (1 − γ) u(f(p)) + γ max_{q∈cl(co ∆)} u(f(q)),
where γ ∈ [0, 1], p ∈ cl(co ∆), and u ∈ U . Technically, one can apply Theorem 1
directly to the binary relation ≽′ =≼. Alternatively, let
O(f, ∆) = {l ∈ L : l ≽ f (q) for all q ∈ ∆}.
Rewrite ∆-Monotonicity as l ≽ f for all l ∈ O(f, ∆). To flip ∆-Independence,
impose invariance (3.1) only for acts f, g, h and lotteries l such that O(αf + (1 −
α)h, ∆) ⊂ O(αf + (1 − α)l, ∆) and O(αg + (1 − α)h, ∆) ⊂ O(αg + (1 − α)l, ∆).
Theorems 2, 3 and 4 can be flipped analogously.
To accommodate a mixture of both ambiguity aversion and ambiguity seeking,
one can use a utility representation
U(f) = (1 − ε − γ) u(f(p)) + ε min_{q∈cl(co ∆)} u(f(q)) + γ max_{q∈cl(co ∆)} u(f(q)),
where ε, γ ∈ [0, 1] are such that ε + γ ≤ 1. A special case of this representation
for ∆ = P has been characterized by Chateauneuf, Eichberger, and Grant [7] in
their model of neo-additive capacities. I study this model for an arbitrary ∆ in a
companion paper.
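Evaluating this mixed representation is as simple as evaluating (1.1). The sketch below (Python; the weights ε, γ and the bet are hypothetical) adds a best-case term to the computation used earlier for the three-color urn.

```python
import numpy as np

grid = np.linspace(0.0, 2.0 / 3.0, 201)
Delta = np.array([[1.0 / 3.0, pb, 2.0 / 3.0 - pb] for pb in grid])
p = np.array([1.0, 1.0, 1.0]) / 3.0

def U(uf, eps, gamma):
    """Mix the subjective expectation with the worst and best scenarios in the
    information set (requires eps + gamma <= 1)."""
    assert eps >= 0 and gamma >= 0 and eps + gamma <= 1
    return ((1 - eps - gamma) * (p @ uf)
            + eps * np.min(Delta @ uf)
            + gamma * np.max(Delta @ uf))

uf = np.array([0.0, 0.0, 1.0])        # a bet on the ambiguous color g
print(U(uf, eps=0.5, gamma=0.0))      # pure ambiguity aversion: 1/6
print(U(uf, eps=0.3, gamma=0.2))      # some ambiguity seeking: approximately 0.3
```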
A APPENDIX: PROOFS
Proof of Theorem 1.
Fix a non-empty set ∆ ⊂ P. Suppose that ≽ satisfies Order, Continuity, ∆-Monotonicity, and ∆-Independence. By Herstein–Milnor’s Theorem, the ranking
of lotteries has an expected utility representation u ∈ U that is unique up to a
positive linear transformation. If u is constant, then (3.2) is trivial. Hereafter,
assume that u is non-constant. Without loss of generality, the range of u contains
the interval I = [−1, 1]. For any act f ∈ H, let u(f ) ∈ RS be the composition of
u and f . Then for any a ∈ I S , there exists f such that a = u(f ).
For any f ∈ H, let
M(f) = inf_{q∈∆} u(f(q)) = min_{q∈cl(co ∆)} u(f(q)).

Note that l ∈ L(f, ∆) if and only if M(f) ≥ u(l). Thus f is more secure than g if
and only if M (f ) ≥ M (g).
Lemma 6. For all α ∈ [0, 1] and f, g, h ∈ H, SP (α, f, h) ≥ SP (α, g, h) if and
only if there is l ∈ L such that αf + (1 − α)h is more secure than αf + (1 − α)l
and αg + (1 − α)h is less secure than αg + (1 − α)l.
Proof. For all α ∈ [0, 1], f, g, h ∈ H, and l ∈ L,
M (αf + (1 − α)l) = αM (f ) + (1 − α)u(l)
M (αg + (1 − α)l) = αM (g) + (1 − α)u(l).
By definition of security premia,
SP(α, f, h) − SP(α, g, h) = [M(αf + (1 − α)h) − M(αf + (1 − α)l)] + [M(αg + (1 − α)l) − M(αg + (1 − α)h)].
Therefore, if αf +(1−α)h is more secure than αf +(1−α)l and αg+(1−α)h is less
secure than αg + (1 − α)l, then SP (α, f, h) ≥ SP (α, g, h). Conversely, suppose
that SP (α, f, h) ≥ SP (α, g, h). Take l ∈ L such that M (αf + (1 − α)h) =
M (αf + (1 − α)l). Then αf + (1 − α)h is more secure than αf + (1 − α)l and
M (αg + (1 − α)l) − M (αg + (1 − α)h) = SP (α, f, h) − SP (α, g, h) ≥ 0,
that is, αg + (1 − α)h is less secure than αg + (1 − α)l.
Assume wlog that ∆ is convex and closed. Otherwise, replace ∆ by its closed
convex hull. The function M and all the proofs below are unaffected.
The preference ≽ satisfies all conditions in Theorem 1 in Gilboa-Schmeidler
[23]. In particular, for all α ∈ [0, 1] and f, g ∈ H,
SP (α, f, g) ≥ 0 = SP (α, g, g)
because M is concave. By Lemma 6 and ∆-Independence, ≽ satisfies Uncertainty Aversion. Thus, there is a unique convex and closed set Π ⊂ P such that ≽ is represented by
U(f) = min_{q∈Π} u(f(q)).    (A.1)
Assume that S and X are finite, and Σ = 2S (the general case is treated
separately). For any a ∈ RS , let
V(a) = min_{q∈∆} q · a  and  W(a) = min_{q∈Π} q · a.    (A.2)
For any γ ∈ R, let ⃗γ = (γ, . . . , γ) ∈ RS . Then the functions V, W : RS → R are
continuous, concave, and satisfy
V(αa + ⃗γ) = αV(a) + γ  and  W(αa + ⃗γ) = αW(a) + γ    (A.3)
for all vectors a ∈ RS and scalars α ≥ 0, γ ∈ R.
Next, I claim that for all a ∈ RS ,
W(a) ≥ V(a).    (A.4)
By (A.3), it is enough to show this claim for a ∈ I S . Suppose that W (a) < V (a)
for some a ∈ I S . Then a = u(f ) for some f ∈ H, and there is l ∈ L such that
M (f ) = V (a) > u(l) > W (a) = U (f ). Thus l ∈ L(f, ∆), but l ≻ f , which
contradicts ∆-Monotonicity.
Let D be the set of all points a ∈ RS where the functions V and W are both
differentiable. For every a ∈ D, let
v(a) = ∇V (a) and w(a) = ∇W (a).
Take any qa ∈ ∆ such that V (a) = qa · a. Then qa = v(a) because for all b ∈ RS
and δ ∈ R,
V(a) + δ(qa·b) = qa·(a + δb) ≥ min_{q∈∆} q·(a + δb) = V(a + δb) = V(a) + δ(v(a)·b) + o(δ),
and hence, qa · b = v(a) · b. Therefore, the vector v(a) ∈ ∆ is the unique minimizer
in (A.2): for all q ∈ ∆ such that q ̸= v(a),
V(a) = v(a)·a < q·a.    (A.5)
Similarly, the vector w(a) ∈ Π is the unique minimizer in (A.2): for all q ∈ Π such
that q ̸= w(a),
W(a) = w(a)·a < q·a.    (A.6)
It follows from (A.5) and (A.6) that v(a) and w(a) are extreme points in ∆ and
Π respectively.
Lemma 7. For any a, b ∈ D, there exists ε ≥ 0 such that
w(a) − w(b) = ε(v(a) − v(b)).    (A.7)
Proof. I claim that for all a, b, c ∈ RS such that V (a + c) ≥ V (a) and V (b + c) ≤
V (b),
W(a) ≥ W(b)  ⇒  W(a + c) ≥ W(b + c).    (A.8)
By (A.3), it is sufficient to show this claim for vectors a, b, c ∈ I S . Take acts
f, g, h ∈ H such that u(f ) = a, u(g) = b, and u(h) = c. Take a lottery l ∈ L
such that u(l) = 0. Then the inequalities W (a) ≥ W (b), V (a + c) ≥ V (a) and
V (b + c) ≤ V (b) imply respectively that f ≽ g,
M((f + h)/2) = V(u((f + h)/2)) = V((a + c)/2) ≥ V(a/2) = V(u((f + l)/2)) = M((f + l)/2)
M((g + l)/2) = V(u((g + l)/2)) = V(b/2) ≥ V((b + c)/2) = V(u((g + h)/2)) = M((g + h)/2).

By Lemma 6, (f + h)/2 is more secure than (f + l)/2, but (g + h)/2 is less secure than (g + l)/2. By ∆-Independence, (f + h)/2 ≽ (g + h)/2. Therefore

W(u((f + h)/2)) ≥ W(u((g + h)/2)),
and by (A.3), W (a + c) ≥ W (b + c).
Turn to (A.7). Fix any a, b ∈ D. The derivatives of the functions W and V ,
and hence the equality (A.7), are unaffected if the vectors a and b are replaced
by a − V (a)1∗ and b − (W (b) + V (a) − W (a))1∗ respectively. Wlog assume that
W (a) = W (b) and V (a) = 0.
By the separation theorem, the convex hull of the vectors v(a), −v(b) and
w(b) − w(a) either contains 0, or can be separated from 0 by a hyperplane. Therefore, one of the following two cases must hold.
Case 1. There are λ1 , λ2 , λ3 ≥ 0 such that λ1 + λ2 + λ3 = 1 and
λ1 v(a) − λ2 v(b) + λ3 (w(b) − w(a)) = 0.
Then λ1 = λ2 because v(a)·1∗ = v(b)·1∗ = w(a)·1∗ = w(b)·1∗ = 1. If λ3 ̸= 0, then
(A.7) holds for ε = λ1/λ3. Suppose that λ3 = 0. Then λ1 = λ2 ̸= 0 and v(a) = v(b).
Recall that V (a) = 0. Then for any δ > 0, V (a + δa) = V (a) and
V(b + δa) = min_{q∈∆} q·(b + δa) ≤ v(b)·(b + δa) = V(b)
because v(b) · b = V (b) and v(b) · a = v(a) · a = V (a) = 0. Let c = δa. Then by
(A.8), W (a + δa) ≥ W (b + δa), that is,
W (a) + δ(w(a) · a) + o(δ) ≥ W (b) + δ(w(b) · a) + o(δ).
Thus, w(a) · a ≥ w(b) · a. By (A.6), w(a) = w(b). The equality (A.7) holds then
for any ε ≥ 0.
Case 2. x · v(a) > 0 > x · v(b) and x · (w(b) − w(a)) > 0 for some x ∈ RS . Take
a sufficiently small δ > 0 and c = δx such that
V (a + c) = V (a) + δ(x · v(a)) + o(δ) > V (a)
V (b + c) = V (b) + δ(x · v(b)) + o(δ) < V (b)
W (a + c) − W (b + c) = δ(x · w(a)) − δ(x · w(b)) + o(δ) < 0.
This is a contradiction with (A.8).
If W = V , then ≽ is extremely cautious, Π = ∆, and the utility representation
(3.2) holds for ε = 1 and any p ∈ ∆.
Lemma 8. If W ̸= V , then there are unique 0 ≤ ε < 1 and p ∈ ∆ such that
W(a) = εV(a) + (1 − ε)p·a    (A.9)
for all a ∈ RS . Moreover, p is the only probability measure in P such that for all
f, g ∈ H, if f is more secure than g and f (p) ≻ g(p), then f ≻ g.
Proof. Suppose that W ̸= V . If v(a) = p is constant for all a ∈ D, then V (a) = p·a
for all a ∈ D, and by continuity, for all a ∈ RS . Then the inequality
min_{q∈Π} q·a = W(a) ≥ p·a  for all a ∈ RS
implies that Π = {p}, which contradicts W ̸= V .
Thus, v is not constant on D, and there are b, c ∈ D such that v(b) ̸= v(c). By
Lemma 7, there is ε ≥ 0 such that
w(b) − w(c) = ε(v(b) − v(c)).    (A.10)
Take any a ∈ D. I claim that
w(a) = εv(a) + p̂,    (A.11)
where p̂ = w(b) − εv(b) = w(c) − εv(c). To show this claim, let
B = {w(b) + γ(v(a) − v(b)) : γ ≥ 0}
C = {w(c) + γ(v(a) − v(c)) : γ ≥ 0}.
If v(a) = v(b) or v(a) = v(c), then B or, respectively, C is a singleton. If
v(a) ̸= v(b) and v(a) ̸= v(c), then B and C are rays in RS . Moreover, the
directions of these rays, v(a) − v(b) and v(a) − v(c) respectively, are linearly
independent because v(a), v(b), v(c) are distinct extreme points in ∆. Therefore,
the rays B and C have at most one point in common. However, εv(a) + p̂ ∈ B ∩ C
for γ = ε, and by Lemma 7, w(a) ∈ B ∩ C. It follows that w(a) = εv(a) + p̂.
By (A.5), (A.6), and (A.11),
W (a) = w(a) · a = εv(a) · a + p̂ · a = εV (a) + p̂ · a
for all a ∈ D. Rockafellar [38, Theorem 25.5] shows that the complement of D has
measure zero, and hence, D is dense in RS . By continuity, for all a ∈ RS ,
W(a) = εV(a) + p̂·a.    (A.12)

By (A.4),

p̂·a ≥ (1 − ε)V(a).    (A.13)
Show that ε < 1 and p̂ = (1 − ε)p for some p ∈ ∆. Consider three cases.
(i) ε > 1. Recall that there exist two distinct points v(b), v(c) ∈ ∆. Let a = v(b) − v(c). Then V(a) + V(−a) < 0 because

V(a) ≤ v(c)·a < v(b)·a
V(−a) ≤ −v(b)·a < −v(c)·a.

On the other hand, by (A.13), V(a) + V(−a) ≥ (p̂/(1 − ε))·a + (p̂/(1 − ε))·(−a) = 0, which is a contradiction.
(ii) ε = 1. By (A.13), W = V , which contradicts W ̸= V .
(iii) ε < 1. Let p = p̂/(1 − ε). By (A.13), p·a ≥ V(a) = min_{q∈∆} q·a for all a ∈ RS. As ∆ is convex and closed, p ∈ ∆ by the separating hyperplane argument.
Turn to the uniqueness part. The parameter 0 ≤ ε < 1 is uniquely determined by (A.10), and p = p̂/(1 − ε) is unique.
Moreover, for any p′ ∈ P such that p′ ̸= p, there are acts f, g ∈ H such that
p′·u(f) > p′·u(g),  p·u(f) ≤ p·u(g),  and V(u(f)) = V(u(g)).    (A.14)
To construct such f and g, take an event A ⊂ S such that p′ (A) > p(A). Let
π∗(A) = min_{q∈∆} q(A) and π∗(Ac) = min_{q∈∆} q(Ac). Take vectors a, b ∈ RS such that

as = 1 − π∗(A) if s ∈ A,  and  as = −π∗(A) if s ∈ Ac;
bs = −π∗(Ac) if s ∈ A,  and  bs = 1 − π∗(Ac) if s ∈ Ac.
By construction, p′·a > p·a, p·b > p′·b, p·a ≥ V(a) = 0, and p·b ≥ V(b) = 0.
If p · a = p · b, then take f, g ∈ H such that u(f ) = a and u(g) = b. If p · a ̸= p · b,
then take f, g ∈ H such that u(f ) = (p · b)a and u(g) = (p · a)b.
It follows from (A.14) that for any p′ ∈ P such that p′ ̸= p, there are f, g ∈ H
such that f (p′ ) ≻ g(p′ ), f is more secure than g, but still g ≽ f . The proof of
Lemma 8 is complete.
Lemma 8 delivers the required utility representation (3.2) for the preference ≽.
Moreover, it implies that if ≽ is not extremely cautious, then this representation
is unique up to a positive linear transformation of u, and p is the only probability
measure that satisfies the condition (3.3).
Extend the construction of the utility representation (3.2) for an arbitrary
state space S with an algebra of events Σ. For any A ∈ Σ, let
π(A) = min_{q∈Π} q(A)  and  π∗(A) = min_{q∈∆} q(A).
Consider two cases.
Case 1. π(A) = π∗ (A) for all events A ∈ Σ. Let ε = 1 and take any p ∈ ∆.
Case 2. π(A) > π∗(A) for some A ∈ Σ. Define ε < 1 by

ε = (1 − π(A) − π(Ac)) / (1 − π∗(A) − π∗(Ac)).

For all events B ∈ Σ, take

p(B) = (π(B) − επ∗(B)) / (1 − ε).
In both cases, for any finite subalgebra Σ′ ⊂ Σ such that A ∈ Σ′ , the finite
case of Theorem 1 implies that p is finitely additive on Σ′ , and the preference ≽
restricted to Σ′ measurable acts is represented by (3.2) with the triple (u, ε, p).
Thus, p is finitely additive on all of Σ, and ≽ is represented by (3.2) with the
triple (u, ε, p) on all of H. The inclusion p ∈ cl(co ∆) follows from the fact that p
is the limit of a net pΣ′ ∈ cl(co ∆) in the weak* closed set P.
Turn to necessity. Suppose that ≽ is represented by
U (f ) = (1 − ε)u(f (p)) + εM (f )
for some ε ∈ [0, 1], p ∈ cl(co ∆), u ∈ U. Order and Continuity are obvious. ∆-Monotonicity holds because l ∈ L(f, ∆) implies u(l) ≤ M(f) ≤ u(f(p)) and hence u(l) ≤ U(f). To verify ∆-Independence, take α ∈ [0, 1], acts f, g, h ∈ H, and a
lottery l ∈ L such that f ≽ g, αf + (1 − α)h is more secure than αf + (1 − α)l,
but αg + (1 − α)h is less secure than αg + (1 − α)l. Then
a1 = U (αf + (1 − α)l) − U (αg + (1 − α)l) ≥ 0
a2 = M (αf + (1 − α)h) − M (αf + (1 − α)l) ≥ 0
a3 = M (αg + (1 − α)l) − M (αg + (1 − α)h) ≥ 0
U (αf + (1 − α)h) − U (αg + (1 − α)h) = a1 + ε(a2 + a3 ) ≥ 0
and hence, αf + (1 − α)h ≽ αg + (1 − α)h.
Proofs of Theorems 2 and 3
Let preferences ≽1 and ≽2 have utility representations (3.2) with tuples (ε1, p1, u1)
and (ε2 , p2 , u2 ). Let Π1 = ε1 ∆ + (1 − ε1 ){p1 } and Π2 = ε2 ∆ + (1 − ε2 ){p2 }. GMM
[19] show that for all f, g ∈ H,
f ≽∗1 g  ⇔  f(q) ≽1 g(q) for all q ∈ Π1
f ≽∗2 g  ⇔  f(q) ≽2 g(q) for all q ∈ Π2.    (A.15)
Each of the conditions of Theorem 2 is equivalent to the inclusion Π2 ⊂ Π1. For (i), it is established by GMM [19], for (ii) by Ghirardato and Marinacci [20], and for (iii), it is shown directly via

p2 ∈ ((ε1 − ε2)/(1 − ε2)) cl(co ∆) + ((1 − ε1)/(1 − ε2)) {p1}  ⇔  (1 − ε2){p2} + ε2 cl(co ∆) ⊂ (1 − ε1){p1} + ε1 cl(co ∆).
The equivalence of risk attitudes u1 and u2 is implied immediately by (i), (ii), or
(iii) as well.
Turn to Theorem 3. Suppose that ∆ is not an interval, ε1 > ε2 , and u1 = u2 =
u. Assume wlog that the range of u includes [−2, 2]. For all f ∈ H, let
M(f) = min_{q∈cl(co ∆)} u(f(q))  and  M∗(f) = max_{q∈cl(co ∆)} u(f(q)).
By (A.15), for all f ∈ H,
R1(f) = {l : u(l) ∉ [ε1M(f) + (1 − ε1)u(f(p1)), ε1M∗(f) + (1 − ε1)u(f(p1))]}
R2(f) = {l : u(l) ∉ [ε2M(f) + (1 − ε2)u(f(p2)), ε2M∗(f) + (1 − ε2)u(f(p2))]}.    (A.16)

The strict inclusion R1(f) ⊊ R2(f) implies that ε1[M∗(f) − M(f)] > ε2[M∗(f) − M(f)], that is, ε1 > ε2. Consider two cases.
Case 1. p1 = p2 = p. Take A ∈ Σ such that

π^∗(A) = max_{q∈cl(co ∆)} q(A) > min_{q∈cl(co ∆)} q(A) = π∗(A).

Take f ∈ H such that u(f(s)) = 1 if s ∈ A and u(f(s)) = 0 if s ∉ A. As ε1 > ε2 and p(A) ∈ [π∗(A), π^∗(A)], then by (A.16), R1(f) is a strict subset of R2(f).
Case 2. p1 ̸= p2. Take q ∈ ∆ such that q, p1, p2 are linearly independent. Assume that S is finite. Take a ∈ RS such that

a = (q − p2) − [((q − p2)·(p1 − p2)) / ((p1 − p2)·(p1 − p2))] (p1 − p2).

Then a ̸= 0 and a·(p1 − p2) = 0. By construction, |a(s)| ≤ 2 for all s ∈ S. Take f ∈ H such that a = u(f). Then u(f(p1)) = u(f(p2)) ̸= u(f(q)). It follows that M∗(f) > M(f). By (A.16), R1(f) is a strict subset of R2(f).
Proofs of Theorem 4 and Proposition 5
Suppose that E is a non-null event, preferences ≽ and ≽E satisfy Axioms 1–4 with
information sets ∆ and ∆|E respectively, and ∆-Dynamic Consistency holds.
By Theorem 1, ≽ and ≽E have representations (3.2) with components (ε, p, u)
and (λ, pE , u) where p ∈ cl(co ∆) and pE ∈ cl(co ∆|E) respectively. If ≽E is
extremely cautious, then pE is arbitrary and can be taken equal to p|E. Suppose
that λ < 1.
Assume that pE ̸= p|E. I claim that there are acts f, g ∈ H such that
pE·u(f) < pE·u(g),  (p|E)·u(f) > (p|E)·u(g),  and VE(u(f)) = VE(u(g)).    (A.17)
To construct such f and g, take an event A ⊂ S such that pE (A) > (p|E)(A).
Let π∗(A) = min_{q∈∆|E} q(A) and π∗(Ac) = min_{q∈∆|E} q(Ac). Take vectors a, b ∈ RS such that

as = 1 − π∗(A) if s ∈ A,  and  as = −π∗(A) if s ∉ A;
bs = −π∗(Ac) if s ∈ A,  and  bs = 1 − π∗(Ac) if s ∉ A.

By construction, pE·a > (p|E)·a, (p|E)·b > pE·b, pE·a ≥ VE(a) = 0, and pE·b ≥ VE(b) = 0. If pE·a = pE·b, then take f, g ∈ H such that u(g) = a and u(f) = b (the range of u is assumed to contain [−1, 1]). If pE·a ̸= pE·b, then take f, g ∈ H such that u(g) = (pE·b)a and u(f) = (pE·a)b.
Take l ∈ L such that u(l) = VE (u(f )) = VE (u(g)). Then for all q ∈ cl(co ∆),
u(f El(q)) = q(E)u(f (q|E)) + (1 − q(E))u(l) ≥ q(E)VE (f ) + (1 − q(E))u(l) = u(l).
As u(f (q|E)) can be arbitrarily close to VE (f ),
M(f El) = inf_{q∈∆} u(f El(q)) = u(l).
Similarly, M (gEl) = u(l).
It follows from (A.17) and (3.2) that f El is less secure than gEl, f is more
secure than g on E, f El ≽ gEl because (p|E) · u(f ) > (p|E) · u(g), but g ≻E f
because pE · u(f ) < pE · u(g) and λ < 1.
This is a contradiction with ∆-Dynamic Consistency. Thus pE = p|E.
To prove Proposition 5, proceed analogously, but take l such that u(l) ≤ min_{s∈S} u(f(s)) and u(l) ≤ min_{s∈S} u(g(s)). As E is surprising, inf_{q∈∆} q(E) = 0 and hence M(f El) = M(gEl) = u(l). Therefore, pE ̸= p|E contradicts ∆-Dynamic Consistency as well.
References
[1] F. Anscombe and R. Aumann. A definition of subjective probability. Annals
of Mathematical Statistics, 34:199–205, 1963.
[2] S. W. Becker and O. Brownson. What price ambiguity? or the role of ambiguity in decision making. The Journal of Political Economy, 72:62–73, 1964.
[3] J. Berger. An overview of robust Bayesian analysis. Test, 3:5–124, 1994.
[4] J. Berger and M. Berliner. Robust Bayes and empirical analysis with epsilon
contaminated priors. The Annals of Statistics, 14:461–486, 1986.
[5] C. Camerer and M. Weber. Recent developments in modeling preferences.
Journal of Risk and Uncertainty, 5:325–370, 1992.
[6] G. Carlier, R. Dana, and N. Shahdi. Efficient insurance contracts under
epsilon contaminated utilities. The Geneva Papers on Risk and Insurance
Theory, 28:59–71, 2003.
[7] A. Chateauneuf, J. Eichberger, and S. Grant. Choice under uncertainty with
the best and worst in mind: Neo-additive capacities. Journal of Economic
Theory, 137:538–567, 2007.
[8] S. H. Chew, B. Miao, and S. Zhong. Partial ambiguity. Mimeo, The National
University of Singapore, 2012.
[9] S. P. Curley and J. Yates. The center and range of the probability interval as
factors affecting ambiguity preference. Organizational Behavior and Human
Decision Processes, 36:273–287, 1985.
[10] B. de Finetti. La prévision: ses lois logiques, ses sources subjectives. Annales de l'Institut Henri Poincaré, 7:1–68, 1937. Translated and reprinted
in Kyburg and Smokler, 1964.
[11] J. Eichberger and D. Kelsey. E-capacities and the Ellsberg Paradox. Theory
and Decision, 46:107–140, 1999.
[12] D. Ellsberg. Risk, ambiguity, and the Savage axioms. Quarterly Journal of
Economics, 75:643–669, 1961.
[13] L. Epstein. A definition of uncertainty aversion. Review of Economic Studies,
66:579–608, 1999.
[14] L. Epstein and M. Schneider. Recursive multiple-priors. Journal of Economic
Theory, 61:1–31, 2003.
[15] L. Epstein and T. Wang. Intertemporal asset pricing and Knightian uncertainty. Econometrica, 62:283–382, 1994.
[16] L. Epstein and J. Zhang. Subjective probabilities on subjectively unambiguous events. Econometrica, 69:265–306, 2001.
[17] T. Gajdos, T. Hayashi, J.-M. Tallon, and J.-C. Vergnaud. Attitude towards
imprecise information. Journal of Economic Theory, 140:27–65, 2008.
[18] P. Ghirardato. Revisiting Savage in a conditional world. Economic Theory,
20:83–92, 2002.
[19] P. Ghirardato, F. Maccheroni, and M. Marinacci. Differentiating ambiguity
and ambiguity attitude. Journal of Economic Theory, 118:133–173, 2004.
[20] P. Ghirardato and M. Marinacci. Ambiguity made precise: A comparative
foundation. Journal of Economic Theory, 102:251–289, 2002.
[21] I. Gilboa. A combination of expected utility and maxmin decision criteria.
Journal of Mathematical Psychology, 32:405–420, 1988.
[22] I. Gilboa, F. Maccheroni, M. Marinacci, and D. Schmeidler. Objective and
subjective rationality in a multiple priors model. 2008.
[23] I. Gilboa and D. Schmeidler. Maxmin expected utility with non-unique prior.
Journal of Mathematical Economics, 18:141–153, 1989.
[24] Y. Halevy and V. Feltkamp. A Bayesian approach to uncertainty aversion.
Econometrica, 69:265–306, 2001.
[25] E. Hanany and P. Klibanoff. Updating preferences with multiple priors. Theoretical Economics, 2:261–298, 2007.
[26] J. L. Hodges and E. L. Lehmann. The use of previous experience in reaching
statistical decisions. Annals of Mathematical Statistics, 23:396–407, 1952.
[27] J.-Y. Jaffray. Choice under risk and the security factor: an axiomatic model.
Theory and Decision, 24:169–200, 1988.
[28] P. Klibanoff, M. Marinacci, and S. Mukerji. A smooth model of decision
making under ambiguity. Econometrica, 73:1849–1892, 2005.
[29] I. Kopylov. Choice deferral and ambiguity aversion. Theoretical Economics,
4:199–225, 2009.
[30] I. Kopylov. Simple axioms for countably additive subjective probabilities.
Journal of Mathematical Economics, 46:867–876, 2010.
[31] D. M. Kreps. Notes on the Theory of Choice. Westview Press, Boulder and
London, 1988.
[32] M. Machina and D. Schmeidler. A more robust definition of subjective probability. Econometrica, 60:745–780, 1992.
[33] E. Moreno and J. Cano. Robust Bayesian analysis with epsilon contaminations partially known. Journal of the Royal Statistical Society, Series B,
53:143–155, 1991.
[34] K. Nishimura and H. Ozaki. Search and Knightian uncertainty. Journal of
Economic Theory, 119:299–333, 2004.
[35] K. Nishimura and H. Ozaki. An axiomatic approach to epsilon contamination.
Economic Theory, 27:333–340, 2006.
[36] J. Oechssler and A. Roomets. A test of mechanical ambiguity. Discussion
Paper 555, University of Heidelberg, 2014.
[37] F. Ramsey. Truth and probability. In The Foundations of Mathematics
and Other Logical Essays. Harcourt, Brace and Co., New York, 1931. First
published in 1926.
[38] R. T. Rockafellar. Convex Analysis. Princeton University Press, Princeton,
1970.
[39] L. J. Savage. The Foundations of Statistics. Dover Publications Inc., New
York, second revised edition, 1972.