CAN BAYES' RULE BE JUSTIFIED BY
COGNITIVE RATIONALITY PRINCIPLES?
by Bernard WALLISER (CERAS-ENPC and CREA -Ecole Polytechnique)
and Denis ZWIRN (CREA -Ecole Polytechnique)
August 2002
Published in ‘Theory and Decision’, 53, 2002
Summary: The justification of Bayes' rule by cognitive rationality principles is undertaken by
extending the propositional axiom systems usually proposed in two contexts of belief change:
revising and updating. Probabilistic belief change axioms are introduced, either by direct
transcription of the set-theoretic ones, or in a stronger way, nevertheless in the spirit of the
underlying propositional principles. Weak revising axioms are shown to be satisfied by a
General Conditioning rule, extending Bayes' rule but also compatible with others, and weak
updating axioms by a General Imaging rule, extending the Lewis rule. Strong axioms (equivalent to
the Popper-Miller axiom system) are necessary to justify Bayes' rule in a revising context, and
justify in fact an extended Bayes' rule which applies even if the message has zero probability.
In a context of uncertainty, when an actor’s belief is represented by a probability function on some
set of possible worlds, the traditional approach to belief change is to use Bayes' rule. The principle
underlying this rule is to reallocate proportionally the probability of the worlds excluded on the
remaining worlds. This rule receives a statistical justification when subjective probabilities are
identified with proportions or frequencies (populational argument), and a decision-theoretic justification
when probabilities are identified with betting rates (Dutch Book argument). But Bayes' rule has not
yet received a clear epistemic justification based on subjective probabilities as true degrees of belief
and suffers from three drawbacks.
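The proportional-reallocation principle described above can be sketched as follows (the world names and prior weights are purely illustrative):

```python
from fractions import Fraction

def bayes_rule(P, A):
    # Keep only the worlds of the message A and reallocate the excluded mass
    # proportionally: each surviving probability is scaled by 1 / P(A).
    pA = sum(p for w, p in P.items() if w in A)
    if pA == 0:
        raise ValueError("Bayes' rule is undefined for a zero-probability message")
    return {w: p / pA for w, p in P.items() if w in A}

# Prior belief over four possible worlds (illustrative weights).
P = {"w1": Fraction(1, 2), "w2": Fraction(1, 4), "w3": Fraction(1, 8), "w4": Fraction(1, 8)}
post = bayes_rule(P, {"w1", "w2"})   # message: the true world lies in {w1, w2}
assert post == {"w1": Fraction(2, 3), "w2": Fraction(1, 3)}
```

The probabilities of the remaining worlds change homothetically, which is precisely the feature exploited in the axiomatic characterizations discussed below; the zero-probability case raised as the third drawback is exactly where this sketch fails.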
First, Bayes' rule is not grounded on axioms which express cognitive rationality principles followed
by the actor in any type of belief change. For instance, Teller (1976) argues that Bayes' rule is the only
one satisfying two conditions, stating in fact that the probabilities of worlds compatible with the
message change homothetically. Williams (1980) shows that Bayes' rule minimizes Kullback's
informational distance between initial and final belief when the message is recognized as certain.
Heckerman (1988), following Cox (1946), proves that Bayes' rule is the only one ensuring the
consistency of the Boolean structure of events and two algebraic axioms. But all these propositions
have an arbitrary flavor since they lack cognitive foundations.
Second, Bayes' rule is not related to a precise epistemic context where it is assumed to adequately
apply. To be fair, Gärdenfors (1988) proposes such a foundation, by exhibiting a link between Bayes'
rule and the propositional axioms of Alchourron-Gärdenfors-Makinson (1985), backed by very general
intuitions governing belief change and now a reference system in the AI literature. However, that
approach encounters two important limits. First, the transcription of the qualitative axioms of revision
into probabilistic ones relies itself on a very demanding axiom ("linear mixing") which seems quite as
arbitrary as the previous purely algebraic axioms. Second, Gärdenfors recognizes that Bayes' rule has
a challenger, the Lewis rule (or Imaging), which obeys different axioms, but he is silent about the
contexts of change where they respectively should apply.
Third, Bayes' rule cannot be used in situations where the message contradicts the initial belief, i.e.
has zero prior probability. One way of handling this problem, due to Miller-Popper (1994), directly
introduces axioms for conditional probabilities accepting ordinary probabilities as specific cases. But this
axiom system does not provide any rule for effectively computing the conditional probability.
The present work tries to justify some families of change rules by a complete system of axioms
reflecting principles of cognitive rationality and to overcome the preceding limits. First, it translates
usual axioms for belief change from set-theoretic into probabilistic ones, in a weak, strong or super-strong way, transforming qualitative into quantitative principles. Second, it distinguishes two contexts
of belief change: revising where the message completes or contradicts the initial belief about a static
world (Alchourron-Gärdenfors-Makinson,1985), and updating where the message gives some recent
information on an evolving world (Katsuno-Mendelzon,1992). Third, it shows that an “extended Bayes'
rule”, which applies even for a zero probability message, can only be obtained by the super-strong
axiom system proposed in the revising context, shown to be equivalent to the Miller-Popper system.
The representation theorems obtained in the paper are summarized in the following table where
Extended Bayes' rule appears just in one cell:
axiomatic system | context of change: Revising (Alchourron-Gärdenfors-Makinson) | context of change: Updating (Katsuno-Mendelzon)
weak | Weak General Conditioning (GCW) | Weak General Imaging (GIW)
strong | Strong General Conditioning (GCS) | Strong General Imaging (GIS)
super strong | Bayesian General Conditioning (GC-B) or Extended Bayes' rule (⇔ Miller-Popper) | P-independent General Imaging (GI-P)
In a first part, the paper considers change axioms for both contexts of change. Set-theoretic axioms
are recalled (§1.1), from which weak probabilistic axioms are then derived (§1.2); finally, strong
probabilistic axioms extending the spirit of weak ones to a numerical framework are proposed and
discussed (§1.3). In a second part, the paper considers change rules obtained by representation
theorems for both contexts. Set-theoretic rules are recalled (§2.1), followed by probabilistic rules
obtained with weak axioms (§2.2) and strong ones (§2.3).
1. CHANGE AXIOMS
1.1. SET-THEORETIC AXIOMS
The paper favors the semantical possible worlds approach to a syntactical approach, since it looks more
suited to an extension from a set-theoretic to a probabilistic framework. In this approach, each proposition
is symbolized by an event denoted X, i.e. by the subset of worlds where the proposition is true. A
contradictory proposition will be represented by the empty set ∅ and a tautological proposition by the
whole set of worlds W, considered as finite.
1.1.1. COMMON SET-THEORETIC AXIOMS
The change process considers that the agent has an initial belief K and receives a message A, both given
as events. The message can be compatible with the initial belief (K ∩ A ≠ ∅) or contradictory (K ∩ A = ∅). The agent
infers from them a revised belief K * A, which is explicitly assumed to be a unique event. Hence, the
process studies the change of some belief, but not the origin of any belief. Formally, a change function *
is a function from P(W) x P(W) to P(W) which associates to a couple of events (K, A) an event K * A.
The context of change is assumed to be given by the message itself, which completes or contradicts a
prior belief about a static world (revising), or gives some information about the modification of an evolving
world (updating). For instance, if somebody believes that he has money in his pocket and discovers that
his pocket is empty, the message indicates whether he never had money in his pocket (revising) or the
money was stolen (updating). The question of whether one context can be reduced to the other is still an
open one (Friedman-Halpern,1994).
In general, a semantic representation has two well-known consequences. First, the Logical omniscience
axiom is accepted : the agent’s belief is always deductively closed and keeps no distinction between
explicit and implicit (derived) belief. Second, the Extensionality axiom is always satisfied : if two
propositions have the same truth values, they are symbolized by the same subset of worlds and hence
are indistinguishable. Applied to belief change, the Extensionality axiom says that two initial beliefs (or
two messages) represented by the same set of worlds lead to final beliefs represented by the same set of
worlds, whatever their linguistic formulation. In a purely set-theoretic framework, this property becomes
trivial, but will be exploited in § 1.2.1 :
A0. Neutrality
If K = K’ then K*A = K’*A
The two contexts of change share moreover the four following axioms :
A1. Consistency
If K ≠ ∅ and A ≠ ∅ then K * A ≠ ∅
A2. Success
K * A ⊆ A
A3. Conservation
If K ⊆ A then K * A = K
A4. Sub-expansion
(K * A) ∩B ⊆ K * (A ∩ B)
Intuitively, Consistency states that when a non contradictory initial belief K is revised by a non
contradictory message, the final belief is non contradictory. Conservation states that if the message is
already validated by the initial belief, the final belief is unchanged. Summarizing a priority principle,
alternative to a possible symmetry principle between message and prior belief, Success states that the
final belief validates the message considered as true (contrary to the initial belief). Sub-expansion is one
side of a principle of minimal change of the prior belief; it states that the final belief resulting from two
messages keeps at least the part of the revised belief by one message compatible with the other.
From these axioms, one can derive the following properties:
A2’. Idempotence
(K * A) * A = K * A.
A3’.Weak conservation
K * W = K
A4’. Inclusion (by A4 and A3’)
K ∩ A ⊆ K * A
Idempotence states that when a message is repeated, it no longer modifies the final belief ; this axiom is
natural when the priority principle is stated (contrary to symmetric combination where a repeated
message is reinforced). Weak conservation is a restriction of Conservation to a tautological message.
Inclusion is a restriction of Sub-expansion to the case of a single message : it states that the final belief
keeps at least the part of the initial belief which is compatible with the message.
1.1.2 SET-THEORETIC REVISING AXIOMS
The revising axiom system consists in adding the following axiom to the previous common ones :
A5. Super-expansion
If (K * A) ∩ B ≠ ∅ then K * (A ∩ B) ⊆ (K * A) ∩ B
This axiom is the other side of a principle of minimal change of the prior belief : it states that the final
belief resulting from two messages always keeps at most the part of the revised belief by one of them
compatible with the other, if any.
Call Ar = {A1, A2, A3, A4, A5} the revising axiom system. From Ar, the following properties can be derived
(see appendix 1 for proof of A45) :
A5’. Preservation (by A5 and A3’)
If K ∩ A ≠ ∅ then K * A ⊆ K ∩ A
A45. Right distributivity
K * (A U B) = (K * A) U (K * B) or K *A or K * B
A45’.Commutativity
If (K * A) ∩ B ≠ ∅ and (K * B) ∩ A ≠ ∅, then K * (A ∩ B) = (K * A) * B = (K * B) * A
Preservation is a restriction of Super-expansion to the case of a single message : it states that the final
belief keeps at most the part of the initial belief compatible with the message, if it exists. Right-distributivity states that the final belief resulting from the disjunction of two messages is equivalent either
to the final belief resulting from one of these messages, or to the disjunction of the final beliefs resulting
from each of these messages. Finally, Commutativity states that if no contradiction arises in the belief
change, the final belief does not depend on the order in which two messages are dealt with; this property is
quite natural in a revising context where the messages give independent information on a same world.
Considering that Extensionality is automatically satisfied in a possible worlds model, and assuming that
K * A is always unique, the set of axioms Ar is a correct transcription in a set-theoretic framework of the
Alchourron-Gärdenfors-Makinson (1985) system, however limited to the cases where the initial belief K is
consistent. But the system can also be represented by {A1, A2 ,A3’, A4, A5} or by {A1, A2, A3’, A45}.
1.1.3. UPDATING SET-THEORETIC AXIOMS
The updating axiom system consists in adding the two following axioms to the previous common ones :
A6. Pointwise super-expansion
If ∃ w0 ∈ W : K = {w0} and if (K * A) ∩ B ≠ ∅ then K * (A ∩ B) ⊆ (K * A) ∩ B
A7. Left-distributivity
(K U K’) * A = (K * A) U (K’ * A)
Pointwise super-expansion restricts the intuition of Super-expansion to the case where the initial belief is
complete (or certain), i.e. reduced to one world; hence, the principle of minimal change survives only in a
weaker form. It follows that the Commutativity axiom is no longer satisfied: when the real world evolves,
the order of successive messages matters, the last being favored. Left-distributivity states that when
balancing between two alternative initial beliefs, these are revised independently for any message.
From the preceding axiom, it is possible to deduce the following property:
A7’. Left-monotonicity
If K ⊆ K’ then K * A ⊆ K’ * A
This axiom states that if the initial belief is weakly decreased, so is the final belief.
Since Extensionality is necessarily satisfied, the set of axioms Au = {A1, A2, A3, A4, A6, A7} is a correct
transcription in a set-theoretic framework of the system referred to as {U1-U5, U8, U9} by Katsuno-Mendelzon (1992), the strongest of the two versions of update axioms they propose.
Remark :
When initial belief is assumed to be always complete, Left-distributivity A7 is void since K U K’ is reduced
to a single world, requiring K and K’ to be identical; hence, Ar and Au are equivalent systems. When not,
Au is incompatible with Ar since A5’ and A7 are contradictory (proof in appendix 1).
1.2. WEAK PROBABILISTIC AXIOMS
The set-theoretic axioms can easily be transcribed into probabilistic ones through natural conventions
which reduce a probability distribution to its support. A probability distribution is naturally stated in a
possible worlds framework. A contradictory probabilistic belief is symbolized by a function P∅.
1.2.1. TRANSCRIPTION CONVENTIONS AND COMMON AXIOMS
In a probabilistic framework, the initial belief is associated with a prior probability distribution P on the
worlds. The message is again an event A, associated in the model with a subset of worlds. The final
belief is assumed to be a posterior probability distribution denoted P*A. Let ∆(W)
be the set of all probability distributions on the set of possible worlds W. A probabilistic change rule is
then a function from ∆(W) x P(W) to ∆(W).
Let Sup(P) = {w ∈ W, P(w) > 0} be the support of P. If K, K’, K’’ are the respective supports of probability
distributions P, P’, P’’, the following table enumerates useful rewritings of set-theoretic formulas in a
probabilistic framework :
Set-theoretic formula | Probabilistic transcription
K ⊆ A | P(A) = 1
K ⊆ K’ | P’(X) = 1 ⇒ P(X) = 1, or : P(X) > 0 ⇒ P’(X) > 0
K’ = K ∩ A | P’(X) = 1 ⇔ P(X∩A) = P(A) ⇔ P(A→X) = 1, or : P’(X) > 0 ⇔ P(X∩A) > 0
K’’ = K ∩ K’ | P’’(X) > 0 ⇔ P(X) > 0 and P’(X) > 0
K’’ = K U K’ | P’’(X) = 1 ⇔ P(X) = 1 and P’(X) = 1, or : P’’(X) > 0 ⇔ P(X) > 0 or P’(X) > 0
K ∩ A ≠ ∅ | P(A) > 0
For instance, the second line of the table means that if the probability distribution with the larger support
assigns probability one to some event X, then any probability distribution with a smaller support must do so as well.
In order to set up the weakest probabilistic axioms discussed in this paper, one needs to rely on the
following definition and postulate, which can be thought as a minimal transcription device of set-theoretic
axioms (endorsed by Gärdenfors, 1988) :
Definition:
A probabilistic change rule (mapping (P, A) to P*A) and a set-theoretic change rule * are said to be associated if and only if ∀P,
∀A : Sup(P*A) = Sup(P) * A.
TP. Transcription Postulate :
Whatever the probabilistic change rule, an associated set-theoretic change rule * always exists.
Corollary :
The set-theoretic change rule * associated to a probabilistic change rule is always unique.
Proof : suppose that * and ** are both associated to the same probabilistic change rule. For any K, consider one probability distribution P (always
existing) such that : K = Sup(P). Then :
Sup(P*A) = Sup(P) * A = Sup(P) ** A, ∀ K, ∀ A. Hence * and ** are identical.
The previous definition and postulate can now be used to transcribe set-theoretic postulates into
probabilistic ones :
B0. Neutrality
If [P(X) > 0 ⇔ P’(X) > 0] then [P*A(X) > 0 ⇔ P’*A(X) > 0]
B1. Consistency
If A ≠ ∅ and P ≠ P∅ then P*A ≠ P∅
B2. Success
P*A(A) = 1
B3. Conservation
If P(A) = 1 then [P*A(X) > 0 ⇔ P(X) > 0]
B4. Sub-expansion
If P*A(X∩B) > 0 then P*A∩B(X) > 0
Neutrality, derived from A0 and TP, is no longer a trivial property, since it requires that two posterior
probability distributions have the same support when the prior distributions have the same support. This
principle, labeled the principle of Top Equivalence in Lindström-Rabinowicz (1989), even if debatable in
extreme cases (when the weight of some worlds within the support becomes infinitesimal), will
hereafter be accepted as a minimal transcription assumption, backed by a quite intuitive property as can
be seen in the generic case. Once TP is accepted, the other axioms can be considered as the simplest
translation of the corresponding set-theoretic axioms into a probabilistic framework.
1.2.2. WEAK PROBABILISTIC REVISING AXIOMS
The weak probabilistic axioms for revising are defined by adding the following transcribed axiom to the
previous common ones :
B5. Super-expansion
If P*A(B) > 0 then [P*A∩B (X) > 0 ⇒ P*A (X∩B) > 0]
Further properties can of course be derived from the preceding ones or directly transcribed :
B4’. Inclusion
If P(X∩A) > 0 then P*A(X) > 0
B5’. Preservation:
If P(A) > 0 then [P*A (X) > 0 ⇒ P(X∩A) > 0]
B45. Right-distributivity
P*AUB (X) > 0 ⇔ P*A(X) > 0 and/or P*B(X) > 0
The system Br = {B0, B1, B2, B3, B4, B5} is the weak transcription of the revising axiomatic system Ar,
the following lemma being obviously stated :
Lemma 1 :
A probabilistic change rule mapping any prior probability distribution P into a posterior
probability distribution P*A satisfies the axiom system Br if and only if the associated set-theoretic
change rule * satisfies the axiomatic system Ar.
1.2.3. WEAK PROBABILISTIC UPDATING AXIOMS
The weak probabilistic axioms for updating are defined by adding the following transcribed axioms to the
previous common ones :
B6. Pointwise Super-expansion:
If [∃ w0 : P(w0) = 1 and P*A(B) > 0] then [P*A∩B(X) > 0 ⇒ P*A(X∩B) > 0]
B7. Left - distributivity
If [∀X, Q(X) > 0 ⇔ P(X) > 0 or P’(X) > 0] then [∀X, Q*A(X) > 0 ⇔ P*A(X) > 0 or P’*A (X) > 0]
The system Bu = {B0, B1, B2, B3, B4, B6, B7} is the weak transcription of the updating axiom system Au,
the following lemma being again obviously stated :
Lemma 2 :
A probabilistic change rule mapping any prior probability distribution P into a posterior
probability distribution P*A satisfies the axiom system Bu if and only if the associated set-theoretic change rule * satisfies the axiomatic system Au.
1.3. STRONG PROBABILISTIC AXIOMS
Probabilities are semantically richer than propositions, to which they attribute a real number and not only
a binary value. So, a more ambitious transcription of set-theoretic axioms requires stronger axioms,
extending to a numerical framework the same generic intuitions that lie behind set-theoretic axioms. In
fact, one even distinguishes between strong and super-strong axioms.
1.3.1. STRONG PROBABILISTIC REVISING AXIOMS
The strong system of axioms for probabilistic revising, B#r, is defined by replacing B3 and B4 by the two
following axioms :
B#3. Strong conservation:
If P(A) = 1 then P*A(X) = P(X)
B#4. Strong sub-expansion:
∀X P*A∩B(X) ≥ P*A (X∩B)
Intuitively, Strong conservation states that if a message is already certain according to the prior
probability, the posterior probability is the same. Strong sub-expansion expresses that the conjunction of
two messages never decreases the probability of the part of an event selected by one and compatible
with the other. These two intuitions can be thought as natural applications in a quantitative setting of the
intuitions underlying Conservation and Sub-expansion in a qualitative setting.
From the system B#r, the following principle can be derived :
B#4’. Strong inclusion:
∀X P*A(X) ≥ P(X∩A), or equivalently : ∀X P*A(X) ≤ P(A→X).
This axiom asserts that a message never decreases the probability of the part of an event compatible
with the message. This extends to a numerical framework the generic intuition that lies behind the set-theoretic principle of Inclusion : a message never weakens the belief in its own worlds. Jointly with
Success (B2), Strong inclusion means that the only way to weaken the probability of a world is to
eliminate it : if P*A(w) > 0 then P*A(w) ≥ P(w).
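Since Bayes' rule divides by P(A) ≤ 1, it satisfies both Strong conservation and Strong inclusion; this can be checked exhaustively on a toy prior (the three-world distribution is illustrative):

```python
from fractions import Fraction
from itertools import chain, combinations

W = ["w1", "w2", "w3"]
P = {"w1": Fraction(1, 2), "w2": Fraction(3, 10), "w3": Fraction(1, 5)}

def prob(Q, X):
    return sum(Q[w] for w in X)

def bayes(Q, A):
    pA = prob(Q, A)
    return {w: (Q[w] / pA if w in A else Fraction(0)) for w in Q}

A = {"w1", "w2"}
post = bayes(P, A)

# B#3 Strong conservation: a message of prior probability one leaves P unchanged.
assert bayes(P, set(W)) == P

# B#4' Strong inclusion: P*A(X) >= P(X ∩ A) for every event X.
for X in chain.from_iterable(combinations(W, r) for r in range(len(W) + 1)):
    assert prob(post, X) >= prob(P, set(X) & A)
```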
The super-strong system of axioms for probabilistic revising, B##r, is defined by adding to B#r the
following axiom :
B##45. Linear mixing:
If A ∩ A’ = ∅ then ∃ a ∈ [0,1] s.t. ∀X, P*A U A’ (X) = a P*A (X) + (1-a) P*A’ (X)
This axiom is a very demanding numerical extension of the Right-distributivity axiom (B45). It is endorsed
by Gärdenfors (1988) in order to characterize probabilistic change when the message has a positive prior
probability (axiom P+1), but with a more specific form associated with the constraint: a = P(A) / P(A U A’).
In fact, this constraint will here be shown to result from the fact that Linear mixing is represented only by
the Bayes' rule (in the presence of the other revising axioms). Linear mixing implies Right-distributivity
only for two messages A and A’ such that A ∩ A’ = ∅.
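The constraint a = P(A) / P(A U A') can be verified numerically for Bayes' rule on a toy prior (all values illustrative), taking two disjoint messages of positive prior probability:

```python
from fractions import Fraction

P = {"w1": Fraction(2, 5), "w2": Fraction(1, 5), "w3": Fraction(1, 5), "w4": Fraction(1, 5)}

def bayes(Q, A):
    pA = sum(Q[w] for w in A)
    return {w: (Q[w] / pA if w in A else Fraction(0)) for w in Q}

A, Ap = {"w1"}, {"w2", "w3"}                           # disjoint messages
a = sum(P[w] for w in A) / sum(P[w] for w in A | Ap)   # a = P(A) / P(A U A')
PA, PAp = bayes(P, A), bayes(P, Ap)
mixed = {w: a * PA[w] + (1 - a) * PAp[w] for w in P}

# Linear mixing holds for Bayes' rule with exactly that coefficient.
assert mixed == bayes(P, A | Ap)
```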
1.3.2. STRONG PROBABILISTIC UPDATING AXIOMS
The strong system of axioms for probabilistic updating B#u, is defined by replacing B3, B4 and B7 by the
following axioms :
B#3. Strong conservation:
If P(A) = 1 then P*A(X) = P(X)
B#4. Strong sub-expansion:
∀X P*A∩B(X) ≥ P*A (X∩B)
B#7. Separability:
If [∀X, Q’(X) = a P(X) + (1-a) P’(X) and Q’’(X) = a P(X) + (1-a) P’’(X), a ∈ ]0,1]]
then [∀X, P’*A(X) = P’’*A (X) = 0 ⇒ Q’*A(X) = Q’’*A(X) ]
Separability states that when combining linearly a probability distribution with a second one, the revised
probability of an event is not affected by a substitution of the second prior probability distribution if their
respective posteriors are null for this event. It constitutes a first possible transcription of Left-distributivity
to a quantitative setting, even if it does not imply it.
The super-strong system of axioms for probabilistic updating, B##u, is defined from B#u by replacing B#7
by B7 and the super-strong following axiom :
B##7. Homomorphism:
If [∀X Q(X) = a P(X) + (1-a) P’(X)] then [∀X Q*A(X) = a P*A(X) + (1-a) P’*A(X)] ∀ a ∈ ]0, 1[
Homomorphism (Gärdenfors, 1988) again looks like a numerical extension of Left-distributivity to linear
mixtures of probability distributions, a natural way to combine them. Homomorphism implies Separability,
but not Left-distributivity. However, Left-distributivity, Separability and Homomorphism appear as
stronger and stronger expressions of the same generic principle : a message which revises a combination
of two initial beliefs leads to a similar combination of the corresponding final beliefs.
1.4. SYNTHETIC TABLE
The eight axiom systems that have been introduced can be summarized in the following table :
Axioms | Revising (in a static world) | Updating (in an evolving world)
Set-theoretic (referring to propositional truth values) | Ar | Au
Weak probabilistic (referring to probability supports) | Br | Bu
Strong probabilistic (referring to probabilistic ordinal values) | B#r | B#u
Super-strong probabilistic (referring to probabilistic cardinal values) | B##r | B##u
2. CHANGE RULES
2.1. SET-THEORETIC RULES
In the axiom systems, the economy principle involves a qualitative notion of minimal change from initial to
final belief. In the worlds space, this principle is embodied by a notion of distance which acts as a mental
characteristic of each agent, but has a different flavor in each context.
2.1.1. SET-THEORETIC REVISING RULES
For revising, it seems natural to adopt a global distance between sets of worlds, applied between the
initial and the final belief. Indeed, what has to be changed is not the real world itself, but only our belief
about it, that is the set of possible worlds candidate to represent the real world. So, one has to look for a
new set of worlds compatible with the message, but as near as possible to the old set. In particular, as
long as it is logically possible, it is assumed that the message only makes the initial belief more precise,
hence justifying the Preservation axiom.
This intuitive relation between the minimal change principle and the distance notion was made more
precise in a representation theorem, due to Katsuno-Mendelzon (1991,1992):
Theorem KM1 :
A change function * satisfies the set of revising axioms Ar if and only if there exists a total preorder ≤K
on the worlds set W such that:
(i) w’ ∈ K and w’’ ∈ K ⇒ w’=K w’’
(ii) w’ ∈ K and w’’ ∉ K ⇒ w‘ <K w’’
(iii) K * A = Min (A, ≤K) = {w’ ∈ A, w’’ ∈ A ⇒ w’ ≤K w’’ }
According to (i) and (ii), the preorder ≤K can be thought of as defining a set of nested spheres, where the
inner sphere is K. According to (iii), the event K*A is nothing else than the set of worlds with minimal
distance to K, i.e. worlds which belong simultaneously to A and to the first sphere intersecting A.
Moreover, the preorder defines an « epistemic entrenchment » order on all events. An event E is less
entrenched than an event E’ if the first sphere intersecting -E is nearer to K than the first sphere
intersecting -E’. Finally, the system of spheres allows one to define not only K * A, but also a new preorder ≤K*A
when message A is obtained, at least on A. The new system is just the intersection of the previous system
of spheres with A and can be used for dealing with a new message B.
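A minimal sketch of such a sphere system, with the total preorder ≤K encoded as an illustrative rank function (rank 0 being the inner sphere K):

```python
# Total preorder ≤K encoded by an illustrative rank function; rank 0 is the inner sphere K.
rank = {"w1": 0, "w2": 0, "w3": 1, "w4": 2}
K = {w for w, r in rank.items() if r == 0}

def revise(A):
    # K * A = Min(A, ≤K): the A-worlds lying in the first sphere that intersects A.
    best = min(rank[w] for w in A)
    return {w for w in A if rank[w] == best}

# Preservation (A5'): a message compatible with K just refines it.
assert revise({"w2", "w3"}) == {"w2"}
# A message contradicting K falls back to the nearest sphere intersecting it.
assert revise({"w3", "w4"}) == {"w3"}
```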
2.1.2. SET-THEORETIC UPDATING RULES
For updating, it is natural to consider a local distance between worlds, which applies between each world
of the initial belief and its counterpart in the final belief. What has to be changed now is the real world
itself, even if it is not perfectly known. So, one has to associate to each world previously considered as
possible a new one, possibly not considered initially. The Left-distributivity axiom exactly reflects such
a case-based reasoning: if the world was such or such, it is now such or such.
The following representation theorem (Katsuno-Mendelzon, 1992) relies on a local distance of that kind,
encoded by preorders indexed by each world :
Theorem KM2 :
A change function * satisfies the set of updating axioms Au if and only if there exists, for each world w, a total preorder
≤w on the worlds set W such that:
(j) w’ ≠ w ⇒ w <w w’
(jj) K * A = ∪w∈K Min(A, ≤w) = {w’ ∈ A, ∃ w ∈ K, ∀ w’’ ∈ A : w’ ≤w w’’ }.
According to (j), the preorder ≤w defines again a set of nested spheres, where the inner sphere is {w}.
According to (jj), the event K*A is the set of worlds in A with minimal distance to each world of K, i.e.
worlds which belong simultaneously to A and to the first sphere intersecting A, for each system of
spheres around a world in K. Moreover, the initial system of spheres around each world does not have to be
changed when receiving successive messages.
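A small sketch of the pointwise construction (the per-world ranks encoding the preorders ≤w are an illustrative choice):

```python
# One preorder per world, encoded by illustrative rank dictionaries; condition (j)
# holds since each world is strictly nearest to itself.
rank = {
    "w1": {"w1": 0, "w2": 1, "w3": 2, "w4": 3},
    "w2": {"w2": 0, "w4": 1, "w1": 2, "w3": 3},
}

def update(K, A):
    # K * A = union over w in K of Min(A, ≤w).
    result = set()
    for w in K:
        best = min(rank[w][x] for x in A)
        result |= {x for x in A if rank[w][x] == best}
    return result

A = {"w3", "w4"}
# Left-distributivity (A7): updating a union is the union of the updates.
assert update({"w1", "w2"}, A) == update({"w1"}, A) | update({"w2"}, A)
assert update({"w1", "w2"}, A) == {"w3", "w4"}
```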
Remark:
If K is complete: K={w}, a revising and an updating rule lead from the same initial belief to the same final
belief for any message when the preorders coincide : ∀ w’ ∈ W, ∀ w’’ ∈ W, w’ ≤w w’’ iff w’ ≤K w’’; this may
be possible for any K. If K is not complete, the nearest worlds of w ∈ K have to be all the worlds of K, but
this correspondence condition cannot be stable with K; hence a revising and an updating rule act
differently.
2.1.3. CANONICAL EXAMPLE
The distinction between revising and updating axioms and rules can be illustrated by the canonical basket
example (Dubois-Prade, 1994) : A basket may be described by four worlds according to the fact that it
contains an apple (a) or not (-a) together with a banana (b) or not (- b). The agent believes at t that the
basket contains at least one fruit, K = (a ∧ b) v (-a ∧ b) v (a ∧ - b).
A revising message from a direct witness says that there is no banana : A = (a ∧ -b) v (-a ∧ -b). Hence
he concludes that only the world with one apple and no banana remains : K*A = (a ∧ -b), the change
process satisfying Preservation.
An updating message says at t+1 that there is no more banana in the basket, if there were any : A = (a ∧
-b) v (-a ∧ -b). Considering the evolution of the basket from each initial possible world, the agent
now considers as possible each nearest world of the message, according to a natural distance reflecting
the physical operation of withdrawing bananas if any : K * A = (a ∧ -b) v (-a ∧ -b); hence, the change
process violates Preservation and ensures Left-distributivity.
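The example can be encoded directly, using a hypothetical representation of worlds as (apple, banana) truth-value pairs:

```python
# Worlds encoded as (apple, banana) truth-value pairs (a hypothetical encoding).
K = {(True, True), (False, True), (True, False)}   # at least one fruit
A = {(True, False), (False, False)}                # no banana

# Revising (static world): keep the K-worlds compatible with A, satisfying Preservation.
revised = K & A
assert revised == {(True, False)}

# Updating (evolving world): from each K-world, the nearest A-world is obtained by
# withdrawing the banana while leaving the apple component unchanged.
updated = {(apple, False) for (apple, _) in K}
assert updated == {(True, False), (False, False)}
```

The world (False, False) belongs to the updated belief but not to the revised one, which is exactly the violation of Preservation noted above.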
2.2. WEAK PROBABILISTIC RULES
Probabilistic change rules generalizing Bayesian conditioning and Lewisian imaging can be constructed.
Since general methods are sought, they must apply to all situations, especially to the case where
P(A) = 0. Moreover, they must apply in the same way for any initial belief and message.
2.2.1. GENERAL FEATURES
Each rule combines two operations :
- an operation of selection of the worlds which belong to the support of the revised probability distribution.
This operation is summarized by an extended Kronecker function (the selection function) representing the
distance notion :
δ(w; A, .) = 1 iff w is a nearest world in A
- an operation of allocation of the prior probabilities to the selected worlds.
This operation is summarized by the differential weight given to the selected worlds (the allocation
function) :
φ (w; A, P, .) is the weight of the world w
In both cases, the final dot will be hereafter replaced by a precise but changing argument (K, w' or
nothing).
These two operations are not independent. Indeed, the allocation operation is designed to complete the
selection operation when more than one new world is selected. Hence, it is supposed to be
compatible with the chosen selection function. More precisely, the allocation operation must give a strictly
positive amount of probability to each selected world :
δ(w; A, .) = 1 ⇒ φ (w; A, P, .) > 0
Remark :
The dependence of the allocation function on A and P could be interpreted as meaning that it differs for
any message A and prior P. But this is not in the spirit of a rule, which has to be given in a general form
and is afterwards simply applied to the precise knowledge of A and P. Hence, the dependence of the
allocation function on A and P must be interpreted as the dependence of φ on P(w) and P(A).
A change rule is completely described by the value of the revised probability of each world P*A(w), the
probability of any event X being then naturally computed as :
(G) P*A(X) = ∑w∈X P*A(w)
2.2.2 WEAK PROBABILISTIC REVISING RULES
The General Conditioning rule (GC) is defined as follows :
- Let ≤K be a total preorder on W, summarized by an extended Kronecker function (the selection function)
such that :
(γ1) δ(w; A, K) = 1 iff δ(w,A) = 1 and ∀w’, δ(w’,A) = 1 ⇒ w ≤K w’,
where the usual Kronecker function δ(w,A) equals one iff w belongs to A.
- Let φ (w; A, P) be a positive function (the allocation function) compatible with the selection function, i.e.
such that :
(γ2) δ (w; A, K) = 1 ⇒ φ (w; A, P) > 0
Weak General Conditioning (GCW) specifies the allocation process by the following formula :
(GCW) P*A(w) = P1(w/A) = δ(w; A, K)·φ1(w; A, P) / ∑w'∈W δ(w'; A, K)·φ1(w'; A, P)
This method allocates the total weight of all worlds between all selected worlds. It applies even for a
message such that P(A) = 0 and defines a probability distribution according to (γ1) and (γ2).
The representation theorem for weak revising is (proof in appendix 2):
Theorem 1 :
The weak probabilistic revising axiom set Br is satisfied if and only if the change rule belongs to
the Weak General Conditioning method (GCW).
A precise General conditioning rule is generated by (GCW) only when the functions δ(w; A, K) and φ1(w;
A, P) are well specified. For instance, when P(A) > 0, an α-rule is defined by :
- δ(w; A, K) = 1 iff w ∈ K ∩ A.
- φ1(w; A, P) = P(w)α, α being a finite real number.
When α = 1, it reduces to the standard Bayes' rule of conditioning where the probability of all worlds is
allocated to the selected worlds proportionally to their prior probability. When α > 1, it reduces to a
« strengthened Bayes' rule », which favors the worlds with highest probability. When α < 1, it reduces
symmetrically to a « weakened Bayes' rule ». Especially, when α = 0, it leads to the « Egalitarian rule »,
which gives equal probability to all selected worlds, and can be applied even if P(A) = 0.
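These α-rules can be sketched numerically; the following is an illustrative Python sketch (function and variable names are not from the paper), assuming a finite set of worlds stored as a dict of prior probabilities and selection restricted to the A-worlds of positive prior, i.e. the case K ∩ A ≠ ∅:

```python
from fractions import Fraction as F

def gcw_alpha(prior, A, alpha):
    # Weak General Conditioning with phi1(w) = P(w)**alpha: the whole
    # probability mass is reallocated over the selected worlds (taken here
    # to be the A-worlds of positive prior, i.e. the case K ∩ A ≠ ∅).
    selected = [w for w in A if prior.get(w, F(0)) > 0]
    weights = {w: prior[w] ** alpha for w in selected}
    total = sum(weights.values())
    return {w: weights[w] / total for w in selected}

prior = {'w1': F(1, 2), 'w2': F(3, 10), 'w3': F(1, 5)}
A = {'w2', 'w3'}

assert gcw_alpha(prior, A, 1) == {'w2': F(3, 5), 'w3': F(2, 5)}  # Bayes' rule
assert gcw_alpha(prior, A, 0) == {'w2': F(1, 2), 'w3': F(1, 2)}  # Egalitarian rule
assert gcw_alpha(prior, A, 3)['w2'] > F(3, 5)                    # strengthened Bayes
```

Exact rational arithmetic makes the three special cases of the text (α = 1, α = 0, α > 1) directly checkable.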
2.2.3.WEAK PROBABILISTIC UPDATING RULES
The General Imaging method (GI) is defined by a set of selection functions as follows:
- Let {≤w} be a set of total preorders on W, each summarized by an extended Kronecker function (the
selection functions) such that :
(η1) δ (w; A, w’) = 1 iff δ (w,A) = 1 and ∀ w’’, δ (w’’; A) = 1 ⇒ w ≤w’ w’’
- Let φ1 (w; A, P) be a positive function (the allocation function) compatible with the previous selection
function, i.e. such that :
(η2) max_{w’∈W} δ(w; A, w’) = 1 ⇒ φ1(w; A, P) > 0.
Weak General Imaging (GIW) specifies the allocation process by the following formula :
(GIW) P*A(w) = P1(w//A) = [max_{w’∈W} δ(w; A, w’)]·φ1(w; A, P) / [∑_{w’’∈W} [max_{w’∈W} δ(w’’; A, w’)]·φ1(w’’; A, P)]
This method allocates the total weight of all worlds between all selected worlds. It defines a probability
distribution according to (η1) and (η2) and applies again even if P(A) = 0.
The representation theorem for weak updating is the following (proof in appendix 2):
Theorem 2 :
The weak probabilistic updating axiom system Bu is satisfied if and only if the change rule
belongs to the Weak General Imaging method (GIW).
More precisely, α - rules can again be defined by stating : φ1(w;A,P) = [P(w)]α, α being a finite real
number.
2.3.STRONG PROBABILISTIC RULES
Strong probabilistic rules associated to strong axioms can now be obtained by specification of the
allocation function translating the weights from initial worlds to final ones.
2.3.1.STRONG PROBABILISTIC REVISING RULES
Strong General Conditioning (GCS) specifies the allocation process by the following formula :
(GCS) P*A(w) = P2(w/A) = P(w∩A) + P(−A) · [δ(w; A, K)·φ2(w; A, P)] / [∑_{w’∈W} δ(w’; A, K)·φ2(w’; A, P)]
This method allocates the total weight of all excluded worlds between all selected worlds, while keeping
at least the prior probability of the selected worlds. It again defines a probability distribution according to
(γ1) and (γ2).
Remark :
For each rule, normalized allocation functions can be defined by :
Ψ(w; A, P) = φ(w; A, P) / [∑_{w’∈W} δ(w’; A, K)·φ(w’; A, P)]
Hence, noticing that P(w∩A) = δ(w; A, K) P(w), (GCS) is a special case of (GCW) obtained by adding the
following constraint to define a “strong” normalized allocation function :
Ψ2(w; A, P) = P(w) + P(−A)·Ψ1(w; A, P), from a “weak” one Ψ1(w; A, P).
The representation theorem for strong revising is the following (proof in appendix 2):
Theorem 3 :
The strong probabilistic revising axiom set B#r is satisfied if and only if the change rule belongs
to the Strong General Conditioning method (GCS).
A precise General Conditioning rule is generated by (GCS) only when the functions δ(w; A, K) and φ2(w; A, P) are well specified. For instance, when P(A) > 0, an α-rule is defined by :
- δ(w; A, K) = 1 iff w ∈ K ∩ A.
- φ2(w; A, P) = P(w)α, α being a finite or infinite real number.
When α = 1, it reduces again to the standard Bayes' rule. When α = 0, it leads to a « Distorted egalitarian
rule ». When α = ∞, it leads to a « Lexicographic rule » which allocates the total probability of the
eliminated worlds to the selected world with highest probability (each selected world keeping nevertheless
a positive probability).
When P(A) > 0, Bayes' rule is the only rule which is simultaneously an α-rule for (GCW) and for (GCS), with moreover the same allocation function.
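The strong α-rules, including the α = ∞ limit, admit the same kind of sketch; the following illustrative Python code (names are not from the paper) assumes the same dict representation of priors:

```python
from fractions import Fraction as F

def gcs_alpha(prior, A, alpha):
    # Strong General Conditioning with phi2(w) = P(w)**alpha: each selected
    # world keeps its prior probability and the excluded mass P(-A) is shared
    # in proportion to P(w)**alpha (selection = A-worlds of positive prior).
    selected = [w for w in A if prior.get(w, F(0)) > 0]
    out = 1 - sum(prior[w] for w in selected)          # P(-A)
    weights = {w: prior[w] ** alpha for w in selected}
    total = sum(weights.values())
    return {w: prior[w] + out * weights[w] / total for w in selected}

def gcs_lex(prior, A):
    # alpha -> infinity limit (« Lexicographic rule »): all of P(-A) goes to
    # the selected world of highest prior probability.
    selected = [w for w in A if prior.get(w, F(0)) > 0]
    out = 1 - sum(prior[w] for w in selected)
    top = max(selected, key=lambda w: prior[w])
    return {w: prior[w] + (out if w == top else F(0)) for w in selected}

prior = {'w1': F(1, 2), 'w2': F(3, 10), 'w3': F(1, 5)}
A = {'w2', 'w3'}

assert gcs_alpha(prior, A, 1) == {'w2': F(3, 5), 'w3': F(2, 5)}    # Bayes again
assert gcs_alpha(prior, A, 0) == {'w2': F(11, 20), 'w3': F(9, 20)} # distorted egalitarian
assert gcs_lex(prior, A) == {'w2': F(4, 5), 'w3': F(1, 5)}         # lexicographic
```

Note that the α = 1 case coincides with the weak α = 1 case, illustrating that Bayes' rule is simultaneously a (GCW) and a (GCS) α-rule.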
Bayesian General Conditioning (GC-B) is defined as the usual Bayes' rule when P(A) > 0 and by any
allocation rule such that: φ1(w; A, P) = φ2(w; A, P) = φ(w), when P(A) = 0.
The representation theorem for super strong revising is (proof in appendix 2) :
Theorem 4 :
The super strong revising axiom system B##r is satisfied if and only if the change rule belongs to
the Bayesian General Conditioning method (GC-B)
Remark :
This result differs from that of Gärdenfors (1988), which does not make clear the distinction between
the different levels of transcription : the representation theorem justifying Bayes' rule requires
Linear Mixing, which is a demanding transcription of Right distributivity, though still weaker than the
Gärdenfors axiom (P+1).
2.3.2.STRONG PROBABILISTIC UPDATING RULES
Let now {φ2(w; A, P, w’)} be a set of positive functions (the allocation functions) compatible with the
previous selection function, i.e. such that :
(η2)’ δ (w; A, w’) = 1 ⇒ φ2 (w; A, P, w’) > 0 , ∀ w’.
Strong General Imaging (GIS) specifies the allocation process by the following formula :
(GIS) P*A(w) = P2(w//A) = ∑_{w’∈W} P(w’) · [δ(w; A, w’)·φ2(w; A, P, w’)] / [∑_{w’’∈W} δ(w’’; A, w’)·φ2(w’’; A, P, w’)]
This method allocates the probability of each world to the corresponding selected worlds. It defines a
probability distribution according to (η1) and (η2)’ and applies again to any message even if P(A) = 0. It
generalizes Imaging, first introduced by Lewis (1976) in the special case where each world has only one
nearest world (hereafter “Lewisian Imaging”).
Remark :
Starting from a set {φ1 (w; A, P, w’)} of allocation functions satisfying (η2)’, (GIW) can be written
equivalently :
(GIW) P*A(w) = P1(w//A) = ∑_{w’∈W} P(w’) · [max_{w’’’∈W} δ(w; A, w’’’)]·φ1(w; A, P, w’) / [∑_{w’’∈W} [max_{w’’’∈W} δ(w’’; A, w’’’)]·φ1(w’’; A, P, w’)]
It follows obviously that (GIS) is a special case of (GIW) since it corresponds to :
φ1(w;A,P,w’) = φ2(w;A,P,w’) if δ(w;A,w’) = 1
φ1(w;A,P,w’) = 0 if δ(w;A,w’) = 0
The representation theorem for strong updating is (proof in appendix 2):
Theorem 5 :
The strong probabilistic axiom system B#u is satisfied if the change rule belongs to the Strong
General Imaging method (GIS)
More precisely, α-rules can again be defined by: φ2(w; A, P, w’) = [P(w)]α , α being a finite real number.
When α = 0, the Egalitarian rule φ2 (w; A, P, w’) = 1 is always applicable, the prior probability of each
initial world being allocated equally between its nearest A-worlds (Lepage, 1991).
When α ≠ 0, the α-rules can be applied only when all the nearest worlds from w have a positive prior
probability, not for technical reasons but in order to satisfy (η2)’. However, mixed rules defined by :
φ2(w; A, P, w’) = c + (1-c) [P(w)]α (0 < c < 1) are always applicable.
P-independent General Imaging (GI-P) is defined by adding to (GIS) the following constraint: ∀w, φ2(w; A,
P, w’) = φ2(w). The Egalitarian rule is in fact a special case of a P-independent rule.
The representation theorem for super-strong updating is (proof in appendix 2):
Theorem 6 :
The super-strong probabilistic revising axiom system B##u is satisfied if and only if the change
rule belongs to the P-independent method (GI-P)
2.3.3.CANONICAL EXAMPLE
The probabilistic representation theorems can be illustrated by extending the canonical example to a numerical
framework : consider prior probabilities assigned to each possible world of K, say 1/2 to (a ∧ b), 1/3 to (-a ∧ b), 1/6 to
(a ∧ -b) and 0 to (-a ∧ -b). Revising leads to posterior probability 1 assigned to the world (a ∧ -b). Updating leads to
probability 2/3 for (a ∧ -b), nearest world from (a ∧ b) and from itself, and 1/3 for (-a ∧ -b), nearest world from (-a ∧ b).
This example points to the different selection functions used in the two contexts, but leaves aside the problem of the
allocation function ; hence, a second basket example can be considered : the initial belief is kept, but the message
becomes A’ = -(a ∧ b) = (-a ∧ b) v (a ∧ -b) v (-a ∧ -b).
Interpreting the message as « there is at most one fruit at t », revising leads to posterior probability 1 assigned jointly
to (-a ∧ b) and (a ∧ -b). Hence, it leaves open the problem of the allocation of the total weight among the two
worlds of the intersection :
- using (GCW) α-rules, Bayes' rule leads to posterior probability 2/3 for (-a ∧ b) and 1/3 for (a ∧ -b). The
Egalitarian rule leads to 1/2 for (-a ∧ b) and 1/2 for (a ∧ -b). But the 3-rule leads to 8/9 for (-a ∧ b) and 1/9 for
(a ∧ -b). In this last case, the posterior probability of (a ∧ -b) is less than its prior probability. This effect does not happen with
the second method of General Conditioning (GCS).
- using (GCS) α-rules, the Distorted egalitarian rule leads to posterior probability 7/12 for (-a ∧ b) and 5/12 for (a ∧ -b).
Bayes' rule leads again to 2/3 and 1/3 respectively. The 3-rule leads to 7/9 and 2/9, the posterior probability of
(a ∧ -b) now being more than its prior probability. The Lexicographic rule finally leads to 5/6 and 1/6.
Interpreting the message as « if there were two fruits, one has been removed at t+1 », updating leaves the
probabilities 1/3 on (-a ∧ b) and 1/6 on (a ∧ -b) and transfers probability 1/2 jointly to (-a ∧ b) and (a ∧ -b)
(nearest worlds from (a ∧ b)). Hence, it goes beyond Lewisian Imaging by considering that several worlds can be nearest
from one world, an allocation function being again necessary.
Using (GIS) α-rules, the 1-rule leads to the same result as Bayes' rule, i.e. 2/3 for (-a ∧ b) and 1/3 for (a ∧ -b). The
Egalitarian rule leads to the same result as the General Conditioning Distorted egalitarian rule, i.e. 7/12 and 5/12
respectively.
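The figures of this second basket example can be checked mechanically; the following Python sketch uses exact rational arithmetic and illustrative world labels ab = (a ∧ b), nb = (-a ∧ b), an = (a ∧ -b) (the function names are not the paper's notation):

```python
from fractions import Fraction as F

prior = {'ab': F(1, 2), 'nb': F(1, 3), 'an': F(1, 6)}
A = ('nb', 'an')   # message A' = -(a ∧ b), restricted to positive-prior worlds

def gcw(prior, A, phi):
    # Weak General Conditioning: the whole mass goes to the selected worlds,
    # shared in proportion to phi(P(w)).
    total = sum(phi(prior[w]) for w in A)
    return {w: phi(prior[w]) / total for w in A}

def gcs(prior, A, phi):
    # Strong General Conditioning: each selected world keeps its prior and
    # the excluded mass P(-A) is shared in proportion to phi(P(w)).
    out = 1 - sum(prior[w] for w in A)
    total = sum(phi(prior[w]) for w in A)
    return {w: prior[w] + out * phi(prior[w]) / total for w in A}

def gis(prior, nearest, phi):
    # Strong General Imaging: each world passes its whole mass to its nearest
    # A-worlds, shared in proportion to phi(P(target)).
    post = {}
    for w0, targets in nearest.items():
        total = sum(phi(prior[t]) for t in targets)
        for t in targets:
            post[t] = post.get(t, F(0)) + prior[w0] * phi(prior[t]) / total
    return post

assert gcw(prior, A, lambda p: p) == {'nb': F(2, 3), 'an': F(1, 3)}       # Bayes
assert gcw(prior, A, lambda p: p ** 3) == {'nb': F(8, 9), 'an': F(1, 9)}  # 3-rule
assert gcs(prior, A, lambda p: F(1)) == {'nb': F(7, 12), 'an': F(5, 12)}  # distorted egalitarian
assert gcs(prior, A, lambda p: p ** 3) == {'nb': F(7, 9), 'an': F(2, 9)}  # strong 3-rule

nearest = {'ab': ('nb', 'an'), 'nb': ('nb',), 'an': ('an',)}
assert gis(prior, nearest, lambda p: p) == {'nb': F(2, 3), 'an': F(1, 3)}       # 1-rule = Bayes
assert gis(prior, nearest, lambda p: F(1)) == {'nb': F(7, 12), 'an': F(5, 12)}  # egalitarian
```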
Remark 1 :
The fact that P-dependent rules do not necessarily satisfy Homomorphism can be illustrated by considering the
mixing of two probability distributions confronted with the same message :
Within the previous example, consider another probability distribution P’ giving 1/2 to (a ∧ b), 1/2 to (a ∧ -b) and 0
to the other worlds, the agent indeed believing at t that there is an apple in the basket. When applying the updating
message A’, the posterior probabilities resulting from the 1-rule are 1/4 for (-a ∧ b) and 3/4 for (a ∧ -b). With the same
1-rule, the previous prior P - giving 1/2 to (a ∧ b), 1/3 to (-a ∧ b) and 1/6 to (a ∧ -b) - leads to 2/3 for (-a ∧ b) and 1/3 for
(a ∧ -b). An average prior probability Q = 1/2 (P+P’) gives 1/2 to (a ∧ b), 1/6 to (-a ∧ b) and 1/3 to (a ∧ -b) ; the posterior
probability resulting from the 1-rule gives 1/3 to (-a ∧ b) and 2/3 to (a ∧ -b). It differs from the average of the posterior
probabilities, which gives 11/24 to (-a ∧ b) and 13/24 to (a ∧ -b).
Remark 2 :
The fact that Bayes' rule does not satisfy Homomorphism can be considered as an explanation of the well-known
Simpson paradox (Simpson, 1951), illustrated by the following example :
Consider a town T with 12000 inhabitants, i.e. 8000 men among whom 5000 are sick and 4000 women among whom
3000 are sick. Consider a town T’ with 8000 inhabitants, i.e. 2000 men among whom 0 are sick and 6000 women
among whom 1000 are sick. If X is the event « to be sick » and A the event « to be a man », one has respectively in
both towns :
P(A) = 2/3
P(X) = 2/3
PA(X) = 5/8 < P-A(X) = 3/4
P’(A) = 1/4
P’(X) = 1/8
P’A(X) = 0 < P’-A(X) = 1/6
In each town, women are proportionally sicker than men. By gathering both towns (according to their
populations), one has 20000 inhabitants, 10000 men among whom 5000 are sick and 10000 women among whom
4000 are sick :
Q(A) = 1/2
Q(X) = 9/20
QA(X) = 1/2 > Q-A(X) = 2/5
Now men are proportionally sicker than women. This corresponds to Homomorphism with a = 3/5.
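The reversal can be recomputed directly from the raw counts; a short Python check with exact fractions (variable names are illustrative):

```python
from fractions import Fraction as F

# (men sick, men total, women sick, women total) in each town
town_T  = (5000, 8000, 3000, 4000)
town_T2 = (0, 2000, 1000, 6000)

def sick_rates(ms, mt, ws, wt):
    # (P(sick / man), P(sick / woman)) as exact fractions
    return F(ms, mt), F(ws, wt)

# within each town, women are proportionally sicker than men
assert sick_rates(*town_T)  == (F(5, 8), F(3, 4))
assert sick_rates(*town_T2) == (F(0), F(1, 6))

# pooling the two towns reverses the comparison
pooled = tuple(x + y for x, y in zip(town_T, town_T2))
assert pooled == (5000, 10000, 4000, 10000)
assert sick_rates(*pooled) == (F(1, 2), F(2, 5))
```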
2.3.4.EQUIVALENCE WITH MILLER-POPPER AXIOM SYSTEM
Miller and Popper (1994) proposed a probabilistic change axiom system which relies on conditional
probability as a basic object, hence addressing the zero-probability puzzle, but without proposing a
constructive rule for solving it. Several sets of axioms of different strength were proposed by
Miller and Popper. System B, the strongest one, ensures that its models are reducible to a Boolean
algebra (see Bradley, 1997). The probability distributions satisfying system B can be precisely
characterized (Spohn, 1986). This system will be written hereafter not with propositions x, y, z as
originally, but with events X, Y, Z, the probability of X conditionally on Y being denoted P(X/Y). The six axioms are
followed by a convention linking ordinary probability to conditional probability :
PM1 : 0 ≤ P(X/Y) ≤ P(Z/Z)
PM2 : ∃ X, Y, s.t. P(X/Y) ≠ 0
PM3 : P(X∩Y/Z) ≤ P(X/Z)
PM4 : P(X∩Y/Z) = P(X/Y∩Z) P(Y/Z)
PM5 : P(XUY/Z) + P(X∩Y/Z) = P(X/Z) + P(Y/Z)
PM6 : if ∃Y s.t. P(Y/Z) ≠ P(Z/Z) then ∀X: P(X/Z) + P(-X/Z) = P(Z/Z)
Convention : P(X) = P(X/W)
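That ordinary conditional probability, extended by the convention P(X/Z) = 1 when P(Z) = 0, satisfies PM1-PM6 can be checked by brute force on a small space. The following is an illustrative sketch (not the paper's proof), assuming a three-world space with one zero-probability world:

```python
from fractions import Fraction as F
from itertools import chain, combinations

W = ('w1', 'w2', 'w3')
prior = {'w1': F(1, 2), 'w2': F(1, 2), 'w3': F(0)}   # w3 has zero probability

# all events (subsets of W), including the empty one
E = [frozenset(c) for c in
     chain.from_iterable(combinations(W, r) for r in range(len(W) + 1))]

def p(X):
    return sum((prior[w] for w in X), F(0))

def cp(X, Z):
    # P(X/Z): Bayes' rule when P(Z) > 0, the constant 1 otherwise
    return p(X & Z) / p(Z) if p(Z) > 0 else F(1)

for Z in E:
    for X in E:
        assert F(0) <= cp(X, Z) <= cp(Z, Z)                            # PM1
        for Y in E:
            assert cp(X & Y, Z) <= cp(X, Z)                            # PM3
            assert cp(X & Y, Z) == cp(X, Y & Z) * cp(Y, Z)             # PM4
            assert cp(X | Y, Z) + cp(X & Y, Z) == cp(X, Z) + cp(Y, Z)  # PM5
    if any(cp(Y, Z) != cp(Z, Z) for Y in E):                           # PM6
        assert all(cp(X, Z) + cp(frozenset(W) - X, Z) == cp(Z, Z) for X in E)
assert any(cp(X, Y) != 0 for X in E for Y in E)                        # PM2
```

Conditioning on the zero-probability event {w3} exercises the convention, which is where the constructive content of the zero-probability case resides.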
The following theorem (proof in appendix 3) shows that the Miller-Popper system is equivalent to the
super-strong axiomatic system for revising B##r. It follows that it can be represented by the Bayesian
General Conditioning method (GC-B), which offers a family of constructive rules for the zero-probability
case :
Theorem 7 :
A revised probability distribution satisfies the Miller-Popper axiom system B if and only if it
satisfies the super-strong axiom system B##r.
APPENDIX 1: SET-THEORETIC DERIVED AXIOMS
Under Success (A2), Sub-expansion (A4) and Super-expansion (A5) are equivalent to Right
distributivity (A45).
Let Dual sub-expansion be the following axiom :
K * (A U B) ∩ A ⊆ (K * A)
Then : under A2, A4 ⇔ Dual sub-expansion
Indeed :
A4 ⇒ Dual sub-expansion is obvious since (AUB) ∩ A = A.
Dual sub-expansion ⇒ A4 :
For any A and B, take C = A ∩ B and D = A ∩ -B; since A = CUD, Dual sub-expansion gives (K * A) ∩ (A
∩ B) ⊆ K * (A ∩ B). Hence, by A2, (K * A) ∩ B ⊆ K * ( A ∩ B)
Let Dual super-expansion be the following axiom :
If K*(AUB) ∩ A ≠ ∅ then K* A ⊆ (K * (A UB) ∩ A).
Then : under A2, A5 ⇔ Dual super-expansion
Indeed :
A5 ⇒ Dual super-expansion is obvious since (AUB) ∩ A = A.
Dual super-expansion ⇒ A5 :
For any A and B, take C = A ∩ B and D = A ∩ -B; since A = CUD, Dual super-expansion gives:
if (K*A) ∩ (A ∩B) ≠ ∅ then K*(A ∩B) ⊆ (K*A ) ∩ ( A ∩ B). Hence, by A2, K*(A ∩ B) ⊆ (K*A) ∩ B
It follows :
Dual sub-expansion and Dual super-expansion ⇒ A45. Indeed :
Dual sub-expansion implies by disjunction: [K * (AUB) ∩ A] U [K*(AUB) ∩ B] ⊆ (K * A) U (K * B)
Hence, by A2, K * (A U B) ⊆ (K * A) U (K * B)
By A2 again this implies moreover: If K * (AUB) ∩ A = ∅ then K * (A U B) ⊆ K * B
Dual super-expansion implies straightforwardly: If K * (AUB) ∩ A ≠ ∅ then K * A ⊆ K * (A U B)
Hence, the following table can be established (entries give K * (A U B)):

K * (AUB)      ∩ B ≠ ∅             ∩ B = ∅
∩ A ≠ ∅        (K * A) U (K * B)   K * A
∩ A = ∅        K * B               Impossible

A45 ⇒ Dual sub-expansion and Dual super-expansion is obvious when considering the previous table
and the equivalent formulation of Dual sub-expansion: K * (A U B) ∩ -B ⊆ K * A.
Preservation (A5)’ and Left-distributivity (A7) are contradictory
Consider K such that K ∩ A = ∅, and H = A\(K*A) ≠ ∅ ; let K’ = K U H
From A5’ and K’ ∩ A ≠ ∅, K’ * A ⊆ K’ ∩ A = H
But from A7’, K ⊆ K’ ⇒ (K * A) ⊆ (K’ * A) ⊆ H = A\(K * A), hence K * A = ∅, which is impossible. Thus A5’ and A7’ are contradictory.
Consequently A5’ and A7 are contradictory since A7 ⇒ A7’
APPENDIX 2 : REPRESENTATION THEOREMS
Theorem 1 :
The weak probabilistic revising axiom set Br is satisfied if and only if the change rule belongs to
the Weak General Conditioning method (GCW).
If sense :
From (γ1) and (γ2) it is easy to compute that :
Sup(P*A ) = {w, P*A(w) > 0} = {w, δ (w,A) = 1 and ∀w’, δ (w’; A) = 1 ⇒ w ≤K w’}
Consequently, assuming the conditions (i) and (ii) on ≤K, the theorem KM1 implies that the change rule
producing Sup(P*A) from Sup(P) satisfies the set of revising axioms Ar. The conclusion follows by
considering lemma 1.
Only if sense :
Lemma 1 asserts that if P*A satisfies Br, Sup(P*A ) satisfies Ar. Theorem KM1 asserts then the existence
of the selection function δ(w; A, K). Since no further function is assumed, the allocation function is
completely general, under the only restriction of (γ1) and (γ2).
Theorem 3 :
The strong probabilistic revising axiom set B#r is satisfied if and only if the change rule belongs
to the Strong General Conditioning method (GCS).
If sense :
Since (GCS) is a special case of (GCW), theorem 1 ensures that it satisfies the set of axioms Br. It
satisfies the additional axioms too :
(GCS) satisfies obviously B#3 (contrary to GCW):
If P(A) = 1 then P(-A) = 0 and P(w∩A) = P(w), ∀w ∈ A
Hence P2(w/A) = P(w), ∀w ∈ A (i.e. ∀w s.t. P(w) > 0), and by (Ĝ), P2(X/A) = P(X), ∀X.
(GCS) also satisfies B#4 (contrary to GCW):
According to (Ĝ) it is enough to show that the axiom is satisfied for each world :
(1) P2(w/A∩B) ≥ P2(w∩B/A), ∀ w ∈ W
Using (GCS) :
P2(w/A∩B) = P(w∩A∩B) + P(−A ∪ −B) · [δ(w; A∩B, K)·φ2(w; A, P)] / [∑_{w’’∈W} δ(w’’; A∩B, K)·φ2(w’’; A, P)]
P2(w∩B/A) = P(w∩A∩B) + P(−A) · [δ(w∩B; A, K)·φ2(w; A, P)] / [∑_{w’’∈W} δ(w’’; A, K)·φ2(w’’; A, P)]
(with the convention δ(∅; A, K) = 0).
The Sub-expansion axiom A4 writes by using the Kronecker functions:
If δ(w,B) = 1 and δ(w; A, K) =1 then δ(w; A ∩ B, K) =1
or δ(w; A ∩ B, K) ≥ δ (w; A, K) δ(w; B), ∀w
The Super-expansion axiom A5 writes similarly:
If ∃ w s.t. δ(w; B) = 1 and δ(w; A, K) = 1 then δ(w; A ∩ B, K) = 1 ⇒ δ(w; A, K) = 1 and δ(w; B) = 1, ∀w. Or :
if ∃ w s.t. δ(w; B) = 1 and δ(w; A, K) = 1 then δ(w; A∩B, K) ≤ δ(w; A, K) δ(w; B) , ∀ w.
Hence, if there exists a world of B selected in A :
∀w ∈W, δ(w; A ∩ B, K) ≥ δ (w∩B; A, K)
∀w ∈W, δ(w; A ∩ B, K) ≤ δ(w; A, K)
Moreover :
P(−A ∪ −B) ≥ P(−A)
The inequality (1) follows by combination.
If there exists no world of B selected in A, P2(w∩B/A) = 0, ∀w, and the inequality (1) is automatically satisfied.
Only if sense :
In (GCW), call Ψ1(w; A, P) the normalized allocation function :
Ψ1(w; A, P) = φ1(w; A, P) / [∑_{w’∈W} δ(w’; A, K)·φ1(w’; A, P)]
The revised probability given by (GCW) writes :
P*A(w) = δ(w; A, K)·Ψ1(w; A, P)
Hence :
P*A(w) − P(w∩A) = δ(w; A, K)·(Ψ1(w; A, P) − P(w)) = δ(w; A, K)·Ψ’2(w; A, P)
with Ψ’2(w; A, P) = Ψ1(w; A, P) − P(w)
Taking the weighted average on both sides :
(2) ∑_{w∈W} δ(w; A, K)·Ψ’2(w; A, P) = ∑_{w∈W} δ(w; A, K)·Ψ1(w; A, P) − ∑_{w∈W} δ(w; A, K)·P(w) = 1 − P(A) = P(−A)
Consider now :
Ψ’2(w; A, P) = P(−A)·Ψ2(w; A, P)
According to B#4, Ψ2(w; A, P) is positive.
According to (2), Ψ2(w; A, P) is normalized.
Hence, Ψ2(w; A, P) is a normalized allocation function related to Ψ1(w; A, P) by :
Ψ1(w; A, P) = P(w) + P(−A)·Ψ2(w; A, P)
The revised probability is thus given by (GCS) with :
Ψ2(w; A, P) = φ2(w; A, P) / [∑_{w’∈W} δ(w’; A, K)·φ2(w’; A, P)]
This theorem shows that the strong axiomatic system B#r does not single out Bayes' rule, but the
more general class of rules (GCS). In order to single out Bayes' rule, the super-strong axiom system,
which relies on Linear mixing, is needed :
Theorem 4 :
The super-strong revising axiom system B##r is satisfied if and only if the change rule belongs to
the Bayesian General Conditioning method (GC-B)
If sense:
(GC-B) is trivially a special case of (GCS). Hence, theorem 3 ensures that it satisfies the system B#r. It
satisfies also the additional axiom of Linear Mixing B##45. Indeed, it satisfies the condition:
If A ∩ A’ = ∅ then P*AUA’ (X) = a P*A(X) + (1-a) P*A’(X)
• If P(A) P(A’) ≠ 0 :
P*AUA’(X) = P(X∩(A∪A’))/P(A∪A’) = P((X∩A)∪(X∩A’))/P(A∪A’) = [P(X∩A) + P(X∩A’)] / [P(A) + P(A’)]
= [P(A)/(P(A)+P(A’))]·[P(X∩A)/P(A)] + [P(A’)/(P(A)+P(A’))]·[P(X∩A’)/P(A’)]
Hence P*AUA’(X) = a·P*A(X) + (1−a)·P*A’(X) with a = P(A)/(P(A)+P(A’)) = P*AUA’(A)
• If P(A) = 0 and P(A’) ≠ 0 (or conversely) :
P*AUA’(X) = P*A’(X) = a·P*A(X) + (1−a)·P*A’(X) with a = 0 (and symmetrically a = 1 in the converse case)
• If P(A) = 0 and P(A’) = 0 :
P*AUA’(X) = [∑_{w∈X} δ(w; AUA’, K)·φ(w)] / [∑_{w’∈W} δ(w’; AUA’, K)·φ(w’)], because in this case φ1(w; A, P) = φ(w)
P*A(X) and P*A’(X) can be written in the same way, by substituting respectively A and A’ to AUA’.
By the fact that A ∩ A’ = ∅, three cases have to be considered:
a. δ(w; AUA’, K) = δ(w; A, K) ∀w ∈ W, hence Linear mixing is satisfied with a = 1
b. δ(w; AUA’, K) = δ(w; A’, K) ∀w ∈ W, hence Linear mixing is satisfied with a = 0
c. δ(w; AUA’, K) = δ(w; A, K) + δ(w; A’,K) ∀w ∈ W, hence Linear mixing is again satisfied with:
a = [∑_{w’∈W} δ(w’; A, K)·φ(w’)] / [∑_{w’∈W} δ(w’; AUA’, K)·φ(w’)] = P*AUA’(A)
Only if sense:
Consider A’ = {w’} ⊆ K and A = K \ A’.
If P*A(X) is computed along a (GCS) rule, then for any w ∈ A :
P*AUA’(w) = P(w)
P*A(w) = P(w) + P(w’) · φ2(w) / ∑_{w’’∈A} φ2(w’’)
P*A’(w) = 0
Consequently, by applying Linear mixing :
(1−a)·P(w) = P(w’) · φ2(w) / ∑_{w’’∈A} φ2(w’’)
Hence, if there are at least two worlds w1 and w2 in A :
P(w1)/P(w2) = φ2(w1)/φ2(w2)
The same holds when taking another w’ ∈ K.
Finally, ∀w ∈ K, φ2(w) = kP(w), an allocation function characterizing Bayes' rule.
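This only-if argument can be checked numerically: with A’ = {w’} and A = K \ A’, Linear Mixing holds for a proportional (Bayesian) allocation function but fails for a constant one. A sketch under illustrative priors:

```python
from fractions import Fraction as F

P = {'w1': F(1, 2), 'w2': F(1, 3), 'w3': F(1, 6)}   # prior, all worlds in K
wp = 'w3'                         # A' = {w'}
A = [w for w in P if w != wp]     # A = K \ A'

def revise_A(phi):
    # (GCS) for the message A: each world of A keeps its prior and the
    # excluded mass P(w') is shared in proportion to phi(w)
    total = sum(phi(w) for w in A)
    return {w: P[w] + P[wp] * phi(w) / total for w in A}

def mixing_holds(phi):
    # Since A U A' = K and P(K) = 1, strong conservation gives
    # P*_{A U A'} = P, while P*_{A'} is concentrated on w'; the world w'
    # itself then forces the mixing coefficient a = 1 - P(w'), and Linear
    # Mixing requires P(w) = a * P*_A(w) for every w in A.
    a = 1 - P[wp]
    posterior = revise_A(phi)
    return all(P[w] == a * posterior[w] for w in A)

assert mixing_holds(lambda w: P[w])       # proportional (Bayesian) allocation: holds
assert not mixing_holds(lambda w: F(1))   # constant allocation: fails
```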
Theorem 2 :
The weak probabilistic updating axiom system Bu is satisfied if and only if the change rule
belongs to the Weak General Imaging method (GIW).
If sense :
Considering (η1) and (η2), it is easy to check that :
Sup(P*A) = {w, P*A(w) > 0} = {w, δ(w, A) = 1 and ∃ w’, ∀w’’, δ(w’’; A) = 1 ⇒ w ≤w’ w’’}
Consequently, assuming the condition (j) on ≤w’ , the theorem KM2 implies that the change rule that
maps Sup(P) to Sup(P*A ) satisfies the set of updating axioms Au. Considering lemma 2, this means that
(GIW) satisfies the set Bu.
Only if sense :
Lemma 2 asserts that if P*A satisfies Bu, its support K*A satisfies Au. Theorem KM2 asserts then the
existence of a selection function. Since no further constraint is assumed, the allocation function is
completely general, under the only restriction of (η1) and (η2).
Theorem 5 :
The strong probabilistic axiom system B#u is satisfied if the change rule belongs to the Strong
General Imaging method (GIS)
If sense :
Since (GIS) is a special case of (GIW), theorem 2 ensures that it satisfies system Bu. It satisfies the
additional axioms too :
(GIS) satisfies obviously B#3 :
If P(A) = 1 then δ(w; A, w) = 1, ∀w ∈ A and δ(w; A, w’) = 0, ∀w ∈ A, ∀w’ ≠ w
Hence P2(w//A) = P(w), ∀w ∈ A (i.e. ∀w s.t. P(w) > 0) and by (Ĝ), P2(X//A) = P(X), ∀X.
(GIS) satisfies B#4 :
According to (Ĝ), it is enough to show that the axiom is satisfied for each world:
(1) P(w//A∩B) ≥ P(w∩B //A), ∀w ∈ W
Using (GIS) :
P2(w//A∩B) = ∑_{w’∈W} P(w’) · [δ(w; A∩B, w’)·φ2(w; A, P, w’)] / [∑_{w’’∈W} δ(w’’; A∩B, w’)·φ2(w’’; A, P, w’)]
P2(w∩B//A) = ∑_{w’∈W} P(w’) · [δ(w∩B; A, w’)·φ2(w; A, P, w’)] / [∑_{w’’∈W} δ(w’’; A, w’)·φ2(w’’; A, P, w’)]
Here again, the Sub-expansion axiom A4 as well as the Pointwise Super-expansion axiom A6 apply to K
= {w’} and can be stated with the Kronecker functions:
δ(w; A∩B, w’) ≥ δ(w; A, w’)·δ(w; B), ∀w ∈ W
If ∃ w s.t. δ(w; B)·δ(w; A, w’) = 1 then δ(w; A∩B, w’) ≤ δ(w; A, w’)·δ(w; B), ∀w ∈ W.
In fact, these inequalities are straightforward consequences of any system of nested spheres. Hence, if
there exists a world of B among the nearest worlds of w’ in A :
∀ w ∈ W, δ(w; A ∩ B, w’) ≥ δ(w ∩ B; A, w’)
∀ w ∈ W, δ(w; A ∩ B, w’) ≤ δ (w; A, w’)
By combination, for each w’ s.t. there exists a world of B among its nearest worlds in A :
(1)’ [δ(w; A∩B, w’)·φ2(w; A, P, w’)] / [∑_{w’’∈W} δ(w’’; A∩B, w’)·φ2(w’’; A, P, w’)] ≥ [δ(w∩B; A, w’)·φ2(w; A, P, w’)] / [∑_{w’’∈W} δ(w’’; A, w’)·φ2(w’’; A, P, w’)]
If there exists no world of B among the nearest worlds of w’ in A, (1)’ still holds because δ(w∩B; A, w’) = 0. By summing over the w’, (1)’ implies (1).
(GIS) satisfies also B#7 :
Consider a probability distribution Q(w) = a·P(w) + (1−a)·P’(w), a ∈ ]0,1[, ∀w ∈ W.
Hence :
P’2(w//A) = ∑_{w’∈W} P’(w’) · [δ(w; A, w’)·φ2(w; A, P’, w’)] / [∑_{w’’∈W} δ(w’’; A, w’)·φ2(w’’; A, P’, w’)]
Q2(w//A) = ∑_{w’∈W} (a·P(w’) + (1−a)·P’(w’)) · [δ(w; A, w’)·φ2(w; A, Q, w’)] / [∑_{w’’∈W} δ(w’’; A, w’)·φ2(w’’; A, Q, w’)]
The fact that P’2(w//A) = 0 implies that P’(w’)·δ(w; A, w’) = 0, ∀w’.
Hence Q2(w//A) does not depend on the distribution P’.
Only if sense :
Any distribution P(.) can be written :
P(.) = ∑_{w’∈W} P(w’)·Π(w’, .)
where Π(w’, .) is the probability distribution concentrated on w’.
It is easily computed that :
Π*A(w’, w) = [δ(w; A, w’)·φ2(w; A, P, w’)] / [∑_{w’’∈W} δ(w’’; A, w’)·φ2(w’’; A, P, w’)]
If δ(w; A, w’) = 0, Π*A(w’, w) = 0 ; hence, according to axiom B#7, P*A(w) does not depend on P(w’).
The rule (GIW) reduces then to the rule (GIS), where the weight P(w’) contributes to P*A(w) only if
δ(w; A, w’) = 1.
Theorem 6 :
The super-strong probabilistic revising axiom system B##u is satisfied if and only if the change
rule belongs to the P-independent method (GI-P)
If sense :
(GI-P) is trivially a special case of (GIS).
Hence theorem 5 ensures that it satisfies system B#u. It satisfies also the additional axiom of
Homomorphism B##7 :
Consider two prior probability distributions P and P’ and a message A. Then, for every world w, Strong
General Imaging (GIS) with φ2(w; A, P, w’) = φ(w) leads to :
(1) P2(w//A) = ∑_{w’∈W} P(w’) · [δ(w; A, w’)·φ(w)] / [∑_{w’’∈W} δ(w’’; A, w’)·φ(w’’)] = ∑_{w’∈W} P(w’)·δ(w; A, w’)·Ψ(w)
(2) P’2(w//A) = ∑_{w’∈W} P’(w’) · [δ(w; A, w’)·φ(w)] / [∑_{w’’∈W} δ(w’’; A, w’)·φ(w’’)] = ∑_{w’∈W} P’(w’)·δ(w; A, w’)·Ψ(w)
Consider now the average distribution Q such that for each w’ :
(3) Q(w’) = a P(w’) + (1-a) P’(w’)
For each w, the change of Q(w) leads to :
(4) Q2(w//A) = ∑_{w’∈W} Q(w’) · [δ(w; A, w’)·φ(w)] / [∑_{w’’∈W} δ(w’’; A, w’)·φ(w’’)] = ∑_{w’∈W} Q(w’)·δ(w; A, w’)·Ψ(w)
But the averaging of P2(w//A) and P’2(w//A) leads to :
(5) a·P2(w//A) + (1−a)·P’2(w//A) = ∑_{w’∈W} a·P(w’)·δ(w; A, w’)·Ψ(w) + ∑_{w’∈W} (1−a)·P’(w’)·δ(w; A, w’)·Ψ(w)
Considering (3), the equality of (4) and (5) is straightforward.
But if this is true for any world w, this is also true for any event X, according to (Ĝ).
Only if sense :
A (GIS) rule using a P-dependent allocation function cannot satisfy Homomorphism. Indeed consider a world w’ with two
nearest worlds in A, say w1 and w2. The part of the posterior probability of w1 due to w’ writes :
P2(w1//A) = P(w’) · φ2(w1; A, P, w’) / [φ2(w1; A, P, w’) + φ2(w2; A, P, w’)]
The same holds for the prior probability P’. Hence, Homomorphism implies that φ2(w1; A, P, w’)/φ2(w2; A, P, w’) be
constant. In other terms, it implies that φ2(w1; A, P, w’) = φ(w1), i.e. does not depend on the prior
probability distribution.
APPENDIX 3 : EQUIVALENCE WITH MILLER-POPPER AXIOMS
Theorem 7 :
A revised probability distribution satisfies the Miller-Popper axiom system B if and only if it
satisfies the super-strong axiom system B##r.
Only if sense :
B1 (Consistency) :
obvious
B2 (Success) :
in PM4, take X = Y = Z : P(X/X) = P(X/X) P(X/X)
hence P(X/X) = 0 or 1, ∀X
by PM1, P(X/X) = 0 implies P(X/Y) = 0, ∀ X, Y, contradicting PM2
hence P(X/X) = 1 ∀X
Three consequences concerning the probability of W and ∅ can be derived :
- in PM4, take X = Z, Y = W :
P(Z/Z) = P(Z/Z) P(W/Z)
hence P(W/Z) = 1
- in PM6, take X = W :
if Z ≠ ∅, P(W/Z) + P(∅/Z) = P(Z/Z)
hence P(∅/Z) = 0, ∀Z ≠ ∅
- in PM4, take X = Z = ∅ :
P(∅/∅) = P(∅/∅) P(Y/∅)
hence P(Y/∅) = 1, ∀ Y
B#3 (Strong conservation) :
in PM5, take X = X’∩Y’ and Y = X’∩-Y’ :
(1) P(X’/Z) = P(X’∩Y’/Z) + P(X’∩-Y’/Z)
by applying PM4 to the last two terms :
P(X’/Z) = P(X’/Y’∩Z) P(Y’/Z) + P(X’/-Y’∩Z) P(-Y’/Z)
assuming that P(Y’/Z) = 1, which implies by PM6 and B2 that P(-Y’/Z) = 0, one gets :
P(X’/Z) = P(X’/Y’∩Z)
take now Z = W, i.e. P(Y’) = 1 under the previous assumption; one gets :
P(X’) = P(X’/Y’)
B#4 (Strong sub-expansion) :
by PM4 : P(X∩Y/Z) = P(X/Y∩Z) P(Y/Z)
by PM1 : P(Y/Z) ≤ P(Z/Z) = 1
hence P(X∩Y/Z) ≤ P(X/Y∩Z)
B5 (Super-expansion) :
by PM4 : P(X∩Y/Z) = P(X/Y∩Z) P(Y/Z)
hence if P(Y/Z) > 0 and P(X/Y∩Z) > 0 then P(X∩Y/Z) > 0
B##45 (Linear mixing)
in (1), take Y’ = Y’’ U Z’’ with Y’’∩Z’’ = ∅ :
P(X’/Z) = P((X’∩Y’’) U (X’∩Z’’)/Z) + P(X’∩-(Y’’UZ’’)/Z)
applying again PM5 to the first term, since (X’∩Y’’) ∩ (X’∩Z’’) = ∅ :
P(X’/Z) = P(X’∩Y’’/Z) + P(X’∩Z’’/Z) + P(X’∩-(Y’’UZ’’)/Z)
applying now PM4 to the three terms :
P(X’/Z) = P(X’/Y’’∩Z)P(Y’’/Z) + P(X’/Z’’∩Z)P(Z’’/Z) + P(X’/-(Y’’UZ’’)∩Z)P(-(Y’’UZ’’)/Z)
let Z = Y’’UZ’’, then the third term is null and one gets finally:
P(X’/Y’’UZ’’) = P(X’/Y’’)P(Y’’/Y’’UZ’’) + P(X’/Z’’)P(Z’’/Y’’UZ’’)
hence : P(X’/Y’’UZ’’) = aP(X’/Y’’) + (1-a) P(X’/Z’’)
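The identity just derived is the ordinary decomposition of conditional probability over a disjoint union, with a = P(Y’’/Y’’UZ’’); a quick numeric instance with exact fractions (events and weights are illustrative):

```python
from fractions import Fraction as F

P = {'w1': F(1, 4), 'w2': F(1, 4), 'w3': F(1, 3), 'w4': F(1, 6)}

def p(X):
    return sum((P[w] for w in X), F(0))

def cp(X, Z):
    # ordinary conditional probability (all conditioning events here have
    # positive probability)
    return p(X & Z) / p(Z)

Y2 = frozenset({'w1', 'w2'})      # Y'' (disjoint from Z'')
Z2 = frozenset({'w3'})            # Z''
X = frozenset({'w2', 'w3'})       # X'

a = cp(Y2, Y2 | Z2)
assert a == F(3, 5)
assert cp(X, Y2 | Z2) == a * cp(X, Y2) + (1 - a) * cp(X, Z2)  # Linear Mixing
```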
If sense :
It must just be proved that the axioms of system B do not constrain the change rules more than the
Bayesian General Conditioning method (GC-B) allows.
If P(Z/W) > 0, the change rule is just Bayes' rule, as proved by taking Y = Z’ and Z = W in PM4 :
P(X/Z’) = P(X∩Z’) / P(Z’)
If P(Z/W) = 0, the conditional probability P(./Z), which necessarily obeys PM1, PM2, PM3, PM5 and PM6
since it is a probability distribution, is just constrained by PM4. In fact, P(./Z) can be linked to P(./Z’) in two
ways :
• Z ⊂ Z’ : in PM4, take Z’ as conditioning event and Y such that Y∩Z’ = Z :
P(X∩Y/Z’) = P(X/Z) P(Y/Z’) = P(X/Z)P(Z/Z’)
For instance, if X = {w} and w ∈ Z, one has :
P(w/Z’) = P(w/Z) P(Z/Z’)
The constraint on P(w/Z) is binding only if P(Z/Z’) ≠ 0 (which is not the case when Z’ = W)
• Z’ ⊂ Z : in PM4, take Y = Z’ :
P(X∩Z’/Z) = P(X/Z’) P(Z’/Z)
For instance, if X = {w} and w ∈ Z, one has :
- if w ∈ Z’ : P(w/Z) = P(w/Z’)P(Z’/Z)
- if w ∉ Z’ : 0 = P(w/Z’) P(Z’/Z)
The constraint on P(w/Z) is binding only in the first case.
In both cases, when the constraint is binding, one can write for two worlds w and w’ belonging to Z :
P(w/Z)/P(w’/Z) = P(w/Z’)/P(w’/Z’)
This is just the condition that the weights are allocated proportionally to the allocation functions φ(w) and
φ(w’).
ACKNOWLEDGEMENTS
The authors want to thank R. Bradley, D. Lehman, I. Levi and H. Zwirn for helpful comments.
REFERENCES
ALCHOURRON, C. E. - GÄRDENFORS, P. - MAKINSON, D. (1985): On the logic of theory change:
partial meet contraction and revision functions, Journal of Symbolic Logic, 50, 510-530
BILLOT, A. - WALLISER, B. (1999): A mixed knowledge hierarchy, Journal of Mathematical Economics,
32, 185-205.
BRADLEY, R. (1997): More triviality, Journal of Philosophical Logic, 28(2), 129-139.
COX, R.T. (1946): Probability, frequency and reasonable expectations, American Journal of Physics, 14,
1-13
DARWICHE, A. – PEARL, J. (1997): On the logic of iterated belief revision, Artificial Intelligence
89, 1-29.
FRIEDMAN, N. - HALPERN, J.Y. (1994): A knowledge-based framework for belief change, Part II:
revision and update, in J. Doyle, E. Sandewall, P. Torasso, eds, Principles of Knowledge Representation
and Reasoning, Proc. Fourth International Conference KR'94, 190-201.
GÄRDENFORS, P. (1982): Imaging and Conditionalization, The Journal of Philosophy, 79, 747-760.
GÄRDENFORS, P. (1988): Knowledge in Flux, MIT Press
GÄRDENFORS, P. - SAHLIN, N.E. eds (1988): Decision, probability and utility, selected readings, Cambridge University Press.
GÄRDENFORS, P. (1992): Belief revision: An introduction, In P. Gärdenfors ed.: Belief Revision,
Cambridge University Press, 1-28
GRAHNE, G. (1991): Updates and counterfactuals, in J. Allen, R. Fikes and E. Sandewall eds.,
Proc. of the 2nd Inter. Conf. on Principles of Knowledge Representation and Reasoning (KR'91),
Cambridge, Mass., April 22-25, 269-276
HECKERMAN, D.E. (1988): An axiomatic framework for belief updates, in J.F.Lemmer, L.N. Kanal eds,
Uncertainty in Artificial Intelligence 2, North Holland, Amsterdam, 11-22.
KATSUNO, H. - MENDELZON, A. (1991): Propositional knowledge base revision and minimal change,
Artificial Intelligence, 52, 263-294
KATSUNO, H. - MENDELZON, A. (1992): On the difference between updating a knowledge base and
revising it, in P. Gärdenfors ed.: Belief Revision, Cambridge University Press, 183-203
LAVENDHOMME, T. (1997): For a modal approach of knowledge revision and nonmonotonicity, mimeo,
SMASH, Facultés universitaires Saint-Louis, Bruxelles
LEPAGE, F. (1991): Conditionals and revision of probability by imaging, Cahiers du département de
philosophie, N° 94-02, Université de Montréal
LEPAGE, F. (1997): Revision of probability and conditional logic, mimeo.
LEWIS, D.K. (1976): Probabilities of conditionals and conditional probabilities, Philosophical Review, 85,
297-315
LINDSTRÖM, S. - RABINOWICZ, W. (1989): On probabilistic representation of non-probabilistic belief revision,
Journal of Philosophical Logic, 18, 69-101.
MAKINSON, D. (1993): Five Faces of Minimality, Studia Logica, 52, 339-379.
MILLER, D., POPPER, K. (1994): Contribution to the formal theory of probability, in P.Humphreys ed.,
Patrick Suppes : Scientific Philosopher, Vol.1, Kluwer, Dordrecht, 3-23.
SIMPSON, E.H. (1951): The interpretation of interaction in contingency tables, Journal of the Royal
Statistical Society, ser. B, 13, 238-241.
SPOHN, W. (1986) : The representation of Popper measures, Topoi 5, 69-74.
TELLER, P. (1976): Conditionalization, observation and change of preference, in Foundations of
Probability Theory , Statistical Inference, and Statistical Theories of Science, D.Reidel, Dordrecht, Vol.1,
205-259.
WILLIAMS, P. (1980): Bayesian conditionalization and the principle of minimum information, British
Journal of the Philosophy of Science, 31, 131-144.