Maximin Decision Rule
Robert M. Haralick
Computer Science, Graduate Center
City University of New York
Outline
Conditional Expectation
Definition
Let X and Y be discrete random variables taking values in the set A × B. The conditional expectation of Y given X = a is defined by

E[Y | X = a] = Σ_{b∈B} b P_{Y|X}(b | a)

E[Y | X] is a function of the various values that X can take.
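As a concrete illustration (not from the slides), here is a minimal Python sketch that computes E[Y | X = a] from a joint probability table; the joint pmf used below is hypothetical.

```python
# Minimal sketch (hypothetical joint pmf): E[Y | X = a] from a joint table P_XY(a, b).
P_XY = {
    ("a1", 0): 0.1, ("a1", 1): 0.3,
    ("a2", 0): 0.4, ("a2", 1): 0.2,
}

def cond_expectation_Y_given_X(P_XY, a):
    """E[Y | X = a] = sum over b of b * P_{Y|X}(b | a)."""
    P_X_a = sum(p for (x, _), p in P_XY.items() if x == a)   # marginal P_X(a)
    return sum(b * p / P_X_a for (x, b), p in P_XY.items() if x == a)

print(cond_expectation_Y_given_X(P_XY, "a1"))   # 0.3 / 0.4 = 0.75
```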
The Event (c^j, c^k, d)

P_TA(c^j, c^k, d) = P_TA(c^j, c^k | d) P(d)
                  = P_T(c^j | d) P_A(c^k | d) P(d)
                  = [P_T(d | c^j) P_T(c^j) / P(d)] P_A(c^k | d) P(d)
                  = P_T(d | c^j) P_A(c^k | d) P_T(c^j)

P_AT(c^k, d | c^j) = P_TA(c^j, c^k, d) / P_T(c^j)
                   = P_T(d | c^j) P_A(c^k | d) = P_T(d | c^j) f_d(c^k)
Expected Conditional Economic Gain Given Class
Definition
The conditional expectation of the economic gain given class c^j for decision rule f is defined by

E[e | c^j; f] = Σ_{d∈D} Σ_{k=1}^{K} e(c^j, c^k) P_AT(c^k, d | c^j)
             = Σ_{d∈D} Σ_{k=1}^{K} e(c^j, c^k) P(d | c^j) f_d(c^k)
             = Σ_{k=1}^{K} e(c^j, c^k) Σ_{d∈D} P(d | c^j) f_d(c^k)

where f_d(c) is the conditional probability that the decision rule assigns class c given measurement d.
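A small Python sketch of this formula may help; the function below is my own illustration (names and data layout are assumptions), taking the gain matrix e(c^j, c^k), the class conditional probabilities P(d | c^j), and the decision rule f_d(c^k) as nested lists.

```python
# Illustrative sketch (my own notation): E[e | c^j; f] for a finite problem.
# e[j][k] = e(c^j, c^k), P[j][d] = P(d | c^j), f[d][k] = f_d(c^k).
def conditional_gain(e, P, f, j):
    K, D = len(e), len(f)
    return sum(e[j][k] * P[j][d] * f[d][k] for d in range(D) for k in range(K))
```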
Class Conditional Probability and Prior Probability
P(d | c): the conditional probability of measurement d given class c; called the class conditional probability.
P(c): the prior probability of class c; called the prior probability.
Economic Gain
The expected economic gain can be related to the class conditional expected economic gains and the prior probabilities.

E[e; f] = Σ_{d∈D} Σ_{k=1}^{K} Σ_{j=1}^{K} e(c^j, c^k) P(c^j, d) f_d(c^k)
        = Σ_{j=1}^{K} Σ_{k=1}^{K} Σ_{d∈D} e(c^j, c^k) P(d | c^j) P(c^j) f_d(c^k)
        = Σ_{j=1}^{K} [ Σ_{k=1}^{K} Σ_{d∈D} e(c^j, c^k) P(d | c^j) f_d(c^k) ] P(c^j)
        = Σ_{j=1}^{K} E[e | c^j; f] P(c^j)
Economic Gain
When the economic gain is represented in terms of the prior class probabilities, we write

E[e; f, P(c^1), . . . , P(c^K)]

When f is a Bayes decision rule,

E[e; f, P(c^1), . . . , P(c^K)] ≥ E[e; g, P(c^1), . . . , P(c^K)]

for any other decision rule g.
Definition
When f is a Bayes decision rule, E[e; f, P(c^1), . . . , P(c^K)] is called the Bayes gain.
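For a finite problem, a Bayes decision rule can be obtained by assigning, for each measurement d, the class c^k that maximizes Σ_j e(c^j, c^k) P(d | c^j) P(c^j). The sketch below is my own illustration of computing the Bayes gain this way; the data layout is an assumption.

```python
# Illustrative sketch: the Bayes gain of a finite problem, obtained by letting the
# decision rule assign, for each measurement d, the class maximizing
#   sum_j e(c^j, c^k) P(d | c^j) P(c^j).
def bayes_gain(e, P, prior):
    K, D = len(e), len(P[0])
    gain = 0.0
    for d in range(D):
        per_class = [sum(e[j][k] * P[j][d] * prior[j] for j in range(K))
                     for k in range(K)]
        gain += max(per_class)
    return gain
```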
Convex Combinations
Definition
Let x, y ∈ RN and 0 ≤ λ ≤ 1. Then λx + (1 − λ)y is called a
convex combination of x and y .
Proposition
If 0 ≤ x, y , λ ≤ 1, then 0 ≤ λx + (1 − λ)y ≤ 1
Proof.
λ ≤ 1 implies 0 ≤ 1 − λ.
x, y ≤ 1 and λ, 1 − λ ≥ 0 imply λx + (1 − λ)y ≤ λ + (1 − λ) = 1.
x, y, λ, 1 − λ ≥ 0 imply λx + (1 − λ)y ≥ 0.
Therefore, 0 ≤ λx + (1 − λ)y ≤ 1.
Structure of Decision Rules
Consider the structure of a decision rule f_d(c).
Suppose D = {d^1, . . . , d^Q} and C = {c^1, . . . , c^K}.
Then this decision rule f can be thought of as a vector in R^{KQ}:

f' = (f_{d^1}(c^1), . . . , f_{d^1}(c^K), . . . , f_{d^Q}(c^1), . . . , f_{d^Q}(c^K))

There are some constraints:
For q ∈ {1, . . . , Q} and k ∈ {1, . . . , K}, 0 ≤ f_{d^q}(c^k) ≤ 1.
For q ∈ {1, . . . , Q}, Σ_{k=1}^{K} f_{d^q}(c^k) = 1.
Therefore, a decision rule must lie in the unit hypercube of R^{QK} and it must lie in the manifold defined by the Q linear constraints

Σ_{k=1}^{K} f_{d^q}(c^k) = 1
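In code, a decision rule can therefore be stored as a Q × K row-stochastic matrix; the sketch below (illustrative, not from the slides) checks the two constraints.

```python
import numpy as np

# Illustrative sketch: a decision rule stored as a Q x K matrix F with F[q, k] = f_{d^q}(c^k).
def is_decision_rule(F, tol=1e-12):
    F = np.asarray(F, dtype=float)
    in_unit_cube = bool(np.all((F >= -tol) & (F <= 1 + tol)))   # 0 <= f_{d^q}(c^k) <= 1
    rows_sum_to_one = np.allclose(F.sum(axis=1), 1.0)           # sum_k f_{d^q}(c^k) = 1
    return in_unit_cube and rows_sum_to_one

print(is_decision_rule([[1, 0], [0.3, 0.7], [0, 1]]))   # True
```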
Convex Combinations of Decision Rules
Proposition
Convex combinations of decision rules are decision rules
Proof.
Let f and g be two decision rules. Let 0 ≤ λ ≤ 1. Consider λf_d(c) + (1 − λ)g_d(c). We have already proven that 0 ≤ λf_d(c) + (1 − λ)g_d(c) ≤ 1. Consider the sum over classes:

Σ_{c∈C} [λf_d(c) + (1 − λ)g_d(c)] = λ Σ_{c∈C} f_d(c) + (1 − λ) Σ_{c∈C} g_d(c)
                                  = λ + (1 − λ)
                                  = 1
Convex Sets
Definition
A set C ⊆ RN is a convex set if and only if x, y ∈ C imply
λx + (1 − λ)y ∈ C for every 0 ≤ λ ≤ 1.
Proposition
The set F of all convex combinations of decision rules is a
convex set.
The Intersection of Convex Sets is Convex
Proposition
Let C and D be convex sets. Then C ∩ D is a convex set.
Proof.
Let x, y ∈ C ∩ D and 0 ≤ λ ≤ 1. Consider λx + (1 − λ)y .
Since x, y ∈ C ∩ D, x, y ∈ C and x, y ∈ D.
Since C is convex and 0 ≤ λ ≤ 1, λx + (1 − λ)y ∈ C.
Since D is convex and 0 ≤ λ ≤ 1, λx + (1 − λ)y ∈ D.
λx + (1 − λ)y ∈ C and λx + (1 − λ)y ∈ D imply
λx + (1 − λ)y ∈ C ∩ D.
Mixed Decision Rules
Definition
Let f and g be decision rules and 0 ≤ λ ≤ 1.
Then
hd (c) = λfd (c) + (1 − λ)gd (c)
is called a mixed decision rule of f and g.
With probability λ apply decision rule f and with probability 1 − λ apply decision rule g.
If we apply decision rule f, then we assign class c with probability f(c | d).
If we apply decision rule g, then we assign class c with probability g(c | d).
Extreme Points
Definition
Let A ⊆ R^N. A point e ∈ A is called an Extreme Point of A if and only if b, c ∈ A with e = (b + c)/2 implies e = b = c.
Deterministic Decision Rules are Extreme Points
Proposition
Let F be the set of all convex combinations of decision rules. Let f be
a deterministic decision rule. Then f is an extreme point of F .
Proof.
Let g, h ∈ F satisfy f = (g + h)/2. Hence for every d ∈ D and c ∈ C,

f_d(c) = (g_d(c) + h_d(c))/2

Since f is a deterministic decision rule, for some c* ∈ C, f_d(c*) = 1 and for all c ∈ C − {c*}, f_d(c) = 0. Consider c ∈ C for which f_d(c) = 0. Then

f_d(c) = 0 = (g_d(c) + h_d(c))/2

Since g_d(c), h_d(c) ≥ 0 and g_d(c) + h_d(c) = 0, it follows that g_d(c) = h_d(c) = 0.
Proof Continued
Proof.
Now consider c*.

f_d(c*) = 1 = (g_d(c*) + h_d(c*))/2

Hence, g_d(c*) + h_d(c*) = 2. But g_d(c*), h_d(c*) ≤ 1. Therefore, g_d(c*) = 1 and h_d(c*) = 1.
Thus g = h = f. Now, by the definition of extreme point, a deterministic decision rule f ∈ F is an extreme point of F, the set of all convex combinations of decision rules.
Convex Polyhedrons
Definition
A Closed Convex Polyhedron is a non-empty set P formed as the solution set of a matrix inequality Ax ≤ b:

P = {x | Ax ≤ b}

Each row of the matrix inequality specifies a half space bounded by a hyperplane, and P is the intersection of these half spaces.
Definition
A bounded polyhedron is a polytope.
Closed Convex Polytope Example Tetrahedron
P = {x | Ax ≤ b}, where

A = [  1   1   1 ]        b = [ 2 ]
    [  1  −1  −1 ]            [ 0 ]
    [ −1   1  −1 ]            [ 0 ]
    [ −1  −1   1 ]            [ 0 ]
The Set of Decision Rules is a Closed Convex
Polytope
Proposition
Let F be the set of all decision rules formed from the finite set
C of classes and the finite set D of measurements.
The set F is a closed convex polytope lying in a linear manifold
of dimension |C| |D| − |D|.
Proof.
Let f ∈ F. We already know that f ∈ R^{|C| |D|}. The |D| linear constraints are formed from the requirement that Σ_{c∈C} f_d(c) = 1. The remaining constraints are of the form

f_d(c) ≥ 0, which is equivalent to −f_d(c) ≤ 0
f_d(c) ≤ 1
Minkowski’s Theorem
Definition
Let X = {x_1, . . . , x_M} ⊂ R^N. The Convex Hull of X is defined by

CH(X) = {y ∈ R^N | y = Σ_{m=1}^{M} λ_m x_m, where λ_m ≥ 0 and Σ_{m=1}^{M} λ_m = 1}
Theorem
Any closed convex polytope is the convex hull of its extreme
points.
Probabilistic Decision Rules
Any Probabilistic Decision Rule can be represented as a
convex combination of the deterministic decision rules.
Theorem
Let f be a probabilistic decision rule and let f^1, . . . , f^M be the set of all possible deterministic decision rules. Then there exists a convex combination λ_1, . . . , λ_M such that

f_d(c) = Σ_{m=1}^{M} λ_m f^m_d(c)
Extreme Points of Convex Sets
Proposition
Let C ⊆ RN be a convex set. Let e be an extreme point of C.
Let D be a convex subset of C. If e ∈ D, then e is an extreme
point of D.
Proof.
Let e be an extreme point of C. Suppose e ∈ D. Let a, b ∈ D satisfy e = (a + b)/2. Since D ⊆ C, a, b ∈ C. Now, a, b ∈ D ⊆ C, with e = (a + b)/2. Since e is an extreme point of C, e = a = b. But now we have e ∈ D and a, b ∈ D satisfying e = (a + b)/2, and we have just proved that e = a = b. Therefore, e is an extreme point of D.
Expected Conditional Gain: Mixed Decision Rules
Proposition
E[e | c j ; λf + (1 − λ)g] = λE[e | c j ; f ] + (1 − λ)E[e | c j ; g]
Proof.
E[e | c^j; λf + (1 − λ)g]
  = Σ_{k=1}^{K} Σ_{d∈D} e(c^j, c^k) P(d | c^j) {λ f(c^k | d) + (1 − λ) g(c^k | d)}
  = λ Σ_{k=1}^{K} Σ_{d∈D} e(c^j, c^k) P(d | c^j) f(c^k | d) + (1 − λ) Σ_{k=1}^{K} Σ_{d∈D} e(c^j, c^k) P(d | c^j) g(c^k | d)
  = λ E[e | c^j; f] + (1 − λ) E[e | c^j; g]
Example
e
                 Assigned
  True         c^1     c^2
  c^1            2      −1
  c^2           −1       2

P(d | c)
                 Measurement
  True Class    d^1    d^2    d^3
  c^1            .2     .3     .5
  c^2            .5     .4     .1

f_d(c)
                 Measurement
  Assigned Class   d^1    d^2    d^3
  c^1               1      0      0
  c^2               0      1      1

E[e | c^j; f] = Σ_{d∈D} Σ_{k=1}^{K} e(c^j, c^k) P(d | c^j) f_d(c^k)

E[e | c^1; f]
  = e(c^1, c^1)P(d^1 | c^1)f_{d^1}(c^1) + e(c^1, c^2)P(d^1 | c^1)f_{d^1}(c^2) +
    e(c^1, c^1)P(d^2 | c^1)f_{d^2}(c^1) + e(c^1, c^2)P(d^2 | c^1)f_{d^2}(c^2) +
    e(c^1, c^1)P(d^3 | c^1)f_{d^3}(c^1) + e(c^1, c^2)P(d^3 | c^1)f_{d^3}(c^2)
  = 2 · .2 · 1 + (−1) · .2 · 0 +
    2 · .3 · 0 + (−1) · .3 · 1 +
    2 · .5 · 0 + (−1) · .5 · 1
  = .4 − .3 − .5 = −.4
Example
e
                 Assigned
  True         c^1     c^2
  c^1            2      −1
  c^2           −1       2

P(d | c)
                 Measurement
  True Class    d^1    d^2    d^3
  c^1            .2     .3     .5
  c^2            .5     .4     .1

f_d(c)
                 Measurement
  Assigned Class   d^1    d^2    d^3
  c^1               1      0      0
  c^2               0      1      1

E[e | c^j; f] = Σ_{d∈D} Σ_{k=1}^{K} e(c^j, c^k) P(d | c^j) f_d(c^k)

E[e | c^2; f]
  = e(c^2, c^1)P(d^1 | c^2)f_{d^1}(c^1) + e(c^2, c^2)P(d^1 | c^2)f_{d^1}(c^2) +
    e(c^2, c^1)P(d^2 | c^2)f_{d^2}(c^1) + e(c^2, c^2)P(d^2 | c^2)f_{d^2}(c^2) +
    e(c^2, c^1)P(d^3 | c^2)f_{d^3}(c^1) + e(c^2, c^2)P(d^3 | c^2)f_{d^3}(c^2)
  = (−1) · .5 · 1 + 2 · .5 · 0 +
    (−1) · .4 · 0 + 2 · .4 · 1 +
    (−1) · .1 · 0 + 2 · .1 · 1
  = −.5 + .8 + .2 = .5
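The two worked numbers can be checked mechanically; the short script below is my own verification using the tables above, and it reproduces E[e | c^1; f] = −.4 and E[e | c^2; f] = .5.

```python
# Verification of the worked example above.
e = [[2, -1],            # e[j][k] = e(c^j, c^k)
     [-1, 2]]
P = [[0.2, 0.3, 0.5],    # P[j][d] = P(d | c^j)
     [0.5, 0.4, 0.1]]
f = [[1, 0],             # f[d][k] = f_d(c^k): assign c^1 on d^1, c^2 on d^2 and d^3
     [0, 1],
     [0, 1]]

for j in range(2):
    gain = sum(e[j][k] * P[j][d] * f[d][k] for d in range(3) for k in range(2))
    print(f"E[e | c^{j + 1}; f] = {gain:+.1f}")
# E[e | c^1; f] = -0.4
# E[e | c^2; f] = +0.5
```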
Example
e
                 Assigned
  True         c^1     c^2
  c^1            2      −1
  c^2           −1       2

P(d | c)
                 Measurement
  True Class    d^1    d^2    d^3
  c^1            .2     .3     .5
  c^2            .5     .4     .1

E[e | c^j; f] = Σ_{d∈D} Σ_{k=1}^{K} e(c^j, c^k) P(d | c^j) f_d(c^k)

             Measurements            Conditional Gain
  f        d^1   d^2   d^3     E[e|c^1; f]   E[e|c^2; f]
  f^1      c^1   c^1   c^1        2.0           −1.0
  f^2      c^1   c^1   c^2         .5            −.7
  f^3      c^1   c^2   c^1        1.1             .2
  f^4      c^1   c^2   c^2        −.4             .5
  f^5      c^2   c^1   c^1        1.4             .5
  f^6      c^2   c^1   c^2        −.1             .8
  f^7      c^2   c^2   c^1         .5            1.7
  f^8      c^2   c^2   c^2       −1.0            2.0
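The table can be regenerated by enumerating all |C|^{|D|} = 2^3 = 8 deterministic rules; the sketch below (my own illustration using the data above) prints the conditional gains for each.

```python
from itertools import product

e = [[2, -1], [-1, 2]]                  # e[j][k] = e(c^j, c^k)
P = [[0.2, 0.3, 0.5], [0.5, 0.4, 0.1]]  # P[j][d] = P(d | c^j)

# Each deterministic rule assigns one class index to each of the 3 measurements.
for m, assign in enumerate(product(range(2), repeat=3), start=1):
    gains = [sum(e[j][assign[d]] * P[j][d] for d in range(3)) for j in range(2)]
    rule = "  ".join(f"c^{k + 1}" for k in assign)
    print(f"f^{m}:  {rule}   E[e|c^1; f] = {gains[0]:+.1f}   E[e|c^2; f] = {gains[1]:+.1f}")
```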
Conditional Expected Gains: All Decision Rules
[Figure: scatter plot of the conditional expected gain pairs (E[e|c^1; f], E[e|c^2; f]) for the eight deterministic decision rules f^1, . . . , f^8; both axes range from −1 to 2.]
Convex Sets and Linear Functions
Proposition
Let f : R^N → R^M be a linear function, let C ⊆ R^N be a convex set, and let D = f(C) be its image. Then D is a convex set.
Proof.
Let y_1, y_2 ∈ D and let 0 ≤ λ ≤ 1. Then there exist x_1, x_2 ∈ C such that y_1 = f(x_1) and y_2 = f(x_2).

λy_1 + (1 − λ)y_2 = λf(x_1) + (1 − λ)f(x_2) = f(λx_1 + (1 − λ)x_2)

But x_1, x_2 ∈ C and C is convex. Therefore λx_1 + (1 − λ)x_2 ∈ C. Hence, f(λx_1 + (1 − λ)x_2) ∈ D, and this makes λy_1 + (1 − λ)y_2 ∈ D.
Dependence on Prior Class Probabilities
Proposition
The expected economic gain of a decision rule is an affine function of the prior probabilities P(c^1), . . . , P(c^{K−1}), with coefficients given by differences of the conditional expected gains.
Proof.

E[e; f] = Σ_{j=1}^{K} E[e | c^j; f] P(c^j)
        = Σ_{j=1}^{K−1} E[e | c^j; f] P(c^j) + E[e | c^K; f] (1 − Σ_{j=1}^{K−1} P(c^j))
        = Σ_{j=1}^{K−1} {E[e | c^j; f] − E[e | c^K; f]} P(c^j) + E[e | c^K; f]
Dependence on Prior Class Probabilities
E[e; f] = Σ_{j=1}^{K−1} {E[e | c^j; f] − E[e | c^K; f]} P(c^j) + E[e | c^K; f]

             Measurements           Conditional Gain              Expected Gain
  f        d^1   d^2   d^3     E[e|c^1; f]   E[e|c^2; f]        E[e; f, P(c^1)]
  f^1      c^1   c^1   c^1        2.0           −1.0            3P(c^1) − 1
  f^2      c^1   c^1   c^2         .5            −.7            1.2P(c^1) − .7
  f^3      c^1   c^2   c^1        1.1             .2            .9P(c^1) + .2
  f^4      c^1   c^2   c^2        −.4             .5            −.9P(c^1) + .5
  f^5      c^2   c^1   c^1        1.4             .5            .9P(c^1) + .5
  f^6      c^2   c^1   c^2        −.1             .8            −.9P(c^1) + .8
  f^7      c^2   c^2   c^1         .5            1.7            −1.2P(c^1) + 1.7
  f^8      c^2   c^2   c^2       −1.0            2.0            −3P(c^1) + 2.0
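Each entry of the Expected Gain column is just (E[e|c^1; f] − E[e|c^2; f])P(c^1) + E[e|c^2; f]; the illustrative snippet below builds these affine expressions from the conditional gains in the table.

```python
# Illustrative: expected gain as an affine function of P(c^1),
#   E[e; f, P(c^1)] = (E[e|c^1; f] - E[e|c^2; f]) P(c^1) + E[e|c^2; f].
cond_gains = {
    "f^1": (2.0, -1.0), "f^2": (0.5, -0.7), "f^3": (1.1, 0.2), "f^4": (-0.4, 0.5),
    "f^5": (1.4, 0.5), "f^6": (-0.1, 0.8), "f^7": (0.5, 1.7), "f^8": (-1.0, 2.0),
}
for name, (g1, g2) in cond_gains.items():
    print(f"E[e; {name}, P(c^1)] = {g1 - g2:+.1f} P(c^1) {g2:+.1f}")
# e.g. E[e; f^5, P(c^1)] = +0.9 P(c^1) +0.5
```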
Dependence on Class Prior Probabilities
[Figure: the expected gains E[e; f, P(c^1)] of the deterministic decision rules plotted as lines over P(c^1) ∈ [0, 1]; the Bayes gain is the upper envelope of these lines.]
Convex Functions
Definition
A function h : R^N → R is a convex function if and only if for every x = (x_1, . . . , x_N), y = (y_1, . . . , y_N) ∈ R^N and every λ, 0 ≤ λ ≤ 1,

h(λx + (1 − λ)y) ≤ λh(x) + (1 − λ)h(y)
[Figure: graph of a convex function h; for w = λu + (1 − λ)v the value h(w) lies on or below the chord point (λu + (1 − λ)v, λh(u) + (1 − λ)h(v)) joining (u, h(u)) and (v, h(v)).]
Bayes Gain is Convex
[Figure: the Bayes gain, the upper envelope of the lines E[e; f, P(c^1)], is a convex function of the class prior probabilities.]
Bayes Gain Is Convex
E[e; f] = Σ_{j=1}^{K} E[e | c^j; f] P(c^j)

G_B = max_f E[e; f]   (Bayes Gain)

Let f^n, n = 1, . . . , N, be the N = |C|^{|D|} deterministic decision rules. Define, for j = 1, . . . , K,

a_{jn} = E[e | c^j; f^n]
p_j = P(c^j)

Then

G_B(P(c^1), . . . , P(c^K)) = max_n Σ_{j=1}^{K} E[e | c^j; f^n] P(c^j)
G_B(p_1, . . . , p_K) = max_n Σ_{j=1}^{K} a_{jn} p_j
Bayes Gain Is Convex
Theorem
Let p = (p_1, . . . , p_K) and q = (q_1, . . . , q_K). Let 0 ≤ λ ≤ 1. Then

G_B(λp + (1 − λ)q) ≤ λG_B(p) + (1 − λ)G_B(q)

Proof.

G_B(λp + (1 − λ)q) = max_n Σ_{j=1}^{K} a_{jn} (λp_j + (1 − λ)q_j)
                   = max_n [ λ Σ_{j=1}^{K} a_{jn} p_j + (1 − λ) Σ_{j=1}^{K} a_{jn} q_j ]
                   ≤ max_n λ Σ_{j=1}^{K} a_{jn} p_j + max_n (1 − λ) Σ_{j=1}^{K} a_{jn} q_j
                   = λG_B(p) + (1 − λ)G_B(q)
Epigraph and Convex Sets
Definition
Let f : R^N → R. The epigraph of f, denoted Epi(f), is the set of points lying on or above the graph of f:

Epi(f) = {(x, u) ∈ R^N × R | u ≥ f(x)}
Epigraph and Convex Sets
Proposition
If a function is convex then its epigraph is a convex set.
Proof.
Suppose f is convex. Let (x, u), (y, v) ∈ Epi(f) and 0 ≤ λ ≤ 1. Then by definition of Epi(f), f(x) ≤ u and f(y) ≤ v, and therefore λf(x) + (1 − λ)f(y) ≤ λu + (1 − λ)v. Since f is convex, f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y) ≤ λu + (1 − λ)v. Now by definition of Epi(f), (λx + (1 − λ)y, λu + (1 − λ)v) ∈ Epi(f), making Epi(f) convex.
Epigraph and Convex Sets
Proposition
If the epigraph of a function is a convex set, then the function is
convex.
Proof.
Suppose Epi(f) is a convex set. Let x, y ∈ R^N and 0 ≤ λ ≤ 1. By definition of Epi(f), (x, f(x)) ∈ Epi(f) and (y, f(y)) ∈ Epi(f). Since Epi(f) is convex, λ(x, f(x)) + (1 − λ)(y, f(y)) ∈ Epi(f). Hence (λx + (1 − λ)y, λf(x) + (1 − λ)f(y)) ∈ Epi(f). By definition of Epi(f), f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y). And by definition of a convex function, this implies that f is convex.
Epigraph and Convexity
Theorem
A function is convex if and only if its epigraph is a convex set.
Basin sets of Convex Functions
Definition
Let f : RN → R and c ∈ R. A basin set of f is any set of the form
L = {x ∈ RN | f (x) ≤ c}
Theorem
Let C be a convex set, h be a convex function on C and
L = {c ∈ C | h(c) ≤ b}. Then L is a convex set.
Proof.
Let x, y ∈ L so that h(x) ≤ b and h(y ) ≤ b and let 0 ≤ λ ≤ 1. Since
x, y ∈ L ⊆ C and since C is a convex set, λx + (1 − λ)y ∈ C. Then
h(λx + (1 − λ)y ) ≤ λh(x) + (1 − λ)h(y ) ≤ λb + (1 − λ)b = b
This implies by definition of L that λx + (1 − λ)y ∈ L.
Minima Set of A Convex Function is Convex
Corollary
Let C ⊂ RN be a closed and bounded convex set. Let
h : C → R be a convex function. Suppose b = minc∈C h(c).
Then M = {x ∈ C | h(x) = b} is a convex set.
Proof.
Note that M = {x ∈ C | h(x) ≤ b}, which is a basin set of h and therefore convex by the preceding theorem. C being closed and bounded is needed so that the minimum b is attained; the minimizers may lie on the boundary of C.
For Convex Functions Local Minima are Global Minima
Theorem
Let C be a convex set and h be a convex function on C.
Suppose h has a local minimum at x_0 ∈ C. Then for any x ∈ C, h(x_0) ≤ h(x).
Proof.
Let x ∈ C and let α > 0 be small enough that (1 − α)x_0 + αx lies in the neighborhood of x_0 on which x_0 is a local minimum. Then,

h(x_0) ≤ h((1 − α)x_0 + αx) ≤ (1 − α)h(x_0) + αh(x)
0 ≤ α(h(x) − h(x_0))
h(x_0) ≤ h(x)
Dependence on Prior Class Probabilities
E[e; f] = Σ_{j=1}^{K−1} {E[e | c^j; f] − E[e | c^K; f]} P(c^j) + E[e | c^K; f]

             Measurements           Conditional Gain              Expected Gain
  f        d^1   d^2   d^3     E[e|c^1; f]   E[e|c^2; f]        E[e; f, P(c^1)]
  f^1      c^1   c^1   c^1        2.0           −1.0            3P(c^1) − 1
  f^2      c^1   c^1   c^2         .5            −.7            1.2P(c^1) − .7
  f^3      c^1   c^2   c^1        1.1             .2            .9P(c^1) + .2
  f^4      c^1   c^2   c^2        −.4             .5            −.9P(c^1) + .5
  f^5      c^2   c^1   c^1        1.4             .5            .9P(c^1) + .5
  f^6      c^2   c^1   c^2        −.1             .8            −.9P(c^1) + .8
  f^7      c^2   c^2   c^1         .5            1.7            −1.2P(c^1) + 1.7
  f^8      c^2   c^2   c^2       −1.0            2.0            −3P(c^1) + 2.0
Dependence on Class Prior Probabilities
[Figure: the lines E[e; f, P(c^1)] for the deterministic decision rules. Where are the probabilistic decision rules? A mixed rule λf + (1 − λ)g gives a line lying between the lines for f and g.]
Probabilistic Decision Rules
Pick a prior probability P(c^1).
For decision rule f there is an expected gain E[e; f].
For decision rule g there is an expected gain E[e; g].
For decision rule λf + (1 − λ)g, the expected gain is λE[e; f] + (1 − λ)E[e; g], which lies in between the expected gain for f and the expected gain for g.
Dependence on Class Prior Probabilities
[Figure: at P(c^1) = .2, the expected gains of decision rules f^7 and f^1; examine the blue line. The probabilistic decision rules correspond to lines contained in the area between the lower and the upper envelopes.]
Probabilistic Decision Rules Are In Between
Expected gain of a mixed decision rule is the mixture of the
expected gains of the component decision rules.
E[e; λf + (1 − λ)g, P(c^1)] = λE[e; f, P(c^1)] + (1 − λ)E[e; g, P(c^1)]
Probabilistic Decision Rules Are In Between
Two Class Case
Proposition
Let 0 ≤ λ ≤ 1. If E[e; f1 , P(c 1 )] and E[e; f2 , P(c 1 )] are affine functions of
P(c 1 ), then E[e; λf1 + (1 − λ)f2 , P(c 1 )] is an affine function of P(c 1 ).
Proof.
E[e; f_1, P(c^1)] = α_1 P(c^1) + β_1
E[e; f_2, P(c^1)] = α_2 P(c^1) + β_2

E[e; λf_1 + (1 − λ)f_2, P(c^1)] = λ(α_1 P(c^1) + β_1) + (1 − λ)(α_2 P(c^1) + β_2)
                                = (λα_1 + (1 − λ)α_2) P(c^1) + λβ_1 + (1 − λ)β_2
Probabilistic Decision Rules Are In Between
Corollary
Fix P(c 1 ). Let 0 ≤ λ ≤ 1.
If α1 P(c 1 ) + β1 < α2 P(c 1 ) + β2 then
α1 P(c 1 ) + β1 ≤ (λα1 + (1 − λ)α2 )P(c 1 ) + λβ1 + (1 − λ)β2 ≤ α2 P(c 1 ) + β2
Dependence on Class Prior Probabilities
[Figure: the lines E[e; f, P(c^1)] for the deterministic decision rules; the probabilistic decision rules correspond to lines in the area between the lower and the upper envelopes.]
Dependence on Class Prior Probabilities
[Figure: the lines E[e; f, P(c^1)]; the worst class priors are P(c^1) = 4/7 ≈ .5714 and P(c^2) = 3/7 ≈ .4286, where the Bayes gain takes its minimum value 71/70 ≈ 1.014.]
Finding Worst Class Priors
Two Class Case
Decision Rules of Mixture are Known

E[e; f_5; P(c^1)] = .9P(c^1) + .5
E[e; f_7; P(c^1)] = −1.2P(c^1) + 1.7

Set E[e; f_5; P(c^1)] = E[e; f_7; P(c^1)]:

.9P(c^1) + .5 = −1.2P(c^1) + 1.7
2.1P(c^1) = 1.2
P(c^1) = 1.2/2.1 = 4/7
P(c^2) = 1 − P(c^1) = 3/7
Determining the Maximin Decision Rule
E[e; f ] = E[e|c 1 ; f ]P(c 1 ) + E[e|c 2 ; f ]P(c 2 )
= E[e|c 1 ; f ]P(c 1 ) + E[e|c 2 ; f ](1 − P(c 1 ))
= (E[e|c 1 ; f ] − E[e|c 2 ; f ])P(c 1 ) + E[e|c 2 ; f ]
Since a maximin decision rule has no dependence on the prior
probability, we must have
E[e|c 1 ; f ] − E[e|c 2 ; f ] = 0
E[e|c 1 ; f ] = E[e|c 2 ; f ]
In this case,
E[e; f ] = E[e|c 1 ; f ]
= E[e|c 2 ; f ]
Dependence of a Probabilistic Decision Rule on Priors
E[e; λf^5 + (1 − λ)f^7]
  = E[e|c^1; λf^5 + (1 − λ)f^7] P(c^1) + E[e|c^2; λf^5 + (1 − λ)f^7] P(c^2)
  = (λE[e|c^1; f^5] + (1 − λ)E[e|c^1; f^7]) P(c^1) + (λE[e|c^2; f^5] + (1 − λ)E[e|c^2; f^7]) (1 − P(c^1))
  = ((λE[e|c^1; f^5] + (1 − λ)E[e|c^1; f^7]) − (λE[e|c^2; f^5] + (1 − λ)E[e|c^2; f^7])) P(c^1) + E[e|c^2; λf^5 + (1 − λ)f^7]

When there is no dependence on the prior, the coefficient of P(c^1) must be zero:

λE[e|c^1; f^5] + (1 − λ)E[e|c^1; f^7] − (λE[e|c^2; f^5] + (1 − λ)E[e|c^2; f^7]) = 0
λ(E[e|c^1; f^5] − E[e|c^1; f^7] − E[e|c^2; f^5] + E[e|c^2; f^7]) = E[e|c^2; f^7] − E[e|c^1; f^7]

λ = (E[e|c^2; f^7] − E[e|c^1; f^7]) / (E[e|c^1; f^5] − E[e|c^1; f^7] − E[e|c^2; f^5] + E[e|c^2; f^7])
  = (1.7 − .5) / (1.4 − .5 − .5 + 1.7) = 1.2/2.1 = 4/7

Require 0 ≤ λ ≤ 1.
λ ≥ 0 implies Sign(E[e|c^2; f^7] − E[e|c^1; f^7]) = Sign(E[e|c^1; f^5] − E[e|c^1; f^7] − E[e|c^2; f^5] + E[e|c^2; f^7]).
λ ≤ 1 implies |E[e|c^2; f^7] − E[e|c^1; f^7]| ≤ |E[e|c^1; f^5] − E[e|c^1; f^7] − E[e|c^2; f^5] + E[e|c^2; f^7]|.
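Numerically, using the conditional gains of f^5 and f^7 from the earlier table, this gives λ = 4/7 and an equalized conditional gain of 71/70 (my own verification):

```python
# Numerical check of the mixing weight for the maximin rule built from f^5 and f^7.
a15, a25 = 1.4, 0.5   # E[e|c^1; f^5], E[e|c^2; f^5]
a17, a27 = 0.5, 1.7   # E[e|c^1; f^7], E[e|c^2; f^7]

lam = (a27 - a17) / (a15 - a17 - a25 + a27)
print(lam)                              # 0.5714... = 4/7
print(lam * a15 + (1 - lam) * a17)      # E[e|c^1; f_M] = 71/70 ≈ 1.014
print(lam * a25 + (1 - lam) * a27)      # E[e|c^2; f_M] = 71/70 ≈ 1.014
```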
Finding Worst Class Priors
Three Class Case
E[e; f_i; P(c^1), P(c^2)] = α_{i1} P(c^1) + α_{i2} P(c^2) + γ_i,  i = 1, 2, 3

Set E[e; f_i; P(c^1), P(c^2)] = E[e; f_3; P(c^1), P(c^2)], i = 1, 2:

α_{11} P(c^1) + α_{12} P(c^2) + γ_1 = α_{31} P(c^1) + α_{32} P(c^2) + γ_3
α_{21} P(c^1) + α_{22} P(c^2) + γ_2 = α_{31} P(c^1) + α_{32} P(c^2) + γ_3

In matrix form,

[ α_{11} − α_{31}   α_{12} − α_{32} ] [ P(c^1) ]   [ γ_3 − γ_1 ]
[ α_{21} − α_{31}   α_{22} − α_{32} ] [ P(c^2) ] = [ γ_3 − γ_2 ]
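With numeric coefficients, this 2 × 2 system can be solved directly; the sketch below uses made-up α and γ values purely to show the mechanics (the numbers are hypothetical, not from the slides).

```python
import numpy as np

# Illustrative only: hypothetical coefficients for three rules in a three class problem,
#   E[e; f_i] = alpha[i][0] P(c^1) + alpha[i][1] P(c^2) + gamma[i].
alpha = np.array([[1.0, 0.2],
                  [0.3, 1.1],
                  [-0.5, -0.4]])
gamma = np.array([0.1, 0.0, 0.9])

A = alpha[:2] - alpha[2]        # rows: alpha_i - alpha_3, i = 1, 2
b = gamma[2] - gamma[:2]        # right-hand side: gamma_3 - gamma_i
P12 = np.linalg.solve(A, b)     # (P(c^1), P(c^2)) at the common intersection
print(P12, 1.0 - P12.sum())     # the remaining prior is P(c^3)
```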
Finding Worst Class Priors
N Class Case
E[e; f_n; P(c^1), . . . , P(c^{N−1})] = Σ_{i=1}^{N−1} α_{ni} P(c^i) + γ_n,  n = 1, . . . , N

Set E[e; f_i; P(c^1), . . . , P(c^{N−1})] = E[e; f_N; P(c^1), . . . , P(c^{N−1})], i = 1, . . . , N − 1:

[ α_{11} − α_{N1}        . . .   α_{1,N−1} − α_{N,N−1}   ] [ P(c^1)     ]   [ γ_N − γ_1     ]
[        ...                             ...             ] [   ...      ] = [     ...       ]
[ α_{N−1,1} − α_{N1}     . . .   α_{N−1,N−1} − α_{N,N−1} ] [ P(c^{N−1}) ]   [ γ_N − γ_{N−1} ]
Finding The Convex Combination
Two Class Case
Find P(c^1) that solves α_{11} P(c^1) + γ_1 = α_{21} P(c^1) + γ_2.
Call the solution P_0(c^1).
Consider a mixed decision rule whose expected gain equals α_{21} P_0(c^1) + γ_2 for every prior P(c^1):

λ(α_{11} P(c^1) + γ_1) + (1 − λ)(α_{21} P(c^1) + γ_2) = α_{21} P_0(c^1) + γ_2
(λα_{11} + (1 − λ)α_{21}) P(c^1) = α_{21} P_0(c^1) + γ_2 − λγ_1 − (1 − λ)γ_2
                                 = α_{21} P_0(c^1) − λ(γ_1 − γ_2)

Therefore, λα_{11} + (1 − λ)α_{21} = 0 and λ = −α_{21}/(α_{11} − α_{21}).
Finding The Convex Combination
Two Class Case
Identity in P(c^1), meaning for all P(c^1):

0 ≤ λ_1, λ_2 ≤ 1
λ_1 + λ_2 = 1
λ_1(α_{11} P(c^1) + γ_1) + λ_2(α_{21} P(c^1) + γ_2) = α_{21} P_0(c^1) + γ_2
(λ_1 α_{11} + λ_2 α_{21}) P(c^1) = α_{21} P_0(c^1) + γ_2 − λ_1 γ_1 − λ_2 γ_2

This implies

λ_1 α_{11} + λ_2 α_{21} = 0
λ_1 γ_1 + λ_2 γ_2 = α_{21} P_0(c^1) + γ_2
λ_1 + λ_2 = 1
Finding the Convex Combination
λ_1 α_{11} + λ_2 α_{21} = 0
λ_1 γ_1 + λ_2 γ_2 = α_{21} P_0(c^1) + γ_2
λ_1 + λ_2 = 1

From the first equation, λ_2 = −λ_1 α_{11}/α_{21}. Then

λ_1 + λ_2 = λ_1 (1 − α_{11}/α_{21}) = 1
λ_1 = α_{21}/(α_{21} − α_{11})
Finding the Convex Combination: Consistency Check
0 ≤ λ_1, λ_2 ≤ 1
λ_1 = α_{21}/(α_{21} − α_{11})

Either α_{21} − α_{11} > 0 or α_{21} − α_{11} < 0.
If α_{21} − α_{11} > 0, then α_{21} > α_{11}, and λ_1 ≥ 0 requires α_{21} > 0.
If α_{21} − α_{11} < 0, then α_{21} < α_{11}, and λ_1 ≥ 0 requires α_{21} < 0.
Finding the Convex Combination
Once λ_1 and λ_2 are known, the exact value for P_0(c^1) can be determined.

λ_1 γ_1 + (1 − λ_1)γ_2 = α_{21} P_0(c^1) + γ_2
λ_1(γ_1 − γ_2) = α_{21} P_0(c^1)

P_0(c^1) = λ_1(γ_1 − γ_2)/α_{21}
         = [α_{21}/(α_{21} − α_{11})] · (γ_1 − γ_2)/α_{21}
         = (γ_1 − γ_2)/(α_{21} − α_{11})
Finding The Convex Combination
N Class Case
Identity in P(c^1), . . . , P(c^{N−1}):

Σ_{n=1}^{N} λ_n ( Σ_{i=1}^{N−1} α_{ni} P(c^i) + γ_n ) = Σ_{i=1}^{N−1} α_{Ni} P_0(c^i) + γ_N

Σ_{i=1}^{N−1} ( Σ_{n=1}^{N} λ_n α_{ni} ) P(c^i) = Σ_{i=1}^{N−1} α_{Ni} P_0(c^i) + γ_N − Σ_{n=1}^{N} λ_n γ_n

This implies

Σ_{n=1}^{N} λ_n α_{ni} = 0,  i = 1, . . . , N − 1
Σ_{n=1}^{N} λ_n γ_n = Σ_{i=1}^{N−1} α_{Ni} P_0(c^i) + γ_N
Σ_{n=1}^{N} λ_n = 1
Finding The Convex Combination
Each component decision rule of the mixture has an expected gain that is a hyperplane in the axes P(c^1), . . . , P(c^{N−1}).
The first N − 1 entries of the n-th column are the coefficients of P(c^1), . . . , P(c^{N−1}) for the n-th hyperplane:

[ α_{11}      α_{21}      . . .   α_{N1}     ] [ λ_1 ]   [ 0 ]
[ α_{12}      α_{22}      . . .   α_{N2}     ] [ λ_2 ]   [ 0 ]
[   ...                             ...      ] [ ... ] = [ ...]
[ α_{1,N−1}   α_{2,N−1}   . . .   α_{N,N−1}  ] [     ]   [ 0 ]
[    1           1        . . .      1       ] [ λ_N ]   [ 1 ]
Dependence on Class Prior Probabilities
[Figure: the lines E[e; f, P(c^1)] for the deterministic decision rules together with the maximin mixed rule f_M = (4/7)f^5 + (3/7)f^7, whose expected gain is the constant 1.014 for every P(c^1).]
Conditional Expected Gains: All Decision Rules
[Figure: the conditional expected gain pairs (E[e|c^1; f], E[e|c^2; f]) of the deterministic rules, together with the maximin mixed rule f_M at (71/70, 71/70).]
Two Entity Game
The game is played for a large number of trials.
Nature chooses class c in accordance with class priors
P(c 1 ) . . . , P(c K )
A measurement d is sampled in accordance with P(d | c)
Bayes chooses decision rule to maximize expected gain
under given class priors
Suppose nature chooses class priors so that the Bayes gain is
minimized. Bayes chooses to maximize expected gain under
worst priors. But suppose nature does not choose c in
accordance with worst priors.
Maximin Decision Rule
There is a mixed decision rule that guarantees that regardless
of what class priors nature chooses, the expected gain is equal
to the Bayes gain under the worst class priors. This is the
maximin decision rule.
Dependence on Class Prior Probabilities
[Figure: the lines E[e; f, P(c^1)] together with the maximin mixed rule f_M = (4/7)f^5 + (3/7)f^7, shown as a horizontal line at 71/70 ≈ 1.014.]
Maximin Decision Rule
Definition
A decision rule f is a Maximin Decision Rule if and only if

min_{P(c^1),...,P(c^K)} Σ_{j=1}^{K} E[e | c^j; f] P(c^j) ≥ min_{P(c^1),...,P(c^K)} Σ_{j=1}^{K} E[e | c^j; g] P(c^j)

for any decision rule g, where

E[e | c^j; f] = Σ_{d∈D} Σ_{k=1}^{K} e(c^j, c^k) P(d | c^j) f_d(c^k)
Maximin Decision Rule
Theorem
A decision rule f is a maximin decision rule if and only if

min_{j=1,...,K} E[e | c^j; f] ≥ min_{j=1,...,K} E[e | c^j; g]

for any decision rule g.

Theorem
A decision rule f is a maximin decision rule if and only if

min_{P(c^1),...,P(c^K)} E[e; f, P(c^1), . . . , P(c^K)] ≥ min_{P(c^1),...,P(c^K)} E[e; g, P(c^1), . . . , P(c^K)]

for any decision rule g.
Proof.
Recall

E[e; f, P(c^1), . . . , P(c^K)] = E[e; f] = Σ_{j=1}^{K} E[e | c^j; f] P(c^j)
Maximin Decision Rule
A decision rule f is a maximin decision rule if and only if the
expected gain of f is the same as the expected gain of the
Bayes rule under the worst possible prior class probabilities.
Theorem
Let G be the Bayes Economic Gain under the worst prior class
probabilities. Then f is a maximin decision rule if and only if
E[e | c j ; f ] = G, j = 1, . . . , K
Maximin Decision Rule
[Figure: the conditional expected gain pairs (E[e|c^1; f], E[e|c^2; f]) of the deterministic rules; the maximin mixed rule f_M lies at (1.014, 1.014) on the line E[e|c^1; f] = E[e|c^2; f].]
Maximin Decision Rule
Let P(c^1), . . . , P(c^K) be given class prior probabilities. Let f^m, m = 1, . . . , M, be M deterministic decision rules satisfying

G = Σ_{j=1}^{K} E[e | c^j; f^m] P(c^j),  m = 1, . . . , M

Then there exist λ_m, λ_m ≥ 0, m = 1, . . . , M, with Σ_{m=1}^{M} λ_m = 1, satisfying

G = E[e | c^j; Σ_{m=1}^{M} λ_m f^m],  j = 1, . . . , K

Note:

E[e | c^j; Σ_{m=1}^{M} λ_m f^m] = Σ_{m=1}^{M} λ_m E[e | c^j; f^m]
Maximin Decision Rule
Let P(c^1), . . . , P(c^K) be the worst priors.
Let G_w be the worst Bayes gain.
Let f^m, m = 1, . . . , M, be deterministic decision rules satisfying G_w = Σ_{j=1}^{K} E[e | c^j; f^m] P(c^j).
Find a convex combination λ_1, . . . , λ_M satisfying

G_w = E[e | c^j; Σ_{m=1}^{M} λ_m f^m] = Σ_{m=1}^{M} λ_m E[e | c^j; f^m],  j = 1, . . . , K

Let a_{jm} = E[e | c^j; f^m]. Equivalently, find a convex combination λ_1, . . . , λ_M satisfying

G_w = Σ_{m=1}^{M} λ_m a_{jm},  j = 1, . . . , K
Existence of Mixed Decision Rule Strategy
Theorem
Let a_{jm} be real numbers, j = 1, . . . , K; m = 1, . . . , M. Let p_j ≥ 0 and Σ_{j=1}^{K} p_j = 1. Suppose

G = Σ_{j=1}^{K} p_j a_{jm},  m = 1, . . . , M

Then there exist λ_m, m = 1, . . . , M, with λ_m ≥ 0 and Σ_{m=1}^{M} λ_m = 1, satisfying

G = Σ_{m=1}^{M} a_{jm} λ_m,  j = 1, . . . , K
Dependence on Class Prior Probabilities
[Figure: the lines E[e; f, P(c^1)]; the worst class priors are P(c^1) = 4/7 ≈ .5714 and P(c^2) = 3/7 ≈ .4286, where the Bayes gain equals 71/70.]
Maximin Decision Rule
[Figure: the conditional expected gain pairs (E[e|c^1; f], E[e|c^2; f]) of the deterministic rules; the maximin mixed rule f_M = (4/7)f^5 + (3/7)f^7 lies at (71/70, 71/70).]
Find Worst Priors
Find priors p_1^0, . . . , p_K^0 satisfying

max_f E[e; f, p_1^0, . . . , p_K^0] ≤ max_g E[e; g, p_1, . . . , p_K]

for every prior p_1, . . . , p_K, where f and g are Bayes decision rules, Σ_{k=1}^{K} p_k = 1, and p_k ≥ 0.

Choose priors q_1^0, . . . , q_K^0.
Determine the Bayes gain corresponding to q_1^0, . . . , q_K^0.
Iterate to a fixed point:

(q_1^n, . . . , q_K^n) = (q_1^{n−1}, . . . , q_K^{n−1}) + (π_1, . . . , π_K)

where Σ_{k=1}^{K} π_k = 0, so that Σ_{k=1}^{K} q_k^n = 1 and 0 ≤ q_k^n ≤ 1, and so that over all Bayes decision rules f and g,

max_f E[e; f, q_1^n, . . . , q_K^n] ≤ max_g E[e; g, q_1^{n−1}, . . . , q_K^{n−1}]

Let p_1^0, . . . , p_K^0 be the worst priors.
Find all deterministic Bayes decision rules f^1, . . . , f^M such that for each m = 1, . . . , M,

G_w = E[e; f^m, p_1^0, . . . , p_K^0] = max_f E[e; f, p_1^0, . . . , p_K^0]

Find λ_1, . . . , λ_M with 0 ≤ λ_m ≤ 1 and Σ_{m=1}^{M} λ_m = 1 to satisfy

G_w = E[e | c^j; Σ_{m=1}^{M} λ_m f^m],  j = 1, . . . , K

A brute-force alternative for small problems is sketched below.
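For the two class example, one can simply scan P(c^1) over a grid and minimize the Bayes gain; this brute-force sketch is my own (not the iteration described on the slide) and recovers P(c^1) ≈ 4/7 with G_w ≈ 71/70.

```python
import numpy as np

# Brute-force sketch for the two class example: scan P(c^1) and minimize the Bayes gain.
cond_gains = np.array([                 # (E[e|c^1; f], E[e|c^2; f]) for f^1, ..., f^8
    [2.0, -1.0], [0.5, -0.7], [1.1, 0.2], [-0.4, 0.5],
    [1.4, 0.5], [-0.1, 0.8], [0.5, 1.7], [-1.0, 2.0]])

def bayes_gain(p1):
    return max(g1 * p1 + g2 * (1.0 - p1) for g1, g2 in cond_gains)

grid = np.linspace(0.0, 1.0, 100001)
worst = min(grid, key=bayes_gain)
print(worst, bayes_gain(worst))         # ≈ 0.5714 (= 4/7), ≈ 1.0143 (= 71/70)
```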
Maximin Decision Rules
Find λ_1, . . . , λ_M to satisfy G_w = Σ_{m=1}^{M} λ_m E[e | c^j; f^m], j = 1, . . . , K.
Let α_{jm} = E[e | c^j; f^m].
Find λ_1, . . . , λ_M to satisfy

[ α_{11}   α_{12}   . . .   α_{1M} ] [ λ_1 ]   [ G_w ]
[ α_{21}   α_{22}   . . .   α_{2M} ] [ λ_2 ]   [ G_w ]
[   ...                      ...   ] [ ... ] = [ ... ]
[ α_{K1}   α_{K2}   . . .   α_{KM} ] [     ]   [ G_w ]
[    1        1     . . .      1   ] [ λ_M ]   [  1  ]
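For the running example, K = 2 and the two deterministic Bayes rules at the worst priors are f^5 and f^7, so the system is 3 × 2; the illustrative check below solves it by least squares (a consistent overdetermined system), using the α values from the earlier table.

```python
import numpy as np

# Illustrative check for the example: columns are the two deterministic Bayes rules
# at the worst priors, f^5 and f^7; G_w = 71/70 is the Bayes gain there.
Gw = 71 / 70
A = np.array([[1.4, 0.5],      # E[e|c^1; f^5], E[e|c^1; f^7]
              [0.5, 1.7],      # E[e|c^2; f^5], E[e|c^2; f^7]
              [1.0, 1.0]])     # the lambdas must sum to one
b = np.array([Gw, Gw, 1.0])

lam, *_ = np.linalg.lstsq(A, b, rcond=None)
print(lam)                     # ≈ [0.5714, 0.4286] = [4/7, 3/7]
```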
Maximin Decision Rule
Alternatively, eliminate G_w by working with differences across classes. Let α_{jm} = E[e | c^j; f^m] − E[e | c^K; f^m].
Find λ_1, . . . , λ_M to satisfy

[ α_{11}      α_{12}      . . .   α_{1M}     ] [ λ_1 ]   [ 0 ]
[ α_{21}      α_{22}      . . .   α_{2M}     ] [ λ_2 ]   [ 0 ]
[   ...                             ...      ] [ ... ] = [ ...]
[ α_{K−1,1}   α_{K−1,2}   . . .   α_{K−1,M}  ] [     ]   [ 0 ]
[    1           1        . . .      1       ] [ λ_M ]   [ 1 ]

The solution is valid only if 0 ≤ λ_1, . . . , λ_M ≤ 1.