Notes #25 Ch11A

Ch. 11
General Bayesian Estimators
Introduction
In Chapter 10 we:
• introduced the idea of “a priori” information on θ
  ⇒ use a “prior” pdf: p(θ)
• defined a new optimality criterion ⇒ Bayesian MSE
• showed the Bmse is minimized by E{θ|x}, called the “mean of the posterior pdf” or the “conditional mean”
In Chapter 11 we will:
• define a more general optimality criterion
⇒ leads to several different Bayesian approaches
⇒ includes Bmse as special case
Why?
Provides flexibility in balancing:
• model,
• performance, and
• computations
11.3 Risk Functions
Previously we used Bmse as the Bayesian measure to minimize
Bmse = E{(θ − θ̂)²}   w.r.t. p(x, θ)

Define the error: ε ≜ θ − θ̂
So, Bmse is… the expected value of the square of the error.
Let’s write this in a way that will allow us to generalize it.

Define a quadratic Cost Function: C(ε) = ε² = (θ − θ̂)²
Then we have that Bmse = E{C(ε)}

Why limit the cost function to just quadratic?

[Sketch: the quadratic cost C(ε) = ε² plotted vs. ε]
General Bayesian Criteria
1. Define a cost function: C(ε)
2. Define Bayes Risk: R = E{C(ε)} w.r.t. p(x,θ )
   ⇒ R(θ̂) = E{C(θ − θ̂)}   (depends on the choice of estimator)
3. Minimize Bayes Risk w.r.t. estimate θˆ
The choice of the cost function can be tailored to:
• Express importance of avoiding certain kinds of errors
• Yield desirable forms for estimates
– e.g., easily computed
• Etc.
Three Common Cost Functions
1. Quadratic: C(ε) = ε²
2. Absolute: C(ε) = |ε|
3. Hit-or-Miss: C(ε) = 0 for |ε| < δ, and C(ε) = 1 for |ε| ≥ δ, with δ > 0 and small

[Sketches: each cost function C(ε) plotted vs. ε; the hit-or-miss cost is zero on (−δ, δ) and one outside it]
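For concreteness, here is a minimal sketch of these three cost functions in Python (the threshold value `delta` is just an illustrative choice, not from the notes):

```python
import numpy as np

def quadratic_cost(eps):
    """C(eps) = eps^2."""
    return eps ** 2

def absolute_cost(eps):
    """C(eps) = |eps|."""
    return np.abs(eps)

def hit_or_miss_cost(eps, delta=0.1):
    """C(eps) = 0 if |eps| < delta, else 1 (delta > 0 and small)."""
    return (np.abs(eps) >= delta).astype(float)
```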
General Bayesian Estimators
Derive how to choose estimator to minimize the chosen risk:
R(θ̂) = E{C(θ − θ̂)}
      = ∫∫ C(θ − θ̂) p(x, θ) dx dθ          [use p(x, θ) = p(θ|x) p(x)]
      = ∫ [ ∫ C(θ − θ̂) p(θ|x) dθ ] p(x) dx

Define the inner integral as g(θ̂) ≜ ∫ C(θ − θ̂) p(θ|x) dθ.
Since p(x) ≥ 0, minimizing R(θ̂) requires minimizing g(θ̂) for each x value.

So… for a given desired cost function…
you have to find the form of the optimal estimator.
The Optimal Estimates for the Typical Costs
1. Quadratic: R(θ̂) = E{(θ − θ̂)²} = Bmse(θ̂)
   ⇒ θ̂ = E{θ|x} = mean of p(θ|x)   (as we saw in Ch. 10)

2. Absolute: R(θ̂) = E{|θ − θ̂|}
   ⇒ θ̂ = median of p(θ|x)

3. Hit-or-Miss:
   ⇒ θ̂ = mode of p(θ|x)   (“Maximum A Posteriori” or MAP)

If p(θ|x) is unimodal & symmetric, then mean = median = mode.

[Sketch: a skewed p(θ|x) with the mode, median, and mean marked at distinct points on the θ axis]
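As a quick numerical sanity check of these three results, here is a sketch (assuming NumPy/SciPy; a Gamma(3) density stands in for a skewed posterior p(θ|x), and δ = 0.05 is an arbitrary small threshold — both purely illustrative choices): minimizing the sampled risk g(θ̂) over a grid of candidate estimates recovers the posterior mean, median, and mode, respectively.

```python
import numpy as np
from scipy import stats

# Illustrative skewed "posterior" p(theta|x): a Gamma(3) density (mode = 2, mean = 3).
posterior = stats.gamma(a=3.0)
theta = np.linspace(0.0, 20.0, 4001)          # integration grid over theta
dtheta = theta[1] - theta[0]
p = posterior.pdf(theta)

def bayes_risk(cost, theta_hat):
    """g(theta_hat) = integral of C(theta - theta_hat) p(theta|x) dtheta (Riemann sum)."""
    return np.sum(cost(theta - theta_hat) * p) * dtheta

grid = np.linspace(0.0, 10.0, 2001)           # candidate estimates theta_hat
quad = [bayes_risk(lambda e: e**2, th) for th in grid]
absl = [bayes_risk(np.abs, th) for th in grid]
hit  = [bayes_risk(lambda e: (np.abs(e) >= 0.05).astype(float), th) for th in grid]

print("quadratic   ->", grid[np.argmin(quad)], " (posterior mean   =", posterior.mean(), ")")
print("absolute    ->", grid[np.argmin(absl)], " (posterior median =", posterior.median(), ")")
print("hit-or-miss ->", grid[np.argmin(hit)],  " (posterior mode   = 2.0 )")
```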
Derivation for Absolute Cost Function
Writing out the function to be minimized gives:
g(θ̂) = ∫_{−∞}^{∞} |θ − θ̂| p(θ|x) dθ
      = ∫_{−∞}^{θ̂} (θ̂ − θ) p(θ|x) dθ + ∫_{θ̂}^{∞} (θ − θ̂) p(θ|x) dθ

(in the first region |θ − θ̂| = θ̂ − θ; in the second, |θ − θ̂| = θ − θ̂)

Now set ∂g(θ̂)/∂θ̂ = 0 and use Leibnitz’s rule for ∂/∂u ∫_{φ1(u)}^{φ2(u)} h(u, v) dv:

⇒ ∫_{−∞}^{θ̂} p(θ|x) dθ − ∫_{θ̂}^{∞} p(θ|x) dθ = 0
which is satisfied if… (area to the left) = (area to the right)
⇒ Median of conditional PDF
Derivation for Hit-or-Miss Cost Function
Writing out the function to be minimized gives:
g(θ̂) = ∫_{−∞}^{∞} C(θ − θ̂) p(θ|x) dθ
      = ∫_{−∞}^{θ̂−δ} 1·p(θ|x) dθ + ∫_{θ̂+δ}^{∞} 1·p(θ|x) dθ
      = 1 − ∫_{θ̂−δ}^{θ̂+δ} p(θ|x) dθ

(i.e., 1 minus the probability left out of the cost)

To minimize g(θ̂), maximize that remaining integral.
So… center the integral around peak of integrand
⇒ Mode of conditional PDF
11.4 MMSE Estimators
We’ve already seen the solution for the scalar parameter case
θ̂ = E{θ|x} = mean of p(θ|x)
Here we’ll look at:
• Extension to the vector parameter case
• Analysis of Useful Properties
Vector MMSE Estimator
The criterion is… minimize the MSE for each component.

Vector parameter: θ = [θ1 θ2 … θp]ᵀ
Vector estimate:  θ̂ = [θ̂1 θ̂2 … θ̂p]ᵀ

θ̂ is chosen to minimize each of the MSE elements:

E{(θi − θ̂i)²} = ∫∫ (θi − θ̂i)² p(x, θi) dx dθi

where p(x, θi) is p(x, θ) integrated over all the other θj’s.

From the scalar case we know the solution is:

θ̂i = ∫ θi p(θi|x) dθi
    = ∫ ⋯ ∫ θi p(θ1, …, θp | x) dθ1 ⋯ dθp
    = ∫ θi p(θ|x) dθ

⇒ θ̂i = E{θi|x}
So… putting all these into a vector gives:
θ̂ = [θ̂1 θ̂2 … θ̂p]ᵀ
   = [E{θ1|x} E{θ2|x} … E{θp|x}]ᵀ
   = E{[θ1 θ2 … θp]ᵀ | x}

⇒ θ̂ = E{θ|x}   Vector MMSE Estimate = Vector Conditional Mean

Similarly… Bmse(θ̂i) = ∫ [Cθ|x]ii p(x) dx,   i = 1, …, p

where Cθ|x = Eθ|x{ [θ − E{θ|x}][θ − E{θ|x}]ᵀ }
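As a sketch of these formulas in the jointly Gaussian case (using the Bayesian Linear Model x = Hθ + w from Ch. 10, with a made-up random H and illustrative prior/noise variances), Cθ|x does not depend on x, so a Monte Carlo estimate of Bmse(θ̂i) should match [Cθ|x]ii:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 50, 2
sigma_theta2, sigma2 = 2.0, 0.5            # illustrative prior and noise variances
H = rng.standard_normal((N, p))            # illustrative observation matrix

# Posterior covariance for the Gaussian linear model (zero-mean prior):
# C_theta|x = (I/sigma_theta2 + H^T H / sigma2)^-1, theta_hat = C_theta|x H^T x / sigma2
C_post = np.linalg.inv(np.eye(p) / sigma_theta2 + H.T @ H / sigma2)

# Monte Carlo check that Bmse(theta_hat_i) = [C_theta|x]_ii
trials = 20000
err2 = np.zeros(p)
for _ in range(trials):
    theta = rng.normal(0.0, np.sqrt(sigma_theta2), size=p)
    x = H @ theta + rng.normal(0.0, np.sqrt(sigma2), size=N)
    theta_hat = C_post @ (H.T @ x) / sigma2          # E{theta | x}
    err2 += (theta - theta_hat) ** 2

print("Monte Carlo Bmse:      ", err2 / trials)
print("Diagonal of C_theta|x: ", np.diag(C_post))
```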
Ex. 11.1 Bayesian Fourier Analysis
Signal model is:

x[n] = a cos(2πf₀n) + b sin(2πf₀n) + w[n],   n = 0, 1, …, N − 1

where θ = [a b]ᵀ ~ N(0, σθ² I), w[n] is AWGN with zero mean and variance σ², and θ and w[n] are independent for each n.

This is a common propagation model called Rayleigh fading.

Write in matrix form (Bayesian Linear Model):

x = Hθ + w,   where the first column of H holds the cosine samples and the second column the sine samples.
Results from Ch. 10 show that
θ̂ = E{θ|x} = [ (1/σθ²) I + HᵀH/σ² ]⁻¹ Hᵀx / σ²

Cθ|x = [ (1/σθ²) I + HᵀH/σ² ]⁻¹

For f₀ chosen such that H has orthogonal columns (so that HᵀH = (N/2) I), this reduces to

â = β (2/N) Σ_{n=0}^{N−1} x[n] cos(2πf₀n)

b̂ = β (2/N) Σ_{n=0}^{N−1} x[n] sin(2πf₀n)

where β = 1 / (1 + (2σ²/N)/σθ²), and the scaled sums are the classical Fourier coefficients.

Recall: same form as the classical result, except there β = 1.
Note: β ≈ 1 if σθ² >> 2σ²/N
⇒ if prior knowledge is poor, this degrades to the classical estimator.
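A minimal simulation sketch of this example (all numerical values — N, f₀, σθ², σ² — are illustrative choices): generate data from the model, form the classical Fourier coefficients, then shrink them by β to get the MMSE estimates.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 64
f0 = 8 / N                                    # multiple of 1/N, so H has orthogonal columns
sigma_theta2, sigma2 = 1.0, 0.5               # illustrative prior and noise variances
n = np.arange(N)

# Draw (a, b) from the prior; generate x[n] = a cos(2*pi*f0*n) + b sin(2*pi*f0*n) + w[n]
a, b = rng.normal(0.0, np.sqrt(sigma_theta2), size=2)
x = (a * np.cos(2 * np.pi * f0 * n) + b * np.sin(2 * np.pi * f0 * n)
     + rng.normal(0.0, np.sqrt(sigma2), size=N))

# Classical Fourier coefficients (the beta = 1 case)
a_cl = (2 / N) * np.sum(x * np.cos(2 * np.pi * f0 * n))
b_cl = (2 / N) * np.sum(x * np.sin(2 * np.pi * f0 * n))

# MMSE estimates: shrink toward the prior mean (zero) by beta
beta = 1.0 / (1.0 + (2 * sigma2 / N) / sigma_theta2)
a_hat, b_hat = beta * a_cl, beta * b_cl

print("true (a, b):        ", (a, b))
print("classical (beta=1): ", (a_cl, b_cl))
print("MMSE (beta=%.3f):   " % beta, (a_hat, b_hat))
```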
Impact of Poor Prior Knowledge
Conclusion:
For poor prior knowledge in Bayesian Linear Model
MMSE Est. → MVU Est.
Can see this holds in general. Recall that

θ̂ = E{θ|x} = μθ + [ Cθ⁻¹ + HᵀCw⁻¹H ]⁻¹ HᵀCw⁻¹ (x − Hμθ)

For no prior information: Cθ⁻¹ → 0 and μθ → 0, so

θ̂ → [ HᵀCw⁻¹H ]⁻¹ HᵀCw⁻¹ x   = the MVUE for the General Linear Model
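This limit is easy to see numerically. A small sketch (the observation matrix H, noise variance, and true θ below are all made-up values): as σθ² grows, Cθ⁻¹ → 0 and the Bayesian estimate approaches the MVUE.

```python
import numpy as np

rng = np.random.default_rng(2)
N, p = 40, 3
H = rng.standard_normal((N, p))               # illustrative observation matrix
Cw_inv = np.eye(N) / 0.5                      # white noise, variance 0.5 (illustrative)
theta_true = np.array([1.0, -2.0, 0.5])
x = H @ theta_true + rng.normal(0.0, np.sqrt(0.5), size=N)

# MVUE for the General Linear Model: (H^T Cw^-1 H)^-1 H^T Cw^-1 x
mvue = np.linalg.solve(H.T @ Cw_inv @ H, H.T @ Cw_inv @ x)

# Bayesian MMSE estimate with zero prior mean and C_theta = sigma_theta2 * I
for sigma_theta2 in [0.1, 1.0, 100.0, 1e6]:
    C_theta_inv = np.eye(p) / sigma_theta2
    theta_hat = np.linalg.solve(C_theta_inv + H.T @ Cw_inv @ H, H.T @ Cw_inv @ x)
    print(f"sigma_theta2={sigma_theta2:>10}: ||theta_hat - mvue|| = "
          f"{np.linalg.norm(theta_hat - mvue):.2e}")
```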
Useful Properties of MMSE Est.
1. Commutes over affine mappings:
   If α = Aθ + b, then α̂ = Aθ̂ + b   (will be used for the Kalman filter)

2. Additive property for independent data sets:
   Assume θ, x1, x2 are jointly Gaussian with x1 and x2 independent. Then

   θ̂ = E{θ} + Cθx1 Cx1⁻¹ [x1 − E{x1}] + Cθx2 Cx2⁻¹ [x2 − E{x2}]

   The three terms are the a priori estimate, the update due to x1, and the update due to x2.

   Proof: Let x = [x1ᵀ x2ᵀ]ᵀ. The jointly Gaussian assumption gives

   θ̂ = E{θ} + Cθx Cx⁻¹ [x − E{x}]
      = E{θ} + [Cθx1  Cθx2] [Cx1⁻¹ 0; 0 Cx2⁻¹] [x1 − E{x1}; x2 − E{x2}]

   (independence ⇒ Cx is block diagonal)

   Simplify to get the result.

3. Jointly Gaussian case leads to a linear estimator: θ̂ = Px + m
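A sketch verifying property 2 numerically, under an assumed jointly Gaussian toy model (scalar θ, a 2-dimensional x1 block and a scalar x2 block, with a made-up joint covariance whose x1–x2 cross block is zero): the joint-vector form and the two-term additive form give the same θ̂.

```python
import numpy as np

# Assumed joint covariance of z = [theta, x1 (2-dim), x2 (1-dim)], chosen so the
# x1 and x2 blocks are uncorrelated (=> independent, since jointly Gaussian).
C = np.array([
    [2.0, 0.8, 0.5, 0.6],
    [0.8, 1.0, 0.2, 0.0],
    [0.5, 0.2, 1.2, 0.0],
    [0.6, 0.0, 0.0, 1.5],
])
mu = np.array([1.0, 0.0, 0.0, 0.0])          # E{theta} = 1, zero-mean observations
z = np.array([1.3, 0.4, -0.7, 0.9])          # one illustrative realization [theta, x1, x2]

C_tx1, C_tx2 = C[0, 1:3], C[0, 3:4]          # Cov(theta, x1), Cov(theta, x2)
C_x1, C_x2 = C[1:3, 1:3], C[3:4, 3:4]        # Cov(x1), Cov(x2)
x1, x2 = z[1:3], z[3:4]

# Joint form: theta_hat = E{theta} + C_theta_x C_x^-1 (x - E{x})
C_tx, C_x = C[0, 1:], C[1:, 1:]
joint = mu[0] + C_tx @ np.linalg.solve(C_x, np.concatenate([x1, x2]) - mu[1:])

# Additive form: a priori estimate plus separate updates from x1 and from x2
additive = (mu[0]
            + C_tx1 @ np.linalg.solve(C_x1, x1 - mu[1:3])
            + C_tx2 @ np.linalg.solve(C_x2, x2 - mu[3:4]))

print(joint, additive)    # identical, since C_x is block diagonal
```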