MMSE Estimation for Sparse Representation Modeling *
Michael Elad
The Computer Science Department
The Technion – Israel Institute of Technology
Haifa 32000, Israel

Joint work with
Irad Yavneh, Matan Protter, and Javier Turek
The CS Department, The Technion
Noise Removal?
In this talk we focus on signal denoising …
[Figure: a noisy image – can we remove the additive noise?]
Important: (i) Practical application; (ii) A convenient platform
for testing basic ideas in signal/image processing.
Many Considered Directions: Partial differential equations, Adaptive
filters, Statistical estimators, Inverse problems & regularization,
Wavelets, Example-based techniques, Sparse representations, …
Main Message Today: We introduce an estimation point of view to sparse approximation. This sheds new light on ideas we have already seen, while also leading to new and quite surprising ones.
Agenda
1. Background on Denoising with Sparse Representations
2. Using More than One Representation: Intuition
3. Crash Course on Estimation Theory
4. Using More than One Representation: Theory
5. A Closer Look At the Unitary Case
6. Summary and Conclusions
Part I
Background on
Denoising with Sparse
Representations
Denoising By Energy Minimization
Many of the proposed signal denoising algorithms are related to the
minimization of an energy function of the form
    f(x) = \tfrac{1}{2}\|x - y\|_2^2 + \Pr(x)

y : Given measurements
x : Unknown to be recovered
The first term expresses the relation to the measurements, and \Pr(x) is the prior or regularization term.
This is in fact a Bayesian point of view, adopting the Maximum A-posteriori Probability (MAP) estimation.
Clearly, the wisdom in such an approach is within the
choice of the prior – modeling the signals of interest.
Thomas Bayes
1702 - 1761
Sparse Representation Modeling
[Figure: x = Dα. The signal x (of length N) is generated by multiplying a fixed N×K dictionary D by a sparse & random vector α of length K.]
Every column in D (the dictionary) is a prototype signal (atom).
The vector α is generated randomly, with few (say L for now) non-zeros at random locations and with random values.
Back to Our MAP Energy Function
The L0 "norm" \|\alpha\|_0 effectively counts the number of non-zeros in α.

    \hat{\alpha} = \arg\min_{\alpha} \|D\alpha - y\|_2^2 \ \text{s.t.} \ \|\alpha\|_0 \le L, \qquad \hat{x} = D\hat{\alpha}

The vector α̂ is the representation (sparse/redundant).
Bottom line: denoising of y is done by minimizing

    \min_{\alpha} \|D\alpha - y\|_2^2 \ \text{s.t.} \ \|\alpha\|_0 \le L
    \quad \text{or} \quad
    \min_{\alpha} \|\alpha\|_0 \ \text{s.t.} \ \|D\alpha - y\|_2^2 \le \varepsilon^2
The Solver We Use: Greed Based
The MP is one of the greedy algorithms that finds one atom at a time [Mallat & Zhang ('93)], approximately solving

    \min_{\alpha} \|\alpha\|_0 \ \text{s.t.} \ \|D\alpha - y\|_2^2 \le \varepsilon^2

Step 1: find the one atom that best matches the signal.
Next steps: given the previously found atoms, find the next one to best fit the residual.
The algorithm stops when the error \|D\alpha - y\|_2 drops below the destination threshold.
The Orthogonal MP (OMP) is an improved version that re-evaluates the coefficients by Least-Squares after each round.
Orthogonal Matching Pursuit
OMP finds one atom at a time, approximating the solution of

    \min_{\alpha} \|\alpha\|_0 \ \text{s.t.} \ \|D\alpha - y\|_2^2 \le \varepsilon^2

Initialization: n = 0, \alpha^0 = 0, r^0 = y - D\alpha^0 = y, S^0 = \emptyset.
Main Iteration (n \leftarrow n + 1):
1. Compute E(i) = \min_z \|z d_i - r^{n-1}\|_2^2 for all 1 \le i \le K.
2. Choose i_0 such that E(i_0) \le E(i) for all 1 \le i \le K.
3. Update the support: S^n = S^{n-1} \cup \{i_0\}.
4. LS: \alpha^n = \arg\min_{\alpha} \|D\alpha - y\|_2^2 \ \text{s.t.} \ \mathrm{supp}(\alpha) = S^n.
5. Update the residual: r^n = y - D\alpha^n.
If \|r^n\|_2 \le \varepsilon, stop; otherwise, repeat the main iteration.
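To make the procedure concrete, here is a minimal Python/NumPy sketch of the OMP iteration above. The function name and interface are our own, and the dictionary columns are assumed to have (roughly) unit norm, so that minimizing E(i) amounts to maximizing |d_i^T r|:

```python
import numpy as np

def omp(D, y, eps):
    """Greedy OMP: add one atom at a time until the residual norm drops below eps.
    D is an N-by-K dictionary with (roughly) unit-norm columns, y is the noisy signal."""
    N, K = D.shape
    support = []               # S^n: indices of the chosen atoms
    alpha = np.zeros(K)        # current sparse representation
    residual = y.copy()        # r^n = y - D @ alpha
    while np.linalg.norm(residual) > eps and len(support) < N:
        # Minimizing E(i) = min_z ||z*d_i - r||^2 over i picks the atom with the largest |d_i^T r|
        correlations = np.abs(D.T @ residual)
        correlations[support] = -np.inf            # never pick the same atom twice
        i0 = int(np.argmax(correlations))
        support.append(i0)
        # Least-squares re-evaluation of the coefficients on the current support
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        alpha = np.zeros(K)
        alpha[support] = coeffs
        residual = y - D @ alpha
    return alpha, support
```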
Part II
Using More than One
Representation: Intuition
Back to the Beginning. What If …
Consider the denoising problem

    \min_{\alpha} \|\alpha\|_0 \ \text{s.t.} \ \|D\alpha - y\|_2^2 \le \varepsilon^2

and suppose that we can find a group of J candidate solutions \{\alpha_j\}_{j=1}^{J} such that

    \|\alpha_j\|_0 \ll N \quad \text{and} \quad \|D\alpha_j - y\|_2^2 \le \varepsilon^2.

Basic Questions:
• What could we do with such a set of competing solutions in order to better denoise y?
• Why should this help?
• How shall we practically find such a set of solutions?
Relevant work: [Leung & Barron ('06)], [Larsson & Selen ('07)], [Schniter et al. ('08)], [Elad & Yavneh ('08)], [Giraud ('08)], [Protter et al. ('10)], …
Motivation – General
Why bother with such a set?
• Because each representation conveys a different story about the desired signal.
• Because pursuit algorithms are often wrong in finding the sparsest representation, and relying on a single such solution is too sensitive.
• … Maybe there are "deeper" reasons?
[Figure: Dα₁ and Dα₂ – two competing representations of the same signal over the dictionary D.]
Generating Many Representations
Our* Answer: Randomizing the OMP

Initialization: n = 0, \alpha^0 = 0, r^0 = y - D\alpha^0 = y, S^0 = \emptyset.
Main Iteration (n \leftarrow n + 1):
1. Compute E(i) = \min_z \|z d_i - r^{n-1}\|_2^2 for all 1 \le i \le K.
2. Choose i_0 at random, with probability proportional to \exp\{-c \cdot E(i)\}.
3. Update the support: S^n = S^{n-1} \cup \{i_0\}.
4. LS: \alpha^n = \arg\min_{\alpha} \|D\alpha - y\|_2^2 \ \text{s.t.} \ \mathrm{supp}(\alpha) = S^n.
5. Update the residual: r^n = y - D\alpha^n.
If \|r^n\|_2 \le \varepsilon, stop; otherwise, repeat the main iteration.

For now, let us set the parameter c manually for best performance. Later we shall define a way to set it automatically.

* Larsson and Schniter propose a more complicated deterministic tree-pruning method.
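Here is a minimal sketch of the randomized variant, under the reading that the next atom is drawn with probability proportional to exp{−c·E(i)} (the function name and the numerical-stabilization details are ours):

```python
import numpy as np

def rand_omp(D, y, eps, c, rng=None):
    """One run of Random-OMP: like OMP, but the next atom is drawn at random
    with probability proportional to exp(-c * E(i)) instead of greedily."""
    rng = np.random.default_rng() if rng is None else rng
    N, K = D.shape
    support, alpha = [], np.zeros(K)
    residual = y.copy()
    while np.linalg.norm(residual) > eps and len(support) < N:
        proj = D.T @ residual
        E = np.linalg.norm(residual) ** 2 - proj ** 2     # E(i) for unit-norm atoms
        weights = np.exp(-c * (E - E.min()))              # shift E for numerical stability
        weights = np.maximum(weights, 1e-12)              # keep the draw well defined
        weights[support] = 0.0                            # never re-pick a chosen atom
        i0 = int(rng.choice(K, p=weights / weights.sum()))
        support.append(i0)
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        alpha = np.zeros(K)
        alpha[support] = coeffs
        residual = y - D @ alpha
    return alpha
```

Running this J times and averaging the resulting vectors yields the representation examined on the next slides.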
Generating Many Representations
[Figure: the tree of candidate supports. From the empty initialization S⁰, the first level S¹ holds the singletons {1}, {2}, …, {K}; the second level S² extends each of these by one more atom (e.g., {4,1}, {4,2}, …, {4,K}); the third level S³ adds yet another atom (e.g., {4,3,1}, {4,3,2}, …); and so on.]
The tree of all possible supports grows exponentially and is impossible to cover.
Selen and Schniter proposed to prune this tree by always keeping a fixed-size subgroup of supports that gives the best representation error.
Let's Try
Proposed Experiment:
• Form a random dictionary D of size 100×200.
• Multiply it by a sparse vector α₀ with \|\alpha_0\|_0 = 10, add Gaussian iid noise v with σ = 1, and obtain y = Dα₀ + v.
• Solve the problem

    \min_{\alpha} \|\alpha\|_0 \ \text{s.t.} \ \|D\alpha - y\|_2^2 \le 100

  using OMP, and obtain \alpha^{OMP}.
• Use Random-OMP and obtain \{\alpha_j^{RandOMP}\}_{j=1}^{1000}.
Let's look at the obtained representations …
Some Observations
[Figure: histograms over the Random-OMP runs versus the single OMP solution – (a) the cardinalities, (b) the representation errors \|D\hat{\alpha} - y\|_2^2, and (c) the noise attenuation \|D\hat{\alpha} - D\alpha_0\|_2^2 / \|y - D\alpha_0\|_2^2.]
We see that:
• The OMP gives the sparsest solution.
• Nevertheless, it is not the most effective for denoising.
• The cardinality of a representation does not reveal its efficiency.
The Surprise (at least for us) …
Let's propose the average

    \hat{\alpha} = \frac{1}{1000} \sum_{j=1}^{1000} \alpha_j^{RandOMP}

as our representation. This representation IS NOT SPARSE AT ALL, but it gives

    \frac{\|D\hat{\alpha} - D\alpha_0\|_2^2}{\|y - D\alpha_0\|_2^2} \approx 0.06

[Figure: the averaged representation, the original representation, and the OMP representation, plotted entry by entry over the index 1…200.]
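Putting the pieces together, here is a sketch of the experiment above, reusing the omp and rand_omp sketches from the earlier slides (the dictionary size, cardinality, noise level, and error threshold follow the slide; the value c = 0.5 is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, L, sigma, J = 100, 200, 10, 1.0, 1000

# Random dictionary with normalized columns, sparse alpha0, and a noisy signal y
D = rng.standard_normal((N, K))
D /= np.linalg.norm(D, axis=0)
alpha0 = np.zeros(K)
alpha0[rng.choice(K, size=L, replace=False)] = rng.standard_normal(L)
y = D @ alpha0 + sigma * rng.standard_normal(N)

eps = np.sqrt(N) * sigma          # so that ||D alpha - y||_2^2 <= N*sigma^2 = 100
alpha_omp, _ = omp(D, y, eps)
alpha_avg = np.mean([rand_omp(D, y, eps, c=0.5, rng=rng) for _ in range(J)], axis=0)

def attenuation(a):
    """Noise attenuation ||D a - D alpha0||^2 / ||y - D alpha0||^2 (smaller is better)."""
    return np.linalg.norm(D @ (a - alpha0)) ** 2 / np.linalg.norm(y - D @ alpha0) ** 2

print("OMP:", attenuation(alpha_omp), "  averaged RandOMP:", attenuation(alpha_avg))
```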
Is It Consistent? … Yes!
• Let's repeat this experiment many (1000) times.
• Dictionary (random) of size N=100, K=200.
• The true support of α is of size 10.
• We run OMP for denoising.
• We run RandOMP J=1000 times and average:

    \hat{\alpha} = \frac{1}{J} \sum_{j=1}^{J} \alpha_j^{RandOMP}

• Denoising is assessed by \|D\hat{\alpha} - D\alpha_0\|_2^2 / \|y - D\alpha_0\|_2^2.
[Figure: scatter plot of the RandOMP denoising factor versus the OMP denoising factor over the trials, with the mean point and the cases of a zero solution marked.]
Is It Consistent? … Yes!
The results of 1000 trials with the same parameters lead to the same conclusion …
[Figure: the same OMP-versus-RandOMP denoising-factor scatter plot, with the mean point and the cases of a zero solution marked.]
How could we explain this?
Part III
Crash Course on
Estimation Theory
Estimation Course 101
Suppose that there is an unknown signal x that we want to
measure.
Unfortunately, we get instead a measurement z.
We do know that z is related to x via the following conditional
probabilities
P(z|x) or P(x|z)
Estimation theory is all about tools and algorithms for inferring x
from z somehow.
Obviously, a key element in our story is the need to know P(x|z) –
sometimes this is a tough task.
The Maximum Likelihood
P(z|x)
This conditional is known as the likelihood function. In words, it says what is the probability of getting the measurements z if x is given.
Thus, a natural estimator to propose is the Maximum Likelihood:

    \hat{x}_{ML} = \arg\max_{x} P(z \mid x)

ML is very popular and often used. However, in many situations that we encounter, it is quite weak and useless.
This brings us to the Bayesian approach …
The MAP
P(z|x) → P(x|z)
This change is not just a technicality – it is a deep philosophical
change in the approach we take, because now we consider x as
random as well.
Due to Bayes' rule we have that

    P(x \mid z) = \frac{P(z \mid x) \, P(x)}{P(z)}

The Maximum A-posteriori Probability (MAP) estimator is given by

    \hat{x}_{MAP} = \arg\max_{x} P(x \mid z) = \arg\max_{x} P(z \mid x) \, P(x)
The MAP – A Closer Look
The Maximum A-posteriori Probability (MAP) estimator is given by

    \hat{x}_{MAP} = \arg\max_{x} P(x \mid z) = \arg\max_{x} P(z \mid x) \, P(x)
In words, the MAP seeks the most probable x given the
measurements z, which fits exactly our story (z is given while x is
unknown).
Due to Bayes, we see that MAP resembles ML with one major distinction – P(x) enters and affects the outcome. This is known as the prior.
As said before – this is the Bayesian approach in estimation. However,
this is not the ONLY Bayesian estimator.
MMSE Estimation
Given the posterior probability, P(x|z), what is the best we could do?
Answer: find the estimator that minimizes the Mean-Squared-Error,
given by
    MSE = \int \|\hat{x} - x\|_2^2 \, P(x \mid z) \, dx = E\!\left[ \|\hat{x} - x\|_2^2 \mid z \right]

Let's take a derivative w.r.t. the unknown and get

    2 \int (\hat{x} - x) \, P(x \mid z) \, dx = 0
    \;\;\Rightarrow\;\;
    \hat{x} = \frac{\int x \, P(x \mid z) \, dx}{\int P(x \mid z) \, dx} = E[x \mid z]

The outcome: the Minimum MSE (MMSE) estimator is given by the conditional expectation of x given z,

    \hat{x}_{MMSE} = E[x \mid z]
An Example
Our unknown is a scalar Gaussian RV:  P(x) = C \exp\{-5 (x - 5)^2\}.
The measurement is z = x + n, where n is a Gaussian RV:  P(n) = C \exp\{-\tfrac{1}{2} n^2\}, so that  P(z \mid x) = C \exp\{-\tfrac{1}{2}(x - z)^2\}.
Suppose that the actual measurement was z = 2. Then

    \hat{x}_{ML} = \arg\max_{x} P(z \mid x) = z = 2

In this case, the estimated value is simply the noisy data value z, which means that ML is useless.
An Example
Moving to the MAP estimation, we get that the prior influences the result, and thus the outcome is better:

    \hat{x}_{MAP} = \arg\max_{x} P(z \mid x) \, P(x)
                  = \arg\max_{x} \exp\{-5(x - 5)^2 - \tfrac{1}{2}(x - 2)^2\}
                  = \arg\min_{x} \; 5(x - 5)^2 + \tfrac{1}{2}(x - 2)^2

Setting the derivative to zero:  10(x - 5) + (x - 2) = 0  \;\Rightarrow\;  11x = 52  \;\Rightarrow\;  \hat{x}_{MAP} \approx 4.73
An Example
Moving now to the MMSE estimation, the result … would be the same!
The reason is this – the posterior is Gaussian as well, and thus its
maximum point and its mean coincide.
In general, the two are different:
[Figure: a posterior P(x | z) whose maximum (the MAP) and mean (the MMSE) do not coincide, i.e., \hat{x}_{MAP} \ne \hat{x}_{MMSE}.]
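As a quick numerical sanity check of this scalar example, here is a small grid-based sketch using the densities as reconstructed above (the computation itself is our own illustration):

```python
import numpy as np

# Scalar example: P(x) ∝ exp(-5(x-5)^2), z = x + n with P(n) ∝ exp(-n^2/2), measurement z = 2
z = 2.0
x = np.linspace(-5.0, 15.0, 200001)                  # dense grid over the real line
posterior = np.exp(-5 * (x - 5) ** 2 - 0.5 * (x - z) ** 2)
posterior /= np.trapz(posterior, x)                  # normalized P(x|z) on the grid

x_map = x[np.argmax(posterior)]                      # mode of the posterior
x_mmse = np.trapz(x * posterior, x)                  # mean of the posterior
print(x_map, x_mmse)                                 # both ~4.73: the posterior is Gaussian, so mode = mean
```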
Part IV
Using More than One
Representation: Theory
Our Signal Model
[Figure: x = Dα, with a fixed N×K dictionary D and a sparse random vector α.]
D is fixed and known. Assume that α is built by:
• Choosing the support s with probability P(s) from all the 2^K possibilities Ω. Let us assume that the events {i ∈ s} are drawn independently, with P(i ∈ s) = P_i.
• Choosing the α_s coefficients as iid Gaussian entries, \mathcal{N}(0, \sigma_x^2).
The ideal signal is x = Dα = D_s α_s.
The p.d.f.'s P(α) and P(x) are clear and known.
Adding Noise
[Figure: y = x + v = Dα + v.]
Noise assumed: the noise v is an additive white Gaussian vector with probability P_v(v), so that

    P(y \mid x) = C \exp\left\{ -\frac{\|x - y\|_2^2}{2\sigma^2} \right\}

The conditional p.d.f.'s P(y | α), P(α | y), and even P(y | s), P(s | y), are all clear and well-defined (although they may appear nasty).
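For concreteness, here is a sketch of drawing one example (α, x, y) from this generative model. All names are ours, P_i is taken as a constant P for simplicity, and σ = 0.3 is an arbitrary illustrative value:

```python
import numpy as np

def generate_signal(D, P, sigma_x, sigma, rng):
    """Draw (alpha, x, y): a Bernoulli(P) support, N(0, sigma_x^2) coefficients on it,
    x = D @ alpha, and y = x + white Gaussian noise of std sigma."""
    N, K = D.shape
    support = rng.random(K) < P                      # i in s independently with probability P
    alpha = np.zeros(K)
    alpha[support] = sigma_x * rng.standard_normal(support.sum())
    x = D @ alpha
    y = x + sigma * rng.standard_normal(N)
    return alpha, x, y

rng = np.random.default_rng(1)
D = rng.standard_normal((10, 16))                    # small dictionary, as in the later 10x16 experiment
D /= np.linalg.norm(D, axis=0)
alpha, x, y = generate_signal(D, P=0.1, sigma_x=1.0, sigma=0.3, rng=rng)
```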
The Key – The Posterior P(|y)
We have access to the posterior P(α | y), and several estimators can be derived from it:
• MAP*:  \hat{\alpha}_{MAP} = \arg\max_{\alpha} P(\alpha \mid y)
• Oracle (known support s):  \hat{\alpha}_{oracle}
• MMSE:  \hat{\alpha}_{MMSE} = E[\alpha \mid y]
The estimation of x is done by estimating α and multiplying by D.
The MAP and MMSE estimators are impossible to compute, as we show next.
* Actually, there is a delicate problem with this definition, due to the unavoidable mixture of continuous and discrete PDFs. The solution is to estimate the MAP's support S.
Let's Start with the Oracle *

    P(\alpha_s \mid y, s) = \frac{P(y \mid \alpha_s) \, P(\alpha_s)}{P(y)}

    P(y \mid \alpha_s) = C \exp\left\{ -\frac{\|y - D_s \alpha_s\|_2^2}{2\sigma^2} \right\},
    \qquad
    P(\alpha_s) = C \exp\left\{ -\frac{\|\alpha_s\|_2^2}{2\sigma_x^2} \right\}

    P(\alpha_s \mid y, s) \propto \exp\left\{ -\frac{\|y - D_s \alpha_s\|_2^2}{2\sigma^2} - \frac{\|\alpha_s\|_2^2}{2\sigma_x^2} \right\}

    \hat{\alpha}_s = \left( \frac{1}{\sigma^2} D_s^T D_s + \frac{1}{\sigma_x^2} I \right)^{-1} \frac{1}{\sigma^2} D_s^T y
                  \;\equiv\; Q_s^{-1} h_s

Comments:
• This estimate is both the MAP and the MMSE (for a known support).
• The oracle estimate of x is obtained by multiplication by D_s.
* When s is known.
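A direct sketch of the oracle formula \hat{\alpha}_s = Q_s^{-1} h_s for a known support (the function name is ours):

```python
import numpy as np

def oracle_estimate(D, y, support, sigma, sigma_x):
    """Oracle estimator for a known support s:
    alpha_s = Q_s^{-1} h_s with Q_s = D_s^T D_s / sigma^2 + I / sigma_x^2, h_s = D_s^T y / sigma^2."""
    Ds = D[:, support]
    Qs = Ds.T @ Ds / sigma**2 + np.eye(len(support)) / sigma_x**2
    hs = Ds.T @ y / sigma**2
    alpha = np.zeros(D.shape[1])
    alpha[support] = np.linalg.solve(Qs, hs)
    return alpha
```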
The MAP Estimation
    \hat{s}_{MAP} = \arg\max_{s} P(s \mid y) = \arg\max_{s} \frac{P(y \mid s) \, P(s)}{P(y)}

    P(y \mid s) = \int P(y \mid s, \alpha_s) \, P(\alpha_s) \, d\alpha_s
                = \dots
                \propto \sigma_x^{-|s|} \exp\left\{ \frac{h_s^T Q_s^{-1} h_s + \log\det(Q_s^{-1})}{2} \right\}

Based on our prior for generating the support,

    P(s) = \prod_{i \in s} P_i \prod_{j \notin s} (1 - P_j)

so that

    \hat{s}_{MAP} = \arg\max_{s} \; \exp\left\{ \frac{h_s^T Q_s^{-1} h_s + \log\det(Q_s^{-1})}{2} \right\}
                    \prod_{i \in s} \frac{P_i}{\sigma_x} \prod_{j \notin s} (1 - P_j)
The MAP Estimation
    \hat{s}_{MAP} = \arg\max_{s} \left\{ \frac{h_s^T Q_s^{-1} h_s}{2} + \frac{\log\det(Q_s^{-1})}{2}
                    + \sum_{i \in s} \log\frac{P_i}{\sigma_x} + \sum_{j \notin s} \log(1 - P_j) \right\}

Implications:
• The MAP estimator requires testing all the possible supports for the maximization. For the found support, the oracle formula is used.
• In typical problems, this process is impossible, as there is a combinatorial set of possibilities.
• This is why we rarely use the exact MAP, and we typically replace it with approximation algorithms (e.g., OMP).
The MMSE Estimation
    \hat{\alpha}_{MMSE} = E[\alpha \mid y] = \sum_{s} P(s \mid y) \, E[\alpha \mid y, s]

Here E[\alpha \mid y, s] = Q_s^{-1} h_s = \hat{\alpha}_s is the oracle for the support s, as we have seen before, and P(s \mid y) \propto P(s) \, P(y \mid s) is the support posterior derived above. Hence

    \hat{\alpha}_{MMSE} = \sum_{s} P(s \mid y) \, \hat{\alpha}_s
The MMSE Estimation
    \hat{\alpha}_{MMSE} = E[\alpha \mid y] = \sum_{s} P(s \mid y) \, E[\alpha \mid y, s] = \sum_{s} P(s \mid y) \, \hat{\alpha}_s

Implications:
• The best estimator (in terms of L2 error) is a weighted average of many sparse representations!!!
• As in the MAP case, in typical problems one cannot compute this expression, as the summation is over a combinatorial set of possibilities. We should propose approximations here as well.
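For a dictionary small enough to enumerate all 2^K supports (as in the 10×16 comparison reported on a later slide), the exact MMSE estimate can be computed by brute force. A sketch with weights proportional to P(s)·P(y|s), using the expressions derived above with constant factors shared by all supports dropped (the function name is ours):

```python
import itertools
import numpy as np

def exact_mmse(D, y, P, sigma, sigma_x):
    """Exact MMSE: a weighted average of oracle solutions over ALL supports,
    with weights proportional to P(s) * P(y|s). Feasible only for tiny K."""
    N, K = D.shape
    log_w, alphas = [], []
    for r in range(K + 1):
        for s in itertools.combinations(range(K), r):
            s = list(s)
            Ds = D[:, s]
            Qs = Ds.T @ Ds / sigma**2 + np.eye(r) / sigma_x**2
            hs = Ds.T @ y / sigma**2
            log_ps = r * np.log(P) + (K - r) * np.log(1 - P)      # log P(s)
            if r == 0:
                log_py_s = 0.0                                    # common factors dropped
                alpha_s = np.zeros(0)
            else:
                _, logdet = np.linalg.slogdet(Qs)
                alpha_s = np.linalg.solve(Qs, hs)                 # oracle on this support
                log_py_s = 0.5 * (hs @ alpha_s - logdet) - r * np.log(sigma_x)
            log_w.append(log_ps + log_py_s)
            alpha = np.zeros(K)
            alpha[s] = alpha_s
            alphas.append(alpha)
    w = np.exp(np.array(log_w) - max(log_w))
    w /= w.sum()
    return (w[:, None] * np.array(alphas)).sum(axis=0)
```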
The Case of |s|=1 and Pi=P
Specializing the support posterior to a single-atom support s = {i} (with d_i the i-th atom in D, and P_i = P), we get

    P(s = \{i\} \mid y) \propto \frac{P_i}{1 - P_i} \cdot \frac{\sigma}{\sqrt{\sigma_x^2 + \sigma^2}}
        \cdot \exp\left\{ \frac{\sigma_x^2}{2\sigma^2(\sigma_x^2 + \sigma^2)} \, (y^T d_i)^2 \right\}

The constant multiplying (y^T d_i)^2 in the exponent is exactly our c in the Random-OMP.
Based on this we can propose a greedy algorithm for both MAP and MMSE:
• MAP – choose the atom with the largest inner product (out of K), and do so one at a time, while freezing the previous ones (almost OMP).
• MMSE – draw an atom at random in a greedy manner, based on the above probability set, getting close to P(s|y) in the overall draw (almost RandOMP).
Comparative Results
The following results correspond to a small dictionary (10×16), where the combinatorial formulas can be evaluated as well.
Parameters:
• N, K: 10×16
• P = 0.1 (varying cardinality)
• σx = 1
• J = 50 (RandOMP)
• Averaged over 1000 experiments
[Figure: relative representation mean-squared error of the Oracle, MMSE, MAP, OMP, and Rand-OMP estimators.]
Bottom Line
• The MMSE estimation we got requires a sweep through all supports (i.e., a combinatorial search) – impractical.
• Similarly, an explicit expression for P(x|y) can be derived and maximized – this is the MAP estimation, and it too requires a sweep through all possible supports – impractical as well.
• The OMP is a (good) approximation for the MAP estimate.
• The Random-OMP is a (good) approximation of the Minimum Mean-Squared-Error (MMSE) estimate. It is close to a Gibbs sampler of the probability P(s|y), from which we should draw the weights.
Back to the beginning: Why use several representations? Because their average leads to provably better noise suppression.
Part V
A Closer Look At the
Unitary Case
    D D^T = D^T D = I
Few Basic Observations
Let us denote \beta = D^T y. Then

    Q_s = \frac{1}{\sigma^2} D_s^T D_s + \frac{1}{\sigma_x^2} I
        = \frac{\sigma^2 + \sigma_x^2}{\sigma^2 \sigma_x^2} I,
    \qquad
    h_s = \frac{1}{\sigma^2} D_s^T y = \frac{1}{\sigma^2} \beta_s

    \hat{\alpha}_s^{oracle} = Q_s^{-1} h_s = \frac{\sigma_x^2}{\sigma_x^2 + \sigma^2} \, \beta_s \equiv c^2 \beta_s
    \qquad \text{(The Oracle)}
Back to the MAP Estimation
    P(S \mid y) \propto \exp\left\{ \frac{h_s^T Q_s^{-1} h_s + \log\det(Q_s^{-1})}{2} \right\}
                 \prod_{i \in S} \frac{P_i}{\sigma_x} \prod_{j \notin S} (1 - P_j)

In the unitary case these terms simplify:

    \frac{h_s^T Q_s^{-1} h_s}{2} = \frac{c^2}{2\sigma^2} \|\beta_s\|_2^2,
    \qquad
    \frac{\log\det(Q_s^{-1})}{2} = \frac{|S|}{2} \log\left( (1 - c^2)\,\sigma_x^2 \right)

so that

    P(S \mid y) \propto \prod_{i \in S} q_i,
    \qquad
    q_i \equiv \frac{P_i \sqrt{1 - c^2}}{1 - P_i} \exp\left\{ \frac{c^2 \beta_i^2}{2\sigma^2} \right\}
The MAP Estimator
\hat{S}_{MAP} is obtained by maximizing the expression

    P(S \mid y) \propto \prod_{i \in S} q_i,
    \qquad
    q_i = \frac{P_i \sqrt{1 - c^2}}{1 - P_i} \exp\left\{ \frac{c^2 \beta_i^2}{2\sigma^2} \right\}

Thus, every i such that q_i > 1 should be in the support, which leads to the hard-thresholding rule

    \hat{\alpha}_i^{MAP} =
      c^2 \beta_i \ \ \text{if} \ \ c^2 \beta_i^2 \ge 2\sigma^2 \log\!\left( \frac{1 - P_i}{P_i \sqrt{1 - c^2}} \right),
      \qquad 0 \ \ \text{otherwise}

[Figure: the resulting MAP shrinkage curve \hat{\alpha}_i^{MAP} as a function of \beta_i, for P = 0.1, σ = 0.3, σx = 1.]
The MMSE Estimation
Some algebra and we get that

    \hat{\alpha}_i^{MMSE} = \frac{q_i}{1 + q_i} \, c^2 \beta_i \;\equiv\; g_i \, c^2 \beta_i,
    \qquad
    q_i = \frac{P_i \sqrt{1 - c^2}}{1 - P_i} \exp\left\{ \frac{c^2 \beta_i^2}{2\sigma^2} \right\}

This result leads to a dense representation vector. The curve is a smoothed version of the MAP one.
[Figure: the MMSE shrinkage curve \hat{\alpha}_i^{MMSE} as a function of \beta_i, for P = 0.1, σ = 0.3, σx = 1.]
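A small sketch that plots both closed-form unitary-case shrinkage rules, using the parameter values from the slides and the formulas as reconstructed above:

```python
import numpy as np
import matplotlib.pyplot as plt

P, sigma, sigma_x = 0.1, 0.3, 1.0
c2 = sigma_x**2 / (sigma_x**2 + sigma**2)            # c^2 = sigma_x^2 / (sigma_x^2 + sigma^2)

beta = np.linspace(-3, 3, 601)                       # beta_i = d_i^T y
q = (P * np.sqrt(1 - c2) / (1 - P)) * np.exp(c2 * beta**2 / (2 * sigma**2))

alpha_map = np.where(q > 1, c2 * beta, 0.0)          # hard-thresholding MAP rule
alpha_mmse = (q / (1 + q)) * c2 * beta               # smooth MMSE shrinkage

plt.plot(beta, alpha_map, label="MAP")
plt.plot(beta, alpha_mmse, label="MMSE")
plt.xlabel("beta_i"); plt.ylabel("estimate"); plt.legend(); plt.show()
```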
What About the Error ?
Using g_i \equiv q_i / (1 + q_i), the expected errors admit explicit expressions:

    E\!\left[ \|\hat{\alpha}^{oracle} - \alpha\|_2^2 \right]
      = \sum_{s} P(s \mid y) \, \mathrm{trace}(Q_s^{-1})
      = \dots
      = c^2 \sigma^2 \sum_{i=1}^{n} g_i

    E\!\left[ \|\hat{\alpha}^{MMSE} - \alpha\|_2^2 \right]
      = \dots
      = \sum_{i=1}^{n} \left[ c^2 \sigma^2 g_i + c^4 \beta_i^2 \left( g_i - g_i^2 \right) \right]

    E\!\left[ \|\hat{\alpha}^{MAP} - \alpha\|_2^2 \right]
      = \dots
      = \sum_{i=1}^{n} \left[ c^2 \sigma^2 g_i + c^4 \beta_i^2 \left( g_i + I_{MAP}(i)\,(1 - 2 g_i) \right) \right]

where I_{MAP}(i) equals 1 if atom i belongs to the MAP support and 0 otherwise.
A Synthetic Experiment
The following results correspond to a dictionary of size 100×100.
Parameters:
• n, K: 100×100
• P = 0.1
• σx = 1
• Averaged over 1000 experiments
The average errors are shown relative to nσ².
[Figure: relative mean-squared error – empirical and theoretical – for the Oracle, MMSE, and MAP estimators.]
Part VI
Summary and
Conclusions
Today We Have Seen that …
• Sparsity and Redundancy are used for denoising of signals/images.
• How? Estimation theory tells us exactly what should be done, and if that is impossible, how to approximate it.
• So can we do better than OMP? Yes! Averaging several representations leads to better denoising, as it approximates the MMSE.
• What about the unitary case? There, MAP and MMSE enjoy closed-form, exact and cheap formulae. Their error is bounded and tightly related to the oracle's error.
More on these (including the slides and the relevant papers) can be found at
http://www.cs.technion.ac.il/~elad