ELE 774 Adaptive Signal Processing

Least Mean-Square
Adaptive Filtering
Chapter 5
Steepest Descent

The update rule for SD is
  w(n+1) = w(n) + ½ μ [−∇J(n)],
where
  ∇J(n) = −2p + 2R w(n),
or
  w(n+1) = w(n) + μ [p − R w(n)],   n = 0, 1, 2, …

SD is a deterministic algorithm, in the sense that p and R are
assumed to be exactly known.

In practice we can only estimate these quantities.
Basic Idea

The simplest estimate of the expectations is to remove the expectation operators and replace them with instantaneous values, i.e.
  R̂(n) = u(n) uᴴ(n),   p̂(n) = u(n) d*(n).

Then, the gradient estimate becomes
  ∇J(n) ≈ −2 u(n) d*(n) + 2 u(n) uᴴ(n) ŵ(n).

Eventually, the new update rule is
  ŵ(n+1) = ŵ(n) + μ u(n) [d*(n) − uᴴ(n) ŵ(n)].
(No expectations, instantaneous samples!)
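As a small numerical illustration (the variable names and values below are made up for this sketch, not taken from the slides), the instantaneous estimates are just single-sample products:

```python
import numpy as np

rng = np.random.default_rng(0)
u_n = rng.standard_normal(4) + 1j * rng.standard_normal(4)  # example tap-input vector u(n)
d_n = 0.5 + 0.2j                                            # example desired response d(n)

R_hat = np.outer(u_n, u_n.conj())   # instantaneous estimate of R = E[u(n) u^H(n)]
p_hat = u_n * np.conj(d_n)          # instantaneous estimate of p = E[u(n) d*(n)]
```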
Basic Idea

However, the term in the brackets is the error, i.e.
  e(n) = d(n) − uᴴ(n) ŵ(n),
then
  ŵ(n+1) = ŵ(n) + μ u(n) e*(n).
This update uses the gradient of the instantaneous squared error |e(n)|², instead of the gradient of the mean-square error E[|e(n)|²] used in SD.
Basic Idea

Filter weights are updated using instantaneous values
Update Equation for the Method of Steepest Descent:
  w(n+1) = w(n) + μ E[u(n) e*(n)]

Update Equation for Least Mean-Square:
  ŵ(n+1) = ŵ(n) + μ u(n) e*(n)
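A minimal NumPy sketch of the LMS recursion above (function and variable names are mine, not from the slides; it assumes an M-tap transversal filter with output y(n) = ŵᴴ(n)u(n) and initial weights ŵ(0) = 0):

```python
import numpy as np

def lms(u, d, M, mu):
    """Complex LMS adaptive filter: w(n+1) = w(n) + mu * u(n) * conj(e(n))."""
    w = np.zeros(M, dtype=complex)            # w_hat(0) = 0
    e = np.zeros(len(u), dtype=complex)
    for n in range(M - 1, len(u)):
        u_n = u[n - M + 1:n + 1][::-1]        # tap-input vector u(n) = [u(n), ..., u(n-M+1)]
        y_n = np.vdot(w, u_n)                 # filter output y(n) = w^H(n) u(n)
        e[n] = d[n] - y_n                     # estimation error e(n) = d(n) - y(n)
        w = w + mu * u_n * np.conj(e[n])      # stochastic-gradient weight update
    return w, e
```

Each iteration touches only the current tap-input vector and a scalar error, which is where the low per-iteration complexity quoted a couple of slides later comes from.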
LMS Algorithm


The instantaneous estimates are unbiased: E[R̂(n)] = R and E[p̂(n)] = p.
Since the expectations are omitted, however, the estimates have a high variance.
Therefore, the recursive computation of each tap weight in the LMS algorithm suffers from gradient noise.

In contrast to SD, which is a deterministic algorithm, LMS is a member of the family of stochastic gradient descent algorithms.

LMS has a higher steady-state MSE, J(∞), than SD, which attains Jmin (the Wiener solution), as n→∞
 i.e., J(n) → J(∞) as n→∞
 The difference J(∞) − Jmin is called the excess mean-square error Jex(∞)
 The ratio Jex(∞)/Jmin is called the misadjustment
 If J(∞) is finite, LMS is said to be stable in the mean-square sense
 LMS performs a random motion around the Wiener solution
LMS Algorithm





The LMS recursion involves a feedback connection.
Although LMS might seem difficult to analyse due to the randomness, the feedback acts as a low-pass filter, i.e. it performs averaging, so that the randomness can be filtered out.
The time constant of this averaging is inversely proportional to μ.
In fact, if μ is chosen small enough, the adaptive process progresses slowly and the effects of the gradient noise on the tap weights are largely filtered out.
The computational complexity of LMS is very low → very attractive
 Only 2M+1 complex multiplications and 2M complex additions per iteration.
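As a quick sanity check on this count (an illustrative breakdown, assuming the update ŵ(n+1) = ŵ(n) + μ u(n) e*(n)): computing y(n) = ŵᴴ(n)u(n) takes M complex multiplications, forming μ e*(n) takes one more, and forming μ e*(n) u(n) for the weight update takes another M, giving 2M+1 multiplications; the M−1 additions in the output sum, the single subtraction for e(n) and the M additions in the weight update give the 2M additions.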
Canonical Model


The LMS algorithm for complex signals / with complex coefficients can be represented in terms of four separate LMS algorithms for real signals, with cross-coupling between them.
Write the input, desired signal, tap weights, output and error in complex notation:
  u(n) = u₁(n) + j u₂(n),   d(n) = d₁(n) + j d₂(n)
  ŵ(n) = ŵ₁(n) + j ŵ₂(n),   y(n) = y₁(n) + j y₂(n),   e(n) = e₁(n) + j e₂(n)
Canonical Model

Then the relations between these expressions are
  y₁(n) = ŵ₁ᵀ(n) u₁(n) + ŵ₂ᵀ(n) u₂(n),   y₂(n) = ŵ₁ᵀ(n) u₂(n) − ŵ₂ᵀ(n) u₁(n)
  e₁(n) = d₁(n) − y₁(n),   e₂(n) = d₂(n) − y₂(n)
  ŵ₁(n+1) = ŵ₁(n) + μ [u₁(n) e₁(n) + u₂(n) e₂(n)]
  ŵ₂(n+1) = ŵ₂(n) + μ [u₂(n) e₁(n) − u₁(n) e₂(n)]
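A short NumPy check of this decomposition (a sketch with my own variable names; it performs one complex LMS update and the equivalent four cross-coupled real updates and compares them):

```python
import numpy as np

rng = np.random.default_rng(1)
M, mu = 4, 0.05
u = rng.standard_normal(M) + 1j * rng.standard_normal(M)   # tap-input vector u(n)
d = 0.3 - 0.7j                                              # desired response d(n)
w = rng.standard_normal(M) + 1j * rng.standard_normal(M)    # current weights w_hat(n)

# Complex LMS update: w(n+1) = w(n) + mu * u(n) * conj(e(n)), with e(n) = d(n) - w^H u(n)
e = d - np.vdot(w, u)
w_complex = w + mu * u * np.conj(e)

# Canonical model: split everything into real (1) and imaginary (2) parts
u1, u2 = u.real, u.imag
d1, d2 = d.real, d.imag
w1, w2 = w.real, w.imag
y1 = w1 @ u1 + w2 @ u2                    # y1(n)
y2 = w1 @ u2 - w2 @ u1                    # y2(n)
e1, e2 = d1 - y1, d2 - y2                 # e1(n), e2(n)
w1_new = w1 + mu * (u1 * e1 + u2 * e2)    # cross-coupled real LMS updates
w2_new = w2 + mu * (u2 * e1 - u1 * e2)

print(np.allclose(w_complex, w1_new + 1j * w2_new))   # True: both forms agree
```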
Analysis of the LMS Algorithm

Although the filter is a linear combiner, the algorithm is highly nonlinear and violates superposition and homogeneity.

Assume the initial condition ŵ(0) = 0; the output ŵ(n+1) then depends nonlinearly on all past inputs u(1), …, u(n) and desired responses d(1), …, d(n).

The analysis will continue using the weight-error vector
  ε(n) = ŵ(n) − wo
and its correlation matrix
  K(n) = E[ε(n) εᴴ(n)].
(Here we write an expectation; strictly it is an ensemble average.)
Analysis of the LMS Algorithm

We have
  ŵ(n+1) = ŵ(n) + μ u(n) [d*(n) − uᴴ(n) ŵ(n)].

Let
  ε(n) = ŵ(n) − wo,   eo(n) = d(n) − woᴴ u(n).

Then the update eqn. can be written as
  ε(n+1) = [I − μ u(n) uᴴ(n)] ε(n) + μ u(n) eo*(n).

Analyse convergence in an average sense:
 the algorithm is run many times → study the ensemble-average behaviour.
Analysis of the LMS Algorithm

Using the update equation for ε(n) together with the small-step-size assumption, it can be shown that
  ε(n+1) ≈ (I − μ R) ε(n) + μ u(n) eo*(n),
i.e. the outer product u(n) uᴴ(n) is replaced by its average R.
(Here we write an expectation; strictly it is an ensemble average.)
Small Step Size Analysis

Assumption I: the step size μ is small (how small?) → the LMS filter acts like a low-pass filter with a very low cut-off frequency.

Assumption II: the desired response is described by a linear multiple regression model that is matched exactly by the optimum Wiener filter,
  d(n) = woᴴ u(n) + eo(n),
where eo(n) is the irreducible estimation error, with Jmin = E[|eo(n)|²].

Assumption III: the input and the desired response are jointly Gaussian.
Small Step Size Analysis

Applying the similarity transformation resulting from the eigendecomposition R = Q Λ Qᴴ to the weight-error vector, i.e.
  v(n) = Qᴴ ε(n),
we then have
  v(n+1) = (I − μ Λ) v(n) + φ(n),
where the stochastic force
  φ(n) = μ Qᴴ u(n) eo*(n).
(We do not have this driving term in Wiener filtering! Components of v(n) are uncorrelated. HW: prove these relations.)
Small Step Size Analysis

Components of v(n) are uncorrelated, so each evolves independently:
  vₖ(n+1) = (1 − μ λₖ) vₖ(n) + φₖ(n),   k = 1, …, M,
a first-order difference equation driven by the stochastic force φₖ(n) (cf. Brownian motion in thermodynamics).

Solution: iterating from n = 0,
  vₖ(n) = (1 − μ λₖ)ⁿ vₖ(0)                      ← natural component of vₖ(n)
        + Σ_{i=0}^{n−1} (1 − μ λₖ)^{n−1−i} φₖ(i)  ← forced component of vₖ(n)
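A tiny simulation of one such mode (all constants are arbitrary; the force variance μ²λₖJmin is the value suggested by the small-step-size analysis and is an assumption of this sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
mu, lam, J_min = 0.05, 1.0, 0.1      # step size, eigenvalue lambda_k, irreducible MSE (arbitrary)
n_steps = 500
v = np.zeros(n_steps + 1, dtype=complex)
v[0] = 1.0                            # initial deviation v_k(0)
for n in range(n_steps):
    # zero-mean stochastic force phi_k(n), variance mu^2 * lambda_k * J_min (assumed)
    phi = np.sqrt(mu**2 * lam * J_min / 2) * (rng.standard_normal() + 1j * rng.standard_normal())
    v[n + 1] = (1 - mu * lam) * v[n] + phi   # v_k(n+1) = (1 - mu*lambda_k) v_k(n) + phi_k(n)

# The natural component (1 - mu*lam)**n * v[0] dies out; the forced component keeps
# |v_k(n)|^2 fluctuating around a small steady-state value (Brownian-motion-like behaviour).
```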
Learning Curves

Two kinds of learning curves
 Mean-square error (MSE) learning curve
   J(n) = E[|e(n)|²]
 Mean-square deviation (MSD) learning curve
   D(n) = E[‖ε(n)‖²]

Ensemble averaging → the results of many (→∞) independent realizations are averaged.

What is the relation bw. MSE and MSD (for μ small)?
Learning Curves
for  small


under the assumptions of slide 17.
Excess MSE
 LMS performs worse than SD, there is always an excess MSE
← use
Learning Curves
The mean-square deviation is bounded by the excess MSE:
  λmin D(n) ≤ Jex(n) ≤ λmax D(n),
or
  Jex(n)/λmax ≤ D(n) ≤ Jex(n)/λmin.

So D(n) is lower- and upper-bounded by scaled versions of the excess MSE.

They have a similar response: both decay as n grows.
Convergence

For  small

Hence, for convergence
or

The ensemble-average learning curve of an LMS filter does not
exhibit oscillations, rather, it decays exponentially to the const. value
Jex(n)
Misadjustment

Misadjustment: define
  ℳ = Jex(∞) / Jmin.

For small μ, from the previous result,
  Jex(∞) ≈ Jmin (μ/2) Σₖ λₖ,
or equivalently
  ℳ ≈ (μ/2) Σₖ λₖ,
but
  Σₖ λₖ = tr(R) = M r(0) = M E[|u(n)|²],
then
  ℳ ≈ (μ/2) M E[|u(n)|²].
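A quick illustrative example (numbers invented): with M = 10 taps, input power E[|u(n)|²] = 1 and μ = 0.02, the misadjustment is ℳ ≈ (0.02/2)·10·1 = 0.1, i.e. the steady-state MSE sits about 10% above Jmin.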
Average Time Constant

From SD we know that, for small μ, the k-th mode of the MSE has time constant
  τₖ,mse ≈ 1/(2 μ λₖ),
but
  λav = (1/M) Σₖ λₖ,
then
  τmse,av ≈ 1/(2 μ λav) = M/(2 μ Σₖ λₖ).
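Continuing the same illustrative numbers (white input, so λav = E[|u(n)|²] = 1, and μ = 0.02): τmse,av ≈ 1/(2·0.02·1) = 25 iterations; halving μ to 0.01 halves the misadjustment but doubles the time constant to 50 iterations.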
Observations

Misadjustment is
 directly proportional to the filter length M, for a fixed τmse,av
 inversely proportional to the time constant τmse,av
   → slower convergence results in lower misadjustment
 directly proportional to the step size μ
   → smaller step size results in lower misadjustment

Time constant is
 inversely proportional to the step size μ
   → smaller step size results in slower convergence

Large μ requires the inclusion of higher-order terms into the analysis
 difficult to analyse; the small-step-size analysis is no longer valid
 the learning curve becomes more noisy
LMS vs. SD





The main goal is to minimise the mean-square error (MSE).
The optimum solution is found by the Wiener-Hopf equations.
 Requires the auto-/cross-correlations.
 Achieves the minimum value of the MSE, Jmin.
LMS and SD are iterative algorithms designed to find wo.
 SD has direct access to the auto-/cross-correlations (exact measurements):
   it can approach the Wiener solution wo and can go down to Jmin.
 LMS uses instantaneous estimates instead (noisy measurements):
   it fluctuates around wo in a Brownian-motion manner, and its MSE settles at J(∞) ≥ Jmin.
LMS vs. SD

Learning curves
 SD has a well-defined curve composed of decaying exponentials.
 For LMS, the curve is composed of noisy decaying exponentials.
Statistical Wave Theory



As the filter length increases, M→∞:
 The propagation of electromagnetic disturbances along a transmission line towards infinity is similar to signals on an infinitely long LMS filter.
Finite-length LMS filter (finite transmission line):
 Corrections have to be made at the edges to handle reflections.
 As the length increases, the reflection region shrinks relative to the total filter.
This imposes a limit on the step size to avoid instability as M→∞:
  0 < μ < 2/(M Smax),
where Smax is the maximum value of the PSD S(ω) of the tap inputs u(n).

If the upper bound is exceeded, instability is observed.
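For a rough feel of the bound (values invented): a filter with M = 100 taps whose tap inputs have a PSD peaking at Smax = 0.5 would require μ < 2/(100 × 0.5) = 0.04.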
H∞ Optimality of LMS

A single realisation of LMS is not optimum in the MSE sense
 the ensemble average is.
 The previous derivation is heuristic (replacing the auto-/cross-correlations with their instantaneous estimates).

In what sense is LMS optimum?
 It can be shown that LMS minimises the maximum energy gain of the filter, under a constraint on the step size.
 Minimising the maximum of something → minimax
 This is the optimisation of an H∞ criterion.
H∞ Optimality of LMS




Provided that the step size parameter μ satisfies the limits on the step size, then,
no matter how different the initial weight vector ŵ(0) is from the unknown parameter vector wo of the multiple regression model, and
irrespective of the value of the additive disturbance eo(n),
the error energy produced at the output of the LMS filter will never exceed a certain level.
Limits on the Step Size