Adaptive Convex Combination Filter under Minimum Error Entropy Criterion
Siyuan Peng 1, Zongze Wu 2, Yajing Zhou 2, Badong Chen 3
1 Electronic and Information Engineering, South China University of Technology, Guangzhou, China
2 Institute of Automation and Radio Engineering, Guangzhou University of Technology, Guangzhou, China
3 Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an, China
E-mail: [email protected], [email protected]
Abstract—Minimum error entropy (MEE) is a robust adaptation criterion that has been successfully applied to adaptive filtering, where it can outperform the well-known minimum mean square error (MSE) criterion, especially in the presence of non-Gaussian noise. However, adaptive algorithms under MEE are still subject to a compromise between convergence speed and steady-state mean square deviation (MSD). To address this issue, we propose in this paper an adaptive convex combination filter under MEE (CMEE), which is derived by convexly combining two MEE-based adaptive filters with different step-sizes. Monte Carlo simulation results confirm that the new algorithm achieves fast convergence while maintaining desirable steady-state performance.
Keywords-MEE; CMEE; non-Gaussian noise
I. INTRODUCTION
Due to its mathematical tractability and simplicity, both in terms of ease of implementation and computational load, the least mean square (LMS) algorithm, which is based on the minimum mean square error (MSE) criterion, has been widely used in many applications, such as signal processing, system identification, acoustic echo cancellation, blind equalization, and so on [1]. As a simple stochastic-gradient-descent algorithm, however, LMS suffers from a low convergence rate, especially when the input signal is correlated. To alleviate this problem, combination schemes have been successfully applied to improve convergence speed while keeping excellent steady-state performance.
In previous studies, most existing combination adaptive filtering algorithms were proposed based on the well-known MSE criterion. For instance, Martinez-Ramon et al. developed a combination of one fast and one slow LMS filter that was effective at combining fast convergence and low misadjustment [2], and a detailed mean-square performance analysis of this algorithm, for both stationary and nonstationary situations, was developed in [3] based on energy conservation arguments. Das and Chakraborty introduced an alternative method to deal with variable sparsity by using an adaptive convex combination of the LMS and zero-attractor LMS algorithms [4], and Arenas-García and Figueiras-Vidal proposed an adaptive combination of proportionate filters for sparse echo cancellation [5]. Moreover, several adaptive combination algorithms based on the recursive least squares (RLS) algorithm were developed in [6-8]. It is well known, however, that most of the above linear adaptive filters under the MSE criterion achieve optimality only when the underlying system is linear and Gaussian; in most practical applications, the Gaussian assumption does not hold.
The minimum error entropy (MEE) criterion, on the other hand, provides a robust alternative for non-Gaussian signal processing [9-10]. MEE, with a nonparametric Parzen-window estimator that computes the error entropy directly from the error samples, can be used for adaptive system training. Compared with the MSE criterion, MEE can achieve better performance in adaptive filtering, particularly when the system is contaminated by non-Gaussian noise [11-14]. A key problem of MEE-based adaptive filtering algorithms is that the selection of the step-size requires a compromise between convergence speed and accuracy. Combination approaches, however, provide an attractive way to deal with this issue.
In this paper, we propose a novel adaptive convex combination filter under MEE (CMEE), which consists of two independently run filters under MEE with different step-sizes. The proposed algorithm aims to obtain both fast convergence (from the faster filter) and low misadjustment (from the slower filter). In the simulations, we also make a comparison with the convex combination filter under the maximum correntropy criterion (CMCC) [15], and simulation results are presented to confirm the superior performance of the new algorithm.
The rest of the paper is organized as follows. In Section II, we briefly introduce the MEE criterion and then develop the CMEE algorithm. Simulation results are shown in Section III. Finally, Section IV gives the conclusion.
II. CONVEX COMBINATION FILTER UNDER MINIMUM ERROR ENTROPY CRITERION
A. Minimum Error Entropy (MEE) Criterion
Consider a linear system in which the input vector X(n) = [x_{n-M+1}, …, x_{n-1}, x_n]^T is sent over a finite-impulse-response (FIR) channel with parameter (weight) vector W* = [w*_1, w*_2, …, w*_M]^T (M is the size of the channel memory). Then the desired signal is

    d(n) = W*^T X(n) + v(n)    (1)

where (·)^T stands for the transpose operator, and v(n) denotes the measurement noise at instant n. Let W(n) = [w_1(n), w_2(n), …, w_M(n)]^T be the weight vector of an adaptive filter. The instantaneous error can then be computed as

    e(n) = d(n) − y(n) = d(n) − W^T(n) X(n)    (2)

where y(n) denotes the output signal of the adaptive filter.

Figure 1. Performance Surface of Quadratic Information Potential.
Consider the error e(n) as a random variable with probability density function (PDF) p_e(·). The quadratic Renyi's entropy of the error is [9-14]

    H_{R2}(e) = −log ∫ p_e²(ξ) dξ = −log V(e)    (3)

where V(e) = ∫ p_e²(ξ) dξ is the quadratic information potential (QIP). Clearly, minimizing the quadratic Renyi's entropy is equivalent to maximizing the QIP, so the optimal weight vector under MEE can be obtained as

    W_opt = arg max_W V(e)    (4)

According to [9-14], a nonparametric estimator of V(e) is

    V̂(e) = (1/N²) Σ_{i=1}^{N} Σ_{j=1}^{N} κ_{σ√2}(e(i) − e(j))    (5)

where N is the number of samples, and κ_σ(·) is a Gaussian kernel function with bandwidth σ. Based on (5), a weight-vector update equation under MEE is

    W(n+1) = W(n) + μ ∂V̂(e)/∂W(n)
           = W(n) + (μ/(2L²σ²)) Σ_{i=n−L+1}^{n} Σ_{j=n−L+1}^{n} κ_{σ√2}(e(i,j)) e(i,j) [X(i) − X(j)]    (6)
where μ is the step-size parameter, e(i,j) = e(i) − e(j), and L is the sliding data length. Fig. 1 illustrates the performance surface of the QIP for a two-dimensional system. As one can see, the QIP performance surface does not display a constant curvature (as the MSE performance surface does), being flat in regions away from the optimum. In fact, the QIP weighs the terms in the sum differently, which in general yields solutions that can be better than the MSE solution, particularly in the presence of impulsive noise [9].

Figure 2. Adaptive Convex Combination of two Adaptive Filters.
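To make the estimator (5) and the sliding-window update (6) concrete, here is a minimal NumPy sketch. This is our own illustration under stated assumptions, not code from the paper: the function names and buffer layout are invented, and the Gaussian kernel's normalization constant is dropped, since during adaptation it only rescales the effective step-size.

```python
import numpy as np

def gaussian_kernel(x, sigma):
    # Gaussian kernel; the 1/(sigma*sqrt(2*pi)) normalization is omitted,
    # as it can be absorbed into the step-size during adaptation.
    return np.exp(-x**2 / (2.0 * sigma**2))

def qip_estimate(e, sigma):
    # Parzen estimator of the quadratic information potential, Eq. (5):
    # average of pairwise kernel evaluations with bandwidth sigma*sqrt(2).
    e = np.asarray(e, dtype=float)
    diffs = e[:, None] - e[None, :]              # e(i) - e(j) for all pairs
    return gaussian_kernel(diffs, np.sqrt(2.0) * sigma).mean()

def mee_update(W, X_buf, e_buf, mu, sigma, L):
    # One stochastic-gradient MEE step over a sliding window, Eq. (6).
    # X_buf: last L input vectors, shape (L, M); e_buf: last L errors, shape (L,).
    de = e_buf[:, None] - e_buf[None, :]                   # e(i, j)
    dX = X_buf[:, None, :] - X_buf[None, :, :]             # X(i) - X(j)
    kern = gaussian_kernel(de, np.sqrt(2.0) * sigma)
    grad = (kern * de)[:, :, None] * dX                    # kappa * e(i,j) * [X(i) - X(j)]
    return W + mu / (2.0 * L**2 * sigma**2) * grad.sum(axis=(0, 1))
```

Note that when all error samples in the window are equal, the pairwise differences vanish, so qip_estimate returns its maximum value of 1 (with the unnormalized kernel) and mee_update leaves W unchanged, consistent with (5) and (6).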
B. CMEE Algorithm
The CMEE algorithm is derived by combining two independently run MEE-based adaptive filters with different step-sizes (see Fig. 2). According to [15-18], the overall output and the corresponding overall weight vector are calculated as

    y(n) = λ(n) y₁(n) + (1 − λ(n)) y₂(n)    (7)

    W(n) = λ(n) W₁(n) + (1 − λ(n)) W₂(n)    (8)

where λ(n) is a mixing coefficient, and y₁(n), y₂(n), W₁(n) and W₂(n) denote, respectively, the outputs and weight vectors of the fast filter and the slow filter. The goal is to make the mixing coefficient λ(n) as close to 1 as possible when the algorithm starts, and as close to 0 as possible when the algorithm begins to converge to steady state. Since λ(n) is restricted to the interval (0, 1), one can define it via a sigmoidal function of an auxiliary variable a(n), that is

    λ(n) = sgm[a(n)] = 1 / (1 + e^{−a(n)})    (9)
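A tiny numeric check of the combination rule (7) and the sigmoid (9); the helper names here are our own illustrative choices, not from the paper:

```python
import numpy as np

def sgm(a):
    # sigmoid of Eq. (9): maps the auxiliary variable a(n) to lambda(n) in (0, 1)
    return 1.0 / (1.0 + np.exp(-a))

def combine(lam, y1, y2):
    # convex combination of the fast- and slow-filter outputs, Eq. (7)
    return lam * y1 + (1.0 - lam) * y2

print(combine(sgm(0.0), 2.0, 4.0))  # a = 0 gives lambda = 0.5, so this prints 3.0
```

Because sgm is strictly between 0 and 1, the combined output always lies between the two filter outputs, which is what makes the combination convex.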
Instead of using the MSE criterion, we use the MEE cost to update the parameter a(n) according to the following gradient-based rule:

    a(n+1) = a(n) + μ_a ∂V̂(e)/∂a(n)
           = a(n) + (μ_a/(2L²σ²)) Σ_{i=n−L+1}^{n} Σ_{j=n−L+1}^{n} κ_{σ√2}(e(i,j)) e(i,j) Θ(i,j)    (10)

where μ_a is a step-size parameter, and

    Θ(i,j) = [y₁(i) − y₂(i)] λ(i)(1 − λ(i)) − [y₁(j) − y₂(j)] λ(j)(1 − λ(j))    (11)
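The gradient step (10)-(11) can be sketched as follows; this is a hedged NumPy illustration with assumed function and buffer names (the last L combined errors, filter outputs, and mixing coefficients), not the paper's code:

```python
import numpy as np

def a_update(a, e_buf, y1_buf, y2_buf, lam_buf, mu_a, sigma, L):
    # Gradient-ascent step for the auxiliary variable a(n), Eqs. (10)-(11).
    de = e_buf[:, None] - e_buf[None, :]                 # e(i, j)
    kern = np.exp(-de**2 / (4.0 * sigma**2))             # kernel with bandwidth sigma*sqrt(2)
    # theta(i) = [y1(i) - y2(i)] * lambda(i) * (1 - lambda(i))
    theta = (y1_buf - y2_buf) * lam_buf * (1.0 - lam_buf)
    dtheta = theta[:, None] - theta[None, :]             # Theta(i, j) of Eq. (11)
    return a + mu_a / (2.0 * L**2 * sigma**2) * (kern * de * dtheta).sum()
```

In the full algorithm, a(n) is additionally clipped to [−ε, ε], as described in the text, so that λ(n)(1 − λ(n)) never collapses to 0 and adaptation of the mixing coefficient cannot stall.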
In order to prevent the proposed algorithm from stopping, one can restrict the value of a(n) to a certain interval [−ε, ε], such that the value of λ(n)(1 − λ(n)) is not too close to 0 (more details on this truncation procedure can be found in [15-16]). Following [15-16], the performance of the convex combination scheme can be further improved by taking advantage of the faster filter to speed up the convergence of the slower one. The modified update rule for W₂(n) can then be expressed as

    W₂(n+1) = α [W₂(n) + (μ₂/(2L²σ²)) Σ_{i=n−L+1}^{n} Σ_{j=n−L+1}^{n} κ_{σ√2}(e₂(i,j)) e₂(i,j) (X(i) − X(j))] + (1 − α) W₁(n)    (12)

where α denotes a smoothing factor. The pseudocode of the proposed CMEE is presented in Table I.

TABLE I
CMEE ALGORITHM

Parameters: μ₁, μ₂, σ, μ_a, ε, α, and L
Initialization: a(0) = 0, λ(0) = sgm[a(0)], W₁(0) = W₂(0) = 0
Update (for each n):
    y₁(n) = W₁^T(n) X(n)
    y₂(n) = W₂^T(n) X(n)
    y(n) = λ(n) y₁(n) + (1 − λ(n)) y₂(n)
    e₁(n) = d(n) − y₁(n)
    e₂(n) = d(n) − y₂(n)
    e(n) = d(n) − y(n)
    W₁(n+1) = W₁(n) + (μ₁/(2L²σ²)) Σ_{i=n−L+1}^{n} Σ_{j=n−L+1}^{n} κ_{σ√2}(e₁(i,j)) e₁(i,j) (X(i) − X(j))
    W₂(n+1) = α [W₂(n) + (μ₂/(2L²σ²)) Σ_{i=n−L+1}^{n} Σ_{j=n−L+1}^{n} κ_{σ√2}(e₂(i,j)) e₂(i,j) (X(i) − X(j))] + (1 − α) W₁(n)
    if a(n) > ε:  a(n) = ε
    else if a(n) < −ε:  a(n) = −ε
    end
    a(n+1) = a(n) + (μ_a/(2L²σ²)) Σ_{i=n−L+1}^{n} Σ_{j=n−L+1}^{n} κ_{σ√2}(e(i,j)) e(i,j) Θ(i,j)
    λ(n+1) = 1 / (1 + e^{−a(n+1)})
    W(n+1) = λ(n+1) W₁(n+1) + (1 − λ(n+1)) W₂(n+1)

III. SIMULATION RESULTS

In the following simulations, unless stated explicitly, the input is a white Gaussian random sequence with zero mean and unit variance. The sliding data length is L = 20, and the smoothing factor α is 0.998. The step-size μ_a and the parameter ε are both set to 4. Simulation results are averaged over 50 independent Monte Carlo runs, and in each simulation, 8000/4000 iterations are run to ensure that the algorithm reaches steady state.

In this study, we compare the convergence performance of the proposed CMEE algorithm and the MEE algorithm in non-Gaussian noise. Consider the case in which the unknown system is randomly generated with 20 taps, and the adaptive filter has the same structure. The measurement noise is assumed to be mixed-Gaussian, defined as [19-20]

    v(n) ~ (1 − λ_v) N(μ₁, σ₁²) + λ_v N(μ₂, σ₂²)    (13)

where λ_v is a mixture coefficient, and N(μ_i, σ_i²) (i = 1, 2) denote Gaussian distributions with mean values μ_i and variances σ_i². In this simulation, the second Gaussian component (with a much larger variance) models strong impulsive noise. The mean square deviation (MSD) is adopted as the performance measure, given by

    MSD = E[‖W* − W(n)‖²]    (14)

First, we investigate the convergence curves of CMEE and MEE with different step-sizes in the mixed-Gaussian noise situation. Simulation results are shown in Fig. 3, and accordingly, Fig. 4 illustrates the evolution of the mixing coefficient λ(n). In the simulation, the kernel width σ is 1.0, and the parameters (μ₁, μ₂, σ₁², σ₂², λ_v) of the mixed-Gaussian distribution are set to (0, 0, 0.01, 10, 0.2). From Fig. 3 and Fig. 4, we can see that: 1) compared with the MEE algorithm, the CMEE algorithm obtains much better performance in terms of convergence rate or accuracy; 2) the CMEE algorithm gradually changes from the fast filter to the slow filter.

Second, we illustrate the convergence curves of CMEE and MEE with different kernel widths. The noise is the same as in the above simulation. Simulation results are shown in Fig. 5. Again, CMEE achieves fast convergence while keeping good accuracy.
Third, we make a comparison with the CMCC algorithm. Simulation results are shown in Fig. 6. The noise is binary, taking the value -1 or 1, each with probability 0.5. The parameters of the CMEE algorithm and the CMCC algorithm are set to (μ₁ = 0.2, μ₂ = 0.035, σ = 1.0) and (μ₁ = 0.05, μ₂ = 0.01, σ = 1.5), respectively. As one can see, compared with the CMCC algorithm, the proposed algorithm achieves a slightly lower MSD, at the cost of higher computational complexity.
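As a rough end-to-end illustration of the procedure in Table I, the following self-contained NumPy sketch runs one CMEE trial on a randomly generated system under mixed-Gaussian noise. The dimensions, step-sizes, and iteration count here are our own illustrative choices (deliberately smaller than the paper's settings, for speed), not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def kern(x, sigma):
    # Gaussian kernel with bandwidth sigma*sqrt(2); normalization absorbed into step-sizes
    return np.exp(-x**2 / (4.0 * sigma**2))

def mee_grad(e_buf, X_buf, sigma, L):
    # stochastic QIP gradient over the sliding window, as in Eq. (6)
    de = e_buf[:, None] - e_buf[None, :]
    dX = X_buf[:, None, :] - X_buf[None, :, :]
    g = (kern(de, sigma) * de)[:, :, None] * dX
    return g.sum(axis=(0, 1)) / (2.0 * L**2 * sigma**2)

# illustrative settings (assumptions, smaller than the paper's)
M, L, sigma = 8, 10, 1.0
mu1, mu2, mu_a, eps, alpha = 0.05, 0.01, 4.0, 4.0, 0.998
n_iter = 2000

W_star = rng.standard_normal(M)                   # unknown system weights
W1, W2, a = np.zeros(M), np.zeros(M), 0.0
Xb = np.zeros((L, M))
e1b, e2b, eb = np.zeros(L), np.zeros(L), np.zeros(L)
y1b, y2b, lamb = np.zeros(L), np.zeros(L), np.zeros(L)

msd0 = np.sum(W_star**2)                          # MSD at the all-zero initialization
for n in range(n_iter):
    x = rng.standard_normal(M)
    # mixed-Gaussian noise: 0.8*N(0, 0.01) + 0.2*N(0, 10)
    v = rng.normal(0.0, np.sqrt(10.0)) if rng.random() < 0.2 else rng.normal(0.0, 0.1)
    d = W_star @ x + v
    lam = 1.0 / (1.0 + np.exp(-a))
    y1, y2 = W1 @ x, W2 @ x
    y = lam * y1 + (1.0 - lam) * y2
    # shift the sliding windows and append the newest samples
    for buf, val in ((Xb, x), (e1b, d - y1), (e2b, d - y2), (eb, d - y),
                     (y1b, y1), (y2b, y2), (lamb, lam)):
        buf[...] = np.roll(buf, -1, axis=0)
        buf[-1] = val
    # Table I updates: fast filter, coupled slow filter, mixing variable
    W1 = W1 + mu1 * mee_grad(e1b, Xb, sigma, L)
    W2 = alpha * (W2 + mu2 * mee_grad(e2b, Xb, sigma, L)) + (1.0 - alpha) * W1
    de = eb[:, None] - eb[None, :]
    theta = (y1b - y2b) * lamb * (1.0 - lamb)
    a += mu_a * (kern(de, sigma) * de
                 * (theta[:, None] - theta[None, :])).sum() / (2.0 * L**2 * sigma**2)
    a = float(np.clip(a, -eps, eps))              # truncation to [-eps, eps]

lam = 1.0 / (1.0 + np.exp(-a))
W = lam * W1 + (1.0 - lam) * W2
print(f"MSD: initial {msd0:.3f} -> final {np.sum((W_star - W)**2):.3f}")
```

With these illustrative settings the combined weight vector moves from the all-zero initialization toward the unknown system despite the impulsive noise component, which is the qualitative behavior the figures report.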
IV. CONCLUSION
A novel adaptive convex combination filter under the MEE criterion, called CMEE, has been developed. Compared with the traditional MEE algorithm, the proposed algorithm can achieve both fast convergence and low misadjustment in the presence of non-Gaussian noise, and its superior performance has been confirmed by simulation results.
Figure 3. Convergence curves of the CMEE and MEE with different step-sizes.
Figure 6. Convergence curves of CMEE and CMCC.
ACKNOWLEDGMENT
This work was supported in part by 973 Program under
grant no. 2015CB351703 and NSF of China under grants no.
61271210 and no. 61372152.
Figure 4. Evolution of the mixing coefficient λ(n) in CMEE.
Figure 5. Convergence curves of the CMEE and MEE with different kernel widths.

REFERENCES
[1] A. H. Sayed, Fundamentals of Adaptive Filtering, Wiley, Hoboken, NJ, USA, 2003.
[2] M. Martinez-Ramon, J. Arenas-García, A. Navia-Vazquez, and A. R. Figueiras-Vidal, "An adaptive combination of adaptive filters for plant identification," in Proc. 14th Int. Conf. Digital Signal Processing, Santorini, Greece, 2002, pp. 1195–1198.
[3] J. Arenas-García, A. R. Figueiras-Vidal, and A. H. Sayed, "Mean-square performance of a convex combination of two adaptive filters," IEEE Trans. Signal Process., vol. 54, no. 3, pp. 1078–1090, Mar. 2006.
[4] B. K. Das and M. Chakraborty, "Sparse adaptive filtering by an adaptive convex combination of the LMS and the ZA-LMS algorithms," IEEE Trans. Circuits Syst. I: Reg. Papers, vol. 61, no. 5, pp. 1499–1507, May 2014.
[5] J. Arenas-García and A. R. Figueiras-Vidal, "Adaptive combination of proportionate filters for sparse echo cancellation," IEEE Trans. Audio, Speech, Language Process., vol. 17, no. 6, pp. 1087–1098, Aug. 2009.
[6] J. Arenas-García, M. Martinez-Ramon, A. Navia-Vazquez, and A. R. Figueiras-Vidal, "Plant identification via adaptive combination of transversal filters," Signal Process., vol. 86, pp. 2430–2438, Sep. 2006.
[7] M. Niedzwiecki, “Identification of nonstationary stochastic systems
using parallel estimation schemes,” IEEE Trans. Autom. Control, vol. 35,
no. 3, pp. 329–334, Mar. 1990.
[8] M. Niedzwiecki, “Multiple-model approach to finite memory adaptive
filtering,” IEEE Trans. Signal Process., vol. 40, no. 2, pp. 470–473, Feb.
1992.
[9] J. C. Principe, Information Theoretic Learning: Renyi’s Entropy and
Kernel Perspectives, Springer, New York, NY, USA, 2010.
[10] B. Chen, Y. Zhu, J. Hu, and J. C. Principe, System Parameter
Identification: Information Criteria and Algorithms, Elsevier,
Amsterdam, Netherlands, 2013.
[11] D. Erdogmus, and J. C. Principe, “An Error-Entropy Minimization
Algorithm for Supervised Training of Nonlinear Adaptive Systems,”
IEEE Trans. Signal Process., vol. 50, pp. 1780-1786, 2002.
[12] B. Chen, Y. Zhu, and J. Hu, “Mean-square convergence analysis of
ADALINE training with minimum error entropy criterion,” IEEE Trans.
Neural Netw., vol. 21, pp. 1168–1179, 2010.
[13] B. Chen, J. Hu, L. Pu, and Z. Sun, “Stochastic gradient algorithm under
(h, phi)-entropy criterion,” Circuit Syst. Signal Process., vol. 26, pp.
941–960, 2007.
[14] B. Chen and J. C. Principe, "On the smoothed minimum error entropy criterion," Entropy, vol. 14, no. 11, pp. 2311–2323, Nov. 2012.
[15] L. Shi and Y. Lin, "Convex combination of adaptive filters under the maximum correntropy criterion in impulsive interference," IEEE Signal Process. Lett., vol. 21, no. 11, pp. 1385–1388, 2014.
[16] J. Arenas-García, V. Gómez-Verdejo, and A. R. Figueiras-Vidal, “New
algorithms for improved adaptive convex combination of LMS
transversal filters,” IEEE Trans. Instrum. Meas., vol. 54, pp. 2239–2249,
2005.
[17] M. T. M. Silva and V. H. Nascimento, "Improving the tracking capability of adaptive filters via convex combination," IEEE Trans. Signal Process., vol. 56, no. 7, part 2, pp. 3137–3149, Jul. 2008.
[18] J. Arenas-García, and A. R. Figueiras-Vidal, “Adaptive combination of
normalized filters for robust system identification,” Electronics Letters,
vol. 41, no. 15, pp. 874–875, Jul. 2005.
[19] W. Ma, H. Qu, G. Gui, L. Xu, J. Zhao, and B. Chen, "Maximum correntropy criterion based sparse adaptive filtering algorithms for robust channel estimation under non-Gaussian environments," Journal of the Franklin Institute, vol. 352, pp. 2708–2727, Apr. 2015.
[20] S. Zhao, B. Chen, and J. C. Principe, “Kernel adaptive filtering with
maximum correntropy criterion,” in Proc. Inter. Joint Conf. Neural
Netw., pp. 2012–2017, Aug. 2011.