PROCEEDINGS OF THE INTERNATIONAL WORKSHOP ON TELECOMMUNICATIONS - IWT/09

On the improvement of the learning rate in Blind Source Separation using techniques from Artificial Neural Networks theory

Felipe Augusto Pereira de Figueiredo
Instituto Nacional de Telecomunicações - Inatel
P.O. Box 05 - 37540-000 Santa Rita do Sapucaí - MG - Brazil
[email protected]

Carlos Alberto Ynoguti
Instituto Nacional de Telecomunicações - Inatel
P.O. Box 05 - 37540-000 Santa Rita do Sapucaí - MG - Brazil
[email protected]

Abstract— In this work, techniques from the Artificial Neural Networks theory are used to improve the convergence speed of a blind source separation (BSS) algorithm. The momentum term, bold driver and exponential decay techniques were used, and experimental results show a reduction of the convergence time by a factor of about 16.

Index Terms— Blind source separation, Neural Networks, Natural Gradient, Multiple-Input-Multiple-Output (MIMO) Systems, Adaptive Filtering, convolutive mixtures, Second Order Statistics, Dynamic learning rate, Momentum term.

I. INTRODUCTION

The problem of blind source separation is illustrated in Figure 1.

Fig. 1. Linear MIMO model for BSS (sources s_q, mixing system H, sensor signals x_p, demixing system W, outputs y_q).

In this work a MIMO (Multiple-Input-Multiple-Output) model is assumed, in which the signals are convolutively mixed. Also, the number of source signals (s_q(n), q = 1, ..., Q) is assumed to be equal to the number of sensor signals (x_p(n), p = 1, ..., P). Each output of the mixing system H is described by

    x_p(n) = Σ_{q=1}^{Q} Σ_{k=0}^{M-1} h_qp(k) s_q(n - k),        (1)

where h_qp(k), k = 0, ..., M - 1 denote the coefficients of the filter from the q-th source to the p-th sensor. The problem of BSS consists in finding a corresponding demixing system, according to Figure 1, whose output signals y_q(n), q = 1, ..., P (P = Q) are described by

    y_q(n) = Σ_{p=1}^{P} Σ_{k=0}^{L-1} w_pq(k) x_p(n - k),        (2)

where L is the length of the demixing system filters. It can be shown (see, e.g., [1]) that the MIMO demixing system coefficients can in fact reconstruct [10] the sources up to an unknown permutation and an unknown filtering of the individual signals, provided that L is chosen to be at least equal to M.

To estimate the P²L coefficients w_pq(k) of the MIMO demixing filter W, this work adopts an approach based on second-order statistics [6], which exploits the nonwhiteness and nonstationarity properties of the signals. The nonwhiteness property is exploited by simultaneous diagonalization of output correlation matrices over multiple time-lags, e.g., [4], and the nonstationarity property is exploited by simultaneous diagonalization of short-time output correlation matrices at different time intervals, e.g., [5], [7]-[8]. In the sequel, an algorithm for convolutive mixtures is presented, by first introducing a general matrix formulation, following [1], that includes all time-lags.

II. TIME-DOMAIN ALGORITHM FOR BSS

In this section, the matrix formulation that allows the derivation of a time-domain [9] algorithm from a cost function which inherently takes the nonstationarity and nonwhiteness properties into account is introduced.

A. Matrix Notation for Convolutive Mixtures

From Fig. 1, it can be seen that the output signals y_q(n), q = 1, ..., P of the demixing system at time n are given by

    y_q(n) = Σ_{p=1}^{P} x_p^T(n) w_pq,        (3)

where

    x_p(n) = [x_p(n), x_p(n - 1), ..., x_p(n - L + 1)]^T        (4)

is a vector containing the latest L samples of the sensor signal x_p(n) of the p-th channel, and

    w_pq = [w_pq,0, w_pq,1, ..., w_pq,L-1]^T        (5)

contains the current weights of the MIMO filter taps from the p-th sensor channel to the q-th output channel.
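As an illustration, the mixing model (1) and the demixing model (2) can be sketched as plain channel-wise convolutions. The code below is a sketch only; the array layout, function names and sizes are assumptions made for this example, not part of the original algorithm.

```python
import numpy as np

def mix(s, H):
    # Eq. (1): x_p(n) = sum_q sum_k h_qp(k) s_q(n - k)
    # s: (Q, T) source signals; H: (Q, P, M) mixing filters h_qp
    Q, T = s.shape
    _, P, M = H.shape
    x = np.zeros((P, T))
    for p in range(P):
        for q in range(Q):
            x[p] += np.convolve(s[q], H[q, p])[:T]
    return x

def demix(x, W):
    # Eq. (2): y_q(n) = sum_p sum_k w_pq(k) x_p(n - k)
    # x: (P, T) sensor signals; W: (P, Q, L) demixing filters w_pq
    P, T = x.shape
    _, Q, L = W.shape
    y = np.zeros((Q, T))
    for q in range(Q):
        for p in range(P):
            y[q] += np.convolve(x[p], W[p, q])[:T]
    return y
```

With a unit impulse on the first tap of each W_pp and zeros elsewhere (the initialization later used in Section IV), demixing returns the sensor signals unchanged.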
An algorithm for BSS of convolutive signals which exploits those two signal properties can be obtained from the definition of the following matrix:

    Y_q(m) = | y_q(mL)        y_q(mL-1)      ···  y_q(mL-L+1) |
             | y_q(mL+1)      y_q(mL)        ···  y_q(mL-L+2) |
             | ...                                            |
             | y_q(mL+N-1)    y_q(mL+N-2)    ···  y_q(mL-L+N) |        (6)

where m denotes the time index of the block being processed and N is the length of the system output blocks taken into account for the estimates of the short-time correlations used below. This matrix captures L subsequent shifts of the output signal vector

    y_q(m) = [y_q(mL), ..., y_q(mL + N - 1)]^T        (7)

in order to incorporate L time-lags in the cost function, so that the algorithm is able to exploit the nonwhiteness property. With the definitions above, (2) can be rewritten as

    Y_q(m) = Σ_{p=1}^{P} X_p(m) W_pq.        (8)

The approach followed here uses overlapping data blocks to increase the convergence rate and reduce the signal delay. Overlapping is introduced by simply replacing the time index mL in the equations by mL/a, with the overlap factor 1 ≤ a ≤ L. The matrices X_p(m), p = 1, ..., P used in (8) are defined as

    X_p(m) = | x_p(mL)        x_p(mL-1)      ···  x_p(mL-2L+1) |
             | x_p(mL+1)      x_p(mL)        ···  x_p(mL-2L+2) |
             | ...                                             |
             | x_p(mL+N-1)    x_p(mL+N-2)    ···  x_p(mL-2L+N) |        (9)

These matrices are Toeplitz with dimension N × 2L: the first row contains 2L input samples, and each subsequent row is shifted by one sample and thus contains one new input sample. The W_pq are 2L × L Sylvester matrices, defined as

    W_pq = | w_pq,0      0           ···  0        |
           | w_pq,1      w_pq,0      ···  0        |
           | ...         ...              ...      |
           | w_pq,L-1    w_pq,L-2    ···  w_pq,0   |
           | 0           w_pq,L-1    ···  w_pq,1   |
           | ...         ...              ...      |
           | 0           0           ···  w_pq,L-1 |        (10)

Finally, to allow a convenient notation of the algorithm combining all channels, (8) can be compactly rewritten as

    Y(m) = X(m) W,        (11)

where

    Y(m) = [Y_1(m), Y_2(m), ..., Y_P(m)],        (12)

    X(m) = [X_1(m), X_2(m), ..., X_P(m)],        (13)

    W = | W_11  ···  W_1P |
        | ...        ...  |
        | W_P1  ···  W_PP |        (14)
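The block formulation above can be checked numerically. The sketch below assumes the indexing convention that element (i, j) of X_p(m) is x_p(mL + i - j), consistent with the matrices shown in (6) and (9); the helper names are hypothetical.

```python
import numpy as np

def output_block_toeplitz(x, start, N, width):
    # N x width Toeplitz block with element (i, j) = x[start + i - j],
    # matching X_p(m) in Eq. (9) (width = 2L) and Y_q(m) in Eq. (6)
    # (width = L), with start = m*L.
    return np.array([[x[start + i - j] for j in range(width)]
                     for i in range(N)])

def sylvester(w):
    # 2L x L Sylvester matrix of Eq. (10): column j carries the L
    # filter taps w_pq,0 ... w_pq,L-1 starting at row j; zeros elsewhere.
    L = len(w)
    W = np.zeros((2 * L, L))
    for j in range(L):
        W[j:j + L, j] = w
    return W
```

Under this convention, Y_q(m) = Σ_p X_p(m) W_pq from (8) reproduces, entry by entry, the convolution of Eq. (2): the (i, j) entry equals y_q(mL + i - j).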
B. Cost Function and Algorithm Derivation

Having defined the compact matrix formulation (11) for the block MIMO filtering, the following cost function, which explicitly contains correlation matrices including several time-lags under the assumption of short-time stationarity, is defined. This cost function is based on a generalization of Shannon's mutual information [11], [12] and simultaneously accounts for the two signal properties used here:

    J(m) = (1/M) Σ_{i=0}^{M-1} { log |bdiag Y^T(i)Y(i)| - log |Y^T(i)Y(i)| },        (15)

where the bdiag operation on a block matrix consisting of several submatrices sets all submatrices on the off-diagonal to zero. The cost function shown above was first introduced in [13] as a generalization of [14]. Since the matrix formulation (11) is used for calculating the short-time correlation matrices Y^T(m)Y(m), the cost function inherently includes all L time-lags of all auto-correlations and cross-correlations of the BSS output signals.

By Oppenheim's inequality [15],

    Σ_q log |Y_q^T(m)Y_q(m)| ≥ log |Y^T(m)Y(m)|,

it is ensured that the first term in the braces of (15) is always greater than or equal to the second term, where equality holds if all off-block-diagonal elements of Y^T(m)Y(m), i.e., the output cross-correlations over all time-lags, vanish.

The algorithm is based on the first-order gradient. In order to express the update equations of the filter coefficients exclusively by Sylvester matrices W, the gradient is taken with respect to W, and the Sylvester structure of the result is ensured by selecting the non-redundant values using a constraint.
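A minimal sketch of one summand of the cost function (15), using hypothetical helper names and log-determinants for numerical stability:

```python
import numpy as np

def bdiag(C, P, L):
    # keep only the P diagonal L x L blocks of the PL x PL matrix C,
    # zeroing all off-diagonal blocks
    B = np.zeros_like(C)
    for p in range(P):
        sl = slice(p * L, (p + 1) * L)
        B[sl, sl] = C[sl, sl]
    return B

def cost_term(Y, P, L):
    # one summand of Eq. (15): log|bdiag Y^T Y| - log|Y^T Y|
    # Y: N x (P*L) stacked output block [Y_1, ..., Y_P]
    C = Y.T @ Y
    _, logdet_full = np.linalg.slogdet(C)
    _, logdet_bdiag = np.linalg.slogdet(bdiag(C, P, L))
    return logdet_bdiag - logdet_full
```

Consistent with Oppenheim's inequality, this term is nonnegative whenever Y^T Y is positive definite, vanishing when the cross-correlation blocks vanish.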
    ∇_W J(m) = ∂J(m)/∂W        (16)

As a result,

    ∇_W J(m) = (2/M) Σ_{i=0}^{M-1} R_xy(i) R_yy^{-1}(i) (R_yy(i) - bdiag R_yy(i)) bdiag^{-1} R_yy(i),        (17)

where R_xy(i) = X^T(i)Y(i) and R_yy(i) = Y^T(i)Y(i) denote the short-time correlation matrices. With an iterative optimization procedure, the current demixing matrix is obtained by the recursive update equation

    W(m) = W(m-1) - µ(m) ∆W(m).        (18)

The parameter µ(m) gives the length of the step in the negative gradient direction and is often called the step size or learning rate. This parameter can be made either dynamic or constant, depending on the technique adopted, as will be shown in the next section. The choice of an appropriate learning rate µ is essential for the convergence of the algorithm: a very small value leads to slow convergence; on the other hand, very large values lead to overshooting and instability, which prevent convergence altogether. As is well known, non-quadratic cost functions may have many local maxima and minima, and therefore good choices of initial values are important.

C. Natural Gradient

The gradient of a function J(m) points in the steepest direction in the Euclidean orthogonal coordinate system. However, the parameter space is not always Euclidean; in fact, it has a Riemannian metric structure, as pointed out by Amari [17]. In such a case, the steepest direction is given by the so-called natural gradient instead. Therefore, in order to use the natural gradient as the update term ∆W(m), the following modification is applied to the descent gradient:

    ∇_W^NG J(m) = W W^T ∇_W J(m) = W W^T ∂J(m)/∂W,        (19)

and then we have

    ∇_W^NG J(m) = (2/M) Σ_{i=0}^{M-1} W(m) (R_yy(i) - bdiag R_yy(i)) bdiag^{-1} R_yy(i).        (20)

III. LEARNING RATE ADAPTATION

The Artificial Neural Networks literature has produced a large number of heuristic techniques for boosting gradient-descent-based algorithms: using an adjustable learning rate, adding some kind of derivative term, clever choice of the initial value, etc. In this work, some of these strategies are applied to the BSS algorithm, and the results are described.

A. Momentum Term

The momentum term is a simple and effective technique for increasing the learning rate while avoiding the danger of instability. It is represented by the following equation:

    ψ(m) = β (W(m-1) - W(m-2)),        (21)

where 0 < β < 1 is a new global parameter which must be determined by trial and error. The use of the momentum term produces the following update equation:

    W(m) = W(m-1) - µ(m) ∆W(m) + ψ(m).        (22)

Momentum simply adds a fraction of the previous weight update to the current one. It is important to note that, in this case, the learning rate parameter µ is kept constant. When the gradient keeps pointing in the same direction, momentum increases the size of the steps taken towards the minimum; when the gradient keeps changing direction, momentum smooths out the variations. This technique may also have the benefit of preventing the algorithm from terminating in a shallow local minimum of the error surface.

B. Bold Driver

A useful batch method for adapting the global learning rate µ is the so-called bold driver technique. Its operation is simple: after each epoch, compare the value of the cost function with its previous value. If it has decreased, increase µ by a small proportion (typically 1%-10%; 10% was adopted in this work). If it has increased by more than a tiny proportion (say, 10^-10), undo the last weight change and decrease µ sharply (by 50% in this work). Thus, bold driver keeps growing µ slowly until it finds itself taking a step that has clearly gone too far, up onto the opposite slope of the cost function. Since this means that the algorithm has reached a tricky area of the cost function surface, it makes sense to reduce the step size quite drastically at this point.

C. Exponential Decay

This is a simple non-adaptive technique, since it does not rely on any output values of the algorithm. It is an effective technique that can be used to accelerate the search for the demixing filter coefficients. The following empirically derived equation is adopted as the time-variant learning factor:

    µ(m) = µ0 e^(-m²/(10M)),        (23)

where µ0 is the initial value of the function, M is the number of epochs adopted, and m is the current epoch. The function described above is time-variant: it starts at µ0 and then decreases gradually at each epoch of the algorithm. At the beginning of the learning process the convergence is very fast, since the value of µ is high, and as a result the algorithm can quickly find its way towards the minimum of the cost function; at the end of the process, the small values of µ provide a fine tuning of the parameters being estimated.
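The three strategies above can be illustrated on a toy one-dimensional quadratic cost, a stand-in for the actual BSS cost function; all names and parameter values here are illustrative, not taken from the paper's experiments.

```python
import numpy as np

# Toy quadratic cost standing in for J(m); grad is its derivative.
cost = lambda w: 0.5 * w ** 2
grad = lambda w: w

def train(schedule, epochs=100, w0=5.0, mu0=0.1, beta=0.0):
    # beta > 0 adds the momentum term of Eq. (21) to the update of Eq. (22)
    w = w_prev = w0
    mu, J_prev = mu0, cost(w0)
    for m in range(epochs):
        psi = beta * (w - w_prev)              # momentum term, Eq. (21)
        w_new = w - mu * grad(w) + psi         # update, Eq. (22)
        J = cost(w_new)
        if schedule == "bold driver":
            if J < J_prev:
                mu *= 1.10                     # grow 10% on improvement
            elif J > J_prev + 1e-10:
                mu *= 0.5                      # shrink 50% ...
                continue                       # ... and undo the step
        elif schedule == "exp decay":
            mu = mu0 * np.exp(-m ** 2 / (10 * epochs))   # Eq. (23)
        w_prev, w, J_prev = w, w_new, J
    return w
```

On this toy cost, all schedules drive the parameter from w0 = 5 toward the minimum at zero; the fixed schedule corresponds to the baseline constant step size.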
IV. EXPERIMENTS AND RESULTS

In this section, the results of the experiments performed with the techniques described above are presented. The experiments were conducted using speech signals convolved with synthetic impulse responses simulating the acoustical behavior of real rooms [18]; for this purpose, filters with 100 taps were used. Two audio signals with 5 seconds of speech each, corresponding to a male and a female speaker, were passed through the filters. The recordings were made in a low-noise environment, with a sampling frequency of 11025 Hz and 16 bits of resolution. As mentioned before, the number of source signals is assumed to be equal to the number of sensors (two).

The Signal-to-Interference Ratio (SIR), defined as the ratio of the power of the target signal to the power of the jammer signal, was used to evaluate the performance of the algorithm. The SIR measured at the input of the demixing system was 5.1496 dB; the SIR measured at the output of the system for each of the techniques is presented below.

To decrease the delay introduced in the output signal, and to increase the convergence rate as well, the overlapping method described in Section II-A was adopted, with an overlap factor a = 2, i.e., 50% overlap. The length L of the demixing filters was made equal to the length M of the mixing filters. The demixing filters W_pp were initialized with a unit impulse at the first tap, and all the taps of the filters W_pq, p ≠ q, were set to zero.
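A minimal sketch of the SIR figure of merit used below, assuming power is measured as the mean square of the signals:

```python
import numpy as np

def sir_db(target, interference):
    # SIR = 10 log10(P_target / P_interference), powers as mean squares
    p_t = np.mean(np.asarray(target, dtype=float) ** 2)
    p_i = np.mean(np.asarray(interference, dtype=float) ** 2)
    return 10.0 * np.log10(p_t / p_i)
```

In practice the target and interference components at each output would be obtained by filtering each source separately through the mixing and demixing systems.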
In the following sections, experimental results are reported in order to compare the performance of the proposed methods for modifying the learning rate parameter µ.

A. Initial Test

In [1], [9], [13] and [16] the parameter µ is always constant. The first test follows this strategy, in order to establish a baseline performance for the system. A value of µ = 0.002 was chosen, and the result can be seen in Figure 2. Convergence was achieved after 1300 training epochs, and the final SIR was 36.7790 dB.

Fig. 2. Initial test (SIR versus training epochs).

B. Momentum

The second test makes use of the momentum term (Equation (22)) to re-estimate the demixing matrix. As in the previous case, a learning rate µ = 0.002 was used. The value of β that led to the best result was 0.8. The result of this experiment is shown in Figure 3. For this test, convergence was achieved after 270 training epochs, leading to a final SIR of 36.7556 dB.

Fig. 3. Momentum (SIR versus training epochs).

C. Bold Driver

An initial value µ0 = 0.002 was set for the learning rate. The factors used to increase and decrease the value of this parameter were 10% and 50%, respectively. The result of the application of this technique is presented in Figure 4. Convergence was reached after 130 training epochs, and the final SIR was 35.8116 dB.

Fig. 4. Bold driver (SIR versus training epochs).

D. Exponential Decay

For this case, the initial value of the learning rate was chosen to be µ0 = 0.07. This higher value was chosen to accelerate the convergence. The results are shown in Figure 5. This technique led to a SIR of 36.8 dB in 80 training epochs.

Fig. 5. Exponential decay (SIR versus training epochs).

The oscillatory behavior seen in the exponential decay curve is due to changes in the sign, i.e., changes in the direction of the gradient at each epoch of the algorithm. In some cases the error surface has substantially different curvature along different directions, leading to the formation of long narrow valleys. For most points on the surface, the gradient does not point towards the minimum, and successive steps of gradient descent can oscillate from one side to the other, progressing only very slowly towards the minimum.

E. Experiments with Combined Techniques

As additional tests, some of the techniques mentioned above were combined, with the intention of verifying the results presented by these combinations. For all the cases shown below, the same set of parameters as those used when testing the techniques individually was adopted. Two such combinations were tested:

• Bold Driver + Momentum
• Exponential Decay + Momentum

The results are shown in Figures 6 and 7.

Fig. 6. Bold driver + momentum (SIR versus training epochs).

Fig. 7. Exponential decay + momentum (SIR versus training epochs).

Using bold driver combined with the momentum term, no noticeable changes were observed, either in the final SIR or in the convergence speed. Combining the momentum and the exponential decay techniques led to a worse result than using them separately. A possible reason for this behavior is that the latter gradually decreases the learning rate µ at each training epoch; after some epochs its value tends towards zero, resulting in a flat Signal-to-Interference Ratio. In order to avoid the flat state being reached before the algorithm can converge to its optimal result, an appropriately higher value of µ0 should be chosen.
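A quick numeric check of the decay schedule (23) illustrates how fast µ approaches zero; µ0 = 0.07 follows Section IV-D, while M = 500 epochs is an assumed run length for this example.

```python
import math

mu0, M = 0.07, 500   # mu0 from Sec. IV-D; M = 500 is an assumed epoch count
mu = lambda m: mu0 * math.exp(-m ** 2 / (10 * M))

early = mu(10)    # ≈ 0.0686, still close to mu0
mid = mu(100)     # ≈ 0.00947, already an order of magnitude smaller
late = mu(500)    # ≈ 1.4e-23, effectively zero: the "flat state"
```

Once µ is effectively zero, neither the gradient step nor any term that depends on recent weight changes can move the solution any further.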
Since the momentum term is a kind of derivative technique, smoothing out the learning trajectory of the algorithm, its effect once this flat state is reached is also negligible, as the momentum values likewise tend to zero.

F. Comparison Among All Techniques

The results presented above are summarized in Table I. The first line shows the results of the original approach, with a fixed step size; next are the results for the bold driver (BD), exponential decay (ED) and momentum techniques; finally, the results of the combined techniques are given.

TABLE I
COMPARISON OF THE DIFFERENT METHODS

Technique        Convergence epoch   SIR (dB)   Convergence time (min)
Fixed            1300                36.7790    317.42 (5.29 hours)
BD               130                 35.8116    32.39
ED               80                  36.8       21.88
Momentum         270                 36.7556    67.99
Momentum + ED    120                 29.2217    29.36
Momentum + BD    135                 36.2502    33.885

The analysis of these results shows that all the proposed techniques, except the combination of the momentum term with the exponential decay, lead to the same SIR as the fixed step size approach. The convergence time, however, is much lower: the fastest method converges in about 1/16 of the time required by the fixed step size.

V. CONCLUSIONS AND FUTURE WORK

In this work, techniques derived from the Artificial Neural Networks theory were used to improve the algorithm proposed by H. Buchner and colleagues [1]. The main idea presented here is to dynamically modify the learning rate µ. Three such techniques were proposed: momentum, bold driver and exponential decay; in addition, combinations of the momentum term with the bold driver and with the exponential decay were implemented and tested. All of them sped up the convergence when compared with the fixed step size proposed in [1]: the final SIR was similar for all the techniques, but the convergence time was much lower for all the proposed methods.
The exponential decay technique achieved the best results of all, with a reduction of about 16 times in the number of training epochs until convergence. As future work, different strategies for choosing the initial values of the demixing filters are being considered.

ACKNOWLEDGEMENTS

The authors would like to thank CAPES for partial funding.

REFERENCES

[1] H. Buchner, R. Aichner, and W. Kellermann, "A generalization of blind source separation algorithms for convolutive mixtures based on second-order statistics", IEEE Trans. Speech Audio Process., vol. 13, no. 1, pp. 120-134, Jan. 2005.
[2] E. C. Cherry, "Some experiments on the recognition of speech, with one and with two ears", Journal of the Acoustical Society of America, vol. 24, pp. 975-979, 1953.
[3] J. Herault and C. Jutten, "Space or time adaptive signal processing by neural network models", AIP Conference Proceedings, vol. 151, pp. 206-211, 1986.
[4] L. Molgedey and H. G. Schuster, "Separation of a mixture of independent signals using time delayed correlations", Physical Review Letters, vol. 72, pp. 3634-3636, 1994.
[5] E. Weinstein, M. Feder, and A. Oppenheim, "Multi-channel signal separation by decorrelation", IEEE Trans. on Speech and Audio Processing, vol. 1, no. 4, pp. 405-413, Oct. 1993.
[6] R. Battiti, "First- and second-order methods for learning: between steepest descent and Newton's method", Technical report, University of Trento, 1991.
[7] S. Van Gerven and D. Van Compernolle, "Signal separation by symmetric adaptive decorrelation: stability, convergence, and uniqueness", IEEE Trans. Signal Processing, vol. 43, no. 7, pp. 1602-1612, 1995.
[8] C. L. Fancourt and L. Parra, "The coherence function in blind source separation of convolutive mixtures of non-stationary signals", in Proc. Int. Workshop on Neural Networks for Signal Processing (NNSP), 2001.
[9] R. Aichner et al., "Time-domain blind source separation of non-stationary convolved signals with utilization of geometric beamforming", in Proc. Int. Workshop on Neural Networks for Signal Processing, Martigny, Switzerland, 2002.
[10] A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis, Wiley & Sons, Inc., New York, 2001.
[11] C. E. Shannon and W. Weaver, The Mathematical Theory of Communication, University of Illinois Press, Urbana, Illinois, 1949.
[12] T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley & Sons, New York, 1991.
[13] H. Buchner, R. Aichner, and W. Kellermann, "A generalization of a class of blind source separation algorithms for convolutive mixtures", in Proc. IEEE Int. Symposium on Independent Component Analysis and Blind Signal Separation (ICA), Nara, Japan, Apr. 2003, pp. 945-950.
[14] K. Matsuoka, M. Ohya, and M. Kawamoto, "A neural net for blind separation of nonstationary signals", Neural Networks, vol. 8, no. 3, pp. 411-419, 1995.
[15] A. Oppenheim, "Inequalities connected with definite hermitian forms", J. London Math. Soc., vol. 5, pp. 114-119, 1930.
[16] H. Buchner, R. Aichner, and W. Kellermann, "A generalization of blind source separation algorithms for convolutive mixtures based on second-order statistics", IEEE Trans. on Speech and Audio Processing, vol. 13, no. 1, pp. 120-134, Jan. 2005.
[17] S. I. Amari, "Natural gradient works efficiently in learning", Neural Computation, vol. 10, no. 2, pp. 251-276, Feb. 1998.
[18] S. G. McGovern, "A Model for Room Acoustics", http://www.2pi.us/rir.html, 2003-2004.