Widrow-Hoff Learning
Carlo U. Nicola, SGI FH Aargau
With extracts from publications of: D. Demuth, University of Colorado; D.J.C. MacKay, Cambridge University

Adaline (Adaptive linear neuron)

Transfer function for Adaline

Adaline with only two inputs
The line divides the p1, p2 plane in two: a sort of classifier, like the perceptron. The two-input Adaline divides the plane along the line defined by:
w1 p1 + w2 p2 + b = 0

A digression on the Mean Square Error function (MSE)
If we want to find the optimal set of weights w that minimises the error between target and input vectors, it is convenient to define an error landscape function E(w) with the MSE and then try a gradient descent, hoping to reach a minimum. Assume that the training set consists of pairs containing an input pq and a target tq: (p1,t1), (p2,t2), ..., (pQ,tQ). The MSE function is then:
E(w) = (1/Q) Σq (tq − aq)²,   where aq is the Adaline output for input pq.
With E(w) we define a performance surface, i.e. the total error surface plotted in the space of the system coefficients (in our case the weights w). With the gradient vector (the tangent to the E(w) landscape) we then try to reach a global minimum for the weight vector:
w(k+1) = w(k) − α ∇E(w)|w=w(k)

Gradient or steepest descent
What does gradient descent (or steepest descent) mean? The MSE function is the total error landscape; the gradient is the tangent direction (a vector!), and following −grad crawls towards a minimum of the MSE (we hope so!). In the Adaline case the correction procedure performs a (stochastic) gradient descent on the n-dimensional mean square error surface, found by calculating the gradient of the error with respect to the weights. Notice the difference with the perceptron convergence procedure!

The meaning of approximate and stochastic in the LMS algorithm
The Least Mean Squares, or LMS, algorithm is a stochastic gradient algorithm that iterates each tap weight of the filter in the direction of the gradient of the squared amplitude of an error signal with respect to that tap weight. LMS is called a stochastic gradient algorithm because the gradient vector is chosen 'at random' and not, as in the steepest descent case, derived precisely from the shape of the total error surface. Random here means the instantaneous value of the gradient, which is then used as an estimator of the true quantity. In practice this simply means that LMS uses at each iteration step the actual value of the error (not the full MSE function!) to build the gradient.

How Adaline works
The main idea behind Adaline is simple: change the weights w of the input signals pq so that the LMS (least mean square) error between the teaching vector tq and the output signal a reaches a minimum. The training set consists of pairs containing an input pq and a target tq: (p1,t1), (p2,t2), ..., (pQ,tQ). With the new notation
x = [w; b],   z = [p; 1]
we tidy up the output of Adaline, a = w^T p + b = x^T z, and from that we derive an equation for the mean square error:
E(x) = E[e²] = E[(t − a)²] = E[(t − x^T z)²]

Adaline error analysis
Goal: under which conditions can we find a global minimum of the mean square error? Expanding the expectation, the mean square error for the Adaline network is a quadratic function:
E(x) = c − 2 x^T h + x^T R x,   with c = E[t²], h = E[t z] and R = E[z z^T]

Conditions for a global minimum
Goal: under which conditions can we find a global minimum x* of the weights from the mean square error? The correlation matrix R should be at least positive semidefinite (all eigenvalues ≥ 0). Setting the gradient to zero gives the stationary point:
∇E(x) = 2 R x − 2 h = 0  ⇒  x* = R⁻¹ h
The last equation holds if the matrix R is positive definite (the inverse of R then exists!).
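To make the last few slides concrete, here is a minimal numerical sketch (Python/NumPy, with a tiny made-up training set that is not from the slides). It builds the augmented vectors z = [p; 1], estimates R = E[z z^T] and h = E[t z], checks that R is positive definite, and compares the gradient of the full quadratic MSE with the single-sample 'stochastic' estimate that LMS uses.

```python
import numpy as np

# A small made-up training set: 2-dimensional inputs p and targets t.
P = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
T = np.array([1.0, 1.0, -1.0, -1.0])

# Augmented notation of the slides: z = [p; 1], x = [w; b],
# so the Adaline output is a = x^T z.
Z = np.hstack([P, np.ones((P.shape[0], 1))])

# Correlation matrix R = E[z z^T] and cross-correlation vector h = E[t z].
R = Z.T @ Z / len(T)
h = Z.T @ T / len(T)
c = np.mean(T**2)

# Condition for a unique global minimum: R positive definite.
print("eigenvalues of R:", np.linalg.eigvalsh(R))   # all > 0 -> R invertible

# Quadratic performance surface E(x) = c - 2 x^T h + x^T R x.
def E(x):
    return c - 2 * x @ h + x @ R @ x

def true_grad(x):
    return 2 * (R @ x - h)              # gradient of the full MSE

def lms_grad(x, k):
    e = T[k] - x @ Z[k]                 # instantaneous error e(k)
    return -2 * e * Z[k]                # single-sample (stochastic) estimate

x = np.zeros(3)
print("true gradient    :", true_grad(x))
print("LMS estimate, k=0:", lms_grad(x, 0))

# Global minimum of the quadratic surface: x* = R^{-1} h.
x_star = np.linalg.solve(R, h)
print("x* =", x_star, " E(x*) =", E(x_star))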
Approximate steepest descent (1)
For one sample k we get the following approximate mean square error:
Ê(x) = e²(k) = (t(k) − a(k))²
Applying the (stochastic) gradient to it means replacing ∇E(x) by ∇e²(k).

Approximate steepest descent (2)
The gradient is given by the product of the actual error with the input vector:
∇e²(k) = −2 e(k) z(k)

LMS algorithm
Finally, from the last equation of the preceding slides we can derive the learning rule for Adaline. This rule is another instance of the well known LMS (Least Mean Square) algorithm. From these equations we get (with some simple algebra) the learning rule for both the weights and the biases of Adaline:
w(k+1) = w(k) + 2α e(k) p(k)
b(k+1) = b(k) + 2α e(k)
α is a stability parameter we will estimate shortly.

Summary of Adaline LMS learning
Step 1: Initialize weights and thresholds: set the wij and the bj to small random values in the range [−1, +1].
Step 2 (gradient descent!): For each pattern pair (pk, tk) do:
(i) Input pk and calculate the actual output: aj = Σi wij pi + bj
(ii) Compute the error: e²(k) = (tk − ak)²
(iii) Adjust the weights: Δwij = 2α ej pi, where ej = tj − aj
Step 3: Repeat step 2 until the error is sufficiently low or zero.

LMS stability and the calculation of α
Without going into any details, we summarise the conditions under which the LMS algorithm finds a stable minimum. The condition says that the eigenvalues of [I − 2αR] must fall inside the unit circle. Since the eigenvalues λi of R satisfy λi > 0, this means:
0 < α < 1/λmax

Stability as a function of α
(Figure: behaviour for α = 0.1.)

Example: apple and banana sorter

Iteration 1
Banana:

Iteration 2
Apple:

Iteration 3

LMS in practice
The LMS algorithm is one of the most widely used learning procedures of all. Adaptive signal processing is largely based upon the LMS algorithm, with applications such as system modelling, statistical prediction, noise cancelling, adaptive echo cancelling, and channel equalization. As an example we will look at an adaptive filter for noise cancellation (an adaptive filter with a tapped delay line).

Noise cancellation example 1
Noise cancellation in the pilot's radio headset in a jet aircraft is a nice application of an adaptive filter. A jet engine can produce noise levels of over 140 dB. Since normal human speech occurs between 30 and 40 dB, it is clear that traditional filtering techniques will not work. To implement an adaptive technique, we place an additional microphone at the rear of the aircraft to record the engine noise directly. By taking advantage of the additional information this reference microphone gives us, we can substantially improve the signal for the pilot.
One might naively think of directly subtracting the reference noise signal from the primary signal to implement such noise cancellation. However, this will not work, because the noise at the reference microphone is not exactly the same as the noise in the jet cabin. There is a delay corresponding to the distance between the primary and reference microphones. Unknown acoustic effects, such as echoes or low-pass filtering, can also affect the noise as it travels through the fuselage of the aircraft. Even in the ideal case, the delay alone guarantees that simple subtraction will not properly cancel the noise. If we model the path from the noise source to the primary microphone as a linear system, we can devise an adaptive algorithm that trains an FIR filter to match the acoustic characteristics of the channel (a minimal sketch of such a filter follows).
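Below is a hedged sketch of that idea (Python/NumPy; all signals are synthetic stand-ins invented for this illustration, not aircraft data). A tapped delay line feeds the reference noise into an Adaline whose weights are updated with the LMS rule w(k+1) = w(k) + 2α e(k) p(k); the error e(k) is then the cleaned-up signal handed to the pilot.

```python
import numpy as np

# Synthetic illustration (all signals invented for this sketch):
# s(k) - pilot's voice, the signal we want to recover
# v(k) - engine noise picked up by the reference microphone
# m(k) - primary microphone: voice plus a delayed, attenuated copy of the noise
rng = np.random.default_rng(0)
K = 2000
k = np.arange(K)
s = 0.3 * np.sin(2 * np.pi * k / 40)          # "voice"
v = rng.standard_normal(K)                     # reference noise
noise_at_primary = 0.8 * np.roll(v, 1)         # assumed acoustic path: delay + gain
m = s + noise_at_primary                       # contaminated primary signal

n_taps = 2        # tapped delay line: v(k) and v(k-1)
alpha = 0.01      # must satisfy 0 < alpha < 1/lambda_max of R
w = np.zeros(n_taps)
restored = np.zeros(K)

for i in range(n_taps, K):
    p = v[i - n_taps + 1:i + 1][::-1]   # input vector [v(k), v(k-1)]
    a = w @ p                            # filter's estimate of the noise
    e = m[i] - a                         # error = cleaned signal estimate
    w = w + 2 * alpha * e * p            # LMS (Widrow-Hoff) update
    restored[i] = e

print("learned weights:", w)             # should approach roughly [0, 0.8]
print("residual noise power:", np.mean((restored[100:] - s[100:])**2))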
If we then apply this filter to the noise recorded at the reference microphone, we should be able to subtract out the noise recorded at the primary microphone, leaving us with an improved recording of the pilot's voice.

Noise cancellation example 2

Noise cancellation Adaline for example 2
We now apply our machinery to the two-input (one delay) Adaline:
z(k) = [v(k); v(k−1)],   a(k) = x^T z(k)

Calculation of R
R = E[z(k) z(k)^T], computed from the statistics of the noise source v(k).

Calculation of h
h = E[t(k) z(k)]. The cross-terms with the voice signal vanish (= 0), because v(k), v(k−1) and s(k) are not correlated.

Calculate the corrected weights x*
x* = R⁻¹ h. We can check how good our correction is by using the previous formula for the error: the cross-terms are again 0 (no correlation), so what remains in the error is essentially the voice signal we want (a numerical sketch follows below).

Phone echo cancellation
A last useful application of Adaline: echo cancellation in a phone conversation! Today every DSP (Digital Signal Processor) can be programmed as an Adaline adaptive filter.
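Returning to noise cancellation example 2: since the concrete numbers live in the slides' figures, here is a hedged numerical sketch under assumed signals (a sinusoidal noise source v(k) and an uncorrelated voice s(k), both invented here). It estimates R and h from the data, solves x* = R⁻¹ h, and confirms that the residual mean square error is essentially the voice power.

```python
import numpy as np

# Hedged sketch of the two-input (one delay) noise-cancelling Adaline.
# The concrete signals of example 2 are in the slides' figures; here we simply
# assume a sinusoidal noise source and a voice signal uncorrelated with it.
rng = np.random.default_rng(1)
K = 20000
k = np.arange(K)
v = 1.2 * np.sin(2 * np.pi * k / 3)        # assumed noise source
s = rng.uniform(-0.2, 0.2, K)              # assumed voice, uncorrelated with v
noise_path = 0.5 * np.roll(v, 1)           # assumed acoustic path (delay + gain)
t = s + noise_path                          # target = primary microphone signal

# Adaline inputs: z(k) = [v(k), v(k-1)] (drop the first sample, where roll wraps).
Z = np.column_stack([v, np.roll(v, 1)])[1:]
T = t[1:]

R = Z.T @ Z / len(T)                        # R = E[z z^T]
h = Z.T @ T / len(T)                        # h = E[t z]; the s-terms average to ~0
x_star = np.linalg.solve(R, h)              # optimal weights x* = R^{-1} h

print("R =\n", R)
print("h =", h)
print("x* =", x_star)

# Residual mean square error at x*: essentially the voice power,
# since v and s are uncorrelated and the filter matches the noise path.
e = T - Z @ x_star
print("residual MSE:", np.mean(e**2), " voice power:", np.mean(s**2))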