Independent Component Analysis: Generative models, cost functions and learning algorithms
Leandro Di Persia ([email protected])
Tópicos Selectos en Aprendizaje Maquinal, Doctorado en Ingeniería, FICH-UNL
October 1, 2010

Organization
1 Introduction
2 Linear ICA
3 Optimization algorithms
4 Cost functions
5 ICA algorithms
6 Extensions

Introduction

Basic ICA
Objective: given n signals (measurements or mixtures) and a properly chosen mixture model, obtain m sources that are statistically independent and, possibly, also the mixing conditions.
Main hypotheses:
- Each source signal is a random process.
- The samples of each source are iid random variables.
- The sources are statistically independent.
- The mixtures are produced by some specific generative model.

General concepts
Components of ICA: ICA = Hypothesis + Cost function + Algorithm.
Hypotheses:
1 Generative model (linear, nonlinear, noisy, instantaneous, convolutive)
2 Source signals (uncorrelated, colored)
3 Source distributions (subgaussian, laplacian, general)
4 Number of sources (n = m, n < m, n > m)
5 ...
Cost function (measure of independence):
- Negentropy (nongaussianity)
- Maximum likelihood
- Mutual information
- HOS (tensorial methods)
- ...
Optimization algorithm:
1 Gradient search (ascent or descent)
2 Stochastic gradient search (ascent or descent)

Useful statistical concepts
Random vectors and variables:
- Random vector: a collection of random variables, x = [x1, x2, ..., xn]^T.
- The pdf of x is px(x) = px1,...,xn(x1, x2, ..., xn).
- Marginal pdf: pxi(xi) = ∫ px(x) dx1 dx2 ... dxi-1 dxi+1 ... dxn.
- Expectation: for any g(x), E{g(x)} = ∫ g(x) px(x) dx.
- Estimation of the expectation from T samples: E{g(x)} ≈ (1/T) Σ_{i=1}^{T} g(x_i).
- Mean of x: mx = E{x}, with components mxi = E{xi}.
- Moments of a random variable: αj = E{x^j}.
- Central moments of a random variable: μj = E{(x − mx)^j}.

Random vectors:
- Correlation: Rx = E{x x^T}, with rij = E{xi xj}.
- Covariance: Cx = E{(x − mx)(x − mx)^T}.
- Cross-correlation: Rxy = E{x y^T}, with rij = E{xi yj}.
- Cross-covariance: Cxy = E{(x − mx)(y − my)^T}.
- Uncorrelatedness: Cxy = 0, or equivalently Rxy = mx my^T.
- Note: the last definition also holds for random variables instead of random vectors.
- If y = x (uncorrelated components), Cx = diag(σ1^2, ..., σn^2).

Statistical properties:
- Statistical independence: px(x) = px1,...,xn(x1, x2, ..., xn) = px1(x1) px2(x2) ... pxn(xn).
- Meaning: if x and y are independent, knowledge of the value of x gives no information about y.
- Property: if two random variables are independent, E{g(x)h(y)} = E{g(x)} E{h(y)}.
- Property: if y = g(x) and x = g^{-1}(y), then py(y) = px(g^{-1}(y)) / |det Jg(g^{-1}(y))|.
- Special case: y = g(x) = Ax implies py(y) = px(A^{-1} y) / |det A|.
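As a quick numerical check of the factorization property E{g(x)h(y)} = E{g(x)} E{h(y)}, the following Python/NumPy sketch compares the two sides on independent samples (the distributions and the functions g and h are arbitrary illustrative choices, not taken from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# two independent random variables (distributions chosen arbitrarily)
x = rng.uniform(-1.0, 1.0, size=200_000)
y = rng.laplace(size=200_000)

def g(v):
    return v ** 3

def h(v):
    return np.cos(v)

lhs = np.mean(g(x) * h(y))            # sample estimate of E{g(x) h(y)}
rhs = np.mean(g(x)) * np.mean(h(y))   # E{g(x)} E{h(y)}
print(lhs, rhs)                       # the two estimates agree up to sampling error
```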
Motivation (figure slides)

Linear ICA

Generative model
Mixing model: linear instantaneous mixing, x(t) = A s(t) + n(t).
- Instantaneous: all sources arrive at the same time at all sensors.
- There is no observation noise: n(t) = 0.
- A is the (unknown) mixing matrix.
- s(t) = [s1(t), ..., sm(t)]^T is the vector of sources (a multivariate random process).
- t is a sample-index variable, 0 ≤ t ≤ T − 1.

Hypotheses and problem statement
Hypotheses:
- The sources are independent: ps(s(t)) = ps1,...,sm(s1(t), ..., sm(t)) = Π_{i=1}^{m} psi(si(t)).
- Number of sources = number of mixtures, and A is invertible.
- At most one source distribution is Gaussian.
- mx = 0 and Cx = I.
Problem statement: find W such that ŝ = W x is as statistically independent as possible (in the ideal case, W = A^{-1}).

Uncorrelatedness vs. independence
- Under the hypothesis mx = 0, Cx = Rx = E{x x^T}.
- Uncorrelated variables: rij = E{xi xj} = 0 if i ≠ j.
- Question: does this imply that px1,...,xn(x1(t), ..., xn(t)) = Π_{i=1}^{n} pxi(xi(t))?
- No: ∫∫ xi xj pxi,xj(xi, xj) dxi dxj = 0 does not imply that pxi,xj(xi, xj) = pxi(xi) pxj(xj).
- Clearly, if they are also independent, E{xi xj} = E{xi} E{xj}.
- Special case: the multivariate Gaussian (uncorrelated jointly Gaussian variables are also independent).

Motivation (figure slides)

Indeterminacies
Amplitude ambiguity:
- The mixing equation means that xi = Σ_{j=1}^{m} aij sj.
- It is clear that a factor αj can be exchanged between the sources and the mixing matrix: xi = Σ_{j=1}^{m} (aij / αj)(αj sj).
- This means that we cannot recover the exact amplitude of the sources.
Permutation ambiguity:
- The mixing equation can be written x = A P^{-1} P s, with P a permutation matrix.
- As both A and s are unknown, this change is transparent to us.
- This means that we can recover the sources only up to an arbitrary ordering.

Indeterminacies: what can be obtained
- Ideally, we would like to find a proper W such that W A = I.
- Given the ambiguities mentioned, the best we can find is W A = P D.
- Here P is an arbitrary permutation matrix and D is a diagonal scaling matrix.
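A minimal NumPy sketch of the scaling and permutation ambiguities (the mixing matrix A and the matrices D and P below are arbitrary illustrative choices): rescaling and permuting the sources, while compensating inside the mixing matrix, produces exactly the same observations, which is why at best W A = P D can be recovered.

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical 2x2 mixing of two independent sources
s = rng.laplace(size=(2, 1000))          # sources (one per row)
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])               # "unknown" mixing matrix
x = A @ s                                # observed mixtures

# rescale and permute the sources, compensate inside the mixing matrix
D = np.diag([2.0, -0.5])                 # arbitrary scaling
P = np.array([[0, 1],
              [1, 0]])                   # permutation
s2 = P @ D @ s                           # modified "sources"
A2 = A @ np.linalg.inv(P @ D)            # compensated mixing matrix

print(np.allclose(x, A2 @ s2))           # True: the mixtures are identical
```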
Optimization algorithms

Objective: find an extremum of a functional J(W) or J(w), perhaps subject to some constraints.

Gradient search
- In this case the cost is the expected value E{J(W)} (or E{J(w)}).
- W_{t+1} = W_t ± μ(t) ∇_W E{J(W)}
- w_{t+1} = w_t ± μ(t) ∇_w E{J(w)}
- This is called batch learning, because the gradients are estimated over a whole sample.

Stochastic gradient search
- The expectations are replaced by their instantaneous values.
- W_{t+1} = W_t ± μ(t) ∇_W J(W)
- w_{t+1} = w_t ± μ(t) ∇_w J(w)
- This is called on-line learning, because the gradients are estimated by their instantaneous sample approximations.

Quadratic learning (Newton method)
- Suppose a cost J(w). A Taylor expansion around w gives:
  J(w') = J(w) + ∇_w J(w)(w' − w) + (1/2)(w' − w)^T [∂²J(w)/∂w²] (w' − w) + ...
- Ignoring the higher-order terms, δJ = J(w') − J(w) = ∇_w J(w) Δw + (1/2) Δw^T [∂²J(w)/∂w²] Δw.
- Taking the gradient with respect to Δw and equating it to zero gives:
  Δw = −[∂²J(w)/∂w²]^{-1} ∇_w J(w) = −H(w)^{-1} ∇_w J(w).
- The learning equation becomes w_{t+1} = w_t − H(w)^{-1} ∇_w J(w).
- The Hessian has to be positive (negative) definite for convergence to a minimum (maximum) of J(w).

Natural gradient
- The gradient used up to now was calculated under the assumption of a Euclidean parameter space.
- Usually the parameter space is not Euclidean but curved, with Riemannian geometry.
- The parameter space is a manifold (a differentiable space that is locally Euclidean).
- A Riemannian manifold possesses a metric tensor G, which characterizes the curvature of the space and is used to measure distances.
- The distance in a Euclidean space is dE(v, v + δv) = sqrt(δv^T δv); in a Riemannian space it is dw(w, w + δw) = sqrt(δw^T G δw).
- Using this, the correct search direction, the natural gradient, can be found.
- The learning equation is w_{t+1} = w_t ± μ(t) G^{-1} ∇_w J(w), where G is the Riemannian metric tensor.
- For matrix parameters G is a fourth-order tensor, and Amari showed that the learning equation reduces to W_{t+1} = W_t ± μ(t) ∇_W J(W) W^T W.
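As an illustrative sketch only (the function names and the step size mu are my own choices, and grad_W would come from whichever contrast is chosen later), the two matrix update rules compare as follows in NumPy:

```python
import numpy as np

def gradient_step(W, grad_W, mu=0.1):
    # ordinary (Euclidean) gradient ascent on a matrix parameter
    return W + mu * grad_W

def natural_gradient_step(W, grad_W, mu=0.1):
    # natural-gradient ascent: the Euclidean gradient is
    # right-multiplied by W^T W (Amari's result for this parameter space)
    return W + mu * grad_W @ W.T @ W
```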
Constrained optimization: Lagrange method
- Problem: find the extrema of J(w) subject to the restrictions Qi(w) = 0 for i = 1, ..., k.
- It can be solved by defining an auxiliary function, the Lagrangian: L(w, λ1, ..., λk) = J(w) + Σ_{i=1}^{k} λi Qi(w).
- Taking the gradient of L with respect to each parameter and equating it to zero yields a system of simultaneous equations that can be solved.

Constrained optimization: projection method
- Gradient update + orthogonal projection onto the constraint set.
- Example: for the constraint ||w|| = 1, normalize the resulting w after each update.
- Easier for simple constraints.

Cost functions

Types of functions
Type of parameters:
- Multi-unit functions
- One-unit functions
Type of function:
- Contrasts: functions of probability densities.
- Cost functions: functions of arbitrary parameters.

Principle of nongaussianity (figure slides)

Nongaussian is independent
- Central limit theorem: the sum of random variables has a distribution that is more Gaussian than any of the original distributions.
- Let y = w^T x = w^T A s = q^T s, with w some vector; y is a linear combination of the sources s.
- Idea: vary w and measure the nongaussianity of y; stop when maximum nongaussianity is reached.
- How can we measure nongaussianity?

Nongaussianity by kurtosis
- Definition: kurt(y) = E{y^4} − 3 (E{y^2})^2.
- For Gaussian distributions, kurt(y) = 0.
- Supergaussian distributions have kurt(y) > 0 and are spiky (e.g., laplacian).
- Subgaussian distributions have kurt(y) < 0 and are flat (e.g., uniform).
- |kurt(y)| can be used as a contrast (maximization).
- Problem: VERY sensitive to outliers!
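A small NumPy check of these signs (the sample size and the particular distributions are arbitrary illustrative choices), applying the definition kurt(y) = E{y^4} − 3 (E{y^2})^2 to unit-variance samples:

```python
import numpy as np

rng = np.random.default_rng(0)

def kurt(y):
    # kurt(y) = E{y^4} - 3 (E{y^2})^2
    return np.mean(y**4) - 3 * np.mean(y**2)**2

n = 100_000
print(kurt(rng.normal(size=n)))                        # ~0  (Gaussian)
print(kurt(rng.laplace(size=n)))                       # > 0 (supergaussian, spiky)
print(kurt(rng.uniform(-np.sqrt(3), np.sqrt(3), n)))   # < 0 (subgaussian, flat)
```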
Nongaussianity by negentropy
- Differential entropy: H(y) = −∫ py(η) log py(η) dη.
- A Gaussian variable has the largest entropy among all random variables of the same variance.
- Negentropy: N(y) = H(y_gaussian) − H(y), where y_gaussian is a Gaussian variable with the same covariance as y.
- Negentropy is always nonnegative and zero only for a Gaussian variable, so it can be used as a contrast (maximization).
- Problem: VERY difficult to handle, it needs an estimate of the pdf.

Approximations to negentropy
- Instead of negentropy, numerical approximations can be used.
- Polynomial approximation: N(y) ≈ (1/12) E{y^3}^2 + (1/48) kurt(y)^2.
- This has the same sensitivity to outliers as kurtosis.
- Hyvärinen showed that J(y) = [E{G(y)} − E{G(ν)}]^2 is an approximation to N(y), where ν is an N(0, 1) random variable and G is a nonquadratic function.
- This is maximized when E{G(y)} is as far as possible from its Gaussian value E{G(ν)}.

Likelihood
- Given x = A s and W = A^{-1}, with ps(s) the joint distribution of the sources:
  px(x) = ps(A^{-1} x) / |det A| = |det W| ps(W x).
- The expected log-likelihood is L(W) = E{log ps(W x)} + log |det W|.
- This can be maximized with respect to W using any method.
- Problem: it needs an estimate of ps(s), which implies knowledge of the sources!
- Note: Bell and Sejnowski found the same contrast function using a criterion of maximizing the information flow from input to output, in an algorithm called INFOMAX.

Mutual information
- Independence measure by mutual information: I(y1, y2, ..., yn) = Σ_{i=1}^{n} H(yi) − H(y).
- This is zero for independent recovered sources, so its minimization gives independence.
- It has been shown that I(y1, y2, ..., yn) = C − Σ_{i=1}^{n} N(yi): minimizing it is equivalent to maximizing the individual negentropies subject to an orthogonality constraint.
- It has also been shown that I(y1, y2, ..., yn) = −L(W) − C, so minimizing mutual information is equivalent to maximizing the log-likelihood.

ICA algorithms

Preprocessing
Centering:
- Most methods assume that mx = 0.
- Centering: x̂ = x − mx.
- After separation, ŝ = y + W mx restores the mean.
Sphering (whitening):
- One hypothesis is that Cx = I, which implies uncorrelated variables with unit variance.
- It is easily achieved by the transformation z = Q x with Q = D^{-1/2} E^T.
- These matrices are obtained from the eigendecomposition of the covariance, Cx = E D E^T.
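A minimal NumPy sketch of centering plus sphering, assuming the data matrix has one mixture per row (the function name and the returned values are my own choices):

```python
import numpy as np

def whiten(x):
    """Center and sphere the data x (one mixture signal per row).

    Returns z with zero mean and (approximately) identity covariance,
    plus the whitening matrix Q = D^{-1/2} E^T and the estimated mean.
    """
    m = x.mean(axis=1, keepdims=True)
    xc = x - m                           # centering
    C = np.cov(xc)                       # sample covariance matrix
    d, E = np.linalg.eigh(C)             # eigendecomposition: C = E diag(d) E^T
    Q = np.diag(1.0 / np.sqrt(d)) @ E.T  # whitening matrix
    z = Q @ xc
    return z, Q, m
```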
Kurtosis: gradient-based algorithm
- Maximization of kurtosis. Assume that E{x x^T} = I and yi = wi^T x.
- Problem: maximize κ(wi) = |kurt(yi)| = |E{yi^4} − 3 (E{yi^2})^2| subject to ||wi|| = 1.
- Taking the gradient: ∇_{wi} κ(wi) = 4 sign(kurt(yi)) [E{(wi^T x)^3 x} − 3 wi ||wi||^2].
- Update: ŵ_{i,t+1} = w_{i,t} + μ ∇_{wi} κ(wi).
- Using projection to enforce the constraint: w_{i,t+1} = ŵ_{i,t+1} / ||ŵ_{i,t+1}||.

Natural gradient Infomax
- Contrast: L(W) = E{log ps(W x)} + log |det W|.
- Taking the gradient: ∇_W L(W) = E{h(W x) x^T} + (1/|det W|) ∂|det W|/∂W,
  where h(W x) = [h1(y1), ..., hn(yn)]^T and hi(yi) = ∂ log psi(yi)/∂yi = p'si(yi)/psi(yi).
- Using ∂|det W|/∂W = (W^T)^{-1} det W sign(det W), this gives ∇_W L(W) = E{h(W x) x^T} + (W^T)^{-1}.
- The natural gradient is ∇_{W,nat} L(W) = ∇_W L(W) W^T W = [E{h(W x) y^T} + I] W.
- Finally, W_{t+1} = W_t + μ [E{h(W x) y^T} + I] W_t.

FastICA: fixed-point algorithm with quadratic convergence
- Problem: optimize the contrast J(yi) = E{G(wi^T x)} subject to ||wi||^2 = 1.
- Using the Lagrangian, L(wi) = E{G(wi^T x)} + λ(wi^T wi − 1).
- Optimizing by the Newton method results in:
  ŵ_{i,t+1} = E{g'(wi^T x)} wi − E{g(wi^T x) x},  w_{i,t+1} = ŵ_{i,t+1} / ||ŵ_{i,t+1}||.
- In these equations G(·) is a properly chosen nonquadratic nonlinear function, g(·) its derivative and g'(·) its second derivative.
- For example, G(y) = (1/a) log cosh(ay), g(y) = tanh(ay), g'(y) = a(1 − tanh²(ay)), with 1 ≤ a ≤ 2.
- To extract more units, a Gram-Schmidt orthogonalization scheme must be used after each iteration step.

FastICA: deflationary orthogonalization
- After updating wp, set wp = wp − Σ_{j=1}^{p−1} (wp^T wj) wj.
- Normalize wp by dividing it by its norm.
- Repeat until convergence.
- Increase p and repeat until all components are extracted.

Example FastICA (figure slides)
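A minimal, hedged NumPy sketch of deflationary FastICA with the tanh nonlinearity, assuming already centered and whitened data (the function name, defaults and convergence test are my own choices; the fixed-point step is written as w ← E{z g(w^T z)} − E{g'(w^T z)} w, which differs from the expression above only by an overall sign that the normalization removes):

```python
import numpy as np

def fastica_deflation(z, n_components, a=1.0, max_iter=200, tol=1e-6, seed=0):
    """Deflationary FastICA sketch with the tanh nonlinearity.

    z : array of shape (n, T), already centered and whitened (rows = signals).
    Returns the estimated sources y = W z and the unmixing matrix W.
    """
    rng = np.random.default_rng(seed)
    n, T = z.shape
    W = np.zeros((n_components, n))
    for p in range(n_components):
        w = rng.normal(size=n)
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            y = w @ z                                  # current one-unit output
            gy = np.tanh(a * y)                        # g(y)
            g_prime = a * (1.0 - gy ** 2)              # g'(y)
            # fixed-point update: w <- E{z g(w^T z)} - E{g'(w^T z)} w
            w_new = (z * gy).mean(axis=1) - g_prime.mean() * w
            # deflation: remove projections onto the already-found rows of W
            w_new -= W[:p].T @ (W[:p] @ w_new)
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1.0) < tol  # convergence up to sign
            w = w_new
            if converged:
                break
        W[p] = w
    return W @ z, W
```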
Extensions

More sensors than sources
- Without noise: use PCA for dimension reduction, then apply standard ICA.

Noisy ICA
- Model: x = A s + n.
- If also n > m (more sensors than sources), subspace projection algorithms can be used.
- Another alternative: a deflationary approach (needs an estimate of the number of sources).
- Other approaches: measures that are robust (blind) to Gaussian noise.

Nonlinear ICA
- Model: x = f(A s).
- Two steps: first estimate g(·) = f^{-1}(·) such that z = g(x) is linearly mixed; then apply standard ICA.

Convolutive ICA
- Model: x = A ∗ s, where A is a matrix of FIR filters.
- Time-domain methods: use similar contrasts, but the optimization is complex due to the convolutions.
- Frequency-domain methods: using the STFT, the mixture model becomes instantaneous for each frequency bin.

Bibliography
- A. Hyvärinen, J. Karhunen and E. Oja, Independent Component Analysis, John Wiley and Sons, 2001. Ch. 2, 3, 7, 8, 9, 10.
- A. Hyvärinen and E. Oja, "Independent component analysis: algorithms and applications", Neural Networks, Vol. 13, No. 4-5, pp. 411-430.
- S. Haykin, Unsupervised Adaptive Filtering, Volume I: Blind Source Separation, John Wiley and Sons, 2000. Ch. 2, 6, 8.

Problems: basic
- Show that Cxy = 0 ⇒ Rxy = mx my^T.
- Show that if px(x) is a multivariate Gaussian with mx = 0 and Cx = I, then y = W x with W orthogonal has the same distribution as x (py = px).
- Show that z = D^{-1/2} E^T x, where Cx = E D E^T is the eigendecomposition of the covariance matrix, is a whitening transformation.

Problems: advanced
- Implement the FastICA algorithm with deflationary learning.
- Using that algorithm, with two sources and a random mixing matrix, do the following:
  1 For each stage (sources, mixtures, whitened signals, separated signals) draw a scatter plot of the variables.
  2 After separation, estimate the P and D matrices.
  3 Question: is the separation matrix W the inverse of the mixing matrix A that was used?