Independent Component Analysis:
Generative models, cost functions and learning algorithms

Leandro Di Persia
[email protected]
Tópicos Selectos en Aprendizaje Maquinal
Doctorado en Ingeniería, FICH-UNL
October 1, 2010
Organization

1 Introduction
2 Linear ICA
3 Optimization algorithms
4 Cost functions
5 ICA algorithms
6 Extensions
Basic ICA

Objective
Given n signals (measurements or mixtures), and a properly chosen mixture model, obtain m sources that are statistically independent, and possibly also the mixing conditions.

Main hypotheses
Each source signal is a random process.
The samples of each source are iid random variables.
The sources are statistically independent.
The mixtures are produced by some specific generative model.
General Concepts

Components of ICA
ICA = Hypotheses + Cost Function + Algorithm

Hypotheses:
1 Generative model (linear, nonlinear, noisy, instantaneous, convolutive)
2 Source signals (uncorrelated, colored)
3 Source distributions (subgaussian, Laplacian, general)
4 Number of sources (n = m, n < m, n > m)
5 ...

Cost function (measure of independence):
Negentropy (nongaussianity)
Maximum likelihood
Mutual information
HOS (tensorial methods)
...

Optimization algorithm:
1 Gradient search (ascent or descent)
2 Stochastic gradient search (ascent or descent)
Useful statistical concepts

Random vectors and variables
Random vector: a collection of random variables, x = [x_1, x_2, ..., x_n]^T.
The pdf of x is p_x(x) = p_{x_1,...,x_n}(x_1, x_2, ..., x_n).
Marginal pdf: p_{x_i}(x_i) = ∫ p_x(x) dx_1 dx_2 ··· dx_{i-1} dx_{i+1} ··· dx_n.
Expectation: for any g(x), E{g(x)} = ∫ g(x) p_x(x) dx.
Estimation of expectation (T samples): E{g(x)} ≈ (1/T) Σ_{i=1}^{T} g(x_i).
Mean: m_x = E{x}, with m_{x_i} = E{x_i}.
Moments of a random variable: α_j = E{x^j}.
Central moments of a random variable: μ_j = E{(x − m_x)^j}.
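As a concrete illustration of the sample-average estimator above, here is a minimal numpy sketch (the Laplacian test data, sample size and number of variables are arbitrary choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.laplace(size=(3, 10000))        # 3 random variables, T = 10000 samples (columns)

m_x = X.mean(axis=1)                    # mean: m_x = E{x}
Xc = X - m_x[:, None]                   # centered samples
C_x = (Xc @ Xc.T) / X.shape[1]          # covariance: C_x = E{(x - m_x)(x - m_x)^T}

# Central moments of one variable, mu_j = E{(x - m_x)^j}, as sample averages
mu2 = np.mean(Xc[0] ** 2)               # second central moment (variance)
mu4 = np.mean(Xc[0] ** 4)               # fourth central moment
```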
Useful statistical concepts

Random vectors
Correlation: R_x = E{x x^T}, with r_ij = E{x_i x_j}.
Covariance: C_x = E{(x − m_x)(x − m_x)^T}.
Cross-correlation: R_xy = E{x y^T}, with r_ij = E{x_i y_j}.
Cross-covariance: C_xy = E{(x − m_x)(y − m_y)^T}.
Uncorrelatedness: C_xy = 0, or equivalently R_xy = m_x m_y^T.
Note: the last statement is also valid for random variables instead of random vectors.
If y = x, we have C_x = diag(σ_1^2, ..., σ_n^2).
Useful statistical concepts

Statistical properties
Statistical independence: p_x(x) = p_{x_1,...,x_n}(x_1, x_2, ..., x_n) = p_{x_1}(x_1) p_{x_2}(x_2) ··· p_{x_n}(x_n).
Meaning: if x and y are independent, then knowledge of x gives no information about the value of y.
Property: if two random variables are independent, E{g(x)h(y)} = E{g(x)} E{h(y)}.
Property: if y = g(x) and x = g^{-1}(y), then p_y(y) = (1 / |det J_g(g^{-1}(y))|) p_x(g^{-1}(y)).
Special case: y = g(x) = Ax ⇒ p_y(y) = (1 / |det A|) p_x(A^{-1} y).
Motivation
[Figure slides]
Linear ICA
Generative model

Mixing model
Linear instantaneous mixing: x(t) = A s(t) + n(t)
Instantaneous: all sources arrive at the same time at all sensors.
There is no observation noise: n(t) = 0.
A is the (unknown) mixing matrix.
s(t) = [s_1(t), ..., s_m(t)]^T is the vector of sources (a multivariate random process).
t is a sample-index variable, 0 ≤ t ≤ T − 1.
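The mixing model is straightforward to simulate; the following sketch (illustrative only: the 2×2 matrix A and the Laplacian sources are arbitrary choices) generates noiseless instantaneous mixtures x(t) = A s(t):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 5000                                   # number of samples, t = 0, ..., T-1
S = rng.laplace(size=(2, T))               # m = 2 independent (supergaussian) sources
A = np.array([[1.0, 0.6],                  # arbitrary mixing matrix, unknown in practice
              [0.4, 1.0]])
X = A @ S                                  # noiseless linear instantaneous mixtures, n(t) = 0
```

Recovering S from X alone, without knowing A, is exactly the problem posed on the next slide.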
Hypotheses and problem statement

Hypotheses
The sources are independent → p_s(s(t)) = p_{s_1,...,s_m}(s_1(t), ..., s_m(t)) = Π_{i=1}^{m} p_{s_i}(s_i(t)).
Number of sources = number of mixtures.
A is invertible.
At most one source distribution is Gaussian.
m_x = 0 and C_x = I.

Problem statement
Find W such that ŝ = W x is as statistically independent as possible (in the ideal case, W = A^{-1}).
Hypotheses

Uncorrelatedness vs. independence
Under the hypothesis that m_x = 0, C_x = R_x = E{x x^T}.
Uncorrelated variables: r_ij = E{x_i x_j} = 0 if i ≠ j.
Question: does this imply that p_{x_1,...,x_n}(x_1(t), ..., x_n(t)) = Π_{i=1}^{n} p_{x_i}(x_i(t))?
No: ∫ x_i x_j p_{x_i,x_j}(x_i, x_j) dx_i dx_j = 0 does not imply that p_{x_i,x_j}(x_i, x_j) = p_{x_i}(x_i) p_{x_j}(x_j).
Clearly, if they are also independent, then E{x_i x_j} = E{x_i} E{x_j}.
Special case: for a multivariate Gaussian, uncorrelatedness does imply independence.
Motivation
[Figure slides]
Indeterminacies

Amplitude ambiguity
The mixing equation means that x_i = Σ_{j=1}^{m} a_ij s_j.
It is clear that a factor α_j can be exchanged between the sources and the mixing matrix: x_i = Σ_{j=1}^{m} (a_ij / α_j)(α_j s_j).
This means that we cannot recover the exact amplitude of the sources.

Permutation ambiguity
The mixing equation can be written as x = A P^{-1} P s, with P a permutation matrix.
As both A and s are unknown, this change will be transparent to us.
This means that we can recover the sources only with an arbitrary ordering.
Indeterminacies

What can be obtained
Ideally, we would like to find a proper W such that W A = I.
But given the ambiguities mentioned, the best we can find is W A = P D.
In this equation P is an arbitrary permutation matrix and D is a diagonal scaling matrix.
Optimization algorithms
Optimization algorithms

Objective
Find some extrema of a functional J(W) or J(w), perhaps subject to some constraints.

Gradient search
In this case the cost is an expectation, J(W) = E{J(W)} and J(w) = E{J(w)} (expected cost on the left, instantaneous cost inside the expectation).
W_{t+1} = W_t ± μ(t) ∇_W J(W)
w_{t+1} = w_t ± μ(t) ∇_w J(w)
This is called batch learning, because the gradients are estimated over a set of samples.
Optimization algorithms

Stochastic gradient search
In this case, the expectations are replaced by their instantaneous values, i.e., the updates use the instantaneous cost J instead of the expected cost E{J}.
W_{t+1} = W_t ± μ(t) ∇_W J(W)
w_{t+1} = w_t ± μ(t) ∇_w J(w)
This is called on-line learning, because the gradients are estimated by their instantaneous sample approximations.
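To make the contrast between the two update styles concrete, the sketch below uses an illustrative cost J(w) = E{(w^T x)^4}, whose gradient is 4 E{(w^T x)^3 x}; the cost, the test data and the step size are assumptions of this example, not part of the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.laplace(size=(2, 10000))                  # toy data, shape (n, T)
w = rng.normal(size=2); w /= np.linalg.norm(w)
mu = 0.01

def grad_J(w, X):
    """Batch gradient of J(w) = E{(w^T x)^4}: 4 E{(w^T x)^3 x}, averaged over all samples."""
    y = w @ X
    return 4 * (X * y ** 3).mean(axis=1)

# Batch learning: gradient estimated as a sample average over all T samples.
w_batch = w + mu * grad_J(w, X)

# On-line (stochastic) learning: expectation replaced by one instantaneous sample.
x_t = X[:, 0]
w_online = w + mu * 4 * (w @ x_t) ** 3 * x_t
```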
Optimization algorithms

Quadratic learning (Newton's method)
In this case, we suppose a function J(w). A Taylor expansion around w gives:
J(w') = J(w) + ∇_w J(w)(w' − w) + (1/2)(w' − w)^T [∂^2 J(w)/∂w^2] (w' − w) + ···
Ignoring the higher-order terms,
δJ = J(w') − J(w) = ∇_w J(w) Δw + (1/2) Δw^T [∂^2 J(w)/∂w^2] Δw.
Taking the gradient with respect to Δw and equating it to zero one gets:
Δw = −[∂^2 J(w)/∂w^2]^{-1} ∇_w J(w) = −H(w)^{-1} ∇_w J(w).
The learning equation becomes w_{t+1} = w_t − H(w)^{-1} ∇_w J(w).
The Hessian has to be positive (negative) definite for convergence to a minimum (maximum) of J(w).
Optimization algorithms

Natural gradient
The gradient used up to now was calculated under the assumption of a Euclidean parameter space.
But usually the parameter space is not Euclidean but curved, with a Riemannian geometry.
The parameter space is a manifold (a differentiable space that is locally Euclidean).
A Riemannian manifold is a manifold that possesses a metric tensor G, which characterizes the curvature of the space and is used to measure distances.
Optimization algorithms

Natural gradient
The distance in a Euclidean space is d_E(v, v + δv) = sqrt(δv^T δv).
In a Riemannian space, it is d_w(w, w + δw) = sqrt(δw^T G δw).
Using this, the correct direction for the gradient can be found: the natural gradient.
The learning equation is w_{t+1} = w_t ± μ(t) G^{-1} ∇_w J(w), where G is the Riemannian metric matrix.
For matrix parameters, G is a fourth-order tensor, and Amari showed that the learning equation is:
W_{t+1} = W_t ± μ(t) ∇_W J(W) W^T W.
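A minimal sketch of how the matrix-parameter update differs from the ordinary Euclidean one (the learning rate and the ascent convention are assumptions):

```python
import numpy as np

def gradient_step(W, grad, mu=0.1):
    """Ordinary (Euclidean) gradient ascent: W <- W + mu * grad."""
    return W + mu * grad

def natural_gradient_step(W, grad, mu=0.1):
    """Amari's natural-gradient ascent for a matrix parameter: W <- W + mu * grad @ W^T @ W."""
    return W + mu * grad @ W.T @ W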
Optimization algorithms

Constrained optimization: Lagrange method
Problem: find the extrema of J(w) subject to restrictions Q_i(w) = 0 for all i = 1, ..., k.
It can be solved by defining an auxiliary function, the Lagrangian:
L(w, λ_1, ..., λ_k) = J(w) + Σ_{i=1}^{k} λ_i Q_i(w).
Taking the gradient of L with respect to each parameter and equating it to zero yields a system of simultaneous equations that can be solved.

Constrained optimization: projection method
Gradient update + orthogonal projection onto the constraint set.
Easier for simple constraints.
Example: for the constraint ||w|| = 1, after each update normalize the resulting w.
Cost functions
Types of functions

Type of parameters
Multi-unit functions
One-unit functions

Type of function
Contrasts: functions of probability densities.
Cost functions: functions of arbitrary parameters.
Principle of nongaussianity
[Figure slides]
Principle of nongaussianity

Nongaussian is independent
Central limit theorem: the sum of random variables has a distribution that is more Gaussian than any of the original distributions.
Let y = w^T x = w^T A s = q^T s, with w some vector; y is a linear combination of the sources s.
Idea: vary w and measure the nongaussianity of y; stop when maximum nongaussianity is reached.
How can we measure nongaussianity?
Nongaussianity by kurtosis

Kurtosis-based contrast
Definition: kurt(y) = E{y^4} − 3 (E{y^2})^2.
For Gaussian distributions, kurt(y) = 0.
Supergaussian distributions have kurt(y) > 0 and are spiky (e.g., Laplacian).
Subgaussian distributions have kurt(y) < 0 and are flat (e.g., uniform).
|kurt(y)| can be used as a contrast (maximization).
Problem: VERY sensitive to outliers!
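A small numerical check of these statements (the distributions and the sample size are arbitrary choices):

```python
import numpy as np

def kurt(y):
    """Sample kurtosis as defined above: kurt(y) = E{y^4} - 3 (E{y^2})^2."""
    return np.mean(y ** 4) - 3 * np.mean(y ** 2) ** 2

rng = np.random.default_rng(2)
T = 100000
print(kurt(rng.normal(size=T)))            # ~0 for a Gaussian variable
print(kurt(rng.laplace(size=T)))           # > 0: supergaussian (spiky)
print(kurt(rng.uniform(-1, 1, size=T)))    # < 0: subgaussian (flat)
```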
Nongaussianity by negentropy

Negentropy-based contrast
Differential entropy: H(y) = −∫ p_y(η) log p_y(η) dη.
A Gaussian variable has the largest entropy among all random variables of the same variance.
Negentropy: N(y) = H(y_gaussian) − H(y), where y_gaussian is a Gaussian variable with the same covariance as y.
Negentropy is always nonnegative and zero only for a Gaussian variable, so it can be used as a contrast (maximization).
Problem: VERY difficult to handle, needs an estimation of the pdf.
Nongaussianity by negentropy

Approximations to negentropy
Instead of negentropy, some numerical approximations can be used.
Polynomial approximation: N(y) ≈ (1/12) E{y^3}^2 + (1/48) kurt(y)^2.
This has the same problem of sensitivity to outliers as kurtosis.
Hyvärinen showed that J(y) = [E{G(y)} − E{G(ν)}]^2 is (up to a constant factor) an approximation to N(y).
Here ν is a random variable with distribution N(0, 1).
J(y) is maximized when E{G(y)} differs as much as possible from its Gaussian value E{G(ν)}.
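A sketch of the second approximation for zero-mean, unit-variance y, using the common choice G(u) = (1/a) log cosh(au); dropping the constant in front of the square is an assumption of this sketch, so only relative comparisons are meaningful:

```python
import numpy as np

def negentropy_approx(y, a=1.0, n_gauss=100000, seed=0):
    """J(y) = (E{G(y)} - E{G(nu)})^2 with G(u) = (1/a) log cosh(a u), nu ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    G = lambda u: np.log(np.cosh(a * u)) / a
    nu = rng.normal(size=n_gauss)                 # standard Gaussian reference variable
    return (np.mean(G(y)) - np.mean(G(nu))) ** 2

rng = np.random.default_rng(3)
y_gauss = rng.normal(size=50000)
y_lapl = rng.laplace(size=50000) / np.sqrt(2)     # unit-variance Laplacian
print(negentropy_approx(y_gauss))                 # close to 0
print(negentropy_approx(y_lapl))                  # clearly larger than the Gaussian value
```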
Likelihood

Log-likelihood contrast
Given x = As and W = A^{-1}, with p_s(s) the joint distribution of the sources:
p_x(x) = (1/|det A|) p_s(A^{-1} x) = |det W| p_s(W x).
The expected log-likelihood is:
L(W) = E{log p_s(W x)} + log |det W|.
This can be maximized with respect to W using any method.
Problem: it needs an estimate of p_s(s), which implies knowledge of the sources!
Note: Bell and Sejnowski found the same contrast function using a criterion of maximizing the information flow from input to output, in an algorithm called Infomax.
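A minimal sketch of evaluating this contrast on data, assuming (purely as an illustration, not from the slide) a unit-variance Laplacian source model p_{s_i}(s) = (1/√2) exp(−√2 |s|):

```python
import numpy as np

def log_likelihood(W, X):
    """L(W) = E{ sum_i log p_si(w_i^T x) } + log|det W|, with a Laplacian source model."""
    Y = W @ X                                             # candidate sources, shape (n, T)
    log_ps = -np.sqrt(2) * np.abs(Y) - 0.5 * np.log(2)    # log p_si for unit-variance Laplacian
    return log_ps.sum(axis=0).mean() + np.log(np.abs(np.linalg.det(W)))
```

On simulated Laplacian mixtures, L(W) evaluated at W = A^{-1} should exceed its value at a random W.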
Mutual information

Independence measure by mutual information
Mutual information: I(y_1, y_2, ..., y_n) = Σ_{i=1}^{n} H(y_i) − H(y).
This is zero for independent recovered sources ⇒ its minimization gives independence.
It has been shown that I(y_1, y_2, ..., y_n) = C − Σ_{i=1}^{n} N(y_i), which is equivalent to maximizing the individual negentropies subject to an orthogonality constraint.
It has also been shown that I(y_1, y_2, ..., y_n) = −L(W) − C, so minimizing mutual information is equivalent to maximizing the log-likelihood.
ICA algorithms
Preprocessing

Centering
Most of the methods assume that m_x = 0.
Centering: x̂ = x − m_x.
After separation, ŝ = y + W m_x restores the mean.

Sphering (whitening)
One hypothesis is that C_x = I, which implies uncorrelated variables with unit variance.
It is easily achieved by the transformation z = Q x with Q = D^{-1/2} E^T.
These matrices are obtained by the eigendecomposition of the covariance, C_x = E D E^T.
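A minimal numpy sketch of both preprocessing steps as described above:

```python
import numpy as np

def center_and_whiten(X):
    """Center the data, then whiten it with Q = D^(-1/2) E^T from C_x = E D E^T."""
    m_x = X.mean(axis=1, keepdims=True)
    Xc = X - m_x                                   # centering: x_hat = x - m_x
    C = np.cov(Xc)                                 # sample covariance
    d, E = np.linalg.eigh(C)                       # eigendecomposition C = E D E^T
    Q = np.diag(1.0 / np.sqrt(d)) @ E.T            # whitening matrix Q = D^(-1/2) E^T
    Z = Q @ Xc                                     # whitened data: cov(Z) ≈ I
    return Z, Q, m_x
```

The returned Q and m_x are needed afterwards to undo the preprocessing on the separated signals.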
Kurtosis gradient-based algorithm

Maximization of kurtosis
Assume that E{x x^T} = I, with y_i = w_i^T x.
Problem: maximize κ(w_i) = |kurt(y_i)| = |E{y_i^4} − 3 (E{y_i^2})^2|, subject to ||w_i|| = 1.
Taking the gradient,
∇_{w_i} κ(w_i) = 4 sign(kurt(y_i)) [E{(w_i^T x)^3 x} − 3 w_i ||w_i||^2].
Then, ŵ_{i,t+1} = w_i + μ ∇_{w_i} κ(w_i).
Using projection to enforce the constraint, w_{i,t+1} = ŵ_{i,t+1} / ||ŵ_{i,t+1}||.
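A sketch of this projected gradient ascent on whitened data Z (the step size and the iteration count are assumptions):

```python
import numpy as np

def kurt(y):
    return np.mean(y ** 4) - 3 * np.mean(y ** 2) ** 2

def kurtosis_gradient_unit(Z, mu=0.1, n_iter=200, seed=0):
    """Extract one direction w maximizing |kurt(w^T z)| on whitened data Z (n x T)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=Z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = w @ Z
        grad = 4 * np.sign(kurt(y)) * ((Z * y ** 3).mean(axis=1)
                                       - 3 * w * np.linalg.norm(w) ** 2)
        w = w + mu * grad                          # gradient ascent step
        w /= np.linalg.norm(w)                     # projection onto ||w|| = 1
    return w
```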
Natural Gradient Infomax

Infomax with natural gradient
L(W) = E{log p_s(W x)} + log |det W|.
Taking the gradient, ∇_W L(W) = E{h(W x) x^T} + (1/|det W|) ∂|det W|/∂W,
where h(W x) = [h_1(y_1), ..., h_n(y_n)]^T and h_i(y_i) = ∂ log p_{s_i}(y_i)/∂y_i = p'_{s_i}(y_i) / p_{s_i}(y_i).
Using ∂|det W|/∂W = (W^T)^{-1} det W sign(det W),
∇_W L(W) = E{h(W x) x^T} + (W^T)^{-1}.
The natural gradient is ∇_{W,nat} L(W) = ∇_W L(W) W^T W = (E{h(W x) y^T} + I) W.
Finally, W_{t+1} = W_t + μ (E{h(W x) y^T} + I) W.
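A sketch of the resulting batch update, assuming supergaussian sources so that h(y) = −tanh(y) (a standard choice for a 1/cosh-type prior); the step size, iteration count and initialization are also assumptions:

```python
import numpy as np

def infomax_natural_gradient(X, mu=0.05, n_iter=500, seed=0):
    """Batch natural-gradient Infomax: W <- W + mu * (E{h(y) y^T} + I) W, with h(y) = -tanh(y)."""
    n, T = X.shape
    rng = np.random.default_rng(seed)
    W = np.eye(n) + 0.01 * rng.normal(size=(n, n))     # initial unmixing matrix
    for _ in range(n_iter):
        Y = W @ X                                      # current source estimates y = W x
        grad_nat = ((-np.tanh(Y)) @ Y.T / T + np.eye(n)) @ W
        W = W + mu * grad_nat
    return W
```

On the centered mixtures X from the earlier simulation, W @ A should approach the P D form described in the indeterminacies slide.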
FastICA

Fixed-point algorithm with quadratic convergence
Problem: optimize the contrast J(y_i) = E{G(w_i^T x)} subject to ||w_i||^2 = 1.
Using the Lagrangian, L(w_i) = E{G(w_i^T x)} + λ(w_i^T w_i − 1).
Optimizing by Newton's method, this results in
ŵ_{i,t+1} = E{g'(w_i^T x)} w_i − E{g(w_i^T x) x},
w_{i,t+1} = ŵ_{i,t+1} / ||ŵ_{i,t+1}||.
In these equations, G(·) is a properly chosen (nonquadratic) nonlinear function, g(·) its derivative and g'(·) its second derivative.
For example, G(y) = (1/a) log cosh(ay), g(y) = tanh(ay) and g'(y) = a(1 − tanh^2(ay)), with 1 ≤ a ≤ 2.
To extract more units, a Gram-Schmidt orthogonalization scheme must be used after each iteration step.
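A sketch of the one-unit fixed-point iteration on whitened data Z, with G = log cosh and a = 1 (the tolerance and iteration limit are assumptions):

```python
import numpy as np

def fastica_one_unit(Z, w_init, n_iter=100, tol=1e-6):
    """One-unit FastICA on whitened data Z (n x T): w <- E{g'(w^T z)} w - E{g(w^T z) z}."""
    w = w_init / np.linalg.norm(w_init)
    for _ in range(n_iter):
        y = w @ Z
        g, g_prime = np.tanh(y), 1.0 - np.tanh(y) ** 2
        w_new = g_prime.mean() * w - (Z * g).mean(axis=1)
        w_new /= np.linalg.norm(w_new)
        if np.abs(np.abs(w_new @ w) - 1) < tol:        # converged (up to a sign flip)
            w = w_new
            break
        w = w_new
    return w
```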
FastICA

Deflationary orthogonalization
After updating w_p:
Set w_p = w_p − Σ_{j=1}^{p−1} (w_p^T w_j) w_j.
Normalize w_p by dividing it by its norm.
Repeat until convergence.
Increase p and repeat until all components are extracted.
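A sketch of the complete deflationary scheme; it repeats the one-unit update from the previous slide but applies the orthogonalization after every iteration (initialization, tolerance and iteration limit are assumptions):

```python
import numpy as np

def fastica_deflation(Z, n_components, n_iter=200, tol=1e-6, seed=0):
    """Extract n_components rows of W by FastICA with deflationary orthogonalization."""
    rng = np.random.default_rng(seed)
    n = Z.shape[0]
    W = np.zeros((n_components, n))
    for p in range(n_components):
        w = rng.normal(size=n)
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            y = w @ Z
            w_new = (1.0 - np.tanh(y) ** 2).mean() * w - (Z * np.tanh(y)).mean(axis=1)
            w_new -= W[:p].T @ (W[:p] @ w_new)      # deflate against the rows already found
            w_new /= np.linalg.norm(w_new)
            if np.abs(np.abs(w_new @ w) - 1) < tol:
                w = w_new
                break
            w = w_new
        W[p] = w
    return W
```

With Z and Q from the whitening sketch, W = fastica_deflation(Z, 2) and Y = W @ Z recover the sources up to permutation and scaling.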
Example FastICA
[Figure slides]
Extensions
Extensions

More sensors than sources
Without noise: use PCA for dimension reduction, then apply standard ICA.

Noisy ICA
Model: x = A s + n.
If also n > m (more sensors than sources), subspace projection algorithms can be used.
Another alternative: a deflationary approach (needs an estimate of the number of sources).
Other approaches: measures that are robust (blind) to Gaussian noise.
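A minimal sketch of the noiseless case with more sensors than sources: reduce to the m strongest principal directions, then hand the reduced data to any standard ICA routine (for instance the fastica_deflation sketch above, after whitening); the helper name is mine:

```python
import numpy as np

def pca_reduce(X, m):
    """Project n-channel data X (n x T) onto its m principal directions."""
    Xc = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(Xc))
    Em = E[:, np.argsort(d)[::-1][:m]]       # eigenvectors of the m largest eigenvalues
    return Em.T @ Xc                         # reduced data, shape (m, T)

# X_red = pca_reduce(X, m)   # then whiten X_red and apply standard ICA, e.g. fastica_deflation
```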
Extensions

Nonlinear ICA
Model: x = f(A s).
Two steps: first estimate g(·) = f^{-1}(·) such that z = g(x) is linearly mixed; then apply standard ICA.

Convolutive ICA
Model: x = A ∗ s, where A is a matrix of FIR filters.
Time-domain methods: use similar contrasts, but the optimization is complex due to the convolutions.
Frequency-domain methods: using the STFT, the mixture model becomes instantaneous for each frequency bin.
Bibliography

A. Hyvärinen, J. Karhunen and E. Oja, Independent Component Analysis, John Wiley and Sons, 2001. Ch. 2, 3, 7, 8, 9, 10.
A. Hyvärinen and E. Oja, "Independent component analysis: algorithms and applications", Neural Networks, Vol. 13, No. 4-5, pp. 411-430.
S. Haykin, Unsupervised Adaptive Filtering, Volume I: Blind Source Separation, John Wiley and Sons, 2000. Ch. 2, 6, 8.
Problems

Basic
Show that C_xy = 0 ⇒ R_xy = m_x m_y^T.
Show that if p_x(x) is a multivariate Gaussian with m_x = 0 and C_x = I, then y = W x with W orthogonal has the same distribution, p_y(y) = p_x(y).
Show that z = D^{-1/2} E^T x, where C_x = E D E^T is the eigendecomposition of the covariance matrix, is a whitening transformation.
Problems

Advanced
Implement the FastICA algorithm with deflationary learning.
Using that algorithm, with two sources and a random mixing matrix, do the following:
1 For each stage (sources, mixtures, whitened signals, separated signals), draw a scatter plot of the variables.
2 After separation, estimate the P and D matrices.
3 Question: is the separation matrix W the inverse of the used mixing matrix A?