
Non-Negative Blind Source Separation
using Convex Analysis
Wing-Kin (Ken) Ma
The Chinese University of Hong Kong (CUHK)
Course on Convex Optimization for Wireless Comm. and Signal Proc.
Jointly taught by Daniel P. Palomar and Wing-Kin (Ken) Ma
National Chiao Tung Univ., Hsinchu, Taiwan
December 19-21, 2008
Acknowledgement: Tsung-Han Chan, Chong-Yung Chi, and Yue Wang
Blind source separation (BSS): Problem statement
Signal model: a real-valued, N-input, M-output linear mixing model:

    xi = Σ_{j=1}^{N} aij sj ,   i = 1, . . . , M,

where

    xi = [ xi[1], . . . , xi[L] ]^T ,   si = [ si[1], . . . , si[L] ]^T

are the observation & true source vectors, respectively.

(Figure: a mixing network from the sources s1, s2 to the observations x1, x2, x3 through the gains aij.)
Problem: extract {s1, . . . , sN } from {x1, . . . , xM } without knowledge of the
mixing matrix A = {aij }.
W.-K. Ma
1
BSS: A biomedical imaging example
Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) assessments of breast cancer,
captured at different times. Courtesy of Yue Wang [Wang et al. 2003].
Time activity curves (TAC)

(Figure: TACs over time for the fast flow, slow flow, & plasma compartments.)

Illustration of the source pattern mixing process. The signals represent a summation of vascular
permeability with various diffusion rates. The goal is to separate the distribution of multiple
biomarkers with the same diffusion rate.
BSS techniques
• A BSS approach is based on some assumptions on the characteristics of
{s1, . . . , sN } and/or A.
• There are two aspects in developing a BSS approach:
– criterion established from the assumptions made, &
– optimization methods for fulfilling the criterion.
• The suitability of the assumptions (& the approach as a result) depends much
on the applications under consideration.
Example: Independent component analysis (ICA), a well-known BSS
technique, typically assumes that each si[n] is random, non-Gaussian, &
mutually independent of the others.

Mutual independence is a good assumption in speech & wireless communications, but
not so in hyperspectral imaging.
Non-negative blind source separation (nBSS)
• In some applications, source signals are non-negative by nature; e.g., imaging.
• nBSS approaches exploit the signal non-negativity characteristic (plus some
additional assumptions).
• Applications: biomedical imaging, hyperspectral imaging, & analytical
chemistry.
• Some existing nBSS approaches:
– non-negative ICA (nICA) [Plumbley 2003]
– non-negative matrix factorization (NMF) [Lee-Seung 1999].
• nICA is a statistical approach adopting the mutual independence assumption.
• NMF is a deterministic approach that may cope with correlated sources.
• Essentially, NMF deals with the optimization problem

    min_{S ∈ R^{L×N}, A ∈ R^{M×N}}  ‖X − S A^T‖_F^2
    s.t.  S ⪰ 0, A ⪰ 0  (elementwise non-negative)

where X = [ x1, . . . , xM ] & S = [ s1, . . . , sN ].
NMF may not be a unique factorization, however.
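As a concrete illustration of the factorization above, here is a minimal sketch using the classical Lee-Seung multiplicative updates (all data, sizes, and variable names below are synthetic choices for illustration, not from the slides):

```python
import numpy as np

# Minimal NMF sketch via Lee-Seung multiplicative updates on the model
# X = S A^T (synthetic non-negative data; W estimates S, H estimates A^T).
rng = np.random.default_rng(0)
L, N, M = 200, 3, 5          # samples, sources, observations
S_true = rng.random((L, N))  # non-negative sources
A_true = rng.random((M, N))  # non-negative mixing matrix
X = S_true @ A_true.T        # L x M observation matrix

# Positive initialization; multiplicative updates preserve non-negativity.
W = rng.random((L, N)) + 0.1
H = rng.random((N, M)) + 0.1

eps = 1e-12                  # guards against division by zero
for _ in range(500):
    H *= (W.T @ X) / (W.T @ W @ H + eps)
    W *= (X @ H.T) / (W @ H @ H.T + eps)

rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(rel_err)  # small residual, but (W, H) need not match (S_true, A_true)
```

The small residual illustrates that NMF can fit the data well, while the recovered factors may still differ from the true sources, which is exactly the non-uniqueness issue noted below.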
CAMNS:
Convex analysis of mixtures of non-negative sources
• CAMNS [Chan-Ma-Chi-Wang 2008] is a deterministic nBSS approach.
• In addition to utilizing source non-negativity, CAMNS employs a special
deterministic assumption called local dominance.
• What is local dominance? Intuitively, signals with many ‘zeros’ are likely to
satisfy local dominance (math. def. available soon).
• Appears to be a good assumption for sparse or high-contrast images.
An intuitive illustration of how CAMNS works
(Figure: scatter geometry of the sources s1, s2, s3 & the observations x1, x2, x3.)
How can we extract {s1, . . . , sN } from {x1, . . . , xM } without knowing {aij }?
An intuitive illustration of how CAMNS works (cont’d)
(Figure: the observations x1, x2, x3 & the constructed polyhedral set.)
Based on some assumptions (e.g., signal non-negativity & local dominance) & by
convex analysis, we use {x1, . . . , xM } to construct a polyhedral set.
An intuitive illustration of how CAMNS works (cont’d)
(Figure: the polyhedral set with ‘corners’ s1, s2, s3.)
We show that the ‘corners’ (formally speaking, extreme points) of this polyhedral
set are exactly {s1, . . . , sN } (rather surprisingly).
An intuitive illustration of how CAMNS works (cont’d)
Using LP, we can locate the ‘corners’ of the polyhedral set effectively. As a result
perfect separation can be achieved.
A quick review of some convex analysis concepts
Affine hull of a given set of vectors {s1, . . . , sN } ⊂ R^L:

    aff{s1, . . . , sN } = { x = Σ_{i=1}^{N} θi si  |  θ ∈ R^N, Σ_{i=1}^{N} θi = 1 }.
• An affine hull can always be represented by

    aff{s1, . . . , sN } = { x = Cα + d  |  α ∈ R^P }

for some (non-unique) d ∈ R^L and C ∈ R^{L×P}, where P ≤ N − 1 is the affine
dimension.
• If {s1, . . . , sN } is affinely independent (i.e., {s1 − sN , . . . , sN−1 − sN } is linearly
independent), then P = N − 1.
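The affine-dimension statement can be checked numerically; a small sketch (synthetic vectors, names illustrative):

```python
import numpy as np

# Numerical check that affinely independent vectors give affine dimension
# P = N - 1: the rank of the differences [s1 - sN, ..., s_{N-1} - sN].
rng = np.random.default_rng(1)
L, N = 10, 4
S = rng.standard_normal((L, N))      # columns s1, ..., sN

D = S[:, :N-1] - S[:, [N-1]]         # difference vectors, L x (N-1)
P = np.linalg.matrix_rank(D)
print(P)                             # N - 1 = 3 for generic vectors
```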
Convex hull of a given set of vectors {s1, . . . , sN } ⊂ RL:
    conv{s1, . . . , sN } = { x = Σ_{i=1}^{N} θi si  |  θ ∈ R^N_+, Σ_{i=1}^{N} θi = 1 }.
• A point x ∈ conv{s1, . . . , sN } is an extreme point of conv{s1, . . . , sN } if x
cannot be expressed as a nontrivial convex combination of {s1, . . . , sN }.
• If {s1, . . . , sN } is affinely independent, then {s1, . . . , sN } is the set of all extreme
points of its convex hull.
Example of 3-dimensional signal space geometry with N = 3. In this example, aff{s1, s2, s3} is
a plane passing through s1, s2, s3, & conv{s1, s2, s3} is a triangle with corners (extreme points)
s1 , s2 , s3 .
The assumptions in CAMNS
Recall the model xi = Σ_{j=1}^{N} aij sj . Our assumptions:

(A1) Source non-negativity: For each j, sj ∈ R^L_+.

(A2) Local dominance: For each i ∈ {1, . . . , N }, there exists an (unknown) index ℓi
such that si[ℓi] > 0 and sj [ℓi] = 0, ∀ j ≠ i.
(Reasonable assumption for sparse or high-contrast signals.)

(A3) Unit row sum: For all i = 1, . . . , M , Σ_{j=1}^{N} aij = 1.
(Already satisfied in MRI; can be relaxed.)

(A4) M ≥ N and A is of full column rank. (Standard BSS assumption.)
How to enforce (A3), if it does not hold
The unit row sum assumption (A3) may be relaxed.
Suppose that xi^T 1 ≠ 0 (where 1 is an all-one vector) for all i.
Consider a normalized version of xi:

    x̄i = xi / (xi^T 1)
        = Σ_{j=1}^{N} [ aij (sj^T 1) / (xi^T 1) ] · [ sj / (sj^T 1) ]
        = Σ_{j=1}^{N} āij s̄j ,

where āij ≜ aij (sj^T 1) / (xi^T 1) and s̄j ≜ sj / (sj^T 1).
One can show that (āij ) satisfies (A3).
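A quick numerical sketch of this claim (made-up data; this simply evaluates the coefficients āij from their definition):

```python
import numpy as np

# Check that the normalized mixing coefficients
#   ā_ij = a_ij (s_j^T 1) / (x_i^T 1)
# sum to one over j, i.e. the normalized model satisfies (A3).
rng = np.random.default_rng(2)
L, N, M = 50, 3, 4
S = rng.random((L, N))          # non-negative sources (columns s_j)
A = rng.random((M, N))          # mixing matrix with arbitrary row sums
X = S @ A.T                     # observations (columns x_i)

A_bar = np.array([[A[i, j] * S[:, j].sum() / X[:, i].sum()
                   for j in range(N)] for i in range(M)])
print(A_bar.sum(axis=1))        # every row sums to 1
```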
CAMNS
Since Σ_{j=1}^{N} aij = 1 [(A3)], we have, for each observation,

    xi = Σ_{j=1}^{N} aij sj ∈ aff{s1, . . . , sN }.
This implies
aff{s1, . . . , sN } ⊇ aff{x1, . . . , xM }.
In fact, we can show that
Lemma. Under (A3) and (A4), aff{s1, . . . , sN } = aff{x1, . . . , xM }.
• Consider the representation

    aff{s1, . . . , sN } = aff{x1, . . . , xM }
                         = { x = Cα + d | α ∈ R^{N−1} } ≜ A(C, d)

for some (C, d) ∈ R^{L×(N−1)} × R^L with rank(C) = N − 1.
• Let us consider determining the source affine set parameters (C, d) from
{x1, . . . , xM }.
• The solution is simple for M = N :

    d = xN ,   C = [ x1 − xN , . . . , xN−1 − xN ].
• For M > N , we use an affine set fitting solution.
Affine set fitting problem:

    (C, d) = arg min_{C̃, d̃ : C̃^T C̃ = I}  Σ_{i=1}^{M}  min_{x̃ ∈ A(C̃, d̃)} ‖x̃ − xi‖_2^2        (∗)

(the inner minimization is the projection error of xi onto A(C̃, d̃)),
where A(C, d) = { x = Cα + d | α ∈ R^{N−1} }.
Proposition. Problem (∗) has a closed-form solution

    d = (1/M) Σ_{i=1}^{M} xi ,
    C = [ q1(UU^T), q2(UU^T), . . . , qN−1(UU^T) ],

where U = [ x1 − d, . . . , xM − d ] ∈ R^{L×M}, and qi(R) denotes the eigenvector
associated with the ith principal eigenvalue of R.
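A numpy sketch of this closed-form solution (synthetic data satisfying (A3); the principal eigenvectors of UU^T are obtained here from the SVD of U, which gives the same subspace):

```python
import numpy as np

# Affine set fitting: d = mean of the x_i, C = N-1 principal eigenvectors
# of U U^T, where U = [x_1 - d, ..., x_M - d].
rng = np.random.default_rng(3)
L, N, M = 100, 3, 6
S = rng.random((L, N))                   # sources (columns)
A = rng.dirichlet(np.ones(N), size=M)    # rows sum to 1, so (A3) holds
X = S @ A.T                              # observations (columns x_i)

d = X.mean(axis=1)                       # d = (1/M) sum_i x_i
U = X - d[:, None]                       # U = [x_1 - d, ..., x_M - d]
C = np.linalg.svd(U, full_matrices=False)[0][:, :N-1]  # C^T C = I

# Every x_i lies in A(C, d): the projection residual is ~0.
resid = np.linalg.norm(U - C @ (C.T @ U))
print(resid)
```

The near-zero residual reflects that, under (A3), all observations live in the (N − 1)-dimensional affine hull of the sources.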
Be reminded that si ∈ R^L_+. Hence, it is true that

    si ∈ aff{s1, . . . , sN } ∩ R^L_+ = A(C, d) ∩ R^L_+ ≜ S.

The following lemma arises from local dominance (A2):

Lemma. Under (A1) and (A2), S = conv{s1, . . . , sN }. Moreover, the set of all its
extreme points is {s1, . . . , sN }.
Summarizing the above results, a new nBSS criterion is as follows:
Theorem 1. (CAMNS criterion) Under (A1)-(A4), the polyhedral set

    S = { x ∈ R^L | x = Cα + d ⪰ 0, α ∈ R^{N−1} },

where (C, d) is obtained from the observation set {x1, ..., xM } by the
affine set fitting procedure in the Proposition, has N extreme points given
by the true source vectors s1, ..., sN .
Practical realization of CAMNS
• CAMNS boils down to finding all the extreme points of an observation-constructed
polyhedral set.
• In the optimization context this is known as vertex enumeration.
• In CAMNS, there is one important problem structure that we can take full
advantage of; that is,

    Property implied by (A2): s1, . . . , sN are linearly independent.

• By exploiting this property, we can locate all the extreme points by solving a
sequence of LPs ( ≈ 2N LPs at worst).
Consider the following LP:

    p⋆ = min_s  r^T s
         s.t.  s ∈ S        (†)

for an arbitrary r ∈ R^L. From basic LP theory, the solution of (†) is
• one of the extreme points of S (that is, one of the si), or
• any point on a face of S (looks rather unlikely, intuitively).
We can prove that getting a non-extreme-point solution is very unlikely:

Lemma. Suppose that r is randomly generated following N (0, IL). Then, with
probability 1, the solution of

    p⋆ = min_{s ∈ S}  r^T s

is uniquely given by si for some i ∈ {1, ..., N }.
• Suppose that we have found l extreme points, say, {s1, . . . , sl}.
• We can find the other extreme points by using the linear independence of
{s1, . . . , sN } to ‘annihilate’ the old extreme points.

Lemma. Suppose r = Bw, where w ∼ N (0, IL−l) & B ∈ R^{L×(L−l)} is such that

    B^T [ s1, . . . , sl ] = 0,    B^T B = IL−l.

Then, with probability 1, at least one of the LPs

    p⋆ = min_{s ∈ S} r^T s,    q⋆ = max_{s ∈ S} r^T s

finds a new extreme point; i.e., si for some i ∈ {l + 1, ..., N }. The 1st LP finds a
new extreme point if |p⋆| ≠ 0; the 2nd LP finds a new extreme point if |q⋆| ≠ 0.
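To make the role of the random direction r concrete, here is a toy numerical sketch (not the actual CAMNS-LP implementation; all data are synthetic): since a linear objective over S = conv{s1, . . . , sN } is minimized at an extreme point, we can mimic the LP (†) by evaluating r^T s at the vertices directly.

```python
import numpy as np

# Toy illustration: minimizing r^T s over conv{s_1, ..., s_N} for random
# r ~ N(0, I_L) lands on one of the vertices; repeated random directions
# eventually touch every vertex (here checked by direct vertex evaluation
# rather than an LP solver).
rng = np.random.default_rng(4)
L, N = 30, 4
S_src = rng.random((L, N))               # columns s_1, ..., s_N

hits = set()
for _ in range(50):                      # many random directions r
    r = rng.standard_normal(L)
    hits.add(int(np.argmin(r @ S_src)))  # vertex minimizing r^T s

print(sorted(hits))                      # with high probability, all N vertices
```

This mirrors the two lemmas above in spirit: each random direction reveals an extreme point, and enough directions reveal them all; the actual algorithm replaces the brute-force vertex evaluation with LPs over S, since the vertices are of course unknown in practice.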
On alternatives of implementing CAMNS
• We have another theorem that converts S ⊂ R^L to another polyhedral set in
R^{N−1}, denoted by F below.

(Figure: the affine map x = Cα + d relating the points s1, s2, x1, x2, x3 in the
x-space to the coordinates (α1, α2).)

• The set F has a smaller vector dimension (note that L ≫ N ). Also, it is a simplex
with extreme points related to those of S in a one-to-one manner.
• For N = 2, F is a line segment in R, and there is a closed form for locating its
extreme points.
• For N = 3, F is a triangle in R^2, and there is also a simple way of locating its
extreme points.
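The change of variables behind F can be sketched as follows (synthetic C and d; this illustrates only the coordinate map, not the theorem itself): since C^T C = I, the coordinates α = C^T (x − d) recover any point x = Cα + d of A(C, d).

```python
import numpy as np

# Dimension reduction behind F: a point of A(C, d) in R^L is recovered
# exactly from its (N-1)-dimensional coordinates alpha = C^T (x - d).
rng = np.random.default_rng(5)
L, N = 1000, 3
C = np.linalg.qr(rng.standard_normal((L, N - 1)))[0]   # C^T C = I
d = rng.standard_normal(L)

alpha = rng.standard_normal(N - 1)       # low-dimensional representation
x = C @ alpha + d                        # point of A(C, d) in R^L
alpha_rec = C.T @ (x - d)                # back to R^{N-1}
print(np.allclose(alpha_rec, alpha))     # True
```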
Simulation example 1: Dual energy X-Ray
Original sources
Observations
Separated sources by CAMNS
Separated sources by nICA (a benchmark nBSS method)
Separated sources by NMF (yet another benchmark nBSS method)
Simulation example 2: Human faces
Original sources
Observations
Separated sources by CAMNS
Separated sources by nICA
Separated sources by NMF
Simulation example 3: Ghosting
Original sources
Observations
Separated sources by CAMNS
Separated sources by nICA
Separated sources by NMF
Simulation example 4: Five of my students
Original sources
Observations
Separated sources by CAMNS
Simulation example 5: Monte Carlo performance for N = 3
(Figure: average e(S, Ŝ) (dB), ranging from 15 to 50 dB, versus SNR from 25 to
40 dB, for nICA, NMF, nLCA, CAMNS-geometric, & CAMNS-LP.)
Average sum squared errors of the sources with respect to SNRs.
Conclusion
• A convex analysis framework, called CAMNS, has been developed for nBSS.
• CAMNS guarantees perfect separation of the true sources, by determining the
extreme points of an observation-constructed polyhedral set (under several
assumptions).
• A systematic LP-based method has been proposed to realize CAMNS. Its
complexity is polynomial (specifically, O(L^{1.5}(N − 1)^2)).
• A number of simulation results indicate that CAMNS performs very well, even in
the presence of dependent sources.
• The source code is available at http://www.ee.cuhk.edu.hk/~wkma
References
[Wang et al. 2003] Y. Wang, J. Xuan, R. Srikanchana, & P. L. Choyke, “Modeling and
reconstruction of mixed functional and molecular patterns,” Int’l Journal Biomedical Imaging, pp.
1–9, 2005.
[Plumbley 2003] M. D. Plumbley, “Algorithms for nonnegative independent component analysis,”
IEEE Trans. Neural Networks, vol. 14, no. 3, pp. 534–543, May 2003.
[Lee-Seung 1999] D. Lee and H. S. Seung, “Learning the parts of objects by non-negative matrix
factorization,” Nature, vol. 401, pp. 788–791, Oct. 1999.
[Chan-Ma-Chi-Wang 2008] T.-H. Chan, W.-K. Ma, C.-Y. Chi, & Y. Wang, “A convex analysis
framework for blind separation of non-negative sources,” IEEE Trans. Signal Process., vol. 56, no.
10, pp. 5120–5134, Oct. 2008.