Implementations of Fisher’s Linear Discriminant Analysis from the Numerical Point of View

Pavel Schlesinger 1
joint work with Jurjen Duintjer Tebbens 2

1 Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University, Prague
2 Institute of Computer Science, Academy of Sciences of the Czech Republic, Prague

CSDA 2005, Limassol
October 29, 2005
Outline

Introduction
  Motivation
  Fisher’s Criterion
  The p ≫ n case
  Underlying Linear Algebra
Methods
  Pseudoinverse
  Epsilon-perturbation
  Null-space
  Common Null-space Elimination
Experiment
Summary
Motivation Example
• protein fold classification into 42 groups
• frequencies of 20 amino acids as predictors
  • 20 singles, 400 pairs, 8000 triples, . . .
• expensive data collection
  • only hundreds of examples (268)
• the classical p ≫ n issue, as in microarray analysis, document classification, etc.

Problems
• matrix storage
• computational cost
  • matrix multiplications, inversions
  • computing inner products
  • optimization, QP, . . .
• . . . methods designed for p < n cannot be applied directly
Fisher’s Criterion I
• variance . . . Var(X) = Σ = E_X (X − µ)(X − µ)^T
• decomposition into between- and within-variance:
  Σ = Σ_B + Σ_W, where
  Σ_B = ∑_{k=1}^{K} π_k (µ_k − µ)(µ_k − µ)^T
  Σ_W = ∑_{k=1}^{K} π_k E_{X|G} (X − µ_k)(X − µ_k)^T
• total variance after projection . . .
  Var(c^T X) = c^T Σ_B c + c^T Σ_W c
• Fisher’s criterion
  max_{c ∈ R^p} (c^T Σ_B c) / (c^T Σ_W c)   s. t.  c ≠ 0
Fisher’s Criterion II
• generalized eigenproblem with the matrix pencil (Σ_B, Σ_W)
• the (ordered) ν eigenvectors give the projection matrix C = C_ν
1. estimate µ, µ_k, Σ_B, Σ_W:
   B ≡ (GM − 1x̄)^T (GM − 1x̄) / (g − 1),   W ≡ (X − GM)^T (X − GM) / (n − g)
   • B, W ∈ R^{p×p}, B, W ≥ 0
   • rank(B) ≤ K − 1, r = rank(W) ≤ min{n, p}
2. solve Bc − λWc = 0 to obtain C
   • find the K − 1 largest eigenpairs
3. compute all distances ‖x̃ − µ̃_k‖² = ‖x − µ_k‖²_{CC^T} and find the group label through the minimum
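The estimation step above can be sketched in a few lines of NumPy (a minimal illustration of our own; the function name and the data layout, X as an n × p matrix with integer labels y, are assumptions, not from the slides):

```python
import numpy as np

def between_within(X, y):
    """Estimate B and W as on the slide:
    B = (GM - 1 x_bar)^T (GM - 1 x_bar) / (g - 1),
    W = (X - GM)^T (X - GM) / (n - g),
    where row i of GM is the mean of sample i's group."""
    n, p = X.shape
    groups = np.unique(y)
    g = len(groups)
    xbar = X.mean(axis=0)
    GM = np.empty_like(X)
    for k in groups:
        GM[y == k] = X[y == k].mean(axis=0)
    B = (GM - xbar).T @ (GM - xbar) / (g - 1)
    W = (X - GM).T @ (X - GM) / (n - g)
    return B, W

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 5))
y = rng.integers(0, 3, size=30)          # K = 3 toy groups
B, W = between_within(X, y)
# as stated on the slide, rank(B) <= K - 1
print(np.linalg.matrix_rank(B))
```

Since the rows of GM − 1x̄ are the g distinct group offsets and their sample-weighted sum is zero, rank(B) ≤ K − 1 holds by construction.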
The p ≫ n case
• if W is nonsingular ⇒ transformation to a standard eigenproblem, e. g.
  (W^{−1}B − λI)c = 0
• in the p ≫ n case W is singular ⇒
  • the transformation is not possible
  • a very challenging eigenproblem
• it may even happen that
  det(B − λW) = 0   ∀λ ∈ C!
  The pair {B, W} is then called a singular matrix pencil.
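A tiny numerical illustration of a singular pencil (toy matrices of our own, not from the talk): as soon as B and W share a null vector, the determinant vanishes for every λ.

```python
import numpy as np

# Both B and W annihilate the second basis vector,
# so det(B - lambda W) = (1 - 2*lambda) * 0 = 0 for all lambda.
B = np.array([[1.0, 0.0],
              [0.0, 0.0]])
W = np.array([[2.0, 0.0],
              [0.0, 0.0]])

for lam in (0.0, 1.0, -3.7, 100.0):
    print(np.linalg.det(B - lam * W))   # 0.0 every time
```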
Underlying Linear Algebra
• Generalized Schur Decomposition [Moler, Stewart - 1973]:
  Q^T (B − λW) Z = T − λS
• Q, Z orthogonal, T, S upper triangular
• singularity ⇒ possible zeros on the main diagonals of T and S
  1. t_ii ≠ 0 ≠ s_ii ⇒ det(T − (t_ii/s_ii) S) = 0;  t_ii/s_ii . . . finite eigenvalue
  2. t_ii = 0, s_ii ≠ 0 ⇒ det(T − 0·S) = 0;  0 . . . finite eigenvalue
  3. t_ii ≠ 0, s_ii = 0 ⇒ "det(T − ∞S) = 0";  "infinite" eigenvalue
  4. t_ii = 0 = s_ii ⇒ det(T − λS) = 0,  ∀λ ∈ C
• how to determine the K − 1 largest eigenvalues?
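The four diagonal cases can be inspected directly with `scipy.linalg.qz` (a sketch with toy rank-deficient matrices of our own; the tolerance is an arbitrary choice, and the printed classification depends on it):

```python
import numpy as np
from scipy.linalg import qz

rng = np.random.default_rng(1)
U = rng.standard_normal((4, 2))
V = rng.standard_normal((4, 1))
Bm = U @ U.T                      # symmetric PSD, rank 2
Wm = V @ V.T                      # symmetric PSD, rank 1

# Generalized Schur form: Q^H B Z = T, Q^H W Z = S, both upper triangular.
T, S, Q, Z = qz(Bm.astype(complex), Wm.astype(complex), output='complex')

tol = 1e-10 * max(np.abs(np.diag(T)).max(), np.abs(np.diag(S)).max())
for tii, sii in zip(np.diag(T), np.diag(S)):
    t0, s0 = abs(tii) < tol, abs(sii) < tol
    if not t0 and not s0:
        print('finite eigenvalue', (tii / sii).real)
    elif t0 and not s0:
        print('finite eigenvalue 0')
    elif not t0 and s0:
        print('"infinite" eigenvalue')
    else:
        print('indeterminate: common null-space of B and W')
```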
Relation with FLDA
1. finite nonzero:  (B − (t_ii/s_ii) W)c = 0  ⇒  c^T Bc = (t_ii/s_ii) c^T Wc
   • λ = t_ii/s_ii . . . ratio of between- to within-variance
   • complement of the null-spaces of B, W
2. finite zero:  (B − 0·W)c = 0  ⇒  c^T Bc = 0
   • opposite of the FLDA aim
   • null-space of B
3. infinite:  c^T Bc = λ c^T Wc for λ = ∞  ”⇒”  c^T Wc = 0
   • wanted for FLDA; quality depends on c^T Bc
   • null-space of W
4. any value is an eigenvalue:  c^T Bc = λ c^T Wc  ∀λ ∈ C  ”⇒”  c^T Wc = 0 = c^T Bc
   • uninteresting for FLDA
   • common null-space of B, W
Introduction
• here only methods from R, Matlab
• linked with LAPACK libraries
• all methods are backward stable
• common approach . . . "eliminate" the singularity
  • slight modification of W while preserving crucial information . . . regularization
• most methods are based on the spectral decomposition
  W = Q diag(λ_1, . . . , λ_r, 0, . . . , 0) Q^T,   Q^T Q = I,
  where r = rank(W)
Pseudoinverse
• let Λ_r = diag(λ_1, . . . , λ_r)
• partition Q = (Q_r, Q_N), where Q_N spans the null-space of W
• transformation to a standard eigenproblem with
  W⁺ = Q_r Λ_r^{−1} Q_r^T

Pseudoinverse - properties
+ smaller eigenproblem (dimension r)
+ no need to search for an optimal regularization parameter
− only finite eigenpairs (discards the null-space of W)
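A sketch of the pseudoinverse approach, using the symmetric r × r formulation (our own illustrative code on synthetic low-rank B and W; function and variable names are assumptions):

```python
import numpy as np

def flda_pseudoinverse(Bm, Wm, tol=1e-10):
    """Restrict the pencil (B, W) to range(W) via W+ = Q_r Lam_r^{-1} Q_r^T
    and solve the symmetric r x r standard eigenproblem
    (Lam_r^{-1/2} Q_r^T B Q_r Lam_r^{-1/2}) c* = mu c*,  c = Q_r Lam_r^{-1/2} c*."""
    lam, Q = np.linalg.eigh(Wm)            # spectral decomposition of W
    keep = lam > tol * lam.max()           # numerical rank r of W
    Qr, lr = Q[:, keep], lam[keep]
    QrL = Qr / np.sqrt(lr)                 # Q_r Lam_r^{-1/2} (column scaling)
    M = QrL.T @ Bm @ QrL                   # symmetric r x r problem
    mu, Cstar = np.linalg.eigh(M)
    C = QrL @ Cstar[:, ::-1]               # back-transform, largest mu first
    return mu[::-1], C

rng = np.random.default_rng(2)
Ub = rng.standard_normal((3, 10))
Uw = rng.standard_normal((5, 10))
Bm = Ub.T @ Ub                             # rank 3
Wm = Uw.T @ Uw                             # rank 5 < p = 10, so W is singular
mu, C = flda_pseudoinverse(Bm, Wm)
c = C[:, 0]
```

By construction each returned direction satisfies c^T W c = 1 and c^T B c = µ, so µ is exactly the between/within variance ratio the criterion maximizes — but only over the range of W.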
Implementation I - Non-symmetric transformation
• solve the standard eigenproblem
  (Q_r Λ_r^{−1} Q_r^T B − λI)c = 0
• cost of the nonsymmetric QR method: ±25p³ flops
• eigenvalues and eigenvectors can be ill-conditioned
• stores several p × p matrices
• see e. g. [Cheng et al. - 1992]
Implementation II - Symmetric transformation
• solve the standard eigenproblem
  (Λ_r^{−1/2} Q_r^T B Q_r Λ_r^{−1/2} − λI) c* = 0,   c = Q_r Λ_r^{−1/2} c* / ‖Q_r Λ_r^{−1/2} c*‖
• cost of the symmetric QR method: ±9p³ flops
• only the eigenvectors can be ill-conditioned
• stores W, Q ∈ R^{p×p}, but the transformed eigenproblem is r × r
• see e. g. [Krzanowski et al. - 1995]
Implementation III - SVD implementation I
• exploits the given structure of B and W, e. g.
  W = ((X − GM)/√(n − g))^T ((X − GM)/√(n − g)) = QΛQ^T
• uses the SVD instead of an eigendecomposition
• cost: ±4p²n flops, or ±14pn² flops for the "economy size" SVD
Implementation III - SVD implementation II
• only the eigenvectors can be ill-conditioned
• storage: one n × p matrix, one n × r matrix
• here everywhere the economy size SVD is used
• the lda() function in the R environment with default parameters uses
  • the pseudoinverse method
  • the SVD implementation
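The structure exploitation can be sketched as follows: an economy-size SVD of the n × p factor A = (X − GM)/√(n − g) yields Q_r and Λ_r of W = A^T A directly, without ever forming the p × p matrix W (our own sketch; names and the rank tolerance are assumptions):

```python
import numpy as np

def w_factors_via_svd(X, GM, g):
    """Return Q_r and the nonzero eigenvalues of W = A^T A,
    A = (X - GM)/sqrt(n - g), from an economy-size SVD of A:
    A = U diag(s) V^T  =>  W = V diag(s^2) V^T."""
    n = X.shape[0]
    A = (X - GM) / np.sqrt(n - g)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)   # economy SVD
    r = int((s > 1e-10 * s.max()).sum())               # numerical rank
    return Vt[:r].T, s[:r] ** 2

rng = np.random.default_rng(3)
X = rng.standard_normal((6, 10))                       # p = 10 > n = 6
y = np.array([0, 0, 0, 1, 1, 1])
GM = np.array([X[y == k].mean(axis=0) for k in y])     # row-wise group means
Qr, lam_r = w_factors_via_svd(X, GM, g=2)
A = (X - GM) / np.sqrt(6 - 2)
```

The singular vectors of A are the eigenvectors of W and the squared singular values its eigenvalues, which is why only n × p and n × r arrays need to be stored.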
Epsilon-perturbation I
• consider QΛ_ε Q^T instead of Q_r Λ_r Q_r^T,
  where Λ_ε = diag(λ_1, . . . , λ_r, ε, . . . , ε); see e. g. [Cheng et al. - 1992]
• transformation of the modified generalized eigenproblem
  (B − λ QΛ_ε Q^T)c = 0

Epsilon-perturbation - properties
+ no exclusion of any null-spaces
− large eigenproblem (dimension p)
− needs a regularization parameter ε
− too small ε ⇒ ill-conditioned eigenproblem
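A sketch of the perturbation, reusing the synthetic low-rank matrices idea (our own code; the eps and tol values are arbitrary assumptions, and scipy.linalg.eigh is used for the resulting symmetric-definite pencil):

```python
import numpy as np
from scipy.linalg import eigh

def flda_epsilon(Bm, Wm, eps=1e-6, tol=1e-10):
    """Replace the (numerically) zero eigenvalues of W by eps, making
    W_eps = Q Lam_eps Q^T positive definite, then solve the full
    p-dimensional generalized problem B c = mu W_eps c."""
    lam, Q = np.linalg.eigh(Wm)
    lam_eps = np.where(lam > tol * lam.max(), lam, eps)
    W_eps = (Q * lam_eps) @ Q.T            # Q diag(lam_eps) Q^T
    mu, C = eigh(Bm, W_eps)                # symmetric-definite pencil
    return mu[::-1], C[:, ::-1]            # largest mu first

rng = np.random.default_rng(2)
Bm = (lambda A: A.T @ A)(rng.standard_normal((3, 10)))   # rank 3
Wm = (lambda A: A.T @ A)(rng.standard_normal((5, 10)))   # rank 5, singular
mu, C = flda_epsilon(Bm, Wm)
c = C[:, 0]
```

Directions in the null-space of W now get a within-variance of order ε, so their ratios become very large instead of infinite — which is exactly why the choice of ε matters.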
Epsilon-perturbation II
• nonsymmetric & symmetric implementations: as before; only forming the transformed eigenproblem is a little more expensive
• SVD implementation: as before, BUT
  • must store the p × p matrix Q
  • cannot exploit the economy SVD: ±4p²n flops
Null-space I
• vectors c in the null-space of W satisfy c^T Wc = 0
• if c^T Bc is large ⇒ c can be used for projection
• find the largest eigenvalues of B in the null-space of W by solving
  (Q_N^T B Q_N − λI)c = 0
  see e. g. [Guo et al. - 2003]

The Null-space method - properties
+ does not need a regularization parameter
− solves an eigenproblem of dimension p − r
− discards the finite eigenvalues of the original problem
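The null-space method fits in a few lines (again our own sketch on synthetic matrices; names and tolerance are assumptions):

```python
import numpy as np

def flda_nullspace(Bm, Wm, tol=1e-10):
    """Project B onto the null-space of W (spanned by Q_N) and take the
    dominant eigenvectors of Q_N^T B Q_N; every returned direction
    satisfies c^T W c ~ 0."""
    lam, Q = np.linalg.eigh(Wm)
    QN = Q[:, lam <= tol * lam.max()]      # orthonormal basis of null(W)
    mu, Cstar = np.linalg.eigh(QN.T @ Bm @ QN)
    C = QN @ Cstar[:, ::-1]                # largest c^T B c first
    return mu[::-1], C

rng = np.random.default_rng(2)
Bm = (lambda A: A.T @ A)(rng.standard_normal((3, 10)))   # rank 3
Wm = (lambda A: A.T @ A)(rng.standard_normal((5, 10)))   # rank 5; null-space dim 5
mu, C = flda_nullspace(Bm, Wm)
c = C[:, 0]
```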
Null-space II
• symmetric implementation: as for Epsilon-perturbation, but storage and cost are a little cheaper
• SVD implementation:
  • storage: one p × (p − r) matrix
  • cost of one full SVD: ±4p²n flops
Common Null-space Elimination I
• . . . what numerical analysts recommend [Parlett - 1998]
• a common null-space of B, W ⇒ ill-posed eigenproblem
• project onto the complement of the common null-space:
  Bx = 0 ∧ Wx = 0  ⇔  (B + W)x = 0,   because B, W ≥ 0
• compute the spectral decomposition of B + W
• let P contain the eigenvectors for the non-zero eigenvalues
• solve the projected problem
  (P^T BP − λ P^T WP)c* = 0,   c = Pc*
• vector selection by considering c^T Bc, c^T Wc
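The elimination step can be sketched as follows (our own code on toy matrices that genuinely share a null-space; the generalized solver and tolerance are assumptions):

```python
import numpy as np
from scipy.linalg import eig

def flda_cnse(Bm, Wm, tol=1e-10):
    """Project the pencil (B, W) onto range(B + W), i.e. the complement
    of the common null-space (valid since B, W >= 0), then solve the
    smaller projected generalized eigenproblem."""
    lam, Q = np.linalg.eigh(Bm + Wm)
    P = Q[:, lam > tol * lam.max()]        # eigenvectors of nonzero eigenvalues
    mu, Cstar = eig(P.T @ Bm @ P, P.T @ Wm @ P)
    return mu, P @ Cstar                   # mu may still contain "infinite" values

rng = np.random.default_rng(4)
Ub = rng.standard_normal((2, 6))
Uw = rng.standard_normal((2, 6))
Bm = Ub.T @ Ub                             # rank 2
Wm = Uw.T @ Uw                             # rank 2
# generically null(B) and null(W) intersect in a 2-dimensional subspace,
# so the projected problem has dimension 6 - 2 = 4 < p = 6
mu, C = flda_cnse(Bm, Wm)
```

Infinite eigenvalues (null-space of W inside the projected space) survive the projection, which is why this method keeps the directions the pseudoinverse approach discards.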
Common Null-space Elimination II
Common Null-space Elimination - properties
+ problem dimension smaller than n (≪ p)
+ no exclusion of the null-space of W

Implementation with QZ:
• storage: only p × n matrices
• cost: of order pn² + n³ flops
• both eigenvalues and eigenvectors can be ill-conditioned
Numerical Experiment
• protein fold classification problem
• sample size n = 268
• 143 in training set
• 125 in test set
• p = 400 predictors
• K = 42 classes
Pseudoinverse
[Figure: c^T Bc (blue) and c^T Wc (red) over the computed eigenvector indices 0–45; logarithmic y-axis from 10⁰ down to 10⁻²⁵.]
Epsilon-perturbation
[Figure: c^T Bc (blue) and c^T Wc (red) over the computed eigenvector indices 0–45; logarithmic y-axis from 10⁰ down to 10⁻²⁵.]
Null-Space
[Figure: c^T Bc (blue) and c^T Wc (red) over the computed eigenvector indices 0–45; logarithmic y-axis from 10⁰ down to 10⁻²⁵.]
Common Null-Space Elimination
[Figure: c^T Bc (blue) and c^T Wc (red) over the computed eigenvector indices 0–45; logarithmic y-axis from 10⁰ down to 10⁻²⁵.]
Error Rates of Classifiers
[Figure: test error rate (0.2–1) against the number of discriminant vectors used (0–45); blue: Pseudoinverse, green: both Epsilon-perturbation and Null-space, red: Common Null-space Elimination.]
• the two middle methods reach an error rate of 24%
• this can compete with the Support Vector Machine strategy at 23.2% (other strategies over 28.8%, see e. g. [Markowetz et al. - 2003])
Conclusions
• the Generalized Schur decomposition as a theoretical tool for a better understanding of FLDA
  • covers all proposed methods
• Common Null-space Elimination
  • seems most appropriate, not only theoretically but also experimentally and numerically
• in addition we found . . .
  • the pseudoinverse method (R) need not be the best at all for p ≫ n!
  • exploitation of structure with the SVD
Open Questions & Future Work
• choice and influence of the regularization parameter in the perturbation technique
• the main challenge for the near future is the extension to very large problems
  • sparsity!
  • exploitation of special structure
• generalization to nonlinear cases
  • kernel methods for Fisher [Baudat, Anouar - 2000]
Appendix
References

G. Baudat and F. Anouar:
Generalized discriminant analysis using a kernel approach.
Neural Computation, 12(10):2385–2404, 2000.

Y. Cheng, Y. Zhuang and J. Yang:
Optimal Fisher discriminant analysis using the rank decomposition.
Pattern Recognition, 25(1):101–111, 1992.

Y.-F. Guo, S.-J. Li, J.-Y. Yang, T.-T. Shu and L.-D. Wu:
A generalized Foley–Sammon transform based on generalized Fisher discriminant criterion and its application to face recognition.
Pattern Recognition Letters, 24:147–158, 2003.

W. J. Krzanowski, P. Jonathan, W. V. McCarthy and M. R. Thomas:
Discriminant analysis with singular covariance matrices: methods and applications to spectroscopic data.
Applied Statistics, 44:101–115, 1995.

F. Markowetz, L. Edler and M. Vingron:
Support Vector Machines for Protein Fold Class Prediction.
Biometrical Journal, 45(3):377–389, 2003.

C. B. Moler and G. W. Stewart:
An algorithm for generalized matrix eigenvalue problems.
SIAM Journal on Numerical Analysis, 10:241–256, 1973.

B. N. Parlett:
The Symmetric Eigenvalue Problem.
SIAM, Philadelphia, 1998.
Appendix
Thank you for your attention!
This work is supported in part by the Information Society Program under project 1ET400300415 and by the MSMT CR project LC536.
Appendix
Generalized Schur Decomposition (by QZ)
[Figure: diagonal of T (corresponding to B, blue) and diagonal of S (corresponding to W, red) over indices 0–400; logarithmic y-axis from 10⁰ down to 10⁻²⁵.]