
Introduction to Kernel PCA
Near-optimal Sparse L1-PCA
Need to capture "non-linear" patterns in the data
Principal component analysis (PCA) is linear
High-dimensional data mapping
A feature mapping takes the data from the low-dimensional input space to a high-dimensional feature space, where the data becomes linearly separable.
To answer this, let's revisit PCA...
• X = [x_1, ..., x_N] in R^{D x N}: data matrix
• Sum_n x_n = 0: assume zero-centered data
• C = (1/N) X X^T: calculate the covariance matrix
• C v_k = lambda_k v_k: eigenvector calculation
• Y = V_d^T X: low-dimensional projection onto the top-d eigenvectors
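A minimal NumPy sketch of these steps (the function name and the D x N column layout are illustrative assumptions, not from the slides):

    import numpy as np

    def pca_project(X, d):
        """Standard PCA. X is D x N (each column is a sample); returns the d x N projection."""
        Xc = X - X.mean(axis=1, keepdims=True)      # zero-center the data
        C = Xc @ Xc.T / Xc.shape[1]                 # D x D covariance matrix
        eigvals, eigvecs = np.linalg.eigh(C)        # eigenvalues in ascending order
        V_d = eigvecs[:, ::-1][:, :d]               # top-d eigenvectors as columns
        return V_d.T @ Xc                           # low-dimensional projection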
To answer this, let's revisit PCA... (continued)
• Two lesser-known facts:
  – Projected data are de-correlated in the new basis: V_d^T C V_d = Lambda_d is a diagonal sub-matrix.
  – Every eigenvector can be written exactly as a linear combination of the data vectors: v_k = Sum_n a_{kn} x_n (a short derivation follows this list).
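Why the second fact holds, in the notation above (a_{kn} is our label for the combination weights):

  C v_k = \lambda_k v_k, \qquad C = \frac{1}{N}\sum_n x_n x_n^\top
  \;\Longrightarrow\;
  v_k = \frac{1}{\lambda_k N}\sum_n (x_n^\top v_k)\, x_n = \sum_n a_{kn}\, x_n,
  \qquad a_{kn} = \frac{x_n^\top v_k}{\lambda_k N} \quad (\lambda_k \neq 0).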
• Non-linear transformation: map each point x_n to phi(x_n) and run PCA in the feature space.
• Feature-space covariance: C_phi = (1/N) Sum_n phi(x_n) phi(x_n)^T, with eigenproblem C_phi v = lambda v.
• Using (1) and PC property #2 (the second lesser-known fact above), every feature-space eigenvector can be written as v = Sum_n a_n phi(x_n).
• Non-linear transformation (continued): define the kernel function K(x_i, x_j) = phi(x_i)^T phi(x_j), i.e., the inner product in feature space.
• After a few simplifications, the feature-space eigenproblem reduces to an N x N eigenproblem on the kernel matrix K: K a = N lambda a (sketched below).
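The "few simplifications", in the notation introduced above: substitute the expansion of v into the feature-space eigenproblem and left-multiply by phi(x_m)^T, so only kernel values remain.

  \frac{1}{N}\sum_n \phi(x_n)\phi(x_n)^\top v = \lambda v,
  \qquad v = \sum_j a_j\,\phi(x_j)
  \;\Longrightarrow\;
  \frac{1}{N}\sum_n K(x_m, x_n) \sum_j a_j K(x_n, x_j) = \lambda \sum_j a_j K(x_m, x_j) \quad \forall m
  \;\Longleftrightarrow\;
  K^2 a = N\lambda\, K a
  \;\Rightarrow\;
  K a = N\lambda\, a \quad (\text{dropping one factor of } K, \text{ valid for the non-zero eigenvalues}).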
Focus: what do we need from the kernel function?
Q. Do we need the individual feature vectors phi(x_n)?
A. No. We need only projections onto the principal directions and inner products phi(x_i)^T phi(x_j) = K(x_i, x_j); the mapping phi itself can stay implicit ("mysterious").
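Concretely, using the expansion v = Sum_j a_j phi(x_j) from above, projecting any point x onto a kernel principal component needs only kernel evaluations:

  \phi(x)^\top v \;=\; \phi(x)^\top \sum_j a_j\,\phi(x_j) \;=\; \sum_j a_j\, K(x, x_j).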
Given two points x and y, we need the kernel value K(x, y) = phi(x)^T phi(y) without ever forming phi(x) or phi(y) explicitly. An illustrative example follows.
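The kernel used in the slides' worked example is not recoverable from the text; as a stand-in, the order-2 polynomial kernel in two dimensions shows the idea:

  x = (x_1, x_2),\; y = (y_1, y_2):\qquad
  (x^\top y)^2 = x_1^2 y_1^2 + 2\,x_1 x_2\, y_1 y_2 + x_2^2 y_2^2
  = \phi(x)^\top \phi(y),
  \qquad \phi(u) = \bigl(u_1^2,\ \sqrt{2}\,u_1 u_2,\ u_2^2\bigr).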
• A kernel of this form corresponds to an inner product in a higher-dimensional space.
• All computation happens in the original x-space; instead of designing the mapping phi, we design the kernel.
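A minimal kernel-PCA sketch built only from kernel evaluations in the input space (the RBF kernel choice, function name, and gamma are illustrative assumptions, not from the slides):

    import numpy as np

    def kernel_pca(X, d, gamma=1.0):
        """Kernel PCA with an RBF kernel. X is N x D (rows are samples); returns N x d coordinates."""
        sq = np.sum(X ** 2, axis=1)
        K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))  # N x N kernel matrix
        N = K.shape[0]
        J = np.eye(N) - np.ones((N, N)) / N
        Kc = J @ K @ J                                # center the data implicitly in feature space
        eigvals, eigvecs = np.linalg.eigh(Kc)         # ascending eigenvalues
        idx = np.argsort(eigvals)[::-1][:d]           # indices of the d largest eigenvalues
        A = eigvecs[:, idx] / np.sqrt(eigvals[idx])   # expansion coefficients a_n, scaled so ||v|| = 1
        return Kc @ A                                 # projections of the training points

An out-of-sample point x is projected with the same formula Sum_j a_j K(x, x_j), after centering its kernel row consistently.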
[Figure: 2-D projections of a two-class 3-D dataset (axes x, y, z). Panels: original data, standard PCA, polynomial kernel order-5, radial kernel; each panel distinguishes class 1 and class 2.]
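A sketch of how a comparison figure like this could be regenerated with scikit-learn's PCA and KernelPCA; the dataset, kernel parameters, and styling are assumptions, since the original data is not recoverable from the slide text.

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.decomposition import PCA, KernelPCA

    rng = np.random.default_rng(0)
    # Two concentric 3-D shells stand in for class 1 and class 2.
    r = np.r_[np.full(200, 30.0), np.full(200, 80.0)]
    u = rng.uniform(0, np.pi, 400)
    v = rng.uniform(0, 2 * np.pi, 400)
    X = np.c_[r * np.sin(u) * np.cos(v), r * np.sin(u) * np.sin(v), r * np.cos(u)]
    y = np.r_[np.zeros(200, dtype=int), np.ones(200, dtype=int)]

    models = {
        "standard PCA": PCA(n_components=2),
        "polynomial kernel order-5": KernelPCA(n_components=2, kernel="poly", degree=5),
        "radial kernel": KernelPCA(n_components=2, kernel="rbf", gamma=1e-3),
    }
    fig, axes = plt.subplots(1, 3, figsize=(12, 4))
    for ax, (name, model) in zip(axes, models.items()):
        Z = model.fit_transform(X)                   # 2-D embedding from each method
        for c in (0, 1):
            ax.scatter(Z[y == c, 0], Z[y == c, 1], s=10, label=f"class {c + 1}")
        ax.set_title(name)
        ax.legend()
    plt.show()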
Sparse PCA (SPCA)
• Interpretability: seeks a direction that not only maximizes data variance but also has only a few non-zero components.
L1-PCA
• Robustness against outliers.
Sparsity enhances interpretability and the L1-norm enhances robustness, so let's have the dual benefit of robustness and interpretability simultaneously:
L1-SPCA
• Interpretability
• Robustness against outliers
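For reference, one common way to write these three problems (our notation; the sparsity level s and the exact formulation are assumptions, not taken from the slides):

  \text{L1-PCA:}\quad \max_{\|q\|_2 = 1} \|X^\top q\|_1
  \qquad
  \text{SPCA:}\quad \max_{\|q\|_2 = 1,\ \|q\|_0 \le s} \|X^\top q\|_2
  \qquad
  \text{L1-SPCA:}\quad \max_{\|q\|_2 = 1,\ \|q\|_0 \le s} \|X^\top q\|_1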