Projection Pursuit
Projection Pursuit (PP)
PCA and FDA are linear; PP may be linear or non-linear.
Find an interesting "criterion of fit" or "figure of merit" function that allows for a low-dimensional (usually 2D or 3D) projection.
General transformation with parameters W:

$$Y^{(j)} = \left(Y_1^{(j)}, Y_2^{(j)}\right) = f\left(X; W^{(j)}\right)$$

Index I(Y;W) of "interestingness":

$$I\left(Y^{(j)}; W^{(j)}\right)$$
Interesting indices may use a priori knowledge about the problem:
1. mean nearest neighbor distance – increase clustering of Y(j) (see the sketch below);
2. maximize mutual information between classes and features.
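A minimal sketch of index 1, assuming NumPy/SciPy; the function name mean_nn_distance is just illustrative:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def mean_nn_distance(Y):
    """Mean distance from each projected point to its nearest neighbor.
    Y is an (n, 2) or (n, 3) array of projected data; smaller values
    indicate stronger clustering of the projection."""
    D = squareform(pdist(Y))      # full pairwise distance matrix
    np.fill_diagonal(D, np.inf)   # exclude self-distances
    return D.min(axis=1).mean()
```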
Kurtosis
ICA is a special version of PP, recently very popular.
Gaussian distributions of a variable Y are characterized by 2 parameters:

mean value: $\bar{Y} = E\{Y\}$

variance: $\sigma_Y^2 = E\left\{\left(Y - E(Y)\right)^2\right\}$

These are the first 2 moments of the distribution; all higher cumulants are 0 for G(Y).
One simple measure of non-Gaussianity of projections is the 4-th moment (cumulant) of the distribution, called kurtosis; it measures the "peakedness" (tail weight) of the distribution. For E{Y}=0 kurtosis is:

$$\kappa_4(Y) = E\left\{Y^4\right\} - 3\left(E\left\{Y^2\right\}\right)^2$$

Super-Gaussian distribution: long tail, peak at zero, $\kappa_4(Y) > 0$, like binary image data.
Sub-Gaussian distribution is more flat, with $\kappa_4(Y) < 0$.
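A quick numerical check of these definitions, assuming NumPy; kurtosis4 is an illustrative name:

```python
import numpy as np

def kurtosis4(y):
    """k4(Y) = E{Y^4} - 3 (E{Y^2})^2 for a zero-mean sample y."""
    y = y - y.mean()              # enforce E{Y} = 0
    return np.mean(y**4) - 3.0 * np.mean(y**2)**2

rng = np.random.default_rng(0)
print(kurtosis4(rng.standard_normal(100_000)))    # ~0: Gaussian
print(kurtosis4(rng.laplace(size=100_000)))       # >0: super-Gaussian
print(kurtosis4(rng.uniform(-1, 1, 100_000)))     # <0: sub-Gaussian
```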
Correlation and independence
Variables are statistically independent if their joint probability distribution is a product of the probabilities for all variables:

$$p\left(X_1, X_2, \ldots, X_n\right) = \prod_{i=1}^{n} p_i\left(X_i\right)$$

Features Yi, Yj are uncorrelated if the covariance matrix is diagonal, or:

$$E\left\{Y_i Y_j\right\} = E\left\{Y_i\right\} E\left\{Y_j\right\}$$

Uncorrelated features are orthogonal.
Statistically independent features Yi, Yj give, for any functions f1, f2:

$$E\left\{f_1(Y_i)\, f_2(Y_j)\right\} = E\left\{f_1(Y_i)\right\} E\left\{f_2(Y_j)\right\}$$

This is a much stronger condition than lack of correlation; in particular the functions may be powers of the variables. Any non-Gaussian distribution after the PCA transformation will still have statistically dependent features.
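A small numerical sketch of the difference, assuming NumPy: Y2 = Y1^2 is uncorrelated with Y1, yet the factorization above fails for f1(y) = y^2, f2(y) = y:

```python
import numpy as np

rng = np.random.default_rng(0)
y1 = rng.uniform(-1, 1, 100_000)
y2 = y1**2                            # a deterministic function of y1

# Uncorrelated: E{Y1 Y2} = E{Y1^3} = 0 = E{Y1} E{Y2} by symmetry
print(np.mean(y1 * y2), np.mean(y1) * np.mean(y2))   # both ~0

# Not independent: the factorization fails for f1(y)=y^2, f2(y)=y
print(np.mean(y1**2 * y2))            # E{Y1^4} ~ 0.2
print(np.mean(y1**2) * np.mean(y2))   # E{Y1^2} E{Y2} ~ 0.11
```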
PP/ICA example
Example: PCA and PP based on maximal kurtosis; note the nice separation of the blue class.
Some remarks
• Many formulations of PP and ICA methods
exist.
• PP is used for data visualization and
dimensionality reduction.
• Nonlinear projections are frequently considered, but solutions are more numerically intensive.
• PCA may also be viewed as PP; the index I(Y;W) is based here on maximum variance (for standardized data):

$$W^{(1)} = \arg\max_{\|W\|=1} E\left\{\left(W^{\mathsf{T}} X\right)^2\right\}$$

Other components are found in the space orthogonal to the components already found:

$$W^{(k)} = \arg\max_{\|W\|=1} E\left\{\left[W^{\mathsf{T}}\left(I - \sum_{i=1}^{k-1} W^{(i)} W^{(i)\mathsf{T}}\right) X\right]^2\right\}$$

The same index is used, with projection on the space orthogonal to the k−1 previous PCs (a sketch of this deflation scheme follows).
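A minimal sketch of this deflation scheme, assuming NumPy; pca_by_pp is an illustrative name, and the inner loop is a simple power-iteration ascent rather than the lecture's method:

```python
import numpy as np

def pca_by_pp(X, k, n_iter=200, seed=0):
    """Find k principal directions of standardized data X (n x d) by
    repeatedly maximizing E{(W^T x)^2} over unit vectors W, restricted
    to the space orthogonal to the directions already found."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    P = np.eye(d)                      # projector onto remaining subspace
    directions = []
    for _ in range(k):
        w = rng.normal(size=d)
        for _ in range(n_iter):        # power-iteration ascent on P C P
            w = P @ (X.T @ (X @ (P @ w))) / n
            w /= np.linalg.norm(w)
        directions.append(w)
        P -= np.outer(w, w)            # deflation: project out w
    return np.array(directions)
```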
How do we find multiple projections?
• Statistical approach is complicated:
– Perform a transformation on the data to eliminate structure in the already found direction
– Then perform PP again
• Neural computation approach: lateral inhibition.
High Dimensional Data
Dimension Reduction / Feature Extraction → Visualisation / Classification / Analysis
Projection Pursuit
what: An automated procedure that seeks interesting low-dimensional projections of a high-dimensional cloud by numerically maximizing an objective function or projection index. (Huber, 1985)
Projection Pursuit
why:
• Curse of dimensionality:
– less robustness
– worse mean squared error
– greater computational cost
– slower convergence to limiting distributions
– …
• Required number of labelled samples increases with
dimensionality.
What is an interesting projection?
In general: a projection that reveals the most information about the structure of the data.
In pattern recognition: a projection that maximises class separability in a low-dimensional subspace.
Projection Pursuit
Dimension Reduction
Find lower-dimensional projections of a high-dimensional point cloud to facilitate classification.
Exploratory Projection Pursuit
Reduce the dimension of the problem to facilitate visualization.
Projection Pursuit
How many dimensions to use
• for visualization
• for classification/analysis
Which Projection Index to use
• measure of variation (Principal Components)
• departure from normality (negative entropy; a sketch follows this list)
• class separability (distance, Bhattacharyya, Mahalanobis, ...)
• …
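A sketch of the departure-from-normality index in the style of Hyvärinen's negentropy approximation (an assumption here, since the slide only names negative entropy); the constant E{G(nu)} is estimated by Monte Carlo:

```python
import numpy as np

def negentropy_index(y, rng=None):
    """J(y) ~ (E{G(y)} - E{G(nu)})^2 with G(u) = log cosh(u),
    nu ~ N(0,1); y is standardized first. Larger = less Gaussian."""
    rng = rng or np.random.default_rng(0)
    y = (y - y.mean()) / y.std()
    G = lambda u: np.log(np.cosh(u))
    return (G(y).mean() - G(rng.standard_normal(100_000)).mean())**2
```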
Projection Pursuit
Which optimization method to choose
We are trying to find the global optimum among many local ones (a random-restart sketch follows this list):
• hill climbing methods (simulated annealing)
• regular optimization routines with random starting points.
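A sketch of the second option, assuming SciPy; best_projection is an illustrative name, and any of the index functions sketched earlier (kurtosis4, negentropy_index) can be plugged in:

```python
import numpy as np
from scipy.optimize import minimize

def best_projection(X, index, n_starts=10, seed=0):
    """Maximize a projection index over unit directions w by running a
    local optimizer from several random starts and keeping the best."""
    rng = np.random.default_rng(seed)
    neg = lambda w: -index(X @ (w / np.linalg.norm(w)))
    best = None
    for _ in range(n_starts):
        res = minimize(neg, rng.normal(size=X.shape[1]),
                       method="Nelder-Mead")
        if best is None or res.fun < best.fun:
            best = res
    w = best.x / np.linalg.norm(best.x)
    return w, -best.fun
```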
Timetable for Dimensionality reduction
• Begin: 16 April 1998
• Report on the state-of-the-art: 1 June 1998
• Begin software implementation: 15 June 1998
• Prototype software presentation: 1 November 1998
ICA demos
• ICA has many applications in signal and image analysis.
• Finding independent signal sources allows for separation of signals from different sources and removal of noise or artifacts.

Observations X are a linear mixture W of unknown sources Y:

$$X = W\,Y$$

Both W and Y are unknown! This is a blind source separation problem. How can they be found?
If Y are Independent Components and W is a linear mixing, the problem is similar to FDA or PCA; only the criterion function is different.
Play with the ICALab PCA/ICA Matlab software for signal/image analysis:
http://www.bsp.brain.riken.go.jp/page7.html
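ICALab is Matlab software; as a stand-in, here is a minimal blind separation sketch with scikit-learn's FastICA (our choice of tool, not the lecture's):

```python
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]   # two independent sources Y
A = np.array([[1.0, 0.5], [0.4, 1.0]])             # unknown mixing matrix W
X = S @ A.T                                        # observed mixtures X

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)   # recovered sources, up to order/scale/sign
```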
ICA demo: images & audio
Example from Cichocki’s lab,
http://www.bsp.brain.riken.go.jp/page7.html
X space for images:
• take intensities of all pixels → one vector per image, or
• take smaller patches (e.g. 64x64), increasing the number of vectors (see the sketch below)
• 5 images: originals, mixed, convergence of ICA iterations
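A small sketch of the patch option, assuming NumPy; image_to_patches is an illustrative name:

```python
import numpy as np

def image_to_patches(img, size=64):
    """Cut a 2-D grayscale image into non-overlapping size x size patches;
    each patch becomes one flattened row vector of the data matrix X."""
    h, w = img.shape
    return np.array([img[r:r + size, c:c + size].ravel()
                     for r in range(0, h - size + 1, size)
                     for c in range(0, w - size + 1, size)])
```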
Self-organization
PCA, FDA, ICA, PP are all inspired by statistics,
although some neural-inspired methods have
been proposed to find interesting solutions,
especially for their non-linear versions.
• Brains learn to discover the structure of signals:
visual, tactile, olfactory, auditory (speech and
sounds).
• This is a good example of unsupervised learning: spontaneous development of feature detectors, compressing internal information.
Models of self-organization
SOM or SOFM (Self-Organized Feature Mapping) – self-organizing feature map, one of the simplest models.
How can such maps develop spontaneously?
Local neural connections: neurons interact strongly with those nearby, but weakly with those that are far away (in addition inhibiting some intermediate neurons).
History:
von der Malsburg and Willshaw (1976), competitive learning, Hebb mechanisms, „Mexican hat” interactions, models of visual systems.
Amari (1980) – models of continuous neural tissue.
Computational Intelligence:
Methods and Applications
Lecture 8
Projection Pursuit &
Independent Component Analysis
Włodzisław Duch
SCE, NTU, Singapore