Dimensionality reduction

Alexis Boukouvalas
Work in collaboration with D. M. Maniyar and D. Cornford
Managing Uncertainty in Complex Models, Aston University




• Develop methods for dimensionality reduction of the input and/or output space of models.
• To gain an understanding, initially compare existing methods on a toy dataset.
• Later, apply these methods to real-world models.
• The goal is to extend the methods to handle a high number of variables (10^5).

Feature Selection




• Also known as screening in the statistical literature.
• Select the p most relevant of the original k variables.
• The meaning of the variables is preserved, so the results of the method are interpretable.

Projective methods
• Variables are transformed: X' = F(X).
• Transformations can be linear or non-linear.
• Interpretation is non-trivial, especially for non-linear mappings.
• Generate N base vectors x of dimensionality d by sampling a Latin hypercube. Normalize the data.
• Evaluate the generative model g(·).
• Corrupt the model output with independent, identically distributed Gaussian noise. Initially the noise variance is set to 0.1 × the signal variance.
• [Screening] Augment the inputs with extra noise dimensions
e = Bx + input noise,
where the input noise is always N(0, I). The B matrix is described on the next slide.
• [Projection] Project to a higher-dimensional space using x' = W·F(x).

[Screening] The B matrix determines the correlation between the noise and model variables. Three schemes are used (see the sketch below):
• B = 0 constructs noise variables that are uncorrelated with the model variables.
• k randomly selected rows have a single non-zero entry, corresponding to that noise variable being linearly correlated with a single model variable. Currently k = 0.5 × #noise variables and the coefficient is set to 0.5.
• Same as the previous scheme, but two elements of each of the k rows are non-zero, k = 0.8 × #noise variables, and the coefficients are drawn randomly from the set {-0.2, -0.5, +0.5, +0.7}.
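A sketch of the three B-matrix schemes in NumPy; the function name make_B and the scheme labels are illustrative, not from the slides:

```python
import numpy as np

def make_B(n_noise, d, scheme, rng):
    """Build the B matrix linking noise variables (rows) to model variables (cols)."""
    B = np.zeros((n_noise, d))
    if scheme == "uncorrelated":
        return B                       # B = 0: no correlation with model variables
    if scheme == "one-variable":
        # k = 0.5 * n_noise rows, one entry set to 0.5
        rows = rng.choice(n_noise, size=int(0.5 * n_noise), replace=False)
        B[rows, rng.integers(0, d, size=rows.size)] = 0.5
    elif scheme == "two-variable":
        # k = 0.8 * n_noise rows, two coefficients from {-0.2, -0.5, +0.5, +0.7}
        rows = rng.choice(n_noise, size=int(0.8 * n_noise), replace=False)
        for r in rows:
            cols = rng.choice(d, size=2, replace=False)
            B[r, cols] = rng.choice([-0.2, -0.5, 0.5, 0.7], size=2)
    return B

B = make_B(n_noise=6, d=3, scheme="two-variable", rng=np.random.default_rng(0))
```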


[Projection] Project into a higher, q-dimensional space:
x' = W·F(x)
W is a q × d weight matrix and F(·) are basis functions responsible for the projection mapping. A typical choice of projection mapping is Radial Basis Functions (RBFs).
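A possible NumPy sketch of the RBF projection, assuming d Gaussian basis centres so that W is q × d as stated above; the centre locations and kernel width are illustrative choices:

```python
import numpy as np

def rbf_project(X, W, centres, width=1.0):
    """Project X (N x d) to a higher q-dimensional space: x' = W * F(x),
    where F(x) are Gaussian RBF activations at the given centres."""
    # Squared distances between every point and every RBF centre
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
    F = np.exp(-d2 / (2.0 * width ** 2))   # N x (number of centres)
    return F @ W.T                          # N x q

rng = np.random.default_rng(0)
N, d, q = 200, 3, 10
X = rng.standard_normal((N, d))
centres = rng.standard_normal((d, d))       # d centres, so W is q x d as above
W = rng.standard_normal((q, d))
X_high = rbf_project(X, W, centres)         # N x q projected inputs
```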

Different noise models




• Correlated
• Multiplicative
• Non-linear interactions of the noise variables with the model variables
• Mix of screening and projection

Variable selection methods have been broadly grouped into three categories:
• Variable ranking. Input variables are ranked according to the prediction accuracy of each individual input, calculated against the model output.
• Wrapper methods. The emulator is used to assess the predictive power of subsets of variables.
• Embedded methods. In both variable ranking and wrapper methods, the emulator is treated as a perfect black box. In embedded methods, the variable selection is done as part of the training of the emulator.






• Forward selection, where variables are progressively incorporated into larger and larger subsets (sketched below).
• Backward elimination proceeds in the opposite direction.
• Efroymson's algorithm, a.k.a. stepwise selection: proceed as in forward selection, but after each variable is added, check whether any of the selected variables can be deleted without significantly affecting the residual sum of squares (RSS).
• Exhaustive search, where all possible subsets are considered.
• Branch and bound: eliminate subset choices as early as possible. E.g. with variables A–Z, if the subset {A, B} achieves RSS 100, the branch of subsets drawn from C–Z need not be followed if the RSS using all of C–Z exceeds 100 (the RSS of any subset of C–Z is at least the RSS obtained using all of them).
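A minimal Python sketch of the forward-selection loop with a linear model and RSS as the criterion (the LinFS setting used in the experiments below); the stopping rule, here a fixed subset size, is an assumption:

```python
import numpy as np

def forward_select(X, y, n_select):
    """Greedy forward selection: repeatedly add the variable that most
    reduces the residual sum of squares (RSS) of a linear fit."""
    selected, remaining = [], list(range(X.shape[1]))

    def rss(cols):
        A = np.column_stack([np.ones(len(y)), X[:, cols]])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        return ((y - A @ coef) ** 2).sum()

    for _ in range(n_select):
        best = min(remaining, key=lambda j: rss(selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected

# e.g. forward_select(X_aug, y_noisy, 3) on the toy data generated earlier
```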

An embedded method commonly employed in the context of Gaussian Processes is Automatic Relevance Determination (ARD), where the characteristic length scales l determine the input relevance: the larger the fitted length scale of an input, the less relevant that input is.
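A sketch of ARD using scikit-learn's anisotropic RBF kernel; the toy data and the choice of library are illustrative, not the experimental setup of the slides:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 6))
y = np.sin(3 * X[:, 0]) + X[:, 1] * X[:, 2]   # only inputs 0-2 matter

# One length scale per input dimension makes the RBF kernel anisotropic (ARD).
kernel = 1.0 * RBF(length_scale=np.ones(X.shape[1]))
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

# Small fitted length scales flag relevant inputs; rank ascending.
lengthscales = gp.kernel_.k2.length_scale
print(np.argsort(lengthscales))               # most relevant inputs first
```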







The following algorithms were used in the experiments:
• BaseRelevant: baseline run using the relevant dimensions only. The RMSE was obtained by training a GP on the relevant dimensions. This value can be interpreted as the optimal RMSE.
• BaseAll: baseline run using all the dimensions, i.e. relevant + extra. Again the RMSE was obtained by training a GP on this set. The difference BaseAll − BaseRelevant is a measure of the effect of the extra variables on the predictive accuracy of the GP.
• CorrCoef: Pearson correlation coefficient. A variable ranking is performed using formula (10), and the top 3 variables are selected and used to train a GP.
• LinFS: employ a forward-selection subset strategy using a multivariate linear regression model. The RMSE is obtained by evaluating the selected subset with a multiple linear regression model.
• GPFS: again employ forward selection to generate subsets, but use a GP rather than a linear model.
• ARD: employ the ARD method to rank the input variables and select the top 3 to train a GP model.
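A sketch of the CorrCoef ranking step; since formula (10) is not reproduced here, this assumes the standard Pearson correlation between each input column and the output:

```python
import numpy as np

def corrcoef_rank(X, y):
    """Rank inputs by absolute Pearson correlation with the output."""
    r = [np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])]
    return np.argsort(-np.abs(r))      # most correlated input first

# top3 = corrcoef_rank(X, y)[:3], then train a GP on those 3 columns
```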

200 observations, 3 model dimensions, 6 total

Algorithm      Variables selected   RMSE     Elapsed time
BaseRelevant   1,2,3                0.9128   1.44142
BaseAll        1,2,3,4,5,6          1.0473   1.60529
CorrCoef       1,4,2(,3,5,6)        2.1642   1.50487
LinFS          1,4,2                2.7803   0.134283
GPFS           1,2,3                0.9092   18.2017
ARD            1,2,3                0.9134   5.56684

200 observations, 3 model dimensions, 6 total

Algorithm      Variables selected   RMSE     Elapsed time
BaseRelevant   1,2,3                0.9111   1.42363
BaseAll        1,2,3,4,5,6          1.0633   1.66093
CorrCoef       1,4,5(,2,6,3)        2.6794   1.31676
LinFS          1,4,6                2.8083   0.143308
GPFS           1,2,3                0.9274   19.0051
ARD            1,2,3                1.0076   5.0611

Initial results for the high-dimensional input, two-variable-correlated noise case: 100 model inputs, 500 noise dimensions, 500 observations.
Length     Input number
31.8373    361
18.7081    501
14.2097    296
12.7581    51
12.3160    456
11.8689    496
11.3176    166
10.2424    310
10.2220    420
9.6192     325
9.0732     363
8.6898     53
8.5453     347
7.9338     419
7.8201     294
7.8017     188
7.4327     103
7.3760     13
7.1526     572
7.0997     478
6.9481     393
6.6417     187



• The best performing methods are GPFS and ARD, which usually find the optimal subset. However, the GPFS method is on average more than three times slower than ARD.
• The CorrCoef and LinFS methods are computationally inexpensive but give unsatisfactory results.
• Even for simple mapping functions (sin x), ARD breaks down on underdetermined systems, where the number of observations is smaller than the number of dimensions.

Batch hierarchical screening



• Explore the potential of partitioning the input space into groups of inputs, applying screening methods to the groups, and combining the important inputs (see the sketch below).
• Some work has already been done for linear models (Gabriel and Pan 1979).
• Group the variables such that if two variables X_i, X_j are in different groups, their regression sums of squares (RSS) are additive: if S_i is the reduction in RSS from including X_i, and S_j that for X_j, then including both gives S_ij = S_i + S_j.
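One possible reading of this idea as a sketch; batch_screen and its screen argument are illustrative names, and the additivity condition above is what would justify pooling per-group results:

```python
import numpy as np

def batch_screen(X, y, n_groups, screen, rng):
    """Screen inputs group by group, then pool the survivors.
    `screen` is any per-group selector returning local column indices,
    e.g. the forward_select or ARD ranking sketched earlier."""
    idx = rng.permutation(X.shape[1])
    keep = []
    for group in np.array_split(idx, n_groups):
        keep.extend(group[screen(X[:, group], y)])
    return sorted(keep)
```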

Coupled Emulation



• Separate emulators for different outputs, linked with some model for the covariance.
• Connections to sequential methods for handling large datasets. Linked to sequential sparse GPs?
• Projective methods in conjunction with feature selection.
[Figure from van der Maaten et al. 2007]

However, [van der Maaten et al. 2007] compared the non-linear methods to the linear ones and found the non-linear methods no better. The reasons they propose relate to the curse of dimensionality, overfitting of local models, and others.


L.J.P. van der Maaten, E.O. Postma, and H.J. van den Herik. Dimensionality Reduction: A Comparative Review, 2007.
Isabelle Guyon and André Elisseeff. An Introduction to Variable and Feature Selection. Journal of Machine Learning Research, 3:1157–1182, 2003.