optimization of models: looking for the best strategy

IWIM 2007 workshop, Sept. 23-26, 2007, Prague
Regularization of Evolving Polynomial Models
Pavel Kordík
CTU Prague, Faculty of Electrical Engineering,
Department of Computer Science
External criteria in GMDH theory
GAME model
[Figure: network diagram. Input variables x1, x2, ..., xn feed the first layer of units, followed by the 2nd layer of units and the output variable. Units in each layer are evolved by a genetic algorithm.]
Linear unit: $y = \sum_{i=1}^{n} a_i x_i + a_{n+1}$
Polynomial unit: $y = \sum_{i=1}^{n} \left( a_i \prod_{j=1}^{m} x_j^{r} \right) + a_0$
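The two transfer functions of the GAME units can be sketched in a few lines of Python. This is only an illustration: the function names and the way each polynomial term is represented (a coefficient paired with one exponent per input) are assumptions, not part of the GAME implementation.

```python
from math import prod


def linear_unit(x, a):
    """Linear unit: y = sum_i a[i] * x[i] + a[n], where a has n + 1 entries."""
    if len(a) != len(x) + 1:
        raise ValueError("a must hold one coefficient per input plus a bias")
    return sum(ai * xi for ai, xi in zip(a, x)) + a[-1]


def polynomial_unit(x, terms, a0):
    """Polynomial unit: y = sum_i (a_i * prod_j x_j ** r_ij) + a0.

    terms is a list of (coefficient, exponents) pairs; exponents holds one
    exponent per input (assumed representation, for illustration only).
    """
    return sum(
        coeff * prod(xj ** r for xj, r in zip(x, exponents))
        for coeff, exponents in terms
    ) + a0


y_lin = linear_unit([1.0, 2.0], [3.0, 4.0, 5.0])
y_poly = polynomial_unit([2.0, 3.0], [(1.0, [1, 3]), (2.0, [2, 1])], 0.5)
```

The second call evaluates a unit of the form a1·x1·x2^3 + a2·x1^2·x2 + a0 with made-up coefficients.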
Encoding into chromosomes
[Figure: GAME model evolution. An input layer with inputs numbered 1 to 7, frozen layer(s), and the actual layer, whose units are evolved by a niching GA.]
Linear transfer unit
Chromosome: Inputs 1234567 / 1001000; the transfer function is not encoded.
$y = a_1 x_1 + a_2 x_2 + a_0$
Polynomial transfer unit
Chromosome: Inputs 1234567 / 0000110; Transfer function 1234567 1234567 / 2115130 1203211.
$y = a_1 x_1 x_2^3 + a_2 x_1^2 x_2 + a_0$
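Polynomial units encoded into neurons
$y = 8.94 \cdot x_2^3 \cdot x_4^2 - 2.37 \cdot x_1 \cdot x_4^5 + 7.12 \cdot x_4$
Encoding (one element per term; each field has one position per variable x1 x2 x3 x4 x5):
Element 1: Coeff. 8.94, used_field 01010, degree_field 23124
Element 2: Coeff. -2.37, used_field 10010, degree_field 14256
Element 3: Coeff. 7.12, used_field 00010, degree_field 32411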
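The encoding on this slide can be decoded mechanically: a variable enters a term only where used_field holds a 1, and its exponent is the digit at the same position in degree_field. The sketch below assumes the fields are stored as strings; the function names are hypothetical.

```python
def decode_element(coeff, used_field, degree_field):
    """Decode one encoded element into (coeff, {variable number: degree}).

    used_field   -- string of 0/1 flags, one per input variable
    degree_field -- string of digits, one per input variable; a digit only
                    matters where the matching used_field flag is 1
    """
    if len(used_field) != len(degree_field):
        raise ValueError("used_field and degree_field must have equal length")
    powers = {
        position + 1: int(degree)
        for position, (used, degree) in enumerate(zip(used_field, degree_field))
        if used == "1"
    }
    return coeff, powers


def evaluate(elements, x):
    """Sum the decoded elements at the point x (x[0] holds x1)."""
    total = 0.0
    for coeff, powers in elements:
        term = coeff
        for variable, degree in powers.items():
            term *= x[variable - 1] ** degree
        total += term
    return total


# The three elements listed on the slide.
elements = [
    decode_element(8.94, "01010", "23124"),
    decode_element(-2.37, "10010", "14256"),
    decode_element(7.12, "00010", "32411"),
]
```

Decoding the first element gives x2 with degree 3 and x4 with degree 2, which matches the term 8.94 * x2^3 * x4^2. The unused digits of degree_field are ignored.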
Data set division
• Training data (set A, 2/3): optimize coefficients (learning of units)
• Validation data (set B, 1/3): select surviving units
• Testing data: check if model overfits data
• Adaptive division?
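A minimal sketch of the 2/3 and 1/3 division into sets A and B. The random shuffle, the fixed seed, and the function name are assumptions made for illustration; the slides do not say how records are assigned to the two sets, and the testing data is assumed to be held out separately beforehand.

```python
import random


def split_a_b(rows, train_fraction=2 / 3, seed=0):
    """Split rows into A (training) and B (validation).

    Shuffling with a fixed seed is an assumption; it only keeps the
    sketch reproducible.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


set_a, set_b = split_a_b(range(9))
```

Coefficients would then be fitted on set_a, and the surviving units selected by their error on set_b.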
Fitness function – which units survive?
• RMS error of the unit on the training data – feedback for optimization methods
$E = \sum_{i=1}^{m} (y' - y)^2$
• RMS error of the unit on the validation data – used to compute fitness of units
External criterion
• Computation of error on validation set
Fitness = 1/CR
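The relation between validation error, the criterion CR, and fitness can be sketched as follows. The RMS error and Fitness = 1/CR follow the slides. The penalized criterion is a hypothetical form chosen only to show how a penalization strength of 1/R could enter: the slides do not give its formula, and all function names here are illustrative.

```python
from math import sqrt


def rms_error(predicted, target):
    """Root mean square error between unit outputs and target values."""
    if len(predicted) != len(target) or not target:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - t) ** 2 for p, t in zip(predicted, target))
    return sqrt(squared / len(target))


def fitness(cr):
    """Fitness = 1 / CR, as on the slide; CR is assumed strictly positive."""
    if cr <= 0:
        raise ValueError("CR must be positive")
    return 1.0 / cr


def penalized_cr(rms_val, n_coefficients, r):
    """HYPOTHETICAL criterion, not taken from the slides.

    Inflates the validation RMS error by the unit's complexity (here the
    number of coefficients) with penalization strength 1 / r.
    """
    return rms_val * (1.0 + n_coefficients / r)


cr_plain = rms_error([2.0, 2.0], [0.0, 0.0])
cr_penalized = penalized_cr(cr_plain, n_coefficients=3, r=300)
```

With this assumed form, a large R leaves the validation error almost unchanged, while a small R favours units with few coefficients.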
CRrms-r-val criterion on real data
[Figure: RMS error on the Antro training data set & the Antro testing data set, for R = 12, 50, 300, 725, 1600, 3000 (series train1, train2, test1, test2).]
Optimal value of R is 300 on the Antro data set
CRrms-r-val criterion on real data
[Figure: RMS error on the Building training data set & Building testing data set, for R = 12, 50, 300, 725, 1600, 3000 (series train1 and test1; output variables WBE, WBCW, WBHW).]
Optimal value of R is 725 on the Building data set
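CR should be sensitive to noise
[Figure: CR (axis ticks 0 to 0.9) plotted against model complexity, ranging from $y = a_1 x_1 + a_2$ to $y = a_1 x_1^3 x_4 + \dots + a_6 x_2 + a_7$, with one curve each for high, medium, and low noise. Labels on the plot: R12, R50, R300, R750, R3000.]
Stop in the minimum of CR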
How to estimate the penalization
strength (1/R)?
Variance of the output variable?
Experiments with synthetic data
[Figure: Regularization on testing data. RMS on the testing data at 0%, 5%, 10%, 20%, 50%, 100%, and 200% noise, in two panels: criteria computed on the training & validation set (RMS-tr&val, R300-tr&val, RMS-p-n-tr&val) and criteria computed on the validation set only.]
Regularization works, but the difference of R300 and p-n is not significant.
Validate just on validation set!
Theoretical and experimental
aspects of regularization
[Figure: two panels with a vertical axis from 0 to 1, plotted over the criteria labelled RMSR2, RMSR5, RMSR12, RMSR50, RMSR300, RMSR725, RMSR1600, and RMSR3000, with one series per noise level (0%, 5%, 10%, 20%, 50%, 100%, 200%).]
So which criterion is the best?
[Figure: comparison of the criteria RMS-valid, R300-val, RMS-p-n-valid, RMS-tr&val, R300-tr&val, and RMS-p-n-tr&valid across noise levels of 0%, 5%, 10%, 20%, 50%, 100%, and 200%; axis range -0.001 to 0.003.]
Regularized polynomial models on the Antro data set
It is evident that the optimal value of R lies between 100 and 1000, the same result as in our previous experiments with the Antro data set (Ropt = 300).
Linear models are still better than the best polynomial model!
Conclusion
• Experiments with regularization of polynomial models
• Every data set requires a different level of penalization for complexity
• It can be partially derived from the variance of the output variable
• The regularization is still not sufficient; linear models perform better on highly noisy data sets!
Thank you!