Structure Pruning Strategies for Min-Max Modular Network
Yang Yang and Baoliang Lu
Department of Computer Science and Engineering, Shanghai Jiao Tong University,
1954 Hua Shan Rd., Shanghai 200030, China
[email protected], [email protected]
Abstract. The min-max modular network has been shown to be an
efficient classifier, especially in solving large-scale and complex pattern
classification problems. Despite its high modularity and parallelism, it
suffers from quadratic complexity in space when a multiple-class problem
is decomposed into a number of linearly separable problems. This paper
proposes two new pruning methods and an integrated process to reduce
the redundancy of the network and optimize the network structure. We
show that our methods can prune many redundant modules compared with the
original structure while maintaining the generalization accuracy.
1 Introduction
The min-max modular (M3) network is an efficient modular neural network model for pattern classification [1], [2], especially for large-scale and complex multi-class problems [3]. It divides a large-scale, complex problem into a series of smaller two-class problems, each of which is solved by an independent module. The learning tasks of all modules can be conducted in parallel, and their outputs are integrated into a final solution to the original problem according to two module combination principles. These combination principles also successfully guide emergent incremental learning [4]. However, when the training set is large and the subproblems are small, a very large number of modules must be learned. In the incremental learning setting, training data are presented to the network continually and more and more modules are built, so the classifier becomes inefficient at responding to novel inputs.
To improve the response performance of the min-max modular network, we consider reducing its redundancy in two phases. The first is the recognition phase, in which we can dynamically decide which modules influence the final result and therefore need to be computed. The other is the training phase, in which we can apply a pruning process to the network to remove the modules that are redundant for the training data; the pruned modules are assumed to be redundant for the whole input space.
This work was supported in part by the National Natural Science Foundation of
China via the grants NSFC 60375022 and NSFC 60473040.
In addition, two other lines of research on training optimization can be considered. The first arises from instance reduction techniques [5]. Since the nearest-neighbor (NN) algorithm and its derivatives suffer from high computational costs and storage requirements, instance filtering and abstraction approaches have been developed to obtain a small and representative prototype set. We can first build such a condensed prototype set and then use it to construct the modules of the M3 network; this approach is instance pruning. The other line of research is structure pruning. Lian and Lu first developed a back searching (BS) algorithm to prune redundant modules [6]. We extend their work and propose an integrated process that achieves a larger module reduction rate while maintaining classification accuracy.
2 Redundancy Analysis
According to the task decomposition principle of the M3 network, a K-class problem is divided into K × (K − 1)/2 two-class problems, each of which can be further decomposed into a number of subproblems. An M3 network with MIN and MAX integrating units can solve a two-class problem, and each network module learns one subproblem. Let us consider the situation of incremental learning [4] mentioned before, and suppose the training set of each subproblem has only two different elements. Let T be the total training set and X_l be an input vector, where 1 ≤ l ≤ L and L is the number of training data. The desired output y is defined by

y = \begin{cases} 1 - \varepsilon, & \text{if } X_l \in \text{class } C_i \\ \varepsilon, & \text{if } X_l \in \text{class } \bar{C}_i \end{cases}    (1)

where \varepsilon is a small positive real number and \bar{C}_i denotes all the classes except C_i.
Accordingly, the training set of a subproblem has the following form:
T_{ij}^{(u,v)} = \{(X^{(iu)}, 1 - \varepsilon)\} \cup \{(X^{(jv)}, \varepsilon)\}
for u = 1, \ldots, L_i, \; v = 1, \ldots, L_j, \; i, j = 1, \ldots, K, and j \neq i    (2)
where L_i and L_j are the numbers of training data belonging to class C_i and class C_j, respectively. Hence the two-class problem has L_i × L_j subproblems. We use the minimization rule first and then the maximization rule to construct the network; Fig. 1 shows the network structure. Since T_{ij}^{(u,v)} contains only two instances, it is obviously a linearly separable problem and can be discriminated by a hyperplane. The optimal hyperplane is the perpendicular bisector of the line segment joining the two instances. In this way, the decision boundary established by the network can be written as a piecewise linear discriminant function:

\max_{1 \le u \le L_i} \; \min_{1 \le v \le L_j} \left( L_{i,j}^{u,v} \right)    (3)
where L_{i,j}^{u,v} is the hyperplane determined by the module M_{i,j}^{u,v} trained on T_{ij}^{(u,v)}. It is trivial to prove that this decision boundary is the same as that of the nearest neighbor classifier.

Fig. 1. M3 network for the two-class problem of C_i and C_j.

Fig. 2. An example of redundant modules; circles denote class_1 and squares denote class_2.

In fact, there are many redundancies in the expression above. For example, Fig. 2 depicts a 2-D classification problem in which only the black lines contribute to the decision boundary, while the cyan ones are redundant.
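To make the decomposition concrete, the sketch below (our own minimal Python/NumPy illustration, not the authors' implementation; all function and variable names are ours) builds the perpendicular-bisector module for each pair of instances and combines them with MIN over v and then MAX over u, which reproduces the nearest-neighbor decision described above.

import numpy as np

def hyperplane_score(x, a, b):
    """Signed score of x for the perpendicular bisector of the segment (a, b).
    Positive: x is closer to a (the class-i instance); negative: closer to b."""
    w = a - b                   # normal vector of the bisector
    m = (a + b) / 2.0           # midpoint lies on the bisector
    return float(np.dot(w, x - m))

def min_max_decision(x, class_i, class_j):
    """MIN over the class-j instances, then MAX over the class-i instances.
    Equivalent to the nearest-neighbor decision between the two classes."""
    scores = np.array([[hyperplane_score(x, a, b) for b in class_j]
                       for a in class_i])       # shape (L_i, L_j)
    return scores.min(axis=1).max()             # > 0 => predict class i

# Tiny 2-D example with two instances per class
class_i = np.array([[0.0, 0.0], [1.0, 0.0]])
class_j = np.array([[3.0, 0.0], [3.0, 1.0]])
x = np.array([1.4, 0.2])
print("class i" if min_max_decision(x, class_i, class_j) > 0 else "class j")

Every inner term corresponds to one module M_{i,j}^{u,v}; the redundancy discussed above means that many of these terms never attain the minimum or the maximum for any input.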
3 Pruning Algorithms
A back searching (BS) algorithm for pruning redundant modules has been developed [6], which searches for the useful modules based on the outputs of every MIN unit. Take MIN_k (1 ≤ k ≤ L_i) as an example and set T_k = {a_k, b_1, b_2, ..., b_Lj}, where a_k ∈ C_i and b_1, b_2, ..., b_Lj ∈ C_j. For each instance in T_k, calculate the output of MIN_k and mark the module that gives the minimal value. After all the instances in T_k have been processed, delete the modules without a mark. The BS algorithm can remove many modules while maintaining the original decision boundary. However, it does not consider removing the related redundant MIN units. Here we extend the BS algorithm to obtain an integrated pruning process.
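As a minimal sketch (our own illustrative Python, not the implementation of [6]), the BS step for a single MIN unit can be written as follows, assuming a module_output(module, instance) function that returns a module's real-valued output:

def backward_search(modules, T_k, module_output):
    """BS within one MIN unit: keep only the modules that give the minimal
    output for at least one instance in T_k = {a_k, b_1, ..., b_Lj};
    the unmarked modules never determine the MIN unit's output."""
    marked = set()
    for instance in T_k:
        outputs = [module_output(m, instance) for m in modules]
        marked.add(outputs.index(min(outputs)))
    return [m for idx, m in enumerate(modules) if idx in marked]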
3.1 Our Methods
Because a multi-class problem can be decomposed into a set of two-class problems, for simplicity we discuss here the redundancy reduction problem for the two-class network. Suppose the two classes are class_1 and class_2, which contain M and N instances, respectively. The M3 network then contains M MIN units and one MAX unit, and each MIN unit contains N modules. We propose two kinds of M3 structure pruning strategy, called M3SPS1 and M3SPS2, which use different criteria of redundancy.

M3SPS1. The idea of M3SPS1 is straightforward: a MIN unit used by no instance in the training set is regarded as redundant. Let T be the training set. The algorithm is as follows:
1. Set Flag(MIN_i) = FALSE (i = 1, 2, ..., M).
2. For each instance I in T:
   (a) Calculate y(I).
   (b) Find the MIN_i (1 ≤ i ≤ M) whose output equals y(I), and set Flag(MIN_i) = TRUE.
3. Delete every MIN_j (1 ≤ j ≤ M) with Flag(MIN_j) = FALSE.
4. For each retained MIN_i: execute the BS algorithm and prune its redundant modules.
5. End.
This approach is consistent with the training data because it deletes only those units that are useless for classifying the training data.
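A sketch of M3SPS1 for a two-class network is given below (illustrative Python under assumed interfaces: each MIN-unit object exposes output(x), its module list, and its relevant instance set T_k; backward_search is the routine sketched earlier). It is not the authors' code, only a rendering of the five steps above.

def m3sps1(min_units, training_set):
    """M3SPS1: delete MIN units that never produce the network output y(I),
    then run backward search inside every retained MIN unit."""
    used = [False] * len(min_units)
    for x in training_set:                      # steps 1-2
        outputs = [u.output(x) for u in min_units]
        y = max(outputs)                        # y(I): output of the MAX unit
        used[outputs.index(y)] = True           # mark the MIN unit giving y(I)
    retained = [u for u, keep in zip(min_units, used) if keep]   # step 3
    for u in retained:                          # step 4: prune inside each unit
        u.modules = backward_search(u.modules, u.T_k,
                                    lambda m, inst: m.output(inst))
    return retained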
M3SPS2. The idea of M3SPS2 is similar to Reduced Nearest Neighbor [7] and DROP1 [5]. All of them try to perform reduction without hurting the classification accuracy on the training set or on the retained subset. However, M3SPS2 prunes modules rather than instances. Moreover, every instance is guaranteed to be classified correctly whether or not its related modules have been removed. The algorithm is described as follows:
1. Set Flag(MIN_i) = FALSE and create an empty list(MIN_i), i = 1, 2, ..., M.
2. For each instance I in T:
   (a) Calculate y(I).
   (b) Find the MIN_i (1 ≤ i ≤ M) whose output equals y(I), set Flag(MIN_i) = TRUE, and insert I into list(MIN_i).
3. For each MIN_i with Flag(MIN_i) = TRUE:
   If every I ∈ list(MIN_i) can be classified correctly without MIN_i:
      Set Flag(MIN_i) = FALSE.
      For each I ∈ list(MIN_i): insert I into list(MIN_j), where 1 ≤ j ≤ M and MIN_j now gives y(I) in place of MIN_i.
4. Delete every MIN_i (1 ≤ i ≤ M) with Flag(MIN_i) = FALSE.
5. For each retained MIN_k (1 ≤ k ≤ M): execute the BS algorithm and prune its redundant modules.
6. End.
This approach is also consistent with the training data. Note that it is sensitive to the order in which the MIN units are examined, so we can use a search to determine the removal sequence. The basic assumption is that a module built from an inner instance is likely to be useless. Since y(I) indicates the perpendicular distance between instance I and the decision boundary, we memorize y(Instance^(1i)) at step 2 and examine the MIN_i in descending order of y(Instance^(1i)) at step 3, where 1 ≤ i ≤ M and Instance^(1i) ∈ class_1 is the instance corresponding to MIN_i.
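The removal loop of M3SPS2, including the descending-order heuristic, can be sketched as follows (again illustrative Python under the same assumed interfaces; the (x, label) pairs, the 0.5 decision threshold between \varepsilon and 1 − \varepsilon, and the own_instance attribute holding each unit's defining class_1 instance are our assumptions, not part of the paper):

def m3sps2(min_units, training_set, threshold=0.5):
    """M3SPS2 sketch: remove a MIN unit when every instance it currently
    serves is still classified correctly by the remaining units.
    training_set is a list of (x, label) pairs, label 1 for class_1."""
    def correct(x, label, units):
        y = max(u.output(x) for u in units)     # network output without MIN_i
        return (y > threshold) == (label == 1)

    # Step 2: record which MIN unit yields y(I) for every instance.
    used = [False] * len(min_units)
    lists = [[] for _ in min_units]
    for x, label in training_set:
        outputs = [u.output(x) for u in min_units]
        i = outputs.index(max(outputs))
        used[i] = True
        lists[i].append((x, label))

    # Step 3: examine units in descending order of y of their own instance.
    order = sorted((i for i, flag in enumerate(used) if flag),
                   key=lambda i: min_units[i].output(min_units[i].own_instance),
                   reverse=True)
    for i in order:
        rest = [min_units[j] for j, flag in enumerate(used) if flag and j != i]
        if rest and all(correct(x, lbl, rest) for x, lbl in lists[i]):
            used[i] = False                     # MIN_i is redundant
            for x, lbl in lists[i]:             # reassign its instances
                outs = [u.output(x) for u in rest]
                winner = rest[outs.index(max(outs))]
                lists[min_units.index(winner)].append((x, lbl))
            lists[i] = []
    # Step 4 deletes the unflagged units; step 5 runs BS inside the rest.
    return [u for u, flag in zip(min_units, used) if flag]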
4 Experimental Results
We present five experiments to verify our methods: the two-spirals problem and four real-world problems. Three of the real-world problems were conducted on benchmark data sets from the UCI Machine Learning Database Repository [8]: Iris Plants, Image Segmentation, and Vehicle Silhouettes. The last one was carried out on a data set of glass-board images from an industrial product line, used to discriminate eligible glass-boards. Table 1 shows all the data sets and the numbers of classes, dimensions, training and test samples. Table 2 shows the classification accuracy, response time, and the retention rates of modules and
MIN units, where

\text{retention rate} = \frac{\text{No. of modules/MIN units after pruning}}{\text{No. of modules/MIN units in the original structure}}.    (4)
Since a multi-class problem can be decomposed into a number of two-class problems, each solved by an M3 network, for the Iris, Segment, and Vehicle problems we record the response time both in parallel and in series. We also list the accuracy and response time of the nearest-neighbor classifier for comparison. All experiments were performed on a 3 GHz Pentium 4 PC with 1 GB RAM.
Table 1. Numbers of classes, dimensions, training and test data

              Two-spirals   Iris   Segment   Vehicle   Glass image
Class              2          3       7         4           2
Dimension          2          4      19        18         160
Training          95        135    1540       600         174
Test              96         15     770       246          78
Table 2. Experimental results. For the last three problems, the "time" entries give sequential/parallel time (CPU time, in milliseconds).

                   1-NN             M3SPS1                            M3SPS2
Problems       acc    time    acc     time     MINs   modules    acc     time     MINs   modules
Two-spirals   1.00     31    1.00      32      1.0     0.149    0.99      30      0.432   0.044
Glass image   0.897    47    0.895     63      0.477   0.282    0.885     47      0.075   0.044
Iris          0.982    15    0.982   16/15     0.289   0.042    0.982   15/15     0.067   0.013
Segment       0.958   328    0.956  766/177    0.370   0.036    0.904  453/164    0.053   0.005
Vehicle       0.638    47    0.638   78/41     0.75    0.104    0.630   63/37     0.298   0.047
From the experimental results in Table 2, several observations can be made.
M3SPS1 almost maintains the decision boundary of the original structure, but its pruning ability is limited. M3SPS2 achieves a significant module reduction rate, but the classification accuracy drops by an average of 1.3%. This is likely because, when too many MIN units are removed, the decision boundaries sometimes change dramatically and the classification of positive points may be affected. When handling complex multi-class problems, we can exploit the simple decomposition rules of M3 and learn the two-class problems in parallel, which cuts down the response time greatly. Take Segment as an example: with parallel execution, M3SPS2 requires only about half the response time of the nearest neighbor classifier, and this advantage will become even more distinct for more complex multi-class problems.
5 Conclusions and Future Work
We have presented two new pruning algorithms for dealing with redundant MIN units and developed a process that integrates the back searching algorithm to reduce the redundancy of the network and optimize the M3 network structure. The process handles not only whole MIN units but also individual modules, and improves the structure significantly. The experiments verified the validity of the structure pruning strategies. Compared with instance reduction methods, they are directly concerned with decision boundaries, because a module in fact represents a piece of the class boundary, so they may express complex concepts more delicately. However, further investigation of the redundancy criterion is still needed. It should also be noted that this study has examined only hyperplane base classifiers. Since the M3 network is a flexible framework, the user can choose a proper base classifier and module size according to different requirements. Future work will investigate methods for pruning min-max modular networks with other base classifiers, such as the Gaussian zero-crossing function [9], which has an adaptive, locally tuned response characteristic.
References
1. Lu, B. L., Ito, M.: Task Decomposition Based on Class Relations: a Modular Neural
Network Architecture for Pattern Classification. In: Biological and Artificial Computation: From Neuroscience to Technology. Lecture Notes in Computer Science.
Vol. 1240. Springer-Verlag (1997) 330-339
2. Lu, B. L., Ito, M.: Task Decomposition and Module Combination Based on Class
Relations: a Modular Neural Network for Pattern Classification. IEEE Trans. Neural Networks, 10 (1999) 1244-1256
3. Lu, B. L., Shin, J., Ichikawa, M.: Massively Parallel Classification of Single-trial
EEG Signals Using a Min-Max Modular Neural Network. IEEE Trans. Biomedical
Engineering, 51 (2004) 551-558
4. Lu, B. L., Ichikawa, M.: Emergent On-line Learning in Min-max Modular Neural
Networks. In: Proc. of IEEE/INNS Int. Conf. on Neural Networks, Washington
DC, USA (2001) 2650-2655
5. Wilson, D.R., Martinez, T.R.: Reduction Techniques for Instance-based Learning
Algorithms. Machine Learning, 38 (2000) 257-286
6. Lian, H. C., Lu, B. L.: An Algorithm for Pruning Redundant Modules in Min-Max Modular Network (in Chinese). In: Proc. 14th National Conference on Neural Network, Hefei University of Technology Press (2004) 37-42
7. Gates, G. W.: The Reduced Nearest Neighbor Rule. IEEE Transactions on Information Theory, IT-18-3 (1972) 431-433
8. Blake, C. L., Merz, C. J.: UCI Repository of Machine Learning Databases. ftp://ftp.ics.uci.edu/pub/machine-learning-databases (1998)
9. Lu, B. L., Ichikawa, M.: Emergent Online Learning with a Gaussian Zero-crossing
Discriminant Function. In: Proc. IJCNN’02 (2002) 1263-1268