Structure Pruning Strategies for Min-Max Modular Network

Yang Yang and Baoliang Lu

Department of Computer Science and Engineering, Shanghai Jiao Tong University, 1954 Hua Shan Rd., Shanghai 200030, China
[email protected], [email protected]

This work was supported in part by the National Natural Science Foundation of China via grants NSFC 60375022 and NSFC 60473040.

Abstract. The min-max modular network has been shown to be an efficient classifier, especially for solving large-scale and complex pattern classification problems. Despite its high modularity and parallelism, it suffers from quadratic space complexity when a multi-class problem is decomposed into a number of linearly separable problems. This paper proposes two new pruning methods and an integrated process to reduce the redundancy of the network and optimize the network structure. We show that our methods can prune a large number of redundant modules compared with the original structure while maintaining generalization accuracy.

1 Introduction

The min-max modular (M3) network is an efficient modular neural network model for pattern classification [1][2], especially for large-scale and complex multi-class problems [3]. It divides a large-scale, complex problem into a series of smaller two-class problems, each of which is solved by an independent module. The learning tasks of all modules can be carried out in parallel, and their results are integrated into a final solution to the original problem according to two module combination principles. These combination principles also successfully guide emergent incremental learning [4]. However, when the training set is large and the subproblems are small, a very large number of modules must be learned. In the incremental learning setting, training data are presented to the network continually and more and more modules are built, so the classifier becomes inefficient in responding to novel inputs.

To improve the response performance of the min-max modular network, we consider reducing its redundancy in two phases. The first is the recognition phase, in which we can dynamically decide which modules influence the final result and therefore need to be computed. The other is the training phase, in which we can apply a pruning process to the network to remove modules that are redundant for the training data; the pruned modules are assumed to be redundant for the whole input space as well.

In addition, two other lines of research on training optimization can be considered. The first arises from instance reduction techniques [5]. Since the nearest-neighbor (NN) algorithm and its derivatives suffer from high computational cost and storage requirements, instance filtering and abstraction approaches have been developed to obtain a small and representative prototype set. We can first build such a condensed prototype set and then use it to construct the modules of the M3 network; this approach is instance pruning. The other line of research is structure pruning. Lian and Lu first developed a back searching (BS) algorithm to prune redundant modules [6]. We extend their work and propose an integrated process to gain a larger module reduction rate while maintaining classification accuracy.
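To make the scale of the decomposition concrete, the following is a minimal sketch (the function name and toy data are illustrative, not from the paper) of how a K-class training set is broken into K × (K − 1)/2 class pairs and L_i × L_j smallest subproblems per pair; the quadratic growth of the module count with class size is what motivates the pruning strategies studied here.

# A sketch of the M3 task decomposition into pairwise, two-instance subproblems.
# Illustrative only: module training itself is omitted.
from itertools import combinations, product

def enumerate_subproblems(class_data):
    """class_data: dict mapping class label -> list of training vectors."""
    subproblems = []
    for ci, cj in combinations(class_data, 2):              # K*(K-1)/2 class pairs
        for xi, xj in product(class_data[ci], class_data[cj]):
            # each module learns one positive and one negative instance
            subproblems.append(((ci, xi), (cj, xj)))
    return subproblems

# toy example: 3 classes with 2, 3, and 2 instances -> 2*3 + 2*2 + 3*2 = 16 modules
data = {"c1": [[0, 0], [0, 1]],
        "c2": [[2, 0], [2, 1], [2, 2]],
        "c3": [[4, 0], [4, 1]]}
print(len(enumerate_subproblems(data)))                      # prints 16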
2 Redundancy Analysis

According to the task decomposition principle of the M3 network, a K-class problem is divided into K × (K − 1)/2 two-class problems, each of which can be further decomposed into a number of subproblems. An M3 network with MIN and MAX integrating units solves each two-class problem, and each network module learns one subproblem. Consider the incremental learning situation [4] mentioned above, and suppose the training set of each subproblem has only two different elements. Let T be the total training set and X_l the input vector, where 1 ≤ l ≤ L and L is the number of training data. The desired output y is defined by

    y = 1 − ε,  if X_l ∈ class C_i
    y = ε,      if X_l ∈ class C̄_i                                            (1)

where ε is a small positive real number and C̄_i denotes all the classes except C_i. Accordingly, the training set of a subproblem has the following form:

    T_ij^(u,v) = {(X_l^(iu), 1 − ε)} ∪ {(X_l^(jv), ε)},
        for u = 1, ..., L_i,  v = 1, ..., L_j,  i, j = 1, ..., K,  and j ≠ i    (2)

where L_i and L_j are the numbers of training data belonging to class C_i and class C_j, respectively. Hence the two-class problem has L_i × L_j subproblems. We use the minimization rule first and then the maximization rule to construct the network; Fig. 1 shows the network structure. Since T_ij^(u,v) has only two instances, it is obviously a linearly separable problem and can be discriminated by a hyperplane, and the optimal hyperplane is the perpendicular bisector of the line segment joining the two instances. In this way, the decision boundary established by the network can be written as a piecewise linear discriminant function

    max_{1≤u≤L_i}  min_{1≤v≤L_j}  L_ij^(u,v)                                    (3)

where L_ij^(u,v) is the hyperplane determined by module M_ij^(u,v) trained on T_ij^(u,v). It is trivial to prove that this decision boundary is the same as that of the nearest neighbor classifier.

Fig. 1. M3 network for the two-class problem of C_i and C_j

Fig. 2. An example of redundant modules; loops denote class_1 and squares denote class_2

In fact there are many redundancies in the expression above. For example, in Fig. 2, which depicts a 2-D classification problem, only the black lines contribute to the boundary and the cyan ones are redundant.
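As an illustration of Eq. (3), the following minimal sketch (hypothetical code, not the authors' implementation) computes the min-max discriminant for a two-class problem. Instead of explicit hyperplane equations it uses the distance difference ||x − b|| − ||x − a||, whose sign tells on which side of the perpendicular-bisector hyperplane between a and b the point x lies, so the sign of the max-of-min output reproduces the nearest-neighbor decision noted above.

# Minimal sketch of the min-max combination in Eq. (3): one module per pair of
# instances, MIN units over class2 instances, one MAX unit over class1 instances.
import numpy as np

def m3_discriminant(x, class1, class2):
    """Min-max discriminant; a positive output means class1."""
    x = np.asarray(x, dtype=float)
    min_outputs = []
    for a in class1:                              # one MIN unit per class1 instance
        a = np.asarray(a, dtype=float)
        module_outputs = [np.linalg.norm(x - np.asarray(b, dtype=float)) -
                          np.linalg.norm(x - a)
                          for b in class2]        # one module per class2 instance
        min_outputs.append(min(module_outputs))   # MIN unit
    return max(min_outputs)                       # MAX unit

# toy check: the sign of the discriminant agrees with the 1-NN decision
class1 = [[0.0, 0.0], [1.0, 0.0]]
class2 = [[3.0, 0.0], [3.0, 1.0]]
print(m3_discriminant([0.5, 0.2], class1, class2) > 0)   # True  -> class1
print(m3_discriminant([2.9, 0.5], class1, class2) > 0)   # False -> class2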
3 Pruning Algorithms

A backward searching (BS) algorithm for pruning redundant modules has been developed [6]; it searches for the useful modules based on the outputs of every MIN unit. Take MIN_k (1 ≤ k ≤ L_i) as an example and let T_k = {a_k, b_1, b_2, ..., b_Lj}, where a_k ∈ C_i and b_1, b_2, ..., b_Lj ∈ C_j. For each instance in T_k, calculate the output of MIN_k and mark the module which gives the minimal value. After all the instances in T_k have been processed, delete the modules left without a mark. The BS algorithm can remove a large number of modules while maintaining the original decision boundary. However, it does not consider removing redundant MIN units themselves. Here we extend the BS algorithm to obtain an integrated pruning process.

3.1 Our Methods

Because a multi-class problem can be decomposed into a set of two-class problems, we discuss the redundancy reduction problem of the two-class network for simplicity. Suppose the two classes are class_1 and class_2, containing M and N instances, respectively. The M3 network then contains M MIN units and one MAX unit, and each MIN unit contains N modules. We propose two kinds of M3 Structure Pruning Strategy, called M3SPS1 and M3SPS2; they use different criteria of redundancy.

M3SPS1. The idea of M3SPS1 is straightforward: a unit used by no instance in the training set is regarded as redundant. Let T be the training set. The algorithm is as follows:

1. Set Flag(MIN_i) = FALSE (i = 1, 2, ..., M).
2. For each instance I in T:
   (a) Calculate y(I).
   (b) Find the MIN_i (1 ≤ i ≤ M) whose output equals y(I), and set Flag(MIN_i) = TRUE.
3. Delete every MIN_j (1 ≤ j ≤ M) for which Flag(MIN_j) = FALSE.
4. For each retained MIN_i: execute the BS algorithm and prune its redundant modules.
5. End.

This approach is consistent with the training data because it deletes only those units that are useless for the classification of the training data.

M3SPS2. The idea of M3SPS2 is similar to the Reduced Nearest Neighbor rule [7] and DROP1 [5]. All of them try to perform reduction without hurting the classification accuracy on the training set or on the retained subset. However, M3SPS2 prunes modules rather than instances. Moreover, every instance is guaranteed to be classified correctly whether or not its related modules have been removed. The algorithm is described as follows:

1. Set Flag(MIN_i) = FALSE and create an empty list(MIN_i), for i = 1, 2, ..., M.
2. For each instance I in T:
   (a) Calculate y(I).
   (b) Find the MIN_i (1 ≤ i ≤ M) whose output equals y(I), set Flag(MIN_i) = TRUE, and insert I into list(MIN_i).
3. For each MIN_i with Flag(MIN_i) = TRUE:
   If every I ∈ list(MIN_i) can still be classified correctly without MIN_i, set Flag(MIN_i) = FALSE, and for each I ∈ list(MIN_i) insert I into list(MIN_j), where 1 ≤ j ≤ M and MIN_j now gives y(I) instead of MIN_i.
4. Delete every MIN_i (1 ≤ i ≤ M) for which Flag(MIN_i) = FALSE.
5. For each retained MIN_k (1 ≤ k ≤ M): execute the BS algorithm and prune its redundant modules.
6. End.

This approach is also consistent with the training data. Note that it is sensitive to the order in which the MIN units are examined, so we perform a search to determine the removal sequence. The basic assumption is that a module built from an inner instance is likely to be useless. Since y(I) indicates the perpendicular distance between instance I and the decision boundary, we memorize y(Instance^(1i)) at step 2 and examine the MIN units in descending order of y(Instance^(1i)) at step 3, where 1 ≤ i ≤ M and Instance^(1i) ∈ class_1 is the instance corresponding to MIN_i.
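The following is a minimal sketch, under simplifying assumptions, of the M3SPS2 removal test. It uses the same distance-based module output as the sketch in Section 2, re-checks every class_1 instance instead of maintaining the per-unit lists above, and omits the final per-module BS step, so it only illustrates the idea rather than reproducing the exact procedure.

# Simplified M3SPS2 sketch: drop a MIN unit if all class1 training instances are
# still classified correctly without it; examine units in the heuristic order.
import numpy as np

def min_unit_output(a, class2, x):
    """Output of the MIN unit built on class1 instance a, evaluated at x."""
    x = np.asarray(x, dtype=float)
    return min(np.linalg.norm(x - np.asarray(b, dtype=float)) -
               np.linalg.norm(x - np.asarray(a, dtype=float)) for b in class2)

def m3sps2_prune(class1, class2):
    """Return indices of the MIN units retained by the simplified M3SPS2 test."""
    kept = list(range(len(class1)))
    # ordering heuristic: examine MIN units of inner instances (largest y) first
    order = sorted(kept,
                   key=lambda i: min_unit_output(class1[i], class2, class1[i]),
                   reverse=True)
    for i in order:
        trial = [k for k in kept if k != i]
        if not trial:
            continue
        # removing a MIN unit can only lower the MAX output, so class2 instances
        # remain correct; it suffices that every class1 instance stays positive
        if all(max(min_unit_output(class1[k], class2, x) for k in trial) > 0
               for x in class1):
            kept = trial
    return kept

class1 = [[0.0, 0.0], [1.0, 0.0], [0.5, 0.1], [0.8, 0.3]]
class2 = [[3.0, 0.0], [3.0, 1.0]]
print(m3sps2_prune(class1, class2))   # here a single MIN unit suffices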
4 Experimental Results

We present five experiments to verify our methods: the two-spirals problem and four real-world problems. Three of the real-world experiments were conducted on benchmark data sets from the UCI Machine Learning Repository [8]: Iris Plants, Image Segmentation, and Vehicle Silhouettes. The last one was carried out on a data set of glass-board images from an industrial product line, used to discriminate eligible glass-boards. Table 1 lists all the data sets with their numbers of classes, dimensions, and training and test samples. Table 2 shows the classification accuracy, response time, and the retention rates of modules and MIN units, where

    retention rate = (number of modules/MIN units after pruning) / (number of modules/MIN units in the original structure).    (4)

Since a multi-class problem can be decomposed into a number of two-class problems, each of which is solved by an M3 network, we record the response time both in series and in parallel for the Iris, Segment, and Vehicle problems. We also list the accuracy and response time of the nearest-neighbor classifier for comparison. All the experiments were performed on a 3 GHz Pentium 4 PC with 1 GB RAM.

Table 1. Numbers of classes, dimensions, training and test data

    Problem        Classes  Dimensions  Training  Test
    Two-spirals    2        2           95        96
    Iris           3        4           135       15
    Segment        7        19          1540      770
    Vehicle        4        18          600       246
    Glass image    2        160         174       78

Table 2. Experimental results. For the last three problems, the "time" column gives sequential/parallel CPU time in milliseconds.

    Problem        1-NN             M3SPS1                               M3SPS2
                   acc    time      acc    time      MINs   modules      acc    time      MINs   modules
    Two-spirals    1.00   31        1.00   32        1.0    0.149        0.99   30        0.432  0.044
    Glass image    0.897  47        0.895  63        0.477  0.282        0.885  47        0.075  0.044
    Iris           0.982  15        0.982  16/15     0.289  0.042        0.982  15/15     0.067  0.013
    Segment        0.958  328       0.956  766/177   0.370  0.036        0.904  453/164   0.053  0.005
    Vehicle        0.638  47        0.638  78/41     0.75   0.104        0.630  63/37     0.298  0.047

From the experimental results in Table 2, several observations can be made. M3SPS1 almost maintains the decision boundary of the original structure, but its pruning ability is limited. M3SPS2 achieves a significant module reduction rate, but the classification accuracy drops by an average of 1.3%; this is likely because, when too many MIN units are removed, the decision boundaries sometimes change dramatically and the classification of the positive points may be affected. When handling complex multi-class problems, we can make use of the simple decomposition rules of M3 and learn the two-class problems in parallel, so the response time can be cut down greatly. Taking Segment as an example, parallel M3SPS2 costs only about half the response time of the nearest-neighbor classifier, and this superiority becomes more distinct for more complex multi-class problems.
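As a toy illustration of Eq. (4) (with made-up counts, not values taken from Table 2), the retention rates reported in the MINs and modules columns are simply the ratios of surviving units and modules to those of the original structure:

# Hypothetical counts for a pruned two-class M3 network (not from the paper).
M, N = 100, 80                    # original structure: M MIN units, N modules each
kept_min_units = 7                # MIN units surviving M3SPS2
kept_modules = 280                # modules surviving the subsequent BS step

min_retention = kept_min_units / M           # 0.07  -> the "MINs" column
module_retention = kept_modules / (M * N)    # 0.035 -> the "modules" column
print(min_retention, module_retention)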
5 Conclusions and Future Work

We have presented two new pruning algorithms for dealing with redundant MIN units and developed a process that integrates the back searching algorithm to reduce the redundancy and optimize the structure of the M3 network. The process addresses not only whole MIN units but also individual modules, and improves the structure significantly. The experiments verified the validity of the structure pruning strategies. Compared with instance reduction methods, they are directly concerned with decision boundaries, because a module in fact represents a piece of the class boundary, so they may express complex concepts more precisely. However, further investigation of the criterion of redundancy is still needed. It should also be noted that this study has examined only the hyperplane base classifier. Since the M3 network is a flexible framework, the user can choose a proper base classifier and module size according to different requirements. One direction of future work is to investigate methods for pruning min-max modular networks with other base classifiers, such as the Gaussian zero-crossing function [9], which has an adaptive locally tuned response characteristic.

References

1. Lu, B. L., Ito, M.: Task Decomposition Based on Class Relations: A Modular Neural Network Architecture for Pattern Classification. In: Biological and Artificial Computation: From Neuroscience to Technology. Lecture Notes in Computer Science, Vol. 1240. Springer-Verlag (1997) 330-339
2. Lu, B. L., Ito, M.: Task Decomposition and Module Combination Based on Class Relations: A Modular Neural Network for Pattern Classification. IEEE Trans. Neural Networks 10 (1999) 1244-1256
3. Lu, B. L., Shin, J., Ichikawa, M.: Massively Parallel Classification of Single-trial EEG Signals Using a Min-Max Modular Neural Network. IEEE Trans. Biomedical Engineering 51 (2004) 551-558
4. Lu, B. L., Ichikawa, M.: Emergent On-line Learning in Min-Max Modular Neural Networks. In: Proc. IEEE/INNS Int. Conf. on Neural Networks, Washington DC, USA (2001) 2650-2655
5. Wilson, D. R., Martinez, T. R.: Reduction Techniques for Instance-based Learning Algorithms. Machine Learning 38 (2000) 257-286
6. Lian, H. C., Lu, B. L.: An Algorithm for Pruning Redundant Modules in Min-Max Modular Network (in Chinese). In: Proc. 14th National Conference on Neural Networks, Hefei University of Technology Press (2004) 37-42
7. Gates, G. W.: The Reduced Nearest Neighbor Rule. IEEE Transactions on Information Theory IT-18 (1972) 431-433
8. Blake, C. L., Merz, C. J.: UCI Repository of Machine Learning Databases. ftp://ftp.ics.uci.edu/pub/machine-learning-databases (1998)
9. Lu, B. L., Ichikawa, M.: Emergent Online Learning with a Gaussian Zero-crossing Discriminant Function. In: Proc. IJCNN'02 (2002) 1263-1268