Talanta 132 (2015) 175–181
Contents lists available at ScienceDirect
Talanta
journal homepage: www.elsevier.com/locate/talanta

Dealing with the heterogeneous classification problem in the framework of multi-instance learning

Zhaozhou Lin a, Shuaiyun Jia a, Gan Luo a, Xingxing Dai a, Bing Xu a, Zhisheng Wu a, Xinyuan Shi a,b,*, Yanjiang Qiao a,b,*

a College of Chinese Medicine, Beijing University of Chinese Medicine, Beijing 100102, China
b Research Center of TCM-information Engineering, State Administration of Traditional Chinese Medicine of the People's Republic of China, Beijing 100102, China

Article history: Received 21 April 2014; Received in revised form 28 August 2014; Accepted 3 September 2014; Available online 16 September 2014

Abstract: To deal with the heterogeneous classification problem efficiently, each heterogeneous object was represented by a set of measurements obtained on different parts of it, and the heterogeneous classification problem was reformulated in the framework of multi-instance learning (MIL). Based on a variant of the count-based MIL assumption, a maximum count least squares support vector machine (maxc-LS-SVM) learning algorithm was developed. The algorithm was tested on a set of toy datasets. It was found that maxc-LS-SVM inherits the sound characteristics of both LS-SVM and the MIL framework. A comparison study between the proposed approach and two other MIL approaches (i.e., mi-SVM and MI-SVM) was performed on a real wolfberry fruit spectral dataset. The results demonstrate that, by formulating the heterogeneous classification problem as a MIL one, it can be solved effectively by the proposed maxc-LS-SVM algorithm.
© 2014 Elsevier B.V. All rights reserved.

Keywords: Heterogeneous spectra; Multi-instance learning (MIL); Error-Correcting Output Codes (ECOC); Maximum count least squares support vector machine (maxc-LS-SVM); Geographical origins

1. Introduction

The application of near infrared (NIR) spectroscopy to classification has spread across the analysis of food, agricultural, petroleum, and pharmaceutical products [1–3]. However, spectra obtained on heterogeneous objects, such as corn kernels and pharmaceutical tablets, are often of high variance, and applying common classification methods to these spectra yields weak conclusions. This is one typical category of the heterogeneous classification problem, which has not yet been solved thoroughly [4].

The key to solving the heterogeneous classification problem is to represent each heterogeneous object efficiently. In the literature, pre-treatment is the most successful and widely used technique: one builds a measurement protocol that improves the representativeness of the measurements. Spectra measured using an integrating sphere [5–10], rotating the sample during spectral collection, or grinding the samples [11–14] are of this type. Significantly better results were observed when the measurement was taken by a patented measurement method [15]. However, when the spectra need to be collected in situ, none of the above methods is applicable. Hwang et al. [16] developed a fast and non-destructive analytical method to identify the geographical origins of rice samples via transmission spectra collected through packed grains, but variation in packing thickness prevents the measurement from yielding reproducible spectra.

* Corresponding authors at: College of Chinese Medicine, Beijing University of Chinese Medicine, Beijing 100102, China. Tel.: +86 10 84738621; fax: +86 84738661. E-mail addresses: [email protected] (X. Shi), [email protected] (Y. Qiao).
http://dx.doi.org/10.1016/j.talanta.2014.09.007
0039-9140/© 2014 Elsevier B.V. All rights reserved.
Instead of being represented by a single measurement, an object can be represented by a set of measurements (instances). The heterogeneous classification problem can therefore be solved by formulating it as a multi-instance learning problem. Instead of receiving labeled instances as in traditional supervised learning, the MIL learner receives a set of labeled bags. The majority of MIL studies are concerned with binary classification problems [17]. Most of these studies assume that a bag is labeled positive if at least one instance in it is positive; otherwise, if all its instances are negative, the bag is labeled negative. Based on this classical MIL assumption, numerous MIL methods have been proposed in the literature, and most of them have been reviewed in earlier studies [17–19]. These algorithms mainly use the information of one instance from each positive bag. However, there is commonly more than one positive instance in a positive bag, and much of the information contained in these instances is lost.

The MIL formulation was first used to solve the musk drug activity prediction problem [20]. Since then, many problems have been formulated as MIL problems, such as image categorization, object detection, and human activity recognition [18,21,22]. Although early work in MIL assumed a specific concept class known to be appropriate for drug-activity-prediction-like domains, the classical MIL assumption is not guaranteed to hold in other domains. Recently, a significant amount of MIL research has been concerned with cases where the classical view of the MIL problem is relaxed and alternative assumptions are considered instead [17]. Although not all of these works clearly state which particular assumption is used and how it relates to other assumptions, the use of alternative assumptions has been clarified by reviews of this area.
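The classical MIL bag-labeling rule described above is simple enough to state in a few lines. The following sketch is illustrative only (the helper name is hypothetical, and instance labels are encoded as ±1):

```python
def classical_bag_label(instance_labels):
    """Classical MIL assumption: a bag is positive (+1) if at least one
    instance is positive; it is negative (-1) only if all instances are negative."""
    return 1 if any(y == 1 for y in instance_labels) else -1

# A single positive instance makes the whole bag positive.
print(classical_bag_label([-1, -1, 1]))   # -> 1
print(classical_bag_label([-1, -1, -1]))  # -> -1
```

Note that under this rule the negative instances inside a positive bag contribute nothing to its label, which is exactly the information loss the paper points out.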
In this work, a variant of the count-based binary MIL assumption was adopted: a bag is labeled positive if the product of the positive posteriors of its instances, weighted by the bag prior, is larger than that of the negative ones; otherwise the bag is labeled negative. Based on this assumption, the maxc-LS-SVM algorithm was proposed to deal with the heterogeneous classification problem. For multi-class cases, the original classification problem was decomposed in the framework of Error-Correcting Output Codes (ECOC) using the one-versus-one design [23]. The maxc-LS-SVM algorithm was modified correspondingly, and its performance was compared with the mi-SVM and MI-SVM [24] algorithms. The results of applying these approaches to toy datasets and a real herbal dataset clearly show the advantage of the maxc-LS-SVM algorithm.

2. Theory and algorithm

2.1. Least squares support vector machine (LS-SVM)

Only the basic concepts of LS-SVM are summarized here, because its theory has been described extensively in the literature [25,26]. LS-SVM has advantages similar to those of SVM but additionally requires solving only a set of linear equations, which is much easier and computationally simpler. The objective function of LS-SVM is [27]

min_{ω,b,e} J_P(ω, e) = (1/2) ω^T ω + (γ/2) Σ_{i=1}^N e_i²
subject to (s.t.): y_i [ω^T φ(x_i) + b] = 1 − e_i,  i = 1, …, N    (1)

where ω denotes the normal vector to the classification hyperplane, γ is the hyperparameter tuning the amount of regularization versus the sum of squared errors, e_i is the error variable, and φ(x) is the nonlinear map from the original space to a high (and possibly infinite) dimensional space. To solve the optimization problem efficiently, a Lagrange function is constructed and translated into its dual form

L(ω, b, e, α) = J_P(ω, e) − Σ_{i=1}^N α_i { y_i [ω^T φ(x_i) + b] − 1 + e_i }    (2)

where the α_i are the Lagrange multipliers.
The conditions for optimality are

∂L/∂ω = 0 → ω = Σ_{i=1}^N α_i y_i φ(x_i)
∂L/∂b = 0 → Σ_{i=1}^N α_i y_i = 0
∂L/∂e_i = 0 → α_i = γ e_i,  i = 1, …, N
∂L/∂α_i = 0 → y_i [ω^T φ(x_i) + b] − 1 + e_i = 0,  i = 1, …, N    (3)

From the conditions for optimality, it can be concluded that no α_i will be exactly equal to zero, meaning that the advantage of automatic sparseness is lost. However, after constructing the Lagrangian the model can be trained much more efficiently by solving the linear Karush–Kuhn–Tucker (KKT) system, since it yields a linear system instead of a quadratic programming problem:

[ 0     1_N^T        ] [ b ]   [ 0 ]
[ 1_N   Ω + γ⁻¹ I_N  ] [ a ] = [ y ]    (4)

where y is a vector containing the reference values, 1_N is an [N × 1] vector of ones, and I_N is an [N × N] identity matrix. Ω is the kernel matrix defined by Ω_ij = φ(x_i)^T φ(x_j) = K(x_i, x_j). The classifier in the dual space takes the form

y(x) = sign[ Σ_{i=1}^N α_i y_i K(x, x_i) + b ]    (5)

2.2. Maximum margin formulation of MIL

In the traditional supervised learning framework, an object is represented by one single instance, i.e. a measurement vector, and associated with a class label; the goal is to induce a classifier that labels instances. MIL, in contrast, groups instances into bags and attaches a class label to each bag. More formally, an object is represented by a bag B = {x_1, x_2, …, x_p}, which contains a set of D-dimensional instances, and each bag is associated with a label Y.

Both the mi-SVM and MI-SVM approaches [24] are modified and extended from Support Vector Machines (SVMs). The mi-SVM approach explicitly treats the instance labels as unobserved integer variables subject to constraints defined by the (positive) bag labels. A generalized soft-margin SVM is defined in primal form as

min_{{y_i}} min_{w,b,ξ} (1/2)‖w‖² + C Σ_i ξ_i
s.t. ∀i: y_i(⟨w, x_i⟩ + b) ≥ 1 − ξ_i,  ξ_i ≥ 0,  y_i ∈ {−1, 1}    (6)

where ξ_i is a non-negative slack variable, C is the hyperparameter tuning the trade-off between the margin and the degree of misclassification, and y_i is the label of instance i. The mi-SVM formulation leads to a mixed integer programming problem: a soft-margin criterion must be maximized jointly over the possible label assignments and the hyperplane. The MI-SVM approach, in contrast, extends the notion of margin to bags and maximizes the bag margin directly. The bag margin with respect to a hyperplane is defined by

γ_I ≡ Y_I max_{i∈I} (⟨w, x_i⟩ + b)    (7)

Incorporating this bag margin, an MIL version of the soft-margin classifier is defined by

min_{w,b,ξ} (1/2)‖w‖² + C Σ_I ξ_I
s.t. ∀I: Y_I max_{i∈I} (⟨w, x_i⟩ + b) ≥ 1 − ξ_I,  ξ_I ≥ 0    (8)

Unfolding the max operation, by introducing one inequality constraint per instance for the negative bags and a selector variable s(I) ∈ I denoting the positive instance selected for each positive bag, one obtains the equivalent formulation

min_s min_{w,b,ξ} (1/2)‖w‖² + C Σ_I ξ_I
s.t. ∀I: Y_I = −1 ∧ −⟨w, x_i⟩ − b ≥ 1 − ξ_I, ∀i ∈ I,
     or Y_I = 1 ∧ ⟨w, x_{s(I)}⟩ + b ≥ 1 − ξ_I,  ξ_I ≥ 0    (9)

MI-SVM can thus also be cast as a mixed-integer program, in which both the optimal selectors and the hyperplane must be found. A heuristic optimization scheme has been proposed to solve these mixed-integer programs by alternating two steps: (i) for given integer variables, solve the associated Quadratic Programming (QP) problem and find the optimal discriminant function; (ii) for a given discriminant function, update the label of each instance in mi-SVM, or the single selector variable per positive bag in MI-SVM.

2.3. Multi-class classification

The above three algorithms were originally designed for binary classification. Although binary LS-SVM can easily be extended to deal with multi-class problems, it is usually preferable to build classifiers that each distinguish only two classes rather than to consider more than two classes in a single model.
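Before turning to the multi-class setting, the LS-SVM training step of Section 2.1 can be made concrete: it reduces to solving the linear KKT system of Eq. (4). The sketch below (NumPy, linear kernel, made-up data) is an assumption-laden illustration, not the authors' Matlab implementation; the solved coefficients a_i play the role of α_i y_i, so the prediction matches Eq. (5):

```python
import numpy as np

def lssvm_train(X, y, gamma=10.0):
    """Solve Eq. (4): [[0, 1^T], [1, Omega + I/gamma]] [b; a] = [0; y]."""
    N = X.shape[0]
    Omega = X @ X.T                        # linear kernel: K(x_i, x_j) = <x_i, x_j>
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0                         # 1_N^T row
    A[1:, 0] = 1.0                         # 1_N column
    A[1:, 1:] = Omega + np.eye(N) / gamma  # Omega + I/gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[1:], sol[0]                 # a, b

def lssvm_predict(X_train, a, b, X_new):
    """Eq. (5), with a_i standing in for alpha_i * y_i."""
    return np.sign(X_new @ X_train.T @ a + b)

# Toy 1-D data: two points per class, linearly separable.
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
a, b = lssvm_train(X, y)
print(lssvm_predict(X, a, b, X))           # recovers the training labels
```

The point of Eq. (4) is visible here: training is a single call to a linear solver, with no iterative quadratic programming.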
In this study, the original multi-class problem was decomposed into easier-to-solve binary classifications using the common "one-versus-one" strategy in the framework of ECOC, a simple but powerful framework for multi-class classification based on the embedding of binary classifiers [23]. The one-versus-one scheme divides an m-class problem into M = m(m − 1)/2 binary problems. Each problem is handled by a binary LS-SVM classifier responsible for distinguishing between one pair of classes. Each classifier is trained on the subset of samples belonging to its two classes, whereas samples with other class labels are simply ignored. In the prediction phase, a sample is presented to each of the binary classifiers. The output of a classifier, r_ij ∈ {0, 1}, denotes the output of the ith sample on the jth binary classifier. These outputs are collected in a vector

r_i = [r_i1, r_i2, ⋯, r_ij, ⋯, r_iM],  j = 1, ⋯, M    (10)

The predicted class is obtained by the Hamming decoding strategy [23]:

Class = arg min_{c=1,…,m} Σ_{j=1}^M [1 − sign(r_ij · y_j^c)]/2    (11)

where y^c is the codeword corresponding to class c. The classification of individual instances was performed using this binary decomposition.

2.4. Maximum count LS-SVM

For a binary classifier, let Y ∈ {−1, 1}. It is assumed that a bag is labeled positive if at least half of the instances in the bag are drawn from the positive set. Specifically, a sum rule is used to decide the label of the bag: classify B = {x_1, x_2, …, x_p} as positive, i.e. Y_i = 1, if

Σ_{j=1}^p (y_ij + 1)/2 ≥ Σ_{j=1}^p (1 − y_ij)/2    (12)

or negative, i.e. Y_i = −1, if

Σ_{j=1}^p (y_ij + 1)/2 < Σ_{j=1}^p (1 − y_ij)/2    (13)

where i = 1, …, N indexes the bags, N is the number of bags, and p denotes the number of instances in each bag.

For multi-class cases, let Y ∈ {1, 2, 3, …, m}. Every instance of an object is presented to each of the basic ovo-LS-SVM classifiers.
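The count-based rule of Eqs. (12) and (13) is simply a majority vote over the predicted instance labels. A minimal sketch (instance labels encoded as ±1; ties go to the positive class, matching the "≥" in Eq. (12)):

```python
def maxc_bag_label(instance_labels):
    """Eqs. (12)-(13): bag is positive iff sum((y+1)/2) >= sum((1-y)/2)."""
    pos = sum((y + 1) // 2 for y in instance_labels)  # number of +1 votes
    neg = sum((1 - y) // 2 for y in instance_labels)  # number of -1 votes
    return 1 if pos >= neg else -1

print(maxc_bag_label([1, 1, -1]))   # majority positive -> 1
print(maxc_bag_label([1, -1, -1]))  # majority negative -> -1
print(maxc_bag_label([1, -1]))      # tie -> 1, by the ">=" in Eq. (12)
```

Unlike the classical any-positive rule, every instance in the bag contributes one vote, which is how maxc-LS-SVM retains the information that presence-based formulations discard.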
The classification result of one instance is represented by a vector c_j with elements c_jk taken from {0, 1}: the value 1 means the instance is assigned to class k, and 0 means it is not. These outputs are combined into a score matrix C:

C = [ c_11  c_12  ⋯  c_1m
      c_21  c_22  ⋯  c_2m
      ⋮     ⋮         ⋮
      c_p1  c_p2  ⋯  c_pm ]    (14)

From the score matrix, bags can be classified directly with the following determination function:

Y_i = arg max_{k=1,…,m} Σ_{j=1}^p c_jk    (15)

with i = 1, 2, …, N, j = 1, 2, …, p, and k = 1, 2, …, m, where m denotes the number of categories; ties are broken arbitrarily. The same equation also covers the binary case.

The main steps of maxc-LS-SVM can be summarized as follows:
Training data formalization: assign the bag label to its instances and use all instances for training.
Basic classifier training: estimate the parameters of the basic classifiers.
Individual instance classification: classify each instance using the basic classifiers.
Bag classification: classify an object (bag) B according to Eq. (15).

2.5. Performance indicator

For binary classification problems there is a large number of well-known statistical metrics, such as accuracy, precision, sensitivity, specificity, AUC, and Cohen's kappa. However, most of these performance indicators cannot be used directly in multi-class cases. In this work, only metrics whose successful application to multi-class problems has been demonstrated experimentally were adopted. Accuracy rate, also known as classification rate, is the proportion of correctly classified instances in the population. Cohen's kappa [28] is an alternative measure to accuracy rate that compensates for random hits: it evaluates the degree of agreement in classification over that which would be expected by chance.
Based on an m-class confusion matrix, Cohen's kappa can be computed as

kappa = [ N Σ_{i=1}^m h_ii − Σ_{i=1}^m T_ri T_ci ] / [ N² − Σ_{i=1}^m T_ri T_ci ]    (16)

where h_ii is the number of true positives for each class, N is the number of bags, m is the number of labels, T_ri is the total count of row i, and T_ci is the total count of column i. The theoretical range of Cohen's kappa is from −1 (total disagreement) through 0 (random classification) to 1 (perfect agreement), which makes it difficult to compare kappa directly with accuracy (0–1). In practice, most classifiers do at least as well as random guessing on most real-world datasets, so their kappa values score above zero. It is generally assumed that a kappa between 0.8 and 1 indicates that the corresponding classifier is in almost perfect agreement with the actual pattern.

The bootstrap cross-validation (BCV) method [29], a bootstrap-smoothed version of cross-validation, was used to estimate the generalization error of each classification algorithm because it has less variance and bias. Bootstrap datasets are drawn with replacement for some large B between 50 and 200. For each dataset, an internal cross-validation error estimate is obtained with a predetermined classification rule. After B repeats, the averaged error estimate is calculated over all bootstrap bag sets.

3. Experimental

3.1. Toy data

The toy data has three classes (A, B and C). For each bag of one specific class, a fraction ρ of its instances was sampled from the Gaussian located at (0, 0), (4, 1), or (1, 4), respectively; the rest were sampled from the Gaussian located at (2, 2). Every bag contains 30 instances. To be comparable with common NIR classifications, 50 bags were generated for each class. A sketch map of the two-way datasets with ρ ∈ {0.2, 0.4, 0.6, 0.8} is shown in Fig. 1. The toy data was used to simulate the heterogeneous classification problem.
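The toy-data scheme above, the bag-level decision rule of Eq. (15), and the kappa of Eq. (16) can be tied together in a short simulation. The nearest-center instance classifier below stands in for the trained ovo-LS-SVM base classifier, and all names and parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

CENTERS = np.array([[0.0, 0.0], [4.0, 1.0], [1.0, 4.0]])  # class-specific Gaussians
SHARED = np.array([2.0, 2.0])                             # Gaussian shared by all classes

def make_bag(cls, rho, size=30):
    """rho*size instances from the class Gaussian, the rest from the shared one."""
    n_spec = int(round(rho * size))
    spec = rng.normal(CENTERS[cls], 1.0, size=(n_spec, 2))
    shared = rng.normal(SHARED, 1.0, size=(size - n_spec, 2))
    return np.vstack([spec, shared])

def classify_bag(bag):
    """Eq. (15): per-instance votes (here: nearest center), then argmax of counts."""
    d2 = ((bag[:, None, :] - CENTERS[None, :, :]) ** 2).sum(axis=2)
    votes = np.bincount(d2.argmin(axis=1), minlength=len(CENTERS))
    return votes.argmax()

def cohen_kappa(cm):
    """Eq. (16): (N*sum h_ii - sum T_ri*T_ci) / (N^2 - sum T_ri*T_ci)."""
    cm = np.asarray(cm, dtype=float)
    N = cm.sum()
    chance = (cm.sum(axis=1) * cm.sum(axis=0)).sum()
    return (N * np.trace(cm) - chance) / (N * N - chance)

# 50 bags per class, as in Section 3.1.
cm = np.zeros((3, 3))
for cls in range(3):
    for _ in range(50):
        cm[cls, classify_bag(make_bag(cls, rho=0.6))] += 1
print(cohen_kappa(cm))
```

With ρ = 0.6 most bags are recovered correctly even though 40% of each bag's instances carry no class information, which mirrors the trend the paper reports for the toy data.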
The feasibility of formulating the heterogeneous classification problem as a multi-instance learning problem, and the influence of bag size on the proposed maxc-LS-SVM algorithm, were investigated on the toy data.

3.2. Herbal geographical origins data

Generally, the quality of wolfberry fruit varies with geographical origin, which makes its medical efficacy diverse. It is therefore necessary to distinguish the geographical origins of wolfberry fruit accurately to assure the effectiveness of a traditional Chinese prescription. The wolfberry fruit dataset consists of samples collected from the Inner Mongolia, Ningxia, and Qinghai regions of China. For each sample, ten spectra (i.e. instances) were measured on ten different parts of the wolfberry fruit surface; a sketch map of the ten instances is shown in Fig. 2(a). Each instance was measured using a portable near-infrared spectrophotometer (Ocean quest 256-2.5) equipped with an InGaAs detector and a diffuse-reflection optical fiber probe. Spectra covering the range 870–2530 nm at a resolution of 9.5 nm were recorded. All raw spectra were transformed into logarithm mode because they were initially recorded in reflectance mode. In total, 29, 45, and 45 bags were formed for the three geographical origins, respectively. An overlap plot of the transformed spectra is shown in Fig. 2(b). The wolfberry fruit data was used to further illustrate the feasibility of formulating the heterogeneous classification problem as a multi-instance learning one. A comparison study between the proposed maxc-LS-SVM and the mi-SVM and MI-SVM algorithms was also performed on this data.

3.3. Single instance pseudo dataset

In the traditional learning framework, each object is represented by a single labeled instance. Thus, one instance was drawn randomly from each bag to form a pseudo dataset, each instance being labeled with its bag's label.

3.4. Software and algorithms

All calculations were performed on a personal computer (i7 880 processor, 6 GB RAM) under the Windows 7 Professional operating system using Matlab 7.9 (Mathworks, Inc., Natick, MA). The maxc-LS-SVM algorithm was implemented by modifying functions in LS-SVMlab v1.8 [30]. The implementation of the MI-SVM and mi-SVM algorithms was based on a MIL toolbox publicly available at http://www.cs.cmu.edu/~juny/MILL/.

Fig. 1. The two-way toy data with ρ = 0.2, 0.4, 0.6, and 0.8, respectively.
Fig. 2. (a) A sketch map of the locations of the ten instances for each wolfberry fruit sample; (b) an overlap plot of the transformed spectra. Habt. NM means the samples were originally sampled from Inner Mongolia; similarly, Habt. NX stands for Ningxia and Habt. QH denotes Qinghai.

4. Results and discussion

A grid search guided by 10-fold cross-validation was adopted to optimize the hyperparameters of MI-SVM and mi-SVM. Each pair (C, γ) in the cross-product of C ranging from 2^−15 to 2^3 and γ ranging from 2^−5 to 2^15, at increments of 2 in the exponent, was used to train every dichotomous classifier. Meanwhile, the optimal hyperparameters of LS-SVM were determined by the coupled simulated annealing (CSA) algorithm followed by a simplex search.

4.1. The toy data

To be consistent with the traditional classification method, one single-instance pseudo dataset was drawn from every bag dataset. Cohen's kappa and accuracy were calculated by the BCV procedure on the single-instance pseudo dataset. However, the randomization in producing the toy bag sets and in drawing the single-instance pseudo datasets introduces variability into the prediction metrics. Therefore, each of the four toy datasets was regenerated 100 times to obtain a stable evaluation of the applicability of the traditional classification method to the heterogeneous classification problem.

With respect to ρ = 0.2, the Mann–Whitney U test did not reject the equivalence of the linear and RBF (radial basis function) kernels in terms of accuracy at significance level α = 0.01 (Table 1). However, the opposite result was obtained when the two kernels were compared in terms of Cohen's kappa. In addition, from the decision boundaries illustrated in Fig. 3, it was observed that the RBF-kernel boundary separated one class from another more effectively but was too complex and specific, whereas the linear-kernel boundary separated the pseudo samples well. This means that the ovo-LS-SVM model built with the RBF kernel overfits the classification problem. Thus, the linear kernel is used in the maxc-LS-SVM model to prevent possible over-fitting.

Table 1. A comparison between the learning effectiveness of linear kernel and RBF kernel LS-SVM on the toy datasets (α = 0.01).

        Accuracy                        Kappa
ρ       Linear   RBF      p-Value      Linear   RBF      p-Value
0.2     0.4355   0.4253   0.0405       0.1453   0.1813   0.0000
0.4     0.5753   0.5655   0.0439       0.3516   0.3357   0.0264
0.6     0.6703   0.6708   0.9318       0.4921   0.4906   0.7947
0.8     0.7981   0.7961   0.7851       0.6832   0.6816   0.8632

The results of applying maxc-LS-SVM to the simulated heterogeneous classification data are shown in Figs. 4 and 5. From the results on the toy data with ρ = 0.2, it was found that the prediction ability of the model built using maxc-LS-SVM improved considerably compared with the model constructed using the base ovo-LS-SVM classifier (Fig. 4(a) and (b)). As the bag size varied from 5 to 30 in increments of 5, the classification accuracy increased gradually. These results indicate that a powerful decision function can be constructed even when the data are too weak to build a usable base classifier. The results presented in Fig. 5 support this conclusion.
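The kernel comparison above rests on the Mann–Whitney U test. A minimal normal-approximation sketch of that test (no tie correction; the accuracy samples are made up for illustration and are not the paper's values):

```python
import math
import numpy as np

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation (ties ignored)."""
    nx, ny = len(x), len(y)
    combined = np.concatenate([x, y])
    ranks = np.empty(nx + ny)
    ranks[np.argsort(combined)] = np.arange(1, nx + ny + 1)  # rank of each value
    u = ranks[:nx].sum() - nx * (nx + 1) / 2.0               # U statistic for x
    mu = nx * ny / 2.0                                       # mean of U under H0
    sigma = math.sqrt(nx * ny * (nx + ny + 1) / 12.0)        # std of U under H0
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2.0))                   # two-sided p-value
    return u, p

# Illustrative bootstrap-accuracy samples for two kernels (hypothetical values).
rng = np.random.default_rng(1)
acc_linear = rng.normal(0.435, 0.02, size=100)
acc_rbf = rng.normal(0.425, 0.02, size=100)
u, p = mann_whitney_u(acc_linear, acc_rbf)
print(u, p)   # equivalence is rejected at alpha = 0.01 only if p < 0.01
```

Being rank-based, the test makes no normality assumption about the bootstrap accuracy estimates, which is why it suits the skewed metric distributions produced by BCV.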
Regarding ρ = 0.4, 0.6 and 0.8, the statistical analysis does not reject the null hypothesis that the base classifiers with linear and RBF kernels perform equally (Table 1), in terms of both kappa and accuracy. Further examination of the RBF-kernel decision boundaries reveals that each pair of classes can be separated linearly in the raw space (Figs. S1–S3 in the supplementary material). Considering the computational burden, the linear kernel is preferred and was used for further investigation.

The results presented in Fig. 4(a) clearly show that the effect of random success on the accuracy metric decreases as ρ increases. In addition, a base LS-SVM classifier performing better than random guessing can be obtained even though most of the instances in the toy data with ρ = 0.2 are shared by the three classes (Fig. 5(a)). From the results summarized in Figs. 4 and 5, it can also be observed that all the classifiers built using maxc-LS-SVM perform significantly better than their corresponding base classifiers on the toy datasets (ρ = 0.4, 0.6, 0.8, respectively). This means that maxc-LS-SVM inherits the sound characteristics of LS-SVM. With the bag size varying from 5 to 30 in increments of 5, the prediction accuracy gradually improved and even approached perfect agreement. Therefore, it can be concluded that by formulating the heterogeneous classification problem as a multiple-instance learning one, it can be solved effectively.

Fig. 3. The LS-SVM decision boundaries of linear kernel and RBF kernel on the toy data with ρ equal to 0.2.
Fig. 4. The classification accuracy results of applying the maxc-LS-SVM algorithm on the four toy datasets with the number of instances ranging from 5 to 25 in increments of 5.
Fig. 5. A summary of the Cohen's kappa results of applying the maxc-LS-SVM algorithm on the toy datasets with the number of instances ranging from 5 to 25 in increments of 5.

4.2. The wolfberry fruit data

The results of applying the ovo-LS-SVM method to the bootstrap pseudo datasets are listed in Table 2. It can be observed that the prediction accuracy of the linear-kernel ovo-LS-SVM was larger than that of the RBF kernel, meaning the linear kernel is more suitable than the RBF kernel for classifying the geographical origin of wolfberry fruit.

Table 2. A summary of the prediction performance of the base classification method LS-SVM and the three multi-instance learning methods (i.e. mi-SVM, MI-SVM, maxc-LS-SVM) on the wolfberry fruit data.

              Accuracy              Kappa
              Linear    RBF         Linear    RBF
LS-SVM        0.9276    0.7868      0.8870    0.6667
mi-SVM        0.5929    0.5371      0.4427    0.3741
MI-SVM        0.8205    0.7414      0.7106    0.5934
maxc-LS-SVM   0.9898    0.9754      0.9840    0.9616

From the results obtained on the toy data, it was concluded that the prediction ability of maxc-LS-SVM improves as more instances are included in a bag, so all the instances measured on one wolfberry fruit were used to investigate the prediction performance of the MIL algorithms. Besides, with the bag size fixed, the comparisons among MIL algorithms become more objective.

The comparison results between maxc-LS-SVM and the other two MIL algorithms are tabulated in Table 2. For the mi-SVM algorithm, neither the linear nor the RBF kernel produced a classifier comparable to the LS-SVM algorithm in terms of classification accuracy or kappa. Two factors contribute to this phenomenon: one is the MIL assumption, and the other is the relationship between a bag and its instances. Since mi-SVM is a presence-based MIL algorithm, MI-SVM was brought into the comparison next. The MI-SVM algorithm extends the notion of a margin from individual patterns to sets of patterns, and its prediction performance improved considerably compared with the mi-SVM algorithm. But the classification accuracy and kappa of MI-SVM were still worse than those of the LS-SVM algorithm.
That means that considering all the instances in a bag as a whole is beneficial for constructing a powerful classifier. However, adopting the whole-bag learning strategy alone is not enough to solve the heterogeneous classification problem thoroughly. The maxc-LS-SVM algorithm, whose MIL assumption differs from that of mi-SVM and MI-SVM, performed much better than the base classifier in terms of classification accuracy for both linear and RBF kernels. The kappa of the linear-kernel maxc-LS-SVM reached as high as 0.9840, which indicates that random success has only a limited impact on the prediction results. Compared with the results obtained using the mi-SVM and MI-SVM algorithms, it can be concluded that the maxc-LS-SVM algorithm is more powerful in classifying the geographical origin of wolfberry fruit.

All these results demonstrate that by formulating the heterogeneous classification problem as a multiple-instance learning one, it can be solved effectively. The power of mi-SVM and MI-SVM has been demonstrated experimentally in image classification, a well-studied topic in computer vision. But heterogeneous classification is a rather different application, in which the presence-based MIL assumption is no longer applicable.

Additionally, it may be argued that it is unnecessary to complicate the heterogeneous classification problem, since the accuracy of the base classifier using one instance per object already exceeds 0.90. But a more powerful decision rule is still needed: wolfberry fruit is used as a tonic Chinese herbal medicine, and distinguishing its geographical origin explicitly is necessary to preserve the effect of the prescription. Moreover, the kappa of the linear-kernel ovo-LS-SVM is only 0.8870, so a more credible classifier should be constructed.

5. Conclusions

In this study, the heterogeneous classification problem was formulated as a multi-instance learning one. Based on a variant of the count-based MIL assumption, the maxc-LS-SVM algorithm was developed to deal with the multi-instance learning problem. The proposed algorithm inherits the sound characteristics of both the base classifier and the MIL framework. By incorporating the maximum-count MIL assumption into the construction of the learning algorithm, maxc-LS-SVM makes full use of the information of each instance in the positive (or class-specific) bags. A real wolfberry fruit spectral dataset, which contains 119 objects from three geographical origins, was used in the experiments. In multi-class classification, the proposed method achieved the best classification accuracy and kappa compared with the other two MIL methods (i.e., mi-SVM and MI-SVM). More importantly, it was concluded that the maximum-count-based MIL assumption is more applicable in the heterogeneous classification domain. While promising, the maxc-LS-SVM approach needs to be further evaluated on other heterogeneous classification applications; this is currently being pursued.

Acknowledgments

The authors would like to thank the anonymous reviewers for their kind and insightful comments. Financial support from the Innovation Group Projects of Beijing University of Chinese Medicine (no. 2011-CXTD-11) and the National Natural Science Foundation of China (no. 81303218) is gratefully acknowledged. The computation was partly supported by CHEMCLOUDECOMPUTING (Beijing University of Chemical Technology, Beijing, China).

Appendix A. Supplementary materials

Supplementary data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.talanta.2014.09.007.

References

[1] S. Naik, V.V. Goud, P.K. Rout, K. Jacobson, A.K. Dalai, Renew. Energy 35 (2010) 1624–1631.
[2] T. De Beer, A. Burggraeve, M. Fonteyne, L. Saerens, J.P. Remon, C. Vervaet, Int. J. Pharm. 417 (2011) 32–47.
[3] L.J. Xie, X.Q. Ye, D.H. Liu, Y.B. Ying, Food Res. Int. 44 (2011) 2198–2204.
[4] L. Esteve Agelet, C.R. Hurburgh Jr., Talanta 121 (2014) 288–299.
[5] E. Ziémons, J. Mantanus, P. Lebrun, E. Rozet, B. Evrard, P. Hubert, J. Pharm. Biomed. Anal. 53 (2010) 510–516.
[6] O.Y. Rodionova, A.L. Pomerantsev, Trends Anal. Chem. 29 (2010) 795–803.
[7] J. Märk, M. Andre, M. Karner, C.W. Huck, Eur. J. Pharm. Biopharm. 76 (2010) 320–327.
[8] J. Mantanus, E. Ziémons, P. Lebrun, E. Rozet, R. Klinkenberg, B. Streel, B. Evrard, P. Hubert, Talanta 80 (2010) 1750–1757.
[9] B. Wang, G. Liu, Y. Dou, L. Liang, H. Zhang, Y. Ren, J. Pharm. Biomed. Anal. 50 (2009) 158–163.
[10] S. Tripathi, H.N. Mishra, Food Control 20 (2009) 840–846.
[11] J. Moros, J.J. Laserna, Anal. Chem. 83 (2011) 6275–6285.
[12] M. Blanco, A. Peguero, Trends Anal. Chem. 29 (2010) 1127–1136.
[13] C.-O. Chan, C.-C. Chu, D.K.-W. Mok, F.-T. Chau, Anal. Chim. Acta 592 (2007) 121–131.
[14] S. Wold, H. Antti, F. Lindgren, J. Öhman, Chemom. Intell. Lab. Syst. 44 (1998) 175–185.
[15] J. Janni, B.A. Weinstock, L. Hagen, S. Wright, Appl. Spectrosc. 62 (2008) 423–426.
[16] J. Hwang, S. Kang, K. Lee, H. Chung, Talanta 101 (2012) 488–494.
[17] J. Foulds, E. Frank, Knowl. Eng. Rev. 25 (2010) 1–25.
[18] Y. Li, D.M.J. Tax, R.P.W. Duin, M. Loog, Pattern Recognit. 46 (2013) 865–874.
[19] Z.-H. Zhou, J. Comput. Sci. Technol. 21 (2006) 800–809.
[20] T.G. Dietterich, R.H. Lathrop, T. Lozano-Pérez, Artif. Intell. 89 (1997) 31–71.
[21] Y. Shi, Y. Gao, R. Wang, Y. Zhang, D. Wang, Artif. Intell. 38 (2013) 16–28.
[22] Z. Liang, Z. Bo, G. Yang, Fifth International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2008, pp. 487–492.
[23] S. Escalera, O. Pujol, P. Radeva, J. Mach. Learn. Res. 11 (2010) 661–664.
[24] S. Andrews, I. Tsochantaridis, T. Hofmann, Adv. Neural Inf. Process. Syst. 15 (2002) 561–568.
[25] F. Chauchard, J. Svensson, J. Axelsson, S. Andersson-Engels, S. Roussel, Chemom. Intell. Lab. Syst. 91 (2008) 34–42.
[26] S. Ren, L. Gao, Analyst 136 (2011) 1252–1261.
[27] J.A.K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor, J. Vandewalle, Least Squares Support Vector Machines, World Scientific Publishing, Singapore, 2002.
[28] M. Galar, A. Fernández, E. Barrenechea, H. Bustince, F. Herrera, Pattern Recognit. 44 (2011) 1761–1776.
[29] W.J. Fu, R.J. Carroll, S. Wang, Bioinformatics 21 (2005) 1979–1986.
[30] http://www.esat.kuleuven.be/sista/lssvmlab/