Fruit classification based on weighted score-level feature fusion Hulin Kuang Leanne Lai Hang Chan Cairong Liu Hong Yan Journal of Electronic Imaging 25(1), 013009 (Jan∕Feb 2016) Fruit classification based on weighted score-level feature fusion Hulin Kuang,a,* Leanne Lai Hang Chan,a Cairong Liu,b and Hong Yana a City University of Hong Kong, Department of Electronic Engineering, 83 Tat Chee Avenue, Kowloon, Hong Kong Chinese University of Hong Kong, Department of Mathematics, Tai Po Road, Shatin, NT, Hong Kong b Abstract. We describe an object classification method based on weighted score-level feature fusion using learned weights. Our method is able to recognize 20 object classes in a customized fruit dataset. Although the fusion of multiple features is commonly used to distinguish variable object classes, the optimal combination of features is not well defined. Moreover, in these methods, most parameters used for feature extraction are not optimized and the contribution of each feature to an individual class is not considered when determining the weight of the feature. Our algorithm relies on optimizing a single feature during feature selection and learning the weight of each feature for an individual class from the training data using a linear support vector machine before the features are linearly combined with the weights at the score level. The optimal single feature is selected using cross-validation. The optimal combination of features is explored and tested experimentally using a customized fruit dataset with 20 object classes and a variety of complex backgrounds. The experiment results show that the proposed feature fusion method outperforms four state-of-the-art fruit classification algorithms and improves the classification accuracy when compared with some state-of-the-art feature fusion methods. © 2016 SPIE and IS&T [DOI: 10.1117/1.JEI.25.1.013009] Keywords: object classification; multiple feature extraction; optimal feature selection; weighted score-level feature fusion; fruit classification. Paper 15634 received Aug. 11, 2015; accepted for publication Dec. 11, 2015; published online Jan. 19, 2016. 1 Introduction Many engineering applications depend on object classification.1 In general, systems for object classification utilize machine-learning algorithms, such as Adaboost2 and its variants,3 support vector machine (SVM),4,5 neural networks,6 and deep learning,7,8 with feature descriptors. Most object classification methods usually utilize a single type of feature descriptor, such as the Haar feature,2,3 histogram of oriented gradients (HOG)4,5 and its variants,9 scale-invariant transform feature (SIFT)10 and extensions of SIFT,11,12 local binary pattern (LBP)13 and the variants of LBP,14 four direction features,15 Gabor features16 and Gabor-based features,17 and convolutional neural network features.18 In view of the limited performance when using a single feature descriptor in complex scenes, multifeature fusion can be used to improve the classification accuracy. Multiple feature descriptors can be combined using various methods, which can be categorized into four types: feature-level fusion,19–26 learning-level fusion,12,27–29 scorelevel fusion,30–34 and decision-level fusion.35,36 Score-level fusion is also referred to as “classifier fusion” or “classifier ensemble”30,37–41 due to the fact that it also combines classification results (scores) of classifiers trained using each feature. We focus on score-level fusion because it is less prone to the curse of dimensionality and can make full use of the complementarity of different features. This paper proposes a weighted score-level fusion method that combines the scores of all classifiers with class-specific weight vectors learned *Address all correspondence to: Hulin Kuang, E-mail: [email protected]. edu.hk Journal of Electronic Imaging from training samples using a linear SVM. The proposed weighted score-level feature fusion is more effective than several feature fusion approaches. Commonly used features for fruit classification19,20,23,24,35 are not optimized before performing feature fusion. Moreover, the optimal combination of features is not considered in existing methods. In this work, we present a set of diverse and complementary feature descriptors that can be used to represent different attributes of fruits. They are global shape features, global color histograms, and statistical color features, LBP, HOG, and LBP based on edge maps obtained using edge detection,42 also named edgeLBP. Except for the global shape features, the features are optimized by a five-fold cross-validation. The optimal combination of feature descriptors is then selected by testing each probable combination. Our contributions in this work include complementary feature extraction, optimal feature selection, optimal combination, and the effective weighted score-level feature fusion based on learned weights. The performance of the proposed weighted score-level fusion method using the optimal feature combination is analyzed experimentally on a new customized fruit dataset. The images in the dataset were collected from Google Images, which include images with more complex backgrounds, a greater variety of images, and more fruit classes than the existing fruit datasets.23,24,35 Each image contains a single fruit object, similar to the datasets for face recognition.2,6 The evaluation of the proposed method using the customized fruit dataset is twofold. First, our proposed method is 1017-9909/2016/$25.00 © 2016 SPIE and IS&T 013009-1 Jan∕Feb 2016 • Vol. 25(1) Kuang et al.: Fruit classification based on weighted score-level feature fusion compared with several state-of-the-art feature fusion approaches. Then, it is compared with four state-of-the-art fruit classification methods. The extensive experiments on this large dataset enable us to confirm the effectiveness and superiority of the weighted score-level fusion approach. The remainder of the paper is organized as follows. Related work is described in Sec. 2. The proposed fruit classification approach, multiple feature extraction, optimal feature selection, and the proposed feature fusion approach are presented in Sec. 3. In Sec. 4, experimental settings are provided. Section 5 demonstrates the experiment results, and provides the evaluations and comparisons with other methods. Finally, conclusions are discussed in Sec. 6. 2 Related Work In this section, we review four feature fusion techniques: feature level fusion, learning level fusion, score-level fusion, and decision level fusion. Feature level fusion: concatenating one feature after another to obtain a new and long feature vector is traditional and simple, yet effective. Harjoko and Abdullah19 utilized this traditional feature fusion of shape and color features for fruit classification and Arivazhagan et al.20 performed the concatenating fusion of color and texture features for fruit classification. An extension of LBP was concatenated with an extension of HOG to improve performance in pedestrian detection.21 SIFT and boundary-based shape features were combined using fusion by concatenating to improve object class recognition.22 These methods were shown to be effective. However, the curse of dimensionality may occur when the number of features increases. There are other feature-level fusion methods such as principal component analysis (PCA)-based fusion,23,24 multiple component analysis (MCA)-based fusion,25 and canonical correlation analysis (CCA)-based fusion.26 Multiple features are combined using PCA to reduce the feature dimension and to obtain a new feature vector after normalization.23,24 The new feature vector after PCA might not be optimal for classification.25 Hou et al.25 proposed a feature-level fusion method named MCA in which feature dimension reduction and feature fusion were coupled together. This method was effective for the fusion of three or more features. Two feature sets were fused by CCA,26 which was used to find the basis vectors for two sets of variables. These feature fusion methods obtained correlated information from different feature vectors. Because different features transform images into various feature spaces and have different scales and ranges, their feature fusion results might not be optimal for classification. Learning level fusion, multiple kernel learning (MKL), has been widely utilized in object classification and recognition to combine different feature sets by selecting the optimal combination of kernels based on different features or different similarity functions.27 MKL learned a kernel matrix, in which a variety of information of multiple features could be combined together. The kernel function of SVM was extended in a weighted fashion for multiple features but the weights of each feature were equal.12 MKL was utilized to integrate various types of features.28 With MKL, an SVM classifier was trained with an adaptively weighted combination of kernels, which integrated multiple features. However, MKL did not perform data dimensionality Journal of Electronic Imaging reduction.25 MKL required additional time to select the optimal combination of kernels.27–29 Moreover, an MKL-based method for selecting the optimal kernel combination that works well for all classification tasks has not yet been explored. Score-level fusion, that is, combining multiple features at score level by fusing all scores obtained by each feature to acquire the final scores for classification decision, has also been studied.30–34 Score-level fusion was referred to as classifier ensemble,30 where an ensemble of different classifiers from multiple features were performed by combining scores using operators including sum, product, median, max, and majority vote. The combination rule was also trained by a multilayer perceptron combiner using a posteriori probability obtained by each feature. The trained combination rule demonstrated improved performance compared to the fixed rules. Chen and Chiang31 proposed a multiple-feature extraction approach in which two kinds of features were combined at score-level stage by several well-known fusion techniques including the mean, max, and product rules. Before score fusion, the scores were normalized into a common scale and range. The recognition results (scores) of two features were fused by Kikutani et al.32 using a weighted combination according to the normalized output likelihood of each feature. Han et al.33 proposed a feature fusion method based on Gaussian mixture models. In this method, the probabilities of each feature were summed using the maximum likelihood probability as the weights of each feature to achieve feature fusion. Several weighting approaches to linearly combine the scores of features were evaluated by Eisenbach et al.34 They concluded that pairwise optimization of projected genuine-impostor overlap (PROPER) was very effective. Classifier fusion approaches37–41 could also be utilized for score-level feature fusion. Guermeur37 assessed multiclass SVMs (M-SVM) as an ensemble method of classifiers. The optimal class-specific weight vectors that contained weights of each classifier and class-specific bias terms were learned using a new family of M-SVMs to linearly combine the classification results from multiple discriminant models. Guermeur38 combined two M-SVMs with the linear ensemble method (LEM). In this method, the classification results were postprocessed and input into a LEM method to learn the weights of each classifier for each class to minimize a loss function. The weights were also class-specific vectors where each vector contained classifier-specific weights. The class posterior probabilities were then estimated for recognition. A genetic algorithm (GA) was used by Santos et al.39 and Chernbumroong et al.40 to search for optimal weights for each classifier to combine the scores of each classifier. Different from the abovementioned methods, Kumar and Raj41 developed instancespecific weights for score-level classifier fusion. Although these methods are effective, the weights are mostly classifier-specific; however, the differences between the classification contributions of each feature of individual classes were not considered in these methods when designing weights. Moreover, in these methods, weights should be relearned when new classifiers are added. Decision level fusion utilized operators such as and, or, max, and min36 at the decision level for integrating classification labels. Rocha et al.35 presented a unified approach that could combine many features and classifiers at decision level 013009-2 Jan∕Feb 2016 • Vol. 25(1) Kuang et al.: Fruit classification based on weighted score-level feature fusion for multiclass classification of fruits and vegetables. The multiclass classification problem was divided into a set of binary classifications. For each class, a set of binary classifiers was trained using each feature. Then, the decision results were computed by majority voting of the classified results (labels) of all sets of binary classifiers using each feature. Because hard labels such as −1 or þ1 lose more information than soft scores, the fusion of hard labels might lead to misclassifications. 3 Proposed Object Classification Based on Weighted Multiple Feature Fusion In this section, we present the proposed object classification approach, multiple feature extraction, optimal feature selection, and the proposed weighted score-level feature fusion method. 3.1 Overview of the Proposed Object Classification Approach The framework of the proposed object classification method is shown in Fig. 1. The image preprocessing step includes conversion from RGB color space to HSV color space, grayscale transformation, image segmentation (Otsu’s method43), and edge detection (Canny edge detection44). Multiple and complementary features are then extracted. We improve the recognition accuracy by performing optimal single feature selection using cross-validation to choose the optimal feature parameters. The complementarity of each feature and different classification contribution of each feature for each class is fully utilized by employing weighted scorelevel feature fusion. Each optimal feature is used to train a multiclass classifier with SVM (LibSVM45). Then, the weights of each feature (i.e., the multiclass classifier) for each class are separately learned from training samples using a linear SVM (LibLinear46) with the scores of each classifier for each class as input features. The classifier responses (scores) of each classifier are vectors containing the probability of samples being classified into each class. The optimal fusion of features is determined by testing all probable combination of features. During the testing stage, multiple features that are optimal are extracted. Subsequently, the trained multiclass classifiers are used to recognize the corresponding feature vectors to obtain the score vectors. Finally, the final score vector is computed by summing the score vectors of each multiclass classifier (i.e., a feature) with learned weights during the training stage. The classification result (label) is decided by finding the maximum score in the final score vector. 3.2 Multiple Feature Extraction In this section, details of the multiple feature extraction procedure are presented. We use complementary features that include shape, color, HOG, LBP, and edgeLBP features. 3.2.1 Global shape features In general, different fruits have different shape properties, for example, the shape of an apple is spherical, whereas the shape of a hami melon is elliptic. Therefore, the global shape features carry important attributes for fruit recognition. However, using the shape features as the only attributes of fruit is not sufficient because fruits have different colors, textures, as well as gradient features. In the field of fruit recognition, simple shape features such as area, circumference, and circularity are utilized.19 In this work, these three kinds of global shape features are all used. The shape features are computed from the fruit region based on Otsu’s image segmentation43 and Canny edge detection.44 Area (A) is the number of foreground pixels, belonging to a fruit. Circumference (C) is the number of pixels belonging to the edge of a fruit. Circularity is computed by the equation: Circularity ¼ C2 ∕4πA. The shape features represent the global shape of an object, which may lead to misclassification because of the different scales and orientations of the object. Hence, features that are able to represent the local shape of an object are indispensable. 3.2.2 Color features Statistical color features, such as the mean and standard deviation of all color values in each color channel, are commonly used in object recognition. Methods using the statistical features are simpler and faster than methods based on the color histogram. The hue is invariant to the orientation of an object with respect to illumination and the camera direction; hence, the HSV color space is often used. In addition, the HSV color space more closely resembles that of the human visual system than the RGB color space.20 In HSV, the H and S channels indicate the color, whereas the V channel indicates the brightness or luminance. Therefore, the color features are extracted on the H and S channels in the HSV color space. Four statistical color features are used in this paper. They are the mean and standard deviation values of both of the H and S channels, respectively. We also utilize a global color histogram on the HSV color space by dividing each channel into various bins to compute the histograms. The final color histogram is obtained by combining the histograms of each channel. The number of bins influences the performance of the color histogram. Fig. 1 Framework of the proposed object classification method. Journal of Electronic Imaging 013009-3 Jan∕Feb 2016 • Vol. 25(1) Kuang et al.: Fruit classification based on weighted score-level feature fusion 3.2.3 Histogram of oriented gradients The HOG descriptor adopts the statistical information of gradients to represent the local contour of an object, such as a pedestrian.4 HOG can represent the local contour of an object, which is a kind of local shape feature. Hence, HOG not only complements simple shape features but also color and texture features. In this paper, the HOG features are extracted from the grayscale images. In general, images are divided into several blocks with overlap or no overlap to capture more information. Therefore, the accuracy of HOG depends on the size of a block and the extent of overlap. Similarly, using the HOG feature alone is also insufficient because the local shape features only form a subset of all the attributes of fruit. Thus, other features, especially texture features such as LBP, should be utilized. 3.2.4 Local binary pattern LBP is a commonly used texture feature that has shown robust performance in pedestrian detection and face recognition.13 Hence, LBP is utilized in this work to improve the accuracy of fruit classification. We utilize uniform pattern LBPs that are extracted on the V channel of HSV color space. The recognition accuracy of LBP depends on radius, number of sample points, block size, and extent of overlap. LBP complements other features such as HOG. Although LBP is useful for object recognition, LBP only extracts the local texture information between the center pixel and its neighbors. Thus, global features should be used to complement LBP. 3.2.5 EdgeLBP Edge detection using structured forests is one of the state-ofthe-art edge detection methods42 that is capable of computing high-quality edge maps. We complement other features by extracting edge maps from RGB images using edge detection.42 The use of edge maps as features leads to a large number of high dimension features. We extract LBP on edge maps to obtain the local properties of edge features. Similar to LBP, the performance of edgeLBP is also influenced by the block size and degree of overlap. Edge maps are global edge features and edgeLBP captures local analyses of edge features that can complement other features. 3.4 Proposed Multifeature Fusion Approach Once the optimal features have been selected, we perform multiple feature fusion. In this section, we propose a weighted score-level feature fusion method based on learned weights. We first introduce several state-of-the-art feature-fusion baselines so that they can be better compared with our proposed method. 3.4.1 Feature-level fusion based on simple concatenation We concatenate one feature after another to obtain the final feature vector, which is used to train the final classifier (an approach named “Concatenating”), similar to the fusion described by Manshor et al.22 The fusion of multiple features is achieved as follows: 2 3 shape 6 color 7 6 7 7; F¼6 (1) LBP 6 7 4 HOG 5 edgeLBP EQ-TARGET;temp:intralink-;e001;326;579 where F is the final feature fusion vector, and the five terms in the bracket denote the five features discussed above. 3.4.2 Score-level feature fusion baselines Score-level fusion based on average classification contribution. We propose a simple and new weighted score-level fusion based on the average classification contribution (named “WSLF-ACC” in this paper), where the weights of each feature are defined as the average classification contribution computed using average accurate probability. Each feature vector is used to train a multiclass classifier using SVM on training samples. Each classifier (corresponding to each feature) can obtain the responses (i.e., scores) of each sample. We then linearly sum the scores of each classifier for each sample together with weights to obtain the final scores. The weights of each classifier are computed using the average accurate probability as wj ¼ EQ-TARGET;temp:intralink-;e002;326;305 N 1X P ; N i¼1 ij wj w j ¼ P5 EQ-TARGET;temp:intralink-;e003;326;254 j¼1 3.3 Optimal Feature Selection The above analyses indicate that performance accuracy of global color histograms, HOG, LBP, and edgeLBP depends on feature parameters. Optimal single feature selection is necessary for improving recognition accuracy and dimensionality reduction. We select optimal parameters of each feature by first applying kernel principal component analysis to reduce feature dimension and discard useless features.47 Subsequently, the average five-fold cross-validation accuracies of each set of parameters are computed. The set of parameters with the highest average cross-validation accuracy is selected as the optimal parameters of each feature. The optimal single feature is extracted using the optimal parameters, which are presented in Sec. 5.1. Journal of Electronic Imaging wj (2) ; (3) where N is the number of training samples and Pij is the accurate probability of the i’th sample to be classified correctly by the j’th classifier (feature). Pij is computed using LibSVM. During the testing stage, the outputs of the classifiers that are trained using each of the optimal features are summed with the computed weights to obtain the final score vectors. For multiclass classification, the output of the classifier is a vector containing the probability of samples belonging to each class and the class label is computed by determining the maximum score in the final score vector. Four state-of-the-art score-level fusion approaches. MSVM:37 Five classifiers are trained using each feature with 013009-4 Jan∕Feb 2016 • Vol. 25(1) Kuang et al.: Fruit classification based on weighted score-level feature fusion LibSVM and cross validation. The class-specific weight vectors and bias terms are learned using the previously proposed M-SVM37 to linearly combine the results of each sample obtained by the five classifiers as hðsk Þ ¼ wTk sk þ bk ; EQ-TARGET;temp:intralink-;e004;63;708 (4) where sk is the score vector of the k’th class containing the scores obtained by the five classifiers, wTk and bk are the class-specific weight vector and bias term, respectively, and hðsk Þ is the final score of the k’th class. “M-SVM+LEM”:38 we follow an existing concept of classifier fusion.39 This is done by training five M-SVM classifiers using the five features introduced above, after which the classification results of the five classifiers are combined by class-specific weight vectors learned by the LEM method39 as hðsk Þ ¼ wTk sk : EQ-TARGET;temp:intralink-;e005;63;567 (5) Weighted linear combination (WLC) using genetic algorithm (WLC-GA):39 After computing the scores of each classifier (i.e., feature), we search for the optimal weights of each feature by using a modified GA39 to linearly combine the scores of each classifier for predicting the class label. GA-based fusion weight selection (GAFW):40 GAFW is another GA-based feature fusion approach. The GA used to compute weights40 is different from that in WLC-GA. Similarly, we select the weights of each feature using this GA40 to fuse together the classification results of each feature for class label recognition. In WLC-GA and GAFW, the weights are classifierspecific. The final score vector is computed as Sfinal ¼ 5 X EQ-TARGET;temp:intralink-;e006;63;381 i¼1 wi Si ; (6) where Si is the score vector containing the scores of each class obtained by the i’th classifier, and wi is the learned weight of the i’th classifier. 3.4.3 Decision level fusion and learning level fusion We use the five optimal features described in Sec. 3 and apply a known feature fusion approach.35 One hundred (5 20) binary classifiers of each feature for each class are learned, and combined using the classifier fusion framework35 to obtain the final decision. This decision level is termed “DLF-BC” in this paper (decision-level fusion of binary classifiers). “SimpleMKL,”48 one of the state-of-the-art MKL methods, is selected as the baseline for learning-level feature fusion. SimpleMKL warps multiclass SVM and selects an optimal combination of kernels based on different features and is easy to use for multiclass classification problems. We utilize the five features above. Then, SimpleMKL48 is used to combine the five features by finding the optimal combination of kernels for object classification. Journal of Electronic Imaging 3.4.4 Our proposed weighted score-level multiple feature fusion based on learned weights The differences in classification strength among the features of each class should be considered. Therefore, we propose a score-level feature fusion based on learned weights of each feature for each class in this section to improve the overall classification accuracy. We named the proposed feature fusion as weighted score-level fusion with learned weights (WSLF-LW). For each class, different features (classifiers) demonstrate different classification accuracies. In addition, a feature (classifier) also demonstrates various classification accuracies. This requires us to learn the weight of each feature for an individual class accurately and reasonably. That is to say, the learned weights should be classifier-specific and class-specific. In recent years, the coefficients learned from training data are utilized in object proposals.49,50 “BING” first learned a classifier to compute the original scores, followed by a score function with two terms (weights) to obtain accurate and reasonable scores.50 In view of this, we also learn the accurate weights of each feature for each class using a linear SVM. In stage 1, we train a multiclass classifier of each feature using SVM (LibSVM) and conduct cross-validation on training samples. Each feature utilized is generated by the optimization selection framework proposed in this paper. The multiclass classifier computes the classifier responses (e.g., scores) of each sample. The scores of each classifier for each sample are represented by the following vector Sij ¼ ½si1j ; si2j ; : : : ; sikj ; : : : ; siCj ; (7) EQ-TARGET;temp:intralink-;e007;326;429 where C is the number of classes of an object, j and i denote the j’th classifier (feature) and the i’th sample, respectively, and sikj denotes the score of the i’th sample for the k’th class obtained by the j’th classifier (feature). In stage 2, the weights and bias terms of each feature for each class are learned by a linear SVM. The score vector of each feature (e.g., classifier) is calibrated using learned weights and then multiclass prediction is derived from the final score vector that is computed by summing the calibrated score vectors of each feature. We define the calibrated score vector of each feature for each class as oij ¼ wj Sij þ bj EQ-TARGET;temp:intralink-;e008;326;266 ¼ ½wj1 si1j þ bj1 ; wj2 si2j þ bj2 ; : : : ; wjk sikj þ bjk ; : : : ; wjC siCj þ bjC ; (8) where wjk is the weight of the j’th feature for the k’th class, wjk and bjk are the learned weight and bias term of the j’th feature for the k’th class for score calibration, respectively, and oij is the output score vector of the i’th sample using the j’th feature. The final score vector is computed by summing the calibrated score vectors of each feature as 013009-5 Jan∕Feb 2016 • Vol. 25(1) Kuang et al.: Fruit classification based on weighted score-level feature fusion EQ-TARGET;temp:intralink-;e009;63;752 Oi ¼ ¼ 5 X j¼1 training of the new multiclass classifiers using the added features and the learning of weight and bias terms of the feature added to each class, while keeping the existing classifier and weights unchanged. 3. We also learn the bias term of each classifier for each individual class to calibrate all weights to improve classification accuracy. However, bias terms are not utilized in these score-level fusion baselines, with the exception of M-SVM,37 where the bias terms of each class for different classifiers are the same. 4. Our approach differs from other feature-level and learning-level feature-fusion approaches in which all features are used to train a single classifier; thus, our method is less prone to the curse of dimensionality because each feature is used separately to train a classifier. 5. Our proposed approach differs from the existing decision-level feature-fusion approach,35 in which the differences of classification ability of each feature for each class are considered and 5 C binary classifiers are trained and learned to obtain a final decision by majority voting of hard labels (e.g., 1∕ − 1). In contrast, our proposed method only needs to train five multiclass classifiers and linearly sum the soft scores with weights. The disadvantage of using hard labels and majority voting is that some useful information might be lost for classification. Our proposed method has the potential to be more effective and to reduce training and testing time. oij X 5 j¼1 wj1 si1j þ bj1 ; þ bjk ; : : : ; 5 X j¼1 5 X j¼1 wj2 si2j þ bj2 ; : : : ; 5 X j¼1 wjk sikj wjC siCj þ bjC ¼ ½Oi1 ; Oi2 ; : : : ; Oik ; : : : ; OiC ; (9) where oij is obtained using Eq. (7), Oi is the final score vector of the i’th sample after weighted feature fusion, and Oik is the score for the k’th class. Note that the decision function of the multiclass classification problem is defined as follows li ¼ arg max Oik ; EQ-TARGET;temp:intralink-;e010;63;567 k¼1;: : : ;C (10) where li is the classification result (class label) of the i’th sample, Oik is the k’th value of the final score vector of the i’th sample. This decision function signifies that the class label of a sample is the index of the maximum score. The terms wjk and bjk in Eq. (8) are learned by using a linear SVM (LibLinear), which is done by using training samples of the k’th class as positive samples and training samples of all other classes as negative samples. We use score values of the j’th feature for the k’th class as features (one-dimensional feature) which are input to a linear SVM. Details of the procedure used to learn the weights can refer to the “BING” code that has been published.50 The learning procedure is run 5 C times to obtain the weight and bias term of each feature for each class. We first evaluate each multiclass classifier for the corresponding features of each testing sample, after which we compute the final score vector by inputting the classifier responses and learned weights together with the bias terms into Eq. (9). Finally, the classification results are obtained using Eq. (10). The highlights of our proposed weighted score-level feature fusion based on learned weights are as follows: 1. Our proposed WSLF-WL approach considers the strength of complementarity of each feature for recognizing the individual class when designing weights, which is ignored by most feature fusion approaches except M-SVM37 and M-SVM+LEM.38 The learned weights are not only class-specific (for the same feature, the weights of each class are different and learned separately) but also classifier-specific (for the same class, the weights of each feature are also different). 2. Our proposed method learns each weight of each feature for each class separately in a learning step (independent), whereas other score-level fusion approaches such as M-SVM,37 M-SVM+LEM,38 WLC-GA,39 and GAFW40 only use one learning step to learn all the weights. Thus, those approaches require all weights to be relearned when additional features are added. In contrast, our proposed method is easy to extend when additional features are added and only necessitates the Journal of Electronic Imaging 4 Experimental Settings 4.1 Experimental Conditions All experiments were performed using an Intel Core i7-4770 processor with 8 GB installed memory (RAM) running Windows 7 Enterprise 64-bit. The computing speed of the CPU was 3.40 GHz. In addition, the evaluation measure used in this work is recognition accuracy, which is the ratio of the number of samples that are recognized correctly to the number of samples tested. 4.2 Dataset In the work described in this paper, all the experiments are performed on the customized fruit dataset. Currently, there is no universal fruit dataset for fruit recognition. We developed an object classification method as the basis of object detection. Therefore, the samples should consist of images containing a single fruit. As a result, existing fruit datasets are not suitable for our work. Therefore, a new fruit dataset with images containing only a single fruit in the foreground is built. The dataset comprises 20 fruit classes, including the background, red apple, orange, pear, tomato, strawberry, banana, watermelon, kiwi fruit, peach, pomegranate, pineapple, starfruit, red grape, lemon, mango, pomelo, durian, hami melon, and papaya. The size of each sample is 100 100 pixels. Fifty samples of each class are randomly selected for testing and the other samples are used for training. All the samples are RGB images. Details of the dataset are given in Table 1 and the dataset is available online.51 013009-6 Jan∕Feb 2016 • Vol. 25(1) Kuang et al.: Fruit classification based on weighted score-level feature fusion Table 1 Fruit dataset developed in this paper. Class of fruit Number of samples for training Number of samples for testing Class of fruit Number of samples for training Number of samples for testing Background 740 50 Pomegranate 266 50 Red apple 418 50 Pineapple 225 50 Orange 432 50 Starfruit 161 50 Pear 268 50 Red grape 164 50 Tomato 411 50 Lemon 149 50 Strawberry 450 50 Mango 149 50 Banana 394 50 Pomelo 118 50 Watermelon 239 50 Durian 120 50 Kiwi fruit 311 50 Hami melon 102 50 Peach 434 50 Papaya 59 50 Total 5610 1000 Figure 2 shows representative images of each class in our fruit dataset and two other existing fruit datasets. The first two images in Fig. 2(a) are background images, that is, images in which fruits do not appear. The backgrounds, sizes, positions, and illuminations of the sample images are different. Our dataset has a large diversity and is more complex than the existing datasets in which most of the images have a simple and monotonic background. For instance, the background of images in an existing dataset23 is white in color [see Fig. 2(b)] and in Fig. 2(c) the background of each image of another existing dataset35 is very simple and monotonic. All experiments shown in Sec. 5 are conducted on this fruit dataset. 5 Experimental Results This Section presents the experimental results, including optimal feature selection, the difference in the classification strength of each feature for each class, a comparison between our proposed method and some baseline methods, a validation of feature complementarity, a comparison with other fruit classification methods, the accuracy of different training samples, and the classification speed. 5.1 Optimal Feature Selection Results Optimal feature selection is performed to select the optimal set of feature extraction parameters. The selection standard is the average five-fold cross-validation accuracy. For a global color histogram, the optimal number of bins is 32. The optimal HOG is extracted by dividing images into 20 20 blocks without overlap. The optimal LBP is a uniform pattern with a radius of 1, and the number of sample points is 8. Besides, the block has a size of 25 25 with no overlap among blocks. For edgeLBP, the parameters of LBP extraction in edgeLBP are similar to those of LBP. Journal of Electronic Imaging 5.2 Classification Strength Differences of Each Feature for Each Class The classification strength of each feature for each class is investigated by classifying samples of each fruit class using the five respective feature descriptors. The classification accuracies of each feature for each class are shown in Fig. 3. “Color” and “shape” denote global shape features and color features, respectively. The results in Fig. 3 show that the same feature demonstrates various classification accuracies for the 20 classes of fruit. For instance, LBP demonstrates high accuracy (98%) for durian and low accuracy (50%) for papaya. Different features also vary in the classification accuracy for each class. For instance, HOG demonstrates higher accuracy (60%) for papaya than LBP. In terms of the average accuracy for all 20 classes, LBP demonstrates the highest accuracy (83.7%) and shape shows the lowest accuracy (25.7%). These results indicate that the various contributions of each feature to an individual class require the weights of each feature for each class to be considered when performing feature fusion to improve the overall classification accuracy. 5.3 Validation of Feature Complementarity This section describes the results of experiments that are performed to validate feature complementarity. Table 2 summarizes the performance (accuracy for 20 classes) of single features and several multiple-feature fusions based on learned weights. The accuracies of shape and color features are low. However, it does not mean that shape and color features are not useful. The analysis above shows that the shape features are global shapes which are complementary with HOG and the color features are complementary with other features. The accuracy of the fusion shape and HOG feature is 82.2%, which is higher than that of the HOG (78.6%) and shape features (25.7%). This validates the complement of 013009-7 Jan∕Feb 2016 • Vol. 25(1) Kuang et al.: Fruit classification based on weighted score-level feature fusion Fig. 2 Examples of each class in our dataset and examples of other datasets. (a) Examples of each class of our fruit dataset; (b) and (c) examples of other fruit datasets.23,35 Fig. 3 Classification accuracy comparison of each feature for each class. shape features with HOG. In order to validate the complement of edgeLBP with LBP, the fusion of edgeLBP and LBP is tested. The accuracy of this fusion is 86.8%, which is higher than that of LBP (83.7%) and edgeLBP (77.8%). The accuracy of a fusion of color, shape, and HOG features is 84.4% which is higher than the accuracy of using a single feature, which validates the complement Journal of Electronic Imaging of color features with other features. The fusion of five features demonstrates the highest recognition accuracy (90.7%), which also validates that the five features are complementary with each other. These results also demonstrate that our proposed feature-fusion approach is effective. Note that scorelevel multiple-feature fusion based on learned weights is performed when two or more features are used. Moreover, 013009-8 Jan∕Feb 2016 • Vol. 25(1) Kuang et al.: Fruit classification based on weighted score-level feature fusion Table 2 Experimental results for validating the complementarity of features. C and S denote color and shape features, respectively. Features Accuracy (%) Features Accuracy (%) C 55.6 C þ S þ LBP þ HOG þ edgeLBP 90.7 S 25.7 S þ HOG 82.2 LBP 83.7 LBP þ edgeLBP 86.8 HOG 78.6 C þ S þ HOG 84.4 EdgeLBP 77.8 C þ S þ LBP þ HOG 88.7 we consider the five feature vectors extracted from red apples as random variables and conduct T-test and F-test on each of the two different features to verify the statistic independence of the feature descriptors. The average P values over all samples using T-test and F-test are both less than 0.05. For instance, the P values of T-test and F-test on LBP and HOG are 0.0324 and 0.0059, respectively. These results indicate that the five features are statistically independent with each other to a large extent. We also compare the classification speed (i.e., computation time) when using each feature. For instance, the classification speed when using “LBP,” “S þ HOG,” and “C þ S þ LBP þ HOG þ edgeLBP” is about 0.032 s per image, 0.044 s per image, and 0.092 s per image, respectively. Because the weights have been learned offline, the classification speed of our proposed feature fusion is only slightly slower than that of a single feature (e.g., LBP); however, our proposed method improves the accuracy by 7%. This result demonstrates the effectiveness of our proposed method. Due to the fact that we extract five complementary features, we wonder which combination of features is optimal. Therefore, we also test all probable fusions of features to select the optimal combination of features. There are 26 probable combinations of features (C25 þ C35 þ C45 þ C45 ¼ 26), but we only partially show the results in Table 2. The results indicate that the combination of all five features is optimal. Hereafter, the performance of the proposed method is obtained using the optimal fusion of features. of fruit, our proposed weighted score-level fusion based on learned weights demonstrates higher classification accuracy (90.7%) than M-SVM+LEM (89.3%), GAFW (89.1%), WLC-GA (88.8%), SimpleMKL (88.5%), M-SVM (88.3%), DLF-BC (88.1%), WSLF-ACC (87.4%), and “concatenating” (85.6%). These results confirm our proposed feature fusion to be superior to other multiple-feature fusion methods in terms of classification accuracy. 5.5 Comparison with State-of-the-Art Fruit Classification Methods Our proposed object classification method is validated on a fruit classification task using a customized fruit dataset. Four state-of-the-art fruit classification methods and our proposed object classification method are compared on the same dataset. The first is the method proposed by Harjoko and Abdullah19. This method uses shape and color features. Another is the method developed by Arivazhagan et al.20 This method utilizes a fusion of color and texture features. The first two methods both utilize feature fusion by simple concatenating one feature after another. The third method uses a fusion of shape, color, and texture feature by PCA proposed by Zhang and Wu.23 The last method utilizes three features, which combine binary classifiers of each feature using the majority voting rule.34 Figure 5 shows the comparison between the four fruit classification methods and our proposed object classification method. Our proposed 5.4 Comparison with Multiple-Feature Fusion Baselines The effectiveness of our proposed score-level feature fusion based on learned weights is demonstrated by comparing our results with the eight other multiple feature fusion methods described in Sec. 3. The eight baseline methods introduced in Sec. 3 are evaluated on the same fruit dataset using the five features. The comparison of recognition accuracies of multiple feature fusion approaches is shown in Fig. 4. The trend of recognition with an increasing number of classes of fruit is shown by randomly selecting 2 to 20 classes from the fruit dataset to be recognized. For example, 4 in Fig. 4 denotes that we randomly select four classes (e.g., background, pear, red apple, and orange) that are evaluated by different fusion baseline methods. Recognition accuracy decreases when the number of classes of fruit increases. The weighted scorelevel feature fusion based on learned weights (WSL-LW) demonstrates higher accuracy than other fusion methods with the same number of classes. For instance, for 20 classes Journal of Electronic Imaging 013009-9 Fig. 4 Comparison of different feature fusion methods. Jan∕Feb 2016 • Vol. 25(1) Kuang et al.: Fruit classification based on weighted score-level feature fusion the accuracy (for 20 classes) comparison of a different number of training samples per class. The results in Fig. 6 indicate that the higher the number of training samples per class, the higher the accuracy, which signifies the probability of improving the accuracy by increasing the number of training samples. Fig. 5 Comparison of several fruit classification methods with our developed fruit dataset. fruit classification method demonstrates higher accuracy (90.7% for 20 classes) than the four state-of-the-art fruit classification methods under the same conditions. For instance, our proposed method improves accuracy for 20 classes by 4%, 9.4%, 30.4%, and 33.2% compared to the four existing methods,19,20,23,35 respectively. Therefore, our proposed method is more effective than other fruit classification methods in terms of classification accuracy of fruit. 5.6 Results Using Different Numbers of Training Sample In the new fruit dataset, there are 5610 training images. We are interested in the impact of changing the number of training samples on accuracy. The higher the number of training samples included, the longer the training time. We investigate whether we could obtain comparable accuracy with fewer training samples. Experiments are performed by changing the percentage of training samples per class and using weighted feature fusion based on learned weights. This is done by testing 8%, 16%, 32%, 64%, 80%, 90%, and 100% of training samples per class. Figure 6 shows Fig. 6 Comparison of several training samples per class using weighted multiple-feature fusion based on learned weights. X -axis shows the percentage of training samples per class. Y -axis denotes the recognition accuracy under different sample sizes. The curve in this figure shows that the accuracy increases with the increase in the number of training samples used. Journal of Electronic Imaging 5.7 Classification Speed The classification process includes feature extraction, classification by each trained classifier and weighted summation of scores. The classification time of weighted multiplefeature fusion based on learned weights is about 0.092 s per image (for a 100 100 RGB image), which would meet the criterion for real-time recognition. We can conclude that weighted score-level multiple-feature fusion based on learned weights is effective and efficient in terms of recognition accuracy and classification speed. 6 Conclusions This paper proposes a object classification method using weighted score-level multiple-feature fusion based on learned weights and an optimal feature selection framework. The proposed method demonstrates effective and robust performance for 20 classes of fruit and can recognize fruit against complex backgrounds using a customized fruit dataset. Color features, shape features, and more efficient and robust features including LBP, HOG, and edgeLBP, are utilized. Optimal feature parameter selection is performed. The complementarity of the five features is also analyzed and validated. Multiple features are combined in score level by summing scores with learned weights of each feature for each class. The weights are separately learned from training data using a linear SVM. The experiment results demonstrate that the proposed score-level multiple feature fusion based on learned weights is more effective than several state-of-the-art multiple feature fusion methods. As a consequence of the complementarity of the five features and the effectiveness of the proposed feature fusion approach, the proposed object classification method outperforms other state-of-the-art fruit classification methods when validated on the same dataset. The recognition speed can meet the demand of real-time applications. Each image in the dataset contains a single fruit object. Therefore, the proposed object classification could be used as the basis for object detection systems. The proposed method is effective and efficient for object classification. Yet, there are still some scopes for improvement. In this paper, LBP, HOG, and edgeLBP are extracted from regions of an image. However, there are several regions which contain little information that can be used for recognition purposes. For example, regions in the corners of an image may only contain background. In future, a region selection framework will be added to obtain meaningful regions and improve accuracy and testing speed. In real-world conditions, the image conditions vary; however, only gray images and HSV color space are considered in this paper. An improvement in the robustness of image conditions would require the utilization of multiple color spaces for feature extraction. 013009-10 Jan∕Feb 2016 • Vol. 25(1) Kuang et al.: Fruit classification based on weighted score-level feature fusion Acknowledgments The work described in this paper was fully supported by a grant from City University of Hong Kong (Project No. 9610326). References 1. D. S. Prabha and J. S. Kumar, “Three dimensional object detection and classification methods: a study,” Int. J. Eng. Res. Sci. Tech. 2(2), 33–42 (2013). 2. P. Viola and M. Jones, “Robust real-time face detection,” Int. J. Comput. Vision 57 (2), 137–154 (2004). 3. J. Geng and Z. Miao, “Domain adaptive boosting method and its applications,” J. Electron. Imaging 24(2), 023038 (2015). 4. N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in IEEE Conf. on Computer Vision and Pattern Recognition, pp. 886–893 (2005). 5. I. Charfi et al., “Optimized spatio-temporal descriptors for real-time fall detection: comparison of SVM and Adaboost based classification,” J. Electron. Imaging 22(14), 041106 (2013). 6. Z.-Q. Zhao, D. S. Huang, and B.-Y. Sun, “Human face recognition based on multiple features using neural networks committee,” Pattern Recognit. Lett. 25 (12), 1351–1358 (2004). 7. D. Zang et al., “Vehicle license plate recognition using visual attention model and deep learning,” J. Electron. Imaging 24(3), 033001 (2015). 8. T. N. Sainath et al., “Deep convolutional neural networks for large-scale speech tasks,” Neural Networks 64, 39–48 (2015). 9. P. F. Felzenszwalb et al., “Object detection with discriminatively trained part based models,” IEEE Trans. Pattern Anal. Mach. Intell. 32 (9), 1627–1645 (2010). 10. D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” Int. J. Comput. Vision 60(2), 91–110 (2004). 11. R. Wang, Z. Zhu, and L. Zhang, “Improving scale invariant feature transform-based descriptors with shape-color alliance robust feature,” J. Electron. Imaging 24(3), 033002 (2015). 12. K. E. Van De Sande, T. Gevers, and C. G. Snoek, “Evaluating color descriptors for object and scene recognition,” IEEE Trans. Pattern Anal. Mach. Intell. 32(9), 1582–1596 (2010). 13. G. Zhao et al., “Rotation-invariant image and video description with local binary pattern features,” IEEE Trans. Image Process. 21(4), 1465–1477 (2011). 14. A. Satpathy, X. Jiang, and H. L. Eng, “LBP-based edge-texture features for object recognition,” IEEE Trans. Image Process. 23(5), 1953–1964 (2014). 15. H. Kuang et al., “Mutual cascade method for pedestrian detection,” Neurocomputing 137, 127–135 (2014). 16. M. A. Amin and H. Yan, “An empirical study on the characteristics of gabor representations for face recognition,” IEEE Trans. Pattern Anal. Mach. Intell. 23(3), 401–431 (2009). 17. H. Han et al., “Discriminant analysis with Gabor phase feature for robust face recognition,” J. Electron. Imaging 22(4), 043035 (2013). 18. X.-X. Niu and Y. S. Ching, “A novel hybrid CNN-SVM classifier for recognizing handwritten digits,” Pattern Recognit. 45(4), 1318–1325 (2012). 19. A. Harjoko and A. Abdullah, “A fruit classification method based on shapes and color features,” in 3rd Asian Physics Symp., pp. 445–448 (2009). 20. S. Arivazhagan et al., “Fruit recognition using color and texture features,” J. Emerg. Trends Comput. Inf. Sci. 1(2), 90–94 (2010). 21. X. Wang, T. X. Han, and S. Yan, “An HOG-LBP human detector with partial occlusion handling,” in IEEE Int. Conf. on Computer Vision, pp. 32–39 (2009). 22. N. Manshor et al., “Feature fusion in improving object class recognition,” J. Comput. Sci. 8(8), 1321–1328 (2012). 23. Y. Zhang and L. Wu, “Classification of fruits using computer vision and a multiclass support vector machine,” Sensors 12, 12489–12505 (2012). 24. Y. Zhang et al., “Fruit classification using computer vision and feedforward neural network,” J. Food Eng. 143, 167–177 (2014). 25. S. Hou, Q. Sun, and D. Xia, “Feature fusion using multiple component analysis,” Neural Process Lett. 34(3), 259–275 (2011). 26. F. Ou et al., “Face verification with feature fusion of Gabor based and curvelet based representations,” Multimedia Tools Appl. 57(3), 549–563 (2012). 27. S. S. Bucak, R. Jin, and A. K. Jain, “Multiple kernel learning for visual object recognition: a review,” IEEE Trans. Pattern Anal. Mach. Intell. 36(7), 1354–1369 (2014). 28. H. Hoashi, T. Joutou, and K. Yanai, “Image recognition of 85 food categories by feature fusion,” in IEEE Int. Symp. on Multimedia, pp. 296–301 (2010). 29. P. Gehler and S. Nowozin, “On feature combination for multiclass object classification,” in IEEE 12th Int. Conf. on Computer Vision, pp. 221–228 (2009). Journal of Electronic Imaging 30. R. M. Cruz, G. D. Cavalcanti, and T. I. Ren, “Handwritten digit recognition using multiple feature extraction techniques and classifier ensemble,” in 17th Int. Conf. on Systems, Signals and Image Processing, pp. 215–218 (2010). 31. Y. M. Chen and J. H. Chiang, “Face recognition using combined multiple feature extraction based on Fourier-Mellin approach for single example image per person,” Pattern Recognit. Lett. 31(13), 1833–1841 (2010). 32. Y. Kikutani et al., “Hierarchical classifier with multiple feature weighted fusion for scene recognition,” in Int. Conf. on Software Engineering and Data Mining, pp. 648–651 (2012). 33. G. Han et al., “A new feature fusion method at decision level and its application,” Optoelectron. Lett. 6, 129–132 (2010). 34. M. Eisenbach et al., “Evaluation of multi feature fusion at score-level for appearance-based person re-identification,” in Int. Joint Conf. on Neural Networks, pp. 1–9 (2015). 35. A. Rocha et al., “Automatic fruit and vegetable classification from images,” Comput. Electron. Agric. 70(1), 96–104 (2010). 36. J. Kittler et al., “On combining classifiers,” IEEE Trans. Pattern Anal. Mach. Intell. 20, 226–239 (1998). 37. Y. Guermeur, “Combining discriminant models with new multi-class SVMs,” Pattern Anal. Appl. 5(2), 168–179 (2002). 38. Y. Guermeur, “Combining multi-class SVMs with linear ensemble methods that estimate the class posterior probabilities,” Commun. Stat. Theory Methods 42(16), 3011–3030 (2013). 39. A. B. Santos, A. de Albuquerque Araujo, and D. Menotti, “Combining multiple classification methods for hyperspectral data interpretation,” IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 6(3), 1450–1459 (2013). 40. S. Chernbumroong, S. Cang, and H. Yu, “Genetic algorithm-based classifiers fusion for multisensor activity recognition of elderly people,” IEEE J. Biomed. Health Inf. 19(1), 282–289 (2014). 41. A. Kumar and B. Raj, “Unsupervised fusion weight learning in multiple classifier systems,” CoRR, abs/1502.01823, http://arxiv.org/abs/1502 .01823 (2015). 42. P. Dollár and C. L. Zitnick, “Fast edge detection using structured forests,” IEEE Trans. Pattern Anal. Mach. Intell. 37(8), 1558–1570 (2014). 43. N. Otsu, “A threshold selection method from gray-level histograms,” IEEE Trans. Syst. Man Cybern. 9(1), 62–66 (1979). 44. J. Canny, “A computational approach to edge detection,” IEEE Trans. Pattern Anal. Mach. Intell. 8(6), 679–698 (1986). 45. C.-C. Chang and C.-J. Lin, “LIBSVM: a library for support vector machines,” ACM Trans. Intell. Syst. Technol. 2(3), 27 (2011). 46. R.-E. Fan et al., “Liblinear: a library for large linear classification,” J. Mach. Learn. Res. 9, 1871–1874 (2008). 47. L. J. Cao et al., “A comparison of PCA, KPCA and ICA for dimensionality reduction in support vector machine,” Neurocomputing 55(1–2), 321–336 (2003). 48. A. Rakotomamonjy et al., “SimpleMKL,” J. Mach. Learn. Res. 9, 2491–2521 (2008). 49. Z. Zhang and P. Torr, “Object proposal generation using two-stage cascade SVMs,” IEEE Trans. Pattern Anal. Mach. Intell. 38(1), 102–115 (2015). 50. M. M. Cheng et al., “BING: binarized normed gradients for objectness estimation at 300 fps,” in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 3286–3293 (2014). 51. Hulin Kuang, “Fruitdataset,” https://www.researchgate.net/publication/ 283087342_Fruitdataset (October 2015). Hulin Kuang received his MEng and BEng degrees from Wuhan University, China, in 2013 and 2011, respectively. Currently, he is a PhD student at the Department of Electronic Engineering at City University of Hong Kong. His current research interests include computer vision and pattern recognition, especially object recognition. Leanne Lai Hang Chan received her BEng degree in electrical and electronic engineering from University of Hong Kong, MS degree in electronic engineering, and her PhD in biomedical engineering from University of Southern California. Currently, she is an assistant professor in electronic engineering at City University of Hong Kong. She is a member of IEEE. Her research interests include artificial vision, retinal prosthesis, and neural recording. Cairong Liu received her BSci degree from Wuhan University, China in 2014. Currently, she is an Mphil student at the Department of Mathematics at the Chinese University of Hong Kong. Her current research interests include real analysis and machine learning in computer vision. Hong Yan received his PhD from Yale University. He was a professor of imaging science at the University of Sydney and currently is a professor of computer engineering at City University of Hong Kong. He is a fellow of IEEE and IAPR. His research interests include image processing, pattern recognition, and bioinformatics. 013009-11 Jan∕Feb 2016 • Vol. 25(1)
© Copyright 2026 Paperzz