Mobile Phone Spam Image Detection based on Graph Partitioning with Pyramid Histogram of Visual Words Image Descriptor

So Yeon Kim
Department of Information and Computer Engineering, Ajou University, Suwon, S. Korea
[email protected]

Abstract
Image spam has been annoying users everywhere and is increasingly appearing on mobile phones. As spam filtering systems become more sophisticated, spam also becomes more intelligent, and it has caused severe social problems. However, there is not yet an effective solution for detecting spam images on mobile phones. Because spam image data from mobile phones is scarce, training a predictive model is hard. To resolve this issue, we recently proposed a phone spam image filtering system that uses e-mail spam images, and showed that e-mail spam data is meaningful in improving the performance of phone spam image detection. In this paper, we further investigate the effectiveness of utilizing the graph structure in e-mail spam data. Furthermore, we extensively explore how classification performance depends on the image descriptor, comparing the Pyramid Histogram of Visual Words (PHOW) and the RGB histogram.

Keywords
graph partitioning; spectral clustering; PHOW; image spam; spam detection; image classification; color SIFT

I. INTRODUCTION

Image spam is widely spread across all kinds of media. Although there have been many studies on detecting spam images in e-mails or web pages, studies on spam images in mobile phones are far fewer. Following a series of personal information leaks, spam messages are increasingly appearing in personal spaces such as smartphones. Spam text messages have been irritating users for years, and several approaches have been proposed for detecting them effectively. Recently, unsolicited spam messages have caused severe social problems in that they are used for bank fraud and financial crimes.
In order to evade conventional text-based spam filtering systems, spam messages have evolved. They include unnecessary special characters or white spaces between words to prevent spam filters from detecting spam keywords. Usually, spam messages can be detected with a user-supplied database of spam numbers. Nevertheless, such a database can be deceived by changing the sending number or by using an actual s number so as to escape the database. Furthermore, image spam without any text is rapidly increasing on mobile phones, which makes spam detection even harder. Due to the high cost of image processing on a mobile phone as well as insufficient phone spam image data, detecting spam images on a mobile phone is a difficult problem. Accordingly, research on phone spam images is necessary. However, the amount of phone spam image data is still too small to train a predictive model with sufficient accuracy.

Kyung-Ah Sohn*
Department of Information and Computer Engineering, Ajou University, Suwon, S. Korea
[email protected]
* Corresponding author
978-1-4799-8679-8/15/$31.00 copyright 2015 IEEE. ICIS 2015, June 28-July 1, 2015, Las Vegas, USA

In this respect, we recently proposed a phone spam image filtering system that takes advantage of widely available e-mail spam image data [1]. We showed that using a visually similar sub-group of e-mail spam images in addition to phone spam images is effective in phone spam image detection. In this paper, we further investigate the effectiveness of using the graph structure in e-mail spam data. To obtain a similar sub-group of e-mail spam images, the graph partitioning algorithm of spectral clustering is used as well as k-means clustering. In addition, spam image classification performance is compared across multiple image descriptors: the RGB histogram feature and the Pyramid Histogram of Visual Words (PHOW) descriptor in gray, RGB, and opponent color modes.

II. METHODOLOGY

A.
Image Descriptors

To obtain image features, we use existing image descriptors. Each spam or non-spam image is represented by either an RGB histogram or a Pyramid Histogram of Visual Words (PHOW) descriptor [2] in gray, RGB, or opponent color mode. The PHOW descriptor is computed with the VLFeat open source visual computing library [3].

1) RGB histogram: For each image, a color histogram is computed with 4 bins per red, green, and blue channel, for 64 bins in total. It describes an image by its RGB color distribution.

2) PHOW (gray, RGB, opponent): The image is represented by the PHOW descriptor [2], which is based on the spatial pyramid matching scheme [4].

a) Feature extraction: For each input image, multiple dense SIFT descriptors in gray, RGB, and opponent color modes are obtained [2]. The SIFT descriptor in opponent color space has been shown to perform better than other color SIFT descriptors on many categories of image datasets [5].

b) Bag of visual words: The extracted visual features of the images are partitioned into 500 visual words by k-means clustering [6], and a visual word dictionary is constructed. Each input image is then vector-quantized into visual words using a kd-tree built from the visual word dictionary [7].

c) Spatial histogram: Each image is divided into 2 x 4 sub-regions to consider the spatial co-occurrences of histograms in every sub-region of the image [4]. In each sub-region, a histogram over the bag of visual words is obtained, that is, a distribution over the 500 visual words. Finally, the concatenation of the 8 spatial histograms is the descriptor of the image.

The image descriptor extraction process is summarized in Fig. 1.

Note that, in the affinity matrix W used for spectral clustering (Section II-B), D is the similarity matrix between e-mail spam images, which is computed with each image feature (RGB histogram, PHOW-gray, PHOW-RGB, and PHOW-opponent). The scaling parameter $\sigma$ controls how rapidly the affinity W falls off with the distances in D. The performance of phone spam image classification is then compared across the resulting sub-graphs.
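The role of the scaling parameter can be illustrated with a short sketch. It assumes the Gaussian affinity of [9], W_ij = exp(-D_ij^2 / (2 sigma^2)), and takes D to be the pairwise Euclidean distance between the per-image distance vectors. The function names and the toy vectors are hypothetical, and this is not the code of the Spectral Clustering Toolbox [11].

```python
import math


def pairwise_euclidean(vectors):
    """Return the square matrix of Euclidean distances between all vectors."""
    n = len(vectors)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(vectors[i], vectors[j])))
            D[i][j] = D[j][i] = d
    return D


def gaussian_affinity(D, sigma):
    """Map distances to affinities, assuming W_ij = exp(-D_ij^2 / (2 sigma^2))."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return [[math.exp(-(d * d) / (2.0 * sigma * sigma)) for d in row] for row in D]


# Toy data (hypothetical): each row stands for one e-mail spam image and holds
# its distances to three phone spam images.
email_vectors = [[0.2, 0.3, 0.1], [0.25, 0.3, 0.15], [0.9, 0.8, 0.7]]
D = pairwise_euclidean(email_vectors)
for sigma in (0.3, 0.5, 0.7):
    W = gaussian_affinity(D, sigma)
    print(sigma, [round(w, 3) for w in W[0]])
```

A small sigma makes the affinity drop quickly, so only near-identical images stay connected; a larger sigma keeps more distant images connected. A partitioning routine would then take W as its input.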
The best-performing sub-graph is used for spam image classification.

Fig. 1. Overview of image descriptor extraction

B. Database Construction for Training

To obtain a sub-group of e-mail spam images that is similar to the phone spam images, k-means clustering and spectral clustering are used. With each clustering method, the selected sub-group of e-mail spam images is added to the phone images. The phone images and the similar sub-group of e-mail spam images together form the training data for our model. Additionally, a randomly selected sub-group of e-mail spam images is used as a baseline. The overall process is illustrated in Fig. 2.

1) K-means clustering: In k-means clustering, a distance matrix between e-mail and phone spam images is obtained, using the standard Euclidean distance between two images. K-means clustering partitions the distance matrix into k mutually exclusive clusters [6]. Note that the distance values are used as features here, so the centroid of each cluster is the mean of the Euclidean distances of the images in the cluster. The most visually similar sub-group is the cluster with the smallest centroid. K-means runs iteratively to find the optimal cluster centroids. Although 100 iterations are run, convergence to the optimal solution is not guaranteed. Thus, we use the k-means++ algorithm, which seeds the initial centers so that they are spread far apart rather than chosen uniformly at random [8]. It was shown in [8] that k-means++ improves both the running time and the quality of the clustering result.

2) Spectral clustering: Spectral clustering is a standard graph cut approach to graph clustering [9]. To partition the e-mail spam image graph G, the normalized cut algorithm is used, which considers both inter-cluster and intra-cluster similarity [10]. We use the implementation of the normalized cut algorithm in the publicly available Spectral Clustering Toolbox [11].
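The selection step of the k-means variant in 1) can be sketched as follows. The sketch assumes that cluster labels have already been produced by some k-means implementation, and it reads "smallest centroid" as the centroid whose coordinates have the smallest mean. Both the function name and that reading are assumptions made for illustration.

```python
from collections import defaultdict


def most_similar_cluster(distance_vectors, labels):
    """Pick the cluster whose centroid has the smallest mean distance.

    distance_vectors[i] holds the distances from e-mail image i to every
    phone spam image; labels[i] is the cluster id assigned to image i.
    Returns (cluster_id, sorted member indices).
    """
    members = defaultdict(list)
    for idx, label in enumerate(labels):
        members[label].append(idx)

    def centroid_mean(indices):
        dim = len(distance_vectors[indices[0]])
        centroid = [
            sum(distance_vectors[i][j] for i in indices) / len(indices)
            for j in range(dim)
        ]
        return sum(centroid) / dim

    best = min(members, key=lambda c: centroid_mean(members[c]))
    return best, sorted(members[best])


# Toy example (hypothetical values): images 0 and 1 are close to the phone
# spam images, images 2 and 3 are far away.
vectors = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
cluster_id, selected = most_similar_cluster(vectors, [0, 0, 1, 1])
print(cluster_id, selected)
```

The selected indices identify the e-mail spam images that would be added to the phone images for training.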
Given e-mail spam images, the spam image graph G = (V, E) is constructed. Each e-mail spam image is taken as a node, and the similarity distance between each pair of images is taken as an edge weight. To compute the similarity between a pair of e-mail spam images, the Euclidean distances between phone and e-mail spam images are computed in advance. As shown in Fig. 2, each e-mail spam image has a vector of similarity distances to all the phone spam images. The similarity between each pair of these similarity vectors is then computed, which yields the similarity matrix between e-mail spam images. The affinity matrix in spectral clustering is defined, following [9], as

$$W_{ij} = \exp\left(-\frac{D_{ij}^2}{2\sigma^2}\right)$$

Fig. 2. Database construction for training with k-means clustering and spectral clustering

3) Random: To demonstrate that advanced clustering techniques such as spectral clustering and k-means clustering are indeed meaningful, we also trained the predictive model with a randomly selected 10% of the e-mail spam images together with the phone spam images. Here the RGB histogram is used to describe an image so that the result can be compared with the PHOW descriptor. The result of training with randomly selected images and the RGB histogram feature is therefore compared with the results of k-means clustering and spectral clustering using the RGB histogram and the PHOW descriptor.

C. Image Classification on Phone Spam Data

The phone and e-mail spam image data constructed by each clustering method is used to train our predictive model. First, the feature vector (RGB histogram, PHOW-gray, PHOW-RGB, or PHOW-opponent) of each input image is obtained. Trained with phone and e-mail spam images, the predictive model then classifies a phone image as spam or non-spam. We trained an SVM on the training data and validated the result on the validation set. To handle large image data efficiently, a $\chi^2$-kernel SVM using the homogeneous kernel map is used [12]. The map transforms the data into a linear representation, so that the non-linear $\chi^2$-kernel SVM can be computed with a linear SVM.
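As a point of reference, the kernel behind this classifier can be written out directly. The sketch assumes that the kernel meant here is the additive chi-squared kernel treated in [12], k(x, y) = sum_i 2 x_i y_i / (x_i + y_i), applied to L1-normalized histograms. It evaluates the exact kernel rather than the approximate feature map, and the function names are hypothetical.

```python
def l1_normalize(histogram):
    """Scale a non-negative histogram so that its entries sum to 1."""
    total = float(sum(histogram))
    if total <= 0:
        raise ValueError("histogram must have a positive sum")
    return [v / total for v in histogram]


def chi2_kernel(x, y):
    """Additive chi-squared kernel between two non-negative histograms."""
    if len(x) != len(y):
        raise ValueError("histograms must have the same length")
    value = 0.0
    for a, b in zip(x, y):
        if a < 0 or b < 0:
            raise ValueError("histogram entries must be non-negative")
        if a + b > 0:  # bins that are empty in both histograms contribute 0
            value += 2.0 * a * b / (a + b)
    return value


# Toy histograms (hypothetical values).
h1 = l1_normalize([3, 1, 0, 4])
h2 = l1_normalize([2, 2, 1, 3])
print(chi2_kernel(h1, h1), chi2_kernel(h1, h2))
```

For L1-normalized histograms the kernel of a histogram with itself is 1, and it shrinks toward 0 as the two histograms stop sharing mass in the same bins. The homogeneous kernel map replaces this exact evaluation with an explicit feature vector so that a linear SVM can be trained instead.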
In [12], the $\chi^2$-kernel SVM was shown to perform better than other kernels. The soft margin parameter of the SVM is set to 10.

D. 5-fold Cross Validation and Evaluation

To determine the optimal parameter in spectral clustering and to prevent over-fitting to the training data, we evaluated our results with 5-fold cross validation. Note that we train our model with phone images and similar e-mail spam images, and classify phone images as spam or non-spam. As shown in Fig. 3, the e-mail and phone image data are each divided into 5 folds. Four folds of phone and e-mail images are used as the training set, and the remaining fold of phone images is used as the validation set. That is, in each run 80% of the phone and e-mail images are used for training and 20% of the phone images are used for testing.

Fig. 3. 5-fold cross validation on phone image data, trained with the phone and e-mail image datasets together

III. RESULTS

A. Dataset

Image Spam Hunter [13] is a publicly available dataset of image attachments in e-mail, which contains 929 e-mail spam images. We used a similar subset of the 929 e-mail spam images, selected by k-means clustering, by spectral clustering, or at random. Table I shows the size of the spam and non-spam data when the data is clustered by spectral clustering with each image descriptor, which yields better performance than k-means clustering.

TABLE I. DATASET SIZE USED IN SPECTRAL CLUSTERING

Class      Descriptor      Phone   E-mail   Total
Spam       RGB histogram   66      12       78
Spam       PHOW-gray       66      201      267
Spam       PHOW-RGB        66      20       86
Spam       PHOW-opponent   66      324      390
Non-spam   (all)           405     -        405

B. Performance Comparison in 5-fold Cross Validation

We performed 5-fold cross validation to evaluate the results. First, we visually examined how many clusters the data are partitioned into. The heat-map of the affinity matrix in spectral clustering is shown for different values of the parameter $\sigma$. We used the RGB histogram, PHOW-gray, PHOW-RGB, and PHOW-opponent image descriptors.
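The cross-validation protocol of Section II-D differs from the usual setup in that only phone images are held out. A minimal sketch of the split, assuming shuffled index-based folds, is given below; the function names and the seed are hypothetical, and the sizes in the usage lines are those of the PHOW-RGB row of Table I (66 + 405 phone images, 20 e-mail spam images).

```python
import random


def make_folds(n_items, n_folds=5, seed=0):
    """Shuffle indices 0..n_items-1 and deal them into n_folds folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[f::n_folds] for f in range(n_folds)]


def phone_email_splits(n_phone, n_email, n_folds=5, seed=0):
    """Yield (train_phone, train_email, val_phone) index lists per run.

    Each run trains on four folds of phone and e-mail images and validates
    on the remaining fold of phone images only; the held-out e-mail fold
    is simply left unused in that run.
    """
    phone_folds = make_folds(n_phone, n_folds, seed)
    email_folds = make_folds(n_email, n_folds, seed + 1)
    for held_out in range(n_folds):
        train_phone = sorted(
            i for f in range(n_folds) if f != held_out for i in phone_folds[f]
        )
        train_email = sorted(
            i for f in range(n_folds) if f != held_out for i in email_folds[f]
        )
        val_phone = sorted(phone_folds[held_out])
        yield train_phone, train_email, val_phone


for run, (tr_p, tr_e, va_p) in enumerate(phone_email_splits(66 + 405, 20)):
    print(run, len(tr_p), len(tr_e), len(va_p))
```

Whether the folds were stratified by class, and how the clustering step interacts with the folds, is not specified in the text, so the sketch leaves both out.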
The results of spectral clustering across $\sigma$ and of k-means clustering are compared with the result of a randomly selected image subset as a baseline. We evaluated the results in terms of accuracy, sensitivity, specificity, and F-score. The spectral clustering results and the overall performance are shown as heat-maps and plots of accuracy, sensitivity, specificity, and F-score in Fig. 4 and 5, as explained below.

1) Random: The performance using randomly selected images is shown as a green dotted line in Fig. 4 and 5. As a baseline, the RGB histogram is used as the image descriptor. The accuracy, sensitivity, specificity, and F-score are shown for a randomly selected 10% of the e-mail spam images, which gave the highest F-score. Because this setting considers only the color distribution of images during training, it tends to classify any image, spam or non-spam, as non-spam. As there are more non-spam images than spam images in the training data, the model is more likely to learn the color distribution of non-spam images. Thus, sensitivity (true positive rate) is lower than accuracy and specificity.

2) RGB histogram: The spectral clustering result as heat-maps and the performance using the RGB histogram feature are shown in Fig. 4. The overall performance of k-means clustering or spectral clustering is better than that of a randomly selected subset of images. For spectral clustering, the result varies with the value of $\sigma$. As shown in the heat-map, the data tend to be clustered better when $\sigma$ is approximately between 0.5 and 0.7. Although the F-score of spectral clustering is similar to that of k-means clustering, sensitivity improves when $\sigma$ is around 0.7. In contrast, specificity is best when $\sigma$ is 0.4.
Since the F-score is quite low regardless of the clustering method, the overall performance is influenced more by which image feature is used than by the clustering method. Nonetheless, the result shows that using a similar sub-group of images can improve the performance.

Fig. 4. Heat-map of the affinity matrix in spectral clustering with respect to different $\sigma$ (upper), and the performance comparison of spectral clustering, k-means clustering, and the randomized method in 5-fold cross validation (lower). The affinity matrix is computed with the RGB histogram feature.

Fig. 5. Heat-map of the affinity matrix in spectral clustering with respect to different $\sigma$ (upper), and the performance comparison of spectral clustering, k-means clustering, and the randomized method in 5-fold cross validation (lower). The affinity matrix is computed with PHOW features (A: gray mode, B: RGB mode, C: opponent mode).

3) PHOW (gray, RGB, opponent): The performance of the PHOW descriptor is shown in Fig. 5. With the PHOW descriptor, performance is on the whole significantly higher than with the RGB histogram. As the PHOW descriptor considers both geometric information and color distribution, images can be distinguished more precisely. For example, some spam images in Fig. 6 contain a lot of text but differ in color distribution. Many e-mail spam images contain only text in different colors and scales because they are designed to look like actual e-mail and to evade spam filtering systems. Those images should be grouped together, but they fall into different groups when the RGB histogram feature is used. With the PHOW descriptor, however, they fall into the same group. This shows that performance improves considerably when both geometric and color information are used.

Tables II and III show the best accuracy, sensitivity, specificity, and F-score for k-means clustering and spectral clustering, respectively, with each image feature.
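For reference, the four reported measures can be computed from the entries of a binary confusion matrix, with spam as the positive class. The sketch below uses the standard definitions; the counts in the usage lines are hypothetical and were chosen only to add up to the 66 spam and 405 non-spam phone images of Table I.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, and F-score from confusion counts.

    tp/fn count spam images classified as spam/non-spam;
    tn/fp count non-spam images classified as non-spam/spam.
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one sample is required")
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    if precision + sensitivity > 0:
        f_score = 2 * precision * sensitivity / (precision + sensitivity)
    else:
        f_score = 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_score": f_score,
    }


# Hypothetical counts: 66 spam (61 detected) and 405 non-spam (389 kept).
for name, value in binary_metrics(tp=61, fp=16, tn=389, fn=5).items():
    print(f"{name}: {100 * value:.2f}%")
```

Because non-spam images outnumber spam images roughly six to one, a handful of false positives lowers the F-score much more than it lowers accuracy or specificity, which is why the F-score is the most informative of the four here.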
The best performance among all features is obtained with the PHOW descriptor in RGB mode. In spectral clustering, the result is better when $\sigma$ is large, except in RGB mode. Though the improvements are quite marginal, we find that a PHOW descriptor that considers color distribution rather than gray levels only is also meaningful.

Fig. 6. Sample spam images that are correctly grouped in the same cluster with the PHOW descriptor, but in different clusters with the RGB histogram feature.

Additionally, PHOW descriptors with three color modes (gray, RGB, and opponent) are compared. The performance of spectral clustering is similar to or slightly better than that of k-means clustering. Though specificity (true negative rate) and overall performance do not differ across clustering methods, sensitivity in spectral clustering varies with the parameter $\sigma$. As the e-mail spam data clustered by spectral clustering are added to the model training, the true positive rate can be affected by the clustering result. As the PHOW descriptor considers many more features than the RGB histogram, the performance improves considerably.

C. Averaged Performance Comparison in Optimal Parameter

The best performance of each image descriptor with k-means clustering and with spectral clustering is compared in Fig. 7. The best F-score is used for evaluation. As shown in Fig. 7, the performance when trained with k-means clustering and with spectral clustering is almost the same. Regardless of the color mode, PHOW descriptors perform much better than the RGB histogram. This shows that considering color distribution together with geometric information has a larger impact on the overall performance than the color variance across modes.

Fig. 7.
Best performance comparison of the RGB histogram, PHOW-gray, PHOW-RGB, and PHOW-opponent features in k-means clustering and spectral clustering (red dotted line: best performance when training with randomly selected images)

TABLE II. BEST PERFORMANCE COMPARISON WITH RESPECT TO IMAGE DESCRIPTORS IN K-MEANS CLUSTERING

              RGB histogram   PHOW (gray)   PHOW (RGB)   PHOW (opponent)   Random 10%
Accuracy      73.47%          95.12%        95.54%       94.27%            72.25%
Sensitivity   42.42%          92.42%        92.42%       87.91%            32.03%
Specificity   78.52%          95.56%        96.05%       95.31%            78.81%
F-score       30.73%          84.19%        85.49%       81.15%            24.14%

TABLE III. BEST PERFORMANCE COMPARISON WITH RESPECT TO IMAGE DESCRIPTORS IN SPECTRAL CLUSTERING

              RGB histogram   PHOW (gray)   PHOW (RGB)   PHOW (opponent)   Random 10%
Accuracy      81.75%          96.39%        96.82%       96.39%            72.25%
Sensitivity   30.55%          95.45%        87.91%       84.95%            32.03%
Specificity   90.12%          96.54%        98.27%       98.27%            78.81%
F-score       32.31%          88.28%        88.48%       86.76%            24.14%

D. Misclassified Samples in Best-performed Cluster

Sample false positives and false negatives in the validation set are shown in Fig. 8 for the model trained with the best-performing cluster. Note that the best-performing cluster is obtained by spectral clustering with $\sigma$ = 0.3 and the PHOW (RGB) descriptor. The images in Fig. 8(a) are legitimate but are classified as spam. They include mobile coupons that the user actually asked for. Those coupons share many visual features with actual spam images. This is the reason why sensitivity is generally lower than specificity. Also, a user may send another user captured or saved images from the web that contain necessary information. Those images contain a lot of text and look like e-mail spam images. The false negatives in Fig. 8(b) also look visually similar to the mobile coupons in Fig. 8(a). As these examples show, the criteria for classifying an image as spam are quite subjective: some images on a mobile phone are considered spam by some users but non-spam by others.

Fig. 8. Examples of misclassified images

IV.
CONCLUSION

We proposed a mobile phone spam image filtering system that uses a large set of e-mail spam images. In [1], we showed that e-mail spam image data is useful for phone spam image classification. In this paper, we demonstrate that a similar sub-graph of e-mail spam images obtained by a graph partitioning algorithm yields the desired performance, as does the k-means clustering algorithm. Additionally, we compared phone spam image classification performance with the RGB histogram and with the PHOW descriptor in gray, RGB, and opponent color modes, in order to consider the color distribution of an image, its geometric information, and both together. The results show that the PHOW descriptor in RGB mode, which uses geometric and RGB color information, performs best on phone spam image classification. This indicates that considering both geometric and color information can improve spam image classification. A sophisticated clustering technique also has a positive impact. If the phone image data available for validation grows, the improvements are expected to become more pronounced. Furthermore, the approach can be applied to data from other domains that face a similar data insufficiency problem.

ACKNOWLEDGMENT

This research was supported by the Basic Science Research Program through the National Research Foundation (NRF) of Korea funded by the Ministry of Science, ICT, and Future Planning (MSIP) (2014R1A1A3051169).

REFERENCES

[1] K. So Yeon, B. Yenewondim, and S. Kyung-Ah, "Investigating the Effectiveness of E-mail Spam Image Data for Phone Spam Image Detection Using Scale Invariant Feature Transform Image Descriptor," Information Science and Applications, LNEE 339, pp. 591-598, 2015.
[2] A. Bosch, A. Zisserman, and X. Munoz, "Image classification using random forests and ferns," in Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on, pp. 1-8, 2007.
[3] A. Vedaldi and B.
Fulkerson, "VLFeat: An open and portable library of computer vision algorithms," in Proceedings of the international conference on Multimedia, pp. 1469-1472, 2010.
[4] S. Lazebnik, C. Schmid, and J. Ponce, "Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories," in Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, vol. 2, pp. 2169-2178, 2006.
[5] K. E. Van De Sande, T. Gevers, and C. G. Snoek, "Evaluating color descriptors for object and scene recognition," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 32(9), pp. 1582-1596, 2010.
[6] C. Elkan, "Using the triangle inequality to accelerate k-means," in ICML, vol. 3, pp. 147-153, 2003.
[7] M. Muja and D. G. Lowe, "Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration," in VISAPP (1), pp. 331-340, 2009.
[8] D. Arthur and S. Vassilvitskii, "k-means++: The advantages of careful seeding," in Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, pp. 1027-1035, 2007.
[9] A. Y. Ng, M. I. Jordan, and Y. Weiss, "On spectral clustering: Analysis and an algorithm," Advances in neural information processing systems, vol. 2, pp. 849-856, 2002.
[10] J. Shi and J. Malik, "Normalized cuts and image segmentation," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 22, pp. 888-905, 2000.
[11] S. Agarwal, "Spectral Clustering Toolbox," Available: http://vision.ucsd.edu/~sagarwal/clustering.html, 2002.
[12] A. Vedaldi and A. Zisserman, "Efficient additive kernels via explicit feature maps," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 34(3), pp. 480-492, 2012.
[13] Y. Gao, M. Yang, X. Zhao, B. Pardo, Y. Wu, T. N. Pappas, et al., "Image spam hunter," in Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on, pp. 1765-1768, 2008.