Advanced Science and Technology Letters Vol.45 (CCA 2014), pp.55-59 http://dx.doi.org/10.14257/astl.2014.45.11 Dimension-Based Sentiment Polarity Detection for E-Commerce Reviews Yajun Deng, Lizhou Zheng, Peiquan Jin School of Computer Science and Technology, University of Science and Technology of China, 230027, Hefei, China Abstract. E-commerce reviews are very helpful for customers to know other people’s opinions on interested products. Meanwhile, producers are able to learn the public sentiment on their products being sold in E-commerce platforms. Generally, E-commerce reviews involve many aspects of products, e.g., appearance, quality, price, logistics, and so on. In this paper, we define each of those aspects as a dimension of product, and present a dimension-based sentiment analysis approach for E-commerce reviews. In particular, we employ a dimensional sentiment lexicon expansion mechanism to remove the sentiment word ambiguity among different dimensions, and propose a rules and dimensional sentiment lexicon based algorithm for sentiment analysis on E-commerce reviews. We conduct experiments on a large-scale product reviews dataset involving over 28 million reviews, and compare our dimension-based sentiment analysis approach with the traditional way that does not consider dimensions of reviews. The results show that the multi-dimensional approach outperforms the traditional way in terms of F-measure. Keywords: Sentiment; E-commerce reviews; Dimension 1 Introduction Online E-commerce reviews contain a lot of customers’ opinions on certain products, and are helpful for decision making. A piece of E-commerce review generally involves many aspects of product, such as appearance, quality, price, logistics, and so on. Therefore, it is not appropriate to give only one positive or negative judgment on an E-commerce review by adopting traditional document-based or sentence-based sentiment analyzing approaches [1]. In general, a customer may be satisfied with some dimensions of a product but dislike some other features. For example, a review saying “宝贝外观不错,质量很好,就是太贵了!(Nice appearance, good quality, but too expensive!)” indicates that the customer likes the appearance and quality of the product but has a negative opinion on the price. Therefore, it is not suitable to conduct document-level or sentence-level sentiment analysis for E-commerce reviews. A better and more suitable way is to perform sentiment analysis on separated ISSN: 2287-1233 ASTL Copyright © 2014 SERSC Advanced Science and Technology Letters Vol.45 (CCA 2014) dimensions of products, i.e., to conduct a dimension-based sentiment analysis on E-commerce reviews. The focus of dimension-based sentiment analysis differs from traditional aspect-oriented sentiment analysis. Most of existing aspect-oriented studies concentrated on finding the explicit or implicit aspects from reviews [2,3]. However, in our work, the dimensions are provided by E-commerce company, and we focus on the expansion of dimensional sentiment lexicons. This paper is aimed at presenting a framework for dimension-based sentiment analysis on E-commerce reviews. Our study is based on a large scale of E-commerce reviews dataset consisting of 28 million of reviews. The challenges in dimension-based sentiment analysis lie in dimensions mapping and sentiment word disambiguation. The dimension mapping problem refers to mapping opinioned text blocks with right dimensions. For instance, the text block “Nice look!” will be mapped with the dimension “appearance”. The sentiment word disambiguation problem refers that a sentiment word may be connected with two or more dimensions. In this paper, we focus on those two problems and propose an effective framework to finally extract dimension-oriented opinions from E-commerce reviews. In particular, we propose a new approach to removing the sentiment word ambiguity by introducing a dimensional sentiment lexicon expansion mechanism to construct the dimension-based sentiment lexicon. We conduct experiments on a real dataset including 28 million E-commerce reviews, and compare our algorithm with the traditional algorithm without dimensional consideration. The results show that our algorithm is superior to the competitor algorithm. 2 Our Approach Figure 1 shows the general framework of the dimension-based sentiment polarity detection for E-commerce reviews. It consists of five modules, as shown in Fig.1. The Text Pre-Processing and Sentence Segmentation steps are very similar with existing works, so we will not discuss them in detail. E-commerce Reviews Textual Pre-Processing Dimensional Sentiment Lexicon Expansion expanded sentiment words Dimension Sentiment Lexicon sentences Sentence Segmentation text blocks Dimension Mapping dimensional sentiment words <text block, dimension> mapping Sentiment Analysis dimensional sentiment polarity Fig. 1. The framework of our approach Dimension Mapping. Dimensions are typically domain-dependent. Since a dimension may appear in many product categories, we have to determine the right 56 Copyright © 2014 SERSC Advanced Science and Technology Letters Vol.45 (CCA 2014) dimension for a given review about a certain product category. Given a review sentence s, we can obtain its category c and the dimension set Dc of category c directly. For each word wi in the review sentence s and a dimension dj Dc, along with dj’s initial keyword set KWj, we assign a probability score wprobi,j which describe the probability of word wi belongs to category dj (as shown in Formula 2.1). Here freq(wi, kwk, dj) means the frequency of word w in short sentence s appearing together with key word kwk in dimension dj. w p ro b i , j fr e q ( w i , d j ) * fr e q ( w i , k w k , d j ) kwk K W j (2.1) fr e q ( w i , k w k , d ) dD Then, the probability of a short sentence s belonging to a dimension dj, sprobs,j, is calculated as the product of the probability of all words in sentence s belonging to dimension dj (as shown in Formula 2.2). (2.2) s p r o b s , j w p r o bi , j wi s Finally we map a review sentence s with the dimension dj that has the maximum probability score with s. Dimensional Sentiment Lexicon Expansion. Sentiment lexicons, such as HowNet Lexicon and NTUSD Lexicon, consist of the sentiment polarity of many popular words and are widely used in sentiment analysis [4,5]. However, those lexicons did not distinguish the polarities of one word on different dimensions about a product. For example, " 硬/hard, solid" is a positive word when describing building materials but is a negative word when describing food. Besides, they only cover a small part of sentiment words that are not enough for the sentiment analysis on E-commerce reviews. We use a dimensional sentiment lexicon expansion method to solve the problems mentioned above. For each dimension, we perform two steps to extract its corresponding sentiment lexicon: (1) Step 1: seed-based lexicon expansion. First, we use some seed sentiment words to enlarge the lexicon. This seed-word-based procedure starts with some manually collected seed sentiment words which could be of the same polarity under different dimensions (e.g., positive words like “好/good” and “不错/nice”, negative words like “ 糟 糕 /terrible” and “ 惨 不 忍 睹 /too horrible to look at”, etc.). Based on the co-occurrence information with seed sentiment words, we could enlarge the original seed words iteratively, in each round of iteration we put top-3 words with the highest co-occurrence score into the seed word set. The procedure ends when the size of seed word set is larger than 20 or the co-occurrence score of each Top-K words are below a threshold. (2) Step2: filtering. Another procedure of our method is to filter words of high frequency. The filtering step is based on the the DF*ICF score (similar to the classical TF*IDF measurement, here DF means dimension frequency, and ICF means inverse corpus frequency) under a specific dimension d for each word in d. We define the DF*ICF score of a word wi under d in Formula 2.3. D F * IC F i , d Copyright © 2014 SERSC | s e n te n c e i , d | | s e n te n c e d | * lo g | s e n te n c e c o r p u s | (2.3) | s e n te n c e i , c o r p u s | 57 Advanced Science and Technology Letters Vol.45 (CCA 2014) Here |sentencei,d| is the count of sentence contains word wi in dimension d, and |sentenced| is the count of sentence under dimension d. |sentencecorpus| is the count of sentence under the whole corpus, and |sentencei,corpus| is the count of sentence contains word wi in the whole corpus. For each dimension, we put the Top-20 words with highest DF*ICF scores into the candidate sentiment word set. Sentiment Analysis. Our sentiment analysis algorithm on product reviews is based on dimensional lexicon and rules. The difference of our work compared to previous works is that we combine both dimension sentiment lexicon and public sentiment lexicon together. Besides, we use multiple rules to improve the performance of sentiment analysis. Particularly, we first define some strong rules, which can be applied to recognize the sentiment polarity with high precision. For example, “与 … 想象的一样(it’s the same … as imagine)” is recognized as a positive strong rule, which infers that sentences matching with this rule will be labelled as positive without considering other factors. Next, we calculate the linear weighted sum of all the positive, negative, and neutral polarities of the sentiment words appearing in each short sentence s. This sum is finally outputted as the polarity of the sentence. As a result, the sentiment polarity of our algorithm will output three different scores, namely the positive score, negative score, and neutral score. 3 Experimental Results F-measure Our experiment is based on a real dataset containing 28 million E-commerce product reviews. Each of the review belongs to a product category, which has been provided by the E-commerce company (totally 8000+ categories). For each category, there are multiple dimensions which describe different aspects of the products in that category. For each dimension, there are several keywords which are regarded as seed words in our algorithm. 100.00% 80.00% 60.00% 40.00% 20.00% 0.00% With dimensional sentiment lexicon Without dimensional sentiment lexicon Fig.2 F-measure of our method and the baseline algorithm We randomly choose 9,000 outputted results to show the performance of our method. We compare our method with the approach which does not contain the dimensional sentiment word lexicon (only utilize the HowNet and NTUSD 58 Copyright © 2014 SERSC Advanced Science and Technology Letters Vol.45 (CCA 2014) lexicon), and the results show that our algorithm can improve the precision, recall, and F-measure for each type of polarities (positive, negative and neutral). Due to the space limitation, we only give the F-measure comparison in Fig.2. The positive case, negative case, and the neutral case in Fig.2 represent the positive scores, negative scores, and the neutral scores outputted by the sentiment analysis step in Fig.1. To conclude, in this work, we proposed to expand and construct dimensional sentiment lexicons for dimension-based sentiment analysis on E-commerce reviews. Our preliminary results show that this approach is helpful to improve the performance of sentiment analysis on E-commerce reviews. 5 Conclusion In this paper, we proposed a framework for sentiment analysis over E-commerce reviews. The major feature of our approach is that we introduced a dimensional dictionary into sentiment analysis, and the experimental results on a real data set showed that the proposed method is effective in sentiment polarity detection for E-commerce. Acknowledgement. This paper is supported by the National Science Foundation of China (No. 71273010), and the National Science Foundation of Anhui Province (no. 1208085MG117). References 1. R. Feldman. Techniques and applications for sentiment analysis. Communications of the ACM, 2013, 56(4): 82-89. 2. A. Popescu, O. Etzioni, Extracting product features and opinions from reviews, in Proc. Of EMNLP, 2005 3. B. Liu, Sentiment analysis and subjectivity. in Handbook of natural language processing, 2nd edition, CRC press, 2010. 4. Q. Zhang, P. Jin, L. Yue, Extracting Focused Locations for Web Pages, First International Workshop on Web-based Geographic Information Management (WGIM) (in conjunction with WAIM'11), LNCS 7142, Springer, 2011, pp.76-89 5. P. Jin, X. Zhang, Q. Zhang, S. Lin, L. Yue, Ranking Web Pages by Associating Keywords with Locations, In Proc. Of WAIM, LNCS 7923, Springer, Beidaihe, China, 2013 Copyright © 2014 SERSC 59
© Copyright 2026 Paperzz