Dimension-Based Sentiment Polarity Detection for E

Advanced Science and Technology Letters
Vol.45 (CCA 2014), pp.55-59
http://dx.doi.org/10.14257/astl.2014.45.11
Dimension-Based Sentiment Polarity Detection for
E-Commerce Reviews
Yajun Deng, Lizhou Zheng, Peiquan Jin
School of Computer Science and Technology,
University of Science and Technology of China, 230027, Hefei, China
Abstract. E-commerce reviews are very helpful for customers to know
other people’s opinions on interested products. Meanwhile, producers
are able to learn the public sentiment on their products being sold in
E-commerce platforms. Generally, E-commerce reviews involve many
aspects of products, e.g., appearance, quality, price, logistics, and so on.
In this paper, we define each of those aspects as a dimension of product,
and present a dimension-based sentiment analysis approach for
E-commerce reviews. In particular, we employ a dimensional sentiment
lexicon expansion mechanism to remove the sentiment word ambiguity
among different dimensions, and propose a rules and dimensional
sentiment lexicon based algorithm for sentiment analysis on
E-commerce reviews. We conduct experiments on a large-scale product
reviews dataset involving over 28 million reviews, and compare our
dimension-based sentiment analysis approach with the traditional way
that does not consider dimensions of reviews. The results show that the
multi-dimensional approach outperforms the traditional way in terms of
F-measure.
Keywords: Sentiment; E-commerce reviews; Dimension
1
Introduction
Online E-commerce reviews contain a lot of customers’ opinions on certain products,
and are helpful for decision making. A piece of E-commerce review generally
involves many aspects of product, such as appearance, quality, price, logistics, and so
on. Therefore, it is not appropriate to give only one positive or negative judgment on
an E-commerce review by adopting traditional document-based or sentence-based
sentiment analyzing approaches [1]. In general, a customer may be satisfied with
some dimensions of a product but dislike some other features. For example, a review
saying “宝贝外观不错,质量很好,就是太贵了!(Nice appearance, good quality, but
too expensive!)” indicates that the customer likes the appearance and quality of the
product but has a negative opinion on the price. Therefore, it is not suitable to conduct
document-level or sentence-level sentiment analysis for E-commerce reviews. A
better and more suitable way is to perform sentiment analysis on separated
ISSN: 2287-1233 ASTL
Copyright © 2014 SERSC
Advanced Science and Technology Letters
Vol.45 (CCA 2014)
dimensions of products, i.e., to conduct a dimension-based sentiment analysis on
E-commerce reviews.
The focus of dimension-based sentiment analysis differs from traditional
aspect-oriented sentiment analysis. Most of existing aspect-oriented studies
concentrated on finding the explicit or implicit aspects from reviews [2,3]. However,
in our work, the dimensions are provided by E-commerce company, and we focus on
the expansion of dimensional sentiment lexicons.
This paper is aimed at presenting a framework for dimension-based sentiment
analysis on E-commerce reviews. Our study is based on a large scale of E-commerce
reviews dataset consisting of 28 million of reviews. The challenges in
dimension-based sentiment analysis lie in dimensions mapping and sentiment word
disambiguation. The dimension mapping problem refers to mapping opinioned text
blocks with right dimensions. For instance, the text block “Nice look!” will be
mapped with the dimension “appearance”. The sentiment word disambiguation
problem refers that a sentiment word may be connected with two or more dimensions.
In this paper, we focus on those two problems and propose an effective framework
to finally extract dimension-oriented opinions from E-commerce reviews. In
particular, we propose a new approach to removing the sentiment word ambiguity by
introducing a dimensional sentiment lexicon expansion mechanism to construct the
dimension-based sentiment lexicon. We conduct experiments on a real dataset
including 28 million E-commerce reviews, and compare our algorithm with the
traditional algorithm without dimensional consideration. The results show that our
algorithm is superior to the competitor algorithm.
2
Our Approach
Figure 1 shows the general framework of the dimension-based sentiment polarity
detection for E-commerce reviews. It consists of five modules, as shown in Fig.1. The
Text Pre-Processing and Sentence Segmentation steps are very similar with existing
works, so we will not discuss them in detail.
E-commerce
Reviews
Textual
Pre-Processing
Dimensional Sentiment
Lexicon Expansion
expanded
sentiment words
Dimension
Sentiment Lexicon
sentences
Sentence
Segmentation
text blocks
Dimension
Mapping
dimensional
sentiment words
<text block, dimension>
mapping
Sentiment Analysis
dimensional
sentiment
polarity
Fig. 1. The framework of our approach
Dimension Mapping. Dimensions are typically domain-dependent. Since a
dimension may appear in many product categories, we have to determine the right
56
Copyright © 2014 SERSC
Advanced Science and Technology Letters
Vol.45 (CCA 2014)
dimension for a given review about a certain product category.
Given a review sentence s, we can obtain its category c and the dimension set Dc
of category c directly. For each word wi in the review sentence s and a dimension dj 
Dc, along with dj’s initial keyword set KWj, we assign a probability score wprobi,j
which describe the probability of word wi belongs to category dj (as shown in Formula
2.1). Here freq(wi, kwk, dj) means the frequency of word w in short sentence s
appearing together with key word kwk in dimension dj.
w p ro b i , j  fr e q ( w i , d j ) *
fr e q ( w i , k w k , d j )

kwk  K W
j

(2.1)
fr e q ( w i , k w k , d )
dD
Then, the probability of a short sentence s belonging to a dimension dj, sprobs,j, is
calculated as the product of the probability of all words in sentence s belonging to
dimension dj (as shown in Formula 2.2).
(2.2)
s p r o b s , j   w p r o bi , j
wi  s
Finally we map a review sentence s with the dimension dj that has the maximum
probability score with s.
Dimensional Sentiment Lexicon Expansion. Sentiment lexicons, such as
HowNet Lexicon and NTUSD Lexicon, consist of the sentiment polarity of many
popular words and are widely used in sentiment analysis [4,5]. However, those
lexicons did not distinguish the polarities of one word on different dimensions about a
product. For example, " 硬/hard, solid" is a positive word when describing building
materials but is a negative word when describing food. Besides, they only cover a
small part of sentiment words that are not enough for the sentiment analysis on
E-commerce reviews.
We use a dimensional sentiment lexicon expansion method to solve the problems
mentioned above. For each dimension, we perform two steps to extract its
corresponding sentiment lexicon:
(1) Step 1: seed-based lexicon expansion. First, we use some seed sentiment words
to enlarge the lexicon. This seed-word-based procedure starts with some manually
collected seed sentiment words which could be of the same polarity under different
dimensions (e.g., positive words like “好/good” and “不错/nice”, negative words like
“ 糟 糕 /terrible” and “ 惨 不 忍 睹 /too horrible to look at”, etc.). Based on the
co-occurrence information with seed sentiment words, we could enlarge the original
seed words iteratively, in each round of iteration we put top-3 words with the highest
co-occurrence score into the seed word set. The procedure ends when the size of seed
word set is larger than 20 or the co-occurrence score of each Top-K words are below a
threshold.
(2) Step2: filtering. Another procedure of our method is to filter words of high
frequency. The filtering step is based on the the DF*ICF score (similar to the classical
TF*IDF measurement, here DF means dimension frequency, and ICF means inverse
corpus frequency) under a specific dimension d for each word in d. We define the
DF*ICF score of a word wi under d in Formula 2.3.
D F * IC F i , d 
Copyright © 2014 SERSC
| s e n te n c e i , d |
| s e n te n c e d |
* lo g
| s e n te n c e c o r p u s |
(2.3)
| s e n te n c e i , c o r p u s |
57
Advanced Science and Technology Letters
Vol.45 (CCA 2014)
Here |sentencei,d| is the count of sentence contains word wi in dimension d, and
|sentenced| is the count of sentence under dimension d. |sentencecorpus| is the count of
sentence under the whole corpus, and |sentencei,corpus| is the count of sentence contains
word wi in the whole corpus. For each dimension, we put the Top-20 words with
highest DF*ICF scores into the candidate sentiment word set.
Sentiment Analysis. Our sentiment analysis algorithm on product reviews is based
on dimensional lexicon and rules. The difference of our work compared to previous
works is that we combine both dimension sentiment lexicon and public sentiment
lexicon together. Besides, we use multiple rules to improve the performance of
sentiment analysis. Particularly, we first define some strong rules, which can be
applied to recognize the sentiment polarity with high precision. For example, “与 …
想象的一样(it’s the same … as imagine)” is recognized as a positive strong rule,
which infers that sentences matching with this rule will be labelled as positive without
considering other factors. Next, we calculate the linear weighted sum of all the
positive, negative, and neutral polarities of the sentiment words appearing in each
short sentence s. This sum is finally outputted as the polarity of the sentence. As a
result, the sentiment polarity of our algorithm will output three different scores,
namely the positive score, negative score, and neutral score.
3
Experimental Results
F-measure
Our experiment is based on a real dataset containing 28 million E-commerce product
reviews. Each of the review belongs to a product category, which has been provided
by the E-commerce company (totally 8000+ categories). For each category, there are
multiple dimensions which describe different aspects of the products in that category.
For each dimension, there are several keywords which are regarded as seed words in
our algorithm.
100.00%
80.00%
60.00%
40.00%
20.00%
0.00%
With
dimensional
sentiment
lexicon
Without
dimensional
sentiment
lexicon
Fig.2 F-measure of our method and the baseline algorithm
We randomly choose 9,000 outputted results to show the performance of our
method. We compare our method with the approach which does not contain the
dimensional sentiment word lexicon (only utilize the HowNet and NTUSD
58
Copyright © 2014 SERSC
Advanced Science and Technology Letters
Vol.45 (CCA 2014)
lexicon), and the results show that our algorithm can improve the precision, recall,
and F-measure for each type of polarities (positive, negative and neutral). Due to
the space limitation, we only give the F-measure comparison in Fig.2. The
positive case, negative case, and the neutral case in Fig.2 represent the positive
scores, negative scores, and the neutral scores outputted by the sentiment analysis
step in Fig.1.
To conclude, in this work, we proposed to expand and construct dimensional
sentiment lexicons for dimension-based sentiment analysis on E-commerce
reviews. Our preliminary results show that this approach is helpful to improve the
performance of sentiment analysis on E-commerce reviews.
5
Conclusion
In this paper, we proposed a framework for sentiment analysis over E-commerce
reviews. The major feature of our approach is that we introduced a dimensional
dictionary into sentiment analysis, and the experimental results on a real data set
showed that the proposed method is effective in sentiment polarity detection for
E-commerce.
Acknowledgement. This paper is supported by the National Science Foundation
of China (No. 71273010), and the National Science Foundation of Anhui Province (no.
1208085MG117).
References
1. R. Feldman. Techniques and applications for sentiment analysis. Communications of the
ACM, 2013, 56(4): 82-89.
2. A. Popescu, O. Etzioni, Extracting product features and opinions from reviews, in Proc. Of
EMNLP, 2005
3. B. Liu, Sentiment analysis and subjectivity. in Handbook of natural language processing, 2nd
edition, CRC press, 2010.
4. Q. Zhang, P. Jin, L. Yue, Extracting Focused Locations for Web Pages, First International
Workshop on Web-based Geographic Information Management (WGIM) (in conjunction
with WAIM'11), LNCS 7142, Springer, 2011, pp.76-89
5. P. Jin, X. Zhang, Q. Zhang, S. Lin, L. Yue, Ranking Web Pages by Associating Keywords
with Locations, In Proc. Of WAIM, LNCS 7923, Springer, Beidaihe, China, 2013
Copyright © 2014 SERSC
59