Proc. of Int. Conf. on Multimedia Processing, Communication and Info. Tech., MPCIT
Opinion Classification Based on Verb, Adverb and
Adjectives: Using Various Supervised Machine
Learning Algorithms
Anitha B.M1, Bhargavi B.R1
Department of Studies in Computer Science, Manasagangotri, University of Mysore, Mysore.
Email: {anithabm89, barushetty}@gmail.com
Abstract-In recent years, much research has focused on sentiment analysis, the study of
sentiments posted on various review sites. In this paper, we propose a novel method that uses
only three parts of speech as features yet is effective in classifying opinions at the document
level and the sentence level. We employ four classifiers, viz. Nearest Neighbor classifier,
Centroid classifier, Naïve Bayes classifier and Voting classifier, to classify the opinions. To
test the effectiveness of the proposed method, we conducted experiments on one publicly
available polarity review dataset and on our own movie review and product review datasets. We
also present a quantitative comparative analysis of document level and sentence level
opinion classification.
Index Terms—Opinion classification, Document level, Sentence level, Classifiers.
I. INTRODUCTION
Sentiment classification is a special application of text classification whose aim is to evaluate the mood of
the public about a particular product or topic. Sentiment classification, also called opinion classification,
involves building a system that automatically classifies the polarity of opinions found on Amazon.com,
IMDB, blogs, discussion forums, peer-to-peer networks and various social network sites. Opinion
classification has several applications: in business, to understand the voice of the customer as
expressed in everyday communications; in politics, to understand the opinions of voters about political
candidates; in shopping, to decide which products to purchase; and in entertainment, advertisement,
government, research and development, education (e-Learning) and blog analysis. It is also used for
review summarization and for filtering flames in newsgroups.
There are several challenging aspects in opinion classification. The first is to determine whether a document
or a portion of it is subjective. The second lies in the richness of human language: people do not
always express an opinion in the same way. To arrive at sensible conclusions, the context of the
sentiment has to be understood. For example, “The movie was great” is very different from “The movie
was not great”. In more informal media such as Twitter or blogs, people are more likely to combine
different opinions in the same sentence, which is easy for a human to understand but more difficult for a
computer to parse.
In this paper, we propose a novel method to classify review texts as positive or negative using four
different and effective algorithms, viz. Nearest Neighbor classifier, Centroid classifier, Naïve Bayes classifier
and Voting classifier, with a feature set restricted to three parts of speech (verb, adverb and adjective). We
describe the methods used to classify and assign polarity to reviews at the document level and the sentence
level. We conducted experiments on our own movie and product datasets and on a publicly available polarity
movie review dataset. We also present a quantitative comparative analysis of document level and sentence level
opinion classification.
DOI: 03.AETS.2013.4.63
© Association of Computer Electronics and Electrical Engineers, 2013
The remainder of this paper is organized as follows. Section II presents the literature review.
Section III presents the proposed method for classifying opinions at the document level and
sentence level using four different classifiers. Section IV describes the datasets and the
experiments conducted with the different classifiers. Section V concludes this work and
discusses some future work.
II. LITERATURE REVIEW
In recent years, researchers have proposed many machine learning approaches to classify the polarity of
opinions. Most of these approaches fall into two major branches: document level and sentence level.
In (Pang et al., 2002), the authors examined the effectiveness of applying machine learning techniques
(Naïve Bayes, Maximum Entropy and Support Vector Machines) to the sentiment classification problem. In
(Turney 2002), the author presents an unsupervised learning algorithm (semantic orientation) for classifying a
review as recommended or not recommended. In (Pang et al., 2004), the authors proposed a novel machine learning
method that applies text-categorization techniques to only the subjective portions of the document; these
portions can be extracted using efficient techniques for finding minimum cuts in graphs. In
(Pang et al., 2005), the authors address the rating-inference problem: rather than just determining whether a
review is “thumbs up” or not, the rating of the review on a multi-point scale is inferred. In (Read 2005), the
author demonstrates that using emoticons reduces the dependence of sentiment classification on
domain, topic and time. In (Mishne 2005), the author addresses the task of classifying
blog posts by mood using SVM. In (Zhang et al., 2008), the authors proposed a machine learning approach based on
string kernels for sentiment classification of Chinese reviews. In (Bhuiyan et al., 2009), the authors present a
state-of-the-art review of opinion mining from online customer feedback. In (Chen et al., 2009), the authors
proposed an NN based index which combines the advantages of machine learning and information retrieval
techniques. In (Go et al., 2009), the authors introduced a novel approach for automatically classifying the
sentiment of Twitter messages using distant supervision. In (Li et al., 2010), the authors proposed a machine
learning approach to incorporate polarity shifting information into a document-level sentiment classification
system. In (Claster et al., 2010), the authors proposed a multi-knowledge based approach using Self Organizing
Maps (SOM) and movie knowledge to model opinion across a multi-dimensional sentiment space. In (Zizka et al.,
2011), the authors present a machine learning approach to classifying customer reviews. In (Jebaseeli et al.,
2012), the authors provide an overall survey of sentiment analysis related to product reviews. In (Vinodhini et
al., 2012), the authors present a survey covering the techniques and methods of sentiment analysis and the
challenges that appear in the field. In (Mullen et al., 2004), the authors introduce an approach to sentiment
analysis which uses SVM to bring together diverse sources of potentially pertinent information. In (Wiegand et
al., 2010), the authors present a survey on the role of negation in sentiment analysis. In (Das et al., 2009),
the authors proposed sentence level emotion identification using word level emotion classification. In (Zhai et
al., 2011), the authors proposed a semi-supervised learning method to cluster product features for opinion
mining. In (Pak et al., 2010), the authors focus on using Twitter, the most popular microblogging platform, for
the task of sentiment analysis. In (Wang et al., 2012), the authors present a simple model variant in which an
SVM is built over NB log-count ratios as feature values, and obtain good results.
III. PROPOSED METHODOLOGY
Most existing research on opinion classification considers words of all eight parts of
speech as features. This motivated us to explore a new approach to opinion analysis, which
classifies the polarity of opinions using only three parts of speech as features, viz. verb, adverb and
adjective. These three parts of speech play an important role in expressing opinions, and using them
requires reasonably less classification time than using all eight parts of speech. In the feature
extraction stage, the verb, adverb and adjective features are extracted from the database. Once
feature extraction is complete, the extracted features are passed on for classification. This work presents
two approaches to opinion mining/classification, i.e.,
a) Document level opinion classification.
b) Sentence level opinion classification.
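The feature extraction stage described above can be sketched as a simple filter over part-of-speech tagged tokens. This is a minimal illustration, not the authors' implementation: it assumes the reviews have already been tagged with Penn Treebank tags (the paper does not name a tagger), and `extract_features` and `KEPT_PREFIXES` are hypothetical names.

```python
# Penn Treebank tag prefixes for the three retained parts of speech.
# Assumption: the tagger emits Penn Treebank tags; the paper names no tagger.
KEPT_PREFIXES = ("VB", "RB", "JJ")  # verbs, adverbs, adjectives


def extract_features(tagged_tokens):
    """Keep only verbs, adverbs and adjectives from (word, tag) pairs."""
    return [word.lower() for word, tag in tagged_tokens
            if tag.startswith(KEPT_PREFIXES)]


tagged = [("The", "DT"), ("movie", "NN"), ("was", "VBD"),
          ("really", "RB"), ("great", "JJ")]
print(extract_features(tagged))  # ['was', 'really', 'great']
```

Matching on tag prefixes keeps inflected and comparative forms (for example VBD, RBR, JJS) without listing every tag.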
A. Document Level Opinion Classification
In document level opinion classification, a document is a review. We analyze the overall
opinion of the whole review Ri (where i=1,2,3,…) and then classify the polarity of Ri according to the
class Ck (where k=1,2; positive or negative) to which the maximum number of terms Tj (where j=1,2,3,…) of
Ri belong. The general architecture of document level opinion classification is shown in Figure 1.
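The document level decision rule can be illustrated with a short sketch. It assumes a hypothetical `term_class` mapping from each known term to a class; in the paper this knowledge comes from the trained classifiers, so the dictionary below is only a stand-in.

```python
from collections import Counter


def classify_document(terms, term_class):
    """Assign the class to which most of the review's terms belong.

    term_class is a hypothetical mapping term -> "positive" / "negative";
    terms missing from it are ignored. Returns None on a tie or no evidence.
    """
    votes = Counter(term_class[t] for t in terms if t in term_class)
    ranked = votes.most_common(2)
    if not ranked or (len(ranked) == 2 and ranked[0][1] == ranked[1][1]):
        return None
    return ranked[0][0]


term_class = {"great": "positive", "enjoyed": "positive", "boring": "negative"}
print(classify_document(["great", "enjoyed", "boring", "really"], term_class))
```

Returning `None` on a tie leaves the caller free to apply another strategy instead of guessing a label.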
B. Sentence Level Opinion Classification
In sentence level opinion classification, we find the overall sentiment of each sentence Si (where i=1,2,3,…) in a
review Rj (where j=1,2,3,…) and classify each sentence Si into a particular class
Ck (where k=1,2; positive or negative). Finally, the review Rj is assigned the polarity of the class Ck to which
the maximum number of its sentences Si belong. The general architecture of sentence
level opinion classification is shown in Figure 2.
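The sentence level procedure amounts to a majority vote over per-sentence labels. The sketch below assumes any sentence classifier can be plugged in; `toy_sentence_classifier` is a hypothetical keyword-based stand-in used only to make the example runnable.

```python
from collections import Counter


def classify_review_by_sentences(sentences, classify_sentence):
    """Label each sentence, then give the review the majority label.

    classify_sentence is any callable returning "positive", "negative" or None.
    Returns None when the positive and negative counts are tied.
    """
    labels = [classify_sentence(s) for s in sentences]
    counts = Counter(label for label in labels if label is not None)
    pos, neg = counts["positive"], counts["negative"]
    if pos == neg:
        return None
    return "positive" if pos > neg else "negative"


def toy_sentence_classifier(sentence):
    # Hypothetical stand-in for a trained sentence classifier.
    s = sentence.lower()
    if "bad" in s or "boring" in s:
        return "negative"
    if "good" in s or "great" in s:
        return "positive"
    return None


review = ["The plot was great.", "The acting was good.", "The ending was boring."]
print(classify_review_by_sentences(review, toy_sentence_classifier))
```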
Figure 1: Block diagram of Document level opinion classification
Figure 2: Block diagram of sentence level opinion classification
C. Supervised Learning Approaches
In supervised learning, we provide domain knowledge to the system through a training review dataset.
Using this domain knowledge, the algorithms automatically classify the opinions. The system is then
evaluated on a testing review dataset. The main objective of developing this model is to
label the unlabeled test reviews. Our aim in this work is to examine whether machine learning algorithms
classify the polarity of reviews as well as they perform topic categorization. We experimented with four
standard algorithms: Centroid, Nearest Neighbor, Naive Bayes and Voting classifiers.
Centroid Classifier:- The Centroid classifier is a simple and effective method for text categorization. Let Ri (where
i=1, 2, 3…n) be the training reviews of class Cj (where j=1,2; positive, negative). Using the vector space
model (VSM) representation, each review is represented as a vector of weighted terms. The weights of the
vector depend on the number of terms present in the entire review. The performance of the
classifier depends on how the terms in the reviews are weighted in order to construct a representative vector for
class Cj. From the reviews in each class, the centroid based classifier selects a single representative called the
“centroid” and then works like a K-NN classifier with K=1. Given the set of reviews Ri and their representation, we
compute the summed centroid and the normalized centroid for each class Cj. Given a testing review TR, we
calculate the similarity between TR and the centroid of each class Cj. Finally, TR is assigned the polarity of the
class whose centroid has maximum similarity.
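A minimal sketch of a centroid classifier follows. It assumes raw term-frequency weights and cosine similarity, neither of which the paper specifies, and all function names are illustrative.

```python
import math
from collections import Counter


def normalize(vec):
    """Scale a sparse vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def train_centroids(training):
    """training: list of (terms, label). Returns label -> normalized centroid."""
    sums = {}
    for terms, label in training:
        # Summed centroid: add the unit-length term-frequency vector of each review.
        sums.setdefault(label, Counter()).update(normalize(Counter(terms)))
    return {label: normalize(total) for label, total in sums.items()}


def cosine(a, b):
    """Dot product of two unit-length sparse vectors, i.e. their cosine."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def classify_centroid(terms, centroids):
    """Assign the class whose centroid is most similar to the test review."""
    query = normalize(Counter(terms))
    return max(centroids, key=lambda label: cosine(query, centroids[label]))


training = [(["great", "loved", "enjoyed"], "positive"),
            (["great", "fine"], "positive"),
            (["boring", "hated"], "negative"),
            (["boring", "slow", "bad"], "negative")]
centroids = train_centroids(training)
print(classify_centroid(["great", "loved"], centroids))
```

Because each class is reduced to one vector, classification costs one similarity computation per class rather than one per training review.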
Nearest Neighbor Classifier:- Nearest Neighbor classification is part of a more general technique known as
instance based learning. Let Ri (where i=1, 2, 3…n) be the training reviews of class Cj (where j=1,2; positive,
negative). Using the vector space model (VSM) representation, each review is represented as a vector of
weighted terms. The weights of the vector depend on the number of terms present in the entire
review. The performance of the classifier depends on how the terms in the reviews are weighted in order to
construct a representative vector for each review Ri in class Cj. Given a testing review TR, we calculate the
similarity between TR and each review Ri of class Cj. For each class Cj, we count the reviews Ri that are
most similar to TR, and assign the testing review the polarity of the class with the maximum count.
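One common reading of this procedure is a k-nearest-neighbor vote, sketched below. The choice of cosine similarity, term-frequency weights and the parameter `k` are assumptions for illustration; the paper does not state them.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify_knn(terms, training, k=3):
    """Vote among the k training reviews most similar to the test review."""
    query = Counter(terms)
    ranked = sorted(training,
                    key=lambda item: cosine(query, Counter(item[0])),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


training = [(["great", "loved", "enjoyed"], "positive"),
            (["great", "fine"], "positive"),
            (["boring", "hated"], "negative"),
            (["boring", "slow", "bad"], "negative")]
print(classify_knn(["great", "enjoyed"], training, k=3))
```

Unlike the centroid classifier, every training review is compared with the test review, so the cost grows with the size of the training set.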
Naive Bayes Classifier:- The Bayesian classifier is a learning and classification method based on probability
theory. It can give an optimal result. Let Ri (where i=1, 2, 3…n) be the training reviews of class Cj (where
j=positive or negative). Based on the vector space model, the performance of the classifier depends on
how the terms in the reviews are weighted in order to construct a representative vector for class Cj. Find the
probability density function and the class conditional probability for each class ωi. The class
conditional probabilities of all classes sum to one. The prior probability of each class Cj, Pr(Cj), can be
converted into the posterior probability Pr(Cj | Wj), the probability of class Cj given the word Wj. The prior
probability Pr(Cj) can be computed from equation (3.1):
(3.1)
The prior probability of a particular word Wj, Pr(Wj), can be calculated using equation (3.2):
(3.2)
Given a testing review TR, find the posterior probability of each word of the review being assigned to a
particular category using equation (3.3). Finally, assign the test review to the class which has the highest
probability.
(3.3)
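The sketch below shows one standard way to realize a Naive Bayes text classifier. It assumes the usual multinomial formulation, with the class prior estimated as the fraction of training reviews in each class and add-one (Laplace) smoothing for word probabilities; the paper's own formulas may differ, and all names are illustrative.

```python
import math
from collections import Counter, defaultdict


def train_nb(training):
    """training: list of (terms, label). Returns priors, word counts, vocabulary."""
    class_docs = Counter(label for _, label in training)
    # Prior of a class: share of training reviews carrying that label.
    priors = {c: n / len(training) for c, n in class_docs.items()}
    word_counts = defaultdict(Counter)
    for terms, label in training:
        word_counts[label].update(terms)
    vocab = {t for terms, _ in training for t in terms}
    return priors, word_counts, vocab


def classify_nb(terms, priors, word_counts, vocab):
    """Return the class with the highest log posterior score."""
    scores = {}
    for c, prior in priors.items():
        total = sum(word_counts[c].values())
        score = math.log(prior)
        for t in terms:
            if t in vocab:  # words never seen in training are skipped
                # Add-one smoothing (an assumption, not stated in the paper).
                score += math.log((word_counts[c][t] + 1) / (total + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)


training = [(["great", "loved", "enjoyed"], "positive"),
            (["great", "fine"], "positive"),
            (["boring", "hated"], "negative"),
            (["boring", "slow", "bad"], "negative")]
priors, word_counts, vocab = train_nb(training)
print(classify_nb(["great", "enjoyed"], priors, word_counts, vocab))
```

Summing logarithms instead of multiplying probabilities avoids numerical underflow on long reviews while leaving the ranking of classes unchanged.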
Voting Classifier:- The Voting classifier is a simple classification strategy that works on the voting
principle: a review is classified according to the maximum number of votes. Let Ri (where i=1, 2, 3…n) be the
training reviews of class Cj (where j=positive or negative). Based on the vector space model, the performance of
the classifier depends on how the terms in the reviews are weighted in order to construct a representative
vector for class Cj. Given a testing review TR, take its unique words Tk. With Cj classes and Tk
unique words in the testing review, construct a VSM of size Cj × Tk. The sum of each row of this VSM gives the
vote for the corresponding class Cj. Finally, the testing review TR is assigned to the class that has the
maximum voting value.
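The voting scheme can be sketched as follows. The paper does not say what each cell of the class-by-term matrix holds; this example assumes it is the number of times the term occurs in the training reviews of that class, and the function names are hypothetical.

```python
from collections import Counter


def train_class_term_counts(training):
    """Count how often each term occurs in the training reviews of each class."""
    counts = {}
    for terms, label in training:
        counts.setdefault(label, Counter()).update(terms)
    return counts


def classify_voting(terms, class_term_counts):
    """Build a classes x unique-test-terms matrix and pick the largest row sum."""
    unique_terms = sorted(set(terms))
    # One row per class, one column per unique term of the test review.
    matrix = {label: [counts[t] for t in unique_terms]
              for label, counts in class_term_counts.items()}
    votes = {label: sum(row) for label, row in matrix.items()}
    return max(votes, key=votes.get), votes


training = [(["great", "loved", "enjoyed"], "positive"),
            (["great", "fine"], "positive"),
            (["boring", "hated"], "negative"),
            (["boring", "slow", "bad"], "negative")]
counts = train_class_term_counts(training)
label, votes = classify_voting(["great", "great", "boring", "enjoyed"], counts)
print(label, votes)
```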
IV. EXPERIMENTATION
To show the effectiveness of the proposed sentence level and document level opinion classification
algorithms, we conducted experiments using four classifiers, viz. Centroid, Nearest Neighbor, Naive
Bayes and Voting classifiers. Each classifier is empirically evaluated on a publicly available movie review
dataset and on our own movie review and product review datasets. Each review
dataset is split into two sets: 60% of the dataset is used to train the system and 40% is
used to evaluate it. We present a quantitative comparative analysis of the four classifiers and
of document level and sentence level opinion classification. To assess the efficiency and
robustness of the proposed work, we use Precision, Recall and F-Measure.
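The evaluation protocol can be illustrated with a small sketch of the 60/40 split and the three measures. It uses the standard definitions of precision, recall and F-measure for one class treated as the positive class; the paper does not state whether the data were shuffled or how the measures were averaged, so those details are left out.

```python
def split_60_40(items):
    """First 60% for training, remaining 40% for testing (no shuffling here)."""
    cut = len(items) * 6 // 10
    return items[:cut], items[cut:]


def precision_recall_f(true_labels, predicted, positive="positive"):
    """Precision, recall and F-measure for one class treated as positive."""
    pairs = list(zip(true_labels, predicted))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f


train, test = split_60_40(list(range(10)))
truth = ["positive", "positive", "negative", "negative"]
preds = ["positive", "negative", "positive", "negative"]
print(len(train), len(test), precision_recall_f(truth, preds))
```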
A. Datasets
Any system has to be tested to establish its effectiveness and efficiency at the task for which it was designed.
In this section we describe the dataset collection and data preparation stages that we followed to create our
own datasets. The datasets are introduced below.
Dataset 1: Movie Reviews:- This dataset contains two sets of movie reviews, as shown in Table I. The
reviews are in .txt format and were collected from the IMDB review site.
Dataset 2: Product Reviews:- This dataset contains reviews of four different products, namely
Mobile, Home appliances, Watch and Camera, collected from different review sites such as Amazon and
Epinions. The details of this dataset are shown in Table II. All reviews in this dataset are in .txt
format.
TABLE I: DATASET 1 DETAIL TABLE (MOVIE REVIEW)
TABLE II: DATASET 2 DETAIL TABLE (PRODUCT REVIEW)
TABLE III: DATASET 3 DETAIL TABLE (POLARITY REVIEW MOVIE DATASET)
B. Quantitative Comparative Analysis
In this section, we present the quantitative comparison between the sentence level and document
level opinion mining systems. In sentence level classification, if the positive sentence count of a review
equals its negative sentence count, the review is difficult to classify. In this situation we can
switch to document level opinion mining, and vice versa. In this analysis we found that, at the document
level, the Centroid classifier gives the best result on our own movie review and product review
datasets, while the Nearest Neighbor classifier gives the best result on the publicly available polarity
review movie dataset. Among all classifiers, Naïve Bayes gives the best result at the sentence level and
Centroid gives the best result at the document level. Comparing document level and sentence level
classification, the document level gives the better result. The quantitative comparative analysis between
sentence level and document level is shown in Table IV.
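The fallback from sentence level to document level on a tie can be sketched in a few lines. Both classifiers are passed in as hypothetical callables; the lambdas in the usage lines are placeholders, not trained models.

```python
def classify_with_fallback(review_sentences, sentence_level, document_level):
    """Use the sentence level label; fall back to document level on a tie.

    sentence_level takes a list of sentences and returns a label, or None when
    positive and negative sentence counts are equal. document_level takes the
    whole review text and returns a label.
    """
    label = sentence_level(review_sentences)
    if label is None:
        label = document_level(" ".join(review_sentences))
    return label


# Placeholder classifiers for illustration only.
tied = lambda sentences: None
always_negative = lambda text: "negative"
print(classify_with_fallback(["Great cast.", "Boring plot."], tied, always_negative))
```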
V. CONCLUSION
In this paper, we have proposed a novel method which uses only words of three parts of speech as features,
together with four different classifiers. To investigate the effectiveness of the proposed method,
experiments were conducted on one publicly available polarity review dataset and on our own movie review and
product review datasets. The experimental results and the quantitative comparative analysis between sentence
level and document level opinion classification are tabulated in Table IV. From this analysis we find that the
Naïve Bayes classifier gives the best result at the sentence level and the Centroid classifier gives the best
result in document level opinion classification.
In future, one can combine the knowledge of different classifiers to arrive at a new way of
classifying reviews, and construct a hybrid system which uses both document level and sentence level
opinion mining techniques.
TABLE IV: QUANTITATIVE COMPARATIVE ANALYSIS BETWEEN SENTENCE LEVEL AND DOCUMENT LEVEL
REFERENCES
[1] A. Nisha Jebaseeli and E. Kirubakaran: A Survey on Sentiment Analysis of (Product) Reviews. International Journal
of Computer Applications, (2012) Vol. 47.
[2] Alec Go, Richa Bhayani, and Lei Huang: Twitter Sentiment Classification using Distant Supervision. Processing,
(2009) pp. 1-6.
[3] Alexander Pak and Patrick Paroubek: Twitter as a Corpus for Sentiment Analysis and Opinion Mining. In
Proceedings of LREC, (2010).
[4] S. Harish: Classification of Large Textual Data. Ph.D. Thesis, University of Mysore, (2011).
[5] Bo Pang and Lillian Lee: A Sentimental Education: Sentiment Analysis using Subjectivity Summarization based on
Minimum Cuts. In Proceedings of the ACL, (2004) pp. 271-278.
[6] Bo Pang and Lillian Lee: Seeing Stars: Exploiting Class Relationships for Sentiment Categorization with respect
to Rating Scale. In Proceedings of ACL, (2005).
[7] Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan: Thumbs Up? Sentiment Classification using Machine
Learning Techniques. In Proceedings of the Conference on Empirical Methods in Natural Language Processing
(EMNLP), (2002) pp. 79-86.
[8] Changli Zhang, Wanli Zuo, Tao Peng and Fengling He: Sentiment Classification for Chinese Reviews Using
Machine Learning Methods Based on String Kernel. Third International Conference on Convergence and Hybrid
Information Technology (ICCIT), IEEE, (2008).
[9] Changqin Quan and Fuji Ren: Sentence Emotion Analysis and Recognition Based on Emotion Words Using RenCECps.
International Journal of Advanced Intelligence, (2010) Volume 2, Number 1, pp. 105-117.
[10] Dipankar Das and Sivaji Bandyopadhyay: Sentence Level Emotion Tagging. In Affective Computing and Intelligent
Interaction and Workshops, 3rd International Conference on. IEEE, (2009) pp. 1-6.
[11] Fung, G.P.C., Yu, J.X., Lu, H., and Yu, P.S.: Text Classification without Negative Example Revisit. IEEE
Transactions on Knowledge and Data Engineering, (2006) Volume 18, pp. 23-47.
[12] G. Vinodhini and R. M. Chandrasekaran: Sentiment Analysis and Opinion Mining: A Survey. International Journal
of Advanced Research in Computer Science and Software Engineering, (2012) Vol. 2.
[13] Gilad Mishne: Experiments with Mood Classification in Blog Posts. In 1st Workshop on Stylistic Analysis of Text
for Information Access, (2005).
[14] Guru, D. S., B. S. Harish, and S. Manjunath: Symbolic Representation of Text Documents. In Proceedings of the
Third Annual ACM Bangalore Conference, (2010) pp. 18.
[15] Jan Zizka and Vadim Rukavitsyn: Automatic Categorization of Reviews and Opinions of Internet E-shopping
Customers. Annual Conference on Innovations in Business and Management, (2011).