opinion mining from text in movie domain

International Journal of Computer Science Engineering
and Information Technology Research (IJCSEITR)
ISSN 2249-6831
Vol. 3, Issue 4, Oct 2013, 121-128
© TJPRC Pvt. Ltd.
OPINION MINING FROM TEXT IN MOVIE DOMAIN
NIDHI MISHRA & C. K. JHA
Department of AIM & ACT, Banasthali Vidyapith, Rajasthan, India
ABSTRACT
With emergence of web 2.0, there is a burst of movie domain opinion rich resources in the form of review sites
like IMDB, yahoo movies etc. These resources contain public opinion on the movies which could help others in their
decision making processes. These opinions could be expressed on different aspects of movie such as cast crew, story line
etc. in number of sentences. Unlike product review sites, in which one or more bad features makes it bad overall, movie
review sites do not follow the same presumption. Hence feature based review mining of movies are required so that people
get opinions on the basis of individual feature. In this paper, we compute opinion of the feature of movie such as story, star
cast, direction etc. and present the related text fragment to the user.
KEYWORDS: Opinion Mining, Machine Learning, Sentiments, Polarity, SentiWordNet
INTRODUCTION
The textual information on the web comprises of two things facts and opinion [3.17, 19]. The facts are objective
expression which describes entities, events and properties where as opinion is subjective expression which identifies
people’s opinion, suggestion, emotions and sentiments towards objects [9, 10, 15]. Current search engine retrieves facts
through keyword matching, popularity etc [1]. These search engines could not mine opinion from textual information.
Opinion mining deals with recognizing, classifying subjective text and finding the user’s sentiment combined in opinion
rich resources. There is large number of movie related opinion rich resources such as IMDB, yahoo Movies, Roger Ebert
available on web which contains public opinions related to the same. These customers feedback on the movie review sites
are growing exponentially with the time [23, 25, 33, 34]. These feedbacks could help others in making opinion or judgment
about movie or its aspects such as actors, storyline etc. This research calls for opinion mining in movie domain. Opinion
mining requires complex natural language processing and understanding techniques that could extracts sentiments from the
expressed text [14, 22, 32].
Literature on opinion mining operate at document level, sentence level, feature level or word level [3,7,12,17].
For the task of movie domain opinion mining, the feature based analysis is more appropriate. As one or more bad feature of
a movie does not make movie bad as a whole unlike products which follow different presumption. There are various
literatures available on movie domain opinion mining. In one of the recent prominent work, S. Agrawal presents the
summarization on the basis of features of movies [29]. The sentences which contain the desired feature are processed
through technique to give opinion in the form of ratings. Such technique could fell flat if the sentence contains mixed
opinion on different features e.g. the performance of actor is good but that of actress is bad. Here the feature score of actor
is neutral as the sentence contains ‘bad’ and ‘good’ which neutralize each other. The technique could fall flat in case of
compound sentences in which there could be mixed opinion on different features. Hence segmentation of sentence into
clauses based on feature is required to yield better results. This is motive of our work in this paper.
This paper is an extension to our work is which is carried out in my paper [29] titled- “Restricted domain opinion
mining in compound sentence” accepted in IEEE CSNT-2013 International Conference.
122
Nidhi Mishra & C. K. Jha
Rest of the paper is organized as follows Section 2 deals about literature survey of opinion mining techniques
Section 3 discuss the opinion mining in compound sentence. In section 4 discuss the experiments and results. Finally we
conclude our discussion in Section 5.
LITERATURE SURVEY & DISCUSSION ABOUT OPINION MINING TASK [35, 36, 37]
In order to give more effect into the problem of opinion mining, we discuss the domain overview and various
kinds of opinion mining. The opinion mining is often associated with the information retrieval. The information retrieval
works on objective data but the opinion mining works on subjective data. The task of opinion mining is to find the opinion
of an object whether it is positive or negative. The concept of an opinion mining is given by Hu and Liu [2]. They put most
effect on their work and said that the basic components of an opinion are:

Opinion Holder: It is the person that gives a specific opinion on an object.

Object: It is entity on which an opinion is expressed by user.

Opinion: It is a view, sentiment, or appraisal of an object done by user.
Document Level Opinion Mining
The Document level opinion mining is about classifying the overall opinion presented by the authors in the entire
document as positive, negative or neutral about a certain object [3] [4]. The assumption is taken at document level is that
each document focus on single object and contains opinion from a single opinion holder. Turney [27] present a work based
on distance measure of adjectives found in whole document with known polarity i.e. excellent or poor. The author presents
a three step algorithm i.e. in the first step; the adjectives are extracted along with a word that provides appropriate
information. Second step, the semantic orientation is captured by measuring the distance from words of known polarity.
Third step, the algorithm counts the average semantic orientation for all word pairs and classifies a review as recommended
or not. In contrast, Pang et al. [5] present a work based on classic topic classification techniques. The proposed approach
aims to test whether a selected group of machine learning algorithms can produce good result when opinion mining is
perceived as document level, associated with two topics: positive and negative. He present the results using nave bayes,
maximum entropy and support vector machine algorithms and shown the good results as comparable to other ranging from
71 to 85% depending on the method and test data sets. Besides from the document-level opinion mining, the next subsection discusses the classification of opinion mining at the sentence-level, which classify each sentence as a subjective or
objective and determine the positive or negative opinion.
Sentence Level Opinion Mining
The sentence level opinion mining is associated with two tasks [6] [7] [8]. First one is to identify whether the
given sentence is subjective (opinionated) or objective. The second one is to find opinion of an opinionated sentence as
positive, negative or neutral. The assumption is taken at sentence level is that a sentence contain only one opinion for e.g.,
“The picture quality of this camera is good.” However, it is not true in many cases like if we consider compound sentence
for e.g., “The picture quality of this camera is amazing and so is the battery life, but the viewfinder is too small for such a
great camera”, expresses both positive and negative opinions and we say it is a mixed opinion. For “picture quality” and
“battery life”, the sentence is positive, but for “viewfinder”, it is negative. It is also positive for the camera as a whole.
Riloff and Wiebe [11] use a method called bootstrap approach to identify the subjective sentences and achieve the result
around 90% accuracy during their tests. In contrast, Yu and Hatzivassiloglou [13] talk about sentence classification
(subjective/objective) and orientation (positive/negative/neutral). For the sentence classification, author’s present three
Opinion Mining from Text in Movie Domain
123
different algorithms: (1) sentence similarity detection, (2) naïve Bayens classification and (3) multiple naïve Bayens
classification. For opinion orientation authors use a technique similar to the one used by Turney [27] for document level.
Wilson et al. [12] pointed out that not only a single sentence may contain multiple opinions, but they also have both
subjective and factual clauses. It is useful to pinpoint such clauses. It is also important to identify the strength of opinions.
In the same way the document-level opinion mining, the sentence-level opinion mining does not consider about object
features that have been commented in a sentence. For this we discuss the feature level opinion mining in the next subsection.
Feature Level Opinion Mining
The task of opinion mining at feature level is to extracting the features of the commented object and after that
determine the opinion of the object i.e. positive or negative and then group the feature synonyms and produce the summary
report [20, 24]. Liu [16] used supervised pattern learning method to extract the object features for identification of opinion
orientation. To identify the orientation of opinion he used lexicon based approach. This approach basically uses opinion
words and phrase in a sentence to determine the opinion. The working of lexicon based approach [18] is described in
following steps.

Identification of opinion words

Role of Negation words

But-clauses
Movie Domain Opinion Mining
K Denecke [30] performs opinion mining at document level of movie domain. The author consults SentiWordnet
and follows average scoring method. The scores of words in documents are aggregated to give final score. For calculating
score of word, the score of all synsets is calculated and averaged to give final score through rule. The technique works well
at document level. As stated earlier, for movie domain feature based opinion mining will be more appropriate as users
could be interested in any specific aspects of movie based on his likings. Furthermore, users does not need just ratings, they
need opinion expressions in the form of text present in review sites for getting justification or reasons for the computed
ratings. S. Agrawal [29] presents the summarization on the basis of features of movies [29]. The sentences which contain
the desired feature are processed through technique to give opinion in the form of ratings. The authors present method
which generates ratings on the basis of individual features. The technique could fall flat in case of compound sentences in
which there is opinion on different features as discussed in introduction section. Hence, in such cases, segmentation of
sentence into clauses based on feature is required to yield better results.
OPINION MINING AND SUMMARIZATION
In this section we focus on opinion expressions of a movie review that gives the opinion on the individual feature
of the movie. Apart from this we also determine the sentiment score towards various features of a movie, such as cast,
director, story and music. Sentiment scores are used to classify the sentiment polarity (i.e. Positive, negative or neutral) of
clauses or sentences. The linguist approach makes use of both a like domain-specific lexicon (specify the noun related
terms actor, director etc.) and a generic opinion lexicon (specify the property of movie related terms),derived from
SentiWordNet[31] ,to assign a prior sentiment score to each word in a sentence. We use SentiWordNet[31] scores to
categorize each sentence belonging to individual feature of movie into positive or negative opinion. The following steps in
our approach are discussed below:
124
Nidhi Mishra & C. K. Jha
Document Preprocessing [29]
In the preprocessing step, first the sentence boundary is identified and then the text is tokenized. Extra white
spaces, html tags, new lines and unrelated extra characters and special symbols are removed. Stop words are also removed
as they do not belong to any of the four parts of speech (Noun, Adjective, Verb, and Adverb) present in the SentiWordNet
and they do not affect the opinion expressed in the document. The list of stop words used in this work excludes adverbs
like very, more etc. and conjunctions such as and, but, etc. which can affect the subjective information of text. We parse
the sentence through Stanford parser to determine part of speech of each word in sentence [21] .
Splitting of the Document into Sentences or Clauses Based on Features
Given a document about the movie reviews, the document is segmented into individual sentences by the help of
sentence delimiter. Here problem is that most of the reviews are found on movie forums or blog sites where normal users
post their opinions in their informal language which do not follow strict grammatical rules and punctuations. The
identification of full stop in the sentence does not mark the end of sentence sometimes. Such as date 17.12.2000, movie
short forms K.m.g., hence we use rule based pattern matching to identify sentence boundary.
We identify simple and compound sentences in review. A compound sentence is a sentence that contains two or
more complete ideas (called clauses) that are related. These two or more clauses are usually connected in a compound
sentence by a conjunction. The coordinating conjunctions are "and", "but", "for", "or”, “nor", "yet", or "so". We use plain
pattern matching to find out the presence of coordinating conjunctions. If they are present in the given sentence then it will
be identified as compound sentence. We split the compound sentences into clauses or sentences if there is more than one
feature of movie described in the sentence identified through pattern matching. The boundary of clauses or sentences is
identified through use of punctuation marks such as comma, semicolon, full stop or presence of coordinating conjunctions
etc.
This lead to generation of a number of sentences related to individual features of movie. We will group these
sentences based on individual feature. In the next subsection, we compute the score of each word found in individual
sentence or clause related to feature (f) of movie. In the subsequent subsection, we calculate score of each sentence
belonging to feature F and aggregate the scores to give final F final score. We follow average scoring method to compute
the score of individual feature.
Word Scoring [28,29]
Each word in the document that appears in the SentiWordNet is assigned a positive, negative and objective score.
The positive score is calculated as the average of the positive scores of all the synsets with part of speech of same as that of
word in movie review document in SentiWordNet. The negative score is calculated in similar fashion. Those words which
are not present in SentiWordNet are assigned zero for both positive and negative scores.
Feature Based Scoring [28,29]
Feature based scoring
is computed through taking average of the score of sentences or clauses
related to the feature [26]. The sentence or clause score is calculated by averaging the score of the words present in the
same:
125
Opinion Mining from Text in Movie Domain
senPosScore(S), senNegScore(S) are the positive, and negative respectively of sentence S or clause S.
posScore(i), negScore(i) are the positive, negative score respectively of ith word in sentence S or clause S.
n = Total No. of words in S
The score of sth sentence or clause SenScore(S) of feature F belonging to is calculated as :
is score of feature(F) and ‘n’ is number of sentences (s) or clauses (s) which expresses
Where
opinion on feature(F)
If feature score
is positive, then it is identified as positive comments are expressed on feature f.
If feature score
is negative, then it is identified as negative comments are expressed on feature F.
Feature text fragments are presented to the user which is identified in section 3.1.
EXPERIMENTS AND RESULTS
We evaluate our method and achieve accuracy in Table 1
Table 1: Accuracy of Methods
List of Methods
Our Method
SentiWordNet
Scoring Approach
Accuracy on polarity
determination of movie features
70%
58%
CONCLUSIONS AND FUTURE WORK

We find that some incomplete meaningless sentences or clauses are presented to the user as answer. It happens as
our sentence segmentation based on rules is not proper. This is our future work to break sentence through machine
learning methods.

Secondly, identification of feature is a tough task. Co reference resolution has also affected our method. In future,
we will address this issue.

Even different aspects of movie has different sub features hence segmentation based on sub feature is required for
the opinion mining.

We use SentiWordnet, general opinion lexicon dictionary for the purpose of opinion mining at movie domain.
Hence domain specific dictionary could be more appropriate. E.g., movie is very long; hair of actor is very long.
Here first long word act as negative word and second long word is context sensitive.
REFERENCES
1.
FERRET, O., GRAU, B., HURAULT, M.L., ILLOUZ, G. 2001, Finding An Answer Based on the Recognition of
the Question Focus. In Proceedings of TREC
2.
B. Liu 2007. Web Data Mining, Exploring Hyperlinks, Contents and Usage data.
3.
B. Liu 2011. Opinion Mining and Sentiment Analysis, AAAI, San Francisco, USA.
126
Nidhi Mishra & C. K. Jha
4.
B.Liu 2010. Opinion Mining and Sentiment Analysis: NLP Meets Social Sciences”, STSC, Hawaii.
5.
B. Liu 2008. Opinion Mining and Summarization, World Wide Web Conference, Beijing, China.
6.
B. Liu. 2010. Sentiment Analysis: A Multifaceted Problem., Invited paper, IEEE Intelligent Systems.
7.
B. Liu. 2010 Sentiment Analysis and Subjectivity Second Edition, the Handbook of Natural Language Processing.
8.
B. Pang, L. Lee, and S. Vaithyanathan, 2002. Thumbs up? Sentiment classification using machine learning
techniques,” Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp.
79–86.
9.
C. Cardie, J. Wiebe, T. Wilson, and D. Litman, 2003. Combining low-level and summary representations of
opinions for multiperspective question answering, Proceedings of the AAAI Spring Symposium on New
Directions in Question Answering, pp. 20–27.
10. ComScore/the Kelsey group 2007. Online consumer-generated reviews have significant impact On offline
purchase behavior, Press Release. http://www.comscore.com/press/release.asp?press=1928
11. E. Riloff, and J. Wiebe, 2003. Learning Extraction Patterns for Subjective Expressions, Proceedings of the
Conference on Empirical Methods in Natural Language Processing (EMNLP), Japan, Sapporo.
12. T.Wilson, J. Wiebe,and R. Hwa,. 2004. Just how mad are you? Finding strong and weak opinion clauses. In: the
Association for the Advancement of Artificial Intelligence, pp. 761--769.
13. H. Yu, and V. Hatzivassiloglou, 2003. Towards Answering Opinion Questions: Separating Facts from Opinions
and Identifying the Polarity of Opinion Sentences, Proceedings of the Conference on Empirical Methods in
Natural Language Processing (EMNLP), Japan, Sapporo.
14. W. Jin, H. Hay Ho, and R. Srihari, 2009. Opinion Miner: A Novel Machine Learning System for Web Opinion
Mining and Extraction. Proceeding of International conference on Knowledge Discovery and Data Mining Paris,
France.
15. L. Dey and S.K. Mirajul Haque, 2009. Studying the effects of noisy text on text mining applications. Proceedings
of the Third Workshop on Analytics for Noisy Unstructured Text Data, Barcelona, Spain.
16. B. Liu, and J. Cheng, 2005. Opinion observer: Analyzing and comparing opinions on the web, Proceedings of
WWW.
17. G.Vinodhini and RM. Chandrasekaran 2012. Sentiment analysis and Opinion Mining: A survey International
Journal of advanced Research in Computer Science and Software Engineering vol. 2 Issue 6.
18. X. Ding, B. Liu, and P. S. Yu, 2008. A holistic lexicon-based approach to opinion mining, Proceedings of the
Conference on Web Search and Web Data Mining (WSDM).
19. G. Jaganadh 2012. Opinion mining and Sentiment analysis CSI Communication.
20. A.M..Popescu,, O. Etzioni, 2005. Extracting Product Features and Opinions from Reviews, In Proc. Conf. Human
Language Technology and Empirical Methods in Natural Language Processing, Vancouver, British Columbia, pp.
339–346.
21. Stanford Part of Speech Tagger: URL http://nlp.stanford.edu/software/tagger.shtml .
Opinion Mining from Text in Movie Domain
127
22. J. Martin 2005. Blogging for dollars. Fortune Small Business, 15(10), pp. 88–92.
23. J. Wiebe, E. Breck, C. Buckley, C. Cardie, P. Davis, B. Fraser, D. Litman, D. Pierce, E. Riloff, T. Wilson, D. Day,
and M. Maybury, 2003. Recognizing and organizing opinions expressed in the world press in Proceedings of the
AAAI Spring Symposium on New Directions in Question Answering.
24. Christopher Scaffidi, Kevin Bierhoff, Eric Chang, Mikhael Felker, Herman Ng and Chun Jin 2007. Red Opal:
product-feature scoring from reviews, Proceedings of 8th ACM Conference on Electronic Commerce, pp. 182191, New York.
25. Yi and Niblack 2005. Sentiment Mining in Web Fountain” Proceedings of 21st international Conference on Data
Engineering, pp. 1073-1083, Washington DC.
26. M. Hu and B. Liu 2004. Mining and summarizing customer reviews, Proceedings of the ACM SIGKDD
Conference on Knowledge Discovery and Data Mining (KDD), pp. 168–177.
27. P.Turney 2002. Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of
Reviews. In: Proceeding of Association for Computational Linguistics, pp. 417--424.
28. Gautam Kumar et al. Opinion mining and summarization for customer reviews. IJEST Volume 4 Issue 8 - August
2012 - .
29. S.Agrawal and T.J.Siddiqui, 2012 “Feature based Star Rating of Reviews: A Knowledge-Based Approach for
Document Sentiment Classification” in International Journal of Hybrid Information Technology Vol. 5.
30. K. Denecke. 2008. “Using SentiWordNet for Multilingual Sentiment Analysis,” in Proceedings of the
International Conference on Data Engineering (ICDE 2008), Workshop on Data Engineering for Blogs, Social
Media, and Web 2.0, Cancun.
31. A. Esuli and F. Sebastiani, 2006. “SentiWordNet: A publicly available lexical resource for opinion mining”, In
Proceedings of LREC-06, the 5th Conference on Language Resources and Evaluation, Geneva, Italy.
32. B. Pang and L. Lee, 2005 “Seeing stars: Exploiting class relationships for sentiment categorization with respect to
rating scales”, In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics.
33. Nidhi Mishra etal.2010 “Context-Aware Restricted Geographical Domain Question Answering System” IEEE
International Conference on Computational Intelligence and Communication Networks, Bhopal , 548-553.
34. Nidhi Mishra etal. 2011 “Part of Speech Tagger for Hindi Corpus” IEEE International Conference on
Communication Systems and Network Technologies, SMVDU, Katra Jammu, 554-558.
35. Nidhi Mishra and C. K.Jha 2012“Classification of Opinion Mining Techniques” International Journal of
Computer Applications (0975 – 8887) Volume 56– No.13.
36. Nidhi.Mishra and C.K.Jha 2012“ An insight into task of opinion mining”, Springer Second International Joint
Conference on Advances in Signal Processing and Information Technology – SPIT_AIT .
37. Nidhi mishra et al 2013. “Restricted Domain Opinion mining in Compound Sentences” in IEEE CSNT.